Diving Deep into Python, the not-so-obvious Language Parts

Sections: the C3 class resolution algorithm for multiple class inheritance; assignment operators and lists (simple-add vs. add-AND operators); True and False in the datetime module; Python reuses objects for small integers (use “==” for equality, “is” for identity), with an illustration of the test for equality (==) vs. identity (is); shallow vs. deep […]
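Three of the topics listed above can be sketched in a few lines. This is a minimal illustration written for this listing, not code from the linked post; the class names A, B, C, D and all variable names are hypothetical. It uses lists rather than small integers for the identity check, because integer object reuse is a CPython implementation detail and should not be relied on.

```python
# Assignment operators and lists: "+=" mutates the list in place,
# while "a = a + [...]" builds a new list and rebinds the name.
a = [1, 2]
alias = a                 # second name for the same list object
a += [3]                  # in-place extend; alias sees the change
seen_by_alias = list(alias)   # [1, 2, 3]
a = a + [4]               # new list; alias still points at the old one
alias_after_rebind = list(alias)   # [1, 2, 3]

# Equality (==) compares values; identity (is) compares objects.
x = [1, 2, 3]
y = [1, 2, 3]
equal = x == y            # True: same contents
identical = x is y        # False: two distinct list objects

# C3 linearization for a diamond-shaped hierarchy (hypothetical classes).
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

mro_names = [cls.__name__ for cls in D.__mro__]
# ['D', 'B', 'C', 'A', 'object']
```

The method resolution order puts B before C because D lists B first, and A comes after both because C3 never places a base class ahead of its subclasses.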

A Modern Guide to Getting Started with Data Science and Python

Thomas originally posted this article at http://twiecki.github.io. Python has an extremely rich and healthy ecosystem of data science tools. Unfortunately, to outsiders this ecosystem can look like a jungle (cue snake joke). In this blog post I will provide a step-by-step guide to venturing into this PyData jungle. What’s wrong with the many lists of PyData […]

Visualizing Top Tweeps with t-SNE, in Javascript

I was looking into various ways of embedding unlabeled, high-dimensional data in two dimensions for visualization. A wide variety of methods have been proposed for this task. This review paper from 2009 contains nice references to many of them (PCA, Kernel PCA, Isomap, LLE, Autoencoders, etc.). If you have Matlab available, the Dimensionality Reduction Toolbox […]

Visualizing Census Estimate Margins of Error in R

A key feature of American Community Survey (ACS) data is that the reported values contain both estimates and margins of error. The margins of error, unfortunately, are often overlooked. After meeting with Ezra Glenn last year I gained a new appreciation of them. Today I’ll demonstrate how to visualize them, as well as how they tend to […]

Even Further Beyond One-Hot: Feature Hashing

In the previous post about categorical encoding, we explored different methods for converting categorical variables into numeric features. In this post, we will explore another method: feature hashing. Feature hashing, or the hashing trick, is a method for turning arbitrary features into a sparse binary vector. It can be extremely efficient by having a standalone hash […]
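The idea of hashing features into a sparse binary vector can be sketched as follows. This is a minimal illustration under stated assumptions, not the linked post's implementation: the function names hash_features and to_dense, the bucket count of 16, the "name=value" feature strings, and the choice of SHA-256 as the hash are all hypothetical.

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Return the sorted bucket indices set to 1 (a sparse representation)."""
    active = set()
    for token in tokens:
        # A stable hash, so the same feature maps to the same bucket on every run.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        active.add(index)
    return sorted(active)

def to_dense(indices, n_buckets=16):
    """Expand sparse indices into a dense 0/1 list, for inspection only."""
    vector = [0] * n_buckets
    for i in indices:
        vector[i] = 1
    return vector

row = hash_features(["color=red", "size=large"])
dense = to_dense(row)
```

No vocabulary has to be stored, since the hash function itself decides each feature's column. The cost is that two different features can collide in the same bucket, which becomes more likely as n_buckets shrinks.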

Margin of Error by Geography in the American Community Survey (ACS)

Today I will demonstrate how the margin of error in American Community Survey (ACS) estimates grows as the size of the geography decreases. The final chart that we’ll create is this: The way I interpret the above chart is this: The ACS is very confident about its state-level estimates. It’s a bit less confident about […]

4. Unsupervised Learning: Seeking Representations of the Data

4.1. Clustering: grouping observations together. The problem solved in clustering: given the iris dataset, if we knew that there were 3 types of iris but did not have access to a taxonomist to label them, we could try a clustering task: split the observations into well-separated groups called clusters. 4.1.1. K-means clustering. Note that there exist many […]
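The clustering task described above can be sketched with a bare-bones K-means loop. This is a minimal pure-Python illustration, not the tutorial's code: the function name kmeans, the toy 2-D points (standing in for the iris measurements), and the hand-picked initial centroids are all assumptions made for the example.

```python
def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        centroids = updated
    return labels, centroids

# Two well-separated toy groups (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
# labels == [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, so practical implementations usually run several random initializations and keep the best one.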

Amazon will make $41B this Holiday Season! Forecasting Quarterly Revenue

The holiday shopping season is in full swing! The economy is relatively strong compared to a few years back, so retail sales are probably going to be strong, especially for Amazon. Other retailers like Target and Wal-Mart are also running amazing Black Friday and holiday sales to attract customers. However, Amazon has consistently shown […]

Are Data Sets the New Server Rooms?

This blog post, “Data sets are the new server rooms,” makes the point that a bunch of companies raise a ton of money to go get really proprietary awesome data as a competitive moat. Because once you have the data, you can build a better product, and no one can copy it (at least not […]