Quick Alternative to Modulo Reduction

Suppose you want to pick an integer at random in a set of N elements. Your computer has functions to generate random 32-bit integers; how do you transform such numbers into indexes no larger than N? Suppose you have a hash table with a capacity N. Again, you need to transform your hash values (typically 32-bit […]

Why Do Certain Musical Notes Sound “Good” Together?

Two notes sounding “good” together sounds like a very subjective statement. The songs we like and the sounds we like are incredibly dependent on our culture, personality, mood, etc.

What Can Baby Names Tell Us About Our Narcissism Epidemic?

Introduction: How might baby names lead to narcissism? Imagine you are walking down the street when someone calls out your name. You turn around and find a girl who looks familiar: you know you have met, but you can’t remember where. She starts chatting with you and it is obvious that she remembers you well. But you […]

Recurrent Neural Networks

So far on this blog, we’ve mostly looked at data in two forms – vectors in which each data point is defined by a fixed set of features, and graphs in which each data point is defined by its connections to other data points. For other forms of data, notably sequences such as text and sound, I […]

Principal Component Analysis in 3 Simple Steps

Principal Component Analysis (PCA) is a simple yet popular and useful linear transformation technique that is used in numerous applications, such as stock market predictions, the analysis of gene expression data, and many more. In this tutorial, we will see that PCA is not just a “black box”, and we are going to unravel its […]

Entrepreneurial Geekiness

Over the last week I’ve surveyed my PyDataLondon meetup community (3,400+ members) to ask “Which version of Python do you use at work and at home?”. The goal is to gain evidence about which versions of Python are used by Data Scientists. This will help tool developers so they can make evidence-based decisions (e.g. this Dask […]

Stats Can’t Make Modeling Decisions

Here’s a question that appeared recently on the Reddit statistics forum: If effect sizes of coefficient are really small, can you interpret as no relationship?  Coefficients are very significant, which is expected with my large dataset. But coefficients are tiny (0.0000001). Can I conclude no relationship? Or must I say there is a relationship, but […]

Social Network Analysis

Social media data is inexpensive, fast, and effortless to access. As a result, businesses are increasingly using data from social networking sites such as Twitter, Facebook, YouTube, and Google Search. As more value is being realized from online communities, graph-based approaches are becoming increasingly important for mining social networks. Presentation info: Eric J. White will […]

Blood, Sweat & Civic Hacking

The recent article “Open Data and Civic Apps: First-Generation Failures, Second Generation Improvements” by Melissa Lee, Esteve Almirall and Jonathan Wareham looks at early efforts to build civic applications through government-sponsored app challenges. The article evaluates the outcomes of some of the early government app challenges like the District of Columbia’s Apps for Democracy Contest […]