Recent Posts
-
April 19, 2016
Scraping IMDB top 250 movies in Python
Web crawling is much easier than it sounds. I started using Python only about 3 weeks ago and now, with the help of a few modules, I’m able to scrape IMDB (static) pages. So … it’s not that hard. Why static pages? You will find it m...
-
April 10, 2016
Python Simulation Practice -- The Monty Hall Problem
Recently I’ve been following the Harvard CS109 online course, which is definitely an awesome one among the many data science MOOCs. I came across a very interesting statistics problem, the Monty Hall Problem, in hw0, where we were trying to solve the problem ...
-
January 11, 2016
Computational Inference in Logistic Regression
Logistic regression is one of the most commonly used techniques to analyze binary data. The classical method to estimate the parameters is Newton-Raphson. Here I’m demonstrating an alternative method, the Bayesian method (MCMC), and make a compar...
-
December 23, 2015
Linsanity
Last Saturday night I watched a documentary film called Linsanity, starring Jeremy Lin, who is the model Asian American NBA player. There are too many articles analyzing the Linsanity phenomenon and I don’t think I’m the right person to talk much about so...
-
December 07, 2015
Learning from Imbalanced Data
I once gave a short talk at my lab’s Machine Learning seminar on classification algorithms for imbalanced data. Technically speaking, any data set that exhibits an unequal distribution between its classes can be considered imbalanced. (He H,...