Articles


Spam Classification

Spam has been around since the beginning of the Internet. In fact, unsolicited messages sent over a network predate it, stretching back to 1884, when wealthy Americans were sent unsolicited investment offers over the telegraph[1]. The first modern email spam appeared on ARPANET, a military precursor to the Internet, when "a man named Gary Turk sent an e-mail solicitation to 400 people, advertising his line of new computers"[1]. In this notebook we will consider the problem of finding spam in YouTube comments...

Stack Exchange Tag Prediction

Stack Exchange is a popular question-and-answer website where users can post and answer questions. Each question has a short title, followed by a longer description of the problem. The goal of this post is to come up with a machine learning algorithm that can predict the tags on a question given the content of the post. I also try to make the predictions work across subjects, so that the algorithm can be trained on a cooking dataset yet offer predictions for the physics dataset...

UCI Heart Disease Analysis

The UCI data repository contains three datasets on heart disease. Each dataset contains information about several patients suspected of having heart disease, such as whether the patient is a smoker and the patient's resting heart rate, age, and sex. The goal of this notebook is to use machine learning and statistical techniques to see if we can predict both the presence and severity of heart disease from the features given. In addition, we will analyze which features are the most important in predicting the presence and severity of heart disease...