Kaggle Projects Presentation Sawinder Pal Kaur

Sawinder Pal Kaur, PhD
Kaggle Projects

Outline
 Problem
 Statement
 Methods used
 Results

Problem: Digit Recognizer
 Identify handwritten single digits (0 to 9) from
grayscale images.
Sample images

Statement
Each image is 28 pixels high and 28 pixels wide, for a
total of 784 pixels. Each pixel has a single pixel-value
associated with it, indicating the lightness or darkness
of that pixel, with higher numbers meaning darker. This
pixel-value is an integer between 0 and 255, inclusive.
Pixel layout (diagram): pixel0 pixel1 pixel2 ... pixel27
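
The pixel numbering can be made concrete with a short sketch. It assumes the row-major ordering given in the Kaggle data description, where pixelN sits at row N // 28 and column N % 28. The function names are illustrative and do not come from the slides.

```python
WIDTH = 28   # image width in pixels
HEIGHT = 28  # image height in pixels

def pixel_position(index):
    """Return (row, column) of pixel<index>, assuming row-major order."""
    if not 0 <= index < WIDTH * HEIGHT:
        raise ValueError("pixel index out of range")
    return divmod(index, WIDTH)

def to_grid(pixels):
    """Reshape a flat list of 784 pixel values into 28 rows of 28 values."""
    if len(pixels) != WIDTH * HEIGHT:
        raise ValueError("expected 784 pixel values")
    return [pixels[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)]
```

Under this assumption, pixel27 is the last pixel of the first row and pixel28 starts the second row.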

Statement
 The training data set has 785 columns. The first
column, called "label", is the digit that was drawn by the
user. The remaining columns contain the pixel-values of
the associated image.
 The test data set is the same as the training set, except
that it does not contain the "label" column.
 The goal is to predict the digit shown in each image of
the test data set.
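
As a minimal sketch of the layout described above, the following separates the "label" column from the pixel columns using only the standard library. The toy rows are hypothetical and use 4 pixels per image instead of 784. The function name is an assumption made for illustration.

```python
import csv
import io

def split_labels_and_pixels(csv_text):
    """Parse training CSV text into (labels, pixel_rows).

    Assumes a header row whose first column is "label".
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if header[0] != "label":
        raise ValueError('first column must be "label"')
    labels, pixel_rows = [], []
    for row in reader:
        labels.append(int(row[0]))                    # the digit drawn by the user
        pixel_rows.append([int(v) for v in row[1:]])  # the pixel-values
    return labels, pixel_rows

# Hypothetical toy data: 4 pixels per image instead of 784.
sample = "label,pixel0,pixel1,pixel2,pixel3\n7,0,255,12,0\n3,9,0,0,128\n"
labels, pixel_rows = split_labels_and_pixels(sample)
```

The test file would need a variant without the label check, since it has no "label" column.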

Methods used to solve the
problem
 Random Forest
 Support Vector Machine (SVM)
 K-Nearest Neighbors (KNN)

Random Forest
 Ensemble of decision trees
 Each tree is trained on a bootstrapped sample of the
original data set
 Each time a node is split, only a randomly chosen subset
of the features (dimensions) is considered for splitting
 Each tree is fully grown and not pruned
 When a new input is entered into the system, it is run down
all of the trees. For a numeric target, the result is an
average or weighted average of the terminal nodes that are
reached; for a categorical target, it is a majority vote
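
The three ingredients above (bootstrapped samples, random feature subsets at each split, and voting) can be sketched in isolation. This is not a full random forest, since tree growing is omitted. The helper names and the subset size of 28 features (about the square root of 784, a common choice for classification) are assumptions, not values from the slides.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one tree's training set)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, subset_size, rng):
    """Pick the feature indices considered at a single node split."""
    return sorted(rng.sample(range(n_features), subset_size))

def majority_vote(tree_predictions):
    """Combine per-tree class predictions into one label."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))  # stand-in for 10 training rows
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(784, 28, rng)  # assumed subset size
vote = majority_vote([3, 8, 3, 3, 5])           # hypothetical tree outputs
```

Because sampling is done with replacement, some rows typically appear more than once in a bootstrap sample while others are left out.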

Support Vector Machine
 In an SVM model, the original objects (training data) are
treated as points in a space (the input space)
 These are mapped (rearranged) to a new space (the feature
space) using mathematical functions called kernels
 After mapping, objects of separate categories are divided
by a clear gap that is as wide as possible
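
One widely used kernel is the radial basis function (RBF), which the Results slide reports using. A minimal sketch of the kernel function alone, not of SVM training, is shown here. The gamma value is an illustrative assumption and is not stated in the slides.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma here is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The kernel equals 1 for identical points and decays toward 0 as the points move apart, so it behaves like a similarity score.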

K-Nearest Neighbors
 Basic idea
 If it walks like a duck and quacks like a duck, then it is probably a duck
 There are three key elements :
 a set of labeled objects (e.g., a set of stored records)
 a distance or similarity metric to compute distance between objects,
and
 the value of k, the number of nearest neighbors.
 To classify an unlabeled object :
 the distance of this object to the labeled objects is computed,
 its k-nearest neighbors are identified, and
 the class labels of these nearest neighbors are then used to
determine the class label of the object.
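
The classification steps above can be sketched in a few lines. This version assumes Euclidean distance and a simple majority vote among the k neighbors. The toy points and the function names are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance metric between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(labeled, query, k):
    """Classify query from a list of (features, label) pairs."""
    # Steps 1 and 2: compute distances and keep the k nearest neighbors.
    neighbors = sorted(labeled, key=lambda item: euclidean(item[0], query))[:k]
    # Step 3: the most common label among the neighbors wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled objects in two well-separated groups.
training = [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"),
            ([5, 5], "b"), ([5, 6], "b"), ([6, 5], "b")]
prediction = knn_classify(training, [0.5, 0.5], k=3)
```

Sorting every labeled object is fine for a sketch. For 784-dimensional digit images a library implementation would normally be preferred.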

Results
 Random Forest with 500 trees gave 97%
accuracy on the test data.
 SVM with RBF kernel and C=1 gave 97.71% accuracy.
 KNN with k=10 gave 96% accuracy.
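
For reference, accuracy here means the fraction of images whose predicted digit matches the true digit. A minimal sketch with hypothetical labels (not the competition data) follows.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical example: 4 of 5 predictions are correct.
score = accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])
```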

Titanic: Machine Learning
from Disaster

Problem
 The sinking of the RMS Titanic is one of the most
infamous shipwrecks in history.
 One of the reasons that the shipwreck led to such loss
of life was that there were not enough lifeboats for the
passengers and crew. Although there was some
element of luck involved in surviving the sinking, some
groups of people were more likely to survive than
others, such as women, children, and the upper-class.
 This project analyzes what sorts of people were likely
to survive. In particular, machine learning tools are
applied to predict which passengers survived the
tragedy.

Statement
 The historical data has been split into two
groups, a 'training set' and a 'test set'. For the
training set, the outcome (whether or not the
passenger survived the sinking: 0 for deceased,
1 for survived) is provided.
 The goal is to predict the outcome for each
passenger in the test set.

Methods used to solve the
problem
• Random Forest
• Support Vector Machine (SVM)

Results
 Random Forest with 300 trees gave 77.9% accuracy.
 SVM with RBF kernel and C=1 gave 77.7% accuracy.
