1. Predict Survival on Titanic
● This project was my final assignment for the
"Machine Learning and Deep Learning"
class at UCSC. I did it with my partner,
Sneha Das. Sneha did almost all of the
writing for our report, and I did all of the
machine learning and data engineering parts
since I had Python and sklearn experience.
● We tried several different models and
checked which one had the best fit.
● We also tried several different sets of
features.
● Using a Gradient Boosting classifier /
Random Forest with only four features, we
got an accuracy of 0.79904.
● For this assignment we got 18 out of 18
possible points and a comment that our
results were very good.
● We placed 885th of 3661 in the Kaggle competition:
https://www.kaggle.com/etcareva/results
● Our tools:
○ Python
○ Sklearn
○ pandas
https://github.com/Katy-katy/titanic_machine_learing_python_pandas_sklearn
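A minimal sketch of how several models can be compared on a small feature set with sklearn. It does not reproduce the project's code: the data is synthetic (generated by `make_classification` as a stand-in for four Titanic features), and the model settings and the 5-fold cross-validation are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Titanic table: 4 numeric features, binary target.
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Candidate models (hyperparameters are illustrative, not the project's).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

best_model = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("best:", best_model)
```

The same loop can be repeated over different column subsets to compare feature sets as well as models.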
2. Shelter Animal Outcomes
● This project was done for a Kaggle
competition:
https://www.kaggle.com/c/shelter-animal-outcomes/data
● My goal was to predict the outcome of the
animals as they leave the Animal Center. I
used multiclass logistic regression and got
a log loss of 0.92871. Then I slightly
improved the result using VotingClassifier
with logistic regression, random forest, and
CalibratedClassifier as estimators, which
gave a log loss of 0.92197.
● I tried several different models and sets of
features and checked which one had the
best fit.
https://github.com/Katy-katy/Shelter-Animal-Outcomes-Machine-Learning-Python
● I also made some useful observations.
○ Neutered Males and Spayed
Females have a good chance of being
adopted, but it is very hard to find a
new family for Intact Males and Intact
Females.
○ Animals with unknown names
have almost no chance of being
returned to their owners and a lower
chance of being adopted.
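The VotingClassifier setup described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic, the five classes and all hyperparameters are arbitrary, soft voting is assumed because log loss needs probabilities, and sklearn's `CalibratedClassifierCV` (wrapping a random forest here) is assumed to be the calibrated estimator meant.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic multiclass data standing in for the shelter features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: multiclass logistic regression on its own.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_loss = log_loss(
    y_test, baseline.predict_proba(X_test), labels=baseline.classes_
)

# Soft-voting ensemble: averages the predicted probabilities of its members.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("cal", CalibratedClassifierCV(
            RandomForestClassifier(n_estimators=100, random_state=1), cv=3)),
    ],
    voting="soft",
).fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)
ensemble_loss = log_loss(y_test, proba, labels=ensemble.classes_)

print(f"logistic regression log loss: {baseline_loss:.5f}")
print(f"voting ensemble log loss:     {ensemble_loss:.5f}")
```

Whether the ensemble beats the baseline depends on the data; on this synthetic set the two numbers are only meant to show how the comparison is made.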
3. Classification of Restaurant
Reviews using Python and nltk
● An assignment for "Introduction to Natural
Language Processing" class at UCSC.
● My goal was to predict the category
(positive or negative) of restaurant reviews
using the nltk library and machine learning
algorithms.
● A set of labeled reviews was provided for
us. Its labels ranged from 1 (very negative)
to 5 (positive), so I removed the reviews
with a score of “3”.
● My assignment was also
○ to compare the accuracy of
NaiveBayesClassifier and
DecisionTreeClassifier
○ to try different numbers of
features
● I removed stopwords, created unigrams
and bigrams, and then tried using up to
the 32768 best features.
● Finally, I got an accuracy of 0.83 using the
Naive Bayes classifier and only the 32 best
features.
● My grade was 9.8 (max 10.0) and I got a
comment from my instructor: "In all, an
excellent job!"
https://github.com/Katy-katy/Classification-of-Restaurant-Reviews-using-Python-and-nltk-/blob/master/README.md
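A rough sketch of the pipeline described above: stopword removal, unigram and bigram features, and a Naive Bayes classifier trained with different numbers of features. The toy reviews, the tiny stopword set, and the names `ngrams` and `featurize` are hypothetical. Ranking features by raw frequency is also an assumption, since the text does not say how the "best" features were chosen.

```python
from collections import Counter

from nltk.classify import NaiveBayesClassifier, accuracy
from nltk.util import bigrams

# Tiny hypothetical stopword list (the real project used nltk's resources).
STOPWORDS = {"the", "a", "was", "and", "with"}

def ngrams(text):
    """Lowercase, drop stopwords, return unigrams plus bigrams as strings."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return tokens + [" ".join(pair) for pair in bigrams(tokens)]

def featurize(text, vocabulary):
    """Boolean presence feature for every n-gram in the chosen vocabulary."""
    present = set(ngrams(text))
    return {gram: (gram in present) for gram in vocabulary}

TRAIN = [
    ("the food was great and the service was friendly", "positive"),
    ("great pizza and friendly staff", "positive"),
    ("delicious pasta and a lovely place", "positive"),
    ("lovely place with great service", "positive"),
    ("the food was cold and the service was slow", "negative"),
    ("terrible pizza and rude staff", "negative"),
    ("awful pasta and a dirty place", "negative"),
    ("slow service and rude waiter", "negative"),
]
TEST = [
    ("great food and friendly staff", "positive"),
    ("cold pasta and rude waiter", "negative"),
]

# Rank n-grams by training frequency (assumed stand-in for "best features").
counts = Counter(gram for text, _ in TRAIN for gram in ngrams(text))

results = {}
for n_features in (4, 16, 64):
    vocabulary = [gram for gram, _ in counts.most_common(n_features)]
    clf = NaiveBayesClassifier.train(
        [(featurize(text, vocabulary), label) for text, label in TRAIN]
    )
    results[n_features] = accuracy(
        clf, [(featurize(text, vocabulary), label) for text, label in TEST]
    )

# After the loop, clf and vocabulary belong to the largest feature set.
prediction = clf.classify(featurize("great food and friendly staff", vocabulary))
print(results, prediction)
```

Swapping `NaiveBayesClassifier` for nltk's `DecisionTreeClassifier` in the loop gives the second half of the comparison.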
4. Question Answering System
using Python and nltk
● I did this project with J. Chien and E.
Seither. Our goal was to design and build
a question answering system that can
answer questions about a given text.
● We used
○ Lemmatization
○ Overlapping
● We also used SnowballStemmer from
nltk.stem to add stems of the words in the
questions and in the texts. This technique
gave us more accurate overlap counts and
better recall, since the program returned
the sentence with the maximum overlap as
the answer.
● We improved the recall by using chunking
and sorting questions by the first word
(What, Who, Where, When, Why).
● We kept the prepositions and did not
remove the stopwords. We also added
keywords to the questions so that they
would overlap with the text. For example,
for “Where” questions we added the
words ["in", "on", "at", "under", "into",
"upon", "along"] to the questions.
● The performance of our system was
evaluated using the F-measure statistic,
which combines recall and precision in a
single evaluation metric. We got:
AVERAGE RECALL = 0.7094 (165.29 / 233)
AVERAGE PRECISION = 0.6760 (154.80 / 229)
AVERAGE F-MEASURE = 0.6923
● Our result put us among the top 5 teams
in this class.
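As a quick check of the numbers above, the reported F-measure is consistent with the harmonic mean of the average precision and average recall. A small sketch, in which the function name `f_measure` is hypothetical:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Totals reported above: summed per-question scores over question counts.
recall = 165.29 / 233
precision = 154.80 / 229
f1 = f_measure(precision, recall)

print(f"recall={recall:.4f} precision={precision:.4f} f-measure={f1:.4f}")
```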
https://github.com/Katy-katy/Question-Answering-system--Python-nltk-
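The overlap idea described in this section can be sketched in a few lines. This is a simplified illustration, not the team's system: the regex tokenizer and sentence splitter, the example text, and the function names `stems` and `answer` are assumptions, and chunking and lemmatization are left out. Only the stemming step and the "Where" keyword list come from the description above.

```python
import re

from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Extra keywords added to "Where" questions (list taken from the text above).
WHERE_HINTS = ["in", "on", "at", "under", "into", "upon", "along"]

def stems(text):
    """Set of stems for all word tokens; stopwords are deliberately kept."""
    return {stemmer.stem(tok) for tok in re.findall(r"[a-z']+", text.lower())}

def answer(question, text):
    """Return the sentence whose stems overlap most with the question's."""
    question_stems = stems(question)
    if question.lower().startswith("where"):
        question_stems |= stems(" ".join(WHERE_HINTS))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return max(sentences, key=lambda s: len(question_stems & stems(s)))

STORY = (
    "The fox lived in a small den under the old oak. "
    "Every morning the fox hunted mice. "
    "The crow watched the fox from a branch."
)

where_answer = answer("Where did the fox live?", STORY)
what_answer = answer("What did the fox hunt?", STORY)
print(where_answer)
print(what_answer)
```

Stemming is what lets "live" in the question match "lived" in the text, and the added prepositions push the sentence that contains a location ahead of the others.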
5. Pac-Man as an AI Agent:
search
● An assignment for "Artificial Intelligence"
class at UCSC.
● My goal was to implement search
functions that help Pac-Man find paths
through his maze world, both to reach a
particular location and to collect food
efficiently.
● I implemented:
○ depth-first search
○ breadth-first search
○ uniform cost search
○ A* search
https://github.com/Katy-katy/Pac-Man-as-an-AI-Agent-search-
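The four search strategies listed above can be sketched on a small grid maze. This is a standalone illustration rather than the assignment's code: the maze, the unit step cost, the Manhattan heuristic, and all function names are assumptions, and it does not use the course framework's API.

```python
import heapq
from collections import deque

# Hypothetical maze: S = start, G = goal, # = wall, . = open cell.
MAZE = ["S..#",
        ".#..",
        "...G"]

def find(symbol):
    """Locate the (row, col) position of a symbol in the maze."""
    for r, row in enumerate(MAZE):
        if symbol in row:
            return (r, row.index(symbol))
    raise ValueError(symbol)

def neighbors(pos):
    """Yield the open cells reachable in one step (right, down, left, up)."""
    r, c = pos
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
            yield (nr, nc)

def bfs(start, goal):
    """Breadth-first search: FIFO frontier, finds a fewest-steps path."""
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

def dfs(start, goal):
    """Depth-first search: LIFO frontier, finds some path (not always shortest)."""
    frontier, visited = [[start]], set()
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        frontier.extend(path + [nxt] for nxt in neighbors(node) if nxt not in visited)
    return None

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, heuristic=manhattan):
    """A* search: priority = path cost so far + heuristic estimate to goal."""
    frontier = [(heuristic(start, goal), 0, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in neighbors(node):
            new_cost = cost + 1  # every step costs 1 in this sketch
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt, goal), new_cost, path + [nxt]))
    return None

def ucs(start, goal):
    """Uniform cost search: A* with a heuristic that is always zero."""
    return a_star(start, goal, heuristic=lambda a, b: 0)

start, goal = find("S"), find("G")
paths = {"bfs": bfs(start, goal), "dfs": dfs(start, goal),
         "ucs": ucs(start, goal), "a_star": a_star(start, goal)}
for name, path in paths.items():
    print(name, len(path) - 1, "steps")
```

With unit step costs, BFS, uniform cost search, and A* all return shortest paths, while depth-first search only guarantees a valid one.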
Figures: Breadth-First Search, Depth-First Search