SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
Data Science that Vacation.
Using Data Science to find where you should take your next vacation.
TJ Stalcup
Lead Mentor @Thinkful
API Evangelist @WealthEngine
Pokemon Master
Englightened Ingress Agent
About Me
***see slide***
Speaker notes
What's your name?
What do you do?
Why are you interested in data science?
About you
Vacationing is fun.
Planning a vacation is not.
Data science can help.
TONIGHT: Data, Vacation, and AI
A text analyzer to take your writeup of your dream vacation and find your best match.
To do that we need 3 things:
A set of vacation reviews (we're going to use hotel reviews)
A text based model for hotel matching
Dream vacation descriptions
What we're building:
The data tonight is a sample of reviews of 1000 hotels collected by Datafinity, available
on Kaggle .
Has information about the hotel (name, location, etc)
Information about the reviewer
Review Text
Rating
here
The Data
Text processing is a slow and involved process
This way we can make a model and perform matching in a relatively quick amount of
time
Why is it slow?
Why only 1000 hotels?
Text data is often referred to as 'unstructured data'.
But what is structure data?
Let's talk about text
Structured Data
NameName EmailEmail Date of SignupDate of Signup
Jas Singh jas@thinkful.com 11/29/2017
... ... ...
This data is nice. It's a table with columns and we know what to
expect.
Unstructured Data
This data is not as nice. It's unpredictable, varying in length and we
don't really know what's what. It just kind of looks like one big thing.
The text above (and this text here) is unstructured data....
The Problems with Unstructured
Unstructured data gives us a few specific problems:
- What is a data point? Is it all just one data point? Is every character
a data point?
- How do we compare data? For things to be similar, how similar do
they have to be?
- What parts of the data matter? Are they all equally important?
An example
This is our test sentence.
So what parts of this sentence matter? What are our data points?
An example
This is our test sentence.
The words matter! And whitespace gives us a way to find them.
An example
This is our test sentence.
ThisThis isis ourour testtest sentence.sentence.
1 1 1 1 1
An example
This is our test sentence.
And this is a second sentence.
ThisThis isis ourour testtest sentence.sentence.
1 1 1 1 1
0 1 0 0 1
We've taken our data and turned it into a table.
We added structure!
Bag of words
This is called a 'bag of words' approach. (It's also called vectorizing.)
We took our initial sentence and created a bag for each word.
Then we went through any following sentences and counted the
number of times we found a word that matched.
It leaves us with a table where each column is a word and rows
are counts
Punctuation and case
However, in looking at our example, something should seem logically
off.
Things like 'This' and 'this' are not considered equal because the
computer doesn't see them as the same. The case is a difference.
This is why you (almost) always preprocess text data.
Back to the example
This is our test sentence. ---> this is our test sentence
And this is a second sentence. ---> and this is a second sentence
thisthis isis ourour testtest sentencesentence
1 1 1 1 1
1 1 0 0 1
Getting rid of case and punctuation makes comparisons easier and
more effective (particularly on small data)
Stop words
But there's more!
Some words don't matter. They don't really tell us anything.
These are called 'stop words'.
Things like 'it', 'is', 'the' are usually just thrown out.
Back to the example
This is our test sentence. ---> this our test sentence
And this is a second sentence. ---> this second sentence
thisthis ourour testtest sentencesentence
1 1 1 1
1 0 0 1
Now we have vectors of the essentials for each sentence.
This is something we can build a model on!This is something we can build a model on!
The Model
Our model is going to be Random Forest.
A random forest is an ensemble of decision trees to predict the most
likely class of an outcome variable.
What does that mean?
Decision Trees
A set of rules that get us to a prediction, in the form of a tree.
You can think of it like a computer building a version of 20
questions.
Decision Trees
Random Forest
A random forest builds a lot of different decision trees and then lets
each one vote.
Our questions will be things like "Contains the word 'beach'" or
"Contains the world 'sun' 2 or more times".
The Notebook
We're going to use a Google hosted Python to build this
model.
http://bit.ly/ideal-vacation
notebook
Shortcomings
Our model has a few weaknesses:
-What about relative frequency?
-What about context?
Relative Frequency
It has a nice beach.
vs
10 pages of text that says the word beach once.
Relative Frequency
Each one scores a 1 for beach.
TFIDF is the answer. It rates each word by its relative frequency.
So the word beach in a ten word sentence counts more than one
mention in 10000 words.
Context
'I hate beaches and love cities'
vs
'I love beaches and hate cities'
Our model would see these as the same thing.
Context - N-Grams
We can get a sense of context with n-grams. Each feature is a set of
words rather than individual words.
So we'd get features like 'love cities' and 'hate beaches' rather than
'love' 'cities' 'hate' 'beaches'.
There's a lot more
This all falls under the banner of Natural Language Processing, or
NLP, one of the largest and most exciting fields of data science and
artificial intelligence.
It's the basis for things like chatbots and Siri and the Turing test itself.
There is a lot of fun to be had in this space.
QA with Alex
Data Science @ Thinkful
Flexible, project-based curriculum to help you become the data
scientist you want to be
You don’t just learn skills, you get to make things
Mentor support from experts in the industry
Also, there's a job guarantee
Learning Mentor
Career MentorProgram Manager
Local Community
You
Unprecedented Support
http://bit.ly/dc-ds-trialhttp://bit.ly/dc-ds-trial
Initial 2-week trial course
Start with Python and Statistics
Unlimited Q&A Sessions
Option to continue with full bootcamp
Financing & scholarships available
Offer valid for tonight onlyOffer valid for tonight only
BenjyBenjy SchechnerSchechner
Education Advisor
Thinkful Two Week Trial

Más contenido relacionado

Similar a Data Science Vacation Matching

Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Dominik Lukes
 
Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Dominik Lukes
 
The tyranny of averages
The tyranny of averagesThe tyranny of averages
The tyranny of averagesPVS-Studio
 
Yelp challenge reviews_sentiment_classification
Yelp challenge reviews_sentiment_classificationYelp challenge reviews_sentiment_classification
Yelp challenge reviews_sentiment_classificationChengeng Ma
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdfSoha82
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
Dialog system understanding
Dialog system understandingDialog system understanding
Dialog system understandingTran Trung
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysiskrukob9
 
Automated Software Requirements Labeling
Automated Software Requirements LabelingAutomated Software Requirements Labeling
Automated Software Requirements LabelingData Works MD
 
Sentence fragments
Sentence fragmentsSentence fragments
Sentence fragmentssasknic
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
HackYale - Natural Language Processing (Week 1)
HackYale - Natural Language Processing (Week 1)HackYale - Natural Language Processing (Week 1)
HackYale - Natural Language Processing (Week 1)Nick Hathaway
 
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Lucidworks
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 

Similar a Data Science Vacation Matching (20)

Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022
 
Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022Speech Recognition: Art of the possible - DigiFest 2022
Speech Recognition: Art of the possible - DigiFest 2022
 
The tyranny of averages
The tyranny of averagesThe tyranny of averages
The tyranny of averages
 
Yelp challenge reviews_sentiment_classification
Yelp challenge reviews_sentiment_classificationYelp challenge reviews_sentiment_classification
Yelp challenge reviews_sentiment_classification
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
NLP_guest_lecture.pdf
NLP_guest_lecture.pdfNLP_guest_lecture.pdf
NLP_guest_lecture.pdf
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Dialog system understanding
Dialog system understandingDialog system understanding
Dialog system understanding
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysis
 
Automated Software Requirements Labeling
Automated Software Requirements LabelingAutomated Software Requirements Labeling
Automated Software Requirements Labeling
 
Sentence fragments
Sentence fragmentsSentence fragments
Sentence fragments
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
HackYale - Natural Language Processing (Week 1)
HackYale - Natural Language Processing (Week 1)HackYale - Natural Language Processing (Week 1)
HackYale - Natural Language Processing (Week 1)
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Nlp
NlpNlp
Nlp
 

Más de TJ Stalcup

Intro to JavaScript - Thinkful DC
Intro to JavaScript - Thinkful DCIntro to JavaScript - Thinkful DC
Intro to JavaScript - Thinkful DCTJ Stalcup
 
Frontend Crash Course
Frontend Crash CourseFrontend Crash Course
Frontend Crash CourseTJ Stalcup
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data ScienceTJ Stalcup
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data ScienceTJ Stalcup
 
Build Your Own Website - Intro to HTML & CSS
Build Your Own Website - Intro to HTML & CSSBuild Your Own Website - Intro to HTML & CSS
Build Your Own Website - Intro to HTML & CSSTJ Stalcup
 
Intro to Python
Intro to PythonIntro to Python
Intro to PythonTJ Stalcup
 
Intro to Python
Intro to PythonIntro to Python
Intro to PythonTJ Stalcup
 
Predict the Oscars using Data Science
Predict the Oscars using Data SciencePredict the Oscars using Data Science
Predict the Oscars using Data ScienceTJ Stalcup
 
Thinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptThinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptTJ Stalcup
 
Build a Game with Javascript
Build a Game with JavascriptBuild a Game with Javascript
Build a Game with JavascriptTJ Stalcup
 
Thinkful DC FrontEnd Crash Course - HTML & CSS
Thinkful DC FrontEnd Crash Course - HTML & CSSThinkful DC FrontEnd Crash Course - HTML & CSS
Thinkful DC FrontEnd Crash Course - HTML & CSSTJ Stalcup
 
Build Your Own Instagram Filters
Build Your Own Instagram FiltersBuild Your Own Instagram Filters
Build Your Own Instagram FiltersTJ Stalcup
 
Choosing a Programming Language
Choosing a Programming LanguageChoosing a Programming Language
Choosing a Programming LanguageTJ Stalcup
 
Frontend Crash Course
Frontend Crash CourseFrontend Crash Course
Frontend Crash CourseTJ Stalcup
 
Thinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSThinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSTJ Stalcup
 
Thinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSThinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSTJ Stalcup
 
Build a Virtual Pet with JavaScript
Build a Virtual Pet with JavaScriptBuild a Virtual Pet with JavaScript
Build a Virtual Pet with JavaScriptTJ Stalcup
 
Intro to Javascript
Intro to JavascriptIntro to Javascript
Intro to JavascriptTJ Stalcup
 
Thinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptThinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptTJ Stalcup
 

Más de TJ Stalcup (20)

Intro to JavaScript - Thinkful DC
Intro to JavaScript - Thinkful DCIntro to JavaScript - Thinkful DC
Intro to JavaScript - Thinkful DC
 
Frontend Crash Course
Frontend Crash CourseFrontend Crash Course
Frontend Crash Course
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data Science
 
Intro to Python for Data Science
Intro to Python for Data ScienceIntro to Python for Data Science
Intro to Python for Data Science
 
Build Your Own Website - Intro to HTML & CSS
Build Your Own Website - Intro to HTML & CSSBuild Your Own Website - Intro to HTML & CSS
Build Your Own Website - Intro to HTML & CSS
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
Predict the Oscars using Data Science
Predict the Oscars using Data SciencePredict the Oscars using Data Science
Predict the Oscars using Data Science
 
Thinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptThinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScript
 
Build a Game with Javascript
Build a Game with JavascriptBuild a Game with Javascript
Build a Game with Javascript
 
Thinkful DC FrontEnd Crash Course - HTML & CSS
Thinkful DC FrontEnd Crash Course - HTML & CSSThinkful DC FrontEnd Crash Course - HTML & CSS
Thinkful DC FrontEnd Crash Course - HTML & CSS
 
Build Your Own Instagram Filters
Build Your Own Instagram FiltersBuild Your Own Instagram Filters
Build Your Own Instagram Filters
 
Choosing a Programming Language
Choosing a Programming LanguageChoosing a Programming Language
Choosing a Programming Language
 
Frontend Crash Course
Frontend Crash CourseFrontend Crash Course
Frontend Crash Course
 
Thinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSThinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSS
 
Thinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSSThinkful FrontEnd Crash Course - HTML & CSS
Thinkful FrontEnd Crash Course - HTML & CSS
 
Build a Virtual Pet with JavaScript
Build a Virtual Pet with JavaScriptBuild a Virtual Pet with JavaScript
Build a Virtual Pet with JavaScript
 
Intro to Javascript
Intro to JavascriptIntro to Javascript
Intro to Javascript
 
DC jQuery App
DC jQuery AppDC jQuery App
DC jQuery App
 
Thinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScriptThinkful DC - Intro to JavaScript
Thinkful DC - Intro to JavaScript
 

Último

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Data Science Vacation Matching

  • 1. Data Science that Vacation. Using Data Science to find where you should take your next vacation.
  • 2. TJ Stalcup Lead Mentor @Thinkful API Evangelist @WealthEngine Pokemon Master Englightened Ingress Agent About Me
  • 4. What's your name? What do you do? Why are you interested in data science? About you
  • 5. Vacationing is fun. Planning a vacation is not. Data science can help. TONIGHT: Data, Vacation, and AI
  • 6. A text analyzer to take your writeup of your dream vacation and find your best match. To do that we need 3 things: A set of vacation reviews (we're going to use hotel reviews) A text based model for hotel matching Dream vacation descriptions What we're building:
  • 7. The data tonight is a sample of reviews of 1000 hotels collected by Datafinity, available on Kaggle . Has information about the hotel (name, location, etc) Information about the reviewer Review Text Rating here The Data
  • 8. Text processing is a slow and involved process This way we can make a model and perform matching in a relatively quick amount of time Why is it slow? Why only 1000 hotels?
  • 9. Text data is often referred to as 'unstructured data'. But what is structure data? Let's talk about text
  • 10. Structured Data NameName EmailEmail Date of SignupDate of Signup Jas Singh jas@thinkful.com 11/29/2017 ... ... ... This data is nice. It's a table with columns and we know what to expect.
  • 11. Unstructured Data This data is not as nice. It's unpredictable, varying in length and we don't really know what's what. It just kind of looks like one big thing. The text above (and this text here) is unstructured data....
  • 12. The Problems with Unstructured Unstructured data gives us a few specific problems: - What is a data point? Is it all just one data point? Is every character a data point? - How do we compare data? For things to be similar, how similar do they have to be? - What parts of the data matter? Are they all equally important?
  • 13. An example This is our test sentence. So what parts of this sentence matter? What are our data points?
  • 14. An example This is our test sentence. The words matter! And whitespace gives us a way to find them.
  • 15. An example This is our test sentence. ThisThis isis ourour testtest sentence.sentence. 1 1 1 1 1
  • 16. An example This is our test sentence. And this is a second sentence. ThisThis isis ourour testtest sentence.sentence. 1 1 1 1 1 0 1 0 0 1 We've taken our data and turned it into a table. We added structure!
  • 17. Bag of words This is called a 'bag of words' approach. (It's also called vectorizing.) We took our initial sentence and created a bag for each word. Then we went through any following sentences and counted the number of times we found a word that matched. It leaves us with a table where each column is a word and rows are counts
  • 18. Punctuation and case However, in looking at our example, something should seem logically off. Things like 'This' and 'this' are not considered equal because the computer doesn't see them as the same. The case is a difference. This is why you (almost) always preprocess text data.
  • 19. Back to the example This is our test sentence. ---> this is our test sentence And this is a second sentence. ---> and this is a second sentence thisthis isis ourour testtest sentencesentence 1 1 1 1 1 1 1 0 0 1 Getting rid of case and punctuation makes comparisons easier and more effective (particularly on small data)
  • 20. Stop words But there's more! Some words don't matter. They don't really tell us anything. These are called 'stop words'. Things like 'it', 'is', 'the' are usually just thrown out.
  • 21. Back to the example This is our test sentence. ---> this our test sentence And this is a second sentence. ---> this second sentence thisthis ourour testtest sentencesentence 1 1 1 1 1 0 0 1 Now we have vectors of the essentials for each sentence. This is something we can build a model on!This is something we can build a model on!
  • 22. The Model Our model is going to be Random Forest. A random forest is an ensemble of decision trees to predict the most likely class of an outcome variable. What does that mean?
  • 23. Decision Trees A set of rules that get us to a prediction, in the form of a tree. You can think of it like a computer building a version of 20 questions.
  • 25. Random Forest A random forest builds a lot of different decision trees and then lets each one vote. Our questions will be things like "Contains the word 'beach'" or "Contains the world 'sun' 2 or more times".
  • 26. The Notebook We're going to use a Google hosted Python to build this model. http://bit.ly/ideal-vacation notebook
  • 27. Shortcomings Our model has a few weaknesses: -What about relative frequency? -What about context?
  • 28. Relative Frequency It has a nice beach. vs 10 pages of text that says the word beach once.
  • 29. Relative Frequency Each one scores a 1 for beach. TFIDF is the answer. It rates each word by its relative frequency. So the word beach in a ten word sentence counts more than one mention in 10000 words.
  • 30. Context 'I hate beaches and love cities' vs 'I love beaches and hate cities' Our model would see these as the same thing.
  • 31. Context - N-Grams We can get a sense of context with n-grams. Each feature is a set of words rather than individual words. So we'd get features like 'love cities' and 'hate beaches' rather than 'love' 'cities' 'hate' 'beaches'.
  • 32. There's a lot more This all falls under the banner of Natural Language Processing, or NLP, one of the largest and most exciting fields of data science and artificial intelligence. It's the basis for things like chatbots and Siri and the Turing test itself. There is a lot of fun to be had in this space.
  • 34. Data Science @ Thinkful Flexible, project-based curriculum to help you become the data scientist you want to be You don’t just learn skills, you get to make things Mentor support from experts in the industry Also, there's a job guarantee
  • 35. Learning Mentor Career MentorProgram Manager Local Community You Unprecedented Support
  • 36. http://bit.ly/dc-ds-trialhttp://bit.ly/dc-ds-trial Initial 2-week trial course Start with Python and Statistics Unlimited Q&A Sessions Option to continue with full bootcamp Financing & scholarships available Offer valid for tonight onlyOffer valid for tonight only BenjyBenjy SchechnerSchechner Education Advisor Thinkful Two Week Trial