Looking into the Future
Using Google’s Prediction API
Justin Grammens
Recursive Awesome & IoT Weekly
What is Prediction?
• Defined by Wikipedia as: “A statement about an uncertain event.”
• It continues: “It is often, but not always, based upon experience or knowledge.”
• In statistics, prediction is a part of statistical inference.
Statistical Inference
• Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.
• Two major paradigms are used for statistical inference:
• Frequentist Inference
• Bayesian Inference
Frequentist Inference
• Data is treated as a repeatable random sample with a specific probability
• Parameters and probabilities remain constant during the test
• Results are independent of prior tests
• Q: Will the sun rise tomorrow? What is the probability of a sun dying, based on all the suns in the universe?
Bayesian Inference
• Takes into account prior results and subjective beliefs
• Updates probabilities of occurrence based on new data
• Tests are NOT run in isolation; they affect one another
• Q: Will the sun rise tomorrow? That depends on how many times we have seen it rise in the past.
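A small worked sketch of the sunrise question, assuming we have watched the sun rise on every one of n mornings. It compares the frequentist relative frequency with Laplace's rule of succession, a classic Bayesian estimate that assumes a uniform prior on the unknown probability. The function names are illustrative.

```python
from fractions import Fraction


def relative_frequency(successes: int, trials: int) -> Fraction:
    """Frequentist point estimate: observed successes / trials."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return Fraction(successes, trials)


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's rule of succession: probability of another success under a
    uniform Beta(1, 1) prior, i.e. (s + 1) / (n + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("invalid counts")
    return Fraction(successes + 1, trials + 2)


if __name__ == "__main__":
    # We have seen the sun rise on 10 mornings out of 10.
    print(relative_frequency(10, 10))      # 1
    print(rule_of_succession(10, 10))      # 11/12
    # Every further sunrise moves the Bayesian estimate closer to 1.
    print(rule_of_succession(1000, 1000))  # 1001/1002
```

The frequentist answer is already certainty after ten mornings, while the Bayesian one keeps updating as new data arrives, which is the contrast the two slides draw.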
Predictions by Machines
• We could therefore define prediction as an “informed guess or opinion.”
• Software systems have to be trained before they can be effective.
source: reading.pppst.com
What is Prediction API?
• Announced at Google I/O in 2011
• Provides pattern-matching and machine learning capabilities
• Handles both numeric and text input
• Handles both classification and regression output
• Accessible from App Engine, client libraries, and the command line
• Able to retrain the model on the fly (Bayesian?)
What Are Some Use Cases?
What Do You Need?
• Google Account
• Google Cloud Platform Console project
• Google Prediction API activated
• Google Cloud Storage API activated
Steps Involved
• Define what you are trying to accomplish
• Find training data and format it to support your goal (the hardest part)
• Upload the training data to Google Cloud Storage
• Train the system against the data you provide
• Send queries to your model
• Upload additional data as you gain new information
Hosted Model
• The Prediction API hosts a gallery of user-submitted models
• Owners can charge for the use of their models
• Hosted models are versioned so they can be updated easily
• Models are submitted in PMML format
• PMML is an XML-based language for defining statistical & data models
• There currently appears to be a waitlist
How To Train
• Three ways to create and train the correct type of model:
• CSV file stored on Google Cloud Storage
• Training data embedded in the request
• Limited by the size of an HTTP request (< 2 MB)
• Empty model created, then trained with update calls
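The three training options can be sketched as JSON request bodies. This assumes the field names of the Prediction API v1.6 `trainedmodels` resource as I recall them (`id`, `modelType`, `storageDataLocation`, `trainingInstances` with `csvInstance` and `output`); Google retired the API in 2018, so the sketch only builds and size-checks the bodies locally. Model IDs and the bucket path are hypothetical.

```python
import json

# Approximate embedded-request limit from the slide (< 2 MB); the exact
# byte count is an assumption.
MAX_REQUEST_BYTES = 2 * 1024 * 1024


def body_from_storage(model_id, bucket_path):
    """Option 1: point the model at a CSV file already on Cloud Storage."""
    return {"id": model_id, "storageDataLocation": bucket_path}


def body_with_instances(model_id, rows):
    """Option 2: embed the training rows in the request itself.
    Each row is [example_value, feature1, feature2, ...]."""
    body = {
        "id": model_id,
        "trainingInstances": [
            {"output": str(row[0]), "csvInstance": list(row[1:])} for row in rows
        ],
    }
    size = len(json.dumps(body).encode("utf-8"))
    if size >= MAX_REQUEST_BYTES:
        raise ValueError(f"request is {size} bytes; use a CSV on Cloud Storage instead")
    return body


def empty_model_body(model_id, model_type):
    """Option 3: create an empty 'classification' or 'regression' model."""
    return {"id": model_id, "modelType": model_type}


def update_body(row):
    """One incremental training example for a later update call."""
    return {"output": str(row[0]), "csvInstance": list(row[1:])}


if __name__ == "__main__":
    rows = [["spam", "Lose weight now"], ["ham", "Lunch at noon"]]
    print(json.dumps(body_with_instances("spam-filter", rows), indent=2))
    print(json.dumps(empty_model_body("weather", "regression")))
    print(json.dumps(update_body([62, "Seattle", 288, "sunny"])))
```

Option 2 is convenient for small experiments; the size check makes it fail early when the data has outgrown a single HTTP request.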
CSV File Rules
• Maximum file size: 2.5 GB
• No header row; to the system it's irrelevant
• One example per line
• The first column tells the system which type of model to build
• Ideally, remove punctuation (other than apostrophes) from your data
CSV File Rules
• Text Strings
• Double quotes around all text strings
• Text matching is case-sensitive
• Numeric Values
• Integers and decimals are supported
• Numbers: "1", "23", "999"
• Strings: "6 12", "colt 45"
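The CSV rules above can be applied with Python's standard `csv` module. A minimal sketch, assuming rows arrive as Python lists with the example value first: `csv.QUOTE_NONNUMERIC` double-quotes every string and leaves ints and floats bare, no header row is written, and punctuation other than apostrophes is stripped from strings. The helper names are illustrative.

```python
import csv
import io
import string

# Translation table that deletes every ASCII punctuation mark except "'".
_PUNCT_TO_STRIP = str.maketrans("", "", string.punctuation.replace("'", ""))


def clean_text(value: str) -> str:
    """Remove punctuation while keeping apostrophes."""
    return value.translate(_PUNCT_TO_STRIP)


def to_training_csv(rows) -> str:
    """Serialize rows as training CSV: no header, one example per line,
    example value in the first column, strings quoted, numbers bare."""
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes every str field and leaves int/float unquoted.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for row in rows:
        writer.writerow([clean_text(v) if isinstance(v, str) else v for v in row])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_training_csv([
        ["spam", "Lose weight now!"],     # classification row
        [62, "Seattle", 288, "sunny"],    # regression row
    ]), end="")
    # "spam","Lose weight now"
    # 62,"Seattle",288,"sunny"
```

Because text matching is case-sensitive, you may also want to normalize case before writing; the sketch leaves case untouched.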
Structuring Data
• Example Value
• “The Answer”
• Features
• No limit on the number of features
• The more features & examples, the better
• Training on 16 MB takes about 1 hour
What’s The Answer?
Regression Model
Example Data
• Define your data to support numbers and strings
• A query of “Seattle, 288, sunny” might return a value of 62
• The returned value doesn't need to match any value in the dataset
• Train the model with all columns, then query with the first column missing
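The "train with all columns, query without the first" pattern can be sketched by splitting a training row into its example value and its features. The prediction body assumes the v1.6 request shape `{"input": {"csvInstance": [...]}}` as I recall it; the service is no longer available, so this only builds the JSON locally.

```python
import json


def split_example(row):
    """Split a full training row into (example_value, features)."""
    if len(row) < 2:
        raise ValueError("a row needs an example value and at least one feature")
    return row[0], list(row[1:])


def predict_body(features):
    """Prediction request: only the features; the model supplies the answer."""
    return {"input": {"csvInstance": list(features)}}


if __name__ == "__main__":
    training_row = [62, "Seattle", 288, "sunny"]  # example value comes first
    answer, features = split_example(training_row)
    print(json.dumps(predict_body(features)))
    # {"input": {"csvInstance": ["Seattle", 288, "sunny"]}}
```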
Classification Model
Example Data
• A query of “Lose weight now!” would return “spam”
• Returns one of the categories from the dataset
Authorization
• You must use OAuth 2.0 to authorize requests
• You can share your model with others:
• Can view: User can call Analyze, Get, List and Predict on the project and/or any model owned by the project.
• Can edit: User has all the permissions of Can view, but can also call Delete, Insert, and Update on any model owned by the project.
• Is owner: User has all the permissions of Can edit, but can also grant other users access to the project.
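The three sharing levels nest inside each other, which a few sets capture neatly. This is only a local model of the rules on the slide, not a call to any Google API; `grant_access` is a hypothetical name for the owner's ability to share the project.

```python
# Operations from the slide, modelled as cumulative permission levels.
CAN_VIEW = frozenset({"analyze", "get", "list", "predict"})
CAN_EDIT = CAN_VIEW | {"delete", "insert", "update"}
IS_OWNER = CAN_EDIT | {"grant_access"}  # hypothetical operation name

LEVELS = {"can_view": CAN_VIEW, "can_edit": CAN_EDIT, "is_owner": IS_OWNER}


def is_allowed(level: str, operation: str) -> bool:
    """Return True if a user at `level` may perform `operation`."""
    return operation.lower() in LEVELS[level]


if __name__ == "__main__":
    print(is_allowed("can_view", "Predict"))  # True
    print(is_allowed("can_view", "Delete"))   # False
```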
Tips & Tricks
• The more examples & features, the better the results
• However, adding more features doesn't always give better predictions. Compare:

is_comedy  is_drama  is_action  is_horror
Y          N         N          N

VS

genre
Comedy
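Collapsing the four yes/no flags into a single categorical `genre` feature can be sketched as below. The record layout is an assumption based on the column names on the slide; a record with more than one flag set raises an error here rather than guessing.

```python
GENRE_FLAGS = ["is_comedy", "is_drama", "is_action", "is_horror"]


def flags_to_genre(record: dict) -> str:
    """Turn Y/N genre flags into one 'genre' value, e.g. is_comedy=Y -> 'Comedy'."""
    matches = [name[len("is_"):].capitalize()
               for name in GENRE_FLAGS if record.get(name) == "Y"]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one genre flag set, got {matches}")
    return matches[0]


if __name__ == "__main__":
    row = {"is_comedy": "Y", "is_drama": "N", "is_action": "N", "is_horror": "N"}
    print(flags_to_genre(row))  # Comedy
```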
Tips & Tricks
• Need to add a numeric aspect to the genre?
• Add extra genre columns and weight each genre by repeating it according to its count

genre  genre  genre  genre  genre
Drama  Drama  Drama  Comedy Comedy
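The weighting trick amounts to repeating each genre as many times as it counts. A minimal sketch, assuming every row uses the same total count (five columns on the slide); how to normalize rows with different totals is left open.

```python
def weighted_genre_columns(genre_counts: dict) -> list:
    """Repeat each genre `count` times, most frequent first (ties by name),
    so the ordering is stable across rows."""
    columns = []
    for genre, count in sorted(genre_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count < 0:
            raise ValueError(f"negative count for {genre}")
        columns.extend([genre] * count)
    return columns


if __name__ == "__main__":
    print(weighted_genre_columns({"Comedy": 2, "Drama": 3}))
    # ['Drama', 'Drama', 'Drama', 'Comedy', 'Comedy']
```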
Tips & Tricks
• Always put something into each feature
• Include all the features that you know about
• For regression:
• Make sure you have the time to verify that the values are correct
• Conversely, if you have exact numbers, use them
• Try to have at least a few hundred examples for each category
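Before training, it is cheap to check which categories fall short of the "few hundred examples" guideline. The default threshold of 300 is an assumed reading of "a few hundred"; adjust it to taste.

```python
from collections import Counter


def thin_categories(example_values, min_count=300):
    """Return {category: count} for categories with fewer than min_count examples."""
    counts = Counter(example_values)
    return {label: n for label, n in counts.items() if n < min_count}


if __name__ == "__main__":
    labels = ["spam"] * 400 + ["ham"] * 50
    print(thin_categories(labels))  # {'ham': 50}
```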
Tips & Tricks
• The model can only compare against known relationships
• You can't feed in an untrained title and user to get a rating
• The solution is to break the title into genre, director, and actors

Rating  user_name  movie_title
9.5     Justin     Star Wars
2.2     Justin     Disaster Movie
5.0     Justin     Billy Madison
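Breaking a title into reusable features might look like the sketch below. The metadata table is illustrative (in practice it would come from a movie database), and the choice of genre, director, and lead actor as the features is an assumption. A new title can then be rated as long as its genre, director, or actor has appeared in training.

```python
# Illustrative metadata; a real system would look this up from a movie database.
MOVIE_METADATA = {
    "Star Wars": {"genre": "Sci-Fi", "director": "George Lucas", "lead_actor": "Mark Hamill"},
    "Billy Madison": {"genre": "Comedy", "director": "Tamra Davis", "lead_actor": "Adam Sandler"},
}


def expand_row(rating, user_name, movie_title, metadata):
    """Replace the opaque title with features the model can generalize from."""
    info = metadata[movie_title]
    return [rating, user_name, info["genre"], info["director"], info["lead_actor"]]


if __name__ == "__main__":
    print(expand_row(9.5, "Justin", "Star Wars", MOVIE_METADATA))
    # [9.5, 'Justin', 'Sci-Fi', 'George Lucas', 'Mark Hamill']
```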
Let’s Talk Data!
• Nice Ride
• Based on the starting station, predict the ending station
• New York Cab Rides
• Given a starting GPS coordinate, predict where the cab ride will end
• Sentiment Analysis
• Determine the sentiment of the State of the Union speech
Based on the starting station, can we predict the ending station?
Nice Ride Location Rides
• https://www.niceridemn.org/data/
• Offers a live XML stream to update along the way
Nice Ride Location Rides
Started with this:
Next, ended with this:
Nice Ride Insert Data
ID & Location
Nice Ride Running
Prediction
Status
Lessons Learned
• I forgot to put the values in quotes, so the model was treated as numerical regression.
• Verify how it's interpreting your data (e.g., the model type) with a “get” call.
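The quoting lesson can be checked locally before uploading. This is a rough approximation of the behaviour described above (unquoted numeric answers lead to regression, quoted answers to classification), not Google's actual logic; the station names are hypothetical. It relies on the fact that Python's `csv.reader` with `QUOTE_NONNUMERIC` turns unquoted fields into floats and leaves quoted fields as strings.

```python
import csv
import io


def guess_model_type(csv_text: str) -> str:
    """Guess the model type from how the first (answer) column is quoted."""
    # QUOTE_NONNUMERIC: unquoted fields become float (ValueError if not numeric),
    # quoted fields stay str.
    reader = csv.reader(io.StringIO(csv_text), quoting=csv.QUOTE_NONNUMERIC)
    answers = [row[0] for row in reader if row]
    if answers and all(isinstance(a, float) for a in answers):
        return "regression"
    return "classification"


if __name__ == "__main__":
    print(guess_model_type('12,"Lake St"\n7,"Uptown"\n'))      # regression
    print(guess_model_type('"12","Lake St"\n"7","Uptown"\n'))  # classification
```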
Nice Ride Location Rides
Show Scripts, API & Results
Can we predict the movement of NYC cabs?
NYC Cab Ride Data
Data Dictionary
Data Website
Sample Data
Contains pickup & drop-off latitude and longitude
There’s A Problem
• Asking for 2 inputs and 2 outputs!
• Not possible with the Prediction API, as it only supports one dependent variable. :(
• Change of plan…
Let’s predict the cost of an NYC cab ride instead!
Prediction Demo
• Features are distances (B)
• Examples are prices (A)
• Is this accurate?
• Fares differ based on the area of the city
OK, not really… Let's use location-based data instead
Prediction Demo
• Latitude / longitude are the features (B, C, D, E)
• Price is the example (A)
• Examples
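Building one training row per trip in that column layout (A = price, B to E = coordinates) might look like this. The column names assume the older NYC TLC yellow-cab schema and may differ in your file; the sample trip is made up.

```python
import csv
import io

# Assumed column names from the older NYC TLC yellow-cab data; adjust as needed.
FEATURE_COLUMNS = ["pickup_latitude", "pickup_longitude",
                   "dropoff_latitude", "dropoff_longitude"]


def cab_training_row(trip: dict) -> list:
    """Column A = fare (example value); columns B-E = pickup/drop-off coordinates."""
    return [float(trip["fare_amount"])] + [float(trip[c]) for c in FEATURE_COLUMNS]


if __name__ == "__main__":
    sample = (
        "fare_amount,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude\n"
        "12.5,40.7614,-73.9776,40.7484,-73.9857\n"  # made-up trip
    )
    for trip in csv.DictReader(io.StringIO(sample)):
        print(cab_training_row(trip))
        # [12.5, 40.7614, -73.9776, 40.7484, -73.9857]
```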
NYC Cab Ride Location
Show Scripts, API & Results
Sentiment Analysis of a Speech
Speech Sentiment
• Always check your data!
• The website incorrectly claimed positive (4), negative (0), and neutral (2) sentiment values.
• The data actually had groups of sentiment values.
• Source
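"Always check your data" can be as simple as counting the label values before training and flagging any that the documentation doesn't mention. The documented labels below are the ones the website claimed; the sample values are made up.

```python
from collections import Counter

# Labels the data source claimed to use.
DOCUMENTED_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def label_report(values):
    """Count each label and list the ones the documentation doesn't cover."""
    counts = Counter(values)
    unexpected = sorted(v for v in counts if v not in DOCUMENTED_LABELS)
    return counts, unexpected


if __name__ == "__main__":
    counts, unexpected = label_report([0, 2, 4, 4, 1, 3])  # made-up values
    print(dict(counts))
    print("undocumented labels:", unexpected)  # undocumented labels: [1, 3]
```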
Speech Sentiment
Feature
Example Value
Training Examples
Sentiment Training
Sentiment Example
Show Scripts, API & Results
Obama State of the Union Speech - 1/16
Donald Trump Speech Des Moines, IA - 1/24
Smart Spreadsheets
Install Smart Autofill Add-on
Smart Spreadsheets
Prediction API used to fill in missing values
Smart Spreadsheets
Select columns to use for data training
Smart Spreadsheets
“Example Values” are populated
Final Thoughts - Overfitting
• Overfitting generally takes the form of making an overly complex model to explain idiosyncrasies in the data under study.
• As a result, an overfit model will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data.
• Querying with an exact training row should not return the EXACT example value
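One rough way to act on the last point is to replay the training rows as queries and measure how often the model returns the stored example value exactly. In this sketch `predict` is a stand-in for whatever calls your model, the two toy models show the extremes, and the data is made up; a rate near 1.0 is a hint of memorization, not proof of overfitting.

```python
def exact_recall_rate(rows, predict):
    """Fraction of training rows whose features make `predict` return
    exactly the stored example value (the first column)."""
    if not rows:
        raise ValueError("no rows")
    hits = sum(1 for row in rows if predict(row[1:]) == row[0])
    return hits / len(rows)


if __name__ == "__main__":
    rows = [[62, "Seattle", 288, "sunny"], [70, "Portland", 290, "cloudy"]]  # made up
    # A "model" that simply memorizes the training set.
    table = {tuple(r[1:]): r[0] for r in rows}
    print(exact_recall_rate(rows, lambda f: table[tuple(f)]))  # 1.0
    # A model that always predicts the mean never reproduces an example exactly.
    print(exact_recall_rate(rows, lambda f: 66.0))  # 0.0
```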
Thank You
Justin Grammens
justin@recursiveawesome.com
http://recursiveawesome.com
Check out my IoT Weekly Newsletter
http://iotweeklynews.com
