Using Machine Learning on AWS for Continuous Sentiment Analysis from Labeling to a Real-time Data Pipeline by Zignal Labs
A Sentiment Pipeline with AWS and Amazon SageMaker
Jeff Fenchel
Starbucks - Racial Profiling
The store shutdown for racial-bias training was estimated to cost an additional $16.7 million in lost revenue.
Agenda
1. Why are we building yet another sentiment API?
2. How we leverage Amazon Mechanical Turk to collect labeled data
3. Utilizing Amazon SageMaker to regularly retrain and update models in a resilient fashion
www.linkedin.com/in/jeffreyfenchel
Why Sentiment?
Option A Option B
Customer Feedback
“The sentiment is too neutral.”
“I have removed sentiment from all my reports.”
“I spend hours doing manual sentiment overrides.”
“Why was tweet X labeled neutral/positive/negative?”
What Happened
Zignal Labs
Rule Based Sentiment
● Positive if it:
○ mentions and no dissatisfaction is expressed
○ portrays the company as being sustainable
○ introduces a new executive
● Negative if it:
○ equates the company to something negative, e.g., world hunger
● Neutral if it:
○ focuses on a new facility being opened
Reputation Polarity
“Polarity for reputation: Does the information (facts, opinions) in the text have positive, negative, or neutral implications for the image of the company? This problem is related to sentiment analysis and opinion mining, but has substantial differences with the mainstream research in that areas: polar facts are ubiquitous (for instance, “Lehmann Brothers goes bankrupt” is a fact with negative implications for reputation), perspective plays a key role. The same information may have negative implications from the point of view of clients and positive from the point of view of investors, negative sentiments may have positive polarity for reputation (for example, “R.I.P. Michael Jackson. We’ll miss you” has a negative associated sentiment - sadness -, but a positive implication for the reputation of Michael Jackson.)”
-- RepLab 2012
Data Skew and Model Variance
Transparency and Trust
Confusion matrix: rows are actual sentiment, columns are predicted sentiment
Actual \ Predicted   Negative   Neutral   Positive
Negative                  102        18         13
Neutral                    16        39         16
Positive                    6         2         33
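The matrix above can be reduced to a few headline numbers. This is a minimal sketch: the counts are copied from the slide, and treating a negative/positive swap as a "cross-polarity" error is an assumption about how that term is used later in the deck.

```python
"""Summary metrics for the 3x3 sentiment confusion matrix on the slide."""

LABELS = ["negative", "neutral", "positive"]

# Rows are actual sentiment, columns are predicted sentiment (counts from the slide).
MATRIX = [
    [102, 18, 13],  # actual negative
    [16, 39, 16],   # actual neutral
    [6, 2, 33],     # actual positive
]


def accuracy(m):
    """Fraction of items on the diagonal (predicted == actual)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def per_class(m, labels):
    """Precision (column-wise) and recall (row-wise) for each class."""
    n = len(m)
    scores = {}
    for i, label in enumerate(labels):
        tp = m[i][i]
        predicted = sum(m[r][i] for r in range(n))  # everything predicted as this class
        actual = sum(m[i])                          # everything that truly is this class
        scores[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return scores


def cross_polarity_error_rate(m, labels):
    """Assumed definition: share of items where negative and positive were swapped."""
    neg, pos = labels.index("negative"), labels.index("positive")
    total = sum(sum(row) for row in m)
    return (m[neg][pos] + m[pos][neg]) / total


if __name__ == "__main__":
    print(f"accuracy: {accuracy(MATRIX):.3f}")
    print(f"cross-polarity error rate: {cross_polarity_error_rate(MATRIX, LABELS):.3f}")
    for label, s in per_class(MATRIX, LABELS).items():
        print(f"{label:>8}: precision={s['precision']:.2f} recall={s['recall']:.2f}")
```

On these counts, 174 of 245 items are correct (about 71%), 19 are cross-polarity swaps, and 52 of the 71 errors involve the neutral class.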
The Plan
Mechanical Turk for Human Intelligence Tasks (HITs)
On Demand Workforce
Where do we start?
$ click_
https://github.com/pallets/click
Quality Control
● Fleiss’ Kappa Agreement
● Worker quality and bias assessment
with expectation maximization
● Qualification test and training
○ 21% pass rate
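Fleiss’ kappa measures how much a fixed number of raters agree beyond what chance would produce: kappa = (P̄ - P_e) / (1 - P_e), where P̄ is the mean per-item pairwise agreement and P_e is the agreement expected from the overall label proportions. A minimal stdlib sketch, assuming every HIT is labeled by the same number of workers and that the input is a per-item vote count for each class (the negative, neutral, positive column order is an assumption):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of per-item category counts.

    counts[i][j] is how many raters put item i in category j; every row must
    sum to the same number of raters n >= 2.
    """
    if not counts:
        raise ValueError("no items")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]

    p_bar = sum(p_i) / n_items     # observed agreement
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:                 # every vote landed in one category
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical batch: 3 workers per tweet, columns = [negative, neutral, positive].
batch = [
    [3, 0, 0],
    [0, 2, 1],
    [1, 1, 1],
    [0, 0, 3],
    [2, 1, 0],
]
print(f"kappa = {fleiss_kappa(batch):.3f}")
```

A kappa near 1 means workers agree far more than chance; values near 0 on a batch are a signal to inspect the questions or the worker pool.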
Continuous Labeling
Complete records are critical, including:
● Raw assignment answers from MTurk + HIT info
● Computed worker evaluations (quality + bias + support)
● Best-fit answers
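The classic expectation-maximization method for estimating worker quality and bias together with best-fit answers is Dawid and Skene's model, which learns a confusion matrix per worker. The sketch below is a compact stdlib version of that model, not necessarily the exact variant used in this pipeline; the (item, worker, label) input format, the smoothing constant and the "quality" summary (prior-weighted diagonal of each worker's confusion matrix) are assumptions.

```python
import math
from collections import defaultdict


def dawid_skene(labels, n_classes, n_iter=20, alpha=0.01):
    """EM estimate of true labels and per-worker confusion matrices.

    labels: iterable of (item_id, worker_id, class_index) tuples.
    alpha: additive smoothing so no confusion-matrix cell is exactly zero.
    """
    labels = list(labels)
    by_item = defaultdict(list)
    for item, worker, c in labels:
        by_item[item].append((worker, c))
    items = sorted(by_item)
    workers = sorted({w for _, w, _ in labels})

    # Initialise each item's class posterior with its vote proportions.
    post = {}
    for i in items:
        votes = [0.0] * n_classes
        for _, c in by_item[i]:
            votes[c] += 1.0
        post[i] = [v / len(by_item[i]) for v in votes]

    for _ in range(max(1, n_iter)):
        # M-step: class priors and each worker's (true class -> given label) matrix.
        prior = [sum(post[i][k] for i in items) / len(items) for k in range(n_classes)]
        conf = {w: [[alpha] * n_classes for _ in range(n_classes)] for w in workers}
        for i in items:
            for w, c in by_item[i]:
                for k in range(n_classes):
                    conf[w][k][c] += post[i][k]
        for w in workers:
            conf[w] = [[x / sum(row) for x in row] for row in conf[w]]

        # E-step: recompute item posteriors in log space for numerical stability.
        for i in items:
            logp = [math.log(prior[k] + 1e-12) for k in range(n_classes)]
            for w, c in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(conf[w][k][c])
            top = max(logp)
            weights = [math.exp(x - top) for x in logp]
            z = sum(weights)
            post[i] = [x / z for x in weights]

    best = {i: max(range(n_classes), key=lambda k: post[i][k]) for i in items}
    quality = {w: sum(prior[k] * conf[w][k][k] for k in range(n_classes)) for w in workers}
    support = {w: sum(1 for _, ww, _ in labels if ww == w) for w in workers}
    return {"best": best, "posterior": post, "confusion": conf,
            "quality": quality, "support": support}


# Hypothetical batch: classes 0=negative, 1=neutral, 2=positive.
# Workers A-C answer correctly; worker D always answers "neutral".
truth = [0, 1, 2, 0, 1, 2]
votes = [(i, w, t) for i, t in enumerate(truth) for w in "ABC"]
votes += [(i, "D", 1) for i in range(len(truth))]
result = dawid_skene(votes, n_classes=3)
print(result["best"], result["quality"])
```

Storing the raw votes alongside the posteriors and confusion matrices means the estimates can be recomputed whenever the model or the worker pool changes.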
Quality in Test != Quality at Scale
● Label accuracy dropped from 92% in testing to 73% at scale
● We get repeat workers!
[Figures: “Workers by Repeat Work” cumulative histogram (x: number of batches with contribution; y: ratio of workers) and “Worker Score vs. Cross-Polarity Error” (x: worker score in [0, 1]; y: cross-polarity error rate)]
Investment in Workers
● Transparent feedback
● Automatic exclusions
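A minimal sketch of an automatic exclusion rule, assuming each worker already has an EM quality score, a support count and a cross-polarity rate. The `WorkerStats` type, the thresholds and the minimum-support guard are hypothetical. Enforcing the result on MTurk (for example with boto3's `create_worker_block`, or by revoking a qualification) is not shown. The returned reason strings double as transparent feedback to the worker.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    """Hypothetical per-worker record produced by the labeling evaluation step."""
    worker_id: str
    quality: float         # e.g. EM-estimated accuracy in [0, 1]
    support: int           # number of answers the estimate is based on
    cross_polarity: float  # share of answers that swapped positive and negative


def workers_to_exclude(stats, min_quality=0.6, max_cross_polarity=0.1, min_support=20):
    """Return (worker_id, reason) pairs for workers who should stop receiving HITs.

    Workers with too little support are never excluded, so one unlucky batch
    does not remove a new worker.
    """
    excluded = []
    for s in stats:
        if s.support < min_support:
            continue
        if s.quality < min_quality:
            excluded.append((s.worker_id, f"quality {s.quality:.2f} < {min_quality}"))
        elif s.cross_polarity > max_cross_polarity:
            excluded.append(
                (s.worker_id, f"cross-polarity {s.cross_polarity:.2f} > {max_cross_polarity}")
            )
    return excluded


stats = [
    WorkerStats("W1", quality=0.91, support=150, cross_polarity=0.02),
    WorkerStats("W2", quality=0.48, support=80, cross_polarity=0.05),
    WorkerStats("W3", quality=0.85, support=60, cross_polarity=0.18),
    WorkerStats("W4", quality=0.30, support=5, cross_polarity=0.40),
]
for worker_id, reason in workers_to_exclude(stats):
    print(worker_id, reason)
```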
Iteration is Key
Deployable Project Pattern
sultan/
├── deploy/
│ ├── templates/
│ │ ├── __init__.py
│ │ ├── pipeline.py
│ │ └── sultan.py
│ ├── __init__.py
│ └── requirements.txt
├── functions/
│ ├── create_sentiment_questions/
│ ├── provide_feedback/
│ └── manage_sentiment_results/
├── sultan/
├── setup.py
└── requirements.py
Troposphere
https://github.com/cloudtools/troposphere
ZignalLib
- CloudFormation in Python
- Common Deployment Patterns
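Troposphere builds CloudFormation templates from Python objects and serializes them to JSON. To keep the example dependency-free, the sketch below emits the same kind of JSON with plain dictionaries, deriving one Lambda resource per directory under `functions/`. The runtime, handler name, parameter names and S3 key layout are hypothetical placeholders, not the project's actual templates. With Troposphere the equivalent would be `awslambda.Function` objects added to a `Template`.

```python
import json


def logical_id(dir_name, suffix="Function"):
    """Convert a snake_case directory name into a CloudFormation logical ID."""
    return "".join(part.capitalize() for part in dir_name.split("_")) + suffix


def build_template(function_dirs, runtime="python3.12", handler="handler.main"):
    """Return a CloudFormation template dict with one Lambda function per directory."""
    resources = {}
    for name in function_dirs:
        resources[logical_id(name)] = {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Handler": handler,
                "Runtime": runtime,
                "Role": {"Ref": "PipelineRoleArn"},
                "Code": {
                    "S3Bucket": {"Ref": "ArtifactBucket"},
                    "S3Key": f"functions/{name}.zip",
                },
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # Hypothetical parameters: an existing execution role and an artifact bucket.
        "Parameters": {
            "PipelineRoleArn": {"Type": "String"},
            "ArtifactBucket": {"Type": "String"},
        },
        "Resources": resources,
    }


if __name__ == "__main__":
    dirs = ["create_sentiment_questions", "provide_feedback", "manage_sentiment_results"]
    print(json.dumps(build_template(dirs), indent=2))
```

Deriving resources from the directory layout means adding a new function directory is enough to get it deployed, which is the kind of repeatable pattern a shared deployment library can encode.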
The Pipeline Code
The Twitter Model
● Current models use Keras + NLTK (for longform)
● BYOE approach
The Longform Model
● Two phase training: sentence level and document level
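The slide only states that the longform model is trained in two phases, first at sentence level and then at document level. One plausible way to connect them, shown purely as an assumption, is to run the sentence model over every sentence and pool its class probabilities into a fixed-length feature vector for the document model. The mean/max/min pooling and the feature layout are hypothetical.

```python
def document_features(sentence_probs):
    """Pool per-sentence [neg, neu, pos] probabilities into one document vector.

    Returns the mean, max and min of each class probability (9 values) plus the
    sentence count, so the document model sees both the overall tone and the
    strongest single positive or negative statement.
    """
    if not sentence_probs:
        raise ValueError("document has no sentences")
    n_classes = len(sentence_probs[0])
    columns = [[p[k] for p in sentence_probs] for k in range(n_classes)]
    mean = [sum(col) / len(col) for col in columns]
    high = [max(col) for col in columns]
    low = [min(col) for col in columns]
    return mean + high + low + [float(len(sentence_probs))]


# Hypothetical phase-1 output for a three-sentence article.
probs = [
    [0.10, 0.80, 0.10],  # neutral background sentence
    [0.70, 0.20, 0.10],  # sharply negative sentence
    [0.05, 0.90, 0.05],  # neutral closing sentence
]
features = document_features(probs)
print(len(features))  # 10 features for the phase-2 document classifier
```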
Amazon SageMaker
Model Lifecycle
Training jobs:
● Longform
● Twitter
Endpoints:
● Twitter
○ Variant 1 - <Date Trained> - 95% traffic
○ Variant 2 - <Date Trained> - 5% traffic
● Longform
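In SageMaker, an endpoint's traffic split comes from the `InitialVariantWeight` of each production variant in the endpoint configuration, and requests are routed in proportion to those weights. The helper below only builds the `ProductionVariants` list for a 95/5 split and makes no AWS calls. The list would be passed to boto3's `create_endpoint_config`. The variant naming scheme, model names, instance type and instance count are hypothetical.

```python
from datetime import date


def production_variants(current_model, current_trained, candidate_model, candidate_trained,
                        candidate_share=0.05, instance_type="ml.m5.large"):
    """Build a SageMaker ProductionVariants list that splits traffic by weight.

    Variant names embed the training date, mirroring "<Date Trained>" on the slide.
    Weights are relative, so 95 and 5 route roughly 95% and 5% of requests.
    """
    if not 0 < candidate_share < 1:
        raise ValueError("candidate_share must be between 0 and 1")
    if current_trained == candidate_trained:
        raise ValueError("variant names would collide")
    current_weight = round((1 - candidate_share) * 100)
    candidate_weight = 100 - current_weight

    def variant(model_name, trained, weight):
        return {
            "VariantName": f"twitter-{trained.isoformat()}",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": float(weight),
        }

    return [
        variant(current_model, current_trained, current_weight),
        variant(candidate_model, candidate_trained, candidate_weight),
    ]


variants = production_variants("twitter-model-a", date(2018, 6, 1),
                               "twitter-model-b", date(2018, 6, 2))
for v in variants:
    print(v["VariantName"], v["InitialVariantWeight"])
```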
Model Training
model/
├── config.json
├── model.hdf5
├── output_encoder.pickle
├── stats.json
└── tokenizer.pickle
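SageMaker training containers write their artifacts to `/opt/ml/model`. SageMaker uploads that directory to S3 as `model.tar.gz` and unpacks it back into `/opt/ml/model` when a serving container starts. The sketch below checks that the five files in the tree above are present and then packages them the same way, which can help when testing a container locally. Treating all five files as required is an assumption.

```python
import tarfile
import tempfile
from pathlib import Path

# The model/ tree from the slide; treating every file as required is an assumption.
REQUIRED = ["config.json", "model.hdf5", "output_encoder.pickle", "stats.json", "tokenizer.pickle"]


def package_model(model_dir, out_path):
    """Check that all artifacts exist, then write them to a gzipped tarball."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing model artifacts: {', '.join(missing)}")
    with tarfile.open(str(out_path), "w:gz") as tar:
        for name in REQUIRED:
            # arcname keeps each file at the archive root, so it unpacks straight into /opt/ml/model.
            tar.add(str(model_dir / name), arcname=name)
    return Path(out_path)


if __name__ == "__main__":
    # Local smoke test with placeholder files standing in for real artifacts.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp, "model")
        model_dir.mkdir()
        for name in REQUIRED:
            (model_dir / name).write_bytes(b"placeholder")
        archive = package_model(model_dir, Path(tmp, "model.tar.gz"))
        with tarfile.open(str(archive)) as tar:
            print(sorted(tar.getnames()))
```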
Model Deployment
● New models are introduced as a variant receiving 5% of traffic
1) Deploy
2) Promote
● A new variant is promoted if fewer than 10% of its replies are 5xx errors
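The promotion rule fits in a small pure function. SageMaker publishes per-variant `Invocations` and `Invocation5XXErrors` metrics to CloudWatch (namespace `AWS/SageMaker`, dimensions `EndpointName` and `VariantName`). A scheduled job could sum those over an evaluation window and, on success, shift all traffic to the new variant with boto3's `update_endpoint_weights_and_capacities`. The sketch covers only the decision and the new weights. The minimum-traffic guard and the function names are assumptions, and no AWS calls are made.

```python
def should_promote(invocations, errors_5xx, max_error_rate=0.10, min_invocations=100):
    """Promote the candidate only if it served enough traffic with < 10% 5xx replies."""
    if invocations < min_invocations:
        return False  # not enough evidence yet; keep the 5% split
    return errors_5xx / invocations < max_error_rate


def promoted_weights(current_variant, candidate_variant):
    """DesiredWeightsAndCapacities entries that send all traffic to the candidate."""
    return [
        {"VariantName": candidate_variant, "DesiredWeight": 1.0},
        {"VariantName": current_variant, "DesiredWeight": 0.0},
    ]


# Hypothetical numbers collected over one evaluation window.
if should_promote(invocations=4200, errors_5xx=35):
    print(promoted_weights("twitter-2018-06-01", "twitter-2018-06-02"))
```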
Model Deployment (Review)
● New models are introduced as a variant receiving 5% of traffic
1) Deploy
2) Promote
● A new variant is promoted if fewer than 10% of its replies are 5xx errors
Model Promotion
Serving Model Architecture
● Amazon SageMaker provided framework
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
● API consumer side batching
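Consumer-side batching means the caller groups records before invoking the endpoint, so each round trip scores many texts. A minimal sketch, assuming a JSON list-in, list-out payload (the `instances`/`predictions` keys are hypothetical) and a caller-supplied `invoke` function. In production `invoke` could wrap boto3's `sagemaker-runtime` `invoke_endpoint`; the keyword-based fake used here is purely illustrative.

```python
import json
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def score_all(texts, invoke, batch_size=50):
    """Send texts to the model in batches and return one label per text, in order."""
    results = []
    for chunk in batches(texts, batch_size):
        body = json.dumps({"instances": chunk})
        labels = json.loads(invoke(body))["predictions"]
        if len(labels) != len(chunk):
            raise RuntimeError("endpoint returned a different number of predictions")
        results.extend(labels)
    return results


def fake_invoke(body):
    """Hypothetical stand-in for the endpoint: labels texts with a keyword rule."""
    texts = json.loads(body)["instances"]
    return json.dumps({"predictions": ["negative" if "boycott" in t else "neutral" for t in texts]})


print(score_all(["boycott this brand", "new store opens downtown"], fake_invoke))
```

Batching on the consumer side keeps the serving container simple, while amortizing per-request overhead across many tweets.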
Summary
● Continuously gather labeled data from Mechanical Turk
● Leverage Amazon SageMaker to retrain daily and provide an endpoint for our real-time data pipeline
○ Serverless
○ Provides architecture patterns
● Received positive feedback in trials with numerous customers, especially around sentiment directionality
● Future Work
○ Explore hyperparameter tuning
○ Improve the inclusion of relevance in sentiment analysis
Zignal Labs
https://zignallabs.com/
www.linkedin.com/in/jeffreyfenchel
jfenchel@zignallabs.com

