© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Peter W. Hallinan, Ph.D.
A9.com
November 30, 2016
MAC201
Getting to Ground Truth with
Amazon Mechanical Turk
What to expect from the session
• How to use Mechanical Turk to build ML datasets
• Best practices at smaller and larger scales
• Lessons learned building datasets for an AWS service
What not to expect
• Detailed tutorial on how to use Mechanical Turk
Machine learning requires large scale data production pipelines
• Readily available: algorithms and compute clusters
• Not readily available: large scale, high quality datasets
Amazon Mechanical Turk can help you build your dataset
• Training data is the key differentiator
• Success depends on the quality, scale, and throughput of your dataset production pipelines
www.mturk.com
What is Amazon Mechanical Turk (MTurk)?
• A marketplace for getting simple tasks done in parallel by humans
• Launched in 2005, one of the first AWS services
• Basic unit of work is a HIT (Human Intelligence Task), a single, self-contained task
• Example: “How many wolves are in this photo?”
• Requesters use the website or APIs to publish HITs to workers and consume results
• One or more workers per HIT
• Rapid response times
• Simple workflow: HTML template, CSV in, CSV out
What datasets can you build with MTurk?
Almost any domain
• Vision, NLP, psych, etc.
All key task types
• Open-ended questions
• Structured questions
• Binary verifications
ImageNet
• Prof. Fei-Fei Li, Stanford AI Lab
• 21841 WordNet categories
• 14.1 MM total images
• 1 MM localized examples
• ImageNet Challenge 2010-2016
Search Google Scholar for “Mechanical Turk” + “machine learning”
Result: 6000+ citations
L. Fei-Fei, ImageNet: crowdsourcing, benchmarking and other
cool things, CMU VASC Seminar, March, 2010.
What kind of quality can you expect?
1. How much do your categories intrinsically overlap?
2. How representative is your “golden set”?
3. How well can workers solve your specific HIT?
[Figure: Feature A vs. Feature B plot; wolves vs. dogs?]
[Figure: Feature A vs. Feature B plot showing true wolves, golden wolves, and worker wolves, with callouts 2 and 3]
Rapid prototyping:
Smaller scale datasets
Dataset construction is highly iterative, so MTurk supports rapid iterations
[Figure: iteration cycle with the steps Source data; Define HIT & golden set; Evaluate HIT results; Augment data; Train & test ML algorithm; Define objectives; MTurk]
Example: Build a wristwatch classifier
• ML objective: Label wristwatch “shape”
• Dataset objective: ~2000 training examples
• Data source: Amazon Catalog
Experiment A: 1 shape feature, 3 categories
Rectangular
Circular
Other
Experiment A: HIT design
Experiment A: Accuracy is 97.29%
Rows: MTurk labels; columns: golden set labels.

MTurk \ Golden   Circle   Rect   Other   Total   Precision   Recall
Circle              504      3      10     517         97%      99%
Rect                  0     64       2      66         97%      94%
Other                 3      1     113     117         97%      90%
Total               507     68     125     700
Prevalence          72%    10%     18%

Correct (diagonal): 681 of 700; accuracy 97.29%
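The figures on this slide can be re-derived from the confusion matrix. The sketch below assumes rows are MTurk labels and columns are golden set labels, which is the orientation implied by the row and column totals; the function name `summarize` is illustrative, not from the talk.

```python
# Experiment A confusion matrix.
# Assumption: rows = MTurk label, columns = golden set label.
labels = ["Circle", "Rect", "Other"]
confusion = [
    [504, 3, 10],   # MTurk said Circle
    [0, 64, 2],     # MTurk said Rect
    [3, 1, 113],    # MTurk said Other
]


def summarize(matrix):
    """Return overall accuracy and a (precision, recall) pair per class."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    per_class = []
    for i in range(size):
        row_total = sum(matrix[i])                # everything MTurk gave label i
        col_total = sum(row[i] for row in matrix)  # everything truly in class i
        precision = matrix[i][i] / row_total if row_total else None
        recall = matrix[i][i] / col_total if col_total else None
        per_class.append((precision, recall))
    return correct / total, per_class


accuracy, per_class = summarize(confusion)
print(f"accuracy: {accuracy:.2%}")
for name, (precision, recall) in zip(labels, per_class):
    print(f"{name}: precision {precision:.0%}, recall {recall:.0%}")
```

Classes with an empty row or column report `None` rather than a percentage, which matches the "-" entries used in the larger Experiment B table.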
Can we do better?
Experiment B: 2 shape features, 16 categories
Dial face shape: Circle/oval, Tonneau, Rectangle, Other
Casing shape: Circle/oval, Tonneau, Rectangle, Other
Experiment B: HIT design
Experiment B: Accuracy drops 4.72 points to 92.57%
Golden set
Case/
Dial C/C T/C R/C O/C C/T T/T R/T O/T C/R T/R R/R O/R C/O T/O R/O O/O 700 Accuracy Precision Recall
MTurk
C/C 453 3 3 7 8 474 92.57% 96% 97%
T/C 11 19 1 2 1 34 56% 86%
R/C 1 1 2 50% 20%
O/C 5 4 9 44% 31%
C/T 0 0 - -
T/T 3 1 4 75% 100%
R/T 0 1 1 0% -
O/T 0 0 - -
C/R 0 0 - -
T/R 0 0 - 0%
R/R 53 2 2 57 93% 93%
O/R 2 2 100% 40%
C/O 0 1 1 0% -
T/O 0 0 - -
R/O 0 0 - -
O/O 1 2 113 116 97% 90%
700 469 22 5 13 0 3 0 0 0 1 57 5 0 0 0 125 648
Prevalence 67% 3% 1% 2% 0% 0% 0% 0% 0% 0% 8% 1% 0% 0% 0% 18%
A second “fuzzy” feature creates more opportunity for disagreement
C = Circle/Oval; T = Tonneau; R = Rectangular; O = Other
Experiment B: Filtering workers adds 0.14 points
Golden set
Case/
Dial C/C T/C R/C O/C C/T T/T R/T O/T C/R T/R R/R O/R C/O T/O R/O O/O 700 Accuracy Precision Recall
MTurk
C/C 455 3 3 7 8 476 92.71% 96% 97%
T/C 10 19 1 2 1 33 58% 86%
R/C 1 2 3 33% 20%
O/C 4 4 8 50% 31%
C/T 0 0 - -
T/T 3 1 4 75% 100%
R/T 0 1 1 0% -
O/T 0 0 - -
C/R 0 0 - -
T/R 0 0 - 0%
R/R 52 2 2 56 93% 91%
O/R 2 2 100% 40%
C/O 0 1 1 0% -
T/O 0 0 - -
R/O 0 0 - -
O/O 1 2 113 116 97% 90%
700 469 22 5 13 0 3 0 0 0 1 57 5 0 0 0 125 649
Prevalence 67% 3% 1% 2% 0% 0% 0% 0% 0% 0% 8% 1% 0% 0% 0% 18%
Only 5/124 workers are in the minority for a majority of their votes
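One way to read "in the minority for a majority of their votes" is sketched below: find the majority label per HIT, flag workers who disagree with it on more than half of their votes, then re-aggregate without them. The vote data, worker IDs, and function names are hypothetical; the talk does not show how the filter was implemented.

```python
from collections import Counter, defaultdict

# Hypothetical votes: (hit_id, worker_id, label).
votes = [
    ("h1", "w1", "C/C"), ("h1", "w2", "C/C"), ("h1", "w3", "T/C"),
    ("h2", "w1", "R/R"), ("h2", "w2", "R/R"), ("h2", "w3", "O/R"),
    ("h3", "w1", "O/O"), ("h3", "w2", "C/C"), ("h3", "w3", "O/O"),
]


def majority_labels(votes):
    """Map each HIT to its strict-majority label; HITs without one are omitted."""
    by_hit = defaultdict(Counter)
    for hit, _worker, label in votes:
        by_hit[hit][label] += 1
    result = {}
    for hit, counts in by_hit.items():
        label, n = counts.most_common(1)[0]
        if n * 2 > sum(counts.values()):
            result[hit] = label
    return result


def minority_workers(votes):
    """Workers who disagree with the majority on more than half of their votes."""
    majority = majority_labels(votes)
    stats = defaultdict(lambda: [0, 0])  # worker -> [minority votes, counted votes]
    for hit, worker, label in votes:
        if hit not in majority:
            continue  # no majority on this HIT, so nothing to compare against
        stats[worker][1] += 1
        if label != majority[hit]:
            stats[worker][0] += 1
    return {w for w, (minority, total) in stats.items() if minority * 2 > total}


def filtered_majority(votes):
    """Majority labels recomputed after dropping the flagged workers."""
    flagged = minority_workers(votes)
    return majority_labels([v for v in votes if v[1] not in flagged])


print(minority_workers(votes))
print(filtered_majority(votes))
```

Note that dropping a worker can turn a 2-to-1 HIT into a 1-to-1 split, so filtering may remove some HITs from the aggregated set even as it improves agreement on the rest.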
Many possible experiments; we’ve reported three
Quality lever               Lever settings / experiments
ML objectives               1 feature / 3 categories; 2 features / 16 categories
Data sources and segments   Held constant
Golden set                  # examples: 100, 700; # annotators: 1, 3
HIT design / instructions   Picture only; text only; picture and text
Worker selection            Prequalified
Workers per HIT             3, 5
HIT aggregation rules       Majority vote vs. majority of filtered workers
Worker feedback             None
Quality control levers
ML objectives
Data sources and segments
Golden sets
HIT design / instructions
Worker selection
Workers per HIT
HIT aggregation rules
Worker feedback
Myth: Dataset quality is an intrinsic property of the MTurk marketplace
Throughput control levers
• HIT price
• HIT publication rate
MTurk provides you with control levers to optimize dataset quality and throughput
Best practices: HIT design
• Simplicity of question vs. clarity of answers
• Prefer questions with limited option sets to open-ended questions
• Prefer mutually exclusive, collectively exhaustive option sets
• Prefer smaller option sets to larger ones
• Ease of learning vs. time to complete HIT
• Prefer more questions and simpler instructions to fewer questions and more
complex instructions
• Prefer that each possible answer set costs the same time to provide
• Workers optimize their behaviors to your design; don’t “tweak” too much
Best practices: Selecting workers
Using MTurk criteria
• Geography
• Worker approval rating
• Total # HITs approved
• Masters status
• Mobile device user
• Political affiliation
• High school graduate
• Bachelor’s degree
• Marital status
• Parenthood status
• Voted in 2012 presidential election
• Smoker
• Car owner
• Handedness
Using your own criteria
• Past performance on your HITs
• Custom tests of domain specific knowledge
• Custom tests of decision-making ability
Best practices: Assessing results
Aggregating results
• When using multiple workers per HIT, aggregate results with a voting scheme
• Align voting with prevalence
• Moderate prevalence => Majority voting
• Low prevalence => Any yes vote
• Either drop split decisions, or force them into a category that can be split later
Worker feedback
• Approve and reject HITs carefully
• Automatic rejections require ironclad reasons
• Adjust selection criteria
• Monitor emails and forums
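The two voting rules listed under "Aggregating results" can be sketched for a yes/no HIT as follows. The function name, the `rule` parameter, and the choice to return `None` for a split decision are assumptions made for illustration.

```python
def aggregate(votes, rule="majority"):
    """Combine yes/no votes (booleans) from several workers on one HIT.

    rule="majority": for categories of moderate prevalence; returns None on a
                     split decision so the caller can drop or re-route it.
    rule="any_yes":  for low-prevalence categories; one yes vote is enough.
    """
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if rule == "any_yes":
        return yes > 0
    if rule == "majority":
        if yes == no:
            return None  # split decision
        return yes > no
    raise ValueError(f"unknown rule: {rule}")


print(aggregate([True, False, False]))             # majority of three workers
print(aggregate([True, False, False], "any_yes"))  # rare category: keep any hit
print(aggregate([True, False]))                    # split decision
```

Using an odd number of workers per HIT avoids split decisions under majority voting for binary questions; with more than two answer options, splits can still occur.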
Best practices: Boosting quality
• Separate your categories
• Scrutinize false positives and negatives
• Simplify and clarify instructions
• Optimize worker quals
• Experiment, experiment, experiment!
Dataset accuracy puts an upper bound on system performance
Scaling Up:
Larger scale datasets
Challenge: Measuring quality over time
Example
• 1 MM HITs @ 100K / week
• 99% confidence level
• Confidence interval (CI)
varies with sample size
Three strategies
• Scrutinize @ fixed CI
• Scrutinize @ decreasing CI
• Scrutinize and trust
Potential tactic
• Partition workers and
interleave golden sets
[Figure: Credible Region Width vs. # Answers Checked; x-axis: number of answers checked (0 to 6000); y-axis: credible region width (0.0% to 9.0%)]
Myth: Workers always fatigue/satisfice with time
Worker accuracy is
stable and predictable
Check forums for
Worker discussions of
your HITs
Kenji Hata, Ranjay Krishna, Li Fei-Fei, and Michael S. Bernstein. “A Glimpse Far into the Future:
Understanding Long-term Crowd Worker Accuracy.” arXiv preprint arXiv:1609.04855 (2016).
Challenge: Minimizing cost per training example
ML goal: recognize 20K categories; Dataset goal: 1K examples / category.
Top down (20 questions)
• Build classifier with 100 root categories, not 20K leaf categories
• HIT 1 labels candidate root examples (only 10-20K examples required)
• HIT 2 verifies root members
• Split root categories
• Repeat
Bottom up (data mining)
• Mine your source data for clusters
• HIT 1 assigns labels to clusters
• HIT 2 verifies members of clusters
• Delete cluster members from source data
• Repeat
Divide and conquer to maximize validation rates
Challenge: Success
• Eventually, user input will
differ from the data you
trained on
• Assess your actual
recognition rates
• Use errors to guide
expansion of your test and
training datasets
Amazon Rekognition:
Lessons Learned
Ranju Das
Amazon Rekognition
Amazon Rekognition
A deep learning-based image recognition service
Search, verify, and organize millions of images
• Object and scene detection
• Facial analysis
• Face comparison
• Facial recognition
What do people see?
• People see a lot more than what is imaged on the retina
  - Vision involves a process called “unconscious inference” in neuroscience
  - The largely unconscious nature of the inferences is confirmed by the study of optical illusions
• In order for a human observer to recognize an image, two neuronal processes come together:
  - Sensory activation from the eyes (referential system)
  - Information from past experience that is stored in distributed regions across the brain (inferential system)
What do you see in the yellow bounding box (“region proposal”)?
“a hat”?
We “know” from correlation with other image crops and past experience that it’s a baby
People don’t classify “region proposals” in isolation
What do you see in the yellow bounding box (“region proposal”)?
“a baby”!
Examples: Common inferences
• Adding “must be” invisible objects: baby, fish, ring
• Betting whole from parts: baby, family, balloon
• Reading (and trusting) text hints: farm, chocolate, pizza
• Reading (and trusting) stereotypes and symbols: beer, 4th of July, party
• Gambling on the past and future: swimming, wedding, camping
Verification example
Bounding box example
Group image verification example
[Screenshot: HIT asking “Does each image contain a cat?” with Yes / No answers, sample images with descriptions, a Back button, and a progress indicator (6/200)]
How to improve quality and consistency
• Make the HITs “bite”-sized
• Create clear and concise instructions
• Ask multiple people and build consensus
• Include control images to measure performance of workers
• Use qualifications and white/black lists to control workforce
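As a rough illustration of the last two points, the sketch below scores each worker on control images whose answers are already known and splits the workforce by a threshold. The image IDs, worker IDs, response format, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical control images with known answers to "does the image contain a cat?"
control_answers = {"img_07": True, "img_19": False}

# Hypothetical responses: (worker_id, image_id, answer).
responses = [
    ("w1", "img_07", True), ("w1", "img_19", False), ("w1", "img_42", True),
    ("w2", "img_07", False), ("w2", "img_19", False), ("w2", "img_42", True),
]


def score_workers(responses, control_answers):
    """Fraction of control images each worker answered correctly."""
    tally = defaultdict(lambda: [0, 0])  # worker -> [correct, seen]
    for worker, image, answer in responses:
        if image not in control_answers:
            continue  # ordinary image: no known answer to check against
        tally[worker][1] += 1
        if answer == control_answers[image]:
            tally[worker][0] += 1
    return {worker: correct / seen for worker, (correct, seen) in tally.items()}


def split_workforce(scores, threshold=0.8):
    """Return (workers to keep, workers to exclude) based on control accuracy."""
    keep = sorted(w for w, s in scores.items() if s >= threshold)
    exclude = sorted(w for w, s in scores.items() if s < threshold)
    return keep, exclude


scores = score_workers(responses, control_answers)
print(scores)
print(split_workforce(scores))
```

In practice a worker should see enough control images before being excluded, since a score based on two or three images is very noisy.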
Thank you!
Remember to complete
your evaluations!

AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)

If You Are Not Embedding Analytics Into Your Day To Day Processes, You Are Do...Dell World
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learningShishir Choudhary
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceAI Models For Fun and Profit by Walmart Director of Artificial Intelligence
AI Models For Fun and Profit by Walmart Director of Artificial IntelligenceProduct School
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmVaibhav Varshney
 
How to Use Artificial Intelligence by Microsoft Product Manager
 How to Use Artificial Intelligence by Microsoft Product Manager How to Use Artificial Intelligence by Microsoft Product Manager
How to Use Artificial Intelligence by Microsoft Product ManagerProduct School
 

AWS re:Invent 2016: Getting to Ground Truth with Amazon Mechanical Turk (MAC201)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Peter W. Hallinan, Ph.D. A9.com November 30, 2016 MAC201 Getting to Ground Truth with Amazon Mechanical Turk
  • 2. What to expect from the session • How to use Mechanical Turk to build ML datasets • Best practices at smaller and larger scales • Lessons learned building datasets for an AWS service What not to expect • Detailed tutorial on how to use Mechanical Turk
  • 3. Machine learning requires large scale data production pipelines • Readily available: algorithms and compute clusters • Not readily available: large scale, high quality datasets Amazon Mechanical Turk can help you build your dataset • Training data is the key differentiator • Success depends on the quality, scale, and throughput of your dataset production pipelines
  • 5. What is Amazon Mechanical Turk (MTurk)? • A marketplace for getting simple tasks done in parallel by humans • Launched in 2005, one of the first AWS services • Basic unit of work is a HIT, a single, self-contained task • Example: “How many wolves are in this photo?” • Requesters use website or APIs to publish HITs to workers and consume results • One or more workers per HIT • Rapid response times • Simple workflow: HTML template, csv in, csv out
  • 6. What datasets can you build with MTurk? Almost any domain • Vision, NLP, psych, etc. All key task types • Open-ended questions • Structured questions • Binary verifications ImageNet • Prof. Fei-Fei Li, Stanford AI Lab • 21841 WordNet categories • 14.1 MM total images • 1 MM localized examples • ImageNet Challenge 2010-2016 Search Google Scholar for “Mechanical Turk” + “machine learning” Result: 6000+ citations L. Fei-Fei, ImageNet: crowdsourcing, benchmarking and other cool things, CMU VASC Seminar, March, 2010.
  • 7. What kind of quality can you expect? 1. How much do your categories intrinsically overlap? 2. How representative is your “golden set”? 3. How well can workers solve your specific HIT? [Figure: “Wolves” and “Dogs?” regions; axes: Feature A, Feature B; illustrates question 1]
  • 8. What kind of quality can you expect? 1. How much do your categories intrinsically overlap? 2. How representative is your “golden set”? 3. How well can workers solve your specific HIT? [Figure: “True wolves”, “Golden wolves”, and “Worker wolves” regions; axes: Feature A, Feature B; illustrates questions 2 and 3]
  • 10. Dataset construction is highly iterative, so MTurk supports rapid iterations. [Diagram stages: Define objectives; Source data; Define HIT & golden set; MTurk; Evaluate HIT results; Augment data; Train & test ML algorithm]
  • 11. Example: Build a wristwatch classifier • ML objective: Label wristwatch “shape” • Dataset objective: ~2000 training examples • Data source: Amazon Catalog
  • 12. Experiment A: 1 shape feature, 3 categories Rectangular Circular Other
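  • 14. Experiment A: Accuracy is 97.29%. Confusion matrix over 700 examples (rows: MTurk labels; columns: golden set labels):
        MTurk \ Golden | Circle | Rect | Other | Total | Precision | Recall
        Circle         |    504 |    3 |    10 |   517 |       97% |    99%
        Rect           |      0 |   64 |     2 |    66 |       97% |    94%
        Other          |      3 |    1 |   113 |   117 |       97% |    90%
        Golden total   |    507 |   68 |   125 |   700 |           |
        Prevalence     |    72% |  10% |   18% |       |           |
    Correct (diagonal) answers: 681 of 700 = 97.29%. Can we do better?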
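The accuracy, precision, and recall figures on slide 14 can be recomputed from the confusion matrix counts. The following is a minimal sketch, not the presenter's code; the function name `summarize` and the matrix layout (rows as MTurk labels, columns as golden set labels) are assumptions made for illustration. The Circle row counts 504, 3, and 10 sum to 517.

```python
# Counts from Experiment A. Assumed layout: rows are MTurk labels,
# columns are golden set labels, both in the order Circle, Rect, Other.
labels = ["Circle", "Rect", "Other"]
confusion = [
    [504, 3, 10],
    [0, 64, 2],
    [3, 1, 113],
]


def summarize(matrix):
    """Return (accuracy, per-class precision, per-class recall)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    precision, recall = [], []
    for i in range(size):
        row_total = sum(matrix[i])                 # everything MTurk labeled as class i
        col_total = sum(row[i] for row in matrix)  # everything golden-labeled as class i
        precision.append(matrix[i][i] / row_total if row_total else None)
        recall.append(matrix[i][i] / col_total if col_total else None)
    return correct / total, precision, recall


accuracy, precision, recall = summarize(confusion)
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.0%} recall={r:.0%}")
print(f"accuracy={accuracy:.2%}")
```

Classes with an empty row or column get `None` rather than a division error, which mirrors the "-" entries in the larger Experiment B tables.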
  • 15. Experiment B: 2 shape features, 16 categories Dial face shape Casing shape Other Rectangle Circle/oval Circle/oval Rectangle Other Tonneau Tonneau
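  • 17. Experiment B: Accuracy drops 4.72% to 92.57%. A second “fuzzy” feature creates more opportunity for disagreement. Labels are Case/Dial; C = Circle/Oval; T = Tonneau; R = Rectangular; O = Other. Per-category summary of the 16 x 16 confusion matrix (rows: MTurk labels; columns: golden set labels; 700 examples; cell counts are listed left to right with blank cells omitted):
        Label | Cell counts   | MTurk total | Golden total | Prevalence | Precision | Recall
        C/C   | 453 3 3 7 8   |         474 |          469 |        67% |       96% |    97%
        T/C   | 11 19 1 2 1   |          34 |           22 |         3% |       56% |    86%
        R/C   | 1 1           |           2 |            5 |         1% |       50% |    20%
        O/C   | 5 4           |           9 |           13 |         2% |       44% |    31%
        C/T   | 0             |           0 |            0 |         0% |         - |      -
        T/T   | 3 1           |           4 |            3 |         0% |       75% |   100%
        R/T   | 0 1           |           1 |            0 |         0% |        0% |      -
        O/T   | 0             |           0 |            0 |         0% |         - |      -
        C/R   | 0             |           0 |            0 |         0% |         - |      -
        T/R   | 0             |           0 |            1 |         0% |         - |     0%
        R/R   | 53 2 2        |          57 |           57 |         8% |       93% |    93%
        O/R   | 2             |           2 |            5 |         1% |      100% |    40%
        C/O   | 0 1           |           1 |            0 |         0% |        0% |      -
        T/O   | 0             |           0 |            0 |         0% |         - |      -
        R/O   | 0             |           0 |            0 |         0% |         - |      -
        O/O   | 1 2 113       |         116 |          125 |        18% |       97% |    90%
    Correct (diagonal) answers: 648 of 700 = 92.57%.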
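  • 18. Experiment B: Filtering workers adds 0.14%. Only 5/124 workers are in the minority for a majority of their votes. Per-category summary of the 16 x 16 confusion matrix (rows: MTurk labels; columns: golden set labels; 700 examples; cell counts are listed left to right with blank cells omitted):
        Label | Cell counts   | MTurk total | Golden total | Prevalence | Precision | Recall
        C/C   | 455 3 3 7 8   |         476 |          469 |        67% |       96% |    97%
        T/C   | 10 19 1 2 1   |          33 |           22 |         3% |       58% |    86%
        R/C   | 1 2           |           3 |            5 |         1% |       33% |    20%
        O/C   | 4 4           |           8 |           13 |         2% |       50% |    31%
        C/T   | 0             |           0 |            0 |         0% |         - |      -
        T/T   | 3 1           |           4 |            3 |         0% |       75% |   100%
        R/T   | 0 1           |           1 |            0 |         0% |        0% |      -
        O/T   | 0             |           0 |            0 |         0% |         - |      -
        C/R   | 0             |           0 |            0 |         0% |         - |      -
        T/R   | 0             |           0 |            1 |         0% |         - |     0%
        R/R   | 52 2 2        |          56 |           57 |         8% |       93% |    91%
        O/R   | 2             |           2 |            5 |         1% |      100% |    40%
        C/O   | 0 1           |           1 |            0 |         0% |        0% |      -
        T/O   | 0             |           0 |            0 |         0% |         - |      -
        R/O   | 0             |           0 |            0 |         0% |         - |      -
        O/O   | 1 2 113       |         116 |          125 |        18% |       97% |    90%
    Correct (diagonal) answers: 649 of 700 = 92.71%.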
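One way to read "workers in the minority for a majority of their votes" is sketched below: flag any worker who disagrees with the per-HIT majority on more than half of the HITs they answered, then recompute the majority without them. This is an illustrative sketch under that reading, not the presenter's implementation; the function names and the `(hit_id, worker_id, label)` tuple format are hypothetical.

```python
from collections import Counter, defaultdict


def majority_label(votes):
    """Return the label with a strict majority, or None on a split decision."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def find_minority_workers(assignments):
    """Flag workers who lose the vote on more than half of their decided HITs.

    assignments: iterable of (hit_id, worker_id, label) tuples (assumed format).
    """
    by_hit = defaultdict(list)
    for hit_id, worker_id, label in assignments:
        by_hit[hit_id].append((worker_id, label))

    votes_cast = Counter()
    minority_votes = Counter()
    for votes in by_hit.values():
        winner = majority_label([label for _, label in votes])
        if winner is None:
            continue  # no majority on this HIT, so nobody counts as "in the minority"
        for worker_id, label in votes:
            votes_cast[worker_id] += 1
            if label != winner:
                minority_votes[worker_id] += 1
    return {w for w in votes_cast if minority_votes[w] > votes_cast[w] / 2}


def filtered_majority(assignments, excluded):
    """Majority vote per HIT after dropping votes from excluded workers."""
    by_hit = defaultdict(list)
    for hit_id, worker_id, label in assignments:
        if worker_id not in excluded:
            by_hit[hit_id].append(label)
    return {hit_id: majority_label(labels) for hit_id, labels in by_hit.items()}


# Toy data: worker w3 disagrees with the majority on two of three HITs.
sample = [
    ("h1", "w1", "C"), ("h1", "w2", "C"), ("h1", "w3", "R"),
    ("h2", "w1", "R"), ("h2", "w2", "R"), ("h2", "w3", "C"),
    ("h3", "w1", "O"), ("h3", "w2", "O"), ("h3", "w3", "O"),
]
flagged = find_minority_workers(sample)
results = filtered_majority(sample, flagged)
```

Because so few workers are flagged in the experiment (5 of 124), a small gain from filtering, such as the 0.14% reported, is what one would expect from this kind of rule.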
  • 19. Many possible experiments; we’ve reported three
        Quality lever             | Lever settings / experiments
        ML objectives             | 1 feature / 3 categories; 2 features / 16 categories
        Data sources and segments | Held constant
        Golden set                | # examples: 100, 700; # annotators: 1, 3
        HIT design / instructions | Picture only; text only; picture and text
        Worker selection          | Prequalified
        Workers per HIT           | 3, 5
        HIT aggregation rules     | Majority vote vs. majority of filtered workers
        Worker feedback           | None
  • 20. MTurk provides you with control levers to optimize dataset quality and throughput. Myth: Dataset quality is an intrinsic property of the MTurk marketplace. Quality control levers • ML objectives • Data sources and segments • Golden sets • HIT design / instructions • Worker selection • Workers per HIT • HIT aggregation rules • Worker feedback Throughput control levers • HIT price • HIT publication rate
  • 21. Best practices: HIT design • Simplicity of question vs. clarity of answers • Prefer questions with limited option sets to open-ended questions • Prefer mutually exclusive, collectively exhaustive option sets • Prefer smaller option sets to larger ones • Ease of learning vs. time to complete HIT • Prefer more questions and simpler instructions to fewer questions and more complex instructions • Prefer that each possible answer set costs the same time to provide • Workers optimize their behaviors to your design; don’t “tweak” too much
  • 22. Best practices: Selecting workers. Using MTurk criteria • Geography • Worker approval rating • Total # HITs approved • Masters status • Mobile device user • Political affiliation • High school graduate • Bachelor’s degree • Marital status • Parenthood status • Voted in 2012 presidential election • Smoker • Car owner • Handedness Using your own criteria • Past performance on your HITs • Custom tests of domain specific knowledge • Custom tests of decision-making ability
  • 23. Best practices: Assessing results. Aggregating results • When using multiple workers / HIT, aggregate results with voting scheme • Align voting w/ prevalence • Moderate prevalence => Majority voting • Low prevalence => Any yes vote • Either drop split decisions, or force them into a category that can be split later Worker feedback • Approve and reject HITs carefully • Automatic rejections require ironclad reasons • Adjust selection criteria • Monitor emails and forums
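The aggregation advice on slide 23 can be sketched for a binary verification HIT as follows. This is a hypothetical helper, not an MTurk API: the function name `aggregate_binary`, the rule names, and the `split_label` parameter are assumptions introduced for illustration.

```python
def aggregate_binary(votes, rule="majority", split_label=None):
    """Combine yes/no worker votes for one HIT into a single answer.

    votes: list of booleans, one per worker (True means "yes").
    rule: "majority" for moderate-prevalence categories,
          "any_yes" for low-prevalence categories where a single yes is kept.
    split_label: value returned on a tie under majority voting. The default
          None drops the split decision; pass a holding category name to
          force it into a bucket that can be split later.
    """
    yes = sum(1 for vote in votes if vote)
    no = len(votes) - yes
    if rule == "any_yes":
        return yes > 0
    if rule == "majority":
        if yes == no:  # also covers an empty vote list
            return split_label
        return yes > no
    raise ValueError(f"unknown rule: {rule!r}")
```

For example, the votes `[True, False, False]` yield `False` under majority voting but `True` under the any-yes rule, which is why the choice of rule should follow how rare the category is.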
  • 24. Best practices: Boosting quality • Separate your categories • Scrutinize false positives and negatives • Simplify and clarify instructions • Optimize worker quals • Experiment, experiment, experiment! Dataset accuracy puts an upper bound on system performance
  • 26. Challenge: Measuring quality over time Example • 1 MM HITs @ 100K / week • 99% confidence level • Confidence interval (CI) varies with sample size Three strategies • Scrutinize @ fixed CI • Scrutinize @ decreasing CI • Scrutinize and trust Potential tactic • Partition workers and interleave golden sets [Figure: Credible Region Width vs. # Answers Checked; x-axis: number of answers checked (0 to 6000); y-axis: credible region width (0.0% to 9.0%)]
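How the interval narrows as more answers are checked can be approximated with a short calculation. The sketch below uses a normal-approximation confidence interval for a proportion, which is a simpler stand-in for the credible region plotted on the slide, not the method behind that figure. The function name `interval_width` and the default accuracy of 0.9 are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(n_checked, accuracy=0.9, confidence=0.99):
    """Full width of a normal-approximation interval around an observed accuracy."""
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    # Two-sided critical value, e.g. about 2.576 at 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sqrt(accuracy * (1 - accuracy) / n_checked)


if __name__ == "__main__":
    for n in (100, 1000, 6000):
        print(n, f"{interval_width(n):.1%}")
```

Because the width shrinks with the square root of the sample size, checking four times as many answers only halves the interval, which is why a "scrutinize @ decreasing CI" strategy gets expensive quickly.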
  • 27. Myth: Workers always fatigue/satisfice with time Worker accuracy is stable and predictable Check forums for Worker discussions of your HITs Kenji Hata, Ranjay Krishna, Li Fei-Fei, and Michael S. Bernstein. “A Glimpse Far into the Future: Understanding Long-term Crowd Worker Accuracy.” arXiv preprint arXiv:1609.04855 (2016).
  • 28. Challenge: Minimizing cost per training example ML goal: recognize 20K categories; Dataset goal: 1K examples / category. Divide and conquer to maximize validation rates Top down (20 questions) • Build classifier with 100 root categories, not 20K leaf categories • HIT 1 labels candidate root examples (only 10-20K ex. req’d) • HIT 2 verifies root members • Split root categories • Repeat Bottom up (data mining) • Mine your source data for clusters • HIT 1 assigns labels to clusters • HIT 2 verifies members of clusters • Delete cluster members from source data • Repeat
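The stated goals imply 20K × 1K = 20 MM verified examples, so the validation rate directly sets how many candidates must pass through verification HITs. The sketch below works through that arithmetic; the function name `candidates_needed` and the validation rates in the usage lines are hypothetical, not figures from the talk.

```python
from math import ceil

CATEGORIES = 20_000            # ML goal from the slide
EXAMPLES_PER_CATEGORY = 1_000  # dataset goal from the slide


def candidates_needed(validation_rate):
    """Candidates to verify if only a fraction `validation_rate` is confirmed."""
    if not 0 < validation_rate <= 1:
        raise ValueError("validation_rate must be in (0, 1]")
    target = CATEGORIES * EXAMPLES_PER_CATEGORY
    return ceil(target / validation_rate)


if __name__ == "__main__":
    for rate in (1.0, 0.5, 0.25):  # hypothetical validation rates
        print(rate, candidates_needed(rate))
```

Halving the validation rate doubles the verification workload, which is the motivation for proposing candidates from root categories or mined clusters, where most candidates are likely to be confirmed.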
  • 29. Challenge: Success • Eventually, user input will differ from the data you trained on • Assess your actual recognition rates • Use errors to guide expansion of your test and training datasets
  • 31. Amazon Rekognition A deep learning-based image recognition service Search, verify, and organize millions of images Object and scene detection Facial analysis Face comparison Facial recognition
  • 32. What do people see? • People see a lot more than what is imaged on the retina - Vision involves a process called “unconscious inference” in neuroscience - The largely unconscious nature of the inferences is confirmed by the study of optical illusions • In order for a human observer to recognize an image, two neuronal processes come together: - Sensory activation from the eyes (referential system) - Information from past experience that is stored in distributed regions across the brain (inferential system)
  • 33. What do you see in the yellow bounding box (“region proposal”)? “a hat”?
  • 34. We “know” from correlation with other image crops and past experience that it’s a baby People don’t classify “region proposals” in isolation What do you see in the yellow bounding box (“region proposal”)? “a baby”!
  • 35. Examples: Common inferences Adding “must be” invisible objects baby
  • 36. Examples: Common inferences Adding “must be” invisible objects fish
  • 37. Examples: Common inferences Adding “must be” invisible objects ring
  • 38. Examples: Common inferences Betting whole from parts baby
  • 39. Examples: Common inferences Betting whole from parts family
  • 40. Examples: Common inferences Betting whole from parts balloon
  • 41. Examples: Common inferences Reading (and trusting) text hints farm
  • 42. Examples: Common inferences Reading (and trusting) text hints chocolate
  • 43. Examples: Common inferences Reading (and trusting) text hints pizza
  • 44. Examples: Common inferences Reading (and trusting) stereotypes and symbols beer
  • 45. Examples: Common inferences Reading (and trusting) stereotypes and symbols 4th of July
  • 46. Examples: Common inferences Reading (and trusting) stereotypes and symbols party
  • 47. Examples: Common inferences Gambling on the past and future swimming
  • 48. Examples: Common inferences Gambling on the past and future wedding
  • 49. Examples: Common inferences Gambling on the past and future camping
  • 52. Group image verification example HIT question: “Does each image contain a cat?” Answer options: Yes / No [HIT screenshot: sample images, descriptions, progress 6/200, Back button]
  • 53. How to improve quality and consistency • Make the HITs “bite”-sized • Create clear and concise instructions • Ask multiple people and build consensus • Include control images to measure performance of workers • Use qualifications and white/black lists to control workforce
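Control images and allow/block lists fit together naturally: score each worker on the images whose answers are already known, then sort workers by that score. The sketch below is one minimal way to do this, assuming answers arrive as `(worker_id, image_id, answer)` tuples; the function names and both thresholds are hypothetical.

```python
def score_workers(answers, controls):
    """Accuracy per worker, measured only on control images.

    answers:  iterable of (worker_id, image_id, answer)
    controls: dict mapping control image_id -> known correct answer
    """
    seen, correct = {}, {}
    for worker_id, image_id, answer in answers:
        if image_id not in controls:
            continue  # not a control image, so it cannot be scored
        seen[worker_id] = seen.get(worker_id, 0) + 1
        correct[worker_id] = correct.get(worker_id, 0) + (answer == controls[image_id])
    return {w: correct[w] / seen[w] for w in seen}


def split_workforce(scores, allow_at=0.95, block_below=0.70):
    """Return (allow_list, block_list) as sets of worker IDs."""
    allow = {w for w, s in scores.items() if s >= allow_at}
    block = {w for w, s in scores.items() if s < block_below}
    return allow, block
```

Workers who fall between the two thresholds stay in the general pool; in practice you would also want a minimum number of control answers per worker before acting on a score.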