Enterprise Grade Data Labeling
Design Your Ground Truth to Scale in Production
Jai Natarajan
jai@imerit.net
Obsession + Craft
AI Production Pipeline
Data collection → Data annotation → Model training → Deployment
Feedback Loop
Software 2.0
“A large portion of programmers of tomorrow … collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.”
Andrej Karpathy, Tesla
The data is an intrinsic part of the algorithm
Outcome depends as much on the data as on the code
TLDR: There are ways to be as mindful about your data strategy as you are about your algorithm strategy
Algorithm Training is Algorithm Design
The Data Situation
Data Annotation Takes Time
Figure Eight estimates that 80% of development time is spent on data prep and labeling
Cognilytica estimates that 25% of time is spent on data labeling
Data Annotation Needs Are Substantial
Automotive Customer
● 250 k – 500 k frames per month
● Average 10 objects/frame for object detection
● Average 45 mins per frame for full segmentation
● Multiple judgements (3-5) on each data piece
Medical Image Customer
● 200 k endoscopic scans
● Average 2 anomalies per scan
● Multiple judgements (3-5) on each data piece
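The automotive figures above imply a large monthly workload. The sketch below works through the arithmetic. Frame volume, minutes per frame, and judgement counts come from the slide. The 160 working hours per annotator per month is an assumed figure, and treating every judgement as a full 45-minute segmentation pass is also an assumption that likely gives an upper bound.

```python
# Back-of-the-envelope labeling workload for the automotive example.
MINUTES_PER_FRAME = 45           # full segmentation, from the slide
HOURS_PER_ANNOTATOR_MONTH = 160  # assumption: one full-time annotator


def monthly_hours(frames: int, judgements: int) -> float:
    """Total annotation hours for one month of frames.

    Assumes each judgement is a complete pass over the frame.
    """
    return frames * judgements * MINUTES_PER_FRAME / 60


def annotators_needed(frames: int, judgements: int) -> float:
    """Full-time annotators implied by the monthly hours."""
    return monthly_hours(frames, judgements) / HOURS_PER_ANNOTATOR_MONTH


if __name__ == "__main__":
    for frames in (250_000, 500_000):
        for judgements in (3, 5):
            hours = monthly_hours(frames, judgements)
            people = annotators_needed(frames, judgements)
            print(f"{frames:>7} frames x {judgements} judgements: "
                  f"{hours:,.0f} hours, about {people:,.0f} annotators")
```

Under these assumptions, even the low end (250 k frames, 3 judgements) comes to 562,500 hours per month.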
Data Annotation is Increasingly Complex
● Bounding Boxes: simple boxes (+secs/task)
● Polygons: precise boundaries on some objects (+mins/task)
● Segmentation: all objects precisely marked (+30 mins/task)
● Panoptic Segmentation: all objects precisely marked and grouped by type (+45 mins/task)
● Tracking: objects marked and tracked across frames of video (+30 mins/task)
● LIDAR: thousands of points grouped into objects (+90 mins/task)
● Multi-Sensor Fusion: combine LIDAR and images from multiple angles (+90 mins/task)
Data Annotation Involves Domains
Skill levels: GENERAL, SKILLED, SPECIALIZED, EXPERT
● General Knowledge: travel AI assistant
● Specific World Knowledge: current events, fashion
● Jargon-Rich Domain: image editing, e-commerce (brand jargon)
● Complex Subject Matter: healthcare, finance, law
LABELING task and the DOMAIN knowledge it draws on
● Segmentation: navigation & tool familiarity
● Identification: anatomy & physiology, pattern recognition, ontological understanding
● Classification: pathophysiology, multiple-dependency decision tree
● Diagnosis & Treatment: clinical history, epidemiology, contextual analysis
The Case For Enterprise Annotation
Data Security and Audit Trail
Quality and Consistency
Custom Tooling and Insights
Domain Knowledge & Targeted Skilling
Retained Learnings across Iterations
Enterprise Annotation @ iMerit
We are iMerit
iMerit is a tech-enabled data services company that leverages human intelligence in data, content, and machine learning.
We deliver high-quality, managed services while effecting positive social and economic change.
Our data experts work full-time onsite at our secure delivery facilities.
● 24x7 operations
● < 5% attrition
● 9 centers
● 200 M+ data points delivered
● 130+ clients
● SOC 2 certified
● 2,600 employees
Annotation Specialties
Helping the Chicago Cubs Win the World Series
● Capture video during game
● Mark joint positions of pitcher
● Build 3D skeleton for analytics
● Expand to multiple teams
● Extend to batters, fielders
Experience and Expertise
• Street scenes for Autonomous Vehicles (images + LiDAR)
• Named Entities/Salience in Financial Documents
• Aerial Imagery of healthy and diseased crops
• Peril Assessment for Property Insurance
• Identification of tumors and lesions in medical scans
• Risk Assessment of Power Assets
Annotation Framework
Collaborative Framework
● Expert Consultation
● Training
● Workflow Customization
● Feedback Cycle
● Evaluation
1. Expert Consultation
Participants: ML Engineer, Subject-matter Expert, Trainer
● Use case
● Edge case discovery
● Task design (granularity, cognitive load of task)
2. Guidelines & Training
● For generalists
● Narrow and deep
● Example-rich; requires time to train, practice, and iterate
3. Workflow Customization
● Data and QC pipeline
● UI optimizations
● Crawl (calibration)
● Walk (soft production, rapid feedback)
● Run (production, internal QA)
Supports scale, ensures quality
4. Feedback Cycle
● Collaboration: SMEs, PM, engineer, generalists
● Insights into unanticipated deviations
● No penalty for challenging assumptions
Improve model by identifying biases
Ensure reliability of annotations
5. Evaluation
● Key metrics & thresholds
● Share responsibility
● Test against gold set
● Measure inter-rater reliability
● Increase rigor over project life
● Minimize rework iterations
Ensures quality
Validates assumptions
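Two of the evaluation checks named here, testing against a gold set and measuring inter-rater reliability, can be sketched in a few lines. The deck does not say which reliability statistic is used; Cohen's kappa for two raters is swapped in as a common choice, and the function names and sample labels are illustrative.

```python
from collections import Counter
from typing import Sequence


def gold_accuracy(labels: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of an annotator's labels that match the gold set."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("labels and gold must be non-empty and equal length")
    return sum(a == g for a, g in zip(labels, gold)) / len(gold)


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same class at random.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    gold = ["person", "vehicle", "person", "vehicle"]    # made-up sample
    rater_a = ["person", "vehicle", "person", "person"]
    rater_b = ["person", "vehicle", "vehicle", "person"]
    print("gold accuracy (A):", gold_accuracy(rater_a, gold))
    print("kappa (A vs B):", cohens_kappa(rater_a, rater_b))
```

Raw agreement between the two raters here is 75%, but kappa is lower because some of that agreement would occur by chance. Thresholds on either number are a project decision.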
Good Annotation Design
Good Annotation Design: Context Matters
Person or Vehicle?
Are you trying to avoid hitting people or are you counting vehicles?
Good Annotation Design: UI Matters
“I want bounding boxes no smaller than 1.5 cm in any dimension.”
“Go for it!”
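One reading of this exchange is that a size rule stated in centimeters is ambiguous in a labeling tool, because the on-screen size of a box depends on the display and the zoom level. The sketch below, with hypothetical function and parameter names, converts a physical on-screen size into image pixels to show how much the answer varies.

```python
CM_PER_INCH = 2.54


def cm_to_image_pixels(size_cm: float, screen_ppi: float, zoom: float = 1.0) -> float:
    """Image pixels covered by size_cm on screen.

    screen_ppi is the display density in pixels per inch.
    zoom is screen pixels per image pixel (2.0 means the image is shown at 200%).
    """
    if screen_ppi <= 0 or zoom <= 0:
        raise ValueError("screen_ppi and zoom must be positive")
    screen_pixels = size_cm / CM_PER_INCH * screen_ppi
    return screen_pixels / zoom


if __name__ == "__main__":
    # Assumed display densities and zoom levels, for illustration only.
    for ppi in (96, 220):
        for zoom in (1.0, 2.0):
            px = cm_to_image_pixels(1.5, ppi, zoom)
            print(f"{ppi} PPI at {zoom:.0%} zoom: 1.5 cm is about {px:.0f} image pixels")
```

Stating the minimum in image pixels removes the dependence on the annotator's screen.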
Good Annotation Design: Domain Specific
iMerit Solution Architect + Customer Expert:
● Unpack the jargon
● Create deep and narrow training curriculum (docs, videos, video conferences)
● Retain learnings across time
Good Annotation Design: Allow Open Feedback
● Conversation around quality
Are some errors more important than other errors?
How will you sample quality?
● Safe space to Iterate without penalty
● Small discovery and calibration pilots
● Ask your labeling force to question edge cases
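The two quality questions above, whether some errors matter more than others and how quality will be sampled, can be made concrete with a small sketch. The error categories, their weights, and the sampling rate are all hypothetical; in practice they would come out of the conversation between the customer and the annotation team.

```python
import random
from typing import List, Mapping, Sequence

# Hypothetical severity weights: missing a person costs more than a loose box.
ERROR_WEIGHTS: Mapping[str, float] = {
    "missed_person": 10.0,
    "wrong_class": 3.0,
    "loose_box": 1.0,
}


def sample_for_review(task_ids: Sequence[str], rate: float, seed: int = 0) -> List[str]:
    """Draw a reproducible simple random sample of tasks for QA review."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if not task_ids:
        return []
    k = min(len(task_ids), max(1, round(len(task_ids) * rate)))
    return random.Random(seed).sample(list(task_ids), k)


def weighted_error_score(errors: Sequence[str], reviewed: int) -> float:
    """Severity-weighted errors per reviewed task (lower is better)."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return sum(ERROR_WEIGHTS[e] for e in errors) / reviewed


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(100)]
    batch = sample_for_review(tasks, rate=0.1)
    found = ["missed_person", "loose_box"]  # made-up review findings
    print(len(batch), "tasks sampled; score:", weighted_error_score(found, len(batch)))
```

A fixed seed keeps the sample auditable; a real pipeline might instead stratify by annotator or by task difficulty.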
Summary – Mindful Data Annotation
Data strategy as mindful as your algorithm strategy
● Ask the right questions
● Plan time and budget
● Plan for increased skill needs
● Partner with your annotation team
● Create an environment where insight is possible
● Build a long-term, secure, scalable pipeline
Thank You!
jai@imerit.net

