Human-Efficient Discovery of Training Data for Visual Machine Learning
Thesis Proposal
Ziqiang (Edmond) Feng
Committee:
Mahadev Satyanarayanan (Chair)
Martial Hebert
Roberta Klatzky
Padmanabhan Pillai (Intel Labs)
Agenda
• The Problem
• Thesis Statement
• Overview of Eureka
• Research Thrusts
• Related Work
• Timeline
Deep Learning for Computer Vision
Classification, Detection, Segmentation, Activity Recognition
(Example class labels: Bird, Cat, Dog)
Training Data Is A Key Ingredient
(1) Forward pass: raw pixels → Deep Neural Network → prediction
(2) Backward pass (aka back-propagation): the error between the prediction and the label (ground truth) is propagated back through the Deep Neural Network
A training example = (raw pixels, label (ground truth))
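The two passes above can be sketched in a few lines. This is only an illustration under strong assumptions: a single linear neuron stands in for the deep neural network, the error is squared error, and the pixel values, label, and learning rate are made up.

```python
def forward(weights, pixels):
    """(1) Forward pass: turn raw pixels into a prediction."""
    return sum(w * x for w, x in zip(weights, pixels))


def backward(weights, pixels, label, learning_rate=0.1):
    """(2) Backward pass: push the prediction error back into the weights."""
    error = forward(weights, pixels) - label
    # The gradient of 0.5 * error**2 with respect to each weight is error * pixel.
    return [w - learning_rate * error * x for w, x in zip(weights, pixels)]


# One hypothetical training example = (raw pixels, label).
pixels, label = [0.2, 0.8, 0.5], 1.0
weights = [0.0, 0.0, 0.0]

error_before = abs(forward(weights, pixels) - label)
for _ in range(20):
    weights = backward(weights, pixels, label)
error_after = abs(forward(weights, pixels) - label)

print(f"error before: {error_before:.3f}, after 20 steps: {error_after:.3f}")
```

Without the label there is no error to propagate, which is why every training example needs ground truth.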
Deep learning needs a lot of accurately labeled training data
Some Famous Labeled Training Sets
Training Set        Size
Object recognition/detection
  ImageNet          1,200,000 images
  COCO              330,000 images
  PASCAL VOC        12,000 images
Activity recognition
  UCF101            13,000 videos
  Charades          10,000 videos
Source: The New York Times 11/25/2018
The Training Data Problem of Domain Experts
(scientists, military, medical doctors, etc.)
• Masked palm civet (Paguma larvata). Transmitter of SARS during its 2003 outbreak.
• BUK-M1. Believed to have shot down MH17 and killed 298 in 2014.
• Nuclear atypia in pathological images. Cue of several diseases and cancers.
Why Is It Difficult?
• Crowds are not experts.
  Domain-specific expert knowledge is required.
• Interesting phenomena are rare.
  Scan through a lot of data to find a few positives.
• Access restriction.
  Only one or a few experts can label the
How can a single domain expert discover
thousands of positive examples of a rare
object from unlabeled data efficiently?
Thesis Statement
The manual effort of discovering a large training set for visual machine
learning can be reduced by a system combining:
• Early discard
• Just-in-time machine learning
• The ability to create more accurate filters without writing new code
This approach is efficient in:
• Different computing landscapes
(e.g., edge computing and smart storage)
• Different problem domains
(e.g., object detection in images and activity recognition in videos).
Eureka
A methodology and a system.
• For finding rare phenomena in unlabeled visual data
Goal: utilize an expert’s time efficiently
• Reduce expert’s idle time
• Improve candidate examples’ quality
Logical Execution Flow
(Figure: Data Source (images, videos, map data, etc.) → Itemizer (scoping) → Item Processor (filters F1, F2, …, each of which may drop items) → User Interface)
• Item: independent unit of early-discard and display (e.g., a single image)
• Attribute: key-value pair attached by filters; facilitates communication between filters and post-analysis
• Filter: examines an item and tries to drop it; cascaded filter chains use short-circuit evaluation
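The item, attribute, and filter concepts above can be sketched as follows. This is a minimal illustration, not Eureka's actual API: the `Item` class, both filter functions, and their thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Item:
    """Independent unit of early discard and display (e.g., a single image)."""
    name: str
    pixels: List[tuple]  # hypothetical representation: a list of (r, g, b) tuples
    attributes: Dict[str, float] = field(default_factory=dict)


# A filter returns True to pass the item on and False to drop it.
Filter = Callable[[Item], bool]


def blue_fraction_filter(item: Item) -> bool:
    """Hypothetical cheap filter F1: keep images with enough sky-blue pixels."""
    blue = sum(1 for r, g, b in item.pixels if b > 150 and b > r and b > g)
    item.attributes["blue_fraction"] = blue / len(item.pixels)
    return item.attributes["blue_fraction"] >= 0.3


def make_second_filter(reached: List[str]) -> Filter:
    """Hypothetical filter F2 that records which items actually reach it."""
    def second_filter(item: Item) -> bool:
        reached.append(item.name)
        # Reads the attribute attached by F1 instead of recomputing it.
        return item.attributes["blue_fraction"] < 0.95
    return second_filter


def run_cascade(items: Iterable[Item], filters: List[Filter]) -> Iterator[Item]:
    """Yield only the items that pass every filter in the chain."""
    for item in items:
        # all() stops at the first filter that returns False (short-circuit).
        if all(f(item) for f in filters):
            yield item


sky = Item("sky", [(80, 160, 230)] * 8 + [(20, 20, 20)] * 2)
forest = Item("forest", [(30, 120, 40)] * 10)
reached_f2: List[str] = []
passed = [item.name for item in
          run_cascade([sky, forest], [blue_fraction_filter, make_second_filter(reached_f2)])]
print(passed, reached_f2)
```

Because the chain short-circuits, the dropped item never reaches the second filter, so putting cheap filters first saves the cost of the expensive ones.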
Early Discard Filters
• Purpose: to drop probably negative data and narrow the search space
• Not to be taken as a “perfect detector”
• Reduce the demand for expert time & attention
• Examples:
• Sky-blue color for birds
• Bullet shape for rocket-propelled grenade (RPG)
Creating a Difference of Gaussian (DoG) Filter
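For readers unfamiliar with the operator, a DoG response is the difference between a narrowly blurred and a widely blurred copy of the same image. The sketch below shows that idea in pure Python on a small grayscale grid. It is an illustration only, not the filter shown on the slide: the sigma values, the border clamping, and all function names are assumptions.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized so its weights sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable 2-D Gaussian blur: blur the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Narrow blur minus wide blur; flat regions come out close to zero."""
    narrow = gaussian_blur(image, sigma_small)
    wide = gaussian_blur(image, sigma_large)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(narrow, wide)]


flat = [[5.0] * 9 for _ in range(9)]
spot = [[0.0] * 9 for _ in range(9)]
spot[4][4] = 1.0  # a single bright blob in the center

flat_response = difference_of_gaussians(flat)
spot_response = difference_of_gaussians(spot)
print(f"center response for the blob: {spot_response[4][4]:.3f}")
```

A featureless image produces essentially no response while a small blob produces a clear one, which is what makes DoG usable as a cheap early-discard test.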
Early Discard: Finding Deer
Early-discard filters: dropped 99.29%, passed 0.71%
Eureka’s Iterative Workflow
Just-in-time Machine Learning
(use just-collected examples to retrain ML models)
(Figure: accuracy (not to scale) vs. number of examples (log scale, 10^0 to 10^4). Techniques, in order of increasing number of examples:)
• Explicit features, manual weights (RGB histogram, SIFT, perceptual hashing)
• Explicit features, learned weights (HOG + SVM)
• Shallow transfer learning (MobileNet + SVM)
• Deep transfer learning (Faster R-CNN finetuning)
• Deep learning
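The loop behind just-in-time machine learning can be sketched as below. This is a toy under stated assumptions: a nearest-centroid scorer stands in for the SVM and DNN stages, the feature vectors and ground truth are made up, and `iterative_workflow` and `ask_expert` are hypothetical names.

```python
def centroid(vectors):
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def train_centroid_model(positives, negatives):
    """Stand-in learner: score is higher the closer an item is to the positives."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return lambda features: dist2(features, neg_c) - dist2(features, pos_c)


def iterative_workflow(pool, initial_score, ask_expert, iterations=2, batch=4):
    """Show the best-scoring unseen items, collect labels, retrain, repeat."""
    score = initial_score
    positives, negatives, seen = [], [], set()
    for _ in range(iterations):
        unseen = [i for i in range(len(pool)) if i not in seen]
        candidates = sorted(unseen, key=lambda i: score(pool[i]), reverse=True)[:batch]
        for i in candidates:
            seen.add(i)
            (positives if ask_expert(i) else negatives).append(pool[i])
        if positives and negatives:
            # Just-in-time step: retrain on everything labeled so far.
            score = train_centroid_model(positives, negatives)
    return positives, negatives


# Hypothetical pool of 2-D feature vectors; only the second feature matters.
pool = [(0.9, 0.9), (0.8, 0.1), (0.7, 0.2), (0.6, 0.8), (0.5, 0.9),
        (0.4, 0.1), (0.3, 0.95), (0.2, 0.85), (0.1, 0.15), (0.05, 0.9)]
truth = [f[1] > 0.5 for f in pool]

# The hand-written starting filter looks at the wrong feature on purpose.
positives, negatives = iterative_workflow(
    pool, initial_score=lambda f: f[0], ask_expert=lambda i: truth[i])
print(len(positives), "positives,", len(negatives), "negatives labeled")
```

In this toy run the first batch, ranked by the hand-written score, is half negatives, while the second batch, ranked by the retrained model, is all positives.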
Eureka in Different Computing Landscapes
Edge Computing, Cloud Computing, Smart Storage (with Image Data)
Focus:
• Computer system efficiency (throughput, latency, etc.)
• Identify hardware and software bottlenecks
• Develop techniques to improve computational efficiency
Eureka in Different Problem Domains
Image Data, Video Data, other multi-dimensional data (e.g., whole-slide image, HD map) (with Edge Computing)
Focus:
• Domain-specific optimization
• Expressive programming abstraction
• User productivity
Research Thrusts: Progress
Computing landscapes: Edge Computing, Cloud Computing, Smart Storage
Problem domains: Image Data, Video Data, other multi-dimensional data (e.g., whole-slide image, HD map)
Why the Edge?
• Data is generated on the edge
• Sensors, cameras, smartphones, drones, self-driving cars, smart streetlights, etc.
• Edge computing is the answer to scalability
• Can’t afford to send all data into the cloud for computation
• US average Internet bandwidth (2017) = 19 Mbps
• Barely enough to stream a 4K video from Netflix
System Architecture
(Figure: an expert with a domain-specific GUI connects over the Internet to several cloudlets. Each cloudlet is connected over a LAN to its data: an archival data source or live video.)
• Each cloudlet executes early-discard code to drop probably irrelevant data.
• Only a tiny fraction of data along with extracted information is transmitted to user, consuming little Internet bandwidth.
Experiments
Data
• Yahoo! Flickr 100 Million (YFCC100M) images
• Unlabeled. Real-life object distribution.
• Evenly partitioned on cloudlets
Cloudlets
• 8 cloudlets
• Intel Xeon E5-1650, 32 GB DRAM
• Nvidia GTX 1060 GPU
Workflow
• 5 iterations for each target
• Start with SIFT, RGB color histogram, Difference of Gaussian, …
• Later: iteratively re-train SVM using MobileNet features
Edge + Image: Results
                                                Deer        Taj Mahal   Fire hydrant
Estimated base rate (prevalence)                0.07%       0.02%       0.005%
Total true positives collected in 5 iterations  111         105         74
Images labeled by user                          7,447       4,791       15,379
Images discarded by Eureka                      2,104,076   2,542,889   2,734,070
Compare with Naïve Hand-Labeling
(Figure: images (TP+FP) the user inspected to collect ~100 true positives, on a log scale from 1,000 to 1,000,000, for Deer, Taj Mahal, and Fire hydrant. Series: naïve hand-labeling, single-pass early discard, Eureka. Fewer is better.)
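A back-of-the-envelope estimate gives a sense of the naïve hand-labeling workload. The sketch assumes images are inspected in random order, so finding k positives at base rate p takes about k / p images on average. The base rates, true-positive counts, and labeled-image counts are the ones reported for the edge + image experiment; the function name is made up, and the output is an estimate, not the data plotted in the figure.

```python
results = {
    # target: (estimated base rate in percent, true positives, images labeled by user)
    "Deer": (0.07, 111, 7_447),
    "Taj Mahal": (0.02, 105, 4_791),
    "Fire hydrant": (0.005, 74, 15_379),
}


def naive_images_needed(base_rate_percent, true_positives):
    """Expected number of randomly ordered images to inspect to find the positives."""
    return true_positives / (base_rate_percent / 100.0)


for target, (rate, found, labeled) in results.items():
    naive = naive_images_needed(rate, found)
    print(f"{target}: naive ~{naive:,.0f} images vs. {labeled:,} labeled with Eureka "
          f"(about {naive / labeled:.0f}x fewer)")
```

Since the base rates are themselves estimates, the ratios are order-of-magnitude indications only.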
Effect of the Iterative Workflow
(Figure: newly-discovered true positives per minute (0 to 10, higher is better) vs. cumulative minutes in workflow (0 to 80), for Deer (base rate 0.07%), Taj Mahal (0.02%), and Fire hydrant (0.005%). Annotations: “Due to lower base rate of target”; “Bottlenecked by low base rate and computation (user’s waiting)”.)
Proximity to Data Is Important
Throttling bandwidth between compute and data. RGB color histogram filter.
(Figure: processing throughput (images/s, 0 to 1000, higher is better) at 10 Mbps, 25 Mbps, 100 Mbps, and 1 Gbps (LAN). Reference: US average 18.7 Mbps (2017).)
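The link between bandwidth and throughput can be checked with simple arithmetic: if every image must cross the link before a filter sees it, throughput cannot exceed bandwidth divided by image size. The sketch below assumes a hypothetical average image size of 200 KB, which is not a figure from the experiments, and ignores protocol overhead.

```python
def max_images_per_second(bandwidth_mbps, image_size_kb):
    """Upper bound on throughput when every image must cross the link."""
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8
    return bytes_per_second / (image_size_kb * 1_000)


ASSUMED_IMAGE_KB = 200  # hypothetical average image size, chosen for illustration

links = [("10 Mbps", 10), ("US average 18.7 Mbps", 18.7), ("25 Mbps", 25),
         ("100 Mbps", 100), ("1 Gbps (LAN)", 1000)]
for name, mbps in links:
    bound = max_images_per_second(mbps, ASSUMED_IMAGE_KB)
    print(f"{name}: at most {bound:.1f} images/s")
```

Under this assumption a wide-area link caps a cheap filter at around ten images per second, while a LAN leaves room for hundreds.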
Why the Cloud?
• Historically, many data sets have been centralized into the cloud
• Elasticity -- easy to recruit more compute resources by adding VMs
• Trade off $$$ for better use of expert time
Edge vs. Cloud
Edge
• Independent CPUs and disks
• Access to local disk is fast
Cloud (Amazon Web Services as example)
• Elasticity leads to separation of computing and storage layers
• I/O stack adds extra latency
• Contention for shared bandwidth
(Figure: EC2 instances reach S3 storage over the network.)
Edge vs. Cloud Results
(Figures: throughput (images/s, higher is better) vs. number of servers (1 to 8), Edge vs. Cloud (AWS).)
• SIFT filter (expensive, compute-bound): y-axis 0 to 500
• RGB color histogram filter (cheap, data-bound): y-axis 0 to 10000
Fast (cheap) filters manifest I/O latency in the cloud.
What Can We Do in the Cloud
• Use extra threads to pre-fetch data asynchronously
• Utilize the many cores
• In practice -- got throttled by the service provider
• Cache data for later re-access
• Utilize the large main memory
• Useful if workload revisits data items
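Both techniques in the list above can be sketched with the Python standard library. This is an illustration only: `FAKE_OBJECT_STORE` and `fetch` are hypothetical stand-ins for remote object storage, the 10 ms sleep simulates I/O latency, and the provider-side throttling mentioned above is not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical stand-in for remote object storage.
FAKE_OBJECT_STORE = {f"img{i}": bytes([i]) * 4 for i in range(8)}
fetch_log = []  # records every read that actually went to "storage"


@lru_cache(maxsize=1024)  # cache data in main memory for later re-access
def fetch(key):
    """Simulated high-latency read from remote storage."""
    time.sleep(0.01)
    fetch_log.append(key)
    return FAKE_OBJECT_STORE[key]


def prefetching_reader(keys, workers=4):
    """Yield (key, data) in order while extra threads fetch ahead."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(key, pool.submit(fetch, key)) for key in keys]
        for key, future in futures:
            yield key, future.result()


keys = list(FAKE_OBJECT_STORE)
first_pass = list(prefetching_reader(keys))
reads_after_first_pass = len(fetch_log)
second_pass = list(prefetching_reader(keys))  # served entirely from the cache
print(reads_after_first_pass, len(fetch_log))
```

The second pass triggers no new reads, which matches the remark that caching only pays off when the workload revisits data items.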
Eureka + Smart Storage
(work in progress)
Smart storage = execute application logic in on-disk controllers
• Today’s disk controllers are already small computers
Why do this?
• Storage is the first thing that scales with data
• Lower energy consumption
• Fast access to data
• Passing application knowledge to the storage system for optimization
Challenges
• Low compute capacity on device
• Difficulty of programming/debugging
Optimizing Image Storage for Eureka
• Object store semantics → no need for partial reads
• Read only → no writes
• Read order doesn’t matter → reduce disk seek, exploit cache, etc.
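The benefit of order-independent reads can be illustrated with a toy seek model. The sketch assumes a hypothetical mapping from object name to on-disk offset and measures only head movement between object start positions. It ignores object sizes and caching, and is not the storage design being proposed.

```python
def seek_distance(offsets, start=0):
    """Total head movement when visiting the offsets in the given order."""
    total, position = 0, start
    for offset in offsets:
        total += abs(offset - position)
        position = offset
    return total


# Hypothetical on-disk layout: object name -> starting offset (in blocks).
layout = {"a.jpg": 900, "b.jpg": 100, "c.jpg": 700, "d.jpg": 300, "e.jpg": 500}

requested_order = list(layout)               # order the application asked in
disk_order = sorted(layout, key=layout.get)  # one sweep across the disk

naive = seek_distance([layout[name] for name in requested_order])
swept = seek_distance([layout[name] for name in disk_order])
print(f"requested order: {naive} blocks of seeking, disk order: {swept}")
```

Because Eureka does not care which image it sees first, the storage layer is free to serve whole objects in whatever order is cheapest for the device.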
Activity Recognition in Video
(work in progress)
• Challenge 1: the extra time dimension
• Both algorithmic and computational
Eureka + Video Data
• Challenge 2: gap between available training sets and real-world data
(Figure: UCF101 data set (Soomro et al.) vs. surveillance video on Forbes Avenue near CMU)
Eureka + Video Data
• Challenge 3: complex search conditions
• “Search for men”
• “Search for a man wearing a red shirt running after a child on the street side”
• Features needed
• Combining techniques for video (frame sequence) and frame (static image)
• Nesting detection
• Correlating timestamp and location
• Feeding back Eureka’s iterative approach
Other Multidimensional Data
• Examples
• Whole-slide image pyramids in digital pathology
• Map data
• Challenges
• Query interface
• Efficient computation
• What can be discovered from the data?
Let’s build the telescope so that domain experts can discover craters
Related Work: DNN Training and Inference
• DNN structures
• AlexNet, VGG, Inception, MobileNet, Faster R-CNN …
• Software libraries
• TensorFlow (Google), PyTorch (Facebook), …
• Hardware accelerators
• Movidius (Intel), TPU (Google), …
• On constrained hardware
• Model compression, model quantization, …
• Video analytics
• VideoStorm, NoScope, Focus, FilterForward, BlazeIt, VideoEdge, …
• Premise: a “good” model exists to detect the object of interest
• Ask: “how to run it faster?”
• Batch or stream processing
• My thesis is intrinsically human-in-the-loop and interactive, and has no good model to begin with
Related Work: Training Data Augmentation and Synthesis
• Traditional data augmentation
• More intelligent data synthesis based on computer graphics and machine learning
Source: Dwibedi et al., ICCV’17
Problem: Not truly diverse examples.
Related Work: Human-sourced Labeling
Comparison criteria: Target Visual Data, Target Domain Experts, No Coding Barrier
Crowd-sourcing (CVPR’13, ECCV’16, etc.)
• Only useful for common targets
• Human-computer interaction, active learning, etc.
Expert/crowd-sourcing (HCOMP’15, etc.)
• Medical literature screening
Snorkel (NeurIPS’16, VLDB’18, etc.)
• Ask experts to write “labeling functions”
• Infer labels using statistical models
My Thesis Proposal
Timeline
Thank you.
Questions?
Video demo:
https://youtu.be/Ajo0APnSV10

PhD Thesis Proposal

  • 1. Human-Efficient Discovery of Training Data for Visual Machine Learning Thesis Proposal Ziqiang (Edmond) Feng Committee: Mahadev Satyanarayanan (Chair) Martial Hebert Roberta Klatzky Padmanabhan Pillai (Intel Labs)
  • 2. Agenda • The Problem • Thesis Statement • Overview of Eureka • Research Thrusts • Related Work • Timeline 2
  • 3. Deep Learning for Computer Vision Classification Detection Segmentation Activity Recognition  Bird  Cat  Dog 3
  • 4. Training Data Is A Key Ingredient Raw pixels Deep Neural Network Predictio n (1) Forward pass Label (ground truth) ⊗ErrorDeep Neural Network Predictio n(2) Backward pass (aka back-propagation) A training example = ( , ) Raw pixels Label (ground truth) 4
  • 5. Deep learning needs training data a lot of accurately labeled 5
  • 6. Some Famous Labeled Training Sets Training Set Size Object recognition/detection ImageNet 1,200,000 images COCO 330,000 images PASCAL VOC 12,000 images Activity recognition UCF101 13,000 videos Charades 10,000 videos 6
  • 7. 7 Source: The New York Times 11/25/2018
  • 8. The Training Data Problem of Domain Experts (scientists, military, medical doctors, etc.) Masked palm civet (Paguma larvata). Transmitter of SARS during its 2003 outbreak BUK-M1. Believed to have shot down MH17 and killed 298, 2014. Nuclear atypia in pathological images. Cue of several diseases and cancers. 8
  • 9. Why Is It Difficult? Crowds are not experts. Domain-specific expert knowledge is required. Interesting phenomena are rare. Scan through a lot of data to find a few positives. Access restriction. Only one or a few experts can label the 9
  • 10. How can a single domain expert discover thousands of positive examples of a rare object from unlabeled data efficiently? 10
  • 11. Thesis Statement The manual effort of discovering a large training set for visual machine learning can be reduced by a system combining: • Early discard • Just-in-time machine learning • The ability to create more accurate filters without writing new code 11 This approach is efficient in: • Different computing landscapes (e.g., edge computing and smart storage) • Different problem domains (e.g., object detection in images and activity recognition in videos).
  • 12. Agenda • The Problem • Thesis Statement • Overview of Eureka • Research Thrusts • Related Work • Timeline 12
  • 13. Eureka A methodology and a system. • For finding rare phenomena in unlabeled visual data Goal: utilize an expert’s time efficiently • Reduce expert’s idle time • Improve candidate examples’ quality 13
  • 14. Itemizer (scoping) Data Source (images, videos, map data, etc.) User Interface Item: independent unit of early-discard and display (e.g., a single image) c kAttribute: key-value attached by filters; facilitates communication between filters and post-analysis Filter: examine and try to drop items; short-circuit evaluation of cascade filter chains dcba jihg Item Processor F2 drop drop e f k F1 Logical Execution Flow 14 …
  • 15. Early Discard Filters • Purpose: to drop probably negative data and narrow search space • Not be to taken as “perfect detector” • Reduce the demand for expert time & attention • Examples: • Sky-blue color for birds • Bullet shape for rocket propelled grenade (RPG) 15
  • 16. Creating a Difference of Gaussian (DoG) Filter 16
  • 17. Early Discard: Finding Deer 17 Early-discard filters Dropped 99.29% Passed 0.71%
  • 18. Eureka’s Iterative Workflow 18 Explicit features, manual weights (RGB histogram, SIFT, perceptual hashing) Explicit features, learned weights (HOG + SVM) Shallow transfer learning (MobileNet + SVM) Deep transfer learning (Faster R-CNN finetuning) Deep learning 100 101 102 103 104 Number of Examples (log scale) Accuracy(nottoscale) Just-in-time Machine Learning (use just-collected examples to retrain ML models)
  • 19. 19
  • 20. Agenda • The Problem • Thesis Statement • Overview of Eureka • Research Thrusts • Related Work • Timeline 20
  • 21. Eureka in Different Computing Landscapes 21 Edge Computing Cloud Computing Smart Storage Image Data Focus: • Computer system efficiency (throughput, latency, etc.) • Identify hardware and software bottleneck • Develop techniques to improve computational efficiency
  • 22. Eureka in Different Problem Domains 22 Edge Computing Image Data Focus: • Domain-specific optimization • Expressive programming abstraction • User productivity Video Data Other multi- dimensional data (e.g., whole-slide image, HD map)
  • 23. Research Thrusts: Progress 23 Edge Computing Image Data Video Data Other multi- dimensional data (e.g., whole-slide image, HD map) Cloud Computing Smart Storage
  • 24. 24 Edge Computing Image Data Video Data Other multi- dimensional data (e.g., whole-slide image, HD map) Cloud Computing Smart Storage
  • 25. Why the Edge? • Data is generated on the edge • Sensors, cameras, smart phones, drones, self-driving cars, smart streetlights, etc. • Edge computing is the answer to scalability • Can’t afford to send all data into the cloud for computation • US average Internet bandwidth (2017) = 19 Mbps • Barely enough to stream a 4K video by Netflix 25
  • 26. System Architecture 26 Expert with domain-specific GUI cloudlet Archival Data Source LAN cloudlet LAN cloudlet Live Video I n t e r n e t Archival Data Source LAN Executes early-discard code to drop probably irrelevant data Only a tiny fraction of data along with extracted information is transmitted to user, consuming little Internet bandwidth.
  • 27. Experiments 27 • Yahoo! Flickr 100 Million (YFCC100M) images • Unlabeled. Real-life object distribution. • Evenly partitioned on cloudlets Data • 8 cloudlets • Intel Xeon E5-1650, 32 GB DRAM • Nvidia GTX 1060 GPU Cloudlets • 5 iterations for each target • Start with SIFT, RGB color histogram, Difference of Gaussian, … • Later: iteratively re-train SVM using MobileNet features Workflow
  • 28. Edge + Image: Results 28 Deer Taj Mahal Fire hydrant 0.07% 0.02% 0.005%Estimated base rate (Prevalence) 111 105 74Total true positives collected in 5 iterations 7,447 4,791 15,379Images labeled by user 2,104,076 2,542,889 2,734,070Images discarded by Eureka
  • 29. Compare with Naïve Hand-Labeling 1,000 10,000 100,000 1,000,000 Deer Taj Mahal Fire hydrant Images (TP+FP) the user inspected to collect ~100 true positives Naïve hand-labeling Single-pass early discard Eureka better
  • 30. Effect of the Iterative Workflow 30 0 5 10 0 20 40 60 80 Cumulative Minutes in Workflow Newly-discovered True Positives Per Minute Deer (Base rate=0.07%) Taj Mahal (0.02%) Fire Hydrant (0.005%) Due to lower base rate of target Bottlenecked by low base rate and computation (user’s waiting) better
  • 31. Proximity to Data Is Important 31 0 500 1000 10 Mbps 25 Mbps 100 Mbps 1 Gbps (LAN) ProcessingThroughput (images/s) Throttling bandwidth between compute and data. RGB color histogram filter US average: 18.7 Mbps (2017) better
  • 32. 32 Edge Computing Image Data Video Data Other multi- dimensional data (e.g., whole-slide image, HD map) Cloud Computing Smart Storage
  • 33. Why the Cloud? • Historically, many data sets have been centralized into the cloud • Elasticity: easy to recruit more compute resources by adding VMs • Trade off $$$ for better use of expert time
  • 34. Edge vs. Cloud • Edge: independent CPUs and disks; access to local disk is fast • Cloud (Amazon Web Services as example; EC2 instances reach S3 storage over the network): elasticity leads to separation of computing and storage layers; the I/O stack adds extra latency; contention for shared bandwidth
  • 35. Edge vs. Cloud Results • [Chart: throughput (images/s) vs. number of servers (1 to 8), Edge vs. Cloud (AWS), for the SIFT filter (expensive, compute-bound)] • [Chart: throughput (images/s) vs. number of servers (1 to 8), Edge vs. Cloud (AWS), for the RGB color histogram filter (cheap, data-bound)] • Fast (cheap) filters manifest I/O latency in the cloud.
  • 36. What Can We Do in the Cloud? • Use extra threads to pre-fetch data asynchronously • Utilizes the many cores • In practice: got throttled by the service provider • Cache data for later re-access • Utilizes the large main memory • Useful if the workload revisits data items
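The two mitigations on this slide, asynchronous pre-fetching with extra threads and caching for later re-access, can be combined in a few lines. This is a minimal sketch, not the system's implementation: `fetch` is a hypothetical stand-in for a slow remote read, and the `workers` and `depth` defaults are arbitrary.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def fetch(key: str) -> bytes:
    """Hypothetical stand-in for a slow read from remote object storage."""
    return key.encode() * 4


@lru_cache(maxsize=1024)
def cached_fetch(key: str) -> bytes:
    """Keep recently read objects in main memory for later re-access."""
    return fetch(key)


def prefetching_reader(keys, workers=4, depth=8):
    """Yield (key, data) in input order, keeping up to `depth` reads queued or in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for key in keys:
            pending.append((key, pool.submit(cached_fetch, key)))
            if len(pending) >= depth:
                k, future = pending.popleft()
                yield k, future.result()
        while pending:
            k, future = pending.popleft()
            yield k, future.result()


for key, data in prefetching_reader(["a", "b", "a"]):
    print(key, len(data))
```

The consumer overlaps its own computation with outstanding reads, while the cache only pays off when the workload revisits items, as the slide notes.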
  • 37. Edge Computing • Image Data • Video Data • Other multidimensional data (e.g., whole-slide image, HD map) • Cloud Computing • Smart Storage
  • 38. Eureka + Smart Storage (work in progress) • Smart storage = executing application logic in on-disk controllers • Today’s disk controllers are already small computers • Why do this? Storage is the first thing that scales with data; lower energy consumption; fast access to data; application knowledge can be passed to the storage system for optimization • Challenges: low compute capacity on device; difficulty of programming/debugging
  • 39. Optimizing Image Storage for Eureka • Object store semantics → no need for partial reads • Read only → no writes • Read order doesn’t matter → reduce disk seeks, exploit cache, etc.
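One way to exploit "read order doesn't matter" is to serve whole objects in ascending on-disk offset rather than in request order. The sketch below compares head movement for the two orders using a toy seek model. `StoredObject`, `plan_reads`, and the offsets are hypothetical, and real disks and controllers are considerably more complex.

```python
from typing import List, NamedTuple


class StoredObject(NamedTuple):
    name: str
    offset: int  # hypothetical start position on disk
    length: int  # object size; whole-object reads only (no partial reads)


def plan_reads(objects: List[StoredObject]) -> List[StoredObject]:
    """Since read order does not matter, serve objects in ascending offset."""
    return sorted(objects, key=lambda o: o.offset)


def total_seek_distance(order: List[StoredObject], head: int = 0) -> int:
    """Toy model: sum of head movements between the end of one read and the next start."""
    distance = 0
    for obj in order:
        distance += abs(obj.offset - head)
        head = obj.offset + obj.length
    return distance


requested = [
    StoredObject("A", 900, 100),
    StoredObject("B", 0, 100),
    StoredObject("C", 500, 100),
]
print(total_seek_distance(requested), total_seek_distance(plan_reads(requested)))
```

In this toy case the reordered plan moves the head far less than the request order, which is the kind of saving application knowledge makes available to the storage layer.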
  • 40. Edge Computing • Image Data • Video Data • Other multidimensional data (e.g., whole-slide image, HD map) • Cloud Computing • Smart Storage
  • 41. Activity Recognition in Video (work in progress) • Challenge 1: the extra time dimension • Both algorithmic and computational
  • 42. Eureka + Video Data • Challenge 2: gap between available training sets and real-world data • [Images: UCF101 data set (Soomro et al.) vs. surveillance video on Forbes Avenue near CMU]
  • 43. Eureka + Video Data • Challenge 3: complex search conditions • “Search for men” • “Search for a man wearing a red shirt running after a child on the street side” • Features needed • Combining techniques for video (frame sequence) and frame (static image) • Nesting detection • Correlating timestamp and location • Feeding back Eureka’s iterative approach
  • 44. Other Multidimensional Data • Examples • Whole-slide image pyramids in digital pathology • Map data • Challenges • Query interface • Efficient computation • What can be discovered from the data? Let’s build the telescope so that domain experts can discover craters
  • 45. Agenda • The Problem • Thesis Statement • Overview of Eureka • Research Thrusts • Related Work • Timeline 45
  • 46. Related Work: DNN Training and Inference • DNN structures • AlexNet, VGG, Inception, MobileNet, Faster R-CNN … • Software libraries • TensorFlow (Google), PyTorch (Facebook), … • Hardware accelerators • Movidius (Intel), TPU (Google), … • On constrained hardware • Model compression, model quantization, … • Video analytics • VideoStorm, NoScope, Focus, FilterForward, BlazeIt, VideoEdge, … • Premise: a “good” model exists to detect the object of interest • Ask: “how to run it faster?” • Batch or stream processing • My thesis is intrinsically human-in-the-loop and interactive, and has no good model to begin with
  • 47. Related Work: Training Data Augmentation and Synthesis • Traditional data augmentation • More intelligent data synthesis based on computer graphics and machine learning • [Figure source: Dwibedi et al. ICCV’17] • Problem: not truly diverse examples.
  • 48. Related Work: Human-sourced Labeling • [Comparison table; columns: Target Visual Data, Target Domain Experts, No Coding Barrier] • Crowd-sourcing: only useful for common targets; human-computer interaction, active learning, etc. (CVPR’13, ECCV’16, etc.) • Expert/crowd-sourcing: medical literature screening (HCOMP’15, etc.) • Snorkel: ask experts to write “labeling functions”; infer labels using statistical models (NeurIPS’16, VLDB’18, etc.) • My Thesis Proposal

Editor's notes

  1. Supervised learning
  2. Where does this data come from?
  3. AMT. ImageNet, COCO used crowd-sourcing.
  4. Valuable to have DNN …
  5. Each item is a single image. Only a few items pass and transmit to user.
  6. Can we improve the accuracy?
  7. Two things go hand-in-hand
  8. BW between data and compute.
  9. Although these are not scientific targets … To understand where the improvement comes from …
  10. Note y-axis in log scale.
  11. Numbers are small because of false positives. Iterate -> fewer false positives -> productivity goes up
  12. Single frame doesn’t tell action.
  13. Use a design
  14. On the application side …
  15. Cannot detect a new breed of cats