During this webinar Unity’s computer vision team provides an overview of computer vision, walks through current real-world data workflows, and explains why companies are moving toward synthetically generated data as an alternate data source for model training.
Watch the webinar: https://resources.unity.com/ai-ml/cv-webinar-dec-2021
Using synthetic data for computer vision model training
1. Using synthetic data for computer vision model training
WEBINAR: DECEMBER 9, 2021
Alex Thaman, Senior Manager, Computer Vision
Kevin Saito, Senior Manager, AI Commercialization
Salehe Erfanian Ebadi, Senior ML Developer, AMLR
2. Agenda
→ Computer vision overview and advantages of synthetic data
→ Applying synthetic data to production systems
→ Synthetic data case studies
→ Unity’s research with synthetic data
→ Synthetic data generators
→ Q&A
5. Training computer vision models on real-world data has been the answer, but...
● It’s time-consuming
● It’s biased and inefficient
● It’s expensive
● It’s not always privacy-compliant
Computer vision overview and advantages of synthetic data
6. Computer vision overview and advantages of synthetic data
Typical computer vision workflow:
Acquire real-world images* → Label and annotate images* → Train CV model → Evaluate CV model → Deploy CV model → (iterate)
* 70% of time is spent on data collection, labeling, and annotation
7. Solving challenges with data collection and data labeling
Data collection:
● Insufficient data for the project due to non-availability of data
● Privacy and compliance concerns hindering data collection
● Bias/errors in collected data, since it represents only a subset of the population
Data labeling:
● Human labeling is costly, time-consuming, and error-prone
8. Computer vision overview and advantages of synthetic data
Cost of labeling increases with task complexity:
● Object detection
● Semantic segmentation
● Instance segmentation
● Panoptic segmentation
[Figure: input images and their corresponding labels for each task]
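To make the labeling-cost comparison concrete, here is a hypothetical COCO-style annotation for a single object. The values are invented for illustration; the point is simply that a detection label is four numbers, while a segmentation label requires tracing a full polygon by hand.

```python
# Hypothetical COCO-style annotations for one object in one image.
# A bounding box is 4 numbers; a segmentation mask is a polygon with
# many more, which is why per-image labeling cost grows with the task.
detection_label = {
    "category": "person",
    "bbox": [120, 40, 64, 180],  # x, y, width, height
}

segmentation_label = {
    "category": "person",
    "bbox": [120, 40, 64, 180],
    # Polygon outline as a flat list (x1, y1, x2, y2, ...).
    "segmentation": [[120, 40, 150, 38, 184, 60, 180, 140,
                      160, 220, 130, 210, 118, 120]],
}

print(len(detection_label["bbox"]))                # numbers per box
print(len(segmentation_label["segmentation"][0]))  # numbers per (coarse) mask
```

A synthetic renderer emits both label types pixel-perfectly and at no extra annotation cost, which is the economic argument the slide is making.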
9. Impracticality of real-world data in many situations
● When there are many assets to label, or background labeling is required
● When variational differences are subtle
● When the situation occurs very infrequently or is impractical to capture
10. Domain randomization
Vary features of your dataset to make your model more robust:
→ Lighting
→ Background
→ Object orientation
→ Distractor objects
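The randomized dimensions above can be sketched as a simple parameter sampler. This is a minimal illustration in plain Python, not Unity's actual randomizer API; all parameter names and ranges are assumptions.

```python
import random

# A minimal sketch of domain randomization: every generated frame samples
# scene parameters (lighting, background, object pose, distractors) from
# broad ranges, so a model trained on the frames learns to ignore them.
BACKGROUNDS = ["warehouse", "office", "outdoor", "plain_gray"]

def randomize_scene(rng: random.Random) -> dict:
    return {
        "light_intensity": rng.uniform(0.2, 2.0),   # relative brightness
        "light_rotation": rng.uniform(0.0, 360.0),  # degrees
        "background": rng.choice(BACKGROUNDS),
        "object_yaw": rng.uniform(0.0, 360.0),      # object orientation
        "num_distractors": rng.randint(0, 10),      # clutter objects
    }

rng = random.Random(42)
frames = [randomize_scene(rng) for _ in range(1000)]
# Broad coverage: every background variant appears across the dataset.
print(sorted({f["background"] for f in frames}))
```

Each dictionary stands in for one rendered frame's configuration; in a real generator these values would drive the renderer and the labeler.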
11. Performance improvements with synthetic data
[Figure: detection results under complex orientations, complex configurations, and complex lighting, comparing models trained on real data only vs. synthetic + real. Green = accurate detections, red = missed detections.]
Training on synthetic + real data produced 20-30% more accurate detections under these conditions for this use case.
13. Bringing AI to production
PROBLEM
“Quality of Service”: how can I be sure that my system works well, and continues to work well, in the real world?
- Pre-production: development/production data mismatch, edge cases, selection bias
- Post-production: model drift, survivorship bias
SOLUTION
Model generalization with synthetic data via domain randomization
- A synthetic data solution != the real-world solution
- We want to leverage the programmability of synthetic data as a strength
Applying synthetic data to production systems
14. Why does it work?
- Domain randomization
  - Perturbations to the environment do not have to be realistic; they merely need to show variation along dimensions that also vary in the real world
    - Intervention Design for Effective Sim2Real Transfer - https://arxiv.org/pdf/2012.02055.pdf
  - Focuses on building “domain invariance”: if backgrounds should not matter for detecting objects, teach the model that the background does not matter
- Well-known research on domain randomization
  - Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (OpenAI) - https://arxiv.org/pdf/1703.06907.pdf
  - Structured Domain Randomization: Bridging the Reality Gap by Context-Aware Synthetic Data (NVIDIA) - https://arxiv.org/pdf/1810.10093.pdf
  - An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Instance Detection (Google Cloud AI) - https://arxiv.org/pdf/1902.09967.pdf
- Large sets of highly varied synthetic data plus small sets of real-world data produce the best results
16. Case studies: Neural Pocket
unity.com/case-study/neural-pocket
Customer problem: as a smart-city solutions provider, Neural Pocket needs scalable ways to train systems to recognize vehicles, people, and smartphones, and to identify potential security threats.
Resulting objective: reduce the cycle time and overall costs of creating production-ready computer vision models.
Cost of real-world data: using real-world data, Neural Pocket typically had to run 30 training cycles, which cost $60K-150K and took 4-6 months per project.
19. Case studies: Audere
resources.unity.com/ai-ml-content/audere-session
Customer problem: high labor costs to read COVID tests and report results, plus the possibility of human error at scale.
Resulting objective: build a mobile application that reads the result from a COVID test kit, improving reliability and reducing costs with minimal human oversight.
Problem they ran into: COVID kits change frequently (monthly), and test-result appearances vary widely even within a single kit. Kits are required to be stored in a windowless biosafety lab until deployment, so no real training data with natural lighting or shadows was available.
20. Case studies: Audere
● Locating the test kit parts (brand, diagnostic, etc.) → OBJECT DETECTION
● Reading test results as positive/negative → IMAGE CLASSIFICATION
21. Case studies: Audere
Approach
→ Create a digital copy of the test kits with an artist
→ Place test kits into Unity with random backgrounds, lighting, blur, etc.
→ Use procedural materials for the test kit strips to create high variation in test results
Results
→ Matched the performance of the full real-world dataset using 4x less real-world data plus ~8k synthetic images
→ Synthetic-trained models were more resilient to adverse conditions
23. Unity’s research with synthetic data
PeopleSansPeople
People + Sans (Middle English for “without”) + People
A data generator for a few human-centric computer vision tasks without needing real-world human data.
27. Unity’s research with synthetic data
What does PeopleSansPeople provide?
● 28 parameterized, simulation-ready 3D human assets
● 39 diverse animation clips
● 21,952 unique clothing textures (from 28 albedos, 28 masks, and 28 normals)
● Parameterized lighting
● Parameterized camera system
● Natural backgrounds
● Primitive occluders/distractors
● All packaged in a macOS and Linux binary
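The clothing-texture count on the slide is combinatorial: each albedo can pair with each mask and each normal map. A one-line check of the arithmetic:

```python
# Every albedo x mask x normal combination yields a distinct clothing texture.
albedos, masks, normals = 28, 28, 28
print(albedos * masks * normals)  # → 21952, matching the slide
```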
28. Unity’s research with synthetic data
Which CV tasks does PeopleSansPeople target?
● Human detection (2D and 3D bounding boxes)
● Human keypoint detection
● Human semantic/instance segmentation
43. Dataset statistics and analysis
[Figure: pose-diversity comparison across the COCO, JTA, and PeopleSansPeople (Synth) datasets]
● Synthetic data from PeopleSansPeople has a higher diversity of poses.
● Its pose footprint also encompasses those of COCO and JTA.
44. Model training
● Detectron2 Keypoint R-CNN R50-FPN model
● We train models from scratch on real and synthetic data
● We pre-train models on synthetic data and fine-tune them on real data
● In both cases, we:
  ○ use different subsets of the data (1%, 10%, 50%, and 100%)
  ○ perform evaluation on real data
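The subset protocol above can be sketched in a framework-agnostic way. The actual experiments use Detectron2's Keypoint R-CNN; the sketch below only shows the deterministic fractional sampling of the real training set, with a placeholder dataset size and image IDs.

```python
import random

# Sketch of the evaluation protocol: draw fixed-fraction subsets of the
# real training set (1%, 10%, 50%, 100%) for each training run.
def subset(dataset, fraction, seed=0):
    """Deterministically sample a fraction of the dataset (at least 1 item)."""
    k = max(1, int(len(dataset) * fraction))
    return random.Random(seed).sample(dataset, k)

# Placeholder for the real person-image training set.
real_data = [f"real_img_{i:05d}" for i in range(60000)]

for frac in (0.01, 0.10, 0.50, 1.00):
    split = subset(real_data, frac)
    print(frac, len(split))
    # from-scratch run:  fit(randomly_initialized_model, split)
    # fine-tuning run:   fit(model_pretrained_on_synthetic, split)
```

Fixing the seed makes the 1% subset a deterministic sample, so the from-scratch and fine-tuned models are compared on identical real data.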
47. Results
Benchmarks: COCO test-dev2017 and COCO person-val2017
● Adding more synthetic pre-training data boosts performance in both few-shot and full-shot training, although zero-shot performance suffers due to the domain gap.
● Unsurprisingly, adding more real fine-tuning data also increases performance.
48. Results
Comparison of gains obtained from synthetic pre-training vs. training from scratch and from ImageNet weights.
For domain-specific tasks, such as human-centric computer vision, domain-specific synthetic pre-training offers a much bigger advantage than ImageNet pre-training. The advantage is even more pronounced when fine-tuning data is scarce, as is the case with human data due to ethical, legal, and privacy reasons.
54. Creating a synthetic data generator
- Optimal synthetic data generation does not involve replicating real data collection strategies
  - Start with data diversity
  - Then focus on domain adaptation (as needed)
- Define your problem
  - What am I predicting?
  - What distributions do I know that I need?
  - Which variables am I uncertain about?
- Build a “Data Generator”: Assets + Sensor/Labeler + Randomizers → Data
  - These generators allow experimentation across ranges and distributions with multiple exposed “data hyperparameters”
- Scale in the cloud
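The "Assets + Sensor/Labeler + Randomizers → Data" recipe can be sketched as a tiny pipeline. This is a minimal, framework-agnostic illustration; the function names, scene representation, and hyperparameters are invented for the sketch and are not the Unity Perception API.

```python
import random

# Sketch of a data generator: assets, randomizers, and a labeler combine
# to emit labeled frames, steered by exposed "data hyperparameters".
ASSETS = ["box", "bottle", "can"]

def lighting_randomizer(scene, rng, hp):
    scene["light_intensity"] = rng.uniform(*hp["light_range"])

def placement_randomizer(scene, rng, hp):
    scene["objects"] = [
        {"asset": rng.choice(ASSETS),
         "x": rng.uniform(-1, 1), "y": rng.uniform(-1, 1)}
        for _ in range(rng.randint(1, hp["max_objects"]))
    ]

def labeler(scene):
    # A real sensor/labeler would render the scene and emit pixel-perfect
    # annotations; here we just read ground truth off the scene graph.
    return [{"category": o["asset"], "center": (o["x"], o["y"])}
            for o in scene["objects"]]

def generate(n_frames, hyperparams, seed=0):
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_frames):
        scene = {}
        for randomizer in (lighting_randomizer, placement_randomizer):
            randomizer(scene, rng, hyperparams)
        dataset.append({"scene": scene, "labels": labeler(scene)})
    return dataset

hp = {"light_range": (0.2, 2.0), "max_objects": 5}  # data hyperparameters
data = generate(100, hp)
print(len(data))  # one labeled frame per iteration, labels "for free"
```

Because the hyperparameters are exposed, sweeping `light_range` or `max_objects` is how you experiment across ranges and distributions, and the same generator scales out trivially in the cloud by sharding seeds.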
56. Asset sourcing
- You often need very specific objects for your use case (products, parts, etc.). There are multiple approaches to acquiring “digital twins”:
- Artist modeling
  - Contract artists to build assets or environments
  - Costs often reach up to $100 per object
  - Building assets for computer vision use cases is relatively new, and requirements are not yet well understood
- Scanning
  - Use a 3D scanner to capture all sides of the object
  - Works well for rectangular/boxy objects; more difficult for complex shapes
  - Typically needs artist cleanup/refinement
- Photogrammetry
  - Reconstruct a digital twin from overlapping photographs of the object
  - Many tools do not reliably handle reflections and transparency, and require artist cleanup/augmentation
- Procedural/parameterized models
  - Useful when you need wide variance within a particular semantic category
57. Asset sourcing – Unity Asset Store
Unity has a large collection of reusable 3D content and environments developed by our community of developers.
62. Common questions
- Any question involving the words “photorealism” or “ray tracing”
  - Importance depends on your starting point: existing data, target task, performance goals, training methodology. We have seen significant performance boosts without it.
- Isn’t data augmentation easier?
  - For some tasks it can be, but the sim2real gap still exists
  - Example: with compositing, it is difficult to manage occlusion diversity and to keep scene lighting/shadows consistent
- Can we use GANs for domain adaptation?
  - An active research area; no clear winners that generalize well yet
63. Feedback for us, and a chance for you to win a $150 Amazon gift card
→ Please click the link in the chat window (also shown below):
https://unitysoftware.co1.qualtrics.com/jfe/form/SV_dfXCjWzS5YOP2w6?&source=ondemand
→ We want to get a better sense of our audience and of topics that might interest you in future webinars
64. Q&A
Alex Thaman, Senior Manager, Computer Vision
Kevin Saito, Senior Manager, AI Commercialization
Salehe Erfanian Ebadi, Senior ML Developer, AMLR
unity.com/products/computer-vision