Synthetic Data Tools for
Computer Vision-Based AI
Chris Andrews
COO, Rendered.ai
chris@rendered.ai
Dan Hedges
Lead Solution Architect, Rendered.ai
dan@rendered.ai
Presenters
Chris Andrews
• COO & Head of Product, Rendered.ai
• 25 years of experience in commercial and government geospatial-related products and technologies
• 3D, enterprise integration, BIM-GIS, defense-related apps & solutions
• 15 years of experience with product development at companies including Esri, IBM, and Autodesk
Dan Hedges
• Lead Solution Engineer, Rendered.ai
• 11+ years of experience building geospatial solutions for industry verticals including urban planning, local government, and federal government
• Subject matter expertise in remote sensing, 3D data, and feature extraction
Rendered.ai’s cloud-hosted platform for synthetic
data enables customers to overcome the costs
and challenges of acquiring and using real data
for training and validating computer vision ML and
AI systems and algorithms
• Established in 2019 in Bellevue, WA
• Inclusive subscription encompasses 2D & 3D content
creation, simulation design, data generation
• Rapid setup and configuration for the shortest path to synthetic data generation for multiple applications
• Available on the AWS Marketplace
Synthetic Data experts with experience in:
• Remote sensing – Satellite, Aerial
• Ground-based imagery & video
• Non-visible EM spectra
• 2D and 3D modeling and simulation
• GAN training and dataset post processing
• Dataset comparison and validation
The Platform as a Service for Synthetic Data
The AI Data Problem
• Cost & time: Real data is expensive and time-consuming to acquire and label
• Bias: Rare objects and scenarios are hard to capture
• Innovation: Without data, it is impossible to explore new sensors and data types
• Privacy/security: Real data can carry security or high-risk information concerns that limit its usage
Intro to Synthetic Data
Synthetic Data solves the AI data problem
Rendered.ai is a PaaS and developer
framework for synthetic data
Synthetic Data is engineered data that
AI interprets as real data
"60% of data used for AI and data analytics projects will be synthetic, and by 2030, synthetic data will have completely overtaken real data in AI models."
- Gartner, September 2021
"Imagine if it were possible to produce infinite amounts of the world’s most valuable resource, cheaply and quickly… This is a reality today. It is called synthetic data."
- Forbes, July 2022
What do we mean by Synthetic Data?
Synthetic data can be created for any type of data used to train or
validate AI/ML systems, even for sensors or systems that don’t exist
CV-based synthetic data simulates the capture of bitmap sensor data, whether from sensors, recorded spatial patterns, or other CV input content
Physics-based synthetic data includes creation of 2D/3D/4D output
based on ‘digital twins’ of physical sensors, the sensor platform, and
the scene in which the sensor would operate
Rendered.ai can be used to generate any kind of synthetic data
Initial focus has been on physics-based synthetic data generation for
CV workflows
• RGB imagery and video, RGB microscopy, IR imagery, X-ray, SAR,…
Source: Wikipedia
Today’s AI workflow relies on finding or acquiring data
Acquire or find data → Train algorithm → Test algorithm → Accept/reject result
• Expensive, unpredictable data acquisition costs
• Difficulty training algorithms on inconsistent data
• Testing requires reuse of real datasets
• Results are limited to what can be achieved with real datasets
Tomorrow’s AI workflow incorporates synthetic data
Create data → Train algorithm → Test algorithm → Compare datasets
• Inexpensive, unlimited data generation
• 100% accurate labeling, consistent data
• Real datasets used for comparison and post-processing
• Data can be designed for edge or impossible cases and for removing bias
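One simple way to design data "for removing bias" is to work out how many synthetic examples each class needs so that rare classes stop being under-represented. The sketch below assumes the goal is equal counts per class; the function name and the class counts in the usage line are hypothetical, not figures from this presentation.

```python
from typing import Dict, Optional

def synthetic_quota(real_counts: Dict[str, int],
                    target_per_class: Optional[int] = None) -> Dict[str, int]:
    """Synthetic examples to generate per class so every class reaches the target.

    By default the target is the size of the largest real class (class balancing).
    """
    target = max(real_counts.values()) if target_per_class is None else target_per_class
    return {cls: max(0, target - n) for cls, n in real_counts.items()}

# Hypothetical counts from a real overhead-imagery dataset.
print(synthetic_quota({"car": 5000, "crane": 100, "crane_truck": 80}))
# {'car': 0, 'crane': 4900, 'crane_truck': 4920}
```

Balancing to the largest class is only one policy; passing `target_per_class` caps generation when the largest class is very large.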
Diagram: a hypothetical workflow (Simulator → Synthetic Dataset → AI Model) contrasted with a real-world workflow whose components are asset acquisition/integration, world building and procedural generation, a simulator, managed compute, dataset and metadata, post-processing/domain adaptation, quality assessment, and the AI model; the roles shown are synthetic data engineers, data scientist, and platform automation, and the goal is improved and explainable outcomes.
For more information: info@rendered.ai
Synthetic data generation steps
1. Scenario characterization - Data output, variability, problem(s) addressed or tested
2. World building - Asset and scene content composition and aggregation
3. Sensor modeling & simulation - Rendering, visual effects, environmental effects
4. Annotation & mask calculation
5. Job execution & dataset compilation
6. Annotation mapping
7. Domain Adaptation post-processing
8. Dataset characterization and comparison
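The steps above can be made concrete with a deliberately tiny, self-contained pipeline. This is a sketch under stated assumptions, not Rendered.ai's implementation: the scenario values, the `crane`/`crane_truck` classes, the `LABEL_MAP` target taxonomy, and every function name are hypothetical; the "sensor" just paints bright squares over noise; and steps 7 (domain adaptation) and 8 (dataset characterization) are omitted. What it does illustrate is that masks and boxes are computed from the scene itself, so labels are exact and consistently sized.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# 1. Scenario characterization (hypothetical toy values).
SCENARIO = {
    "image_size": 32,             # square images, pixels
    "objects_per_image": (1, 3),  # inclusive range
    "object_size": (3, 8),        # side length of square objects, pixels
    "classes": ["crane", "crane_truck"],
}

@dataclass
class SceneObject:
    label: str
    x: int
    y: int
    size: int

def build_world(rng: random.Random) -> List[SceneObject]:
    """2. World building: compose a scene by randomly placing objects."""
    s = SCENARIO
    objects = []
    for _ in range(rng.randint(*s["objects_per_image"])):
        size = rng.randint(*s["object_size"])
        objects.append(SceneObject(
            label=rng.choice(s["classes"]),
            x=rng.randint(0, s["image_size"] - size),
            y=rng.randint(0, s["image_size"] - size),
            size=size,
        ))
    return objects

def simulate_sensor(objects: List[SceneObject], rng: random.Random, noise: float = 0.05):
    """3 + 4. Toy rendering plus an instance mask computed from the same geometry."""
    n = SCENARIO["image_size"]
    image = [[0.1 + rng.gauss(0.0, noise) for _ in range(n)] for _ in range(n)]
    mask = [[0] * n for _ in range(n)]  # 0 = background, k = k-th object
    for k, obj in enumerate(objects, start=1):  # later objects occlude earlier ones
        for y in range(obj.y, obj.y + obj.size):
            for x in range(obj.x, obj.x + obj.size):
                image[y][x] = 0.8 + rng.gauss(0.0, noise)
                mask[y][x] = k
    return image, mask

def mask_to_boxes(mask: List[List[int]]) -> Dict[int, Tuple[int, int, int, int]]:
    """4. Tight (x, y, w, h) box for every instance still visible in the mask."""
    ext = {}
    for y, row in enumerate(mask):
        for x, k in enumerate(row):
            if k:
                x0, y0, x1, y1 = ext.get(k, (x, y, x, y))
                ext[k] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return {k: (x0, y0, x1 - x0 + 1, y1 - y0 + 1) for k, (x0, y0, x1, y1) in ext.items()}

# 6. Annotation mapping: internal labels to a hypothetical target taxonomy.
LABEL_MAP = {"crane": "construction_equipment", "crane_truck": "vehicle"}

def compile_dataset(num_images: int, seed: int = 0) -> List[dict]:
    """5. Job execution & dataset compilation: one seeded job per image."""
    dataset = []
    for i in range(num_images):
        rng = random.Random(f"{seed}:{i}")  # reproducible per-image seed
        objects = build_world(rng)
        image, mask = simulate_sensor(objects, rng)
        annotations = [
            {"category": LABEL_MAP[objects[k - 1].label], "bbox": box}
            for k, box in sorted(mask_to_boxes(mask).items())
        ]
        dataset.append({"id": i, "image": image, "mask": mask, "annotations": annotations})
    return dataset
```

Deriving boxes from the mask, rather than from the placed geometry, keeps boxes consistent with what is actually visible after occlusion.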
New AI job: Synthetic Data Engineer
If most data used to train AI will be synthetic…
…who will be engineering the data?
Design & engineer datasets to achieve specific AI outcomes
Software development-oriented
• Python, data science, 3D, game engines
Domain or industry expertise
Expert in specific data types & technologies
• Sensors, Renderers, Modeling, Simulation
What about Generative AI?
Physics-based synthetic data
• Starts from a 3D simulation
• Can add wide variation including absurd,
unnatural, or extremely rare phenomena
• Can generate multiple ‘maps’ for depth,
instances, surfaces, normals, motion
• Can generate fully pixel-labeled content
• Can incorporate accurate physics-based
models for imagery generation
Generative AI (2023)
• Starts with large, known datasets
• Can add variation, but only by adding more training data
• Cannot generate extra maps with information about the scene
• Cannot label at the pixel level
• Does not incorporate physics-based models
Generative AI is moving fast, and we see it as another tool, both for content generation and for post-processing or consuming other synthetic data
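The physics-based column's claim that one simulated scene yields several pixel-aligned maps can be illustrated with a toy nadir-looking sensor. Because depth and instance IDs are computed from the same geometry with a z-buffer, every pixel is labeled exactly. This is a minimal sketch; the `Box` scene primitive, the 100 m sensor altitude, and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Box:
    """Axis-aligned block seen from directly above (hypothetical scene primitive)."""
    instance_id: int
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    height_m: float

def render_maps(boxes: List[Box], width: int, height: int,
                sensor_altitude_m: float = 100.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Return pixel-aligned depth (metres to sensor) and instance-ID maps."""
    depth = [[sensor_altitude_m] * width for _ in range(height)]  # bare ground
    instance = [[0] * width for _ in range(height)]               # 0 = background
    for b in boxes:
        d = sensor_altitude_m - b.height_m
        for y in range(b.y0, b.y1):
            for x in range(b.x0, b.x1):
                if d < depth[y][x]:  # z-buffer test: keep the nearest surface
                    depth[y][x] = d
                    instance[y][x] = b.instance_id
    return depth, instance

# Usage: a tall crane (id 2) partly overlapping a lower truck (id 1).
depth, inst = render_maps([Box(1, 0, 0, 3, 2, 4.0), Box(2, 2, 0, 4, 2, 30.0)],
                          width=5, height=2)
for row in inst:
    print(row)  # [1, 1, 2, 2, 0] on both rows
```

Normals, surface types, or motion vectors would be further maps filled in the same loop from the same geometry.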
New AI job: Prompt Engineer
In the world of Generative AI, someone needs to tell the AI what to
produce!
Design & engineer inputs to Generative AI systems to achieve specific
outcomes
Narrative-oriented
• Good at defining context, describing problems
Domain or industry expertise
Expert in specific data types & technologies
• Sensors, Renderers, Modeling, Simulation
Common gaps when introducing customers to synthetic data
• Hyper-focus on the bounds of found or acquired data only
• Most data scientists aren’t sensor experts
• Concern about ‘good data’
• Concern about one-off datasets vs. investment in data
• Belief that human perception is good enough to judge data quality
• Confusion over Generative AI vs. simulation techniques
… Note that the biggest hurdle is that customers rarely stop to ask what ideal dataset would address their business problem!
Synthetic data generation is an empirical process
Identify the problem → Describe the (ideal) data → Generate data → Can I achieve any training? → Refine data generation → Can I improve training?
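The generate, train, refine loop above can be sketched as a search over data-generation parameters. This is a minimal sketch, assuming a greedy coordinate search; `score` is a hypothetical stand-in for "generate a dataset with these parameters, train a model, and return a validation metric", and the parameter names and the toy scoring function are invented for illustration.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]

def refine_generation(score: Callable[[Params], float], start: Params,
                      step: Dict[str, float], max_rounds: int = 20) -> Tuple[Params, float]:
    """Greedy coordinate search over data-generation parameters (higher score is better)."""
    best, best_score = dict(start), score(start)
    for _ in range(max_rounds):
        improved = False
        for name, delta in step.items():
            for sign in (1, -1):
                candidate = dict(best)
                candidate[name] += sign * delta
                s = score(candidate)
                if s > best_score:
                    best, best_score, improved = candidate, s, True
        if not improved:  # "Can I improve training?" No: stop refining.
            break
    return best, best_score

# Toy stand-in for train-and-evaluate: peaks at noise=0.3, objects_per_image=5.
def toy_score(p: Params) -> float:
    return -(p["noise"] - 0.3) ** 2 - (p["objects_per_image"] - 5) ** 2

best, s = refine_generation(toy_score, {"noise": 0.0, "objects_per_image": 1},
                            step={"noise": 0.1, "objects_per_image": 1})
print(best)
```

In practice each `score` call is expensive (a full generation and training run), so coarse steps and few rounds are the norm.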
Supporting GEOINT workflows with continuously evolving AI
Channel development (GIS Developer, Database Engineer, Synthetic Data Engineer):
Model digital sensor → Aggregate & create scene content → Create Channel configuration → Publish to Rendered.ai
Graph configuration and job execution (GIS Analyst, Computer Vision Engineers, Data Scientists & Automated Workflows):
Add Channel to Workspace → Create & configure Graphs → Run Jobs → Datasets (Annotation, Images, Masks, Statistics) → Train and Evaluate AI (GIS tools, Data Science toolkits, Embedded AI tools)
Feedback: Change graph configuration; Add/update sensor configuration, scene content, scene configuration
Architecture of Synthetic Data Channels
Diagram: channel pipeline of scene config & generation → render simulation → post-processing → dataset packaging, running on a cloud-hosted PaaS (COTS).
• Channel components: sensor configuration, platform configuration, sensor simulators, environment effects, objects of interest, animated people & animals, geospatial services, textures & shaders
• Scene generation: world construction, scenario composition, content distribution, environmental conditions, filters
• Platform services: job manager, user management & roles, archive & search, content volumes, remote access APIs, characterization (UMAP), annotation microservices, CycleGAN microservices
• Outputs: images, masks, statistics, annotation
Channels become open-source examples for users to build upon
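The channel pipeline above (scene configuration and generation, rendering, post-processing, dataset packaging) behaves like a node graph executed in dependency order. The sketch below shows that idea with Python's standard `graphlib`; the node names and payloads are hypothetical, and this is not the Rendered.ai SDK or its graph format.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Set

def run_graph(nodes: Dict[str, Callable[..., Any]],
              deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run every node once, after its dependencies; upstream outputs become kwargs."""
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, set())}
        results[name] = nodes[name](**upstream)
    return results

# Hypothetical nodes mirroring the pipeline stages.
nodes = {
    "sensor": lambda: {"bands": ["R", "G", "B"], "gsd_m": 0.5},
    "scene": lambda: {"objects": ["crane", "crane_truck"]},
    "render": lambda sensor, scene: {
        "image": f"{len(sensor['bands'])}-band image, {len(scene['objects'])} objects"},
    "post_process": lambda render: {**render, "domain_adapted": True},
    "package": lambda post_process, scene: {
        "image": post_process["image"], "labels": scene["objects"]},
}
deps = {
    "render": {"sensor", "scene"},
    "post_process": {"render"},
    "package": {"post_process", "scene"},
}
print(run_graph(nodes, deps)["package"])
# {'image': '3-band image, 2 objects', 'labels': ['crane', 'crane_truck']}
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which is a cheap validity check for user-edited graphs.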
Don’t rebuild everything for every AI application
Applications: Remote Sensing, Supply Chain, Object detection, Automotive, Economic monitoring, Medical Imaging, Security, …
Sensors: Radar Imagery, RGB Camera, Panchromatic, Infrared, High-Definition Radar, Microscopy, X-Ray/CT Scan, MRI, …
Reusable modular architecture in the cloud
• Content pipelines
• Sensor models
• Analytics toolsets
• AI integrations
Enabling access to synthetic data as an enterprise capability
Channel Development | Blender
Content Code: SATRDEMO
- Dependencies installed:
  - Blender and Python (versions harmonized), OpenCV, GPU drivers, Ana, Anatools SDK
- Channels can be edited and deployed with the SDK
- Offered as an AMI or from git with a .devcontainer for VS Code
- ArcGIS integration for 2D raster backgrounds
Custom Code
Available now
Case study: EO scenarios
Searching for cranes and crane trucks as an economic indicator in satellite imagery
Objects are rare relative to other features in overhead imagery, which means very large labeling campaigns are needed to collect examples. The original dataset had only ~100 examples of each class.
Objects are difficult to label. Inconsistent sizing of crane bounding boxes and similarities between crane trucks and cement pumps were two notable challenges in the real datasets.
Synthetic and real datasets: 2-3x improvement in AP scores over peak performance without synthetic data
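The case study combines synthetic and real data, but the slides do not say in what proportion. Below is a minimal sketch of assembling a mixed training set with a tunable synthetic fraction; the function name, the sampling-with-replacement choice, and the 80% value in the usage line are assumptions, not the case study's recipe.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def mix_datasets(real: Sequence[T], synthetic: Sequence[T], synthetic_fraction: float,
                 size: int, seed: int = 0) -> List[Tuple[str, T]]:
    """Sample a shuffled training set in which ~synthetic_fraction of items are synthetic.

    Sampling is with replacement, so a small real set (e.g. ~100 examples per class)
    can still fill its share of a larger mixed set.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_synthetic = round(size * synthetic_fraction)
    mixed = [("real", x) for x in rng.choices(real, k=size - n_synthetic)]
    mixed += [("synthetic", x) for x in rng.choices(synthetic, k=n_synthetic)]
    rng.shuffle(mixed)  # interleave sources so batches see both
    return mixed

batch = mix_datasets(real=[f"real_{i}" for i in range(100)],
                     synthetic=[f"syn_{i}" for i in range(5000)],
                     synthetic_fraction=0.8, size=1000)
print(Counter(source for source, _ in batch))  # Counter({'synthetic': 800, 'real': 200})
```

Keeping the source tag on each item makes it easy to sweep `synthetic_fraction` and compare AP on a real-only validation set.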
Channel Development | DIRSIG™
Content Code: DIRSIGDEMO
- DIRSIG accessed through Python and a web interface
- Channels can be edited and deployed with the SDK
- No RIT DIRSIG training required!
Custom Code
Available now
Example Applications: Hyperspectral imaging, multispectral imaging
• Unique relationship with RIT allows Rendered.ai to package DIRSIG in synthetic data channels for customers
• MSI, HSI, and other radiometrically complex imagery output
• Validation possible with calibration panels and 3rd-party consulting
• Pixel-level geospatial accuracy
• Geospatially accurate, high-resolution scene content used in cloud-based generation of very large datasets
RGB bands from MSI and HSI images created with DIRSIG and Rendered.ai
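Producing an RGB preview from an MSI or HSI cube usually means choosing the bands closest to red, green, and blue wavelengths and stretching each for display. This is a minimal sketch; the 640/550/460 nm targets and the min-max stretch are common conventions assumed here, not DIRSIG or Rendered.ai settings, and the cube is a plain nested list (bands × rows × columns).

```python
from typing import List, Sequence, Tuple

Band = List[List[float]]  # rows x columns

def nearest_band(centers_nm: Sequence[float], target_nm: float) -> int:
    """Index of the band whose center wavelength is closest to target_nm."""
    return min(range(len(centers_nm)), key=lambda i: abs(centers_nm[i] - target_nm))

def stretch(band: Band) -> Band:
    """Min-max scale one band to [0, 1] for display."""
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat band
    return [[(v - lo) / span for v in row] for row in band]

def cube_to_rgb(cube: Sequence[Band], centers_nm: Sequence[float],
                rgb_nm: Tuple[float, float, float] = (640.0, 550.0, 460.0)) -> List[Band]:
    """Pick the bands nearest to red, green, and blue and stretch each one."""
    return [stretch(cube[nearest_band(centers_nm, t)]) for t in rgb_nm]

# Hypothetical 6-band cube, 2 x 2 pixels, bands every 50 nm from 450 nm.
centers = [450.0, 500.0, 550.0, 600.0, 650.0, 700.0]
cube = [[[float(i), i + 1.0], [i + 2.0, i + 3.0]] for i in range(6)]
rgb = cube_to_rgb(cube, centers)
print([nearest_band(centers, t) for t in (640.0, 550.0, 460.0)])  # [4, 2, 0]
```

Averaging several bands per color, or weighting by a camera's spectral response, gives more faithful color than a single nearest band.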
Channel Development | Omniverse
Available on request
• Preinstalled dependencies:
  • USD, Python, OpenCV, GPU drivers, Ana, Anatools SDK
• Edit and deploy channels to Rendered.ai with the SDK
• Offered as an AMI or from git with a .devcontainer for VS Code
Custom Code
Example Applications: Omniverse Replicator channel
• Use an industry-leading 3D toolkit in the cloud
• Configurable in a web-based SaaS experience
• Starting place for users who may already have some experience with or investment in NVIDIA tools
• Familiar architecture that extends to multiple use cases
Synthetic imagery chips generated with Omniverse Replicator running inside Rendered.ai on AWS
Example Application: Synthetic Aperture Radar
Enterprise & Developer Subscription Customers
• Experimental, cutting-edge Synthetic Aperture Radar simulation built by Rendered.ai
• SAR output is not human-readable, making human labeling impossible
• Emerging commercial SAR industry seeking better tools for exploitation and value creation
• Applications in defense, disaster response, Earth observation & monitoring, and insurtech
Synthetic SAR images generated using Rendered.ai: identical object shown with several image capture scenarios
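Part of why SAR imagery is hard for people to read is speckle, the granular interference pattern produced by coherent imaging. Below is a minimal sketch of the standard multiplicative, gamma-distributed speckle model for multi-look intensity images; it only illustrates speckle statistics, it is not Rendered.ai's SAR simulator, and the function name and parameters are hypothetical.

```python
import random
from typing import List

def add_speckle(intensity: List[List[float]], looks: int = 1, seed: int = 0) -> List[List[float]]:
    """Multiplicative speckle for an L-look SAR intensity image.

    Each pixel is multiplied by Gamma(shape=L, scale=1/L) noise, which has mean 1,
    so the underlying reflectivity is preserved on average.
    """
    if looks < 1:
        raise ValueError("looks must be >= 1")
    rng = random.Random(seed)
    return [[v * rng.gammavariate(looks, 1.0 / looks) for v in row] for row in intensity]

clean = [[1.0] * 200 for _ in range(200)]  # uniform reflectivity
noisy = add_speckle(clean, looks=4)
pixels = [v for row in noisy for v in row]
print(round(sum(pixels) / len(pixels), 2))  # close to 1.0
```

More looks lower the speckle variance (1/L for unit-mean intensity) at the cost of resolution, which is why the number of looks is a natural variation parameter for synthetic SAR.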
Example Application: Marine Imagery
Enterprise tier customer
• Vessel detection in open-ocean scenarios for defense and contraband interdiction
• Supporting edge-based, onboard object detection systems
• Variable weather, wave, and obstruction characteristics
• Variable object placement generators
Synthetic RGB images simulating marine UAV imagery capture
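A "variable object placement generator" can be as simple as rejection sampling: draw random vessel positions and keep only those that do not overlap vessels already placed, then draw per-scene environment parameters. This is a minimal sketch; the circular footprints, the 1 km scene, the parameter names, and the restriction to Douglas sea-state codes 0 to 6 are all hypothetical choices, not the customer channel's configuration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Vessel:
    x_m: float
    y_m: float
    radius_m: float  # footprint approximated by a circle

def place_vessels(n: int, area_m: float = 1000.0,
                  radius_range: Tuple[float, float] = (5.0, 30.0),
                  max_tries: int = 10_000, seed: int = 0) -> List[Vessel]:
    """Rejection sampling: keep random placements that do not overlap earlier ones."""
    rng = random.Random(seed)
    placed: List[Vessel] = []
    for _ in range(max_tries):
        if len(placed) == n:
            break
        r = rng.uniform(*radius_range)
        v = Vessel(rng.uniform(r, area_m - r), rng.uniform(r, area_m - r), r)
        if all((v.x_m - p.x_m) ** 2 + (v.y_m - p.y_m) ** 2 >= (v.radius_m + p.radius_m) ** 2
               for p in placed):
            placed.append(v)
    return placed  # may hold fewer than n vessels if the area is too crowded

def sample_conditions(rng: random.Random) -> Dict[str, float]:
    """Hypothetical per-scene environment parameters."""
    return {
        "sea_state": rng.randint(0, 6),           # Douglas sea-state code
        "cloud_cover": rng.random(),              # fraction of sky
        "visibility_km": rng.uniform(0.5, 20.0),  # fog/haze
    }
```

Rejection sampling is simple and unbiased but slows down as the scene fills up, hence the `max_tries` cap.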
Examples of synthetic CV content
• Satellite visible
• Synthetic IR (MWIR)
• Synthetic SAR
• Security imaging
• FLIR camera
• X-ray and CT scans
• Urban & natural environments
• Industrial and residential settings
Over 1.2TB of synthetic images produced, with channel coverage growing
And after you have your imagery… compare it!
Creating datasets is a starting point
Training and validation are next
Compare datasets to explore similarity
• Real-synthetic, synthetic-synthetic
Use tools such as UMAP and FID
Use inference results to change synthetic data generation (SDG)
Try again!
UMAP analysis enables data scientists to explore similarities and differences in the parameter embedding space of multiple datasets
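FID, named on the slide, is the Fréchet distance between Gaussians fitted to feature embeddings (conventionally Inception-v3 activations) of two image sets: ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^(1/2)). Below is a minimal sketch, assuming you already have feature vectors and accepting a diagonal-covariance simplification that avoids a matrix square root; it is not a drop-in FID implementation, and the function name is hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence

def frechet_distance_diag(feats_a: Sequence[Sequence[float]],
                          feats_b: Sequence[Sequence[float]]) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Simplification: each feature dimension is treated as independent (diagonal
    covariance), so the trace term reduces to sum((sigma_a - sigma_b) ** 2).
    """
    total = 0.0
    for d in range(len(feats_a[0])):
        a = [f[d] for f in feats_a]
        b = [f[d] for f in feats_b]
        mu_a, mu_b = fmean(a), fmean(b)
        var_a, var_b = pvariance(a, mu_a), pvariance(b, mu_b)
        total += (mu_a - mu_b) ** 2 + var_a + var_b - 2.0 * math.sqrt(var_a * var_b)
    return total

real = [[0.0, 0.0], [2.0, 2.0]]
synthetic = [[1.0, 1.0], [3.0, 3.0]]  # same spread, mean shifted by 1 per dimension
print(frechet_distance_diag(real, synthetic))  # 2.0
```

Lower is more similar; comparing real-synthetic and synthetic-synthetic distances across generation settings is one way to decide what to change before trying again.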
Demo
Typical challenges in adopting synthetic data
Internal
• Past experience with the cost or failure of one-off synthetic data experiments
• Unprepared for experimentation
• Effort to achieve an acceptable level of realism
• Complexity/difficulty with physics-based modeling
External
• Information about emerging tools
• TCO of yet another IT project
• Talent shortage
• Lack of benchmarks/standards
─ Need for analytic tools
─ Need for sensitivity analysis
• Lack of industry collaboration
Opportunity of Synthetic Data
• Supplement real data
• Evaluate and remove bias
• Reduce expensive dataset labeling and reacquisition
• Explore scenarios
• Simulate sensor models and collection techniques
• Create novel data with zero PII or security concerns
Synthetic data as a standard
Synthetic data is rapidly moving from uncertain value to required tool. Synthetic
data has the opportunity to be used as part of regulatory and ethical frameworks
around bias reduction, demonstrable sensitivity analysis, and reducing the need
for human curation of training data.
Regulatory & compliance
• Bias reduction and testing
• Sensitivity analysis
• Efficacy demonstration
• Removing human-in-the-loop from ethical/harmful scenarios
Synthetic data as an enabler for innovation
As synthetic data generation capabilities improve and become more
accessible, users will have expanded opportunity to experiment,
innovate, and build AI without expensive or impossible real sensor
dataset collection.
Innovation
• Complex sensor fusion
• New & hard-to-acquire sensors
• New dataset combinations
• Digital Twins
Synthetic data driving sustainability
Synthetic data is 100% reliably labeled, has been shown to reduce the size
of training datasets, and potentially reduces the need for real sensor-based
data collection.
Cost and impact
• Reducing labeling costs
• Reducing collection costs
• Reducing environmental footprint of real sensor data collection
• Enabling innovation without physical material consumption/investment
Wrap up
For slides and supporting content:
https://bit.ly/GEOINT2023
Try it at:
https://rendered.ai/getstarted.html
Thank you
chris@rendered.ai
dan@rendered.ai


2023 GEOINT Tutorial - Synthetic Data Tools for Computer Vision-Based AI - Rendered.ai

  • 1.
  • 2. Synthetic Data Tools for Computer Vision-Based AI Chris Andrews COO, Rendered.ai chris@rendered.ai Dan Hedges Lead Solution Architect, Rendered.ai dan@rendered.ai
  • 3. Presenters 3 • COO & Head of Product, Rendered.ai • 25 years experience in commercial and government geospatial-related products and technologies • 3D, enterprise integration, BIM-GIS, defense-related apps & solutions • 15 years experience with product development at companies including Esri, IBM, and Autodesk • Lead Solution Engineer, Rendered.ai • 11+ years experience building geospatial solutions for industry verticals including urban planning, local government, federal government • Subject matter expertise in remote sensing, 3D data, and feature extraction Chris Andrews Dan Hedges
  • 4. Rendered.ai’s cloud-hosted platform for synthetic data enables customers to overcome the costs and challenges of acquiring and using real data for training and validating computer vision ML and AI systems and algorithms • Established in 2019 in Bellevue, WA • Inclusive subscription encompasses 2D & 3D content creation, simulation design, data generation • Rapid setup and configuration for shortest path to synthetic data generation for multiple applications • Available on the AWS Marketplace Synthetic Data experts with experience in: • Remote sensing – Satellite, Aerial • Ground-based imagery & video • Non-visible EM spectra • 2D and 3D modeling and simulation • GAN training and dataset post processing • Dataset comparison and validation The Platform as a Service for Synthetic Data Partnering with Member
  • 5. The AI Data Problem BIAS COST & TIME INNOVATION PRIVACY/SECURITY Real data is expensive and often costly and time consuming to acquire and label Rare objects and scenarios are hard to capture Without data it is impossible to explore new sensors and data types Real data can have security or high-risk information concerns that limit usage
  • 7. Synthetic Data solves the AI data problem Rendered.ai is a PaaS and developer framework for synthetic data Synthetic Data is engineered data that AI interprets as real data 60% of data used for AI and data analytics projects will be synthetic, and by 2030, synthetic data will have completely overtaken real data in AI models. - Gartner, September 2021 Imagine if it were possible to produce infinite amounts of the world’s most valuable resource, cheaply and quickly… This is a reality today. It is called synthetic data. - Forbes, July 2022
  • 8. What do we mean by Synthetic Data? Synthetic data can be created for any type of data used to train or validate AI/ML systems, even for sensors or systems that don’t exist CV-based synthetic data simulates bitmap sensor data capture whether from sensors, recorded spatial patterns, or other CV input content Physics-based synthetic data includes creation of 2D/3D/4D output based on ‘digital twins’ of physical sensors, the sensor platform, and the scene in which the sensor would operate Rendered.ai can be used to generate any kind of synthetic data Initial focus has been on physics-based synthetic data generation for CV workflows • RGB imagery and video, RGB microscopy, IR imagery, X-ray, SAR,… Source: Wikipedia
  • 9. Today’s AI workflow relies on finding or acquiring data Acquire or find data Train algorithm Test algorithm Accept/Reject result Expensive, unpredictable data acquisition costs Difficulty training algorithms on inconsistent data Testing requires reuse of real datasets Results are limited to what can be achieved with real datasets
  • 10. Tomorrow’s AI workflow incorporates synthetic data
    Create data → Train algorithm → Test algorithm → Compare datasets
    • Inexpensive, unlimited data generation
    • 100% accurate labeling, consistent data
    • Real datasets used for comparison and post-processing
    • Data can be designed for edge or impossible cases and for removing bias
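Designing data "for edge or impossible cases and for removing bias" largely comes down to treating the class and condition mix as an input rather than an accident of collection. A minimal sketch, assuming a hypothetical scene specification: the class names are borrowed from the crane case study later in the deck, and the environmental parameters are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical scene parameters; a real channel would use these to build scenes.
OBJECT_CLASSES = ["crane", "crane_truck", "cement_pump"]
TIMES_OF_DAY = ["dawn", "noon", "dusk"]


def sample_scenarios(n, class_weights, seed=0):
    """Draw n hypothetical scene specifications with a designed class mix.

    With collected imagery the class mix is whatever the world provides; here it is
    a parameter, so rare classes can be up-weighted to balance the training set.
    """
    rng = random.Random(seed)  # seeded so the same dataset can be regenerated
    classes = rng.choices(OBJECT_CLASSES, weights=class_weights, k=n)
    return [
        {
            "object_class": c,
            "time_of_day": rng.choice(TIMES_OF_DAY),
            "cloud_cover": rng.uniform(0.0, 1.0),  # fraction of sky covered
        }
        for c in classes
    ]


# Equal weights: each class appears roughly n/3 times, regardless of real-world rarity.
balanced = sample_scenarios(3000, class_weights=[1, 1, 1])
counts = Counter(s["object_class"] for s in balanced)
print(counts)
```

Setting a weight to zero removes a class entirely, and raising one oversamples it; the same idea extends to rare conditions such as unusual lighting or weather.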
  • 11. [Diagram: a hypothetical workflow (Simulator → Synthetic Dataset → AI Model) contrasted with the real-world workflow, whose components include asset acquisition/integration, world building and procedural generation, the simulator, managed compute, dataset and metadata, post-processing/domain adaptation, quality assessment, and the AI model, producing improved and explainable outcomes. Roles shown: synthetic data engineers, data scientists, and platform automation.]
    For more information: info@rendered.ai
  • 12. Synthetic data generation steps
    1. Scenario characterization: data output, variability, problem(s) addressed or tested
    2. World building: asset and scene content composition and aggregation
    3. Sensor modeling & simulation: rendering, visual effects, environmental effects
    4. Annotation & mask calculation
    5. Job execution & dataset compilation
    6. Annotation mapping
    7. Domain adaptation post-processing
    8. Dataset characterization and comparison
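Step 6, annotation mapping, is where labels produced by the simulator are translated into the schema a customer’s model expects. The sketch below is one plausible reading of that step, not Rendered.ai’s implementation; the class names, category IDs, and the `map_annotations` helper are all hypothetical.

```python
# Hypothetical mapping from a channel's internal object names to a target
# dataset's category IDs (e.g., the label schema a customer's model already uses).
CLASS_MAP = {
    "crane_tower": 1,
    "crane_mobile": 1,  # several simulated variants can collapse to one label
    "truck_crane": 2,
    "cement_pump": 3,
}


def map_annotations(annotations, class_map):
    """Relabel synthetic annotations into the target schema.

    Each annotation is a dict with at least 'class' and 'bbox' keys. Annotations
    whose class has no mapping (e.g., distractor objects) are dropped.
    """
    mapped = []
    for ann in annotations:
        category_id = class_map.get(ann["class"])
        if category_id is None:
            continue
        mapped.append({"category_id": category_id, "bbox": ann["bbox"]})
    return mapped


raw = [
    {"class": "crane_tower", "bbox": [10, 20, 30, 40]},
    {"class": "tree", "bbox": [0, 0, 5, 5]},
    {"class": "truck_crane", "bbox": [50, 60, 20, 10]},
]
print(map_annotations(raw, CLASS_MAP))
# [{'category_id': 1, 'bbox': [10, 20, 30, 40]}, {'category_id': 2, 'bbox': [50, 60, 20, 10]}]
```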
  • 13. New AI job: Synthetic Data Engineer
    If most data used to train AI will be synthetic… who will be engineering the data?
    • Design & engineer datasets to achieve specific AI outcomes
    • Software development-oriented: Python, data science, 3D, game engines
    • Domain or industry expertise
    • Expert in specific data types & technologies: sensors, renderers, modeling, simulation
  • 14. What about Generative AI?
    Physics-based synthetic data:
    • Starts from a 3D simulation
    • Can add wide variation, including absurd, unnatural, or extremely rare phenomena
    • Can generate multiple ‘maps’ for depth, instances, surfaces, normals, motion
    • Can generate fully pixel-labeled content
    • Can incorporate accurate physics-based models for imagery generation
    Generative AI (2023):
    • Starts with large, known datasets
    • Can add variation, but only by adding more training data
    • Cannot generate extra maps with information in the scene
    • Cannot label at the pixel level
    • Does not incorporate physics-based models
    Generative AI is moving fast, and we see it as another tool both for content generation and for post-processing or consuming other synthetic data.
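Because a simulator knows which object produced every pixel, pixel-level labels come for free, and coarser labels such as bounding boxes can be derived from them. A minimal sketch, assuming the renderer emits an instance-ID map as a 2D list of integers with 0 as background (real pipelines would typically use image arrays):

```python
def boxes_from_instance_mask(mask):
    """Derive tight bounding boxes from a per-pixel instance-ID map.

    mask: 2D list where 0 is background and any other integer is an instance ID.
    Returns {instance_id: (x_min, y_min, x_max, y_max, pixel_count)} using
    inclusive pixel coordinates.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst == 0:
                continue  # background pixel
            if inst not in boxes:
                boxes[inst] = [x, y, x, y, 0]
            b = boxes[inst]
            b[0] = min(b[0], x)
            b[1] = min(b[1], y)
            b[2] = max(b[2], x)
            b[3] = max(b[3], y)
            b[4] += 1  # count pixels belonging to this instance
    return {k: tuple(v) for k, v in boxes.items()}


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]
print(boxes_from_instance_mask(mask))
# {1: (1, 1, 2, 2, 4), 2: (4, 2, 4, 3, 2)}
```

Since boxes are computed from the same map for every image, they are consistently tight, which is exactly what hand labeling struggles to guarantee.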
  • 15. New AI job: Prompt Engineer
    In the world of Generative AI, someone needs to tell the AI what to produce!
    • Design & engineer inputs to Generative AI systems to achieve specific outcomes
    • Narrative-oriented: good at defining context, describing problems
    • Domain or industry expertise
    • Expert in specific data types & technologies: sensors, renderers, modeling, simulation
  • 16. Common gaps when introducing customers to synthetic data
    • Hyper-focus on the bounds of found or acquired data only
    • Most data scientists aren’t sensor experts
    • Concern about ‘good data’
    • Concern about one-off datasets vs. investment in data
    • Belief that human perception is good enough to judge data quality
    • Confusion over Generative AI vs. simulation techniques
    …
    Note that the biggest hurdle is that customers rarely stop to ask what ideal dataset would address their business problem!
  • 17. Synthetic data generation is an empirical process
    Identify the problem → Describe the (ideal) data → Generate data → Can I achieve any training? → Refine data generation → Can I improve training?
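The empirical loop on this slide can be written as control flow. In this sketch, `generate`, `train_and_evaluate`, and `refine` are hypothetical caller-supplied hooks standing in for a generation job, a training run, and a graph change. The stopping rule (quit once the metric improves by less than `min_gain`) is an assumption; the slide does not prescribe one.

```python
def iterate_generation(config, generate, train_and_evaluate, refine,
                       max_rounds=5, min_gain=0.01):
    """Generate -> train/evaluate -> refine, until the metric stops improving.

    Returns (best_config, best_score, history), where history lists every
    (config, score) pair that was tried.
    """
    best_config, best_score = None, float("-inf")
    history = []
    for _ in range(max_rounds):
        dataset = generate(config)
        score = train_and_evaluate(dataset)
        history.append((config, score))
        if score < best_score + min_gain:
            break  # no meaningful gain: revisit the problem and the ideal-data description
        best_config, best_score = config, score
        config = refine(config, score)
    return best_config, best_score, history


# Toy usage: the "config" is a hypothetical count of scene variations and the
# metric saturates as variation grows.
best, score, hist = iterate_generation(
    config=1,
    generate=lambda n: n,                          # stand-in for a generation job
    train_and_evaluate=lambda n: 1 - 1 / (1 + n),  # toy saturating metric
    refine=lambda n, s: n * 2,                     # stand-in for a graph change
    min_gain=0.06,
)
print(best, round(score, 3), len(hist))  # 8 0.889 5
```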
  • 18. Supporting GEOINT workflows with continuously evolving AI
    Channel development (GIS developer, database engineer, synthetic data engineer):
    • Model digital sensor
    • Aggregate & create scene content
    • Create channel configuration
    • Publish to Rendered.ai
    Graph configuration and job execution (GIS analyst, computer vision engineers, data scientists & automated workflows):
    • Add channel to workspace
    • Create & configure graphs
    • Run jobs
    Outputs and iteration:
    • Datasets: annotations, images, masks, statistics
    • Train and evaluate AI with GIS tools, data science toolkits, and embedded AI tools
    • Change graph configuration, or add/update sensor configuration, scene content, and scene configuration
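Channels are configured through graphs, and a graph of dependent steps is naturally executed in topological order. The following is a generic sketch using Python’s standard `graphlib` module (Python 3.9+), not the Rendered.ai graph format; every node name and the `run_graph` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical graph: each node maps to the set of nodes whose outputs it consumes.
graph = {
    "load_background": set(),
    "place_objects": {"load_background"},
    "set_environment": set(),
    "render": {"place_objects", "set_environment"},
    "compute_annotations": {"render"},
    "package_dataset": {"render", "compute_annotations"},
}


def run_graph(graph, nodes):
    """Execute node callables in dependency order, passing each node its inputs' results."""
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph.get(name, ())}
        results[name] = nodes[name](inputs)
    return results


# Toy node implementations: each one just records which inputs it received.
nodes = {
    name: (lambda inputs, name=name: f"{name}({', '.join(sorted(inputs))})")
    for name in graph
}
results = run_graph(graph, nodes)
print(results["render"])  # render(place_objects, set_environment)
```

Changing the graph configuration, as the slide describes, then amounts to editing `graph` or swapping a node’s callable without touching the execution logic.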
  • 19. Architecture of Synthetic Data Channels
    Pipeline stages: scene config & generation → render simulation → post-processing → dataset packaging (images, masks, annotations, statistics)
    Channel components: world construction, scenario composition, content distribution, environmental conditions, objects of interest, animated people & animals, geospatial services, textures & shaders, sensor configuration, platform configuration, sensor simulator, environment effects, filters
    Cloud-hosted PaaS (COTS): job manager, user management & roles, archive & search, content volumes, remote access APIs, characterization (UMAP), annotation microservices, CycleGAN microservices
    Channels become open-source examples for users to build upon.
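The "environment effects" and "filters" components can be as elaborate as a full radiometric model. As a deliberately simple stand-in (an assumption, not the platform’s sensor simulator), this sketch applies a linear gain and additive Gaussian read noise to an 8-bit grayscale image, then rounds and clips to the valid range. The function name and parameters are hypothetical.

```python
import random


def apply_sensor_effects(image, gain=1.0, noise_sigma=2.0, seed=None):
    """Apply a linear gain and additive Gaussian noise to an 8-bit grayscale image.

    image: 2D list of ints in [0, 255]. Output values are rounded and clipped
    to [0, 255]. A seed makes the noise reproducible.
    """
    rng = random.Random(seed)
    out = []
    for row in image:
        out.append([
            max(0, min(255, round(v * gain + rng.gauss(0.0, noise_sigma))))
            for v in row
        ])
    return out


img = [[10, 128, 250], [0, 64, 255]]
noisy = apply_sensor_effects(img, gain=1.1, noise_sigma=3.0, seed=7)
print(noisy)
```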
  • 20. Don’t rebuild everything for every AI application
    Applications: remote sensing, supply chain, object detection, automotive, economic monitoring, medical imaging, security, …
    Sensors: radar imagery, RGB camera, panchromatic, infrared, high-definition radar, microscopy, X-ray/CT scan, MRI, …
    Reusable modular architecture in the cloud:
    • Content pipelines
    • Sensor models
    • Analytics toolsets
    • AI integrations
    Enabling access to synthetic data as an enterprise capability
  • 21. Channel Development | Blender Content
    Code: SATRDEMO
    • Dependencies installed: Blender and Python (versions harmonized), OpenCV, GPU drivers, Ana, Anatools SDK
    • Can edit and deploy channels with the SDK
    • Offered as an AMI or from git with a .devcontainer for VS Code
    • ArcGIS integration for 2D raster backgrounds
    Custom code. Available now.
  • 22. Case study: EO scenarios
    Searching for cranes and crane trucks as an economic indicator in satellite imagery.
    • Objects are rare relative to other features in overhead imagery, which means very large labeling campaigns are needed to collect examples. The original dataset had only ~100 examples of each class.
    • Objects are difficult to label. Inconsistent sizing of crane bounding boxes and similarities between crane trucks and cement pumps were two notable challenges in the real datasets.
    • Synthetic and real datasets: 2-3x improvement in AP scores over peak performance without synthetic data.
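The case study reports results as AP (average precision) and notes that inconsistent bounding-box sizing made the real labels hard to use; both relate to how detections are matched to ground truth by IoU (intersection over union). The sketch shows IoU and all-point interpolated AP in the PASCAL VOC 2010+ style. The deck does not say which AP variant was used (COCO-style AP, for example, averages over several IoU thresholds), so treat this as illustrative only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, num_ground_truth):
    """All-point interpolated AP.

    detections: list of (confidence, is_true_positive) pairs, where the
    true-positive flag was already decided by IoU matching (e.g., IoU >= 0.5).
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each recall increment (area under the interpolated curve).
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 0.143
dets = [(0.9, True), (0.8, False), (0.7, True)]
print(round(average_precision(dets, num_ground_truth=2), 3))  # 0.833
```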
  • 23. Channel Development | DIRSIG™ Content
    Code: DIRSIGDEMO
    • DIRSIG accessed through Python and a web interface
    • Can edit and deploy channels with the SDK
    • No RIT DIRSIG training required!
    Custom code. Available now.
  • 24. Example Applications: Hyperspectral Imaging, Multispectral Imaging
    • A unique relationship with RIT allows Rendered.ai to package DIRSIG in synthetic data channels for customers
    • MSI, HSI, and other radiometrically complex imagery output
    • Validation possible with calibration panels and third-party consulting
    • Pixel-level geospatial accuracy: geospatially accurate, high-resolution scene content used in cloud-based generation for very large datasets
    (Image: RGB bands from MSI and HSI images created with DIRSIG and Rendered.ai)
  • 25. Channel Development | Omniverse
    Available on request
    • Preinstalled dependencies: USD, Python, OpenCV, GPU drivers, Ana, Anatools SDK
    • Edit and deploy channels to Rendered.ai with the SDK
    • Offered as an AMI or from git with a .devcontainer for VS Code
    Custom code.
  • 26. Example Applications: Omniverse Replicator channel
    • Use an industry-leading 3D toolkit in the cloud
    • Configurable in a web-based SaaS experience
    • A starting place for users who may already have some experience with or investment in NVIDIA tools
    • Familiar architecture that extends to multiple use cases
    (Image: synthetic imagery chips generated with Omniverse Replicator running inside Rendered.ai on AWS)
  • 27. Example Application: Synthetic Aperture Radar
    Enterprise & Developer subscription customers
    • Experimental, cutting-edge synthetic aperture radar simulation built by Rendered.ai
    • SAR output is not human-readable, making human labeling impossible
    • Emerging commercial SAR industry seeking better tools for exploitation and value creation
    • Applications in defense, disaster response, Earth observation & monitoring, insurtech
    (Image: synthetic SAR images generated using Rendered.ai; the identical object is shown with several image capture scenarios)
  • 28. Example Application: Marine Imagery
    Enterprise tier customer
    • Vessel detection in open-ocean scenarios for defense and contraband interdiction
    • Supporting edge-based, onboard object detection systems
    • Variable weather, wave, and obstruction characteristics
    • Variable object placement generators
    (Image: synthetic RGB images simulating marine UAV imagery capture)
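The slide lists "variable object placement generators" without detail. One common, simple approach, shown here as an assumption rather than Rendered.ai’s implementation, is rejection sampling: draw random positions for vessel footprints and discard any that would overlap an already-placed vessel. The function, its parameters, and the axis-aligned rectangular footprints are all hypothetical simplifications.

```python
import random


def place_objects(n, scene_w, scene_h, obj_w, obj_h, min_gap=0.0,
                  max_tries=1000, seed=None):
    """Place up to n equal-size axis-aligned rectangles without overlap.

    Returns a list of (x, y) top-left corners. If the scene is too crowded,
    fewer than n positions are returned once max_tries is exhausted.
    """
    rng = random.Random(seed)
    placed = []
    tries = 0
    while len(placed) < n and tries < max_tries:
        tries += 1
        x = rng.uniform(0, scene_w - obj_w)
        y = rng.uniform(0, scene_h - obj_h)
        # Two equal-size rectangles are disjoint (with the gap) if they are
        # separated along x or along y.
        if all(abs(x - px) >= obj_w + min_gap or abs(y - py) >= obj_h + min_gap
               for px, py in placed):
            placed.append((x, y))
    return placed


positions = place_objects(5, 1000, 1000, 50, 20, min_gap=10, seed=42)
print(positions)
```

Varying `n`, the footprint size, and `min_gap` per scene is one way to get the placement variability the slide describes.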
  • 29. Examples of synthetic CV content
    Satellite (visible), synthetic IR (MWIR), synthetic SAR, security imaging, FLIR camera, X-ray and CT scans, urban & natural environments, industrial and residential settings
    Over 1.2 TB of synthetic images produced, with channel coverage growing
  • 30. And after you have your imagery… compare it!
    • Creating datasets is a starting point; training and validation are next
    • Compare datasets to explore similarity: real-synthetic, synthetic-synthetic
    • Use tools such as UMAP and FID
    • Use inference results to change synthetic data generation (SDG)
    • Try again!
    UMAP analysis enables data scientists to explore similarities and differences in the parameter embedding space of multiple datasets.
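The slide names FID without defining it. The Fréchet Inception Distance compares the mean and covariance of feature embeddings (conventionally Inception-v3 activations) computed for two image sets, written here with subscript r for real and s for synthetic; lower values indicate more similar feature distributions. For one-dimensional features the trace term collapses to a squared difference of standard deviations, which makes the two components easy to see.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_s \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_s - 2\left(\Sigma_r \Sigma_s\right)^{1/2}\right)

% 1-D case: \Sigma = \sigma^2, so the trace term is \sigma_r^2 + \sigma_s^2 - 2\sigma_r\sigma_s
d^2 = (\mu_r - \mu_s)^2 + (\sigma_r - \sigma_s)^2
```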
  • 32. Typical challenges in adopting synthetic data
    Internal:
    • Past experience with the cost or failure of one-off synthetic data experiments
    • Unprepared for experimentation
    • Effort to achieve an acceptable level of realism
    • Complexity/difficulty with physics-based modeling
    External:
    • Information about emerging tools
    • TCO of yet another IT project
    • Talent shortage
    • Lack of benchmarks/standards
      - Need for analytic tools
      - Need for sensitivity analysis
    • Lack of industry collaboration
  • 33. Opportunity of Synthetic Data
    • Supplement real data
    • Evaluate and remove bias
    • Reduce expensive dataset labeling and reacquisition
    • Explore scenarios
    • Simulate sensor models and collection techniques
    • Create novel data with zero PII or security concerns
  • 34. Synthetic data as a standard
    Synthetic data is rapidly moving from a tool of uncertain value to a required one. It has the opportunity to be used as part of regulatory and ethical frameworks around bias reduction, demonstrable sensitivity analysis, and reducing the need for human curation of training data.
    Regulatory & compliance:
    • Bias reduction and testing
    • Sensitivity analysis
    • Efficacy demonstration
    • Removing human-in-the-loop from ethical/harmful scenarios
  • 35. Synthetic data as an enabler for innovation
    As synthetic data generation capabilities improve and become more accessible, users will have expanded opportunities to experiment, innovate, and build AI without expensive or impossible real sensor dataset collection.
    Innovation:
    • Complex sensor fusion
    • New & hard-to-acquire sensors
    • New dataset combinations
    • Digital twins
  • 36. Synthetic data driving sustainability
    Synthetic data is 100% reliably labeled, has been shown to reduce the size of training datasets, and potentially reduces the need for real sensor-based data collection.
    Cost and impact:
    • Reducing labeling costs
    • Reducing collection costs
    • Reducing the environmental footprint of real sensor data collection
    • Enabling innovation without physical material consumption/investment
  • 37. Wrap up
    For slides and supporting content: https://bit.ly/GEOINT2023
    Try it at: https://rendered.ai/getstarted.html