Wejo has the largest connected vehicle data set in the world, processing 17bn data points a day. Our data is of value to customers of all sizes across multiple industries. By using the Databricks white label offering, which allows controlled, secure access to our data, we have opened up the unique value of Wejo data to a whole new user base.
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Customers
1. Databricks White label: making petabyte scale data consumable to all our customers
Founded: 2014
Employees: 200+
Miles curated: 150bn+
Data points captured: 2 trillion+
Winner, Fastest Growing Company 2019
Winner, Customer Value Leadership 2019
2. Ingress of data from 15m+ vehicles
Ingestion, processing and storage of enormous data sets
• OEM A: Streamed
• OEM A: Micro-batch (Ign OFF)
• OEM B - n: Streamed
• 15m+ vehicles
• 18 billion data points per day
• 0.4 trillion per month
• Peak ~900,000 per second
• 5 PB of data (and growing)
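For a sense of scale, the daily and peak figures above imply an average ingest rate and a peak-to-average ratio. The sketch below works through that arithmetic. It assumes data points arrive evenly across a 24-hour day, which is a simplification: real traffic follows driving patterns.

```python
# Rough throughput arithmetic from the figures quoted on the slide.
# Assumption: points are spread evenly over 24 hours (real traffic is not).
SECONDS_PER_DAY = 24 * 60 * 60

points_per_day = 18_000_000_000   # "18 billion data points per day"
peak_per_second = 900_000         # "Peak ~900,000 per second"

average_per_second = points_per_day / SECONDS_PER_DAY
peak_to_average = peak_per_second / average_per_second

print(f"average: {average_per_second:,.0f} points/s")
print(f"peak is about {peak_to_average:.1f}x the average")
```

Under that assumption the average is roughly 208,000 points per second, so the quoted peak is a little over four times the average load.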
3. A range of data, insights and tools making all of Wejo’s products accessible to any business
Stream
Live feeds of near-real-time data, for low latency applications.
Examples:
• Vehicle Movements
• Driving Events
Batch
Deferred and bulk data, for one-off or regular transfers.
Examples:
• Vehicle Movements
• Driving Events
Intelligence
Pre-defined and customisable reports and visualisations.
Examples:
• Traffic Intelligence
• Dealership Network Analytics
Mobility Intelligence Studio
Self-serve access to Wejo datasets and data science tools for data evaluation, profiling analysis and advanced modelling.
Developer Labs
Suite of Developer APIs providing on-demand access to Wejo and vehicle data.
Examples:
• Current EV charge level
• Historic traffic
Access to raw and derived datasets to support modelling and analytics
Mobility Intelligence and data science toolsets
Developer tools providing ways to incorporate data in apps and propositions
COMING SOON
4. Impact of Databricks
[Diagram: Wejo data platform architecture. Labels recoverable from the figure:
Pipeline: OEM, Ingress, Core, Transform, Filter & Aggregate, Egress, Portal
Data Lake: Data analysis, Data science, Data governance
Assets and outputs: Raw asset, Stream, Stream Aggregates, Adept Stream, Batch, Insight Batch, Bespoke Insights, Bespoke DS output, BI Datamart, Data Aggregates, BI and Dashboards, Sample Generation, Sample Preview
Operations: Incident Management, Change and Release, 24x7 Monitoring, Alerting, SecOps, DevOps, Infosec Compliance, Infosec Tooling
Hosting: On-prem, AWS, Google Cloud, Microsoft Azure]
• Databricks underpins our ad-hoc analysis of data for data science
• We use it to populate both the datamart for BI and the derived data used in WIM
• Introduction of Delta Lake to support geospatial workloads and CCPA and GDPR requirements
• Databricks have also introduced a white label/multi-tenant option for use in sample preview/Traffic Intelligence delivery
• PaaS further leverages Databricks for data science/analytics
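One reason Delta Lake helps with CCPA and GDPR is that it supports row-level `DELETE` on data lake tables, with `VACUUM` later removing the underlying files that still hold the deleted rows. The sketch below only builds those two SQL statements. The table name, column name and identifier value are hypothetical, not taken from Wejo's platform, and the 168-hour retention is simply Delta Lake's default of seven days.

```python
import re

# Conservative patterns so that only plain identifiers and simple key values
# are ever interpolated into SQL. Both patterns are assumptions of this sketch.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")
_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def erasure_statements(table: str, key_column: str, key_value: str,
                       retain_hours: int = 168) -> list:
    """Return the Delta Lake SQL for erasing one subject's rows.

    DELETE removes the rows logically; VACUUM later drops the old data
    files once they are older than the retention window.
    """
    for name in (table, key_column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _KEY.match(key_value):
        raise ValueError(f"unsafe key value: {key_value!r}")
    return [
        f"DELETE FROM {table} WHERE {key_column} = '{key_value}'",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]


# Hypothetical table and key, for illustration only.
statements = erasure_statements("wejo.vehicle_movements", "vehicle_id", "abc-123")
for statement in statements:
    print(statement)
```

In a Databricks notebook each statement would typically be run with `spark.sql(statement)`. How erasure requests are actually handled on the Wejo platform is not described in the slides.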
5. Databricks white label use cases
• Open up Wejo data to more users
• Reduce the technical barrier to entry
• Remove up-front investments to ingest Wejo data
▪ Use case 1 – White label as a preview environment for Wejo data to reduce the sales cycle
▪ Use case 2 – White label as an environment for Wejo customers to consume and interact with Wejo insight
6. Use case 1 – evaluating Wejo big data products
Leveraging white label for faster sales, enhanced conversion rate, risk removed
Current process
• Multi-stage process for validating user for sample
• Document heavy, both internal and OEM required documents
• Legal and info sec barriers
• Requires extensive SME involvement on the egress partner and Wejo side
• Data has to be securely transferred, with deletion protocols followed up
Steps: DEA, schedules & TPISR completed → Sample Request form completed → Data Sample prepared → Data Sample picked up → Data Sample reviewed → Data Sample deletion requested
Process: Data Labs
✓ Simplified data preview process
✓ No data is “sent”
✓ Improved conversion to sale
✓ All OEM data
✓ Data risk significantly reduced
✓ Legal and info sec barriers removed
✓ Sales engagement tool for presentations and top funnel discussions
Steps: Credentials created & Terms of use accepted → User granted access to data preview within 24 hours & training scheduled → Data reviewed → User credentials removed at the end of evaluation
[Diagram labels: wejo network, Egress partner network]
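The Data Labs process above is a short, strictly ordered lifecycle, which can be written down as a small state machine. This is a minimal sketch of the four steps named on the slide. The `Stage` names, `NEXT_STAGE` table and helper functions are illustrative and do not describe how Wejo or Databricks implement access management.

```python
from enum import Enum


class Stage(Enum):
    CREDENTIALS_CREATED = 1   # credentials created and terms of use accepted
    ACCESS_GRANTED = 2        # access to the data preview (within 24 hours)
    DATA_REVIEWED = 3         # customer reviews the data in the preview
    CREDENTIALS_REMOVED = 4   # credentials removed at the end of evaluation


# Each stage has exactly one successor; the final stage has none.
NEXT_STAGE = {
    Stage.CREDENTIALS_CREATED: Stage.ACCESS_GRANTED,
    Stage.ACCESS_GRANTED: Stage.DATA_REVIEWED,
    Stage.DATA_REVIEWED: Stage.CREDENTIALS_REMOVED,
}


def advance(stage: Stage) -> Stage:
    """Move to the next stage, refusing to go past credential removal."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.name} is the final stage")
    return NEXT_STAGE[stage]


def walk(start: Stage) -> list:
    """Return every stage from `start` to the end of the evaluation."""
    path = [start]
    while path[-1] in NEXT_STAGE:
        path.append(advance(path[-1]))
    return path


for stage in walk(Stage.CREDENTIALS_CREATED):
    print(stage.name)
```

Compared with the current process, no step here involves transferring or deleting a data sample, since the customer only ever works inside the preview environment.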
7. Sample Preview Overview
[Diagram labels: Authorisation; Control Plane; ADEPT; Core ADEPT (raw historical asset); MIS Data Preview(s); ADEPT Sample Storage; Copy; Data Sample 1 and Data Sample 2, each with a Notebook Server and White Label IAM/Cluster access; Query from Egress Partner 1 and Egress Partner 2]
Notes
1. Segregated S3 storage
2. Limited data set, fully secured
3. No download ability or additional data set access
4. Pre-written notebooks and data dictionary
5. Managed instance per customer, looking at same data
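Notes 1 and 3 above (segregated S3 storage, no additional data set access) map naturally onto a read-only AWS IAM policy scoped to a single sample prefix. The sketch below builds such a policy document in Python. The bucket and prefix names are hypothetical, and the slides do not show Wejo's actual policies. It also does not cover the "no download" control, which would be enforced in the workspace rather than in IAM.

```python
import json


def read_only_sample_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy that can list and read one S3 prefix, nothing else."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is limited to keys under the sample prefix.
                "Sid": "ListSamplePrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Read-only: no write or delete actions are granted.
                "Sid": "ReadSampleObjectsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_sample_policy("example-sample-bucket", "/data-sample-1/")
print(json.dumps(policy, indent=2))
```

Because IAM denies anything not explicitly allowed, a cluster role carrying only this policy cannot reach the core raw asset or another partner's sample.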
8. Use Case 2 – Mobility Intelligence Studio
• Access to Wejo Traffic Intelligence product
• Reduces partner need to build infrastructure
9. MIS Overview
[Diagram labels: Authorisation; Control Plane; ADEPT; Core ADEPT (raw historical asset); MIS Data Preview(s); Aggregate data; Copy; Delta Lake; White Label 1 and White Label 2, each with a Notebook and Cluster access]
• Use Delta Lake views and tables to limit access to relevant content
• Don’t store data multiple times
• Cluster policies as well as user set-up will allow limiting content
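The first two bullets above amount to giving each white label tenant a view over one shared table, so the data is stored once and each tenant sees only its slice. The sketch below generates the `CREATE OR REPLACE VIEW` statement for such a view. Every table, view and column name and the row filter are hypothetical. The filter is assumed to be written by the platform team, not supplied by a customer.

```python
import re

# Only plain, optionally schema-qualified identifiers are accepted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def tenant_view_sql(view: str, source_table: str, columns: list,
                    row_filter: str) -> str:
    """Build a view exposing selected columns and rows of a shared table.

    `row_filter` is trusted SQL written by the platform team (assumption).
    """
    for name in [view, source_table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    column_list = ", ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view} AS "
        f"SELECT {column_list} FROM {source_table} WHERE {row_filter}"
    )


# Hypothetical names, for illustration only.
sql = tenant_view_sql(
    view="white_label_1.traffic_aggregates",
    source_table="core.traffic_aggregates",
    columns=["road_segment_id", "hour", "avg_speed"],
    row_filter="region = 'US-MI'",
)
print(sql)
```

A view on its own does not restrict anything: the tenant's users would also need to be granted access to the view but not to the underlying table, and the exact grant syntax depends on the workspace's access control model.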
10. Summary
• Wejo’s primary data platform is Databricks
• White label functionality allows our partners to engage with Wejo data in a controlled, secure and fully functional data platform
• Reducing the technical barrier to Wejo data
• Increasing the amount of data available greatly widens the use cases and market