Adastra Group
Our Solution Portfolio
Webinar: Fast data in times of crisis with the help of GPU

One Focus: Data & Digitalization
• Advanced Analytics
• (Big) Data Engineering
• Data Governance
• Cloud Services
• Machine Learning & AI
• Digital Transformation
Adastra introduction
• International consulting company that creates functional solutions in various sectors, facilitating the transition to the digital era.
• Cutting-edge software for data quality management, Master Data Management, and data governance.
• Solutions to complex business problems in risk management, sales, and process optimization.
• Specialist in mobile app development.
• Full-service creative agency with a strong technological background.
• Recruitment for banks, financial institutions, telecoms and insurance companies, and many others, including Adastra.
• Artificial intelligence, machine learning and optimization services.
• Big data monetization solutions.
Technical & other details
• Ask questions & answer polls
• Get beta access to the tools we show
• Leave us feedback

The panel
• Matej Misik – QikkDB & TellStory product owner
• Tomas Synek – Moderator
• Martin Zahumensky – TellStory power user
Agenda for today
• Databases & GPU intro with QikkDB [45 mins]
  – Intro into the deep-tech DB space
  – What GPUs are and how they accelerate HPC
• Data storytelling with TellStory [45 mins]
  – Traditional BI vs. data storytelling
  – Explaining Covid-19 by creating a data story
Let’s GO
General intro into the problem and to DBs
Some of our challenges
Real-time visitor reporting over a stream of data
• 30k per second, ~2.6 billion per day
• e.g. monitoring crowds during an event, targeted marketing
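The daily figure follows directly from the per-second rate. A quick sanity check, plus the ingest volume under a purely hypothetical record size (the slide does not say how large a record is):

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
events_per_second = 30_000              # rate quoted on the slide

events_per_day = events_per_second * SECONDS_PER_DAY
print(f"{events_per_day:,}")            # 2,592,000,000 (~2.6 billion)

# Hypothetical record size, only to illustrate the ingest volume;
# it is an assumption, not a figure from the slides.
ASSUMED_BYTES_PER_EVENT = 100
gb_per_day = events_per_day * ASSUMED_BYTES_PER_EVENT / 1e9
print(f"{gb_per_day:.1f} GB/day")       # 259.2 GB/day
```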
Some of our challenges
Data science on large datasets
• Testing hypotheses and ad-hoc querying when indexing is not predictable
• Profiling new datasets
• e.g. large flows of commuters (above 500 SIM cards)
We were looking for solutions
• Tested different technologies (Elastic, ClickHouse...), which did not work very well for us for various reasons
• Came across GPU-accelerated computing. So?
Why not?
• Elastic – slow on one node, slow data ingest
• Actian Vector – faster, but still not performing well on one node
• ClickHouse – much faster, but no geospatial capabilities, Linux only
• MS SQL – not fast enough, even when tuned
• MapD (OmniSci) – considered, but far too expensive
Types of databases
By type of use:
• Transactional
• Batch
• Real-time
• Analytical
• Streaming
By resources used:
• In-memory
• Disk databases
• Hardware-accelerated (FPGA, GPU, Quantum)
By stored data:
• Relational
• Columnar
• Time-series
• Graph
• Document
• Key-value
• ...
The technological edge – Why GPU?
GPUs for HPC (high-performance computing)
• ~10x higher performance in a single hardware unit
• Great efficiency (cheaper computations)
• Performance growing exponentially, vs. linearly for CPUs
• Used for e.g. image processing, tsunami simulation, DNA analyses
• Generic commodity HW, available in the cloud (AWS, Azure)
• Lots of processors for parallel computing: the Intel® Xeon® Platinum 8253 has 16 cores, while the NVIDIA Tesla V100 has 5,120 cores and is data-center focused
Rediscovery of Columnar Data Storage
Utilizing the computational power of GPUs requires a different approach to storing data. The database architecture that works best with parallel processing is columnar storage: in contrast to conventional relational databases, which store data row by row, columnar databases store each column separately. In the context of parallel processing, GPUs love long vectors of the same data type.
Figure 1: CPU vs. GPU. GPUs have thousands of arithmetic logic units (ALUs) in one piece of hardware.
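The row-versus-column distinction can be sketched in plain Python. The table and its values are hypothetical; the point is that the columnar layout keeps each attribute in one dense, homogeneously typed buffer (here Python's `array` module), which is the "long vector of the same data type" the text says GPUs prefer:

```python
from array import array

# Hypothetical table with three columns; values are illustrative only.
rows = [
    (1, 5, 10.0),
    (2, 4, 20.0),
    (3, 3, 30.0),
    (4, 2, 40.0),
]

# Row store: each record is kept together (the tuples above).
# Column store: one contiguous, homogeneously typed buffer per column.
columns = {
    "a": array("q", (r[0] for r in rows)),      # 64-bit signed integers
    "b": array("q", (r[1] for r in rows)),
    "price": array("d", (r[2] for r in rows)),  # 64-bit floats
}

# Aggregating one attribute in the row store walks through every whole record.
row_total = sum(r[2] for r in rows)

# In the column store the same aggregation reads a single dense vector.
col_total = sum(columns["price"])

print(row_total, col_total)  # 100.0 100.0
print(columns["price"].itemsize * len(columns["price"]), "bytes scanned")  # 32 bytes scanned
```

A real columnar engine would additionally compress each buffer and copy only the columns a query needs to the GPU; the sketch shows only the layout.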
GPUs help to accelerate compute-intensive use cases
“1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
New cards to be announced in 2020, with approx. 8,000 cores and about 40% more speed
Inserting a GPU into the machine is not enough
• Programs need to be parallelized, which is hard
• CUDA programming model, provided by NVIDIA since 2007
• Algorithms must be embarrassingly parallel
• Multi-GPU
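How the computation is spread onto cores
Query: SELECT C FROM FRUIT_TABLE WHERE A >= B AND A < 5

CUDA core | A | B | C      | A >= B | A < 5 | Final AND mask | Records meeting the condition | Result after reconstruction
----------|---|---|--------|--------|-------|----------------|-------------------------------|----------------------------
1st       | 1 | 5 | Apple  | 0      | 1     | 0              | -                             | Orange
1st       | 2 | 4 | Grapes | 0      | 1     | 0              | -                             | Lemon
1st       | 3 | 3 | Orange | 1      | 1     | 1              | Orange                        | -
2nd       | 4 | 2 | Lemon  | 1      | 1     | 1              | Lemon                         | -
2nd       | 5 | 1 | Banana | 1      | 0     | 0              | -                             | -
nth       | ...

Data flow: transfer data from CPU RAM to the GPU → compute in GPU memory with no transfers → copy the result from the GPU back to CPU RAM.
Parallel execution: cores 1, 2, ..., n each evaluate the logical conditions for their own rows at the same time.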
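The mask-then-reconstruct scheme in the table can be sketched in plain Python. This is a sequential simulation, not qikkDB code: each "core" is a loop over its slice of rows, and the reconstruction step is assumed to be a stream compaction via an exclusive prefix sum (a common GPU technique; the slide does not say which method qikkDB uses):

```python
from itertools import accumulate

# Columnar representation of FRUIT_TABLE from the slide: one list per column.
A = [1, 2, 3, 4, 5]
B = [5, 4, 3, 2, 1]
C = ["Apple", "Grapes", "Orange", "Lemon", "Banana"]


def chunk_mask(a, b, lo, hi):
    """Evaluate both predicates for rows lo..hi-1, as one core would.
    Returns the final AND mask (1 = row satisfies A >= B AND A < 5)."""
    return [int(a[i] >= b[i]) & int(a[i] < 5) for i in range(lo, hi)]


def gpu_style_filter(a, b, c, rows_per_core=3):
    n = len(a)
    # 1) Each simulated core computes the mask for its own slice of rows.
    mask = []
    for lo in range(0, n, rows_per_core):
        mask.extend(chunk_mask(a, b, lo, min(lo + rows_per_core, n)))
    # 2) Exclusive prefix sum of the mask gives each kept row its output slot.
    slots = [0] + list(accumulate(mask))[:-1]
    out = [None] * sum(mask)
    # 3) Scatter ("reconstruction"): kept rows are written to their slots.
    for i, keep in enumerate(mask):
        if keep:
            out[slots[i]] = c[i]
    return mask, out


mask, result = gpu_style_filter(A, B, C)
print(mask)    # [0, 0, 1, 1, 0]
print(result)  # ['Orange', 'Lemon']
```

Every row's mask bit is independent of the others, which is what makes the filter embarrassingly parallel; only the prefix sum needs cooperation between cores.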
Where is spatio-temporal data different?
Polygon operations
Crucial requirements for the database system
• Fast insert
• Fast processing
• Scalability & high availability
• Limited pre-aggregation
• Standardized access and common syntax

Deep tech based on real science
• Google Protocol Buffers
• GPU data processing is written in CUDA 10 (direct commands to the HW at the level of single cores)
• The database core is written in the low-level language C++17 (memory management, control of instructions…)
• Libraries for specific modules (networking, building, parsing…)
• Created in cooperation with top talents from the Slovak Technical University
What is qikkDB for?
• Filtering and aggregations over a single, flat, huge table
• Spatio-temporal data processing
• Complex polygon operations (contains, intersect, union)
• Numeric and datetime data
• Incremental data that grows over time
Use cases: network utilization & analysis, risk scoring, dynamic pricing, real-time analytics, hypothesis verification, profiling of big data, machine learning, etc.
Data sources: logs, polygons, IoT, GPS, network, events, automotive, maps
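To illustrate the "contains" polygon operation, here is a minimal point-in-polygon test using the standard even-odd (ray casting) rule. It is a generic technique chosen for illustration, not qikkDB's implementation; the zone and GPS pings are hypothetical:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_contains(polygon: List[Point], p: Point) -> bool:
    """Even-odd (ray casting) test: cast a ray from p towards +x and count
    how many polygon edges it crosses; an odd count means p is inside.
    Points lying exactly on an edge are not handled specially here."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical zone as (lon, lat) corners and three GPS pings.
zone = [(17.0, 48.0), (17.2, 48.0), (17.2, 48.2), (17.0, 48.2)]
pings = [(17.1, 48.1), (17.3, 48.1), (16.9, 48.05)]
results = [polygon_contains(zone, q) for q in pings]
print(results)  # [True, False, False]
```

Each point is tested independently, so on a GPU one thread per point (or per point–polygon pair) is a natural mapping.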
So how fast is it?
• 1.2B data rows in 7 columns
• Average execution time obtained from 200 query runs
• Biggest datasets tested at 400 GB, limited by memory; bigger datasets can be cached from disk, benchmarks to come soon

Execution times results
1. QikkDB
2. GiraffeDB – leading GPU database
3. CatDB – leading columnar database
4. RacoonDB – tuned leading relational database
CPU machine (c5d.9xlarge): 36 vCPUs
GPU machine (p3.8xlarge): 4x Tesla V100
We use codenames for well-known databases because, for legal reasons, we can’t tell you who these slow guys are.
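Compared to other DBs (results in ms)

Query | qikkDB @ p3.8xl | qikkDB @ g4dn.12xl | GiraffeDB @ p3.8xl | CatDB @ c5d.9xl | RacoonDB @ c5d.9xl | Elastic (tuned) | Spark @ 21x m3.xl | Spark @ i3.8xl
------|-----------------|--------------------|--------------------|-----------------|--------------------|-----------------|-------------------|---------------
#1    | 22              | 37                 | 25                 | 435             | 22                 | 810             | 2362              | 22000
#2    | 37              | 82                 | 235                | 1061            | 964                | 1818            | 3559              | 25000
#3    | 228             | 925                | 231                | 1630            | 3491               | n/a             | 4019              | 27000
#4    | 283             | 1105               | 417                | 2174            | 3996               | n/a             | 20412             | 65000
Avg   | 143             | 537                | 227                | 1325            | 2118               | n/a             | 7588              | 34750

10 to 100x quicker than the CPU-based systems: on average ~9x vs. CatDB, ~15x vs. RacoonDB and ~53x to ~240x vs. the two Spark setups; ~1.6x vs. GiraffeDB on the same GPU machine.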
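The averages and speed-up factors can be recomputed from the per-query times in the benchmark table. Note that the slide rounds qikkDB's p3.8xl average to 143 ms; the unrounded value is 142.5 ms, which is used as the baseline here:

```python
# Per-query execution times in ms from the benchmark table (None = n/a).
times = {
    "qikkDB @ p3.8xl":    [22, 37, 228, 283],
    "qikkDB @ g4dn.12xl": [37, 82, 925, 1105],
    "GiraffeDB @ p3.8xl": [25, 235, 231, 417],
    "CatDB @ c5d.9xl":    [435, 1061, 1630, 2174],
    "RacoonDB @ c5d.9xl": [22, 964, 3491, 3996],
    "Elastic (tuned)":    [810, 1818, None, None],
    "Spark @ 21x m3.xl":  [2362, 3559, 4019, 20412],
    "Spark @ i3.8xl":     [22000, 25000, 27000, 65000],
}


def average(values):
    """Mean over all four queries; None if any query has no result."""
    return None if None in values else sum(values) / len(values)


baseline = average(times["qikkDB @ p3.8xl"])  # 142.5 ms
for system, values in times.items():
    avg = average(values)
    if avg is None:
        print(f"{system:20} avg n/a")
    else:
        print(f"{system:20} avg {avg:8.1f} ms  speedup x{avg / baseline:6.1f}")
```

Averaging only four queries gives a coarse picture: the ratio varies a lot per query (e.g. RacoonDB matches qikkDB on query #1 at 22 ms).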
The blazing speed
Same HW, 1.2bn data points, 2 databases
• Both running on AWS g4dn.12xlarge (48 vCPUs, 192 GB RAM, 4x Tesla T4 GPUs)
• Deployed beta platform with a data exploration front-end: www.tellstory.cloud
• QikkDB demo on smart meter data
What’s going on in the background? Data storage & flow
1. Persisted data on disk (compressed)
2. Pre-loaded data in RAM
3. Relevant columns go to the GPU over PCI-E
4. Data in GPU RAM (decompressed)
5. Filters & aggregations run as CUDA kernels
6. Result set
When new data is inserted, missing columns are created automatically (“schema-less”), which is good for IoT and similar use cases.
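The "schema-less" insert behaviour can be sketched as a columnar table that adds a column the first time an insert mentions it and back-fills earlier rows with NULL. The class and method names are hypothetical and this is not qikkDB's API; the NULL back-filling is an assumption about how such a design typically behaves:

```python
from typing import Any, Dict, List, Optional


class SchemaLessTable:
    """Hypothetical append-only columnar table that creates columns on demand."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Optional[Any]]] = {}
        self.row_count = 0

    def insert(self, record: Dict[str, Any]) -> None:
        for name in record:
            if name not in self.columns:
                # New column: earlier rows get NULL (None) for it.
                self.columns[name] = [None] * self.row_count
        for name, values in self.columns.items():
            # Every column grows by exactly one value per inserted row.
            values.append(record.get(name))
        self.row_count += 1


meters = SchemaLessTable()
meters.insert({"meter_id": 1, "kwh": 0.42})
meters.insert({"meter_id": 2, "kwh": 0.37, "voltage": 229.8})  # new column appears
print(meters.columns)
# {'meter_id': [1, 2], 'kwh': [0.42, 0.37], 'voltage': [None, 229.8]}
```

Because each column is a plain append target, a new sensor attribute (common with IoT devices) does not require an ALTER TABLE or a rewrite of existing data.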
How can it scale?
Multi-GPU (vertical) scalability on a single node (up to 8 GPUs)
• Accelerating computations
• Enabling multiple sessions
Multi-level caching
• GPU RAM cache
• CPU RAM cache
On the roadmap
• Multi-node (horizontal) scalability
• High availability
• Lazy data loading
Not limited by data size: best performance when the data fits into GPU memory, but data can be loaded from disk on demand.
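A multi-level cache like the one described (GPU RAM in front of CPU RAM, with disk as the fallback) can be sketched as two LRU tiers. Everything here is a hypothetical illustration: capacities are counted in whole columns, "GPU" and "CPU" are just dictionaries, and LRU is an assumed eviction policy, not one confirmed by the slides:

```python
from collections import OrderedDict
from typing import Callable, Dict, List

Tier = "OrderedDict[str, List[float]]"


class TieredColumnCache:
    """Hypothetical two-level LRU cache: a small 'GPU RAM' tier in front of a
    larger 'CPU RAM' tier, falling back to a disk loader on a miss."""

    def __init__(self, gpu_slots: int, cpu_slots: int,
                 load_from_disk: Callable[[str], List[float]]) -> None:
        self.gpu: Tier = OrderedDict()
        self.cpu: Tier = OrderedDict()
        self.gpu_slots, self.cpu_slots = gpu_slots, cpu_slots
        self.load_from_disk = load_from_disk
        self.stats: Dict[str, int] = {"gpu_hit": 0, "cpu_hit": 0, "disk": 0}

    @staticmethod
    def _put(tier: Tier, key: str, value: List[float], slots: int) -> None:
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > slots:
            tier.popitem(last=False)  # evict the least recently used column

    def get(self, column: str) -> List[float]:
        if column in self.gpu:
            self.gpu.move_to_end(column)
            self.stats["gpu_hit"] += 1
            return self.gpu[column]
        if column in self.cpu:
            self.cpu.move_to_end(column)
            self.stats["cpu_hit"] += 1
            data = self.cpu[column]
        else:
            self.stats["disk"] += 1
            data = self.load_from_disk(column)
            self._put(self.cpu, column, data, self.cpu_slots)
        # Promote to the GPU tier (stands in for a host-to-device copy over PCI-E).
        self._put(self.gpu, column, data, self.gpu_slots)
        return data


cache = TieredColumnCache(gpu_slots=1, cpu_slots=2, load_from_disk=lambda name: [0.0] * 4)
for col in ["kwh", "voltage", "kwh", "kwh"]:
    cache.get(col)
print(cache.stats)  # {'gpu_hit': 1, 'cpu_hit': 1, 'disk': 2}
```

The second request for `kwh` is served from the CPU tier without touching disk, which is the benefit of keeping a larger host-memory cache behind the scarce GPU memory.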
Why not just index?
• Traditional databases use indexes for faster processing, which results in slower inserts
• qikkDB does not need indexes (but they are available anyway)
• Data is simply appended
• The GPU takes care of fast processing
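The trade-off between the two approaches can be sketched with two hypothetical column classes: one append-only with full scans, one maintaining a sorted index with Python's `bisect`. This illustrates the general insert-versus-query trade-off, not how qikkDB or any particular database implements indexes:

```python
import bisect


class AppendOnlyColumn:
    """No index: each insert is an O(1) append; queries scan every value,
    which is the brute-force work a GPU can spread over thousands of cores."""

    def __init__(self):
        self.values = []

    def insert(self, value):
        self.values.append(value)

    def range_count(self, lo, hi):
        # Full scan: count values with lo <= value < hi.
        return sum(1 for v in self.values if lo <= v < hi)


class SortedIndexColumn:
    """Sorted index: queries are two O(log n) binary searches, but every insert
    must keep the list ordered, shifting up to n elements."""

    def __init__(self):
        self.values = []

    def insert(self, value):
        bisect.insort(self.values, value)

    def range_count(self, lo, hi):
        # Number of values in [lo, hi) via binary search on the sorted list.
        return bisect.bisect_left(self.values, hi) - bisect.bisect_left(self.values, lo)


readings = [5, 1, 4, 2, 3, 5]
scan, indexed = AppendOnlyColumn(), SortedIndexColumn()
for r in readings:
    scan.insert(r)
    indexed.insert(r)

print(scan.range_count(2, 5), indexed.range_count(2, 5))  # 3 3
```

Both return the same answer; the difference is where the cost lands: at insert time (index maintenance) or at query time (a scan that parallel hardware makes cheap).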
Integration with your environment on standards you know
• Kafka connector for streaming data
• ODBC/JDBC for visualization tools (Power BI…)
• Adapters for C#, Java and Python for custom applications, data analysis…
Speed up your BI tools and applications, or use TellStory for fast analysis.
TellStory
Exploration & analysis front-end: data storytelling with real-time data exploration
Expensive hardware?
• GPU on AWS: ~12 USD/hour
• Own GPU HW: ~50k USD
QikkDB can handle queries in a fraction of the time that traditional databases need, so you can do more with your hardware allocation in the same time. It also means that you need much less hardware to do the same amount of work, which saves costs.
“1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
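The cost argument can be made concrete with a back-of-the-envelope calculation. It assumes the ~12 USD/hour GPU price refers to the p3.8xlarge used in the benchmark and that queries run one after another on a single instance (no concurrency, idle time or data loading); the CPU instance price is deliberately not assumed, only the break-even price is derived:

```python
def cost_per_10k_queries(avg_query_ms: float, usd_per_hour: float) -> float:
    """USD for 10,000 queries run sequentially on a single instance.
    Simplification: ignores concurrency, idle time and data loading."""
    hours = 10_000 * avg_query_ms / 1000 / 3600
    return hours * usd_per_hour


# Figures from the slides: ~12 USD/hour GPU instance (assumed p3.8xlarge),
# 143 ms average for qikkDB and 1,325 ms for CatDB on c5d.9xlarge.
gpu_cost = cost_per_10k_queries(143, 12.0)
print(f"qikkDB: {gpu_cost:.2f} USD per 10k queries")  # 4.77

# Hourly price at which the CPU machine running CatDB would only break even.
break_even_cpu_price = 12.0 * 143 / 1325
print(f"CatDB break-even: {break_even_cpu_price:.2f} USD/hour")  # 1.30
```

Comparing the break-even figure with the actual on-demand price of the CPU instance shows whether the GPU setup is cheaper per query for a given workload.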
In short: Interactive analytics on massive data sets
GPU acceleration
§ Billions of data points in milliseconds
Great for spatio-temporal data
§ Finding & understanding links between data points in space & time
Standard SQL syntax
§ Easy to start using & integrate into the data science environment
Efficiency & speed
§ GPUs are becoming commodity HW and, thanks to their efficiency, the cost per 10k queries is on par with CPU approaches
• GPU columnar DB
• Real-time queries in milliseconds
• API, ODBC, JDBC: connects to everything
• SQL standard
• Spatio-temporal data processing
• Cloud or on-prem
Databases & GPU intro with QikkDB
• Intro into the deep-tech DB space
• What GPUs are and how they accelerate HPC
Data storytelling with TellStory
• Traditional BI vs. data storytelling
• Explaining Covid-19 by creating a data story
• Live stories and fast data
• TellStory Roadmap
• Q&A
Part 2!
Let’s GO
Martin is the ex-CEO of Instarea, now working at Ataccama as Head of Product Strategy.
Martin created the https://qikk.ly/covid19 story and will walk you through how he did it.
Interpreted data, easy to understand, with new facts brought to the reader. And once they have the story, they can start to sell it to other parties.
[Animated video]
Storytelling
A story is about being visual:
1. Cool visualization plugins & animations
2. Newspaper-like reading & interactive
3. Interesting facts
Creating the Covid-19 story live
When you want the story to be live, you must have live data, and when you work with big data sets of billions of records, you need a fast database.
[Animated video: LIVE story]
TellStory Roadmap
Beta release in JUNE
• Find interesting facts
• Minute-by-minute updates (be notified when something interesting happens)
• Animated visualizations (timeline charts, maps)
• Share as video (Instagram upload, YouTube livestream)
• Google Sheets integration
• Auto-update data (scheduled refresh)
• Embed sections (embedding only parts of a story will be possible)
More features to come in Phase 2; “Let AI create your story” is in progress.
Value proposition for Adastra services with these tools
Quick pilots for hands-on experience
§ GPU data acceleration: 2-month pilot to deliver real-time processing of vast streaming data (e.g. 5G, smart meters, transactions)
§ Data storytelling: 1-month pilot to provide customers with live & interactive intelligence and insights
§ Data storytelling: 1-month pilot to give management the minute-by-minute data they need
Q&A
Check out www.qikk.ly and www.tellstory.ai
Useful links
More info
§ https://qikk.ly – product web with basic information
§ https://qikk.ly/downloads/qikkDB_white_paper.pdf – White paper
§ https://docs.qikk.ly/ – Documentation & installation instructions
§ https://support.qikk.ly/ – Issues & feature reporting portal
§ https://tellstory.cloud – Front-end for data visualization, SQL console on AWS
§ https://tellstory.ai – Find out more about TellStory

Fast data in times of crisis with GPU accelerated database QikkDB | Business Breakfast | 23.4.2020

 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 

Último (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Fast data in times of crisis with GPU accelerated database QikkDB | Business Breakfast | 23.4.2020

  • 2. Adastra Group: Our Solution Portfolio. Webinar: Fast data in times of crisis with the help of GPU. One focus: Data & Digitalization. Advanced Analytics, (Big) Data Engineering, Data Governance, Cloud Services, Machine Learning & AI, Digital Transformation.
  • 3. Adastra Group: Adastra introduction. International consulting company that creates functional solutions in various sectors, facilitating the transition to the digital era. Cutting-edge software for data quality management, Master Data Management, and data governance. Solutions to complex business problems in risk management, sales, and process optimization. Specialist in mobile app development. Full-service creative agency built on a strong technological background. Recruitment for banks, financial institutions, telecoms, insurance companies, and many others, including Adastra. Artificial intelligence, machine learning, and optimization services. Big data monetization solutions.
  • 4. Adastra Group: Technical & other details. The panel: Matej Misik (QikkDB & TellStory product owner), Tomas Synek (moderator), Martin Zahumensky (TellStory power user). Ask questions and answer polls, get beta access to the tools we show, and leave us your feedback.
  • 5. Agenda for today. Databases & GPU intro with QikkDB [45 mins]: intro to the deep-tech DB space; what GPUs are and how they accelerate HPC. Data storytelling with TellStory [45 mins]: traditional BI vs. data storytelling; explaining Covid-19 by creating a data story. Let’s go!
  • 6. General intro to the problem and to DBs
  • 7. Some of our challenges: real-time visitor reporting over a stream of data at 30k per second (~2.6 billion per day), e.g. monitoring crowds during an event, or targeted marketing.
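The per-day figure follows directly from the per-second rate. Here is a quick check of the arithmetic. The function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_day(records_per_second: int) -> int:
    """Scale a sustained per-second ingest rate to a full day."""
    return records_per_second * SECONDS_PER_DAY


daily = records_per_day(30_000)
print(f"{daily:,} per day (~{daily / 1e9:.1f} billion)")
# prints "2,592,000,000 per day (~2.6 billion)"
```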
  • 8. Some of our challenges: data science on large datasets; testing hypotheses and ad-hoc querying when indexing needs are not predictable; profiling new datasets; e.g. large commuter flows (above 500 SIM cards).
  • 9. We were looking for solutions. We tested different technologies (Elastic, ClickHouse...), but for various reasons they did not work very well for us. Then we came across GPU-accelerated computing, so why not? Elastic: slow on one node, slow data ingest. Actian Vector: faster, but still not performing well on one node. ClickHouse: much faster, but no geospatial capabilities and Linux only. MS SQL: not fast enough, even when tuned. MapD (OmniSci): considered, but far too expensive.
  • 10. Types of databases. By type of use: transactional, batch, real-time, analytical, streaming. By resources used: in-memory, disk databases, hardware-accelerated (FPGA, GPU, quantum). By stored data: relational, columnar, time-series, graph, document, key-value.
  • 11. The technological edge: why GPU? GPUs for HPC (high-performance computing): ~10x higher performance in a single hardware unit; great efficiency (cheaper computations); computing power growing exponentially, vs. linearly for CPUs. Applications include image processing, tsunami simulation, and DNA analysis. Generic commodity HW, available in the cloud (AWS, Azure).
  • 12. Lots of processors for parallel computing: the Intel® Xeon® Platinum 8253 has 16 cores, while the data-center-focused NVIDIA Tesla V100 has 5,120 CUDA cores. Figure 1 (CPU vs. GPU): GPUs have thousands of arithmetic logic units (ALUs) in one piece of hardware. GPUs help to accelerate compute-intensive use cases: “1 GPU node replaces up to 54 CPU nodes” (NVIDIA). New cards are expected to be announced in 2020 with approx. 8,000 cores and 40% more speed. Rediscovery of columnar data storage: utilizing the computational power of GPUs requires a different approach to storing data. The database architecture best suited to parallel processing is columnar storage. Conventional relational databases store data row by row, whereas columnar databases store each column separately. For parallel processing, GPUs love long vectors of the same data type.
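The difference between row-based and columnar storage can be sketched in plain Python. This is a minimal illustration under assumed data. The smart-meter table, its column names, and the use of Python's array module as a stand-in for typed column vectors are all hypothetical. None of it reflects qikkDB's actual storage format.

```python
from array import array

# Hypothetical smart-meter table stored row by row (row-based layout):
# (meter_id, voltage, kwh)
rows = [
    (1, 230, 0.52),
    (2, 229, 0.61),
    (3, 231, 0.47),
]


def to_columns(rows):
    """Pivot row tuples into one homogeneous, typed array per column."""
    meter_id, voltage, kwh = zip(*rows)
    return {
        "meter_id": array("q", meter_id),  # signed 64-bit integers
        "voltage": array("q", voltage),
        "kwh": array("d", kwh),            # double-precision floats
    }


columns = to_columns(rows)
# A filter such as "voltage > 229" now reads one contiguous vector of a single
# type, the access pattern that maps well onto thousands of GPU ALUs.
mask = [v > 229 for v in columns["voltage"]]
print(mask)  # [True, False, True]
```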
  • 13. Inserting a GPU into the machine is not enough: programs need to be parallelized, which is hard. NVIDIA's CUDA programming model has existed since 2007, and algorithms must be embarrassingly parallel.
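What "embarrassingly parallel" means can be sketched as follows. The input is split into chunks that share no state, each chunk is processed independently, and the results are concatenated. This CPU sketch uses Python threads purely to show the decomposition. Because of CPython's GIL, it will not actually speed up CPU-bound work. The conversion function is a hypothetical stand-in for a per-element GPU kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def celsius_to_fahrenheit(chunk):
    # Each output depends only on its own input: no shared state, no ordering.
    return [c * 9 / 5 + 32 for c in chunk]


def parallel_map(values, func, workers=4):
    """Split `values` into independent chunks, process the chunks concurrently,
    and concatenate the results in their original order."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, not completion order.
        return [x for part in pool.map(func, chunks) for x in part]


print(parallel_map([0, 100, -40, 20], celsius_to_fahrenheit, workers=2))
# prints [32.0, 212.0, -40.0, 68.0]
```

On a GPU, the same structure would shrink to one thread per element, because no element needs to know about any other.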
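  • 14. Multi-GPU: how the computation is spread onto cores. Example query:
SELECT C FROM FRUIT_TABLE WHERE A >= B AND A < 5
Data is transferred from CPU RAM to the GPU. The records are spread across CUDA cores (1st, 2nd, ..., nth), which evaluate the logical conditions in parallel while the data stays in GPU memory (no transfers). Only the result is transferred from the GPU back to CPU RAM.

| A | B | C | A >= B | A < 5 | Final AND mask | Records meeting the condition |
|---|---|---|---|---|---|---|
| 1 | 5 | Apple | 0 | 1 | 0 | - |
| 2 | 4 | Grapes | 0 | 1 | 0 | - |
| 3 | 3 | Orange | 1 | 1 | 1 | Orange |
| 4 | 2 | Lemon | 1 | 1 | 1 | Lemon |
| 5 | 1 | Banana | 1 | 0 | 0 | - |

Result after reconstruction: Orange, Lemon.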
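Slide 14 has two phases: evaluate the conditions per record, then reconstruct the result. Both can be sketched on the same FRUIT_TABLE values. The reconstruction step here uses an exclusive prefix sum over the mask, a common stream-compaction technique on GPUs. The slides do not say whether qikkDB uses this exact approach, and the function names are hypothetical.

```python
from itertools import accumulate

# Columns of the slide's FRUIT_TABLE, stored column-wise.
A = [1, 2, 3, 4, 5]
B = [5, 4, 3, 2, 1]
C = ["Apple", "Grapes", "Orange", "Lemon", "Banana"]


def and_mask(a, b):
    """Evaluate A >= B AND A < 5 for each row. Rows are independent, so on a
    GPU every index could be handled by its own thread."""
    return [int(x >= y and x < 5) for x, y in zip(a, b)]


def compact(values, mask):
    """Stream compaction ("reconstruction"): an exclusive prefix sum over the
    mask gives every selected row its slot in the output array."""
    offsets = [0, *accumulate(mask)][:-1]
    out = [None] * sum(mask)
    for value, keep, slot in zip(values, mask, offsets):
        if keep:
            out[slot] = value  # distinct slots, so these writes could run in parallel
    return out


mask = and_mask(A, B)
print(mask)              # [0, 0, 1, 1, 0]
print(compact(C, mask))  # ['Orange', 'Lemon']
```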
  • 15. Where is spatio-temporal processing different? Polygon operations.
  • 16. Crucial requirements for the database system: fast insert; fast processing; scalability & high availability; limited pre-aggregation; standardized access and common syntax.
  • 17. Deep tech based on real science. Google Protocol Buffers. GPU data processing is written in CUDA 10 (direct commands to the HW at the single-core level). The database core is written in the low-level language C++17 (memory management, control of instructions…). Libraries for specific modules (networking, building, parsing…). Created in cooperation with top talents from the Slovak Technical University.
  • 18. What is qikkDB for? Filtering and aggregations over a single, huge, flat table; spatio-temporal data processing; complex polygon operations (contains, intersect, union); numeric and datetime data; incremental data that grows over time. Use cases: network utilization & analysis, risk scoring, dynamic pricing, real-time analytics, hypothesis verification, profiling of big data, machine learning, etc. Data types: logs, polygons, IoT, GPS, network, events, automotive, maps.
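The "contains" polygon operation can be illustrated with the classic even-odd ray-casting test. This is a CPU sketch, not qikkDB's GPU implementation, which the slides do not describe. Points lying exactly on an edge are not handled consistently, and the function name is hypothetical. Every point is tested independently, so the test parallelizes naturally with one point per GPU thread.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_contains(polygon: List[Point], point: Point) -> bool:
    """Even-odd rule: cast a ray from `point` towards +x and count how many
    polygon edges it crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(polygon_contains(square, (2, 2)))  # True
print(polygon_contains(square, (5, 2)))  # False
```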
  • 19. So how fast is it? Benchmark: 1.2B data rows in 7 columns; average execution times were obtained from 200 query runs. The biggest datasets tested were 400 GB, limited by memory; bigger datasets can be cached from disk (benchmarks to come soon).
  • 20. Execution times compared to other DBs (results in ms). Compared databases: 1. QikkDB; 2. GiraffeDB, a leading GPU database; 3. CatDB, a leading columnar database; 4. RacoonDB, a tuned leading relational database. We use codenames for well-known databases because, for legal reasons, we can’t tell you who these slow guys are. GPU machine: p3.8xlarge (4x Tesla V100). CPU machine: c5d.9xlarge (36 CPU cores).

| Query | qikkDB @ p3.8xl | qikkDB @ g4dn.12xl | GiraffeDB @ p3.8xl | CatDB @ c5d.9xl | RacoonDB @ c5d.9xl | Elastic (tuned) | Spark 21x m3xl | Spark i3.8xl |
|---|---|---|---|---|---|---|---|---|
| #1 | 22 | 37 | 25 | 435 | 22 | 810 | 2362 | 22000 |
| #2 | 37 | 82 | 235 | 1061 | 964 | 1818 | 3559 | 25000 |
| #3 | 228 | 925 | 231 | 1630 | 3491 | n/a | 4019 | 27000 |
| #4 | 283 | 1105 | 417 | 2174 | 3996 | n/a | 20412 | 65000 |
| Avg | 143 | 537 | 227 | 1325 | 2118 | n/a | 7588 | 34750 |

10 to 100x quicker.
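The Avg row of the slide-20 benchmark can be recomputed from the four per-query timings. The same data gives each system's average relative to qikkDB on p3.8xl. Elastic is left out because two of its four results are n/a. Rounding half up to whole milliseconds is an assumption, though it reproduces every Avg value on the slide. The helper names are illustrative.

```python
# Per-query execution times in ms (queries #1-#4), copied from the slide-20 table.
TIMES_MS = {
    "qikkDB @ p3.8xl": [22, 37, 228, 283],
    "qikkDB @ g4dn.12xl": [37, 82, 925, 1105],
    "GiraffeDB @ p3.8xl": [25, 235, 231, 417],
    "CatDB @ c5d.9xl": [435, 1061, 1630, 2174],
    "RacoonDB @ c5d.9xl": [22, 964, 3491, 3996],
    "Spark 21x m3xl": [2362, 3559, 4019, 20412],
    "Spark i3.8xl": [22000, 25000, 27000, 65000],
}
BASELINE = "qikkDB @ p3.8xl"


def mean_ms(values):
    """Arithmetic mean rounded half up to whole ms (Python's round() would
    round 142.5 down to 142, so it is avoided here)."""
    return int(sum(values) / len(values) + 0.5)


def slowdown_vs_baseline(times, baseline=BASELINE):
    """Each system's unrounded average divided by the baseline's average."""
    base = sum(times[baseline]) / len(times[baseline])
    return {name: round(sum(v) / len(v) / base, 1) for name, v in times.items()}


if __name__ == "__main__":
    ratios = slowdown_vs_baseline(TIMES_MS)
    for name, values in TIMES_MS.items():
        print(f"{name:20s} avg {mean_ms(values):6d} ms  ({ratios[name]:.1f}x baseline)")
```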
  • 21. The blazing speed: same HW, 1.2bn data points, 2 databases, both running on AWS g4dn.12xlarge (48 vCPUs, 192 GB RAM, 4x Tesla T4 GPUs). Deployed beta platform with a data exploration front-end: www.tellstory.cloud
  • 22. QikkDB demo on smart meter data
  • 23. What's going on in the background? Data storage & flow: data is persisted on disk (compressed) and pre-loaded into RAM. The relevant columns go over PCI-E to the GPU, where they are held in GPU RAM (decompressed). CUDA kernels apply filters & aggregations and produce the result set. When new data is inserted, a column is created automatically (~“schema-less”), which is good for IoT and similar use cases.
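Automatic column creation on insert (“schema-less”) might behave like the following sketch. The class is an assumption for illustration. So is its policy of backfilling earlier rows with None (NULL), because the slides do not specify how qikkDB fills missing values.

```python
from typing import Any, Dict, List


class SchemaLessTable:
    """Hypothetical append-only columnar table: unseen keys become new columns."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[Any]] = {}
        self.row_count = 0

    def insert(self, record: Dict[str, Any]) -> None:
        for name in record:
            if name not in self.columns:
                # New column appears on first use; earlier rows get None (NULL).
                self.columns[name] = [None] * self.row_count
        for name, values in self.columns.items():
            # Columns missing from this record also receive None.
            values.append(record.get(name))
        self.row_count += 1


table = SchemaLessTable()
table.insert({"meter_id": 1, "kwh": 0.5})
table.insert({"meter_id": 2, "kwh": 0.7, "voltage": 229})  # adds "voltage"
print(table.columns["voltage"])  # [None, 229]
```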
  • 24. How can it scale? Multi-GPU (vertical) scalability on a single node (up to 8 GPUs): accelerating computations and enabling multiple sessions. Multi-level caching: GPU RAM cache and CPU RAM cache. On the roadmap: multi-node (horizontal) scalability, high availability, lazy data loading. Not limited by data size: best performance when the data fits in GPU memory, but data can be loaded from disk on demand.
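The multi-level caching idea puts a GPU RAM cache in front of a CPU RAM cache, with disk as the fallback. It can be sketched with two LRU caches. Everything here is hypothetical. Capacities are counted in columns rather than bytes, LRU is an assumed eviction policy, and the host-to-GPU copy is simulated by a plain assignment.

```python
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class LRUCache:
    """Least-recently-used cache with a fixed number of slots."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: Any) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class TwoLevelColumnCache:
    """Hypothetical column fetch path: GPU RAM cache -> CPU RAM cache -> disk."""

    def __init__(self, gpu_slots: int, cpu_slots: int,
                 load_from_disk: Callable[[str], Any]) -> None:
        self.gpu = LRUCache(gpu_slots)
        self.cpu = LRUCache(cpu_slots)
        self.load_from_disk = load_from_disk
        self.disk_reads = 0

    def fetch(self, column: str) -> Tuple[Any, str]:
        data = self.gpu.get(column)
        if data is not None:
            return data, "gpu"
        data, source = self.cpu.get(column), "cpu"
        if data is None:
            data, source = self.load_from_disk(column), "disk"
            self.disk_reads += 1
            self.cpu.put(column, data)
        self.gpu.put(column, data)  # stands in for a host-to-device copy
        return data, source
```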
  • 25. Why not just index? Traditional databases use indexes for faster processing, which results in slower inserts. qikkDB does not need indexes (although they are available anyway): data is simply appended, and the GPU takes care of fast processing.
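The trade-off between indexing and appending can be sketched by comparing two approaches. One keeps a sorted index maintained through Python's bisect module. The other is an append-only column that is fully scanned at query time. Both classes are hypothetical illustrations. On a GPU, the scan in the append-only version is the part that would run in parallel.

```python
import bisect


class AppendOnlyColumn:
    """Insert is an O(1) append; a query scans every value."""

    def __init__(self):
        self.values = []

    def insert(self, value):
        self.values.append(value)

    def range_query(self, low, high):
        # Full scan: each comparison is independent (GPU-friendly).
        return [row for row, v in enumerate(self.values) if low <= v < high]


class SortedIndexColumn:
    """Keeps a sorted (value, row_id) index: queries use binary search, but
    every insert pays to keep the index ordered (list shifting is O(n))."""

    def __init__(self):
        self.values = []
        self.index = []

    def insert(self, value):
        row = len(self.values)
        self.values.append(value)
        bisect.insort(self.index, (value, row))

    def range_query(self, low, high):
        # Row ids are >= 0, so (x, -1) sorts before every entry whose value is x.
        start = bisect.bisect_left(self.index, (low, -1))
        stop = bisect.bisect_left(self.index, (high, -1))
        return sorted(row for _, row in self.index[start:stop])


plain, indexed = AppendOnlyColumn(), SortedIndexColumn()
for v in [5, 1, 9, 3, 7]:
    plain.insert(v)
    indexed.insert(v)
print(plain.range_query(3, 8), indexed.range_query(3, 8))  # [0, 3, 4] [0, 3, 4]
```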
  • 26. Integration with your environment, based on standards you know: Kafka connector (streaming data); ODBC/JDBC (visualization tools such as Power BI…); adapters for C#, Java, and Python (custom applications, data analysis…). Speed up your BI tools and applications, or use TellStory for fast analysis: TellStory is an exploration & analysis front-end for data storytelling with real-time data exploration.
  • 27. Expensive hardware? A GPU instance on AWS costs ~12 USD/hour, and GPU HW costs ~50k USD. QikkDB can handle queries in a fraction of the time of traditional databases, so you can do more with your hardware allocation in the same time. It also means that you need much less hardware to do the same amount of work, which saves costs. “1 GPU node replaces up to 54 CPU nodes” (NVIDIA)
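The slide gives two price points. A rough way to compare them is the break-even time, at which on-demand cloud rental costs as much as buying the hardware. This simple arithmetic ignores power, cooling, operations staff, utilization, and reserved or spot pricing. The function name is illustrative only.

```python
def break_even_hours(hardware_usd: float, cloud_usd_per_hour: float) -> float:
    """Hours of on-demand cloud use whose rental cost equals the hardware price."""
    return hardware_usd / cloud_usd_per_hour


hours = break_even_hours(50_000, 12)
print(f"~{hours:,.0f} hours, or ~{hours / 24:.0f} days of 24/7 use")
# prints "~4,167 hours, or ~174 days of 24/7 use"
```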
  • 28. In short: interactive analytics on massive data sets. GPU acceleration: billions of data points in milliseconds. Great for spatio-temporal data: finding & understanding links between data points in space & time. Standard SQL syntax: easy to start using & to integrate into the data science environment. Efficiency & speed: GPUs are becoming commodity HW, and thanks to their efficiency, the cost per 10k queries is on par with CPU approaches. GPU columnar DB; real-time queries in milliseconds; API, ODBC, JDBC, connects to everything; SQL standard; spatio-temporal data processing; cloud or on-prem.
  • 29. Databases & GPU intro with QikkDB: intro to the deep-tech DB space; what GPUs are and how they accelerate HPC. Data storytelling with TellStory: traditional BI vs. data storytelling; explaining Covid-19 by creating a data story; live stories and fast data; TellStory roadmap; Q&A. Part 2! Let’s go!
  • 30. Martin is the former CEO of Instarea and now works at Ataccama as Head of Product Strategy. Martin created the https://qikk.ly/covid19 story and will walk you through how he did it.
  • 31. Storytelling: interpreted data, easy to understand, bringing new facts to the reader. And once they have the story, they can start to sell it to other parties. (Animated video playing.)
  • 32. A story is about being visual: cool visualization plugins & animations; newspaper-like reading, interactive; interesting facts.
  • 34. Live story: when you want to have the story live, you must have the data live, and when you work with big data sets of billions of records, you need a fast database. (Animated video playing.)
  • 35. TellStory roadmap: beta release in June. Find interesting facts; minute-by-minute updates (be notified when something interesting happens); animated visualizations (timeline charts, maps); share as video (Instagram upload, YouTube livestream); Google Sheets integration; auto-update data (scheduled refresh); embed sections (embedding only parts of a story will be possible). More features are to come in Phase 2, and “Let AI create your story” is in progress.
  • 36. Value proposition for Adastra services with these tools: quick pilots for hands-on experience. GPU data acceleration: a 2-month pilot to deliver real-time processing of vast streaming data (e.g. 5G, smart meters, transactions). Data storytelling: a 1-month pilot to provide customers with live & interactive intelligence and insights. Data storytelling: a 1-month pilot to give management the minute-by-minute data they need.
  • 38. Useful links, more info: https://qikk.ly (product website with basic information); https://qikk.ly/downloads/qikkDB_white_paper.pdf (white paper); https://docs.qikk.ly/ (documentation & installation instructions); https://support.qikk.ly/ (issue & feature reporting portal); https://tellstory.cloud (front-end for data visualization, SQL console on AWS); https://tellstory.ai (find out more about TellStory).