Big Data LDN 2017: BI Converges with AI - GPUs for Fast Data
1. BI Converges with AI - GPUs for Fast Data
James Mesney | Principal Solution Engineer | Kinetica EMEA
2. Analytics challenges faced by US Army Intelligence
• Kinetica was incubated as a massively parallel computational engine for US Army INSCOM.
• 200+ sources of streaming data producing 20B new records per day.
• Requirement to do ad-hoc analysis with human-response time on hot data.
• Reduce reliance on expensive racks of premium hardware.
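To put "20B new records per day" in engineering terms, here is a minimal Python sketch that converts it into an average per-second rate. It assumes records arrive uniformly over 24 hours and are split evenly over 200 sources (the slide says 200+), so real peak rates would be higher; the function names are illustrative only.

```python
# Back-of-envelope ingest-rate arithmetic for the figures on this slide.
# Assumptions (not from the slide): records arrive uniformly over 24 hours
# and are split evenly across sources.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_second(records_per_day: float) -> float:
    """Average overall ingest rate under a uniform-arrival assumption."""
    return records_per_day / SECONDS_PER_DAY


def per_source_rate(records_per_day: float, sources: int) -> float:
    """Average per-source rate, assuming an even split across sources."""
    return records_per_second(records_per_day) / sources


if __name__ == "__main__":
    daily = 20e9  # "20B new records per day"
    print(f"overall:    {records_per_second(daily):,.0f} records/s")
    print(f"per source: {per_source_rate(daily, 200):,.0f} records/s (200 sources)")
```

Under those assumptions this works out to roughly 231,000 records per second overall, or about 1,157 per second per source, sustained around the clock.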
3. Why a GPU Database? Why Now?
• Leverage Innovations in CPU and GPU technology
• Big Data
• In-memory and Parallel Processing
• Traditional Analytics
• Emerging AI/ML/Deep Learning Computing
• Real-Time and Streaming Data
• Geospatial and Temporal
• Use Commodity Hardware (and less of it)
• With Simplified Architecture / software stack
4. Why GPUs?
“By 2020, 80% of Big Data and Analytics deployments will need
distributed micro analytics and 40% of all business analytics software
will incorporate prescriptive analytics built on cognitive computing
functionality. Both of these trends require a dramatic increase in
processing power that could be enabled by GPUs.”
— IDC
“By 2018, over 50% of developer teams will embed cognitive services
in their apps (vs 1% today) providing U.S. enterprises with over
$60 billion annual savings by 2020.”
— IDC
5. What is a GPU?
• 5,000+ cores per device, versus 16 to 32 cores on a typical CPU.
• High-performance computing is trending toward GPUs to solve massive processing challenges.
• GPU acceleration brings high-performance compute to commodity hardware.
• Parallel processing is ideal for scanning an entire dataset and for brute-force compute.
GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated, similar instructions in parallel. This makes them a good fit for the compute-intensive workloads that large data sets require.
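The "same instruction, many data elements" idea can be sketched on a CPU in Python: split a column into slices, run the same predicate (the "kernel") over every slice, then reduce the partial results. This is a conceptual sketch of the data-parallel pattern, not GPU code; threads stand in for cores only to show the decomposition, and under CPython's GIL it is not expected to be faster than a plain loop. All names are hypothetical.

```python
# Data-parallel brute-force scan: the same predicate (the "kernel") runs over
# every slice of a column, then the partial counts are reduced to one answer.
# Threads stand in for GPU cores only to show the decomposition; under
# CPython's GIL this is not expected to beat a plain loop.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]


def chunk(values: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Split a column into at most n_chunks (>= 1) contiguous, near-equal slices."""
    if not values:
        return []
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def count_matching(slice_: Sequence[float], predicate: Predicate) -> int:
    """Per-worker kernel: apply the same predicate to every element of a slice."""
    return sum(1 for v in slice_ if predicate(v))


def parallel_scan_count(values: Sequence[float], predicate: Predicate,
                        n_workers: int = 4) -> int:
    """Scan the whole column in parallel slices, then sum the partial counts."""
    slices = chunk(values, n_workers)
    if not slices:
        return 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(count_matching, slices, [predicate] * len(slices))
        return sum(partials)  # reduction step


if __name__ == "__main__":
    column = list(range(1_000_000))
    print(parallel_scan_count(column, lambda v: v % 7 == 0))
```

On a GPU the slices would be far smaller (one element or a few per thread) and there would be thousands of them, but the shape of the computation, an identical per-element test followed by a reduction, is the same.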
6. GPU Benefits – One Tenth of the Hardware
SOLUTION
• Replacing a 300-node database cluster with 30 nodes of Kinetica powered by GPUs
BENEFITS
• 1/10 the size
• 100x to 200x faster than other in-memory databases
• Significant datacenter operations cost savings: headcount, environmental footprint, etc.
• Better deployment flexibility
• Very high performance, at scale
7. The Rise of GPU Computing
[Chart: performance on a log scale (10^2 to 10^7) from 1980 to 2020. Single-threaded performance (SpecINT) grew 1.5X per year, then slowed to 1.1X per year; GPU-computing performance grows 1.5X per year, reaching 1000X by 2025.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; new plot and data collected for 2010-2015 by K. Rupp.
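The gap between 1.5X and 1.1X per year compounds quickly. The short Python sketch below works through that arithmetic using only the two per-year rates quoted on the chart. It is an illustration of compounding, not a reconstruction of how the chart's 1000X projection was derived.

```python
import math


def years_to_reach(multiple: float, annual_growth: float) -> float:
    """Years of compounding at `annual_growth` per year needed to reach `multiple`.

    Solves annual_growth ** years == multiple for years.
    """
    return math.log(multiple) / math.log(annual_growth)


def growth_after(years: float, annual_growth: float) -> float:
    """Cumulative performance multiple after compounding for `years` years."""
    return annual_growth ** years


if __name__ == "__main__":
    for rate in (1.5, 1.1):  # the two per-year rates shown on the chart
        print(f"{rate}X/year: 1000X takes {years_to_reach(1000, rate):.1f} years; "
              f"10 years gives {growth_after(10, rate):.1f}X")
```

At 1.5X per year a 1000X gain takes about 17 years; at 1.1X per year it takes about 72 years. Over a single decade the two rates give roughly 58X versus 2.6X.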
11. What is Kinetica?
• GPU-accelerated, in-memory, MPP relational database
• Natural language processing and full-text search
• Native geospatial support and data visualisation
• Real-time data handlers to ingest structured and unstructured data
• Deep integration with open-source and commercial frameworks/apps: TensorFlow, Hadoop, Spark, NiFi, Kafka, Storm, Tableau, Kibana, Caravel…
• Linear, predictable scale-out for data ingestion, retention and querying
• None of the typical tuning, indexing and tweaking
• Huge range of APIs: ODBC, JDBC, SQL, Java, JS, C++, Python, C#, Node.js, REST
12. Kinetica: Core
[Diagram: Kinetica stack: HTTP head node; GPU-accelerated columnar in-memory database; interfaces & orchestration; commodity hardware with GPUs; disk.]
ANALYTICS DATABASE ACCELERATED BY GPUs
• Columnar in-memory database; data is persisted to disk.
• Data is available much like in a traditional RDBMS: tables, rows, columns, views.
• Interact with Kinetica through its native REST API, Java, Python, JavaScript, Node.js, C++, SQL, ODBC or JDBC.
• Native GIS support and visualisation.
• High concurrency.
• Security, administration, backup, monitoring and audit.
• Typical hardware setup: 256 GB to 1.5 TB memory and 2 to 4 GPUs per node.
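To illustrate what "columnar" means in general, the following minimal Python sketch pivots the same rows into one buffer per column. It is not Kinetica's storage format, and the table and field names are hypothetical. An aggregate over `amount` then reads only that one contiguous, same-typed buffer, which is the kind of data layout that suits the brute-force parallel scans described earlier.

```python
# Row-oriented vs column-oriented layout of the same (hypothetical) table.
# A columnar store keeps each column contiguous, so an aggregate over one
# column reads only that column's values rather than whole rows.
from array import array
from typing import Dict, List

ROWS: List[dict] = [  # hypothetical sample data, row-oriented
    {"id": 1, "region": "EMEA", "amount": 120.0},
    {"id": 2, "region": "APAC", "amount": 80.5},
    {"id": 3, "region": "EMEA", "amount": 200.0},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column (column-oriented layout)."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def numeric_column(columns: Dict[str, list], name: str) -> array:
    """Pack a numeric column into a contiguous typed buffer of C doubles."""
    return array("d", columns[name])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    amounts = numeric_column(cols, "amount")
    # The SUM touches only the 'amount' buffer; 'id' and 'region' are never read.
    print(sum(amounts))
```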
13. Kinetica High-Level Architecture
• VISUALIZATION via ODBC/JDBC APIs
• APIs: Java, JavaScript, REST, C# and C++, Node.js, Python
• OPEN SOURCE INTEGRATION: Apache NiFi, Apache Kafka, Apache Spark, Apache Storm
• GEOSPATIAL CAPABILITIES: geometric objects, tracks, geospatial endpoints, WMS, WKT
• OTHER INTEGRATION: message queues, ETL tools, streaming tools
• KINETICA CLUSTER, on-demand scale: Server 1, Server 2, Server 3 … Server n. Each server runs the GPU-accelerated columnar in-memory database with coordination & orchestration, on commodity hardware with GPUs, backed by disk.
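Scaling out across Server 1 to Server n means rows have to be spread over the cluster and queries fanned out to every server. A minimal Python sketch of one common way to do this, hash partitioning on a shard key with a scatter-gather count, is below. The slide does not describe Kinetica's distribution scheme, so treat this as a generic illustration; every name in it is hypothetical.

```python
# Hypothetical hash partitioning across a cluster of n servers: each row is
# routed to a server by hashing its shard key, and a query fans out to every
# server (scatter) before the partial results are combined (gather).
import hashlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def shard_for(key: Any, n_servers: int) -> int:
    """Map a shard-key value to a server index in [0, n_servers)."""
    # A stable digest is used because Python's built-in hash() of str is
    # randomised per process and would route keys differently on each run.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def distribute(rows: List[Row], key_field: str, n_servers: int) -> Dict[int, List[Row]]:
    """Route every row to the shard chosen by its key."""
    shards: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key_field], n_servers)].append(row)
    return shards


def scatter_gather_count(shards: Dict[int, List[Row]], field: str, value: Any) -> int:
    """Each shard counts its own matches locally; the partial counts are summed."""
    return sum(sum(1 for r in rows if r[field] == value) for rows in shards.values())


if __name__ == "__main__":
    rows = [{"device_id": f"sensor-{i}", "status": "ok" if i % 3 else "alert"}
            for i in range(30)]
    shards = distribute(rows, "device_id", n_servers=4)
    print({s: len(r) for s, r in sorted(shards.items())})
    print(scatter_gather_count(shards, "status", "alert"))
```

One design caveat: with plain modulo placement, changing `n_servers` moves most keys to a different server. Schemes such as consistent hashing, or a fixed set of virtual shards mapped onto servers, reduce that reshuffling when nodes are added.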
15. Visualization Pipeline: Outputs, Maps, Video
RENDER MASSIVE DATASETS IN SUB-SECOND
e.g. 4bn Twitter posts on a map in < 1 second
VIDEOS for GEO-TEMPORAL VISUALISATION
HEAT MAPS
FIND MATCHING RESULTS IN A CUSTOM AREA
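Two of these outputs rest on simple geometric building blocks: selecting points inside a user-drawn area, and binning points into grid cells for a heat map. The Python sketch below uses ray casting for the point-in-polygon test and fixed-size grid binning, both standard techniques. The slide does not say how Kinetica implements either. The sketch assumes planar (x, y) coordinates, and points lying exactly on a polygon edge may be classified either way.

```python
# Two geospatial building blocks behind the slide's examples, in planar (x, y)
# coordinates: a ray-casting point-in-polygon test for "results in a custom
# area", and grid binning of points into per-cell counts for a heat map.
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray from p crosses an edge."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def heat_map(points: Iterable[Point], origin: Point, cell_size: float,
             n_cols: int, n_rows: int) -> Counter:
    """Count points per (row, col) grid cell; points outside the grid are dropped."""
    ox, oy = origin
    counts: Counter = Counter()
    for x, y in points:
        col = int((x - ox) // cell_size)
        row = int((y - oy) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(row, col)] += 1
    return counts


if __name__ == "__main__":
    area = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # custom area
    pts = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (9.9, 9.9), (15.0, 5.0)]
    selected = [p for p in pts if point_in_polygon(p, area)]
    print(heat_map(selected, origin=(0.0, 0.0), cell_size=1.0, n_cols=10, n_rows=10))
```

Both functions do independent work per point, which is why they map well onto massively parallel hardware: each point's test or cell assignment can run on its own thread, with only the final cell counts needing a reduction.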
17. More Sophisticated Analytics Benefit from GPUs
• Simple reporting: List defaults from customers in the last 3 years.
• Standard analytics: What is the default rate for customers over a certain age, by region? By income?
• Real-time analytics: What is the risk profile of this customer, up to and including the transactions he made 10 seconds ago?
• Machine learning: Given location, buying history, demographics, past history and past purchases, what is the likelihood this customer will default?
• Deep learning: Deduce, from unspecified signals across a wide range of datasets, the likelihood that this customer will default.
INCREASING BENEFIT FROM GPUs (from simple reporting through to deep learning)
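The "standard analytics" question above breaks down into a filter, a group-by and an aggregate. In SQL this would be a WHERE plus GROUP BY. The minimal Python sketch below spells out the same steps over hypothetical customer records; the field names and data are invented for illustration.

```python
# The "standard analytics" question from the slide as filter + group-by +
# aggregate over hypothetical customer records (field names are illustrative).
from collections import defaultdict
from typing import Any, Dict, List

CUSTOMERS: List[Dict[str, Any]] = [
    {"region": "North", "age": 45, "income": "high", "defaulted": True},
    {"region": "North", "age": 52, "income": "low", "defaulted": False},
    {"region": "North", "age": 70, "income": "low", "defaulted": True},
    {"region": "South", "age": 61, "income": "low", "defaulted": True},
    {"region": "South", "age": 30, "income": "low", "defaulted": True},
    {"region": "South", "age": 48, "income": "high", "defaulted": False},
]


def default_rate_by(records: List[Dict[str, Any]], group_field: str,
                    min_age: int) -> Dict[Any, float]:
    """Default rate per group for customers strictly older than min_age."""
    totals: Dict[Any, int] = defaultdict(int)
    defaults: Dict[Any, int] = defaultdict(int)
    for r in records:
        if r["age"] <= min_age:  # filter: "over a certain age"
            continue
        key = r[group_field]
        totals[key] += 1
        defaults[key] += int(r["defaulted"])
    return {k: defaults[k] / totals[k] for k in totals}


if __name__ == "__main__":
    print(default_rate_by(CUSTOMERS, "region", min_age=40))  # by region
    print(default_rate_by(CUSTOMERS, "income", min_age=40))  # by income
```

Moving rightward along the slide, the same customer question needs fresher data (real-time analytics) and far more features per record (machine and deep learning). That growth in computation per query is what the "increasing benefit from GPUs" arrow refers to.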
18. Advanced In-Database Analytics
1. User-defined functions (UDFs) can receive table data, do
arbitrary computations, and save output to a separate table
in a distributed manner.
2. UDFs have direct access to CUDA APIs, enabling compute-to-grid analytics for logic deployed within Kinetica.
3. Works with custom code, or packaged code. Opens the way
for machine learning/artificial intelligence libraries such as
TensorFlow, BIDMach, Caffe and Torch to work on data
directly within Kinetica.
4. Available now with C++, Python & Java bindings.
[Diagram: orchestration layer with user-defined functions (UDFs). Across n Kinetica servers (physical or virtual), a proc server executes UDF_A, UDF_B … UDF_n over tables (Table A, Table B, Table C … Table n) using the GPU and CUDA libraries; data is returned to an output table for further analysis & visualisation.]
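The flow in point 1 (table data in, arbitrary computation, output written to a separate table, all distributed) can be modelled in a few lines of Python. This is a hypothetical model of the data flow only. None of these names are Kinetica's UDF API, and threads stand in for the per-server proc servers.

```python
# Hypothetical model of the distributed UDF flow on this slide: the same user
# function runs against each server's local partition of an input table and
# writes that server's partition of an output table. These names are
# illustrative only and are NOT Kinetica's UDF API.
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Partition = List[Row]                     # rows held by one server
UDF = Callable[[Partition], Partition]    # local input rows -> local output rows


def run_udf_distributed(input_table: List[Partition], udf: UDF) -> List[Partition]:
    """Run `udf` once per partition in parallel; output partition i comes from input i."""
    with ThreadPoolExecutor(max_workers=max(1, len(input_table))) as pool:
        return list(pool.map(udf, input_table))


def flag_large_transactions(partition: Partition, threshold: float = 1000.0) -> Partition:
    """Example UDF: arbitrary per-row computation written to a separate output table."""
    return [{"id": r["id"], "is_large": r["amount"] > threshold} for r in partition]


if __name__ == "__main__":
    table_a = [  # one inner list per (hypothetical) server
        [{"id": 1, "amount": 50.0}, {"id": 2, "amount": 5000.0}],
        [{"id": 3, "amount": 1200.0}],
        [],
    ]
    output = run_udf_distributed(table_a, flag_large_transactions)
    print(output)
```

A per-partition function like this only sees its own server's rows. Computations that need a global view, such as a mean across the whole table, need an extra aggregation step after the local passes.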
20. Kinetica Enables Broad Enterprise Solutions
RETAIL/CPG
Omni-Channel
Customer Experience
Supply Chain Optimization
Targeted Marketing
UTILITIES
Smart Meters
Smart Grid Optimization
Infrastructure MGMT
CROSS INDUSTRY
Real-Time Analytics
Converge AI & BI
Location-Based Analytics
IoT Analytics
FINANCIAL SERVICES
Risk Modeling
Financial Crimes
Compliance
Customer Experience
HEALTHCARE
Drug Development
Precision Medicine
Patient 360
MEDIA/ENTERTAINMENT
Sentiment Analytics
Recommendation Engines
Ad Targeting
COMMUNICATIONS
Customer Churn
Network Optimization
Content Targeting
LAW ENFORCEMENT, INTEL & DEFENCE
Cyber Security
Counter-Terrorism
Border Control
Threat Detection
21. CASE STUDY: LOCATION-BASED ANALYTICS
INTELLIGENCE: US Army - INSCOM
U.S. Army INSCOM migrated from Oracle to Kinetica
• Query time: Oracle Spatial 92 minutes; GPUdb 20 ms
• 1 GPUdb server vs. 42 servers with Oracle 10gR2 (2011)
• 42x lower space, 28x lower cost, 38x lower power cost
MISSION OBJECTIVE
• Kill or capture terrorists in real time
• Move from document-based to entity-based search
NEW CAPABILITIES DELIVERED
• Intel analysts can do real-time geospatial analytics on 200B
new records per day from 200+ UAV, SIGINT, ISR, and
GEOINT streaming big data feeds
• Military analysts are able to query and visualise billions to
trillions of near real-time objects
SOLUTION OVERVIEW
• US Army’s in-memory computational engine for geospatial
and temporal data. A major joint cloud initiative within the
Intelligence Community (IC ITE)
• Queries down from 92 minutes to less than 1 second
• Replaced 42 Oracle 10gR2 servers with a single Kinetica server: 42x lower space, 28x lower cost, 38x less power
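The case study gives two figures for the new query time: 20 ms, and "less than 1 second". The short Python calculation below shows the speed-up each figure implies against the 92-minute Oracle Spatial baseline. It is arithmetic on the quoted numbers only and assumes both timings describe comparable queries.

```python
# Speed-up ratios implied by the case-study figures (arithmetic only).
from datetime import timedelta

ORACLE_QUERY = timedelta(minutes=92)       # Oracle Spatial
GPUDB_QUERY = timedelta(milliseconds=20)   # GPUdb
SUB_SECOND_BOUND = timedelta(seconds=1)    # "less than 1 second"


def speedup(before: timedelta, after: timedelta) -> float:
    """How many times faster `after` is than `before` (timedelta / timedelta -> float)."""
    return before / after


if __name__ == "__main__":
    print(f"vs. 20 ms: {speedup(ORACLE_QUERY, GPUDB_QUERY):,.0f}x")
    print(f"vs. < 1 s: at least {speedup(ORACLE_QUERY, SUB_SECOND_BOUND):,.0f}x")
```

92 minutes is 5,520 seconds, so the "less than 1 second" bullet implies a speed-up of at least 5,520x, while the 20 ms figure implies about 276,000x.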