12. BIG DATA PLATFORMS
[Diagram: data scientists & developers work in a data science programming model focused on analytics; performance engineers work in an acceleration programming model focused on microarchitecture (Many-Cores, FPGA, SmartSSD). The programming model gap and skills gap are the inhibitors for hardware acceleration.]
13.
Cross platform
Hybrid acceleration
Intelligent, automatic
computation slicing
Zero code change
Dataflow Adaptation Layer
Dataflow Compiler
Hypervisor
HYPER-ACCELERATION
2X to 10X acceleration
BIG DATA PLATFORMS
Bigstream Hyper-acceleration Layer
Many-Cores, FPGA, SmartSSD
15. Bigstream Seamless Acceleration of Apache Spark
[Architecture diagram: the client application drives Spark through the standard Big Data Platform APIs (zero code change). On the master node, Catalyst produces a physical plan and the Bigstream Dataflow Compiler asks "Accelerate?" per task: Y yields a hyper-accelerated task built from HW accelerator templates via the Dataflow Adaptation layer, N yields a normal Spark task. Cluster management (Resource Manager, Application Master, Node Manager) exchanges resource-management messages as usual. On each executor node, the Bigstream Hypervisor runs the executors' tasks (normal or hyper-accelerated) on Many-Cores, FPGA, or SmartSSD.]
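The per-task "Accelerate?" decision can be sketched as a routing check: a task takes the hyper-accelerated path only if every operator in it has a matching hardware accelerator template, otherwise it falls back to a normal Spark task. This is an illustrative sketch, not Bigstream's actual interface; the operator names and the `FPGA_TEMPLATES` set are assumptions.

```python
# Hypothetical sketch of the "Accelerate?" decision: a physical-plan task
# is routed to the FPGA path only when every operator in it has a matching
# hardware accelerator template. Names are illustrative, not Bigstream's API.

FPGA_TEMPLATES = {"scan", "filter", "project", "hash_aggregate"}

def route_task(task_operators):
    """Return 'hyper-accelerated' if all operators map to HW templates,
    otherwise fall back to a normal Spark task (zero code change)."""
    if all(op in FPGA_TEMPLATES for op in task_operators):
        return "hyper-accelerated"
    return "normal"

print(route_task(["scan", "filter"]))           # hyper-accelerated
print(route_task(["scan", "sort_merge_join"]))  # normal
```

Because the fallback path is always available, a query mixing supported and unsupported operators still runs end to end, which is what makes the approach transparent to the application.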
16. AWS F1 TPC-DS Speedup Results vs Spark
[Bar chart: time (s) per TPC-DS query number; lower is better]
CPU: AWS F1 instance with 8 vCPUs and one Xilinx VU9P FPGA
~4x speedup on average; ~5x on scan-heavy queries
17. TPC-DS Scan Heavy Query - Cluster Results
[Bar chart: time (s) per TPC-DS query number; lower is better]
Processor: Intel Xeon Gold 6152, 22 cores, with one SmartSSD per node; memory: 200 GB per node
Spark config: one master, 4 executor nodes, 6 Spark cores per node
~4x speedup on average; ~6x on scan-heavy queries
18. 4x Faster Spark Queries on Microsoft Azure
*SmartSSD end-to-end speedup (vs. standard SSD) on a 20 GB demo data set
Find "annoying" flights: more than a 10-minute delay from scheduled departure
Query 1: create a heatmap of the number of annoying flights on a US map over the last 5 years
Query 2: create a heatmap of the number of annoying flights in the Bay Area since 2000
Data: Flights, Planes, Airports, Airlines
Results: 51 sec / 13 sec = 3.9x faster; 49 sec / 11 sec = 4.4x faster
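The "annoying flight" predicate behind both queries is simple to state. Below is a minimal stand-alone sketch in plain Python; the field names are assumptions, not the demo's schema, and in Spark the same predicate would be a DataFrame `filter` over the flights table.

```python
# A minimal sketch of the "annoying flights" predicate from the demo:
# a flight is annoying when its actual departure is more than 10 minutes
# after the scheduled one. Field names (minutes since midnight) are assumed.

flights = [
    {"flight": "UA101", "sched_dep": 600, "actual_dep": 615, "year": 2018},
    {"flight": "AA202", "sched_dep": 720, "actual_dep": 725, "year": 2019},
    {"flight": "DL303", "sched_dep": 480, "actual_dep": 505, "year": 2015},
]

def is_annoying(f):
    return f["actual_dep"] - f["sched_dep"] > 10

annoying = [f["flight"] for f in flights if is_annoying(f)]
print(annoying)  # ['UA101', 'DL303']
```

The heatmap queries then just group such flights by location and count them; the speedup comes from pushing the scan and filter down to the SmartSSD, not from changing the query.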
20. UNCLASSIFIED
High Performance Analytics at Scale -- Before ETL and Indexing
Neil Tender, BlackLynx, Senior Research Engineer
www.BlackLynx.tech
October 2, 2019
21. UNCLASSIFIED
The Big Data Problem
Big Data:
We’re generating data faster than ever
Over 90% of all the world’s data was generated in the last two years
Over 175 ZB of data per year by 2025
Volume, Variety and Velocity
Traditional approaches require the use of data preprocessing, such
as Extract, Transform, Load (ETL) for Data Warehousing
The growth rate of actionable data is exponentially outpacing the
growth of analyzed data
Most data is generated at the Edge -- impractical to rely completely
on data center-based approaches
Computational Challenges
Cluster Computing (Apache Spark, Hadoop) does not scale and is
not practical for many use cases
Mobile environments with Size, Weight, and Power requirements
Source: Design World Online
Analytics challenges are forcing new thinking in network, storage, and computing.
22. UNCLASSIFIED
BlackLynx Value Proposition
BlackLynx Enables High Performance Analytics -- without first requiring ETL and Indexing
• High volume/velocity source data is "thinned" to a manageable size of useful data in real time using FPGA/CPU
heterogeneous high-performance compute
Results can then be fed into traditional data pipeline with ETL/Indexing
Preserves ability to store raw data and perform post-analysis on complete data source
• Supports wide variety of data formats:
unstructured and structured text, PCAP, geospatial, wide-area video and imagery
• Powerful BlackLynx APIs allow chaining of analytics primitives to perform complex searches and analytics
• BlackLynx technologies work together with your preferred visualization tools and applications to supercharge
the speed and capabilities of analytics
23. UNCLASSIFIED
BlackLynx Solutions
SearchLynx - text search, pattern matching, and analytics
Complex queries, including fuzzy, regex, and geolocation searches
Semi-structured (XML, JSON, CSV) and unstructured data
CyberLynx - PCAP/network forensic analysis on raw files
Layer 2-4 tags, coupled with SearchLynx to search payloads
VisionLynx - object detection/recognition
Wide Area Imagery still/video
Uses accelerated DNN inferencing techniques
SignalLynx – accelerated processing of signals
Integrated with GNU Radio
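To make "fuzzy search" concrete: it means matching a term within a small edit distance of the query. The sketch below is plain Python, not the SearchLynx API; BlackLynx's point is that primitives like this run on FPGAs directly over raw, unindexed data.

```python
# Illustrative sketch of what a "fuzzy" text search does: match a term
# within a small edit distance. This is not the SearchLynx API.

def edit_distance(a, b):
    # Classic one-row dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def fuzzy_find(term, words, max_dist=1):
    return [w for w in words if edit_distance(term, w) <= max_dist]

print(fuzzy_find("kernel", ["kernal", "kernel", "colonel"]))
# ['kernal', 'kernel']
```

Doing this per query, at line rate, is what removes the need to build an index in advance.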
25. UNCLASSIFIED
Example: BlackLynx Solution as a Splunk Enterprise App
• Extend Splunk Enterprise via “Apps” to
integrate BlackLynx software technology
and search all the raw data for cyber,
performance, and compliance purposes
• In parallel with Splunk ingest, direct all data
(PCAP for example) to BlackLynx servers
and provide high performance forensics
while reducing Splunk storage costs
• Integrate BlackLynx raw-data, 7-layer visibility
with Splunk’s 24-hour real-time monitoring to
identify and resolve issues faster
• Create opportunities for future machine
learning by fully analyzing the machine
generated data
[Diagram: 10-100 Gbps network data feeds a packet capture server; saved PCAP/JSON/CSV/XML/unstructured files land in a raw storage repository attached to the BlackLynx server. The BlackLynx Splunk App provides alerts & full analytics alongside Splunk's ingestion of PCAP, netflow, active triggers, Bro logs, and machine data; 3rd-party applications connect via RESTful or ODBC/JDBC interfaces; future machine learning by fully analyzing the machine-generated data.]
Ability to search ALL the data enables improved visibility to
answer the hard questions while not raising Splunk license
costs
More Efficient Triage while reducing TCO
Enable automation methods to accelerate event detection
through the elimination of ETL and indexing
Discover events faster
Leverage all the Splunk capabilities while adding BlackLynx performance and high end search
capabilities (fuzzy searching, regular expressions, raw PCAP, etc.) to handle the growth in machine data
26. UNCLASSIFIED
Splunk Powered by BlackLynx Performance Examples
• The DNS log (2 GB) and the PCAP files (15.6 GB) are from the U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) dataset
• The tre-agrep tool was co-authored by Udi Manber, one of the great names in contemporary Computer Science and author of the well-regarded textbook Introduction to Algorithms: A Creative
Approach, which to this day enjoys wide use in Computer Science curricula worldwide
• The TSHARK search applies the filter parameter (ip.dest) to 16 files (serially). The TSHARK decode is only the time to build the decoded files (parallel processes) and does not include any filter time
27. UNCLASSIFIED
Wide Range of Hardware Platforms
Cloud
• Ultimate in
scalability
Edge
• Small form factor (SWaP) for
mobile, space, aeronautical
• Ruggedized/portable
environments
On-Premises
• High performance, dual-socket
servers
• Flexible compute/storage
configurations
31. XELERA ACCELERATION SOFTWARE
04.10.2019 31
• Analytics microservices
• Deterministic latency
32. XELERA ACCELERATION SOFTWARE
[Latency spectrum: hard real-time (milliseconds), actionable (seconds), reactive (minutes to hours), historical (days); real-time vs. batch processing]
34. SAP - BI ANALYTICS - FRAUD DETECTION
[Diagram: a web page sends a transaction request to the fraud-detection microservice, which checks it against collected customer behavior; an outlier is flagged as fraud, otherwise no fraud]
• More detections
• Fewer servers
• Lower operational costs
35. SAP - BI ANALYTICS - FRAUD DETECTION
Credit card transaction fraud detection:
• 145,751 data points
• 74 features per point
• Clustered into 2,000 partitions
[Bar chart: processing time [s], Xelera Analytics on OTC fp1c.2xlarge vs. SAP PAL on OTC s1.2xlarge; axis 0–300 s]
(*) Benchmarks obtained with SAP HANA PAL
on OTC; other recommender engine software
may deviate from these results
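The outlier test behind this kind of fraud detector can be sketched simply: cluster historical customer behavior, then flag a transaction whose distance to its nearest cluster center exceeds a threshold. The centroids, features, and threshold below are made up for illustration; they are not SAP PAL's or Xelera's actual parameters.

```python
# Sketch of clustering-based outlier detection for fraud scoring.
# Centroids stand in for clustered historical customer behavior;
# all numbers are illustrative.

import math

centroids = [(10.0, 1.0), (200.0, 3.0)]   # (amount, time-bucket) cluster centers
THRESHOLD = 50.0

def nearest_distance(point):
    return min(math.dist(point, c) for c in centroids)

def is_fraud(transaction):
    # Outlier = far from every known behavior cluster.
    return nearest_distance(transaction) > THRESHOLD

print(is_fraud((12.0, 2.0)))   # False: close to a behavior cluster
print(is_fraud((950.0, 4.0)))  # True: far from all clusters, an outlier
```

The FPGA's role in the benchmarked system is to keep this per-transaction scoring inside a tight latency budget at high request rates; the scoring logic itself stays the same.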
36. SPARK - BI ANALYTICS - RECOMMENDATION ENGINE
[Diagram: a web page calls a web service, which asks the recommendation microservice for a prediction]
• More recommendations per second
• Fewer servers
• Lower operational costs
37. SPARK - BI ANALYTICS - RECOMMENDATION ENGINE
Real-Time Movie Recommendation:
• 1,000 user requests per second
• 1,682 movies
(Machine Learning models)
• 50 ms round-trip latency constraint
[Bar chart: number of cloud instances needed, Xelera Analytics on AWS f1.2xlarge vs. Spark MLlib on AWS c4.8xlarge; axis 0–40]
(*) Benchmarks obtained with Apache Spark
framework on AWS; other recommender engine
software may deviate from these results
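At serving time, a recommendation request under a 50 ms budget reduces to dot products against precomputed latent factors (trained offline, e.g. with Spark MLlib ALS) plus a top-k sort. The factor values below are invented for illustration; only the shape of the computation is the point.

```python
# Minimal serving-side sketch of a recommender: score = user · movie
# over precomputed latent factors, then take the top-k. Factor values
# are made up; real factors would come from offline ALS training.

user_factors = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
movie_factors = {"Heat": [1.0, 0.0], "Amelie": [0.1, 0.9], "Alien": [0.6, 0.4]}

def recommend(user, k=2):
    scores = {
        m: sum(u * f for u, f in zip(user_factors[user], mf))
        for m, mf in movie_factors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice"))  # ['Heat', 'Alien']
```

With 1,682 movies this inner loop is tiny per request; the engineering problem is sustaining 1,000 such requests per second within the round-trip constraint, which is where the accelerator earns its keep.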
38. AUDIO STREAMING ANALYTICS - SPEAKER RECOGNITION
Neural network
Audio signal representation & preprocessing
microservice
Audio stream
Speaker
• Multiple user sessions connect asynchronously
to the microservice
• Scalable on-demand
• Each request must be completed within a
60ms latency window
39. AUDIO STREAMING ANALYTICS - SPEAKER RECOGNITION
[Bar chart: concurrent sessions per accelerator, Alveo U250 vs. Tesla V100 (no batching, multi-DNN-model); axis 0–90]
[Bar chart: single-request latency [ms] (mean) and single-session latency (max), Alveo U250 vs. Tesla V100; axis 0–50 ms]
(*) Benchmark obtained with Alveo U250 Dell R740 server vs. NVIDIA Tesla V100 architecture on AWS EC2
p3.2xlarge instance. Other recommender engine software may deviate from these results
40. CALL TO ACTION
Join Xelera Analytics microservices
Alveo U200 Alveo U250 Alveo U280
72. Greenplum: Open-Source DW Solution
• Field tested with widespread adoption in Telco,
Financial, Government, Retail, Insurance, …
• ~5% market-share currently, growing slowly
• ~150mm per year
https://discovery.hgdata.com/product/greenplum-database
73. Deepgreen DB: a (much) better Greenplum
More Speed
Between 2 – 15X faster for
complex OLAP queries
while maintaining 100%
compatibility.
More Connected
Dynamically read/write
AWS S3, HDFS, Oracle,
Kafka, etc.
More Intelligent
Integrated in-database
machine learning,
geospatial function, video
decoding and object
classification.
74. Gain Speed by Removing Bottleneck
Abundant Storage
• TB RAM
• NVMe SSD
• Smart SSD
Abundant Network
Bandwidth
• 10, 100 GigE is common
CPU severely limited
• Same old Xeon
• Xeon-Phi is a no-show
The New Bottleneck
75. Core Technology
Accelerate SQL by fully exploiting x86, FPGA & SSD
• JIT code generation on SQL
• Use the FPGA to relieve the CPU
• SIMD column-store + zonemap
• Performant network interconnect
• Integrated in-database machine learning, geospatial,
and video decoding with Xilinx FPGA
Keep pushing compute to data
[Bar chart: speedup (0–10x) across available hardware configurations 1–4]
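The zonemap named in the bullet list above is worth making concrete: store a (min, max) pair per block of a column so that a filter can skip entire blocks without reading them. Deepgreen pairs this with SIMD execution; the stdlib sketch below shows only the pruning idea, with invented data.

```python
# Sketch of zonemap pruning: keep min/max per block of a column so a
# filter can skip whole blocks without reading them. Data is illustrative.

BLOCK = 4
data = [3, 7, 5, 6,   40, 42, 41, 44,   9, 8, 10, 12]

# Build the zonemap: one (min, max) pair per block.
zones = [(min(data[i:i + BLOCK]), max(data[i:i + BLOCK]))
         for i in range(0, len(data), BLOCK)]

def scan_gt(threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    hits, blocks_read = [], 0
    for b, (lo, hi) in enumerate(zones):
        if hi <= threshold:          # zonemap says no row here can qualify
            continue
        blocks_read += 1
        hits += [v for v in data[b * BLOCK:(b + 1) * BLOCK] if v > threshold]
    return hits, blocks_read

print(scan_gt(30))  # ([40, 42, 41, 44], 1)  -- two of three blocks skipped
```

Because the pruning happens before the data is touched, it composes naturally with "pushing compute to data": a SmartSSD or FPGA path benefits from reading fewer blocks just as the CPU path does.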
76. Customer Use Case
• TELCO – churn analysis, BI, end-user usage application, etc.
• IOT – analysis on self-service data lake, SIM-card life-management
• Smart Cities – video discovery / log discovery
• Internet Company – anti-fraud, customer tagging, BI and reporting
77. FPGA Support
• Alveo U200, U250 – video discovery and log discovery applications
• Alveo U50 – Accelerated Postgres for Analytics
• AWS F1
• Azure
• Samsung Smart-SSD (coming soon)
79. Vision:
To enable customers to achieve significant performance improvement and
cost-savings beyond what traditional methods of computing can provide
About BigZetta
Location:
R&D center in Noida (India)
Business presence in San Jose and Seattle (USA)
Expertise:
Big-Data technologies
Power/Performance/Area Optimized Hardware design
Performance optimization of software applications using Hardware-Software co-design
81. Why accelerate Hive?
Most widely used Query Processing Engine in Big Data eco-system
More than 10,000 companies use Hive for their Big Data processing needs
Caters to variety of requirements: data warehouse, ETL, analytics etc.
Hive’s use for BI queries has critical runtime requirements (sub-second)
Provided by all major Big Data vendors
82. Data Analytics With Hive
Faster
Scale Up
Scale Out
CPU clock-speeds have saturated
Scale Up/Out give diminishing returns
How to get more speed?
83. FPGA Driven Acceleration
FPGAs work as a co-processor to the CPU
Speed up compute-intensive tasks
Available on all major clouds (AWS, Azure, Alibaba, Nimbix, …)
How to get the benefits of FPGAs in Hive?
84. CPU - Middleware - FPGA
bzQAccel: middleware between Hive and the underlying hardware
• Optimizes the query execution plan to suit FPGAs
• Provides the fastest execution of the plan on FPGAs
• For different queries, no need to recompile either the host code or the FPGA kernel
• Minimal data-movement penalty
[Diagram: the query and table data go from Hive to bzQAccel; bzQAccel transfers table data to the FPGA kernel (loaded with SQL operations), calls the kernel computation, and passes the result back to Hive]
86. Solution: bzQAccel (BigZetta Query Accelerator)
bzQAccel
No software or query
changes required
4x speed-up of
analytical queries
1-click install over any
Hive distribution
Technology extensible to Spark,
Presto, Impala, Druid ….
87. Availability
Supported Xilinx platforms: Alveo U200, U250 and U280
Whitepaper, datasheet and demo available at
http://www.bigzetta.com/
Trial software available on Nimbix cloud
To request an evaluation: sales@bigzetta.com
Fill in an evaluation checklist to help with qualification
91. www.inaccel.com™
Integrated solution for Application Acceleration
InAccel Scalable FPGA Resource Manager
Accelerated ML suite
On-premise Cloud
Higher Performance
Up to 16x Speedup compared to
highly optimized libraries
Lower Cost
Up to 4x lower TCO
Zero-code changes
Seamless integration to widely
used frameworks
Easy deployment
Docker-based container for
seamless integration
On-prem or on cloud
Available on cloud and on-prem
92. www.inaccel.com™
InAccel Technology: Coral FPGA Resource Manager
˃ Coral abstracts FPGA resources (device, memory), enabling fault-tolerant heterogeneous distributed systems to be built easily and run effectively.
World’s first FPGA orchestrator:
program against your FPGAs as if they were
a single pool of accelerators
[Diagram: applications call the InAccel Coral Resource Manager; the InAccel Runtime provides resource isolation over the FPGA drivers; each server hosts FPGAs with loaded kernels]
“automated deployment, scaling, and management of FPGAs”
93. www.inaccel.com™
InAccel Docker Service
˃ Sustains FPGA driver
compatibility between the host
and the containers
• discovers available resources
• mounts/isolates visible devices
‒ no need for --privileged
• resolves library dependencies
[Diagram: apps run in containers on the Docker engine with the InAccel container runtime; InAccel’s Coral device plugin exposes the server’s FPGAs (Intel/Xilinx) through the FPGA runtime on the host OS]
95. www.inaccel.com™
Performance evaluation on Machine Learning
˃ Up to 15x speedup for LR ML (7.5x overall)
˃ Up to 14x speedup for K-means ML (6.2x overall)
˃ Spark-GPU* (3.8x – 5.7x)
˃ f1.4x: 16 cores + 2 FPGAs (InAccel)
˃ r5d.4x: 16 cores
[Bar chart: logistic regression execution time, MNIST 24 GB, 100 iterations (secs); r5d.4x vs. f1.4x (InAccel); stacked: data preprocessing, data transformation, ML training; 15x speedup]
[Bar chart: K-means clustering execution time, MNIST 24 GB, 100 iterations (secs); r5d.4x vs. f1.4x (InAccel); stacked: data preprocessing, data transformation, ML training; 14x speedup]
*[Spark-GPU: An Accelerated In-Memory Data Processing Engine on
Clusters]
96. www.inaccel.com™
Serverless deployment
˃ Integrated framework for serverless deployment
˃ Compatible with Kubernetes
˃ Compatible with Kubeless, Knative
˃ Users only have to upload the images to the S3 bucket; InAccel’s FPGA Manager then automatically deploys the cluster of FPGAs, processes the data, and stores the results back in the S3 bucket.
˃ Users do not need to know anything about the FPGA execution.
[Diagram: files uploaded to Amazon S3 trigger the InAccel FPGA Resource Manager, which runs accelerated functions from the f1 library on a cluster of Amazon EC2 f1 instances and writes the results back to S3 for download]
https://medium.com/@inaccel/fpgas-goes-serverless-on-kubernetes-55c1d39c5e30
98. www.inaccel.com™
Example of scaling to 2 FPGAs using the resource manager for logistic regression
1.86x speedup using 2 FPGAs, simply by changing a line: you specify how many FPGAs you want to use.
inaccel start --fpga=xilinx:0,xilinx:1
or
inaccel start --fpga=all
99. www.inaccel.com™
Apache Arrow Integration: Summing Up
˃ Seamless Arrow integration
˃ Page-aligned
columnar format
˃ Native memory map
˃ Zero-copy operations
[Diagram: App1, App2, and App3 share the Coral FPGA Resource Manager in front of an FPGA cluster; the columnar-format structure resides in DRAM]
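The zero-copy property claimed above is the heart of the Arrow integration: a page-aligned columnar buffer in DRAM can be sliced and shared without duplicating data. Arrow's own API (pyarrow) provides this natively; the stdlib sketch below demonstrates just the idea with a `memoryview` over a contiguous column.

```python
# The zero-copy idea behind Arrow's columnar format, sketched with the
# Python stdlib: a memoryview slices a column's contiguous buffer without
# copying, which is what lets multiple apps (or an accelerator's DMA
# engine) share one in-DRAM columnar structure. pyarrow does this natively.

from array import array

column = array("d", [1.0, 2.0, 3.0, 4.0])  # one column, contiguous in memory
view = memoryview(column)                   # zero-copy view of the buffer

window = view[1:3]                          # slicing the view: still no copy
print(window.tolist())                      # [2.0, 3.0]

# Writing through the view mutates the original buffer, proving the
# memory is shared rather than copied:
view[0] = 9.0
print(column[0])                            # 9.0
```

For an FPGA resource manager, this matters because a kernel can consume the column straight from shared memory; serialization between the application's frame and the accelerator's input disappears.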
100. www.inaccel.com™
Try it now for free
˃ Get a free license for the Coral Resource Manager:
https://inaccel.com/license/
˃ Scale Xilinx’s cores (compression or OpenCV):
‒ https://docs.inaccel.com/latest/develop/examples/
˃ Use the open-source ML cores:
https://github.com/inaccel