TDC2017 | POA Big Data Track - IBM Big SQL - High-performance data query engine for Hadoop

© 2016 IBM Corporation
Big SQL – An Overview
Julio Boehl
boehl@br.ibm.com

Big SQL Master Class
▪ 25+ Micro Learning Topics (short videos, 5-15 minutes each)
– Use Cases
– Install
– Security
– Performance
– Federation
– More…!
http://bit.ly/2tHYfw0

Leaders in Technology with Common Goals
Consumers get the best in class technology with a solid roadmap
• Data Science Platform ranked #1 by Gartner
• Leader in SQL technology for Hadoop
• Leader in on premise and hybrid cloud data and analytics solutions
• Leader in Open Source Hadoop Distribution
• 1000+ customers and 2100+ ecosystem partners
• Original architects, developers and operators of Hadoop
Commitment to progressing advanced analytics through open source

IBM and Hortonworks Partnership History
Timeline, 2015 to 2017:
• IBM and Hortonworks co-found ODPi
• IBM IOP and HDP certify for ODPi V1
• IBM and Hortonworks Power partnership
• IBM IOP and HDP certify for ODPi V2
• Big SQL certified for IOP and HDP
• IBM and Hortonworks expand partnership
ODPi = Open Data Platform initiative. For more information, visit odpi.org

IBM and Hortonworks Advance Client’s Analytics Journey
• Big Data Persistent Storage: Hortonworks Data Platform
• Big Data Access Layer: IBM Big SQL
• Data Science and Machine Learning IDE: IBM Data Science & Machine Learning

16+ SQL Engines for Hadoop
(Alphabetical Ordering)
Big SQL (IBM)
Drill
HAWQ
Hive
Impala
InfiniDB
JethroData
MemSQL
Phoenix
Presto
Spark SQL
Splice Machine
Transwarp
Trafodion
Vertica on Hadoop
(… and I’m sure we’re missing a few …)

Use-case categories shown on the chart:
• Fewer Users, Ad Hoc Queries & Discovery
• Transactional, Fast Lookups, Operational Data Store
• Ad Hoc Data Preparation
• EL-T and Simpler Large Scale Queries
• Complex SQL, Many Users, Warehousing
Engines placed on the chart: Hive, Spark SQL, Drill, Phoenix + HBase, Splice-Machine, ???
Each SQL engine has its own strengths

Hive is Really 3 Things…
Open source SQL on Hadoop
SQL Execution Engine
Hive
(Open Source)
Hive Storage Model
(open source)
CSV, Parquet, ORC, Tab Delim., Others…
Hive Metastore
(open source)
MapReduce
Tez
Applications

Big SQL Preserves Open Source Foundation
Leverages Hive metastore and storage formats.
No Lock-in. Data part of Hadoop, not Big SQL. Fall back to Open Source Hive Engine at any time.
SQL Execution Engines
Big SQL
(IBM)
Hive
(Open Source)
Hive Storage Model
(open source)
CSV, Parquet, ORC, Tab Delim., Others…
Hive Metastore
(open source)
Applications

IBM Big SQL
Making Big Data SQL Accessible
Rich SQL
– ANSI Compliant SQL
– IBM SQL PL Compatibility
– Extensive Analytic Functions
– Fluid Query for Heterogeneous DB support
Application Portability
– SQL Compatibility
– Standard ODBC and JDBC Drivers
– Comprehensive File Format Support
– Data Shared with Hadoop Ecosystem
High Performance
– Modern MPP Runtime
– Cost based Optimizer
– Powerful Query Rewrite
– Optimized for Concurrent User Throughput
Enterprise Ready
– Advanced Security and Auditing
– Workload Management
– Self-Tuning Memory Management
– Comprehensive Monitoring

IBM Big SQL on Hadoop
▪ Comprehensive ANSI SQL on Hadoop
– All standard SQL language
– Stored procedures and user-defined functions
▪ Integration with RDBMS
– Big SQL LOAD command can load data from a remote database or table
– Query heterogeneous databases, such as Oracle or Teradata, using the federation feature
▪ Optimization and performance
– Replaces MapReduce layer
– In-memory operations with ability to spill to disk
– Cost-based query optimization
▪ Open Hadoop storage supported
– Data persisted in HDFS, Hive, HBase
[Diagram: SQL-based Application, Big SQL Engine (SQL MPP Run-time), Data Storage (HDFS), all on Hadoop]

Boost Your Performance!
Hive 2 LLAP is good and with Big SQL it’s even better
Concurrent Queries | Hive 2 + LLAP (24 x 2TB disks) | Big SQL (12 x 2TB disks)
5                  | 7.76                           | 4.23
25                 | 36.24                          | 4.42
100                | 102.89                         | 4.72
Despite running with 50% fewer nodes, Big SQL was 22X faster @ 100 concurrent users
Let’s try 10x more data
Concurrent Queries | Hive 2 + LLAP @ 1 TB | Big SQL @ 1 TB | Big SQL @ 10 TB
5                  | 7.76                 | 4.23           | 8.72
25                 | 36.24                | 4.42           | 36.39
100                | 102.89               | 4.72           | 37.02
Big SQL performs 275% faster @ 100 concurrent users!
[Chart: Hive 2 + LLAP and Big SQL 4.3. Elapsed time by number of concurrent queries (5, 25, 100); series: Hive, Big SQL]
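The speedup claims on this slide can be checked with simple arithmetic. The sketch below recomputes the ratios from the elapsed times in the benchmark tables; the variable and function names are illustrative, and the time unit is left unspecified because the slide does not state it.

```python
# Elapsed times transcribed from the benchmark tables on this slide.
# The unit is not stated on the slide, so only ratios are computed.
CONCURRENCY = [5, 25, 100]
HIVE_LLAP_1TB = [7.76, 36.24, 102.89]
BIG_SQL_1TB = [4.23, 4.42, 4.72]
BIG_SQL_10TB = [8.72, 36.39, 37.02]


def speedups(baseline, candidate):
    """Return baseline/candidate ratios; a value above 1 means the candidate is faster."""
    return [round(b / c, 2) for b, c in zip(baseline, candidate)]


same_data = speedups(HIVE_LLAP_1TB, BIG_SQL_1TB)    # both engines at 1 TB
ten_x_data = speedups(HIVE_LLAP_1TB, BIG_SQL_10TB)  # Hive at 1 TB vs Big SQL at 10 TB

for users, s1, s10 in zip(CONCURRENCY, same_data, ten_x_data):
    print(f"{users:>3} concurrent queries: {s1}x at 1 TB, {s10}x with Big SQL at 10 TB")
```

At 100 concurrent queries the ratio is about 21.8 at equal data size, which the slide rounds to 22X. With Big SQL on 10x more data the ratio is about 2.78, which is the figure the slide reports as 275%. At 5 and 25 concurrent queries the 10 TB Big SQL run is roughly on par with Hive at 1 TB.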

▪ Easy porting of enterprise applications
▪ Ability to work seamlessly with Business Intelligence tools like Cognos to gain insights
▪ Big SQL integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which allows you to:
– Analyze assets
– Utilize assets in jobs
– Designate stewards for the assets
[Diagram: Oracle SQL, DB2 SQL, Netezza SQL; Big SQL with SQL syntax tolerance (ANSI SQL compliant); Cognos Analytics; InfoSphere Metadata Asset Manager]
Data engineer: Big SQL is a synergetic SQL engine that offers SQL compatibility, portability, and the collaborative ability to run composite analysis on data

Manhattan Associates
A World Leader for Warehouse Management Solutions
#1 Requirement
Existing Cognos reports (on Oracle)
must run against data archived to Hadoop.
Before Big SQL: failed with Cloudera Impala, Hive + Tez, MapR DB
With Big SQL: successful with all reports, unmodified, in a 1-day PoC
"If you're using Cognos, Big SQL is the best option for Hadoop"
- Vivek Srivastava, Sr. Director, Manhattan Associates Supply Chain Solutions

PERFORMANCE
Big SQL 4.3 is 3.2x faster than Spark SQL 2.1 (4 Concurrent Streams)
100TB HADOOP-DS AT A GLANCE
• I/O (vs Spark): Big SQL reads 12x less data and writes 30x less data
• COMPRESSION: 60% space saved with Parquet
• AVERAGE CPU USAGE: 76.4%
• MAX I/O THROUGHPUT: read 4.4 GB/sec, write 2.8 GB/sec
• WORKING QUERIES
Data engineer: Big SQL is a powerful analytical engine with leading performance metrics on high volumes of data and concurrent streams

[Diagram: Big SQL worker and Spark executor share data in memory, on top of HDFS]
Spark 2.1 is a powerful analytic co-processor that complements the rich SQL functionality of Big SQL
Tight integration with Spark enables the Big SQL worker and the Spark executor to communicate in memory without writing to disk
Bi-directional integration allows Spark jobs to be executed from Big SQL
Data engineer: Big SQL is a SQL engine with self-tuning memory management that integrates with Spark 2.1

Big SQL transparently queries heterogeneous systems in a single query
– Join Hadoop to RDBMSs
– Query optimizer understands the capabilities of the external system, including available statistics
Pre-bundled Progress DataDirect drivers offer easy connection setup
[Diagram: Big SQL Fluid Query (federation) connecting to Oracle, Teradata, DB2, Netezza (PDA), Informix, Microsoft SQL Server, Hive, HBase, HDFS, Object Store (Swift / S3), WebHDFS]
Data engineer: Big SQL is the ultimate hybrid SQL engine that allows query federation by virtualizing data sources and pushing processing to where the data resides
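The idea of one SQL statement spanning separate systems can be illustrated with an analogy. The sketch below does not use Big SQL or Fluid Query: it uses Python's built-in sqlite3 module, with two independent in-memory databases standing in for a Hadoop table and a remote RDBMS. All table names, column names, and data are hypothetical, and the example does not model pushdown or optimizer statistics.

```python
import sqlite3

# Analogy only: two separate SQLite databases play the roles of
# "Hadoop data" (main) and "a remote RDBMS" (remote_rdbms).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS remote_rdbms")

# Hypothetical tables: sales facts on one side, customer records on the other.
conn.execute("CREATE TABLE main.sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE remote_rdbms.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                 [(1, 10.0), (1, 5.5), (2, 7.0)])
conn.executemany("INSERT INTO remote_rdbms.customers VALUES (?, ?)",
                 [(1, "Ana"), (2, "Bruno")])

# A single query joins data that lives in two different databases.
rows = conn.execute("""
    SELECT c.name, SUM(s.amount) AS total
    FROM main.sales AS s
    JOIN remote_rdbms.customers AS c ON c.customer_id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

The application issues one query and never needs to know that the two tables live in different stores, which is the property the slide attributes to federation.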

[Diagram: roles BRANCH_A, BRANCH_B, FINANCE, and security admin]
Role Based Access Control enables separation of duties / audit
Row Level Security
Row and Column Level Security
Data engineer: Big SQL is the most secure analytical engine that offers row and column level access control (RCAC), among other security settings
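Row and column level access control can be pictured as two filters applied before results reach the user. The sketch below is a conceptual model in plain Python, not Big SQL syntax. The role names come from the slide, while the table, the rule that FINANCE sees every row, and the account-number mask are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str  # role names taken from the slide: BRANCH_A, BRANCH_B, FINANCE


# Hypothetical table contents.
ACCOUNTS = [
    {"branch": "BRANCH_A", "customer": "Ana", "account_no": "1111-2222"},
    {"branch": "BRANCH_B", "customer": "Bruno", "account_no": "3333-4444"},
]


def row_permitted(user, row):
    """Row-level rule (assumed): FINANCE sees all rows, a branch role sees only its own."""
    return user.role == "FINANCE" or user.role == row["branch"]


def apply_column_mask(user, row):
    """Column-level rule (assumed): only FINANCE sees the full account number."""
    visible = dict(row)
    if user.role != "FINANCE":
        visible["account_no"] = "XXXX-" + row["account_no"][-4:]
    return visible


def query_accounts(user):
    """Apply the row filter first, then the column mask, as one logical query."""
    return [apply_column_mask(user, row) for row in ACCOUNTS if row_permitted(user, row)]


teller = User("teller_a", "BRANCH_A")
auditor = User("auditor", "FINANCE")
print(query_accounts(teller))   # one row, account number masked
print(query_accounts(auditor))  # both rows, account numbers visible
```

Both users run the same query, and the engine decides what each one may see. Keeping the rules in the data layer rather than in each application is what makes this kind of control auditable.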

Leading Technology for Advanced Analytics
Big Data Storage
SQL Access
Data Science
