TDC2017 | POA Big Data Track - IBM BigSQL - High-performance data query engine for Hadoop
- 1. © 2016 IBM Corporation
Big SQL – An Overview
Julio Boehl
boehl@br.ibm.com
- 2.
Big SQL Master Class
▪ 25+ Micro Learning Topics (short videos, 5 to 15 minutes each)
Use Cases
Install
Security
Performance
Federation
More…!
http://bit.ly/2tHYfw0
- 3.
Leaders in Technology with Common Goals
Consumers get best-in-class technology with a solid roadmap
• Data Science Platform ranked #1 by Gartner
• Leader in SQL technology for Hadoop
• Leader in on-premises and hybrid cloud data and analytics solutions
• Leader in Open Source Hadoop Distribution
• 1000+ customers and 2100+ ecosystem partners
• Original architects, developers and operators of Hadoop
Commitment to progressing advanced analytics through open source
- 4.
IBM and Hortonworks Partnership History
Timeline, 2015 to 2017 (milestones in slide order):
• IBM and Hortonworks co-found ODPi
• IBM IOP and HDP certify for ODPi V1
• IBM and Hortonworks Power partnership
• IBM IOP and HDP certify for ODPi V2
• Big SQL certified for IOP and HDP
• IBM and Hortonworks expand partnership
ODPi = Open Data Platform initiative. For more information, visit odpi.org
- 5.
IBM and Hortonworks Advance Clients' Analytics Journey
• Big Data Persistent Storage: Hortonworks Data Platform
• Big Data Access Layer: IBM Big SQL
• Data Science and Machine Learning IDE: IBM Data Science & Machine Learning
- 6.
16+ SQL Engines for Hadoop
(Alphabetical Ordering)
Big SQL (IBM)
Drill
HAWQ
Hive
Impala
InfiniDB
JethroData
MemSQL
Phoenix
Presto
Spark SQL
Splice Machine
Transwarp
Trafodion
Vertica on Hadoop
(… and I’m sure we’re missing a few …)
- 7.
Each SQL engine has its own strengths
[Diagram: use cases matched to SQL engines, with a "Fewer Users" label]
Use cases:
• Ad Hoc Queries & Discovery
• Transactional, Fast Lookups, Operational Data Store
• Ad Hoc Data Preparation
• EL-T and Simpler Large Scale Queries
• Complex SQL, Many Users, Warehousing
Engines: Hive, Spark SQL, Drill, Phoenix + HBase, Splice Machine, ???
- 8.
Hive is Really 3 Things…
Open source SQL on Hadoop
▪ SQL Execution Engine: Hive (open source), on MapReduce or Tez
▪ Hive Storage Model (open source): CSV, Tab Delim., Parquet, ORC, Others…
▪ Hive Metastore (open source)
Applications
- 9.
Big SQL Preserves Open Source Foundation
Leverages the Hive metastore and storage formats.
No lock-in: the data is part of Hadoop, not Big SQL, so you can fall back to the open source Hive engine at any time.
▪ SQL Execution Engines: Big SQL (IBM), Hive (open source)
▪ Hive Storage Model (open source): CSV, Tab Delim., Parquet, ORC, Others…
▪ Hive Metastore (open source)
Applications
- 10.
IBM Big SQL
Making Big Data SQL Accessible
▪ Rich SQL
– ANSI Compliant SQL
– IBM SQL PL Compatibility
– Extensive Analytic Functions
– Fluid Query for Heterogeneous DB support
▪ Application Portability
– SQL Compatibility
– Standard ODBC and JDBC Drivers
– Comprehensive File Format Support
– Data Shared with Hadoop Ecosystem
▪ High Performance
– Modern MPP Runtime
– Cost based Optimizer
– Powerful Query Rewrite
– Optimized for Concurrent User Throughput
▪ Enterprise Ready
– Advanced Security and Auditing
– Workload Management
– Self-Tuning Memory Management
– Comprehensive Monitoring
- 11.
IBM Big SQL on Hadoop
▪ Comprehensive ANSI SQL on Hadoop
– Full standard SQL language
– Stored procedures and user-defined functions
▪ Integration with RDBMS
– The Big SQL LOAD command can load data from a remote database or table
– Query heterogeneous databases, such as Oracle or Teradata, using the federation feature
▪ Optimization and performance
– Replaces MapReduce layer
– In-memory operations with ability to spill to disk
– Cost-based query optimization
▪ Open Hadoop storage supported
– Data persisted in HDFS, Hive, HBase
[Diagram: SQL-based Application; Big SQL Engine (SQL MPP Run-time); Data Storage (HDFS); Hadoop]
- 12.
Boost Your Performance!
Hive 2 LLAP is good and with Big SQL it’s even better
Elapsed time by number of concurrent queries:
Concurrent Queries | Hive 2 + LLAP (24 x 2TB disks) | Big SQL (12 x 2TB disks)
5 | 7.76 | 4.23
25 | 36.24 | 4.42
100 | 102.89 | 4.72
Despite running with 50% fewer nodes, Big SQL was 22X faster @ 100 concurrent users
Let’s try 10x more data
Concurrent Queries | Hive 2 + LLAP @ 1 TB | Big SQL @ 1 TB | Big SQL @ 10 TB
5 | 7.76 | 4.23 | 8.72
25 | 36.24 | 4.42 | 36.39
100 | 102.89 | 4.72 | 37.02
Big SQL performs 275% Faster @ 100 concurrent users!
[Chart: Hive 2 + LLAP and Big SQL 4.3. Elapsed time (y-axis, 0 to 120) by concurrent queries (x-axis: 5, 25, 100); series: Hive, Big SQL]
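The speedup figures quoted on this slide can be reproduced from the elapsed times it lists. The following is a minimal sketch, assuming "N times faster" means the ratio of the Hive 2 + LLAP elapsed time to the Big SQL elapsed time; the slide does not state the time unit, and the dictionary and function names are illustrative, not part of any Big SQL tooling.

```python
# Elapsed times copied from the slide (the unit is not stated there).
# Keys are the number of concurrent queries.
hive_llap_1tb = {5: 7.76, 25: 36.24, 100: 102.89}
big_sql_1tb = {5: 4.23, 25: 4.42, 100: 4.72}
big_sql_10tb = {5: 8.72, 25: 36.39, 100: 37.02}


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline / candidate


for users in sorted(hive_llap_1tb):
    same_data = speedup(hive_llap_1tb[users], big_sql_1tb[users])
    ten_x_data = speedup(hive_llap_1tb[users], big_sql_10tb[users])
    print(
        f"{users:>3} concurrent queries: "
        f"{same_data:.1f}x at 1 TB, {ten_x_data:.2f}x with Big SQL at 10 TB"
    )
```

At 100 concurrent queries the ratios come out to roughly 21.8 and 2.78, which is consistent with the rounded "22X" claim and close to the "275%" claim. At 5 and 25 concurrent queries, Big SQL on 10 TB is not faster than Hive 2 + LLAP on 1 TB (the ratio is below 1), so the second claim applies specifically to the 100-user case.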
- 13.
▪ Easy porting of enterprise applications
▪ Ability to work seamlessly with Business Intelligence tools like Cognos to gain insights
▪ Big SQL integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which lets you:
– Analyze assets
– Utilize assets in jobs
– Designate stewards for the assets
[Diagram: Oracle SQL, DB2 SQL, Netezza SQL; Big SQL with SQL syntax tolerance (ANSI SQL Compliant); Cognos Analytics; InfoSphere Metadata Asset Manager]
Data engineer: Big SQL is a SQL engine that offers SQL compatibility, portability and the ability to combine data for composite analysis
- 14.
Manhattan Associates
A World Leader for Warehouse Management Solutions
#1 Requirement
Existing Cognos reports (on Oracle)
must run against data archived to Hadoop.
Before Big SQL: failed with Cloudera Impala, Hive + Tez, MapR DB
With Big SQL: successful with all reports, unmodified, in a 1-day PoC.
"If you're using Cognos, Big SQL is the best option for Hadoop"
- Vivek Srivastava, Sr. Director, Manhattan Associates
Supply Chain Solutions
- 15.
PERFORMANCE: Big SQL 4.3 is 3.2x faster than Spark SQL 2.1 (4 Concurrent Streams)
100TB HADOOP-DS AT A GLANCE
▪ I/O (vs Spark): Big SQL reads 12x less data and writes 30x less data
▪ COMPRESSION: 60% space saved with Parquet
▪ AVERAGE CPU USAGE: 76.4%
▪ MAX I/O THROUGHPUT: read 4.4 GB/sec, write 2.8 GB/sec
▪ WORKING QUERIES
Data engineer: Big SQL is a powerful analytical engine with leading performance metrics on high volumes of data and concurrent streams
- 16.
[Diagram: Big SQL worker and Spark executor share data in memory; HDFS]
Spark 2.1 is a powerful analytic co-processor that complements the rich SQL functionality of Big SQL
Tight integration with Spark enables the Big SQL worker and the Spark executor to communicate in memory without writing to disk
Bi-directional integration allows Spark jobs to be executed from Big SQL
Data engineer: Big SQL is a SQL engine with self-tuning memory management that integrates with Spark 2.1
- 17.
Big SQL transparently queries heterogeneous systems in a single query
▪ Join Hadoop to RDBMSs
▪ Query optimizer understands the capabilities of the external system, including available statistics
▪ Pre-bundled Progress DataDirect drivers offer easy connection setup
[Diagram: Big SQL with Fluid Query (federation) over Oracle, Microsoft SQL Server, Teradata, DB2, Netezza (PDA), Informix, Hive, HBase, HDFS, Object Store (Swift / S3), WebHDFS]
Data engineer: Big SQL is a hybrid SQL engine that federates queries by virtualizing data sources and pushes processing to where the data resides
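The idea of joining two separate data stores in a single SQL statement can be illustrated in miniature. The sketch below is only an analogy and does not use Big SQL or Fluid Query: it relies on SQLite's `ATTACH DATABASE` from the Python standard library, with one in-memory database standing in for the Hadoop-side data and a second standing in for an external RDBMS. The table names, column names and rows are hypothetical.

```python
import sqlite3

# "main" stands in for the local (Hadoop-side) store in this analogy.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for an external RDBMS.
conn.execute("ATTACH DATABASE ':memory:' AS remote")

# Hypothetical tables and rows, one table per "source".
conn.execute("CREATE TABLE main.sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE remote.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.sales VALUES (?, ?)",
    [(1, 10.0), (1, 5.5), (2, 7.0)],
)
conn.executemany(
    "INSERT INTO remote.customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Bruno")],
)

# One statement joins both sources; the caller does not move data by hand.
totals = conn.execute(
    """
    SELECT c.name, SUM(s.amount)
    FROM main.sales AS s
    JOIN remote.customers AS c ON c.id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```

A real federation layer additionally decides which parts of the query to push down to each remote system, which this toy example does not attempt.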
- 18.
[Diagram labels: BRANCH_A, BRANCH_B, FINANCE, (security admin)]
▪ Role Based Access Control enables separation of duties / audit
▪ Row Level Security
▪ Row and Column Level Security
Data engineer: Big SQL offers row and column level access control (RCAC) among other security settings
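Row and column level access control can be pictured as two filters applied before a user sees any data: a row rule that decides which rows are visible, and a column mask that decides which values are hidden. The sketch below is a plain Python simulation of that idea, not Big SQL syntax. The role names BRANCH_A, BRANCH_B and FINANCE come from the slide; the policy itself (branch roles see only their own branch, only FINANCE sees balances), the sample rows and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


# Hypothetical sample data; not taken from the slide.
ROWS = [
    {"branch": "BRANCH_A", "customer": "Ana", "balance": 1200},
    {"branch": "BRANCH_B", "customer": "Bruno", "balance": 830},
]


def visible_rows(user: User, rows: list) -> list:
    """Apply an assumed row rule and column mask for the given user."""
    result = []
    for row in rows:
        # Row rule (assumed): FINANCE sees every row,
        # a branch role sees only rows of its own branch.
        if "FINANCE" in user.roles or row["branch"] in user.roles:
            masked = dict(row)
            # Column mask (assumed): only FINANCE sees the balance value.
            if "FINANCE" not in user.roles:
                masked["balance"] = None
            result.append(masked)
    return result


branch_user = User("teller_a", frozenset({"BRANCH_A"}))
finance_user = User("analyst", frozenset({"FINANCE"}))
print(visible_rows(branch_user, ROWS))
print(visible_rows(finance_user, ROWS))
```

In a database the same rules are declared once on the table and enforced by the engine for every query, so applications do not need to repeat the filtering logic.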
- 19.
Leading Technology for Advanced Analytics
Big Data Storage
SQL Access
Data Science