1 © Hortonworks Inc. 2011 – 2017 All Rights Reserved
SQL on Hadoop - Batch, Interactive and Beyond
SoCal Big Data Day
John Park
Solution Engineer, Hortonworks
Rm 138-140
Disclaimer
This document may contain product features and technology directions that are under development, may be
under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation
project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release
through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache
Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual
commitment, promise or obligation from Hortonworks to deliver these features in any generally available
product.
Product features and technology directions are subject to change, and must not be included in contracts,
purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon
it when making purchasing decisions.
Presenter: John Park
• Solution Engineer, SoCal
• Data science, ETL, data warehousing, software design, architecture
• Previous – Various startups, Qlik, DW consultant, NCR
• Current – Helping customers implement and understand open source big data platforms
• Twitter: @jpark328
• Email: jpark@hortonworks.com
Before We Begin
• We have a raffle
• 2 winners at the end of the presentation
• Prize – Amazon Echo Dot
• Ask questions
Survey link: https://www.surveymonkey.com/r/940amSQLHadoopBatch
SQL is King
• Why?
– Familiarity
• Primary technical language of business analysts
– Powerful
• Maturation of RDBMS, EDW, OLTP
• ACID compliant
– Flexible
• Covers transactional processes to analytics
– Pervasive
• Emergence of BI tools (Tableau, BOBJ, Cognos)
• Deep ecosystem of tools
Overview of SQL on Hadoop Solutions
• SparkSQL: Spark's module for working with structured data. Run SQL queries alongside complex analytic algorithms.
• Apache Hive: A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
• Apache Phoenix: High performance relational database layer over HBase for low latency applications.
• Traditional MPP on Hadoop: Many traditionally architected MPP solutions have been ported to Hadoop and some new ones have been developed from scratch.
SQL on Hadoop: Vitals
Project | First GA Release | Lines of Code (June 2015)(*) | Most Typical Use
Apache Hive | April, 2009 (7 Years) | 1 Million | EDW / ETL Offload
SparkSQL | March, 2015 (2 Years) | 56.6k | Exploratory Analytics
Apache Phoenix | March, 2014 (3 Years) | 200k | Low-Latency Dashboards
Apache Hive: Fast Facts
• Most Queries Per Hour: 100,000 Queries Per Hour (Yahoo Japan)
• Analytics Performance: 100 Million rows/s Per Node (with Hive LLAP)
• Largest Hive Warehouse: 300+ PB Raw Storage (Facebook)
• Largest Cluster: 4,500+ Nodes (Yahoo)
Phoenix and HBase: Fast Facts
• Largest Database: 5 Petabytes (Flurry)
• Best Known App: Facebook Messages (Facebook)
• Fastest Ingestion: 10 Million Events/s (Yahoo)
• Biggest SQL App: Real-Time SQL on 140m+ Records (PubMatic)
Apache Hive: Strengths and Cautions
Strengths
• Huge Datasets
• Deep SQL Analytics
• EDW Offload
• BI Integration
Cautions
• Near-Real-Time
SparkSQL: Strengths and Cautions
Strengths
• Language-Integrated Query
• Exploratory Analytics
Cautions
• Large Datasets
• High Concurrency
• EDW Offload
Apache Phoenix: Strengths and Cautions
Strengths
• High Concurrency
• Near-Real-Time Query
• Fast Updates
Cautions
• Deep SQL Analytics
• Full-Table Scans / Scaled Analytics
• Existing BI Integrations
SQL on Hadoop - Good to know
• No one-size-fits-all solution
• Use cases and query patterns are important
• Prototype and fail fast
• Define scalability and performance criteria
SQL on Hadoop
Use Cases
Hive: Analytics Use Case
• Financial Services Company:
– Analyze a large dataset to identify potential fraud.
– Re-platformed from a mature EDW platform.
– Selection drivers: Breadth of SQL support, query performance, cloud consumption.
• Use Case Vitals:
– Analyze > 25 billion transactions per week.
– More than 1.5 TB of new data per day.
– > 4 PB of historical data available for analysis through cloud infrastructure.
Hive Performance with Scaling - Customer results on HDP 2.2
[Chart: Scalability on Hive. Elapsed time (seconds) for the Multi join - Allocation, Aggregation, and Total workloads on 5, 10, 20, 40, and 60 nodes.]
Benchmark test | 5 nodes | 10 nodes | 20 nodes | 40 nodes* | 60 nodes*
Multi join | 24:02 | 14:33 | 10:32 | 06:54 | 05:49
Aggregation | 21:59 | 12:20 | 07:55 | 05:16 | 02:38
Total | 46:02 | 26:53 | 18:27 | 12:10 | 08:27
Same Workload on EDW -- Full Rack | 8:00
(*) Projected times based on 5, 10 and 20 node results.
Aggregation Workload
• 5% more time required on
Hive.
• < 50% solution cost versus
traditional EDW.
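The comparison against the EDW can be checked with a little arithmetic. The sketch below is only an illustration: it copies the Total row and the 8:00 EDW figure from the table, and the function names are made up for this example. Note that the 40 and 60 node figures are projections, so anything derived from them is a projection too.

```python
# Elapsed totals (mm:ss) copied from the "Total" row of the benchmark table.
HIVE_TOTALS = {5: "46:02", 10: "26:53", 20: "18:27", 40: "12:10", 60: "08:27"}
EDW_FULL_RACK = "8:00"


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def overhead_vs_edw(nodes: int) -> float:
    """Extra time Hive needs relative to the EDW run, as a percentage."""
    hive = to_seconds(HIVE_TOTALS[nodes])
    edw = to_seconds(EDW_FULL_RACK)
    return (hive - edw) / edw * 100


def speedup(from_nodes: int, to_nodes: int) -> float:
    """How many times faster the larger cluster finishes the same workload."""
    return to_seconds(HIVE_TOTALS[from_nodes]) / to_seconds(HIVE_TOTALS[to_nodes])


if __name__ == "__main__":
    print(f"60-node overhead vs EDW: {overhead_vs_edw(60):.1f}%")
    print(f"5 -> 20 node speedup: {speedup(5, 20):.2f}x")
```

On these numbers the projected 60-node total lands a little over 5% above the EDW time, which appears consistent with the "5% more time" bullet, and quadrupling the cluster from 5 to 20 nodes gives roughly a 2.5x speedup rather than 4x.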
SparkSQL Use Case: Medical
[Diagram components: Sensor Data, HDFS, Aggregations (Hive), HCatalog, SparkSQL, JDBC Connector, Analytical Tools]
- Sensor data streamed into HDFS
- Large-scale pre-aggregations done using Hive
- SparkSQL powered dashboard for fast analytics.
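The split between a heavy pre-aggregation step and a light dashboard query can be sketched as follows. This is not Hive or SparkSQL code: Python's built-in sqlite3 module is used as a local stand-in so the example runs anywhere, and the table names, column names, and sample readings are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw sensor data as it might land in storage (hypothetical schema and values).
conn.execute(
    "CREATE TABLE sensor_readings (device_id TEXT, minute INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [
        ("pump-1", 0, 70.0),
        ("pump-1", 1, 74.0),
        ("pump-2", 0, 98.0),
        ("pump-2", 1, 102.0),
    ],
)

# Step 1: the large-scale pre-aggregation (the role Hive plays on the slide).
conn.execute(
    """
    CREATE TABLE device_summary AS
    SELECT device_id,
           COUNT(*)   AS readings,
           AVG(value) AS avg_value,
           MAX(value) AS max_value
    FROM sensor_readings
    GROUP BY device_id
    """
)


def dashboard_rows(min_avg: float):
    """Step 2: a small, fast query over the summary (the SparkSQL role)."""
    cursor = conn.execute(
        "SELECT device_id, avg_value FROM device_summary "
        "WHERE avg_value >= ? ORDER BY device_id",
        (min_avg,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(dashboard_rows(90.0))
```

The design point is that the dashboard never touches the raw readings; it only scans the much smaller summary table.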
Phoenix at PubMatic
Near-Real-Time SQL over >15 TB of Data
Using Apache Phoenix
Apache Phoenix at PubMatic
Key Concerns
• PubMatic offers marketing automation with real-time analytics that enable publishers to make smarter and faster decisions.
• To empower publishers to make real-time decisions, PubMatic needs a SQL solution that scales to terabytes of data yet can process hundreds of thousands of queries daily with near-real-time SLAs.
Solution
• Phoenix is the only Open Source SQL Solution for Hadoop designed for near-real-time querying, giving PubMatic’s publishers the timely insight they need to optimize their advertising strategies.
• Phoenix’s linear scalability enables PubMatic to offer real-time query over more than 15 terabytes of data using commodity hardware.
• Phoenix’s ANSI SQL interface makes it easy for publishers to slice and dice data the way they want.
Read more at http://phoenix.apache.org/who_is_using.html
SQL on Hadoop
Next Evolution
Evolution of Hive
Phase 1 (Delivered: HDP 2.2): Batch/ETL
• Transactions with ACID allowing insert, update and delete
• Temporary Tables
• Cost Based Optimizer optimizes complex join queries well.
Phase 2 (Delivered: HDP 2.5): Faster SQL
• Tech Preview: Sub-5-Second queries with LLAP
• Usability: SQL Query Editor, Visual Explain and Debugging
• Transparent Data Encryption
• Cross-Site Replication
• SQL, Performance Improvements
• Hive-on-Spark (Alpha / Beta)
Phase 3 (Planned: HDP 2.6*): Sub-Second with Rich Analytics
• Rich SQL:2011 Analytics
• Tech Preview: Druid OLAP Index for Hive
• GA: Sub-Second queries with LLAP
• Transaction Improvements (BEGIN/COMMIT/ROLLBACK, MERGE)
Apache Hive: Modern Architecture
Storage
• Columnar Storage: ORCFile, Parquet
• Unstructured Data: JSON, CSV, Text, Avro, Custom, Weblog
Engine
• SQL Engines: Row Engine, Vector Engine
SQL
• SQL Support: SQL:2011, Optimizer, HCatalog, HiveServer2
Cache
• Block Cache: Linux Cache
• Vector Cache
Distributed Execution
• Hadoop 1: MapReduce
• Hadoop 2: Tez, Spark
• Persistent Server: LLAP
Legend: Historical, Current, In Development
Sub-Second Hive with LLAP
Sub Second:
• LLAP: Persistent server to instantly execute SQL queries.
• Caches hottest data in RAM.
• Overcomes latencies associated with Hive on Tez or Hive on Spark.
SQL Compatibility:
• 100% Compatible with Hive SQL.
• Compatible with existing tools (BI, ETL, etc.)
Security:
• Security via HiveServer2.
• Integrates with Apache Ranger.
[Diagram: Hive SQL goes to HiveServer2, which talks to the LLAP Servers (1 per Hadoop node); each Hadoop node runs an LLAP Server with a Vector Cache.]
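The "caches hottest data in RAM" idea can be illustrated with a small in-memory cache placed in front of slow storage. The slides do not describe LLAP's actual eviction policy, so this sketch assumes plain least-recently-used (LRU) eviction purely for illustration; the class name and the loader callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class HotDataCache:
    """A tiny LRU cache: recently used blocks stay in memory, cold ones are evicted."""

    def __init__(self, capacity: int, load_from_storage: Callable[[str], bytes]):
        self._capacity = capacity
        self._load = load_from_storage
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: mark the block as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: pay the cost of reading from storage, then keep the block.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used block.
            self._entries.popitem(last=False)
        return value


def slow_storage_read(key: str) -> bytes:
    """Stand-in for a read from HDFS or S3 (assumption for this sketch)."""
    return f"block:{key}".encode("utf-8")


if __name__ == "__main__":
    cache = HotDataCache(capacity=2, load_from_storage=slow_storage_read)
    for key in ["a", "b", "a", "c", "a", "b"]:
        cache.get(key)
    print(cache.hits, cache.misses)
```

Repeated reads of the same block are served from memory, which is the effect a persistent server gets by keeping its cache alive between queries.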
Hive 2 with LLAP: Architecture Overview
[Diagram components]
• ODBC / JDBC: SQL Queries
• HiveServer2 (Query Endpoint)
• Query Coordinators (three Coordinators shown)
• YARN Cluster: four LLAP Daemons, each with Query Executors
• In-Memory Cache (Shared Across All Users)
• Deep Storage: HDFS, S3 + Other HDFS Compatible Filesystems
Apache Hive: Journey to SQL:2011 Analytics
Data Types
• Numeric: FLOAT, DOUBLE; DECIMAL; INT, TINYINT, SMALLINT, BIGINT; BOOLEAN
• String: CHAR, VARCHAR; BLOB (BINARY), CLOB (String)
• Date, Time: DATE, TIMESTAMP, Interval Types
• Complex Types: ARRAY / MAP / STRUCT / UNION
• Nested Data Analytics: Nested Data Traversal; Lateral Views
• Procedural Extensions: HPL/SQL
SQL Features
• Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
• Advanced Analytics: OLAP and Windowing Functions; OLAP: Partition, Order by UDAF; CUBE and Grouping Sets
• ACID Transactions: INSERT / UPDATE / DELETE
• Constraints: Primary / Foreign Key (Non Validated)
File Formats
• Columnar: ORCFile; Parquet
• Text: CSV; Logfile
• Nested / Complex: Avro; JSON; XML
• Custom Formats
• Other Features: XPath Analytics
Futures
• ACID MERGE
• Multi Subquery
• Scalar Subqueries
• Non-Equijoins
• INTERSECT / EXCEPT
• Recursive CTEs
• NOT NULL Constraints
• Default Values
• Multi Table Transactions
Legend: New; Projected: HDP 3.0; HDP 2.6
Track Hive SQL:2011 Complete: HIVE-13554
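Two of the listed core features, Common Table Expressions and inner joins, combine naturally in analytic queries. The sketch below shows the pattern using Python's built-in sqlite3 module as a local stand-in for a Hive connection; the `sales` table and its rows are hypothetical, and the SQL uses only constructs that appear in the feature list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("west", "a", 100.0),
        ("west", "b", 50.0),
        ("east", "a", 30.0),
        ("east", "b", 20.0),
    ],
)

# The CTE computes one total per region; the outer query joins it back to the
# detail rows to express each sale as a share of its region's total.
SHARE_QUERY = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT s.region, s.product, s.amount / t.total AS share
FROM sales AS s
INNER JOIN region_totals AS t ON s.region = t.region
ORDER BY s.region, s.product
"""

rows = conn.execute(SHARE_QUERY).fetchall()

if __name__ == "__main__":
    for region, product, share in rows:
        print(region, product, round(share, 3))
```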
Phoenix SQL: Today and Tomorrow
Phoenix: SQL for HBase
• SQL Datatypes (VARCHAR, INTEGER, etc.)
• JOINs: Inner, Left/Right Outer, Cross
• UPSERT / DELETE
• Derived Tables
• GROUP BY, ORDER BY, HAVING
• AVG, COUNT, MIN, MAX, SUM
• Primary keys, NOT NULL constraints
• CASE, COALESCE
• VIEWs
• Secondary Indexes
• Flexible Schema
• UNION ALL
• Functional Indexes
• Date / Time Functions
• UDFs
• Multi Table Transactions
• SQL GRANT / REVOKE
• Replication Management
• Column Constraints and Defaults
• OLAP, Cubing, Rollup
• UNION
Legend: Current; Future; Phoenix 4.4
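UPSERT is the least familiar item in this list for readers coming from other databases: a single statement that inserts a row if its primary key is new and overwrites it otherwise. The sketch below imitates that behavior with sqlite3's `INSERT OR REPLACE`, which is only an analog chosen so the example runs locally; it is not Phoenix itself, and the `page_views` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (page TEXT NOT NULL PRIMARY KEY, views INTEGER)"
)


def upsert(page: str, views: int) -> None:
    """Insert the row, or overwrite it if the primary key already exists.

    In Phoenix the statement would be written with the UPSERT keyword,
    e.g. UPSERT INTO page_views VALUES (?, ?).
    """
    conn.execute(
        "INSERT OR REPLACE INTO page_views (page, views) VALUES (?, ?)",
        (page, views),
    )


upsert("home", 10)
upsert("home", 25)      # same key: the earlier row is replaced, not duplicated
upsert("pricing", 3)

rows = conn.execute("SELECT page, views FROM page_views ORDER BY page").fetchall()

if __name__ == "__main__":
    print(rows)
```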
Looking forward - What Is Druid?
Druid is a distributed, real-time, column-oriented datastore
designed to quickly ingest and index large amounts of data
and make it available for real-time query.
Features:
• Streaming Data Ingestion
• Sub-Second Queries
• Merge Historical and Real-Time Data
• Approximate Computation
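"Approximate computation" usually means trading a small, bounded error for a large saving in memory and time, for example when counting distinct values. The slides do not say which algorithms Druid uses, so the sketch below picks a K-minimum-values estimator as a stand-in: it keeps only the k smallest hash values seen and infers the distinct count from how tightly they are packed. The function names and the choice of k are assumptions for this example.

```python
import hashlib
import heapq
from typing import Iterable


def _hash_to_unit_interval(value: str) -> float:
    """Map a value to a pseudo-random point in [0, 1) using SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(values: Iterable[str], k: int = 1024) -> int:
    """Estimate the number of distinct values while storing at most k hashes."""
    heap = []        # max-heap (via negation) of the k smallest hashes so far
    members = set()  # the same hashes, for duplicate detection
    for value in values:
        h = _hash_to_unit_interval(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heapreplace(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    kth_smallest = -heap[0]
    return round((k - 1) / kth_smallest)


if __name__ == "__main__":
    events = [f"user-{i % 5000}" for i in range(20000)]
    print(approx_distinct(events))
```

With k = 1024 the typical relative error is a few percent, while memory use stays fixed no matter how many events stream through.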
Druid’s Role in Scalable Data Warehousing
[Diagram components]
• UI: Superset UI (Fast Exploration), Builder UI
• Unified SQL and MDX Layer: HiveServer2 (Hive SQL), Thrift Server (SparkSQL), Fast SQL, MDX; used by SQL BI Tools and MDX Tools
• Core Platform: Hive, Druid (OLAP Indexes), Realtime Feeds (Kafka, Storm, etc.), S3 or HDFS
• Management: SmartSense, Ranger, Atlas, Ambari
SQL on Hadoop
Conclusion
Hive 2 with LLAP: 26x Performance Boost
[Chart: Query time in seconds (lower is better) for Hive 1 / Tez and Hive 2 / LLAP, with speedup (x factor) per query.]
Hive 2 with LLAP averages 26x faster than Hive 1
SQL on Hadoop: Investment Areas
Interactive Performance
• Caching in Flash / SSD
• Fast Analytics on Raw Text
• Materialized Views
SQL Compliance
• Comprehensive SQL:2011 Support
• SQL ACID
• SQL Standard MERGE
EDW Integrations
• Joint AtScale / Syncsort Roadmap
• OLAP Indexes with Druid
SQL on Hadoop Summary
Project | Strengths | Use Cases | Unique Capabilities
Apache Hive | Most Comprehensive SQL; Scale; Maturity | ETL Offload; Reporting; Large-Scale Aggregations | Robust Cost-Based Optimizer; Mature Ecosystem (BI, Backup, Security, Replication)
SparkSQL | In-Memory; Low Latency | Exploratory Analytics; Dashboards | Language-Integrated Query
Apache Phoenix | Real-Time Read/Write; Transactions; High Concurrency | Dashboards; System-of-Engagement; Drill-Down / Drill-Up | Real-Time Read/Write
Scalable Data Warehousing on Hadoop: Overview
[Diagram components]
• Ingest and Store: Kafka, NiFi, HDFS, Syncsort DMX-h ETL, Other ETL Tools
• ETL, Data Mining, Advanced Analytics: Spark, Hive, HPL/SQL, ACID
• Interactive SQL, Reporting, OLAP: Hive LLAP, Druid (Future), HAWQ, AtScale
• Governance and Lineage: Atlas
• Advanced Security: Ranger
• Also shown: Zeppelin, Ambari Hive View, BI Tools, Reporting Tools
Thank You
https://www.surveymonkey.com/r/940amSQLHadoopBatch

Más contenido relacionado

La actualidad más candente

Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with ZeppelinHortonworks
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationHortonworks
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesTimothy Spann
 
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017Timothy Spann
 
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...Databricks
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4DataWorks Summit
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Hortonworks
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...Timothy Spann
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFHortonworks
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIMithun (Matt) Mathew
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureDataWorks Summit
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 

La actualidad más candente (20)

Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Intro to Spark with Zeppelin
Intro to Spark with ZeppelinIntro to Spark with Zeppelin
Intro to Spark with Zeppelin
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
 
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
Enterprise Data Science at Scale @ Princeton, NJ 14-Nov-2017
 
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...
Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spa...
 
What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4What’s new in Apache Spark 2.3 and Spark 2.4
What’s new in Apache Spark 2.3 and Spark 2.4
 
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data London
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDF
 
Managing Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARIManaging Apache HAWQ with Apache AMBARI
Managing Apache HAWQ with Apache AMBARI
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 

Destacado

Qonnections2015 From raw data to analysis
Qonnections2015 From raw data to analysisQonnections2015 From raw data to analysis
Qonnections2015 From raw data to analysisJohn Park
 
QWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 wordsQWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 wordsJohn Park
 
Enterprise Transformation in Technology Changes
Enterprise Transformation in Technology ChangesEnterprise Transformation in Technology Changes
Enterprise Transformation in Technology ChangesRaj Matthew
 
Not fudging nudges: What Internet law can teach regulatory scholarship
Not fudging nudges: What Internet law can teach regulatory scholarshipNot fudging nudges: What Internet law can teach regulatory scholarship
Not fudging nudges: What Internet law can teach regulatory scholarshipChris Marsden
 
「稼げる施設」を建てる為に押さえておきたい四つの観点
「稼げる施設」を建てる為に押さえておきたい四つの観点「稼げる施設」を建てる為に押さえておきたい四つの観点
「稼げる施設」を建てる為に押さえておきたい四つの観点Takehito Akima
 
Debugging lightning components-SEDreamin17
Debugging lightning components-SEDreamin17Debugging lightning components-SEDreamin17
Debugging lightning components-SEDreamin17Mohith Shrivastava
 
L’Espace Entreprises Bretagne romantique accueille le Club Prescrire
L’Espace Entreprises Bretagne romantique accueille le Club PrescrireL’Espace Entreprises Bretagne romantique accueille le Club Prescrire
L’Espace Entreprises Bretagne romantique accueille le Club PrescrireEsperluette & Associés
 
Direto ao Ponto - DevOpsSummit Brasil
Direto ao Ponto - DevOpsSummit BrasilDireto ao Ponto - DevOpsSummit Brasil
Direto ao Ponto - DevOpsSummit BrasilVictor Hugo Germano
 
What we talk about when we talk about open ...
What we talk about when we talk about open ... What we talk about when we talk about open ...
What we talk about when we talk about open ... Ronald Macintyre
 
Digital Single Market and Brexit (Eleonora Rosati)
Digital Single Market and Brexit (Eleonora Rosati)Digital Single Market and Brexit (Eleonora Rosati)
Digital Single Market and Brexit (Eleonora Rosati)Eleonora Rosati
 
WiCyS Career Fair Handbook
WiCyS Career Fair HandbookWiCyS Career Fair Handbook
WiCyS Career Fair HandbookClearedJobs.Net
 
5 Estrategias para aumentar x4 tu tráfico web
5 Estrategias para aumentar x4 tu tráfico web5 Estrategias para aumentar x4 tu tráfico web
5 Estrategias para aumentar x4 tu tráfico webMiguel Florido
 
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, Oulu
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, OuluModerni urheilumarkkinointi 30.03.2017, Sport & Business Forum, Oulu
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, OuluArto Kuuluvainen
 
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...Publicis Sapient
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale Timothy Spann
 
What People Want: Accenture Public Service Citizen Survey
What People Want: Accenture Public Service Citizen SurveyWhat People Want: Accenture Public Service Citizen Survey
What People Want: Accenture Public Service Citizen Surveyaccenture
 
ぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでShinichi Takahashi
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 

Destacado (20)

Qonnections2015 From raw data to analysis
Qonnections2015 From raw data to analysisQonnections2015 From raw data to analysis
Qonnections2015 From raw data to analysis
 
QWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 wordsQWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 words
 
Enterprise Transformation in Technology Changes
Enterprise Transformation in Technology ChangesEnterprise Transformation in Technology Changes
Enterprise Transformation in Technology Changes
 
Not fudging nudges: What Internet law can teach regulatory scholarship
Not fudging nudges: What Internet law can teach regulatory scholarshipNot fudging nudges: What Internet law can teach regulatory scholarship
Not fudging nudges: What Internet law can teach regulatory scholarship
 
「稼げる施設」を建てる為に押さえておきたい四つの観点
「稼げる施設」を建てる為に押さえておきたい四つの観点「稼げる施設」を建てる為に押さえておきたい四つの観点
「稼げる施設」を建てる為に押さえておきたい四つの観点
 
Debugging lightning components-SEDreamin17
Debugging lightning components-SEDreamin17Debugging lightning components-SEDreamin17
Debugging lightning components-SEDreamin17
 
Head and shoulders
Head and shouldersHead and shoulders
Head and shoulders
 
L’Espace Entreprises Bretagne romantique accueille le Club Prescrire
L’Espace Entreprises Bretagne romantique accueille le Club PrescrireL’Espace Entreprises Bretagne romantique accueille le Club Prescrire
L’Espace Entreprises Bretagne romantique accueille le Club Prescrire
 
Direto ao Ponto - DevOpsSummit Brasil
Direto ao Ponto - DevOpsSummit BrasilDireto ao Ponto - DevOpsSummit Brasil
Direto ao Ponto - DevOpsSummit Brasil
 
What we talk about when we talk about open ...
What we talk about when we talk about open ... What we talk about when we talk about open ...
What we talk about when we talk about open ...
 
Digital Single Market and Brexit (Eleonora Rosati)
Digital Single Market and Brexit (Eleonora Rosati)Digital Single Market and Brexit (Eleonora Rosati)
Digital Single Market and Brexit (Eleonora Rosati)
 
WiCyS Career Fair Handbook
WiCyS Career Fair HandbookWiCyS Career Fair Handbook
WiCyS Career Fair Handbook
 
5 Estrategias para aumentar x4 tu tráfico web
5 Estrategias para aumentar x4 tu tráfico web5 Estrategias para aumentar x4 tu tráfico web
5 Estrategias para aumentar x4 tu tráfico web
 
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, Oulu
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, OuluModerni urheilumarkkinointi 30.03.2017, Sport & Business Forum, Oulu
Moderni urheilumarkkinointi 30.03.2017, Sport & Business Forum, Oulu
 
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...
From Hype to Impact: Applying This Year's SXSW Highlights to Business Transfo...
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
 
What People Want: Accenture Public Service Citizen Survey
What People Want: Accenture Public Service Citizen SurveyWhat People Want: Accenture Public Service Citizen Survey
What People Want: Accenture Public Service Citizen Survey
 
ぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまでぼくがAthenaで死ぬまで
ぼくがAthenaで死ぬまで
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Hoofdzaken App
Hoofdzaken AppHoofdzaken App
Hoofdzaken App
 

Similar a SoCal BigData Day

Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...VMware Tanzu
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution Hortonworks
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
SoCal BigData Day

  • 1. SQL on Hadoop: Batch, Interactive and Beyond
    SoCal Big Data Day
    John Park, Solution Engineer, Hortonworks
    Rm 138-140
    © Hortonworks Inc. 2011 – 2017 All Rights Reserved
  • 2. Disclaimer
    This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
    Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
    This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
    Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
    Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Presenter: John Park
    • Solution Engineer, SoCal
    • Data Science ETL, data warehousing, software design, architecture
    • Previous – Various startups, Qlik, DW consultant, NCR
    • Current – Helping customers implement and understand open source big data platforms
    • Twitter: @jpark328
    • Email: jpark@hortonworks.com
  • 4. Before We Begin
    • We have a raffle
    • 2 winners at the end of the presentation
    • Prize – Amazon Echo Dot
    • Ask questions
    Survey link: https://www.surveymonkey.com/r/940amSQLHadoopBatch
  • 5. SQL is King
    Why?
    – Familiarity
      • Primary technical language for business analysts
    – Powerful
      • Maturation of RDBMS, EDW, OLTP
      • ACID compliant
    – Flexible
      • Covers transactional processes to analytics
    – Pervasive
      • Emergence of BI tools (Tableau, BOBJ, Cognos)
      • Deep ecosystem of tools
  • 6. Overview of SQL on Hadoop Solutions
    • SparkSQL: Spark's module for working with structured data. Run SQL queries alongside complex analytic algorithms.
    • Apache Hive: a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.
    • Apache Phoenix: high-performance relational database layer over HBase for low-latency applications.
    • Traditional MPP on Hadoop: many traditionally architected MPP solutions have been ported to Hadoop, and some new ones have been developed from scratch.
  • 7. SQL on Hadoop: Vitals
    Project | First GA Release | Lines of Code (June 2015)(*) | Most Typical Use
    Apache Hive | April 2009 (7 years) | 1 million | EDW / ETL Offload
    SparkSQL | March 2015 (2 years) | 56.6k | Exploratory Analytics
    Apache Phoenix | March 2014 (3 years) | 200k | Low-Latency Dashboards
  • 8. Apache Hive: Fast Facts
    • Most queries per hour: 100,000 queries per hour (Yahoo Japan)
    • Analytics performance: 100 million rows/s per node (with Hive LLAP)
    • Largest Hive warehouse: 300+ PB raw storage (Facebook)
    • Largest cluster: 4,500+ nodes (Yahoo)
  • 9. Phoenix and HBase: Fast Facts
    • Largest database: 5 petabytes (Flurry)
    • Best-known app: Facebook Messages (Facebook)
    • Fastest ingestion: 10 million events/s (Yahoo)
    • Biggest SQL app: real-time SQL on 140m+ records (PubMatic)
  • 10. Apache Hive: Strengths and Cautions
    Strengths (+)
    • Huge Datasets
    • Deep SQL Analytics
    • EDW Offload
    • BI Integration
    Cautions (?)
    • Near-Real-Time
  • 11. SparkSQL: Strengths and Cautions
    Strengths (+)
    • Language-Integrated Query
    • Exploratory Analytics
    Cautions (?)
    • Large Datasets
    • High Concurrency
    • EDW Offload
  • 12. Apache Phoenix: Strengths and Cautions
    Strengths (+)
    • High Concurrency
    • Near-Real-Time Query
    • Fast Updates
    Cautions (?)
    • Deep SQL Analytics
    • Full-Table Scans / Scaled Analytics
    • Existing BI Integrations
  • 13. SQL on Hadoop - Good to Know
    • There is no one-size-fits-all solution
    • Use cases and query patterns are important
    • Prototype and fail fast
    • Define scalability and performance criteria
  • 15. Hive: Analytics Use Case
    • Financial services company:
      – Analyzes a large dataset to identify potential fraud.
      – Re-platformed from a mature EDW platform.
      – Selection drivers: breadth of SQL support, query performance, cloud consumption.
    • Use case vitals:
      – Analyze > 25 billion transactions per week.
      – More than 1.5 TB new data per day.
      – > 4 PB historical data available for analysis through cloud infrastructure.
  • 16. Hive Performance with Scaling - Customer results on HDP 2.2
    [Chart: Scalability on Hive. Total elapsed time (seconds) for the multi join (allocation) and aggregation benchmarks on 5, 10, 20, 40 and 60 nodes.]
    Benchmark test (mm:ss) | 5 nodes | 10 nodes | 20 nodes | 40 nodes* | 60 nodes*
    Multi join | 24:02 | 14:33 | 10:32 | 06:54 | 05:49
    Aggregation | 21:59 | 12:20 | 07:55 | 05:16 | 02:38
    Total | 46:02 | 26:53 | 18:27 | 12:10 | 08:27
    Same workload on EDW, full rack: 8:00
    (*) Projected times based on 5, 10 and 20 node results.
    Aggregation Workload
    • 5% more time required on Hive.
    • < 50% solution cost versus traditional EDW.
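    The scaling behaviour in the totals above can be checked with a little arithmetic. The following is a minimal sketch, assuming the table's times are in mm:ss (which matches the chart's axis in seconds); the function names are hypothetical and the 40 and 60 node figures are the slide's projections, not measurements.

```python
# Totals copied from the benchmark table on the slide (mm:ss, assumed format).
TOTALS = {5: "46:02", 10: "26:53", 20: "18:27", 40: "12:10", 60: "08:27"}
EDW_FULL_RACK = "8:00"


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def scaling_report(totals: dict, baseline_nodes: int = 5) -> list:
    """Return (nodes, seconds, speedup, efficiency) per cluster size.

    speedup    = baseline time / time at this size
    efficiency = speedup / (nodes / baseline nodes); 1.0 would be linear scaling
    """
    base = to_seconds(totals[baseline_nodes])
    rows = []
    for nodes in sorted(totals):
        secs = to_seconds(totals[nodes])
        speedup = base / secs
        efficiency = speedup / (nodes / baseline_nodes)
        rows.append((nodes, secs, round(speedup, 2), round(efficiency, 2)))
    return rows


if __name__ == "__main__":
    for row in scaling_report(TOTALS):
        print(row)
    # Ratio of the projected 60-node Hive total to the EDW full-rack time.
    print(to_seconds(TOTALS[60]) / to_seconds(EDW_FULL_RACK))
```

    On these numbers, going from 5 to 60 nodes (12x the hardware) cuts the total from 2,762 to 507 seconds, so scaling is sub-linear but still substantial.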
  • 17. SparkSQL Use Case: Medical Sensor Data
    [Diagram: HDFS, Aggregations (Hive), HCatalog, SparkSQL, JDBC Connector, Analytical Tools]
    – Sensor data streamed into HDFS.
    – Large-scale pre-aggregations done using Hive.
    – SparkSQL-powered dashboard for fast analytics.
  • 18. Phoenix at PubMatic
    Near-Real-Time SQL over >15 TB of Data Using Apache Phoenix
  • 19. Apache Phoenix at PubMatic
    PubMatic offers marketing automation with real-time analytics that enable publishers to make smarter and faster decisions.
    Key Concerns
    To empower publishers to make real-time decisions, PubMatic needs a SQL solution that scales to terabytes of data yet can process hundreds of thousands of queries daily with near-real-time SLAs.
    Solution
    Phoenix is the only open source SQL solution for Hadoop designed for near-real-time querying, giving PubMatic’s publishers the timely insight they need to optimize their advertising strategies.
    Phoenix’s linear scalability enables PubMatic to offer real-time query over more than 15 terabytes of data using commodity hardware.
    Phoenix’s ANSI SQL interface makes it easy for publishers to slice and dice data the way they want.
    Read more at http://phoenix.apache.org/who_is_using.html
  • 20. SQL on Hadoop Next Evolution
  • 21. Evolution of Hive
    Phase 1: Batch/ETL (Delivered: HDP 2.2)
    • Transactions with ACID allowing insert, update and delete
    • Temporary tables
    • Cost-based optimizer optimizes complex join queries well
    Phase 2: Faster SQL (Delivered: HDP 2.5)
    • Tech Preview: sub-5-second queries with LLAP
    • Usability: SQL query editor, visual explain and debugging
    • Transparent data encryption
    • Cross-site replication
    • SQL, performance improvements
    • Hive-on-Spark (Alpha / Beta)
    Phase 3: Sub-Second with Rich Analytics (Planned: HDP 2.6*)
    • Rich SQL:2011 analytics
    • Tech Preview: Druid OLAP index for Hive
    • GA: sub-second queries with LLAP
    • Transaction improvements (BEGIN/COMMIT/ROLLBACK, MERGE)
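    For readers unfamiliar with the MERGE statement mentioned under transaction improvements, its general SQL semantics (update rows that match on a key, insert rows that do not) can be sketched in a few lines of Python. This is a toy illustration with hypothetical names and data, not Hive's implementation or syntax.

```python
def merge(target: list, source: list, key: str) -> list:
    """Toy MERGE: rows are dicts, matched on the `key` column."""
    by_key = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return list(by_key.values())


if __name__ == "__main__":
    # Hypothetical example data.
    customers = [{"id": 1, "city": "LA"}, {"id": 2, "city": "SF"}]
    changes = [{"id": 2, "city": "San Diego"}, {"id": 3, "city": "Irvine"}]
    print(merge(customers, changes, key="id"))
```

    The point of a single MERGE is that the update and the insert happen in one statement rather than as separate passes over the target table.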
  • 22. Apache Hive: Modern Architecture
    [Architecture diagram; recoverable labels:]
    • Storage: Columnar Storage (ORCFile, Parquet); Unstructured Data (JSON, CSV, Text, Avro, Custom, Weblog)
    • Engine: SQL Engines (Row Engine, Vector Engine)
    • SQL: SQL Support (SQL:2011, Optimizer, HCatalog, HiveServer2)
    • Cache: Block Cache (Linux Cache), Vector Cache
    • Distributed Execution: Hadoop 1 (MapReduce); Hadoop 2 (Tez, Spark); LLAP (Persistent Server)
    Legend: Historical, Current, In Development
  • 23. Sub-Second Hive with LLAP
    Sub-second:
    • LLAP: persistent server to instantly execute SQL queries.
    • Caches hottest data in RAM.
    • Overcomes latencies associated with Hive on Tez or Hive on Spark.
    SQL compatibility:
    • 100% compatible with Hive SQL.
    • Compatible with existing tools (BI, ETL, etc.)
    Security:
    • Security via HiveServer2.
    • Integrates with Apache Ranger.
    [Diagram: Hive SQL enters through HiveServer2 and is served by LLAP servers, one per Hadoop node, each with a vector cache.]
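    The idea of a persistent server that keeps the hottest data in RAM can be illustrated with a small least-recently-used cache. This is a toy sketch with hypothetical names; it is not LLAP's code, and LRU is chosen only for simplicity, not because it is LLAP's actual eviction policy.

```python
from collections import OrderedDict


class HotDataCache:
    """Toy in-memory cache in front of slow storage (LRU eviction)."""

    def __init__(self, capacity, load_from_storage):
        self.capacity = capacity
        self._load = load_from_storage      # slow path, e.g. a disk read
        self._data = OrderedDict()          # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = self._load(key)             # fall back to storage
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value


if __name__ == "__main__":
    cache = HotDataCache(capacity=2, load_from_storage=lambda k: k.upper())
    for key in ["a", "b", "a", "c", "b"]:
        cache.get(key)
    print(cache.hits, cache.misses)
```

    Because the process holding the cache stays alive between queries, repeated reads of the same data skip the storage layer entirely, which is the latency benefit the slide attributes to a persistent server.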
  • 24. Hive 2 with LLAP: Architecture Overview
    [Architecture diagram; components shown:]
    • Clients: ODBC / JDBC, SQL queries
    • HiveServer2 (query endpoint)
    • Query coordinators (three shown)
    • YARN cluster: LLAP daemons (four shown), each with query executors
    • In-memory cache (shared across all users)
    • Deep storage: HDFS, S3 + other HDFS-compatible filesystems
  • 25. Apache Hive: Journey to SQL:2011 Analytics
    Data Types
    • Numeric: FLOAT, DOUBLE; DECIMAL; INT, TINYINT, SMALLINT, BIGINT; BOOLEAN
    • String: CHAR, VARCHAR; BLOB (BINARY), CLOB (String)
    • Date, Time: DATE, TIMESTAMP, Interval Types
    • Complex Types: ARRAY / MAP / STRUCT / UNION
    • Nested Data Analytics: Nested Data Traversal; Lateral Views
    • Procedural Extensions: HPL/SQL
    SQL Features
    • Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
    • Advanced Analytics: OLAP and Windowing Functions; OLAP: Partition, Order by UDAF; CUBE and Grouping Sets
    • ACID Transactions: INSERT / UPDATE / DELETE
    • Constraints: Primary / Foreign Key (Non Validated)
    File Formats
    • Columnar: ORCFile; Parquet
    • Text: CSV; Logfile
    • Nested / Complex: Avro; JSON; XML
    • Custom Formats
    • Other Features: XPath Analytics
    Futures
    • ACID MERGE
    • Multi Subquery
    • Scalar Subqueries
    • Non-Equijoins
    • INTERSECT / EXCEPT
    • Recursive CTEs
    • NOT NULL Constraints
    • Default Values
    • Multi Table Transactions
    Legend: New, Projected: HDP 3.0, HDP 2.6
    Track Hive SQL:2011 Complete: HIVE-13554
  • 26. Phoenix SQL: Today and Tomorrow
    Phoenix: SQL for HBase
    • SQL Datatypes (VARCHAR, INTEGER, etc.)
    • JOINs: Inner, Left/Right Outer, Cross
    • UPSERT / DELETE
    • Derived Tables
    • GROUP BY, ORDER BY, HAVING
    • AVG, COUNT, MIN, MAX, SUM
    • Primary keys, NOT NULL constraints
    • CASE, COALESCE
    • VIEWs
    • Secondary Indexes
    • Flexible Schema
    • UNION ALL
    • Functional Indexes
    • Date / Time Functions
    • UDFs
    • Multi Table Transactions
    • SQL GRANT / REVOKE
    • Replication Management
    • Column Constraints and Defaults
    • OLAP, Cubing, Rollup
    • UNION
    Legend: Current, Future, Phoenix 4.4
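    Phoenix uses UPSERT rather than separate INSERT and UPDATE statements: a row is created if its primary key is new and overwritten in place otherwise. The sketch below models that behaviour with a dictionary keyed by primary key. It is a toy with hypothetical table and column names, assuming columns not named in the statement keep their existing values; it does not use the Phoenix client.

```python
def upsert(table: dict, pk, **columns) -> dict:
    """Insert the row if `pk` is new, otherwise update the given columns."""
    row = table.setdefault(pk, {})   # create an empty row on first write
    row.update(columns)              # overwrite only the columns supplied
    return row


if __name__ == "__main__":
    # Hypothetical metrics table keyed by (host, metric).
    metrics = {}
    upsert(metrics, ("web01", "cpu"), value=0.42, unit="ratio")
    upsert(metrics, ("web01", "cpu"), value=0.57)   # same key: update
    upsert(metrics, ("web02", "cpu"), value=0.13)   # new key: insert
    print(metrics)
```

    This single write path is one reason the summary slide lists fast updates and real-time read/write among Phoenix's strengths.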
  • 27. Looking Forward - What Is Druid?
    Druid is a distributed, real-time, column-oriented datastore designed to quickly ingest and index large amounts of data and make it available for real-time query.
    Features:
    • Streaming data ingestion
    • Sub-second queries
    • Merge historical and real-time data
    • Approximate computation
  • 28. Druid’s Role in Scalable Data Warehousing
    [Architecture diagram; recoverable labels:]
    • UI: BI Tools (SQL), MDX Tools (MDX), Superset UI (fast exploration), Builder UI
    • Unified SQL and MDX layer: HiveServer2 (Hive SQL), Thrift Server (SparkSQL), Fast SQL, MDX
    • Core platform: Hive, Druid OLAP indexes, realtime feeds (Kafka, Storm, etc.), S3 or HDFS
    • Management: SmartSense, Ranger, Atlas, Ambari
  • 30. Hive 2 with LLAP: 26x Performance Boost
    [Chart: query time in seconds (lower is better) for Hive 1 / Tez and Hive 2 / LLAP, plotted with the speedup (x factor).]
    Hive 2 with LLAP averages 26x faster than Hive 1.
  • 31. SQL on Hadoop: Investment Areas
    Interactive Performance
    • Caching in Flash / SSD
    • Fast Analytics on Raw Text
    • Materialized Views
    SQL Compliance
    • Comprehensive SQL:2011 Support
    • SQL ACID
    • SQL Standard MERGE
    EDW Integrations
    • Joint AtScale / Syncsort Roadmap
    • OLAP Indexes with Druid
  • 32. SQL on Hadoop Summary
    Apache Hive
    • Strengths: Most Comprehensive SQL; Scale; Maturity
    • Use Cases: ETL Offload; Reporting; Large-Scale Aggregations
    • Unique Capabilities: Robust Cost-Based Optimizer; Mature Ecosystem (BI, Backup, Security, Replication)
    SparkSQL
    • Strengths: In-Memory; Low Latency
    • Use Cases: Exploratory Analytics; Dashboards
    • Unique Capabilities: Language-Integrated Query
    Apache Phoenix
    • Strengths: Real-Time Read/Write; Transactions; High Concurrency
    • Use Cases: Dashboards; System-of-Engagement; Drill-Down / Drill-Up
    • Unique Capabilities: Real-Time Read/Write
  • 33. Scalable Data Warehousing on Hadoop: Overview
    [Architecture diagram; recoverable labels:]
    • Ingest and Store: Kafka, HDFS, NiFi
    • ETL, Data Mining, Advanced Analytics: Spark, Hive, HPL / SQL, ACID, Syncsort DMX-h ETL, other ETL tools
    • Interactive SQL, Reporting, OLAP: Druid (Future), Hive LLAP, HAWQ, AtScale
    • Governance and Lineage: Atlas
    • Advanced Security: Ranger
    • Tools: Zeppelin, Ambari Hive View, BI Tools, Reporting Tools
  • 34. Thank You
    https://www.surveymonkey.com/r/940amSQLHadoopBatch