Why Qlik® is Better with Big Data
John Park, Qlik
Eli Singer, JethroData
28 April, 2015
2#qonnections
Presenters
Eli Singer
• CEO & Co-Founder, JethroData
• Data management, information security, e-commerce
• Over 20 years leading start-ups
• Twitter: @jethrodata
John Park
• Senior Solutions Architect, Partner Engineering
• ETL, data warehousing, software design, *NIX, architecture
• 2 years at Qlik, 7 years as a data warehouse consultant
• Twitter: @jpark328
Legal Disclaimer
©Qlik Confidential
This Presentation contains forward-looking statements, including, but not limited to, statements regarding the value and effectiveness of
Qlik's products, the introduction of product enhancements or additional products, Qlik’s partner and customer relationships, and Qlik's
growth, expansion and market leadership, that involve risks, uncertainties, assumptions and other factors which, if they do not materialize
or prove correct, could cause Qlik's results to differ materially from those expressed or implied by such forward-looking statements. All
statements, other than statements of historical fact, are statements that could be deemed forward-looking statements, including
statements containing the words "predicts," "plan," "expects," "anticipates," "see," "believes," "goal," "target," "estimate," "potential," "may,"
"will," "might," "could," and similar words. Qlik intends all such forward-looking statements to be covered by the safe harbor provisions for
forward-looking statements contained in Section 21E of the Exchange Act and the Private Securities Litigation Reform Act of 1995. Actual
results may differ materially from those projected in such statements due to various factors, including but not limited to: risks and
uncertainties inherent in our business; our ability to attract new customers and retain existing customers; our ability to effectively sell,
service and support our products; our ability to manage our international operations; our ability to compete effectively; our ability to
develop and introduce new products and add-ons or enhancements to existing products; our ability to continue to promote and maintain
our brand in a cost-effective manner; our ability to manage growth; our ability to attract and retain key personnel; the scope and validity of
intellectual property rights applicable to our products; adverse economic conditions in general and adverse economic conditions
specifically affecting the markets in which we operate; and other risks and uncertainties more fully described in Qlik's publicly available
filings with the Securities and Exchange Commission. Past performance is not necessarily indicative of future results. The forward-
looking statements included in this presentation represent Qlik's views as of the date of this presentation. Qlik anticipates that subsequent
events and developments will cause its views to change. Qlik undertakes no intention or obligation to update or revise any forward-looking
statements, whether as a result of new information, future events or otherwise. These forward-looking statements should not be relied
upon as representing Qlik's views as of any date subsequent to the date of this presentation.
This Presentation should be read in conjunction with Qlik's periodic reports filed with the SEC (SEC Information), including the disclosures
therein of certain factors which may affect Qlik’s future performance. Individual statements appearing in this Presentation are intended to
be read in conjunction with and in the context of the complete SEC Information documents in which they appear, rather than as stand-
alone statements. This presentation is intended to outline our general product direction and should not be relied on in making a
purchase decision, as the development, release, and timing of any features or functionality described for our products remains
at our sole discretion.
© 2015 QlikTech International AB. All rights reserved. Qlik®, QlikView®, Qlik® Sense, QlikTech®, and the Qlik logos are trademarks of
QlikTech International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or
registered trademarks of their respective owners.
Agenda: Why Qlik® is Better with Big Data
• Introduction
• Notes about Big Data / Hadoop / Data Lakes
• Current state of Data Lake implementations
• JethroData overview
• Demo: Let’s analyse 2.5 billion rows from the Data Lake
• JethroData architecture
• Why Qlik is better with Big Data
• Key takeaways
Introduction
Data, technology, and use cases are growing
exponentially, but it seems
we do not know what to do with them
Big Data / Hadoop / Data Lakes
• Big Data: a marketing term
• Hadoop: a collection of services and toolkits working with HDFS
• Data Lake: an architecture pattern where data is stored/landed for processing by *
What is happening?
• Enterprises are adopting Hadoop for data scientists (science project -> real “production”)
• Hadoop is maturing
• Data Lake architecture is being adopted
• Companies and startups are building innovative services on Hadoop
• Data is becoming the lifeblood of companies
However, there are an incredible number of challenges in using Hadoop as the single source of
truth and sole underpinning of a BI platform.
Image source: Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business, Gartner (August 2014)
Data Lake vision vs. reality
Scalability, Low Cost, and Performance were promised but…
It’s been a challenge working with BI Tools
Is the Data Lake frozen?
Reasons why we are struggling
• Data Lakes were not designed to run ad-hoc / interactive queries
• Hadoop was designed to run batch processes (ML, ETL, predictive, canned reporting)
• Designed for data scientists, not business analysts
• Our trade-off: cost vs. scale vs. performance
Strategies for bridging the gap
• Use the best of technology to overcome the gap
• Solve technical problems one issue at a time
• Let’s do interactive query on Hadoop!
Eli Singer – More about JethroData
Who is JethroData?
• A Qlik Technology Partner
• Developing a next-generation analytical database
• Focused on making interactive BI on Big Data a reality
• Combines analytical DB design with full-indexing technology
• JethroData 1.0 went GA on Apr 7, 2015
• Offices in NY and Israel
• Backed by world-class VCs
Common use cases
• Challenge: BI on Hadoop is slow
• Current solution: Replicate data into a separate EDW system for fast BI
• Challenge: EDW is expensive
• Current solution: Migrate batch processes to Hadoop, keep BI on EDW
• Pain: Maintaining two separate data systems is expensive and an operational nightmare
• With Jethro: Run fast BI directly on data in Hadoop
Why Is Hadoop So Slow?
Architecture: MPP / Full-Scan (All SQL-on-Hadoop)
Query: list books by author “Stephen King”
Process: each librarian is assigned a rack; they then view each book, check if the author is “Stephen King”, and if so, get the book title
Result: too slow, costly, unscalable
JethroData and Big Data: Index Access
Architecture: Index Access (Only JethroData)
Query 1: list books by author “Stephen King”
Process: go to the Author index, find the entry for “Stephen King”, get the list of books, fetch only these books
Result: fast, minimal resources, scalable
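The librarian analogy on the two slides above can be sketched in a few lines of Python. This is an illustrative toy, not JethroData's implementation: the book list, the function names (full_scan, build_author_index, index_lookup) and the in-memory dictionary index are all assumptions made for the example. It only shows why a full scan touches every row while an index lookup touches just the matching ones.

```python
from collections import defaultdict

# Hypothetical catalogue rows: (title, author). Illustrative data only.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("It", "Stephen King"),
    ("Dune", "Frank Herbert"),
    ("Misery", "Stephen King"),
]


def full_scan(books, author):
    """Inspect every row and keep the matches (the 'librarian per rack' approach)."""
    rows_read = 0
    titles = []
    for title, row_author in books:
        rows_read += 1
        if row_author == author:
            titles.append(title)
    return titles, rows_read


def build_author_index(books):
    """Inverted list: map each author value to the row positions that hold it."""
    index = defaultdict(list)
    for row_id, (_, author) in enumerate(books):
        index[author].append(row_id)
    return index


def index_lookup(books, index, author):
    """Go to the index entry, then fetch only the rows it lists."""
    row_ids = index.get(author, [])
    titles = [books[row_id][0] for row_id in row_ids]
    return titles, len(row_ids)


author_index = build_author_index(BOOKS)
scan_titles, scan_reads = full_scan(BOOKS, "Stephen King")
idx_titles, idx_reads = index_lookup(BOOKS, author_index, "Stephen King")
print(scan_titles, scan_reads)  # same titles, every row read
print(idx_titles, idx_reads)    # same titles, only matching rows read
```

Both functions return the same titles; the difference is the number of rows read, which grows with the table for the scan and with the number of matches for the lookup.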
Let’s Analyze 2.5 Billion Rows
• Hadoop environment: CDH 5.3 / Servers: NN: 1x r3.large; DN: 9x m1.large
• JethroData server: 1x r3.8xlarge: 244GB RAM; 320GB SSD (SPOT)
• Qlik Sense 1.10: r3.2xlarge, 4 vCPU, 30GB RAM
• Demo 1: TPC-DS
• Based on TPC-DS, Replication Factor 1,000
• Sales_demo table based on store_sales fact table with dimension data added
• 2.5B rows, 33 columns
• 600GB raw data
• Demo 2: Airline
• Airline data: 123 million commercial flight records + CSV (hybrid architecture)
Qlik and Jethro Demo Setup
[Diagram: Business Discovery Apps, Qlik Server, Big Data Indexing, and Big Data Platform (Hadoop client, HDFS). Connection labels: HTTP/HTTPS protocol; Direct Discovery over ODBC protocol.]
Everything hosted on Amazon Web Services
SQL-on-Hadoop Architecture: MPP/Full-Scan
Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state
[Diagram: five Data Nodes, each with a Query Executor and a Query Planner/Mgr]
Performance and resources based on the size of the dataset
Jethro SQL-on-Hadoop Architecture: Index Access
Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state
[Diagram: two Jethro Query Nodes alongside five Data Nodes]
1. Index access
2. Read data only for required rows
Performance and resources based on the size of the result-set
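As a rough illustration of the two steps on this slide, the sketch below runs the slide's query against a tiny column-oriented table in Python. The column contents, the build_value_index helper and the rows_touched counter are assumptions made for the example; they are not Jethro's file format or API. Step 1 consults an index on prod, and step 2 reads the state and sales columns only at the positions the index returned.

```python
from collections import defaultdict

# Hypothetical column files for table t1, one list per column (illustrative data).
STATE = ["NY", "CA", "NY", "CA", "TX"]
PROD = ["abc", "xyz", "abc", "abc", "xyz"]
SALES = [10.0, 7.0, 5.0, 2.5, 9.0]


def build_value_index(column):
    """Map each distinct column value to the row positions where it occurs."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return index


def sales_by_state_for_prod(prod, prod_index, state_col, sales_col):
    """SELECT state, sum(sales) FROM t1 WHERE prod = ? GROUP BY state."""
    # Step 1: index access, which yields only the qualifying row positions.
    row_ids = prod_index.get(prod, [])
    # Step 2: read the other columns for those positions only.
    totals = defaultdict(float)
    for row_id in row_ids:
        totals[state_col[row_id]] += sales_col[row_id]
    return dict(totals), len(row_ids)


prod_index = build_value_index(PROD)
result, rows_touched = sales_by_state_for_prod("abc", prod_index, STATE, SALES)
print(result, rows_touched)
```

The work done in step 2 depends on how many rows match prod='abc', not on how many rows t1 holds, which is the point the slide makes about the result-set.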
SQL on Hadoop: competitive landscape
Full-Scan Based Solutions: Read all rows. Every time.
• Hive
• Impala
• Presto
• SparkSQL
• Drill
• Pivotal/HAWQ
• IBM/Big SQL
• Actian
• Teradata/SQL-H
• Microsoft/PDW
Index Based Solution: Read ONLY needed rows.
• JethroData
Use-Case Comparison:
Full-Scan: Optimal for ETL, Predictive
Index: Optimal for Interactive BI
JethroData: system overview
[Diagram: Data Source (Hadoop, EDW, Streams, …), Jethro Loader, Hadoop, Jethro Server, and Queries (BI tools, SQL client, …), with numbered steps 1 to 4]
1. Initial load: extract data from relevant sources and load through JethroLoader. Incremental data can be loaded at short intervals.
2. Index and column files are stored in HDFS (or S3). Typical size is 35% of raw data.
3. Queries are sent via a standard ODBC/JDBC interface. Automatic load-balancing across servers.
4. JethroServer communicates directly with HDFS to retrieve relevant data. No MapReduce or Spark is used.
JethroData: architecture highlights
Every column is indexed!
Allows users to slice & dice any way they choose and always get a fast response.
Scales to any size
From 100M to 100B, columns and indexes are compressed and partitioned.
Super-easy to implement
Compatible with every Hadoop distribution. Installs on separate server(s) from the Hadoop cluster.
Jethro Indexes: innovative technology
• Fast to read
– Simple: Inverted-list indexes map each column value to a list of rows
– Fast: Direct access O(1) to each value entry
– Scale: Distributed, highly hierarchical compressed bitmaps
• Fast to write
– Index files are appended, duplicate entries allowed
– Incremental: new data indexed as it comes in
– No locks, no random read/write
Patent pending: http://www.google.com/patents/WO2013001535A3?cl=en
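The "fast to write" bullets can be pictured with a small Python sketch of an append-only inverted index. It is a toy under stated assumptions: the AppendOnlyIndex class, its in-memory segments and the append_batch/lookup method names are hypothetical and say nothing about Jethro's actual on-disk structures or compressed bitmaps. Each load appends a new segment and never rewrites an old one, so the same value may have an entry in several segments; a lookup merges those entries at read time.

```python
from collections import defaultdict


class AppendOnlyIndex:
    """Toy inverted index that is only ever appended to, one segment per load."""

    def __init__(self):
        # Each segment maps a column value to the row ids seen in that load.
        self.segments = []

    def append_batch(self, start_row_id, values):
        """Index a newly loaded batch without touching earlier segments."""
        segment = defaultdict(list)
        for offset, value in enumerate(values):
            segment[value].append(start_row_id + offset)
        self.segments.append(dict(segment))

    def lookup(self, value):
        """Merge the (possibly duplicated) entries for a value across segments."""
        row_ids = []
        for segment in self.segments:
            row_ids.extend(segment.get(value, []))
        return row_ids


index = AppendOnlyIndex()
index.append_batch(0, ["abc", "xyz", "abc"])  # initial load, rows 0..2
index.append_batch(3, ["abc", "xyz"])         # incremental load, rows 3..4
print(index.lookup("abc"))
print(len(index.segments))
```

Because earlier segments are never modified, a writer in this sketch needs no locks and no random writes; the cost is a merge step on the read path.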
Intelligent caching technology
• Reuse of intermediate/final query results
– Repeat queries in sub-seconds
• Addresses wide top-of-the-funnel queries
– Analysis starts with queries with no/few filters
– Those queries are often repeated in dashboard scenarios
• Transparently adapts to incremental loads
– Execution on delta data + merge saved results
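The last bullet, execution on delta data plus a merge with saved results, can be sketched for a simple additive aggregate. The IncrementalSumCache class, its sales_by_state method and the sample rows are assumptions made for illustration, not JethroData's caching design; the approach shown only works as written for aggregates that can be merged, such as sums and counts.

```python
from collections import Counter


class IncrementalSumCache:
    """Toy result cache: remember totals and how many rows they already cover."""

    def __init__(self):
        # query key -> (number of rows already aggregated, saved totals)
        self.saved = {}

    def sales_by_state(self, rows):
        """Sum sales per state, scanning only rows loaded since the last call."""
        key = "sales_by_state"
        covered, totals = self.saved.get(key, (0, Counter()))
        delta = rows[covered:]            # rows added by incremental loads
        for state, sales in delta:
            totals[state] += sales        # merge the delta into saved results
        self.saved[key] = (len(rows), totals)
        return dict(totals), len(delta)


rows = [("NY", 10), ("CA", 7)]            # hypothetical (state, sales) rows
cache = IncrementalSumCache()
first = cache.sales_by_state(rows)        # cold: scans both rows
rows.append(("NY", 5))                    # incremental load
second = cache.sales_by_state(rows)       # warm: scans only the new row
third = cache.sales_by_state(rows)        # repeat: nothing left to scan
print(first, second, third)
```

A repeated, unfiltered dashboard query is the case where this pays off most, since the saved totals already cover nearly all of the data.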
[Charts: query speed (Fast to Slow) and query repeat (Hi to Low), each plotted against query selectivity (Few to More)]
Advantage of Qlik and JethroData
• Now, we can get scale, cost, and performance
• Faster queries with more selection criteria
• Faster Direct Discovery load time because the unique dimension values are already available in JethroData indexes
• Use of system-wide optimization and smart caching
• Creates an interactive Hadoop for Business Discovery
• Allows Qlik customers to have the same experience across all data sources, including Hadoop
[Diagram: Qlik application issuing Direct Discovery queries]
Why Qlik Is Better for Big Data
• Allows users to analyze big data the way business users want to.
• Scalability, extensibility, and beautiful visualizations to tell your story.
Key takeaways
• Qlik can do Big Data well with the right complementary technology
• Qlik's associative interface can open up new use cases with Big Data
• Qlik and Jethro can provide true interactive BI with Hadoop
Follow-up resources
• Publicly accessible Qlik Sense hub: http://jethrodata.qlik.com
• Free evaluation version of Jethro available at http://jethrodata.com/download
Thank You

Más contenido relacionado

La actualidad más candente

Qlik project methodology handbook v 1.0 docx
Qlik project methodology handbook v 1.0 docxQlik project methodology handbook v 1.0 docx
Qlik project methodology handbook v 1.0 docx
Antonino Barbaro ©
 
VYW_Online Live Story Pitch OK
VYW_Online Live Story Pitch OKVYW_Online Live Story Pitch OK
VYW_Online Live Story Pitch OK
Marco Zampieri
 
QlikView projects in Agile Environment
QlikView projects in Agile EnvironmentQlikView projects in Agile Environment
QlikView projects in Agile Environment
Saleha Amin, CSM, PMP
 
Qlik-Education-Catalog-EN
Qlik-Education-Catalog-ENQlik-Education-Catalog-EN
Qlik-Education-Catalog-EN
Marco Zampieri
 
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next GenerationQlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
Practical QlikView
 

La actualidad más candente (20)

Qlik project methodology handbook v 1.0 docx
Qlik project methodology handbook v 1.0 docxQlik project methodology handbook v 1.0 docx
Qlik project methodology handbook v 1.0 docx
 
QlikView
QlikViewQlikView
QlikView
 
QlikView / Qlik Sense (ver. EN)
QlikView / Qlik Sense (ver. EN)QlikView / Qlik Sense (ver. EN)
QlikView / Qlik Sense (ver. EN)
 
BI & Analytics in Action Using QlikView
BI & Analytics in Action Using QlikViewBI & Analytics in Action Using QlikView
BI & Analytics in Action Using QlikView
 
Discover the QlikView Way
Discover the QlikView Way Discover the QlikView Way
Discover the QlikView Way
 
What makes QlikView unique
What makes QlikView unique  What makes QlikView unique
What makes QlikView unique
 
Information Management: Answering Today’s Enterprise Challenge
Information Management: Answering Today’s Enterprise ChallengeInformation Management: Answering Today’s Enterprise Challenge
Information Management: Answering Today’s Enterprise Challenge
 
Getting Started with Qlikview
Getting Started with QlikviewGetting Started with Qlikview
Getting Started with Qlikview
 
A Synopsis Of Qlik Sense Software
A Synopsis Of Qlik Sense SoftwareA Synopsis Of Qlik Sense Software
A Synopsis Of Qlik Sense Software
 
Data Driven Possibilities with Qlik
Data Driven Possibilities with QlikData Driven Possibilities with Qlik
Data Driven Possibilities with Qlik
 
QlikView & Salesforce
QlikView & SalesforceQlikView & Salesforce
QlikView & Salesforce
 
VYW_Online Live Story Pitch OK
VYW_Online Live Story Pitch OKVYW_Online Live Story Pitch OK
VYW_Online Live Story Pitch OK
 
Modernizing the Finance Function with Qlik
Modernizing the Finance Function with QlikModernizing the Finance Function with Qlik
Modernizing the Finance Function with Qlik
 
Best Practices - QlikView Application Development
Best Practices - QlikView Application DevelopmentBest Practices - QlikView Application Development
Best Practices - QlikView Application Development
 
QlikView projects in Agile Environment
QlikView projects in Agile EnvironmentQlikView projects in Agile Environment
QlikView projects in Agile Environment
 
Qlik-Education-Catalog-EN
Qlik-Education-Catalog-ENQlik-Education-Catalog-EN
Qlik-Education-Catalog-EN
 
Credon - Qlik Sense Presentation
Credon - Qlik Sense PresentationCredon - Qlik Sense Presentation
Credon - Qlik Sense Presentation
 
Qlikview for Beginners
Qlikview for BeginnersQlikview for Beginners
Qlikview for Beginners
 
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next GenerationQlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
Qlik Sense for Beginners - www.techstuffy.com - QlikView Next Generation
 
Qlikview-online-training | Qlikview Server training | Qlikview Designer
Qlikview-online-training | Qlikview Server training | Qlikview DesignerQlikview-online-training | Qlikview Server training | Qlikview Designer
Qlikview-online-training | Qlikview Server training | Qlikview Designer
 

Destacado

Destacado (12)

QWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 wordsQWC 2014 - A picture worth 1000 words
QWC 2014 - A picture worth 1000 words
 
Big Data = Intelligence
Big Data = IntelligenceBig Data = Intelligence
Big Data = Intelligence
 
QLIK MAKES THE BEST BI TOOLS AND APPLICATIONS FOR THE AGE OF BIG-DATA
QLIK MAKES THE BEST BI TOOLS AND APPLICATIONS FOR THE AGE OF BIG-DATAQLIK MAKES THE BEST BI TOOLS AND APPLICATIONS FOR THE AGE OF BIG-DATA
QLIK MAKES THE BEST BI TOOLS AND APPLICATIONS FOR THE AGE OF BIG-DATA
 
Leveraging BI and Predictive Analytics to deliver Real time forecasting
Leveraging BI and Predictive Analytics to deliver Real time forecastingLeveraging BI and Predictive Analytics to deliver Real time forecasting
Leveraging BI and Predictive Analytics to deliver Real time forecasting
 
The Power of Business Analytics in the Big Data world - Cognizant PPT at Qlik...
The Power of Business Analytics in the Big Data world - Cognizant PPT at Qlik...The Power of Business Analytics in the Big Data world - Cognizant PPT at Qlik...
The Power of Business Analytics in the Big Data world - Cognizant PPT at Qlik...
 
Best analytics tool
 Best analytics tool Best analytics tool
Best analytics tool
 
BI Analytics Overview Observations and Demo for Entrisik, Domo and Tableau
BI Analytics Overview Observations and Demo for Entrisik, Domo and TableauBI Analytics Overview Observations and Demo for Entrisik, Domo and Tableau
BI Analytics Overview Observations and Demo for Entrisik, Domo and Tableau
 
A Picture is Worth a Thousand Words
A Picture is Worth a Thousand WordsA Picture is Worth a Thousand Words
A Picture is Worth a Thousand Words
 
Practical qlikview 25 page sample
Practical qlikview   25 page samplePractical qlikview   25 page sample
Practical qlikview 25 page sample
 
Introduction to-sql
Introduction to-sqlIntroduction to-sql
Introduction to-sql
 
Qlik vs. Tableau: High-Level Comparison
Qlik vs. Tableau: High-Level ComparisonQlik vs. Tableau: High-Level Comparison
Qlik vs. Tableau: High-Level Comparison
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 

Similar a Qonnections2015 - Why Qlik is better with Big Data

Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
confluent
 
Nimble storage investor presentation q3 fy15(1)
Nimble storage investor presentation   q3 fy15(1)Nimble storage investor presentation   q3 fy15(1)
Nimble storage investor presentation q3 fy15(1)
nimblestorageIR
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 

Similar a Qonnections2015 - Why Qlik is better with Big Data (20)

What Is My Enterprise Data Maturity 2021
What Is My Enterprise Data Maturity 2021What Is My Enterprise Data Maturity 2021
What Is My Enterprise Data Maturity 2021
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
 
1 greenplum in banking sk cab
1 greenplum in banking   sk cab1 greenplum in banking   sk cab
1 greenplum in banking sk cab
 
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
 
How Zebra Technologies delivers business intelligence with Elastic on Google ...
How Zebra Technologies delivers business intelligence with Elastic on Google ...How Zebra Technologies delivers business intelligence with Elastic on Google ...
How Zebra Technologies delivers business intelligence with Elastic on Google ...
 
3rd day big data
3rd day   big data3rd day   big data
3rd day big data
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
 
Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
 
Augmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big DataAugmented OLAP Analytics for Big Data
Augmented OLAP Analytics for Big Data
 
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
Accelerating Self-Service Analytics with Denodo and Tableau (Singapore)
 
Nimble storage investor presentation q3 fy15(1)
Nimble storage investor presentation   q3 fy15(1)Nimble storage investor presentation   q3 fy15(1)
Nimble storage investor presentation q3 fy15(1)
 
Big Data LDN 2017: Managing the Explosion of Data With Qlik- Big Data & IoT
Big Data LDN 2017: Managing the Explosion of Data With Qlik- Big Data & IoTBig Data LDN 2017: Managing the Explosion of Data With Qlik- Big Data & IoT
Big Data LDN 2017: Managing the Explosion of Data With Qlik- Big Data & IoT
 
Greenplum User Case
Greenplum User Case Greenplum User Case
Greenplum User Case
 
Why we should consider Open Hybrid Cloud.pdf
Why we should  consider Open Hybrid Cloud.pdfWhy we should  consider Open Hybrid Cloud.pdf
Why we should consider Open Hybrid Cloud.pdf
 
Business Intelligence Best Practice Summit: BI Quo Vadis
Business Intelligence Best Practice Summit:  BI Quo VadisBusiness Intelligence Best Practice Summit:  BI Quo Vadis
Business Intelligence Best Practice Summit: BI Quo Vadis
 
Create your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouseCreate your Big Data vision and Hadoop-ify your data warehouse
Create your Big Data vision and Hadoop-ify your data warehouse
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data ContinuumFast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
Fast Data: Achieving Real-Time Data Analysis Across the Financial Data Continuum
 
BIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in FinanceBIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in Finance
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Qonnections2015 - Why Qlik is better with Big Data

  • 1. Why Qlik® is Better with Big Data John Park, Qlik Eli Singer, JethroData 28 April, 2015
  • 2. 2#qonnections Eli Singer • CEO & Co-Founder, JethroData • Data management, Information Security, e-commerce • Over 20 Years in leading start-ups • Twitter: @jethrodata Presenters John Park • Senior Solutions Architect, Partner Engineering • ETL, data warehousing, software design, *NIX, architecture • 2 years Qlik, 7 years data warehouse consultant • Twitter: @jpark328
  • 3. 3#qonnections Legal Disclaimer ©Qlik Confidential This Presentation contains forward-looking statements, including, but not limited to, statements regarding the value and effectiveness of Qlik's products, the introduction of product enhancements or additional products, Qlik’s partner and customer relationships, and Qlik's growth, expansion and market leadership, that involve risks, uncertainties, assumptions and other factors which, if they do not materialize or prove correct, could cause Qlik's results to differ materially from those expressed or implied by such forward-looking statements. All statements, other than statements of historical fact, are statements that could be deemed forward-looking statements, including statements containing the words "predicts," "plan," "expects," "anticipates,“ “see,” "believes," "goal," "target," "estimate," "potential," "may", "will," "might," "could," and similar words. Qlik intends all such forward-looking statements to be covered by the safe harbor provisions for forward-looking statements contained in Section 21E of the Exchange Act and the Private Securities Litigation Reform Act of 1995. 
Actual results may differ materially from those projected in such statements due to various factors, including but not limited to: risks and uncertainties inherent in our business; our ability to attract new customers and retain existing customers; our ability to effectively sell, service and support our products; our ability to manage our international operations; our ability to compete effectively; our ability to develop and introduce new products and add-ons or enhancements to existing products; our ability to continue to promote and maintain our brand in a cost-effective manner; our ability to manage growth; our ability to attract and retain key personnel; the scope and validity of intellectual property rights applicable to our products; adverse economic conditions in general and adverse economic conditions specifically affecting the markets in which we operate; and other risks and uncertainties more fully described in Qlik's publicly available filings with the Securities and Exchange Commission. Past performance is not necessarily indicative of future results. The forward- looking statements included in this presentation represent Qlik's views as of the date of this presentation. Qlik anticipates that subsequent events and developments will cause its views to change. Qlik undertakes no intention or obligation to update or revise any forward-looking statements, whether as a result of new information, future events or otherwise. These forward-looking statements should not be relied upon as representing Qlik's views as of any date subsequent to the date of this presentation. This Presentation should be read in conjunction with Qlik's periodic reports filed with the SEC (SEC Information), including the disclosures therein of certain factors which may affect Qlik’s future performance. 
Individual statements appearing in this Presentation are intended to be read in conjunction with and in the context of the complete SEC Information documents in which they appear, rather than as stand-alone statements. This presentation is intended to outline our general product direction and should not be relied on in making a purchase decision, as the development, release, and timing of any features or functionality described for our products remains at our sole discretion. © 2015 QlikTech International AB. All rights reserved. Qlik®, QlikView®, Qlik® Sense, QlikTech®, and the Qlik logos are trademarks of QlikTech International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective owners.
  • 4. Agenda: Why Qlik® is Better with Big Data • Introduction • Notes about Big Data / Hadoop / Data Lakes • Current state of Data Lake implementations • JethroData overview • Demo: Let’s analyze 2.5 billion rows from the Data Lake • JethroData architecture • Why Qlik is better with Big Data • Key takeaways
  • 5. Introduction: Data, technology, and use cases are growing exponentially, but it seems we do not know what to do
  • 6. Big Data / Hadoop / Data Lakes Big Data is a marketing term
  • 7. Big Data / Hadoop / Data Lakes Collection of services and toolkits working with HDFS
  • 8. Big Data / Hadoop / Data Lakes Architecture pattern where data is stored/landed for processing by *
  • 9. What is happening? • Enterprises are adopting Hadoop for data scientists (science project -> real “production”) • Hadoop is maturing • Data Lake architecture is being adopted • Companies and startups are building innovative services on Hadoop • Data is becoming the lifeblood of companies. However, there are an incredible number of challenges in using Hadoop as the single source of truth and sole underpinning of a BI platform. Image source: Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business, Gartner (August 2014)
  • 10. Data Lake vision vs. reality Scalability, low cost and performance were promised but…
  • 11. Data Lake vision vs. reality Scalability, low cost, and performance were promised, but… it’s been a challenge working with BI tools. Is the Data Lake frozen?
  • 12. Reasons why we are struggling • Data Lakes were not designed to run ad-hoc / interactive queries • Hadoop was designed to run batch processes (ML, ETL, predictive, canned reporting) • Designed for data scientists, not business analysts • Our trade-off: cost vs. scale vs. performance
  • 13. Strategies for bridging the gap • Use the best of technology to overcome the gap • Solve technical problems one issue at a time • Let’s do interactive query on Hadoop!
  • 14. Eli Singer – More about JethroData
  • 15. Who is JethroData? • A Qlik Technology Partner • Developing a next-generation analytical database • Focused on making interactive BI on Big Data a reality • Combines analytical DB design with full-indexing technology • JethroData 1.0 went GA Apr 7, 2015 • Offices in NY and Israel • Backed by world-class VCs
  • 16. Common use cases • Challenge: BI on Hadoop is slow • Current solution: Replicate data into a separate EDW system for fast BI • Challenge: EDW is expensive • Current solution: Migrate batch processes to Hadoop, keep BI on EDW • Pain: Maintaining two separate data systems is expensive and an operational nightmare • With Jethro: Run fast BI directly on data in Hadoop
  • 17. Why Is Hadoop So Slow? Architecture: MPP / Full-Scan (all SQL-on-Hadoop). Query: list books by author “Stephen King”. Process: each librarian is assigned a rack; they then view each book, check whether the author is “Stephen King”, and if so, get the book title. Result: too slow, costly, unscalable
  • 18. JethroData and Big Data: Index Access. Architecture: Index Access (only JethroData). Query 1: list books by author “Stephen King”. Process: go to the Author index entry for “Stephen King”, get the list of books, fetch only these books. Result: fast, minimal resources, scalable
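The librarian analogy on slides 17 and 18 can be sketched in a few lines of Python. This is only an illustration of the general idea (scan every book versus look up an author index); the catalogue and the function names are hypothetical and say nothing about how JethroData is actually implemented.

```python
from collections import defaultdict

# Hypothetical toy catalogue: (title, author) pairs standing in for table rows.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("It", "Stephen King"),
    ("Persuasion", "Jane Austen"),
    ("Dracula", "Bram Stoker"),
    ("Misery", "Stephen King"),
]


def full_scan(books, author):
    """Full-scan model: look at every book and keep the matching titles."""
    examined = 0
    titles = []
    for title, book_author in books:
        examined += 1
        if book_author == author:
            titles.append(title)
    return titles, examined


def build_author_index(books):
    """Build the Author index once: author -> positions of that author's books."""
    index = defaultdict(list)
    for position, (_, author) in enumerate(books):
        index[author].append(position)
    return index


def index_access(books, index, author):
    """Index model: go to the author's entry and fetch only the listed books."""
    positions = index.get(author, [])
    titles = [books[p][0] for p in positions]
    return titles, len(positions)


scan_titles, scan_examined = full_scan(BOOKS, "Stephen King")
author_index = build_author_index(BOOKS)
index_titles, index_examined = index_access(BOOKS, author_index, "Stephen King")
```

Both approaches return the same titles, but the full scan examines all six books while the index lookup touches only the three matching ones. The work done by the scan grows with the catalogue; the work done by the lookup grows with the answer.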
  • 19. Let’s Analyze 2.5 Billion Rows
  • 20. Qlik and Jethro Demo Setup • Hadoop environment: CDH 5.3; servers: NN: 1x r3.large; DN: 9x m1.large • JethroData server: 1x r3.8xlarge; 244GB RAM; 320GB SSD (spot) • Qlik Sense 1.10: r3.2xlarge, 4 vCPU, 30 GB RAM • Demo 1: TPC-DS • Based on TPC-DS, Replication Factor 1,000 • Sales_demo table based on store_sales fact table with dimension data added • 2.5B rows, 33 columns • 600GB raw data • Demo 2: Airline • Airline data: 123 million commercial flight records + CSV (hybrid architecture) • Everything hosted on Amazon Web Services [Diagram labels: Business Discovery Apps; Qlik Server; Big Data Platform; Big Data Indexing; HTTP, HTTPS Protocol; Hadoop Client; HDFS; Direct Discovery; ODBC Protocol]
  • 21. SQL-on-Hadoop Architecture: MPP/Full-Scan • Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state • Performance and resources based on the size of the dataset [Diagram labels: Data Node (x5); Query Executor (x5); Query Planner/Mgr (x5)]
  • 22. Jethro SQL-on-Hadoop Architecture: Index Access • Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state • 1. Index access • 2. Read data only for required rows • Performance and resources based on the size of the result set [Diagram labels: Data Node (x5); Jethro; Query Node (x2)]
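To make the contrast between slides 21 and 22 concrete, the same query can be run against a small in-memory SQLite database from Python's standard library. SQLite is used here purely as a stand-in engine, not as JethroData or any SQL-on-Hadoop product, and the table contents and the index name `idx_prod` are made up for the example. The sketch asks the engine for its query plan before and after an index on `prod` exists.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan for `sql` as one string (detail is the 4th column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# The query shown on the slides, with plain ASCII quotes.
QUERY = "SELECT state, sum(sales) FROM t1 WHERE prod = 'abc' GROUP BY state"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (state TEXT, prod TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("NY", "abc", 10.0),
        ("NY", "xyz", 5.0),
        ("CA", "abc", 7.5),
        ("CA", "abc", 2.5),
        ("TX", "xyz", 1.0),
    ],
)

# Without an index the engine has no choice but to read every row of t1.
plan_without_index = query_plan(conn, QUERY)

# With an index on the filter column it can jump to the rows where prod = 'abc'.
conn.execute("CREATE INDEX idx_prod ON t1 (prod)")
plan_with_index = query_plan(conn, QUERY)

result = sorted(conn.execute(QUERY).fetchall())
```

Printing `plan_without_index` and `plan_with_index` shows the engine switching from a table scan to a search through `idx_prod`, while `result` stays the same. The exact wording of the plan text varies between SQLite versions.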
  • 23. SQL on Hadoop: competitive landscape • Full-scan based solutions (read all rows, every time): Hive, Impala, Presto, SparkSQL, Drill, Pivotal/HAWQ, IBM/Big SQL, Actian, Teradata/SQL-H, Microsoft/PDW • Index-based solution (read ONLY needed rows): JethroData • Use-case comparison: full-scan is optimal for ETL, predictive; index is optimal for interactive BI
  • 24. JethroData: system overview • Data sources: Hadoop, EDW, streams, … • Queries: BI tools, SQL client, … • 1. Initial load: extract data from relevant sources and load through JethroLoader. Incremental data can be loaded at short intervals • 2. Index and column files are stored in HDFS (or S3). Typical size is 35% of raw data • 3. Queries are sent via standard ODBC/JDBC interface. Automatic load balancing across servers • 4. JethroServer communicates directly with HDFS to retrieve relevant data. No MapReduce / Spark are used [Diagram labels: Jethro Loader; Jethro Server; Hadoop]
  • 25. JethroData: architecture highlights • Every column is indexed: allows users to slice & dice any way they choose and always get a fast response • Scales to any size: from 100M to 100B, columns and indexes are compressed and partitioned • Super-easy to implement: compatible with every Hadoop distribution; installs on separate server(s) from the Hadoop cluster
  • 26. Jethro Indexes: innovative technology • Fast to read – Simple: inverted-list indexes map each column value to a list of rows – Fast: direct access O(1) to each value entry – Scale: distributed, highly hierarchical compressed bitmaps • Fast to write – Index files are appended, duplicate entries allowed – Incremental – new data indexed as it comes in – No locks, no random read/write • Patent pending: http://www.google.com/patents/WO2013001535A3?cl=en
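The read and write properties listed on slide 26 can be illustrated with a small append-only inverted index. This is a minimal sketch under stated assumptions: the class `AppendOnlyIndex` and its segment layout are hypothetical, plain Python lists stand in for the compressed bitmaps, and nothing here reflects JethroData's actual file format.

```python
from collections import defaultdict


class AppendOnlyIndex:
    """Toy inverted-list index for one column, written as append-only segments."""

    def __init__(self):
        self.segments = []  # each segment maps a column value -> list of row ids
        self.row_count = 0

    def append_batch(self, values):
        """Index a new batch of column values without touching older segments."""
        segment = defaultdict(list)
        for offset, value in enumerate(values):
            segment[value].append(self.row_count + offset)
        # The same value may already have an entry in an earlier segment;
        # such duplicate entries are allowed and resolved at read time.
        self.segments.append(dict(segment))
        self.row_count += len(values)

    def lookup(self, value):
        """Collect the row ids for `value` across all segments."""
        rows = []
        for segment in self.segments:
            # Each per-segment access is a hash lookup on the value.
            rows.extend(segment.get(value, []))
        return rows


state_index = AppendOnlyIndex()
state_index.append_batch(["NY", "CA", "NY"])  # initial load
state_index.append_batch(["CA", "NY"])        # incremental load
ny_rows = state_index.lookup("NY")
ca_rows = state_index.lookup("CA")
```

Because a load only ever adds a new segment, writers never rewrite existing index data, which is one plausible reading of the "no locks, no random read/write" bullet. The cost is that a read may have to merge several entries for the same value.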
  • 27. Intelligent caching technology • Reuse of intermediate/final query results – Repeat queries in sub-seconds • Addresses wide top-of-the-funnel queries – Analysis starts with queries with no/few filters – Those queries are often repeated in dashboard scenarios • Transparently adapts to incremental loads – Execution on delta data + merge saved results [Charts: query speed (fast/slow) vs. query selectivity (few/more); query repeat (hi/low) vs. query selectivity (few/more); query speed (fast/slow) vs. query selectivity (few/more)]
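The "execution on delta data + merge saved results" idea from slide 27 can be sketched for a single aggregate query. The class below is a hypothetical illustration, not JethroData's cache: it keeps a saved result for `SUM(sales) GROUP BY state` and, on each call, scans only the rows loaded since the result was last refreshed.

```python
from collections import Counter


class IncrementalSumCache:
    """Caches SUM(sales) GROUP BY state and refreshes it from new rows only."""

    def __init__(self):
        self.rows = []          # append-only list of (state, sales) rows
        self.cached = Counter() # saved per-state totals
        self.rows_covered = 0   # number of rows already folded into `cached`
        self.rows_scanned = 0   # instrumentation: total rows read by queries

    def load(self, new_rows):
        """Incremental load: new rows are appended, the cache is left as-is."""
        self.rows.extend(new_rows)

    def sales_by_state(self):
        """Answer the query by scanning only the delta and merging it in."""
        delta = self.rows[self.rows_covered:]
        for state, sales in delta:
            self.cached[state] += sales
        self.rows_scanned += len(delta)
        self.rows_covered = len(self.rows)
        return dict(self.cached)


cache = IncrementalSumCache()
cache.load([("NY", 10), ("CA", 5)])
first = cache.sales_by_state()        # scans the 2 loaded rows
repeat = cache.sales_by_state()       # repeat query: nothing new to scan
scanned_after_repeat = cache.rows_scanned
cache.load([("NY", 1)])
after_load = cache.sales_by_state()   # scans only the 1 new row
```

This works because a sum can be merged: the total over old and new rows equals the saved total plus the total over the delta. Aggregates without that property (a distinct count, for example) would need a different saved representation.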
  • 28. Advantages of Qlik and JethroData • Now we can get scale, cost, and performance • Faster queries with more selection criteria • Faster Direct Discovery load time because the unique dimension values are already available in JethroData indexes • Use of system-wide optimization and smart caching • Creates an interactive Hadoop for Business Discovery • Allows Qlik customers to have the same experience across all data sources, including Hadoop [Diagram labels: Qlik application; Direct Discovery; queries]
  • 29. Why Qlik Is Better for Big Data • Allows users to analyze big data the way business users want to • Scalability, extensibility, and beautiful visualizations to tell your story
  • 30. Key takeaways • Qlik can do Big Data well with the right complementary technology • Qlik's associative interface can open up new use cases with Big Data • Qlik and Jethro can provide true interactive BI with Hadoop
  • 31. Follow-up resources • Publicly accessible Qlik Sense hub: http://jethrodata.qlik.com • Free evaluation version of Jethro available at http://jethrodata.com/download