Qonnections2015 - Why Qlik is better with Big Data

Why Qlik® is Better with Big Data
John Park, Qlik
Eli Singer, JethroData
28 April, 2015

2#qonnections
Presenters
Eli Singer
• CEO & Co-Founder, JethroData
• Data management, information security, e-commerce
• Over 20 years leading start-ups
• Twitter: @jethrodata
John Park
• Senior Solutions Architect, Partner Engineering
• ETL, data warehousing, software design, *NIX, architecture
• 2 years Qlik, 7 years data warehouse consultant
• Twitter: @jpark328

Legal Disclaimer
©Qlik Confidential
This Presentation contains forward-looking statements, including, but not limited to, statements regarding the value and effectiveness of
Qlik's products, the introduction of product enhancements or additional products, Qlik’s partner and customer relationships, and Qlik's
growth, expansion and market leadership, that involve risks, uncertainties, assumptions and other factors which, if they do not materialize
or prove correct, could cause Qlik's results to differ materially from those expressed or implied by such forward-looking statements. All
statements, other than statements of historical fact, are statements that could be deemed forward-looking statements, including
statements containing the words "predicts," "plan," "expects," "anticipates," "see," "believes," "goal," "target," "estimate," "potential," "may",
"will," "might," "could," and similar words. Qlik intends all such forward-looking statements to be covered by the safe harbor provisions for
forward-looking statements contained in Section 21E of the Exchange Act and the Private Securities Litigation Reform Act of 1995. Actual
results may differ materially from those projected in such statements due to various factors, including but not limited to: risks and
uncertainties inherent in our business; our ability to attract new customers and retain existing customers; our ability to effectively sell,
service and support our products; our ability to manage our international operations; our ability to compete effectively; our ability to
develop and introduce new products and add-ons or enhancements to existing products; our ability to continue to promote and maintain
our brand in a cost-effective manner; our ability to manage growth; our ability to attract and retain key personnel; the scope and validity of
intellectual property rights applicable to our products; adverse economic conditions in general and adverse economic conditions
specifically affecting the markets in which we operate; and other risks and uncertainties more fully described in Qlik's publicly available
filings with the Securities and Exchange Commission. Past performance is not necessarily indicative of future results. The forward-
looking statements included in this presentation represent Qlik's views as of the date of this presentation. Qlik anticipates that subsequent
events and developments will cause its views to change. Qlik undertakes no intention or obligation to update or revise any forward-looking
statements, whether as a result of new information, future events or otherwise. These forward-looking statements should not be relied
upon as representing Qlik's views as of any date subsequent to the date of this presentation.
This Presentation should be read in conjunction with Qlik's periodic reports filed with the SEC (SEC Information), including the disclosures
therein of certain factors which may affect Qlik’s future performance. Individual statements appearing in this Presentation are intended to
be read in conjunction with and in the context of the complete SEC Information documents in which they appear, rather than as stand-
alone statements. This presentation is intended to outline our general product direction and should not be relied on in making a
purchase decision, as the development, release, and timing of any features or functionality described for our products remains
at our sole discretion.
© 2015 QlikTech International AB. All rights reserved. Qlik®, QlikView®, Qlik® Sense, QlikTech®, and the Qlik logos are trademarks of
QlikTech International AB which have been registered in multiple countries. Other marks and logos mentioned herein are trademarks or
registered trademarks of their respective owners.

Agenda: Why Qlik® is Better with Big Data
• Introduction
• Notes about Big Data / Hadoop / Data Lakes
• Current state of Data Lake implementations
• JethroData overview
• Demo: Let’s analyze 2.5 billion rows from the Data Lake
• JethroData architecture
• Why Qlik is better with Big Data
• Key takeaways

Introduction
Data, technology, and use cases are growing exponentially, but it seems we do not know what to do with them

Big Data / Hadoop / Data Lakes
Big Data is a marketing term

Hadoop: a collection of services and toolkits working with HDFS

Data Lake: an architecture pattern where data is stored/landed for processing by *

What is happening?
• Enterprises are adopting Hadoop for data scientists (science project -> real “production”)
• Hadoop is maturing
• Data Lake architecture is being adopted
• Companies and startups are building innovative services on Hadoop
• Data is becoming the lifeblood of companies
However, there are an incredible number of challenges in using Hadoop as the single source of truth and sole underpinning of a BI platform.
Image source: Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business, Gartner (August 2014)

Data Lake vision vs. reality
Scalability, Low Cost, and Performance were promised but…
It has been a challenge to work with BI tools
Is the Data Lake frozen?

Reasons why we are struggling
• Data Lakes were not designed to run ad-hoc / interactive queries
• Hadoop was designed to run batch processes (ML, ETL, predictive, canned reporting)
• Designed for data scientists, not business analysts
• Our trade off: cost vs. scale vs. performance

Strategies for bridging the gap
• Use the best technology to overcome the gap
• Solve technical problems one issue at a time
• Let’s do Interactive Query on Hadoop!

Eli Singer – More about JethroData

Who is JethroData?
• A Qlik Technology Partner
• Developing a next-generation analytical database
• Focused on making interactive BI on Big Data a reality
• Combines analytical DB design with full-indexing technology
• JethroData 1.0 went GA Apr 7, 2015
• Offices in NY and Israel
• Backed by world-class VCs

Common use cases
• Challenge: BI on Hadoop is slow
• Current solution: Replicate data into a separate EDW system for fast BI
• Challenge: EDW is expensive
• Current solution: Migrate batch processes to Hadoop, keep BI on EDW
• Pain: Maintaining two separate data systems is expensive and an operational nightmare
• With Jethro: Run fast BI directly on data in Hadoop

Why Is Hadoop So Slow?
Architecture: MPP / Full-Scan (All SQL-on-Hadoop)
Query: list books by author “Stephen King”
Process: each librarian is assigned a rack; they then view each book, check if the author is “Stephen King”, and if so, get the book title
Result: too slow, costly, unscalable

JethroData and Big Data: Index Access
Architecture: Index Access (Only JethroData)
Query 1: list books by author “Stephen King”
Process: go to the Author index, find the entry for “Stephen King”, get the list of books, fetch only these books
Result: fast, minimal resources, scalable
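
The two librarian analogies can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `BOOKS` list, the function names, and the "inspected" counters are hypothetical and do not reflect how any SQL-on-Hadoop engine or JethroData is implemented.

```python
from collections import defaultdict

# Hypothetical toy "library": (title, author) pairs stand in for table rows.
BOOKS = [
    ("Carrie", "Stephen King"),
    ("Emma", "Jane Austen"),
    ("It", "Stephen King"),
    ("Dune", "Frank Herbert"),
    ("Persuasion", "Jane Austen"),
]


def full_scan(books, author):
    """Inspect every book, like librarians walking every rack."""
    inspected = 0
    titles = []
    for title, book_author in books:
        inspected += 1
        if book_author == author:
            titles.append(title)
    return titles, inspected


def build_author_index(books):
    """Map each author to the positions of that author's books."""
    index = defaultdict(list)
    for position, (_, author) in enumerate(books):
        index[author].append(position)
    return index


def index_access(books, index, author):
    """Look up the author entry, then fetch only the listed books."""
    positions = index.get(author, [])
    titles = [books[p][0] for p in positions]
    return titles, len(positions)


author_index = build_author_index(BOOKS)
scan_titles, scan_cost = full_scan(BOOKS, "Stephen King")
index_titles, index_cost = index_access(BOOKS, author_index, "Stephen King")
print(scan_titles, scan_cost)    # same titles, every book inspected
print(index_titles, index_cost)  # same titles, only matching books fetched
```

Both paths return the same titles; the difference is that the scan cost grows with the size of the library, while the index cost grows with the number of matching books.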

Let’s Analyze 2.5 Billion Rows

Qlik and Jethro Demo Setup
• Hadoop environment: CDH 5.3; servers: NN: 1x r3.large; DN: 9x m1.large
• JethroData server: 1x r3.8xlarge; 244GB RAM; 320GB SSD (SPOT)
• Qlik Sense 1.10: r3.2xlarge; 4 vCPU; 30GB RAM
• Demo 1: TPC-DS
– Based on TPC-DS, Replication Factor 1,000
– Sales_demo table based on store_sales fact table with dimension data added
– 2.5B rows, 33 columns
– 600GB raw data
• Demo 2: Airline
– Airline data: 123 million commercial flight records + CSV (hybrid architecture)
[Diagram labels: Business Discovery Apps; Qlik Server; Big Data Indexing (Hadoop client); Big Data Platform (HDFS); protocols: HTTP/HTTPS, Direct Discovery over ODBC. Everything hosted on Amazon Web Services.]

SQL-on-Hadoop Architecture: MPP/Full-Scan
Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state
[Diagram: five Data Nodes, each with its own Query Executor and Query Planner/Mgr]
Performance and resources based on the size of the dataset

Jethro SQL-on-Hadoop Architecture: Index Access
Client: SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state
[Diagram: five Data Nodes and two Jethro Query Nodes]
1. Index access
2. Read data only for required rows
Performance and resources based on the size of the result set
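
To see the same client query take a full-scan path and then an index path, the sketch below runs it in SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in engine here: it uses B-tree indexes on a single machine, not Jethro's index format or a Hadoop cluster, and the sample rows and the index name `idx_t1_prod` are assumptions made for illustration.

```python
import sqlite3

# SQLite is only a stand-in engine here; it is not Hadoop or JethroData.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (state TEXT, prod TEXT, sales REAL)")
rows = [
    ("NY", "abc", 10.0),
    ("NY", "xyz", 99.0),
    ("CA", "abc", 5.0),
    ("NY", "abc", 2.5),
    ("TX", "def", 7.0),
]
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

QUERY = "SELECT state, sum(sales) FROM t1 WHERE prod='abc' GROUP BY state"


def plan(connection, sql):
    """Return the planner's description of how it will read the table."""
    details = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in details)


# No index yet: the planner has to scan the whole table.
plan_without_index = plan(conn, QUERY)
result_without_index = sorted(conn.execute(QUERY).fetchall())

# Hypothetical index name; indexing the filter column changes the access path.
conn.execute("CREATE INDEX idx_t1_prod ON t1 (prod)")
plan_with_index = plan(conn, QUERY)
result_with_index = sorted(conn.execute(QUERY).fetchall())

print(plan_without_index)
print(plan_with_index)
print(result_with_index)
```

The query text and its result are identical in both runs; only the reported access path changes, from a table scan to a search through the index on `prod`.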

SQL on Hadoop: competitive landscape
Full-Scan Based Solutions: read all rows, every time.
• Hive
• Impala
• Presto
• SparkSQL
• Drill
• Pivotal/HAWQ
• IBM/Big SQL
• Actian
• Teradata/SQL-H
• Microsoft/PDW
Index Based Solution: read ONLY needed rows.
• JethroData
Use-case comparison:
• Full-Scan: optimal for ETL, predictive
• Index: optimal for interactive BI

JethroData: system overview
[Diagram: Data Source, JethroLoader, Hadoop, JethroServer, Queries, with steps 1 to 4 marked]
Data Source
• Hadoop
• EDW
• Streams
• …
Queries
• BI Tools
• SQL client
• …
1. Initial load: extract data from relevant sources and load through JethroLoader. Incremental data can be loaded at short intervals.
2. Index and column files are stored in HDFS (or S3). Typical size is 35% of raw data.
3. Queries are sent via standard ODBC/JDBC interface. Automatic load-balance across servers.
4. JethroServer communicates directly with HDFS to retrieve relevant data. No MapReduce / Spark are used.

JethroData: architecture highlights
Every column is indexed!
Allows users to slice & dice any way they choose and always get a fast response.
Scales to any size
From 100M to 100B rows; columns and indexes are compressed and partitioned.
Super-easy to implement
Compatible with every Hadoop distribution. Installs on separate server(s) from the Hadoop cluster.

Jethro Indexes: innovative technology
• Fast to read
– Simple: inverted-list indexes map each column value to a list of rows
– Fast: direct access O(1) to each value entry
– Scale: distributed, highly hierarchical compressed bitmaps
• Fast to write
– Index files are appended, duplicate entries allowed
– Incremental: new data indexed as it comes in
– No locks, no random read/write
Patent pending: http://www.google.com/patents/WO2013001535A3?cl=en
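
The read and write properties listed above can be illustrated with a small in-memory sketch. It is a hypothetical toy, assuming one dictionary per load as a "segment"; the class and method names are invented for illustration, and it does not model the compressed bitmaps or HDFS files that the slide describes.

```python
from collections import defaultdict


class AppendOnlyInvertedIndex:
    """Toy inverted-list index: column value -> row ids, written append-only.

    Hypothetical illustration; real Jethro index files are compressed
    bitmaps stored in HDFS, which this sketch does not attempt to model.
    """

    def __init__(self):
        # Each load appends one immutable segment; older segments are never
        # rewritten, so writers need no locks or random writes.
        self.segments = []
        self.row_count = 0

    def append_batch(self, values):
        """Index a new batch of column values as a fresh segment."""
        segment = defaultdict(list)
        for offset, value in enumerate(values):
            segment[value].append(self.row_count + offset)
        self.row_count += len(values)
        self.segments.append(dict(segment))

    def lookup(self, value):
        """Collect row ids for a value; it may have an entry per segment."""
        row_ids = []
        for segment in self.segments:
            # A dict lookup is the O(1) "direct access" to the value entry.
            row_ids.extend(segment.get(value, []))
        return row_ids


index = AppendOnlyInvertedIndex()
index.append_batch(["abc", "xyz", "abc"])   # initial load: rows 0-2
index.append_batch(["def", "abc"])          # incremental load: rows 3-4
print(index.lookup("abc"))                  # [0, 2, 4]
```

The value "abc" ends up with one entry in each segment, which is the "duplicate entries allowed" trade-off: writes stay append-only, and reads merge the entries.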

Intelligent caching technology
• Reuse of intermediate/final query results
– Repeat queries in sub-seconds
• Addresses wide top-of-the-funnel queries
– Analysis starts with queries with no/few filters
– Those queries are often repeated in dashboard scenarios
• Transparently adapts to incremental loads
– Execution on delta data + merge saved results
[Charts: Query Speed (Slow to Fast) vs. Query Selectivity (Few to More); Query Repeat (Low to Hi) vs. Query Selectivity (Few to More); Query Speed (Slow to Fast) vs. Query Selectivity (Few to More)]
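
The "execution on delta data + merge saved results" idea can be sketched as follows. This is a minimal illustration, assuming an append-only table and an additive aggregate (SUM grouped by state); the class name, fields, and row layout are hypothetical and are not Jethro's actual cache design.

```python
from collections import Counter


class IncrementalSumCache:
    """Caches SUM(sales) GROUP BY state and refreshes it from delta rows only.

    Hypothetical sketch; table layout and names are assumptions.
    """

    def __init__(self):
        self.rows = []             # append-only table of (state, sales)
        self.cached = Counter()    # saved query result
        self.rows_covered = 0      # how many rows the saved result reflects
        self.rows_scanned = 0      # work counter, for illustration

    def load(self, new_rows):
        """Incremental load: append rows without touching the cache."""
        self.rows.extend(new_rows)

    def sales_by_state(self):
        """Run the query on the delta only, then merge into the saved result."""
        delta = self.rows[self.rows_covered:]
        for state, sales in delta:
            self.cached[state] += sales
        self.rows_scanned += len(delta)
        self.rows_covered = len(self.rows)
        return dict(self.cached)


cache = IncrementalSumCache()
cache.load([("NY", 10), ("CA", 5), ("NY", 2)])
first = cache.sales_by_state()       # scans the 3 loaded rows
repeat = cache.sales_by_state()      # repeat query: nothing new to scan
cache.load([("CA", 1)])
refreshed = cache.sales_by_state()   # scans only the 1 delta row
print(first, repeat, refreshed, cache.rows_scanned)
```

The merge step is this simple only because sums can be combined; aggregates such as distinct counts would need more state to be saved alongside the result.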

Advantages of Qlik and JethroData
• Now, we can get scale, cost, and performance
• Faster queries with more selection criteria
• Faster Direct Discovery load time because the unique dimension values are already available in JethroData indexes
• Use of system-wide optimization and smart caching
• Creates an interactive Hadoop for Business Discovery
• Allows Qlik customers to have the same experience across all data sources, including Hadoop
[Diagram: Qlik application issuing Direct Discovery queries]

Why Qlik Is Better for Big Data
• Allows users to analyze big data the way business users want to
• Scalability, extensibility, and beautiful visualizations to tell your story

Key takeaways
• Qlik can do Big Data well with the right complementary technology
• Qlik's associative interface can open up new use cases with Big Data
• Qlik and Jethro can provide truly interactive BI with Hadoop

Follow-up resources
• Publicly accessible Qlik Sense hub: http://jethrodata.qlik.com
• Free evaluation version of Jethro available at http://jethrodata.com/download
