Grab some coffee and enjoy the pre-show banter before the top of the hour!
Outside the Box: Alternate Query Models & the Future of Big Data

The Briefing Room
Welcome

Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com

Twitter Tag: #briefr

Mission

• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their products to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Topics

This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Data Discovery & Visualization

INNOVATORS
Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

robin.bloor@bloorgroup.com

Infobright
• Infobright’s columnar database is used for applications and data marts that analyze large volumes of machine-generated data
• It leverages patented compression and optimization techniques, and a “knowledge grid,” to achieve real-time analytics
• Infobright offers a commercial version of its software, as well as a freely available, open source product

Guests: Don DeLoach and Jeff Kibler
Don DeLoach is CEO and President of Infobright

Jeff Kibler is Senior Technical Architect for Infobright

Turning “Huh?” into “Aha!”
Alternate Query Models and Big Data Analytics
About Infobright
• 400+ direct and OEM customers across North America, EMEA and Asia
• 1,000 installations
• 8 of Top 10 Global Telecom Carriers use Infobright via OEM/ISVs

Logistics, Manufacturing, Business Intelligence
Online & Mobile Advertising/Web Analytics, eCommerce, Social Networks
Government, Utilities, Research
Financial Services
Telecom, Security
Core Competencies
• Columnar Database: designed for fast analytics; deep data compression
• Intelligence, not Hardware: Knowledge Grid; iterative engine
• Administrative Simplicity: no manual tuning; minimal ongoing administration
Machine-Generated Data Is Everywhere
• Weblogs
• Computer, network events
• Call detail records
• Financial trade data
• Sensors, RFID
• Online game data

Businesses need to extract insight in near real time from rapidly growing data volumes:
• Segment and target website visitors
• Troubleshoot networks
• Identify security threats and fraud
• Optimize online/mobile ads
Internet of Things is a Multiplier for EVERYTHING
Emerging Data Analytics Stack: Days of One-Size-Fits-All Are Gone
“Yesterday’s BI-ETL-EDW stack is wrong-sided for tomorrow’s needs, and quickly becoming irrelevant.” (Gigamon)
• Data management
  – Hadoop transforming this area
• Transparent analytic stack
  – Operational, investigative, predictive
  – Machine-generated, text
• User consumption
  – Real-time, interactive visualization & query creation
• Data Center / Data Warehouse
  – Infrastructure strategies, options proliferating
Infobright: Columnar Architecture
Column Orientation
• Knowledge Grid: statistics and metadata “describing” the super-compressed data
• Data Packs: data stored in manageably sized, highly compressed data packs
• Data compressed using algorithms tailored to data type

Smarter architecture
• Load data and go
• No indices or partitions to build and maintain
• Knowledge Grid automatically updated as data packs are created or updated
• Super-compact data footprint can leverage off-the-shelf hardware
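A minimal Python sketch of why column orientation helps analytic scans: summing one column of a hypothetical three-column table touches every stored field in a row layout, but only that column in a column layout. The table, its column names (`ts`, `port`, `bytes`) and the "values touched" counter are illustrative stand-ins for I/O, not a description of Infobright's on-disk format.

```python
from typing import Dict, List, Tuple

# Hypothetical event table with three integer columns, stored two ways.
ROWS: List[Dict[str, int]] = [
    {"ts": i, "port": 80 if i % 2 else 443, "bytes": i * 10} for i in range(1000)
]
# Column layout: one list per column name.
COLUMNS: Dict[str, List[int]] = {name: [r[name] for r in ROWS] for name in ROWS[0]}


def sum_row_store(table: List[Dict[str, int]], col: str) -> Tuple[int, int]:
    """Row layout: every field of every record is read to reach one column.

    Returns (sum, number_of_values_touched)."""
    total, touched = 0, 0
    for record in table:
        touched += len(record)  # the whole row comes along
        total += record[col]
    return total, touched


def sum_column_store(table: Dict[str, List[int]], col: str) -> Tuple[int, int]:
    """Column layout: only the requested column is read."""
    values = table[col]
    return sum(values), len(values)


print(sum_row_store(ROWS, "bytes"))        # same sum, 3 values touched per row
print(sum_column_store(COLUMNS, "bytes"))  # same sum, 1 value touched per row
```

Both calls return the same total; the row layout touches three times as many values, and the gap widens with every extra column a real table carries.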
  
The Knowledge Grid
Knowledge Grid: information about the data; applies to the whole table
Knowledge Nodes: built for each Data Pack
[Diagram: Column A divided into Data Packs DP1–DP6, alongside Column B, …]

Global knowledge, built during LOAD:
• Numeric data
• String and character data
• Distributions
Dynamic knowledge: built per query (e.g., for aggregates, joins)

Knowledge Nodes:
• Answer the query directly, or
• Identify only the required Data Packs, minimizing decompression, and
• Predict required data in advance based on workload
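The per-pack metadata idea can be sketched in Python. This assumes the Knowledge Nodes for a numeric column hold simple statistics (count, min, max, sum) for each 65,536-value Data Pack, and it labels each pack against a range predicate as "irrelevant", "relevant" or "suspect". Those labels, the `PackStats` fields and the function names are illustrative assumptions; the slide does not say exactly which statistics Infobright stores.

```python
from dataclasses import dataclass
from typing import List

PACK_SIZE = 65_536  # values per data pack, as stated on the Data Packs slide


@dataclass
class PackStats:
    """Hypothetical per-pack metadata, in the spirit of a knowledge node."""
    count: int
    minimum: int
    maximum: int
    total: int


def build_stats(column: List[int], pack_size: int = PACK_SIZE) -> List[PackStats]:
    """Split a numeric column into packs and record summary stats at load time."""
    stats = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        stats.append(PackStats(len(pack), min(pack), max(pack), sum(pack)))
    return stats


def classify(stat: PackStats, low: int, high: int) -> str:
    """Classify one pack against the predicate low <= x <= high using metadata only."""
    if stat.maximum < low or stat.minimum > high:
        return "irrelevant"  # no value can match: skip the pack entirely
    if low <= stat.minimum and stat.maximum <= high:
        return "relevant"    # every value matches: metadata alone can answer
    return "suspect"         # some values may match: the pack must be opened


stats = build_stats(list(range(200_000)))
print([classify(s, 70_000, 140_000) for s in stats])
```

For `x BETWEEN 70000 AND 140000` over the values 0..199,999, the four packs classify as irrelevant, suspect, suspect, irrelevant, so only two of the four packs would ever need decompressing.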
  
Optimizer / Granular Engine
1. Query received
2. Engine iterates on Knowledge Grid
3. Each pass eliminates Data Packs
4. If any Data Packs are needed to resolve the query, only those are decompressed

[Diagram: Query (“Q: How are my sales doing this year?”) → Knowledge Grid (1%) → Compressed Data → Results]
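The four steps can be approximated in a short Python sketch that answers an exact `SUM(x) WHERE low <= x <= high`. It assumes per-pack (min, max, sum, count) metadata, and plain Python lists stand in for compressed packs. It makes a single elimination pass rather than the iterative, multi-pass engine the slide describes, and none of the names come from Infobright.

```python
from typing import List, Tuple

PACK_SIZE = 65_536  # values per data pack, per the Data Packs slide

Meta = Tuple[int, int, int, int]  # (min, max, sum, count) for one pack


def load(column: List[int], pack_size: int = PACK_SIZE) -> Tuple[List[List[int]], List[Meta]]:
    """Split a column into packs and record per-pack metadata at load time.
    Plain lists stand in for the compressed packs."""
    packs: List[List[int]] = []
    meta: List[Meta] = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        packs.append(pack)
        meta.append((min(pack), max(pack), sum(pack), len(pack)))
    return packs, meta


def sum_between(packs: List[List[int]], meta: List[Meta], low: int, high: int) -> Tuple[int, int]:
    """Exact SUM(x) WHERE low <= x <= high.
    Returns (result, packs_opened); only packs the metadata cannot settle are scanned."""
    total, opened = 0, 0
    for pack, (mn, mx, s, _count) in zip(packs, meta):
        if mx < low or mn > high:
            continue          # eliminated: no value can match
        if low <= mn and mx <= high:
            total += s        # every value matches: use the stored sum
            continue
        opened += 1           # uncertain: "decompress" this pack and scan it
        total += sum(v for v in pack if low <= v <= high)
    return total, opened


packs, meta = load(list(range(200_000)))
print(sum_between(packs, meta, 70_000, 140_000))  # opens 2 of 4 packs
```

A range that lines up with pack boundaries, such as 0..131,071, is answered entirely from stored sums with zero packs opened.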
Infobright Architecture: Data Packs and Compression
Data Packs
• Each data pack contains 65,536 data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and distribution
[Diagram: a column split into 64K-value data packs, each compressed with patent-pending compression algorithms]

Compression
• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results of 40:1 and higher
• For example, 1TB of raw data compressed 10 to 1 would require only 100GB of disk capacity
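Per-pack compression, and why results depend on how data is distributed among packs, can be illustrated with Python's standard `zlib` module. Note that `zlib` is a swapped-in general-purpose compressor, not Infobright's type-specific, patent-pending algorithms, and serializing values as 32-bit little-endian integers is likewise an assumption.

```python
import random
import struct
import zlib
from typing import List

PACK_SIZE = 65_536  # values per data pack, per the slide


def pack_bytes(values: List[int]) -> bytes:
    """Serialize one pack as 32-bit signed little-endian integers."""
    return struct.pack(f"<{len(values)}i", *values)


def compress_column(column: List[int], pack_size: int = PACK_SIZE) -> List[float]:
    """Compress each pack independently; return each pack's raw/compressed ratio."""
    ratios = []
    for start in range(0, len(column), pack_size):
        raw = pack_bytes(column[start:start + pack_size])
        ratios.append(len(raw) / len(zlib.compress(raw, 6)))
    return ratios


# Pack 1: only four distinct repeating values. Pack 2: uniformly random 32-bit ints.
low_cardinality = [i % 4 for i in range(PACK_SIZE)]
rng = random.Random(0)
noisy = [rng.randrange(-2**31, 2**31) for _ in range(PACK_SIZE)]
print(compress_column(low_cardinality + noisy))
```

Both packs hold 65,536 values, yet the low-cardinality pack compresses by well over 10:1 while the random pack barely compresses at all, which is the "results vary depending on the distribution" point in miniature.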
  
What Your Data Looks Like Now
• Original data: 1TB
• Compressed data: 50 GB (average compression ratio of 20:1)
• Knowledge Grid: < 0.5 GB (< 1% of compressed data)
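The capacity arithmetic on these slides is simple division, sketched below. The function name and the default 1% grid fraction (taken from the "< 1% of compressed data" label) are illustrative, and sizes are in decimal GB.

```python
from typing import Tuple


def storage_estimate(raw_gb: float, ratio: float, grid_fraction: float = 0.01) -> Tuple[float, float]:
    """Return (compressed_gb, knowledge_grid_upper_bound_gb).

    compressed = raw / ratio; the Knowledge Grid is bounded as a fraction
    of the compressed size."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    compressed = raw_gb / ratio
    return compressed, compressed * grid_fraction


print(storage_estimate(1000, 10))  # 1 TB at 10:1
print(storage_estimate(1000, 20))  # 1 TB at 20:1
```

`storage_estimate(1000, 10)` gives 100 GB (the 1TB-at-10:1 example), `storage_estimate(1000, 20)` gives 50 GB with a grid of at most 0.5 GB, and the same formula puts 10 TB at 20:1 at 500 GB.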
Alternate Query Models: When Good Enough Works
• The “principle of exactness” is the default for most data analytics and access systems today
• Using “approximate queries,” good-enough answers can be found with fewer resources
• Works best when users can easily alternate between approximation and exactness
• Creates an interactivity that accelerates time to answers and reduces the computing resources required
Tools for Investigative Analysis

Today, Infobright provides:
• Standard Queries: the Knowledge Grid is used to aid performance; only the required data packs are opened, and exact results are returned
• Rough Queries: only the Knowledge Grid is used, to derive an answer quickly, typically for analytics like SUM, AVG, MAX
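A rough, metadata-only answer can be sketched as follows, assuming each pack carries (min, max, sum, count). Over a whole column, SUM, MAX and AVG follow exactly from those per-pack values; with a range filter, only bounds are available without opening packs. Expressing the filtered result as a (lower, upper) interval is an illustrative assumption; Infobright's actual rough-query output may differ.

```python
from typing import List, Tuple

Meta = Tuple[int, int, int, int]  # hypothetical per-pack (min, max, sum, count)


def rough_aggregates(meta: List[Meta]) -> Tuple[int, int, float]:
    """SUM, MAX and AVG over a whole column from pack metadata alone.
    With no filter these are exact, because they only combine per-pack values."""
    total = sum(m[2] for m in meta)
    count = sum(m[3] for m in meta)
    return total, max(m[1] for m in meta), total / count


def rough_count_between(meta: List[Meta], low: int, high: int) -> Tuple[int, int]:
    """(lower, upper) bounds on COUNT(*) WHERE low <= x <= high, opening no packs."""
    lower = upper = 0
    for mn, mx, _sum, n in meta:
        if mx < low or mn > high:
            continue    # no row in this pack can match
        upper += n      # any row in this pack might match
        if low <= mn and mx <= high:
            lower += n  # every row is known to match
    return lower, upper


# Toy metadata: values 0..29 stored in three packs of 10 (real packs hold 65,536).
META = [(0, 9, 45, 10), (10, 19, 145, 10), (20, 29, 245, 10)]
print(rough_aggregates(META))            # (435, 29, 14.5)
print(rough_count_between(META, 5, 19))  # (10, 20); the exact count is 15
```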
Tools for Investigative Analysis

Fast and Informative:
• Approximate Queries: use a combination of the Knowledge Grid and Intelligent Random Sampling to return results very quickly; applicable to any type of query
• Exact results are not important
• Top-N type queries
• Investigative analytics
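The sampling half of an approximate query can be sketched in Python. "Intelligent Random Sampling" is Infobright's own technique and is not detailed here, so this swaps in plain Bernoulli (coin-flip) sampling: each row is kept with probability `rate` and kept values are scaled by `1/rate`, which gives unbiased per-group SUM estimates. The function names and the synthetic data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def approximate_group_sum(rows: Iterable[Tuple[str, int]], rate: float, seed: int = 0) -> Counter:
    """Estimate SUM(value) GROUP BY key from a Bernoulli sample of the rows."""
    rng = random.Random(seed)
    estimate: Counter = Counter()
    for key, value in rows:
        if rng.random() < rate:            # keep roughly `rate` of the rows
            estimate[key] += value / rate  # scale up so the estimate is unbiased
    return estimate


def top_n(sums: Dict[str, float], n: int) -> List[str]:
    """Keys of the n largest sums, largest first."""
    return [key for key, _ in Counter(sums).most_common(n)]


# Synthetic skewed data: group sizes 100k, 50k, 20k and 5k rows, value 1 each.
rows = [("A", 1)] * 100_000 + [("B", 1)] * 50_000 + [("C", 1)] * 20_000 + [("D", 1)] * 5_000
approx = approximate_group_sum(rows, rate=0.1, seed=42)
print(top_n(approx, 3))
```

With a 10% sample the estimated totals wobble by a few percent, but the groups are far enough apart that the Top-3 ranking comes out the same as an exact scan would give.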
Use Case
• Approximate Query is useful when looking for data in an exploratory fashion (e.g., anomalous events, understanding data characteristics)
• Example: find the “Top-10” protocols and ports extracted from event records
• An exact query may take minutes; an approximate query can answer in seconds. What’s important is the Top-10, not necessarily the exact numbers
EXACT QUERY
DY_HR  SUM(TDR)  AP_NAME
8      14269152  DNS
8      13716936  HTTP-80
8      13527636  HTTPS-443
8      13044432  UNDEFINED
8      11486904  NO APPL PORT
8       4280412  UNDEFINED
8       2313288  HTTP-ALT-8080
8       1278876  5223
8       1214100  DNS-53
8        991560  NO APPL PORT
8        899220  XMPP-Client

APPROXIMATE QUERY
DY_HR  SUM(TDR)  AP_NAME
8      16872663  HTTP-80
8      15361320  DNS
8      14528793  HTTPS-443
8      13578984  UNDEFINED
8      11613616  NO APPL PORT
8       3659742  UNDEFINED
8       2724149  HTTP-ALT-8080
8       1427824  5223
8       1194147  DNS-53
8       1083973  NO APPL PORT
8        967579  XMPP-Client
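The two result sets in these tables can be compared directly. This Python sketch copies the (SUM(TDR), AP_NAME) rows from the slide and checks whether the top-N sets agree and how large the worst per-row relative error is. Because some AP_NAME values repeat (UNDEFINED, NO APPL PORT), rows are keyed by (name, occurrence), which assumes the repeated names appear in the same relative order in both tables; the column that distinguished them was lost from the slide.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (SUM(TDR), AP_NAME) rows for DY_HR = 8, copied from the slide.
EXACT = [
    (14269152, "DNS"), (13716936, "HTTP-80"), (13527636, "HTTPS-443"),
    (13044432, "UNDEFINED"), (11486904, "NO APPL PORT"), (4280412, "UNDEFINED"),
    (2313288, "HTTP-ALT-8080"), (1278876, "5223"), (1214100, "DNS-53"),
    (991560, "NO APPL PORT"), (899220, "XMPP-Client"),
]
APPROX = [
    (16872663, "HTTP-80"), (15361320, "DNS"), (14528793, "HTTPS-443"),
    (13578984, "UNDEFINED"), (11613616, "NO APPL PORT"), (3659742, "UNDEFINED"),
    (2724149, "HTTP-ALT-8080"), (1427824, "5223"), (1194147, "DNS-53"),
    (1083973, "NO APPL PORT"), (967579, "XMPP-Client"),
]


def keyed(rows: List[Tuple[int, str]]) -> Dict[Tuple[str, int], int]:
    """Key each row by (AP_NAME, occurrence number) because some names repeat."""
    seen: Counter = Counter()
    out: Dict[Tuple[str, int], int] = {}
    for value, name in rows:
        out[(name, seen[name])] = value
        seen[name] += 1
    return out


def compare(exact_rows, approx_rows, n: int) -> Tuple[bool, float]:
    """Return (same top-n set?, worst relative error of any row)."""
    exact, approx = keyed(exact_rows), keyed(approx_rows)
    top_exact = set(sorted(exact, key=exact.get, reverse=True)[:n])
    top_approx = set(sorted(approx, key=approx.get, reverse=True)[:n])
    worst = max(abs(approx[k] - exact[k]) / exact[k] for k in exact)
    return top_exact == top_approx, worst


print(compare(EXACT, APPROX, 10))
```

On these numbers the Top-10 sets match even though the worst row (HTTP-80) is off by about 23% and the first two ranks swap places, which is exactly the trade-off the Use Case slide describes.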
  
Example: Online Advertising Segmentation

The goal in this example is to create a targeted campaign, with a minimum number of participants that must be included in the target group.

Traditional Queries
• Find the top n individuals who meet criterion 1
• Then find the top m individuals who meet criteria 1 and 2
• Repeat until the group is in the range they want to work with. There can be up to 1,500 different criteria, though they normally stop after 7 or 8 different filters
• They also have to look at how many individuals are in each permutation of the criteria
This process can take a considerable amount of time.

Approximate Queries
• An approximate query could dramatically reduce the time it takes to determine which set of criteria to use
• They can (if desired) use exact queries to calculate the exact final numbers, instead of running exact queries for every run

This process can collapse an effort that takes hours into minutes or seconds.
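The narrowing loop in this example can be sketched in Python: add one filter at a time, check the segment size with a cheap approximate count, stop once it falls into the target range, and run a single exact count at the end. The user ids, the divisibility "criteria" and the systematic sample (every 7th id, chosen so the output is reproducible) are all hypothetical; a systematic sample can be badly biased if a criterion lines up with the step, so a real system would use random sampling.

```python
from typing import Callable, List, Sequence, Tuple

Predicate = Callable[[int], bool]


def approx_count(population: Sequence[int], filters: List[Predicate], step: int) -> int:
    """Estimate how many members pass every filter by testing only every
    `step`-th member (a systematic sample) and scaling the hits by `step`."""
    hits = sum(1 for u in population[::step] if all(f(u) for f in filters))
    return hits * step


def exact_count(population: Sequence[int], filters: List[Predicate]) -> int:
    """Full scan: the expensive query, run once at the end."""
    return sum(1 for u in population if all(f(u) for f in filters))


def narrow(population: Sequence[int], candidates: List[Predicate],
           max_size: int, step: int = 7) -> Tuple[List[Predicate], int]:
    """Add filters one at a time until the estimated segment is <= max_size."""
    chosen: List[Predicate] = []
    estimate = len(population)
    for f in candidates:
        chosen.append(f)
        estimate = approx_count(population, chosen, step)
        if estimate <= max_size:
            break
    return chosen, estimate


# Hypothetical user ids 0..999,999 and three hypothetical criteria.
users = range(1_000_000)
criteria = [lambda u: u % 2 == 0, lambda u: u % 3 == 0, lambda u: u % 5 == 0]
chosen, estimate = narrow(users, criteria, max_size=120_000)
print(len(chosen), estimate, exact_count(users, chosen))  # 3 33334 33334
```

Each intermediate check scans about one seventh of the population; only the final confirmation pays for a full scan, mirroring the "exact queries only for the final numbers" step.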
Big Data Analytics At the End of the Day

AD HOC
PERFORMANCE

SCALABILITY

LOAD SPEEDS

HIGH AVAILABILITY

LOW TOUCH

COMPRESSION

TCO

AFFORDABILITY
Thank you!
  
Perceptions & Questions

Analyst:
Robin Bloor

The Current Disposition

• 10 bn connected devices
• 13 to 14 bn new processors embedded every year
• Estimate 31 bn connected devices by 2020
• Sensors, RFID tags, DSPs, FPGAs, CPUs, etc.
• To control, alert, log and report
• Data growth at 55% per annum
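Compounded annually, the 55% growth figure adds up quickly. A small Python sketch, assuming "55% pa" means 55% compound growth per year (the function names are illustrative):

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Compound growth over `years` at `annual_rate` (0.55 means 55%)."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years until the volume doubles: ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1.0 + annual_rate)


print(round(growth_factor(0.55, 5), 2))  # 8.95
print(round(doubling_time(0.55), 2))     # 1.58
```

At that rate data volume roughly doubles every 19 months and grows nearly ninefold over five years.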
IoT Data Characteristics
• Arrives in continuous streams
• Generally reliable (i.e., not in need of cleansing)
• Very high volume
• “Big tables” of predictably structured data
• So, very little need for ETL activity
• If “valuable,” then processing speed is likely to be critical
IoT Apps and Database
• Mostly streaming, for alerts and BI (analysis, discovery)
• DBMS choice is a “horses for courses” thing
• If performance matters, probably not a Hadoop app
• The data structure does not favor the prominent NoSQL DBMSs
• Traditional RDBMS will not do well
• Hence the column-store approach is most logical
The Coming Inversion
1. Instrument existing (dumb) devices
2. Gather and analyze data
3. Redesign device and its instrumentation from knowledge gained
4. Iterate
Going Forward

In terms of DATA VOLUMES, we expect the IoT DATA VOLUME to swamp all other sources of data
u  Do

the high compression rates you achieve occur
because it is machine data, i.e., it’s a function of
the characteristics of the data?

u  Is

the “approximate query” an Infobright
invention?

u  How

frequently do customers use this type of
query and for what type of applications?

u  Who,

typically, are the Infobright end users?
u  What

“relationship” does Infobright favor with
Hadoop?

u  What

statistical functions, if any, does Infobright
offer?

u  What

does the product roadmap look like?
Upcoming Topics

This Month: INNOVATORS
January: ANALYTICS
February: BIG DATA
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

www.insideanalysis.com

Thank You for Your Attention

Twitter Tag: #briefr

The Briefing Room

Más contenido relacionado

Más de Inside Analysis

Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataInside Analysis
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionInside Analysis
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsInside Analysis
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingInside Analysis
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLInside Analysis
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelInside Analysis
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureInside Analysis
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskInside Analysis
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataInside Analysis
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseInside Analysis
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldInside Analysis
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave DuggalInside Analysis
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyInside Analysis
 
Red Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariRed Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariInside Analysis
 
DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)Inside Analysis
 
DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)Inside Analysis
 

Más de Inside Analysis (20)

Introducing: A Complete Algebra of Data
Introducing: A Complete Algebra of DataIntroducing: A Complete Algebra of Data
Introducing: A Complete Algebra of Data
 
The Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop AdoptionThe Role of Data Wrangling in Driving Hadoop Adoption
The Role of Data Wrangling in Driving Hadoop Adoption
 
Ahead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time AnalyticsAhead of the Stream: How to Future-Proof Real-Time Analytics
Ahead of the Stream: How to Future-Proof Real-Time Analytics
 
All Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of EverythingAll Together Now: Connected Analytics for the Internet of Everything
All Together Now: Connected Analytics for the Internet of Everything
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
The Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global LevelThe Biggest Picture: Situational Awareness on a Global Level
The Biggest Picture: Situational Awareness on a Global Level
 
Structurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your ArchitectureStructurally Sound: How to Tame Your Architecture
Structurally Sound: How to Tame Your Architecture
 
SQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the RiskSQL In Hadoop: Big Data Innovation Without the Risk
SQL In Hadoop: Big Data Innovation Without the Risk
 
The Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big DataThe Perfect Fit: Scalable Graph for Big Data
The Perfect Fit: Scalable Graph for Big Data
 
A Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data WarehouseA Revolutionary Approach to Modernizing the Data Warehouse
A Revolutionary Approach to Modernizing the Data Warehouse
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Rethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile WorldRethinking Data Availability and Governance in a Mobile World
Rethinking Data Availability and Governance in a Mobile World
 
DisrupTech - Dave Duggal
DisrupTech - Dave DuggalDisrupTech - Dave Duggal
DisrupTech - Dave Duggal
 
Modus Operandi
Modus OperandiModus Operandi
Modus Operandi
 
Phasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey MalafskyPhasic Systems - Dr. Geoffrey Malafsky
Phasic Systems - Dr. Geoffrey Malafsky
 
Red Hat - Sarangan Rangachari
Red Hat - Sarangan RangachariRed Hat - Sarangan Rangachari
Red Hat - Sarangan Rangachari
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 
DisrupTech 2015ek
DisrupTech 2015ekDisrupTech 2015ek
DisrupTech 2015ek
 
DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)DisrupTech - Robin Bloor (2)
DisrupTech - Robin Bloor (2)
 
DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)DisrupTech - Robin Bloor (1)
DisrupTech - Robin Bloor (1)
 

Último

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Outside the Box: Alternate Query Models and the Future of Big Data

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Outside the Box: Alternate Query Models & the Future of Big Data The Briefing Room
  • 4. Mission !   Reveal the essential characteristics of enterprise software, good and bad !   Provide a forum for detailed analysis of today s innovative technologies !   Give vendors a chance to explain their product to savvy analysts !   Allow audience members to pose serious questions... and get answers! Twitter Tag: #briefr The Briefing Room
  • 5. Topics This Month: INNOVATORS January: ANALYTICS February: BIG DATA 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room Twitter Tag: #briefr The Briefing Room
  • 6. Data Discovery & Visualization INNOVATORS Twitter Tag: #briefr The Briefing Room
  • 7. Analyst: Robin Bloor Robin Bloor is Chief Analyst at The Bloor Group robin.bloor@bloorgroup.com Twitter Tag: #briefr The Briefing Room
  • 8. Infobright ! Infobright’s columnar database is used for applications and data marts that analyze large volumes of machinegenerated data !   It leverages patented compression and optimization techniques, and a “knowledge grid,” to achieve real-time analytics ! Infobright offers a commercial version of its software, as well as a freely-available, open source product Twitter Tag: #briefr The Briefing Room
  • 9. Guests: Don DeLoach and Jeff Kibler Don DeLoach is CEO and President of Infobright Jeff Kibler is Senior Technical Architect for Infobright Twitter Tag: #briefr The Briefing Room
  • 10. Turning  “Huh?”  into  “Aha!”   Alternate  Query  Models  and  Big  Data  Analy;cs  
  • 11. About Infobright §  400+  direct  and  OEM  customers  across  North  America,  EMEA  and  Asia   §  1,000  installa:ons   §  8  of  Top  10  Global  Telecom  Carriers  use  Infobright  via  OEM/ISVs   Logis;cs,   Manufacturing,   Business   Intelligence     Online  &  Mobile  Adver;sing/Web   Analy;cs,  eCommerce,  Social  Networks   Government,   U;li;es,   Research     Financial  Services     Telecom,  Security    
  • 12. Core Competencies Columnar   Database   Intelligence,   not  Hardware   Administra:ve   Simplicity   Designed  for   fast  analy:cs   Knowledge   Grid   No  manual   tuning   Deep  data   compression   Itera:ve   Engine   Minimal   ongoing   administra:on  
  • 13. Machine-Generated Data Is Everywhere: weblogs; computer and network events; call detail records; financial trade data; sensors, RFID; online game data. Businesses need to extract insight in near-real time from rapidly growing data volumes to segment and target website visitors, troubleshoot networks, identify security threats and fraud, and optimize online/mobile ads.
  • 14. Internet of Things is a Multiplier for EVERYTHING
  • 15. Emerging Data Analytics Stack: Days of One-Size-Fits-All Are Gone. “Yesterday’s BI-ETL-EDW stack is wrong-sided for tomorrow’s needs, and quickly becoming irrelevant.” (Gigamon) Data management: Hadoop transforming this area. Transparent analytic stack: operational, investigative, predictive; machine-generated, text. User consumption: real-time, interactive visualization & query creation. Data center / data warehouse: infrastructure strategies, options proliferating.
  • 16. Infobright: Columnar Architecture. Column orientation. Knowledge Grid: statistics and metadata “describing” the super-compressed data. Data Packs: data stored in manageably sized, highly compressed data packs. Data is compressed using algorithms tailored to data type. Smarter architecture: load data and go; no indices or partitions to build and maintain; Knowledge Grid automatically updated as data packs are created or updated; super-compact data footprint can leverage off-the-shelf hardware.
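A minimal Python sketch of the column-oriented layout described on this slide: rows are pivoted into per-column data packs, and each pack gets an encoding chosen by its data type. The two encodings (delta for integers, dictionary for strings), the helper names and the tiny pack size are illustrative assumptions, not Infobright's patented algorithms.

```python
# Hypothetical sketch: pivot rows into per-column data packs and encode each
# pack with a scheme chosen by data type (not Infobright's actual algorithms).

PACK_SIZE = 4  # tiny for illustration; the architecture slides cite 65,536 values


def to_column_packs(rows, pack_size=PACK_SIZE):
    """Pivot row tuples into columns, then cut each column into fixed-size packs."""
    columns = [list(col) for col in zip(*rows)]
    return [
        [col[i:i + pack_size] for i in range(0, len(col), pack_size)]
        for col in columns
    ]


def encode_pack(values):
    """Delta-encode integer packs; dictionary-encode everything else."""
    if all(isinstance(v, int) for v in values):
        # Store the first value, then the difference to each following value.
        return ("delta", [values[0]] + [b - a for a, b in zip(values, values[1:])])
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return ("dict", dictionary, [index[v] for v in values])


def decode_pack(encoded):
    """Invert encode_pack so the round trip can be checked."""
    if encoded[0] == "delta":
        out, running = [], 0
        for delta in encoded[1]:
            running += delta
            out.append(running)
        return out
    _, dictionary, codes = encoded
    return [dictionary[c] for c in codes]


rows = [(1001, "DNS"), (1002, "HTTP-80"), (1004, "DNS"), (1005, "HTTPS-443"), (1009, "DNS")]
id_packs, proto_packs = to_column_packs(rows)
print(encode_pack(id_packs[0]))     # ('delta', [1001, 1, 2, 1])
print(encode_pack(proto_packs[0]))  # ('dict', ['DNS', 'HTTP-80', 'HTTPS-443'], [0, 1, 0, 2])
```

Storing columns separately is what makes type-specific encodings possible: every value in a pack shares one type, so a single scheme fits the whole pack.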
  • 17. The Knowledge Grid. Knowledge Grid: applies to the whole table. Knowledge Nodes: built for each Data Pack; information about the data. [Diagram: Column A divided into Data Packs DP1 to DP6, alongside Column B and further columns.] Global knowledge, built during LOAD: string and character data; numeric data; distributions. Dynamic knowledge, built per query: e.g. for aggregates, joins. Knowledge Nodes answer the query directly, or identify only the required Data Packs, minimizing decompression, and predict required data in advance based on workload.
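The slide says Knowledge Nodes are built per Data Pack during load and let the engine identify only the packs it needs. A sketch assuming each node stores just count, minimum, maximum and sum for a numeric column (hypothetical fields; the slide does not list them), classifying packs for a range predicate as irrelevant, relevant or suspect:

```python
from dataclasses import dataclass


@dataclass
class KnowledgeNode:
    """Hypothetical per-pack statistics, computed once while data is loaded."""
    count: int
    minimum: int
    maximum: int
    total: int


def build_knowledge_nodes(column, pack_size):
    """Summarize each data pack of a numeric column into a KnowledgeNode."""
    nodes = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        nodes.append(KnowledgeNode(len(pack), min(pack), max(pack), sum(pack)))
    return nodes


def classify(node, low, high):
    """Classify a pack against the predicate low <= x <= high from metadata only."""
    if node.maximum < low or node.minimum > high:
        return "irrelevant"  # no value can match: the pack is never opened
    if low <= node.minimum and node.maximum <= high:
        return "relevant"    # every value matches: aggregates come from the node
    return "suspect"         # mixed: only this pack needs decompressing


column = [5, 7, 9, 12, 40, 45, 50, 55, 18, 22, 61, 70]
nodes = build_knowledge_nodes(column, pack_size=4)
print([classify(n, 30, 60) for n in nodes])  # ['irrelevant', 'relevant', 'suspect']
```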
  • 18. Optimizer / Granular Engine: 1. Query received. 2. Engine iterates on the Knowledge Grid. 3. Each pass eliminates Data Packs. 4. If any Data Packs are needed to resolve the query, only those are decompressed. [Diagram: the query “How are my sales doing this year?” flows through the Knowledge Grid to the results, with “1%” marked against the compressed data.]
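The four optimizer steps can be approximated by a single pass over one column. Packs the metadata rules out are skipped, packs lying entirely inside the filter are answered from stored sums, and only the remaining packs are decompressed. This is a simplified sketch, not the Infobright engine: zlib and JSON stand in for the real pack format, and the function names are hypothetical.

```python
import json
import zlib


def load(column, pack_size):
    """Split a numeric column into compressed packs plus (min, max, sum) metadata."""
    packs, grid = [], []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        packs.append(zlib.compress(json.dumps(pack).encode()))
        grid.append((min(pack), max(pack), sum(pack)))
    return packs, grid


def range_sum(packs, grid, low, high):
    """SUM(x) WHERE low <= x <= high, opening only packs the metadata cannot settle.

    Returns (total, number_of_packs_decompressed).
    """
    total, opened = 0, 0
    for blob, (lo, hi, pack_sum) in zip(packs, grid):
        if hi < low or lo > high:
            continue                 # eliminated: no value in this pack can match
        if low <= lo and hi <= high:
            total += pack_sum        # fully inside the range: use the stored sum
            continue
        opened += 1                  # straddles the range: decompress this pack
        values = json.loads(zlib.decompress(blob))
        total += sum(v for v in values if low <= v <= high)
    return total, opened


column = [5, 7, 9, 12, 40, 45, 50, 55, 18, 22, 61, 70]
packs, grid = load(column, pack_size=4)
print(range_sum(packs, grid, 30, 60))  # (190, 1)
```

Only one of the three packs is decompressed for the 30 to 60 range; a wider filter that straddles more pack boundaries forces more packs open.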
  • 19. Infobright Architecture: Data Packs and Compression. Data Packs: each data pack contains 65,536 data values; compression is applied to each individual data pack; the compression algorithm varies depending on data type and distribution. Compression: results vary depending on the distribution of data among data packs; a typical overall compression ratio seen in the field is 10:1; some customers have seen results of 40:1 and higher; for example, 1TB of raw data compressed 10 to 1 would only require 100GB of disk capacity. [Diagram: stacked 64K data packs feeding patent-pending compression algorithms.]
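To illustrate why compression results depend on the data's distribution, the sketch below compresses a column in independent 65,536-value packs, as the slide describes. Python's zlib stands in for the product's own algorithms, and both columns are synthetic assumptions.

```python
import random
import struct
import zlib

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def packed_sizes(values):
    """Compress each pack independently; return (raw_bytes, compressed_bytes).

    zlib is a stand-in; Infobright's per-type algorithms are not shown here.
    """
    raw = compressed = 0
    for start in range(0, len(values), PACK_SIZE):
        pack = values[start:start + PACK_SIZE]
        data = struct.pack(f"<{len(pack)}q", *pack)  # 8-byte little-endian ints
        raw += len(data)
        compressed += len(zlib.compress(data, 9))
    return raw, compressed


rng = random.Random(42)
n = 2 * PACK_SIZE
status_codes = [rng.choice([200, 301, 404, 500]) for _ in range(n)]  # low cardinality
random_ids = [rng.getrandbits(63) for _ in range(n)]                  # high entropy

for name, column in [("status_codes", status_codes), ("random_ids", random_ids)]:
    raw, comp = packed_sizes(column)
    print(f"{name}: {raw / comp:.1f}:1")

# Capacity arithmetic from the slide: 1 TB (1,000 GB) at 10:1 needs about 100 GB.
print(1_000 / 10, "GB")
```

Repetitive, machine-like values compress far better than 1:1, while random 63-bit integers barely compress at all. The exact ratios depend on zlib and on the data.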
  • 20. What Your Data Looks Like Now: original data 10TB; compressed data 50 GB (avg compression ratio of 20:1); plus Knowledge Grid < 0.5 GB (< 1% of compressed data).
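Checking the slide's figures with simple arithmetic (decimal units assumed): 10 TB compressed to 50 GB is a 200:1 ratio, whereas a 20:1 ratio on 10 TB would give 500 GB. The Knowledge Grid figure agrees with 50 GB of compressed data, since 0.5 GB is 1% of it.

```python
GB_PER_TB = 1_000  # decimal units assumed

original_gb = 10 * GB_PER_TB
compressed_gb = 50
grid_gb = 0.5

print(original_gb / compressed_gb)  # 200.0 -> ratio implied by 10 TB vs 50 GB
print(original_gb / 20)             # 500.0 -> compressed size implied by 20:1
print(grid_gb / compressed_gb)      # 0.01  -> Knowledge Grid share of 50 GB
```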
  • 21. Alternate Query Models: When Good Enough Works. The “principle of exactness” is the default for most data analytics and access systems today. Using “approximate queries,” good-enough answers can be found using fewer resources. This works best when users can easily alternate between approximation and exactness, creating an interactivity that accelerates time to answers and reduces the computing resources required.
  • 22. Tools for Investigative Analysis. Today, Infobright provides: Standard Queries, where the Knowledge Grid is used to aid performance, only the required data packs are opened, and exact results are retrieved; and Rough Queries, where only the Knowledge Grid is used to derive an answer quickly, typically for analytics like SUM, AVG, MAX.
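A rough query answers from the Knowledge Grid alone. Assuming each pack's node stores count, min, max and sum (hypothetical fields and names), whole-column SUM, AVG and MAX can be computed without opening any pack, while a filtered count can only be bounded:

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    """Hypothetical Knowledge Node contents for one numeric data pack."""
    count: int
    minimum: float
    maximum: float
    total: float


def rough_aggregates(grid):
    """Whole-column SUM, AVG and MAX from pack statistics; no pack is opened."""
    total = sum(p.total for p in grid)
    count = sum(p.count for p in grid)
    return {"SUM": total, "AVG": total / count, "MAX": max(p.maximum for p in grid)}


def rough_count_between(grid, low, high):
    """Lower and upper bounds on COUNT(*) WHERE low <= x <= high, metadata only."""
    lower = upper = 0
    for p in grid:
        if p.maximum < low or p.minimum > high:
            continue              # pack cannot contain matches
        upper += p.count          # pack might contain matches
        if low <= p.minimum and p.maximum <= high:
            lower += p.count      # pack contains only matches
    return lower, upper


grid = [PackStats(4, 5, 12, 33), PackStats(4, 40, 55, 190), PackStats(4, 18, 70, 171)]
print(rough_aggregates(grid))
print(rough_count_between(grid, 30, 60))  # (4, 8)
```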
  • 23. Tools for Investigative Analysis. Fast and informative: Approximate Queries use a combination of the Knowledge Grid and Intelligent Random Sampling to return results very quickly, and are applicable to any type of query. They suit cases where exact results are not important, Top-N type queries, and investigative analytics.
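A minimal sketch of the approximate-query idea, assuming plain Bernoulli row sampling as a stand-in for Infobright's Intelligent Random Sampling, whose details the slides do not give. Each sampled row is scaled by 1/fraction to estimate grouped sums, which is usually enough to recover a Top-N ranking. The event data is synthetic.

```python
import random
from collections import Counter


def exact_group_sum(rows):
    """SUM(value) GROUP BY key over every row."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def approximate_group_sum(rows, fraction, seed=0):
    """Estimate SUM(value) GROUP BY key from a Bernoulli sample of the rows.

    Each sampled row is weighted by 1 / fraction, so the scaled sums are
    unbiased estimates of the full totals. Plain uniform sampling is used
    as a stand-in; it is not Infobright's Intelligent Random Sampling.
    """
    rng = random.Random(seed)
    estimate = Counter()
    for key, value in rows:
        if rng.random() < fraction:
            estimate[key] += value / fraction
    return estimate


# Hypothetical event records: (protocol, bytes transferred).
rows = ([("DNS", 10)] * 50_000 + [("HTTP-80", 10)] * 30_000
        + [("HTTPS-443", 10)] * 10_000 + [("XMPP-Client", 10)] * 1_000)

exact = exact_group_sum(rows)
approx = approximate_group_sum(rows, fraction=0.05)
print([k for k, _ in exact.most_common(3)])
print([k for k, _ in approx.most_common(3)])
```

The approximate pass looks at roughly 5% of the rows; when group totals are well separated, the ranking survives the sampling error.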
  • 24. Use Case: An Approximate Query is useful when looking for data in an exploratory fashion (e.g. anomalous events, understanding data characteristics). Example: find the “Top-10” protocols and ports extracted from event records. An exact query may take minutes; an approximate query can answer in seconds. What’s important is the Top-10, not necessarily the exact numbers.

    EXACT QUERY                          APPROXIMATE QUERY
    DY_HR  SUM(TDR)  AP_NAME             DY_HR  SUM(TDR)  AP_NAME
    8      14269152  DNS                 8      16872663  HTTP-80
    8      13716936  HTTP-80             8      15361320  DNS
    8      13527636  HTTPS-443           8      14528793  HTTPS-443
    8      13044432  UNDEFINED           8      13578984  UNDEFINED
    8      11486904  NO APPL PORT        8      11613616  NO APPL PORT
    8       4280412  UNDEFINED           8       3659742  UNDEFINED
    8       2313288  HTTP-ALT-8080       8       2724149  HTTP-ALT-8080
    8       1278876  5223                8       1427824  5223
    8       1214100  DNS-53              8       1194147  DNS-53
    8        991560  NO APPL PORT        8       1083973  NO APPL PORT
    8        899220  XMPP-Client         8        967579  XMPP-Client
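Using the two result sets shown on this slide, a short script can quantify how "good enough" the approximate answer is. Rows are keyed by AP_NAME plus occurrence because names such as UNDEFINED appear twice. That keying is an assumption, since the slide does not show which underlying ports the duplicates represent.

```python
# (SUM(TDR), AP_NAME) pairs copied from the slide; DY_HR is 8 for every row.
EXACT = [
    (14269152, "DNS"), (13716936, "HTTP-80"), (13527636, "HTTPS-443"),
    (13044432, "UNDEFINED"), (11486904, "NO APPL PORT"), (4280412, "UNDEFINED"),
    (2313288, "HTTP-ALT-8080"), (1278876, "5223"), (1214100, "DNS-53"),
    (991560, "NO APPL PORT"), (899220, "XMPP-Client"),
]
APPROX = [
    (16872663, "HTTP-80"), (15361320, "DNS"), (14528793, "HTTPS-443"),
    (13578984, "UNDEFINED"), (11613616, "NO APPL PORT"), (3659742, "UNDEFINED"),
    (2724149, "HTTP-ALT-8080"), (1427824, "5223"), (1194147, "DNS-53"),
    (1083973, "NO APPL PORT"), (967579, "XMPP-Client"),
]


def keyed(rows):
    """Key rows by (AP_NAME, occurrence) because some names repeat."""
    seen, out = {}, {}
    for total, name in rows:
        occurrence = seen.get(name, 0)
        seen[name] = occurrence + 1
        out[(name, occurrence)] = total
    return out


exact, approx = keyed(EXACT), keyed(APPROX)
rank_changes = [i for i, (a, b) in enumerate(zip(EXACT, APPROX)) if a[1] != b[1]]
errors = {k: (approx[k] - exact[k]) / exact[k] for k in exact}
worst = max(errors, key=lambda k: abs(errors[k]))

print(set(exact) == set(approx))       # True: the same rows are returned
print(rank_changes)                    # [0, 1]: only the top two swap places
print(worst, round(errors[worst], 3))  # ('HTTP-80', 0) 0.23
```

Every row of the exact result appears in the approximate one and only DNS and HTTP-80 trade places, even though individual sums deviate by up to about 23%.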
  • 25. Example: Online Advertising Segmentation (Traditional Queries vs. Approximate Queries). The goal in this example is to create a targeted campaign in which a minimum number of participants must be included in the target group. Traditional queries: find the top n individuals who meet criterion 1, then find the top m individuals who meet criteria 1 and 2, and repeat until the group is in the range they want to work with. There can be up to 1,500 different criteria, though they normally stop after 7 or 8 filters, and they also have to look at how many individuals are in each permutation of the criteria. This process can take a considerable amount of time. Approximate queries could dramatically reduce the time it takes to determine which set of criteria to use; exact queries can then (if desired) calculate the exact final numbers, instead of running exact queries for every run. This can collapse an effort that takes hours into minutes or seconds.
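The iterative narrowing in this example can be sketched as follows: after each added criterion, the segment size is estimated from a fixed random sample, the loop stops once the estimate is within the target, and a single exact count runs at the end. Representing users as integer IDs, the predicates and the thresholds are all hypothetical.

```python
import random


def estimate_count(population_size, sample, predicates):
    """Scale the matching share of a random sample up to the full population."""
    hits = sum(1 for u in sample if all(p(u) for p in predicates))
    return hits * population_size / len(sample)


def narrow_segment(users, criteria, target_max, sample_size=5_000, seed=0):
    """Add criteria one at a time until the estimated segment fits target_max,
    then run a single exact count for the chosen combination."""
    sample = random.Random(seed).sample(users, sample_size)
    chosen = []
    for predicate in criteria:
        chosen.append(predicate)
        if estimate_count(len(users), sample, chosen) <= target_max:
            break
    exact = sum(1 for u in users if all(p(u) for p in chosen))
    return chosen, exact


users = list(range(100_000))   # hypothetical user IDs
criteria = [                   # hypothetical targeting criteria
    lambda u: u % 2 == 0,
    lambda u: u % 3 == 0,
    lambda u: u % 5 == 0,
    lambda u: u % 7 == 0,
]
chosen, exact_size = narrow_segment(users, criteria, target_max=5_000)
print(len(chosen), exact_size)
```

Only the final combination pays for a full scan; every intermediate decision uses the sample.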
  • 26. Big Data Analytics At the End of the Day AD HOC PERFORMANCE SCALABILITY LOAD SPEEDS HIGH AVAILABILITY LOW TOUCH COMPRESSION TCO AFFORDABILITY
  • 28. Perceptions & Questions. Analyst: Robin Bloor
  • 30. The Current Disposition: 10 bn connected devices; 13 to 14 bn new processors embedded every year; an estimated 31 bn connected devices by 2020; sensors, RFID tags, DSPs, FPGAs, CPUs, etc., to control, alert, log and report; data growth at 55% pa.
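Assuming the 55% per-annum growth rate holds constant, the doubling time and the five-year multiplier follow directly:

```latex
t_{\text{double}} = \frac{\ln 2}{\ln 1.55} \approx \frac{0.693}{0.438} \approx 1.58\ \text{years},
\qquad 1.55^{5} \approx 8.95
```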
  • 31. IOT Data Characteristics: arrives in continuous streams; generally reliable (i.e., not in need of cleansing); very high volume; “big tables” of predictably structured data, so very little need for ETL activity; if “valuable,” then processing speed is likely to be critical.
  • 32. IOT Apps and Database: mostly streaming, for alerts and BI (analysis, discovery); DBMS choice is a “horses for courses” thing; if performance matters, probably not a Hadoop app; the data structure does not favor the prominent NoSQL DBMSs; traditional RDBMSs will not do well; hence a column-store approach is most logical.
  • 33. The Coming Inversion 1. Instrument existing (dumb) devices 2. Gather and analyze data 3. Redesign device and its instrumentation from knowledge gained 4. Iterate
  • 34. Going Forward: in terms of DATA VOLUMES, we expect the IOT DATA VOLUME to swamp all other sources of data.
  • 35. u  Do the high compression rates you achieve occur because it is machine data, i.e., it’s a function of the characteristics of the data? u  Is the “approximate query” an Infobright invention? u  How frequently do customers use this type of query and for what type of applications? u  Who, typically, are the Infobright end users?
  • 36. u  What “relationship” does Infobright favor with Hadoop? u  What statistical functions, if any, does Infobright offer? u  What does the product roadmap look like?
  • 38. Upcoming Topics: This month: INNOVATORS; January: ANALYTICS; February: BIG DATA. 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room; www.insideanalysis.com
  • 39. Thank You for Your Attention