The Economics of SQL on Hadoop

© 2013 Datameer, Inc. All rights reserved.
Watch the Recording of this Webinar


View the entire recorded webinar at:

http://info.datameer.com/SlideshareEconomics-SQL-Hadoop.html
About our Speakers
John Myers

John Myers joined Enterprise Management Associates
in 2011 as senior analyst of the business intelligence
(BI) practice area. John has 10+ years of experience
working in areas related to business analytics in
professional services consulting and product
development roles, as well as helping organizations
solve their business analytics problems, whether they
relate to operational platforms, such as customer care
or billing, or applied analytical applications, such as
revenue assurance or fraud management.

Slide 3
About our Speakers
Stefan Groschupf

Stefan Groschupf is the co-founder and CEO of
Datameer. As one of the original contributors to
Nutch, the open source predecessor of Hadoop,
Stefan has been at the forefront of the Hadoop and
Big Data market.

Prior to Datameer, Stefan was the co-founder and
CEO of Scale Unlimited, which implemented
custom Hadoop analytic solutions for HP, Sun,
Deutsche Telekom, Nokia and others. Earlier,
Stefan was CEO of 101Tec, a supplier of Hadoop
and Nutch-based search and text classification
software to industry-leading companies such as
Apple, DHL and EMI Music. Stefan has also served
as CTO at multiple companies, including Sproose,
a social search engine company.

Slide 4
About our Speakers
Matt Schumpert

Matt has been working in enterprise software for
over 10 years in various capacities, including sales
engineering, strategic alliances and consulting.

Matt currently runs the pre-sales engineering team
at Datameer, supporting all technical aspects of
customer engagement through roll-out of customers
into production.

Matt holds a BS in Computer Science from the
University of Virginia.

Slide 5
Agenda
▪  EMA on Current State of the Big Data Industry
–  Online Archiving in Practice
–  SQL on NoSQL: Metadata
–  Exploratory Use Cases
–  Late Binding Schemas better for Discovery
–  Economics of Hadoop

▪  Datameer on how to solve these problems
–  Use Case #1: Semi-Structured Data
–  Use Case #2: Text Analytics
–  Use Case #3: Path Analysis

▪  Takeaways, and Questions and Answers

Slide 6
State of Big Data Industry

Online Archiving is the majority use case for Big Data projects

Slide 8

© 2013 Enterprise Management Associates, Inc.
Moving Beyond select * from tablename
SQL requires a managed set of metadata

Slide 9

Big Data Platforms have Multiple Uses:
Discovery is a significant portion

Slide 10

Late Binding Schemas are good for Discovery

Slide 11
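
A late-binding schema (often called "schema on read") can be illustrated with a minimal Python sketch. It is not taken from the webinar: the record layout, the `SCHEMA_V1`/`SCHEMA_V2` names, and the `read_with_schema` helper are all hypothetical, chosen only to show that the stored data stays untouched while the schema changes per question.

```python
import json

# Raw records are stored as-is; no schema is enforced at write time.
# (Hypothetical sample data, for illustration only.)
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "ts": "2013-06-01T10:00:00"}',
    '{"user": "u2", "amount": "5", "country": "DE"}',
]

# Hypothetical late-bound schemas: field name -> cast function.
# A new question only needs a new schema, not a reload of the data.
SCHEMA_V1 = {"user": str, "amount": float}
SCHEMA_V2 = {"user": str, "country": str}


def read_with_schema(raw_lines, schema):
    """Parse each raw line and bind the schema only at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Fields missing from a record become None instead of failing the read.
        rows.append({
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        })
    return rows


if __name__ == "__main__":
    print(read_with_schema(RAW_EVENTS, SCHEMA_V1))
    print(read_with_schema(RAW_EVENTS, SCHEMA_V2))
```

The same two raw lines answer a revenue question under one schema and a geography question under the other, which is the property that makes late binding attractive for discovery work.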

Free, as in a free puppy…

Slide 12

Datameer Demos

Use Case #1: Semi-Structured Data

▪  Noisy, log-structured data → signal
▪  Extract, cast, & define fields on demand
▪  Painful/impossible without inspection
▪  “One-offs” are possible with SQL+UDFs
▪  But better to collaborate with shared “views”
▪  Examples:
–  “User-agent” string
–  URL parameters
–  JSON
Slide 18
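
As a rough illustration of "extract, cast, & define fields on demand", the following Python sketch pulls typed fields out of a single noisy log line that mixes a URL, a user-agent string, and a JSON payload. The log layout, the `LINE_PATTERN` regex, and the `extract_fields` helper are assumptions made for this example; they do not describe Datameer's product or any format shown in the webinar.

```python
import json
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical log line: timestamp, URL, quoted user-agent, JSON payload.
LOG_LINE = (
    '2013-06-01T10:00:00 /search?q=hadoop&page=2 '
    '"Mozilla/5.0 (Windows NT 6.1) Chrome/27.0" {"latency_ms": "132"}'
)

# Assumed layout of one line; real logs need inspection before writing this.
LINE_PATTERN = re.compile(
    r'^(?P<ts>\S+) (?P<url>\S+) "(?P<agent>[^"]*)" (?P<payload>\{.*\})$'
)


def extract_fields(line):
    """Return typed fields for one line, or None when the line is noise."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    url = urlsplit(match.group("url"))
    params = parse_qs(url.query)              # URL parameters
    payload = json.loads(match.group("payload"))  # embedded JSON
    return {
        "path": url.path,
        "query": params.get("q", [None])[0],
        "page": int(params.get("page", ["1"])[0]),    # cast string -> int
        "is_chrome": "Chrome/" in match.group("agent"),  # crude user-agent check
        "latency_ms": int(payload["latency_ms"]),     # cast string -> int
    }


if __name__ == "__main__":
    print(extract_fields(LOG_LINE))
    print(extract_fields("corrupted line"))
```

Each field definition here plays the role of a one-off UDF; keeping such definitions in one shared place is the code-level analogue of the shared "views" the slide recommends.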

Use Case #2: Text Analytics
▪  Few/no known fields
▪  Notion of a record is nebulous / fluid
▪  Wrangling and mining
▪  “Bag-of-Words” is a sensible start
▪  Again, frequent inspection is key

Slide 23
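
A bag-of-words starting point can be sketched in a few lines of Python. The tokenizer regex and the tiny `STOP_WORDS` set below are illustrative assumptions, not something specified in the slides; real text usually needs repeated inspection and adjustment of both.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "is", "and", "to", "of"}


def bag_of_words(text):
    """Count lower-cased tokens, ignoring order and stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)


if __name__ == "__main__":
    counts = bag_of_words("The cluster is slow and the job failed")
    # Inspect the most frequent terms before deciding on the next step.
    print(counts.most_common(3))
```

Because the text has few or no known fields, the word counts themselves become the first "columns" to inspect.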

Use Case #3: Path Analysis 
▪  Key component of clickstream analysis
▪  Compares each record to the next/previous
▪  Defines/summarizes transitions, not events
▪  Supported by list/array types
▪  Requires multi-pass queries

Slide 28
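
The bullets above can be made concrete with a small Python sketch of path analysis over a clickstream. The event tuples and the `build_paths` and `count_transitions` helpers are hypothetical; the two functions mirror the "list/array types" and "multi-pass" points, with the first pass collecting an ordered list of pages per session and the second comparing each page with the next.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (session_id, timestamp, page).
EVENTS = [
    ("s1", 1, "home"), ("s1", 2, "search"), ("s1", 3, "product"),
    ("s2", 1, "home"), ("s2", 2, "product"),
    ("s1", 4, "checkout"), ("s2", 3, "checkout"),
]


def build_paths(events):
    """Pass 1: group events per session into a time-ordered list of pages."""
    by_session = defaultdict(list)
    for session_id, timestamp, page in events:
        by_session[session_id].append((timestamp, page))
    return {
        session_id: [page for _, page in sorted(visits)]
        for session_id, visits in by_session.items()
    }


def count_transitions(paths):
    """Pass 2: pair each page with the next one and count the transitions."""
    transitions = Counter()
    for pages in paths.values():
        transitions.update(zip(pages, pages[1:]))
    return transitions


if __name__ == "__main__":
    paths = build_paths(EVENTS)
    for (source, target), count in count_transitions(paths).most_common():
        print(f"{source} -> {target}: {count}")
```

The output summarizes transitions such as product to checkout rather than individual page views, which is the distinction the slide draws between transitions and events.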

Takeaways

When NOT to use SQL on Hadoop
▪  Structured Schemas or “Schema on Write”
▪  “Realtime” Query SLAs for operational or reporting tasks
▪  Highly detailed SQL query requirements (SQL-2003)

Slide 32

When to use SQL on Hadoop
▪  Unstructured Datasets and “Schema on Read”
▪  Discovery tasks designed to find new connections and new business value
▪  Lower level SQL queries (SQL-99)

Slide 35

Summary
▪  EMA on Current State of the Big Data Industry
–  Online Archiving in Practice
–  SQL on NoSQL: Metadata
–  Exploratory Use Cases
–  Late Binding Schemas better for Discovery

▪  Datameer on how to solve these problems
–  Use Case #1: Semi-Structured Data
–  Use Case #2: Text Analytics
–  Use Case #3: Path Analysis

Slide 36

Call To Action
■  Visit our website
–  www.datameer.com

■  Download our Trial
–  http://www.datameer.com/Datameer-trial.html

Slide 37

The Economics of SQL on Hadoop

Más contenido relacionado

Similar a The Economics of SQL on Hadoop

The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudInside Analysis
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the ScientistDatameer
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarDatameer
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccionFran Navarro
 
Looking Before You Leap into the Cloud: A proactive approach to machine learn...
Looking Before You Leap into the Cloud: A proactive approach to machine learn...Looking Before You Leap into the Cloud: A proactive approach to machine learn...
Looking Before You Leap into the Cloud: A proactive approach to machine learn...Enterprise Management Associates
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsDatameer
 
InfoSphere BigInsights
InfoSphere BigInsightsInfoSphere BigInsights
InfoSphere BigInsightsWilfried Hoge
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataMatt Stubbs
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data SnapLogic
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enoughCloudera, Inc.
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessionsJessicaMurrell3
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar ibi
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InSnapLogic
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Datameer
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaNeo4j
 
Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondCloudera, Inc.
 
Transform Banking with Big Data and Automated Machine Learning 9.12.17
Transform Banking with Big Data and Automated Machine Learning 9.12.17Transform Banking with Big Data and Automated Machine Learning 9.12.17
Transform Banking with Big Data and Automated Machine Learning 9.12.17Cloudera, Inc.
 

Similar a The Economics of SQL on Hadoop (20)

The New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the CloudThe New Database Frontier: Harnessing the Cloud
The New Database Frontier: Harnessing the Cloud
 
How to do Data Science Without the Scientist
How to do Data Science Without the ScientistHow to do Data Science Without the Scientist
How to do Data Science Without the Scientist
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics Webinar
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 
Looking Before You Leap into the Cloud: A proactive approach to machine learn...
Looking Before You Leap into the Cloud: A proactive approach to machine learn...Looking Before You Leap into the Cloud: A proactive approach to machine learn...
Looking Before You Leap into the Cloud: A proactive approach to machine learn...
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data Analytics
 
InfoSphere BigInsights
InfoSphere BigInsightsInfoSphere BigInsights
InfoSphere BigInsights
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enough
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessions
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
 
Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2
 
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and BeyondStanding Up an Effective Enterprise Data Hub -- Technology and Beyond
Standing Up an Effective Enterprise Data Hub -- Technology and Beyond
 
Transform Banking with Big Data and Automated Machine Learning 9.12.17
Transform Banking with Big Data and Automated Machine Learning 9.12.17Transform Banking with Big Data and Automated Machine Learning 9.12.17
Transform Banking with Big Data and Automated Machine Learning 9.12.17
 

Más de Datameer

Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data AnalyticsDatameer
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersDatameer
 
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...Datameer
 
Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Datameer
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarDatameer
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndDatameer
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Datameer
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?Datameer
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarDatameer
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? Datameer
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Datameer
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseDatameer
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataDatameer
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerDatameer
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataDatameer
 

Más de Datameer (17)

Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business Managers
 
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
 
Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop Webinar
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-End
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics Webinar
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use Case
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by Datameer
 
How to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited DataHow to do Predictive Analytics with Limited Data
How to do Predictive Analytics with Limited Data
 

Último

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Último (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

The Economics of SQL on Hadoop

  • 1. The Economics of SQL on Hadoop © 2013 Datameer, Inc. All rights reserved.
  • 2. Watch the Recording of this Webinar View the entire recorded webinar at: http://info.datameer.com/SlideshareEconomics-SQL-Hadoop.html
  • 3. About our Speakers: John Myers. John Myers joined Enterprise Management Associates in 2011 as senior analyst of the business intelligence (BI) practice area. John has 10+ years of experience working in areas related to business analytics in professional services consulting and product development roles, as well as helping organizations solve their business analytics problems, whether they relate to operational platforms, such as customer care or billing, or applied analytical applications, such as revenue assurance or fraud management.
  • 4. About our Speakers: Stefan Groschupf. Stefan Groschupf is the co-founder and CEO of Datameer. One of the original contributors to Nutch, the open source predecessor of Hadoop, Stefan has been at the forefront of the Hadoop and Big Data market. Prior to Datameer, Stefan was the co-founder and CEO of Scale Unlimited, which implemented custom Hadoop analytic solutions for HP, Sun, Deutsche Telekom, Nokia and others. Earlier, Stefan was CEO of 101Tec, a supplier of Hadoop and Nutch-based search and text classification software to industry-leading companies such as Apple, DHL and EMI Music. Stefan has also served as CTO at multiple companies, including Sproose, a social search engine company.
  • 5. About our Speakers: Matt Schumpert. Matt has been working in enterprise software for over 10 years in various capacities, including sales engineering, strategic alliances and consulting. Matt currently runs the pre-sales engineering team at Datameer, supporting all technical aspects of customer engagement through roll-out of customers into production. Matt holds a BS in Computer Science from the University of Virginia.
  • 6. Agenda ▪ EMA on Current State of the Big Data Industry – Online Archiving in Practice – SQL on NoSQL: Metadata – Exploratory Use Cases – Late Binding Schemas better for Discovery – Economics of Hadoop ▪ Datameer on how to solve these problems – Use Case #1: Semi-Structured Data – Use Case #2: Text Analytics data – Use Case #3: Path Analysis ▪ Takeaways, and Question and Answer
  • 7. State of Big Data Industry
  • 8. Online Archiving is the majority use case for Big Data projects © 2013 Enterprise Management Associates, Inc.
  • 9. Moving Beyond select * from tablename: SQL requires a managed set of metadata
  • 10. Big Data Platforms have Multiple Uses: Discovery is a significant portion
  • 11. Late Binding Schemas are good for Discovery
  • 12. Free as a Free puppy…
  • 13. Datameer Demos
  • 14. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal
  • 15. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand
  • 16. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection
  • 17. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection ▪ “One-offs” are possible with SQL+UDFs ▪ But better to collaborate with shared “views”
  • 18. Use Case #1: Semi-Structured Data ▪ Noisy, log-structured data → signal ▪ Extract, cast, & define fields on demand ▪ Painful/impossible without inspection ▪ “One-offs” are possible with SQL+UDFs ▪ But better to collaborate with shared “views” ▪ Examples: ▪ “User-agent” string ▪ URL Parameters ▪ JSON
  • 19. Use Case #2: Text Analytics ▪ Few/no known fields
  • 20. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid
  • 21. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining
  • 22. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining ▪ “Bag-of-Words” is a sensible start
  • 23. Use Case #2: Text Analytics ▪ Few/no known fields ▪ Notion of a record is nebulous / fluid ▪ Wrangling and mining ▪ “Bag-of-Words” is a sensible start ▪ Again, frequent inspection is key
  • 24. Use Case #3: Path Analysis ▪ Key component of clickstream analysis
  • 25. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous
  • 26. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events
  • 27. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events ▪ Supported by list/array types
  • 28. Use Case #3: Path Analysis ▪ Key component of clickstream analysis ▪ Compares each record to the next/previous ▪ Defines/summarizes transitions, not events ▪ Supported by list/array types ▪ Requires multi-pass queries
  • 29. Takeaways
  • 30. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write”
  • 31. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write” ▪ “Realtime” Query SLAs for operational or reporting tasks
  • 32. When NOT to use SQL on Hadoop ▪ Structured Schemas or “Schema on Write” ▪ “Realtime” Query SLAs for operational or reporting tasks ▪ Highly detailed SQL query requirements (SQL-2003)
  • 33. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read”
  • 34. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read” ▪ Discovery tasks designed to find new connections and new business value
  • 35. When to use SQL on Hadoop ▪ Unstructured Datasets and “Schema on Read” ▪ Discovery tasks designed to find new connections and new business value ▪ Lower level SQL queries (SQL-99)
  • 36. Summary ▪ EMA on Current State of the Big Data Industry – Online Archiving in Practice – SQL on NoSQL: Metadata – Exploratory Use Cases – Late Binding Schemas better for Discovery ▪ Datameer on how to solve these problems – Use Case #1: Semi-Structured Data – Use Case #2: Text Analytics – Use Case #3: Path Analysis
  • 37. Call To Action ■ Visit our website – www.datameer.com ■ Download our Trial – http://www.datameer.com/Datameer-trial.html
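
The path-analysis bullets in slides 24 to 28 (compare each record to the next, summarize transitions rather than events, lean on list/array types, use more than one pass) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Datameer's implementation: the sample clickstream, its `(session_id, timestamp, page)` layout, and the `count_transitions` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (session_id, timestamp, page).
# They arrive unordered, as log records often do.
EVENTS = [
    ("s1", 3, "checkout"),
    ("s1", 1, "home"),
    ("s1", 2, "product"),
    ("s2", 1, "home"),
    ("s2", 2, "product"),
    ("s2", 3, "home"),
]


def count_transitions(events):
    """Count page-to-page transitions within each session."""
    # Pass 1: collect each session's events into a list (the list/array type).
    paths = defaultdict(list)
    for session_id, ts, page in events:
        paths[session_id].append((ts, page))

    # Pass 2: order each path by time, then compare each record to the next.
    transitions = Counter()
    for visits in paths.values():
        pages = [page for _, page in sorted(visits)]
        for current, following in zip(pages, pages[1:]):
            transitions[(current, following)] += 1
    return transitions


if __name__ == "__main__":
    for (src, dst), n in count_transitions(EVENTS).most_common():
        print(f"{src} -> {dst}: {n}")
```

The two loops mirror the "multi-pass" point on slide 28: the grouping pass has to finish before adjacent records can be compared, which is why a single flat `select` over events is not enough.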

Editor's notes

  1. According to 2012 EMA research, Online Archiving, or “Hadumping”, is “Phase zero” of most Big Data initiatives. It teaches internal teams about data delivery and structure, how to interact with the data, and how to apply data to business cases rather than treating it as simply a technology project. It is where you start when “you don’t know what you don’t know…”. 2013 EMA research shows that over half of Big Data projects have online archiving in an “In Operation” status: in production, or as a pilot project with hands on keyboards and software installed. Over 4 in 10 respondents say “Economics” is a business reason for the Online Archiving use case. These organizations are attempting to lower their operational costs.
  2. Moving beyond select * requires a standard: a facility that manages and tracks metadata. select * from tablename is the rough equivalent of cat filename. SQL starts to become truly “special” when you use a query such as: SELECT t.columnA, s.columnB, s.columnC FROM tablename t, tablename s WHERE t.columnZ = s.columnX. NoSQL, and specifically Hadoop, have focused on the ability to be flexible in data storage, often at the expense of metadata management. SQL doesn’t do well with an “or” data structure (image on right); SQL works best with a defined data structure (image on right). When you ask Hive a question it doesn’t understand, you get the error message. In 2013 EMA research, Big Data initiatives used the following datasets: machine generated (JSON, XML, etc.), almost 40%; process mediated (structured), just under 30%; human sourced (emails, texts), over 30%. Over 30% of respondents indicate that a lack of self-service data access (SQL) is a challenge to operating a Hadoop platform. Nearly 40% of respondents say a lack of SQL data access is a challenge to operating a NoSQL platform. Each of these results indicates that while you “CAN” run certain applications on Hadoop, SQL-based data access is a high concern.
  3. Big Data environments aren’t just for EDW replacement, as some would say. There are multiple use cases: operational, analytical and exploratory. Nearly 3 of 10 respondents in 2013 research say that they are using exploratory or discovery use cases. Just under 50% of respondents say operational costs (staff head count included) are a challenge to operating a discovery platform. 3 of 10 respondents want to use the features and functions of products to speed their skills acquisition. Often these are the features they feel most comfortable with: interfaces and processes that they use every day. MS Excel is an example. Nearly 4 out of 10 respondents indicate new skills development is a challenge to operating a discovery platform.
  4. When you are using exploratory or discovery use cases, you need flexibility. Applying a hard (structured) schema presupposes particular questions AND answers. Square wooden peg and round wooden hole: not a lot of give. Being able to apply a schema or structure at the time of query, a late binding schema, enables the best method of discovery. Flexible schema at the time of processing… sausage grinder. 2013 EMA research says over 30% of respondents use late binding schemas when processing data, nearly a third use multiple approaches, and over 10% don’t apply a schema at all. “Only” about one third of respondents are using external technical resources to bridge their skills gaps. This comes from the cost of outside consultants versus existing staff.
  5. “Free as in Speech” or “Free as in Beer”… Big Data is “Free as a Free Puppy”. Over 40% of respondents say Economics is a business reason for the Online Archiving use case. Back to metadata: over one third of respondents indicate that a shortage of technical metadata is a challenge to operating a discovery platform. Applying that technical metadata layer takes manual effort and thus additional headcount. When you link this to “only” a 1% increase in Big Data budget from 2013 to 2014 for Hadoop implementations, it is important to put Hadoop platforms to their best use. 36% say implementation time is a challenge to operating a Hadoop platform. 43% say operational costs are a challenge to operating a discovery platform (link to the 1% increase in Big Data operational budget from 2013 to 2014). Over one third of respondents cite a lack of skills to manage multi-structured data platforms as an obstacle to implementation (the top answer).
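
The late binding ("schema on read") idea in notes 2 and 4, together with the "extract, cast, & define fields on demand" bullet of Use Case #1, can be illustrated with a short Python sketch. Everything named here is an assumption made for the example: the raw log lines, the field names, and the `read_with_schema` and `url_param` helpers are hypothetical, and this is not how Hive or Datameer implement the feature.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical raw log lines, stored as-is: no schema was applied on write.
RAW_LINES = [
    '{"ts": "1381400000", "url": "http://example.com/search?q=hadoop&page=2"}',
    '{"ts": "1381400060", "url": "http://example.com/cart?item=42"}',
    "not json at all",
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> extractor function) at read time."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # noisy line: skip it instead of failing the whole read
        if not isinstance(record, dict):
            continue
        yield {name: extract(record) for name, extract in schema.items()}


def url_param(name):
    """Build an extractor for one URL query parameter (None if absent)."""
    def extract(record):
        query = parse_qs(urlsplit(record.get("url", "")).query)
        values = query.get(name)
        return values[0] if values else None
    return extract


# Two different "views" over the same raw data, each defined at query time.
search_schema = {"ts": lambda r: int(r["ts"]), "query": url_param("q")}
cart_schema = {"path": lambda r: urlsplit(r["url"]).path, "item": url_param("item")}

if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, search_schema):
        print(row)
    for row in read_with_schema(RAW_LINES, cart_schema):
        print(row)
```

Because the schema is just a mapping passed in at read time, a new question only needs a new mapping rather than a reload of the data. The cost is the one the notes describe: the field definitions (the metadata) still have to be written and shared by someone.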