© 2016 MapR Technologies
SQL-on-Hadoop with Apache Drill
{ about : me }
Tugdual “Tug” Grall
• MapR
• Technical Evangelist
• MongoDB
• Technical Evangelist
• Couchbase
• Technical Evangelist
• eXo
• CTO
• Oracle
• Developer/Product Manager
• Mainly Java/SOA
• Developer in consulting firms
• Web
• @tgrall
• http://tgrall.github.io
• tgrall
• NantesJUG co-founder
• Pet Project :
• http://www.resultri.com
• tug@mapr.com
• tugdual@gmail.com
Ingest
Store
Process
Consume
MapR Converged Data Platform
The MapR Distribution including Apache Hadoop
[Diagram: the Apache Hadoop and OSS ecosystem running on the MapR Data Platform]
APACHE HADOOP AND OSS ECOSYSTEM
• Execution engines
– Batch: Spark, Pig, MapReduce v1 & v2
– Streaming: Spark Streaming, Storm
– NoSQL & Search: HBase, Solr
– SQL: Hive, Impala, Spark SQL, Drill
– ML, Graph: Mahout, MLLib, GraphX
– YARN
• Data governance and operations
– Data Integration & Access: Sqoop, Flume, HttpFS, Hue
– Security: Sentry
– Workflow & Data Governance: Oozie
– Provisioning & Coordination: Sahara, ZooKeeper
– Management
MapR Data Platform: MapR-FS, MapR-DB (Enterprise Grade, Data Hub, Operational)
Agenda
• Why Drill?
• Some Drilling with Open Data
• How does it work?
Data Increasingly Stored in Non-Relational Datastores
Timeline: 1980, 1990, 2000, 2010, 2020
RELATIONAL DATABASES
• Volume: GBs-TBs
• Structure: Structured
• Database: Fixed schema; DBA controls structure
• Development: Planned (release cycle = months-years)
NON-RELATIONAL DATASTORES
• Volume: TBs-PBs
• Structure: Structured, semi-structured and unstructured
• Database: Dynamic / Flexible schema; Application controls structure
• Development: Iterative (release cycle = days-weeks)
How To Bring SQL to Non-Relational Data Stores?
Familiarity of SQL
• ANSI SQL semantics
• Low latency
• Integrated with Tools/Applications
Agility of NoSQL
• No schema management
– HDFS (Parquet, JSON, etc.)
– HBase
– …
• No transformation
– No silos of data
• Ease of use
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
• SCHEMA ON WRITE / SCHEMA BEFORE READ
Schema Discovered On-The-Fly
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
• SCHEMA ON THE FLY
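To make "schema on the fly" concrete, here is a minimal Python sketch (not Drill code) that discovers field paths and value types while reading self-describing JSON records. The function name `discover_schema` and the two sample records are assumptions made for illustration; Drill performs its own discovery inside the execution engine.

```python
import json


def discover_schema(lines):
    """Return {field_path: set of Python type names} seen across JSON records."""
    schema = {}

    def walk(value, path):
        # Recurse into nested maps; record the type of every leaf value.
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema.setdefault(path, set()).add(type(value).__name__)

    for line in lines:
        walk(json.loads(line), "")
    return schema


# Hypothetical records: the second one has a field the first one lacks.
records = [
    '{"name": {"first": "Michael", "last": "Smith"}, "hobbies": ["ski", "soccer"], "district": "Los Altos"}',
    '{"name": {"first": "Jennifer", "last": "Gates"}, "hobbies": ["sing"], "preschool": "CCLC"}',
]

schema = discover_schema(records)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

No schema was declared before reading: `district` and `preschool` each appear in only one record, and both end up in the discovered schema.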
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
50+ contributors
150+ years of experience building
databases and distributed systems
Contributing to Apache Drill
Drill Enables ‘SQL-on-Everything’
SELECT * FROM dfs.yelp.`business.json`
Storage plugin instance (dfs)
- DFS (Text, Parquet, JSON, XML)
- HBase/MapR-DB
- Hive Metastore/HCatalog
- RDBMS using JDBC
- Easy API to go beyond Hadoop
Workspace (yelp)
- Sub-directory
- HBase namespace
- Hive database
- Database
Table (`business.json`)
- Pathnames
- Hive table
- HBase table
- Table
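The three-part name in the query can be read mechanically. The sketch below is plain Python, not part of Drill; the helper name `split_table_name` is hypothetical, and it assumes exactly three dot-separated parts with optional backtick quoting, as in the slide's example.

```python
import re

# Matches either a backtick-quoted identifier or a bare identifier.
_PART = re.compile(r"`([^`]*)`|([^.`]+)")


def split_table_name(qualified):
    """Split 'plugin.workspace.`table`' into its three named parts."""
    parts = [quoted or bare for quoted, bare in _PART.findall(qualified)]
    if len(parts) != 3:
        raise ValueError(f"expected plugin.workspace.table, got {qualified!r}")
    plugin, workspace, table = parts
    return {"storage_plugin": plugin, "workspace": workspace, "table": table}


print(split_table_name("dfs.yelp.`business.json`"))
```

The backticks matter: without them, the dot in `business.json` would be read as another name separator.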
Drill’s Data Model is Flexible
[Chart: JSON, BSON, HBase, Parquet, Avro, CSV and TSV placed on two flexibility axes: fixed to dynamic schema, and flat to complex structure]
RDBMS/SQL-on-Hadoop table
Name Gender Age
Michael M 6
Jennifer F 3
Apache Drill table
{
  "name": {
    "first": "Michael",
    "last": "Smith"
  },
  "hobbies": ["ski", "soccer"],
  "district": "Los Altos"
}
{
  "name": {
    "first": "Jennifer",
    "last": "Gates"
  },
  "hobbies": ["sing"],
  "preschool": "CCLC"
}
Drilling into Data
Business dataset
{
  "business_id": "4bEjOyTaDG24SY5TxsaUNQ",
  "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109",
  "hours": {
    "Monday": {"close": "23:00", "open": "07:00"},
    "Tuesday": {"close": "23:00", "open": "07:00"},
    "Friday": {"close": "00:00", "open": "07:00"},
    "Wednesday": {"close": "23:00", "open": "07:00"},
    "Thursday": {"close": "23:00", "open": "07:00"},
    "Sunday": {"close": "23:00", "open": "07:00"},
    "Saturday": {"close": "00:00", "open": "07:00"}
  },
  "open": true,
  "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
  "city": "Las Vegas",
  "review_count": 4084,
  "name": "Mon Ami Gabi",
  "neighborhoods": ["The Strip"],
  "longitude": -115.172588519464,
  "state": "NV",
  "stars": 4.0,
  "attributes": {
    "Alcohol": "full_bar",
    "Noise Level": "average",
    "Has TV": false,
    "Attire": "casual",
    "Ambience": {
      "romantic": true,
      "intimate": false,
      "touristy": false,
      "hipster": false,
      "classy": true,
      "trendy": false,
      "casual": false
    },
    "Good For": {"dessert": false, "latenight": false, "lunch": false,
                 "dinner": true, "breakfast": false, "brunch": false}
  }
}
Reviews dataset
{
"votes": {"funny": 0, "useful": 2, "cool": 1},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg",
"stars": 5,
"date": "2007-05-17",
"text": "dr. goldberg offers everything ...",
"type": "review",
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
Interfaces and Tools
Web UI
Drill Explorer
ODBC/JDBC
Sqlline
$ tar -xvzf apache-drill-1.4.0.tar.gz
$ cd apache-drill-1.4.0
# Start an embedded Drillbit and SQL shell with either command:
$ bin/sqlline -u jdbc:drill:zk=local
$ bin/drill-embedded
> SELECT state, city, count(*) AS businesses
FROM dfs.yelp.`business.json.gz`
GROUP BY state, city
ORDER BY businesses DESC LIMIT 10;
+------------+------------+-------------+
| state | city | businesses |
+------------+------------+-------------+
| NV | Las Vegas | 12021 |
| AZ | Phoenix | 7499 |
| AZ | Scottsdale | 3605 |
| EDH | Edinburgh | 2804 |
| AZ | Mesa | 2041 |
| AZ | Tempe | 2025 |
| NV | Henderson | 1914 |
| AZ | Chandler | 1637 |
| WI | Madison | 1630 |
| AZ | Glendale | 1196 |
+------------+------------+-------------+
Intuitive SQL Access to Complex Data
// It’s Friday 10pm in Vegas and looking for Hummus
> SELECT name, stars, b.hours.Friday friday, categories
FROM dfs.yelp.`business.json` b
WHERE b.hours.Friday.`open` < '22:00' AND
b.hours.Friday.`close` > '22:00' AND
REPEATED_CONTAINS(categories, 'Mediterranean') AND
city = 'Las Vegas'
ORDER BY stars DESC
LIMIT 2;
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
| name | stars | friday | categories |
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
| Khoury's Mediterranean Restaurant | 4.0 | {"close":"23:00","open":"11:00"} | ["Greek","Mediterranean","Middle Eastern","Restaurants"] |
| Olives | 4.0 | {"close":"22:30","open":"11:00"} | ["Bars","Mediterranean","Nightlife","Restaurants"] |
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
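For readers less familiar with nested data, the filter above can be mirrored in plain Python. This is only a sketch of the logic, not how Drill evaluates the query: `repeated_contains` and `open_on` are hypothetical helpers, simple list membership stands in for REPEATED_CONTAINS, and the three records are assembled from values shown on these slides (the Khoury's and Olives records assume the city from the query's filter).

```python
def repeated_contains(values, wanted):
    """Plain-Python stand-in for REPEATED_CONTAINS on an array column."""
    return wanted in values


def open_on(business, day, time):
    """True if the 'HH:MM' strings for that day bracket the given time."""
    hours = business["hours"].get(day)
    return hours is not None and hours["open"] < time < hours["close"]


businesses = [
    {"name": "Mon Ami Gabi", "stars": 4.0, "city": "Las Vegas",
     "hours": {"Friday": {"close": "00:00", "open": "07:00"}},
     "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"]},
    {"name": "Khoury's Mediterranean Restaurant", "stars": 4.0, "city": "Las Vegas",
     "hours": {"Friday": {"close": "23:00", "open": "11:00"}},
     "categories": ["Greek", "Mediterranean", "Middle Eastern", "Restaurants"]},
    {"name": "Olives", "stars": 4.0, "city": "Las Vegas",
     "hours": {"Friday": {"close": "22:30", "open": "11:00"}},
     "categories": ["Bars", "Mediterranean", "Nightlife", "Restaurants"]},
]

matches = [
    b for b in businesses
    if open_on(b, "Friday", "22:00")
    and repeated_contains(b["categories"], "Mediterranean")
    and b["city"] == "Las Vegas"
]
matches.sort(key=lambda b: b["stars"], reverse=True)  # ORDER BY stars DESC
top = [b["name"] for b in matches[:2]]                # LIMIT 2
print(top)
```

One thing the sketch makes visible: because the hours are compared as strings, a place that closes at "00:00" never satisfies `close > '22:00'`, so venues open past midnight would need extra handling.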
ANSI SQL Compatibility
// Get top cool rated businesses
> SELECT b.name FROM dfs.yelp.`business.json` b
WHERE b.business_id IN
(SELECT r.business_id FROM dfs.yelp.`review.json` r
GROUP BY r.business_id HAVING SUM(r.votes.cool) > 2000 ORDER BY
SUM(r.votes.cool) DESC);
+--------------------------------+
| name |
+--------------------------------+
| Earl of Sandwich |
| XS Nightclub |
| The Cosmopolitan of Las Vegas |
| Wicked Spoon |
| Bacchanal Buffet |
+--------------------------------+
Logical Views
//Create a view combining business and reviews datasets
> CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
SELECT b.name, b.stars, r.votes.funny,
r.votes.useful, r.votes.cool, r.`date`
FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
WHERE r.business_id = b.business_id;
+-------+-------------------------------------------------------------------+
| ok | summary |
+-------+-------------------------------------------------------------------+
| true | View 'BusinessReviews' replaced successfully in 'dfs.tmp' schema |
+-------+-------------------------------------------------------------------+
> SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;
+------------+
| Total |
+------------+
| 1125458 |
+------------+
Materialized Views AKA Tables
> ALTER SESSION SET `store.format` = 'parquet';
> CREATE TABLE dfs.yelp.BusinessReviewsTbl AS
SELECT b.name, b.stars, r.votes.funny funny,
r.votes.useful useful, r.votes.cool cool, r.`date`
FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
WHERE r.business_id = b.business_id;
+------------+---------------------------+
| Fragment | Number of records written |
+------------+---------------------------+
| 1_0 | 176448 |
| 1_1 | 192439 |
| 1_2 | 198625 |
| 1_3 | 200863 |
| 1_4 | 181420 |
| 1_5 | 175663 |
+------------+---------------------------+
Repeated Values Support
// Flatten repeated categories
> SELECT name, categories
FROM dfs.yelp.`business.json` LIMIT 3;
+---------------------------+-------------------------------------+
| name | categories |
+---------------------------+-------------------------------------+
| Eric Goldberg, MD | ["Doctors","Health & Medical"] |
| Clancy's Pub | ["Nightlife"] |
| Cool Springs Golf Center | ["Active Life","Mini Golf","Golf"] |
+---------------------------+-------------------------------------+
> SELECT name, FLATTEN(categories) AS categories
FROM dfs.yelp.`business.json` LIMIT 5;
+---------------------------+-------------------+
| name | categories |
+---------------------------+-------------------+
| Eric Goldberg, MD | Doctors |
| Eric Goldberg, MD | Health & Medical |
| Clancy's Pub | Nightlife |
| Cool Springs Golf Center | Active Life |
| Cool Springs Golf Center | Mini Golf |
+---------------------------+-------------------+
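The effect of FLATTEN can be pictured as one output row per array element, with the other columns repeated. The generator below is a plain-Python sketch of that behaviour, not Drill's implementation; the function name `flatten` and the in-memory rows (copied from the result above) are for illustration only.

```python
def flatten(rows, column):
    """Yield one copy of each row per element of row[column]."""
    for row in rows:
        for item in row[column]:
            yield {**row, column: item}


rows = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Clancy's Pub", "categories": ["Nightlife"]},
    {"name": "Cool Springs Golf Center", "categories": ["Active Life", "Mini Golf", "Golf"]},
]

flat = list(flatten(rows, "categories"))
for row in flat[:5]:  # LIMIT 5
    print(row["name"], "|", row["categories"])
```

Three input rows become six flattened rows, which is why the second query needs a larger LIMIT to show comparable coverage.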
Extensions to ANSI SQL to work with repeated values
// Get most common business categories
> SELECT category, count(*) AS categorycount
FROM (SELECT name, FLATTEN(categories) AS category
FROM dfs.yelp.`business.json`) c
GROUP BY category ORDER BY categorycount DESC;
+-------------------------------+----------------+
| category | categorycount |
+-------------------------------+----------------+
| Restaurants | 21892 |
| Shopping | 8919 |
| Automotive | 2965 |
… … … … … … … … … … … … … … … … … … … … … … … … … … …
| Oriental | 1 |
| High Fidelity Audio Equipment | 1 |
+-------------------------------+----------------+
Checkins dataset
{
"checkin_info":{
"3-4":1,
"13-5":1,
"6-6":1,
"14-5":1,
"14-6":1,
"14-2":1,
"14-3":1,
"19-0":1,
"11-5":1,
"13-2":1,
"11-6":2,
"11-3":1,
"12-6":1,
"6-5":1,
"5-5":1,
"9-2":1,
"9-5":1,
"9-6":1,
"5-2":1,
"7-6":1,
"7-5":1,
"7-4":1,
"17-5":1,
"8-5":1,
"10-2":1,
"10-5":1,
"10-6":1
},
"type":"checkin",
"business_id":"JwUE5GmEO-sH1FuwJgKBlQ"
}
Supports Dynamic / Unknown Columns
> SELECT KVGEN(checkin_info) checkins
FROM dfs.yelp.`checkin.json` LIMIT 1;
+------------+
| checkins |
+------------+
| [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14-5","value":1},{"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19-0","value":1},{"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11-3","value":1},{"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9-2","value":1},{"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7-6","value":1},{"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8-5","value":1},{"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] |
+------------+
> SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM
dfs.yelp.`checkin.json` limit 6;
+---------------------------+
| checkins |
+---------------------------+
| {"key":"9-5","value":1} |
| {"key":"7-5","value":1} |
| {"key":"13-3","value":1} |
| {"key":"17-6","value":1} |
| {"key":"13-0","value":1} |
| {"key":"17-3","value":1} |
+---------------------------+
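KVGEN turns a map whose keys are not known in advance into an array of key/value pairs, which FLATTEN can then expand into rows. A plain-Python sketch of that transformation follows; it is not Drill code, the function name `kvgen` is only a mirror of the SQL function, and the record is a shortened copy of the checkin example.

```python
def kvgen(mapping):
    """Convert {key: value, ...} into [{"key": key, "value": value}, ...]."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


record = {
    "checkin_info": {"3-4": 1, "13-5": 1, "11-6": 2},
    "type": "checkin",
    "business_id": "JwUE5GmEO-sH1FuwJgKBlQ",
}

pairs = kvgen(record["checkin_info"])
for pair in pairs:  # iterating the list plays the role of FLATTEN
    print(pair)
```

After this step the dynamic column names ("3-4", "13-5", ...) are ordinary data values that can be filtered and aggregated.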
Makes it easy to work with dynamic/unknown columns
// Count total number of checkins on Sunday midnight
> SELECT SUM(checkintbl.checkins.`value`) as SundayMidnightCheckins FROM
(SELECT FLATTEN(KVGEN(checkin_info)) checkins
FROM dfs.yelp.`checkin.json`) checkintbl
WHERE checkintbl.checkins.key='23-0';
+------------------------+
| SundayMidnightCheckins |
+------------------------+
| 8575 |
+------------------------+
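The same aggregation can be sketched in plain Python to show what the query computes: pick one dynamic key out of every record's map and add up its values. The function name `total_checkins`, the business IDs and the counts below are hypothetical; only the key '23-0' comes from the slide.

```python
def total_checkins(records, slot):
    """Sum the check-in count stored under `slot` across all records."""
    return sum(record["checkin_info"].get(slot, 0) for record in records)


# Hypothetical check-in records; real ones come from checkin.json.
records = [
    {"business_id": "biz-a", "checkin_info": {"23-0": 2, "9-5": 1}},
    {"business_id": "biz-b", "checkin_info": {"23-0": 1, "7-4": 3}},
    {"business_id": "biz-c", "checkin_info": {"10-2": 1}},
]

sunday_midnight = total_checkins(records, "23-0")
print(sunday_midnight)
```

Records without the key contribute nothing, which matches the SQL version where the WHERE clause simply drops the non-matching key/value rows.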
Federated Queries
// Join JSON File, Parquet and MongoDB collection
> SELECT u.name, b.category, count(1) nb_review
FROM mongo.yelp.`user` u, dfs.yelp.`review.parquet` r,
(SELECT business_id, FLATTEN(categories) category
FROM dfs.yelp.`business.json`) b
WHERE u.user_id = r.user_id
AND b.business_id = r.business_id
GROUP BY u.user_id, u.name, b.category
ORDER BY nb_review DESC
LIMIT 10;
+-----------+--------------+------------+
| name | category | nb_review |
+-----------+--------------+------------+
| Rand | Restaurants | 1086 |
| J | Restaurants | 661 |
| Jennifer | Restaurants | 657 |
| Brad | Restaurants | 638 |
| Jade | Restaurants | 586 |
… … … … … … … … … .. .. ..
| Emily | Restaurants | 560 |
| Norm | Restaurants | 543 |
| Aileen | Restaurants | 499 |
| Michael | Restaurants | 496 |
+-----------+--------------+------------+
Drill Data Sources
JDBC
Directories are implicit partitions
select dir0, count(1)
from dfs.data.`/logs/`
where dir1 in (1,2,3)
group by dir0
logs
├── 2014
│ ├── 1
│ ├── 2
│ ├── 3
│ └── 4
└── 2015
└── 1
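A small Python sketch can show where dir0 and dir1 come from: they are the directory levels between the queried path and each file. This is an illustration only, not Drill's code; the helper `implicit_dirs` and the file names are hypothetical, and each file stands in for a single row.

```python
from collections import Counter
from pathlib import PurePosixPath


def implicit_dirs(root, file_path):
    """Map each directory level below `root` to dir0, dir1, ..."""
    relative = PurePosixPath(file_path).relative_to(root)
    return {f"dir{i}": part for i, part in enumerate(relative.parts[:-1])}


# Hypothetical files laid out like the tree above.
files = [
    "/logs/2014/1/a.log",
    "/logs/2014/2/b.log",
    "/logs/2014/4/c.log",
    "/logs/2015/1/d.log",
]

counts = Counter()
for path in files:
    dirs = implicit_dirs("/logs", path)
    if dirs["dir1"] in {"1", "2", "3"}:  # where dir1 in (1,2,3)
        counts[dirs["dir0"]] += 1        # group by dir0

print(dict(counts))
```

Note that the directory names are strings in this sketch, and that files under `2014/4` are skipped by the filter on dir1 before anything is counted.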
How does it work?
Everything is a “Drillbit”
• “Drillbits” run on each node
• In-memory columnar execution
• Coordination, Planning,
Execution
• Networked or not
• Exposes JDBC, ODBC, REST
• Built In Web UI and CLI
• Extensible
• Custom Functions
• Data Sources
Drillbit
Data Locality
[Diagram: three clusters (an HDFS & HBase cluster, an HDFS cluster, and a MongoDB cluster running mongod) alongside Mac and Windows desktops]
Run drillbits close to your data!
[Diagram: a Drillbit runs on each node of the HDFS & HBase, HDFS and MongoDB clusters, and on the Mac and Windows desktops]
Query Execution
Client
Drillbit Drillbit Drillbit
• Client connects to “a” Drillbit
• This Drillbit becomes the foreman
• Foreman generates execution plan
• cost-based query optimisation
Query Execution
Client
Drillbit Drillbit Drillbit
• Execution fragments are farmed out to other Drillbits
• Drillbits exchange data when necessary
Query Execution
Client
Drillbit Drillbit Drillbit
• Results are returned to the user through foreman
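The three steps (plan on the foreman, run fragments on several Drillbits, return results through the foreman) follow a scatter/gather pattern. The toy Python sketch below imitates that pattern with threads; it is not how Drill is implemented, the names `run_fragment` and `foreman` are hypothetical, and the round-robin split is a stand-in for Drill's cost-based planning.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_fragment(rows):
    """Work done by one Drillbit: a partial GROUP BY count over its rows."""
    return Counter(row["state"] for row in rows)


def foreman(rows, drillbits=3):
    """Plan, farm out fragments, then merge the partial results."""
    # Plan: one chunk of input per Drillbit (round-robin in this sketch).
    chunks = [rows[i::drillbits] for i in range(drillbits)]
    # Execute: each fragment runs independently on its own chunk.
    with ThreadPoolExecutor(max_workers=drillbits) as pool:
        partials = list(pool.map(run_fragment, chunks))
    # Return: the foreman merges partial counts before answering the client.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical rows standing in for business records.
rows = [{"state": s} for s in ["NV", "AZ", "NV", "WI", "AZ", "NV"]]
result = foreman(rows)
print(result.most_common())
```

The client only ever talks to `foreman`; the partial results exchanged between workers never leave the cluster side of the sketch.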
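Granular Security via Drill Views
Raw File (/raw/cards.csv)
Owner: Admins. Permission: Admins.
+-------+-----------+--------+----------------------+
| Name  | City      | State  | Credit Card #        |
+-------+-----------+--------+----------------------+
| Dave  | San Jose  | CA     | 1374-7914-3865-4817  |
| John  | Boulder   | CO     | 1374-9735-1794-9711  |
+-------+-----------+--------+----------------------+
Data Scientist View (/views/maskedcards.csv)
Not a physical data copy
Owner: Admins. Permission: Data Scientists.
+-------+-----------+--------+----------------------+
| Name  | City      | State  | Credit Card #        |
+-------+-----------+--------+----------------------+
| Dave  | San Jose  | CA     | 1374-1111-1111-1111  |
| John  | Boulder   | CO     | 1374-1111-1111-1111  |
+-------+-----------+--------+----------------------+
Business Analyst View
Owner: Admins. Permission: Business Analysts.
+-------+-----------+--------+
| Name  | City      | State  |
+-------+-----------+--------+
| Dave  | San Jose  | CA     |
| John  | Boulder   | CO     |
+-------+-----------+--------+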
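The idea that a view is a stored query rather than a physical copy can be sketched in Python with generators that re-read the raw data on every call. This is an analogy, not Drill's view mechanism; the function names and the masking rule (keep the first group, replace the rest with 1111) are assumptions inferred from the sample rows, and file permissions are not modelled.

```python
import csv
import io

# Stand-in for /raw/cards.csv, using the rows shown on the slide.
RAW = (
    "Name,City,State,Credit Card #\n"
    "Dave,San Jose,CA,1374-7914-3865-4817\n"
    "John,Boulder,CO,1374-9735-1794-9711\n"
)


def raw_rows():
    """Read the raw file; only admins would be allowed to call this directly."""
    return csv.DictReader(io.StringIO(RAW))


def mask_card(number):
    """Keep the first group of digits and replace the others with 1111."""
    first, *rest = number.split("-")
    return "-".join([first] + ["1111"] * len(rest))


def data_scientist_view():
    """All columns, with the card number masked on the fly."""
    for row in raw_rows():
        yield {**row, "Credit Card #": mask_card(row["Credit Card #"])}


def business_analyst_view():
    """Only the non-sensitive columns."""
    for row in raw_rows():
        yield {key: row[key] for key in ("Name", "City", "State")}


masked = list(data_scientist_view())
analyst = list(business_analyst_view())
print(masked[0]["Credit Card #"], list(analyst[0]))
```

Neither view stores anything: each call recomputes its rows from the raw data, so a change to the raw file shows up in both views immediately.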
Extensibility
Extending Apache Drill
• File Format
• ORC
• …
• Data Sources
• NoSQL Databases
• Search Engines
• REST
• ….
• Custom Functions
https://github.com/mapr-demos/simple-drill-functions
Recommendations for Getting Started with Drill
New to Drill?
– Get started with Free MapR On Demand training
– Test Drive Drill on cloud with AWS
– Learn how to use Drill with Hadoop using MapR sandbox
– Workshop : https://github.com/tgrall/drill-workshop
Ready to play with your data?
– Try out Apache Drill in 10 mins guide on your desktop
– Download Drill for your cluster and start exploration
– Comprehensive tutorials and documentation available
Ask questions
– user@drill.apache.org
– dev@drill.apache.org
Q&A
@tgrall maprtech
tug@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudKaran Singh
 

HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist

  • 1. © 2016 MapR Technologies, MapR Confidential. SQL-on-Hadoop with
  • 2. { about : me } Tugdual “Tug” Grall. MapR: Technical Evangelist. MongoDB: Technical Evangelist. Couchbase: Technical Evangelist. eXo: CTO. Oracle: Developer/Product Manager, mainly Java/SOA. Developer in consulting firms. Web: @tgrall, http://tgrall.github.io, tgrall. NantesJUG co-founder. Pet project: http://www.resultri.com. Contact: tug@mapr.com, tugdual@gmail.com
  • 3. Ingest, Store, Process, Consume
  • 4. MapR Converged Data Platform
  • 5. The MapR Distribution including Apache Hadoop. [Diagram: Apache Hadoop and OSS ecosystem running on the MapR Data Platform (MapR-FS, MapR-DB), positioned as an enterprise-grade data hub and operational platform. Execution engines: Batch (Pig, Spark, MapReduce v1 & v2); Streaming (Spark Streaming, Storm); NoSQL & Search (HBase, Solr); SQL (Hive, Impala, Spark SQL, Drill); ML, Graph (Mahout, MLLib, GraphX); YARN. Data governance and operations: Security (Sentry); Workflow & Data Governance (Oozie); Provisioning & Coordination (Sahara, ZooKeeper); Data Integration & Access (Sqoop, Flume, HttpFS, Hue); Management.]
  • 6. Agenda: (1) Why Drill? (2) Some Drilling with Open Data (3) How does it work?
  • 7. Data Increasingly Stored in Non-Relational Datastores (timeline 1980 to 2020). Relational databases: Volume: GBs-TBs; Structure: structured; Database: fixed schema, DBA controls structure; Development: planned (release cycle = months-years). Non-relational datastores: Volume: TBs-PBs; Structure: structured, semi-structured and unstructured; Database: dynamic / flexible schema, application controls structure; Development: iterative (release cycle = days-weeks).
  • 8. How To Bring SQL to Non-Relational Data Stores? Familiarity of SQL: ANSI SQL semantics; low latency; integrated with tools/applications. Agility of NoSQL: no schema management (HDFS (Parquet, JSON, etc.), HBase, …); no transformation (no silos of data); ease of use.
  • 9. Drill Supports Schema Discovery On-The-Fly. Schema Declared In Advance (schema on write, schema before read): fixed schema; leverage schema in centralized repository (Hive Metastore). Schema Discovered On-The-Fly (schema on the fly): fixed schema, evolving schema or schema-less; leverage schema in centralized repository or self-describing data.
  • 10. Pioneering data agility for Hadoop; Apache open source project; scale-out execution engine for low-latency queries; unified SQL-based API for analytics and operational applications. Contributing to Apache Drill: 50+ contributors; 150+ years of experience building databases and distributed systems.
  • 11. Drill Enables ‘SQL-on-Everything’: SELECT * FROM dfs.yelp.`business.json`. The name has three parts. Storage plugin instance (dfs): DFS (Text, Parquet, JSON, XML), HBase/MapR-DB, Hive Metastore/HCatalog, RDBMS using JDBC, easy API to go beyond Hadoop. Workspace (yelp): sub-directory, HBase namespace, Hive database, database. Table (`business.json`): pathnames, Hive table, HBase table, table.
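
To make the three-part naming on this slide concrete, here is a minimal Python sketch that splits a name such as dfs.yelp.`business.json` into storage plugin, workspace, and table. The helper `split_drill_name` is hypothetical and written only for illustration; it is not part of Drill and does not reproduce Drill's own identifier parsing.

```python
def split_drill_name(name: str) -> list[str]:
    """Split a dotted identifier on '.', keeping backtick-quoted parts whole.

    Hypothetical helper for illustration only; not a Drill API.
    """
    parts, current, quoted = [], [], False
    for ch in name:
        if ch == "`":
            quoted = not quoted          # toggle quoting and drop the backtick
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


plugin, workspace, table = split_drill_name("dfs.yelp.`business.json`")
print(plugin, workspace, table)
```

The backticks matter because the file name itself contains a dot; without them the name would split into four parts instead of three.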
  • 12. Drill’s Data Model is Flexible. Two axes of flexibility: flat to complex, and fixed schema to dynamic schema. Formats placed on the chart: JSON, BSON, HBase, Parquet, Avro, CSV, TSV. RDBMS/SQL-on-Hadoop table (flat, fixed schema): Name, Gender, Age: (Michael, M, 6), (Jennifer, F, 3). Apache Drill table (complex, dynamic schema): { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos } { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
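
As a rough illustration of what "dynamic schema" means for the two records on this slide, the Python sketch below walks self-describing records and collects the column paths it finds. This is only a conceptual sketch under the assumption that nested maps become dotted paths; `discover_schema` is a hypothetical name and this is not how Drill is implemented.

```python
# The two records from the slide, written as Python dictionaries.
records = [
    {"name": {"first": "Michael", "last": "Smith"},
     "hobbies": ["ski", "soccer"], "district": "Los Altos"},
    {"name": {"first": "Jennifer", "last": "Gates"},
     "hobbies": ["sing"], "preschool": "CCLC"},
]


def discover_schema(rows):
    """Return {dotted column path: sorted type names seen} across all rows."""
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into nested maps, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            schema.setdefault(path, set()).add(type(value).__name__)

    for row in rows:
        walk(row, "")
    return {path: sorted(types) for path, types in schema.items()}


schema = discover_schema(records)
for path in sorted(schema):
    print(path, schema[path])
```

Note that `district` appears only in the first record and `preschool` only in the second; a fixed-schema table would need both columns declared in advance, whereas here they are found while reading.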
  • 13. Drilling into Data
  • 14. Business dataset { "business_id": "4bEjOyTaDG24SY5TxsaUNQ", "full_address": "3655 Las Vegas Blvd S\nThe Strip\nLas Vegas, NV 89109", "hours": { "Monday": {"close": "23:00", "open": "07:00"}, "Tuesday": {"close": "23:00", "open": "07:00"}, "Friday": {"close": "00:00", "open": "07:00"}, "Wednesday": {"close": "23:00", "open": "07:00"}, "Thursday": {"close": "23:00", "open": "07:00"}, "Sunday": {"close": "23:00", "open": "07:00"}, "Saturday": {"close": "00:00", "open": "07:00"} }, "open": true, "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"], "city": "Las Vegas", "review_count": 4084, "name": "Mon Ami Gabi", "neighborhoods": ["The Strip"], "longitude": -115.172588519464, "state": "NV", "stars": 4.0, "attributes": { "Alcohol": "full_bar", "Noise Level": "average", "Has TV": false, "Attire": "casual", "Ambience": { "romantic": true, "intimate": false, "touristy": false, "hipster": false, "classy": true, "trendy": false, "casual": false }, "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": true, "breakfast": false, "brunch": false} } }
  • 15. Reviews dataset { "votes": {"funny": 0, "useful": 2, "cool": 1}, "user_id": "Xqd0DzHaiyRqVH3WRG7hzg", "review_id": "15SdjuK7DmYqUAj6rjGowg", "stars": 5, "date": "2007-05-17", "text": "dr. goldberg offers everything ...", "type": "review", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA" }
  • 16. Interfaces and Tools: Web UI, Drill Explorer, ODBC/JDBC, Sqlline
  • 17. $ tar -xvzf apache-drill-1.4.0.tar.gz $ bin/sqlline -u jdbc:drill:zk=local $ bin/drill-embedded > SELECT state, city, count(*) AS businesses FROM dfs.yelp.`business.json.gz` GROUP BY state, city ORDER BY businesses DESC LIMIT 10; +------------+------------+-------------+ | state | city | businesses | +------------+------------+-------------+ | NV | Las Vegas | 12021 | | AZ | Phoenix | 7499 | | AZ | Scottsdale | 3605 | | EDH | Edinburgh | 2804 | | AZ | Mesa | 2041 | | AZ | Tempe | 2025 | | NV | Henderson | 1914 | | AZ | Chandler | 1637 | | WI | Madison | 1630 | | AZ | Glendale | 1196 | +------------+------------+-------------+
  • 18. Intuitive SQL Access to Complex Data // It’s Friday 10pm in Vegas and looking for Hummus > SELECT name, stars, b.hours.Friday friday, categories FROM dfs.yelp.`business.json` b WHERE b.hours.Friday.`open` < '22:00' AND b.hours.Friday.`close` > '22:00' AND REPEATED_CONTAINS(categories, 'Mediterranean') AND city = 'Las Vegas' ORDER BY stars DESC LIMIT 2; +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+ | name | stars | friday | categories | +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+ | Khoury's Mediterranean Restaurant | 4.0 | {"close":"23:00","open":"11:00"} | ["Greek","Mediterranean","Middle Eastern","Restaurants"] | | Olives | 4.0 | {"close":"22:30","open":"11:00"} | ["Bars","Mediterranean","Nightlife","Restaurants"] | +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
  • 19. ANSI SQL Compatibility // Get top cool rated businesses > SELECT b.name FROM dfs.yelp.`business.json` b WHERE b.business_id IN (SELECT r.business_id FROM dfs.yelp.`review.json` r GROUP BY r.business_id HAVING SUM(r.votes.cool) > 2000 ORDER BY SUM(r.votes.cool) DESC); +--------------------------------+ | name | +--------------------------------+ | Earl of Sandwich | | XS Nightclub | | The Cosmopolitan of Las Vegas | | Wicked Spoon | | Bacchanal Buffet | +--------------------------------+
  • 20. Logical Views // Create a view combining business and reviews datasets > CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS SELECT b.name, b.stars, r.votes.funny, r.votes.useful, r.votes.cool, r.`date` FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r WHERE r.business_id = b.business_id; +-------+-------------------------------------------------------------------+ | ok | summary | +-------+-------------------------------------------------------------------+ | true | View 'BusinessReviews' replaced successfully in 'dfs.tmp' schema | +-------+-------------------------------------------------------------------+ > SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews; +------------+ | Total | +------------+ | 1125458 | +------------+
  • 21. Materialized Views AKA Tables > ALTER SESSION SET `store.format` = 'parquet'; > CREATE TABLE dfs.yelp.BusinessReviewsTbl AS SELECT b.name, b.stars, r.votes.funny funny, r.votes.useful useful, r.votes.cool cool, r.`date` FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r WHERE r.business_id = b.business_id; +------------+---------------------------+ | Fragment | Number of records written | +------------+---------------------------+ | 1_0 | 176448 | | 1_1 | 192439 | | 1_2 | 198625 | | 1_3 | 200863 | | 1_4 | 181420 | | 1_5 | 175663 | +------------+---------------------------+
  • 22. Repeated Values Support // Flatten repeated categories > SELECT name, categories FROM dfs.yelp.`business.json` LIMIT 3; +---------------------------+-------------------------------------+ | name | categories | +---------------------------+-------------------------------------+ | Eric Goldberg, MD | ["Doctors","Health & Medical"] | | Clancy's Pub | ["Nightlife"] | | Cool Springs Golf Center | ["Active Life","Mini Golf","Golf"] | +---------------------------+-------------------------------------+ > SELECT name, FLATTEN(categories) AS categories FROM dfs.yelp.`business.json` LIMIT 5; +---------------------------+-------------------+ | name | categories | +---------------------------+-------------------+ | Eric Goldberg, MD | Doctors | | Eric Goldberg, MD | Health & Medical | | Clancy's Pub | Nightlife | | Cool Springs Golf Center | Active Life | | Cool Springs Golf Center | Mini Golf | +---------------------------+-------------------+
  • 23. Extensions to ANSI SQL to work with repeated values // Get most common business categories > SELECT category, count(*) AS categorycount FROM (SELECT name, FLATTEN(categories) AS category FROM dfs.yelp.`business.json`) c GROUP BY category ORDER BY categorycount DESC; +-------------------------------+----------------+ | category | categorycount | +-------------------------------+----------------+ | Restaurants | 21892 | | Shopping | 8919 | | Automotive | 2965 | … | Oriental | 1 | | High Fidelity Audio Equipment | 1 | +-------------------------------+----------------+
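The behaviour of FLATTEN on slides 22 and 23 can be pictured with a few lines of plain Python. This is only a sketch of the semantics (one output row per array element, then a group-and-count), not how Drill implements it; the helper name `flatten` and the variable names are illustrative assumptions, and the three records are the ones shown in the slide 22 output.

```python
from collections import Counter

def flatten(rows, column):
    """Yield one output row per element of the repeated `column`."""
    for row in rows:
        for value in row[column]:
            # Copy the other columns and replace the array by a single element.
            yield {**row, column: value}

# Three records taken from the slide 22 output.
businesses = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Clancy's Pub", "categories": ["Nightlife"]},
    {"name": "Cool Springs Golf Center", "categories": ["Active Life", "Mini Golf", "Golf"]},
]

# Equivalent in spirit to SELECT name, FLATTEN(categories) ...
flat = list(flatten(businesses, "categories"))

# Equivalent in spirit to GROUP BY category with count(*).
category_count = Counter(row["categories"] for row in flat)
```

Each business appears once per category in `flat`, which is why the slide 23 query can group on the flattened column directly.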
  • 24. Checkins dataset { "checkin_info":{ "3-4":1, "13-5":1, "6-6":1, "14-5":1, "14-6":1, "14-2":1, "14-3":1, "19-0":1, "11-5":1, "13-2":1, "11-6":2, "11-3":1, "12-6":1, "6-5":1, "5-5":1, "9-2":1, "9-5":1, "9-6":1, "5-2":1, "7-6":1, "7-5":1, "7-4":1, "17-5":1, "8-5":1, "10-2":1, "10-5":1, "10-6":1 }, "type":"checkin", "business_id":"JwUE5GmEO-sH1FuwJgKBlQ" }
  • 25. Supports Dynamic / Unknown Columns > SELECT KVGEN(checkin_info) checkins FROM dfs.yelp.`checkin.json` LIMIT 1; +------------+ | checkins | +------------+ | [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14-5","value":1},{"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19-0","value":1},{"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11-3","value":1},{"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9-2","value":1},{"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7-6","value":1},{"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8-5","value":1},{"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] | +------------+ > SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.`checkin.json` LIMIT 6; +---------------------------+ | checkins | +---------------------------+ | {"key":"9-5","value":1} | | {"key":"7-5","value":1} | | {"key":"13-3","value":1} | | {"key":"17-6","value":1} | | {"key":"13-0","value":1} | | {"key":"17-3","value":1} | +---------------------------+
  • 26. Makes it easy to work with dynamic/unknown columns // Count total number of checkins on Sunday midnight > SELECT SUM(checkintbl.checkins.`value`) AS SundayMidnightCheckins FROM (SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.`checkin.json`) checkintbl WHERE checkintbl.checkins.key='23-0'; +------------------------+ | SundayMidnightCheckins | +------------------------+ | 8575 | +------------------------+
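The KVGEN and FLATTEN combination on slides 25 and 26 can be sketched in plain Python to show why it helps when column names are not known in advance. This only mimics the semantics; the helper `kvgen`, the business ids and the check-in values below are hypothetical sample data, not taken from the Yelp dataset.

```python
def kvgen(mapping):
    """Turn a map with arbitrary keys into a list of {"key", "value"} pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

# Hypothetical check-in records; the keys are shaped like those on slide 24.
checkins = [
    {"business_id": "b1", "checkin_info": {"3-4": 1, "23-0": 2}},
    {"business_id": "b2", "checkin_info": {"23-0": 5, "9-5": 1}},
]

# FLATTEN(KVGEN(checkin_info)): one row per key/value pair, across all records.
pairs = [kv for row in checkins for kv in kvgen(row["checkin_info"])]

# WHERE key = '23-0' followed by SUM(value).
sunday_midnight = sum(kv["value"] for kv in pairs if kv["key"] == "23-0")
```

Once the map is turned into key/value rows, the unknown column name becomes ordinary data that a WHERE clause can filter on.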
  • 27. Federated Queries // Join JSON File, Parquet and MongoDB collection > SELECT u.name, b.category, count(1) nb_review FROM mongo.yelp.`user` u , dfs.yelp.`review.parquet` r, (select business_id, flatten(categories) category from dfs.yelp.`business.json` ) b WHERE u.user_id = r.user_id AND b.business_id = r.business_id GROUP BY u.user_id, u.name, b.category ORDER BY nb_review DESC LIMIT 10; +-----------+--------------+------------+ | name | category | nb_review | +-----------+--------------+------------+ | Rand | Restaurants | 1086 | | J | Restaurants | 661 | | Jennifer | Restaurants | 657 | | Brad | Restaurants | 638 | | Jade | Restaurants | 586 | … | Emily | Restaurants | 560 | | Norm | Restaurants | 543 | | Aileen | Restaurants | 499 | | Michael | Restaurants | 496 | +-----------+--------------+------------+
  • 28. Drill Data Sources JDBC
  • 29. Directories are implicit partitions. SELECT dir0, count(1) FROM dfs.data.`/logs/` WHERE dir1 IN (1,2,3) GROUP BY dir0. Directory layout: logs/2014/{1, 2, 3, 4} and logs/2015/{1}
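The idea behind dir0 and dir1 on slide 29 is that each directory level under the queried path becomes a column. The Python sketch below derives those columns from file paths and applies the same filter and grouping. It is an illustration only: the file names are hypothetical, the helper `dir_columns` is not a Drill API, and it counts files where the real query counts records.

```python
from collections import Counter
from pathlib import PurePosixPath

def dir_columns(root, path):
    """Map the directory levels between `root` and the file to dir0, dir1, ..."""
    parts = PurePosixPath(path).relative_to(root).parts[:-1]  # drop the file name
    return {f"dir{i}": name for i, name in enumerate(parts)}

# Hypothetical files laid out like the logs/ tree on the slide.
files = [
    "/logs/2014/1/a.log",
    "/logs/2014/2/b.log",
    "/logs/2014/4/c.log",
    "/logs/2015/1/d.log",
]

counts = Counter()
for f in files:
    cols = dir_columns("/logs", f)
    # WHERE dir1 IN (1,2,3); directory names are compared as strings here.
    if cols["dir1"] in {"1", "2", "3"}:
        counts[cols["dir0"]] += 1  # GROUP BY dir0
```

Because the filter only looks at directory names, files under logs/2014/4 can be skipped without being read, which is the point of treating directories as partitions.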
  • 30. How does it work?
  • 31. Everything is a “Drillbit” • “Drillbits” run on each node • In-memory columnar execution • Coordination, Planning, Execution • Networked or not • Exposes JDBC, ODBC, REST • Built-in Web UI and CLI • Extensible (Custom Functions, Data Sources)
  • 32. Data Locality (diagram: an HDFS & HBase cluster, an HDFS cluster, a MongoDB cluster, and Mac and Windows desktops)
  • 33. Run drillbits close to your data! (diagram: Drillbits placed alongside the nodes of the HDFS & HBase cluster, the HDFS cluster and the MongoDB cluster, and on the Mac and Windows desktops)
  • 34. Query Execution (diagram: a client and three Drillbits) • Client connects to “a” Drillbit • This Drillbit becomes the foreman • Foreman generates the execution plan • Cost-based query optimisation
  • 35. Query Execution (diagram: a client and three Drillbits) • Execution fragments are farmed out to other Drillbits • Drillbits exchange data when necessary
  • 36. Query Execution (diagram: a client and three Drillbits) • Results are returned to the user through the foreman
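The three query-execution slides describe a plan, scatter, gather flow. The toy Python model below follows those steps for a GROUP BY count: a foreman splits the work into fragments, each fragment computes a partial result, and the foreman merges the partials before answering the client. It is a deliberately simplified sketch; the function names, the round-robin split and the sample rows are assumptions, and real Drillbits run on separate nodes and exchange data over the network.

```python
from collections import Counter

def run_fragment(rows):
    """Work done by one Drillbit: a partial count per state."""
    return Counter(row["state"] for row in rows)

def foreman(rows, n_drillbits):
    """Plan the query, farm fragments out, and merge the partial results."""
    # Plan: one fragment per Drillbit, split round-robin (an assumption).
    fragments = [rows[i::n_drillbits] for i in range(n_drillbits)]
    # Execute: in this toy model the fragments simply run one after another.
    partials = [run_fragment(fragment) for fragment in fragments]
    # Gather: the foreman merges partial counts and returns them to the client.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical input rows.
rows = [{"state": s} for s in ["NV", "AZ", "NV", "AZ", "NV"]]
result = foreman(rows, n_drillbits=3)
```

The merged result is the same whatever the number of fragments, which is what lets the foreman choose the degree of parallelism.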
  • 37. Granular Security via Drill Views. Raw file (/raw/cards.csv), owner: Admins, permission: Admins: | Name | City | State | Credit Card # | | Dave | San Jose | CA | 1374-7914-3865-4817 | | John | Boulder | CO | 1374-9735-1794-9711 |. Data Scientist view (/views/maskedcards.csv), not a physical data copy, owner: Admins, permission: Data Scientists: | Name | City | State | Credit Card # | | Dave | San Jose | CA | 1374-1111-1111-1111 | | John | Boulder | CO | 1374-1111-1111-1111 |. Business Analyst view, owner: Admins, permission: Business Analysts: | Name | City | State | | Dave | San Jose | CA | | John | Boulder | CO |
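The view-based security on slide 37 can be pictured as two functions computed over the raw rows at read time. The Python sketch below uses the two rows from the slide; the function names and the masking rule (keep the first group, replace the others with 1111) are assumptions inferred from the masked values shown, and file ownership and permissions are not modelled.

```python
# The two rows of /raw/cards.csv as shown on the slide.
RAW_CARDS = [
    {"name": "Dave", "city": "San Jose", "state": "CA", "card": "1374-7914-3865-4817"},
    {"name": "John", "city": "Boulder", "state": "CO", "card": "1374-9735-1794-9711"},
]

def mask_card(card):
    """Keep the first group of digits and replace the others with 1111."""
    first, *rest = card.split("-")
    return "-".join([first] + ["1111"] * len(rest))

def data_scientist_view(rows):
    """Masked card numbers, computed on read (no physical copy)."""
    for row in rows:
        yield {**row, "card": mask_card(row["card"])}

def business_analyst_view(rows):
    """Same rows without the credit card column."""
    for row in rows:
        yield {k: v for k, v in row.items() if k != "card"}

masked = list(data_scientist_view(RAW_CARDS))
without_cards = list(business_analyst_view(RAW_CARDS))
```

Both views are generators over the raw rows, which mirrors the slide's note that a view is not a physical data copy.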
  • 38. Extensibility
  • 39. Extending Apache Drill • File Format (ORC, …) • Data Sources (NoSQL Databases, Search Engines, REST, …) • Custom Functions: https://github.com/mapr-demos/simple-drill-functions
  • 40. Recommendations for Getting Started with Drill. New to Drill? • Get started with free MapR On Demand training • Test drive Drill in the cloud with AWS • Learn how to use Drill with Hadoop using the MapR sandbox • Workshop: https://github.com/tgrall/drill-workshop Ready to play with your data? • Try out the Apache Drill in 10 minutes guide on your desktop • Download Drill for your cluster and start exploring • Comprehensive tutorials and documentation are available. Ask questions: • user@drill.apache.org • dev@drill.apache.org
  • 42. Q&A. Engage with us! @tgrall, tug@mapr.com, MapR: maprtech, mapr-technologies