Apache Drill in the toolbox
Naoki Takezoe
@takezoen
BizReach, Inc
A lot of JSON in the world
● Configuration
● Data
● Log
We want to query or analyze them.
How?
Solutions for searching JSON
We ♥ SQL
What is Apache Drill?
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
● Storage
○ Classpath, Local file system / HDFS / S3, HBase, Hive, MongoDB, JDBC
● File format
○ JSON, Parquet, CSV / TSV / PSV
Let's begin!!
Installation
1. Download and extract the Drill distribution
2. cd apache-drill-1.6.0/bin
3. ./drill-embedded
4. Open the web console at http://localhost:8047/
Query local JSON files
{"name": "suzuki", "dept": "sales"}
{"name": "yamada", "dept": "development"}
{"name": "sato", "dept": "development"}
...
SELECT * FROM dfs.`/tmp/users.json` T1
WHERE T1.name = 'takezoe'
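The same query can also be sent from a script instead of the shell. The following is a minimal sketch, assuming Drill is running in embedded mode on the default port and that its REST query endpoint (/query.json) is reachable. The function names build_query_request and run_query are made up for this example.

```python
import json
import urllib.request

# Assumption: embedded Drill listening on the default web port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the "rows" list from the JSON response."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.load(response).get("rows", [])
```

Calling run_query("SELECT * FROM dfs.`/tmp/users.json` T1 WHERE T1.name = 'takezoe'") needs a running Drill instance. build_query_request can be inspected without one.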
Access to RDB tables
Configure a JDBC storage plugin in the web console (the join query that follows assumes it is registered under the name h2):
{
"type": "jdbc",
"driver": "org.h2.Driver",
"url": "jdbc:h2:~/.gitbucket/data",
"username": "sa",
"password": "sa",
"enabled": true
}
Join JSON and RDB
SELECT
T1.`user`.name AS name,
T2.MAIL_ADDRESS AS mail
FROM dfs.`/tmp/users.json` T1
INNER JOIN h2.DATA.PUBLIC.ACCOUNT T2
ON T1.`user`.name = T2.USER_NAME
Connect to Drill via JDBC
We can use any JDBC frontend or BI tool with Drill
JDBC access requires ZooKeeper
Setup ZooKeeper
$ tar xvzf zookeeper-3.4.8.tar.gz
$ cd zookeeper-3.4.8
$ mv conf/zoo_sample.cfg conf/zoo.cfg
$ cd bin
$ ./zkServer.sh start
Run drillbit
$ cd apache-drill-1.6.0/bin
$ ./drillbit.sh start
● JDBC Driver
○ DRILL_HOME/jars/jdbc-driver/drill-jdbc-all-1.6.0.jar
● Class
○ org.apache.drill.jdbc.Driver
● URL
○ jdbc:drill:drillbit=localhost
Handling nested JSON
Query nested JSON
{"user": {"name": "suzuki", "dept": "sales"}}
{"user": {"name": "yamada", "dept": "development"}}
{"user": {"name": "sato", "dept": "development"}}
...
SELECT
T.`user`.name AS name,
T.`user`.dept AS dept
FROM dfs.`/tmp/users.json` T
WHERE T.`user`.name = 'yamada';
Extract JSON property as column
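To make clear what the nested-property query computes, here is a rough plain-Python equivalent. It is only an illustration of the semantics, not how Drill evaluates the query. The sample lines are copied from the slide, and select_user is a hypothetical helper name.

```python
import json

# Sample lines copied from the slide; each line is one JSON document.
USERS_JSON = """\
{"user": {"name": "suzuki", "dept": "sales"}}
{"user": {"name": "yamada", "dept": "development"}}
{"user": {"name": "sato", "dept": "development"}}
"""


def select_user(lines, name):
    """Mimic: SELECT T.`user`.name, T.`user`.dept ... WHERE T.`user`.name = <name>."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user = json.loads(line)["user"]
        if user["name"] == name:
            # Nested properties become flat columns in the result row.
            rows.append({"name": user["name"], "dept": user["dept"]})
    return rows
```

select_user(USERS_JSON, "yamada") returns a single row with the columns name and dept, which is the shape the SQL query produces.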
Expand nested JSON property to records
{"user": {
"name": "yamada",
"experience": [ {"lang": "Java"}, {"lang": "Scala"} ]
}}
SELECT
T2.name AS name,
T2.experience.lang AS lang
FROM (
SELECT
T1.`user`.name AS name,
FLATTEN(T1.`user`.experience) AS experience
FROM dfs.`/tmp/users.json` T1
) T2
Expand nested array as individual table
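The effect of FLATTEN here is one output row per array element, with the other columns repeated. A small sketch of that behaviour in plain Python, using the record from the slide (flatten_experience is a made-up name, and in this sketch a record without an experience array simply yields no rows):

```python
def flatten_experience(records):
    """Emit one row per element of user.experience, repeating user.name."""
    rows = []
    for record in records:
        user = record["user"]
        for item in user.get("experience", []):
            rows.append({"name": user["name"], "lang": item["lang"]})
    return rows


# Record copied from the slide.
RECORDS = [
    {"user": {"name": "yamada", "experience": [{"lang": "Java"}, {"lang": "Scala"}]}}
]
```

flatten_experience(RECORDS) gives two rows for yamada, one with lang Java and one with lang Scala.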
In the case of jq
$ cat users.json | jq '.user | select(.name == "yamada")'
Nested JSON in Drill brings complexity.
Maybe jq is better for simple queries?
Use cases
Action log
● Store action logs in a local file as JSON
● We can query them using Drill if necessary
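A minimal sketch of the writing side, assuming one JSON document per line so the file has the same shape as the users.json samples. The field names (timestamp, user, action) and the function name append_action are hypothetical, not taken from the slides.

```python
import json
import time


def append_action(path, user, action, **details):
    """Append one action as a single-line JSON document and return the record."""
    # Hypothetical schema: timestamp, user, action, plus any extra details.
    record = {"timestamp": int(time.time()), "user": user, "action": action}
    record.update(details)
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

A file written this way, for example /tmp/actions.json, could then be queried like the earlier examples with FROM dfs.`/tmp/actions.json`. Note that user has to be quoted with backticks in Drill, as in the nested JSON queries.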
Data warehouse
● Aggregate various data sources in Drill
● No data synchronization is needed
e.g. Access Elasticsearch through Hive
● elasticsearch-hadoop supports Hive
● Drill supports Hive
http://takezoe.hatenablog.com/entry/20150524/p1
Can we access Elasticsearch from Drill?
Conclusion
Apache Drill is
● a good tool for querying various datasets
● easy to set up and user friendly
● usable without up-front investment
● useful for small data, not only big data
Put Apache Drill into your toolbox!
