Copyright ©2015 Treasure Data. All Rights Reserved.
20 min. Introduction to Hivemall
TD tech talk #3 @Retty, 2015/05/14
Makoto YUI @myui
Research Engineer, Treasure Data Inc.
http://myui.github.io/

Who am I?
➢ 2015/04 Joined Treasure Data, Inc.
  ➢ 1st Research Engineer in Treasure Data
  ➢ My mission in TD is developing ML-as-a-Service (MLaaS)
➢ 2010/04-2015/03 Senior Researcher at the National Institute of Advanced Industrial Science and Technology, Japan
  ➢ Worked on a large-scale machine learning project and parallel databases
➢ 2009/03 Ph.D. in Computer Science from NAIST
  ➢ My research topic was building an XML native database and parallel database systems
➢ Super programmer award from the MITOU Foundation (a government-funded program for finding young and talented programmers)
  ➢ Super creators in Treasure Data: Sada Furuhashi, Keisuke Nishida

Figures of Imported Data in Treasure Data
[Chart: imported records over time, unit: billion records, y-axis 0 to 12,000, x-axis Aug 2012 to Oct 2014. Annotated milestones: Service in, Series A Funding, Reached 100 customers, Selected as "Cool Vendor in Big Data" by Gartner, 5 trillion records, 10 trillion records]
Figures as of Oct. 2014
• 400 thousand (40万) records imported every second
• 10+ trillion (10兆) records imported in total
• 12 billion (120億) records sent by a single ad-tech company

The latest numbers in Treasure Data
• 100+ customers in Japan
• 15 trillion stored records
• 4,000 nodes from which a single company sends data to us
• 500,000 records stored per second

Plan of the Talk
1. Brief introduction to Hivemall
2. How to use Hivemall
3. Real-time prediction w/ Hivemall and RDBMS

What is Hivemall
Scalable machine learning library built on top of Apache Hive, licensed under the Apache License v2
Check http://github.com/myui/hivemall
[Diagram: software stack, top to bottom]
• Machine Learning: Hivemall
• Query Processing: Hive / Pig
• Parallel Data Processing Framework: MapReduce (MRv1), MR v2, Apache Tez (DAG processing)
• Resource Management: Apache YARN
• Distributed File System: Hadoop HDFS

MapReduce and DAG engine
[Diagram: two pipelines of map (M) and reduce (R) stages. MapReduce reads from and writes to HDFS between jobs; the DAG engine (Tez/Spark) touches HDFS only at the start and end]
DAG engine (Tez/Spark): No intermediate DFS reads/writes!

The key characteristic of Hivemall
Very easy to use; Machine Learning on SQL
Classification with Mahout: 100+ lines of code
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features,label,..) as (feature,weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
This SQL query automatically runs in parallel on Hadoop
✓ Machine Learning made easy for SQL developers (ML for the rest of us)
✓ APIs are very stable because of SQL abstraction

List of functions in Hivemall v0.3
• Classification (both binary- and multi-class)
  ✓ Perceptron
  ✓ Passive Aggressive (PA)
  ✓ Confidence Weighted (CW)
  ✓ Adaptive Regularization of Weight Vectors (AROW)
  ✓ Soft Confidence Weighted (SCW)
  ✓ AdaGrad+RDA
• Regression
  ✓ Logistic Regression (SGD)
  ✓ PA Regression
  ✓ AROW Regression
  ✓ AdaGrad
  ✓ AdaDELTA
• kNN and Recommendation
  ✓ Minhash and b-Bit Minhash (LSH variant)
  ✓ Similarity Search using K-NN
  ✓ Matrix Factorization
• Feature engineering
  ✓ Feature hashing
  ✓ Feature scaling (normalization, z-score)
  ✓ TF-IDF vectorizer
Treasure Data will support Hivemall v0.3.1 next week!
bit.ly/hivemall-mf

Hivemall on Apache Pig
• Contribution from Daniel Dai (Pig PMC) of Hortonworks
• To be supported from Pig 0.15

Plan of the Talk
1. Brief introduction to Hivemall
2. How to use Hivemall
3. Real-time prediction w/ Hivemall and RDBMS

How to use Hivemall
[Diagram: in training, labels and feature vectors go through machine learning to produce a prediction model; in prediction, the model maps a feature vector to a label. Highlighted step: Data preparation]

How to use Hivemall - Data preparation
Define a Hive table for training/testing data
CREATE EXTERNAL TABLE e2006tfidf_train (
  rowid int,
  label float,
  features ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ","
STORED AS TEXTFILE LOCATION '/dataset/E2006-tfidf/train';

How to use Hivemall
[Diagram: in training, labels and feature vectors go through machine learning to produce a prediction model; in prediction, the model maps a feature vector to a label. Highlighted step: Feature Engineering]

How to use Hivemall - Feature Engineering
Applying a Min-Max Feature Normalization
create view e2006tfidf_train_scaled
as
select
  rowid,
  rescale(label,${min_label},${max_label}) as label,
  features
from
  e2006tfidf_train;
Transforming a label value to a value between 0.0 and 1.0

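As a rough illustration of what the view computes, min-max normalization maps a value v to (v - min) / (max - min). The Python sketch below assumes that this is what `rescale` does; the helper name `rescale_min_max`, the handling of a zero-width range, and the sample values are hypothetical and not taken from the slides.

```python
def rescale_min_max(value: float, min_value: float, max_value: float) -> float:
    """Map value from [min_value, max_value] onto [0.0, 1.0]."""
    if max_value == min_value:
        # Degenerate range: returning the midpoint is an assumption of this sketch.
        return 0.5
    return (value - min_value) / (max_value - min_value)


# Hypothetical label values; min/max stand in for ${min_label} and ${max_label}.
labels = [2.0, 5.0, 10.0]
min_label, max_label = min(labels), max(labels)
scaled = [rescale_min_max(v, min_label, max_label) for v in labels]
```

In the Hive query, `${min_label}` and `${max_label}` would have to be computed beforehand (for example with a separate min/max query) and passed in as variables.
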
How to use Hivemall
[Diagram: in training, labels and feature vectors go through machine learning to produce a prediction model; in prediction, the model maps a feature vector to a label. Highlighted step: Training]

How to use Hivemall - Training
Training by logistic regression
CREATE TABLE lr_model AS
SELECT
  feature,
  avg(weight) as weight
FROM (
  SELECT logress(features,label,..)
    as (feature,weight)
  FROM train
) t
GROUP BY feature
• Inner SELECT: map-only task to learn a prediction model
• GROUP BY feature: shuffle map outputs to reducers by feature
• avg(weight): reducers perform model averaging in parallel

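The map, shuffle, and reduce steps of this query can be mimicked in a few lines of plain Python. This is only a sketch of the averaging step: the per-mapper (feature, weight) rows are made-up numbers, and `average_models` is a hypothetical helper, not a Hivemall function.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (feature, weight) rows, as if emitted by three independent map tasks
# that each trained a local model on its own split of the training data.
mapper_outputs = [
    [("price", 0.5), ("size", -0.25)],
    [("price", 1.0), ("size", -0.75), ("color", 0.5)],
    [("price", 0.0), ("color", 1.5)],
]


def average_models(outputs):
    """Emulate GROUP BY feature with avg(weight)."""
    grouped = defaultdict(list)
    for rows in outputs:
        for feature, weight in rows:
            # "Shuffle": every weight for the same feature lands in the same group.
            grouped[feature].append(weight)
    # "Reduce": each group is averaged independently, so groups can run in parallel.
    return {feature: mean(weights) for feature, weights in grouped.items()}


model = average_models(mapper_outputs)
```

As with SQL `avg`, a feature that a mapper never emitted is simply absent from that group rather than counted as zero.
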
How to use Hivemall - Training
Training of Confidence Weighted Classifier
CREATE TABLE news20b_cw_model1 AS
SELECT
  feature,
  voted_avg(weight) as weight
FROM
  (SELECT
     train_cw(features,label)
       as (feature,weight)
   FROM
     news20b_train
  ) t
GROUP BY feature
• train_cw(features,label): training for the CW classifier
• voted_avg(weight): vote to use negative or positive weights for avg (e.g., +0.7, +0.3, +0.2, -0.1, +0.7)

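A minimal Python sketch of the voting idea follows. It assumes that `voted_avg` picks the sign held by the majority of weights and averages only the weights of that sign; the treatment of zeros and ties here is a guess for illustration, not something the slides state.

```python
def voted_avg(weights):
    """Average only the weights whose sign wins a majority vote.

    Assumed semantics: zeros are ignored, and a tie falls back to the negative side.
    """
    positives = [w for w in weights if w > 0]
    negatives = [w for w in weights if w < 0]
    if len(positives) > len(negatives):
        return sum(positives) / len(positives)
    if not negatives:
        return 0.0
    return sum(negatives) / len(negatives)


# Weights from the slide: four positive votes beat one negative vote,
# so only the four positive weights are averaged.
result = voted_avg([0.7, 0.3, 0.2, -0.1, 0.7])
```

Under these assumptions the slide's example averages 0.7, 0.3, 0.2, and 0.7 and ignores the single negative weight.
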
Ensemble learning for stable prediction performance
create table news20mc_ensemble_model1 as
select
  label,
  cast(feature as int) as feature,
  cast(voted_avg(weight) as float) as weight
from
  (select
     train_multiclass_cw(addBias(features),label)
       as (label,feature,weight)
   from
     news20mc_train_x3
   union all
   select
     train_multiclass_arow(addBias(features),label)
       as (label,feature,weight)
   from
     news20mc_train_x3
   union all
   select
     train_multiclass_scw(addBias(features),label)
       as (label,feature,weight)
   from
     news20mc_train_x3
  ) t
group by label, feature;
Just stack prediction models by union all

How to use Hivemall
[Diagram: in training, labels and feature vectors go through machine learning to produce a prediction model; in prediction, the model maps a feature vector to a label. Highlighted step: Prediction]

How to use Hivemall - Prediction
CREATE TABLE lr_predict
as
SELECT
  t.rowid,
  sigmoid(sum(m.weight)) as prob
FROM
  testing_exploded t LEFT OUTER JOIN
  lr_model m ON (t.feature = m.feature)
GROUP BY
  t.rowid
Prediction is done by LEFT OUTER JOIN between test data and prediction model
No need to load the entire model into memory

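To make the join-based prediction concrete, here is a small Python emulation of the query. The model weights and test rows are invented for illustration, and `predict` is a hypothetical helper; a dictionary lookup stands in for the join.

```python
import math
from collections import defaultdict


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical model table: feature -> weight.
lr_model = {"f1": 0.8, "f2": -0.3, "f3": 1.2}

# Hypothetical exploded test table: one (rowid, feature) row per active feature.
testing_exploded = [
    (1, "f1"), (1, "f2"),
    (2, "f3"), (2, "f9"),  # "f9" has no weight in the model
]


def predict(rows, model):
    """Emulate LEFT OUTER JOIN on feature, then sigmoid(sum(weight)) GROUP BY rowid."""
    totals = defaultdict(float)
    for rowid, feature in rows:
        weight = model.get(feature)  # None plays the role of SQL NULL
        # SQL sum() skips NULLs; a row with no matching feature at all would give
        # NULL in SQL, whereas this sketch treats it as 0.0.
        totals[rowid] += weight if weight is not None else 0.0
    return {rowid: sigmoid(total) for rowid, total in totals.items()}


lr_predict = predict(testing_exploded, lr_model)
```

Each test row only touches the weights of its own features, which is the point behind not loading the entire model into memory.
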
Plan of the Talk
1. Brief introduction to Hivemall
2. How to use Hivemall
3. Real-time prediction w/ Hivemall and RDBMS

Type/Purpose Matrix of Machine Learning
                   | Online Learning                                   | Offline Learning
Online Prediction  | Algorithm trade (HFT); Twitter real-time analysis | Ad-tech (e.g., CTR/CVR prediction); real-time recommendation
Offline Prediction | No/few needs?                                     | Daily/weekly batch systems; business analytics/reporting

How to use Hivemall
[Diagram: batch training on Hadoop builds the prediction model from labels and feature vectors; the prediction model is exported; online prediction on an RDBMS maps a feature vector to a label]

Export Prediction Model to an RDBMS
hive> desc news20b_cw_model1;
feature int
weight double
Sample rows (feature, weight):
103 -0.4896543622016907
104 -0.0955817922949791
105 0.12560302019119263
106 0.09214721620082855
TD export → Any RDBMS
Periodical export is very easy in Treasure Data

Real-time Prediction on MySQL
#2 Preparing a test data table
hive> desc testing_exploded;
feature string
value float
#3 Online prediction on MySQL
SELECT
  sigmoid(sum(t.value * m.weight)) as prob
FROM
  testing_exploded t LEFT OUTER JOIN
  prediction_model m ON (t.feature = m.feature)
SIGMOID(x) = 1.0 / (1.0 + exp(-x))
[Diagram: the prediction model maps a feature vector to a label]
You can alternatively define the testing target as a SQL view
Index lookups are very efficient in RDBMSs
http://bit.ly/hivemall-rtp

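The same query can be tried against any relational database. The sketch below uses Python's built-in `sqlite3` module in place of MySQL, purely so that it runs without a server. The weights are the sample rows from the exported model shown earlier; the test rows, the `TEXT` column type for `feature`, and the registered `sigmoid` function are assumptions of this example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite has no built-in sigmoid, so register one as a user-defined function.
# In MySQL the expression 1.0 / (1.0 + EXP(-x)) could be written inline instead.
conn.create_function(
    "sigmoid", 1, lambda x: None if x is None else 1.0 / (1.0 + math.exp(-x))
)

# PRIMARY KEY gives the model table an index on feature for fast lookups.
conn.execute("CREATE TABLE prediction_model (feature TEXT PRIMARY KEY, weight REAL)")
conn.execute("CREATE TABLE testing_exploded (feature TEXT, value REAL)")

conn.executemany(
    "INSERT INTO prediction_model VALUES (?, ?)",
    [
        ("103", -0.4896543622016907),
        ("104", -0.0955817922949791),
        ("105", 0.12560302019119263),
        ("106", 0.09214721620082855),
    ],
)
# Hypothetical test instance; feature "999" is not in the model and joins to NULL.
conn.executemany(
    "INSERT INTO testing_exploded VALUES (?, ?)",
    [("103", 1.0), ("105", 2.0), ("999", 1.0)],
)

(prob,) = conn.execute(
    """
    SELECT sigmoid(sum(t.value * m.weight)) AS prob
    FROM testing_exploded t
    LEFT OUTER JOIN prediction_model m ON (t.feature = m.feature)
    """
).fetchone()
```

Because the model is keyed by `feature`, each prediction costs a handful of index lookups regardless of how large the model table is.
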
Cost of Amazon Machine Learning
• Data Analysis and Model Building Fees: $0.42/instance per hour
• Batch Prediction: $0.1/1000 requests
• Real-time Prediction: $0.0001 per request
Amazon-ML is suspected to be based on Vowpal Wabbit (single process)
Pay-per-request is apparently not suitable for doing prediction for each web request (e.g. online CTR prediction)

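A quick back-of-the-envelope calculation shows why per-request pricing adds up. The price is the real-time figure quoted above; the traffic level of 1,000 requests per second is a hypothetical number chosen only for illustration.

```python
REALTIME_PRICE_PER_REQUEST = 0.0001  # USD, real-time prediction price from the slide
SECONDS_PER_DAY = 24 * 60 * 60


def daily_realtime_cost(requests_per_second: float) -> float:
    """Daily cost in USD if every web request triggers one real-time prediction."""
    return requests_per_second * SECONDS_PER_DAY * REALTIME_PRICE_PER_REQUEST


# Hypothetical traffic level: 1,000 web requests per second.
cost = daily_realtime_cost(1_000)
```

At that assumed rate the service would handle 86.4 million requests a day, or roughly $8,640 per day in prediction fees alone.
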
Real-time Prediction on Treasure Data
[Diagram: run batch training job periodically → periodical export → real-time prediction on an RDBMS]

Beyond Query-as-a-Service!
We ❤️ Open-source! We Invented ..
We are Hiring!

Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 

Último (20)

UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 

Introduction to Hivemall

  • 1. 20 min. Introduction to Hivemall. Makoto YUI (@myui), Research Engineer, Treasure Data Inc. TD tech talk #3 @Retty, 2015/05/14. http://myui.github.io/ Copyright ©2015 Treasure Data. All Rights Reserved.
  • 2. Who am I? 2015/04: Joined Treasure Data, Inc. as its first Research Engineer; my mission at TD is developing ML-as-a-Service (MLaaS). 2010/04-2015/03: Senior Researcher at the National Institute of Advanced Industrial Science and Technology, Japan; worked on a large-scale machine learning project and parallel databases. 2009/03: Ph.D. in Computer Science from NAIST; my research topic was building an XML native database and parallel database systems. Super programmer award from the MITOU Foundation (a government-funded program for finding young and talented programmers). Super creators at Treasure Data: Sada Furuhashi, Keisuke Nishida.
  • 3. Figures of Imported Data in Treasure Data. [Chart: imported records in billions, Aug 2012 to Oct 2014, annotated with: Service in; Series A Funding; Reached 100 customers; Selected as "Cool Vendor in Big Data" by Gartner; 5 trillion records; 10 trillion records.] Figures as of Oct. 2014: 400,000 (40万) records imported every second; 10+ trillion (10兆) records imported in total; 12 billion (120億) records sent by an ad-tech company.
  • 4. The latest numbers in Treasure Data: 100+ customers in Japan; 15 trillion stored records; a single company sends data to us from 4,000 nodes; 500,000 records stored per second.
  • 5. Plan of the Talk: 1. Brief introduction to Hivemall; 2. How to use Hivemall; 3. Real-time prediction w/ Hivemall and RDBMS.
  • 6. What is Hivemall: a scalable machine learning library built on top of Apache Hive, licensed under the Apache License v2. Check http://github.com/myui/hivemall [Stack diagram, top to bottom. Machine Learning: Hivemall. Query Processing: Hive / Pig. Parallel Data Processing Framework: MapReduce (MRv1), MR v2, Apache Tez (DAG processing). Resource Management: Apache YARN. Distributed File System: Hadoop HDFS.]
  • 7. MapReduce and DAG engine. [Diagram: MapReduce chains map (M) and reduce (R) stages through HDFS, while a DAG engine (Tez/Spark) connects the stages directly.] No intermediate DFS reads/writes!
  • 8. The key characteristic of Hivemall: very easy to use; machine learning on SQL. Classification with Mahout takes 100+ lines of code, versus the following query in Hivemall:
      CREATE TABLE lr_model AS
      SELECT
        feature,
        avg(weight) as weight -- reducers perform model averaging in parallel
      FROM (
        SELECT logress(features,label,..) as (feature,weight)
        FROM train
      ) t -- map-only task
      GROUP BY feature; -- shuffled to reducers
    This SQL query automatically runs in parallel on Hadoop. Machine learning made easy for SQL developers (ML for the rest of us). APIs are very stable because of SQL abstraction.
  • 9. List of functions in Hivemall v0.3. Classification (both binary- and multi-class): Perceptron, Passive Aggressive (PA), Confidence Weighted (CW), Adaptive Regularization of Weight Vectors (AROW), Soft Confidence Weighted (SCW), AdaGrad+RDA. Regression: Logistic Regression (SGD), PA Regression, AROW Regression, AdaGrad, AdaDELTA. kNN and Recommendation: Minhash and b-Bit Minhash (LSH variant), Similarity Search using K-NN, Matrix Factorization. Feature engineering: Feature hashing, Feature scaling (normalization, z-score), TF-IDF vectorizer. Treasure Data will support Hivemall v0.3.1 next week! bit.ly/hivemall-mf
  • 10. Hivemall on Apache Pig: contributed by Daniel Dai (Pig PMC) of Hortonworks; to be supported from Pig 0.15.
  • 11. Plan of the Talk: 1. Brief introduction to Hivemall; 2. How to use Hivemall; 3. Real-time prediction w/ Hivemall and RDBMS.
  • 12. How to use Hivemall: Data preparation. [Workflow diagram. Training: labels and feature vectors go into machine learning, which produces a prediction model. Prediction: the prediction model maps a feature vector to a label.]
  • 13. How to use Hivemall - Data preparation: define a Hive table for training/testing data.
      CREATE EXTERNAL TABLE e2006tfidf_train (
        rowid int,
        label float,
        features ARRAY<STRING>
      )
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t'
        COLLECTION ITEMS TERMINATED BY ","
      STORED AS TEXTFILE LOCATION '/dataset/E2006-tfidf/train';
  • 14. How to use Hivemall: Feature Engineering. [Same training/prediction workflow diagram as slide 12, at the feature engineering step.]
  • 15. How to use Hivemall - Feature Engineering: applying min-max normalization, which transforms a label value to a value between 0.0 and 1.0.
      create view e2006tfidf_train_scaled as
      select
        rowid,
        rescale(label,${min_label},${max_label}) as label,
        features
      from e2006tfidf_train;
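The min-max rescaling on slide 15 can be sketched in plain Python. This is only an illustration, assuming `rescale(value, min, max)` computes `(value - min) / (max - min)`; the helper name `rescale_min_max`, the handling of a zero-width range, and the sample labels are hypothetical and not taken from Hivemall.

```python
def rescale_min_max(value: float, min_value: float, max_value: float) -> float:
    """Map value from [min_value, max_value] onto [0.0, 1.0]."""
    if max_value == min_value:
        # Degenerate range: an assumption made here to avoid dividing by zero.
        return 0.5
    return (value - min_value) / (max_value - min_value)


# Hypothetical label values; in the slide, the bounds come from the
# ${min_label} and ${max_label} query variables.
labels = [-7.9, -3.2, -0.6]
lo, hi = min(labels), max(labels)
scaled = [rescale_min_max(v, lo, hi) for v in labels]
```

The smallest label maps to 0.0 and the largest to 1.0, which is the range a logistic regression target needs.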
  • 16. How to use Hivemall: Training. [Same training/prediction workflow diagram as slide 12, at the training step.]
  • 17. How to use Hivemall - Training: training by logistic regression.
      CREATE TABLE lr_model AS
      SELECT
        feature,
        avg(weight) as weight
      FROM (
        SELECT logress(features,label,..) as (feature,weight)
        FROM train
      ) t
      GROUP BY feature
    The inner SELECT runs as a map-only task to learn a prediction model; map outputs are shuffled to reducers by feature; the reducers perform model averaging in parallel.
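The map-side training and reduce-side averaging described on slide 17 can be sketched in plain Python. This is a toy illustration under stated assumptions, not Hivemall's implementation: `train_logress` is a hypothetical stand-in for `logress`, the learning rate, epoch count, and feature names are made up, and each "split" plays the role of one map task.

```python
import math
from collections import defaultdict


def train_logress(rows, eta=0.1, epochs=20):
    """Toy SGD logistic regression over sparse rows of (feature_names, label)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            score = sum(weights[f] for f in features)
            prob = 1.0 / (1.0 + math.exp(-score))
            gradient = label - prob
            for f in features:
                weights[f] += eta * gradient
    return list(weights.items())  # emitted as (feature, weight) pairs


def average_by_feature(pairs):
    """Reducer side: the equivalent of GROUP BY feature with avg(weight)."""
    total, count = defaultdict(float), defaultdict(int)
    for feature, weight in pairs:
        total[feature] += weight
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


# Two hypothetical input splits, one per map task.
split_a = [(["clicked_before", "mobile"], 1.0), (["desktop"], 0.0)]
split_b = [(["clicked_before"], 1.0), (["desktop", "mobile"], 0.0)]

emitted = train_logress(split_a) + train_logress(split_b)
lr_model = average_by_feature(emitted)
```

Each split learns its own weights without coordination, so the only cross-node step is the per-feature average, which is what lets the query parallelize as an ordinary GROUP BY.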
  • 18. How to use Hivemall - Training: training of a Confidence Weighted (CW) classifier.
      CREATE TABLE news20b_cw_model1 AS
      SELECT
        feature,
        voted_avg(weight) as weight
      FROM (
        SELECT train_cw(features,label) as (feature,weight)
        FROM news20b_train
      ) t
      GROUP BY feature
    voted_avg votes on whether to use the negative or the positive weights for the average, e.g. over +0.7, +0.3, +0.2, -0.1, +0.7.
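The voting step on slide 18 can be sketched as follows. This is a minimal sketch, assuming `voted_avg` averages only the weights whose sign is in the majority; the tie-breaking rule and the empty-input result are assumptions of this sketch, and the function here is a plain Python stand-in rather than the Hivemall UDAF.

```python
def voted_avg(weights):
    """Average only the weights whose sign wins the majority vote."""
    positives = [w for w in weights if w > 0]
    negatives = [w for w in weights if w < 0]
    # Assumption: ties fall to the negative side.
    winners = positives if len(positives) > len(negatives) else negatives
    if not winners:
        return 0.0
    return sum(winners) / len(winners)


# The five weights shown on the slide for one feature.
slide_weights = [0.7, 0.3, 0.2, -0.1, 0.7]
result = voted_avg(slide_weights)
```

For the slide's weights the positive side wins four to one, so the result is about 0.475, whereas a plain average over all five values would be about 0.36.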
  • 19. Ensemble learning for stable prediction performance: just stack prediction models by union all.
      create table news20mc_ensemble_model1 as
      select
        label,
        cast(feature as int) as feature,
        cast(voted_avg(weight) as float) as weight
      from (
        select train_multiclass_cw(addBias(features),label) as (label,feature,weight)
        from news20mc_train_x3
        union all
        select train_multiclass_arow(addBias(features),label) as (label,feature,weight)
        from news20mc_train_x3
        union all
        select train_multiclass_scw(addBias(features),label) as (label,feature,weight)
        from news20mc_train_x3
      ) t
      group by label, feature;
  • 20. How to use Hivemall: Prediction. [Same training/prediction workflow diagram as slide 12, at the prediction step.]
  • 21. How to use Hivemall - Prediction: prediction is done by a LEFT OUTER JOIN between the test data and the prediction model, so there is no need to load the entire model into memory.
      CREATE TABLE lr_predict AS
      SELECT
        t.rowid,
        sigmoid(sum(m.weight)) as prob
      FROM testing_exploded t
        LEFT OUTER JOIN lr_model m ON (t.feature = m.feature)
      GROUP BY t.rowid
  • 22. Plan of the Talk: 1. Brief introduction to Hivemall; 2. How to use Hivemall; 3. Real-time prediction w/ Hivemall and RDBMS.
  • 23. Type/Purpose Matrix of Machine Learning. Online learning with online prediction: algorithm trade (HFT), Twitter real-time analysis. Offline learning with online prediction: ad-tech (e.g., CTR/CVR prediction), real-time recommendation. Online learning with offline prediction: no/few needs? Offline learning with offline prediction: daily/weekly batch systems, business analytics/reporting.
  • 24. How to use Hivemall: Export prediction model. [Workflow diagram: batch training on Hadoop produces a prediction model from labels and feature vectors; online prediction on an RDBMS uses the model to map a feature vector to a label.]
  • 25. Export Prediction Model to an RDBMS: periodical export to any RDBMS is very easy in Treasure Data (TD export).
      hive> desc news20b_cw_model1;
      feature  int
      weight   double
    Sample rows (feature, weight):
      103  -0.4896543622016907
      104  -0.0955817922949791
      105   0.12560302019119263
      106   0.09214721620082855
  • 26. Real-time Prediction on MySQL. #2 Preparing a test data table:
      hive> desc testing_exploded;
      feature  string
      value    float
    #3 Online prediction on MySQL, where SIGMOID(x) = 1.0 / (1.0 + exp(-x)):
      SELECT
        sigmoid(sum(t.value * m.weight)) as prob
      FROM testing_exploded t
        LEFT OUTER JOIN prediction_model m ON (t.feature = m.feature)
    You can alternatively define a SQL view for the testing target. Index lookups are very efficient in RDBMSs. http://bit.ly/hivemall-rtp
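The online prediction query on slide 26 can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: SQLite stands in for MySQL, the `sigmoid` function is registered by hand because SQLite does not ship one, the two model weights are borrowed from the sample rows on slide 25, and the request features and values are hypothetical.

```python
import math
import sqlite3


def sigmoid(x):
    # sum() over no matching rows yields NULL, so pass None through.
    if x is None:
        return None
    return 1.0 / (1.0 + math.exp(-x))


conn = sqlite3.connect(":memory:")
conn.create_function("sigmoid", 1, sigmoid)

conn.executescript("""
    CREATE TABLE prediction_model (feature TEXT PRIMARY KEY, weight REAL);
    CREATE TABLE testing_exploded (feature TEXT, value REAL);
""")
# Weights borrowed from the exported sample rows on slide 25.
conn.executemany(
    "INSERT INTO prediction_model VALUES (?, ?)",
    [("103", -0.4896543622016907), ("105", 0.12560302019119263)],
)
# One hypothetical request; feature "999" is absent from the model.
conn.executemany(
    "INSERT INTO testing_exploded VALUES (?, ?)",
    [("103", 1.0), ("105", 2.0), ("999", 1.0)],
)

(prob,) = conn.execute("""
    SELECT sigmoid(sum(t.value * m.weight)) AS prob
    FROM testing_exploded t
    LEFT OUTER JOIN prediction_model m ON (t.feature = m.feature)
""").fetchone()
```

The primary key on `feature` gives the join an index to look up, and a feature missing from the model contributes NULL, which `sum` skips, so unseen features simply do not affect the score.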
  • 27. Cost of Amazon Machine Learning. Data analysis and model building fees: $0.42 per instance per hour. Batch prediction: $0.1 per 1,000 requests. Real-time prediction: $0.0001 per request. Amazon-ML is suspected to be based on Vowpal Wabbit (single process). Pay-per-request is apparently not suitable for doing prediction for each web request (e.g. online CTR prediction).
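A quick worked calculation shows why per-request pricing adds up for per-web-request prediction. The two prices are the ones quoted on slide 27; the traffic level of 1,000 predictions per second is a hypothetical figure chosen only for illustration.

```python
REAL_TIME_PRICE_PER_REQUEST = 0.0001   # USD, from the slide
BATCH_PRICE_PER_1000_REQUESTS = 0.1    # USD, from the slide


def real_time_cost(requests: int) -> float:
    """Real-time prediction fee in USD for a number of requests."""
    return requests * REAL_TIME_PRICE_PER_REQUEST


def batch_cost(requests: int) -> float:
    """Batch prediction fee in USD for a number of requests."""
    return requests / 1000 * BATCH_PRICE_PER_1000_REQUESTS


# Hypothetical load: 1,000 predictions per second for a 30-day month.
requests_per_month = 1_000 * 60 * 60 * 24 * 30
monthly_real_time_usd = real_time_cost(requests_per_month)
```

Under that assumed load the real-time fee alone comes to roughly $259,200 per month, before the model building fees are counted.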
  • 28. Real-time Prediction on Treasure Data: run a batch training job periodically, export the model periodically, and do real-time prediction on an RDBMS.
  • 29. Beyond Query-as-a-Service! We ❤️ Open-source! We Invented .. We are Hiring!