Introduction to Apache Hivemall v0.5.2 and v0.6
1. Introduction to Apache Hivemall v0.5.2 and v0.6
Principal Engineer
Makoto YUI @myui
@ApacheHivemall
Hadoop Conf Japan - Mar 14, 2019
2. We Open-source!
- Streaming log collector
- Bulk data import/export
- Efficient binary serialization
- Machine learning on Hadoop
- Workflow engine
- Embedded version of Fluentd
4. BigQuery ML at Google I/O 2018
https://ai.googleblog.com/2018/07/machine-learning-in-google-bigquery.html
5. Could I use ML-in-SQL in my cluster?
6. Open-source Machine Learning Solution for SQL-on-Hadoop
hivemall.apache.org (incubating)
7. Hivemall is a multi/cross-platform ML library that provides a rich set of functions
Supported interfaces: HiveQL, SparkSQL/DataFrame API, Pig Latin
13. New in v0.5.2 – Brickhouse UDFs
- JSON
- HyperLogLog
14. New in v0.5.2 – Field-aware Factorization Machines
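For readers unfamiliar with the model, here is a minimal Python sketch of how a field-aware factorization machine scores one example: each feature keeps a separate latent vector per field, and a pair of features (i, j) interacts through the vector of i for the field of j and the vector of j for the field of i. The function name `ffm_predict`, its argument layout, and the toy parameters are hypothetical and only illustrate the published FFM formula; they are not Hivemall's API or implementation.

```python
def ffm_predict(x, fields, w0, w, v):
    """Score one example with a field-aware factorization machine.

    x      : {feature_index: value} for the non-zero features
    fields : {feature_index: field_index}
    w0     : global bias
    w      : {feature_index: linear weight}
    v      : {(feature_index, field_index): latent vector (list of floats)}
    """
    # Linear part: bias plus weighted feature values
    score = w0 + sum(w[i] * xi for i, xi in x.items())
    idx = sorted(x)
    # Pairwise part: every unordered feature pair (i, j) with i < j
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            vi = v[(i, fields[j])]  # latent vector of i for the field of j
            vj = v[(j, fields[i])]  # latent vector of j for the field of i
            dot = sum(p * q for p, q in zip(vi, vj))
            score += dot * x[i] * x[j]
    return score


# Toy parameters, for illustration only
x = {0: 1.0, 1: 1.0}
fields = {0: 0, 1: 1}
w0 = 0.5
w = {0: 0.25, 1: -0.25}
v = {(0, 1): [1.0, 2.0], (1, 0): [0.5, 0.25]}
score = ffm_predict(x, fields, w0, w, v)
print(score)
```

For binary classification the raw score would typically be passed through a sigmoid to obtain a probability.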
15. New in v0.5.2 – Okapi BM25 term weighting
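As a reminder of what BM25 term weighting computes, here is a minimal Python sketch of the standard Okapi BM25 weight for one term in one document. The function name `bm25_weight`, the default parameters `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the corpus statistics are assumptions for illustration; the slide does not show Hivemall's function signature or the exact variant it implements.

```python
import math


def bm25_weight(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 weight of a single term in a single document.

    tf          : term frequency in the document
    doc_len     : length of the document (in terms)
    avg_doc_len : average document length in the corpus
    num_docs    : number of documents in the corpus
    doc_freq    : number of documents containing the term
    """
    # Smoothed IDF; the "+ 1" inside the log keeps the weight non-negative
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Document-length normalization controlled by b
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation controlled by k1
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Hypothetical corpus statistics, for illustration only
rare = bm25_weight(tf=3, doc_len=100, avg_doc_len=120, num_docs=1000, doc_freq=10)
common = bm25_weight(tf=3, doc_len=100, avg_doc_len=120, num_docs=1000, doc_freq=900)
print(rare > common)
```

Unlike plain TF-IDF, the weight saturates as `tf` grows and is discounted for documents longer than average.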
16. Plan for v0.6
Release in April-May, 2019
- New state-of-the-art optimizers like AdamHD (merged)
- Gradient boosting
- Stable XGBoost support
- More efficient sparse vector support in RandomForest
- Spark 2.4 support
17. XGBoost support in Hivemall (beta version)
-- train XGBoost models
SELECT train_xgboost_classifier(features, label) AS (model_id, model)
FROM training_data;
-- average the predictions of all models for each test record
SELECT rowid, AVG(predicted) AS predicted
FROM (
  -- predict with each model
  SELECT xgboost_predict(rowid, features, model_id, model) AS (rowid, predicted)
  -- join each test record with each model
  FROM xgboost_models CROSS JOIN test_data_with_id
) t
GROUP BY rowid;
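The prediction query on the XGBoost slide scores every test record with every model and then averages the scores per rowid. A minimal Python sketch of that aggregation step follows, assuming the per-model predictions are already available as (rowid, predicted) pairs. The function name `average_predictions` and the sample values are hypothetical and stand in for the GROUP BY rowid / AVG(predicted) part of the query only.

```python
from collections import defaultdict


def average_predictions(rows):
    """Average per-model predictions for each row id.

    rows: iterable of (rowid, predicted) pairs, one pair per
          (test record, model) combination, as a CROSS JOIN would produce.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rowid, predicted in rows:
        sums[rowid] += predicted
        counts[rowid] += 1
    # Equivalent of: SELECT rowid, AVG(predicted) ... GROUP BY rowid
    return {rowid: sums[rowid] / counts[rowid] for rowid in sums}


# Hypothetical outputs of two models on two test records
per_model = [(1, 0.5), (1, 0.75), (2, 0.25), (2, 0.25)]
averaged = average_predictions(per_model)
print(averaged)
```

Averaging is the simplest way to combine several independently trained models; a weighted average or majority vote would use the same join-then-aggregate shape.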
18. Future work (v0.7 or later)
- Word2Vec support
- Multi-class Logistic Regression
- Hyperparameter tuning (e.g., grid search)
- YARN application/standalone Hivemall
PR#91
PR#116
19. We are hiring.
Engineer (Java/Scala/Ruby), Data Scientist, Sales Engineer, SRE, Support Engineer