Big Data Analytics
Using Mahout
Assoc. Prof. Dr. Thanachart Numnonda
Executive Director
IMC Institute
April 2015
2
Mahout
3
What is Mahout?
Mahout is a Java library that implements
machine learning techniques for
clustering, classification, and recommendation
4
Mahout in Apache Software
5
Why Mahout?
Apache License
Good Community
Good Documentation
Scalable
Extensible
Command Line Interface
Java Library
6
List of Algorithms
7
List of Algorithms
8
List of Algorithms
9
Mahout Architecture
10
Use Cases
11
Installing Mahout
12
13
Select the EC2 service and click Launch Instance
14
Choose My AMIs and select “Hadoop Lab Image”
15
Choose the m3.medium instance type for the virtual server
16
Leave configuration details as default
17
Add Storage: 20 GB
18
Name the instance
19
Select an existing security group > Select Security
Group Name: default
20
Click Launch and choose imchadoop as a key pair
21
Review the instance, then click Connect for
instructions on connecting to it
22
Connect to an instance from Mac/Linux
23
Connect to an instance from Windows using PuTTY
24
Connect to the instance
25
Install Maven
$ sudo apt-get install maven
$ mvn -v
26
Install Subversion
$ sudo apt-get install subversion
$ svn --version
27
Install Mahout
$ cd /usr/local/
$ sudo mkdir mahout
$ cd mahout
$ sudo svn co http://svn.apache.org/repos/asf/mahout/trunk
$ cd trunk
$ sudo mvn -DskipTests clean install
28
Install Mahout (cont.)
29
Edit the .bashrc file
$ sudo vi $HOME/.bashrc
$ exec bash
30
Running
Recommendation Algorithms
31
MovieLens
http://grouplens.org/datasets/movielens/
32
Architecture for Recommender Engine
33
Item-Based Recommendation
Step 1: Gather some test data
Step 2: Pick a similarity measure
Step 3: Configure the Mahout command
Step 4: Make use of the output and do more
with Mahout
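The four steps above can be sketched on a toy data set. This is a minimal illustration of item-based recommendation using plain co-occurrence counts as the similarity measure; it is not Mahout's implementation, and the ratings, user names, and function names are made up for the example.

```python
from collections import defaultdict

# Step 1: toy ratings as (user, item, rating); values are invented for illustration.
RATINGS = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 3),
    ("u3", "B", 4), ("u3", "C", 5),
]

def cooccurrence(ratings):
    """Step 2: count, for each ordered item pair, how many users rated both."""
    items_by_user = defaultdict(set)
    for user, item, _ in ratings:
        items_by_user[user].add(item)
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    counts[(a, b)] += 1
    return counts

def recommend(user, ratings, top_n=5):
    """Step 3: score each unseen item by the similarity-weighted mean of the user's ratings."""
    sims = cooccurrence(ratings)
    seen = {item: r for u, item, r in ratings if u == user}
    all_items = {item for _, item, _ in ratings}
    scores = {}
    for candidate in all_items - set(seen):
        num = sum(sims.get((candidate, item), 0) * r for item, r in seen.items())
        den = sum(sims.get((candidate, item), 0) for item in seen)
        if den > 0:
            scores[candidate] = num / den
    # Step 4: the ranked list is the output an application would consume.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

recommendations = recommend("u1", RATINGS)
```

Swapping `cooccurrence` for another measure changes only Step 2; the scoring loop stays the same.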
34
Preparing MovieLens data
$ wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
$ unzip ml-100k.zip
$ hadoop fs -mkdir /input
$ hadoop fs -put ml-100k/u.data /input/u.data
$ hadoop fs -mkdir /results
$ unset MAHOUT_LOCAL
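Before uploading, it can help to confirm the layout of u.data. In the ml-100k archive each line holds a user id, item id, rating, and timestamp separated by tabs. The sketch below parses a few sample lines of that shape; the sample values and the helper name `parse_ratings` are only illustrative.

```python
from collections import Counter

# Sample lines in the u.data layout: user id, item id, rating, timestamp (tab-separated).
SAMPLE = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n196\t377\t1\t878887116\n"

def parse_ratings(text):
    """Return (user_id, item_id, rating) tuples, ignoring the timestamp column."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        rows.append((int(user), int(item), float(rating)))
    return rows

ratings = parse_ratings(SAMPLE)
ratings_per_user = Counter(user for user, _, _ in ratings)
```

To run it on the real file, pass the contents of `ml-100k/u.data` instead of `SAMPLE`.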
35
Running Recommend Command
$ mahout recommenditembased -i /input/u.data \
    -o /results/itemRecom.txt -s SIMILARITY_LOGLIKELIHOOD \
    --tempDir /temp/recommend1
$ hadoop fs -ls /results/itemRecom.txt
36
View the result
$ hadoop fs -cat /results/itemRecom.txt/part-r-00000
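To use the result in an application, each output line has to be parsed. The sketch below assumes a line has the form `userID<TAB>[itemID:score,itemID:score,...]`; check this against your own part-r-00000 file before relying on it. The sample line and the function name are made up.

```python
def parse_recommendations(line):
    """Parse 'userID<TAB>[itemID:score,...]' into (user_id, [(item_id, score), ...])."""
    user, _, payload = line.strip().partition("\t")
    payload = payload.strip("[]")
    pairs = []
    for entry in payload.split(","):
        if not entry:
            continue  # an empty list "[]" yields one empty entry
        item, score = entry.split(":")
        pairs.append((int(item), float(score)))
    return int(user), pairs

sample = "1\t[1104:5.0,1201:4.5,1472:4.0]"  # invented values
user_id, recs = parse_recommendations(sample)
```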
37
Similarity class names
SIMILARITY_COOCCURRENCE
SIMILARITY_LOGLIKELIHOOD
SIMILARITY_TANIMOTO_COEFFICIENT
SIMILARITY_CITY_BLOCK
SIMILARITY_COSINE
SIMILARITY_PEARSON_CORRELATION
SIMILARITY_EUCLIDEAN_DISTANCE
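To give a feel for what some of these measures compute, here is a small sketch of the textbook Tanimoto coefficient, cosine similarity, and Euclidean distance between two items. These are the standard definitions, not Mahout's code; Mahout's classes may normalize or transform the values differently. The item vectors are invented.

```python
import math

def tanimoto(users_a, users_b):
    """Tanimoto (Jaccard) coefficient between the sets of users who rated each item."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

def cosine(ratings_a, ratings_b):
    """Cosine between two items' rating vectors, given as dicts of user -> rating."""
    shared = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(ratings_a, ratings_b):
    """Straight-line distance computed over the users who rated both items."""
    shared = ratings_a.keys() & ratings_b.keys()
    return math.sqrt(sum((ratings_a[u] - ratings_b[u]) ** 2 for u in shared))

item_a = {"u1": 4.0, "u2": 1.0}
item_b = {"u1": 1.0, "u2": 5.0, "u3": 2.0}
t = tanimoto(set(item_a), set(item_b))
c = cosine(item_a, item_b)
d = euclidean_distance(item_a, item_b)
```

Tanimoto looks only at who rated an item, so it suits data without preference values; cosine and Euclidean distance use the ratings themselves.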
38
Running Recommendation on
a single machine
$ export MAHOUT_LOCAL=true
$ mahout recommenditembased -i ml-100k/u.data \
    -o results/itemRecom.txt -s SIMILARITY_LOGLIKELIHOOD \
    --numRecommendations 5
$ cat results/itemRecom.txt/part-r-00000
39
Running
Example Program
Using CBayes classifier
40
Running Example Program
41
Preparing data
$ export WORK_DIR=/tmp/mahout-work-${USER}
$ mkdir -p ${WORK_DIR}
$ mkdir -p ${WORK_DIR}/20news-bydate
$ cd ${WORK_DIR}/20news-bydate
$ wget http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
$ tar -xzf 20news-bydate.tar.gz
$ mkdir ${WORK_DIR}/20news-all
$ cd
$ cp -R ${WORK_DIR}/20news-bydate/*/* ${WORK_DIR}/20news-all
42
Note: Running on MapReduce
To run in MapReduce mode, run the following commands before
the feature extraction commands:
$ unset MAHOUT_LOCAL
$ hadoop fs -put ${WORK_DIR}/20news-all ${WORK_DIR}/20news-all
43
Preparing the Sequence File
Mahout provides a utility, seqdirectory, that converts the input files into
the sequence file format. It takes two directories:
– The input directory, where the original data resides
– The output directory, where the converted sequence files are stored
44
Sequence Files
Sequence files are a binary encoding of key/value pairs. Each file starts
with a header containing metadata, which includes:
– Version
– Key class name
– Value class name
– Compression
To view a sequence file:
mahout seqdumper -i <input file> | more
45
Generate Vectors from Sequence Files
Mahout provides a command to create vector files from
sequence files:
mahout seq2sparse -i <input file path> -o <output file path>
Important options:
-lnorm  Whether output vectors should be log-normalized
-nv     Whether output vectors should be NamedVectors
-wt     The kind of weight to use: TF or TFIDF (default: TFIDF)
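To illustrate what the TFIDF weighting does, the sketch below computes the textbook weight tf * log(N / df) for a few toy documents. This is the classic formula, not necessarily the exact variant Mahout applies, and the documents and function name are invented for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors

docs = ["hockey game tonight", "space shuttle launch", "hockey playoffs"]
vectors = tf_idf(docs)
```

A term that appears in fewer documents ("game") gets a higher weight than one that appears in several ("hockey"), which is why TFIDF is usually preferred over raw TF for text classification and clustering.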
46
Extract Features
Convert the full 20 newsgroups dataset into a <Text, Text>
SequenceFile.
Convert and preprocess the dataset into a <Text,
VectorWritable> SequenceFile containing term frequencies for
each document.
47
Prepare Testing Dataset
Split the preprocessed dataset into training and testing sets.
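The idea of the split can be sketched as a seeded random shuffle followed by a cut. The 40% test fraction, the seed, and the function name below are assumptions chosen for illustration, not values taken from the slide.

```python
import random

def split_dataset(items, test_fraction=0.4, seed=42):
    """Shuffle a copy of items and split it into (training, testing) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_set, test_set = split_dataset(range(10), test_fraction=0.4)
```

Keeping the test documents out of training is what makes the later accuracy figure meaningful.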
48
Training process
Train the classifier.
49
Testing the result
Test the classifier.
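The train and test steps can be illustrated with a very small multinomial Naive Bayes classifier. Note that this is plain Naive Bayes with Laplace smoothing, not the complement variant (CBayes) used by the Mahout example, and the documents below are invented.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """Collect per-label document counts and term counts from (label, text) pairs."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labelled_docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, term_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under Laplace smoothing."""
    doc_counts, term_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in text.lower().split():
            if token in vocab:  # ignore terms never seen during training
                score += math.log((term_counts[label][token] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

TRAINING = [
    ("rec.sport.hockey", "goal puck ice team"),
    ("rec.sport.hockey", "team playoffs goal"),
    ("sci.space", "orbit shuttle launch"),
    ("sci.space", "launch rocket orbit moon"),
]
TESTING = [("rec.sport.hockey", "ice team goal"), ("sci.space", "rocket launch")]

model = train(TRAINING)
correct = sum(classify(text, model) == label for label, text in TESTING)
accuracy = correct / len(TESTING)
```

Testing reports how many held-out documents receive their true label; Mahout's test step additionally prints a confusion matrix.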
50
Dumping a vector file
We can dump vector files to plain text as follows:
mahout vectordump -i <input file> -o <output file>
Options
--useKey      If the key is a vector, then dump that instead
--csv         Output the vector as CSV
--dictionary  The dictionary file
51
Sample Output
52
Command line options
53
Command line options
54
Command line options
55
K-means clustering
56
Reuters Newswire
57
Preparing data
$ export WORK_DIR=/tmp/kmeans
$ mkdir $WORK_DIR
$ mkdir $WORK_DIR/reuters-out
$ cd $WORK_DIR
$ wget http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
$ mkdir $WORK_DIR/reuters-sgm
$ tar -xzf reuters21578.tar.gz -C $WORK_DIR/reuters-sgm
58
Convert input to a sequence file
$ mahout org.apache.lucene.benchmark.utils.ExtractReuters \
    $WORK_DIR/reuters-sgm $WORK_DIR/reuters-out
59
Convert input to a sequence file (cont.)
$ mahout seqdirectory -i $WORK_DIR/reuters-out \
    -o $WORK_DIR/reuters-out-seqdir -c UTF-8 -chunk 5
60
Create the sparse vector files
$ mahout seq2sparse -i $WORK_DIR/reuters-out-seqdir/ \
    -o $WORK_DIR/reuters-out-seqdir-sparse-kmeans \
    --maxDFPercent 85 --namedVector
61
Running K-Means
$ mahout kmeans \
    -i $WORK_DIR/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/ \
    -c $WORK_DIR/reuters-kmeans-clusters \
    -o $WORK_DIR/reuters-kmeans \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -x 10 -k 20 -ow
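What the command does can be sketched in a few lines: k-means with a cosine distance measure, a fixed number of clusters (-k), and a cap on iterations (-x). This is a plain-Python illustration, not Mahout's code. As a simplifying assumption it takes the first k points as initial centroids so the result is reproducible. The points are invented.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def kmeans(points, k, max_iterations=10):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: cosine_distance(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged before reaching the iteration cap
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

POINTS = [(1.0, 0.1), (0.1, 1.0), (0.9, 0.2), (0.2, 0.8)]
assignments, centroids = kmeans(POINTS, k=2)
```

Cosine distance compares the direction of the vectors rather than their length, which suits TF-IDF document vectors where document length varies widely.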
62
K-Means command line options
63
Viewing Result
$ mkdir $WORK_DIR/reuters-kmeans/clusteredPoints
$ mahout clusterdump \
    -i $WORK_DIR/reuters-kmeans/clusters-*-final \
    -o $WORK_DIR/reuters-kmeans/clusterdump \
    -d $WORK_DIR/reuters-out-seqdir-sparse-kmeans/dictionary.file-0 \
    -dt sequencefile -b 100 -n 20 --evaluate \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -sp 0 --pointsDir $WORK_DIR/reuters-kmeans/clusteredPoints
64
Viewing Result
65
Dumping a cluster file
We can dump cluster files to plain text as follows:
mahout clusterdump -i <input file> -o <output file>
Options
-of         The optional output format for the results:
            TEXT, CSV, JSON or GRAPH_ML
-dt         The dictionary file type
--evaluate  Run ClusterEvaluator
66
Canopy Clustering
67
Fuzzy k-means Clustering
68
Command line options
69
Exercise: Traffic Accidents Dataset
http://fimi.ua.ac.be/data/accidents.dat.gz
70
Import-Export RDBMS data
71
Sqoop Hands-On Labs
1. Loading Data into MySQL DB
2. Installing Sqoop
3. Configuring Sqoop
4. Installing DB driver for Sqoop
5. Importing data from MySQL to Hive Table
6. Reviewing data from Hive Table
7. Reviewing HDFS Database Table files
Thanachart Numnonda, thanachart@imcinstitute.com, Feb 2015. Big Data Hadoop on Amazon EMR – Hands On Workshop
1. MySQL RDS Server on AWS
An RDS server is running on AWS with the following
configuration:
> database: imc_db
> username: admin
> password: imcinstitute
> addr: imcinstitutedb.cmw65obdqfnx.us-west-2.rds.amazonaws.com
[This address may change]
73
1. country_tbl data
Testing data query from MySQL DB
Table name > country_tbl
74
2. Installing Sqoop
# wget http://apache.osuosl.org/sqoop/1.4.5/sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
# tar -xvzf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
# sudo mv sqoop-1.4.5.bin__hadoop-1.0.0 /usr/local/
# rm sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
75
Installing Sqoop
Edit $HOME/.bashrc
# sudo vi $HOME/.bashrc
76
3. Configuring Sqoop
ubuntu@ip-172-31-12-11:~$ cd /usr/local/sqoop-1.4.5.bin__hadoop-1.0.0/conf/
ubuntu@ip-172-31-12-11:~$ vi sqoop-env.sh
77
4. Installing DB driver for Sqoop
ubuntu@ip-172-31-12-11:~$ cd /usr/local/sqoop-1.4.5.bin__hadoop-1.0.0/lib/
ubuntu@ip-172-31-12-11:/usr/local/sqoop-1.4.5.bin__hadoop-1.0.0/lib$ wget https://www.dropbox.com/s/6zrp5nerrwfixcj/mysql-connector-java-5.1.23-bin.jar
ubuntu@ip-172-31-12-11:/usr/local/sqoop-1.4.5.bin__hadoop-1.0.0/lib$ exit
78
5. Importing data from MySQL to Hive Table
[hdadmin@localhost ~]$ sqoop import \
    --connect jdbc:mysql://imcinstitutedb.cmw65obdqfnx.us-west-2.rds.amazonaws.com/imc_db \
    --username admin -P --table country_tbl \
    --hive-import --hive-table country -m 1
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: $HADOOP_HOME is deprecated.
Enter password: <enter here>
79
6. Reviewing data from Hive Table
80
7. Reviewing HDFS Database Table files
Open a web browser at http://54.68.149.232:50070, then navigate to /user/hive/warehouse
81
Sqoop commands
82
Recommended Books
83
www.facebook.com/imcinstitute
84
Thank you
thanachart@imcinstitute.com
www.facebook.com/imcinstitute
www.slideshare.net/imcinstitute
www.thanachart.org

Más contenido relacionado

La actualidad más candente

Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 DigiGurukul
 
hierarchical bus system
 hierarchical bus system hierarchical bus system
hierarchical bus systemElvis Jonyo
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed databaseSonia Panesar
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Simplilearn
 
Dynamic storage allocation techniques
Dynamic storage allocation techniquesDynamic storage allocation techniques
Dynamic storage allocation techniquesShashwat Shriparv
 
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSINGINTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSINGAaqib Hussain
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AIVishal Singh
 

La actualidad más candente (20)

Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1
 
hierarchical bus system
 hierarchical bus system hierarchical bus system
hierarchical bus system
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed database
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Java awt
Java awtJava awt
Java awt
 
Virtualization in cloud computing
Virtualization in cloud computingVirtualization in cloud computing
Virtualization in cloud computing
 
ch3.ppt
ch3.pptch3.ppt
ch3.ppt
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Apache PIG
Apache PIGApache PIG
Apache PIG
 
Struts framework
Struts frameworkStruts framework
Struts framework
 
strong slot and filler
strong slot and fillerstrong slot and filler
strong slot and filler
 
Apache mahout
Apache mahoutApache mahout
Apache mahout
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Dynamic storage allocation techniques
Dynamic storage allocation techniquesDynamic storage allocation techniques
Dynamic storage allocation techniques
 
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSINGINTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE  AND JSP PROCESSING
INTRODUCTION TO JSP,JSP LIFE CYCLE, ANATOMY OF JSP PAGE AND JSP PROCESSING
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AI
 
Google App Engine ppt
Google App Engine  pptGoogle App Engine  ppt
Google App Engine ppt
 

Destacado

Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartIMC Institute
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoopChristophe Marchal
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2DataWorks Summit
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use caseDavin Abraham
 
สมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kidsสมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for KidsIMC Institute
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015IMC Institute
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in ChinaIMC Institute
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache SqoopAvkash Chauhan
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/ProductionIMC Institute
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibIMC Institute
 
Kanban boards step by step
Kanban boards step by stepKanban boards step by step
Kanban boards step by stepGiulio Roggero
 

Destacado (15)

Big data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera QuickstartBig data processing using Hadoop with Cloudera Quickstart
Big data processing using Hadoop with Cloudera Quickstart
 
Advanced Sqoop
Advanced Sqoop Advanced Sqoop
Advanced Sqoop
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2New Data Transfer Tools for Hadoop: Sqoop 2
New Data Transfer Tools for Hadoop: Sqoop 2
 
ITSS Overview
ITSS OverviewITSS Overview
ITSS Overview
 
Apache sqoop with an use case
Apache sqoop with an use caseApache sqoop with an use case
Apache sqoop with an use case
 
สมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kidsสมุดกิจกรรม Code for Kids
สมุดกิจกรรม Code for Kids
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in China
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
Install Apache Hadoop for Development/Production
Install Apache Hadoop for  Development/ProductionInstall Apache Hadoop for  Development/Production
Install Apache Hadoop for Development/Production
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlib
 
Kanban boards step by step
Kanban boards step by stepKanban boards step by step
Kanban boards step by step
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 

Similar a Big Data Analytics using Mahout

Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformIMC Institute
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerNopparat Nopkuat
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Matt Fuller
 
Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Brad Chapman
 
Foreman presentation
Foreman presentationForeman presentation
Foreman presentationGlen Ogilvie
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
R server and spark
R server and sparkR server and spark
R server and sparkBAINIDA
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case Kai Sasaki
 
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Amazon Web Services
 
Introduction to PowerShell
Introduction to PowerShellIntroduction to PowerShell
Introduction to PowerShellBoulos Dib
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability ManagementXavierDevroey
 
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...inovex GmbH
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodLin Yuan
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...Chester Chen
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
Orchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache MahoutOrchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache Mahoutaneeshabakharia
 

Similar a Big Data Analytics using Mahout (20)

Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud Platform
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...Developing distributed analysis pipelines with shared community resources usi...
Developing distributed analysis pipelines with shared community resources usi...
 
Foreman presentation
Foreman presentationForeman presentation
Foreman presentation
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
R server and spark
R server and sparkR server and spark
R server and spark
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
Uber on Using Horovod for Distributed Deep Learning (AIM411) - AWS re:Invent ...
 
Introduction to PowerShell
Introduction to PowerShellIntroduction to PowerShell
Introduction to PowerShell
 
Software Variability Management
Software Variability ManagementSoftware Variability Management
Software Variability Management
 
Pyramid deployment
Pyramid deploymentPyramid deployment
Pyramid deployment
 
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...
Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Dev...
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with Horovod
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
Pyramid patterns
Pyramid patternsPyramid patterns
Pyramid patterns
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Orchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache MahoutOrchestrating the Intelligent Web with Apache Mahout
Orchestrating the Intelligent Web with Apache Mahout
 

Más de IMC Institute

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14IMC Institute
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019IMC Institute
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AIIMC Institute
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12IMC Institute
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital TransformationIMC Institute
 
IT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIMC Institute
 
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมมูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมIMC Institute
 
IT Trends eMagazine Vol 4. No.11
IT Trends eMagazine  Vol 4. No.11IT Trends eMagazine  Vol 4. No.11
IT Trends eMagazine Vol 4. No.11IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
บทความ The New Silicon Valley
บทความ The New Silicon Valleyบทความ The New Silicon Valley
บทความ The New Silicon ValleyIMC Institute
 
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)IMC Institute
 
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง IMC Institute
 
IT Trends eMagazine Vol 3. No.9
IT Trends eMagazine  Vol 3. No.9 IT Trends eMagazine  Vol 3. No.9
IT Trends eMagazine Vol 3. No.9 IMC Institute
 
Thailand software & software market survey 2016
Thailand software & software market survey 2016Thailand software & software market survey 2016
Thailand software & software market survey 2016IMC Institute
 
Developing Business Blockchain Applications on Hyperledger
Developing Business  Blockchain Applications on Hyperledger Developing Business  Blockchain Applications on Hyperledger
Developing Business Blockchain Applications on Hyperledger IMC Institute
 
Digital transformation @thanachart.org
Digital transformation @thanachart.orgDigital transformation @thanachart.org
Digital transformation @thanachart.orgIMC Institute
 
บทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgบทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgIMC Institute
 
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformationกลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital TransformationIMC Institute
 

Más de IMC Institute (20)

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AI
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
Big Data Analytics using Mahout