1. Apache Mahout
● What is it ?
● How does it work ?
● Machine Learning
● Algorithms
● Install
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. Mahout – What is it ?
● Machine learning
● For large data
● Based on Hadoop
● But can also work on a non-Hadoop cluster
● Scalable
● Released under the Apache License 2.0
3. Mahout – How does it work ?
● Uses Hadoop MapReduce
● Has many supplied algorithms
● Supports four use cases
– Recommendation mining
– Clustering
– Classification
– Frequent Itemset Mining
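
To make the MapReduce idea concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, applied to a simple form of frequent itemset mining (counting item pairs). It is an illustration only and not Mahout's implementation; the function names, the basket data and the min_support parameter are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit (pair, 1) for every unordered pair of items in one transaction."""
    items = sorted(set(transaction))
    return [(pair, 1) for pair in combinations(items, 2)]


def shuffle(mapped):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def frequent_pairs(transactions, min_support):
    """Return the item pairs that occur in at least min_support transactions."""
    mapped = [kv for t in transactions for kv in map_phase(t)]
    reduced = [reduce_phase(k, v) for k, v in shuffle(mapped).items()]
    return {pair: count for pair, count in reduced if count >= min_support}


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
result = frequent_pairs(baskets, min_support=2)
print(result)
```

On a real cluster the map and reduce calls would run in parallel on different nodes; here they run in sequence so that the data flow is easy to follow.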
4. Mahout – Machine Learning
Machine learning – what does it mean ?
● A branch of artificial intelligence
● Systems that learn from data
● Classify data after learning
● Learn from training data sets
● Generalisation – the ability to correctly classify unseen data after learning
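
The points above can be sketched in a few lines: learn from one data set, then classify data that was not seen during learning. This is a minimal illustration using a nearest-centroid classifier, chosen here only because it is short; it is not a Mahout algorithm or API, and the data values and names are assumptions invented for the example.

```python
def train(examples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


# Hypothetical data: the system learns from `training` only.
training = [
    ([1.0, 1.2], "small"),
    ([0.8, 1.0], "small"),
    ([5.0, 5.5], "large"),
    ([6.0, 5.0], "large"),
]
# Data held back from learning, used to check generalisation.
unseen = [([1.1, 0.9], "small"), ([5.5, 5.2], "large")]

model = train(training)
accuracy = sum(classify(model, f) == label for f, label in unseen) / len(unseen)
print(accuracy)
```

The fraction of unseen examples that are classified correctly is a simple measure of how well the system generalises.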
5. Mahout – Algorithms
Some of the available algorithms (among many others)
– Collaborative filtering
● Narrow sense – predict the interests of one user by collecting preferences from many users
● General sense – filter information through collaboration among multiple agents, viewpoints or data sources
– Mean shift clustering
● Mode seeking, used for visual tracking
– Parallel frequent pattern mining
● Finds sets of items that frequently occur together
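
As an illustration of collaborative filtering in the narrow sense, the sketch below predicts a rating for one user from the preferences of other users, weighting each of them by how similar their tastes are. It is a minimal single-machine example and not Mahout's recommender API; the user names, film names, ratings and function names are all assumptions invented here.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for item."""
    numerator = denominator = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = cosine(ratings[user], prefs)
        numerator += weight * prefs[item]
        denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "ann": {"film_a": 5, "film_b": 1},
    "bob": {"film_a": 5, "film_b": 1, "film_c": 5},
    "cat": {"film_a": 1, "film_b": 5, "film_c": 1},
}
estimate = predict(ratings, "ann", "film_c")
print(round(estimate, 2))
```

Because ann's ratings match bob's more closely than cat's, the estimate for film_c lands nearer to bob's rating than to cat's.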
6. Mahout – Install
So how do we install Mahout and test it ?
– Install Maven
● sudo apt-get install maven3
– Install Apache Mahout
● You will need subversion installed
● svn co http://svn.apache.org/repos/asf/mahout/trunk
● Go to dir containing pom.xml file
– mvn install ## in ./trunk
Full details available in the Mahout install guide on our web site shop
7. Mahout – Test Install
So let us run a test
● cd $MAHOUT_HOME/examples/bin
● ./build-reuters.sh
● choose option 1 kmeans clustering
● Should finish with the output shown on the next slide
8. Mahout – Test Install
cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh
Please call cluster-reuters.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. fuzzykmeans clustering
3. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
.................................
Inter-Cluster Density: NaN
Intra-Cluster Density: 0.0
CDbw Inter-Cluster Density: NaN
CDbw Intra-Cluster Density: NaN
CDbw Separation: NaN
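
For readers who want to see what the kmeans option does in principle, here is a minimal sketch of k-means clustering on a handful of 2D points. It is not the code that the Reuters example runs (that job works on document vectors and is distributed); the points, the starting centroids and the function name are assumptions invented for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]]
centroids, clusters = kmeans(points, centroids=[[1.0, 1.0], [8.0, 8.0]])
print(centroids)
```

The number of clusters is fixed by the number of starting centroids, and the result can depend on where those starting centroids are placed.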
9. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems