Using Jython To Prototype Mahout Code

Presentation to the DC BigData meetup #2 on using Mahout via jython


Using Jython To Prototype Mahout Code
Jonathan Altman
Principal Engineer, Concur
Twitter: @async_io

Who Am I and What Do I Do?
• By day: principal engineer at Concur
• Architect of high-volume travel booking site
• Architect of travel data model for itinerary storage/expense integration
• Currently team lead for effort to leverage our travel and spend data into an effective recommendation engine

What is Mahout?
• Java library of pre-built implementations of various machine learning tasks
• Recommenders: collaborative filtering
• Clustering: grouping things by similarity
• Classification: analysis of a corpus to assign items to known categories
• Intended to run against Hadoop-based data sets
• http://mahout.apache.org/
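
To make "collaborative filtering" concrete, here is a minimal plain-Python sketch of a user-based recommender. It is not Mahout's implementation: the function names `pearson` and `recommend` and the small `prefs` table are illustrative assumptions, and the scoring (a similarity-weighted average over positively correlated users) is one simple variant among many.

```python
from math import sqrt

# Illustrative preference table: {user_id: {item_id: rating}} (assumed data).
prefs = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 2.5, 104: 4.0, 105: 4.5, 107: 5.0},
    4: {101: 5.0, 103: 3.0, 104: 4.5, 106: 4.0},
    5: {101: 4.0, 102: 3.0, 103: 2.0, 104: 4.0, 105: 3.5, 106: 4.0},
}


def pearson(a, b):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = [i for i in a if i in b]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    var_x = sum([(x - mean_x) ** 2 for x in xs])
    var_y = sum([(y - mean_y) ** 2 for y in ys])
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def recommend(prefs, user, how_many=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    totals = {}
    weights = {}
    for other, ratings in prefs.items():
        if other == user:
            continue
        sim = pearson(prefs[user], ratings)
        if sim <= 0:
            continue  # ignore uncorrelated or negatively correlated users
        for item, rating in ratings.items():
            if item in prefs[user]:
                continue  # only score items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(totals[i] / weights[i], i) for i in totals]
    scored.sort(reverse=True)
    return [(item, score) for score, item in scored[:how_many]]
```

Calling `recommend(prefs, 1)` on this table ranks item 104 first, because the two users whose ratings correlate positively with user 1 both rated it highly.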

What is jython?
• ImplementaKon of python that runs against 
the jvm
• Has full access to any well‐behaved java library
• Started in 1997 by Jim Hugunin, who also later 
did IronPython for the .Net CLR
• Version 2.5.2 mirrors python 2.5
• h=p://www.jython.org/

Why Do This?
• I needed to evaluate Mahout’s suitability as 
the toolkit for our travel recommender system
• I am not primarily a java dev (yet?), and I don’t 
know how to create a maven project
• But I do know python
• Fastest way between 2 points is a straight line
• Step 1: adapt sample code from “Mahout In 
AcKon” to jython

How Do I Do This?
import glob
import os
import sys

# Add Mahout jars to jython's path
sys.path.append(os.environ.get("MAHOUT_CORE"))
for jar in glob.glob(os.environ.get("MAHOUT_JAR_DIR") + "/*.jar"):
    sys.path.append(jar)

# import classes from Mahout jar...
from java.io import File
from org.apache.mahout.cf.taste.impl.model.file import *
# Bunch of imports deleted

def main():
    # and we are using the imported FileDataModel
    model = FileDataModel(File(sys.argv[1]))
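
The path setup on this slide fails with a TypeError if MAHOUT_JAR_DIR is unset, because `os.environ.get` returns None. A slightly more defensive version of the same loop might look like the sketch below. The helper name `add_jars_to_path` and its `path` parameter are assumptions made for illustration, not part of the original script.

```python
import glob
import os
import sys


def add_jars_to_path(jar_dir, path=None):
    """Append every .jar in jar_dir to path (sys.path by default).

    Returns the list of jar paths that were newly added.
    """
    if path is None:
        path = sys.path
    if not jar_dir or not os.path.isdir(jar_dir):
        raise ValueError("jar directory not found: %r" % (jar_dir,))
    added = []
    # Sort so the search order is the same on every run.
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        if jar not in path:
            path.append(jar)
            added.append(jar)
    return added
```

In a script like the one above it would be called as `add_jars_to_path(os.environ.get("MAHOUT_JAR_DIR"))` before the Mahout imports. Passing an explicit list as `path` makes the helper easy to try out without touching `sys.path`.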

What Did We Learn?
• About 3 hours to port first “Mahout In Action” example to jython
• 3 minutes to port the second
• Includes learning how to import jars into python
• And building a nice loop to punt on jar dependency management :-)
• Increases ability to experiment with ideas in Mahout by reducing ceremony

Want Some Extra Stuff?
• Python IDEs that work with jython:
– PyCharm (JetBrains)
– PyDev (Eclipse add-on)
– WingIDE (no debugger)
• Ported GroupLens 100k data set example from section 2.5 of “Mahout In Action” is at https://gist.github.com/1041033
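
For experimenting with the ratings data outside Mahout, a small parser can help. The sketch below assumes the GroupLens 100k ratings file is tab-separated with user id, item id, rating and timestamp columns, which is worth checking against the data set's README. The function name `parse_ratings` and the sample lines are made up for illustration.

```python
def parse_ratings(lines):
    """Build {user_id: {item_id: rating}} from tab-separated rating lines."""
    prefs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        user_id = int(fields[0])
        item_id = int(fields[1])
        rating = float(fields[2])
        # A fourth field (timestamp) may be present; it is ignored here.
        prefs.setdefault(user_id, {})[item_id] = rating
    return prefs


# Made-up sample lines in the assumed layout.
sample = [
    "1\t10\t4\t100000001",
    "1\t11\t2\t100000002",
    "2\t10\t5\t100000003",
]
ratings = parse_ratings(sample)
```

Since any iterable of strings works, an open file object can be passed directly in place of `sample`.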


Editor's Notes

First we built a travel booking tool. Then we integrated it with expense and built reporting. Then we went back and built the trip data storage subsystem to handle increased volumes of data. Now we are trying to put the combined travel and expense data into Hadoop to do analysis and leverage the knowledge of our customers for their benefit.

So Mahout looked like it might be a good way to bootstrap our efforts around building recommendations. If nothing else, it might be a fast path to v1 while we write more specialized algorithms tuned to our specific data sets as a v2.

It’s very cool: Jim H started both projects as tests: jython to see if jvm would be faster than python’s vm, IronPython to “prove” CLR was slow compared to e.g. JVM (it wasn’t). Yeah, jython’s definitely on the cutting edge with python 2.5 support.

Mahout appears to be a good system for doing recommendation engines. We need to find out how good, and what its strengths and limitations are.

I do know some java; enough to do some light recreational Android programming. But not only do I know python, the data scientist who will actually determine the optimal factors to build our recommendation engine on knows python. She also doesn’t know java (yet?). So I have a tool that the team is familiar with.

Just building Mahout so I could test it out was painful enough. It requires maven2 to build, but since this is an existing project it was all configured for me to just build after downloading. But I still find it painful to watch maven work.

I shuddered at the thought of having to actually do the maven setup for a new project that would have to be built.

Most importantly here, what you end up with when you make Mahout accessible via jython is a rapid prototyping/testing/experimentation tool for building out Mahout code. We’ve taken out the ceremony. That’s all.

When you’re done figuring out what you need to do, you could then move to compiled java for speed.

But, for many/most applications, you can probably stop there. The actual Mahout processing is the serious limiting factor here, not the jython code. My suspicion is that there’s far more performance to be gained optimizing the actual Mahout implementation than moving the jython code (which is native jvm by the time it runs) to java/scala/clojure.

The single largest chunk of my time was actually spent trying to decide what jars I had to append to my jython path, followed by really grokking the jython path/import stuff.

As you can see, after enough time I just punted on the jar dependencies. Every single jar is on the path, although I only import from the ones I need. Worth some research into jython to see if I’m adding any overhead other than search path like opening/inspecting the jars. I suspect not.

Now, if you knew maven, it might take less time to start a new project and get it up than I would take, but once *I* was done, every subsequent jython script takes almost no time to set up, and the project is ready to run as soon as you’ve saved your source code.

We can work without having to either build a new app for every experiment, or build in some way to control which experiment runs in some ever-growing app.

I haven’t really tested either PyCharm or PyDev to do these things. Someone else can do *that* lightning talk at a later meetup.
