Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Vegas
The Missing Matplotlib for
Scala/Spark
DB Tsai
Roger Menezes
Homepage Kids Page Downloads Page
Netflix Recommendations
Every aspect
of the
Experience is
Machine
Learned
3
2017
> 100M members
> 190 countries
Multiple Devices
Genres: 23 rows/page average
Sims: 10 rows/page average
My List:
Continue Watching:
Popular on Netflix:
Trending Now:
Watch It Again:
Top Picks:
Because You Watched:
Genres:
New ...
Machine Learning at Netflix
● Optimize the Experimentation usecase vs Productionization
● Experimentation
○ Opportunity si...
Experimenter’s loop
Problem
Explore
Data
Identify
Features
Produce
Model
Evaluate
Model
Share
Findings
Notebooks
● Optimal for Experimentation
● Sharing reproducible research
○ Facilitates feedback loop with Product Managers
...
Python Notebooks
Python Notebooks
● Seamless Experience - ML experimentation
● Well known Scientific computing libraries
● Huge catalog of ...
Scala Notebooks
● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ...
● Computing library gap filling up
● Lack of Visuali...
Introducing Vegas
● Visualization Library in Scala
● Mainly built for the notebook use case
● Scala wrapper around Vega-Li...
DECLARATIVE
STATISTICAL
VISUALIZATION
GRAMMAR
IN SCALA
You tell it WHAT should be done with the data, and it knows
HOW to ...
Added Bonus of Declarative
Visualizations:
INTERACTIVITY!
D3JS
VEGAS
VEGAS CODE EXPANDS OUT TO D3JS CODE!
Anatomy of a plot: Channels
X/Y channel
Shape Channel
Size Channel
Color Channel
Features…
1. Supports most plot types
2. Trellis plots
3. Layers
Layer 1.
Layer 2.
Layer 3.
4. Notebook and Consoles
5. Built-in spark support
Vegas
.withDataFrame(myDataFrame)
.encodeX(“population”)
.encodeY(“age”)
Mapped Columns
Pass In ...
6. Visual statistics
● Advanced Binning
● Sorting
● Scaling
● Custom Transforms
● Time Series
● Aggregation
● Filtering
● ...
How It Works !
1. Specify in Scala
2. Embed HTML
(iFrame)
3. Render within
iFrame using JS
VEGA
D3JS
VEGA-LITE*
VEGAS
MOREABSTRACTION SCALA DSL EMITS TYPE-CHECKED
VEGA-LITE JSON
VEGA-LITE CONVERTS INTERNALLY
TO VE...
What’s coming
1. Interactive selections
2. Selections transforms
Contributors
Aish DB Roger
Sudeep Jeremy
Thank you.
@NetflixResearch
@rogermenezes @dbtsai
The missing MatPlotLib
for Scala/Spark
http://vegas-viz.org
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes
Próxima SlideShare
Cargando en…5
×

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes

2.346 visualizaciones

Publicado el

In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 millions members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

Publicado en: Datos y análisis

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes

  1. 1. Vegas The Missing Matplotlib for Scala/Spark DB Tsai Roger Menezes
  2. 2. Homepage Kids Page Downloads Page Netflix Recommendations Every aspect of the Experience is Machine Learned
  3. 3. 3 2017 > 100M members > 190 countries
  4. 4. Multiple Devices
  5. 5. Genres: 23 rows/page average Sims: 10 rows/page average
  6. 6. My List: Continue Watching: Popular on Netflix: Trending Now: Watch It Again: Top Picks: Because You Watched: Genres: New Releases: Recently Added: Originals RowBillboard:
  7. 7. Machine Learning at Netflix ● Optimize the Experimentation usecase vs Productionization ● Experimentation ○ Opportunity sizing, Data Exploration ○ Feature Identification and Selection ○ Tweaks to ML algos ○ Model Evaluation
  8. 8. Experimenter’s loop Problem Explore Data Identify Features Produce Model Evaluate Model Share Findings
  9. 9. Notebooks ● Optimal for Experimentation ● Sharing reproducible research ○ Facilitates feedback loop with Product Managers ● End to end ML experiment. ○ Interactivity drives productivity
  10. 10. Python Notebooks
  11. 11. Python Notebooks ● Seamless Experience - ML experimentation ● Well known Scientific computing libraries ● Huge catalog of Visualization plotting libraries ○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.
  12. 12. Scala Notebooks ● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ... ● Computing library gap filling up ● Lack of Visualization Libraries ○ Main friction point in adoption ○ End to End ML use case not convincing
  13. 13. Introducing Vegas ● Visualization Library in Scala ● Mainly built for the notebook use case ● Scala wrapper around Vega-Lite ○ Missing MatPlotLib for the Scala/Spark world.
  14. 14. DECLARATIVE STATISTICAL VISUALIZATION GRAMMAR IN SCALA You tell it WHAT should be done with the data, and it knows HOW to do it! Operations such as filtering, aggregation, faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape. Complex visualizations can be built with a few high level abstractions: DATA TRANS- FORMS SCALES GUIDES MARKS cf : Altair Talk by Brian Granger in PyData 2016 https://youtu.be/v5mrwq7yJc4
  15. 15. Added Bonus of Declarative Visualizations: INTERACTIVITY! D3JS VEGAS VEGAS CODE EXPANDS OUT TO D3JS CODE!
  16. 16. Anatomy of a plot: Channels X/Y channel Shape Channel Size Channel Color Channel
  17. 17. Features…
  18. 18. 1. Supports most plot types
  19. 19. 2. Trellis plots
  20. 20. 3. Layers Layer 1. Layer 2. Layer 3.
  21. 21. 4. Notebook and Consoles
  22. 22. 5. Built-in spark support Vegas .withDataFrame(myDataFrame) .encodeX(“population”) .encodeY(“age”) Mapped Columns Pass In DF.
  23. 23. 6. Visual statistics ● Advanced Binning ● Sorting ● Scaling ● Custom Transforms ● Time Series ● Aggregation ● Filtering ● Math functions (log, etc) ● Descriptive Statistics
  24. 24. How It Works !
  25. 25. 1. Specify in Scala 2. Embed HTML (iFrame) 3. Render within iFrame using JS
  26. 26. VEGA D3JS VEGA-LITE* VEGAS MOREABSTRACTION SCALA DSL EMITS TYPE-CHECKED VEGA-LITE JSON VEGA-LITE CONVERTS INTERNALLY TO VEGA JSON SPEC VEGA TRANSLATES JSON TO D3JS CODE THAT CAN BE VERY VERBOSE A SCALA DSL FOR VEGA-LITE * Vega-Lite
  27. 27. What’s coming
  28. 28. 1. Interactive selections
  29. 29. 2. Selections transforms
  30. 30. Contributors Aish DB Roger Sudeep Jeremy
  31. 31. Thank you.
  32. 32. @NetflixResearch @rogermenezes @dbtsai The missing MatPlotLib for Scala/Spark http://vegas-viz.org

×