Information Visualization for Large-Scale Data Workflows by Michael Conover (Linkedin)


The ability to instrument and interrogate data as it moves through a processing pipeline is fundamental to effective machine learning at scale. Applied in this capacity, information visualization technologies drive product innovation, shorten iteration cycles, reduce uncertainty, and ultimately improve the performance of predictive models. It can be challenging, however, to understand where in a workflow to employ data visualization, and, once committed to doing so, it can be similarly daunting to develop revealing visualizations that suggest clear next steps.

In this talk we’ll describe the role that information visualization technologies play in the LinkedIn data science ecosystem, and explore best practices for understanding the structure of large-scale data in a production environment. From hypothesis generation and feature development to model evaluation and tooling, visualization is at the heart of LinkedIn’s machine learning workflows, enabling our data scientists to reason and communicate more effectively. Broken down into clear, structured insights based on proven technology and workflow patterns, this talk will help you understand how to apply information visualization to the analytical challenges you encounter every day.

Meet Michael Conover

Mike Conover builds machine learning technologies that leverage the behavior and relationships of hundreds of millions of people. A senior data scientist at LinkedIn, Mike has a Ph.D. in complex systems analysis with a focus on information propagation in large-scale social networks. His work has appeared in the New York Times, the Wall Street Journal, Science, MIT Technology Review and on National Public Radio.

Information Visualization for Large-Scale Data Workflows
Michael Conover
Senior Data Scientist, LinkedIn
@vagabondjack

Credit: Pedro Cruz, University of Coimbra
David Crandall, Indiana University
John Nelson, IDV Solutions
Elegant Complexity

Intellectual Dividends
Realistic Mental Models
Verification of Assumptions
Shortened Iteration Cycles
Improved Predictive Performance
Product Insights
Clarity of Communication

Anscombe’s Quartet
http://en.wikipedia.org/wiki/Anscombe's_quartet
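
The point of Anscombe's quartet can be checked numerically. The sketch below is not from the talk: it hard-codes the four published datasets and uses only Python's standard library (the `pearson` helper and the `summary` dictionary are names introduced here for illustration). It shows that the means, variances, and correlations nearly coincide across the four sets, which is why plotting them is the only way to see how different they are.

```python
from statistics import mean, variance

# Anscombe's four datasets; sets I, II and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# (mean x, variance x, mean y, variance y, correlation) for each dataset.
summary = {
    name: (mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}

for name, stats in summary.items():
    print(name, [round(s, 2) for s in stats])
```

Summary statistics alone would suggest the four sets are interchangeable; a scatter plot of each set is what reveals the curve, the outlier, and the single high-leverage point.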

[Figure: two panels plotting Density (0.0 to 0.4) against Standard Normal (−5.0 to 5.0), labeled 100,000 and 1,000,000]

geom_point()
A Lens on the Joint Distribution

geom_point(alpha=1/5)
A Lens on the Joint Distribution
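
Why a low alpha helps with overplotting can be seen with a little arithmetic. This is a minimal sketch, assuming the graphics device blends each point over the ones beneath it with standard "over" compositing; that is an assumption about rendering, not something stated in the slides, and `stacked_opacity` is a hypothetical helper name.

```python
def stacked_opacity(k, alpha=1 / 5):
    """Opacity of a pixel covered by k overlapping points, each drawn with `alpha`.

    Assumes standard "over" compositing: every new point lets a fraction
    (1 - alpha) of what is underneath show through.
    """
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(stacked_opacity(k), 3))
```

Under this assumption, sparse regions stay faint while dense regions approach full opacity, so darkness becomes a rough proxy for local point density.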

geom_bin2d(bins=35)
A Lens on the Joint Distribution
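
The idea behind two-dimensional binning can be sketched without a plotting library. The code below is an illustration only, not ggplot2's implementation: `bin2d` is a hypothetical helper, and the correlated Gaussian data is synthetic, generated here purely as an assumption. It replaces 100,000 overlapping points with counts on a 35 by 35 grid, which is the quantity a binned heatmap encodes as color.

```python
import random


def bin2d(xs, ys, bins=35):
    """Count points per cell on a bins x bins grid spanning the data range."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a unit width if all values on an axis are identical.
    x_width = (x_max - x_min) / bins or 1.0
    y_width = (y_max - y_min) / bins or 1.0
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Clamp so the maximum value lands in the last cell, not past it.
        col = min(int((x - x_min) / x_width), bins - 1)
        row = min(int((y - y_min) / y_width), bins - 1)
        counts[row][col] += 1
    return counts


# Synthetic, correlated data (an assumption for illustration only).
rng = random.Random(0)
n = 100_000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [0.6 * x + rng.gauss(0, 0.8) for x in xs]

counts = bin2d(xs, ys)
densest = max(max(row) for row in counts)
occupied = sum(1 for row in counts for c in row if c > 0)
print(f"{n} points -> {occupied} occupied cells, densest cell holds {densest}")
```

However many points are plotted, the summary never exceeds 35 × 35 = 1,225 cells, so the cost of drawing and reading the figure stops growing with the size of the data.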

geom_point(alpha=1/5, aes(color=label))
A Lens on the Joint Distribution

geom_density2d(aes(color=label), bins=20)
A Lens on the Joint Distribution

Marginal Histograms
A Lens on the Joint Distribution

GGally (ggpairs)
A Lens on the Joint Distribution

Model A Model B
Training Data I
Training Data II
Evaluation Batteries

stanford.edu/~jhuang11/
Homework at Scale

github.com/StanfordHCI/termite
Topic Modeling

Workflow Principles
Latent, Pervasive
Modular
Consistent Visual Language

data.linkedin.com/opensource/azkaban
LinkedIn Azkaban

data.linkedin.com/opensource/white-elephant
LinkedIn White Elephant

github.com/Netflix/Lipstick
Netflix Lipstick

RStudio Shiny
rweb.stat.ucla.edu/ggplot2/

Information Visualization for Large-Scale Data Workflows by Michael Conover (Linkedin)

1. Information Visualization for Large-Scale Data Workflows. Michael Conover, Senior Data Scientist, LinkedIn. @vagabondjack

2. Emergent Structure

3. Credit: Pedro Cruz, University of Coimbra; David Crandall, Indiana University; John Nelson, IDV Solutions. Elegant Complexity

4. Intellectual Dividends: Realistic Mental Models; Verification of Assumptions; Shortened Iteration Cycles; Improved Predictive Performance; Product Insights; Clarity of Communication

5. Hypothesis Generation

7. @whitehouse #RSVP Color Commentary

8. Flock Together

9. Political Polarization on Twitter

10. Feature Development

11. Anscombe’s Quartet http://en.wikipedia.org/wiki/Anscombe's_quartet

12. Basic Workflow Structure

13. [Figure: two panels plotting Density (0.0 to 0.4) against Standard Normal (−5.0 to 5.0), labeled 100,000 and 1,000,000]

14. geom_point() A Lens on the Joint Distribution

15. geom_point(alpha=1/5) A Lens on the Joint Distribution

16. geom_bin2d(bins=35) A Lens on the Joint Distribution

17. geom_point(alpha=1/5, aes(color=label)) A Lens on the Joint Distribution

18. geom_density2d(aes(color=label), bins=20) A Lens on the Joint Distribution

19. Marginal Histograms A Lens on the Joint Distribution

20. GGally (ggpairs) A Lens on the Joint Distribution

21. Model Fitting & Evaluation

22. Model A Model B Training Data I Training Data II Evaluation Batteries

23. stanford.edu/~jhuang11/ Homework at Scale

24. github.com/StanfordHCI/termite Topic Modeling

25. Layercake