MLconf NYC Josh Wills

1
MLConf NYC 2014
Josh Wills, Senior Director of Data Science
Cloudera
A Little Bit About Me
2
An Experience I Had Recently
3
The Two Kinds of Data Scientists
• The Lab
  • Statisticians who got really good at programming
  • Neuroscientists, geneticists, etc.
• The Factory
  • Software engineers who were in the wrong place at the wrong time
4
The Lab and The Factory
Analytics in the Lab
• Question-driven
• Interactive
• Ad-hoc, post-hoc
• Fixed data
• Focus on speed and flexibility
• Output is embedded in a report or an in-database scoring engine
Analytics in the Factory
• Metric-driven
• Automated
• Systematic
• Fluid data
• Focus on transparency and reliability
• Output is a production system that makes customer-facing decisions
5
6
Data Science In The Factory
On Icebergs
7
The Impedance Mismatch
8
What Do We Need?
9
Apache Spark
10
A Feature Extraction DSL for Spark
11
The R Formula Specification
12
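The slide names R's formula notation as a model for a feature-extraction DSL. As a rough, hypothetical illustration (not the talk's DSL and not R's full grammar), the sketch below handles only numeric variables, `+` between terms, and `:` for interactions. It also omits the intercept column that R's `model.matrix` adds by default. The names `parse_formula`, `design_row`, and `extract` are our own.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


def parse_formula(formula: str) -> Tuple[str, List[List[str]]]:
    """Split 'y ~ a + b + a:b' into the response name and a list of terms.

    Each term is a list of variable names; a term with more than one name
    is an interaction (the product of those variables).
    """
    lhs, rhs = formula.split("~")
    terms = [
        [name.strip() for name in term.split(":")]
        for term in rhs.split("+")
    ]
    return lhs.strip(), terms


def design_row(record: Record, terms: List[List[str]]) -> List[float]:
    """Evaluate each term against one record to build a feature vector."""
    row = []
    for term in terms:
        value = 1.0
        for name in term:
            value *= record[name]  # interactions multiply their variables
        row.append(value)
    return row


def extract(formula: str) -> Callable[[Record], Tuple[float, List[float]]]:
    """Compile a formula once into a function: record -> (label, features)."""
    response, terms = parse_formula(formula)

    def apply(record: Record) -> Tuple[float, List[float]]:
        return record[response], design_row(record, terms)

    return apply


if __name__ == "__main__":
    f = extract("y ~ a + b + a:b")
    print(f({"y": 1.0, "a": 2.0, "b": 3.0}))  # (1.0, [2.0, 3.0, 6.0])
```

The result of `extract(...)` is an ordinary function of one record, so in principle it could be applied to a distributed collection with something like PySpark's `RDD.map`; that is not shown or tested here.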
So Why Doesn’t This Exist Yet?
13
Functional Programming to the Rescue
14
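The slide gives only its title. One plausible reading, sketched here under our own assumptions, is that feature extractors written as small pure functions can be combined with higher-order functions into a single record-to-vector function. `numeric`, `one_hot`, and `combine` are hypothetical helpers, not an existing library.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Extractor = Callable[[Record], List[float]]


def numeric(field: str) -> Extractor:
    """Hypothetical extractor: pass a numeric field through as one feature."""
    return lambda r: [float(r[field])]


def one_hot(field: str, levels: List[str]) -> Extractor:
    """Hypothetical extractor: one indicator feature per categorical level."""
    return lambda r: [1.0 if r[field] == level else 0.0 for level in levels]


def combine(*extractors: Extractor) -> Extractor:
    """Compose extractors into one function that concatenates their outputs."""
    return lambda r: reduce(lambda acc, ex: acc + ex(r), extractors, [])


if __name__ == "__main__":
    features = combine(numeric("age"), one_hot("plan", ["free", "pro"]))
    records = [{"age": 31, "plan": "pro"}, {"age": 45, "plan": "free"}]
    # The same pure function is simply mapped over a local list here.
    print(list(map(features, records)))  # [[31.0, 0.0, 1.0], [45.0, 1.0, 0.0]]
```

Because `features` keeps no shared state, the same function value works with Python's built-in `map` and could, in principle, be passed to a distributed `map` as well.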
15
Data Science in the Lab
Great Tools for Investigative Analytics
16
Cloudera Impala
17
LLVM and Numba
18
Python UDFs for Impala
19
Python UDFs for Impala
• github.com/cloudera/impyla
• Already there
  • Numeric and boolean types (as native Python objects)
• In progress
  • String support
  • C/C++ function integration
• Planned
  • Struct/tuple and array types
  • UDAFs
  • Support for the PyData stack (scikit-learn, NLTK)
20
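The slide lists only numeric and boolean types as supported so far. As a hedged, stdlib-only illustration of the kind of scalar function that restriction allows, here is plain Python. It is not the impyla API: `null_propagating` and `clipped_ratio` are hypothetical names, and representing SQL NULL as `None` is our assumption. The slides pair this work with LLVM and Numba, which suggests the function body is compiled rather than interpreted; this sketch simply runs it in CPython.

```python
from functools import wraps
from typing import Callable, Optional


def null_propagating(fn: Callable[..., float]) -> Callable[..., Optional[float]]:
    """Hypothetical helper: return None (standing in for SQL NULL) if any
    argument is None, as most SQL scalar functions do with NULL inputs."""
    @wraps(fn)
    def wrapper(*args):
        if any(a is None for a in args):
            return None
        return fn(*args)
    return wrapper


@null_propagating
def clipped_ratio(num: float, den: float, clip: bool) -> float:
    """Example scalar function that uses only numeric and boolean arguments."""
    if den == 0.0:
        return 0.0
    ratio = num / den
    return min(ratio, 1.0) if clip else ratio


if __name__ == "__main__":
    rows = [(3.0, 4.0, True), (5.0, 2.0, True), (5.0, 2.0, False), (1.0, None, True)]
    # Apply the function row by row, as a query engine would for a scalar UDF.
    print([clipped_ratio(*row) for row in rows])  # [0.75, 1.0, 2.5, None]
```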
Josh Wills, Director of Data Science, Cloudera @josh_wills
Thank you!
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue162 vistas

MLconf NYC Josh Wills

Speaker notes

  1. My major contribution to Western civilization. See also: http://www.quora.com/Data-Science/What-is-the-difference-between-a-data-scientist-and-a-statistician/answer/Josh-Wills
  2. Curt Monash makes a distinction I like between investigative analytics (which he defines here: http://www.dbms2.com/2011/03/03/investigative-analytics/ ) and operational analytics, and I have expanded it into my own set of differences that I want to walk through here. Investigative analytics is what we think of when we think of traditional BI: an analyst or an executive searches for previously unknown patterns in a data set, either by looking at a series of visualizations mediated by database queries or by applying statistical models to a prepared data set to tease out deeper explanations. This is where the vast majority of the BI market is focused right now. Operational analytics, on the other hand, is a nascent market, and I don’t believe the existing BI tools have done a good job of supporting companies that want to leverage their modeling and analytical prowess to make better decisions in real time. I’d like to shift some of the conversation and the focus in the market from the lab to the factory.
  3. The tip-of-the-iceberg metaphor. This has been a useful metaphor for me throughout my career: I feel like I am constantly exploring the tip of the iceberg, from the theory of model building to the practice of model building to operational model building. There is a ton of stuff I don’t know, but I hope I can provide useful commentary on the culture of credit scoring from the perspective of an outsider, kind of like Alexis de Tocqueville or Borat.
  4. Parser combinators, monoids, regular expressions, oh my!
  5. Tools focus on speed and flexibility.