Hadoop and R Go to the Movies
DataWorks Summit
1.
© 2014 MapR Technologies
2.
Agenda
• A sample problem
• A general approach
• Complications arise
• Light is cast on the villains
• Who flee from the scene
3.
Agenda Script
• A sample problem
• A general approach
• Complications arise
• Light is cast on the villains
• Who flee from the scene
4.
Model Building in a Nutshell
• Gather data
• Build models
• Predict future
• World domination!
• Fight fraud
• Save the planet ✔
5.
A Sample Problem
6.
Modeling Energy Use
• Modeling office and home energy use can save energy
• Guides retrofits
• Finds bad leaks
• Increases awareness and understanding of problems
• Demonstrated results of 20% or more savings
• Savings = less CO2 = less planet warming
7.
Modeling Energy Use
See ASHRAE RP-1050: http://bit.ly/1ovwGfy
8.
Modeling Energy Use (or not)
9.
Modeling Energy Use (complete hash)
10.
Some Notes on the Method
• Can’t change the method, since this is the ASHRAE standard
• Small changes in cutoff can have a ragged effect on model fit
  – Linear methods are out of the question
  – Gradient-based methods find local minima
• All parameters interact strongly
  – Can’t solve for one at a time
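The slides do not spell out the model being fitted. As a minimal sketch, assume a five-parameter change-point form (a baseline plus a heating wing below one cut-point and a cooling wing above another), which is consistent with the "5P models", "cut-points", and "wing slopes" mentioned later in the deck. The function names, parameter order, and sample numbers below are illustrative assumptions, not taken from ASHRAE RP-1050.

```python
def five_param_model(temp, baseline, heat_slope, heat_cut, cool_slope, cool_cut):
    """Predicted energy use at outdoor temperature `temp` (assumed 5P form)."""
    heating = heat_slope * max(0.0, heat_cut - temp)   # wing below the heating cut-point
    cooling = cool_slope * max(0.0, temp - cool_cut)   # wing above the cooling cut-point
    return baseline + heating + cooling


def sum_squared_error(params, temps, usage):
    """Fit objective: squared error between the model and observed usage."""
    return sum((five_param_model(t, *params) - u) ** 2
               for t, u in zip(temps, usage))


# Synthetic data generated from known parameters, purely for illustration.
temps = [0, 5, 10, 15, 20, 25, 30]
true_params = (10.0, 2.0, 12.0, 3.0, 22.0)
usage = [five_param_model(t, *true_params) for t in temps]
```

All five parameters enter the same error function, so moving a cut-point changes which data points each slope has to explain. That is one way to read the note that the parameters cannot be solved one at a time.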
11.
Evolutionary Algorithms
• Basic algorithm:
    fill population with random solutions
    do {
      keep best x% of solutions
      mutate survivors to fill population
    } until happy with results
• Works great
• Converges very slowly
  – If mutation is small, takes many, many steps to find best, gets trapped
  – If mutation is too big, keeps jumping away from optimum
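The basic loop above can be sketched in Python as follows. This is a hedged illustration, not the presenter's code: the Gaussian mutation, the fixed generation count standing in for "until happy with results", the initial range of -5 to 5, and the `sphere` test function are all assumptions.

```python
import random


def evolve(fitness, dim, pop_size=50, keep_fraction=0.2, step=0.1,
           generations=200, seed=0):
    """Minimize `fitness` using one fixed mutation size `step` for everyone."""
    rng = random.Random(seed)
    # Fill the population with random solutions.
    population = [[rng.uniform(-5, 5) for _ in range(dim)]
                  for _ in range(pop_size)]
    n_keep = max(1, int(pop_size * keep_fraction))
    for _ in range(generations):
        # Keep the best x% of solutions.
        population.sort(key=fitness)
        survivors = population[:n_keep]
        # Mutate survivors to refill the population.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            children.append([x + rng.gauss(0, step) for x in parent])
        population = survivors + children
    return min(population, key=fitness)


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best = evolve(sphere, dim=2)
```

Because survivors are carried over unchanged, the best solution never gets worse from one generation to the next. How fast it improves depends entirely on the choice of `step`.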
12.
Doesn’t work in practice
13.
Meta-Evolutionary Algorithms
• Meta mutation algorithm:
    fill population with random solutions
    do {
      keep best x% of solutions
      mutate survivors to fill population
      use mutation size to set mutation rate per candidate
    } until happy with results
• Works great
• Converges very fast
  – If small jump works, we get more of that
  – If big jump works, we get more of that
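One way to read the meta-mutation step is that each candidate carries its own mutation size, which is inherited and itself mutated. The sketch below takes that reading, in the style of self-adaptive evolution strategies. It may differ from the presenter's exact scheme, and the log-normal step update, its 0.5 spread, and the `sphere` objective are assumptions.

```python
import math
import random


def meta_evolve(fitness, dim, pop_size=50, keep_fraction=0.2,
                generations=200, seed=0):
    """Minimize `fitness`; every candidate is a (solution, step_size) pair."""
    rng = random.Random(seed)
    population = [([rng.uniform(-5, 5) for _ in range(dim)], 1.0)
                  for _ in range(pop_size)]
    n_keep = max(1, int(pop_size * keep_fraction))
    for _ in range(generations):
        # Keep the best x% of solutions, together with their step sizes.
        population.sort(key=lambda cand: fitness(cand[0]))
        survivors = population[:n_keep]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent, step = rng.choice(survivors)
            # Mutate the step size first, then use it to mutate the solution.
            child_step = step * math.exp(rng.gauss(0, 0.5))
            child = [x + rng.gauss(0, child_step) for x in parent]
            children.append((child, child_step))
        population = survivors + children
    return min(population, key=lambda cand: fitness(cand[0]))


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, best_step = meta_evolve(sphere, dim=2)
```

A step size that produces good children survives along with them, so small jumps multiply when small jumps work and big jumps multiply when big jumps work.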
15.
Meta-Evolutionary Algorithms
• Algorithm may go wrong way
• May take wrong-size steps
• But it quickly learns to correct
• Bad strategies die out along with bad solutions
16.
But There’s a Rub
• This new algorithm may be gangbusters
  – But it comes with new knobs to turn
• How can we tell where to turn them?
• How do we make sense of a seething mass of 5-dimensional spiders?
17.
We need to look inside
18.
Demo Reel Synopsis
• Constant mutation rate failure example
• Meta-mutation succeeds
• Meta-mutation can handle highly correlated narrow valleys
• Very complex landscapes can be navigated
• Strategy shifts fluidly to find solutions
19.
Let’s put on a show!
20.
Not quite that simple
• Current problem is 5-dimensional
• Problem parameters don’t make sense directly
• So we need to show the human face of the problem (that is where we started!)
• We also need dynamics to understand how the algorithm gets where it goes
21.
Main-line Model and Visualization Flow
[Diagram. Labels: Data repo, Solver, grep, Solver, JSON model, d3 + twistd, JSON model. Regions: Conventional, Scalable.]
22.
How does R make video?
25.
Diagnostic Visualizations
[Diagram. Labels: Solver, JSON model, Logs, ScaleR, ffmpeg. Region: Scalable.]
26.
Of Note
• RevoScaleR solves most of the parallelism issues
• We still want to run arbitrary R
• Some legacy functions are Particularly Unfriendly to HDFS
  – png(filename): requires conventional file access
  – system(command): assumes conventional file access
  – ffmpeg(1): assumes conventional file access
27.
Simple Solution
• MapR provides HDFS and NFS access to the cluster
• All path names are the same
• MapReduce programs can use legacy POSIX code
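To illustrate why ordinary path names matter, the sketch below builds frame file names and an ffmpeg command line that only work with conventional file access, which an NFS mount of the cluster provides. It is written in Python rather than R for illustration; the mount point `/mapr/my.cluster.com/demo`, the frame naming pattern, and the helper names are hypothetical. The ffmpeg options used (`-y`, `-framerate`, `-i`, `-pix_fmt`) are standard ones.

```python
def frame_path(frame_dir, index):
    """File name for one rendered frame, e.g. .../frame0007.png."""
    return f"{frame_dir}/frame{index:04d}.png"


def build_ffmpeg_command(frame_dir, out_path, fps=10):
    """Command that stitches numbered PNG frames into a video.

    ffmpeg reads the frames through plain POSIX paths, so the directory
    must be visible as an ordinary file system (here, via an NFS mount).
    """
    return [
        "ffmpeg", "-y",                          # overwrite output if present
        "-framerate", str(fps),                  # input frame rate
        "-i", f"{frame_dir}/frame%04d.png",      # numbered image sequence
        "-pix_fmt", "yuv420p",                   # widely playable pixel format
        out_path,
    ]


# Hypothetical NFS-mounted cluster paths.
frames = "/mapr/my.cluster.com/demo/frames"
cmd = build_ffmpeg_command(frames, "/mapr/my.cluster.com/demo/run.mp4")
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed and the mount is available.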
28.
Diagnostic Videos
• 5D x 100 can get trapped in local minimum
  – ’470 example
• 5D x 500 avoids trapping issues
  – ’470 quiescence and resurgence
• 3D x 500 and 3D x 100 also avoid trapping
• Need to distinguish empty house from occupied
  – ’771 shows poor fit to either regime, classic real-world issue
29.
Lessons I Learned by Watching Movies
• Lower-dimensional problems are easier
  – Evolve baseline level and cut-points, solve for wing slopes
  – Hybrid solutions are not “cheating”
• Real-world data always has surprises, and I am always surprised by this
• Can use 5P models as cluster “centroids” to handle 2-state homes
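The hybrid idea can be sketched as follows: the evolutionary search handles only three parameters (baseline and the two cut-points), and for each candidate the two wing slopes come from a direct least-squares solve. This is a minimal sketch assuming the five-parameter change-point form used for illustration elsewhere; the function names and the synthetic data are assumptions, not the presenter's code.

```python
def solve_slopes(temps, usage, baseline, heat_cut, cool_cut):
    """Least-squares heating and cooling slopes for fixed baseline and cut-points."""
    h = [max(0.0, heat_cut - t) for t in temps]   # heating-wing regressor
    c = [max(0.0, t - cool_cut) for t in temps]   # cooling-wing regressor
    r = [u - baseline for u in usage]             # usage above baseline
    hh = sum(a * a for a in h)
    cc = sum(a * a for a in c)
    hc = sum(a * b for a, b in zip(h, c))
    hr = sum(a * b for a, b in zip(h, r))
    cr = sum(a * b for a, b in zip(c, r))
    det = hh * cc - hc * hc
    if abs(det) < 1e-12:
        # Degenerate case (e.g. one wing has no data): fit each wing alone.
        return (hr / hh if hh > 0 else 0.0, cr / cc if cc > 0 else 0.0)
    # Solve the 2x2 normal equations.
    return (hr * cc - cr * hc) / det, (cr * hh - hr * hc) / det


def hybrid_error(outer, temps, usage):
    """Squared error for (baseline, heat_cut, cool_cut) with slopes solved directly."""
    baseline, heat_cut, cool_cut = outer
    heat_slope, cool_slope = solve_slopes(temps, usage, baseline, heat_cut, cool_cut)
    total = 0.0
    for t, u in zip(temps, usage):
        pred = (baseline + heat_slope * max(0.0, heat_cut - t)
                + cool_slope * max(0.0, t - cool_cut))
        total += (pred - u) ** 2
    return total


# Synthetic data from baseline=10, heat slope=2 below 12, cool slope=3 above 22.
temps = [0, 5, 10, 15, 20, 25, 30]
usage = [34.0, 24.0, 14.0, 10.0, 10.0, 19.0, 34.0]
heat_slope, cool_slope = solve_slopes(temps, usage, 10.0, 12.0, 22.0)
```

`hybrid_error` can be handed to an evolutionary search as a three-dimensional fitness function, so the search never has to discover the slopes by mutation.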
30.
And there’s a PRIZE in every box!