challenges, learnings and opportunities
presented by imron zuhri, adit, and samudra
KUDO codefest 14 May 2016
machine learning
can a machine think?
in 1996, Garry Kasparov was not afraid of a computer, and he won
the next year, he played against a new and improved Deep Blue and lost
this is the move that was so surprising, so un-machine-like,
that he was sure the IBM team had cheated
Rd5
Rd1
a random move, a computer bug
to kasparov, a sign of superior intelligence
big data analytics is the culmination
of the machine way of thinking
we can now immensely
extend our memory and computational power
to help us do that
what is machine learning
some definitions
• a (hypnotized) user’s perspective:
a scientific (witchcraft) field that
researches fundamental principles from data (potions) and
develops magical algorithms (spells to cast)
(pascal vincent, 2015)
• a field of study that gives computers the ability to learn without
being explicitly programmed
(arthur samuel, 1959)
• formal definition (tom mitchell, 1998):
“a machine is said to learn if its performance P
on specific tasks T
improves with each experience E”
CURRENT VIEW OF ML FOUNDING DISCIPLINES
three niches for machine learning
data mining: using historical data to improve decisions
• medical records → medical knowledge
software applications that are difficult to program by hand
• autonomous driving
• image classification
user modeling
• automatic recommender systems
source: rong jin, 2013
(some) open problems in machine learning
• one-shot learning
• unsupervised learning
• reinforcement learning
• artificial general intelligence
“most of human and animal learning
is unsupervised learning. If
intelligence was a cake, unsupervised
learning would be the cake,
supervised learning would be the
icing on the cake, and reinforcement
learning would be the cherry on the
cake. We know how to make the icing
and the cherry, but we don't know
how to make the cake.”
yann lecun
challenges in machine learning
• data-related:
  • abundant yet scattered data
  • unstructured, noisy data
  • offline-stored data (duh!)
• resource-related:
  • data storage
  • space constraints
  • computing power
  • training time
  • inve$$$tments
    • initial investments
    • running costs
challenges in machine learning
• methodological issues:
  • result consistency (i.e. accuracy)
  • overfitting
  • algorithm computational efficiency
• miscellaneous:
  • architectural differences / portability issues
  • popularity of non-open-standard, vendor-locked compute libraries/apis (rawr!)
recent breakthroughs in machine learning
deepmind atari q learner (2014)
plays 5 kinds of atari 2600 games
states: pixels in atari
actions: left/right move
reward: score
algorithm used:
feedforward “q-learning”
conv-net
for unsupervised map of reward
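the slide frames the atari agent as states, actions and a reward, learned with q-learning. as a minimal sketch of the q-learning update only, assuming a made-up 5-cell corridor instead of atari pixels and a lookup table instead of a conv-net (both are illustrative swaps, not the deepmind setup; all names below are hypothetical):

```python
import random

# Hypothetical toy environment (an assumption, not from the slides):
# a corridor of 5 cells, start in cell 0, reward 1.0 on reaching cell 4.
# Actions: 0 = move left, 1 = move right.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best future value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

calling `greedy_policy(train_q_table())` gives the learned action per cell. the atari learner replaces the table `q` with a conv-net that maps pixels to one value per action, but the target it regresses towards has the same form.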
recent breakthroughs in machine learning
the translator (2015)
real-time translation of speech
from/into 7 different languages
able to run even on
resource-constrained embedded
hardware (e.g. smartphones)
uses the same engine that is used in
microsoft cortana (creepy!)
Reinforcement Learning: DeepMind AlphaGo
• google deepmind alphago (2016)
• 99.8% winning rate vs other go programs
• first program to defeat a human go champion
• algorithm used:
  • deep neural networks
  • monte carlo tree search
  • supervised learning from expert games
  • reinforcement learning by playing against other alphago instances
supervised learning: random forest
fernández-delgado et al. (2014) evaluated 179 classifiers on 121 data sets in the uci data,
result:
• the top 5 are random forest classifiers
• for kaggle competitions, try gradient boosting (gbm), e.g. xgboost.
supervised: deep learning
don’t be fooled: dl research improves
part by part, through either a new kind of layer,
a new activation function, a new non-convex
optimization solver, or a deeper
neural net.
from rodrigo benenson
deep learning accuracies ranking
supervised: deep learning
summary:
• relu works better than the sigmoid function as an activation.
• maxout works better as the activation function when combined
with dropconnect.
• a dropout layer helps fight overfitting.
• adagrad and adadelta work better if you don’t want to
tune optimization hyperparameters.
• deeper networks work: highway layers and residual layers.
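two of the summary points can be made concrete with a few lines. this is a minimal sketch, not any framework's implementation: one commonly given reason for relu beating sigmoid is that the sigmoid gradient never exceeds 0.25 and vanishes for large inputs, while the relu gradient stays at 1 for positive inputs. the `dropout` helper is the "inverted dropout" variant, chosen here as an assumption.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the sigmoid: at most 0.25, tiny for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of relu: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def dropout(values, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and rescale the survivors so the expected activation is unchanged.
    At inference time the values pass through untouched."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

for example, `dropout([1.0] * 8, 0.5, random.Random(0))` returns a list whose entries are each either `0.0` or `2.0`; which units are dropped depends on the seed.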
unsupervised: t-sne
t-distributed stochastic neighbor embedding
van der maaten and hinton (2008):
mnist data set visualization
• works best for data visualization
• can be used for clustering too
(if you bother to tweak the algorithm)
Given only 100 or 1,000 labeled examples, with the rest (~50,000) unlabeled,
try to predict 10,000 unseen examples.
● It works, even with very few labels!
● Now we don’t have to ask interns or PhD students to label
data. :)
A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko (2015)
semi-supervised learning: ladder neural networks
collaborative filtering: restricted boltzmann machine
rbm for collaborative filtering (hinton, 2008):
• it has been used in netflix and spotify algorithms.
• it works better than svd!
• correlation(svd, rbm): -1 < c < 1
  • so it can be ensembled with svd
    to improve the prediction.
some advice for applied machine learning research
(this competition)
• preprocessing: scaling & imputation
• cross-validation: choose the best algorithms
• hyperparameter optimization
• ensembling n models: dark knowledge
raschka (2014):
scaling improves prediction!
gelman (2006):
predict the n/a (missing) values, then
add noise to the predicted values:
less biased!
data preprocessing: scaling & imputation
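the two preprocessing steps above can be sketched as follows. this is a hedged illustration, not the cited authors' code: `standardize` is plain z-score scaling, and `impute_with_noise` assumes a single predictor column and a least-squares line, filling each missing value with the line's prediction plus gaussian noise matched to the residual spread. both function names are hypothetical.

```python
import random
import statistics


def standardize(column):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def impute_with_noise(xs, ys, rng):
    """Fill missing ys (None) with a least-squares prediction from xs plus
    Gaussian noise whose spread matches the residuals of the observed rows.
    Assumes at least two observed rows with different x values."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mx = statistics.fmean(x for x, _ in observed)
    my = statistics.fmean(y for _, y in observed)
    sxx = sum((x - mx) ** 2 for x, _ in observed)
    slope = sum((x - mx) * (y - my) for x, y in observed) / sxx
    intercept = my - slope * mx
    residual_sd = statistics.pstdev(
        [y - (intercept + slope * x) for x, y in observed]
    )
    return [
        y if y is not None else intercept + slope * x + rng.gauss(0.0, residual_sd)
        for x, y in zip(xs, ys)
    ]
```

adding the noise keeps the imputed values from sitting exactly on the fitted line, which would otherwise understate the variance of the completed column.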
cross-validation: how to choose the best algorithm?
• cross-validation is a must!
(tibshirani et al., 2014)
• don’t let your cross-validation
data partitions overlap!
(zhang, data robot)
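a minimal sketch of non-overlapping k-fold partitions, assuming plain python lists and user-supplied `fit` and `score` callables (both hypothetical names): every sample lands in exactly one validation fold, and a fold's training indices never include its own validation indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once, then cut them into k disjoint validation
    folds. Yields (training_indices, validation_indices) per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(fit, score, xs, ys, k=5):
    """Average validation score over k folds.
    `fit(xs, ys)` returns a model; `score(model, xs, ys)` returns a float."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])
        )
    return sum(scores) / len(scores)
```

to choose between algorithms, run `cross_validate` once per candidate with the same `k` and compare the averaged scores.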
hyperparameter optimization
if you want to search for the best hyperparameters:
do random search.
random search is better than grid search
(bergstra and bengio, 2012)
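a minimal sketch of random search, assuming continuous hyperparameters sampled uniformly from a range and a made-up objective `toy_loss` standing in for a real validation loss. the intuition usually given: when only a few hyperparameters really matter, random sampling tries many more distinct values of each one than a grid does with the same budget.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample every hyperparameter independently and keep the best trial.
    `space` maps a name to a (low, high) range; lower objective is better."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high) for name, (low, high) in space.items()
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    """Hypothetical validation loss: `learning_rate` matters a lot,
    `momentum` barely matters."""
    return (params["learning_rate"] - 0.3) ** 2 + 0.001 * params["momentum"]
```

usage: `random_search(toy_loss, {"learning_rate": (0.0, 1.0), "momentum": (0.0, 1.0)}, n_trials=200)` returns the best sampled setting and its loss.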
ensembling n-models: dark knowledge
If two models give the same accuracy but their
prediction outputs have low correlation, then we can
improve prediction accuracy by averaging
the models’ predictions.
(Hinton, 2015)
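a minimal sketch of that averaging idea, using made-up predicted probabilities for two hypothetical models on six samples. each model is wrong on a different sample, so their mistakes do not line up and the average corrects both.

```python
def accuracy(probs, labels, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def average_predictions(*models):
    """Element-wise mean of the models' predicted probabilities."""
    return [sum(ps) / len(ps) for ps in zip(*models)]


# Illustrative data only (not from the slides).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3]   # wrong on the third sample
model_b = [0.9, 0.45, 0.8, 0.2, 0.1, 0.3]  # wrong on the second sample
blend = average_predictions(model_a, model_b)
```

here `accuracy(model_a, labels)` and `accuracy(model_b, labels)` are both 5/6, while `accuracy(blend, labels)` is 1.0. if the two models made the same mistake, averaging would not help.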
the landscape of opportunities
Popular Big Data Industry
Financial Services, Telco, Web/Media, Retail, Healthcare, Government
• Fraud detection
• Compliance reporting
• Portfolio analysis
• Customer statements
• Wire transfer alerts
• Customer acquisition, retention, and profitability
• Subscriber data management
• Fraud analysis
• Social analysis
• Response times
• Traffic analysis
• Product affinity/bundling
• Sentiment analysis
• Content monetization
• Advertising optimization
• Optimization of user experience/click stream analysis
• Network optimization to support service levels
• Store operation analysis
• Customer loyalty programs
• Collaborative planning and forecasting
• Loss prevention
• Supply chain optimization
• Drug development and launch cost reduction
• Regulatory compliance
• Product quality
• Return on promotional investment
• Lowered risk of new product success
• Security/anti-terror
• Recovery Act public disclosure
• Budgetary control and management
• Educational reporting
• Asset control and assessment
• Environment monitoring
*cisco 2013-2014
currently the biggest prescriptive analytics engine:
contextual advertising
http://www.flashtalking.com/us/targeted-ads/
another one:
marketplace and services recommendation engine
challenges of implementation
and
what we do with machine learning
did you follow waze’s instructions during your first week?
• would you buy a self-driving car that couldn’t drive
itself in 99 percent of the country?
• or that knew nearly nothing about parking,
• couldn’t be taken out in snow or heavy rain,
• and would drive straight over a gaping pothole?
if your answer is yes, then check out the google self-driving car, model year
2014
but
can we trust them enough?
the BIGGEST CHALLENGES in indonesia
DATA SETS
the current analytics technology
humans are still doing
most of the process
the current challenges of big data analytics
heterogeneous data sources, systems and formats
time-consuming and complex data preparation process
almost impossible task of integrating various kinds of data
it requires experts to analyze big and complex data
most of the user interactions are not intuitive
“Before performing analytics, data scientists must first
format and prepare the raw data for analytics, often with
more than 80% of the effort.”, said Intel Corp. Research
what would it be like
if we could simplify the whole process?
hence our vision
we believe humans should not be bogged down by tedious matters.
by reimagining analytics, we envision the creation of intelligent
machines
that will free humans to focus on solving the world’s toughest
problems.
intelligent machines that can help us collect the massive amount of data
automatically read and connect to
any kind of data, including automatic
machine-to-machine connections
structured data
printed invoices
social media conversation
then help us separate the signal from the noise
automatic data quality assessments,
data cleansing and data filtering
regi
mita
gundam
x-men
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
batman
tom
mediatrac
and finally help us make sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
gundam
batman
sith
through a highly intuitive and natural user interface
natural language interface
voice and gesture recognition
how many restaurants sell soto along jalan senopati?
digital
telco
legal
retail
healthcare
agriculture
multi format
structured
unstructured
unclean
missing data
unstandardized
unconnected
difficult to analyze
cleaned and standardized
enriched and validated
connected at granular level
analytics ready
data
automatic
data collection
automatic
data preparation
automatic
data integration
territory management
CONFIDENTIAL for internal use only
all of our silo data will have a totally elevated value,
once you connect them all in a meaningful way
are all of our current data connected yet?
Almost…
google is a humongous library index, with a smart
library card search that redirects you to the original
documents
facebook is a giant personal scrapbook of all your
acquaintances that are currently linked by manual
tagging and friends list
source:techglimpse
youtube and instagram are huge repositories of
current knowledge, lifestyle and trends that are still
largely unconnected
now imagine this!
when we can have intelligent machines that can
connect everything, in a meaningful way…
we can start asking questions, on things we never
thought possible to be asked before
Spotify can map songs across social graphs.
Shazam can give us situational data: where someone is listening to a song,
when, how and even (to an extent) why.
YouTube can help us track the growth of a song using search and streams.
Instagram & Vine are becoming hotbeds for music discovery.
what if we could connect all their data together?
or, if you have a radio station, what sort of playlist will appeal to
your target audience if we know that a sizeable percentage of them
own a hummer?
we could even predict the specific combination of words, notes and
beats that will increase the chance of putting a song in the
billboard top 40 this upcoming season.
here are some samples of hidden insights
that we can discover from our own large repository of data,
using our intelligent data integration and data discovery tools
when we integrate historical media articles with geodemographic and point-of-interest
databases, we can create a model that predicts a high probability of fire
incidents down to street level
productivity optimization
lessons learned including how to scale your ML
scalability problems - outline
• large scale machine learning
• mahout – scalable ml on hadoop
• jubatus – distributed online real-time ml
• vowpal wabbit – fast learning at yahoo/ms
• trident ml and storm pattern: ml on storm, yarn
• upcoming: samoa, ml on s4, storm
• issues in scalable distributed ml
• load balancing
• auto scaling
• job scheduling
• workflow management
• data and model parallelism
• parameter server framework
• peer-to-peer framework
scalability problems - outline
• distributed deep learning
• yahoolda: scalable parallel framework for latent variable models
• distbelief – distributed deep learning on clusters
• h2o – distributed deep learning on spark
• adam at msr – distributed deep learning
• dl4j – open source deep learning on hadoop and spark
• petuum – distributed machine learning
• singa – distributed deep learning
• tensorflow: google large scale distributed dl
• mxnet: heterogeneous distributed deep learning
• caffe on spark: yahoo
• distributed learning and optimization
• proximal splitting/auxiliary coordinates
• bundle (sub-gradient)
• shotgun: parallelized cdm (coordinate descent method)
• asynchronous sgd
• hogwild/dogwild
what’s next?
emerging analytics technology for automatic
analytics on large dimensional data
online deep learning
topological data analysis
fuzzy-rough set based data exploration system
granular computing
kernel set and spatiotemporal analysis
applied differential geometry
non axiomatic reasoning system
intelligent rule and knowledge extraction/discovery
multi agent based modeling
weak signal detection and analysis
bayesian networks analysis
genetic programming
self organizing neural networks
and also more humanlike user
interaction and data visualization
technology
eye tracking
glass-free auto stereoscopy
touch sensitive hologram
natural language user interface
tangible user interface
wearable gestural interface
brain-computer interface
sensor network user interface
In the meantime
principles for the development of a complete mind:
study the science of art. study the art of science.
develop your senses — especially learn how to see.
realize that everything connects to everything else.
Leonardo da Vinci

Machine Learning - Challenges, Learnings & Opportunities

  • 1. machine learning: challenges, learnings and opportunities. presented by imron zuhri, adit, and samudra. KUDO codefest, 14 May 2016
  • 2. can a machine think?
  • 3. in 1996, Garry Kasparov was not afraid of a computer, and he won. the next year, he played against a new and improved Deep Blue and lost
  • 4. this is the move that was so surprising, so un-machine-like, that he was sure the IBM team had cheated Rd5 Rd1
  • 5. a random move, a computer bug; to kasparov, a sign of superior intelligence Rd5 Rd1
  • 6. big data analytics is the culmination of the machine way of thinking: we can now immensely extend our memory and computational power to help us do that
  • 7. what is machine learning
  • 8. some definitions: (1) a (hypnotized) user's perspective: a scientific (witchcraft) field that researches fundamental principles from data (potions) and develops magical algorithms (spells to cast) (pascal vincent, 2015); (2) a field of study that gives computers the ability to learn without being explicitly programmed (arthur samuel, 1959); (3) a formal definition (tom mitchell, 1998): a machine is said to be learning if its performance P on specific tasks T improves with experience E
  • 9. CURRENT VIEW OF ML FOUNDING DISCIPLINES
  • 10. three niches for machine learning: (1) data mining, using historical data to improve decisions (medical records → medical knowledge); (2) software applications that are difficult to program by hand (autonomous driving, image classification); (3) user modeling (automatic recommender systems). source: rong jin, 2013
  • 49. (some) open problems in machine learning: one-shot learning; unsupervised learning; reinforcement learning; artificial general intelligence. “most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.” yann lecun
  • 50. challenges in machine learning. data-related: abundant yet scattered data; unstructured, noisy data; offline-stored data (duh!). resource-related: data storage; space constraints; computing power; training time; inve$$$tments (initial investments, running costs)
  • 51. challenges in machine learning. methodical issues: result consistency (i.e. accuracy); overfitting; algorithm computational efficiency. miscellaneous: architectural differences/portability issues; popularity of non-open-standard, vendor-locked compute libraries/apis (rawr!)
  • 52. recent breakthroughs in machine learning: deepmind atari q learner (2014). plays 5 kinds of atari 2600 games. states: pixels in atari; actions: left/right move; reward: score. algorithm used: feedforward “q-learning” conv-net for unsupervised map of reward
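
The state/action/reward framing on this slide can be illustrated with plain tabular Q-learning. The sketch below is not the DeepMind conv-net: it assumes a hypothetical toy corridor (states 0..4, reward 1 for reaching the right end) and made-up values for `alpha`, `gamma` and `epsilon`, and only shows the core update rule that the Atari learner approximates with a neural network.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States are 0..n_states-1; the episode ends at the right-most state,
    which is the only transition that pays a reward of 1.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: explore sometimes, otherwise take the best-known action
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

q_table = q_learning()
```

After training, the learned values should prefer "right" in every non-terminal state; a deep Q-learner replaces the table `q` with a network that maps pixels to action values.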
  • 53. recent breakthroughs in machine learning: the translator (2015). real-time translation of speech from/into 7 different languages; able to run even on resource-constrained embedded hardware (e.g. smartphones); uses the same engine that was used in microsoft cortana (creepy!)
  • 54. Reinforcement Learning: DeepMind AlphaGo. google deepmind alphago (2016): 99.8% winning rate vs other go programs; first program to defeat a human go champion. algorithm used: deep neural network; monte carlo tree search; supervised learning from expert games; reinforcement learning vs other alphago instances
  • 55. supervised learning: random forest. fernández-delgado et al. (2014) compared 179 classifiers on 121 data sets from the uci repository. result: the top 5 are random forest classifiers. for kaggle competitions, try gbm: xgboost.
  • 56. supervised: deep learning. don’t be fooled: dl research improves part by part, through either a new kind of layer, a new activation function, a new non-convex optimization solver, or a deeper neural net. from rodrigo benenson's deep learning accuracies ranking
  • 57. supervised: deep learning summary: relu works better than the sigmoid function for activation; maxout works better when applied to dropconnect for activation function; dropout layers work to fight overfitting; adagrad and adadelta work better if you don’t want to tune optimization hyperparameters; deeper networks work: highway layers and residual layers.
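
A minimal sketch of two items from this summary, written in plain Python rather than any deep learning framework. It compares the gradients of sigmoid and ReLU (one commonly cited reason ReLU trains more easily is that its gradient does not shrink for large positive inputs) and shows "inverted" dropout; the function names and the drop probability are illustrative assumptions, not taken from the slides.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative of sigmoid; at most 0.25 and close to 0 for large |x|
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # derivative of ReLU; stays at 1 for any positive input
    return 1.0 if x > 0 else 0.0

def dropout(values, p_drop, rng):
    """Inverted dropout (training time): zero each unit with probability
    p_drop and rescale the survivors so the expected activation is unchanged."""
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = dropout([1.0] * 8, p_drop=0.5, rng=random.Random(0))
```

At test time dropout is simply switched off; the rescaling during training is what makes that possible without changing the expected activations.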
  • 58. unsupervised: t-sne, t-distributed stochastic neighbor embedding. van der maaten and hinton (2008): mnist data set visualization. works best for data-viz; can be used for clustering too (if you’d bother to tweak the algo)
  • 59. semi-supervised learning: ladder neural networks. A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko (2015). given 100 or 1,000 labeled examples, with the rest (~50,000) unlabeled, try to predict 10,000 unseen examples. it works, even with few labels! now we don’t have to tell some interns or PhD students to label data. :)
  • 60. collaborative filtering: restricted boltzmann machine. rbm for collaborative filtering (hinton, 2008): it has been used in netflix and spotify algos; it works better than svd!; correlation(svd, rbm): -1 < c < 1, so it can be ensembled with svd to improve the prediction.
  • 61. some advice for applied machine learning research (this competition): preprocessing (scaling & imputation); cross-validation (choose the best algos); hyperparameter optimization; ensembling n models (dark knowledge)
  • 62. data preprocessing: scaling & imputation. raschka (2014): scaling improves prediction! gelman (2006): predict the n/a (missing) values, then add noise to the predictions, so the imputed data is less biased!
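
A minimal sketch of the two preprocessing steps, in plain Python. Note the swap: it uses simple mean imputation as a stand-in, not the prediction-plus-noise imputation attributed to gelman on the slide; the function names and the toy column are assumptions for illustration.

```python
import math

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation.

    Assumes the column is not constant, otherwise std would be zero.
    """
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

raw = [1.0, None, 3.0]          # toy feature with one missing value
filled = impute_mean(raw)       # missing entry becomes 2.0
scaled = standardize(filled)    # zero mean, unit variance
```

In a real pipeline the mean and standard deviation would be computed on the training split only and then reused on validation and test data.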
  • 63. cross-validation: how to choose the best algo? cross-validation is a must! (tibshirani et al., 2014). don’t overlap your cross-validation data partitions! (zhang, data robot)
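
The "don't overlap your partitions" advice can be sketched as a k-fold splitter in which every sample lands in exactly one test fold. This is a hand-rolled illustration with assumed names (`k_fold_indices`, `seed`), not the method of any cited author.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k non-overlapping folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # every k-th shuffled index goes to the same fold, so folds are disjoint
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for train, test in k_fold_indices(n_samples=10, k=3):
    # a model would be fit on `train` and scored on `test` here
    assert not set(train) & set(test)
```

Averaging the k test scores gives one number per algorithm, which is what makes the comparison between candidate algorithms fair.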
  • 64. hyperparameter optimization: if you want to search for the best hyperparameters, do random search. random search is better than grid search (bergstra & bengio, 2012)
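
A minimal sketch of random search over continuous hyperparameters. The objective here is a hypothetical stand-in for a validation loss (in practice it would train and evaluate a model); the parameter names `lr` and `reg` and their ranges are made up for illustration.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range
    and keep the configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_loss(p):
    # hypothetical loss surface with its minimum at lr=0.3, reg=0.7
    return (p["lr"] - 0.3) ** 2 + (p["reg"] - 0.7) ** 2

best, loss = random_search(toy_validation_loss,
                           {"lr": (0.0, 1.0), "reg": (0.0, 1.0)},
                           n_trials=200)
```

Unlike a grid, every trial tries a new value of every hyperparameter, so the ones that matter most get explored at many distinct settings for the same budget.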
  • 65. ensembling n-models: dark knowledge. if two models give the same accuracy but their prediction outputs have low correlation, then we can improve prediction accuracy by averaging the models' predictions. (Hinton, 2015)
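
The averaging claim can be checked on a small hand-made example. The two "models" below are hypothetical prediction vectors constructed so that both have the same mean squared error while their errors are uncorrelated; this illustrates plain prediction averaging only, not the distillation procedure that the term "dark knowledge" also refers to.

```python
def average_predictions(*model_outputs):
    """Element-wise mean of several models' predictions."""
    n_models = len(model_outputs)
    return [sum(vals) / n_models for vals in zip(*model_outputs)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

target  = [1.0, 0.0, 1.0, 0.0]
model_a = [1.2, 0.2, 0.8, -0.2]   # errors: +0.2, +0.2, -0.2, -0.2
model_b = [1.2, -0.2, 1.2, -0.2]  # errors: +0.2, -0.2, +0.2, -0.2
blend = average_predictions(model_a, model_b)

# both models have the same error, yet the blend is better than either
assert mse(blend, target) < mse(model_a, target)
assert mse(blend, target) < mse(model_b, target)
```

Where the two models err in opposite directions the errors cancel, which is why low correlation between model outputs is the property to look for when choosing ensemble members.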
  • 66. the landscape of opportunities
  • 68. Popular Big Data Industry Financial Services Telco Web/Media Retail Healthcare Government • Fraud detection • Compliance reporting • Portfolio analysis • Customer statements • Wire transfer alerts • Customer acquisition, retention, and profitability • Subscriber data management • Fraud analysis • Social analysis • Response times • Traffic analysis • Product affinity/bundling • Sentiment Analysis • Content monetization • Advertising optimization • Optimization of user experience/ click stream analysis • Network optimization to support service levels • Store operation analysis • Customer loyalty programs • Collaborative planning and forecasting • Loss prevention • Supply chain optimization • Drug development and launch cost reduction • Regulatory compliance • Product quality • Return on promotional investment • Lowered risk of new product success • Security/anti-terror • Recovery Act public disclosure • Budgetary control and management • Educational reporting • Asset control and assessment Environment monitoring *cisco 2013-2014
  • 71. currently the biggest prescriptive analytics engine: contextual advertising http://www.flashtalking.com/us/targeted-ads/
  • 72. another one: marketplace and services recommendation engine
  • 73. challenges of implementation and what we do with machine learning
  • 74. did you follow waze's instructions during your first week?
  • 75. would you buy a self-driving car that couldn’t drive itself in 99 percent of the country? or that knew nearly nothing about parking, couldn’t be taken out in snow or heavy rain, and would drive straight over a gaping pothole? if your answer is yes, then check out the google self-driving car, model year 2014
  • 76. but
  • 77. can we trust them enough?
  • 78. the BIGGEST CHALLENGES in indonesia
  • 80. the current analytics technology: humans still do most of the process
  • 81. the current challenges of big data analytics: heterogeneous data sources, systems and formats; a time-consuming and complex data preparation process; the almost impossible task of integrating various kinds of data; it requires experts to analyze big and complex data; most of the user interactions are not intuitive. “Before performing analytics, data scientists must first format and prepare the raw data for analytics, often with more than 80% of the effort.”, said Intel Corp. Research
  • 82. what would it be like if we could simplify the whole process?
  • 83. hence our vision: we believe humans should not be bogged down by tedious matters. by reimagining analytics, we envision the creation of intelligent machines that will free humans to focus on solving the world’s toughest problems.
  • 84. intelligent machines that can help us collect massive amounts of data: automatically read and connect to any kind of data, including automatic machine-to-machine connections (structured data, printed invoices, social media conversations)
  • 86. then help us separate the signal from the noise: automatic data quality assessment, data cleansing and data filtering (regi, mita, gundam, x-men)
  • 87. then help us separate the signal from the noise: automatic data quality assessment, data cleansing and data filtering (regi, mita, gundam)
  • 88. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling (regi, mita, gundam)
  • 89. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling (regi, mita, gundam, batman, tom, mediatrac)
  • 91. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery (gundam, batman, sith)
  • 92. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery (regi, mita, gundam, batman, tom, mediatrac; gundam, batman, sith)
  • 93. through a highly intuitive and natural user interface: natural language interface, voice and gesture recognition. “how many restaurants sell soto along jalan senopati?”
  • 96. multi-format, structured, unstructured, unclean, missing data, unstandardized, unconnected, difficult to analyze; cleaned and standardized, enriched and validated, connected at granular level, analytics-ready data; automatic data collection, automatic data preparation, automatic data integration
  • 100. all of our siloed data will have a totally elevated value once we connect it all in a meaningful way
  • 101. are all of our current data connected yet?
  • 103. google is a humongous library index, with a smart library card search that redirects you to the original documents
  • 104. facebook is a giant personal scrapbook of all your acquaintances that are currently linked by manual tagging and friends lists. source: techglimpse
  • 105. youtube and instagram are huge repositories of current knowledge, lifestyle and trends that are still largely unconnected
  • 107. when we can have intelligent machines that can connect everything in a meaningful way… we can start asking questions about things we never thought possible to ask before
  • 108. Spotify can map songs across social graphs. Shazam can give us situational data: where someone is listening to a song, when, how and even (to an extent) why. YouTube can help us track the growth of a song using search and streams. Instagram & Vine are becoming hotbeds for music discovery. what if we could connect all their data together?
  • 109. or, if you have a radio station, what sort of playlist will appeal to your target audience, if we know that a sizeable percentage of them own a hummer?
  • 110. we can even predict the specific combination of words, notes and beats that will increase the chance of putting a song in the billboard top 40 this upcoming season.
  • 111. here are some samples of hidden insights that we can discover from our own large repository of data, using our intelligent data integration and data discovery tools
  • 112. when we integrate historical media articles with geodemographic and point-of-interest databases, we can create a model that can predict a high probability of fire incidence down to street level
  • 120. lessons learned including how to scale your ML
  • 153. scalability problems - outline. large scale machine learning: mahout (scalable ml on hadoop); jubatus (distributed online real-time ml); vowpal wabbit (fast learning at yahoo/ms); trident ml and storm pattern (ml on storm, yarn); upcoming: samoa (ml on s4, storm). issues in scalable distributed ml: load balancing; auto scaling; job scheduling; workflow management; data and model parallelism; parameter server framework; peer-to-peer framework
  • 154. scalability problems - outline. distributed deep learning: yahoo lda (scalable parallel framework in latent variable models); distbelief (distributed deep learning on cluster); h2o (distributed deep learning on spark); adam at msr (distributed deep learning); dl4j (open source for deep learning on hadoop and spark); petuum (distributed machine learning); singa (distributed deep learning); tensorflow (google large scale distributed dl); mxnet (heterogeneous distributed deep learning); caffe on spark (yahoo). distributed learning and optimization: proximal splitting/auxiliary coordinates; bundle (sub-gradient); shotgun: parallelized cdm (coordinate descent method); asynchronous sgd; hogwild/dogwild
  • 159. emerging analytics technology for automatic analytics on large dimensional data: online deep learning; topological data analysis; fuzzy-rough set based data exploration system; granular computing; kernel set and spatiotemporal analysis; applied differential geometry; non-axiomatic reasoning system; intelligent rule and knowledge extraction/discovery; multi-agent based modeling; weak signal detection and analysis; bayesian networks analysis; genetic programming; self-organizing neural networks
  • 160. and also more humanlike user interaction and data visualization technology: eye tracking; glasses-free autostereoscopy; touch-sensitive hologram; natural language user interface; tangible user interface; wearable gestural interface; brain-computer interface; sensor network user interface
  • 162. principles for the development of a complete mind: study the science of art. study the art of science. develop your senses, especially learn how to see. realize that everything connects to everything else. Leonardo da Vinci