Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Introduction to Applied Machine Learning for Data-Driven Science
1. Introduction to Applied Machine Learning
for Data-Driven Science
Ichigaku Takigawa
takigawa@ist.hokudai.ac.jp
2. Today’s talk about only 3 questions
1. What is (and what isn’t) machine learning (ML)?
2. How can we use ML for own “data-driven” science?
3. What should we take care to use ML for own problems?
3. What is (and what isn’t) machine learning?
Question 1
4. Machine Learning
Concerned with the question of how to construct
computer programs that automatically improve with experience.
• Learning to recognize spoken words, handwritten characters, etc
• Learning to recognize who is who by seeing faces
• Learning to walk, speak, swim, ski, etc.
• Learning to drive an autonomous vehicle
• Learning to play world-class go, chess, shogi, etc.
Tasks below would need experience rather than a single principle.
Tom Mitchell
5. Machine Learning
“Learning” means to automatically improve with experience
“Machine” means computer programs
By machine learning, let’s get
computer programs to automatically improve with experience
But what exactly means….
• Computer programs? 🤔
• Automatically improve with experience? 🤔
7. Design and implement computer programs
Any computer programs process inputs to get outputs.
Computer programsInputs Outputs
• We need to explicitly know the complete procedure
to get outputs from inputs. 😱
• We manually code the computer program using computer
programming languages. 😵
8. But often we don’t know how to do it 😫
Spoken words Text words
• Learn to recognize spoken words, handwritten characters, etc
• Learn to recognize who is who
• Learn to walk, speak, swim, ski, etc.
• Learn to drive an autonomous vehicle
• Learn to play world-class go, chess, shogi, etc.
Consider how we can construct computer programs that can
Speech Recognizer
9. So we give “training data” to teach programs
Speech RecognizerSpoken words Text words
“hello”
Inputs Outputs
“machine”
“learning”
“automatically improve with experience (given data)”
…
…
“training data” = examples of input-output pairs
10. (Supervised) machine learning
Machine learning is a way to construct a computer program
directly by a given (large) collection of input-output examples
without being explicitly programmed.
Generic Object Recognition
Speech Recognition
Machine Translation
Super-Resolution Imaging
AI Game Players
“ ”
J’aime la
musique I love music
11. More typical cases with tabular data
ML modelx<latexit sha1_base64="7MVBSNoyoDWzaMSxoBhJg5NShso=">AAAB+HicbVDLSgNBEOyNrxhfUY9eBoPgKeyKoMegF48JmAckS5id9CZDZnaXmVkxLvkCr3r3Jl79G69+iZNkD5pY0FBUdVNNBYng2rjul1NYW9/Y3Cpul3Z29/YPyodHLR2nimGTxSJWnYBqFDzCpuFGYCdRSGUgsB2Mb2d++wGV5nF0byYJ+pIOIx5yRo2VGo/9csWtunOQVeLlpAI56v3yd28Qs1RiZJigWnc9NzF+RpXhTOC01Es1JpSN6RC7lkZUovaz+aNTcmaVAQljZScyZK7+vsio1HoiA7spqRnpZW8m/usFcinZhNd+xqMkNRixRXCYCmJiMmuBDLhCZsTEEsoUt78TNqKKMmO7KtlSvOUKVknrouq5Va9xWand5PUU4QRO4Rw8uIIa3EEdmsAA4Rle4NV5ct6cd+djsVpw8ptj+APn8wcecJOi</latexit><latexit sha1_base64="7MVBSNoyoDWzaMSxoBhJg5NShso=">AAAB+HicbVDLSgNBEOyNrxhfUY9eBoPgKeyKoMegF48JmAckS5id9CZDZnaXmVkxLvkCr3r3Jl79G69+iZNkD5pY0FBUdVNNBYng2rjul1NYW9/Y3Cpul3Z29/YPyodHLR2nimGTxSJWnYBqFDzCpuFGYCdRSGUgsB2Mb2d++wGV5nF0byYJ+pIOIx5yRo2VGo/9csWtunOQVeLlpAI56v3yd28Qs1RiZJigWnc9NzF+RpXhTOC01Es1JpSN6RC7lkZUovaz+aNTcmaVAQljZScyZK7+vsio1HoiA7spqRnpZW8m/usFcinZhNd+xqMkNRixRXCYCmJiMmuBDLhCZsTEEsoUt78TNqKKMmO7KtlSvOUKVknrouq5Va9xWand5PUU4QRO4Rw8uIIa3EEdmsAA4Rle4NV5ct6cd+djsVpw8ptj+APn8wcecJOi</latexit><latexit sha1_base64="7MVBSNoyoDWzaMSxoBhJg5NShso=">AAAB+HicbVDLSgNBEOyNrxhfUY9eBoPgKeyKoMegF48JmAckS5id9CZDZnaXmVkxLvkCr3r3Jl79G69+iZNkD5pY0FBUdVNNBYng2rjul1NYW9/Y3Cpul3Z29/YPyodHLR2nimGTxSJWnYBqFDzCpuFGYCdRSGUgsB2Mb2d++wGV5nF0byYJ+pIOIx5yRo2VGo/9csWtunOQVeLlpAI56v3yd28Qs1RiZJigWnc9NzF+RpXhTOC01Es1JpSN6RC7lkZUovaz+aNTcmaVAQljZScyZK7+vsio1HoiA7spqRnpZW8m/usFcinZhNd+xqMkNRixRXCYCmJiMmuBDLhCZsTEEsoUt78TNqKKMmO7KtlSvOUKVknrouq5Va9xWand5PUU4QRO4Rw8uIIa3EEdmsAA4Rle4NV5ct6cd+djsVpw8ptj+APn8wcecJOi</latexit><latexit sha1_base64="7MVBSNoyoDWzaMSxoBhJg5NShso=">AAAB+HicbVDLSgNBEOyNrxhfUY9eBoPgKeyKoMegF48JmAckS5id9CZDZnaXmVkxLvkCr3r3Jl79G69+iZNkD5pY0FBUdVNNBYng2rjul1NYW9/Y3Cpul3Z29/YPyodHLR2nimGTxSJWnYBqFDzCpuFGYCdRSGUgsB2Mb2d++wGV5nF0byYJ+pIOIx5yRo2VGo/9csWtunOQVeLlpAI56v3yd28Qs1RiZJigWnc9NzF+RpXhTOC01Es1JpSN6RC7lkZUovaz+aNTcmaVAQljZScyZK7+vsio1HoiA7spqRnpZW8m/usFcinZhNd+xqMkNRixRXCYCmJiMmuBDLhCZsTEEsoUt78TNqKKMmO7KtlSvOUKVknrouq5Va9xWand5PUU4QRO4Rw8uIIa3EEdmsAA4Rle4NV5ct6cd+djsVpw8ptj+APn8wcecJOi</latexit>
y<latexit sha1_base64="IAkI/WnulkwRo6OBvH4htvdy0fo=">AAAB+HicbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUL9gFtKJPpTTt0JgkzEyGGfoFb3bsTt/6NW7/EaZuFVg9cOJxzL+dygkRwbVz30ymtrW9sbpW3Kzu7e/sH1cOjjo5TxbDNYhGrXkA1Ch5h23AjsJcopDIQ2A2mt3O/+4BK8zi6N1mCvqTjiIecUWOlVjas1ty6uwD5S7yC1KBAc1j9GoxilkqMDBNU677nJsbPqTKcCZxVBqnGhLIpHWPf0ohK1H6+eHRGzqwyImGs7ESGLNSfFzmVWmcysJuSmole9ebiv14gV5JNeO3nPEpSgxFbBoepICYm8xbIiCtkRmSWUKa4/Z2wCVWUGdtVxZbirVbwl3Qu6p5b91qXtcZNUU8ZTuAUzsGDK2jAHTShDQwQnuAZXpxH59V5c96XqyWnuDmGX3A+vgEgBJOj</latexit><latexit sha1_base64="IAkI/WnulkwRo6OBvH4htvdy0fo=">AAAB+HicbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUL9gFtKJPpTTt0JgkzEyGGfoFb3bsTt/6NW7/EaZuFVg9cOJxzL+dygkRwbVz30ymtrW9sbpW3Kzu7e/sH1cOjjo5TxbDNYhGrXkA1Ch5h23AjsJcopDIQ2A2mt3O/+4BK8zi6N1mCvqTjiIecUWOlVjas1ty6uwD5S7yC1KBAc1j9GoxilkqMDBNU677nJsbPqTKcCZxVBqnGhLIpHWPf0ohK1H6+eHRGzqwyImGs7ESGLNSfFzmVWmcysJuSmole9ebiv14gV5JNeO3nPEpSgxFbBoepICYm8xbIiCtkRmSWUKa4/Z2wCVWUGdtVxZbirVbwl3Qu6p5b91qXtcZNUU8ZTuAUzsGDK2jAHTShDQwQnuAZXpxH59V5c96XqyWnuDmGX3A+vgEgBJOj</latexit><latexit sha1_base64="IAkI/WnulkwRo6OBvH4htvdy0fo=">AAAB+HicbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUL9gFtKJPpTTt0JgkzEyGGfoFb3bsTt/6NW7/EaZuFVg9cOJxzL+dygkRwbVz30ymtrW9sbpW3Kzu7e/sH1cOjjo5TxbDNYhGrXkA1Ch5h23AjsJcopDIQ2A2mt3O/+4BK8zi6N1mCvqTjiIecUWOlVjas1ty6uwD5S7yC1KBAc1j9GoxilkqMDBNU677nJsbPqTKcCZxVBqnGhLIpHWPf0ohK1H6+eHRGzqwyImGs7ESGLNSfFzmVWmcysJuSmole9ebiv14gV5JNeO3nPEpSgxFbBoepICYm8xbIiCtkRmSWUKa4/Z2wCVWUGdtVxZbirVbwl3Qu6p5b91qXtcZNUU8ZTuAUzsGDK2jAHTShDQwQnuAZXpxH59V5c96XqyWnuDmGX3A+vgEgBJOj</latexit><latexit sha1_base64="IAkI/WnulkwRo6OBvH4htvdy0fo=">AAAB+HicbVDLSsNAFL2pr1pfVZduBovgqiQi6LLoxmUL9gFtKJPpTTt0JgkzEyGGfoFb3bsTt/6NW7/EaZuFVg9cOJxzL+dygkRwbVz30ymtrW9sbpW3Kzu7e/sH1cOjjo5TxbDNYhGrXkA1Ch5h23AjsJcopDIQ2A2mt3O/+4BK8zi6N1mCvqTjiIecUWOlVjas1ty6uwD5S7yC1KBAc1j9GoxilkqMDBNU677nJsbPqTKcCZxVBqnGhLIpHWPf0ohK1H6+eHRGzqwyImGs7ESGLNSfFzmVWmcysJuSmole9ebiv14gV5JNeO3nPEpSgxFbBoepICYm8xbIiCtkRmSWUKa4/Z2wCVWUGdtVxZbirVbwl3Qu6p5b91qXtcZNUU8ZTuAUzsGDK2jAHTShDQwQnuAZXpxH59V5c96XqyWnuDmGX3A+vgEgBJOj</latexit>
Inputs Outputs
12. Underlying principle: Use of statistical trends
We see common patterns (empirical rules) emerge from observing
many examples, which we cannot recognize when we see only a few.
Characterize the difference between
not by explicit rules,
but by implicit statistical rules directly defined by many observations.
“ ” and “ ”
23. This is all about (supervised) machine learning
Get weight(g) and height(cm)
Fruit SorterInputs Outputs
Now we got a computer program for this classification problem.
weight(g)
height(cm)
“features” or “descriptors”
Apple or Orange
“class label” or “response”