SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
Learning for Big Data
Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw
Department of Computer Science
& Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)
slightly modified from my keynote talk
in IEEE BigData 2015 Taipei Satellite Session
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 1/30
Introduction
About the Title
• “Learning for Big Data”
—my wife: you have made a typo
• do you mean “Learning from 
Z
ZBig Data”?
—no, not a shameless sales campaign
for my co-authored best-selling book
(http://amlbook.com)
as machine learning
researcher
machine learning for big data
—easy?!
as machine learning
educator
human learning for big data
—hard!!
will focus on human learning for big data
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 2/30
Introduction
Human Learning for Big Data
Todo
• some FAQs that I have encountered as
• educator (NTU and NTU@Coursera)
• team mentor (KDDCups, TSMC Big Data competition, etc.)
• researcher (CLLab@NTU)
• consultant ( ), a real-time advertisement bidding startup
• my imperfect yet honest answers that hint what shall be learned
First Honest Claims
• must-learn for big data ≈ must-learn for small data in ML,
but the former with bigger seriousness
• system design/architecture very important, but somewhat
beyond my pay grade
I wish I had an answer to that
because I’m tired of answering that question.
—Yogi Berra (Athlete)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 3/30
Asking Questions
Big Data FAQs (1/4)
how to ask good questions from
(my precious big) data?
My Polite Answer
good start already , any more
thoughts that you have in mind?
My Honest Answer
I don’t know.
or a slightly longer answer:
if you don’t know, I don’t know.
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 4/30
Asking Questions
A Similar Scenario
how to ask good questions from
(my precious big) data?
how to find a research topic for my thesis?
My Polite Answer
good start already , any more
thoughts that you have in mind?
My Honest Answer
I don’t know.
or a slightly longer answer:
I don’t know, but perhaps you can start by
thinking about motivation and feasibility.
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 5/30
Asking Questions
Finding (Big) Data Questions
≈ Finding Research Topics
• motivation: what are you interested in?
• feasibility: what can or cannot be done?
motivation
• something publishable?
oh, possibly just for
people in academia
• something that improves
xyz performance
• something that inspires
deeper study
—helps generate questions
feasibility
• modeling
• computational
• budget
• timeline
• . . .
—helps filter questions
brainstorm from motivation;
rationalize from feasibility
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 6/30
Asking Questions
Finding Big Data Questions
generate questions
from motivation
• variety: dream more in big
data age
• velocity: evolving data,
evolving questions
filter questions
from feasibility
• volume: computational
bottleneck
• veracity: modeling with
non-textbook data
almost never find right question in your first try
—good questions come interactively
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 7/30
Asking Questions
Interactive Question-Asking from Big Data:
Our KDDCup 2011 Experience (1/2)
Recommender System
• data: how users have rated movies
• goal: predict how a user would rate an unrated movie
A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in
KDDCup 2011, the most prestigious data mining competition
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs
National Taiwan University got two world
champions in KDDCup 2011—with Profs.
Chih-Jen Lin, Shou-De Lin, and many students.
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 8/30
Asking Questions
Interactive Question-Asking from Big Data:
Our KDDCup 2011 Experience (2/2)
Q1 (pre-defined): can we improve rating prediction of (user, song)?
Q1.1 after data analysis:
two types of users, lazy 7% (same
rating always)  normal
—if a user gives 60, 60, . . . during
training, how’d she rate next item?
same (80%) different (20%)
Q1.1.1: can we distinguish 80%
using other features?
. . .
—failed (something you normally
wouldn’t see in paper )
Q1.2 after considering domain
knowledge: test data are newer
logs
—shall we emphasize newer logs in
training data?
Q1.2.1: can we just give each log
different weight? (but how?)
Q1.2.2: can we tune optimization
to effectively emphasize newer
logs? (yes this worked )
our KDDCup experience: interactive
(good or bad) question-asking kept us going!
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 9/30
Asking Questions
Learning to Ask Questions from Big Data
Must-learn Items
• true interest for motivation
—big data don’t generate questions, big interests do
• capability of machines (when to use ML?) for feasibility
Taught in ML Foundations on NTU@Coursera
1 exists underlying pattern to be learned
2 no easy/programmable definition of pattern
3 having data related to pattern
—ML isn’t cure-all
• research cycle for systematic steps
—a Ph.D. or serious research during M.S./undergraduate study
Computers are useless. They can only give
you answers.—Pablo Picasso (Artist)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 10/30
Simple Model
Big Data FAQs (2/4)
what is the best machine learning model for
(my precious big) data?
My Polite Answer
the best model is
data-dependent, let’s chat
about your data first
My Honest Answer
I don’t know.
or a slightly longer answer:
I don’t know about best, but perhaps you can
start by thinking about simple models.
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 11/30
Simple Model
Sophisticated Model for Big Data
what is the best machine learning model for
(my precious big) data?
what is the most sophisticated machine
learning model for (my precious big) data?
• myth: my big data work best with most sophisticated model
• partially true: deep learning for image recognition @ Google
—10 million images on 1 billion internal weights
(Le et al., Building High-level Features Using Large Scale Unsupervised Learning, ICML
2012)
Science must begin with myths,
and with the criticism of myths.
—Karl Popper (Philosopher)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 12/30
Simple Model
Criticism of Sophisticated Model
myth: my big data work best
with most sophisticated model
Sophisticated Model
• time-consuming to train and predict
—often mismatch to big data
• difficult to tune or modify
—often exhausting to use
• point of no return
—often cannot “simplify” nor “analyze”
sophisticated model shouldn’t be
first-choice for big data
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 13/30
Simple Model
Linear First (1/2)
what is the first machine learning model for
(my precious big) data?
Taught in ML Foundations on NTU@Coursera
linear model (or simpler) first:
• efficient to train and predict, e.g. (Lin et al., Large-scale logistic regression
and linear support vector machines using Spark. IEEE BigData 2014)
—my favorite in , a real-time ad. bidding startup
• easy to tune or modify
—key of our KDDCup winning solutions in 2010 (educational
data mining) and 2012 (online ads)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 14/30
Simple Model
Linear First (2/2)
what is the first machine learning model for
(my precious big) data?
Taught in ML Foundations on NTU@Coursera
linear model (or simpler) first:
• somewhat “analyzable”
—my students’ winning choice in TSMC Big Data Competition
(just old-fashioned linear regression! )
• little risk
• if linear good enough, live happily thereafter
• otherwise, try something more complicated, with theoretically
nothing lost except “wasted” computation
My KISS Principle:
Keep It Simple, XXXXStupid Safe
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 15/30
Simple Model
Learning to Start Modeling for Big Data
Must-learn Items
• linear models, especially
• how to tune them
• how to interpret their outcomes
• simple models with frequency-based probability estimates,
such as Naïve Bayes
• decision tree (or perhaps even better, Random Forest) as a KISS
non-linear model
An explanation of the data should be
made as simple as possible, but no
simpler.—[?] Albert Einstein (Scientist)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 16/30
Feature Construction
Big Data FAQs (3/4)
how should I improve ML performance with
(my precious big) data?
My Polite Answer
do we have domain knowledge
about your problem?
My Honest Answer
I don’t know.
or a slightly longer answer:
I don’t know for sure, but perhaps you can
start by encoding your human
intelligence/knowledge.
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 17/30
Feature Construction
A Similar Scenario
how should I improve ML performance with
(my precious big) data?
how should I improve the performance of
my classroom students?
instructor teaching ≡ student learning
• teach more concretely −→ better performance
• teach more professionally −→ better performance
• teach more key points/aspects −→ better performance
to improve learning performance,
you should perhaps teach better
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 18/30
Feature Construction
Teaching Your Machine Better with Big Data
• concrete:
good research questions, as discussed
• professional:
embed domain knowledge during data construction
• key:
facilitate your learner using proper data pruning/cleaning/hinting
IMHO, data construction is more important
for big data than machine learning is
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 19/30
Feature Construction
Your Big Data Need Further Construction
Big Data Characteristics
many fields, and many abstract ones
Our KDDCup 2010 Experience
educational data mining
(Yu et al., Feature Engineering and Classifier Ensemble for KDD Cup 2010)
• “Because all feature meanings are available, we are able to
manually identify some useful pairs of features ...”:
• domain knowledge: “student s does step i of problem j in unit k”
• hierarchical encoding: [has student s tried unit k] more
meaningful than [has student s tried step i]
• “Correct First-Attempt Rate” cj of each problem j:
• domain knowledge: cj ≈ hardness
• condensed encoding: cj physically more meaningful than j
feature engineering: make your (feature) data
concrete by embedding domain knowledge
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 20/30
Feature Construction
Learning to Construct Features for Big Data
Must-learn Items
• domain knowledge
• if available, great!
• if not, start by analyzing data first, not by learning from data
—correlations, co-occurrences, informative parts, frequent items,
etc.
• common feature construction techniques
• encoding
• combination
• importance estimation: linear models and Random Forests
especially useful (simple models, remember? )
one secret in winning KDDCups:
ask interactive questions (remember?)
that allows encoding human intelligence
into feature construction
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 21/30
Trap Escaping
Big Data FAQs (4/4)
how should I escape from the unsatisfactory
test performance on (my precious big) data?
My Step by Step Diagnosis
if (training performance okay) [ 90% of the time]
• combat overfitting
• correct training/testing mismatch
• check for misuse
else
• construct better features by asking more questions,
remember?
• now you can try more sophisticated models
will focus on the first part
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 22/30
Trap Escaping
Combat Overfitting (1/2)
myth: my big data is so big that
overfitting is impossible
• no, big data usually
high-dimensional
• no, big data usually
heterogeneous
• no, big data usually
redundant
• no, big data usually
noisy
Overfitting Hazard
Number of Data Points, N
NoiseLevel,σ2
80 100 120
-0.2
-0.1
0
0.1
0.2
0
1
2
(Learning from Data book)
data-size-to-noise ratio is
what matters!
big data still require
careful treatment of overfitting
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 23/30
Trap Escaping
Combat Overfitting (2/2)
Driving Analogy of Overfitting
learning driving
overfit commit a car accident
sophisticated model “drive too fast”
noise bumpy road
limited data size limited observations about road condition
—big data only cross out last line
Regularization
regularization put brake
—important to know
where the brake is
Validation
validation monitor dashboard
—important to
ensure correctness
Overfitting is real, and here to stay.—Learning
from Data (Book)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 24/30
Trap Escaping
Correct Training/Testing Mismatch
A True Personal Story
• Netflix competition for movie recommender system:
10% improvement = 1M US dollars
• on my own validation data, first shot, showed 13% improvement
• why am I still teaching in NTU?
validation: random examples within data;
test: “last” user records “after” data
Technical Solutions
practical rule of thumb: match test scenario as much as possible
• training: emphasize later examples (KDDCup 2011)
• validation: use “late” user record
If the data is sampled in a biased way, learning
will produce a similarly biased
outcome.—Learning from Data (Book)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 25/30
Trap Escaping
Biggest Misuse in Machine Learning: Data Snooping
• 8 years of currency trading data
• first 6 years for training,
last two 2 years for testing
• feature = previous 20 days,
label = 21th day
• snooping versus no snooping:
superior profit possible
Day
CumulativeProfit%
no snooping
snooping
0 100 200 300 400 500
-10
0
10
20
30
• snooping: shift-scale all values by training + testing
• no snooping: shift-scale all values by training only
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 26/30
Trap Escaping
Data Snooping by Data Reusing
Data Snooping by Data Reusing: Research Scenario
with my precious data
• paper 1: propose algorithm 1 that works well on data
• paper 2: find room for improvement, propose algorithm 2
—and publish only if better than algorithm 2 on data
• paper 3: find room for improvement, propose algorithm 3
—and publish only if better than algorithm 2 on data
• . . .
• if all papers from the same author in one big paper: as if using a
super-sophisticated model that includes algorithms 1, 2, 3, . . .
• step-wise: later author snooped data by reading earlier papers,
bad generalization worsen by publish only if better
If you torture the data long enough, it will
confess.—Folklore in ML/DM
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 27/30
Trap Escaping
Avoid Big Data Snooping
data snooping =⇒ human overfitting
Honesty Matters
• very hard to avoid data
snooping, unless being
extremely honest
• extremely honest: lock
your test data in safe
• less honest: reserve
validation and use
cautiously
Guidelines
• be blind: avoid making
modeling decision by
data
• be suspicious: interpret
findings (including your
own) by proper feeling of
contamination—keep your
data fresh if possible
one last secret to winning KDDCups:
“art” to carefully balance between
data-driven modeling (snooping) 
validation (no-snooping)
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 28/30
Trap Escaping
Learning to Escape Traps for Big Data
Must-learn Items
• combat overfitting: regularization and validation
• correct training/testing mismatch: philosophy and perhaps
some heuristics
• avoid data snooping: philosophy and research cycle
(remember? )
happy big data learning!
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 29/30
Trap Escaping
Summary
• human must-learn ML topics for big data:
• procedure: research cycle
• tools: simple model, feature construction, overfitting elimination
• sense: philosophy behind machine learning
• foundations even more important in big data age
—now a shameless sales campaign for my co-authored book
and online course (to be re-run on September 8, 2015)
—special thanks to Prof. Yuh-Jye Lee and Mr. Yi-Hung Huang for
suggesting materials
Thank you!
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 30/30
Trap Escaping
Appendix: ML Foundations on NTU@Coursera
https://www.coursera.org/course/ntumlone
When can machines learn?
• L1: the learning problem ( )
• L2: learning to answer yes/no
( )
• L3: types of learning ( )
• L4: feasibility of learning
Why can machines learn?
• L5: training versus testing
• L6: theory of generalization
• L7: the VC dimension ( )
• L8: noise and error
How can machines learn?
• L9: linear regression ( )
• L10: logistic regression ( )
• L11: linear models for
classification ( )
• L12: nonlinear transformation ( )
How can machines learn better?
• L13: hazard of overfitting ( )
• L14: regularization ( )
• L15: validation ( )
• L16: three learning principles ( )
≈ must-learn
Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 30/30

Más contenido relacionado

La actualidad más candente

Overview of Machine Learning and its Applications
Overview of Machine Learning and its ApplicationsOverview of Machine Learning and its Applications
Overview of Machine Learning and its ApplicationsDeepak Chawla
 
Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learningSara Hooker
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learningShishir Choudhary
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedJonathan Mugan
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à ZAlexia Audevart
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningKnoldus Inc.
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning AnalyticsXavier Ochoa
 
Machine learning_ Replicating Human Brain
Machine learning_ Replicating Human BrainMachine learning_ Replicating Human Brain
Machine learning_ Replicating Human BrainNishant Jain
 
Primer to Machine Learning
Primer to Machine LearningPrimer to Machine Learning
Primer to Machine LearningJeff Tanner
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratorySara Hooker
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesRui Pedro Paiva
 

La actualidad más candente (20)

Overview of Machine Learning and its Applications
Overview of Machine Learning and its ApplicationsOverview of Machine Learning and its Applications
Overview of Machine Learning and its Applications
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
Module 1 introduction to machine learning
Module 1  introduction to machine learningModule 1  introduction to machine learning
Module 1 introduction to machine learning
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
李育杰/The Growth of a Data Scientist
李育杰/The Growth of a Data Scientist李育杰/The Growth of a Data Scientist
李育杰/The Growth of a Data Scientist
 
Active learning
Active learningActive learning
Active learning
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
ML Basics
ML BasicsML Basics
ML Basics
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Machine learning_ Replicating Human Brain
Machine learning_ Replicating Human BrainMachine learning_ Replicating Human Brain
Machine learning_ Replicating Human Brain
 
Primer to Machine Learning
Primer to Machine LearningPrimer to Machine Learning
Primer to Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
 

Similar a Learning for Big Data-林軒田

Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.pptbutest
 
Dialogue system②
Dialogue system②Dialogue system②
Dialogue system②Kent T
 
Twdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunitiesTwdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunitiesAravindharamanan S
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learningshivani saluja
 
Big-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunitiesBig-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunities台灣資料科學年會
 
100_Days_of_Data_Science
100_Days_of_Data_Science100_Days_of_Data_Science
100_Days_of_Data_ScienceSajzat hossain
 
BigMLSchool: Customer Segmentation
BigMLSchool: Customer SegmentationBigMLSchool: Customer Segmentation
BigMLSchool: Customer SegmentationBigML, Inc
 
CS Education for All. A new wave of opportunity
CS Education for All. A new wave of opportunityCS Education for All. A new wave of opportunity
CS Education for All. A new wave of opportunityPeter Donaldson
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
ML crash course
ML crash courseML crash course
ML crash coursemikaelhuss
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingMatthew Lease
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessChia-Chi Chang
 
Cyber securityeducation may2015
Cyber securityeducation may2015Cyber securityeducation may2015
Cyber securityeducation may2015Mark Guzdial
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxjuliennehar
 
from_physics_to_data_science
from_physics_to_data_sciencefrom_physics_to_data_science
from_physics_to_data_scienceMartina Pugliese
 
cs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfcs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfKuan-Tsae Huang
 

Similar a Learning for Big Data-林軒田 (20)

Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.ppt
 
How to crack down big data?
How to crack down big data? How to crack down big data?
How to crack down big data?
 
Dialogue system②
Dialogue system②Dialogue system②
Dialogue system②
 
Twdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunitiesTwdatasci cjlin-big data analytics - challenges and opportunities
Twdatasci cjlin-big data analytics - challenges and opportunities
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Big-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunitiesBig-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunities
 
lecture1.pptx
lecture1.pptxlecture1.pptx
lecture1.pptx
 
100_Days_of_Data_Science
100_Days_of_Data_Science100_Days_of_Data_Science
100_Days_of_Data_Science
 
BigMLSchool: Customer Segmentation
BigMLSchool: Customer SegmentationBigMLSchool: Customer Segmentation
BigMLSchool: Customer Segmentation
 
CS Education for All. A new wave of opportunity
CS Education for All. A new wave of opportunityCS Education for All. A new wave of opportunity
CS Education for All. A new wave of opportunity
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
ML crash course
ML crash courseML crash course
ML crash course
 
Presentation MaSE 18-102012
Presentation MaSE 18-102012Presentation MaSE 18-102012
Presentation MaSE 18-102012
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darkness
 
Cyber securityeducation may2015
Cyber securityeducation may2015Cyber securityeducation may2015
Cyber securityeducation may2015
 
Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docx
 
from_physics_to_data_science
from_physics_to_data_sciencefrom_physics_to_data_science
from_physics_to_data_science
 
cs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdfcs330_2021_lifelong_learning.pdf
cs330_2021_lifelong_learning.pdf
 

Más de 台灣資料科學年會

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用台灣資料科學年會
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告台灣資料科學年會
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇台灣資料科學年會
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 台灣資料科學年會
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵台灣資料科學年會
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用台灣資料科學年會
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告台灣資料科學年會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人台灣資料科學年會
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維台灣資料科學年會
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察台灣資料科學年會
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰台灣資料科學年會
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳台灣資料科學年會
 

Más de 台灣資料科學年會 (20)

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
 
台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
 

Último

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Último (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

Learning for Big Data-林軒田

  • 1. Learning for Big Data Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University (國立台灣大學資訊工程系) slightly modified from my keynote talk in IEEE BigData 2015 Taipei Satellite Session Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 1/30
  • 2. Introduction About the Title • “Learning for Big Data” —my wife: you have made a typo • do you mean “Learning from Z ZBig Data”? —no, not a shameless sales campaign for my co-authored best-selling book (http://amlbook.com) as machine learning researcher machine learning for big data —easy?! as machine learning educator human learning for big data —hard!! will focus on human learning for big data Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 2/30
  • 3. Introduction Human Learning for Big Data Todo • some FAQs that I have encountered as • educator (NTU and NTU@Coursera) • team mentor (KDDCups, TSMC Big Data competition, etc.) • researcher (CLLab@NTU) • consultant ( ), a real-time advertisement bidding startup • my imperfect yet honest answers that hint what shall be learned First Honest Claims • must-learn for big data ≈ must-learn for small data in ML, but the former with bigger seriousness • system design/architecture very important, but somewhat beyond my pay grade I wish I had an answer to that because I’m tired of answering that question. —Yogi Berra (Athlete) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 3/30
  • 4. Asking Questions Big Data FAQs (1/4) how to ask good questions from (my precious big) data? My Polite Answer good start already , any more thoughts that you have in mind? My Honest Answer I don’t know. or a slightly longer answer: if you don’t know, I don’t know. Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 4/30
  • 5. Asking Questions A Similar Scenario how to ask good questions from (my precious big) data? how to find a research topic for my thesis? My Polite Answer good start already , any more thoughts that you have in mind? My Honest Answer I don’t know. or a slightly longer answer: I don’t know, but perhaps you can start by thinking about motivation and feasibility. Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 5/30
  • 6. Asking Questions Finding (Big) Data Questions ≈ Finding Research Topics • motivation: what are you interested in? • feasibility: what can or cannot be done? motivation • something publishable? oh, possibly just for people in academia • something that improves xyz performance • something that inspires deeper study —helps generate questions feasibility • modeling • computational • budget • timeline • . . . —helps filter questions brainstorm from motivation; rationalize from feasibility Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 6/30
  • 7. Asking Questions Finding Big Data Questions generate questions from motivation • variety: dream more in big data age • velocity: evolving data, evolving questions filter questions from feasibility • volume: computational bottleneck • veracity: modeling with non-textbook data almost never find right question in your first try —good questions come interactively Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 7/30
  • 8. Asking Questions Interactive Question-Asking from Big Data: Our KDDCup 2011 Experience (1/2) Recommender System • data: how users have rated movies • goal: predict how a user would rate an unrated movie A Hot Problem • competition held by Netflix in 2006 • 100,480,507 ratings that 480,189 users gave to 17,770 movies • 10% improvement = 1 million dollar prize • similar competition (movies → songs) held by Yahoo! in KDDCup 2011, the most prestigious data mining competition • 252,800,275 ratings that 1,000,990 users gave to 624,961 songs National Taiwan University got two world champions in KDDCup 2011—with Profs. Chih-Jen Lin, Shou-De Lin, and many students. Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 8/30
  • 9. Asking Questions Interactive Question-Asking from Big Data: Our KDDCup 2011 Experience (2/2) Q1 (pre-defined): can we improve rating prediction of (user, song)? Q1.1 after data analysis: two types of users, lazy 7% (same rating always) normal —if a user gives 60, 60, . . . during training, how’d she rate next item? same (80%) different (20%) Q1.1.1: can we distinguish 80% using other features? . . . —failed (something you normally wouldn’t see in paper ) Q1.2 after considering domain knowledge: test data are newer logs —shall we emphasize newer logs in training data? Q1.2.1: can we just give each log different weight? (but how?) Q1.2.2: can we tune optimization to effectively emphasize newer logs? (yes this worked ) our KDDCup experience: interactive (good or bad) question-asking kept us going! Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 9/30
  • 10. Asking Questions Learning to Ask Questions from Big Data Must-learn Items • true interest for motivation —big data don’t generate questions, big interests do • capability of machines (when to use ML?) for feasibility Taught in ML Foundations on NTU@Coursera 1 exists underlying pattern to be learned 2 no easy/programmable definition of pattern 3 having data related to pattern —ML isn’t cure-all • research cycle for systematic steps —a Ph.D. or serious research during M.S./undergraduate study Computers are useless. They can only give you answers.—Pablo Picasso (Artist) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 10/30
  • 11. Simple Model Big Data FAQs (2/4) what is the best machine learning model for (my precious big) data? My Polite Answer the best model is data-dependent, let’s chat about your data first My Honest Answer I don’t know. or a slightly longer answer: I don’t know about best, but perhaps you can start by thinking about simple models. Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 11/30
  • 12. Simple Model Sophisticated Model for Big Data what is the best machine learning model for (my precious big) data? what is the most sophisticated machine learning model for (my precious big) data? • myth: my big data work best with most sophisticated model • partially true: deep learning for image recognition @ Google —10 million images on 1 billion internal weights (Le et al., Building High-level Features Using Large Scale Unsupervised Learning, ICML 2012) Science must begin with myths, and with the criticism of myths. —Karl Popper (Philosopher) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 12/30
  • 13. Simple Model Criticism of Sophisticated Model myth: my big data work best with most sophisticated model Sophisticated Model • time-consuming to train and predict —often mismatch to big data • difficult to tune or modify —often exhausting to use • point of no return —often cannot “simplify” nor “analyze” sophisticated model shouldn’t be first-choice for big data Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 13/30
  • 14. Simple Model Linear First (1/2) what is the first machine learning model for (my precious big) data? Taught in ML Foundations on NTU@Coursera linear model (or simpler) first: • efficient to train and predict, e.g. (Lin et al., Large-scale logistic regression and linear support vector machines using Spark. IEEE BigData 2014) —my favorite in , a real-time ad. bidding startup • easy to tune or modify —key of our KDDCup winning solutions in 2010 (educational data mining) and 2012 (online ads) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 14/30
  • 15. Simple Model Linear First (2/2) what is the first machine learning model for (my precious big) data? Taught in ML Foundations on NTU@Coursera linear model (or simpler) first: • somewhat “analyzable” —my students’ winning choice in TSMC Big Data Competition (just old-fashioned linear regression! ) • little risk • if linear good enough, live happily thereafter • otherwise, try something more complicated, with theoretically nothing lost except “wasted” computation My KISS Principle: Keep It Simple, XXXXStupid Safe Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 15/30
  • 16. Simple Model Learning to Start Modeling for Big Data Must-learn Items • linear models, especially • how to tune them • how to interpret their outcomes • simple models with frequency-based probability estimates, such as Naïve Bayes • decision tree (or perhaps even better, Random Forest) as a KISS non-linear model An explanation of the data should be made as simple as possible, but no simpler.—[?] Albert Einstein (Scientist) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 16/30
  • 17. Feature Construction Big Data FAQs (3/4) how should I improve ML performance with (my precious big) data? My Polite Answer do we have domain knowledge about your problem? My Honest Answer I don’t know. or a slightly longer answer: I don’t know for sure, but perhaps you can start by encoding your human intelligence/knowledge. Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 17/30
  • 18. Feature Construction A Similar Scenario how should I improve ML performance with (my precious big) data? how should I improve the performance of my classroom students? instructor teaching ≡ student learning • teach more concretely −→ better performance • teach more professionally −→ better performance • teach more key points/aspects −→ better performance to improve learning performance, you should perhaps teach better Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 18/30
  • 19. Feature Construction Teaching Your Machine Better with Big Data • concrete: good research questions, as discussed • professional: embed domain knowledge during data construction • key: facilitate your learner using proper data pruning/cleaning/hinting IMHO, data construction is more important for big data than machine learning is Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 19/30
  • 20. Feature Construction Your Big Data Need Further Construction Big Data Characteristics many fields, and many abstract ones Our KDDCup 2010 Experience educational data mining (Yu et al., Feature Engineering and Classifier Ensemble for KDD Cup 2010) • “Because all feature meanings are available, we are able to manually identify some useful pairs of features ...”: • domain knowledge: “student s does step i of problem j in unit k” • hierarchical encoding: [has student s tried unit k] more meaningful than [has student s tried step i] • “Correct First-Attempt Rate” cj of each problem j: • domain knowledge: cj ≈ hardness • condensed encoding: cj physically more meaningful than j feature engineering: make your (feature) data concrete by embedding domain knowledge Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 20/30
  • 21. Feature Construction Learning to Construct Features for Big Data Must-learn Items • domain knowledge • if available, great! • if not, start by analyzing data first, not by learning from data —correlations, co-occurrences, informative parts, frequent items, etc. • common feature construction techniques • encoding • combination • importance estimation: linear models and Random Forests especially useful (simple models, remember? ) one secret in winning KDDCups: ask interactive questions (remember?) that allows encoding human intelligence into feature construction Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 21/30
  • 22. Trap Escaping Big Data FAQs (4/4) how should I escape from the unsatisfactory test performance on (my precious big) data? My Step by Step Diagnosis if (training performance okay) [ 90% of the time] • combat overfitting • correct training/testing mismatch • check for misuse else • construct better features by asking more questions, remember? • now you can try more sophisticated models will focus on the first part Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 22/30
  • 23. Trap Escaping Combat Overfitting (1/2) myth: my big data is so big that overfitting is impossible • no, big data usually high-dimensional • no, big data usually heterogeneous • no, big data usually redundant • no, big data usually noisy Overfitting Hazard Number of Data Points, N NoiseLevel,σ2 80 100 120 -0.2 -0.1 0 0.1 0.2 0 1 2 (Learning from Data book) data-size-to-noise ratio is what matters! big data still require careful treatment of overfitting Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 23/30
  • 24. Trap Escaping Combat Overfitting (2/2) Driving Analogy of Overfitting learning driving overfit commit a car accident sophisticated model “drive too fast” noise bumpy road limited data size limited observations about road condition —big data only cross out last line Regularization regularization put brake —important to know where the brake is Validation validation monitor dashboard —important to ensure correctness Overfitting is real, and here to stay.—Learning from Data (Book) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 24/30
  • 25. Trap Escaping Correct Training/Testing Mismatch A True Personal Story • Netflix competition for movie recommender system: 10% improvement = 1M US dollars • on my own validation data, first shot, showed 13% improvement • why am I still teaching in NTU? validation: random examples within data; test: “last” user records “after” data Technical Solutions practical rule of thumb: match test scenario as much as possible • training: emphasize later examples (KDDCup 2011) • validation: use “late” user record If the data is sampled in a biased way, learning will produce a similarly biased outcome.—Learning from Data (Book) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 25/30
  • 26. Trap Escaping Biggest Misuse in Machine Learning: Data Snooping • 8 years of currency trading data • first 6 years for training, last two 2 years for testing • feature = previous 20 days, label = 21th day • snooping versus no snooping: superior profit possible Day CumulativeProfit% no snooping snooping 0 100 200 300 400 500 -10 0 10 20 30 • snooping: shift-scale all values by training + testing • no snooping: shift-scale all values by training only Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 26/30
  • 27. Trap Escaping Data Snooping by Data Reusing Data Snooping by Data Reusing: Research Scenario with my precious data • paper 1: propose algorithm 1 that works well on data • paper 2: find room for improvement, propose algorithm 2 —and publish only if better than algorithm 2 on data • paper 3: find room for improvement, propose algorithm 3 —and publish only if better than algorithm 2 on data • . . . • if all papers from the same author in one big paper: as if using a super-sophisticated model that includes algorithms 1, 2, 3, . . . • step-wise: later author snooped data by reading earlier papers, bad generalization worsen by publish only if better If you torture the data long enough, it will confess.—Folklore in ML/DM Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 27/30
  • 28. Trap Escaping Avoid Big Data Snooping data snooping =⇒ human overfitting Honesty Matters • very hard to avoid data snooping, unless being extremely honest • extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously Guidelines • be blind: avoid making modeling decision by data • be suspicious: interpret findings (including your own) by proper feeling of contamination—keep your data fresh if possible one last secret to winning KDDCups: “art” to carefully balance between data-driven modeling (snooping) validation (no-snooping) Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 28/30
  • 29. Trap Escaping Learning to Escape Traps for Big Data Must-learn Items • combat overfitting: regularization and validation • correct training/testing mismatch: philosophy and perhaps some heuristics • avoid data snooping: philosophy and research cycle (remember? ) happy big data learning! Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 29/30
  • 30. Trap Escaping Summary • human must-learn ML topics for big data: • procedure: research cycle • tools: simple model, feature construction, overfitting elimination • sense: philosophy behind machine learning • foundations even more important in big data age —now a shameless sales campaign for my co-authored book and online course (to be re-run on September 8, 2015) —special thanks to Prof. Yuh-Jye Lee and Mr. Yi-Hung Huang for suggesting materials Thank you! Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 30/30
  • 31. Trap Escaping Appendix: ML Foundations on NTU@Coursera https://www.coursera.org/course/ntumlone When can machines learn? • L1: the learning problem ( ) • L2: learning to answer yes/no ( ) • L3: types of learning ( ) • L4: feasibility of learning Why can machines learn? • L5: training versus testing • L6: theory of generalization • L7: the VC dimension ( ) • L8: noise and error How can machines learn? • L9: linear regression ( ) • L10: logistic regression ( ) • L11: linear models for classification ( ) • L12: nonlinear transformation ( ) How can machines learn better? • L13: hazard of overfitting ( ) • L14: regularization ( ) • L15: validation ( ) • L16: three learning principles ( ) ≈ must-learn Hsuan-Tien Lin (NTU CSIE) Learning for Big Data 30/30