Smart recommendation engine
for things to do at a destination
Natural Language Processing and
Machine Learning
How to automatically categorize tours
and activities?
July 2nd 2018
Introduction
MyLittleAdventure
@mylitadventure
Johnny RAHAJARISON
@brainstorm_me
johnny.rahajarison@mylittleadventure.com
Agenda
Introduction to machine learning
Why is Natural Language Processing so hard?
How do we process text?
Let’s try it out
Go further
What is Machine Learning?
Software that does something without being
explicitly programmed to do it, by learning
from examples
The same software can be used for various tasks
It learns from experience with respect to some task and
performance measure, and its performance improves with experience
Unsupervised algorithms
Clustering
Anomaly detection
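Unsupervised algorithms get no labels; they group examples purely by similarity. A minimal k-means clustering sketch in Python (the language of NLTK and scikit-learn, which the talk points to); the points, k=2, and starting the centroids at the first k points are all illustrative assumptions.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points. Centroids start at the first k points (simple and deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled points: two obvious groups, no labels given.
points = [(1, 1), (8, 8), (1.5, 2), (9, 8), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8), (8.5, 9)]]
```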
Supervised algorithms
Classification
Regression
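Supervised algorithms learn from examples that come with the expected answer. Regression predicts a number; as a sketch, here is ordinary least squares fitting a straight line. The tour durations and prices are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, learned from labeled (x, y) examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical examples: tour duration in hours (input) and price in euros (the known answer).
durations = [1, 2, 3, 4]
prices = [20, 30, 40, 50]
slope, intercept = fit_line(durations, prices)
print(slope * 5 + intercept)  # predicted price of a 5-hour tour: 60.0
```

Classification works the same way, except the answer is a category (for instance "food-related or not") instead of a number.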
You said text, right?
Obviously, you said text,
not numbers
Context
Polysemy
Synonyms
Enantiosemy
Neologisms
Sarcasm
Names
Rare words
Common sense
Dialects
Informal language / abbreviations
Ambiguity?
I saw a man on a hill with a telescope.
(Who has the telescope: me, the man, or the hill? The sentence supports several readings.)
Text should be prepared
Let’s clean our text first
['one', 'morn', 'when', 'gregor', 'samsa', 'woke', 'from', 'troubl', 'dream', 'he', 'found',
'himself', 'transform', 'in', 'hi', 'bed', 'into', 'a', 'horribl', 'vermin', 'He', 'lay', 'on',
'hi', 'armour-lik', 'back', 'and', 'if', 'he', 'lift', 'hi', 'head', 'a', 'littl', 'he', 'could',
'see', 'hi', 'brown', 'belli', 'slightli', 'dome', 'and', 'divid', 'by', 'arch', 'into', 'stiff',
'section', 'the', 'bed', 'wa', 'hardli', 'abl', 'to', 'cover', 'it', 'and', 'seem', 'readi', 'to',
'slide', 'off', 'ani', 'moment', 'hi', 'mani', 'leg', 'piti', 'thin', 'compar', 'with', 'the',
'size', 'of', 'the', 'rest', 'of', 'him', 'wave', 'about', 'helplessli', 'as', 'he', 'look',
'what', "'s", 'happen', 'to']
✓ Tokenize sentences
✓ Tokenize words
✓ Transliterate
✓ Normalize
✓ Filter out (punctuation, special characters, stop words)
✓ Use a stemmer and/or a lemmatizer ("be" = am, are, is; "vari" = variation, vary, varies, variables)
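The cleaning steps above can be sketched with the standard library alone. This is an illustration under stated assumptions: the stop-word list is a tiny hand-picked one, and `naive_stem` is a toy suffix stripper standing in for a real stemmer or lemmatizer (such as those in NLTK); it is not the Porter algorithm.

```python
import re
import unicodedata

# Tiny hand-picked stop-word list (an assumption); real projects use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "on", "to", "with", "he", "his", "was"}

def transliterate(text):
    """Strip accents: NFKD splits 'é' into 'e' plus a combining mark, which is then dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def naive_stem(word):
    """Toy stemmer (hypothetical, NOT Porter): strips a few common English endings."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    # 1. Tokenize sentences: split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = []
    for sentence in sentences:
        # 2-4. Transliterate, normalize case, tokenize words (letters only, so punctuation is dropped).
        words = re.findall(r"[a-z]+", transliterate(sentence).lower())
        # 5. Filter out stop words, then 6. stem what remains.
        tokens.extend(naive_stem(w) for w in words if w not in STOP_WORDS)
    return tokens

text = "One morning, Gregor Samsa woke from troubled dreams. He found himself transformed in his bed."
print(clean(text))
# ['one', 'morn', 'gregor', 'samsa', 'woke', 'from', 'troubl', 'dream', 'found', 'himself', 'transform', 'bed']
```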
A bag of words
"John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too"
{"John":1, "likes":2, "to":1, "watch":1, "movies":2, "Mary":1, "too":1}
{131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1}
[1, 2, 1, 1, 2, 1, 1]
Each unique word in our vocabulary is mapped to an index and becomes one feature; its value is the number of times the word occurs.
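A minimal bag-of-words sketch that reproduces the count vector from this slide. One assumption differs from the slide: indices here start at 0 in order of first appearance, instead of the arbitrary ids 131 to 137.

```python
from collections import Counter

def build_vocabulary(documents):
    """Give every unique token an index, in order of first appearance."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary

def bag_of_words(tokens, vocabulary):
    """Count vector: position i holds how often the token with index i occurs."""
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for token, index in vocabulary.items():
        vector[index] = counts[token]
    return vector

tokens = ["John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too"]
vocabulary = build_vocabulary([tokens])
print(bag_of_words(tokens, vocabulary))  # [1, 2, 1, 1, 2, 1, 1]
```

Word order is lost: only counts survive, which is why the representation is called a "bag".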
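TF-IDF
TF (Term Frequency) = occurrences of a term in a document / total words in that document
IDF (Inverse Document Frequency) = log(count of documents / count of documents where the term appears)
$$\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad \mathrm{idf}(t) = \log\frac{N}{\lvert\{d : t \in d\}\rvert}, \qquad \text{tf-idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)$$
with n_{t,d} the number of occurrences of term t in document d, and N the count of documents.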
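A pure-Python sketch of the TF and IDF definitions on this slide. The three tokenized tour titles are illustrative, and the natural logarithm is assumed since the slide does not fix the base.

```python
import math
from collections import Counter

def tf_idf(documents):
    """TF-IDF weights: tf = count / document length, idf = log(N / document frequency)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in documents for term in set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    ["eiffel", "tower", "dinner"],
    ["skip", "line", "eiffel", "tower"],
    ["louvre", "museum", "fast", "track"],
]
weights = tf_idf(docs)
print(round(weights[0]["dinner"], 4), round(weights[0]["eiffel"], 4))  # 0.3662 0.1352
```

"dinner" appears in one title only, so it outweighs "eiffel", which two titles share. Note that scikit-learn's `TfidfVectorizer` uses raw counts for TF, a smoothed IDF and L2 row normalization by default, so its numbers (like those on the dataset slide) will not match this sketch exactly.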
Another way: use word embeddings
Word embeddings capture relative meaning
Each word becomes a vector, so the geometry of the vector space reflects relationships between words
Paris - France + China ≈ Beijing
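The analogy can be sketched with vector arithmetic and cosine similarity. The four-dimensional vectors below are hypothetical and hand-made so each dimension has an obvious meaning; real embeddings such as GloVe are learned from text and much denser, so there the result is only the *nearest* word.

```python
import math

# Hypothetical vectors; dimensions mean [France, China, Germany, "is a capital"].
EMBEDDINGS = {
    "france":  [1.0, 0.0, 0.0, 0.0],
    "paris":   [1.0, 0.0, 0.0, 1.0],
    "china":   [0.0, 1.0, 0.0, 0.0],
    "beijing": [0.0, 1.0, 0.0, 1.0],
    "germany": [0.0, 0.0, 1.0, 0.0],
    "berlin":  [0.0, 0.0, 1.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated (orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Solve 'a - b + c = ?' by returning the word most similar to the resulting vector."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = (w for w in EMBEDDINGS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))

print(analogy("paris", "france", "china"))  # beijing
```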
Example of the "movies" vector
movies -0.34582 0.057328 0.1328 0.22376 0.10161 0.52948 -0.30199 0.45676 -0.37643 -0.51857 0.67325 -0.012444 -0.099021 0.43823
-0.28905 -1.0183 -0.0062387 -0.32893 0.55547 0.44181 0.31524 0.29909 0.51605 0.32109 0.021471 0.67909 0.037333 -0.42321
0.56517 0.47979 -0.63307 0.1126 0.0050579 -0.18879 -0.87478 -0.29481 -0.70824 -0.072256 0.1614 0.34523 0.61872 -0.036932
-0.43343 0.29604 0.18671 -0.33384 0.50628 -0.013876 0.46303 0.19298 0.16783 -0.55786 -0.16947 -0.27382 0.31027 0.10974 0.12819
0.23538 0.038003 -0.077524 -0.23291 0.044094 0.36325 0.20611 0.55571 -0.022715 -0.04996 0.32312 0.44176 0.25272 0.15159
0.22682 -0.10425 0.73375 0.66572 -0.55885 0.082242 -0.13387 0.31042 -0.38443 -0.38631 -0.7518 0.6706 -0.17495 0.056298 0.82038
0.41573 -0.12316 0.28437 -0.19324 -0.13485 0.28862 -0.37817 0.37268 0.01515 0.39123 0.059544 -0.074006 -0.17152 -1.1523
0.26541 0.082314 0.17914 -0.089861 -0.20884 0.29248 -0.60263 -0.0024285 0.24521 -0.5427 -0.074404 0.14034 0.0085891 -0.37351
0.23573 0.1493 -0.14038 0.11725 -0.51013 -0.64531 0.1329 0.075911 -0.10827 0.22077 -0.086253 0.4096 0.052314 0.40964 -0.030506
0.30572 -0.40694 -0.11773 0.21586 0.14448 0.23419 -0.23401 0.06811 0.29447 -0.4086 0.88777 -0.19477 -0.18847 0.10324 -0.24593
-0.10173 -0.43226 -0.091173 -0.092602 -0.23385 -0.16498 0.22057 0.11014 -0.25018 -0.43089 0.19759 0.11762 -0.045432 0.13331
0.032684 -0.21702 0.35082 -0.40466 -0.02425 -0.22637 0.0094442 0.72848 0.10286 0.27199 -0.40396 0.22366 -0.039481 -0.17164
-1.7307 0.3706 -0.13711 0.2295 -0.34432 -0.024381 -0.093941 -0.29861 -0.33164 -0.12931 -0.11218 0.047052 0.40442 0.0043382
0.22364 -0.31537 0.1987 -0.46108 -0.35126 -0.14584 0.17765 0.10869 -0.14434 -0.6152 -0.5874 0.014977 -0.1691 -0.46926 1.3959
-0.15449 -0.24167 -0.002575 0.4758 -0.044786 -0.21345 0.22983 -0.34356 -0.43402 -0.45719 -0.29775 -0.053295 0.50132 -0.24066
0.45762 0.095118 0.21008 0.71912 0.028577 -0.64176 0.1314 0.21556 -0.12536 -0.3298 -0.07123 0.35428 -0.3787 0.12348 -0.060439
0.19217 -0.29951 -0.73189 -0.33589 0.449 0.22654 1.0404 0.019947 -0.74711 0.071042 0.067809 0.36341 -0.32579 -0.11085 -0.24507
-0.13518 -0.44326 0.022784 -0.57252 0.33756 -0.23411 -0.062955 -0.35353 1.0497 -0.14938 -0.57772 0.27652 -0.28787 -0.0040621
0.25113 0.40818 -0.13227 0.016032 -0.55465 0.0021098 -0.27755 0.16082 -0.055202 0.21104 0.58412 0.42842 -0.047253 0.10542
0.027478 0.30911 0.31792 -1.8564 0.014412 -0.29748 -0.70103 -0.068219 -0.53071 -0.10661 0.028596 0.081479 0.34323 -0.047833
0.023129 0.028697 0.33859 -0.20706 -0.0025571 -0.18267 -0.26946 -1.1064 -0.31228 -0.13101 0.1161 -0.068647 -0.09988
{"John":1, "likes":2, "to":1, "watch":1, "movies":2, "Mary":1, "too":1}
{131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1}
[1, 2, 1, 1, 2, 1, 1]
[[], 2*[], [], [], 2*[-0.34582, 0.057328, … 0.22376, 0.10161], [], []]
Each count now multiplies the embedding vector of its word; only the embedding vector for "movies" is written out.
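The slide keeps one count-weighted vector per word. One common way (an assumption here, not necessarily what the talk used) to turn these into a single fixed-length feature vector per document is to average them. The two-dimensional embeddings are hypothetical.

```python
# Hypothetical 2-dimensional embeddings; real GloVe vectors have far more dimensions.
EMBEDDINGS = {
    "john": [1.0, 0.0],
    "likes": [0.0, 1.0],
    "movies": [1.0, 1.0],
}

def document_vector(tokens, embeddings, dims):
    """Average the embeddings of all token occurrences, so a word counted twice weighs twice.
    Tokens without an embedding (here 'to', 'watch', 'mary', 'too') are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dims
    return [sum(vector[i] for vector in known) / len(known) for i in range(dims)]

tokens = ["john", "likes", "to", "watch", "movies", "mary", "likes", "movies", "too"]
print(document_vector(tokens, EMBEDDINGS, dims=2))  # [0.6, 0.8]
```

Unlike a bag of words, whose length grows with the vocabulary, this vector always has the embedding's dimension, and similar words land close together.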
Let’s predict
Recipe
Prepare training / test data (files, database, cache, data flow)
Select a model and its (hyper)parameters
Train the algorithm
Use or store your trained estimator
Make predictions
Measure accuracy and precision
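The first and last steps of the recipe (split the data, then measure) can be sketched as below; the model itself is left out. The function names are hypothetical hand-rolled helpers, the 25% test share and the seed are arbitrary choices, and the labels are invented.

```python
import random

def train_test_split(examples, test_ratio=0.25, seed=42):
    """Shuffle a copy of the labeled examples, then hold out a share of them for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=True):
    """Among the examples predicted positive, the share that really are positive."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not flagged:
        return 0.0
    return sum(t == positive for t in flagged) / len(flagged)

# Hypothetical true labels versus predictions for four test titles.
y_true = [True, True, False, False]
y_pred = [True, False, False, True]
print(accuracy(y_true, y_pred), precision(y_true, y_pred))  # 0.5 0.5
```

The test set must stay unseen during training; otherwise the measured accuracy says little about real data.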
Collect our training & test dataset
Food Label Vectorized
Eiffel Tower with Dinner
[ 0., 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.5, 0., 0.5],
Skip the line Eiffel Tower
[ 0., 0., 0., 0., 0., 0.3967171 , 0., 0., 0., 0.47792296, 0., 0.,
0., 0., 0., 0.47792296, 0.47792296, 0., 0., 0.3967171 , 0., 0.],
Louvre Museum fast track
[ 0., 0., 0., 0., 0., 0., 0.5, 0., 0., 0., 0.5, 0.5, 0., 0., 0.,
0., 0., 0., 0., 0., 0.5, 0.],
Gourmet tour of Paris
[ 0., 0., 0., 0., 0., 0., 0., 0.58910044, 0., 0., 0., 0.,
0.41798437, 0.48900396, 0., 0., 0., 0., 0.48900396, 0., 0., 0.],
Segway tour of city’s highlights
[ 0., 0., 0.48838773, 0., 0., 0., 0., 0., 0.48838773, 0., 0., 0.,
0.3465257 , 0., 0.48838773, 0., 0., 0., 0.40540376, 0., 0., 0.],
Dinner cruise with Champagne
[ 0., 0.54408243, 0., 0.54408243, 0.45163515, 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.45163515],
Aquarium of Paris ticket
[ 0.55967542, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.39710644, 0.46457866, 0., 0., 0., 0.55967542, 0., 0., 0., 0.]
… …
Choose a classifier algorithm
A few recommendations
Naive Bayes / Logistic Regression
Decision Trees
Random Forest
Gradient Boosting
SVM
Neural Networks
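Naive Bayes, the first recommendation, is simple enough to sketch from scratch. This is a multinomial variant with add-one smoothing; the food labels below are assigned by hand for illustration and are not the talk's training data. scikit-learn ships a production version as `sklearn.naive_bayes.MultinomialNB`.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over tokens, with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        # Log prior: how common each label is in the training set.
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens, positive=True):
        """Probability of the positive class, computed from log scores for numerical stability."""
        v = len(self.vocabulary)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for w in tokens:
                if w in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        return weights[positive] / sum(weights.values())

# Hypothetical food labels, assigned by hand (True = food-related).
titles = [
    ("eiffel tower with dinner", True),
    ("gourmet tour of paris", True),
    ("dinner cruise with champagne", True),
    ("skip the line eiffel tower", False),
    ("louvre museum fast track", False),
    ("segway tour of city highlights", False),
    ("aquarium of paris ticket", False),
]
model = NaiveBayes().fit([t.split() for t, _ in titles], [label for _, label in titles])
print(round(model.predict_proba("cooking class and dinner in paris".split()), 2))  # 0.76
```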
Let’s measure
Food Label Prediction
Eiffel Tower with Dinner 0.83
Gourmet tour of Paris 0.96
Dinner cruise with Champagne 1.0
Segway tour of city’s highlights 0.03
Orsay dedicated entrance 0.02
3 course meal in Eiffel Tower 0.97
Cooking class in Paris 0.89
Moulin Rouge Paris dinner show 0.91
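The prediction column holds probabilities, not yes/no answers. A sketch of turning them into labels with a cut-off; the 0.5 threshold is an assumption, not a value from the talk.

```python
# Prediction scores copied from the table above (probability that a title is food-related).
predictions = {
    "Eiffel Tower with Dinner": 0.83,
    "Gourmet tour of Paris": 0.96,
    "Dinner cruise with Champagne": 1.0,
    "Segway tour of city’s highlights": 0.03,
    "Orsay dedicated entrance": 0.02,
    "3 course meal in Eiffel Tower": 0.97,
    "Cooking class in Paris": 0.89,
    "Moulin Rouge Paris dinner show": 0.91,
}

def label_food(scores, threshold=0.5):
    """Turn probabilities into yes/no labels: food-related when the score reaches the threshold."""
    return {title: score >= threshold for title, score in scores.items()}

labels = label_food(predictions)
print([title for title, is_food in labels.items() if not is_food])
# ['Segway tour of city’s highlights', 'Orsay dedicated entrance']
```

Raising the threshold tends to increase precision at the cost of labeling fewer activities as food-related.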
Training set
Real data
Go further
There is much more
Cross-validation dataset
N-grams
Incorrect user-generated content
Misspellings & typos
Training data is hard to get
Harder languages or transliteration issues
Memory / computing limitations
Online learning & stacking
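N-grams keep some word order that a plain bag of words throws away: "skip the" means something "skip" alone does not. A minimal sketch; the helper names are hypothetical. In scikit-learn, `CountVectorizer` and `TfidfVectorizer` accept `ngram_range=(1, 2)` for the same effect.

```python
def ngrams(tokens, n):
    """All runs of n consecutive tokens; returns [] when there are fewer than n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unigrams_and_bigrams(tokens):
    """Feature tokens combining single words and word pairs, e.g. 'eiffel tower' next to 'eiffel'."""
    return list(tokens) + [" ".join(pair) for pair in ngrams(tokens, 2)]

title = ["skip", "the", "line", "eiffel", "tower"]
print(ngrams(title, 2))
# [('skip', 'the'), ('the', 'line'), ('line', 'eiffel'), ('eiffel', 'tower')]
```

The trade-off is vocabulary size: every extra n multiplies the number of possible features, which feeds straight into the memory and computing limitations listed above.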
Some resources
https://www.slideshare.net/mylittleadventure/introduction-machine-learning-by-mylittleadventure
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
https://bit.ly/2uL954v
NLTK
Book
Stanford’s GloVe
Dataset
Course
Andrew Ng (Coursera)
Platform
Libraries
Thank you
July 2nd 2018
Questions?
@mylitadventure
@brainstorm_me
johnny.rahajarison@mylittleadventure.com

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

SophiaConf 2018 - J. Rahajarison (My Little Adventure)

  • 1. Smart recommendation engine of things to do in destination. Natural Language Processing and Machine Learning: how to automatically categorize tours and activities? July 2nd 2018
  • 3. Agenda: Introduction to machine learning. Why is Natural Language Processing so hard? How do we process text? Let’s try it out. Go further.
  • 4. What is machine learning? Software that does something without being explicitly programmed to do it, just by learning from examples. The same software can be used for various tasks. It learns from experience with respect to some task and performance measure: its performance at the task improves with experience.
  • 7. You said text, right?
  • 8. Obviously, you said text, not numbers. Challenges: context, polysemy, synonyms, enantiosemy, neologisms, sarcasm, names, rare words, common sense, dialects, informal language / abbreviations.
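  • 9. Ambiguity? I saw a man on a hill with a telescope.
  • 10. Ambiguity? I saw a man on a hill with a telescope. (Several readings are possible: I used a telescope to see the man, the man has the telescope, or the telescope is on the hill.)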
  • 11. Text should be prepared
  • 12. Let’s clean our text first: ['one', 'morn', 'when', 'gregor', 'samsa', 'woke', 'from', 'troubl', 'dream', 'he', 'found', 'himself', 'transform', 'in', 'hi', 'bed', 'into', 'a', 'horribl', 'vermin', 'He', 'lay', 'on', 'hi', 'armour-lik', 'back', 'and', 'if', 'he', 'lift', 'hi', 'head', 'a', 'littl', 'he', 'could', 'see', 'hi', 'brown', 'belli', 'slightli', 'dome', 'and', 'divid', 'by', 'arch', 'into', 'stiff', 'section', 'the', 'bed', 'wa', 'hardli', 'abl', 'to', 'cover', 'it', 'and', 'seem', 'readi', 'to', 'slide', 'off', 'ani', 'moment', 'hi', 'mani', 'leg', 'piti', 'thin', 'compar', 'with', 'the', 'size', 'of', 'the', 'rest', 'of', 'him', 'wave', 'about', 'helplessli', 'as', 'he', 'look', 'what', "'s", 'happen', 'to']
    ✓ Tokenize sentences ✓ Tokenize words ✓ Transliterate ✓ Normalize ✓ Filter out (punctuation, special characters, stop words) ✓ Use a stemmer and / or a lemmatizer (lemmatizer: "be" = am, are, is; stemmer: "vari" = variation, vary, varies, variables)
  • 13. A bag of words: "John","likes","to","watch","movies","Mary","likes","movies","too" → {"John":1,"likes":2,"to":1,"watch":1,"movies":2,"Mary":1,"too":1} → {131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1} → [1, 2, 1, 1, 2, 1, 1]. Each unique word in our dictionary corresponds to a feature.
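  • 14. TF-IDF. TF (Term Frequency): $\mathrm{TF}(t, d) = \frac{\text{occurrences of } t \text{ in } d}{\text{total words in } d}$. IDF (Inverse Document Frequency): $\mathrm{IDF}(t) = \log\left(\frac{\text{count of documents}}{\text{count of documents where } t \text{ appears}}\right)$. The weight of term $t$ in document $d$ is $\mathrm{TF}(t, d) \times \mathrm{IDF}(t)$.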
  • 15. Another way: use word embeddings. Word embeddings capture relative meaning: each word becomes a vector, and the geometry of these vectors reflects how words relate to one another.
  • 16. Another way: use word embeddings. Paris - France + China = Beijing (Beijing is the word whose vector lies closest to the result).
  • 17. Another way: use word embeddings. Example of the “movies” vector: movies -0.34582 0.057328 0.1328 0.22376 0.10161 0.52948 -0.30199 0.45676 -0.37643 -0.51857 0.67325 -0.012444 -0.099021 0.43823 -0.28905 -1.0183 -0.0062387 -0.32893 0.55547 0.44181 0.31524 0.29909 0.51605 0.32109 0.021471 0.67909 0.037333 -0.42321 0.56517 0.47979 -0.63307 0.1126 0.0050579 -0.18879 -0.87478 -0.29481 -0.70824 -0.072256 0.1614 0.34523 0.61872 -0.036932 -0.43343 0.29604 0.18671 -0.33384 0.50628 -0.013876 0.46303 0.19298 0.16783 -0.55786 -0.16947 -0.27382 0.31027 0.10974 0.12819 0.23538 0.038003 -0.077524 -0.23291 0.044094 0.36325 0.20611 0.55571 -0.022715 -0.04996 0.32312 0.44176 0.25272 0.15159 0.22682 -0.10425 0.73375 0.66572 -0.55885 0.082242 -0.13387 0.31042 -0.38443 -0.38631 -0.7518 0.6706 -0.17495 0.056298 0.82038 0.41573 -0.12316 0.28437 -0.19324 -0.13485 0.28862 -0.37817 0.37268 0.01515 0.39123 0.059544 -0.074006 -0.17152 -1.1523 0.26541 0.082314 0.17914 -0.089861 -0.20884 0.29248 -0.60263 -0.0024285 0.24521 -0.5427 -0.074404 0.14034 0.0085891 -0.37351 0.23573 0.1493 -0.14038 0.11725 -0.51013 -0.64531 0.1329 0.075911 -0.10827 0.22077 -0.086253 0.4096 0.052314 0.40964 -0.030506 0.30572 -0.40694 -0.11773 0.21586 0.14448 0.23419 -0.23401 0.06811 0.29447 -0.4086 0.88777 -0.19477 -0.18847 0.10324 -0.24593 -0.10173 -0.43226 -0.091173 -0.092602 -0.23385 -0.16498 0.22057 0.11014 -0.25018 -0.43089 0.19759 0.11762 -0.045432 0.13331 0.032684 -0.21702 0.35082 -0.40466 -0.02425 -0.22637 0.0094442 0.72848 0.10286 0.27199 -0.40396 0.22366 -0.039481 -0.17164 -1.7307 0.3706 -0.13711 0.2295 -0.34432 -0.024381 -0.093941 -0.29861 -0.33164 -0.12931 -0.11218 0.047052 0.40442 0.0043382 0.22364 -0.31537 0.1987 -0.46108 -0.35126 -0.14584 0.17765 0.10869 -0.14434 -0.6152 -0.5874 0.014977 -0.1691 -0.46926 1.3959 -0.15449 -0.24167 -0.002575 0.4758 -0.044786 -0.21345 0.22983 -0.34356 -0.43402 -0.45719 -0.29775 -0.053295 0.50132 -0.24066 0.45762 0.095118 0.21008 0.71912 0.028577 -0.64176 0.1314 0.21556 -0.12536 -0.3298 -0.07123 0.35428 -0.3787 0.12348 -0.060439 0.19217 -0.29951 -0.73189 -0.33589 0.449 0.22654 1.0404 0.019947 -0.74711 0.071042 0.067809 0.36341 -0.32579 -0.11085 -0.24507 -0.13518 -0.44326 0.022784 -0.57252 0.33756 -0.23411 -0.062955 -0.35353 1.0497 -0.14938 -0.57772 0.27652 -0.28787 -0.0040621 0.25113 0.40818 -0.13227 0.016032 -0.55465 0.0021098 -0.27755 0.16082 -0.055202 0.21104 0.58412 0.42842 -0.047253 0.10542 0.027478 0.30911 0.31792 -1.8564 0.014412 -0.29748 -0.70103 -0.068219 -0.53071 -0.10661 0.028596 0.081479 0.34323 -0.047833 0.023129 0.028697 0.33859 -0.20706 -0.0025571 -0.18267 -0.26946 -1.1064 -0.31228 -0.13101 0.1161 -0.068647 -0.09988
  • 18. Another way: use word embeddings. Starting from the bag of words {"John":1,"likes":2,"to":1,"watch":1,"movies":2,"Mary":1,"too":1} → {131:1, 132:2, 133:1, 134:1, 135:2, 136:1, 137:1} → [1, 2, 1, 1, 2, 1, 1], each count multiplies the word's embedding vector: [[], 2*[], [], [], 2*[-0.34582, 0.057328, … 0.22376, 0.10161], [], []], where the fifth entry is the embedding vector for “movies”.
  • 20. Recipe: (1) Prepare training / test data (files, database, cache, data flow); (2) Select a model and its (hyper)parameters; (3) Train the algorithm; (4) Use or store your trained estimator; (5) Make predictions; (6) Measure accuracy and precision.
  • 21. Collect our training & test dataset. Category: Food. Columns: Label | Vectorized.
     ◦ Eiffel Tower with Dinner: [ 0., 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.5, 0., 0.5]
     ◦ Skip the line Eiffel Tower: [ 0., 0., 0., 0., 0., 0.3967171, 0., 0., 0., 0.47792296, 0., 0., 0., 0., 0., 0.47792296, 0.47792296, 0., 0., 0.3967171, 0., 0.]
     ◦ Louvre Museum fast track: [ 0., 0., 0., 0., 0., 0., 0.5, 0., 0., 0., 0.5, 0.5, 0., 0., 0., 0., 0., 0., 0., 0., 0.5, 0.]
     ◦ Gourmet tour of Paris: [ 0., 0., 0., 0., 0., 0., 0., 0.58910044, 0., 0., 0., 0., 0.41798437, 0.48900396, 0., 0., 0., 0., 0.48900396, 0., 0., 0.]
     ◦ Segway tour of city’s highlights: [ 0., 0., 0.48838773, 0., 0., 0., 0., 0., 0.48838773, 0., 0., 0., 0.3465257, 0., 0.48838773, 0., 0., 0., 0.40540376, 0., 0., 0.]
     ◦ Dinner cruise with Champagne: [ 0., 0.54408243, 0., 0.54408243, 0.45163515, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.45163515]
     ◦ Aquarium of Paris ticket: [ 0.55967542, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.39710644, 0.46457866, 0., 0., 0., 0.55967542, 0., 0., 0., 0.]
     ◦ …
  • 22. Choose a classifier algorithm
  • 23. A few recommendations: Naive Bayes / Logistic Regression, Decision Trees, Random Forest, Gradient Boosting, SVM, Neural Networks
  • 24. Let’s measure. Category: Food. Columns: Label | Prediction. Rows come from the training set and from real data.
     ◦ Eiffel Tower with Dinner: 0.83
     ◦ Gourmet tour of Paris: 0.96
     ◦ Dinner cruise with Champagne: 1.0
     ◦ Segway tour of city’s highlights: 0.03
     ◦ Orsay dedicated entrance: 0.02
     ◦ 3 course meal in Eiffel Tower: 0.97
     ◦ Cooking class in Paris: 0.89
     ◦ Moulin Rouge Paris dinner show: 0.91
  • 27. There is much more: cross-validation datasets, n-grams, wrong user content, misspellings and typos, training data that is hard to get, harder languages or transliteration issues, memory / computing limitations, online learning and stacking.
  • 29. Thank you! July 2nd 2018. Questions? @mylitadventure @brainstorm_me johnny.rahajarison@mylittleadventure.com
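The bag-of-words transformation on slide 13 fits in a few lines. In this sketch (an illustration, not the author's code) the vocabulary is numbered from 0 in order of first appearance, whereas the slide's ids 131 to 137 come from a larger dictionary; words missing from the vocabulary are simply ignored.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each unique token an index, in order of first appearance."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def bag_of_words(tokens, vocabulary):
    """Count vector: position i holds how often vocabulary word i occurs."""
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for token, index in vocabulary.items():
        vector[index] = counts[token]
    return vector


tokens = ["John", "likes", "to", "watch", "movies", "Mary", "likes", "movies", "too"]
vocabulary = build_vocabulary([tokens])
print(dict(Counter(tokens)))  # {'John': 1, 'likes': 2, 'to': 1, 'watch': 1, 'movies': 2, 'Mary': 1, 'too': 1}
print(bag_of_words(tokens, vocabulary))  # [1, 2, 1, 1, 2, 1, 1]
```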
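A direct, unoptimized transcription of the slide 14 formulas, using the natural logarithm. The toy `documents` are hypothetical, and returning 0 for a term that appears in no document is our own choice to avoid dividing by zero.

```python
import math


def tf(term, document):
    """Occurrences of the term divided by the total words in the document."""
    return document.count(term) / len(document)


def idf(term, documents):
    """log(count of documents / count of documents where the term appears)."""
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        return 0.0  # unseen term: treated as weight 0 (an assumption)
    return math.log(len(documents) / containing)


def tf_idf(term, document, documents):
    return tf(term, document) * idf(term, documents)


# Hypothetical, already-cleaned documents (lists of tokens).
documents = [
    ["eiffel", "tower", "dinner"],
    ["louvre", "museum", "ticket"],
    ["dinner", "cruise"],
]
print(round(tf_idf("dinner", documents[2], documents), 4))  # 0.2027
# "cruise" occurs in fewer documents than "dinner", so it weighs more:
print(tf_idf("cruise", documents[2], documents) > tf_idf("dinner", documents[2], documents))  # True
```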
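The arithmetic on slide 16 works by adding and subtracting vectors, then picking the word whose vector is closest (by cosine similarity) to the result, excluding the three query words as is customary in analogy tests. The 3-dimensional vectors below are hand-made and purely hypothetical; real embeddings such as the long “movies” vector on slide 17 are learned from large corpora.

```python
import math

# Hypothetical hand-made vectors: roughly [capital-ness, Europe-ness, Asia-ness].
EMBEDDINGS = {
    "paris":   [1.0, 1.0, 0.0],
    "france":  [0.0, 1.0, 0.0],
    "rome":    [1.0, 0.9, 0.0],
    "italy":   [0.0, 0.9, 0.0],
    "beijing": [1.0, 0.0, 1.0],
    "tokyo":   [1.0, 0.0, 0.8],
    "china":   [0.0, 0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, embeddings):
    """Word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(analogy("paris", "france", "china", EMBEDDINGS))  # beijing
```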
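Slide 18 replaces each bag-of-words count with the count times the word's embedding. A classifier usually needs one fixed-size vector per document, so the sketch below also averages those scaled vectors; that averaging step is our assumption, not something the slide shows. `EMBEDDINGS` keeps only the first four values of the slide's “movies” vector, and the “likes” vector is hypothetical.

```python
# First four values of the slide's "movies" vector (truncated for brevity);
# the "likes" vector is a hypothetical stand-in.
EMBEDDINGS = {
    "likes": [1.0, 0.0, 0.0, 0.0],
    "movies": [-0.34582, 0.057328, 0.1328, 0.22376],
}

COUNTS = {"John": 1, "likes": 2, "to": 1, "watch": 1, "movies": 2, "Mary": 1, "too": 1}


def scaled_vectors(counts, embeddings):
    """One entry per word: count * embedding, or [] when the word has no vector."""
    return [
        [count * x for x in embeddings[word]] if word in embeddings else []
        for word, count in counts.items()
    ]


def document_vector(counts, embeddings, dim):
    """Count-weighted mean of the available word vectors (one fixed-size vector)."""
    total = [0.0] * dim
    weight = 0
    for word, count in counts.items():
        if word in embeddings:
            total = [t + count * x for t, x in zip(total, embeddings[word])]
            weight += count
    return [t / weight for t in total] if weight else total


print(scaled_vectors(COUNTS, EMBEDDINGS))
print(document_vector(COUNTS, EMBEDDINGS, dim=4))
```

Averaging keeps documents of different lengths on the same scale; summing is also common.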
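The six steps of slide 20 can be strung together end to end. Everything here is a hypothetical stand-in: the labels in `ROWS`, the 75/25 split with a fixed seed, and the toy nearest-centroid classifier (a simple, real technique, but not necessarily what the author used). Storing the learned state with `pickle` illustrates step 4; in practice the bytes would go to a file, database or cache.

```python
import math
import pickle
import random
from collections import Counter


def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the rows reproducibly, then cut off the test part."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def bag(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NearestCentroid:
    """Toy classifier: one summed bag of words per label, nearest by cosine.

    Summing instead of averaging is fine because cosine ignores vector length.
    """

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(bag(text))
        return self

    def predict(self, texts):
        return [max(sorted(self.centroids), key=lambda label: cosine(bag(t), self.centroids[label]))
                for t in texts]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Share of items predicted positive that really are positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in hits) / len(hits) if hits else 0.0


# Hypothetical labelled titles (1 = Food, 0 = not Food), for illustration only.
ROWS = [
    ("Eiffel Tower with Dinner", 1),
    ("Dinner cruise with Champagne", 1),
    ("Gourmet tour of Paris", 1),
    ("Cooking class in Paris", 1),
    ("Skip the line Eiffel Tower", 0),
    ("Louvre Museum fast track", 0),
    ("Segway tour of city's highlights", 0),
    ("Aquarium of Paris ticket", 0),
]

train, test = train_test_split(ROWS)                       # 1. prepare training / test data
model = NearestCentroid()                                  # 2. select a model
model.fit([t for t, _ in train], [y for _, y in train])    # 3. train
stored = pickle.dumps(model.centroids)                     # 4. store the learned state...
model = NearestCentroid()
model.centroids = pickle.loads(stored)                     #    ...and restore it later
y_pred = model.predict([t for t, _ in test])               # 5. make predictions
y_true = [y for _, y in test]
print("accuracy", accuracy(y_true, y_pred))                # 6. measure
print("precision", precision(y_true, y_pred, positive=1))
```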
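The weights on slide 21 appear consistent with scikit-learn's `TfidfVectorizer` defaults: lowercasing, tokens of two or more word characters (so “city’s” becomes `city`), an alphabetically ordered vocabulary, smoothed IDF ln((1 + n) / (1 + df)) + 1, raw term counts, and L2-normalized rows. That this is what the author used is an assumption. The pure-Python sketch below reimplements those defaults and reproduces, for example, the row for “Skip the line Eiffel Tower”; note that it differs from the plain formula on slide 14.

```python
import math
import re
from collections import Counter

# Titles copied from slide 21.
TITLES = [
    "Eiffel Tower with Dinner",
    "Skip the line Eiffel Tower",
    "Louvre Museum fast track",
    "Gourmet tour of Paris",
    "Segway tour of city’s highlights",
    "Dinner cruise with Champagne",
    "Aquarium of Paris ticket",
]


def tokenize(text):
    """Lowercase and keep runs of 2+ word characters ("city’s" -> "city")."""
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_matrix(texts):
    docs = [tokenize(t) for t in texts]
    vocabulary = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # documents containing each word
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocabulary}  # smoothed IDF
    rows = []
    for doc in docs:
        counts = Counter(doc)  # raw term counts
        row = [counts[w] * idf[w] for w in vocabulary]
        norm = math.sqrt(sum(x * x for x in row))  # L2 normalization
        rows.append([x / norm if norm else 0.0 for x in row])
    return vocabulary, rows


vocabulary, rows = tfidf_matrix(TITLES)
print(len(vocabulary))  # 22
print([round(x, 4) for x in rows[1] if x])  # [0.3967, 0.4779, 0.4779, 0.4779, 0.3967]
```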
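Naive Bayes, first on slide 23's list, is simple enough to write from scratch. This sketch is a multinomial Naive Bayes with add-alpha smoothing (Laplace smoothing when `alpha` is 1), where `alpha` is the hyperparameter; the `TRAIN` labels are hypothetical, and skipping words never seen in training is one common convention among several.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hyperparameter: smoothing strength

    def fit(self, documents, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def _log_joint(self, tokens, c):
        """log P(c) plus the sum of log P(word | c) over known words."""
        counts = self.word_counts[c]
        denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
        score = math.log(self.priors[c])
        for word in tokens:
            if word in self.vocabulary:  # words never seen in training are skipped
                score += math.log((counts[word] + self.alpha) / denominator)
        return score

    def predict_proba(self, tokens):
        scores = {c: self._log_joint(tokens, c) for c in self.classes}
        top = max(scores.values())
        # Normalize in log space to avoid underflow (log-sum-exp trick).
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


# Hypothetical labels for titles from slide 21, for illustration only.
TRAIN = [
    ("eiffel tower with dinner", "food"),
    ("dinner cruise with champagne", "food"),
    ("gourmet tour of paris", "food"),
    ("skip the line eiffel tower", "other"),
    ("louvre museum fast track", "other"),
    ("segway tour of city highlights", "other"),
    ("aquarium of paris ticket", "other"),
]

model = MultinomialNaiveBayes(alpha=1.0).fit(
    [text.split() for text, _ in TRAIN], [label for _, label in TRAIN]
)
print(model.predict_proba("dinner show in paris".split()))
```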
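Slide 24's predictions are scores between 0 and 1 for the Food label. Turning them into decisions needs a threshold; the 0.5 used below is an assumption, and raising it typically trades recall for precision. The scores themselves are copied from the slide.

```python
# Scores copied from slide 24; the 0.5 threshold is an assumption.
PREDICTIONS = {
    "Eiffel Tower with Dinner": 0.83,
    "Gourmet tour of Paris": 0.96,
    "Dinner cruise with Champagne": 1.0,
    "Segway tour of city’s highlights": 0.03,
    "Orsay dedicated entrance": 0.02,
    "3 course meal in Eiffel Tower": 0.97,
    "Cooking class in Paris": 0.89,
    "Moulin Rouge Paris dinner show": 0.91,
}


def to_labels(scores, threshold=0.5):
    """Mark an activity as Food when its score reaches the threshold."""
    return {title: score >= threshold for title, score in scores.items()}


def ranked(scores):
    """Titles sorted from most to least Food-like."""
    return sorted(scores, key=scores.get, reverse=True)


labels = to_labels(PREDICTIONS)
print([title for title, is_food in labels.items() if not is_food])
print(ranked(PREDICTIONS)[:3])
```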
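Two items from slide 27, n-grams and cross-validation, are quick to sketch. `ngrams` turns a token list into contiguous word sequences (so “eiffel tower” can become a single feature), and `k_fold` splits a dataset into k folds so that each item is used for validation exactly once. The round-robin fold assignment is an assumption; shuffling first is common.

```python
def ngrams(tokens, n):
    """All contiguous sequences of n tokens (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def k_fold(items, k):
    """Yield (train, validation) pairs; each item is in exactly one validation fold."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


print(ngrams(["skip", "the", "line"], 2))  # [('skip', 'the'), ('the', 'line')]
for train, validation in k_fold(list(range(6)), k=3):
    print(train, validation)
```

Averaging a metric over the k validation folds gives a steadier estimate than a single train/test split, at the cost of training k models.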