Lynn Cherny, Assoc Prof Data Science, emlyon business
school
& Students!
@arnicas
PyData Paris 2017
Why am I here?
• Starting up a program in data science/analytics at
a business school: emlyon business school
• My courses first year: Python bootcamp, Data
analysis with Pandas, Text analysis/NLP, Business
Analytics (Excel pivot tables, SQL, Tableau).
• Next year: an intro AI course, some web & db stuff,
plus above.
–faculty in the marketing department when I introduced myself
“What do our students really need to know?”
–me, who likes NLP problems
“Hey, let’s find out by looking at job ads in
France.”
Also, This Project Course
• “Business Data Science Projects” — combine students
from
• École Centrale de Lyon (engineering school, so
presumably coders) +
• emlyon business students (presumably non-coders)
for product design/research/plan
In practice, coding skills in the teams were not distributed
as expected; but my project had strong skills on both
sides (we had already taught a few Python courses by then)
The student team
• Mathilde TRÉARDE (superb
project manager)
• Thomas PUCCI (amazing
reactjs front-end dev)
• Yann VAGINAY (great python
data scientist)
• Imen FEHRI
• Mohamed Amine MEJRI
• Roxane MARCILHACY (great
python data scientist)
• Julien RAULT
• Eric DUPRAZ
• Sophie REISER (great market
research/analyst)
• Nicolas LOUVIGNE (top notch
visual designer/branding)
• Grégoire CANER-CHABRAN
• Sarah DAIEN
Data Sources
Indeed API: targeted searches, text collection
apec.fr: targeted searches (and siphoning from API)
“JT” (CSV data dump from an edu provider)
Data collection began in February 2017 in earnest.
I beefed it up in April/May.
Demo
Filter: A PDF resume uploaded… maybe a bit imperfect now:
Biz students:
95 student interviews of job searchers
Excellent creative work
UI mockup suggestions from
biz team
Architecture
Lynn said we should do these (Mongo, ES, Flask)
and set up (poorly managed and insecure) Mongo / Elastic / EC2 crawler host
herself on AWS.
Dev team did their own github/react & nodejs/Heroku plan.
Some discoveries in the
code after it was over.
• Databases didn’t record the date the items were added
to them (date of scrape)
• Scraping was based on rather random sets of
words, and not consistent across site sources
• No automation of the indexing in Elastic - manual
job from Jupyter notebook (they knew this was an
issue too)
• Scraper code was never put on github.
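A minimal sketch of how the missing scrape date and the search-term provenance could be recorded at collection time. The function name `stamp_record` and the field names `source`, `search_term`, and `scraped_at` are assumptions for illustration, not the team's actual schema.

```python
from datetime import datetime, timezone


def stamp_record(ad, source, search_term, now=None):
    """Return a copy of a scraped ad with provenance fields added."""
    scraped_at = now or datetime.now(timezone.utc)
    stamped = dict(ad)  # copy, so the raw scraped record is left untouched
    stamped["source"] = source            # hypothetical field: which site
    stamped["search_term"] = search_term  # hypothetical field: query used
    stamped["scraped_at"] = scraped_at.isoformat()  # date of scrape
    return stamped


# Invented example record; "published" stands for the site's own date.
example = stamp_record(
    {"title": "Data analyst", "published": "2017-03-02"},
    source="indeed",
    search_term="data",
    now=datetime(2017, 4, 1, tzinfo=timezone.utc),
)
print(example["scraped_at"])
```

Keeping the site's publication date and the scrape date as separate fields would make it possible to tell a real spike in ads from a spike in scraping.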
My security issues
• Tried and failed to secure Mongo with my own SSH key generation;
ended up tunneling from the scraping machine (that works
fine).
• Elastic is wide open and had been written to by a virus
(Amazon just sent me a warning), creating extra tables.
• We had a lot of issues with university firewalls and the cloud.
We all had to tether to phones to access the dbs from
school.
• AWS security stuff is really confusing. (One student team
didn’t succeed in using AWS at all— no one helped them.)
the data in more
detail…
Total Data Now by Source
• “JT,” an academic partner (given to us as a dump in
Jan, now “out of date”): 78K
• Apec: 25K
• Indeed: 10K
Apec - cadres
My student: “they would never hire someone like me”
Indeed - international feed (API)
with links - need to scrape text
more english:
Data in the db : the search
terms requested by API (!?)
apec.fr Indeed
Dates in the db (remember,
not the date scraped…)
Indeed’s date of
publication counts
Apec
student work ended March/Apr - I added new terms and increased scraping into May/June
JT provided data dates
No, this spike is real,
they are different ads and dated
this same day.
Job type labels
on JT data
Largest cats are
Marketing, Bizdev,
Communication
(Dev/IT not small tho)
“JT”: more “stages” (internships)
Revisit the word2vec
part
Or create your own list
and see the related
skills in the
“neighborhood”:
scikit-learn is not in the skills list? but is found in a job ad!
What is that graph?
a few “closely related skills” (by word2vec distance) in
a simple TSNE layout, computed and passed over API.
Awesome idea… but caveat: “Skills” were pre-filtered from the
word2vec model of the job ads, using the list of LinkedIn
skills.
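The effect of that pre-filter can be sketched in plain Python, assuming the word vectors are available as a dictionary. The function names, the toy 2-d vectors, and the tiny skills set below are invented for illustration; the real app used a word2vec model trained on the job ads.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_skills(query, vectors, skills, topn=3):
    """Rank words near `query`, keeping only words in the `skills` whitelist."""
    if query not in vectors:
        return []
    scored = [
        (word, cosine(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query and word in skills
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy vectors, invented for illustration only.
vectors = {
    "python": [1.0, 0.1],
    "pandas": [0.9, 0.2],
    "scikit-learn": [0.95, 0.15],
    "negotiation": [0.1, 1.0],
}
skills = {"python", "pandas", "negotiation"}  # whitelist lacks "scikit-learn"

neighbors = related_skills("python", vectors, skills)
print([word for word, _ in neighbors])
```

In this toy setup "scikit-learn" is very close to "python" in vector space but never shows up as a related skill, because the whitelist drops it before ranking.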
link
A few related links
Radim’s tutorial on using word2vec in gensim:
https://rare-technologies.com/word2vec-tutorial/
My 8 million links on w2v papers/code etc:
https://pinboard.in/search/u:arnicas?query=word2vec
Interactive demo of w2v tsne layout of Yelp text reviews:
https://bl.ocks.org/arnicas/dd2ef348ad8854e40ef2
Useful warnings/info about making tsne layouts (we need
a grid search option):
http://distill.pub/2016/misread-tsne/
LinkedIn Skillz list: English,
Mysterious, —Garbage?
LI skills only
from the w2v model
in March
Zoom in…
Word2Vec updated (a week ago)
?!
Python also didn’t make the
“top 50 words
per search term,” which is
sad.
My shitty tsne layout that took
40 minutes on my laptop
Tensorboard projector view
convert your gensim model to tensorflow tsv files and upload
http://projector.tensorflow.org/
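A rough sketch of that conversion, assuming the vectors have already been pulled out of the model into a word-to-vector dictionary (that loading step is left out here). The projector takes one file of tab-separated vectors and one metadata file with a label per line; `write_projector_tsv` is a hypothetical helper name.

```python
import io


def write_projector_tsv(vectors, vec_file, meta_file):
    """Write one tab-separated vector per line and the matching label per line."""
    for word, vec in vectors.items():
        vec_file.write("\t".join(f"{x:.6f}" for x in vec) + "\n")
        # Single-column metadata: one label per line, no header row.
        meta_file.write(word + "\n")


# Invented 2-d vectors; a real model would have a few hundred dimensions.
vectors = {"python": [0.12, -0.5], "sql": [0.3, 0.25]}
vec_buf, meta_buf = io.StringIO(), io.StringIO()
write_projector_tsv(vectors, vec_buf, meta_buf)
print(vec_buf.getvalue())
print(meta_buf.getvalue())
```

With real files, the two buffers would be replaced by files opened for writing, and labels containing tabs or newlines would need cleaning first.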
english
Tableau app
vis in
Tableau,
more UI
options
Most frequent
data-related
words, sized by
frequency
in search on
source.
Note: few JT ads
words (pink)
Sales,
logistics
supply chain -
lot of JT.
Let’s look at job ads
again…
“skills” are often soft or “previous
experience doing” in business job ads
link
Market research with
students:
Algorithm to determine “skill” “matches” is interesting but
worrying. It has to be really “good.”
–one of my students (who did better after tips on searching for skills I’d
taught on other job sites) :)
“I feel like we’re all looking at the same vague
job ads and competing with each other.”
Search by courses taken?
some of these descriptions are really short and vague; what’s
a good criterion for match?
sure, with 2 words, we get
some matches…
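One possible way to make the match criterion explicit is to count shared words and require a minimum, as in the sketch below. The stopword list, the `min_shared` threshold, and both example texts are assumptions invented for illustration, not the criterion the student app used.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "you", "can"}


def tokens(text):
    """Lowercased set of words in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-zà-ÿ]+", text.lower()) if w not in STOPWORDS}


def match(course, ad, min_shared=3):
    """Return (is_match, shared_words, jaccard) for a course/ad pair."""
    c, a = tokens(course), tokens(ad)
    shared = c & a
    union = c | a
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared) >= min_shared, sorted(shared), jaccard


course = "Decision making under uncertainty for entrepreneurs"
ad = "You can make decisions and manage uncertainty in a sales team"
ok, shared, score = match(course, ad)
print(ok, shared, round(score, 2))
```

Lowering `min_shared` to 1 turns this pair into a match on a single word, which is the worry with short, vague descriptions. Without stemming, "decision" and "decisions" do not even count as shared.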
Teaching vs. Jobs, a Gap.
Entrepreneurs are constantly called on to solve problems
with little time and few resources to step back, in a
highly uncertain environment. Drawing on research
findings in management and cognitive psychology, this
course aims to provide a few simple tools to develop and
support the participants’ decision-making ability.
“decision-making” course:
Job ad: “You can make decisions”?
So, Extension Ideas
• For student job search improvement:
• Return to skill extraction problem; use some training data. (Do some qualitative
analysis.)
• CV matching problem: revisit. Use different skills extraction (n-grams)
• Compare description of ALL courses taken (and liked) vs. jobs out there; is this
better?
• Curriculum development:
• Evaluate course descriptions by how well they match jobs
• Find “gaps” in teaching — what’s not being taught? (E.g., SQL.)
• Could course descriptions (and content) be better? Make this easier for
students?
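The “gaps” idea could be prototyped roughly as follows: list the terms that appear in several job ads but in no course description. The function name, the vocabulary, the `min_ads` threshold, and all example texts are invented for illustration.

```python
from collections import Counter


def teaching_gaps(job_ads, course_descriptions, vocabulary, min_ads=2):
    """Terms from `vocabulary` found in at least `min_ads` ads but in no course."""
    # Naive whitespace tokenization; punctuation would need stripping on real text.
    course_words = set(" ".join(course_descriptions).lower().split())
    ad_counts = Counter()
    for ad in job_ads:
        words = set(ad.lower().split())
        ad_counts.update(term for term in vocabulary if term in words)
    return sorted(
        term
        for term, count in ad_counts.items()
        if count >= min_ads and term not in course_words
    )


# Invented toy data, not taken from the scraped ads or the real catalogue.
job_ads = [
    "Analyst with SQL and Excel",
    "Marketing data role SQL Tableau",
    "Sales assistant Excel",
]
courses = ["Business analytics with Excel pivot tables and Tableau"]
vocabulary = {"sql", "excel", "tableau", "python"}

gaps = teaching_gaps(job_ads, courses, vocabulary)
print(gaps)
```

On this toy data only "sql" is reported: "excel" is requested just as often but is covered by a course, and "tableau" appears in too few ads to pass the threshold.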
My plan now
• Generally, starting up a Data Science Institute in
EM-Lyon. Money —> DS and data vis visitors/
confs/talks.
• Looking for help with teaching/workshops/tutorials
(Paris, Lyon, St. Etienne, Shanghai, Casablanca,
India)
• Contact me at cherny@em-lyon.com or @arnicas
Reminder: The student team
• Mathilde TRÉARDE (superb
project manager)
• Thomas PUCCI (amazing
reactjs front-end dev, multiply
employed)
• Yann VAGINAY (great python
data scientist doing NLP in
German stage now)
• Imen FEHRI
• Mohamed Amine MEJRI
• Roxane MARCILHACY (great python
data scientist) - now also web dev.
Looking for stage in Paris.
• Julien RAULT
• Eric DUPRAZ
• Sophie REISER (great market
research/analyst, not dev, but looking)
• Nicolas LOUVIGNE (top notch visual
designer/branding)
• Grégoire CANER-CHABRAN
• Sarah DAIEN

 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 

Último (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Lynn Cherny Data Science Program emlyon business school

  • 1. Lynn Cherny, Assoc Prof Data Science, emlyon business school & Students! @arnicas PyData Paris 2017
  • 2. Why am I here? • Starting up a program in data science/analytics at a business school: emlyon business school • My courses first year: Python bootcamp, Data analysis with Pandas, Text analysis/NLP, Business Analytics (Excel pivot tables, SQL, Tableau). • Next year: an intro AI course, some web & db stuff, plus above.
  • 3. –faculty in the marketing department when I introduced myself “What do our students really need to know?”
  • 4. –faculty in the marketing department when I introduced myself “What do our students really need to know?” –me, who likes NLP problems “Hey, let’s find out by looking at job ads in France.”
  • 5. Also, This Project Course • “Business Data Science Projects” — combine students from • École Lyon Centrale (engineering school, so presumably coders) + • emlyon business students (presumably non-coders) for product design/research/plan In practice, coding skills in the teams were not distributed as expected; but my project had strong skills on both sides (we already taught a few Python courses by then)
  • 6. The student team • Mathilde TRÉARDE (superb project manager) • Thomas PUCCI (amazing reactjs front-end dev) • Yann VAGINAY (great python data scientist) • Imen FEHRI • Mohamed Amine MEJRI • Roxane MARCILHACY (great python data scientist) • Julien RAULT • Eric DUPRAZ • Sophie REISER (great market research/analyst) • Nicolas LOUVIGNE (top notch visual designer/branding) • Grégoire CANER-CHABRAN • Sarah DAIEN
  • 7. Data Sources Indeed API: targeted searches, text collection apec.fr: targeted searches (and siphoning from API) “JT” (CSV data dump from an edu provider) Data collection began in February 2017 in earnest. I beefed it up in April/May.
  • 14. Filter: A PDF resume uploaded… maybe a bit imperfect now:
  • 15. Biz students: 95 student interviews of job searchers
  • 17. UI mockup suggestions from biz team
  • 18. Architecture Lynn said we should do these (Mongo, ES, Flask) and set up (poorly managed and insecure) Mongo / Elastic / EC2 crawler host herself on AWS. Dev team did their own github/react & nodejs/Heroku plan.
  • 19. Some discoveries in the code after it was over. • Databases didn’t record the date the items were added to them (date of scrape) • Scraping was based on rather random sets of words, and not consistent across site sources • No automation of the indexing in Elastic - manual job from Jupyter notebook (they knew this was an issue too) • Scraper code was never put on GitHub.
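The missing scrape date noted on slide 19 is cheap to avoid. Below is a minimal sketch, assuming each scraped ad is a plain dict that gets stamped just before it is written to the database. The field names `scraped_at`, `source` and `search_term` are hypothetical, not the ones the project used.

```python
from datetime import datetime, timezone


def stamp_record(ad, source, search_term, now=None):
    """Return a copy of a scraped ad with provenance fields added.

    The field names below are illustrative assumptions, not the
    project's actual schema.
    """
    scraped = now or datetime.now(timezone.utc)
    stamped = dict(ad)  # copy so the caller's dict is left untouched
    stamped["scraped_at"] = scraped.isoformat()  # date of scrape, in UTC
    stamped["source"] = source                   # e.g. "indeed" or "apec"
    stamped["search_term"] = search_term         # the query that found the ad
    return stamped


ad = {"title": "Data analyst (stage)", "published": "2017-03-14"}
record = stamp_record(ad, source="indeed", search_term="data analyst")
print(record["scraped_at"])
```

Storing the search term next to the scrape date also makes it possible to check afterwards whether the same terms were used consistently across sources.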
  • 20. My security issues • Tried and failed to secure Mongo with my own SSH key generation; ended up tunneling from the scraping machine (that works fine). • Elastic is wide open and had been written to by a virus (Amazon just sent me a warning), creating extra tables. • We had a lot of issues with university firewalls and the cloud. We all had to tether to phones to access the dbs from school. • AWS security stuff is really confusing. (One student team didn’t succeed in using AWS at all; no one helped them.)
  • 21. the data in more detail…
  • 22. Total Data Now by Source • “JT,” an academic partner (given to us as a dump in Jan, now “out of date”): 78K • Apec: 25K • Indeed: 10K
  • 23. Apec - cadres My student: “they would never hire someone like me”
  • 24. Indeed - international feed (API) with links - need to scrape text
  • 26. Data in the db : the search terms requested by API (!?) apec.fr Indeed
  • 27. Dates in the db (remember, not the date scraped…) Indeed’s date of publication counts Apec student work ended March/Apr - I added new terms and increased scraping into May/June
  • 29. JT provided data dates No, this spike is real, they are different ads and dated this same day.
  • 30. Job type labels on JT data Largest cats are Marketing, Bizdev, Communication (Dev/IT not small tho)
  • 31. “JT” : more “stages”
  • 33. Or create your own list and see the related skills in the “neighborhood”: scikit-learn is not in the skills list? but is found in a job ad!
  • 34. What is that graph? a few “closely related skills” (by word2vec distance) in a simple TSNE layout, computed and passed over API. Awesome idea… but caveat: “Skills” were pre-filtered from the word2vec model of the job ads, using the list of LinkedIn skills. link
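The "closely related skills" lookup on slide 34, including the pre-filtering caveat, can be sketched in a few lines. This is only an illustration under stated assumptions: the toy 3-d vectors stand in for a word2vec model trained on the job ads, the `skills` set stands in for the LinkedIn skills list, and `related_skills` is a hypothetical helper, not the project's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_skills(query, vectors, skills, top_n=3):
    """Rank whitelisted skills by cosine similarity to `query`."""
    if query not in vectors:
        return []
    scored = [
        (word, cosine(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query and word in skills  # the pre-filter step
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


# Toy vectors (assumption): real ones would come from the trained model.
vectors = {
    "python": [0.9, 0.1, 0.0],
    "pandas": [0.8, 0.2, 0.1],
    "scikit-learn": [0.85, 0.15, 0.05],
    "sql": [0.5, 0.5, 0.1],
    "marketing": [0.0, 0.2, 0.9],
}
# Hypothetical whitelist; "scikit-learn" is deliberately missing from it.
skills = {"python", "pandas", "sql", "marketing"}

print(related_skills("python", vectors, skills))
```

Because the whitelist is applied before ranking, a term such as "scikit-learn" never shows up as a neighbor even when its vector is the closest one, which is the effect described on slide 33.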
  • 35. A few related links Radim’s tutorial on using word2vec in gensim: https://rare-technologies.com/word2vec-tutorial/ My 8 million links on w2v papers/code etc: https://pinboard.in/search/u:arnicas?query=word2vec Interactive demo of w2v tsne layout of Yelp text reviews: https://bl.ocks.org/arnicas/dd2ef348ad8854e40ef2 Useful warnings/info about making tsne layouts (we need a grid search option): http://distill.pub/2016/misread-tsne/
  • 36. LinkedIn Skillz list: English, Mysterious, —Garbage?
  • 37. LI skills only from the w2v model in March
  • 39. Word2Vec updated (a week ago) ?! Python also didn’t make the “top 50 words per search term,” which is sad.
  • 41. My shitty tsne layout that took 40 minutes on my laptop
  • 42. Tensorboard projector view convert your gensim model to tensorflow tsv files and upload http://projector.tensorflow.org/ english
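As a rough sketch of the conversion mentioned on slide 42, assuming the word vectors are already available as a plain mapping from word to list of floats: the projector loads a tab-separated vectors file with one vector per line, plus a metadata file with one label per line in the same order. `write_projector_tsv` is a hypothetical helper; with a real gensim model you would iterate over its vocabulary instead of the toy dict.

```python
import io


def write_projector_tsv(vectors, vec_file, meta_file):
    """Write one tab-separated vector per line and its label on the same row
    of the metadata file, so the two files stay aligned."""
    for word, vec in vectors.items():
        vec_file.write("\t".join(f"{x:.6f}" for x in vec) + "\n")
        meta_file.write(word + "\n")


# Toy vectors (assumption) standing in for the job-ad word2vec model.
vectors = {"python": [0.9, 0.1], "sql": [0.5, 0.5]}

# In-memory buffers keep the example self-contained; open() two real
# files instead to produce something you can upload.
vec_buf, meta_buf = io.StringIO(), io.StringIO()
write_projector_tsv(vectors, vec_buf, meta_buf)
print(vec_buf.getvalue())
print(meta_buf.getvalue())
```

The single-column metadata file is written without a header row here; if you add more columns (for example the source site), a header row naming them is needed.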
  • 45. Most frequent data-related words, sized by frequency in search on source. Note: few JT ads words (pink)
  • 47. Let’s look at job ads again…
  • 50. “skills” are often soft or “previous experience doing” in business job ads link
  • 51. Market research with students: Algorithm to determine “skill” “matches” is interesting but worrying. It has to be really “good.”
  • 52. –one of my students (who did better after tips on searching for skills I’d taught on other job sites) :) “I feel like we’re all looking at the same vague job ads and competing with each other.”
  • 53. Search by courses taken? some of these descriptions are really short and vague; what’s a good criterion for match?
  • 54. sure, with 2 words, we get some matches…
  • 55. Teaching vs. Jobs, a Gap. Course description (translated from French): “Entrepreneurs are constantly called on to solve problems with little time and few resources to step back, in a highly uncertain environment. Drawing on research findings in management and cognitive psychology, this course aims to provide a few simple inputs to develop and support participants’ decision-making ability.” “decision-making” course: Job ad: “You can make decisions”?
  • 56. So, Extension Ideas • For student job search improvement: • Return to skill extraction problem; use some training data. (Do some qualitative analysis.) • CV matching problem: revisit. Use different skills extraction (n-grams) • Compare description of ALL courses taken (and liked) vs. jobs out there; is this better? • Curriculum development: • Evaluate course descriptions by how well they match jobs • Find “gaps” in teaching — what’s not being taught? (E.g., SQL.) • Could course descriptions (and content) be better? Make this easier for students?
  • 57. My plan now • Generally, starting up a Data Science Institute in EM-Lyon. Money —> DS and data vis visitors/ confs/talks. • Looking for help with teaching/workshops/tutorials (Paris, Lyon, St. Etienne, Shanghai, Casablanca, India) • Contact me at cherny@em-lyon.com or @arnicas
  • 58. Reminder: The student team • Mathilde TRÉARDE (superb project manager) • Thomas PUCCI (amazing reactjs front-end dev, multiply employed) • Yann VAGINAY (great python data scientist doing NLP in German stage now) • Imen FEHRI • Mohamed Amine MEJRI • Roxane MARCILHACY (great python data scientist) - now also web dev. Looking for stage in Paris. • Julien RAULT • Eric DUPRAZ • Sophie REISER (great market research/analyst, not dev, but looking) • Nicolas LOUVIGNE (top notch visual designer/branding) • Grégoire CANER-CHABRAN • Sarah DAIEN