Could You Be a Data Scientist? 
Carlo Torniai, Ph.D. 
@carlotorniai
Goal 
• Quantify the features of data scientist profiles 
• Analyze aspiring data scientists' profiles 
• Provide useful feedback 
Why is this relevant? 
• A quantitative characterization of data scientist 
profiles can help close the loop between job 
seekers and recruiters 
Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
Data Collection 
• LinkedIn API: 
– General Information 
– Past work history 
– Education 
• Web Scraping: 
– Skills 
• 1500 profiles 
– Data Scientists 
– Software Engineers 
– Business Analysts 
– Mathematicians 
– Statisticians
Data Analysis 
Feature Extraction 
[Figure panels: Software Engineers, Business Analysts, Data Scientists, Statisticians, Mathematicians]
Data Analysis 
Feature Extraction 
[Figure: Number of PhDs by topic and profile. Topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats]
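As a rough illustration of the feature extraction step, the sketch below turns each profile's skill list into a 0/1 vector over a shared skill vocabulary. The slides do not show the actual encoding used, so the toy profiles, the lower-casing step, and the helper names `build_vocabulary` and `encode` are all assumptions made for this example.

```python
# Hypothetical toy profiles; the real LinkedIn data is not shown in the slides.
profiles = [
    {"title": "Data Scientist", "skills": ["Python", "Machine Learning", "Statistics"]},
    {"title": "Software Engineer", "skills": ["Java", "Python", "Git"]},
    {"title": "Statistician", "skills": ["R", "Statistics", "SAS"]},
]


def build_vocabulary(profiles):
    """Collect every distinct skill, lower-cased, in sorted order."""
    return sorted({s.strip().lower() for p in profiles for s in p["skills"]})


def encode(profile, vocabulary):
    """Return a 0/1 vector: 1 if the profile lists the skill, else 0."""
    skills = {s.strip().lower() for s in profile["skills"]}
    return [1 if skill in skills else 0 for skill in vocabulary]


vocabulary = build_vocabulary(profiles)
X = [encode(p, vocabulary) for p in profiles]  # one row per profile
y = [p["title"] for p in profiles]             # profile category labels
```

Education features (for example, PhD topic) could be appended to each row in the same one-column-per-value way.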
Model Testing 
For this project I trained the following models on skills and 
education features: 
Random Forest 
• Classify the profile 
Naïve Bayes 
• Multi-class probabilities to assess a profile's 
background components 
K-means 
• Suggest similar and relevant profiles
Model Testing 
Model          Training set                           Purpose
Random Forest  All 5 categories                       Classify the profile
Naïve Bayes    4 classic categories: SE, BA, MT, ST   Assess a profile's background components with multi-class probabilities
K-means        All 5 categories                       Identify similar profiles
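The two supervised models can be sketched with scikit-learn, which the deck lists among its technologies. This is a minimal sketch under stated assumptions, not the author's code: the six skill columns and ten training rows are invented toy data, `BernoulliNB` is one plausible Naïve Bayes variant for 0/1 features (the slides do not say which variant was used), and the hyperparameters are arbitrary. The Random Forest is fit on all five categories, while the Naïve Bayes model is fit only on the four classic categories so that its class probabilities describe a profile's background mix.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB

# Hypothetical skill columns: python, java, sql, excel, r, proofs
X = np.array([
    [1, 1, 0, 0, 0, 0],  # SE
    [0, 1, 1, 0, 0, 0],  # SE
    [0, 0, 1, 1, 0, 0],  # BA
    [0, 0, 0, 1, 0, 0],  # BA
    [0, 0, 0, 0, 1, 1],  # MT
    [0, 0, 0, 0, 0, 1],  # MT
    [0, 0, 1, 0, 1, 0],  # ST
    [0, 0, 0, 1, 1, 0],  # ST
    [1, 0, 1, 0, 1, 0],  # DS
    [1, 0, 0, 0, 1, 1],  # DS
])
y = np.array(["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST", "DS", "DS"])

# Random Forest: trained on all 5 categories to classify a profile.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Naive Bayes: trained on the 4 classic categories only.
classic = y != "DS"
nb = BernoulliNB().fit(X[classic], y[classic])

# A hypothetical candidate profile: knows python, sql and r.
candidate = np.array([[1, 0, 1, 0, 1, 0]])
label = str(rf.predict(candidate)[0])
background = {
    str(c): float(p) for c, p in zip(nb.classes_, nb.predict_proba(candidate)[0])
}
```

Here `label` is the predicted category and `background` maps each classic category to a probability; the probabilities sum to 1, so they can be read as the relative weight of each background in the candidate's profile.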
Data Product 
bit.ly/cybads
• Naïve Bayes: multi-class probabilities 
• Random Forest 
• K-means clustering
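One way K-means clustering could drive the "similar profiles" suggestions is to return the other profiles that land in the same cluster as the query profile. The sketch below assumes that approach; the profile names, the six binary feature columns, the choice of two clusters, and the helper `similar_profiles` are all hypothetical and are not taken from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical profiles over six binary skill features.
names = ["profile_a", "profile_b", "profile_c", "profile_d", "profile_e"]
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
])

# Two clusters chosen for the toy data; a real deployment would tune this.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)


def similar_profiles(index):
    """Return names of the other profiles assigned to the same cluster."""
    cluster = km.labels_[index]
    same = np.flatnonzero(km.labels_ == cluster)
    return [names[i] for i in same if i != index]


suggestions = similar_profiles(0)
```

For a profile that was not part of the fitted data, `km.predict` would give its cluster, and the same lookup would then apply.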
Next Steps 
Data Collection 
• Get more data: 
– Other websites 
– Indeed 
– User input on Web app 
Data Analysis / Feature Extraction 
• Fine-grained parsing of education 
• Experiment with additional features (industry, years of experience) 
Model Testing 
• Extend feature set and test more models 
• Fuzzy C-means 
Data Product 
• Add interactive data collection 
• Personalized links for skills 
• Explanation of similarity results 
Close the loop by analyzing job offers and suggesting 
matching profiles
Thank you! 
Technologies 
Web App: 
Flask, jQuery, Vega, MongoDB 
NMF, HC, RF, DT, NB, K-means models: 
scikit-learn 
Visualizations: 
Vincent, Vega, NetworkX, Gephi 
Acknowledgement 
yatish27: Ruby LinkedIn public profile web scraper 
ozgut: LinkedIn API Python wrapper

 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 

Could You Be a Data Scientist? Quantifying Data Scientist Profiles Using Machine Learning and the LinkedIn API.

  • 1. Could You Be a Data Scientist? Carlo Torniai, Ph.D. @carlotorniai
  • 2. Goal • Quantify the features of data scientist profiles • Analyze the profiles of aspiring data scientists • Provide useful feedback
  • 3. Why is this relevant? • A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
  • 4. Data Collection • LinkedIn API: general information, past work history, education • Web scraping: skills • 1500 profiles: Data Scientists, Software Engineers, Business Analysts, Mathematicians, Statisticians
  • 5. Data Analysis: Feature Extraction [figure; panels: Software Engineers, Business Analysts, Data Scientists, Statisticians, Mathematicians]
  • 6. Data Analysis: Feature Extraction [figure: Number of PhDs by topic and profile; topics: Astronomy, Bioinformatics, Biology, Computer Science, Economics, Electronics, Engineering, Math, Neuroscience, Other, Physics, Psychology, Stats]
  • 7. Model Testing For this project I trained the following models on skills and education features: Random Forest • Classify the profile Naïve Bayes • Multi-class probabilities to assess the background components of a profile K-means • Suggest similar and relevant profiles
  • 8. Model Testing For this project I trained the following models on skills and education features (model, training set, purpose): • Random Forest, all 5 categories: classify the profile • Naïve Bayes, the 4 classic categories (SE, BA, MT, ST): assess the background components of a profile with multi-class probabilities • K-means, all 5 categories: identify similar profiles
  • 10. Data Product Naïve Bayes Multi class probabilities Random Forest
  • 11. Data Product K-means clustering
  • 12. Next Steps (pipeline: Data Collection, Data Analysis, Feature Extraction, Model Testing, Data Product) • Get more data: other websites, Indeed, user input on the web app; fine-grained parsing of education; experiment with additional features (industry, years of experience) • Extend the feature set and test more models • Fuzzy C-means • Add interactive data collection • Personalized links for skills • Explanation of similarity results • Close the loop by analyzing job offers and suggesting matching profiles
  • 13. Thank you! Technologies • Web app: Flask, jQuery, Vega, MongoDB • Models (NMF, HC, RF, DT, NB, K-means): scikit-learn • Visualizations: Vincent, Vega, NetworkX, Gephi Acknowledgements • yatish27: Ruby LinkedIn public profile web scraper • ozgut: LinkedIn API Python wrapper
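
The three-model setup from slides 7 and 8 can be sketched with scikit-learn, the library named on the last slide. This is a minimal illustration, not the author's code: the toy profiles, the token format (e.g. `phd:physics`), the one-hot encoding, the choice of `BernoulliNB` as the Naïve Bayes variant, and all hyperparameters are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: (category, skill and education tokens).
# DS = data scientist, SE = software engineer, BA = business analyst,
# MT = mathematician, ST = statistician.
PROFILES = [
    ("DS", ["python", "machine learning", "statistics", "phd:physics"]),
    ("DS", ["r", "machine learning", "python", "phd:stats"]),
    ("SE", ["java", "python", "git", "bs:computer science"]),
    ("SE", ["c++", "java", "git", "ms:computer science"]),
    ("BA", ["excel", "sql", "requirements analysis", "mba"]),
    ("BA", ["excel", "sql", "business strategy", "mba"]),
    ("MT", ["matlab", "latex", "numerical analysis", "phd:math"]),
    ("MT", ["latex", "matlab", "algebra", "phd:math"]),
    ("ST", ["r", "sas", "statistics", "phd:stats"]),
    ("ST", ["sas", "r", "statistics", "ms:stats"]),
]

labels = [category for category, _ in PROFILES]

# Assumed encoding: one binary column per skill or education token.
binarizer = MultiLabelBinarizer()
X = binarizer.fit_transform([tokens for _, tokens in PROFILES])

# Random Forest trained on all 5 categories: classifies a profile.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Naive Bayes trained on the 4 classic categories only (no DS), so its class
# probabilities describe a profile's mix of classic backgrounds.
classic = [i for i, category in enumerate(labels) if category != "DS"]
bayes = BernoulliNB().fit(X[classic], [labels[i] for i in classic])

# K-means on all profiles: profiles in the same cluster count as similar.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Score a hypothetical candidate profile with all three models.
candidate = binarizer.transform([["python", "statistics", "sql", "phd:physics"]])
predicted = forest.predict(candidate)[0]
background = dict(zip(bayes.classes_, bayes.predict_proba(candidate)[0]))
cluster = kmeans.predict(candidate)[0]
similar = [i for i, c in enumerate(kmeans.labels_) if c == cluster]

print("predicted category:", predicted)
print("background mix:", {k: round(float(v), 3) for k, v in background.items()})
print("indices of similar profiles:", similar)
```

Leaving the data scientist class out of the Naïve Bayes training set mirrors the table on slide 8: the model can only distribute probability over SE, BA, MT and ST, which is what lets its output be read as a breakdown of background components.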