DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS
Skills demand analysis based on the data from
online HR websites: Using web scraping and text
mining applications: IT Sector
Habet Madoyan
Vahe Movsisyan
Sunday, July 03, 2016
The analysis is funded by the research grant from American University of Armenia.
Presented at:
IX International School-Seminar. Town of Tsakhkadzor, Republic of Armenia
Methodology:
Overview
Datamotus LLC 2
Introduction
In recent years, online job ads have become a popular job-search channel,
which is why the research community is increasingly experimenting with
detailed breakdowns of online job ads to study labor market dynamics.
It is estimated that in the USA 60-70 percent of job openings are now posted
on the Internet. However, these job ads are biased toward industries and
occupations that seek high-skilled, “white-collar” workers.
Introduction
Job seekers, employers, students, researchers, policymakers, higher education
institutions, career advisors, and curriculum developers now view online job ads
data as a practical source for exploring the dynamics of today’s labor market.
Online job ads can show the relative demand for different types of skills and levels
of education. The real-time nature of job ads data also allows for the early
detection of labor demand trends, which gives job seekers, employers, and
policymakers a forward-looking analytical tool.
Real-time labor market indicators can be particularly useful in aligning education
and training curricula with workforce needs in emerging or rapidly changing
industries, such as healthcare and information technology.
Job ads provide an incomplete picture of labor demand
Online job ads data strongly correlate with job openings data
Web Scraping
Text Mining
Synopsis of the study
• Develop an algorithm for web scraping job announcement
data (careercenter.am)
• Develop text mining and parsing algorithms to structure job
announcements
• Develop algorithms to assess and track vacancy rates by:
• Industry
• Job role
• Specific skills
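The scraping step in the first bullet could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the authors' scraper (the slides do not show it). The HTML snippet, the `job-link` class name and the `JobLinkParser` helper are hypothetical and are not taken from careercenter.am.

```python
from html.parser import HTMLParser


class JobLinkParser(HTMLParser):
    """Collect href values of <a class="job-link"> tags (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class name "job-link" is an assumption made for this sketch.
        if "job-link" in (attr_map.get("class") or "").split():
            href = attr_map.get("href")
            if href:
                self.links.append(href)


def extract_job_links(html_text):
    """Return the list of job-post links found in one listing page."""
    parser = JobLinkParser()
    parser.feed(html_text)
    return parser.links


# Hypothetical listing page; a real scraper would download this text first.
SAMPLE_PAGE = """
<ul>
  <li><a class="job-link" href="/jobs/101">Java Software Developer</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><a class="job-link" href="/jobs/102">System Administrator</a></li>
</ul>
"""

job_links = extract_job_links(SAMPLE_PAGE)
print(job_links)
```

Separating link extraction from downloading keeps the parsing logic testable on saved pages, which matters when roughly 20,000 posts have to be processed.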
What was done
• Around 20,000 posts were scraped from the web.
• Posts arrive in a rough, unstructured form. An algorithm was
developed to structure them.
A variable for each “section”
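One way to turn a raw post into "a variable for each section" is sketched below. It assumes, purely for illustration, that each section starts with an upper-case label followed by a colon at the beginning of a line. The label pattern, the sample post and the `split_into_sections` helper are hypothetical; the structuring algorithm used in the study is not shown in the slides.

```python
import re

# Assumption for this sketch: a section label is upper-case text ending in ":"
# at the start of a line, e.g. "JOB DESCRIPTION:".
SECTION_RE = re.compile(r"^([A-Z][A-Z /&]+):\s*", re.MULTILINE)


def split_into_sections(post_text):
    """Return a dict mapping each section label to its whitespace-normalized text."""
    sections = {}
    matches = list(SECTION_RE.finditer(post_text))
    for i, match in enumerate(matches):
        start = match.end()
        # A section runs until the next label, or to the end of the post.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(post_text)
        label = match.group(1).strip().lower().replace(" ", "_")
        sections[label] = " ".join(post_text[start:end].split())
    return sections


# Hypothetical post text, not taken from the scraped data.
SAMPLE_POST = """TITLE: Java Software Developer
LOCATION: Yerevan, Armenia
JOB DESCRIPTION: Develop and maintain
server-side applications.
REQUIRED QUALIFICATIONS: Knowledge of Java and JavaScript;
experience with SQL.
"""

record = split_into_sections(SAMPLE_POST)
for name, value in record.items():
    print(f"{name}: {value}")
```

Each key of `record` can then become one column of a structured data set, with one row per job post.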
Total vacancy rate (Careercenter) and Official Labor Demand (2004Q1-2016Q1)
[Figure: quarterly series, 2004Q1 to 2016Q1. Series: Total jobs (Careercenter); Job Demand (NSS, right scale).]
Correlation=0.76
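A correlation figure of this kind is presumably a Pearson coefficient between the two quarterly series. The slides do not say which coefficient was used, so that is an assumption. The sketch below shows the computation on made-up numbers. The two sample lists are hypothetical and are not the Careercenter or NSS data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant series has zero variance and makes this division fail.
    return cov / math.sqrt(var_x * var_y)


# Made-up quarterly counts for illustration only.
scraped_postings = [120, 150, 170, 160, 210, 240]
official_demand = [900, 1000, 1250, 1100, 1400, 1700]

r = pearson(scraped_postings, official_demand)
print(round(r, 2))
```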
Job Market Overview
IT sector
ICT sector and overall economy
[Figure: annual series, 2005 to 2015. Series: Average yearly wage in Transport and Communication sector / Average yearly wage in RA; Weight of Transport and Communication sector (including IT sector) in GDP (right scale, in %).]
Total vacancy and IT sector vacancy rates (Careercenter, 2004-2016)
[Figure: quarterly series, 2004Q1 to 2016Q1. Series: Non IT Jobs (Careercenter); IT Jobs (Careercenter, right scale).]
Correlation=0.81
Hard Skills in IT Sector
Time series: Annual demand for top 5 programming languages
[Figure: annual series, 2004 to 2015. Series: C++, Javascript, Java, C#, PHP.]
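Annual demand per language can be obtained by counting, for each year, the posts that mention the language. The sketch below is one hedged way to do that in Python. The regular expressions, the sample posts and the helper names are assumptions for illustration and do not reproduce the study's text-mining code. The lookarounds stop "Java" from matching inside "JavaScript" and allow "C++" and "C#" to match, which plain word-boundary matching handles poorly.

```python
import re
from collections import Counter, defaultdict

# Illustrative patterns: a match must not be glued to a letter, digit, "+" or "#".
LANGUAGE_PATTERNS = {
    "C++": re.compile(r"(?<![\w+#])c\+\+(?![\w+#])", re.IGNORECASE),
    "C#": re.compile(r"(?<![\w+#])c#(?![\w+#])", re.IGNORECASE),
    "Java": re.compile(r"(?<![\w+#])java(?![\w+#])", re.IGNORECASE),
    "Javascript": re.compile(r"(?<![\w+#])javascript(?![\w+#])", re.IGNORECASE),
    "PHP": re.compile(r"(?<![\w+#])php(?![\w+#])", re.IGNORECASE),
}


def languages_in(text):
    """Return the set of tracked languages mentioned in one post."""
    return {name for name, pattern in LANGUAGE_PATTERNS.items() if pattern.search(text)}


def annual_demand(posts):
    """posts: iterable of (year, text). Returns {year: Counter(language -> posts)}."""
    demand = defaultdict(Counter)
    for year, text in posts:
        for language in languages_in(text):
            demand[year][language] += 1
    return demand


# Hypothetical posts, not drawn from the scraped data.
SAMPLE_POSTS = [
    (2014, "Strong knowledge of Java and JavaScript; PHP is a plus."),
    (2014, "Experience with C++ and C# required."),
    (2015, "JavaScript, PHP, MySQL."),
    (2015, "Java developer with C/C++ background."),
]

demand = annual_demand(SAMPLE_POSTS)
for year in sorted(demand):
    print(year, dict(demand[year]))
```

Counting each post once per language, rather than every mention, keeps a single verbose ad from inflating the series.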
Time series: Annual demand for top 5 programming languages (parabolic trend)
[Figure: parabolic trend lines fitted to the annual series, 2004 to 2015. Series: Poly. (C++), Poly. (Javascript), Poly. (Java), Poly. (C#), Poly. (PHP).]
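A parabolic trend is a least-squares fit of y = a·x² + b·x + c to the annual counts. The sketch below fits such a curve in plain Python by solving the 3x3 normal equations. The annual counts are hypothetical, and shifting years so that 2004 becomes x = 0 is a choice made here to keep the sums small. It is not necessarily how the original trend lines were produced.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need at least 3 distinct x values")
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix; unknowns are ordered (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


years = list(range(2004, 2016))
# Hypothetical annual counts for one language, for illustration only.
counts = [12, 15, 21, 30, 28, 35, 50, 66, 80, 101, 118, 140]

xs = [year - years[0] for year in years]
a, b, c = fit_parabola(xs, counts)
trend = [a * x * x + b * x + c for x in xs]
print([round(value, 1) for value in trend])
```

A fitted parabola can dip below zero at the edges of the range even when all counts are positive, which would explain a negative tick on the vertical axis of such a chart.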
Analyzing demand for programming languages using association rules
Arules
• Association rule mining is used to analyse the co-occurrence
of programming languages in a job post
• The R packages “arules” and “arulesViz” are used for
the analysis
• Analysis is done for IT jobs only
Association rules: Measures of rules interestingness
Suppose we have the rule: IF {A} => {B}
Measure 1
$\text{Support} = P(A \cap B)$
Measure 2
$\text{Confidence} = P(B \mid A) = P(B \cap A) / P(A)$
Measure 3
$\text{Lift} = \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)} \cdot \dfrac{1}{P(B)}$
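The three measures can be checked by hand on a tiny example. The sketch below treats each job post as the set of languages it mentions and estimates the probabilities as relative frequencies. The five posts are hypothetical, and the functions are plain Python written for this illustration, not the arules implementation.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, lhs, rhs):
    """P(rhs | lhs) = support(lhs and rhs together) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence divided by the baseline frequency of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical posts: each one is the set of languages it mentions.
POSTS = [
    {"php", "javascript"},
    {"php", "javascript", "java"},
    {"java"},
    {"java", "javascript"},
    {"csharp", "asp"},
]

A, B = {"php"}, {"javascript"}
print(support(POSTS, A | B))    # 2 of the 5 posts contain both items
print(confidence(POSTS, A, B))  # every post with php also has javascript
print(lift(POSTS, A, B))        # confidence divided by P(B) = 3/5
```

A lift above 1 means the two items co-occur more often than they would if they were independent.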
Visualizing the rules
Association Mining for Programming languages: C++
• A set of association rules is generated for the top 20 programming languages.
• Rules are filtered with a minimum support of 0.01 and a minimum confidence of 0.1.
[Figure panels: rules with two items on the left-hand side; rules with one item on the left-hand side.]
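The thresholds above can be illustrated with a small brute-force search. The sketch enumerates every rule with one or two items on the left-hand side and a single item on the right, then keeps those that reach the minimum support and confidence. It is a deliberately naive stand-in for the Apriori algorithm that arules provides, and the sample posts and the `mine_rules` helper are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.01, min_confidence=0.1, max_lhs=2):
    """Brute-force rule search (not Apriori): LHS of 1..max_lhs items, one RHS item.

    Returns tuples (lhs, rhs, support, confidence, lift).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = frozenset(lhs)
            lhs_support = supp(lhs_set)
            if lhs_support == 0:
                continue  # the LHS never occurs, so no rule can be formed
            for rhs in items:
                if rhs in lhs_set:
                    continue
                rule_support = supp(lhs_set | {rhs})
                conf = rule_support / lhs_support
                if rule_support >= min_support and conf >= min_confidence:
                    rule_lift = conf / supp(frozenset([rhs]))
                    rules.append((lhs, rhs, rule_support, conf, rule_lift))
    return rules


# Hypothetical posts: each one is the set of languages it mentions.
POSTS = [
    {"cplusplus", "java"},
    {"cplusplus", "python"},
    {"java", "javascript"},
    {"php", "javascript"},
    {"cplusplus", "java", "python"},
]

rules = mine_rules(POSTS, min_support=0.01, min_confidence=0.1)
cpp_rules = [rule for rule in rules if rule[1] == "cplusplus"]
for lhs, rhs, s, c, l in sorted(cpp_rules, key=lambda rule: -rule[4]):
    print(set(lhs), "=>", {rhs}, round(s, 2), round(c, 2), round(l, 2))
```

Enumerating all item pairs is fine for 20 languages but grows quickly with more items, which is the case Apriori's pruning is designed for.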
Association Mining for Programming languages: Java
Rules visualization: Java (all rules)
Rules Visualization: Javascript
Job Title Analysis
IT Job Titles Frequency
Most popular Job Titles (2004Q1-2016Q1) Percentage
software developer/engineer 18.29%
quality assurance engineer 5.42%
java software developer 4.98%
system administrator 4.00%
web developer 3.66%
.net developer 2.94%
php developer 2.33%
graphic designer 1.89%
ios developer 1.31%
android developer 1.26%
deep submicron 0.98%
database developer 0.96%
support specialist 0.96%
database administrator 0.92%
technical support 0.89%
technical writer 0.83%
support engineer 0.80%
application developer 0.72%
design engineer 0.72%
r&d engineer 0.68%
team leader 0.67%
frontend developer 0.55%
monitoring evaluation 0.52%
information security 0.50%
senior r&d 0.50%
Total (titles listed above) 57.29%
Software developer/engineer
[Figure: annual series, 2004 to 2015.]
Quality assurance engineer
[Figure: annual series, 2004 to 2015. Series: quality.assurance.engineer]
Java software developer
[Figure: annual series, 2004 to 2015. Series: java.software.developer]
System administrator
[Figure: annual series, 2004 to 2015. Series: system.administrator]
Web developer
[Figure: annual series, 2004 to 2015. Series: web.developer]
IT Job Titles vs Programming languages
Job Title => Programming language                      Confidence
{software developer/engineer} => {csharp}              0.33
{software developer/engineer} => {java}                0.30
{software developer/engineer} => {javascript}          0.20
{software developer/engineer} => {asp}                 0.20
{software developer/engineer} => {php}                 0.12
{software developer/engineer} => {j}                   0.12
{software developer/engineer} => {tcl}                 0.09
{software developer/engineer} => {python}              0.07
{software developer/engineer} => {cplusplus}           0.06
{software developer/engineer} => {ruby}                0.03
{software developer/engineer} => {visual.basic}        0.02
{software developer/engineer} => {verilog}             0.02
{quality assurance engineer} => {java}                 0.27
{quality assurance engineer} => {shell}                0.25
{quality assurance engineer} => {perl}                 0.22
{quality assurance engineer} => {python}               0.14
{quality assurance engineer} => {tcl}                  0.12
{quality assurance engineer} => {bash}                 0.04
{quality assurance engineer} => {verilog}              0.04
{java software developer} => {java}                    0.98
{java software developer} => {javascript}              0.47
{java software developer} => {j}                       0.39
{java software developer} => {shell}                   0.11
{java software developer} => {ruby}                    0.05
{system administrator} => {perl}                       0.09
{system administrator} => {shell}                      0.09
{system administrator} => {bash}                       0.03
{system administrator} => {pl.sql}                     0.02
{web developer} => {javascript}                        0.76
{web developer} => {php}                               0.57
{web developer} => {asp}                               0.36
{web developer} => {csharp}                            0.27
{web developer} => {ruby}                              0.02
{.net developer} => {asp}                              0.82
{.net developer} => {csharp}                           0.80
{.net developer} => {javascript}                       0.42
{.net developer} => {visual.basic}                     0.03
{php developer} => {php}                               1.00
{php developer} => {javascript}                        0.71
{php developer} => {ruby}                              0.08
{php developer} => {python}                            0.07
Next Steps:
• Develop a machine learning algorithm to classify job ads by sector,
• Develop state-of-the-art text mining and topic modeling algorithms to
predict demand for skills, professions and job roles,
• Create an interactive web dashboard (using R Shiny) to help:
• Potential job seekers
• Potential employers
• Policy makers
• Universities
Thank You For Your Attention!
Datamotus LLC 35

Más contenido relacionado

La actualidad más candente

Software design of library circulation system
Software design of  library circulation systemSoftware design of  library circulation system
Software design of library circulation systemMd. Shafiuzzaman Hira
 
Comunicarea cromatica roslir 2006
Comunicarea cromatica   roslir 2006Comunicarea cromatica   roslir 2006
Comunicarea cromatica roslir 2006Dia Cora
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptChien Chung Shen
 
MySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docxMySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docxNeoClova
 
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱PgDay.Seoul
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)YoungHeon (Roy) Kim
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
Dba PostgreSQL desde básico a avanzado parte1
Dba PostgreSQL desde básico a avanzado parte1Dba PostgreSQL desde básico a avanzado parte1
Dba PostgreSQL desde básico a avanzado parte1EQ SOFT EIRL
 
Parallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQLParallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQLMydbops
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowKevin Kline
 

La actualidad más candente (12)

Software design of library circulation system
Software design of  library circulation systemSoftware design of  library circulation system
Software design of library circulation system
 
Comunicarea cromatica roslir 2006
Comunicarea cromatica   roslir 2006Comunicarea cromatica   roslir 2006
Comunicarea cromatica roslir 2006
 
Oracle Database SQL Tuning Concept
Oracle Database SQL Tuning ConceptOracle Database SQL Tuning Concept
Oracle Database SQL Tuning Concept
 
MySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docxMySQL_SQL_Tunning_v0.1.3.docx
MySQL_SQL_Tunning_v0.1.3.docx
 
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Dba PostgreSQL desde básico a avanzado parte1
Dba PostgreSQL desde básico a avanzado parte1Dba PostgreSQL desde básico a avanzado parte1
Dba PostgreSQL desde básico a avanzado parte1
 
Tema 2.- E-BUSINESS GLOBAL Y COLABORACIÓN.pdf
Tema 2.- E-BUSINESS GLOBAL Y COLABORACIÓN.pdfTema 2.- E-BUSINESS GLOBAL Y COLABORACIÓN.pdf
Tema 2.- E-BUSINESS GLOBAL Y COLABORACIÓN.pdf
 
Parallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQLParallel Query in AWS Aurora MySQL
Parallel Query in AWS Aurora MySQL
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should know
 
Redo log
Redo logRedo log
Redo log
 

Similar a IT Skills Analysis

가격표 Matlab korea academic january 2013_20130215
가격표 Matlab korea academic january 2013_20130215가격표 Matlab korea academic january 2013_20130215
가격표 Matlab korea academic january 2013_20130215dasandata
 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseLeMeniz Infotech
 
IRJET- Placement Portal
IRJET- Placement PortalIRJET- Placement Portal
IRJET- Placement PortalIRJET Journal
 
SkiPHP -- Database Basics for PHP
SkiPHP -- Database Basics for PHP SkiPHP -- Database Basics for PHP
SkiPHP -- Database Basics for PHP Dave Stokes
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkJim Kaplan CIA CFE
 
Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016Dave Stokes
 
Web crawler with seo analysis
Web crawler with seo analysis Web crawler with seo analysis
Web crawler with seo analysis Vikram Parmar
 
SQL vs SOQL for Salesforce Analytics
SQL vs SOQL for Salesforce AnalyticsSQL vs SOQL for Salesforce Analytics
SQL vs SOQL for Salesforce AnalyticsSumit Sarkar
 
10 Key Considerations for AI/ML Model Governance
10 Key Considerations for AI/ML Model Governance10 Key Considerations for AI/ML Model Governance
10 Key Considerations for AI/ML Model GovernanceQuantUniversity
 
Rietta Business Intelligence for the MicroISV
Rietta Business Intelligence for the MicroISVRietta Business Intelligence for the MicroISV
Rietta Business Intelligence for the MicroISVFrank Rietta
 
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdf
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdfSLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdf
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdfSamuelNahum1
 
4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatestashutosh kumar
 
香港六合彩
香港六合彩香港六合彩
香港六合彩weige
 
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...OpenSource Connections
 
Online examination documentation
Online examination documentationOnline examination documentation
Online examination documentationWakimul Alam
 
Draft oct 22 executive summary burning glass targeted industries
Draft oct 22 executive summary burning glass targeted industriesDraft oct 22 executive summary burning glass targeted industries
Draft oct 22 executive summary burning glass targeted industriesARCResearch
 

Similar a IT Skills Analysis (20)

가격표 Matlab korea academic january 2013_20130215
가격표 Matlab korea academic january 2013_20130215가격표 Matlab korea academic january 2013_20130215
가격표 Matlab korea academic january 2013_20130215
 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing database
 
Java Programming Materials
Java Programming MaterialsJava Programming Materials
Java Programming Materials
 
IRJET- Placement Portal
IRJET- Placement PortalIRJET- Placement Portal
IRJET- Placement Portal
 
SkiPHP -- Database Basics for PHP
SkiPHP -- Database Basics for PHP SkiPHP -- Database Basics for PHP
SkiPHP -- Database Basics for PHP
 
LokeshMahawarResume
LokeshMahawarResumeLokeshMahawarResume
LokeshMahawarResume
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t Work
 
Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016Why Your Database Queries Stink -SeaGl.org November 11th, 2016
Why Your Database Queries Stink -SeaGl.org November 11th, 2016
 
Web crawler with seo analysis
Web crawler with seo analysis Web crawler with seo analysis
Web crawler with seo analysis
 
SQL vs SOQL for Salesforce Analytics
SQL vs SOQL for Salesforce AnalyticsSQL vs SOQL for Salesforce Analytics
SQL vs SOQL for Salesforce Analytics
 
10 Key Considerations for AI/ML Model Governance
10 Key Considerations for AI/ML Model Governance10 Key Considerations for AI/ML Model Governance
10 Key Considerations for AI/ML Model Governance
 
Rietta Business Intelligence for the MicroISV
Rietta Business Intelligence for the MicroISVRietta Business Intelligence for the MicroISV
Rietta Business Intelligence for the MicroISV
 
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdf
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdfSLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdf
SLIDES_Electrification__AI_and_the_Future_of_Engineering_Education.pdf.pdf
 
50120130406017
5012013040601750120130406017
50120130406017
 
4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest4+UpdatedAshuResumeLatest
4+UpdatedAshuResumeLatest
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...
Haystack 2019 - Towards a Learning To Rank Ecosystem @ Snag - We've got LTR t...
 
Online examination documentation
Online examination documentationOnline examination documentation
Online examination documentation
 
ZaheerFinal20Aug
ZaheerFinal20AugZaheerFinal20Aug
ZaheerFinal20Aug
 
Draft oct 22 executive summary burning glass targeted industries
Draft oct 22 executive summary burning glass targeted industriesDraft oct 22 executive summary burning glass targeted industries
Draft oct 22 executive summary burning glass targeted industries
 

Último

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 

Último (20)

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

IT Skills Analysis

  • 1. DATA MINING AND STATISTICAL ANALYSIS SOLUTIONS Skills demand analysis based on the data from online HR websites: Using web scraping and text mining applications: IT Sector Habet Madoyan Vahe Movsisyan Sunday, July 03, 2016 The analysis is funded by the research grant from American University of Armenia. Presented at: IX International School-Seminar. Town of Tsakhkadzor, Republic of Armenia
  • 3. Introduction In recent years online job ads became a popular job-search model, that’s why the research community is increasingly experimenting with the detailed breakdown of online job ads to study labor market dynamics. It is estimated that in USA 60-70 percent of job openings are now posted on the Internet. However these job ads are biased toward industries and occupations that seek high-skilled, “white-collar” workers.
  • 4. Introduction Job seekers, employers, students, researchers, policymakers, higher education institutions, career advisors, and curriculum developers now view online job ads data as a practical source to explore the nature of today’s dynamic of labor market. Online job ads can show the relative demand for different types of skills and levels of education. The real-time nature of job ads data also allows for the early detection of labor demand trends, which gives job seekers, employers, and policymakers a forward-looking analytical tool. Real-time labor market indicators can be particularly useful in aligning education and training curricula with workforce needs in emerging or rapidly changing industries, such as healthcare and information technology, etc.
  • 5. Job ads provide an incomplete picture of labor demand Online job ads data strongly correlate with job openings data
  • 6.
  • 8. Synopsys of the study • Develop an algorithm for web scrapping job announcement data (careercenter.am) • Text mining and parsing algorithms to structure job announcements • Algorithms to assess and track vacancy rates by: • Industry • Job role • Specific skills
  • 9. What was done • Around 20,000 posts are scrapped from the web, • Posts come in rough, unstructured way. Algorithm is developed to structure them.
  • 10. A variable for each “section”
  • 11. Total vacancy rate (Careercenter) and Official Labor Demand (2004-2016 I Quarter) Datamotus LLC 11 500 1000 1500 2000 2500 3000 100 150 200 250 300 350 400 450 500 550 600 2004Q1 2004Q2 2004Q3 2004Q4 2005Q1 2005Q2 2005Q3 2005Q4 2006Q1 2006Q2 2006Q3 2006Q4 2007Q1 2007Q2 2007Q3 2007Q4 2008Q1 2008Q2 2008Q3 2008Q4 2009Q1 2009Q2 2009Q3 2009Q4 2010Q1 2010Q2 2010Q3 2010Q4 2011Q1 2011Q2 2011Q3 2011Q4 2012Q1 2012Q2 2012Q3 2012Q4 2013Q1 2013Q2 2013Q3 2013Q4 2014Q1 2014Q2 2014Q3 2014Q4 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 Total jobs (Careercenter) Job Demand (NSS, right scale) Correlation=0.76
  • 12. Job Market Overview IT sector Datamotus LLC 12
  • 13. ICT sector and overall economy Datamotus LLC 13 3.00 3.20 3.40 3.60 3.80 4.00 4.20 4.40 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Average yearly wage in Transport and Communication sector/Average yearly wage in RA Weight of Transport and Communication sector (including IT sector) in GDP (right scale, in %)
  • 14. Total vacancy and IT sector vacancy rates (Careercenter, 2004-2016) Datamotus LLC 14 0 20 40 60 80 100 120 140 160 180 200 100 150 200 250 300 350 400 450 2004Q1 2004Q2 2004Q3 2004Q4 2005Q1 2005Q2 2005Q3 2005Q4 2006Q1 2006Q2 2006Q3 2006Q4 2007Q1 2007Q2 2007Q3 2007Q4 2008Q1 2008Q2 2008Q3 2008Q4 2009Q1 2009Q2 2009Q3 2009Q4 2010Q1 2010Q2 2010Q3 2010Q4 2011Q1 2011Q2 2011Q3 2011Q4 2012Q1 2012Q2 2012Q3 2012Q4 2013Q1 2013Q2 2013Q3 2013Q4 2014Q1 2014Q2 2014Q3 2014Q4 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 Non IT Jobs (Careercenter) IT Jobs (Careercenter, right scale) Correlation=0.81
  • 15. Hard Skills in IT Sector Datamotus LLC 15
  • 16. Time series: Annual demand for top 5 programming languages Datamotus LLC 16 0 50 100 150 200 250 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 C++ Javascript Java C# PHP
  • 17. Time series: Annual demand for top 5 programming languages (parabolic trend) Datamotus LLC 17 -30 20 70 120 170 220 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Poly. (C++) Poly. (Javascript) Poly. (Java) Poly. (C#) Poly. (PHP)
  • 18. Analyzing demand for programming languages using association rules
  • 19. Arules • Association rule mining is used to analyse the co-occurrence of programming languages in a job post • The R packages “arules” and “arulesViz” are used for the analysis • The analysis is done for IT jobs only
  • 20. Association rules: Measures of rule interestingness. Suppose we have the rule IF {A} => {B} • Measure 1: Support $= P(A \cap B)$ • Measure 2: Confidence $= P(B \mid A) = P(B \cap A) / P(A)$ • Measure 3: Lift $= \dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)} \cdot \dfrac{1}{P(B)}$
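The three measures can be made concrete with a small sketch in plain Python. It is not the arules implementation used in the study. It treats each job post as a set of languages and estimates each probability as a share of posts. The function names and the sample posts are hypothetical.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimate of P(rhs | lhs): support of lhs and rhs together over support of lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence of lhs => rhs relative to the baseline share of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical job posts, each reduced to the languages it mentions.
posts = [
    {"java", "javascript"},
    {"java", "sql"},
    {"php", "javascript"},
    {"java", "javascript", "sql"},
]

print(support(posts, {"java", "javascript"}))
print(confidence(posts, {"java"}, {"javascript"}))
print(lift(posts, {"java"}, {"javascript"}))
```

A lift above 1 means the two sides appear together more often than independence would predict, and a lift below 1 means less often. The functions divide by the support of the left-hand or right-hand side, so they assume both occur at least once in the data.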
  • 22. Association Mining for Programming languages: C++ • A set of association rules is generated for the top 20 programming languages. • Rules are filtered with a minimum support of 0.01 and a minimum confidence of 0.1. [Rule tables: two items on the left-hand side; one item on the left-hand side]
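The filtering step described on this slide can be sketched as a brute-force enumeration of rules with one or two items on the left-hand side, keeping those that meet the support and confidence thresholds. This is a simplified stand-in for what arules does far more efficiently with the Apriori algorithm. The function name `mine_rules`, its parameters, and the sample posts are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.01, min_confidence=0.1, max_lhs=2):
    """Return rules as (lhs, rhs, support, confidence) tuples.

    lhs is a tuple of 1..max_lhs items and rhs is a single item.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        # Share of transactions containing the whole itemset.
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for k in range(1, max_lhs + 1):
        for lhs in combinations(items, k):
            lhs_set = frozenset(lhs)
            lhs_supp = supp(lhs_set)
            if lhs_supp == 0:
                continue  # left-hand side never occurs, no rule possible
            for rhs in items:
                if rhs in lhs_set:
                    continue
                rule_supp = supp(lhs_set | {rhs})
                conf = rule_supp / lhs_supp
                if rule_supp >= min_support and conf >= min_confidence:
                    rules.append((lhs, rhs, rule_supp, conf))
    return rules


# Hypothetical job posts; thresholds are raised because the sample is tiny.
posts = [
    {"java", "javascript"},
    {"java", "sql"},
    {"php", "javascript"},
    {"java", "javascript", "sql"},
]
for lhs, rhs, s, c in mine_rules(posts, min_support=0.5, min_confidence=0.6):
    print(set(lhs), "=>", {rhs}, round(s, 2), round(c, 2))
```

The defaults mirror the thresholds quoted on the slide (0.01 and 0.1). Exhaustive enumeration is workable for about 20 languages and short left-hand sides but does not scale to large item sets.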
  • 23. Association Mining for Programming languages: Java
  • 24. Rules visualization: Java (all rules)
  • 27. IT Job Titles Frequency
      Most popular Job Titles (2004Q1-2016Q1)    Percentage
      software developer/engineer                18.29%
      quality assurance engineer                  5.42%
      java software developer                     4.98%
      system administrator                        4.00%
      web developer                               3.66%
      .net developer                              2.94%
      php developer                               2.33%
      graphic designer                            1.89%
      ios developer                               1.31%
      android developer                           1.26%
      deep submicron                              0.98%
      database developer                          0.96%
      support specialist                          0.96%
      database administrator                      0.92%
      technical support                           0.89%
      technical writer                            0.83%
      support engineer                            0.80%
      application developer                       0.72%
      design engineer                             0.72%
      r&d engineer                                0.68%
      team leader                                 0.67%
      frontend developer                          0.55%
      monitoring evaluation                       0.52%
      information security                        0.50%
      senior r&d                                  0.50%
      Total                                      57.29%
  • 28. Software developer/engineer [Chart: annual series, 2004-2015]
  • 29. Quality assurance engineer [Chart: annual series, 2004-2015]
  • 30. Java software developer [Chart: annual series, 2004-2015]
  • 31. System administrator [Chart: annual series, 2004-2015]
  • 32. Web developer [Chart: annual series, 2004-2015]
  • 33. IT Job Titles vs Programming languages
      Job Title => Programming language                   Confidence
      {software developer/engineer} => {csharp}           0.33
      {software developer/engineer} => {java}             0.30
      {software developer/engineer} => {javascript}       0.20
      {software developer/engineer} => {asp}              0.20
      {software developer/engineer} => {php}              0.12
      {software developer/engineer} => {j}                0.12
      {software developer/engineer} => {tcl}              0.09
      {software developer/engineer} => {python}           0.07
      {software developer/engineer} => {cplusplus}        0.06
      {software developer/engineer} => {ruby}             0.03
      {software developer/engineer} => {visual.basic}     0.02
      {software developer/engineer} => {verilog}          0.02
      {quality assurance engineer} => {java}              0.27
      {quality assurance engineer} => {shell}             0.25
      {quality assurance engineer} => {perl}              0.22
      {quality assurance engineer} => {python}            0.14
      {quality assurance engineer} => {tcl}               0.12
      {quality assurance engineer} => {bash}              0.04
      {quality assurance engineer} => {verilog}           0.04
      {java software developer} => {java}                 0.98
      {java software developer} => {javascript}           0.47
      {java software developer} => {j}                    0.39
      {java software developer} => {shell}                0.11
      {java software developer} => {ruby}                 0.05
      {system administrator} => {perl}                    0.09
      {system administrator} => {shell}                   0.09
      {system administrator} => {bash}                    0.03
      {system administrator} => {pl.sql}                  0.02
      {web developer} => {javascript}                     0.76
      {web developer} => {php}                            0.57
      {web developer} => {asp}                            0.36
      {web developer} => {csharp}                         0.27
      {web developer} => {ruby}                           0.02
      {.net developer} => {asp}                           0.82
      {.net developer} => {csharp}                        0.80
      {.net developer} => {javascript}                    0.42
      {.net developer} => {visual.basic}                  0.03
      {php developer} => {php}                            1.00
      {php developer} => {javascript}                     0.71
      {php developer} => {ruby}                           0.08
      {php developer} => {python}                         0.07
  • 34. Next Steps: • Develop a machine learning algorithm to classify job ads by sector • Develop state-of-the-art text mining and topic modeling algorithms to predict demand for skills, professions and job roles • Create an interactive web dashboard (using R Shiny) to help: • Potential job seekers • Potential employees • Policy makers • Universities
  • 35. Thank You For Your Attention!