Data and Analytics Career Paths, Presented at IEEE LYC'19.
About Speaker:
Ahmed Amr is a Data/Analytics Engineer at Rubikal, where he leads, develops, and creates daily data/analytics operations, including data ingestion, data streaming, data warehousing, and analytical dashboards. Ahmed graduated from the Computer Engineering Department, Alexandria University, and is currently pursuing his MSc degree in Computer Science at AAST. Professionally, Ahmed has worked with Egyptian/US startups such as Badr, Incorta, and WhoKnows to develop their data/analytics projects. Academically, Ahmed worked as a Teaching Assistant in the CS department at AAST. Ahmed helps software companies develop robust data engineering infrastructure and powerful analytical insights.
References:
1) https://www.datacamp.com/community/tutorials/data-science-industry-infographic
2) Analytics: The real-world use of big data, IBM, Executive Report
2. Road Map
● Defining Data Science.
● Data Science Marketplace.
● Required Skills for Data Science.
● Data Science Career Paths.
● A Day in the Life of a Data Scientist.
3. Data Science, hype/reality?
“Data Scientist: The Sexiest Job of the 21st Century” – Thomas H.
Davenport and D.J. Patil
“Analytics is defined as the scientific process of transforming
data into insight for making better decisions.” – The Institute for
Operations Research and the Management Sciences (INFORMS)
“With more and more companies using big data, the demand for
data analytic specialists,—sometimes called data scientists, who
know how to manage the tsunami of information, spot patterns
within it and draw conclusions and insights—is nearing a frenzy.”
– Chris Morris, CNBC
4. Data Scientist
● “A person who is better at statistics than any software engineer
and better at software engineering than any statistician.”
- Josh Wills, Director of Data Science at Cloudera
● “Data scientists are inquisitive: exploring, asking questions, doing
“what if” analysis, questioning existing assumptions and
processes. Armed with data and analytical results, a top-tier data
scientist will then communicate informed conclusions and
recommendations across an organization’s leadership structure.”
- Anjul Bhambhri, IBM
6. Role of Computer Science
Empowering Statistics
Solving a wide range of practical problems by providing
number crunching and massive storage.
7. Inventions
Accelerating the pace of the marriage between
Statistics and Computer Science.
1960s
Database Management Systems (DBMS)
1970s
Relational DBMS
8. Knowledge Discovery and Data Mining
Late 1980s
Terms like Knowledge Discovery and Data Mining
started to be used widely.
10. Data Science
Late 1990s
The phrase data science first appeared to inspire
professionals to harness the power of data by
effectively analyzing it and producing useful
intelligence.
The title statistician starts to be replaced by data scientist.
11. Analytics
Mid 2000s
The word analytics was adopted by data scientists
to emphasize the fact that an increasing number
of companies started to heavily rely on the
statistical and quantitative analysis of data
as well as predictive modeling to make informed
decisions so that they can compete better with
other businesses.
13. 1-Data Infrastructure Technologies
● Support how data is:
1) Shared.
2) Processed.
3) Consumed.
● Distributed Computing and Cloud Computing.
○ Virtualization and distributed file sharing.
14. Distributed Computing
● An approach to break down a task into smaller
pieces that are easier to process.
● Each element in the task is assigned to a
processor which could be geographically
dispersed.
● Software is necessary to manage all aspects of
distributed computing.
○ e.g. Hadoop
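The idea of breaking a task into pieces and handing each piece to a processor can be sketched on a single machine. This is only an illustration, assuming a word-count task and made-up input: local threads stand in for geographically dispersed processors, and none of the function names below come from Hadoop or any other framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Break the task into roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Work done independently by one worker on its own piece."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_workers=3):
    """Assign each piece to a worker, then merge the partial results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Made-up input for illustration.
documents = [
    "data science needs data",
    "big data needs storage",
    "science needs statistics",
]
word_totals = distributed_word_count(documents)
```

A real framework adds what this sketch leaves out: scheduling across machines, moving data to the workers, and recovering from failed workers.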
15. Cloud Computing
● Platform to support distributed computing.
● A bunch of computers housed in data centers.
● Can serve as readily available hardware for distributed
computing.
16. 2-Data Management Technologies
● Data Management is handled by DBMS.
● Data Science requires highly scalable, reliable,
efficient ways to store, manage and process data.
● Structured and unstructured data.
17. 3-Visualization Technologies
● Acquired insights need to be conveyed to
the leadership of an organization.
● Effective communication with non-experts.
● Responsible for increasing the impact of the data
science project results.
19. Fraud Detection
● Criminals commit fraud against the banking
sector.
● In the past:
○ Significant human intervention.
○ Improving accuracy was the desired outcome.
● Today:
○ Machine learning and big data analytics.
20. Social Media Analytics
● Huge amount of data, millions of posts.
● Metadata is valuable.
○ Data about data, such as location information
and timestamps.
● IBM Personality Insights product.
○ Gives companies a deeper understanding of their
customers' personalities.
24. Data Mining and Analytics Skill
1. Classification:
● Constructs a model from data with known labels.
● Data is assigned to discrete sets (classes).
● Can categorize trustworthy and not
trustworthy users for an online banking
system.
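As a rough illustration of classification, the sketch below learns from labeled examples and assigns a new user to one of two discrete classes. It uses a nearest-centroid rule, chosen here only because it is short; the slides do not name a specific classifier. The two features and all values are hypothetical.

```python
import math


def train_centroids(samples, labels):
    """Learn one centroid (mean point) per known label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical features per user:
# (failed logins per week, share of logins from new devices)
training_samples = [(0, 0.05), (1, 0.10), (0, 0.00), (9, 0.80), (7, 0.90), (8, 0.70)]
training_labels = ["trustworthy", "trustworthy", "trustworthy",
                   "not trustworthy", "not trustworthy", "not trustworthy"]

centroids = train_centroids(training_samples, training_labels)
prediction_low_risk = classify(centroids, (1, 0.05))
prediction_high_risk = classify(centroids, (8, 0.85))
```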
25. Data Mining and Analytics Skill
2. Prediction:
● Builds a model that predicts continuous or
ordered values.
● These models can predict, for example, the mean
time to failure for computers.
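Predicting a continuous value can be sketched with a straight-line fit. The example assumes ordinary least squares with a single input, which is one of many possible prediction models, and the temperature and failure figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up data: average operating temperature (°C) vs. observed hours to failure.
temperatures = [40, 50, 60, 70, 80]
hours_to_failure = [9000, 8000, 7000, 6000, 5000]

slope, intercept = fit_line(temperatures, hours_to_failure)
estimated_hours = predict(slope, intercept, 65)
```

Unlike classification, the output here is a number on a continuous scale rather than a class label.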
26. Data Mining and Analytics Skill
3. Clustering:
● Is a process of grouping similar data objects
into a class.
● Helps reveal features that distinguish one class
of data objects from the other, leading to new
discoveries on a dataset.
● As an example, clustering can reveal people
with similar purchasing behaviours.
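Grouping similar objects without any labels can be sketched with k-means, used here as one common clustering method rather than the one the talk necessarily had in mind. The customer figures and the choice of starting centroids are assumptions made for the example.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Made-up customers: (purchases per month, average basket value)
customers = [(2, 15), (3, 20), (2, 18), (12, 90), (11, 85), (13, 95)]

# Starting from the first and last customer is an arbitrary choice for this sketch.
assignments, centroids = k_means(customers, [customers[0], customers[-1]])
```

Customers that end up with the same assignment index form one group with similar purchasing behaviour.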
27. Machine Learning Skill
● Machine learning is based on self-learning or
self-improving algorithms.
● In machine learning, a computer starts with a
model, and continues to enhance it through
trial and error.
● It can then provide meaningful insight in the
form of classification, prediction, and
clustering.
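The "start with a model and improve it through trial and error" loop can be illustrated with a perceptron, picked here as a simple stand-in for self-improving algorithms in general. The model starts with all-zero weights, makes a guess for each example, and adjusts itself only when the guess is wrong. The toy dataset (logical AND) is an assumption for the sketch.

```python
def predict(weights, bias, features):
    """Guess 1 if the weighted sum is positive, otherwise 0."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Start with an all-zero model and correct it after every wrong guess."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)  # 0 when the guess was right
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * f
                           for w, f in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with nothing left to correct
            break
    return weights, bias


# Toy dataset: the output is 1 only when both inputs are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
learned_outputs = [predict(weights, bias, s) for s in samples]
```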
28. Machine Learning Skill
● A data scientist needs to be familiar with
models that are commonly used in data science, such as:
○ Logistic regression.
○ Support vector machines.
○ Bayesian methods.
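To give a feel for the first model in the list, here is a minimal logistic regression trained by batch gradient descent. It is a teaching sketch, not how a library such as scikit-learn implements it, and the single standardized feature and its values are made up.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Estimated probability that the sample belongs to class 1."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_logistic_regression(samples, labels, epochs=500, learning_rate=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(samples, labels):
            error = predict_probability(weights, bias, features) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up data: one standardized risk score per transaction, 1 = flagged.
scores = [(-2.0,), (-1.5,), (-1.0,), (-0.5,), (0.5,), (1.0,), (1.5,), (2.0,)]
flagged = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(scores, flagged)
probability_high = predict_probability(weights, bias, (1.8,))
probability_low = predict_probability(weights, bias, (-1.8,))
```

The model outputs a probability; turning it into a class label means choosing a threshold, commonly 0.5.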
29. Statistics Skill
● Lays a foundation for data science.
● The more you know about it, the better.
● At minimum, you need to know:
○ Probability.
○ Correlations.
○ Variables, distributions, and regression.
○ Null hypothesis significance tests.
○ Confidence intervals, ANOVA, t-tests, and chi-square tests.
○ Tools like:
■ R, Excel.
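Two items from the list above, correlation and the t-test, can be worked through by hand in a few lines. The sketch computes a Pearson correlation coefficient and a one-sample t statistic; the numbers are made up, and converting the t statistic into a p-value (which needs the t distribution and is what tools like R do for you) is left out.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson's r: covariance scaled by both standard deviations."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


def one_sample_t_statistic(sample, hypothesized_mean):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - hypothesized_mean) / standard_error


# Made-up data: advertising spend vs. sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_correlation(ad_spend, sales)

# Made-up measurements; null hypothesis: the true mean is 5.0.
measurements = [5.1, 4.9, 5.3, 5.2, 5.0]
t_statistic = one_sample_t_statistic(measurements, 5.0)
```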
30. Visualization Skill
● Important skill to overcome the challenge of
effectively communicating the results of data
analytics to an audience.
● Tableau offers one of the most popular and
comprehensive visualization tools for data
scientists. It supports a variety of visualization
elements such as different types of charts,
graphs, and maps.
31. Programming Skill
● Ability to code in at least one programming
language such as Python, Java,
or Scala.
● Many languages have powerful libraries to
clean and process your data (e.g. pandas).
● Along with powerful libraries to build machine
learning models (e.g. scikit-learn).
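To show what "cleaning and processing data" means in practice, here is a small sketch that uses only the Python standard library so it runs without extra installs. A library such as pandas packages these steps into shorter calls (for example reading with `read_csv` and dropping incomplete rows with `dropna`). The raw data and column names are made up.

```python
import csv
import io

# Made-up raw export with a missing value and inconsistent formatting.
RAW_CSV = """customer,city,amount
Alice, Alexandria ,120.5
Bob,Cairo,
carol,cairo,80
"""


def clean_rows(raw_text):
    """Drop rows with a missing amount and normalize text and numeric fields."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "city": row["city"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


rows = clean_rows(RAW_CSV)

# Simple processing step: total amount per city.
total_by_city = {}
for row in rows:
    total_by_city[row["city"]] = total_by_city.get(row["city"], 0.0) + row["amount"]
```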