This video gives beginners an overview of data science.
It also explains the data science process, data science job roles, and the stages of a data science project.
3. DISCOVERY
It involves acquiring data from all the identified internal and external sources that help you answer the business question.
The data can be:
1. Logs from webservers
2. Data gathered from social media
3. Census datasets
4. Data streamed from online sources using APIs
4. DATA PREPARATION
Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned.
You need to process, explore, and condition the data before modeling.
The cleaner your data, the better your predictions.
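As a rough illustration of the cleaning described above, the sketch below uses only the Python standard library. The sample records, the column names (`age`, `signup`, `notes`), the two date formats, and the helpers `parse_date` and `clean_rows` are all made up for this example; a real project would more likely rely on a data-frame library.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records showing the problems named on the slide:
# a missing value, a blank column, and an inconsistent date format.
raw_rows = [
    {"age": "34", "signup": "2021-03-01", "notes": ""},
    {"age": "",   "signup": "01/04/2021", "notes": ""},
    {"age": "29", "signup": "2021-05-15", "notes": ""},
]

def parse_date(text):
    """Try the assumed formats in turn and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable dates as missing

def clean_rows(rows):
    # Keep only columns that have a value in at least one row.
    columns = [c for c in rows[0] if any(r[c].strip() for r in rows)]
    # Fill missing ages with the median of the known ages.
    known_ages = [int(r["age"]) for r in rows if r["age"].strip()]
    fill_age = median(known_ages)
    cleaned = []
    for r in rows:
        row = {c: r[c] for c in columns}
        row["age"] = int(r["age"]) if r["age"].strip() else fill_age
        row["signup"] = parse_date(r["signup"])
        cleaned.append(row)
    return cleaned

cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

Filling with the median is only one possible choice; dropping incomplete rows or flagging them for review may suit other datasets better.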
5. MODEL PLANNING
In this stage, you determine the methods and techniques for drawing the relationships between input variables.
Model planning is performed using statistical formulas and visualization tools such as SQL Analysis Services, R, and SAS/ACCESS.
6. MODEL BUILDING
The data scientist splits the dataset into training and testing sets.
Techniques like association, classification, and clustering are applied to the training dataset.
Once prepared, the model is tested against the testing dataset.
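The training/testing split mentioned above can be sketched in a few lines. The function name `train_test_split`, the 25% test fraction, and the toy dataset are assumptions chosen for this illustration, not part of the original slides.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 20 (feature, label) pairs.
dataset = [(x, x % 2) for x in range(20)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # prints: 15 5
```

Shuffling before splitting avoids a test set made up only of the last rows collected, and keeping the two sets disjoint is what makes the later test a fair check.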
7. OPERATIONALIZE
You deliver the final baselined model with reports, code, and technical documents.
The model is deployed into a real-time production environment after thorough testing.
8. COMMUNICATE RESULTS
The key findings are communicated to all
stakeholders.
This helps you decide whether the project is a success or a failure, based on the inputs from the model.
10. MOST PROMINENT DATA SCIENCE JOB TITLES ARE:
1) Data scientist
2) Data engineer
3) Data analyst
4) Statistician
5) Data admin
6) Business analyst
11. Data Scientist
ROLE: A professional who manages enormous amounts of data to come up with compelling business insights, using various tools, techniques, methodologies, algorithms, etc.
LANGUAGES: R, SAS, Python, SQL, Hive, MATLAB, Pig, Spark
12. Data Engineer
ROLE: Works with large amounts of data. Develops, constructs, tests, and maintains architectures such as large-scale processing systems and databases.
LANGUAGES: SQL, Hive, R, SAS, MATLAB, Python, Java, Ruby, C++, Perl
13. Data Analyst
ROLE: Responsible for mining vast amounts of data and looking for relationships, patterns, and trends in it. The analyst then delivers compelling reports and visualizations of the data to support the most viable business decisions.
LANGUAGES: R, Python, HTML, JS, C, C++, SQL
14. Statistician
ROLE: Collects, analyzes, and understands qualitative and quantitative data using statistical theories and methods.
LANGUAGES: SQL, R, MATLAB, Tableau, Python, Perl, Spark, Hive
15. Data Administrator
ROLE: Ensures that the database is accessible to all relevant users, that it performs correctly, and that it is kept safe from hacking.
LANGUAGES: Ruby on Rails, SQL, Java, C#, Python
16. Business Analyst
ROLE: Improves business processes and acts as an intermediary between the business executive team and the IT department.
LANGUAGES: SQL, Tableau, Power BI, Python
19. DEFINE THE GOAL
Define a measurable and quantifiable goal.
The goal should be specific and precise.
The aim is to come up with candidate hypotheses. These hypotheses can then be turned into concrete questions or goals for a full-scale modeling project.
20. COLLECT AND MANAGE DATA
Time-consuming step
Conduct initial exploration and visualization of the data
Clean data: repair data errors and transform variables as needed
21. BUILD THE MODEL
The most common data science modeling tasks are:
Classification
Scoring
Ranking
Clustering
Finding relations
Characterization
22. EVALUATE AND CRITIQUE MODEL
Once you have a model, you need to determine if it meets your goals:
Is it accurate enough for your needs?
Does it perform better than the obvious guess?
Do the results of the model make sense in the context of the problem domain?
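One way to check the "obvious guess" question is to compare the model against a baseline that always predicts the most common label. The sketch below assumes a classification task; the labels, the predictions, and the helper names `accuracy` and `majority_baseline` are invented for illustration.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(train_labels, n):
    """The 'obvious guess': always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

# Hypothetical labels and model predictions, for illustration only.
train_labels = ["no", "no", "no", "yes", "no", "yes"]
test_labels = ["no", "yes", "no", "no", "yes"]
model_predictions = ["no", "yes", "no", "yes", "yes"]

baseline_acc = accuracy(majority_baseline(train_labels, len(test_labels)), test_labels)
model_acc = accuracy(model_predictions, test_labels)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f}")
```

If the model does not clearly beat this baseline on the test data, its apparent accuracy may simply reflect how unbalanced the labels are.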
23. PRESENT RESULTS AND DOCUMENT
Present results to your project sponsor and other stakeholders.
Document the model for those in the organization who are responsible for using, running, and maintaining the model once it has been deployed.
24. DEPLOY MODEL
Make sure that the model can be updated as its environment changes.
The model should initially be deployed in a small pilot program.
26. Several ways of gathering data for analysis are:
CSV FILE
FLAT FILE (tab, space, or any other separator)
TEXT FILE (in a single file: reading data all at once, or reading data line by line)
ZIP FILE
APIs (JSON)
MULTIPLE TEXT FILES (data is split over multiple text files)
DOWNLOAD FILE FROM INTERNET (file hosted on a server)
WEBPAGE (scraping)
RDBMS (SQL tables)
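A few of the formats listed above can be read with the Python standard library alone. In this sketch the file contents are made-up in-memory strings standing in for real files or API responses; with real files, `io.StringIO(...)` would be replaced by something like `open(path, newline="")`.

```python
import csv
import io
import json

# In-memory stand-ins for real files or API responses (made-up content).
csv_text = "name,score\nana,90\nbob,85\n"
tsv_text = "name\tscore\nana\t90\nbob\t85\n"
json_text = '[{"name": "ana", "score": 90}, {"name": "bob", "score": 85}]'

# CSV file: comma-separated values with a header row.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Flat file: same reader, different separator.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Text file: read everything at once, or one line at a time.
all_at_once = io.StringIO(csv_text).read()
line_by_line = [line.rstrip("\n") for line in io.StringIO(csv_text)]

# API response: JSON text decoded into Python lists and dicts.
api_rows = json.loads(json_text)

print(csv_rows[0], tsv_rows[0], line_by_line[0], api_rows[0])
```

Note that the CSV readers return every value as a string, while JSON keeps numbers as numbers, so type conversion is usually part of the cleaning step.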
28. A relational database stores data in tables; each row of a table is called a record.
Connections among records are established using primary keys and foreign keys.
This allows users to define relationships between tables.
In an RDBMS, we use SQL statements to retrieve and analyze data.
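The primary key and foreign key idea can be tried out with Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their contents are hypothetical, chosen only to show one table referring to another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customers.id is the primary key; orders.customer_id is a foreign key pointing at it.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL)""")

conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ana"), (2, "Bob")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 20.0), (1, 5.5), (2, 12.0)])

# Join the two tables through the key relationship and total the orders per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()
print(totals)
conn.close()
```

With foreign keys enabled, inserting an order for a customer id that does not exist would be rejected, which is how the database keeps the relationship consistent.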
30. SOME COMMONLY USED PLOTS FOR EDA ARE :
Histogram
Scatter plots
Maps
Feature correlation plot (heatmap)
Time series plots
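The numbers behind two of these plots can be computed without a plotting library. The sketch below builds the bin counts that a histogram would draw and the Pearson coefficient that a correlation heatmap would color; the `ages` and `incomes` samples and the helper names are made up for illustration.

```python
from collections import Counter
from math import sqrt

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    return Counter((v // bin_width) * bin_width for v in values)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up sample data for illustration.
ages = [23, 27, 31, 35, 38, 42, 45, 51]
incomes = [30, 34, 41, 44, 50, 55, 57, 66]

# Text version of a histogram: one row of '#' marks per bin.
for start, count in sorted(histogram(ages, 10).items()):
    print(f"{start}-{start + 9}: {'#' * count}")
print(f"correlation = {pearson(ages, incomes):.3f}")
```

A plotting library would normally draw these directly; computing them by hand is mainly useful for seeing what each plot summarizes.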
32. Data management platforms enable organizations and enterprises to use data analytics in beneficial ways, such as:
Personalizing the customer experience
Adding value to customer interactions
Improving customer engagement
Increasing customer loyalty
Reaping the revenues associated with data-driven marketing
Identifying the root causes of marketing failures and business issues in real time