This presentation briefly discusses the following topics:
Data Analytics Lifecycle
Importance of Data Analytics Lifecycle
Phase 1: Discovery
Phase 2: Data Preparation
Phase 3: Model Planning
Phase 4: Model Building
Phase 5: Communicate Results
Phase 6: Operationalize
Data Analytics Lifecycle Example
(Centre for Knowledge Transfer institute)
DATA ANALYTICS LIFECYCLE
The Data Analytics Lifecycle is designed for Big Data problems
and data science projects.
The cycle is iterative, to reflect how real projects proceed.
To address the distinct requirements of performing analysis
on Big Data, a step-by-step methodology is needed to
organize the activities and tasks involved in
acquiring,
processing,
analyzing, and
repurposing data.
IMPORTANCE OF DATA ANALYTICS LIFECYCLE
The Data Analytics Lifecycle defines the roadmap of how data is generated, collected,
processed, used, and analyzed to achieve business goals.
It offers a systematic way to manage data for converting it into information that
can be used to fulfil organizational and project goals.
The process provides the direction and methods to extract information from the
data and proceed in the right direction to accomplish business goals.
Data professionals use the lifecycle’s circular form to move through data analytics
in either a forward or a backward direction.
Based on newly received insights, they can decide whether to proceed with
their existing research or scrap it and redo the complete analysis.
The Data Analytics lifecycle guides them throughout this process.
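The forward-or-backward movement through the circular lifecycle can be sketched in a few lines of Python. This is only an illustration: the `Phase` enum, the `step` function, and the `ready` flag are hypothetical names, and the rule "go back exactly one phase when the work is not ready" is a simplifying assumption, since a real team may jump back further.

```python
from enum import IntEnum


class Phase(IntEnum):
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def step(phase: Phase, ready: bool) -> Phase:
    """Move forward when the phase's work is good enough, otherwise go back one phase."""
    if ready:
        # After the last phase the cycle wraps around to Discovery.
        return Phase(phase % len(Phase) + 1)
    # Discovery has no earlier phase, so the team stays there.
    return Phase(max(phase - 1, Phase.DISCOVERY))
```

For example, `step(Phase.MODEL_BUILDING, ready=False)` returns `Phase.MODEL_PLANNING`, which mirrors a team revisiting its plan after a model disappoints.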
PHASE 1: DISCOVERY
The data science team learns about and investigates the problem.
The team develops context and understanding.
The team identifies the data sources needed and available for the project.
The team formulates initial hypotheses that can later be tested with data.
PHASE 2: DATA PREPARATION
This phase covers the steps to explore, preprocess, and condition data prior to
modeling and analysis.
It requires an analytic sandbox; the team performs extract, load, and
transform (ELT) to get data into the sandbox.
Data preparation tasks are likely to be performed multiple times and not in a
predefined order.
Tools commonly used for this phase include Hadoop, Alpine Miner, and
OpenRefine.
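A minimal sketch of loading raw data into a sandbox and conditioning it there, assuming an in-memory SQLite database stands in for the analytic sandbox. The table names, columns, and sample rows are hypothetical and chosen only for illustration; a real project would use the tools named above or a data warehouse.

```python
import sqlite3

# Hypothetical raw extract: (store_id, product, price as text, units as text).
raw_rows = [
    ("S1", "hammer", "12.50", "3"),
    ("S1", "nails", "", "40"),  # missing price
    ("S2", "hammer", "13.00", "5"),
]

# An in-memory SQLite database stands in for the analytic sandbox.
sandbox = sqlite3.connect(":memory:")

# Load: copy the raw data into the sandbox unchanged.
sandbox.execute(
    "CREATE TABLE raw_sales (store_id TEXT, product TEXT, price TEXT, units TEXT)"
)
sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?, ?)", raw_rows)

# Transform: condition the data inside the sandbox
# (cast text to numbers, drop rows that have no price).
sandbox.execute(
    """
    CREATE TABLE sales AS
    SELECT store_id, product,
           CAST(price AS REAL) AS price,
           CAST(units AS INTEGER) AS units
    FROM raw_sales
    WHERE price <> ''
    """
)

clean_rows = sandbox.execute(
    "SELECT store_id, product, price, units FROM sales ORDER BY store_id"
).fetchall()
```

Keeping the untouched `raw_sales` table next to the conditioned `sales` table makes it cheap to repeat the transform step, which fits the point that preparation tasks are rarely done only once.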
PHASE 3: MODEL PLANNING
The team explores the data to learn about relationships between variables and
subsequently selects key variables and the most suitable models.
This planning sets up the next phase, in which the data science team develops
data sets for training, testing, and production purposes and builds and executes
models based on the work done here.
Tools commonly used for this phase include MATLAB and STATISTICA.
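One simple way to explore relationships between variables is to rank candidate inputs by their correlation with the target. The sketch below assumes a Pearson correlation is an adequate first look; the variable names and observations are hypothetical, and correlation only captures linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: candidate input variables and the target.
data = {
    "price": [10, 12, 14, 16, 18],
    "shelf_pos": [3, 1, 2, 1, 3],
}
units_sold = [50, 44, 41, 33, 30]

# Rank candidate variables by the strength of their linear relationship with the target.
ranking = sorted(
    data, key=lambda name: abs(pearson(data[name], units_sold)), reverse=True
)
```

Here `price` ranks ahead of `shelf_pos`, so it would be the first candidate for a key variable in this toy data set.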
PHASE 4: MODEL BUILDING
The team develops datasets for testing, training, and production purposes.
The team also considers whether its existing tools will suffice for running the models
or whether it needs a more robust environment for executing them.
Free or open-source tools: R and PL/R, Octave, WEKA.
Commercial tools: MATLAB, STATISTICA.
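As a rough illustration of developing training and test datasets and then building a model on them, the sketch below splits hypothetical (price, units sold) observations and fits a one-variable least-squares line. The helper names `split`, `fit_line`, and `mean_abs_error` are assumptions for this example, the data is synthetic, and the production dataset is left out for brevity.

```python
import random


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x, _ in rows
    )
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Average absolute gap between predicted and actual y values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Synthetic observations that follow units = 100 - 3 * price exactly.
rows = [(p, 100 - 3 * p) for p in range(5, 25)]
train, test = split(rows)
model = fit_line(train)                    # built on the training set only
test_error = mean_abs_error(model, test)   # checked on data the model has not seen
```

Evaluating on the held-out test set is what lets the team judge whether the model generalizes before moving on.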
PHASE 5: COMMUNICATE RESULTS
After executing the model, the team needs to compare the outcomes of modeling to
the criteria established for success and failure.
The team considers how best to articulate findings and outcomes to the various team
members and stakeholders, taking into account caveats and assumptions.
The team should identify key findings, quantify the business value, and develop a
narrative to summarize and convey the findings to stakeholders.
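Comparing outcomes against the success criteria can be made explicit with a small check like the one below. The metric names, thresholds, and measured values are all hypothetical; the point is only that criteria agreed in advance can be evaluated mechanically before the narrative is written.

```python
# Hypothetical success criteria agreed earlier: metric -> (comparison, threshold).
criteria = {
    "test_error": ("max", 5.0),        # mean absolute error must not exceed 5 units
    "revenue_lift_pct": ("min", 2.0),  # projected revenue lift must be at least 2%
}

# Hypothetical outcomes measured after executing the model.
outcomes = {"test_error": 3.2, "revenue_lift_pct": 1.4}


def evaluate(outcomes, criteria):
    """Return {metric: True/False} for whether each criterion is met."""
    report = {}
    for metric, (kind, threshold) in criteria.items():
        value = outcomes[metric]
        report[metric] = value <= threshold if kind == "max" else value >= threshold
    return report


report = evaluate(outcomes, criteria)
success = all(report.values())
```

In this made-up case the error criterion is met but the revenue criterion is not, which is exactly the kind of mixed result the team has to explain to stakeholders along with its caveats.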
PHASE 6: OPERATIONALIZE
The team communicates the benefits of the project more broadly and sets up a pilot
project to deploy the work in a controlled way before broadening it to the full
enterprise of users.
This approach enables the team to learn about the performance and related constraints
of the model in a production environment on a small scale, and to make adjustments
before full deployment.
The team delivers final reports, briefings, and code.
Free or open-source tools: Octave, WEKA, SQL, MADlib.
DATA ANALYTICS LIFECYCLE EXAMPLE
Consider a retail store chain that wants to optimize its products’ prices to boost its revenue.
The store chain has thousands of products across hundreds of outlets, making it a highly complex scenario.
Once you identify the store chain’s objective, you find the data you need, prepare it, and go through the
Data Analytics Lifecycle process.
You observe different types of customers, such as ordinary customers and customers like contractors who
buy in bulk.
You suspect that treating the various types of customers differently could lead to the solution.
However, you don’t have enough information about this and need to discuss it with the client team.
In this case, you need to get the definition of each customer type, find the data, and conduct hypothesis
testing to check whether customer type affects the model results, so that you get the right output.
Once you are convinced by the model results, you can deploy the model, integrate it into the business,
and roll out the prices you consider optimal across the store chain’s outlets.
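The hypothesis-testing step in this example could be sketched as a permutation test on whether the two customer types buy different quantities. This is one possible test, not necessarily the one a real project would choose; the basket sizes are made-up numbers and `permutation_test` is a hypothetical helper written for this illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(group_a)]) - mean(pooled[len(group_a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical basket sizes (units per purchase) for the two customer types.
ordinary = [2, 3, 1, 4, 2, 3, 2, 1]
contractors = [20, 35, 18, 40, 25, 30, 22, 28]

p_value = permutation_test(ordinary, contractors)
```

A small `p_value` would support treating the customer types differently in the pricing model, while a large one would suggest the distinction adds little.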