(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as data has grown more complex. A traditional data pipeline, built for an IT-centered approach to information management, cannot meet the data demands of today's business decisions. A big data strategy therefore requires modernizing previous approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
3. Paxata’s mission (since 2012)
Deliver the only enterprise-grade data preparation platform
for everyone to transform raw, meaningless data into
valuable, contextual and complete information
4–7. The data chasm
Source: Gartner Newsroom: http://www.gartner.com/newsroom/id/2975018
83%: companies agree that data is their most strategic asset
80%: time analysts will spend trying to create data sets to draw insights
12%: amount of data most companies estimate they are analyzing
10. Traditional data preparation creates a bottleneck
Business teams have complex data sources for analytics projects
11. Traditional data preparation creates a bottleneck
Business teams funnel their requirements to IT
[Diagram: Business → IT-centric data preparation → Information]
12. Traditional data preparation creates a bottleneck
IT runs requirements through a linear ETL process, executed with manual scripting or coding
[Diagram: Business → IT-centric data preparation (Model → Extract → Transform → Load → Optimize) → Information]
13. Traditional data preparation creates a bottleneck
IT reviews with business, makes changes, fixes errors. (Repeat.)
14. Traditional data preparation creates a bottleneck
Business teams make decisions before data is available, or ask for changes and restart the process.
15. Traditional data preparation creates a bottleneck
Designed for highly specialized technical people to prepare data for business teams
16. Designing for highly specialized technical people to prepare data for business teams is:
Expensive
Complicated
Error-prone
Time-consuming
18. Modern architecture: balancing freedom with responsibility
Built for business: freedom and flexibility with collaboration
19. Modern architecture: balancing freedom with responsibility
Built for business: freedom and flexibility with collaboration
Enabled by IT: data governance, scale, efficiency (collect and manage data over time)
20. A modern information pipeline is:
Built for business: freedom and flexibility with collaboration
Enabled by IT: data governance, scale, efficiency
22–23. Data prep must address the range of information workers
Source: Forrester Research, Inc., “Info Workers Will Erase The Boundary Between Enterprise and Consumer Technologies,” August 30, 2012
[Pyramid, from deep technical skills to limited technical skills:]
Data Scientist (200K)
Data Developer (600K)
Data Analyst (100M)
Business Analyst (275M)
Information Worker (460M)
Deliver the only enterprise-grade data preparation platform that lets everyone transform raw, meaningless data into valuable, contextual and complete information
To seize the opportunity you must cross this data chasm.
Why? Because it's hard.
The traditional, legacy technologies and processes that companies currently use were not designed for the variety and volume of data companies work with today.
Companies need to be more nimble.
We have many customers with tens of millions invested annually in traditional ETL processes who were still spending too much time preparing data rather than on the value-added tasks of analytics.
They selected Paxata to complement these technologies and fill the gaps with a more exploratory, interactive experience.
Visual data discovery tools: people had a hunger to get at and dig into their data beyond traditional small spreadsheets or databases.
1. Business teams funnel their data requirements to IT
2. IT runs requirements through a linear ETL process, executed with manual scripting or coding
3. IT reviews with business, makes changes, fixes errors. Repeats this cycle.
4. By then, business teams have made decisions long before the data is available, or they ask for changes and restart the process
Traditional technologies do not meet today's needs: batch, complicated, no visibility, IT only, time-consuming, error-prone, expensive.
Legacy infrastructure for data preparation was never designed to scale to the orders of magnitude more data, and orders of magnitude more consumers, of today's information-driven world.
A model in which a small set of highly skilled IT data scientists and data developers take business requirements and execute a highly prescribed, lengthy, waterfall process for preparing data, only to realize more often than not that they missed the mark because they lack the business context, is not viable.
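To make the "manual scripting or coding" in step 2 concrete, here is a minimal sketch of what such a hand-written, linear ETL script typically looks like. The data, field names, and cleansing rules are hypothetical, invented for illustration; the point is that the transform logic is hard-coded and opaque to the business.

```python
# A minimal sketch of hand-scripted, linear ETL (extract -> transform -> load).
# All data and rules here are hypothetical, for illustration only.

import csv
import io

# Extract: raw records as they might arrive from a source system.
RAW = """customer,region,revenue
Acme, west ,1200
Beta,EAST,
Acme,west,800
"""

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: hard-coded cleansing rules -- exactly the kind of logic
    # that is brittle and invisible to the business in an IT-only process.
    out = []
    for r in rows:
        if not r["revenue"]:          # silently drop incomplete records
            continue
        out.append({
            "customer": r["customer"].strip(),
            "region": r["region"].strip().lower(),
            "revenue": int(r["revenue"]),
        })
    return out

def load(rows):
    # Load: aggregate into the model the business asked for up front.
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["revenue"]
    return totals

result = load(transform(extract(RAW)))
print(result)  # {'Acme': 2000} -- Beta was dropped without anyone noticing
```

When the business later asks why Beta is missing, the answer lives only in this script, which is why each review cycle in step 3 requires a round trip back to IT.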
Slide use: problem of data (option 4)
This is a five-part slide. Use it along with the 4 slides before it.
Talking points: Big data and self-service analytics necessitate a fundamental transformation from an IT-centric data preparation process to a self-service data preparation model. In the self-service model, the steps that make up data preparation (data integration, quality, cleansing, enrichment, and shaping) don't go away; they need to be re-imagined in a way that enables the business or data analyst to accomplish these tasks on their own, which in turn empowers them to work with vertical slices of relevant data and get the results they want, when they need them. It is equally important that the self-service model provide the governance and traceability that IT requires to maintain trust in data and analytic results. In this new model, IT's role changes to collecting and centralizing access to raw data and providing the business with the right infrastructure to drive self-service data preparation and analytics, while maintaining full governance.
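The governance-and-traceability point above can be sketched in miniature: each prep step (cleansing, enrichment, shaping) is recorded as it runs, so IT retains an audit trail even though the analyst drives the work. Every name below is illustrative and invented for this sketch; it is not Paxata's API.

```python
# Toy sketch: self-service prep steps that record themselves for traceability.
# All function and field names are hypothetical illustrations.

steps = []  # the audit trail IT can inspect

def step(name):
    # Decorator that logs each prep operation as it is applied.
    def wrap(fn):
        def inner(rows):
            steps.append(name)
            return fn(rows)
        return inner
    return wrap

@step("cleanse: normalize region")
def cleanse(rows):
    return [{**r, "region": r["region"].strip().lower()} for r in rows]

@step("enrich: add region code")
def enrich(rows):
    codes = {"west": "W", "east": "E"}  # hypothetical lookup table
    return [{**r, "code": codes.get(r["region"], "?")} for r in rows]

@step("shape: keep needed columns")
def shape(rows):
    return [{"customer": r["customer"], "code": r["code"]} for r in rows]

data = [{"customer": "Acme", "region": " West "}]
result = shape(enrich(cleanse(data)))
print(result)  # [{'customer': 'Acme', 'code': 'W'}]
print(steps)   # every operation is captured for governance review
```

The analyst composes the steps interactively; the recorded `steps` list is what lets IT reconstruct exactly how any result was produced.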
Slide use: Who are the data analysts
Talking points: This pyramid describes the typical information-worker roles in today's enterprises and highlights the dramatic scale that self-service data preparation can bring. Legacy tools and many big data tools target the data scientist and the data developer, but as you can see there are vastly more data analysts out there, and self-service data prep empowers them to drive their own data destiny, breaking the logjam of traditional, IT-constrained ETL and data preparation. By data analysts, we mean power Excel users or Tableau users who understand data and analytics but don't write code or scripts. For self-service data prep to truly transform an organization, it must empower the data analyst. At the same time, because self-service data prep simplifies many traditionally complex and time-consuming preparation operations, it can also dramatically accelerate the work of data scientists and data developers.
Source: Prakash VC deck