In the past few years third-party data marketplaces, often provided as Data as a Service, have taken off. But most organizations already own the data most relevant to their business – data pertaining to their own customers, transactions, products, etc.
That’s why the most successful organizations are applying the concepts of external data markets to create their own enterprise data marketplaces, where users can easily find and access data from across the company that is clean, trustworthy and auditable.
View this webinar on-demand to learn how to build an enterprise data marketplace of your own with DMX-h! We'll cover:
• Attributes of a successful enterprise data marketplace
• Potential roadblocks, and how to overcome them
• Examples of customers who have successfully built data marketplaces with DMX-h
Building Your Enterprise Data Marketplace with DMX-h
1. Building Your Enterprise Data Marketplace with DMX-h
Jennifer Cheplick
Sr. Director, Product Marketing
2. Today’s agenda
• The Need for an Enterprise Data Marketplace
• Attributes of a Successful Enterprise Data Marketplace
• Building an Enterprise Data Marketplace
• Potential Roadblocks
• How Syncsort Helps
3. Data Growth
2.5Q (quintillion) bytes of data created every day
90% of the world’s data generated in the past two years alone
200B smart devices projected by 2020
4. Data Delivers Competitive Advantage
“Compared with their peers, high performers report a greater variety of actions to monetize data – with greater revenue impact”
- McKinsey Global Survey: Fueling growth through data monetization
Enterprise Data Marketplace4
73.2%
Percentage of executives whose firms have achieved measurable results from Big Data and AI investments
- NewVantage Partners Big Data Executive Survey 2018
$1.8 Trillion
Projected annual revenue for insights-driven businesses by 2021
- “Insights-Driven Businesses Set the Pace for Global Growth,” Forrester, October 19, 2018
85%
Firms that leverage customer behavioral insights outperform peers by 85 percent in sales growth and 25 percent in gross margin
- McKinsey Global Survey: Capturing value from your customer data
5. Promise of a Data-Driven Culture
ACCURATE ANALYTICS & FASTER TIME-TO-VALUE
▪ Reduce bias, uncertainty, and misunderstanding
▪ Uncover new, previously inaccessible insights
▪ Accelerate speed of organizational decision-making
TARGETED MARKETING & REVENUE GROWTH
▪ Gain the most accurate, in-depth view of your customers
▪ Monitor and respond to customer activity in real-time
REDUCED RISK & COMPLIANCE WITH CONFIDENCE
▪ Ensure confidence in regulatory reporting
▪ Identify and manage risk more quickly and completely
OPERATIONAL EFFICIENCY & COST REDUCTION
▪ Minimize time spent on manual data preparation
▪ Ensure accuracy of global operations and supply chain
6. But most organizations are not getting the full value of their data
• Data has outgrown the data warehouse
• Data lakes can be polluted and chaotic
• Data is inconsistent across data marts
• Every part of the business demands sophisticated data analysis
• Departments need access to the company’s many data sets, combined in different ways
• IT can’t be a bottleneck
91% of organizations have not yet reached a “transformational” level of maturity in data and analytics
- Gartner
68% of IT professionals state that data silos negatively impact their organization’s ability to get value from their data
7. The Rise of the Enterprise Data Marketplace
Enables data-driven organizations
• Analytics teams and business users can shop and find the data they need
• Data can be combined for ever-expanding applications
Overcomes the limitations of previous solutions to deliver the best of each, in one central repository
• Volume and variety of the data lake
• Veracity and auditability of the data warehouse
• Velocity and specificity of purpose of the data mart
8. Enterprise Data Marketplace Attributes: Reliability
Provides a centralized location for curated, trusted data that is:
• Clean
• Standardized
• Verified
Guardian Life Insurance needed to enable Machine Learning, visualization and BI on a broad range of datasets, and reduce time-to-market for analytics projects.
• Reduce data preparation, transformation times
• Make data assets available to whole enterprise – including Mainframe data
Data Marketplace – centralized, reusable, up-to-the-minute current, searchable, accessible, managed, trustworthy data for analytics
Fast Time-to-Market for new analytics and reporting
9. Enterprise Data Marketplace Attributes: Flexibility
Pulls data from across the enterprise and lets users pick and choose the data they need, depending on what they want to accomplish.
Progressive Insurance needs cost-effective, easily accessible operational data – including Claims Liability, Policy, Customer, Incident and more – for advanced analytics
• Data marketplace includes 50 data sources
• More are added as business needs evolve
Better Analytics – with readily accessible, up-to-date data.
Fast Analytics Time-to-Market – Data available in hours not days.
Audit Trails for Compliance
while keeping the EDW current
Low Archival Costs
10. Enterprise Data Marketplace Attributes: Availability
Empower analytics teams to create new data schemas on their own
• The right data sets are available
• Data is always up to date and ready for various types of analytics
• Removes wait times and IT bottlenecks
Analysts at Symphony Health no longer wait for requests for specific data schemas, or data subsets, to work their way through the IT team’s queue
“Before, part of the data wasn’t available for a day, and other parts, not for a week. Now it’s all available for analysis within minutes of the data arriving.”
- Robert Hathaway, Senior Manager Big Data
11. Today’s agenda
• The Need for an Enterprise Data Marketplace
• Attributes of a Successful Enterprise Data Marketplace
• Building an Enterprise Data Marketplace
• Potential Roadblocks
• How Syncsort Helps
12. Building an Enterprise Data Marketplace
Data Lake or Cloud
Raw Landing Zone
Access & Onboard – Elect to include data to understand
• What you don’t know CAN hurt you – e.g. bias
• If you’ve left it out, you cannot know it exists
• Data sets have more power to predict when combined
13. Building an Enterprise Data Marketplace
Data Lake or Cloud
Raw Landing Zone
Refined Zone
Refine – cleanse, enrich, de-duplicate
• What data needs refinement? – use cases will determine
• Each data set should be refined once – don’t repeat work
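As a rough illustration of the refine step (not how DMX-h or Trillium implement it), the sketch below cleanses and de-duplicates a small batch of records in plain Python. The `name` and `email` fields, the `refine` function and the sample data are all hypothetical, chosen only to show the idea of refining a data set once and reusing the result.

```python
def refine(records):
    """Cleanse and de-duplicate raw records (hypothetical name/email schema)."""
    seen = set()
    refined = []
    for rec in records:
        # Cleanse: collapse stray whitespace and normalise case.
        name = " ".join(rec.get("name", "").split()).title()
        email = rec.get("email", "").strip().lower()
        if not email:
            continue  # drop records that fail a basic validity check
        # De-duplicate on the cleansed email address.
        if email in seen:
            continue
        seen.add(email)
        refined.append({"name": name, "email": email})
    return refined


raw = [
    {"name": "  ada   lovelace ", "email": "ADA@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "No Email", "email": ""},
]
refined = refine(raw)  # computed once, then shared with every consumer
```

Storing the output of `refine` in the refined zone, rather than re-running it per project, is what "refined once" amounts to in this sketch.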
14. Building an Enterprise Data Marketplace
Data Lake or Cloud
Raw Landing Zone
Refined Zone
Track Provenance
• Data lineage documentation is necessary for establishing that data can be trusted, and for auditing and regulatory compliance
• Also useful for reproducing steps in production machine learning data pipelines
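One way to picture provenance tracking is a log that records, for every transformation step, a fingerprint of the data going in and coming out. The sketch below is a minimal illustration under that assumption; the function names, step names and the `crm_export.csv` source label are hypothetical, and it does not represent the lineage format used by DMX-h, Cloudera Navigator or Apache Atlas.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 fingerprint of a list of JSON-serialisable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_lineage(rows, steps, source):
    """Apply each (name, function) step in order and log what it did."""
    lineage = [{"step": "source", "detail": source, "output": fingerprint(rows)}]
    for name, func in steps:
        before = fingerprint(rows)
        rows = func(rows)
        lineage.append({"step": name, "input": before, "output": fingerprint(rows)})
    return rows, lineage


# Hypothetical transformation steps, each a pure function over the rows.
steps = [
    ("lowercase_email", lambda rs: [{**r, "email": r["email"].lower()} for r in rs]),
    ("drop_missing_email", lambda rs: [r for r in rs if r["email"]]),
]
raw = [{"email": "A@X.COM"}, {"email": ""}]
refined, lineage = run_with_lineage(raw, steps, source="crm_export.csv")
```

Because each entry's input fingerprint equals the previous entry's output, an auditor can check that no undocumented change happened between steps, and the same ordered step list can be replayed in a production pipeline.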
15. Building an Enterprise Data Marketplace
Data Lake or Cloud
Raw Landing Zone
Refined Zone
Shop for data sets, features & validate against your questions
• Analyst, data scientist shops for data
• What do I need for my purpose?
• Quality is already assured, provenance documented
• Improves trust, saves time
16. 5 Potential Roadblocks to Building Your Enterprise Data Marketplace
Siloed, Hard to Reach Datasets
• Can be trapped in hard-to-reach systems like mainframes, etc.
• Found in streams coming in from POS, web clicks, etc.
• Incompatible formats, making it difficult to gather and prepare the data for model training.
Data Cleansing at Scale
• Cleanse, enrich, de-duplicate
• What data needs refinement? – use cases will determine
• Each data set should be refined once – don’t repeat work
Tracking Lineage from the Source
• Capture of complete lineage, from source to end point – across systems – is needed.
• Data changes made to help train models have to be exactly duplicated in production, in order for models to accurately make predictions on new data, and for required audit trails.
Entity Resolution
• Matching records across massive datasets that indicate a single specific entity (person, company, product, etc.)
• Requires sophisticated multi-field matching algorithms and a lot of compute power.
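To make the entity resolution roadblock concrete, here is a small sketch of multi-field matching using only the Python standard library. The field names, weights and the 0.85 threshold are assumptions picked for illustration; production matching engines use far more sophisticated algorithms than this string-similarity average.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average of per-field similarities (weights are assumptions)."""
    total = sum(weights.values())
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def same_entity(rec_a, rec_b, weights, threshold=0.85):
    """Treat two records as one entity when the score reaches the threshold."""
    return match_score(rec_a, rec_b, weights) >= threshold


weights = {"name": 2, "city": 1}  # hypothetical: name counts twice as much as city
a = {"name": "Jon Smith", "city": "Boston"}
b = {"name": "John Smith", "city": "Boston"}
c = {"name": "Maria Garcia", "city": "Denver"}
```

Here `same_entity(a, b, weights)` is true and `same_entity(a, c, weights)` is false. Comparing every pair of records grows quadratically with the number of records, which is one reason the slide stresses compute power.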
Ongoing Real-Time Changed Data Capture
• Tracking and detection needs to happen very rapidly.
• Current transactions need to be constantly added to combined datasets, prepared and presented to models as close to real-time as possible.
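The idea behind changed data capture can be sketched as applying a stream of insert, update and delete events to a keyed dataset instead of reloading the whole source. The event shape (`op`, `key`, `row`) and the sample data below are hypothetical and are not the format used by any Syncsort product.

```python
def apply_changes(dataset, changes):
    """Apply ordered change events to a dict keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Merge the changed columns over whatever is already stored.
            dataset[key] = {**dataset.get(key, {}), **change["row"]}
        elif op == "delete":
            dataset.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return dataset


current = {
    1: {"status": "open", "amount": 10},
    3: {"status": "open", "amount": 7},
}
changes = [
    {"op": "update", "key": 1, "row": {"status": "paid"}},
    {"op": "insert", "key": 2, "row": {"status": "open", "amount": 5}},
    {"op": "delete", "key": 3},
]
current = apply_changes(current, changes)
```

Only the three changed rows are touched, which is what keeps the combined dataset close to real-time without a full reload.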
17. Today’s agenda
• The Need for an Enterprise Data Marketplace
• Attributes of a Successful Enterprise Data Marketplace
• Building an Enterprise Data Marketplace
• Potential Roadblocks
• How Syncsort Helps
18. Build Your Enterprise Data Marketplace with Syncsort
Access: Onboard ALL enterprise data.
Integrate: Join, transform, cleanse, de-duplicate batch or streaming data.
Comply: Secure, govern, manage and monitor everything.
Simplify: Design once, deploy anywhere.
19. Simplify Big Data Integration with Syncsort
Access: Onboard ALL enterprise data.
20. Access & Integrate ALL Enterprise Data – Mainframe to Streaming
Data Sources
Onboard data, modify on-the-fly to match Hadoop storage model, or store unchanged for archive and compliance.
Access data from streaming and batch sources outside cluster.
Cluster or Cloud
Data
Refine, transform, join, cleanse, enhance data in cluster or Cloud with MapReduce, EMR, or Spark.
21. Comply: Govern and Track Everything for Compliance
• Metadata and data lineage for Hive, Avro and Parquet through HCatalog
• Metadata lineage export and API from DMX/DMX-h
• Simplify audits, analytics dashboards, metrics
• Integrate with enterprise metadata repositories
• Cloudera Navigator certified integration
• Track lineage from source – even changes made off cluster
• HDFS, YARN, Spark and other metadata
• Lineage, tagging
• Business and structural metadata
• Apache Atlas ingestion lineage integration
• Lineage, tagging
• Track lineage from source – even changes made off cluster
DMX-h
22. Comply: Secure the Entire Process
• Native Kerberos and LDAP support
• Kerberos-secured clusters
• Authenticated browsing
• Authenticated sampling
• Security certified
• Apache Ranger
• Apache Sentry
• FTPS, Connect:Direct secure data transfers
DMX-h
23. Simplify: Design Once, Deploy Anywhere
Intelligent Execution - Insulate your organization from underlying complexities of Hadoop.
Get excellent performance every time without tuning, load balancing, etc.
No re-design, re-compile, no re-work ever
• Future-proof job designs for emerging compute frameworks, e.g. Spark 2.x
• Move from dev to test to production
• Move from on-premise to Cloud
• Move from one Cloud to another
Use existing ETL skills
No parallel programming – Java, MapReduce, Spark …
No worries about:
• Mappers, Reducers
• Big side or small side of joins …
Design Once in visual GUI
Deploy Anywhere!
On-Premise, Cloud
MapReduce, Spark, Future Platforms
Windows, Unix, Linux
Batch, Streaming
Single Node, Cluster
24. Trillium Quality for Big Data – Data Cleansing at Scale
Boost effectiveness of machine learning, AI with complete, standardized data.
1. Visually create and test data quality processes locally
2. Execute in MapReduce or Spark, on premise or in the Cloud
25. Build Your Enterprise Data Marketplace with Syncsort
“Ingestion has gone from days to hours”
- Progressive Big Data Tech Lead
“DMX-h is already optimized. We use its Intelligent Execution and it just performs.”
- Robert Hathaway, Senior Manager Big Data, Symphony Health
“We found DMX-h to be very usable and easy to ramp up in terms of skills. Most of all, Syncsort has been a very good partner in terms of support and listening to our needs.”
- Alex Rosenthal, Enterprise Data Office, Guardian Life Insurance
Visit www.syncsort.com to learn more