Big Data Expo 2015 - Trillium software Big Data and the Data Quality

Successful Big Data initiatives rely on accurate, complete data, but the information they draw on is often not validated when it enters an organization. In this session we look at the challenges big data brings to an organization and at how data quality principles are adapting to ensure that business goals are met and that the return on investment in big data is realised. We will cover:

- Challenges of big data
- Turning data lakes into reservoirs
- How data quality tools are adapting
- Why data governance disciplines remain crucial

Published in: Data & Analytics

  2. EMERGENCE OF THE “NEW” ENTERPRISE DATA HUB
     [Diagram: the expanded data hub. Data sources: databases (RDBMS), files, reference data, enterprise applications, new sources. Data ingestion feeds the enterprise hub, alongside the data warehouse and data marts. Applications: business intelligence, custom analytics. Monitor & manage. Drivers: + volume, + velocity, + variety.]
  3. CHALLENGES WITH ENTERPRISE DATA
     • Multiple silos of information
     • Collating information is resource-intensive
     • Analysing data is difficult and labour-intensive
     • Inconsistent, inaccurate, incomplete data
     • Data is difficult to reconcile
     • Manual overhead
     • No single version of the truth!
  4. BIG DATA USE CASES
     • Single Customer View: cleanse, validate and match disparate customer data points to improve the customer experience and customer insight and to enable more targeted marketing
     • Analytics: ensure accuracy for downstream analytics initiatives such as marketing, fraud detection and risk mitigation
     • Data Lake: data is often not cleansed as it enters the organization or data lake, so data quality issues build up at a larger scale
     • Lower-cost storage and processing: organizations seek low-cost, high-performance ways to store, process, analyze and manage larger volumes of data at faster speeds
  5. BIG DATA CHALLENGES
     Common Big Data roadblocks:
     • Limited in-house expertise
     • Maturity of emerging technology
     • Alignment to business objectives
     • Complexity of unstructured data
     • Lack of trust and assurance in data
     • Inability to manage the velocity of data growth
     • Number of internal and external data sources
  6. DATA QUALITY AND SINGLE CUSTOMER VIEWS
     Integrating data from multiple sources exposes differences in completeness, consistency and quality.
  7. IMPACT OF POOR DATA QUALITY ON ANALYTICS
     • Can I trust this data enough to make my critical decisions?
     • How accurate are these numbers?
     • Are these terms consistent with our business definitions?
     • How current is this data? When was it last updated?
  8. COMPLEXITY OF UNSTRUCTURED DATA
     Sample claim note: “Revd new transfer claim ondiary. inj party still OOW and treating. Atty repped.called atty for status. Been treating for over 4 months now, sft tissue neck and back sprain. Clmnt complaining of numbness and tingling in fingers. Clmnt is now being scheduled for MRI and CT scan. RX has been written for oxycotin for pain. Atty will send all updated meds and records he has in his file.”
     Severity indicator? Medication? Employment status?
  10. BIG DATA QUALITY CHALLENGES PERSIST
     “I spend the vast majority of my time cleaning data systems… cleaning and preparing data sets makes everything I do better… it’s the highest value activity I do.” Josh Willis, Senior Director of Data Science, Cloudera (from “Training a new generation of Data Scientists”, Cloudera video)
  11. SHIFT IN FOCUS
     Big Data adopters are moving beyond the hype and focusing on traditional challenges and business goals.
     Top 3 challenges:
     • Finding value
     • Risk and governance (security, privacy, data quality)
     • Integrating multiple data sources
     Top 3 priorities:
     • Enhanced customer experience
     • Process efficiency
     • More targeted marketing
     Source: Gartner
  12. ABOUT TRILLIUM
     Trillium is a global provider and innovator of data quality solutions.
     • A business unit of Harte Hanks (HHS-NYSE)
     • More than two decades in business with a specific focus on data quality
     • Data quality solutions for Big Data, CRM, MDM, ERP, Single Customer Views, Data Integration, Data Governance, Risk & Compliance, Fraud and Marketing
     Analyst ratings:
     • Gartner: 2014 Magic Quadrant, Leader
     • Forrester: Forrester Wave 2013, Leader
     • Bloor Research: Market Leader
     Client examples: [logos]
  13. TRILLIUM BIG DATA
     • Graphically build data quality (DQ) workflows
     • Reuse existing processes
     • Deploy natively in Hadoop
     • Leverage the Hadoop processing architecture
     [Diagram: Trillium server interface and Hadoop HDFS; workflow stages Parse, Standardize, Match, Commonize]
     17 New England Executive Park, Suite 300 | Burlington, MA 01803 | 1-978-436-8900
  14. BENEFITS OF BIG DATA QUALITY
     Understand the impact of data quality and reduce downstream risk
     • Profile, analyze and measure the quality of multi-domain data
     • Create a data quality blueprint and a plan for data cleansing
     Build the best view of your global customer data
     • Cleanse and enrich customer data and create single customer views
     • Improve business processes, detect fraud, create personalized customer experiences and deploy targeted marketing campaigns
     Maximize the value of your Big Data investments
     • Power downstream machine learning initiatives and analytics platforms with reliable, fit-for-purpose data that supports timely, accurate business decisions
  15. CONTACT INFORMATION
     email: | Tel: +44 118 940 7634 | web:
     email: | Tel: 0297 254 390 | web:
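Slide 4 notes that data is often not cleansed as it enters the organization or data lake. One common mitigation is a validation gate at ingestion that accepts records passing basic rules and quarantines the rest for review. The Python sketch below illustrates that idea only; the field names (`customer_id`, `email`, `country`), the rules and the `ingest` function are hypothetical and are not taken from the presentation or from any Trillium product.

```python
import re
from dataclasses import dataclass, field

# Hypothetical validation rules: field name -> (expectation, check)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
RULES = {
    "customer_id": ("a non-empty value", lambda v: bool(v and str(v).strip())),
    "email": ("a valid-looking email address", lambda v: v is None or bool(EMAIL_RE.match(v))),
    "country": ("a two-letter upper-case code",
                lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha() and v.isupper()),
}


@dataclass
class IngestResult:
    accepted: list = field(default_factory=list)
    # Each quarantined entry is (record, list of failed expectations)
    quarantined: list = field(default_factory=list)


def ingest(records):
    """Split a batch into accepted and quarantined records before it lands in the lake."""
    result = IngestResult()
    for rec in records:
        failures = [f"{name}: expected {expectation}"
                    for name, (expectation, check) in RULES.items()
                    if not check(rec.get(name))]
        if failures:
            result.quarantined.append((rec, failures))
        else:
            result.accepted.append(rec)
    return result


batch = [
    {"customer_id": "C001", "email": "ann@example.com", "country": "GB"},
    {"customer_id": "", "email": "not-an-email", "country": "uk"},
]
res = ingest(batch)
print(len(res.accepted), len(res.quarantined))  # 1 1
for rec, failures in res.quarantined:
    print(failures)
```

Quarantining rather than dropping failed records keeps them available for review and correction instead of letting them silently disappear or silently pollute the lake.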
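Slide 6 observes that integrating data from multiple sources exposes differences in completeness and consistency, and slide 14 mentions profiling data to measure its quality. A minimal profiling sketch, assuming each source arrives as a list of Python dicts, might compute per-field completeness and a character-pattern summary (letters as `A`, digits as `9`), a common way to spot inconsistent formats. The function names, source names and sample values are illustrative assumptions.

```python
from collections import Counter


def shape(value):
    """Reduce a value to its character pattern: letters -> 'A', digits -> '9', others kept."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in str(value))


def profile(sources, fields):
    """Per source and field: share of non-empty values and the frequency of each pattern."""
    report = {}
    for source, rows in sources.items():
        report[source] = {}
        for f in fields:
            present = [r.get(f) for r in rows if r.get(f) not in (None, "")]
            report[source][f] = {
                "completeness": len(present) / len(rows) if rows else 0.0,
                "patterns": Counter(shape(v) for v in present).most_common(),
            }
    return report


sources = {
    "crm": [
        {"phone": "01632 960001", "postcode": "RG1 1AA"},
        {"phone": None, "postcode": "RG11AA"},
    ],
    "billing": [
        {"phone": "+441632960001", "postcode": "RG1 1AA"},
    ],
}
report = profile(sources, ["phone", "postcode"])
for source, per_field in report.items():
    for f, stats in per_field.items():
        print(source, f, stats["completeness"], stats["patterns"])
```

Here the two sources store phone numbers in different formats and the CRM postcode field mixes spaced and unspaced forms, the kind of inconsistency that has to be resolved before records can be matched into a single customer view.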
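Two of the questions on slide 7, “How current is this data?” and “Are these terms consistent with our business definitions?”, can be turned into automated checks. The sketch below assumes that each dataset records a last-updated timestamp, that an acceptable maximum age has been agreed, and that the business definitions exist as a set of allowed codes; all function names, dataset names and values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def currency_issues(last_updated, max_age, now: Optional[datetime] = None):
    """Datasets not updated within max_age, as (name, age) pairs, oldest first."""
    now = now or datetime.now(timezone.utc)
    stale = [(name, now - ts) for name, ts in last_updated.items() if now - ts > max_age]
    return sorted(stale, key=lambda item: item[1], reverse=True)


def consistency_issues(values, glossary):
    """Distinct values of a coded field that are not in the agreed business glossary."""
    return sorted({v for v in values if v not in glossary})


now = datetime(2015, 11, 30, tzinfo=timezone.utc)
last_updated = {
    "crm": datetime(2015, 11, 29, tzinfo=timezone.utc),
    "billing": datetime(2015, 9, 1, tzinfo=timezone.utc),
}
print(currency_issues(last_updated, timedelta(days=7), now))
print(consistency_issues(["Gold", "gold", "Platinum", "Silver"], {"Gold", "Silver", "Bronze"}))
```

Passing `now` explicitly keeps the check reproducible; in a scheduled job it would default to the current UTC time. Note that the consistency check is deliberately case-sensitive, so `gold` is flagged alongside the undefined `Platinum`.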
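Slide 8 asks whether a severity indicator, medication and employment status can be recovered from a free-text claim note. A very simple approach is dictionary and regular-expression matching, sketched below on the note shown on the slide. The term lists are hypothetical and far from complete; real notes would need proper abbreviation handling and richer text analytics.

```python
import re

# Hypothetical term lists; "oxycotin" is included as spelled in the source note
SEVERITY_TERMS = ["MRI", "CT scan", "numbness", "tingling", "sprain"]
MEDICATION_TERMS = ["oxycontin", "oxycotin", "ibuprofen"]
EMPLOYMENT_PATTERNS = {
    "out of work": re.compile(r"\b(?:OOW|out of work)\b", re.IGNORECASE),
    "returned to work": re.compile(r"\b(?:RTW|returned to work)\b", re.IGNORECASE),
}


def find_terms(text, terms):
    """Terms that occur in the text as whole words, ignoring case."""
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)]


def extract(note):
    """Pull the three attributes slide 8 asks about out of a free-text note."""
    return {
        "severity_indicators": find_terms(note, SEVERITY_TERMS),
        "medication": find_terms(note, MEDICATION_TERMS),
        "employment_status": [label for label, pattern in EMPLOYMENT_PATTERNS.items()
                              if pattern.search(note)],
    }


NOTE = ("Revd new transfer claim ondiary. inj party still OOW and treating. "
        "Atty repped.called atty for status. Been treating for over 4 months now, "
        "sft tissue neck and back sprain. Clmnt complaining of numbness and tingling "
        "in fingers. Clmnt is now being scheduled for MRI and CT scan. RX has been "
        "written for oxycotin for pain. Atty will send all updated meds and records "
        "he has in his file.")
print(extract(NOTE))
```

The need to list a misspelling such as `oxycotin` explicitly shows why keyword matching alone struggles with this kind of text and why the slide calls out unstructured data as a separate challenge.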
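Slide 13 names the workflow stages Parse, Standardize, Match and Commonize without describing them. The sketch below is a toy Python pipeline that borrows those stage names to show how such a workflow can build single customer views; it is not Trillium's implementation or API. The comma-separated input layout, the abbreviation table, the surname-plus-postcode match key, and the reading of “Commonize” as building one common record per matched group are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table used by the standardize step
STREET_ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def parse(raw):
    """Split 'name, address, postcode' text into fields (assumes exactly that layout)."""
    name, address, postcode = (part.strip() for part in raw.split(",", 2))
    return {"name": name, "address": address, "postcode": postcode}


def standardize(rec):
    """Normalize spacing, case and punctuation, and expand street abbreviations."""
    words = re.sub(r"[^\w\s]", "", rec["address"].lower()).split()
    return {
        "name": " ".join(rec["name"].split()).title(),
        "address": " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words),
        "postcode": rec["postcode"].replace(" ", "").upper(),
    }


def match(records):
    """Group records that share an exact surname + postcode key."""
    groups = defaultdict(list)
    for rec in records:
        surname = rec["name"].split()[-1].lower()
        groups[(surname, rec["postcode"])].append(rec)
    return list(groups.values())


def commonize(group):
    """Assumed meaning: one record per group, keeping the longest value of each field."""
    return {f: max((r[f] for r in group), key=len) for f in group[0]}


raw = [
    "Ann Smith, 12 High St., RG1 1AA",
    "ann  smith, 12 high street, rg11aa",
    "Bob Jones, 3 Mill Rd, OX2 6GG",
]
customers = [commonize(g) for g in match([standardize(parse(r)) for r in raw])]
for c in customers:
    print(c)
```

The two Smith records collapse into one customer while Jones stays separate. Production matching usually relies on fuzzy comparisons and weighted scores rather than an exact key; the exact key simply keeps the sketch short.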