Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote

The “Big Data era” has ushered in an avalanche of new technologies and approaches for delivering information and insights to business users. What is the role of the cloud in your analytical environment? How can you make your migration as seamless as possible? This closing keynote, delivered by Joe Caserta, a prominent consultant who has helped many global enterprises adopt Big Data, provided the audience with the inside scoop needed to supplement data warehousing environments with data intelligence—the amalgamation of Big Data and business intelligence.

This presentation was given as the closing keynote at DBTA's annual Data Summit in NYC.

  1. @joe_Caserta #DataSummit: Architecting Data For The Modern Enterprise. Presented by Joe Caserta, May 17, 2017, Data Summit 2017, New York City.
  3. About Joe Caserta
     • Launched Big Data practice
     • Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley)
     • Data Analysis, Data Warehousing and Business Intelligence since 1996
     • Began consulting in database programming and data modeling
     • 25+ years hands-on experience building database solutions
     • Founded Caserta Concepts in NYC
     • Web log analytics solution published in Intelligent Enterprise magazine
     • Launched Data Science, Data Interaction and Cloud practices
     • Laser focus on extending Data Analytics with Big Data solutions
     • Dedicated to Data Governance Techniques on Big Data (Innovation)
     • Awarded Top 20 Big Data Companies 2016
     • Top 20 Most Powerful Big Data consulting firms
     • Launched Big Data Warehousing (BDW) Meetup NYC: 2,000+ Members
     • Awarded Fastest Growing Big Data Companies 2016
     • Established best practices for big data ecosystem implementations
     • Timeline years shown on the slide: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2014, 2016
  4. About Caserta Concepts
     • Consulting Data Innovation
     • Award-winning company
     • Internationally recognized work force
     • Strategy, Architecture, Implementation, Governance
     • Innovation Partner
     • Strategic Consulting
     • Advanced Architecture
     • Build & Deploy
     • Leader in Enterprise Data Solutions
     • Big Data Analytics
     • Data Warehousing
     • Business Intelligence
     • Data Science
     • Cloud Computing
     • Data Governance
  5. Why is Data so Important?
     • Timeline: 1500s Printing Press; 1840s Penny Post; 1850s Telegraph; 1850s Rural Free Post; 1890s Telephone; 1900s Radio; 1950s TV; 1970s PCs; 1980s Internet; 1990s Web; 2000s Social Media, Mobile, Big Data, Cloud
     • Every 60 seconds: 98,000+ tweets; 695,000 status updates; 11 million instant messages; 698,445 Google searches; 168 million+ emails sent; 1,829 TB of data created; 217 new mobile web users
  6. Understanding the Customer
     • Journey stages: Awareness, Consideration, Purchase, Service, Loyalty, Expansion
     • Physical and digital touchpoints (diagram labels): PR, Radio, TV, Print, Outdoor, Word of Mouth, Direct Mail, Customer Service, Search, Paid Content, email, Website/Landing Pages, Social Media, Community, Chat, Social Media, Call Center, Offers, Mailings, Survey, Loyalty Programs, email, Agents, Partners, Ads, Website, Mobile, 3rd Party Sites, Offers, Web self-service
  7. Life As We Know It
     Business: “I need to analyze some new data”
     • IT collects requirements
     • Creates normalized and/or dimensional data models
     • Profiles and conforms the data
     • Sophisticated ETL programs and quality standards
     • Loads it into data models
     • Builds a BI semantic layer
     • Creates dashboards and reports
     IT: “You can access your data in 3-6 months to see if it has value!”
     • Onboarding new data is difficult!
     • Rigid Structures and Data Governance
     • Disconnected/removed from business
  8. The Problem: Shadow IT = Data Sprawl
     • “There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise.” (Michael Vizard, ITBusinessEdge.com)
     • “Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less.” (“The High Cost of Not Finding Information,” IDC)
  10. The New Data Paradigm
      OLD WAY:
      • Structure Data → Ingest Data → Analyze Data
      • Fully Governed
      • Monolith
      NEW WAY:
      • Ingest Data → Analyze Data → Structure Data
      • Just Enough Governance
      • Dynamic
      RECIPE:
      • Data Officer & Data Organization
      • Enterprise Data Lake
      • Corporate Data Pyramid
  11. Big Data Analysis: The Ecosystem of the Future (Cloud-based Data Lake)
      • Stages: Ingest, Persist, Analyze, Deploy
      • Diagram labels: Data Integration; Identity Resolution; Data Quality; Discovery; Exploration; Machine Learning Models Development; Reports / Dashboards; Applications; APIs; Structured Data; Unstructured Data; SQL, NoSQL, Object Store; Find, Share, Collaborate
      • Roles: Data Engineer, Data Scientist, Business Analyst, App Developer
      • Business Value: Allows innovative, industry-leading technologies to be applied rapidly to the business without having to manage compatibility and data complexity.
      • Technical Value: Provides an open framework to reduce the number of integration points and testing environments needed to deliver business solutions.
  12. Corporate Data Pyramid (CDP)
      • Diagram labels: Ingest Raw Data; Organize, Define, Complete; Munging, Blending; Machine Learning; Data Quality and Monitoring; Metadata, ILM, Security; Data Catalog; Data Integration; Fully Governed (trusted); Arbitrary/Ad-hoc Queries and Reporting; Usage Pattern; Data Governance; Metadata, ILM, Security
  13. The Data Lake on the Cloud

      | Cloud Component | AWS | Google | Microsoft |
      |---|---|---|---|
      | Scalable distributed storage | S3 | GCS | Azure Storage |
      | Pluggable fit-for-purpose processing | EMR | DataProc | HDInsight |
      | Compute Services | EC2 | GCE | VMs |
      | Consistent extensible framework | Spark | Spark | Spark |
      | Dimensional MPP Data Warehouse | Redshift | BigQuery | Azure SQL Data Warehouse |
      | Data Streaming | Kinesis | PubSub | Azure Stream |
      | Common Interface | Jupyter | DataLab | Azure Notebook |

      • Remove barriers between data ingestion and analysis
      • Democratize data with Just Enough Data Governance (JEDG)
  14. Which Cloud?
  15. The Clouds Coalesce
      • Percent of organizations with AWS as primary that also use GCP
      • Percent of organizations with AWS as primary that also use Azure
      • Percent of organizations with GCP as primary that also use AWS
      • Figures shown on the slide: 41%, 32%, 31%
      • Source: Clutch, 2016
  16. Why Spark?
      • Development local or distributed is identical
      • Beautiful high-level APIs
      • Full universe of Python modules
      • Open source and free
      • Blazing fast!
      Spark has become our default processing engine for data engineering & science.
  17. Analytics Development Lifecycle
      • Data Science is performed in ephemeral workspaces
      • The work products of data science are promoted from “insights” to real applications
      • Rigorous Data Governance applied
      • Processes must be hardened, repeatable, and performant
      • Tiers (top to bottom): Big Data Warehouse; Data Science Workspace; Data Lake – Integrated Sandbox; Landing Area – Source Data in “Full Fidelity”
      • Diagram labels: New Data, New Insights, Governance, Refinery
  18. Unexpected Reaction to Change
  19. Moving the Status Quo
      • Forces for Change: Global economics; Intensity of competition; Reduce costs; Move to cross-functional teams; New executive leadership; Speed of technical change; Social trends and changes
      • Forces Resisting Change: Period of time in present role; Status & perks of office/dept under threat; No apparent reasons for proposed changes; Lack of understanding of proposed changes; Fear of inability to cope with new technology; Concern over job security
      • Source: http://www.change-management-coach.com/force-field-analysis.html
  20. Introducing the Chief Data Officer
      • Evangelize a data vision for the organization
      • Support & enforce data governance policies via outreach, training & tools
      • Monitor and enforce data quality in collaboration with data owners
      • Monitor and enforce data security along with Legal/Security/Compliance
      • Work with IT to develop/maintain an enterprise repository of strategic data
      • Set standards for analytical reporting and generate data insights
      • Provide a single point of accountability for data initiatives and issues
      • Innovate ways to use existing data
      • Enrich and augment data by combining internal and external sources
      • Support efficient and agile analytics through training and templates
  21. The CDO: The Whole Brain Challenge
      • Axis labels: Front, Back
      • Analytics Oriented: Data Science, Research
      • Process Oriented: Data Governance, Compliance
      • Operations Oriented: Shared Services, Data Engineering
      • Revenue Oriented: Revenue Goals, Monetizing Data
  22. Agile Data Teams
      • Chief Data Organization (Oversight)
      • Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]: Product Owner; SCRUM Master
      • Agile Development Team: Business Subject Matter Expertise; Data Librarian/Data Stewardship; Data Science/Statistical Skills; Data Engineering/Architecture; Presentation/BI Report Development Skills; Data Quality Assurance; DevOps
      • IT Organization (Oversight): Enterprise Data Architect; Solution Engineers; Data Integration Practice; User Experience Practice; QA Practice; Operations Practice
      • Advanced Analytics: Business Analysts; Data Analysts; Data Scientists; Statisticians; Data Engineers
      • Planning Organization: Project Managers
      • Data Organization: Data Gov Coordinator; Data Librarians; Data Stewards
  23. Caution: Assembly Required
      • Some of the most hopeful tools are brand new or in incubation
      • Enterprise big data implementations typically combine products with custom built components
      • The Buildout: Data Integration & Quality; Data Catalog & Governance; Emerging Solutions
      People, Processes and Business commitment are still critical!
  24. What the Future Holds
      • DevOps for Analytics
      • Search-Based BI (NLP)
      • Artificial Intelligence (AI)
      • Virtual Reality BI (VR)
      • Virtual Assistant BI (Voice)
      • Reporting/Predictions Converge
      • Citizen Data Scientists Emerge
  25. Joe Caserta, President, Caserta Concepts. joe@casertaconcepts.com, @joe_caserta. Thank You!
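
The "new way" on slide 10 (ingest, then analyze, then structure) is often described as schema-on-read. The following is a minimal Python sketch of that ordering, not the presenter's implementation. The sample records, the function names `ingest`, `analyze` and `structure`, and the 50% field-frequency rule are all assumptions made for illustration.

```python
import json
from collections import Counter

# Hypothetical raw records, landed exactly as received (no schema enforced).
RAW_LINES = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "amount": "12.50", "channel": "mobile", "coupon": "X1"}',
]


def ingest(lines):
    """Step 1: ingest. Parse each line but keep every field as-is."""
    return [json.loads(line) for line in lines]


def analyze(records):
    """Step 2: analyze. Profile which fields occur, and how often."""
    return Counter(field for record in records for field in record)


def structure(records, field_counts, min_share=0.5):
    """Step 3: structure. Keep fields present in at least min_share of the
    records (an assumed rule) and fill missing values with None."""
    columns = sorted(
        field
        for field, count in field_counts.items()
        if count / len(records) >= min_share
    )
    rows = [tuple(record.get(column) for column in columns) for record in records]
    return columns, rows


records = ingest(RAW_LINES)
field_counts = analyze(records)
columns, rows = structure(records, field_counts)
print(columns)
print(rows[1])
```

The point of the ordering is that the table layout is derived from what the data actually contains, rather than being fixed before any data is loaded.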

Editor's notes

  • Capture, Analyze, influence, and maximize every touchpoint online and offline
  • Ask DG effectiveness questions.
  • Recent article - Oct 21, 2015
  • 80% of all businesses are doing something
  • The paradigm shift is in the way we onboard and process data:

    Formerly, we structured data before we ingested and analyzed it. Now, we ingest and analyze data, and then structure it.
    This allows immediate access for both analysts and data scientists
    Streamlines the path to the cash register
    We have also moved from fixed capacity to on-demand infrastructure
    Large datasets and new datasets are being added at a rapid rate
    They could grow or shrink on demand; many of the providers are startups
    This minimizes the cost of operation
    From Monolith to Ecosystem
    No one set of tools will solve everything
    Use a diverse set of technologies, and let them evolve over time
    Solve for this using a combination of three concepts:
    Cloud Computing, Data lake, and the Polyglot Warehouse.
  • Data has a different audience and usage pattern at each tier.
    All tiers work cohesively to comprise the Big Data Ecosystem
    All tiers are governed. Only the top tier is fully governed
    Use late binding: decide when to structure data on a case-by-case basis.
    7 components of gov: Org, Metadata, Security, DQ, Business Integration, MDM, ILM

    Organization
    This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.
    Metadata
    Definitions, lineage (where does this data come from), business definitions, technical metadata
    Privacy/Security
    Identify and control sensitive data, regulatory compliance
    Data Quality and Monitoring
    Data must be complete and correct. Measure, improve, certify
    Business Process Integration
    Policies around data frequency, source availability, etc.
    Master Data Management
    Ensure consistent business-critical data, e.g. Members, Providers, Agents, etc.
    Information Lifecycle Management (ILM)
    Data retention, purge schedule, storage/archiving
  • https://clutch.co/cloud/resources/amazon-web-services-vs-google-cloud-platform-vs-microsoft-azure

  • “Big Box” tools vs ROI?
    Prohibitively expensive → limited by licensing $$$
    Typically limited to the scalability of a single server
  • Cascading, Zementis
  • I’ve been doing it this way for 15 years. It works, don’t mess with it! People must learn: Evolution is inevitable. Evolve or die.
  • Kurt Lewin’s Force Field analysis
  • Data Governance
    Data Insight
    Generate Revenue
    Reduce Risk
  • Over the course of my 30-year career, more change has occurred in the last three years than in the previous 27 combined. This has been the most disruptive period in data science that I’ve seen.
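
The "Data Quality and Monitoring" note above (data must be complete and correct; measure, improve, certify) can be illustrated with a small completeness check. This is a sketch in Python under stated assumptions: the sample records, the list of required fields, and the 0.95 certification threshold are hypothetical and do not come from the presentation.

```python
# Hypothetical member records; None or "" marks a missing value.
RECORDS = [
    {"member_id": "m1", "provider": "p9", "agent": "a3"},
    {"member_id": "m2", "provider": None, "agent": "a3"},
    {"member_id": "m3", "provider": "p7", "agent": ""},
    {"member_id": "m4", "provider": "p7", "agent": "a1"},
]
REQUIRED = ("member_id", "provider", "agent")  # assumed required fields


def completeness(records, required):
    """Measure: share of required values that are actually populated."""
    total = len(records) * len(required)
    filled = sum(
        1
        for record in records
        for field in required
        if record.get(field) not in (None, "")
    )
    return filled / total if total else 0.0


def certify(score, threshold=0.95):
    """Certify: pass only when the score meets the assumed threshold."""
    return score >= threshold


score = completeness(RECORDS, REQUIRED)
print(f"completeness={score:.3f} certified={certify(score)}")
```

Here 10 of the 12 required values are populated, so the dataset falls below the assumed threshold and would go back for improvement before being certified.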
