2017 OpenWorld Keynote for Data Integration

Vice President of Product Management at Oracle
May 3, 2021
2017 OpenWorld Keynote for Data Integration

  1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | OpenWorld 2017 Data Integration Platform Keynote Next-Gen Enterprise Data Management Jeff Pollock Vice President, Product Management PaaS and Big Data Integration & Governance October 02, 2017 Confidential – Oracle Internal/Restricted/Highly Restricted
  2. Our Most Innovative Customers are on a Journey to Cloud… Cloud Platform; On-Prem Operations; Insights from Analytics; Move Workloads; Embrace SaaS; Modernize AppDev.
  3. Digital Transformation is the Key Business Driver… Photo Film; Music Industry; Maps; Television; Spotify; Netflix; Smartphone; Waze; Yellow Pages; Yelp.
  4. Integrated Applications, IT & Data: Managed as one. Business & economic model; Strategic execution & delivery; Common resources; Business opportunities.
  5. Integrated Apps, Data & IT are Mandatory for Success… Cloud Platform; On-Prem Operations. Move Workloads; Embrace SaaS; Insights from Analytics; Modernize AppDev. Migrate Oracle and Non-Oracle Workloads; Disaster Recovery in the Cloud; Move Data Warehouses; Connect and Extend Apps; Integrate & Automate SaaS with On-Prem; Extend for Social, Mobile, Process; Unify SSO and Security; Gain Insights from Combined Analytics; Build Cloud Native Apps; Dev/Test Environments; Visual Development; Innovate with Intelligent Bots; Migrate Analytics, Warehouse; Enable Smart Self-Service; Insights across Data Lakes.
  6. Oracle Integration Platform: Comprehensive Best-of-Breed Capabilities for All Integration Needs. Integration for… Applications; Infrastructure; Analytics. Cloud Integrations; On-Premises Integrations. Unified Technology Platform (PaaS).
  7. Unified Integration Capabilities: Converged Solution for All Integration Needs. Integration for… Applications; Infrastructure; Analytics. Unified Technology Platform (PaaS): Application Integration; API Management; Process Integration; Stream Processing; Data Replication; Bulk Data ETL & E-LT; Metadata Management; Data Quality.
  8. Oracle Integration Platform: Converged Solution for All Integration Needs. Complete; Simplified; Open. DATA GOVERNANCE; PROCESS AUTOMATION; STREAM ANALYTICS; API MANAGEMENT; APPLICATION INTEGRATION; DATA QUALITY; BULK DATA TRANSFORMATION; REAL TIME DATA STREAMING AND DATA REPLICATION.
  9. NEW: Oracle Data Integration Platform. Integrate Cloud and On-Premises Data Lakes and Data Warehouses: …a Unified solution …that’s Easy to use …for Powerful data-driven solutions. Key Capabilities: 1. Data High Availability; 2. Data Migrations; 3. Data Warehouse Automation; 4. Databus & Stream Integration; 5. Data Governance.
  10. DIPC Solution Use Cases. Data High Availability: Database record level sharding; Multi-Region Cloud Availability (Oracle or Amazon); Active-Active Databases. Data Migrations: Migrate from Amazon RDS to Oracle Cloud; PeopleSoft or Workday into Fusion HCM; Oracle Database Migrations into 12c. DW/Mart Automation: Customer 360 from Salesforce or Sales Cloud; Marketing Analytics on Big Data Cloud; Move a Data Warehouse into the Cloud. Streaming Integration and Data Governance: Streaming ETL for Data Pipelines; 3 Kinds of Data Lineage for LoB and IT Users; Serving Layer for Raw Data Access; Prepared Data Subscriptions for LoB; Data Catalog and Policies; Data Profiling and Cleansing.
  11. BUT: Data Management is going through a major transformation…
  12. After 20yrs Reign… Hub-and-Spoke is now a Legacy. LEGACY: • ODS & ETL Hubs • EDW/Mart Hubs • MDM/RDM Hubs • Static Data Lake Hubs. NEXT-GEN: • Pub/Sub for Staging • ETL in Pipelines • Analytics/CEP in Stream • Data is in Motion. Classical Data Management, Hub and Spoke: • Invasive on Sources • High Latency / SLA • Mainly Relational Views • Heavy IT process overhead • Vendor-centric software. Next-Gen, Streaming Databus/Kappa: • Low impact on Sources • Low Latency (< 1 second) • Variety of Data Formats • More Agile DevOps processes • Open source centric software. Physical Layer for ETL Pipelines = MPP Streaming (e.g., Apache Spark Streaming). Physical Layer for Events = MPP Messaging (e.g., Apache Kafka). [Diagram labels, legacy: App DB; ERP; Operational Data Store; EDW (Staging, Prod); ETL; Mart; MDM Hub; Enterprise BI; Departmental BI; Discovery. Diagram labels, next-gen: App DB; ERP; WebApps; Mobile; NoSQL / APIs; GoldenGate; Discovery RESTful API for Producers and Subscribers (events are pushed); Raw Data Topics; Schema Event Topics; Data Pipeline (ETL); Prepared Data Topics; Master Data Topics; 1,000’s, 100’s, 10’s; EDW; NoSQL; Hadoop / Spark; Marts; Enterprise BI; Departmental BI; Apps / Mobile. Both diagrams run along an axis from Less Governed to More Governed.]
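
The databus idea on slide 12 (producers append events to topics, and each subscriber reads at its own pace without touching the source) can be illustrated with a toy in-memory log. This is only a sketch: the `Databus` class, the topic name, and the subscriber names are hypothetical, it is not the Kafka API, and it polls for simplicity where the slide describes pushed events.

```python
from collections import defaultdict


class Databus:
    """Toy in-memory stand-in for a Kafka-style log (hypothetical, not a real API)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> append-only event log
        self._offsets = defaultdict(int)   # (topic, subscriber) -> next offset to read

    def produce(self, topic, event):
        """Append an event to a topic and return its offset in the log."""
        log = self._topics[topic]
        log.append(event)
        return len(log) - 1

    def poll(self, topic, subscriber):
        """Return the events this subscriber has not seen yet and advance its offset."""
        key = (topic, subscriber)
        log = self._topics[topic]
        start = self._offsets[key]
        self._offsets[key] = len(log)
        return log[start:]


bus = Databus()
bus.produce("raw.orders", {"op": "INSERT", "order_id": 1})
bus.produce("raw.orders", {"op": "UPDATE", "order_id": 1})

# Two independent consumers read the same log; neither queries the source system.
warehouse_batch = bus.poll("raw.orders", "warehouse")
analytics_batch = bus.poll("raw.orders", "analytics")

bus.produce("raw.orders", {"op": "DELETE", "order_id": 1})
warehouse_next = bus.poll("raw.orders", "warehouse")   # only the new event
```

Because every subscriber keeps its own offset, adding a consumer adds no load on the source, which is the contrast with hub-and-spoke that the slide draws.
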
  13. Core Design Pattern: Kappa-style Databus. Our Vision is to enable the modern ‘Kappa-style’ data architecture for Enterprise Strength solutions: • Raw Data Layer: common ingestion point for all enterprise data sources • Speed Layer: data processing for streaming data and ETL data pipelines, in-memory • Batch Layer: data processing for huge data volumes that may span long time periods, using MPP • Serving Layer: technologies for easy access to any data, at any latency. [Diagram labels: Business Data (Data Streams; Social and Logs; Enterprise Data; Highly Available Databases); Raw Data Layer (Raw Events; Changed Data; Schema Events); Speed Layer (Databus (topic modeling); Stream Analytics; ETL Data Pipelines); Batch Layer (Data Staging or Archive; Data Discovery; ETL Offload); Serving Layer (Pub / Sub; REST APIs; NoSQL; Bulk Data); Apps; Analytics; EDWs.]
  14. Oracle Approach: Blend of Commercial + Open Source. Modern Architecture will be a ‘Hybrid Open-Source’ pattern: • Open Source at the core of speed and batch processing engines for general purpose data workloads • Enterprise Vendors for connecting to legacy systems, strong governance, and for highly optimized workloads • Cloud Platforms for Dev-Test (at least), rapid prototyping and eventually all production workloads • SaaS & Applications are key data “producers” and will remain largely proprietary and/or highly customized. [Diagram labels: Business Data (Data Streams; Social and Logs; Enterprise Data; Highly Available Databases); Raw Data Layer; Speed Layer; Batch Layer; Serving Layer (Pub / Sub; REST APIs; NoSQL; Bulk Data); Apps; Analytics; EDWs.]
  15. Proof this is a Pattern: Many Instantiations. Kafka; Storm | Spark | Apex | Flink; MapReduce | Pig | Hive | Spark; Cassandra | HBase; Hive. Event Hubs; Stream Analytics; Data Lake; Table Storage; SQL Server; Data Factory. Kinesis; Firehose; EMR; Dynamo; Redshift; DMS. Pub/Sub; Dataflow; Dataproc; BigTable; BigQuery.
  16. Best-of-Breed: Oracle Platform for Kappa-style Architecture. Oracle Software can help customers Accelerate & Reduce Risk around adoption: • Ingest Data: with lower latency, greater reliability and from any database, using Oracle GoldenGate • ETL Pipelines for Data: automate pipeline creation with zero footprint, using Oracle Data Integrator • Analyze Data In-Motion: run temporal, spatial and predictive algorithms with Oracle Stream Analytics • Foundation Services: for hosting Kafka (Event Hub), Spark/Hadoop (Big Data Cloud) or Relational (Database) • Govern: the data flowing through the Kappa architecture, with Oracle Metadata Management. [Diagram labels: GoldenGate; Data Integrator; Stream Analytics; Event Hub; Big Data; Database; Metadata Management (for Data Governance); Business Data; Raw Data Layer; Speed Layer; Batch Layer; Serving Layer; Apps; Analytics; EDWs.]
  17. Kappa at Massive Scale Using eBay’s Rheos. Presented by Connie Yang, Principal MTS for eBay Data Platform, eBay Software Engineering. October 02, 2017.
  18. Rheos: A Business Focused Real-Time Data Platform ✓ Fully managed real-time streaming data platform built with Oracle GoldenGate, Kafka, MirrorMaker and Storm ✓ Provide shared, curated, “private” streams and stream processing computation running on eBay cloud ✓ Dynamic stream endpoint discovery ✓ Standardized data format & stream catalog ✓ Secure stream access control ✓ Data movement across security zones over a TLS connection ✓ Comprehensive monitoring, alerting and remediation
  19. Business Motivation. Value: ✓ Data Democratization ✓ Real-Time Seller Insights ✓ Data-Driven Recommendation ✓ Data-Driven Business Models ✓ Higher Conversion Rates. Method: ✓ Standardized event header with Avro and stream namespaces ✓ A schema registry to store metadata or schema definition for each stream ✓ Logical to physical stream mapping ✓ Lifecycle Management Service for node provisioning, replacement, administering remediation SOPs ✓ End-to-end monitoring and alerting at the stream, node and cluster level ✓ Stream access authentication via Identity Service ✓ Data mirroring to support use cases’ HA model as well as their data movement requirements
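
Three of the methods on slide 19 (a standardized event header, a schema registry per stream, and logical-to-physical stream mapping) fit together roughly as sketched below. Everything here is an assumption made for illustration: the header fields, the `StreamCatalog` class, and the stream, cluster and topic names are hypothetical and do not describe Rheos internals. The header is written as an Avro-style record schema held in a plain dict, so no Avro library is needed.

```python
# Hypothetical Avro-style record schema for a standardized event header.
EVENT_HEADER_SCHEMA = {
    "type": "record",
    "name": "EventHeader",
    "namespace": "example.streams",
    "fields": [
        {"name": "eventId", "type": "string"},
        {"name": "stream", "type": "string"},
        {"name": "schemaVersion", "type": "int"},
        {"name": "createdAtMs", "type": "long"},
    ],
}


class StreamCatalog:
    """Toy schema registry plus logical-to-physical stream mapping (illustrative only)."""

    def __init__(self):
        self._schemas = {}    # logical stream name -> list of schema versions
        self._physical = {}   # logical stream name -> (cluster, topic)

    def register_schema(self, stream, schema):
        """Store a new schema version for a stream and return its 1-based version."""
        versions = self._schemas.setdefault(stream, [])
        versions.append(schema)
        return len(versions)

    def latest_schema(self, stream):
        """Return (version, schema) for the newest schema of a stream."""
        versions = self._schemas[stream]
        return len(versions), versions[-1]

    def map_stream(self, stream, cluster, topic):
        """Bind a logical stream name to a physical cluster and topic."""
        self._physical[stream] = (cluster, topic)

    def resolve(self, stream):
        """Look up where a logical stream physically lives."""
        return self._physical[stream]


catalog = StreamCatalog()
version = catalog.register_schema("marketplace.orders", EVENT_HEADER_SCHEMA)
catalog.map_stream("marketplace.orders", cluster="kafka-zone-a", topic="orders.v1")
cluster, topic = catalog.resolve("marketplace.orders")
```

Clients that address streams only by logical name can be moved to another cluster by changing one mapping entry, which is presumably the point of keeping the logical and physical names separate.
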
  20. Rheos Services. Lifecycle Management Service: a cloud service that provisions and provides full lifecycle management for Zookeeper, Kafka, Storm, MirrorMaker and (soon to be available) Flink clusters. Core Service: consists of these components: Kafka Proxy Server, Schema Registry, Metadata System, and Management. Health Check Service: monitors the health of each asset (for example, a Kafka, Zookeeper, or MirrorMaker node) that is provisioned through the Lifecycle Management Service, in these aspects: node state, cluster health, source & sink traffic, lag, etc. Mirroring Service: provides high data availability and integrity by mirroring data from a source cluster to one or more target clusters. This service is also used to perform data movement across security zones.
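
Slide 20 lists lag among the aspects the Health Check Service monitors. As a rough sketch of what a lag check computes, consumer lag per partition is commonly taken as the newest offset in the log minus the consumer's committed offset. The function names, the sample offsets and the threshold below are hypothetical; how Rheos itself measures lag is not described in the slides.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition: newest log offset minus the consumer's committed offset.

    A partition with no committed offset is treated as fully unread (assumption).
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


def lag_status(lag_by_partition, max_total_lag):
    """Return ("OK" or "LAGGING", total lag) against a hypothetical alert threshold."""
    total = sum(lag_by_partition.values())
    return ("OK" if total <= max_total_lag else "LAGGING", total)


end_offsets = {0: 1_500, 1: 1_200, 2: 900}   # newest offset per partition
committed = {0: 1_480, 1: 1_200}             # partition 2 has no commit yet
lag = partition_lag(end_offsets, committed)
status, total = lag_status(lag, max_total_lag=500)
print(lag, status, total)
```
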
  21. Fun Facts. Rheos @ Scale and Alignment with Oracle: 232+ OGG producers; 200+ streams; > 200 billion events per day; 840+ stream producers; 1400+ stream consumers; 2500+ compute nodes; 90+ Oracle tables; > 28 billion change events per day; second(s) latency from DB to Kafka.
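
To put the daily volumes on slide 21 in perspective, they can be converted to average per-second rates. The snippet below is plain arithmetic on the two quoted figures (more than 200 billion events per day and more than 28 billion change events per day). Because both are stated as lower bounds, the results are lower bounds on the average rate and say nothing about peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def per_second(events_per_day):
    """Average whole events per second implied by a daily total."""
    return events_per_day // SECONDS_PER_DAY


total_rate = per_second(200_000_000_000)    # > 200 billion events per day
change_rate = per_second(28_000_000_000)    # > 28 billion change events per day

print(f"all streams: at least {total_rate:,} events/s on average")
print(f"change events: at least {change_rate:,} events/s on average")
```

That works out to at least about 2.3 million events per second overall and about 324 thousand change events per second, averaged over a day.
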
  22. What’s Next? ✓ Upgrade to Oracle Integrated Extract based solution ✓ Provide Flink as Rheos’ stream processing framework ✓ Full lifecycle management for stream processing applications ✓ Run Flink and Kafka as Kubernetes cloud-natives
  23. THANK YOU! Presented by Connie Yang, Principal MTS for eBay Data Platform, eBay Software Engineering. October 02, 2017.
  24. Sushi Principle of Data: “Data is Best Served Raw”
  25. Sushi Principle of Data: “Data is Best Served Raw”. Many customers want to consume their data “raw” …they prefer it close to the source of truth. [Diagram labels: All Enterprise Data Sources (Poly-Structured; Relational); <produce>; RAW DATA; SCHEMA EVENTS; <subscribe>.]
  26. State of the Art Data Ingestion: GoldenGate + Kappa. GoldenGate for Big Data: Fastest, most scalable and non-invasive way to ingest data into Apache. Benefits: low impact on Sources, microsecond access to transactions, and the ability to replicate schema (DDL) events for downstream automation of change impact. Supported Platforms: Kafka; HDFS. GG used with 4 of top 5 largest Kafka clusters in the world… From user update to serving layer in <1 second & no impact on Source. [Diagram labels: User Updates; DBMS Updates; GG (Capture; Trail; Pump; Route; Deliver); Raw Data Layer; Speed Layer; Batch Layer; Streaming Analytics; Serving Layer (REST Services; Visualization Tools; Reporting Tools; Data Marts); Apps Layer (Application).]
  27. De-Coupling of the Database: Downstream Processing. Mid-Tier for Log Mine: Eliminate overhead on DBMS. [Diagram labels, first configuration: Business Apps; Primary Site (Primary; Secondary); Active DataGuard; WAN; REDO Transport; Log Mine; GoldenGate (Capture; Trail; Pump; Route; Deliver). Diagram labels, second configuration: Remote DR Host: Eliminate overhead on DBMS; Business Apps; Primary Site (Primary; Secondary); AlwaysOn; WAN; Remote Standby; GoldenGate (Capture; Trail; Pump; Route; Deliver).]
  28. …But Sometimes Fully Prepared / Cooked is Needed
  29. Prepared Data: ETL to “Cook” the Data for Consumption. Business-oriented consumers usually prefer that IT prepare the data for them. [Diagram labels: All Enterprise Data Sources (Poly-Structured; Relational); <produce>; RAW DATA; SCHEMA EVENTS; ETL; PREPARED DATA; MASTER DATA; <subscribe>.]
  30. ETL Pipelines with Data Integrator. Data Integrator for Big Data: ✓ Batch data ingestion with Sqoop, native loaders & Oozie ✓ Generate data transformations in Hive, Pig, Spark & Spark Streaming ✓ Extract data into external DBs, Files or Cloud. Compare to Informatica / Talend: ✓ NoETL Engine: native E-LT execution, 1000’s of references ✓ Zero Footprint: does not require any Oracle install on cluster ✓ Loosely Coupled: design time means you can reuse mapping logic in many big data languages. [Diagram labels: GG (Capture; Trail; Pump; Route; Deliver); SQOOP; API/File; SQOOP + Native Loaders; Oracle Data Integrator; Raw Data Layer; Speed Layer; Batch Layer; Streaming Analytics; Serving Layer (REST Services; Visualization Tools; Reporting Tools; Data Marts).]
  31. A Common Data Pattern: Access Data from REST/Kafka. [Diagram labels: All Enterprise Data Sources (Poly-Structured; Relational); <produce>; RAW DATA; SCHEMA EVENTS; ETL; PREPARED DATA; MASTER DATA; <subscribe>; Data Science; Data Analysts; Business Analyst; DBAs.]
  32. Kappa Data Flow Pattern using Oracle Tech Stack. 1 Topic : 1 Table. [Diagram labels: Data Producers (Entire Enterprise Database Estate; DBMS Updates); GoldenGate; Oracle Event Hub (Raw Data (LCR); Schema Events (DDL); Prepared Data Topics; Master Data); ETL; <generate>; Data Integrator; Stream Analytics (CQL & Spatial; Analytic Data); Oracle Big Data; Dev / Test Env.; API Management; <subscribe>; Data Consumers (Applications; Streaming Analytics; ODS (Data Store); Big Data Lake; Data Warehouses).]
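
The "1 Topic : 1 Table" convention on slide 32, with schema (DDL) events kept on their own stream, amounts to a routing rule from change record to topic name. The sketch below is one hypothetical way to express it: the record layout, the `raw.<owner>.<table>` naming scheme and the `schema-events` topic name are assumptions, not GoldenGate or Oracle Event Hub settings.

```python
def topic_for(record):
    """Pick a topic: one topic per source table, plus one shared topic for DDL."""
    if record["kind"] == "DDL":
        return "schema-events"
    return f"raw.{record['owner']}.{record['table']}".lower()


# Hypothetical change records as they might arrive from a capture process.
records = [
    {"kind": "DML", "owner": "SALES", "table": "ORDERS", "op": "INSERT"},
    {"kind": "DML", "owner": "SALES", "table": "CUSTOMERS", "op": "UPDATE"},
    {"kind": "DDL", "owner": "SALES", "table": "ORDERS", "op": "ALTER"},
    {"kind": "DML", "owner": "SALES", "table": "ORDERS", "op": "DELETE"},
]

# Group operations by destination topic, preserving arrival order within a topic.
routed = {}
for record in records:
    routed.setdefault(topic_for(record), []).append(record["op"])

print(routed)
```

Keeping each table on its own topic preserves per-table ordering for subscribers, while the separate schema topic lets downstream pipelines react to structural changes.
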
  33. If Transaction Data Were Food… Four stages, from VERY RAW through SYNTACTIC PREPARATION and RECORD LEVEL VALIDATION to AGGREGATE DATA. Raw (Native Source Events): LCR$_ROW_RECORD type (LONG, LONGRAW, or LOB), which contains the following attributes: • source_database_name • command_type • object_owner • object_name • tag • transaction_id • scn • old_values • new_values. Prepared (Events as JSON): gg.handler.kafkahandler.Format (JSON), for example {"address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "ssn": "646554567" }. Seared (Validated JSON Topics): Topic Policy = phoneNumber(!NULL); gg.handler.kafkahandler.Format (JSON), for example { "firstName": "John", "lastName": "Smith", "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] }. Fully Cooked (Aggregate Topics), for example { "firstName": "Jonathan", "lastName": "Smith", "age": 25, "address": { "streetAddress": "101 Main Street", "city": "San Francisco", "state": "CA", "postalCode": "27519" }, "phoneNumber": [ { "type": "cell", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] }. Definitions: • Raw Records: LCRs from Databases; Log Events from Web/Mobile; App Events from SaaS or ERP Applications • Raw Data: sparsely populated raw records (e.g., changes only) but syntactically normalized in JSON format • Validated Data: the fully populated record, with bad records filtered or light transformations applied; records still 1:1 with Source • Master Data: composite records that have had ETL aggregations and may have merged attributes from many sources/topics or joins back to DBs
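
The step from a sparse JSON change event to a validated record on slide 33 can be sketched as "fill in the full record, then apply the topic policy". The slide does not say how the full record is populated; merging the change onto a last-known snapshot is an assumption here, as are the function names. The policy check mirrors the slide's `phoneNumber(!NULL)` rule, and the sample data is taken from the slide.

```python
def apply_change(snapshot, change):
    """Overlay a sparse change onto the last known record (shallow merge, assumption)."""
    merged = dict(snapshot)
    merged.update(change)
    return merged


def passes_policy(record):
    """Topic policy from the slide: phoneNumber must not be null or missing."""
    return record.get("phoneNumber") is not None


# Last known state of the record (hypothetical snapshot held by the pipeline).
snapshot = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

# Sparse "changes only" event, as in the slide's Events-as-JSON example.
change = {
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021",
    },
    "ssn": "646554567",
}

sparse_ok = passes_policy(change)        # the sparse event alone lacks phoneNumber
full_record = apply_change(snapshot, change)
full_ok = passes_policy(full_record)     # the populated record satisfies the policy
```

The shallow merge replaces nested objects such as `address` wholesale; a real pipeline would need to decide whether nested fields should be merged individually.
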
  34. If Transaction Data Were Food…How Will You Eat Yours?