IBM+Hortonworks = Transformation of the Big Data Landscape

Last year IBM and Hortonworks jointly announced a deep strategic partnership. Join us as we take a close look at the partnership's accomplishments and the joint road ahead with industry-leading analytics offerings.

View the webinar here:

Published in: Technology

  1. Think 2018 / DOC ID / Month XX, 2018 / © 2018 IBM Corporation
      Webinar: IBM-Hortonworks Partnership: How it is changing the Big Data landscape
      • Satheesh Bandaram, Director, IBM Big Data Development
      • Srikanth Venkat, Senior Director, Hortonworks Product Management
  2. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  3. Contents
      • IBM-Hortonworks partnership
          − Announcement
          − Where are we today
          − Typical upgrade scenario
          − IBM Hortonworks Data Platform on Cloud
      • Joint roadmap for 2018
          − Roadmap for 2018
          − DSX roadmap
          − IBM Db2 BigSQL plans
          − IBM BigIntegrate/BigQuality
      • Hortonworks offerings
          − HDP
          − HDF
      • Community initiatives
          − ODPi Data Governance
      • Typical use cases
          − EDW
          − Governed data lakes
  4. IBM-Hortonworks partnership
  5. © 2017 IBM Corporation
      IBM and Hortonworks Announce Partnership to Bring BigSQL/Data Science on an Open Platform
      • The partnership, announced June 13th 2017, showcases Hortonworks' recognition of IBM as the leader in data science, and provides Hadoop clients a clear choice for open data science vs. the limited and proprietary offerings from vendors such as Cloudera and MapR.
      • Specifically announced:
          − Hortonworks adopts IBM Data Science Experience (DSX) as its Data Science Platform.
          − Hortonworks adopts IBM Big SQL as its SQL engine for complex analytics on Hadoop.
          − IBM will leverage Hortonworks HDP as the core Hadoop distribution for Big SQL and DSX.
  6. Where are we today?
      ✓ In-place Express upgrade available from IBM BigInsights to HDP and BigSQL
      ✓ BigInsights 4.2.5, released in June, is a controlled update. Customers should move to HDP and BigSQL directly.
      ✓ More than 50% of customers upgraded to the latest HDP and BigInsights versions of 2017
          − About 20% upgraded to BI 4.2.5
          − About 30% upgraded to HDP/Big SQL
          − Several large GEP customers upgraded successfully
      ✓ Clusters included other IBM products in business applications / solutions
      ✓ Some feedback from customers on the upgrade:
          − Satisfied with upgrade process and workload execution after upgrade
          − Happy with upgrade of 3 Big SQL clusters
          − Happy with IBM’s commitment to successful completion during the process
  7. Typical Upgrade Scenario
      • Phase 0: Pre-Upgrade Analysis
          − Activities: Review customer environment. Gather upgrade checklist details.
          − Cost: No cost to customer
      • Phase 1: Upgrade Assistance Program
          − IBM will provide an inventory of documentation, blogs and other resources for all customers
      • Phase 2: Upgrade execution (IBM Services driven)
          − Resources: One on-site BI to HDP upgrade SME and a solution implementation manager
          − Duration: TBD
          − Activities: TBD
          − Cost: TBD + T&L
      • Phase 2: Upgrade execution (customer self-driven)
          − Resources: IBM to provide remote support
          − Activities: Customer-driven scheduling and time frames
          − Cost: No cost to customer
  8. IBM Hortonworks Data Platform on Cloud
      IBM Data Science Experience, IBM Db2 Big SQL, Hortonworks Data Platform
      A hosted offering to meet the Data Lake, Data Analytics and Data Science requirements of our customers.
      • Offering: Hosted on IBM SoftLayer, customers get a turnkey platform with Hortonworks Data Platform, IBM BigSQL and IBM DSX in a couple of hours.
      • Value proposition:
          − Eliminate the overhead of investing in and managing hardware infrastructure.
          − Encrypted, secured and private cloud environment.
      • Services & support: Superior services and support are available to help customers get more value from the hosted environment.
      • Dates and contact info: You can order this offering today. Contact: Jerry Green
  9. IBM Hortonworks Data Platform on Cloud: Available Configurations
      • Configuration #1, Development Environment (recommended for test/dev environments)
          − 8 nodes: HDP + IBM Big SQL
          − 3 nodes: IBM DSX
          − Machine: virtual machines, 8 vCPU, 32 GB RAM
          − Non-production license
          − Provisioning included
      • Configuration #2, Enterprise Environment (recommended for enterprise environments)
          − 10 nodes: HDP + IBM Big SQL
          − 2 nodes: IBM DSX
          − Machine: virtual machines, 12 vCPU, 128 GB RAM
          − Production license
          − Provisioning included
          − Additional services available to buy
      • Configuration #3, Enterprise Performance Environment (recommended for extreme performance)
          − 10 nodes: HDP + IBM Big SQL
          − 2 nodes: IBM DSX
          − Machine: bare metal, 16 CPU, 256 GB RAM
          − Production license
          − Provisioning included
          − Additional services available to buy
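The three hosted configurations can be compared by aggregate capacity. The following is a minimal Python sketch, assuming that the machine specification listed for each configuration applies to every node in it (both the HDP + Big SQL nodes and the DSX nodes); the slide does not state this explicitly. The `ClusterConfig` class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical record for one hosted configuration from the slide."""
    name: str
    hdp_bigsql_nodes: int
    dsx_nodes: int
    cpus_per_node: int      # vCPUs for virtual machines, CPUs for bare metal
    ram_gb_per_node: int

    @property
    def total_nodes(self) -> int:
        return self.hdp_bigsql_nodes + self.dsx_nodes

    @property
    def total_cpus(self) -> int:
        # Assumption: every node has the same machine spec.
        return self.total_nodes * self.cpus_per_node

    @property
    def total_ram_gb(self) -> int:
        # Same assumption as total_cpus.
        return self.total_nodes * self.ram_gb_per_node


# Node counts and machine specs as listed on the slide.
CONFIGS = [
    ClusterConfig("Configuration #1 (test/dev)", 8, 3, 8, 32),
    ClusterConfig("Configuration #2 (enterprise)", 10, 2, 12, 128),
    ClusterConfig("Configuration #3 (extreme performance)", 10, 2, 16, 256),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.total_nodes} nodes, "
              f"{cfg.total_cpus} CPUs, {cfg.total_ram_gb} GB RAM")
```

Under that assumption, the totals are simple products of node count and per-node resources, so the sketch is easy to adjust if the DSX nodes turn out to be sized differently.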
  10. Joint Roadmap for 2018
  11. Roadmap for 2018
      [Figure: release timeline from Q4 2017 through Q4 2018, with a further Q1 marker; events marked: Strata SJC, Summit EMEA, Summit SJC, Strata NYC]
      Releases shown on the timeline:
      • HDP: HDP 3.0 Beta 1, HDP 3.0 (Ambari 2.7), HDP 3.1 (Ambari 3.0)
      • HDF: HDF 3.1, HDF 3.2, HDF 3.3
      • BigSQL: BigSQL 5.0.2, BigSQL 5.0.3, BigSQL 5.0.4, BigSQL 6.0 Preview, BigSQL 6.0
      • DSX: DSX 1.1.3, DSX 1.2.1, DSX 1.3
      • BigIntegrate/BigQuality: BigIntegrate (several releases), BigIntegrate Preview
  12. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
      DSX–HDP: 3-Phased Implementation Plan
      • Phase 1: Side-by-side installation. DSX and HDP installed side-by-side independently.
      • Phase 2: Dedicated nodes for DSX K8S.
      • Phase 3: Align with HDP 3.0, K8S integration, DSX & HDP installed within the same cluster.
      Milestones and work items:
      ✓ Milestone 1: Target Non-Secure HDP (Aug 25)
      ✓ Milestone 2: Target Secure HDP (Sep 30)
      ✓ Milestone 3: Production Ready (Oct 31)
      ✓ 30-60-90 Execution Plan
      ✓ Knox integration, LDAP integration
      ✓ Management Pack for Ambari-based installation, configuration & operations of IBM DSX in an HDP cluster
      ✓ Deployment and distribution of Spark, R and Python packages onto the HDP cluster
      ✓ Milestone 1: Beta (H1 2018)
      ✓ Milestone 2: GA (H2 2018)
      ✓ Ranger & Atlas integration for authentication, authorization & governance
      ✓ Ranger-based access control for models
      ✓ Knox SSO
      ✓ Automatic deployment & scale of IBM scoring/inference engine on HDP Compute Grid
      ✓ HDP embed, integration and support of K8S
      ✓ Run DSX static services on Kubernetes on YARN
      ✓ Run Jupyter / R / Zeppelin services on YARN
      Complete DPS/YARN Management
  13. IBM Db2 Big SQL Roadmap
      Releases: V5.0.3 (2Q 2018); V6.0/V5.0.4 (4Q 2018/1Q 2019)
      • Administration
          − Introduce “Fast Patch Management” for improved patch consumption by customers
          − Unveil best practices for automatic workload management & monitoring
          − Improve tool experience by adding historical monitoring in DSM
      • Federation
          − Enhance federation by adding new data sources
          − Create best practices for connecting to remote databases
      • Performance
          − Improve query execution performance
          − Enhance operability of MQTs by bringing auto-refresh and auto-create capabilities
          − Introduce an all-new Java IO for all file formats for improved resource consumption and query performance
      • Enterprise, Governance & Security
          − Enhance Ranger usability
          − Integrate with IGC
      • HDP 3.0 & 3.0.1 Support
          − Hive 3.0 enables ACID by default in ORC file formats; Big SQL needs to work with this default
          − Slider deprecation requires Big SQL to integrate with the merged YARN to work seamlessly for resource management
          − Tolerate the new storage, Ozone
          − Enable Big SQL to be packaged in use cases for installation using Ambari through management packs
          − Integrate with Atlas
      • SQL compatibility
          − Make Netezza table functions available for SQL compatibility
          − Improve SQL semantics for common SQL
      • Data Virtualization
          − Significant revamp of Federation technology by delivering next-generation QueryPlex
          − Extend the traditional data lake to virtually cover a large number and variety of data sources
          − Add Intelligent Caching to speed up execution
  14. BigIntegrate, BigQuality, Governance Roadmap
      • Q1 2018, Theme: Deployment
          − Reduce the time it takes to deploy BigIntegrate / BigQuality
          − Docker containers, Kubernetes
          − Reduce binary footprint
          − Simplify localization strategy
          − Operational improvements
          − Deliver DataStage on ICP
      • Q2 2018, Theme: Governed Data Lake
          − Better memory allocation for BIBQ jobs
          − Prevent BIBQ jobs from failing due to resource constraints
          − BigQuality on Spark: lightweight data quality, enable data quality for self-service, better performance & scalability
          − Atlas-IGC integration via OMRS: active governance on Hadoop
          − Near real-time integration between IGC and Atlas
          − Improved connectivity to Hive and Kafka
      • Q3 2018, Theme: HDP 3.x
          − YARN 3.0
          − Triggering a checkpoint
          − Checkpoint implementation
          − Checkpoint parallelism
          − Restart analysis & recovery
          − Cleanup of checkpoint data
          − Spark Engine (interactive)
          − Operational improvements
      • With the Q4 release, align with HDP 3.1
  15. Data Virtualization coming to IBM Db2 BigSQL
      The mission: Provide simplicity and rich analytics capabilities when virtualizing over large numbers of data sources, flexible schema & data types, automated data & results caching, and compute scaling with industry-leading performance.
  16. Fundamental Capabilities
      • Data Connectivity to a wide variety of sources, both local and remote from the service.
      • Rich Data Processing to execute complex operations upon the disparate sources.
      • Seamless Query capability through a consistent and standard language, without knowledge of the source of the data.
      • Performant Data Access to sources of data regardless of location.
      • Findability of data, including rich metadata and the ability to find the lineage and consumption of the data.
      • Security and Governance providing comprehensive access control, encryption and lineage.
      • Dynamic Scaling for both the number of sources and the compute resources available.
      • Simplicity for configuring and using the service, including deployment, discovery of data, and autonomic tuning of the system.
      • Availability in both private Cloud and public Cloud analytics offerings.
      • Integration with existing IBM capabilities and offerings.
  17. DV Architecture
      [Figure: architecture diagram]
      • Components shown: Client Applications, Administration UI, IBM Db2 BigSQL, Remote Source Connectors, Remote Agents (adjacent to data; installation is OPTIONAL), Information Governance (existing), Network
      • Sources shown: MS SQL Server, Netezza (PDA), Oracle, Sybase, SparkSQL, S3…
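The "seamless query" capability listed among the fundamental capabilities, one SQL statement spanning several sources, can be illustrated with a toy example. The sketch below uses Python's standard `sqlite3` module and SQLite's `ATTACH DATABASE` to join tables held in two separate in-memory databases. It is only an analogy: it does not use Big SQL, its federation layer, or any remote source connector, and the schema and table names (`remote_sales`, `customers`, `orders`) are invented for the example.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Create a 'local' database and attach a second, separate database."""
    conn = sqlite3.connect(":memory:")
    # The attached database stands in for a remote source (hypothetical name).
    conn.execute("ATTACH DATABASE ':memory:' AS remote_sales")

    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE remote_sales.orders (customer_id INTEGER, amount REAL)")

    conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO remote_sales.orders VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])
    conn.commit()
    return conn


def total_by_customer(conn: sqlite3.Connection):
    """One SQL statement joins tables that live in different databases."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.customers AS c
        JOIN remote_sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_federated_connection()
    for name, total in total_by_customer(connection):
        print(name, total)
```

The caller of `total_by_customer` writes ordinary SQL and never handles the second database directly, which is the property the slide describes; a real virtualization layer would add connectors, pushdown, caching and security on top of this idea.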
  18. © Hortonworks Inc. 2011–2018. All rights reserved
      Hortonworks Offerings
      Srikanth Venkat, Senior Director, Product Management, Hortonworks
  19. Perfect Storm of Trends
      • Internet of Things
      • Streaming Data
      • Cloud Computing
      • Artificial Intelligence
  20. The Big Data Tech Journey
      • 2011: DATA-AT-REST. Hadoop 1.0, 100% open
      • 2013: YARN. Enable multiple workloads
      • 2015: DATA-IN-MOTION. Out to the edge
      • 2016: CONNECT DATA PLATFORMS. Cloud/on prem
      • Today: GLOBAL DATA MANAGEMENT. Holistic: govern, manage, secure
  21. Enabling Modern Applications on a Connected Data Architecture
      • Continuous Insights: Deliver insights from ALL data, origin to rest
      • Enterprise Ready: Management, Security, Governance
      • Any Delivery Model: Data Center, Cloud, Hybrid
      • Open Innovation: Architecture, Community, Ecosystem
  22. Global Data Management
      • Locations: Data Center, Cloud, Edge
      • Data sources: exception monitoring; legacy/operational data; cyber security; telemetry from connected devices; time series; sensors, control systems
  23. Today’s Reality: Encompass and Connect All Data
  24. Modern Data Architecture
      [Figure: architecture diagram with zones labeled Data Center and Cloud]
      • Capabilities shown: exception-based monitoring; 360 view of operations, equipment failure analytics, etc.; deep historical analysis; stream analytics; cyber security & threat detection; telemetry from connected devices; machine learning; sensors, SCADA, control systems; edge analytics; time series historian
  25. What Makes Hortonworks Data Platform Unique?
      • 100% open source: Capture all the innovation with no vendor lock-in
      • Centrally architected with YARN at its core: Coordinate cluster-wide services for operations, security, governance & shared data services
      • Interoperable with existing technology and skills: Interoperable through the ODPi Core for integration with existing services
      • Enterprise-ready, with data services for operations, governance and security: Dynamic security through integration of Apache Atlas and Apache Ranger
  26. HDP & Cloud Platforms
      • Security & Governance: Enterprise-grade security & governance. Single-pane authorization, row/column security for SQL, metadata management & governance, classification-based policies
      • SQL / Hive: Comprehensive & fast SQL. Full-fledged EDW capabilities, ACID, Merge, security, in-memory & fast, cloud-ready
      • Spark & Data Science: Enterprise-ready Apache Spark. Security for SparkSQL, management & monitoring, latest Spark, IBM Data Science Experience
      • Ambari/SmartSense: Seamless cluster management and operational intelligence. Tools and services for managing the full service lifecycle for the entire ecosystem, faster case resolution, proactive issue detection and activity reports/visualization
      • Cloud: Cloud data lakes. Fast, opinionated cloud data lakes, powered by Cloudbreak
  27. HDP: What’s coming?
      • Themes: Solutions, Consumability, Deploy Anywhere
      • Items: Deep Learning & GPU support; Containerization on Hadoop; Management Packs; Scale (Name Node Federation); IBM DSX on HDP; Erasure Coding; Analytics UI (Superset); Security and Governance ON by default; Kubernetes support; IBM DSX with Atlas & Ranger support
  28. What Does HDF Do?
      [Figure: data flow diagram]
      • Sensor sources: truck sensors
      • Flow management clusters: ingress gateway, NiFi site-to-site protocol, egress gateway
      • Event broker cluster
      • Stream analytics clusters
      • Real-time apps & exploration platform
      • Stages: ingest streams, generate insights, real-time apps
  29. HDF 3.1 Platform
      • Flow Management: data acquisition and delivery; simple transformation and data routing; simple event processing; end-to-end provenance; edge intelligence & bi-directional communication
      • Stream Processing: scalable data broker for streaming apps; scale-out streaming computation engine
      • Stream Analytics: pattern matching; prescriptive & predictive stream analytics; complex event processing; continuous insight
      • Enterprise Services: provisioning, management, monitoring, security, audit, compliance, governance, multi-tenancy
      • Components shown: Hortonworks Schema Registry, Data Flow Registry, Java Agent, C++ Agent
      • Legend: orange = new in HDF 3.1, available in the HDP+HDF scenario; blue = new in HDF 3.1, available in HDF-only and HDP+HDF scenarios
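The division of labor described for HDF, routing events in the flow-management layer and computing continuous insight in the stream-analytics layer, can be illustrated without any HDF component. The following sketch is plain Python: it does not use NiFi, Kafka, Storm or any Hortonworks API. The event fields, the speed threshold, and the function names are hypothetical and chosen only to echo the truck-sensor example.

```python
from collections import defaultdict, deque
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, Any]


def route(events: Iterable[Event], speed_limit: float) -> Iterator[Tuple[str, Event]]:
    """Flow-management step: tag each event with a destination route."""
    for event in events:
        destination = "alerts" if event["speed"] > speed_limit else "normal"
        yield destination, event


def rolling_average(events: Iterable[Event], window: int) -> Iterator[Tuple[str, float]]:
    """Stream-analytics step: per-truck mean speed over the last `window` events."""
    recent = defaultdict(lambda: deque(maxlen=window))
    for event in events:
        speeds = recent[event["truck_id"]]
        speeds.append(event["speed"])
        yield event["truck_id"], sum(speeds) / len(speeds)


# Made-up sample data for the illustration.
SAMPLE_EVENTS: List[Event] = [
    {"truck_id": "t1", "speed": 30.0},
    {"truck_id": "t2", "speed": 50.0},
    {"truck_id": "t1", "speed": 40.0},
    {"truck_id": "t2", "speed": 70.0},
]

if __name__ == "__main__":
    for destination, event in route(SAMPLE_EVENTS, speed_limit=35.0):
        print(destination, event)
    for truck_id, average in rolling_average(SAMPLE_EVENTS, window=2):
        print(truck_id, average)
```

Both steps are generators, so each event is handled as it arrives rather than after the whole batch is collected, which is the essential difference between data in motion and data at rest.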
  30. Summary
  32. Today’s reality
  33. A new manifesto for metadata and governance
      • Metadata management must be automated
      • Metadata management must become ubiquitous
      • Metadata must become open and remotely accessible
      • Metadata should be used to drive the governance of data
      The discovery, maintenance and use of metadata has to be an integral part of all tools that access, change and move information.
  34. What needs to change?
  35. Open Metadata and Governance
  36. Co-creation with practitioners (@ODPiOrg)
      Good metadata enables subject matter experts to:
      • Collaborate around the data
      • Locate the data they need, quickly and efficiently
      • Feed back their knowledge about the data and the uses they have made of it, to help others and support economic evaluation of data
  37. Get involved with ODPi Data Governance
      • Have your organization support ODPi
      • Visit the ODPi Data Governance website and join the quarterly newsletter
      • Learn more about the Data Governance PMC
      • Join the Data Governance PMC mailing list
  38. Typical Use Cases
  39. EDW Modernization Reference Architecture
      [Figure: reference architecture diagram]
      • Source EDW systems
      • IBM BigIntegrate: high-performance data movement
      • IBM IIDR: change data capture
      • Hadoop: scalable storage and compute (on prem or cloud)
      • Hive LLAP: high-performance SQL data mart
      • BigSQL/IBM Cognos: complex queries, concurrency for higher performance
      • Callouts: fast, scalable SQL analytics; intelligent in-memory caching; high-performance data import from all major EDW platforms; pre-aggregated data ... or full-fidelity data
  40. Modern Data Warehouse Solution
      [Figure: solution architecture diagram]
      • Data sources: EDW, weblog, sensor, clickstream, RDBMS, NoSQL, unstructured, social media; federation sources
      • Ingestion: BigIntegrate (ETL), IIDR (CDC), BigSQL (Insert/Load), HDF (Kafka, NiFi, Storm), Sqoop
      • Governed data lake:
          − Hortonworks HDP (compute and storage platform)
          − Big SQL (high performance, scalable, complex queries, data virtualization, SQL compatibility, Spark integration)
          − Hive LLAP (fast and scalable SQL)
          − HBase (key-value pair)
          − IGC, Big Quality, Big Match (data quality and governance for Hadoop and non-Hadoop data lake)
          − Ranger / Atlas (governance & security)
      • Query processing with security
      • Visualization & Data Science