Unifying the Silos: Optimize your Data Pipeline for Analytics and AI

Discover how to break the modern data analytics workflow into phases and focus on the data challenges that arise in each phase of the analytics and AI lifecycle. By taking an end-to-end view of the data pipeline and adopting storage technologies for AI and analytics, your organization can reduce costs, modernize its data strategy, and improve data governance. By anticipating how Hadoop, Spark, TensorFlow, Caffe, and traditional analytics tools like SAS can share data, IT departments and data science practitioners can not only coexist but also speed time to insight. You'll also learn the tangible benefits of a reference architecture, drawn from real-world installations that span proprietary and open source frameworks.

Published in: Technology

  1. 1. IBM Storage and SDI © Copyright IBM Corporation 2018. Unifying the Silos: Optimize Your Data Pipeline for Analytics and AI. Presenters: Gary Tomchuk, IBM Global SW Defined Storage Sales; Benoit Granier, IBM File and Object Systems Technical Manager for Europe.
  2. 2. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  3. 3. Notices and disclaimers. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  4. 4. Notices and disclaimers continued. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products about this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at: www.ibm.com/legal/copytrade.shtml.
  5. 5. Agenda: Data Management Challenges in Analytics and AI; AI Data Pipeline with IBM Spectrum Storage; IBM Spectrum Storage offering for Analytics and AI; IBM Spectrum Scale; IBM Spectrum Discover; IBM Cloud Object Storage; Data Unification using IBM Spectrum Scale with HDP; Data Unification Use Cases; IBM Spectrum Storage for AI - Solutions.
  6. 6. Data Management Challenges in Analytics and AI
  7. 7. Biggest Unstructured Data Challenges. 39% of firms see sourcing, gathering, managing, and governing data as their biggest challenges when using systems of insight. The number of enterprises with 1,000 TB+ unstructured data stores grew 3X from 2016 to 2017. Source: Forrester Analytics, Global Business Technographics Data And Analytics Survey, 2017; Global Business Technographics Data And Analytics Survey, 2016 (enterprises with 1,000+ employees).
  8. 8. Data Management Challenges: silos of infrastructure for various analytics use cases; multiple copies of the same data without a single source of truth; analytics on stale data; time-consuming data ingest cycle; unmanageable cluster sprawl with data growth.
  9. 9. AI Data Pipeline for IBM Spectrum Storage
  10. 10. AI, Analytics and Data Pipelines. AI and Big Data pipelines need to support high-performance data analytics and AI / machine learning / deep learning, from early experimentation to shared data services on production clusters. (Diagram label: PowerAI.)
  11. 11. Shorten Time to Value with IBM Storage. AI data workflow stages: INGEST, CLASSIFY, TRAINING, INFERENCE. Diagram labels: New Data; Champion; Challenger; 80% of Data Science Time; Resource Optimization; Provision Time; Headwinds Challenge Time-to-Value. AI Workflow: reduce time to accuracy, improve provisioning time, increase cycles, reduce human error; minimize data movement; increase performance, automate storage processes, and reduce cost. Why IBM? Data Scientist Productivity: improve velocity by getting to your data faster using tools, not trial and error. The most scalable, low-latency storage platform: using the leading portfolio of software-defined storage. Optimized Economics: balance performance and cost with system choices. Proven Reference Architecture: higher performance, more confidence, lower costs. Industry Standard Approach: deliver consistency and efficiencies. Uses Technology Advances: GPU, open source frameworks. Business Value: lower CAPEX; improve model quality; faster time to insight; business agility; lower OPEX; higher client experience; automation savings. IDC: "Look for dynamically adaptable, simple, flexible, secure, cost-efficient, and elastic infrastructure that can support high capacity along with high throughput and low latency for high performance training and inferencing experience."
  12. 12. The Goal: Move Data from Ingest to Insights. Pipeline stages: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS.
  13. 13. IBM AI Data Pipeline (diagram). Pipeline stages, from data in to insights out: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS. Components: Transient Storage; Global Ingest; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL; Hadoop / Spark Data Lakes; ML / DL (Prep, Training, Inference); Trained Model; Inference; Archive. Media labels: SSD; SSD/NVMe; SSD/Hybrid; Hybrid/HDD; HDD; SDS/Cloud; Cloud; Tape. Tier descriptions: throughput-oriented, software-defined temporary landing zone; high-throughput performance tier; high-scalability, large/sequential I/O capacity tier; high-volume, index and auto-tagging zone; throughput-oriented, performance and capacity tier; throughput-oriented, globally accessible capacity tier; high-throughput, low-latency, random I/O performance tier; high-throughput, random I/O, performance and capacity tier.
  14. 14. IBM AI Data Pipeline with IBM Spectrum Storage: improved data governance with storage offerings for the end-to-end data pipeline (diagram). Pipeline stages, from data in to insights out: EDGE, INGEST, CLASSIFY / TRANSFORM, ANALYZE / TRAIN, INSIGHTS. Components: Transient Storage; Global Ingest; Fast Ingest / Real-time Analytics; Classification & Metadata Tagging; ETL; Hadoop / Spark Data Lakes; ML / DL (Prep, Training, Inference); Trained Model; Inference; Archive. Media labels: SSD; SSD/NVMe; SSD/Hybrid; Hybrid/HDD; HDD; SDS/Cloud; Cloud; Tape. IBM offerings placed on the pipeline: Spectrum Scale; Elastic Storage Server; Cloud Object Storage; Spectrum Discover; Spectrum Archive.
  15. 15. IBM Spectrum Storage Offerings for Analytics and AI
  16. 16. IBM Spectrum Scale: Store Everywhere. Run Anywhere. Delivers data management at scale for enterprises that are swamped by data. IBM Spectrum Scale lets you grow and share the storage infrastructure while automatically moving file and object data to the optimal storage tier as quickly as possible.
  17. 17. IBM Spectrum Scale: Data Management at Scale. Software-defined file storage with high performance and extreme scalability. 50% of systems delivering top SPEC SFS benchmarks run IBM Spectrum Scale software. Supports file systems with sizes of tens of petabytes that contain billions of files and can be accessed by thousands of nodes in a cluster. Smart policy engine to optimize utilization across multiple storage tiers (Flash -> Disk -> Cloud -> Tape). Enterprise-class storage features such as disaster recovery, encryption, compression, and erasure coding. Flexibility in storage architectures: shared-nothing, shared-storage, or hybrid. Diagram labels: interfaces (NFS, SMB, File, Object, HDFS); Encryption and Compression; Distributed RAID; storage tiers (SSD, Fast Disk, Slow Disk, Tape).
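The slide names a policy engine that places data across Flash, Disk, Cloud, and Tape tiers but does not show a policy. The following is a minimal Python sketch of the general idea (age-based tier selection), not IBM Spectrum Scale policy syntax. The thresholds, the `FileInfo` type, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Tier order taken from the slide: Flash -> Disk -> Cloud -> Tape.
TIERS = ["flash", "disk", "cloud", "tape"]

# Assumed thresholds (days since last access); illustrative values only.
MAX_AGE_DAYS = {"flash": 7, "disk": 90, "cloud": 365}


@dataclass
class FileInfo:
    path: str
    days_since_access: int


def choose_tier(f: FileInfo) -> str:
    """Pick the fastest tier whose age threshold the file still satisfies."""
    for tier in TIERS[:-1]:
        if f.days_since_access <= MAX_AGE_DAYS[tier]:
            return tier
    return TIERS[-1]  # everything older goes to tape


def plan_migrations(files, current_tier):
    """Return (path, source, target) for each file whose tier should change."""
    moves = []
    for f in files:
        target = choose_tier(f)
        source = current_tier[f.path]
        if target != source:
            moves.append((f.path, source, target))
    return moves


if __name__ == "__main__":
    files = [FileInfo("/data/a.bin", 3), FileInfo("/data/b.bin", 400)]
    placement = {"/data/a.bin": "flash", "/data/b.bin": "disk"}
    print(plan_migrations(files, placement))
```

A real policy engine would typically also weigh pool occupancy, file size, and access frequency; this sketch only shows the rule-evaluation step.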
  18. 18. IBM Spectrum Scale: proven at over 4,000 customers worldwide. Most common use cases: high performance computing; big data workloads like Hadoop and Spark; enterprise analytics workloads like SAS grid and SAP HANA; AI/ML/DL like genomics and autonomous driving; high performance active archive stores. Customer examples: four-time champion Infiniti Red Bull Racing does real-time race analytics; personalized cancer treatment for over 65,000 patients; climate and weather modeling with 16 PB online and 12 PB archive on tape; R&D environment for natural language tools; semiconductor design, with higher profits from shorter chip design cycles; shared storage for global banking, 100 times faster than the incumbent solution.
  19. 19. IBM Spectrum Scale Storage for the world’s most powerful supercomputers. Summit system (world’s most powerful supercomputer): 4608 nodes, each with 2 IBM Power9 processors, 6 Nvidia Tesla V100 GPUs, 608 GB of fast memory, and 1.6 TB of NVMe memory; 200 petaflops peak performance for modeling and simulation; 3.3 ExaOps peak performance for data analytics and AI; IBM Spectrum Scale and IBM Elastic Storage Server; 2.5 TB/sec throughput to storage architecture; 250 PB HDD storage capacity. Sierra system (world #2 supercomputer): 4320 nodes, each with 2 IBM Power9 processors, 4 Nvidia V100 GPUs, 320 GB of node memory, and 1.6 TB of NVMe memory; IBM Spectrum Scale and IBM Elastic Storage Server; 125 petaflops peak performance; 154 PB HDD storage capacity.
  20. 20. IBM Elastic Storage Server (ESS): integrated scale-out data management for file and object data. Optimal building block for high-performance, scalable, reliable enterprise Spectrum Scale storage: faster data access with the choice to scale up or out; easy-to-deploy clusters with a unified system GUI; simplified storage administration with IBM Spectrum Control integration. One solution for all your Spectrum Scale data needs: single repository of data with unified file and object support; anywhere access with multi-protocol support (NFS 4.0, SMB, OpenStack Swift, Cinder, and Manila); ideal for big data analytics with full Hadoop transparency. Ready for business-critical data: disaster recovery with synchronous or asynchronous replication; reliability and fast rebuild times using Spectrum Scale RAID’s dispersed data and erasure code; five nines (99.999%) of availability.
  21. 21. IBM Elastic Storage Server (ESS) Family (chart with Capacity and Speed axes). Capacity models: GL1S: 1 enclosure, 9U, 82 NL-SAS, 2 SSD; GL2S: 2 enclosures, 12U, 166 NL-SAS, 2 SSD; GL4S: 4 enclosures, 20U, 334 NL-SAS, 2 SSD; GL6S: 6 enclosures, 28U, 502 NL-SAS, 2 SSD. Speed models: GS1S: 24 SSD; GS2S: 48 SSD; GS4S: 96 SSD. Hybrid models: GH14S: one 2U24 SSD enclosure and four 5U84 HDD enclosures, 334 NL-SAS, 24 SSD; GH24S: two 2U24 SSD enclosures and four 5U84 HDD enclosures, 334 NL-SAS, 48 SSD. Throughput labels shown on the chart: 6 GB/s, 12 GB/s, 14 GB/s, 24 GB/s, 36 GB/s, 38 GB/s, 40 GB/s.
  22. 22. Consolidate capacity storage for a cognitive and AI enterprise. Access multiple distributed applications concurrently, across one or more sites with geo-dispersed data. Workload categories: Backup, Archive and File Services; Data Oceans and Repositories; Industry Specific Data; New Cloud Applications. Workloads shown: NAS services; file sync & share; archive data; backup & cloud backup; cloud repository/service; IoT repository; mobile apps; DVR & video repository; image/voice repository; analytics; file archive; financial compliance; healthcare (cardiology, radiology PACS); research & patient data; cloud native apps; media production / archive / distribution; compliance & retention; documents. Capabilities: fast data discovery; efficient data analysis; data tagging; actions based on data.
  23. 23. The market reinforces IBM's transformational story. Gartner Critical Capabilities for Object Storage: #1 Analytics, #1 Archiving, #1 Backup, #1 Cloud Storage (source: Gartner Critical Capabilities for Object Storage, published 30 January 2019, ID G00352191). IBM worldwide object-based leadership: Gartner MQ, Leader, Distributed File Systems and Object Storage, October 2018, 3 years in a row; IDC MarketScape, Leader, MarketScape for Object Storage, June 2018, 5 years in a row; CRN Tech Innovator, Winner, Storage - Cloud, December 2018, first year; Tech Target, Finalist, Product of the Year, Software Defined Storage, January 2019, first year.
  24. 24. Transformational insight for AI, analytics, governance, and optimization: expedite time to discovery. Automate cataloging of data by capturing metadata as it’s created. Locate and identify the most relevant data regardless of its type or location. Run simple SQL queries through the GUI or through API scripts. Enable comprehensive insight by combining system metadata with custom tags to increase storage admin and data consumer productivity. Create custom tags and policy-based workflows to orchestrate content inspection and activate data in AI, ML, and analytics workflows. Metadata is collected through scanning and event notifications.
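The slide describes combining system metadata with custom tags and querying the catalog with SQL, but shows no query. As a rough illustration only, the sketch below builds a toy metadata catalog in SQLite and runs such a query from Python. The table name, columns, tag values, and file paths are hypothetical and are not the schema or API of any IBM product.

```python
import sqlite3

# Toy catalog: system metadata (path, size, owner) plus custom tags
# (project, sensitivity). All names and values here are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE catalog (
           path TEXT PRIMARY KEY,
           size_bytes INTEGER,
           owner TEXT,
           project TEXT,
           sensitivity TEXT
       )"""
)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?)",
    [
        ("/lake/img/a.dcm", 500, "alice", "radiology", "restricted"),
        ("/lake/img/b.png", 200, "bob", "radiology", "public"),
        ("/lake/logs/c.log", 900, "bob", "web", "public"),
        ("/lake/img/d.png", 300, "carol", "radiology", "public"),
    ],
)


def find_training_candidates(conn, project):
    """Return (path, size) of public files tagged with the given project."""
    cur = conn.execute(
        "SELECT path, size_bytes FROM catalog "
        "WHERE project = ? AND sensitivity = 'public' "
        "ORDER BY size_bytes DESC",
        (project,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for path, size in find_training_candidates(conn, "radiology"):
        print(path, size)
```

The point of the sketch is that once tags sit next to system metadata in one table, selecting a training set becomes a filter rather than a crawl of the file system.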
  25. 25. Data Unification with IBM Spectrum Scale and HDP
  26. 26. Why IBM Spectrum Scale for Analytics/AI workloads? Reduce datacenter footprint and get faster ingest with in-place analytics: access the data using any of the industry-standard protocols (NFS, SMB, POSIX, Object, HDFS API), with no need to maintain separate copies for different applications. Flexible storage architectures: support for hybrid architectures under a common namespace, with software installed directly on compute nodes (storage-rich servers) or on shared storage (ESS); support for running containerized workloads. Extreme scalability with parallel file system architecture: scale to billions of files, with every node serving data and metadata and no centralized metadata node bottleneck. Full data life cycle management: data migration between storage pools (flash, disk, and external pools such as tape via IBM TSM/LTFS) with policy-based auto tiering. Unmatched scalability and performance with the most optimized storage footprint: performance leadership in AI benchmarks, with 40 GB/s and 300 TB in 2U and linear scaling to 120 GB/s in 6U.
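The linear-scaling claim on this slide (40 GB/s and 300 TB in 2U, 120 GB/s in 6U) can be turned into a back-of-the-envelope sizing calculation. The Python sketch below assumes throughput and capacity scale perfectly linearly with the number of 2U building blocks, which real deployments may not achieve. The function name and the returned fields are hypothetical.

```python
import math

# Per-building-block figures quoted on the slide.
BLOCK_THROUGHPUT_GBPS = 40  # GB/s per 2U building block
BLOCK_CAPACITY_TB = 300     # TB per 2U building block
BLOCK_RACK_UNITS = 2


def size_for_throughput(target_gbps: float) -> dict:
    """Estimate building blocks needed, assuming ideal linear scaling."""
    if target_gbps <= 0:
        raise ValueError("target throughput must be positive")
    blocks = math.ceil(target_gbps / BLOCK_THROUGHPUT_GBPS)
    return {
        "blocks": blocks,
        "rack_units": blocks * BLOCK_RACK_UNITS,
        "throughput_gbps": blocks * BLOCK_THROUGHPUT_GBPS,
        "capacity_tb": blocks * BLOCK_CAPACITY_TB,
    }


if __name__ == "__main__":
    print(size_for_throughput(120))
```

For a 120 GB/s target this yields three blocks in 6U, which matches the figure on the slide; any other target is an extrapolation under the same linearity assumption.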
  27. 27. IBM Spectrum Scale + Hortonworks HDP. Spectrum Scale becomes the storage layer in your HDP environment. Spectrum Scale supports accessing data through the HDFS API and is therefore transparent to the applications using HDP. Enterprise-class storage for your Hadoop/Spark environment (encryption, compression, tiering, DR…). Diagram: HDP on the HDFS Transparency Connector on IBM Spectrum Scale. Reference: "Hortonworks HDP with IBM Spectrum Scale" (IBM Redbook).
  28. 28. IBM ESS Shared-Storage Model vs. Classic HDFS Shared-Nothing Cluster. Standard shared-nothing model on storage-rich servers (HDP storage-rich worker nodes on 10 GigE / 40 GigE): inefficient, inflexible, and expensive; wasteful, with high OPEX to scale and manage compute and storage; lacks enterprise features. Shared-storage model with IBM Spectrum Scale on IBM ESS: disaggregated “thin” worker nodes with fewer disks; no application-data disks in servers, which are replaced with shared storage; no need for storage-only nodes; avoids cluster sprawl while delivering high performance, flexibility, and enterprise features; all with HDFS compatibility.
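The deck does not say how its "up to 60% less storage footprint" figure (quoted on the use-case slides) was derived. One plausible contributor is the difference between HDFS-style replication and erasure coding. The Python sketch below compares raw capacity under two assumptions that are not stated in the slides: 3 copies for the shared-nothing cluster (the HDFS default replication factor) and an 8 data + 2 parity erasure code for the shared-storage system. Function names are hypothetical.

```python
def raw_capacity_replicated(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure_coded(
    usable_tb: float, data_strips: int = 8, parity_strips: int = 2
) -> float:
    """Raw capacity needed for a data+parity erasure code (assumed 8+2)."""
    return usable_tb * (data_strips + parity_strips) / data_strips


def footprint_reduction(usable_tb: float) -> float:
    """Fraction of raw capacity saved by erasure coding vs. replication."""
    replicated = raw_capacity_replicated(usable_tb)
    coded = raw_capacity_erasure_coded(usable_tb)
    return 1 - coded / replicated


if __name__ == "__main__":
    usable = 100.0  # TB of user data, arbitrary example
    print(raw_capacity_replicated(usable))     # 300.0 TB raw
    print(raw_capacity_erasure_coded(usable))  # 125.0 TB raw
    print(round(footprint_reduction(usable), 3))
```

Under these assumptions the saving is roughly 58%, in the same range as the slide's figure; other code widths or replication factors would give different numbers.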
  29. 29. Data Unification with IBM Spectrum Scale Use Cases
  30. 30. EDW Optimization: simplify data management using common storage between EDW and Hadoop. Archive data away from the EDW: move cold or rarely used data to Hadoop as an active archive; store more data for longer. Offload costly ETL processes: free your EDW to perform high-value functions like analytics and operations, not ETL; use Hadoop for advanced ETL. Optimize the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context. Reduce migration effort and the skillset gap: use existing investment in Oracle/DB2/Netezza skills; BigSQL allows you to migrate applications without major code rewrites and additional SQL development. Control cluster sprawl: grow storage independent of compute with ESS; POWER servers deliver 1.7x throughput compared to Hortonworks on x86; up to 60% less storage footprint. Architecture diagram: Enterprise Data Warehouse (DB2 / dashDB / Oracle / Netezza / Teradata …) for hot data; Hortonworks Hadoop for cold data, archive data, and new sources; BigSQL as the SQL interface; BI software (business analytics and visualization, like SAS grid, SAP HANA, etc.); new data sources (streaming / IoT data); ESS for Speed and ESS for Data Lake on Spectrum Scale. Customer example: a financial services company in Europe is optimizing its DB2 warehouse using Hortonworks Hadoop and is using ESS as the common storage behind DB2 and Hadoop.
  31. 31. New AI-driven client services in banking: a large banking group selects a scalable data science platform to develop new smart banking services through the use of AI in real time. Business problem: needed to improve client experience and create new client services by identifying new patterns in its data through data science and AI techniques; the existing Hadoop infrastructure did not have sufficient throughput and scalability. Solution: POWER9 cluster with L922 servers (x96) and AC922 servers (x3); IBM Elastic Storage Server (ESS) with Spectrum Scale, GL1S (x2) and GL2S (x2); Hortonworks Data Platform (HDP) and IBM Watson Studio (formerly DSX). Benefits: open, virtualized infrastructure solution based on IBM Power Servers running HDP and Watson Studio; optimized, scalable, and highly available storage architecture with IBM Spectrum Scale based ESS; integrated security of DSX+HDP in conjunction with higher throughput of POWER9 servers outperformed Intel and reduced time to value; end-to-end solution that addressed all requirements around performance, security, costs, and ability to scale.
  32. 32. Unified Analytics Workflows: single data lake for Hadoop and non-Hadoop analytics. All analytics workflows on common storage: improve data reliability and governance with a single data lake for Hadoop and non-Hadoop analytics setups; build ML/DL workflows that use multiple analytics platforms; share data across analytics workflows as appropriate. Ingest fast and improve time to insight: the POSIX interface combined with ESS flash storage gives very fast ingest. Control cluster sprawl: grow storage independent of compute with ESS; up to 60% less storage footprint; POWER servers deliver 1.7x throughput compared to Hortonworks on x86. Architecture diagram: Hadoop (MapReduce, Spark, ML/DL, etc.) through the HDFS interface and other analytics platforms (SAS grid, SAP HANA/Vora, ML/DL, Conductor with Spark, etc.) through the POSIX interface share ESS for Data Lake; fast ingest through the POSIX interface lands on ESS for Speed; both run on Spectrum Scale. Customer example: a bank in South Africa is implementing HDP and SAS grid software on a common ESS-based infrastructure.
  33. 33. Predictive analytics for data-driven customer banking: a large bank delivers personalized banking in real time to millions of customers by applying new analytics and data science. Business problem: aggressively improve analytics maturity by delivering a predictive analytics capability that provides a data-driven customer experience; develop an open platform that can ingest all relevant data from various sources with the ability to extract new insights. Solution: POWER8 cluster with S822L servers (x24); IBM Elastic Storage Server (ESS) with Spectrum Scale, GL2S (x2); Hortonworks Data Platform (HDP). Benefits: open infrastructure solution based on IBM Power Servers running Linux and HDP; optimized, scalable, and highly available storage architecture with IBM Spectrum Scale based ESS; better overall TCO, with superior performance on less than half the number of compute nodes, where Power + ESS outperformed local storage on Intel; ESS in-place analytics hosts both HDP and SAS workloads on a single data layer, reducing data copies and improving data governance.
  34. 34. Integrated HPC and Hadoop: efficiently transform data into insights with a single data lake for HPC and Hadoop. Extend HPC to add modern analytics capabilities: efficient movement of data between modern and traditional applications with a common namespace; Spectrum Scale in-place analytics capabilities enable accessing the same data using NFS/SMB/Object/POSIX/HDFS without requiring any modifications to the data; improve data reliability and governance with a single data lake. Ingest fast and improve time to insight: the POSIX interface combined with ESS flash storage gives very fast ingest; the common namespace also enables running some edge analytics at the ingest layer. Control cluster sprawl: grow storage independent of compute with ESS; up to 60% less storage footprint; POWER servers deliver 1.7x throughput compared to Hortonworks on x86. Architecture diagram: traditional HPC (open, read, write, MPI, C code, Python, etc.) through the POSIX interface, Hadoop (MapReduce, Spark, ML/DL, etc.) through the HDFS interface, and NFS/SMB/Object clients through a Spectrum Scale protocol node share ESS for Data Lake; fast ingest through the POSIX interface lands on ESS for Speed; both run on Spectrum Scale. Customer examples: NASA and a healthcare company in the Middle East are using a common Spectrum Scale data lake to efficiently get insights using traditional HPC and Hadoop analytics.
  35. 35. Solutions: IBM Spectrum Storage for AI. IBM Spectrum Storage for AI supercharges your AI data pipeline with storage solutions optimized for the unique demands of AI. Integrating industry-leading servers, ISV / open source software, and IBM software-defined storage, IBM Spectrum Storage for AI delivers simplified deployment, groundbreaking performance, and extended data management to drive developer productivity with the fastest path to insights. Available solutions: IBM Spectrum Storage for AI with Power Systems; IBM Spectrum Storage for AI with NVIDIA DGX (leading AI x86-based solution); IBM Spectrum Storage for Hadoop/Spark workloads (Hortonworks/Cloudera); IBM Spectrum Storage for AI in Autonomous Driving. https://www.ibm.com/it-infrastructure/storage/ai-infrastructure
  36. 36. “IBM’s Spectrum Storage for AI is differentiated from both the NetApp and Pure Storage offerings. IBM Spectrum Storage for AI provides a level of scalability that is nearly unmatched by anyone in the industry. It’s both incredibly fast at scale, and it scales linearly. The ability for IBM Spectrum Storage for AI to seamlessly integrate with the rest of the Spectrum Storage suite should make IBM’s solution an easy decision for enterprise buyers.” (Steve McDowell)
  37. 37. Questions?
  38. 38. Thank You!
