Big Data Analytics Platforms by KTH and RISE SICS

Published in: Data & Analytics

  1. Big Data Analytics Platforms by KTH and RISE SICS: Hopsworks, Apache Flink and Beyond. Seif Haridi, KTH/RISE (AI @ RISE).
  2. Hopsworks: End2End Data Platform for Analytics/ML. Pipeline stages: Datasources; Data Preparation & Ingestion (Batch: Apache Beam, Apache Spark; Streaming: Apache Beam, Apache Spark, Apache Flink); Hopsworks Feature Store; Experimentation & Model Training (Distributed ML & DL: Pip, Conda, Tensorflow, scikit-learn, PyTorch; Jupyter Notebooks; Tensorboard); Deploy & Productionalize (Model Serving: Kubernetes; Model Monitoring: Kafka + Spark Streaming); Applications, API, Dashboards. Orchestration in Airflow. Filesystem and Metadata storage: HopsFS. Also shown: Apache Kafka.
  3. Logical Clocks was founded by the team that created and continues to drive Hopsworks, a Data-Intensive AI platform, and its Feature Store, a warehouse for machine learning features. Logical Clocks’ vision is to simplify the process of refining data into intelligence at scale.
  4. Continuous Intelligence: a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. Source: https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo (Diagram labels: Business, Tech, events, actions.)
  5. Paradigm Shift in Data Processing. Before: Data, lots of Queries, retrospective answers. After: Query, lots of Data, real-time answers. • Data Stream Processing as a 24/7 execution paradigm. The Real-Time Analytics Stack: High Level Models (Stream SQL, CEP…); Compute (Flink, Beam, Kafka-Streams, Apex, Storm, Spark Streaming…); Storage (Kafka, Pub/Sub, Kinesis, Pravega…).
  6. Actors vs Streams. Data Stream Computing: • Declarative Programming • State Managed by the system • Robust: Built-in Fault Tolerance • Scalable Deployments. Actor Programming: • Low-Level Event-Based Programming • Manual/External State • Not Robust: Manual Fault Tolerance • Not flexible scaling. (Diagram labels: Declarative Program, service, logic, state.)
  7. The Real-Time Analytics Stack: High Level Models (Stream SQL, CEP…); Compute (Flink, Beam, Kafka-Streams, Apex, Storm, Spark Streaming…); Storage (Kafka, Pub/Sub, Kinesis, Pravega…).
  8. Apache Flink • Top-level Apache Project • #1 stream processor (2019) • Production-Proof • > 400 contributors • 100s of deployments. Diagram labels: Foundations; commercial deployments; Data Streams, Fault Tolerance, Window Aggregation; Calcite stream-SQL (influenced).
  9. Structure of a 24/7 Application. Diagram labels: Event Logs, Historic Data, Files, Applications/Services, Stream Processing, State.
  10. Program Hierarchy in Flink. Dataflow Engine (automates): • Fault Tolerance • Scalability • Monitoring/IO Management.
  11. Program Hierarchy in Flink. Dataflow Engine (automates): • Fault Tolerance • Scalability • Monitoring/IO Management. Event Processing API, f(input, state, time): • Dynamic program state • Operations on out-of-order streams.
  12. Program Hierarchy in Flink. Dataflow Engine (automates): • Fault Tolerance • Scalability • Monitoring/IO Management. Event Processing API, f(input, state, time): • Dynamic program state • Operations on out-of-order streams. DataStream API (window, map, filter etc.): • Higher-Order Streaming Functions • Event Windowing (sessions, time etc.).
  13. Program Hierarchy in Flink. Dataflow Engine (automates): • Fault Tolerance • Scalability • Monitoring/IO Management. Event Processing API, f(input, state, time): • Dynamic program state • Operations on out-of-order streams. DataStream API (window, map, filter etc.): • Higher-Order Streaming Functions • Event Windowing (sessions, time etc.). Domain-Specific APIs (SQL, CEP, Tables, ML): • Fully Declarative Programming • Event Patterns, Relations etc.
  14. Declarative Streaming Examples: Average Tip per Hour with Stream SQL.
  15. Declarative Streaming Examples: Completed Taxi Rides within 120 min with Complex Event Processing.
  16. Example Use Cases: Real-Time Analytics in Action. See https://flink.apache.org/poweredby.html and https://www.flink-forward.org/
  17. Marketplace: Dynamic Ride Pricing with Apache Flink (2018). https://marketplace.uber.com/ (Flink Forward 2018). Input Streams: • supply • demand (taxi orders) • Trips • Traffic. Processing: Compute Location-Sensitive Trends in Rider Demand and Driver Availability; Geo-Sensitive Time-based Aggregations; million events per sec. Output Decisions: • Prices • Pricing • Dispatch • Promotions • Driver Positioning.
  18. Flink as an Anomaly-Detection Engine for the Cloud (2018) • Activity-Based Threat Protection • Behavioural model per cloud user • Detect outliers/suspicious behavior • Cross-reference suspicious users • Alert Admins within seconds. "We needed a stateful and scalable stream processing framework. We tested everything (Azure ML/Streams, MS Orleans, Apache Storm/Samza/Spark/Ignite/Beam etc.) and chose Flink." Yonatan Most & Avihai Berkovitz, https://www.slideshare.net/FlinkForward/flink-forward-berlin-2018-yonatan-most-avihai-berkovitz-anomaly-detection-engine-for-cloud-activities-using-flink Scale: 8 data clusters, many TB of state, 30k events per second.
  19. Data Streaming at Mass Scale. https://data-artisans.com/blog/blink-flink-alibaba-search • Biggest Retailer in the world. • Entire Product Search, A/B Testing, User Recommendations and Analytics Services are powered by Blink (fork of Flink). • 1000s of nodes actively in production.
  20. Continuous Deep Analytics (CDA). The goal of the CDA: • Create a Big Data platform that can make complex real-time decisions based on massive live data. Diagram labels: ∞ Data, PROCESSING, knowledge, REASONING, Decision Making.
  21. The Continuous Deep Analytics Paradigm Shift. Our promise and vision: Real-Time and Deep Analytics for Central & Edge Clouds. From Real-Time Analytics to Continuous Deep Analytics. Real-Time Analytics (online): Query, live data, real-time answers. Deep Analytics (offline): Historic Model, historic data. CDA system: Live Model, all data, critical decision making.
  22. The Bigger Picture. Data Streams and Data Processing: • scalable, fault tolerant analytics • event-based business logic • out-of-order computation • dynamic relational tables (SQL) • event pattern-matching (CEP). ...but what about deeper analytics? • tensors • graph algorithms • deep learning • feature learning • reinforcement learning • ...
  23. Data Pipelines Today • Many Frameworks/Frontends for different needs (ML Training & Serving, SQL, Streams, Tensors, Graphs). Diagram labels: Streams, Feature Learning, Tensor Programming, Dynamic Graphs, AI, ML, RL, Simulation tasks, Reasoning, Feature Engineering, Model Serving; relational operators ⋈, σθ, π.
  24. Marketplace: Dynamic Ride Pricing with Apache Flink (2018). https://marketplace.uber.com/ (Flink Forward 2018). Input Streams: • supply • demand (taxi orders) • Trips • Traffic. Processing: Compute Location-Sensitive Trends in Rider Demand and Driver Availability; Geo-Sensitive Time-based Aggregations; million events per sec. Output Decisions: • Prices • Pricing • Dispatch • Promotions • Driver Positioning.
  25. The Problem & Solution. Problem: Data analytics pipelines are built on diverse programming models with hard abstraction boundaries. Performance deteriorates from context switching, steep data movement costs and excessive type conversions. Solution: raise the level of abstraction through an intermediate representation (IR). The IR is a programming language that is able to both express and reason about each of the programming models.
  26. The Arcon Vision. Programming models: Tensors, DataFrames, DataStreams, Graphs. Diagram labels: Unified Declarative Programming; Arcon; Cross-Compile; Optimize; Generated code; Shared Native Execution.
  27. The Arcon Architecture: Unified Analytics DSL; Arc IR (Intermediate Representation); Arcon Runtime.
  28. Arc IR Translation. Diagram labels: Data Streams, Linear Algebra, Relational Algebra (σθ, π, ⋈), Core DSL. Unified analytics DSL: • Host language-agnostic core • Compositional • First-class citizen support for streams, tensors, relations.
  29. The Arc Intermediate Representation. Tasks: Stream Task, Graph Task, Tensor Task. Diagram: functions λ1, λ2, λ3 are translated to λ1 IR, λ2 IR, λ3 IR and combined as λ1 + λ2 + λ3.
  30. Arcon Compiler Pipeline: Arc (High Level IR); Logical Dataflow IR; Arcon runner; Hardware. Optimizations: Compiler optimizations, Cross-domain optimizations, Dataflow optimizations. Runner: Rust based, Dynamic task execution, Dynamic scaling, Local & distributed. Hardware accelerated: CPU/GPU/FPGA. Arc: an IR for expressing and optimizing computations that combine streams, relations and linear algebra. Arcon: a general-purpose distributed runtime written in Rust.
  31. Arc IR • A minimal yet feature-complete set of read/write-only types and expressions.
  32. Arc Optimisations • Arc supports both compiler and dataflow optimisations • Compiler: Loop unrolling, partial evaluation, ... • Dataflow: Operator fusion, fission, reordering, specialization, ...
  33. Performance. Variants: Unoptimised, Fused, Partially Evaluated, Inlined. Expressions shown: v + 1 (repeated per task), v + 1 + 1 + 1, v + 3. Legend: Task with function.
  34. Performance • 10M elements mapped 50 times on Apache Flink • Arc can boost even existing frameworks • 2 orders of magnitude faster. Variants compared: Unoptimised, Partially Evaluated, Fused, Inlined.
  35. A Runtime Capable of Unified Analytics. Reference: Garefalakis, Karanasos, Pietzuch, "Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications", SoCC 2019. Systems shown: Hadoop, Spark, Flink, Storm, Arcon.
  36. Performance Matters • Arc Optimizer: ~10x Speedup • Shared Hardware Acceleration: ~10^2x Speedup • Data Parallel Execution: ~10^3x Speedup
  37. Thanks • To the CDA and HOPS teams and in general to the distributed computing group at KTH and RISE SICS • Please Visit • DC@KTH https://dcatkth.github.io/ • HOPS https://www.hops.io/ • LogicalClocks https://www.logicalclocks.com/
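
Slide 11 summarises the Event Processing API as f(input, state, time). The transcript gives no code for it, so the following is only a minimal sketch of that shape in plain Python, not Flink's API. The function names `process` and `run`, the `(key, amount)` event layout and the running-total logic are all assumptions chosen for illustration.

```python
def process(event, state, time):
    """Hypothetical f(input, state, time): returns (new_state, outputs).

    state maps each key to (running_total, time_of_last_update).
    """
    key, amount = event
    total, _ = state.get(key, (0, None))
    updated = (total + amount, time)
    # Return a fresh dict so the previous state is left untouched.
    new_state = {**state, key: updated}
    return new_state, [(key, updated[0], time)]


def run(timed_events):
    """Feed (time, event) pairs through process, threading the state along."""
    state, emitted = {}, []
    for time, event in timed_events:
        state, outputs = process(event, state, time)
        emitted.extend(outputs)
    return state, emitted


final_state, outputs = run([(1, ("a", 2)), (2, ("b", 5)), (3, ("a", 1))])
```

In a real engine the state would be partitioned by key and checkpointed by the runtime; here it is an ordinary dictionary so that the data flow stays visible.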
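
Slide 14 names an "Average Tip per Hour" Stream SQL query, but the query text did not survive in the transcript. As a rough illustration of the computation such a query describes, the sketch below assigns each ride to a one-hour tumbling window by event time and averages the tips per window. It is plain Python over a finite list, not Stream SQL or Flink; the `(epoch_seconds, tip)` event shape and the name `average_tip_per_hour` are assumptions made for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # one-hour tumbling windows


def average_tip_per_hour(events):
    """events: iterable of (epoch_seconds, tip). Returns {window_start: mean tip}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, tip in events:
        # Window assignment uses the event's own timestamp, not arrival order.
        window_start = timestamp - timestamp % WINDOW_SECONDS
        sums[window_start] += tip
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# The last event arrives out of order but still lands in the first window.
rides = [(0, 2.0), (1800, 4.0), (3600, 1.0), (7199, 3.0), (900, 3.0)]
hourly = average_tip_per_hour(rides)
```

A streaming engine would emit each window's result incrementally once it decides the window is complete; this batch version only shows the grouping and aggregation logic.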
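
Slide 15 refers to detecting completed taxi rides within 120 minutes using Complex Event Processing, again without the pattern definition. The sketch below shows one plausible reading of that pattern in plain Python: a START event followed by an END event for the same ride, no more than 120 minutes apart. The event layout `(ride_id, kind, minute)`, the inclusive time bound and the name `completed_rides` are assumptions, and the code is not the Flink CEP API.

```python
def completed_rides(events, max_minutes=120):
    """events: (ride_id, kind, minute) tuples in time order; kind is 'START' or 'END'.

    Returns (ride_id, start_minute, end_minute) for rides that ended
    within max_minutes of starting.
    """
    open_rides = {}  # ride_id -> start minute of a ride not yet ended
    matches = []
    for ride_id, kind, minute in events:
        if kind == "START":
            open_rides[ride_id] = minute
        elif kind == "END" and ride_id in open_rides:
            started = open_rides.pop(ride_id)
            if minute - started <= max_minutes:
                matches.append((ride_id, started, minute))
    return matches


events = [
    (1, "START", 0),
    (2, "START", 10),
    (1, "END", 90),    # 90 minutes: matches
    (2, "END", 200),   # 190 minutes: too slow
    (3, "END", 210),   # no START seen: ignored
]
matched = completed_rides(events)
```

A production CEP engine would also expire partial matches when the time bound passes, so that state for rides that never end does not grow without limit; this sketch omits that.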
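
Slides 32 to 34 mention operator fusion and partial evaluation applied to a chain of `v + 1` maps. The sketch below, in plain Python rather than Arc IR, illustrates why those rewrites preserve the result while removing work: the unoptimised version makes one pass and one intermediate list per stage, the fused version makes a single pass, and the partially evaluated version folds the constants into one addition. All function names are hypothetical and the code makes no claim about Arc's actual implementation or its measured speedups.

```python
def add_one(v):
    return v + 1


def run_unoptimised(data, stages):
    """Apply each stage as its own pass, materialising every intermediate list."""
    for stage in stages:
        data = [stage(v) for v in data]
    return data


def fuse(stages):
    """Operator fusion: combine the stages into a single per-element function."""
    def fused(v):
        for stage in stages:
            v = stage(v)
        return v
    return fused


def run_fused(data, stages):
    fused = fuse(stages)
    return [fused(v) for v in data]  # one pass, no intermediate lists


def run_partially_evaluated(data, n):
    """Partial evaluation: n applications of v + 1 folded ahead of time into v + n."""
    return [v + n for v in data]


values = [1, 2, 3]
stages = [add_one] * 3
```

All three functions return the same list for the same input, which is the property an optimiser must preserve when it rewrites the dataflow.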
