Introduction To Flink

Apache Flink is an open-source streaming dataflow engine that provides communication, fault tolerance, and data distribution for distributed computations over data streams. It is a top-level Apache project and a scalable data analytics framework that is compatible with Hadoop. Flink can run both stream processing and batch processing jobs.

  1. An Introduction to Apache Flink: 4G of Big Data. Presented by: Kundan Kumar, Software Consultant
  2. KnolX Etiquettes. Lack of etiquette and manners is a huge turn-off. ● Punctuality: respect KnolX session timings; please do not join a session more than 5 minutes after it starts. ● Feedback: submit constructive feedback for all sessions, as it is very helpful for the presenter. ● Mute: stay on mute unless you have questions or concerns. ● Avoid disturbance: avoid unwanted chit-chat during the session.
  3. Agenda: 01 Big Data evolution; 02 Introduction to Flink; 03 Features of Flink; 04 Architecture of Flink; 05 Anatomy of a Flink program; 06 Demo
  4. Big Data Evolution. Problems with Big Data: ● Storing huge and exponentially growing datasets. ● Processing huge datasets that have a complex structure. ● The 3 Vs of Big Data: Volume, Variety, Velocity.
  5. Big Data Evolution (continued) ● In the early 2000s, the Big Data era began with multiple frameworks, each focusing on a specific Big Data problem.
  6. Big Data Evolution (continued) ● What is needed is a single unified platform that can handle various Big Data problems: ➢ Batch processing ➢ Stream processing ➢ Graph processing ➢ Iterative processing ● A unified platform must have the following characteristics to solve Big Data problems: ➢ Distributed/parallel computation ➢ Fault tolerance ➢ Ease of use (developer-friendly APIs) ➢ Powerful predefined operators/functions (such as join and filter) ➢ Speed
  7. Apache Spark (3G Big Data Framework) ● Spark is a fast cluster computing engine, reported to run in-memory applications up to 100 times faster than Hadoop MapReduce. ● Apache Spark is best known for its in-memory computing capabilities, which deliver high-speed processing. ➢ Problem ● It processes data streams in micro-batches rather than event by event in real time. ● It offers high throughput but medium latency in some use cases.
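To make the latency point on this slide concrete, here is a minimal sketch in plain Python (not Spark or Flink code). It assumes an idealized model in which processing itself takes no time, so the only delay in a micro-batch system is the wait until the current batch closes. The function names and the one-second interval are illustrative assumptions.

```python
from typing import List


def micro_batch_latencies(arrivals: List[float], interval: float) -> List[float]:
    """Wait time between each event's arrival and the end of its micro-batch."""
    latencies = []
    for t in arrivals:
        # The batch that contains t closes at the next multiple of `interval`.
        batch_end = (int(t // interval) + 1) * interval
        latencies.append(batch_end - t)
    return latencies


def per_event_latencies(arrivals: List[float]) -> List[float]:
    """Idealized event-at-a-time model: each event is handled on arrival."""
    return [0.0 for _ in arrivals]


arrivals = [0.25, 0.5, 1.75]  # arrival times in seconds (made-up data)
print(micro_batch_latencies(arrivals, interval=1.0))  # [0.75, 0.5, 0.25]
print(per_event_latencies(arrivals))                  # [0.0, 0.0, 0.0]
```

In this simplified model, an event that arrives just after a batch opens waits almost a full interval, which is the "medium latency" the slide refers to.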
  8. Introduction to Flink ● Apache Flink is a Big Data framework and distributed processing engine for stateful computations over unbounded and bounded data streams. ● Flink follows a streaming-first principle: it is a true stream processing engine and treats batch processing as a special case of streaming. ● Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
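The idea that batch is a special case of streaming can be illustrated with a small plain-Python sketch (this is not the Flink API). A single operator, written once against an iterable, is applied first to a bounded list and then to an unbounded generator. The name `running_sum` is a hypothetical example operator.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Emit the running total after every element, bounded input or not."""
    total = 0
    for value in stream:
        total += value
        yield total


# Bounded input ("batch"): the stream simply ends.
bounded = [1, 2, 3, 4]
batch_result = list(running_sum(bounded))

# Unbounded input ("streaming"): the same operator, we just stop reading early.
unbounded = itertools.count(1)  # 1, 2, 3, ...
stream_prefix = list(itertools.islice(running_sum(unbounded), 4))

print(batch_result)   # [1, 3, 6, 10]
print(stream_prefix)  # [1, 3, 6, 10]
```

The operator does not need to know whether its input ever ends, which is the sense in which a batch job is a stream that happens to be finite.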
  9. Source → Transformations → Sink
  10. ➢ A Flink application may consume real-time data from streaming sources such as message queues or distributed logs, like Apache Kafka or Kinesis. ➢ Flink can also consume bounded, historic data from a variety of data sources. ➢ The streams of results produced by a Flink application can be sent to a wide variety of systems that can be connected as sinks.
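The source, transformation, and sink stages described in the slides above can be sketched as a chain of Python generators. This is a conceptual model only, not Flink code: the in-memory list stands in for a source such as Kafka, and the output list stands in for an external sink. All function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Source: emit records one at a time (stand-in for a message queue)."""
    for line in lines:
        yield line


def transform(records: Iterable[str]) -> Iterator[str]:
    """Transformation: split each line into lowercase words."""
    for record in records:
        for word in record.split():
            yield word.lower()


def sink(records: Iterable[str], out: List[str]) -> None:
    """Sink: deliver results to a target system (here, a plain list)."""
    for record in records:
        out.append(record)


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Wire source -> transformation -> sink and drive the pipeline."""
    results: List[str] = []
    sink(transform(source(lines)), results)
    return results


print(run_pipeline(["Hello Flink", "hello  world"]))
# ['hello', 'flink', 'hello', 'world']
```

Because every stage is a generator, records flow through one at a time instead of being materialized between stages, which loosely mirrors how a streaming dataflow behaves.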
  11. ➢ Programs in Flink are inherently parallel and distributed. ➢ During execution, a stream has one or more stream partitions, and each operator has one or more operator subtasks.
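As a rough illustration of stream partitions and operator subtasks, the sketch below routes keyed events to one of several parallel subtasks by hashing the key. It is a simplification under stated assumptions: Flink's actual key-to-subtask assignment works differently, and `subtask_for`, `partition`, and the CRC32 hash are choices made only for this example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]


def subtask_for(key: str, parallelism: int) -> int:
    """Pick a subtask index for a key using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def partition(events: List[Event], parallelism: int) -> Dict[int, List[Event]]:
    """Split one logical stream into per-subtask stream partitions."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, value in events:
        partitions[subtask_for(key, parallelism)].append((key, value))
    return dict(partitions)


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]  # made-up data
for index, part in sorted(partition(events, parallelism=3).items()):
    print(index, part)
```

The property that matters is that all events with the same key always reach the same subtask, so each subtask can work on its partition independently of the others.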
  12. ➢ Flink supports stateful operations. ➢ The handling of the current event can depend on the accumulated effect of all the events that came before it. ➢ The set of parallel instances of a stateful operator is effectively a sharded key-value store. Each parallel instance is responsible for handling events for a specific group of keys, and the state for those keys is kept locally.
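The "sharded key-value store" view of a stateful operator can be sketched as follows. This is a plain-Python model, not Flink's state API: each `CountingSubtask` keeps a local dictionary for the keys routed to it, and `KeyedCounter` plays the role of the parallel operator. The class names and the CRC32-based routing are assumptions for illustration.

```python
import zlib
from typing import Dict, List, Tuple


class CountingSubtask:
    """One parallel instance; its state holds only the keys routed to it."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def process(self, key: str) -> Tuple[str, int]:
        # The result depends on every earlier event seen for this key.
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]


class KeyedCounter:
    """A stateful operator made of several subtasks, sharded by key."""

    def __init__(self, parallelism: int) -> None:
        self.subtasks: List[CountingSubtask] = [
            CountingSubtask() for _ in range(parallelism)
        ]

    def process(self, key: str) -> Tuple[str, int]:
        index = zlib.crc32(key.encode("utf-8")) % len(self.subtasks)
        return self.subtasks[index].process(key)


counter = KeyedCounter(parallelism=2)
outputs = [counter.process(k) for k in ["a", "b", "a", "a"]]
print(outputs)  # [('a', 1), ('b', 1), ('a', 2), ('a', 3)]
```

No subtask ever needs to read another subtask's dictionary, which is why keeping the state local to each parallel instance works.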
  13. Flink Architecture ➢ Flink 1.x's architecture consists of several components, such as deployment, core processing, and APIs. ➢ Flink has a layered architecture, and each component belongs to a specific layer. ➢ Each layer is built on top of the one below it, which keeps the abstractions clear.
  14. Flink's Distributed Execution ➢ Flink is based on a master-slave architecture. ➢ Several processes take part in the execution of a Flink program, namely the Job Manager, the Task Manager, and the Job Client.
  15. Flink Task Manager
  16. Flink Features ➢ High performance ➢ Exactly-once stateful computation ➢ Fault tolerance ➢ Memory management ➢ Optimizer ➢ Unified platform for stream and batch ➢ Rich libraries
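The "exactly-once stateful computation" and "fault tolerance" features can be motivated with a toy checkpoint-and-replay model. This sketch is a deliberate simplification and not how Flink is implemented: it assumes a single operator, a replayable source addressed by offset, and a checkpoint that stores the offset together with the state. The function name and parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def run_with_checkpoints(
    events: List[int], checkpoint_every: int, fail_at: Optional[int] = None
) -> int:
    """Sum events, snapshotting (offset, state) and recovering from one failure."""
    offset, total = 0, 0
    checkpoint: Tuple[int, int] = (0, 0)  # last consistent (offset, state) pair
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            # Simulated crash: discard in-flight progress, restore the
            # snapshot, and replay the source from the checkpointed offset.
            failed = True
            offset, total = checkpoint
            continue
        total += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, total)
    return total


events = [1, 2, 3, 4, 5]  # made-up data
print(run_with_checkpoints(events, checkpoint_every=2))             # 15
print(run_with_checkpoints(events, checkpoint_every=2, fail_at=3))  # 15
```

Because the offset and the state are saved together, replayed events are applied to a state that has not yet seen them, so each event affects the final result exactly once even though some events are read twice.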
  17. Basic Anatomy of a Flink Program
  18. Demo
  19. Q/A
  21. Thank You!