Distributed Systems in Data Engineering


A distributed system, in its simplest definition, is a group of computers working together so as to
appear as a single computer to the end user. These machines have a shared state, operate
concurrently, and can fail independently without affecting the whole system's uptime.
As technology keeps expanding, distributed systems are becoming more and more widespread.
The number of available computer technologies and innovations is growing rapidly, and this
results in intense computational requirements.
Moore's law predicted more computing power from fitting more transistors onto a single chip
in a cost-efficient way (the transistor count approximately doubles every two years). Over the
past 5 years, however, there has been a gradual shift away from relying on this alone, toward
the ability to scale horizontally (adding more machines) and not just vertically (making one
machine more powerful).



Lesson Keynote
Distributed Systems in Data Engineering
By: Oluwasegun Matthew | oadetimehin@terragonltd.com

Summary
1. Introduction to Distributed Systems
   a. The concept of server-client architecture
   b. Channel for Communication
   c. Impact on Data Engineering at Scale
2. From Localhost to Production - things to watch out for...
3. Industry based Technologies/Tools in View
   a. Messaging kits - RabbitMQ & Kafka
   b. In Memory Data Caching - Redis & Aerospike
   c. Data in Stream Tools - AWS Kinesis
   d. Monitoring and Log Watch - CloudWatch
4. Summary - in class
5. Questions

Class Activity: Form 4 groups, choose any of the messaging and in-memory data caching tools, and use it to create a resilient distributed system to fix the following problems:
- Crashing nature of the e-Portal
- Exam records processing

Let's Dive In...

1. Introduction to Distributed Systems

According to Wikipedia (via Google):

A distributed system, in its simplest definition, is a group of computers working together so as to appear as a single computer to the end user. These machines have a shared state, operate concurrently, and can fail independently without affecting the whole system's uptime.

As technology keeps expanding, distributed systems are becoming more and more widespread. The number of available computer technologies and innovations is growing rapidly, and this results in intense computational requirements.

Moore's law predicted more computing power from fitting more transistors onto a single chip in a cost-efficient way (the transistor count approximately doubles every two years). Over the past 5 years, however, there has been a gradual shift away from relying on this alone, toward the ability to scale horizontally (adding more machines) and not just vertically (making one machine more powerful).

The Concept of Server-Client Architecture

Client-server architecture (client/server) is a network architecture in which each computer or process on the network is either a client or a server.

Just as in everyday life, where many activities are based on a server/client relationship (e.g. Cashier/Customer, Bus Conductor/Passengers), the same holds in technology.

Another type of network architecture is the peer-to-peer architecture, so called because each node has equivalent responsibilities - but this isn't what we are discussing today.

The approach of breaking a larger application into chunks over a server-client architecture can be explained with Microservices. Consider the cases below:

Case 1 - Monolith: At the core of the application is the business logic, which is implemented by modules that define services, domain objects, and events. Surrounding the core are adapters that interface with the external world. Examples of adapters include database access components, messaging components that produce and consume messages, and web components that either expose an API or implement a UI. Because all of this is packaged and deployed as one unit, it eventually results in Monolithic Hell.

Case 2 - Microservices: Here we are tackling complexity. A service typically implements a set of distinct features or functionality, such as order management, customer management, etc. Each microservice is a mini-application that has its own hexagonal architecture consisting of business logic along with various adapters. Some microservices would expose an API that's consumed by other microservices or by the application's clients. Other microservices might implement a web UI. At runtime, each instance is often a cloud VM or a Docker container.

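The microservice layout described in Case 2 (business logic in the middle, adapters around it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Order`, `OrderRepository`, `InMemoryOrderRepository`, `OrderService` and `handle_request` are hypothetical, the storage adapter is a plain dictionary rather than a real database, and the API adapter is an ordinary function standing in for an HTTP endpoint.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class Order:
    order_id: int
    item: str
    quantity: int


class OrderRepository(Protocol):
    """Port: what the business logic needs from storage."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: int) -> Order: ...


class InMemoryOrderRepository:
    """Adapter: a dict standing in for a real database (assumption)."""

    def __init__(self) -> None:
        self._orders: Dict[int, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: int) -> Order:
        return self._orders[order_id]


class OrderService:
    """Core business logic; it knows only the port, not the adapter."""

    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository
        self._next_id = 1

    def place_order(self, item: str, quantity: int) -> Order:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        order = Order(self._next_id, item, quantity)
        self._next_id += 1
        self._repository.save(order)
        return order


def handle_request(service: OrderService, payload: dict) -> dict:
    """Adapter: translates an API-style payload into a service call."""
    order = service.place_order(payload["item"], payload["quantity"])
    return {"order_id": order.order_id, "status": "accepted"}


if __name__ == "__main__":
    service = OrderService(InMemoryOrderRepository())
    print(handle_request(service, {"item": "textbook", "quantity": 2}))
```

Because `OrderService` depends only on the `OrderRepository` port, the dictionary could later be swapped for a database-backed adapter without touching the business logic, which is the point of the hexagonal layout.
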
Quiz
Give...
● Examples of a client/server relationship in the real world
● Methods of binding two systems that you know
● Two architectures in which software is designed
● The major issue with Monolithic design

Channel for Communication

When we have a decentralized system, it's important to make its parts communicate with one another. The client/server architecture emphasizes a producer/consumer computing architecture where the server acts as the producer and the client as a consumer. The model of communication can be either synchronous or asynchronous, which gives two modes:
- API Mode
- Buffer Mode

API Mode: a synchronous (instant feedback) mode of communication. It is usually used for one-to-one communication through protocols like HTTP, HTTPS, SMTP, SMPP etc.

Buffer Mode: an asynchronous mode of communication, where feedback isn't needed immediately. It works for both one-to-one and broadcast communication. In this mode, a queuing/messaging/buffering system is placed between the two systems to manage the flow of information. The following queuing algorithms are emphasized here:
- FIFO (First In First Out)
- LIFO (Last In First Out)
- SJF (Shortest Job First)
- Round Robin

Impact on Data Engineering at Scale

Again, bringing the concept of distributed systems into data engineering... Hey, what's data engineering?

Data engineering is the act of building and managing information or "big data" infrastructure. Data engineers create architecture that helps analyze and process data in the way an organization needs it, from data processing to creating a pipeline of data into a lake and warehouse for business value creation.

The following are some of the positive impacts of distributed systems in data engineering:
- Creating resilient data architecture
- Easily managed systems
- Security and control
- Reduced failure points
- Fault detection with ease

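The difference between API Mode and Buffer Mode described under Channel for Communication can be sketched with the Python standard library alone. This is a minimal sketch, not a production design: `process`, `api_mode` and `buffer_mode` are hypothetical names, and an in-process `queue.Queue` with one worker thread stands in for a real messaging system placed between two machines.

```python
import queue
import threading


def process(record: str) -> str:
    """Stand-in for the work done on one request."""
    return record.upper()


def api_mode(record: str) -> str:
    """Synchronous: the caller blocks until the result comes back."""
    return process(record)


def buffer_mode(records):
    """Asynchronous: the sender enqueues records and moves on;
    a single worker thread drains the buffer in FIFO order."""
    buffer = queue.Queue()
    results = []

    def worker() -> None:
        while True:
            record = buffer.get()
            if record is None:  # sentinel: no more work
                break
            results.append(process(record))

    consumer = threading.Thread(target=worker)
    consumer.start()
    for record in records:
        buffer.put(record)  # returns at once; no feedback is awaited
    buffer.put(None)
    consumer.join()  # wait here only so the sketch can return results
    return results


if __name__ == "__main__":
    print(api_mode("exam-001"))
    print(buffer_mode(["exam-001", "exam-002", "exam-003"]))
```

In the buffered version the sender never waits on `process`; if the worker is slow or briefly unavailable, records simply accumulate in the buffer, which is what makes this mode attractive for resilient pipelines.
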
Quiz
Mention...
● 2 queue algorithms you are familiar with
● Web technologies that run on the HTTP protocol

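For reference, the four queuing algorithms named under Buffer Mode (FIFO, LIFO, SJF, Round Robin) can be compared on the same input. The sketch below makes several simplifying assumptions: jobs are hypothetical `(name, duration)` tuples, all jobs arrive at the same time, SJF is the non-preemptive variant, and Round Robin uses a fixed time slice (`quantum`). Each function returns the order in which jobs get served.

```python
import heapq
from collections import deque


def fifo(jobs):
    """First In First Out: serve jobs in arrival order."""
    pending = deque(jobs)
    order = []
    while pending:
        name, _ = pending.popleft()
        order.append(name)
    return order


def lifo(jobs):
    """Last In First Out: serve the most recent arrival first."""
    stack = list(jobs)
    order = []
    while stack:
        name, _ = stack.pop()
        order.append(name)
    return order


def shortest_job_first(jobs):
    """SJF (non-preemptive): serve the shortest job first;
    ties are broken by arrival position."""
    heap = [(duration, index, name)
            for index, (name, duration) in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def round_robin(jobs, quantum):
    """Round Robin: each job runs for at most `quantum` time units,
    then goes to the back of the queue if it is not finished."""
    pending = deque(jobs)
    order = []
    while pending:
        name, remaining = pending.popleft()
        order.append(name)
        if remaining > quantum:
            pending.append((name, remaining - quantum))
    return order


if __name__ == "__main__":
    jobs = [("A", 3), ("B", 1), ("C", 2)]
    print(fifo(jobs))
    print(lifo(jobs))
    print(shortest_job_first(jobs))
    print(round_robin(jobs, quantum=1))
```

With the sample jobs, FIFO serves A, B, C; LIFO serves C, B, A; SJF serves B, C, A; and Round Robin with a quantum of 1 serves A, B, C, A, C, A.
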
2. From Localhost to Production - things to watch out for...

When systems are built in a development environment, a lot isn't considered; this may be due to limited experience, missing information or unforeseen circumstances. This implies that a perfect system cannot be built at the development stage; it has to be tested in a real-life scenario.

Sometimes an overkill system design is a major flaw of the development phase, but only production will really tell.

List of things to watch out for:
- Unexpected spike in platform/technology usage - system overload
- Performance as a result of consistent platform usage
- Security of interconnected systems
- Extensibility of features
- Ease of deployment

Enough of theoretical exposition, let's go practical...

3. Industry based Technologies/Tools in View

Here we shall talk about the different tools used in the industry to manage distributed systems.

Messaging Kits - e.g. RabbitMQ or Kafka

RabbitMQ is the most widely deployed open source message broker - https://www.rabbitmq.com/
Tutorial Guide (in PHP) - https://www.rabbitmq.com/tutorials/tutorial-three-php.html

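The linked tutorial covers broadcast (publish/subscribe) messaging in PHP. As a rough Python counterpart, the sketch below publishes one message to a fanout exchange using the third-party `pika` client. The exchange name `logs`, the message text and the broker address `localhost` are assumptions made for illustration; `publish` takes the channel as a parameter so that the logic can be read (and exercised) without a running broker.

```python
EXCHANGE = "logs"  # hypothetical exchange name


def publish(channel, message: str) -> None:
    """Broadcast one message to every queue bound to the fanout exchange."""
    channel.exchange_declare(exchange=EXCHANGE, exchange_type="fanout")
    channel.basic_publish(
        exchange=EXCHANGE,
        routing_key="",  # ignored by fanout exchanges
        body=message.encode("utf-8"),
    )


def main() -> None:
    """Connect to a broker assumed to be running on localhost and publish."""
    import pika  # third-party client: pip install pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="localhost")
    )
    try:
        publish(connection.channel(), "exam record 42 received")
    finally:
        connection.close()
```

Calling `main()` requires `pika` to be installed and a RabbitMQ broker listening on `localhost`; every consumer queue bound to the exchange would then receive its own copy of the message.
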
In Memory Data Caching - e.g. Redis or Aerospike

Redis is an open source in-memory data structure store used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs etc. - https://redis.io/

Documentation found here for PHP: https://github.com/amphp/redis

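One common way to use Redis as a cache is the cache-aside pattern: look in the cache first, and only compute (or query the database) on a miss. The sketch below is an assumption-laden illustration rather than a recommended design: `get_or_compute`, the key `result:student-17`, the sample payload and the 60-second expiry are all hypothetical, and the client calls used are `get` and `setex` from the third-party `redis` Python package.

```python
import json
from typing import Callable


def get_or_compute(cache, key: str, compute: Callable[[], dict],
                   ttl_seconds: int = 60) -> dict:
    """Cache-aside: return the cached value if present,
    otherwise compute it, store it with an expiry, and return it."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = compute()
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value


def main() -> None:
    """Use a Redis server assumed to be running on localhost:6379."""
    import redis  # third-party client: pip install redis

    cache = redis.Redis(host="localhost", port=6379)
    print(get_or_compute(cache, "result:student-17", lambda: {"score": 78}))
```

Calling `main()` twice within the expiry window would run the `compute` callback only the first time; the second call is served from Redis.
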
Data in Stream Tools - AWS Kinesis

AWS Kinesis, owned by Amazon, makes it easy to collect, process and analyze real-time streaming data so you can get timely insights and react quickly to new information - https://aws.amazon.com/kinesis/

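Sending a record into a Kinesis data stream might look like the sketch below, which uses the `put_record` call of the third-party `boto3` SDK. The stream name `exam-records`, the region `eu-west-1`, the event fields and the choice of `student_id` as partition key are all assumptions for illustration; the client is passed in as a parameter so the function does not depend on AWS credentials being present.

```python
import json


def send_event(kinesis_client, stream_name: str, event: dict) -> None:
    """Put one JSON-encoded event onto a Kinesis data stream."""
    kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records with the same partition key go to the same shard.
        PartitionKey=str(event["student_id"]),
    )


def main() -> None:
    """Requires AWS credentials and an existing stream (both assumed)."""
    import boto3  # third-party AWS SDK: pip install boto3

    client = boto3.client("kinesis", region_name="eu-west-1")
    send_event(client, "exam-records", {"student_id": 17, "score": 78})
```

Calling `main()` only works with valid AWS credentials and a stream that already exists under the assumed name.
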
Monitoring and Logs Watch - CloudWatch

AWS CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers - https://aws.amazon.com/cloudwatch/

Assessment

See class activity on the first page...

Questions and Mentorship

For further questions, collaboration or mentorship, reach out:
Email: oadetimehin@terragonltd.com
Mobile: 07060514642
