Chapter 1 - Introduction
Ayalew Belay (Dr.)
with thanks to Dr. Mulugeta Lebsie
1.1 Introduction and Definition
Before the mid-80s, computers were
Very expensive (hundreds of thousands or even millions of dollars)
Very slow (a few thousand instructions per second)
Not connected to each other
After the mid-80s: two major developments
Cheap and powerful microprocessor-based computers appeared
Computer networks
LANs at speeds ranging from 10 to 1000 Mbps (now even 10, 40, and 100 Gbps)
WANs at speeds ranging from 64 Kbps to gigabits/sec
Consequence
It became feasible to use a large network of computers working on the same application; this is in contrast to the old centralized systems, which consisted of a single computer with its peripherals
Definition of a Distributed System
A distributed system is:
A collection of independent computers that appears to its users as a single coherent system, i.e., as a single computer (Tanenbaum & van Steen)
Other Definitions
A distributed system is a system designed to support the
development of applications and services which can exploit a
physical architecture consisting of multiple, autonomous
processing elements that do not share primary memory but
cooperate by sending asynchronous messages over a
communication network (Blair & Stefani)
The definitions have two aspects:
1. Hardware: autonomous machines
2. Software: a single system view for the users
A distributed system is one that stops you from getting any work done when a machine you have never even heard of crashes (Leslie Lamport)
Why Distributed?
Resource and Data Sharing
Printers, databases, multimedia servers, ...
Availability, Reliability
The loss of some instances can be hidden
Scalability, Extensibility
The system grows with demand (e.g., extra servers)
Performance
Huge power (CPU, memory, ...) available
Inherent distribution, communication
Organizational distribution, e-mail, video
Problems of Distribution
Concurrency, Security
Clients must not disturb each other
Privacy
e.g., when a preference profile of a user is built, such as by using cookies
Unwanted communication such as spam
Partial failure
We often do not know where the error is (e.g., RPC)
Location, Migration, Relocation, Replication
Clients must be able to find their servers
Heterogeneity
Hardware, platforms, languages, management
Characteristics of Distributed Systems
Differences between the computers and the ways they
communicate are hidden from users
Users and applications can interact with a distributed system in
a consistent and uniform way regardless of location
Distributed systems should be easy to expand and scale
A distributed system is normally continuously available, even though parts of it may fail
1.2 Goals of a Distributed System
To support heterogeneous computers and networks and to
provide a single-system view, a distributed system is often
organized by means of a layer of software called middleware
that extends over multiple machines
A distributed system organized as middleware; note that the middleware
layer extends over multiple machines, and offers each application the
same interface
Ack: most diagrams in all slides are taken from the textbook
A distributed system should
Easily connect users with resources (printers, computers,
storage facilities, data, files, Web pages, ...)
Some of the reasons
Economics: sharing resources such as printers and high-speed computers
To collaborate and exchange information
Groupware: software for collaborative editing,
teleconferencing, etc.
e-commerce: buying and selling goods
Be transparent: hide the fact that the resources and processes
are distributed across multiple computers
Be open
Be scalable
Transparency in a Distributed System
A distributed system that is able to present itself to users and
applications as if it were only a single computer system is said
to be transparent
Different forms of transparency in a distributed system
Transparency | Description
Access | Hide differences in data representation (endianness, file naming, ...) and how a resource is accessed
Location | Hide where a resource is physically located; e.g., where is http://www.prenhall.com/index.html? (naming)
Migration | Hide that a resource may move to another location
Relocation | Hide that a resource may be moved to another location while in use; e.g., mobile users moving from place to place with their wireless laptops
Replication | Hide that a resource is replicated (for availability and performance); all replicas have the same name
Concurrency | Hide that a resource may be shared by several competitive users; the resource must be left in a consistent state, e.g., through locking
Failure | Hide the failure and recovery of a resource
But trying to achieve full distribution transparency may be impossible or may not be a good idea
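Concurrency transparency through locking can be sketched in a few lines of Python. This is a minimal sketch, assuming a single process in which threads play the role of competing users; the `SharedAccount` class and `simulate_users` helper are hypothetical, and a real distributed system would need a distributed concurrency-control mechanism rather than an in-process lock.

```python
import threading


class SharedAccount:
    """A resource shared by several competing users (hypothetical example)."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance
        self._lock = threading.Lock()  # serializes access to the resource

    def deposit(self, amount: int) -> None:
        # The read-modify-write runs under the lock, so no user can observe
        # or leave behind an inconsistent intermediate state.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def simulate_users(account: SharedAccount, users: int, deposits_each: int) -> None:
    """Start one thread per user; each makes `deposits_each` deposits of 1."""

    def user() -> None:
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    acct = SharedAccount()
    simulate_users(acct, users=8, deposits_each=1000)
    print(acct.balance)  # 8000: every deposit is applied exactly once
```

Each user simply calls `deposit`; none of them needs to know that others use the account at the same time, which is what concurrency transparency asks for.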
Openness in a Distributed System
A distributed system should be open
We need well-defined interfaces
Interoperability
Components of different origin can communicate
Portability
Components work on different platforms
Another goal of an open distributed system is that it should be flexible and extensible: it should be easy to configure the system out of different components and to add new components or replace existing ones; this is easier said than done
An open distributed system is a system that offers services according to standard rules that describe the syntax and semantics of those services; e.g., protocols in networks
Standards - a necessity
Scalability in Distributed Systems
A distributed system should be scalable; there are three
dimensions
Size: adding more users and resources to the system
Geographically: users and resources may be far apart
Administratively: should be easy to manage even if it spans
many administrative organizations
But as a system is scaled up, it may exhibit performance problems
In distributed systems, services are often specified through interfaces, which are typically described in an Interface Definition Language (IDL)
Interfaces specify only syntax: the names of the functions, types of parameters, return values, possible exceptions, ...
Semantics are given informally, in natural language
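Python has no IDL, but `typing.Protocol` can play a similar role in a sketch: it fixes the operation names, parameter types, and return types, while the semantics (including which exception is raised when) are only described in natural language in the docstrings. The `FileService` interface, the `FileNotFound` exception, and the `InMemoryFileService` implementation are hypothetical; a real system would use an actual IDL (for example, CORBA IDL or Protocol Buffers) and generate stubs from it.

```python
from typing import Dict, Protocol, runtime_checkable


class FileNotFound(Exception):
    """Exception that the interface declares as a possible outcome."""


@runtime_checkable
class FileService(Protocol):
    # Syntax only: operation names, parameter types, return types.
    # The semantics live in the docstrings, i.e., in natural language.
    def read(self, name: str, offset: int, count: int) -> bytes:
        """Return up to `count` bytes of file `name`, starting at `offset`.
        Raises FileNotFound if the file does not exist."""
        ...

    def write(self, name: str, offset: int, data: bytes) -> int:
        """Write `data` at `offset`, creating the file if needed;
        return the number of bytes written."""
        ...


class InMemoryFileService:
    """One possible implementation; any class with matching methods conforms."""

    def __init__(self) -> None:
        self._files: Dict[str, bytearray] = {}

    def read(self, name: str, offset: int, count: int) -> bytes:
        if name not in self._files:
            raise FileNotFound(name)
        return bytes(self._files[name][offset:offset + count])

    def write(self, name: str, offset: int, data: bytes) -> int:
        buf = self._files.setdefault(name, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # fill the gap with zeros
        buf[offset:offset + len(data)] = data
        return len(data)
```

Because conformance is checked only by method names and signatures, a client written against `FileService` works with any component that offers the same interface, which is the interoperability the openness goal asks for.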
Concept | Example
Centralized services | A single server for all users (mostly for security reasons)
Centralized data | A single online telephone book
Centralized algorithms | Doing routing based on complete information
Examples of scalability limitations
Scalability problems leading to low performance
Scaling Techniques: how to solve scaling problems
The problem is mainly performance; it arises from limitations in the capacity of servers and networks and, for geographical scalability, from links with high latency that are mostly unreliable
Three possible solutions: hiding communication latencies,
distribution, and replication
a. Hide Communication Latencies
Try to avoid waiting for responses to remote service requests
Let the requester do other useful work
i.e., construct requesting applications that use only asynchronous communication instead of synchronous communication; when a reply arrives, the application is interrupted
Good for batch processing and parallel applications, since independent tasks can be scheduled while another task is waiting for communication to complete; for non-parallel programs, multithreading can be used instead
Hiding communication latencies is in general not applicable to interactive applications
For interactive applications, try to reduce communication by moving part of the job to the client; e.g., when a user fills in a form to access a database, the client can check the entries
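Hiding latency with asynchronous requests can be sketched with Python's `asyncio`. This is a minimal sketch under stated assumptions: `remote_request` is a hypothetical stand-in whose `asyncio.sleep` simulates a 0.2 s network round trip, and instead of being interrupted, the client is resumed by the event loop when the replies are ready.

```python
import asyncio
import time


async def remote_request(server: str, latency: float) -> str:
    """Stand-in for a remote service call; the sleep simulates network latency."""
    await asyncio.sleep(latency)
    return f"reply from {server}"


def local_work() -> int:
    """Useful local computation done while the replies are outstanding."""
    return sum(i * i for i in range(10_000))


async def client() -> tuple:
    # Send all requests without blocking on any single reply.
    pending = [asyncio.create_task(remote_request(f"server{i}", 0.2)) for i in range(3)]
    await asyncio.sleep(0)  # let the tasks start, i.e., "send" their requests
    result = local_work()  # keep doing useful work while replies are in transit
    replies = await asyncio.gather(*pending)  # collect the replies, in request order
    return list(replies), result


if __name__ == "__main__":
    start = time.perf_counter()
    replies, result = asyncio.run(client())
    print(replies, result, f"{time.perf_counter() - start:.2f}s")
```

Because the three simulated round trips overlap with each other and with the local work, the total waiting time is roughly one latency rather than three.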
(a) A server checking the correctness of field entries
(b) A client doing the job
e.g., checking the completeness of mandatory fields
Shipping code to the client is supported in Web applications, historically through Java applets and ActiveX controls (with some security issues) and today mainly through JavaScript
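Case (b), where the client checks the form itself, can be sketched as follows. The field names and the `send` callback are hypothetical; `send` stands in for whatever request the client would really make to the database server.

```python
from typing import Callable, Dict, List

# Hypothetical mandatory fields of a database-access form.
MANDATORY_FIELDS = ("name", "email", "department")


def missing_fields(form: Dict[str, str]) -> List[str]:
    """Return the mandatory fields that are absent or blank."""
    return [f for f in MANDATORY_FIELDS if not form.get(f, "").strip()]


def submit(form: Dict[str, str], send: Callable[[Dict[str, str]], str]) -> str:
    """Check the form on the client; contact the server only if it is complete."""
    missing = missing_fields(form)
    if missing:
        # Rejected locally: no round trip to the server is needed.
        return "please fill in: " + ", ".join(missing)
    return send(form)  # stands in for the real request to the server
```

An incomplete form costs no network traffic at all; only complete forms are sent.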
b. Distribution
Distribution means splitting a component into smaller parts and spreading those parts across the system
e.g., DNS, the Domain Name System, which resolves names such as aau.edu.et (as in abebe@aau.edu.et)
The DNS name space is divided into nonoverlapping zones
For details, see Chapter 5 - Naming
An example of dividing the DNS name space into zones
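Finding the zone responsible for a name can be sketched as a longest-suffix match over the zone apexes. This is a simplification: the zone table below is hypothetical, and real resolvers discover zones by following referrals from name servers rather than consulting a local table.

```python
# Hypothetical zone table: each zone is named by the domain at its apex.
ZONES = {".", "et", "edu.et", "aau.edu.et"}


def responsible_zone(name: str) -> str:
    """Return the most specific zone whose apex is a suffix of `name`.

    Zones do not overlap: a name belongs to the deepest zone that
    contains it, so the longest matching suffix wins.
    """
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ZONES:
            return candidate
    return "."  # everything else falls in the root zone
```

For the address abebe@aau.edu.et, the domain part aau.edu.et is handled by its own zone, so lookups for it need not touch the name servers of the other zones at all; that is how distribution spreads the load.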
c. Replication
Replicate components across a distributed system to increase
availability and for load balancing, leading to better
performance
Replication is decided by the owner of a resource
Caching (a special form of replication) also reduces communication latency; whether to cache is decided by the user (client) of a resource
But caching and replication may lead to consistency problems
(see Chapter 7 - Consistency and Replication)
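A client-side cache, and the consistency problem it can cause, can be sketched as below. The `Server` and `CachingClient` classes are hypothetical; each call to `Server.get` stands for one network round trip.

```python
from typing import Dict


class Server:
    """Owner of the data (hypothetical); counts the requests it receives."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {"price": 100}
        self.requests = 0

    def get(self, key: str) -> int:
        self.requests += 1  # each call stands for one network round trip
        return self.data[key]


class CachingClient:
    """Client-side cache: the client, not the server, decides to keep copies."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, int] = {}

    def get(self, key: str) -> int:
        if key not in self.cache:  # miss: fetch remotely and keep a copy
            self.cache[key] = self.server.get(key)
        return self.cache[key]  # hit: no communication at all

    def invalidate(self, key: str) -> None:
        self.cache.pop(key, None)  # drop the copy so the next get refetches
```

Repeated reads cost only one round trip, but if the server's value changes, the client keeps returning the old copy until it is invalidated; this is exactly the consistency problem mentioned above.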
Pitfalls when Developing Distributed Systems
These pitfalls stem from false assumptions made by first-time developers of distributed systems; the assumptions concern properties that are unique to distributed systems and do not arise in nondistributed applications:
The network is reliable (making it difficult to achieve failure
transparency)
The network is secure
The network is homogeneous
The topology does not change
Latency is zero
Bandwidth is infinite
Transport cost is zero
There is one administrator
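Dropping the first assumption ("the network is reliable") means every remote call must be prepared to fail. A common remedy is to retry with exponential backoff, sketched below under stated assumptions: `TransientNetworkError`, `call_with_retries`, and `make_flaky` are hypothetical names, the flaky call is simulated, and retrying is only safe when the request is idempotent (repeating it has no extra effect).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientNetworkError(Exception):
    """A failure that may succeed on retry, e.g., a timeout (hypothetical)."""


def call_with_retries(request: Callable[[], T], attempts: int = 4,
                      base_delay: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`; after failure k (counting from 0) wait base_delay * 2**k."""
    for k in range(attempts):
        try:
            return request()
        except TransientNetworkError:
            if k == attempts - 1:
                raise  # give up and let the caller handle the failure
            sleep(base_delay * 2 ** k)
    raise ValueError("attempts must be at least 1")


def make_flaky(failures: int) -> Callable[[], str]:
    """Simulated remote call that fails `failures` times before succeeding."""
    state = {"left": failures}

    def request() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientNetworkError("timeout")
        return "ok"

    return request
```

The growing delay also respects two other fallacies: latency is not zero and bandwidth is not infinite, so hammering a struggling network with immediate retries only makes matters worse.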
1.3 Types of Distributed Systems
Three types: distributed computing systems, distributed
information systems, and distributed pervasive/embedded
systems
1. Distributed Computing Systems
Used for high-performance computing tasks
Two types: cluster computing and grid computing
Cluster Computing
A collection of similar workstations or PCs
(homogeneous), closely connected by means of a high-speed LAN
Each node runs the same operating system
Used for parallel programming, in which a single compute-intensive program is run in parallel on multiple machines
An example of a cluster computing system
A master node runs middleware (containing libraries for parallel programs) and controls the other compute nodes; it
Allocates tasks
Provides an interface to users
etc.
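The master's role of allocating tasks and combining results can be sketched with Python's `concurrent.futures`. This is only a sketch of the structure: the threads stand in for compute nodes (in CPython they give no real CPU speedup), the partial sum of squares is a hypothetical workload, and a real cluster would use middleware such as an MPI library across separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compute_task(chunk: range) -> int:
    """Work done by one compute node: a partial sum of squares."""
    return sum(i * i for i in chunk)


def master(n: int, nodes: int) -> int:
    """Split the job over `nodes` workers, dispatch the tasks, combine the results."""
    step = max(1, (n + nodes - 1) // nodes)  # chunk size per node
    chunks: List[range] = [range(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:  # simulated compute nodes
        return sum(pool.map(compute_task, chunks))
```

The user only calls `master`; how the work is split and where it runs stays hidden, matching the master node's role as the interface to users.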
Grid Computing
“Resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual organizations” (Ian Foster)
High degree of heterogeneity: no assumptions are made
concerning hardware, operating systems, networks,
administrative domains, security policies, etc.
Globus is a software system for Grid Computing; read about
the Globus Alliance at http://www.globus.org/
2. Distributed Information Systems
Many networked applications
Problem: interoperability
At the lowest level: wrap a number of requests into a single
larger request and have it executed as a distributed
transaction; all or none of the requests would be executed
How to let applications communicate directly with each other,
i.e., Enterprise Application Integration (EAI)
a. Transaction Processing Systems
Consider database applications
Special primitives are required to program transactions, supplied
either by the underlying distributed system or by the language
runtime system
The exact list of primitives depends on the type of application; procedure calls, ordinary statements, etc. can also be included
Primitive | Description
BEGIN_TRANSACTION | Mark the start of a transaction
END_TRANSACTION | Terminate the transaction and try to commit
ABORT_TRANSACTION | Kill the transaction and restore the old values
READ | Read data from a file, a table, or otherwise
WRITE | Write data to a file, a table, or otherwise
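The primitives in the table can be sketched on a toy in-memory store that keeps an undo log of old values. The `Store` class and its method names are a hypothetical mapping of the primitives; the sketch runs one transaction at a time, has no concurrency control, and its commit always succeeds.

```python
from typing import Dict, Optional


class Store:
    """Toy single-process store offering the transaction primitives.

    Old values are kept in an undo log so ABORT_TRANSACTION can restore them.
    There is no concurrency control: only one transaction at a time.
    """

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self._undo: Optional[Dict[str, Optional[int]]] = None

    def begin_transaction(self) -> None:  # BEGIN_TRANSACTION
        self._undo = {}

    def read(self, key: str) -> Optional[int]:  # READ
        return self.data.get(key)

    def write(self, key: str, value: int) -> None:  # WRITE
        if self._undo is None:
            raise RuntimeError("WRITE outside a transaction")
        self._undo.setdefault(key, self.data.get(key))  # remember the old value once
        self.data[key] = value

    def end_transaction(self) -> None:  # END_TRANSACTION: commit
        self._undo = None

    def abort_transaction(self) -> None:  # ABORT_TRANSACTION
        if self._undo is None:
            raise RuntimeError("no transaction to abort")
        for key, old in self._undo.items():
            if old is None:
                del self.data[key]  # the key did not exist before
            else:
                self.data[key] = old
        self._undo = None
```

Only the first write to a key records its old value, so aborting restores the state exactly as it was at BEGIN_TRANSACTION, however many writes happened in between.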
The Transaction Model
The model for transactions comes from the world of business
A supplier and a retailer negotiate on
Price
Delivery date
Quality
etc.
Until the deal is concluded, they can continue negotiating, or either of them can terminate the negotiation
But once they have reached an agreement, they are bound by law to carry out their part of the deal
Transactions between processes are similar to this scenario
e.g., assume the following banking operation
Withdraw an amount x from account 1
Deposit the amount x to account 2
What happens if there is a problem after the first activity is
carried out?
Group the two operations into one transaction; either both are
carried out or neither
We need a way to roll back when a transaction is not
completed
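The banking example maps directly onto a database transaction. The sketch below uses Python's built-in `sqlite3` module, whose connection context manager commits on success and rolls back when an exception escapes; the `account` table, its `CHECK` constraint, and the account numbers are hypothetical.

```python
import sqlite3
from typing import Dict


def open_bank() -> sqlite3.Connection:
    """In-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
                 " balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Withdraw from `src` and deposit to `dst`: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            cur = conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                               (amount, dst))
            if cur.rowcount != 1:
                # Failure after the withdrawal: the rollback undoes it.
                raise LookupError(f"no account {dst}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False


def balances(conn: sqlite3.Connection) -> Dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer to a nonexistent account fails after the withdrawal has been carried out, yet the rollback restores the original balances, so the total amount of money stays the same.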
(a) Transaction to reserve three flights commits
(b) Transaction aborts when the third flight is unavailable
BEGIN_TRANSACTION
reserve Man → Heathrow;
reserve Heathrow → Bole;
reserve Bole → Lalibela;
END_TRANSACTION
(a)
BEGIN_TRANSACTION
reserve Man → Heathrow;
reserve Heathrow → Bole;
reserve Bole → Lalibela: full
ABORT_TRANSACTION
(b)
e.g., reserving a seat from Manchester to Lalibela through Heathrow and Addis Ababa Bole airports
Properties of transactions, often referred to as ACID
1. Atomic: to the outside world, the transaction happens
indivisibly; a transaction either happens completely or not at
all; intermediate states are not seen by other processes
2. Consistent: the transaction does not violate system
invariants; e.g., in an internal transfer in a bank, the amount
of money in the bank must be the same as it was before the
transfer (the law of conservation of money); this invariant may be violated for a brief period of time, but the violation is not visible to other processes
3. Isolated or Serializable: concurrent transactions do not interfere with each other; if two or more transactions are running at the same time, the final result must look as though all transactions had run sequentially in some order
4. Durable: once a transaction commits, the changes are
permanent; see later in Chapter 8 - Fault Tolerance
Classification of Transactions
A transaction could be flat, nested or distributed
Flat Transaction
Consists of a series of operations that satisfy the ACID
properties
Simple and widely used but with some limitations
They do not allow partial results to be committed or aborted
i.e., atomicity is also partly a weakness
In our airline reservation example, we may want to
accept the first two reservations and find an alternative
one for the last
Some transactions may take too much time
Nested Transaction
Constructed from a number of subtransactions; it is logically
decomposed into a hierarchy of subtransactions; the flight
reservation can be split into three transactions, each
accessing a different database
The top-level transaction forks off children that run in parallel on different machines, to gain performance or for programming simplicity
Each may also execute one or more subtransactions
Permanence (durability) applies only to the top-level transaction; if the top-level transaction aborts, the commits of its children must be undone
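The nesting structure, though not the distribution, can be sketched with SQLite savepoints from Python's `sqlite3` module: each flight leg is a subtransaction (a `SAVEPOINT`) inside one top-level transaction. This is an assumption-laden stand-in: here all subtransactions touch one local database, whereas in a nested transaction each child may access a different database on a different machine; the `reservation` table and the `full` set are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Set


def book_trip(conn: sqlite3.Connection, legs: Iterable[str], full: Set[str]) -> List[str]:
    """Reserve each leg as a subtransaction (SAVEPOINT) inside one top-level
    transaction; the caller later decides to COMMIT or ROLLBACK the whole trip.
    `full` lists the legs that have no free seats (hypothetical)."""
    booked: List[str] = []
    conn.execute("BEGIN")  # top-level transaction
    for i, leg in enumerate(legs):
        sp = f"leg{i}"
        conn.execute(f"SAVEPOINT {sp}")  # start a subtransaction
        conn.execute("INSERT INTO reservation (leg) VALUES (?)", (leg,))
        if leg in full:
            conn.execute(f"ROLLBACK TO {sp}")  # undo only this subtransaction
        else:
            booked.append(leg)
        conn.execute(f"RELEASE {sp}")  # end it; surviving changes belong to the parent
    return booked
```

The connection should be opened with `sqlite3.connect(..., isolation_level=None)` so that the module does not issue its own `BEGIN`. A full leg is undone on its own, which keeps the first two reservations while an alternative is sought; if the caller then executes `ROLLBACK`, the already released (committed) subtransactions are undone as well, because durability applies only to the top level.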
Distributed Transaction
A flat transaction that operates on data that are distributed
across multiple machines
Problem: separate algorithms are needed to handle the
locking of data and committing the entire transaction; see
later in Chapter 8 for distributed commit
b. Enterprise Application Integration
How to integrate applications independently of their databases
Transaction systems rely on request/reply
How can applications communicate with each other? By means of middleware
Middleware as a communication facilitator in enterprise application
integration
There are different communication models
RPC (Remote Procedure Call)
RMI (Remote Method Invocation)
MOM (Message-Oriented Middleware)
Stream-Oriented Communication
Multicast Communication
See later in Chapter 4 - Communication
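The decoupling that message-oriented middleware offers can be sketched with an in-process queue. This is only an illustration under stated assumptions: `queue.Queue` stands in for the middleware's queue (a real MOM keeps messages in separate servers, often persistently), and the `order_entry` and `billing` applications are hypothetical.

```python
import queue
import threading
from typing import List, Optional

END = None  # end-of-stream marker (a convention of this sketch)


def order_entry(q: "queue.Queue[Optional[str]]", orders: List[str]) -> None:
    """Sending application: puts messages on the queue and moves on."""
    for order in orders:
        q.put(order)  # no waiting for the receiver to process the message
    q.put(END)


def billing(q: "queue.Queue[Optional[str]]", processed: List[str]) -> None:
    """Receiving application: takes messages off the queue at its own pace."""
    while True:
        msg = q.get()
        if msg is END:
            break
        processed.append(f"billed {msg}")


def run(orders: List[str]) -> List[str]:
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    processed: List[str] = []
    receiver = threading.Thread(target=billing, args=(q, processed))
    receiver.start()
    order_entry(q, orders)
    receiver.join()
    return processed
```

Unlike an RPC, the sender never calls the receiver directly and never blocks on a reply; the two applications only agree on the message format.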
3. Distributed Pervasive Systems
The distributed systems discussed so far are characterized
by their stability; fixed nodes having high-quality connections
to a network
There are also mobile and embedded computing devices
which are small, battery-powered, mobile, and with a
wireless connection
Three requirements for pervasive applications
Embrace contextual changes: a device is aware that its
environment (location, identities of nearby people and
objects, time of the day, season, temperature, etc.) may
change all the time, e.g., by changing its network access
point; hence its operations and services must be adapted to
the current context
Encourage ad hoc composition: devices are used in different
ways by different users
Recognize sharing as the default: devices join a system to
access or provide information
Examples of pervasive systems
Home Systems that integrate consumer electronics
Electronic Health Care Systems to monitor the well-being of
individuals
Sensor Networks
Read pages 26 - 30
Different approaches to distribution - Lost in the forest of
distribution
Distributed System
N autonomous computers (sites): n administrators, n
data/control flows
An interconnection network
User view: one single (virtual) system
(traditional) programmer view: client-server
Parallel System
1 computer, n nodes: one administrator, one scheduler,
one power source
Memory: it depends
Programmer view: one single machine executing parallel
codes; various programming models (message passing,
distributed shared memory, …)
Cluster Computing
Use of PCs interconnected by a (high performance) network
as a parallel (cheap) machine
Network Computing
From LAN (cluster) computing to WAN computing
Set of machines distributed over a MAN/WAN that are used
to execute parallel loosely coupled codes
Depending on the infrastructure, network computing comes
in many flavours: grid computing, P2P, Internet computing,
etc.
a. Grid Computing
“Resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual organizations” (Ian
Foster)
b. Peer-to-Peer Computing
A site is both client and server
Applications: mostly file sharing, but also others such as Internet telephony (Skype)
2 approaches:
Centralized management: Napster
Distributed management: Gnutella, Kazaa
c. Internet Computing
Use of (idle) computers interconnected by the Internet for processing high-throughput applications
Programmer view: a single master, n servants
Cloud Computing
The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer
A general term for anything that involves delivering hosted
services over the Internet
A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction
Service models: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS)