MAP REDUCE AND OTHER MATTERS REVISITED
GENOVEVA VARGAS-SOLAR
FRENCH COUNCIL OF SCIENTIFIC RESEARCH, LIG-LAFMIA, FRANCE
Genoveva.Vargas@imag.fr
http://www.vargas-solar.com/teaching
http://www.vargas-solar.com
MAPREDUCE: SIMPLIFIED DATA PROCESSING ON
LARGE CLUSTERS
BY: JEFFREY DEAN AND SANJAY GHEMAWAT
PRESENTED BY: HIND ALHAKAMI
¡  Simple queries on huge amounts of data.
¡  Distribution is needed, which complicates the problem.
PROBLEM TO SOLVE
SOLUTION
¡  Count the number of occurrences of each word in the given literature
To be or not to
be that is the
question
The head is not
more native
than the heart
Brevity is the
soul of wit
EXAMPLE
Literature
To be or not
to be that is
the question
SIMPLIFIED EXAMPLE
¡  For simplicity, consider counting words in one document.
{("to",1), ("be",1), ("or",1), ("not",1), ("to",1), ("be",1), ("that",1), ("is",1), ("the",1), ("question",1)}
FIRST STEP - MAP
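A minimal Python illustration of this map step (our own sketch, not the paper's actual C++ interface), assuming the input is a document id plus its text:

```python
def map_word_count(doc_id, text):
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in text.lower().split():
        yield (word, 1)

pairs = list(map_word_count("d1", "To be or not to be that is the question"))
# [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1), ...]
```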
{("to",2), ("be",2), ("or",1), ("not",1), ("that",1), ("is",1), ("the",1), ("question",1)}
SECOND STEP - REDUCE
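The reduce step sums the counts for each word. Continuing the sketch above, the toy driver below (an illustrative, single-machine stand-in for the distributed runtime, reusing the hypothetical `map_word_count`) groups the intermediate pairs by key and applies the reduce function:

```python
from collections import defaultdict

def reduce_word_count(word, counts):
    """Sum every intermediate count emitted for one word."""
    yield (word, sum(counts))

def run_word_count(documents):
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_word_count(doc_id, text):
            groups[word].append(count)        # stand-in for the shuffle/group-by-key phase
    return dict(pair for word, counts in groups.items()
                     for pair in reduce_word_count(word, counts))

print(run_word_count({"d1": "To be or not to be that is the question"}))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1, 'that': 1, 'is': 1, 'the': 1, 'question': 1}
```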
¡  {(“the”,3), ...}
To be or not to
be that is the
question
The head is not
more native
than the heart
Brevity is the
soul of wit
BACK TO THE ORIGINAL EXAMPLE
EXECUTION OVERVIEW
WORKER FAILURE
MASTER FAILURE
¡ Cluster of approximately 1800 machines.
¡ Two Benchmarks:
¡  Grep
¡  Sort
EXPERIMENT
PERFORMANCE
PERFORMANCE
Normal execution No backup tasks 200 tasks killed
¡  Combiner function.
¡  Skipping bad records.
¡  Local execution.
¡  Status Information.
OPTIMIZATIONS
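The first of these, the combiner, is essentially a local reduce run on each map worker before the shuffle, so fewer intermediate pairs cross the network. For word count the combiner can simply reuse the reduce logic sketched earlier (this works here only because addition is associative and commutative):

```python
def combine_word_count(word, counts):
    """Pre-aggregate (word, 1) pairs on the map worker to save network bandwidth."""
    yield (word, sum(counts))
```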
¡  MapReduce provides a convenient level of abstraction.
¡  Supports scalability and fault-tolerance.
¡  Preserves network bandwidth.
CONCLUSION
BIGTABLE: A DISTRIBUTED STORAGE SYSTEM FOR STRUCTURED DATA
FAY CHANG, JEFFREY DEAN, SANJAY GHEMAWAT, WILSON C. HSIEH, DEBORAH A. WALLACH, MIKE BURROWS, TUSHAR CHANDRA, ANDREW FIKES, ROBERT E. GRUBER @ GOOGLE
PRESENTED BY RICHARD VENUTOLO
INTRODUCTION
¡  BigTable is a distributed storage system for managing structured data.
¡  Designed to scale to a very large size
¡  Petabytes of data across thousands of servers
¡  Used for many Google projects
¡  Web indexing, Personalized Search, Google Earth, Google Analytics, Google
Finance, …
¡  Flexible, high-performance solution for all of Google’s products
MOTIVATION
¡  Lots of (semi-)structured data at Google
¡  URLs:
¡  Contents, crawl metadata, links, anchors, pagerank, …
¡  Per-user data:
¡  User preference settings, recent queries/search results, …
¡  Geographic locations:
¡  Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
¡  Scale is large
¡  Billions of URLs, many versions/page (~20K/version)
¡  Hundreds of millions of users, thousands of q/sec
¡  100TB+ of satellite image data
WHY NOT JUST USE COMMERCIAL DB?
¡ Scale is too large for most commercial databases
¡ Even if it weren’t, cost would be very high
¡  Building internally means system can be applied across many
projects for low incremental cost
¡ Low-level storage optimizations help performance
significantly
¡  Much harder to do when running on top of a database layer
GOALS
¡  Want asynchronous processes to be continuously updating different pieces
of data
¡  Want access to most current data at any time
¡  Need to support:
¡  Very high read/write rates (millions of ops per second)
¡  Efficient scans over all or interesting subsets of data
¡  Efficient joins of large one-to-one and one-to-many datasets
¡  Often want to examine data changes over time
¡  E.g. Contents of a web page over multiple crawls
BIGTABLE
¡  Distributed multi-level map
¡  Fault-tolerant, persistent
¡  Scalable
¡  Thousands of servers
¡  Terabytes of in-memory data
¡  Petabyte of disk-based data
¡  Millions of reads/writes per second, efficient scans
¡  Self-managing
¡  Servers can be added/removed dynamically
¡  Servers adjust to load imbalance
BUILDING BLOCKS
¡  Building blocks:
¡  Google File System (GFS): Raw storage
¡  Scheduler: schedules jobs onto machines
¡  Lock service: distributed lock manager
¡  MapReduce: simplified large-scale data processing
¡  BigTable uses of building blocks:
¡  GFS: stores persistent data (SSTable file format for storage of data)
¡  Scheduler: schedules jobs involved in BigTable serving
¡  Lock service: master election, location bootstrapping
¡  Map Reduce: often used to read/write BigTable data
BASIC DATA MODEL
¡  A BigTable is a sparse, distributed persistent multi-dimensional sorted map
(row, column, timestamp) -> cell contents
¡  Good match for most Google applications
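One way to picture this model is as a nested dictionary keyed by row, then column, then timestamp. The Python sketch below is purely illustrative (the row and column names follow the paper's WebTable example, but the layout is not BigTable's actual storage format):

```python
webtable = {
    "com.cnn.www": {                                   # row key (reversed hostname)
        "contents:": {
            1094851200: "<html>...v3...</html>",       # newest version of the page
            1094764800: "<html>...v2...</html>",
        },
        "anchor:cnnsi.com": {1094851200: "CNN"},       # family "anchor", qualifier "cnnsi.com"
    }
}

def lookup(table, row, column, timestamp):
    """Return the cell value addressed by (row, column, timestamp)."""
    return table[row][column][timestamp]

print(lookup(webtable, "com.cnn.www", "anchor:cnnsi.com", 1094851200))  # CNN
```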
WEBTABLE EXAMPLE
¡  Want to keep copy of a large collection of web pages and related information
¡  Use URLs as row keys
¡  Various aspects of web page as column names
¡  Store contents of web pages in the contents: column under the timestamps when
they were fetched.
ROWS
¡  Name is an arbitrary string
¡  Access to data in a row is atomic
¡  Row creation is implicit upon storing data
¡  Rows ordered lexicographically
¡  Rows close together lexicographically usually on one or a small number of machines
ROWS (CONT.)
Reads of short row ranges are efficient and typically require communication with a small number of machines.
¡  Can exploit this property by selecting row keys so they get good locality for data access.
¡  Example:
math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu
VS
edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
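A hedged sketch of the key transformation hinted at above: reversing the hostname components keeps pages from the same domain lexicographically adjacent, so they tend to land on the same tablet.

```python
def row_key(url_host):
    """Reverse hostname components: 'math.gatech.edu' -> 'edu.gatech.math'."""
    return ".".join(reversed(url_host.split(".")))

hosts = ["math.gatech.edu", "math.uga.edu", "phys.gatech.edu", "phys.uga.edu"]
print(sorted(row_key(h) for h in hosts))
# ['edu.gatech.math', 'edu.gatech.phys', 'edu.uga.math', 'edu.uga.phys']
```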
COLUMNS
¡  Columns have two-level name structure:
¡  family:optional_qualifier
¡  Column family
¡  Unit of access control
¡  Has associated type information
¡  Qualifier gives unbounded columns
¡  Additional levels of indexing, if desired
TIMESTAMPS
¡  Used to store different versions of data in a cell
¡  New writes default to current time, but timestamps for writes can also be set explicitly by clients
¡  Lookup options:
¡  “Return most recent K values”
¡  “Return all values in timestamp range (or all values)”
¡  Column families can be marked w/ attributes:
¡  “Only retain most recent K values in a cell”
¡  “Keep values until they are older than K seconds”
IMPLEMENTATION – THREE MAJOR COMPONENTS
¡  Library linked into every client
¡  One master server
¡  Responsible for:
¡  Assigning tablets to tablet servers
¡  Detecting addition and expiration of tablet servers
¡  Balancing tablet-server load
¡  Garbage collection
¡  Many tablet servers
¡  Each tablet server handles read and write requests to its tablets
¡  Splits tablets that have grown too large
IMPLEMENTATION (CONT.)
¡  Client data doesn’t move through master server. Clients communicate directly with tablet servers for reads
and writes.
¡  Most clients never communicate with the master server, leaving it lightly loaded in practice.
TABLETS
¡  Large tables broken into tablets at row boundaries
¡  Tablet holds contiguous range of rows
¡  Clients can often choose row keys to achieve locality
¡  Aim for ~100MB to 200MB of data per tablet
¡  Serving machine responsible for ~100 tablets
¡  Fast recovery:
¡  100 machines each pick up 1 tablet for failed machine
¡  Fine-grained load balancing:
¡  Migrate tablets away from overloaded machine
¡  Master makes load-balancing decisions
TABLET LOCATION
¡  Since tablets move around from server to server, given a row, how do clients find the right machine?
¡  Need to find tablet whose row range covers the target row
TABLET ASSIGNMENT
¡  Each tablet is assigned to one tablet server at a time.
¡  Master server keeps track of the set of live tablet servers and current assignments of tablets to servers. Also
keeps track of unassigned tablets.
¡  When a tablet is unassigned, the master assigns the tablet to a tablet server with sufficient room.
API
¡  Metadata operations
¡  Create/delete tables, column families, change metadata
¡  Writes (atomic)
¡  Set(): write cells in a row
¡  DeleteCells(): delete cells in a row
¡  DeleteRow(): delete all cells in a row
¡  Reads
¡  Scanner: read arbitrary cells in a bigtable
¡  Each row read is atomic
¡  Can restrict returned rows to a particular range
¡  Can ask for just data from 1 row, all rows, etc.
¡  Can ask for all columns, just certain column families, or specific columns
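To make the shape of these calls concrete, here is a toy in-memory stand-in (the class and method names are our invention for illustration; the paper's actual client API is a C++ library with different names):

```python
class ToyTable:
    """In-memory stand-in used only to show the call shapes, not BigTable's real client API."""
    def __init__(self):
        self.rows = {}                                   # row -> {column -> value}

    def set(self, row, column, value):                   # writes are atomic per row
        self.rows.setdefault(row, {})[column] = value

    def delete_cells(self, row, column):
        self.rows.get(row, {}).pop(column, None)

    def delete_row(self, row):
        self.rows.pop(row, None)

    def scan(self, start, end, columns=None):
        """Yield rows in the lexicographic range [start, end), optionally restricted to columns."""
        for row in sorted(self.rows):
            if start <= row < end:
                cells = self.rows[row]
                if columns:
                    cells = {c: v for c, v in cells.items() if c in columns}
                yield row, cells

table = ToyTable()
table.set("edu.gatech.math", "contents:", "<html>...</html>")
table.set("edu.gatech.phys", "contents:", "<html>...</html>")
for row, cells in table.scan("edu.gatech", "edu.gatech~", columns=["contents:"]):
    print(row, cells)
```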
REFINEMENTS: LOCALITY GROUPS
¡  Can group multiple column families into a locality group
¡  Separate SSTable is created for each locality group in each tablet.
¡  Segregating column families that are not typically accessed together enables more efficient reads.
¡  In WebTable, page metadata can be in one group and contents of the page in another group.
REFINEMENTS: COMPRESSION
¡ Many opportunities for compression
¡  Similar values in the same row/column at different timestamps
¡  Similar values in different columns
¡  Similar values across adjacent rows
¡ Two-pass custom compression scheme
¡  First pass: compress long common strings across a large window
¡  Second pass: look for repetitions in a small window
¡ Speed emphasized, but good space reduction (10-to-1)
REFINEMENTS: BLOOM FILTERS
¡  Read operation has to read from disk when desired SSTable isn’t in
memory
¡  Reduce number of accesses by specifying a Bloom filter.
¡  Allows us to ask whether an SSTable might contain data for a specified row/column pair.
¡  Small amount of memory for Bloom filters drastically reduces the number of
disk seeks for read operations
¡  Use implies that most lookups for non-existent rows or columns do not need
to touch disk
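A toy Bloom filter using a simple double-hashing scheme (our own sketch; the paper does not describe BigTable's filter at this level of detail). It answers "definitely absent" or "possibly present", which is exactly what is needed to skip a disk seek safely:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)          # one byte per bit, for simplicity

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("edu.gatech.math/contents:")
print(bf.might_contain("edu.gatech.math/contents:"))   # True
print(bf.might_contain("edu.uga.phys/contents:"))      # very likely False
```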
 
THE FUTURE OF HIGH PERFORMANCE DATABASE SYSTEMS
AUTHORS: DAVID DEWITT AND JIM GRAY
PRESENTER: JENNIFER TOM
BASED ON PRESENTATION BY AJITH KARIMPANA
OUTLINE
¡  Why Parallel Databases?
¡  Parallel Database Implementation
¡  Parallel Database Systems
¡  Future Directions and Research Problems
WHY PARALLEL DATABASES?
•  Relational Data Model – Relational queries are ideal candidates for parallelization
•  Multiprocessor systems using inexpensive microprocessors provide more power and scalability than expensive mainframe counterparts
Why do relational queries allow for parallelization?
PROPERTIES OF IDEAL PARALLELISM
¡  Linear Speedup – An N-times larger system yields a speedup of N
¡  Linear Scaleup – An N-times larger system performs an N-times larger job in the same elapsed time as the original system
Why are linear speedup and linear scaleup indicators of good parallelism?
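Stated as formulas (our notation, following the standard definitions used by DeWitt and Gray):

```latex
\[
\text{Speedup} = \frac{\text{elapsed time of a job on the small system}}{\text{elapsed time of the same job on the $N$-times larger system}},
\qquad
\text{Scaleup} = \frac{\text{elapsed time of a job on the small system}}{\text{elapsed time of an $N$-times larger job on the $N$-times larger system}}.
\]
```

Linear speedup corresponds to Speedup = N, and linear scaleup to Scaleup = 1.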
  
TYPES OF SCALEUP
¡  Transactional – N-times as many clients submit N-times as many requests against an N-times larger database.
¡  Batch – N-times larger problem requires an N-times larger computer.
Should we look at both transactional scaleup and batch scaleup for all situations when assessing performance?
BARRIERS TO LINEAR SPEEDUP AND LINEAR SCALEUP
¡  Startup: The time needed to start a parallel operation dominates the total execution time
¡  Interference: A slowdown occurs as new processes are added when accessing shared resources
¡  Skew: A slowdown occurs when the variance exceeds the mean in job service times
Give example situations where you might come across interference and skew barriers?
SHARED-NOTHING MACHINES
¡  Shared-memory – All processors have equal access to a global memory and all disks
¡  Shared-disk – Each processor has its own private memory, but has equal access to all disks
¡  Shared-nothing – Each processor has its own private memory and disk(s)
Why do shared-nothing systems work well for database systems?
PARALLEL DATAFLOW APPROACH
¡  Relational data model operators take relations as input and produce relations as output
¡  Allows operators to be composed into dataflow graphs (operations can be executed in parallel)
How would you compose a set of operations for an SQL query to be done in parallel?
PIPELINED VS. PARTITIONED PARALLELISM
¡  Pipelined: The computation of one operator is done at the same time as another as data is streamed into the system.
¡  Partitioned: The same set of operations is performed on tuples partitioned across the disks. The set of operations is done in parallel on partitions of the data.
Can you give brief examples of each? Which parallelism is generally better for databases?
DATA PARTITIONING
•  Round-robin – For n processors, the ith tuple is assigned to the “i mod n” disk.
•  Hash partitioning – Tuples are assigned to the disks using a hashing function applied to select fields of the
tuple.
•  Range partitioning – Clusters tuples with similar attribute values on the same disk.
What operations/situations are ideal (or not ideal) for the above partitioning types?
What database performance issues may arise due to partitioning?
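An illustrative sketch of the three strategies (the helper names and the assumption that each tuple exposes a single partitioning key are ours, not the paper's):

```python
import bisect

def round_robin_partition(i, n):
    """The i-th tuple goes to disk i mod n."""
    return i % n

def hash_partition(tuple_key, n):
    """Tuples with equal keys land on the same disk."""
    return hash(tuple_key) % n

def range_partition(tuple_key, boundaries):
    """Clusters tuples with nearby key values; boundaries must be sorted, e.g. ['h', 'p']."""
    return bisect.bisect_right(boundaries, tuple_key)

print(round_robin_partition(7, 3))           # 1
print(hash_partition("smith", 3))            # some disk in 0..2
print(range_partition("jones", ["h", "p"]))  # 1 (falls between 'h' and 'p')
```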
SKEW PROBLEMS
¡  Data skew – The number of tuples varies widely across the partitions.
¡  Execution skew – Due to data skew, more time is spent executing an operation on partitions that contain larger numbers of tuples.
How could these skew problems be minimized?
PARALLEL EXECUTION OF RELATIONAL OPERATORS
¡  Rather than write new parallel operators, use parallel streams of data to execute existing relational operators in parallel
¡  Relational operators have (1) a set of input ports and (2) an output port
¡  Parallel dataflow – partitioning and merging of data streams into the operator ports
Are there situations in which parallel dataflow does not work well?
SIMPLE RELATIONAL DATAFLOW GRAPH
PARALLELIZATION OF HASH-JOIN
•  Sort-merge join does not work well if there is data skew
•  Hash-join is more resistant to data skew
•  To join 2 relations, we divide the join into a set of smaller joins
•  The union of these smaller joins should result in the join of the 2 relations
How might other relational algorithms/operations (like sorting) be parallelized to improve query performance?
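A single-process sketch of the idea, assuming relations are lists of (key, payload) pairs (illustrative only; in a real system each bucket's join would run on a different processor):

```python
from collections import defaultdict

def partitioned_hash_join(r, s, n_buckets=4):
    """Join relations r and s on their first field by splitting the work into smaller joins."""
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for key, payload in r:
        r_buckets[hash(key) % n_buckets].append((key, payload))
    for key, payload in s:
        s_buckets[hash(key) % n_buckets].append((key, payload))

    results = []
    for b in range(n_buckets):                 # each bucket join could run on its own processor
        build = defaultdict(list)
        for key, payload in r_buckets[b]:      # build a hash table on the smaller side
            build[key].append(payload)
        for key, payload in s_buckets[b]:      # probe with the other side
            for r_payload in build[key]:
                results.append((key, r_payload, payload))
    return results

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z")]
print(partitioned_hash_join(r, s))
# [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]  (order may vary by bucket)
```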
PARALLEL DATABASE SYSTEMS
•  Teradata – Shared-nothing systems that use commodity hardware. Near linear speedup and scaleup on
relational queries.
•  Tandem NonStop SQL – Shared-nothing architecture. Near linear speedup and scaleup for large
relational queries.
•  Gamma – Shared-nothing system. Provides hybrid-range partitioning. Near linear speed and scaleup
measured.
•  The Super Database Computer – Has special-purpose components. However, is shared-nothing
architecture with dataflow.
•  Bubba – Shared-nothing system, but uses shared-memory for message passing. Uses FAD rather than SQL.
Uses single-level storage
Should we focus on the use of commodity hardware to build parallel database systems?
DATABASE MACHINES AND GROSCH’S LAW
¡  Grosch’s Law – The performance of computer systems increases as the square of their cost. The more expensive systems have better cost-to-performance ratios.
¡  Less expensive shared-nothing systems like Gamma, Tandem, and Teradata achieve near-linear performance (compared to mainframes)
¡  Suggests that Grosch’s Law no longer applies
FUTURE DIRECTIONS AND RESEARCH PROBLEMS
¡  Concurrent execution of simple and complex queries
¡  Problem with locks and large queries
¡  Priority scheduling (priority inversion scheduling)
¡  Parallel Query Optimization – Optimizers do not take into account parallel algorithms. Data skew creates poor query plan cost estimates.
¡  Application Program Parallelization – Database applications are not parallelized. No effective way to automate the parallelization of applications.
FUTURE DIRECTIONS AND RESEARCH PROBLEMS
¡  Physical Database Design – Tools are needed to determine indexing and partitioning schemes given a database and its workload.
¡  On-line Data Reorganization – Availability of data for concurrent reads and writes as database utilities create indices, reorganize data, etc. This process must be (1) online, (2) incremental, (3) parallel, and (4) recoverable.
Which of these issues may be extremely difficult to resolve?
Contact: Genoveva Vargas-Solar, CNRS, LIG-LAFMIA
Genoveva.Vargas@imag.fr
http://www.vargas-solar.com/teaching