Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
External Sorting
Chapter 13: Sections 13.1-13.5
Why Sort?
• A classic problem in computer science (see Knuth, vol. 3)!
• Data is often requested in sorted order
    • e.g., find students in increasing GPA order
• Sorting is the first step in bulk loading a B+ tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records (Why?)
• The sort-merge join algorithm involves sorting.
• Problem: sort 1 GB of data with 1 MB of RAM.
    • Why not virtual memory?
2-Way Merge Sort: Requires 3 Buffers
• Pass 0: Read a page, sort it (in memory), write it.
    • Only one buffer page is used.
• Pass 1, 2, …, etc.:
    • Three buffer pages are used.
[Figure: Disk -> main memory buffers (INPUT 1, INPUT 2, OUTPUT) -> Disk]
2-Way Merge Sort: Algorithm
proc two-way_extsort(file)
    // Given a file on disk, sort it using 3 buffer pages
    // Pass 0: produce 1-page runs
    Read each page of file into memory, sort it, and write it out.
    // Merge pairs of runs to produce longer runs until 1 run is left
    While # of runs at end of previous pass > 1 do:
        // Process passes i = 1, 2, …
        While there are runs to be merged from previous pass do:
            Pick next 2 runs from previous pass.
            Read each run into an input buffer, 1 page at a time.
            Merge the runs and write result to the output buffer,
            forcing the output buffer to disk one page at a time.
endproc
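The procedure above can be sketched in Python as a small in-memory simulation. This is only an illustration under stated assumptions: a "page" is modelled as a Python list of integers, a "run" as a list of pages, and the names `read_run`, `merge_two_runs`, and `two_way_extsort` are hypothetical helpers, not part of the original slides. No real disk I/O is performed.

```python
import heapq
from typing import Iterator, List, Tuple

Page = List[int]   # assumption: a page is a small list of records
Run = List[Page]   # assumption: a run is a list of sorted pages


def read_run(run: Run) -> Iterator[int]:
    """Yield records of a run, one page (one input buffer) at a time."""
    for page in run:
        yield from page


def merge_two_runs(left: Run, right: Run, page_size: int) -> Run:
    """Merge two sorted runs, flushing the output buffer each time it fills."""
    out_run: Run = []
    out_buf: Page = []
    for rec in heapq.merge(read_run(left), read_run(right)):
        out_buf.append(rec)
        if len(out_buf) == page_size:
            out_run.append(out_buf)   # "force the output buffer to disk"
            out_buf = []
    if out_buf:
        out_run.append(out_buf)
    return out_run


def two_way_extsort(pages: List[Page], page_size: int = 2) -> Tuple[Run, int]:
    """Return the single sorted run and the number of passes used."""
    runs: List[Run] = [[sorted(page)] for page in pages]   # pass 0: 1-page runs
    passes = 1
    while len(runs) > 1:
        merged: List[Run] = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge_two_runs(runs[i], runs[i + 1], page_size))
            else:
                merged.append(runs[i])   # odd run out: carried to the next pass
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes


example_pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
sorted_run, n_passes = two_way_extsort(example_pages)
print(n_passes)     # 4
print(sorted_run)   # [[1, 2], [2, 3], [3, 4], [4, 5], [6, 6], [7, 8], [9]]
```

In a real system the odd run carried forward would still be read and written once per pass, which is why the cost model charges every page in every pass.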
Two-Way External Merge Sort
• Each pass we read and write each page in the file => 2N I/Os per pass.
• N pages in the file => the number of passes is $\lceil \log_2 N \rceil + 1$
• So total cost is: $2N \left( \lceil \log_2 N \rceil + 1 \right)$
• Idea: divide and conquer: sort subfiles and merge.
• Input file contains 7 pages; dark pages show what would happen with 8 pages.

[Figure: trace of the sort on the 7-page input file; "|" separates runs]
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9
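As a quick check of the cost formula, here it is evaluated for the 7-page input file of this example (a worked instance added for illustration; the slide itself only gives the general formula):

```latex
\[
\text{passes} = \lceil \log_2 7 \rceil + 1 = 3 + 1 = 4,
\qquad
\text{cost} = 2N \cdot \text{passes} = 2 \cdot 7 \cdot 4 = 56 \ \text{I/Os}.
\]
```

The four passes correspond to PASS 0 through PASS 3 in the trace.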
General External Merge Sort
* More than 3 buffer pages. How can we utilize them?
• To sort a file with N pages using B buffer pages:
    • Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
    • Pass 1, 2, …, etc.: merge B-1 runs at a time.
[Figure: Disk -> B main memory buffers (INPUT 1, INPUT 2, …, INPUT B-1, OUTPUT) -> Disk]
General External Merge Sort: Algorithm
proc extsort(file)
    // Given a file on disk, sort it using B buffer pages
    // Pass 0: produce B-page runs
    Read B pages of file into memory, sort them, and write them out.
    // Merge B-1 runs to produce longer runs until 1 run is left
    While # of runs at end of previous pass > 1 do:
        // Process passes i = 1, 2, …
        While there are runs to be merged from previous pass do:
            Pick next B-1 runs from previous pass.
            Read each run into an input buffer, 1 page at a time.
            Merge the runs and write result to the output buffer,
            forcing the output buffer to disk one page at a time.
endproc
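A minimal Python sketch of this procedure follows, again as an in-memory simulation rather than a real disk-based implementation. Assumptions: pages are lists of integers, `n_buffers` plays the role of B, and `read_run`, `write_run`, and `extsort` are hypothetical names chosen for illustration. The (B-1)-way merge uses `heapq.merge` from the standard library.

```python
import heapq
import random
from typing import Iterable, Iterator, List, Tuple

Page = List[int]   # assumption: a page is a list of records
Run = List[Page]   # assumption: a run is a list of sorted pages


def read_run(run: Run) -> Iterator[int]:
    """Yield a run's records, holding one page (one input buffer) at a time."""
    for page in run:
        yield from page


def write_run(records: Iterable[int], page_size: int) -> Run:
    """Collect records in an output buffer and flush it whenever it fills."""
    run: Run = []
    buf: Page = []
    for rec in records:
        buf.append(rec)
        if len(buf) == page_size:
            run.append(buf)
            buf = []
    if buf:
        run.append(buf)
    return run


def extsort(pages: List[Page], n_buffers: int, page_size: int) -> Tuple[Run, int]:
    """Sort `pages` with B = n_buffers; return (sorted run, number of passes)."""
    if n_buffers < 3:
        raise ValueError("need at least 3 buffer pages")
    # Pass 0: sort B pages at a time to produce B-page runs.
    runs: List[Run] = []
    for i in range(0, len(pages), n_buffers):
        chunk = sorted(rec for page in pages[i:i + n_buffers] for rec in page)
        runs.append(write_run(chunk, page_size))
    passes = 1
    fan_in = n_buffers - 1   # B-1 input buffers, 1 output buffer
    while len(runs) > 1:
        runs = [
            write_run(heapq.merge(*(read_run(r) for r in runs[i:i + fan_in])), page_size)
            for i in range(0, len(runs), fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes


rng = random.Random(0)   # fixed seed so the demo is repeatable
demo_pages = [[rng.randrange(1000) for _ in range(4)] for _ in range(108)]
demo_run, demo_passes = extsort(demo_pages, n_buffers=5, page_size=4)
print(demo_passes)   # 4
```

The demo uses a 108-page file with 5 buffer pages, so the run counts per pass are 22, 6, 2, and 1.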
Cost of External Merge Sort
• Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
• Cost = 2N * (# of passes)
• E.g., with 5 buffer pages, to sort a 108-page file:
    • Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
    • Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
    • Pass 2: 2 sorted runs, 80 pages and 28 pages
    • Pass 3: Sorted file of 108 pages
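The arithmetic of the 108-page example can be replayed with a few lines of Python. This is a sketch under the slide's cost model (every pass reads and writes each page once); `run_counts` and `sort_cost` are hypothetical helper names. Integer ceiling division is used instead of floating-point logarithms to avoid rounding problems.

```python
from typing import List


def run_counts(n_pages: int, n_buffers: int) -> List[int]:
    """Number of runs left after pass 0, pass 1, ... until one run remains."""
    if n_buffers < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = -(-n_pages // n_buffers)           # ceil(N / B) runs after pass 0
    counts = [runs]
    while runs > 1:
        runs = -(-runs // (n_buffers - 1))    # each merge combines B-1 runs
        counts.append(runs)
    return counts


def sort_cost(n_pages: int, n_buffers: int) -> int:
    """Total I/Os: 2N per pass (read + write every page)."""
    return 2 * n_pages * len(run_counts(n_pages, n_buffers))


print(run_counts(108, 5))   # [22, 6, 2, 1]
print(sort_cost(108, 5))    # 864
```

The list has one entry per pass, so its length (4 here) is the number of passes.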
Number of Passes of External Sort
N                B=3   B=5   B=9   B=17   B=129   B=257
100                7     4     3      2       1       1
1,000             10     5     4      3       2       2
10,000            13     7     5      4       2       2
100,000           17     9     6      5       3       3
1,000,000         20    10     7      5       3       3
10,000,000        23    12     8      6       4       3
100,000,000       26    14     9      7       4       4
1,000,000,000     30    15    10      8       5       4
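A table like this one can be regenerated from the pass-count formula. The sketch below assumes the basic model (pass 0 produces runs of B pages, later passes merge B-1 runs); `num_passes` is a hypothetical helper name, and the loop uses integer arithmetic rather than logarithms.

```python
def num_passes(n_pages: int, n_buffers: int) -> int:
    """1 + ceil(log_{B-1}(ceil(N / B))), computed with integer arithmetic."""
    runs = -(-n_pages // n_buffers)   # runs after pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (n_buffers - 1))
        passes += 1
    return passes


buffer_sizes = [3, 5, 9, 17, 129, 257]
print("N".ljust(15) + "".join(f"B={b}".rjust(8) for b in buffer_sizes))
for exponent in range(2, 10):   # N = 100 ... 1,000,000,000
    n = 10 ** exponent
    row = "".join(str(num_passes(n, b)).rjust(8) for b in buffer_sizes)
    print(f"{n:,}".ljust(15) + row)
```

Going from B=3 to B=17 roughly quarters the number of passes for large files, while the step from B=129 to B=257 changes little.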
I/O for External Merge Sort
• … longer runs often mean fewer passes!
• So far we have assumed that I/O is done one page at a time.
• In fact, it is better to read a block of pages sequentially!
• This suggests we should make each buffer (input/output) a block of pages.
    • But this will reduce fan-out during merge passes!
    • In practice, most files are still sorted in 2-3 passes.
• Typical DBMSs sort 1M records of size 100 bytes in 15 minutes.
Number of Passes of Optimized Sort
N                B=1,000   B=5,000   B=10,000
100                    1         1          1
1,000                  1         1          1
10,000                 2         2          1
100,000                3         2          2
1,000,000              3         2          2
10,000,000             4         3          3
100,000,000            5         3          3
1,000,000,000          5         4          3
* Block size = 32, initial pass produces runs of size 2B.
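The effect of blocked I/O on the pass count can be sketched as follows. The model here is an assumption, not something the slide spells out: with block size b, each of the input streams and the output stream gets one b-page block, so the merge fan-in is floor(B/b) - 1, and pass 0 produces runs of 2B pages as the footnote says. `blocked_passes` is a hypothetical name.

```python
def blocked_passes(n_pages: int, n_buffers: int, block_size: int = 32) -> int:
    """Passes needed when buffers are blocks and initial runs are 2B pages."""
    fan_in = n_buffers // block_size - 1   # assumed: one block reserved for output
    if fan_in < 2:
        raise ValueError("too few buffers for this block size")
    runs = -(-n_pages // (2 * n_buffers))  # ceil(N / 2B) initial runs
    passes = 1
    while runs > 1:
        runs = -(-runs // fan_in)
        passes += 1
    return passes


print(blocked_passes(1_000_000, 1_000))        # 3
print(blocked_passes(1_000_000_000, 10_000))   # 3
```

Under this model the function agrees with the table except for N = 10,000 with B = 5,000, where it gives 1 and the table lists 2, so the table presumably rests on slightly different assumptions about the initial runs.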
Double Buffering
• To reduce wait time for an I/O request to complete, can prefetch into a "shadow block".
    • Potentially, more passes; in practice, most files are still sorted in 2-3 passes.
[Figure: B main memory buffers, k-way merge. Disk -> INPUT 1, INPUT 2, …, INPUT k, each paired with a shadow block INPUT 1', INPUT 2', …, INPUT k' of block size b -> OUTPUT and OUTPUT' -> Disk]
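Why double buffering can mean more passes is easiest to see from the merge fan-in. The sketch below assumes that each of the k input streams and the output stream needs one block of b pages, or two blocks when a shadow block is added, so k = floor(B/b) - 1 without and k = floor(B/(2b)) - 1 with double buffering. Both the formula and the name `merge_fan_in` are illustrative assumptions.

```python
def merge_fan_in(n_buffers: int, block_size: int, double_buffered: bool = False) -> int:
    """Number of runs that can be merged at once (k) under the assumed model."""
    pages_per_stream = block_size * (2 if double_buffered else 1)
    return n_buffers // pages_per_stream - 1   # one stream is the output


print(merge_fan_in(1_000, 32))                        # 30
print(merge_fan_in(1_000, 32, double_buffered=True))  # 14
```

Halving the fan-in adds at most a pass or so for realistic file sizes, which is consistent with the slide's remark that most files are still sorted in 2-3 passes.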
Using B+ Trees for Sorting
• Scenario: Table to be sorted has a B+ tree index on the sorting column(s).
• Idea: Can retrieve records in order by traversing leaf pages.
• Is this a good idea?
• Cases to consider:
    • B+ tree is clustered: good idea!
    • B+ tree is not clustered: could be a very bad idea!
Clustered B+ Tree Used for Sorting
• Cost: go from the root to the leftmost leaf, then retrieve all leaf pages (Alternative 1).
• If Alternative 2 is used? Additional cost of retrieving data records: each page is fetched just once.
* Always better than external sorting!
[Figure: Index (directs search) above Data Entries ("sequence set"), which point to Data Records]
Unclustered B+ Tree Used for Sorting
• Alternative (2) for data entries; each data entry contains the rid of a data record. In general, one I/O per data record!
[Figure: Index (directs search) above Data Entries ("sequence set"), which point to Data Records]
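To get a feel for the difference, the rough I/O counts for the three options can be compared numerically. Everything in this sketch is an illustrative assumption: the file size, records per page, number of index leaf pages, and buffer count are made-up values, the few I/Os for descending from the root are ignored, and the unclustered case uses the worst case of one I/O per record.

```python
def extsort_io(n_pages: int, n_buffers: int) -> int:
    """External merge sort: 2N I/Os per pass."""
    runs = -(-n_pages // n_buffers)
    passes = 1
    while runs > 1:
        runs = -(-runs // (n_buffers - 1))
        passes += 1
    return 2 * n_pages * passes


def clustered_io(leaf_pages: int, data_pages: int) -> int:
    """Alternative 2, clustered: scan the leaves, fetch each data page once."""
    return leaf_pages + data_pages


def unclustered_io(leaf_pages: int, n_records: int) -> int:
    """Alternative 2, unclustered: scan the leaves, one I/O per data record."""
    return leaf_pages + n_records


# Hypothetical table: 10,000 data pages, 100 records per page,
# 1,000 index leaf pages, and 257 buffer pages available for sorting.
data_pages, records_per_page, leaf_pages, n_buffers = 10_000, 100, 1_000, 257

print(clustered_io(leaf_pages, data_pages))                       # 11000
print(extsort_io(data_pages, n_buffers))                          # 40000
print(unclustered_io(leaf_pages, data_pages * records_per_page))  # 1001000
```

With these numbers the clustered index beats external sorting, while the unclustered index is far worse than either.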
Summary
• External sorting is important; a DBMS may dedicate part of its buffer pool to sorting!
• External merge sort minimizes disk I/O cost:
    • Pass 0: produces sorted runs of size B (# buffer pages). Later passes: merge runs.
    • # of runs merged at a time depends on B and block size.
    • Larger block size => smaller # of runs merged.
• Clustered B+ tree is good for sorting; unclustered tree is usually very bad.
