SlideShare a Scribd company logo
1 of 33
Download to read offline
Lecture 9:
        Linear Sorting
         Steven Skiena

Department of Computer Science
 State University of New York
 Stony Brook, NY 11794–4400

http://www.cs.sunysb.edu/∼skiena
Problem of the Day
The nuts and bolts problem is defined as follows. You
are given a collection of n bolts of different widths, and n
corresponding nuts. You can test whether a given nut and bolt
together, from which you learn whether the nut is too large,
too small, or an exact match for the bolt. The differences in
size between pairs of nuts or bolts can be too small to see by
eye, so you cannot rely on comparing the sizes of two nuts or
two bolts directly. You are to match each bolt to each nut.
1. Give an O(n2 ) algorithm to solve the nuts and bolts
   problem.
2. Suppose that instead of matching all of the nuts and bolts,
   you wish to find the smallest bolt and its corresponding
   nut. Show that this can be done in only 2n − 2
   comparisons.
3. Match the nuts and bolts in expected O(n log n) time.
Solution
Quicksort Pseudocode

Sort(A)
      Quicksort(A,1,n)


Quicksort(A, low, high)
     if (low < high)
            pivot-location = Partition(A,low,high)
            Quicksort(A,low, pivot-location - 1)
            Quicksort(A, pivot-location+1, high)
Partition Implementation

Partition(A,low,high)
       pivot = A[low]
       leftwall = low
       for i = low+1 to high
              if (A[i] < pivot) then
                     leftwall = leftwall+1
                     swap(A[i],A[leftwall])
       swap(A[low],A[leftwall])
Quicksort Animation

               Q U I C K S O R T

               Q I C K S O R T U

               Q I C K O R S T U

               I C K O Q R S T U

               I C K O Q R S T U

               I C K O Q R S T U
Best Case for Quicksort
Since each element ultimately ends up in the correct position,
the algorithm correctly sorts. But how long does it take?
The best case for divide-and-conquer algorithms comes when
we split the input as evenly as possible. Thus in the best case,
each subproblem is of size n/2.
The partition step on each subproblem is linear in its size.
Thus the total effort in partitioning the 2k problems of size
n/2k is O(n).
Best Case Recursion Tree




The total partitioning on each level is O(n), and it take
lg n levels of perfect partitions to get to single element
subproblems. When we are down to single elements, the
problems are sorted. Thus the total time in the best case is
O(n lg n).
Worst Case for Quicksort
Suppose instead our pivot element splits the array as
unequally as possible. Thus instead of n/2 elements in the
smaller half, we get zero, meaning that the pivot element is
the biggest or smallest element in the array.
Now we have n−1 levels, instead of lg n, for a worst case time
of Θ(n2 ), since the first n/2 levels each have ≥ n/2 elements
to partition.
To justify its name, Quicksort had better be good in the
average case. Showing this requires some intricate analysis.
The divide and conquer principle applies to real life. If you
break a job into pieces, make the pieces of equal size!
Intuition: The Average Case for Quicksort
Suppose we pick the pivot element at random in an array of n
keys.

            1        n/4     n/2      3n/4     n


Half the time, the pivot element will be from the center half
of the sorted array.
Whenever the pivot element is from positions n/4 to 3n/4, the
larger remaining subarray contains at most 3n/4 elements.
How Many Good Partitions
If we assume that the pivot element is always in this range,
what is the maximum number of partitions we need to get
from n elements down to 1 element?
                (3/4)l · n = 1 −→ n = (4/3)l

                      lg n = l · lg(4/3)

Therefore l = lg(4/3) · lg(n) < 2 lg n good partitions suffice.
How Many Bad Partitions?
How often when we pick an arbitrary element as pivot will it
generate a decent partition?
Since any number ranked between n/4 and 3n/4 would make
a decent pivot, we get one half the time on average.
If we need 2 lg n levels of decent partitions to finish the job,
and half of random partitions are decent, then on average the
recursion tree to quicksort the array has ≈ 4 lg n levels.
Since O(n) work is done partitioning on each level, the
average time is O(n lg n).
Average-Case Analysis of Quicksort (*)
To do a precise average-case analysis of quicksort, we
formulate a recurrence given the exact expected time T (n):
                  n 1
        T (n) =       (T (p − 1) + T (n − p)) + n − 1
                p=1 n

Each possible pivot p is selected with equal probability. The
number of comparisons needed to do the partition is n − 1.
We will need one useful fact about the Harmonic numbers
Hn, namely
                             n
                     Hn =      1/i ≈ ln n
                           i=1

It is important to understand (1) where the recurrence relation
comes from and (2) how the log comes out from the
summation. The rest is just messy algebra.
                n 1
      T (n) =        (T (p − 1) + T (n − p)) + n − 1
               p=1 n
                       2 n
             T (n) =         T (p − 1) + n − 1
                       n p=1
                n
    nT (n) = 2     T (p − 1) + n(n − 1) multiply by n
               p=1
                     n−1
(n−1)T (n−1) = 2           T (p−1)+(n−1)(n−2) apply to n-1
                     p=1
     nT (n) − (n − 1)T (n − 1) = 2T (n − 1) + 2(n − 1)
rearranging the terms give us:
                T (n)   T (n − 1) 2(n − 1)
                      =          +
               n+1          n      n(n + 1)
substituting an = A(n)/(n + 1) gives
                        2(n − 1)   n 2(i − 1)
            an = an−1 +          =
                        n(n + 1) i=1 i(i + 1)
                           n     1
                  an ≈ 2              ≈ 2 ln n
                          i=1 (i + 1)
We are really interested in A(n), so
       A(n) = (n + 1)an ≈ 2(n + 1) ln n ≈ 1.38n lg n
Pick a Better Pivot
Having the worst case occur when they are sorted or almost
sorted is very bad, since that is likely to be the case in certain
applications.
To eliminate this problem, pick a better pivot:
 1. Use the middle element of the subarray as pivot.
 2. Use a random element of the array as the pivot.
 3. Perhaps best of all, take the median of three elements
    (first, last, middle) as the pivot. Why should we use
    median instead of the mean?
Whichever of these three rules we use, the worst case remains
O(n2 ).
Is Quicksort really faster than Heapsort?

Since Heapsort is Θ(n lg n) and selection sort is Θ(n2 ), there
is no debate about which will be better for decent-sized files.
When Quicksort is implemented well, it is typically 2-3 times
faster than mergesort or heapsort.
The primary reason is that the operations in the innermost
loop are simpler.
Since the difference between the two programs will be limited
to a multiplicative constant factor, the details of how you
program each algorithm will make a big difference.
Randomized Quicksort
Suppose you are writing a sorting program, to run on data
given to you by your worst enemy. Quicksort is good on
average, but bad on certain worst-case instances.
If you used Quicksort, what kind of data would your enemy
give you to run it on? Exactly the worst-case instance, to
make you look bad.
But instead of picking the median of three or the first element
as pivot, suppose you picked the pivot element at random.
Now your enemy cannot design a worst-case instance to give
to you, because no matter which data they give you, you
would have the same probability of picking a good pivot!
Randomized Guarantees
Randomization is a very important and useful idea. By either
picking a random pivot or scrambling the permutation before
sorting it, we can say:
   “With high probability, randomized quicksort runs in
   Θ(n lg n) time.”
Where before, all we could say is:
   “If you give me random input data, quicksort runs in
   expected Θ(n lg n) time.”
Importance of Randomization
Since the time bound how does not depend upon your input
distribution, this means that unless we are extremely unlucky
(as opposed to ill prepared or unpopular) we will certainly get
good performance.
Randomization is a general tool to improve algorithms with
bad worst-case but good average-case complexity.
The worst-case is still there, but we almost certainly won’t
see it.
Can we sort o(n lg n)?
Any comparison-based sorting program can be thought of as
defining a decision tree of possible executions.
Running the same program twice on the same permutation
causes it to do exactly the same thing, but running it on
different permutations of the same data causes a different
sequence of comparisons to be made on each.
a1 < a2 ?
                                T                                F


                      a2 < a3 ?                                  a1 < a3 ?

                T                   F                     T                   F


            (1,2,3)             a1 < a3 ?              (2,1,3)               a2 < a3 ?
                        T                   F                          T                   F


                      (1,3,2)           (3,1,2)                       (2,3,1)            (3,2,1)


Claim: the height of this decision tree is the worst-case
complexity of sorting.
Lower Bound Analysis
Since any two different permutations of n elements requires
a different sequence of steps to sort, there must be at least n!
different paths from the root to leaves in the decision tree.
Thus there must be at least n! different leaves in this binary
tree.
Since a binary tree of height h has at most 2h leaves, we know
n! ≤ 2h, or h ≥ lg(n!).
By inspection n! > (n/2)n/2 , since the last n/2 terms of the
product are each greater than n/2. Thus

   log(n!) > log((n/2)n/2 ) = n/2 log(n/2) → Θ(n log n)
Stirling’s Approximation
By Stirling’s approximation, a better bound is n! > (n/e)n
where e = 2.718.
        h ≥ lg(n/e)n = n lg n − n lg e = Ω(n lg n)
Non-Comparison-Based Sorting
All the sorting algorithms we have seen assume binary
comparisons as the basic primative, questions of the form “is
x before y?”.
But how would you sort a deck of playing cards?
Most likely you would set up 13 piles and put all cards with
the same number in one pile.
With only a constant number of cards left in each pile, you can
use insertion sort to order by suite and concatenate everything
together.
If we could find the correct pile for each card in constant
time, and each pile gets O(1) cards, this algorithm takes O(n)
time.
Bucketsort
Suppose we are sorting n numbers from 1 to m, where we
know the numbers are approximately uniformly distributed.
We can set up n buckets, each responsible for an interval of
m/n numbers from 1 to m
            x   x          x                           x   x             x
                                                                     x         x

        1           m/n m/n+1   2m/n   2m/n+1   3m/n           ...       ...       m



Given an input number x, it belongs in bucket number
 xn/m .
If we use an array of buckets, each item gets mapped to the
right bucket in O(1) time.
Bucketsort Analysis
With uniformly distributed keys, the expected number of
items per bucket is 1. Thus sorting each bucket takes O(1)
time!
The total effort of bucketing, sorting buckets, and concatenat-
ing the sorted buckets together is O(n).
What happened to our Ω(n lg n) lower bound!
Worst-Case vs. Assumed-Case
Bad things happen to bucketsort when we assume the wrong
distribution.
             xx
            xxx     x
           x xx x    x
                      x
            x x     x
            xx       x
       1            m/n m/n+1   2m/n   2m/n+1   3m/n   ...   ...   m



We might spend linear time distributing our items into
buckets and learn nothing.
Problems like this are why we worry about the worst-case
performance of algorithms!
Real World Distributions
The worst case “shouldn’t” happen if we understand the
distribution of our data.
Consider the distribution of names in a telephone book.
 • Will there be a lot of Skiena’s?
 • Will there be a lot of Smith’s?
 • Will there be a lot of Shifflett’s?
Either make sure you understand your data, or use a good
worst-case or randomized algorithm!
The Shifflett’s of Charlottesville
For comparison, note that there are seven Shifflett’s (of
various spellings) in the 1000 page Manhattan telephone
directory.

More Related Content

What's hot

DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier AnalysisDSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier AnalysisAmr E. Mohamed
 
signal and system Lecture 2
signal and system Lecture 2signal and system Lecture 2
signal and system Lecture 2iqbal ahmad
 
Physics Research Summer2009
Physics Research Summer2009Physics Research Summer2009
Physics Research Summer2009Ryan Melvin
 
Signal fundamentals
Signal fundamentalsSignal fundamentals
Signal fundamentalsLalit Kanoje
 
Convolution discrete and continuous time-difference equaion and system proper...
Convolution discrete and continuous time-difference equaion and system proper...Convolution discrete and continuous time-difference equaion and system proper...
Convolution discrete and continuous time-difference equaion and system proper...Vinod Sharma
 
lecture 7
lecture 7lecture 7
lecture 7sajinsc
 
Fourier series 1
Fourier series 1Fourier series 1
Fourier series 1Faiza Saher
 
SIGNAL OPERATIONS
SIGNAL OPERATIONSSIGNAL OPERATIONS
SIGNAL OPERATIONSmihir jain
 
Convolution
ConvolutionConvolution
Convolutionmuzuf
 
Fourier Series for Continuous Time & Discrete Time Signals
Fourier Series for Continuous Time & Discrete Time SignalsFourier Series for Continuous Time & Discrete Time Signals
Fourier Series for Continuous Time & Discrete Time SignalsJayanshu Gundaniya
 
Lecture4 Signal and Systems
Lecture4  Signal and SystemsLecture4  Signal and Systems
Lecture4 Signal and Systemsbabak danyal
 
Dsp U Lec04 Discrete Time Signals & Systems
Dsp U   Lec04 Discrete Time Signals & SystemsDsp U   Lec04 Discrete Time Signals & Systems
Dsp U Lec04 Discrete Time Signals & Systemstaha25
 
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsDSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsAmr E. Mohamed
 
Lecture5 Signal and Systems
Lecture5 Signal and SystemsLecture5 Signal and Systems
Lecture5 Signal and Systemsbabak danyal
 
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)Amr E. Mohamed
 

What's hot (20)

DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier AnalysisDSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
 
signal and system Lecture 2
signal and system Lecture 2signal and system Lecture 2
signal and system Lecture 2
 
Physics Research Summer2009
Physics Research Summer2009Physics Research Summer2009
Physics Research Summer2009
 
Signal fundamentals
Signal fundamentalsSignal fundamentals
Signal fundamentals
 
Convolution discrete and continuous time-difference equaion and system proper...
Convolution discrete and continuous time-difference equaion and system proper...Convolution discrete and continuous time-difference equaion and system proper...
Convolution discrete and continuous time-difference equaion and system proper...
 
lecture 7
lecture 7lecture 7
lecture 7
 
Fourier series 1
Fourier series 1Fourier series 1
Fourier series 1
 
Solved problems
Solved problemsSolved problems
Solved problems
 
SIGNAL OPERATIONS
SIGNAL OPERATIONSSIGNAL OPERATIONS
SIGNAL OPERATIONS
 
Convolution
ConvolutionConvolution
Convolution
 
Fourier Series for Continuous Time & Discrete Time Signals
Fourier Series for Continuous Time & Discrete Time SignalsFourier Series for Continuous Time & Discrete Time Signals
Fourier Series for Continuous Time & Discrete Time Signals
 
Unit 5: All
Unit 5: AllUnit 5: All
Unit 5: All
 
Notes for signals and systems
Notes for signals and systemsNotes for signals and systems
Notes for signals and systems
 
NFSFIXES
NFSFIXESNFSFIXES
NFSFIXES
 
Lecture4 Signal and Systems
Lecture4  Signal and SystemsLecture4  Signal and Systems
Lecture4 Signal and Systems
 
Dsp U Lec04 Discrete Time Signals & Systems
Dsp U   Lec04 Discrete Time Signals & SystemsDsp U   Lec04 Discrete Time Signals & Systems
Dsp U Lec04 Discrete Time Signals & Systems
 
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsDSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
 
Lecture5 Signal and Systems
Lecture5 Signal and SystemsLecture5 Signal and Systems
Lecture5 Signal and Systems
 
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)
DSP_FOEHU - MATLAB 04 - The Discrete Fourier Transform (DFT)
 
OPERATIONS ON SIGNALS
OPERATIONS ON SIGNALSOPERATIONS ON SIGNALS
OPERATIONS ON SIGNALS
 

Similar to Skiena algorithm 2007 lecture09 linear sorting

CSE680-07QuickSort.pptx
CSE680-07QuickSort.pptxCSE680-07QuickSort.pptx
CSE680-07QuickSort.pptxDeepakM509554
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqPradeep Bisht
 
Divide and conquer
Divide and conquerDivide and conquer
Divide and conquerVikas Sharma
 
lecture 15
lecture 15lecture 15
lecture 15sajinsc
 
Algorithm: Quick-Sort
Algorithm: Quick-SortAlgorithm: Quick-Sort
Algorithm: Quick-SortTareq Hasan
 
CS330-Lectures Statistics And Probability
CS330-Lectures Statistics And ProbabilityCS330-Lectures Statistics And Probability
CS330-Lectures Statistics And Probabilitybryan111472
 
lecture 1
lecture 1lecture 1
lecture 1sajinsc
 
Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Deepak John
 
T2311 - Ch 4_Part1.pptx
T2311 - Ch 4_Part1.pptxT2311 - Ch 4_Part1.pptx
T2311 - Ch 4_Part1.pptxGadaFarhan
 
Top school in noida
Top school in noidaTop school in noida
Top school in noidaEdhole.com
 
lecture 8
lecture 8lecture 8
lecture 8sajinsc
 
Analysis and design of algorithms part2
Analysis and design of algorithms part2Analysis and design of algorithms part2
Analysis and design of algorithms part2Deepak John
 
Skiena algorithm 2007 lecture02 asymptotic notation
Skiena algorithm 2007 lecture02 asymptotic notationSkiena algorithm 2007 lecture02 asymptotic notation
Skiena algorithm 2007 lecture02 asymptotic notationzukun
 
Quick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisQuick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisSNJ Chaudhary
 

Similar to Skiena algorithm 2007 lecture09 linear sorting (20)

Lec10
Lec10Lec10
Lec10
 
CSE680-07QuickSort.pptx
CSE680-07QuickSort.pptxCSE680-07QuickSort.pptx
CSE680-07QuickSort.pptx
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
 
Divide and conquer
Divide and conquerDivide and conquer
Divide and conquer
 
2.pptx
2.pptx2.pptx
2.pptx
 
lecture 15
lecture 15lecture 15
lecture 15
 
Algorithm: Quick-Sort
Algorithm: Quick-SortAlgorithm: Quick-Sort
Algorithm: Quick-Sort
 
Lec10
Lec10Lec10
Lec10
 
CS330-Lectures Statistics And Probability
CS330-Lectures Statistics And ProbabilityCS330-Lectures Statistics And Probability
CS330-Lectures Statistics And Probability
 
lecture 1
lecture 1lecture 1
lecture 1
 
Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1
 
Algorithm.ppt
Algorithm.pptAlgorithm.ppt
Algorithm.ppt
 
T2311 - Ch 4_Part1.pptx
T2311 - Ch 4_Part1.pptxT2311 - Ch 4_Part1.pptx
T2311 - Ch 4_Part1.pptx
 
Slide2
Slide2Slide2
Slide2
 
Top school in noida
Top school in noidaTop school in noida
Top school in noida
 
Merge sort and quick sort
Merge sort and quick sortMerge sort and quick sort
Merge sort and quick sort
 
lecture 8
lecture 8lecture 8
lecture 8
 
Analysis and design of algorithms part2
Analysis and design of algorithms part2Analysis and design of algorithms part2
Analysis and design of algorithms part2
 
Skiena algorithm 2007 lecture02 asymptotic notation
Skiena algorithm 2007 lecture02 asymptotic notationSkiena algorithm 2007 lecture02 asymptotic notation
Skiena algorithm 2007 lecture02 asymptotic notation
 
Quick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisQuick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And Analysis
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Skiena algorithm 2007 lecture09 linear sorting

  • 1. Lecture 9: Linear Sorting Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794–4400 http://www.cs.sunysb.edu/∼skiena
  • 2. Problem of the Day The nuts and bolts problem is defined as follows. You are given a collection of n bolts of different widths, and n corresponding nuts. You can test whether a given nut and bolt together, from which you learn whether the nut is too large, too small, or an exact match for the bolt. The differences in size between pairs of nuts or bolts can be too small to see by eye, so you cannot rely on comparing the sizes of two nuts or two bolts directly. You are to match each bolt to each nut.
  • 3. 1. Give an O(n2 ) algorithm to solve the nuts and bolts problem. 2. Suppose that instead of matching all of the nuts and bolts, you wish to find the smallest bolt and its corresponding nut. Show that this can be done in only 2n − 2 comparisons. 3. Match the nuts and bolts in expected O(n log n) time.
  • 5. Quicksort Pseudocode Sort(A) Quicksort(A,1,n) Quicksort(A, low, high) if (low < high) pivot-location = Partition(A,low,high) Quicksort(A,low, pivot-location - 1) Quicksort(A, pivot-location+1, high)
  • 6. Partition Implementation Partition(A,low,high) pivot = A[low] leftwall = low for i = low+1 to high if (A[i] < pivot) then leftwall = leftwall+1 swap(A[i],A[leftwall]) swap(A[low],A[leftwall])
  • 7. Quicksort Animation Q U I C K S O R T Q I C K S O R T U Q I C K O R S T U I C K O Q R S T U I C K O Q R S T U I C K O Q R S T U
  • 8. Best Case for Quicksort Since each element ultimately ends up in the correct position, the algorithm correctly sorts. But how long does it take? The best case for divide-and-conquer algorithms comes when we split the input as evenly as possible. Thus in the best case, each subproblem is of size n/2. The partition step on each subproblem is linear in its size. Thus the total effort in partitioning the 2k problems of size n/2k is O(n).
  • 9. Best Case Recursion Tree The total partitioning on each level is O(n), and it take lg n levels of perfect partitions to get to single element subproblems. When we are down to single elements, the problems are sorted. Thus the total time in the best case is O(n lg n).
  • 10. Worst Case for Quicksort Suppose instead our pivot element splits the array as unequally as possible. Thus instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array.
  • 11. Now we have n−1 levels, instead of lg n, for a worst case time of Θ(n2 ), since the first n/2 levels each have ≥ n/2 elements to partition. To justify its name, Quicksort had better be good in the average case. Showing this requires some intricate analysis. The divide and conquer principle applies to real life. If you break a job into pieces, make the pieces of equal size!
  • 12. Intuition: The Average Case for Quicksort Suppose we pick the pivot element at random in an array of n keys. 1 n/4 n/2 3n/4 n Half the time, the pivot element will be from the center half of the sorted array. Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements.
  • 13. How Many Good Partitions If we assume that the pivot element is always in this range, what is the maximum number of partitions we need to get from n elements down to 1 element? (3/4)l · n = 1 −→ n = (4/3)l lg n = l · lg(4/3) Therefore l = lg(4/3) · lg(n) < 2 lg n good partitions suffice.
  • 14. How Many Bad Partitions? How often when we pick an arbitrary element as pivot will it generate a decent partition? Since any number ranked between n/4 and 3n/4 would make a decent pivot, we get one half the time on average. If we need 2 lg n levels of decent partitions to finish the job, and half of random partitions are decent, then on average the recursion tree to quicksort the array has ≈ 4 lg n levels.
  • 15. Since O(n) work is done partitioning on each level, the average time is O(n lg n).
  • 16. Average-Case Analysis of Quicksort (*) To do a precise average-case analysis of quicksort, we formulate a recurrence given the exact expected time T (n): n 1 T (n) = (T (p − 1) + T (n − p)) + n − 1 p=1 n Each possible pivot p is selected with equal probability. The number of comparisons needed to do the partition is n − 1. We will need one useful fact about the Harmonic numbers Hn, namely n Hn = 1/i ≈ ln n i=1 It is important to understand (1) where the recurrence relation
  • 17. comes from and (2) how the log comes out from the summation. The rest is just messy algebra. n 1 T (n) = (T (p − 1) + T (n − p)) + n − 1 p=1 n 2 n T (n) = T (p − 1) + n − 1 n p=1 n nT (n) = 2 T (p − 1) + n(n − 1) multiply by n p=1 n−1 (n−1)T (n−1) = 2 T (p−1)+(n−1)(n−2) apply to n-1 p=1 nT (n) − (n − 1)T (n − 1) = 2T (n − 1) + 2(n − 1) rearranging the terms give us: T (n) T (n − 1) 2(n − 1) = + n+1 n n(n + 1)
  • 18. substituting an = A(n)/(n + 1) gives 2(n − 1) n 2(i − 1) an = an−1 + = n(n + 1) i=1 i(i + 1) n 1 an ≈ 2 ≈ 2 ln n i=1 (i + 1) We are really interested in A(n), so A(n) = (n + 1)an ≈ 2(n + 1) ln n ≈ 1.38n lg n
  • 19. Pick a Better Pivot Having the worst case occur when they are sorted or almost sorted is very bad, since that is likely to be the case in certain applications. To eliminate this problem, pick a better pivot: 1. Use the middle element of the subarray as pivot. 2. Use a random element of the array as the pivot. 3. Perhaps best of all, take the median of three elements (first, last, middle) as the pivot. Why should we use median instead of the mean? Whichever of these three rules we use, the worst case remains O(n2 ).
  • 20. Is Quicksort really faster than Heapsort? Since Heapsort is Θ(n lg n) and selection sort is Θ(n2 ), there is no debate about which will be better for decent-sized files. When Quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. Since the difference between the two programs will be limited to a multiplicative constant factor, the details of how you program each algorithm will make a big difference.
  • 21. Randomized Quicksort Suppose you are writing a sorting program, to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances. If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad. But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random. Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you would have the same probability of picking a good pivot!
  • 22. Randomized Guarantees Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say: “With high probability, randomized quicksort runs in Θ(n lg n) time.” Where before, all we could say is: “If you give me random input data, quicksort runs in expected Θ(n lg n) time.”
  • 23. Importance of Randomization Since the time bound how does not depend upon your input distribution, this means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance. Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity. The worst-case is still there, but we almost certainly won’t see it.
  • 24. Can we sort o(n lg n)? Any comparison-based sorting program can be thought of as defining a decision tree of possible executions. Running the same program twice on the same permutation causes it to do exactly the same thing, but running it on different permutations of the same data causes a different sequence of comparisons to be made on each.
  • 25. a1 < a2 ? T F a2 < a3 ? a1 < a3 ? T F T F (1,2,3) a1 < a3 ? (2,1,3) a2 < a3 ? T F T F (1,3,2) (3,1,2) (2,3,1) (3,2,1) Claim: the height of this decision tree is the worst-case complexity of sorting.
  • 26. Lower Bound Analysis Since any two different permutations of n elements requires a different sequence of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree. Thus there must be at least n! different leaves in this binary tree. Since a binary tree of height h has at most 2h leaves, we know n! ≤ 2h, or h ≥ lg(n!). By inspection n! > (n/2)n/2 , since the last n/2 terms of the product are each greater than n/2. Thus log(n!) > log((n/2)n/2 ) = n/2 log(n/2) → Θ(n log n)
  • 27. Stirling’s Approximation By Stirling’s approximation, a better bound is n! > (n/e)n where e = 2.718. h ≥ lg(n/e)n = n lg n − n lg e = Ω(n lg n)
  • 28. Non-Comparison-Based Sorting All the sorting algorithms we have seen assume binary comparisons as the basic primative, questions of the form “is x before y?”. But how would you sort a deck of playing cards? Most likely you would set up 13 piles and put all cards with the same number in one pile. With only a constant number of cards left in each pile, you can use insertion sort to order by suite and concatenate everything together. If we could find the correct pile for each card in constant time, and each pile gets O(1) cards, this algorithm takes O(n) time.
  • 29. Bucketsort Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed. We can set up n buckets, each responsible for an interval of m/n numbers from 1 to m x x x x x x x x 1 m/n m/n+1 2m/n 2m/n+1 3m/n ... ... m Given an input number x, it belongs in bucket number xn/m . If we use an array of buckets, each item gets mapped to the right bucket in O(1) time.
  • 30. Bucketsort Analysis With uniformly distributed keys, the expected number of items per bucket is 1. Thus sorting each bucket takes O(1) time! The total effort of bucketing, sorting buckets, and concatenat- ing the sorted buckets together is O(n). What happened to our Ω(n lg n) lower bound!
  • 31. Worst-Case vs. Assumed-Case Bad things happen to bucketsort when we assume the wrong distribution. xx xxx x x xx x x x x x x xx x 1 m/n m/n+1 2m/n 2m/n+1 3m/n ... ... m We might spend linear time distributing our items into buckets and learn nothing. Problems like this are why we worry about the worst-case performance of algorithms!
  • 32. Real World Distributions The worst case “shouldn’t” happen if we understand the distribution of our data. Consider the distribution of names in a telephone book. • Will there be a lot of Skiena’s? • Will there be a lot of Smith’s? • Will there be a lot of Shifflett’s? Either make sure you understand your data, or use a good worst-case or randomized algorithm!
  • 33. The Shifflett’s of Charlottesville For comparison, note that there are seven Shifflett’s (of various spellings) in the 1000 page Manhattan telephone directory.