Merge sort analysis and its real time applications
1. Merge Sort Analysis and its Real-Time Applications
GROUP-4
BE-3RD YEAR, SEM-5TH
Sarvajanik College of Engineering & Technology
Department of Computer Engineering (Shift-1)
3. Contents: a glimpse of what is to come
Introduction to sorting
Introduction to merge sort algorithm
Complexity analysis of merge sort algorithm
Real-time application
4. Introduction to sorting
Sorting refers to arranging a set of data in some logical order.
For example, a telephone directory can be considered a list in which each record has
three fields: name, address and phone number.
Being unique, the phone number can serve as a key to locate any record in the
list.
Sorting is among the most basic problems in algorithm design.
We are given a sequence of items, each associated with a key value, and
the problem is to rearrange the items so that they are in increasing (or
decreasing) order by key.
The methods of sorting can be divided into two categories:
Internal Sorting
External Sorting
5. Internal Sorting
If all the data to be sorted can be accommodated in main memory at one time, internal
sorting methods are used.
• External Sorting
When the data to be sorted cannot be accommodated in memory at the same time and
some of it has to be kept in auxiliary memory, external sorting methods are used.
NOTE: We will only consider internal sorting.
6. Stable and Not Stable Sorting
If a sorting algorithm preserves the relative order in which items with equal keys
appear in the input, it is called a stable sort.
If a sorting algorithm may change the relative order in which items with equal keys
appear in the input, it is called an unstable sort.
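As a small illustration of stability, the sketch below sorts a few records by marks using Python's built-in sorted(), which is documented to be stable. The names and marks are hypothetical data made up for this example.

```python
# Hypothetical records (name, marks); two pairs of records share the same key.
records = [("Asha", 70), ("Bhavin", 85), ("Chirag", 70), ("Disha", 85)]

# sorted() is stable: records with equal marks keep their input order.
by_marks = sorted(records, key=lambda record: record[1])

for name, marks in by_marks:
    print(marks, name)
```

Asha stays ahead of Chirag and Bhavin stays ahead of Disha, because each pair has equal keys and a stable sort never reorders such pairs.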
7. Efficiency of Sorting Algorithm
The complexity of a sorting algorithm expresses its running time as a function
of n, the number of items to be sorted.
The choice of sorting method depends on efficiency considerations for
different problems.
The three most important of these considerations are:
The length of time spent by the programmer in coding a particular sorting program.
The amount of machine time necessary for running the program.
The amount of memory necessary for running the program.
8. Efficiency of Sorting Algorithm
Sorting methods are analyzed for the best case, the worst case and the
average case.
Most of the sorting methods we consider have time requirements that range from
O(n log n) to O(n^2).
A sort should not be selected only because its sorting time is O(n log n); the relation
between the file size n and the other factors affecting the actual sorting time must be
considered.
One way to determine the time requirement of a sorting technique is to actually run the
program and measure its efficiency.
Once a particular sorting technique is selected, the need is to make the program as
efficient as possible.
Any improvement in sorting time significantly affects the overall efficiency and saves
a great deal of computer time.
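The point about measuring by actually running the program can be sketched as follows. This is only an illustration: the insertion sort, the helper name time_sort and the input sizes are assumptions chosen for the example, and the timings will differ from machine to machine.

```python
import random
import timeit

def insertion_sort(items):
    """Simple O(n^2) sort, used here only as something to time."""
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def time_sort(sort_fn, n, repeats=3):
    """Best wall-clock time in seconds of sort_fn on n random integers."""
    data = [random.randint(0, 10**6) for _ in range(n)]
    return min(timeit.repeat(lambda: sort_fn(data), number=1, repeat=repeats))

for n in (200, 400, 800):
    slow = time_sort(insertion_sort, n)
    fast = time_sort(sorted, n)
    print(f"n={n}: insertion sort {slow:.5f}s, built-in sort {fast:.5f}s")
```

Doubling n should roughly quadruple the insertion sort time, while the built-in O(n log n) sort grows much more slowly.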
9. Efficiency of Sorting Algorithm
Space constraints are usually less important than time considerations.
One reason is that, for most sorting programs, the amount of
space needed is closer to O(n) than to O(n^2).
The second reason is that, if more space is required, it can almost always
be found in auxiliary storage.
10. Introduction to merge sort algorithm
Divide-and-conquer algorithms
Divide-and-conquer breaks a problem into sub-problems that are similar to the
original problem, recursively solves the sub-problems, and finally combines the
solutions to the sub-problems to solve the original problem.
Think of a divide-and-conquer algorithm as having three parts:
Divide the problem into a number of sub-problems that are smaller instances of the same
problem.
Conquer the sub-problems by solving them recursively. If they are small enough, solve the
sub-problems as base cases.
Combine the solutions to the sub-problems into the solution for the original problem.
Because we are using divide-and-conquer to sort, we need to decide what our sub-problems
are going to be.
11. [Figure: divide-and-conquer. The problem is divided into sub-problems; each sub-problem is solved (conquer); the solutions to the sub-problems are combined into the solution to the problem.]
12. Merge sort algorithm
Merge sort is a sorting technique based on the divide-and-conquer
technique; it was invented by John von Neumann in 1945.
Merge sort works on two basic principles:
• Sorting a smaller list is faster than sorting a larger list.
• Combining two sorted sub-lists is faster than
combining two unsorted lists.
13. Working of merge sort
Merge sort works in three stages:
• Divide : Split the list into two halves of (nearly) equal size.
• Conquer : Sort each sub-list recursively.
• Combine : After sorting, merge the sub-lists into a single sorted list.
17. Algorithm for merge sort :
Algorithm MERGE_SORT(A, low, high)
// A is an array of size n; the initial call is MERGE_SORT(A, 1, n)
if low < high then
    mid ← floor((low + high) / 2)
    MERGE_SORT(A, low, mid)
    MERGE_SORT(A, mid + 1, high)
    COMBINE(A, low, mid, high)
end
Algorithm COMBINE(A, low, mid, high)
// Merges the sorted runs A[low..mid] and A[mid+1..high]
L1 ← mid - low + 1
L2 ← high - mid
for i ← 1 to L1 do
    LEFT[i] ← A[low + i - 1]
end
for j ← 1 to L2 do
    RIGHT[j] ← A[mid + j]
end
// Sentinels: neither run is exhausted before the loop below finishes
LEFT[L1 + 1] ← ∞
RIGHT[L2 + 1] ← ∞
i ← 1, j ← 1
for k ← low to high do
    if LEFT[i] ≤ RIGHT[j] then
        A[k] ← LEFT[i]
        i ← i + 1
    else
        A[k] ← RIGHT[j]
        j ← j + 1
    end
end
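The pseudocode can be translated into a short runnable sketch. The version below is one possible Python rendering, not the slides' own code: it uses 0-based inclusive bounds, float("inf") as the sentinel (so it assumes finite numeric keys), and a sample list invented for the example.

```python
def merge_sort(a, low, high):
    """Sort a[low..high] in place (0-based, inclusive bounds)."""
    if low < high:
        mid = (low + high) // 2
        merge_sort(a, low, mid)
        merge_sort(a, mid + 1, high)
        combine(a, low, mid, high)

def combine(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high]."""
    # Copy both runs and append a sentinel so neither index runs off the end.
    left = a[low:mid + 1] + [float("inf")]
    right = a[mid + 1:high + 1] + [float("inf")]
    i = j = 0
    for k in range(low, high + 1):
        if left[i] <= right[j]:   # "<=" keeps equal keys in input order (stable)
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

data = [38, 27, 43, 3, 9, 82, 10]   # sample input, chosen for illustration
merge_sort(data, 0, len(data) - 1)
print(data)
```

The temporary left and right lists are the O(n) extra space that merge sort needs.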
20. Example based on the previous method: sorting a list
using the merge sort algorithm.
21. Complexity analysis of merge sort algorithm
Divide : This step computes the middle of the sub-array, which
takes constant time. Thus, D(n) = Θ(1).
Conquer : We recursively solve two sub-problems, each of size n/2,
which contributes 2T(n/2) to the running time.
Combine : The combine procedure on an n-element sub-array takes
Θ(n) time, so C(n) = Θ(n).
\[
T(n) =
\begin{cases}
0, & \text{if } n \le 1 \\
T(n/2) + T(n/2) + D(n) + C(n), & \text{otherwise}
\end{cases}
\]
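22.
Taking the Θ(n) + Θ(1) term as n for simplicity and expanding the recurrence:
\begin{align*}
T(n) &= 2\,T(n/2) + \Theta(n) + \Theta(1) \\
T(n) &= 2\,T(n/2) + n \\
T(n/2) &= 2\,T(n/4) + n/2 \\
T(n) &= 2\,\bigl(2\,T(n/4) + n/2\bigr) + n \\
     &= 2^2\,T(n/2^2) + 2n \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + kn
\end{align*}
Suppose n = 2^k, so k = \log_2 n. Then
\begin{align*}
T(2^k) &= n\,T(n/n) + n \log_2 n \\
       &= n\,T(1) + n \log_2 n
\end{align*}
But T(1) = 0, so
\[ T(n) = n \log_2 n = O(n \log_2 n) \]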
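As a quick sanity check of this result, the sketch below evaluates the simplified recurrence T(1) = 0, T(n) = 2T(n/2) + n directly and compares it with n log2 n for powers of two. The function name T and the range of sizes are choices made for this illustration.

```python
import math

def T(n):
    """Simplified recurrence: T(1) = 0, T(n) = 2*T(n/2) + n, n a power of 2."""
    if n <= 1:
        return 0
    return 2 * T(n // 2) + n

for k in range(6):
    n = 2 ** k
    # For n = 2^k the closed form n * log2(n) equals k * 2^k exactly.
    print(n, T(n), n * int(math.log2(n)))
```

The last two columns agree on every line, which matches the closed form for n = 2^k.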
23.
Whether the list is already sorted, sorted in reverse or randomly ordered, all three steps must be
performed.
Merge sort cannot detect that a list is already sorted, so with the COMBINE procedure above the number of comparisons is the same in all three cases.
In the worst case, merge sort (O(n log2 n)) takes less time than insertion sort, selection sort and bubble sort (O(n^2)).
However, the best case of insertion sort (O(n)) beats all three cases of merge sort (O(n log2 n)).
Best case      Average case   Worst case
O(n log2 n)    O(n log2 n)    O(n log2 n)
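The claim that the comparison count does not depend on the input order can be checked with a small experiment. The sketch below is a hypothetical instrumented version of the sentinel-based merge sort; the function name count_comparisons and the size n = 8 are choices made for this example.

```python
import random

def count_comparisons(items):
    """Sentinel-based merge sort; returns (sorted list, key comparisons made)."""
    a = list(items)
    count = 0

    def sort(low, high):
        nonlocal count
        if low < high:
            mid = (low + high) // 2
            sort(low, mid)
            sort(mid + 1, high)
            left = a[low:mid + 1] + [float("inf")]
            right = a[mid + 1:high + 1] + [float("inf")]
            i = j = 0
            for k in range(low, high + 1):
                count += 1              # one comparison per output position
                if left[i] <= right[j]:
                    a[k] = left[i]
                    i += 1
                else:
                    a[k] = right[j]
                    j += 1

    sort(0, len(a) - 1)
    return a, count

n = 8
ascending = list(range(1, n + 1))
descending = ascending[::-1]
shuffled = random.sample(ascending, n)
for label, data in (("sorted", ascending), ("reversed", descending), ("random", shuffled)):
    print(label, count_comparisons(data)[1])
```

For n = 8 every input gives 24 comparisons, which is 8 × log2 8. Merge variants without sentinels stop comparing once one run is exhausted, so their counts vary slightly with the input, but they remain O(n log2 n).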
24. Properties of merge sort :
• Not adaptive : Running time does not change with the data pattern.
• Stable / unstable : Both implementations are possible.
• Not incremental : Does not sort the elements one by one in each pass.
• Not online : Needs all the data to be in memory at the time of sorting.
• Not in place : Needs O(n) extra space to merge two sub-lists of
size n/2.
25. Real-time application
The e-commerce application
Have you ever noticed the "You might like" section on an e-commerce website? The site
maintains an array of choices for every user account, finds the accounts whose arrays
have the fewest inversions with respect to your array of choices, and then
recommends what those users have bought or liked. Counting inversions is a standard
application of merge sort. We do not go into the time and space complexity of this
approach here. There are many ways of building such recommendations, and
this is one of them.
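A minimal sketch of counting inversions with a merge-sort-style routine is shown below. It only illustrates the idea and is not how any particular website works; the function name count_inversions, the user names and the rankings are hypothetical. Each user's list gives the order in which that user ranks the same five items, written in terms of your own ranks 1 to 5.

```python
def count_inversions(seq):
    """Return (sorted copy, number of pairs i < j with seq[i] > seq[j])."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

# Hypothetical rankings of the same five items by three other users.
users = {
    "user_a": [1, 3, 2, 4, 5],
    "user_b": [5, 4, 3, 2, 1],
    "user_c": [2, 1, 3, 5, 4],
}
closest = min(users, key=lambda name: count_inversions(users[name])[1])
print(closest)
```

Here user_a has 1 inversion, user_c has 2 and user_b has 10, so user_a is the closest match. The count comes out of the merge step at no extra asymptotic cost, so the whole routine runs in O(n log n).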
Some e-commerce websites, such as policybazaar.com and trivago.com, use their
search engines to collect data from other websites in order to offer the lowest price for
booking a hotel room or purchasing a product.