Analysis of Algorithms Bucket Sort - Radix Sort & Lower Limit for sorting
Ibrahim Mesecan Page 1
Lower limit for sorting
Important Note
In this lecture series, I have used some parts and notes from the books:
Data Structures and Algorithm Analysis in C++, 2nd edition. Author: M.A. Weiss
Introduction to Algorithms (2nd Edition). Authors: Cormen / Leiserson / Rivest / Stein
and from the lecture notes of Prof. Charles Leiserson and Prof. Erik Demaine (http://ocw.mit.edu). If I miss referencing anybody, please inform me at [email protected]
This material is free for non-commercial and public use. For any reuse or distribution, you must make clear to others the license terms of this work.
Decision trees
Although we have O(N log N) algorithms for sorting, it is not clear that this is as good as we can do. In this section, we prove that any algorithm for sorting that uses only comparisons requires Ω(N log N) comparisons (and hence time) in the worst case, so that mergesort and heapsort are optimal to within a constant factor. The proof can be extended to show that Ω(N log N) comparisons are required, even on average, for any sorting algorithm that uses only comparisons, which means that quicksort is optimal on average to within a constant factor.
A decision tree is an abstraction used to prove lower bounds. In our context, a decision tree is a binary tree. Each node represents a set of possible orderings of the elements that are consistent with the comparisons made so far; the results of the comparisons are the tree edges.
Every algorithm that sorts by using only comparisons can be represented by a decision tree. Of course, because the number of leaves grows as n!, it is only feasible to draw the tree for extremely small input sizes. The number of comparisons used by the sorting algorithm in the worst case is equal to the depth of the deepest leaf. In our example, the algorithm uses three comparisons in the worst case.
Theorem. Any decision tree that can sort n elements must have height Ω(n lg n).
Proof. The tree must contain at least n! leaves, since there are n! possible permutations. A binary tree of height h has at most $2^h$ leaves. Thus $n! \le 2^h$, and therefore
$$h \ge \lg(n!) \ge \lg\big((n/e)^n\big) = n \lg n - n \lg e = \Omega(n \lg n).$$
The first inequality holds because lg is monotonically increasing; the second follows from Stirling's formula.
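To make the bound concrete, the following minimal sketch computes the smallest h with 2^h ≥ n!, which is the least possible height of a decision tree for n elements. The function name `minWorstCaseComparisons` is a hypothetical helper introduced here, and the sketch assumes 1 ≤ n ≤ 20 so that n! fits in 64 bits.

```cpp
#include <cstdint>

// Smallest h with 2^h >= n!, i.e. ceil(lg(n!)).
// Assumption of this sketch: 1 <= n <= 20, so n! fits in an unsigned 64-bit integer.
int minWorstCaseComparisons(int n) {
    std::uint64_t factorial = 1;
    for (int i = 2; i <= n; ++i) {
        factorial *= static_cast<std::uint64_t>(i);
    }

    int h = 0;
    std::uint64_t leaves = 1;  // a binary tree of height h has at most 2^h leaves
    while (leaves < factorial) {
        leaves *= 2;
        ++h;
    }
    return h;
}
```

For n = 3 the function returns 3, which agrees with the three comparisons mentioned for the small example; for n = 4 and n = 5 it returns 5 and 7.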
Bucket - Counting Sort
In some ways, bucket sort is similar to counting sort.
Although we proved in the previous section that any general sorting algorithm that uses only comparisons requires Ω(N log N) time in the worst case, recall that it is still possible to sort in linear time in some special cases.
A simple example is bucket sort. There are different implementations of it. For bucket sort to work, extra information must be available. The input A1, A2, ..., AN must consist of only positive integers smaller than M. (Obviously, extensions to this are possible.)
Bucket Sort
If this is the case, then the algorithm is simple:
Keep an array called Count, of size M, which is initialized to all 0s. Thus, Count has M cells, or buckets, which are initially empty. When Ai is read, increment Count[Ai] by 1. After all the input is read, scan the Count array, printing out a representation of the sorted list. This algorithm takes O(M + N) time; the proof is left as an exercise. If M is O(N), then the total is O(N).
First, you put the items into separate buckets. Then, you sort each bucket and read the buckets out in order.
For example, suppose that in a school program student marks range from 1 to 10 and there are thousands of records. You are asked to sort this data by average mark. We can create 10 buckets and put each record into its proper bucket in O(N) time. Then, we can take the records out of the buckets in order, again in O(N) time.
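The school example can be sketched as follows. The record type `StudentRecord`, its fields, and the function `sortByMark` are hypothetical names chosen for illustration, and the sketch assumes that each mark is an integer from 1 to 10.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; the field names are assumptions.
struct StudentRecord {
    std::string name;
    int mark;  // assumed to be an integer in [1, 10]
};

// Puts each record into the bucket for its mark, then reads the buckets back
// in increasing mark order. Records with equal marks keep their input order.
std::vector<StudentRecord> sortByMark(const std::vector<StudentRecord>& records) {
    const int kMaxMark = 10;
    std::vector<std::vector<StudentRecord>> buckets(kMaxMark + 1);  // index 0 is unused

    // Distribution pass: O(N).
    for (const StudentRecord& r : records) {
        if (r.mark < 1 || r.mark > kMaxMark) {
            throw std::out_of_range("mark must be between 1 and 10");
        }
        buckets[r.mark].push_back(r);
    }

    // Collection pass: O(N + number of buckets).
    std::vector<StudentRecord> sorted;
    sorted.reserve(records.size());
    for (int mark = 1; mark <= kMaxMark; ++mark) {
        for (const StudentRecord& r : buckets[mark]) {
            sorted.push_back(r);
        }
    }
    return sorted;
}
```

Because whole records are moved rather than only counted, this variant works when the items carry data besides the key.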
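// NUM_BUCKETS (the bound M) is assumed to be a constant defined elsewhere;
// every List[i] must lie in [0, NUM_BUCKETS).
// The zeroing and counting loops are reconstructed from the algorithm described above.
void bucketsort(int *List, int n){
    int *buckets=new int[NUM_BUCKETS], cnt=0;
    for(int i=0;i<NUM_BUCKETS;i++) buckets[i]=0;   // empty all buckets
    for(int i=0;i<n;i++) buckets[List[i]]++;       // count each value
    for(int i=0;i<NUM_BUCKETS;i++)                 // write the values back in order
        while(buckets[i]>0){ List[cnt++]=i; buckets[i]--; }
    delete[] buckets;
}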
Although this algorithm seems to violate the lower bound, it turns out that it does not, because it uses a more powerful operation than simple comparisons. By incrementing the appropriate bucket, the algorithm essentially performs an M-way comparison in unit time, which is outside the model for which the lower bound was proven. The comparison model is nevertheless a strong model, because a general-purpose sorting algorithm cannot make assumptions about the type of input it can expect to see, but must make decisions based on ordering information only. Naturally, if there is extra information available, we should expect to find a more efficient algorithm, since otherwise the extra information would be wasted.
Self Study Question: Parking Buses
Tirana City has N buses, conveniently numbered from 1 to N, for public transportation. The buses are parked side by side in a parking area at the end of the day. They can be parked in any order, but there are some restrictions that should be taken into consideration. The restrictions concern the priority between two particular buses: some buses should be parked before some other buses because of the next day's schedule.
Question: Write a program to determine whether the current order of the buses meets all the restrictions.
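One possible approach to the check itself, sketched without any input parsing: record the slot of every bus in an array indexed by bus number (much like the Count array of bucket sort), then test each restriction in constant time. The function name, the parameter layout, and the reading of "before" as "in an earlier slot" are assumptions of this sketch, not part of the problem statement.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// order[k] is the number of the bus parked in slot k (buses are numbered 1..N).
// Each restriction (a, b) is read as "bus a must be parked before bus b".
// Assumes `order` is a permutation of 1..N and all restrictions use valid bus numbers.
bool orderMeetsRestrictions(const std::vector<int>& order,
                            const std::vector<std::pair<int, int>>& restrictions) {
    // slotOf[bus] = position of that bus in the current order; index 0 is unused.
    std::vector<std::size_t> slotOf(order.size() + 1, 0);
    for (std::size_t k = 0; k < order.size(); ++k) {
        slotOf[static_cast<std::size_t>(order[k])] = k;
    }

    // One constant-time lookup pair per restriction.
    for (const std::pair<int, int>& r : restrictions) {
        if (slotOf[static_cast<std::size_t>(r.first)] >=
            slotOf[static_cast<std::size_t>(r.second)]) {
            return false;
        }
    }
    return true;
}
```

With N buses and R restrictions, this check takes O(N + R) time.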
Input File: The input has several lines. The first line contains two integers N and M, where N denotes the number of buses (1
Bucket Sort: Constraints
1. The important point for bucket sort is that M has to be a meaningful value. If you have a list of 10,000 numbers whose values vary between 1 and two billion, then you would have to create two billion buckets for 10,000 numbers.
2. If all the records go into the same bucket, the complexity can reach $O(N^2)$ (for example, when each bucket is sorted with a quadratic method such as insertion sort).
E.g., with buckets 0-10, 11-20, 21-30, 31-40, 41-50, the input 44 45 46 44 48 41 43 47 46 49 45 48 46 … falls entirely into the 41-50 bucket.
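The skew problem can be observed with a small range-based bucket sort. This is a sketch under stated assumptions: the values are non-negative integers no larger than `maxValue`, each bucket is sorted with insertion sort, and the names `insertionSort` and `rangeBucketSort` are hypothetical. Bucket i covers the values i*width to i*width + width - 1, so the boundaries differ by one from the ranges listed above.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: quadratic in the worst case for a bucket holding k items.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}

// Sorts `values` in place using buckets that each cover `width` consecutive values.
// Assumes 0 <= v <= maxValue for every v and width >= 1.
// Returns the size of the fullest bucket so that the skew can be inspected.
std::size_t rangeBucketSort(std::vector<int>& values, int maxValue, int width) {
    std::vector<std::vector<int>> buckets(static_cast<std::size_t>(maxValue / width) + 1);
    for (int v : values) {
        buckets[static_cast<std::size_t>(v / width)].push_back(v);
    }

    std::size_t fullest = 0;
    std::size_t out = 0;
    for (std::vector<int>& b : buckets) {
        if (b.size() > fullest) fullest = b.size();
        insertionSort(b);                       // cost depends on how full this bucket is
        for (int v : b) values[out++] = v;      // buckets are read back in increasing order
    }
    return fullest;
}
```

Called on the thirteen example values with `maxValue = 50` and `width = 10`, the function reports a fullest bucket of size 13: every item lands in one bucket, and the whole sort degenerates to a single insertion sort.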
Radix Sort
Origin: Herman Hollerith's card-sorting machine for the 1890 U.S. Census.
Digit-by-digit sort.
Hollerith's original (bad) idea: sort on the most-significant digit first.
Good idea: sort on the least-significant digit first with an auxiliary stable sort.
Correctness of radix sort
Induction on digit position:
Assume that the numbers are sorted by their low-order t − 1 digits.
Sort on digit t.
Two numbers that differ in digit t are correctly sorted.
Two numbers equal in digit t are put in the same order as the input, because the auxiliary sort is stable; by the inductive assumption that order is already correct ⇒ correct order.
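The least-significant-digit scheme can be sketched as below. Two choices are assumptions of this sketch rather than part of the slides: the keys are 32-bit unsigned integers, and each 8-bit byte is treated as one digit, which gives four passes. Each pass is a stable counting sort on the current digit, which is exactly the property the induction relies on.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 32-bit unsigned keys, one 8-bit digit per pass (4 passes).
void radixSort(std::vector<std::uint32_t>& keys) {
    const int kBitsPerDigit = 8;                       // digit width chosen for this sketch
    const std::size_t kRadix = 1u << kBitsPerDigit;    // 256 possible digit values
    std::vector<std::uint32_t> buffer(keys.size());

    for (int shift = 0; shift < 32; shift += kBitsPerDigit) {
        // Count how many keys have each digit value (stored one slot to the right).
        std::vector<std::size_t> count(kRadix + 1, 0);
        for (std::uint32_t k : keys) {
            ++count[((k >> shift) & (kRadix - 1)) + 1];
        }

        // Prefix sums: count[d] becomes the first output index for digit value d.
        for (std::size_t d = 0; d < kRadix; ++d) {
            count[d + 1] += count[d];
        }

        // Scan the input in order, so keys with equal digits keep their relative order.
        for (std::uint32_t k : keys) {
            buffer[count[(k >> shift) & (kRadix - 1)]++] = k;
        }

        keys.swap(buffer);
    }
}
```

Starting from the most-significant digit instead would require sorting each group of equal leading digits separately, which is why the least-significant-first order with a stable pass is the simpler design.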
Analysis of radix sort
Assume counting sort is the auxiliary stable sort.
Sort n computer words of b bits each.
Each word can be viewed as having b/r base-$2^r$ digits.
Example: 32-bit word.
r = 8 ⇒ b/r = 4 passes of counting sort on base-$2^8$ digits; or r = 16 ⇒ b/r = 2 passes of counting sort on base-$2^{16}$ digits.
How many passes should we make?
Recall: Counting sort takes Θ(n + k) time to sort n numbers in the range from 0 to k − 1. If each b-bit word is broken into r-bit pieces, each pass of counting sort takes $\Theta(n + 2^r)$ time. Since there are b/r passes, we have
$$T(n, b) = \Theta\left(\frac{b}{r}\,(n + 2^r)\right).$$
Choose r to minimize T(n, b): increasing r means fewer passes, but once r ≫ lg n, the time grows exponentially.
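A worked version of this trade-off, offered as a sketch under the assumption that r can be set to about lg n (rounded to an integer), so that the $2^r$ term does not dominate n:

```latex
% Assumption: r = \lg n, hence 2^r = n.
T(n, b) = \Theta\!\left(\frac{b}{r}\,(n + 2^r)\right)
        = \Theta\!\left(\frac{b}{\lg n}\,(n + n)\right)
        = \Theta\!\left(\frac{b\,n}{\lg n}\right)

% For keys in the range 0 .. n^d - 1 we have b = d \lg n, so
T(n, b) = \Theta\!\left(\frac{d \lg n \cdot n}{\lg n}\right) = \Theta(d\,n)

% Worked instance: b = 32 and n = 2000 give r = \lceil \lg 2000 \rceil = 11
% and \lceil 32 / 11 \rceil = 3 passes.
```

Choosing r much smaller than lg n wastes passes, while choosing it much larger makes the $2^r$ counting array the dominant cost.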
Conclusion
In practice, radix sort is fast for large inputs, as well as simple to code and maintain.
Example (32-bit numbers): At most 3 passes when sorting ≥ 2000 numbers. Merge sort and quicksort do at least ⌈lg 2000⌉ = 11 passes.
Radix sort downside: Unlike quicksort, radix sort displays little locality of reference, and thus a well-tuned quicksort fares better on modern processors, which feature steep memory hierarchies.
Bucket sort: Although bucket sort seems like much too trivial an algorithm to be useful, it turns out that there are many cases where the input is only small integers, so that using a method like quicksort is really overkill.