Week 03 (Sorting)
Sorting: Background |
Sorting | 2/83 |
Sorting involves arranging a collection of items in order
Item
The Sorting Problem | 3/83 |
Arrange items in array slice a[lo..hi]
For Item a[N]
(lo == 0)
(hi == N-1)
... The Sorting Problem | 4/83 |
More formally ...
Precondition:
lo
hi
0
lo
hi
N-1
a[lo..hi]
Item
a'[lo..hi]
i
lo..hi-1
a'[i]
a'[i+1]
... The Sorting Problem | 5/83 |
We sort arrays of Item
int
char
float
struct
Item
key
key
Duplicate key
... The Sorting Problem | 6/83 |
Properties of sorting algorithms: stable, adaptive
Stable sort:
a[i]
a[j]
key(x) == key(y)
a
a'
Adaptive:
... The Sorting Problem | 7/83 |
In analysing sorting algorithms:
hi-lo+1
Cases to consider for initial order of items:
Item
a[lo..hi]
a'[lo]
a'[lo+1]
a'[hi]
a'[lo]
a'[lo+1]
a'[hi]
Comparison of Sorting Algorithms | 8/83 |
A variety of sorting algorithms exist
Implementing Sorting | 9/83 |
Concrete framework:
// we deal with generic Items typedef int Item;// abstractions to hide details of Items #define key(A) (A) #define less(A,B) (key(A) < key(B)) #define swap(A,B) {Item t; t = A; A = B; B = t;} #define swil(A,B) {if (less(A,B)) swap(A,B);}// swil = SWap If Less // Sorts a slice of an array of Items void sort(Item a[], int lo, int hi);// Check for sortedness (to validate functions) int isSorted(Item a[], int lo, int hi);
Exercise 1: Implementing isSorted() | 10/83 |
Implement the isSorted()
Use the generic Item
Exercise 2: Sorting Performance Lab | 11/83 |
Implement a program that allows us to
$ ./sorter Algo Size Dist [Seed]
Algo
Size
Dist
Seed
Exercise 3: SortLab | 12/83 |
Implement a "sorting laboratory", a program that allows us to
$ ./sorter Usage: ./sorter Algo Size Dist [Seed] Algo = b|i|m|q|s, Dist = A|D|R $ ./sorter b 10 R Before: 79 70 69 17 89 07 02 56 30 86 After: 02 07 17 30 56 69 70 79 86 89 #compares: 44, #swaps: 27, #moves: 0 OK
Sorts on Linux | 13/83 |
The sort
qsort()
qsort(void *a, int n, int size, int (*cmp)())
n
size
strcmp()
cmp()
Exercise 4: Sorting on Different Fields | 14/83 |
Write a program that reads data in the form:
5059413 Daisy 3762 15 3461045 Yan 3648 42 3474881 Sinan 8543 16 5061020 Yu 3970 3
and then ..
Student
{sid,name,prog,magic}
qsort()
argv[1]
Student
Sorting: Elementary Algorithms |
Describing Sorting Algorithms | 16/83 |
To describe simple sorting, we use diagrams like:
In these algorithms ...
Selection Sort | 17/83 |
Simple, non-adaptive method:
... Selection Sort | 18/83 |
State of array after each iteration:
... Selection Sort | 19/83 |
void selectionSort(int a[], int lo, int hi) { int i, j, min; for (i = lo; i < hi-1; i++) { min = i; for (j = i+1; j <= hi; j++) { if (less(a[j],a[min])) min = j; } swap(a[i], a[min]); } }
... Selection Sort | 20/83 |
Cost analysis (where n = hi-lo+1
Bubble Sort | 21/83 |
Simple adaptive method:
... Bubble Sort | 22/83 |
... Bubble Sort | 23/83 |
State of array after each iteration:
... Bubble Sort | 24/83 |
Sorting Example (courtesy Sedgewick):
S O R T E X A M P L E A S O R T E X E M P L A E S O R T E X L M P A E E S O R T L X M P A E E L S O R T M X P A E E L M S O R T P X A E E L M O S P R T X A E E L M O P S R T X A E E L M O P R S T X ... A E E L M O P R S T X
... Bubble Sort | 25/83 |
C version:
void bubbleSort(int a[], int lo, int hi) { int i, j, nswaps; for (i = lo; i < hi; i++) { nswaps = 0; for (j = hi; j > i; j--) { if (less(a[j], a[j-1])) { swap(a[j], a[j-1]); nswaps++; } } if (nswaps == 0) break; } }
... Bubble Sort | 26/83 |
Cost analysis (where n = hi-lo+1
Insertion Sort | 27/83 |
Simple adaptive method:
... Insertion Sort | 28/83 |
... Insertion Sort | 29/83 |
Sorting Example (courtesy Sedgewick):
S O R T E X A M P L E S O R T E X A M P L E O S R T E X A M P L E O R S T E X A M P L E O R S T E X A M P L E E O R S T X A M P L E E O R S T X A M P L E A E O R S T X M P L E A E M O R S T X P L E A E M O P R S T X L E A E L M O P R S T X E A E E L M O P R S T X
... Insertion Sort | 30/83 |
Simple implementation:
void insertionSort(int a[], int lo, int hi) { int i, j, val; for (i = lo+1; i <= hi; i++) { val = a[i]; for (j = i; j > lo; j--) { if (!less(val,a[j-1])) break; a[j] = a[j-1]; } a[j] = val; } }
... Insertion Sort | 31/83 |
Above algorithm can be improved by using sentinel
Central loop has two tests:
for (j = i; j > lo; j--) { if (!less(val,a[j-1])) break; a[j] = a[j-1]; }
If we knew that a[lo]
for (j = i; less(val,a[j-1]); j--) a[j] = a[j-1];
... Insertion Sort | 32/83 |
Better implementation:
void insertionSort(int a[], int lo, int hi) { int i, j, val; for (i = hi; i > lo; i--) swil(a[i-1], a[i]); for (i = lo+2; i <= hi; i++) { val = a[i]; for (j = i; less(val,a[j-1]); j--) { a[j] = a[j-1]; } a[j] = val; } }
Uses sentinel to reduce tests needed.
... Insertion Sort | 33/83 |
How the above algorithm works:
S O R T E X A M P L E A S O R T E X E M P L A S O R T E X E M P L A O S R T E X E M P L A O R S T E X E M P L A O R S T E X E M P L A E O R S T X E M P L A E O R S T X E M P L A E E O R S T X M P L A E E M O R S T X P L A E E M O P R S T X L A E E L M O P R S T X
... Insertion Sort | 34/83 |
Complexity analysis ...
ShellSort: Improving Insertion Sort | 35/83 |
Insertion sort:
... ShellSort: Improving Insertion Sort | 36/83 |
Example h-sorted arrays:
... ShellSort: Improving Insertion Sort | 37/83 |
void shellSort(int a[], int lo, int hi) { int hvals[8] = {701, 301, 132, 57, 23, 10, 4, 1}; int g, h, start, i, j, val; for (g = 0; g < 8; g++) { h = hvals[g]; start = lo + h; for (i = start; i < hi; i++) { val = a[i]; for (j = i; j >= start && less(val,a[j-h]); j -= h) move(a, j, j-h); a[j] = val; } } }
... ShellSort: Improving Insertion Sort | 38/83 |
Effective sequences of h values have been determined empirically.
E.g. hi+i = 3hi+1 ... 1093, 364, 121, 40, 13, 4, 1
Efficiency of Shellsort:
Summary of Elementary Sorts | 39/83 |
Comparison of sorting algorithms (online)
#compares | #swaps | #moves | |||||||
min | avg | max | min | avg | max | min | avg | max | |
Selection sort | n2 | n2 | n2 | n | n | n | . | . | . |
Bubble sort | n | n2 | n2 | 0 | n2 | n2 | . | . | . |
Insertion sort | n | n2 | n2 | . | . | . | n | n2 | n2 |
Shell sort | n | n4/3 | n4/3 | . | . | . | 1 | n4/3 | n4/3 |
Which is best?
Sorting Linked Lists | 40/83 |
Selection sort on linked lists
Exercise 5: Exploring Sorting in SortLab | 41/83 |
Use SortLab to demonstrate/check:
Sorting: Better (O(nlogn)) Algorithms |
Quicksort | 43/83 |
Previous sorts were all O(nk) (k>1). Can we do better?
Quicksort: basic idea
... Quicksort | 44/83 |
Phases of quicksort:
Quicksort Implementation | 45/83 |
Elegant recursive solution ...
void quicksort(Item a[], int lo, int hi) { int i;// index of pivot if (hi <= lo) return; i = partition(a, lo, hi); quicksort(a, lo, i-1); quicksort(a, i+1, hi); }
... Quicksort Implementation | 46/83 |
Partitioning phase:
... Quicksort Implementation | 47/83 |
Partition implementation:
int partition(Item a[], int lo, int hi) { Item v = a[lo];// pivot int i = lo+1, j = hi; for (;;) { while (less(a[i],v) && i < j) i++; while (less(v,a[j]) && j > i) j--; if (i == j) break; swap(a,i,j); } j = less(a[i],v) ? i : i-1; swap(a,lo,j); return j; }
Quicksort Performance | 48/83 |
Best case: O(nlogn) comparisons
Quicksort Improvements | 49/83 |
Choice of pivot can have significant effect:
... Quicksort Improvements | 50/83 |
Median-of-three partitioning:
void medianOfThree(Item a[], int lo, int hi) { int mid = (lo+hi)/2; if (less(a[mid],a[lo])) swap(a, lo, mid); if (less(a[hi],a[mid])) swap(a, mid, hi); if (less(a[mid],a[lo])) swap(a, lo, mid);// now, we have a[lo] < a[mid] < a[hi] // swap a[mid] to a[lo+1] to use as pivot swap(a, mid, lo+1);swap(a, lo, mid);} void quicksort(Item a[], int lo, int hi) { int i; if (hi-lo < Threshhold) { ... return; } medianOfThree(a, lo, hi); i = partition(a, lo+1, hi-1); quicksort(a, lo, i-1); quicksort(a, i+1, hi); }
... Quicksort Improvements | 51/83 |
Another source of inefficiency:
... Quicksort Improvements | 52/83 |
Approach 1:
void quicksort(Item a[], int lo, int hi) { int i; if (hi-lo < Threshhold) { insertionSort(a, lo, hi); return; } medianOfThree(a, lo, hi); i = partition(a, lo+1, hi-1); quicksort(a, lo, i-1); quicksort(a, i+1, hi); }
... Quicksort Improvements | 53/83 |
If the array contains many duplicate keys
... Quicksort Improvements | 54/83 |
Bentley/McIlroy approach to three-way partition:
Non-recursive Quicksort | 55/83 |
Quicksort can be implemented using an explicit stack:
void quicksortStack (Item a[], int lo, int hi) { int i; Stack s = newStack(); StackPush(s,hi); StackPush(s,lo); while (!StackEmpty(s)) { lo = StackPop(s); hi = StackPop(s); if (hi > lo) { i = partition (a,lo,hi); StackPush(s,hi); StackPush(s,i+1); StackPush(s,i-1); StackPush(s,lo); } } }
Mergesort | 56/83 |
Mergesort: basic idea
... Mergesort | 57/83 |
Phases of mergesort
Mergesort Implementation | 58/83 |
Mergesort function:
void mergesort(Item a[], int lo, int hi) { int mid = (lo+hi)/2;// mid point if (hi <= lo) return; mergesort(a, lo, mid); mergesort(a, mid+1, hi); merge(a, lo, mid, hi); }
Example of use (typedef int Item
int nums[10] = {32,45,17,22,94,78,64,25,55,42}; mergesort(nums, 0, 9);
... Mergesort Implementation | 59/83 |
One step in the merging process:
... Mergesort Implementation | 60/83 |
Implementation of merge:
void merge(Item a[], int lo, int mid, int hi) { int i, j, k, nitems = hi-lo+1; Item *tmp = malloc(nitems*sizeof(Item)); i = lo; j = mid+1; k = 0;// scan both segments, copying to tmp while (i <= mid && j <= hi) { if (less(a[i],a[j])) tmp[k++] = a[i++]; else tmp[k++] = a[j++]; }// copy items from unfinished segment while (i <= mid) tmp[k++] = a[i++]; while (j <= hi) tmp[k++] = a[j++];//copy tmp back to main array for (i = lo, k = 0; i <= hi; i++, k++) a[i] = tmp[k]; free(tmp); }
Mergesort Performance | 61/83 |
Best case: O(NlogN) comparisons
Non-recursive Mergesort | 62/83 |
Non-recursive mergesort does not require a stack
... Non-recursive Mergesort | 63/83 |
Bottom-up mergesort for arrays:
#define min(A,B) ((A < B) ? A : B) void mergesort(Item a[], int lo, int hi) { int i, m;// m = length of runs int end;// end of 2nd run for (m = 1; m <= lo-hi; m = 2*m) { for (i = lo; i <= hi-m; i += 2*m) { end = min(i+2*m-1, hi); merge(a, i, i+m-1, end); } } }
Mergesort Variation | 64/83 |
Above methods require temporary array
merge(a,lo,mid,hi)
malloc()/free()
malloc()/free()
... Mergesort Variation | 65/83 |
Merge sort with source/destination arrays:
void mergeSort(Item a[], int lo, int hi) { int i; Item *aux = malloc((hi+1)*sizeof(Item)); for (i = lo; i <= hi; i++) aux[i] = a[i]; doMergeSort(a, aux, lo, hi); free(aux); } void doMergeSort(Item a[], Item b[], int lo, int hi) { if (lo >= hi) return; int mid = (lo+hi)/2; doMergeSort(b, a, lo, mid); doMergeSort(b, a, mid+1, hi); merge(b+lo, mid-lo+1, b+mid+1, hi-mid, a+lo); }
... Mergesort Variation | 66/83 |
// merge arrays a[] and b[] into c[] // aN = size of a[], bN = size of b[] void merge(Item a[], int aN, Item b[], int bN, Item c[]) { int i; // index into a[] int j; // index into b[] int k; // index into c[] for (i = j = k = 0; k < aN+bN; k++) { if (i == aN) c[k] = b[j++]; else if (j == bN) c[k] = a[i++]; else if (less(a[i],b[j])) c[k] = a[i++]; else c[k] = b[j++]; } }
Summary of Sort Methods | 67/83 |
Sort an collection of N items in ascending order ...
Elementary sorts: O(N2) comparisons
Merge sort adapts well for use as disk-based sort.
... Summary of Sort Methods | 68/83 |
Other properties of sort algorithms: stability, adaptive
Selection sort:
HeapSort | 69/83 |
We will discuss HeapSort later in the course, after discussing a "heap" tree structure.
Sorting Lower Bound |
Sorting Lower Bound for Comparison-Based Sorting | 71/83 |
Many popular sorting algorithms "compare" pairs of keys (objects) to sort an input sequence.
For example: selection-sort, insertion-sort, bubble-sort, merge-sort, quick-sort, etc.
Lower Bound: Any comparison-based sorting algorithm must take Ω (n log n) time to sort n elements in the worst case.
... Sorting Lower Bound for Comparison-Based Sorting | 72/83 |
Brief explanation:
Given n elements (no duplicates),
For a given input,
log2(n!) = log2(1) + log2(2) + ... + log2(n/2) + ... + log2(n-1) + log2(n) log2(n!) >= log2(n/2) + ... + log2(n-1) + log2(n) log2(n!) >= (n/2)log2(n/2) log2(n!) = Ω (n log2n)
Therefore, for any comparison-based sorting algorithm, the lower bound is Ω (n log2 n) .
Non-Comparative Sorting |
Radix Sort | 74/83 |
Radix sort is a non-comparative sorting algorithm.
Radix sort basic idea:
See radix sort example below,
Bucket/Pigeonhole Sort | 75/83 |
Bucket/Pigeonhole sort basic idea:
See bucket/pigeonhole sort example below,
External Sorting |
External Mergesort | 77/83 |
Previous sorts all assume ...
Item
... External Mergesort | 78/83 |
External mergesort basic idea:
Item
... External Mergesort | 79/83 |
State of output file after each iteration:
... External Mergesort | 80/83 |
Naive external mergesort algorithm (not valid C)
Input: stdin, containing N Items Output: stream of Items on stdout copy stdin file to A runLength = 1 iter = 0 while (runLength < N) { if (iter % 2 == 0) inFile = A, outFile = B else inFile = B, outFile = A fileMerge(inFile, outFile, runLength, N) iter++; runLength *= 2; } copy outfile to stdout
... External Mergesort | 81/83 |
Rough file merging algorithm (not valid C)
// assumes N = 2^k for some integer k > 0 fileMerge(inFile, outFile, runLength, N) { inf1 = open inFile for reading inf2 = open inFile for reading outf = open outFile for writing in1 = 0; in2 = runLength while (in1 < N) { seek to position in1 in inf1 end1 = in1+runLength it1 = getItem(inf1) seek to position in2 in inf2 end2 = in2+runLength it2 = getItem(inf2) while (in1 < end1 && in2 < end2) { if (less(it1,it2)) { write it1 to outf it1 = getItem(inf1); in1++ } else { write it2 to outf it2 = getItem(inf2); in2++ } } while (in1 < end1) { write it1 to outf it1 = getItem(inf1); in1++ } while (in2 < end2) { write it1 to outf it1 = getItem(inf1); in1++ } in1 += runLength; in2 += runLength; } }
External Mergesort Cost Analysis | 82/83 |
Critical operations: read an Item
Item
This approach is used by the Unix/Linux sort
... External Mergesort Cost Analysis | 83/83 |
While this is an O(NlogN) algorithm, each iteration reads and write the entire data file.
Reduce iterations via an initial in-memory sorting pass:
If chunks are size 2k, need k less iterations.
Produced: 8 Aug 2017