diff --git a/index.md b/index.md index 65b09fe..2674278 100644 --- a/index.md +++ b/index.md @@ -62,6 +62,7 @@ Sep. 22 | | | [Lab 4: Next Prev Binary Tree](./lab4)
[_Lab Notes_](./Sep Sep. 25 | [Hash tables](./lectures/Sep-25.md) | Ch. 5 sec. 1,2,3,5,6 | Lab 4 due | Sep. 27 | [Code Review (Merge Sort),
Assertions and Correctness](./lectures/Sep-27.md) Sep. 29 | [Quiz 2](https://iu.instructure.com/courses/2165834/quizzes/4049366) | | Lab: work on
[Project 2 Segment Intersection](./proj2-seg-int)
[_Lab Notes_](./Sep-29-backup-notes) | [test](https://autograder.luddy.indiana.edu/web/project/833), [code](https://autograder.luddy.indiana.edu/web/project/700) +Oct. 2 | [Heaps and Priority Queues](./lectures/Oct-2.md) | Ch. 6 sec. 1-4, 9 | Oct. 6 | | | Lab: finish
[Project 2 Segment Intersection](./proj2-seg-int) | [test](https://autograder.luddy.indiana.edu/web/project/833), [code](https://autograder.luddy.indiana.edu/web/project/700) Oct. 9 | Review for Midterm Exam | | Project 2: Segment Intersection due | Oct. 11 | **Midterm Exam** (in class) diff --git a/lectures/Oct-2.md b/lectures/Oct-2.md new file mode 100644 index 0000000..264dac5 --- /dev/null +++ b/lectures/Oct-2.md @@ -0,0 +1,301 @@ +# Lecture: Heaps and Priority Queues + +Motivations: + + 1. Need a data structure that returns the maximum-keyed + element of a set, that is, a data structure called a priority + queue. The key's are the priorities. + + (We could implement a priority queue using an AVL tree, but we + want to be more space efficient.) + + 2. Want to do in-place sorting in O(n log(n)). + + 3. Want to be able to change the key (priority) associated with an element. + +## Heaps + + __16__ + / \ + 14 10 + / \ / \ + / \ / \ + 8 7 9 3 + / \ / + 2 4 1 + +Idea: store elements in an array left-to-right, one level at a +time, top-to-bottom. + + 0 1 2 3 4 5 6 7 8 9 + [16,14,10, 8, 7, 9, 3, 2, 4, 1] + +Def. A **heap** is an array viewed as a nearly complete binary tree. +Given an array A, the root of the tree is stored at A[0] and for a +node stored at index i, the left child is stored at index 2i+1 and +the right child is stored at index 2(i+1). +(Not to be confused with the use of "heap" to mean a computer's +main memory.) + + left(i) = 2*i + 1 + + right(i) = 2*(i + 1) + + parent(i) = floor( (i-1) / 2 ) + +Def. A **max heap** is a heap in which for every node other than the root, +A[i] ≤ A[parent(i)] + +This is called the **heap property** or max-heap property. +One can instead create a min-heap by flipping this around. + + +Overview of the Heap operations + +* `build_max_heap` creates a heap from an unordered array in O(n). +* `maximum` returns maximum element in O(1). +* `extract_max` removes the max in O(log(n)). +* `sortInPlace` sorts an array in-place in O(n log(n)). +* `increase_key` updates the key of an element in the heap in O(log(n)). +* `insert` adds a new element to a heap in O(log(n)). + +Heap class: + + public class Heap { + ArrayList data; + BiPredicate lesseq; + ... + } + +## `max_heapify` helper function + +Many of the heap operations need to turn an array that is almost a max +heap, except for one element, into a max heap. The `max_heapify` +operation moves the element at position i into the correct position. + +The tree rooted at i is not a max heap, but the subtrees +left(i) and right(i) are max heaps. + +Example: 4 is less than one of its children so we swap it with +the larger child and repeat. + + ___16___ + / \ + *4* 10 + / \ / \ + / \ 9 3 + 14 7 + / \ / + 2 8 1 + +becomes + + ___16___ + / \ + 14 10 + / \ / \ + / \ 9 3 + *4* 7 + / \ / + 2 8 1 + +becomes + + ___16___ + / \ + 14 10 + / \ / \ + / \ 9 3 + 8 7 + / \ / + 2 *4* 1 + +Java declaration for `max_heapify`: + + void max_heapify(int i, int heap_length); + +What is the time complexity of max_heapify? O(log(n)) + +## `build_max_heap` method + + void build_max_heap() { + int last_parent = data.size() / 2 - 1; + for (int i = last_parent; i != -1; --i) { + max_heapify(i, data.size()); + } + } + +Why does this procedure work? What is the loop invariant? +Answer: the invariant is that the trees rooted at + positions from i+1 to the end are max heaps. + +What is the time complexity? +Answer: O(n log(n)) is the easy answer, but not tight. + +The tight upper bound is O(n) which can be obtained by +observing that the time for `max_heapify` depends on the +height of the node, and `build_max_heap` calls `max_heapify +many times on nodes with a low height. + +Consider how many nodes there can be in an n-element heap at each +height. The worst cast is a complete tree. For example, + + n = 7 + + _o_ height 2, 1 node + / \ + o o height 1, 2 nodes + / \ / \ + o o o o height 0, 4 nodes + + Height 0: n / 2 = n / 2^(0+1) + + Height 1: n / 4 = n / 2^(1+1) + + Height 2: n / 8 = n / 2^(2+1) + +In general, there are at most n / 2^(h+1) nodes at a given height h. + +So we can sum these up, from h=0 to log(n), with O(h) cost for each: + +sum from h=0 to log(n) of (n / 2^(h+1)) * O(h) + = O(n * sum from h=0 to log(n) of h / 2^h) + +recall formula A.8: + + sum from k=0 to ∞ of (k * x^k) = x / (1 - x)², for |x| < 1 + + let x = 1/2: + + sum from k=0 to ∞ of (k / 2^k) = (1/2) / (1 - (1/2))² + = (1/2) / (1 \times 1 - 2\times 1/2 + 1/4) + = (1/2) / (1/4) = 2 + +so we get + + sum from h=0 to log(n) of (h / 2^h) < 2 + +Thus + + O( n * sum from h=0 to log(n) of (h / 2^h)) + = O( n * 2 ) + = O(n). + + +## `maximum` method + + public E maximum() { + return data.get(0); + } + +## `extract_max` method + +Idea: first record the max element, which is at `data[0]`, then we +need to delete that element. But deleting the first element of an +array is expensive. So we move the last element of the heap to +`data[0]`, shrink the array, and re-establish the max heap +property with `max_heapify`. + + E extract_max() { + E max = data.get(0); + data.set(0, data.get(data.size()-1)); + data.remove(data.size()-1); + max_heapify(0, data.size()); + return max; + } + +## `sortInPlace` method + +Idea: swap the max (the root) with the last element, call `max_heapify`, +then shrink the heap by 1. +Similar to `extract_max` but does a swap instead of move. + + static void sortInPlace(ArrayList A, + BiPredicate lessThanOrEqual) + { + Heap H = new Heap(lessThanOrEqual); + H.data = A; + H.build_max_heap(); + for (int i = H.data.size() - 1; i != 0; --i) { + swap(H.data, 0, i); + H.max_heapify(0, i); + } + } + +Time complexity: +The for loop executes n times, and max_heapify is O(log(n)), +so we have O(n log(n)). + +## `increase_key` method (used by insert) + +The key of the object at position i has increased. +How should we move it to get a valid heap? +Example: Change the key of 9 to 20. + + ___16___ + / \ + 14 10 + / \ / \ + 8 7 *9* 3 + +## Student exercise: come up with the algorithm for `increase_key` + +Answer: the idea is to propagate the element up. Continuing the above +example, + + ___16___ + / \ + 14 10 + / \ / \ + 8 7 *20* 3 + +becomes + + ___16___ + / \ + 14 *20* + / \ / \ + 8 7 10 3 + +becomes + + __*20*__ + / \ + 14 16 + / \ / \ + 8 7 10 3 + +## `insert` method + + void insert(E k) { + if (data.size() + 1 ≥ data.size()) { + ArrayList d = new ArrayList<>((data.size() + 1) * 2); + for (E e : data) + d.add(e); + data = d; + } + data.add(k); + increase_key(data.size() - 1); + } + +## Priority Queues + +- implement using a `Heap` +- the priorities are the keys +- `push`: implement with `insert` +- `pop`: implement with `extract_max` + +In Java: + + class PriorityQueue { + Heap heap; + PriorityQueue(BiPredicate lessThanOrEqual) { + heap = new Heap<>(lessThanOrEqual); + } + void push(E key) { + heap.insert(key); + } + E pop() { + return heap.extract_max(); + } + }