Week 09

Kruskal's Algorithm 1/45

One approach to computing MST for graph G(V,E):

MSTree KruskalMST(Graph g)
{
   MST = empty graph (nV(g))
   sortedEdgeList = sort edges(g) by weight
   foreach (edge e in sortedEdgeList) {
      add e to MST
      if (MST has a cyle) remove e from MST
      if (MST has nV(g) vertices) break
   }
   return MST
}

Critical operations: sorting edges O(ElogE), checking for cycles O(EV)

Reminder: MST = subgraph, with all vertices, and minimum sum-of-edge-weights

Prim's Algorithm 2/45

Another approach to computing MST for graph G(V,E):

MSTree PrimMST(Graph g)
{
   MST = empty graph (nV(g))
   usedV = {0};  unusedE = edges(g)
   while (size(usedV) < nV(g)) {
      find (edge e in unusedE) satisfying {
         e = (s,t,w), s ∈ usedV, t ∉ usedV,
         w is min weight of all such edges
      }
      add e = (s,t,w) to MST
      usedV += t;  unusedE -= e
   }
   return MST
}

Critical operation: finding best edge

... Prim's Algorithm 3/45

Rough time complexity analysis ...

V iterations of outer loop
in each iteration ...
- set operations could be O(1) with e.g. bit-map
- find min edge with edge Set O(E) ⇒ O(VE) overall
- find min edge with PQueue O(logE) ⇒ O(VlogE) overall

Notes:

is a variation of graph traversal with PQueue vs LIFO/FIFO
has a similar structure to Dijkstra's shortest path algorithm

Priority Queues (sidetrack)

Priority Queues 5/45

Some applications of queues require

items processed in order of "key"
rather than in order of entry (FIFO)

Priority Queues (PQueues or PQs) provide this via:

join: insert item into PQ
leave: remove item with largest/smallest key
remove: remove item with specified key

Plus generic ADT operations: new, dispose, isEmpty, ...

Priority order may involve "weight" based on other factors than just key

... Priority Queues 6/45

Behaviour of priority queue ...

... Priority Queues 7/45

Priority Queue interface:

// Types
... PQueue ... //priority queue
... Item ... //items in queue
... Key ...  //priority values in items
// Operations
PQueue newPQueue(int Size);
void PQueueJoin(PQueue q, Item it);
Item PQueueLeave(PQueue q);
int  PQueueIsEmpty(PQueue q);

... Priority Queues 8/45

Priority Queue representations:

Data Structure	Space	Join	Leave	IsEmpty
Sorted Array	MaxN	O(N)	O(1)	O(1)
Unsorted Array	MaxN	O(1)	O(N)	O(1)
Sorted List	O(N)	O(N)	O(1)	O(1)
Unsorted List	O(N)	O(1)	O(N)	O(1)
Heap (see later)	O(N)	O(logN)	O(logN)	O(1)

for a PQueue containing N items

Shortest Path

Shortest Path 10/45

Path = sequence of edges in graph G

With unweighted edges, cost(path) = length (#edges)

With weighted edges, cost(path) = sum of edge weights (aka weight(path)

Shortest path between vertices s and t

a simple path p where s = first(p), t = last(p)
no other simple path q has cost(q) < cost(p)

Assumptions: weighted digraph, no negative weights.

Variations: source-target, single-source, all-pairs

Applications: robot navigation, routing in data networks

Single-source Shortest Path 11/45

Given: weighted digraph G, source vertex s

Result: shortest paths from s to all other vertices

dist[] V-indexed array of shortest path from s
pred[] V-indexed array of predecessor in shortest path

Example:

Edge Relaxation 12/45

Assume: dist[] and pred[] as above
(but containing data for shortest paths discovered so far)

[Diagram:Pics/wgraphs/relaxation-small.png]

Relaxation updates data for w if we find a shorter path from s to w.

... Edge Relaxation 13/45

Relaxation along edge e from v to w

dist[v] is length of shortest known path from s to v
dist[w] is length of shortest known path from s to w
if e gives shorter path s to w via v,
then update dist[w] and pred[w]

Implementation of edge relaxation:

if (dist[v] + e.weight < dist[w]) {
   dist[w] = dist[v] + e.weight;
   pred[w] = v;
}

Dijkstra's Algorithm 14/45

(* the above algorithm is from the book "Data Structures and Algorithms in Java", 6th Edition, By: Michael T. Goodrich; Roberto Tamassia, John Wiley & Sons)

Example-1 (Dijkstra's Algorithm) 15/45

Example-2^* (Dijkstra's Algorithm) 16/45

Find shortest paths (using Dijkstra's Algorithm) for the following graph, start vertex is BWI.

Predecessor List (a dot (.) means no entry exists on the predecessor list)

Action Cloud Priority Queue JFK MIA BOS DFW PVD SFO LAX ORD

{ } [ (0, BWI), (inf, JFK), (inf,ORD), (inf, MIA), (inf, DFW), (inf, BOS), (inf, PVD), (inf, SFO), (inf, LAX) ] . . . . . . . .

pull BWI into the cloud + BWI (0) [ (184, JFK), (621,ORD), (946, MIA), (inf, DFW), (inf, BOS), (inf, PVD), (inf, SFO), (inf, LAX) ] BWI BWI . . . . . BWI

pull JFK into the cloud + JFK (184) [ (328, PVD), (371, BOS), (621, ORD), (946, MIA), (1575, DFW), (inf, SFO), (inf, LAX) ] BWI BWI JFK JFK JFK . . BWI

pull PVD + PVD (328) [ (371, BOS), (621, ORD), (946, MIA), (1575, DFW), (inf, SFO), (inf, LAX) ] BWI BWI JFK JFK JFK . . BWI

pull BOS + BOS (371) [ (621, ORD), (946, MIA), (1575, DFW), (3075, SFO), (inf, LAX) ] BWI BWI JFK JFK JFK BOS . BWI

pull ORD,
update DFW and SFO + ORD (621) [ (946, MIA), (1423, DFW), (2467, SFO), (inf, LAX) ] BWI BWI JFK ORD JFK ORD . BWI

pull MIA + MIA (946) [ (1423, DFW), (2467, SFO), (3288, LAX) ] BWI BWI JFK ORD JFK ORD MIA BWI

pull DFW,
update LAX + DFW (1423) [ (2467, SFO), (2658, LAX) ] BWI BWI JFK ORD JFK ORD DFW BWI

pull SFO + SFO (2467) [ (2658, LAX) ] BWI BWI JFK ORD JFK ORD DFW BWI

pull LAX + LAX (2658) [ ] BWI BWI JFK ORD JFK ORD DFW BWI

			Predecessor List (a dot (.) means no entry exists on the predecessor list)
Action	Cloud	Priority Queue	JFK	MIA	BOS	DFW	PVD	SFO	LAX	ORD
	{ }	[ (0, BWI), (inf, JFK), (inf,ORD), (inf, MIA), (inf, DFW), (inf, BOS), (inf, PVD), (inf, SFO), (inf, LAX) ]	.	.	.	.	.	.	.	.
pull BWI into the cloud	+ BWI (0)	[ (184, JFK), (621,ORD), (946, MIA), (inf, DFW), (inf, BOS), (inf, PVD), (inf, SFO), (inf, LAX) ]	BWI	BWI	.	.	.	.	.	BWI
pull JFK into the cloud	+ JFK (184)	[ (328, PVD), (371, BOS), (621, ORD), (946, MIA), (1575, DFW), (inf, SFO), (inf, LAX) ]	BWI	BWI	JFK	JFK	JFK	.	.	BWI
pull PVD	+ PVD (328)	[ (371, BOS), (621, ORD), (946, MIA), (1575, DFW), (inf, SFO), (inf, LAX) ]	BWI	BWI	JFK	JFK	JFK	.	.	BWI
pull BOS	+ BOS (371)	[ (621, ORD), (946, MIA), (1575, DFW), (3075, SFO), (inf, LAX) ]	BWI	BWI	JFK	JFK	JFK	BOS	.	BWI
pull ORD, update DFW and SFO	+ ORD (621)	[ (946, MIA), (1423, DFW), (2467, SFO), (inf, LAX) ]	BWI	BWI	JFK	ORD	JFK	ORD	.	BWI
pull MIA	+ MIA (946)	[ (1423, DFW), (2467, SFO), (3288, LAX) ]	BWI	BWI	JFK	ORD	JFK	ORD	MIA	BWI
pull DFW, update LAX	+ DFW (1423)	[ (2467, SFO), (2658, LAX) ]	BWI	BWI	JFK	ORD	JFK	ORD	DFW	BWI
pull SFO	+ SFO (2467)	[ (2658, LAX) ]	BWI	BWI	JFK	ORD	JFK	ORD	DFW	BWI
pull LAX	+ LAX (2658)	[ ]	BWI	BWI	JFK	ORD	JFK	ORD	DFW	BWI

To recover the path itself, trace back through the Predecessor table. E.g., the shortest path to LAX goes through (in reverse order) LAX, DFW, ORD, BWI.

(* the above example is from the book "Data Structures and Algorithms in Java", 6th Edition, By: Michael T. Goodrich; Roberto Tamassia, John Wiley & Sons)

Exercise 1: Tracing Dijkstra's Algorithm 17/45

Show how Dijkstra's SSSP algorithm runs on:

[Diagram:Pics/wgraphs/sp-example-small.png]

But maybe better to watch Steven Ha's Visualisations (http://visualgo.net/sssp.html)

Dijkstra's Algorithm (C implementation) 18/45

What's needed for real implementation?

#define MAX_WT  Value larger than any real weight
#define NO_EDGE Value in adj matrix to indicate no edge

// Priority Queue interface
// use dist[] array to determine PQ priorities
PQueue newPQ(float dist[], int nV);
// add vertex to priority queue
void join(PQueue, Vertex);
// remove vertex with smallest dist[]
Vertex leave(PQueue, Vertex);
// reorder queue based on change to vertex
void reorder(PQueue, Vertex);
// check whether queue is empty
int empty(PQueue);
// clean up queue data
void disposePQ(PQueue);

... Dijkstra's Algorithm (C implementation) 19/45

C implementation (via Sedgewick)

void shortestPath(Graph g, Vertex start,
                   Vertex pred[], float dist[])
{
   PQueue pq = newPQ(dist,nV(g));
   for (Vertex v = 0; v < nV(g); v++)
      { pred[v] = -1; dist[v] = MAX_WT; join(pq,v); }
   dist[start] = 0.0;  reorder(pq,start);
   while (!empty(pq)) {
      Vertex s = leave(pq);
      for (Vertex t = 0; t < nV(g); t++) {
         float len = g->adj[s][t];
         if (len == NO_EDGE) continue;
         if (dist[s]+len < dist[t]) {
            pred[t] = s;
            dist[t] = dist[s]+len;
            reorder(pq,t);
         }
      }
   }
   disposePQ(pq);
}

... Dijkstra's Algorithm (C implementation) 20/45

Rough time complexity analysis ...

Outer loop has O(V) iterations, and PQ updates are O(logV).

Implementing "find (edge e=(s,t,w)) satisfying ..."

try all e ∈ E ... cost = O(VE)
classical Djiskstra approach ... cost = O(V²)
- consider only edges (s,t,w) where s ∈ vSet and t ∉ vSet
use a PQueue to find edge to relax
- cost = V*Cost_join + V*Cost_leave + ~E*Cost_reorder

In case 3, cost is dependent on efficiency of PQueue.

Searching

Searching 22/45

An extremely common application in computing

given a (large) collection of items and a key value
find the item(s) in the collection containing that key

As previously

item = {key, val₁, val₂, ...} (i.e. a struct)
key = value used to distinguish items (e.g. student ID)

Keys may be ...

primary ... key value uniquely identifies one item
secondary ... many items may have the same key value

Applications: Google, databases, .....

... Searching 23/45

Assume that we are dealing largely with primary keys.

Search problem can be encapsulated as:

Item search(Collection c, Key k) { ... }

Possible return values:

an Item
a distinguished "NOT FOUND" value

For secondary keys ... return an array (possibly empty) of matching items


Item *search(Collection c, key k) { ... }

... Searching 24/45

Searching is a very important/frequent operation.

Many approaches have been developed to do it ...

O(n) ... linear scan (search technique of last resort)
O(logn) ... binary search, search trees (trees also have other uses)
O(1) ... hash tables (only O(1) under optimal conditions)

We look at trees and hash tables in this part of course.

Searching in Linear Structures

Searching in Linear Structures 26/45

Linear structures: arrays, linked lists, files

Arrays = random access. Lists, files = sequential access.

Cost of searching:

	Array	List	File
Unsorted	O(n) (linear scan)	O(n) (linear scan)	O(n) (linear scan)
Sorted	O(logn) (binary search)	O(n) (linear scan)	O(logn) (seek,seek,...)

Although fseek() gives expensive random-access

... Searching in Linear Structures 27/45

Search in unsorted array, list, file:

Item searchArray(Key k, Item a[], int n) {
   int i;
   for (i = 0; i < n; i++) {
      if (key(a[i]) == k) return a[i];
   }
   return NOT_FOUND;
}
Item searchList(Key k, List L) {
   List n;
   for (n = L; n != NULL; n = n->next) {
      if (key(n->data) == k) return n->data;
   }
   return NOT_FOUND;
}
Item searchFile(Key k, FILE *f) { // open at start
   Item it;
   while (fread(&it, sizeof(Item), 1, f) == 1) {
      if (key(it) == k) return it;
   }
   return NOT_FOUND;
}

Tree Data Structures

Trees 29/45

Trees are branched data structures

consisting of nodes and links, with no cycles (no up-links)
each node contains a data value (or key+data)
each node has links to ≤ k other child nodes (k=2 below)

... Trees 30/45

Trees can be viewed as a set of nested structures:

each node has k possibly empty subtrees

... Trees 31/45

For much of this course, we focus on binary trees (k=2)

Binary trees can be defined recursively, as follows:

A binary tree is either

empty (contains no nodes)
consists of a node, with two subtrees
- each node contains a value
- the left and right subtrees are binary trees

... Trees 32/45

Trees are used in many contexts, e.g.

representing hierachical data structures (e.g. expressions)
efficient searching (e.g. sets, symbol tables, ...)

... Trees 33/45

Search trees have the properties

all values in left subtree are less than root
all values in right subtree are greater than root
this property applies over all nodes in the tree

Balanced trees have the properties

#nodes in left subtree = #nodes in right subtree
this property applies over all nodes in the tree

We focus on binary search trees or BSTs

Binary Search Trees (BSTs, BSTrees)

Binary Search Trees 35/45

Binary search trees have the characteristic properties

each node is the root of 0, 1 or 2 subtrees
all values in any left subtree are less than root
all values in any right subtree are greater than root
these properties applies over all nodes in the tree

Operations:

insert(Tree,Item) ... add new item to tree via key
delete(Tree,Key) ... remove item with specified key from tree
search(Tree,Key) ... find item containing key in tree
plus, "bookeeping" ... new(), dispose(), show(), empty(), ...

... Binary Search Trees 36/45

Examples of binary search trees:

[Diagram:Pics/trees/binary-search-trees-small.png]

Shape of tree is determined by order of insertion.

... Binary Search Trees 37/45

Level of node = path length from root to node

Depth of tree = max path length from root to leaf

[Diagram:Pics/trees/tree-depth-small.png]

Depth of tree with n nodes: min = floor(log₂n), max = n-1

Height balanced tree: ∀ nodes, depth(left subtree) ≅ depth(right subtree)

Time complexity of tree algorithms is typically O(depth)

Exercise 2: Insertion into BSTs 38/45

For each of the sequences below

start from an initially empty binary search tree
show the tree resulting from inserting the values in the order given

(a) 4 2 6 5 1 7 3

(b) 5 3 6 2 4 7 1

Assume ...

Items are keys, all keys are unique (no duplicates)
new values are always inserted as new leaf nodes

Representing BSTs 39/45

Typical data structures for trees ...

// Items (and Keys) are simply integer values
typedef int Item;
typedef int Key;

// a Tree is represented by a pointer to its root node
typedef struct node *Tree;

// a Node contains its data, plus left and right subtrees
typedef struct node {
   int data;  Tree left;  Tree right;
} Node;

// some macros that we might occasionally use
#define data(tree)  ((tree)->data)
#define left(tree)  ((tree)->left)
#define right(tree)  ((tree)->right)

... Representing BSTs 40/45

Abstract data vs concrete data ...

Exercise 3: BSTree Operations 41/45

The files BSTree.h and BSTree.c contain an ADT for binary search trees.

Using these definitions, implement the following operations:

BSTree newBSTree() (create empty tree)
void dropBSTree(BSTree) (free memory for tree)
int BSTreeFind(BSTree,int) (search for value in tree)
int BSTreeDepth(BSTree) (compute depth of tree)
int BSTreeNodes(BSTree) (count #nodes in tree)
BSTree BSTreeInsert(BSTree,int) (insert value into tree)

Look at VisuAlgo first, for some ideas.

Most are best expressed recursively. Base case? Recursive case?

Note: BSTreeFind() returns yes/no; BSTreeInsert() ignores duplicates

Tree Traversal 42/45

Iteration (traversal) on ...

Sets ... visit each value, order not important
Lists ... visit each value, from first to last
Graphs ... visit each vertex, order determined by DFS/BFS/...

For binary Trees, several well-defined visiting orders exist:

preorder (NLR) ... visit root, then left subtree, then right subtree
inorder (LNR) ... visit left subtree, then root, then right subtree
postorder (LRN) ... visit left subtree, then right subtree, then root
level-order ... visit root, then all its children, then all their children

... Tree Traversal 43/45

Consider visiting an expression tree like:

NLR: + * 1 3 - * 5 7 9    (prefix-order: useful for building tree)
LNR: 1 * 3 + 5 * 7 - 9    (infix-order: "natural" order)
LRN: 1 3 * 5 7 * 9 - +    (postfix-order: useful for evaluation)
Levl: + * - 1 3 * 9 5 7    (level-order: useful for ??)

... Tree Traversal 44/45

Traversal (with parameterised visit operation):

void Traverse(Tree t, void (*visit)(Item))
{
   if (t != NULL) {
     // put "visit data" here for NLR
     Traverse(t->left, visit);
     // put "visit data" here for LNR
     Traverse(t->right, visit);
     // put "visit data" here for LRN
   }
}

Level-order cannot be implemented recursively.

Exercise 4: Generic Traversal 45/45

Implement a generic tree traversal function

void BSTreeTraverse(BSTree t,
                    void (*visit)(),
                    char *order)

where

order is one of "LNR", "NLR" or "LRN"
traversal carried out according to order

Produced: 13 Sep 2017