Minimum Spanning Trees

Suppose you run a cable company and you are planning where to place cables so that you reach every house, but you'd like to minimize the amount of cable that you use.

Definition. A spanning tree is a subset of edges of a graph that form a tree and that connect all the vertices in the graph.

Definition. Given an edge-weighted graph $G$, the minimum spanning tree problem is to find a spanning tree of $G$ whose total weight is less or equal to any other spanning tree. We write $T_1 \le T_2$ when the total weight of the edges in tree $T_1$ is less-than or equal to the total weight of the edges in tree $T_2$.

The following are two different minimum spanning trees for the same graph.

Example minimum spanning tree Another minimum spanning tree

There are two popular algorithms for MST, I'll cover the first. The second is in the textbook.

The two algorithms have a lot in common.

How to Identify Safe Edges

But how do we find a safe edge?

We'll need some definitions to lead up to the answer.

Definitions. Suppose $G = (V,E)$ is a graph with weighted edges.

Theorem 23.1 (Light Edges are Safe Edges) Let $A$ be a subset of some MST of $G=(V,E)$. Let $(S,V-S)$ be a cut of $G$ that respects $A$. Let $(u,v)$ be a light edge wrt. the cut. Then $(u,v)$ is a safe edge.

Proof. Let $T$ be a MST of $G$ that includes $A$.


Key Idea: Merge Little Trees into Larger Trees

Kruskal's Algorithm

Main idea: process all the edges from lightest to heaviest.

Demo of Kruskal's algorithm on the example graph:

Example graph

Edges sorted by weight:

A-1-B, B-2-E, C-2-F, A-3-E, D-4-E, A-4-D, D-4-B, C-4-E, E-7-F

Implementation of Kruskal's Algorithm

The following shows the result of applying Kruskal's algorithm to the example graph.

Output of Kruskal's Algorithm

Time Complexity

The dominating cost is the sorting. So the overall time complexity is $O(m \log m)$ which can be instead stated as $O(m \log n)$ because $m < n²$ and therefore $\log m < 2 \log n$.

Student Exercise

Apply Kruskal's to the following graph

Graph for exercise


sorted edges:

       A-1-E, G-1-H, B-1-D, D-1-H
       F-2-G, A-2-B, 
       E-4-F, C-4-H, D-4-G

An MST of weight 11:

Minimum Spanning Tree

Prim's Algorithm

Main idea: grow a single tree, at each step choosing the lightest edge that goes from the tree to a vertex not in the tree.

Why does this work? Consider the cut that separates the growing tree with the rest of the nodes in the graph. By theorem 23.1, the lightest edge across this cut is a safe edge.

Consider again the following example graph:

Example graph

We randomly choose to start at vertex A.

Of the three out-edges from A, A-1-B is the lightest.

  A--B  C

  D  E  F

The edges that leave the current tree:

A-4-D, A-3-E, B-4-D, B-2-E

The lightest is B-2-E.

  A--B  C
  D  E  F

The edges that leave the current tree:

A-4-D, B-4-D, E-4-C, E-4-D, E-7-F

One of the lightest is A-4-D.

  A--B  C
  |  |
  D  E  F

The edges that leave the current tree:

E-4-C, E-7-F

The lightest is E-4-C.

  A--B  C
  |  | /
  D  E  F

The edges that leave the current tree:

E-7-F, C-2-F

The lightest is C-2-F, which completes the minimum spanning tree:

Another minimum spanning tree


Similar to Dijkstra's Shortest Paths, we need to choose the "best" among the neighbors of the current tree, so let's use a PriorityQueue again.

Unlike Dijkstra's, we don't care about the distance from the source to the neighbors, just the weight of the last edge to the neighbor. So we set the distance to be the weight of the lightest edge to the neighbor.

Prim's algorithm maintains a queue of neighbors of the current tree. It repeatedly performs the following steps:

The following shows the output of Prim's algorithm on the example graph.

Output of Prim's Algorithm