Suppose you run a cable company and you are planning where to place cables so that you reach every house, but you'd like to minimize the amount of cable that you use.
Definition. A spanning tree of a graph is a subset of its edges that forms a tree and connects all the vertices of the graph.
Definition. Given an edge-weighted graph $G$, the minimum spanning tree problem is to find a spanning tree of $G$ whose total weight is less than or equal to that of every other spanning tree. We write $T_1 \le T_2$ when the total weight of the edges in tree $T_1$ is less than or equal to the total weight of the edges in tree $T_2$.
The following are two different minimum spanning trees for the same graph.
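The figure isn't reproduced here, but a tiny brute-force check makes the same point. The triangle graph and weights below are invented for illustration (not the graph from the figure): with weights AB = 1, BC = 2, CA = 2, two distinct spanning trees tie for the minimum total weight.

```python
from itertools import combinations

# invented example: triangle with weights AB=1, BC=2, CA=2
weight = {('A', 'B'): 1, ('B', 'C'): 2, ('C', 'A'): 2}
vertices = {'A', 'B', 'C'}

def is_spanning_tree(edges):
    # a spanning tree has |V|-1 edges and connects every vertex
    if len(edges) != len(vertices) - 1:
        return False
    reached = {next(iter(vertices))}
    changed = True
    while changed:
        changed = False
        for (u, v) in edges:
            if (u in reached) != (v in reached):  # edge extends the reached set
                reached |= {u, v}
                changed = True
    return reached == vertices

trees = [t for t in combinations(weight, 2) if is_spanning_tree(t)]
best = min(sum(weight[e] for e in t) for t in trees)
msts = [t for t in trees if sum(weight[e] for e in t) == best]
print(len(msts), best)  # → 2 3
```

Both {AB, BC} and {AB, CA} have total weight 3, so the minimum spanning tree need not be unique.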
There are two popular algorithms for the MST problem, Kruskal's and Prim's; we cover both below (they are also in the textbook).
The two algorithms have a lot in common: both maintain a set of edges $A$ that is always a subset of some MST, and both grow $A$ one edge at a time. An edge $(u,v)$ is a safe edge for $A$ if $A \cup \{(u,v)\}$ is still a subset of some MST.
But how do we find a safe edge?
We'll need some definitions to lead up to the answer.
Definitions. Suppose $G = (V,E)$ is a graph with weighted edges. A cut $(S, V \setminus S)$ is a partition of the vertices into a set $S$ and its complement $V \setminus S$. An edge crosses the cut if one endpoint is in $S$ and the other is in $V \setminus S$. A cut respects a set of edges $A$ if no edge in $A$ crosses the cut. A light edge with respect to a cut is a crossing edge of minimum weight.
Theorem 23.1 (Light Edges are Safe Edges). Let $A$ be a subset of some MST of $G=(V,E)$. Let $(S, V \setminus S)$ be a cut of $G$ that respects $A$. Let $(u,v)$ be a light edge with respect to the cut. Then $(u,v)$ is a safe edge.
Proof. Let $T$ be a MST of $G$ that includes $A$.
Case $(u,v)$ in $T$: Then trivially, $A \cup \{(u,v)\} \subseteq T$.
Case $(u,v)$ not in $T$:
We're going to construct another MST $T'$ such that $A \cup \{(u,v)\} \subseteq T'$.
Because $T$ is spanning, $u$ and $v$ both are in $T$, so there is already a path from $u$ to $v$ in $T$. Thus, $(u,v)$ completes a cycle. A cycle with a crossing edge must have another crossing edge as shown in the following diagram. Let that other crossing edge be $(x,y)$ .
u ...... x
|        |
--- cut ---
|        |
v ...... y
Here $(u,v)$ and $(x,y)$ are the crossing edges, and the dotted lines are the rest of the cycle, which consists of paths in $T$.
We form the new MST $T'$ by removing $(x,y)$ and adding $(u,v)$:
$T' = (T \setminus \{(x,y)\}) \cup \{(u,v)\}$
Now we need to show that $T'$ is an MST. First, $T'$ is a spanning tree: any path in $T$ that used $(x,y)$ can be rerouted around the cycle through $(u,v)$, so $T'$ is connected, and it still has $|V| - 1$ edges. Second, $T' \le T$ because $(u,v)$ is a light edge, so its weight is less than or equal to that of $(x,y)$. Since $T$ is an MST, so is $T'$.
It remains to show that $(u,v)$ is a safe edge, that is,
$A \cup \{(u,v)\} \subseteq (T \setminus \{(x,y)\}) \cup \{(u,v)\}$
We had $A \subseteq T$, so it suffices to show that $(x,y) \notin A$. That holds because the cut respects $A$ while the edge $(x,y)$ crosses the cut.
QED
We can think of MST algorithms as maintaining a forest of trees, where all the tree edges are in the set $A$.
Initially, each vertex is in its own tree because $A=\{\}$.
At each step, we merge two trees into a single tree by identifying the lightest edge connecting them (such an edge is safe).
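To illustrate the merging step, here is a small sketch (the helper name and the edge representation are invented for this example) that finds the lightest edge connecting two given trees:

```python
def lightest_connecting_edge(tree1, tree2, edges, weight):
    # among the edges with one endpoint in each tree, return the lightest
    crossing = [e for e in edges
                if (e[0] in tree1 and e[1] in tree2)
                or (e[0] in tree2 and e[1] in tree1)]
    return min(crossing, key=lambda e: weight[e])

# a fragment of the example graph below: tree {A} versus tree {B, E}
w = {('A', 'B'): 1, ('A', 'E'): 3, ('B', 'E'): 2}
print(lightest_connecting_edge({'A'}, {'B', 'E'}, list(w), w))  # → ('A', 'B')
```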
Kruskal's algorithm. Main idea: process all the edges from lightest to heaviest.
Demo of Kruskal's algorithm on the example graph:
Edges sorted by weight:
A1B, B2E, C2F, A3E, D4E, A4D, D4B, C4E, E7F
Process A1B, union {A} and {B}:
forest: {A,B}, {C}, {D}, {E}, {F}
edges: B2E, C2F, A3E, D4E, A4D, D4B, C4E, E7F
Process B2E, union {A,B} and {E}
forest: {A,B,E}, {C}, {D}, {F}
edges: C2F, A3E, D4E, A4D, D4B, C4E, E7F
Process C2F, union {C} and {F}
forest: {A,B,E}, {C,F}, {D}
edges: A3E, D4E, A4D, D4B, C4E, E7F
Process A3E, do nothing (A and E are already in the same tree)
edges: D4E, A4D, D4B, C4E, E7F
Process D4E, union {A,B,E} and {D}
forest: {A,B,D,E}, {C,F}
edges: A4D, D4B, C4E, E7F
Process A4D, do nothing
edges: D4B, C4E, E7F
Process D4B, do nothing
edges: C4E, E7F
Process C4E, union {A,B,D,E} and {C,F}:
forest: {A,B,C,D,E,F}
All the vertices are now in one tree, so we have an MST: A1B, B2E, C2F, D4E, C4E, with total weight 13.
To implement Kruskal's algorithm, we sort the edges by increasing weight to make it easy to process lighter edges before heavier edges, and we use the DisjointSets data structure to keep track of each tree as a separate set. Recall the DisjointSets operations: make_set(v) places v in its own singleton set, find(v) returns the representative of the set containing v, and union(u,v) merges the sets containing u and v.
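In case the DisjointSets notebook isn't at hand, here is a minimal stand-in. Its interface (a constructor taking the number of elements, plus make_set, find, and union) is inferred from how kruskal_mst uses it below; union by size with path compression is one standard implementation choice.

```python
class DisjointSets:
    """Minimal union-find: each set is a tree of parent links."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def make_set(self, v):
        # put v in its own singleton set
        self.parent[v] = v
        self.size[v] = 1

    def find(self, v):
        # follow parent links to the root, compressing the path as we go
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, u, v):
        # link the root of the smaller set under the root of the larger
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
```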
from ipynb.fs.full.DisjointSets import *
from ipynb.fs.full.Graph import *
def kruskal_mst(g, weight):
    A = []
    ds = DisjointSets(g.num_vertices())
    for v in g.vertices():
        ds.make_set(v)
    E = list(g.edges())
    E = sorted(E, key=lambda e: weight[e])
    for e in E:
        # skip edges whose endpoints are already in the same tree
        if not (ds.find(e.source) == ds.find(e.target)):
            A.append(e)
            ds.union(e.source, e.target)
    return A
The following shows the result of applying Kruskal's algorithm to the example graph.
id = {'A':0,'B':1,'C':2,'D':3,'E':4,'F':5}
name = ['A','B','C','D','E','F']
named_weight = {('A','B'):1, ('A','D'):4, ('A','E'):3,
                ('B','D'):4, ('B','E'):2,
                ('C','E'):4, ('C','F'):2,
                ('D','E'):4,
                ('E','F'):7}
weight = {UEdge(id[e[0]],id[e[1]]):w for (e,w) in named_weight.items()}
edges = weight.keys()
G = UndirectedAdjList(6, edges)
MST = kruskal_mst(G, weight)
print("MST: ", end="")
print([(name[e.source],name[e.target]) for e in MST], sep=", ")
total = 0
for e in MST:
    total += weight[e]
print("weight of MST: " + str(total))
MST: [('A', 'B'), ('B', 'E'), ('C', 'F'), ('A', 'D'), ('C', 'E')] weight of MST: 13
The dominating cost is sorting the edges. So the overall time complexity is $O(m \log m)$, which can instead be stated as $O(m \log n)$ because $m < n^2$ and therefore $\log m < 2 \log n$.
Apply Kruskal's to the following graph
Solution:
sorted edges:
A1E, G1H, B1D, D1H,
F2G, A2B,
A3C,
E4F, C4H, D4G,
C5D,
B8F
An MST of weight 11: A1E, G1H, B1D, D1H, F2G, A2B, A3C.
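As a sanity check on the weight-11 answer, here is a quick standalone run of Kruskal's algorithm on this exercise graph, using plain tuples and an inline union-find rather than the Graph and DisjointSets notebooks:

```python
def kruskal(n, edge_list):
    # edge_list: (weight, u, v) triples; returns (total_weight, chosen_edges)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total, mst = 0, []
    for w, u, v in sorted(edge_list):
        ru, rv = find(u), find(v)
        if ru != rv:            # endpoints in different trees: safe to add
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return total, mst

idx = {c: i for i, c in enumerate('ABCDEFGH')}
edges = [(1, 'A', 'E'), (1, 'G', 'H'), (1, 'B', 'D'), (1, 'D', 'H'),
         (2, 'F', 'G'), (2, 'A', 'B'), (3, 'A', 'C'),
         (4, 'E', 'F'), (4, 'C', 'H'), (4, 'D', 'G'),
         (5, 'C', 'D'), (8, 'B', 'F')]
total, mst = kruskal(8, [(w, idx[u], idx[v]) for w, u, v in edges])
print(total, len(mst))  # → 11 7
```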
Prim's algorithm. Main idea: grow a single tree, at each step choosing the lightest edge that goes from the tree to a vertex not in the tree.
Why does this work? Consider the cut that separates the growing tree from the rest of the vertices in the graph. By Theorem 23.1, the lightest edge across this cut is a safe edge.
Consider again the following example graph:
We randomly choose to start at vertex A.
Of the three out-edges from A, A1B is the lightest.
tree: {A,B}; tree edges: A1B
The edges that leave the current tree:
A4D, A3E, B4D, B2E
The lightest is B2E.
tree: {A,B,E}; tree edges: A1B, B2E
The edges that leave the current tree:
A4D, B4D, E4C, E4D, E7F
One of the lightest is A4D.
tree: {A,B,D,E}; tree edges: A1B, B2E, A4D
The edges that leave the current tree:
E4C, E7F
The lightest is E4C.
tree: {A,B,C,D,E}; tree edges: A1B, B2E, A4D, E4C
The edges that leave the current tree:
E7F, C2F
The lightest is C2F, which completes the minimum spanning tree: A1B, B2E, A4D, E4C, C2F, with total weight 13.
Similar to Dijkstra's shortest-paths algorithm, we need to choose the "best" among the neighbors of the current tree, so let's use a PriorityQueue again.
from ipynb.fs.full.PriorityQueue import PriorityQueue
def create_priority_queue(num_vertices, distance):
    def compare_distance(u, v):
        return distance[u] > distance[v]
    position = [None for i in range(0, num_vertices)]
    def get_pos(v):
        return position[v]
    def update_pos(v, pos):
        position[v] = pos
    return PriorityQueue(compare_distance, update_pos, get_pos)
Unlike Dijkstra's, we don't care about the distance from the source to the neighbors, just the weight of the last edge to the neighbor. So we set the distance to be the weight of the lightest edge to the neighbor.
def relax(e, distance, queue, parent, weight):
    if distance[e.target] > weight[e]:
        distance[e.target] = weight[e]
        parent[e.target] = e.source
        queue.increase_key(e.target)
Prim's algorithm maintains a queue of neighbors of the current tree. It repeatedly pops the vertex closest to the tree, adds the edge from its parent to the MST, and then relaxes the popped vertex's out-edges, pushing any newly discovered neighbors onto the queue.
import sys

def prim_minimum_spanning_tree(g, source, weight):
    parent = [i for i in range(0, g.num_vertices())]
    distance = [sys.maxsize for i in range(0, g.num_vertices())]
    visited = [False for i in range(0, g.num_vertices())]  # pushed onto the queue
    in_tree = [False for i in range(0, g.num_vertices())]  # popped; already in the MST
    queue = create_priority_queue(g.num_vertices(), distance)
    distance[source] = 0
    queue.push(source)
    visited[source] = True
    MST = []
    while not queue.empty():
        u = queue.pop()
        in_tree[u] = True
        if u != source:
            MST.append(UEdge(parent[u], u))
        for e in g.out_edges(u):
            if not visited[e.target]:
                queue.push(e.target)
                visited[e.target] = True
            if not in_tree[e.target]:
                # only relax edges to vertices still outside the tree,
                # so increase_key is never called on a popped vertex
                relax(e, distance, queue, parent, weight)
    return MST
The following shows the output of Prim's algorithm on the example graph.
MST = prim_minimum_spanning_tree(G, 0, weight)
print("MST: ", end="")
print([(name[e.source],name[e.target]) for e in MST], sep=", ")
print("weight of MST: " + str(sum([weight[e] for e in MST])))
MST: [('A', 'B'), ('B', 'E'), ('A', 'D'), ('E', 'C'), ('C', 'F')] weight of MST: 13
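As a cross-check that doesn't depend on the PriorityQueue notebook, here is a standalone "lazy" variant of Prim's algorithm built on Python's heapq: instead of increase_key, it pushes duplicate entries and discards stale ones when popped. On the example graph it reproduces the same total weight of 13.

```python
import heapq

def lazy_prim(n, adj, source=0):
    # adj[u] is a list of (weight, v) pairs; returns (total_weight, mst_edges)
    in_tree = [False] * n
    total, mst = 0, []
    heap = [(0, source, source)]           # (edge weight, parent, vertex)
    while heap:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue                       # stale entry: v joined the tree earlier
        in_tree[v] = True
        if v != source:
            mst.append((u, v))
            total += w
        for w2, t in adj[v]:
            if not in_tree[t]:
                heapq.heappush(heap, (w2, v, t))
    return total, mst

# the example graph, with A..F numbered 0..5 as in the code above
E = [(1, 0, 1), (4, 0, 3), (3, 0, 4), (4, 1, 3), (2, 1, 4),
     (4, 2, 4), (2, 2, 5), (4, 3, 4), (7, 4, 5)]
adj = [[] for _ in range(6)]
for w, u, v in E:
    adj[u].append((w, v))
    adj[v].append((w, u))
total, mst = lazy_prim(6, adj, 0)
print(total, len(mst))  # → 13 5
```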