Kruskal’s Algorithm is a mathematical model that finds the minimum spanning forest of an undirected weighted-edge graph, which is a lot of words that most people will find difficult to understand.

Since Kruskal’s algorithm is an incredibly useful greedy algorithm, we’ll do our best to break down everything you need to know about it to start using it right away!

## What is a Minimum Spanning Forest?

A minimum spanning forest is a type of graph that connects the points using the smallest possible weightage footprint while including all points on the graph. A minimum spanning forest will organize the point connections by weightage, removing all connections that would result in a circuit, then connect the connections with the lowest sum of weightage of the edges.

There are two types of weighted graphs. A graph with weighted vertices assigns its weight to the graph’s vertices, while a weighted edge graph assigns weight to the edges, the connections *between* the vertices.

It’s important to know which section of the graph is weighted when using weighted graphs, as both vertex and edge-weighted graphs are called “weighted graphs.”

## Kruskal’s Algorithm: How Does It Work?

Kruskal’s algorithm can be explained using a simple example of a weighted-edge graph and how Kruskal’s algorithm would organize the data in the graph.

Our biggest weakness in explaining Kruskal’s algorithm with pictures is not that we do not know how to apply Kruskal’s algorithm to an accurately made graph; it is that we do not know how to draw a weighted edge graph (because this team is not one of data scientists). But we did our best after looking at lots and lots of graphs.

The first step to applying Kruskal’s algorithm will be removing any loops or parallel edges. Now, in this application, which is graph theory, *parallel* edges are not actually parallel but rather edges that exist between the same two points. So, we’ll remove that loop on point A and between C and D.

That gives us the following:

Now we have to assign weights to these points. This graph is actually meaningless and has no data associated with it. So, we just assigned random weights to the graph’s edges and made up a reason why that weight made sense. Thus, we come to the following graph:

Once the graph is weighted properly, it’s time to black out all the interconnects between the points and start analyzing them using Kruskal’s algorithm, which means sorting out the lowest-weighted way to get all the points connected –– *without* making any circuits. So, the path *cannot* loop back over itself if followed with a pencil.

The first weight we can see –– in numerical order –– is 1. Since including this edge in the graph won’t create a circuit, we can just include it without any worries.

Now, we have to deal with the next weight, which is 2. Now including all three edges that have this weight will create a *circuit* –– a group of edges that all interconnect in a complete loop. So, we’ll drop the one that looks the longest (but you could drop any of them since they all have the same weight.)

Finally, we’ll need to move to the next weight, 3. Including this edge doesn’t create a circuit, so we will. However, including any of the remaining edges will create a circuit. So, we won’t include them, meaning our minimal spanning forest graph is complete!

## Kruskal’s Algorithm: What’s It Used For and When Should You Use It?

Kruskal’s sorting algorithm is typically used to find *the shortest path between two points*.

So, while our example simply sorted all the points by weight, the actual application of Kruskal’s algorithm tends to be to *find the shortest path between two points while including all data points present.* When using Kruskal’s algorithm, you might want to find the distance between points D and A. You’d get the same graph we did, but you’d be solving the graph for a different reason.

## Kruskal’s Algorithm: Proof of Correctness

We want to warn everyone who isn’t particularly interested or well-versed in math and computer science that this section might be a little bit hard to understand. While mathematical proofs aren’t exclusive knowledge for the tech-and-math-lovers, they’re a lot easier to understand under those conditions. But we’ll try to break it down into simple words.

The proof for Kruskal’s algorithm has two parts. First, we have to prove that Kruskal’s algorithm *actually* produces a spanning tree. Once we’ve proved that Kruskal’s algorithm has produced a spanning tree graph, we must prove that the graph’s overall weight is minimal.

### Proof that Kruskal’s Algorithm Produces a Minimum Spanning Tree

We will use the proof written by the Indian Institute of Science, found here. Mathematical proofs are a very specific field that the average person does not have the skills to produce and should not feel comfortable producing without the proper education.

The following proof from the Indian Institute of Science proves that Kruskal’s Algorithm produces a minimum-spanning tree.

Theorem: Kruskal’s algorithm finds a minimum spanning tree.

Proof: Let G = (V, E) be a weighted, connected graph. Let T be the edge set that is grown in Kruskal’s algorithm. The proof is by mathematical induction on the number of edges in T.We show that, if T is promising at any stage of the algorithm, then it is still promising when a new edge is added to it in Kruskal’s algorithm.

When the algorithm terminates, it will happen that T gives a solution to the problem and, hence, an MST.

Basis: T = $ \phi$ is promising since a weighted connected graph always has at least one MST.

Induction Step: Let T be promising just before adding a new edge e = (u, v). The edges T divide the nodes of G into one or more connected components. U and v will be in two different components. Let U be the set of nodes in the component that includes u.Note that U is a strict subset of V, T is a promising set of edges such that no edge in T leaves U (since an edge T either has both ends in U or has neither end in U), and e is a least cost edge that leaves U (since Kruskal’s algorithm, being greedy, would have chosen e only after examining edges shorter than e).

The above three conditions are precisely like in the MST Lemma and hence we can conclude that the T$ \cup$ {e} is also promising. When the algorithm stops, T gives not merely a spanning tree but a minimal spanning tree, since it is promising.

J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem.Proceedings of the American Mathematical Society, Volume 7, pp. 48-50, 1956.

## Kruskal’s Algorithm: Pseudocode

Pseudocode is a plain language used in the style of programming languages. Rather than using any specific coding language, pseudocode is designed to be applicable to all programming languages, so long as you understand the language you’re trying to translate the pseudocode into.

The pseudocode for Kruskal’s algorithm is as follows:

algorithm Kruskal(G) is

```
algorithm Kruskal(G) is
F:= ∅
for each v ∈ G.V do
MAKE-SET(v)
for each (u, v) in G.E ordered by weight(u, v), increasing do
if FIND-SET(u) ≠ FIND-SET(v) then
F:= F ∪ {(u, v)} ∪ {(v, u)}
UNION(FIND-SET(u), FIND-SET(v))
return F
```

This code introduces the “forest” (F) as a set of edges to the computer, then has it use a disjoint-data structure to determine whether two points are part of the same tree.

## Final Thoughts

Kruskal’s algorithm is an excellent example of a greedy algorithm that does exactly what it’s supposed to, and does so efficiently! Kruskal’s algorithm is a good way to generate minimal spanning tree graphs, and it helps that its code implementation is so simple.

If you’re someone who loves graph theory, you can put this algorithm to the test today by simply translating the pseudocode. If you love mathematics and graph theory, leave us a comment here!