Sometimes, when we’re learning new things, we all need a helping hand. Funnily enough, the situation isn’t much different when it comes to machine learning. While AI offers some extremely powerful tools for transforming the way we use technology, a lot of the time, these machines must be supervised, too.

There are many methods for carrying this out and a lot of these fall under the concept of supervised learning. If you’re wondering what this is all about and want a simple breakdown of the various types of supervised learning, look no further.

## What Exactly Is Supervised Learning?

When we think of supervised learning, teachers and students usually come to mind. A teacher might observe where a student’s making mistakes, support them before handing them a test, or even give an open-book exam.

Similarly, supervised machine learning is where the machines have been given some support. This generally equates to the datasets they’re being trained on being partially or completely labeled.

Essentially, this means that the data input is already mapped to the output, with the goal of the machine “learning” the relationships between the inputs and outputs. Therefore, the ultimate objective is to have the machine trained well enough to generalize this to new inputs and compute correct outputs by itself.

### Example

To demonstrate, let’s consider house prices. We may have a database full of information, with which we hope to predict house prices based on certain characteristics. These could be things like the location, size, and number of bedrooms.

In this case, the dataset that we use to train the machine may consist of many such examples, usually in the thousands. These inputs, i.e. the house’s parameters, would all be mapped to their correct outputs, i.e. the total house price.

The machine would ideally figure out a suitable function that correctly maps the inputs to the outputs after examining all of the data. In theory, we could then use the machine to predict future house prices without labeling the inputs for it.
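
The idea can be sketched in a few lines of code. Here, a minimal one-feature model (a straight line fit by least squares) "learns" the mapping from house size to price from labeled examples, then predicts the price of a house it has never seen. All the numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for a single feature: y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labeled training data: house size (sq ft) mapped to its price (in $1000s)
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]

slope, intercept = fit_line(sizes, prices)

# Generalize to a new, unlabeled input
predicted_price = slope * 1800 + intercept
```

Real models would use many features and far more data, but the principle is the same: learn the input-to-output relationship from labeled examples, then apply it to new inputs.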

## What Are the Different Types of Supervised Learning?

While many kinds of algorithms belong to the supervised learning family, they all fall into one (or both) of two groups: classification and regression. When we talk about classification, we’re talking about grouping data into categories, or classes. In this way, we want to produce a discrete output variable, one that can only take a finite set of values.

An example would be if we want to classify customer reviews as negative, positive, or neutral. The output variable, i.e. the review type, would be discrete since it can only have 1 of 3 potential values.

On the other hand, regression is concerned with outputs that are continuous values, usually numbers. For example, if we want to predict future stock prices based on past data, the output would be in numerical form and, therefore, be considered continuous. The table below gives a few more of the most common scenarios in which classification and regression are used.

Type of Supervised Learning | Uses
---|---
Classification | Image recognition, sentiment analysis (i.e. customer reviews), spam filtering, fraud detection
Regression | Stock price prediction, real estate valuation, medical diagnosis, sales forecasting

Now that we’ve covered the two main types of supervised learning, let’s get a little more detailed while keeping it simple. First, here’s a brief overview of some of the most-used supervised learning algorithms and whether they’re considered classification or regression.

Algorithm | Category
---|---
Naive Bayes | Classification
Neural Networks | Classification and Regression
Decision Trees | Classification and Regression
Random Forests | Classification and Regression
Support Vector Machines (SVMs) | Classification and Regression
Logistic Regression | Classification
K-Nearest Neighbors (KNN) | Classification and Regression
Gradient Boosting | Classification and Regression
Linear Discriminant Analysis (LDA) | Classification

Interestingly, we can see that most of these algorithms can be used for both classification and regression projects. Let’s examine each one more closely.

## #1: Naive Bayes

Mostly used for classification tasks, Naive Bayes refers to a method of predicting a category label from an input. Naive Bayes is best suited to large datasets with discrete features, such as word counts in text.

The “Naive” part comes from the fact that the algorithm assumes each input feature is independent of the others, and the “Bayes” part is because the algorithm is based on Bayes’ theorem for calculating the probability of an event.

Although predictions can be inaccurate, Naive Bayes remains a good algorithm for complicated datasets. One of the most common situations where Naive Bayes is used is in email spam filtering.
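
As a toy sketch of the idea (the word probabilities and priors below are made up, not learned from real emails): Bayes’ theorem combines per-word evidence into a class score, and the "naive" step is multiplying those per-word probabilities together as if the words were independent.

```python
# P(word | class), hypothetically estimated from labeled training emails
p_word_given_spam = {"free": 0.8, "meeting": 0.1}
p_word_given_ham = {"free": 0.1, "meeting": 0.7}
p_spam, p_ham = 0.5, 0.5   # prior probability of each class

def classify_email(words):
    """Score each class; the 'naive' independence assumption lets us multiply."""
    score_spam = p_spam
    score_ham = p_ham
    for w in words:
        score_spam *= p_word_given_spam.get(w, 0.5)
        score_ham *= p_word_given_ham.get(w, 0.5)
    return "spam" if score_spam > score_ham else "ham"

label = classify_email(["free", "free"])   # repeated spammy word
```

A real filter would estimate these probabilities from thousands of labeled emails and use smoothing for unseen words, but the scoring logic is essentially this.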

## #2: Decision Trees

You can think of a decision tree as a kind of flowchart. Like most flowcharts, each step has a conditional statement, which, depending on the result, moves you down one branch of the chart or the other. Each node represents a test, each branch represents an outcome, and the endpoint “leaf” nodes represent a class label.

Decision trees can be used for classification or regression, depending on whether you’re dealing with discrete or continuous values. Examples would be to use a tree to classify a species of animal, or for predicting a price range for a vehicle.
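
A hand-written decision tree is just nested conditionals, which makes the flowchart analogy concrete. The rules below are hypothetical; a real algorithm would learn which tests to apply, and in what order, from labeled data.

```python
def classify_animal(has_feathers, lives_in_water):
    """Each `if` is a node (a test); each `return` is a leaf (a class label)."""
    if has_feathers:           # node: test one feature
        return "bird"          # leaf
    elif lives_in_water:       # node: test another feature
        return "fish"          # leaf
    else:
        return "mammal"        # leaf

animal = classify_animal(has_feathers=False, lives_in_water=True)
```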

## #3: Random Forests

Used for both types of supervised learning, random forests are made up of a collection of decision trees, much as real forests are made up of ordinary trees. Random forests are known as ensemble algorithms since they consist of several individual models (in this case, decision trees).

The principle behind random forests is to reduce overfitting, which is where a model is trained too much to fit data, so it struggles to accommodate new data. Random forests also help to discover interactions between different features that a single decision tree wouldn’t be able to do.

Some of the areas where random forests are used include medicine, speech and image recognition, and finance, where the outputs are complicated.

## #4: Neural Networks

This is the sort of machine learning model behind many of today’s most talked-about AI systems, like BERT, T5, and ChatGPT. Basically, neural networks are modeled on the human brain. In this way, they’re designed to consist of nodes (sort of like neurons) that are connected to each other in multiple layers.

These layers can communicate with each other, and the key behind neural networks’ ability to learn from data is the capacity to communicate errors backward through the network, adjusting each neural connection to better reflect the true output.

Neural networks can be used for many tasks, including image and speech recognition, different kinds of classification and regression, and natural language processing (NLP). NLP receives a lot of attention, as it’s concerned with how humans communicate with machines. As such, NLP is the driving force behind the most popular neural network AI applications.
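
A minimal sketch of the learning principle, not a real deep network: a single sigmoid "neuron" learns the logical OR function. The error at the output is used to adjust each connection weight against its contribution to that error, which is the same idea backpropagation applies through many layers.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# One neuron: two weights and a bias, all starting at zero
w1, w2, b = 0.0, 0.0, 0.0
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table

for _ in range(2000):                          # training epochs
    for (x1, x2), target in data:
        out = sigmoid(w1 * x1 + w2 * x2 + b)
        err = out - target                     # error at the output
        w1 -= err * x1                         # adjust each connection in
        w2 -= err * x2                         # proportion to its role
        b -= err                               # in the error

prediction = sigmoid(w1 * 1 + w2 * 0 + b)      # input (1, 0); true answer is 1
```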

## #5: Support Vector Machines (SVMs)

SVMs sound pretty cool, but they’re also pretty complicated. Essentially, SVMs work by mapping data points into a space, often of a higher dimension than the original data, where the classes are easier to separate. The goal is then to separate the points with a boundary (a hyperplane) so that each type of data falls on its own side.

For example, if we have some green and yellow data points in a two-dimensional space, the SVM will aim to construct a line that separates the green and yellow points. The best-case scenario is to find a boundary that gives the maximum distance between the different kinds of data points, leading to better accuracy in classifying the data.

SVMs can be used in many of the previously discussed applications, such as speech recognition, spam filtering, and finance, but also in bioinformatics (where technology is used to assist with biological problems, like classifying genes).
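
Once trained, a linear SVM boils down to something very simple: a weight vector `w` and bias `b` defining the boundary `w · x + b = 0`, and points are classified by which side of that boundary they fall on. The weights below are made up rather than learned, just to show the decision rule.

```python
# Hypothetical weights a linear SVM might have learned for 2-D points
w = (1.0, -1.0)
b = 0.0

def classify_point(point):
    """Which side of the boundary w · x + b = 0 does the point fall on?"""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "green" if score > 0 else "yellow"

label = classify_point((3.0, 1.0))   # well above the boundary line
```

The training procedure (finding the `w` and `b` with the maximum margin) is the complicated part; applying the result is just this sign check.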

## #6: Logistic Regression

Surprisingly enough, logistic regression is a type of classification learning, not regression. Logistic regression is used in situations where we want to predict a binary label, that is, an output that can have 1 of 2 values.

For example, take the situation where we want to predict whether a customer will make a purchase based on some factors, such as their income and where they live. Logistic regression would be trained on a lot of this kind of data and then used to predict the outcome, i.e. yes or no.
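
Under the hood, logistic regression takes a weighted sum of the inputs, squashes it through the sigmoid function to get a probability between 0 and 1, and thresholds at 0.5 to produce the binary label. The weights below are hypothetical stand-ins for what training would produce.

```python
import math

def sigmoid(z):
    """Map a weighted sum to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical trained weights: income in $1000s, 1 if the customer is urban
w_income, w_urban, bias = 0.05, 1.2, -4.0

def will_buy(income_k, lives_in_city):
    probability = sigmoid(w_income * income_k + w_urban * lives_in_city + bias)
    return probability >= 0.5   # threshold the probability into yes/no

decision = will_buy(80, 1)   # high income, urban customer
```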

## #7: K-Nearest Neighbors (KNN)

Another algorithm with an acronym, k-nearest neighbors, or KNN, is probably simpler than the name suggests. The “k” in the name means the number of neighboring data values the algorithm will use to make its prediction.

Let’s consider the task of classifying animals, as either fish, birds, or mammals. Each type of animal would have features that are represented by numerical values in a feature space. The distance in this space between the pre-existing data points and the point we want to classify would be calculated.

If we take a k value of 3, then we would compare this value with the 3 closest existing values. Finally, we would classify the animal based on the majority of its nearest neighbors, hence the name. For regression tasks, KNN is used similarly, but we would take the average of the neighboring values rather than the majority.
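
That procedure fits in a few lines. The feature vectors below are made-up pairs of numbers (imagine two scaled measurements per animal); the algorithm itself is just "sort by distance, take the top k, vote."

```python
import math
from collections import Counter

# Hypothetical labeled points: (feature vector, class)
training = [
    ((0.1, 0.0), "fish"), ((0.2, 0.1), "fish"),
    ((0.8, 0.2), "bird"), ((0.9, 0.3), "bird"),
    ((0.8, 0.8), "mammal"), ((0.9, 0.9), "mammal"),
]

def knn_classify(point, k=3):
    """Find the k nearest training points and take a majority vote."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

label = knn_classify((0.85, 0.25))   # closest neighbors are mostly birds
```

For regression, the last step would average the neighbors’ numeric values instead of voting.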

## #8: Gradient Boosting

Like random forests, gradient boosting is an example of an ensemble algorithm. Gradient boosting also relies on the results of multiple decision trees, but works slightly differently. While random forests combine the predictions of the trees, gradient boosting works by correcting the errors of previous models as it creates new models, so the ensemble becomes more accurate with each round.

We can use gradient boosting if we want to predict whether a customer will buy something. The decision trees are trained on the data, and then each tree created after that is informed by the results of the previous tests. Gradient boosting can be more accurate than random forests, but does take more computing power.
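
The residual-correcting loop can be sketched for a regression task. To keep it short, each "model" here is just the mean of the remaining errors, a stand-in for the small decision tree a real implementation would fit at each round; the targets are made-up numbers.

```python
ys = [3.0, 5.0, 9.0, 11.0]          # made-up regression targets
prediction = [0.0] * len(ys)        # the ensemble starts with a crude guess

for _ in range(3):                  # three boosting rounds
    # Each round "fits" a model to the errors the ensemble still makes
    residuals = [y - p for y, p in zip(ys, prediction)]
    correction = sum(residuals) / len(residuals)
    # Add the new model's output, scaled by a learning rate of 0.5
    prediction = [p + 0.5 * correction for p in prediction]
```

Each round shrinks the remaining error, which is the sense in which every new model "corrects" the ones before it.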

## #9: Linear Discriminant Analysis (LDA)

Linear discriminant analysis, or LDA, can be used for binary classification as well as multi-class classification. The aim here is to classify data by finding a combination of attributes that best separates the data types. This combination is linear, as it combines two or more variables. The linear combination is known as a discriminant because it’s used to distinguish between the data categories.

If we consider the problem of identifying flowers of 4 species by their petal color and petal width, LDA would be used to find a linear combination that separates these species the most. We would calculate the mean and variance for each feature. These are the average value and the variability in the data respectively.

After this, we calculate the difference between the mean values for each type, then look for the combination that leads to the variance between types being much greater than the variance within each type. This means that the flower types are very well separated, making the model more accurate and efficient.
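
For a single feature and two classes, that criterion is easy to compute directly. The petal widths below are made up; the point is the ratio being maximized, between-class separation divided by within-class spread (often called the Fisher ratio).

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical petal widths (cm) for two flower species
species_a = [1.0, 1.1, 0.9]
species_b = [2.0, 2.1, 1.9]

# Separation between the class means vs. the spread within each class
between = (mean(species_a) - mean(species_b)) ** 2
within = variance(species_a) + variance(species_b)
fisher_ratio = between / within   # a large ratio means well-separated classes
```

With several features, LDA searches for the linear combination of them that maximizes exactly this kind of ratio.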

## Wrapping Up

Overall, supervised learning is a very widely used type of machine learning. Each supervised learning algorithm can be categorized by whether it’s used for classification or regression tasks.

However, some can be used for both. Some of the most-used supervised learning algorithms include neural networks, Naive Bayes, random forests, gradient boosting, and logistic regression. If you’re interested in working in the machine learning industry, understanding the algorithms at play is crucial.

The image featured at the top of this post is ©metamorworks/Shutterstock.com.