Machine Learning - Supervised
This chapter discusses supervised learning, one of the most important paradigms of machine learning.
Algorithms for Supervised Learning
Several algorithms are available for supervised learning. Some of the most widely used algorithms are listed below.
- k-Nearest Neighbours
- Decision Trees
- Naive Bayes
- Logistic Regression
- Support Vector Machines
In this chapter, let's discuss each algorithm in detail.
k-Nearest Neighbours
The k-Nearest Neighbours algorithm, also known as kNN, is a statistical technique that can be used to solve both classification and regression problems. As we will see below, kNN can be used to classify an unknown object. Consider the distribution of objects in the image below.
Source: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
If you run the kNN classifier on the above dataset, the boundaries for each type of object will be marked as shown below.
Source: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Consider the following unknown object that you would like to classify as red, green, or blue.
By measuring the distance between this unknown data point and every other point in the dataset, you will find that the majority of its nearest neighbours are blue. Since its average distance to the blue objects is clearly smaller than its average distance to the red and green objects, the unknown object can be classified as belonging to the blue class.
The kNN algorithm can also be used to solve regression problems, and most ML libraries provide a ready-to-use implementation of it.
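The classification step described above can be sketched in a few lines of pure Python. This is a minimal illustration, not a library implementation, and the point coordinates and labels below are hypothetical:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train : list of ((x, y), label) pairs
    query : (x, y) point to classify
    """
    # Sort training points by Euclidean distance to the query point
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))
    # Take the labels of the k closest points and vote
    votes = [label for _, label in nearest[:k]]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 2-D dataset: a red cluster near the origin, a blue cluster near (5, 5)
points = [((0, 0), "red"), ((1, 0), "red"), ((0, 1), "red"),
          ((5, 5), "blue"), ((6, 5), "blue"), ((5, 6), "blue")]
```

A query point such as `(4.5, 5.0)` would be classified as `"blue"`, because all three of its closest neighbours are blue.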
Decision Trees
Below is a flowchart showing a simple decision tree.
In this scenario, the tree is used to classify an incoming email and decide when to read it; you could implement this classifier by writing code that follows the flowchart.
As a Machine Learning enthusiast, you should master the techniques of creating and traversing decision trees, since real-world trees can be large and complex.
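A decision tree like this maps directly onto nested conditionals. Since the flowchart image is not reproduced here, the rules below are hypothetical; each `if` corresponds to one decision node of such a tree:

```python
def classify_email(sender_known, is_spam_flagged, mentions_deadline):
    """Decide what to do with an email by walking a (hypothetical) decision tree."""
    # Node 1: flagged as spam? Discard immediately.
    if is_spam_flagged:
        return "delete"
    # Node 2: unknown sender? Defer it.
    if not sender_known:
        return "read later"
    # Node 3: known sender mentioning a deadline? Read it now.
    if mentions_deadline:
        return "read now"
    # Leaf: known sender, nothing urgent.
    return "read when free"
```

Each call traverses exactly one root-to-leaf path, which is why prediction with a decision tree is fast even when the tree is large.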
Naive Bayes
Naive Bayes can be used to sort out (classify) fruits of different kinds from a fruit basket. Color, size and shape are features that help identify a fruit; for instance, any fruit that is red in color, round in shape and about 10 cm in diameter may be considered an apple. The model is trained on these features, and the Naive Bayes classifier combines the probabilities of the individual features to decide whether a given fruit is an apple or not.
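The "combine the probabilities of the individual features" step can be sketched for categorical features in pure Python. This is a simplified illustration with Laplace-style smoothing, and the fruit data below is hypothetical:

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (features_tuple, label). Returns counts for prediction."""
    label_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    for feats, label in samples:
        for i, value in enumerate(feats):
            feat_counts[(i, label)][value] += 1
    return label_counts, feat_counts

def predict_nb(model, feats):
    """Pick the label maximizing P(label) * product of P(feature | label)."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best_label, best_p = None, -1.0
    for label, c in label_counts.items():
        p = c / total  # prior probability of the class
        for i, value in enumerate(feats):
            counts = feat_counts[(i, label)]
            # Add-one smoothing so an unseen value never zeroes out the product
            p *= (counts[value] + 1) / (c + len(counts) + 1)
        if p > best_p:
            best_label, best_p = label, p
    return best_label

# Hypothetical (color, shape) observations
fruits = [(("red", "round"), "apple"), (("red", "round"), "apple"),
          (("green", "round"), "apple"),
          (("yellow", "long"), "banana"), (("yellow", "long"), "banana"),
          (("green", "long"), "banana")]
```

The "naive" part is the independence assumption: the per-feature probabilities are simply multiplied, ignoring any correlation between color and shape.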
Logistic Regression
The following diagram shows the XY distribution of data points.
The diagram shows the separation of red dots from green dots. To classify a new data point, you must determine on which side of the boundary line it lies.
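Finding that boundary line is what training a logistic regression model does. A minimal from-scratch sketch using gradient descent follows; the learning rate, epoch count, and data are hypothetical choices, not a library API:

```python
import math

def train_logistic(points, labels, lr=0.1, epochs=2000):
    """Fit weights (bias, w_x, w_y) for a linear boundary via gradient descent.

    points : list of (x, y); labels : 0 or 1
    """
    w0, w1, w2 = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y), target in zip(points, labels):
            z = w0 + w1 * x + w2 * y
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)
            err = p - target                # gradient of the log-loss w.r.t. z
            w0 -= lr * err
            w1 -= lr * err * x
            w2 -= lr * err * y
    return w0, w1, w2

def predict(weights, point):
    """Classify by which side of the learned boundary line the point lies on."""
    w0, w1, w2 = weights
    x, y = point
    return 1 if w0 + w1 * x + w2 * y > 0 else 0
```

The sign of `w0 + w1*x + w2*y` tells you the side of the boundary, which is exactly the "which side of the line" test described above.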
Support Vector Machines
In the following distribution, the three classes of data cannot be linearly separated. In such a case, finding the equation of the curve becomes complex.
Source: http://uc-r.github.io/svm
In such cases, Support Vector Machines (SVM) are useful for determining separation boundaries.
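A full SVM implementation is beyond a short sketch, but the key idea behind its kernels can be shown: map the data into a space where a straight line *does* separate the classes. The circular dataset and the feature map below are hypothetical illustrations of that idea, not an actual SVM:

```python
# Points inside a circle of radius 2 ("inner") and outside it ("outer")
# cannot be separated by a straight line in the original (x, y) plane.
inner = [(0.5, 0.5), (-1.0, 0.3), (0.2, -1.2)]
outer = [(3.0, 0.0), (-2.5, 2.0), (0.0, -3.5)]

def feature_map(point):
    """Map (x, y) to its squared distance from the origin."""
    x, y = point
    return x * x + y * y

# In the mapped 1-D space, a single threshold (a linear boundary) separates
# the classes -- this is the effect an SVM kernel achieves implicitly.
THRESHOLD = 4.0  # radius squared
```

Kernels such as the RBF kernel let an SVM work in such a transformed space without ever computing the mapping explicitly, which is why SVMs can find non-linear separation boundaries.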