#### Types of ML Classification Algorithms:

Classification algorithms can be divided into two main categories:

##### Linear regression model:

A linear regression model is a supervised learning algorithm that predicts a continuous outcome with a constant slope. It predicts values within a continuous range, such as an amount, rather than sorting them into two different categories. The linear model family includes two common algorithms, shown in the figure, which are described in detail below:

##### Logistic Regression:

Logistic regression is one of the most common supervised learning algorithms and is used for classification problems. Its parameters are fitted by maximum likelihood estimation. It predicts a categorical dependent variable from the independent variables and produces an output between 0 and 1 by passing the weighted sum of the inputs through an activation function. This activation function is called the sigmoid function, and the curve it produces is known as the sigmoid curve. The output can be read as the probability of belonging to one of two classes, for example the probability that it will rain today. For this estimate to be reliable, the dataset should be free of errors.

Logistic regression equation:

Linear equation:

$$a = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$$

Sigmoid function:

$$S(a) = \frac{1}{1 + e^{-a}}$$

Where, $S(a)$ = sigmoid function

$e$ = Euler's number

Replace $a$ in the sigmoid function with the linear equation:

$$\operatorname{logit}(S) = \ln\left(\frac{S}{1 - S}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$$
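The three formulas above can be checked numerically. The following is a minimal sketch in plain Python. The function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def linear_combination(b0, weights, features):
    """Return a = b0 + b1*X1 + ... + bn*Xn."""
    return b0 + sum(b * x for b, x in zip(weights, features))


def sigmoid(a):
    """Map any real number a to a value S(a) strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))


def logit(s):
    """Inverse of the sigmoid: ln(S / (1 - S))."""
    return math.log(s / (1.0 - s))


# Hypothetical intercept, coefficients and one input row
b0, weights, features = -1.0, [0.8, 0.5], [2.0, 1.0]

a = linear_combination(b0, weights, features)
s = sigmoid(a)

print(a)         # the linear score
print(s)         # the probability produced by the sigmoid
print(logit(s))  # applying the logit recovers the linear score
```

Applying the logit to the sigmoid output returns the linear score, which is what the last equation states.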

The following example shows how to implement logistic regression in Python using a Jupyter notebook:

##### Step 1: Import python modules

from matplotlib import pyplot as plt

from sklearn.datasets import make_classification

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import confusion_matrix

import pandas as pd

First, import the required libraries. The “Matplotlib” library is used to plot the data. The “make_classification” function, which is part of sklearn.datasets, generates a synthetic dataset. “LogisticRegression” is imported from sklearn.linear_model to build the model, and train_test_split is imported from sklearn.model_selection to split the dataset into training and test datasets. The confusion_matrix function is imported from sklearn.metrics to produce the confusion matrix of the classifier, and Pandas is used for managing datasets.

##### Step 2: Input dataset

x, y = make_classification(

n_samples=100,

n_features=1,

n_classes=2,

n_clusters_per_class=1,

flip_y=0.03,

n_informative=1,

n_redundant=0,

)

In the program above, two variables are used: the dependent variable (y) and the independent variable (x). The dataset is produced with the “make_classification” function, which takes the number of samples, the number of features, the number of classes and other parameters.

##### Step 3: Visualise the dataset

plt.scatter(x, y, c=y, cmap='rainbow')

plt.title('Scatter Plot of Logistic Regression')

plt.show()

The scatter plot shows the generated data points, coloured by class. The parameters of the model are estimated in the next step, once the model is created and trained with the fit function.

##### Step 4: Split the dataset and train the model

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

log_reg = LogisticRegression()

log_reg.fit(x_train, y_train)

The training dataset is used to train the model, while the test dataset is used to test the model's performance on new data.

##### Output of Logistic Regression

Output:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,

intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,

penalty='l2', random_state=None, solver='liblinear', tol=0.0001,

verbose=0, warm_start=False)
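The confusion_matrix function imported in Step 1 is not used in the steps above. The sketch below shows one way the trained model could be evaluated on the test dataset. It repeats the earlier steps so that it runs on its own. The random_state passed to make_classification is an assumption, added only to make the run repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same generator settings as Step 2; random_state=0 is an added assumption
x, y = make_classification(
    n_samples=100, n_features=1, n_classes=2, n_clusters_per_class=1,
    flip_y=0.03, n_informative=1, n_redundant=0, random_state=0,
)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

log_reg = LogisticRegression()
log_reg.fit(x_train, y_train)

# Predict the class of every test sample
y_pred = log_reg.predict(x_test)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
accuracy = log_reg.score(x_test, y_test)

print(cm)
print(accuracy)
```

The diagonal of the matrix counts the correct predictions, so the accuracy equals the sum of the diagonal divided by the number of test samples.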

##### Support Vector Machines (SVM):

SVM is a common supervised learning algorithm used for both classification and regression problems, although it is primarily used for classification. The aim of the algorithm is to find the best line or decision boundary that divides n-dimensional space into classes, so that a new data point can be placed in the correct category in future. This best decision boundary is known as a hyperplane, and it is defined by the extreme points (vectors) that lie closest to it. These extreme vectors are known as support vectors, which is why the algorithm is termed a support vector machine. As shown in the figure, SVM looks for the hyperplane that separates two classes. Suppose two candidate hyperplanes have margins X1 and X2, where X1 > X2. The hyperplane with the larger margin, X1, is the better one for separating the green and blue points.

The following example shows how to implement a support vector machine (SVM) in Python using a Jupyter notebook:

##### Step 1: Import python modules

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

First, import the Python modules. The “make_blobs” function is part of sklearn.datasets (older scikit-learn releases exposed it as sklearn.datasets.samples_generator). The functions in this package generate sample datasets, which are then used to evaluate how well a model performs.

##### Step 2: Input dataset

X, Y = make_blobs(n_samples=350, centers=2, random_state=0, cluster_std=0.20)

The “make_blobs” function generates the dataset, here 350 samples spread over two clusters.

##### Step 3: Visualise the dataset

plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='winter')

plt.show()

xfit = np.linspace(-1, 3.5)

for m, b, d in [(1, 0.55, 0.30), (0.2, 1.8, 0.25), (-0.1, 2.6, 0.1)]:

    yfit = m * xfit + b

    plt.plot(xfit, yfit, '-k')

    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',

                     color='#AAAAAA', alpha=0.4)

plt.xlim(-1, 3.5)

plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='winter')

plt.show()

The first scatter plot shows the generated data points, coloured by class. The second plot draws three candidate separating lines over the same data, each with a shaded margin of a different width. To check how well a linear model fits this data, the following code fits a linear regression and prints its R² score, which is approximately 0.99:

from sklearn.linear_model import LinearRegression

from sklearn.metrics import r2_score

# Fit a linear regression on the blob features and the class labels

regressor = LinearRegression().fit(X, Y)

# r2_score expects the true values first, then the predictions

print(r2_score(Y, regressor.predict(X)))
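The steps above plot the data and candidate boundaries but do not fit a support vector machine. The sketch below shows one way to do this with scikit-learn's SVC class. The linear kernel and the train/test split are assumptions made for illustration and are not part of the original walkthrough.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same blob settings as Step 2
X, Y = make_blobs(n_samples=350, centers=2, random_state=0, cluster_std=0.20)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# A linear kernel searches for a straight-line (hyperplane) boundary
svm = SVC(kernel="linear")
svm.fit(X_train, Y_train)

support_vectors = svm.support_vectors_  # the points that define the margin
score = svm.score(X_test, Y_test)       # accuracy on the held-out test set

print(support_vectors)
print(score)
```

Only the support vectors determine the fitted hyperplane. The remaining training points could be moved, as long as they stay outside the margin, without changing the boundary.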

##### Non-linear regression models:

A function that is not linear is called non-linear. Polynomials of degree two or higher are non-linear, as are trigonometric functions such as sin and cos, and square roots. The following example shows how to implement non-linear regression in Python using a Jupyter notebook. The procedure is similar to the one used for linear models.

##### Step 1: Import python modules

To import the Python modules, use:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

##### Step 2: Input dataset

X = np.random.randn(120,1)

Y = np.random.uniform(-10,10,(120,))

##### Step 3: Calculate the dataset

X = np.hstack((X, X*X))

Z = (3*X[:,1] + Y)

##### Step 4: Visualise the dataset

plt.scatter(X[:, 0], Z)

plt.show()

plt.scatter(X[:, 1], Z)

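plt.show()

##### Output: Nonlinear Regression

To check the accuracy of the model, use the code below. The original run reported an output of 55.3383. Because the data is generated randomly without a fixed seed, the printed value differs from run to run, and a score computed by r2_score never exceeds 1.

from sklearn.linear_model import LinearRegression

from sklearn.metrics import r2_score

# Fit on the features [x, x squared] against the target Z built in Step 3

regressor = LinearRegression().fit(X, Z)

# r2_score expects the true values first, then the predictions

print(r2_score(Z, regressor.predict(X)))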
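To see why the squared feature matters, the sketch below compares a straight-line fit with a fit that also uses the squared term. It is a self-contained variant of the steps above, not the original code. The fixed seed and the smaller noise range are assumptions made so that the effect is easy to see.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)        # fixed seed: an assumption for repeatability
x = rng.standard_normal((120, 1))
noise = rng.uniform(-1, 1, size=120)  # smaller noise than in the steps above
z = 3 * x[:, 0] ** 2 + noise          # quadratic target

# Straight-line fit on x alone
linear_fit = LinearRegression().fit(x, z)
r2_linear = r2_score(z, linear_fit.predict(x))

# Same estimator, with x squared added as a second feature
x_quad = np.hstack((x, x ** 2))
quad_fit = LinearRegression().fit(x_quad, z)
r2_quad = r2_score(z, quad_fit.predict(x_quad))

print(r2_linear, r2_quad)
print(quad_fit.coef_)  # the second coefficient estimates the factor 3
```

The estimator is still LinearRegression. The model becomes non-linear in x only because x squared is supplied as an extra input column.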

The non-linear model family includes four common algorithms, shown in the figure. They are described in detail below:

##### K-Nearest Neighbours:

K-nearest neighbours (KNN) is useful for classification problems. It evaluates the distance between a new input and the stored training data and then assigns an outcome. In other words, it predicts on the basis of the similarity between the existing data and the new data, and places the new case in the category it most resembles, so new data can be classified easily into a suitable category. It is also a non-parametric algorithm, which means that it makes no assumptions about the underlying data. Because KNN measures the distance between two data points, a distance formula is needed. The Euclidean distance is given by:

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}$$

Where,

$a$, $b$ are two different points in Euclidean space.

$a_i$ and $b_i$ are the $i$-th components of the Euclidean vectors $a$ and $b$, which start from the initial point (the origin).

$n$ is the dimension of the Euclidean n-space.
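The distance formula translates directly into code. The following is a minimal sketch in plain Python. The function name is hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared differences of the coordinates, then take the square root
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```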

Consider the following example. The figure shows the training dataset, which consists of twelve labelled points: six blue and six orange. The aim is to classify the data point marked with a black cross (x).

The steps are as follows:

1. Select the value of K = 3.
2. Select the K closest observations around the cross, as shown in the figure below. There are two blue dots and one orange dot around it.
3. Calculate the probability for each class:

   P(blue class | observation) = 2/3

   P(orange class | observation) = 1/3

4. Since the blue class has the highest probability, classify the black cross as belonging to the blue class.
5. Repeat the process until all the data points are classified.

The following example shows how to implement KNN in Python using a Jupyter notebook.
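Before moving on to the scikit-learn version, the voting steps listed above can be sketched in plain Python. The coordinates below are hypothetical and are chosen so that the three nearest neighbours of the query point are two blue points and one orange point, as in the figure.

```python
import math
from collections import Counter


def knn_predict(training_points, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the labelled points by their distance to the query point
    by_distance = sorted(training_points, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    # Share of the k neighbours that belong to each class
    probabilities = {label: count / k for label, count in votes.items()}
    return votes.most_common(1)[0][0], probabilities


# Hypothetical coordinates: six blue and six orange points
training_points = [
    ((1.0, 1.0), "blue"), ((1.5, 2.0), "blue"), ((2.0, 1.5), "blue"),
    ((2.5, 2.5), "blue"), ((3.0, 3.2), "blue"), ((1.0, 3.0), "blue"),
    ((3.6, 3.0), "orange"), ((5.0, 4.0), "orange"), ((5.5, 5.0), "orange"),
    ((6.0, 4.5), "orange"), ((4.5, 5.5), "orange"), ((6.0, 6.0), "orange"),
]
query = (3.0, 2.8)  # plays the role of the black cross

label, probabilities = knn_predict(training_points, query, k=3)
print(label, probabilities)
```

With these points the blue class receives two of the three votes, so the query point is assigned to the blue class.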

##### Step 1: Import the python modules

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split

a,b = make_blobs(n_features=2, centers=1)

The “make_blobs” function is part of sklearn.datasets (older scikit-learn releases exposed it as sklearn.datasets.samples_generator) and generates sample data. By default, the KNeighborsClassifier algorithm searches for the five closest neighbours. The training dataset is used to train the model, while the test dataset is used to test the model's performance on new data.
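The imports above bring in KNeighborsClassifier and train_test_split. The sketch below shows one way they might be used together. Using two centres, 200 samples and fixed random_state values are assumptions made for illustration, since a classifier needs at least two classes to tell apart.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two centres give two classes (an assumption; the call above uses centers=1)
a, b = make_blobs(n_samples=200, n_features=2, centers=2, random_state=0)
a_train, a_test, b_train, b_test = train_test_split(a, b, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3, as in the worked example
knn.fit(a_train, b_train)

predictions = knn.predict(a_test)  # predicted class for each unseen point
score = knn.score(a_test, b_test)  # accuracy on the test dataset

print(predictions[:5])
print(score)
```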

##### Step 2: Input data and Visualise

plt.figure()

plt.scatter(a[:, 0], a[:, 1], c=b)

plt.savefig('centers_1.png')

plt.title('centers = 1')

plt.show()

The scatter plot shows the values of the generated data.

##### Decision tree:

A decision tree builds classification or regression models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while the associated tree, starting from the root, is grown step by step. The final result is a tree with decision nodes and leaf nodes: a decision node has two or more branches, and a leaf node represents a classification or decision. The topmost decision node, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data. The common terminology of a decision tree is as follows:

##### Root node:

Represents the entire population or sample, which is then divided into two or more homogeneous sets.

##### Splitting:

Dividing a node into two or more sub-nodes.

##### Decision node:

A sub-node that is split into further sub-nodes.

##### Leaf node:

A node that is not split any further.

##### Pruning:

Removing the sub-nodes of a decision node.

##### Sub-Tree:

A branch of a decision tree.

##### Parent and child node:

A node that is divided into sub-nodes is called a parent node, and its sub-nodes are the children of that parent node.
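The terms above can be seen in a fitted tree. The sketch below trains a small scikit-learn DecisionTreeClassifier on the iris dataset and prints its structure. The choice of dataset and of max_depth=2 are assumptions made to keep the printed tree short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Limiting the depth stops the tree from growing further sub-nodes
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# The first split printed is the root node; lines containing "class:" are leaf nodes
tree_text = export_text(tree, feature_names=iris.feature_names)
print(tree_text)
print(tree.get_depth(), tree.get_n_leaves())
```

Each indented block in the printed text is a sub-tree, and every split line is a decision node whose children appear one level deeper.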

##### Naïve Bayes:

Naïve Bayes is a simple and effective classifier that trains and predicts quickly. The algorithm is based on Bayes' theorem and is used to solve classification problems. It is mainly used for text classification, which involves high-dimensional training datasets. It is a probabilistic classifier, which means that it predicts on the basis of the probability of an object belonging to a class. Typical uses include spam filtering, sentiment analysis and classifying articles.

Naïve Bayes equation:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Where,

P(A|B) : The posterior probability, that is, the probability of hypothesis A given the observed event B.

P(B|A) : The likelihood, that is, the probability of the evidence B given that hypothesis A is true.

P(A) : The prior probability of the hypothesis, before the evidence is observed.

P(B) : The marginal probability of the evidence.
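A short worked example makes the four terms concrete. The sketch below applies Bayes' theorem to hypothetical numbers, where A is "the email is spam" and B is "the email contains the word offer". The function name and all probabilities are assumptions chosen for illustration.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # P(B) by the law of total probability
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical values: P(A) = 0.2, P(B|A) = 0.6, P(B|not A) = 0.05
p_spam_given_offer = posterior(
    prior_a=0.2, likelihood_b_given_a=0.6, likelihood_b_given_not_a=0.05
)
print(p_spam_given_offer)
```

Here P(B) = 0.6 × 0.2 + 0.05 × 0.8 = 0.16, so P(A|B) = 0.12 / 0.16 = 0.75. Observing the word raises the probability of spam from the prior of 0.2 to 0.75.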

There are three common types of Naive Bayes, and each implementation is usually named after the distribution it assumes.

##### Binomial Naive Bayes:

It uses a binomial distribution.

##### Multinomial Naive Bayes:

It uses a multinomial distribution.

##### Gaussian Naive Bayes:

It uses a Gaussian distribution.

When a dataset mixes several data types among its input variables, a different distribution may need to be selected for each variable. It is not compulsory to use all of the distributions. The algorithm has proven accurate and useful for text classification tasks. For instance, the words of a document can be represented as binary, count or frequency (tf-idf) input vectors, for which binomial, multinomial or Gaussian probability distributions are used respectively.
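The pairing of feature type and distribution can be sketched with scikit-learn's naive_bayes module. The tiny arrays below are hypothetical. BernoulliNB is used here for the binary case, on the assumption that it is the scikit-learn counterpart of what the text calls the binomial variant.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

labels = np.array([0, 0, 1, 1])  # hypothetical two-class labels

# Binary features (word present or absent)
binary_features = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
# Count features (word frequencies)
count_features = np.array([[3, 0, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]])
# Continuous features (real-valued measurements)
continuous_features = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])

models = {
    "bernoulli": BernoulliNB().fit(binary_features, labels),
    "multinomial": MultinomialNB().fit(count_features, labels),
    "gaussian": GaussianNB().fit(continuous_features, labels),
}

# Each query resembles the class 0 training rows of its feature type
bernoulli_pred = models["bernoulli"].predict([[1, 0, 0]])
multinomial_pred = models["multinomial"].predict([[4, 0, 0]])
gaussian_pred = models["gaussian"].predict([[1.1, 2.0]])

print(bernoulli_pred, multinomial_pred, gaussian_pred)
```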

##### Random forest:

Random forest is an ensemble learning method for classification and regression. It constructs several decision trees at training time and combines their outputs, which corrects the tendency of individual decision trees to overfit their training set.

The following example shows how to implement random forest in Python using a Jupyter notebook.

##### Step 1: Import module and dataset

from sklearn import datasets

# Load the iris dataset used in the rest of this example

iris = datasets.load_iris()

print(iris.target_names)

print(iris.feature_names)

print(iris.data[0:5])

print(iris.target)

To build the random forest model, load the data with the “load_iris()” function, which is built into sklearn. The dataset contains the sepal length and width and the petal length and width of iris flowers. The flowers are divided into three classes: setosa, versicolor and virginica. Print the target and feature names to make sure that you have the correct dataset. The code also prints the first five rows of the dataset and the target variable for the whole dataset.

##### Step 2: Create a dataframe

import pandas as pd

data=pd.DataFrame({

    'sepal length':iris.data[:,0],

    'sepal width':iris.data[:,1],

    'petal length':iris.data[:,2],

    'petal width':iris.data[:,3],

    'species':iris.target

})

A DataFrame is a two-dimensional labelled data structure whose columns can hold different types.

##### Step 3: Split the dataset

from sklearn.model_selection import train_test_split

X=data[['sepal length', 'sepal width', 'petal length', 'petal width']]

y=data['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

The columns are split into the independent variables (X) and the dependent variable (y), and the rows are then divided into a training set and a test set. With test_size=0.3, 30% of the rows are held out for testing.

##### Step 4: Train the model

from sklearn.ensemble import RandomForestClassifier

clf=RandomForestClassifier(n_estimators=100)

clf.fit(X_train,y_train)

y_pred=clf.predict(X_test)

After splitting, train the model on the training set and make predictions on the test dataset.

##### Step 5: Check the accuracy

from sklearn import metrics

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Output: 0.9777

##### Step 6: Predict type of flower

clf.predict([[3, 5, 4, 2]])

##### Step 7: Create a random forests model

from sklearn.ensemble import RandomForestClassifier

clf=RandomForestClassifier(n_estimators=100)

# The new model must be fitted before its feature importances can be read

clf.fit(X_train,y_train)

##### Step 8: View the feature importance scores

import pandas as pd

feature_imp = pd.Series(clf.feature_importances_,index=iris.feature_names).sort_values(ascending=False)

feature_imp

##### Step 9: Visualise the dataset

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

# Creating a bar plot

sns.barplot(x=feature_imp, y=feature_imp.index)

plt.xlabel('Feature Importance Score')

plt.ylabel('Features')

plt.title("Visualizing Important Features")

plt.legend()

plt.show()

For visualisation, matplotlib and seaborn are used together. Seaborn is built on top of the matplotlib library and provides several customised themes and extra plot types.

##### Applications of supervised learning

Some common applications of supervised learning are:

##### Bioinformatics:

This is one of the most common applications of supervised learning today. It involves storing biological data such as fingerprints and iris texture. Modern smartphones can learn this biological data and use it to secure the device. For example, phones such as the Google Pixel, iPhone, Samsung and OnePlus provide features like facial and fingerprint recognition.

##### Speech Recognition:

This type of application is able to identify a voice, as in Google Assistant and Siri. Supervised learning algorithms help to maintain security and communication between virtual assistants and customers.

##### Spam Detection:

Spam consists of unsolicited computer-based messages that can be used to harm someone's personal system. It can appear in many forms, such as emails and mobile phone messages. Many spam emails look like commercial emails but contain fake links that lead to malware websites or are used to attack the system. To detect spam emails, a supervised learning algorithm can be trained on a dataset of emails that are labelled as spam or not spam. When new, unlabelled emails are provided, the model predicts a label for each of them, which helps to reduce spam across different media.
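The spam workflow described above can be sketched in a few lines with scikit-learn. The six emails and their labels are hypothetical, and the choice of CountVectorizer with MultinomialNB is an assumption made for illustration. A real filter would need far more training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical labelled dataset: 1 = spam, 0 = not spam
emails = [
    "win a free prize now", "free money click this link",
    "claim your free prize today", "meeting agenda for monday",
    "please review the project report", "lunch on monday with the team",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each email into word counts, then fit a Naive Bayes classifier
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

# Predict labels for new, unlabelled emails
new_emails = ["free prize link", "project meeting on monday"]
predicted = spam_filter.predict(new_emails)
print(predicted)
```

The first new email shares words only with the spam examples and the second only with the non-spam examples, so the model labels them as spam and not spam respectively.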

##### Medical:

In this field, a supervised learning algorithm can predict whether or not a person has a disease. For example, after a dataset of medical reports is loaded into the model, the model trains itself and then predicts whether a person is healthy or suffering from a disease.