Introduction to Unsupervised Learning
Unsupervised learning is a machine learning technique in which the model does not need to be trained by users on labelled data. Its aim is to deal with unlabelled data: the model works on the data by itself in order to discover patterns that were not previously identified. This allows the algorithm to perform more complex tasks, which also makes it more unpredictable than other learning approaches. Examples include clustering and neural networks. The figure shows the working of unsupervised learning:
The algorithm first examines the raw input data in the dataset and recognises various patterns. The identified patterns are then used to extract useful information from the unlabelled dataset. Finally, the model is able to make sense of the data by itself.
Types of unsupervised learning
Unsupervised learning focuses on identifying structure in the data rather than predicting an output. It consists of two main types of methods: clustering and association. These types are described in detail below.
Let’s understand clustering type and its algorithm in detail.

Clustering algorithms:
In this method, the algorithm divides the dataset into groups, so that data points within one group are more similar to each other than to points in other groups. Different algorithms are used to build clusters, such as the K-means, mean-shift, DBSCAN and hierarchical clustering algorithms. Some of the tasks that can be performed with cluster analysis are:
 Classify objects based on their features
 In a library, group books according to their genre
 Identify user groups with the same behaviour

K-means clustering algorithm:
K-means is an unsupervised learning algorithm in which no labelled data are present for clustering. Its aim is to minimise the sum of distances between the data points and their respective cluster centres. It divides objects into clusters such that objects with similar characteristics are kept together, while dissimilar objects are placed in other clusters. It is a centroid-based algorithm, in which every cluster is linked with a centroid.
The algorithm takes an unlabelled dataset as input and distributes it into k clusters, repeating the procedure until the best clusters are identified. The value of k must be predefined, and the algorithm consists of two main tasks:
 Determine the best positions for the k centre points.
 Assign every data point to its nearest k-centre. The data points close to a particular k-centre form a cluster.
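The two tasks above can be sketched as a single assign-and-update iteration. The snippet below is a minimal illustration on toy data (the points and starting centres are invented for the example); the full algorithm simply repeats this step until the centres stop moving:

```python
import numpy as np

def kmeans_step(X, centres):
    """One assign-and-update iteration of K-means:
    assign each point to its nearest centre, then move each
    centre to the mean of the points assigned to it."""
    # Distance from every point to every centre (n_points x k).
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centres = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centres))])
    return labels, new_centres

# Two obvious groups; the starting centres are deliberately off-target.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centres = np.array([[1.0, 1.0], [4.0, 4.0]])
labels, centres = kmeans_step(X, centres)
```

After one iteration, each centre has already moved to the mean of its group.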

Mean shift clustering algorithm:
Mean shift is a common type of unsupervised learning. The algorithm is based on kernel density estimation (KDE) and is also known as a mode-seeking algorithm: it repeatedly shifts each point towards the nearest point of maximum density (the mode), which is how the model is built at the primary stage. The kernel is a statistical weighting function applied to the data points. This algorithm is commonly used in computer vision and image segmentation. A valid kernel function must satisfy the following conditions:
 The kernel density estimate must be normalised.
 The kernel must be symmetric.
Two popular kernel functions are the flat kernel and the Gaussian kernel.

Flat Kernel:
This kernel does not guarantee that the densest points lie around the centre. A single kernel centred at one point may cover two or more cluster centres at its edge.

Gaussian Kernel:
This kernel concentrates weight around the centre, so the densest points lie near it. The standard deviation acts as the bandwidth parameter for this kernel.
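The two kernels can be sketched as simple weight functions of distance. The distances and bandwidth below are illustrative values, not part of any library API:

```python
import numpy as np

def flat_kernel(distance, bandwidth):
    # Every point inside the bandwidth gets equal weight 1, outside 0.
    return (distance <= bandwidth).astype(float)

def gaussian_kernel(distance, bandwidth):
    # Weight decays smoothly with distance; the bandwidth acts as
    # the standard deviation.
    return np.exp(-0.5 * (distance / bandwidth) ** 2)

d = np.array([0.0, 0.5, 1.0, 2.0])
flat = flat_kernel(d, bandwidth=1.0)
gauss = gaussian_kernel(d, bandwidth=1.0)
```

Note how the flat kernel weights all in-bandwidth points equally, while the Gaussian kernel still gives some weight to points just outside it.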
Now, let's understand with the help of an example how to implement mean shift clustering in Python using a Jupyter notebook.
Step 1: Import the python modules
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt
from matplotlib import style
%matplotlib inline
style.use("ggplot")
from sklearn.datasets import make_blobs
Here, NumPy is used for numerical computation on the dataset, and Matplotlib is used for data visualisation; "ggplot" is a Matplotlib plotting style. The make_blobs function from sklearn.datasets generates sample datasets of Gaussian blobs.
Step 2: Input data
centers = [[1,1,1],[1,2,2],[3,8,8]]
X, _ = make_blobs(n_samples = 500, centers = centers, cluster_std = 0.5)
plt.scatter(X[:,0],X[:,1])
plt.show()
By using make_blobs, 500 samples are generated around the three given centres; the figure below plots the first two dimensions of the dataset.
Output: scatter plot of the dataset
Step 3: Calculate and Visualise dataset
ms = MeanShift()
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
print(cluster_centers)
n_clusters_ = len(np.unique(labels))
print("Estimated clusters:", n_clusters_)
colors = 10*['r.', 'g.', 'b.', 'c.', 'k.', 'y.', 'm.']
for i in range(len(X)):
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize=3)
plt.scatter(cluster_centers[:,0], cluster_centers[:,1],
            marker=".", color='b', s=10, linewidths=5, zorder=10)
plt.show()
Mean shift is applied to the generated dataset, and the estimated cluster centres are plotted over the data points.
Output: Mean shift clustering

DBSCAN clustering algorithm:
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It is an unsupervised learning algorithm that identifies clusters in a dataset based on the assumption that a cluster is a contiguous region of high point density, separated from other clusters by regions of low density. It can detect clusters of different shapes and sizes in large datasets that may contain noise and outliers. It uses two basic parameters:

minPts:
The minimum number of points that must be clustered together for a region to be considered dense.

eps (ε):
A distance measure used to locate points in the neighbourhood of any given point.
These two parameters are understood through the notions of density reachability and density connectivity; the figure shows the three main parts of the algorithm.

Density reachability:
A point is density-reachable from another point if it lies within a certain distance (eps) of it.

Density connectivity:
A transitivity-based chaining method that determines whether points belong to the same cluster: if point a is reachable from b, and b from c, then a, b and c lie in one cluster.
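As a minimal sketch of how eps and minPts interact, the hypothetical helper below (written for this example, not part of scikit-learn) checks whether a single point is a "core" point, i.e. whether at least minPts points, including itself, lie within eps of it:

```python
import numpy as np

def is_core_point(X, i, eps, min_pts):
    """A point is a core point if at least min_pts points
    (including itself) lie within distance eps of it."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.count_nonzero(dists <= eps) >= min_pts

# Three tightly packed points and one isolated point (toy data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
core0 = is_core_point(X, 0, eps=0.2, min_pts=3)  # in the dense corner
core3 = is_core_point(X, 3, eps=0.2, min_pts=3)  # the isolated point
```

Points that are not core points but are reachable from one become border points; points reachable from no core point are labelled noise.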
Now, let's understand with the help of an example how to implement DBSCAN clustering in Python using a Jupyter notebook.
Step 1: Import the python modules
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
%matplotlib inline
The StandardScaler function removes the mean and scales each feature to unit variance.
Step 2: Input data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=550, centers=centers, cluster_std=0.4,
random_state=0)
X = StandardScaler().fit_transform(X)
The code generates sample data with make_blobs and standardises it with StandardScaler.
Step 3: Calculate DBSCAN
db = DBSCAN(eps=0.2, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f" % metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f" % metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X, labels))
This code runs DBSCAN, counts the resulting clusters while ignoring noise points (labelled -1), and prints several clustering evaluation metrics.
Step 4: Visualise dataset
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=10)
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=5)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
Each cluster is drawn in its own colour; black is reserved for the noise points.
Output: DBSCAN clustering


Hierarchical clustering algorithm:
Hierarchical clustering is a common unsupervised learning algorithm used to cluster unlabelled data points. Like K-means clustering, it groups together data points with similar characteristics, and in some cases the outcomes of hierarchical and K-means clustering are identical. The algorithm is classified into two main types:

Agglomerative hierarchical clustering:
It is a "bottom-up" technique in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Divisive hierarchical clustering:
It is a "top-down" technique in which all observations start in a single cluster, which is then split into the two least similar clusters; the process continues until every observation has its own cluster. The figure shows the difference between these two types:
Now, let's understand with the help of an example how to implement hierarchical clustering in Python using a Jupyter notebook.
Step 1: Import the python modules
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
import numpy as np
from sklearn.cluster import AgglomerativeClustering
The workflow of hierarchical clustering is similar to that of the other unsupervised algorithms above: first, import the required libraries.
Step 2: Input data
X = np.array([[2,3], [11,14],[13,15],[20,10],[18,25],[60,68],[73,80],[65,88],[45,50],[70,95],])
Input data to generate the output.
Step 3: Calculate the dataset
cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
cluster.fit_predict(X)
print(cluster.labels_)
The fit_predict method predicts the cluster that each data point belongs to. The AgglomerativeClustering class from sklearn.cluster takes several parameters: the number of clusters is set with n_clusters, the distance metric (affinity) is set to 'euclidean', and the linkage parameter is set to 'ward', which minimises the variance within the merged clusters.
Step 4: Visualise the dataset
plt.scatter(X[:,0], X[:,1], c=cluster.labels_, cmap='rainbow')
Finally, plot the clustered data points to get the output.
Output: Hierarchical clustering
Association:
By using certain rules, association develops relationships between different data points present in large datasets. For example, online shopping websites use it to make recommendations to customers based on their previous purchase history. Associations are formed using the association rule mining technique.

Association rule mining (ARM):
It is an unsupervised learning method that checks the dependency of one data item on another and maps them accordingly so that the result can be more profitable. It attempts to find interesting relationships or correlations between the variables of the dataset, using various rules to identify the most meaningful relationships between the variables present in it.
For example, market basket analysis is one of the important applications of association rule mining. It uses large datasets to show associations between data items, allowing retailers to recognise relationships between items that customers frequently buy together. For instance, a customer who enters a shop to buy bread may also buy butter, eggs or milk, because these products are placed on nearby shelves.
Now, let's understand in detail the rules used by association algorithms in machine learning.

Support:
It measures how frequently an itemset occurs in the dataset. The support of an itemset X with respect to the set of transactions T is defined as the proportion of transactions in T that contain X. The equation of the support is:
Support(X) = Number of transactions containing X / Total number of transactions T

Confidence:
It indicates how often the rule has been found to be true: how often items X and Y occur together in the dataset, given that X already occurs. It is defined as the ratio of the number of transactions containing both X and Y to the number of transactions containing X. The equation of the confidence is:
Confidence(X → Y) = Frequency of X and Y / Frequency of X

Lift:
It measures the strength of a rule: the ratio of the observed support to the support expected if X and Y were independent. The equation of the lift is:
Lift(X → Y) = Support of X and Y / (Support of X × Support of Y)
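The three measures can be checked by hand on a toy basket dataset. The transactions below are invented for illustration:

```python
# Each transaction is the set of items bought together (toy data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(items):
    # Proportion of transactions that contain every item in `items`.
    return sum(items <= t for t in transactions) / n

sup_bread = support({"bread"})            # bread appears in 4 of 5 baskets
sup_both = support({"bread", "butter"})   # bread and butter together: 3 of 5
confidence = sup_both / sup_bread         # P(butter | bread)
lift = sup_both / (sup_bread * support({"butter"}))
```

A lift above 1 means bread and butter co-occur more often than independence would predict, so the rule bread → butter is informative.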
Further, association rule learning is implemented by the following three algorithms:


Apriori Algorithm:
This is the classical association rule mining algorithm. It is used for mining frequent itemsets. For example, in a shop, customers tend to buy related products together, such as bread, butter and milk.

Eclat Algorithm:
The Eclat algorithm is also used to mine frequent itemsets, which reveal recurring patterns in the dataset. For example, a customer who goes to a shop to buy butter may also buy eggs. The aim of this algorithm is to use set relationships to calculate the support of an itemset, and it traverses the data column-wise using a depth-first search. As a result, it usually runs faster than the Apriori algorithm.
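The set-relationship idea can be sketched with a "vertical" data layout, in which each item is mapped to the set of transaction IDs (TIDs) that contain it. The TID lists below are illustrative values, not from any real dataset:

```python
# Vertical layout: item -> set of transaction IDs containing it (toy data).
tid_lists = {
    "bread":  {0, 1, 2, 4},
    "butter": {0, 1, 4},
    "eggs":   {2, 3, 4},
}
n_transactions = 5

# The support of an itemset is the size of the intersection of the
# TID sets of its items, divided by the number of transactions.
tids_bread_butter = tid_lists["bread"] & tid_lists["butter"]
support_bread_butter = len(tids_bread_butter) / n_transactions
```

Because support is computed by intersecting TID sets rather than rescanning every transaction, extending an itemset by one item needs only one more intersection, which is what makes the depth-first search efficient.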

FP Growth Algorithm:
This algorithm works on databases rather than streams. Whereas the Apriori algorithm requires n+1 scans of the database, where n is the length of the longest pattern, the FP-Growth approach reduces the number of scans of the complete database to two.
Hence, in K-means every resulting cluster contains data points with similar characteristics, while dissimilar points end up in other clusters. The figure shows the working of the K-means clustering algorithm:
Now, let's understand with the help of an example how to implement K-means clustering in Python using a Jupyter notebook.
Step 1: Import the python modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
%matplotlib inline
Here, the Pandas library is used to read and write spreadsheets, NumPy is used for numerical computation on the dataset, and Matplotlib is used for data visualisation.
Step 2: Input data and calculate
X= 2 * np.random.rand(100,2)
X1 = 1 + 2 * np.random.rand(50,2)
X[50:100, :] = X1
The above code is used to generate random data in a two-dimensional space.
Step 3: Visualise the dataset
plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
plt.show()
The scatter function plots the values of the generated dataset.
Step 4: Use of ScikitLearn
from sklearn.cluster import KMeans
Kmean = KMeans(n_clusters=2)
Kmean.fit(X)
Here, k (n_clusters) is arbitrarily given a value of two.
Step 5: Find centroid
Kmean.cluster_centers_
plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
plt.scatter(0.94665068, 0.97138368, s=200, c='g', marker='s')
plt.scatter(2.01559419, 2.02597093, s=200, c='r', marker='s')
plt.show()
Here, the above code marks the centres of the two clusters on the scatter plot; the coordinates are taken from the output of Kmean.cluster_centers_.
Step 6: Algorithm testing
Kmean.labels_
sample_test = np.array([3.0, 3.0])
second_test = sample_test.reshape(1, -1)
Kmean.predict(second_test)
The code retrieves the labels_ property of the fitted K-means model, i.e. how the data points are categorised into the two clusters, and then predicts the cluster of a new sample point.