
    ML Interview Questions

    Q1). Define the term “Machine Learning”.

    It is a subset of artificial intelligence (AI) that allows systems to learn and improve from experience automatically, without being explicitly programmed. Machine learning focuses on designing computer programs that can access data and use it to learn for themselves.

    Q2). Differentiate between supervised and unsupervised machine learning.

    Supervised learning requires a labelled training dataset. For instance, before a classification model can be trained, each example in the dataset must first be tagged with the group it belongs to. Unsupervised learning, on the other hand, does not need explicitly labelled data.

    Q3). Name the phases of the life cycle of machine learning.

    The phases of the life cycle are as follows:

    • Data gathering
    • Data preparation
    • Data wrangling
    • Data analysis
    • Data selection and verification
    • Model deployment

    Q4). What is linear regression?

    It is a supervised Machine Learning algorithm that is used for predictive analysis to find the linear relationship between the dependent and the independent variables.

    The linear regression equation is:

    y = mx + c

    Where:

    y = Dependent variable

    x = Independent variable

    m = Coefficient of x (the slope)

    c = Intercept
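
    As a minimal sketch of how the slope and intercept in this equation can be estimated, the snippet below applies ordinary least squares in plain Python. The function name fit_line and the data points are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```

    Because the sample points lie exactly on a line, the fit recovers m = 2 and c = 1; with noisy data the same formulas give the line that minimises the squared error.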

    Q5). What are the types of Machine Learning?

    The Figure shows the different types of machine learning:

    • Supervised Learning:

      In this type of machine learning, machines or systems learn under the supervision of labelled data. The computer is trained on a training dataset and, based on that training, it produces its output.

    • Unsupervised Learning:

      Unlike supervised learning, it works with unlabelled data, so there is no supervision over the results it produces. In essence, unsupervised learning aims to recognise patterns in the data and create clusters of related entities. When new input data is then fed into the model, it does not label the entity; rather, it places the entity in a cluster of similar objects.

    • Reinforcement Learning:

      Reinforcement learning involves learning by exploring an environment to find the best next step. The algorithms are based on the principle of reward and punishment, and are designed so that they try to find the best possible course of action.

    Q6). Differentiate between classification and regression in Machine Learning.

    Machine learning covers different kinds of prediction problems, based on supervised and unsupervised learning: classification, regression, clustering and association. Here, we are going to explore classification and regression.

    • Classification:

      In classification, we try to build a machine learning model that sorts data into separate categories. The data is classified and categorised based on the input parameters. For example, suppose you want to predict customer churn for a specific product based on some recorded data. Each customer will either churn or not, so the labels will be ‘Yes’ and ‘No’.

    • Regression:

      Regression is the method of constructing a model that predicts continuous real values instead of groups or discrete values. Based on historical data, it can also identify the trend in a distribution. It is used to predict an outcome according to the degree of association between variables. The prediction of weather conditions, for example, depends on variables such as temperature, air pressure, solar radiation, area elevation, and distance from the sea. The relationship between these variables helps us predict the state of the atmosphere.

    Q7). What is model selection in Machine Learning?

    Model selection is the process of choosing one model from among various mathematical models that are used to describe the same dataset. Model selection is applied in the fields of statistics, machine learning and data mining.

    Q8). Name the three stages which are required to build the hypotheses or model in machine learning.

    The three stages which are required to build the hypotheses or model in machine learning are as follows:

    • Building the model
    • Testing the model
    • Implementing the model

    Q9). What do you mean by cross-validation in machine learning?

    In machine learning, cross-validation is a technique for estimating and improving the performance of a given algorithm by training and testing it on multiple samples of the dataset. The dataset is split into smaller parts with the same number of rows; one part is chosen at random as the test set and the remaining parts are used as the training set. Cross-validation includes the following techniques:

    • Holdout method
    • K-fold cross-validation
    • Stratified k-fold cross-validation
    • Leave-p-out cross-validation
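
    To make the splitting concrete, here is a small sketch of k-fold cross-validation in plain Python. The helper name k_fold_indices is hypothetical, and the sketch assumes the number of rows is divisible by k and skips shuffling so that the output is easy to follow.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    indices = list(range(n_rows))
    fold_size = n_rows // k  # assumes n_rows is divisible by k
    for fold in range(k):
        start = fold * fold_size
        # One block of rows is held out for testing...
        test_idx = indices[start:start + fold_size]
        # ...and every other row is used for training
        train_idx = indices[:start] + indices[start + fold_size:]
        yield train_idx, test_idx

folds = list(k_fold_indices(n_rows=6, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```

    Each row appears in exactly one test set, so every example is used for evaluation once. In practice the rows are usually shuffled before splitting.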

    Q10). Explain logistic regression in detail.

    Logistic regression is the appropriate regression analysis to use when the dependent variable is categorical or binary. Like other regression analyses, logistic regression is a tool for predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables. It is also used to estimate the probability of a categorical dependent variable. We can use logistic regression in cases such as the following:

    • To predict whether a citizen is a senior citizen (1) or not (0)
    • To check whether a person has a disease (Yes) or not (No)

    Three kinds of logistic regression are available:

    • Binary Logistic Regression:

      There are only two possible outcomes. For example, predicting whether it will rain (1) or not (0).

    • Multinomial Logistic Regression:

      The output consists of three or more unordered categories. For example, predicting a regional language (Kannada, Telugu, Marathi, etc.).

    • Ordinal Logistic Regression:

      The output consists of three or more ordered categories. For example, rating an Android application from 1 to 5 stars.
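
    The binary case can be sketched in a few lines of plain Python. This is an illustrative gradient-descent fit, not a production implementation: the ages, the labels, the cut-off of 60 used for scaling, and all function names are assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between predicted probability and true label
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical data: age and whether the person is a senior citizen (1) or not (0)
ages = [30, 40, 50, 70, 80, 90]
is_senior = [0, 0, 0, 1, 1, 1]

def scale(age):
    """Centre and shrink the feature so gradient descent behaves well."""
    return (age - 60) / 10.0

w, b = train_logistic([scale(a) for a in ages], is_senior)

def predict_probability(age):
    return sigmoid(w * scale(age) + b)

print(predict_probability(35), predict_probability(75))
```

    The model outputs a probability, and a threshold of 0.5 turns it into a class: the 35-year-old falls below 0.5 and the 75-year-old above it.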

    Q11). What do you mean by overfitting in machine learning and how can you avoid it?

    Overfitting occurs when a model learns its training data too closely, including the noise in it, and therefore performs poorly on new data. It is most likely when the dataset is small or incomplete, so the risk of overfitting falls as the amount of data grows. For small datasets, cross-validation helps to detect and limit overfitting. In this method, we split the dataset into two parts, a training set and a testing set. The model is trained on the training set and then evaluated on the testing set, which stands in for new inputs.
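
    The following sketch shows how a held-out test set exposes overfitting. It uses a 1-nearest-neighbour predictor, which memorises the training data, on a tiny made-up dataset with one noisy label. All names and values are hypothetical.

```python
def one_nn_predict(x, train_x, train_y):
    """Return the label of the single closest training point (pure memorisation)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical data: the true rule is y = 1 when x >= 5,
# but the training label at x = 4 is noisy (recorded as 1).
train_x, train_y = [0, 2, 4, 6, 8], [0, 0, 1, 1, 1]
test_x, test_y = [0.5, 2.5, 3.5, 6.5, 8.5], [0, 0, 0, 1, 1]

train_acc = accuracy([one_nn_predict(x, train_x, train_y) for x in train_x], train_y)
test_acc = accuracy([one_nn_predict(x, train_x, train_y) for x in test_x], test_y)
print(train_acc, test_acc)  # 1.0 0.8
```

    The model scores perfectly on the data it has memorised but loses accuracy on unseen points, because it has also learned the noisy label. A gap like this between training and test accuracy is the usual sign of overfitting.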

    Q12). What is SVM (Support Vector Machines) in machine learning?

    SVM is a machine learning algorithm that is primarily used for classification. It works well even when the feature vectors have a high number of dimensions.

    Now, let’s understand with the help of an example how to implement a support vector machine (SVM) using Python in a Jupyter notebook:

    Step 1: Import the Python modules

    import numpy as np
    import matplotlib.pyplot as plt
    # make_blobs is imported from sklearn.datasets; the older
    # sklearn.datasets.samples_generator path was removed in newer scikit-learn releases
    from sklearn.datasets import make_blobs

    First, import all the Python modules. The example relies on a function called “make_blobs”, which is part of sklearn.datasets. The functions in this package generate sample datasets, so scikit-learn creates the data for us. These datasets are then used to evaluate the performance of the models.

    Step 2: Input data

    X, Y = make_blobs(n_samples=350, centers=2, random_state=0, cluster_std=0.20)

    The function “make_blobs” helps to generate the datasets.

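    Step 3: Plot the dataset

    # Scatter plot of the two blobs, coloured by class label
    plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='winter')

    xfit = np.linspace(-1, 3.5)

    # Draw three candidate separating lines, each with a shaded band of half-width d
    for m, b, d in [(1, 0.55, 0.30), (0.2, 1.8, 0.25), (-0.1, 2.6, 0.1)]:
        yfit = m * xfit + b
        plt.plot(xfit, yfit, '-k')
        plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                         color='#AAAAAA', alpha=0.4)

    plt.xlim(-1, 3.5)

    # Redraw the points so that they sit on top of the shaded bands
    plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='winter')

    The scatter plot shows the two generated classes, and each black line is a candidate boundary that separates them, drawn with a grey band of half-width d around it. A support vector machine selects the separating boundary with the widest margin. As with other scikit-learn models, the values of its parameters are calculated once the model is created and its fit function is called.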

    Output: Support vector machine
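
    The idea of preferring the widest margin can be illustrated without any library. The sketch below is not an SVM solver: it only compares the three candidate lines from the plotting code on a handful of made-up points and keeps the one whose closest point is farthest away. The points and function names are hypothetical, so the margins differ from those in the plot.

```python
import math

def margin(m, b, points):
    """Smallest perpendicular distance from any point to the line y = m*x + b."""
    return min(abs(m * x - y + b) / math.sqrt(m * m + 1) for x, y in points)

def separates(m, b, upper, lower):
    """True if every 'upper' point is above the line and every 'lower' point below it."""
    return (all(y > m * x + b for x, y in upper)
            and all(y < m * x + b for x, y in lower))

# Hypothetical points for two classes (not the make_blobs data)
class_a = [(1.0, 4.0), (1.5, 4.5)]
class_b = [(2.0, 1.0), (2.5, 0.5)]

# (slope, intercept) pairs taken from the plotting step
candidates = [(1, 0.55), (0.2, 1.8), (-0.1, 2.6)]

valid = [line for line in candidates if separates(line[0], line[1], class_a, class_b)]
best = max(valid, key=lambda line: margin(line[0], line[1], class_a + class_b))
print(best)  # (-0.1, 2.6)
```

    A real SVM searches over all possible boundaries rather than a fixed list, but the selection criterion is the same: maximise the distance to the nearest points, which are the support vectors.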

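    Now, to check how well a model fits this data, the following code fits a linear regression to the class labels and prints its R² score; the reported score is approximately 0.99. Note that this score measures the fit of a linear regression rather than the accuracy of an SVM classifier.

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # Fit a linear regression of the class labels Y on the features X
    regressor = LinearRegression().fit(X, Y)

    # r2_score expects the true values first, then the predictions
    print(r2_score(Y, regressor.predict(X)))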

    Q13). Define Bayes’ theorem in machine learning.

    Bayes’ theorem gives the probability of an event occurring, using prior knowledge of conditions related to it. In mathematical terms, P(A|B) = P(B|A) × P(A) / P(B). For a diagnostic test, this is the true positive rate of the condition, weighted by how common the condition is, divided by the sum of that quantity and the false positive rate, weighted by how common the absence of the condition is.

    Bayesian optimization and Bayesian belief networks are two of the most common applications of Bayes’ theorem in machine learning. This theorem is also the foundation of the family of machine learning methods that includes the Naive Bayes classifier.
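
    A short worked example may help to show how the theorem is applied. The prevalence, true positive rate and false positive rate below are invented numbers, and the function name posterior is hypothetical.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test) using Bayes' theorem."""
    # P(positive) = P(positive | condition) * P(condition)
    #             + P(positive | no condition) * P(no condition)
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

# Hypothetical test: 1% of people have the condition, the test detects 90% of
# real cases, and it wrongly flags 5% of healthy people.
p = posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

    Even with a fairly accurate test, a positive result here implies only about a 15% chance of having the condition, because the condition is rare to begin with. This is the effect of the prior.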

    Q14). Differentiate between ‘Training set’ and ‘Test set’.

    In machine learning and other areas of information technology, the collection of data used to discover a potentially predictive relationship is known as the ‘training set’. The training set is the set of examples given to the learner, while the test set is used to test the accuracy of the learner’s hypotheses and is the set of examples held back from the learner. The training set is distinct from the test set.

    Q15). Explain the difference between KNN and K-means clustering.

    K-Nearest Neighbours is a supervised machine learning algorithm: we provide the model with labelled data, and it classifies a new point based on the labels of the points nearest to it. K-Means clustering, on the other hand, is an unsupervised machine learning algorithm: we provide unlabelled data to the model, and the algorithm groups the points into clusters by assigning each point to the cluster whose mean is closest.
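
    The contrast can be sketched side by side in plain Python on one-dimensional data. Both functions are simplified illustrations with hypothetical names and made-up points: the KNN function needs the labels, while the K-means function never sees them.

```python
from collections import Counter

def knn_predict(x, train_x, train_y, k=3):
    """Supervised: majority label among the k nearest labelled points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def k_means(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and updating the means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[closest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical data
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(2.5, points, labels))  # low
centroids, clusters = k_means(points, centroids=[0.0, 10.0])
print(centroids)  # [1.5, 8.5]
```

    KNN returns a label that already exists in the training data, whereas K-means only returns groups and their means; giving those groups a meaning is left to the user.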

    Copyright 1999- Ducat Creative, All rights reserved.