Top 100 Python Machine Learning MCQs with Answers (2026 Updated)

Python has become the top choice for Machine Learning because it combines simplicity with immense power. Machine Learning involves training algorithms to find patterns in data and make predictions, and Python makes this process smooth. It has specialised libraries like Scikit-learn for building models, Pandas for data handling, and NumPy for calculations.

Since Machine Learning involves numerous libraries and concepts, we have put together a set of 100 Python Machine Learning MCQs covering a wide range of topics to help you prepare for ML interviews and examinations in 2026.

100 Python Machine Learning MCQs

These 100 Python Machine Learning MCQs cover the key concepts with simple, clear explanations, making them a practical way to review the fundamentals of AI and ML.

For in-depth preparation, check out our article: 50 Machine Learning Interview Questions You Must Know in 2026, featuring the most frequently asked questions with detailed answers.

Q1. Which type of machine learning involves training a model on labeled data?

A. Unsupervised Learning
B. Supervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning

Show Answer

Answer: B
Supervised learning uses labeled datasets to train algorithms to classify data or predict outcomes accurately.

Q2. In the context of Python machine learning, which library is primarily used for data manipulation and analysis?

A. NumPy
B. Matplotlib
C. Pandas
D. Scikit-learn

Show Answer

Answer: C
Pandas provides data structures like DataFrames and functions to manipulate numerical tables and time series efficiently.

Q3. What is the primary purpose of the “train_test_split” function in Scikit-learn?

A. To train the model faster
B. To split the dataset into training and testing sets
C. To improve the accuracy of the model
D. To handle missing values in the dataset

Show Answer

Answer: B
This function divides the dataset into two parts: one for training the model and the other for evaluating its performance.
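
As a small illustration of the split described above, here is a minimal sketch. The toy arrays X and y are made up for this example, and the 80/20 split and the seed value 42 are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each, plus binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the rows for testing; random_state fixes the shuffle
# so the same rows land in each set on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 10 samples and test_size=0.2, 8 rows go to the training set and 2 to the test set.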

Q4. Which of the following is a regression algorithm in machine learning?

A. K-Nearest Neighbors (KNN)
B. Support Vector Machine (SVM)
C. Linear Regression
D. K-Means Clustering

Show Answer

Answer: C
Linear Regression is used for predicting a continuous dependent variable based on one or more independent variables.

Q5. What does the term “overfitting” mean in machine learning models?

A. The model is too simple to capture the underlying trend.
B. The model performs well on training data but poorly on unseen data.
C. The model takes too long to train.
D. The model has high bias and low variance.

Show Answer

Answer: B
Overfitting occurs when a model learns the training data and noise too well, failing to generalize to new data.

Q6. Which method is used in Python to standardize the features of a dataset?

A. MinMaxScaler
B. StandardScaler
C. LabelEncoder
D. OneHotEncoder

Show Answer

Answer: B
StandardScaler standardizes features by removing the mean and scaling to unit variance.
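
To see what "removing the mean and scaling to unit variance" means in practice, the sketch below applies the z-score formula by hand with NumPy. The heights_cm values are invented for illustration, and the population standard deviation (ddof=0) is used, which is what StandardScaler also uses.

```python
import numpy as np

# Hypothetical feature column: heights in centimetres.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

mean = heights_cm.mean()
std = heights_cm.std()  # population standard deviation (ddof=0)

# z-score: subtract the mean, then divide by the standard deviation.
z_scores = (heights_cm - mean) / std

print(z_scores)
print(z_scores.mean(), z_scores.std())
```

After the transformation, the column has a mean of (approximately) 0 and a standard deviation of 1.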

Q7. In a Confusion Matrix for a binary classification, what does the term “True Positive” represent?

A. The model predicted negative, and it was actually negative.
B. The model predicted positive, and it was actually negative.
C. The model predicted positive, and it was actually positive.
D. The model predicted negative, and it was actually positive.

Show Answer

Answer: C
True Positive indicates cases where the model correctly predicts the positive class.

Q8. Which Python library is most commonly used for plotting and visualizing data in machine learning?

A. Scikit-learn
B. TensorFlow
C. Matplotlib
D. Keras

Show Answer

Answer: C
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Q9. What type of machine learning is used for grouping similar data points without predefined labels?

A. Regression
B. Classification
C. Clustering
D. Dimensionality Reduction

Show Answer

Answer: C
Clustering is an unsupervised learning technique used to group similar instances into clusters.

Q10. Which algorithm is known as the “Lazy Learner” because it does not learn a model during training?

A. Decision Tree
B. Logistic Regression
C. K-Nearest Neighbors (KNN)
D. Naive Bayes

Show Answer

Answer: C
KNN stores the training data and makes predictions based on the closest neighbors at query time.
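
The "lazy" behaviour can be sketched in a few lines of NumPy. This is a simplified illustration rather than Scikit-learn's implementation: the function name knn_predict and the toy points are made up, and ties are not handled specially.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # No model was fitted beforehand: all the work happens at query time.
    distances = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]                  # indices of the k closest
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two well-separated toy groups (hypothetical data).
X_train = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

pred_near_a = knn_predict(X_train, y_train, np.array([1.5, 1.5]))
pred_near_b = knn_predict(X_train, y_train, np.array([8.5, 8.5]))
print(pred_near_a, pred_near_b)
```

Because "training" only stores the data, prediction cost grows with the size of the training set.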

Q11. What is the main function of the “fit()” method in Scikit-learn estimators?

A. To make predictions on new data
B. To train the model on the provided data
C. To evaluate the model performance
D. To visualize the model results

Show Answer

Answer: B
The fit method trains the model by learning parameters from the training data.

Q12. Which metric is best suited for evaluating a regression model’s performance?

A. Accuracy
B. F1 Score
C. Mean Squared Error (MSE)
D. Precision

Show Answer

Answer: C
MSE measures the average of the squares of the errors, indicating how close the regression line is to the data points.

Q13. In Decision Trees, what is the process of reducing the depth of the tree to prevent overfitting called?

A. Pruning
B. Branching
C. Splitting
D. Rooting

Show Answer

Answer: A
Pruning removes sections of the tree that contribute little predictive power, simplifying the model.

Q14. Which technique is used to convert categorical variables into numerical format for machine learning models?

A. Normalization
B. Standardization
C. One-Hot Encoding
D. Imputation

Show Answer

Answer: C
One-Hot Encoding creates binary columns for each category, allowing algorithms to process categorical data.
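
The idea of "one binary column per category" can be shown with plain NumPy. This is only a sketch with an invented colors array; in a real project you would more likely use Scikit-learn's OneHotEncoder or a Pandas helper.

```python
import numpy as np

# Hypothetical categorical feature.
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted categories and, for each value, its category index.
categories, codes = np.unique(colors, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)
print(one_hot)
```

Each row contains exactly one 1, in the column that corresponds to that row's category.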

Q15. What does the “n_clusters” parameter specify in the K-Means algorithm?

A. The number of iterations
B. The number of features
C. The number of clusters to form
D. The number of data points

Show Answer

Answer: C
The n_clusters parameter defines how many distinct groups the algorithm should partition the data into.

Q16. Which regularization technique adds a penalty proportional to the sum of the absolute values of the coefficients?

A. L1 Regularization (Lasso)
B. L2 Regularization (Ridge)
C. Elastic Net
D. Dropout

Show Answer

Answer: A
L1 Regularization (Lasso) adds a penalty proportional to the sum of the absolute values of the coefficients.
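
The difference between the L1 and L2 penalty terms is easy to compute by hand. The sketch below uses made-up coefficients and an arbitrary strength alpha; it shows only the penalty terms, not the full objective that a library such as Scikit-learn optimizes.

```python
import numpy as np

# Hypothetical model coefficients and regularization strength.
coefficients = np.array([3.0, -4.0, 0.0])
alpha = 0.1

# L1 (Lasso): sum of absolute values -> |3| + |-4| + |0| = 7
l1_penalty = alpha * np.sum(np.abs(coefficients))

# L2 (Ridge): sum of squares -> 9 + 16 + 0 = 25
l2_penalty = alpha * np.sum(coefficients ** 2)

print(l1_penalty, l2_penalty)
```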

Q17. What is the primary goal of the Random Forest algorithm?

A. To create a single decision tree
B. To build multiple decision trees and merge them for a stable prediction
C. To reduce the number of features
D. To classify data based on distance

Show Answer

Answer: B
Random Forest is an ensemble method that combines multiple decision trees to improve accuracy and control overfitting.

Q18. Which method in Scikit-learn is used to make predictions after the model has been trained?

A. fit()
B. predict()
C. transform()
D. score()

Show Answer

Answer: B
The predict method takes input data and returns the predicted labels or values based on the trained model.

Q19. In the context of model evaluation, what does “Precision” measure?

A. The ratio of correctly predicted positive observations to the total predicted positive observations.
B. The ratio of correctly predicted positive observations to all observations in the actual class.
C. The harmonic mean of Precision and Recall.
D. The overall accuracy of the model.

Show Answer

Answer: A
Precision answers the question: “Of all instances predicted as positive, how many were actually positive?”
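
Precision and recall both follow directly from the confusion-matrix counts. Here is a minimal pure-Python sketch; the y_true and y_pred lists are invented for illustration, with 1 as the positive class.

```python
# Hypothetical ground-truth labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(precision, recall, f1)
```

Here the counts are TP=3, FP=2, FN=1, TN=2, which gives a precision of 0.6 and a recall of 0.75.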

Q20. Which Python library provides functionality for Support Vector Machines (SVM)?

A. NLTK
B. Scikit-learn
C. OpenCV
D. PyTorch

Show Answer

Answer: B
Scikit-learn includes the sklearn.svm module, which provides classes for SVM classification and regression.

Q21. What is the primary assumption of the Naive Bayes algorithm?

A. Features are highly correlated.
B. Features are independent of each other.
C. Data is normally distributed.
D. The number of features is equal to the number of samples.

Show Answer

Answer: B
Naive Bayes assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Q22. Which technique is used to handle missing values in a dataset?

A. One-Hot Encoding
B. Imputation
C. Feature Scaling
D. Binning

Show Answer

Answer: B
Imputation replaces missing values with substituted values, such as the mean, median, or mode of the column.
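
A simple form of imputation can be sketched with Pandas. The DataFrame below is made up for illustration; the mean is used for the numeric column and the mode for the categorical one, which is just one reasonable choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 30.0],
    "city": ["Pune", "Delhi", "Pune", None],
})

# Numeric column: fill the gap with the column mean (NaN is ignored by mean()).
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical column: fill the gap with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```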

Q23. What is the purpose of the “random_state” parameter in train_test_split?

A. To increase the training speed.
B. To ensure the split is the same every time the code is run.
C. To shuffle the data randomly.
D. To select the test size automatically.

Show Answer

Answer: B
Setting random_state ensures reproducibility by fixing the seed for the random number generator.

Q24. Which method is used to determine the optimal number of clusters in K-Means?

A. Confusion Matrix
B. Elbow Method
C. ROC Curve
D. Scatter Plot

Show Answer

Answer: B
The Elbow Method plots the Within-Cluster Sum of Squares (WCSS) against the number of clusters to find the “elbow” point.

Q25. What does PCA stand for in machine learning?

A. Principal Component Analysis
B. Primary Cluster Analysis
C. Probability Component Algorithm
D. Predicted Class Accuracy

Show Answer

Answer: A
Principal Component Analysis is a dimensionality reduction technique that transforms data into fewer dimensions.

Q26. In Logistic Regression, which function is used to map predictions to probabilities?

A. Linear Function
B. Sigmoid Function
C. ReLU Function
D. Tanh Function

Show Answer

Answer: B
The Sigmoid function maps any real-valued number into a value between 0 and 1, representing a probability.
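
The sigmoid mapping is short enough to write out directly. In this sketch the scores array stands in for the raw linear outputs of a logistic regression model and is invented for illustration.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical raw model scores: strongly negative, neutral, strongly positive.
scores = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(scores)

print(probabilities)
```

A score of 0 maps to a probability of exactly 0.5, and large negative or positive scores approach 0 and 1 respectively.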

Q27. What is the primary disadvantage of a single Decision Tree?

A. It is difficult to interpret.
B. It is prone to overfitting.
C. It cannot handle numerical data.
D. It requires very little data.

Show Answer

Answer: B
Decision trees can easily overfit the training data, creating complex structures that do not generalize well.

Q28. Which metric is used to evaluate the trade-off between true positive rates and false positive rates?

A. Accuracy
B. ROC Curve
C. Mean Absolute Error
D. R-Squared

Show Answer

Answer: B
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate against the False Positive Rate.

Q29. What is the function of the “transform()” method in Scikit-learn transformers?

A. To train the transformer on data.
B. To apply the learned transformation to the data.
C. To split the data into sets.
D. To evaluate the transformation.

Show Answer

Answer: B
The transform method applies the parameters learned during fit() to the dataset to transform the data.

Q30. Which ensemble technique combines multiple models by training them sequentially, correcting the errors of previous models?

A. Bagging
B. Boosting
C. Stacking
D. Voting

Show Answer

Answer: B
Boosting algorithms like AdaBoost and Gradient Boosting train weak learners sequentially to correct previous errors.

Q31. In Scikit-learn, what is the purpose of the “Pipeline” class?

A. To create data visualizations.
B. To chain multiple processing steps and a model into one object.
C. To deploy models to production.
D. To handle large datasets in memory.

Show Answer

Answer: B
Pipeline assembles several steps that can be cross-validated together while setting different parameters.
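
Here is a minimal sketch of chaining a scaler and a classifier into one object. The toy data and the step names "scaler" and "clf" are chosen for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales, two classes.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
              [8.0, 900.0], [9.0, 950.0], [10.0, 880.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Each step is a (name, estimator) pair; the last step is the model.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit() scales X and then trains the classifier; predict() reuses the same scaling.
model.fit(X, y)
predictions = model.predict(X)

print(predictions)
```

Keeping preprocessing inside the pipeline means the scaler is fitted only on the data passed to fit(), which helps avoid leaking information during cross-validation.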

Q32. Which loss function is typically used for binary classification problems?

A. Mean Squared Error
B. Binary Cross-Entropy
C. Hinge Loss
D. Huber Loss

Show Answer

Answer: B
Binary Cross-Entropy measures the performance of a classification model whose output is a probability value.

Q33. What does the term “bias” represent in the bias-variance tradeoff?

A. Error introduced by approximating a real-world problem with a simplified model.
B. Error due to sensitivity to small fluctuations in the training set.
C. Error due to noise in the data.
D. The difference between predicted and actual value.

Show Answer

Answer: A
Bias is the error resulting from incorrect assumptions in the learning algorithm, often leading to underfitting.

Q34. Which Python command is used to install the Scikit-learn library?

A. pip install sklearn
B. pip install scikit-learn
C. python install sklearn
D. install scikit-learn

Show Answer

Answer: B
The correct package name for installation via pip is ‘scikit-learn’, although it is imported as ‘sklearn’.

Q35. In Gradient Descent, what determines the step size taken towards the minimum of the loss function?

A. Iteration count
B. Learning Rate
C. Batch size
D. Momentum

Show Answer

Answer: B
The Learning Rate is a hyperparameter that controls how much to change the model in response to the estimated error.
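
The effect of the learning rate is easiest to see on a tiny problem. The sketch below minimizes the made-up function f(x) = (x - 3)^2, whose minimum is at x = 3; the function, the starting point, and the three learning rates are all illustrative choices.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(x) = (x - 3)^2 with plain gradient descent."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * (x - 3.0)     # derivative of (x - 3)^2
        x -= learning_rate * gradient  # step size is set by the learning rate
    return x


x_good = gradient_descent(learning_rate=0.1)     # converges close to 3
x_small = gradient_descent(learning_rate=0.001)  # moves too slowly in 50 steps
x_large = gradient_descent(learning_rate=1.1)    # overshoots and diverges

print(x_good, x_small, x_large)
```

A rate that is too small wastes iterations, while one that is too large can make each step overshoot the minimum by more than the previous one.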

Q36. Which method of Scikit-learn’s KNeighborsClassifier returns the indices of the nearest neighbors?

A. predict()
B. kneighbors()
C. fit()
D. score()

Show Answer

Answer: B
The kneighbors method finds the K-neighbors of a point and returns indices and distances to those neighbors.

Q37. Which problem arises when a model is too simple to capture the complexity of the data?

A. Overfitting
B. Underfitting
C. Data Leakage
D. Multicollinearity

Show Answer

Answer: B
Underfitting occurs when a model is too simple to learn the underlying structure of the data.

Q38. What is the default kernel used by SVM (SVC) in Scikit-learn?

A. linear
B. poly
C. rbf
D. sigmoid

Show Answer

Answer: C
The Radial Basis Function (rbf) kernel is the default choice for Support Vector Classification in Scikit-learn.

Q39. In Pandas, which method is used to display the first few rows of a DataFrame?

A. head()
B. tail()
C. info()
D. describe()

Show Answer

Answer: A
The head() method returns the first n rows of a DataFrame, defaulting to 5 rows.

Q40. Which of the following is an unsupervised learning task?

A. Spam Detection
B. Image Classification
C. Dimensionality Reduction
D. Stock Price Prediction

Show Answer

Answer: C
Dimensionality reduction, like PCA, is unsupervised as it does not require labeled output data.

Q41. What is the range of values for the R-squared metric in regression?

A. 0 to 1
B. -1 to 1
C. 0 to Infinity
D. -Infinity to 1

Show Answer

Answer: D
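R-squared is at most 1, where 1 indicates a perfect fit. It has no lower bound: it is 0 for a model that always predicts the mean and becomes negative when the model fits worse than that baseline.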
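
Computing R-squared by hand shows why it can drop below zero. In this sketch, the helper name r_squared and the three sets of predictions are made up for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0])

r2_perfect = r_squared(y_true, np.array([1.0, 2.0, 3.0]))   # exact predictions
r2_baseline = r_squared(y_true, np.array([2.0, 2.0, 2.0]))  # always predicts the mean
r2_worse = r_squared(y_true, np.array([3.0, 2.0, 1.0]))     # worse than the mean

print(r2_perfect, r2_baseline, r2_worse)
```

The three results are 1.0, 0.0 and -3.0: the last model's errors are larger than those of simply predicting the mean, so its score is negative.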

Q42. Which hyperparameter in Random Forest determines the number of trees in the forest?

A. max_depth
B. n_estimators
C. min_samples_split
D. max_features

Show Answer

Answer: B
The n_estimators parameter defines the number of trees to be built in the ensemble model.

Q43. What is the main function of GridSearchCV?

A. To visualize the grid of data.
B. To exhaustively search over specified parameter values for an estimator.
C. To reduce the grid size of the dataset.
D. To perform grid-based clustering.

Show Answer

Answer: B
GridSearchCV is used to tune hyperparameters by evaluating all possible combinations from a grid of values.

Q44. Which distance metric is commonly used by the KNN algorithm for continuous variables?

A. Manhattan Distance
B. Euclidean Distance
C. Hamming Distance
D. Cosine Similarity

Show Answer

Answer: B
Euclidean distance is the straight-line distance between two points in Euclidean space, standard for KNN.

Q45. In Scikit-learn, which class is used to perform Label Encoding?

A. OneHotEncoder
B. LabelEncoder
C. OrdinalEncoder
D. MinMaxScaler

Show Answer

Answer: B
LabelEncoder converts labels into a numeric form ranging from 0 to n_classes-1.

Q46. What does “Cross-Validation” primarily help in assessing?

A. The speed of the algorithm.
B. How the results of a statistical analysis will generalize to an independent dataset.
C. The amount of memory used.
D. The complexity of the code.

Show Answer

Answer: B
Cross-validation assesses how well a model performs on new data by using different subsets for training and testing.

Q47. Which attribute of a fitted LinearRegression model contains the coefficients of the features?

A. coef_
B. intercept_
C. score_
D. params_

Show Answer

Answer: A
The coef_ attribute holds the estimated coefficients for the linear regression problem.

Q48. What is the effect of increasing the value of K in KNN?

A. It increases the model complexity and creates overfitting.
B. It smooths the decision boundary and reduces noise sensitivity.
C. It makes the training time longer.
D. It has no effect on the model.

Show Answer

Answer: B
Increasing K reduces the impact of noise on the classification but makes the classification boundary less distinct.

Q49. Which of the following is NOT a valid splitting criterion for DecisionTreeClassifier in Scikit-learn?

A. gini
B. entropy
C. log_loss
D. mse

Show Answer

Answer: D
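Squared error is a criterion for regression trees (DecisionTreeRegressor), not for classification trees, which accept gini, entropy, or log_loss. Recent Scikit-learn versions name the regression criterion ‘squared_error’; the older ‘mse’ alias was removed in version 1.2.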

Q50. What is the role of a “Loss Function” in machine learning?

A. To measure the accuracy of the model.
B. To quantify the difference between predicted and actual values.
C. To clean the data.
D. To split the data.

Show Answer

Answer: B
The loss function evaluates how well the algorithm models the data, guiding the optimization process.

Q51. Which Pandas function is used to drop missing values from a DataFrame?

A. fillna()
B. dropna()
C. isnull()
D. interpolate()

Show Answer

Answer: B
dropna() removes rows or columns containing null or missing values from the DataFrame.

Q52. In the Bias-Variance tradeoff, high variance usually leads to which problem?

A. Underfitting
B. Overfitting
C. Data Leakage
D. Low Complexity

Show Answer

Answer: B
High variance indicates that the model is too sensitive to the training data, resulting in overfitting.

Q53. Which algorithm is widely used for face recognition and dimensionality reduction?

A. PCA
B. SVM
C. KNN
D. Linear Regression

Show Answer

Answer: A
PCA is effective at reducing high-dimensional data (like images) into lower dimensions while retaining variance.

Q54. What does the “test_size” parameter in train_test_split specify?

A. The size of the training set.
B. The proportion of the dataset to include in the test split.
C. The size of the total dataset.
D. The number of features in the test set.

Show Answer

Answer: B
test_size determines the fraction of data allocated for testing the model (e.g., 0.2 for 20%).

Q55. Which metric is suitable for evaluating models on imbalanced datasets?

A. Accuracy
B. F1 Score
C. Mean Squared Error
D. R-Squared

Show Answer

Answer: B
The F1 Score considers both precision and recall, making it a better metric for imbalanced classes than accuracy.

Q56. Which layer type is not found in Scikit-learn but is fundamental to Neural Networks?

A. Dense Layer
B. Convolutional Layer
C. Decision Node
D. Leaf Node

Show Answer

Answer: A
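Dense (fully connected) layers are the basic building block of neural networks and are exposed as layer objects in deep learning libraries like TensorFlow/Keras rather than in Scikit-learn. Convolutional layers are also absent from Scikit-learn, but they are specific to convolutional networks rather than fundamental to every neural network.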

Q57. How is the Mean Absolute Error (MAE) calculated?

A. Square of the average of errors.
B. Average of the absolute differences between predicted and actual values.
C. Sum of squared errors.
D. Square root of the mean squared error.

Show Answer

Answer: B
MAE measures the average magnitude of errors in a set of predictions, without considering their direction.
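
MAE is often compared with MSE and RMSE, which appear among the options above. The sketch below computes all three with NumPy on invented values.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_pred - y_true          # [-1, 0, 2, 1]

mae = np.mean(np.abs(errors))     # average absolute error
mse = np.mean(errors ** 2)        # average squared error
rmse = np.sqrt(mse)               # back in the units of the target

print(mae, mse, rmse)
```

Here MAE is 1.0 and MSE is 1.5. Squaring makes MSE and RMSE more sensitive to the single large error of 2 than MAE is.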

Q58. Which NumPy function is used to create an array of zeros?

A. np.empty()
B. np.zeros()
C. np.ones()
D. np.full()

Show Answer

Answer: B
np.zeros() returns a new array of given shape and type, filled with zeros.

Q59. What is “Feature Scaling”?

A. Selecting the most important features.
B. Normalizing the range of independent variables.
C. Increasing the number of features.
D. Converting features to strings.

Show Answer

Answer: B
Feature scaling brings features onto a comparable range so that no single feature dominates distance calculations or gradient-based training simply because of its units.

Q60. Which algorithm can be used for both classification and regression tasks?

A. Linear Regression
B. Logistic Regression
C. Decision Tree
D. K-Means

Show Answer

Answer: C
Decision Trees can be adapted for both classification (predicting classes) and regression (predicting continuous values).

Q61. What does the “fit_transform()” method do?

A. Fits the model and transforms the data in one step.
B. Only transforms the data.
C. Only fits the model.
D. Predicts the output.

Show Answer

Answer: A
fit_transform() combines the fit() and transform() methods for convenience on the training data.

Q62. Which criterion is used to measure the quality of a split in Decision Trees?

A. Accuracy
B. Gini Impurity or Entropy
C. Mean Squared Error
D. R-Squared

Show Answer

Answer: B
Gini Impurity and Entropy are metrics used to evaluate how well a split separates the classes.
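
Both impurity measures are simple functions of the class proportions in a node. Here is a minimal pure-Python sketch; the function names and the toy label lists are made up for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


mixed_node = ["a", "a", "b", "b"]  # evenly mixed: as impure as two classes can be
pure_node = ["a", "a", "a", "a"]   # a single class: perfectly pure

print(gini(mixed_node), entropy(mixed_node))
print(gini(pure_node), entropy(pure_node))
```

An evenly mixed two-class node scores 0.5 (Gini) and 1.0 (entropy), while a pure node scores 0 on both. A good split produces child nodes that are closer to pure.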

Q63. What is “Bagging” in ensemble learning?

A. Training models sequentially.
B. Training multiple models on random subsets of data and averaging their predictions.
C. Selecting the best single model.
D. Reducing the number of features.

Show Answer

Answer: B
Bagging (Bootstrap Aggregating) reduces variance by training multiple models on different samples of the data.

Q64. Which Scikit-learn module contains various distance metrics?

A. sklearn.metrics
B. sklearn.neighbors
C. sklearn.preprocessing
D. sklearn.linear_model

Show Answer

Answer: A
sklearn.metrics contains functions to calculate various performance metrics and distance computations.

Q65. What is the output of the “predict_proba()” method in classifiers like LogisticRegression?

A. The predicted class label.
B. The probability of the input belonging to each class.
C. The accuracy score.
D. The confusion matrix.

Show Answer

Answer: B
predict_proba() returns the probability estimates for each class label.

Q66. Which method is used to create a correlation matrix in Pandas?

A. cov()
B. corr()
C. describe()
D. mean()

Show Answer

Answer: B
The corr() method computes pairwise correlation of columns, excluding NA/null values.

Q67. What type of scaling transforms features to a given range, usually between 0 and 1?

A. StandardScaler
B. MinMaxScaler
C. RobustScaler
D. Normalizer

Show Answer

Answer: B
MinMaxScaler transforms features by scaling each feature to a given range, typically [0, 1].
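
The min-max formula behind this kind of scaling can be written out by hand. This is a sketch with an invented prices array, targeting the default [0, 1] range.

```python
import numpy as np

# Hypothetical feature column.
prices = np.array([10.0, 20.0, 30.0, 50.0])

# Min-max scaling: the smallest value maps to 0 and the largest to 1.
scaled = (prices - prices.min()) / (prices.max() - prices.min())

print(scaled)
```

The result is [0.0, 0.25, 0.5, 1.0]. Because the formula depends on the minimum and maximum, a single extreme outlier can squeeze all other values into a narrow band.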

Q68. In the context of SVM, what is a “Support Vector”?

A. The centroid of the dataset.
B. Data points that are closest to the hyperplane and influence its position.
C. Outliers in the dataset.
D. The average of the features.

Show Answer

Answer: B
Support Vectors are the critical data points that help define the maximum margin of the hyperplane.

Q69. Which regularization technique is known for shrinking coefficients to zero for feature selection?

A. Ridge (L2)
B. Lasso (L1)
C. Elastic Net
D. Dropout

Show Answer

Answer: B
Lasso (L1) regularization can shrink some coefficients to exactly zero, effectively selecting features.

Q70. What is the purpose of the “random_state” parameter in Random Forest?

A. To increase the number of trees.
B. To ensure reproducibility of the results.
C. To handle missing values.
D. To speed up the computation.

Show Answer

Answer: B
Setting random_state ensures that the random bootstrapping of data produces the same result each time.

Q71. Which method of a fitted Decision Tree model can be used to check the depth of the tree?

A. tree_depth_
B. max_depth
C. get_depth()
D. depth

Show Answer

Answer: C
get_depth() is a method that returns the depth of the decision tree after it has been fitted.

Q72. What is the “Curse of Dimensionality”?

A. Too much data.
B. The phenomenon where the feature space becomes increasingly sparse as dimensions increase.
C. The high cost of training models.
D. The inability to visualize data.

Show Answer

Answer: B
As dimensions increase, the amount of data needed to generalize accurately grows exponentially.

Q73. Which function is used to calculate the dot product of two arrays in NumPy?

A. np.cross()
B. np.dot()
C. np.multiply()
D. np.sum()

Show Answer

Answer: B
np.dot() computes the dot product of two arrays, a fundamental operation in linear algebra for ML.
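
Here is a short example of np.dot() on made-up arrays, first with two vectors and then with a small weight matrix applied to a feature vector, which is the pattern a linear model uses.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vector dot product: 1*4 + 2*5 + 3*6 = 32
vector_result = np.dot(a, b)

# Matrix-vector product: each row of the (hypothetical) weight matrix is dotted with a.
weights = np.array([[1, 0, 2],
                    [0, 1, 1]])
matrix_result = np.dot(weights, a)  # [1*1 + 0*2 + 2*3, 0*1 + 1*2 + 1*3] = [7, 5]

print(vector_result, matrix_result)
```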

Q74. Which method is used to save a trained Scikit-learn model to a file?

A. save()
B. dump() from joblib or pickle
C. export()
D. write()

Show Answer

Answer: B
joblib.dump() or pickle.dump() is used to serialize the model object to a file on disk.

Q75. What does a high “Recall” value indicate?

A. The model predicts very few false positives.
B. The model predicts very few false negatives.
C. The model is very precise.
D. The model is very accurate.

Show Answer

Answer: B
High recall means the model correctly identifies most of the relevant cases (positive class), minimizing false negatives.

Q76. Which Scikit-learn class implements the K-Means clustering algorithm?

A. sklearn.cluster.KMeans
B. sklearn.cluster.KNN
C. sklearn.model.KMeans
D. sklearn.linear.KMeans

Show Answer

Answer: A
KMeans is the class located in the sklearn.cluster module for performing K-Means clustering.

Q77. What is the primary input to the “fit()” method for supervised learning?

A. Only X (features)
B. Only y (labels)
C. Both X and y
D. Only the file path

Show Answer

Answer: C
In supervised learning, fit() requires both the input features (X) and the target labels (y) to learn the mapping between them.

Q78. Which type of machine learning involves an agent learning to make decisions by performing actions in an environment?

A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning

Show Answer

Answer: C
In Reinforcement Learning, an agent learns through trial and error which actions maximize its cumulative reward in an environment.

Q79. Which Pandas function is used to read a CSV file into a DataFrame?

A. pandas.read_csv()
B. pandas.load_csv()
C. pandas.open_csv()
D. pandas.import_csv()

Show Answer

Answer: A
read_csv() is the standard Pandas function to read a comma-separated values file into a DataFrame.

Q80. What is the key difference between Standardization and Normalization?

A. Standardization rescales to [0,1], Normalization rescales to mean 0.
B. Standardization rescales to mean 0, variance 1; Normalization rescales to a range [0,1].
C. There is no difference.
D. Normalization handles outliers better.

Show Answer

Answer: B
Standardization (Z-score) centers data around 0 with unit variance, while Normalization (Min-Max) scales to a bounded range.

Q81. Which method in Scikit-learn is used to calculate the accuracy of a classifier?

A. accuracy_score()
B. r2_score()
C. mean_squared_error()
D. f1_score()

Show Answer

Answer: A
accuracy_score computes the accuracy, the fraction of correct predictions over total predictions.

Q82. What does a confusion matrix look like for a binary classification problem?

A. 1×1 Matrix
B. 2×2 Matrix
C. 3×3 Matrix
D. Linear Array

Show Answer

Answer: B
A binary classification confusion matrix is a 2×2 table showing TP, FP, FN, and TN counts.

Q83. Which hyperparameter of SVM controls the margin hardness?

A. kernel
B. C
C. gamma
D. degree

Show Answer

Answer: B
The C parameter trades off correct classification of training examples against maximization of the decision margin.

Q84. What is the primary function of an activation function in a neural network?

A. To calculate the error.
B. To introduce non-linearity into the network.
C. To normalize the input data.
D. To reduce the number of parameters.

Show Answer

Answer: B
Activation functions allow neural networks to learn complex patterns by introducing non-linear properties.

Q85. Which gradient descent variant uses the whole dataset to compute the gradient?

A. Stochastic Gradient Descent
B. Mini-batch Gradient Descent
C. Batch Gradient Descent
D. Gradient Boosting

Show Answer

Answer: C
Batch Gradient Descent calculates the error for each example in the training dataset but updates the model only after all training examples have been evaluated.

Q86. In Scikit-learn, what does the “score()” method typically return for a classifier?

A. The mean squared error.
B. The R-squared value.
C. The accuracy score.
D. The confusion matrix.

Show Answer

Answer: C
For classifiers, the score method returns the mean accuracy on the given test data and labels.

Q87. Which technique combines predictions from multiple models to produce a final prediction?

A. Feature Selection
B. Ensemble Learning
C. Dimensionality Reduction
D. Data Augmentation

Show Answer

Answer: B
Ensemble Learning combines multiple models (like in Random Forest) to improve robustness and accuracy.

Q88. What is the role of the “gamma” parameter in the RBF kernel of SVM?

A. To define the regularization strength.
B. To define how far the influence of a single training example reaches.
C. To define the polynomial degree.
D. To define the bias term.

Show Answer

Answer: B
Gamma defines the influence of a single training example; low values mean ‘far’, high values mean ‘close’.

Q89. Which NumPy method is used to reshape an array?

A. resize()
B. reshape()
C. shape()
D. flatten()

Show Answer

Answer: B
reshape() gives a new shape to an array without changing its data.

Q90. What is “Data Leakage” in machine learning?

A. Losing data during processing.
B. When information from outside the training dataset is used to create the model.
C. Memory leak in Python code.
D. Unauthorized access to the dataset.

Show Answer

Answer: B
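Data leakage occurs when information that would not be available at prediction time, such as test-set data or the target itself, inadvertently influences training, leading to overly optimistic evaluation results.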

Q91. Which classifier is known for performing well even with small training datasets?

A. Random Forest
B. Naive Bayes
C. Deep Neural Network
D. Gradient Boosting

Show Answer

Answer: B
Naive Bayes is computationally efficient and can perform well with limited data due to its simple assumptions.

Q92. What is the function of the “intercept_” attribute in Linear Regression?

A. It represents the slope.
B. It represents the bias term (y-intercept).
C. It represents the error.
D. It represents the correlation.

Show Answer

Answer: B
intercept_ contains the independent term in the linear model, representing the value of y when all X are 0.

Q93. Which method in Scikit-learn is used to generate a classification report?

A. metrics.report()
B. metrics.classification_report()
C. metrics.summary()
D. metrics.confusion_report()

Show Answer

Answer: B
classification_report builds a text report showing the main classification metrics like precision, recall, and f1-score.

Q94. What type of variables are suitable for Label Encoding?

A. Continuous variables.
B. Ordinal categorical variables.
C. Nominal categorical variables with many levels.
D. Target variables.

Show Answer

Answer: B
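Label encoding, in the general sense of mapping categories to integers, is best for ordinal variables where the order matters (e.g., Low, Medium, High). Note that Scikit-learn’s LabelEncoder class is intended for target labels and assigns integers in sorted order, so OrdinalEncoder with an explicit category order is the usual choice for ordinal features.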

Q95. Which optimization algorithm is often the default for neural network training?

A. Gradient Descent
B. Adam
C. Newton’s Method
D. Simulated Annealing

Show Answer

Answer: B
Adam (Adaptive Moment Estimation) is widely used as it computes adaptive learning rates for each parameter.

Q96. In Pandas, what does the “groupby()” method do?

A. Groups data based on row index.
B. Groups data based on column values to perform aggregate functions.
C. Groups multiple DataFrames together.
D. Groups missing values.

Show Answer

Answer: B
groupby() involves splitting the object, applying a function, and combining the results.
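
The split-apply-combine pattern looks like this in practice. The DataFrame and its column names team and score are invented for illustration.

```python
import pandas as pd

# Hypothetical scores for players on two teams.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 30, 40, 50],
})

# Split the rows by team, apply mean() to each group's scores, combine into one Series.
mean_scores = df.groupby("team")["score"].mean()

print(mean_scores)
```

The result is a Series indexed by team, with a mean of 15.0 for team A and 40.0 for team B.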

Q97. Which metric represents the area under the ROC curve?

A. ROC-AUC
B. MSE
C. RMSE
D. MAE

Show Answer

Answer: A
ROC-AUC stands for Area Under the Receiver Operating Characteristic Curve, a common performance metric.

Q98. What is the main advantage of using a virtual environment in Python for ML projects?

A. It speeds up code execution.
B. It manages dependencies separately for different projects.
C. It provides more memory.
D. It visualizes data better.

Show Answer

Answer: B
Virtual environments isolate project libraries to prevent version conflicts between different projects.

Q99. Which algorithm uses “purity” to determine the best split?

A. Linear Regression
B. Decision Tree
C. K-Means
D. PCA

Show Answer

Answer: B
Decision Trees use metrics like Gini Impurity or Entropy to measure node purity and decide splits.

Q100. What is the outcome of a “one-vs-rest” (OvR) strategy in multiclass classification?

A. It fits one classifier per class, treating it as the positive class and all others as negative.
B. It fits one classifier for all classes simultaneously.
C. It ignores one class and classifies the rest.
D. It rests the model after every iteration.

Show Answer

Answer: A
OvR is a heuristic method for multi-class classification involving training multiple binary classifiers.

Conclusion

That’s it for this question bank of 100 Python Machine Learning MCQs! You do not need to memorise the questions or answers. Bookmark this page and review it about twice a week; with consistent repetition over a few weeks, the concepts will start to stick and your understanding of Machine Learning will deepen.

Now, if you want to go further in Python and strengthen your interview preparation, check out this article:

100 Python Interview Questions

These resources will help you understand and prepare for real-world Python development and interviews.