Top 100 Python Machine Learning MCQs with Answers (2026 Updated)

Python has become the top choice for Machine Learning because it combines simplicity with immense power. Machine Learning involves training algorithms to find patterns in data and make predictions, and Python makes this process smooth. It has specialised libraries like Scikit-learn for building models, Pandas for data handling, and NumPy for calculations.
Because Machine Learning spans many libraries and concepts, this set of 100 Python Machine Learning MCQs covers a wide range of topics to help you prepare for ML interviews and examinations in 2026.
100 Python Machine Learning MCQs
These 100 Python Machine Learning MCQs cover the key concepts with simple, clear explanations, making them a practical way to build your AI and ML fundamentals.
For in-depth preparation, check out our article: 50 Machine Learning Interview Questions You Must Know in 2026, featuring the most frequently asked questions with detailed answers.
Q1. Which type of machine learning involves training a model on labeled data?
A. Unsupervised Learning
B. Supervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning
Answer: B
Supervised learning uses labeled datasets to train algorithms to classify data or predict outcomes accurately.
Q2. In the context of Python machine learning, which library is primarily used for data manipulation and analysis?
A. NumPy
B. Matplotlib
C. Pandas
D. Scikit-learn
Answer: C
Pandas provides data structures like DataFrames and functions to manipulate numerical tables and time series efficiently.
Q3. What is the primary purpose of the “train_test_split” function in Scikit-learn?
A. To train the model faster
B. To split the dataset into training and testing sets
C. To improve the accuracy of the model
D. To handle missing values in the dataset
Answer: B
This function divides the dataset into two parts: one for training the model and the other for evaluating its performance.
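As a minimal sketch of how such a split might look in practice (the toy arrays X and y are made up for illustration, and the test_size and random_state values are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 10 rows and test_size=0.2, 8 rows go to the training set and 2 to the test set.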
Q4. Which of the following is a regression algorithm in machine learning?
A. K-Nearest Neighbors (KNN)
B. Support Vector Machine (SVM)
C. Linear Regression
D. K-Means Clustering
Answer: C
Linear Regression is used for predicting a continuous dependent variable based on one or more independent variables.
Q5. What does the term “overfitting” mean in machine learning models?
A. The model is too simple to capture the underlying trend.
B. The model performs well on training data but poorly on unseen data.
C. The model takes too long to train.
D. The model has high bias and low variance.
Answer: B
Overfitting occurs when a model learns the training data and noise too well, failing to generalize to new data.
Q6. Which method is used in Python to standardize the features of a dataset?
A. MinMaxScaler
B. StandardScaler
C. LabelEncoder
D. OneHotEncoder
Answer: B
StandardScaler standardizes features by removing the mean and scaling to unit variance.
Q7. In a Confusion Matrix for a binary classification, what does the term “True Positive” represent?
A. The model predicted negative, and it was actually negative.
B. The model predicted positive, and it was actually negative.
C. The model predicted positive, and it was actually positive.
D. The model predicted negative, and it was actually positive.
Answer: C
True Positive indicates cases where the model correctly predicts the positive class.
Q8. Which Python library is most commonly used for plotting and visualizing data in machine learning?
A. Scikit-learn
B. TensorFlow
C. Matplotlib
D. Keras
Answer: C
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Q9. What type of machine learning is used for grouping similar data points without predefined labels?
A. Regression
B. Classification
C. Clustering
D. Dimensionality Reduction
Answer: C
Clustering is an unsupervised learning technique used to group similar instances into clusters.
Q10. Which algorithm is known as the “Lazy Learner” because it does not learn a model during training?
A. Decision Tree
B. Logistic Regression
C. K-Nearest Neighbors (KNN)
D. Naive Bayes
Answer: C
KNN stores the training data and makes predictions based on the closest neighbors at query time.
Q11. What is the main function of the “fit()” method in Scikit-learn estimators?
A. To make predictions on new data
B. To train the model on the provided data
C. To evaluate the model performance
D. To visualize the model results
Answer: B
The fit method trains the model by learning parameters from the training data.
Q12. Which metric is best suited for evaluating a regression model’s performance?
A. Accuracy
B. F1 Score
C. Mean Squared Error (MSE)
D. Precision
Answer: C
MSE measures the average of the squared errors, indicating how close the model's predictions are to the actual values.
Q13. In Decision Trees, what is the process of reducing the depth of the tree to prevent overfitting called?
A. Pruning
B. Branching
C. Splitting
D. Rooting
Answer: A
Pruning removes sections of the tree that provide little power to classify instances, simplifying the model.
Q14. Which technique is used to convert categorical variables into numerical format for machine learning models?
A. Normalization
B. Standardization
C. One-Hot Encoding
D. Imputation
Answer: C
One-Hot Encoding creates binary columns for each category, allowing algorithms to process categorical data.
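One way to see this is with pandas.get_dummies, shown here as a small sketch. The color column and its values are hypothetical, and dtype=int is chosen only so the output prints as 0/1:

```python
import pandas as pd

# Hypothetical categorical column for illustration
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One binary column per category; each row has a single 1
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Each row ends up with a single 1 in the column that matches its original category.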
Q15. What does the “n_clusters” parameter specify in the K-Means algorithm?
A. The number of iterations
B. The number of features
C. The number of clusters to form
D. The number of data points
Answer: C
The n_clusters parameter defines how many distinct groups the algorithm should partition the data into.
Q16. Which regularization technique adds a penalty equivalent to the absolute value of the magnitude of coefficients?
A. L1 Regularization (Lasso)
B. L2 Regularization (Ridge)
C. Elastic Net
D. Dropout
Answer: A
L1 Regularization or Lasso adds a penalty equal to the absolute value of the coefficient magnitude.
Q17. What is the primary goal of the Random Forest algorithm?
A. To create a single decision tree
B. To build multiple decision trees and merge them for a stable prediction
C. To reduce the number of features
D. To classify data based on distance
Answer: B
Random Forest is an ensemble method that combines multiple decision trees to improve accuracy and control overfitting.
Q18. Which method in Scikit-learn is used to make predictions after the model has been trained?
A. fit()
B. predict()
C. transform()
D. score()
Answer: B
The predict method takes input data and returns the predicted labels or values based on the trained model.
Q19. In the context of model evaluation, what does “Precision” measure?
A. The ratio of correctly predicted positive observations to the total predicted positive observations.
B. The ratio of correctly predicted positive observations to all observations in the actual class.
C. The harmonic mean of Precision and Recall.
D. The overall accuracy of the model.
Answer: A
Precision answers the question: “Of all instances predicted as positive, how many were actually positive?”
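The definition can be worked through by hand. The sketch below counts the four confusion-matrix cells for a made-up set of labels and derives precision from them; recall and F1 are included for comparison, using their standard definitions:

```python
# Hypothetical labels for illustration: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, actually positive
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, actually positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted negative, actually negative

precision = tp / (tp + fp)  # of all predicted positives, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(precision, recall, f1)
```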
Q20. Which Python library provides functionality for Support Vector Machines (SVM)?
A. NLTK
B. Scikit-learn
C. OpenCV
D. PyTorch
Answer: B
Scikit-learn includes the sklearn.svm module, which provides classes for SVM classification and regression.
Q21. What is the primary assumption of the Naive Bayes algorithm?
A. Features are highly correlated.
B. Features are independent of each other.
C. Data is normally distributed.
D. The number of features is equal to the number of samples.
Answer: B
Naive Bayes assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Q22. Which technique is used to handle missing values in a dataset?
A. One-Hot Encoding
B. Imputation
C. Feature Scaling
D. Binning
Answer: B
Imputation replaces missing values with substituted values, such as the mean, median, or mode of the column.
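A short sketch of mean imputation using Scikit-learn's SimpleImputer (the small array with missing entries is invented for illustration; median or most_frequent are other available strategies):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in each column
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# Replace each missing value with the mean of its column
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Here the missing value in the first column becomes 2.0 (the mean of 1 and 3) and the one in the second column becomes 15.0 (the mean of 10 and 20).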
Q23. What is the purpose of the “random_state” parameter in train_test_split?
A. To increase the training speed.
B. To ensure the split is the same every time the code is run.
C. To shuffle the data randomly.
D. To select the test size automatically.
Answer: B
Setting random_state ensures reproducibility by fixing the seed for the random number generator.
Q24. Which method is used to determine the optimal number of clusters in K-Means?
A. Confusion Matrix
B. Elbow Method
C. ROC Curve
D. Scatter Plot
Answer: B
The Elbow Method plots the Within-Cluster Sum of Squares (WCSS) against the number of clusters to find the “elbow” point.
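The following sketch computes the WCSS values that such a plot would show. It assumes synthetic data with three well-separated blobs (generated here purely for illustration) and reads WCSS from the inertia_ attribute of a fitted KMeans model:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption): three tight blobs around known centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit K-Means for k = 1..6 and record the within-cluster sum of squares
wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)

for k, value in enumerate(wcss, start=1):
    print(k, round(value, 2))
```

On data like this, WCSS drops sharply up to k=3 and then flattens, which is the "elbow" you would look for in the plot.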
Q25. What does PCA stand for in machine learning?
A. Principal Component Analysis
B. Primary Cluster Analysis
C. Probability Component Algorithm
D. Predicted Class Accuracy
Answer: A
Principal Component Analysis is a dimensionality reduction technique that transforms data into fewer dimensions.
Q26. In Logistic Regression, which function is used to map predictions to probabilities?
A. Linear Function
B. Sigmoid Function
C. ReLU Function
D. Tanh Function
Answer: B
The Sigmoid function maps any real-valued number into a value between 0 and 1, representing a probability.
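For reference, the sigmoid is sigma(z) = 1 / (1 + e^(-z)). A minimal plain-Python sketch (the helper name sigmoid and the sample inputs are chosen here for illustration):

```python
import math


def sigmoid(z: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Negative inputs map below 0.5, zero maps to exactly 0.5, positive inputs above 0.5
for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```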
Q27. What is the primary disadvantage of a single Decision Tree?
A. It is difficult to interpret.
B. It is prone to overfitting.
C. It cannot handle numerical data.
D. It requires very little data.
Answer: B
Decision trees can easily overfit the training data, creating complex structures that do not generalize well.
Q28. Which evaluation tool is used to visualize the trade-off between the true positive rate and the false positive rate?
A. Accuracy
B. ROC Curve
C. Mean Absolute Error
D. R-Squared
Answer: B
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate against the False Positive Rate.
Q29. What is the function of the “transform()” method in Scikit-learn transformers?
A. To train the transformer on data.
B. To apply the learned transformation to the data.
C. To split the data into sets.
D. To evaluate the transformation.
Answer: B
The transform method applies the parameters learned during fit() to the dataset to transform the data.
Q30. Which ensemble technique combines multiple models by training them sequentially, correcting the errors of previous models?
A. Bagging
B. Boosting
C. Stacking
D. Voting
Answer: B
Boosting algorithms like AdaBoost and Gradient Boosting train weak learners sequentially to correct previous errors.
Q31. In Scikit-learn, what is the purpose of the “Pipeline” class?
A. To create data visualizations.
B. To chain multiple processing steps and a model into one object.
C. To deploy models to production.
D. To handle large datasets in memory.
Answer: B
Pipeline assembles several steps that can be cross-validated together while setting different parameters.
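A small sketch of this idea, assuming the built-in Iris dataset and a scaler followed by logistic regression (the step names "scaler" and "clf" and the max_iter value are arbitrary choices for this example):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Chain preprocessing and the model into a single estimator
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# In each fold the scaler is fitted on the training part only
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Because the scaler lives inside the pipeline, it is refitted on each training fold, which helps avoid leaking test-fold statistics into preprocessing.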
Q32. Which loss function is typically used for binary classification problems?
A. Mean Squared Error
B. Binary Cross-Entropy
C. Hinge Loss
D. Huber Loss
Answer: B
Binary Cross-Entropy measures the performance of a classification model whose output is a probability value.
Q33. What does the term “bias” represent in the bias-variance tradeoff?
A. Error introduced by approximating a real-world problem with a simplified model.
B. Error due to sensitivity to small fluctuations in the training set.
C. Error due to noise in the data.
D. The difference between predicted and actual value.
Answer: A
Bias is the error resulting from incorrect assumptions in the learning algorithm, often leading to underfitting.
Q34. Which Python command is used to install the Scikit-learn library?
A. pip install sklearn
B. pip install scikit-learn
C. python install sklearn
D. install scikit-learn
Answer: B
The correct package name for installation via pip is ‘scikit-learn’, although it is imported as ‘sklearn’.
Q35. In Gradient Descent, what determines the step size taken towards the minimum of the loss function?
A. Iteration count
B. Learning Rate
C. Batch size
D. Momentum
Answer: B
The Learning Rate is a hyperparameter that controls how much to change the model in response to the estimated error.
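To illustrate the effect of the step size, here is a toy sketch of gradient descent on the one-dimensional loss f(w) = (w - 3)^2, whose minimum is at w = 3. The function, the starting point, and the three learning rates are all chosen for illustration:

```python
def gradient_descent(learning_rate: float, steps: int = 50, start: float = 0.0) -> float:
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)     # derivative of (w - 3)^2
        w -= learning_rate * grad  # step size is controlled by the learning rate
    return w


w_small = gradient_descent(0.01)  # too small: still far from 3 after 50 steps
w_good = gradient_descent(0.1)    # converges close to 3
w_large = gradient_descent(1.1)   # too large: overshoots and diverges

print(w_small, w_good, w_large)
```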
Q36. Which method of the KNN class in Scikit-learn returns the indices of the nearest neighbors?
A. predict()
B. kneighbors()
C. fit()
D. score()
Answer: B
The kneighbors method finds the K-neighbors of a point and returns indices and distances to those neighbors.
Q37. Which problem arises when a model is too simple to capture the complexity of the data?
A. Overfitting
B. Underfitting
C. Data Leakage
D. Multicollinearity
Answer: B
Underfitting occurs when a model is too simple to learn the underlying structure of the data.
Q38. What is the default kernel used by SVM (SVC) in Scikit-learn?
A. linear
B. poly
C. rbf
D. sigmoid
Answer: C
The Radial Basis Function (rbf) kernel is the default choice for Support Vector Classification in Scikit-learn.
Q39. In Pandas, which method is used to display the first few rows of a DataFrame?
A. head()
B. tail()
C. info()
D. describe()
Answer: A
The head() method returns the first n rows of a DataFrame, defaulting to 5 rows.
Q40. Which of the following is an unsupervised learning task?
A. Spam Detection
B. Image Classification
C. Dimensionality Reduction
D. Stock Price Prediction
Answer: C
Dimensionality reduction, like PCA, is unsupervised as it does not require labeled output data.
Q41. What is the range of values for the R-squared metric in regression?
A. 0 to 1
B. -1 to 1
C. 0 to Infinity
D. -Infinity to 1
Answer: D
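R-squared is at most 1, which indicates a perfect fit. It equals 0 for a model that always predicts the mean of the target and can be arbitrarily negative for a model that fits worse than that baseline.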
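A hand-rolled sketch of the standard formula R^2 = 1 - SS_res / SS_tot shows how a negative value can arise. The helper name r2_manual and the sample values are made up for this example:

```python
def r2_manual(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)                 # predictions match exactly
mean_only = r2_manual(y_true, [2.5] * 4)            # always predict the mean
reversed_ = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0]) # worse than the mean baseline

print(perfect, mean_only, reversed_)
```

The three cases give 1.0, 0.0, and -3.0 respectively.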
Q42. Which hyperparameter in Random Forest determines the number of trees in the forest?
A. max_depth
B. n_estimators
C. min_samples_split
D. max_features
Answer: B
The n_estimators parameter defines the number of trees to be built in the ensemble model.
Q43. What is the main function of GridSearchCV?
A. To visualize the grid of data.
B. To exhaustively search over specified parameter values for an estimator.
C. To reduce the grid size of the dataset.
D. To perform grid-based clustering.
Answer: B
GridSearchCV is used to tune hyperparameters by evaluating all possible combinations from a grid of values.
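A brief sketch of such a search, assuming the Iris dataset and a KNN classifier (the parameter grid below is an arbitrary example, not a recommended setting):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 4 values x 2 values = 8 combinations, each evaluated with 5-fold cross-validation
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Because every combination is tried, the cost grows multiplicatively with each added parameter.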
Q44. Which distance metric is commonly used by the KNN algorithm for continuous variables?
A. Manhattan Distance
B. Euclidean Distance
C. Hamming Distance
D. Cosine Similarity
Answer: B
Euclidean distance is the straight-line distance between two points in Euclidean space, standard for KNN.
Q45. In Scikit-learn, which class is used to perform Label Encoding?
A. OneHotEncoder
B. LabelEncoder
C. OrdinalEncoder
D. MinMaxScaler
Answer: B
LabelEncoder converts labels into a numeric form ranging from 0 to n_classes-1.
Q46. What does “Cross-Validation” primarily help in assessing?
A. The speed of the algorithm.
B. How the results of a statistical analysis will generalize to an independent dataset.
C. The amount of memory used.
D. The complexity of the code.
Answer: B
Cross-validation assesses how well a model performs on new data by using different subsets for training and testing.
Q47. Which attribute of a fitted LinearRegression model contains the coefficients of the features?
A. coef_
B. intercept_
C. score_
D. params_
Answer: A
The coef_ attribute holds the estimated coefficients for the linear regression problem.
Q48. What is the effect of increasing the value of K in KNN?
A. It increases the model complexity and creates overfitting.
B. It smooths the decision boundary and reduces noise sensitivity.
C. It makes the training time longer.
D. It has no effect on the model.
Answer: B
Increasing K reduces the impact of noise on the classification but makes the classification boundary less distinct.
Q49. Which of the following is NOT a valid splitting criterion for DecisionTreeClassifier in Scikit-learn?
A. gini
B. entropy
C. log_loss
D. mse
Answer: D
DecisionTreeClassifier accepts gini, entropy, and log_loss. Squared error is a criterion for regression trees (DecisionTreeRegressor), where recent Scikit-learn versions name it squared_error rather than mse.
Q50. What is the role of a “Loss Function” in machine learning?
A. To measure the accuracy of the model.
B. To quantify the difference between predicted and actual values.
C. To clean the data.
D. To split the data.
Answer: B
The loss function evaluates how well the algorithm models the data, guiding the optimization process.
Q51. Which Pandas function is used to drop missing values from a DataFrame?
A. fillna()
B. dropna()
C. isnull()
D. interpolate()
Answer: B
dropna() removes rows or columns containing null or missing values from the DataFrame.
Q52. In the Bias-Variance tradeoff, high variance usually leads to which problem?
A. Underfitting
B. Overfitting
C. Data Leakage
D. Low Complexity
Answer: B
High variance indicates that the model is too sensitive to the training data, resulting in overfitting.
Q53. Which algorithm is widely used for face recognition and dimensionality reduction?
A. PCA
B. SVM
C. KNN
D. Linear Regression
Answer: A
PCA is effective at reducing high-dimensional data (like images) into lower dimensions while retaining variance.
Q54. What does the “test_size” parameter in train_test_split specify?
A. The size of the training set.
B. The proportion of the dataset to include in the test split.
C. The size of the total dataset.
D. The number of features in the test set.
Answer: B
test_size determines the fraction of data allocated for testing the model (e.g., 0.2 for 20%).
Q55. Which metric is suitable for evaluating models on imbalanced datasets?
A. Accuracy
B. F1 Score
C. Mean Squared Error
D. R-Squared
Answer: B
The F1 Score considers both precision and recall, making it a better metric for imbalanced classes than accuracy.
Q56. Which layer type is not found in Scikit-learn but is fundamental to Neural Networks?
A. Dense Layer
B. Convolutional Layer
C. Decision Node
D. Leaf Node
Answer: A
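Dense (fully connected) layers are the basic building block of neural networks and come from deep learning libraries like TensorFlow/Keras, not Scikit-learn. Convolutional layers are also absent from Scikit-learn but are specific to convolutional networks, while decision and leaf nodes belong to decision trees.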
Q57. How is the Mean Absolute Error (MAE) calculated?
A. Square of the average of errors.
B. Average of the absolute differences between predicted and actual values.
C. Sum of squared errors.
D. Square root of the mean squared error.
Answer: B
MAE measures the average magnitude of errors in a set of predictions, without considering their direction.
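The calculation is simple enough to do by hand. The sketch below computes MAE on a few invented values and shows MSE and RMSE next to it for comparison:

```python
import math

# Hypothetical actual and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)  # average absolute error
mse = sum(e ** 2 for e in errors) / len(errors)  # average squared error
rmse = math.sqrt(mse)                            # square root of MSE

print(mae, mse, rmse)
```

Squaring makes MSE and RMSE more sensitive to large errors than MAE.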
Q58. Which NumPy function is used to create an array of zeros?
A. np.empty()
B. np.zeros()
C. np.ones()
D. np.full()
Answer: B
np.zeros() returns a new array of given shape and type, filled with zeros.
Q59. What is “Feature Scaling”?
A. Selecting the most important features.
B. Normalizing the range of independent variables.
C. Increasing the number of features.
D. Converting features to strings.
Answer: B
Feature scaling standardizes the range of features so that each contributes equally to the distance calculation.
Q60. Which algorithm can be used for both classification and regression tasks?
A. Linear Regression
B. Logistic Regression
C. Decision Tree
D. K-Means
Answer: C
Decision Trees can be adapted for both classification (predicting classes) and regression (predicting continuous values).
Q61. What does the “fit_transform()” method do?
A. Fits the model and transforms the data in one step.
B. Only transforms the data.
C. Only fits the model.
D. Predicts the output.
Answer: A
fit_transform() combines the fit() and transform() methods for convenience on the training data.
Q62. Which criterion is used to measure the quality of a split in Decision Trees?
A. Accuracy
B. Gini Impurity or Entropy
C. Mean Squared Error
D. R-Squared
Answer: B
Gini Impurity and Entropy are metrics used to evaluate how well a split separates the classes.
Q63. What is “Bagging” in ensemble learning?
A. Training models sequentially.
B. Training multiple models on random subsets of data and averaging their predictions.
C. Selecting the best single model.
D. Reducing the number of features.
Answer: B
Bagging (Bootstrap Aggregating) reduces variance by training multiple models on different samples of the data.
Q64. Which Scikit-learn module contains various distance metrics?
A. sklearn.metrics
B. sklearn.neighbors
C. sklearn.preprocessing
D. sklearn.linear_model
Answer: A
sklearn.metrics provides performance metrics as well as pairwise distance functions (in the sklearn.metrics.pairwise submodule).
Q65. What is the output of the “predict_proba()” method in classifiers like LogisticRegression?
A. The predicted class label.
B. The probability of the input belonging to each class.
C. The accuracy score.
D. The confusion matrix.
Answer: B
predict_proba() returns the probability estimates for each class label.
Q66. Which method is used to create a correlation matrix in Pandas?
A. cov()
B. corr()
C. describe()
D. mean()
Answer: B
The corr() method computes pairwise correlation of columns, excluding NA/null values.
Q67. What type of scaling transforms features to a given range, usually between 0 and 1?
A. StandardScaler
B. MinMaxScaler
C. RobustScaler
D. Normalizer
Answer: B
MinMaxScaler transforms features by scaling each feature to a given range, typically [0, 1].
Q68. In the context of SVM, what is a “Support Vector”?
A. The centroid of the dataset.
B. Data points that are closest to the hyperplane and influence its position.
C. Outliers in the dataset.
D. The average of the features.
Answer: B
Support Vectors are the critical data points that help define the maximum margin of the hyperplane.
Q69. Which regularization technique is known for shrinking coefficients to zero for feature selection?
A. Ridge (L2)
B. Lasso (L1)
C. Elastic Net
D. Dropout
Answer: B
Lasso (L1) regularization can shrink some coefficients to exactly zero, effectively selecting features.
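This behaviour can be observed on synthetic data. In the sketch below, only the first two of five features influence the target; the data, the seed, and alpha=0.5 are assumptions made for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): features 2, 3 and 4 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso:", lasso.coef_.round(3))  # noise features driven to exactly 0
print("Ridge:", ridge.coef_.round(3))  # noise features small but non-zero
```

Lasso zeroes out the irrelevant features, while Ridge only shrinks them toward zero.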
Q70. What is the purpose of the “random_state” parameter in Random Forest?
A. To increase the number of trees.
B. To ensure reproducibility of the results.
C. To handle missing values.
D. To speed up the computation.
Answer: B
Setting random_state fixes the randomness in bootstrap sampling and feature selection, so the same forest is built each time.
Q71. Which of the following can be called on a fitted Decision Tree model to check the depth of the tree?
A. tree_depth_
B. max_depth
C. get_depth()
D. depth
Answer: C
get_depth() is a method that returns the depth of the decision tree after it has been fitted.
Q72. What is the “Curse of Dimensionality”?
A. Too much data.
B. The phenomenon where the feature space becomes increasingly sparse as dimensions increase.
C. The high cost of training models.
D. The inability to visualize data.
Answer: B
As dimensions increase, the amount of data needed to generalize accurately grows exponentially.
Q73. Which function is used to calculate the dot product of two arrays in NumPy?
A. np.cross()
B. np.dot()
C. np.multiply()
D. np.sum()
Answer: B
np.dot() computes the dot product of two arrays, a fundamental operation in linear algebra for ML.
Q74. Which method is used to save a trained Scikit-learn model to a file?
A. save()
B. dump() from joblib or pickle
C. export()
D. write()
Answer: B
joblib.dump() or pickle.dump() is used to serialize the model object to a file on disk.
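A minimal round-trip sketch with joblib, assuming a small decision tree trained on Iris and a temporary directory (the file name model.joblib is arbitrary):

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Save the fitted model to disk, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Both joblib and pickle can execute arbitrary code on load, so only load model files from sources you trust.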
Q75. What does a high “Recall” value indicate?
A. The model predicts very few false positives.
B. The model predicts very few false negatives.
C. The model is very precise.
D. The model is very accurate.
Answer: B
High recall means the model correctly identifies most of the relevant cases (positive class), minimizing false negatives.
Q76. Which Scikit-learn class implements the K-Means clustering algorithm?
A. sklearn.cluster.KMeans
B. sklearn.cluster.KNN
C. sklearn.model.KMeans
D. sklearn.linear.KMeans
Answer: A
KMeans is the class located in the sklearn.cluster module for performing K-Means clustering.
Q77. What is the primary input to the “fit()” method for supervised learning?
A. Only X (features)
B. Only y (labels)
C. Both X and y
D. Only the file path
Answer: C
In supervised learning, fit() requires both the input features (X) and the target labels (y) to learn mapping.
Q78. Which type of machine learning involves an agent learning to make decisions by performing actions in an environment?
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning
Answer: C
Reinforcement Learning is about taking suitable action to maximize reward in a specific situation.
Q79. Which Pandas function is used to read a CSV file into a DataFrame?
A. pandas.read_csv()
B. pandas.load_csv()
C. pandas.open_csv()
D. pandas.import_csv()
Answer: A
read_csv() is the standard Pandas function to read a comma-separated values file into a DataFrame.
Q80. What is the key difference between Standardization and Normalization?
A. Standardization rescales to [0,1], Normalization rescales to mean 0.
B. Standardization rescales to mean 0, variance 1; Normalization rescales to a range [0,1].
C. There is no difference.
D. Normalization handles outliers better.
Answer: B
Standardization (Z-score) centers data around 0 with unit variance, while Normalization (Min-Max) scales to a bounded range.
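The two approaches can be compared side by side. This sketch applies StandardScaler and MinMaxScaler to the same small, made-up array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_std = StandardScaler().fit_transform(X)   # each column: mean 0, unit variance
X_minmax = MinMaxScaler().fit_transform(X)  # each column: rescaled to [0, 1]

print(X_std.round(3))
print(X_minmax)
```

After standardization each column has mean 0 and standard deviation 1; after Min-Max scaling each column runs from 0 to 1.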
Q81. Which method in Scikit-learn is used to calculate the accuracy of a classifier?
A. accuracy_score()
B. r2_score()
C. mean_squared_error()
D. f1_score()
Answer: A
accuracy_score computes the accuracy, the fraction of correct predictions over total predictions.
Q82. What does a confusion matrix look like for a binary classification problem?
A. 1×1 Matrix
B. 2×2 Matrix
C. 3×3 Matrix
D. Linear Array
Answer: B
A binary classification confusion matrix is a 2×2 table showing TP, FP, FN, and TN counts.
Q83. Which hyperparameter of SVM controls the margin hardness?
A. kernel
B. C
C. gamma
D. degree
Answer: B
The C parameter trades off correct classification of training examples against maximization of the decision margin.
Q84. What is the primary function of an activation function in a neural network?
A. To calculate the error.
B. To introduce non-linearity into the network.
C. To normalize the input data.
D. To reduce the number of parameters.
Answer: B
Activation functions allow neural networks to learn complex patterns by introducing non-linear properties.
Q85. Which gradient descent variant uses the whole dataset to compute the gradient?
A. Stochastic Gradient Descent
B. Mini-batch Gradient Descent
C. Batch Gradient Descent
D. Gradient Boosting
Answer: C
Batch Gradient Descent calculates the error for each example in the training dataset but updates the model only after all training examples have been evaluated.
Q86. In Scikit-learn, what does the “score()” method typically return for a classifier?
A. The mean squared error.
B. The R-squared value.
C. The accuracy score.
D. The confusion matrix.
Answer: C
For classifiers, the score method returns the mean accuracy on the given test data and labels.
Q87. Which technique combines predictions from multiple models to produce a final prediction?
A. Feature Selection
B. Ensemble Learning
C. Dimensionality Reduction
D. Data Augmentation
Answer: B
Ensemble Learning combines multiple models (like in Random Forest) to improve robustness and accuracy.
Q88. What is the role of the “gamma” parameter in the RBF kernel of SVM?
A. To define the regularization strength.
B. To define how far the influence of a single training example reaches.
C. To define the polynomial degree.
D. To define the bias term.
Answer: B
Gamma defines the influence of a single training example; low values mean ‘far’, high values mean ‘close’.
Q89. Which NumPy method is used to reshape an array?
A. resize()
B. reshape()
C. shape()
D. flatten()
Answer: B
reshape() gives a new shape to an array without changing its data.
Q90. What is “Data Leakage” in machine learning?
A. Losing data during processing.
B. When information from outside the training dataset is used to create the model.
C. Memory leak in Python code.
D. Unauthorized access to the dataset.
Answer: B
Data leakage occurs when information that would not be available at prediction time, such as test-set statistics or the target itself, inadvertently influences model training.
Q91. Which classifier is known for performing well even with small training datasets?
A. Random Forest
B. Naive Bayes
C. Deep Neural Network
D. Gradient Boosting
Answer: B
Naive Bayes is computationally efficient and can perform well with limited data due to its simple assumptions.
Q92. What is the function of the “intercept_” attribute in Linear Regression?
A. It represents the slope.
B. It represents the bias term (y-intercept).
C. It represents the error.
D. It represents the correlation.
Answer: B
intercept_ contains the independent term in the linear model, representing the value of y when all X are 0.
Q93. Which method in Scikit-learn is used to generate a classification report?
A. metrics.report()
B. metrics.classification_report()
C. metrics.summary()
D. metrics.confusion_report()
Answer: B
classification_report builds a text report showing the main classification metrics like precision, recall, and f1-score.
Q94. What type of variables are suitable for Label Encoding?
A. Continuous variables.
B. Ordinal categorical variables.
C. Nominal categorical variables with many levels.
D. Target variables.
Answer: B
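As a technique, label (integer) encoding is best for ordinal variables where the order matters (e.g., Low, Medium, High). Note that Scikit-learn's LabelEncoder is intended for target labels and assigns codes in sorted order, so OrdinalEncoder with an explicit category order is the usual choice for ordinal features.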
Q95. Which optimization algorithm is often the default for neural network training?
A. Gradient Descent
B. Adam
C. Newton’s Method
D. Simulated Annealing
Answer: B
Adam (Adaptive Moment Estimation) is widely used as it computes adaptive learning rates for each parameter.
Q96. In Pandas, what does the “groupby()” method do?
A. Groups data based on row index.
B. Groups data based on column values to perform aggregate functions.
C. Groups multiple DataFrames together.
D. Groups missing values.
Answer: B
groupby() involves splitting the object, applying a function, and combining the results.
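A tiny sketch of this split-apply-combine pattern, using a made-up DataFrame with hypothetical species and weight columns:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog"],
    "weight": [4.0, 10.0, 5.0, 14.0],
})

# Split rows by species, apply mean() to each group, combine into one Series
mean_weight = df.groupby("species")["weight"].mean()

print(mean_weight)
```

The result is indexed by the group key, with one aggregated value per species.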
Q97. Which metric represents the area under the ROC curve?
A. ROC-AUC
B. MSE
C. RMSE
D. MAE
Answer: A
ROC-AUC stands for Area Under the Receiver Operating Characteristic Curve, a common performance metric.
Q98. What is the main advantage of using a virtual environment in Python for ML projects?
A. It speeds up code execution.
B. It manages dependencies separately for different projects.
C. It provides more memory.
D. It visualizes data better.
Answer: B
Virtual environments isolate project libraries to prevent version conflicts between different projects.
Q99. Which algorithm uses “purity” to determine the best split?
A. Linear Regression
B. Decision Tree
C. K-Means
D. PCA
Answer: B
Decision Trees use metrics like Gini Impurity or Entropy to measure node purity and decide splits.
Q100. What is the outcome of a “one-vs-rest” (OvR) strategy in multiclass classification?
A. It fits one classifier per class, treating it as the positive class and all others as negative.
B. It fits one classifier for all classes simultaneously.
C. It ignores one class and classifies the rest.
D. It rests the model after every iteration.
Answer: A
OvR is a heuristic method for multi-class classification involving training multiple binary classifiers.
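As a rough illustration, Scikit-learn's OneVsRestClassifier wrapper makes the strategy explicit. The sketch assumes the three-class Iris dataset and logistic regression as the underlying binary classifier:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes

# Wrap a binary classifier; one copy is fitted per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # one fitted binary classifier per class
print(ovr.predict(X[:5]))
```

With three classes, three binary classifiers are trained, each separating one class from the other two.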
Conclusion
That’s it for this question bank of 100 Python Machine Learning MCQs! You do not need to memorise the questions or answers. Simply bookmark this page and review it twice a week. With consistent repetition, within a few weeks you will naturally begin to remember these concepts and deepen your understanding of Machine Learning.
If you want to go further in Python and strengthen your interview preparation, you can also explore questions focused on specific libraries and frameworks:
- Top 50 Pandas Interview Questions
- Top 50 Flask Interview Questions
- Top 50 Django Interview Questions
These resources will help you understand and prepare for real-world Python development and interviews.


