Python predict() function – All you need to know!

Every ML model, regardless of how it was trained or what framework built it, eventually does the same thing: it takes input and produces output. Python’s model.predict() operation is how you ask for that output. It looks simple. It is simple, right up until it isn’t.

The same method name appears across scikit-learn, Keras, TensorFlow, PyTorch, XGBoost, LightGBM, and most other ML frameworks. But “predict” means slightly different things in each. The return shapes differ. The input expectations differ. The performance characteristics differ. And the ways it can fail differ too. This article covers what predict() actually does, how it behaves across the major frameworks, and the practical issues you’ll hit when running it in production.

TLDR

  • predict() runs inference mode without updating weights
  • sklearn returns class labels; Keras with activation returns probabilities; PyTorch requires model.eval() and torch.no_grad()
  • Almost all frameworks require 2D input shape (n_samples, n_features) even for single samples
  • predict_proba() gives class probabilities in sklearn, XGBoost, and LightGBM
  • PyTorch has no built-in predict() method – call the model directly

What model.predict() Actually Does

predict() runs the model in inference mode. It passes your input data through the forward pass and returns the model’s predictions. Unlike fit() or train(), it does not update any weights. It is a pure computation.

At a high level:

1. The input is preprocessed and formatted to match what the model expects
2. The model runs its forward pass
3. Raw outputs (logits, probabilities, regression values) are returned

The critical thing to understand: some frameworks apply your final activation function inside predict(), and others return raw outputs. Confusion starts here.

scikit-learn: The Baseline

scikit-learn has the most consistent and predictable predict() behavior. It is the reference implementation that most other frameworks loosely follow.

Classification


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions.shape)  # (200,)
print(predictions[:10])  # array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])

predict() returns a NumPy array of class labels. For binary classification, these are 0 or 1. For multiclass, these are integer indices.

Getting Probabilities in sklearn

If you want probabilities, you need predict_proba():


probabilities = model.predict_proba(X_test)
print(probabilities.shape)  # (200, 2) — two classes
print(probabilities[:3])
# [[0.85, 0.15],
#  [0.12, 0.88],
#  [0.73, 0.27]]

Note that predict_proba() returns the probability for each class. The order matches model.classes_. If you need just the positive class probability in binary classification, use predict_proba(X_test)[:, 1].
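The relationship between predict(), predict_proba(), and model.classes_ can be checked directly. A minimal self-contained sketch — the LogisticRegression model and data here are illustrative, not the forest from the snippet above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative model: predict() equals argmax over predict_proba(),
# mapped back to labels through model.classes_
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
assert (labels_from_proba == model.predict(X)).all()
```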

Regression


from sklearn.ensemble import GradientBoostingRegressor

# (assumes X_train/y_train come from a regression dataset with continuous targets)
model = GradientBoostingRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions.shape)  # (200,)
print(predictions[:5])  # array([2.34, -0.87, 1.56, 3.21, 0.12])

For regressors, predict() returns floating-point values directly. There is no separate method for raw scores vs. final output: the returned value is always the final prediction.

sklearn Summary

Model Type         Return Shape              Return Type
Classifier         (n_samples,)              Integer labels
Regressor          (n_samples,)              Float values
predict_proba()    (n_samples, n_classes)    Float probabilities

Keras / TensorFlow: Classification Requires Sigmoid or Softmax

Keras is where most developers hit their first predict() surprise. predict() returns exactly what the final layer produces: probabilities if the output layer applies sigmoid or softmax, raw logits if it has no activation.

Binary Classification


import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Binary classification model
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid')  # sigmoid on output layer
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=10, verbose=0)

# sigmoid is applied on the output layer, so predict() returns probabilities
predictions = model.predict(X_test, verbose=0)
print(predictions.shape)  # (n_samples, 1)
print(predictions[:5].flatten())
# [0.87, 0.12, 0.65, 0.91, 0.34]

Here’s the gotcha: if you build a binary classification model without an activation function on the final layer (i.e., you plan to apply sigmoid manually), then predict() returns raw logits. If you use activation='sigmoid' on the final layer, predict() returns probabilities.


# Without sigmoid on final layer — returns logits
model_logits = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1)  # linear output — raw logits
])
model_logits.compile(optimizer='adam', loss='binary_crossentropy')
model_logits.fit(X_train, y_train, epochs=10, verbose=0)

raw_output = model_logits.predict(X_test, verbose=0)
# raw_output is logits, not probabilities
# Apply sigmoid manually to convert: 1 / (1 + np.exp(-raw_output))
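The manual conversion can use a numerically stable helper instead of writing the formula by hand. A small sketch with scipy's expit (the logistic sigmoid), applied to stand-in logits rather than real model output:

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid

logits = np.array([[-2.0], [0.0], [3.0]])  # stand-in for model_logits.predict() output
probs = expit(logits)  # same as 1 / (1 + np.exp(-logits)), without overflow issues
print(probs.flatten())  # [0.1192 0.5    0.9526] (approximately)
```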

Multiclass Classification


from tensorflow.keras.utils import to_categorical

# One-hot encode labels for multiclass
# (assumes y_train/y_test contain three classes, unlike the binary data above)
y_train_cat = to_categorical(y_train, num_classes=3)
y_test_cat = to_categorical(y_test, num_classes=3)

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(3, activation='softmax')  # softmax for multiclass
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train_cat, epochs=10, verbose=0)

predictions = model.predict(X_test, verbose=0)
print(predictions.shape)  # (n_samples, 3)
print(predictions[:3])
# [[0.05, 0.12, 0.83],
#  [0.71, 0.22, 0.07],
#  [0.33, 0.45, 0.22]]

With softmax on the final layer, predict() returns probabilities that sum to 1.0 across each row.
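To turn those rows into class labels, take the argmax along axis 1. A small sketch using the probabilities printed above as stand-in data:

```python
import numpy as np

probs = np.array([[0.05, 0.12, 0.83],
                  [0.71, 0.22, 0.07],
                  [0.33, 0.45, 0.22]])  # stand-in predict() output

assert np.allclose(probs.sum(axis=1), 1.0)  # softmax rows sum to 1
labels = probs.argmax(axis=1)               # predicted class per sample
print(labels)  # [2 0 1]
```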

Using predict() with Models Without Output Activation

If you are doing custom training loops or using logits directly, you need to know how to handle raw outputs:


# Raw logits from a model without softmax
logits = model_logits.predict(X_test, verbose=0)

# Convert to probabilities (for numerical stability, subtract the row max from logits first)
probabilities = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
# Or simply:
from scipy.special import softmax
probabilities = softmax(logits, axis=1)

predict() vs predict_on_batch()

Keras predict() is designed to handle large datasets by processing in batches internally. For small datasets, this overhead can actually slow things down. Use predict_on_batch() when you know your input size and want to avoid the batch-scheduling overhead:


# Standard predict — handles batching internally
predictions = model.predict(X_test, batch_size=32, verbose=1)

# Manual batch processing for small data
for i in range(0, len(X_test), 32):
    batch = X_test[i:i+32]
    batch_preds = model.predict_on_batch(batch)

predict() with verbose=1 shows a progress bar, which is useful for large datasets. predict_on_batch() has no progress output; it is a direct computation call.

PyTorch: No Built-In predict() Method

PyTorch does not have a model.predict() method. Developers coming from sklearn or Keras trip up here.

Instead, you put the model in evaluation mode and call the model directly:


import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 1)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

model = SimpleClassifier(input_dim=20)
model.eval()  # Critical: set to evaluation mode

# Inference
with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    predictions = model(X_test_tensor)
    
print(predictions.shape)  # (200, 1)
print(predictions[:5].numpy().flatten())

The eval() Mode Matters

Dropout layers, batch normalization, and other stochastic layers behave differently in training vs. evaluation. Always call model.eval() before inference:


model.eval()  # Disables dropout, uses running stats for BatchNorm

with torch.no_grad():  # Disables gradient computation
    predictions = model(X_test_tensor)

Common PyTorch Inference Patterns


# Batch inference
def predict_batch(model, X, batch_size=64):
    model.eval()
    predictions = []
    with torch.no_grad():
        for i in range(0, len(X), batch_size):
            batch = torch.tensor(X[i:i+batch_size], dtype=torch.float32)
            preds = model(batch)
            predictions.append(preds.numpy())
    return np.concatenate(predictions)

# CPU vs GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

with torch.no_grad():
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
    predictions = model(X_test_tensor).cpu().numpy()

PyTorch with torch.compile (PyTorch 2.0+)

PyTorch 2.0 introduced torch.compile(), which JIT-compiles the model for faster inference:


model = SimpleClassifier(input_dim=20)
model.eval()

# Compilation can speed up inference; gains vary by model and hardware
compiled_model = torch.compile(model)

with torch.no_grad():
    predictions = compiled_model(X_test_tensor)

XGBoost and LightGBM: Native Gradient Boosting

XGBoost and LightGBM have their own predict() methods that behave similarly to sklearn but with important differences.

XGBoost


import xgboost as xgb

# use_label_encoder is deprecated in recent XGBoost versions and can be omitted
model = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss')
model.fit(X_train, y_train)

# Default: returns class predictions
predictions = model.predict(X_test)
print(predictions.shape)  # (200,)

# Probabilities
probabilities = model.predict_proba(X_test)
print(probabilities.shape)  # (200, 2)

XGBoost with raw_score and pred_leaf

XGBoost exposes additional prediction types:


# Raw margin scores (before the global link function)
raw_scores = model.predict(X_test, output_margin=True)

# Leaf indices (useful for tree interpretation) — the sklearn wrapper exposes apply();
# pred_leaf=True is a Booster.predict() option, not an XGBClassifier.predict() one
leaf_indices = model.apply(X_test)
print(leaf_indices.shape)  # (200, n_trees) — which leaf each tree puts the sample in

LightGBM


import lightgbm as lgb

model = lgb.LGBMClassifier(n_estimators=100)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

# Raw scores
raw_scores = model.predict(X_test, raw_score=True)

# Leaf indices
leaf_preds = model.predict(X_test, pred_leaf=True)

Common Pitfalls Across All Frameworks

1. Input Shape Mismatches

The single most common error. model.predict() almost always expects 2D input (n_samples, n_features), even if you are predicting a single sample.


# Wrong — 1D array
single_sample = X_test[0]
predictions = model.predict(single_sample)  # Shape mismatch error

# Correct — 2D array
single_sample = X_test[0:1]  # Shape (1, n_features)
predictions = model.predict(single_sample)

Here is the tricky part: NumPy happily treats a 1D array as a single row in many other operations, but predict() validates its input shape strictly. Keep single samples 2D with X[0:1], X[[0]], or X[0].reshape(1, -1).
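A self-contained version of the fix, using an illustrative sklearn model and data rather than the variables above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative model and data
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

sample_1d = X[0]                      # shape (5,) — predict() rejects this
sample_2d = sample_1d.reshape(1, -1)  # shape (1, 5) — what predict() expects
pred = model.predict(sample_2d)
assert pred.shape == (1,)
```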

2. Not Setting the Model to Evaluation Mode (PyTorch)

Dropout being active during inference will randomly zero out neurons, producing different outputs every call. Always:


model.eval()  # Before inference

3. Forgetting That predict() Returns Indices, Not Probabilities (sklearn)


# Wrong assumption — predict() returns labels (0/1), not probabilities,
# and using an array in an `if` raises ValueError (ambiguous truth value)
if model.predict(X_test) > 0.5:
    ...

# Correct — for binary classification
proba = model.predict_proba(X_test)[:, 1]
predictions = (proba > 0.5).astype(int)

4. Keras predict() Batching Overhead for Small Inputs

For small test sets, Keras predict() can be slower than expected due to internal batch scheduling:


# Slow for small data — batch scheduling overhead
predictions = model.predict(X_small, verbose=0)

# Faster for small data
predictions = model.predict_on_batch(X_small)

5. Ignoring the Dtype of Your Input


# If your training data was float32 but inference is float64
X_test_wrong = np.array(X_test, dtype=np.float64)
predictions = model.predict(X_test_wrong)  # May work or may cast unexpectedly

# Ensure matching dtype
X_test_correct = np.array(X_test, dtype=np.float32)
predictions = model.predict(X_test_correct)

6. XGBoost/LightGBM Using Wrong Input Type After sklearn

sklearn models accept pandas DataFrames. XGBoost and LightGBM often work better with their native data structures for large datasets:


import xgboost as xgb

# DMatrix is XGBoost's native data structure — faster for large data
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)

params = {'objective': 'binary:logistic'}  # minimal illustrative parameters
model = xgb.train(params, dtrain, num_boost_round=100)
predictions = model.predict(dtest)  # Note: different API — the model is a Booster, not a Classifier

Batch Prediction Performance

When you need to predict on large datasets, how you batch matters:


def batch_predict(model, X, framework='sklearn', batch_size=1000):
    n_samples = len(X)
    predictions = []
    
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        batch = X[start:end]
        
        if framework == 'sklearn':
            preds = model.predict(batch)
        elif framework == 'keras':
            preds = model.predict(batch, verbose=0)
        elif framework == 'pytorch':
            with torch.no_grad():
                batch_tensor = torch.tensor(batch, dtype=torch.float32)
                preds = model(batch_tensor).numpy()
        
        predictions.append(preds)
    
    return np.concatenate(predictions)

Key points:

  • sklearn: internal batching is usually sufficient, pass the whole array
  • Keras: batch_size parameter in predict() controls internal batching; set it based on your memory constraints
  • PyTorch: manual batching gives you full control

What About predict_proba() and Other Variants?

Frameworks typically provide variant methods:

Method                 Returns                                                           Available In
predict()              Class labels (sklearn) or probabilities (Keras with activation)   All
predict_proba()        Class membership probabilities                                    sklearn, XGBoost, LightGBM
predict_log_proba()    Log probabilities                                                 sklearn
predict_on_batch()     Same as predict(), one explicit batch                             Keras

Use predict_proba() when you need the uncertainty of a prediction, not just the label. These methods are essential for:

  • Threshold tuning (choosing your own classification threshold)
  • Calibrated probabilities
  • Ensemble methods that weight predictions by confidence
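Threshold tuning is just a comparison against the positive-class column of predict_proba(). A minimal sketch with an illustrative model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative binary classifier
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)[:, 1]  # positive-class probability
strict = (proba > 0.9).astype(int)    # high threshold: fewer positives, higher precision
lenient = (proba > 0.1).astype(int)   # low threshold: more positives, higher recall
assert strict.sum() <= lenient.sum()
```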

Putting It Together: A Framework-Agnostic predict() Wrapper

If you are working with multiple frameworks in the same codebase, a thin wrapper can smooth over the differences:


import numpy as np
import torch  # required for the 'pytorch' branch

def predict(model, X, framework='sklearn', proba=False):
    X = np.asarray(X)
    
    if X.ndim == 1:
        X = X.reshape(1, -1)  # Ensure 2D
    
    if framework == 'sklearn':
        if proba:
            return model.predict_proba(X)
        return model.predict(X)
    
    elif framework == 'keras':
        preds = model.predict(X, verbose=0)
        if proba:
            return preds
        # assumes binary sigmoid output; use preds.argmax(axis=1) for softmax models
        return (preds > 0.5).astype(int).flatten()
    
    elif framework == 'pytorch':
        model.eval()
        with torch.no_grad():
            X_tensor = torch.tensor(X, dtype=torch.float32)
            preds = model(X_tensor).numpy()
        if proba:
            return preds
        return (preds > 0.5).astype(int).flatten()
    
    elif framework in ('xgboost', 'lightgbm'):
        if proba:
            return model.predict_proba(X)
        return model.predict(X)
    
    else:
        raise ValueError(f"Unknown framework: {framework}")

The Core Principle

model.predict() is a framework-specific inference call that:

  • Takes your preprocessed input data
  • Runs the forward pass without updating weights
  • Returns predictions in framework-specific format (labels, probabilities, or raw scores)

FAQ

Q: Does model.predict() update weights?

No. predict() runs inference mode, which is a pure forward pass with no weight updates. Only fit(), train(), or backward() operations change model parameters.
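For sklearn this is easy to verify directly; a small sketch with an illustrative model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative model: predict() must not change the fitted coefficients
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

coef_before = model.coef_.copy()
model.predict(X)
assert np.array_equal(coef_before, model.coef_)
```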

Q: Why does Keras predict() return different values than sklearn?

sklearn always returns class labels (0 or 1) for classifiers. Keras returns raw values that depend on your output layer activation. If you used sigmoid, you get probabilities. If you used no activation, you get logits and must apply sigmoid manually.

Q: Why does my PyTorch model give different outputs every time I call it?

You probably forgot model.eval() or are still inside a torch.enable_grad() context. Dropout and certain other layers behave differently in training mode. Always call model.eval() and use torch.no_grad() for inference.

Q: Can I use predict() on a single sample?

Yes, but you must pass a 2D array with shape (1, n_features), not a 1D array. Single samples need X[0:1] not X[0] for most frameworks.

Q: What is the difference between predict() and predict_proba()?

predict() returns class labels (for classifiers) or values (for regressors). predict_proba() returns class membership probabilities. Use predict_proba() when you need confidence scores or want to tune your own classification threshold.

The surface similarity across frameworks masks important differences in return types, input shape requirements, and behavior in training vs. evaluation mode. Understanding these differences is what separates code that works in a notebook from code that works in production.

Safa Mulani