Metrics

Every metric implements:

metric.calculate(predictions, y)

returning a scalar (or, for ConfusionMatrix, an (n_classes, n_classes) array). model.fit() and model.evaluate() call this with predictions = model.predict(X) (or the cached forward output) and the raw y passed to fit/evaluate.

Label conversion (`_to_labels`)

Every classification metric (Accuracy, Precision, Recall, F1Score, ConfusionMatrix) first converts predictions and y to 1D integer label arrays:

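import numpy as np

def _to_labels(predictions, y):
    if predictions.shape[1] == 1:
        # binary: a single sigmoid column, thresholded with a strict >
        predictions = (predictions > 0.5).astype(int).flatten()
        if y.ndim == 2:
            y = y.flatten()            # (n,1), not one-hot
    else:
        # multiclass: one column per class
        predictions = np.argmax(predictions, axis=1)
        if y.ndim == 2:
            y = np.argmax(y, axis=1)   # (n,C) one-hot -> labels
    y = y.astype(int)
    return predictions, y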

The binary threshold is > 0.5 (strict) - a prediction of exactly 0.5 is classified as 0.
For multiclass, argmax ties resolve to the lowest index (NumPy default).
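Both edge cases can be checked directly with NumPy. The snippet below applies the same two operations the conversion uses (`> 0.5` on a single column, `np.argmax` across several columns) to hand-picked inputs. It does not call vanillanets itself.

```python
import numpy as np

# Single-column (sigmoid) output: strict > 0.5 threshold, as in the binary branch.
binary_probs = np.array([[0.49], [0.5], [0.51]])
binary_labels = (binary_probs > 0.5).astype(int).flatten()
print(binary_labels)  # [0 0 1]: exactly 0.5 maps to class 0

# Multi-column output: argmax, ties resolve to the lowest index.
multi_probs = np.array([
    [0.4, 0.4, 0.2],   # tie between classes 0 and 1
    [0.1, 0.3, 0.3],   # tie between classes 1 and 2
])
multi_labels = np.argmax(multi_probs, axis=1)
print(multi_labels)  # [0 1]
```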

Accuracy

class vanillanets.metrics.Accuracy

predictions, y = _to_labels(predictions, y)
return np.mean(predictions == y)
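Because both target formats go through the same conversion, integer labels and one-hot labels give the same accuracy. A minimal NumPy sketch that reproduces the two documented steps (argmax, then the mean of matches) by hand, on made-up data:

```python
import numpy as np

# Softmax-style outputs for 4 samples and 3 classes.
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
y_int = np.array([0, 1, 2, 1])   # integer targets
y_onehot = np.eye(3)[y_int]      # the same targets, one-hot encoded

pred_labels = np.argmax(predictions, axis=1)   # [0 1 2 0]
acc_int = np.mean(pred_labels == y_int)
acc_onehot = np.mean(pred_labels == np.argmax(y_onehot, axis=1))
print(acc_int, acc_onehot)  # 0.75 0.75
```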

Precision

class vanillanets.metrics.Precision

classes = np.unique(np.concatenate([predictions, y]))
if len(classes) <= 2:
    tp = np.sum((predictions == 1) & (y == 1))
    fp = np.sum((predictions == 1) & (y == 0))
    return tp / (tp + fp + 1e-7)
# else: macro-average TP/(TP+FP) across classes

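The binary branch treats label 1 as the positive class - if your positive class uses a different integer value, this returns the wrong number.
len(classes) <= 2 fires whenever a batch contains at most two distinct labels, not only for genuinely binary data. An all-negative batch is treated as binary, and so is a multiclass batch in which only, say, labels 0 and 2 appear; the latter returns 0.0 because no label 1 is present.
1e-7 is added to every denominator. A zero denominator therefore returns 0.0 (the numerator is then 0 as well) instead of NaN, and every nonzero score is pulled very slightly below its exact value.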
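The two-label pitfall is easy to reproduce. The sketch below re-implements only the binary branch shown above; `binary_branch_precision` and `per_class_precision` are hypothetical helpers, not vanillanets functions. The per-class macro average used for comparison is a standard definition assumed here, since the library's multiclass branch is only summarized above.

```python
import numpy as np

def binary_branch_precision(predictions, y):
    """Hypothetical re-implementation of the documented binary branch."""
    tp = np.sum((predictions == 1) & (y == 1))
    fp = np.sum((predictions == 1) & (y == 0))
    return tp / (tp + fp + 1e-7)

def per_class_precision(predictions, y, cls):
    """Precision for one class, treating every other label as negative."""
    tp = np.sum((predictions == cls) & (y == cls))
    fp = np.sum((predictions == cls) & (y != cls))
    return tp / (tp + fp + 1e-7)

# A multiclass batch in which only labels 0 and 2 happen to appear.
predictions = np.array([0, 2, 2])
y = np.array([0, 2, 0])

classes = np.unique(np.concatenate([predictions, y]))
print(classes)                                   # [0 2] -> len <= 2, binary branch
print(binary_branch_precision(predictions, y))   # 0.0: there is no label 1 at all
macro = np.mean([per_class_precision(predictions, y, c) for c in classes])
print(round(macro, 4))                           # 0.75
```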

Recall

class vanillanets.metrics.Recall

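Same structure as Precision, computing TP / (TP + FN + 1e-7) in the binary branch or the macro-average across classes otherwise. The Precision caveats (label 1 as the positive class, the len(classes) <= 2 check, the 1e-7 term) apply unchanged.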
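A worked binary case, using a hypothetical `binary_recall` helper that mirrors the documented formula rather than calling the library. It also shows a side effect of the 1e-7 term: a perfect result comes out just below 1.0.

```python
import numpy as np

def binary_recall(predictions, y):
    """Hypothetical re-implementation of TP / (TP + FN + 1e-7)."""
    tp = np.sum((predictions == 1) & (y == 1))
    fn = np.sum((predictions == 0) & (y == 1))
    return tp / (tp + fn + 1e-7)

y = np.array([1, 1, 1, 0, 0])
predictions = np.array([1, 0, 1, 1, 0])
print(binary_recall(predictions, y))   # about 0.6667: 2 of 3 positives found

# A perfect score is slightly below 1.0 because of the epsilon.
perfect = binary_recall(y, y)
print(perfect < 1.0)                   # True
```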

F1Score

class vanillanets.metrics.F1Score

p = Precision().calculate(predictions, y)
r = Recall().calculate(predictions, y)
return 2 * p * r / (p + r + 1e-7)

Fresh Precision/Recall instances are created on every call - no shared state.
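For multiclass data, F1Score combines already macro-averaged precision and recall, so its value is the harmonic mean of the macro averages. That generally differs from the mean of per-class F1 scores, which is how scikit-learn's `f1_score(average='macro')` defines macro F1. A small sketch with a hypothetical `per_class_pr` helper and no epsilon terms, assuming the macro average runs over every label present in predictions or y:

```python
import numpy as np

def per_class_pr(predictions, y, cls):
    """Hypothetical helper: (precision, recall) for one class."""
    tp = np.sum((predictions == cls) & (y == cls))
    fp = np.sum((predictions == cls) & (y != cls))
    fn = np.sum((predictions != cls) & (y == cls))
    return tp / (tp + fp), tp / (tp + fn)

y = np.array([0, 0, 0, 1, 1, 2])
predictions = np.array([0, 0, 1, 1, 2, 2])
classes = np.unique(np.concatenate([predictions, y]))

pr = np.array([per_class_pr(predictions, y, c) for c in classes])  # rows: (P, R)
macro_p, macro_r = pr.mean(axis=0)

# F1 as computed above: harmonic mean of macro-averaged P and R.
f1_of_macros = 2 * macro_p * macro_r / (macro_p + macro_r)

# Macro F1 in the scikit-learn sense: mean of per-class F1 scores.
per_class_f1 = 2 * pr[:, 0] * pr[:, 1] / (pr[:, 0] + pr[:, 1])
macro_of_f1 = per_class_f1.mean()

print(round(f1_of_macros, 4), round(macro_of_f1, 4))  # 0.6933 0.6556
```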

ConfusionMatrix

class vanillanets.metrics.ConfusionMatrix
    def calculate(self, predictions, y, num_classes=None):

cm[true_label, pred_label] += 1   # rows = y (true), columns = predictions

If num_classes is provided, the matrix is (num_classes, num_classes).
If omitted, it's inferred as max(predictions.max(), y.max()) + 1, so the matrix size can vary batch-to-batch: when the highest-numbered classes are missing from a batch, its matrix is smaller and cannot be summed with or compared against other batches' matrices. Pass num_classes explicitly for stable shapes.
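A short sketch of the shape problem. `confusion_matrix` below is a hypothetical stand-in written from the description above (rows are true labels, columns are predictions), not the library's code; it uses `np.add.at` so repeated (true, pred) pairs are all counted.

```python
import numpy as np

def confusion_matrix(predictions, y, num_classes=None):
    """Hypothetical sketch: rows = true labels, columns = predicted labels."""
    predictions = np.asarray(predictions, dtype=int)
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        num_classes = max(predictions.max(), y.max()) + 1
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y, predictions), 1)   # cm[true, pred] += 1, duplicates counted
    return cm

batch_a = (np.array([0, 1, 2]), np.array([0, 2, 2]))   # (predictions, y)
batch_b = (np.array([0, 1, 1]), np.array([0, 1, 0]))   # class 2 absent

print(confusion_matrix(*batch_a).shape)   # (3, 3)
print(confusion_matrix(*batch_b).shape)   # (2, 2): cannot be added to batch_a's matrix

# With num_classes fixed, per-batch matrices share a shape and can be summed.
total = sum(confusion_matrix(p, t, num_classes=3) for p, t in (batch_a, batch_b))
print(total)   # rows: [2 1 0], [0 1 0], [0 1 1]
```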

R2Score

class vanillanets.metrics.R2Score
    def calculate(self, y_pred, y_true):
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1 - ss_res / (ss_tot + 1e-7)
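Reference points for reading the score, using a local copy of the formula above (`r2` is a hypothetical helper). The last case shows what the 1e-7 guard does when every target is the same: ss_tot is 0, so the result is dominated by ss_res / 1e-7 and becomes a large negative number instead of NaN.

```python
import numpy as np

def r2(y_pred, y_true):
    # Same formula as R2Score above, reproduced for illustration.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / (ss_tot + 1e-7)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y_true, y_true))            # 1.0: perfect fit
print(r2(np.full(4, 2.5), y_true))   # about 0.0: always predicting the mean
print(r2(y_true + 2.0, y_true))      # about -2.2: worse than predicting the mean

# A constant target makes ss_tot zero, so even a tiny error explodes.
y_const = np.array([2.0, 2.0, 2.0])
print(r2(np.array([2.0, 2.0, 2.1]), y_const))  # about -1e5
```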

MAE

class vanillanets.metrics.MAE
    def calculate(self, y_pred, y_true):
        return np.mean(np.abs(y_true - y_pred))

RMSE

class vanillanets.metrics.RMSE
    def calculate(self, y_pred, y_true):
        return np.sqrt(np.mean((y_true - y_pred) ** 2))
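A worked example with the same formulas as MAE and RMSE, on made-up data. One large error (3.0) pulls RMSE noticeably above MAE, because squaring weights large errors more heavily than the absolute value does.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 6.0, 2.0, 4.0])   # errors: 1, -1, 0, 3

mae = np.mean(np.abs(y_true - y_pred))            # (1 + 1 + 0 + 3) / 4
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # sqrt((1 + 1 + 0 + 9) / 4)
print(mae, rmse)   # 1.25 and about 1.658
```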

Regression metrics take (y_pred, y_true) - the same positional order Model always uses (metric.calculate(predictions, y)), so this is consistent in practice. No label conversion is applied; raw arrays are used as-is.
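Because no conversion is applied, shapes must match exactly. If predictions come back as an (n, 1) column and y is a flat (n,) array, NumPy broadcasting silently turns y_true - y_pred into an (n, n) matrix. A minimal demonstration with plain NumPy; the shapes are assumptions about a typical single-output regression model:

```python
import numpy as np

y_pred = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1), e.g. a one-unit output layer
y_true = np.array([1.0, 2.0, 3.0])         # shape (3,): a perfect match

# (3,) - (3, 1) broadcasts to (3, 3): every prediction is compared to every target.
print((y_true - y_pred).shape)             # (3, 3)
bad_mae = np.mean(np.abs(y_true - y_pred))
print(bad_mae)                             # about 0.889 instead of 0.0

# Reshaping the targets to a column restores element-wise comparison.
good_mae = np.mean(np.abs(y_true.reshape(-1, 1) - y_pred))
print(good_mae)                            # 0.0
```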

Choosing metrics

Task	Recommended metrics
Binary classification	`Accuracy`, `Precision`, `Recall`, `F1Score`
Multiclass classification	`Accuracy`, `Precision`, `Recall`, `F1Score`, `ConfusionMatrix` (pass `num_classes`)
Regression	`R2Score`, `MAE`, `RMSE`

Example

from vanillanets.metrics import Accuracy, Precision, Recall, F1Score

# model, loss and optimizer are assumed to be defined already
model.set(
    loss=loss,
    optimizer=optimizer,
    metrics={
        'accuracy': Accuracy(),
        'precision': Precision(),
        'recall': Recall(),
        'f1': F1Score(),
    }
)
