Metrics
API reference for classification and regression metrics.
Every metric implements:
metric.calculate(predictions, y)

which returns a scalar (or, for ConfusionMatrix, an (n_classes, n_classes) array). model.fit() and model.evaluate() call this with predictions = model.predict(X) (or the cached forward output) and the raw y passed to fit/evaluate.
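As a rough illustration of this contract, here is a minimal sketch of a user-defined metric. MaxAbsError is a hypothetical class, not part of vanillanets, and the sketch assumes the model only needs an object exposing calculate(predictions, y), which this page implies but does not state outright.

```python
import numpy as np


class MaxAbsError:
    """Hypothetical metric (not part of vanillanets) illustrating the calculate contract."""

    def calculate(self, predictions, y):
        # Return a plain scalar, matching what the built-in metrics return
        return float(np.max(np.abs(np.asarray(y) - np.asarray(predictions))))


# Largest absolute difference between prediction and target is |5.0 - 3.0|
worst = MaxAbsError().calculate(np.array([1.0, 2.5, 3.0]), np.array([1.0, 2.0, 5.0]))
```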
Label conversion (_to_labels)
Every classification metric (Accuracy, Precision, Recall, F1Score, ConfusionMatrix) first converts predictions and y to 1D integer label arrays:
if predictions.shape[1] == 1:
    predictions = (predictions > 0.5).astype(int).flatten()  # strict >
    if y.ndim == 2:
        y = y.flatten()  # (n,1), not one-hot
else:
    predictions = np.argmax(predictions, axis=1)
    if y.ndim == 2:
        y = np.argmax(y, axis=1)  # (n,C) one-hot -> labels
y = y.astype(int)

- The binary threshold is > 0.5 (strict): a prediction of exactly 0.5 is classified as 0.
- For multiclass, argmax ties resolve to the lowest index (NumPy default).
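The conversion described in this section can be sketched as a standalone function. The name to_labels is a stand-in chosen for this example, and the nesting of the branches is an assumption inferred from the inline comments, so the library's private _to_labels may differ in detail.

```python
import numpy as np


def to_labels(predictions, y):
    """Stand-in for the library's _to_labels; branch nesting is an assumption."""
    if predictions.shape[1] == 1:
        # Binary: a single output column, thresholded with a strict >
        predictions = (predictions > 0.5).astype(int).flatten()
        if y.ndim == 2:
            y = y.flatten()  # (n, 1) column of labels, not one-hot
    else:
        # Multiclass: pick the highest-scoring column
        predictions = np.argmax(predictions, axis=1)
        if y.ndim == 2:
            y = np.argmax(y, axis=1)  # (n, C) one-hot -> labels
    return predictions, y.astype(int)


# Binary case: the middle prediction is exactly 0.5 and lands in class 0.
binary_preds, binary_y = to_labels(
    np.array([[0.2], [0.5], [0.9]]),
    np.array([[0], [1], [1]]),
)

# Multiclass case: the first row ties between columns 0 and 1, so argmax picks 0.
multi_preds, multi_y = to_labels(
    np.array([[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]]),
    np.array([[0, 1, 0], [0, 0, 1]]),
)
```

If outputs of exactly 0.5 are plausible for your model (for example a saturated or constant output), note that they always count as negative under this rule.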
Accuracy
class vanillanets.metrics.Accuracy
predictions, y = _to_labels(predictions, y)
return np.mean(predictions == y)
Precision
class vanillanets.metrics.Precision
classes = np.unique(np.concatenate([predictions, y]))
if len(classes) <= 2:
    tp = np.sum((predictions == 1) & (y == 1))
    fp = np.sum((predictions == 1) & (y == 0))
    return tp / (tp + fp + 1e-7)
# else: macro-average TP/(TP+FP) across classes

- The binary branch treats label 1 as the positive class. If your positive class uses a different integer value, this returns the wrong number.
- len(classes) <= 2 also fires for a batch containing only one class (e.g. all-negative), which is then treated as binary.
- 1e-7 is added to every denominator; a zero denominator returns ~0 instead of raising an error or returning NaN.
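The caveats above are easier to see with concrete numbers. The following sketch copies the binary branch into a free function; binary_precision and its eps parameter are names chosen for this example, and the inputs are assumed to be 1D integer label arrays that have already been through label conversion.

```python
import numpy as np


def binary_precision(predictions, y, eps=1e-7):
    """Illustrative copy of the binary branch; label 1 is hard-coded as positive."""
    tp = np.sum((predictions == 1) & (y == 1))
    fp = np.sum((predictions == 1) & (y == 0))
    return tp / (tp + fp + eps)


# Labels in {0, 1}: two true positives and one false positive.
y01 = np.array([1, 0, 1, 0])
p01 = np.array([1, 1, 1, 0])
precision_01 = binary_precision(p01, y01)  # close to 2/3

# Same data encoded as {1, 2}, with 2 as the intended positive class.
# The formula still counts label 1 as positive and never finds a label 0,
# so the result (close to 1.0) no longer describes class 2.
precision_12 = binary_precision(p01 + 1, y01 + 1)

# All-negative batch: tp + fp == 0, so the epsilon gives 0.0 rather than NaN.
precision_empty = binary_precision(np.zeros(4, dtype=int), np.zeros(4, dtype=int))
```

Under these assumptions, remapping labels to {0, 1} with the positive class as 1 before calling the metric avoids the second case.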
Recall
class vanillanets.metrics.Recall

Same structure as Precision, computing TP / (TP + FN + 1e-7) (binary) or the macro-average across classes.
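The multiclass branch is only summarized on this page, so the following is a guess at its shape rather than the library's code: a one-vs-rest score per class, each with the same 1e-7 epsilon, averaged without weighting. The function name macro_precision_recall is hypothetical.

```python
import numpy as np


def macro_precision_recall(predictions, y, eps=1e-7):
    """Hypothetical macro-average: unweighted mean of per-class one-vs-rest scores."""
    classes = np.unique(np.concatenate([predictions, y]))
    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((predictions == c) & (y == c))
        fp = np.sum((predictions == c) & (y != c))
        fn = np.sum((predictions != c) & (y == c))
        precisions.append(tp / (tp + fp + eps))
        recalls.append(tp / (tp + fn + eps))
    return float(np.mean(precisions)), float(np.mean(recalls))


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Per-class precision is 1/2, 2/3, 1 and per-class recall is 1/2, 1, 1/2.
macro_p, macro_r = macro_precision_recall(y_pred, y_true)
```

Because every class counts equally in a macro-average, a rare class with poor scores pulls the result down as much as a common one would.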
F1Score
class vanillanets.metrics.F1Score
p = Precision().calculate(predictions, y)
r = Recall().calculate(predictions, y)
return 2 * p * r / (p + r + 1e-7)

Fresh Precision/Recall instances are created on every call, so no state is shared.
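To show what the epsilon does to the final value, here is a small sketch of the combining step alone. f1_from is a helper invented for this example; it takes precision and recall as plain numbers instead of computing them.

```python
def f1_from(p, r, eps=1e-7):
    """Combine precision and recall using the formula from this section."""
    return 2 * p * r / (p + r + eps)


# Typical case: slightly below the exact harmonic mean of 2/3 because of eps.
f1_typical = f1_from(0.5, 1.0)

# Degenerate case: both inputs are 0, and eps turns 0/0 into 0.0.
f1_zero = f1_from(0.0, 0.0)
```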
ConfusionMatrix
class vanillanets.metrics.ConfusionMatrix
def calculate(self, predictions, y, num_classes=None)
cm[true_label, pred_label] += 1  # rows = y (true), columns = predictions

- If num_classes is provided, the matrix is (num_classes, num_classes).
- If omitted, it's inferred as max(predictions.max(), y.max()) + 1. The matrix size can then vary batch-to-batch: a batch that happens to lack the highest-numbered classes produces a smaller matrix that cannot be added directly to matrices from other batches. Pass num_classes explicitly for stable shapes.
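The shape issue can be reproduced with a small sketch. The confusion_matrix function below is an illustrative reimplementation based only on the description in this section, and it assumes both inputs are already 1D integer label arrays.

```python
import numpy as np


def confusion_matrix(predictions, y, num_classes=None):
    """Illustrative version: rows index the true label, columns the predicted label."""
    if num_classes is None:
        # Size inferred from the largest label present in this batch
        num_classes = int(max(predictions.max(), y.max())) + 1
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, pred_label in zip(y, predictions):
        cm[true_label, pred_label] += 1
    return cm


# Two batches from the same 3-class problem; the first contains no class 2.
preds_a, y_a = np.array([0, 1, 1]), np.array([0, 1, 0])
preds_b, y_b = np.array([2, 1, 0]), np.array([2, 2, 0])

batch_a = confusion_matrix(preds_a, y_a)  # inferred as 2 x 2
batch_b = confusion_matrix(preds_b, y_b)  # inferred as 3 x 3

# With num_classes fixed, both matrices share a shape and can be accumulated.
total = (confusion_matrix(preds_a, y_a, num_classes=3)
         + confusion_matrix(preds_b, y_b, num_classes=3))
```

Adding batch_a and batch_b directly would fail with a broadcasting error, which is why a fixed num_classes matters when accumulating over batches.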
R2Score
class vanillanets.metrics.R2Score
def calculate(self, y_pred, y_true):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / (ss_tot + 1e-7)
MAE
class vanillanets.metrics.MAE
def calculate(self, y_pred, y_true):
    return np.mean(np.abs(y_true - y_pred))
RMSE
class vanillanets.metrics.RMSE
def calculate(self, y_pred, y_true):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

Regression metrics take (y_pred, y_true), the same positional order Model always uses (metric.calculate(predictions, y)), so this is consistent in practice. No label conversion is applied; raw arrays are used as-is.
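For a quick sanity check, the three regression formulas can be run on a toy example. The free functions below restate the methods shown in this section; their names are chosen for this sketch. The last line illustrates a general NumPy broadcasting pitfall that may apply because arrays are used as-is; whether the library reshapes inputs elsewhere is not stated here.

```python
import numpy as np


def r2_score(y_pred, y_true, eps=1e-7):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / (ss_tot + eps)


def mae(y_pred, y_true):
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

r2 = r2_score(y_pred, y_true)     # ss_res = 0.5, ss_tot = 5.0
mae_value = mae(y_pred, y_true)   # mean of |errors| = 1.0 / 4
rmse_value = rmse(y_pred, y_true)  # sqrt(0.5 / 4)

# Shape pitfall: an (n, 1) prediction against an (n,) target broadcasts to (n, n),
# so the metric would silently average over n * n pairs instead of n.
broadcast_shape = (y_true - y_pred.reshape(-1, 1)).shape
```

If your model emits predictions of shape (n, 1) while y has shape (n,), reshaping one of them so the shapes match before evaluation is a cautious default.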
Choosing metrics
| Task | Recommended metrics |
|---|---|
| Binary classification | Accuracy, Precision, Recall, F1Score |
| Multiclass classification | Accuracy, Precision, Recall, F1Score, ConfusionMatrix (pass num_classes) |
| Regression | R2Score, MAE, RMSE |
Example
from vanillanets.metrics import Accuracy, Precision, Recall, F1Score

# model, loss and optimizer are assumed to have been created earlier
model.set(
    loss=loss,
    optimizer=optimizer,
    metrics={
        'accuracy': Accuracy(),
        'precision': Precision(),
        'recall': Recall(),
        'f1': F1Score(),
    }
)