Component Architecture

How VanillaNets objects are structured, and what model.fit() does internally.

VanillaNets represents every component of a network - layers, activations, losses, optimizers - as a plain Python object with a small, consistent interface. There is no autograd engine and no computational graph object; Model is essentially an ordered list of these objects, and training is a sequence of explicit method calls on them.

forward / backward contract

Layers and activations:

layer.forward(inputs)   # caches .inputs, sets .output
layer.backward(dvalues) # sets .dinputs (and .dweights / .dbiases for stateful layers)
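
As a minimal sketch of this contract, the stand-in below implements a ReLU-style activation with NumPy. The class name ReLUSketch is hypothetical and is not taken from the VanillaNets source; it only illustrates which attributes each call is expected to set.

```python
import numpy as np


class ReLUSketch:
    """Hypothetical stand-in that follows the forward/backward contract."""

    def forward(self, inputs):
        self.inputs = inputs                  # cached for backward()
        self.output = np.maximum(0, inputs)   # read by the next layer

    def backward(self, dvalues):
        # The gradient passes through only where the input was positive.
        self.dinputs = dvalues * (self.inputs > 0)


relu = ReLUSketch()
relu.forward(np.array([[-1.0, 2.0], [3.0, -4.0]]))
relu.backward(np.ones((2, 2)))
print(relu.output)
print(relu.dinputs)
```

A stateless op like this sets only .dinputs in backward(); a stateful layer would also set .dweights and .dbiases.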

Losses differ slightly:

loss.forward(y_pred, y_true)   # returns a per-sample array (not stored on .output)
loss.calculate(y_pred, y_true) # = np.mean(loss.forward(y_pred, y_true))
loss.backward(dvalues, y_true) # sets .dinputs
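
The loss contract can be sketched the same way. The example below is an illustrative mean-squared-error loss, not the library's own code: the class name MSESketch and the choice to normalize the gradient by batch size are assumptions. Note that the first argument of backward() is named dvalues in the contract, but in practice it receives the model's predictions.

```python
import numpy as np


class MSESketch:
    """Hypothetical loss following the forward/calculate/backward contract."""

    def forward(self, y_pred, y_true):
        # One loss value per sample; returned rather than stored on .output.
        return np.mean((y_true - y_pred) ** 2, axis=-1)

    def calculate(self, y_pred, y_true):
        # Scalar loss: the mean of the per-sample values.
        return np.mean(self.forward(y_pred, y_true))

    def backward(self, dvalues, y_true):
        samples, outputs = dvalues.shape
        # Gradient of the scalar loss with respect to the predictions
        # (normalizing by batch size is an assumption of this sketch).
        self.dinputs = -2 * (y_true - dvalues) / outputs / samples


loss = MSESketch()
y_pred = np.array([[1.0, 2.0], [3.0, 5.0]])
y_true = np.array([[1.0, 0.0], [3.0, 3.0]])

per_sample = loss.forward(y_pred, y_true)
scalar = loss.calculate(y_pred, y_true)
loss.backward(y_pred, y_true)
print(per_sample, scalar)
print(loss.dinputs)
```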

model.fit() and model.evaluate() call .calculate() for the reported scalar loss.

Object categories

| Category | Examples | Trainable params? |
|---|---|---|
| Stateful layers | DenseLayer | Yes - .weights, .biases |
| Stateless ops | ReLU, Sigmoid, Softmax, CategoricalCrossEntropy, ... | No |
| Orchestrators | Model, Optimizer_SGD, Optimizer_Adam | No - operate on other objects |
| Fused shortcut | Activation_Softmax_Loss_CategoricalCrossentropy | No - special-cased by Model.finalize() |

What model.fit() does, per epoch

  1. Forward - iterate self.layers in order; each layer's .output becomes the next layer's input. After the loop, layer_input holds the final layer's output, i.e. the model's predictions.
  2. Loss - data_loss = self.loss.calculate(layer_input, y).
  3. Metrics - each entry in self.metrics is computed as metric.calculate(layer_input, y).
  4. Backward - branches on self.softmax_classifier_output:
    • Fused path (active when the last layer is Softmax and the loss is CategoricalCrossEntropy or SparseCategoricalCrossEntropy): call softmax_classifier_output.backward(layer_input, y), then backprop over self.layers[:-1] - the trailing Softmax is skipped.
    • Standard path: self.loss.backward(layer_input, y), then backprop over all of self.layers.
  5. Optimization - optimizer.pre_update_lr(), then optimizer.update_params(layer) for every layer where hasattr(layer, 'weights'), then optimizer.post_update_params().
  6. Validation (if validation_data is passed) - self.evaluate(X_val, y_val) runs after step 5, so validation metrics for epoch N reflect the weights already updated at epoch N.
  7. Logging - every print_every epochs, prints training metrics, loss, learning rate, and (if applicable) validation metrics.
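
The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the body of model.fit(): it covers only steps 1, 2, 4 (standard path) and 5, and the classes DenseSketch, MSESketch, and SGDSketch, as well as the function run_epoch, are hypothetical stand-ins. The optimizer method names follow the ones listed in step 5.

```python
import numpy as np


class DenseSketch:
    def __init__(self, n_in, n_out, rng):
        self.weights = 0.1 * rng.standard_normal((n_in, n_out))
        self.biases = np.zeros((1, n_out))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases

    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T


class MSESketch:
    def calculate(self, y_pred, y_true):
        return np.mean((y_true - y_pred) ** 2)

    def backward(self, dvalues, y_true):
        self.dinputs = -2 * (y_true - dvalues) / dvalues.size


class SGDSketch:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def pre_update_lr(self):
        pass  # no learning-rate decay in this sketch

    def update_params(self, layer):
        layer.weights -= self.learning_rate * layer.dweights
        layer.biases -= self.learning_rate * layer.dbiases

    def post_update_params(self):
        pass


def run_epoch(layers, loss, optimizer, X, y):
    layer_input = X                       # 1. Forward
    for layer in layers:
        layer.forward(layer_input)
        layer_input = layer.output
    data_loss = loss.calculate(layer_input, y)   # 2. Loss
    loss.backward(layer_input, y)         # 4. Backward (standard path)
    dvalues = loss.dinputs
    for layer in reversed(layers):
        layer.backward(dvalues)
        dvalues = layer.dinputs
    optimizer.pre_update_lr()             # 5. Optimization
    for layer in layers:
        if hasattr(layer, "weights"):
            optimizer.update_params(layer)
    optimizer.post_update_params()
    return data_loss


rng = np.random.default_rng(0)
X = rng.standard_normal((32, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]])
layers = [DenseSketch(3, 1, rng)]
loss, optimizer = MSESketch(), SGDSketch(learning_rate=0.1)
losses = [run_epoch(layers, loss, optimizer, X, y) for _ in range(50)]
print(losses[0], losses[-1])
```

Because every step is an ordinary method call, the loop can be stepped through in a debugger or rearranged without touching any framework internals.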

predict() vs. the forward pass inside fit()

model.predict(X) runs the same per-layer forward loop and returns the final .output. It's a separate code path - fit() does not call predict(). Both loops leave .inputs/.output cached on every layer; the backward pass in fit() reads the values cached by fit()'s own forward loop.

Writing custom components

  • Custom activation or loss: implement forward/backward per the contracts above.
  • Custom stateful layer: must expose .weights, .biases, .dweights, .dbiases under exactly those names - model.fit() selects layers for optimization solely via hasattr(layer, 'weights').
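
As an illustration of the stateful-layer requirement, here is a sketch of a per-feature scale-and-shift layer. ScaleShiftSketch and ReLUSketch are hypothetical names invented for this example; only the four attribute names and the hasattr check come from the description above.

```python
import numpy as np


class ScaleShiftSketch:
    """Hypothetical custom layer: output = inputs * weights + biases."""

    def __init__(self, n_features):
        self.weights = np.ones((1, n_features))
        self.biases = np.zeros((1, n_features))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs * self.weights + self.biases

    def backward(self, dvalues):
        # Parameter gradients, summed over the batch dimension.
        self.dweights = np.sum(dvalues * self.inputs, axis=0, keepdims=True)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        # Gradient handed to the previous layer.
        self.dinputs = dvalues * self.weights


class ReLUSketch:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        self.dinputs = dvalues * (self.inputs > 0)


layers = [ScaleShiftSketch(2), ReLUSketch()]

# The same check described above: only objects with .weights are optimized.
trainable = [layer for layer in layers if hasattr(layer, "weights")]

scale = layers[0]
scale.forward(np.array([[1.0, 2.0], [3.0, 4.0]]))
scale.backward(np.ones((2, 2)))
print(len(trainable), scale.dweights, scale.dbiases)
```

A layer that stored its parameters under other names (say, .kernel) would pass through the forward and backward loops but would be skipped by this check.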

Inspecting state

Every object stores its inputs, outputs, and gradients as plain attributes. You can pause after any forward()/backward() call and read layer.output, layer.dweights, loss.dinputs, etc. directly - no hooks or debug mode required.
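
For example, with a small dense layer you can run one forward and one backward call by hand and read the results directly. DenseSketch below is a hypothetical stand-in written for this illustration, not the library's DenseLayer, and its constructor signature is an assumption.

```python
import numpy as np


class DenseSketch:
    """Hypothetical dense layer with fixed, hand-picked parameters."""

    def __init__(self, weights, biases):
        self.weights = weights
        self.biases = biases

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases

    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T


layer = DenseSketch(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros((1, 2)))
layer.forward(np.array([[2.0, 4.0]]))
layer.backward(np.array([[1.0, 1.0]]))

# Everything is a plain attribute, readable at any point.
print(layer.output)    # [[4. 6.]]
print(layer.dweights)  # [[2. 2.] [4. 4.]]
print(layer.dinputs)   # [[0.  2.5]]
```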
