Component Architecture

VanillaNets represents every component of a network - layers, activations, losses, optimizers - as a plain Python object with a small, consistent interface. There is no autograd engine and no computational graph object; Model simply holds a list of layers (self.layers), and training is a sequence of explicit method calls.

forward / backward contract

Layers and activations:

layer.forward(inputs)   # caches .inputs, sets .output
layer.backward(dvalues) # sets .dinputs (and .dweights / .dbiases for stateful layers)
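
As a minimal sketch of this contract, the two classes below follow the same forward/backward shape. DenseSketch and ReLUSketch are hypothetical stand-ins written for illustration, not the library's own DenseLayer or ReLU, and the weight initialisation is an assumption.

```python
import numpy as np


class DenseSketch:
    """Hypothetical stateful layer following the forward/backward contract."""

    def __init__(self, n_inputs, n_neurons, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights and zero biases (initialisation is an assumption).
        self.weights = 0.01 * rng.standard_normal((n_inputs, n_neurons))
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.inputs = inputs                                # cached for backward()
        self.output = inputs @ self.weights + self.biases

    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues             # gradient w.r.t. weights
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T             # passed to the previous layer


class ReLUSketch:
    """Hypothetical stateless activation: only .dinputs is set in backward()."""

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        self.dinputs = dvalues * (self.inputs > 0)


X = np.array([[1.0, -2.0, 3.0], [0.5, 0.0, -1.0]])
dense, relu = DenseSketch(3, 4), ReLUSketch()

dense.forward(X)
relu.forward(dense.output)
relu.backward(np.ones_like(relu.output))   # pretend upstream gradient of ones
dense.backward(relu.dinputs)
```

Each call only sets attributes on the object it is called on; chaining is done by the caller, which passes .output forward and .dinputs backward.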

Losses differ slightly:

loss.forward(y_pred, y_true)   # returns a per-sample array (not stored on .output)
loss.calculate(y_pred, y_true) # = np.mean(loss.forward(y_pred, y_true))
loss.backward(dvalues, y_true) # sets .dinputs

model.fit() and model.evaluate() call .calculate() for the reported scalar loss.
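
The loss contract above could look like the following sketch. MSESketch is a hypothetical class, not one of the library's losses, and dividing the gradient by the number of samples in backward() is an assumption chosen so that it matches the mean taken in calculate().

```python
import numpy as np


class MSESketch:
    """Hypothetical mean-squared-error loss following the loss contract."""

    def forward(self, y_pred, y_true):
        # Returns one value per sample; nothing is stored on .output.
        return np.mean((y_true - y_pred) ** 2, axis=-1)

    def calculate(self, y_pred, y_true):
        # Scalar loss: mean of the per-sample values.
        return np.mean(self.forward(y_pred, y_true))

    def backward(self, dvalues, y_true):
        # dvalues holds the predictions; the result is stored on .dinputs.
        n_samples, n_outputs = dvalues.shape
        self.dinputs = -2 * (y_true - dvalues) / n_outputs / n_samples


y_pred = np.array([[1.0, 2.0], [3.0, 5.0]])
y_true = np.array([[1.0, 0.0], [3.0, 3.0]])

loss = MSESketch()
per_sample = loss.forward(y_pred, y_true)   # array of shape (2,)
scalar = loss.calculate(y_pred, y_true)     # single float
loss.backward(y_pred, y_true)               # sets loss.dinputs
```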

Object categories

| Category | Examples | Trainable params? |
| --- | --- | --- |
| Stateful layers | `DenseLayer` | Yes - `.weights`, `.biases` |
| Stateless ops | `ReLU`, `Sigmoid`, `Softmax`, `CategoricalCrossEntropy`, ... | No |
| Orchestrators | `Model`, `Optimizer_SGD`, `Optimizer_Adam` | No - operate on other objects |
| Fused shortcut | `Activation_Softmax_Loss_CategoricalCrossentropy` | No - special-cased by `Model.finalize()` |
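
A fused softmax and cross-entropy step is usually worthwhile because the combined gradient with respect to the pre-softmax values reduces to (softmax output - one-hot target) / n_samples. The sketch below checks that identity numerically. The function names are hypothetical, and the source does not confirm that the library computes its fused gradient exactly this way.

```python
import numpy as np


def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def chained_gradient(z, y_onehot):
    """Cross-entropy backward followed by the full softmax Jacobian, per sample."""
    p = softmax(z)
    dloss_dp = -y_onehot / p / len(z)
    dz = np.empty_like(z)
    for i, (p_i, d_i) in enumerate(zip(p, dloss_dp)):
        jacobian = np.diagflat(p_i) - np.outer(p_i, p_i)
        dz[i] = jacobian @ d_i
    return dz


def fused_gradient(z, y_onehot):
    """Closed form of the same gradient, computed in one step."""
    return (softmax(z) - y_onehot) / len(z)


z = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])
y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

same = np.allclose(chained_gradient(z, y), fused_gradient(z, y))
```

The closed form avoids building one Jacobian matrix per sample, which is the usual motivation for special-casing this layer and loss pair.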

What `model.fit()` does, per epoch

1. Forward - iterate self.layers in order; each layer's .output becomes the next layer's input.
2. Loss - data_loss = self.loss.calculate(layer_input, y), where layer_input now holds the last layer's .output.
3. Metrics - each entry in self.metrics is computed as metric.calculate(layer_input, y).
4. Backward - branches on self.softmax_classifier_output:
   - Fused path (active when the last layer is Softmax and the loss is CategoricalCrossEntropy or SparseCategoricalCrossEntropy): call softmax_classifier_output.backward(layer_input, y), then backprop over self.layers[:-1] - the trailing Softmax is skipped.
   - Standard path: self.loss.backward(layer_input, y), then backprop over all of self.layers.
5. Optimization - optimizer.pre_update_lr(), then optimizer.update_params(layer) for every layer where hasattr(layer, 'weights'), then optimizer.post_update_params().
6. Validation (if validation_data is passed) - self.evaluate(X_val, y_val) runs after step 5, so validation metrics for epoch N reflect the weights already updated at epoch N.
7. Logging - every print_every epochs, prints training metrics, loss, learning rate, and (if applicable) validation metrics.
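
The forward, loss, backward, and optimization steps above can be sketched as a single function. Everything here is a hypothetical stand-in built from the contracts described in this section: the classes are not the library's own, only the standard backward path is shown, and metrics, validation, and logging are left out. The optimizer hook names come from the text, while their bodies are assumptions.

```python
import numpy as np


class Dense:
    def __init__(self, n_in, n_out, rng):
        self.weights = 0.1 * rng.standard_normal((n_in, n_out))
        self.biases = np.zeros((1, n_out))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases

    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = dvalues.sum(axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T


class ReLU:
    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        self.dinputs = dvalues * (self.inputs > 0)


class MSE:
    def forward(self, y_pred, y_true):
        return np.mean((y_true - y_pred) ** 2, axis=-1)

    def calculate(self, y_pred, y_true):
        return np.mean(self.forward(y_pred, y_true))

    def backward(self, dvalues, y_true):
        n_samples, n_outputs = dvalues.shape
        self.dinputs = -2 * (y_true - dvalues) / n_outputs / n_samples


class SGD:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.iterations = 0

    def pre_update_lr(self):
        pass  # assumption: a learning-rate schedule would be applied here

    def update_params(self, layer):
        layer.weights -= self.learning_rate * layer.dweights
        layer.biases -= self.learning_rate * layer.dbiases

    def post_update_params(self):
        self.iterations += 1


def run_epoch(layers, loss, optimizer, X, y):
    # Forward: each layer's .output feeds the next layer.
    layer_input = X
    for layer in layers:
        layer.forward(layer_input)
        layer_input = layer.output
    # Loss: scalar value for reporting.
    data_loss = loss.calculate(layer_input, y)
    # Backward (standard path): loss first, then layers in reverse order.
    loss.backward(layer_input, y)
    dvalues = loss.dinputs
    for layer in reversed(layers):
        layer.backward(dvalues)
        dvalues = layer.dinputs
    # Optimization: only layers exposing .weights are updated.
    optimizer.pre_update_lr()
    for layer in layers:
        if hasattr(layer, 'weights'):
            optimizer.update_params(layer)
    optimizer.post_update_params()
    return data_loss


rng = np.random.default_rng(0)
X = rng.standard_normal((32, 3))
y = X.sum(axis=1, keepdims=True)            # toy regression target
layers = [Dense(3, 8, rng), ReLU(), Dense(8, 1, rng)]
loss, optimizer = MSE(), SGD(0.05)
history = [run_epoch(layers, loss, optimizer, X, y) for _ in range(200)]
```

Because every step is an explicit call, the whole epoch reads top to bottom with no hidden state beyond the attributes each object sets on itself.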

`predict()` vs. the forward pass inside `fit()`

model.predict(X) runs the same per-layer forward loop and returns the final layer's .output. It is a separate code path: fit() does not call predict(). Both leave .inputs/.output cached on every layer; fit()'s backward pass reads the values cached by fit()'s own forward pass.
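
A rough sketch of such a forward loop, and of the caching it leaves behind, is shown below. Scale and predict_sketch are hypothetical names used only for illustration; the library's actual predict() may differ in detail.

```python
import numpy as np


class Scale:
    """Hypothetical stateless op: multiplies its input by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs * self.factor


def predict_sketch(layers, X):
    # Same per-layer loop as a training forward pass, returning the final output.
    layer_input = X
    for layer in layers:
        layer.forward(layer_input)
        layer_input = layer.output
    return layer_input


layers = [Scale(2.0), Scale(0.5)]
X_train = np.array([[1.0, 2.0]])
X_new = np.array([[10.0, 20.0], [30.0, 40.0]])

predict_sketch(layers, X_train)      # caches now describe X_train
out = predict_sketch(layers, X_new)  # caches are overwritten with X_new
```

In this sketch the cached attributes always describe the most recent forward call, so any backward() call should follow the forward pass whose values it is meant to read.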

Writing custom components

Custom activation or loss: implement forward/backward per the contracts above.
Custom stateful layer: must expose .weights, .biases, .dweights, .dbiases under exactly those names - model.fit() finds it for optimization via hasattr(layer, 'weights'), and nothing else.
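
As an illustration of both rules, the sketch below defines a hypothetical per-feature scale-and-shift layer (ScaleShift) and a hypothetical stateless activation (Tanh). Neither is part of the library. The attribute names on ScaleShift are exactly the four listed above, so a hasattr(layer, 'weights') check picks it up and skips the activation.

```python
import numpy as np


class ScaleShift:
    """Hypothetical custom stateful layer: output = inputs * weights + biases."""

    def __init__(self, n_features):
        self.weights = np.ones((1, n_features))
        self.biases = np.zeros((1, n_features))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs * self.weights + self.biases

    def backward(self, dvalues):
        # Parameter gradients are summed over the batch dimension.
        self.dweights = np.sum(dvalues * self.inputs, axis=0, keepdims=True)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = dvalues * self.weights


class Tanh:
    """Hypothetical custom activation: no parameters, only .dinputs in backward()."""

    def forward(self, inputs):
        self.inputs = inputs
        self.output = np.tanh(inputs)

    def backward(self, dvalues):
        self.dinputs = dvalues * (1 - self.output ** 2)


layers = [ScaleShift(3), Tanh()]
X = np.full((2, 3), 2.0)

layers[0].forward(X)
layers[1].forward(layers[0].output)
layers[1].backward(np.ones_like(layers[1].output))
layers[0].backward(layers[1].dinputs)

# The same attribute check the text describes for the optimization step.
trainable = [layer for layer in layers if hasattr(layer, 'weights')]
```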

Inspecting state

Every object stores its inputs, outputs, and gradients as plain attributes. You can pause after any forward()/backward() call and read layer.output, layer.dweights, loss.dinputs, etc. directly - no hooks or debug mode required.
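
One practical use of this is a finite-difference gradient check that reads layer.dweights directly. The sketch below uses a hypothetical Dense class written to the contract described earlier, not the library's DenseLayer, and checks a single weight against a numerical estimate.

```python
import numpy as np


class Dense:
    """Hypothetical dense layer following the forward/backward contract."""

    def __init__(self, n_in, n_out, rng):
        self.weights = rng.standard_normal((n_in, n_out))
        self.biases = np.zeros((1, n_out))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases

    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = dvalues.sum(axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T


rng = np.random.default_rng(0)
layer = Dense(3, 2, rng)
X = rng.standard_normal((4, 3))

# Analytic gradient of sum(output) w.r.t. weights[0, 0], read straight off the layer.
layer.forward(X)
layer.backward(np.ones_like(layer.output))
analytic = layer.dweights[0, 0]

# Central finite difference on the same weight.
eps = 1e-6
layer.weights[0, 0] += eps
layer.forward(X)
plus = layer.output.sum()
layer.weights[0, 0] -= 2 * eps
layer.forward(X)
minus = layer.output.sum()
layer.weights[0, 0] += eps                  # restore the original value
numeric = (plus - minus) / (2 * eps)

print(f"analytic={analytic:.6f} numeric={numeric:.6f}")
```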