Component Architecture
How VanillaNets objects are structured, and what model.fit() does internally.
VanillaNets represents every component of a network - layers, activations, losses, optimizers - as a plain Python object with a small, consistent interface. There is no autograd engine and no computational graph object; Model is just a list, and training is a sequence of explicit method calls.
forward / backward contract
Layers and activations:
layer.forward(inputs) # caches .inputs, sets .output
layer.backward(dvalues) # sets .dinputs (and .dweights / .dbiases for stateful layers)
Losses differ slightly:
loss.forward(y_pred, y_true) # returns a per-sample array (not stored on .output)
loss.calculate(y_pred, y_true) # = np.mean(loss.forward(y_pred, y_true))
loss.backward(dvalues, y_true) # sets .dinputs
model.fit() and model.evaluate() call .calculate() for the reported scalar loss.
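As an illustration of the two contracts, here is a minimal sketch of one activation and one loss written against them. The class names SketchReLU and SketchMSE are hypothetical and are not part of VanillaNets; the per-batch normalization in the loss gradient is also an assumption.

```python
import numpy as np

class SketchReLU:
    """Hypothetical stateless activation following the layer contract."""
    def forward(self, inputs):
        self.inputs = inputs                    # cached for backward()
        self.output = np.maximum(0, inputs)

    def backward(self, dvalues):
        # The gradient passes through only where the input was positive.
        self.dinputs = dvalues * (self.inputs > 0)

class SketchMSE:
    """Hypothetical loss following the loss contract."""
    def forward(self, y_pred, y_true):
        # Per-sample values; nothing is stored on .output.
        return np.mean((y_true - y_pred) ** 2, axis=-1)

    def calculate(self, y_pred, y_true):
        return np.mean(self.forward(y_pred, y_true))

    def backward(self, dvalues, y_true):
        # dvalues holds the predictions here, matching the contract's naming.
        n_samples, n_outputs = dvalues.shape
        # Assumed normalization: average over outputs and over the batch.
        self.dinputs = -2 * (y_true - dvalues) / n_outputs / n_samples

relu, mse = SketchReLU(), SketchMSE()
X = np.array([[-1.0, 2.0], [3.0, -4.0]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])

relu.forward(X)                          # sets relu.output
loss_value = mse.calculate(relu.output, y)
mse.backward(relu.output, y)             # sets mse.dinputs
relu.backward(mse.dinputs)               # sets relu.dinputs
```

Each call only reads and writes attributes, so the gradient chain is visible as a sequence of ordinary assignments.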
Object categories
| Category | Examples | Trainable params? |
|---|---|---|
| Stateful layers | DenseLayer | Yes - .weights, .biases |
| Stateless ops | ReLU, Sigmoid, Softmax, CategoricalCrossEntropy, ... | No |
| Orchestrators | Model, Optimizer_SGD, Optimizer_Adam | No - operate on other objects |
| Fused shortcut | Activation_Softmax_Loss_CategoricalCrossentropy | No - special-cased by Model.finalize() |
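A fused softmax and cross-entropy object is a common pattern in from-scratch libraries because the combined gradient has a simple closed form. The standalone sketch below is not VanillaNets code: the function names are hypothetical, and dividing by the batch size is an assumed normalization. It checks numerically that backpropagating through the loss and then the softmax Jacobian gives the same result as the closed form.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def two_step_gradient(z, y_onehot):
    """Cross-entropy backward, then softmax backward through its Jacobian."""
    p = softmax(z)
    n = len(z)
    dloss = -y_onehot / p / n                   # gradient w.r.t. softmax output
    dz = np.empty_like(z)
    for i, (p_i, d_i) in enumerate(zip(p, dloss)):
        jacobian = np.diagflat(p_i) - np.outer(p_i, p_i)
        dz[i] = jacobian @ d_i
    return dz

def fused_gradient(z, y_onehot):
    """Closed form for the combined softmax + cross-entropy gradient."""
    return (softmax(z) - y_onehot) / len(z)

z = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
assert np.allclose(two_step_gradient(z, y), fused_gradient(z, y))
```

The closed form avoids building a Jacobian per sample and avoids dividing by small probabilities, which is the usual motivation for special-casing this pair.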
What model.fit() does, per epoch
1. Forward - iterate self.layers in order; each layer's .output becomes the next layer's input.
2. Loss - data_loss = self.loss.calculate(layer_input, y).
3. Metrics - each entry in self.metrics is computed as metric.calculate(layer_input, y).
4. Backward - branches on self.softmax_classifier_output:
   - Fused path (active when the last layer is Softmax and the loss is CategoricalCrossEntropy or SparseCategoricalCrossEntropy): call softmax_classifier_output.backward(layer_input, y), then backprop over self.layers[:-1] - the trailing Softmax is skipped.
   - Standard path: self.loss.backward(layer_input, y), then backprop over all of self.layers.
5. Optimization - optimizer.pre_update_lr(), then optimizer.update_params(layer) for every layer where hasattr(layer, 'weights'), then optimizer.post_update_params().
6. Validation (if validation_data is passed) - self.evaluate(X_val, y_val) runs after step 5, so validation metrics for epoch N reflect the weights already updated at epoch N.
7. Logging - every print_every epochs, prints training metrics, loss, learning rate, and (if applicable) validation metrics.
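The forward, loss, backward, and optimization steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the library's actual fit() body: TinyDense, TinyMSE, TinySGD, and run_epoch are hypothetical stand-ins, and metrics, validation, and logging are left out.

```python
import numpy as np

class TinyDense:                          # hypothetical stateful layer
    def __init__(self, n_in, n_out, rng):
        self.weights = 0.1 * rng.standard_normal((n_in, n_out))
        self.biases = np.zeros((1, n_out))
    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases
    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = dvalues.sum(axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T

class TinyMSE:                            # hypothetical loss
    def forward(self, y_pred, y_true):
        return np.mean((y_true - y_pred) ** 2, axis=-1)
    def calculate(self, y_pred, y_true):
        return np.mean(self.forward(y_pred, y_true))
    def backward(self, dvalues, y_true):
        self.dinputs = -2 * (y_true - dvalues) / dvalues.shape[1] / len(dvalues)

class TinySGD:                            # hypothetical optimizer
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
    def pre_update_lr(self):
        pass                              # a decay schedule would adjust the rate here
    def update_params(self, layer):
        layer.weights -= self.learning_rate * layer.dweights
        layer.biases -= self.learning_rate * layer.dbiases
    def post_update_params(self):
        pass                              # an iteration counter would advance here

def run_epoch(layers, loss, optimizer, X, y, softmax_classifier_output=None):
    # Forward: each layer's .output feeds the next layer.
    layer_input = X
    for layer in layers:
        layer.forward(layer_input)
        layer_input = layer.output
    # Loss: scalar value for reporting.
    data_loss = loss.calculate(layer_input, y)
    # Backward: the fused path skips the trailing layer.
    if softmax_classifier_output is not None:
        softmax_classifier_output.backward(layer_input, y)
        dvalues, to_visit = softmax_classifier_output.dinputs, layers[:-1]
    else:
        loss.backward(layer_input, y)
        dvalues, to_visit = loss.dinputs, layers
    for layer in reversed(to_visit):
        layer.backward(dvalues)
        dvalues = layer.dinputs
    # Optimization: only objects exposing .weights are updated.
    optimizer.pre_update_lr()
    for layer in layers:
        if hasattr(layer, 'weights'):
            optimizer.update_params(layer)
    optimizer.post_update_params()
    return data_loss

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]])  # synthetic linear target
layers = [TinyDense(3, 1, rng)]
loss, optimizer = TinyMSE(), TinySGD(learning_rate=0.1)
losses = [run_epoch(layers, loss, optimizer, X, y) for _ in range(50)]
```

Because the loop is nothing more than method calls on plain objects, any step can be replaced or instrumented without touching the others.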
predict() vs. the forward pass inside fit()
model.predict(X) runs the same per-layer forward loop and returns the final .output. It's a separate code path - fit() does not call predict(). Both leave .inputs/.output cached on every layer; fit()'s backward pass reads the values cached by its own forward loop.
Writing custom components
- Custom activation or loss: implement forward/backward per the contracts above.
- Custom stateful layer: must expose .weights, .biases, .dweights, .dbiases under exactly those names - model.fit() finds it for optimization via hasattr(layer, 'weights'), and nothing else.
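As a sketch of the second case, the hypothetical layer below applies a per-feature scale and shift and exposes the four attribute names listed above. ScaleShift and Identity are invented for illustration and are not VanillaNets classes; the list comprehension mirrors the hasattr(layer, 'weights') check described above.

```python
import numpy as np

class ScaleShift:
    """Hypothetical stateful layer: output = inputs * weights + biases."""
    def __init__(self, n_features):
        self.weights = np.ones((1, n_features))
        self.biases = np.zeros((1, n_features))

    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs * self.weights + self.biases

    def backward(self, dvalues):
        # Parameter gradients are summed over the batch dimension.
        self.dweights = np.sum(dvalues * self.inputs, axis=0, keepdims=True)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        self.dinputs = dvalues * self.weights

class Identity:
    """Hypothetical stateless op, included only to show that it is skipped."""
    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs

    def backward(self, dvalues):
        self.dinputs = dvalues

layers = [ScaleShift(2), Identity()]
# Same test the optimization step is described as using.
trainable = [layer for layer in layers if hasattr(layer, 'weights')]

scale = layers[0]
scale.forward(np.array([[1.0, 2.0], [3.0, 4.0]]))
scale.backward(np.ones((2, 2)))
```

A layer that stored its parameters under other names, such as .kernel, would pass through forward and backward but would never be updated.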
Inspecting state
Every object stores its inputs, outputs, and gradients as plain attributes. You can pause after any forward()/backward() call and read layer.output, layer.dweights, loss.dinputs, etc. directly - no hooks or debug mode required.
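One practical use of this is a finite-difference gradient check that reads .dweights straight off the layer. The sketch below uses a hypothetical ProbeDense class as a stand-in for a stateful layer and takes sum(output) as the scalar being differentiated; neither comes from VanillaNets.

```python
import numpy as np

class ProbeDense:                          # hypothetical stand-in layer
    def __init__(self, n_in, n_out, rng):
        self.weights = rng.standard_normal((n_in, n_out))
        self.biases = np.zeros((1, n_out))
    def forward(self, inputs):
        self.inputs = inputs
        self.output = inputs @ self.weights + self.biases
    def backward(self, dvalues):
        self.dweights = self.inputs.T @ dvalues
        self.dbiases = dvalues.sum(axis=0, keepdims=True)
        self.dinputs = dvalues @ self.weights.T

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))
layer = ProbeDense(3, 2, rng)

# Analytic gradient of sum(output) with respect to the weights.
layer.forward(X)
layer.backward(np.ones_like(layer.output))
analytic = layer.dweights.copy()

# Central-difference estimate, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(layer.weights)
for idx in np.ndindex(*layer.weights.shape):
    original = layer.weights[idx]
    layer.weights[idx] = original + eps
    layer.forward(X)
    plus = layer.output.sum()
    layer.weights[idx] = original - eps
    layer.forward(X)
    minus = layer.output.sum()
    layer.weights[idx] = original          # restore the weight
    numeric[idx] = (plus - minus) / (2 * eps)

max_error = np.abs(numeric - analytic).max()
```

A small max_error indicates that backward() agrees with the numerical estimate; the same pattern applies to .dbiases and .dinputs.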