Layers

DenseLayer

class vanillanets.layers.DenseLayer(n_inputs, n_neurons, *, activation='relu',
                                     init='auto', distribution='normal',
                                     bias_init='zeros', seed=None)

Applies a linear transformation: output = inputs @ weights + biases.

DenseLayer is a pure linear transform. activation is only used to resolve the weight-initialization scheme when init='auto' - it has no effect on forward/backward. Non-linearities must be added as separate layers:

model.add(DenseLayer(784, 128, activation='relu'))
model.add(ReLU())  # <- required separately

Parameters

n_inputs (int) - number of input features (fan_in).
n_neurons (int) - number of output neurons (fan_out).
activation (str, default='relu') - 'relu', 'leaky_relu', 'tanh', 'sigmoid', 'softmax', or 'linear'. Used only when init='auto'.
init (str, default='auto') - 'auto', 'he', or 'xavier'. Other values raise ValueError.
distribution (str, default='normal') - 'normal' or 'uniform'.
bias_init (str | int | float, default='zeros') - 'zeros' or a numeric constant. Other values raise ValueError.
seed (int, optional) - passed to np.random.default_rng(). If omitted, fresh entropy is drawn on each construction, so weights differ between runs.
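
Because seed is handed to np.random.default_rng(), reproducibility follows NumPy's generator semantics. The sketch below uses NumPy directly rather than the library; the array shape is arbitrary and chosen only for illustration.

```python
import numpy as np

# Two generators built from the same seed yield identical draws.
a = np.random.default_rng(42).standard_normal((3, 2))
b = np.random.default_rng(42).standard_normal((3, 2))
same_seed_matches = np.array_equal(a, b)

# seed=None pulls fresh OS entropy, so repeated constructions differ.
c = np.random.default_rng(None).standard_normal((3, 2))
d = np.random.default_rng(None).standard_normal((3, 2))
unseeded_matches = np.array_equal(c, d)

print(same_seed_matches)  # True
print(unseeded_matches)   # False (with overwhelming probability)
```

Two DenseLayer instances built with the same seed and the same arguments should therefore start from identical weights, assuming the constructor draws nothing else from the generator first.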

`init='auto'` resolution

if activation in ('relu', 'leaky_relu'):
    init = 'he'
else:
    init = 'xavier'

Any activation value outside ('relu', 'leaky_relu') - including 'tanh', 'sigmoid', 'softmax', 'linear', or a typo - resolves to Xavier. This is not validated against the activation layer you actually add.
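
The rule above can be sketched as a standalone function. resolve_init is a hypothetical name used for illustration, not part of the library, and the error message is an assumption.

```python
def resolve_init(init, activation):
    """Hypothetical helper mirroring the documented 'auto' rule."""
    if init not in ('auto', 'he', 'xavier'):
        raise ValueError(f"unknown init: {init!r}")  # message text is assumed
    if init != 'auto':
        return init  # explicit choice; activation is ignored
    return 'he' if activation in ('relu', 'leaky_relu') else 'xavier'

print(resolve_init('auto', 'relu'))        # he
print(resolve_init('auto', 'tanh'))        # xavier
print(resolve_init('auto', 'leaky-relu'))  # xavier (typo: hyphen, not underscore)
print(resolve_init('he', 'tanh'))          # he
```

The third call shows the pitfall: a misspelled activation silently falls through to Xavier instead of raising. Passing init='he' explicitly avoids depending on the string match.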

Initialization formulas

fan_in = n_inputs, fan_out = n_neurons.

`init`	`distribution`	Formula
`'he'`	`'normal'` (default)	`weights ~ N(0, std)`, `std = sqrt(2 / fan_in)`
`'he'`	`'uniform'`	`weights ~ U(-limit, limit)`, `limit = sqrt(6 / fan_in)`
`'xavier'`	`'normal'`	`weights ~ N(0, std)`, `std = sqrt(2 / (fan_in + fan_out))`
`'xavier'`	`'uniform'`	`weights ~ U(-limit, limit)`, `limit = sqrt(6 / (fan_in + fan_out))`
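
The four rows can be reproduced in plain NumPy. This is a sketch, not the library's source: init_weights is a hypothetical helper, and it assumes the second argument of N(0, ·) in the table is a standard deviation rather than a variance.

```python
import numpy as np

def init_weights(fan_in, fan_out, init='he', distribution='normal', seed=None):
    """Hypothetical helper; assumes N(0, s) in the table means std = s."""
    rng = np.random.default_rng(seed)
    # He scales by fan_in only; Xavier scales by fan_in + fan_out.
    denom = fan_in if init == 'he' else fan_in + fan_out
    if distribution == 'normal':
        return rng.normal(0.0, np.sqrt(2.0 / denom), size=(fan_in, fan_out))
    limit = np.sqrt(6.0 / denom)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w = init_weights(784, 128, init='he', distribution='normal', seed=0)
print(w.shape)                                   # (784, 128)
print(abs(w.std() - np.sqrt(2 / 784)) < 1e-3)    # True: empirical std is near the target

u = init_weights(784, 128, init='xavier', distribution='uniform', seed=0)
print(np.abs(u).max() <= np.sqrt(6 / (784 + 128)))  # True: all draws stay within the limit
```

Under that assumption the normal and uniform variants of each scheme have the same variance, since U(-limit, limit) has variance limit² / 3 = 2 / denom.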

`bias_init`

Value	Result
`'zeros'` (default)	`np.zeros((1, n_neurons))`
`int` / `float`	`np.full((1, n_neurons), value)`
anything else	`ValueError`
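
A minimal sketch of the table above. init_biases is a hypothetical helper; the dtype=float argument and the error message are assumptions added here, and how the library treats bool values (which Python counts as int) is not stated in this page.

```python
import numpy as np

def init_biases(n_neurons, bias_init='zeros'):
    """Hypothetical helper mirroring the documented bias_init table."""
    if isinstance(bias_init, str) and bias_init == 'zeros':
        return np.zeros((1, n_neurons))
    if isinstance(bias_init, (int, float)):
        # dtype=float is an assumption so integer constants stay trainable floats.
        return np.full((1, n_neurons), bias_init, dtype=float)
    raise ValueError(f"unsupported bias_init: {bias_init!r}")

print(init_biases(3).tolist())       # [[0.0, 0.0, 0.0]]
print(init_biases(3, 0.1).tolist())  # [[0.1, 0.1, 0.1]]

try:
    init_biases(3, 'ones')
except ValueError as exc:
    print(type(exc).__name__)        # ValueError
```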

Forward

self.inputs = inputs
self.output = inputs @ self.weights + self.biases

Backward

self.dweights = self.inputs.T @ dvalues
self.dbiases  = np.sum(dvalues, axis=0, keepdims=True)
self.dinputs  = dvalues @ self.weights.T
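
The three backward expressions can be checked against finite differences without the library. The sketch below uses free-standing arrays in place of layer attributes; the sizes, the surrogate loss, and numeric_grad are illustrative choices, not library code.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_inputs, n_neurons = 4, 3, 2
inputs = rng.standard_normal((batch_size, n_inputs))
weights = rng.standard_normal((n_inputs, n_neurons))
biases = rng.standard_normal((1, n_neurons))
dvalues = rng.standard_normal((batch_size, n_neurons))  # upstream gradient dL/d(output)

def loss(w, b, x):
    # Scalar surrogate whose gradient w.r.t. the layer output is exactly dvalues.
    return np.sum((x @ w + b) * dvalues)

# Analytic gradients, as in the Backward section.
dweights = inputs.T @ dvalues
dbiases = np.sum(dvalues, axis=0, keepdims=True)
dinputs = dvalues @ weights.T

def numeric_grad(f, arr, eps=1e-6):
    # Central differences, perturbing one element at a time.
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        step = np.zeros_like(arr)
        step[idx] = eps
        grad[idx] = (f(arr + step) - f(arr - step)) / (2 * eps)
    return grad

ok_w = np.allclose(dweights, numeric_grad(lambda w: loss(w, biases, inputs), weights), atol=1e-6)
ok_b = np.allclose(dbiases, numeric_grad(lambda b: loss(weights, b, inputs), biases), atol=1e-6)
ok_x = np.allclose(dinputs, numeric_grad(lambda x: loss(weights, biases, x), inputs), atol=1e-6)
print(ok_w, ok_b, ok_x)                              # True True True
print(dweights.shape, dbiases.shape, dinputs.shape)  # (3, 2) (1, 2) (4, 3)
```

The bias gradient sums over axis 0 because the single (1, n_neurons) bias row is broadcast to every sample in the batch; keepdims=True keeps the result the same shape as biases.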

Shapes

Tensor	Shape
Input	`(batch_size, n_inputs)`
`weights`	`(n_inputs, n_neurons)`
`biases`	`(1, n_neurons)`
Output	`(batch_size, n_neurons)`
`dweights`	same as `weights`
`dbiases`	same as `biases`
`dinputs`	`(batch_size, n_inputs)`

Attributes

weights (ndarray) - shape (n_inputs, n_neurons).
biases (ndarray) - shape (1, n_neurons).
dweights, dbiases, dinputs - populated after backward().

These exact attribute names are read by the optimizer via hasattr(layer, 'weights') - a custom layer must use this naming to receive parameter updates.
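
To illustrate why the naming matters, here is a sketch of an optimizer loop built around the hasattr check. TinySGD, GoodLayer, and MisnamedLayer are hypothetical stand-ins; the library's real optimizer may be structured differently.

```python
import numpy as np

class TinySGD:
    """Hypothetical optimizer; only the hasattr check is taken from the docs."""
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def update(self, layers):
        for layer in layers:
            if not hasattr(layer, 'weights'):
                continue  # treated as parameter-free (e.g. an activation layer)
            layer.weights -= self.learning_rate * layer.dweights
            layer.biases -= self.learning_rate * layer.dbiases

class GoodLayer:
    def __init__(self):
        self.weights = np.ones((2, 2))
        self.biases = np.zeros((1, 2))
        self.dweights = np.ones((2, 2))
        self.dbiases = np.ones((1, 2))

class MisnamedLayer:
    def __init__(self):
        self.W = np.ones((2, 2))   # wrong name: invisible to the hasattr check
        self.dW = np.ones((2, 2))

good, bad = GoodLayer(), MisnamedLayer()
TinySGD(0.1).update([good, bad])
print(good.weights[0, 0])  # 0.9
print(bad.W[0, 0])         # 1.0 (never updated)
```

A layer with misnamed parameters fails silently under this pattern: no error is raised, and its parameters simply never change during training.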

Example

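import numpy as np
from vanillanets.layers import DenseLayer

# Dummy batch for illustration: 32 samples, 784 features each
X_batch = np.random.default_rng(0).random((32, 784))

layer = DenseLayer(n_inputs=784, n_neurons=128, activation='relu', seed=42)
layer.forward(X_batch)
print(layer.output.shape)  # (batch_size, 128), here (32, 128)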
