shredx.modules.transformer.TransformerEncoderLayer

class shredx.modules.transformer.TransformerEncoderLayer(d_model: int, n_heads: int, dim_feedforward: int, dropout: float, activation: Module, layer_norm_eps: float, norm_first: bool, bias: bool, dtype: dtype | None, device: str = 'cpu')

Bases: Module

Single transformer encoder layer.

Consists of multi-head self-attention followed by a position-wise feedforward network, with residual connections and layer normalization.
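As a rough sketch of the dataflow, the layer computes the following (a reference outline only; the helper below is hypothetical and the actual shredx implementation may differ in details):

def encoder_layer_dataflow(x, self_attn, ffn, norm1, norm2, dropout, norm_first):
    # Hypothetical reference helper, not part of the shredx API.
    if norm_first:
        # Pre-norm: normalize the input to each sublayer, then add the residual.
        x = x + dropout(self_attn(norm1(x)))
        x = x + dropout(ffn(norm2(x)))
    else:
        # Post-norm: add the residual first, then normalize the sum.
        x = norm1(x + dropout(self_attn(x)))
        x = norm2(x + dropout(ffn(x)))
    return x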

Parameters:
d_model : int

Model dimension (input/output size).

n_heads : int

Number of attention heads.

dim_feedforward : int

Dimension of the feedforward network's hidden layer.

dropout : float

Dropout probability.

activation : nn.Module

Activation function for the feedforward network.

layer_norm_eps : float

Epsilon for layer normalization.

norm_first : bool

If True, apply layer normalization before the attention and feedforward sublayers (pre-norm); if False, apply it after each residual connection (post-norm).

bias : bool

Whether to use bias terms in the linear layers.

dtype : torch.dtype, optional

Data type for parameters.

device : str, optional

Device to place the model on. Default is "cpu".
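A minimal construction sketch, assuming the constructor signature above; the hyperparameter values and the nn.GELU activation are illustrative choices, not library defaults:

>>> import torch
>>> import torch.nn as nn
>>> from shredx.modules.transformer import TransformerEncoderLayer
>>> layer = TransformerEncoderLayer(
...     d_model=512,
...     n_heads=8,
...     dim_feedforward=2048,
...     dropout=0.1,
...     activation=nn.GELU(),
...     layer_norm_eps=1e-5,
...     norm_first=True,  # pre-norm
...     bias=True,
...     dtype=torch.float32,
...     device="cpu",
... )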

Methods

forward(src[, is_causal])

Forward pass through the encoder layer.
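A usage sketch for forward, continuing the example above; the (batch, seq_len, d_model) input layout and the behavior of is_causal (masking attention to earlier positions) are assumptions inferred from the signature, so check the library for the exact contract:

>>> src = torch.randn(2, 16, 512)     # (batch, seq_len, d_model), assumed layout
>>> out = layer(src)                  # bidirectional self-attention
>>> out.shape
torch.Size([2, 16, 512])
>>> out = layer(src, is_causal=True)  # assumed: causal (left-to-right) masking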