shredx.modules.transformer.TransformerEncoderLayer
- class shredx.modules.transformer.TransformerEncoderLayer(d_model: int, n_heads: int, dim_feedforward: int, dropout: float, activation: Module, layer_norm_eps: float, norm_first: bool, bias: bool, dtype: torch.dtype | None, device: str = 'cpu')
Bases: Module

Single transformer encoder layer.
Consists of multi-head self-attention followed by a position-wise feedforward network, with residual connections and layer normalization.
- Parameters:
- d_model : int
Model dimension (input/output size).
- n_heads : int
Number of attention heads.
- dim_feedforward : int
Dimension of the feedforward network's hidden layer.
- dropout : float
Dropout probability.
- activation : nn.Module
Activation function for the feedforward network.
- layer_norm_eps : float
Epsilon for layer normalization.
- norm_first : bool
If True, apply layer norm before the attention and feedforward sublayers (pre-norm); if False, after each residual connection (post-norm). See the sketch after the Methods table.
- bias : bool
Whether to use bias in linear layers.
- dtype : torch.dtype, optional
Data type for parameters.
- device : str, optional
Device to place the model on. Default is "cpu".
Methods

- forward(src[, is_causal])
Forward pass through the encoder layer.
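For orientation, the effect of norm_first on the forward pass can be sketched as below. This is an illustrative reimplementation on top of standard PyTorch modules, not shredx's actual code; the attribute names (self_attn, ff, norm1, norm2) and the use of nn.MultiheadAttention are assumptions.

import torch
import torch.nn as nn

class EncoderLayerSketch(nn.Module):
    """Illustrative pre-norm/post-norm branching; not shredx's implementation."""

    def __init__(self, d_model, n_heads, dim_feedforward, dropout, norm_first):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.norm_first = norm_first

    def _sa_block(self, x, is_causal):
        mask = None
        if is_causal:
            # Build the causal mask explicitly; True entries are masked out.
            n = x.size(1)
            mask = torch.triu(torch.ones(n, n, dtype=torch.bool,
                                         device=x.device), diagonal=1)
        out, _ = self.self_attn(x, x, x, attn_mask=mask, need_weights=False)
        return self.dropout(out)

    def forward(self, src, is_causal=False):
        if self.norm_first:
            # Pre-norm: normalize before each sublayer; the residual stream
            # carries the raw activations.
            src = src + self._sa_block(self.norm1(src), is_causal)
            src = src + self.dropout(self.ff(self.norm2(src)))
        else:
            # Post-norm: run the sublayer first, then normalize the residual sum.
            src = self.norm1(src + self._sa_block(src, is_causal))
            src = self.norm2(src + self.dropout(self.ff(src)))
        return src

Pre-norm tends to train more stably in deep stacks, which is why norm_first=True is a common default in modern transformer implementations.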