transformer_heads package

Subpackages

Submodules

transformer_heads.config module

This module defines the configuration classes for the model and its heads.

It includes the HeadConfig class, which defines the configuration for a model head, and the HeadedConfig class, which extends a base configuration class with additional output heads.

Classes:

HeadConfig – A configuration class for a model head.
HeadedConfig – A configuration class that extends a base configuration class with additional output heads.

class transformer_heads.config.HeadConfig(name: str, in_size: int, num_outputs: int | None, layer_hook: int = -1, hidden_size: int = 0, num_layers: int = 1, output_activation: str = 'linear', is_causal_lm: bool | None = False, pred_for_sequence: bool | None = False, is_regression: bool | None = False, output_bias: bool | None = False, loss_fct: str | None = 'cross_entropy', trainable: bool | None = True, loss_weight: float | None = 1.0)

Bases: dict

A configuration class for a model head.

Variables:

name (str) – The name of the head.
in_size (int) – The input size for the head.
num_outputs (Optional[int]) – The number of outputs for the head.
layer_hook (int) – The layer to hook the head to. This uses python list indexing, so -1 is the last layer. Default is -1.
hidden_size (int) – The size of the hidden layers if the head is an MLP. Default is 0.
num_layers (int) – The number of layers in the head. Set to 1 for a linear head and to a value greater than 1 for an MLP head. Default is 1.
output_activation (str) – The activation function for the output layer. Default is “linear”.
is_causal_lm (Optional[bool]) – Whether the head is for doing causal language modelling. Default is False.
pred_for_sequence (Optional[bool]) – Whether the head predicts one output per sequence (e.g. text classification). Default is False.
is_regression (Optional[bool]) – Whether the head is for a regression task. Default is False.
output_bias (Optional[bool]) – Whether to include a bias in the output layer. Default is False.
loss_fct (Optional[str]) – The loss function for the head. Options are “cross_entropy”, “mse”, “bce”. Default is “cross_entropy”.
trainable (Optional[bool]) – Whether the head is trainable. Default is True.
loss_weight (Optional[float]) – The weight of this head when computing the loss. Default is 1.0.
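
The fields above can be illustrated with a small sketch. To keep it runnable without the package installed, it uses a hypothetical stand-in dataclass, HeadConfigSketch, that only mirrors the documented signature; it is not the library's class. The head names, the in_size of 4096, and the placeholder layers list are assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HeadConfigSketch:
    """Hypothetical stand-in mirroring the documented HeadConfig fields."""
    name: str
    in_size: int
    num_outputs: Optional[int]
    layer_hook: int = -1
    hidden_size: int = 0
    num_layers: int = 1
    output_activation: str = "linear"
    is_causal_lm: Optional[bool] = False
    pred_for_sequence: Optional[bool] = False
    is_regression: Optional[bool] = False
    output_bias: Optional[bool] = False
    loss_fct: Optional[str] = "cross_entropy"
    trainable: Optional[bool] = True
    loss_weight: Optional[float] = 1.0


# A linear head giving one prediction per sequence (e.g. three sentiment classes).
sentiment_head = HeadConfigSketch(
    name="sentiment", in_size=4096, num_outputs=3, pred_for_sequence=True,
)

# A two-layer MLP regression head hooked to an earlier layer.
score_head = HeadConfigSketch(
    name="score", in_size=4096, num_outputs=1, layer_hook=-4,
    hidden_size=256, num_layers=2, is_regression=True, loss_fct="mse",
    loss_weight=0.5,
)

# layer_hook follows Python list indexing: -1 selects the last entry.
layers = [f"layer_{i}" for i in range(8)]  # placeholder for per-layer outputs
hooked = layers[score_head.layer_hook]
```

With the real class, the same keyword arguments would presumably be passed to transformer_heads.config.HeadConfig directly.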

hidden_size: int = 0

in_size: int

is_causal_lm: bool | None = False

is_regression: bool | None = False

items() → a set-like object providing a view on D's items

layer_hook: int = -1

loss_fct: str | None = 'cross_entropy'

loss_weight: float | None = 1.0

name: str

num_layers: int = 1

num_outputs: int | None

output_activation: str = 'linear'

output_bias: bool | None = False

pred_for_sequence: bool | None = False

trainable: bool | None = True

transformer_heads.config.create_headed_model_config(base_config_class: Type[PretrainedConfig]) → Type[PretrainedConfig]

Creates a new configuration class with additional output heads.

This function takes a base configuration class and returns a new class that inherits from the base class and adds an output_heads attribute. The output_heads attribute is a list of HeadConfig instances.

Parameters: base_config_class (Type[PretrainedConfig]) – The base configuration class to extend.
Returns: A new configuration class that includes output heads.
Return type: Type[PretrainedConfig]
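
The class-factory pattern described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: BaseConfigSketch is a hypothetical stand-in for a PretrainedConfig subclass, create_headed_config_sketch is a hypothetical name, and plain dicts take the place of HeadConfig instances.

```python
from typing import List, Optional, Type


class BaseConfigSketch:
    """Hypothetical stand-in for a transformers PretrainedConfig subclass."""

    def __init__(self, hidden_size: int = 768, **kwargs):
        self.hidden_size = hidden_size


def create_headed_config_sketch(base_config_class: Type) -> Type:
    """Return a subclass of base_config_class that also stores output_heads."""

    class HeadedConfigSketch(base_config_class):
        def __init__(self, output_heads: Optional[List[dict]] = None, **kwargs):
            super().__init__(**kwargs)
            # Plain dicts stand in for HeadConfig instances here.
            self.output_heads = list(output_heads or [])

    return HeadedConfigSketch


HeadedCfg = create_headed_config_sketch(BaseConfigSketch)
cfg = HeadedCfg(
    output_heads=[{"name": "sentiment", "in_size": 768, "num_outputs": 3}],
    hidden_size=768,
)
```

Because the returned class inherits from the base class, code that expects the base configuration can still accept the headed one.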

transformer_heads.constants module

This module defines constants for loss functions and model types.

It includes three dictionaries: loss_fct_map, which maps loss function names to their PyTorch implementations; activation_map, which maps activation function names to their PyTorch implementations; and model_type_map, which maps model type names to their transformers model classes.
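
The name-to-callable lookup that these maps provide can be sketched without PyTorch. The example below is only an illustration of the pattern: the functions are pure-Python stand-ins rather than the PyTorch implementations the module points at, the _sketch names are hypothetical, and treating "linear" as the identity function is an assumption.

```python
import math
from typing import Callable, Dict, List


def mse(preds: List[float], targets: List[float]) -> float:
    """Mean squared error over two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def bce(probs: List[float], targets: List[float]) -> float:
    """Binary cross-entropy over probabilities strictly inside (0, 1)."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, targets)
    ) / len(probs)


# Pure-Python stand-ins; the documented maps hold PyTorch implementations.
loss_fct_map_sketch: Dict[str, Callable[[List[float], List[float]], float]] = {
    "mse": mse,
    "bce": bce,
}
activation_map_sketch: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: x,  # assumed to mean identity: output passed through
}

# A head's loss_fct string selects the callable by name.
loss_fn = loss_fct_map_sketch["mse"]
loss = loss_fn([1.0, 2.0], [1.0, 4.0])
```

Keeping the mapping in one place means a configuration can stay a plain string such as "mse" while the callable is resolved at model-construction time.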

transformer_heads.output module

This module defines the output class for a model with multiple heads.

It includes the HeadedModelOutput class, which extends the ModelOutput class from the transformers library with additional attributes for multi-head models.

Classes:

HeadedModelOutput – An output class for a model with multiple heads.

class transformer_heads.output.HeadedModelOutput(loss: FloatTensor | None = None, loss_by_head: dict[str, FloatTensor] | None = None, preds_by_head: dict[str, FloatTensor] | None = None, past_key_values: Tuple[Tuple[FloatTensor]] | None = None, hidden_states: Tuple[FloatTensor, ...] | None = None, attentions: Tuple[FloatTensor, ...] | None = None)

Bases: ModelOutput

An output class for a model with multiple heads.

This class extends the ModelOutput class from the transformers library with additional attributes for multi-head models.
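
How the per-head fields relate to the overall loss can be sketched as below. HeadedOutputSketch is a hypothetical stand-in that uses floats and lists in place of FloatTensor values, and combining the per-head losses as a weighted sum using each head's loss_weight is an assumption about the aggregation, not something this page confirms.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HeadedOutputSketch:
    """Hypothetical stand-in with floats in place of FloatTensor values."""
    loss: Optional[float] = None
    loss_by_head: Optional[Dict[str, float]] = None
    preds_by_head: Optional[Dict[str, list]] = None


def combine_losses(loss_by_head: Dict[str, float],
                   loss_weights: Dict[str, float]) -> float:
    """Weighted sum of per-head losses (assumed aggregation rule)."""
    return sum(loss_weights[name] * value for name, value in loss_by_head.items())


per_head = {"sentiment": 0.5, "score": 2.0}
weights = {"sentiment": 1.0, "score": 0.25}
out = HeadedOutputSketch(
    loss=combine_losses(per_head, weights),
    loss_by_head=per_head,
    preds_by_head={"sentiment": [0.1, 0.7, 0.2], "score": [3.4]},
)
```

Both dictionaries are keyed by head name, so the loss and predictions for a given head can be looked up with the name set in its HeadConfig.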