transformer_heads package

Subpackages

Submodules

transformer_heads.config module

This module defines the configuration classes for the model and its heads.

It includes the HeadConfig class, which defines the configuration for a model head, and the HeadedConfig class, which extends a base configuration class with additional output heads.

Classes:

HeadConfig: A configuration class for a model head.

HeadedConfig: A configuration class that extends a base configuration class with additional output heads.

class transformer_heads.config.HeadConfig(name: str, in_size: int, num_outputs: int | None, layer_hook: int = -1, hidden_size: int = 0, num_layers: int = 1, output_activation: str = 'linear', is_causal_lm: bool | None = False, pred_for_sequence: bool | None = False, is_regression: bool | None = False, output_bias: bool | None = False, loss_fct: str | None = 'cross_entropy', trainable: bool | None = True, loss_weight: float | None = 1.0)

Bases: dict

A configuration class for a model head.

Variables:
  • name (str) – The name of the head.

  • in_size (int) – The input size for the head.

  • num_outputs (Optional[int]) – The number of outputs for the head.

  • layer_hook (int) – The layer to hook the head to. This uses Python list indexing, so -1 is the last layer. Default is -1.

  • hidden_size (int) – The size of the hidden layers if the head is an MLP. Default is 0.

  • num_layers (int) – The number of layers in the head. Set to 1 for a linear head and to > 1 for an MLP head. Default is 1.

  • output_activation (str) – The activation function for the output layer. Default is “linear”.

  • is_causal_lm (Optional[bool]) – Whether the head is for doing causal language modelling. Default is False.

  • pred_for_sequence (Optional[bool]) – Whether the head predicts one output per sequence (e.g. text classification). Default is False.

  • is_regression (Optional[bool]) – Whether the head is for a regression task. Default is False.

  • output_bias (Optional[bool]) – Whether to include a bias in the output layer. Default is False.

  • loss_fct (Optional[str]) – The loss function for the head. Options are “cross_entropy”, “mse”, “bce”. Default is “cross_entropy”.

  • trainable (Optional[bool]) – Whether the head is trainable. Default is True.

  • loss_weight (Optional[float]) – The weight of this head when computing the loss. Default is 1.0.

hidden_size: int = 0
in_size: int
is_causal_lm: bool | None = False
is_regression: bool | None = False
items() → a set-like object providing a view on D's items
layer_hook: int = -1
loss_fct: str | None = 'cross_entropy'
loss_weight: float | None = 1.0
name: str
num_layers: int = 1
num_outputs: int | None
output_activation: str = 'linear'
output_bias: bool | None = False
pred_for_sequence: bool | None = False
trainable: bool | None = True
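
The fields above can be illustrated with a minimal sketch. HeadConfigSketch below is a hypothetical stand-in that only mirrors the documented signature and defaults so the example runs without the library installed; it is a plain dataclass rather than a dict subclass, and the in_size value of 4096 is an assumed backbone hidden size, not something this reference specifies.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HeadConfigSketch:
    """Hypothetical stand-in mirroring the documented HeadConfig fields."""

    name: str
    in_size: int
    num_outputs: Optional[int]
    layer_hook: int = -1
    hidden_size: int = 0
    num_layers: int = 1
    output_activation: str = "linear"
    is_causal_lm: Optional[bool] = False
    pred_for_sequence: Optional[bool] = False
    is_regression: Optional[bool] = False
    output_bias: Optional[bool] = False
    loss_fct: Optional[str] = "cross_entropy"
    trainable: Optional[bool] = True
    loss_weight: Optional[float] = 1.0


# Linear head producing one prediction per sequence (e.g. text classification),
# hooked to the last layer because layer_hook defaults to -1.
sentiment_head = HeadConfigSketch(
    name="sentiment",
    in_size=4096,  # assumed hidden size of the base model
    num_outputs=2,
    pred_for_sequence=True,
)

# Two-layer MLP regression head hooked four layers from the end.
score_head = HeadConfigSketch(
    name="score",
    in_size=4096,  # assumed hidden size of the base model
    num_outputs=1,
    layer_hook=-4,
    hidden_size=512,
    num_layers=2,
    is_regression=True,
    loss_fct="mse",
    loss_weight=0.5,
)

head_configs = [sentiment_head, score_head]
```

With the real class, the same keyword arguments would presumably be passed to transformer_heads.config.HeadConfig directly.
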
transformer_heads.config.create_headed_model_config(base_config_class: Type[PretrainedConfig]) → Type[PretrainedConfig]

Creates a new configuration class with additional output heads.

This function takes a base configuration class and returns a new class that inherits from the base class and adds an output_heads attribute. The output_heads attribute is a list of HeadConfig instances.

Parameters:

base_config_class (Type[PretrainedConfig]) – The base configuration class to extend.

Returns:

A new configuration class that includes output heads.

Return type:

Type[PretrainedConfig]
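
The class-factory pattern described above can be sketched as follows. This is not the library's implementation: BaseConfigSketch is a hypothetical stand-in for a PretrainedConfig subclass, create_headed_config_sketch is an illustrative name, and the head entries are plain dicts (the documented HeadConfig derives from dict).

```python
from typing import Any, Dict, List, Optional, Type


class BaseConfigSketch:
    """Hypothetical stand-in for a transformers PretrainedConfig subclass."""

    def __init__(self, hidden_size: int = 768, **kwargs: Any) -> None:
        self.hidden_size = hidden_size


def create_headed_config_sketch(base_config_class: Type) -> Type:
    """Return a subclass of base_config_class that also stores output_heads."""

    class HeadedConfigSketch(base_config_class):
        def __init__(
            self,
            output_heads: Optional[List[Dict[str, Any]]] = None,
            **kwargs: Any,
        ) -> None:
            # Let the base class consume its own arguments first.
            super().__init__(**kwargs)
            # Copy the list so later changes by the caller do not leak in.
            self.output_heads = list(output_heads or [])

    HeadedConfigSketch.__name__ = f"Headed{base_config_class.__name__}"
    return HeadedConfigSketch


HeadedBase = create_headed_config_sketch(BaseConfigSketch)
config = HeadedBase(
    output_heads=[{"name": "sentiment", "in_size": 768, "num_outputs": 2}],
    hidden_size=768,
)
```

The returned class still behaves like the base configuration (here, it keeps hidden_size) and only adds the output_heads attribute on top.
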

transformer_heads.constants module

This module defines constants for loss functions and model types.

It includes three lookup dictionaries: loss_fct_map maps loss function names to their corresponding PyTorch implementations, activation_map maps activation function names to their corresponding PyTorch implementations, and model_type_map maps model type names to their corresponding transformers model classes.
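
As a rough illustration of how a name-to-implementation map like these is typically used, the sketch below resolves a loss name (such as the loss_fct string in a HeadConfig) to a callable. The pure-Python loss functions and the names loss_fct_map_sketch and resolve_loss_sketch are assumptions for illustration only; the real loss_fct_map holds PyTorch implementations whose exact entries are not listed in this reference.

```python
import math
from typing import Callable, Dict, Sequence


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def bce(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy for probabilities strictly between 0 and 1."""
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, targets)
    )
    return -total / len(probs)


# Hypothetical stand-in for loss_fct_map: name -> implementation.
loss_fct_map_sketch: Dict[str, Callable[..., float]] = {"mse": mse, "bce": bce}


def resolve_loss_sketch(name: str) -> Callable[..., float]:
    """Look up a loss by name and fail loudly on unknown names."""
    try:
        return loss_fct_map_sketch[name]
    except KeyError:
        known = ", ".join(sorted(loss_fct_map_sketch))
        raise ValueError(f"Unknown loss '{name}'. Known losses: {known}") from None


loss_value = resolve_loss_sketch("mse")([1.0, 2.0], [1.0, 4.0])
```

Keeping the mapping in one place means a configuration only needs to store a string, and unknown names can be rejected before training starts.
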

transformer_heads.output module

This module defines the output class for a model with multiple heads.

It includes the HeadedModelOutput class, which extends the ModelOutput class from the transformers library with additional attributes for multi-head models.

Classes:

HeadedModelOutput: An output class for a model with multiple heads.

class transformer_heads.output.HeadedModelOutput(loss: FloatTensor | None = None, loss_by_head: dict[str, FloatTensor] | None = None, preds_by_head: dict[str, FloatTensor] | None = None, past_key_values: Tuple[Tuple[FloatTensor]] | None = None, hidden_states: Tuple[FloatTensor, ...] | None = None, attentions: Tuple[FloatTensor, ...] | None = None)

Bases: ModelOutput

An output class for a model with multiple heads.

This class extends the ModelOutput class from the transformers library with additional attributes for multi-head models.

Variables:
  • loss (Optional[torch.FloatTensor]) – The total loss.

  • loss_by_head (Optional[dict[str, torch.FloatTensor]]) – A dictionary mapping head names to their corresponding losses.

  • preds_by_head (Optional[dict[str, torch.FloatTensor]]) – A dictionary mapping head names to their corresponding predictions.

  • past_key_values (Optional[Tuple[Tuple[torch.FloatTensor]]]) – Tuple of cached key-value states for transformer models.

  • hidden_states (Optional[Tuple[torch.FloatTensor, ...]]) – Tuple of hidden states for transformer models.

  • attentions (Optional[Tuple[torch.FloatTensor, ...]]) – Tuple of attention weights for transformer models.

attentions: Tuple[FloatTensor, ...] | None = None
hidden_states: Tuple[FloatTensor, ...] | None = None
loss: FloatTensor | None = None
loss_by_head: dict[str, FloatTensor] | None = None
past_key_values: Tuple[Tuple[FloatTensor]] | None = None
preds_by_head: dict[str, FloatTensor] | None = None
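
The relationship between loss, loss_by_head, and preds_by_head can be sketched with plain Python values. HeadedOutputSketch is a hypothetical stand-in that uses floats and lists instead of tensors, and combining the per-head losses as a weighted sum using each head's loss_weight is an assumption based on the field descriptions, not confirmed library behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeadedOutputSketch:
    """Hypothetical stand-in for HeadedModelOutput using plain Python values."""

    loss: Optional[float] = None
    loss_by_head: Optional[Dict[str, float]] = None
    preds_by_head: Optional[Dict[str, List[List[float]]]] = None


def combine_losses(
    loss_by_head: Dict[str, float], loss_weights: Dict[str, float]
) -> float:
    """Assumed aggregation: sum of per-head losses scaled by their weights."""
    return sum(
        loss_weights.get(name, 1.0) * value
        for name, value in loss_by_head.items()
    )


loss_by_head = {"sentiment": 0.75, "score": 0.5}
loss_weights = {"sentiment": 1.0, "score": 0.5}  # would come from each head's loss_weight

output = HeadedOutputSketch(
    loss=combine_losses(loss_by_head, loss_weights),
    loss_by_head=loss_by_head,
    preds_by_head={"sentiment": [[0.2, 0.8]], "score": [[3.1]]},
)

# Results are keyed by head name, so each head can be inspected separately.
per_head_report = {
    name: (output.loss_by_head[name], output.preds_by_head[name])
    for name in output.preds_by_head
}
```

Keying both dictionaries by head name lets evaluation code report each task on its own while the optimizer only sees the single combined loss.
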

Module contents