transformer_heads package
Subpackages
- transformer_heads.model package
- transformer_heads.tests package
- transformer_heads.util package
Submodules
transformer_heads.config module
This module defines the configuration classes for the model and its heads.
It includes the HeadConfig class, which defines the configuration for a model head, and the HeadedConfig class, which extends a base configuration class with additional output heads.
- Classes:
HeadConfig: A configuration class for a model head.
HeadedConfig: A configuration class that extends a base configuration class with additional output heads.
- class transformer_heads.config.HeadConfig(name: str, in_size: int, num_outputs: int | None, layer_hook: int = -1, hidden_size: int = 0, num_layers: int = 1, output_activation: str = 'linear', is_causal_lm: bool | None = False, pred_for_sequence: bool | None = False, is_regression: bool | None = False, output_bias: bool | None = False, loss_fct: str | None = 'cross_entropy', trainable: bool | None = True, loss_weight: float | None = 1.0)
Bases: dict
A configuration class for a model head.
- Variables:
name (str) – The name of the head.
in_size (int) – The input size for the head.
num_outputs (Optional[int]) – The number of outputs for the head.
layer_hook (int) – The layer to hook the head to. This uses Python list indexing, so -1 is the last layer. Default is -1.
hidden_size (int) – The size of the hidden layers if the head is an MLP. Default is 0.
num_layers (int) – The number of layers in the head. Set to 1 for a linear head and to > 1 for an MLP head. Default is 1.
output_activation (str) – The activation function for the output layer. Default is “linear”.
is_causal_lm (Optional[bool]) – Whether the head is for doing causal language modelling. Default is False.
pred_for_sequence (Optional[bool]) – Whether the head predicts one output per sequence (e.g. text classification). Default is False.
is_regression (Optional[bool]) – Whether the head is for a regression task. Default is False.
output_bias (Optional[bool]) – Whether to include a bias in the output layer. Default is False.
loss_fct (Optional[str]) – The loss function for the head. Options are “cross_entropy”, “mse”, “bce”. Default is “cross_entropy”.
trainable (Optional[bool]) – Whether the head is trainable. Default is True.
loss_weight (Optional[float]) – The weight of this head when computing the loss. Default is 1.0.
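The interaction of layer_hook, num_layers, and hidden_size can be sketched in plain Python. The helper head_layer_shapes below is hypothetical and not part of transformer_heads; the layer layout it returns (input to hidden, hidden to hidden, hidden to output) is an assumption about how an MLP head might be sized, not a description of the library's implementation.

```python
def head_layer_shapes(in_size, num_outputs, hidden_size=0, num_layers=1):
    """Return (in_features, out_features) for each layer of a head.

    Hypothetical helper: the layout is an assumed reading of the
    documented fields, not the library's actual construction code.
    """
    if num_layers == 1:
        # A single layer is a plain linear head.
        return [(in_size, num_outputs)]
    # First layer projects into the hidden size.
    shapes = [(in_size, hidden_size)]
    # Any middle layers keep the hidden size.
    shapes += [(hidden_size, hidden_size)] * (num_layers - 2)
    # Last layer projects to the number of outputs.
    shapes.append((hidden_size, num_outputs))
    return shapes


# layer_hook follows Python list indexing over the per-layer outputs.
# The layer names here are placeholders for illustration only.
layer_outputs = ["layer_0", "layer_1", "layer_2", "layer_3"]
last_layer = layer_outputs[-1]    # layer_hook = -1 selects "layer_3"
earlier_layer = layer_outputs[-2]  # layer_hook = -2 selects "layer_2"

linear_head = head_layer_shapes(in_size=4096, num_outputs=2)
mlp_head = head_layer_shapes(in_size=4096, num_outputs=1, hidden_size=1024, num_layers=3)
```

Negative indices keep a head attached to the same relative depth even when the base model has a different number of layers.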
- in_size: int
- is_causal_lm: bool | None = False
- is_regression: bool | None = False
- items() → a set-like object providing a view on D's items
- layer_hook: int = -1
- loss_fct: str | None = 'cross_entropy'
- loss_weight: float | None = 1.0
- name: str
- num_layers: int = 1
- num_outputs: int | None
- output_activation: str = 'linear'
- output_bias: bool | None = False
- pred_for_sequence: bool | None = False
- trainable: bool | None = True
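As an illustration of how these fields might be combined for different tasks, the following sketch uses a stand-in dataclass named HeadConfigSketch that mirrors the documented signature. It is a hypothetical substitute so that the example runs without the package installed; in real use the class would presumably be imported from transformer_heads.config. The in_size value of 4096 is an assumed base-model hidden size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadConfigSketch:
    """Hypothetical stand-in mirroring the documented HeadConfig fields."""

    name: str
    in_size: int
    num_outputs: Optional[int]
    layer_hook: int = -1
    hidden_size: int = 0
    num_layers: int = 1
    output_activation: str = "linear"
    is_causal_lm: Optional[bool] = False
    pred_for_sequence: Optional[bool] = False
    is_regression: Optional[bool] = False
    output_bias: Optional[bool] = False
    loss_fct: Optional[str] = "cross_entropy"
    trainable: Optional[bool] = True
    loss_weight: Optional[float] = 1.0


# Linear head that predicts one of two classes per sequence.
sentiment_head = HeadConfigSketch(
    name="sentiment",
    in_size=4096,  # assumed hidden size of the base model
    num_outputs=2,
    pred_for_sequence=True,
)

# Two-layer MLP regression head hooked four layers from the end.
score_head = HeadConfigSketch(
    name="score",
    in_size=4096,
    num_outputs=1,
    layer_hook=-4,
    hidden_size=1024,
    num_layers=2,
    is_regression=True,
    loss_fct="mse",
    loss_weight=0.5,
)
```

Only name, in_size, and num_outputs lack defaults, so a simple classification head needs very few arguments.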
- transformer_heads.config.create_headed_model_config(base_config_class: Type[PretrainedConfig]) → Type[PretrainedConfig]
Creates a new configuration class with additional output heads.
This function takes a base configuration class and returns a new class that inherits from the base class and adds an output_heads attribute. The output_heads attribute is a list of HeadConfig instances.
- Parameters:
base_config_class (Type[PretrainedConfig]) – The base configuration class to extend.
- Returns:
A new configuration class that includes output heads.
- Return type:
Type[PretrainedConfig]
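The class-factory pattern described above can be sketched without the transformers dependency. Everything in this block is hypothetical: BaseConfigStandIn takes the place of a PretrainedConfig subclass, create_headed_config_sketch is not the library function, and plain dicts stand in for HeadConfig instances.

```python
from typing import List, Type


class BaseConfigStandIn:
    """Hypothetical stand-in for a PretrainedConfig subclass."""

    def __init__(self, **kwargs):
        self.hidden_size = kwargs.pop("hidden_size", 768)


def create_headed_config_sketch(base_config_class: Type) -> Type:
    """Return a subclass of base_config_class that also stores output_heads."""

    class HeadedConfig(base_config_class):
        def __init__(self, output_heads=None, **kwargs):
            # Let the base class consume its own keyword arguments.
            super().__init__(**kwargs)
            # Copy into a fresh list so instances never share state.
            self.output_heads: List[dict] = list(output_heads or [])

    HeadedConfig.__name__ = f"Headed{base_config_class.__name__}"
    return HeadedConfig


HeadedStandIn = create_headed_config_sketch(BaseConfigStandIn)
config = HeadedStandIn(
    output_heads=[{"name": "sentiment", "num_outputs": 2}],
    hidden_size=1024,
)
```

Building the class dynamically lets one factory serve any base configuration class instead of requiring a hand-written subclass per model family.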
transformer_heads.constants module
This module defines constants for loss functions and model types.
It includes three dictionaries: loss_fct_map, which maps loss function names to their corresponding PyTorch implementations; model_type_map, which maps model type names to their corresponding transformers model classes; and activation_map, which maps activation function names to their corresponding PyTorch implementations.
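The name-to-implementation lookup can be illustrated with a small registry. This is only a sketch of the pattern: the functions below are pure-Python stand-ins operating on single values, not the PyTorch implementations the real maps point to, and the dictionary names carry a _sketch suffix to mark them as hypothetical. The loss keys are the three options documented for HeadConfig.loss_fct.

```python
import math


def mse(pred, target):
    """Squared error for a single prediction."""
    return (pred - target) ** 2


def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def cross_entropy(logits, target_index):
    """Cross-entropy for one example, computed from unnormalised logits."""
    log_norm = math.log(sum(math.exp(x) for x in logits))
    return log_norm - logits[target_index]


# Hypothetical registries keyed by the names used in a head configuration.
loss_fct_map_sketch = {"cross_entropy": cross_entropy, "mse": mse, "bce": bce}
# Only "linear" is named in the source; other keys are not assumed here.
activation_map_sketch = {"linear": lambda x: x}

# A string from the configuration selects the implementation.
loss_fn = loss_fct_map_sketch["mse"]
squared_error = loss_fn(3.0, 1.0)
```

Keeping the mapping in one place means a configuration can stay a plain, serialisable string while the callable is resolved at model construction time.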
transformer_heads.output module
This module defines the output class for a model with multiple heads.
It includes the HeadedModelOutput class, which extends the ModelOutput class from the transformers library with additional attributes for multi-head models.
- Classes:
HeadedModelOutput: An output class for a model with multiple heads.
- class transformer_heads.output.HeadedModelOutput(loss: FloatTensor | None = None, loss_by_head: dict[str, FloatTensor] | None = None, preds_by_head: dict[str, FloatTensor] | None = None, past_key_values: Tuple[Tuple[FloatTensor]] | None = None, hidden_states: Tuple[FloatTensor, ...] | None = None, attentions: Tuple[FloatTensor, ...] | None = None)
Bases: ModelOutput
An output class for a model with multiple heads.
This class extends the ModelOutput class from the transformers library with additional attributes for multi-head models.
- Variables:
loss (Optional[torch.FloatTensor]) – The total loss.
loss_by_head (Optional[dict[str, torch.FloatTensor]]) – A dictionary mapping head names to their corresponding losses.
preds_by_head (Optional[dict[str, torch.FloatTensor]]) – A dictionary mapping head names to their corresponding predictions.
past_key_values (Optional[Tuple[Tuple[torch.FloatTensor]]]) – Tuple of key-value states for transformer models.
hidden_states (Optional[Tuple[torch.FloatTensor, ...]]) – Tuple of hidden states for transformer models.
attentions (Optional[Tuple[torch.FloatTensor, ...]]) – Tuple of attention weights for transformer models.
- attentions: Tuple[FloatTensor, ...] | None = None
- loss: FloatTensor | None = None
- loss_by_head: dict[str, FloatTensor] | None = None
- past_key_values: Tuple[Tuple[FloatTensor]] | None = None
- preds_by_head: dict[str, FloatTensor] | None = None
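One plausible relationship between loss, loss_by_head, and the per-head loss_weight is a weighted sum. The sketch below assumes that relationship; the source does not state how the total is computed. HeadedOutputSketch and combine_losses are hypothetical names, and plain floats and lists stand in for torch.FloatTensor values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeadedOutputSketch:
    """Hypothetical stand-in with three of the documented output fields."""

    loss: Optional[float] = None
    loss_by_head: Optional[Dict[str, float]] = None
    preds_by_head: Optional[Dict[str, List[float]]] = None


def combine_losses(loss_by_head, loss_weights):
    """Weighted sum of per-head losses (assumed aggregation rule)."""
    return sum(
        loss_weights.get(name, 1.0) * value
        for name, value in loss_by_head.items()
    )


loss_by_head = {"sentiment": 0.5, "score": 2.0}
loss_weights = {"sentiment": 1.0, "score": 0.25}

output = HeadedOutputSketch(
    loss=combine_losses(loss_by_head, loss_weights),
    loss_by_head=loss_by_head,
    preds_by_head={"sentiment": [0.1, 0.9], "score": [3.2]},
)
```

Keeping loss_by_head alongside the total makes it possible to log each head's contribution separately during training.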