transformer_heads.util package

Submodules

transformer_heads.util.evaluate module

This module contains functions for evaluating a HeadedModel, a type of model that has multiple “heads” for different tasks. It provides functions to compute the model loss for each of its heads, get predictions from the model, and get the top n predictions for a given text.

Functions:

evaluate_head_wise(model: HeadedModel, ds: Dataset, collator=None, batch_size=8, epochs=1) -> tuple[int, dict[str, int]]:: Compute the model loss for each of its heads.
get_some_preds(model, ds, tokenizer, n=5, classification=True) -> tuple[list[str], dict[str, list[int]], dict[str, list[int]]]:: Get predictions from the model.
get_top_n_preds(n: int, model: HeadedModel, text: str, tokenizer: PreTrainedTokenizer):: Get the top n predictions for a given text. Use for models with causal language modeling heads.

transformer_heads.util.evaluate.evaluate_head_wise(model: HeadedModel, ds: Dataset, collator=None, batch_size=8, epochs=1) → tuple[int, dict[str, int]]

Compute the model loss for each of its heads.

Parameters:

model (HeadedModel) – The model to be evaluated.
ds (Dataset) – The dataset to be used for evaluation.
collator (callable, optional) – Merges a list of samples to form a mini-batch.
batch_size (int, optional) – The size of each batch. Defaults to 8.
epochs (int, optional) – The number of epochs for evaluation. Defaults to 1.

Returns:

The overall loss and the losses by each head.

Return type:

tuple[int, dict[str, int]]
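
A minimal sketch of the kind of aggregation behind a head-wise evaluation, assuming each batch yields one loss value per head. The helper name average_head_losses, its input layout, and the choice to report the overall loss as the sum of the per-head means are all assumptions made for illustration; they are not taken from transformer_heads, whose function runs the model itself.

```python
from collections import defaultdict


def average_head_losses(batch_losses):
    """Average per-head losses over batches (hypothetical helper).

    batch_losses: list of dicts mapping head name -> loss for one batch.
    Returns (overall_loss, losses_by_head); overall_loss is assumed here
    to be the sum of the per-head mean losses.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for losses in batch_losses:
        for head_name, loss in losses.items():
            totals[head_name] += loss
            counts[head_name] += 1
    # Mean loss per head across all batches that reported it.
    by_head = {name: totals[name] / counts[name] for name in totals}
    overall = sum(by_head.values())
    return overall, by_head


overall, by_head = average_head_losses(
    [{"sentiment": 0.5, "lm": 2.0}, {"sentiment": 1.5, "lm": 4.0}]
)
```

The returned pair mirrors the shape described above: one overall number plus a dictionary keyed by head name.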

transformer_heads.util.evaluate.get_some_preds(model, ds, tokenizer, n=5, classification=True) → tuple[list[str], dict[str, list[int]], dict[str, list[int]]]

Get predictions from the model.

Parameters:

model (HeadedModel) – The model to be used for prediction.
ds (Dataset) – The dataset to be used for prediction.
tokenizer (PreTrainedTokenizer) – The tokenizer to be used.
n (int, optional) – The number of predictions to get, taken from the beginning of the dataset. Defaults to 5.
classification (bool, optional) – Whether the task is text classification. Defaults to True.

Returns:

The inputs, predictions, and ground truths.

Return type:

tuple[list[str], dict[str, list[int]], dict[str, list[int]]]

transformer_heads.util.evaluate.get_top_n_preds(n: int, model: HeadedModel, text: str, tokenizer: PreTrainedTokenizer)

Get the top n predictions for a given text. Use for models with causal language modeling heads.

Parameters:

n (int) – The number of top predictions to get.
model (HeadedModel) – The model to be used for prediction.
text (str) – The input text to be used for prediction.
tokenizer (PreTrainedTokenizer) – The tokenizer to be used.

Returns:

The top n predictions for each head.

Return type:

dict[str, list[str]]
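
The selection step can be sketched as follows, assuming the last-position logits for each head are already available as plain lists. The function top_n_tokens, the toy vocabulary, and the head name lm_head are hypothetical; the library function additionally tokenizes the text and runs the model.

```python
import heapq


def top_n_tokens(n, logits_by_head, vocab):
    """Return the n highest-scoring tokens per head, best first.

    logits_by_head: dict mapping head name -> list of scores, one per
    vocabulary entry (hypothetical layout, for illustration only).
    """
    result = {}
    for head_name, logits in logits_by_head.items():
        # Indices of the n largest scores, ordered from largest to smallest.
        best_ids = heapq.nlargest(n, range(len(logits)), key=logits.__getitem__)
        result[head_name] = [vocab[i] for i in best_ids]
    return result


vocab = ["the", "cat", "sat", "mat"]
preds = top_n_tokens(2, {"lm_head": [0.1, 2.5, -1.0, 1.7]}, vocab)
```

The result has the documented shape dict[str, list[str]]: one list of decoded candidates per head.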

transformer_heads.util.helpers module

This module provides helper functions and classes for handling data and models in a language model training and evaluation pipeline. It includes a data collator for padding sequences and a function for getting model parameters based on the model type.

Classes:: DataCollatorWithPadding: A data collator that pads sequences to the same length.
Functions:: get_model_params(model_path: str): Get the parameters of a model based on its type.

class transformer_heads.util.helpers.DataCollatorWithPadding(feature_name_to_padding_value: dict[str, int])

Bases: object

A data collator that pads sequences to the same length.

Variables:: feature_name_to_padding_value (dict[str, int]) – A dictionary mapping feature names to their padding values.

__call__(features: List[Dict[str, Any]]) -> Dict[str, Any]: Pad the sequences in the features to the same length.

feature_name_to_padding_value: dict[str, int]
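
A rough, pure-Python sketch of per-feature padding, assuming right padding and plain lists instead of tensors. The class name PaddingCollatorSketch is hypothetical and this is not the library's implementation; it only illustrates how a mapping from feature name to padding value can drive the collation.

```python
from typing import Any, Dict, List


class PaddingCollatorSketch:
    """Pads each listed feature to the longest sequence in the batch."""

    def __init__(self, feature_name_to_padding_value: Dict[str, int]):
        self.feature_name_to_padding_value = feature_name_to_padding_value

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        batch = {}
        for name, pad_value in self.feature_name_to_padding_value.items():
            sequences = [list(sample[name]) for sample in features]
            max_len = max(len(seq) for seq in sequences)
            # Right-pad every sequence with this feature's padding value.
            batch[name] = [
                seq + [pad_value] * (max_len - len(seq)) for seq in sequences
            ]
        return batch


collate = PaddingCollatorSketch({"input_ids": 0, "labels": -100})
batch = collate(
    [
        {"input_ids": [5, 6, 7], "labels": [1, 2, 3]},
        {"input_ids": [8], "labels": [4]},
    ]
)
```

Giving each feature its own padding value lets labels be padded with a value a loss function can ignore while input IDs use the tokenizer's pad ID.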

transformer_heads.util.helpers.get_model_params(model_path: str)

Get the parameters of a model based on its type.

Parameters:: model_path (str) – The Hugging Face name of the model.
Returns:: A dictionary containing the model class, hidden size, and vocab size.
Return type:: dict
Raises:: ValueError – If the model type is unknown.

transformer_heads.util.load_model module

This module provides functions for loading and creating transformer models with additional heads.

Functions:

patch_quantization_config:: Modifies the quantization configuration to skip head modules during the quantization process.
load_headed:: Loads a transformer model with additional heads.
load_lora_with_heads:: Loads a LoRA (Low Rank Adaptation) transformer model with additional heads.
create_headed_qlora:: Creates a quantized LoRA (Low Rank Adaptation) transformer model with additional heads.

These functions are used to load and create transformer models with additional heads, which can be useful for tasks such as multi-task learning or linear probes. The models can be loaded with or without quantization, and with or without LoRA (Low Rank Adaptation).

transformer_heads.util.load_model.create_headed_qlora(base_model_class: Type[PreTrainedModel], model_name: str, quantization_config: BitsAndBytesConfig, lora_config: LoraConfig, head_configs: list[HeadConfig], fully_trained_heads: bool = True, device_map='auto', gradient_checkpointing: bool = False, **kwargs)

Creates a quantized LoRA (Low Rank Adaptation) transformer model with additional heads.

Parameters:

base_model_class (Type[PreTrainedModel]) – The class of the base transformer model.
model_name (str) – The name of the pretrained base model (e.g. its Hugging Face name).
quantization_config (BitsAndBytesConfig) – The quantization configuration to use when creating the model.
lora_config (LoraConfig) – The LoRA configuration to adapt the model with.
head_configs (list[HeadConfig]) – A list of head configurations.
fully_trained_heads (bool, optional) – Whether the heads should be fully trained.
device_map (str, optional) – The device map to use when creating the model.
gradient_checkpointing (bool, optional) – Whether to prepare the model for gradient checkpointing.
**kwargs – Additional keyword arguments to pass to from_pretrained.

transformer_heads.util.load_model.load_headed(base_model_class: Type[PreTrainedModel], model_name: str, head_configs=None, head_folder_path=None, only_inference: bool = False, device_map='auto', quantization_config: BitsAndBytesConfig = None, freeze_base_model: bool = True, **kwargs)

Loads a transformer model with additional heads.

Parameters:

base_model_class (Type[PreTrainedModel]) – The class of the base transformer model.
model_name (str) – The Hugging Face name of the model to load.
head_configs (list, optional) – A list of head configurations.
head_folder_path (str, optional) – The path to the folder containing the saved heads and head configurations.
only_inference (bool, optional) – Whether to load the model for inference only.
device_map (str, optional) – The device map to use when loading the model.
quantization_config (BitsAndBytesConfig, optional) – The quantization configuration to use when loading the model.
freeze_base_model (bool, optional) – Whether to freeze the base model during training.
**kwargs – Additional keyword arguments to pass to from_pretrained.

transformer_heads.util.load_model.load_lora_with_heads(base_model_class: Type[PreTrainedModel], path: str, quantization_config: BitsAndBytesConfig = None, only_inference: bool = False, fully_trained_heads: bool = True, device_map='auto', torch_dtype=torch.float32, gradient_checkpointing: bool = False, **kwargs)

Loads a LoRA (Low Rank Adaptation) transformer model with additional heads.

Parameters:

base_model_class (Type[PreTrainedModel]) – The class of the base transformer model.
path (str) – The path to the headed model to load, either a locally saved folder or a Hugging Face name.
quantization_config (BitsAndBytesConfig, optional) – The quantization configuration to use when loading the model.
only_inference (bool, optional) – Whether to load the model for inference only.
fully_trained_heads (bool, optional) – Whether to fully train all the heads.
device_map (str, optional) – The device map to use when loading the model.
torch_dtype (torch.dtype, optional) – The torch processing data type for the model.
gradient_checkpointing (bool, optional) – Whether to prepare the model for gradient checkpointing.
**kwargs – Additional keyword arguments to pass to from_pretrained.

transformer_heads.util.load_model.patch_quantization_config(quantization_config: BitsAndBytesConfig)

Modifies the quantization configuration to skip head modules during the quantization process.

Parameters:: quantization_config (BitsAndBytesConfig) – The quantization configuration to modify.
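
One plausible way to keep modules out of quantization is to extend the configuration's skip list. The sketch below assumes this is done through the llm_int8_skip_modules field that BitsAndBytesConfig exposes; whether transformer_heads does exactly this is an assumption. QuantConfigStandIn, add_skip_modules, and the module name "heads" are hypothetical stand-ins so the example runs without transformers installed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuantConfigStandIn:
    # Stand-in for BitsAndBytesConfig; only the one field used here.
    llm_int8_skip_modules: Optional[List[str]] = None


def add_skip_modules(config, module_names):
    """Add module_names to the config's skip list in place, without duplicates."""
    current = list(config.llm_int8_skip_modules or [])
    for name in module_names:
        if name not in current:
            current.append(name)
    config.llm_int8_skip_modules = current
    return config


# "heads" is a hypothetical module name for the attached head modules.
config = add_skip_modules(QuantConfigStandIn(["lm_head"]), ["heads", "lm_head"])
```

Existing entries are preserved, so a caller's own skip list is not overwritten.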

transformer_heads.util.model module

This module contains utility functions for handling and modifying the state of a model, finding all linear names in a model, printing the number of trainable parameters in the model, and patching the save_pretrained method of a model.

Functions:

patch_state_dict(state_dict: Dict):: Patch a state_dict to fix problems with zero-dimensional tensors.
find_all_linear_names(bits: int, model: torch.nn.Module, noadd: List[str] = []):: Find all linear modules in a model.
print_trainable_parameters(model: torch.nn.Module, use_4bit: bool = False):: Print some information about the trainable parameters of a model.
patch_save_pretrained(model: torch.nn.Module, preserve_old: bool = True):: Patch the save_pretrained method of a model to save heads and head configurations.

transformer_heads.util.model.find_all_linear_names(bits, model, noadd=[])

Find all linear modules in a model.

Parameters:

bits (int) – The number of bits used in quantization. (set to 32 for unquantized model)
model (torch.nn.Module) – The model to find linear names in.
noadd (List[str], optional) – A list of names to exclude. Defaults to [].

Returns:

A list of all linear names in the model.

Return type:

List[str]
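
The search can be sketched as below, assuming the returned names are the last component of each module path, a common convention when choosing LoRA target modules. The function linear_leaf_names and the stand-in classes are hypothetical; the example is duck-typed so it runs without torch, and it ignores the bits argument, which in the library presumably selects the quantized linear class to match.

```python
def linear_leaf_names(named_modules, linear_type, noadd=()):
    """Collect the leaf names of all modules that are instances of linear_type.

    named_modules: iterable of (dotted_name, module) pairs, the shape that
    torch.nn.Module.named_modules() yields.
    noadd: names to exclude from the result.
    """
    names = set()
    for full_name, module in named_modules:
        if isinstance(module, linear_type):
            leaf = full_name.split(".")[-1]
            if leaf not in noadd:
                names.add(leaf)
    return sorted(names)


class Linear:  # stand-in for torch.nn.Linear
    pass


class Norm:  # stand-in for a non-linear module such as a norm layer
    pass


modules = [
    ("layers.0.q_proj", Linear()),
    ("layers.0.norm", Norm()),
    ("layers.1.q_proj", Linear()),
    ("lm_head", Linear()),
]
names = linear_leaf_names(modules, Linear, noadd=["lm_head"])
```

With a real model, the same function could be called with model.named_modules() and torch.nn.Linear.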

transformer_heads.util.model.patch_save_pretrained(model, preserve_old: bool = True)

Patch the save_pretrained method of a model to save heads and head configurations.

Parameters:

model (torch.nn.Module) – The model to patch the save_pretrained method for.
preserve_old (bool, optional) – Whether to preserve (and call) the old save_pretrained method. Defaults to True.
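
The general technique, replacing a method on one instance while optionally keeping the original behaviour, can be sketched like this. TinyModel and patch_save_sketch are hypothetical, and the step that would write head weights and head configurations is only a placeholder; this is not the library's code.

```python
import types


class TinyModel:
    """Hypothetical stand-in for a model with a save_pretrained method."""

    def __init__(self):
        self.saved = []

    def save_pretrained(self, path):
        self.saved.append(("base", path))


def patch_save_sketch(model, preserve_old=True):
    old_save = model.save_pretrained  # bound method captured before patching

    def new_save(self, path, **kwargs):
        if preserve_old:
            # Run the original save logic first.
            old_save(path, **kwargs)
        # Placeholder for writing head weights and head configurations.
        self.saved.append(("heads", path))

    # Bind the replacement to this instance only; the class is untouched.
    model.save_pretrained = types.MethodType(new_save, model)
    return model


model = patch_save_sketch(TinyModel())
model.save_pretrained("out_dir")
```

Patching the instance rather than the class keeps other models of the same class unaffected.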

transformer_heads.util.model.patch_state_dict(state_dict)

Patch a state_dict to fix problems with zero-dimensional tensors.

Parameters:: state_dict (Dict) – The state dictionary of a model.
Returns:: The modified state dictionary.
Return type:: Dict

transformer_heads.util.model.print_trainable_parameters(model, use_4bit=False)

Print some information about the trainable parameters of a model.

Parameters:

model (torch.nn.Module) – The model to print the number of trainable parameters for.
use_4bit (bool, optional) – Whether 4-bit quantization is used. Defaults to False.
