heads¶

heads provides a collection of predefined prediction heads.

Each head takes a ModelOutput, a dict, or a tuple as input. It automatically looks for the model output required for prediction and processes it accordingly.

Some prediction heads, such as ContactPredictionHead, may require additional information like the attention_mask or the input_ids. These can be passed in as positional or keyword arguments.

Note that heads use the same ModelOutput conventions as Transformers. If the model output is a tuple, the first element is treated as the last_hidden_state, the second element as the pooler_output, and the last element as the attentions. It is the user’s responsibility to ensure that the model output is correctly formatted.

If the model output is a ModelOutput or a dict, the head looks up HeadConfig.output_name in the model output. You can set output_name in the HeadConfig to ensure that the head locates the required tensor.
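The lookup rules above can be sketched in plain Python. This is a minimal illustration, not the library's code: `select_output` and `tuple_index` are hypothetical names, and strings stand in for tensors. The tuple positions mirror the `forward` implementations on this page (index 1 for `SequencePredictionHead`, index 0 for `TokenPredictionHead`).

```python
from collections.abc import Mapping


def select_output(outputs, output_name, tuple_index):
    """Pick the tensor a head needs from dict-like or tuple model outputs.

    Hypothetical helper for illustration; not part of multimolecule.
    """
    if isinstance(outputs, Mapping):
        # Dict-like outputs are looked up by name. A Transformers
        # ModelOutput is an OrderedDict subclass, so it takes this branch too.
        return outputs[output_name]
    if isinstance(outputs, tuple):
        # Tuples carry no names, so the head has to rely on position.
        return outputs[tuple_index]
    raise ValueError(f"Unsupported type for outputs: {type(outputs)}")


# Strings stand in for tensors to keep the sketch dependency-free.
as_dict = {"last_hidden_state": "hidden", "pooler_output": "pooled"}
as_tuple = ("hidden", "pooled")

sequence_input = select_output(as_dict, "pooler_output", tuple_index=1)
token_input = select_output(as_tuple, "last_hidden_state", tuple_index=0)
```

Because a tuple is resolved purely by position, passing a dict or ModelOutput is the safer choice whenever the model's output order is uncertain.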

multimolecule.module.heads.config ¶

HeadConfig ¶

Bases: BaseHeadConfig

Configuration class for a prediction head.

Parameters:

Name	Description	Default
`num_labels` ¶	Number of labels to use in the last layer added to the model, typically for a classification task. The head falls back to `Config.num_labels` if it is `None`.	`None`
`problem_type` ¶	Problem type for `XxxForYyyPrediction` models. Can be one of `"binary"`, `"regression"`, `"multiclass"` or `"multilabel"`. The head falls back to `Config.problem_type` if it is `None`.	`None`
`hidden_size` ¶	Dimensionality of the encoder layers and the pooler layer. The head falls back to `Config.hidden_size` if it is `None`.	`None`
`dropout` ¶	The dropout ratio for the hidden states.	`0.0`
`transform` ¶	The transform operation applied to hidden states.	`None`
`transform_act` ¶	The activation function of the transform applied to hidden states.	`'gelu'`
`bias` ¶	Whether to apply bias to the final prediction layer.	`True`
`act` ¶	The activation function of the final prediction output.	`None`
`layer_norm_eps` ¶	The epsilon used by the layer normalization layers.	`1e-12`
`output_name` ¶	The name of the tensor required in model outputs. If it is `None`, the default output name of the corresponding head is used.	`None`
`type` ¶	The type of the head in the model. This is used by `MultiMoleculeModel` to construct heads.	`None`

Source code in multimolecule/module/heads/config.py

Python
class HeadConfig(BaseHeadConfig):
    r"""
    Configuration class for a prediction head.

    Args:
        num_labels:
            Number of labels to use in the last layer added to the model, typically for a classification task.

            Head should look for [`Config.num_labels`][multimolecule.PreTrainedConfig] if is `None`.
        problem_type:
            Problem type for `XxxForYyyPrediction` models. Can be one of `"binary"`, `"regression"`,
            `"multiclass"` or `"multilabel"`.

            Head should look for [`Config.problem_type`][multimolecule.PreTrainedConfig] if is `None`.
        hidden_size:
            Dimensionality of the encoder layers and the pooler layer.

            Head should look for [`Config.hidden_size`][multimolecule.PreTrainedConfig] if is `None`.
        dropout:
            The dropout ratio for the hidden states.
        transform:
            The transform operation applied to hidden states.
        transform_act:
            The activation function of transform applied to hidden states.
        bias:
            Whether to apply bias to the final prediction layer.
        act:
            The activation function of the final prediction output.
        layer_norm_eps:
            The epsilon used by the layer normalization layers.
        output_name:
            The name of the tensor required in model outputs.

            If is `None`, will use the default output name of the corresponding head.
        type:
            The type of the head in the model.

            This is used by [`MultiMoleculeModel`][multimolecule.MultiMoleculeModel] to construct heads.
    """

    num_labels: Optional[int] = None
    problem_type: Optional[str] = None
    hidden_size: Optional[int] = None
    dropout: float = 0.0
    transform: Optional[str] = None
    transform_act: Optional[str] = "gelu"
    bias: bool = True
    act: Optional[str] = None
    layer_norm_eps: float = 1e-12
    output_name: Optional[str] = None
    type: Optional[str] = None
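The `None` fallback described for `num_labels`, `problem_type`, and `hidden_size` can be pictured as follows. This is a sketch under assumptions: `ModelConfigStub`, `HeadConfigStub`, and `resolve` are hypothetical stand-ins, and the library's own resolution logic is not shown in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigStub:
    """Hypothetical stand-in for the model-level config."""

    num_labels: int = 3
    problem_type: str = "multiclass"
    hidden_size: int = 768


@dataclass
class HeadConfigStub:
    """Hypothetical stand-in carrying three of the HeadConfig fields."""

    num_labels: Optional[int] = None
    problem_type: Optional[str] = None
    hidden_size: Optional[int] = None


def resolve(head: HeadConfigStub, model: ModelConfigStub, name: str):
    """Return the head-level value, or the model-level one if it is None."""
    value = getattr(head, name)
    return getattr(model, name) if value is None else value


model = ModelConfigStub()
# The head overrides the task settings but leaves hidden_size unset.
head = HeadConfigStub(num_labels=1, problem_type="regression")

resolved = {
    name: resolve(head, model, name)
    for name in ("num_labels", "problem_type", "hidden_size")
}
```

Here the head keeps its own `num_labels` and `problem_type`, while `hidden_size` is taken from the model-level config.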

MaskedLMHeadConfig ¶

Bases: BaseHeadConfig

Configuration class for a Masked Language Modeling head.

Parameters:

Name	Description	Default
`hidden_size` ¶	Dimensionality of the encoder layers and the pooler layer. The head falls back to `Config.hidden_size` if it is `None`.	`None`
`dropout` ¶	The dropout ratio for the hidden states.	`0.0`
`transform` ¶	The transform operation applied to hidden states.	`'nonlinear'`
`transform_act` ¶	The activation function of the transform applied to hidden states.	`'gelu'`
`bias` ¶	Whether to apply bias to the final prediction layer.	`True`
`act` ¶	The activation function of the final prediction output.	`None`
`layer_norm_eps` ¶	The epsilon used by the layer normalization layers.	`1e-12`
`output_name` ¶	The name of the tensor required in model outputs. If it is `None`, the default output name of the corresponding head is used.	`None`

Source code in multimolecule/module/heads/config.py

Python
class MaskedLMHeadConfig(BaseHeadConfig):
    r"""
    Configuration class for a Masked Language Modeling head.

    Args:
        hidden_size:
            Dimensionality of the encoder layers and the pooler layer.

            Head should look for [`Config.hidden_size`][multimolecule.PreTrainedConfig] if is `None`.
        dropout:
            The dropout ratio for the hidden states.
        transform:
            The transform operation applied to hidden states.
        transform_act:
            The activation function of transform applied to hidden states.
        bias:
            Whether to apply bias to the final prediction layer.
        act:
            The activation function of the final prediction output.
        layer_norm_eps:
            The epsilon used by the layer normalization layers.
        output_name:
            The name of the tensor required in model outputs.

            If is `None`, will use the default output name of the corresponding head.
    """

    hidden_size: Optional[int] = None
    dropout: float = 0.0
    transform: Optional[str] = "nonlinear"
    transform_act: Optional[str] = "gelu"
    bias: bool = True
    act: Optional[str] = None
    layer_norm_eps: float = 1e-12
    output_name: Optional[str] = None

multimolecule.module.heads.sequence ¶

SequencePredictionHead ¶

Bases: PredictionHead

Head for sequence-level tasks.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If `None`, the configuration from `config` is used.	`None`

Source code in multimolecule/module/heads/sequence.py

Python
@HeadRegistry.register("sequence")
class SequencePredictionHead(PredictionHead):
    r"""
    Head for tasks in sequence-level.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "pooler_output"
    r"""The default output to use for the head."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the SequencePredictionHead.

        Args:
            outputs: The outputs of the model.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[1]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")
        return super().forward(output, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'pooler_output'

The default output to use for the head.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the SequencePredictionHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/sequence.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the SequencePredictionHead.

    Args:
        outputs: The outputs of the model.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[1]
    else:
        raise ValueError(f"Unsupported type for outputs: {type(outputs)}")
    return super().forward(output, labels, **kwargs)

multimolecule.module.heads.token ¶

TokenPredictionHead ¶

Bases: PredictionHead

Head for token-level tasks.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If `None`, the configuration from `config` is used.	`None`

Source code in multimolecule/module/heads/token.py

Python
@HeadRegistry.token.register("single", default=True)
@TokenHeadRegistryHF.register("single", default=True)
class TokenPredictionHead(PredictionHead):
    r"""
    Head for tasks in token-level.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "last_hidden_state"
    r"""The default output to use for the head."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the TokenPredictionHead.

        Args:
            outputs: The outputs of the model.
            attention_mask: The attention mask for the inputs.
            input_ids: The input ids for the inputs.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[0]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

        if attention_mask is None:
            attention_mask = self._get_attention_mask(input_ids)
        output = output * attention_mask.unsqueeze(-1)
        output, _, _ = self._remove_special_tokens(output, attention_mask, input_ids)

        return super().forward(output, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'last_hidden_state'

The default output to use for the head.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], attention_mask: Tensor | None = None, input_ids: NestedTensor | Tensor | None = None, labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the TokenPredictionHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`attention_mask` ¶	`Tensor \| None`	The attention mask for the inputs.	`None`
`input_ids` ¶	`NestedTensor \| Tensor \| None`	The input ids for the inputs.	`None`
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/token.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    attention_mask: Tensor | None = None,
    input_ids: NestedTensor | Tensor | None = None,
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the TokenPredictionHead.

    Args:
        outputs: The outputs of the model.
        attention_mask: The attention mask for the inputs.
        input_ids: The input ids for the inputs.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[0]
    else:
        raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

    if attention_mask is None:
        attention_mask = self._get_attention_mask(input_ids)
    output = output * attention_mask.unsqueeze(-1)
    output, _, _ = self._remove_special_tokens(output, attention_mask, input_ids)

    return super().forward(output, labels, **kwargs)

TokenKMerHead ¶

Bases: PredictionHead

Head for token-level tasks with k-mer inputs.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If `None`, the configuration from `config` is used.	`None`

Source code in multimolecule/module/heads/token.py

Python
@HeadRegistry.register("token.kmer")
@TokenHeadRegistryHF.register("kmer")
class TokenKMerHead(PredictionHead):
    r"""
    Head for tasks in token-level with kmer inputs.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "last_hidden_state"
    r"""The default output to use for the head."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)
        self.nmers = config.nmers

        # Do not pass bos_token_id and eos_token_id to unfold_kmer_embeddings
        # As they will be removed in preprocess
        self.unfold_kmer_embeddings = partial(unfold_kmer_embeddings, nmers=self.nmers)

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the TokenKMerHead.

        Args:
            outputs: The outputs of the model.
            attention_mask: The attention mask for the inputs.
            input_ids: The input ids for the inputs.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[0]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

        if attention_mask is None:
            attention_mask = self._get_attention_mask(input_ids)
        output = output * attention_mask.unsqueeze(-1)
        output, attention_mask, _ = self._remove_special_tokens(output, attention_mask, input_ids)

        output = self.unfold_kmer_embeddings(output, attention_mask)
        return super().forward(output, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'last_hidden_state'

The default output to use for the head.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], attention_mask: Tensor | None = None, input_ids: NestedTensor | Tensor | None = None, labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the TokenKMerHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`attention_mask` ¶	`Tensor \| None`	The attention mask for the inputs.	`None`
`input_ids` ¶	`NestedTensor \| Tensor \| None`	The input ids for the inputs.	`None`
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/token.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    attention_mask: Tensor | None = None,
    input_ids: NestedTensor | Tensor | None = None,
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the TokenKMerHead.

    Args:
        outputs: The outputs of the model.
        attention_mask: The attention mask for the inputs.
        input_ids: The input ids for the inputs.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[0]
    else:
        raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

    if attention_mask is None:
        attention_mask = self._get_attention_mask(input_ids)
    output = output * attention_mask.unsqueeze(-1)
    output, attention_mask, _ = self._remove_special_tokens(output, attention_mask, input_ids)

    output = self.unfold_kmer_embeddings(output, attention_mask)
    return super().forward(output, labels, **kwargs)

unfold_kmer_embeddings ¶

Python

unfold_kmer_embeddings(embeddings: Tensor, attention_mask: Tensor, nmers: int, bos_token_id: int | None = None, eos_token_id: int | None = None) -> Tensor

Unfold k-mer embeddings to token embeddings.

For k-mer input, each embedding column represents k tokens. This is fine for sequence-level tasks, but it sacrifices resolution for token-level tasks. This function unfolds the k-mer embeddings into token embeddings by taking a sliding average: each token embedding is the mean of the k-mer embeddings that cover that token.

For example:

input tokens = ACGU

2-mer embeddings = [<CLS>, AC, CG, GU, <SEP>].

token embeddings = [<CLS>, AC, (AC + CG) / 2, (CG + GU) / 2, GU, <SEP>].

Parameters:

Name	Type	Description	Default
`embeddings` ¶	`Tensor`	The k-mer embeddings.	required
`attention_mask` ¶	`Tensor`	The attention mask.	required
`nmers` ¶	`int`	The number of tokens in each k-mer.	required
`bos_token_id` ¶	`int \| None`	The id of the beginning of sequence token. If not None, the first valid token will not be included in sliding averaging.	`None`
`eos_token_id` ¶	`int \| None`	The id of the end of sequence token. If not None, the last valid token will not be included in sliding averaging.	`None`

Returns:

Type	Description
`Tensor`	The token embeddings.

Examples:

Python Console Session
>>> from danling import NestedTensor
>>> embeddings = NestedTensor(torch.arange(3).repeat(2, 1).T, torch.arange(5).repeat(2, 1).T) + 1
>>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 3, True, True)
>>> output[0, :, 0].tolist()
[1.0, 2.0, 2.0, 2.0, 3.0, 0.0, 0.0]
>>> output[1, :, 0].tolist()
[1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
>>> embeddings = NestedTensor(torch.arange(5).repeat(2, 1).T, torch.arange(7).repeat(2, 1).T) + 1
>>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 4, True, True)
>>> output[0, :, 0].tolist()
[1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 5.0, 0.0, 0.0]
>>> output[1, :, 0].tolist()
[1.0, 2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 5.5, 6.0, 7.0]
>>> embeddings = NestedTensor(torch.arange(7).repeat(2, 1).T, torch.arange(11).repeat(2, 1).T) + 1
>>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 5, True, True)
>>> output[0, :, 0].tolist()
[1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 0.0, 0.0, 0.0, 0.0]
>>> output[1, :, 0].tolist()
[1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0, 8.5, 9.0, 9.5, 10.0, 11.0]
>>> embeddings = NestedTensor(torch.arange(3).repeat(2, 1).T, torch.arange(4).repeat(2, 1).T) + 1
>>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 6, True, True)
>>> output[0, :, 0].tolist()
[1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 0.0]
>>> output[1, :, 0].tolist()
[1.0, 2.0, 2.5, 2.5, 2.5, 2.5, 2.5, 3.0, 4.0]
>>> embeddings = NestedTensor(torch.arange(1).repeat(2, 1).T, torch.arange(2).repeat(2, 1).T) + 1
>>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 6)
>>> output[0, :, 0].tolist()
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
>>> output[1, :, 0].tolist()
[1.0, 1.5, 1.5, 1.5, 1.5, 1.5, 2.0]
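The sliding average can also be followed by hand on scalar values. The sketch below is a pure-Python illustration without special tokens or padding; `unfold_kmers` is a hypothetical helper, not the library function, and it assumes a non-empty list of k-mer values.

```python
def unfold_kmers(kmers, k):
    """Average, for every token position, the k-mers that cover it.

    Pure-Python sketch on scalar "embeddings" without special tokens;
    hypothetical helper, not the library function.
    """
    n = len(kmers)
    tokens = []
    for i in range(n + k - 1):
        # k-mer j spans tokens j .. j + k - 1, so token i is covered by
        # k-mers max(0, i - k + 1) .. min(i, n - 1).
        window = kmers[max(0, i - k + 1) : min(i, n - 1) + 1]
        tokens.append(sum(window) / len(window))
    return tokens


# 2-mers AC, CG, GU encoded as 1.0, 2.0, 3.0 for the tokens A, C, G, U.
two_mer_tokens = unfold_kmers([1.0, 2.0, 3.0], k=2)
# Three 3-mers with values 2, 3, 4 unfold to five token positions.
three_mer_tokens = unfold_kmers([2.0, 3.0, 4.0], k=3)
```

`two_mer_tokens` is `[1.0, 1.5, 2.5, 3.0]`, which is the `AC, (AC + CG) / 2, (CG + GU) / 2, GU` pattern from the description. `three_mer_tokens` is `[2.0, 2.5, 3.0, 3.5, 4.0]`, the values between the two special tokens in the second row of the first example above.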

Source code in multimolecule/module/heads/token.py

Python
def unfold_kmer_embeddings(
    embeddings: Tensor,
    attention_mask: Tensor,
    nmers: int,
    bos_token_id: int | None = None,
    eos_token_id: int | None = None,
) -> Tensor:
    r"""
    Unfold k-mer embeddings to token embeddings.

    For k-mer input, each embedding column represents k tokens.
    This should be fine for sequence level tasks, but sacrifices the resolution for token level tasks.
    This function unfolds the k-mer embeddings to token embeddings by sliding averaging the k-mer embeddings.

    For example:

    input tokens = `ACGU`

    2-mer embeddings = `[<CLS>, AC, CG, GU, <SEP>]`.

    token embeddings = `[<CLS>, AC, (AC + CG) / 2, (CG + GU) / 2, GU, <SEP>]`.

    Args:
        embeddings: The k-mer embeddings.
        attention_mask: The attention mask.
        nmers: The number of tokens in each k-mer.
        bos_token_id: The id of the beginning of sequence token.
            If not None, the first valid token will not be included in sliding averaging.
        eos_token_id: The id of the end of sequence token.
            If not None, the last valid token will not be included in sliding averaging.

    Returns:
        The token embeddings.

    Examples:
        >>> from danling import NestedTensor
        >>> embeddings = NestedTensor(torch.arange(3).repeat(2, 1).T, torch.arange(5).repeat(2, 1).T) + 1
        >>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 3, True, True)
        >>> output[0, :, 0].tolist()
        [1.0, 2.0, 2.0, 2.0, 3.0, 0.0, 0.0]
        >>> output[1, :, 0].tolist()
        [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
        >>> embeddings = NestedTensor(torch.arange(5).repeat(2, 1).T, torch.arange(7).repeat(2, 1).T) + 1
        >>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 4, True, True)
        >>> output[0, :, 0].tolist()
        [1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 5.0, 0.0, 0.0]
        >>> output[1, :, 0].tolist()
        [1.0, 2.0, 2.5, 3.0, 3.5, 4.5, 5.0, 5.5, 6.0, 7.0]
        >>> embeddings = NestedTensor(torch.arange(7).repeat(2, 1).T, torch.arange(11).repeat(2, 1).T) + 1
        >>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 5, True, True)
        >>> output[0, :, 0].tolist()
        [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 0.0, 0.0, 0.0, 0.0]
        >>> output[1, :, 0].tolist()
        [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0, 8.5, 9.0, 9.5, 10.0, 11.0]
        >>> embeddings = NestedTensor(torch.arange(3).repeat(2, 1).T, torch.arange(4).repeat(2, 1).T) + 1
        >>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 6, True, True)
        >>> output[0, :, 0].tolist()
        [1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 0.0]
        >>> output[1, :, 0].tolist()
        [1.0, 2.0, 2.5, 2.5, 2.5, 2.5, 2.5, 3.0, 4.0]
        >>> embeddings = NestedTensor(torch.arange(1).repeat(2, 1).T, torch.arange(2).repeat(2, 1).T) + 1
        >>> output = unfold_kmer_embeddings(embeddings.tensor.float(), embeddings.mask, 6)
        >>> output[0, :, 0].tolist()
        [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
        >>> output[1, :, 0].tolist()
        [1.0, 1.5, 1.5, 1.5, 1.5, 1.5, 2.0]
    """

    batch_size, seq_length, hidden_size = embeddings.size()
    last_valid_indices = attention_mask.sum(dim=-1)
    output = torch.zeros(batch_size, seq_length + nmers - 1, hidden_size, device=embeddings.device)
    for index, (tensor, seq_length) in enumerate(zip(embeddings, last_valid_indices)):
        embedding = tensor[:seq_length]
        if bos_token_id is not None:
            embedding = embedding[1:]
        if eos_token_id is not None:
            embedding = embedding[:-1]
        if len(embedding) > nmers:
            begin = torch.stack([embedding[:i].mean(0) for i in range(1, nmers)])
            medium = embedding.unfold(0, nmers, 1).mean(-1)
            end = torch.stack([embedding[-i:].mean(0) for i in range(nmers - 1, 0, -1)])
            embedding = torch.cat([begin, medium, end])
        elif len(embedding) > 2:
            begin = torch.stack([embedding[:i].mean(0) for i in range(1, len(embedding))])
            end = torch.stack([embedding[-i:].mean(0) for i in range(nmers, 0, -1)])
            embedding = torch.cat([begin, end])
        elif len(embedding) == 2:
            medium = embedding.mean(0).repeat(nmers - 1, 1)
            embedding = torch.cat([embedding[0][None, :], medium, embedding[1][None, :]])
        elif len(embedding) == 1:
            embedding = embedding.repeat(nmers, 1)
        else:
            raise ValueError("Sequence length is less than nmers.")
        if bos_token_id is not None:
            embedding = torch.cat([tensor[0][None, :], embedding])
        if eos_token_id is not None:
            embedding = torch.cat([embedding, tensor[seq_length - 1][None, :]])
        output[index, : seq_length + nmers - 1] = embedding
    return output

multimolecule.module.heads.contact ¶

ContactPredictionHead ¶

Bases: BasePredictionHead

Source code in multimolecule/module/heads/contact.py

Python
@HeadRegistry.contact.logits.register("projection", default=True)
class ContactPredictionHead(BasePredictionHead):

    output_name: str = "last_hidden_state"
    r"""The default output to use for the head."""

    require_attentions: bool = False
    r"""Whether the head requires attentions."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)
        self.dropout = nn.Dropout(self.config.dropout)
        self.transform = HeadTransformRegistryHF.build(self.config)
        out_channels: int = self.config.hidden_size  # type: ignore[assignment]
        self.q_proj = nn.Linear(out_channels, out_channels)
        self.decoder = nn.Linear(out_channels, self.num_labels, bias=False)
        self.activation = ACT2FN[self.config.act] if self.config.act is not None else None
        self.criterion = CriterionRegistry.build(self.config)
        # self.ffn = MLP(1, out_channels, residual=False)

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[0]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

        if attention_mask is None:
            attention_mask = self._get_attention_mask(input_ids)
        output = output * attention_mask.unsqueeze(-1)
        output, _, _ = self._remove_special_tokens(output, attention_mask, input_ids)

        output = self.dropout(output)
        output = self.transform(output)
        q = self.q_proj(output)
        contact_map = q.unsqueeze(1) * q.unsqueeze(2)
        # contact_map = (q @ q.transpose(-1, -2)).unsqueeze(-1)
        # contact_map = contact_map + self.ffn(contact_map)

        output = self.decoder(contact_map)
        if self.activation is not None:
            output = self.activation(output)
        if labels is not None:
            if isinstance(labels, NestedTensor):
                if isinstance(output, Tensor):
                    output = labels.nested_like(output, strict=False)
                return HeadOutput(output, self.criterion(output.concat, labels.concat))
            return HeadOutput(output, self.criterion(output, labels))
        return HeadOutput(output)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'last_hidden_state'

The default output to use for the head.

require_attentions `class-attribute` `instance-attribute` ¶

Python

require_attentions: bool = False

Whether the head requires attentions.
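The expression `q.unsqueeze(1) * q.unsqueeze(2)` in the `forward` method above builds one feature vector per token pair by multiplying the projected token vectors element-wise. A dependency-free sketch for a single sequence, using nested lists in place of tensors (`pairwise_product` is a hypothetical name):

```python
def pairwise_product(q):
    """Return contact[i][j][h] = q[i][h] * q[j][h] for one sequence.

    `q` is a list of L per-token feature vectors, each of size H.
    The result has shape L x L x H and is symmetric in i and j.
    """
    return [[[a * b for a, b in zip(qi, qj)] for qj in q] for qi in q]


# Three tokens with two features each.
q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
contact = pairwise_product(q)
```

Since multiplication commutes, `contact[i][j]` equals `contact[j][i]`, so the pair features passed to the decoder are symmetric by construction.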

ContactAttentionLinearHead ¶

Bases: PredictionHead

Head for contact-level tasks.

Performs symmetrization and average product correction.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If `None`, the configuration from `config` is used.	`None`
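For readers unfamiliar with the two operations named in the description, here is a small sketch on a single square matrix. It uses the commonly cited definitions (add the transpose; subtract row sum times column sum divided by the total). That the head applies exactly these formulas is an assumption, since the relevant code is not part of this section. Both function names are hypothetical.

```python
def symmetrize(x):
    """Make a square matrix symmetric by adding its transpose."""
    n = len(x)
    return [[x[i][j] + x[j][i] for j in range(n)] for i in range(n)]


def average_product_correct(x):
    """Subtract the background term row_sum[i] * col_sum[j] / total.

    Assumes the matrix entries do not sum to zero.
    """
    n = len(x)
    row_sums = [sum(row) for row in x]
    col_sums = [sum(x[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    return [
        [x[i][j] - row_sums[i] * col_sums[j] / total for j in range(n)]
        for i in range(n)
    ]


attention = [[1.0, 2.0], [3.0, 4.0]]
symmetric = symmetrize(attention)
corrected = average_product_correct(symmetric)
```

After the correction every row and column sums to zero (up to floating-point error), which removes the part of the signal explained by positions that attend strongly everywhere.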

Source code in multimolecule/module/heads/contact.py

Python
@HeadRegistry.contact.attention.register("linear")
class ContactAttentionLinearHead(PredictionHead):
    r"""
    Head for tasks in contact-level.

    Performs symmetrization, and average product correct.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "attentions"
    r"""The default output to use for the head."""

    require_attentions: bool = True
    r"""Whether the head requires attentions."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        if head_config is None:
            head_config = HeadConfig(hidden_size=config.num_hidden_layers * config.num_attention_heads)
        else:
            head_config.hidden_size = config.num_hidden_layers * config.num_attention_heads
        super().__init__(config, head_config)

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the ContactAttentionLinearHead.

        Args:
            outputs: The outputs of the model.
            attention_mask: The attention mask for the inputs.
            input_ids: The input ids for the inputs.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if attention_mask is None:
            if isinstance(input_ids, NestedTensor):
                input_ids, attention_mask = input_ids.tensor, input_ids.mask
            else:
                if input_ids is None:
                    raise ValueError(
                        f"Either attention_mask or input_ids must be provided for {self.__class__.__name__} to work."
                    )
                if self.pad_token_id is None:
                    raise ValueError(
                        f"pad_token_id must be provided when attention_mask is not passed to {self.__class__.__name__}."
                    )
                attention_mask = input_ids.ne(self.pad_token_id)

        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[-1]
        attentions = torch.stack(output, 1)

        # In the original model, attentions for padding tokens are completely zeroed out.
        # This makes no difference most of the time because the other tokens won't attend to them,
        # but it does for the contact prediction task, which takes attentions as input,
        # so we have to mimic that here.
        attention_mask = attention_mask.unsqueeze(1) * attention_mask.unsqueeze(2)
        attentions *= attention_mask[:, None, None, :, :]

        # remove bos token attentions
        if self.bos_token_id is not None:
            attentions = attentions[..., 1:, 1:]
            # process attention_mask and input_ids to make removal of eos token happy
            attention_mask = attention_mask[..., 1:]
            if input_ids is not None:
                input_ids = input_ids[..., 1:]
        # remove eos token attentions
        if self.eos_token_id is not None:
            if input_ids is not None:
                eos_mask = input_ids.ne(self.eos_token_id).to(attentions)
            else:
                last_valid_indices = attention_mask.sum(dim=-1)
                seq_length = attention_mask.size(-1)
                eos_mask = torch.arange(seq_length, device=attentions.device).unsqueeze(0) == last_valid_indices
            eos_mask = eos_mask.unsqueeze(1) * eos_mask.unsqueeze(2)
            attentions *= eos_mask[:, None, None, :, :]
            attentions = attentions[..., :-1, :-1]

        # features: batch x channels x input_ids x input_ids (symmetric)
        batch_size, layers, heads, seqlen, _ = attentions.size()
        attentions = attentions.view(batch_size, layers * heads, seqlen, seqlen)
        attentions = attentions.to(self.decoder.weight.device)
        attentions = average_product_correct(symmetrize(attentions))
        attentions = attentions.permute(0, 2, 3, 1).squeeze(3)

        return super().forward(attentions, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'attentions'

The default output to use for the head.

require_attentions `class-attribute` `instance-attribute` ¶

Python

require_attentions: bool = True

Whether the head requires attentions.

forward ¶

Python

forward(outputs: ModelOutput | Mapping | Tuple[Tensor, ...], attention_mask: Tensor | None = None, input_ids: NestedTensor | Tensor | None = None, labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the ContactAttentionLinearHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping \| Tuple[Tensor, ...]`	The outputs of the model.	required
`attention_mask` ¶	`Tensor \| None`	The attention mask for the inputs.	`None`
`input_ids` ¶	`NestedTensor \| Tensor \| None`	The input ids for the inputs.	`None`
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/contact.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping | Tuple[Tensor, ...],
    attention_mask: Tensor | None = None,
    input_ids: NestedTensor | Tensor | None = None,
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the ContactAttentionLinearHead.

    Args:
        outputs: The outputs of the model.
        attention_mask: The attention mask for the inputs.
        input_ids: The input ids for the inputs.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if attention_mask is None:
        if isinstance(input_ids, NestedTensor):
            input_ids, attention_mask = input_ids.tensor, input_ids.mask
        else:
            if input_ids is None:
                raise ValueError(
                    f"Either attention_mask or input_ids must be provided for {self.__class__.__name__} to work."
                )
            if self.pad_token_id is None:
                raise ValueError(
                    f"pad_token_id must be provided when attention_mask is not passed to {self.__class__.__name__}."
                )
            attention_mask = input_ids.ne(self.pad_token_id)

    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[-1]
    attentions = torch.stack(output, 1)

    # In the original model, attentions for padding tokens are completely zeroed out.
    # This makes no difference most of the time because the other tokens won't attend to them,
    # but it does for the contact prediction task, which takes attentions as input,
    # so we have to mimic that here.
    attention_mask = attention_mask.unsqueeze(1) * attention_mask.unsqueeze(2)
    attentions *= attention_mask[:, None, None, :, :]

    # remove bos token attentions
    if self.bos_token_id is not None:
        attentions = attentions[..., 1:, 1:]
        # process attention_mask and input_ids to make removal of eos token happy
        attention_mask = attention_mask[..., 1:]
        if input_ids is not None:
            input_ids = input_ids[..., 1:]
    # remove eos token attentions
    if self.eos_token_id is not None:
        if input_ids is not None:
            eos_mask = input_ids.ne(self.eos_token_id).to(attentions)
        else:
            last_valid_indices = attention_mask.sum(dim=-1)
            seq_length = attention_mask.size(-1)
            eos_mask = torch.arange(seq_length, device=attentions.device).unsqueeze(0) == last_valid_indices
        eos_mask = eos_mask.unsqueeze(1) * eos_mask.unsqueeze(2)
        attentions *= eos_mask[:, None, None, :, :]
        attentions = attentions[..., :-1, :-1]

    # features: batch x channels x input_ids x input_ids (symmetric)
    batch_size, layers, heads, seqlen, _ = attentions.size()
    attentions = attentions.view(batch_size, layers * heads, seqlen, seqlen)
    attentions = attentions.to(self.decoder.weight.device)
    attentions = average_product_correct(symmetrize(attentions))
    attentions = attentions.permute(0, 2, 3, 1).squeeze(3)

    return super().forward(attentions, labels, **kwargs)
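
The padding step in the forward pass above turns a per-token attention mask into a per-pair mask by multiplying it with itself along two axes, then zeroes every attention entry that involves a padding token. The following is a minimal sketch of that idea for a single sequence. Plain Python lists stand in for tensors so the example has no dependencies, and the names `pair_mask`, `mask`, and `attention` are hypothetical, not part of the library.

```python
def pair_mask(attention_mask):
    """Outer product of a 1-D 0/1 mask with itself.

    Entry (i, j) is 1 only when both token i and token j are real tokens.
    """
    return [[a * b for b in attention_mask] for a in attention_mask]


# One sequence of length 4 whose last position is padding.
mask = [1, 1, 1, 0]
pairs = pair_mask(mask)

# A stand-in attention map with a constant value everywhere.
attention = [[0.25] * 4 for _ in range(4)]

# Zero out every row and column that belongs to a padding token.
masked = [[attention[i][j] * pairs[i][j] for j in range(4)] for i in range(4)]

for row in masked:
    print(row)
```

In the head itself the same pair mask is broadcast over the layer and head dimensions, so a single mask covers every attention map in the stack.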

ContactAttentionResnetHead ¶

Bases: PredictionHead

Head for contact-level tasks.

Symmetrizes the attention maps and applies average product correction (APC) before decoding.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If None, will use configuration from the `config`.	`None`

Source code in multimolecule/module/heads/contact.py

Python
@HeadRegistry.contact.attention.register("resnet")
class ContactAttentionResnetHead(PredictionHead):
    r"""
    Head for contact-level tasks.

    Symmetrizes the attention maps and applies average product correction (APC) before decoding.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "attentions"
    r"""The default output to use for the head."""

    require_attentions: bool = True
    r"""Whether the head requires attentions."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        if head_config is None:
            head_config = HeadConfig(hidden_size=config.num_hidden_layers * config.num_attention_heads)
        else:
            head_config.hidden_size = config.num_hidden_layers * config.num_attention_heads
        super().__init__(config, head_config)
        num_layers = self.config.get("num_layers", 16)
        num_channels = self.config.get("num_channels", self.config.hidden_size)  # type: ignore[operator]
        block = self.config.get("block", "auto")
        self.decoder = ResNet(
            num_layers=num_layers,
            hidden_size=self.config.hidden_size,  # type: ignore[arg-type]
            block=block,
            num_channels=num_channels,
            num_labels=self.num_labels,
        )

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the ContactAttentionResnetHead.

        Args:
            outputs: The outputs of the model.
            attention_mask: The attention mask for the inputs.
            input_ids: The input ids for the inputs.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """

        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[-1]
        attentions = torch.stack(output, 1)

        # In the original model, attentions for padding tokens are completely zeroed out.
        # This makes no difference most of the time because the other tokens won't attend to them,
        # but it does for the contact prediction task, which takes attentions as input,
        # so we have to mimic that here.
        if attention_mask is None:
            attention_mask = self._get_attention_mask(input_ids)
        attention_mask = attention_mask.unsqueeze(1) * attention_mask.unsqueeze(2)
        attentions = attentions * attention_mask[:, None, None, :, :]

        # remove bos token attentions
        if self.bos_token_id is not None:
            attentions = attentions[..., 1:, 1:]
            attention_mask = attention_mask[..., 1:]
            if input_ids is not None:
                input_ids = input_ids[..., 1:]
        # remove eos token attentions
        if self.eos_token_id is not None:
            if input_ids is not None:
                eos_mask = input_ids.ne(self.eos_token_id).to(attentions)
                input_ids = input_ids[..., :-1]
            else:
                last_valid_indices = attention_mask.sum(dim=-1)
                seq_length = attention_mask.size(-1)
                eos_mask = torch.arange(seq_length, device=attentions.device).unsqueeze(0) == last_valid_indices
            eos_mask = eos_mask.unsqueeze(1) * eos_mask.unsqueeze(2)
            attentions = attentions * eos_mask[:, None, None, :, :]
            attentions = attentions[..., :-1, :-1]
            attention_mask = attention_mask[..., :-1, :-1]

        # features: batch x channels x input_ids x input_ids (symmetric)
        batch_size, layers, heads, seqlen, _ = attentions.size()
        attentions = attentions.view(batch_size, layers * heads, seqlen, seqlen)
        attentions = attentions.to(self.decoder.proj.weight.device)
        attentions = average_product_correct(symmetrize(attentions))
        attentions = attentions.permute(0, 2, 3, 1).squeeze(3)

        return super().forward(attentions, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'attentions'

The default output to use for the head.

require_attentions `class-attribute` `instance-attribute` ¶

Python

require_attentions: bool = True

Whether the head requires attentions.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], attention_mask: Tensor | None = None, input_ids: NestedTensor | Tensor | None = None, labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the ContactAttentionResnetHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`attention_mask` ¶	`Tensor \| None`	The attention mask for the inputs.	`None`
`input_ids` ¶	`NestedTensor \| Tensor \| None`	The input ids for the inputs.	`None`
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/contact.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    attention_mask: Tensor | None = None,
    input_ids: NestedTensor | Tensor | None = None,
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the ContactAttentionResnetHead.

    Args:
        outputs: The outputs of the model.
        attention_mask: The attention mask for the inputs.
        input_ids: The input ids for the inputs.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """

    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[-1]
    attentions = torch.stack(output, 1)

    # In the original model, attentions for padding tokens are completely zeroed out.
    # This makes no difference most of the time because the other tokens won't attend to them,
    # but it does for the contact prediction task, which takes attentions as input,
    # so we have to mimic that here.
    if attention_mask is None:
        attention_mask = self._get_attention_mask(input_ids)
    attention_mask = attention_mask.unsqueeze(1) * attention_mask.unsqueeze(2)
    attentions = attentions * attention_mask[:, None, None, :, :]

    # remove bos token attentions
    if self.bos_token_id is not None:
        attentions = attentions[..., 1:, 1:]
        attention_mask = attention_mask[..., 1:]
        if input_ids is not None:
            input_ids = input_ids[..., 1:]
    # remove eos token attentions
    if self.eos_token_id is not None:
        if input_ids is not None:
            eos_mask = input_ids.ne(self.eos_token_id).to(attentions)
            input_ids = input_ids[..., :-1]
        else:
            last_valid_indices = attention_mask.sum(dim=-1)
            seq_length = attention_mask.size(-1)
            eos_mask = torch.arange(seq_length, device=attentions.device).unsqueeze(0) == last_valid_indices
        eos_mask = eos_mask.unsqueeze(1) * eos_mask.unsqueeze(2)
        attentions = attentions * eos_mask[:, None, None, :, :]
        attentions = attentions[..., :-1, :-1]
        attention_mask = attention_mask[..., :-1, :-1]

    # features: batch x channels x input_ids x input_ids (symmetric)
    batch_size, layers, heads, seqlen, _ = attentions.size()
    attentions = attentions.view(batch_size, layers * heads, seqlen, seqlen)
    attentions = attentions.to(self.decoder.proj.weight.device)
    attentions = average_product_correct(symmetrize(attentions))
    attentions = attentions.permute(0, 2, 3, 1).squeeze(3)

    return super().forward(attentions, labels, **kwargs)

ContactLogitsResnetHead ¶

Bases: PredictionHead

Head for contact-level tasks.

Builds a symmetric contact map from the pairwise product of token hidden states; unlike the attention-based heads, it does not use attention maps.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If None, will use configuration from the `config`.	`None`

Source code in multimolecule/module/heads/contact.py

Python
@HeadRegistry.contact.logits.register("resnet")
class ContactLogitsResnetHead(PredictionHead):
    r"""
    Head for contact-level tasks.

    Builds a symmetric contact map from the pairwise product of token hidden states.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "last_hidden_state"
    r"""The default output to use for the head."""

    require_attentions: bool = False
    r"""Whether the head requires attentions."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)
        num_layers = self.config.get("num_layers", 16)
        num_channels = self.config.get("num_channels", self.config.hidden_size)  # type: ignore[operator]
        block = self.config.get("block", "auto")
        self.decoder = ResNet(
            num_layers=num_layers,
            hidden_size=self.config.hidden_size,  # type: ignore[arg-type]
            block=block,
            num_channels=num_channels,
            num_labels=self.num_labels,
        )

    def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        attention_mask: Tensor | None = None,
        input_ids: NestedTensor | Tensor | None = None,
        labels: Tensor | None = None,
        output_name: str | None = None,
        **kwargs,
    ) -> HeadOutput:
        r"""
        Forward pass of the ContactLogitsResnetHead.

        Args:
            outputs: The outputs of the model.
            attention_mask: The attention mask for the inputs.
            input_ids: The input ids for the inputs.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[0]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

        if attention_mask is None:
            attention_mask = self._get_attention_mask(input_ids)
        output = output * attention_mask.unsqueeze(-1)
        output, _, _ = self._remove_special_tokens(output, attention_mask, input_ids)

        # make symmetric contact map
        contact_map = output.unsqueeze(1) * output.unsqueeze(2)

        return super().forward(contact_map, labels, **kwargs)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'last_hidden_state'

The default output to use for the head.

require_attentions `class-attribute` `instance-attribute` ¶

Python

require_attentions: bool = False

Whether the head requires attentions.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], attention_mask: Tensor | None = None, input_ids: NestedTensor | Tensor | None = None, labels: Tensor | None = None, output_name: str | None = None, **kwargs) -> HeadOutput

Forward pass of the ContactLogitsResnetHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`attention_mask` ¶	`Tensor \| None`	The attention mask for the inputs.	`None`
`input_ids` ¶	`NestedTensor \| Tensor \| None`	The input ids for the inputs.	`None`
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/contact.py

Python
def forward(  # type: ignore[override]  # pylint: disable=arguments-renamed
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    attention_mask: Tensor | None = None,
    input_ids: NestedTensor | Tensor | None = None,
    labels: Tensor | None = None,
    output_name: str | None = None,
    **kwargs,
) -> HeadOutput:
    r"""
    Forward pass of the ContactLogitsResnetHead.

    Args:
        outputs: The outputs of the model.
        attention_mask: The attention mask for the inputs.
        input_ids: The input ids for the inputs.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[0]
    else:
        raise ValueError(f"Unsupported type for outputs: {type(outputs)}")

    if attention_mask is None:
        attention_mask = self._get_attention_mask(input_ids)
    output = output * attention_mask.unsqueeze(-1)
    output, _, _ = self._remove_special_tokens(output, attention_mask, input_ids)

    # make symmetric contact map
    contact_map = output.unsqueeze(1) * output.unsqueeze(2)

    return super().forward(contact_map, labels, **kwargs)
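
The "make symmetric contact map" step above multiplies every token's hidden vector element-wise with every other token's hidden vector, so entry (i, j) always equals entry (j, i). Here is a small dependency-free sketch for a single sequence, using nested lists in place of tensors. The helper name `pairwise_product` and the toy values are assumptions made for illustration only.

```python
def pairwise_product(hidden):
    """Map L vectors of size H to an L x L x H nested list.

    Entry [i][j] is the element-wise product of hidden[i] and hidden[j],
    which mirrors output.unsqueeze(1) * output.unsqueeze(2) for one sequence.
    """
    return [[[a * b for a, b in zip(hi, hj)] for hj in hidden] for hi in hidden]


# Three tokens, each with a hidden vector of size 2.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
contact = pairwise_product(hidden)

print(contact[0][1])  # product of token 0 and token 1
print(contact[1][0])  # same values, because multiplication commutes
```

Because the product is commutative, no separate symmetrization pass is needed for this head.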

symmetrize ¶

Python

symmetrize(x: Tensor) -> Tensor

Makes a tensor symmetric in its final two dimensions by adding its transpose; used for contact prediction.

Source code in multimolecule/module/heads/contact.py

Python
def symmetrize(x: Tensor) -> Tensor:
    "Make a tensor symmetric in its final two dimensions; used for contact prediction."
    return x + x.transpose(-1, -2)
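
As a worked illustration of the function above, the sketch below applies the same `x + x^T` rule to one small square matrix. It uses plain lists rather than tensors, and `symmetrize_2d` is a hypothetical stand-in, not the library function.

```python
def symmetrize_2d(x):
    """Return x + x^T for a square matrix given as a list of rows."""
    n = len(x)
    return [[x[i][j] + x[j][i] for j in range(n)] for i in range(n)]


attn = [
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 4.0],
    [5.0, 6.0, 0.0],
]
sym = symmetrize_2d(attn)

for row in sym:
    print(row)
```

Note that the result is the sum of the matrix and its transpose, not their average, so diagonal entries are doubled.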

average_product_correct ¶

Python

average_product_correct(x: Tensor) -> Tensor

Performs average product correction (APC); used for contact prediction.

Source code in multimolecule/module/heads/contact.py

Python
def average_product_correct(x: Tensor) -> Tensor:
    "Perform average product correction (APC), used for contact prediction."
    a1 = x.sum(-1, keepdims=True)
    a2 = x.sum(-2, keepdims=True)
    a12 = x.sum((-1, -2), keepdims=True)

    avg = a1 * a2
    avg.div_(a12)  # in-place to reduce memory
    normalized = x - avg
    return normalized
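
The correction above subtracts from each entry the product of its row sum and column sum divided by the total sum. The sketch below repeats that arithmetic on a 2 x 2 matrix with plain lists; `average_product_correct_2d` is a hypothetical helper written for this example and only handles a single matrix.

```python
def average_product_correct_2d(x):
    """Subtract row_sum[i] * col_sum[j] / total from every entry x[i][j]."""
    n_rows, n_cols = len(x), len(x[0])
    row_sums = [sum(row) for row in x]  # a1 in the source above
    col_sums = [sum(x[i][j] for i in range(n_rows)) for j in range(n_cols)]  # a2
    total = sum(row_sums)  # a12
    return [
        [x[i][j] - row_sums[i] * col_sums[j] / total for j in range(n_cols)]
        for i in range(n_rows)
    ]


x = [[2.0, 4.0], [4.0, 6.0]]
corrected = average_product_correct_2d(x)

for row in corrected:
    print(row)
```

Each row of the corrected matrix sums to zero, since summing `row_sums[i] * col_sums[j] / total` over `j` gives back `row_sums[i]`.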

multimolecule.module.heads.pretrain ¶

MaskedLMHead ¶

Bases: BasePredictionHead

Head for masked language modeling.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`MaskedLMHeadConfig \| None`	The configuration object for the head. If None, will use configuration from the `config`.	`None`

Source code in multimolecule/module/heads/pretrain.py

Python
@HeadRegistry.register("masked_lm")
class MaskedLMHead(BasePredictionHead):
    r"""
    Head for masked language modeling.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    output_name: str = "last_hidden_state"
    r"""The default output to use for the head."""

    def __init__(
        self, config: PreTrainedConfig, weight: Tensor | None = None, head_config: MaskedLMHeadConfig | None = None
    ):
        if head_config is None:
            head_config = (config.lm_head if hasattr(config, "lm_head") else config.head) or MaskedLMHeadConfig()
        head_config.num_labels = config.vocab_size
        super().__init__(config, head_config)
        self.dropout = nn.Dropout(self.config.dropout)
        self.transform = HeadTransformRegistryHF.build(self.config)
        self.decoder = nn.Linear(self.config.hidden_size, self.num_labels, bias=False)
        if weight is not None:
            self.decoder.weight = weight
        if self.config.bias:
            self.bias = nn.Parameter(torch.zeros(self.num_labels))
            self.decoder.bias = self.bias
        self.activation = ACT2FN[self.config.act] if self.config.act is not None else None

    def forward(
        self,
        outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
        labels: Tensor | None = None,
        output_name: str | None = None,
    ) -> HeadOutput:
        r"""
        Forward pass of the MaskedLMHead.

        Args:
            outputs: The outputs of the model.
            labels: The labels for the head.
            output_name: The name of the output to use.
                Defaults to `self.output_name`.
        """
        if isinstance(outputs, (Mapping, ModelOutput)):
            output = outputs[output_name or self.output_name]
        elif isinstance(outputs, tuple):
            output = outputs[0]
        else:
            raise ValueError(f"Unsupported type for outputs: {type(outputs)}")
        output = self.dropout(output)
        output = self.transform(output)
        output = self.decoder(output)
        if self.activation is not None:
            output = self.activation(output)
        if labels is not None:
            if isinstance(labels, NestedTensor):
                if isinstance(output, Tensor):
                    output = labels.nested_like(output, strict=False)
                return HeadOutput(output, F.cross_entropy(output.concat, labels.concat))
            return HeadOutput(output, F.cross_entropy(output.view(-1, self.num_labels), labels.view(-1)))
        return HeadOutput(output)

output_name `class-attribute` `instance-attribute` ¶

Python

output_name: str = 'last_hidden_state'

The default output to use for the head.

forward ¶

Python

forward(outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...], labels: Tensor | None = None, output_name: str | None = None) -> HeadOutput

Forward pass of the MaskedLMHead.

Parameters:

Name	Type	Description	Default
`outputs` ¶	`ModelOutput \| Mapping[str, Tensor] \| Tuple[Tensor, ...]`	The outputs of the model.	required
`labels` ¶	`Tensor \| None`	The labels for the head.	`None`
`output_name` ¶	`str \| None`	The name of the output to use. Defaults to `self.output_name`.	`None`

Source code in multimolecule/module/heads/pretrain.py

Python
def forward(
    self,
    outputs: ModelOutput | Mapping[str, Tensor] | Tuple[Tensor, ...],
    labels: Tensor | None = None,
    output_name: str | None = None,
) -> HeadOutput:
    r"""
    Forward pass of the MaskedLMHead.

    Args:
        outputs: The outputs of the model.
        labels: The labels for the head.
        output_name: The name of the output to use.
            Defaults to `self.output_name`.
    """
    if isinstance(outputs, (Mapping, ModelOutput)):
        output = outputs[output_name or self.output_name]
    elif isinstance(outputs, tuple):
        output = outputs[0]
    else:
        raise ValueError(f"Unsupported type for outputs: {type(outputs)}")
    output = self.dropout(output)
    output = self.transform(output)
    output = self.decoder(output)
    if self.activation is not None:
        output = self.activation(output)
    if labels is not None:
        if isinstance(labels, NestedTensor):
            if isinstance(output, Tensor):
                output = labels.nested_like(output, strict=False)
            return HeadOutput(output, F.cross_entropy(output.concat, labels.concat))
        return HeadOutput(output, F.cross_entropy(output.view(-1, self.num_labels), labels.view(-1)))
    return HeadOutput(output)
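
The first few lines of the forward pass above pick the tensor to work on: by name when the model output is a mapping, by position when it is a tuple, and with an error otherwise. A minimal sketch of that dispatch follows. The function `select_output` and its `tuple_index` parameter are hypothetical, and the `ModelOutput` check is left out so the example does not depend on Transformers.

```python
from collections.abc import Mapping


def select_output(outputs, output_name="last_hidden_state", tuple_index=0):
    """Pick one entry from a mapping (by name) or a tuple (by position)."""
    if isinstance(outputs, Mapping):
        return outputs[output_name]
    if isinstance(outputs, tuple):
        return outputs[tuple_index]
    raise ValueError(f"Unsupported type for outputs: {type(outputs)}")


# Strings stand in for tensors here.
as_dict = {"last_hidden_state": "hidden", "attentions": "attn"}
as_tuple = ("hidden", "pooled")

print(select_output(as_dict))
print(select_output(as_dict, output_name="attentions"))
print(select_output(as_tuple))
```

Passing `output_name` explicitly overrides the default, which matches the `output_name or self.output_name` lookup in the source.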

multimolecule.module.heads.generic ¶

BasePredictionHead ¶

Bases: Module

Head for tasks at all levels.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If None, will use configuration from the `config`.	`None`

Source code in multimolecule/module/heads/generic.py

Python
class BasePredictionHead(nn.Module):
    r"""
    Head for tasks at all levels.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    num_labels: int
    r"""Number of labels for the head."""

    output_name: str | None
    r"""The default output to use for the head."""

    require_attentions: bool = False
    r"""Whether the head requires attentions from the model."""

    bos_token_id: int | None = None
    r"""The ID of the beginning-of-sequence token. Usually is an alias of `cls_token_id`."""

    pad_token_id: int | None = None
    r"""The ID of the padding token."""

    eos_token_id: int | None = None
    r"""The ID of the end-of-sequence token. In rare cases, it is an alias of `sep_token_id`."""

    requires_attention: bool = False
    r"""Whether the head requires attentions from the model."""

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__()
        if head_config is None:
            head_config = config.head or HeadConfig(num_labels=config.num_labels)
        if not isinstance(head_config, HeadConfig):
            head_config = HeadConfig(head_config)
        if not head_config.num_labels:
            head_config.num_labels = config.num_labels
        if not head_config.hidden_size:
            head_config.hidden_size = config.hidden_size
        if not head_config.problem_type:
            head_config.problem_type = config.problem_type
        self.config = head_config
        self.bos_token_id = config.bos_token_id
        self.eos_token_id = config.eos_token_id
        self.pad_token_id = config.pad_token_id
        self.num_labels = self.config.num_labels  # type: ignore[assignment]
        if getattr(self.config, "output_name", None) is not None:
            self.output_name = self.config.output_name

    def _get_attention_mask(self, input_ids: NestedTensor | Tensor) -> Tensor:
        if isinstance(input_ids, NestedTensor):
            return input_ids.mask
        if input_ids is None:
            raise ValueError(
                f"Either attention_mask or input_ids must be provided for {self.__class__.__name__} to work."
            )
        if self.pad_token_id is None:
            raise ValueError(
                f"pad_token_id must be provided when attention_mask is not passed to {self.__class__.__name__}."
            )
        return input_ids.ne(self.pad_token_id).int()

    def _remove_special_tokens(
        self, output: Tensor, attention_mask: Tensor, input_ids: Tensor | None
    ) -> Tuple[Tensor, Tensor, Tensor]:
        # remove bos token embeddings
        if self.bos_token_id is not None:
            output = output[..., 1:, :]
            attention_mask = attention_mask[..., 1:]
            if input_ids is not None:
                input_ids = input_ids[..., 1:]
        # remove eos token embeddings
        if self.eos_token_id is not None:
            if input_ids is not None:
                eos_mask = input_ids.ne(self.eos_token_id).to(output)
                input_ids = input_ids[..., :-1]
            else:
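                # index of the last valid (EOS) position, assuming right padding
                last_valid_indices = attention_mask.sum(dim=-1) - 1
                seq_length = attention_mask.size(-1)
                # keep every position except the EOS position
                eos_mask = torch.arange(seq_length, device=output.device) != last_valid_indices.unsqueeze(1)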
            output = output * eos_mask[:, :, None]
            output = output[..., :-1, :]
            attention_mask = attention_mask[..., :-1]
        return output, attention_mask, input_ids
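
The special-token handling in `_get_attention_mask` and `_remove_special_tokens` can be hard to follow in tensor form. Below is a minimal, dependency-free sketch of the same idea using plain Python lists in place of tensors. The function names, the token IDs (`BOS = 1`, `EOS = 2`, `PAD = 0`), and the use of one scalar per token as a stand-in for an embedding are all assumptions made for illustration; this is not the library's implementation.

```python
from typing import List, Optional, Tuple

BOS, EOS, PAD = 1, 2, 0  # hypothetical token IDs, chosen only for this example


def attention_mask_from_ids(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Mark every non-padding position with 1, mirroring input_ids.ne(pad_token_id).int()."""
    return [[int(token != pad_token_id) for token in row] for row in input_ids]


def strip_special_tokens(
    values: List[List[float]],
    input_ids: List[List[int]],
    bos_token_id: Optional[int],
    eos_token_id: Optional[int],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Drop the BOS column, zero out EOS positions, then drop the last column."""
    if bos_token_id is not None:
        values = [row[1:] for row in values]
        input_ids = [row[1:] for row in input_ids]
    if eos_token_id is not None:
        # Zero the value wherever the token is EOS (the role of eos_mask above).
        values = [
            [0.0 if token == eos_token_id else value for value, token in zip(v_row, i_row)]
            for v_row, i_row in zip(values, input_ids)
        ]
        # Trim the final column so the length matches the sequence without EOS.
        values = [row[:-1] for row in values]
        input_ids = [row[:-1] for row in input_ids]
    return values, input_ids


batch_ids = [[BOS, 5, 6, 7, EOS], [BOS, 5, 6, EOS, PAD]]
batch_values = [[0.1, 0.5, 0.6, 0.7, 0.2], [0.1, 0.5, 0.6, 0.2, 0.0]]

mask = attention_mask_from_ids(batch_ids, PAD)
stripped_values, stripped_ids = strip_special_tokens(batch_values, batch_ids, BOS, EOS)
print(mask)
print(stripped_values)
print(stripped_ids)
```

In the shorter sequence the EOS token is not in the last column, so its value is zeroed rather than removed. That is why the masking step comes before the trim.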

output_name `instance-attribute` ¶

Python

output_name: str | None

The default output to use for the head.

require_attentions `class-attribute` `instance-attribute` ¶

Python

require_attentions: bool = False

Whether the head requires attentions from the model.

requires_attention `class-attribute` `instance-attribute` ¶

Python

requires_attention: bool = False

Whether the head requires attentions from the model.

bos_token_id `class-attribute` `instance-attribute` ¶

Python

bos_token_id: int | None = bos_token_id

The ID of the beginning-of-sequence token. It is usually an alias of cls_token_id.

eos_token_id `class-attribute` `instance-attribute` ¶

Python

eos_token_id: int | None = eos_token_id

The ID of the end-of-sequence token. In rare cases, it is an alias of sep_token_id.

pad_token_id `class-attribute` `instance-attribute` ¶

Python

pad_token_id: int | None = pad_token_id

The ID of the padding token.

num_labels `instance-attribute` ¶

Python

num_labels: int = num_labels

Number of labels for the head.
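
The constructor of `BasePredictionHead` fills any unset head setting (`num_labels`, `hidden_size`, `problem_type`) from the model configuration. The sketch below reproduces only that fallback order. `ModelConfigSketch`, `HeadConfigSketch`, and `resolve_head_config` are hypothetical stand-ins written for this example, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for the model's PreTrainedConfig."""

    hidden_size: int = 768
    num_labels: int = 2
    problem_type: Optional[str] = "multiclass"


@dataclass
class HeadConfigSketch:
    """Hypothetical stand-in for HeadConfig; None means 'inherit from the model'."""

    num_labels: Optional[int] = None
    hidden_size: Optional[int] = None
    problem_type: Optional[str] = None


def resolve_head_config(
    config: ModelConfigSketch, head_config: Optional[HeadConfigSketch] = None
) -> HeadConfigSketch:
    """Fill unset head fields from the model config, as the constructor does."""
    if head_config is None:
        head_config = HeadConfigSketch(num_labels=config.num_labels)
    for name in ("num_labels", "hidden_size", "problem_type"):
        # A falsy value (None, 0, "") counts as unset, matching the `if not ...` checks.
        if not getattr(head_config, name):
            setattr(head_config, name, getattr(config, name))
    return head_config


resolved = resolve_head_config(ModelConfigSketch(), HeadConfigSketch(num_labels=5))
default = resolve_head_config(ModelConfigSketch())
print(resolved)
print(default)
```

Because the checks test truthiness rather than `is None`, an explicit `num_labels=0` would also be replaced by the model's value.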

PredictionHead ¶

Bases: BasePredictionHead

Head for tasks at all levels.

Parameters:

Name	Type	Description	Default
`config` ¶	`PreTrainedConfig`	The configuration object for the model.	required
`head_config` ¶	`HeadConfig \| None`	The configuration object for the head. If None, will use configuration from the `config`.	`None`

Source code in multimolecule/module/heads/generic.py

Python
class PredictionHead(BasePredictionHead):
    r"""
    Head for tasks at all levels.

    Args:
        config: The configuration object for the model.
        head_config: The configuration object for the head.
            If None, will use configuration from the `config`.
    """

    def __init__(self, config: PreTrainedConfig, head_config: HeadConfig | None = None):
        super().__init__(config, head_config)
        self.dropout = nn.Dropout(self.config.dropout)
        self.transform = HeadTransformRegistryHF.build(self.config)
        self.decoder = nn.Linear(self.config.hidden_size, self.num_labels, bias=self.config.bias)
        self.activation = ACT2FN[self.config.act] if self.config.act is not None else None
        self.criterion = CriterionRegistry.build(self.config)

    def forward(self, embeddings: Tensor, labels: Tensor | None, **kwargs) -> HeadOutput:
        r"""
        Forward pass of the PredictionHead.

        Args:
            embeddings: The embeddings to be passed through the head.
            labels: The labels for the head.
        """
        if kwargs:
            warn(
                f"The following arguments are not applicable to {self.__class__.__name__} "
                f"and will be ignored: {kwargs.keys()}"
            )
        output = self.dropout(embeddings)
        output = self.transform(output)
        output = self.decoder(output)
        if self.activation is not None:
            output = self.activation(output)
        if labels is not None:
            if isinstance(labels, NestedTensor):
                if isinstance(output, Tensor):
                    output = labels.nested_like(output, strict=False)
                return HeadOutput(output, self.criterion(output.concat, labels.concat))
            return HeadOutput(output, self.criterion(output, labels))
        return HeadOutput(output)
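
When `labels` is a `NestedTensor`, the loss is computed on `output.concat` and `labels.concat` rather than on the padded tensors. The sketch below illustrates why that matters, under the assumption that `.concat` joins the valid tokens of every sequence without padding. It uses plain lists and a hand-written mean squared error; `concat_valid` and `mse` are hypothetical helpers, not library functions.

```python
from typing import List


def concat_valid(batch: List[List[float]], lengths: List[int]) -> List[float]:
    """Join the first `n` (valid) entries of each row into one flat list."""
    flat: List[float] = []
    for row, n in zip(batch, lengths):
        flat.extend(row[:n])
    return flat


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error over two equally long flat lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Two sequences padded to length 3; the second has only one valid token.
preds = [[1.0, 2.0, 3.0], [4.0, 9.0, 9.0]]
labels = [[1.0, 2.0, 5.0], [4.0, 0.0, 0.0]]
lengths = [3, 1]

padding_free_loss = mse(concat_valid(preds, lengths), concat_valid(labels, lengths))
padded_loss = mse(sum(preds, []), sum(labels, []))
print(padding_free_loss, padded_loss)
```

The padded positions of the second sequence dominate `padded_loss`, while `padding_free_loss` reflects only the four real tokens.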

forward ¶

Python

forward(embeddings: Tensor, labels: Tensor | None, **kwargs) -> HeadOutput

Forward pass of the PredictionHead.

Parameters:

Name	Type	Description	Default
`embeddings` ¶	`Tensor`	The embeddings to be passed through the head.	required
`labels` ¶	`Tensor \| None`	The labels for the head.	required

Source code in multimolecule/module/heads/generic.py

Python
def forward(self, embeddings: Tensor, labels: Tensor | None, **kwargs) -> HeadOutput:
    r"""
    Forward pass of the PredictionHead.

    Args:
        embeddings: The embeddings to be passed through the head.
        labels: The labels for the head.
    """
    if kwargs:
        warn(
            f"The following arguments are not applicable to {self.__class__.__name__} "
            f"and will be ignored: {kwargs.keys()}"
        )
    output = self.dropout(embeddings)
    output = self.transform(output)
    output = self.decoder(output)
    if self.activation is not None:
        output = self.activation(output)
    if labels is not None:
        if isinstance(labels, NestedTensor):
            if isinstance(output, Tensor):
                output = labels.nested_like(output, strict=False)
            return HeadOutput(output, self.criterion(output.concat, labels.concat))
        return HeadOutput(output, self.criterion(output, labels))
    return HeadOutput(output)
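
`forward` accepts extra keyword arguments only to warn about them and then ignore them. The sketch below shows that pattern in isolation with the standard `warnings` module. `HeadSketch` is a hypothetical class that returns its input unchanged; it stands in for the dropout, transform, and decoder steps and is not the library's head.

```python
import warnings
from typing import List, Optional


class HeadSketch:
    """Hypothetical head that only demonstrates the kwargs warning."""

    def forward(self, embeddings: List[float], labels: Optional[List[float]] = None, **kwargs):
        if kwargs:
            warnings.warn(
                f"The following arguments are not applicable to {self.__class__.__name__} "
                f"and will be ignored: {list(kwargs)}"
            )
        # The real head would apply dropout, transform, decoder and activation here.
        return embeddings


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = HeadSketch().forward([0.1, 0.2], attention_mask=[1, 1])

print(result)
print(caught[0].message)
```

In the signature documented above, `labels` has no default value, so it has to be passed explicitly (for example `labels=None`) even at inference time.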

multimolecule.module.heads.output ¶

HeadOutput `dataclass` ¶

Bases: ModelOutput

Output of a prediction head.

Parameters:

Name	Type	Description	Default
`logits` ¶	`FloatTensor`	The prediction logits from the head.	required
`loss` ¶	`FloatTensor \| None`	The loss from the head. Defaults to None.	`None`

Source code in multimolecule/module/heads/output.py

Python
@dataclass
class HeadOutput(ModelOutput):
    r"""
    Output of a prediction head.

    Args:
        logits: The prediction logits from the head.
        loss: The loss from the head.
            Defaults to None.
    """

    logits: FloatTensor
    loss: FloatTensor | None = None
