
Optimus 5-Prime

Convolutional neural network that predicts the mean ribosome load (MRL) of a fixed 50 nt human 5’ untranslated region (5’UTR) from sequence alone.

Disclaimer

This is an UNOFFICIAL implementation of "Human 5’ UTR design and variant effect prediction from a massively parallel translation assay" by Paul J. Sample et al.

The OFFICIAL repository of Optimus 5-Prime is at pjsample/human_5utr_modeling.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.

The team that released Optimus 5-Prime did not write a model card for it, so this model card was written by the MultiMolecule team.

Model Details

Optimus 5-Prime is a simple, fully feed-forward 1D convolutional network trained on a massively parallel polysome-profiling assay of ~280,000 random 50 nt 5’UTRs upstream of an eGFP reporter expressed in HEK293T cells. The network takes a fixed 50 nt 5’UTR as a one-hot tensor and applies three stacked padding="same" 1D convolutions (120 filters, kernel size 8, ReLU), with dropout between the second and third convolutions. It then flattens the per-position activations channels-last and passes them through a 40-unit fully connected layer and a linear regression head, which emits a single standardized mean ribosome load (MRL) score. See the Training Details section for more information on the training process.
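
As a rough illustration of the shape bookkeeping described above, the following pure-Python sketch traces tensor sizes through the stack. It assumes stride 1, five input channels (the default vocab_size in the configuration documented further down), and that the odd padding element needed for the even kernel goes on the right, as PyTorch and Keras do for padding="same". The function names are hypothetical and not part of multimolecule.

```python
def same_padding(kernel_size: int) -> tuple[int, int]:
    """Left/right zero padding that keeps the length unchanged at stride 1."""
    total = kernel_size - 1
    left = total // 2
    return left, total - left  # the extra element (if any) goes on the right


def conv_output_length(length: int, kernel_size: int) -> int:
    """Output length of a stride-1 convolution with 'same' padding."""
    left, right = same_padding(kernel_size)
    return length + left + right - kernel_size + 1


def trace_shapes(sequence_length=50, channels=5, filters=120,
                 kernel_size=8, num_conv_layers=3, hidden_size=40):
    """Return (stage name, shape) pairs for a single sequence (no batch axis)."""
    shapes = [("one-hot input", (sequence_length, channels))]
    length = sequence_length
    for index in range(num_conv_layers):
        length = conv_output_length(length, kernel_size)
        shapes.append((f"conv {index + 1}", (length, filters)))
    shapes.append(("flatten", (length * filters,)))
    shapes.append(("dense", (hidden_size,)))
    shapes.append(("regression head", (1,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:>16}: {shape}")
```

Because every convolution preserves the 50 positions, the flattened vector fed to the dense layer has 50 × 120 = 6000 entries.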

The MRL scalar is the per-sequence mean of polysome-profile-derived ribosome loading and is used by the original authors both to score natural human 5’UTRs and to engineer new sequences with predictable translation efficiency. Variant-effect scoring is performed externally by computing the MRL difference between the reference and alternative sequences; the model itself takes a single sequence as input.

Model Specification

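| Num Layers | Hidden Size | Num Parameters (M) | FLOPs (M) | MACs (M) | Max Num Tokens |
| ---------- | ----------- | ------------------ | --------- | -------- | -------------- |
| 4          | 40          | 0.48               | 24.04     | 12.00    | 50             |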
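
The parameter and MAC figures can be approximately reproduced by hand. The sketch below is a back-of-the-envelope count, not the tool used to produce the table. It assumes five input channels (the default vocab_size), stride-1 "same" convolutions that keep all 50 positions, biases on every layer, and a plain linear layer as the regression head. The helper names are hypothetical. The FLOPs column depends on how activations and biases are counted and is not reproduced here.

```python
def conv1d_cost(in_channels, out_channels, kernel_size, length):
    """Parameters (weights + biases) and multiply-accumulates of a 1D convolution."""
    params = in_channels * kernel_size * out_channels + out_channels
    macs = length * in_channels * kernel_size * out_channels
    return params, macs


def linear_cost(in_features, out_features):
    """Parameters (weights + biases) and multiply-accumulates of a dense layer."""
    return in_features * out_features + out_features, in_features * out_features


def optimus5prime_cost(vocab_size=5, sequence_length=50, num_conv_layers=3,
                       conv_filters=120, conv_kernel_size=8, hidden_size=40,
                       num_labels=1):
    layers = []
    in_channels = vocab_size
    for _ in range(num_conv_layers):
        layers.append(conv1d_cost(in_channels, conv_filters, conv_kernel_size, sequence_length))
        in_channels = conv_filters
    # Flattened convolution output feeds the dense layer, then the regression head.
    layers.append(linear_cost(sequence_length * conv_filters, hidden_size))
    layers.append(linear_cost(hidden_size, num_labels))
    total_params = sum(params for params, _ in layers)
    total_macs = sum(macs for _, macs in layers)
    return total_params, total_macs


params, macs = optimus5prime_cost()
print(f"parameters: {params / 1e6:.2f} M")  # parameters: 0.48 M
print(f"MACs:       {macs / 1e6:.2f} M")    # MACs:       12.00 M
```

Under these assumptions the count is 475,641 parameters and 12,000,040 MACs, which rounds to the 0.48 M and 12.00 M shown in the table.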

Usage

The model file depends on the multimolecule library. You can install it using pip:

Bash
pip install multimolecule

Direct Use

Mean Ribosome Load Prediction

You can use this model directly to predict the mean ribosome load (MRL) of a fixed 50 nt 5’UTR sequence:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['logits'])

The pre-regression dense representation is exposed on the backbone:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeModel

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeModel.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['pooler_output'])

Interface

  • Input length: fixed 50 nt 5’UTR sequence
  • Padding: shorter sequences are right-padded with zeros to 50 nt; longer sequences are truncated to the first 50 nt
  • Alphabet: ACGUN. The upstream checkpoint only learned the four canonical nucleotides, and the N channel stays zero
  • Special tokens: none added; input_ids are consumed positionally as one-hot channels
  • Output: standardized mean ribosome load score (logits) of shape (batch_size, 1); raw-MRL calibration requires the external scaler used by the upstream training workflow
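
The padding, truncation, and one-hot rules listed above can be sketched in plain Python. This is an illustration only, not the embedding code used by multimolecule. The channel order ACGUN, the helper name one_hot_utr, and the choice to encode N as an all-zero row (one reading of "the N channel stays zero") are assumptions made for the example.

```python
CHANNELS = "ACGUN"  # assumed channel order, for illustration only


def one_hot_utr(sequence: str, length: int = 50) -> list[list[int]]:
    """Encode a 5'UTR as `length` rows of len(CHANNELS) one-hot channels."""
    # Normalise case and DNA-style input, then keep only the first `length` bases.
    sequence = sequence.upper().replace("T", "U")[:length]
    rows = []
    for base in sequence:
        if base not in CHANNELS:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        row = [0] * len(CHANNELS)
        if base != "N":  # assumption: N contributes no signal
            row[CHANNELS.index(base)] = 1
        rows.append(row)
    # Right-pad shorter sequences with all-zero rows.
    while len(rows) < length:
        rows.append([0] * len(CHANNELS))
    return rows


encoded = one_hot_utr("ACGU")
print(len(encoded), len(encoded[0]))  # 50 5
print(encoded[:5])                    # four one-hot rows, then a zero padding row
```

A 60 nt input is cut to its first 50 positions, so anything downstream of position 50 never reaches the network.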

Variant Effect

Optimus 5-Prime is a single-sequence regression model. To score the effect of a variant on translation, run the reference and alternative 5’UTRs through the model independently and compute the difference between their predicted MRL values:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> ref = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC"
>>> alt = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGAUAGC"
>>> ref_mrl = model(**tokenizer(ref, return_tensors="pt"))["logits"]
>>> alt_mrl = model(**tokenizer(alt, return_tensors="pt"))["logits"]
>>> delta = (alt_mrl - ref_mrl).item()
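
The same reference-versus-alternative comparison can be wrapped in a small helper that is independent of the model. The sketch below is a minimal illustration: predict is a hypothetical callable that maps a sequence to a score (for example, a thin wrapper around the model call above), and toy_predict is an arbitrary placeholder with no biological meaning, used only so the example runs without downloading a checkpoint.

```python
from typing import Callable


def apply_substitution(reference: str, position: int, base: str) -> str:
    """Return `reference` with the 0-based `position` replaced by `base`."""
    if not 0 <= position < len(reference):
        raise IndexError(f"position {position} is outside the sequence")
    if reference[position] == base:
        raise ValueError("alternative base is identical to the reference base")
    return reference[:position] + base + reference[position + 1:]


def variant_effect(reference: str, alternative: str,
                   predict: Callable[[str], float]) -> float:
    """Difference in predicted score: alternative minus reference."""
    if len(reference) != len(alternative):
        raise ValueError("only same-length (substitution) variants are handled here")
    return predict(alternative) - predict(reference)


def toy_predict(sequence: str) -> float:
    """Placeholder scorer (GC fraction); stands in for the real model."""
    return sum(base in "GC" for base in sequence) / len(sequence)


ref = "GGGACAUAGC" + "UAGC" * 10        # a 50 nt example sequence
alt = apply_substitution(ref, 45, "A")  # C -> A at 0-based position 45
delta = variant_effect(ref, alt, toy_predict)
print(f"{delta:+.3f}")
```

A positive delta means the alternative sequence is predicted to score higher than the reference. Because the model output is standardized, the difference is in standardized units unless both predictions are first mapped back to raw MRL.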

Training Details

Optimus 5-Prime was trained to regress the per-sequence mean ribosome load (MRL) derived from polysome profiling on a massively parallel reporter assay.

Training Data

Optimus 5-Prime was trained on approximately 280,000 randomized 50 nt 5’UTRs placed upstream of an eGFP reporter and expressed in HEK293T cells. Mean ribosome load was computed per sequence from polysome-fractionation read counts. The raw sequencing data are available at GEO accession GSE114002.

Training Procedure

Pre-training

The published main_MRL_model checkpoint was trained with mean-squared-error loss against standardized per-sequence MRL values. The optimizer was Adam with learning rate 1e-3, batch size 128, default Adam betas (0.9, 0.999), and epsilon 1e-8.
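
Because the targets were standardized, a model output has to be mapped back before it can be read as a raw MRL value. The sketch below assumes plain z-score standardization (subtract the training mean, divide by the training standard deviation). The numbers are made-up placeholders, not the statistics of the upstream training set, and the helper names are hypothetical. The real mean and standard deviation must come from the scaler used in the upstream training workflow.

```python
def fit_standardizer(values):
    """Return the mean and population standard deviation of `values`."""
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, variance ** 0.5


def to_standardized(raw_mrl, mean, std):
    """Forward transform applied to the training targets."""
    return (raw_mrl - mean) / std


def to_raw_mrl(score, mean, std):
    """Inverse transform: map a standardized model output back to raw MRL."""
    return score * std + mean


# Placeholder values for illustration only.
training_mrl = [3.0, 5.0, 7.0]
mean, std = fit_standardizer(training_mrl)

score = to_standardized(6.0, mean, std)
recovered = to_raw_mrl(score, mean, std)
print(f"standardized: {score:.3f}, recovered raw MRL: {recovered:.3f}")
```

Differences between two standardized scores scale by the standard deviation alone, so a variant-effect delta in raw MRL units is the standardized delta multiplied by std.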

Citation

BibTeX
@article{sample2019human,
  author    = {Sample, Paul J. and Wang, Ban and Reid, David W. and Presnyak, Vlad and McFadyen, Iain J. and Morris, David R. and Seelig, Georg},
  title     = {Human 5' UTR design and variant effect prediction from a massively parallel translation assay},
  journal   = {Nature Biotechnology},
  volume    = {37},
  number    = {7},
  pages     = {803--809},
  year      = {2019},
  publisher = {Springer Science and Business Media LLC},
  doi       = {10.1038/s41587-019-0164-5}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the Optimus 5-Prime paper for questions or comments on the paper/model.

License

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
SPDX-License-Identifier: AGPL-3.0-or-later

multimolecule.models.optimus5prime

RnaTokenizer

Bases: Tokenizer

Tokenizer for RNA sequences.

Parameters:

  • alphabet (Alphabet | str | List[str] | None, default: None): Alphabet to use for tokenization.
    • If None, the standard RNA alphabet will be used.
    • If a string, it should correspond to the name of a predefined alphabet. The options include
      • standard
      • extended
      • streamline
      • nucleobase
    • If an alphabet or a list of characters, that specific alphabet will be used.
  • nmers (int, default: 1): Size of the k-mer to tokenize.
  • codon (bool, default: False): Whether to tokenize into codons.
  • replace_T_with_U (bool, default: True): Whether to replace T with U.
  • do_upper_case (bool, default: True): Whether to convert input to uppercase.

Examples:

Python Console Session
>>> from multimolecule import RnaTokenizer
>>> tokenizer = RnaTokenizer()
>>> tokenizer('<pad><cls><eos><unk><mask><null>ACGUNRYSWKMBDHVIX|.*-?')["input_ids"]
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2]
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer('acgt')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer = RnaTokenizer(replace_T_with_U=False)
>>> tokenizer('acgt')["input_ids"]
[1, 6, 7, 8, 3, 2]
>>> tokenizer = RnaTokenizer(nmers=3)
>>> tokenizer('uagcuuauc')["input_ids"]
[1, 83, 17, 64, 49, 96, 84, 22, 2]
>>> tokenizer = RnaTokenizer(codon=True)
>>> tokenizer('uagcuuauc')["input_ids"]
[1, 83, 49, 22, 2]
>>> tokenizer('uagcuuauca')["input_ids"]
Traceback (most recent call last):
ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
Source code in multimolecule/tokenisers/rna/tokenization_rna.py
Python
class RnaTokenizer(Tokenizer):
    """
    Tokenizer for RNA sequences.

    Args:
        alphabet: alphabet to use for tokenization.

            - If `None`, the standard RNA alphabet will be used.
            - If a `string`, it should correspond to the name of a predefined alphabet. The options include
                + `standard`
                + `extended`
                + `streamline`
                + `nucleobase`
            - If an alphabet or a list of characters, that specific alphabet will be used.
        nmers: Size of kmer to tokenize.
        codon: Whether to tokenize into codons.
        replace_T_with_U: Whether to replace T with U.
        do_upper_case: Whether to convert input to uppercase.

    Examples:
        >>> from multimolecule import RnaTokenizer
        >>> tokenizer = RnaTokenizer()
        >>> tokenizer('<pad><cls><eos><unk><mask><null>ACGUNRYSWKMBDHVIX|.*-?')["input_ids"]
        [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 2]
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer('acgt')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer = RnaTokenizer(replace_T_with_U=False)
        >>> tokenizer('acgt')["input_ids"]
        [1, 6, 7, 8, 3, 2]
        >>> tokenizer = RnaTokenizer(nmers=3)
        >>> tokenizer('uagcuuauc')["input_ids"]
        [1, 83, 17, 64, 49, 96, 84, 22, 2]
        >>> tokenizer = RnaTokenizer(codon=True)
        >>> tokenizer('uagcuuauc')["input_ids"]
        [1, 83, 49, 22, 2]
        >>> tokenizer('uagcuuauca')["input_ids"]
        Traceback (most recent call last):
        ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
    """

    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        alphabet: Alphabet | str | List[str] | None = None,
        nmers: int = 1,
        codon: bool = False,
        replace_T_with_U: bool = True,
        do_upper_case: bool = True,
        additional_special_tokens: List | Tuple | None = None,
        **kwargs,
    ):
        if codon and (nmers > 1 and nmers != 3):
            raise ValueError("Codon and nmers cannot be used together.")
        if codon:
            nmers = 3  # set to 3 to get correct vocab
        if not isinstance(alphabet, Alphabet):
            alphabet = get_alphabet(alphabet, nmers=nmers)
        super().__init__(
            alphabet=alphabet,
            nmers=nmers,
            codon=codon,
            replace_T_with_U=replace_T_with_U,
            do_upper_case=do_upper_case,
            additional_special_tokens=additional_special_tokens,
            **kwargs,
        )
        self.replace_T_with_U = replace_T_with_U
        self.nmers = nmers
        self.codon = codon

    def _tokenize(self, text: str, **kwargs):
        if self.do_upper_case:
            text = text.upper()
        if self.replace_T_with_U:
            text = text.replace("T", "U")
        if self.codon:
            if len(text) % 3 != 0:
                raise ValueError(
                    f"length of input sequence must be a multiple of 3 for codon tokenization, but got {len(text)}"
                )
            return [text[i : i + 3] for i in range(0, len(text), 3)]
        if self.nmers > 1:
            return [text[i : i + self.nmers] for i in range(len(text) - self.nmers + 1)]  # noqa: E203
        return list(text)

Optimus5PrimeConfig

Bases: PreTrainedConfig

This is the configuration class to store the configuration of an Optimus5PrimeModel. It is used to instantiate an Optimus 5-Prime model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Optimus 5-Prime main MRL model from pjsample/human_5utr_modeling.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Parameters:

  • vocab_size (int, default: 5): Vocabulary size of the Optimus 5-Prime model. Defines the number of one-hot input channels derived from input_ids. Defaults to 5 (the MultiMolecule RNA streamline alphabet ACGUN); the upstream checkpoint only uses the first four (A, C, G, U/T) and the N channel stays zero.
  • sequence_length (int, default: 50): The fixed 5’UTR input sequence length Optimus 5-Prime was trained on (50 nt).
  • num_conv_layers (int, default: 3): Number of stacked 1D convolutions. The published main MRL model uses 3.
  • conv_filters (int, default: 120): Number of filters in every convolution. The published main MRL model uses 120.
  • conv_kernel_size (int, default: 8): Convolution kernel size. The published main MRL model uses 8 with padding="same".
  • conv_dropout (float, default: 0.0): Dropout probability applied after each intermediate convolution. The published main MRL model uses 0.0.
  • hidden_size (int, default: 40): Size of the fully connected layer between the convolutional stack and the regression output. The published main MRL model uses 40.
  • dense_dropout (float, default: 0.2): Dropout probability applied after the dense hidden layer. The published main MRL model uses 0.2.
  • hidden_act (str, default: 'relu'): The non-linear activation function used by the convolutional and dense layers.
  • num_labels (int, default: 1): Number of output labels. Optimus 5-Prime predicts a single mean ribosome load (MRL) scalar, so this defaults to 1.
  • head (HeadConfig | None, default: None): The configuration of the sequence-level prediction head. Defaults to a regression head (problem_type="regression"), matching Optimus 5-Prime’s MRL regression task.

Examples:

Python Console Session
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel
>>> # Initializing an Optimus 5-Prime style configuration
>>> configuration = Optimus5PrimeConfig()
>>> # Initializing a model (with random weights) from the configuration
>>> model = Optimus5PrimeModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in multimolecule/models/optimus5prime/configuration_optimus5prime.py
Python
class Optimus5PrimeConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of an
    [`Optimus5PrimeModel`][multimolecule.models.Optimus5PrimeModel]. It is used to instantiate an Optimus 5-Prime model
    according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the Optimus 5-Prime main MRL model from
    [pjsample/human_5utr_modeling](https://github.com/pjsample/human_5utr_modeling).

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    Args:
        vocab_size:
            Vocabulary size of the Optimus 5-Prime model. Defines the number of one-hot input channels derived from
            `input_ids`. Defaults to 5 (the MultiMolecule RNA `streamline` alphabet `ACGUN`); the upstream checkpoint
            only uses the first four (`A`, `C`, `G`, `U`/`T`) and the `N` channel stays zero.
        sequence_length:
            The fixed 5'UTR input sequence length Optimus 5-Prime was trained on (50 nt).
        num_conv_layers:
            Number of stacked 1D convolutions. The published main MRL model uses 3.
        conv_filters:
            Number of filters in every convolution. The published main MRL model uses 120.
        conv_kernel_size:
            Convolution kernel size. The published main MRL model uses 8 with `padding="same"`.
        conv_dropout:
            Dropout probability applied after each intermediate convolution. The published main MRL model uses 0.0.
        hidden_size:
            Size of the fully connected layer between the convolutional stack and the regression output. The published
            main MRL model uses 40.
        dense_dropout:
            Dropout probability applied after the dense hidden layer. The published main MRL model uses 0.2.
        hidden_act:
            The non-linear activation function used by the convolutional and dense layers.
        num_labels:
            Number of output labels. Optimus 5-Prime predicts a single mean ribosome load (MRL) scalar, so this
            defaults to 1.
        head:
            The configuration of the sequence-level prediction head. Defaults to a regression head
            (`problem_type="regression"`), matching Optimus 5-Prime's MRL regression task.

    Examples:
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel
        >>> # Initializing an Optimus 5-Prime style configuration
        >>> configuration = Optimus5PrimeConfig()
        >>> # Initializing a model (with random weights) from the configuration
        >>> model = Optimus5PrimeModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "optimus5prime"

    def __init__(
        self,
        vocab_size: int = 5,
        sequence_length: int = 50,
        num_conv_layers: int = 3,
        conv_filters: int = 120,
        conv_kernel_size: int = 8,
        conv_dropout: float = 0.0,
        hidden_size: int = 40,
        dense_dropout: float = 0.2,
        hidden_act: str = "relu",
        num_labels: int = 1,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if vocab_size < 4:
            raise ValueError(
                f"vocab_size ({vocab_size}) must cover the four canonical nucleotides used by Optimus 5-Prime."
            )
        if sequence_length <= 0:
            raise ValueError(f"sequence_length ({sequence_length}) must be a positive integer.")
        if num_conv_layers < 1:
            raise ValueError(f"num_conv_layers ({num_conv_layers}) must be >= 1.")
        if conv_filters <= 0:
            raise ValueError(f"conv_filters ({conv_filters}) must be positive.")
        if conv_kernel_size <= 0:
            raise ValueError(f"conv_kernel_size ({conv_kernel_size}) must be positive.")
        if not 0.0 <= conv_dropout < 1.0:
            raise ValueError(f"conv_dropout ({conv_dropout}) must be in [0.0, 1.0).")
        if not 0.0 <= dense_dropout < 1.0:
            raise ValueError(f"dense_dropout ({dense_dropout}) must be in [0.0, 1.0).")
        if hidden_size <= 0:
            raise ValueError(f"hidden_size ({hidden_size}) must be positive.")
        self.vocab_size = vocab_size
        self.sequence_length = sequence_length
        self.num_conv_layers = num_conv_layers
        self.conv_filters = conv_filters
        self.conv_kernel_size = conv_kernel_size
        self.conv_dropout = conv_dropout
        self.hidden_size = hidden_size
        self.dense_dropout = dense_dropout
        self.hidden_act = hidden_act
        if head is None:
            head = HeadConfig(problem_type="regression")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "regression"
        self.head = head

Optimus5PrimeForSequencePrediction

Bases: Optimus5PrimePreTrainedModel

Optimus 5-Prime model with a sequence-level prediction head.

The published model is a regression network that predicts the mean ribosome load (MRL) scalar for a fixed 50 nt 5’UTR. This wrapper exposes the converted upstream regression decoder through the standard MultiMolecule sequence-prediction head.

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeForSequencePrediction, RnaTokenizer
>>> config = Optimus5PrimeConfig()
>>> model = Optimus5PrimeForSequencePrediction(config)
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
>>> output = model(**input, labels=torch.tensor([[1.0]]))
>>> output["logits"].shape
torch.Size([1, 1])
>>> output["loss"]
tensor(..., grad_fn=<MseLossBackward0>)
Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py
Python
class Optimus5PrimeForSequencePrediction(Optimus5PrimePreTrainedModel):
    """
    Optimus 5-Prime model with a sequence-level prediction head.

    The published model is a regression network that predicts the mean ribosome load (MRL) scalar for a fixed 50 nt
    5'UTR. This wrapper exposes the converted upstream regression decoder through the standard MultiMolecule
    sequence-prediction head.

    Examples:
        >>> import torch
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeForSequencePrediction, RnaTokenizer
        >>> config = Optimus5PrimeConfig()
        >>> model = Optimus5PrimeForSequencePrediction(config)
        >>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
        >>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
        >>> output = model(**input, labels=torch.tensor([[1.0]]))
        >>> output["logits"].shape
        torch.Size([1, 1])
        >>> output["loss"]  # doctest:+ELLIPSIS
        tensor(..., grad_fn=<MseLossBackward0>)
    """

    def __init__(self, config: Optimus5PrimeConfig):
        super().__init__(config)
        self.model = Optimus5PrimeModel(config)
        self.sequence_head = SequencePredictionHead(config, config.head)
        self.head_config = self.sequence_head.config
        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        if self.sequence_head.num_labels != 1:
            return [f"mean_ribosome_load_{index}" for index in range(self.sequence_head.num_labels)]
        return ["mean_ribosome_load"]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Tuple[Tensor, ...] | SequencePredictorOutput:
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        output = self.sequence_head(outputs, labels)

        return SequencePredictorOutput(loss=output.loss, logits=output.logits)

Optimus5PrimeModel

Bases: Optimus5PrimePreTrainedModel

The bare Optimus 5-Prime model outputting the pre-regression shared representation.

Examples:

Python Console Session
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel, RnaTokenizer
>>> config = Optimus5PrimeConfig()
>>> model = Optimus5PrimeModel(config)
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
>>> output = model(**input)
>>> output["pooler_output"].shape
torch.Size([1, 40])
Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py
Python
class Optimus5PrimeModel(Optimus5PrimePreTrainedModel):
    """
    The bare Optimus 5-Prime model outputting the pre-regression shared representation.

    Examples:
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel, RnaTokenizer
        >>> config = Optimus5PrimeConfig()
        >>> model = Optimus5PrimeModel(config)
        >>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
        >>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
        >>> output = model(**input)
        >>> output["pooler_output"].shape
        torch.Size([1, 40])
    """

    def __init__(self, config: Optimus5PrimeConfig):
        super().__init__(config)
        self.embeddings = Optimus5PrimeEmbedding(config)
        self.encoder = Optimus5PrimeEncoder(config)
        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Optimus5PrimeModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )

        pooled_output = self.encoder(embedding_output)

        return Optimus5PrimeModelOutput(pooler_output=pooled_output)

Optimus5PrimeModelOutput dataclass

Bases: ModelOutput

Base class for outputs of the Optimus 5-Prime model.

Parameters:

  • pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`, default: None): The pre-regression dense representation consumed by the MRL regression layer.
Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py
Python
@dataclass
class Optimus5PrimeModelOutput(ModelOutput):
    """
    Base class for outputs of the Optimus 5-Prime model.

    Args:
        pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`):
            The pre-regression dense representation consumed by the MRL regression layer.
    """

    pooler_output: torch.FloatTensor | None = None

Optimus5PrimePreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py
Python
class Optimus5PrimePreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = Optimus5PrimeConfig
    base_model_prefix = "model"
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["Optimus5PrimeEncoder"]

    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            init.kaiming_uniform_(module.weight, a=math.sqrt(5))
            if module.bias is not None:
                fan_in, _ = nn.init._calculate_fan_in_and_fan_out(module.weight)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                init.uniform_(module.bias, -bound, bound)
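
The initialization above mirrors the default scheme PyTorch uses for nn.Linear and nn.Conv1d. A short worked calculation shows why a=sqrt(5) is a convenient choice: with the leaky-ReLU gain formula used by kaiming_uniform_, the weight bound collapses to 1 / sqrt(fan_in), the same bound applied to the bias. The sketch below recomputes both bounds in plain Python. The helper names are hypothetical, and the fan-in of 40 assumes the first convolution with the default configuration (5 input channels × kernel size 8).

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    """Bound of the uniform distribution used by Kaiming-uniform init (fan-in mode)."""
    gain = math.sqrt(2.0 / (1.0 + a * a))  # leaky-ReLU gain with negative slope `a`
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std            # uniform(-b, b) has std b / sqrt(3)


def bias_bound(fan_in: int) -> float:
    """Bound used for the bias in the snippet above."""
    return 1 / math.sqrt(fan_in) if fan_in > 0 else 0


fan_in = 5 * 8  # assumed: in_channels * kernel_size of the first convolution
weight_bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
print(f"weight bound: {weight_bound:.4f}")
print(f"bias bound:   {bias_bound(fan_in):.4f}")
```

Both bounds evaluate to 1 / sqrt(40), about 0.158, so weights and biases of a layer are drawn from the same uniform range.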