Optimus 5-Prime¶

Convolutional neural network that predicts the mean ribosome load (MRL) of a fixed 50 nt human 5’ untranslated region (5’UTR) from sequence alone.

Disclaimer¶

This is an UNOFFICIAL implementation of Human 5’ UTR design and variant effect prediction from a massively parallel translation assay by Paul J. Sample, et al.

The OFFICIAL repository of Optimus 5-Prime is at pjsample/human_5utr_modeling.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

The team releasing Optimus 5-Prime did not write a model card for this model, so this model card has been written by the MultiMolecule team.

Model Details¶

Optimus 5-Prime is a simple, fully feed-forward 1D convolutional network trained on a massively parallel polysome-profiling assay of ~280,000 random 50 nt 5’UTRs placed upstream of an eGFP reporter and expressed in HEK293T cells. The network takes a fixed 50 nt 5’UTR as a one-hot tensor and applies three stacked padding="same" 1D convolutions (120 filters, kernel size 8, ReLU), with dropout between the second and third convolutions. It then flattens the per-position activations in channels-last order and passes them through a 40-unit fully connected layer and a linear regression head, which emits a single standardized mean ribosome load (MRL) score. Please refer to the Training Details section for more information on the training process.

The MRL scalar is the per-sequence mean of polysome-profile-derived ribosome loading and is used by the original authors both to score natural human 5’UTRs and to engineer new sequences with predictable translation efficiency. Variant-effect scoring is performed externally by computing the MRL difference between the reference and alternative sequences; the model itself takes a single sequence as input.

Model Specification¶

Num Layers	Hidden Size	Num Parameters (M)	FLOPs (M)	MACs (M)	Max Num Tokens
4	40	0.48	24.04	12.00	50
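
As a rough sanity check on the table above, the parameter and MAC counts can be reproduced by hand. The sketch below is an illustration, not MultiMolecule code: `count_params_and_macs` is a hypothetical helper, and it assumes that every layer carries a bias, that `padding="same"` keeps all 50 positions, and that the regression head is a single 40-to-1 linear layer.

```python
def count_params_and_macs(
    vocab_size=5,
    sequence_length=50,
    num_conv_layers=3,
    conv_channels=120,
    conv_kernel_size=8,
    hidden_size=40,
    num_labels=1,
):
    """Return (parameters, multiply-accumulates) for the assumed layer layout."""
    params = 0
    macs = 0
    in_channels = vocab_size
    for _ in range(num_conv_layers):
        weights = in_channels * conv_kernel_size * conv_channels
        params += weights + conv_channels   # kernel weights plus one bias per filter
        macs += weights * sequence_length   # "same" padding: one output per position
        in_channels = conv_channels
    flat = sequence_length * conv_channels  # flattened per-position activations
    params += flat * hidden_size + hidden_size
    macs += flat * hidden_size
    params += hidden_size * num_labels + num_labels  # assumed linear regression head
    macs += hidden_size * num_labels
    return params, macs


params, macs = count_params_and_macs()
print(params, macs)  # 475641 12000040
```

Under these assumptions the totals round to 0.48 M parameters and 12.00 M MACs, in line with the table.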

Links¶

Code: multimolecule.optimus5prime
Data: Massively parallel polysome-profiling MRL library on randomized 50 nt 5’UTRs in HEK293T, GEO GSE114002
Paper: Human 5’ UTR design and variant effect prediction from a massively parallel translation assay
Developed by: Paul J. Sample, Ban Wang, David W. Reid, Vlad Presnyak, Iain J. McFadyen, David R. Morris, Georg Seelig
Model type: 1D CNN for mean ribosome load (MRL) regression from a fixed 50 nt 5’UTR sequence
Original Repository: pjsample/human_5utr_modeling

Usage¶

The model file depends on the multimolecule library. You can install it using pip:

Bash
pip install multimolecule

Direct Use¶

Mean Ribosome Load Prediction¶

You can use this model directly to predict the mean ribosome load (MRL) of a fixed 50 nt 5’UTR sequence:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['logits'])

The pre-regression dense representation is exposed on the backbone:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeModel

>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeModel.from_pretrained("multimolecule/optimus5prime")
>>> output = model(**tokenizer("GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC", return_tensors="pt"))

>>> output.keys()
odict_keys(['pooler_output'])

Interface¶

Input length: fixed 50 nt 5’UTR sequence
Padding: shorter sequences are right-padded with zeros to 50 nt; longer sequences are truncated to the first 50 nt
Alphabet: RNA (A, C, G, U); N is encoded as an all-zero channel
Special tokens: none added; input_ids are consumed positionally as one-hot channels
Output: standardized mean ribosome load score (logits) of shape (batch_size, 1); raw-MRL calibration requires the external scaler used by the upstream training workflow
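
The padding, truncation, and alphabet rules above can be illustrated with a small pure-Python encoder. This is only a sketch of the described behaviour, not the library's tokenizer or embedding code: `one_hot_utr` is a hypothetical helper, the `A, C, G, U` channel order is assumed, and mapping any other character to an all-zero row is a simplification made here.

```python
CHANNELS = "ACGU"  # assumed channel order; N and padding become all-zero rows


def one_hot_utr(sequence, length=50, num_channels=5):
    """Encode a 5'UTR as `length` rows of `num_channels` one-hot values."""
    # Treat T as U and keep only the first `length` nucleotides (truncation).
    seq = sequence.upper().replace("T", "U")[:length]
    rows = []
    for base in seq:
        row = [0.0] * num_channels
        index = CHANNELS.find(base)
        if index >= 0:  # N (and anything else) stays all-zero
            row[index] = 1.0
        rows.append(row)
    # Right-pad shorter sequences with all-zero rows.
    rows.extend([0.0] * num_channels for _ in range(length - len(rows)))
    return rows


encoded = one_hot_utr("ACGUN")
print(len(encoded), encoded[0], encoded[4])
```

A 60 nt input passed to the same helper would be cut to its first 50 nt, so the result always has 50 rows.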

Variant Effect¶

Optimus 5-Prime is a single-sequence regression model. To score the effect of a variant on translation, run the reference and alternative 5’UTRs through the model independently and compute the difference between their predicted MRL values:

Python
>>> from multimolecule import RnaTokenizer, Optimus5PrimeForSequencePrediction
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> model = Optimus5PrimeForSequencePrediction.from_pretrained("multimolecule/optimus5prime")
>>> ref = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGC"
>>> alt = "GGGACAUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGCUAGAUAGC"
>>> ref_mrl = model(**tokenizer(ref, return_tensors="pt"))["logits"]
>>> alt_mrl = model(**tokenizer(alt, return_tensors="pt"))["logits"]
>>> delta = (alt_mrl - ref_mrl).item()
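
Because the model emits standardized scores, `delta` above is expressed in standardized units rather than raw MRL. Assuming the external scaler is an ordinary mean and standard-deviation standardization (an assumption; check the upstream training workflow), the mean cancels in a difference and only the scale matters. The helpers and the `mean` and `std` values below are hypothetical placeholders, not the real scaler statistics.

```python
def to_raw_mrl(z_score, mean, std):
    """Invert an assumed (x - mean) / std standardization."""
    return z_score * std + mean


def raw_delta(ref_z, alt_z, std):
    """Raw-MRL variant effect; the scaler mean cancels in the difference."""
    return (alt_z - ref_z) * std


# Placeholder statistics for illustration only.
mean, std = 5.0, 1.5
ref_z, alt_z = 0.2, -0.3

delta_raw = raw_delta(ref_z, alt_z, std)
check = to_raw_mrl(alt_z, mean, std) - to_raw_mrl(ref_z, mean, std)
print(delta_raw, check)
```

Both routes give the same raw difference, so the sign of a variant effect does not depend on the scaler as long as `std` is positive.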

Training Details¶

Optimus 5-Prime was trained to regress the per-sequence mean ribosome load (MRL) derived from polysome profiling on a massively parallel reporter assay.

Training Data¶

Optimus 5-Prime was trained on approximately 280,000 randomized 50 nt 5’UTRs placed upstream of an eGFP reporter and expressed in HEK293T cells. Mean ribosome load was computed per sequence from polysome-fractionation read counts. The raw sequencing data are available at GEO accession GSE114002.
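
To make the idea of a per-sequence mean ribosome load concrete, the sketch below computes a read-weighted average of ribosome numbers across polysome fractions. It is an illustration under stated assumptions, not the authors' pipeline: `mean_ribosome_load` is a hypothetical helper, and the read counts and the fraction-to-ribosome mapping are made-up values.

```python
def mean_ribosome_load(read_counts, ribosomes_per_fraction):
    """Weighted mean of ribosome number, weighted by each fraction's share of reads."""
    if len(read_counts) != len(ribosomes_per_fraction):
        raise ValueError("one read count is needed per polysome fraction")
    total = sum(read_counts)
    if total == 0:
        raise ValueError("sequence has no reads in any fraction")
    return sum(
        count / total * ribosomes
        for count, ribosomes in zip(read_counts, ribosomes_per_fraction)
    )


# Hypothetical reads for one 5'UTR across four fractions (0, 1, 2, 3 ribosomes).
mrl = mean_ribosome_load([10, 30, 40, 20], [0, 1, 2, 3])
print(mrl)  # approximately 1.7
```

A sequence whose reads concentrate in heavier fractions gets a higher value, which is the quantity the network is trained to regress after standardization.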

Training Procedure¶

Pre-training¶

The published main_MRL_model was trained with mean-squared-error loss against standardized per-sequence MRL values. The optimizer was Adam with learning rate 1e-3, batch size 128, betas (0.9, 0.999), and epsilon 1e-8.
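
For readers unfamiliar with the optimizer settings listed above, here is a minimal single-parameter sketch of one Adam update using those hyperparameters on a squared-error gradient. It is a textbook formulation written for illustration, not the upstream training code; `adam_step` and the toy data point are hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy regression: prediction = w * x, squared error against a standardized target y.
w, x, y = 0.0, 1.0, 2.0
grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
w, m, v = adam_step(w, grad, m=0.0, v=0.0, t=1)
print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.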

Citation¶

BibTeX
@article{sample2019human,
  author    = {Sample, Paul J. and Wang, Ban and Reid, David W. and Presnyak, Vlad and McFadyen, Iain J. and Morris, David R. and Seelig, Georg},
  title     = {Human 5' UTR design and variant effect prediction from a massively parallel translation assay},
  journal   = {Nature Biotechnology},
  volume    = {37},
  number    = {7},
  pages     = {803--809},
  year      = {2019},
  publisher = {Springer Science and Business Media LLC},
  doi       = {10.1038/s41587-019-0164-5}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If MultiMolecule supports your research, please cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact¶

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the Optimus 5-Prime paper for questions or comments on the paper/model.

License¶

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
SPDX-License-Identifier: AGPL-3.0-or-later

API Reference¶

Optimus5PrimeConfig ¶

Bases: PreTrainedConfig

This is the configuration class to store the configuration of an Optimus5PrimeModel. It is used to instantiate an Optimus 5-Prime model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Optimus 5-Prime main MRL model from pjsample/human_5utr_modeling.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Parameters:

Name	Type	Description	Default
`vocab_size` ¶	`int`	Vocabulary size of the Optimus 5-Prime model. Defines the number of one-hot input channels derived from `input_ids`. Defaults to 5 (the MultiMolecule RNA `streamline` alphabet `ACGUN`); the upstream checkpoint only uses the first four (`A`, `C`, `G`, `U`/`T`) and the `N` channel stays zero.	`5`
`sequence_length` ¶	`int`	The fixed 5’UTR input sequence length Optimus 5-Prime was trained on (50 nt).	`50`
`num_conv_layers` ¶	`int`	Number of stacked 1D convolutions. The published main MRL model uses 3.	`3`
`conv_channels` ¶	`int`	Number of output channels in every convolution. The published main MRL model uses 120.	`120`
`conv_kernel_size` ¶	`int`	Convolution kernel size. The published main MRL model uses 8 with `padding="same"`.	`8`
`conv_dropout` ¶	`float`	Dropout probability applied after each intermediate convolution. The published main MRL model uses 0.0.	`0.0`
`hidden_size` ¶	`int`	Size of the fully connected layer between the convolutional stack and the regression output. The published main MRL model uses 40.	`40`
`dense_dropout` ¶	`float`	Dropout probability applied after the dense hidden layer. The published main MRL model uses 0.2.	`0.2`
`hidden_act` ¶	`str`	The non-linear activation function used by the convolutional and dense layers.	`'relu'`
`num_labels` ¶	`int`	Number of output labels. Optimus 5-Prime predicts a single mean ribosome load (MRL) scalar, so this defaults to 1.	`1`
`head` ¶	`HeadConfig \| None`	The configuration of the sequence-level prediction head. Defaults to a regression head (`problem_type="regression"`), matching Optimus 5-Prime’s MRL regression task.	`None`

Examples:

Python Console Session
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel
>>> # Initializing an Optimus 5-Prime style configuration
>>> configuration = Optimus5PrimeConfig()
>>> # Initializing a model (with random weights) from the configuration
>>> model = Optimus5PrimeModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

Source code in multimolecule/models/optimus5prime/configuration_optimus5prime.py

Python
class Optimus5PrimeConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of an
    [`Optimus5PrimeModel`][multimolecule.models.Optimus5PrimeModel]. It is used to instantiate an Optimus 5-Prime model
    according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the Optimus 5-Prime main MRL model from
    [pjsample/human_5utr_modeling](https://github.com/pjsample/human_5utr_modeling).

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    Args:
        vocab_size:
            Vocabulary size of the Optimus 5-Prime model. Defines the number of one-hot input channels derived from
            `input_ids`. Defaults to 5 (the MultiMolecule RNA `streamline` alphabet `ACGUN`); the upstream checkpoint
            only uses the first four (`A`, `C`, `G`, `U`/`T`) and the `N` channel stays zero.
        sequence_length:
            The fixed 5'UTR input sequence length Optimus 5-Prime was trained on (50 nt).
        num_conv_layers:
            Number of stacked 1D convolutions. The published main MRL model uses 3.
        conv_channels:
            Number of output channels in every convolution. The published main MRL model uses 120.
        conv_kernel_size:
            Convolution kernel size. The published main MRL model uses 8 with `padding="same"`.
        conv_dropout:
            Dropout probability applied after each intermediate convolution. The published main MRL model uses 0.0.
        hidden_size:
            Size of the fully connected layer between the convolutional stack and the regression output. The published
            main MRL model uses 40.
        dense_dropout:
            Dropout probability applied after the dense hidden layer. The published main MRL model uses 0.2.
        hidden_act:
            The non-linear activation function used by the convolutional and dense layers.
        num_labels:
            Number of output labels. Optimus 5-Prime predicts a single mean ribosome load (MRL) scalar, so this
            defaults to 1.
        head:
            The configuration of the sequence-level prediction head. Defaults to a regression head
            (`problem_type="regression"`), matching Optimus 5-Prime's MRL regression task.

    Examples:
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel
        >>> # Initializing an Optimus 5-Prime style configuration
        >>> configuration = Optimus5PrimeConfig()
        >>> # Initializing a model (with random weights) from the configuration
        >>> model = Optimus5PrimeModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "optimus5prime"

    def __init__(
        self,
        vocab_size: int = 5,
        sequence_length: int = 50,
        num_conv_layers: int = 3,
        conv_channels: int = 120,
        conv_kernel_size: int = 8,
        conv_dropout: float = 0.0,
        hidden_size: int = 40,
        dense_dropout: float = 0.2,
        hidden_act: str = "relu",
        num_labels: int = 1,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if vocab_size < 4:
            raise ValueError(
                f"vocab_size ({vocab_size}) must cover the four canonical nucleotides used by Optimus 5-Prime."
            )
        if sequence_length <= 0:
            raise ValueError(f"sequence_length ({sequence_length}) must be a positive integer.")
        if num_conv_layers < 1:
            raise ValueError(f"num_conv_layers ({num_conv_layers}) must be >= 1.")
        if conv_channels <= 0:
            raise ValueError(f"conv_channels ({conv_channels}) must be positive.")
        if conv_kernel_size <= 0:
            raise ValueError(f"conv_kernel_size ({conv_kernel_size}) must be positive.")
        if not 0.0 <= conv_dropout < 1.0:
            raise ValueError(f"conv_dropout ({conv_dropout}) must be in [0.0, 1.0).")
        if not 0.0 <= dense_dropout < 1.0:
            raise ValueError(f"dense_dropout ({dense_dropout}) must be in [0.0, 1.0).")
        if hidden_size <= 0:
            raise ValueError(f"hidden_size ({hidden_size}) must be positive.")
        self.vocab_size = vocab_size
        self.sequence_length = sequence_length
        self.num_conv_layers = num_conv_layers
        self.conv_channels = conv_channels
        self.conv_kernel_size = conv_kernel_size
        self.conv_dropout = conv_dropout
        self.hidden_size = hidden_size
        self.dense_dropout = dense_dropout
        self.hidden_act = hidden_act
        if head is None:
            head = HeadConfig(problem_type="regression")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "regression"
        self.head = head

Optimus5PrimeForSequencePrediction ¶

Bases: Optimus5PrimePreTrainedModel

Optimus 5-Prime model with a sequence-level prediction head.

The published model is a regression network that predicts the mean ribosome load (MRL) scalar for a fixed 50 nt 5’UTR. This wrapper exposes the converted upstream regression decoder through the standard MultiMolecule sequence-prediction head.

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeForSequencePrediction, RnaTokenizer
>>> config = Optimus5PrimeConfig()
>>> model = Optimus5PrimeForSequencePrediction(config)
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
>>> output = model(**input, labels=torch.tensor([[1.0]]))
>>> output["logits"].shape
torch.Size([1, 1])
>>> output["loss"]
tensor(..., grad_fn=<MseLossBackward0>)

Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py

Python
class Optimus5PrimeForSequencePrediction(Optimus5PrimePreTrainedModel):
    """
    Optimus 5-Prime model with a sequence-level prediction head.

    The published model is a regression network that predicts the mean ribosome load (MRL) scalar for a fixed 50 nt
    5'UTR. This wrapper exposes the converted upstream regression decoder through the standard MultiMolecule
    sequence-prediction head.

    Examples:
        >>> import torch
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeForSequencePrediction, RnaTokenizer
        >>> config = Optimus5PrimeConfig()
        >>> model = Optimus5PrimeForSequencePrediction(config)
        >>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
        >>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
        >>> output = model(**input, labels=torch.tensor([[1.0]]))
        >>> output["logits"].shape
        torch.Size([1, 1])
        >>> output["loss"]  # doctest:+ELLIPSIS
        tensor(..., grad_fn=<MseLossBackward0>)
    """

    def __init__(self, config: Optimus5PrimeConfig):
        super().__init__(config)
        self.model = Optimus5PrimeModel(config)
        self.sequence_head = SequencePredictionHead(config, config.head)
        self.head_config = self.sequence_head.config
        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        if self.sequence_head.num_labels != 1:
            return [f"mean_ribosome_load_{index}" for index in range(self.sequence_head.num_labels)]
        return ["mean_ribosome_load"]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[Tensor, ...] | SequencePredictorOutput:
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        output = self.sequence_head(outputs, labels)

        return SequencePredictorOutput(loss=output.loss, logits=output.logits)

Optimus5PrimeModel ¶

Bases: Optimus5PrimePreTrainedModel

The bare Optimus 5-Prime model outputting the pre-regression shared representation.

Examples:

Python Console Session
>>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel, RnaTokenizer
>>> config = Optimus5PrimeConfig()
>>> model = Optimus5PrimeModel(config)
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
>>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
>>> output = model(**input)
>>> output["pooler_output"].shape
torch.Size([1, 40])

Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py

Python
class Optimus5PrimeModel(Optimus5PrimePreTrainedModel):
    """
    The bare Optimus 5-Prime model outputting the pre-regression shared representation.

    Examples:
        >>> from multimolecule import Optimus5PrimeConfig, Optimus5PrimeModel, RnaTokenizer
        >>> config = Optimus5PrimeConfig()
        >>> model = Optimus5PrimeModel(config)
        >>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/optimus5prime")
        >>> input = tokenizer("ACGUACGUACGU", return_tensors="pt")
        >>> output = model(**input)
        >>> output["pooler_output"].shape
        torch.Size([1, 40])
    """

    def __init__(self, config: Optimus5PrimeConfig):
        super().__init__(config)
        self.embeddings = Optimus5PrimeEmbedding(config)
        self.encoder = Optimus5PrimeEncoder(config)
        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Optimus5PrimeModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )

        pooled_output = self.encoder(embedding_output)

        return Optimus5PrimeModelOutput(pooler_output=pooled_output)

Optimus5PrimeModelOutput `dataclass` ¶

Bases: ModelOutput

Base class for outputs of the Optimus 5-Prime model.

Parameters:

Name	Type	Description	Default
`pooler_output` ¶	`torch.FloatTensor` of shape `(batch_size, hidden_size)`	The pre-regression dense representation consumed by the MRL regression layer.	`None`

Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py

Python
@dataclass
class Optimus5PrimeModelOutput(ModelOutput):
    """
    Base class for outputs of the Optimus 5-Prime model.

    Args:
        pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`):
            The pre-regression dense representation consumed by the MRL regression layer.
    """

    pooler_output: torch.FloatTensor | None = None

Optimus5PrimePreTrainedModel ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/optimus5prime/modeling_optimus5prime.py

Python
class Optimus5PrimePreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = Optimus5PrimeConfig
    base_model_prefix = "model"
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["Optimus5PrimeEncoder"]

    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            init.kaiming_uniform_(module.weight, a=math.sqrt(5))
            if module.bias is not None:
                fan_in, _ = nn.init._calculate_fan_in_and_fan_out(module.weight)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                init.uniform_(module.bias, -bound, bound)
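
The initialization above can be checked with a little arithmetic. With `a = sqrt(5)` and the default leaky-ReLU gain formula used by PyTorch's `kaiming_uniform_`, the weight bound reduces to `1 / sqrt(fan_in)`, the same bound used for the bias. The sketch below reproduces that calculation in plain Python; `kaiming_uniform_bound` is a hypothetical helper, and the fan-in of 40 assumes the first convolution sees 5 input channels with kernel size 8.

```python
import math


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Half-width of the uniform range drawn by Kaiming-uniform initialization."""
    gain = math.sqrt(2.0 / (1.0 + a * a))  # leaky-ReLU gain with negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std            # uniform(-bound, bound) has this std


conv1_fan_in = 5 * 8  # assumed: input channels * kernel size of the first convolution
bound = kaiming_uniform_bound(conv1_fan_in)
print(bound, 1 / math.sqrt(conv1_fan_in))
```

Weights and biases are therefore drawn from the same range for a given layer, and the range narrows as the fan-in grows.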
