MPRA-DragoNN

Disclaimer

This is an UNOFFICIAL implementation of Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays by Rajiv Movva et al.

The OFFICIAL repository of MPRA-DragoNN is at kundajelab/MPRA-DragoNN.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

The team that released MPRA-DragoNN did not write a model card for this model, so this model card was written by the MultiMolecule team.

Model Details

MPRA-DragoNN is a convolutional neural network (CNN) trained to quantitatively predict Sharpr-MPRA reporter activity from 145 bp DNA sequences. The released ConvModel consists of three convolutional blocks (Conv1D + ReLU + BatchNorm + Dropout, 120 filters of width 5 with valid padding) followed by a flatten and a single fully-connected layer that emits 12 task outputs. Each task corresponds to a (cell line, reporter promoter, replicate) combination from the Sharpr-MPRA experiment: the K562 and HepG2 cell lines, each measured with both a minimal promoter (minP) and the strong SV40 promoter (SV40p), with two individual replicates plus a pooled average per condition. Please refer to the Training Details section for more information on the training process.

Model Specification

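| Num Conv Layers | Num FC Layers | Hidden Size | Num Parameters (M) | FLOPs (M) | MACs (M) | Max Num Tokens |
| --------------- | ------------- | ----------- | ------------------ | --------- | -------- | -------------- |
| 3               | 1             | 15960       | 0.34               | 40.40     | 20.05    | 145            |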
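
The Hidden Size and Num Parameters figures above can be cross-checked by hand. The sketch below is only an arithmetic check, not library code: it assumes every Conv1D and the final fully-connected layer carry a bias, every batch-norm layer has a learnable scale and shift per channel, and the first convolution sees 5 input channels (the default vocab_size). The function name `count_parameters` is illustrative.

```python
def count_parameters(input_length=145, in_channels=5, channels=(120, 120, 120),
                     kernel_sizes=(5, 5, 5), num_labels=12):
    """Count learnable parameters of a valid-padding Conv1D stack plus one Linear layer."""
    total = 0
    length = input_length
    previous = in_channels
    for out_channels, kernel_size in zip(channels, kernel_sizes):
        total += previous * kernel_size * out_channels + out_channels  # conv weight + bias
        total += 2 * out_channels  # batch-norm scale and shift (assumed learnable)
        length = length - kernel_size + 1  # valid padding trims kernel_size - 1 positions
        previous = out_channels
    hidden_size = length * previous  # width of the flattened feature map
    total += hidden_size * num_labels + num_labels  # fully-connected weight + bias
    return hidden_size, total


hidden_size, total = count_parameters()
print(hidden_size)            # 15960
print(round(total / 1e6, 2))  # 0.34
```

Under these assumptions the fully-connected layer accounts for more than half of the parameters, because it reads the whole flattened feature map.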

Usage

The model file depends on the multimolecule library. You can install it using pip:

Bash
pip install multimolecule

Direct Use

MPRA Activity Prediction

You can use this model directly to predict the Sharpr-MPRA activity of a 145 bp DNA sequence:

Python
>>> import torch
>>> from multimolecule import DnaTokenizer, MpraDragoNnForSequencePrediction

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mpradragonn")
>>> model = MpraDragoNnForSequencePrediction.from_pretrained("multimolecule/mpradragonn")
>>> sequence = "ACGT" * 36 + "A"
>>> output = model(**tokenizer(sequence, return_tensors="pt"))

>>> output.logits.shape
torch.Size([1, 12])

Interface

  • Input length: fixed 145 bp DNA window
  • Output: 12 MPRA activity scalars in the order k562_minp_{rep1, rep2, avg}, k562_sv40p_{rep1, rep2, avg}, hepg2_minp_{rep1, rep2, avg}, hepg2_sv40p_{rep1, rep2, avg} (z-scored log2 RNA/DNA ratios)
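
The task order listed above can be generated rather than typed out, which makes it easy to attach names to a row of predictions. This is a minimal sketch; `label_predictions` is a hypothetical helper, and the numbers passed to it are placeholders standing in for one row of `output.logits`.

```python
from itertools import product

CELL_LINES = ("k562", "hepg2")
PROMOTERS = ("minp", "sv40p")
MEASUREMENTS = ("rep1", "rep2", "avg")

# Cell line varies slowest and measurement fastest, matching the documented order.
TASK_NAMES = [f"{c}_{p}_{m}" for c, p, m in product(CELL_LINES, PROMOTERS, MEASUREMENTS)]


def label_predictions(values):
    """Pair one row of 12 predicted activities with its task name."""
    if len(values) != len(TASK_NAMES):
        raise ValueError(f"expected {len(TASK_NAMES)} values, got {len(values)}")
    return dict(zip(TASK_NAMES, values))


# Placeholder values, not real model output.
labelled = label_predictions([0.1 * i for i in range(12)])
print(TASK_NAMES[3])  # k562_sv40p_rep1
```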

Training Details

MPRA-DragoNN was trained to predict quantitative Sharpr-MPRA reporter activity from DNA sequence.

Training Data

MPRA-DragoNN was trained on the Sharpr-MPRA dataset (Ernst et al. 2016, GEO accession GSE71279) which assays ~487K 145 bp candidate regulatory elements in K562 and HepG2 cell lines under two reporter promoters (a minimal promoter and the strong SV40 promoter) and provides two replicates plus a pooled count per condition (12 tasks total).

Raw counts were preprocessed by (1) computing log2((RNA + 1) / (DNA + 1)) per task, (2) column-wise z-score normalisation per task, and (3) augmenting with the reverse complement of every sequence. Chromosomes were split with chr8 held out as validation, chr18 held out as test, and all remaining chromosomes used for training (~900K training, ~30K validation, ~20K test sequences after the reverse-complement augmentation).
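
The three preprocessing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the original pipeline: the counts are toy numbers, the z-score uses the population standard deviation (the text does not say which estimator was used), and the function names are made up for this example.

```python
import math
from statistics import mean, pstdev

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_ratio(rna_count, dna_count):
    """Step 1: log2((RNA + 1) / (DNA + 1)) for one element and one task."""
    return math.log2((rna_count + 1) / (dna_count + 1))


def z_score(column):
    """Step 2: standardise one task column (population std is an assumption)."""
    mu, sigma = mean(column), pstdev(column)
    return [(value - mu) / sigma for value in column]


def reverse_complement(sequence):
    """Step 3: reverse-complement a DNA string over the alphabet ACGT."""
    return sequence.translate(COMPLEMENT)[::-1]


# Toy (RNA, DNA) counts for a single task; not real Sharpr-MPRA data.
counts = [(7, 3), (1, 1), (0, 3)]
activities = z_score([log_ratio(rna, dna) for rna, dna in counts])
print(reverse_complement("AACGT"))  # ACGTT
```

Each reverse-complemented sequence keeps the label of its source sequence, which is why the augmentation doubles the number of examples.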

Training Procedure

Pre-training

The model was trained to minimise a task-wise mean-squared-error loss between predicted and measured MPRA activities and evaluated with Spearman correlation per task.

  • Optimizer: Adam
  • Loss: Mean Squared Error (task-wise, equally weighted)
  • Regularization: Batch normalization and dropout (p=0.1) after every convolutional block
  • Validation: chr8 sequences; Test: chr18 sequences
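
To make the loss and the evaluation metric concrete, here is a dependency-free sketch of an equally weighted task-wise MSE and a per-task Spearman correlation. It is an illustration only: the function names are hypothetical, the inputs are lists of per-sequence rows, and the original training code may compute these quantities differently.

```python
from statistics import mean


def taskwise_mse(predictions, targets):
    """Mean over tasks of the per-task mean squared error (equal task weights)."""
    num_tasks = len(predictions[0])
    per_task = [
        mean((p[t] - y[t]) ** 2 for p, y in zip(predictions, targets))
        for t in range(num_tasks)
    ]
    return mean(per_task)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm_x = sum((a - mx) ** 2 for a in rx) ** 0.5
    norm_y = sum((b - my) ** 2 for b in ry) ** 0.5
    return covariance / (norm_x * norm_y)
```

For evaluation, `spearman` would be called once per task on the column of predictions and the column of measurements for that task.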

Citation

BibTeX
@article{movva2019mpradragonn,
  author    = {Movva, Rajiv and Greenside, Peyton and Marinov, Georgi K. and Nair, Surag and Shrikumar, Avanti and Kundaje, Anshul},
  title     = {Deciphering regulatory {DNA} sequences and noncoding genetic variants using neural network models of massively parallel reporter assays},
  journal   = {PLoS ONE},
  volume    = 14,
  number    = 6,
  pages     = {e0218073},
  year      = 2019,
  publisher = {Public Library of Science},
  doi       = {10.1371/journal.pone.0218073}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the MPRA-DragoNN paper for questions or comments on the paper/model.

License

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
SPDX-License-Identifier: AGPL-3.0-or-later

multimolecule.models.mpradragonn

DnaTokenizer

Bases: Tokenizer

Tokenizer for DNA sequences.

Parameters:

  • alphabet (Alphabet | str | List[str] | None, default: None): Alphabet to use for tokenization.
    • If it is None, the standard DNA alphabet will be used.
    • If it is a string, it should correspond to the name of a predefined alphabet. The options include:
      • standard
      • iupac
      • streamline
      • nucleobase
    • If it is an alphabet or a list of characters, that specific alphabet will be used.
  • nmers (int, default: 1): Size of the k-mers to tokenize into.
  • codon (bool, default: False): Whether to tokenize into codons.
  • replace_U_with_T (bool, default: True): Whether to replace U with T.
  • do_upper_case (bool, default: True): Whether to convert input to uppercase.

Examples:

Python Console Session
>>> from multimolecule import DnaTokenizer
>>> tokenizer = DnaTokenizer()
>>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
>>> tokenizer('acgt')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer = DnaTokenizer(replace_U_with_T=False)
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 3, 2]
>>> tokenizer = DnaTokenizer(nmers=3)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 21, 81, 6, 8, 19, 71, 2]
>>> tokenizer = DnaTokenizer(codon=True)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 6, 71, 2]
>>> tokenizer('tataaagtaa')["input_ids"]
Traceback (most recent call last):
ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
Source code in multimolecule/tokenisers/dna/tokenization_dna.py
Python
class DnaTokenizer(Tokenizer):
    """
    Tokenizer for DNA sequences.

    Args:
        alphabet: alphabet to use for tokenization.

            - If it is `None`, the standard DNA alphabet will be used.
            - If it is a `string`, it should correspond to the name of a predefined alphabet. The options include
                + `standard`
                + `iupac`
                + `streamline`
                + `nucleobase`
            - If it is an alphabet or a list of characters, that specific alphabet will be used.
        nmers: Size of kmer to tokenize.
        codon: Whether to tokenize into codons.
        replace_U_with_T: Whether to replace U with T.
        do_upper_case: Whether to convert input to uppercase.

    Examples:
        >>> from multimolecule import DnaTokenizer
        >>> tokenizer = DnaTokenizer()
        >>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
        [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
        >>> tokenizer('acgt')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer = DnaTokenizer(replace_U_with_T=False)
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 3, 2]
        >>> tokenizer = DnaTokenizer(nmers=3)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 21, 81, 6, 8, 19, 71, 2]
        >>> tokenizer = DnaTokenizer(codon=True)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 6, 71, 2]
        >>> tokenizer('tataaagtaa')["input_ids"]
        Traceback (most recent call last):
        ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
    """

    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        alphabet: Alphabet | str | List[str] | None = None,
        nmers: int = 1,
        codon: bool = False,
        replace_U_with_T: bool = True,
        do_upper_case: bool = True,
        additional_special_tokens: List | Tuple | None = None,
        **kwargs,
    ):
        if codon and (nmers > 1 and nmers != 3):
            raise ValueError("Codon and nmers cannot be used together.")
        if codon:
            nmers = 3  # set to 3 to get correct vocab
        if not isinstance(alphabet, Alphabet):
            alphabet = get_alphabet(alphabet, nmers=nmers)
        super().__init__(
            alphabet=alphabet,
            nmers=nmers,
            codon=codon,
            replace_U_with_T=replace_U_with_T,
            do_upper_case=do_upper_case,
            additional_special_tokens=additional_special_tokens,
            **kwargs,
        )
        self.replace_U_with_T = replace_U_with_T
        self.nmers = nmers
        self.codon = codon

    def _tokenize(self, text: str, **kwargs):
        if self.do_upper_case:
            text = text.upper()
        if self.replace_U_with_T:
            text = text.replace("U", "T")
        if self.codon:
            if len(text) % 3 != 0:
                raise ValueError(
                    f"length of input sequence must be a multiple of 3 for codon tokenization, but got {len(text)}"
                )
            return [text[i : i + 3] for i in range(0, len(text), 3)]
        if self.nmers > 1:
            return [text[i : i + self.nmers] for i in range(len(text) - self.nmers + 1)]  # noqa: E203
        return list(text)
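
The difference between nmers=3 and codon=True is easiest to see on the strings themselves, before any token IDs are assigned. The sketch below mirrors the slicing in `_tokenize` with two stand-alone helpers whose names are illustrative and not part of the library.

```python
def overlapping_kmers(sequence, k):
    """Slide a window of width k one base at a time (the nmers=k behaviour)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def codons(sequence):
    """Cut the sequence into non-overlapping triplets (the codon=True behaviour)."""
    if len(sequence) % 3 != 0:
        raise ValueError(f"length must be a multiple of 3, got {len(sequence)}")
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


sequence = "TATAAAGTA"
print(overlapping_kmers(sequence, 3))  # ['TAT', 'ATA', 'TAA', 'AAA', 'AAG', 'AGT', 'GTA']
print(codons(sequence))                # ['TAT', 'AAA', 'GTA']
```

A sequence of length n yields n - k + 1 overlapping k-mers but only n / 3 codons, which matches the 7 and 3 non-special token IDs in the examples above.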

MpraDragoNnConfig

Bases: PreTrainedConfig

This is the configuration class to store the configuration of a MpraDragoNnModel. It is used to instantiate an MPRA-DragoNN model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the MPRA-DragoNN kundajelab/MPRA-DragoNN ConvModel architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Parameters:

  • vocab_size (int, default: 5): Vocabulary size of the MPRA-DragoNN model. Defines the number of feature channels in the one-hot encoded input fed to the first convolution.
  • input_length (int, default: 145): The fixed length (in base pairs) of the input DNA sequence.
  • num_conv_layers (int, default: 3): Number of convolutional blocks (Conv1D + BatchNorm + activation + Dropout).
  • conv_channels (list[int] | None, default: None): Number of output channels for each convolutional block. If None, [120, 120, 120] is used.
  • conv_kernel_sizes (list[int] | None, default: None): Convolution kernel size for each convolutional block. If None, [5, 5, 5] is used.
  • hidden_act (str, default: 'relu'): The non-linear activation function (function or string) in the encoder. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
  • hidden_dropout (float, default: 0.1): The dropout probability applied after each convolutional block.
  • batch_norm_eps (float, default: 0.001): The epsilon used by the batch normalization layers.
  • batch_norm_momentum (float, default: 0.01): The momentum used by the batch normalization layers (PyTorch convention; equivalent to 1 - momentum in Keras, which uses 0.99 in the upstream checkpoint).
  • num_labels (int, default: 12): Number of regression outputs. MPRA-DragoNN predicts Sharpr-MPRA activity for 12 tasks: K562 / HepG2 cell lines, each with minP and SV40p reporter promoters, each measured as two replicates plus a pooled “avg” track (2 cells x 2 promoters x 3 measurements = 12 tasks).
  • head (HeadConfig | None, default: None): The configuration of the prediction head. Defaults to a regression head (problem_type="regression"), matching MPRA-DragoNN’s MPRA activity prediction task.
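
The batch_norm_momentum note is worth a small worked check, since Keras and PyTorch weight the running statistic in opposite ways. The sketch below writes both update rules as plain functions (the names are illustrative) and confirms that Keras momentum 0.99 and PyTorch momentum 0.01 give the same running-mean update.

```python
import math


def keras_update(running, batch, momentum):
    """Keras convention: momentum weights the old running statistic."""
    return momentum * running + (1.0 - momentum) * batch


def pytorch_update(running, batch, momentum):
    """PyTorch convention: momentum weights the new batch statistic."""
    return (1.0 - momentum) * running + momentum * batch


keras_momentum = 0.99
pytorch_momentum = 1.0 - keras_momentum  # 0.01 up to floating-point rounding

running_mean, batch_mean = 0.5, 2.0
assert math.isclose(
    keras_update(running_mean, batch_mean, keras_momentum),
    pytorch_update(running_mean, batch_mean, pytorch_momentum),
)
```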

Examples:

Python Console Session
>>> from multimolecule import MpraDragoNnConfig, MpraDragoNnModel
>>> # Initializing an MPRA-DragoNN multimolecule/mpradragonn style configuration
>>> configuration = MpraDragoNnConfig()
>>> # Initializing a model (with random weights) from the multimolecule/mpradragonn style configuration
>>> model = MpraDragoNnModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in multimolecule/models/mpradragonn/configuration_mpradragonn.py
Python
class MpraDragoNnConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of a
    [`MpraDragoNnModel`][multimolecule.models.MpraDragoNnModel]. It is used to instantiate an MPRA-DragoNN model
    according to the specified arguments, defining the model architecture. Instantiating a configuration with the
    defaults will yield a similar configuration to that of the MPRA-DragoNN
    [kundajelab/MPRA-DragoNN](https://github.com/kundajelab/MPRA-DragoNN) ConvModel architecture.

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    Args:
        vocab_size:
            Vocabulary size of the MPRA-DragoNN model. Defines the number of feature channels in the one-hot encoded
            input fed to the first convolution.
            Defaults to 5.
        input_length:
            The fixed length (in base pairs) of the input DNA sequence.
            Defaults to 145.
        num_conv_layers:
            Number of convolutional blocks (Conv1D + BatchNorm + activation + Dropout).
        conv_channels:
            Number of output channels for each convolutional block.
        conv_kernel_sizes:
            Convolution kernel size for each convolutional block.
        hidden_act:
            The non-linear activation function (function or string) in the encoder. If string, `"gelu"`, `"relu"`,
            `"silu"` and `"gelu_new"` are supported.
        hidden_dropout:
            The dropout probability applied after each convolutional block.
        batch_norm_eps:
            The epsilon used by the batch normalization layers.
        batch_norm_momentum:
            The momentum used by the batch normalization layers (PyTorch convention; equivalent to ``1 - momentum``
            in Keras, which uses 0.99 in the upstream checkpoint).
        num_labels:
            Number of regression outputs. MPRA-DragoNN predicts Sharpr-MPRA activity for 12 tasks: K562 / HepG2 cell
            lines, each with minP and SV40p reporter promoters, each measured as two replicates plus a pooled "avg"
            track (2 cells x 2 promoters x 3 measurements = 12 tasks).
        head:
            The configuration of the prediction head. Defaults to a regression head
            (`problem_type="regression"`), matching MPRA-DragoNN's MPRA activity prediction task.

    Examples:
        >>> from multimolecule import MpraDragoNnConfig, MpraDragoNnModel
        >>> # Initializing an MPRA-DragoNN multimolecule/mpradragonn style configuration
        >>> configuration = MpraDragoNnConfig()
        >>> # Initializing a model (with random weights) from the multimolecule/mpradragonn style configuration
        >>> model = MpraDragoNnModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "mpradragonn"

    def __init__(
        self,
        vocab_size: int = 5,
        input_length: int = 145,
        num_conv_layers: int = 3,
        conv_channels: list[int] | None = None,
        conv_kernel_sizes: list[int] | None = None,
        hidden_act: str = "relu",
        hidden_dropout: float = 0.1,
        batch_norm_eps: float = 1e-3,
        batch_norm_momentum: float = 0.01,
        num_labels: int = 12,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if conv_channels is None:
            conv_channels = [120, 120, 120]
        if conv_kernel_sizes is None:
            conv_kernel_sizes = [5, 5, 5]
        if len(conv_channels) != num_conv_layers:
            raise ValueError(f"conv_channels must have {num_conv_layers} entries, got {len(conv_channels)}.")
        if len(conv_kernel_sizes) != num_conv_layers:
            raise ValueError(f"conv_kernel_sizes must have {num_conv_layers} entries, got {len(conv_kernel_sizes)}.")
        if input_length <= 0:
            raise ValueError(f"input_length must be positive, got {input_length}.")
        trimmed = input_length
        for kernel_size in conv_kernel_sizes:
            trimmed = trimmed - kernel_size + 1
            if trimmed <= 0:
                raise ValueError(
                    f"input_length={input_length} is too short for the configured conv stack; "
                    f"the feature map collapses after kernel sizes {conv_kernel_sizes}."
                )
        self.vocab_size = vocab_size
        self.input_length = input_length
        self.num_conv_layers = num_conv_layers
        self.conv_channels = conv_channels
        self.conv_kernel_sizes = conv_kernel_sizes
        # `hidden_size` is the dimensionality of the pooled feature vector that the prediction
        # head consumes (the flattened conv feature map), since MPRA-DragoNN has no learned pooler.
        self.hidden_size = trimmed * conv_channels[-1]
        self.hidden_act = hidden_act
        self.hidden_dropout = hidden_dropout
        self.batch_norm_eps = batch_norm_eps
        self.batch_norm_momentum = batch_norm_momentum
        if head is None:
            head = HeadConfig(problem_type="regression")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "regression"
        self.head = head

    @property
    def pooled_length(self) -> int:
        length = self.input_length
        for kernel_size in self.conv_kernel_sizes:
            length = length - kernel_size + 1
        return length

    @property
    def flattened_size(self) -> int:
        return self.pooled_length * self.conv_channels[-1]
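
The length bookkeeping in the configuration can be tried out without the library. The following sketch reproduces the same rule (each valid-padding convolution removes kernel_size - 1 positions, which implies stride 1 and no dilation) in a stand-alone function; the name mirrors the property above but the function itself is hypothetical.

```python
def pooled_length(input_length, kernel_sizes):
    """Length left after a stack of stride-1, valid-padding convolutions."""
    length = input_length
    for kernel_size in kernel_sizes:
        length = length - kernel_size + 1
        if length <= 0:
            raise ValueError(f"input_length={input_length} is too short for kernels {kernel_sizes}")
    return length


print(pooled_length(145, [5, 5, 5]))        # 133
print(pooled_length(145, [5, 5, 5]) * 120)  # 15960
print(pooled_length(20, [7, 5, 3]))         # 8
```

An input of length 10 with the default kernels would shrink to 6, then 2, then -2, so the check raises before a model with an empty feature map can be built.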

MpraDragoNnForSequencePrediction

Bases: MpraDragoNnPreTrainedModel

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import MpraDragoNnConfig, MpraDragoNnForSequencePrediction, DnaTokenizer
>>> config = MpraDragoNnConfig()
>>> model = MpraDragoNnForSequencePrediction(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mpradragonn")
>>> input = tokenizer(["ACGT" * 36 + "A", "TGCA" * 36 + "T"], return_tensors="pt")
>>> output = model(**input, labels=torch.randn(2, 12))
>>> output["logits"].shape
torch.Size([2, 12])
Source code in multimolecule/models/mpradragonn/modeling_mpradragonn.py
Python
class MpraDragoNnForSequencePrediction(MpraDragoNnPreTrainedModel):
    """
    Examples:
        >>> import torch
        >>> from multimolecule import MpraDragoNnConfig, MpraDragoNnForSequencePrediction, DnaTokenizer
        >>> config = MpraDragoNnConfig()
        >>> model = MpraDragoNnForSequencePrediction(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mpradragonn")
        >>> input = tokenizer(["ACGT" * 36 + "A", "TGCA" * 36 + "T"], return_tensors="pt")
        >>> output = model(**input, labels=torch.randn(2, 12))
        >>> output["logits"].shape
        torch.Size([2, 12])
    """

    def __init__(self, config: MpraDragoNnConfig):
        super().__init__(config)
        self.model = MpraDragoNnModel(config)
        self.sequence_head = SequencePredictionHead(config)
        self.head_config = self.sequence_head.config

        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        # Upstream Sharpr-MPRA task order from the Kipoi ConvModel schema:
        #   k562 minP {rep1, rep2, avg}, k562 sv40p {rep1, rep2, avg},
        #   hepg2 minP {rep1, rep2, avg}, hepg2 sv40p {rep1, rep2, avg}.
        if self.config.num_labels == 12:
            return [
                "k562_minp_rep1",
                "k562_minp_rep2",
                "k562_minp_avg",
                "k562_sv40p_rep1",
                "k562_sv40p_rep2",
                "k562_sv40p_avg",
                "hepg2_minp_rep1",
                "hepg2_minp_rep2",
                "hepg2_minp_avg",
                "hepg2_sv40p_rep1",
                "hepg2_sv40p_rep2",
                "hepg2_sv40p_avg",
            ]
        return [f"mpra_activity_{index}" for index in range(self.config.num_labels)]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Tuple[Tensor, ...] | SequencePredictorOutput:
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        output = self.sequence_head(outputs, labels)
        logits, loss = output.logits, output.loss

        return SequencePredictorOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

MpraDragoNnModel

Bases: MpraDragoNnPreTrainedModel

Examples:

Python Console Session
>>> from multimolecule import MpraDragoNnConfig, MpraDragoNnModel, DnaTokenizer
>>> config = MpraDragoNnConfig()
>>> model = MpraDragoNnModel(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mpradragonn")
>>> input = tokenizer(["ACGT" * 36 + "A", "TGCA" * 36 + "T"], return_tensors="pt")
>>> output = model(**input)
>>> output["pooler_output"].shape
torch.Size([2, 15960])
Source code in multimolecule/models/mpradragonn/modeling_mpradragonn.py
Python
class MpraDragoNnModel(MpraDragoNnPreTrainedModel):
    """
    Examples:
        >>> from multimolecule import MpraDragoNnConfig, MpraDragoNnModel, DnaTokenizer
        >>> config = MpraDragoNnConfig()
        >>> model = MpraDragoNnModel(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/mpradragonn")
        >>> input = tokenizer(["ACGT" * 36 + "A", "TGCA" * 36 + "T"], return_tensors="pt")
        >>> output = model(**input)
        >>> output["pooler_output"].shape
        torch.Size([2, 15960])
    """

    def __init__(self, config: MpraDragoNnConfig):
        super().__init__(config)
        self.embeddings = MpraDragoNnEmbedding(config)
        self.encoder = MpraDragoNnEncoder(config)

        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> MpraDragoNnModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )
        sequence_output = self.encoder(embedding_output)

        # MPRA-DragoNN has no learned pooler: the flattened convolutional feature map is fed
        # directly into the Dense regression head. Expose it as both `last_hidden_state` and
        # `pooler_output` so the shared `SequencePredictionHead` (which reads `pooler_output`)
        # consumes the right tensor without an extra projection.
        return MpraDragoNnModelOutput(
            last_hidden_state=sequence_output,
            pooler_output=sequence_output,
        )

MpraDragoNnModelOutput dataclass

Bases: ModelOutput

Base class for outputs of the MPRA-DragoNN model.

Parameters:

  • last_hidden_state (torch.FloatTensor of shape (batch_size, pooled_length * conv_channels[-1]), default: None): Flattened feature map produced by the convolutional encoder.
  • pooler_output (torch.FloatTensor of shape (batch_size, pooled_length * conv_channels[-1]), default: None): Sequence-level representation. MPRA-DragoNN has no learned pooler, so this is the same flattened convolutional feature map as last_hidden_state; the regression head consumes it directly.
  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True, default: None): Hidden-states of the model at the output of each layer.
  • attentions (tuple(torch.FloatTensor), optional, default: None): Always None; MPRA-DragoNN is a convolutional model without attention.
Source code in multimolecule/models/mpradragonn/modeling_mpradragonn.py
Python
@dataclass
class MpraDragoNnModelOutput(ModelOutput):
    """
    Base class for outputs of the MPRA-DragoNN model.

    Args:
        last_hidden_state (`torch.FloatTensor` of shape `(batch_size, pooled_length * conv_channels[-1])`):
            Flattened feature map produced by the convolutional encoder.
        pooler_output (`torch.FloatTensor` of shape `(batch_size, pooled_length * conv_channels[-1])`):
            Sequence-level representation. MPRA-DragoNN has no learned pooler, so this is the same flattened
            convolutional feature map as `last_hidden_state`; the regression head consumes it directly.
        hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or
            when `config.output_hidden_states=True`):
            Hidden-states of the model at the output of each layer.
        attentions (`tuple(torch.FloatTensor)`, *optional*):
            Always `None`; MPRA-DragoNN is a convolutional model without attention.
    """

    last_hidden_state: torch.FloatTensor | None = None
    pooler_output: torch.FloatTensor | None = None
    hidden_states: tuple[torch.FloatTensor, ...] | None = None
    attentions: tuple[torch.FloatTensor, ...] | None = None

MpraDragoNnPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/mpradragonn/modeling_mpradragonn.py
Python
class MpraDragoNnPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = MpraDragoNnConfig
    base_model_prefix = "model"
    supports_gradient_checkpointing = True
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["MpraDragoNnBlock"]

    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, nn.Conv1d):
            init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
            if module.bias is not None:
                init.zeros_(module.bias)
        # copied from the `reset_parameters` method of `class Linear(Module)` in `torch`.
        elif isinstance(module, nn.Linear):
            init.kaiming_uniform_(module.weight, a=math.sqrt(5))
            if module.bias is not None:
                fan_in, _ = nn.init._calculate_fan_in_and_fan_out(module.weight)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                init.uniform_(module.bias, -bound, bound)
        elif isinstance(module, (nn.BatchNorm1d, nn.LayerNorm, nn.GroupNorm)):
            init.ones_(module.weight)
            init.zeros_(module.bias)
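
The Linear branch of `_init_weights` uses `kaiming_uniform_` with a = sqrt(5) for the weight and a uniform range of 1 / sqrt(fan_in) for the bias. A short calculation shows that these two ranges coincide. The sketch below assumes the regression head is a single Linear layer reading the 15960-wide flattened feature map, as described in the model details; `kaiming_uniform_bound` is an illustrative helper, not a PyTorch function.

```python
import math


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Half-width of the uniform range used by kaiming_uniform_ with leaky-ReLU slope a."""
    gain = math.sqrt(2.0 / (1.0 + a ** 2))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


fan_in = 15960  # assumed input width of the 12-output Linear layer
weight_bound = kaiming_uniform_bound(fan_in)
bias_bound = 1 / math.sqrt(fan_in)
print(round(weight_bound, 5), round(bias_bound, 5))  # 0.00792 0.00792
```

With a = sqrt(5) the gain is sqrt(1/3), so the factor sqrt(3) cancels it and both weight and bias are drawn from the same interval of half-width 1 / sqrt(fan_in).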