APARENT

Convolutional neural network for predicting human 3’UTR Alternative Polyadenylation (APA) from sequence.

Disclaimer

This is an UNOFFICIAL implementation of A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation by Nicholas Bogard, Johannes Linder et al.

The OFFICIAL repository of APARENT is at johli/aparent.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.

The team that released APARENT did not write a model card for it, so this model card was written by the MultiMolecule team.

Model Details

APARENT (APA REgression NeT) is a convolutional neural network trained on more than 3.5 million randomized 3’UTR poly-A signals expressed on mini-gene reporters in HEK293. Given a fixed-length 205 nt 3’UTR/polyA sequence, APARENT predicts the alternative-polyadenylation isoform proportion (a scalar) and a positional cleavage distribution. The model is primarily used to score the impact of genetic variants on APA regulation and to engineer new polyadenylation signals. Please refer to the Training Details section for more information on the training process.

The base, non-normalised APARENT model is recommended by the original authors for isoform and variant-effect prediction.

Model Specification

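| Num Layers | Hidden Size | Num Parameters (M) | FLOPs (G) | MACs (G) | Max Num Tokens |
| ---------- | ----------- | ------------------ | --------- | -------- | -------------- |
| 4          | 256         | 6.43               | 0.03      | 0.01     | 205            |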

Usage

The model file depends on the multimolecule library. You can install it using pip:

Bash
pip install multimolecule

Direct Use

APA Isoform Prediction

You can use this model directly to predict the APA isoform proportion of a 3’UTR/polyA sequence:

Python
>>> from multimolecule import DnaTokenizer, AparentForSequencePrediction

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
>>> model = AparentForSequencePrediction.from_pretrained("multimolecule/aparent")
>>> output = model(**tokenizer("ACGTACGTACGT", return_tensors="pt"))

>>> output.keys()
odict_keys(['logits'])

The full APARENT isoform and cleavage outputs are available on the backbone:

Python
>>> from multimolecule import DnaTokenizer, AparentModel

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
>>> model = AparentModel.from_pretrained("multimolecule/aparent")
>>> output = model(**tokenizer("ACGTACGTACGT", return_tensors="pt"))

>>> output.keys()
odict_keys(['pooler_output', 'isoform_logits', 'cleavage_logits'])

Interface

  • Input length: fixed 205 nt 3’UTR / polyA sequence
  • Output (AparentModel): isoform_logits (pre-sigmoid scalar for the APA isoform proportion) + cleavage_logits (206-dim pre-softmax logits for the positional cleavage distribution)
  • Output (AparentForSequencePrediction): APA isoform scalar only (logits); postprocess applies the sigmoid
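
As a rough illustration of how the two backbone outputs might be turned into probabilities, the sketch below applies a sigmoid to a scalar isoform logit and a softmax to a vector of cleavage logits, using plain Python only. The helper names and the example values are hypothetical; a real pipeline would operate on the tensors returned by the model.

```python
import math


def sigmoid(x: float) -> float:
    """Map a pre-sigmoid logit to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits: list[float]) -> list[float]:
    """Map pre-softmax logits to a distribution that sums to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values standing in for isoform_logits and cleavage_logits.
isoform_logit = 0.0
cleavage_logits = [0.0] * 206

isoform_proportion = sigmoid(isoform_logit)
cleavage_distribution = softmax(cleavage_logits)
```

With all-zero logits, as here, the proportion is 0.5 and the cleavage distribution is uniform over the 206 slots.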

Training Details

APARENT was trained to jointly predict the APA isoform proportion and the positional cleavage distribution of randomized 3’UTR poly-A signals.

Training Data

APARENT was trained on more than 3.5 million randomized 3’UTR poly-A signal sequences expressed on mini-gene reporters in HEK293 cells (a massively parallel reporter assay, MPRA). The raw sequencing data for the 3’UTR MPRA libraries are available at GEO accession GSE113849.

This APARENT model was trained on all MPRA libraries (no libraries held out) to produce the best general-purpose APA predictor; it differs from the per-library held-out model evaluated in the paper.

Training Procedure

Pre-training

The model was trained to minimize a combined objective: a sigmoid KL-divergence on the isoform proportion and a KL-divergence on the positional cleavage distribution, weighted equally.
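
One way to read the combined objective is sketched below in plain Python: a two-outcome KL divergence between the observed and predicted isoform proportions, plus a categorical KL divergence between the observed and predicted cleavage distributions, summed with equal weight. The function names, the `eps` clamp, and the per-example reduction are assumptions made for illustration; the upstream training code may differ in detail.

```python
import math


def bernoulli_kl(y_true: float, y_pred: float, eps: float = 1e-7) -> float:
    """KL(y_true || y_pred) for a two-outcome distribution (assumed form)."""
    y_true = min(max(y_true, eps), 1.0 - eps)
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return y_true * math.log(y_true / y_pred) + (1.0 - y_true) * math.log(
        (1.0 - y_true) / (1.0 - y_pred)
    )


def categorical_kl(p_true: list[float], p_pred: list[float], eps: float = 1e-7) -> float:
    """KL(p_true || p_pred) over cleavage positions; zero-mass terms are skipped."""
    total = 0.0
    for p, q in zip(p_true, p_pred):
        if p > 0.0:
            total += p * math.log(p / max(q, eps))
    return total


def combined_loss(iso_true, iso_pred, cut_true, cut_pred) -> float:
    """Equal-weight sum of the two terms (weights of 1.0 are an assumption)."""
    return bernoulli_kl(iso_true, iso_pred) + categorical_kl(cut_true, cut_pred)


# Hypothetical toy example with a 3-position cleavage distribution.
loss = combined_loss(0.7, 0.6, [0.2, 0.5, 0.3], [0.3, 0.4, 0.3])
```

Both terms are zero when the prediction matches the target exactly and positive otherwise.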

Citation

BibTeX
@article{bogard2019adeep,
  author    = {Bogard, Nicholas and Linder, Johannes and Rosenberg, Alexander B. and Seelig, Georg},
  title     = {A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation},
  journal   = {Cell},
  volume    = {178},
  number    = {1},
  pages     = {91--106.e23},
  year      = {2019},
  publisher = {Elsevier BV},
  doi       = {10.1016/j.cell.2019.04.046}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the APARENT paper for questions or comments on the paper/model.

License

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
SPDX-License-Identifier: AGPL-3.0-or-later

multimolecule.models.aparent

DnaTokenizer

Bases: Tokenizer

Tokenizer for DNA sequences.

Parameters:

  • alphabet (Alphabet | str | List[str] | None, default: None): alphabet to use for tokenization.
    • If it is None, the standard DNA alphabet will be used.
    • If it is a string, it should correspond to the name of a predefined alphabet. The options include standard, iupac, streamline, and nucleobase.
    • If it is an alphabet or a list of characters, that specific alphabet will be used.
  • nmers (int, default: 1): Size of kmer to tokenize.
  • codon (bool, default: False): Whether to tokenize into codons.
  • replace_U_with_T (bool, default: True): Whether to replace U with T.
  • do_upper_case (bool, default: True): Whether to convert input to uppercase.

Examples:

Python Console Session
>>> from multimolecule import DnaTokenizer
>>> tokenizer = DnaTokenizer()
>>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
>>> tokenizer('acgt')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer = DnaTokenizer(replace_U_with_T=False)
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 3, 2]
>>> tokenizer = DnaTokenizer(nmers=3)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 21, 81, 6, 8, 19, 71, 2]
>>> tokenizer = DnaTokenizer(codon=True)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 6, 71, 2]
>>> tokenizer('tataaagtaa')["input_ids"]
Traceback (most recent call last):
ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
Source code in multimolecule/tokenisers/dna/tokenization_dna.py
Python
class DnaTokenizer(Tokenizer):
    """
    Tokenizer for DNA sequences.

    Args:
        alphabet: alphabet to use for tokenization.

            - If it is `None`, the standard DNA alphabet will be used.
            - If it is a `string`, it should correspond to the name of a predefined alphabet. The options include
                + `standard`
                + `iupac`
                + `streamline`
                + `nucleobase`
            - If it is an alphabet or a list of characters, that specific alphabet will be used.
        nmers: Size of kmer to tokenize.
        codon: Whether to tokenize into codons.
        replace_U_with_T: Whether to replace U with T.
        do_upper_case: Whether to convert input to uppercase.

    Examples:
        >>> from multimolecule import DnaTokenizer
        >>> tokenizer = DnaTokenizer()
        >>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
        [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
        >>> tokenizer('acgt')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer = DnaTokenizer(replace_U_with_T=False)
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 3, 2]
        >>> tokenizer = DnaTokenizer(nmers=3)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 21, 81, 6, 8, 19, 71, 2]
        >>> tokenizer = DnaTokenizer(codon=True)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 6, 71, 2]
        >>> tokenizer('tataaagtaa')["input_ids"]
        Traceback (most recent call last):
        ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
    """

    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        alphabet: Alphabet | str | List[str] | None = None,
        nmers: int = 1,
        codon: bool = False,
        replace_U_with_T: bool = True,
        do_upper_case: bool = True,
        additional_special_tokens: List | Tuple | None = None,
        **kwargs,
    ):
        if codon and (nmers > 1 and nmers != 3):
            raise ValueError("Codon and nmers cannot be used together.")
        if codon:
            nmers = 3  # set to 3 to get correct vocab
        if not isinstance(alphabet, Alphabet):
            alphabet = get_alphabet(alphabet, nmers=nmers)
        super().__init__(
            alphabet=alphabet,
            nmers=nmers,
            codon=codon,
            replace_U_with_T=replace_U_with_T,
            do_upper_case=do_upper_case,
            additional_special_tokens=additional_special_tokens,
            **kwargs,
        )
        self.replace_U_with_T = replace_U_with_T
        self.nmers = nmers
        self.codon = codon

    def _tokenize(self, text: str, **kwargs):
        if self.do_upper_case:
            text = text.upper()
        if self.replace_U_with_T:
            text = text.replace("U", "T")
        if self.codon:
            if len(text) % 3 != 0:
                raise ValueError(
                    f"length of input sequence must be a multiple of 3 for codon tokenization, but got {len(text)}"
                )
            return [text[i : i + 3] for i in range(0, len(text), 3)]
        if self.nmers > 1:
            return [text[i : i + self.nmers] for i in range(len(text) - self.nmers + 1)]  # noqa: E203
        return list(text)
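
The difference between `nmers=3` and `codon=True` in `_tokenize` can be seen with a small standalone sketch. The two helper functions below are hypothetical re-statements of the slicing logic above, not part of the library; they only produce token strings and do not map them to ids.

```python
def overlapping_kmers(sequence: str, k: int) -> list[str]:
    """Sliding window with step 1, as in the `nmers > 1` branch."""
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


def codons(sequence: str) -> list[str]:
    """Non-overlapping triplets, as in the `codon` branch."""
    if len(sequence) % 3 != 0:
        raise ValueError(f"length must be a multiple of 3, but got {len(sequence)}")
    return [sequence[i : i + 3] for i in range(0, len(sequence), 3)]


sequence = "tataaagta".upper()
kmer_tokens = overlapping_kmers(sequence, 3)
codon_tokens = codons(sequence)

print(kmer_tokens)   # ['TAT', 'ATA', 'TAA', 'AAA', 'AAG', 'AGT', 'GTA']
print(codon_tokens)  # ['TAT', 'AAA', 'GTA']
```

The codon tokens are the 1st, 4th, and 7th overlapping 3-mers, which is consistent with the ids 84, 6, and 71 appearing in both doctest outputs above.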

AparentConfig

Bases: PreTrainedConfig

This is the configuration class to store the configuration of an AparentModel. It is used to instantiate an APARENT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the APARENT johli/aparent architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Parameters:

  • vocab_size (int, default: 5): Vocabulary size of the APARENT model. Defines the number of input channels of the first convolution. Defaults to 5 (A, C, G, T, N), matching the MultiMolecule DNA streamline alphabet. The upstream checkpoint only uses the first four (A, C, G, T); the N channel stays zero.
  • sequence_length (int, default: 205): The fixed 3’UTR/polyA input sequence length APARENT was trained on (205 nt).
  • conv1_filters (int, default: 96): Number of filters in the first convolution.
  • conv1_kernel_size (int, default: 8): Kernel size (sequence span) of the first convolution. The first convolution also spans the full nucleotide dimension.
  • pool_size (int, default: 2): Pooling window of the max-pooling layer after the first convolution.
  • conv2_filters (int, default: 128): Number of filters in the second convolution.
  • conv2_kernel_size (int, default: 6): Kernel size of the second convolution.
  • hidden_sizes (list[int] | None, default: None): Sizes of the two fully connected layers after the convolutional stack. The second value is the size of the shared sequence representation exposed as pooler_output. When None, [512, 256] is used.
  • dropouts (list[float] | None, default: None): Dropout probabilities applied after each fully connected layer. When None, [0.1, 0.1] is used.
  • hidden_act (str, default: 'relu'): The non-linear activation function used by the convolutional and dense layers.
  • num_isoform_labels (int, default: 1): Dimension of the upstream isoform-proportion output (sigmoid). APARENT predicts a single scalar.
  • num_cleavage_labels (int, default: 206): Dimension of the upstream positional cleavage-distribution output (softmax). APARENT predicts 206 positions (205 sequence positions + 1 distal/library bias slot).
  • library_size (int, default: 13): Size of the upstream one-hot library-identity input concatenated before the output layers. The MultiMolecule API keeps this as a non-persistent zero feature, matching the upstream default encoder.
  • head (HeadConfig | None, default: None): The configuration of the sequence-level prediction head. Defaults to a regression head (problem_type="regression"), matching APARENT’s APA isoform prediction task.

Examples:

Python Console Session
>>> from multimolecule import AparentConfig, AparentModel
>>> # Initializing an APARENT johli/aparent style configuration
>>> configuration = AparentConfig()
>>> # Initializing a model (with random weights) from the configuration
>>> model = AparentModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
Source code in multimolecule/models/aparent/configuration_aparent.py
Python
class AparentConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of an
    [`AparentModel`][multimolecule.models.AparentModel]. It is used to instantiate an APARENT model according to the
    specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a
    similar configuration to that of the APARENT [johli/aparent](https://github.com/johli/aparent) architecture.

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    Args:
        vocab_size:
            Vocabulary size of the APARENT model. Defines the number of input channels of the first convolution.
            Defaults to 5 (`A`, `C`, `G`, `T`, `N`), matching the MultiMolecule DNA `streamline` alphabet. The
            upstream checkpoint only uses the first four (`A`, `C`, `G`, `T`); the `N` channel stays zero.
        sequence_length:
            The fixed 3'UTR/polyA input sequence length APARENT was trained on (205 nt).
        conv1_filters:
            Number of filters in the first convolution.
        conv1_kernel_size:
            Kernel size (sequence span) of the first convolution. The first convolution also spans the full
            nucleotide dimension.
        pool_size:
            Pooling window of the max-pooling layer after the first convolution.
        conv2_filters:
            Number of filters in the second convolution.
        conv2_kernel_size:
            Kernel size of the second convolution.
        hidden_sizes:
            Sizes of the two fully connected layers after the convolutional stack. The second value is the size of
            the shared sequence representation exposed as `pooler_output`.
        dropouts:
            Dropout probabilities applied after each fully connected layer.
        hidden_act:
            The non-linear activation function used by the convolutional and dense layers.
        num_isoform_labels:
            Dimension of the upstream isoform-proportion output (sigmoid). APARENT predicts a single scalar.
        num_cleavage_labels:
            Dimension of the upstream positional cleavage-distribution output (softmax). APARENT predicts 206
            positions (205 sequence positions + 1 distal/library bias slot).
        library_size:
            Size of the upstream one-hot library-identity input concatenated before the output layers. The
            MultiMolecule API keeps this as a non-persistent zero feature, matching the upstream default encoder.
        head:
            The configuration of the sequence-level prediction head. Defaults to a regression head
            (`problem_type="regression"`), matching APARENT's APA isoform prediction task.

    Examples:
        >>> from multimolecule import AparentConfig, AparentModel
        >>> # Initializing an APARENT johli/aparent style configuration
        >>> configuration = AparentConfig()
        >>> # Initializing a model (with random weights) from the configuration
        >>> model = AparentModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "aparent"

    def __init__(
        self,
        vocab_size: int = 5,
        sequence_length: int = 205,
        conv1_filters: int = 96,
        conv1_kernel_size: int = 8,
        pool_size: int = 2,
        conv2_filters: int = 128,
        conv2_kernel_size: int = 6,
        hidden_sizes: list[int] | None = None,
        dropouts: list[float] | None = None,
        hidden_act: str = "relu",
        num_isoform_labels: int = 1,
        num_cleavage_labels: int = 206,
        library_size: int = 13,
        num_labels: int = 1,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if hidden_sizes is None:
            hidden_sizes = [512, 256]
        if dropouts is None:
            dropouts = [0.1, 0.1]
        if len(hidden_sizes) != 2 or len(dropouts) != 2:
            raise ValueError(
                f"APARENT expects exactly two dense layers; got hidden_sizes={hidden_sizes}, dropouts={dropouts}."
            )
        self.vocab_size = vocab_size
        self.sequence_length = sequence_length
        self.conv1_filters = conv1_filters
        self.conv1_kernel_size = conv1_kernel_size
        self.pool_size = pool_size
        self.conv2_filters = conv2_filters
        self.conv2_kernel_size = conv2_kernel_size
        self.hidden_sizes = hidden_sizes
        self.dropouts = dropouts
        self.hidden_act = hidden_act
        self.num_isoform_labels = num_isoform_labels
        self.num_cleavage_labels = num_cleavage_labels
        self.library_size = library_size
        # ``hidden_size`` is the dimensionality of the shared dense representation
        # consumed by the MultiMolecule sequence-prediction head.
        self.hidden_size = hidden_sizes[-1]
        if head is None:
            head = HeadConfig(problem_type="regression")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "regression"
        self.head = head
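
To see how the default hyperparameters relate to the model size, the following back-of-the-envelope sketch counts parameters in plain Python. It assumes unpadded, stride-1 convolutions, non-overlapping max pooling, and a bias term on every layer; none of these are stated in the configuration itself, so treat the layer arithmetic as an assumption rather than a description of the actual modules.

```python
# Default hyperparameters copied from AparentConfig.
vocab_size, sequence_length = 5, 205
conv1_filters, conv1_kernel_size, pool_size = 96, 8, 2
conv2_filters, conv2_kernel_size = 128, 6
hidden_sizes = [512, 256]
library_size, num_isoform_labels, num_cleavage_labels = 13, 1, 206


def conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 1D convolution."""
    return in_channels * out_channels * kernel_size + out_channels


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Assumed: no padding, stride 1, non-overlapping max pooling.
length = sequence_length - conv1_kernel_size + 1  # 198
length = length // pool_size                      # 99
length = length - conv2_kernel_size + 1           # 94
flat_features = length * conv2_filters            # 12032

decoder_in = hidden_sizes[-1] + library_size      # 256 + 13
total = (
    conv_params(vocab_size, conv1_filters, conv1_kernel_size)
    + conv_params(conv1_filters, conv2_filters, conv2_kernel_size)
    + linear_params(flat_features, hidden_sizes[0])
    + linear_params(hidden_sizes[0], hidden_sizes[1])
    + linear_params(decoder_in, num_isoform_labels)
    + linear_params(decoder_in, num_cleavage_labels)
)
print(f"{total / 1e6:.2f} M parameters")  # 6.43 M parameters
```

Under these assumptions the total rounds to 6.43 M, in line with the Model Specification table, and almost all of it sits in the first fully connected layer.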

AparentForSequencePrediction

Bases: AparentPreTrainedModel

APARENT model with a sequence-level prediction head.

APARENT’s primary sequence-level output is the alternative-polyadenylation isoform score. This wrapper exposes the converted upstream isoform decoder directly. The upstream positional cleavage distribution is intentionally not exposed by this head; it remains available on [AparentModel] as cleavage_logits.

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import AparentConfig, AparentForSequencePrediction, DnaTokenizer
>>> config = AparentConfig()
>>> model = AparentForSequencePrediction(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
>>> input = tokenizer("ACGTNACGTN", return_tensors="pt")
>>> output = model(**input, labels=torch.tensor([[1.0]]))
>>> output["logits"].shape
torch.Size([1, 1])
>>> output["loss"]
tensor(..., grad_fn=<MseLossBackward0>)
Source code in multimolecule/models/aparent/modeling_aparent.py
Python
class AparentForSequencePrediction(AparentPreTrainedModel):
    """
    APARENT model with a sequence-level prediction head.

    APARENT's primary sequence-level output is the alternative-polyadenylation isoform score. This wrapper exposes the
    converted upstream isoform decoder directly. The upstream positional cleavage distribution is intentionally not
    exposed by this head; it remains available on [`AparentModel`] as `cleavage_logits`.

    Examples:
        >>> import torch
        >>> from multimolecule import AparentConfig, AparentForSequencePrediction, DnaTokenizer
        >>> config = AparentConfig()
        >>> model = AparentForSequencePrediction(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
        >>> input = tokenizer("ACGTNACGTN", return_tensors="pt")
        >>> output = model(**input, labels=torch.tensor([[1.0]]))
        >>> output["logits"].shape
        torch.Size([1, 1])
        >>> output["loss"]  # doctest:+ELLIPSIS
        tensor(..., grad_fn=<MseLossBackward0>)
    """

    def __init__(self, config: AparentConfig):
        super().__init__(config)
        self.model = AparentModel(config)
        head_config = HeadConfig(config.head) if config.head is not None else HeadConfig()
        if head_config.num_labels is None:
            head_config.num_labels = config.num_isoform_labels
        if head_config.problem_type is None:
            head_config.problem_type = "regression"
        self.head_config = head_config
        self.criterion = Criterion(head_config)
        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        if self.config.num_isoform_labels != 1:
            return [f"isoform_proportion_{index}" for index in range(self.config.num_isoform_labels)]
        return ["isoform_proportion"]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Tuple[Tensor, ...] | SequencePredictorOutput:
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        logits = outputs.isoform_logits
        loss = self.criterion(logits, labels) if labels is not None else None

        return SequencePredictorOutput(loss=loss, logits=logits)

    def postprocess(self, outputs: SequencePredictorOutput | ModelOutput) -> Tensor:
        return torch.sigmoid(outputs["logits"])
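
Because `postprocess` applies a sigmoid, the raw `logits` can be read as the log-odds of the predicted isoform proportion. The sketch below, in plain Python with made-up logit values, checks that the difference between two logits equals the log odds ratio of the corresponding proportions, which is one possible way to summarize how a variant shifts the prediction relative to a reference sequence. The variable names and numbers are hypothetical and are not model outputs.

```python
import math


def sigmoid(x: float) -> float:
    """Same mapping that postprocess applies to the logits."""
    return 1.0 / (1.0 + math.exp(-x))


def log_odds(p: float) -> float:
    """Inverse of the sigmoid for p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Hypothetical logits for a reference and a variant sequence.
ref_logit = -0.4
var_logit = 0.9

ref_proportion = sigmoid(ref_logit)
var_proportion = sigmoid(var_logit)

# Log odds ratio computed from proportions vs. directly from logits.
log_odds_ratio = log_odds(var_proportion) - log_odds(ref_proportion)
logit_difference = var_logit - ref_logit
```

Working with the logit difference avoids a round trip through the sigmoid and stays numerically stable when proportions are close to 0 or 1.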

AparentModel

Bases: AparentPreTrainedModel

The bare APARENT model outputting the shared sequence representation together with the upstream isoform and cleavage predictions.

Examples:

Python Console Session
>>> from multimolecule import AparentConfig, AparentModel, DnaTokenizer
>>> config = AparentConfig()
>>> model = AparentModel(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
>>> input = tokenizer("ACGTNACGTN", return_tensors="pt")
>>> output = model(**input)
>>> output["pooler_output"].shape
torch.Size([1, 256])
>>> output["isoform_logits"].shape
torch.Size([1, 1])
>>> output["cleavage_logits"].shape
torch.Size([1, 206])
Source code in multimolecule/models/aparent/modeling_aparent.py
Python
class AparentModel(AparentPreTrainedModel):
    """
    The bare APARENT model outputting the shared sequence representation together with the upstream isoform and
    cleavage predictions.

    Examples:
        >>> from multimolecule import AparentConfig, AparentModel, DnaTokenizer
        >>> config = AparentConfig()
        >>> model = AparentModel(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent")
        >>> input = tokenizer("ACGTNACGTN", return_tensors="pt")
        >>> output = model(**input)
        >>> output["pooler_output"].shape
        torch.Size([1, 256])
        >>> output["isoform_logits"].shape
        torch.Size([1, 1])
        >>> output["cleavage_logits"].shape
        torch.Size([1, 206])
    """

    def __init__(self, config: AparentConfig):
        super().__init__(config)
        self.embeddings = AparentEmbedding(config)
        self.encoder = AparentEncoder(config)
        self.isoform_decoder = nn.Linear(config.hidden_sizes[-1] + config.library_size, config.num_isoform_labels)
        self.cleavage_decoder = nn.Linear(config.hidden_sizes[-1] + config.library_size, config.num_cleavage_labels)
        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> AparentModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )

        pooled_output = self.encoder(embedding_output)

        batch_size = pooled_output.size(0)
        # The upstream default encoder feeds an all-zero library one-hot before the output
        # layers. It is a deterministic constant rebuilt here rather than stored in the
        # checkpoint.
        library = torch.zeros(
            batch_size,
            self.config.library_size,
            device=pooled_output.device,
            dtype=pooled_output.dtype,
        )
        shared = torch.cat([pooled_output, library], dim=-1)
        isoform_logits = self.isoform_decoder(shared)
        cleavage_logits = self.cleavage_decoder(shared)

        return AparentModelOutput(
            pooler_output=pooled_output,
            isoform_logits=isoform_logits,
            cleavage_logits=cleavage_logits,
        )
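
The effect of concatenating an all-zero library vector before the decoders can be checked with a small plain-Python sketch. The dimensions are shrunk (3 pooled features and 2 library slots instead of 256 and 13) and the weights are invented for illustration; `linear` is a hypothetical stand-in for `nn.Linear`.

```python
def linear(weights: list[list[float]], bias: list[float], features: list[float]) -> list[float]:
    """Minimal dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


pooled = [0.5, -1.0, 2.0]          # stand-in for pooler_output
library = [0.0, 0.0]               # all-zero library one-hot
shared = pooled + library          # list concatenation, like torch.cat on the last dim

# Invented decoder weights: the last two columns belong to the library slots.
weights = [[0.1, 0.2, 0.3, 9.0, -7.0]]
bias = [0.05]

with_library = linear(weights, bias, shared)
# Dropping the library columns gives the same result, since they multiply zeros.
without_library = linear([row[: len(pooled)] for row in weights], bias, pooled)
```

The library columns of the decoder weights therefore have no influence on the output in this setting; they are kept so that the converted checkpoint retains the upstream layer shapes.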

AparentModelOutput dataclass

Bases: ModelOutput

Base class for outputs of the APARENT model.

Parameters:

  • pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`, default: None): The shared sequence representation after the two fully connected layers. Consumed by the MultiMolecule sequence-prediction head.
  • isoform_logits (`torch.FloatTensor` of shape `(batch_size, num_isoform_labels)`, default: None): Pre-sigmoid logits of the upstream alternative-polyadenylation isoform-proportion output.
  • cleavage_logits (`torch.FloatTensor` of shape `(batch_size, num_cleavage_labels)`, default: None): Pre-softmax logits of the upstream positional cleavage distribution.
Source code in multimolecule/models/aparent/modeling_aparent.py
Python
@dataclass
class AparentModelOutput(ModelOutput):
    """
    Base class for outputs of the APARENT model.

    Args:
        pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`):
            The shared sequence representation after the two fully connected layers. Consumed by the MultiMolecule
            sequence-prediction head.
        isoform_logits (`torch.FloatTensor` of shape `(batch_size, num_isoform_labels)`):
            Pre-sigmoid logits of the upstream alternative-polyadenylation isoform-proportion output.
        cleavage_logits (`torch.FloatTensor` of shape `(batch_size, num_cleavage_labels)`):
            Pre-softmax logits of the upstream positional cleavage distribution.
    """

    pooler_output: torch.FloatTensor | None = None
    isoform_logits: torch.FloatTensor | None = None
    cleavage_logits: torch.FloatTensor | None = None

AparentPreTrainedModel

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/aparent/modeling_aparent.py
Python
class AparentPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = AparentConfig
    base_model_prefix = "model"
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["AparentEncoder"]

    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            init.kaiming_uniform_(module.weight, a=math.sqrt(5))
            if module.bias is not None:
                fan_in, _ = nn.init._calculate_fan_in_and_fan_out(module.weight)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                init.uniform_(module.bias, -bound, bound)
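
A short calculation shows what the `a=math.sqrt(5)` choice in `_init_weights` implies. The sketch below reproduces the Kaiming-uniform bound formula in plain Python, assuming the default leaky-ReLU gain and fan-in mode used by PyTorch's `kaiming_uniform_`; the helper name and the `fan_in` value are hypothetical.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    """Bound of U(-bound, bound) for Kaiming uniform with leaky-ReLU gain."""
    gain = math.sqrt(2.0 / (1.0 + a**2))  # leaky-ReLU gain with negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


fan_in = 512  # e.g. a layer with 512 input features
weight_bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))
bias_bound = 1 / math.sqrt(fan_in)  # same expression as in _init_weights
```

With `a = sqrt(5)` the gain is `sqrt(1/3)`, so the weight bound simplifies to `1 / sqrt(fan_in)`: weights and biases are drawn from the same uniform range.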