APARENT2¶

Deep residual neural network for predicting human 3’ UTR Alternative Polyadenylation (APA) and cleavage magnitude at base-pair resolution, and for deciphering the impact of genetic variants on polyadenylation.

Disclaimer¶

This is an UNOFFICIAL implementation of Deciphering the impact of genetic variation on human polyadenylation using APARENT2 by Johannes Linder, Samantha E. Koplik et al.

The OFFICIAL repository of APARENT2 is at johli/aparent-resnet.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.

This model card was written by the MultiMolecule team, not by the team that released APARENT2.

Model Details¶

APARENT2 is a residual convolutional neural network (a ResNet successor to the original APARENT) trained on a 3’ UTR massively parallel reporter assay (MPRA). Given a fixed 205bp polyadenylation signal (PAS) sequence, it predicts a base-pair-resolution cleavage probability distribution as well as the overall isoform abundance. It is primarily used to score the effect of genetic variants on polyadenylation by comparing the predictions for a reference and an alternate sequence.

Model Specification¶

Num Layers	Hidden Size	Num Parameters (M)	FLOPs (G)	MACs (G)	Max Num Tokens
28	32	0.19	0.08	0.04	205

Links¶

Code: multimolecule.aparent2
Weights: multimolecule/aparent2
Data: Massively-parallel polyadenylation MPRA with variant-effect evaluation data
Paper: Deciphering the impact of genetic variation on human polyadenylation using APARENT2
Developed by: Johannes Linder, Samantha E. Koplik, Anshul Kundaje, Georg Seelig
Model type: 1D residual CNN successor to APARENT for polyadenylation isoform, cleavage, and variant-effect prediction
Original Repository: johli/aparent-resnet

Usage¶

The model file depends on the multimolecule library. You can install it using pip:

Bash
`pip install multimolecule`

Direct Use¶

Polyadenylation Cleavage Prediction¶

You can use this model directly to predict the cleavage distribution of a 205bp polyadenylation signal sequence whose core hexamer starts at position 70 (0-indexed):

Python
>>> import torch
>>> from multimolecule import DnaTokenizer, Aparent2Model

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent2")
>>> model = Aparent2Model.from_pretrained("multimolecule/aparent2")
>>> sequence = "A" * 70 + "AATAAA" + "A" * 129
>>> output = model(**tokenizer(sequence, return_tensors="pt"))

>>> output.logits.shape
torch.Size([1, 206])

Variant Effect Scoring¶

Score a reference and an alternate sequence separately, then compare:

Python
>>> import torch
>>> ref = "A" * 70 + "AATAAA" + "A" * 129
>>> alt = "A" * 70 + "AATACA" + "A" * 129
>>> ref_prob = torch.softmax(model(**tokenizer(ref, return_tensors="pt")).logits, dim=-1)
>>> alt_prob = torch.softmax(model(**tokenizer(alt, return_tensors="pt")).logits, dim=-1)
>>> ref_iso = ref_prob[:, 77:127].sum(dim=-1)
>>> alt_iso = alt_prob[:, 77:127].sum(dim=-1)
>>> delta_logodds = torch.log(alt_iso / (1 - alt_iso)) - torch.log(ref_iso / (1 - ref_iso))

Interface¶

Input length: fixed 205 bp window
Hexamer position: core hexamer (e.g., AATAAA) at position 70 (0-indexed) of the 205 bp window
Output: 206-dim cleavage distribution (one score per input position + trailing “no cleavage in window” bucket)
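
The interface constraints above can be met with a small helper that centres a hexamer at index 70 of a 205 bp window. This is a minimal sketch, not part of MultiMolecule: `build_window` is a hypothetical name, and padding short flanks with `N` is an assumption (the upstream workflow may pad differently or require real genomic context).

```python
HEXAMER_START = 70   # 0-indexed start of the core hexamer, per the interface above
WINDOW_LENGTH = 205  # fixed input length, per the interface above


def build_window(upstream: str, hexamer: str, downstream: str, pad: str = "N") -> str:
    """Place `hexamer` at index 70 of a 205 bp window (hypothetical helper)."""
    if len(hexamer) != 6:
        raise ValueError(f"expected a 6-nt hexamer, got {len(hexamer)} nt")
    downstream_length = WINDOW_LENGTH - HEXAMER_START - len(hexamer)  # 129
    # Keep the bases closest to the hexamer; pad the outer ends if a flank is short.
    left = upstream[-HEXAMER_START:].rjust(HEXAMER_START, pad)
    right = downstream[:downstream_length].ljust(downstream_length, pad)
    return (left + hexamer + right).upper()


window = build_window("ACGT" * 30, "AATAAA", "TGCA" * 10)
print(len(window), window[70:76])
```

The resulting string can be passed to the tokenizer exactly like the `sequence` variable in the usage example.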

Variant Effect¶

Score reference and alternate sequences separately and compare their cleavage / isoform predictions
There is no separate ref/alt output dataclass
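
As an illustration of the comparison step, the sketch below computes an isoform log-odds ratio from two already-normalized 206-dim cleavage distributions using plain Python. The `77:127` window is taken from the usage example on this page; the function name and the `eps` clamp (which avoids `log(0)`) are assumptions added for the sketch.

```python
import math


def isoform_logodds_ratio(ref_prob, alt_prob, start=77, end=127, eps=1e-7):
    """Log-odds of cleavage inside [start, end) for alt minus the same for ref."""

    def logit(distribution):
        isoform = sum(distribution[start:end])
        # Clamp so a window with all or none of the mass does not produce log(0).
        isoform = min(max(isoform, eps), 1.0 - eps)
        return math.log(isoform / (1.0 - isoform))

    return logit(alt_prob) - logit(ref_prob)


# Toy distributions: mass inside the window plus the trailing "no cleavage" bucket.
ref = [0.0] * 206
alt = [0.0] * 206
for position in range(77, 127):
    ref[position] = 0.8 / 50
    alt[position] = 0.5 / 50
ref[205], alt[205] = 0.2, 0.5

delta = isoform_logodds_ratio(ref, alt)
print(round(delta, 3))
```

A negative value means the alternate allele is predicted to reduce usage of the polyadenylation site relative to the reference.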

Training Details¶

APARENT2 was trained to predict base-pair-resolution cleavage and isoform abundance from 3’ UTR MPRA measurements.

Training Data¶

The model was trained on the 3’ UTR MPRA library used by the original APARENT, re-processed with additional improvements (exact cleavage positions for the Alien1 Random sublibrary and a 20 nt random barcode upstream of the USE in the Alien1 sublibrary). The measured variant data and the processed data repository are available in the original APARENT GitHub repository.

Training Procedure¶

Pre-training¶

The model minimizes a combination of a sigmoid KL-divergence isoform loss and a KL-divergence cleavage loss, weighted equally. The released inference model corresponds to the residual-network model trained for 5 epochs on all sublibraries (excluding ClinVar wild-type sequences), with dropout disabled for inference.
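
To make the objective concrete, here is a rough sketch of an equally weighted sum of two KL terms. It is not the upstream training code: reading the "sigmoid KL-divergence" as a two-outcome (Bernoulli) KL on the isoform proportion is an assumption, as are the function names, the `eps` clamp, and the absence of any batch reduction.

```python
import math


def kl_divergence(target, predicted, eps=1e-7):
    """KL(target || predicted) for two discrete distributions of equal length."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, predicted) if t > 0)


def bernoulli_kl(target, predicted, eps=1e-7):
    """KL between two isoform proportions, treated as two-outcome distributions."""
    predicted = min(max(predicted, eps), 1.0 - eps)
    return kl_divergence([target, 1.0 - target], [predicted, 1.0 - predicted])


def combined_loss(true_cleavage, pred_cleavage, true_isoform, pred_isoform):
    """Isoform term plus cleavage term with equal weights (assumed form)."""
    return bernoulli_kl(true_isoform, pred_isoform) + kl_divergence(true_cleavage, pred_cleavage)


true_cleavage = [0.5, 0.5, 0.0]
pred_cleavage = [0.25, 0.5, 0.25]
loss = combined_loss(true_cleavage, pred_cleavage, true_isoform=0.6, pred_isoform=0.4)
print(loss > 0)
```

Both terms are zero when the predictions match the measurements exactly and positive otherwise.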

Citation¶

BibTeX
@article{linder2022deciphering,
  author    = {Linder, Johannes and Koplik, Samantha E. and Kundaje, Anshul and Seelig, Georg},
  title     = {Deciphering the impact of genetic variation on human polyadenylation using APARENT2},
  journal   = {Genome Biology},
  volume    = {23},
  number    = {1},
  pages     = {232},
  year      = {2022},
  doi       = {10.1186/s13059-022-02799-4},
  publisher = {Springer Science and Business Media LLC}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact¶

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the APARENT2 paper for questions or comments on the paper/model.

License¶

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
`SPDX-License-Identifier: AGPL-3.0-or-later`

multimolecule.models.aparent2 ¶

DnaTokenizer ¶

Bases: Tokenizer

Tokenizer for DNA sequences.

Parameters:

Name	Type	Description	Default
`alphabet` ¶	`Alphabet \| str \| List[str] \| None`	Alphabet to use for tokenization. If `None`, the standard DNA alphabet is used. If a string, it should be the name of a predefined alphabet: `standard`, `iupac`, `streamline`, or `nucleobase`. If an `Alphabet` or a list of characters, that specific alphabet is used.	`None`
`nmers` ¶	`int`	Size of kmer to tokenize.	`1`
`codon` ¶	`bool`	Whether to tokenize into codons.	`False`
`replace_U_with_T` ¶	`bool`	Whether to replace U with T.	`True`
`do_upper_case` ¶	`bool`	Whether to convert input to uppercase.	`True`

Examples:

Python Console Session
>>> from multimolecule import DnaTokenizer
>>> tokenizer = DnaTokenizer()
>>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
>>> tokenizer('acgt')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 9, 2]
>>> tokenizer = DnaTokenizer(replace_U_with_T=False)
>>> tokenizer('acgu')["input_ids"]
[1, 6, 7, 8, 3, 2]
>>> tokenizer = DnaTokenizer(nmers=3)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 21, 81, 6, 8, 19, 71, 2]
>>> tokenizer = DnaTokenizer(codon=True)
>>> tokenizer('tataaagta')["input_ids"]
[1, 84, 6, 71, 2]
>>> tokenizer('tataaagtaa')["input_ids"]
Traceback (most recent call last):
ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
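
The difference between `nmers=3` and `codon=True` in the session above comes down to overlapping versus non-overlapping splitting. The sketch below restates only that splitting step in plain Python; it omits the vocabulary lookup and the special tokens that the real tokenizer adds, and the function names are illustrative.

```python
def kmer_tokens(sequence: str, k: int) -> list[str]:
    """Overlapping k-mers with stride 1, as with `nmers=k`."""
    sequence = sequence.upper().replace("U", "T")
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


def codon_tokens(sequence: str) -> list[str]:
    """Non-overlapping triplets, as with `codon=True`."""
    sequence = sequence.upper().replace("U", "T")
    if len(sequence) % 3 != 0:
        raise ValueError(f"length must be a multiple of 3, but got {len(sequence)}")
    return [sequence[i : i + 3] for i in range(0, len(sequence), 3)]


print(kmer_tokens("tataaagta", 3))
print(codon_tokens("tataaagta"))
```

The nine-base input yields seven overlapping 3-mers but only three codons, which matches the number of non-special ids in the two outputs shown above.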

Source code in multimolecule/tokenisers/dna/tokenization_dna.py

Python
class DnaTokenizer(Tokenizer):
    """
    Tokenizer for DNA sequences.

    Args:
        alphabet: alphabet to use for tokenization.

            - If is `None`, the standard DNA alphabet will be used.
            - If is a `string`, it should correspond to the name of a predefined alphabet. The options include
                + `standard`
                + `iupac`
                + `streamline`
                + `nucleobase`
            - If is an alphabet or a list of characters, that specific alphabet will be used.
        nmers: Size of kmer to tokenize.
        codon: Whether to tokenize into codons.
        replace_U_with_T: Whether to replace U with T.
        do_upper_case: Whether to convert input to uppercase.

    Examples:
        >>> from multimolecule import DnaTokenizer
        >>> tokenizer = DnaTokenizer()
        >>> tokenizer('<pad><cls><eos><unk><mask><null>ACGTNRYSWKMBDHVX|.*-?')["input_ids"]
        [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2]
        >>> tokenizer('acgt')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 9, 2]
        >>> tokenizer = DnaTokenizer(replace_U_with_T=False)
        >>> tokenizer('acgu')["input_ids"]
        [1, 6, 7, 8, 3, 2]
        >>> tokenizer = DnaTokenizer(nmers=3)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 21, 81, 6, 8, 19, 71, 2]
        >>> tokenizer = DnaTokenizer(codon=True)
        >>> tokenizer('tataaagta')["input_ids"]
        [1, 84, 6, 71, 2]
        >>> tokenizer('tataaagtaa')["input_ids"]
        Traceback (most recent call last):
        ValueError: length of input sequence must be a multiple of 3 for codon tokenization, but got 10
    """

    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        alphabet: Alphabet | str | List[str] | None = None,
        nmers: int = 1,
        codon: bool = False,
        replace_U_with_T: bool = True,
        do_upper_case: bool = True,
        additional_special_tokens: List | Tuple | None = None,
        **kwargs,
    ):
        if codon and (nmers > 1 and nmers != 3):
            raise ValueError("Codon and nmers cannot be used together.")
        if codon:
            nmers = 3  # set to 3 to get correct vocab
        if not isinstance(alphabet, Alphabet):
            alphabet = get_alphabet(alphabet, nmers=nmers)
        super().__init__(
            alphabet=alphabet,
            nmers=nmers,
            codon=codon,
            replace_U_with_T=replace_U_with_T,
            do_upper_case=do_upper_case,
            additional_special_tokens=additional_special_tokens,
            **kwargs,
        )
        self.replace_U_with_T = replace_U_with_T
        self.nmers = nmers
        self.codon = codon

    def _tokenize(self, text: str, **kwargs):
        if self.do_upper_case:
            text = text.upper()
        if self.replace_U_with_T:
            text = text.replace("U", "T")
        if self.codon:
            if len(text) % 3 != 0:
                raise ValueError(
                    f"length of input sequence must be a multiple of 3 for codon tokenization, but got {len(text)}"
                )
            return [text[i : i + 3] for i in range(0, len(text), 3)]
        if self.nmers > 1:
            return [text[i : i + self.nmers] for i in range(len(text) - self.nmers + 1)]  # noqa: E203
        return list(text)

Aparent2Config ¶

Bases: PreTrainedConfig

This is the configuration class to store the configuration of an Aparent2Model. It is used to instantiate an APARENT2 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the APARENT2 johli/aparent-resnet architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

APARENT2 is a residual convolutional network that predicts human 3’ UTR Alternative Polyadenylation (APA) and cleavage magnitude at base-pair resolution. The network is fully convolutional plus a position-wise locally-connected library-bias layer; it does not contain any flatten/dense layers.

Parameters:

Name	Type	Description	Default
`vocab_size` ¶	`int`	Vocabulary size of the APARENT2 model. Defines the number of one-hot input channels derived from `input_ids`. Defaults to 5 (the MultiMolecule streamline DNA alphabet `ACGTN`). The converted first projection represents `N` as the upstream 0.25 mixture across A/C/G/T.	`5`
`sequence_length` ¶	`int`	The fixed length of the polyadenylation signal sequence the model was trained on. APARENT2 expects a 205bp window with the core hexamer (e.g. `AATAAA`) starting at position 70 (0-indexed).	`205`
`hidden_size` ¶	`int`	Number of feature channels used throughout the residual network.	`32`
`num_groups` ¶	`int`	Number of residual-block groups.	`7`
`num_blocks` ¶	`int`	Number of residual blocks per group.	`4`
`kernel_size` ¶	`int`	Convolution kernel size used inside each residual block.	`3`
`dilations` ¶	`list[int] \| None`	Dilation factor for each residual-block group. Must have `num_groups` entries.	`None`
`num_libraries` ¶	`int`	Dimensionality of the one-hot training sub-library bias input.	`13`
`library_index` ¶	`int`	The training sub-library index used to construct the deterministic library-bias input. The upstream variant-effect workflow always uses index 11.	`11`
`hidden_act` ¶	`str`	The non-linear activation function used inside the residual blocks.	`'relu'`
`batch_norm_eps` ¶	`float`	The epsilon used by the batch normalization layers.	`0.001`
`batch_norm_momentum` ¶	`float`	The momentum used by the batch normalization layers.	`0.99`
`num_labels` ¶	`int`	Number of output labels. APARENT2 predicts a cleavage distribution over `sequence_length + 1` positions (the extra position is the “no cleavage in window” bucket), so this defaults to 206.	`206`
`head` ¶	`HeadConfig \| None`	The configuration of the prediction head. Defaults to a regression head (`problem_type="regression"`), matching APARENT2’s cleavage-distribution prediction task.	`None`
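
To see how `num_groups`, `num_blocks`, and `dilations` interact, the sketch below expands the default per-group dilations into a per-block schedule. It assumes every block in a group shares that group's dilation, which is how the parameter description reads; `expand_dilations` is a hypothetical helper, not a library function.

```python
def expand_dilations(dilations, num_groups=7, num_blocks=4):
    """Repeat each group's dilation once per residual block in that group."""
    if len(dilations) != num_groups:
        raise ValueError(f"`dilations` must have {num_groups} entries, but got {len(dilations)}")
    return [dilation for dilation in dilations for _ in range(num_blocks)]


# Default schedule from the configuration: dilation grows, then shrinks again.
schedule = expand_dilations([1, 2, 4, 8, 4, 2, 1])
print(len(schedule))
print(schedule[:8])
```

With the defaults this gives 7 × 4 = 28 residual blocks, the same number as the 28 layers listed in the model specification, although how that figure is counted is not stated on this page.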

Examples:

Python Console Session
>>> from multimolecule import Aparent2Config, Aparent2Model
>>> # Initializing an APARENT2 multimolecule/aparent2 style configuration
>>> configuration = Aparent2Config()
>>> # Initializing a model (with random weights) from the multimolecule/aparent2 style configuration
>>> model = Aparent2Model(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
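
The `vocab_size` description says that `N` is represented as a 0.25 mixture across A/C/G/T. One way to picture this is an equivalent four-channel encoding, sketched below. This is an illustration only: the model itself consumes `input_ids` and folds the mixture into its first projection, and the `ACGT` channel order here is an assumption taken from the alphabet string `ACGTN`.

```python
CHANNELS = "ACGT"  # assumed channel order, taken from the `ACGTN` alphabet string


def encode(sequence: str) -> list[list[float]]:
    """One row per base: one-hot for A/C/G/T, a uniform 0.25 mixture for N."""
    rows = []
    for base in sequence.upper():
        if base == "N":
            rows.append([0.25] * len(CHANNELS))
        elif base in CHANNELS:
            rows.append([1.0 if base == channel else 0.0 for channel in CHANNELS])
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return rows


encoded = encode("ACNT")
print(encoded[2])
```

Because a convolution is linear in its input, feeding the mixture row gives the same result as averaging the responses to the four unambiguous bases.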

Source code in multimolecule/models/aparent2/configuration_aparent2.py

Python
class Aparent2Config(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of an
    [`Aparent2Model`][multimolecule.models.Aparent2Model]. It is used to instantiate an APARENT2 model according to the
    specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a
    similar configuration to that of the APARENT2 [johli/aparent-resnet](https://github.com/johli/aparent-resnet)
    architecture.

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    APARENT2 is a residual convolutional network that predicts human 3' UTR Alternative Polyadenylation (APA) and
    cleavage magnitude at base-pair resolution. The network is fully convolutional plus a position-wise
    locally-connected library-bias layer; it does not contain any flatten/dense layers.

    Args:
        vocab_size:
            Vocabulary size of the APARENT2 model. Defines the number of one-hot input channels derived from
            `input_ids`. Defaults to 5 (the MultiMolecule streamline DNA alphabet `ACGTN`). The converted first
            projection represents `N` as the upstream 0.25 mixture across A/C/G/T.
        sequence_length:
            The fixed length of the polyadenylation signal sequence the model was trained on. APARENT2 expects a 205bp
            window with the core hexamer (e.g. `AATAAA`) starting at position 70 (0-indexed).
        hidden_size:
            Number of feature channels used throughout the residual network.
        num_groups:
            Number of residual-block groups.
        num_blocks:
            Number of residual blocks per group.
        kernel_size:
            Convolution kernel size used inside each residual block.
        dilations:
            Dilation factor for each residual-block group. Must have `num_groups` entries.
        num_libraries:
            Dimensionality of the one-hot training sub-library bias input.
        library_index:
            The training sub-library index used to construct the deterministic library-bias input. The upstream
            variant-effect workflow always uses index 11.
        hidden_act:
            The non-linear activation function used inside the residual blocks.
        batch_norm_eps:
            The epsilon used by the batch normalization layers.
        batch_norm_momentum:
            The momentum used by the batch normalization layers.
        num_labels:
            Number of output labels. APARENT2 predicts a cleavage distribution over `sequence_length + 1` positions
            (the extra position is the "no cleavage in window" bucket), so this defaults to 206.
        head:
            The configuration of the prediction head. Defaults to a regression head
            (`problem_type="regression"`), matching APARENT2's cleavage-distribution prediction task.

    Examples:
        >>> from multimolecule import Aparent2Config, Aparent2Model
        >>> # Initializing an APARENT2 multimolecule/aparent2 style configuration
        >>> configuration = Aparent2Config()
        >>> # Initializing a model (with random weights) from the multimolecule/aparent2 style configuration
        >>> model = Aparent2Model(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "aparent2"

    def __init__(
        self,
        vocab_size: int = 5,
        sequence_length: int = 205,
        hidden_size: int = 32,
        num_groups: int = 7,
        num_blocks: int = 4,
        kernel_size: int = 3,
        dilations: list[int] | None = None,
        num_libraries: int = 13,
        library_index: int = 11,
        hidden_act: str = "relu",
        batch_norm_eps: float = 1e-3,
        batch_norm_momentum: float = 0.99,
        num_labels: int = 206,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if dilations is None:
            dilations = [1, 2, 4, 8, 4, 2, 1]
        if len(dilations) != num_groups:
            raise ValueError(f"`dilations` must have `num_groups` ({num_groups}) entries, but got {len(dilations)}.")
        if not 0 <= library_index < num_libraries:
            raise ValueError(f"`library_index` ({library_index}) must be in [0, num_libraries={num_libraries}).")
        if num_labels != sequence_length + 1:
            raise ValueError(
                f"`num_labels` ({num_labels}) must equal `sequence_length + 1` ({sequence_length + 1}); "
                "APARENT2 predicts a cleavage distribution over `sequence_length + 1` positions."
            )
        self.vocab_size = vocab_size
        self.sequence_length = sequence_length
        self.hidden_size = hidden_size
        self.num_groups = num_groups
        self.num_blocks = num_blocks
        self.kernel_size = kernel_size
        self.dilations = dilations
        self.num_libraries = num_libraries
        self.library_index = library_index
        self.hidden_act = hidden_act
        self.batch_norm_eps = batch_norm_eps
        self.batch_norm_momentum = batch_norm_momentum
        if head is None:
            head = HeadConfig(problem_type="regression")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "regression"
        self.head = head

Aparent2ForSequencePrediction ¶

Bases: Aparent2PreTrainedModel

APARENT2 with a sequence-level prediction head.

The backbone already produces a sequence_length + 1 dimensional cleavage score (the APA cleavage distribution before softmax), so this wrapper exposes those converted upstream scores directly and adds the shared MultiMolecule regression loss.

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import Aparent2Config, Aparent2ForSequencePrediction, DnaTokenizer
>>> config = Aparent2Config()
>>> model = Aparent2ForSequencePrediction(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent2")
>>> input = tokenizer("A" * 205, return_tensors="pt")
>>> output = model(**input, labels=torch.randn(1, 206))
>>> output["logits"].shape
torch.Size([1, 206])

Source code in multimolecule/models/aparent2/modeling_aparent2.py

Python
class Aparent2ForSequencePrediction(Aparent2PreTrainedModel):
    """
    APARENT2 with a sequence-level prediction head.

    The backbone already produces a `sequence_length + 1` dimensional cleavage score (the APA cleavage distribution
    before softmax), so this wrapper exposes those converted upstream scores directly and adds the shared
    MultiMolecule regression loss.

    Examples:
        >>> import torch
        >>> from multimolecule import Aparent2Config, Aparent2ForSequencePrediction, DnaTokenizer
        >>> config = Aparent2Config()
        >>> model = Aparent2ForSequencePrediction(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent2")
        >>> input = tokenizer("A" * 205, return_tensors="pt")
        >>> output = model(**input, labels=torch.randn(1, 206))
        >>> output["logits"].shape
        torch.Size([1, 206])
    """

    def __init__(self, config: Aparent2Config):
        super().__init__(config)
        self.model = Aparent2Model(config)
        head_config = HeadConfig(config.head) if config.head is not None else HeadConfig()
        if head_config.num_labels is None:
            head_config.num_labels = config.num_labels
        if head_config.problem_type is None:
            head_config.problem_type = "regression"
        self.head_config = head_config
        self.criterion = Criterion(head_config)

        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        return [f"cleavage_{index}" for index in range(self.config.sequence_length)] + ["no_cleavage"]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Tuple[Tensor, ...] | SequencePredictorOutput:
        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        logits = outputs.logits
        loss = self.criterion(logits, labels) if labels is not None else None

        return SequencePredictorOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
        )

    def postprocess(self, outputs: SequencePredictorOutput | ModelOutput) -> Tensor:
        return F.softmax(outputs["logits"], dim=-1)
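
The `postprocess` and `output_channels` members above can be mirrored without PyTorch to show what they produce. The sketch below is a plain-Python stand-in for illustration; the real methods operate on tensors and on the model configuration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(logits)
    exponentials = [math.exp(score - peak) for score in logits]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def output_channels(sequence_length=205):
    """One name per input position plus the trailing no-cleavage bucket."""
    return [f"cleavage_{index}" for index in range(sequence_length)] + ["no_cleavage"]


names = output_channels()
probabilities = softmax([0.0] * 206)  # uniform toy logits
print(len(names), names[0], names[-1])
```

Zipping `names` with `probabilities` gives a labelled cleavage distribution that sums to one.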

Aparent2Model ¶

Bases: Aparent2PreTrainedModel

The bare APARENT2 residual network.

APARENT2 predicts a base-pair-resolution cleavage distribution for a fixed 205bp polyadenylation signal window. The core hexamer (e.g. AATAAA) is expected to start at position 70 (0-indexed). Variant effect is an input-schema concern: score a reference and an alternate sequence separately and compare their cleavage / isoform predictions; there is no separate ref/alt output dataclass.

Examples:

Python Console Session
>>> from multimolecule import Aparent2Config, Aparent2Model, DnaTokenizer
>>> config = Aparent2Config()
>>> model = Aparent2Model(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent2")
>>> input = tokenizer("A" * 205, return_tensors="pt")
>>> output = model(**input)
>>> output["logits"].shape
torch.Size([1, 206])
>>> output["pooler_output"].shape
torch.Size([1, 206])

Source code in multimolecule/models/aparent2/modeling_aparent2.py

Python
class Aparent2Model(Aparent2PreTrainedModel):
    """
    The bare APARENT2 residual network.

    APARENT2 predicts a base-pair-resolution cleavage distribution for a fixed 205bp polyadenylation signal window.
    The core hexamer (e.g. ``AATAAA``) is expected to start at position 70 (0-indexed). Variant effect is an
    *input-schema* concern: score a reference and an alternate sequence separately and compare their cleavage /
    isoform predictions; there is no separate ref/alt output dataclass.

    Examples:
        >>> from multimolecule import Aparent2Config, Aparent2Model, DnaTokenizer
        >>> config = Aparent2Config()
        >>> model = Aparent2Model(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/aparent2")
        >>> input = tokenizer("A" * 205, return_tensors="pt")
        >>> output = model(**input)
        >>> output["logits"].shape
        torch.Size([1, 206])
        >>> output["pooler_output"].shape
        torch.Size([1, 206])
    """

    def __init__(self, config: Aparent2Config):
        super().__init__(config)
        self.config = config
        self.gradient_checkpointing = False
        self.embeddings = Aparent2Embedding(config)
        self.encoder = Aparent2Encoder(config)
        self.prediction = nn.Conv1d(config.hidden_size, 1, kernel_size=1)
        self.library_bias = Aparent2LibraryBias(config)
        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> Aparent2ModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )

        hidden_state = self.encoder(embedding_output, **kwargs)

        sequence_score = self.prediction(hidden_state).squeeze(1)
        logits = self.library_bias(sequence_score)

        return Aparent2ModelOutput(
            logits=logits,
            pooler_output=logits,
            last_hidden_state=hidden_state.transpose(1, 2),
        )

Aparent2ModelOutput `dataclass` ¶

Bases: ModelOutput

Base class for outputs of the APARENT2 model.

Parameters:

Name	Type	Description	Default
`loss` ¶	`torch.FloatTensor` of shape `(1,)`, optional	Not produced by the bare model; present for API compatibility.	`None`
`logits` ¶	`torch.FloatTensor` of shape `(batch_size, sequence_length + 1)`	APA cleavage scores (before SoftMax) for each position plus a trailing “no cleavage in window” bucket.	`None`
`pooler_output` ¶	`torch.FloatTensor` of shape `(batch_size, sequence_length + 1)`	Same content as `logits`; exposed for sequence-level prediction wrappers.	`None`
`last_hidden_state` ¶	`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`	The residual-network feature map before the final cleavage projection.	`None`
`hidden_states` ¶	`tuple(torch.FloatTensor)`, optional	Hidden states of the model at the output of each layer.	`None`
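
Once `logits` has been passed through a softmax, the 206 values split into 205 positional probabilities and one trailing bucket. The sketch below summarizes such a vector in plain Python; `summarize_cleavage` and its dictionary keys are hypothetical names chosen for the example.

```python
def summarize_cleavage(probabilities, sequence_length=205):
    """Split a normalized cleavage distribution into positional and no-cleavage parts."""
    if len(probabilities) != sequence_length + 1:
        raise ValueError(f"expected {sequence_length + 1} values, got {len(probabilities)}")
    positional = probabilities[:sequence_length]
    peak = max(range(sequence_length), key=positional.__getitem__)
    return {
        "peak_position": peak,                 # 0-indexed position with the highest probability
        "peak_probability": positional[peak],
        "cleaved_in_window": sum(positional),  # total mass on real positions
        "no_cleavage": probabilities[-1],      # trailing bucket
    }


toy = [0.0] * 206
toy[90], toy[100], toy[205] = 0.6, 0.3, 0.1
summary = summarize_cleavage(toy)
print(summary["peak_position"], summary["no_cleavage"])
```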

Source code in multimolecule/models/aparent2/modeling_aparent2.py

Python
@dataclass
class Aparent2ModelOutput(ModelOutput):
    """
    Base class for outputs of the APARENT2 model.

    Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*):
            Not produced by the bare model; present for API compatibility.
        logits (`torch.FloatTensor` of shape `(batch_size, sequence_length + 1)`):
            APA cleavage scores (before SoftMax) for each position plus a trailing "no cleavage in window" bucket.
        pooler_output (`torch.FloatTensor` of shape `(batch_size, sequence_length + 1)`):
            Same content as `logits`; exposed for sequence-level prediction wrappers.
        last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`):
            The residual-network feature map before the final cleavage projection.
        hidden_states (`tuple(torch.FloatTensor)`, *optional*):
            Hidden states of the model at the output of each layer.
    """

    loss: torch.FloatTensor | None = None
    logits: torch.FloatTensor | None = None
    pooler_output: torch.FloatTensor | None = None
    last_hidden_state: torch.FloatTensor | None = None
    hidden_states: tuple[torch.FloatTensor, ...] | None = None

Aparent2PreTrainedModel ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/aparent2/modeling_aparent2.py

Python
class Aparent2PreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = Aparent2Config
    base_model_prefix = "model"
    supports_gradient_checkpointing = True
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["Aparent2Block"]

    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, nn.Conv1d):
            init.xavier_normal_(module.weight)
            if module.bias is not None:
                init.zeros_(module.bias)
        elif isinstance(module, nn.Linear):
            init.kaiming_uniform_(module.weight, a=math.sqrt(5))
            if module.bias is not None:
                fan_in, _ = nn.init._calculate_fan_in_and_fan_out(module.weight)
                bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
                init.uniform_(module.bias, -bound, bound)
        elif isinstance(module, (nn.BatchNorm1d, nn.LayerNorm, nn.GroupNorm)):
            init.ones_(module.weight)
            init.zeros_(module.bias)
        elif isinstance(module, Aparent2LibraryBias):
            init.xavier_normal_(module.weight)
            init.zeros_(module.bias)
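
For a sense of scale, the sketch below computes the standard deviation that Xavier-normal initialization uses for a `Conv1d` weight and the uniform bound applied to `Linear` biases in the code above. The formulas follow PyTorch's documented definitions with a gain of 1; the helper names are illustrative and the layer sizes are taken from the default configuration.

```python
import math


def conv1d_fans(in_channels, out_channels, kernel_size):
    """Fan-in and fan-out of a Conv1d weight of shape (out, in, kernel)."""
    return in_channels * kernel_size, out_channels * kernel_size


def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Standard deviation used by Xavier (Glorot) normal initialization."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def linear_bias_bound(fan_in):
    """Bound of the uniform bias initialization used for nn.Linear above."""
    return 1 / math.sqrt(fan_in) if fan_in > 0 else 0.0


# A residual-block convolution (32 -> 32 channels, kernel 3) and the final
# 1x1 prediction convolution (32 -> 1 channels) with the default hidden size.
block_std = xavier_normal_std(*conv1d_fans(32, 32, 3))
prediction_std = xavier_normal_std(*conv1d_fans(32, 1, 1))
print(round(block_std, 4), round(prediction_std, 4))
```

These values only matter when training from scratch, since weights loaded from a checkpoint are not re-initialized.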
