DeepSEA¶

Deep convolutional neural network that predicts noncoding chromatin features (DNase I hypersensitivity, transcription-factor binding, and histone marks) from DNA sequence, used to score the regulatory impact of noncoding variants.

Disclaimer¶

This is an UNOFFICIAL implementation of "Predicting effects of noncoding variants with deep learning-based sequence model" by Jian Zhou and Olga G. Troyanskaya.

The OFFICIAL repository of DeepSEA is at jisraeli/DeepSEA.

Tip

The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.

The team that released DeepSEA did not write a model card for this model, so this model card was written by the MultiMolecule team.

Model Details¶

DeepSEA is a convolutional neural network (CNN) trained to predict 919 chromatin features (DNase I hypersensitivity peaks, transcription-factor binding peaks, and histone-mark peaks) across multiple human cell types from a fixed-length 1000 bp DNA sequence. The model applies three convolutional blocks (convolution, ReLU, and dropout, with max pooling after the first two convolutions only), followed by a single fully-connected layer and a multi-label sigmoid output. The sequence-prediction model averages the forward-strand and reverse-complement probabilities. The trained model is then used to score the regulatory impact of noncoding single-nucleotide variants by computing the difference between reference- and alternate-allele predictions. Please refer to the Training Details section for more information on the training process.
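
The variant-scoring step described above can be sketched in a few lines. This is a minimal illustration, not the DeepSEA or MultiMolecule implementation: `predict_probabilities` is a hypothetical stand-in for any function that maps a 1000 bp sequence to per-feature probabilities, `toy_predict` is a made-up placeholder so the example runs without model weights, and the sign convention (alternate minus reference) is an assumption.

```python
from typing import Callable, List, Sequence


def apply_snv(sequence: str, position: int, ref: str, alt: str) -> str:
    """Return `sequence` with the base at 0-based `position` changed from `ref` to `alt`."""
    if sequence[position] != ref:
        raise ValueError(f"expected {ref!r} at position {position}, found {sequence[position]!r}")
    return sequence[:position] + alt + sequence[position + 1:]


def variant_effect(
    predict_probabilities: Callable[[str], Sequence[float]],
    sequence: str,
    position: int,
    ref: str,
    alt: str,
) -> List[float]:
    """Per-feature difference between alternate- and reference-allele predictions."""
    ref_probs = predict_probabilities(sequence)
    alt_probs = predict_probabilities(apply_snv(sequence, position, ref, alt))
    return [a - r for a, r in zip(alt_probs, ref_probs)]


def toy_predict(sequence: str) -> List[float]:
    # Hypothetical placeholder "model": two fake features derived from base composition.
    n = len(sequence)
    return [sequence.count("G") / n, sequence.count("A") / n]


window = "ACGT" * 250  # 1000 bp reference window; index 500 holds an "A"
effects = variant_effect(toy_predict, window, position=500, ref="A", alt="G")
```

With a real model, `toy_predict` would be replaced by a call that tokenizes the sequence, runs the network, and applies a sigmoid to the logits.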

Model Specification¶

Num Conv Layers	Num FC Layers	Hidden Size	Num Parameters (M)	FLOPs (G)	MACs (G)	Max Num Tokens
3	1	925	52.84	1.10	0.55	1000
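
The parameter count in the table can be reproduced from the default hyperparameters of `DeepSeaConfig` listed in the API reference. The sketch below is a back-of-the-envelope calculation, not code from the library. It assumes unpadded stride-1 convolutions, max pooling whose stride equals the pool size and which drops an incomplete trailing window, and a single linear output layer with bias.

```python
# Default hyperparameters, copied from the DeepSeaConfig defaults in the API reference.
vocab_size = 4
sequence_length = 1000
conv_channels = [320, 480, 960]
conv_kernel_sizes = [8, 8, 8]
conv_pool_sizes = [4, 4, 1]
fc_sizes = [925]
num_labels = 919

length = sequence_length
in_channels = vocab_size
num_parameters = 0
for out_channels, kernel, pool in zip(conv_channels, conv_kernel_sizes, conv_pool_sizes):
    num_parameters += in_channels * out_channels * kernel + out_channels  # weights + biases
    length = length - kernel + 1  # assumed: unpadded, stride-1 convolution
    length = length // pool       # assumed: floor-mode pooling (pool == 1 is a no-op)
    in_channels = out_channels

in_features = in_channels * length  # flattened input to the fully-connected stack
for out_features in fc_sizes + [num_labels]:
    num_parameters += in_features * out_features + out_features
    in_features = out_features

print(length, num_parameters)
```

Under these assumptions the sequence length after the last convolution is 53 and the total is 52,843,119 parameters, which rounds to the 52.84 M shown in the table.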

Links¶

Code: multimolecule.deepsea
Data: ENCODE and Roadmap Epigenomics chromatin-feature peak compendium covering 690 transcription-factor binding profiles, 125 DNase I hypersensitivity profiles, and 104 histone-mark profiles (919 chromatin features in total)
Paper: Predicting effects of noncoding variants with deep learning-based sequence model
Developed by: Jian Zhou, Olga G. Troyanskaya
Model type: Three-layer 1D CNN over 1000 bp DNA for multi-task chromatin-feature prediction
Original Repository: DeepSEA

Usage¶

The model file depends on the multimolecule library. You can install it using pip:

Bash
pip install multimolecule

Direct Use¶

Chromatin Feature Prediction¶

You can use this model directly to predict the chromatin features of a DNA sequence:

Python
>>> import torch
>>> from multimolecule import DnaTokenizer, DeepSeaForSequencePrediction

>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/deepsea")
>>> model = DeepSeaForSequencePrediction.from_pretrained("multimolecule/deepsea")
>>> input = tokenizer("ACGT" * 250, return_tensors="pt")
>>> output = model(**input)

>>> output.logits.shape
torch.Size([1, 919])

Interface¶

Input length: fixed 1000 bp DNA window
Output: 919 chromatin-feature logits (multi-label binary), covering DNase I hypersensitivity, transcription-factor binding, and histone-mark peaks across multiple cell types
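
Because the input is a fixed 1000 bp window and the output is a vector of logits, callers usually need two small steps around the model: cutting a window of the right length and converting logits to probabilities with a sigmoid. The helpers below are a hypothetical sketch in plain Python. Neither `centered_window` nor `logits_to_probabilities` is part of MultiMolecule, and raising an error at sequence edges (rather than padding) is an assumption made to keep the example simple.

```python
import math
from typing import List

WINDOW = 1000  # fixed input length stated in the interface above


def centered_window(chromosome: str, center: int, window: int = WINDOW) -> str:
    """Cut a `window`-bp slice so that 0-based index `center` lands at offset `window // 2`."""
    start = center - window // 2
    end = start + window
    if start < 0 or end > len(chromosome):
        raise ValueError("window extends past the sequence; choose another center or pad explicitly")
    return chromosome[start:end]


def logits_to_probabilities(logits: List[float]) -> List[float]:
    """Element-wise sigmoid: each feature logit becomes an independent probability."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


chromosome = "ACGT" * 500  # 2000 bp toy sequence
window = centered_window(chromosome, center=1000)
probabilities = logits_to_probabilities([0.0, 2.0, -2.0])
```

Each of the 919 outputs is treated independently, so the probabilities do not sum to one across features.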

Training Details¶

DeepSEA was trained to predict the chromatin features of DNA sequences across a panel of human cell types and then used to score the regulatory impact of noncoding variants.

Training Data¶

DeepSEA was trained on chromatin profiling data from ENCODE and the Roadmap Epigenomics project, comprising 690 transcription-factor ChIP-seq profiles, 125 DNase I hypersensitivity profiles, and 104 histone-mark ChIP-seq profiles for a total of 919 chromatin features. Each 1000 bp genomic interval centered on a 200 bp bin is labeled with a binary vector indicating which of the 919 chromatin features have a peak overlapping the central bin.

Training Procedure¶

Pre-training¶

The model was trained to minimize a multi-label binary cross-entropy loss, comparing its predicted per-feature probabilities against the observed chromatin-feature labels.

Optimizer: Stochastic gradient descent with momentum
Loss: Multi-label binary cross-entropy
Regularization: Dropout (0.2 after the first two convolutions, 0.5 after the third convolution) and L2 weight decay
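
The loss term named above can be written out explicitly. The function below is a plain-Python sketch of multi-label binary cross-entropy, not the training code: the mean reduction over features and the `eps` clamp are assumptions, and the L2 weight-decay term is omitted.

```python
import math
from typing import List


def multilabel_bce(probabilities: List[float], labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over independent per-feature predictions."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


loss = multilabel_bce([0.9, 0.2, 0.5], [1, 0, 1])
confident_loss = multilabel_bce([1.0, 0.0], [1, 0])
```

Each feature contributes its own binary term, so a sequence can be positive for many chromatin features at once.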

Citation¶

BibTeX
@article{zhou2015deepsea,
  author    = {Zhou, Jian and Troyanskaya, Olga G.},
  title     = {Predicting effects of noncoding variants with deep learning-based sequence model},
  journal   = {Nature Methods},
  volume    = 12,
  number    = 10,
  pages     = {931--934},
  year      = 2015,
  publisher = {Nature Publishing Group},
  doi       = {10.1038/nmeth.3547}
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If MultiMolecule supports your research, please cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}

Contact¶

Please use GitHub issues of MultiMolecule for any questions or comments on the model card.

Please contact the authors of the DeepSEA paper for questions or comments on the paper/model.

License¶

This model implementation is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

Text Only
SPDX-License-Identifier: AGPL-3.0-or-later

API Reference¶

DeepSeaConfig ¶

Bases: PreTrainedConfig

This is the configuration class to store the configuration of a DeepSeaModel. It is used to instantiate a DeepSEA model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the DeepSEA architecture from Zhou & Troyanskaya (Nat. Methods 2015).

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Parameters:

Name	Type	Description	Default
`vocab_size` ¶	`int`	Vocabulary size of the DeepSEA model. DeepSEA consumes a one-hot encoding of the four DNA nucleotides, so this also defines the number of input channels of the first convolution. Defaults to 4.	`4`
`sequence_length` ¶	`int`	The fixed length of the input DNA sequence in base pairs. Defaults to 1000.	`1000`
`num_conv_layers` ¶	`int`	Number of convolutional layers in the encoder.	`3`
`conv_channels` ¶	`list[int] \| None`	Number of filters for each convolutional layer.	`None`
`conv_kernel_sizes` ¶	`list[int] \| None`	Kernel size for each convolutional layer.	`None`
`conv_pool_sizes` ¶	`list[int] \| None`	Max-pool size applied after each convolutional layer. A value of `1` means no pooling is applied after that layer (DeepSEA omits the pool between the third convolution and the fully-connected stack).	`None`
`conv_dropouts` ¶	`list[float] \| None`	Dropout probability applied after each convolutional layer.	`None`
`fc_sizes` ¶	`list[int] \| None`	Hidden dimensionality of each fully-connected layer between the convolutional stack and the output head.	`None`
`hidden_act` ¶	`str`	The non-linear activation function (function or string) in the encoder. If string, `"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.	`'relu'`
`hidden_dropout` ¶	`float`	The dropout probability applied between the fully-connected layer and the output head.	`0.0`
`reverse_complement_average` ¶	`bool`	Whether `DeepSeaForSequencePrediction` averages forward and reverse-complement prediction probabilities. Defaults to True, matching the DeepSEA sequence-prediction checkpoint.	`True`
`num_labels` ¶	`int`	Number of output labels. DeepSEA predicts 919 chromatin-feature probabilities (DNase I hypersensitivity, transcription-factor binding, and histone-mark peaks across multiple cell types). Defaults to 919.	`919`
`head` ¶	`HeadConfig \| None`	The configuration of the prediction head. Defaults to a multi-label binary classification head (`problem_type="multilabel"`), matching DeepSEA’s chromatin-feature prediction task.	`None`

Examples:

Python Console Session
>>> from multimolecule import DeepSeaConfig, DeepSeaModel
>>> # Initializing a DeepSEA multimolecule/deepsea style configuration
>>> configuration = DeepSeaConfig()
>>> # Initializing a model (with random weights) from the multimolecule/deepsea style configuration
>>> model = DeepSeaModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

Source code in multimolecule/models/deepsea/configuration_deepsea.py

Python
class DeepSeaConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of a
    [`DeepSeaModel`][multimolecule.models.DeepSeaModel]. It is used to instantiate a DeepSEA model according to the
    specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a
    similar configuration to that of the DeepSEA architecture from Zhou & Troyanskaya (Nat. Methods 2015).

    Configuration objects inherit from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig] and can be used to
    control the model outputs. Read the documentation from [`PreTrainedConfig`][multimolecule.models.PreTrainedConfig]
    for more information.

    Args:
        vocab_size:
            Vocabulary size of the DeepSEA model. DeepSEA consumes a one-hot encoding of the four DNA nucleotides, so
            this also defines the number of input channels of the first convolution.
            Defaults to 4.
        sequence_length:
            The fixed length of the input DNA sequence in base pairs.
            Defaults to 1000.
        num_conv_layers:
            Number of convolutional layers in the encoder.
        conv_channels:
            Number of filters for each convolutional layer.
        conv_kernel_sizes:
            Kernel size for each convolutional layer.
        conv_pool_sizes:
            Max-pool size applied after each convolutional layer. A value of `1` means no pooling is applied after
            that layer (DeepSEA omits the pool between the third convolution and the fully-connected stack).
        conv_dropouts:
            Dropout probability applied after each convolutional layer.
        fc_sizes:
            Hidden dimensionality of each fully-connected layer between the convolutional stack and the output head.
        hidden_act:
            The non-linear activation function (function or string) in the encoder. If string, `"gelu"`, `"relu"`,
            `"silu"` and `"gelu_new"` are supported.
        hidden_dropout:
            The dropout probability applied between the fully-connected layer and the output head.
        reverse_complement_average:
            Whether [`DeepSeaForSequencePrediction`][multimolecule.models.DeepSeaForSequencePrediction] averages
            forward and reverse-complement prediction probabilities.
            Defaults to True, matching the DeepSEA sequence-prediction checkpoint.
        num_labels:
            Number of output labels. DeepSEA predicts 919 chromatin-feature probabilities (DNase I hypersensitivity,
            transcription-factor binding, and histone-mark peaks across multiple cell types).
            Defaults to 919.
        head:
            The configuration of the prediction head. Defaults to a multi-label binary classification head
            (`problem_type="multilabel"`), matching DeepSEA's chromatin-feature prediction task.

    Examples:
        >>> from multimolecule import DeepSeaConfig, DeepSeaModel
        >>> # Initializing a DeepSEA multimolecule/deepsea style configuration
        >>> configuration = DeepSeaConfig()
        >>> # Initializing a model (with random weights) from the multimolecule/deepsea style configuration
        >>> model = DeepSeaModel(configuration)
        >>> # Accessing the model configuration
        >>> configuration = model.config
    """

    model_type = "deepsea"

    def __init__(
        self,
        vocab_size: int = 4,
        sequence_length: int = 1000,
        num_conv_layers: int = 3,
        conv_channels: list[int] | None = None,
        conv_kernel_sizes: list[int] | None = None,
        conv_pool_sizes: list[int] | None = None,
        conv_dropouts: list[float] | None = None,
        fc_sizes: list[int] | None = None,
        hidden_act: str = "relu",
        hidden_dropout: float = 0.0,
        reverse_complement_average: bool = True,
        num_labels: int = 919,
        head: HeadConfig | None = None,
        **kwargs,
    ):
        super().__init__(num_labels=num_labels, **kwargs)
        if conv_channels is None:
            conv_channels = [320, 480, 960]
        if conv_kernel_sizes is None:
            conv_kernel_sizes = [8, 8, 8]
        if conv_pool_sizes is None:
            # Upstream DeepSEA pools after the first two convolutions only; the third convolution
            # is followed directly by the heavy 0.5 dropout and the fully-connected classifier.
            conv_pool_sizes = [4, 4, 1]
        if conv_dropouts is None:
            conv_dropouts = [0.2, 0.2, 0.5]
        if fc_sizes is None:
            fc_sizes = [925]
        lengths = (len(conv_channels), len(conv_kernel_sizes), len(conv_pool_sizes), len(conv_dropouts))
        if any(length != num_conv_layers for length in lengths):
            raise ValueError(
                "conv_channels, conv_kernel_sizes, conv_pool_sizes and conv_dropouts must each have length "
                f"num_conv_layers ({num_conv_layers}), but got {lengths[0]}, {lengths[1]}, {lengths[2]} "
                f"and {lengths[3]}."
            )
        if sequence_length <= 0:
            raise ValueError(f"sequence_length must be positive, but got {sequence_length}.")
        if not fc_sizes:
            raise ValueError("fc_sizes must contain at least one fully-connected layer.")
        self.vocab_size = vocab_size
        self.sequence_length = sequence_length
        self.num_conv_layers = num_conv_layers
        self.conv_channels = conv_channels
        self.conv_kernel_sizes = conv_kernel_sizes
        self.conv_pool_sizes = conv_pool_sizes
        self.conv_dropouts = conv_dropouts
        self.fc_sizes = fc_sizes
        self.hidden_size = fc_sizes[-1]
        self.hidden_act = hidden_act
        self.hidden_dropout = hidden_dropout
        self.reverse_complement_average = reverse_complement_average
        # DeepSEA performs multi-label binary classification of 919 chromatin features. The MultiMolecule
        # `problem_type` convention lives on the head config, since the Transformers base config only accepts
        # the HF `problem_type` literals.
        if head is None:
            head = HeadConfig(problem_type="multilabel")
        else:
            head = HeadConfig(head)
            if head.problem_type is None:
                head.problem_type = "multilabel"
        self.head = head

DeepSeaForSequencePrediction ¶

Bases: DeepSeaPreTrainedModel

Examples:

Python Console Session
>>> import torch
>>> from multimolecule import DeepSeaConfig, DeepSeaForSequencePrediction, DnaTokenizer
>>> config = DeepSeaConfig()
>>> model = DeepSeaForSequencePrediction(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/deepsea")
>>> input = tokenizer(["ACGT" * 250, "TGCA" * 250], return_tensors="pt")
>>> output = model(**input, labels=torch.randint(2, (2, 919)))
>>> output["logits"].shape
torch.Size([2, 919])
>>> output["loss"]
tensor(..., grad_fn=<...>)

Source code in multimolecule/models/deepsea/modeling_deepsea.py

Python
class DeepSeaForSequencePrediction(DeepSeaPreTrainedModel):
    """
    Examples:
        >>> import torch
        >>> from multimolecule import DeepSeaConfig, DeepSeaForSequencePrediction, DnaTokenizer
        >>> config = DeepSeaConfig()
        >>> model = DeepSeaForSequencePrediction(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/deepsea")
        >>> input = tokenizer(["ACGT" * 250, "TGCA" * 250], return_tensors="pt")
        >>> output = model(**input, labels=torch.randint(2, (2, 919)))
        >>> output["logits"].shape
        torch.Size([2, 919])
        >>> output["loss"]  # doctest:+ELLIPSIS
        tensor(..., grad_fn=<...>)
    """

    def __init__(self, config: DeepSeaConfig):
        super().__init__(config)
        self.model = DeepSeaModel(config)
        self.sequence_head = SequencePredictionHead(config)
        self.head_config = self.sequence_head.config

        # Initialize weights and apply final processing
        self.post_init()

    @property
    def output_channels(self) -> list[str]:
        id2label = getattr(self.config, "id2label", None)
        if id2label is not None:
            labels = [str(id2label.get(index, f"chromatin_{index}")) for index in range(self.config.num_labels)]
            if any(label != f"LABEL_{index}" for index, label in enumerate(labels)):
                return labels
        return [f"chromatin_{index}" for index in range(self.config.num_labels)]

    @can_return_tuple
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        labels: Tensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> tuple[Tensor, ...] | SequencePredictorOutput:
        if self.config.reverse_complement_average:
            input_ids, attention_mask, inputs_embeds = self._prepare_inputs(input_ids, attention_mask, inputs_embeds)

        outputs = self.model(
            input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
            return_dict=True,
            **kwargs,
        )

        output = self.sequence_head(outputs, labels=None)
        logits = output.logits

        if self.config.reverse_complement_average:
            reverse_outputs = self.model(
                self._reverse_complement_input_ids(input_ids) if input_ids is not None else None,
                attention_mask=attention_mask.flip(-1) if attention_mask is not None else None,
                inputs_embeds=(
                    self._reverse_complement_inputs_embeds(inputs_embeds) if inputs_embeds is not None else None
                ),
                return_dict=True,
                **kwargs,
            )
            reverse_output = self.sequence_head(reverse_outputs, labels=None)
            probabilities = (torch.sigmoid(logits) + torch.sigmoid(reverse_output.logits)) / 2
            probabilities = probabilities.clamp(min=1e-7, max=1.0 - 1e-7)
            logits = torch.logit(probabilities)

        loss = self._compute_loss(logits, labels)

        return SequencePredictorOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    def postprocess(self, outputs: Any) -> Tensor:
        return torch.sigmoid(outputs["logits"])

    @staticmethod
    def _prepare_inputs(
        input_ids: Tensor | NestedTensor | None,
        attention_mask: Tensor | None,
        inputs_embeds: Tensor | NestedTensor | None,
    ) -> tuple[Tensor | None, Tensor | None, Tensor | None]:
        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor
        return input_ids, attention_mask, inputs_embeds

    @staticmethod
    def _reverse_complement_input_ids(input_ids: Tensor | None) -> Tensor | None:
        if input_ids is None:
            return None
        reverse_input_ids = input_ids.flip(-1)
        # Complement lookup under the MultiMolecule DNA alphabet where A=0, C=1, G=2, T=3 (nucleobase order):
        # A(0)↔T(3), C(1)↔G(2).  Token ids outside [0, 3] (e.g. N or padding) are left unchanged below.
        complement = torch.tensor([3, 2, 1, 0], device=input_ids.device, dtype=input_ids.dtype)
        valid = (reverse_input_ids >= 0) & (reverse_input_ids < complement.numel())
        complemented = complement[reverse_input_ids.clamp(min=0, max=complement.numel() - 1).long()]
        return torch.where(valid, complemented, reverse_input_ids)

    @staticmethod
    def _reverse_complement_inputs_embeds(inputs_embeds: Tensor | None) -> Tensor | None:
        if inputs_embeds is None:
            return None
        channels = torch.arange(inputs_embeds.size(-1) - 1, -1, -1, device=inputs_embeds.device)
        return inputs_embeds.flip(1).index_select(-1, channels)

    def _compute_loss(self, logits: Tensor, labels: Tensor | None) -> torch.FloatTensor | None:
        # Use sequence_head.criterion directly rather than calling sequence_head(outputs, labels) again:
        # when reverse_complement_average is True the logits already encode the averaged probability in
        # logit space (after both branches are run and merged), so the criterion must receive those final
        # re-logited values — not recompute them from the raw encoder output a second time.
        if labels is None:
            return None
        loss = self.sequence_head.criterion(logits, labels)
        loss_weight = getattr(self.sequence_head, "loss_weight", None)
        if loss_weight is not None:
            loss = loss * loss_weight
        return cast(torch.FloatTensor, loss)
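
The strand-averaging logic in `forward` can be followed more easily in a framework-free sketch. The code below works on strings and Python floats instead of token ids and tensors, so it illustrates the arithmetic only. The function names are hypothetical, while the clamp bounds mirror the `1e-7` used in the source above.

```python
import math
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence: str) -> str:
    """Reverse the sequence and complement each base; other symbols (e.g. N) pass through unchanged."""
    return "".join(COMPLEMENT.get(base, base) for base in reversed(sequence))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def average_strands(forward_logits: List[float], reverse_logits: List[float], eps: float = 1e-7) -> List[float]:
    """Average the two strands in probability space, then map the result back to logits."""
    merged = []
    for f, r in zip(forward_logits, reverse_logits):
        p = (sigmoid(f) + sigmoid(r)) / 2
        p = min(max(p, eps), 1.0 - eps)  # avoid infinite logits at 0 and 1
        merged.append(logit(p))
    return merged


merged = average_strands([2.0, 0.0], [-2.0, 0.0])
```

Averaging probabilities and then re-applying the logit is not the same as averaging the logits directly; converting back to logit space lets the existing criterion and `postprocess` treat the merged output like any other logits.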

DeepSeaModel ¶

Bases: DeepSeaPreTrainedModel

Examples:

Python Console Session
>>> from multimolecule import DeepSeaConfig, DeepSeaModel, DnaTokenizer
>>> config = DeepSeaConfig()
>>> model = DeepSeaModel(config)
>>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/deepsea")
>>> input = tokenizer(["ACGT" * 250, "TGCA" * 250], return_tensors="pt")
>>> output = model(**input)
>>> output["pooler_output"].shape
torch.Size([2, 925])

Source code in multimolecule/models/deepsea/modeling_deepsea.py

Python
class DeepSeaModel(DeepSeaPreTrainedModel):
    """
    Examples:
        >>> from multimolecule import DeepSeaConfig, DeepSeaModel, DnaTokenizer
        >>> config = DeepSeaConfig()
        >>> model = DeepSeaModel(config)
        >>> tokenizer = DnaTokenizer.from_pretrained("multimolecule/deepsea")
        >>> input = tokenizer(["ACGT" * 250, "TGCA" * 250], return_tensors="pt")
        >>> output = model(**input)
        >>> output["pooler_output"].shape
        torch.Size([2, 925])
    """

    def __init__(self, config: DeepSeaConfig):
        super().__init__(config)
        self.embeddings = DeepSeaEmbedding(config)
        self.encoder = DeepSeaEncoder(config)
        # Initialize weights and apply final processing
        self.post_init()

    @merge_with_config_defaults
    @capture_outputs
    def forward(
        self,
        input_ids: Tensor | NestedTensor | None = None,
        attention_mask: Tensor | None = None,
        inputs_embeds: Tensor | NestedTensor | None = None,
        **kwargs: Unpack[TransformersKwargs],
    ) -> DeepSeaModelOutput:
        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        if input_ids is None and inputs_embeds is None:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        if isinstance(input_ids, NestedTensor):
            if attention_mask is None:
                attention_mask = input_ids.mask
            input_ids = input_ids.tensor
        if isinstance(inputs_embeds, NestedTensor):
            if attention_mask is None:
                attention_mask = inputs_embeds.mask
            inputs_embeds = inputs_embeds.tensor

        embedding_output = self.embeddings(
            input_ids=input_ids,
            attention_mask=attention_mask,
            inputs_embeds=inputs_embeds,
        )
        # The DeepSEA encoder collapses the sequence dimension through its fully-connected stack, so the
        # final feature vector is both the model's last hidden state and its pooled representation.
        sequence_output = self.encoder(embedding_output)

        return DeepSeaModelOutput(
            last_hidden_state=sequence_output,
            pooler_output=sequence_output,
        )

DeepSeaModelOutput `dataclass` ¶

Bases: ModelOutput

Base class for outputs of the DeepSEA backbone.

Parameters:

Name	Type	Description	Default
`last_hidden_state` ¶	`torch.FloatTensor` of shape `(batch_size, hidden_size)`	Final feature vector produced by the DeepSEA encoder.	`None`
`pooler_output` ¶	`torch.FloatTensor` of shape `(batch_size, hidden_size)`	Same tensor as `last_hidden_state`; DeepSEA collapses the sequence dimension in its encoder.	`None`
`hidden_states` ¶	`tuple[FloatTensor, ...] \| None`	Tuple containing the one-hot embedding output and the final encoder feature vector; returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`.	`None`
`attentions` ¶	`tuple[FloatTensor, ...] \| None`	Always `None`; DeepSEA is a convolutional model and has no attention layers. Provided for compatibility with the Transformers output convention.	`None`

Source code in multimolecule/models/deepsea/modeling_deepsea.py

Python
@dataclass
class DeepSeaModelOutput(ModelOutput):
    """
    Base class for outputs of the DeepSEA backbone.

    Args:
        last_hidden_state (`torch.FloatTensor` of shape `(batch_size, hidden_size)`):
            Final feature vector produced by the DeepSEA encoder.
        pooler_output (`torch.FloatTensor` of shape `(batch_size, hidden_size)`):
            Same tensor as `last_hidden_state`; DeepSEA collapses the sequence dimension in its encoder.
        hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or
            when `config.output_hidden_states=True`):
            Tuple containing the one-hot embedding output and the final encoder feature vector.
        attentions:
            Always `None`; DeepSEA is a convolutional model and has no attention layers. Provided for compatibility
            with the Transformers output convention.
    """

    last_hidden_state: torch.FloatTensor | None = None
    pooler_output: torch.FloatTensor | None = None
    hidden_states: tuple[torch.FloatTensor, ...] | None = None
    attentions: tuple[torch.FloatTensor, ...] | None = None

DeepSeaPreTrainedModel ¶

Bases: PreTrainedModel

An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.

Source code in multimolecule/models/deepsea/modeling_deepsea.py

Python
class DeepSeaPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = DeepSeaConfig
    base_model_prefix = "model"
    supports_gradient_checkpointing = True
    _can_record_outputs: dict[str, Any] | None = None
    _no_split_modules = ["DeepSeaConvLayer"]

    @torch.no_grad()
    def _init_weights(self, module: nn.Module):
        # Use transformers.initialization wrappers (imported as `init`); they check the
        # `_is_hf_initialized` flag so they don't clobber tensors loaded from a checkpoint.
        if isinstance(module, (nn.Conv1d, nn.Linear)):
            init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
            if module.bias is not None:
                init.zeros_(module.bias)
