Enformer
Transformer-based deep neural network for predicting genomic coverage tracks from long DNA sequences with long-range context.
Disclaimer
This is an UNOFFICIAL implementation of the paper “Effective gene expression prediction from sequence by integrating long-range interactions” by Žiga Avsec, Vikram Agarwal, Daniel Visentin, et al.
The OFFICIAL repository of Enformer is at google-deepmind/deepmind-research/enformer.
Tip
The MultiMolecule team has confirmed that the provided model and checkpoints produce the same intermediate representations as the original implementation.
The team releasing Enformer did not write this model card; it has been written by the MultiMolecule team.
Model Details
Enformer is the successor of Basenji. It replaces Basenji’s dilated convolution tower with a convolution stem followed by a Transformer trunk, which lets it model long-range genomic interactions. It consumes a long DNA window (~197 kb), passes it through a convolution + attention-pooling stem that downsamples the sequence by 2 ** 7 = 128x, processes the binned representation with 11 Transformer blocks using Transformer-XL style relative positional encoding, center-crops to 896 output bins, and applies a pointwise head plus a per-species linear track projection with a softplus activation. The prediction is binned: the output has shape (batch_size, target_length, num_tracks) where each bin summarizes 128 bp of sequence and num_tracks is the number of genomic coverage experiments for the selected species.
Model Specification
| Input Length (bp) | Bin Size (bp) | Output Bins | Hidden Size | Layers | Heads | Num Labels | Num Parameters (M) | FLOPs (P) | MACs (P) | Max Num Tokens |
|---|---|---|---|---|---|---|---|---|---|---|
| 196,608 | 128 | 896 | 1536 | 11 | 8 | 5313 | 246.18 | - | - | 196,608 |
Num Labels reports the human output head (5,313 tracks); the mouse head predicts 1,643 tracks. FLOPs and MACs have not been recomputed for the canonical 196,608 bp Enformer input window and are therefore omitted.
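The numbers in the table are tied together by simple arithmetic: the stem bins the window by 2 ** 7 = 128, and the trunk output is center-cropped to 896 bins. The sketch below checks this with a hypothetical helper, enformer_geometry (not a multimolecule function), assuming the bin size is exactly 2 ** num_downsamples and that cropping trims the same number of bins from each side.

```python
# Hypothetical helper: derives the binned geometry from three numbers in this card.
# Assumes bin size == 2 ** num_downsamples and symmetric center-cropping.


def enformer_geometry(sequence_length: int = 196_608,
                      num_downsamples: int = 7,
                      target_length: int = 896) -> dict:
    bin_size = 2 ** num_downsamples  # 128 bp per bin by default
    if sequence_length % bin_size:
        raise ValueError("sequence_length must be a multiple of the bin size")
    num_bins = sequence_length // bin_size  # bins entering the Transformer trunk
    excess = num_bins - target_length
    if excess < 0 or excess % 2:
        raise ValueError("cannot center-crop num_bins to target_length symmetrically")
    return {
        "bin_size": bin_size,
        "num_bins": num_bins,
        "crop_per_side": excess // 2,  # bins discarded on each flank
        "predicted_bp": target_length * bin_size,  # central span covered by outputs
    }


print(enformer_geometry())
# {'bin_size': 128, 'num_bins': 1536, 'crop_per_side': 320, 'predicted_bp': 114688}
```

Under these assumptions the 896 output bins describe only the central 114,688 bp of the 196,608 bp window; the flanks supply context to the trunk but receive no predictions.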
Links
- Code: multimolecule.enformer
- Data: ENCODE, FANTOM5, GTEx CAGE, ChIP-seq, DNase-seq, and related genomic coverage tracks
- Paper: Effective gene expression prediction from sequence by integrating long-range interactions
- Developed by: Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, David R. Kelley
- Model type: Convolutional stem followed by Transformer trunk with long-range attention for binned multi-track genomic coverage prediction
- Original Repository: google-deepmind/deepmind-research/enformer
Usage
The model file depends on the multimolecule library. You can install it using pip: `pip install multimolecule`.
Direct Use
Genomic Coverage Prediction
You can use this model to predict binned genomic coverage tracks from a DNA sequence.
The binned positional axis is treated as the “token” axis: each output position corresponds to one
genomic bin rather than a single nucleotide. The species configuration option selects the
human (5,313 tracks) or mouse (1,643 tracks) output head.
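Since each output position is a 128 bp bin rather than a nucleotide, it is often useful to map a bin index back to base-pair offsets in the input window. A minimal sketch, assuming the default 196,608 bp window, 128 bp bins and symmetric center-cropping to 896 bins; bin_to_window_span is a hypothetical helper rather than part of the library:

```python
from __future__ import annotations

# Default geometry from the Interface section, hard-coded here for illustration.
SEQUENCE_LENGTH = 196_608
BIN_SIZE = 128
TARGET_LENGTH = 896


def bin_to_window_span(bin_index: int) -> tuple[int, int]:
    """Return the half-open [start, end) bp interval of the window summarized by one output bin."""
    if not 0 <= bin_index < TARGET_LENGTH:
        raise IndexError(f"bin_index must be in [0, {TARGET_LENGTH})")
    num_bins = SEQUENCE_LENGTH // BIN_SIZE  # 1536 bins before cropping
    crop = (num_bins - TARGET_LENGTH) // 2  # bins trimmed from each side (320)
    start = (crop + bin_index) * BIN_SIZE
    return start, start + BIN_SIZE


print(bin_to_window_span(0))    # (40960, 41088)
print(bin_to_window_span(895))  # (155520, 155648)
```

Adding the window's genomic start coordinate to these offsets would give genome coordinates, assuming the window was extracted without padding.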
Interface
- Input length: fixed 196,608 bp DNA window
- Output binning: 128 bp per output bin; 896 output bins per window (after center-cropping the binned representation)
- Species head: select human (5,313 tracks) or mouse (1,643 tracks) via the species config option
- Output: raw pre-softplus logits of shape (batch_size, target_length, num_tracks); use postprocess for non-negative coverage tracks
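Because the model returns raw pre-softplus logits, the coverage values are presumably recovered by applying softplus, log(1 + exp(x)), element-wise, as the architecture description suggests. The sketch below shows that transformation on nested Python lists shaped (batch_size, target_length, num_tracks); it illustrates the idea under that assumption and is not the implementation of postprocess, which per its signature also returns track names.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x)), always >= 0."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logits_to_coverage(logits):
    """Apply softplus element-wise to a nested (batch, bins, tracks) list."""
    return [[[softplus(v) for v in bin_row] for bin_row in sample] for sample in logits]


# Toy logits: batch_size=1, target_length=2, num_tracks=3.
example_logits = [[[-3.0, 0.0, 2.5], [10.0, -50.0, 1.0]]]
coverage = logits_to_coverage(example_logits)
print(coverage)
```

Softplus is smooth and strictly positive, which keeps the predictions valid as Poisson rates.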
Training Details
Enformer was trained to predict genomic coverage tracks (DNase-seq, ATAC-seq, ChIP-seq and CAGE) from the human and mouse reference genomes.
Training Data
The model was trained on a large compendium of functional genomics experiments aligned to the human (hg38) and mouse (mm10) reference genomes. The genome was divided into overlapping windows; for each window the per-128-bp coverage of every experiment served as the regression target.
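For intuition about what a per-128-bp target looks like, the sketch below aggregates a toy per-base coverage track into bins. bin_coverage is a hypothetical helper, and summing within each bin is an assumption made for illustration; the card does not say how coverage was aggregated, and the original preprocessing may differ (for example, by averaging or clipping).

```python
from __future__ import annotations

# Assumption for illustration: a bin's target is the SUM of per-base coverage.


def bin_coverage(per_base: list[float], bin_size: int = 128) -> list[float]:
    """Sum per-base coverage into consecutive, non-overlapping bins."""
    if len(per_base) % bin_size:
        raise ValueError("coverage length must be a multiple of bin_size")
    return [sum(per_base[i:i + bin_size]) for i in range(0, len(per_base), bin_size)]


toy_track = [1.0] * 128 + [0.5] * 128 + [0.0] * 128  # 384 bp of toy coverage
print(bin_coverage(toy_track))  # [128.0, 64.0, 0.0]
```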
Training Procedure
Pre-training
The model was trained to minimize a Poisson regression loss between predicted and observed coverage, using a softplus output activation to keep the predicted coverage non-negative.
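Per bin and track, the Poisson negative log-likelihood is mu - y * log(mu) plus a log(y!) term that does not depend on the prediction and can be dropped, where mu is the softplus output and y the observed coverage. This is the same form that torch.nn.PoissonNLLLoss computes with log_input=False. The pure-Python sketch below averages over all entries; whether the original training summed or averaged is an assumption here.

```python
from __future__ import annotations

import math


def poisson_nll(predicted: list[float], observed: list[float], eps: float = 1e-8) -> float:
    """Mean Poisson negative log-likelihood without the constant log(y!) term."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    total = 0.0
    for mu, y in zip(predicted, observed):
        if mu < 0:
            raise ValueError("Poisson rates must be non-negative")
        # eps guards against log(0) when a predicted rate is exactly zero.
        total += mu - y * math.log(mu + eps)
    return total / len(predicted)


observed = [3.0, 0.0, 10.0]
print(poisson_nll(observed, observed))         # prediction equals the target
print(poisson_nll([1.0, 1.0, 1.0], observed))  # a worse prediction scores higher
```

For a single target y, mu - y * log(mu) is minimized at mu = y, which is why a non-negative activation such as softplus pairs naturally with this loss.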
Citation
Note
The artifacts distributed in this repository are part of the MultiMolecule project. If MultiMolecule supports your research, please cite the MultiMolecule project.
Contact
Please use GitHub issues of MultiMolecule for any questions or comments on the model card.
Please contact the authors of the Enformer paper for questions or comments on the paper/model.
License
This model implementation is licensed under the GNU Affero General Public License.
For additional terms and clarifications, please refer to our License FAQ.
API Reference
EnformerConfig
Bases: PreTrainedConfig
This is the configuration class to store the configuration of an
EnformerModel. It is used to instantiate an Enformer model according to
the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will
yield a configuration similar to that of the Enformer
deepmind/enformer architecture.
Configuration objects inherit from PreTrainedConfig and can be used to
control the model outputs. Read the documentation from PreTrainedConfig
for more information.
Enformer is the successor of Basenji. It replaces Basenji’s dilated convolution tower with a
convolution stem followed by a Transformer trunk so it can model long-range genomic
interactions. A long DNA window of sequence_length base pairs is downsampled by the
convolution stem by a factor of 2 ** num_downsamples (128 bp per bin by default), processed by the
Transformer trunk, center-cropped to target_length bins, and projected to genomic coverage tracks.
The output has shape (batch_size, target_length, num_labels).
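The stem's attention pooling, mentioned in Model Details, replaces max pooling with a softmax-weighted average over each window of two positions. In the original TensorFlow implementation the weights come from a learned per-channel linear map initialized to 2 times the identity; the sketch below freezes that map at a scalar scale of 2.0, so it matches only that initialization. Inputs are nested lists shaped (length, channels), and softmax_pool_1d is a hypothetical name.

```python
from __future__ import annotations

import math


def softmax_pool_1d(x: list[list[float]], pool_size: int = 2,
                    scale: float = 2.0) -> list[list[float]]:
    """Softmax (attention) pooling over non-overlapping windows, computed per channel."""
    if len(x) % pool_size:
        raise ValueError("length must be divisible by pool_size")
    channels = len(x[0])
    pooled = []
    for start in range(0, len(x), pool_size):
        window = x[start:start + pool_size]
        row = []
        for c in range(channels):
            values = [pos[c] for pos in window]
            peak = max(scale * v for v in values)  # subtract the max for stability
            weights = [math.exp(scale * v - peak) for v in values]
            # Softmax-weighted average of the values in this window.
            row.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
        pooled.append(row)
    return pooled


x = [[0.0, 1.0], [0.0, 3.0], [5.0, 2.0], [5.0, 2.0]]  # length=4, channels=2
print(softmax_pool_1d(x))  # length halves from 4 to 2
```

With scale = 0 the weights are uniform and the operation reduces to average pooling; as scale grows it approaches max pooling, so learned weights let each channel choose where it sits between the two.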
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|  | int | Vocabulary size of the Enformer model. Defines the number of input feature channels derived from the MultiMolecule DNA token order. Defaults to 5 ( | 5 |
| sequence_length | int | The length, in base pairs, of the input DNA window. Defaults to 196608 (~197 kb), matching the released Enformer checkpoint. | 196608 |
|  | int | Dimensionality of the Transformer trunk. The convolution stem’s first conv produces | 1536 |
|  | int | Number of Transformer blocks in the trunk. | 11 |
|  | int | Number of attention heads in each Transformer block. | 8 |
|  | int | Dimensionality of the query/key projection per head. | 64 |
| num_downsamples | int | Total number of 2x downsampling steps applied by the convolution stem. The binning factor is 2 ** num_downsamples (128 bp per bin by default). | 7 |
|  | int | The conv-tower channel sizes are rounded to a multiple of this value. | 128 |
|  | int | Kernel size of the first (stem) convolution. | 15 |
|  | int | Kernel size of the main convolution in every conv-tower stage. | 5 |
| target_length | int | Number of output bins kept after center-cropping the trunk output. | 896 |
| head_hidden_size | int \| None | Dimensionality of the pointwise output head before the final track projection. Defaults to | None |
|  | str | The non-linear activation function used by the convolution blocks and the pointwise head. Enformer uses the sigmoid GELU approximation 'quick_gelu'. | 'quick_gelu' |
|  | str | Activation applied to the per-track predictions. Enformer applies 'softplus'. | 'softplus' |
|  | float | The dropout probability applied in the Transformer trunk. | 0.4 |
|  | float | The dropout probability applied to the attention matrix. | 0.05 |
|  | float | The dropout probability applied to the relative positional features. | 0.01 |
|  | bool | Whether to use the fixed precomputed gamma relative-position basis distributed with the released checkpoint. The official converted checkpoint stores this basis table. | False |
|  | float | The epsilon used by the batch normalization layers. | 1e-05 |
|  | float | The momentum used by the batch normalization layers. | 0.1 |
| species | str | Output head to expose downstream. Enformer is trained with two species heads; the selected head (human or mouse) determines the default number of tracks. | 'human' |
| num_labels | int \| None | Number of genomic coverage tracks predicted per bin. Defaults to the track count of the selected species head (5,313 for human, 1,643 for mouse). | None |
|  | HeadConfig \| None | Head configuration for the binned track prediction head. | None |
|  | bool | Whether to output the context vectors for each trunk block. | False |
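In the original Enformer code, the channel widths of the conv-tower stages grow geometrically from half the trunk width to the full width and are rounded to a multiple of the divisible-by value (128 in the table above). The sketch mirrors a helper of the same name from that code in pure Python. The stage count of six assumes the stem accounts for one of the seven halvings, and whether multimolecule uses exactly this schedule is an assumption.

```python
from __future__ import annotations

import math


def exponential_linspace_int(start: int, end: int, num: int,
                             divisible_by: int = 1) -> list[int]:
    """Geometrically spaced integers from start to end, rounded to multiples of divisible_by."""
    def _round(x: float) -> int:
        return int(round(x / divisible_by) * divisible_by)

    base = math.exp(math.log(end / start) / (num - 1))  # constant growth ratio
    return [_round(start * base ** i) for i in range(num)]


hidden_size = 1536
num_stages = 7 - 1  # assumption: the stem performs one of the seven halvings
print(exponential_linspace_int(hidden_size // 2, hidden_size, num_stages, divisible_by=128))
# [768, 896, 1024, 1152, 1280, 1536]
```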
Source code in multimolecule/models/enformer/configuration_enformer.py
EnformerForTokenPrediction
Bases: EnformerPreTrainedModel
Enformer with a pointwise regression head over genomic coverage tracks.
The binned positional axis is treated as the “token” axis: logits have shape
(batch_size, target_length, num_labels) where num_labels is the number of coverage tracks
of the configured species head.
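Pointwise here means that one projection is applied independently to every bin, so a bin's prediction depends only on its own hidden vector (long-range context has already been mixed in by the trunk). A minimal sketch under that reading, with hypothetical names; it makes no claim to match the head's internals, which may also include dropout, an activation, or the species-specific layer.

```python
from __future__ import annotations


def pointwise_linear(hidden: list[list[float]], weight: list[list[float]],
                     bias: list[float]) -> list[list[float]]:
    """Apply one shared linear map to every bin.

    hidden: (bins, features); weight: (tracks, features); bias: (tracks,).
    """
    out = []
    for features in hidden:
        if any(len(row) != len(features) for row in weight):
            raise ValueError("weight rows must match the feature dimension")
        out.append([sum(w * f for w, f in zip(row, features)) + b
                    for row, b in zip(weight, bias)])
    return out


hidden = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]  # 3 bins, 2 features
weight = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]   # 3 tracks
bias = [0.0, 1.0, -1.0]
print(pointwise_linear(hidden, weight, bias))
```

Because the weights are shared across bins, reordering the bins only reorders the outputs.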
Source code in multimolecule/models/enformer/modeling_enformer.py
postprocess
postprocess(
outputs: TokenPredictorOutput | ModelOutput | Tensor,
) -> tuple[Tensor, list[str]]
Return the non-negative per-track coverage prediction with track channel names.
Source code in multimolecule/models/enformer/modeling_enformer.py
EnformerModel
Bases: EnformerPreTrainedModel
The bare Enformer backbone. Consumes a long DNA window and returns binned hidden states.
The positional axis of the output is binned: a window of config.sequence_length base pairs
is downsampled by the convolution stem, processed by the Transformer trunk, and center-cropped
so that last_hidden_state has shape (batch_size, target_length, head_hidden_size).
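The center-cropping step can be sketched on a plain sequence of per-bin entries. center_crop is a hypothetical helper, not the multimolecule implementation; dropping the extra bin from the end when the excess is odd is a choice made for the sketch.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def center_crop(bins: Sequence[T], target_length: int) -> list[T]:
    """Keep the central target_length items; an odd excess drops the extra item from the end."""
    excess = len(bins) - target_length
    if excess < 0:
        raise ValueError("target_length exceeds the number of bins")
    trim = excess // 2
    return list(bins[trim:trim + target_length])


# 1536 trunk positions (here just their indices) cropped to 896 output bins.
kept = center_crop(list(range(1536)), 896)
print(len(kept), kept[0], kept[-1])  # 896 320 1215
```

Slicing with explicit start and length always yields exactly target_length items; a slice such as bins[trim:-trim] would instead return an empty sequence when trim is 0.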
Source code in multimolecule/models/enformer/modeling_enformer.py
EnformerPreTrainedModel
Bases: PreTrainedModel
An abstract class to handle weight initialization and provide a simple interface for downloading and loading pretrained models.