Borzoi¶
Sequence-to-coverage neural network for predicting RNA-seq and chromatin tracks across 524 kb DNA windows at 32 bp resolution.
Disclaimer¶
This is an UNOFFICIAL implementation of "Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation" by Johannes Linder, Divyanshi Srivastava, Han Yuan, et al.
The OFFICIAL repository of Borzoi is at calico/borzoi.
Tip
The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.
The team releasing Borzoi did not write a model card for this model, so this model card has been written by the MultiMolecule team.
Model Details¶
Borzoi is the successor of Enformer. It extends the Enformer recipe (convolution stem + Transformer trunk + binned multi-track output) to a 524,288 bp input window and 32 bp output bins, and adds a U-Net style upsampling tail so the binned positional axis matches a higher-resolution coverage prediction. A long DNA window of 524 kb is downsampled by a convolution stem and a width-growing residual convolution tower, projected to 1,536 channels by a U-Net bottleneck, processed by 8 Transformer blocks with Transformer-XL style relative positional encoding, then upsampled by two skip-connected U-Net stages with depthwise-separable convolutions, center-cropped to 6,144 bins, and projected to per-species coverage tracks with a softplus activation. The output is binned: it has shape (batch_size, target_length, num_tracks) where each bin summarizes 32 bp of sequence and num_tracks is the number of genomic coverage experiments for the selected species. Borzoi was trained jointly on RNA-seq, CAGE, ATAC-seq, DNase-seq, and ChIP-seq tracks. Please refer to the Training Details section for more information on the training process.
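The bin counts quoted above follow from simple arithmetic. The sketch below assumes the crop is symmetric (as "center-cropped" suggests); the variable names are illustrative and are not multimolecule identifiers.

```python
# Illustrative arithmetic only; these names are not part of the multimolecule API.
SEQUENCE_LENGTH = 524_288  # input window, in bp
BIN_SIZE = 32              # bp summarized by each output bin
TARGET_LENGTH = 6_144      # bins kept after the center crop

bins_before_crop = SEQUENCE_LENGTH // BIN_SIZE
# Assumption: the crop removes the same number of bins from each side.
crop_per_side = (bins_before_crop - TARGET_LENGTH) // 2
predicted_span_bp = TARGET_LENGTH * BIN_SIZE
flank_bp = crop_per_side * BIN_SIZE

print(bins_before_crop, crop_per_side, predicted_span_bp, flank_bp)
```

Under these assumptions the predictions cover the central 196,608 bp of the window, and 163,840 bp on each side serve only as sequence context.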
Variants¶
Borzoi releases human and mouse species heads.
- multimolecule/borzoi: human head.
- multimolecule/borzoi-mouse: mouse head.
Model Specification¶
| Input Length | Bin Size | Output Bins | Hidden Size | Layers | Heads | Num Labels | Num Parameters (M) | FLOPs (P) | MACs (P) |
|---|---|---|---|---|---|---|---|---|---|
| 524288 | 32 | 6144 | 1536 | 8 | 8 | 7611 | 185.90 | 13.57 | 6.76 |
The table reports the human output head. The mouse head predicts 2,608 tracks. FLOPs and MACs are measured on the canonical 524,288 bp Borzoi input window.
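Because every bin carries thousands of tracks, the output tensor is large. The following sketch estimates its size from the numbers in the table, assuming the output is stored as float32 (an assumption; the actual dtype depends on how the model is loaded). The function name is hypothetical.

```python
TARGET_LENGTH = 6_144
TRACKS = {"human": 7_611, "mouse": 2_608}
BYTES_PER_FLOAT32 = 4  # assumption: float32 output


def output_mebibytes(species: str, batch_size: int = 1) -> float:
    """Size in MiB of one (batch_size, target_length, num_tracks) float32 tensor."""
    values = batch_size * TARGET_LENGTH * TRACKS[species]
    return values * BYTES_PER_FLOAT32 / 2**20


for species in TRACKS:
    print(f"{species}: {output_mebibytes(species):.1f} MiB per sequence")
```

With these assumptions a single human-head prediction takes roughly 178 MiB, which is worth keeping in mind when choosing a batch size.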
Links¶
- Code: multimolecule.borzoi
- Data: ENCODE, GTEx, FANTOM5 RNA-seq / CAGE / ATAC-seq / DNase-seq / ChIP-seq tracks aligned to human and mouse genomes
- Paper: Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation
- Developed by: Johannes Linder, Divyanshi Srivastava, Han Yuan, Vikram Agarwal, David R. Kelley
- Model type: Convolutional stem followed by Transformer trunk and U-Net upsampling tail for binned multi-track RNA-seq and chromatin coverage prediction
- Original Repository: calico/borzoi
Usage¶
The model file depends on the multimolecule library. You can install it using pip: `pip install multimolecule`.
Direct Use¶
Genomic Coverage Prediction¶
You can use this model to predict binned RNA-seq and chromatin coverage tracks from a DNA sequence.
The binned positional axis is treated as the “token” axis: each output position corresponds to one genomic bin rather than a single nucleotide. The species configuration option selects the human (7,611 tracks) or mouse (2,608 tracks) output head.
Interface¶
- Input length: fixed 524,288 bp DNA window
- Output binning: 32 bp per output bin; 6,144 output bins per window (after center-cropping the U-Net upsampling tail)
- Species head: select `human` (7,611 tracks) or `mouse` (2,608 tracks) via the `species` config option
- Output: `(batch_size, target_length, num_tracks)`
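To relate an output bin back to the genome, the interface numbers can be turned into a small coordinate helper. This is a sketch that assumes a symmetric center crop and 0-based, half-open coordinates; `bin_to_interval` and `position_to_bin` are hypothetical helpers, not multimolecule functions.

```python
from typing import Optional, Tuple

SEQUENCE_LENGTH = 524_288
BIN_SIZE = 32
TARGET_LENGTH = 6_144
# Assumption: symmetric center crop, so this many bins are dropped on the left.
CROP_BINS = (SEQUENCE_LENGTH // BIN_SIZE - TARGET_LENGTH) // 2


def bin_to_interval(window_start: int, bin_index: int) -> Tuple[int, int]:
    """Return the half-open [start, end) genomic interval covered by an output bin."""
    if not 0 <= bin_index < TARGET_LENGTH:
        raise IndexError(f"bin_index must be in [0, {TARGET_LENGTH})")
    start = window_start + (CROP_BINS + bin_index) * BIN_SIZE
    return start, start + BIN_SIZE


def position_to_bin(window_start: int, position: int) -> Optional[int]:
    """Return the output bin containing a position, or None if it lies in a cropped flank."""
    index = (position - window_start) // BIN_SIZE - CROP_BINS
    return index if 0 <= index < TARGET_LENGTH else None


print(bin_to_interval(window_start=0, bin_index=0))
print(position_to_bin(window_start=0, position=262_144))
```

Positions in the outer flanks of the window map to `None` because no output bin is reported for them.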
Training Details¶
Borzoi was trained to predict bulk RNA-seq coverage together with chromatin tracks (DNase-seq, ATAC-seq, ChIP-seq) and CAGE from the human and mouse reference genomes.
Training Data¶
The model was trained on a large compendium of functional genomics experiments aligned to the human (hg38) and mouse (mm10) reference genomes. The genome was divided into 524 kb windows; for each window the per-32-bp coverage of every experiment served as the regression target. The training set is dominated by RNA-seq coverage (the modality Borzoi extends over Enformer); the remaining tracks include the chromatin and CAGE modalities used by Enformer.
Training Procedure¶
Pre-training¶
The model was trained to minimize a Poisson-multinomial regression loss between predicted and observed coverage, using a softplus output activation to keep the predicted coverage non-negative. Training used the Adam optimizer with a warmup schedule and global gradient-norm clipping; reverse-complement and small genomic-shift data augmentations were applied during training.
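The objective described above can be sketched for a single track as follows. This is one common way to write a Poisson-multinomial loss: a Poisson term on the total coverage plus a multinomial term on how that total is distributed across bins. The `total_weight` and `eps` parameters, the per-bin normalization, and the function names are assumptions for illustration, not values taken from the Borzoi training code.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); the result is never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def poisson_multinomial_loss(
    pred: Sequence[float],
    target: Sequence[float],
    total_weight: float = 1.0,  # assumed relative weight of the Poisson term
    eps: float = 1e-7,          # assumed smoothing to avoid log(0)
) -> float:
    """Poisson loss on total coverage plus multinomial loss on its shape across bins."""
    n_bins = len(pred)
    smoothed = [p + eps for p in pred]
    pred_total = sum(smoothed)
    target_total = sum(target)
    # Poisson negative log-likelihood of the total (constant term dropped).
    poisson_term = pred_total - target_total * math.log(pred_total)
    # Multinomial negative log-likelihood of the per-bin proportions.
    multinomial_term = -sum(
        t * math.log(p / pred_total) for p, t in zip(smoothed, target)
    )
    return (multinomial_term + total_weight * poisson_term) / n_bins


raw_outputs = [-2.0, 0.5, 1.5, -0.3]            # unconstrained model outputs
coverage = [softplus(x) for x in raw_outputs]   # non-negative predicted coverage
observed = [0.0, 2.0, 5.0, 1.0]
print(poisson_multinomial_loss(coverage, observed))
```

Splitting the loss this way lets the total amount of coverage and its distribution along the window be weighted separately.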
Citation¶
Note
The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project.
Contact¶
Please use GitHub issues of MultiMolecule for any questions or comments on the model card.
Please contact the authors of the Borzoi paper for questions or comments on the paper/model.
License¶
This model implementation is licensed under the GNU Affero General Public License.
For additional terms and clarifications, please refer to our License FAQ.
API Reference¶
BorzoiConfig¶
Bases: PreTrainedConfig
This is the configuration class to store the configuration of a
BorzoiModel. It is used to instantiate a Borzoi model according to the
specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a
configuration that reproduces the upstream Borzoi human architecture
(calico/borzoi, examples/params_pred.json).
Configuration objects inherit from PreTrainedConfig and can be used to
control the model outputs. Read the documentation from PreTrainedConfig
for more information.
Borzoi is the successor of Enformer. It extends the Enformer recipe (convolution stem + Transformer trunk +
binned multi-track output) to a 524,288 bp input window and 32 bp output bins, and adds a U-Net style
upsampling tail so the binned positional axis matches a higher-resolution coverage prediction. A long DNA
window of sequence_length base pairs is downsampled by a convolution stem and a width-growing residual
convolution tower, projected to hidden_size channels by a U-Net bottleneck, processed by the Transformer
trunk with Transformer-XL style relative positional encoding, then upsampled by two skip-connected U-Net
stages with depthwise-separable convolutions, center-cropped to target_length bins, and projected to
per-species coverage tracks with a softplus activation. The output has shape
(batch_size, target_length, num_labels).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| | int | Vocabulary size of the Borzoi model. Defines the number of input feature channels derived from the MultiMolecule DNA token order. Defaults to 5 (…). | 5 |
| | int | The length, in base pairs, of the input DNA window. Defaults to 524288 (= 2^19 bp, about 524 kb). | 524288 |
| | int | Dimensionality of the Transformer trunk and the U-Net upsampling tail. | 1536 |
| | int | Number of Transformer blocks in the trunk. | 8 |
| | int | Number of attention heads in each Transformer block. | 8 |
| | int | Dimensionality of the query/key projection per head. | 64 |
| | int | Dimensionality of the value projection per head. Borzoi uses a larger value dim than key dim. | 192 |
| | int | Number of relative positional features used by the Transformer-XL style attention. | 32 |
| | int | Number of channels produced by the first (stem) convolution. | 512 |
| | int | Kernel size of the first (stem) convolution. | 15 |
| | list[int] \| None | Explicit per-stage output channel schedule of the reducing convolution tower. Borzoi grows the width as … | None |
| | int | Kernel size used by every convolution in the reducing tower. | 5 |
| | int | Kernel size of the depthwise-separable convolutions in the U-Net upsampling tail. | 3 |
| | int | Channel count of the final pointwise convolution block feeding the per-species track head. | 1920 |
| | str | The non-linear activation used throughout the convolution blocks. Borzoi uses the tanh-approximation GELU (…). | 'gelu_new' |
| | str | Activation applied to the per-track predictions. Borzoi applies … | 'softplus' |
| | float | Dropout probability of the final pointwise convolution block. | 0.1 |
| | float | Dropout probability applied inside the Transformer feed-forward sublayer. | 0.2 |
| | float | Dropout probability applied to the attention matrix. | 0.05 |
| | float | Dropout probability applied to the relative positional features. | 0.01 |
| | float | Epsilon used by the batch normalization layers. | 0.001 |
| | float | Momentum used by the batch normalization layers (PyTorch convention; upstream Keras momentum 0.9 corresponds to PyTorch momentum 0.1). | 0.1 |
| | str | Output head to expose downstream. Borzoi is trained with two species heads; the selected head determines … | 'human' |
| | int | Number of output bins kept after center-cropping the U-Net output. Defaults to 6144 (the …). | 6144 |
| | int \| None | Number of genomic coverage tracks predicted per bin. Defaults to the track count of the selected … | None |
| | HeadConfig \| None | Head configuration for the binned track prediction head. | None |
| | bool | Whether to output the context vectors for each trunk block. | False |
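A few relationships between the documented defaults can be checked with plain arithmetic. The sketch below uses made-up constant names (the actual BorzoiConfig field names are not reproduced here) and relies only on the default values and the Keras-to-PyTorch momentum note from the table.

```python
# Hypothetical names; values are the documented defaults.
NUM_HEADS = 8
KEY_DIM_PER_HEAD = 64
VALUE_DIM_PER_HEAD = 192
HIDDEN_SIZE = 1536

# Total width of the query/key and value projections across all heads.
query_key_width = NUM_HEADS * KEY_DIM_PER_HEAD
value_width = NUM_HEADS * VALUE_DIM_PER_HEAD


def keras_to_pytorch_momentum(keras_momentum: float) -> float:
    """Convert a batch-norm momentum between the two conventions.

    Keras:   running = m * running + (1 - m) * batch_stat
    PyTorch: running = (1 - m) * running + m * batch_stat
    """
    return 1.0 - keras_momentum


print(query_key_width, value_width, value_width == HIDDEN_SIZE)
print(round(keras_to_pytorch_momentum(0.9), 6))
```

With the defaults, the concatenated value heads have the same width as the hidden size, while the query/key projections are three times narrower.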
Source code in multimolecule/models/borzoi/configuration_borzoi.py
num_downsamples property¶
num_downsamples: int
Number of 2x downsampling stages: stem + tower + U-Net bottleneck + final pool.
pool_factor property¶
pool_factor: int
Total downsampling factor at the transformer trunk, i.e. base pairs per attention position.
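How the two properties relate can be sketched as follows, assuming every downsampling stage halves the length (so that pool_factor equals 2 ** num_downsamples) and that each of the two U-Net upsampling stages doubles it again. Both are assumptions drawn from the descriptions on this page rather than from the source code, and the function names are illustrative.

```python
def trunk_pool_factor(bin_size: int = 32, num_upsamples: int = 2) -> int:
    """Base pairs per attention position, assuming each U-Net stage upsamples by 2x."""
    return bin_size * 2**num_upsamples


def downsamples_for(pool_factor: int) -> int:
    """Number of 2x stages that produce the given factor (must be a power of two)."""
    if pool_factor <= 0 or pool_factor & (pool_factor - 1):
        raise ValueError("pool_factor must be a positive power of two")
    return pool_factor.bit_length() - 1


pool_factor = trunk_pool_factor()
num_downsamples = downsamples_for(pool_factor)
attention_positions = 524_288 // pool_factor

print(pool_factor, num_downsamples, attention_positions)
```

Under these assumptions the Transformer trunk attends over 4,096 positions of 128 bp each, which is what keeps self-attention tractable on a 524,288 bp window.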
BorzoiForTokenPrediction¶
Bases: BorzoiPreTrainedModel
Borzoi with a pointwise regression head over genomic coverage tracks.
The binned positional axis is treated as the “token” axis: logits have shape
(batch_size, target_length, num_labels) where num_labels is the number of coverage tracks
of the configured species head.
Source code in multimolecule/models/borzoi/modeling_borzoi.py
BorzoiModel¶
Bases: BorzoiPreTrainedModel
The bare Borzoi backbone. Consumes a long DNA window and returns binned hidden states.
The architecture follows the upstream Borzoi trunk: a pre-activation convolution stem with attention-pool
downsampling, a width-growing residual convolution tower, a U-Net bottleneck pool, a Transformer trunk
with Transformer-XL style relative positional encoding, two U-Net upsampling stages with depthwise-separable
convolutions, and a center-crop. The positional axis of the output is binned: a window of
config.sequence_length base pairs is downsampled and then re-upsampled, and last_hidden_state has shape
(batch_size, target_length, head_hidden_size).
Source code in multimolecule/models/borzoi/modeling_borzoi.py
BorzoiPreTrainedModel¶
Bases: PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.