
bpRNA-new

bpRNA-new is a database of single-molecule RNA secondary structures annotated using bpRNA.
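The code below is a minimal sketch of what working with these structures involves. It parses a dot-bracket string into base pairs and groups stacked pairs into stems. It assumes the structures are available in dot-bracket notation, with `[]`, `{}` and `<>` marking pseudoknotted pairs. The function names are hypothetical. bpRNA itself annotates much more than stems, including hairpins, bulges, internal loops and multiloops.

```python
from typing import List, Tuple

# Closing bracket -> opening bracket. The extra bracket types are commonly
# used to encode pseudoknotted pairs (assumption about the input format).
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based base pairs (i, j) with i < j, sorted by i."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(pos)  # remember where this pair opens
        elif char in BRACKETS:
            opener = BRACKETS[char]
            if not stacks[opener]:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    leftover = [p for stack in stacks.values() for p in stack]
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at {sorted(leftover)}")
    return sorted(pairs)


def find_stems(pairs: List[Tuple[int, int]]) -> List[List[Tuple[int, int]]]:
    """Group stacked pairs (i, j), (i+1, j-1), ... into stems.

    Expects pairs sorted by i, as returned by parse_dot_bracket.
    """
    stems: List[List[Tuple[int, int]]] = []
    for i, j in pairs:
        # The pair stacks on the previous one if it sits directly inside it.
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems
```

For example, `find_stems(parse_dot_bracket("((..((...))..))"))` yields two stems of two pairs each.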

It is drawn from RNA families in Rfam 14.2 and is designed for cross-family validation, which tests how well a model generalizes. Because its families are distinct from those in bpRNA-1m, it provides a robust benchmark for evaluating model performance on unseen RNA families.
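Cross-family validation depends on no family appearing in both the training and evaluation data. The hypothetical sketch below shows one way to build such a family-disjoint split from your own data, along with a helper that reports any shared families. The `(seq_id, family)` record layout and the function names are assumptions for illustration. bpRNA-new was not produced by this random split. Its families were chosen to be distinct from those in bpRNA-1m.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def family_disjoint_split(
    records: Iterable[Tuple[str, str]], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str], Set[str]]:
    """Split (seq_id, family) records so that no family is in both splits."""
    by_family = defaultdict(list)
    for seq_id, family in records:
        by_family[family].append(seq_id)
    # Shuffle whole families, not individual sequences.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = [s for f in families if f not in test_families for s in by_family[f]]
    test = [s for f in families if f in test_families for s in by_family[f]]
    return train, test, test_families


def shared_families(train_families: Iterable[str], test_families: Iterable[str]) -> Set[str]:
    """Families present in both sets; empty for a valid cross-family split."""
    return set(train_families) & set(test_families)
```

Splitting at the family level, rather than at the sequence level, is what prevents near-identical homologs from leaking between the training and evaluation sets.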

Disclaimer

This is an UNOFFICIAL release of bpRNA-new by Kengo Sato et al.

The team releasing bpRNA-new did not write a dataset card for it, so this dataset card has been written by the MultiMolecule team.

Related Datasets

  • bpRNA-1m: A database of single-molecule RNA secondary structures annotated using bpRNA.
  • bpRNA-spot: A subset of bpRNA-1m produced by applying CD-HIT (CD-HIT-EST) to remove sequences with more than 80% sequence similarity.
  • ArchiveII: A database of RNA secondary structures covering the same families as RNAStrAlign, usually used for testing.
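The redundancy filtering used for bpRNA-spot can be approximated by greedy incremental clustering. Sequences are processed longest first, and each one is kept only if it is at most 80% identical to every sequence already kept. The sketch below is a simplified stand-in for CD-HIT-EST, not its implementation. It uses a plain Needleman-Wunsch alignment with unit scores. It measures identity as identical aligned positions divided by the length of the shorter sequence, which is roughly CD-HIT's default global identity. `global_identity` and `greedy_filter` are hypothetical names.

```python
from typing import List, Tuple


def global_identity(a: str, b: str) -> float:
    """Identical positions in a global alignment / length of the shorter sequence.

    Unit scores (match +1, mismatch -1, gap -1) are a simplification; CD-HIT-EST
    uses word filtering and banded alignment with its own scoring.
    """
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # score[i][j]: best score aligning a[:i] with b[:j];
    # ident[i][j]: identical positions along that best alignment.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ident = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = int(a[i - 1] == b[j - 1])
            # max() on (score, identity) tuples breaks score ties by higher identity.
            score[i][j], ident[i][j] = max(
                (score[i - 1][j - 1] + (1 if same else -1), ident[i - 1][j - 1] + same),
                (score[i - 1][j] - 1, ident[i - 1][j]),  # gap in b
                (score[i][j - 1] - 1, ident[i][j - 1]),  # gap in a
            )
    return ident[n][m] / min(n, m)


def greedy_filter(
    seqs: List[Tuple[str, str]], threshold: float = 0.8
) -> List[Tuple[str, str]]:
    """Keep a (name, sequence) only if it is <= threshold identical to all kept ones."""
    kept: List[Tuple[str, str]] = []
    # Longest first, as in CD-HIT's greedy incremental clustering; sort is stable.
    for name, seq in sorted(seqs, key=lambda item: len(item[1]), reverse=True):
        if all(global_identity(seq, rep) <= threshold for _, rep in kept):
            kept.append((name, seq))
    return kept
```

Every new sequence is aligned against every kept one, so this costs quadratic time in the number of sequences and is only practical for small sets. The word-based filtering in CD-HIT exists to avoid most of those alignments.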

License

This dataset is licensed under the AGPL-3.0 License.

SPDX-License-Identifier: AGPL-3.0-or-later

Citation

@article{sato2021rna,
  author    = {Sato, Kengo and Akiyama, Manato and Sakakibara, Yasubumi},
  journal   = {Nature Communications},
  month     = feb,
  number    = 1,
  pages     = {941},
  publisher = {Springer Science and Business Media LLC},
  title     = {{RNA} secondary structure prediction using deep learning with thermodynamic integration},
  volume    = 12,
  year      = 2021
}