bpRNA-spot

bpRNA-spot is a database of single-molecule secondary structures annotated using bpRNA.

bpRNA-spot is a subset of bpRNA-1m. It is built by applying CD-HIT (CD-HIT-EST) to bpRNA-1m to remove sequences with more than 80% sequence similarity. The remaining sequences are then randomly split into training, validation, and test sets at a ratio of approximately 8:1:1.
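The random 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the procedure used by the bpRNA-spot authors: the function name `split_ids`, the seed, and the `seq_*` identifiers are all hypothetical, and the input is assumed to be the list of sequences that survived the CD-HIT-EST filtering step.

```python
import random


def split_ids(ids, ratios=(8, 1, 1), seed=0):
    """Shuffle ids and cut them into train/validation/test by the given ratio.

    Hypothetical helper for illustration; not part of any released code.
    """
    ids = list(ids)
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    rng.shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Hypothetical identifiers standing in for the filtered sequences.
    example_ids = [f"seq_{i}" for i in range(1000)]
    train, val, test = split_ids(example_ids)
    print(len(train), len(val), len(test))  # prints: 800 100 100
```

Because the split sizes are computed with integer division, the ratio is only approximately 8:1:1 when the number of sequences is not a multiple of 10, which matches the "approximately" in the description above.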

Disclaimer

This is an UNOFFICIAL release of bpRNA-spot by Jaswinder Singh, et al.

The team releasing bpRNA-spot did not write a dataset card for it, so this dataset card was written by the MultiMolecule team.

Dataset Description

  • bpRNA-1m: A database of single-molecule secondary structures annotated using bpRNA.
  • bpRNA-new: A dataset of newly discovered RNA families from Rfam 14.2, designed for cross-family validation to assess generalization capability.
  • RNAStrAlign: A database of RNA secondary structures with the same families as ArchiveII, usually used for training.

License

This dataset is licensed under the GNU Affero General Public License.

For additional terms and clarifications, please refer to our License FAQ.

SPDX-License-Identifier: AGPL-3.0-or-later

Citation

BibTeX
@article{singh2019rna,
  author    = {Singh, Jaswinder and Hanson, Jack and Paliwal, Kuldip and Zhou, Yaoqi},
  journal   = {Nature Communications},
  month     = nov,
  number    = 1,
  pages     = {5407},
  publisher = {Springer Science and Business Media LLC},
  title     = {{RNA} secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning},
  volume    = 10,
  year      = 2019
}

@article{darty2009varna,
  author    = {Darty, K{\'e}vin and Denise, Alain and Ponty, Yann},
  journal   = {Bioinformatics},
  month     = aug,
  number    = 15,
  pages     = {1974--1975},
  publisher = {Oxford University Press (OUP)},
  title     = {{VARNA}: Interactive drawing and editing of the {RNA} secondary structure},
  volume    = 25,
  year      = 2009
}

@article{berman2000protein,
  author    = {Berman, H M and Westbrook, J and Feng, Z and Gilliland, G and Bhat, T N and Weissig, H and Shindyalov, I N and Bourne, P E},
  journal   = {Nucleic Acids Research},
  month     = jan,
  number    = 1,
  pages     = {235--242},
  publisher = {Oxford University Press (OUP)},
  title     = {The Protein Data Bank},
  volume    = 28,
  year      = 2000
}

Note

The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:

BibTeX
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}