bpRNA-spot
bpRNA-spot is a database of single molecule secondary structures annotated using bpRNA.
bpRNA-spot is a subset of bpRNA-1m. It applies CD-HIT (CD-HIT-EST) to remove sequences with more than 80% sequence similarity from bpRNA-1m. The remaining sequences are then randomly split into training, validation, and test sets at a ratio of approximately 8:1:1.
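The random split described above can be illustrated with a minimal sketch. This is not the original authors' procedure: the function name `split_8_1_1`, the seed, and the placeholder sequence identifiers are assumptions made for illustration, and the CD-HIT-EST filtering step is assumed to have already been run.

```python
import random


def split_8_1_1(sequences, seed=0):
    """Randomly split items into train/validation/test at roughly 8:1:1.

    Hypothetical helper for illustration; not part of any released tooling.
    """
    items = list(sequences)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return {
        "train": items[:n_train],
        "validation": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],  # the remainder goes to the test set
    }


# Placeholder identifiers standing in for sequences that survived
# redundancy filtering (assumed, not real bpRNA-spot entries).
ids = [f"seq_{i}" for i in range(1000)]
splits = split_8_1_1(ids)
print({name: len(part) for name, part in splits.items()})
```

Because any remainder falls into the test set, the ratio is only approximately 8:1:1 when the number of sequences is not a multiple of ten.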
Disclaimer
This is an UNOFFICIAL release of bpRNA-spot by Jaswinder Singh, et al.
The team that released bpRNA-spot did not write this dataset card; it was written by the MultiMolecule team.
Dataset Description
- Homepage: https://multimolecule.danling.org/datasets/bprna-spot
- datasets: https://huggingface.co/datasets/multimolecule/bprna-spot
- Point of Contact: Kuldip Paliwal and Yaoqi Zhou
Related Datasets
- bpRNA-1m: A database of single molecule secondary structures annotated using bpRNA.
- bpRNA-new: A dataset of newly discovered RNA families from Rfam 14.2, designed for cross-family validation to assess generalization capability.
- RNAStrAlign: A database of RNA secondary structures covering the same families as ArchiveII, usually used for training.
License
This dataset is licensed under the GNU Affero General Public License.
For additional terms and clarifications, please refer to our License FAQ.
Citation
Note
The artifacts distributed in this repository are part of the MultiMolecule project. If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows: