# bpRNA-spot
bpRNA-spot is a database of single molecule secondary structures annotated using bpRNA.
bpRNA-spot is a subset of bpRNA-1m. It applies CD-HIT (CD-HIT-EST) to bpRNA-1m to remove sequences with more than 80% sequence similarity, then randomly splits the remaining sequences into training, validation, and test sets at a ratio of approximately 8:1:1.
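The two preprocessing steps above (redundancy filtering, then a random 8:1:1 split) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the bpRNA-spot authors' pipeline: the real filtering used CD-HIT-EST with alignment-based sequence identity, whereas this sketch substitutes `difflib.SequenceMatcher.ratio()` as a rough similarity proxy inside a CD-HIT-style greedy loop. The helper names (`similarity`, `greedy_filter`, `split_8_1_1`) and the fixed seed are hypothetical.

```python
import difflib
import random
from typing import Dict, List, Sequence


def similarity(a: str, b: str) -> float:
    """Rough similarity proxy (assumption): difflib's matching ratio,
    NOT the alignment-based identity that CD-HIT-EST computes."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_filter(seqs: Sequence[str], threshold: float = 0.8) -> List[str]:
    """Greedy incremental filtering in the spirit of CD-HIT:
    visit sequences longest first and keep a sequence as a new
    representative only if it is at most `threshold` similar to
    every representative kept so far."""
    reps: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(similarity(s, r) <= threshold for r in reps):
            reps.append(s)
    return reps


def split_8_1_1(items: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle with a fixed seed, then cut into train/validation/test at ~8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    # Toy sequences: the second differs from the first by one base, so it is dropped.
    kept = greedy_filter(["ACGUACGUAC", "ACGUACGUAA", "GGGGCCCCUU"])
    print(kept)
    splits = split_8_1_1([f"seq{i}" for i in range(100)])
    print({name: len(part) for name, part in splits.items()})
```

Processing the longest sequences first mirrors CD-HIT's design choice of letting longer sequences act as cluster representatives; the quadratic all-pairs comparison here is only suitable for toy inputs, which is why real pipelines rely on CD-HIT's indexed word filtering.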
## Disclaimer
This is an UNOFFICIAL release of the bpRNA-spot dataset by Jaswinder Singh et al.
The team releasing bpRNA-spot did not write a dataset card for this dataset, so this dataset card has been written by the MultiMolecule team.
## Dataset Description
- Homepage: https://multimolecule.danling.org/datasets/bprna-spot
- Datasets: https://huggingface.co/datasets/multimolecule/bprna-spot
- Point of Contact: Kuldip Paliwal and Yaoqi Zhou
## Related Datasets
- bpRNA-1m: A database of single molecule secondary structures annotated using bpRNA.
- bpRNA-new: A dataset of newly discovered RNA families from Rfam 14.2, designed for cross-family validation to assess generalization capability.
- RNAStrAlign: A database of RNA secondary structures from the same families as ArchiveII, usually used for training.
## License
This dataset is licensed under the AGPL-3.0 License.