data

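data provides a collection of utilities for working with data.

Although 🤗 datasets is a powerful library for managing datasets, it is a general-purpose tool and may not cover every feature that scientific applications need.

The data package is designed to complement datasets by providing data-processing utilities that are commonly used in scientific tasks.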

Usage

Load from a local data file

Python
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

# For additional terms and clarifications, please refer to our License FAQ at:
# <https://multimolecule.danling.org/about/license-faq>.


from multimolecule.data import Dataset

data = Dataset("data/rna/5utr.csv", split="train", pretrained="multimolecule/rna")
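To try the local-file call without the original data, a small CSV file can be generated first. The sketch below uses only the Python standard library. The column names (`sequence`, `label`), the toy rows, and the file name `toy.csv` are assumptions made up for illustration; they are not the schema of `5utr.csv`, and the columns that `Dataset` actually expects should be checked against the library documentation.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names and invented toy rows, for illustration only.
# The real schema of data/rna/5utr.csv is not shown in this page.
FIELDNAMES = ["sequence", "label"]
ROWS = [
    {"sequence": "AUGGCUACGUAGC", "label": 0.0},
    {"sequence": "GGCUAAUCGCGAU", "label": 1.0},
]


def write_toy_csv(path: Path) -> Path:
    """Write ROWS to `path` as a CSV file with a header line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(ROWS)
    return path


def read_csv(path: Path) -> list[dict[str, str]]:
    """Read the CSV back as a list of row dictionaries (all values are strings)."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


# Write into a temporary directory so nothing in the working tree is touched.
workdir = Path(tempfile.mkdtemp())
csv_path = write_toy_csv(workdir / "data" / "rna" / "toy.csv")
records = read_csv(csv_path)
print(len(records), "rows written to", csv_path)
```

The resulting path could then be passed as the first argument in place of "data/rna/5utr.csv" in the call above, provided the columns match what the library expects.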

Load from 🤗 datasets

Python
from multimolecule.data import Dataset

data = Dataset("multimolecule/bprna-spot", split="train", pretrained="multimolecule/rna")