data

data provides a collection of utilities for loading and processing data.

While 🤗 datasets is a powerful library for managing datasets, it is a general-purpose tool and may not cover all the functionality that scientific applications need.

The data package is designed to complement datasets by offering additional data processing utilities that are commonly used in scientific tasks.

Usage

Load from local data file

Python
from multimolecule.data import Dataset

data = Dataset("data/rna/5utr.csv", split="train", pretrained="multimolecule/rna")
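To try the local-file path without real data, a small CSV can be generated first. The following is a minimal sketch using only the standard library. The column names `sequence` and `label`, the file name `toy.csv`, and the values are all hypothetical and chosen for illustration; this page does not state which columns `Dataset` expects.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names and toy values, chosen only for illustration.
ROWS = [
    {"sequence": "AUGGCUACGUAG", "label": 0.42},
    {"sequence": "GGCAUCGAUCGA", "label": 1.37},
]


def write_toy_csv(path):
    """Write ROWS to `path` as a CSV file with a header row."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sequence", "label"])
        writer.writeheader()
        writer.writerows(ROWS)
    return path


def read_rows(path):
    """Read the CSV back as a list of dicts (all values are strings)."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))


# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_toy_csv(Path(tmp) / "data" / "rna" / "toy.csv")
    rows = read_rows(csv_path)
    print(rows[0]["sequence"])
```

A file written this way could then be passed to `Dataset` in place of `"data/rna/5utr.csv"`, assuming its columns match what the loader expects.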

Load from 🤗 datasets

Python
from multimolecule.data import Dataset

data = Dataset("multimolecule/bprna-spot", split="train", pretrained="multimolecule/rna")