MultiMolecule
What are you trying to do?
MultiMolecule covers four common entry points: task-level prediction, model fine-tuning, direct use of pretrained models, and curated biological datasets.
Prediction
Run task-level predictions
Registered pipelines take a biological task name and input sequences and return structured predictions, with no manual model assembly.
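The idea of dispatching on a task name can be sketched in plain Python. The registry, the `register_pipeline` decorator, `get_pipeline`, and the toy `gc-content` task are hypothetical illustrations, not MultiMolecule's API; a real pipeline would run a pretrained model rather than count nucleotides.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping task names to per-sequence prediction functions.
_PIPELINES: Dict[str, Callable[[str], dict]] = {}


def register_pipeline(task: str):
    """Register a function as the handler for a task name (illustrative only)."""
    def decorator(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        _PIPELINES[task] = fn
        return fn
    return decorator


@register_pipeline("gc-content")
def gc_content(sequence: str) -> dict:
    # Toy stand-in for a model: fraction of G/C nucleotides in the sequence.
    seq = sequence.upper()
    gc = sum(base in "GC" for base in seq)
    return {"sequence": sequence, "gc_fraction": gc / len(seq) if seq else 0.0}


def get_pipeline(task: str) -> Callable[[List[str]], List[dict]]:
    """Look up a task by name and return a callable over a list of sequences."""
    if task not in _PIPELINES:
        raise KeyError(f"unknown task {task!r}; registered: {sorted(_PIPELINES)}")
    handler = _PIPELINES[task]
    return lambda sequences: [handler(s) for s in sequences]


predict = get_pipeline("gc-content")
results = predict(["GGUC", "AAUU"])
```

Keeping the lookup behind a single name means callers never touch tokenisers or checkpoints directly, and an unknown task fails early with the list of registered names.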
Training
Fine-tune pretrained models
The runner connects pretrained models with Hugging Face datasets or labelled local tables, using sequence and label columns to build task-aware batches.
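A simplified version of turning sequence and label columns into task-aware batches might look like this. The column names `sequence` and `label`, the label-type heuristic in `infer_task`, `build_batches`, and the `-` padding character are assumptions for illustration; they are not the runner's actual rules.

```python
from typing import Dict, List, Sequence

# Hypothetical labelled table: one dict per row, with sequence and label columns.
rows = [
    {"sequence": "ACGU", "label": "coding"},
    {"sequence": "GGC", "label": "noncoding"},
    {"sequence": "UUAGC", "label": "coding"},
]


def infer_task(labels: Sequence) -> str:
    """Guess the task type from label values (a simplified heuristic)."""
    if all(isinstance(x, float) for x in labels):
        return "regression"
    return "classification"


def build_batches(rows: List[Dict], batch_size: int, pad_char: str = "-") -> List[Dict]:
    labels = [r["label"] for r in rows]
    task = infer_task(labels)
    # For classification, map each distinct label to an integer class id.
    classes = sorted(set(labels)) if task == "classification" else []
    batches = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        width = max(len(r["sequence"]) for r in chunk)
        batches.append({
            "task": task,
            # Right-pad every sequence in the batch to the longest one.
            "sequences": [r["sequence"].ljust(width, pad_char) for r in chunk],
            "labels": [classes.index(r["label"]) if classes else r["label"] for r in chunk],
        })
    return batches


batches = build_batches(rows, batch_size=2)
```

A real training loop would pad token IDs rather than raw strings and pass an attention mask so that padded positions are ignored.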
Models
Use pretrained models
For control beyond what task pipelines offer, documented pretrained models can be used directly from Python. Model cards list checkpoint IDs, expected inputs, citations, and licenses.
Datasets
Use curated datasets
Curated biological datasets for benchmarks, examples, and fine-tuning come with sequence and label fields, task metadata, source information, citations, and licenses.
One stack underneath
All four entry points share the same underlying layers: documented resources, biological input handling, reusable model components, and execution tools for prediction, training, evaluation, and scripted use.
Execution
Task-level entry points
Pipelines provide ready-to-use task predictions, the runner manages supervised training and evaluation, and API entry points support scripts and applications.
Resources
Documented resources
Dataset cards and model cards collect supported inputs, task names, model checkpoints, citations, licenses, and training metadata.
Data layer
Biological files to trainable batches
IO reads biological sequence and structure formats, tokenisers encode molecules, and data abstractions infer task fields and prepare runner-ready batches.
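The path from a sequence file to model inputs can be sketched with a minimal FASTA reader and a character-level encoder. `read_fasta`, `encode`, the vocabulary, the special tokens, and the choice to map `T` to `U` are assumptions made for this sketch; MultiMolecule's IO and tokeniser modules define their own formats, vocabularies, and options.

```python
from typing import Dict, List, Tuple

FASTA_TEXT = """>seq1 example RNA
ACGU
GGA
>seq2
uuag
"""

# Hypothetical vocabulary; real tokenisers define their own ids and special tokens.
VOCAB: Dict[str, int] = {"<pad>": 0, "<cls>": 1, "<eos>": 2, "<unk>": 3,
                         "A": 4, "C": 5, "G": 6, "U": 7}


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (identifier, sequence) pairs; multi-line records are joined."""
    records, name, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            header = line[1:].split()
            # The identifier is the first whitespace-separated word of the header.
            name, parts = (header[0] if header else ""), []
        else:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def encode(sequence: str) -> List[int]:
    """Map nucleotides to ids, treating T as U and wrapping with <cls>/<eos>."""
    ids = [VOCAB["<cls>"]]
    for base in sequence.upper().replace("T", "U"):
        ids.append(VOCAB.get(base, VOCAB["<unk>"]))
    ids.append(VOCAB["<eos>"])
    return ids


records = read_fasta(FASTA_TEXT)
encoded = {name: encode(seq) for name, seq in records}
```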
io · tokenisers · data
Model layer
Reusable model components
Models provide pretrained configs, AutoModel classes, checkpoints, and output contracts; modules provide backbones, heads, losses, and embeddings for custom architectures.
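The split between backbones and heads can be illustrated without any deep-learning framework. `EmbeddingBackbone`, `SequenceRegressionHead`, `mse_loss`, and `CustomModel` are hypothetical stand-ins; real modules are neural-network layers with their own names and signatures.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class EmbeddingBackbone:
    """Toy backbone: looks up a fixed vector per token id (stands in for a pretrained encoder)."""
    table: List[Vector]

    def __call__(self, token_ids: Sequence[int]) -> List[Vector]:
        return [self.table[i] for i in token_ids]


@dataclass
class SequenceRegressionHead:
    """Toy head: mean-pools token vectors, then applies a linear layer to get one score."""
    weights: Vector
    bias: float

    def __call__(self, hidden: List[Vector]) -> float:
        dim = len(hidden[0])
        pooled = [sum(vec[d] for vec in hidden) / len(hidden) for d in range(dim)]
        return sum(w * x for w, x in zip(self.weights, pooled)) + self.bias


def mse_loss(prediction: float, target: float) -> float:
    """Squared error for a single prediction."""
    return (prediction - target) ** 2


@dataclass
class CustomModel:
    """Composes a backbone with a head: token ids -> per-token vectors -> one score."""
    backbone: EmbeddingBackbone
    head: SequenceRegressionHead

    def __call__(self, token_ids: Sequence[int]) -> float:
        return self.head(self.backbone(token_ids))


model = CustomModel(
    backbone=EmbeddingBackbone(table=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
    head=SequenceRegressionHead(weights=[2.0, 4.0], bias=0.5),
)
score = model([1, 2])
loss = mse_loss(score, target=3.0)
```

Because the model only assumes that the backbone returns per-token vectors and the head consumes them, either part can be swapped independently, which is the point of keeping them as separate components.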
Community
- Google Group: receive release announcements, migration notes, and design RFCs without following every issue.
- Discourse: ask which pipeline, model, or dataset fits a biological problem; share configs, request models, and discuss model components.