Dataset Management for Training

Complexity: Intermediate Plus
Last updated: 2025-11-20

Manage training datasets across your infrastructure. The agent handles data staging, versioning, and distribution to training nodes.

What you'll accomplish

  • Stage datasets from cloud storage to training hosts
  • Track dataset versions and lineage
  • Distribute data shards across nodes for distributed training
  • Clean up stale datasets to free storage
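To make two of the goals above concrete, here is a minimal Python sketch of how version tracking and shard distribution could work in principle. It is not Skyportal's or SARA's actual implementation. The function names (dataset_version, assign_shards), the shard and node names, and the round-robin strategy are all assumptions chosen for illustration.

```python
import hashlib


def dataset_version(file_digests: dict[str, str]) -> str:
    """Derive a short version ID from per-file content digests.

    Hypothetical helper: file_digests maps a file name to a digest of its
    contents. Names are sorted so the result does not depend on dict order.
    """
    h = hashlib.sha256()
    for name in sorted(file_digests):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(file_digests[name].encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()[:12]


def assign_shards(shards: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Assign shards to nodes round-robin (an assumed, simple strategy).

    Shards are sorted first so the same inputs always give the same plan.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    for i, shard in enumerate(sorted(shards)):
        plan[nodes[i % len(nodes)]].append(shard)
    return plan
```

For example, assign_shards(["shard-0", "shard-1", "shard-2"], ["node-a", "node-b"]) places shard-0 and shard-2 on node-a and shard-1 on node-b. A changed file digest produces a different version ID, which is one simple way to detect that a staged dataset no longer matches the version a training run expects.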

Getting started

Configure your data sources in Skyportal, then ask SARA to prepare datasets for your next training run.