This page explains every field in the training section of the YAML config.
The SDK reads a single YAML file that contains both the data_generation and training sections. run_training(config_path=...) reads the training section from that file.

File format

Your YAML file is a run config that includes top-level keys and a nested training section:
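```yaml
# Top-level keys for the whole run. The data_generation section
# (not covered on this page) lives in this same file.
run_steps: ["data_generation", "training"]
pipeline_provider: "local"
metric_name: "custom"

training:
  metric: {}
  dataset: {}
  prompt_template: ""
  training: {}   # hyperparameters
  output: {}
  model: {}
```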

Configuration structure

The training section has these parts:
  • metric: output type and class list (if multi-class)
  • dataset: where to load the dataset created by data generation
  • prompt_template: the exact format the model is trained to follow
  • training: hyperparameters and performance options
  • output: where to write artifacts, plus optional pushes to the Hugging Face Hub, Luna Studio, or an object store
  • model: base model, LoRA settings, and model download options
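A quick way to confirm a config has all six parts is to load it and look for each key. The sketch below works on the config as the nested dict a YAML parser would produce; `missing_training_parts` is a hypothetical helper, not part of the SDK, and the SDK's own preflight may check more.

```python
# The six parts of the training section, as listed above.
EXPECTED_PARTS = ("metric", "dataset", "prompt_template", "training", "output", "model")


def missing_training_parts(run_config: dict) -> list[str]:
    """Return the expected parts that are absent from run_config["training"]."""
    section = run_config.get("training", {})
    return [part for part in EXPECTED_PARTS if part not in section]


# The skeleton from "File format", written as a parsed dict.
run_config = {
    "run_steps": ["data_generation", "training"],
    "pipeline_provider": "local",
    "metric_name": "custom",
    "training": {
        "metric": {},
        "dataset": {},
        "prompt_template": "",
        "training": {},
        "output": {},
        "model": {},
    },
}

print(missing_training_parts(run_config))  # []
```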

metric

| Field | Type | Required | Default (base config) | Notes |
|---|---|---|---|---|
| metric.type | string | Yes | "" | One of boolean, multi_class. |
| metric.classes | list[string] or null | Conditionally | [] | Required when type is multi_class. Order defines the class index mapping. |
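For multi_class metrics, the model answers with a class-key token ("0", "1", …), and the order of metric.classes defines which class each key means. Assuming indices start at 0 and follow list order, the mapping can be sketched as below; `class_key_map` and the class names are hypothetical.

```python
def class_key_map(metric: dict) -> dict[str, str]:
    """Map the class-key tokens the model emits ("0", "1", ...) to class names."""
    if metric.get("type") != "multi_class":
        raise ValueError("class keys only apply when metric.type is multi_class")
    classes = metric.get("classes") or []
    if not classes:
        raise ValueError("metric.classes is required when metric.type is multi_class")
    # List order defines the index: classes[0] -> "0", classes[1] -> "1", ...
    return {str(index): name for index, name in enumerate(classes)}


metric = {"type": "multi_class", "classes": ["negative", "neutral", "positive"]}
print(class_key_map(metric))  # {'0': 'negative', '1': 'neutral', '2': 'positive'}
```

Reordering metric.classes changes the meaning of every key, so keep the list stable between training and evaluation.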

dataset

| Field | Type | Required | Default (base config) | Notes |
|---|---|---|---|---|
| dataset.name | string | Yes | "" | For Hub datasets: org/repo. For local datasets: typically the generated dataset folder name. |
| dataset.split | string | Yes | "train" | Training split. |
| dataset.test_split_name | string | Yes | "test" | Holdout split. |
| dataset.train_label_column | string | Yes | "label" | Label column for training. |
| dataset.test_label_column | string | Yes | "label" | Label column for evaluation. |
| dataset.local | bool | Yes | false | When true, load from disk using dataset.name as a path-like identifier. |
| dataset.local_path | string | Yes | "" | Base path where local datasets are stored or extracted to. |
| dataset.pull_from_object_store | bool | No | false | Download the dataset from the object store before training. |
| dataset.object_store_bucket | string or null | If object store | "${LUNA_OBJECT_STORE_BUCKET:-luna-ft}" | Required when pull_from_object_store is true. |
| dataset.object_store_blob | string or null | If object store | "generated_data" | Required when pull_from_object_store is true. |
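The bucket default uses shell-style ${VAR:-default} syntax. Assuming the SDK expands it the way a POSIX shell does (use LUNA_OBJECT_STORE_BUCKET when it is set and non-empty, otherwise fall back to luna-ft), resolution behaves like this hypothetical helper:

```python
import os
import re

# Matches ${NAME:-fallback}; NAME follows the usual shell variable rules.
_DEFAULT_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def expand_env_defaults(value: str, env=None) -> str:
    """Expand ${NAME:-fallback}: use NAME if set and non-empty, else the fallback."""
    env = os.environ if env is None else env
    return _DEFAULT_PATTERN.sub(lambda m: env.get(m.group(1)) or m.group(2), value)


bucket = "${LUNA_OBJECT_STORE_BUCKET:-luna-ft}"
print(expand_env_defaults(bucket, {}))  # luna-ft
print(expand_env_defaults(bucket, {"LUNA_OBJECT_STORE_BUCKET": "team-bucket"}))  # team-bucket
```

In practice, exporting LUNA_OBJECT_STORE_BUCKET once redirects every bucket field that keeps this default.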

Point training at the generated dataset

run_data_generation(...) returns a dataset path or name. After generation completes, set training.dataset.name to that value, then run training.
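If you drive both steps from Python, the hand-off amounts to writing one value into the nested config. This sketch works on an already-parsed config dict; `point_training_at` and the dataset name are hypothetical, and writing the dict back to YAML before calling run_training(config_path=...) is left out.

```python
import copy


def point_training_at(run_config: dict, dataset_name: str) -> dict:
    """Return a copy of run_config whose training.dataset.name is dataset_name."""
    updated = copy.deepcopy(run_config)
    dataset = updated.setdefault("training", {}).setdefault("dataset", {})
    dataset["name"] = dataset_name
    return updated


# Stand-in for the value returned by run_data_generation(...).
generated = "my_generated_dataset"
config = {"training": {"dataset": {"name": "", "split": "train"}}}
config = point_training_at(config, generated)
print(config["training"]["dataset"])  # {'name': 'my_generated_dataset', 'split': 'train'}
```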

prompt_template

prompt_template is required.
  • It must reference your dataset columns using {variable} placeholders.
  • For boolean metrics, the model should respond with "true" or "false".
  • For multi-class metrics, the model should respond with a single class-key token ("0", "1", …).
Tip: your template variables are validated against dataset columns during preflight.
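The preflight check can be approximated as follows, assuming placeholders follow Python str.format syntax (so a literal brace would need doubling as {{ or }}). The template text and column names are hypothetical examples for a boolean metric; the SDK's actual preflight may check more.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Collect the {variable} names referenced by a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_template(template: str, columns: list[str]) -> list[str]:
    """Return template variables that are not dataset columns (empty list means OK)."""
    return sorted(template_fields(template) - set(columns))


template = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Is the answer correct? Respond with true or false."
)
print(check_template(template, ["question", "answer", "label"]))  # []
print(check_template(template, ["question", "label"]))  # ['answer']
print(template.format(question="2 + 2?", answer="4"))
```

Note that the label column does not appear in the template: it is the target the model learns to produce, not an input.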

training (hyperparameters)

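| Field | Type | Required | Default (base config) |
|---|---|---|---|
| num_train_epochs | int | Yes | 5 |
| per_device_train_batch_size | int | Yes | 1 |
| gradient_accumulation_steps | int | Yes | 8 |
| learning_rate_multiplier | float | Yes | 1.0 |
| max_seq_length | int | Yes | 4096 |
| warmup_steps | int | Yes | 30 |
| weight_decay | float | Yes | 0.01 |
| max_grad_norm | float | Yes | 1.0 |
| logging_steps | int | Yes | 40 |
| torch_compile | bool | Yes | false |
| fp16 | bool | Yes | false |
| bf16 | bool | Yes | true |
| optim | string | Yes | "adamw_8bit" |
| lr_scheduler_type | string | Yes | "linear" |
| seed | int | Yes | 3407 |
| use_wandb | bool | Yes | true |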
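With the base-config defaults, each optimizer update sees per_device_train_batch_size × gradient_accumulation_steps = 1 × 8 = 8 examples per device. The sketch below turns that into a rough step count; the 1,000-example dataset and single-GPU setup are assumptions. It also mirrors a common rule (enforced by Hugging Face TrainingArguments, which this SDK may or may not use underneath) that fp16 and bf16 should not both be true.

```python
import math


def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer update across all devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def approx_total_steps(num_examples: int, num_train_epochs: int, batch: int) -> int:
    """Rough count of optimizer updates over the whole run."""
    return math.ceil(num_examples / batch) * num_train_epochs


def check_precision(fp16: bool, bf16: bool) -> None:
    """Reject enabling both mixed-precision modes at once."""
    if fp16 and bf16:
        raise ValueError("enable at most one of fp16 and bf16")


# Base-config defaults on a single GPU.
batch = effective_batch_size(1, 8)
steps = approx_total_steps(num_examples=1000, num_train_epochs=5, batch=batch)
check_precision(fp16=False, bf16=True)
print(batch, steps)  # 8 625
```

Under these assumptions, warmup_steps: 30 covers roughly the first 5% of the 625 updates.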

output

| Field | Type | Required | Default (base config) | Notes |
|---|---|---|---|---|
| output.model_name | string | Yes | "" | Artifact folder name and (optionally) Hub model name. |
| output.local_path | string | Yes | "data/" | Root directory for artifacts. |
| output.push_to_hub | bool | Yes | true | If true, pushes model artifacts to Hugging Face. Requires HF_TOKEN when enabled. |
| output.push_to_object_store | bool | No | false | Uploads an artifacts tarball to the configured object store. |
| output.object_store_bucket | string or null | If object store | "${LUNA_OBJECT_STORE_BUCKET:-luna-ft}" | Required when push_to_object_store is true. |
| output.object_store_blob | string or null | If object store | "finetuned_models" | Required when push_to_object_store is true. |

Environment variables for pushes

  • Hugging Face: HF_TOKEN (required when push_to_hub: true)
  • W&B: WANDB_API_KEY (optional; a warning is raised if it is missing while use_wandb: true)
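Those two rules can be checked before a run starts. In this hypothetical sketch, `check_push_env` takes the training section as a parsed dict; treating an omitted flag as its base-config default (true for both) is an assumption about how the SDK fills in missing keys.

```python
import os
import warnings


def check_push_env(training_section: dict, env=None) -> None:
    """Raise if HF_TOKEN is missing for Hub pushes; warn if WANDB_API_KEY is missing."""
    env = os.environ if env is None else env
    output = training_section.get("output", {})
    hyperparams = training_section.get("training", {})
    # Omitted flags fall back to the base-config default of true (assumption).
    if output.get("push_to_hub", True) and not env.get("HF_TOKEN"):
        raise RuntimeError("HF_TOKEN is required when output.push_to_hub is true")
    if hyperparams.get("use_wandb", True) and not env.get("WANDB_API_KEY"):
        warnings.warn("WANDB_API_KEY is not set; W&B logging may not work")


# Local-only run: no Hub push, no W&B, so no variables are needed.
check_push_env({"output": {"push_to_hub": False}, "training": {"use_wandb": False}}, {})
```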

model

| Field | Type | Required | Default (base config) | Notes |
|---|---|---|---|---|
| model.name | string | Yes | "meta-llama/Llama-3.2-3B-Instruct" | Base model repo id. |
| model.peft_model_map | map[string,string] | No | {} | Maps a base model → its LoRA weights repo id. |
| model.use_unsloth_tokenizer | bool | Yes | true | Use the Unsloth tokenizer when available. |
| model.lora_r | int | Yes | 16 | LoRA rank. |
| model.lora_alpha | int | Yes | 16 | LoRA alpha. |
| model.pull_model_from_object_store | bool | No | false | Download the base model and adapters from the object store. |
| model.model_object_store_bucket | string or null | If object store | "${LUNA_OBJECT_STORE_BUCKET:-luna-ft}" | Required when pull_model_from_object_store is true. |
| model.model_object_store_blob_base | string or null | If object store | "base_models" | Base model location. |
| model.model_object_store_blob_adapters | string or null | If object store | "adapters" | Adapters location. |
| model.model_local_path | string or null | No | "" | Local directory for models (also used for local-only runs). |
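Across the dataset, output, and model tables, every object-store field is conditionally required by a boolean flag in the same section. A hypothetical preflight check, treating null or empty strings as missing and using only the flag-to-field pairings from these tables, could look like this:

```python
# (section, flag, fields required when the flag is true), taken from the tables above.
OBJECT_STORE_RULES = [
    ("dataset", "pull_from_object_store", ["object_store_bucket", "object_store_blob"]),
    ("output", "push_to_object_store", ["object_store_bucket", "object_store_blob"]),
    ("model", "pull_model_from_object_store",
     ["model_object_store_bucket", "model_object_store_blob_base",
      "model_object_store_blob_adapters"]),
]


def missing_object_store_fields(training_section: dict) -> list[str]:
    """List dotted field names that an enabled object-store flag needs but that are empty."""
    missing = []
    for section_name, flag, fields in OBJECT_STORE_RULES:
        section = training_section.get(section_name, {})
        if not section.get(flag, False):
            continue  # flag off (base-config default): the fields are not required
        for field in fields:
            if not section.get(field):  # None or "" counts as missing
                missing.append(f"{section_name}.{field}")
    return missing


training_section = {
    "dataset": {"pull_from_object_store": True,
                "object_store_bucket": "luna-ft", "object_store_blob": None},
    "output": {"push_to_object_store": False},
    "model": {"pull_model_from_object_store": False},
}
print(missing_object_store_fields(training_section))  # ['dataset.object_store_blob']
```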