Resuming training from a checkpoint is currently quite hard: you need to hardcode a specific checkpoint path.
It is also buggy:
- A resumed run restarts its iteration count from 0, so it overwrites the existing checkpoints unless you point it at a different path
- wandb will also overwrite old snapshots with new snapshots
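
One possible direction, as a minimal sketch rather than the actual implementation: discover the latest checkpoint in a directory instead of hardcoding its path, and continue the iteration count from it so that new checkpoints get new file names. The `ckpt_<iteration>.pt` naming scheme and the helper names `find_latest_checkpoint`, `checkpoint_path`, and `starting_iteration` are hypothetical and not taken from the codebase. The sketch also assumes that a checkpoint numbered N is written after iteration N completes.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical naming scheme: ckpt_<iteration>.pt (an assumption, not the repo's).
CKPT_PATTERN = re.compile(r"^ckpt_(\d+)\.pt$")


def find_latest_checkpoint(ckpt_dir: Path) -> Optional[Tuple[int, Path]]:
    """Return (iteration, path) of the highest-numbered checkpoint, or None."""
    found = []
    if ckpt_dir.is_dir():
        for path in ckpt_dir.iterdir():
            match = CKPT_PATTERN.match(path.name)
            if match:
                found.append((int(match.group(1)), path))
    # Compare on the iteration number only.
    return max(found, key=lambda item: item[0]) if found else None


def checkpoint_path(ckpt_dir: Path, iteration: int) -> Path:
    """Build a per-iteration file name so later saves do not clobber earlier ones."""
    return ckpt_dir / f"ckpt_{iteration}.pt"


def starting_iteration(ckpt_dir: Path) -> int:
    """Return 0 for a fresh run, otherwise the iteration after the latest checkpoint."""
    latest = find_latest_checkpoint(ckpt_dir)
    return 0 if latest is None else latest[0] + 1
```

The training loop would then start at `starting_iteration(ckpt_dir)` rather than 0 and save to `checkpoint_path(ckpt_dir, iteration)`. If the wandb snapshots are currently uploaded under a fixed file name (an assumption), putting the iteration in the name may also keep them from colliding.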