This is a tracking issue for dataloader improvements. The current support is very basic, and we likely need to make some bigger changes to make it more efficient:
- Track dataloader step counts on a per-replica_id basis.
- Add a mechanism for reinstantiating the dataloader from a checkpoint and fast-forwarding to the correct step count.
- Or throw all of this out and use a deterministic index managed by Lighthouse?
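To make the items above concrete, here is a rough sketch of what they could look like. Everything in it is an assumption for discussion: `ReplicaStepTracker`, `fast_forward`, and `deterministic_indices` are hypothetical names, not existing APIs, and the index scheme is just one simple way to derive batch indices from a global step without any dataloader state.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class ReplicaStepTracker:
    """Hypothetical helper: counts batches consumed per replica_id."""

    def __init__(self) -> None:
        self._steps: Dict[str, int] = {}

    def record_step(self, replica_id: str) -> None:
        # Call once per batch consumed by the given replica.
        self._steps[replica_id] = self._steps.get(replica_id, 0) + 1

    def steps_for(self, replica_id: str) -> int:
        return self._steps.get(replica_id, 0)

    def state_dict(self) -> Dict[str, int]:
        # Plain dict so it can be stored alongside a model checkpoint.
        return dict(self._steps)

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self._steps = dict(state)


def fast_forward(batches: Iterable[T], steps: int) -> Iterator[T]:
    """Re-create the iterator and skip the first `steps` batches."""
    return islice(iter(batches), steps, None)


def deterministic_indices(
    step: int, rank: int, world_size: int, batch_size: int, dataset_len: int
) -> List[int]:
    """Assumed scheme: sample indices are a pure function of the global step,
    so no per-replica dataloader state needs to be checkpointed."""
    start = (step * world_size + rank) * batch_size
    return [(start + i) % dataset_len for i in range(batch_size)]


# Usage: consume 3 batches, checkpoint, then resume from a fresh iterable.
data = list(range(10))
tracker = ReplicaStepTracker()
for _batch in islice(data, 3):
    tracker.record_step("replica_0")

restored = ReplicaStepTracker()
restored.load_state_dict(tracker.state_dict())
resumed = fast_forward(data, restored.steps_for("replica_0"))
first_after_resume = next(resumed)
```

The fast-forward approach costs time proportional to the number of skipped batches, which is the main argument for the deterministic-index alternative.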
We do have a flag, “snapshot_every_n_steps”, that only updates the checkpoint every N steps (say, every 10). There is also a counter in there, so if you request a checkpoint at step 15, it will load the snapshot from step 10 and then throw away 5 batches to recover the state.
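The arithmetic behind that behaviour can be sketched as follows. This is only an illustration of the description above, not the actual implementation: `plan_resume` is a hypothetical helper, and only the name `snapshot_every_n_steps` comes from the flag mentioned here.

```python
from typing import Tuple


def plan_resume(requested_step: int, snapshot_every_n_steps: int) -> Tuple[int, int]:
    """Return (snapshot_step, batches_to_discard) for a requested step.

    The snapshot step is the latest multiple of `snapshot_every_n_steps`
    that is not past the requested step; the remainder is replayed and
    thrown away to recover the exact state.
    """
    if snapshot_every_n_steps < 1:
        raise ValueError("snapshot_every_n_steps must be >= 1")
    if requested_step < 0:
        raise ValueError("requested_step must be >= 0")
    snapshot_step = (requested_step // snapshot_every_n_steps) * snapshot_every_n_steps
    return snapshot_step, requested_step - snapshot_step


# The example from the comment: snapshots every 10 steps, resume at step 15.
snapshot_step, to_discard = plan_resume(15, 10)
```

With these inputs the plan is to load the step-10 snapshot and discard 5 batches, so the replay cost is bounded by `snapshot_every_n_steps - 1` batches.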