Set `epoch_seed_change` attribute on SimulationDataset #840
c78a1c7 to 64cd184
Hey @srstevenson, we actually don't want to call StreamingDataset's constructor here. SimulationDataset is meant to run on a single process, unlike StreamingDataset, so some of the logic differs between the two init methods. The |
Thanks, @snarayan21, that makes sense. I've marked this PR as draft for now, and will update it to set |
This attribute was added to `StreamingDataset`, which `SimulationDataset` inherits from, so it also needs to be set here. Without it, the code attempts to access the missing attribute when running a simulation:

```
AttributeError: 'SimulationDataset' object has no attribute 'epoch_seed_change'
Traceback:
File "/home/scott/projects/streaming/.venv/lib64/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
             ^^^^^^
File "/home/scott/projects/streaming/.venv/lib64/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 579, in code_to_exec
    exec(code, module.__dict__)
File "/home/scott/projects/streaming/simulation/interfaces/sim_ui.py", line 409, in <module>
    submit_jobs(shuffle_quality, dataset, time_per_sample, node_internet_bandwidth,
File "/home/scott/projects/streaming/simulation/interfaces/sim_ui.py", line 110, in submit_jobs
    for output in gen_sim:
                  ^^^^^^^
File "/home/scott/projects/streaming/simulation/core/main.py", line 110, in simulate
    samples_per_node = dataset.get_samples_per_node(epoch, 0)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/scott/projects/streaming/simulation/core/sim_dataset.py", line 367, in get_samples_per_node
    partition = generate_work(self.batching_method, self, self.world, epoch, sample_in_epoch)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/scott/projects/streaming/streaming/base/batching/__init__.py", line 45, in generate_work
    return get(dataset, world, epoch, sample_in_epoch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/scott/projects/streaming/streaming/base/batching/random.py", line 49, in generate_work_random_batching
    shuffle_units, small_per_big = dataset.resample_streams(epoch)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/scott/projects/streaming/streaming/base/dataset.py", line 878, in resample_streams
    epoch, self.epoch_seed_change)
           ^^^^^^^^^^^^^^^^^^^^^^
```

Closes mosaicml#831
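For anyone reading along, here is a minimal sketch of the failure mode described above and the kind of fix discussed in this PR. The class names `BaseDataset`, `BrokenSimDataset`, and `FixedSimDataset`, the `resample` method, and the `False` default are hypothetical stand-ins for illustration only; they are not the actual `streaming` classes or signatures.

```python
class BaseDataset:
    """Hypothetical stand-in for a parent dataset class (not the real StreamingDataset)."""

    def __init__(self, epoch_seed_change: bool = False) -> None:
        # Attribute added to the parent; inherited methods rely on it.
        self.epoch_seed_change = epoch_seed_change

    def resample(self, epoch: int) -> tuple[int, bool]:
        # Inherited method that reads the attribute set in the parent's __init__.
        return epoch, self.epoch_seed_change


class BrokenSimDataset(BaseDataset):
    """Subclass that deliberately skips the parent constructor."""

    def __init__(self) -> None:
        # No super().__init__() call, so epoch_seed_change is never set.
        self.batch_size = 1


class FixedSimDataset(BaseDataset):
    """Subclass that still skips the parent constructor but sets the attribute itself."""

    def __init__(self, epoch_seed_change: bool = False) -> None:
        self.batch_size = 1
        # Set explicitly, since the parent's __init__ is intentionally not called.
        self.epoch_seed_change = epoch_seed_change


def demo() -> tuple[str, tuple[int, bool]]:
    """Return the error message from the broken class and the result from the fixed one."""
    broken_message = ""
    try:
        BrokenSimDataset().resample(0)
    except AttributeError as exc:
        broken_message = str(exc)
    fixed_result = FixedSimDataset().resample(0)
    return broken_message, fixed_result
```

Setting the attribute directly in the subclass keeps the single-process init logic separate from the parent's, at the cost of having to mirror any new parent attributes by hand.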
64cd184 to 148157b
@snarayan21 I've updated this to just set |