Hi! I am trying to download the crawl split 2023-50. I am running the command:

```
python -m cc_net --dump 2023-50
```

which raises the following error:
```
Will run cc_net.mine.main with the following config: Config(config_name='base', dump='2023-50', output_dir=PosixPath('data'), mined_dir='mined', execution='auto', num_shards=1600, min_shard=-1, num_segments_per_shard=-1, metadata=None, min_len=300, hash_in_mem=50, lang_whitelist=[], lang_blacklist=[], lang_threshold=0.5, keep_bucket=[], lm_dir=PosixPath('data/lm_sp'), cutoff=PosixPath('/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/data/cutoff.csv'), lm_languages=None, mine_num_processes=16, target_size='4G', cleanup_after_regroup=False, task_parallelism=-1, pipeline=['dedup', 'lid', 'keep_lang', 'sp', 'lm', 'pp_bucket', 'drop', 'split_by_lang'], experiments=[], cache_dir=None)
Submitting _hashes_shard in a job array (1600 jobs)
Traceback (most recent call last):
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(**parsed_args)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 638, in main
    all_files = mine(conf)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 340, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 265, in hashes
    ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/execution.py", line 106, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 771, in map_array
    return self._internal_process_submissions(submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    array_ex.update_parameters(**self.parameters)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 810, in update_parameters
    self._internal_update_parameters(**kwargs)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 306, in _internal_update_parameters
    raise ValueError(
ValueError: Unavailable parameter(s): ['slurm_time']
Valid parameters are:
  - account (default: None)
  - additional_parameters (default: None)
  - array_parallelism (default: 256)
  - comment (default: None)
  - constraint (default: None)
  - cpus_per_gpu (default: None)
  - cpus_per_task (default: None)
  - dependency (default: None)
  - exclude (default: None)
  - exclusive (default: None)
  - gpus_per_node (default: None)
  - gpus_per_task (default: None)
  - gres (default: None)
  - job_name (default: 'submitit')
  - mail_type (default: None)
  - mail_user (default: None)
  - mem (default: None)
  - mem_per_cpu (default: None)
  - mem_per_gpu (default: None)
  - nodelist (default: None)
  - nodes (default: 1)
  - ntasks_per_node (default: None)
  - num_gpus (default: None)
  - partition (default: None)
  - qos (default: None)
  - setup (default: None)
  - signal_delay_s (default: 90)
  - srun_args (default: None)
  - stderr_to_stdout (default: False)
  - time (default: 5)
  - use_srun (default: True)
  - wckey (default: 'submitit')
```
Can someone please help me solve the problem? Thanks!
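For anyone digging into this: the error message itself hints at the mismatch. The Slurm executor's valid-parameter list includes `time`, but the code is handing it `slurm_time`. In submitit, the `slurm_`-prefixed names belong to `AutoExecutor` (which strips the prefix before delegating), so this smells like a version mismatch between the installed submitit and the one cc_net was written against. As a sketch only (not a confirmed fix, and the helper name is mine), one workaround is to strip the prefix before the dict reaches `update_parameters`:

```python
# Hypothetical workaround sketch: rename AutoExecutor-style keys
# ("slurm_time") to the bare names SlurmExecutor accepts ("time").
# Keys without the prefix are passed through unchanged.
def normalize_slurm_params(params: dict) -> dict:
    prefix = "slurm_"
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in params.items()
    }

# Example parameter dict (values are illustrative, not from cc_net):
params = {"slurm_time": 4320, "cpus_per_task": 16}
print(normalize_slurm_params(params))
# -> {'time': 4320, 'cpus_per_task': 16}
```

You would apply something like this wherever cc_net's `execution.py` builds the kwargs it passes to the executor, or alternatively pin submitit to the version cc_net expects so no renaming is needed.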