Unavailable Parameters #102

Open
zhentingqi opened this issue Feb 5, 2024 · 0 comments

Hi! I am trying to download the crawl split 2023-50 by running python -m cc_net --dump 2023-50, but it fails with the following error:

Will run cc_net.mine.main with the following config: Config(config_name='base', dump='2023-50', output_dir=PosixPath('data'), mined_dir='mined', execution='auto', num_shards=1600, min_shard=-1, num_segments_per_shard=-1, metadata=None, min_len=300, hash_in_mem=50, lang_whitelist=[], lang_blacklist=[], lang_threshold=0.5, keep_bucket=[], lm_dir=PosixPath('data/lm_sp'), cutoff=PosixPath('/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/data/cutoff.csv'), lm_languages=None, mine_num_processes=16, target_size='4G', cleanup_after_regroup=False, task_parallelism=-1, pipeline=['dedup', 'lid', 'keep_lang', 'sp', 'lm', 'pp_bucket', 'drop', 'split_by_lang'], experiments=[], cache_dir=None)
Submitting _hashes_shard in a job array (1600 jobs)
Traceback (most recent call last):
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(**parsed_args)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 638, in main
    all_files = mine(conf)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 340, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 265, in hashes
    ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/execution.py", line 106, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 771, in map_array
    return self._internal_process_submissions(submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    array_ex.update_parameters(**self.parameters)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 810, in update_parameters
    self._internal_update_parameters(**kwargs)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 306, in _internal_update_parameters
    raise ValueError(
ValueError: Unavailable parameter(s): ['slurm_time']
Valid parameters are:
  - account (default: None)
  - additional_parameters (default: None)
  - array_parallelism (default: 256)
  - comment (default: None)
  - constraint (default: None)
  - cpus_per_gpu (default: None)
  - cpus_per_task (default: None)
  - dependency (default: None)
  - exclude (default: None)
  - exclusive (default: None)
  - gpus_per_node (default: None)
  - gpus_per_task (default: None)
  - gres (default: None)
  - job_name (default: 'submitit')
  - mail_type (default: None)
  - mail_user (default: None)
  - mem (default: None)
  - mem_per_cpu (default: None)
  - mem_per_gpu (default: None)
  - nodelist (default: None)
  - nodes (default: 1)
  - ntasks_per_node (default: None)
  - num_gpus (default: None)
  - partition (default: None)
  - qos (default: None)
  - setup (default: None)
  - signal_delay_s (default: 90)
  - srun_args (default: None)
  - stderr_to_stdout (default: False)
  - time (default: 5)
  - use_srun (default: True)
  - wckey (default: 'submitit')
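
The mismatch in the error can be reproduced without Slurm: the rejected key is slurm_time, while the list of valid names contains time. Below is a minimal sketch of that kind of name check, assuming the validation is a plain membership test against the list printed above. It is not submitit's actual code, and find_unavailable and strip_slurm_prefix are hypothetical helpers written only for illustration. Whether renaming the key is the right fix inside cc_net is not confirmed.

```python
# Names copied from the "Valid parameters are:" list in the error above.
VALID_SLURM_PARAMETERS = {
    "account", "additional_parameters", "array_parallelism", "comment",
    "constraint", "cpus_per_gpu", "cpus_per_task", "dependency", "exclude",
    "exclusive", "gpus_per_node", "gpus_per_task", "gres", "job_name",
    "mail_type", "mail_user", "mem", "mem_per_cpu", "mem_per_gpu",
    "nodelist", "nodes", "ntasks_per_node", "num_gpus", "partition", "qos",
    "setup", "signal_delay_s", "srun_args", "stderr_to_stdout", "time",
    "use_srun", "wckey",
}

PREFIX = "slurm_"


def find_unavailable(params):
    """Return the sorted keys of params that are not in the valid set."""
    return sorted(key for key in params if key not in VALID_SLURM_PARAMETERS)


def strip_slurm_prefix(params):
    """Hypothetical helper: drop a leading 'slurm_' from each key."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in params.items()
    }


# The key from the traceback is rejected as-is ...
rejected = find_unavailable({"slurm_time": 60})
# ... but passes the same check once the prefix is removed.
accepted = find_unavailable(strip_slurm_prefix({"slurm_time": 60}))
print(rejected, accepted)
```

If this sketch matches what happens, the prefixed name reaches a check that only knows the unprefixed names, which may point to a mismatch between the installed submitit version and what cc_net expects.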

Can someone please help me solve the problem? Thanks!
