where is _round_robin_process_groups? #1232

jiangxiluning · 2020-01-19T16:21:25Z

Line 42 in 7d38355

_round_robin_process_group = dist_c10d._round_robin_process_groups(

mwu1993 · 2020-01-21T19:04:11Z

The diff was 151be72; either @chenyangyu1988 or the PyTorch docs might be able to help further.

jiangxiluning · 2020-01-24T06:16:57Z

 _round_robin_process_group = dist_c10d._round_robin_process_groups

My PyTorch version is 1.3.1, but I cannot find the _round_robin_process_groups method.
@mwu1993

chenyangyu1988 · 2020-01-24T22:20:23Z

@jiangxiluning https://github.com/facebookresearch/pytext/blob/9f705f44a05f6b58b4b58d0327a1ef57f94551b5/pytext/utils/distributed.py

jiangxiluning · 2020-01-25T09:20:08Z

@chenyangyu1988 you do import '_round_robin_process_groups' from torch, but it does not actually exist there.

(pytext-nlp) luning@luning-mate:~/dev/tools/pytext$ pytext train < demo/configs/distributed_docnn.json 
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
No config file specified, reading from stdin
WARNING - Applying old config adapter for version=0. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=1. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=2. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=3. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=4. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=5. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=6. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=7. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=8. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=9. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=10. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=11. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=12. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=13. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=14. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=15. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=16. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=17. Please consider migrating your old configs to the latest version.
WARNING - Applying old config adapter for version=18. Please consider migrating your old configs to the latest version.

===Starting training...

=== Starting training, World size is 2
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

Parameters: PyTextConfig:
    auto_resume_from_snapshot: False
    debug_path: /tmp/model.debug
    distributed_world_size: 2
    export_caffe2_path: None
    export_onnx_path: /tmp/model.onnx
    export_torchscript_path: None
    gpu_streams_for_distributed_training: 1
    include_dirs: None
    load_snapshot_path: 
    modules_save_dir: 
    random_seed: None
    report_eval_results: False
    save_all_checkpoints: False
    save_module_checkpoints: False
    save_snapshot_path: /tmp/model.pt
    task: DocumentClassificationTask.Config:
        data: Data.Config:
            batcher: PoolingBatcher.Config:
                eval_batch_size: 16
                num_shuffled_pools: 1
                pool_num_batches: 10000
                test_batch_size: 16
                train_batch_size: 16
            in_memory: True
            sort_key: None
            source: TSVDataSource.Config:
                column_mapping: {}
                delimiter: 	
                drop_incomplete_rows: False
                eval_filename: base_dir/test_tiny.tsv
                field_names: ['text', 'doc_label']
                quoted: False
                test_filename: base_dir/test_tiny.tsv
                train_filename: base_dir/train_tiny.tsv
        metric_reporter: ClassificationMetricReporter.Config:
            additional_column_names: []
            model_select_metric: ComparableClassificationMetric.ACCURACY
            output_path: /tmp/test_out.txt
            pep_format: False
            recall_at_precision_thresholds: [0.2, 0.4, 0.6, 0.8, 0.9]
            target_label: None
            text_column_names: ['text']
        model: DocModel.Config:
            decoder: MLPDecoder.Config:
                activation: Activation.RELU
                dropout: 0.0
                freeze: False
                hidden_dims: []
                layer_norm: False
                load_path: None
                out_dim: None
                save_path: None
                shared_module_key: None
            embedding: WordEmbedding.Config:
                cpu_only: False
                delimiter:  
                embed_dim: 100
                embedding_init_range: None
                embedding_init_strategy: EmbedInitStrategy.RANDOM
                export_input_names: ['tokens_vals']
                freeze: False
                load_path: None
                lowercase_tokens: True
                min_freq: 1
                mlp_layer_dims: []
                padding_idx: None
                pretrained_embeddings_path: 
                save_path: None
                shared_module_key: None
                skip_header: True
                vocab_file: 
                vocab_from_all_data: False
                vocab_from_pretrained_embeddings: False
                vocab_from_train_data: True
                vocab_size: 0
            inputs: ModelInput:
                dense: None
                labels: LabelTensorizer.Config:
                    allow_unknown: False
                    column: doc_label
                    is_input: False
                    label_vocab: None
                    pad_in_vocab: False
                tokens: TokenTensorizer.Config:
                    add_bos_token: False
                    add_eos_token: False
                    column: text
                    is_input: True
                    max_seq_len: None
                    tokenizer: Tokenizer.Config:
                        lowercase: True
                        split_regex: \s+
                    use_eos_token_for_bos: False
                    vocab: VocabConfig:
                        build_from_data: True
                        size_from_data: 0
                        vocab_files: []
                    vocab_file_delimiter:  
            output_layer: ClassificationOutputLayer.Config:
                freeze: False
                label_weights: None
                load_path: None
                loss: CrossEntropyLoss.Config:
                save_path: None
                shared_module_key: None
            representation: BiLSTMDocAttention.Config:
                dropout: 0.4
                freeze: False
                load_path: None
                lstm: BiLSTM.Config:
                    bidirectional: True
                    dropout: 0.4
                    freeze: False
                    load_path: None
                    lstm_dim: 32
                    num_layers: 1
                    pack_sequence: True
                    save_path: None
                    shared_module_key: None
                mlp_decoder: None
                pooling: SelfAttention.Config:
                    attn_dimension: 64
                    dropout: 0.4
                save_path: None
                shared_module_key: None
        trainer: TaskTrainer.Config:
            do_eval: True
            early_stop_after: 0
            epochs: 10
            fp16_args: FP16OptimizerFairseq.Config:
                init_loss_scale: 128
                min_loss_scale: 0.0001
                scale_tolerance: 0.0
                scale_window: None
                threshold_loss_scale: None
            load_best_model_after_train: True
            max_clip_norm: None
            num_accumulated_batches: 1
            num_batches_per_epoch: None
            num_samples_to_log_progress: 1000
            optimizer: Adam.Config:
                eps: 1e-08
                lr: 0.001
                weight_decay: 1e-05
            report_train_metrics: True
            scheduler: None
            sparsifier: None
            target_time_limit_seconds: None
    test_out_path: /tmp/test_out.txt
    torchscript_quantize: False
    use_config_from_snapshot: True
    use_cuda_for_testing: True
    use_cuda_if_available: True
    use_deterministic_cudnn: False
    use_fp16: False
    use_tensorboard: True
    version: 19


        # for debug of GPU
        use_cuda_if_available: True
        device_id: 0
        world_size: 2
        torch.cuda.is_available(): True
        cuda.CUDA_ENABLED: True
        cuda.DISTRIBUTED_WORLD_SIZE: 2
        
# for debug of FP16: fp16_enabled=False
Traceback (most recent call last):
  File "/home/luning/.pyenv/versions/pytext-nlp/bin/pytext", line 8, in <module>
    sys.exit(main())
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/main.py", line 369, in train
    train_model_distributed(config, metric_channels)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/main.py", line 91, in train_model_distributed
    config.distributed_world_size,
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/main.py", line 114, in run_single
    metadata=metadata,
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/workflow.py", line 101, in train_model
    config, dist_init_url, device_id, rank, world_size, metric_channels, metadata
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/workflow.py", line 130, in prepare_task
    config.gpu_streams_for_distributed_training,
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/workflow.py", line 73, in _set_distributed
    rank, world_size, dist_init_url, device_id, gpu_streams=gpu_streams
  File "/home/luning/.pyenv/versions/3.7.5/envs/pytext-nlp/lib/python3.7/site-packages/pytext/utils/distributed.py", line 42, in dist_init
    _round_robin_process_group = dist_c10d._round_robin_process_groups(
AttributeError: module 'torch.distributed' has no attribute '_round_robin_process_groups'

pietern · 2020-01-28T06:41:34Z

It's available on unstable. If PyText supports 1.3.1, it should gracefully degrade if it can't find the function.
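
One way the graceful degradation suggested above could look is sketched below. This is only an illustration, not PyText's actual code: `build_process_group` and `make_group` are hypothetical names, and the sketch assumes `_round_robin_process_groups` accepts a list of process groups. Stand-in namespace objects replace `torch.distributed` so the snippet runs without PyTorch installed.

```python
import types


def build_process_group(dist_module, make_group, num_streams=1):
    """Return a round-robin group when the build supports it, else one plain group.

    dist_module: the module expected to expose _round_robin_process_groups
                 (torch.distributed in the traceback above).
    make_group:  hypothetical zero-argument factory that creates one process group.
    """
    round_robin = getattr(dist_module, "_round_robin_process_groups", None)
    if round_robin is None or num_streams <= 1:
        # Older builds (for example 1.3.1 in this thread) lack the function,
        # so fall back to a single ordinary group instead of raising.
        return make_group()
    return round_robin([make_group() for _ in range(num_streams)])


# Stand-ins for an old and a new torch.distributed (assumptions for the demo).
old_dist = types.SimpleNamespace()
new_dist = types.SimpleNamespace(
    _round_robin_process_groups=lambda groups: ("round_robin", tuple(groups))
)

fallback_group = build_process_group(old_dist, lambda: "group", num_streams=2)
round_robin_group = build_process_group(new_dist, lambda: "group", num_streams=2)
```

With a fallback like this, a config with `gpu_streams_for_distributed_training: 1` or an older PyTorch build would take the single-group path rather than hitting the `AttributeError`.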

jiangxiluning · 2020-01-28T08:42:48Z

So which PyTorch version should I use? @pietern @chenyangyu1988
I cannot find this method in PyTorch's master branch either.

pietern · 2020-01-28T09:16:39Z

It's in the nightly releases and perhaps also in 1.4 (not sure).

See pytorch/pytorch@0282c5a for the commit.
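
To check whether a given install (stable, 1.4, or nightly) actually exposes the function, a small probe along these lines may help. `module_has_attr` is a hypothetical helper written for this thread, not part of PyTorch or PyText; it only uses the standard library.

```python
import importlib


def module_has_attr(module_name, attr_name):
    """Return True if the module can be imported and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed in this environment.
        return False
    return hasattr(module, attr_name)
```

Calling `module_has_attr("torch.distributed", "_round_robin_process_groups")` in the environment that runs `pytext train` would show whether that build includes the function.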

jiangxiluning · 2020-01-28T10:21:20Z

@pietern 1.4.0 does not have it. How can a PyText release depend on this unstable feature? Do you have any suggestion?

jiangxiluning changed the title ~~whese is _round_robin_process_groups?~~ where is _round_robin_process_groups? Jan 19, 2020
