Transformer raises errors in multi-process single-GPU mode #81

ccmeteorljh · 2019-05-21T11:07:55Z

https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/train.py#L616

2019-05-21 09:29:28,729-INFO: Namespace(batch_size=4096, device='GPU', enable_ce=True, fetch_steps=100, local=True, opts=['dropout_seed', '10', 'learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '512', 'd_inner_hid', '2048', 'n_head', '8', 'prepostprocess_dropout', '0.1', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '1', 'model_dir', 'tmp_models', 'ckpt_dir', 'tmp_ckpts'], pool_size=200000, shuffle=False, shuffle_batch=False, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_default_pe=False, use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
Traceback (most recent call last):
  File "train.py", line 784, in <module>
    train(args)
  File "train.py", line 641, in train
    dev_count = get_device_num()
  File "train.py", line 616, in get_device_num
    device_num = subprocess.check_output(['nvidia-smi','-L']).decode().count('\n')
NameError: global name 'subprocess' is not defined

chengduoZH · 2019-05-21T23:56:22Z

@ccmeteorljh Why are you running multi-process on a single GPU?
Did you not set the environment variable (CUDA_VISIBLE_DEVICES)?
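
One way to check what the training process actually sees is to read the variable from Python. This is only a minimal sketch: the helper name `visible_gpu_ids` is hypothetical and is not part of train.py or Paddle.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable not set: the process is not restricted to specific GPUs.
        return None
    # Comma-separated list such as "0" or "0,1"; an empty string gives no ids.
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(visible_gpu_ids())
```

For a single-GPU run one would expect this to print a one-element list such as `['0']`.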

ccmeteorljh · 2019-05-22T02:44:43Z

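> @ccmeteorljh Why are you running multi-process on a single GPU?
> Did you not set the environment variable (CUDA_VISIBLE_DEVICES)?

Yes, I set it. I wanted to compare the speed of a single GPU in multi-process mode against single-process single-GPU mode. The problem above goes away once `subprocess` is imported.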
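
For reference, the `NameError` in the first traceback comes from calling `subprocess.check_output` without importing the module. A minimal sketch of the fix follows. The `nvidia-smi -L` call is taken from the traceback; the split into a separate `count_gpus_from_listing` helper is an assumption made here so the counting logic can be checked without a GPU, and is not how train.py is organized.

```python
import subprocess  # the missing import that caused the NameError


def count_gpus_from_listing(listing):
    """Count GPUs in `nvidia-smi -L` output, which prints one line per GPU."""
    return listing.count('\n')


def get_device_num():
    # Same command as in the traceback (train.py, line 616).
    listing = subprocess.check_output(['nvidia-smi', '-L']).decode()
    return count_gpus_from_listing(listing)
```

Note that `nvidia-smi -L` lists the GPUs installed on the machine and does not take `CUDA_VISIBLE_DEVICES` into account.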

Traceback (most recent call last):
  File "train.py", line 785, in <module>
    train(args)
  File "train.py", line 703, in train
    token_num, predict, pyreader)
  File "train.py", line 534, in train_loop
    feed=feed_dict_list)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 286, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 640, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 482, in _run_parallel
    "Feed a list of tensor, the list should be the same size as places"
ValueError: Feed a list of tensor, the list should be the same size as places
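
The message says that when `feed` is a list, it must contain exactly one entry per place (device) the executor runs on. The sketch below shows that constraint in plain Python. It is not Paddle's implementation: the function name `check_feed_matches_places` and the sample feed dictionaries are hypothetical, and only the error text is copied from the traceback.

```python
def check_feed_matches_places(feed_list, num_places):
    """Raise ValueError unless there is exactly one feed entry per place."""
    if len(feed_list) != num_places:
        raise ValueError(
            "Feed a list of tensor, the list should be the same size as places")
    return True


# One feed dict per device passes the check.
check_feed_matches_places([{"src_word": [1, 2, 3]}], num_places=1)

# Two feed dicts for a single place would raise the ValueError seen above:
# check_feed_matches_places([{"src_word": [1]}, {"src_word": [2]}], num_places=1)
```

So a mismatch between the number of feed dictionaries built by the training loop and the number of devices the process actually uses is enough to trigger this error.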

QianShengWu · 2019-08-13T09:24:22Z

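> Yes, I set it. I wanted to compare the speed of a single GPU in multi-process mode against single-process single-GPU mode. The problem above goes away once `subprocess` is imported.

How did you solve this? I'm hitting the same problem and would appreciate any pointers.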

chengduoZH · 2019-08-14T01:22:29Z

@QianShengWu Multi-process single-GPU mode is not supported at the moment.

ccmeteorljh assigned chengduoZH May 21, 2019
