Transformer fails in multi-process, single-GPU mode #81

Open
ccmeteorljh opened this issue May 21, 2019 · 4 comments

@ccmeteorljh
Collaborator

https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/train.py#L616

2019-05-21 09:29:28,729-INFO: Namespace(batch_size=4096, device='GPU', enable_ce=True, fetch_steps=100, local=True, opts=['dropout_seed', '10', 'learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '512', 'd_inner_hid', '2048', 'n_head', '8', 'prepostprocess_dropout', '0.1', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '1', 'model_dir', 'tmp_models', 'ckpt_dir', 'tmp_ckpts'], pool_size=200000, shuffle=False, shuffle_batch=False, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_default_pe=False, use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
Traceback (most recent call last):
  File "train.py", line 784, in <module>
    train(args)
  File "train.py", line 641, in train
    dev_count = get_device_num()
  File "train.py", line 616, in get_device_num
    device_num = subprocess.check_output(['nvidia-smi','-L']).decode().count('\n')
NameError: global name 'subprocess' is not defined
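The `NameError` above indicates that `train.py` calls `subprocess.check_output` without importing `subprocess`. A minimal sketch of the fix follows, assuming the rest of `get_device_num` stays as shown in the traceback. The helper `count_gpu_lines` is a hypothetical name introduced here so the counting step can be checked without a GPU; it is not part of the benchmark script.

```python
import subprocess  # the missing import behind the NameError


def count_gpu_lines(listing):
    """Count GPUs in `nvidia-smi -L` output, which prints one line per GPU.

    Hypothetical helper, not part of train.py.
    """
    return listing.count('\n')


def get_device_num():
    # Same call as in the traceback; requires nvidia-smi on PATH.
    listing = subprocess.check_output(['nvidia-smi', '-L']).decode()
    return count_gpu_lines(listing)
```

Note that this counts every GPU the driver reports, so it may differ from the number of devices the process is actually allowed to use.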
@chengduoZH
Contributor

@ccmeteorljh Why multi-process on a single GPU?
Did you not set the CUDA_VISIBLE_DEVICES environment variable?
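For readers unfamiliar with `CUDA_VISIBLE_DEVICES`: it is a comma-separated list of GPU ids that restricts which devices a process can see. The sketch below shows one way a script could derive a device count from it. The function name `visible_device_count` is hypothetical, and whether `train.py` does exactly this is an assumption.

```python
import os


def visible_device_count(env=None):
    """Return the number of GPU ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, so the caller can fall back
    to another detection method. An empty value hides all GPUs and gives 0.
    """
    env = os.environ if env is None else env
    value = env.get('CUDA_VISIBLE_DEVICES')
    if value is None:
        return None
    return len([d for d in value.split(',') if d.strip()])
```

For example, launching with `CUDA_VISIBLE_DEVICES=0 python train.py ...` would make this function return 1.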

@ccmeteorljh
Collaborator Author

ccmeteorljh commented May 22, 2019

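> @ccmeteorljh Why multi-process on a single GPU?
> Did you not set the CUDA_VISIBLE_DEVICES environment variable?

I did set it. I wanted to compare the speed of multi-process single-GPU training against single-process single-GPU training. The problem above is fixed by simply adding the missing import, but the run then fails with: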

Traceback (most recent call last):
  File "train.py", line 785, in <module>
    train(args)
  File "train.py", line 703, in train
    token_num, predict, pyreader)
  File "train.py", line 534, in train_loop
    feed=feed_dict_list)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 286, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 640, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 482, in _run_parallel
    "Feed a list of tensor, the list should be the same size as places"
ValueError: Feed a list of tensor, the list should be the same size as places
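The message suggests that when `feed` is passed as a list, its length has to match the number of places (devices) the executor runs on. The sketch below is not Paddle's code. `check_feed` and the place names are hypothetical stand-ins that only illustrate the kind of size check the error text describes.

```python
def check_feed(feed, places):
    """Hypothetical stand-in for the size check named in the traceback."""
    if isinstance(feed, (list, tuple)) and len(feed) != len(places):
        raise ValueError(
            "Feed a list of tensor, the list should be the same size as places")
    return feed


# One place (a single GPU) paired with one feed dict passes the check.
check_feed([{"src_word": [1, 2, 3]}], ["gpu:0"])

# Two feed dicts for a single place would raise ValueError:
# check_feed([{"src_word": [1]}, {"src_word": [2]}], ["gpu:0"])
```

Under this reading, a mismatch between the number of feed dicts built by the script and the number of devices the executor actually uses would produce the error shown.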

@QianShengWu

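> I did set it. I wanted to compare the speed of multi-process single-GPU training against single-process single-GPU training. The problem above is fixed by simply adding the missing import.

How did you solve it? I'm running into the same problem and would appreciate any pointers.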

@chengduoZH
Contributor

@QianShengWu Multi-process single-GPU mode is not supported yet.
