Fail to run train.lua #30

shuait · 2017-04-24T03:54:25Z

This may be a trivial question, but I'm completely new to Torch and searching on Google turned up nothing. I'm working on an Ubuntu 14.04 machine with CUDA 7.0 and cuDNN R4. I prepared all the training files, but running train.lua gives me this error:

{
input_img_train_h5 : "data/vqa_data_img_vgg_train.h5"
learning_rate_decay_every : 300
optim : "rmsprop"
hidden_size : 512
optim_epsilon : 1e-08
output_size : 1000
rnn_layers : 2
input_img_test_h5 : "data/vqa_data_img_vgg_test.h5"
losses_log_every : 600
id : "0"
input_ques_h5 : "data/vqa_data_prepro.h5"
learning_rate_decay_start : 0
start_from : ""
gpuid : 6
seed : 123
input_json : "data/vqa_data_prepro.json"
optim_beta : 0.995
batch_size : 20
iterPerEpoch : 1200
rnn_size : 512
max_iters : -1
checkpoint_path : "save/train_vgg"
save_checkpoint_every : 6000
learning_rate : 0.0004
co_atten_type : "Alternating"
feature_type : "VGG"
backend : "cudnn"
optim_alpha : 0.99
}
DataLoader loading h5 image file: data/vqa_data_img_vgg_train.h5
DataLoader loading h5 image file: data/vqa_data_img_vgg_test.h5
DataLoader loading h5 question file: data/vqa_data_prepro.h5
DataLoader loading json file: data/vqa_data_prepro.json
assigned 215375 images to split 0
assigned 121512 images to split 2
Building the model...
total number of parameters in word_level: 8031747
total number of parameters in phrase_level: 2889219
total number of parameters in ques_level: 5517315
constructing clones inside the ques_level
total number of parameters in recursive_attention: 2862056
/home/raamac/torch/install/bin/luajit: ./misc/word_level.lua:86: the class torch.CudaByteTensor cannot be indexed
stack traceback:
[C]: in function '__newindex'
./misc/word_level.lua:86: in function 'forward'
train.lua:253: in function 'lossFun'
train.lua:310: in main chunk
[C]: in function 'dofile'
...amac/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

haibin894609937 · 2017-05-07T15:37:18Z

I have the same error. Have you solved the problem?

shuait · 2017-05-09T13:29:11Z

@haibin894609937 I still have no clue what caused the problem. I tried reinstalling Torch, and that failed too.

haibin894609937 · 2017-05-09T13:41:57Z

I ran it on CentOS 7 with CUDA 8.0.


Jhhuangkay · 2017-07-18T09:01:45Z

You guys are working on the VQA dataset, right?
If so, I guess the problem is in your vqa_data_prepro.json and vqa_data_prepro.h5.
You can try the other dataset the author provides, cocoqa.
If you replace those two files with cocoqa_data_prepro.json and cocoqa_data_prepro.h5, the code should run. When I swapped them in, everything worked.
So try this as well; if it works, you will know the problem lies in how the prepro files were generated.
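
If swapping in the cocoqa files does make the error go away, one way to narrow things down is to compare the two JSON files side by side. The following is only a rough sketch: the helper names `summarize_json` and `diff_summaries` are made up for illustration, and it assumes nothing about which keys the prepro scripts actually write. It lists each top-level key with its type and length, so missing or differently shaped entries stand out.

```python
import json


def summarize_json(path):
    """Return {top-level key: (type name, length or None)} for a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    summary = {}
    for key, value in data.items():
        # Only containers and strings have a meaningful length here.
        length = len(value) if isinstance(value, (list, dict, str)) else None
        summary[key] = (type(value).__name__, length)
    return summary


def diff_summaries(a, b):
    """Return (keys only in a, keys only in b, shared keys whose type differs)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    type_mismatch = sorted(k for k in set(a) & set(b) if a[k][0] != b[k][0])
    return only_a, only_b, type_mismatch


# Hypothetical usage with the file names mentioned in this thread:
# vqa = summarize_json("data/vqa_data_prepro.json")
# coco = summarize_json("data/cocoqa_data_prepro.json")
# print(diff_summaries(vqa, coco))
```

This only looks at the JSON side; the .h5 file would need a separate check with an HDF5 reader.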
