
Fail to run train.lua #30

Open
shuait opened this issue Apr 24, 2017 · 4 comments

Comments

@shuait

shuait commented Apr 24, 2017

This may be a trivial question, but I'm completely new to Torch, and searching Google didn't turn up anything. I'm working on an Ubuntu 14.04 machine with CUDA 7.0 and cuDNN R4. I prepared all the training files, and when I run train.lua it gives me this error:

{
input_img_train_h5 : "data/vqa_data_img_vgg_train.h5"
learning_rate_decay_every : 300
optim : "rmsprop"
hidden_size : 512
optim_epsilon : 1e-08
output_size : 1000
rnn_layers : 2
input_img_test_h5 : "data/vqa_data_img_vgg_test.h5"
losses_log_every : 600
id : "0"
input_ques_h5 : "data/vqa_data_prepro.h5"
learning_rate_decay_start : 0
start_from : ""
gpuid : 6
seed : 123
input_json : "data/vqa_data_prepro.json"
optim_beta : 0.995
batch_size : 20
iterPerEpoch : 1200
rnn_size : 512
max_iters : -1
checkpoint_path : "save/train_vgg"
save_checkpoint_every : 6000
learning_rate : 0.0004
co_atten_type : "Alternating"
feature_type : "VGG"
backend : "cudnn"
optim_alpha : 0.99
}
DataLoader loading h5 image file: data/vqa_data_img_vgg_train.h5
DataLoader loading h5 image file: data/vqa_data_img_vgg_test.h5
DataLoader loading h5 question file: data/vqa_data_prepro.h5
DataLoader loading json file: data/vqa_data_prepro.json
assigned 215375 images to split 0
assigned 121512 images to split 2
Building the model...
total number of parameters in word_level: 8031747
total number of parameters in phrase_level: 2889219
total number of parameters in ques_level: 5517315
constructing clones inside the ques_level
total number of parameters in recursive_attention: 2862056
/home/raamac/torch/install/bin/luajit: ./misc/word_level.lua:86: the class torch.CudaByteTensor cannot be indexed
stack traceback:
[C]: in function '__newindex'
./misc/word_level.lua:86: in function 'forward'
train.lua:253: in function 'lossFun'
train.lua:310: in main chunk
[C]: in function 'dofile'
...amac/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

@haibin894609937

I have the same error. Have you solved the problem?

@shuait

shuait commented May 9, 2017

@haibin894609937 I still have no clue what caused the problem. I tried reinstalling Torch, and that didn't help either.

@haibin894609937

haibin894609937 commented May 9, 2017 via email

@Jhhuangkay

You're working on the VQA dataset, right?
If so, I suspect the problem is in your vqa_data_prepro.json and vqa_data_prepro.h5.
You can try the other dataset the author provided, cocoqa.
If you replace those two files with cocoqa_data_prepro.json and cocoqa_data_prepro.h5, all the code should run. When I replaced those two files, everything worked.
So you can try this too; if it runs, you'll know the problem lies in how the prepro files were generated.
