Replies: 5 comments 1 reply
-
BTW, I have a rough version of a patch to fix this: basically, instead of opening the index file in Python, we open it and store the hashmap in C++.

Anyway, the reason I think this is a problem is that I'm not sure how to train on data sitting in HDFS otherwise. I had an example use case doing transfer learning, where the featurized output would be saved as indexed RecordIO so that the Gluon data loader can be used to train the final layers (a sketch of this follows). It would be great if these records could be stored on HDFS.
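To make that use case concrete, here's a minimal sketch with hypothetical local paths (/tmp/feat.idx, /tmp/feat.rec) and dummy features; swapping in hdfs:///... paths is exactly what fails today:

```python
import numpy as np
import mxnet as mx
from mxnet.gluon.data import RecordFileDataset

# Dummy featurized output standing in for the transfer-learning features.
features = np.random.rand(4, 8).astype('float32')
labels = np.arange(4, dtype='float32')

# Write the features as indexed RecordIO (local paths only; hdfs:/// fails
# because the .idx file is opened with Python's built-in open()).
record = mx.recordio.MXIndexedRecordIO('/tmp/feat.idx', '/tmp/feat.rec', 'w')
for i, feat in enumerate(features):
    header = mx.recordio.IRHeader(0, labels[i], i, 0)
    record.write_idx(i, mx.recordio.pack(header, feat.tobytes()))
record.close()

# Read it back through Gluon for training the final layers.
dataset = RecordFileDataset('/tmp/feat.rec')  # derives /tmp/feat.idx itself
print(len(dataset))
```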
-
Same question here. It seems MXNet has abandoned HDFS support in one way or another.
-
cc @zhreshold
-
HDFS seems to be optionally enabled: https://github.com/apache/incubator-mxnet/blob/master/make/config.mk#L175 Have you tested the build with USE_HDFS enabled?
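For reference, turning it on looks roughly like this (USE_HDFS is the flag at the linked line; the LIBJVM path is an assumption that varies by JDK layout):

```makefile
# make/config.mk -- compile in HDFS support (off by default):
USE_HDFS = 1
# HDFS support needs libjvm at link time; adjust for your JDK install:
# LIBJVM = $(JAVA_HOME)/jre/lib/amd64/server
```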
-
I did; that’s why MXRecordIO works with HDFS. But this issue is about MXIndexedRecordIO.
-
Description
MXIndexedRecordIO doesn't work with HDFS. This is because the index file is assumed to be a local file, and it is opened and parsed in Python using
open(self.idx_path, self.flag)
which won't work when an HDFS path is provided. One consequence of this is that in Gluon you cannot use the RecordFileDataset, since it requires MXIndexedRecordIO.
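For context, the index handling in question looks roughly like this (a paraphrase of MXIndexedRecordIO's index loading as a hypothetical helper, not the exact source; details may differ across versions):

```python
def _open_index(idx_path, flag, key_type=int):
    """Rough paraphrase of how the .idx file is loaded: it's a tab-separated
    "key<TAB>offset" listing read with the built-in open(), so only local
    paths work here, even though the .rec side goes through dmlc streams
    that do understand hdfs:// URIs."""
    idx, keys = {}, []
    fidx = open(idx_path, flag)  # <- the local-only call at issue
    if 'r' in flag:
        for line in fidx:
            key, pos = line.strip().split('\t')
            idx[key_type(key)] = int(pos)
            keys.append(key_type(key))
    return fidx, idx, keys
```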
Environment info (Required)
Package used (Python/R/Scala/Julia):
Python
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash:
ccb08fb
Minimum reproducible example
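The original snippet did not survive formatting here; a minimal call that reproduces the failure, assuming the same hdfs:///tmp paths as the working example below, would be:

```python
import mxnet as mx

# Fails: the .idx path is opened with Python's built-in open(),
# which doesn't understand hdfs:// URIs.
record = mx.recordio.MXIndexedRecordIO('hdfs:///tmp/data.idx',
                                       'hdfs:///tmp/data.rec', 'w')
```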
stack trace:
Note that
python -c "import mxnet as mx; record = mx.recordio.MXRecordIO('hdfs:///tmp/data.rec', 'w')"
works fine.