How to get a parallel dataset from the already shared raw tokenized data? #42

himanshu034 · 2021-07-21T09:17:08Z

Hi, I have looked at the raw tokenized parallel data, which is in .tok format. I downloaded it from https://dl.fbaipublicfiles.com/transcoder/TransCoder_tokenized_test_set_functions.zip . It seems the same functions are written in all three languages: C++, Python and Java. I need to know how the binarized .pth files, such as "python_sa-cpp_sa-python_sa" and "cpp_sa-python_sa-cpp_sa", are generated.
Please help. Any help would be much appreciated.

malachaux · 2021-07-28T09:41:34Z

This repo is now deprecated. Please refer to our new repository: https://github.com/facebookresearch/CodeGen.
