French dataset #67

Open
taminhtoan2601 opened this issue Mar 24, 2021 · 2 comments

Comments

@taminhtoan2601

Hello,

Can I build my own French dataset for some keywords using download_mfa.sh and generate_dataset.sh, as described in "Preparing a Dataset"? If so, could you share some tips?

Have a good day! Thank you so much!
Minh Toan.

@ljj7975
Member

ljj7975 commented Apr 1, 2021

Will you be using the French version of Common Voice?
If so, I think it could work.

You will need to find the right French dictionary and replace this line:
https://github.com/castorini/howl/blob/master/download_mfa.sh#L14

The dictionary you find must also work with MFA.
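
One way to sanity-check a candidate dictionary before aligning is to confirm that every keyword has an entry in it. The following is a minimal sketch, not part of howl: it assumes the dictionary is a plain-text file with one entry per line, the word first and its phones after it, separated by whitespace. The function names `load_dictionary_words` and `missing_keywords` are hypothetical helpers introduced for illustration, and each keyword is treated as a single word.

```python
from pathlib import Path


def load_dictionary_words(path):
    """Collect the words listed in a pronunciation dictionary.

    Assumption: one entry per line, the word first, then its phones,
    all separated by whitespace. Blank lines are skipped.
    """
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            # Only the first field (the word) matters for coverage.
            words.add(fields[0].lower())
    return words


def missing_keywords(keywords, dictionary_words):
    """Return the keywords that have no entry in the dictionary."""
    return [kw for kw in keywords if kw.lower() not in dictionary_words]
```

Calling `missing_keywords(["bonjour", "merci"], load_dictionary_words(path))` with `path` pointing at the dictionary file returns the keywords that still need a pronunciation. An empty list only means the words are present; it does not confirm that the phone set matches the acoustic model.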

@ljj7975
Member

ljj7975 commented Apr 3, 2021

It's not merged yet, but I recommend using the vocab_stitching branch.
