alternative data sources/sharing replication dataset? #30

Open
cwhittaker1000 opened this issue Jul 28, 2022 · 2 comments

Comments

@cwhittaker1000

Hi there! Congrats on this work, it's amazing! We're currently doing some work looking at epistatic effects, and are hoping to build on the incredible work you folks have done with PyR0.

So far, we haven't been able to get access to a data feed from GISAID - I saw some others had had similar issues and inquired about alternative data sources (#13) for running the model. Can you advise on what data we'd need to go down this route (and where to get it from) - any potential advice you could provide on how to modify the code would also be hugely appreciated!

If the above isn't viable, would it be possible for you to share the processed dataset used for the analyses in your Science paper so we can make progress on extending the code while we continue to work out access issues?

Thanks in advance and congrats again on some awesome work!

@Xiang-Leo

Hi, I'm also curious about the GISAID data feed. I got no response, even after emailing GISAID. According to GISAID's terms, the authors cannot share the processed dataset with you.
But there are alternative ways to get data. One is to download a dataset from an open source, such as the one Nextstrain provides. If you have a GISAID account, you can download all sequences and the corresponding metadata. Then preprocess the data with preprocess_gisaid.py.

@Jialu-Zuo

@Xiang-Leo
Dear Xiang-Leo, I've recently been trying to replicate this work, and the GISAID data is not available to me. Do you mean that we can download the data from https://nextstrain.org/ncov/open/global/all-time, rename the metadata, and preprocess the data with the preprocess_gisaid.py file? I tried preprocessing the data with the preprocess_usher.py file and found that the number of regions only reaches about 300, far fewer than the 1560 in the article. I'm wondering whether the same thing happens when preprocessing the Nextstrain data with preprocess_gisaid.py.

3 participants