
full_dump.csv has inconsistent and sometimes missing data #33

Open
cmirzayi opened this issue Aug 22, 2024 · 7 comments
Labels
bug Something isn't working


@cmirzayi

I imported BugSigDB data into R using bugsigdbr last week and had ~3,900 rows of data. Running the same import function again this morning results in 1,701 rows of data.

Checking the raw exports, full_dump.csv has 1,703 rows and is missing a ton of signatures from BugSigDB, so the issue is not with the import function but seems to be with how these data exports are being generated here.

@cmirzayi cmirzayi added the bug Something isn't working label Aug 22, 2024
@jwokaty
Collaborator

jwokaty commented Aug 22, 2024

I believe this is related to #32, waldronlab/BugSigDB#234 and waldronlab/BugSigDB#238 because they download the CSV files, which are used to create the full_dump.csv.

Looking at the GitHub Action where the CSV files are downloaded, they're much smaller for this run than we expect. Each should be greater than 1 MB, but 2 are under: https://github.com/waldronlab/BugSigDBExports/actions/runs/10508595908/job/29112829073#step:7:19.
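One way to catch this earlier could be a size guard that runs right after the download step and fails the job before full_dump.csv is built. The sketch below is only an illustration, not the workflow's actual code: the helper names `file_sizes` and `undersized` are hypothetical, and reading "1MB" as 1,000,000 bytes is an assumption.

```python
import os

# Lower bound from this thread: each exported CSV "should be greater than 1MB".
# Interpreting 1 MB as 1,000,000 bytes is an assumption.
MIN_BYTES = 1_000_000


def file_sizes(paths):
    """Map each path to its size on disk; a missing file maps to -1 so it always fails."""
    sizes = {}
    for path in paths:
        try:
            sizes[path] = os.path.getsize(path)
        except OSError:
            sizes[path] = -1  # download never produced the file
    return sizes


def undersized(sizes, min_bytes=MIN_BYTES):
    """Given {name: size_in_bytes}, return the sorted names smaller than min_bytes."""
    return sorted(name for name, size in sizes.items() if size < min_bytes)
```

In a workflow step, something like `bad = undersized(file_sizes(paths))` followed by `raise SystemExit(f"undersized exports: {bad}")` when `bad` is non-empty would stop the run instead of publishing a truncated dump.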

@tosfos

tosfos commented Aug 30, 2024

Noting that a download of a Signatures CSV today contained the correct 5,333 records. This issue is likely resolved.

@jwokaty
Collaborator

jwokaty commented Sep 3, 2024

@cmirzayi Can you verify when you return and also confirm #32 is resolved?

I appear to be getting the correct number of rows when I download the CSV files, and the files downloaded in the GitHub Actions appear to be the size we expect: https://github.com/waldronlab/BugSigDBExports/actions/runs/10688939861/job/29629915697#step:7:19. I also see more data in the columns than previously.

@cmirzayi
Author

cmirzayi commented Sep 9, 2024

@jwokaty This issue is not resolved and is becoming critical. I just checked the export from BugSigDB and it appears to have the right number of rows for signatures but the export here is wrong. Can you investigate please?

@jwokaty
Collaborator

jwokaty commented Sep 9, 2024

@tosfos Maybe we just got lucky the day we checked. The problem is still happening. I just tried to curl for the signature file but only got about 2k rows.

We can also see it in the GitHub Action. The signature file should have 5k+ rows, but it gets fewer than that: https://github.com/waldronlab/BugSigDBExports/actions/runs/10779489982/job/29893115722#step:7:19, whereas last week it downloaded 5k+: https://github.com/waldronlab/BugSigDBExports/actions/runs/10689720440/job/29632477014#step:7:47.
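File size alone can pass while rows are still missing, so a record-count floor on the signature file could complement it. This is a hypothetical sketch: `count_records` and `is_complete` are illustrative names, the 5,000 floor comes from the "5k+ rows" figure above, and counting parsed CSV records rather than raw lines is a precaution in case quoted fields contain line breaks.

```python
import csv
import io

# Floor from this thread: the signature file "should have 5k+ rows".
MIN_SIGNATURE_RECORDS = 5_000


def count_records(csv_text, has_header=True):
    """Count CSV records; a quoted field spanning several lines still counts once."""
    reader = csv.reader(io.StringIO(csv_text))
    n = sum(1 for row in reader if row)  # skip blank lines
    if has_header and n > 0:
        n -= 1  # do not count the header row
    return n


def is_complete(csv_text, minimum=MIN_SIGNATURE_RECORDS):
    """True if the export has at least `minimum` data records."""
    return count_records(csv_text) >= minimum
```

Running `is_complete` on the text fetched with curl would flag a ~2k-row download immediately rather than letting it flow into full_dump.csv.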

@cmirzayi
Author

@jwokaty The problem is still occurring. We are missing 900 records as of the latest (28 minutes ago) data export. This is a critical issue that is disrupting my ability to work with students on their capstones so we need a solution sooner rather than later I'm afraid.

@jwokaty
Collaborator

jwokaty commented Sep 25, 2024

@tosfos I'll also send a message to support. As @cmirzayi mentioned, we're still seeing smaller downloads. For example, the signature files should be around 6 MB, but they were smaller when our GitHub Action downloaded them:


3 participants