Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading .xlsx files #562

Open
genisplaja opened this issue Oct 27, 2022 · 2 comments
Open

Reading .xlsx files #562

genisplaja opened this issue Oct 27, 2022 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@genisplaja
Copy link
Collaborator

Hello! As you might have seen, PR #560 is blocked because of an optional dependency that is missing:

E               ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

That is because I am trying to load a dataset metadata .xlsx file using pandas.read_excel(). I tried to use different loading strategies but didn't work, that seems to be the standard way to load .xlsx files. Would openpyxl be a problematic dependency to have in mirdata, taking into account that we may have a dataset in the future that includes .xlsx files as well (I know is not common but who knows...). Otherwise we could include openpyxl as an optional dependency for the dataloader in #560. What do you think?

@genisplaja genisplaja added the help wanted Extra attention is needed label Oct 27, 2022
@magdalenafuentes
Copy link
Collaborator

Hey @genisplaja, we were discussing the issue of optional dependencies with @harshpalan the other day. We thought that a potential solution could be to add a check in loaders that need those optional dependencies, so when you initialize the dataset it will warn you that the dependency is missing and you should install it for that dataset to work. Would that fix your issue?

@genisplaja
Copy link
Collaborator Author

genisplaja commented Oct 27, 2022

Yes! Now we were having a conversation with @nkundiushuti that maybe we could also try to change from .xslx to .csv, since we have access to the Zenodo entry, however, IMHO creating a newer version just for that is a little bit overkill.

What you actually propose I think would be nice. We could even use pipdeptree to check if the user has the particular optional dependencies installed, and if not, throw the warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants