Better support for cloud authentication via tokens, service accounts, etc. #181

Open
juliasilge opened this issue Jan 24, 2023 · 3 comments

@juliasilge
Member

In writing up the blog post for pins 1.1.0 in R, I ran into some challenges around authenticating for GCS in Python. I have a service account JSON file google-pins.json for a project "pins-dev" in my working directory.

I kind of expected that this might work, given what the docs for gcsfs and pins say (use a cached gcsfs token), but it does not:

import pins
import gcsfs

fs = gcsfs.GCSFileSystem(project="pins-dev", token="google-pins.json")
board = pins.board_gcs("pins-testing")
board.pin_read("small-numbers")

The pins functions don't work, even though fs.ls("pins-testing/") does.

This does work:

import pins
opts = {"cache_timeout": 0, "token": "google-pins.json"}
path = "pins-testing"
board = pins.board("gcs", path, storage_options=opts)
board.pin_read("nice-numbers")

Once I successfully read the pin this way, I can re-declare the board via pins.board_gcs("pins-testing") and still read the pin (which is cached locally), even though the board object is different.

🎯 Can/should we add a token argument to the GCS board? Should we do something similar for the other cloud boards? FWIW in R, we decided authentication was specific enough to these platforms that we needed to add individualized support in each board.
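As a rough illustration of what a token argument could look like, here is a minimal sketch built only on the `pins.board("gcs", ..., storage_options=...)` call that worked above. The names `gcs_storage_options` and `board_gcs_with_token` are hypothetical and are not part of pins; the `cache_timeout` default of 0 is simply copied from the working example.

```python
def gcs_storage_options(token=None, cache_timeout=0):
    """Build a storage_options dict shaped like the one in the working example."""
    opts = {"cache_timeout": cache_timeout}
    if token is not None:
        # Only set "token" when one is given, so the filesystem's own
        # default credential lookup still applies otherwise.
        opts["token"] = token
    return opts


def board_gcs_with_token(path, token=None):
    """Hypothetical helper (not in pins): forward a token to pins.board()."""
    import pins  # imported lazily; the helper above works without pins installed

    return pins.board("gcs", path, storage_options=gcs_storage_options(token))


# Example of the options that would be forwarded for the service account file:
print(gcs_storage_options("google-pins.json"))
```

With this shape, `board_gcs_with_token("pins-testing", token="google-pins.json")` would be equivalent to the `storage_options` call that worked, while omitting `token` leaves authentication to whatever the underlying filesystem does by default.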

Also FWIW with GCS specifically, I'm still fuzzy on how the CLI authentication interacts with what I can do from Python. I did try authenticating via the CLI with gcloud auth application-default login and I'm not sure whether that was important.

@cpcloud
Contributor

cpcloud commented Jul 19, 2023

IME it's best to delegate to the underlying libraries and, if those libraries are also hand-rolling authentication APIs for their users, to help them use the provider-implemented APIs to handle authentication and authorization.

Exposing authz to users almost always fails to account for the many ways credentials can be set.

@cpcloud
Contributor

cpcloud commented Jul 19, 2023

FWIW I've had no problems authenticating to GCS using pins.

It's likely that gcsfs is using the Google-implemented authentication libraries, which will, among other things, look in the correct user directories for credentials (which, as you allude to, are set up by running gcloud auth ...).
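To check whether the credentials written by `gcloud auth application-default login` are present, a small diagnostic sketch follows. It assumes the conventional location of the Application Default Credentials file (`~/.config/gcloud/` on Linux and macOS, `%APPDATA%\gcloud\` on Windows) and ignores any customised gcloud config directory. The function name `gcloud_adc_path` is made up for illustration.

```python
import os
from pathlib import Path


def gcloud_adc_path(environ=None, home=None):
    """Return the conventional path of the gcloud ADC file (it may not exist)."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    if os.name == "nt":
        # On Windows the gcloud config directory lives under %APPDATA%.
        base = Path(environ.get("APPDATA", str(home)))
    else:
        base = home / ".config"
    return base / "gcloud" / "application_default_credentials.json"


path = gcloud_adc_path()
print(path, "exists" if path.is_file() else "is missing")
```

If the file exists, the Google auth library can pick it up without any token being passed from Python, which may explain why authentication works for some setups with no extra configuration.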

@machow
Collaborator

machow commented Jul 24, 2023

I wonder if, at the very least, we should include in the docstring of board() something similar to the example that worked. That way, there's a quick escape hatch for manually passing arguments to the underlying fsspec.filesystem constructor.

Another kind of weird thing that maybe we can nudge gcsfs upstream on: AFAIK it respects the GOOGLE_APPLICATION_CREDENTIALS env var (though I haven't checked recently), but this isn't documented in the gcsfs docs (maybe it's a behavior of the lower-level Google auth library?).
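Until that behavior is confirmed and documented, one way to avoid depending on it is to read the variable explicitly and pass its value as the `token` storage option. This is a minimal sketch under that assumption; `token_from_environment` is a hypothetical name, not a pins or gcsfs function.

```python
import os


def token_from_environment(explicit_token=None, environ=None):
    """Pick a token: an explicit value wins, then GOOGLE_APPLICATION_CREDENTIALS."""
    environ = os.environ if environ is None else environ
    if explicit_token is not None:
        return explicit_token
    # Treat an unset or empty variable as "no token".
    return environ.get("GOOGLE_APPLICATION_CREDENTIALS") or None


token = token_from_environment()
# Only include "token" when something was found, so library defaults still apply.
storage_options = {"token": token} if token else {}
print(storage_options)
```

The resulting dict can be handed to `pins.board("gcs", "pins-testing", storage_options=storage_options)`, the same call shape as the example that worked earlier in the thread.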

3 participants