Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax for source declaration #1

Open
rossmeissl opened this issue Apr 26, 2016 · 11 comments
Open

Syntax for source declaration #1

rossmeissl opened this issue Apr 26, 2016 · 11 comments

Comments

@rossmeissl
Copy link
Member

How does

AWS_ACCESS_KEY_ID     secrets/services/aws:id

Indicate that we're pulling from Vault specifically (as opposed to Keywhiz)? Shouldn't it be something like:

AWS_ACCESS_KEY_ID     vault/secrets/services/aws:id

Or maybe I'm missing something?

@dkastner @emk @seamusabshere

@emk
Copy link

emk commented Apr 26, 2016

Right now, we assume that an application talks to a single secret server,
and which secret server it uses is determined by looking for env vars like
VAULT_ADDR. There is a syntax in the Rust implementation for keywhiz keys
(which don't have paths or key value maps), but no backend for keywhiz yet
because the server setup is gross.

Honestly, the format was never designed for multiple secret servers, or
secret servers of multiple kinds at once. Would you be interested in
sketching out a couple of use cases, so that we can weigh the benefits
against any added UI complexity? Thank you for preparing the overview page!
Le 26 avr. 2016 6:56 PM, "Andy Rossmeissl" [email protected] a
écrit :

How does

AWS_ACCESS_KEY_ID secrets/services/aws:id

Indicate that we're pulling from Vault specifically (as opposed to
Keywhiz)? Shouldn't it be something like:

AWS_ACCESS_KEY_ID vault/secrets/services/aws:id

Or maybe I'm missing something?

@dkastner https://github.com/dkastner @emk https://github.com/emk
@seamusabshere https://github.com/seamusabshere


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1

@rossmeissl
Copy link
Member Author

@emk:

I don't have any use cases in mind, it's just not clear to me how a client implementing the Secretfile syntax would go about determining which backend a specific secret comes from. Maybe I'm misunderstanding though. If a library comes across this:

secrets/services/aws:id

How does it know that it should try Vault (via VAULT_ADDR), looking for secrets/services/aws:id there, instead of trying Keywhiz, or any other backend? Or is the specification that it should try all possible backends in alphabetical order or something? Or maybe the /[a-z/:]+/ path-like syntax is enough to identify this as a Vaulty string?

@emk
Copy link

emk commented Apr 26, 2016

Ah, OK. Let's see if I can explain. The process is driven first by
environment variables, and only secondarily by the contents of Secretfile:

  1. If VAULT_ADDR is set, assume that we should talk to Vault.
  2. (If KEYWHIZ_WHATEVER is set, assume that we should talk to Keywhiz.
    But this isn't implemented yet.)
  3. If no backend environment variables are set, assume that we're always
    going fetch secrets directly from the environment.
  4. If we are fetching secrets from an external server, look in
    Secretfile to figure out the mappings between env var names and the
    backend's secret names/paths.

The key insight is that we never even look at the Secretfile until we've
decided how to interpret what's in it. This seems to work tolerably well
for the single-secret-server case, but it would obviously break if an org
had multiple backends.

Does this at least make sense of the current design?

On Tue, Apr 26, 2016 at 7:07 PM, Andy Rossmeissl [email protected]
wrote:

@emk https://github.com/emk:

I don't have any use cases in mind, it's just not clear to me how a client
implementing the Secretfile syntax would go about determining which backend
a specific secret comes from. Maybe I'm misunderstanding though. If a
library comes across this:

secrets/services/aws:id

How does it know that it should try Vault (via VAULT_ADDR), looking for
secrets/services/aws:id there, instead of trying Keywhiz, or any other
backend? Or is the specification that it should try all possible backends
in alphabetical order or something? Or maybe the /[a-z/:]+/ path-like
syntax is enough to identify this as a Vaulty string?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1 (comment)

@rossmeissl
Copy link
Member Author

@emk but what if both VAULT_ADDR and KEYWHIZ_ADDR were set? Doesn't that seem like an ambiguous way to do driver-selection?

@rossmeissl
Copy link
Member Author

@emk I guess I'm saying that from a spec perspective if we say that when faced with a Secretfile line we can tokenize it as follows:

{secret_name} {backend}/{uninterpolated_key}

The Secretfile consumer parses the line, interpolates env vars into the key if necessary, send the (interpolated) key to the driver for backend, then makes the result available as secret_name. The current syntax forces the consumer to guess which backend to query for the key, which seems like a tall order.

@emk
Copy link

emk commented Apr 27, 2016

Well, I can see two cases:

  1. We want to support multiple instances of vault, keywhiz, etc., in a
    single Secretfile.
  2. We only case about supporting a single secret server.

In case (1), prefixing keys with "vault/" or "keywhiz/" is insufficient,
because if you can have both a Vault and a Keywhiz server, you can surely
have two Vault servers, and there's no way to disambiguate. I'm not sure if
(1) is an important use case, but if it is, we should support it well.

In case (2), specifying "vault/" on every line of the file is redundant,
and if we decide to specify it in the local file, then we should do so in a
single line at the top.

There's also another important use case to keep in mind here: The entire
point of the Secretfile format is that a container can switch between using
environment variables and using a secret-storage server at runtime. If
VAULT_ADDR is present, we assume that we should use Vault. But if
VAULT_ADDR isn't specified, we fall back to 12factors-style config. We rely
heavily on this internally, both in development mode and when testing
containers, and it turns out to be one of the key pieces of the puzzle. The
presence of a Secretfile in the container, by itself, cannot be the sole
determining factor in whether the Vault backend (for example) gets
activated, because that would increase the complexity of a lot of code.

Now, there are some other interesting use cases that we haven't fully
thought through. For example:

Distributing containers which switch between 12factor and vault at the

user's choice

The vision here is that you could package a standard container, and allow
users to specify where it gets its secrets.

Right now, the way that you do this is you install an appropriate
Secretfile library or tool, and build a commented-out Secretfile into your
container. Then, if the user wants 12factor-style, they proceed normally
and pass secrets via the environment. If they want to use Vault, they add a
Secretfile containing the paths where those secrets can be found in their
local backend, and they ADD it to the container.

I'm not sure whether this is a good design, but it's arguably at least
workable, and we might be able to find better. But even in this case, the
user can switch between Vault and non-Vault mode by passing VAULT_ADDR into
the container, or not, allowing them to run containers locally without
Vault, or to run them in testing environments.

I'm definitely happy to tweak the design here. We already have a number of
known use-cases based on converting a lot of different containers, running
them, and testing them. So let's make a list of any additional use cases we
want to support, and then come up with the simplest behavior that supports
those use cases. (Simplicity is key here, because we want the library to be
trivially portable to new languages.)

I'm also willing to consider the idea that we should dump the custom
Secretfile parser in favor a human-friendly syntax like "toml"
https://github.com/toml-lang/toml or "hcl" https://github.com/hashicorp/hcl
, both of which are trivial wrappers around JSON that allow comments, etc.
This would be worthwhile if (1) we're going to change the format anyway,
and especially if (2) we want to add more features. Personally, I think
toml is a nicer format, but hcl will be familiar to vault users at least
(which might or might not be a plus for Secretfile users).

On Tue, Apr 26, 2016 at 7:20 PM, Andy Rossmeissl [email protected]
wrote:

@emk https://github.com/emk but what if both VAULT_ADDR and KEYWHIZ_ADDR
were set? Doesn't that seem like an ambiguous way to do driver-selection?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1 (comment)

@rossmeissl
Copy link
Member Author

rossmeissl commented Apr 27, 2016

@emk I think you're right that case (1) above — multiple Vaults/Keywhizzes in a single Secretfile — sounds aggressive for now.

And I hear you about supporting 12factor style. If we were to write a high-level spec for how a Secretfile-friendly secrets library should behave, would it be something like this?

  1. Application asks library for a secret (via credentials::var("TOP_SECRET") or SecretGarden.fetch('TOP_SECRET'), say)
  2. Library checks the environment first: if it finds TOP_SECRET there, returns that value
  3. Otherwise, library checks to see if VAULT_ADDR, KEYWHIZ_ADDR, or some other supported backend config environment variable is present—if not, library throws an exception
  4. Library attempts to find TOP_SECRET in the Secretfile—if missing, library throws an exception
  5. Library attempts to find the TOP_SECRET secret listed in the Secretfile in the previously found backend. If found, caches the result and returns it; if not, library throws an exception.

I realize that's probably wrong in several respects, but as a straw man I see a few immediate issues:

  • You could ask the library for any environment variable—the spec above doesn't force a whitelist check against Secretfile. Seems like the library should load/parse the Secretfile at runtime and whitelist the named secrets, throwing an exception when an unknown secret is requested (even if it coincidentally happens to be in the environment)
  • Step (3) still feels like an onerous search process for an implementing library. I think your suggestion above of a single directive at the top of the Secretfile makes the most sense: @source vault or something like that. We could even specify vault as the default if we want to make it special. Importantly, the library shouldn't try to connect to the backend or even care about the declared source at all unless it must because, later on, a whitelisted secret is requested that isn't in the environment. That ensures that, say, a development box with no vault and 12factor-style env stuffing works just fine.
  • Let's say, as a thought experiment, that someday we do have to support multiple declared backends in a single Secretfile. Can we imagine a backwards-compatible extension to the current syntax? If not, we may want to provide a "hook" for that now. For example, we could say that current implementers should ignore any secret identifiers that start with @. Then later we could extend the spec to indicate that when an identifier starts with @, the first chunk of the identifier should get shifted off and used to select source. For now, though, this is as simple as specifying that secret identifiers should never start with a @ (or whatever character makes the most sense).

FWIW I like the simple format now and don't see a pressing need to use TOML or HCL.

@rossmeissl rossmeissl changed the title Multi-store syntax Syntax for source declaration Apr 27, 2016
@emk
Copy link

emk commented Apr 27, 2016

These are all excellent ideas. Let me think about the various use cases
today and see how many of these will work well in practice.

I do think, however, that it might be best to for the library to pick at
backend at startup time, even if it never decides to use it, rather than
trying to choose a backend lazily on demand. Lazy evaluation and/or late
binding feel a bit unnatural in a static language, and they usually add
code complexity.

I'd strongly prefer for the following function (or its equivalent) to be
run once, at library startup time:
https://github.com/emk/credentials/blob/99867d4f3951d56a330882d47d0b46129192489f/src/chained.rs#L26-L43

(Ignore the allow_override. This is always true for a normal
implementation of Secretfile, and is only false for certain
credentials-to-env use cases involving retrofitting third-party Docker
images that supply garbage values in env vars and don't allow you to unset
them. If you implement Secretfile as a library, this should never be
relevant, because you'll control the env vars declared in your image.)

On Wed, Apr 27, 2016 at 9:05 AM, Andy Rossmeissl [email protected]
wrote:

@emk https://github.com/emk I think you're right that case (1) above
— multiple Vaults/Keywhizzes in a single Secretfile — sounds aggressive for
now.

And I hear you about supporting 12factor style. If we were to write a
high-level spec for how a Secretfile-friendly secrets library should
behave, would it be something like this?

  1. Application asks library for a secret (via
    credentials::var("TOP_SECRET") or SecretGarden.fetch('TOP_SECRET'),
    say)
  2. Library checks the environment first: if it finds TOP_SECRET there,
    returns that value
  3. Otherwise, library checks to see if VAULT_ADDR, KEYWHIZ_ADDR, or
    some other supported backend config environment variable is present—if not,
    library throws an exception
  4. Library attempts to find TOP_SECRET in the Secretfile—if missing,
    library throws an exception
  5. Library attempts to find the TOP_SECRET secret listed in the
    Secretfile in the previously found backend. If found, caches the result and
    returns it; if not, library throws an exception.

I realize that's probably wrong in several respects, but as a straw man I
see a few immediate issues:

You could ask the library for any environment variable—the spec
above doesn't force a whitelist check against Secretfile. Seems like the
library should load/parse the Secretfile at runtime and whitelist the named
secrets, throwing an exception when an unknown secret is requested (even if
it coincidentally happens to be in the environment)

Step (3) still feels like an onerous search process for an
implementing library. I think your suggestion above of a single directive
at the top of the Secretfile makes the most sense: @source vault or
something like that. We could even specify vault as the default if we
want to make it special. Importantly, the library shouldn't try to connect
to the backend or even care about the declared source at all unless it
must because, later on, a whitelisted secret is requested that isn't
in the environment. That ensures that, say, a development box with no vault
and 12factor-style env stuffing works just fine.

Let's say, as a thought experiment, that someday we do have to
support multiple declared backends in a single Secretfile. Can we imagine a
backwards-compatible extension to the current syntax? If not, we may want
to provide a "hook" for that now. For example, we could say that current
implementers should ignore any secret identifiers that start with @.
Then later we could extend the spec to indicate that when an identifier
starts with @, the first chunk of the identifier should get unshifted
off and used to select source. For now, though, this is as simple as
specifying that secret identifiers should never start with a @ (or
whatever character makes the most sense).

FWIW I like the simple format now and don't see a pressing need to use
TOML or HCL.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#1 (comment)

@rossmeissl
Copy link
Member Author

rossmeissl commented Apr 27, 2016

@emk cool, sounds good. The one problem with the code snippet you linked to (and that behavior in general) is that early, greedy startup-time secret prep may impose an unnecessary burden on development and other toy environments. For example:

# Secretfile
ROLLBAR_TOKEN    secrets/rollbar:token

In production, there will be no ROLLBAR_TOKEN env var, so under your proposal, the implementing library will pull this secret out of vault at startup time. Then later when the application wants to connect to Rollbar, the secret's ready. That works great, as intended.

But in, say, development, we never want to talk to Rollbar. So ROLLBAR_TOKEN won't be and shouldn't be in the environment; nor do we want to connect to Vault for this! What should the library do while parsing a Secretfile when it finds a secret that's not in the environment and doesn't have a VAULT_ADDR to use to look in a store? Shrug and move on or raise?

Finally, aren't secrets from Vault often temporary? If they're retrieved at startup time, won't that make it more likely they'll be expired by the time they're needed? Or is the expectation that secrets are employed near startup time as well?

(FWIW I like the idea of startup-time secret resolution, but the Rollbar example made me rethink.)

@emk
Copy link

emk commented Apr 28, 2016

I definitely agree that we should not fetch secrets at startup time, except in a tool like credentials-to-env that just fetches everything at once and then executes another program. As you point out, this would cause various issues, especially with dynamically-generated secrets.

What I was suggesting is that we choose the backend configuration at startup time. If we want to use the environment, and fall back to vault if no secrets are available, that should be determined at startup, so that we can create all the necessary driver objects together.

Certainly, nothing obliges an implementation to do it this way, but I'd like to be possible to fully determine where secrets are supposed to come from (if it turns out we actually need to fetch them) as early in the process as possible.

@rossmeissl
Copy link
Member Author

@emk OK yep agreed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants