Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with race conditions connecting to Vault #65

Merged
merged 3 commits into from
Jul 29, 2021

Conversation

rmc47
Copy link
Contributor

@rmc47 rmc47 commented Apr 20, 2021

Related to #61 and possibly hashicorp/vault-ruby#216, we've been seeing a large number of Vault::ConnectionPool::PoolShuttingDownError errors running hiera_vault on a Puppet server.

There looks like there's two (related) possible race conditions in play: while a single Puppet JRuby instance serves one request at a time, I think the background thread that the Debouncer calls shutdown on can happen concurrently with the main thread's use of the Vault client.

I think this flow is possible:

  • First request served; shutdown debouncer gets scheduled for T+10
  • At T+9.9, second request starts being served; during execution (after configuration but before completion), debouncer fires and shuts down Vault client; logical_read operation then fails

I think there's also an issue where $vault gets set to nil following a StandardError, but not recreated when it's used on a subsequent request?

This addresses both scenarios by ensuring all use of $vault happens within a critical section. This should not substantially affect performance: the mutex should always be uncontended except where the shutdown and vault_get calls happen at the same time. It also ensures (within the critical section) that the vault client is not nil, and creates a new one if required.

GitHub's diff is pretty unhelpful due to indentation changes, I'm afraid - ignoring whitespace changes gives something much more readable. I've changed the names of the global variables to avoid any conflicts with other global variables from older versions of this module running concurrently in other Puppet environments, as I think Ruby globals are not isolated between environments.

(Caveat: my Ruby knowledge is pretty poor - apologies if I've taken the wrong approach on this.)

@petems
Copy link
Owner

petems commented Apr 20, 2021

This looks like a good approach, I'll see if I can quickly test it and merge this week.

Can you rebase as well? Just merged a fix to ENV setup

@rmc47
Copy link
Contributor Author

rmc47 commented Apr 20, 2021

For sure - rebased on 4376028. (I've kept it as two commits - one for the critical section, one for the variable rename to avoid environment escape - but shout if you'd prefer it as one.)

…versions of this function and this one in different environments
@petems
Copy link
Owner

petems commented Jul 23, 2021

@rmc47 So sorry, I completely forgot about this PR. I've just merged in some logic for comma-based pathing that has a bunch of refacotring, are you ok to rebase again? If not no worries, I think I get the jist of the logic here so I think I can do it.

petems added a commit that referenced this pull request Jul 23, 2021
- Building on the original work in #65
@petems
Copy link
Owner

petems commented Jul 23, 2021

Ok, I think I've rebased it correctly, gonna see if I can manually check this in an acceptance test environment 👍🏻

@petems petems changed the title Perform operations on the Vault client within a critical section Fix issues with race conditions connecting to Vault Jul 23, 2021
@rmc47
Copy link
Contributor Author

rmc47 commented Jul 23, 2021

Amazing, thank you! To be honest, we'd entirely forgotten about it as well and have been running off my branch for the last couple of months 😆. Would be great to get back on to main if you're happy with the fix!

@petems
Copy link
Owner

petems commented Jul 23, 2021

Glad to hear! Are you ok to test this particular rebased branch as well to confirm it works? If so, I'll merge and cut a new 2.0.0 version of the module as a potential breaking change, then you can move back to the master branch 👍🏻

@rmc47
Copy link
Contributor Author

rmc47 commented Jul 29, 2021

@petems that's been running for the last few days, and all looks good our side!

@petems
Copy link
Owner

petems commented Jul 29, 2021

Awesome, lets get this merged and a new release cut! 😄

@petems petems merged commit 184fe95 into petems:master Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants