| title | expires_at | tags |
|---|---|---|
| Dynamic ASGs | never | |
- Dynamic ASGs
- Overview
- FAQ
- 1. I'm not using silk. How does this change things?
- 2. How do I enable dynamic ASGs?
- 3. How do I disable dynamic ASGs?
- 4. I don't like waiting a few minutes for rules to be enforced. Can it be faster?
- 5. Did we stop passing ASG information through the LRP?
- 6. If I create an ASG and immediately push an app, will the ASGs be up-to-date?
- 7. What about windows?
- 8. Why are my security-group updates taking a very long time to be detected and synced through to policy-server?
- Problem:
- Solution Illustration (bolded pages are the ones we actually query capi for):
- Q&A:
An ASG (Application Security Group) is a set of network egress rules: an allow list of IPs and protocols that apps are allowed to access.
As of cf-networking-release 3.0.0 and silk-release 3.0.0, dynamic ASG enforcement is available and on by default.
Before we talk about dynamic ASGs, it is important to understand what existed first.
When dynamic ASG enforcement is not enabled, ASG rule changes require an app restart to take effect. Cloud Foundry operators with large installations and distributed teams spend an enormous amount of time wrangling developer teams into restarting their apps when global rules change. This experience is also at odds with expectations: developers expect these changes to take effect immediately, with no extra action on their part.
**Steps taken to create and enforce ASGs**

1. An operator creates and binds an ASG.
1. An app dev restarts their app.
1. The cloud controller creates a desired LRP with the ASG rules.
1. The rep gets desired LRP information when launching containers.
1. The executor calls out to garden to start a container.
1. Garden calls out to the silk CNI plugin to set up the networking for the container.
1. The silk CNI plugin creates ASGs as iptables netout rules.

### With Dynamic ASGs

When dynamic ASG enforcement _is_ enabled, ASG rule enforcement _does not_ require app restarts for rules to take effect. Rules are enforced automatically within a few minutes.

![dynamic ASG architecture diagram](dynamic-asg-enforcement-architecture.png)

**Steps taken to create and enforce dynamic ASGs**

1. An operator creates and binds an ASG.
1. Cloud Controller updates its internal reference to the last time ASGs were modified. **NOTE** This only occurs on `/v3` API endpoints.
1. A new job, the Policy Server ASG Syncer, polls Cloud Controller for the last time ASGs were updated. If changes have been made, it syncs all ASGs. The syncer saves all the ASGs in the policy server DB. This poll interval is controlled by the syncer's bosh property `asg_poll_interval_seconds` (`cf-networking-release/jobs/policy-server-asg-syncer/spec`, lines 31 to 33 in 0c5029c).
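The polling step described above can be sketched roughly as follows. This is a minimal Python illustration of the idea only, not the syncer's actual code: `FakeCloudController`, `Syncer`, `sync_once`, and the in-memory `policy_db` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloudController:
    """Hypothetical stand-in for Cloud Controller; not a real client."""
    asgs: list = field(default_factory=list)
    last_update: int = 0

    def bind_asg(self, name: str) -> None:
        # Creating or binding an ASG bumps the "last modified" marker.
        self.asgs.append(name)
        self.last_update += 1


@dataclass
class Syncer:
    """Hypothetical syncer: copies all ASGs whenever the marker has moved."""
    cc: FakeCloudController
    policy_db: list = field(default_factory=list)
    seen_update: int = -1

    def sync_once(self) -> bool:
        # One poll cycle. Returns True if a full sync was performed.
        if self.cc.last_update == self.seen_update:
            return False  # nothing changed since the last poll
        self.policy_db = list(self.cc.asgs)  # sync *all* ASGs, not a diff
        self.seen_update = self.cc.last_update
        return True


cc = FakeCloudController()
syncer = Syncer(cc)
first = syncer.sync_once()    # initial sync
idle = syncer.sync_once()     # no change, so no sync
cc.bind_asg("allow-dns")
changed = syncer.sync_once()  # change detected, full sync
```

In the real job, a cycle like `sync_once` would presumably run once per `asg_poll_interval_seconds`, which is why rule changes take up to a poll interval to be picked up.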
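### Solution Illustration (bolded pages are the ones we actually query capi for):

First Query (page=1, page_size=5000):
**page1: 0-4999**, page2: 5000-9999, page3: 10000-14999, page4: 15000-19999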
Second Query (page=2, page_size=4999):
page1: 0-4998, **page2: 4999-9997**, page3: 9998-14996, page4: 14997 - 19995, page5: 19996-19999
Third Query (page=3, page_size=4998):
page1: 0-4997, page2: 4998-9995, **page3: 9996-14993**, page4: 14994 - 19991, page5: 19992 - 19999
Fourth Query (page=4, page_size=4997):
page1: 0-4996, page2: 4997-9993, page3: 9994 - 14990, **page4: 14991 - 19987**, page5: 19988 - 19999
Fifth Query (page=5, page_size=4996):
page1: 0-4995, page2: 4996 - 9991, page3: 9992 - 14987, page4: 14988 - 19983, **page5: 19984 - 19999**
On the second query, we check that index 0 of the second query (4999) is the same as the last index of the first query (4999).
On the third query, we check that index 1 of the third query (9997) is the same as the last index of the second query (9997).
On the fourth query, we check that index 2 of the fourth query (14993) is the same as the last index of the third query (14993).
On the fifth query, we check that index 3 of the fifth query (19987) is the same as the last index of the fourth query (19987).
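The overlap check described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the syncer's implementation: `fetch_all`, `make_fake_fetcher`, and `ListChangedError` are hypothetical names, the "server" is an in-memory list, and what to do on a mismatch (for example, restarting the poll cycle) is left to the caller.

```python
class ListChangedError(Exception):
    """Raised when the overlap check suggests the list changed mid-sync."""


def make_fake_fetcher(items, delete_after_first_call=None):
    """Hypothetical paginated endpoint backed by a mutable list."""
    calls = 0

    def fetch_page(page, per_page):
        nonlocal calls
        start = (page - 1) * per_page
        batch = items[start:start + per_page]
        calls += 1
        # Optionally simulate a deletion that lands between two queries.
        if calls == 1 and delete_after_first_call is not None:
            items.remove(delete_after_first_call)
        return batch

    return fetch_page


def fetch_all(fetch_page, first_page_size):
    """Fetch every item, shrinking the page size by one per query."""
    result = list(fetch_page(1, first_page_size))
    if len(result) < first_page_size:
        return result  # everything fit in the first page
    page = 2
    while True:
        per_page = first_page_size - (page - 1)
        overlap_index = page - 2  # index 0 on query 2, index 1 on query 3, ...
        if overlap_index >= per_page:
            raise RuntimeError("page too small to hold the overlap")
        batch = fetch_page(page, per_page)
        # The item at overlap_index must be the last item already seen.
        if len(batch) <= overlap_index or batch[overlap_index] != result[-1]:
            raise ListChangedError(f"overlap mismatch on page {page}")
        result.extend(batch[overlap_index + 1:])  # keep only the new items
        if len(batch) < per_page:
            return result  # short page means we reached the end
        page += 1


# No changes during the sync: every item is returned exactly once.
stable = fetch_all(make_fake_fetcher(list(range(250))), first_page_size=100)

# Item 50 is deleted after the first query: the shift is detected.
try:
    fetch_all(make_fake_fetcher(list(range(250)), delete_after_first_call=50),
              first_page_size=100)
    detected = False
except ListChangedError:
    detected = True
```

Without the overlap, the second query in the deletion case would silently skip one item, because everything after the deleted entry shifts up by one position.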
### Q&A:

Q: Why is this complex pagination necessary?

A: We need to detect any changes (deletions) in the capi response that happened after the start of the poll cycle. We sort by `created_at`, so any additions are at the end. However, deletions are likely somewhere in the middle and cause all ASGs following them to be shifted up, causing us to miss non-deleted ASGs (see "Guard against the following scenario", above).

Q: Why do we have to decrement page size for each page?

A: We want to create an overlap between the response of the last query and the present query.
Capi lets us set `page`, `per_page`, and `order_by` query parameters. Given the query parameters we have access to, decrementing `page_size` as we increment `page` is the only way we can create an overlap (see Solution Illustration, above).

Q: Why do we have to increment the index of the ASG we are inspecting?

A: As we decrement, the overlap we create gets bigger and bigger. We don't need to compare all the contents of the overlap, just the last overlapping ASG (see Solution Illustration, above).

Q: What happens when there are more than 5000 pages?

A: Actually, once there are *2500* pages, we run out of space in the result set. (This is because page size goes DOWN at the same time that index goes UP.) This would be a problem; however, even our customers with the largest numbers of ASGs come nowhere near 2500 pages of 5000, 4999, 4998, ... ASGs each.

Q: How long will this algorithm be in place?

A: Our first priority for TAS 2.14 is to refactor ASGs so that policy-server, not capi, is the source of truth for ASGs. This will remove this feature's dependency on capi and we will be able to remove this algorithm.
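The limit of roughly 2500 pages mentioned in the Q&A follows from the page size shrinking by one per query while the inspected index grows by one. The sketch below works through that arithmetic in Python. It is a model of the scheme as described in this document, and the function names are hypothetical.

```python
def last_usable_query(first_page_size):
    """Largest query number whose page can still hold the overlap index."""
    query = 1
    while True:
        nxt = query + 1
        per_page = first_page_size - (nxt - 1)  # page size shrinks by one
        overlap_index = nxt - 2                 # inspected index grows by one
        if overlap_index >= per_page:
            return query
        query = nxt


def reachable_items(first_page_size):
    """Total items retrievable before the overlap no longer fits."""
    total = first_page_size
    for query in range(2, last_usable_query(first_page_size) + 1):
        per_page = first_page_size - (query - 1)
        total += per_page - (query - 1)  # overlap items are not new
    return total


limit = last_usable_query(5000)
capacity = reachable_items(5000)
```

With a first page size of 5000, this model gives 2501 as the last usable query, in line with the figure of roughly 2500 pages given in the Q&A.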