URL Grouping/Aggregation #371

tom-miseur · 2022-06-03T23:19:13Z

It is often useful to aggregate endpoint URLs that contain dynamic values. This is critical in the k6 Cloud due to the limits we have in place to prevent tests from emitting too-many-metrics/too-many-urls.

The URL Grouping documentation provides a solution for k6 scripts using the http module, but because xk6-browser operates at the browser-level, there is no opportunity for the user to apply the name tag to requests that require it.

The situation is compounded by the fact that xk6-browser gains visibility of all HTTP requests incurred by the browser, including 3rd party hosts that would not normally be interacted with at all using HTTP k6 scripts.

Potential solutions

Allowlist/blocklist hosts in xk6-browser

A cursory browse through Playwright docs suggests there is no convenient way of preventing/allowing requests to certain hosts, e.g. through specifying regular expressions. There is, however, a request interception mechanism involving Page.route or BrowserContext.route that could be used to abort requests that don't fit the criteria.
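A rough sketch of what such interception could look like, assuming Playwright's route handler shape (route.request().url(), route.continue(), route.abort()). The allowlist contents and the helper names isAllowed and filterByHost are made up for illustration, and none of this is an existing xk6-browser API:

```javascript
// Hypothetical allowlist: only requests to hosts matching one of these
// patterns are let through.
const allowedHosts = [/^ecommerce\.test\.k6\.io$/];

// Returns true when the URL's hostname matches any allowlist pattern.
function isAllowed(url, allowlist) {
  const { hostname } = new URL(url);
  return allowlist.some((re) => re.test(hostname));
}

// Route handler in the shape Playwright passes to page.route():
// the request is either continued or aborted based on its host.
async function filterByHost(route) {
  if (isAllowed(route.request().url(), allowedHosts)) {
    await route.continue();
  } else {
    await route.abort();
  }
}

// With Playwright this would be registered roughly as:
//   await page.route('**/*', filterByHost);
```

A blocklist would be the same handler with the condition inverted.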

Pros:

would appear to support regex allow/block-listing which should be fairly easy for users to apply
doesn't actually send requests to undesired hosts, so no need to wait for #1321 and no need to worry about errors from 3rd party hosts (e.g. errors caused by rate limiting)

Cons:

users have to run into the problem before they can work out how to resolve it

Allowlist/blocklist hosts after-the-fact

This means xk6-browser still sends requests to the additional hosts, but that traffic can be filtered out of results.

Pros:

the user wouldn't need to run the test again to have filtering applied

Cons:

requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
k6 OSS would need some mechanism to ignore metrics from certain hosts (#1321)
k6 Cloud would need to be able to filter out hosts (unless #1321 would result in k6 Cloud not receiving the metrics at all which is quite likely)

Aggregation Rules

This would involve the user specifying URL grouping regular expressions (likely in options) ahead of time. Before any metric is generated, we check if the URL matches any of the patterns and apply the transformation as necessary.

Example:

export const options = {
  aggregations: [
    // Backslashes are doubled so they survive the string literal and reach the regex engine.
    { regex: 'http://ecommerce\\.test\\.k6\\.io/checkout/order-received/.*/\\?key=.*', replace: '[id]' }
  ]
}

// http://ecommerce.test.k6.io/checkout/order-received/124/?key=bgravga43g43 -> http://ecommerce.test.k6.io/checkout/order-received/[id]/?key=[id]
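One way the transformation step might work, as a minimal sketch. It assumes a slightly different rule shape from the example above: the regex captures the static parts of the URL and the replace string rebuilds it with $1/$2 back-references. The function name aggregateURL and that rule shape are assumptions, not part of k6:

```javascript
// Hypothetical rule shape: capture the static parts, replace the dynamic ones.
const aggregations = [
  {
    regex: /^(http:\/\/ecommerce\.test\.k6\.io\/checkout\/order-received\/)[^/]+(\/\?key=).+$/,
    replace: '$1[id]$2[id]',
  },
];

// Returns the grouped URL for the first matching rule, or the URL unchanged
// when no rule matches.
function aggregateURL(url, rules) {
  for (const { regex, replace } of rules) {
    if (regex.test(url)) {
      return url.replace(regex, replace);
    }
  }
  return url;
}
```

Applying the first matching rule only keeps the per-URL cost bounded by the number of rules.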

Pros:

fairly straightforward to use; possibly even easier to implement than tagging requests with name
would be applicable to both http and xk6-browser
also solves the edge case where redirect requests contain dynamic IDs (you can apply a name tag to the request that initiates the redirect chain, but then all requests in that chain end up with the same name tag)

Cons:

requests are sent to 3rd parties who may have rate limiting/bot protections in place that cause errors
users have to run into the problem before they can work out how to resolve it
performance is likely going to be a concern here, given that all URLs would need to be evaluated against one or more regular expressions

Tasks

Give feedback

imiric · 2022-06-06T08:38:17Z

As mentioned over Slack, support for k6's blockHostnames option was added in #204, and released in v0.2.0. So you can give that a try right now and see if it helps.
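For reference, a minimal options fragment using blockHostnames might look like this. The hostnames are placeholders for whatever third-party hosts show up in a given test:

```javascript
export const options = {
  // Requests to these hosts are blocked; a leading * acts as a wildcard.
  // Both hostnames below are placeholders.
  blockHostnames: ['tracker.example.net', '*.ads.example.org'],
};
```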

That said, we'll still have to implement URL grouping by name, since that's currently not possible.

Using regex for this would be the more flexible option, but sticking with globbing patterns, as with blockHostnames, would be more user-friendly. Considering this feature would also be useful for plain k6 scripts, where evaluating a regex for each URL might be too CPU-intensive, globbing would also perform better. Performance isn't as important for xk6-browser, since we don't make requests with nearly the same frequency, so regex might work for us as well, but globbing seems like the way to go.
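To make the globbing idea concrete, here is a minimal sketch of a matcher that only supports * as a wildcard and uses plain string operations instead of a regex engine. The name globMatch is made up, and this is not how k6 implements blockHostnames:

```javascript
// Matches `text` against `pattern`, where `*` stands for any run of
// characters (including none). Everything else is compared literally.
function globMatch(pattern, text) {
  const parts = pattern.split('*');
  if (parts.length === 1) return pattern === text;

  const first = parts[0];
  const last = parts[parts.length - 1];
  if (text.length < first.length + last.length) return false;
  if (!text.startsWith(first) || !text.endsWith(last)) return false;

  // The middle literals must appear in order between the prefix and suffix.
  let pos = first.length;
  const end = text.length - last.length;
  for (const part of parts.slice(1, -1)) {
    const idx = text.indexOf(part, pos);
    if (idx === -1 || idx + part.length > end) return false;
    pos = idx + part.length;
  }
  return true;
}
```

A pattern such as https://example.com/checkout/* would then group every checkout URL without the user having to escape anything.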

If we want to use the global options object, this will have to be implemented in k6 instead, since extensions don't have access to change it. It's worth discussing this with k6 devs, so @na--, WDYT? Would this feature also be useful for k6? If so, we should implement it there first, and then reuse the option in xk6-browser, in the same way we did for blockHostnames. If not, then this will have to be an xk6-browser-specific option, likely part of the BrowserContext options.

na-- · 2022-06-06T09:37:00Z

Hmm, I don't have a very strong opinion here, but I'd prefer if we can avoid doing this via a new global option, at least until we have a clear idea of how to implement that optimally... 🤔

Global options are always a heavy maintenance burden over time and they are often not flexible enough to address all use cases. In some cases they are unavoidable, but in general I think we've found that programmable APIs are both easier to maintain and more flexible.

In this case, maybe a new callback to the browser.newContext() parameters could be used? I am not familiar enough with xk6-browser to know if this is a good or even possible solution, just throwing it out there as a potential solution through the API instead of through the global config

dgzlopes · 2022-12-01T12:27:02Z

Sorry! I somehow missed responding to this one 😞

I thought it could be interesting to have an automatic way of doing this. After all, we have the metrics data and all the URLs in k6! (at least for some time).

Maybe we could have the option to aggregate "high cardinality data" that would check the latest URLs and remove the highly changing part (and replace it with id_X or something).
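A very rough sketch of that idea, assuming a simple heuristic: a path segment position is treated as "highly changing" once it has been seen with more than maxDistinct different values. The function name, the threshold, and the decision to drop query strings are all assumptions:

```javascript
// Collapses path segments whose position shows more than `maxDistinct`
// different values across the observed URLs. Query strings are dropped.
function groupHighCardinality(urls, maxDistinct = 2) {
  const parsed = urls.map((u) => new URL(u));
  const split = (u) => u.pathname.split('/').filter(Boolean);
  // Positions are keyed by host, path depth, and segment index.
  const keyFor = (u, segs, i) => `${u.host}|${segs.length}|${i}`;

  // First pass: record the distinct values seen at each position.
  const seen = new Map();
  for (const u of parsed) {
    const segs = split(u);
    segs.forEach((seg, i) => {
      const key = keyFor(u, segs, i);
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key).add(seg);
    });
  }

  // Second pass: replace high-cardinality positions with id_<index>.
  return parsed.map((u) => {
    const segs = split(u);
    const grouped = segs.map((seg, i) =>
      seen.get(keyFor(u, segs, i)).size > maxDistinct ? `id_${i}` : seg
    );
    return `${u.origin}/${grouped.join('/')}`;
  });
}
```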

There is a "similar" feature in Grafana that lets you dedup Loki logs based on the signature.

dgzlopes · 2022-12-01T12:29:35Z

Internally, if I remember correctly, we had something similar for Prometheus metrics labels, too (In Python).

ankur22 · 2024-09-26T11:57:36Z

After some discussion, we want to showcase the following API, which will soon be available for grouping metrics that are tagged with url. It deliberately differs from how the k6/http module groups metrics with high-cardinality URLs.

Here's the API (some details may slightly change):

export default async function() {
  const context = await browser.newContext();
  const page = await context.newPage();

  // Register a callback on the page object to be executed whenever a
  // metric is about to be emitted: offering the user the ability to build
  // their own logic and grouping of URLs. 
  page.on('metric', metric => {
    const regex = /^https:\/\/example\.com\/checkout\/[0-9a-f]*$/;

    // Group all browser metrics whose url tag matches the regex under the
    // name "example-checkout", which would allow the customer to build a
    // graph by querying for "example-checkout".
    if (regex.test(metric.tags['url'])) {
      metric.tags['name'] = 'example-checkout';
    }
  });

  await page.goto('https://example.com');
  await page.close();
}

The new API extends the page.on API (that already exists) to intercept and modify the metrics that are being emitted for the current page. In the example above we're working with the raw metric object. The user experience might not be tight, but it does give the user a lot of control over the metric.

NOTE: This will only intercept metrics that the browser module emits, which currently are:

browser_data_sent
browser_http_req_duration
browser_data_received
browser_http_req_failed
browser_web_vital_*

We also hope to offer an easier-to-use helper function on metric to reduce the boilerplate code:

page.on('metric', metric => {
  metric.groupURLTag({
    urls: [
      {url: /^https:\/\/example\.com\/[0-9a-f]*\/checkout\/[0-9a-f]*$/, name: "account-basket"},
      {url: /^https:\/\/example\.com\/catalogue\?session=[0-9a-f]*$/, name: "catalogue"},
    ],
  });
});
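To illustrate the matching the helper above is meant to perform, here is a plain-JavaScript stand-in that can be run outside k6. It assumes the first matching entry wins, which the proposal does not specify, and groupNameFor is a made-up name:

```javascript
// Hypothetical stand-in for the proposed helper: returns the `name` of the
// first entry whose `url` regex matches, or undefined when nothing matches.
function groupNameFor(url, urls) {
  const hit = urls.find((entry) => entry.url.test(url));
  return hit ? hit.name : undefined;
}

const urls = [
  { url: /^https:\/\/example\.com\/[0-9a-f]*\/checkout\/[0-9a-f]*$/, name: 'account-basket' },
  { url: /^https:\/\/example\.com\/catalogue\?session=[0-9a-f]*$/, name: 'catalogue' },
];
```

URLs that match no entry would keep their original url tag.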

jewbetcha · 2024-10-02T18:32:47Z

Jumping in here from the k8s monitoring team, we would be very interested in this!

ankur22 · 2024-10-03T11:47:28Z

While implementing this feature, I've had to change it slightly: the metric internals are no longer exposed, so the user can't amend them directly. Instead the focus is only on groupURLTag, which was the primary use case for this feature request. The reason for not exporting the metric itself is that there's still some uncertainty about the metric structure. It needs a bit more thought about what shape of metric object we want to expose, and there needs to be a clear reason for exposing it.

ankur22 · 2024-10-07T20:22:14Z

The final API looks like this:

  page.on('metric', (metric) => {
    metric.Tag({
      urls: [
        {url: /^https:\/\/test\.k6\.io\/\?q=[0-9a-z]+$/, name:'test'},
      ]
    });
  });

There's an example you can work with to get you started (remember to change the import to k6/browser).

tom-miseur added the feature A new feature label Jun 3, 2022

imiric added the evaluate label Jun 6, 2022

inancgumus added the next Might be eligible for the next planning (not guaranteed!) label Jun 9, 2022

inancgumus added this to the v0.5.0 milestone Jun 9, 2022

inancgumus modified the milestones: v0.5.0, v0.6.0 Jun 23, 2022

inancgumus modified the milestones: v0.6.0, v0.7.0 Nov 8, 2022

inancgumus self-assigned this Nov 8, 2022

inancgumus added the blocked We need further action from something/someone to be able to work on the issue label Nov 10, 2022

inancgumus removed their assignment Dec 5, 2022

inancgumus removed this from the v0.7.0 milestone Feb 7, 2023

ka3de added the user request Requested by the community label Jul 28, 2023

inancgumus removed the next Might be eligible for the next planning (not guaranteed!) label Sep 22, 2023

ankur22 added the next Might be eligible for the next planning (not guaranteed!) label Sep 18, 2024

ankur22 mentioned this issue Sep 20, 2024

Filter out request metrics by their URL tag #1434

Open

ankur22 assigned ankur22 and unassigned ankur22 Sep 26, 2024

ankur22 self-assigned this Sep 27, 2024

ankur22 mentioned this issue Oct 3, 2024

Add page.on('metric') #1456

Merged

3 tasks

inancgumus mentioned this issue Oct 4, 2024

Page.On #1460

Open

ankur22 mentioned this issue Oct 16, 2024

Add the ability to filter by request type in page.on('metric') #1487

Open

inancgumus mentioned this issue Oct 28, 2024

Support custom tags in page.goto() and other relevant methods #504

Open

This was referenced Nov 8, 2024

Add page.on('metric') type definition grafana/k6-DefinitelyTyped#73

Merged

Add page.on('metric') docs grafana/k6-docs#1807

Merged

BrewTestBot mentioned this issue Nov 11, 2024

k6 0.55.0 Homebrew/homebrew-core#197332

Merged
