
Are there plans to allow req_throttle() with multi_req_perform()? #224

Closed
michaelgfalk opened this issue May 5, 2023 · 3 comments
@michaelgfalk

You note this as a limitation in the docs. I'm currently writing a new wrapper for Wikipedia's APIs, and they have many endpoints that allow asynchronous requests but are strictly rate-limited. I'm rather new to all this—I imagine if it were simple to throttle asynchronous requests you would already have done it...

(Thanks for httr2. It's made it very easy for me to get started with this project!)

@hadley
Member

hadley commented May 9, 2023

It's not obvious how to implement this since we currently just send all the requests to curl::multi_run() and let it handle the details. Implementing throttling would require somehow gradually feeding in more requests over time, which we might implement as part of #8, but that still feels fairly far off.

(There's also some tension between performing requests in parallel to speed them up and throttling to slow them down.)

hadley closed this as completed May 9, 2023
@michaelgfalk
Author

Thanks Hadley. Some of the endpoints have high rate limits (e.g. 200/s), and I find I'm not hitting the limit with synchronous requests, although maybe there's some other reason for that. Perhaps I'll see if I can do something myself with the help of your hint (and your useful advice in Advanced R).
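For reference, the throttled synchronous version already works fine; a minimal sketch, with placeholder Wikipedia URLs:

library(httr2)

# Placeholder Wikipedia API calls, throttled to 200 requests per second.
# req_throttle() shares the limit across all requests in the same realm.
titles <- c("R_(programming_language)", "Hypertext_Transfer_Protocol")
urls <- paste0(
  "https://en.wikipedia.org/w/api.php?action=parse&format=json&page=",
  titles
)

resps <- lapply(urls, function(u) {
  request(u) |>
    req_throttle(rate = 200, realm = "en.wikipedia.org") |>
    req_perform()
})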

@benwhalley

Just to add, rate limiting on parallel requests would be super useful for us. We're using the Azure AI API and can currently make around 80 RPM for one model and 480 RPM for another (and these limits will be going up shortly). In both cases responses can take quite a long time (up to a minute), so we want to max out the RPM. It would be great to be able to set different limits when sending lists of prompts to different models.
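With the synchronous API, the per-model part can already be expressed through realms; a minimal sketch, assuming made-up Azure endpoint URLs and realm names:

library(httr2)

# Hypothetical deployments; distinct realms keep the two throttles independent.
# req_throttle() takes requests per second, so 80 RPM is 80/60 and 480 RPM is 480/60.
req_small <- request("https://example.openai.azure.com/openai/deployments/small") |>
  req_throttle(rate = 80 / 60, realm = "azure-small")

req_large <- request("https://example.openai.azure.com/openai/deployments/large") |>
  req_throttle(rate = 480 / 60, realm = "azure-large")

The ask here is the equivalent of that when the requests are performed in parallel.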

It seems like it would be fairly easy to add some code to do this here:

(from the implementation of req_perform_parallel() in httr2:)

for (i in seq_along(reqs)) {
  perfs[[i]] <- Performance$new(
    req = reqs[[i]],
    path = paths[[i]],
    progress = progress,
    error_call = environment()
  )
  perfs[[i]]$submit(pool)
}

pool_run(pool, perfs, on_error = on_error)  # this calls curl::multi_run()

You can call curl::multi_run as often as you like, so it seems like the logic could be:

  • define max_rpm
  • add the first max_rpm requests to the pool
  • call pool_run
  • if more requests remain, wait out the rest of the 60-second window
  • add the next max_rpm requests, and so on

This doesn't account for realm, but that could be implemented in a similar way?
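For example, a rough standalone sketch of that batching logic, using curl directly rather than httr2's internals (throttled_multi_run is a made-up name, and error handling is omitted):

library(curl)

# Sketch: run requests in batches of at most max_rpm, waiting out the
# remainder of each 60-second window before starting the next batch.
throttled_multi_run <- function(urls, max_rpm = 80) {
  pool <- new_pool()
  results <- vector("list", length(urls))
  batches <- split(seq_along(urls), ceiling(seq_along(urls) / max_rpm))

  for (b in seq_along(batches)) {
    window_start <- Sys.time()
    for (i in batches[[b]]) {
      local({
        idx <- i  # freeze the index for this handle's callback
        multi_add(
          new_handle(url = urls[[idx]]),
          done = function(res) results[[idx]] <<- res,
          pool = pool
        )
      })
    }
    multi_run(pool = pool)  # blocks until this batch completes

    # If more batches remain, sleep out the rest of the 60-second window
    if (b < length(batches)) {
      elapsed <- as.numeric(Sys.time() - window_start, units = "secs")
      if (elapsed < 60) Sys.sleep(60 - elapsed)
    }
  }
  results
}

Because multi_run() blocks until the batch completes, slow responses already eat into the window, so the sleep only kicks in when a batch finishes early.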
