-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pagination #279
Add pagination #279
Conversation
@hadley Would be great to get feedback on this so that we can add some basic pagination support to httr2 soon 😄 |
Looking at this very quickly, I wonder if we should start a bit lower level so if needed you get greater control over the process. Maybe something like this? library(tibblify)
req <- request("https://pokeapi.co/api/v2/pokemon") %>%
req_paginate(
next_field = "next",
results_field = "results",
limit = in_query("limit", 150L),
total_field = "count"
)
repeat({
resp <- req_perform(req)
req <- resp_next_request(resp)
if (is.null(req)) break
}) Once we have that in place, it would be trivial to build these one-size-fits-most helpers on top. I'm not sure what |
Good point with the lower level version. The api now consists of:
There are also three functions needed that specify where in the request a parameter is set:
They are needed in two ways: request("https://pokeapi.co/api/v2/pokemon") %>%
req_paginate_offset(
offset = in_query("offset"),
page_size = in_query("limit", 150L),
total = "count"
) The page size often isn't needed and could also be set before by the user. The advantage of having that in |
R/paginate.R
Outdated
req_paginate <- function(req, | ||
next_request, | ||
page_size = NULL, | ||
total = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the total
interface isn't flexible enough. The response might contain a) the number of elements (what I assume in this implementation) or b) the number of pages.
R/paginate.R
Outdated
check_character(next_url) | ||
|
||
next_request <- function(req, resp) { | ||
body_parsed <- resp_body_json(resp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to cache the parsed body. Otherwise the body is usually parsed twice.
R/paginate.R
Outdated
#' | ||
#' responses <- paginate_perform(req_pokemon) | ||
paginate_perform <- function(req, | ||
max_pages = 20L, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to add a callback data
that is applied on the response and returns the data to store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; but lets do that in the next PR.
R/paginate.R
Outdated
#' @export | ||
#' | ||
#' @examples | ||
#' in_query("start", 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need these helpers. It's not too awful to supply an anonymous function if there's an existing response helper:
page_size = \(resp) resp_url_query(resp, "limit", 150L),
next_page = \(resp) resp_link_url(resp, "next")
Then we'd just need something extra in resp_body_json()
that let you drill down to a specific component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(That makes me wonder if resp_link_url()
should actually be resp_header_link()
)
This will be sooooooooo useful! Is there anything I can do to help? If it's ready to kick the tires, I can try it out on a few APIs. |
I removed the library(tibblify)
# next url ----------------------------------------------------------------
req_next_url <- request("https://pokeapi.co/api/v2/pokemon") %>%
req_url_query(limit = 150) %>%
req_paginate_next_url(
next_url = \(resp) resp_body_json(resp)[["next"]],
n_pages = \(resp) {
calculate_n_pages(
page_size = 150,
total = resp_body_json(resp)$count
)
}
)
responses_next_url <- paginate_perform(req_next_url)
req_next_url_header <- request("https://api.github.com/repos/octocat/Spoon-Knife/issues") %>%
req_paginate_next_url(
next_url = \(resp) resp_link_url(resp, "next")
)
responses_next_url_header <- paginate_perform(req_next_url_header, max_pages = 3)
# offset ------------------------------------------------------------------
req_offset <- request("https://pokeapi.co/api/v2/pokemon") %>%
req_url_query(limit = 150) %>%
req_paginate_offset(
offset = \(req, offset) req_url_query(req, offset = offset),
page_size = 150,
n_pages = \(resp) {
calculate_n_pages(
page_size = 150,
total = resp_body_json(resp)$count
)
}
)
responses_offset <- paginate_perform(req_offset) The last example shows why it can be useful to have the As mentioned above by @hadley it would be nice to have a helper to extract some information from the body, e.g. a new argument to It would also be nice to cache the parsed body, as the parsing can be relatively expensive for bigger responses. I think this should also be done in a separate PR. |
I would be happy if you try it out and provide feedback. |
I'll give it a try later today! |
@mgirlich looking at the interface now, I'd say in a future PR we should add an argument for parsing the body so the interface could look something like this: req_next_url <- request("https://pokeapi.co/api/v2/pokemon") %>%
req_url_query(limit = 150) %>%
req_paginate_next_url(
body = \(resp) resp_body_json(resp),
next_url = \(resp, body) body[["next"]],
n_pages = \(resp, body) ceiling(body$count / page_size)
) I'd suggest we don't expect |
R/paginate.R
Outdated
#' @param set_token A function that applies that applies the new token to the | ||
#' request. It takes two arguments: a [request] and the new token. | ||
#' @param next_token A function that extracts the next token from the [response]. | ||
#' @param n_pages A function that extracts the next token from the [response]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is out of date?
) | ||
} | ||
|
||
#' Perform a paginated request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we document this with req_paginate()
?
R/paginate.R
Outdated
#' | ||
#' responses <- paginate_perform(req_pokemon) | ||
paginate_perform <- function(req, | ||
max_pages = 20L, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; but lets do that in the next PR.
R/paginate.R
Outdated
if (!req_policy_exists(req, "paginate")) { | ||
cli::cli_abort(c( | ||
"{.arg req} doesn't have a pagination policy", | ||
i = "You can add pagination via `req_paginate()`." | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this check to paginate_perform
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it makes sense to also export paginate_next_request()
so I left the check there but also added one to paginate_req_perform()
.
R/req-body.R
Outdated
@@ -216,7 +216,8 @@ req_body_apply <- function(req) { | |||
} else if (type == "raw") { | |||
req <- req_body_apply_raw(req, data) | |||
} else if (type == "json") { | |||
content_type <- "application/json" | |||
# FIXME temporary workaround just for testing purposes. Remove before merging! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix now?
I got it to work with Slack 🎉🎉🎉 Some comments:
Overall this worked great, it just could use some documentation tweaks. I'm very happy to see this in action! |
To inform in
Let's tackle that in a separate PR. I guess at least adding an extra callback for processing the response would make sense. |
Pagination support would be great to have. This PR provides basic support for pagination. It adds a general
req_paginate()
and three helpers for common patterns:req_paginate_next_url()
: the response contains a next url (usually in the body)req_paginate_offset()
: simple pagination withlimit
andoffset
(typically in the query)req_paginate_next_token()
: pagination via a token/cursorI could test the first two helpers via the public Pokemon API, the last one only via an API that needs authorisation.