
Pagination #8

Closed

jennybc opened this issue Aug 26, 2019 · 7 comments
Labels
feature a feature request or enhancement

Comments

@jennybc
Member

jennybc commented Aug 26, 2019

Help with auto-traversal, as we do in gh. There are some pretty standard ways of doing this.

@hadley
Member

hadley commented May 24, 2021

The big question is how to handle failures — otherwise I think we could provide some helpers similar to what I did in https://github.com/ropensci/rtweet/blob/master/R/http.R#L105 and https://github.com/ropensci/rtweet/blob/master/R/http.R#L178 (but with more thought around the callbacks).

@hadley
Member

hadley commented May 24, 2021

Maybe this is a special case of the general spidering problem: build a queue and have a callback that adds new URLs/requests to the queue.

next_page <- function(req, resp) {
  req %>% req_url_query(cursor = resp_header(resp, "Next-Cursor"))
}
req %>% req_paginate(next_page)

We also need a hint callback that tells the user how to continue if pagination is rate-limited, terminated because it hit the maximum number of pages, or similar.
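For illustration only, such a hint might look something like this; none of these names are real httr2 API, this is purely a hypothetical sketch of the idea:

```r
# Hypothetical sketch (not httr2 API): when pagination stops early,
# tell the user why, and return the request that would fetch the next
# page so they can resume from where it left off.
paginate_hint <- function(next_req, reason) {
  message("Pagination stopped early (", reason, "). ",
          "Resume by starting again from the returned request.")
  invisible(next_req)
}
```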

It's possible to know the total number of pages without knowing what request corresponds to each page.

It would return a list of responses and it would be the user's responsibility to handle them. We may want to allow some specialised error handling so that if an error occurs after the first page, you still get the interim results (with a warning).
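A sketch of that error-handling behaviour (the helper name and structure are assumptions, not httr2 API): wrap each `req_perform()` in `tryCatch()`, and on failure after the first page, warn and return what was collected so far.

```r
library(httr2)

# Hypothetical sketch, not real httr2 API: perform pages until
# next_page() returns NULL. If a request fails after the first page,
# warn and return the interim results instead of erroring.
paginate_safely <- function(req, next_page) {
  out <- list()
  repeat {
    resp <- tryCatch(req_perform(req), error = function(e) e)
    if (inherits(resp, "error")) {
      if (length(out) == 0) stop(resp)
      warning("Pagination failed after ", length(out),
              " page(s); returning interim results", call. = FALSE)
      break
    }
    out[[length(out) + 1L]] <- resp
    req <- next_page(req, resp)
    if (is.null(req)) break
  }
  out
}
```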

Also related to #18.

@hadley
Member

hadley commented Jun 22, 2021

multi_req_paginate <- function(req, next_page, n_pages = NULL) {
  # `%||%` comes from rlang (or base R >= 4.4)
  out <- vector("list", n_pages %||% 10)
  i <- 1L

  repeat {
    out[[i]] <- req_perform(req)
    if (!is.null(n_pages) && i == n_pages) {
      break
    }

    # next_page() returns NULL when there are no more pages
    req <- next_page(req, out[[i]])
    if (is.null(req)) {
      break
    }

    i <- i + 1L
    # grow the output list geometrically as needed
    if (i > length(out)) {
      length(out) <- length(out) * 2L
    }
  }

  # drop any unused slots
  if (i != length(out)) {
    out <- out[seq_len(i)]
  }
  out
}
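For illustration, `multi_req_paginate()` could be used with a cursor-based API like this; the `Next-Cursor` header and the endpoint are hypothetical:

```r
library(httr2)

# Hypothetical cursor-based API: each response carries a Next-Cursor
# header until the last page, where next_page() returns NULL to stop.
next_page <- function(req, resp) {
  cursor <- resp_header(resp, "Next-Cursor")
  if (is.null(cursor)) {
    return(NULL)
  }
  req_url_query(req, cursor = cursor)
}

req <- request("https://api.example.com/items")
# resps <- multi_req_paginate(req, next_page, n_pages = 20)
```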

@JosiahParry

Would like to give my +1 to built-in pagination. Would be very nice :)

@jl5000
Contributor

jl5000 commented Jul 14, 2023


I'm wondering if we couldn't generalise the n_pages argument to be a function determining a stop condition? For example, the MediaWiki API may not provide all metadata for the first page of results in a single response; the response containing the final batch of metadata has a batchcomplete flag in the response body.
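With the design above, that stop condition can also live in the `next_page()` callback itself: return `NULL` once the `batchcomplete` flag appears. A sketch, assuming a JSON body with MediaWiki-style `batchcomplete` and `continue` fields (the exact shape depends on the endpoint):

```r
library(httr2)

# Sketch of a MediaWiki-style next_page() callback: stop once the
# response body reports batchcomplete, otherwise follow the `continue`
# parameters the API supplies for the next batch.
next_page <- function(req, resp) {
  body <- resp_body_json(resp)
  if (!is.null(body$batchcomplete)) {
    return(NULL)  # stop condition: final batch received
  }
  # splice the continuation parameters into the next request's query
  req_url_query(req, !!!body$continue)
}
```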

@hadley
Member

hadley commented Jul 14, 2023

I think you already get that from the next_page() callback; n_pages is probably better named max_pages; it just protects you from infinite iteration.

@mgirlich
Collaborator

mgirlich commented Sep 1, 2023

Closed by #279.
