Pagination with item_collection function #165

Open
agrosu1978 opened this issue Nov 4, 2024 · 5 comments

agrosu1978 commented Nov 4, 2024

Hello

I may be doing something wrong, but I cannot get pagination to work. I have a collection with 14 items, and when I call item_collection with the default limit of 10, the links list contains no rel = next link. When I call pgstac manually with the limit set to 1, the "rel": "next" link is there. Concretely, for the following code:

items = []
token = None
while True:
    items_collection = await self.client.item_collection(
        request=request,
        collection_id=collection_id,
        limit=1,
        token=token,
    )
    items.extend(items_collection.get("features", []))
    # Check if there's a next token for pagination
    token = None
    for link in items_collection.get("links", []):
        if link.get("rel") == "next":
            token = link.get("href", None)
    if not token:
        # No more pages left, break the loop
        break

(where self.client is a CoreCrudClient object) I get the following answer:

{
    "type": "FeatureCollection",
    "context": {"limit": 1, "returned": 1},
    "features": [
        {
            "id": "feature_id",
            "rest_of_the_fields": "OK"
        }
    ],
    "links": [
        {
            "rel": "collection",
            "type": "application/json",
            "href": "https://my_link/collections/user_my_test_collection",
        },
        {
            "rel": "parent",
            "type": "application/json",
            "href": "https://my_link/collections/user_my_test_collection",
        },
        {
            "rel": "root",
            "type": "application/json",
            "href": "https://my_link/",
        },
        {
            "rel": "self",
            "type": "application/geo+json",
            "href": "https://my_link/collections/user_my_test_collection/items",
        },
    ],
}

The response has one item (feature) with the correct information (which I replaced here with "rest_of_the_fields": "OK").
But as you can see, the links list has no "rel": "next" link whose token I could use for the next call.
A manual call to pgstac returns this:

{
    "type": "FeatureCollection",
    "context": {"limit": 1, "returned": 1},
    "features": [
        {
            "id": "feature_id",
            "rest_of_the_fields": "OK"
        }
    ],
    "links": [
        {
            "rel": "collection",
            "type": "application/json",
            "href": "https://my_link/collections/user_my_test_collection",
        },
        {
            "rel": "parent",
            "type": "application/json",
            "href": "https://my_link/collections/user_my_test_collection",
        },
        {
            "rel": "root",
            "type": "application/json",
            "href": "https://my_link/",
        },
        {
            "rel": "self",
            "type": "application/geo+json",
            "href": "https://my_link/collections/user_my_test_collection/items",
        },
        {
            "rel": "next",
            "type": "application/geo+json",
            "method": "GET",
            "href": "https://my_link/collections/user_my_test_collection/items?limit=1&token=next:user_my_test_collection:<next_feature_id>",
        },
    ],
}

where <next_feature_id> is indeed the correct one.
I searched all over for an example of this pagination but could not find one, which is why I am posting the issue here.
(I used limit = 1 just to see how to make the pagination work; in the real code it is set to 50.)
Thank you for reading this
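
Separately from the missing link, the loop above passes the whole `href` back as `token`. A minimal sketch of pulling only the token value out of a `rel: next` link, assuming the `token` argument expects the value of the `token` query parameter as it appears in the href of the manual pgstac response. The helper name `next_token_from_links` and the item id `some_item_id` are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def next_token_from_links(links: list[dict]) -> Optional[str]:
    """Return the `token` query value of the first rel="next" link, or None."""
    for link in links:
        if link.get("rel") != "next":
            continue
        # Parse the query string of the href and look for a `token` parameter.
        query = parse_qs(urlparse(link.get("href", "")).query)
        values = query.get("token")
        if values:
            return values[0]
    return None


# Example links shaped like the manual pgstac response (item id is made up).
links = [
    {"rel": "self", "href": "https://my_link/collections/user_my_test_collection/items"},
    {
        "rel": "next",
        "href": "https://my_link/collections/user_my_test_collection/items"
        "?limit=1&token=next:user_my_test_collection:some_item_id",
    },
]
token = next_token_from_links(links)
```

With a helper like this, the loop would stop when the function returns `None` and otherwise pass the returned string as `token` on the next call.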

@vincentsarago
Member

@agrosu1978 which version of pgstac are you using?

@agrosu1978
Author

Hello Vincent.
These are the library versions in use:

pypgstac 0.8.6
pystac 1.11.0
stac-fastapi-api 3.0.3
stac-fastapi-extensions 3.0.3
stac-fastapi-pgstac 3.0.0
stac-fastapi-types 3.0.3
stac-pydantic 3.1.3

@vincentsarago
Member

@agrosu1978 do you have the context setting set to ON in pgstac? https://stac-utils.github.io/pgstac/pgstac/#pgstac-settings-variables

without this, we don't get a numberMatched value so we can't know if there is a next item 🤷

> (I used the limit = 1 just to see how to make the pagination work, in the real code it is set to 50)

that is weird 😬

we only test pagination with limit = 1 😬

async def test_pagination_item_collection(
    app_client, load_test_data, load_test_collection
):
    """Test item collection pagination links (paging extension)"""
    test_item = load_test_data("test_item.json")
    ids = []
    # Ingest 5 items
    for _ in range(5):
        uid = str(uuid.uuid4())
        test_item["id"] = uid
        resp = await app_client.post(
            f"/collections/{test_item['collection']}/items", json=test_item
        )
        assert resp.status_code == 201
        ids.append(uid)
    # Paginate through all 5 items with a limit of 1 (expecting 5 requests)
    page = await app_client.get(
        f"/collections/{test_item['collection']}/items", params={"limit": 1}
    )
    idx = 0
    item_ids = []
    while True:
        idx += 1
        page_data = page.json()
        item_ids.append(page_data["features"][0]["id"])
        nextlink = [link["href"] for link in page_data["links"] if link["rel"] == "next"]
        if len(nextlink) < 1:
            break
        page = await app_client.get(nextlink.pop())
        assert idx < 10
    # Our limit is 1 so we expect len(ids) number of requests before we run out of pages
    assert idx == len(ids)
    # Confirm we have paginated through all items
    assert not set(item_ids) - set(ids)

@agrosu1978
Author

agrosu1978 commented Nov 5, 2024

> the context setting set to ON in pgstac? https://stac-utils.github.io/pgstac/pgstac/#pgstac-settings-variables

Thank you @vincentsarago for your response.
No, the context setting was not set to 'ON'. Now it is, and the response changed slightly. I now receive the "matched" field in the context dictionary, but there is still no "rel": "next" link in the links array:

{
    "type": "FeatureCollection",
    "context": {"limit": 1, "matched": 14, "returned": 1},
    "features": [
        {
            "id": "feature_id",
            "rest_of_the_fields": "OK"
        }
    ],
    "links": [
        {
            "rel": "collection",
            "type": "application/json",
            "href": "https://my_link/collections/agrosu_my_test_collection",
        },
        {
            "rel": "parent",
            "type": "application/json",
            "href": "https://my_link/collections/agrosu_my_test_collection",
        },
        {
            "rel": "root",
            "type": "application/json",
            "href": "https://my_link/",
        },
        {
            "rel": "self",
            "type": "application/geo+json",
            "href": "https://my_link/collections/user_my_test_collection/items",
        },
    ],
}
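
The response above reports `matched` as 14 while returning one feature and no next link. A minimal sketch of a client-side check that flags this case instead of silently stopping after the first page. The function name and the `items_seen` parameter are hypothetical, and the sketch assumes `context.matched` is only present when the pgstac context setting is on, as discussed earlier in this thread.

```python
def pagination_looks_truncated(page: dict, items_seen: int) -> bool:
    """Flag a page that has no rel="next" link although `context.matched`
    reports more items than have been collected so far.

    `items_seen` counts all features collected, including this page's.
    """
    has_next = any(link.get("rel") == "next" for link in page.get("links", []))
    matched = page.get("context", {}).get("matched")
    if has_next or matched is None:
        # Either paging can continue, or there is no count to compare against.
        return False
    return items_seen < matched


# Page shaped like the response above: 14 matched, 1 returned, no next link.
page = {
    "type": "FeatureCollection",
    "context": {"limit": 1, "matched": 14, "returned": 1},
    "features": [{"id": "feature_id"}],
    "links": [
        {"rel": "self", "href": "https://my_link/collections/user_my_test_collection/items"}
    ],
}
truncated = pagination_looks_truncated(page, items_seen=len(page["features"]))
```

A check like this only detects the problem. It does not recover the missing pages.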

@jamescw19

I'm having the same issue with my stac-fastapi-pgstac deployment. I don't get a next link in item searches, so I can't retrieve all of the results. I'm on stac-fastapi-pgstac 3.0.0 as well.
