Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an --ids-only flag to easily extract ID's from a search. #645

Open
cholmes opened this issue Jul 20, 2022 · 16 comments · May be fixed by #940
Open

Add an --ids-only flag to easily extract ID's from a search. #645

cholmes opened this issue Jul 20, 2022 · 16 comments · May be fixed by #940
Assignees
Labels
2024-revisit SDK committee reviewed and marked for follow-up

Comments

@cholmes
Copy link
Member

cholmes commented Jul 20, 2022

Looking into this, is all this work to get a comma-separated list of ids from the search? Whew, we (CLI, SDK) should make it easier 😆 Maybe add an --ids-only flag?

Originally posted by @jreiberkyle in #642 (comment)

@cholmes
Copy link
Member Author

cholmes commented Oct 31, 2022

Note that the place this is most needed is in planet data search, so the results can be passed into orders API. See like https://planet-sdk-for-python-v2.readthedocs.io/en/latest/cli-tutorial/#bringing-it-all-together in the second one we could eliminate the obscure | jq -r .id | tr '\n' , | sed 's/.$//' which is just formatting id's into the format needed.

@kevinlacaille kevinlacaille self-assigned this Mar 28, 2023
@kevinlacaille kevinlacaille linked a pull request Apr 26, 2023 that will close this issue
2 tasks
@sgillies
Copy link
Contributor

@cholmes @kevinlacaille I've got a different take on this. Instead of modifying the output of planet data search to suit the input of planet orders request, what if we flip it around?

Postel's law (https://en.wikipedia.org/wiki/Robustness_principle) says that programs should be conservative in what they send, but be liberal in what they accept. I think that applies to our SDK and CLI. If planet orders request accepted not only a list of item ids, but also a list of search items, and extracted the ids from them, planet data search wouldn't need to change.

I think enhancing planet orders request might also be less error prone. A planet data search can search across multiple item types and if we only extract the ids from the search results, we can lose track of which item types they map to. Whereas planet orders request could group items by item type and keep this information together.

@jreiberkyle
Copy link
Contributor

@sgillies this paradigm makes a lot of sense to me. It appears the primary use case here is creating an order from search results, so enhancing planet orders request could also work. On the other hand, this would add some complexity to planet orders request and not be a trivial change. @kevinlacaille what do you think?

@sgillies
Copy link
Contributor

sgillies commented May 2, 2023

@jreiberkyle I've got another one for you 😄 https://wiki.c2.com/?ConservationOfComplexity. Getting a comma-separated list of ids from planet data search requires waiting on all results to come back and then it might mix PSScene and SkySat ids together, which offloads complexity onto the user. I think planet orders request, which already carries well so much complexity for us -- making orders is really complicated -- could take on even more complexity without becoming unusable.

@cholmes
Copy link
Member Author

cholmes commented May 3, 2023

A planet data search can search across multiple item types and if we only extract the ids from the search results, we can lose track of which item types they map to. Whereas planet orders request could group items by item type and keep this information together.

To be clear this would require a pretty major change in functionality to planet orders request, as right now it can only handle one item type, and planet orders create similarly can only take input of one item type at a time. So if we're to enable piping in mixed PSScene and SkySat item types then we'd need an alternate flow. Like planet orders request outputting to requests? And planet orders create being able to take multiple of orders and create one order per?

Getting a comma-separated list of ids from planet data search requires waiting on all results to come back

Well realistically right now the maximum wait would only be 500 items, since that's the max a single order could handle. #746 will hopefully address that, and will also lead us down the path of creating more than one order on a single command.

Overall I do like the idea, I just don't think the argument about mixing item-types is that compelling, at least until we built the mechanism for our orders commands to handle multiple item types. Which is a direction I'm interested in, but is a non-trivial amount of work, and first I'd do the 'large orders' as that was a super clear user request, while mixing item-types from search results piped into orders is a bit theoretical, and is beyond the capabilities of our core API's.

I'm right now leaning towards 'both' - get the --ids-only in, but then take on making orders better in general and accept more. @sgillies I think if you'd brought this up earlier, before work had been done, I'd be convinced to just go that route, but I don't know that we should let great be the enemy of good and right now prefer to get this in and continue to iterate. I also think --ids-only can be useful just to more easily see the ids, instead of getting a wall of json output. I can be convinced otherwise if you feel really strongly about it Sean, and we have a full plan & design to actually handle large orders and be able to pipe in mixed item-type results.

@jreiberkyle
Copy link
Contributor

jreiberkyle commented May 3, 2023

It is a great point that planet data search can be used to search multiple item types, and therefore the result from an --ids-only could be from a mix of item types, but now the item type info has been stripped.

Additionally, we can get a list of the ids with planet data search as it is using jq, e.g.

$> planet data search PSScene --limit 3 | jq -r .id
20230503_070426_25_2498
20230503_091422_18_2426
20230503_075349_44_24bb

While I fully agree with discussion happening before work and not after, this ticket didn't get the benefit of clarifying the use case and thinking through the implications before the work, and the implications here are not insignificant, so I am open to discussion now (and @kevinlacaille indicated the same when we got to speak).

To start off, we can limit planet order request to check for and only allow one item type, something like:

item_types = set([i['properties']['item_type'] for i in items])
item_type = item_types.pop()

if item_types:  # more than one item_type was provided
    raise Exception(f'Only one item type at a time is supported, item types provided {item_types.add(item_type)}')

then go on to get the ids:

ids = set([i['id'] for i in items])

That all being said, it could make sense to reassess the priority of this relative to other work given the jq snippet above. @cholmes that's in your camp

@cholmes
Copy link
Member Author

cholmes commented May 3, 2023

Additionally, we can get a list of the ids with planet data search as it is using jq, e.g.

So if passing in the output of | jq -r .id worked as input to planet orders request then I probably wouldn't have made this issue. What you actually need is | jq -r .id | tr '\n' ',' | sed 's/.$//', see the second example in https://planet-sdk-for-python-v2.readthedocs.io/en/latest/cli/cli-orders/#bringing-it-all-together Maybe there's some pure JQ way to do that, but I don't know it. But as far as I can tell you need to convert the newlines + space to commas for input to the orders request command.

I think the 'liberal input' thing to do would be to have planet orders request be able to take | jq -r .id output. I'd be happy with that as an incremental step forward - have it be liberal about that list of id's, and eventually take full items as input as well.

To start off, we can limit planet order request to check for and only allow one item type, something like:

? planet order request is limited to only one item-type currently:

planet orders request 44343 --item-type SkySatCollect,Sentinel1 --bundle analytic --name dude
Usage: planet orders request [OPTIONS] IDS
Try 'planet orders request --help' for help.

Error: Invalid value for '--item-type': 'SkySatCollect,Sentinel1' is not one of 'Sentinel2L1C', 'PSScene', 'Sentinel1', 'MOD09GA', 'REOrthoTile', 'MOD09GQ', 'MYD09GA', 'Landsat8L1G', 'SkySatCollect', 'MYD09GQ', 'PSOrthoTile', 'REScene', 'SkySatScene'.

That all being said, it could make sense to reassess the priority of this relative to other work given the jq snippet above. @cholmes that's in your camp

I think some improvement is needed, since it's not just that snippet. If the dev team is all aligned on not going with --ids-only that's fine - I'm happy with any improvement that better lets me pipe output from planet data search without | jq -r .id | tr '\n' ',' | sed 's/.$//'. planet orders request accepting output of jq -r .id is reasonable, or having it accept full item geojson is also fine. If you all have a pure jq method that also works. My main goal is to not have to pipe through 3 different programs for one of the most common commands.

@sgillies
Copy link
Contributor

sgillies commented May 3, 2023

The jq expression to make a comma separated list of strings is a little complex, but more concise than using sed and tr. The -s "Slurp" option is the key.

planet data search PSScene --limit 4 | jq -s -r '[.[].id] | join(",")'

@jreiberkyle
Copy link
Contributor

jreiberkyle commented May 3, 2023

@cholmes I just want to take a second to validate your main goal.

My main goal is to not have to pipe through 3 different programs for one of the most common commands.

And even with Sean's snippet above, I don't think we are there yet - I for one will have to add that snippet to a cheat sheet to remember it whenever I want to use it. planet data search to planet orders request is a very common path and should be simple for the user. And I think we should amplify that as the central discussion point: It should be easy to make an order request from a search result.

Here are the two approaches we are discussing, laid out in more detail

Option 1 - planet orders request accepts more:

planet data search --limit 5 PSScene | planet orders request - --name test --clip aoi.geojson --bundle analytic

If that looks good then let's look at planet orders request and how it would need to change.

Currently, this is the signature of planet orders request:

Usage: planet orders request [OPTIONS] IDS

  Generate an order request.

  This command provides support for building an order description used in
  creating an order. It outputs the order request, optionally pretty-printed.

  IDs is one or more comma-separated item IDs.

Options:
...

This would need to change in the following ways:

  • planet orders request [OPTIONS] IDS would become planet orders request [OPTIONS] NEW-NAME-THAT-COVERS-IDs-AND-ITEMS (not sure what that new name is yet). Docs indicate that NEW-NAME can either be a comma separated list of ids or a list of item descriptions. If its item descriptions, item-type arg is not necessary
  • --item-type goes from being a required option to only required if IDs is used, and not ITEMS

Additional complexity:

  • conditions upon whether options are required or not, IDs becomes a complex input of a comma separated list of ids or a list of item descriptions
  • need to handle a mix of item-types
  • need to validate the item description input data

Option 2: planet data search can output ids:

planet data search --limit 5 PSScene --ids-only | planet orders request - --name test --clip aoi.geojson --item-type PSScene --bundle analytic

For this, we add a new boolean flag, --ids-only to planet data search.

Additional complexity:

  • Need to handle the case where a LARGE number of results are retrieved from the search when trying to concatenate into a comma-separated list

@sgillies
Copy link
Contributor

sgillies commented May 4, 2023

I'm not strongly opposed to an --ids-only option. In my opinion, it could even be considered a bug fix and added in a 2.0.x release. What I'm pointing out here is that it's a special case with some unexamined corners and as a special case entails some technical debt. Special cases are always a drag on projects in the long run and should be used sparingly.

Generalizing planet orders request shouldn't be done in a 2.0.x release, if it is done at all.

@kevinlacaille
Copy link
Contributor

kevinlacaille commented May 4, 2023

Thank you all for your comments, I think we have a really productive dialogue going on here! Let me gather my thoughts for a moment.

At the end of the day, I want an easy path from searching for data to requesting it. The two ways of getting there mentioned are:

  1. Expanding the outputs from data search by returning only item IDs
  2. Or, having a more broad range inputs for orders request by accepting any list of item descriptions then finding item types, bundles, etc.

I really dislike the idea of having users look up how to use extra tools like jq, tr, or sed to perform such a simple task as looking up and requesting data - probably one of the main reasons someone would use the CLI.

I do like the simplicity of our current solution in #940 where one can simply write the following to go from search to request:

Option 1:

planet data search --limit 5 PSScene --ids-only \
  | planet orders request - --name test --item-type PSScene --bundle analytic_udm2

However, I feel like this is likely going to be a fairly common workflow and the --ids-only option is going to feel redundant and the following command would be more desirable:

Option 2:

planet data search --limit 5 PSScene \
  | planet orders request - --name test --item-type PSScene --bundle analytic_udm2

I'm hesitant to plug this (option 1) in as a bug fix for 2.0.1, as I don't want users to get into this workflow and then switch over to option 2 and have their pipelines break when --ids-only is no longer supported.

What are your thoughts on closing this ticket and pushing for more broad inputs for orders request right away?

@sgillies
Copy link
Contributor

sgillies commented May 4, 2023

Let's take the meta conversation to a call! I'm inclined to approve #940 but am asking for 2 changes.

@jreiberkyle
Copy link
Contributor

+1 on the call. However, I believe we should scrap this. Due to the accumulation, it will at the least lag, and at the worst break, if someone uses --ids-only with a large limit. I'm with @kevinlacaille on scrapping this and pushing for more broad input for orders request - though we still need to discuss how to "slurp" (ugghh I hate that verb) or "aggregate" (ahh, better) multiple lines

@kevinlacaille kevinlacaille linked a pull request May 4, 2023 that will close this issue
2 tasks
@cholmes
Copy link
Member Author

cholmes commented May 4, 2023

I was also leaning towards scrapping. Happy for call, but we may not need it.

I think jq -s -r '[.[].id] | join(",")' likely solves the immediate need - I think use of jq is fine, I just didn't like 3 different tools. I suspected there was a jq way, but it was beyond my skills. And then yes, let's prioritize at least the design of our 'ideal' way. And tackle 'large orders', which is an independent request that is much desired.

@jreiberkyle
Copy link
Contributor

instead of a meeting, let's talk about what I propose and demonstrate in #944. Note how it takes the search result as input. The script is hacked up so there is still much work to be done but it would take only a moderate amount of effort.

additionally, see what I propose and demonstrate in #945, which uses a slight variation of that script to demonstrate chunking a 500-item search result.

@tbarsballe tbarsballe added the 2024-revisit SDK committee reviewed and marked for follow-up label Aug 8, 2024
@tbarsballe
Copy link
Contributor

Closing in favor of more comprehensive solutions like #944 and similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024-revisit SDK committee reviewed and marked for follow-up
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants