Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to filter_equal_dict with the wrong typed object produces an exception instead of gracefully returning nothing #142

Open
jasper-tms opened this issue Apr 29, 2024 · 9 comments

Comments

@jasper-tms
Copy link

caveclient 5.18.0
python 3.10.8
MacOS


I have some code that is unexpectedly crashing because the following sorts of commands raise an exception:

client = CAVEclient('brain_and_nerve_cord')

# Trying to filter a str column using a bool
client.materialize.live_live_query(
    'cell_info',
    datetime.now(timezone.utc),
    filter_equal_dict={'cell_info': {'tag': True}})

# Trying to filter a bool column using a str
client.materialize.live_live_query(
    'backbone_proofread',
    datetime.now(timezone.utc),
    filter_equal_dict={'backbone_proofread': {'proofread': 'some tag'}})

I think it makes more sense in principle (and it definitely would be more useful for me in practice) if such an attempt to query while filtering using the wrong data type just returned nothing, instead of the sort of ugly error that's currently raised:

HTTPError: 500 Server Error: invalid input syntax for type boolean: "some tag"
LINE 4: WHERE backbone_proofread.proofread = 'some tag' AND backbone_proo...

(This might seem like an unusual command to want to run, but I'm trying to check whether a user-specified thing exists in any of a number of tables, so I loop over a few candidate tables and try doing this sort of filter to find said thing in said tables. My candidates include some str and some bool columns, hence the desire to be able to do this for both str and bool arguments without getting an exception when trying to query either str or bool columns.)

Thanks!

@fcollman
Copy link
Collaborator

I don't know if i agree about this, there are at least a couple of things you could be doing. 1) you know the schema of each of these tables, so your code can know which columns are booleans and which are strings. If you are searching for a boolean then you should only look in boolean columns, and if you are looking for a string you should only look in string columns.

Second, you can wrap your query in a try except look for an HttpError, and have it squash the error if you see it has code 500 and invalid input syntax in the error message, but raise up the exception otherwise.

If we made it just return silently, then i can easily imagine the mistake where someone starts putting 'true' and 'false' strings in a string schema and then a user searches for True, and receives no results and gets the wrong impression that there are no 'true' values for their query.. but they just used the wrong type.

@ceesem
Copy link
Collaborator

ceesem commented Apr 30, 2024

I agree with Forrest that this shouldn't be silent, but I do think that it would be more informative if we had a SchemaValidationError that specified the invalid field, the type that was given and the type that was expected rather than the generic 500 Server error.

@fcollman
Copy link
Collaborator

so a 400 errorcode with a more helpful error message?

@jasper-tms
Copy link
Author

I like Casey's idea of a specific python exception type (instead of an HTTPError) so it can be caught specifically. Is that sort of thing possible?

@fcollman
Copy link
Collaborator

from the server side no.. but from the client side yes we can transform the error from an httperror to some other kind of exception. We could even do the schema checking client side without involving the server since the client can also query all the schemas and know what the data types are before communicating with the server.

@jasper-tms
Copy link
Author

Sure, doing some validation client side before making a query and raising a specific type of exception if something is amiss sounds great! The only concern is that this would add a new check involving a request to the annotation engine to every materialization query, so would slow things down, right?

@fcollman
Copy link
Collaborator

yes.. that's right, plus just the development time to get this right and completely general is not trivial.

@jasper-tms
Copy link
Author

Then to prevent slowing every query down (even ones with the correct dtypes) perhaps it makes more sense to do what you suggested initially – just have the server try the db query without checking types, then if an error arrises to convert it to a different type of exception (either based on the error message that the db returns or by then doing a check of types). This might also take less effort to develop. I imagine this case I described isn't going to come up terribly often, so I wouldn't think it makes sense to make all queries slower in order to try to handle it.

@jasper-tms
Copy link
Author

I actually get different error text if I try to filter a string column with a bool:

HTTPError: 500 Server Error: operator does not exist: character varying = boolean
LINE 4: WHERE proofreading_notes.tag = true AND proofreading_notes.v...

which makes it a bit harder to catch this sort of error by the text of the exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants