Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow strings in field/2 #4384

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

greg-rychlewski
Copy link
Member

I'm worried I'm missing something obvious. But I think with these 2 simple changes to the schema functions we can allow string names in field/2.

The __schema__(:type, field)__ functions will now catch string names and get the right type. And the __schema__(:field_source, field)__ functions will ensure the field is converted back into an atom during normalization so that the adapters don't have to change.

@josevalim
Copy link
Member

Interesting. What happens if there is no schema?

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Feb 24, 2024

Ah good catch, I knew I forgot something. Then it fails but it can be fixed by changing the guard in the adapter here: https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L835 and here https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L841

@josevalim
Copy link
Member

I think if we want to go down this route, we should rather go in the other direction. Have field_source return strings and then send it all the way down to the adapter as strings. If there is no field source, we normalize it to strings on the spot?

As long as it is a small change on the adapter side, we should be good. :)

@josevalim
Copy link
Member

Although converting to strings on adapters such as etso and ecto_mnesia will likely cause other problems.

@greg-rychlewski
Copy link
Member Author

It makes the inspection a bit weird too if it's normalized to string. I had to make a special condition to make the inspect look normal. I just pushed a change

@josevalim
Copy link
Member

Thanks @greg-rychlewski. I am honestly a bit unsure. It feels we are at the worst of both words: sometimes they are atom, sometimes they are strings. What do you think about:

  1. If there is a schema, we always normalize them to atoms in the planner (as long as the schema module has been loaded because we called a previous operation in it, String.to_existing_atom should be enough).

  2. For schemaless queries, we always keep them as strings.

Perhaps we can start with step 2 for now (and raise if a string is given to a schema source)?

@greg-rychlewski
Copy link
Member Author

That sounds like a good compromise. I'll update the PR.

There was one thing I was having trouble figuring out for inspect. I believe the dot notation for the fields does not play nice with strings. For example, atoms produce this

query = from p in Post, select: field(p, :visit)

IO.inspect query
# "from p0 in Inspect.Post, select: p0.visit"

But strings produce this

query = from p in Post, select: field(p, "visit")

IO.inspect query
# "from p0 in Inspect.Post, select: p0 . \"visit\"()"

I was previously handling it by transforming it to field inside of the inspect. But if we will normalize to string then it could be confusing if there is an error during planning and the user didn't use field/2. The only other thing I could think is to not use the dot notation, but then the inconsistency is probably confusing.

Do you know if there is anything I can do to make the string play nice with the dot notation?

@josevalim
Copy link
Member

My only suggestion is to indeed convert it back to field(..., ...) when pretty printing/inspecting.

@greg-rychlewski
Copy link
Member Author

I had one idea. Would you be against tagging the field/2 stuff all the way down to the adapter and them have the source resolved in the adapter? That way we could normalize everything given to field/2 to string and all the non field/2 stuff will stay as atom.

The reason I'm suggesting this is because when I tried normalizing schemaless sources to string it started to affect a lot of things I didn't expect like subqueries and joins. Doing it this way would minimize the blast radius and still have some kind of internal consistency. And either way we are asking adapters to change something.

@josevalim
Copy link
Member

To be clear, you are saying this:

field/2 with a string arrives as a field in the AST all the way down to the adapter? That sounds good to me.

But we still need to decide what to do with strings and schemas. We have to options:

  1. We say string fields are not checked, typed, or converted in any way (akin to a fragment)
  2. We say string fields are validated and normalized accordingly

@greg-rychlewski
Copy link
Member Author

I'm starting to feel like I completely misunderstood one of your earlier replies. When you said this

I am honestly a bit unsure. It feels we are at the worst of both words: sometimes they are atom, sometimes they are strings. What do you think about:

If there is a schema, we always normalize them to atoms in the planner (as long as the schema module has been loaded because we called a previous operation in it, String.to_existing_atom should be enough).

For schemaless queries, we always keep them as strings.

Were you talking about field/2 specifically? I read it as any field notation, for example p.title as well but now I think I was mistaken.

@josevalim
Copy link
Member

At the time I did not mean field/2 but later on I certainly thought it should be only about field/2. :)

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Feb 27, 2024

Let me know what you think of this:

field/2 with string name

  • always normalizes its name as an atom when dealing with schemas
  • keeps its name as string when not dealing with schemas
  • `has its type validated when dealing with schemas. it's handled by adding guards to the reflection functions here:

{args, when_expr, body} ->

types_quoted =

field/2 with atom name

  • no changes. though i'm not sure if you'd prefer to normalize its name to string when dealing with schemaless sources

@greg-rychlewski
Copy link
Member Author

we could also make it so that field/2 with a string always emits a string and field/2 with an atom always emits an atom. for both schema/schemaless.

@josevalim
Copy link
Member

My thinking would be that, if you pass a string, then it is kept as is and it would always be the field source. We never transform or manipulate it in any case.

@greg-rychlewski
Copy link
Member Author

Do you think it would still be ok to check its type against the schema? For instance, if someone wanted to do this

field = get_user_input_as_string(...)
query = from p in Post, where field(p, ^field) == ^"some_string"

To work properly the parameter would have to be cast to the type of field(p, ^field).

Or alternatively they could wrap field(p, ^field) inside of type/2 but then it's kind of the asme problem where they have to dynamically find the type they want given a string field name.

@josevalim
Copy link
Member

josevalim commented Feb 28, 2024

It really depends which problem we want to solve. Do we want to make it easier for people receiving parameters to create custom queries or do we want to make it possible for you to dynamically query databases without generating tons of atoms?

@greg-rychlewski
Copy link
Member Author

I would say this is the bigger issue right now:

make it possible for you to dynamically query databases without generating tons of atoms

In which case I think you are saying this is not an issue currently for schemas. Because if you are using schemas then you know up front which strings are allowed and can handle it without this change?

If my above statement is not mistaken, then I think even though it's a bigger problem, it might be a bit confusing why we are not solving this problem at the same time if it's not too hard to do so:

make it easier for people receiving parameters to create custom queries

@greg-rychlewski
Copy link
Member Author

I think I get it though. Basically it's about staying consistent using atoms when referring to schema fields right? If so it's ok to me. I could change the PR to raise if string is used for schemas.

@josevalim
Copy link
Member

The problem I think is that we have two acceptable semantics. One is to make it a direct field source access and the other is to make it automatically cast. When we pick one, we exclude the other. For schemaless fields they are the same, however.

@josevalim
Copy link
Member

In which case I think you are saying this is not an issue currently for schemas. Because if you are using schemas then you know up front which strings are allowed and can handle it without this change?

Yes, you could do String.to_existing_atom, althogh we can argue that we could have a more convenient API. The question is if it should be field or something else.

@josevalim
Copy link
Member

For example, just for brainstorming purposes, it could be called field! or cast_field or something. Given the field may not exist and the query my fail.

@greg-rychlewski
Copy link
Member Author

The question is if it should be field or something else.

One reason I can think to keep it as field/2 is because I'm seeing other areas that users might want opened up. For example, if someone has a need for field/2 with string then probably someone would also have a need for map/2 with a string list.

If we go down the route of making a new function for each one it might be cumbersome for us to maintain and for users to use. So it might be worthwhile to lax the definition to include stings in the existing functions and make them have the same semantics as atoms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants