Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: get raw repr of key/value when deserializing #761

Closed
abonander opened this issue Jul 27, 2024 · 5 comments
Closed

Feature Request: get raw repr of key/value when deserializing #761

abonander opened this issue Jul 27, 2024 · 5 comments

Comments

@abonander
Copy link

abonander commented Jul 27, 2024

Background

I'm creating a TOML config parser for SQLx to override various aspects of the library's operation, namely the query macros, as well as migrations. (launchbadge/sqlx#3383)

One big need is to provide a way to override or augment the default SQL -> Rust type mappings in the macros, so I'm adding a section like this to the TOML format:

[macros.type_overrides]
# SQL type `foo` will map to `crate::types::Foo`
foo = "crate::types::Foo"

This is going to be most useful with PostgreSQL, which has a rich type system supporting user-created types (as well as a dynamically loaded extension system which can also add types).

Making it easier to use custom types is one of the main goals of this feature.

Type names in Postgres can be schema-qualified, and to support this in TOML, I'm just allowing type names to be arbitrary paths (using a type with a custom Deserialize implementation to read nested tables):

[macros.type_overrides]
foo.foo = "crate::types::Foo"

This would also make it easier to define multiple types for the same schema, as the user can just create a new table:

[macros.type_overrides.foo]
# foo.foo
foo = "crate::types::Foo"
# foo.bar
bar = "crate::types::Bar"

Problem

In Postgres, both the name of the type as well as the name of the schema can be quoted identifiers, which are case-sensitive whereas normal unquoted identifiers are not. Thus, it is important to distinguish between the two.

From my experience, I know with some certainty that a typical user's first instinct will be to just do this:

[macros.type_overrides]
# *NOT* `foo."foo"` in SQLx (parses as `foo.foo`)
foo."foo" = "crate::types::Foo"

But this will not result in the expected behavior because toml will parse out the quotes, and SQLx won't know the difference.

This is an issue with or without a schema qualifier:

[macros.type_overrides]
# *NOT* `"foo"` in SQLx (parses as `foo`)
"foo" = "crate::types::Foo"

Proposed Solution

I think the most versatile solution would be to add a type analogous to serde_json::value::RawValue.

When parsing, I could use something like this to get the raw key, then decide whether to parse it or use it raw.

A simpler solution would be to just add a specific analogue just for keys, maybe implementing Deserialize for toml_edit::Key itself. That would satisfy my use-case but little else.

I'm willing to put some work into a PR if we agree on a solution here.

Alternatives

Fix It in Post Documentation

This could easily be written off as a simple documentation issue on SQLx's part. If the key is wrapped in a second pair of quotes, SQLx will see the inner quotes:

[macros.type_overrides]
# `foo."foo"` in SQLx
foo.'"foo"' = "crate::types::Foo"

This is what I'm going with for now so I'm not blocked on this proposal. However, it's not a satisfying solution on its own because there's always users who don't fully read documentation and just go with their gut, then open an issue when things don't work as expected. I'd prefer SQLx to be able to handle this intelligently.

Spanned

While reading the source, I did notice the support for serde_spanned::Spanned which would theoretically allow me to implement this myself.

I could deserialize a key with Spanned<IgnoredAny>, then go back to the TOML string and get the raw key from the given span.

The problem is, how do I thread the TOML string that deep in the call tree?

  • Thread-local storage?
    • Would work, but TLS is not particularly fast to access. I'm not overly worried about performance, but I'm trying to avoid any huge pitfalls.
    • I might try this if my main proposal goes nowhere and quoted identifiers prove to be a significant problem.
  • DeserializeSeed?
    • Non-starter. I'd have to replace #[derive(serde::Deserialize)] with manual implementations, which would make it more cumbersome for maintenance and future additions.

Also, is Spanned support even part of toml/toml_edit's SemVer guarantees?

Deserialize using the parsed TOML structure

This is a non-starter for the same reasons DeserializeSeed is.

Lint the TOML or Two-Pass Deserializtion

These are basically the same thing and have mostly the same problems.

Alongside the documentation alternative, I could lint for improperly quoted names in the TOML by parsing to a toml_edit::DocumentMut, inspecting the structure and then deserializing the config structure if I don't find any problems.

Alternatively, I could parse to a toml_edit::DocumentMut, deserialize the config structure, then traverse the TOML structure and go back and fix-up quoted names.

Either way, it requires duplicating some of the work that deserialization is doing and having two different places in the code where semantics are defined, which is not great for maintainability.

@epage
Copy link
Member

epage commented Jul 27, 2024

Making the quoting significant makes this no longer TOML but a custom format that looks like TOML. As the use case is something we intentionally do not want to support, I think we'll pass on this feature.

@epage epage closed this as not planned Won't fix, can't repro, duplicate, stale Jul 27, 2024
@abonander
Copy link
Author

That's disappointing, I was hoping for at least some back-and-forth.

It wouldn't have to be in toml, it could be in toml_edit, the whole point of which, from my perspective, is to treat all syntax as significant.

But you're the boss.

@abonander
Copy link
Author

This could also be used, like that in serde_json, to extract a subsection of a TOML file and re-serialized it verbatim. I could see that as being very useful.

@abonander
Copy link
Author

@epage could you at least please answer this question?

is Spanned support even part of toml/toml_edit's SemVer guarantees?

@epage
Copy link
Member

epage commented Jul 27, 2024

Yes, it is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants