Implement Spanned to retrieve source locations on AST nodes (apache…
…#1435)

Co-authored-by: Ifeanyi Ubah <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
3 people authored Nov 26, 2024
1 parent 0adec33 commit 3c8fd74
Showing 18 changed files with 3,092 additions and 399 deletions.
17 changes: 17 additions & 0 deletions README.md
@@ -100,6 +100,23 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
issues and distinguish different syntaxes in the AST.


## WIP: Extracting source locations from AST nodes

This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress, and many nodes report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for information on how to contribute improvements.

```rust
use sqlparser::ast::Spanned;
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;
use sqlparser::tokenizer::{Location, Span};

// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

// The source span can be retrieved with start and end locations
assert_eq!(ast[0].span(), Span {
    start: Location::of(1, 1),
    end: Location::of(1, 16),
});
```
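
As a rough illustration of the "diagnostics tooling" use case, the sketch below turns a span into a caret underline beneath the offending source line. It deliberately does not depend on this crate: `Location`, `Span`, and `underline` are hypothetical stand-ins defined here, assuming 1-based lines and columns and an exclusive end column (as the `1, 16` end of the 15-character statement above suggests), and assuming ASCII input so that columns map to character offsets.

```rust
/// Stand-in for a 1-based line/column position (hypothetical, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Location {
    line: u64,
    column: u64,
}

/// Stand-in for a source span; `end` is treated as exclusive in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: Location,
    end: Location,
}

/// Returns the source line followed by a caret underline for a single-line span.
/// Returns `None` for multi-line spans, unknown (zero) positions, or missing lines.
fn underline(sql: &str, span: Span) -> Option<String> {
    if span.start.line != span.end.line || span.start.line == 0 || span.start.column == 0 {
        return None;
    }
    let line = sql.lines().nth((span.start.line - 1) as usize)?;
    let pad = (span.start.column - 1) as usize;
    // Always draw at least one caret, even for a zero-width span.
    let width = span.end.column.saturating_sub(span.start.column).max(1) as usize;
    Some(format!("{line}\n{}{}", " ".repeat(pad), "^".repeat(width)))
}

fn main() {
    let sql = "SELECT A FROM B";
    // Span covering the identifier `A` (column 8, one character wide).
    let span = Span {
        start: Location { line: 1, column: 8 },
        end: Location { line: 1, column: 9 },
    };
    println!("{}", underline(sql, span).unwrap());
}
```

A real tool would take the span from `Spanned::span()` rather than constructing it by hand, and would need to handle multi-line spans and non-ASCII text.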

## SQL compliance

SQL was first standardized in 1987, and revisions of the standard have been
52 changes: 52 additions & 0 deletions docs/source_spans.md
@@ -0,0 +1,52 @@

## Breaking Changes

These are the current breaking changes introduced by the source spans feature:

#### Added fields for spans (must be added to any existing pattern matches)
- `Ident` now stores a `Span`
- `Select`, `With`, `Cte`, `WildcardAdditionalOptions` now store a `TokenWithLocation`

#### Misc.
- `TokenWithLocation` stores a full `Span`, rather than just a source location. Users relying on `token.location` should use `token.span.start` instead.

## Source Span Contributing Guidelines

When contributing source span improvements, please follow the general [contribution guidelines](../README.md#contributing) and pay attention to the following:


### Source Span Design Considerations

- `Ident` nodes always have correct source spans
- Downstream breaking change impact is to be as minimal as possible
- To this end, use recursive merging of spans in favor of storing spans on all nodes
- Any metadata added to compute spans must not change semantics (Eq, Ord, Hash, etc.)
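
The "recursive merging" idea from the list above can be sketched as follows: only leaf nodes store a span, and a parent computes its span as the union of its children's spans. This is a minimal illustration with toy types, not the crate's implementation; `Location`, `Span`, `Span::EMPTY`, `union`, the two-variant `Expr`, and the `ident` helper are all assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Location {
    line: u64,
    column: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: Location,
    end: Location,
}

impl Span {
    /// Marker for "location unknown" (an assumption of this sketch).
    const EMPTY: Span = Span {
        start: Location { line: 0, column: 0 },
        end: Location { line: 0, column: 0 },
    };

    /// Smallest span covering both inputs; empty spans are ignored.
    fn union(self, other: Span) -> Span {
        if self == Span::EMPTY {
            return other;
        }
        if other == Span::EMPTY {
            return self;
        }
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

trait Spanned {
    fn span(&self) -> Span;
}

/// Leaf node: the only place a span is actually stored.
struct Ident {
    #[allow(dead_code)]
    value: String,
    span: Span,
}

/// Toy expression tree; inner nodes store no span of their own.
enum Expr {
    Identifier(Ident),
    BinaryOp { left: Box<Expr>, right: Box<Expr> },
}

impl Spanned for Ident {
    fn span(&self) -> Span {
        self.span
    }
}

impl Spanned for Expr {
    fn span(&self) -> Span {
        match self {
            Expr::Identifier(ident) => ident.span(),
            // Recursive merge: the parent covers both children.
            Expr::BinaryOp { left, right } => left.span().union(right.span()),
        }
    }
}

/// Hypothetical helper: an identifier on line 1 between two columns.
fn ident(value: &str, start_col: u64, end_col: u64) -> Ident {
    Ident {
        value: value.to_string(),
        span: Span {
            start: Location { line: 1, column: start_col },
            end: Location { line: 1, column: end_col },
        },
    }
}

fn main() {
    let expr = Expr::BinaryOp {
        left: Box::new(Expr::Identifier(ident("a", 8, 9))),
        right: Box::new(Expr::Identifier(ident("b", 12, 13))),
    };
    println!("{:?}", expr.span());
}
```

Treating the empty span as an identity element is a design choice of this sketch: it keeps a child with an unknown location from dragging the merged start back to `0:0`.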

The primary reason for missing and inaccurate source spans at this time is that many structures do not yet store spans for their keyword tokens and values, either due to lack of time or because adding them would significantly break downstream code.

When considering adding support for source spans on a type, consider the impact on consumers of that type and whether your change would require a consumer to make non-trivial changes to their code.

Example of a trivial change:
```rust
match node {
    ast::Query {
        field1,
        field2,
        location: _, // add a new line to ignore the location
    } => { /* existing match arm body */ }
}
```

If adding source spans to a type would require a significant change like wrapping that type or similar, please open an issue to discuss.

### AST Node Equality and Hashes

When adding tokens to AST nodes, make sure to store them using the [AttachedToken](https://docs.rs/sqlparser/latest/sqlparser/ast/helpers/struct.AttachedToken.html) helper to ensure that semantically equivalent AST nodes always compare as equal and hash to the same value. For example, `select 5` and `SELECT 5` would compare as different `Select` nodes if the select token were stored directly. Store it through the helper instead:

```rust
struct Select {
select_token: AttachedToken, // only used for spans
/// remaining fields
field1,
field2,
...
}
```
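
To make the effect concrete, here is a self-contained sketch with toy types. `Attached` is a hypothetical stand-in that imitates the always-equal, hash-nothing behaviour described above, and this `Select` with a `projection` field is invented for the example; neither is the crate's real type.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a token wrapper that is ignored by comparisons and hashing.
#[derive(Debug, Clone)]
struct Attached(#[allow(dead_code)] String);

impl PartialEq for Attached {
    fn eq(&self, _: &Self) -> bool {
        true // every attached token compares equal
    }
}

impl Eq for Attached {}

impl Hash for Attached {
    fn hash<H: Hasher>(&self, _state: &mut H) {
        // Contributes nothing to the hash, consistent with `eq` above.
    }
}

/// Toy AST node: the derives still work because `Attached` opts out by itself.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Select {
    select_token: Attached, // only used for spans
    projection: Vec<String>,
}

/// Hashes a value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let lower = Select {
        select_token: Attached("select".to_string()),
        projection: vec!["5".to_string()],
    };
    let upper = Select {
        select_token: Attached("SELECT".to_string()),
        projection: vec!["5".to_string()],
    };

    // The keyword's spelling differs, but the nodes are semantically equal.
    assert_eq!(lower, upper);
    assert_eq!(hash_of(&lower), hash_of(&upper));
}
```

Keeping `Hash` a no-op alongside the always-true `eq` preserves the rule that equal values must hash equally, which is what lets the surrounding node keep its derived implementations.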
82 changes: 82 additions & 0 deletions src/ast/helpers/attached_token.rs
@@ -0,0 +1,82 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
use core::fmt::{self, Debug, Formatter};
use core::hash::{Hash, Hasher};

use crate::tokenizer::{Token, TokenWithLocation};

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "visitor")]
use sqlparser_derive::{Visit, VisitMut};

/// A wrapper type for attaching tokens to AST nodes that should be ignored in comparisons and hashing.
/// This should be used when a token is not relevant for semantics, but is still needed for
/// accurate source location tracking.
#[derive(Clone)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct AttachedToken(pub TokenWithLocation);

impl AttachedToken {
pub fn empty() -> Self {
AttachedToken(TokenWithLocation::wrap(Token::EOF))
}
}

// Debug output delegates to the wrapped token
impl Debug for AttachedToken {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}

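// Comparison, ordering, and hashing deliberately ignore the wrapped token,
// so every `AttachedToken` compares equal to every other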
impl PartialEq for AttachedToken {
fn eq(&self, _: &Self) -> bool {
true
}
}

impl Eq for AttachedToken {}

impl PartialOrd for AttachedToken {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}

impl Ord for AttachedToken {
fn cmp(&self, _: &Self) -> Ordering {
Ordering::Equal
}
}

impl Hash for AttachedToken {
fn hash<H: Hasher>(&self, _state: &mut H) {
// Do nothing
}
}

impl From<TokenWithLocation> for AttachedToken {
fn from(value: TokenWithLocation) -> Self {
AttachedToken(value)
}
}
1 change: 1 addition & 0 deletions src/ast/helpers/mod.rs
@@ -14,5 +14,6 @@
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
pub mod attached_token;
pub mod stmt_create_table;
pub mod stmt_data_loading;
84 changes: 75 additions & 9 deletions src/ast/mod.rs
@@ -23,16 +23,22 @@ use alloc::{
string::{String, ToString},
vec::Vec,
};
use helpers::attached_token::AttachedToken;

-use core::fmt::{self, Display};
 use core::ops::Deref;
+use core::{
+fmt::{self, Display},
+hash,
+};

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "visitor")]
use sqlparser_derive::{Visit, VisitMut};

use crate::tokenizer::Span;

pub use self::data_type::{
ArrayElemTypeDef, CharLengthUnits, CharacterLength, DataType, ExactNumberInfo,
StructBracketKind, TimezoneInfo,
@@ -87,6 +93,9 @@ mod dml;
pub mod helpers;
mod operator;
mod query;
mod spans;
pub use spans::Spanned;

mod trigger;
mod value;

@@ -131,7 +140,7 @@ where
}

/// An identifier, decomposed into its value or character data and the quote style.
-#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
+#[derive(Debug, Clone, PartialOrd, Ord)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct Ident {
@@ -140,17 +149,49 @@ pub struct Ident {
/// The starting quote if any. Valid quote characters are the single quote,
/// double quote, backtick, and opening square bracket.
pub quote_style: Option<char>,
/// The span of the identifier in the original SQL string.
pub span: Span,
}

impl PartialEq for Ident {
fn eq(&self, other: &Self) -> bool {
let Ident {
value,
quote_style,
// exhaustiveness check; we ignore spans in comparisons
span: _,
} = self;

value == &other.value && quote_style == &other.quote_style
}
}

impl core::hash::Hash for Ident {
fn hash<H: hash::Hasher>(&self, state: &mut H) {
let Ident {
value,
quote_style,
// exhaustiveness check; we ignore spans in hashes
span: _,
} = self;

value.hash(state);
quote_style.hash(state);
}
}

impl Eq for Ident {}

impl Ident {
-/// Create a new identifier with the given value and no quotes.
+/// Create a new identifier with the given value and no quotes and an empty span.
pub fn new<S>(value: S) -> Self
where
S: Into<String>,
{
Ident {
value: value.into(),
quote_style: None,
span: Span::empty(),
}
}

@@ -164,6 +205,30 @@ impl Ident {
Ident {
value: value.into(),
quote_style: Some(quote),
span: Span::empty(),
}
}

pub fn with_span<S>(span: Span, value: S) -> Self
where
S: Into<String>,
{
Ident {
value: value.into(),
quote_style: None,
span,
}
}

pub fn with_quote_and_span<S>(quote: char, span: Span, value: S) -> Self
where
S: Into<String>,
{
assert!(quote == '\'' || quote == '"' || quote == '`' || quote == '[');
Ident {
value: value.into(),
quote_style: Some(quote),
span,
}
}
}
@@ -173,6 +238,7 @@ impl From<&str> for Ident {
Ident {
value: value.to_string(),
quote_style: None,
span: Span::empty(),
}
}
}
@@ -919,10 +985,10 @@ pub enum Expr {
/// `<search modifier>`
opt_search_modifier: Option<SearchModifier>,
},
-Wildcard,
+Wildcard(AttachedToken),
 /// Qualified wildcard, e.g. `alias.*` or `schema.table.*`.
 /// (Same caveats apply to `QualifiedWildcard` as to `Wildcard`.)
-QualifiedWildcard(ObjectName),
+QualifiedWildcard(ObjectName, AttachedToken),
/// Some dialects support an older syntax for outer joins where columns are
/// marked with the `(+)` operator in the WHERE clause, for example:
///
@@ -1211,8 +1277,8 @@ impl fmt::Display for Expr {
Expr::MapAccess { column, keys } => {
write!(f, "{column}{}", display_separated(keys, ""))
}
-Expr::Wildcard => f.write_str("*"),
-Expr::QualifiedWildcard(prefix) => write!(f, "{}.*", prefix),
+Expr::Wildcard(_) => f.write_str("*"),
+Expr::QualifiedWildcard(prefix, _) => write!(f, "{}.*", prefix),
Expr::CompoundIdentifier(s) => write!(f, "{}", display_separated(s, ".")),
Expr::IsTrue(ast) => write!(f, "{ast} IS TRUE"),
Expr::IsNotTrue(ast) => write!(f, "{ast} IS NOT TRUE"),
@@ -5432,8 +5498,8 @@ pub enum FunctionArgExpr {
impl From<Expr> for FunctionArgExpr {
fn from(wildcard_expr: Expr) -> Self {
match wildcard_expr {
-Expr::QualifiedWildcard(prefix) => Self::QualifiedWildcard(prefix),
-Expr::Wildcard => Self::Wildcard,
+Expr::QualifiedWildcard(prefix, _) => Self::QualifiedWildcard(prefix),
+Expr::Wildcard(_) => Self::Wildcard,
expr => Self::Expr(expr),
}
}