Skip to content

Commit

Permalink
Allow reserved words in unquoted imports but disallow whitespace.
Browse files Browse the repository at this point in the history
Fix #3983.
Fix #3984.
  • Loading branch information
munificent committed Aug 14, 2024
1 parent 5527a8f commit 98652a6
Showing 1 changed file with 134 additions and 61 deletions.
195 changes: 134 additions & 61 deletions working/unquoted-imports/feature-specification.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Author: Bob Nystrom

Status: In-progress

Version 0.3 (see [CHANGELOG](#CHANGELOG) at end)
Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)

Experiment flag: unquoted-imports

Expand Down Expand Up @@ -118,7 +118,7 @@ The way I think about the proposed syntax is that relative imports are
*physical* in that they specify the actual relative path on the file system from
the current library to another library *file*. Because those are physical file
paths, they use string literals and file extensions as they do today. SDK and
package imports are *logical* in that you don't know where the library your
package imports are *logical* in that you don't know where the library you're
importing lives on your disk. What you know is it's *logical name* and the
relative location of the library you want inside that package. Since these are
abstract references to a *library*, they are unquoted and omit the file
Expand Down Expand Up @@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:

### Path separator

An import shorthand syntax that only supported a single identifier would work
A package shorthand syntax that only supported a single identifier would work
for packages like `test` and `args` that only expose a single library, but
would fail for even very common libraries like `package:flutter/material.dart`.
So we need some notion of a package name and a path within the that package.
Expand Down Expand Up @@ -220,27 +220,30 @@ import flutter/material;
Is the `flutter/material` part a single token or three (`flutter`, `/`, and
`material`)? The main advantage of tokenizing it as a single monolithic token is
that we could potentially allow characters or identifiers in there aren't
otherwise valid Dart. For example, we could let you use reserved words as path
segments:
otherwise valid Dart. For example, we could let you use hyphens as word
separators as in:

```dart
import weird_package/for/if/ok;
import weird-package/but-ok;
```

The disadvantage is that the tokenizer doesn't generally have enough context to
know when it should tokenize `foo/bar` as a single import path token versus
know when it should tokenize `foo/bar` as a single package path token versus
three tokens that are presumably dividing two variables named `foo` and `bar`.

Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize an
import path as a single token. Instead, it's tokenized using Dart's current
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize a
package path as a single token. Instead, it's tokenized using Dart's current
lexical grammar.

This means you can't have a path segment that's a reserved word or is otherwise
not a valid Dart identifier. Fortunately, our published guidance has *always*
told users that [package names][name guideline] and [directories][directory
guideline] should be valid Dart identifiers. Pub will complain if you try to
publish a package whose name isn't a valid identifier. Likewise, the linter will
flag directory or library names that aren't identifiers.
This means you can't have a path segment that uses some combination of
characters that isn't currently a single token in Dart, like `hyphen-separated`
or `123LeadingDigits`. A path component must be an identifier (which may be a
reserved word or built-in identifier, discussed below). Fortunately, our
published guidance has *always* told users that [package names][name guideline]
and [directories][directory guideline] should be valid Dart identifiers. Pub
will complain if you try to publish a package whose name isn't a valid
identifier. Likewise, the linter will flag directory or file names that aren't
identifiers.

[name guideline]: https://dart.dev/tools/pub/pubspec#name
[directory guideline]: https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
Expand All @@ -258,18 +261,49 @@ in a large corpus of pub packages and open source widgets:
69 ( 0.010%): dotted with non-identifiers =
```

This splits every "package:" import's path into segments separated by `/`. Then
for each segment, it reports whether the segment is a valid identifier, a
built-in identifier like `dynamic` or `covariant`, etc. Almost all segments are
either valid identifiers, or dotted identifiers where each subcomponent is a
valid identifier.
This splits every "package:" path into segments separated by `/`. Then it splits
segments into components separated by `.` For each component, the analysis
reports whether the component is a valid identifier, a built-in identifier like
`dynamic` or `covariant`, or a reserved word like `for` or `if`.

(For the very small number that aren't, they can continue to use the old quoted
"package:" import syntax to import the library.)
Components that are not some kind of identifier (regular, reserved, or built-in)
are vanishingly rare. In those few cases, if a user can't simply rename the
file, they can continue to use the old quoted "package:" syntax to refer to the
file.

I think this approach is much simpler than trying to add special lexing rules.
It's consistent with how Java, C# and other languages parse their imports. It
does mean users can do silly things like:
### Reserved words and semi-reserved words

One confusing area of Dart that the previous table hints at is that Dart has
several categories of identifiers that vary in how user-accessible they are:

* Reserved words like `for` and `class` can never be used by a user as a
regular identifier in any context.

* Built-in identifiers like `abstract` and `interface` can't be used as *type*
names but can be used as other kinds of identifiers.

* Contextual keywords like `await` and `show` behave like keywords in some
specific contexts but are usable as regular identifiers everywhere else.

This leads to confusion about which of these flavors of identifiers can be used
as package paths. Which of these, if any, are valid:

```dart
import if/else;
import abstract/interface;
import show/hide;
```

Many Dart users (including experts, some of whom may be members of the Dart
language team) don't know the full list of reserved or semi-reserved words. We
don't want them to run into problems determining which identifiers work in
package paths. To that end, we allow *all* identifiers, including reserved
words, built-in identifiers, and contextual keywords as path segments.

### Whitespace and comments

If we don't use any special tokenizing rules for the path, that suggests that
whitespace and comments are allowed between the tokens as in:

```dart
import strange /* comment */ . but
Expand All @@ -281,7 +315,37 @@ import strange /* comment */ . but
fine;
```

But they can also choose to *not* do that.
This wouldn't cause any problems for a Dart implementation. It would simply
discard the whitespace and comments as it does elsewhere and the resulting path
is `strange.but/another/fine`.

However, it likely causes problems for Dart *users* and other simpler tools and
scripts that work with Dart code. In particular, we often see homegrown tools
that want to "parse" a Dart file to find its package references and traverse the
dependency graph. While these tools ideally should use a full Dart parser (like
the one in the [analyzer package][], which is freely available), the reality is
that users often cobble together simple scripts using regex to do this kind of
parsing, or they need to write these tools in a language other than Dart. In
those cases, if the package path happens to contain whitespace or comments, the
tool will likely silently fail to recognize the package path.

[analyzer package]: https://pub.dev/packages/analyzer

Also, we find no compelling *use* for whitespace and comments inside package
paths. To that end, this proposal makes it an error. All of the tokens in the
path must be directly adjacent with no whitespace, newlines, or comments between
them. The previous import is an error. However, we still allow comments in or
after the directives outside of the path. These are all valid:

```dart
import /* Weird but OK. */ some/path;
export some/path; // Hi there.
part some/path // Before the semicolon? Really?
;
```

The syntax that results from the above few sections is simple to tokenize and
parse while looking like a single opaque "unquoted string" to users and tools.

## Syntax

Expand All @@ -291,27 +355,33 @@ We add a new rule and hang it off the existing `uri` rule already used by import
and export directives:

```
uri ::= stringLiteral | packagePath
packagePath ::= packagePathSegment ( '/' packagePathSegment )*
packagePathSegment ::= dottedIdentifierList
dottedIdentifierList ::= identifier ('.' identifier)*
uri ::= stringLiteral | packagePath
packagePath ::= pathSegment ( '/' pathSegment )*
pathSegment ::= segmentComponent ( '.' segmentComponent )*
segmentComponent ::= identifier
| ⟨RESERVED_WORD⟩
| ⟨BUILT_IN_IDENTIFIER⟩
| ⟨OTHER_IDENTIFIER⟩
```

An import or export can continue to use a `stringLiteral` for the quoted form
(which is what they will do for relative imports). But they can also use a
`packagePath`, which is a slash-separated series of segments, each of which is a
series of dot-separated identifiers. *(The `dottedIdentifierList` rule is
already in the grammar and is shown here for clarity.)*
It is a compile-time error if any whitespace, newlines, or comments occur
between any of the `segmentComponent`, `/`, or `.` tokens in a `packagePath`.
*In other words, there can be nothing except the terminals themselves from the
first `segmentComponent` in the `packagePath` to the last.*

*An import, export, or part directive can continue to use a `stringLiteral` for
the quoted form (which is what they will do for relative references). But they
can also use a `packagePath`, which is a slash-separated series of segments,
each of which is a series of dot-separated components.*

### Part directive lookahead

*There are two directives for working with part files, `part` and `part of`. The
`of` identifier is not a reserved word in Dart. This means that when the parser
sees `part of`, it doesn't immediately know if it is looking at a `part`
directive followed by an unquoted identifier like `part of;` or `part
of.some/other.thing;` versus a `part of` directive like `part of thing;` or
`part of 'uri.dart';` It must lookahead past the `of` identifier to see if the
next token is `;`, `.`, `/`, or another identifier.*
*There are two directives for working with part files, `part` and `part of`.
This means that when the parser sees `part of`, it doesn't immediately know if
it is looking at a `part` directive followed by an unquoted identifier like
`part of;` or `part of.some/other.thing;` versus a `part of` directive like
`part of thing;` or `part of 'uri.dart';` It must lookahead past the `of`
identifier to see if the next token is `;`, `.`, `/`, or another identifier.*

*This may add some complexity to parsing, but should be minor. Dart's grammar
has other places that require much more (sometimes unbounded) lookahead.*
Expand All @@ -322,23 +392,20 @@ The semantics of the new syntax are defined by taking the `packagePath` and
converting it to a string. The directive then behaves as if the user had written
a string literal containing that string. The process is:

1. Let the *segment* for a `packagePathSegment` be a string defined by the
ordered concatenation of the `identifier` and `.` terminals in the
`packagePathSegment`, with all whitespace and comments removed. *So if
`packagePathSegment` is `a . b /* comment */ . c`, then its *segment* is
1. Let the *segment* for a `pathSegment` be a string defined by the ordered
concatenation of the `segmentComponent` and `.` terminals in the
`pathSegment`. *So if `pathSegment` is `a.b.c`, then its *segment* is
"a.b.c".*

2. Let *segments* be an ordered list of the segments of each
`packagePathSegment` in `packagePath`. *In other words, this and the
preceding step take the `packagePath` and convert it to a list of segment
strings while discarding whitespace and comments. So if `packagePathSegment`
is `a . b /* comment */ / c / d . e`, then *segments* is ["a.b", "c",
"d.e"].*
2. Let *segments* be an ordered list of the segments of each `pathSegment` in
`packagePath`. *In other words, this and the preceding step take the
`packagePath` and convert it to a list of segment strings. So if
`pathSegment` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*

3. If the first segment in *segments* is "dart":

1. It is a compile error if there are no subsequent segments. *There's no
"dart:dart" or "package:dart/dart.dart" library. We reserve the right
1. It is a compile-time error if there are no subsequent segments. *There's
no "dart:dart" or "package:dart/dart.dart" library. We reserve the right
to use `import dart;` in the future to mean something useful.*

2. Let *path* be the concatenation of the remaining segments, separated
Expand All @@ -354,14 +421,14 @@ a string literal containing that string. The process is:

1. Let *name* be the segment.

2. Let *path* be the last identifier in the segment. *If the segment is
only a single identifier, this is the entire segment. Otherwise, it's
the last identifier after the last `.`. So in `foo`, *path* is `foo`.
In `foo.bar.baz`, it's `baz`.*
2. Let *path* be the last `segmentComponent` in the segment. *If the
segment is only a single `segmentComponent`, this is the entire segment.
Otherwise, it's the last identifier after the last `.`. So in `foo`,
*path* is `foo`. In `foo.bar.baz`, it's `baz`.*

3. The URI is "package:*name*/*path*.dart". *So `import test;` desugars to
`import "package:test/test.dart";`, and `import server.api;` desugars
to `import "package:server.api/api.dart";`.*
`import "package:test/test.dart";`, and `import server.api;` desugars to
`import "package:server.api/api.dart";`.*

5. Else:

Expand Down Expand Up @@ -463,7 +530,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
library at `package:foo/bar.dart`, not part of the library with name `foo.bar`.

Users affected by the breakage can and should update their `part of` directive
to point to the URI of the library that the file is a part, using either the
to point to the URI of the library that the file is a part of, using either the
quoted or unquoted syntax.

### Language versioning
Expand Down Expand Up @@ -501,6 +568,12 @@ new unquoted style whenever an existing directive could use it.

## Changelog

### 0.4

- Allow reserved words and built-in identifiers as path components (#3984).

- Disallow whitespace and comments inside package paths (#3983).

### 0.3

- Address breaking change in `part of` directives with library names.
Expand Down

0 comments on commit 98652a6

Please sign in to comment.