Reason V4 [Stacked Diff 2/n #2599] [String Template Literals]

Summary: This diff implements string template literals. Test Plan: Reviewers: CC:
reasonml · Aug 6, 2020 · 5ee062e
1 parent dab3565
commit 5ee062e
Showing 17 changed files with 978 additions and 49 deletions.
diff --git a/docs/TEMPLATE_LITERALS.md b/docs/TEMPLATE_LITERALS.md
@@ -0,0 +1,146 @@
+
+Contributors: Lexing and Parsing String Templates:
+===================================================
+Supporting string templates requires coordination between the lexer, parser and
+printer. The lexer (as always) creates a token stream, but when it encounters a
+backtick, it enters a special lexing mode that collects the (mostly) raw text
+until it hits either a closing backtick or a `${`. A `${` opens an
+"interpolation region": the lexer temporarily resumes "regular" lexing instead
+of collecting raw text until it hits the balanced `}`, at which point it
+re-enters "raw text" mode and continues until the next `${` or the closing
+backtick.
+
+- Parsing of raw text regions and regular tokenizing: Handled by
+  `reason_declarative_lexer.ml`.
+- Token balancing: Handled by `reason_lexer.ml`.
+
+The output of lexing becomes tokens streamed into the parser, and the parser
+`reason_parser.mly` turns those tokens into AST expressions.
+
+## Lexing:
+
+String templates are opened by:
+- A backtick.
+- Followed by any whitespace character (newline, or space/tab).
+
+String templates are closed by:
+- Any whitespace character (newline, or space/tab), followed by a backtick.
+
+```reason
+let x = ` hi this is my string template `;
+let x = `
+The newline counts as a whitespace character both for opening and closing.
+`;
+
+```
+
+Within the string template literal, there may be regions of non-string
+"interpolation" where expressions are lexed/parsed.
+
+```reason
+let x = ` hi this is my ${expressionHere() ++ "!"} template `
+```
+
+Template strings are lexed into tokens, some of which carry a string "payload"
+holding a portion of the string content.
+The opening backtick, closing backtick, and `${` characters do not become
+tokens fed to the parser, and are not included in the text payload of any
+token. The right brace `}` closing an interpolation region `${` _does_ become
+a token that is fed to the parser. Three tokens are produced when lexing
+string templates:
+
+- `STRING_TEMPLATE_TERMINATED(string)`: A string region that is terminated by a
+  closing backtick. It may be the entire string template contents if there are
+  no interpolation regions `${}`, or it may be the final string segment after
+  an interpolation region `${}`, as long as it is the closing of the entire
+  template.
+- `STRING_TEMPLATE_SEGMENT_LBRACE(string)`: A string region occurring _before_
+  an interpolation region `${`. The `string` payload of this token is the
+  contents up until (but not including) the next `${`.
+- `RBRACE`: A `}` character that terminates an interpolation region that
+  started with `${`.
+
+Simple example:
+
+     STRING_TEMPLATE_TERMINATED
+     |                         |
+   ` lorem ipsum lorem ipsum bla `
+    ^                          ^
+    |                          |
+    |                The closing backtick also doesn't show up in the token
+    |                stream, but the last white space is part of the lexed
+    |                STRING_TEMPLATE_TERMINATED token
+    |                (it is used to compute indentation, but is stripped from
+    |                the string constant, or re-inserted in refmting if not present)
+    |
+    The backtick doesn't show up anywhere in the token stream.  The first
+    single white space after backtick is also not part of the lexed tokens.
+
+Multiline example:
+
+    All of this leading line whitespace remains part of the tokens' payloads
+    but it is normalized and stripped when the parser converts the tokens
+    into string expressions.
+    |
+    |   This newline is not part of any token
+    |   |
+    |   v
+    |  `
+    +->   lorem ipsum lorem
+          ipsum bla
+       `
+      ^
+      |
+      All of this white space on final line is part of the token as well.
+
+
+For interpolation, the token `STRING_TEMPLATE_SEGMENT_LBRACE` represents the
+string contents (minus the first white space character after the opening
+backtick), up to the `${`. As with non-interpolated string templates, the
+opening and closing backticks do not show up in the token stream, and the
+first white space character after the opening backtick is not included in the
+lexed string contents. The final white space character before the closing
+backtick *is* part of the lexed string token (to compute indentation), but it
+is stripped, along with leading line whitespace, from the string expression
+when the parsing stage converts lexed tokens to AST string expressions.
+
+   ` lorem ipsum lorem ipsum bla${expression}lorem ipsum lorem ip lorem`
+     |                         |            ||                        |
+     STRING_TEMPLATE_SEGMENT_LBRACE         |STRING_TEMPLATE_TERMINATED
+                                      RBRACE
+## Parsing:
+
+The string template tokens are turned into normal AST expressions.
+`STRING_TEMPLATE_SEGMENT_LBRACE` and `STRING_TEMPLATE_TERMINATED` lexed tokens
+contain all of the string contents, plus leading line whitespace for each
+line, including the final whitespace before the closing backtick. These are
+normalized in the parser by stripping that leading whitespace, including two
+additional spaces for nice indentation, before turning them into some
+combination of string constants with a special attribute on the AST, or string
+concats with a special attribute on the concat AST node.
+
+```reason
+
+// This:
+let x = `
+  Hello there
+`;
+// Becomes:
+let x = [@reason.template] "Hello there";
+
+// This:
+let x = `
+  ${expr} Hello there
+`;
+// Becomes:
+let x = [@reason.template] (expr ++ [@reason.template] "Hello there");
+
+```
+
+User Documentation:
+===================
+> This section is the user documentation for string template literals, which
+> will be published to the [official Reason Syntax
+> documentation](https://reasonml.github.io/) when 
+
+TODO
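Review note on docs/TEMPLATE_LITERALS.md: the token layout described in the "Lexing" section can be illustrated with a toy tokenizer. What follows is a minimal OCaml sketch under stated assumptions, not the code in `reason_declarative_lexer.ml` or `reason_lexer.ml`. It ignores escapes, treats the last character of the input as the closing backtick, and balances braces by counting them. Instead of resuming regular lexing inside `${ }`, it keeps the interpolation body as raw text in `INTERPOLATION_SOURCE`, a hypothetical stand-in token that does not exist in the diff. The helper names `is_ws` and `tokenize` are also illustrative.

```ocaml
(* Toy tokenizer for one string template, following the rules in the doc:
   the opening backtick and the whitespace after it are dropped, the final
   whitespace before the closing backtick stays in the payload. *)
type token =
  | STRING_TEMPLATE_SEGMENT_LBRACE of string (* raw text before a dollar-brace *)
  | STRING_TEMPLATE_TERMINATED of string     (* raw text before the closing backtick *)
  | INTERPOLATION_SOURCE of string           (* hypothetical: unlexed interpolation body *)
  | RBRACE                                   (* brace closing an interpolation region *)

let is_ws c = c = ' ' || c = '\t' || c = '\n'

let tokenize (src : string) : token list =
  let n = String.length src in
  if n < 3 || src.[0] <> '`' || src.[n - 1] <> '`' then
    invalid_arg "tokenize: expected a backtick-delimited template";
  if not (is_ws src.[1] && is_ws src.[n - 2]) then
    invalid_arg "tokenize: backticks must be padded with whitespace";
  let last = n - 1 in (* index of the closing backtick *)
  (* Raw-text mode: collect from [start] until a dollar-brace or the end. *)
  let rec raw start i acc =
    if i >= last then
      List.rev (STRING_TEMPLATE_TERMINATED (String.sub src start (last - start)) :: acc)
    else if src.[i] = '$' && i + 1 < last && src.[i + 1] = '{' then
      let segment = String.sub src start (i - start) in
      interp (i + 2) (i + 2) 0 (STRING_TEMPLATE_SEGMENT_LBRACE segment :: acc)
    else raw start (i + 1) acc
  (* Interpolation mode: scan to the balanced closing brace. *)
  and interp start i depth acc =
    if i >= last then invalid_arg "tokenize: unterminated interpolation"
    else
      match src.[i] with
      | '{' -> interp start (i + 1) (depth + 1) acc
      | '}' when depth = 0 ->
        let body = String.sub src start (i - start) in
        raw (i + 1) (i + 1) (RBRACE :: INTERPOLATION_SOURCE body :: acc)
      | '}' -> interp start (i + 1) (depth - 1) acc
      | _ -> interp start (i + 1) depth acc
  in
  raw 2 2 []

let () =
  (* The segment before the interpolation keeps its trailing space, and the
     terminated segment keeps the space before the closing backtick. *)
  assert (
    tokenize "` hi ${name}! `"
    = [ STRING_TEMPLATE_SEGMENT_LBRACE "hi ";
        INTERPOLATION_SOURCE "name";
        RBRACE;
        STRING_TEMPLATE_TERMINATED "! " ])
```

The sketch makes one point from the doc concrete: a template without interpolation yields a single `STRING_TEMPLATE_TERMINATED`, while each `${ }` region contributes one `STRING_TEMPLATE_SEGMENT_LBRACE` before it and one `RBRACE` after it.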
diff --git a/formatTest/typeCheckedTests/expected_output/templateStrings.re b/formatTest/typeCheckedTests/expected_output/templateStrings.re
@@ -0,0 +1,190 @@
+[@reason.version 3.7];
+/**
+ * Comments:
+ */
+
+let addTwo = (a, b) => string_of_int(a + b);
+let singleLineConstant = `
+  Single line template
+`;
+let singleLineInterpolate = `
+  Single line ${addTwo(1, 2)}!
+`;
+
+let multiLineConstant = `
+  Multi line template
+  Multi %a{x, y}line template
+  Multi line template
+  Multi line template
+`;
+
+let printTwo = (a, b) => {
+  print_string(a);
+  print_string(b);
+};
+
+let templteWithAttribute =
+  [@attrHere]
+  `
+    Passing line template
+    Passing line template
+    Passing line template
+    Passing line template
+  `;
+
+let result =
+  print_string(
+    `
+      Passing line template
+      Passing line template
+      Passing line template
+      Passing line template
+    `,
+  );
+
+let resultPrintTwo =
+  printTwo(
+    "short one",
+    `
+      Passing line template
+      Passing line template
+      Passing line template
+      Passing line template
+    `,
+  );
+
+let hasBackSlashes = `
+  One not escaped: \
+  Three not escaped: \ \ \
+  Two not escaped: \\
+  Two not escaped: \\\
+  One not escaped slash, and one escaped tick: \\`
+  Two not escaped slashes, and one escaped tick: \\\`
+  Two not escaped slashes, and one escaped dollar-brace: \\\${
+  One not escaped slash, then a close tick: \
+`;
+
+let singleLineInterpolateWithEscapeTick = `
+  Single \`line ${addTwo(1, 2)}!
+`;
+
+let singleLineConstantWithEscapeDollar = `
+  Single \${line template
+`;
+
+// The backslash here is a backslash literal.
+let singleLineInterpolateWithBackslashThenDollar = `
+  Single \$line ${addTwo(2, 3)}!
+`;
+
+let beforeExpressionCommentInNonLetty = `
+  Before expression comment in non-letty interpolation:
+  ${/* Comment */ string_of_int(1 + 2)}
+`;
+
+let beforeExpressionCommentInNonLetty2 = `
+  Same thing but with comment on own line:
+  ${
+    /* Comment */
+    string_of_int(10 + 8)
+  }
+`;
+module StringIndentationWorksInModuleIndentation = {
+  let beforeExpressionCommentInNonLetty2 = `
+    Same thing but with comment on own line:
+    ${
+      /* Comment */
+      string_of_int(10 + 8)
+    }
+  `;
+};
+
+let beforeExpressionCommentInNonLetty3 = `
+  Same thing but with text after final brace on same line:
+  ${
+    /* Comment */
+    string_of_int(20 + 1000)
+  }TextAfterBrace
+`;
+
+let beforeExpressionCommentInNonLetty3 = `
+  Same thing but with text after final brace on next line:
+  ${
+    /* Comment */
+    string_of_int(100)
+  }
+  TextAfterBrace
+`;
+
+let x = 0;
+let commentInLetSequence = `
+  Comment in letty interpolation:
+  ${
+    /* Comment */
+    let x = 200 + 49;
+    string_of_int(x);
+  }
+`;
+
+let commentInLetSequence2 = `
+  Same but with text after final brace on same line:
+  ${
+    /* Comment */
+    let x = 200 + 49;
+    string_of_int(x);
+  }TextAfterBrace
+`;
+
+let commentInLetSequence3 = `
+  Same but with text after final brace on next line:
+  ${
+    /* Comment */
+    let x = 200 + 49;
+    string_of_int(x);
+  }
+  TextAfterBrace
+`;
+
+let reallyCompicatedNested = `
+  Comment in non-letty interpolation:
+
+  ${
+    /* Comment on first line of interpolation region */
+
+    let y = (a, b) => a + b;
+    let x = 0 + y(0, 2);
+    // Nested string templates
+    let s = `
+      asdf${addTwo(0, 0)}
+      alskdjflakdsjf
+    `;
+    s ++ s;
+  }same line as brace with one space
+  and some more text at the footer no newline
+`;
+
+let reallyLongIdent = "!";
+let backToBackInterpolations = `
+  Two interpolations side by side:
+  ${addTwo(0, 0)}${addTwo(0, 0)}
+  Two interpolations side by side with leading and trailing:
+  Before${addTwo(0, 0)}${addTwo(0, 0)}After
+
+  Two interpolations side by side second one should break:
+  Before${addTwo(0, 0)}${
+    reallyLongIdent
+    ++ reallyLongIdent
+    ++ reallyLongIdent
+    ++ reallyLongIdent
+  }After
+
+  Three interpolations side by side:
+  Before${addTwo(0, 0)}${
+    reallyLongIdent
+    ++ reallyLongIdent
+    ++ reallyLongIdent
+    ++ reallyLongIdent
+  }${
+    ""
+  }After
+`;
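Review note on templateStrings.re: the labels in `hasBackSlashes` suggest that a backslash acts as an escape only when it directly precedes a backtick or a `${`, and is an ordinary character everywhere else. The OCaml sketch below classifies raw template text under that reading. It is an assumption drawn from the expected output above, not the lexer's actual rule, and the `piece` type and `classify` function are hypothetical names.

```ocaml
(* Classify each position of raw template text, scanning left to right. *)
type piece =
  | Literal of char        (* ordinary character, including a lone backslash *)
  | Escaped_tick           (* backslash directly followed by a backtick *)
  | Escaped_dollar_brace   (* backslash directly followed by dollar-brace *)

let classify (s : string) : piece list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else if s.[i] = '\\' && i + 1 < n && s.[i + 1] = '`' then
      go (i + 2) (Escaped_tick :: acc)
    else if s.[i] = '\\' && i + 2 < n && s.[i + 1] = '$' && s.[i + 2] = '{' then
      go (i + 3) (Escaped_dollar_brace :: acc)
    else go (i + 1) (Literal s.[i] :: acc)
  in
  go 0 []

let () =
  (* Two backslashes then a backtick: one literal slash, one escaped tick. *)
  assert (classify "\\\\`" = [ Literal '\\'; Escaped_tick ]);
  (* Three backslashes then dollar-brace: two literal slashes, one escape. *)
  assert (classify "\\\\\\${" = [ Literal '\\'; Literal '\\'; Escaped_dollar_brace ])
```

Under this reading the backslash in `\$line` stays literal because the `$` is not followed by `{`, which matches the comment on `singleLineInterpolateWithBackslashThenDollar`.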