Add UTF-8 and UTF-16 encodings #9
KarolS committed Oct 17, 2019
1 parent 4fc0b98 commit 3a6790e
Showing 8 changed files with 188 additions and 38 deletions.
1 change: 1 addition & 0 deletions docs/lang/literals.md
@@ -41,6 +41,7 @@ The exact value of `nullchar` is encoding-dependent:
* in the `vectrex` encoding it's 128,
* in the `zx80` encoding it's 1,
* in the `zx81` encoding it's 11,
* in the `utf16be` and `utf16le` encodings it's exceptionally two bytes: 0, 0
* in other encodings it's 0 (this might be subject to change in future versions).


7 changes: 7 additions & 0 deletions docs/lang/syntax.md
@@ -6,6 +6,13 @@ For information about types, see [Types](./types.md).
For information about literals, see [Literals](./literals.md).
For information about assembly, see [Using assembly within Millfork programs](./assembly.md).

## Allowed characters

Source files are text files in UTF-8 encoding.
Allowed line endings are U+000A, U+000D and U+000D/U+000A.
Outside of text strings and comments, the only allowed characters are U+0009 and U+0020–U+007E
(so-called printable ASCII).
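The character rule above can be sketched as a pair of predicates. This is only an illustration of the documented rule, not the parser's actual code; the object and method names are hypothetical.

```scala
object AllowedCharsSketch {
  // True if `c` may appear outside text strings and comments:
  // the tab character (U+0009) or printable ASCII (U+0020 to U+007E).
  def isAllowedOutsideText(c: Char): Boolean =
    c == 0x09 || (c >= 0x20 && c <= 0x7E)

  // True if `c` is one of the characters that make up an allowed
  // line ending (U+000A or U+000D).
  def isLineEndingChar(c: Char): Boolean =
    c == 0x0A || c == 0x0D
}
```

Under this sketch, a letter with a diacritic would be rejected in an identifier but is not checked inside a string literal or a comment.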

## Comments

Comments start with `//` and last until the end of line.
8 changes: 8 additions & 0 deletions docs/lang/text.md
@@ -55,6 +55,10 @@

* `vectrex` – built-in Vectrex font

* `utf8` – UTF-8 (BMP only)

* `utf16be`, `utf16le` – UTF-16BE and UTF-16LE
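To make the difference between the three UTF encodings concrete, here is a small sketch that uses the JVM's standard charsets rather than Millfork's own `TextCodec` (an assumption made for illustration; the JVM codecs do not enforce the "BMP only" restriction). The object and method names are hypothetical.

```scala
import java.nio.charset.StandardCharsets

object UtfBytesSketch {
  // Encode `s` and return the bytes as unsigned values (0 to 255).
  def utf8(s: String): List[Int] =
    s.getBytes(StandardCharsets.UTF_8).toList.map(_ & 0xff)

  // Big-endian UTF-16: high byte of each code unit first, no byte order mark.
  def utf16be(s: String): List[Int] =
    s.getBytes(StandardCharsets.UTF_16BE).toList.map(_ & 0xff)

  // Little-endian UTF-16: low byte of each code unit first, no byte order mark.
  def utf16le(s: String): List[Int] =
    s.getBytes(StandardCharsets.UTF_16LE).toList.map(_ & 0xff)
}
```

For the euro sign (U+20AC), `utf8` yields three bytes (0xE2, 0x82, 0xAC), while `utf16be` and `utf16le` yield the same two bytes in opposite order.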

When programming for Commodore,
use `pet` for strings you're printing using standard I/O routines
and `petscr` for strings you're copying to screen memory directly.
@@ -101,6 +105,8 @@ control codes for changing the text background color

* `{yen}`, `{pound}`, `{cent}`, `{euro}`, `{copy}` – yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol

* `{u0000}`–`{u1fffff}` – Unicode codepoint (available in UTF encodings only)
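A minimal sketch of how such an escape could be turned into a code point follows. It is not taken from the commit: the object name is hypothetical, and the accepted digit count (four to six hexadecimal digits) is an assumption inferred from the two endpoints shown above.

```scala
object UnicodeEscapeSketch {
  // Matches an escape made of 'u' followed by 4 to 6 hex digits inside braces.
  // The digit count is an assumption; the docs only give the two endpoints.
  private val Escape = "\\{u([0-9a-fA-F]{4,6})\\}".r

  // Returns the code point named by the escape, or None if the text is not
  // a well-formed escape or the value is above the documented upper bound.
  def parse(escape: String): Option[Int] = escape match {
    case Escape(hex) =>
      val codePoint = Integer.parseInt(hex, 16)
      if (codePoint <= 0x1fffff) Some(codePoint) else None
    case _ => None
  }
}
```

Named escapes such as `{euro}` do not match this pattern and would need to be handled separately.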

##### Character availability

Encoding | lowercase letters | backslash | currencies | intl | card suits
@@ -123,6 +129,7 @@ Encoding | lowercase letters | backslash | currencies | intl | card suits
`msx_ru` | yes | yes | | Russian⁴ | yes
`koi7n2` | no | yes | | Russian⁵ | no
`vectrex` | no | yes | | none | no
`utf*` | yes | yes | all | all | yes
all the rest | yes | yes | | none | no

1. `pet`, `origpet` and `petscr` cannot display card suit symbols and lowercase letters at the same time.
@@ -163,4 +170,5 @@ Encoding | new line | braces | backspace | cursor movement | text colour | rever
`msx_*` | yes | yes | yes | yes | no | no | no
`koi7n2` | yes | no | yes | no | no | no | no
`vectrex` | no | no | no | no | no | no | no
`utf*` | yes | yes | yes | no | no | no | no
all the rest | yes | yes | no | no | no | no | no
3 changes: 3 additions & 0 deletions src/main/scala/millfork/Platform.scala
@@ -116,6 +116,9 @@ object Platform {
if (czt) {
log.error("Default encoding cannot be zero-terminated")
}
if (codec.stringTerminator.length != 1) {
log.warn("Default encoding should be byte-based")
}
val (srcCodec, szt) = TextCodec.forName(srcCodecName, None, log)
if (szt) {
log.error("Default screen encoding cannot be zero-terminated")
2 changes: 1 addition & 1 deletion src/main/scala/millfork/env/Environment.scala
@@ -433,7 +433,7 @@ class Environment(val parent: Option[Environment], val prefix: String, val cpuFa
addThing(ConstantThing("nullptr.raw", nullptrConstant, p), None)
addThing(ConstantThing("nullptr.raw.hi", nullptrConstant.hiByte.quickSimplify, b), None)
addThing(ConstantThing("nullptr.raw.lo", nullptrConstant.loByte.quickSimplify, b), None)
- val nullcharValue = options.features.getOrElse("NULLCHAR", 0L)
+ val nullcharValue = options.features.getOrElse("NULLCHAR", options.platform.defaultCodec.stringTerminator.head.toLong)
val nullcharConstant = NumericConstant(nullcharValue, 1)
addThing(ConstantThing("nullchar", nullcharConstant, b), None)
val __zeropage_usage = UnexpandedConstant("__zeropage_usage", 1)
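The new default for `nullchar` in this hunk can be isolated as a small function. This is a sketch with simplified stand-in types (a plain `Map` for the feature table and a `List[Int]` for the terminator), not the real `Environment` code; the object and parameter names are hypothetical.

```scala
object NullcharDefaultSketch {
  // An explicit NULLCHAR feature wins; otherwise fall back to the first
  // byte of the default codec's terminator. `nullchar` is a one-byte
  // constant, so later terminator bytes are ignored here.
  // Assumes `defaultTerminator` is non-empty.
  def nullcharValue(features: Map[String, Long], defaultTerminator: List[Int]): Long =
    features.getOrElse("NULLCHAR", defaultTerminator.head.toLong)
}
```

With a two-byte terminator of 0, 0 and no override, the result is still 0, which is consistent with the separate warning that the default encoding should be byte-based.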
4 changes: 2 additions & 2 deletions src/main/scala/millfork/parser/MfParser.scala
@@ -73,7 +73,7 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
LiteralExpression(value, 1)
case _ =>
log.error(s"Character `$c` cannot be encoded as one byte", Some(p))
- LiteralExpression(co.stringTerminator, 1)
+ LiteralExpression(co.stringTerminator.head, 1)
}
}

@@ -88,7 +88,7 @@ abstract class MfParser[T](fileId: String, input: String, currentDirectory: Stri
val textLiteral: P[List[Expression]] = P(position() ~ doubleQuotedString ~/ HWS ~ codec).map {
case (p, s, ((co, zt), lenient)) =>
val characters = co.encode(options.log, None, s, options, lenient = lenient).map(c => LiteralExpression(c, 1).pos(p))
- if (zt) characters :+ LiteralExpression(co.stringTerminator, 1)
+ if (zt) characters ++ co.stringTerminator.map(nul => LiteralExpression(nul, 1))
else characters
}
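The change from `:+` to `++` in this hunk matters because the terminator is now a list of bytes rather than a single value. A simplified sketch, using plain integers in place of `LiteralExpression` (an assumption for illustration; the names are hypothetical):

```scala
object TerminatorSketch {
  // Append every terminator byte when the literal is zero-terminated;
  // otherwise return the encoded characters unchanged.
  def withTerminator(encoded: List[Int], terminator: List[Int], zeroTerminated: Boolean): List[Int] =
    if (zeroTerminated) encoded ++ terminator else encoded
}
```

For a UTF-16BE string containing only `A`, the encoded bytes 0x00, 0x41 would be followed by the two terminator bytes 0, 0; for a single-byte encoding the behaviour is the same as before.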
