JsonWriterException: illegal char sequence of surrogate pair #1076

Open
2072 opened this issue Aug 29, 2023 · 1 comment

2072 commented Aug 29, 2023

Hello,

I just stumbled on this error (JsonWriterException: illegal char sequence of surrogate pair) while marshalling a large data structure that contains the original file names from a real filesystem (there is probably a bad file name somewhere in it).

I've tried withEscapeUnicode set to both true and false, but the result is the same. What can I do?

Here is the back trace:

com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.encodeError(JsonWriter.scala:230),
com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.illegalSurrogateError(JsonWriter.scala:982),
com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.writeEncodedString(JsonWriter.scala:901),
com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.writeEscapedOrEncodedString(JsonWriter.scala:877),
com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.writeString(JsonWriter.scala:865),
com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.writeVal(JsonWriter.scala:258),

Jdk: openjdk 17.0.8 2023-07-18 LT
Scala: 2.13.11
Jsoniter: 2.21.3

Thanks

@plokhotnyuk
Owner

Hi John,

While the ABNF grammar of the JSON specification does allow strings for keys and values to contain bit sequences that cannot encode Unicode characters, this library rejects them by design.

This can happen when file name strings are truncated in the middle of a surrogate pair, or when they are simply generated from a random sequence of UTF-16 code units.
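To make the truncation case concrete, here is a minimal sketch that finds the first unpaired surrogate in a string using only the JDK `Character` helpers. The object and method names (`SurrogateCheck`, `firstUnpairedSurrogate`) are made up for illustration and are not part of jsoniter-scala.

```scala
object SurrogateCheck {
  /** Returns the index of the first unpaired UTF-16 surrogate in `s`, or -1 if there is none. */
  def firstUnpairedSurrogate(s: String): Int = {
    var i = 0
    while (i < s.length) {
      val ch = s.charAt(i)
      if (Character.isHighSurrogate(ch)) {
        // A high surrogate is only valid when a low surrogate follows it directly.
        if (i + 1 < s.length && Character.isLowSurrogate(s.charAt(i + 1))) i += 2
        else return i
      } else if (Character.isLowSurrogate(ch)) {
        // A low surrogate that was not consumed as part of a pair is always unpaired.
        return i
      } else i += 1
    }
    -1
  }
}
```

For example, U+1F600 takes two UTF-16 code units, so cutting `"a" + new String(Character.toChars(0x1F600))` with `substring(0, 2)` leaves a lone high surrogate at index 1, which is the kind of value the writer refuses.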

The best solution would be to rename the affected files or to fix the software that produces such names.

Using the following custom codec you can print affected names for analysis:

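import com.github.plokhotnyuk.jsoniter_scala.core._

// Diagnostic codec: behaves like the default String codec, but prints every
// string that the writer rejects before rethrowing the error.
implicit val codecThatPrintInvalidStrings: JsonValueCodec[String] = new JsonValueCodec[String] {
  override def decodeValue(in: JsonReader, default: String): String = in.readString(default)

  override def encodeValue(x: String, out: JsonWriter): Unit =
    try out.writeVal(x)
    catch {
      case ex: JsonWriterException =>
        // Log the offending value, then rethrow so serialization still fails.
        println("Has illegal surrogate pair: " + x)
        throw ex
    }

  override def nullValue: String = null
}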

As a workaround, until the source problem is fixed, you can serialize invalid surrogate pairs using a custom codec that stores the string as a raw value (see out.writeRawVal). In that case you must encode UTF-8 characters and escape characters such as \r, \t, \n, and \f in your own code (for example, by using another JSON serializer).
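As a rough sketch of the escaping half of that raw-value workaround, the helper below builds a quoted JSON string literal as UTF-8 bytes and writes any unpaired surrogate as a `\uXXXX` escape. The names `LenientJsonString` and `toJsonBytes` are hypothetical, and the choice to escape lone surrogates rather than replace or drop them is an assumption, not something the library prescribes.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object LenientJsonString {
  /** Encodes `s` as a quoted JSON string literal in UTF-8.
    * Unpaired surrogates become \uXXXX escapes instead of causing an error. */
  def toJsonBytes(s: String): Array[Byte] = {
    val sb = new java.lang.StringBuilder(s.length + 2)
    sb.append('"')
    var i = 0
    while (i < s.length) {
      val ch = s.charAt(i)
      val paired = Character.isHighSurrogate(ch) &&
        i + 1 < s.length && Character.isLowSurrogate(s.charAt(i + 1))
      if (paired) {
        // Valid pair: copy both code units; getBytes encodes them as 4 UTF-8 bytes.
        sb.append(ch).append(s.charAt(i + 1))
        i += 2
      } else {
        ch match {
          case '"'  => sb.append('\\').append('"')
          case '\\' => sb.append('\\').append('\\')
          case '\n' => sb.append('\\').append('n')
          case '\r' => sb.append('\\').append('r')
          case '\t' => sb.append('\\').append('t')
          case '\f' => sb.append('\\').append('f')
          case '\b' => sb.append('\\').append('b')
          case c if c < ' ' || Character.isSurrogate(c) =>
            // Other control characters and lone surrogates: 4-digit hex escape.
            sb.append('\\').append('u').append("%04x".format(c.toInt))
          case c => sb.append(c)
        }
        i += 1
      }
    }
    sb.append('"')
    sb.toString.getBytes(UTF_8)
  }
}

// Assumed wiring inside a custom JsonValueCodec[String] (not verified here):
//   override def encodeValue(x: String, out: JsonWriter): Unit =
//     out.writeRawVal(LenientJsonString.toJsonBytes(x))
```

Output produced this way matches the JSON grammar but still carries code units that are not valid Unicode, so other parsers may reject or alter such strings when reading them back.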
