
<pre class=metadata>
Group: WHATWG
H1: Encoding
Shortname: encoding
Text Macro: TWITTER encodings
Abstract: The Encoding Standard defines encodings and their JavaScript API.
Translation: ja https://triple-underscore.github.io/Encoding-ja.html
Markup Shorthands: css off
Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index
</pre>

<link rel=stylesheet href=visualization-colors.css>

<pre class=link-defaults>
spec:streams; type:interface; text:ReadableStream
</pre>


<h2 id=preface>Preface</h2>

<p>The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the
universal coded character set. Therefore, for new protocols and formats, as well as
existing formats deployed in new contexts, this specification requires (and defines) the
UTF-8 encoding.

<p>The other (legacy) encodings have been defined to some extent in the past. However,
user agents have not always implemented them in the same way, have not always used the
same labels, and often differ in dealing with undefined and former proprietary areas of
encodings. This specification addresses those gaps so that new user agents do not have to
reverse engineer encoding implementations and existing user agents can converge.

<p>In particular, this specification defines all those encodings, their algorithms to go
from bytes to scalar values and back, and their canonical names and identifying labels.
This specification also defines an API to expose part of the encoding algorithms to
JavaScript.

<p>User agents have also significantly deviated from the labels listed in the
<a href=https://www.iana.org/assignments/character-sets/character-sets.xhtml>IANA Character Sets registry</a>.
To stop spreading legacy encodings further, this specification is exhaustive about the
aforementioned details and therefore has no need for the registry. In particular, this
specification does not provide a mechanism for extending any aspect of encodings.


<h2 id=security-background>Security background</h2>

<p>There is a set of encoding security issues when the producer and consumer do not agree
on the encoding in use, or on the way a given encoding is to be implemented. For instance,
an attack was reported in 2011 where a <a>Shift_JIS</a> lead byte 0x82 was used to
“mask” a 0x22 trail byte in a JSON resource of which an attacker could control some field.
The producer did not see the problem even though this is an illegal byte combination. The
consumer decoded it as a single U+FFFD and therefore changed the overall interpretation as
U+0022 is an important delimiter. Decoders of encodings that use multiple bytes for scalar
values now require that in case of an illegal byte combination, a scalar value in the
range U+0000 to U+007F, inclusive, cannot be “masked”. For the aforementioned sequence the
output would be U+FFFD U+0022.
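
<p class=note>The following non-normative sketch illustrates the same "no masking" rule. It uses
UTF-8 rather than <a>Shift_JIS</a>, on the assumption that the host environment provides the
<code>TextDecoder</code> API; the byte values are chosen only for illustration.

```javascript
// 0xE2 is a UTF-8 lead byte that expects a continuation byte (0x80 to 0xBF).
// 0x22 is not one, so only the lead byte is reported as an error and 0x22
// is still decoded as U+0022.
const bytes = new Uint8Array([0x7B, 0xE2, 0x22, 0x7D]);
const decoded = new TextDecoder("utf-8").decode(bytes);
const hex = Array.from(decoded, (c) => c.codePointAt(0).toString(16));
console.log(hex.join(" ")); // 7b fffd 22 7d
```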

<p>This is a larger issue for encodings that map anything that is an <a>ASCII byte</a> to
something that is not an <a>ASCII code point</a>, when there is no lead byte present. These
are “ASCII-incompatible” encodings and other than <a>ISO-2022-JP</a>, <a>UTF-16BE</a>,
and <a>UTF-16LE</a>, which are unfortunately required due to deployed content, they are not
supported. (Investigation is
<a href=https://github.com/whatwg/encoding/issues/8 lt="Add more labels to the replacement encoding">ongoing</a>
whether more labels of other such encodings can be mapped to the <a>replacement</a>
encoding, rather than the unknown encoding fallback.) An example attack is injecting
carefully crafted content into a resource and then encouraging the user to override the
encoding, resulting in e.g. script execution.

<p>Encoders used by URLs found in HTML and HTML's form feature can also result in slight
information loss when an encoding is used that cannot represent all scalar values. For example,
when a resource uses the <a>windows-1252</a> encoding, a server will not be able to
distinguish between an end user entering “💩” and “&amp;#128169;” into a form.

<p>The problems outlined here go away when exclusively using UTF-8, which is one of the
many reasons UTF-8 is now the mandatory encoding for all things.

<p class=note>See also the <a href=#browser-ui>Browser UI</a> chapter.


<h2 id=terminology>Terminology</h2>

<p>This specification depends on the Infra Standard. [[!INFRA]]

<p>Hexadecimal numbers are prefixed with "0x".

<p>In equations, all numbers are integers, addition is represented by "+", subtraction by "&minus;",
multiplication by "×", integer division by "/" (returns the quotient), modulo by "%" (returns the
remainder of an integer division), logical left shifts by "&lt;&lt;", logical right shifts by ">>",
bitwise AND by "&amp;", and bitwise OR by "|".

<p>For logical right shifts, operands must have at least twenty-one bits of precision.

<hr>

<p>A <dfn id=concept-token>token</dfn> is a piece of data, such as a <a>byte</a> or
<a>scalar value</a>.

<p>A <dfn id=concept-stream>stream</dfn> represents an ordered sequence of
<a>tokens</a>. <dfn>End-of-stream</dfn> is a special
<a>token</a> that signifies no more
<a>tokens</a> are in the
<a for=/>stream</a>.

<p>When a <a>token</a> is
<dfn id=concept-stream-read for=stream>read</dfn> from a <a for=/>stream</a>,
the first token in the stream, if any, must be returned and subsequently removed;
if the stream contains no tokens, <a>end-of-stream</a> must be returned instead.
<!-- this means read is blocking on e.g. networking activity;
     SimonSapin thinks this is fine, blame him if not -->

<p>When one or more <a>tokens</a> are
<dfn id=concept-stream-prepend for=stream>prepended</dfn> to a
<a for=/>stream</a>, those tokens must be inserted, in given order,
before the first token in the stream.

<p class=example id=example-tokens>Prepending the sequence of tokens <code>&amp;#128169;</code>
to a stream "<code> hello world</code>" results in a stream
"<code>&amp;#128169; hello world</code>". The next token to be read would be
<code>&amp;</code>. <!-- 💩 -->

<p>When one or more <a>tokens</a> are
<dfn id=concept-stream-push for=stream>pushed</dfn> to a <a for=/>stream</a>,
those tokens must be inserted, in given order, after the last token in the stream.
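
<p class=note>As a non-normative illustration, the three stream operations can be sketched in
JavaScript as follows. The names <code>TokenStream</code> and <code>END_OF_STREAM</code> are
assumptions of this sketch and are not defined by this specification. The sketch also does not
model waiting for further tokens to arrive.

```javascript
// Hypothetical names; this specification does not define a class for streams.
const END_OF_STREAM = Symbol("end-of-stream");

class TokenStream {
  constructor(tokens = []) {
    this.tokens = Array.from(tokens);
  }

  // Read: return and remove the first token, or end-of-stream if there is none.
  read() {
    return this.tokens.length === 0 ? END_OF_STREAM : this.tokens.shift();
  }

  // Prepend: insert the given tokens, in given order, before the first token.
  prepend(...tokens) {
    this.tokens.unshift(...tokens);
  }

  // Push: insert the given tokens, in given order, after the last token.
  push(...tokens) {
    this.tokens.push(...tokens);
  }
}

const stream = new TokenStream("world");
stream.prepend(..."hello ");
stream.push("!");
const first = stream.read(); // "h"
```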


<h2 id=encodings>Encodings</h2>

<p>An <dfn export>encoding</dfn> defines a mapping from a <a>scalar value</a> sequence to
a <a>byte</a> sequence (and vice versa). Each <a for=/>encoding</a> has a
<dfn id=name export for=encoding>name</dfn>, and one or more
<dfn id=label export for=encoding lt=label>labels</dfn>.

<p class="note no-backref">This specification defines three <a for=/>encodings</a> with the same
names as <i>encoding schemes</i> defined in the Unicode standard: <a>UTF-8</a>, <a>UTF-16LE</a>, and
<a>UTF-16BE</a>. The <a for=/>encodings</a> differ from the <i>encoding schemes</i> by byte order
mark (also known as BOM) handling not being part of the <a for=/>encodings</a> themselves and
instead being part of wrapper algorithms in this specification, whereas byte order mark handling is
part of the definition of the <i>encoding schemes</i> in the Unicode Standard. <a>UTF-8</a> used
together with the <a>UTF-8 decode</a> algorithm matches the <i>encoding scheme</i> of the same name.
This specification does not provide wrapper algorithms that would combine with <a>UTF-16LE</a> and
<a>UTF-16BE</a> to match the similarly-named <i>encoding schemes</i>. [[UNICODE]]


<h3 id=encoders-and-decoders>Encoders and decoders</h3>

<p>Each <a for=/>encoding</a> has an associated <dfn>decoder</dfn> and most of them have an
associated <dfn>encoder</dfn>. Each <a for=/>decoder</a> and <a for=/>encoder</a> has a
<dfn>handler</dfn> algorithm. A <a>handler</a> algorithm takes an input
<a for=/>stream</a> and a <a>token</a>, and returns
<dfn>finished</dfn>, one or more <a>tokens</a>, <dfn>error</dfn>
optionally with a <a>code point</a>, or <dfn>continue</dfn>.

<p class="note no-backref">The <a>replacement</a>, <a>UTF-16BE</a>, and
<a>UTF-16LE</a> <a for=/>encodings</a> have no <a for=/>encoder</a>.

<p>An <dfn>error mode</dfn> as used below is "<code>replacement</code>" (default) or
"<code>fatal</code>" for a <a for=/>decoder</a> and "<code>fatal</code>" (default) or
"<code>html</code>" for an <a for=/>encoder</a>.

<p class=note>An XML processor would set <a for=/>error mode</a> to "<code>fatal</code>".
[[XML]]

<p class=note>"<code>html</code>" exists as an <a for=/>error mode</a> due to URLs and HTML forms
requiring a non-terminating legacy <a for=/>encoder</a>. The "<code>html</code>"
<a for=/>error mode</a> causes a sequence to be emitted that cannot be distinguished from
legitimate input and can therefore lead to silent data loss. Developers are strongly
encouraged to use the <a>UTF-8</a> <a for=/>encoding</a> to prevent this from
happening.
[[URL]]
[[HTML]]

<p>To <dfn id=concept-encoding-run for=encoding>run</dfn> an <a for=/>encoding</a>'s
<a for=/>decoder</a> or <a for=/>encoder</a> <var>encoderDecoder</var> with input
<a for=/>stream</a> <var>input</var>, output
<a for=/>stream</a> <var>output</var>, and optional
<a for=/>error mode</a> <var>mode</var>, run these steps:

<ol>
 <li><p>If <var>mode</var> is not given, set it to "<code>replacement</code>" if
 <var>encoderDecoder</var> is a <a for=/>decoder</a>, and to "<code>fatal</code>" otherwise.

 <li><p>Let <var>encoderDecoderInstance</var> be a new <var>encoderDecoder</var>.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>result</var> be the result of
   <a>processing</a> the result of
   <a>reading</a> from <var>input</var> for
   <var>encoderDecoderInstance</var>, <var>input</var>, <var>output</var>, and
   <var>mode</var>.

   <li><p>If <var>result</var> is not <a>continue</a>, return <var>result</var>.

   <li><p>Otherwise, do nothing.
  </ol>
</ol>

<p>To <dfn id=concept-encoding-process for=encoding>process</dfn> a
<a>token</a> <var>token</var> for an <a for=/>encoding</a>'s
<a for=/>encoder</a> or <a for=/>decoder</a> instance <var>encoderDecoderInstance</var>,
<a for=/>stream</a> <var>input</var>, output
<a for=/>stream</a> <var>output</var>, and optional
<a for=/>error mode</a> <var>mode</var>, run these steps:

<ol>
 <li><p>If <var>mode</var> is not given, set it to "<code>replacement</code>" if
 <var>encoderDecoderInstance</var> is a <a for=/>decoder</a> instance, and to
 "<code>fatal</code>" otherwise.

 <li><p>Assert: if <var>encoderDecoderInstance</var> is an <a for=/>encoder</a> instance,
 <var>token</var> is not a <a>surrogate</a>.

 <li><p>Let <var>result</var> be the result of running <var>encoderDecoderInstance</var>'s
 <a>handler</a> on <var>input</var> and <var>token</var>.

 <li><p>If <var>result</var> is <a>continue</a> or <a>finished</a>, return
 <var>result</var>.

 <li>
  <p>Otherwise, if <var>result</var> is one or more <a>tokens</a>:

  <ol>
   <li><p>Assert: if <var>encoderDecoderInstance</var> is a <a for=/>decoder</a> instance,
   <var>result</var> does not contain any <a>surrogates</a>.

   <li><p><a>Push</a> <var>result</var> to <var>output</var>.
  </ol>

 <li>
  <p>Otherwise, if <var>result</var> is <a>error</a>, switch on <var>mode</var> and
  run the associated steps:

  <dl class=switch>
   <dt>"<code>replacement</code>"
   <dd><a>Push</a> U+FFFD to <var>output</var>.
   <dt>"<code>html</code>"
   <dd><a>Prepend</a> U+0026, U+0023, followed by the
   shortest sequence of <a>ASCII digits</a> representing <var>result</var>'s
   <a>code point</a> in base ten, followed by U+003B to <var>input</var>.
   <!-- &# ... ; -->
   <dt>"<code>fatal</code>"
   <dd>Return <a>error</a>.
  </dl>

 <li>Return <a>continue</a>.
</ol>
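
<p class=note>The run and process steps above can be sketched in JavaScript as follows. This is a
non-normative illustration under several assumptions: streams are plain arrays, the error mode is
always passed explicitly, the surrogate assertions are omitted, and the two toy handlers implement
a hypothetical 7-bit codec that is not one of the encodings defined by this specification. All
names are assumptions of this sketch.

```javascript
const END_OF_STREAM = Symbol("end-of-stream");
const FINISHED = Symbol("finished");
const CONTINUE = Symbol("continue");

// Streams are plain arrays here: read is shift, prepend is unshift, push is push.
const read = (stream) => (stream.length === 0 ? END_OF_STREAM : stream.shift());

// Toy handlers for a hypothetical 7-bit codec (NOT an encoding from this specification).
// A handler returns FINISHED, an array of tokens, or an error object.
const toyDecoderHandler = (input, byte) => {
  if (byte === END_OF_STREAM) return FINISHED;
  return byte > 0x7F ? { error: true } : [byte];
};
const toyEncoderHandler = (input, codePoint) => {
  if (codePoint === END_OF_STREAM) return FINISHED;
  return codePoint > 0x7F ? { error: true, codePoint } : [codePoint];
};

function process(token, handler, input, output, mode) {
  const result = handler(input, token);
  if (result === CONTINUE || result === FINISHED) return result;
  if (Array.isArray(result)) {
    output.push(...result);
  } else if (mode === "replacement") {
    output.push(0xFFFD);
  } else if (mode === "html") {
    // Prepend U+0026 U+0023, the decimal digits, and U+003B so they are encoded next.
    const digits = Array.from(String(result.codePoint), (d) => d.charCodeAt(0));
    input.unshift(0x26, 0x23, ...digits, 0x3B);
  } else {
    return result; // "fatal": return the error
  }
  return CONTINUE;
}

function run(handler, input, output, mode) {
  while (true) {
    const result = process(read(input), handler, input, output, mode);
    if (result !== CONTINUE) return result;
  }
}

// U+1F4A9 cannot be represented, so "html" mode emits a numeric character reference.
const fromEmoji = [];
run(toyEncoderHandler, [0x1F4A9], fromEmoji, "html");

// The same reference typed literally by the user (U+0026 U+0023 "128169" U+003B).
const fromLiteral = [];
run(toyEncoderHandler, [0x26, 0x23, 0x31, 0x32, 0x38, 0x31, 0x36, 0x39, 0x3B], fromLiteral, "html");
```

<p class=note>In this sketch <code>fromEmoji</code> and <code>fromLiteral</code> end up holding
the same bytes, which is the silent data loss that the "<code>html</code>"
<a for=/>error mode</a> can cause.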


<h3 id=names-and-labels>Names and labels</h3>

<p>The table below lists all <a for=/>encodings</a>
and their <a>labels</a> user agents must support.
User agents must not support any other <a for=/>encodings</a>
or <a>labels</a>.

<p class=note>For each encoding, <a lt="ASCII lowercase">ASCII-lowercasing</a> its
<a for=encoding>name</a> yields one of its <a for=encoding>labels</a>.

<p>Authors must use the <a>UTF-8</a> <a for=/>encoding</a> and must use the
<a>ASCII case-insensitive</a> "<code>utf-8</code>" <a>label</a> to
identify it.

<p>New protocols and formats, as well as existing formats deployed in new contexts, must
use the <a>UTF-8</a> <a for=/>encoding</a> exclusively. If these protocols and
formats need to expose the <a for=/>encoding</a>'s <a>name</a> or
<a>label</a>, they must expose it as "<code>utf-8</code>".

<p>To
<dfn export lt="get an encoding|getting an encoding" id=concept-encoding-get>get an encoding</dfn>
from a string <var>label</var>, run these steps:

<ol>
 <li><p>Remove any leading and trailing <a>ASCII whitespace</a> from
 <var>label</var>.

 <li><p>If <var>label</var> is an <a>ASCII case-insensitive</a>
 match for any of the <a>labels</a> listed in the table
 below, return the corresponding <a for=/>encoding</a>, and failure otherwise.
</ol>

<p class="note no-backref">This is a more basic and restrictive algorithm for mapping <a>labels</a>
to <a for=/>encodings</a> than
<a href=https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching>section 1.4 of Unicode Technical Standard #22</a>
prescribes, as that is necessary to be compatible with deployed content.
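
<p class=note>A non-normative JavaScript sketch of the steps above, using only a small excerpt of
the table that follows. The function name <code>getAnEncoding</code> and the use of
<code>null</code> for failure are assumptions of this sketch. Because every label in the table is
already lowercase, ASCII-lowercasing the input and comparing exactly is equivalent to an
<a>ASCII case-insensitive</a> match.

```javascript
// Excerpt of the label table, for illustration only.
const LABELS = new Map([
  ["unicode-1-1-utf-8", "UTF-8"],
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
  ["ascii", "windows-1252"],
  ["latin1", "windows-1252"],
  ["koi8-r", "KOI8-R"],
  ["utf-16", "UTF-16LE"],
  ["iso-2022-kr", "replacement"],
]);

function getAnEncoding(label) {
  // Strip ASCII whitespace only (U+0009, U+000A, U+000C, U+000D, U+0020).
  // String.prototype.trim() would also strip non-ASCII whitespace.
  const stripped = label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  // Lowercase ASCII letters only. Calling toLowerCase() on the whole string
  // would also map, for example, U+212A KELVIN SIGN to "k".
  const lowered = stripped.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return LABELS.has(lowered) ? LABELS.get(lowered) : null; // null stands in for failure
}

console.log(getAnEncoding("  UTF-8\n")); // UTF-8
console.log(getAnEncoding("Latin1")); // windows-1252
console.log(getAnEncoding("utf-8\u00A0")); // null
```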

<table>
 <thead>
  <tr>
   <th><a>Name</a>
   <th><a>Labels</a>
 <tbody>
  <tr><th colspan=2><a href=#the-encoding>The Encoding</a>
  <tr>
   <td rowspan=3><a>UTF-8</a>
   <td>"<code>unicode-1-1-utf-8</code>"
  <tr><td>"<code>utf-8</code>"
  <tr><td>"<code>utf8</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-single-byte-encodings>Legacy single-byte encodings</a>
  <tr>
   <td rowspan=4><a>IBM866</a>
   <td>"<code>866</code>"
  <tr><td>"<code>cp866</code>"
  <tr><td>"<code>csibm866</code>"
  <tr><td>"<code>ibm866</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-2</a>
   <td>"<code>csisolatin2</code>"
  <tr><td>"<code>iso-8859-2</code>"
  <tr><td>"<code>iso-ir-101</code>"
  <tr><td>"<code>iso8859-2</code>"
  <tr><td>"<code>iso88592</code>"
  <tr><td>"<code>iso_8859-2</code>"
  <tr><td>"<code>iso_8859-2:1987</code>"
  <tr><td>"<code>l2</code>"
  <tr><td>"<code>latin2</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-3</a>
   <td>"<code>csisolatin3</code>"
  <tr><td>"<code>iso-8859-3</code>"
  <tr><td>"<code>iso-ir-109</code>"
  <tr><td>"<code>iso8859-3</code>"
  <tr><td>"<code>iso88593</code>"
  <tr><td>"<code>iso_8859-3</code>"
  <tr><td>"<code>iso_8859-3:1988</code>"
  <tr><td>"<code>l3</code>"
  <tr><td>"<code>latin3</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-4</a>
   <td>"<code>csisolatin4</code>"
  <tr><td>"<code>iso-8859-4</code>"
  <tr><td>"<code>iso-ir-110</code>"
  <tr><td>"<code>iso8859-4</code>"
  <tr><td>"<code>iso88594</code>"
  <tr><td>"<code>iso_8859-4</code>"
  <tr><td>"<code>iso_8859-4:1988</code>"
  <tr><td>"<code>l4</code>"
  <tr><td>"<code>latin4</code>"
  <tr>
   <td rowspan=8><a>ISO-8859-5</a>
   <td>"<code>csisolatincyrillic</code>"
  <tr><td>"<code>cyrillic</code>"
  <tr><td>"<code>iso-8859-5</code>"
  <tr><td>"<code>iso-ir-144</code>"
  <tr><td>"<code>iso8859-5</code>"
  <tr><td>"<code>iso88595</code>"
  <tr><td>"<code>iso_8859-5</code>"
  <tr><td>"<code>iso_8859-5:1988</code>"
  <tr>
   <td rowspan=14><a>ISO-8859-6</a>
   <td>"<code>arabic</code>"
  <tr><td>"<code>asmo-708</code>"
  <tr><td>"<code>csiso88596e</code>"
  <tr><td>"<code>csiso88596i</code>"
  <tr><td>"<code>csisolatinarabic</code>"
  <tr><td>"<code>ecma-114</code>"
  <tr><td>"<code>iso-8859-6</code>"
  <tr><td>"<code>iso-8859-6-e</code>"
  <tr><td>"<code>iso-8859-6-i</code>"
  <tr><td>"<code>iso-ir-127</code>"
  <tr><td>"<code>iso8859-6</code>"
  <tr><td>"<code>iso88596</code>"
  <tr><td>"<code>iso_8859-6</code>"
  <tr><td>"<code>iso_8859-6:1987</code>"
  <tr>
   <td rowspan=12><a>ISO-8859-7</a>
   <td>"<code>csisolatingreek</code>"
  <tr><td>"<code>ecma-118</code>"
  <tr><td>"<code>elot_928</code>"
  <tr><td>"<code>greek</code>"
  <tr><td>"<code>greek8</code>"
  <tr><td>"<code>iso-8859-7</code>"
  <tr><td>"<code>iso-ir-126</code>"
  <tr><td>"<code>iso8859-7</code>"
  <tr><td>"<code>iso88597</code>"
  <tr><td>"<code>iso_8859-7</code>"
  <tr><td>"<code>iso_8859-7:1987</code>"
  <tr><td>"<code>sun_eu_greek</code>"
  <tr>
   <td rowspan=11><a>ISO-8859-8</a>
   <td>"<code>csiso88598e</code>"
  <tr><td>"<code>csisolatinhebrew</code>"
  <tr><td>"<code>hebrew</code>"
  <tr><td>"<code>iso-8859-8</code>"
  <tr><td>"<code>iso-8859-8-e</code>"
  <tr><td>"<code>iso-ir-138</code>"
  <tr><td>"<code>iso8859-8</code>"
  <tr><td>"<code>iso88598</code>"
  <tr><td>"<code>iso_8859-8</code>"
  <tr><td>"<code>iso_8859-8:1988</code>"
  <tr><td>"<code>visual</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-8-I</a>
   <td>"<code>csiso88598i</code>"
  <tr><td>"<code>iso-8859-8-i</code>"
  <tr><td>"<code>logical</code>"
  <tr>
   <td rowspan=7><a>ISO-8859-10</a>
   <td>"<code>csisolatin6</code>"
  <tr><td>"<code>iso-8859-10</code>"
  <tr><td>"<code>iso-ir-157</code>"
  <tr><td>"<code>iso8859-10</code>"
  <tr><td>"<code>iso885910</code>"
  <tr><td>"<code>l6</code>"
  <tr><td>"<code>latin6</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-13</a>
   <td>"<code>iso-8859-13</code>"
  <tr><td>"<code>iso8859-13</code>"
  <tr><td>"<code>iso885913</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-14</a>
   <td>"<code>iso-8859-14</code>"
  <tr><td>"<code>iso8859-14</code>"
  <tr><td>"<code>iso885914</code>"
  <tr>
   <td rowspan=6><a>ISO-8859-15</a>
   <td>"<code>csisolatin9</code>"
  <tr><td>"<code>iso-8859-15</code>"
  <tr><td>"<code>iso8859-15</code>"
  <tr><td>"<code>iso885915</code>"
  <tr><td>"<code>iso_8859-15</code>"
  <tr><td>"<code>l9</code>"
  <tr>
   <td><a>ISO-8859-16</a>
   <td>"<code>iso-8859-16</code>"
  <tr>
   <td rowspan=5><a>KOI8-R</a>
   <td>"<code>cskoi8r</code>"
  <tr><td>"<code>koi</code>"
  <tr><td>"<code>koi8</code>"
  <tr><td>"<code>koi8-r</code>"
  <tr><td>"<code>koi8_r</code>"
  <tr>
   <td rowspan=2><a>KOI8-U</a>
   <td>"<code>koi8-ru</code>"
  <tr><td>"<code>koi8-u</code>"
  <tr>
   <td rowspan=4><a>macintosh</a>
   <td>"<code>csmacintosh</code>"
  <tr><td>"<code>mac</code>"
  <tr><td>"<code>macintosh</code>"
  <tr><td>"<code>x-mac-roman</code>"
  <tr>
   <td rowspan=6><a>windows-874</a>
   <td>"<code>dos-874</code>"
  <tr><td>"<code>iso-8859-11</code>"
  <tr><td>"<code>iso8859-11</code>"
  <tr><td>"<code>iso885911</code>"
  <tr><td>"<code>tis-620</code>"
  <tr><td>"<code>windows-874</code>"
  <tr>
   <td rowspan=3><a>windows-1250</a>
   <td>"<code>cp1250</code>"
  <tr><td>"<code>windows-1250</code>"
  <tr><td>"<code>x-cp1250</code>"
  <tr>
   <td rowspan=3><a>windows-1251</a>
   <td>"<code>cp1251</code>"
  <tr><td>"<code>windows-1251</code>"
  <tr><td>"<code>x-cp1251</code>"
  <tr>
   <td rowspan=17><a>windows-1252</a>
   <td>"<code>ansi_x3.4-1968</code>"
  <tr><td>"<code>ascii</code>"
  <tr><td>"<code>cp1252</code>"
  <tr><td>"<code>cp819</code>"
  <tr><td>"<code>csisolatin1</code>"
  <tr><td>"<code>ibm819</code>"
  <tr><td>"<code>iso-8859-1</code>"
  <tr><td>"<code>iso-ir-100</code>"
  <tr><td>"<code>iso8859-1</code>"
  <tr><td>"<code>iso88591</code>"
  <tr><td>"<code>iso_8859-1</code>"
  <tr><td>"<code>iso_8859-1:1987</code>"
  <tr><td>"<code>l1</code>"
  <tr><td>"<code>latin1</code>"
  <tr><td>"<code>us-ascii</code>"
  <tr><td>"<code>windows-1252</code>"
  <tr><td>"<code>x-cp1252</code>"
  <tr>
   <td rowspan=3><a>windows-1253</a>
   <td>"<code>cp1253</code>"
  <tr><td>"<code>windows-1253</code>"
  <tr><td>"<code>x-cp1253</code>"
  <tr>
   <td rowspan=12><a>windows-1254</a>
   <td>"<code>cp1254</code>"
  <tr><td>"<code>csisolatin5</code>"
  <tr><td>"<code>iso-8859-9</code>"
  <tr><td>"<code>iso-ir-148</code>"
  <tr><td>"<code>iso8859-9</code>"
  <tr><td>"<code>iso88599</code>"
  <tr><td>"<code>iso_8859-9</code>"
  <tr><td>"<code>iso_8859-9:1989</code>"
  <tr><td>"<code>l5</code>"
  <tr><td>"<code>latin5</code>"
  <tr><td>"<code>windows-1254</code>"
  <tr><td>"<code>x-cp1254</code>"
  <tr>
   <td rowspan=3><a>windows-1255</a>
   <td>"<code>cp1255</code>"
  <tr><td>"<code>windows-1255</code>"
  <tr><td>"<code>x-cp1255</code>"
  <tr>
   <td rowspan=3><a>windows-1256</a>
   <td>"<code>cp1256</code>"
  <tr><td>"<code>windows-1256</code>"
  <tr><td>"<code>x-cp1256</code>"
  <tr>
   <td rowspan=3><a>windows-1257</a>
   <td>"<code>cp1257</code>"
  <tr><td>"<code>windows-1257</code>"
  <tr><td>"<code>x-cp1257</code>"
  <tr>
   <td rowspan=3><a>windows-1258</a>
   <td>"<code>cp1258</code>"
  <tr><td>"<code>windows-1258</code>"
  <tr><td>"<code>x-cp1258</code>"
  <tr>
   <td rowspan=2><a>x-mac-cyrillic</a>
   <td>"<code>x-mac-cyrillic</code>"
  <tr><td>"<code>x-mac-ukrainian</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-chinese-(simplified)-encodings>Legacy multi-byte Chinese (simplified) encodings</a>
  <tr>
   <td rowspan=9><a>GBK</a>
   <td>"<code>chinese</code>"
  <tr><td>"<code>csgb2312</code>"
  <tr><td>"<code>csiso58gb231280</code>"
  <tr><td>"<code>gb2312</code>"
  <tr><td>"<code>gb_2312</code>"
  <tr><td>"<code>gb_2312-80</code>"
  <tr><td>"<code>gbk</code>"
  <tr><td>"<code>iso-ir-58</code>"
  <tr><td>"<code>x-gbk</code>"
  <tr>
   <td><a>gb18030</a>
   <td>"<code>gb18030</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-chinese-(traditional)-encodings>Legacy multi-byte Chinese (traditional) encodings</a>
  <tr>
   <td rowspan=5><a>Big5</a>
   <td>"<code>big5</code>"
  <tr><td>"<code>big5-hkscs</code>"
  <tr><td>"<code>cn-big5</code>"
  <tr><td>"<code>csbig5</code>"
  <tr><td>"<code>x-x-big5</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-japanese-encodings>Legacy multi-byte Japanese encodings</a>
  <tr>
   <td rowspan=3><a>EUC-JP</a>
   <td>"<code>cseucpkdfmtjapanese</code>"
  <tr><td>"<code>euc-jp</code>"
  <tr><td>"<code>x-euc-jp</code>"
  <tr>
   <td rowspan=2><a>ISO-2022-JP</a>
   <td>"<code>csiso2022jp</code>"
  <tr><td>"<code>iso-2022-jp</code>"
  <tr>
   <td rowspan=8><a>Shift_JIS</a>
   <td>"<code>csshiftjis</code>"
  <tr><td>"<code>ms932</code>"
  <tr><td>"<code>ms_kanji</code>"
  <tr><td>"<code>shift-jis</code>"
  <tr><td>"<code>shift_jis</code>"
  <tr><td>"<code>sjis</code>"
  <tr><td>"<code>windows-31j</code>"
  <tr><td>"<code>x-sjis</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-korean-encodings>Legacy multi-byte Korean encodings</a>
  <tr>
   <td rowspan=10><a>EUC-KR</a>
   <td>"<code>cseuckr</code>"
  <tr><td>"<code>csksc56011987</code>"
  <tr><td>"<code>euc-kr</code>"
  <tr><td>"<code>iso-ir-149</code>"
  <tr><td>"<code>korean</code>"
  <tr><td>"<code>ks_c_5601-1987</code>"
  <tr><td>"<code>ks_c_5601-1989</code>"
  <tr><td>"<code>ksc5601</code>"
  <tr><td>"<code>ksc_5601</code>"
  <tr><td>"<code>windows-949</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-miscellaneous-encodings>Legacy miscellaneous encodings</a>
  <tr>
   <td rowspan=6><a>replacement</a>
   <td>"<code>csiso2022kr</code>"
  <tr><td>"<code>hz-gb-2312</code>"
  <tr><td>"<code>iso-2022-cn</code>"
  <tr><td>"<code>iso-2022-cn-ext</code>"
  <tr><td>"<code>iso-2022-kr</code>"
  <tr><td>"<code>replacement</code>"
  <tr>
   <td><a>UTF-16BE</a>
   <td>"<code>utf-16be</code>"
  <tr>
   <td rowspan=2><a>UTF-16LE</a>
   <td>"<code>utf-16</code>"
  <tr><td>"<code>utf-16le</code>"
  <tr>
   <td><a>x-user-defined</a>
   <td>"<code>x-user-defined</code>"
</table>

<p class=note>All <a for=/>encodings</a> and their
<a>labels</a> are also available as a non-normative
<a href=encodings.json>encodings.json</a> resource.


<h3 id=output-encodings>Output encodings</h3>

<p>To <dfn export>get an output encoding</dfn> from an <a for=/>encoding</a>
<var>encoding</var>, run these steps:

<ol>
 <li><p>If <var>encoding</var> is <a>replacement</a>, <a>UTF-16BE</a>, or
 <a>UTF-16LE</a>, return <a>UTF-8</a>.

 <li><p>Return <var>encoding</var>.
</ol>

<p class=note>The <a>get an output encoding</a> algorithm is useful for URL parsing and HTML
form submission, which both need exactly this.
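
<p class=note>A non-normative JavaScript sketch of these steps, assuming that encodings are
represented by their names as plain strings (an assumption of this sketch only):

```javascript
// Encodings are represented by their names in this sketch.
function getAnOutputEncoding(encoding) {
  const withoutEncoder = ["replacement", "UTF-16BE", "UTF-16LE"];
  return withoutEncoder.includes(encoding) ? "UTF-8" : encoding;
}

console.log(getAnOutputEncoding("UTF-16LE")); // UTF-8
console.log(getAnOutputEncoding("windows-1252")); // windows-1252
```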


<h2 id=indexes>Indexes</h2>

<p>Most legacy <a for=/>encodings</a> make use of an <dfn id=index>index</dfn>. An
<a>index</a> is an ordered list of entries, each entry consisting of a pointer and a
corresponding code point. Within an <a>index</a> pointers are unique and code points can be
duplicated.

<p class="note no-backref">An efficient implementation likely has two
<a lt=index>indexes</a> per <a for=/>encoding</a>: one optimized for its
<a for=/>decoder</a> and one for its <a for=/>encoder</a>.

<p>To find the pointers and their corresponding code points in an <a>index</a>,
let <var>lines</var> be the result of splitting the resource's contents on U+000A.
Then remove each item in <var>lines</var> that is the empty string or starts with U+0023.
Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009.
The first subitem is the pointer (as a decimal number) and the second is the corresponding code point (as a hexadecimal number).
Other subitems are not relevant.
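
<p class=note>The parsing rules above can be sketched in JavaScript as follows. This is a
non-normative illustration; the function name <code>parseIndex</code>, the entry shape, and the
sample text are assumptions of this sketch, and the sample is not a real index resource.

```javascript
function parseIndex(text) {
  const index = [];
  for (const line of text.split("\n")) {
    // Skip empty lines and lines starting with U+0023 (#).
    if (line === "" || line.startsWith("#")) continue;
    // Split on U+0009; subitems after the second are not relevant.
    const [pointer, codePoint] = line.split("\t");
    index.push({
      pointer: parseInt(pointer, 10), // decimal
      codePoint: parseInt(codePoint, 16), // hexadecimal, "0x" prefix accepted
    });
  }
  return index;
}

const sample = [
  "# Illustrative sample, not a real index resource",
  "",
  "  0\t0x0080\tignored third subitem",
  " 36\t0x00A5",
].join("\n");

const parsed = parseIndex(sample);
// parsed holds two entries: pointer 0 with U+0080 and pointer 36 with U+00A5.
```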

<p class="note no-backref">To signify changes, an <a>index</a> includes an
<i>Identifier</i> and a <i>Date</i>. If an <i>Identifier</i> has
changed, so has the <a>index</a>.

<p>The <dfn>index code point</dfn> for <var>pointer</var> in
<var>index</var> is the code point corresponding to
<var>pointer</var> in <var>index</var>, or null if
<var>pointer</var> is not in <var>index</var>.

<p>The <dfn>index pointer</dfn> for <var>code point</var> in
<var>index</var> is the <em>first</em> pointer corresponding to
<var>code point</var> in <var>index</var>, or null if
<var>code point</var> is not in <var>index</var>.
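
<p class=note>A non-normative JavaScript sketch of the two lookups. The index is modeled as an
ordered array of entries, and the data is made up for illustration; the function names are
assumptions of this sketch.

```javascript
// An index as an ordered list of entries; the data is made up for illustration.
const toyIndex = [
  { pointer: 0, codePoint: 0x2550 },
  { pointer: 1, codePoint: 0x3000 },
  { pointer: 5, codePoint: 0x2550 }, // duplicate code point
];

function indexCodePoint(pointer, index) {
  const entry = index.find((e) => e.pointer === pointer);
  return entry === undefined ? null : entry.codePoint;
}

function indexPointer(codePoint, index) {
  // find() returns the first match in list order, that is, the first pointer.
  const entry = index.find((e) => e.codePoint === codePoint);
  return entry === undefined ? null : entry.pointer;
}

console.log(indexCodePoint(5, toyIndex) === 0x2550); // true
console.log(indexPointer(0x2550, toyIndex)); // 0
console.log(indexCodePoint(2, toyIndex)); // null
```

<p class=note>Because code points can be duplicated, looking up a pointer and then looking up the
resulting code point does not always return the original pointer, as the example with pointer 5
shows.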

<div class=note id=visualization>
 <p>There is a non-normative visualization for each <a>index</a> other than
 <a>index gb18030 ranges</a> and <a>index ISO-2022-JP katakana</a>. <a>index jis0208</a> also has an
 alternative <a>Shift_JIS</a> visualization. Additionally, there is a visualization of the Basic
 Multilingual Plane coverage of each index other than <a>index gb18030 ranges</a> and
 <a>index ISO-2022-JP katakana</a>.

 <p>The legend for the visualizations is:

 <ul class=visualizationlegend>
  <li class=unmapped>Unmapped
  <li class=mid>Two bytes in UTF-8
  <li class="mid contiguous">Two bytes in UTF-8, code point follows immediately the code point of
  previous pointer
  <li class=upper>Three bytes in UTF-8 (non-PUA)
  <li class="upper contiguous">Three bytes in UTF-8 (non-PUA), code point follows immediately the
  code point of previous pointer
  <li class=pua>Private Use
  <li class="pua contiguous">Private Use, code point follows immediately the code point of previous
  pointer
  <li class=astral>Four bytes in UTF-8
  <li class="astral contiguous">Four bytes in UTF-8, code point follows immediately the code point
  of previous pointer
  <li class=duplicate>Duplicate code point already mapped at an earlier index
  <li class=compatibility>CJK Compatibility Ideograph
  <li class=ext>CJK Unified Ideographs Extension A
 </ul>
</div>

<p>These are the <a lt=index>indexes</a> defined by this
specification, excluding <a>index single-byte</a>, which have their own table:

<table>
 <tbody><tr><th colspan=4><a>Index</a><th>Notes
 <tr>
  <td><dfn export>index Big5</dfn>
  <td><a href=index-big5.txt>index-big5.txt</a>
  <td><a href=big5.html>index Big5 visualization</a>
  <td><a href=big5-bmp.html>index Big5 BMP coverage</a>
  <td>This matches the Big5 standard in combination with the
  Hong Kong Supplementary Character Set and other common extensions.
 <tr>
  <td><dfn export>index EUC-KR</dfn>
  <td><a href=index-euc-kr.txt>index-euc-kr.txt</a>
  <td><a href=euc-kr.html>index EUC-KR visualization</a>
  <td><a href=euc-kr-bmp.html>index EUC-KR BMP coverage</a>
  <td>This matches the KS X 1001 standard and the Unified Hangul Code, more commonly known together
  as Windows Codepage 949. It covers the Hangul Syllables block of Unicode in its entirety. The
  Hangul block whose top left corner in the visualization is at pointer 9026 is in the Unicode
  order. Taken separately, the rest of the Hangul syllables in this index are in the Unicode order,
  too.
 <tr>
  <td><dfn export>index gb18030</dfn>
  <td><a href=index-gb18030.txt>index-gb18030.txt</a>
  <td><a href=gb18030.html>index gb18030 visualization</a>
  <td><a href=gb18030-bmp.html>index gb18030 BMP coverage</a>
  <td>This matches the GB18030-2005 standard for code points encoded as two bytes, except for
  0xA3 0xA0 which maps to U+3000 to be compatible with deployed content. This index covers the
  CJK Unified Ideographs block of Unicode in its entirety. Entries from that block that are above or
  to the left of (the first) U+3000 in the visualization are in the Unicode order.
  <!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
       https://bugs.webkit.org/show_bug.cgi?id=17014
       https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
       https://github.com/whatwg/encoding/issues/17 -->
 <tr>
  <td><dfn export>index gb18030 ranges</dfn>
  <td colspan=3><a href=index-gb18030-ranges.txt>index-gb18030-ranges.txt</a>
  <td>This <a>index</a> works differently from all others. Listing all code points would result
  in over a million items whereas they can be represented neatly in 207 ranges combined with trivial
  limit checks. It therefore only superficially matches the GB18030-2005 standard for code points
  encoded as four bytes. See also <a>index gb18030 ranges code point</a> and
  <a>index gb18030 ranges pointer</a> below.
 <tr>
  <td><dfn export>index jis0208</dfn>
  <td><a href=index-jis0208.txt>index-jis0208.txt</a>
  <td><a href=jis0208.html>index jis0208 visualization</a>, <a href=shift_jis.html>Shift_JIS visualization</a>
  <td><a href=jis0208-bmp.html>index jis0208 BMP coverage</a>
  <td>This is the JIS X 0208 standard including formerly proprietary
  extensions from IBM and NEC.
  <!-- NEC = Nippon Electric Company -->
 <tr>
  <td><dfn export>index jis0212</dfn>
  <td><a href=index-jis0212.txt>index-jis0212.txt</a>
  <td><a href=jis0212.html>index jis0212 visualization</a>
  <td><a href=jis0212-bmp.html>index jis0212 BMP coverage</a>
  <td>This is the JIS X 0212 standard. It is only used by the <a>EUC-JP decoder</a>
  due to lack of widespread support elsewhere.
  <!--
   No JIS X 0212 EUC-JP encoder support:
     https://bugzilla.mozilla.org/show_bug.cgi?id=600715
     https://code.google.com/p/chromium/issues/detail?id=78847

   No JIS X 0212 ISO-2022-JP support:
     https://www.w3.org/Bugs/Public/show_bug.cgi?id=26885
  -->
 <tr>
  <td><dfn export>index ISO-2022-JP katakana</dfn>
  <td colspan=3><a href=index-iso-2022-jp-katakana.txt>index-iso-2022-jp-katakana.txt</a>
  <td>This maps halfwidth to fullwidth katakana as per Unicode Normalization Form KC, except that
  U+FF9E and U+FF9F map to U+309B and U+309C rather than U+3099 and U+309A. It is only used by the
  <a>ISO-2022-JP encoder</a>. [[UNICODE]]
</table>

<p>The <dfn>index gb18030 ranges code point</dfn> for <var>pointer</var> is
the return value of these steps:

<ol>
 <li><p>If <var>pointer</var> is greater than 39419 and less than
 189000, or <var>pointer</var> is greater than 1237575, return null.

 <li><p>If <var>pointer</var> is 7457, return code point U+E7C7.
 <!-- 7457 is 0x81 0x35 0xF4 0x37 -->

 <li><p>Let <var>offset</var> be the last pointer in <a>index gb18030 ranges</a> that is less than
 or equal to <var>pointer</var> and let <var>code point offset</var> be its corresponding code
 point.

 <li><p>Return a code point whose value is
 <var>code point offset</var> + <var>pointer</var> &minus; <var>offset</var>.
</ol>

<p>The <dfn>index gb18030 ranges pointer</dfn> for <var>code point</var> is
the return value of these steps:

<ol>
 <li><p>If <var>code point</var> is U+E7C7, return pointer 7457.

 <li><p>Let <var>offset</var> be the last code point in <a>index gb18030 ranges</a> that is less
 than or equal to <var>code point</var> and let <var>pointer offset</var> be its corresponding
 pointer.

 <li><p>Return a pointer whose value is
 <var>pointer offset</var> + <var>code point</var> &minus; <var>offset</var>.
</ol>
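
<div class=example id=example-gb18030-ranges-sketch>
 <p>The two range lookups above can be sketched as follows. This is a non-normative illustration:
 the function names are hypothetical, and <code>rangesExcerpt</code> is a small excerpt assumed to
 match a few entries of <a>index gb18030 ranges</a>, so results are only meaningful for pointers
 and code points that the excerpt covers.

```javascript
// Excerpt for illustration only; the real index has many more entries.
// Each entry is [pointer, code point], sorted in ascending order.
const rangesExcerpt = [
  [0, 0x0080],
  [36, 0x00A5],
  [38, 0x00A9],
  [189000, 0x10000],
];

function gb18030RangesCodePoint(pointer, ranges) {
  // Pointers in the unassigned gap, or past U+10FFFF, have no code point.
  if ((pointer > 39419 && pointer < 189000) || pointer > 1237575) {
    return null;
  }
  if (pointer === 7457) {
    return 0xE7C7;
  }
  // Find the last range entry whose pointer is <= the given pointer.
  let offset = 0;
  let codePointOffset = 0;
  for (const [rangePointer, rangeCodePoint] of ranges) {
    if (rangePointer > pointer) break;
    offset = rangePointer;
    codePointOffset = rangeCodePoint;
  }
  return codePointOffset + pointer - offset;
}

function gb18030RangesPointer(codePoint, ranges) {
  if (codePoint === 0xE7C7) {
    return 7457;
  }
  // Find the last range entry whose code point is <= the given code point.
  // Callers are assumed to pass only code points that the ranges cover.
  let offset = 0;
  let pointerOffset = 0;
  for (const [rangePointer, rangeCodePoint] of ranges) {
    if (rangeCodePoint > codePoint) break;
    offset = rangeCodePoint;
    pointerOffset = rangePointer;
  }
  return pointerOffset + codePoint - offset;
}
```

 <p>With this excerpt, pointer 189000 maps to U+10000 and pointer 1237575 maps to U+10FFFF, and
 the reverse lookup returns the same pointers.
</div>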

<p>The <dfn>index Shift_JIS pointer</dfn> for <var>code point</var> is the return value of these
steps:

<ol>
 <li>
  <p>Let <var>index</var> be <a>index jis0208</a> excluding all entries whose pointer is in
  the range 8272 to 8835, inclusive.
  <!-- selected NEC duplicates from IBM extensions later in the index; need to use IBM
       extensions when going back to bytes -->

  <p class=note>The <a>index jis0208</a> contains duplicate code points so the exclusion of
  these entries causes later code points to be used.

 <li><p>Return the <a>index pointer</a> for <var>code point</var> in
 <var>index</var>.
</ol>
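
<div class=example id=example-shift-jis-pointer-sketch>
 <p>A minimal, non-normative sketch of the steps above. The function names are hypothetical and
 <code>toyIndex</code> contains made-up pointers chosen only to show how excluding the range 8272
 to 8835 makes a later duplicate win; it is not data from <a>index jis0208</a>.

```javascript
// Generic "index pointer" lookup: the first pointer for a code point, or null.
// An index is modeled as an array of [pointer, code point] sorted by pointer.
function indexPointer(codePoint, index) {
  for (const [pointer, indexCodePoint] of index) {
    if (indexCodePoint === codePoint) return pointer;
  }
  return null;
}

function indexShiftJISPointer(codePoint, jis0208) {
  // Drop entries whose pointer is in the range 8272 to 8835, inclusive.
  const index = jis0208.filter(([pointer]) => pointer < 8272 || pointer > 8835);
  return indexPointer(codePoint, index);
}

// Hypothetical data: the same code point appears inside and after the
// excluded pointer range.
const toyIndex = [
  [10, 0x3042],
  [8300, 0x7E8A],
  [10716, 0x7E8A],
];

const plainLookup = indexPointer(0x7E8A, toyIndex);
const shiftJISLookup = indexShiftJISPointer(0x7E8A, toyIndex);
```

 <p>Here the plain lookup finds the entry inside the excluded range, while the Shift_JIS lookup
 skips it and returns the later pointer.
</div>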

<p>The <dfn>index Big5 pointer</dfn> for <var>code point</var> is the return value of
these steps:

<ol>
 <li>
  <p>Let <var>index</var> be <a>index Big5</a> excluding all entries whose pointer is less
  than (0xA1 - 0x81) × 157.

  <p class=note>Avoid returning Hong Kong Supplementary Character Set extensions literally.

 <li>
  <p>If <var>code point</var> is U+2550, U+255E, U+2561, U+256A, U+5341, or U+5345,
  return the <em>last</em> pointer corresponding to <var>code point</var> in
  <var>index</var>.
  <!-- https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878 -->

  <p class=note>There are other duplicate code points, but for those the <em>first</em> pointer is
  to be used.

 <li><p>Return the <a>index pointer</a> for <var>code point</var> in
 <var>index</var>.
</ol>
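
<div class=example id=example-big5-pointer-sketch>
 <p>The steps above could be sketched as follows. This is non-normative; the function name is
 hypothetical and <code>toyIndex</code> holds made-up entries rather than data from
 <a>index Big5</a>.

```javascript
// Code points for which the last matching pointer is used instead of the first.
const LAST_POINTER_CODE_POINTS = new Set([
  0x2550, 0x255E, 0x2561, 0x256A, 0x5341, 0x5345,
]);

// An index is modeled as an array of [pointer, code point] sorted by pointer.
function indexBig5Pointer(codePoint, big5) {
  // Exclude entries below (0xA1 - 0x81) * 157, which is 5024.
  const minimumPointer = (0xA1 - 0x81) * 157;
  const index = big5.filter(([pointer]) => pointer >= minimumPointer);

  let found = null;
  for (const [pointer, indexCodePoint] of index) {
    if (indexCodePoint !== codePoint) continue;
    if (!LAST_POINTER_CODE_POINTS.has(codePoint)) {
      return pointer; // the first match wins for ordinary code points
    }
    found = pointer; // keep scanning so that the last match wins
  }
  return found;
}

// Hypothetical data with duplicates on both sides of the rule.
const toyIndex = [
  [100, 0x43F0],
  [6000, 0x2550],
  [6500, 0x4E00],
  [7000, 0x2550],
  [7500, 0x4E00],
];
```

 <p>With this toy data, U+2550 resolves to the last pointer (7000), U+4E00 to the first (6500),
 and U+43F0 is not found because its only entry lies below the cutoff.
</div>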

<hr>

<p class="note no-backref">All <a lt=index>indexes</a> are also available as a non-normative
<a href=indexes.json>indexes.json</a> resource. (<a>Index gb18030 ranges</a> has a slightly
different format here, to be able to represent ranges.)


<h2 id=specification-hooks>Hooks for standards</h2>

<div class=note>
 <p>The algorithms defined below (<a>decode</a>, <a>UTF-8 decode</a>,
 <a>UTF-8 decode without BOM</a>, <a>UTF-8 decode without BOM or fail</a>, <a for=/>encode</a>,
 <a>UTF-8 encode</a>, and <a>BOM sniff</a>) are intended for usage by other standards.

 <p>For decoding, <a>UTF-8 decode</a> is to be used by new formats. For identifiers or byte
 sequences within a format or protocol, use <a>UTF-8 decode without BOM</a> or
 <a>UTF-8 decode without BOM or fail</a>.

 <p>For encoding, <a>UTF-8 encode</a> is to be used.

 <p>Standards are strongly discouraged from using <a>decode</a>, <a for=/>encode</a>, and
 <a>BOM sniff</a>, except as needed for compatibility.

 <p>The <a>get an encoding</a> algorithm is to be used to turn a <a>label</a> into an
 <a for=/>encoding</a>.

 <p>Standards are to ensure that the streams they pass to the <a for=/>encode</a> and
 <a>UTF-8 encode</a> algorithms are effectively scalar value streams, i.e., they contain no
 <a>surrogates</a>.
</div>

<p>To <dfn export>decode</dfn> a byte stream <var>stream</var> using
fallback encoding <var>encoding</var>, run these steps:

<ol>
 <li><p>Let <var>BOMEncoding</var> be the result of <a>BOM sniffing</a> <var>stream</var>.

 <li>
  <p>If <var>BOMEncoding</var> is non-null:

  <ol>
   <li><p>Set <var>encoding</var> to <var>BOMEncoding</var>.

   <li><p><a>Read</a> three bytes from <var>stream</var>, if <var>BOMEncoding</var> is <a>UTF-8</a>;
   otherwise <a>read</a> two bytes. (Do nothing with those bytes.)
  </ol>

  <p class=note>For compatibility with deployed content, the byte order mark is more authoritative
  than anything else. In a context where HTTP is used this is in violation of the semantics of the
  `<code>Content-Type</code>` header.

 <li><p>Let <var>output</var> be a scalar value <a for=/>stream</a>.

 <li><p><a>Run</a> <var>encoding</var>'s
 <a for=/>decoder</a> with <var>stream</var> and <var>output</var>.

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>UTF-8 decode</dfn> a byte stream <var>stream</var>, run
these steps:

<ol>
 <li><p>Let <var>buffer</var> be an empty byte sequence.

 <li><p><a>Read</a> three bytes from <var>stream</var>
 into <var>buffer</var>.

 <li><p>If <var>buffer</var> does not match 0xEF 0xBB 0xBF,
 <a>prepend</a> <var>buffer</var> to <var>stream</var>.

 <li><p>Let <var>output</var> be a scalar value <a for=/>stream</a>.

 <li><p><a>Run</a> <a>UTF-8</a>'s
 <a for=/>decoder</a> with <var>stream</var> and <var>output</var>.

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>UTF-8 decode without BOM</dfn> a byte stream <var>stream</var>, run these
steps:

<ol>
 <li><p>Let <var>output</var> be a scalar value <a for=/>stream</a>.

 <li><p><a>Run</a> <a>UTF-8</a>'s
 <a for=/>decoder</a> with <var>stream</var> and <var>output</var>.

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>UTF-8 decode without BOM or fail</dfn> a byte stream <var>stream</var>, run these
steps:
<!-- Needed by https://tools.ietf.org/html/rfc6455#section-8.1 and
     https://webassembly.github.io/spec/js-api/#dom-module-customsections-moduleobject-sectionname
     -->

<ol>
 <li><p>Let <var>output</var> be a scalar value <a for=/>stream</a>.

 <li><p>Let <var>potentialError</var> be the result of <a>running</a>
 <a>UTF-8</a>'s <a for=/>decoder</a> with <var>stream</var>, <var>output</var>, and
 "<code>fatal</code>".

 <li><p>If <var>potentialError</var> is <a>error</a>, return failure.

 <li><p>Return <var>output</var>.
</ol>
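
<div class=example id=example-utf-8-decode-hooks-sketch>
 <p>In JavaScript, the behavior of the three UTF-8 decoding hooks above can be approximated with
 {{TextDecoder}}. This is a non-normative sketch: the function names are hypothetical, the hooks
 operate on streams rather than strings, and failure is modeled here as <code>null</code>.

```javascript
// Approximates "UTF-8 decode": a single leading BOM is dropped.
function utf8Decode(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// Approximates "UTF-8 decode without BOM": a leading BOM is kept as U+FEFF.
function utf8DecodeWithoutBOM(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

// Approximates "UTF-8 decode without BOM or fail": invalid input yields
// null (standing in for failure) instead of U+FFFD replacement.
function utf8DecodeWithoutBOMOrFail(bytes) {
  try {
    return new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
  } catch (error) {
    if (error instanceof TypeError) return null;
    throw error;
  }
}

const withBOM = new Uint8Array([0xEF, 0xBB, 0xBF, 0x61]); // BOM followed by "a"
const invalid = new Uint8Array([0xFF]);                   // never valid in UTF-8
```

 <p>For <code>withBOM</code> the first function returns "<code>a</code>" while the other two keep
 the leading U+FEFF. For <code>invalid</code> the first two return U+FFFD and the third reports
 failure.
</div>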

<hr>

<p>To <dfn export>encode</dfn> a scalar value stream <var>stream</var> using encoding
<var>encoding</var>, run these steps:

<ol>
 <li><p>Assert: <var>encoding</var> is not <a>replacement</a>, <a>UTF-16BE</a>, or
 <a>UTF-16LE</a>.

 <li><p>Let <var>output</var> be a byte <a for=/>stream</a>.

 <li><p><a>Run</a> <var>encoding</var>'s
 <a for=/>encoder</a> with <var>stream</var>, <var>output</var>, and "<code>html</code>".

 <li><p>Return <var>output</var>.
</ol>

<p class="note no-backref">This is mostly a legacy hook for URLs and HTML forms. Layering
<a>UTF-8 encode</a> on top is safe as it never triggers
<a>errors</a>.
[[URL]]
[[HTML]]

<p>To <dfn export>UTF-8 encode</dfn> a scalar value stream <var>stream</var>, return the result of
<a lt=encode for=/>encoding</a> <var>stream</var> using encoding <a>UTF-8</a>.

<hr>

<p>To <dfn export>BOM sniff</dfn> a byte stream <var>stream</var>, run these steps:

<ol>
 <li><p>Wait until <var>stream</var> has three bytes available or the <a>end-of-stream</a> has been
 reached, whichever comes first.

 <li>
  <p>For each of the rows in the table below, starting with the first one and going down, if
  <var>stream</var> <a for="byte sequence">starts with</a> the bytes given in the first column,
  return the <a for=/>encoding</a> given in the cell in the second column of that row. (Do not
  consume those bytes.)

  <table>
   <tbody><tr><th>Byte order mark<th>Encoding
   <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
   <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
   <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
  </table>

 <li><p>Return null.
</ol>

<p class=note>This hook is a workaround for the fact that <a>decode</a> has no way to communicate
back to the caller that it has found a byte order mark and is therefore not using the provided
encoding. The hook is to be invoked before <a>decode</a>, and it will return an encoding
corresponding to the byte order mark found, or null otherwise.
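
<div class=example id=example-bom-sniff-decode-sketch>
 <p>A non-normative sketch of how <a>BOM sniff</a> and <a>decode</a> fit together, written against
 a <code>Uint8Array</code> rather than a stream. The function names and the returned object shape
 are assumptions made for illustration.

```javascript
// Returns a label for the encoding indicated by a leading byte order mark,
// or null if there is none. No bytes are consumed.
function bomSniff(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) return "utf-16be";
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) return "utf-16le";
  return null;
}

// Decodes bytes, letting a byte order mark override the fallback encoding.
function decodeWithFallback(bytes, fallbackLabel) {
  const bomEncoding = bomSniff(bytes);
  const label = bomEncoding ?? fallbackLabel;
  // Skip the mark itself: three bytes for UTF-8, two for UTF-16.
  const skip = bomEncoding === null ? 0 : bomEncoding === "utf-8" ? 3 : 2;
  // ignoreBOM is set because the mark was already skipped above; without it
  // TextDecoder would also drop a second U+FEFF at the start.
  const decoder = new TextDecoder(label, { ignoreBOM: true });
  return { encoding: label, text: decoder.decode(bytes.subarray(skip)) };
}

// UTF-16LE mark followed by "a", decoded with a UTF-8 fallback.
const sniffed = decodeWithFallback(new Uint8Array([0xFF, 0xFE, 0x61, 0x00]), "utf-8");
```

 <p>Returning the encoding alongside the text gives the caller the information that
 <a>decode</a> alone cannot communicate.
</div>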


<h2 id=api>API</h2>

<p>This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript
implementations should support this API. Other user agents or programming languages are encouraged
to use an API suitable to their needs, which might not be this one. [[!WEBIDL]]

<div class=example id=example-textencoder>
 <p>The following example uses the {{TextEncoder}} object to encode an array of strings into an
 {{ArrayBuffer}}. The returned buffer contains the number of strings (as a big-endian 32-bit
 unsigned integer, written with <code>DataView</code>), followed by the length in bytes of the
 first string (in the same integer format), the <a>UTF-8</a> encoded string data, the length of
 the second string, its string data, and so on.
 <pre><code class=lang-javascript>
function encodeArrayOfStrings(strings) {
  var encoder, encoded, len, bytes, view, offset;

  encoder = new TextEncoder();
  encoded = [];

  len = Uint32Array.BYTES_PER_ELEMENT;
  for (var i = 0; i &lt; strings.length; i++) {
    len += Uint32Array.BYTES_PER_ELEMENT;
    encoded[i] = encoder.encode(strings[i]);
    len += encoded[i].byteLength;
  }

  bytes = new Uint8Array(len);
  view = new DataView(bytes.buffer);
  offset = 0;

  view.setUint32(offset, strings.length);
  offset += Uint32Array.BYTES_PER_ELEMENT;
  for (var i = 0; i &lt; encoded.length; i += 1) {
    len = encoded[i].byteLength;
    view.setUint32(offset, len);
    offset += Uint32Array.BYTES_PER_ELEMENT;
    bytes.set(encoded[i], offset);
    offset += len;
  }
  return bytes.buffer;
}</code></pre>

 <p>The following example decodes an {{ArrayBuffer}} containing data encoded in the
 format produced by the previous example, or an equivalent algorithm for encodings other than
 <a>UTF-8</a>, back into an array of strings.

 <pre><code class=lang-javascript>
function decodeArrayOfStrings(buffer, encoding) {
  var decoder, view, offset, num_strings, strings, len;

  decoder = new TextDecoder(encoding);
  view = new DataView(buffer);
  offset = 0;
  strings = [];

  num_strings = view.getUint32(offset);
  offset += Uint32Array.BYTES_PER_ELEMENT;
  for (var i = 0; i &lt; num_strings; i++) {
    len = view.getUint32(offset);
    offset += Uint32Array.BYTES_PER_ELEMENT;
    strings[i] = decoder.decode(
      new DataView(view.buffer, offset, len));
    offset += len;
  }
  return strings;
}</code></pre>
</div>


<h3 id=interface-mixin-textdecodercommon>Interface mixin {{TextDecoderCommon}}</h3>

<pre class=idl>
interface mixin TextDecoderCommon {
  readonly attribute DOMString encoding;
  readonly attribute boolean fatal;
  readonly attribute boolean ignoreBOM;
};
</pre>

<p>The {{TextDecoderCommon}} interface mixin defines common attributes that are shared between
{{TextDecoder}} and {{TextDecoderStream}} objects. These objects have an associated
<dfn id=textdecoder-encoding for=TextDecoderCommon>encoding</dfn>,
<dfn id=textdecoder-ignore-bom-flag for=TextDecoderCommon>ignore BOM</dfn> (initially false),
<dfn id=textdecoder-bom-seen-flag for=TextDecoderCommon>BOM seen</dfn> (initially false), and
<dfn id=textdecoder-error-mode for=TextDecoderCommon>error mode</dfn> (initially
"<code>replacement</code>").

<p>These objects also have an associated
<dfn id=concept-td-serialize for=TextDecoderCommon>serialize stream</dfn> algorithm, that given a
<a for=/>stream</a> <var>stream</var>, runs these steps:

<ol>
 <li><p>Let <var>output</var> be the empty string.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>stream</var>.

   <li>
    <p>If <a for=TextDecoderCommon>encoding</a> is <a>UTF-8</a>, <a>UTF-16BE</a>, or
    <a>UTF-16LE</a>, and <a for=TextDecoderCommon>ignore BOM</a> and
    <a for=TextDecoderCommon>BOM seen</a> are false, then:

    <ol>
     <li><p>If <var>token</var> is U+FEFF, then set <a for=TextDecoderCommon>BOM seen</a> to true.

     <li><p>Otherwise, if <var>token</var> is not <a>end-of-stream</a>, then set
     <a for=TextDecoderCommon>BOM seen</a> to true and append <var>token</var> to <var>output</var>.

     <li><p>Otherwise, return <var>output</var>.
    </ol>

   <li><p>Otherwise, if <var>token</var> is not <a>end-of-stream</a>, then append <var>token</var>
   to <var>output</var>.

   <li><p>Otherwise, return <var>output</var>.
  </ol>
</ol>

<p class=note>This algorithm is intentionally different with respect to BOM handling from
the <a for=/>decode</a> algorithm used by the rest of the platform to give API users more
control.
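
<div class=example id=example-serialize-stream-bom-sketch>
 <p>The effect of <a for=TextDecoderCommon>ignore BOM</a> and
 <a for=TextDecoderCommon>BOM seen</a> can be observed through {{TextDecoder}}. The following
 non-normative sketch uses illustrative variable names and shows that only the first U+FEFF is
 dropped, including when the byte order mark is split across streamed chunks.

```javascript
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]); // BOM followed by "hi"

// Default: the leading U+FEFF is dropped from the output.
const stripped = new TextDecoder("utf-8").decode(bytes);

// ignoreBOM: the leading U+FEFF is passed through.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);

// Only the first U+FEFF is affected; a second one stays in the output.
const doubled = new TextDecoder("utf-8").decode(
  new Uint8Array([0xEF, 0xBB, 0xBF, 0xEF, 0xBB, 0xBF, 0x68]));

// The byte order mark may arrive in pieces when streaming.
const streaming = new TextDecoder("utf-8");
const streamed =
  streaming.decode(new Uint8Array([0xEF, 0xBB]), { stream: true }) +
  streaming.decode(new Uint8Array([0xBF, 0x68]), { stream: true }) +
  streaming.decode();
```
</div>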

<hr>

<p>The <dfn attribute id=dom-textdecoder-encoding for=TextDecoderCommon><code>encoding</code></dfn>
attribute's getter, when invoked, must return this object's <a for=TextDecoderCommon>encoding</a>'s
<a for=encoding>name</a> in <a>ASCII lowercase</a>.

<p>The <dfn attribute id=dom-textdecoder-fatal for=TextDecoderCommon><code>fatal</code></dfn>
attribute's getter, when invoked, must return true if this object's
<a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>", and false otherwise.

<p>The
<dfn attribute id=dom-textdecoder-ignorebom for=TextDecoderCommon><code>ignoreBOM</code></dfn>
attribute's getter, when invoked, must return the value of this object's
<a for=TextDecoderCommon>ignore BOM</a>.
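
<div class=example id=example-textdecodercommon-attributes-sketch>
 <p>A short, non-normative illustration of the three getters above. The helper name
 <code>describeDecoder</code> is hypothetical.

```javascript
// Collects the TextDecoderCommon attributes into a plain object.
function describeDecoder(decoder) {
  return {
    encoding: decoder.encoding,   // encoding's name, ASCII lowercased
    fatal: decoder.fatal,         // true if error mode is "fatal"
    ignoreBOM: decoder.ignoreBOM, // value of ignore BOM
  };
}

// The label "UTF-16" identifies the UTF-16LE encoding.
const description = describeDecoder(new TextDecoder("UTF-16", { fatal: true }));
const defaults = describeDecoder(new TextDecoder());
```

 <p>The <code>encoding</code> attribute reports the name of the encoding the label resolved to,
 not the label that was passed in.
</div>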


<h3 id=interface-textdecoder>Interface {{TextDecoder}}</h3>

<pre class=idl>
dictionary TextDecoderOptions {
  boolean fatal = false;
  boolean ignoreBOM = false;
};

dictionary TextDecodeOptions {
  boolean stream = false;
};

[Exposed=(Window,Worker)]
interface TextDecoder {
  constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options = {});

  USVString decode(optional [AllowShared] BufferSource input, optional TextDecodeOptions options = {});
};
TextDecoder includes TextDecoderCommon;
</pre>

<p>A {{TextDecoder}} object has an associated <dfn for=TextDecoder>decoder</dfn>,
<dfn for=TextDecoder>stream</dfn>, and
<dfn id=textdecoder-do-not-flush-flag for=TextDecoder>do not flush</dfn> (initially false).

<dl class=domintro>
 <dt><code><var>decoder</var> = new <a constructor for=TextDecoder lt=TextDecoder()>TextDecoder([<var>label</var> = "utf-8" [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns a new {{TextDecoder}} object.
  <p>If <var>label</var> is either not a <a>label</a> or is a
  <a>label</a> for <a>replacement</a>,
  <a>throws</a> a
  {{RangeError}}.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>encoding</a></code>
 <dd><p>Returns <a for=TextDecoderCommon>encoding</a>'s <a>name</a>, lowercased.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>fatal</a></code>
 <dd><p>Returns true if <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>", and
 false otherwise.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>ignoreBOM</a></code>
 <dd><p>Returns the value of <a for=TextDecoderCommon>ignore BOM</a>.

 <dt><code><var>decoder</var> . <a method for=TextDecoder lt=decode()>decode([<var>input</var> [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns the result of running <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>.
  The method can be invoked zero or more times with <var>options</var>'s <code>stream</code> set to
  true, and then once without <var>options</var>'s <code>stream</code> (or set to false), to process
  a fragmented stream. If the invocation without <var>options</var>'s <code>stream</code> (or set to
  false) has no <var>input</var>, it's clearest to omit both arguments.

  <pre class=example id=example-end-of-stream><code class=lang-javascript>
var string = "", decoder = new TextDecoder(encoding), buffer;
while(buffer = next_chunk()) {
  string += decoder.decode(buffer, {stream:true});
}
string += decoder.decode(); // end-of-stream</code></pre>

  <p>If the <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>" and
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>,
  <a>throws</a> a {{TypeError}}.
</dl>

<p>The
<dfn constructor for=TextDecoder id=dom-textdecoder><code>TextDecoder(<var>label</var>, <var>options</var>)</code></dfn>
constructor, when invoked, must run these steps:

<ol>
 <li><p>Let <var>encoding</var> be the result of <a>getting an encoding</a> from <var>label</var>.

 <li><p>If <var>encoding</var> is failure or <a>replacement</a>, then <a>throw</a> a {{RangeError}}.

 <li><p>Let <var>dec</var> be a new {{TextDecoder}} object.

 <li><p>Set <var>dec</var>'s <a for=TextDecoderCommon>encoding</a> to <var>encoding</var>.

 <li><p>If <var>options</var>'s <code>fatal</code> member is true, then set <var>dec</var>'s
 <a for=TextDecoderCommon>error mode</a> to "<code>fatal</code>".

 <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s
 <a for=TextDecoderCommon>ignore BOM</a> to true.

 <li><p>Return <var>dec</var>.
</ol>
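
<div class=example id=example-textdecoder-constructor-sketch>
 <p>Because the constructor throws for unknown labels, callers that accept labels from untrusted
 input might guard construction. The following is a non-normative sketch; the helper name
 <code>tryCreateDecoder</code> and the choice of <code>null</code> as the failure value are
 assumptions.

```javascript
// Returns a TextDecoder for the label, or null when the constructor rejects
// the label with a RangeError (not a label, or a label for replacement).
function tryCreateDecoder(label, options = {}) {
  try {
    return new TextDecoder(label, options);
  } catch (error) {
    if (error instanceof RangeError) return null;
    throw error;
  }
}

// Labels are matched after ASCII whitespace is stripped and case is ignored.
const accepted = tryCreateDecoder(" UTF8 ", { fatal: true });
const rejected = tryCreateDecoder("no-such-encoding");
```
</div>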

<p>The <dfn method for=TextDecoder><code>decode(<var>input</var>, <var>options</var>)</code></dfn>
method, when invoked, must run these steps:

<ol>
 <li><p>If <a for=TextDecoder>do not flush</a> is false, set <a for=TextDecoder>decoder</a>
 to a new <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>,
 <a for=TextDecoder>stream</a> to a new <a for=/>stream</a>, and
 <a for=TextDecoderCommon>BOM seen</a> to false.

 <li><p>Set <a for=TextDecoder>do not flush</a> to true if <var>options</var>'s <code>stream</code>
 is true, and false otherwise.

 <li>
  <p>If <var>input</var> is given, then <a>push</a> a
  <a lt="get a copy of the buffer source">copy of</a> <var>input</var> to
  <a for=TextDecoder>stream</a>.

  <p class=note>Implementations are strongly encouraged to use an implementation strategy that
  avoids this copy. When doing so they will have to make sure that changes to <var>input</var> do
  not affect future calls to <a method><code>decode()</code></a>.

  <p class=warning id=sharedarraybuffer-warning>The memory exposed by <code>SharedArrayBuffer</code>
  objects does not adhere to data race freedom properties required by the memory model of
  programming languages typically used for implementations. When implementing, take care to use the
  appropriate facilities when accessing memory exposed by <code>SharedArrayBuffer</code> objects.

 <li><p>Let <var>output</var> be a new <a for=/>stream</a>.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>token</var> be the result of <a>reading</a> from <a for=TextDecoder>stream</a>.

   <li>
    <p>If <var>token</var> is <a>end-of-stream</a> and <a for=TextDecoder>do not flush</a>
    is true, then return <var>output</var>,
    <a lt="serialize stream" for=TextDecoderCommon>serialized</a>.

    <p class=note>The way streaming works is to not handle <a>end-of-stream</a> here when
    <a for=TextDecoder>do not flush</a> is true and to not set <a for=TextDecoder>do not flush</a>
    to false. That way in a subsequent invocation <a for=TextDecoder>decoder</a> is not set anew in
    the first step of the algorithm and its state is preserved.

   <li>
    <p>Otherwise:

    <ol>
     <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for
     <a for=TextDecoder>decoder</a>, <a for=TextDecoder>stream</a>, <var>output</var>, and
     <a for=TextDecoderCommon>error mode</a>.

     <li><p>If <var>result</var> is <a>finished</a>, then return <var>output</var>,
     <a lt="serialize stream" for=TextDecoderCommon>serialized</a>.

     <li><p>Otherwise, if <var>result</var> is <a>error</a>, then <a lt=throw>throw</a> a
     {{TypeError}}.
    </ol>
  </ol>
</ol>
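
<div class=example id=example-textdecoder-decode-streaming-sketch>
 <p>The following non-normative sketch illustrates how <a for=TextDecoder>do not flush</a>
 preserves decoder state between calls, using a three-byte UTF-8 sequence split across two
 chunks. The variable names are illustrative.

```javascript
const euro = new Uint8Array([0xE2, 0x82, 0xAC]); // U+20AC in UTF-8

// Streaming: the incomplete sequence is held back until it can be completed.
const decoder = new TextDecoder("utf-8");
const parts = [
  decoder.decode(euro.subarray(0, 2), { stream: true }),
  decoder.decode(euro.subarray(2), { stream: true }),
  decoder.decode(), // end-of-stream, nothing left to flush
];

// Not streaming: end-of-stream arrives mid-sequence, so U+FFFD is emitted.
const truncated = new TextDecoder("utf-8").decode(euro.subarray(0, 2));

// With error mode "fatal" the same input throws a TypeError instead.
let fatalError = null;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(euro.subarray(0, 2));
} catch (error) {
  fatalError = error;
}
```

 <p>The first streamed call returns the empty string, the second returns the complete character,
 and the final call returns the empty string.
</div>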

<h3 id=interface-mixin-textencodercommon>Interface mixin {{TextEncoderCommon}}</h3>

<pre class=idl>
interface mixin TextEncoderCommon {
  readonly attribute DOMString encoding;
};
</pre>

<p>The {{TextEncoderCommon}} interface mixin defines common attributes that are shared between
{{TextEncoder}} and {{TextEncoderStream}} objects.

<p>The <dfn attribute id=dom-textencoder-encoding for=TextEncoderCommon><code>encoding</code></dfn>
attribute's getter, when invoked, must return "<code>utf-8</code>".


<h3 id=interface-textencoder>Interface {{TextEncoder}}</h3>

<pre class=idl>
dictionary TextEncoderEncodeIntoResult {
  unsigned long long read;
  unsigned long long written;
};

[Exposed=(Window,Worker)]
interface TextEncoder {
  constructor();

  [NewObject] Uint8Array encode(optional USVString input = "");
  TextEncoderEncodeIntoResult encodeInto(USVString source, [AllowShared] Uint8Array destination);
};
TextEncoder includes TextEncoderCommon;
</pre>

<p class="note no-backref">A {{TextEncoder}} object offers no <var>label</var> argument as it only
supports <a>UTF-8</a>. It also offers no <code>stream</code> option as no <a for=/>encoder</a>
requires buffering of scalar values.

<hr>

<dl class=domintro>
 <dt><code><var>encoder</var> = new <a constructor for=TextEncoder>TextEncoder()</a></code>
 <dd><p>Returns a new {{TextEncoder}} object.

 <dt><code><var>encoder</var> . <a attribute for=TextEncoderCommon>encoding</a></code>
 <dd><p>Returns "<code>utf-8</code>".

 <dt><code><var>encoder</var> . <a method for=TextEncoder lt=encode()>encode([<var>input</var> = ""])</a></code>
 <dd><p>Returns the result of running <a>UTF-8</a>'s <a for=/>encoder</a>.

 <dt><code><var>encoder</var> . <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto(<var>source</var>, <var>destination</var>)</a></code>
 <dd><p>Runs the <a>UTF-8 encoder</a> on <var>source</var>, stores the result of that operation into
 <var>destination</var>, and returns the progress made as a dictionary whereby
 {{TextEncoderEncodeIntoResult/read}} is the number of converted <a>code units</a> of
 <var>source</var> and {{TextEncoderEncodeIntoResult/written}} is the number of bytes modified in
 <var>destination</var>.
</dl>

<p>The <dfn constructor for=TextEncoder id=dom-textencoder><code>TextEncoder()</code></dfn>
constructor, when invoked, must return a new {{TextEncoder}} object.

<p>The <dfn method for=TextEncoder><code>encode(<var>input</var>)</code></dfn> method, when invoked,
must run these steps:

<ol>
 <li><p>Convert <var>input</var> to a <a for=/>stream</a>.

 <li><p>Let <var>output</var> be a new <a for=/>stream</a>.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>token</var> be the result of
   <a>reading</a> from <var>input</var>.

   <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for the
   <a>UTF-8 encoder</a>, <var>input</var>, and <var>output</var>.

   <li>
    <p>Assert: <var>result</var> is not <a>error</a>.

    <p class=note>The <a>UTF-8 encoder</a> cannot return <a>error</a>.

   <li><p>If <var>result</var> is <a>finished</a>, convert <var>output</var> into a byte sequence,
   and then return a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing <var>output</var>.
   <!-- XXX https://www.w3.org/Bugs/Public/show_bug.cgi?id=26966 -->
  </ol>
</ol>
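
<div class=example id=example-textencoder-encode-sketch>
 <p>A brief, non-normative illustration of <code>encode()</code>. Because <var>input</var> is a
 <code>USVString</code>, a lone surrogate is converted to U+FFFD before the
 <a>UTF-8 encoder</a> runs. The helper name <code>toHex</code> is hypothetical.

```javascript
// Formats a byte array as space-separated uppercase hexadecimal.
function toHex(bytes) {
  return Array.from(bytes, (byte) => byte.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

const encoder = new TextEncoder();

const euro = encoder.encode("\u20AC");          // a three-byte sequence
const empty = encoder.encode();                 // input defaults to ""
const loneSurrogate = encoder.encode("\uD800"); // encoded as U+FFFD
```
</div>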

<p>The
<dfn method for=TextEncoder><code>encodeInto(<var>source</var>, <var>destination</var>)</code></dfn>
method, when invoked, must run these steps:

<ol>
 <li><p>Let <var>read</var> be 0.

 <li><p>Let <var>written</var> be 0.

 <li><p>Let <var>destinationBytes</var> be the result of
 <a lt="get a reference to the buffer source">getting a reference to the bytes held by</a>
 <var>destination</var>.

 <li>
  <p>Let <var>unused</var> be a new <a for=/>stream</a>.

  <p class=note>The <a>handler</a> algorithm invoked below requires this argument, but it is not
  used by the <a>UTF-8 encoder</a>.

 <li><p>Convert <var>source</var> to a <a for=/>stream</a>.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>source</var>.

   <li><p>Let <var>result</var> be the result of running the <a>UTF-8 encoder</a>'s <a>handler</a>
   on <var>unused</var> and <var>token</var>.

   <li><p>If <var>result</var> is <a>finished</a>, then <a for=iteration>break</a>.

   <li>
    <p>Otherwise:

    <ol>
     <li>
      <p>If <var>destinationBytes</var>'s <a for="byte sequence">length</a> &minus;
      <var>written</var> is greater than or equal to the number of bytes in <var>result</var>, then:

      <ol>
       <li><p>If <var>token</var> is greater than U+FFFF, then increment <var>read</var> by 2.

       <li><p>Otherwise, increment <var>read</var> by 1.

       <li>
        <p>Write the bytes in <var>result</var> into <var>destinationBytes</var>, from byte offset
        <var>written</var>.

        <p class=warning>See the
        <a href=#sharedarraybuffer-warning>warning for <code>SharedArrayBuffer</code> objects</a>
        above.

       <li><p>Increment <var>written</var> by the number of bytes in <var>result</var>.
      </ol>

     <li><p>Otherwise, <a for=iteration>break</a>.
    </ol>
  </ol>

 <li><p>Return a new {{TextEncoderEncodeIntoResult}} dictionary whose
 {{TextEncoderEncodeIntoResult/read}} member is <var>read</var> and
 {{TextEncoderEncodeIntoResult/written}} member is <var>written</var>.
</ol>
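
<div class=example id=example-textencoder-encodeinto-progress-sketch>
 <p>The following non-normative sketch shows how <var>read</var> and <var>written</var> behave
 when <var>destination</var> is too small, and that <var>read</var> counts code units rather than
 code points. The variable names are illustrative.

```javascript
const encoder = new TextEncoder();

// "a" takes 1 byte and U+20AC takes 3, so "b" no longer fits in 4 bytes.
const partial = encoder.encodeInto("a\u20ACb", new Uint8Array(4));

// After "a", only 2 bytes remain, so the 3-byte sequence is not written at all.
const stopped = encoder.encodeInto("a\u20AC", new Uint8Array(3));

// U+1F600 is two code units in the string and four bytes in UTF-8.
const astral = encoder.encodeInto("\u{1F600}", new Uint8Array(4));
```

 <p>A caller can resume from <code>source.substring(read)</code> with a larger or fresh
 destination, since a scalar value is never split across calls.
</div>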

<div class=example id=example-textencoder-encodeinto>
 <p>The <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto()</a> method can
 be used to encode a string into an existing {{ArrayBuffer}} object. Various details below are left
 as an exercise for the reader, but this demonstrates an approach one could take to use this method:

 <pre><code class=lang-javascript>
function convertString(buffer, input, callback) {
  let bufferSize = 256,
      bufferStart = malloc(buffer, bufferSize),
      writeOffset = 0,
      readOffset = 0;
  while (true) {
    const view = new Uint8Array(buffer, bufferStart + writeOffset, bufferSize - writeOffset),
          {read, written} = cachedEncoder.encodeInto(input.substring(readOffset), view);
    readOffset += read;
    writeOffset += written;
    if (readOffset === input.length) {
      callback(bufferStart, writeOffset);
      free(buffer, bufferStart);
      return;
    }
    bufferSize *= 2;
    bufferStart = realloc(buffer, bufferStart, bufferSize);
  }
}
</code></pre>
</div>


<h3 id=interface-mixin-generictransformstream>Interface mixin {{GenericTransformStream}}</h3>

<p>The {{GenericTransformStream}} interface mixin represents the concept of a
<a>transform stream</a> in IDL. It is not a {{TransformStream}}, though it has the same interface
and it delegates to one.

<pre class=idl>
interface mixin GenericTransformStream {
  readonly attribute ReadableStream readable;
  readonly attribute WritableStream writable;
};
</pre>

<p>An object that includes {{GenericTransformStream}} has an associated
<dfn for=GenericTransformStream>transform</dfn> of type {{TransformStream}}.

<p>The <dfn attribute for=GenericTransformStream><code>readable</code></dfn> attribute's getter,
when invoked, must return this object's <a for=GenericTransformStream>transform</a>.\[[readable]].

<p>The <dfn attribute for=GenericTransformStream><code>writable</code></dfn> attribute's getter,
when invoked, must return this object's <a for=GenericTransformStream>transform</a>.\[[writable]].


<h3 id=interface-textdecoderstream>Interface {{TextDecoderStream}}</h3>

<pre class=idl>
[Exposed=(Window,Worker)]
interface TextDecoderStream {
  constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options = {});
};
TextDecoderStream includes TextDecoderCommon;
TextDecoderStream includes GenericTransformStream;
</pre>

<p>A {{TextDecoderStream}} object has an associated
<dfn for=TextDecoderStream>decoder</dfn> and <dfn for=TextDecoderStream>stream</dfn>.

<dl class=domintro>
 <dt><code><var>decoder</var> = new
 <a constructor for=TextDecoderStream lt=TextDecoderStream()>TextDecoderStream([<var>label</var> =
 "utf-8" [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns a new {{TextDecoderStream}} object.
  <p>If <var>label</var> is either not a <a>label</a> or is a <a>label</a> for <a>replacement</a>,
  <a>throws</a> a {{RangeError}}.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>encoding</a></code>
 <dd><p>Returns <a for=TextDecoderCommon>encoding</a>'s <a>name</a>, lowercased.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>fatal</a></code>
 <dd><p>Returns true if <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>", and
 false otherwise.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>ignoreBOM</a></code>
 <dd><p>Returns the value of <a for=TextDecoderCommon>ignore BOM</a>.

 <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>readable</a></code>
 <dd>
  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> on the chunks written to
  {{GenericTransformStream/writable}}.

 <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>writable</a></code>
 <dd>
  <p>Returns a <a>writable stream</a> which accepts
  <code>[<a extended-attribute>AllowShared</a>] <a typedef>BufferSource</a></code> chunks and runs
  them through <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> before making them
  available to {{GenericTransformStream/readable}}.

  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
  {{ReadableStream}} source.

  <pre class=example id=example-textdecoderstream-writable><code class=lang-javascript>
var decoder = new TextDecoderStream(encoding);
byteReadable
  .pipeThrough(decoder)
  .pipeTo(textWritable);</code></pre>

  <p>If the <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>" and
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>, both
  {{GenericTransformStream/readable}} and {{GenericTransformStream/writable}} will be errored with a
  {{TypeError}}.
</dl>

<p>The
<dfn constructor for=TextDecoderStream id=dom-textdecoderstream><code>TextDecoderStream(<var>label</var>,
<var>options</var>)</code></dfn> constructor, when invoked, must run these steps:

<ol>
 <li><p>Let <var>encoding</var> be the result of <a>getting an encoding</a> from <var>label</var>.

 <li><p>If <var>encoding</var> is failure or <a>replacement</a>, then <a>throw</a> a {{RangeError}}.

 <li><p>Let <var>dec</var> be a new {{TextDecoderStream}} object.

 <li><p>Set <var>dec</var>'s <a for=TextDecoderCommon>encoding</a> to <var>encoding</var>.

 <li><p>If <var>options</var>'s <code>fatal</code> member is true, then set <var>dec</var>'s
 <a for=TextDecoderCommon>error mode</a> to "<code>fatal</code>".

 <li><p>If <var>options</var>'s <code>ignoreBOM</code> member is true, then set <var>dec</var>'s
 <a for=TextDecoderCommon>ignore BOM</a> to true.

 <li>
  <p>Set <var>dec</var>'s <a for=TextDecoderStream>decoder</a> to a new <a for=/>decoder</a>
  for <var>dec</var>'s <a for=TextDecoderCommon>encoding</a>, and set <var>dec</var>'s
  <a for=TextDecoderStream>stream</a> to a new <a for=/>stream</a>.

 <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing.

 <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
 and runs the <a>decode and enqueue a chunk</a> algorithm with <var>dec</var> and
 <var>chunk</var>.

 <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the <a>flush
 and enqueue</a> algorithm with <var>dec</var>.

 <li><p>Let <var>transform</var> be the result of calling
 <a abstract-op>CreateTransformStream</a>(<var>startAlgorithm</var>, <var>transformAlgorithm</var>,
 <var>flushAlgorithm</var>).

 <li><p>Set <var>dec</var>'s <a for=GenericTransformStream>transform</a> to <var>transform</var>.

 <li><p>Return <var>dec</var>.
</ol>

<p>The <dfn>decode and enqueue a chunk</dfn> algorithm, given a {{TextDecoderStream}} object
<var>dec</var> and a <var>chunk</var>, runs these steps:

<ol>
 <li><p>Let <var>bufferSource</var> be the result of
 <a lt="converted to an IDL value">converting</a> <var>chunk</var> to an
 <code>[<a extended-attribute>AllowShared</a>] <a typedef>BufferSource</a></code>. If this throws an
 exception, then return a promise rejected with that exception.

 <li>
  <p><a>Push</a> a <a lt="get a copy of the buffer source">copy of</a> <var>bufferSource</var> to
  <var>dec</var>'s <a for=TextDecoderStream>stream</a>. If this throws an exception, then return a
  promise rejected with that exception.

  <p class=warning>See the
  <a href=#sharedarraybuffer-warning>warning for <code>SharedArrayBuffer</code> objects</a> above.

 <li><p>Let <var>controller</var> be <var>dec</var>'s
 <a for=GenericTransformStream>transform</a>.\[[transformStreamController]].

 <li><p>Let <var>output</var> be a new <a for=/>stream</a>.

 <li>
  <p>While true, run these steps:

  <ol>
   <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>dec</var>'s
   <a for=TextDecoderStream>stream</a>.

   <li>
    <p>If <var>token</var> is <a>end-of-stream</a>, run these steps:
    <ol>
     <li><p>Let <var>outputChunk</var> be <var>output</var>,
     <a lt="serialize stream" for=TextDecoderCommon>serialized</a>.

     <li><p>If <var>outputChunk</var> is non-empty, call
     <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
     <var>outputChunk</var>).

     <li><p>Return a new promise resolved with undefined.
    </ol>

   <li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for
   <var>dec</var>'s <a for=TextDecoderStream>decoder</a>, <var>dec</var>'s
   <a for=TextDecoderStream>stream</a>, <var>output</var>, and <var>dec</var>'s
   <a for=TextDecoderCommon>error mode</a>.

   <li><p>If <var>result</var> is <a>error</a>, then return a new promise rejected with a
   {{TypeError}} exception.
  </ol>
</ol>

<p>The <dfn>flush and enqueue</dfn> algorithm, which handles the end of data from the input
{{ReadableStream}} object, given a {{TextDecoderStream}} object <var>dec</var>, runs these steps:

<ol>
 <li><p>Let <var>output</var> be a new <a for=/>stream</a>.

 <li><p>Let <var>result</var> be the result of <a>processing</a> <a>end-of-stream</a> for
 <var>dec</var>'s <a for=TextDecoderStream>decoder</a>, <var>dec</var>'s
 <a for=TextDecoderStream>stream</a>, <var>output</var>, and <var>dec</var>'s
 <a for=TextDecoderCommon>error mode</a>.

 <li><p>If <var>result</var> is <a>finished</a>, run these steps:
 <ol>
  <li><p>Let <var>outputChunk</var> be <var>output</var>,
  <a lt="serialize stream" for=TextDecoderCommon>serialized</a>.

  <li><p>Let <var>controller</var> be <var>dec</var>'s
  <a for=GenericTransformStream>transform</a>.\[[transformStreamController]].

  <li><p>If <var>outputChunk</var> is non-empty, call
  <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
  <var>outputChunk</var>).

  <li><p>Return a new promise resolved with undefined.
 </ol>

 <li><p>Otherwise, return a new promise rejected with a {{TypeError}} exception.
</ol>
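
<p class=note>The two algorithms above can be approximated in script. The following is a minimal
sketch, not the normative algorithm: it assumes the platform's existing <code>TextDecoder</code>
with its <code>stream</code> option, and <code>createDecoderTransformer</code> is a hypothetical
helper name. In fatal mode <code>decode()</code> throws a <code>TypeError</code> instead of
returning a rejected promise.

```javascript
// Sketch only: a transformer object with the same shape of behavior as
// "decode and enqueue a chunk" and "flush and enqueue".
// createDecoderTransformer is a hypothetical name, not part of the standard.
function createDecoderTransformer(label = 'utf-8', options = {}) {
  const decoder = new TextDecoder(label, options);
  return {
    transform(chunk, controller) {
      // { stream: true } keeps an incomplete trailing sequence buffered
      // inside the decoder until the next chunk arrives.
      const outputChunk = decoder.decode(chunk, { stream: true });
      if (outputChunk !== '') controller.enqueue(outputChunk);
    },
    flush(controller) {
      // Calling decode() without the stream option signals end-of-stream.
      const outputChunk = decoder.decode();
      if (outputChunk !== '') controller.enqueue(outputChunk);
    }
  };
}

// A stub controller that records enqueued chunks, used here for illustration.
function createRecordingController() {
  const chunks = [];
  return { chunks, enqueue(chunk) { chunks.push(chunk); } };
}
```

<p class=note>An object of this shape could be handed to the <code>TransformStream</code>
constructor where that is available. Empty results are not enqueued, mirroring the "non-empty"
checks in the steps above.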


<h3 id=interface-textencoderstream>Interface {{TextEncoderStream}}</h3>

<pre class=idl>
[Exposed=(Window,Worker)]
interface TextEncoderStream {
  constructor();
};
TextEncoderStream includes TextEncoderCommon;
TextEncoderStream includes GenericTransformStream;
</pre>

<p>A {{TextEncoderStream}} object has an associated <dfn for=TextEncoderStream>encoder</dfn>
and <dfn for=TextEncoderStream>pending high surrogate</dfn> (initially null).

<p class="note no-backref">A {{TextEncoderStream}} object offers no <var>label</var> argument as it
only supports <a>UTF-8</a>.

<dl class=domintro>
 <dt><code><var>encoder</var> = new <a constructor for=TextEncoderStream>TextEncoderStream()</a></code>
 <dd><p>Returns a new {{TextEncoderStream}} object.

 <dt><code><var>encoder</var> . <a attribute for=TextEncoderCommon>encoding</a></code>
 <dd><p>Returns "<code>utf-8</code>".

 <dt><code><var>encoder</var> . <a attribute for=GenericTransformStream>readable</a></code>
 <dd>
  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are {{Uint8Array}}s resulting from running
  <a>UTF-8</a>'s <a for=/>encoder</a> on the chunks written to {{GenericTransformStream/writable}}.

 <dt><code><var>encoder</var> . <a attribute for=GenericTransformStream>writable</a></code>
 <dd>
  <p>Returns a <a>writable stream</a> which accepts string chunks and runs them through
  <a>UTF-8</a>'s <a for=/>encoder</a> before making them available to
  {{GenericTransformStream/readable}}.

  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
  {{ReadableStream}} source.

  <pre class=example id=example-textencoderstream-writable><code class=lang-javascript>
textReadable
  .pipeThrough(new TextEncoderStream())
  .pipeTo(byteWritable);</code></pre>
</dl>

<p>The
<dfn constructor for=TextEncoderStream id=dom-textencoderstream><code>TextEncoderStream()</code></dfn>
constructor, when invoked, must run these steps:

<ol>
 <li><p>Let <var>enc</var> be a new {{TextEncoderStream}} object.

 <li><p>Set <var>enc</var>'s <a for=TextEncoderStream>encoder</a> to <a>UTF-8</a>'s
 <a for=/>encoder</a>.

 <li><p>Let <var>startAlgorithm</var> be an algorithm that takes no arguments and returns nothing.

 <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
 and runs the <a>encode and enqueue a chunk</a> algorithm with <var>enc</var> and <var>chunk</var>.

 <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the
 <a>encode and flush</a> algorithm with <var>enc</var>.

 <li><p>Let <var>transform</var> be the result of calling
 <a abstract-op>CreateTransformStream</a>(<var>startAlgorithm</var>, <var>transformAlgorithm</var>,
 <var>flushAlgorithm</var>).

 <li><p>Set <var>enc</var>'s <a for=GenericTransformStream>transform</a> to <var>transform</var>.

 <li><p>Return <var>enc</var>.
</ol>

<hr>

<p>The <dfn>encode and enqueue a chunk</dfn> algorithm, given a {{TextEncoderStream}} object
<var>enc</var> and <var>chunk</var>, runs these steps:

<ol>
 <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a>
 <var>chunk</var> to a {{DOMString}}. If this throws an exception, then return a promise rejected
 with that exception.

 <p class=note>{{DOMString}} is used here so that a surrogate pair that is split between chunks can
 be reassembled into the appropriate scalar value. The behavior is otherwise identical to
 {{USVString}}. In particular, lone surrogates will be replaced with U+FFFD.

 <li><p>Convert <var>input</var> to a <a for=/>stream</a>.

 <li><p>Let <var>output</var> be a new <a for=/>stream</a>.

 <li><p>Let <var>controller</var> be <var>enc</var>'s
 <a for=GenericTransformStream>transform</a>.\[[transformStreamController]].

 <li>
  <p>While true, run these steps:

  <ol>
   <li><p>Let <var>token</var> be the result of <a>reading</a> from <var>input</var>.

   <li>
    <p>If <var>token</var> is <a>end-of-stream</a>, run these steps:

    <ol>
     <li><p>Convert <var>output</var> into a byte sequence.

     <li>
      <p>If <var>output</var> is non-empty, run these steps:

      <ol>
       <li><p>Let <var>chunk</var> be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing
       <var>output</var>.

       <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
       <var>chunk</var>).
      </ol>

     <li><p>Return a new promise resolved with undefined.
    </ol>

   <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar
   value</a> algorithm with <var>enc</var>, <var>token</var>, and <var>input</var>.

   <li><p>If <var>result</var> is not <a>continue</a>, then <a>process</a> <var>result</var> for
   <var>enc</var>'s <a for=TextEncoderStream>encoder</a>, <var>input</var>, and <var>output</var>.

  </ol>
</ol>

<p>The <dfn>convert code unit to scalar value</dfn> algorithm, given a {{TextEncoderStream}} object
<var>enc</var>, <var>token</var>, and stream <var>input</var>, runs these steps:

<ol>
 <li>
  <p>If <var>enc</var>'s <a>pending high surrogate</a> is non-null, run these steps:

  <ol>
   <li><p>Let <var>high surrogate</var> be <var>enc</var>'s <a>pending high surrogate</a>.

   <li><p>Set <var>enc</var>'s <a>pending high surrogate</a> to null.

   <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, then return a code point
   whose value is 0x10000 + ((<var>high surrogate</var> &minus; 0xD800) &lt;&lt; 10) +
   (<var>token</var> &minus; 0xDC00).

   <li><p><a>Prepend</a> <var>token</var> to <var>input</var>.

   <li><p>Return U+FFFD.
  </ol>

 <li><p>If <var>token</var> is in the range U+D800 to U+DBFF, inclusive, then set <var>enc</var>'s
 <a>pending high surrogate</a> to <var>token</var> and return <a>continue</a>.

 <li><p>If <var>token</var> is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.

 <li><p>Return <var>token</var>.
</ol>

<p class=note>This is equivalent to the "<a for=string>convert</a> a <a for=/>string</a> into a
<a for=/>scalar value string</a>" algorithm from the Infra Standard, but allows for surrogate pairs
that are split between strings. [[!INFRA]]
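
<p class=note>As an illustration of the steps above, the following sketch converts string chunks to
scalar values while carrying a high surrogate across chunk boundaries.
<code>createScalarValueConverter</code> and its members are hypothetical names chosen for this
example. The "prepend" step is modeled by stepping the read index back by one.

```javascript
// Sketch only: stateful conversion of UTF-16 code units to scalar values.
// createScalarValueConverter is a hypothetical name, not part of any API.
function createScalarValueConverter() {
  let pendingHighSurrogate = null;

  function convert(chunk) {
    const scalarValues = [];
    let i = 0;
    while (i < chunk.length) {
      const token = chunk.charCodeAt(i++);
      if (pendingHighSurrogate !== null) {
        const highSurrogate = pendingHighSurrogate;
        pendingHighSurrogate = null;
        if (token >= 0xDC00 && token <= 0xDFFF) {
          // Combine the pair into a supplementary-plane scalar value.
          scalarValues.push(
            0x10000 + ((highSurrogate - 0xD800) << 10) + (token - 0xDC00));
          continue;
        }
        i--; // "Prepend" token so it is read again on the next iteration.
        scalarValues.push(0xFFFD);
        continue;
      }
      if (token >= 0xD800 && token <= 0xDBFF) {
        pendingHighSurrogate = token; // Corresponds to returning "continue".
        continue;
      }
      if (token >= 0xDC00 && token <= 0xDFFF) {
        scalarValues.push(0xFFFD); // Lone low surrogate.
        continue;
      }
      scalarValues.push(token);
    }
    return scalarValues;
  }

  return { convert, hasPending: () => pendingHighSurrogate !== null };
}
```

<p class=note>Feeding the two halves of a surrogate pair as separate chunks yields nothing for the
first chunk and a single scalar value for the second.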

<p>The <dfn>encode and flush</dfn> algorithm, given a {{TextEncoderStream}} object <var>enc</var>,
runs these steps:

<ol>
 <li>
  <p>If <var>enc</var>'s <a>pending high surrogate</a> is non-null, run these steps:

  <ol>
   <li><p>Let <var>controller</var> be <var>enc</var>'s
   <a for=GenericTransformStream>transform</a>.\[[transformStreamController]].

   <li>
    <p>Let <var>output</var> be the byte sequence 0xEF 0xBF 0xBD.

    <p class=note>This is the replacement character U+FFFD encoded as UTF-8.

   <li><p>Let <var>chunk</var> be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing
   <var>output</var>.

   <li><p>Call <a abstract-op>TransformStreamDefaultControllerEnqueue</a>(<var>controller</var>,
   <var>chunk</var>).
  </ol>

 <li><p>Return a new promise resolved with undefined.
</ol>
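
<p class=note>The encode-side algorithms can be approximated in script as well. This sketch uses a
different technique from the steps above: rather than processing code units one at a time, it holds
back a trailing high surrogate as a string and relies on the existing
<code>TextEncoder</code> to replace other lone surrogates with U+FFFD.
<code>createEncoderTransformer</code> is a hypothetical name, and <code>String(chunk)</code> only
approximates conversion to a {{DOMString}}.

```javascript
// Sketch only: a transformer object approximating "encode and enqueue a chunk"
// and "encode and flush". createEncoderTransformer is a hypothetical name.
function createEncoderTransformer() {
  const encoder = new TextEncoder();
  let pending = ''; // Either empty or a single high surrogate code unit.
  return {
    transform(chunk, controller) {
      let input = pending + String(chunk);
      pending = '';
      const last = input.charCodeAt(input.length - 1); // NaN for empty input.
      if (last >= 0xD800 && last <= 0xDBFF) {
        // Hold back a trailing high surrogate until the next chunk or flush.
        pending = input.slice(-1);
        input = input.slice(0, -1);
      }
      if (input.length > 0) controller.enqueue(encoder.encode(input));
    },
    flush(controller) {
      // A high surrogate left over at end of input becomes U+FFFD in UTF-8.
      if (pending !== '') {
        controller.enqueue(new Uint8Array([0xEF, 0xBF, 0xBD]));
        pending = '';
      }
    }
  };
}
```

<p class=note>The bytes enqueued by <code>flush</code> are the same bytes that
<code>new TextEncoder().encode("\uFFFD")</code> produces.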


<h2 id=the-encoding>The encoding</h2>

<h3 id=utf-8 dfn export>UTF-8</h3>

<h4 id=utf-8-decoder dfn export>UTF-8 decoder</h4>

<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it has been found
to be more accurate in deployed content. Therefore it is not part of the <a>UTF-8 decoder</a>
algorithm but rather the <a>decode</a> and <a>UTF-8 decode</a> algorithms.

<p><a>UTF-8</a>'s <a for=/>decoder</a> has an associated
<dfn>UTF-8 code point</dfn>, <dfn>UTF-8 bytes seen</dfn>, and
<dfn>UTF-8 bytes needed</dfn> (all initially 0), a <dfn>UTF-8 lower boundary</dfn>
(initially 0x80), and a <dfn>UTF-8 upper boundary</dfn> (initially 0xBF).

<p><a>UTF-8</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>UTF-8 bytes needed</a> is not 0, set
 <a>UTF-8 bytes needed</a> to 0 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li>
  <p>If <a>UTF-8 bytes needed</a> is 0, based on <var>byte</var>:

  <dl class=switch>
   <dt>0x00 to 0x7F
   <dd><p>Return a code point whose value is <var>byte</var>.

   <dt>0xC2 to 0xDF
   <dd>
    <ol>
     <li><p>Set <a>UTF-8 bytes needed</a> to 1.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0x1F.

      <p class=note>The five least significant bits of <var>byte</var>.
    </ol>

   <dt>0xE0 to 0xEF
   <dd>
    <ol>
     <li><p>If <var>byte</var> is 0xE0, set
     <a>UTF-8 lower boundary</a> to 0xA0.

     <li><p>If <var>byte</var> is 0xED, set
     <a>UTF-8 upper boundary</a> to 0x9F.

     <li><p>Set <a>UTF-8 bytes needed</a> to 2.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0xF.

      <p class=note>The four least significant bits of <var>byte</var>.
    </ol>

   <dt>0xF0 to 0xF4
   <dd>
    <ol>
     <li><p>If <var>byte</var> is 0xF0, set
     <a>UTF-8 lower boundary</a> to 0x90.

     <li><p>If <var>byte</var> is 0xF4, set
     <a>UTF-8 upper boundary</a> to 0x8F.

     <li><p>Set <a>UTF-8 bytes needed</a> to 3.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0x7.

      <p class=note>The three least significant bits of <var>byte</var>.
    </ol>

   <dt>Otherwise
   <dd><p>Return <a>error</a>.
  </dl>

  <p>Return <a>continue</a>.

 <li>
  <p>If <var>byte</var> is not in the range <a>UTF-8 lower boundary</a> to
  <a>UTF-8 upper boundary</a>, inclusive, then:

  <ol>
   <li><p>Set <a>UTF-8 code point</a>,
   <a>UTF-8 bytes needed</a>, and <a>UTF-8 bytes seen</a> to 0,
   set <a>UTF-8 lower boundary</a> to 0x80, and set
   <a>UTF-8 upper boundary</a> to 0xBF.

   <li><p><a>Prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>Set <a>UTF-8 lower boundary</a> to 0x80 and
 <a>UTF-8 upper boundary</a> to 0xBF.

 <li>
  <p>Set <a>UTF-8 code point</a> to (<a>UTF-8 code point</a> &lt;&lt; 6) |
  (<var>byte</var> &amp; 0x3F).

  <p class="note no-backref">Shift the existing bits of <a>UTF-8 code point</a> left by six
  places and set the newly-vacated six least significant bits to the six least significant bits of
  <var>byte</var>.

 <li><p>Increase <a>UTF-8 bytes seen</a> by one.

 <li><p>If <a>UTF-8 bytes seen</a> is not equal to
 <a>UTF-8 bytes needed</a>, return <a>continue</a>.

 <li><p>Let <var>code point</var> be <a>UTF-8 code point</a>.

 <li><p>Set <a>UTF-8 code point</a>,
 <a>UTF-8 bytes needed</a>, and <a>UTF-8 bytes seen</a> to 0.

 <li><p>Return a code point whose value is <var>code point</var>.
</ol>

<p class=note>The constraints in the <a>UTF-8 decoder</a> above match
“Best Practices for Using U+FFFD” from the Unicode standard. No other
behavior is permitted per the Encoding Standard (other algorithms that
achieve the same result are fine, even encouraged).
[[!UNICODE]]
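
<p class=note>The handler above can be sketched as a loop over a byte array. This is an
illustration under stated assumptions: <code>utf8DecodeToCodePoints</code> is a hypothetical name,
every <a>error</a> is replaced with U+FFFD (as in the "<code>replacement</code>" error mode), and
"prepend" is modeled by stepping the read index back by one.

```javascript
// Sketch only: the UTF-8 decoder handler applied to a whole byte array,
// with errors replaced by U+FFFD. utf8DecodeToCodePoints is a hypothetical name.
function utf8DecodeToCodePoints(bytes) {
  let codePoint = 0, bytesSeen = 0, bytesNeeded = 0;
  let lowerBoundary = 0x80, upperBoundary = 0xBF;
  const output = [];
  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i++];
    if (bytesNeeded === 0) {
      if (byte <= 0x7F) {
        output.push(byte);
      } else if (byte >= 0xC2 && byte <= 0xDF) {
        bytesNeeded = 1;
        codePoint = byte & 0x1F;
      } else if (byte >= 0xE0 && byte <= 0xEF) {
        if (byte === 0xE0) lowerBoundary = 0xA0; // Rejects overlong forms.
        if (byte === 0xED) upperBoundary = 0x9F; // Rejects surrogates.
        bytesNeeded = 2;
        codePoint = byte & 0xF;
      } else if (byte >= 0xF0 && byte <= 0xF4) {
        if (byte === 0xF0) lowerBoundary = 0x90; // Rejects overlong forms.
        if (byte === 0xF4) upperBoundary = 0x8F; // Rejects values above U+10FFFF.
        bytesNeeded = 3;
        codePoint = byte & 0x7;
      } else {
        output.push(0xFFFD); // error
      }
      continue;
    }
    if (byte < lowerBoundary || byte > upperBoundary) {
      codePoint = bytesNeeded = bytesSeen = 0;
      lowerBoundary = 0x80;
      upperBoundary = 0xBF;
      i--; // "Prepend" byte so it is processed again as a lead byte.
      output.push(0xFFFD); // error
      continue;
    }
    lowerBoundary = 0x80;
    upperBoundary = 0xBF;
    codePoint = (codePoint << 6) | (byte & 0x3F);
    bytesSeen++;
    if (bytesSeen !== bytesNeeded) continue;
    output.push(codePoint);
    codePoint = bytesNeeded = bytesSeen = 0;
  }
  // end-of-stream in the middle of a sequence is a single error.
  if (bytesNeeded !== 0) output.push(0xFFFD);
  return output;
}
```

<p class=note>For instance, the bytes 0xED 0xA0 0x80 (an encoded surrogate) produce three U+FFFD
code points in this sketch, because each byte is rejected separately.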


<h4 id=utf-8-encoder dfn export>UTF-8 encoder</h4>

<p><a>UTF-8</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li>
  <p>Set <var>count</var> and <var>offset</var> based on the
  range <var>code point</var> is in:

  <dl class=switch>
   <dt>U+0080 to U+07FF, inclusive
   <dd>1 and 0xC0
   <dt>U+0800 to U+FFFF, inclusive
   <dd>2 and 0xE0
   <dt>U+10000 to U+10FFFF, inclusive
   <dd>3 and 0xF0
  </dl>

 <li><p>Let <var>bytes</var> be a byte sequence whose first byte is
 (<var>code point</var> >> (6 × <var>count</var>)) + <var>offset</var>.

 <li>
  <p>While <var>count</var> is greater than 0:

  <ol>
   <li><p>Set <var>temp</var> to
   <var>code point</var> >> (6 × (<var>count</var> &minus; 1)).

   <li><p>Append to <var>bytes</var> 0x80 | (<var>temp</var> &amp; 0x3F).

   <li><p>Decrease <var>count</var> by one.
  </ol>

 <li><p>Return bytes <var>bytes</var>, in order.
</ol>

<p class=note>This algorithm has identical results to the one described in the Unicode standard. It
is included here for completeness. [[!UNICODE]]
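
<p class=note>A direct transcription of the encoder steps for a single scalar value might look as
follows. <code>utf8EncodeCodePoint</code> is a hypothetical name, and the input is assumed to be a
scalar value (no surrogates), as the encoder requires.

```javascript
// Sketch only: the UTF-8 encoder handler for one scalar value.
// utf8EncodeCodePoint is a hypothetical name, not part of any API.
function utf8EncodeCodePoint(codePoint) {
  if (codePoint <= 0x7F) return [codePoint];
  let count, offset;
  if (codePoint <= 0x7FF) {
    count = 1; offset = 0xC0;
  } else if (codePoint <= 0xFFFF) {
    count = 2; offset = 0xE0;
  } else {
    count = 3; offset = 0xF0;
  }
  // Lead byte: the most significant bits plus the length marker.
  const bytes = [(codePoint >> (6 * count)) + offset];
  while (count > 0) {
    // Each continuation byte carries the next six bits.
    const temp = codePoint >> (6 * (count - 1));
    bytes.push(0x80 | (temp & 0x3F));
    count--;
  }
  return bytes;
}
```

<p class=note>For U+20AC this yields 0xE2 0x82 0xAC, which matches what
<code>new TextEncoder().encode("\u20AC")</code> returns.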


<h2 id=legacy-single-byte-encodings>Legacy single-byte encodings</h2>

<p>An <a for=/>encoding</a> where each byte is either a single code point or
nothing, is a <dfn>single-byte encoding</dfn>.
<a>Single-byte encodings</a> share the
<a for=/>decoder</a> and <a for=/>encoder</a>. <dfn>Index single-byte</dfn>,
as referenced by the <a>single-byte decoder</a> and
<a>single-byte encoder</a>, is defined by the following table, and
depends on the <a>single-byte encoding</a> in use. All but two
<a>single-byte encodings</a> have a
unique <a>index</a>.

<table>
 <tr><td><dfn export>IBM866</dfn><td><a href=index-ibm866.txt>index-ibm866.txt</a><td><a href=ibm866.html>index IBM866 visualization</a><td><a href=ibm866-bmp.html>index IBM866 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-2</dfn><td><a href=index-iso-8859-2.txt>index-iso-8859-2.txt</a><td><a href=iso-8859-2.html>index ISO-8859-2 visualization</a><td><a href=iso-8859-2-bmp.html>index ISO-8859-2 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-3</dfn><td><a href=index-iso-8859-3.txt>index-iso-8859-3.txt</a><td><a href=iso-8859-3.html>index ISO-8859-3 visualization</a><td><a href=iso-8859-3-bmp.html>index ISO-8859-3 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-4</dfn><td><a href=index-iso-8859-4.txt>index-iso-8859-4.txt</a><td><a href=iso-8859-4.html>index ISO-8859-4 visualization</a><td><a href=iso-8859-4-bmp.html>index ISO-8859-4 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-5</dfn><td><a href=index-iso-8859-5.txt>index-iso-8859-5.txt</a><td><a href=iso-8859-5.html>index ISO-8859-5 visualization</a><td><a href=iso-8859-5-bmp.html>index ISO-8859-5 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-6</dfn><td><a href=index-iso-8859-6.txt>index-iso-8859-6.txt</a><td><a href=iso-8859-6.html>index ISO-8859-6 visualization</a><td><a href=iso-8859-6-bmp.html>index ISO-8859-6 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-7</dfn><td><a href=index-iso-8859-7.txt>index-iso-8859-7.txt</a><td><a href=iso-8859-7.html>index ISO-8859-7 visualization</a><td><a href=iso-8859-7-bmp.html>index ISO-8859-7 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-8</dfn><td rowspan=2><a href=index-iso-8859-8.txt>index-iso-8859-8.txt</a><td rowspan=2><a href=iso-8859-8.html>index ISO-8859-8 visualization</a><td rowspan=2><a href=iso-8859-8-bmp.html>index ISO-8859-8 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-8-I</dfn>
 <tr><td><dfn export>ISO-8859-10</dfn><td><a href=index-iso-8859-10.txt>index-iso-8859-10.txt</a><td><a href=iso-8859-10.html>index ISO-8859-10 visualization</a><td><a href=iso-8859-10-bmp.html>index ISO-8859-10 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-13</dfn><td><a href=index-iso-8859-13.txt>index-iso-8859-13.txt</a><td><a href=iso-8859-13.html>index ISO-8859-13 visualization</a><td><a href=iso-8859-13-bmp.html>index ISO-8859-13 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-14</dfn><td><a href=index-iso-8859-14.txt>index-iso-8859-14.txt</a><td><a href=iso-8859-14.html>index ISO-8859-14 visualization</a><td><a href=iso-8859-14-bmp.html>index ISO-8859-14 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-15</dfn><td><a href=index-iso-8859-15.txt>index-iso-8859-15.txt</a><td><a href=iso-8859-15.html>index ISO-8859-15 visualization</a><td><a href=iso-8859-15-bmp.html>index ISO-8859-15 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-16</dfn><td><a href=index-iso-8859-16.txt>index-iso-8859-16.txt</a><td><a href=iso-8859-16.html>index ISO-8859-16 visualization</a><td><a href=iso-8859-16-bmp.html>index ISO-8859-16 BMP coverage</a>
 <tr><td><dfn export>KOI8-R</dfn><td><a href=index-koi8-r.txt>index-koi8-r.txt</a><td><a href=koi8-r.html>index KOI8-R visualization</a><td><a href=koi8-r-bmp.html>index KOI8-R BMP coverage</a>
 <tr><td><dfn export>KOI8-U</dfn><td><a href=index-koi8-u.txt>index-koi8-u.txt</a><td><a href=koi8-u.html>index KOI8-U visualization</a><td><a href=koi8-u-bmp.html>index KOI8-U BMP coverage</a>
 <tr><td><dfn export>macintosh</dfn><td><a href=index-macintosh.txt>index-macintosh.txt</a><td><a href=macintosh.html>index macintosh visualization</a><td><a href=macintosh-bmp.html>index macintosh BMP coverage</a>
 <tr><td><dfn export>windows-874</dfn><td><a href=index-windows-874.txt>index-windows-874.txt</a><td><a href=windows-874.html>index windows-874 visualization</a><td><a href=windows-874-bmp.html>index windows-874 BMP coverage</a>
 <tr><td><dfn export>windows-1250</dfn><td><a href=index-windows-1250.txt>index-windows-1250.txt</a><td><a href=windows-1250.html>index windows-1250 visualization</a><td><a href=windows-1250-bmp.html>index windows-1250 BMP coverage</a>
 <tr><td><dfn export>windows-1251</dfn><td><a href=index-windows-1251.txt>index-windows-1251.txt</a><td><a href=windows-1251.html>index windows-1251 visualization</a><td><a href=windows-1251-bmp.html>index windows-1251 BMP coverage</a>
 <tr><td><dfn export>windows-1252</dfn><td><a href=index-windows-1252.txt>index-windows-1252.txt</a><td><a href=windows-1252.html>index windows-1252 visualization</a><td><a href=windows-1252-bmp.html>index windows-1252 BMP coverage</a>
 <tr><td><dfn export>windows-1253</dfn><td><a href=index-windows-1253.txt>index-windows-1253.txt</a><td><a href=windows-1253.html>index windows-1253 visualization</a><td><a href=windows-1253-bmp.html>index windows-1253 BMP coverage</a>
 <tr><td><dfn export>windows-1254</dfn><td><a href=index-windows-1254.txt>index-windows-1254.txt</a><td><a href=windows-1254.html>index windows-1254 visualization</a><td><a href=windows-1254-bmp.html>index windows-1254 BMP coverage</a>
 <tr><td><dfn export>windows-1255</dfn><td><a href=index-windows-1255.txt>index-windows-1255.txt</a><td><a href=windows-1255.html>index windows-1255 visualization</a><td><a href=windows-1255-bmp.html>index windows-1255 BMP coverage</a>
 <tr><td><dfn export>windows-1256</dfn><td><a href=index-windows-1256.txt>index-windows-1256.txt</a><td><a href=windows-1256.html>index windows-1256 visualization</a><td><a href=windows-1256-bmp.html>index windows-1256 BMP coverage</a>
 <tr><td><dfn export>windows-1257</dfn><td><a href=index-windows-1257.txt>index-windows-1257.txt</a><td><a href=windows-1257.html>index windows-1257 visualization</a><td><a href=windows-1257-bmp.html>index windows-1257 BMP coverage</a>
 <tr><td><dfn export>windows-1258</dfn><td><a href=index-windows-1258.txt>index-windows-1258.txt</a><td><a href=windows-1258.html>index windows-1258 visualization</a><td><a href=windows-1258-bmp.html>index windows-1258 BMP coverage</a>
 <tr><td><dfn export>x-mac-cyrillic</dfn><td><a href=index-x-mac-cyrillic.txt>index-x-mac-cyrillic.txt</a><td><a href=x-mac-cyrillic.html>index x-mac-cyrillic visualization</a><td><a href=x-mac-cyrillic-bmp.html>index x-mac-cyrillic BMP coverage</a>
 </table>

<p class=note><a>ISO-8859-8</a> and <a>ISO-8859-8-I</a> are
distinct <a for=/>encoding</a> <a for=encoding>names</a>, because
<a>ISO-8859-8</a> has influence on the layout direction. And although
historically this might have been the case for <a>ISO-8859-6</a> and
"ISO-8859-6-I" as well, that is no longer true.
<!-- https://www.w3.org/Bugs/Public/show_bug.cgi?id=19505 -->

<h3 id=single-byte-decoder dfn export>single-byte decoder</h3>

<p><a>Single-byte encodings</a>'
<a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return a code point whose value
 is <var>byte</var>.

 <li><p>Let <var>code point</var> be the <a>index code point</a>
 for <var>byte</var> &minus; 0x80 in <a>index single-byte</a>.

 <li><p>If <var>code point</var> is null, return <a>error</a>.

 <li><p>Return a code point whose value is <var>code point</var>.
</ol>

<h3 id=single-byte-encoder export dfn>single-byte encoder</h3>

<p><a>Single-byte encodings</a>'
<a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index single-byte</a>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Return a byte whose value is <var>pointer</var> + 0x80.
</ol>
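
<p class=note>The single-byte decoder and encoder are table lookups offset by 0x80. The sketch
below assumes a toy index: <code>toyIndex</code> is a hypothetical 128-entry array holding only two
entries, with every other pointer left null for brevity. It is not the content of any real index
file. Null is used here to stand in for <a>error</a>.

```javascript
// Hypothetical excerpt of an index: position is the pointer, value is the
// code point. Real indexes are far more complete than this.
const toyIndex = new Array(128).fill(null);
toyIndex[0x00] = 0x20AC; // byte 0x80 maps to U+20AC in this toy index
toyIndex[0x69] = 0x00E9; // byte 0xE9 maps to U+00E9 in this toy index

// Sketch of the single-byte decoder handler; returns null for error.
function singleByteDecode(byte, index) {
  if (byte <= 0x7F) return byte;
  return index[byte - 0x80];
}

// Sketch of the single-byte encoder handler; returns null for error.
function singleByteEncode(codePoint, index) {
  if (codePoint <= 0x7F) return codePoint;
  // indexOf returns the first matching pointer, or -1 if there is none.
  const pointer = index.indexOf(codePoint);
  return pointer === -1 ? null : pointer + 0x80;
}
```

<p class=note>With this toy index, byte 0xE9 decodes to U+00E9 and U+00E9 encodes back to 0xE9,
while byte 0x81 and U+4E00 both result in an error.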


<h2 id=legacy-multi-byte-chinese-(simplified)-encodings>Legacy multi-byte Chinese (simplified) encodings</h2>

<h3 id=gbk dfn export>GBK</h3>

<h4 id=gbk-decoder dfn export>GBK decoder</h4>

<p><a>GBK</a>'s <a for=/>decoder</a> is <a>gb18030</a>'s <a for=/>decoder</a>.


<h4 id=gbk-encoder dfn export>GBK encoder</h4>

<p><a>GBK</a>'s <a for=/>encoder</a> is <a>gb18030</a>'s <a for=/>encoder</a>
with its <a>is GBK</a> set to true.

<p class="note no-backref">Not fully aliasing <a>GBK</a> with <a>gb18030</a>
is a conservative move to decrease the chances of breaking legacy servers and other
consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.


<h3 id=gb18030 dfn export>gb18030</h3>

<h4 id=gb18030-decoder dfn export>gb18030 decoder</h4>

<p><a>gb18030</a>'s <a for=/>decoder</a> has an associated <dfn>gb18030 first</dfn>,
<dfn>gb18030 second</dfn>, and <dfn>gb18030 third</dfn> (all initially 0x00).

<p><a>gb18030</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a>
 are 0x00, return <a>finished</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a>, and
 <a>gb18030 first</a>, <a>gb18030 second</a>, or <a>gb18030 third</a>
 is not 0x00, set <a>gb18030 first</a>, <a>gb18030 second</a>, and
 <a>gb18030 third</a> to 0x00, and return <a>error</a>.

 <li>
  <p>If <a>gb18030 third</a> is not 0x00, then:

  <ol>
   <li>
    <p>If <var>byte</var> is not in the range 0x30 to 0x39, inclusive, then:

    <ol>
     <li><p><a>Prepend</a> <a>gb18030 second</a>, <a>gb18030 third</a>, and <var>byte</var> to
     <var>stream</var>.

     <li><p>Set <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a> to 0x00.

     <li><p>Return <a>error</a>.
    </ol>

   <li><p>Let <var>code point</var> be the <a>index gb18030 ranges code point</a> for
   ((<a>gb18030 first</a> &minus; 0x81) × (10 × 126 × 10)) +
   ((<a>gb18030 second</a> &minus; 0x30) × (10 × 126)) +
   ((<a>gb18030 third</a> &minus; 0x81) × 10) + <var>byte</var> &minus; 0x30.

   <li><p>Set <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a> to 0x00.

   <li><p>If <var>code point</var> is null, return <a>error</a>.

   <li><p>Return a code point whose value is <var>code point</var>.
  </ol>

 <li>
  <p>If <a>gb18030 second</a> is not 0x00, then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
   <a>gb18030 third</a> to <var>byte</var> and return <a>continue</a>.

   <li><p><a>Prepend</a> <a>gb18030 second</a>
   followed by <var>byte</var> to <var>stream</var>, set
   <a>gb18030 first</a> and <a>gb18030 second</a> to 0x00, and return
   <a>error</a>.
  </ol>

 <li>
  <p>If <a>gb18030 first</a> is not 0x00, then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x30 to 0x39, inclusive, set
   <a>gb18030 second</a> to <var>byte</var> and return <a>continue</a>.

   <li><p>Let <var>lead</var> be <a>gb18030 first</a>, let
   <var>pointer</var> be null, and set <a>gb18030 first</a> to 0x00.

   <li><p>Let <var>offset</var> be 0x40 if <var>byte</var> is
   less than 0x7F and 0x41 otherwise.

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0x80 to 0xFE, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 190 + (<var>byte</var> &minus; <var>offset</var>).

   <li><p>Let <var>code point</var> be null if
   <var>pointer</var> is null and the <a>index code point</a>
   for <var>pointer</var> in <a>index gb18030</a> otherwise.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is 0x80, return code point U+20AC.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>gb18030 first</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
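
<p class=note>The pointer arithmetic in the decoder steps above can be isolated as two small
functions. Both names are hypothetical, and the index lookups themselves
(<a>index gb18030</a> and <a>index gb18030 ranges code point</a>) are deliberately left out.

```javascript
// Sketch only: pointer for a two-byte sequence, or null if the second byte
// is outside 0x40-0x7E and 0x80-0xFE. Hypothetical helper name.
function gb18030TwoBytePointer(lead, byte) {
  const offset = byte < 0x7F ? 0x40 : 0x41;
  if ((byte >= 0x40 && byte <= 0x7E) || (byte >= 0x80 && byte <= 0xFE)) {
    return (lead - 0x81) * 190 + (byte - offset);
  }
  return null;
}

// Sketch only: pointer for a four-byte sequence whose bytes are already known
// to be in range (0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39).
function gb18030FourBytePointer(first, second, third, fourth) {
  return (first - 0x81) * (10 * 126 * 10) +
         (second - 0x30) * (10 * 126) +
         (third - 0x81) * 10 +
         (fourth - 0x30);
}
```

<p class=note>For example, the four bytes 0x84 0x31 0xA4 0x39 give
3 × 12600 + 1 × 1260 + 35 × 10 + 9 = 39419.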


<h4 id=gb18030-encoder dfn export>gb18030 encoder</h4>

<p><a>gb18030</a>'s <a for=/>encoder</a> has an associated <dfn id=gbk-flag>is GBK</dfn>
(initially false).

<p><a>gb18030</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li>
  <p>If <var>code point</var> is U+E5E5, return <a>error</a> with <var>code point</var>.

  <p class=note><a>Index gb18030</a> maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for
  compatibility with deployed content. Therefore it cannot roundtrip.

 <li><p>If <a>is GBK</a> is true and <var>code point</var> is
 U+20AC, return byte 0x80.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index gb18030</a>.

 <li>
  <p>If <var>pointer</var> is non-null, then:

  <ol>
   <li><p>Let <var>lead</var> be <var>pointer</var> / 190 + 0x81.

   <li><p>Let <var>trail</var> be <var>pointer</var> % 190.

   <li><p>Let <var>offset</var> be 0x40 if <var>trail</var> is
   less than 0x3F<!--0x7F-0x40--> and 0x41 otherwise.

   <li><p>Return two bytes whose values are <var>lead</var> and
   <var>trail</var> + <var>offset</var>.
  </ol>

 <li><p>If <a>is GBK</a> is true, return <a>error</a> with
 <var>code point</var>.

 <li><p>Set <var>pointer</var> to the
 <a>index gb18030 ranges pointer</a> for <var>code point</var>.

 <li><p>Let <var>byte1</var> be <var>pointer</var> / (10 × 126 × 10).

 <li><p>Set <var>pointer</var> to <var>pointer</var> % (10 × 126 × 10).

 <li><p>Let <var>byte2</var> be <var>pointer</var> / (10 × 126).

 <li><p>Set <var>pointer</var> to <var>pointer</var> % (10 × 126).

 <li><p>Let <var>byte3</var> be <var>pointer</var> / 10.

 <li><p>Let <var>byte4</var> be <var>pointer</var> % 10.

 <li><p>Return four bytes whose values are <var>byte1</var> + 0x81,
 <var>byte2</var> + 0x30, <var>byte3</var> + 0x81,
 <var>byte4</var> + 0x30.
</ol>
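
<p class=note>The inverse arithmetic used by the encoder can be sketched the same way. The function
names are hypothetical, "/" in the steps above is taken to mean integer division, and the index
lookups that produce the pointers are not shown.

```javascript
// Sketch only: two-byte form for a pointer obtained from index gb18030.
function gb18030TwoBytes(pointer) {
  const lead = Math.floor(pointer / 190) + 0x81;
  const trail = pointer % 190;
  const offset = trail < 0x3F ? 0x40 : 0x41; // Skips byte 0x7F.
  return [lead, trail + offset];
}

// Sketch only: four-byte form for a pointer obtained from
// "index gb18030 ranges pointer".
function gb18030FourBytes(pointer) {
  const byte1 = Math.floor(pointer / (10 * 126 * 10));
  pointer = pointer % (10 * 126 * 10);
  const byte2 = Math.floor(pointer / (10 * 126));
  pointer = pointer % (10 * 126);
  const byte3 = Math.floor(pointer / 10);
  const byte4 = pointer % 10;
  return [byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30];
}
```

<p class=note>Pointer 39419 becomes 0x84 0x31 0xA4 0x39, and pointer 189000 becomes
0x90 0x30 0x81 0x30.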


<h2 id=legacy-multi-byte-chinese-(traditional)-encodings>Legacy multi-byte Chinese (traditional) encodings</h2>

<!--
 Lead:  0x81 to 0xFE
 Trail: 0x40 to 0x7E or 0xA1 to 0xFE
-->


<h3 id=big5 dfn export>Big5</h3>

<h4 id=big5-decoder dfn export>Big5 decoder</h4>

<p><a>Big5</a>'s <a for=/>decoder</a> has an associated
<dfn>Big5 lead</dfn> (initially 0x00).

<p><a>Big5</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and <a>Big5 lead</a>
 is not 0x00, set <a>Big5 lead</a> to 0x00 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a> and <a>Big5 lead</a>
 is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>Big5 lead</a> is not 0x00, let <var>lead</var> be
  <a>Big5 lead</a>, let <var>pointer</var> be null, set
  <a>Big5 lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>offset</var> be 0x40 if <var>byte</var> is
   less than 0x7F and 0x62 otherwise.
   <!-- 0x62 = 0xA1-0x7E-1+0x40 -->

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0xA1 to 0xFE, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 157 + (<var>byte</var> &minus; <var>offset</var>).

   <li>
    <p>If there is a row in the table below whose first column is
    <var>pointer</var>, return the <em>two</em> code points listed in
    its second column (the third column is irrelevant):

    <table>
     <tbody><tr><th>Pointer<th>Code points<th>Notes<!-- https://www.unicode.org/Public/UNIDATA/NamedSequences.txt -->
     <tr><td>1133<!-- 0x88 0x62 --><td>U+00CA U+0304<td>Ê̄ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND MACRON)
     <tr><td>1135<!-- 0x88 0x64 --><td>U+00CA U+030C<td>Ê̌ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND CARON)
     <tr><td>1164<!-- 0x88 0xA3 --><td>U+00EA U+0304<td>ê̄ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND MACRON)
     <tr><td>1166<!-- 0x88 0xA5 --><td>U+00EA U+030C<td>ê̌ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND CARON)
    </table>
    <!-- we do this to avoid PUA -->

    <p class=note>Since <a lt=index>indexes</a> are limited to
    single code points this table is used for these pointers.

   <li><p>Let <var>code point</var> be null if
   <var>pointer</var> is null and the <a>index code point</a>
   for <var>pointer</var> in <a>index Big5</a> otherwise.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>Big5 lead</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
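
<p class=note>The pointer computation and the special-case table from the decoder steps above can
be sketched as follows. <code>big5Pointer</code> and <code>big5TwoCodePointRows</code> are
hypothetical names, and the lookup in <a>index Big5</a> is left out.

```javascript
// The four pointers that decode to two code points, as listed in the table.
const big5TwoCodePointRows = new Map([
  [1133, [0x00CA, 0x0304]],
  [1135, [0x00CA, 0x030C]],
  [1164, [0x00EA, 0x0304]],
  [1166, [0x00EA, 0x030C]]
]);

// Sketch only: pointer for a lead byte and trail byte, or null if the trail
// byte is outside 0x40-0x7E and 0xA1-0xFE. Hypothetical helper name.
function big5Pointer(lead, byte) {
  const offset = byte < 0x7F ? 0x40 : 0x62;
  if ((byte >= 0x40 && byte <= 0x7E) || (byte >= 0xA1 && byte <= 0xFE)) {
    return (lead - 0x81) * 157 + (byte - offset);
  }
  return null;
}
```

<p class=note>For example, lead 0x88 and trail 0x62 give 7 × 157 + 34 = 1133, which is the first
row of the table.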


<h4 id=big5-encoder dfn export>Big5 encoder</h4>

<p><a>Big5</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index Big5 pointer</a> for
 <var>code point</var>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 157 + 0x81.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 157.

 <li><p>Let <var>offset</var> be 0x40 if <var>trail</var> is
 less than 0x3F<!--0x7F-0x40--> and 0x62<!--0xA1-0x3F--> otherwise.

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var> + <var>offset</var>.
</ol>
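
<div class=example id=example-big5-pointer-arithmetic>
 <p>The pointer arithmetic shared by the Big5 decoder and <a>Big5 encoder</a> can be sketched in
 JavaScript as follows. This is a non-normative illustration under stated assumptions: the function
 names are hypothetical, the <a>index Big5</a> lookup is left out entirely, and the decoder-side
 offset rule is written as the inverse of the encoder's rule.

```javascript
// Sketch of the Big5 pointer arithmetic only; no index lookup is performed.
// big5PointerFromBytes and big5BytesFromPointer are hypothetical helper names.

function big5PointerFromBytes(lead, byte) {
  // Trail bytes are 0x40..0x7E or 0xA1..0xFE; anything else yields no pointer.
  const inLow = byte >= 0x40 && byte <= 0x7e;
  const inHigh = byte >= 0xa1 && byte <= 0xfe;
  if (lead < 0x81 || lead > 0xfe || !(inLow || inHigh)) return null;
  const offset = byte < 0x7f ? 0x40 : 0x62;
  return (lead - 0x81) * 157 + (byte - offset);
}

function big5BytesFromPointer(pointer) {
  const lead = Math.floor(pointer / 157) + 0x81;
  const trail = pointer % 157;
  const offset = trail < 0x3f ? 0x40 : 0x62;
  return [lead, trail + offset];
}

// The four pointers that decode to two code points rather than one.
const BIG5_TWO_CODE_POINTS = new Map([
  [1133, "\u00CA\u0304"],
  [1135, "\u00CA\u030C"],
  [1164, "\u00EA\u0304"],
  [1166, "\u00EA\u030C"],
]);

console.log(big5PointerFromBytes(0x88, 0x62)); // 1133
console.log(big5BytesFromPointer(1164).map((b) => b.toString(16)).join(" ")); // 88 a3
```

 <p>The sketch is pure arithmetic. Which pointers the encoder can actually produce depends on the
 index Big5 pointer operation, which is not modeled here.
</div>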


<h2 id=legacy-multi-byte-japanese-encodings>Legacy multi-byte Japanese encodings</h2>

<h3 id=euc-jp dfn export>EUC-JP</h3>
<!-- https://www.iana.org/assignments/charset-reg/CP51932 -->

<h4 id=euc-jp-decoder dfn export>EUC-JP decoder</h4>

<p><a>EUC-JP</a>'s <a for=/>decoder</a> has an associated
<dfn id=euc-jp-jis0212-flag>EUC-JP jis0212</dfn> (initially false) and
<dfn>EUC-JP lead</dfn> (initially 0x00).

<p><a>EUC-JP</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>EUC-JP lead</a> is not 0x00, set <a>EUC-JP lead</a> to 0x00, and return
 <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>EUC-JP lead</a> is 0x00, return <a>finished</a>.

 <li><p>If <a>EUC-JP lead</a> is 0x8E and <var>byte</var> is
 in the range 0xA1 to 0xDF, inclusive, set <a>EUC-JP lead</a> to 0x00 and return
 a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
 <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

 <li><p>If <a>EUC-JP lead</a> is 0x8F and <var>byte</var> is in the range
 0xA1 to 0xFE, inclusive, set <a>EUC-JP jis0212</a> to true, set
 <a>EUC-JP lead</a> to <var>byte</var>, and return <a>continue</a>.

 <li>
  <p>If <a>EUC-JP lead</a> is not 0x00, let <var>lead</var> be <a>EUC-JP lead</a>, set
  <a>EUC-JP lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>code point</var> be null.

   <li><p>If <var>lead</var> and <var>byte</var> are both in the
   range 0xA1 to 0xFE, inclusive, set <var>code point</var> to the
   <a>index code point</a> for
   (<var>lead</var> &minus; 0xA1) × 94 + <var>byte</var> &minus; 0xA1
   in <a>index jis0208</a> if <a>EUC-JP jis0212</a> is false and in
   <a>index jis0212</a> otherwise.

   <li><p>Set <a>EUC-JP jis0212</a> to false.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is 0x8E, 0x8F, or in the range 0xA1 to
 0xFE, inclusive, set <a>EUC-JP lead</a> to <var>byte</var> and return
 <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>


<h4 id=euc-jp-encoder dfn export>EUC-JP encoder</h4>

<p><a>EUC-JP</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

 <li><p>If <var>code point</var> is U+203E, return byte 0x7E.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, return
 two bytes whose values are 0x8E and <var>code point</var> &minus; 0xFF61 + 0xA1.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li>
  <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
  <a>index jis0208</a>.

  <p class=note>If <var>pointer</var> is non-null, it is less than 8836 due to the nature of
  <a>index jis0208</a> and the <a>index pointer</a> operation.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 94 + 0xA1.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 94 + 0xA1.

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var>.
</ol>
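
<div class=example id=example-euc-jp-byte-layout>
 <p>The byte layout used by the <a>EUC-JP decoder</a> and <a>EUC-JP encoder</a> can be sketched in
 JavaScript as follows. This is a non-normative illustration: the function names are hypothetical,
 and the <a>index jis0208</a> and <a>index jis0212</a> lookups are omitted, so only the conversion
 between bytes and pointers is shown.

```javascript
// Sketch of the EUC-JP byte arithmetic; index lookups are intentionally left out.
// All function names are hypothetical.

// Half-width katakana: 0x8E followed by 0xA1..0xDF maps to U+FF61..U+FF9F.
function eucJpKatakanaCodePoint(byte) {
  return 0xff61 - 0xa1 + byte;
}

function eucJpKatakanaBytes(codePoint) {
  return [0x8e, codePoint - 0xff61 + 0xa1];
}

// Two-byte sequences: both bytes are in 0xA1..0xFE, with 94 cells per row.
function eucJpPointerFromBytes(lead, byte) {
  const inRange = (b) => b >= 0xa1 && b <= 0xfe;
  if (!inRange(lead) || !inRange(byte)) return null;
  return (lead - 0xa1) * 94 + byte - 0xa1;
}

function eucJpBytesFromPointer(pointer) {
  return [Math.floor(pointer / 94) + 0xa1, (pointer % 94) + 0xa1];
}

console.log(eucJpBytesFromPointer(0).map((b) => b.toString(16)).join(" ")); // a1 a1
console.log(eucJpKatakanaCodePoint(0xdf).toString(16)); // ff9f
```
</div>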


<h3 id=iso-2022-jp dfn export>ISO-2022-JP</h3>
<!--
 https://tools.ietf.org/html/rfc1468
 https://tools.ietf.org/html/rfc2237 (ISO-2022-JP-1; not used)
 "ESC ) I" is from ISO-2022-JP-3 reportedly
-->

<h4 id=iso-2022-jp-decoder dfn export>ISO-2022-JP decoder</h4>

<p><a>ISO-2022-JP</a>'s <a for=/>decoder</a> has an associated
<dfn>ISO-2022-JP decoder state</dfn> (initially
<a lt="ISO-2022-JP decoder ASCII">ASCII</a>),
<dfn>ISO-2022-JP decoder output state</dfn> (initially
<a lt="ISO-2022-JP decoder ASCII">ASCII</a>),
<dfn>ISO-2022-JP lead</dfn> (initially 0x00), and
<dfn id=iso-2022-jp-output-flag>ISO-2022-JP output</dfn> (initially false).

<p><a>ISO-2022-JP</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps, switching on
<a>ISO-2022-JP decoder state</a>:

<dl class=switch>
 <dt><dfn lt="ISO-2022-JP decoder ASCII">ASCII</dfn>
 <dd>
  <p>Based on <var>byte</var>:

  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, and 0x1B
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is <var>byte</var>.

   <dt><a>end-of-stream</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder Roman">Roman</dfn>
 <dd>
  <p>Based on <var>byte</var>:

  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x5C
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5.

   <dt>0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E.

   <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is <var>byte</var>.

   <dt><a>end-of-stream</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder katakana">katakana</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x21 to 0x5F
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is 0xFF61 &minus; 0x21 + <var>byte</var>.
   <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

   <dt><a>end-of-stream</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder lead byte">Lead byte</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x21 to 0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP lead</a> to <var>byte</var>,
   <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder trail byte">trail byte</a>, and return
   <a>continue</a>.

   <dt><a>end-of-stream</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder trail byte">Trail byte</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>error</a>.
   <!-- ISO-2022-JP decoder output state is still lead byte -->

   <dt>0x21 to 0x7E
   <dd>
    <ol>
     <li><p>Set <a>ISO-2022-JP decoder state</a> to
     <a lt="ISO-2022-JP decoder lead byte">lead byte</a>.

     <li><p>Let <var>pointer</var> be
     (<a>ISO-2022-JP lead</a> &minus; 0x21) × 94 + <var>byte</var> &minus; 0x21.

     <li><p>Let <var>code point</var> be the <a>index code point</a> for
     <var>pointer</var> in <a>index jis0208</a>.

     <li><p>If <var>code point</var> is null, return <a>error</a>.

     <li><p>Return a code point whose value is <var>code point</var>.
    </ol>

   <dt><a>end-of-stream</a>
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a>,
   <a>prepend</a> <var>byte</var> to
   <var>stream</var>, and return <a>error</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a> and return
   <a>error</a>.
   <!-- ISO-2022-JP decoder output state is still lead byte -->
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder escape start">Escape start</dfn>
 <dd>
  <ol>
   <li><p>If <var>byte</var> is either <!--$-->0x24 or <!--(-->0x28, set
   <a>ISO-2022-JP lead</a> to <var>byte</var>,
   <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape">escape</a>, and return
   <a>continue</a>.

   <li><p><a>Prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP decoder state</a> to
   <a>ISO-2022-JP decoder output state</a>, and return <a>error</a>.
  </ol>

 <dt><dfn lt="ISO-2022-JP decoder escape">Escape</dfn>
 <dd>
  <ol>
   <li><p>Let <var>lead</var> be <a>ISO-2022-JP lead</a> and set
   <a>ISO-2022-JP lead</a> to 0x00.

   <li><p>Let <var>state</var> be null.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x42<!--B-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder ASCII">ASCII</a>.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x4A<!--J-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder Roman">Roman</a>.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x49<!--I-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder katakana">katakana</a>.

   <li><p>If <var>lead</var> is 0x24 and <var>byte</var> is either
   0x40<!--@--> or 0x42<!--B-->, set <var>state</var> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a>.

   <li>
    <p>If <var>state</var> is non-null, then:

    <ol>
     <li><p>Set <a>ISO-2022-JP decoder state</a> and
     <a>ISO-2022-JP decoder output state</a> to <var>state</var>.

     <li><p>Let <var>output</var> be the value of <a>ISO-2022-JP output</a>.

     <li><p>Set <a>ISO-2022-JP output</a> to true.

     <li><p>Return <a>continue</a>, if <var>output</var> is false, and
     <a>error</a> otherwise.
    </ol>

   <li><p><a>Prepend</a>
   <var>lead</var> and <var>byte</var> to <var>stream</var>.

   <li><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP decoder state</a> to <a>ISO-2022-JP decoder output state</a>,
   and return <a>error</a>.
  </ol>
</dl>
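
<div class=example id=example-iso-2022-jp-decoder-sketch>
 <p>The state machine above can be sketched in JavaScript as follows. This is a non-normative
 illustration under stated assumptions: <code>decodeIso2022Jp</code> and its
 <code>lookupJis0208</code> parameter are hypothetical names, the <a>index jis0208</a> lookup is
 supplied by the caller (by default every lookup fails), every <a>error</a> is emitted as U+FFFD,
 and end-of-stream is modeled as an empty queue rather than as a value that can be prepended.

```javascript
// Non-normative sketch of the decoder state machine, emitting U+FFFD for every error.
// lookupJis0208 is a hypothetical stand-in for the index jis0208 lookup.
function decodeIso2022Jp(input, lookupJis0208 = () => null) {
  const queue = Array.from(input); // unread bytes; unshift() plays the role of prepend
  const out = [];                  // decoded code points
  let state = "ascii", outputState = "ascii", lead = 0x00, outputFlag = false;
  const error = () => out.push(0xfffd);

  for (;;) {
    const atEnd = queue.length === 0;
    const byte = atEnd ? -1 : queue.shift(); // -1 stands for end-of-stream
    switch (state) {
      case "ascii": case "roman": case "katakana": case "lead":
        if (byte === 0x1b) { state = "escape start"; break; }
        if (atEnd) return out;
        outputFlag = false;
        if (state === "lead") {
          if (byte >= 0x21 && byte <= 0x7e) { lead = byte; state = "trail"; }
          else error();
        } else if (state === "katakana") {
          if (byte >= 0x21 && byte <= 0x5f) out.push(0xff61 - 0x21 + byte);
          else error();
        } else if (byte > 0x7f || byte === 0x0e || byte === 0x0f) {
          error();
        } else if (state === "roman" && (byte === 0x5c || byte === 0x7e)) {
          out.push(byte === 0x5c ? 0xa5 : 0x203e);
        } else {
          out.push(byte);
        }
        break;
      case "trail": {
        if (byte === 0x1b) { state = "escape start"; error(); break; }
        state = "lead";
        const inRange = byte >= 0x21 && byte <= 0x7e; // false at end-of-stream
        const codePoint = inRange ? lookupJis0208((lead - 0x21) * 94 + byte - 0x21) : null;
        if (codePoint === null) error(); else out.push(codePoint);
        break;
      }
      case "escape start":
        if (byte === 0x24 || byte === 0x28) { lead = byte; state = "escape"; break; }
        if (!atEnd) queue.unshift(byte);
        outputFlag = false; state = outputState; error();
        break;
      case "escape": {
        const escapeLead = lead;
        lead = 0x00;
        let next = null;
        if (escapeLead === 0x28 && byte === 0x42) next = "ascii";
        if (escapeLead === 0x28 && byte === 0x4a) next = "roman";
        if (escapeLead === 0x28 && byte === 0x49) next = "katakana";
        if (escapeLead === 0x24 && (byte === 0x40 || byte === 0x42)) next = "lead";
        if (next !== null) {
          state = outputState = next;
          if (outputFlag) error(); // previous escape sequence produced no output
          outputFlag = true;
          break;
        }
        if (!atEnd) queue.unshift(byte);
        queue.unshift(escapeLead);
        outputFlag = false; state = outputState; error();
        break;
      }
    }
  }
}

const twice = [0x1b, 0x28, 0x4a, 0x5c, 0x1b, 0x28, 0x42, 0x1b, 0x28, 0x4a, 0x5c, 0x1b, 0x28, 0x42];
const hex = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
console.log(decodeIso2022Jp(twice).map(hex).join(" ")); // U+00A5 U+FFFD U+00A5
```

 <p>The sample input is the byte sequence discussed in the <a>ISO-2022-JP encoder</a> section: two
 escape sequences in a row with no output between them produce an error.
</div>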


<h4 id=iso-2022-jp-encoder dfn export>ISO-2022-JP encoder</h4>

<div class="note no-backref">
 <p>The <a>ISO-2022-JP encoder</a> is the only <a for=/>encoder</a> for which the concatenation of
 multiple outputs can result in an <a>error</a> when run through the corresponding
 <a for=/>decoder</a>.

 <p class=example id=example-iso-2022-jp-encoder-oddity>Encoding U+00A5 gives 0x1B 0x28 0x4A 0x5C
 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 U+FFFD
 U+00A5.
</div>

<p><a>ISO-2022-JP</a>'s <a for=/>encoder</a> has an associated
<dfn>ISO-2022-JP encoder state</dfn> which is <dfn lt="ISO-2022-JP encoder ASCII">ASCII</dfn>,
<dfn lt="ISO-2022-JP encoder Roman">Roman</dfn>, or
<dfn lt="ISO-2022-JP encoder jis0208">jis0208</dfn> (initially
<a lt="ISO-2022-JP encoder ASCII">ASCII</a>).

<p><a>ISO-2022-JP</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a> and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>,
 <a>prepend</a> <var>code point</var> to
 <var>stream</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes
 0x1B 0x28 0x42.

 <li><p>If <var>code point</var> is <a>end-of-stream</a> and
 <a>ISO-2022-JP encoder state</a> is
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, return <a>finished</a>.

 <li>
  <p>If <a>ISO-2022-JP encoder state</a> is
  <a lt="ISO-2022-JP encoder ASCII">ASCII</a> or
  <a lt="ISO-2022-JP encoder Roman">Roman</a>, and <var>code point</var> is U+000E, U+000F,
  or U+001B, return <a>error</a> with U+FFFD.

  <p class=note>This returns U+FFFD rather than <var>code point</var> to prevent attacks.
  <!-- https://github.com/whatwg/encoding/issues/15 -->

 <li><p>If <a>ISO-2022-JP encoder state</a> is
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a> and <var>code point</var> is an
 <a>ASCII code point</a>, return a byte whose value is <var>code point</var>.

 <li>
  <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder Roman">Roman</a> and
  <var>code point</var> is an <a>ASCII code point</a>, excluding U+005C and U+007E, or is U+00A5 or
  U+203E, then:

  <ol>
   <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return a byte
   whose value is <var>code point</var>.

   <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

   <li><p>If <var>code point</var> is U+203E, return byte 0x7E.
  </ol>

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>,
 <a>prepend</a> <var>code point</var> to
 <var>stream</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes
 0x1B 0x28 0x42.

 <li><p>If <var>code point</var> is either U+00A5 or U+203E, and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder Roman">Roman</a>,
 <a>prepend</a> <var>code point</var> to
 <var>stream</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder Roman">Roman</a>, and return three bytes
 0x1B 0x28 0x4A.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, set it to the
 <a>index code point</a> for <var>code point</var> &minus; 0xFF61 in
 <a>index ISO-2022-JP katakana</a>.

 <li>
  <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
  <a>index jis0208</a>.

  <p class=note>If <var>pointer</var> is non-null, it is less than 8836 due to the nature of
  <a>index jis0208</a> and the <a>index pointer</a> operation.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>If <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder jis0208">jis0208</a>,
 <a>prepend</a> <var>code point</var> to
 <var>stream</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder jis0208">jis0208</a>, and return three bytes
 0x1B 0x24 0x42.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 94 + 0x21.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 94 + 0x21.

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var>.
</ol>
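
<div class=example id=example-iso-2022-jp-encoder-sketch>
 <p>The encoder steps above can be sketched in JavaScript as follows. This is a non-normative
 illustration under stated assumptions: <code>encodeIso2022Jp</code>,
 <code>jis0208Pointer</code>, and <code>katakanaIndex</code> are hypothetical names, the index
 lookups are supplied by the caller (by default they always return null), and code points
 reported through <a>error</a> are collected in a list instead of being handled by an error mode.

```javascript
// Non-normative sketch of the encoder. jis0208Pointer and katakanaIndex are hypothetical
// stand-ins for the "index pointer" operation on index jis0208 and for
// index ISO-2022-JP katakana; both return null by default.
function encodeIso2022Jp(codePoints, jis0208Pointer = () => null, katakanaIndex = () => null) {
  const queue = Array.from(codePoints); // unread code points; unshift() acts as prepend
  const bytes = [];
  const errors = []; // code points reported through "error with ..."
  let state = "ascii";
  const isAscii = (cp) => cp <= 0x7f;

  for (;;) {
    if (queue.length === 0) {
      // End-of-stream: always finish in the ASCII state.
      if (state !== "ascii") { state = "ascii"; bytes.push(0x1b, 0x28, 0x42); }
      return { bytes, errors };
    }
    let cp = queue.shift();
    const isControl = cp === 0x0e || cp === 0x0f || cp === 0x1b;
    if ((state === "ascii" || state === "roman") && isControl) {
      errors.push(0xfffd);
      continue;
    }
    if (state === "ascii" && isAscii(cp)) { bytes.push(cp); continue; }
    if (state === "roman") {
      if (isAscii(cp) && cp !== 0x5c && cp !== 0x7e) { bytes.push(cp); continue; }
      if (cp === 0xa5) { bytes.push(0x5c); continue; }
      if (cp === 0x203e) { bytes.push(0x7e); continue; }
    }
    if (isAscii(cp) && state !== "ascii") {
      queue.unshift(cp); state = "ascii"; bytes.push(0x1b, 0x28, 0x42);
      continue;
    }
    if ((cp === 0xa5 || cp === 0x203e) && state !== "roman") {
      queue.unshift(cp); state = "roman"; bytes.push(0x1b, 0x28, 0x4a);
      continue;
    }
    if (cp === 0x2212) cp = 0xff0d;
    if (cp >= 0xff61 && cp <= 0xff9f) cp = katakanaIndex(cp - 0xff61);
    const pointer = jis0208Pointer(cp);
    if (pointer === null) { errors.push(cp); continue; }
    if (state !== "jis0208") {
      queue.unshift(cp); state = "jis0208"; bytes.push(0x1b, 0x24, 0x42);
      continue;
    }
    bytes.push(Math.floor(pointer / 94) + 0x21, (pointer % 94) + 0x21);
  }
}

const yen = encodeIso2022Jp([0xa5]).bytes;
console.log(yen.map((b) => b.toString(16).padStart(2, "0")).join(" ")); // 1b 28 4a 5c 1b 28 42
```
</div>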


<h3 id=shift_jis dfn export>Shift_JIS</h3>

<h4 id=shift_jis-decoder dfn export>Shift_JIS decoder</h4>

<p><a>Shift_JIS</a>'s <a for=/>decoder</a> has an associated
<dfn>Shift_JIS lead</dfn> (initially 0x00).

<p><a>Shift_JIS</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>Shift_JIS lead</a> is not 0x00, set <a>Shift_JIS lead</a> to 0x00 and
 return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>Shift_JIS lead</a> is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>Shift_JIS lead</a> is not 0x00, let <var>lead</var> be <a>Shift_JIS lead</a>, let
  <var>pointer</var> be null, set <a>Shift_JIS lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>offset</var> be 0x40, if <var>byte</var> is
   less than 0x7F, and 0x41 otherwise.

   <li><p>Let <var>lead offset</var> be 0x81, if <var>lead</var>
   is less than 0xA0, and 0xC1 otherwise.

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0x80 to 0xFC, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; <var>lead offset</var>) × 188 + <var>byte</var> &minus; <var>offset</var>.

   <li>
    <p>If <var>pointer</var> is in the range 8836 to 10715, inclusive, return a code point whose
    value is 0xE000 &minus; 8836 + <var>pointer</var>.
    <!-- subtraction is done first to avoid upsetting compilers -->

    <p class=note>This is interoperable legacy from Windows known as EUDC.
    <!-- PUA -->

   <li><p>Let <var>code point</var> be null, if
   <var>pointer</var> is null, and the <a>index code point</a>
   for <var>pointer</var> in <a>index jis0208</a> otherwise.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a> or 0x80, return a code point
 whose value is <var>byte</var>.
 <!-- Opera has 0x7E -->

 <li><p>If <var>byte</var> is in the range 0xA1 to 0xDF, inclusive, return
 a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
 <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

 <li><p>If <var>byte</var> is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC,
 inclusive, set <a>Shift_JIS lead</a> to <var>byte</var> and return
 <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
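
<div class=example id=example-shift_jis-decoder-arithmetic>
 <p>The arithmetic in the decoder steps above can be sketched in JavaScript as follows. This is a
 non-normative illustration: the function names are hypothetical and the <a>index jis0208</a>
 lookup is omitted, so only the pointer computation, the EUDC mapping, and the single-byte cases
 are shown.

```javascript
// Sketch of the Shift_JIS decoder arithmetic; no index lookup is performed.
// All function names are hypothetical.

function shiftJisPointerFromBytes(lead, byte) {
  const validLead = (lead >= 0x81 && lead <= 0x9f) || (lead >= 0xe0 && lead <= 0xfc);
  const validTrail = (byte >= 0x40 && byte <= 0x7e) || (byte >= 0x80 && byte <= 0xfc);
  if (!validLead || !validTrail) return null;
  const offset = byte < 0x7f ? 0x40 : 0x41;
  const leadOffset = lead < 0xa0 ? 0x81 : 0xc1;
  return (lead - leadOffset) * 188 + byte - offset;
}

// Pointers 8836..10715 map directly to U+E000..U+E757 (EUDC).
function shiftJisEudcCodePoint(pointer) {
  if (pointer === null || pointer < 8836 || pointer > 10715) return null;
  return 0xe000 - 8836 + pointer;
}

// Bytes that decode on their own: ASCII bytes, 0x80, and half-width katakana.
function shiftJisSingleByteCodePoint(byte) {
  if (byte <= 0x80) return byte;
  if (byte >= 0xa1 && byte <= 0xdf) return 0xff61 - 0xa1 + byte;
  return null; // a lead byte or an error
}

console.log(shiftJisEudcCodePoint(shiftJisPointerFromBytes(0xf0, 0x40)).toString(16)); // e000
```
</div>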


<h4 id=shift_jis-encoder dfn export>Shift_JIS encoder</h4>

<p><a>Shift_JIS</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a> or U+0080, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

 <li><p>If <var>code point</var> is U+203E, return byte 0x7E.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, return
 a byte whose value is <var>code point</var> &minus; 0xFF61 + 0xA1.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li><p>Let <var>pointer</var> be the <a>index Shift_JIS pointer</a> for
 <var>code point</var>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 188.

 <li><p>Let <var>lead offset</var> be 0x81, if <var>lead</var> is
 less than 0x1F, and 0xC1 otherwise.
 <!-- 0xA0-0x81 -->

 <li><p>Let <var>trail</var> be <var>pointer</var> % 188.

 <li><p>Let <var>offset</var> be 0x40, if <var>trail</var> is
 less than 0x3F, and 0x41 otherwise.

 <li><p>Return two bytes whose values are
 <var>lead</var> + <var>lead offset</var> and
 <var>trail</var> + <var>offset</var>.
</ol>
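
<div class=example id=example-shift_jis-encoder-arithmetic>
 <p>The encoder steps above can be sketched in JavaScript as follows. This is a non-normative
 illustration: the function names are hypothetical, and the index Shift_JIS pointer operation is
 not modeled, so the caller is assumed to already have a pointer.

```javascript
// Sketch of the Shift_JIS encoder arithmetic. All function names are hypothetical.

// Code points that encode to a single byte without consulting an index.
function shiftJisSingleByteFor(codePoint) {
  if (codePoint <= 0x80) return codePoint; // ASCII code point or U+0080
  if (codePoint === 0xa5) return 0x5c;
  if (codePoint === 0x203e) return 0x7e;
  if (codePoint >= 0xff61 && codePoint <= 0xff9f) return codePoint - 0xff61 + 0xa1;
  return null; // needs a pointer lookup instead
}

// Converts a pointer to its lead and trail bytes.
function shiftJisBytesFromPointer(pointer) {
  const lead = Math.floor(pointer / 188);
  const leadOffset = lead < 0x1f ? 0x81 : 0xc1;
  const trail = pointer % 188;
  const offset = trail < 0x3f ? 0x40 : 0x41;
  return [lead + leadOffset, trail + offset];
}

console.log(shiftJisBytesFromPointer(63).map((b) => b.toString(16)).join(" ")); // 81 80
```

 <p>The two offsets skip the byte values that cannot appear in those positions: 0x7F as a trail
 byte, and 0xA0 to 0xDF as a lead byte.
</div>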


<h2 id=legacy-multi-byte-korean-encodings>Legacy multi-byte Korean encodings</h2>

<h3 id=euc-kr dfn export>EUC-KR</h3>

<h4 id=euc-kr-decoder dfn export>EUC-KR decoder</h4>

<p><a>EUC-KR</a>'s <a for=/>decoder</a> has an associated
<dfn>EUC-KR lead</dfn> (initially 0x00).

<p><a>EUC-KR</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>EUC-KR lead</a> is not 0x00, set <a>EUC-KR lead</a> to 0x00
 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>EUC-KR lead</a> is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>EUC-KR lead</a> is not 0x00, let <var>lead</var> be <a>EUC-KR lead</a>, let
  <var>pointer</var> be null, set <a>EUC-KR lead</a> to 0x00, and then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x41 to 0xFE, inclusive, set
   <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 190 + (<var>byte</var> &minus; 0x41).

   <li><p>Let <var>code point</var> be null, if <var>pointer</var> is null,
   and the <a>index code point</a> for <var>pointer</var> in
   <a>index EUC-KR</a> otherwise.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>stream</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>EUC-KR lead</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>


<h4 id=euc-kr-encoder dfn export>EUC-KR encoder</h4>

<p><a>EUC-KR</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index EUC-KR</a>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 190 + 0x81.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 190 + 0x41.

 <li><p>Return two bytes whose values are <var>lead</var> and <var>trail</var>.
</ol>
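
<div class=example id=example-euc-kr-pointer-arithmetic>
 <p>The pointer arithmetic used by the <a>EUC-KR decoder</a> and <a>EUC-KR encoder</a> can be
 sketched in JavaScript as follows. This is a non-normative illustration: the function names are
 hypothetical and the <a>index EUC-KR</a> lookup is omitted.

```javascript
// Sketch of the EUC-KR pointer arithmetic; no index lookup is performed.
// Both function names are hypothetical.

function eucKrPointerFromBytes(lead, byte) {
  if (lead < 0x81 || lead > 0xfe) return null; // not a lead byte
  if (byte < 0x41 || byte > 0xfe) return null; // not a trail byte
  return (lead - 0x81) * 190 + (byte - 0x41);
}

function eucKrBytesFromPointer(pointer) {
  return [Math.floor(pointer / 190) + 0x81, (pointer % 190) + 0x41];
}

console.log(eucKrPointerFromBytes(0xfe, 0xfe)); // 23939
```
</div>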


<h2 id=legacy-miscellaneous-encodings>Legacy miscellaneous encodings</h2>

<h3 id=replacement dfn export>replacement</h3>

<p class=note>The <a>replacement</a> <a for=/>encoding</a> exists to prevent certain
attacks that abuse a mismatch between <a for=/>encodings</a> supported on
the server and the client.


<h4 id=replacement-decoder dfn export>replacement decoder</h4>

<p><a>replacement</a>'s <a for=/>decoder</a> has an associated
<dfn id=replacement-error-returned-flag>replacement error returned</dfn> (initially false).

<p><a>replacement</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a>, return <a>finished</a>.

 <li><p>If <a>replacement error returned</a> is false, set
 <a>replacement error returned</a> to true and return <a>error</a>.

 <li><p>Return <a>finished</a>.
</ol>
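
<div class=example id=example-replacement-decoder-sketch>
 <p>The effect of these steps is that any non-empty input decodes to a single error and an empty
 input decodes to nothing. A minimal non-normative sketch in JavaScript, assuming that errors are
 emitted as U+FFFD and using hypothetical names throughout:

```javascript
// Sketch of the replacement decoder's handler. null stands for end-of-stream.
class ReplacementDecoderHandler {
  constructor() {
    this.errorReturned = false; // the "replacement error returned" flag
  }

  handle(byte) {
    if (byte === null) return "finished";
    if (!this.errorReturned) {
      this.errorReturned = true;
      return "error";
    }
    return "finished";
  }
}

// Runs the handler over the input, emitting U+FFFD for each error.
function decodeWithReplacement(bytes) {
  const handler = new ReplacementDecoderHandler();
  let output = "";
  for (const byte of [...bytes, null]) {
    const result = handler.handle(byte);
    if (result === "finished") break;
    if (result === "error") output += "\uFFFD";
  }
  return output;
}

console.log(decodeWithReplacement([0x41, 0x42, 0x43]).length); // 1
console.log(decodeWithReplacement([]).length); // 0
```
</div>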


<h3 id=common-infrastructure-for-utf-16be-and-utf-16le>Common infrastructure for <a>UTF-16BE</a> and <a>UTF-16LE</a></h3>

<h4 id=shared-utf-16-decoder dfn export>shared UTF-16 decoder</h4>

<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it
has been found to be more accurate in deployed content. Therefore it is not part of the
<a>shared UTF-16 decoder</a> algorithm but rather the <a>decode</a> algorithm.

<p><a>shared UTF-16 decoder</a> has an associated <dfn>UTF-16 lead byte</dfn> and
<dfn>UTF-16 lead surrogate</dfn> (both initially null), and
<dfn id=utf-16be-decoder-flag>is UTF-16BE decoder</dfn> (initially false).

<p><a>shared UTF-16 decoder</a>'s <a>handler</a>, given a <var>stream</var>
and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a> and either
 <a>UTF-16 lead byte</a> or <a>UTF-16 lead surrogate</a> is non-null, set
 <a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> to null, and return
 <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-stream</a> and
 <a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> are null, return
 <a>finished</a>.

 <li><p>If <a>UTF-16 lead byte</a> is null, set <a>UTF-16 lead byte</a> to
 <var>byte</var> and return <a>continue</a>.

 <li>
  <p>Let <var>code unit</var> be the result of:

  <dl class=switch>
   <dt><a>is UTF-16BE decoder</a> is true
   <dd><p>(<a>UTF-16 lead byte</a> &lt;&lt; 8) + <var>byte</var>.
   <dt><a>is UTF-16BE decoder</a> is false
   <dd><p>(<var>byte</var> &lt;&lt; 8) + <a>UTF-16 lead byte</a>.
  </dl>

  <p>Then set <a>UTF-16 lead byte</a> to null.

 <li>
  <p>If <a>UTF-16 lead surrogate</a> is non-null, let <var>lead surrogate</var> be
  <a>UTF-16 lead surrogate</a>, set <a>UTF-16 lead surrogate</a> to null, and then:

  <ol>
   <li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
   return a code point whose value is
   0x10000 + ((<var>lead surrogate</var> &minus; 0xD800) &lt;&lt; 10) + (<var>code unit</var> &minus; 0xDC00).

   <li><p>Let <var>byte1</var> be <var>code unit</var> >> 8.

   <li><p>Let <var>byte2</var> be <var>code unit</var> &amp; 0x00FF.

   <li><p>Let <var>bytes</var> be two bytes whose values are <var>byte1</var> and <var>byte2</var>,
   if <a>is UTF-16BE decoder</a> is true, and <var>byte2</var> and <var>byte1</var> otherwise.

   <li><p><a>Prepend</a> the <var>bytes</var> to
   <var>stream</var> and return <a>error</a>.
   <!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->
  </ol>

 <li><p>If <var>code unit</var> is in the range U+D800 to U+DBFF, inclusive, set
 <a>UTF-16 lead surrogate</a> to <var>code unit</var> and return
 <a>continue</a>.

 <li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
 return <a>error</a>.
 <!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->

 <li><p>Return code point <var>code unit</var>.
</ol>
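
<div class=example id=example-shared-utf-16-decoder-sketch>
 <p>The steps above can be sketched in JavaScript as follows. This is a non-normative illustration
 under stated assumptions: <code>decodeUtf16</code> is a hypothetical name, errors are emitted as
 U+FFFD, byte order mark handling is left to the caller, and an array stands in for the stream.

```javascript
// Non-normative sketch of the shared UTF-16 decoder, emitting U+FFFD for every error.
function decodeUtf16(bytes, isBigEndian = false) {
  const queue = Array.from(bytes); // unread bytes; unshift() plays the role of prepend
  const out = [];                  // decoded code points
  let leadByte = null;
  let leadSurrogate = null;

  for (;;) {
    if (queue.length === 0) {
      // End-of-stream: a dangling byte or lead surrogate is a single error.
      if (leadByte !== null || leadSurrogate !== null) out.push(0xfffd);
      return String.fromCodePoint(...out);
    }
    const byte = queue.shift();
    if (leadByte === null) { leadByte = byte; continue; }

    const codeUnit = isBigEndian ? (leadByte << 8) + byte : (byte << 8) + leadByte;
    leadByte = null;

    if (leadSurrogate !== null) {
      const lead = leadSurrogate;
      leadSurrogate = null;
      if (codeUnit >= 0xdc00 && codeUnit <= 0xdfff) {
        out.push(0x10000 + ((lead - 0xd800) << 10) + (codeUnit - 0xdc00));
        continue;
      }
      // Unpaired lead surrogate: put the code unit back and report an error.
      const byte1 = codeUnit >> 8;
      const byte2 = codeUnit & 0x00ff;
      if (isBigEndian) queue.unshift(byte1, byte2);
      else queue.unshift(byte2, byte1);
      out.push(0xfffd);
      continue;
    }
    if (codeUnit >= 0xd800 && codeUnit <= 0xdbff) { leadSurrogate = codeUnit; continue; }
    if (codeUnit >= 0xdc00 && codeUnit <= 0xdfff) { out.push(0xfffd); continue; }
    out.push(codeUnit);
  }
}

console.log(decodeUtf16([0x3d, 0xd8, 0x00, 0xde]).codePointAt(0).toString(16)); // 1f600
```

 <p>Spreading the code points into <code>String.fromCodePoint()</code> keeps the sketch short but
 is only suitable for small inputs.
</div>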


<h3 id=utf-16be dfn export>UTF-16BE</h3>

<h4 id=utf-16be-decoder dfn export>UTF-16BE decoder</h4>

<p><a>UTF-16BE</a>'s <a for=/>decoder</a> is <a>shared UTF-16 decoder</a> with
its <a>is UTF-16BE decoder</a> set to true.


<h3 id=utf-16le dfn export>UTF-16LE</h3>

<p class="note no-backref">Both "<code>utf-16</code>" and
"<code>utf-16le</code>" are <a>labels</a> for
<a>UTF-16LE</a> to deal with deployed content.


<h4 id=utf-16le-decoder dfn export>UTF-16LE decoder</h4>

<p><a>UTF-16LE</a>'s <a for=/>decoder</a> is <a>shared UTF-16 decoder</a>.


<h3 id=x-user-defined dfn export>x-user-defined</h3>

<p class=note>While technically this is a <a>single-byte encoding</a>,
it is defined separately as it can be implemented algorithmically.

<!--
This encoding is silly, however, the web depends on it:

https://krijnhoetmer.nl/irc-logs/whatwg/20121003#l-461
https://krijnhoetmer.nl/irc-logs/whatwg/20121010#l-812

https://stackoverflow.com/questions/6986789/why-are-some-bytes-prefixed-with-0xf7-when-using-charset-x-user-defined-with-xm
-->

<h4 id=x-user-defined-decoder dfn export>x-user-defined decoder</h4>

<p><a>x-user-defined</a>'s <a for=/>decoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>Return a code point whose value is 0xF780 + <var>byte</var> &minus; 0x80.
</ol>


<h4 id=x-user-defined-encoder dfn export>x-user-defined encoder</h4>

<p><a>x-user-defined</a>'s <a for=/>encoder</a>'s <a>handler</a>, given a
<var>stream</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-stream</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is in the range U+F780 to U+F7FF, inclusive, return
 a byte whose value is <var>code point</var> &minus; 0xF780 + 0x80.

 <li><p>Return <a>error</a> with <var>code point</var>.
</ol>
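
<div class=example id=example-x-user-defined-mapping>
 <p>The mapping used by the <a>x-user-defined decoder</a> and <a>x-user-defined encoder</a> can be
 sketched in JavaScript as follows. This is a non-normative illustration with hypothetical function
 names; null is used here to represent <a>error</a>.

```javascript
// Sketch of the x-user-defined mapping between bytes and code points.

function xUserDefinedDecodeByte(byte) {
  // ASCII bytes map to themselves; 0x80..0xFF map to U+F780..U+F7FF.
  return byte <= 0x7f ? byte : 0xf780 + byte - 0x80;
}

function xUserDefinedEncodeCodePoint(codePoint) {
  if (codePoint <= 0x7f) return codePoint;
  if (codePoint >= 0xf780 && codePoint <= 0xf7ff) return codePoint - 0xf780 + 0x80;
  return null; // error with code point
}

console.log(xUserDefinedDecodeByte(0xff).toString(16)); // f7ff
console.log((xUserDefinedDecodeByte(0xff) & 0xff).toString(16)); // ff
```

 <p>Because the low eight bits of each decoded code point equal the original byte, masking with
 0xFF recovers the byte without running the encoder.
</div>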


<h2 id=browser-ui>Browser UI</h2>

<p>Browsers are encouraged to not enable overriding the encoding of a resource. If such a
feature is nonetheless present, browsers should not offer either
<a>UTF-16BE</a> or <a>UTF-16LE</a> as an option due to the aforementioned security
issues. Browsers should also disable this feature if the resource was decoded using either
<a>UTF-16BE</a> or <a>UTF-16LE</a>.


<h2 class=no-num id=implementation-considerations>Implementation considerations</h2>

<p>Instead of supporting <a for=/>streams</a> with arbitrary <a for=stream>prepend</a>, the
<a for=/>decoders</a> for <a for=/>encodings</a> in this standard could be implemented with:

<ol>
 <li><p>The ability to unread the current byte.

 <li>
  <p>A single-byte buffer for <a>gb18030</a> (an <a>ASCII byte</a>) and <a>ISO-2022-JP</a> (0x24 or
  0x28).

  <p class=example id=example-gb18030-implementation-strategy>For <a>gb18030</a> when hitting a
  bogus byte while <a>gb18030 third</a> is not 0x00, <a>gb18030 second</a> could be moved into the
  single-byte buffer to be returned next, and <a>gb18030 third</a> would be the new
  <a>gb18030 first</a>, checked for not being 0x00 after the single-byte buffer was returned and
  emptied. This is possible as the range for the first and third byte in <a>gb18030</a> is
  identical.
</ol>

<p>The <a>ISO-2022-JP encoder</a> needs <a>ISO-2022-JP encoder state</a> as additional state, but
other than that, none of the <a for=/>encoders</a> for <a for=/>encodings</a> in this standard
require additional state or buffers.


<h2 class=no-num id=acknowledgments>Acknowledgments</h2>

<p>Many people have helped make encodings more
interoperable over the years and thereby furthered the goals of this
standard. Likewise, many people have helped make this standard what it is
today.

<p>With that, many thanks to
Adam Rice,
Alan Chaney,
Alexander Shtuchkin,
Allen Wirfs-Brock,
Andreu Botella,
Aneesh Agrawal,
Arkadiusz Michalski,
Asmus Freytag,
Ben Noordhuis,
Bnaya Peretz,
Boris Zbarsky,
Bruno Haible,
Cameron McCormack,
Charles McCathieNeville,
Christopher Foo,
David Carlisle,
Domenic Denicola,
Dominique Hazaël-Massieux,
Doug Ewell,
Erik van der Poel,
譚永鋒 (Frank Yung-Fong Tang),
Glenn Maynard,
Gordon P. Hemsley,
Henri Sivonen,
Ian Hickson,
James Graham,
Jeffrey Yasskin,
John Tamplin,
Joshua Bell,
村井純 (Jun Murai),
신정식 (Jungshik Shin),
Jxck,
강 성훈 (Kang Seonghoon),<!-- space is intentional: https://www.w3.org/Bugs/Public/show_bug.cgi?id=27675#c2 -->
川幡太一 (Kawabata Taichi),
Ken Lunde,
Ken Whistler,
Kenneth Russell,
田村健人 (Kent Tamura),
Leif Halvard Silli,
Luke Wagner,
Maciej Hirsz,
Makoto Kato,
Mark Callow,
Mark Crispin,
Mark Davis,
Martin Dürst,
Masatoshi Kimura,
Mattias Buelens,
Ms2ger,
Nigel Megitt,
Nigel Tao,
Norbert Lindenberg,
Øistein E. Andersen,
Peter Krefting,
Philip Jägenstedt,
Philip Taylor,
Richard Ishida,
Robbert Broersma,
Robert Mustacchi,
Ryan Dahl,
Sam Sneddon,
Shawn Steele,
Simon Montagu,
Simon Pieters,
Simon Sapin,
寺田健 (Takeshi Terada),
Vyacheslav Matva, and
成瀬ゆい (Yui Naruse)
for being awesome.

<p>This standard is written by
<a href=https://annevankesteren.nl/ lang=nl>Anne van Kesteren</a>
(<a href=https://www.mozilla.org/>Mozilla</a>,
<a href=mailto:annevk@annevk.nl>annevk@annevk.nl</a>). The <a href=#api>API</a> chapter
was initially written by Joshua Bell (<a href=https://www.google.com/>Google</a>).