
\chapter[The CHERI-x86-64 ISA (Sketch)]{The CHERI-x86-64 Instruction-Set Architecture (Sketch)}
\label{chap:cheri-x86-64}

\rwnote{New introduction is required, and some change of pitch.}

In this chapter, we explore models for applying CHERI protection to the x86
architecture.
The x86 architecture is a widely deployed CPU architecture used in a
variety of applications ranging from mobile to high-performance computing.
The architecture has evolved over time from 16-bit processors without
MMUs to present-day systems with 64-bit processors supporting virtual
memory via a combination of segmentation and paging.

The x86 architecture has spanned three register sizes (16, 32, and
64 bits) and multiple memory management models.  We choose to define
CHERI solely for the 64-bit x86 architecture for a variety of reasons
including its more mature virtual-memory model, as well as its larger
general-purpose integer register file.

\section{CHERI-x86-64 Approach}

In applying CHERI to the 64-bit x86 architecture, we aim to provide a
model similar to CHERI-RISC-V and Morello.  This model should have the
following properties:

\begin{itemize}
\item A new capability hardware type that is usable for C language
  pointers.

\item Capability values should be intentionally used.  Instructions
  should explicitly specify whether a register operand should be used as a
  capability or an integer scalar.  Specifically, the presence (or
  lack) of a tag should not determine if a value is treated as a
  capability rather than an integer.

\item While new instructions will be required to manipulate
  capabilities, common code patterns for pure-capability C such as
  function prologues and epilogues should have code density similar
  to that of 64-bit x86.
\end{itemize}

\subsection{Capability Registers versus Segments}

The x86 architecture first added virtual memory support via
relocatable and variable-sized segments.  Each segment was assigned a
mask of permissions.  Memory references were resolved with respect to a
specific segment including relocation to a base address, bounds
checking, and access checks.  Special segment types permitted transitions
to and from different protection domains.

These features are similar to features in CHERI capabilities.
However, there are also some key differences.

First, x86 addresses are stored as a combination of an offset and a
segment spanning two different registers.  General-purpose registers
are used to hold offsets, and dedicated segment selector registers are
used to hold information about a single segment.  The x86 architecture
provides six segment selector registers -- three of which are reserved
for code, stack, and general data accesses.  A fourth register is
typically used to define the location of thread-local storage (TLS).
This leaves two segment registers to use for fine-grained segments
such as separate segments for individual stack variables.  These
registers do not load a segment descriptor from arbitrary locations in
memory.  Instead, each register selects a segment descriptor from a
descriptor table with a limited number of entries.  One could treat
the segment descriptor tables (or portions of these tables) as a cache
of active segments.

Second, more fine-grained segments are not derived from existing
segments.  Instead, each entry in a descriptor table is independent.
Write access to a descriptor table permits construction of arbitrary
segments (including special segments that permit privilege
transitions).  Restricting descriptor-table write access to kernel
mode does not protect against construction of arbitrary segments in
kernel mode due to bugs or vulnerabilities.  As a result, segment
descriptors are not able to provide the same provenance guarantees as
tagged capabilities.

Third, existing segment descriptors do not have available bits for
storing types or permissions more expressive than the existing
read, write, and execute.

Finally, x86 segmentation is typically not used in modern operating
systems.  On the 32-bit x86 architecture, systems generally create
segments with infinite bounds and use a non-zero base address only
for a single segment that provides TLS.  The 64-bit x86 architecture
codifies this by removing segment bounds entirely and supporting non-zero-base
addresses only for two segment registers.
Software for x86 systems stores only the offset portion of virtual
addresses in pointer variables.  Segment registers are set to fixed
values at program startup, never change, and are largely ignored.

One approach for providing a similar set of features to CHERI
capabilities on x86 would be to extend the existing segment primitives
to accommodate some of these differences.  For example, descriptor-table
entries could be tagged, whereby loading an untagged segment would trigger
an exception.  However, some other potential changes are broader in
scope (e.g., whether segment selectors should contain an index into a
table, versus a logical address of a segment descriptor).  Extending
segments would also result in a very different model compared to CHERI
capabilities on other architectures, limiting the ability to share code
and algorithms.  Instead, we propose to add CHERI capabilities to 64-bit
x86 by extending existing general-purpose integer registers.

\subsection{Common Architectural Features}

CHERI-x86-64 shares the following features with other CHERI
architectures:

\begin{itemize}
\item Tagged memory with capability-width tag granularity and alignment.
\item Registers able to hold capabilities are tagged.
\item \CIP{} controls program-counter-relative fetches.
\item \DDC{} controls memory operands using integer addresses.
\item Floating point is fully supported, including capability-relative
  floating-point load and store instructions.
\item General-purpose registers are extended to hold capabilities.
\item It is never left ambiguous as to whether a register operand used
  as the base address of a memory operand or branch target
  is a capability and therefore must have a tag set.
\item \cappermASR limits privileged ISA operations when within
  privileged rings.
\end{itemize}

\subsection{Unique Architectural Features}

The following changes are specific to CHERI-x86-64:

\begin{itemize}
 \item CHERI-x86-64 makes use of opcode prefixes to permit altering
   the addressing mode and operand size of individual instructions,
   both in 64-bit mode and capability mode.
 \item \RIP{} is the full integer value (virtual address) of \CIP{}
   and not \CIP{}.offset.
 \item Integer addresses are treated as absolute virtual addresses
   bounded by \DDC{}, and are not treated as offsets to \DDC{}.base.
 \item x86 exception handling is extended to support capabilities
   including a new architectural stack frame for exception entry and
   return.
 \item A new exception code is used to report CHERI-related
   exceptions.
 \item New PTE bits and page-fault exception code bits are defined for
   loading and storing capabilities in memory.
 \item The \FSBASE{} and \GSBASE{} registers are extended as
   capabilities.
 \item As with CHERI-RISC-V, the \cflags{} field contains a single bit
   used to enable capability mode in code capabilities installed into
   \CIP{}.
 \item Operations on capabilities can set bits in the \RFLAGS{}
   register.
\end{itemize}

\section{CHERI-x86-64 Specification}

\subsection{Tagged Capabilities and Memory}

As with CHERI-RISC-V, we recommend that both memory and
registers contain tagged 128-bit capabilities.
Since capabilities require 16-byte alignment in memory, attempts to
load or store capabilities at misaligned addresses should raise a
General Protection Fault with an error code of zero, similar to
misaligned loads and stores of SSE registers.

\subsection{General-Purpose Capability Registers}

The x86 architecture has expanded its general-purpose integer registers multiple
times.  For example, the 16-bit \AX{} register was extended to the 32-bit \EAX{}
and then the 64-bit \RAX{}.
We propose extending each general-purpose integer register to a tagged, 128-bit register
able to contain a single capability.
The capability-sized registers would be named with a `C' prefix in place
of the `R' prefix used for 64-bit registers
(\CAX{}, \CBX{}, etc.).
As with CHERI-RISC-V,
we recommend that the bottom 64 bits of capability registers contain
the integer value (virtual address) and the upper 64 bits contain
capability metadata.
Reads of capability registers as integers return the integer value.
Integer writes to capability registers
should clear the tag and upper 64 bits of capability metadata, storing the
desired integer value in the bottom 64 bits.
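
As an illustration of these register semantics, the following sketch
uses the assembler syntax of the examples later in this chapter.  The
register choices and the \texttt{movc} spelling of a capability-sized
move are illustrative assumptions rather than a fixed encoding:

\begin{verbatim}
movc (%cbx),%cax   # capability load: CAX holds tag, metadata, address
mov  %rax,%rdx     # integer read: RDX = integer value (address) of CAX
mov  %rdx,%rax     # integer write: tag and upper 64 bits of CAX cleared
\end{verbatim}

After the final instruction, \RAX{} holds the same address as before,
but \CAX{} no longer holds a tagged capability.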

The \RIP{} register (which contains the address of the current
instruction in the existing x86 architecture)
would also be extended into a \CIP{} capability.  This would function as
the equivalent of \PCC{}.

\subsection{Additional Capability Registers}
\label{sec:x86:additional-caps}

Additional capability registers beyond those present in the general-purpose
integer
register set will also be required.

A new register will be required to hold \DDC{} for controlling
non-capability-aware memory accesses.

The x86 architecture currently uses the \FS{} and \GS{} segment selector registers
to provide thread-local storage (TLS).  In the 64-bit x86 architecture,
these selectors are mostly reduced to holding an alternate base address
that is added as an offset to the virtual address of existing instructions.
For CHERI-x86-64 we recommend replacing these segment registers with two
new capability registers: \CFS{} and \CGS{}.

In addition, new capability control registers will be required to
manage user to kernel transitions as described in
Section~\ref{sec:x86:capability-control-registers}.

These additional registers will be stored as a separate bank of
capability registers.  As with other x86 register banks such as
control registers and debug registers, additional capability registers
cannot be used
as operands (with a limited exception for \CFS{} and \CGS{} described
below) in existing instructions.

\subsection{Capability Mode}

As with other CHERI architectures, CHERI-x86-64 should support running existing
x86-64 code, capability-aware code, and hybrid code.  This
requires the architecture to support an additional addressing mode
using capabilities as well as a new operand size for instructions
that use capabilities as operands.
The x86 architecture has supported similar extensions in the past when it was
extended to support 32-bit operation.

When x86 was extended from 16 bits to 32 bits, the architecture
included the ability to run existing 16-bit code without modification
as well as execute individual 16-bit or 32-bit instructions within a
32-bit or 16-bit codebase.  The support for 16-bit versus 32-bit
operation was
split into two categories: operand size and addressing modes.  The
code segment descriptor contains a single-bit `D' flag, which sets the
default operand size and addressing mode.  These attributes can then
be toggled to the non-default setting via opcode prefixes.  The 0x66
prefix is used to toggle the operand size, and the 0x67 prefix is used
to toggle the addressing mode.

In 64-bit (``long'') mode, the `D' flag is always set to
0 to indicate 32-bit operands and 64-bit addressing.  A value of
1 for `D' is reserved.  The 0x67 opcode prefix is used to toggle
between 32-bit and 64-bit addresses, but a few other single-byte opcodes
are invalid in 64-bit mode and could be repurposed as a prefix.

For CHERI support, we propose a similar scheme of using a default
execution mode along with prefixes to toggle the individual addressing
mode and operand size of individual instructions.  We define a new
\textbf{capability mode}.  As with CHERI-RISC-V, this mode is enabled
by setting the low bit of the \cflags{} field in \CIP{}.  This mode is
valid only in 64-bit mode.  A far call or jump that uses a 32-bit
code segment along with a target code capability with this flag set
will raise a General Protection Fault with the error code set to the
target segment selector.

In capability mode, instructions will use capability-aware addressing
(Section~\ref{sec:x86:capability-addressing}) by default.  Some existing
opcodes will also assume a capability sized operand in this mode.
Finally, instructions which work with the stack would use \CSP{} as
the implicit stack pointer.

\subsubsection{Removed Instructions in Capability Mode}

In capability mode, the following 64-bit mode instructions would no
longer be valid:

\begin{itemize}
  \item \insnnoref{PUSH FS}
  \item \insnnoref{POP FS}
  \item \insnnoref{PUSH GS}
  \item \insnnoref{POP GS}
  \item \insnnoref{LFS}
  \item \insnnoref{LGS}
  \item \insnnoref{LSS}
  \item \insnnoref{LAR}
  \item \insnnoref{LSL}
  \item Direct memory-offset \insnnoref{MOV}
  \item Far branches (\insnnoref{CALL}, \insnnoref{JMP}, and \insnnoref{RET})
  \item Segment Prefixes for \CS{}, \DS{}, \ES{}, and \SSreg{}
\end{itemize}

\subsection{Using Capabilities with Memory Address Operands}
\label{sec:x86:capability-addressing}

We propose a new capability-aware addressing mode that can be
toggled via a new 0x07
opcode prefix.  (In 32-bit x86, the 0x07 opcode is the
\insnnoref{POP ES} instruction, which is invalid in 64-bit mode.)
In capability mode, instructions will use
the capability-aware addressing mode by default.  Individual
instructions can toggle between capability-aware and ``plain''
64-bit addressing via the 0x07 opcode prefix.  Addresses using the
``plain'' 32-bit or 64-bit addressing will be constrained by \DDC{}
(for example, bounds and permissions).
Instructions using capability-aware addressing
would always use 64-bit virtual addresses.

The 0x07 prefix would be a Group 4 prefix meaning that a single
instruction would not be permitted to use both 0x67 and 0x07 prefixes.
In addition, the use of the 0x67 prefix in capability mode would not
be permitted.

\subsubsection{Capability-Aware Addressing}

For instructions with register-based memory operands, capability-aware
addressing would use the capability version of the register rather
than the integer register as a virtual address constrained by \DDC{}.

For example:

\begin{verbatim}
mov 0x8(%cbp),%rax
\end{verbatim}

would read the 64-bit value at offset 8 from the capability described
by the \CBP{} register.

On the other hand,

\begin{verbatim}
mov 0x8(%rbp),%rax
\end{verbatim}

would read the 64-bit value at the address \RBP{}+8 constraining the
memory access to the bounds and permissions of the \DDC{} capability.
Both instructions would use the same opcode aside from the addition of
a 0x07 opcode prefix.  In capability mode, the second
instruction would require the prefix.  In plain 64-bit mode,
the first instruction would require the prefix.
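
To make the role of the prefix concrete, the following sketch lists
possible byte encodings for the two instructions above.  The bytes
\texttt{48 8b 45 08} are the existing 64-bit encoding of
\texttt{mov 0x8(\%rbp),\%rax}.  Placing 0x07 before the \texttt{REX}
prefix is an assumption that follows from treating it like other
legacy prefixes:

\begin{verbatim}
# 64-bit mode (plain addressing by default)
48 8b 45 08        mov 0x8(%rbp),%rax
07 48 8b 45 08     mov 0x8(%cbp),%rax

# capability mode (capability-aware addressing by default)
48 8b 45 08        mov 0x8(%cbp),%rax
07 48 8b 45 08     mov 0x8(%rbp),%rax
\end{verbatim}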

\subsubsection{Scaled-Index Base Addressing}

x86 also supports an addressing mode that combines the values of two
registers to construct a virtual address known as scaled-index base
addressing.  These addresses use one register, the \emph{base}, and a
second register, the \emph{index}, multiplied by a scaling factor of 1, 2,
4, or 8.  For these addresses, capability-aware addresses would select
a capability for the base register, but the index register would use
the integer value of the register.  For example:

\begin{verbatim}
mov (%rax,%rbx,4),%rcx
\end{verbatim}

This computes an effective address of \RAX{} + \RBX{} * 4 and loads the value
at that address into \RCX{}.  The capability-aware version would be:

\begin{verbatim}
mov (%cax,%rbx,4),%rcx
\end{verbatim}

That is, starting with the \CAX{} capability, \RBX{} * 4 would be added to the
offset, and the resulting address validated against the \CAX{} capability.
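
As a worked example with hypothetical values, suppose \CAX{} has bounds
$[\mathtt{0x1000}, \mathtt{0x1040})$ and an address of
\texttt{0x1000}, and \RBX{} holds 3.  The 8-byte load above then
accesses the address
\[
  \mathtt{0x1000} + 3 \times 4 = \mathtt{0x100c},
\]
and the bytes $[\mathtt{0x100c}, \mathtt{0x1014})$ lie within the bounds
of \CAX{}.  If \RBX{} instead held 16, the address would be
$\mathtt{0x1000} + 16 \times 4 = \mathtt{0x1040}$, which lies outside
the bounds, so the access would fail the bounds check.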

\subsubsection{RIP-Relative Addressing}

The 64-bit x86 architecture added a new addressing mode to make
Position-Independent Code (PIC) more efficient.
This addressing mode uses an immediate offset
relative to the current value of the instruction
pointer.  These addresses are known as \RIP{}-relative addresses.

To support existing code, \RIP{}-relative addresses should be constrained
by \DDC{} when using ``plain'' 64-bit addressing.

When capability-aware addressing is used, \RIP{}-relative addresses
would instead be treated as \CIP{}-relative addresses
constrained by the bounds and permissions of \CIP{}.
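
A sketch of the two forms, assuming an assembler that spells the
capability-aware form with a \texttt{\%cip} base (this spelling is
illustrative and is not defined elsewhere in this chapter):

\begin{verbatim}
mov 0x10(%rip),%rax   # plain addressing: checked against DDC
mov 0x10(%cip),%rax   # capability-aware: checked against CIP
\end{verbatim}

Both forms add the displacement to the address of the next instruction.
Only the capability that authorizes the access differs.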

\subsubsection{Absolute Addresses}

Memory operands can be encoded without a base register, either as an
absolute address, or an absolute address added to a scaled index
register.  If these addresses are not used as offsets relative to
\CFS{} or \CGS{} as described below in Section~\ref{sec:x86:cfs-cgs},
they are always constrained by
\DDC{}, including in capability mode.

\subsubsection{Direct Memory-Offset MOVs}

The direct memory-offset \insnnoref{MOV} instructions store the
absolute address of a memory operand as an immediate operand.
Extending these instructions to support capability immediates would
require padding nops to align the capability immediate as well as text
relocations (even for position-dependent code).  However, we do not
anticipate wide use of these instructions so instead choose to
remove them in capability mode and restrict them to using integer
operands and integer addressing in 64-bit mode.  Encodings of these instructions
that use capability-aware addressing would be reserved and raise a \#UD
exception.

\subsubsection{Addresses Relative to CFS and CGS}
\label{sec:x86:cfs-cgs}

Capability-aware addressing must also permit addresses defined as
offsets relative to \CFS{} and \CGS{} to support TLS with
capability-aware addresses.  When an instruction uses the \FS{} or
\GS{} segment prefix with capability-aware addressing, the memory
operand (registers and displacement) is interpreted as an integer
offset relative to the \CFS{} or \CGS{} capability register,
respectively.

Other segment prefixes are not permitted in capability-aware
addressing.  Attempting to use a segment prefix other than \FS{} or
\GS{} with a capability-aware address should raise an illegal
instruction exception.

\subsubsection{Instructions with Implicit Memory Operands}

Some x86 instructions have implicit memory operands addressed by a
register.  These instructions should support addressing memory with
capabilities.

The ``string''
instructions use \RSI{} as a source address and \RDI{} as a destination address.
For example, the
\insnxesref{STOS} instruction stores the value in \AL{}/\AX{}/\EAX{}/\RAX{} to the address in
\RDI{}, and then either increments or decrements the destination
index register (depending on the Direction Flag).  When capability
addressing mode is enabled,
these string instructions should use \CSI{} instead of \RSI{} and \CDI{} instead of
\RDI{}.

\insnnoref{XLAT} should use \CBX{} as the implicit table address when
using capability-aware addressing.

\subsubsection{Stack Address Size}

Instructions that work with the stack such as \insnxesref{PUSH} or
\insnxesref{CALL} use the stack pointer as an implicit operand.  In
32-bit x86, the `B' flag of the stack segment descriptor determines whether
the 16-bit or 32-bit stack pointer register is used.  In 64-bit long
mode, \RSP{} is always used as the stack pointer.  In capability mode,
\CSP{} would always be used as the stack pointer.

Code that needs to use the alternate stack pointer
interpretation would simulate these instructions using \insnxesref{MOV}
instructions and adjusting the desired stack pointer using
instructions such as \insnxesref{ADD} or \insnxesref{SUB}.  Emulation of
\insnxesref{CALL} or \insnxesref{RET} would use \insnxesref{JMP} to
adjust the instruction pointer.

\subsection{Capability-Aware Instructions}

CHERI-x86-64 will require new instructions to examine and modify
capabilities.  Many of these new instructions can be implemented as
new variants of existing instructions that use an opcode that
specifies a capability operation rather than an integer operation.
Existing x86 toolchains already use instruction suffixes such as
\texttt{b}, \texttt{w}, \texttt{l}, and \texttt{q} to explicitly state
the operand size.  We recommend that the \texttt{c} suffix be used to
explicitly state a capability operand size.

\subsubsection{Capability Operands for Existing Opcodes}

Previous extensions to the x86 architecture have relied on opcode
prefixes combined with the `D' and `L' flags of the current code
segment to determine the operand size.  We propose a similar
scheme for supporting capability-sized operands with existing
opcodes.

First, we propose reusing a single-byte opcode declared invalid in
64-bit mode such as 0x06 (\insnnoref{PUSH ES}) as an opcode prefix
(\textbf{capability operand prefix}).  This prefix would be classified
as a Group 3 prefix meaning that a single instruction would not be
permitted to use both 0x66 and 0x06 prefixes.

When not executing in capability mode, existing instructions will
follow the existing rules for 64-bit long mode as defined by the
0x66 prefix and \texttt{REX.W} flag to set the operand size.  If an
instruction supports capability-sized operands, the capability operand
prefix can be used to use a capability-sized operand instead.  This
prefix would have higher precedence than \texttt{REX.W}.

In capability mode, most instructions that can operate on either
integer or capability-sized values would follow the same logic in the
previous paragraph to determine the operand size.  However, two groups
of existing instructions would default to using a capability-sized
operand when executed in capability mode:

\begin{itemize}
  \item Near branches.

  \item Instructions that implicitly reference
    the stack pointer (\CSP{}).
\end{itemize}

This matches the approach used to select a default operand size of 64
bits in 64-bit long mode.  For some of these instructions, the
capability operand prefix could be used to revert to a smaller operand
size.  The effective operand size would then be determined by \texttt{REX.W}.
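
The following sketch shows how these defaults might apply to a
pure-capability function prologue and epilogue in capability mode.  The
frame size and register choices are illustrative.  The comments
indicate which instructions would need the capability operand prefix
under the rules above:

\begin{verbatim}
func:
    push %cbp          # stack instruction: capability-sized by default
    mov  %csp,%cbp     # needs capability operand prefix
    sub  $32,%csp      # needs capability operand prefix
    # ... function body ...
    mov  %cbp,%csp     # needs capability operand prefix
    pop  %cbp          # stack instruction: capability-sized by default
    ret                # near branch: capability-sized by default
\end{verbatim}

Each capability pushed occupies 16 bytes, so the 32-byte frame keeps the
stack pointer 16-byte aligned.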

\subsubsection{Extending Existing Instructions to Support Capability Operands}

Several existing instructions should be extended to support
capability operands:

\begin{itemize}
  \item \insnxesref[mov]{MOVC} would handle loads and stores of
    capabilities similar to \insnriscvref{CLC} and \insnriscvref{CSC} as well as
    copying capabilities between registers similar to \insnriscvref{CMove}.

    To permit moving the contents of an additional capability register
    to a general-purpose register or vice versa, two new
    \insnxesref[movcap]{MOV} opcodes would be
    used.  These opcodes would permit access to \CFS{}, \CGS{}, and
    \DDC{} in all privilege levels.  Access to other additional
    capability registers would be permitted only in privilege level 0.

  \item \insnxesref[movnti]{MOVNTIC} would store a single capability to memory
    using a non-temporal hint.

  \item The string instructions \insnxesref{LODS}, \insnxesref{MOVS},
    and \insnxesref{STOS} would be extended to support capability
    operands.

    We do not currently foresee a need to extend \insnnoref{CMPS} and
    \insnnoref{SCAS} with support for capability operands.  If that
    did prove necessary, they could be extended.

  \item \insnxesref[cmov]{CMOVC} would handle conditional loads and stores of
    capabilities.

  \item \insnxesref[add]{ADDC} and \insnxesref[sub]{SUBC} would be used to adjust
    the \textbf{address} field of a capability similar to \insnriscvref{CIncOffset}.  Note
    that for these instructions, the source operand would either be a
    sign-extended immediate or a 64-bit integer register whose value
    is either added to or subtracted from the \textbf{address} field of the
    capability-sized destination operand.

    For example:

\begin{verbatim}
add $16,%csp
\end{verbatim}

    would move the capability stack pointer up by 16 bytes.

    We do not anticipate a need for capability-sized variants of
    \insnnoref{ADC} or \insnnoref{SBB}.

  \item \insnxesref[inc]{INCC} and \insnxesref[dec]{DECC} would permit
    simple increments and decrements of the \textbf{address} field of
    capabilities.

  \item \insnxesref[and]{ANDC}, \insnxesref[or]{ORC}, and \insnxesref[xor]{XORC} would
    permit bit manipulation of the \textbf{address} field of a capability.  As
    with \insnxesref[add]{ADDC}, the second operand would always be an
    integer operand.

  \item \insnxesref[cmp]{CMPC} would permit comparison of capability values
    including the functionality of both \insnref{CSetEqualExact} (via
    \texttt{ZF}) and \insnref{CTestSubset} (via \texttt{SF}).  This
    differs somewhat from the existing variants of \insnnoref{CMP},
    which perform the equivalent \insnnoref{SUB} instruction and then
    discard the result: for \insnxesref[cmp]{CMPC}, the flags set would
    not be identical to the flags set by \insnxesref[sub]{SUBC}.

    We do not anticipate a need for a capability-sized variant of
    \insnnoref{TEST}.

  \item \insnxesref[cmpxchg]{CMPXCHGC} will be required to support atomic
    operations on capabilities.  (Note that \insnnoref{CMPXCHG16B}'s
    existing semantics are not suitable for capabilities as it divides
    the values into register pairs.)

  \item \insnxesref{CMPXCHG2C} will be required to support atomic
    operations on pairs of capabilities.

  \item \insnxesref[xchg]{XCHGC} will also be required to support atomic
    operations on capabilities.

  \item It may also be desirable to support \insnxesref[xadd]{XADDC}.  For
    this instruction, only the integer portion of the second (source)
    operand would be added to the first
    (destination) operand to determine the value stored to the
    destination.  Any tag or capability metadata in the second operand
    would be ignored and would be overwritten with the original value
    of the first operand.

  \item \insnxesref[push]{PUSHC} and \insnxesref[pop]{POPC} would be used to save
    and restore capability registers on the stack.

  \item \insnxesref[lea]{LEAC} would store the resulting address in a
    destination capability register.

    \insnxesref{LEA} would not support the 0x07 opcode prefix.  The
    address size would always match the operand size.  Storing an
    integer address in a capability register would have the same
    effect as the equivalent version of \insnxesref{LEA} storing the
    integer address to the integer alias register.  Using a
    capability-aware address with an integer \insnxesref{LEA} would
    also be identical in effect to using ``plain'' addressing.

  \item \insnnoref{ENTER} and \insnnoref{LEAVE} could be extended to
    support implicit capability operands, or they could be deprecated
    and remain as integer-only instructions.

    If these instructions were extended to support capability
    operands, the capability-sized versions would operate on \CSP{}
    and \CBP{} rather than \RSP{} and \RBP{}.  These instructions
    would also default to capability operands in capability mode
    if extended.

    If these instructions were deprecated, then they would be
    removed in capability mode.
\end{itemize}

\subsection{Control-Flow Instructions}

Absolute near branches would be extended to support capability operands.
In 64-bit long mode, a capability operand prefix would select a
capability operand size.  In capability mode, absolute near branches would
support only capability operands.
Absolute near branches that use an integer operand would set the
\textbf{address} field of the
\CIP{} capability while absolute near branches using a capability operand would
load a new capability into \CIP{}.
Relative near branches would always modify the \textbf{address} field of the \CIP{}
capability and would not support the capability operand prefix.

The size of return addresses pushed to and popped from the
stack for near branches would be determined by the operand size.
Capability-sized branches would save and restore a full capability on
the stack while integer-sized branches would save and restore an
integer address.
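
For example, assuming the operand register selects the operand size as
in the other examples in this chapter (the register choice is
illustrative):

\begin{verbatim}
call *%cax    # capability operand: loads CAX into CIP and pushes a
              # 16-byte capability return address
call *%rax    # integer operand (64-bit mode only): sets the address
              # of CIP and pushes an 8-byte integer return address
\end{verbatim}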

Far calls, jumps, and returns would not support capability operands
and would be invalid in capability mode.
In 64-bit mode, far branches would
set the \textbf{address} field of \CIP{}.

If the resulting value of \CIP{} after any branch
is invalid, a capability violation fault would be raised on the branch
instruction (see Section~\ref{sec:x86:capability-fault}).

\insnnoref{IRETC} should pop a capability exception frame (see
Section~\ref{sec:x86:interrupt-exception}) from the stack loading
capabilities into \CIP{} and \CSP{}.  This instruction would require
the capability operand prefix.  An attempt to restore a 32-bit code
segment paired with a \CIP{} that uses capability mode should raise a
General Protection Fault with the error code set to the destination
code segment.

Note that attempting to push or pop a misaligned capability will raise
an exception.  The stack pointer must be suitably aligned before the
use of \insnxesref[call]{CALLC}, \insnnoref{IRETC}, and \insnxesref[ret]{RETC}.

\subsection{New CHERI Instructions}

For other capability operations we
propose adding new CHERI-specific instructions.
Existing general-purpose x86 instructions support two operands rather
than three operands.  To avoid requiring a \VEX{} prefix for all new
CHERI instructions, most instructions are defined with two operands
rather than three.  New instructions that require three operands must
be encoded using a \VEX{} prefix.

Note that all of these instructions would only be valid in 64-bit mode
and capability mode.

\subsubsection{Capability-Inspection Instructions}

These instructions fetch a single field from a capability.

\begin{itemize}
  \item \insnxesref{GCPERM} -- Get Capability Permissions
  \item \insnxesref{GCTYPE} -- Get Capability Object Type
  \item \insnxesref{GCBASE} -- Get Capability Base
  \item \insnxesref{GCLEN} -- Get Capability Length
  \item \insnxesref{GCTAG} -- Get Capability Tag
  \item \insnxesref{GCOFF} -- Get Capability Offset
  \item \insnxesref{GCHI} -- Get Capability High Half
  \item \insnxesref{GCLIM} -- Get Capability Limit
  \item \insnxesref{GCFLAGS} -- Get Capability Flags
\end{itemize}

\subsubsection{Capability-Modification Instructions}

If these instructions fail, they should clear the tag in the resulting
capability.

\begin{itemize}
  \item \insnxesref{SEAL} -- Seal Capability
  \item \insnxesref{UNSEAL} -- Unseal Capability
  \item \insnxesref{ANDCPERM} -- Mask Capability Permissions
  \item \insnxesref{SCOFF} -- Set Capability Offset
  \item \insnxesref{SCADDR} -- Set Capability Address
  \item \insnxesref{SCBND} -- Set Capability Bounds
  \item \insnxesref{SCBNDE} -- Set Exact Capability Bounds
  \item \insnxesref{SCHI} -- Set Capability High Half
  \item \insnxesref{SCFLAGS} -- Set Capability Flags
  \item \insnxesref{CLCTAG} -- Clear Capability Tag
  \item \insnxesref{BUILDCAP} -- Construct Capability
  \item \insnxesref{CPYTYPE} -- Construct Sealing Capability
  \item \insnxesref{CSEAL} -- Conditional Capability Seal
  \item \insnxesref{SENTRY} -- Seal Capability as a Sentry
\end{itemize}

\subsubsection{Control-Flow Instructions}

\begin{itemize}
  \item \insnxesref{CINVOKE} -- Invoke sealed capability pair
\end{itemize}

\subsubsection{Adjusting to Compressed Capability Precision
  Instructions}

\begin{itemize}
  \item \insnxesref{CRRL} -- Round Representable Length
  \item \insnxesref{CRAM} -- Representable Alignment Mask
\end{itemize}

\subsubsection{Tag-Memory Access Instructions}

These instructions permit bulk access to a set of in-memory tags.
Each instruction accesses the tags in a ``stride'' of capabilities.
The size of a stride is implementation dependent.  It must be a power
of two, and it is suggested that a stride contain the number of tags
in a single cache line.  The stride size should either be reported in
a new \insnnoref{CPUID} leaf or be defined as equal to the value
returned by an existing \insnnoref{CPUID} leaf.
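
For instance, assuming a 64-byte cache line (the line size is
implementation dependent and is used here only for illustration) and
16-byte capabilities, a stride would cover
\[
  64 / 16 = 4
\]
tags.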

\begin{itemize}
  \item \insnxesref{LCTAGS} -- Load Capability Tags
  \item \insnxesref{CLCTAGS} -- Clear Capability Tags
\end{itemize}
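
As a rough illustration of the stride addressing described above, the
following C sketch computes the stride-aligned region that a bulk tag
access would cover.  The 16-byte capability size matches the rest of this
chapter.  The helper names, the example stride of four tags (one 64-byte
cache line), and the idea that a tag's position within the stride maps to a
bit position in a result are assumptions made only for this example; a real
stride size would be discovered via \texttt{CPUID}.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_SIZE 16u  /* bytes per capability (128-bit capabilities) */

/* Bytes of memory covered by one stride of `stride_tags` capabilities. */
static uint64_t stride_bytes(uint64_t stride_tags)
{
    /* The text requires the stride to be a power of two. */
    assert(stride_tags != 0 && (stride_tags & (stride_tags - 1)) == 0);
    return stride_tags * CAP_SIZE;
}

/* Round an address down to the start of the stride that contains it. */
static uint64_t stride_base(uint64_t addr, uint64_t stride_tags)
{
    return addr & ~(stride_bytes(stride_tags) - 1);
}

/* Index (0 .. stride_tags - 1) of the tag covering `addr` in its stride.
 * Hypothetical: a bulk tag load might report this tag at this bit position. */
static unsigned tag_index(uint64_t addr, uint64_t stride_tags)
{
    return (unsigned)((addr - stride_base(addr, stride_tags)) / CAP_SIZE);
}
```

With a four-tag stride, address \texttt{0x1234} falls in the stride starting
at \texttt{0x1200} and is covered by the tag at index 3.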

\subsection{Interactions with Vector Extensions}

CHERI should have minimal impact on existing vector extensions to the
x86 architecture including MMX, SSE, AVX, and AVX-512.

\subsubsection{Vector Registers and Memory Tags}

We propose that vector registers should not contain tags.  Loads of
vector registers should ignore tags in memory, and stores of vector
registers to memory should always clear tags.  Existing vector
instructions that manipulate vector register contents do not make
sense for tagged capability values.  However, vector extensions are
also used to perform certain classes of memory loads and stores, which
may require additional care.

\subsubsection{Memory Copies}

Vector loads and stores are often used to implement \ccode{memcpy()}.
In CHERI C, \ccode{memcpy()} must preserve tags.  A \ccode{memcpy()}
implementation that uses \insnxesref[mov]{MOVC} will operate at the same
width as existing memory copies implemented using SSE, which may
mitigate some of the cost.  Another option may be to support an
optimized \insnxesref[movs]{REP MOVSC} similar to the existing optimization
for \insnnoref{REP MOVSB} where the former instruction would preserve
tags during a copy unlike the latter.
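
The difference between a capability-width copy and a vector-register copy
can be sketched with a toy model of tagged memory.  The model below is an
assumption-laden illustration, not a description of any implementation: the
\texttt{tagged\_mem} structure and both helper functions are hypothetical,
and only aligned 16-byte granules are considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAP_SIZE 16u
#define MEM_SIZE 64u

/* Toy tagged memory: one tag per aligned 16-byte granule. */
struct tagged_mem {
    uint8_t bytes[MEM_SIZE];
    bool    tags[MEM_SIZE / CAP_SIZE];
};

static void check_granule(size_t off)
{
    assert(off % CAP_SIZE == 0 && off + CAP_SIZE <= MEM_SIZE);
}

/* Copy one granule as a capability: data and tag move together
 * (the behaviour assumed here for a capability load/store pair). */
static void copy_cap(struct tagged_mem *m, size_t dst, size_t src)
{
    check_granule(dst);
    check_granule(src);
    memmove(&m->bytes[dst], &m->bytes[src], CAP_SIZE);
    m->tags[dst / CAP_SIZE] = m->tags[src / CAP_SIZE];
}

/* Copy one granule through a tagless vector register: the load ignores
 * the source tag and the store clears the destination tag. */
static void copy_vec(struct tagged_mem *m, size_t dst, size_t src)
{
    check_granule(dst);
    check_granule(src);
    memmove(&m->bytes[dst], &m->bytes[src], CAP_SIZE);
    m->tags[dst / CAP_SIZE] = false;
}
```

Both helpers move the same 16 bytes; only the capability copy leaves the
destination granule tagged, which is why a CHERI C \ccode{memcpy()} cannot
be built from unmodified vector moves alone.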

\subsubsection{Non-Temporal Stores}

Blocks of data stored to memory mapped with write-combining (WC) are
often written via non-temporal vector register stores.  However, such
data is generally consumed by an I/O device via DMA and rarely
contains pointers.  We believe that permitting a non-temporal store of
a single capability via \insnxesref[movnti]{MOVNTIC} is sufficient for cases
requiring non-temporal stores of tagged capabilities.

\subsubsection{Memory Addressing}

Vector instructions with memory operands would support
capability-aware addressing in the same manner as general-purpose
register instructions.  For scatter/gather instructions using VSIB,
the base address register would use a capability register instead of
an integer address when using capability-aware addressing.

\subsection{Capability Violation Faults}
\label{sec:x86:capability-fault}

For reporting capability violations, we propose reserving a new
exception vector.  This new exception would report an error code
pushed as part of the exception frame similar to GP\# and PF\# faults.
This error code would contain the capability exception code as
described in Table~\ref{table:x86:capability-cause} to indicate
the specific violation.

\begin{table}
\begin{center}
\begin{tabular}{ll}
\toprule
Value & Description \\
\midrule
0x0 & Tag Violation \\
0x1 & Length Violation \\
0x2 & Seal Violation \\
0x3 & Type Violation \\
0x4 & Software-defined Permission Violation \\
0x5 & \cappermG Violation \\
0x6 & \cappermX Violation \\
0x7 & \cappermL Violation \\
0x8 & \cappermS Violation \\
0x9 & \cappermLC Violation \\
0xa & \cappermSC Violation \\
0xb & \cappermSLC Violation \\
0xc & \cappermASR Violation \\
0xd & \cappermInvoke Violation \\
0xe & \cappermCid Violation \\
\bottomrule
\end{tabular}
\end{center}
\caption{CHERI-x86-64 Capability Exception Error Codes}
\label{table:x86:capability-cause}
\end{table}

If an instruction could potentially throw more than one capability exception,
the capability exception error code is set to the highest priority exception (numerically lowest
priority value) as shown in Table~\ref{table:x86:exception-priority}.

\begin{table}
\begin{center}
\begin{tabular}{ll}
\toprule
Priority & Description \\
\midrule
1  & \cappermASR Violation \\
2  & Tag Violation \\
3  & Seal Violation \\
4  & Type Violation \\
5  & \cappermInvoke Violation \\
   & \cappermCid Violation \\
6  & \cappermX Violation \\
7  & \cappermL Violation \\
   & \cappermS Violation \\
8  & \cappermLC Violation \\
   & \cappermSC Violation \\
9 & \cappermSLC Violation \\
10 & \cappermG Violation \\
11 & Length Violation \\
12 & Software-defined Permission Violation \\
\bottomrule
\end{tabular}
\end{center}
\caption{CHERI-x86-64 Capability Exception Priority}
\label{table:x86:exception-priority}
\end{table}
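
The selection rule can be sketched in C as a lookup from error code to
priority.  The numeric values are taken from the two tables above; the
identifier names and the tie-breaking behaviour (the first listed violation
wins when two share a priority) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Error codes from the capability exception error code table. */
enum cap_cause {
    CAUSE_TAG = 0x0, CAUSE_LENGTH = 0x1, CAUSE_SEAL = 0x2, CAUSE_TYPE = 0x3,
    CAUSE_SW_PERM = 0x4, CAUSE_PERM_G = 0x5, CAUSE_PERM_X = 0x6,
    CAUSE_PERM_L = 0x7, CAUSE_PERM_S = 0x8, CAUSE_PERM_LC = 0x9,
    CAUSE_PERM_SC = 0xa, CAUSE_PERM_SLC = 0xb, CAUSE_PERM_ASR = 0xc,
    CAUSE_PERM_INVOKE = 0xd, CAUSE_PERM_CID = 0xe, CAUSE_COUNT
};

/* Priority of each cause, indexed by error code (lower is reported first). */
static const int cause_priority[CAUSE_COUNT] = {
    [CAUSE_PERM_ASR] = 1,    [CAUSE_TAG] = 2,       [CAUSE_SEAL] = 3,
    [CAUSE_TYPE] = 4,        [CAUSE_PERM_INVOKE] = 5, [CAUSE_PERM_CID] = 5,
    [CAUSE_PERM_X] = 6,      [CAUSE_PERM_L] = 7,    [CAUSE_PERM_S] = 7,
    [CAUSE_PERM_LC] = 8,     [CAUSE_PERM_SC] = 8,   [CAUSE_PERM_SLC] = 9,
    [CAUSE_PERM_G] = 10,     [CAUSE_LENGTH] = 11,   [CAUSE_SW_PERM] = 12
};

/* Pick the cause to report from a non-empty list of detected violations. */
static enum cap_cause select_cause(const enum cap_cause *v, size_t n)
{
    assert(n > 0);
    enum cap_cause best = v[0];
    for (size_t i = 1; i < n; i++) {
        if (cause_priority[v[i]] < cause_priority[best])
            best = v[i];
    }
    return best;
}
```

For example, an access that is both out of bounds and made via an untagged
capability would report a Tag Violation, since priority 2 outranks
priority 11.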

CHERI-RISC-V reports the name of the register that
triggered a capability violation.  It is not feasible to provide a
direct analog of this on x86.  Indirect jumps and calls may raise an
exception while loading a capability from memory that is not present
in any register at the start of the instruction.  However, unlike page
faults, capability violation faults are not generally restartable and
the register name's primary use is for debugging convenience rather than
correctness.  There are a few possible options for providing similar
information:

\begin{enumerate}
\item Provide a copy of the faulting capability via a new capability
  control register similar to the PF\# virtual address stored in
  \CRTWO{}.  This faulting capability would include the result of any
  offset adjustments from immediates or scaled indices.  If the result
  of offset adjustments made the capability unrepresentable, the
  faulting capability would have its tag cleared.
\item Similar to the above, but ignore offset adjustments and provide
  only the base capability value.
\item Provide the virtual address from the faulting capability in
  \CRTWO{} similar to PF\#.  A debugger could examine the faulting
  instruction's operands to determine which capability triggered the fault.
\item Do nothing as the prior approaches may be too expensive to
  implement.
\end{enumerate}

Like Morello and CHERI-RISC-V, CHERI-x86-64 would
raise capability violation faults when an invalid memory access is
performed, such as an out-of-bounds access or an access via an untagged
capability.  Instructions that modify
capabilities should not raise capability violation faults (for
example, when a capability becomes unrepresentable) but should instead
clear the tag of the resulting capability.  This permits compilers to
speculatively reorder these instructions without raising spurious
faults during execution.

\subsection{Call Gates}

We do not recommend extending call gates to support capabilities.
Supporting capabilities with call gates would likely require the
following changes:

\begin{itemize}
  \item Extending the global and local descriptor table format to
    support a new capability call gate that stores a full capability
    rather than a 64-bit address.  This would be more invasive than the
    64-bit call gate, which relies on forcing a number
    of reserved bits in the fourth double word to zero as a sentinel
    type for the second half of a 64-bit call gate.

  \item As with 64-bit call gates, capability call gates would not support
    parameter copying.

  \item Calls to a capability call gate would need to push a modified
    call frame containing both a code segment and code capability that
    would be returned from via \insnnoref{RETFC}.
\end{itemize}

\subsection{Interrupt and Exception Handling}
\label{sec:x86:interrupt-exception}

For interrupt and exception handling, we propose a new overall CPU
mode that enables the use of capabilities.  The availability of this
mode would be indicated by a new \insnnoref{CPUID} flag.  The mode
would be enabled by setting a new bit in \CRFOUR{}.  When this mode is
enabled, exceptions would push a new type of interrupt frame.  As with
exceptions in long mode, the stack pointer would be 16-byte aligned
prior to pushing the exception frame to ensure capabilities are
aligned.  The \RIP{} and \RSP{} fields in the exception frame would be
replaced with the full \CIP{} and \CSP{} capabilities.  Other fields
in this frame would be padded to 16 bytes.  To minimize padding, it
may be desirable to pack multiple smaller registers into a single
16-byte slot; for example, \SSreg{}, \CS{}, and \RFLAGS{} could be stored
in a single slot.  However, this would result in a frame layout
inconsistent with far calls.  \insnnoref{IRETC} would be used in
interrupt service routines to unwind this frame.
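
To make the padding trade-off concrete, the following C sketch lays out two
candidate frames.  Both layouts are hypothetical: the field order simply
mirrors the existing long-mode frame for an exception that pushes an error
code, and the type and structure names are invented for this example.  The
tag of each capability is held out of band and so does not appear in the
in-memory image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory image of a 128-bit capability (tag stored out of band). */
typedef struct { uint64_t address; uint64_t metadata; } cap128_t;

/* A 64-bit (or smaller) field padded out to one 16-byte slot. */
typedef struct { uint64_t value; uint64_t pad; } slot64_t;

/* Hypothetical unpacked frame, lowest address first. */
struct cap_exception_frame {
    slot64_t error_code;
    cap128_t cip;      /* replaces RIP */
    slot64_t cs;
    slot64_t rflags;
    cap128_t csp;      /* replaces RSP */
    slot64_t ss;
};

/* Hypothetical packed variant: SS, CS, and RFLAGS share one slot. */
struct cap_exception_frame_packed {
    slot64_t error_code;
    cap128_t cip;
    struct { uint16_t ss; uint16_t cs; uint32_t pad; uint64_t rflags; } state;
    cap128_t csp;
};

_Static_assert(sizeof(struct cap_exception_frame) == 96, "six 16-byte slots");
_Static_assert(sizeof(struct cap_exception_frame_packed) == 64,
               "four 16-byte slots");
_Static_assert(offsetof(struct cap_exception_frame, cip) % 16 == 0,
               "CIP is 16-byte aligned within the frame");
_Static_assert(offsetof(struct cap_exception_frame, csp) % 16 == 0,
               "CSP is 16-byte aligned within the frame");
```

Under these assumptions, packing the three small registers into one slot
saves 32 bytes per frame, at the cost of a layout that no longer matches
the one used by far calls.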

\subsubsection{Capability Control Registers}
\label{sec:x86:capability-control-registers}

Interrupt and exception handlers require new capabilities for the
program counter (\CIP{}) and stack pointer (\CSP{}) registers.  These
values must be derived from valid, privileged capabilities.  To
support this, we propose the addition of a new class of capability
registers: capability control registers.

Capability control registers are capability-sized control registers.
As with other control registers such as \CRFOUR, direct access to
capability control registers would be restricted to supervisor mode as
well as requiring \cappermASR{} in \CIP{}.  Unlike other control
registers, however, capability control registers would not be accessed
via the \texttt{0F 20} and \texttt{0F 22} opcodes of \insnnoref{MOV}.
Instead, capability control registers would be named as additional
capability registers as described in
Section~\ref{sec:x86:additional-caps}.

We consider two possible approaches for deriving \CIP{} and \CSP{} at
the start of an interrupt or exception.

\subsubsection{Kernel Code and Stack Capabilities}

The first approach would add two new capability control registers: the Kernel
Code Capability (\KCC{}) and Kernel Stack Capability (\KSC{}).  Transitions into
supervisor mode would load new addresses from
existing data structures and tables to derive the new \CIP{} and \CSP{}
register values.  For example, the current virtual address stored in
each Interrupt Descriptor Table (\IDT{}) entry would be used as an
address to derive a new \CIP{} from \KCC{}, and the address stored in the Interrupt
Stack Table (\IST{}) entry in the current Task-State Segment (\TSS{}) would
be used as an address to derive a new \CSP{} from \KSC{}.  Transitions via
the \insnnoref{SYSCALL} instruction would use the address from \LSTAR{} to
construct the new \CIP{}.

This approach does require broad capabilities
for \KCC{} and \KSC{} that can accommodate any desired entry point or stack
location.  However, it will require minimal changes to existing systems
code such as operating-system kernels.

\subsubsection{Capabilities in Entry Points}

The second approach would replace virtual addresses stored in
existing entry points with complete capabilities.  This is a more
invasive change, requiring larger changes to existing systems code, but
it enables the use of more fine-grained capabilities for each entry
point.

Setting the desired kernel stack pointers in \CSP{} would require a new
\TSS{} layout that expanded the existing \RSP{} and \IST{} entries to
capabilities.

For \insnnoref{SYSCALL}, a new capability control register \CSTAR{} would be
added to hold the target instruction pointer.

Entries in the \IDT{} would be expanded to 32 bytes, appending a capability
code pointer in the last 16 bytes.  This would double the size of the
\IDT{}, and most of the bytes would be unused.  However, it would
ensure that all of the information currently stored in an \IDT{} entry
(such as the segment selector, \IST{} index, and descriptor type) would
be configurable.

\subsubsection{\insnnoref{SWAPGS} and Capabilities}

The \insnnoref{SWAPGS} instruction is used in user-to-kernel
transitions for the 64-bit x86 architecture to permit separate TLS
pointers for user and kernel mode.  We recommend defining a new
capability control register \KGS{}.  \insnnoref{SWAPGS} in capability
mode would swap the \CGS{} and \KGS{} registers.

\subsection{FS and GS Aliases}

The \FS{} and \GS{} segment descriptors have grown several related
aliases over time such as the \FSBASE{} and \GSBASE{} MSRs and
\insnnoref{RDFSBASE} family of instructions.  These aliases should be
implemented as the addresses of the appropriate capability register.
Reads of the \FSBASE{}, \GSBASE{}, and \KGSBASE{} MSRs should return
the \textbf{address} field of the \CFS{}, \CGS{}, and \KGS{} capabilities,
respectively.  Writes to these MSRs should set the \textbf{address} field of the
respective capability equivalent to \insnxesref{SCADDR}.  Similarly,
the \insnnoref{RDFSBASE} and \insnnoref{RDGSBASE} instructions should
return the \textbf{address} field of the \CFS{} and \CGS{} capabilities,
respectively.  The \insnnoref{WRFSBASE} and \insnnoref{WRGSBASE}
instructions should set the \textbf{address} field of the respective capability
equivalent to \insnxesref{SCADDR}.  If a new address is set that makes
the capability unrepresentable, the capability's tag should be
cleared.

\subsection{Page Tables}

Similar to CHERI on other architectures, additional page-table
permission bits governing loads and stores of capabilities are
desirable.  In addition, it may be beneficial to have a ``capability
dirty'' bit.  At present, the 64-bit x86 architecture has reserved bits
in a range from bit 52 (\texttt{MAXPHYADDR}) to bit 62.  The Protection Keys
extension uses bits 59--62 from that range.  To avoid conflicting with
Protection Keys, CHERI-x86-64 could use bits starting at bit 58 as
described in Table~\ref{table:x86:pte}.  Higher bits are
preferred, to permit maximal room for growth of the physical address
field that currently ends at bit 51.

\begin{table}
\begin{center}
\begin{tabular}{lll}
\toprule
Bit & Name & Description \\
\midrule
58 & CW & Permits writes of tagged capabilities \\
57 & CR & Permits reads of tagged capabilities \\
56 & CD & Set when a tagged capability is written to this page \\
\bottomrule
\end{tabular}
\end{center}
\caption{CHERI-x86-64 Page Table Bits}
\label{table:x86:pte}
\end{table}

If an instruction performs a memory access that violates a CHERI page
permission (such as a store of a tagged capability to a page where the
\texttt{CW} bit is clear), a page-fault (PF\#) exception should be
raised.  Bit 8 (currently reserved) should be set in the page-fault
error code provided by the processor indicating that the fault was
caused by a capability permission violation.  Other bits in the page
fault error code such as \texttt{P}, \texttt{W/R}, \texttt{U/S}, and
\texttt{I/D} should be set to indicate the type of memory access.  In
addition, the virtual address of the memory access should be provided
in the \CRTWO{} register similar to other page-fault exceptions.

Note that the \texttt{CR} and \texttt{CW} bits fault only if the
capability being read or written is tagged.  Untagged capability
values can be read from or written to memory regardless of the
\texttt{CR} and \texttt{CW} permissions.  In addition, if the
authorizing capability for a capability read does not hold \cappermLC,
then reading a tagged capability will always return a capability with
the tag cleared instead of faulting.

Instruction fetches always ignore tags and will never raise a
capability page-fault exception.
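
The page-permission rules for capability loads and stores can be summarized
in a short C sketch.  The bit positions come from
Table~\ref{table:x86:pte} and the proposed use of error-code bit 8; the
function and constant names are hypothetical, and the sketch assumes the
page is present and the access is otherwise permitted by the conventional
page permissions and by the authorizing capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_CW (UINT64_C(1) << 58)  /* permits writes of tagged capabilities */
#define PTE_CR (UINT64_C(1) << 57)  /* permits reads of tagged capabilities */
#define PTE_CD (UINT64_C(1) << 56)  /* set when a tagged capability is written */

#define PF_ERR_P   (1u << 0)        /* fault on a present page */
#define PF_ERR_WR  (1u << 1)        /* fault caused by a write */
#define PF_ERR_CAP (1u << 8)        /* proposed capability-permission bit */

enum outcome { ACCESS_OK, ACCESS_OK_TAG_CLEARED, ACCESS_PAGE_FAULT };

/* Capability store: faults only if the stored value is tagged and CW is
 * clear; a permitted tagged store sets CD. */
static enum outcome cap_store(uint64_t *pte, bool value_tagged, uint32_t *err)
{
    if (value_tagged && !(*pte & PTE_CW)) {
        *err = PF_ERR_CAP | PF_ERR_P | PF_ERR_WR;
        return ACCESS_PAGE_FAULT;
    }
    if (value_tagged)
        *pte |= PTE_CD;
    return ACCESS_OK;
}

/* Capability load: without the load-capability permission the tag is
 * stripped, so such a load never takes a capability page fault. */
static enum outcome cap_load(uint64_t pte, bool mem_tagged, bool has_perm_lc,
                             uint32_t *err)
{
    if (!mem_tagged)
        return ACCESS_OK;
    if (!has_perm_lc)
        return ACCESS_OK_TAG_CLEARED;
    if (!(pte & PTE_CR)) {
        *err = PF_ERR_CAP | PF_ERR_P;
        return ACCESS_PAGE_FAULT;
    }
    return ACCESS_OK;
}
```

Untagged values pass through both paths without consulting \texttt{CR} or
\texttt{CW}, matching the note above.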

\subsection{Controlling Access to System Registers}

In CHERI-x86-64, \cappermASR{} would be required to directly access the
following registers:

\begin{itemize}
  \item Control registers including \KCC{}, \KSC{}, \CSTAR{}, and \KGS{}
  \item Debug registers
  \item Model-specific registers
\end{itemize}

\section{Design Rationale}

We have considered several alternatives to various aspects of the
CHERI-x86-64 design.  This section describes some of those
alternatives.

\subsection{Capability Mode}

Currently capability mode is enabled via a single-bit flags field in
\CIP{}.  We did consider more closely matching older extensions to the
x86 architecture by repurposing the `D' flag of the current code
segment descriptor to enable capability mode.  Similarly, we
considered using the `B' flag of the current stack segment descriptor
to select the implicit stack pointer of \CSP{}.  While this approach
would match traditional x86, it would not protect instruction decoding
by sealing.  For example, a sentry capability could be used in either
plain 64-bit mode or capability mode.  By storing the mode in
capability metadata protected by sealing, sealed code capabilities can
be used only in the intended mode.  Also, while it may be more
flexible to permit the stack alignment to be chosen orthogonally to the
default instruction encoding mode, it does not seem useful in
practice.  Instead, capability mode is designed as a single knob to
optimize pure capability code.

\subsection{Additional Capability Registers as Operands}

We considered various options for using additional capability
registers such as \CGS{} as explicit operands in instructions rather
than as a separate bank of registers accessed only via
\insnxesref[movcap]{MOV}.  All of these approaches add complexity to
instruction decoding, but we do not anticipate frequent direct access
to additional capability registers beyond the use of the existing
\FS{} and \GS{} segment prefixes.

\subsubsection{Using Segment Prefixes}

One approach to expand register selector fields would be to make use
of existing segment prefixes to indicate a set 5th bit for a specific
field.  For example, the \GS{} prefix could be used in capability-aware
addressing mode to indicate that the base capability register used in
a memory operand would be an additional capability register with an
index of 16 or higher.  The lower four bits of the register selector
would be determined by the existing register selector fields.   Note
that this approach would void the earlier use of the \FS{} and \GS{}
segment prefixes.  Instead, \CFS{} and \CGS{} would be used as base
address registers in memory operands via the expanded register
selector field.

Additional prefixes could be used to extend other register selector
fields at the cost of potentially using multiple segment prefixes in a
single instruction.  For example, the \FS{} prefix could be used to
extend the ``r'' register selector field.
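
A minimal C sketch of this decoding rule for the base register of a memory
operand without a SIB byte is shown below.  The function name is
hypothetical; the sketch only assumes that the low four bits come from
\texttt{REX.B} and \texttt{ModRM.r/m} as in existing 64-bit decoding, and
that the \GS{} prefix supplies the fifth bit as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base capability register index for a memory operand (no SIB byte).
 * gs_prefix: a GS segment prefix is present on the instruction
 * rex_b:     the REX.B bit
 * modrm:     the ModRM byte (only the r/m field, bits 2:0, is used) */
static unsigned base_cap_index(bool gs_prefix, bool rex_b, uint8_t modrm)
{
    unsigned low4 = ((unsigned)rex_b << 3) | (modrm & 0x7u);
    /* Under this alternative the prefix acts as bit 4 of the selector,
     * selecting an additional capability register (index 16 or higher). */
    return ((unsigned)gs_prefix << 4) | low4;
}
```

For instance, an r/m field of 3 selects register 3 without the prefix and
register 19 with it.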

\subsubsection{\texttt{VEX.mmmmm} Field}

A second approach can be used with instructions that can be encoded
with a \VEX{} prefix.  The upper three bits of the \texttt{VEX.mmmmm}
field could be reused as the 5th bit of register selector fields
similar to \texttt{EVEX.R'}, \texttt{EVEX.X}, and \texttt{EVEX.V'}
fields in \EVEX{} prefixes.

\subsubsection{EVEX Prefixes}

A third approach would be to require \EVEX{} prefixes for instructions
using an additional capability register.

\subsection{Access to Additional Capability Registers}

The \CFS{}, \CGS{}, and \DDC{} capability registers must be accessible
in all privilege levels.  However, other additional capability
registers such as \KGS{} are suitable only for privilege level 0.
Currently the \textbf{0F 24} and \textbf{0F 25} instructions permit
access to a subset of registers in privilege levels other than 0.

A few other approaches are enumerated below:

\begin{itemize}
  \item For \CFS{} and \CGS{} the \insnnoref{RDFSBASE} family of
    instructions could be expanded to support a capability operand
    size.  This would not provide a solution for access to \DDC{} but
    would otherwise permit restricting \textbf{0F 24} and \textbf{0F
      25} to privilege level 0.

  \item Capability control registers such as \KGS{} could be allocated
    unused indices in the existing control register bank.  This would
    require the \textbf{0F 20} and \textbf{0F 22} opcodes to vary the
    operand size based on the control register index.

  \item The additional capability registers could be split into two
    separate banks: one bank for \CFS{}, \CGS{}, and \DDC{}, accessible
    at all privilege levels via \textbf{0F 24} and \textbf{0F 25}, and
    a second bank of capability control registers, restricted to
    privilege level 0 and accessible via a second set of opcodes such
    as \textbf{0F 26} and \textbf{0F 27}.
\end{itemize}

\subsection{Additional Capability Arithmetic Opcodes}

Two-operand arithmetic instructions such as \insnxesref{ADD} overwrite
one of the source operands with the arithmetic result.  For operations
that are commutative (such as adding two integers), a compiler can
choose which of the source operands to overwrite.  For example, if a
series of instructions computes integer pointers to fields of an
object by adding offsets to a base integer pointer, the compiler can
use the register holding the offset as the destination operand of
\insnxesref{ADD} to preserve the operand holding the base object
integer pointer.  Capability arithmetic instructions such as
\insnxesref[add]{ADDC} are not commutative since only one source
operand holds a capability.

To mitigate this, it may be desirable to add alternate opcodes for
\insnxesref{ADD}, \insnxesref{SUB}, \insnxesref{AND}, \insnxesref{OR},
and \insnxesref{XOR}, which treat the destination operand as an integer
input to the arithmetic operation applied to the second capability
operand.  This could be implemented by extending a subset of the 8-bit
opcodes of these instructions to perform a three-operand operation
when used with a capability operand prefix.  However, two of these
operands would be encoded by a single ModRM field similar to the
encoding of \insnxesref[xadd]{XADDC}.

For example, the \texttt{00} opcode would be extended to support
three-operand \insnxesref[add]{ADDC} by adding a 64-bit offset read from
ModRM:r/m to the \textbf{address} field of the capability read from
ModRM:reg.  The result would then be stored to the capability
identified by ModRM:r/m.

The instruction

\begin{verbatim}
addc %rax, %cbx, %cax
\end{verbatim}

would add \RAX{} to the address field of \CBX{} and store the result
in \CAX{}.  It would be encoded identically to the instruction

\begin{verbatim}
addc %rbx, %cax
\end{verbatim}

except for using the opcode \texttt{00} instead of \texttt{01}.
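
The shared encoding can be checked with a small C sketch that builds only
the opcode and ModRM bytes for the example above (\CAX{} receives \CBX{}
plus \RAX{}).  Any capability operand prefix or \texttt{REX} prefix is
deliberately omitted, and the helper names are hypothetical; the register
numbers (0 for the A register, 3 for the B register) and the
register-direct ModRM layout are those of the existing x86 encoding.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3 };

struct addc_encoding { uint8_t opcode; uint8_t modrm; };

/* Build a register-direct ModRM byte (mod = 11b). */
static uint8_t modrm_direct(unsigned reg, unsigned rm)
{
    assert(reg < 8 && rm < 8);
    return (uint8_t)(0xC0u | (reg << 3) | rm);
}

/* Existing two-operand form (opcode 01): cap[rm] += int[reg]. */
static struct addc_encoding addc_two_operand(unsigned rm_cap, unsigned reg_int)
{
    struct addc_encoding e = { 0x01, modrm_direct(reg_int, rm_cap) };
    return e;
}

/* Proposed three-operand form (opcode 00): cap[rm] = cap[reg] + int[rm].
 * The r/m field names both the integer source and the capability
 * destination, so only two register numbers are encoded. */
static struct addc_encoding addc_three_operand(unsigned rm, unsigned reg_cap)
{
    struct addc_encoding e = { 0x00, modrm_direct(reg_cap, rm) };
    return e;
}
```

Both forms produce the ModRM byte \texttt{D8} for this register pair and
differ only in the opcode byte.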

Note that this approach would permit encoding variants of
\insnxesref[and]{ANDC} and \insnxesref[sub]{SUBC} that preserve the
base capability pointer input operand -- which is not possible in the
existing ISA for integer pointers.

Instructions that use an immediate source operand would not be
extended in this manner.

\subsection{Vector Registers and Tags}

It may be desirable to support loading and storing tags in vector
registers.  In particular, if a tag-preserving extension of
\insnnoref{REP MOVSB} is not added, then loads and stores of multiple
packed capabilities via new instructions may be desirable to support
optimized implementations of \ccode{memcpy()}.  For example, new
tag-preserving variants of \insnnoref{MOVDQA} and \insnnoref{MOVDQU}
could be added via two new two-byte opcodes.

This would require extending the vector registers to contain one or
more tags (1 tag for XMM registers, 2 tags for YMM, 4 tags for ZMM).
Instructions that modify vector registers should not permit
non-monotonic operations on tagged capabilities embedded in vector
registers.  The simplest approach would be to clear all tags for any
instruction other than simple move operations.  However, it may be
desirable to preserve tags for operations that are safe.  For
example, tags belonging to capabilities in the unshuffled half of a
YMM or ZMM register used with \insnnoref{VPSHUFHW} could be safely
preserved.

\subsection{Far Branches and Capabilities}

Supporting far branches with capability operands would add additional
complexity.  For example, far branches need to ensure that code
capability pointers that enable capability mode are used only with
64-bit code segments.  In-memory far capability pointers would also
have odd alignment requirements due to the 16-bit code selector being
adjacent to an aligned capability.  Far branches are also little used
in existing 64-bit x86 programs.  Significantly, 64-bit x86 still
defaults to 32-bit operands for far branches (unlike near branches
that are commonly used and default to 64-bit operands).

\subsection{Direct Memory-Offset MOVs}

The four direct memory-offset \insnnoref{MOV} instructions (opcodes
\texttt{A0} -- \texttt{A3}) store the address of their
memory operand inline as an immediate.  Currently, we propose
removing these instructions in capability mode and not extending
them to support capability operands in 64-bit mode.  These instructions could instead
be extended to support capabilities both as immediates for the memory
offset and as operands.  In that case, the opcodes would be retained
in capability mode rather than removed.

\subsection{XCHG [ER]AX Opcodes}

If the \insnxesref{XCHG} instructions \texttt{91} -- \texttt{97} are not
commonly used, they could be removed in capability mode.