Recent Mac machines use so-called "Apple Silicon" chips, from the M1 or M2 families
of chips. These chips are designed by Apple and use CPU cores implementing the 64-bit
Arm Instruction Set Architecture (ISA), also called arm64
in the Apple context and
elsewhere.
The arm64
ISA is described in details in the
Arm Architecture Reference Manual for A-profile architecture.
Be aware that this is a monstrous document of nearly 13,000 pages.
The name arm64e
was recently introduced by Apple. This platform ABI is currently
in "preview" on macOS. This note explores the differences between arm64
and arm64e
.
Disclaimer: I am not working for Apple or Arm. I have no internal information on
the M1 or M2 chips, or macOS. Everything in this note is based on my observations and
tests on a MacBook with an M1 chip. The explanations on how such or such feature is
implemented in macOS are just speculations, based on a rational analysis of the
technical possibilites. Moreover, all tests were performed on macOS 13 (Ventura).
Future versions of macOS may behave differently. Specifically, the arm64e
ABI,
as defined by Apple, is still in preview mode and may evolve in the future.
Contents:
- What is arm64e architecture, compared to arm64 ?
- Default PAC configuration on macOS 13 (Ventura)
- Enabling arm64e on macOS 13 (Ventura)
- Programming examples
- How does macOS control the PAC hardware ?
- Code generation differences between arm64 and arm64e
- Conclusion
Additional information in this project:
- Reference documentation on the Arm64 architecture
- List of Arm features for A-Profile
- Arm features in the Apple M1 and M2 chips
- Pointer Authentication Code format
The Arm architecture defines several versions over the years. The Apple M1 and M2 chips are said to implement the version 8.5-A of the Arm architecture (which is not completely true in the case of the M1).
At this level of Arm architecture, several new instructions were defined to implement "defensive security" against malware code injection. This type of code injection usually starts with a buffer overflow (due to a bug in an application or library) and implements Return-Oriented Programing (ROP) techniques when it smashes the stack or Jump-Oriented Programming (JOP) when it smashes other types of data structure, typically in the heap.
The mitigation techniques which are implemented by the new Arm instructions are split into two classes:
-
Pointer Code Authentication (PAC) : A deterministic but unpredictable short "signature" is added to code and data pointers to detect invalid injected code addresses. This signature is added in the 16 or 8 most significant bits of the 64-pointers, otherwise unused since no address space uses 2^64 bytes.
-
Branch Target Identification (BTI) : Targets of valid jump and branch instructions are identified by a special instruction so that using any unexpected ROP or JOP gadget address results in an invalid instruction.
See the compilation option -mbranch-protection
for gcc and clang to generate these instructions.
- The default is
-mbranch-protection=none
, PAC and BTI are not used. - With
-mbranch-protection=bti
, BTI instructions and code page protection are used. - With
-mbranch-protection=pac-ret
, Pointer Authentication is used on return addresses from function call. - Using
-mbranch-protection=standard
is a combination of the two.
In the Arm architecture, the PAC and BTI techniques can be smoothly integrated into interoperable code modules, using a common ABI. However, there are some perculiarities with the Apple M1 chip.
The first peculiarity concerns the level of Arm architecture and the corresponding
features. On macOS, some of the supported Arm features can be inspected using the
command sysctl hw.optional.arm
.
Surprisingly, the M1 chip does not support BTI while the Arm architecture reference manual describes this as a mandatory feature at version 8.5-A. On the other hand, the M1 supports other Arm features such as FlagM2 and FRINTTS which are documented by Arm as new and mandatory in version 8.5-A also, just like BTI. Therefore, the M1 implements new features which were introduced by Arm in version 8.5-A, without supporting other mandatory features of this version (namely BTI). Formally, this means that the M1 does not implement any valid version of the Arm architecture.
Not having BTI in the M1 is a surprise because Apple is deeply concerned with security, hardware security, and implements a very efficient usage of PAC.
The M2, however, implements BTI and a few other Armv8.5-A features which were missing in the M1. See the list of Arm features in the Apple M1 and M2 chips for a comparison of the two chips.
Nevertheless, the two chips M1 and M2 implement the Pointer Authentication Code (PAC) feature,
which is the main improvement of the arm64e
platform.
To enable PAC, the gcc and clang compilers support the option -mbranch-protection=pac-ret
.
As the name suggests, Pointer Authentication is used on function return addresses only.
This feature is consequently internal to each function, internal to each compilation unit.
This means that modules which were compiled with and without this option can be linked
into the same executable and remain interoperable.
However, this usage of PAC is suboptimal. There are many other additional usages of PAC, such as protecting data structures addresses or pointers to functions inside data structures (the most obvious usage is the C++ vtables). However, since these data structures are inherently shared between distinct compilation units, possibly compiled with different options, they would no longer be interoperable.
This is where Apple invented arm64e
...
Defining a new ABI allows to use the full power of PAC, on data structure, C++ vtables, etc.
As explained above, this breaks the interoperability between modules with different
compilation options. This means that all modules of an application or library shall be
compiled with consistent options. In practice, this defines a new platform ABI, named arm64e
.
Compiling a module with -arch arm64e
marks the object code with that platform name.
Building an application or library for arm64e
requires that all compilation units
were built for arm64e
. This is why defining a new platform ABI is the solution
to get the full power of PAC.
Mixing arm64
and arm64e
is only possible under strict conditions which are
explained later in this note.
At the end of this note, there is a complete demonstration on how the code generation
for arm64e
extends the protection to JOP attacks, in addition to ROP attacks.
Currently, it does not seem that this extended protection can be used on Linux,
neither using gcc nor clang.
Important note: The option -mbranch-protection=pac-ret
is a generic option
for gcc and clang on all Arm64 platforms, Linux or macOS. On the other hand,
-arch arm64e
is specific to macOS and is available in the Apple version of clang only.
Each of these two options defines its own way of generating PAC code. These two methods
are incompatible and shall not be used together. As of this writing, the Apple clang
compiler does not prevent the simultaneous usage of the two options. Using them at the
same time generates non-functional code which crashes at run time.
A bug report has been filed for that.
In practice, the M1 has the required capabilities to implement full PAC. However,
as of macOS 13 (Ventura), macOS boots in arm64
mode. The CPU is configured so
that the PAC instructions are inoperative. Even though the compiler generates them
using the compilation option -mbranch-protection=pac-ret
, these instructions do
nothing. No PAC is generated or authenticated.
We can demonstrate this using the following simple source code pacia.c
. To enforce
the demonstration on PAC instructions, we do not use the compilation option
-mbranch-protection=pac-ret
. This option generates PAC instructions but they are
transparent to the user and it is difficult to observe them. In our example,
we explicitly use the PACIA and AUTIA instructions using asm
directives and
we display their result.
#include <stdio.h>
#include <inttypes.h>
// noinline for easier inspection of generated code in main
__attribute__((noinline)) void report(uint64_t value)
{
printf("%016" PRIX64 "\n", value);
}
int main(int argc, char* argv[])
{
uint64_t data = 0x12345678;
uint64_t modifier = 2;
report(data);
asm("pacia %[reg], %[mod]" : [reg] "+r" (data) : [mod] "r" (modifier) : );
report(data);
asm("autia %[reg], %[mod]" : [reg] "+r" (data) : [mod] "r" (modifier) : );
report(data);
}
The PACIA instruction should add a Pointer Authentication Code (PAC) into the most significant bits of the value 0x0000000012345678, using "modifier" 2 (a seed). The AUTIA instruction should restore the previous value.
However, if we build the application normally (in arm64
mode), we can see that
the PACIA instruction does nothing:
$ cc pacia.c -o pacia
$
$ file pacia
pacia: Mach-O 64-bit executable arm64
$
$ ./pacia
0000000012345678
0000000012345678 <---- should have been modified
0000000012345678
It is probably a surprise to discover that the PACIA and AUTIA instructions "do not work". They are just instructions which are executed by the hardware. Their behavior is not expected to depend on the way the application was compiled.
Does this mean that the M1 chip "does not work"? Not really.
There is a special control register named SCTLR_EL1, the System Control Register for EL1. When writing selected bits into this register it is possible to selectively enable or disable each category of PAC. When these bits are zero, PACIA, AUTIA and other PAC instructions become simple NOP's. Authenticated branch and load instructions perform their functional task without address authentication. For instance, RETAA becomes a simple RET.
The register SCTLR_EL1 must be accessed from EL1 (kernel) and controls the execution of PAC instructions at EL0 (user) and EL1 (kernel). Similar registers exist for EL2 (hypervisor) and EL3 (monitor), SCTLR_EL2 and SCTLR_EL3.
Speculation: The macOS kernel probably controls the way the PAC instructions behave,
depending on the software target platform. When macOS boots in arm64
mode, the kernel
probably configures SCTLR_EL1 to disable the PAC features before booting.
In order to enable the full PAC feature, we need to create an arm64e
binary.
This is done using the compilation and linker option -arch arm64e
. The resulting
binary is marked as arm64e
instead of arm64
. This can be checked using the command file
.
However, an arm64e
application is not allowed to run on an arm64
system.
Trying to run such an application immediately crashes with Killed: 9
:
$ cc -arch arm64e pacia.c -o pacia
$
$ file pacia
pacia: Mach-O 64-bit executable arm64e
$
$ ./pacia
Killed: 9
Since all systems running macOS 13 (Ventura) boot in arm64
mode, we cannot run
our arm64e
applications by default. We need to enable the arm64e
mode first.
To enable arm64e
applications on macOS 13 (Ventura), we need to do two things:
- Boot macOS in
arm64e
mode. - Disable system integrity.
Remember that, as of macOS 13 (Ventura), arm64e
is in preview mode. The macOS
system integrity feature currently prevents arm64e
applications from running.
Booting in arm64e
mode with system integrity enabled still prevents the arm64e
application from running.
More precisely, it prevents third-party arm64e
applications from running. All
executables in /bin
or /usr/bin
are already built in arm64e
mode and run correctly.
$ file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e): Mach-O 64-bit executable arm64e
A few system executables are even built for both architectures, arm64
and arm64e
:
$ file /usr/bin/file
/usr/bin/file: Mach-O universal binary with 3 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/file (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/file (for architecture arm64): Mach-O 64-bit executable arm64
/usr/bin/file (for architecture arm64e): Mach-O 64-bit executable arm64e
However, if we build an application in arm64e
mode ourselves, running it ends
in Killed: 9
. To be allowed to run an arm64e
executable, it must be in the
part of the file system which is protected by the system integrity. The following
commands demonstrate that a system arm64e
executable stops working when it is
copied outside the system-protected area.
$ file /bin/echo
/bin/echo: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/echo (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/echo (for architecture arm64e): Mach-O 64-bit executable arm64e
$ /bin/echo hi
hi
$ cp /bin/echo .
$ ./echo hi
Killed: 9
If we want to test arm64e
executables, we must first disable the system integrity.
This puts our system at risks and should be done on test systems only.
To disable system integrity, we must reboot in "recovery mode":
- First, shutdown the system. With power off. Do not simply restart the system. We are not allowed to disable the system integrity if the system was restarted in recovery mode. The system must have booted in recovery mode from a power off state. This is a clever security precaution to make sure that the operation is manually performed by a human and not automatically triggered by some malware.
- Boot the system, keeping the finger on the power button to display the boot options. On an iMac, or MacBook, the procedure is the same, keep the finger on the power button. Select "Options", "Continue", select a user, "Next", enter our password, open a terminal from the "Utilities" top menu.
- Enter command
csrutil disable
. This command prompts for our user name and password again. - Reboot the system.
On macOS 14 (Sonoma), the procedure is slightly different. After booting in recovery
mode and selecting "Options", you can directly open a terminal. The user's password
will be asked by csrutil disable
.
Note: To restore the system integrity after tests, perform the same recovery
mode procedure with command csrutil enable
instead of disable
.
At that point, the arm64e
mode is not enabled yet. We need to add the boot flag
-arm64e_preview_abi
:
sudo nvram boot-args=-arm64e_preview_abi
And then reboot the system to use that flag.
After the reboot, we can run our arm64e
applications:
$ ./pacia
0000000012345678
917B000012345678
0000000012345678
$ ./pacia
0000000012345678
6258800012345678
0000000012345678
$ ./pacia
0000000012345678
5418800012345678
0000000012345678
We can notice that the PACIA instruction now works as expected. The 17 most significant bits of the 64-bit value are altered with a PAC. Using AUTIA on the same value, using the same "modifier", the previous value is correctly restored.
Looking carefully, we notice that 17 bits are altered, not 16. This is explained in details in another note of this project, about Pointer Authentication Code format.
Also note that the PAC value changes at each execution. This is due to the kernel assigning a different value for the PAC key in each process. Thus, if an attacker is able to learn the updated value for a pointer using a test, this value won't be valid in any other occurrence of the same program.
The arm64
executables and libraries can call arm64e
libraries, but not the
opposite. The arm64e
libraries are secured against external code injections
and can be called from non-secured applications.
Non-secured applications (arm64
) do not expect any protection and do not care
about the security of the libraries they call. This is why they can call both
types of libraries.
Secured application (arm64e
), on the other hand, expect protection against code
injection. This is why they are not allowed to call non-secured libraries (arm64
)
because code injection could happen during the execution of one of these libraries.
On macOS 13 (Ventura), the system library cache contains arm64e
binaries only,
no arm64
. All system libraries, as installed with macOS are secured against code
injection, even before arm64e
is officially supported for applications.
Content of the system library cache:
$ ls -l /System/Volumes/Preboot/Cryptexes/OS/System/DriverKit/System/Library/dyld
total 9088
-rwxr-xr-x 1 root admin 6324224 Dec 2 12:37 dyld_shared_cache_arm64e
-rwxr-xr-x 1 root admin 409600 Dec 2 12:37 dyld_shared_cache_arm64e.symbols
-rwxr-xr-x 1 root admin 7012352 Dec 2 12:37 dyld_shared_cache_x86_64
-rwxr-xr-x 1 root admin 393216 Dec 2 12:37 dyld_shared_cache_x86_64.symbols
The following C/C++ macros are predefined when compiled with -arch arm64e
:
#define __PTRAUTH_INTRINSICS__ 1
#define __arm64e__ 1
#define __ptrauth_abi_version__ 0
Demonstration:
$ cc -E -x c -dM /dev/null >macros-arm64.txt
$ cc -E -arch arm64e -x c -dM /dev/null >macros-arm64e.txt
$ diff macros-arm64.txt macros-arm64e.txt
239a240
> #define __PTRAUTH_INTRINSICS__ 1
379a381
> #define __arm64e__ 1
392a395
> #define __ptrauth_abi_version__ 0
$
We are going to build a shared library and an application which calls it and
try all possible combinations of arm64
and arm64e
.
File libtest.h
:
#if !defined(LIBTEST_H)
#define LIBTEST_H 1
#if defined(__arm64e__)
#define ARCH "arm64e"
#elif defined(__arm64__)
#define ARCH "arm64"
#else
#define ARCH "other"
#endif
void libtest();
#endif // LIBTEST_H
File libtest.c
:
#include <stdio.h>
#include "libtest.h"
void libtest()
{
printf("in libtest, arch is %s\n", ARCH);
}
File main.c
:
#include <stdio.h>
#include "libtest.h"
int main(int argc, char* argv[])
{
printf("enter main, arch is %s\n", ARCH);
libtest();
printf("leave main, arch is %s\n", ARCH);
}
We build the shared library in arm64
mode:
$ cc -c libtest.c -o libtest.arm64.o
$ cc -shared libtest.arm64.o -o libtest.arm64.dylib
And then in arm64e
mode:
$ cc -arch arm64e -c libtest.c -o libtest.arm64e.o
$ cc -arch arm64e -shared libtest.arm64e.o -o libtest.arm64e.dylib
Checking the result:
$ file libtest.*.dylib
libtest.arm64.dylib: Mach-O 64-bit dynamically linked shared library arm64
libtest.arm64e.dylib: Mach-O 64-bit dynamically linked shared library arm64e
We build the application in arm64
mode, in two versions: one calling the
arm64
library and the other one calling the arm64e
library:
$ cc -c main.c -o main.arm64.o
$ cc main.arm64.o -L. -ltest.arm64 -o main.arm64
$ cc main.arm64.o -L. -ltest.arm64e -o main.arm64-to-64e
Checking the result of the build:
$ file main.arm64 main.arm64-to-64e
main.arm64: Mach-O 64-bit executable arm64
main.arm64-to-64e: Mach-O 64-bit executable arm64
$
$ otool -L main.arm64 main.arm64-to-64e
main.arm64:
libtest.arm64.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
main.arm64-to-64e:
libtest.arm64e.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
Verifying that the right library is called in each case:
$ ./main.arm64
enter main, arch is arm64
in libtest, arch is arm64
leave main, arch is arm64
$
$ ./main.arm64-to-64e
enter main, arch is arm64
in libtest, arch is arm64e
leave main, arch is arm64
Now, let's try to build the application in arm64e
secured mode with both libraries:
$ cc -arch arm64e -c main.c -o main.arm64e.o
$ cc -arch arm64e main.arm64e.o -L. -ltest.arm64e -o main.arm64e
$ cc -arch arm64e main.arm64e.o -L. -ltest.arm64 -o main.arm64e-to-64
ld: warning: ignoring file ./libtest.arm64.dylib, building for macOS-arm64e but attempting to link with file built for macOS-arm64
Undefined symbols for architecture arm64e:
"_libtest", referenced from:
_main in main.arm64e.o
ld: symbol(s) not found for architecture arm64e
clang: error: linker command failed with exit code 1 (use -v to see invocation)
We can see that an arm64e
application can be linked against arm64e
libraries only. The linker emits an error when trying to link against an arm64
library.
Verifying that the right library is called in the only case which works, full arm64e
:
$ ./main.arm64e
enter main, arch is arm64e
in libtest, arch is arm64e
leave main, arch is arm64e
We have already demonstrated that macOS can enable or disable the pointer authentication in the PAC instructions, probably using a configuration of the special system register SCTLR_EL1 from the kernel.
Additionally, even in arm64e
mode, the PAC hardware is configured in a way which
is different from other platforms such as Linux (in a more secure way).
The "signature" part of the authenticated pointers is generated from a "key". There are several PAC key registers (named APIAKey_EL1, APIBKey_EL1, APDAKey_EL1, APDBKeyEL1, APGAKey_EL1). Usually, the kernel of the operating system selects random PAC keys when a process is created and writes these keys into the PAC key registers each time the process is scheduled. As indicated in their names, the PAC key registers are accessible at EL1 (kernel).
This is demonstrated in this project which allows an application to manipulate some system registers which are normally accessible to the kernel only. To do this, a special kernel module was developed with Linux, Windows and macOS variants.
On Linux and Windows, we can change the value of the PAC keys inside a process and observe the impact on the PAC instructions. We can conclude that our special kernel module was allowed to read and write the PAC key registers.
However, on macOS, the same sample code crashes the system. Why does accessing EL1 systems registers from EL1 crash the system?
According to the Arm architecture reference manual, there are two ways to prevent access to the PAC key registers from EL1.
-
EL2 hypervisor control: Using the system register HCR_EL2, the Hypervisor Configuration Register, it is possible force all accesses to PAC key registers to generate a trap at EL2. Finer-grained access control can be also set using registers HFGRTR_EL2 and HFGWTR_EL2.
-
EL3 monitor control: Using the system register SCR_EL3, the Secure Configuration Register, it is possible force all accesses to PAC key registers to generate a trap at EL3.
The first method is typically used by hypervisors to control or emulate the PAC features in a virtual machine. It is unlikely that this is used by macOS. Moreover, when running on the macOS host system, it appears that HCR_EL2 is still readable from EL1 on the M1, and we see that HCR_EL2.APK = 1, meaning "no trap on accessing the PAC key registers". Consequently, the second method is probably used here.
Speculation: There is probably a specific Apple EL3 monitor which includes traps for PAC key register access. When the macOS kernel schedules a process, it probably calls the EL3 monitor to configure the PAC keys registers on behalf of the kernel. The details of this process are unknown.
This way of accessing the PAC key registers is more secure than in Linux where the EL1 kernel is allowed to fully configure the PAC. Using macOS, the kernel is not even trusted. This is clever because, if an attacker takes control of a Linux kernel, it can completely control or disable the PAC. This is not possible on macOS. In practice, "taking control of the kernel to hack the PAC keys" is exactly what this project does. It succeeds on Linux and fails on macOS. On a security scale, macOS wins.
When a Linux virtual machine is run on the macOS host, Pointer Authentication returns to a simpler mechanism when used inside the virtual machine.
- The PAC instructions work as expected, even when the macOS host booted in
arm64
mode. These instructions are inoperative in the macOS host OS but work as expected in the Linux guest OS. - The Linux kernel can fully control the PAC registers at EL1.
Speculation: The EL2 hypervisor probably emulates the PAC mechanism. HCR_EL2, the Hypervisor Configuration Register, contains two bits, API and APK, which can be configured as follow:
- HCR_EL2.API = 0 : any PAC instruction is trapped at EL2.
- HCR_EL2.APK = 0 : any access to the PAC registers is trapped at EL2.
Using this configuration, the EL2 hypervisor is able to take full control of the PAC inside the virtual machine, regardless of any implementation or configuration on the host.
Using the pacia
small program, we have demonstrated that the PAC instructions are
fully functional in an arm64e
running on an arm64e
system.
Using the small libtest
library, we have also demonstrated how to mix arm64
applications with arm64e
libraries. What is the behavior of the PAC instructions
in that context?
Let's add the following function to libtest.h
. It displays the result of the PACIA
and AUTIA instructions on one line. We use the always_inline
attribute to ensure
that the code is compiled as part of each compilation unit.
static inline __attribute__((always_inline)) void test_pacia(void)
{
uint64_t modifier = 2;
uint64_t data1 = 0x12345678;
uint64_t data2 = data1;
asm("pacia %[reg], %[mod]" : [reg] "+r" (data2) : [mod] "r" (modifier) : );
uint64_t data3 = data2;
asm("autia %[reg], %[mod]" : [reg] "+r" (data3) : [mod] "r" (modifier) : );
printf("%016" PRIX64 " -> %016" PRIX64 " -> %016" PRIX64 "\n", data1, data2, data3);
}
We call the test_pacia
function in the main application and in the library,
right after each message.
Let's rebuild all variants and test them.
$ ./main.arm64
enter main, arch is arm64
0000000012345678 -> 0000000012345678 -> 0000000012345678
in libtest, arch is arm64
0000000012345678 -> 0000000012345678 -> 0000000012345678
leave main, arch is arm64
0000000012345678 -> 0000000012345678 -> 0000000012345678
$
$ ./main.arm64-to-64e
enter main, arch is arm64
0000000012345678 -> 0000000012345678 -> 0000000012345678
in libtest, arch is arm64e
0000000012345678 -> 0000000012345678 -> 0000000012345678
leave main, arch is arm64
0000000012345678 -> 0000000012345678 -> 0000000012345678
$
$ ./main.arm64e
enter main, arch is arm64e
0000000012345678 -> 6C75000012345678 -> 0000000012345678
in libtest, arch is arm64e
0000000012345678 -> 6C75000012345678 -> 0000000012345678
leave main, arch is arm64e
0000000012345678 -> 6C75000012345678 -> 0000000012345678
We can see that the PAC instructions are disabled when running an arm64
application,
even on an arm64e
system. Additionally, when the arm64
application calls an arm64e
library, the PAC instructions are also disabled when running code inside the arm64e
library. This is required since the arm64e
library authenticates all pointers it
manipulates. When called from an arm64
application, the library receives plain
unauthenticated pointers and the library should not fail when manipulating them.
Only in the last case, when an arm64e
application is run, the PAC instructions are effective.
Speculation: When the macOS kernel schedules an arm64
application on a core,
it probably reconfigures the SCTLR_EL1 register to disable the PAC instructions.
This section demonstrates the differences between the arm64
ABI with option
-mbranch-protection=pac-ret
as used on Linux and the Apple arm64e
ABI.
In the Linux case, the added security only protects the function call return addresses on stack. This protection mitigates all basic ROP chains (Return-Oriented Programming), usually starting from a stack overflow.
However, this method does not protect against JOP attacks (Jump-Oriented Programming), starting from general buffer overflows on the heap. In the latter case, the corrupted addresses are in data structures, not necessarity on the stack. Object-oriented programming languages use a lot of structures containing pointers to code. In C++, this is called a "vtable". There is one per class (when the class contains virtual methods).
Hacking the C++ vtables has now become popular in malware injection. This is why we
need to prevent these attacks. The Arm PAC technology is capable of this. However,
this is not used in the arm64
ABI with option -mbranch-protection=pac-ret
.
On the other hand, the Apple arm64e
ABI does. Let's see how.
Consider this sample test.cpp
code to call a C++ virtual method:
class C
{
public:
virtual void f() const;
virtual void g() const;
};
int test(const C& obj)
{
obj.g();
return 0; // to avoid tail call optimization
}
To understand the generated code, we need to review the standard way of implementing
polymorphism through vtables in C++ (other object-oriented languages use similar
techniques). Consider an object instance at address O
, of class C. There is a
unique vtable at address V
for class C. Each object instance of that class points
to the vtable of its exact class using a hidden or implicit field at the beginning
of the object data structure.
object instance class C vtable
+---------------+ +------------+
O ----->|+0: V |------->|+0: F |--------> C::f() code
+---------------+ +------------+
|+8: other | |+8: G |--------> C::g() code
| object | +------------+
| fields |
+---------------+
On Linux Ubuntu 22.10 for arm64, we use gcc version 12.2.0 or clang version 15.0.2-1. The following commands display the generated code (with demangled names):
$ gcc -O2 -march=armv8.5-a -mbranch-protection=pac-ret test.cpp -S -o - | c++filt
$ clang -O2 -march=armv8.5-a -mbranch-protection=pac-ret test.cpp -S -o - | c++filt
The generated code for the test()
function is exactly the same with the two
compilers, except the allocation of registers (x1 with gcc, x8 with clang).
The call to the virtual method has been isolated between dashed lines.
The virtual call sequence uses 3 instructions.
paciasp ; [*] add a PAC to return address in x30 before pushing it on stack (key = IA, mod = current stack pointer)
stp x29, x30, [sp, #-16]!
mov x29, sp
----------------- ; x0 == first parameter of test() function == object address O
ldr x8, [x0] ; x8 = vtable address V, fetched at offset 0 in the object instance
ldr x8, [x8, #8] ; x8 = virtual method address G, fetched at offset 8 in the vtable
blr x8 ; 1) branch with link to method C::g()
; 2) x0 still contains the object address (implicit first argument of a non-static method)
-----------------
mov w0, wzr
ldp x29, x30, [sp], #16
retaa ; [*] authenticate the return address in x30 before jumping to it
The two lines which are marked with [*]
result from the option -mbranch-protection=pac-ret
.
Thanks to option -march=armv8.5-a
, the return sequence has been optimized as
one single retaa
instruction. Without this option, in order to remain compatible
with previous versions of the Arm architecture, the sequence uses two instructions:
autiasp
for the authentication alone, followed by a standard ret
.
On macOS 13.2, we use Apple clang version 14.0.0 and the command becomes:
$ clang -O2 -arch arm64e test.cpp -S -o - | c++filt
The generated code for the test()
function is shown below. The stack protection is
the same as on Linux, except that it uses the PAC key IB instead of IA, which does
not make any difference in terms of security. However, unlike the Linux code, we can
also see how the C++ vtable is protected.
In practice, we will see that 3 different PAC keys are used on macOS for 3 different types of protection. On Linux, only the last protection is used.
- IA : protects the content of C++ vtables (against JOP attacks).
- DA : protects the content of C++ object instances (intermediate step toward a JOP attack).
- IB : protects the return addresses on stack (against ROP attacks).
pacibsp
stp x29, x30, [sp, #-16]!
mov x29, sp
----------------- ; x0 == first parameter of test() function == object address O
ldr x16, [x0] ; x16 = vtable address V with PAC, fetched at start of object instance
mov x17, x0 ; x17 = object address O
movk x17, #20692, lsl #48 ; tweak object address: store 20692 (0x50D4) in upper 16 bits of O
autda x16, x17 ; authenticate vtable address in x16 (key = DA, mod = tweaked object address)
ldr x8, [x16, #8]! ; 1) x8 = virtual method address G with PAC, fetched at offset 8 in the vtable
; 2) x16 = plain address of method address inside vtable (V+8)
mov x9, x16 ; [*] useless intermediate step ?
mov x17, x9 ; [*] x17 = plain address of method address inside vtable (still useless, could have kept x16)
movk x17, #30081, lsl #48 ; tweak vtable address V+8: store #30081 (0x7581) in upper 16 bits of V+8
blraa x8, x17 ; 1) authenticate method address in x8 (key = IA, mod = tweaked vtable address)
; 2) branch with link to method C::g()
; 3) x0 still contains the object address (implicit first argument of a non-static method)
-----------------
mov w0, #0
ldp x29, x30, [sp], #16
retab
This sequence uses 9 instructions instead of 3 in the unauthenticate method.
However, there is some room for improvement. The two instructions which are
marked with [*]
are useless and the two registers x8 and x9 are allocated
for nothing. But that would still make 7 instructions instead of 3.
Using option -O3
does not improve the code.
The code pattern which is recommended by Arm to call an authenticated virtual method is the following (x16 and x17 are arbitrary choices here).
ldr x16, [x0] ; x16 = vtable address V with PAC (x0 == object address == O)
ldraa x17, [x16, #8]! ; 1) authenticate vtable address in x16 (key = DA, mod = 0)
; 2) x17 = virtual method address G with PAC, fetched at offset 8 in the vtable
; 3) x16 = plain address of method address in vtable == V+8
blraa x17, x16 ; 1) authenticate method address in x17 (key = IA, mod = address of method address in vtable)
; 2) branch with link to virtual method
Using this method, calling an authenticated virtual method uses only 3 instructions, exactly the same number as an unauthenticated call.
However, there are differences between the two ABI:
- Recommended Arm method:
- The vtable address (V) is authenticated using zero as modifier.
- The virtual method address (G) is authenticated using a modifier which is the address of where G is stored in the vtable (V + some offset, modifier = V+8 in our example).
- Apple
arm64e
ABI:- The vtable address (V) is authenticated using a modifier which is the object instance address (O) with a 16-bit tweak (Tv) in the upper 2 bytes of the address O (modifier = Tv||O). Using several class types demonstrates that the value of Tv (20692 in the example) is different for each class. Since it must be consistent across compilation units, we can speculate that it is derived from the fully qualified name of the class.
- The virtual method address (G) is authenticated using a modifier which is the address where it is stored inside the vtable (V+8 in the example) with a 16-bit tweak (Tm) in the upper 2 bytes of the address V+8 (modifier = Tm||(V+8)). Using several virtual methods for the same class and several class types demonstrates that the value of Tm (30081 in the example) is different for each virtual method of the same class and different from virtual methods in other classes at the same offset in the vtable.
Assuming that Apple clang is improved to use 7 instructions instead of 9 for the virtual method call, where do the remaining extra 4 instructions come from?
- 3 additional instructions to build the tweaks
- 1 additional instruction to load the method address from the vtable:
ldraa
uses modifier zero, we need to split this inautda
andldr
if we use an explicit modifier.
Why this extra complexity in the Apple arm64e
ABI?
- There is some benefit in using the object instance address as modifier for the vtable address. It mitigates some potential polymorphic attack. If we steal the vtable address of an object O1 of class C1 and overwrite the start of an object O2 of class C2 with that solen vtable address, then object O2 will behave as if it was of class C1 instead of C2. This could be a security risk which is not mitigated by the Arm recommended method.
- The benefit of adding tweaks in the modifier is explained in this document from the Apple LLVM project. They call this a "discriminator". The discriminator is a constant value which is derived from the fully qualified name of the class (for the pointer to the vtable) or the method (for pointers inside the vtable).
The Pointer Authentication Code (PAC) Arm features are extremely powerful to mitigate current malware injection techniques, ROP and JOP. However, PAC must be used and properly used. The current mainstream versions of gcc and clang do not use PAC by default. Moreover, when they use it, they do not use the full power of PAC and only address the basic ROP attacks (smashing the return pointer on stack).
The Apple arm64e
ABI, on the other hand, is carefully designed to use PAC in most
possible situations. This platform ABI is currently used by default on iOS (iPhone and
iPad) but not yet on macOS. We look forward to have arm64e
enabled by default on macOS.
We also look forward to see a similar ABI with gcc and clang. This will be a new ABI,
breaking the compatibility with previous ones. However, this is not the first time gcc
will introduce a new level of ABI. The link constraints have been solved using
dedicated GLIBCXX_
symbols or similar tricks.