Skip to content

Commit

Permalink
unix: stop using weak symbols with Python 3.8
Browse files Browse the repository at this point in the history
The bane of weak symbols on macOS has come back to haunt us.
(See indygreg/PyOxidizer#373 for previous
battles.)

In #122
we tracked down a runtime failure to the fact that CPython 3.8
didn't properly backport weak symbol handling support. So, if
you build with a modern SDK targeting an older SDK (which we do as
of 63f13fb), the linker will
insert a weak symbol. However, CPython doesn't have the runtime
guards and will attempt to dereference it, causing a crash.

Up to this point, our strategy for handling this mess was to
stop using symbols on all Python versions when we found one to
be causing an issue. This was crude, but effective.

In recent commits, we implemented support for leveraging the
macOS SDK .tbd files for validating symbol presence. We can now
cross reference undefined symbols in our binaries against what
the SDKs tell us is present and screen for missing symbols. This
helps us detect strong symbols that aren't present on targeted
SDK versions.

For weak symbols, I'm not sure if we can statically analyze the
Mach-O to determine if a symbol is guarded. I _think_ the guard
is a compiler built-in and gets converted to a function call, or
maybe inline assembly. We _might_ have to disassemble if we wanted
to catch unguarded weakly referenced symbols. Yeah, no.

In this commit, we effectively change our strategy for weak symbol
handling.

Knowing that CPython 3.9+ should have guarded weak symbols everywhere,
we only ban symbol use on CPython 3.8, specifically x86-64 3.8 since
the aarch64 build targets macOS SDK 11, which has the symbols we
need.

We also remove the one-off validation check for 2 banned symbols. In
its place we add validation that only a specific allow list of weak
symbols is present on CPython 3.8 builds.

As part of developing this, I discovered that libffi was also using
mkostemp without runtime guards. I'm unsure if Python would ever call
into a function that would attempt to resolve this symbol. But if it
does it would crash on 10.9. So we disable that symbol for builds
targeting 10.9.
  • Loading branch information
indygreg committed Apr 28, 2022
1 parent 12f2557 commit e69198e
Show file tree
Hide file tree
Showing 5 changed files with 74 additions and 32 deletions.
10 changes: 8 additions & 2 deletions .github/workflows/apple.yml
Original file line number Diff line number Diff line change
Expand Up @@ -219,10 +219,16 @@ jobs:
- toolchain
runs-on: 'macos-11'
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
with:
fetch-depth: 0

- uses: actions/checkout@v3
with:
repository: 'phracker/MacOSX-SDKs'
ref: master
path: macosx-sdks

- name: Install Python
uses: actions/setup-python@v2
with:
Expand Down Expand Up @@ -267,7 +273,7 @@ jobs:
EXTRA_ARGS="--run"
fi
build/pythonbuild validate-distribution ${EXTRA_ARGS} dist/*.tar.zst
build/pythonbuild validate-distribution --macos-sdks-path macosx-sdks ${EXTRA_ARGS} dist/*.tar.zst
- name: Upload Distributions
uses: actions/upload-artifact@v2
Expand Down
24 changes: 13 additions & 11 deletions cpython-unix/build-cpython.sh
Original file line number Diff line number Diff line change
Expand Up @@ -754,18 +754,20 @@ if [ "${PYBUILD_PLATFORM}" = "macos" ]; then
# as Homebrew or MacPorts. So nerf the check to prevent this.
CONFIGURE_FLAGS="${CONFIGURE_FLAGS} ac_cv_lib_intl_textdomain=no"

# When building against an 11.0+ SDK, preadv() and pwritev() are
# detected and used, despite only being available in the 11.0+ SDK. This
# prevents object files from re-linking when built with older SDKs.
# So we disable them. But not in aarch64-apple-darwin, as that target
# requires the 11.0 SDK.
# CPython 3.9+ have proper support for weakly referenced symbols and
# runtime availability guards. CPython 3.8 will emit weak symbol references
# (this happens automatically when linking due to SDK version targeting).
# However CPython lacks the runtime availability guards for most symbols.
# This results in runtime failures when attempting to resolve/call the
# symbol.
#
# This solution is less than ideal. Modern versions of Python support
# weak linking and it should be possible to coerce these functions into
# being weakly linked.
if [ "${TARGET_TRIPLE}" != "aarch64-apple-darwin" ]; then
CONFIGURE_FLAGS="${CONFIGURE_FLAGS} ac_cv_func_preadv=no"
CONFIGURE_FLAGS="${CONFIGURE_FLAGS} ac_cv_func_pwritev=no"
# Unfortunately, this means we need to ban weak symbols on CPython 3.8, to
# the detriment of performance. However, we can actually use the symbols
# on aarch64 because it targets macOS SDK 11.0, not 10.9.
if [[ "${PYTHON_MAJMIN_VERSION}" = "3.8" && "${TARGET_TRIPLE}" != "aarch64-apple-darwin" ]]; then
for symbol in clock_getres clock_gettime clock_settime faccessat fchmodat fchownat fdopendir fstatat getentropy linkat mkdirat openat preadv pwritev readlinkat renameat symlinkat unlinkat; do
CONFIGURE_FLAGS="${CONFIGURE_FLAGS} ac_cv_func_${symbol}=no"
done
fi

if [ -n "${CROSS_COMPILING}" ]; then
Expand Down
11 changes: 10 additions & 1 deletion cpython-unix/build-libffi.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,20 @@ tar -xf libffi-${LIBFFI_VERSION}.tar.gz

pushd libffi-${LIBFFI_VERSION}

EXTRA_CONFIGURE=

# mkostemp() was introduced in macOS 10.10 and libffi doesn't have
# runtime guards for it. So ban the symbol when targeting old macOS.
if [ "${APPLE_MIN_DEPLOYMENT_TARGET}" = "10.9" ]; then
EXTRA_CONFIGURE="${EXTRA_CONFIGURE} ac_cv_func_mkostemp=no"
fi

CFLAGS="${EXTRA_TARGET_CFLAGS} -fPIC" CPPFLAGS="${EXTRA_TARGET_CFLAGS} -fPIC" LDFLAGS="${EXTRA_TARGET_LDFLAGS}" ./configure \
--build=${BUILD_TRIPLE} \
--host=${TARGET_TRIPLE} \
--prefix=/tools/deps \
--disable-shared
--disable-shared \
${EXTRA_CONFIGURE}

make -j ${NUM_CPUS}
make -j ${NUM_CPUS} install DESTDIR=${ROOT}/out
4 changes: 2 additions & 2 deletions src/macho.rs
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ pub struct MachOAllowedDylib {
#[derive(Clone, Debug, Default)]
pub struct LibrarySymbols {
/// Symbol name -> source paths that require them.
symbols: BTreeMap<String, BTreeSet<PathBuf>>,
pub symbols: BTreeMap<String, BTreeSet<PathBuf>>,
}

impl LibrarySymbols {
Expand All @@ -95,7 +95,7 @@ impl LibrarySymbols {
/// Holds required symbols, indexed by library.
#[derive(Clone, Debug, Default)]
pub struct RequiredSymbols {
libraries: BTreeMap<String, LibrarySymbols>,
pub libraries: BTreeMap<String, LibrarySymbols>,
}

impl RequiredSymbols {
Expand Down
57 changes: 41 additions & 16 deletions src/validation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -420,12 +420,24 @@ const ELF_BANNED_SYMBOLS: &[&str] = &[
"pthread_yield",
];

/// Symbols that we don't want to appear in mach-o binaries.
const MACHO_BANNED_SYMBOLS_NON_AARCH64: &[&str] = &[
// _readv and _pwritev are introduced when building with the macOS 11 SDK.
// If present, they can cause errors re-linking object files. So we ban their
// existence.
"_preadv", "_pwritev",
/// Mach-O symbols that can be weakly linked on Python 3.8.
const MACHO_ALLOWED_WEAK_SYMBOLS_38_NON_AARCH64: &[&str] = &[
// Internal to Apple SDK. However, the symbol isn't guarded properly in some Apple
// SDKs. See https://github.com/indygreg/PyOxidizer/issues/373.
"___darwin_check_fd_set_overflow",
// Appears to get inserted by Clang.
"_dispatch_once_f",
// Used by CPython. But is has runtime availability guards in 3.8 (one of the few
// symbols that does).
"__dyld_shared_cache_contains_path",
// Used by CPython without guards but the symbol is so old it doesn't matter.
"_inet_aton",
// Used by tk. It does availability guards properly.
"_NSAppearanceNameDarkAqua",
// Older than 10.9.
"_fstatvfs",
"_lchown",
"_statvfs",
];

static WANTED_WINDOWS_STATIC_PATHS: Lazy<BTreeSet<PathBuf>> = Lazy::new(|| {
Expand Down Expand Up @@ -777,16 +789,6 @@ fn validate_macho<Mach: MachHeader<Endian = Endianness>>(
library_ordinal: symbol.library_ordinal(endian),
weak: symbol.n_desc(endian) & (object::macho::N_WEAK_REF) != 0,
});

if target_triple != "aarch64-apple-darwin"
&& MACHO_BANNED_SYMBOLS_NON_AARCH64.contains(&name.as_str())
{
context.errors.push(format!(
"{} references unallowed symbol {}",
path.display(),
name
));
}
}
}
}
Expand Down Expand Up @@ -1153,6 +1155,29 @@ fn validate_distribution(
}
}

// On Apple Python 3.8 we need to ban most weak symbol references because 3.8 doesn't have
// the proper runtime guards in place to prevent them from being resolved at runtime,
// which would lead to a crash. See
// https://github.com/indygreg/python-build-standalone/pull/122.
if python_major_minor == "3.8" && *triple != "aarch64-apple-darwin" {
for (lib, symbols) in &context.macho_undefined_symbols_weak.libraries {
for (symbol, paths) in &symbols.symbols {
if MACHO_ALLOWED_WEAK_SYMBOLS_38_NON_AARCH64.contains(&symbol.as_str()) {
continue;
}

for path in paths {
context.errors.push(format!(
"{} has weak symbol {}:{}, which is not allowed on Python 3.8",
path.display(),
lib,
symbol
));
}
}
}
}

// Validate Mach-O symbols and libraries against what the SDKs say. This is only supported
// on macOS.
if triple.contains("-apple-darwin") {
Expand Down

0 comments on commit e69198e

Please sign in to comment.