[FFI/Jextract] Failure to load a complex archive file (.a) on AIX in the jextract generation #19930

Closed
ChengJin01 opened this issue Jul 29, 2024 · 17 comments
Labels: comp:vm, os:aix, project:panama (Used to track Project Panama related work)

Comments

@ChengJin01

ChengJin01 commented Jul 29, 2024

The issue was detected while generating the jextract tool for FFI, after the earlier native library (.a) loading issue was resolved. It is entirely different from the original problem addressed in #19344.

As explained in https://github.com/openjdk/jextract, building the jextract tool requires the LLVM libraries. Some of them (e.g. libclang.a) must be loaded by the JDK so that the build can use their native functions, and the load fails in the call to dlopen (see #19344 (comment) for details). The failure most likely occurs in the code around https://github.com/eclipse-openj9/openj9-omr/blob/9083c8237ac215927ac55b5db256780132983136/port/aix/omrsl.c#L216.

Technically, the existing dlopen handling only loads a simple shared object (extremely simple format) with a .a suffix; it does not support a complex archive file that combines many shared objects (like libc.a, or libclang.a in LLVM). To address the problem, we first need to figure out how dlopen loads such archives, especially libc.a and libclang.a.
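
To make the "simple shared object vs. real archive" distinction concrete, here is a minimal sketch that checks whether a .a file starts with an AIX archive magic string. The helper name is hypothetical, and the magic strings are what I recall from AIX's <ar.h> (small and big archive formats), so they should be verified against the header. This is not what omrsl.c does today.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values of the archive magic strings in AIX <ar.h>
 * (small and big archive formats); each is 8 bytes long. */
#define AIX_SMALL_AR_MAGIC "<aiaff>\n"
#define AIX_BIG_AR_MAGIC   "<bigaf>\n"
#define AIX_AR_MAGIC_LEN   8

/* Hypothetical helper: returns 1 if the file starts with an AIX archive
 * magic string, 0 if it does not, and -1 if the file cannot be opened. */
int looks_like_aix_archive(const char *path)
{
    char magic[AIX_AR_MAGIC_LEN];
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        return -1;
    }
    got = fread(magic, 1, sizeof(magic), fp);
    fclose(fp);
    if (got != sizeof(magic)) {
        return 0; /* too short to carry an archive header */
    }
    return memcmp(magic, AIX_BIG_AR_MAGIC, AIX_AR_MAGIC_LEN) == 0
        || memcmp(magic, AIX_SMALL_AR_MAGIC, AIX_AR_MAGIC_LEN) == 0;
}
```

A caller could use the result to decide whether a member name has to be supplied before calling dlopen; a file that is a bare shared object renamed to .a would return 0 here.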

FYI: @TobiAjila, @pshipton, @JasonFengJ9, @zl-wang, @keithc-ca

@ChengJin01 added the comp:vm, project:panama (Used to track Project Panama related work), and os:aix labels Jul 29, 2024
@ChengJin01
Author

@zl-wang, is there any way to reach out to the AIX team to understand how dlopen works in such a case?

@ChengJin01 changed the title from "[FFI/Jextract] Failure to load native library (.a) file on AIX in the jextract generation" to "[FFI/Jextract] Failure to load a complex archive file (.a) on AIX in the jextract generation" Jul 29, 2024
@zl-wang
Contributor

zl-wang commented Jul 29, 2024

@ChengJin01 This has always been the case on AIX. When an archive has multiple members, dlopen needs to be given the specific member you want to load. I am going to look up the dlopen man page and attach it here later.
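
For reference, AIX names an archive member as archive(member), e.g. /usr/lib/libc.a(shr_64.o). A minimal sketch of building such a name follows; the helper name is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes "archive(member)" into buf.
 * Returns 0 on success, -1 if an argument is missing or buf is too small. */
int format_member_path(char *buf, size_t size,
                       const char *archive, const char *member)
{
    int n;

    if (buf == NULL || archive == NULL || member == NULL) {
        return -1;
    }
    n = snprintf(buf, size, "%s(%s)", archive, member);
    /* snprintf reports the length it wanted; reject truncated results. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string is what would be passed to dlopen as FilePath when loading a member.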

@zl-wang
Contributor

zl-wang commented Jul 29, 2024

dlopen Subroutine

Last Updated: 2023-03-24

Purpose

Dynamically loads a module into the calling process.

Syntax
#include <dlfcn.h>

void *dlopen (FilePath, Flags);
const char *FilePath;
int Flags;

Description
The dlopen subroutine loads the module specified by FilePath into the executing process's address space. Dependents of the module are automatically loaded as well. If the module is already loaded, it is not loaded again, but a new, unique value will be returned by the dlopen subroutine.

The dlopen subroutine is a portable way of dynamically loading shared libraries. It performs C++ static initialization of the modules that it loads, like the loadAndInit subroutine does.

The value returned by the dlopen might be used in subsequent calls to dlsym and dlclose. If an error occurs during the operation, dlopen returns NULL.

If the main application was linked with the -brtl option, then the runtime linker is invoked by dlopen. If the module being loaded was linked with runtime linking enabled, both intra-module and inter-module references are overridden by any symbols available in the main application. If runtime linking was enabled, but the module was not built enabled, then all inter-module references will be overridden, but some intra-module references will not be overridden.

If the module being opened with dlopen or any of its dependents is being loaded for the first time, initialization routines for these newly-loaded routines are called (after runtime linking, if applicable) before dlopen returns. Initialization routines are the functions specified with the -binitfini: linker option when the module was built. (See the ld command for more information about this option.)

After calling the initialization functions for all newly-loaded modules, C++ static initialization is performed. If you call the dlopen subroutine from within an initialization function or a C++ static initialization function, modules loaded by the nested dlopen subroutine might be initialized before completely initializing the originally loaded modules.

If a dlopen subroutine is called from within a binitfini function, the initialization of the current module is abandoned for other modules.

Note: If the module being loaded has read-other permission, the module is loaded into the global shared library segment. Modules loaded into the global shared library segment are not unloaded even if they are no longer being used. Use the slibclean command to remove unused modules from the global shared library segment. To load the module in the process private region, unload the module completely using the slibclean command, and then unset its read-other permission.
The LIBPATH or LD_LIBRARY_PATH environment variables can be used to specify a list of directories in which the dlopen subroutine searches for the named module. The running application also contains a set of library search paths that were specified when the application was linked. The dlopen subroutine searches the modules based on the mechanism that the load subroutine defines, because the dlopen subroutine internally calls the load subroutine with the L_LIBPATH_EXEC flag.

FilePath
Specifies the name of a file containing the loadable module. This parameter can contain an absolute path, a relative path, or no path component. If FilePath contains a slash character, FilePath is used directly, and no directories are searched.

If the FilePath parameter is /unix, dlopen returns a value that can be used to look up symbols in the current kernel image, including those symbols found in any kernel extension that was available at the time the process began execution.

If the value of FilePath is NULL, a value for the main application is returned. This allows dynamically loaded objects to look up symbols in the main executable, or for an application to examine symbols available within itself.

Flags
Specifies variations of the behavior of dlopen. Either RTLD_NOW or RTLD_LAZY must always be specified. Other flags may be OR'ed with RTLD_NOW or RTLD_LAZY.

Item | Description
RTLD_NOW | Load all dependents of the module being loaded and resolve all symbols.
RTLD_LAZY | Specifies the same behavior as RTLD_NOW. In a future release of the operating system, the behavior of RTLD_LAZY may change so that loading of dependent modules is deferred or resolution of some symbols is deferred.
RTLD_GLOBAL | Allows symbols in the module being loaded to be visible when resolving symbols used by other dlopen calls. These symbols will also be visible when the main application is opened with dlopen(NULL, mode).
RTLD_LOCAL | Prevent symbols in the module being loaded from being used when resolving symbols used by other dlopen calls. Symbols in the module being loaded can only be accessed by calling the dlsym subroutine. If neither RTLD_GLOBAL nor RTLD_LOCAL is specified, the default is RTLD_LOCAL. If both flags are specified, RTLD_LOCAL is ignored.
RTLD_MEMBER | The dlopen subroutine can be used to load a module that is a member of an archive. The L_LOADMEMBER flag is used when the load subroutine is called. The module name FilePath names the archive and archive member according to the rules outlined in the load subroutine.
RTLD_NOAUTODEFER | Prevents deferred imports in the module being loaded from being automatically resolved by subsequent loads. The L_NOAUTODEFER flag is used when the load subroutine is called.

Ordinarily, modules built for use by the dlopen and dlsym subroutines will not contain deferred imports. However, deferred imports can still be used. A module opened with dlopen may provide definitions for deferred imports in the main application, for modules loaded with the load subroutine (if the L_NOAUTODEFER flag was not used), and for other modules loaded with the dlopen subroutine (if the RTLD_NOAUTODEFER flag was not used).
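
The rule that either RTLD_NOW or RTLD_LAZY must always be specified is easy to miss when adding RTLD_MEMBER. Below is a small sketch of checking and composing flags. Both helper names are hypothetical, and RTLD_MEMBER is guarded because it is AIX-specific and other platforms' <dlfcn.h> do not define it.

```c
#include <dlfcn.h>

/* Hypothetical helper: returns 1 if flags contain RTLD_NOW or RTLD_LAZY,
 * which the description above says must always be specified; 0 otherwise. */
int has_binding_mode(int flags)
{
    return (flags & (RTLD_NOW | RTLD_LAZY)) != 0;
}

/* Hypothetical helper: flags for loading an archive member. RTLD_MEMBER is
 * OR'ed in only where <dlfcn.h> defines it (AIX). */
int archive_member_flags(void)
{
    int flags = RTLD_NOW;
#ifdef RTLD_MEMBER
    flags |= RTLD_MEMBER;
#endif
    return flags;
}
```

Under this check, a call that passes RTLD_MEMBER on its own would be flagged as missing a binding mode.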

Return Values
Upon successful completion, dlopen returns a value that can be used in calls to the dlsym and dlclose subroutines. The value is not valid for use with the loadbind and unload subroutines.

If the dlopen call fails, NULL (a value of 0) is returned and the global variable errno is set. If errno contains the value ENOEXEC, further information is available via the dlerror function.

@zl-wang
Contributor

zl-wang commented Jul 29, 2024

Buried deep in the Flags section above:
RTLD_MEMBER The dlopen subroutine can be used to load a module that is a member of an archive. The L_LOADMEMBER flag is used when the load subroutine is called. The module name FilePath names the archive and archive member according to the rules outlined in the load subroutine.

@ChengJin01
Author

The problem with libclang.a is that these symbols (required in jextract) don't belong to any member of the archive:

[304]   0x11041e898    .data      EXP     DS SECdef        [noIMid] clang_getRemappings
[305]   0x11041e8b0    .data      EXP     DS SECdef        [noIMid] clang_getRemappingsFr
[306]   0x11041e8c8    .data      EXP     DS SECdef        [noIMid] clang_remap_getNumFil
[307]   0x11041e8e0    .data      EXP     DS SECdef        [noIMid] clang_remap_getFilena
[308]   0x11041e8f8    .data      EXP     DS SECdef        [noIMid] clang_remap_dispose
[309]   0x11041e910    .data      EXP     DS SECdef        [noIMid] clang_getBuildSession
[310]   0x11041e928    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOver
[311]   0x11041e940    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverMapping
[312]   0x11041e958    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverSensitivity
[313]   0x11041e970    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOverBuffer
[314]   0x11041e988    .data      EXP     DS SECdef        [noIMid] clang_free
[315]   0x11041e9a0    .data      EXP     DS SECdef        [noIMid] clang_VirtualFileOver
[316]   0x11041e9b8    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescri
[317]   0x11041e9d0    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrimeworkModuleName
[318]   0x11041e9e8    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrirellaHeader
[319]   0x11041ea00    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrioBuffer
[320]   0x11041ea18    .data      EXP     DS SECdef        [noIMid] clang_ModuleMapDescrie
[321]   0x110434a20    .data      EXP     DS SECdef        [noIMid] clang_Cursor_isNull
[322]   0x110434a38    .data      EXP     DS SECdef        [noIMid] clang_getNullRange
[323]   0x110434a50    .data      EXP     DS SECdef        [noIMid] clang_getNullLocation
[324]   0x110434a68    .data      EXP     DS SECdef        [noIMid] clang_getFileLocation
[325]   0x110434a80    .data      EXP     DS SECdef        [noIMid] clang_getCursorUSR
[326]   0x110434a98    .data      EXP     DS SECdef        [noIMid] clang_getCString
[327]   0x110434ab0    .data      EXP     DS SECdef        [noIMid] clang_disposeString
[328]   0x110434ac8    .data      EXP     DS SECdef        [noIMid] clang_getTypeDeclarat
[329]   0x110434af8    .data      EXP     DS SECdef        [noIMid] clang_getRangeStart
[330]   0x110434b10    .data      EXP     DS SECdef        [noIMid] clang_getRangeEnd
[331]   0x110434b28    .data      EXP     DS SECdef        [noIMid] clang_getRange
[332]   0x110434b70    .data      EXP     DS SECdef        [noIMid] clang_defaultDiagnosttions
[333]   0x110434b88    .data      EXP     DS SECdef        [noIMid] clang_formatDiagnosti
[334]   0x1104c32f0    .data      EXP     DS SECdef        [noIMid] clang_install_abortinl_error_handler
[335]   0x1104c38d8    .data      EXP     DS SECdef        [noIMid] clang_createTranslati
[336]   0x1104e2f10    .data      EXP     DS SECdef        [noIMid] clang_Cursor_getTrans
[337]   0x1104e2f28    .data      EXP     DS SECdef        [noIMid] clang_Range_isNull
[338]   0x1104e3048    .data      EXP     DS SECdef        [noIMid] clang_disposeTranslat
[339]   0x1104e3078    .data      EXP     DS SECdef        [noIMid] clang_isInvalid
[340]   0x1104e3090    .data      EXP     DS SECdef        [noIMid] clang_isDeclaration
[341]   0x1104e30a8    .data      EXP     DS SECdef        [noIMid] clang_isReference
[342]   0x1104e30c0    .data      EXP     DS SECdef        [noIMid] clang_isStatement
[343]   0x1104e30d8    .data      EXP     DS SECdef        [noIMid] clang_isExpression
[344]   0x1104e30f0    .data      EXP     DS SECdef        [noIMid] clang_isTranslationUn
[345]   0x1104e3108    .data      EXP     DS SECdef        [noIMid] clang_isAttribute
[346]   0x1104e3120    .data      EXP     DS SECdef        [noIMid] clang_createIndex
......

Does dlopen work to handle them correctly?
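
To filter a long listing like the one above, each line can be split into fields. This is only a sketch: the struct field names are guesses from the column layout (index, address, section, import/export, class, type, import id, name), and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One row of the listing; field names are assumed from the layout. */
struct loader_symbol {
    int  index;
    char address[32];
    char section[16];
    char imex[8];       /* "EXP" on every line shown above */
    char sclass[8];
    char type[16];
    char import_id[16]; /* "[noIMid]" on every line shown above */
    char name[128];
};

/* Hypothetical parser: returns 0 if all eight fields were read, else -1. */
int parse_loader_symbol(const char *line, struct loader_symbol *out)
{
    int n = sscanf(line, " [%d] %31s %15s %7s %7s %15s %15s %127s",
                   &out->index, out->address, out->section, out->imex,
                   out->sclass, out->type, out->import_id, out->name);
    return n == 8 ? 0 : -1;
}

/* Returns 1 if the row is marked as an export. */
int is_export(const struct loader_symbol *sym)
{
    return strcmp(sym->imex, "EXP") == 0;
}
```

Feeding each line through parse_loader_symbol and keeping rows where is_export returns 1 gives the list of names to compare against what jextract needs.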

@zl-wang
Contributor

zl-wang commented Jul 29, 2024

That means you might need another shared library to satisfy the request. I am wondering how the executable was linked in the first place. Why could it be linked successfully if there were missing symbols?

@ChengJin01
Author

> This has always been the case on AIX. When an archive has multiple members, dlopen needs to be given the specific member you want to load. I am going to look up the dlopen man page and attach it here later.

I tried the following code but it ended up with a null handle.

#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char **argv) {
    void *handle;
    /* Also tried with RTLD_MEMBER | RTLD_LAZY. */
    handle = dlopen("/usr/lib/libc.a(shr_64.o)", RTLD_MEMBER);
    printf("handle = %p\n", handle);
    if (handle != NULL) {
        dlclose(handle);
    }
    return 0;
}
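
A possible variant of the test above, as a sketch only: it ORs RTLD_NOW with RTLD_MEMBER, as the man page requires a binding mode, and prints the dlerror() text when the handle is NULL. The function name is hypothetical. One further assumption, not confirmed in this thread, is that the program must be built in 64-bit mode (for example with -q64 or -maix64) for a 64-bit member such as shr_64.o to load.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical variant of the test above. Returns 0 if the member loads,
 * 1 otherwise, and prints the dlerror() text on failure. */
int try_load_member(const char *member_path)
{
    int flags = RTLD_NOW;   /* a binding mode must always be specified */
    void *handle;

#ifdef RTLD_MEMBER
    flags |= RTLD_MEMBER;   /* AIX only: FilePath names archive(member) */
#endif
    handle = dlopen(member_path, flags);
    if (handle == NULL) {
        const char *reason = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", member_path,
                reason != NULL ? reason : "unknown");
        return 1;
    }
    printf("handle = %p\n", handle);
    dlclose(handle);
    return 0;
}
```

Calling try_load_member("/usr/lib/libc.a(shr_64.o)") on AIX would show either a valid handle or the loader's reason for the failure, which the original test does not report.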

@ChengJin01
Author

ChengJin01 commented Jul 29, 2024

> That means you might need another shared library to satisfy the request. I am wondering how the executable was linked in the first place. Why could it be linked successfully if there were missing symbols?

These libraries (including libclang.a) are unpacked directly from https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.8/clang+llvm-18.1.8-powerpc64-ibm-aix-7.2.tar.xz (as required by jextract), which bundles them together.

@pshipton added this to the Java 23 (0.47) milestone Jul 30, 2024
@pshipton
Member

This is not specific to jdk23, not a new problem, and not a blocker for jdk23, so moving it forward.

@babsingh
Contributor

@JasonFengJ9 For 0.48, this issue will need to be resolved by the end of this week. What's the current state of this issue? Based on this issue's impact, do we need it to be fixed in 0.48 or can it be pushed to 0.49?

@zl-wang
Contributor

zl-wang commented Sep 16, 2024

I have successfully built and run jextract on AIX (and Linux) for a customer (Finanz Informatik). On AIX, there is a clang bug though (allocating 2TB memory). For an official build, you might need to change the gradle build script to copy/extract libclang.so from libclang.a.

The bug still exists in the latest/current version of clang. The OpenXL team is investigating; it is tracked here: https://github.ibm.com/compiler/wyvern/issues/20642

@JasonFengJ9
Member

This is not a new problem. As per #19930 (comment), the customer has a running jextract.

Moving to 0.49.

@JasonFengJ9
Member

> I have successfully built and run jextract on AIX (and Linux) for a customer (Finanz Informatik).

@zl-wang is there any OpenJ9 change involved? Do we still need this issue for further investigation?

@zl-wang
Contributor

zl-wang commented Sep 30, 2024

No, I don't need to change anything in OpenJ9. If this issue was opened for the purpose of building jextract on AIX, I think it belongs in the jextract repository (in order to change the gradle script). Otherwise, it can be closed.

@JasonFengJ9
Member

Thanks @zl-wang. This issue was opened for OpenJ9 support of AIX jextract; since it can be addressed with build script changes, closing it here.


Issue Number: 19930
Status: Closed
Actual Components: comp:vm, project:panama, os:aix
Actual Assignees: No one :(
PR Assignees: No one :(

@JasonFengJ9
Member

> If this issue was opened for the purpose of building jextract on AIX, I think it belongs in the jextract repository (in order to change the gradle script).

Chatted with @zl-wang: the shared library (.so) was extracted manually from the archive file and renamed to libclang.so.
Will propose a script change at https://github.com/openjdk/jextract.
