[RFC] JSM alternatives #16913

kumargu · 2024-12-27T05:39:45Z

Is your feature request related to a problem? Please describe

This is a draft to review the known alternatives to the Java Security Manager (JSM) and the path forward for upgrading OpenSearch to JDK 24 in the 3.0 release.

Describe the solution you'd like

GitHub issues:

[1] https://github.com/opensearch-project/OpenSearch/issues/1687
[2] #16634

Background

OpenJDK has decided to permanently disable the Security Manager (JSM) starting with JDK 24, with the following rationale:

Low adoption within Java applications, combined with a high maintenance cost.
Flawed design principles that make it very hard for developers to maintain and work with the Security Manager.
Modern Java has better ways to enforce security, and tooling such as Docker and systemd is now preferred for sandboxing Java applications.

The Java Security Manager is used extensively in OpenSearch to sandbox the OpenSearch process. This sandboxing protects the process from plugins (assumed to be untrusted code) and from vulnerabilities in the packages it depends on. The Security Manager restricts plugin access to system resources such as files, network connections, environment variables, system properties and class loading. At a very high level, you can write a policy for each plugin that defines which resources the plugin can access. For example, you can allow the security plugin to read certificates from a specific disk location while all other plugins are denied that access. In fact, you can also define a policy for OpenSearch core, which limits what the core itself can do if one of its dependencies is vulnerable. A classic example is the Log4j remote code execution vulnerability (CVE-2021-44228), which impacted almost all Java applications, while OpenSearch (then Elasticsearch) was largely shielded. All credit goes to the Security Manager (it denied access to write the Log4j configuration file).

What exactly is changing with JDK-24

With JDK 24 the Security Manager is permanently disabled, meaning it is no longer possible to enable it when starting the Java runtime. Up to and including JDK 23, enabling the Security Manager only produced a deprecation warning, which could be suppressed.
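
As a rough illustration of this timeline, the sketch below maps a JDK feature version to the Security Manager's status. The class and enum names are hypothetical and not part of OpenSearch; the version boundaries assume the deprecation for removal in JDK 17 (JEP 411) and the permanent disabling in JDK 24 (JEP 486).

```java
/** Hypothetical helper, for illustration only: Security Manager status per JDK release. */
final class SecurityManagerSupport {

    enum Status {
        SUPPORTED,                // before JDK 17: no deprecation warning
        DEPRECATED_WITH_WARNING,  // JDK 17 to 23: can still be enabled, prints a warning
        PERMANENTLY_DISABLED      // JDK 24 and later: cannot be enabled at all
    }

    /** Maps a JDK feature version (e.g. 21 or 24) to the Security Manager status. */
    static Status statusFor(int featureVersion) {
        if (featureVersion >= 24) {
            return Status.PERMANENTLY_DISABLED;
        }
        if (featureVersion >= 17) {
            return Status.DEPRECATED_WITH_WARNING;
        }
        return Status.SUPPORTED;
    }

    public static void main(String[] args) {
        // Report the status for the JDK this program is running on.
        int feature = Runtime.version().feature();
        System.out.println("JDK " + feature + ": " + statusFor(feature));
    }
}
```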

Trusted vs Untrusted Code

The OpenSearch ecosystem comes with:

OpenSearch Core — This holds the core of the OpenSearch distribution maintained by OpenSearch with a high bar of security. We consider this trusted. However, trusted doesn’t mean it is not exposed to security vulnerabilities, because it depends on external packages, so it too is exposed to vulnerabilities (such as the Log4j vulnerability).
Bundled Plugins — These plugins are bundled together with the distribution. There’s a very fine line regarding what should be considered trusted vs. untrusted. Some of these require special attention beyond just protecting against vulnerabilities, because they perform additional work like making outbound network calls, reading trust stores, class loading, etc. For these reasons, we classify them as Untrusted (or semi-trusted).
Other Plugins — These additional plugins are built by OpenSearch developers or members of the OpenSearch community and are not bundled by default. We consider them Untrusted.

Goal

In this doc, we try to answer the following meta questions:

Do we need a replacement for the open-source distribution?
What are the known alternatives to the Security Manager?

Ideally, we want OpenSearch to use the latest and greatest version of Java. We would like to use JDK 24 for the 3.0 release of OpenSearch, expected to land in April 2025. Based on the known alternatives and the protection each provides, we will decide which options are sufficient for us to be confident living without the Security Manager.

Do we need a replacement?

The open-source distribution depends heavily on the Security Manager as a first line of security defense, so we must find a replacement for it. That said, we are not looking for a full, like-for-like replacement. Until we are confident in the new security posture, we cannot upgrade OpenSearch core and plugins to JDK 24, and we obviously do not want to remain pinned to an older JDK version while a newer (better) version is available.

Requirements

Before diving into alternatives to the Security manager, let’s first examine the types of protections it currently provides in OpenSearch. These will serve as the baseline requirements for identifying suitable alternatives.
We will categorize these requirements by priority:

Priority A: These are critical requirements that must be addressed by one or more alternative solutions.
Lower Priorities: These are less critical and considered "nice-to-have".

Priority A

Controlled read, write, and execute permissions for specific files and directories. Restrict hard or symbolic link creation.
Controlled access to specific IPs, ports, or protocols.
Disallow system calls, for example:
1. subprocess creation by plugins
2. reboot
3. system exit
Disallow native access
Controlled access to system properties and environment variables.
Controlled class loading and reflection:
1. Prevent unauthorized access to private fields or methods.
2. Controlled dynamic class loading. Use pre-approved class loaders for dynamic class loading.
3. Restricted use of reflection.
4. Disallow implementation of arbitrary host interfaces.
Controlled ability to load and access key stores containing private keys and certificates.
Restricted operations for creating or signing crypto keys.

Priority B

(not a blocker for 3.0)

Reduce or completely avoid shared memory between plugins and core.
Restrict the maximum number of stack frames that plugins can push on the stack, to protect against unbounded recursion.
Limit the size of the output that plugin code writes to standard output / error.
Prevent plugins from monitoring SSL sessions established with SSL peers.
Prevent plugins from invalidating sessions, which may slow down performance.

Now let's review the alternatives that can meet these requirements.

Alternatives

1 Systemd sandboxing

[GH issue: https://github.com//issues/16729]

Systemd provides security features that can be used to isolate processes from each other as well as from the underlying operating system. In other words, it allows you to set up privilege separation between the different components of the OS.

Today, there is already a systemd setup that you can optionally use to start your OpenSearch process. Going forward, we will recommend starting the OpenSearch process with systemd as the preferred and most secure way. Most importantly, it requires no additional infrastructure on Linux systems, which makes it easy to distribute and use.

While there are a whole lot of configs available for building a highly secure sandboxed environment, we will discuss only the ones relevant to our requirements, along with their usage (for clarity). In fact, some of the available configs could bring more protection than the Security Manager.

File System Restrictions:
1. ReadOnlyPaths= (formerly ReadOnlyDirectories=), InaccessiblePaths=, ReadWritePaths=: grant a service read-write access to specific paths while making the rest of the file system read-only or inaccessible.
Network Access Control: Restrict and control network communication by allowlisting or denylisting the IP addresses with which the process can communicate. Similarly, restrict the sockets to which a process can bind, e.g. OpenSearch core can be allowed to bind to (9200, 9200..)
System Call Filtering: systemd uses seccomp filters to allow or block specific system calls, heavily reducing the exposed kernel attack surface.
Capability Bounding: Limits, in a relatively fine-grained fashion, which kernel capabilities a service retains once started.
1. e.g. CapabilityBoundingSet=CAP_CHOWN CAP_KILL. This would allow the service to use only the "CAP_CHOWN" (change ownership) and "CAP_KILL" (terminate processes) capabilities.
Controlled ability to load and access key stores containing private keys and certificates
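
To make the options above more concrete, here is a minimal sketch that renders them as a systemd drop-in `[Service]` section. The class name, the paths (`/var/lib/opensearch`, `/var/log/opensearch`, `/etc/opensearch`, `/home`), the port and the chosen syscall group are illustrative assumptions, not the configuration OpenSearch ships. It uses `ReadOnlyPaths=`, the current name for the legacy `ReadOnlyDirectories=`, and adds `ProtectSystem=strict` as one way to make the rest of the file system read-only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that renders sandboxing directives as a systemd drop-in. */
final class SandboxDropIn {

    /** Renders each directive as "Name=value1 value2" under a [Service] header. */
    static String render(Map<String, List<String>> directives) {
        StringBuilder out = new StringBuilder("[Service]\n");
        directives.forEach((name, values) ->
            out.append(name).append('=').append(String.join(" ", values)).append('\n'));
        return out.toString();
    }

    /** Example values only; every path and port here is an assumption. */
    static Map<String, List<String>> exampleDirectives() {
        Map<String, List<String>> d = new LinkedHashMap<>();
        // File system: read-only by default, with explicit exceptions.
        d.put("ProtectSystem", List.of("strict"));
        d.put("ReadWritePaths", List.of("/var/lib/opensearch", "/var/log/opensearch"));
        d.put("ReadOnlyPaths", List.of("/etc/opensearch"));
        d.put("InaccessiblePaths", List.of("/home"));
        // Network: deny everything, then allow loopback traffic and one bind port.
        d.put("IPAddressDeny", List.of("any"));
        d.put("IPAddressAllow", List.of("localhost"));
        d.put("SocketBindDeny", List.of("any"));
        d.put("SocketBindAllow", List.of("tcp:9200"));
        // System calls: the leading "~" turns the list into a deny list.
        d.put("SystemCallFilter", List.of("~@reboot"));
        // Capabilities: keep only the two named in the text above.
        d.put("CapabilityBoundingSet", List.of("CAP_CHOWN", "CAP_KILL"));
        return d;
    }

    public static void main(String[] args) {
        System.out.print(render(exampleDirectives()));
    }
}
```

The printed text could be saved as a drop-in file for the service unit; the exact unit name and drop-in location depend on how OpenSearch was installed.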

Overall, this option does a great job of securing the OpenSearch process against the common side effects of vulnerabilities and against untrusted code disrupting it.

Limitations —

In the current model, both OpenSearch core and its plugins execute in a single process. This design introduces a concern: if a plugin, such as the security plugin, requires elevated privileges (for example, to access a trust store), those privileges must be configured at the global systemd service level, and hence they apply uniformly to all plugins. A more secure and ideal approach would involve implementing fine-grained, plugin-specific systemd configurations to enforce the principle of least privilege which we c
Controlled class loading and access via reflection: Today, classloader filtering integrates with the Security Manager just as a convenient way to provide a list of allowable classes, but it doesn't have to work this way. It can be changed to get its list of allowed classes through a new custom classloader implementation. This is generally a good way anyway to move this logic out of the Security Manager.
Disallow native access
No direct replacement on Windows.
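
The custom classloader idea mentioned in the limitations above could look roughly like the sketch below. `AllowlistClassLoader` is a hypothetical name, not an existing OpenSearch class, and the sketch only filters classes requested through this loader; classes resolved by parent loaders during linking are not checked.

```java
import java.util.Set;

/** Hypothetical class loader that only resolves class names on an explicit allowlist. */
final class AllowlistClassLoader extends ClassLoader {

    private final Set<String> allowedClassNames;

    AllowlistClassLoader(ClassLoader parent, Set<String> allowedClassNames) {
        super(parent);
        this.allowedClassNames = Set.copyOf(allowedClassNames);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Reject anything not on the allowlist before delegating to the parent loader.
        if (!allowedClassNames.contains(name)) {
            throw new ClassNotFoundException("Class not on the allowlist: " + name);
        }
        return super.loadClass(name, resolve);
    }

    /** Convenience check: true if the named class can be loaded through this loader. */
    boolean isLoadable(String className) {
        try {
            loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AllowlistClassLoader loader = new AllowlistClassLoader(
            ClassLoader.getSystemClassLoader(), Set.of("java.util.ArrayList"));
        System.out.println("java.util.ArrayList loadable: " + loader.isLoadable("java.util.ArrayList"));
        System.out.println("java.lang.Runtime loadable: " + loader.isLoadable("java.lang.Runtime"));
    }
}
```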

2 GraalVM sandboxing

[GH issue :https://github.com//issues/16861]

Oracle GraalVM is a high-performance JDK that enhances Java and JVM-based applications through its Ahead-Of-Time (AOT) compiler.

Beyond performance improvements, GraalVM also offers a sandboxing mechanism, which is particularly relevant for securely executing guest code within a host application.
The sandboxing feature establishes an isolation boundary between host and guest code, comparable to the separation between user mode and kernel mode in operating systems. In this context:

Host: OpenSearch Core.
Guest: OpenSearch Plugins.

This isolation ensures that guest code executes in a restricted and controlled environment, separate from the host's privileges. However, as of now, GraalVM supports JavaScript as a guest language, while full support for Java as a guest language is still a work in progress (refer to [GR-49729] [Espresso] Support running without native access).

While full guest Java support is still under development, GraalVM’s existing features (Espresso) can be used to:

Isolate and Execute Legacy Code: GraalVM allows running an older JVM version (guest JVM) in a sandboxed environment while the host JVM operates with a newer version.
No Compilation Target Changes: Both the OpenSearch Core and plugins can continue to run as JIT-compiled code without modifications to their compilation targets.

The overall idea is to spawn a guest GraalVM JVM with the Security Manager enabled, with guest and host sharing objects via the low-level GraalVM interoperability API. Next, let's look at some high-level steps to achieve this. You can also refer to the PoC for a better understanding: #16863

Proposal

Host Environment:

OpenSearch Core and trusted components (e.g., Lucene and trusted plugins) run on a modern JVM version supported by GraalVM (e.g., JDK 24).
The Security Manager is disabled in this environment, as it is deprecated in newer Java versions.

Guest Environment:

Non-trusted components (e.g., plugins) run on an older JDK version (up to JDK 23) where the Security Manager is still available.
The Security Manager is configured with the same security policies currently used by OpenSearch Core.

A GraalVM Engine:

A GraalVM Engine is initialized to provide runtime support for interaction between host and guest environments. This engine facilitates secure communication using GraalVM’s low-level APIs.
The guest environment runs with the Security Manager enabled, ensuring that untrusted code is executed in a controlled context with appropriate restrictions.

Limitations

No support of GraalVM on Windows.
Slow boot-up time of the spawned JVM (though that is ideally a one-time cost).
Debugging is hard; most errors are very low-level GraalVM implementation details.
Communication between the host and the spawned JVM/context is currently very limited. At least one major bug fix is known, but there has been no full confirmation that it will be picked up in the upcoming January release. This largely blocks us from running further experiments. We have, however, asked the GraalVM team to try to make the bug fix available in the January release.

Performance — While we don’t expect any performance impact, it is yet to be benchmarked and published.

Take-aways: Overall, this approach allows us to move forward with newer Java versions (JDK 24 and beyond) while preserving the Security Manager as it is used today.

While this looks hacky, this area of work will lay the groundwork for fully utilising the sandboxing capabilities of GraalVM in the future, allowing us to build isolation boundaries between trusted and untrusted code with fine-grained sandboxing policies.
It would also allow plugin authors to write plugins in different languages, such as JS, Rust etc.

3 Plugin level systemd

[GH issue: https://github.com//issues/16753]

Earlier we proposed strengthening the OpenSearch core security model via additional systemd configs, such as limiting access to sockets and files. An extension of that sandboxing would be to run (some) plugins as separate systemd units (i.e. separate processes), each with its own restrictive systemd config. This is akin to the Security Manager having plugin-level security policies. It would also allow some plugins to run with elevated privileges without elevating the privileges of core. The overall idea would be to expose a secure REST server within OpenSearch core, where plugin ↔ core interactions happen over secure, fast, bidirectional IPC. Such IPC could run over Unix domain sockets, which are fast and lightweight and can use POSIX permissions to lock down access to the file descriptor (FD).
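
A minimal sketch of such IPC, assuming Java 16+ (which added `UnixDomainSocketAddress`) and a POSIX file system: a "core" thread echoes one message back to a "plugin" client over a Unix domain socket that lives in a directory only its owner can access. The class name, the `plugin.sock` file name and the echo protocol are hypothetical; a real design would put a proper request/response protocol on top of the channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical plugin <-> core round trip over a Unix domain socket. */
final class PluginIpcSketch {

    /** Sends one message from a "plugin" client to a "core" echo server and returns the reply. */
    static String roundTrip(String message) {
        try {
            // Directory accessible only to the owning user, so other users cannot reach the socket.
            Path dir = Files.createTempDirectory("os-ipc",
                PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));
            Path socketFile = dir.resolve("plugin.sock");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address);
                Thread core = new Thread(() -> echoOnce(server));
                core.start();
                try (SocketChannel plugin = SocketChannel.open(address)) {
                    writeAll(plugin, message);
                    plugin.shutdownOutput(); // signal end of request
                    String reply = readAll(plugin);
                    core.join();
                    return reply;
                }
            } finally {
                // Closing the channel does not remove the socket file, so clean up explicitly.
                Files.deleteIfExists(socketFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("IPC round trip failed", e);
        }
    }

    /** "Core" side: accept one connection and echo the request back. */
    private static void echoOnce(ServerSocketChannel server) {
        try (SocketChannel peer = server.accept()) {
            writeAll(peer, readAll(peer));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeAll(SocketChannel channel, String text) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    /** Reads until the peer closes its side of the connection. */
    private static String readAll(SocketChannel channel) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            bytes.write(buffer.array(), 0, buffer.limit());
            buffer.clear();
        }
        return bytes.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

Restricting the parent directory rather than the socket file itself is a deliberate choice in this sketch: the permissions are in place before the socket is created, so there is no window in which another user could connect.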

This idea overlaps with work proposed as part of Project Extensions, which is currently on hold because of
[1] ambiguity around the added performance impact of ser/de when running plugins outside of core, and
[2] the large amount of work involved, which requires rewriting plugins that are tightly coupled with core.

4 JDK fork (not preferred)

The idea is to maintain a fork of JDK preserving the security manager in JDK-24 and beyond.

However, this approach is not ideal, as it would introduce significant overhead in maintaining the fork, particularly in porting bug fixes and updates from the upstream JDK. This solution should only be considered as a last resort if none of the previously discussed alternatives prove to be viable.

Conclusion

Assuming 3.0 lands in April 2025 with JDK 24, we have around three months from today to pick alternatives that make us comfortable living without the Security Manager. While this doc discussed multiple overlapping alternatives, not all of them necessarily need to be implemented for the 3.0 release.

#1 Systemd sandboxing alone is very powerful and covers a lot of what the Security Manager does today. It will protect OpenSearch from most security risks and will become our first line of defence. I would say we are 90% covered with just #1. It is low-hanging fruit, and even if we are not able to ship 3.0 with JDK 24, we would still like to ship #1.

When it comes to #2 GraalVM sandboxing, it essentially means continuing usage of security manager even with JDK-24. The hardest part of the integration with GraalVM was already done by Andriy in his POC (#16861) and we would now assume that the integration could be delivered by March 2025.

Callouts for #2:

[1] Plugins that run in the sandbox JVM can only be upgraded as far as JDK 23. Once full sandboxing is available in Graal (oracle/graal#10239), these plugins can be upgraded to JDK 24 or beyond.

[2] Not all plugins actually need to be instantiated within the Graal-based forked sandbox. Plugins that are tightly coupled with OpenSearch core, trusted, or performance-critical can continue to run in the host JVM without sandboxing on JDK 24 or later.

We believe that #1 and #2 provide enough confidence to proceed with upgrading to JDK 24, with delivery expected by mid-March 2025. Once #2 evolves into a fully developed sandboxing environment (anticipated in Q2 2025), we plan to treat #1 and #2 collectively as a replacement for the Security Manager.

We are temporarily setting aside #3, as it represents a significant amount of work, and meeting the April deadline seems unlikely. If GraalVM sandboxing integration proves problematic within our ecosystem (e.g., harder debugging, unexpected bugs, performance issues), we will revisit #3. However, the GraalVM community is very supportive and working with them has been smooth. On the other hand, if GraalVM integration aligns well with our needs, we may reconsider using Extensions on GraalVM. This presents a major potential advantage, making the risk of GraalVM integration worthwhile.

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

kumargu · 2024-12-27T05:42:00Z

cc @reta (for an early review)

reta · 2024-12-27T21:37:24Z

> cc @reta (for an early review)

@kumargu could we incorporate this under existing RFC (#1687) as a comment or description change? I don't think we have to have yet another RFC on the same subject.

kumargu added the enhancement and untriaged labels Dec 27, 2024

github-actions bot added the Other label Dec 27, 2024
