From e4b048cecc1ae4e3bb0b8f6b733683ed28143133 Mon Sep 17 00:00:00 2001
From: Cole-Greer
Date: Thu, 5 Dec 2024 13:54:40 -0800
Subject: [PATCH] Update reference docs

---
 .../reference/gremlin-applications.asciidoc | 20 ++------
 docs/src/reference/gremlin-variants.asciidoc | 40 +++++++++++++++
 docs/src/reference/the-traversal.asciidoc | 50 +++++++++++++++++++
 .../traversal/dsl/graph/GraphTraversal.java | 8 +--
 4 files changed, 99 insertions(+), 19 deletions(-)

diff --git a/docs/src/reference/gremlin-applications.asciidoc b/docs/src/reference/gremlin-applications.asciidoc
index 8cdb7d6c4b..5167ce1440 100644
--- a/docs/src/reference/gremlin-applications.asciidoc
+++ b/docs/src/reference/gremlin-applications.asciidoc
@@ -1859,7 +1859,7 @@ without all the associated structure which can slow the response.
 [[parameterized-scripts]]
 ==== Parameterized Scripts
-image:gremlin-parameterized.png[width=150,float=left] If using the standard `GremlinGroovyScriptEngine` in Gremlin
+image:gremlin-parameterized.png[width=150,float=left] If using `GremlinGroovyScriptEngine` in Gremlin
 Server, it is imperative to use script parameterization. Period. There are at least two good reasons for doing so: script caching and protection from "Gremlin injection" (conceptually the same as the notion of SQL injection).
@@ -1870,7 +1870,7 @@ scripts. This processing is different from the processing performed by Groovy an
 concerns of this section. When considering parameterization, users should also consider the graph database they are using to determine if it has native mechanisms that preclude the need for parameterization.
-With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed based on the a
+With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed based on the
 hash of the script. Therefore `g.V(1)` and `g.V(2)` will be recognized as two separate scripts in the cache.
 If that script is parameterized to `g.V(x)` where `x` is passed as a parameter from the client, there will be no additional compilation cost for future requests on that script. Compilation of a script should be considered "expensive" and
@@ -1923,21 +1923,14 @@ params.put("nodeId",nodeId);
 client.submit(query, params);
 ----
-Gremlin injection should not be possible with `Bytecode` based traversals - only scripts - because `Bytecode`
-traversals will treat all arguments as literal values. There is potential for concern if lambda based steps are
-utilized as they execute arbitrary code, which is string based, but configuring `TraversalSource` instances with
-`LambdaRestrictionStrategy`, which prevents lambdas all together, using a graph that does not allow lambdas at all, or
-configuring appropriate <> in Gremlin Server (or such options available to the graph
-database in use) should each help mitigate problems related to this issue.
-
-Scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
+Groovy scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
 parameterization, a typical application should not generate an overabundance of pressure on these two components of Gremlin Server's memory footprint. On the other hand, it's not too hard to imagine a situation where problems might emerge:
 * An application use case makes parameterization impossible and therefore all scripts are unique.
 * There is a bug in an applications parameterization code that is actually instead producing unique scripts.
-* A long running Gremln Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
+* A long running Gremlin Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
 In these sorts of cases, Gremlin Server's performance can be affected adversely as without some additional configuration the metaspace will grow indefinitely (possibly along with the general heap) triggering longer and more frequent rounds
@@ -1949,7 +1942,7 @@ such that cache hits will be low there is little need to keep such references ar
 Perhaps the more important guards are related to the JVM metaspace. Start by setting the initial size of this space with `-XX:MetaspaceSize`. When this value is exceeded it will trigger a GC round - it is essentially a threshold for
-GC. The grow of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
+GC. The growth of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
 situation (i.e. parameterization), the `-XX:MetaspaceSize` should have a large enough setting so as to avoid early GC rounds for metaspace, but outside of an ideal world (i.e. non-parameterization) it may not be smart to make this number too large. Making the setting too large (and thus the `-XX:MaxMetaspaceSize` even larger) may trigger longer GC rounds
@@ -1972,9 +1965,6 @@ There really aren't any general guidelines for how to initially set these values
 trends is likely the best way to understand how a particular workload is affecting the metaspace and its relation to GC. Getting these settings "right" however will help ensure much more predictable Gremlin Server operations.
-IMPORTANT: A lambda used in a bytecode-based request will be treated as a script, so issues related to raw script-based
-requests apply equally well to lambda-bytecode requests.
-
 ==== Properties of Elements
 It was mentioned above at the start of this "Best Practices" section that serialization of graph elements (i.e.
diff --git a/docs/src/reference/gremlin-variants.asciidoc b/docs/src/reference/gremlin-variants.asciidoc
index adc349fb50..2e61fff1c7 100644
--- a/docs/src/reference/gremlin-variants.asciidoc
+++ b/docs/src/reference/gremlin-variants.asciidoc
@@ -869,6 +869,33 @@ ensure compatibility when making requests. Obviously, it is possible to switch t
 the appropriate `MessageSerializer` (e.g. `GraphSONMessageSerializerV4` or `GraphBinaryMessageSerializerV4` respectively) in the same way and building that into the `Cluster` object.
+[[gremlin-java-gvalue]]
+=== GValue Parameterization
+
+A `GValue` is an encapsulation of a parameter name and value. The `GValue` class has a series of static methods to
+construct GValues of various types from a given parameter name and value. Some of the most common examples are listed
+below; see the
+link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/step/GValue.html#method.summary++[Javadocs]
+for a complete listing.
+
+[source,java]
+----
+GValue stringArg = GValue.ofString("name", "value");
+GValue intArg = GValue.ofInteger("name", 1);
+GValue mapArg = GValue.ofMap("name", Collections.emptyMap());
+GValue autoTypedArg = GValue.of("name", "value"); // GValue will attempt to automatically detect the correct type
+----
+
+A <> are able to accept `GValues`. When constructing a
+`GraphTraversal` with such steps in Java, a GValue may be passed in the traversal to utilize a parameter in place of a
+literal.
+
+[source,java]
+----
+g.V().has("name", GValue.ofString("name", "marko"));
+g.mergeV(GValue.ofMap("vertexPattern", Collections.singletonMap("name", "marko")));
+----
+
 [[gremlin-java-lambda]]
 === The Lambda Solution
@@ -2362,6 +2389,19 @@ except Exception as e:
 ----
+[[gremlin-python-gvalue]]
+=== GValue Parameterization
+
+A `GValue` is an encapsulation of a parameter name and value. A <>
+are able to accept GValues.
When constructing a `GraphTraversal` with such steps in Python, a GValue may be passed in
+the traversal to utilize a parameter in place of a literal.
+
+[source,python]
+----
+g.V().has('name', GValue('name', 'marko'))
+g.merge_v(GValue('vertexPattern', {'name': 'marko'}))
+----
+
 [[gremlin-python-scripts]]
 === Submitting Scripts
diff --git a/docs/src/reference/the-traversal.asciidoc b/docs/src/reference/the-traversal.asciidoc
index 237e83eb15..e5d16f1b6e 100644
--- a/docs/src/reference/the-traversal.asciidoc
+++ b/docs/src/reference/the-traversal.asciidoc
@@ -5172,6 +5172,56 @@ location. Please see the <> for `io()`-step for more comp
 link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#write()++[`write()`]
+[[traversal-parameterization]]
+== Traversal Parameterization
+
+A subset of Gremlin steps are able to accept parameterized arguments, also known as GValues. GValues can be used to
+provide protection against Gremlin injection attacks in cases where untrusted and unsanitized inputs must be passed as
+step arguments. Additionally, use of GValues may offer performance benefits in certain environments by making use of
+some query caching capabilities. Note that the reference implementation of the Gremlin language and `gremlin-server` do
+not have such a query caching mechanism, and thus will not see any performance improvements through parameterization. Users
+should consult the documentation of their specific graph system for details of potential performance benefits via parameterization.
+
+NOTE: There are unique considerations regarding parameters when using `gremlin-groovy` scripts. Groovy allows for parameterization
+at arbitrary points in the query in addition to the subset of parameterizable steps documented here. Groovy is also constrained by
+comparatively slow script compilation, which makes parameterization essential for performant execution of `gremlin-groovy` scripts.
+
+[cols="1,1"]
+|===
+|Step | Parameterizable arguments
+
+|<> | String edgeLabel
+|<> | String vertexLabel
+|<> | String...edgeLabels
+|<> | String...edgeLabels
+|<> | Map params
+|<> | double probability
+|<> | Object values
+|<> | String delimiter
+|<> | E e/value
+|<> | Object values
+|<> | Object values
+|<> | Vertex fromVertex
+|<> | String label
+|<> | String label, String... labels
+|<> | String...edgeLabels
+|<> | String...edgeLabels
+|<> | Object values
+|<> | Object values
+|<> | Map searchCreate
+|<> | Map searchCreate
+|<> | M token, Map m
+|<> | String...edgeLabels
+|<> | String...edgeLabels
+|<> | Object values
+|<> | String... edgeLabels, Vertex toVertex
+|<> | String...edgeLabels
+|===
+
+*Additional References*
+
+<>, <>, <>
+
 [[a-note-on-predicates]]
 == A Note on Predicates
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
index cdf217c686..47ac4cdd1c 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.java
@@ -433,7 +433,7 @@ public default GraphTraversal to(final Direction direction, final Str
 /**
  * Map the {@link Vertex} to its adjacent vertices given a direction and edge labels. The arguments for the
- * labels must be either a {@code String} or a {@link GValue}. For internal use for parameterization.
+ * labels must be either a {@code String} or a {@link GValue}. For internal use for parameterization.
  *
  * @param direction the direction to traverse from the current vertex
  * @param edgeLabels the edge labels to traverse
@@ -4504,7 +4504,7 @@ public default GraphTraversal option(final Traversal traversal
  * @see Reference Documentation - Choose Step
  * @see Reference Documentation - MergeV Step
  * @see Reference Documentation - MergeE Step
- * @since 3.0.0-incubating
+ * @since 4.0.0
  */
 public default GraphTraversal option(final GValue token, final Traversal traversalOption) {
 this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, token, traversalOption);
@@ -4525,7 +4525,7 @@ public default GraphTraversal option(final GValue token, final
  * @return the traversal with the modulated step
  * @see Reference Documentation - MergeV Step
  * @see Reference Documentation - MergeE Step
- * @since 3.7.3
+ * @since 4.0.0
  */
 public default GraphTraversal option(final M token, final GValue> m) {
 this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, token, m);
@@ -4542,7 +4542,7 @@ public default GraphTraversal option(final M token, final GValueReference Documentation - MergeV Step
  * @see Reference Documentation - MergeE Step
- * @since 3.7.3
+ * @since 4.0.0
  */
 public default GraphTraversal option(final Merge merge, final GValue> m, final VertexProperty.Cardinality cardinality) {
 this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, merge, m, cardinality);
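
Reviewer note on the "Parameterized Scripts" hunk: the caching rationale (a cache keyed on the script, so `g.V(1)` and `g.V(2)` are separate entries while `g.V(x)` is compiled once) can be illustrated with a toy model. This is only a minimal sketch using the Java standard library, not Gremlin Server's implementation: `ScriptCacheSketch`, `submit`, and `compilations` are hypothetical names, the cache is keyed on the script text rather than a hash, and "compilation" is simulated by a counter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a script cache; hypothetical, not Gremlin Server's implementation.
class ScriptCacheSketch {
    // Cache keyed on the exact script text (standing in for the script's hash).
    private final Map<String, Function<Map<String, Object>, String>> cache = new HashMap<>();
    private int compilations = 0;

    // Hypothetical submit(): "compiles" a script only on a cache miss.
    String submit(String script, Map<String, Object> params) {
        Function<Map<String, Object>, String> compiled = cache.get(script);
        if (compiled == null) {
            compilations++; // simulated expensive compilation step
            compiled = p -> script + " with " + p;
            cache.put(script, compiled);
        }
        return compiled.apply(params);
    }

    // Number of simulated compilations performed so far.
    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        // Literal values embedded in the script: every distinct script is compiled.
        ScriptCacheSketch literal = new ScriptCacheSketch();
        literal.submit("g.V(1)", Map.of());
        literal.submit("g.V(2)", Map.of());
        System.out.println(literal.compilations()); // 2

        // Parameterized script: compiled once, then reused with new bindings.
        ScriptCacheSketch parameterized = new ScriptCacheSketch();
        parameterized.submit("g.V(x)", Map.of("x", 1));
        parameterized.submit("g.V(x)", Map.of("x", 2));
        System.out.println(parameterized.compilations()); // 1
    }
}
```

In this model the unparameterized client also grows the cache by one entry per distinct script, which mirrors the metaspace and `Class` cache pressure the patched text describes for non-parameterized workloads.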