Update reference docs
Cole-Greer committed Dec 5, 2024
1 parent 66e7e6f commit e4b048c
Showing 4 changed files with 99 additions and 19 deletions.
20 changes: 5 additions & 15 deletions docs/src/reference/gremlin-applications.asciidoc
@@ -1859,7 +1859,7 @@ without all the associated structure which can slow the response.
[[parameterized-scripts]]
==== Parameterized Scripts
-image:gremlin-parameterized.png[width=150,float=left] If using the standard `GremlinGroovyScriptEngine` in Gremlin
+image:gremlin-parameterized.png[width=150,float=left] If using `GremlinGroovyScriptEngine` in Gremlin
Server, it is imperative to use script parameterization. Period. There are at least two good
reasons for doing so: script caching and protection from "Gremlin injection" (conceptually the same as the notion of
SQL injection).
@@ -1870,7 +1870,7 @@ scripts. This processing is different from the processing performed by Groovy an
concerns of this section. When considering parameterization, users should also consider the graph database they are
using to determine if it has native mechanisms that preclude the need for parameterization.
-With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed based on the a
+With respect to caching, Gremlin Server caches all scripts that are passed to it. The cache is keyed based on the
hash of the script. Therefore `g.V(1)` and `g.V(2)` will be recognized as two separate scripts in the cache. If that
script is parameterized to `g.V(x)` where `x` is passed as a parameter from the client, there will be no additional
compilation cost for future requests on that script. Compilation of a script should be considered "expensive" and
@@ -1923,21 +1923,14 @@ params.put("nodeId",nodeId);
client.submit(query, params);
----
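To make the caching point from earlier in this section concrete, the following is a minimal sketch of a cache keyed on script text. It is not Gremlin Server's implementation: the class `ScriptCacheSketch`, its methods, and the placeholder "compiled" object are all hypothetical, and a plain `HashMap` stands in for the hash-keyed cache described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Gremlin Server's actual script cache.
final class ScriptCacheSketch {
    // Maps script text to a stand-in for the compiled script.
    private final Map<String, Object> compiled = new HashMap<>();
    private int compilations = 0;

    // Looks the script up by its text and "compiles" it only on a cache miss.
    // The bindings never take part in the cache key.
    void submit(String script, Map<String, Object> bindings) {
        if (!compiled.containsKey(script)) {
            compiled.put(script, new Object()); // placeholder for an expensive compilation
            compilations++;
        }
    }

    int compilations() {
        return compilations;
    }

    // Inlines each id as a literal, so every request carries a different script text.
    static int literalCompilations(int requests) {
        ScriptCacheSketch cache = new ScriptCacheSketch();
        for (int id = 1; id <= requests; id++) {
            cache.submit("g.V(" + id + ")", new HashMap<String, Object>());
        }
        return cache.compilations();
    }

    // Sends the same script text every time and passes the id as the binding "x".
    static int parameterizedCompilations(int requests) {
        ScriptCacheSketch cache = new ScriptCacheSketch();
        for (int id = 1; id <= requests; id++) {
            Map<String, Object> bindings = new HashMap<>();
            bindings.put("x", id);
            cache.submit("g.V(x)", bindings);
        }
        return cache.compilations();
    }

    public static void main(String[] args) {
        System.out.println("literal ids:   " + literalCompilations(100) + " compilations");
        System.out.println("parameterized: " + parameterizedCompilations(100) + " compilations");
    }
}
```

In this sketch, one hundred literal requests cause one hundred compilations, while the parameterized form compiles once and reuses the cached entry for every later request.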
-Gremlin injection should not be possible with `Bytecode` based traversals - only scripts - because `Bytecode`
-traversals will treat all arguments as literal values. There is potential for concern if lambda based steps are
-utilized as they execute arbitrary code, which is string based, but configuring `TraversalSource` instances with
-`LambdaRestrictionStrategy`, which prevents lambdas all together, using a graph that does not allow lambdas at all, or
-configuring appropriate <<script-execution,sandbox options>> in Gremlin Server (or such options available to the graph
-database in use) should each help mitigate problems related to this issue.
-Scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
+Groovy scripts create classes which get loaded to the JVM metaspace and to a `Class` cache. For those using script
parameterization, a typical application should not generate an overabundance of pressure on these two components of
Gremlin Server's memory footprint. On the other hand, it's not too hard to imagine a situation where problems might
emerge:
* An application use case makes parameterization impossible and therefore all scripts are unique.
* There is a bug in an application's parameterization code that causes it to produce unique scripts instead.
-* A long running Gremln Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
+* A long running Gremlin Server takes lots of non-parameterized scripts from Gremlin Console or similar tools.
In these sorts of cases, Gremlin Server's performance can be affected adversely as without some additional configuration
the metaspace will grow indefinitely (possibly along with the general heap) triggering longer and more frequent rounds
@@ -1949,7 +1942,7 @@ such that cache hits will be low there is little need to keep such references ar
Perhaps the more important guards are related to the JVM metaspace. Start by setting the initial size of this space
with `-XX:MetaspaceSize`. When this value is exceeded it will trigger a GC round - it is essentially a threshold for
-GC. The grow of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
+GC. The growth of this value can be capped with `-XX:MaxMetaspaceSize` (this value is unlimited by default). In an ideal
situation (i.e. parameterization), the `-XX:MetaspaceSize` should have a large enough setting so as to avoid early GC
rounds for metaspace, but outside of an ideal world (i.e. non-parameterization) it may not be smart to make this number
too large. Making the setting too large (and thus the `-XX:MaxMetaspaceSize` even larger) may trigger longer GC rounds
@@ -1972,9 +1965,6 @@ There really aren't any general guidelines for how to initially set these values
trends is likely the best way to understand how a particular workload is affecting the metaspace and its relation to
GC. Getting these settings "right" however will help ensure much more predictable Gremlin Server operations.
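One way to start collecting the metaspace trends mentioned above is to read the JVM's own memory pool statistics. The sketch below uses the standard `java.lang.management` API. The class name `MetaspaceProbe` is hypothetical, and the pool name `"Metaspace"` is an assumption that holds on HotSpot-based JVMs but may differ elsewhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for sampling metaspace usage from inside the JVM.
final class MetaspaceProbe {
    // Returns the usage of the pool named "Metaspace", or null if this JVM
    // exposes no pool with that name (the name is a HotSpot convention).
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No memory pool named Metaspace on this JVM");
            return;
        }
        // getMax() returns -1 when no maximum is defined, for example when
        // -XX:MaxMetaspaceSize has not been set.
        System.out.println("used=" + usage.getUsed()
                + " committed=" + usage.getCommitted()
                + " max=" + usage.getMax());
    }
}
```

Sampling these values periodically under a representative workload gives a rough picture of how quickly metaspace grows, which can inform the choice of `-XX:MetaspaceSize` and `-XX:MaxMetaspaceSize`.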
-IMPORTANT: A lambda used in a bytecode-based request will be treated as a script, so issues related to raw script-based
-requests apply equally well to lambda-bytecode requests.
==== Properties of Elements
It was mentioned above at the start of this "Best Practices" section that serialization of graph elements (i.e.
40 changes: 40 additions & 0 deletions docs/src/reference/gremlin-variants.asciidoc
@@ -869,6 +869,33 @@ ensure compatibility when making requests. Obviously, it is possible to switch t
the appropriate `MessageSerializer` (e.g. `GraphSONMessageSerializerV4` or `GraphBinaryMessageSerializerV4` respectively)
in the same way and building that into the `Cluster` object.
[[gremlin-java-gvalue]]
=== GValue Parameterization
A `GValue` is an encapsulation of a parameter name and value. The GValue class has a series of static methods to
construct GValues of various types from a given parameter name and value. Some of the most common examples are listed
below; see the
link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/step/GValue.html#method.summary++[Javadocs]
for a complete listing.
[source,java]
----
GValue<String> stringArg = GValue.ofString("name", "value");
GValue<Integer> intArg = GValue.ofInteger("name", 1);
GValue<Map> mapArg = GValue.ofMap("name", Collections.emptyMap());
GValue<?> autoTypedArg = GValue.of("name", "value"); // GValue will attempt to automatically detect correct type
----
A <<traversal-parameterization,subset of gremlin steps>> are able to accept `GValues`. When constructing a
`GraphTraversal` with such steps in Java, a GValue may be passed in the traversal to utilize a parameter in place of a
literal.
[source,java]
----
g.V().has("name", GValue.ofString("name", "marko"));
g.mergeV(GValue.ofMap("vertexPattern", Collections.singletonMap("name", "marko")));
----
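As a rough mental model of what "a parameter in place of a literal" means, the sketch below renders a step argument either as an inline literal or as a named parameter with a separate binding. This is not how TinkerPop implements `GValue`: `NamedParam`, `ParamRenderSketch`, `renderArg` and `hasStep` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the idea behind GValue; not the TinkerPop class.
final class NamedParam {
    final String name;
    final Object value;

    NamedParam(String name, Object value) {
        this.name = name;
        this.value = value;
    }
}

final class ParamRenderSketch {
    // Renders one step argument. A NamedParam contributes its name to the query
    // text and its value to the bindings; a String literal is quoted inline
    // (without escaping, since this is only a sketch).
    static String renderArg(Object arg, Map<String, Object> bindings) {
        if (arg instanceof NamedParam) {
            NamedParam param = (NamedParam) arg;
            bindings.put(param.name, param.value);
            return param.name;
        }
        if (arg instanceof String) {
            return "\"" + arg + "\"";
        }
        return String.valueOf(arg);
    }

    // Builds the text of a has() step from a key and a value argument.
    static String hasStep(String key, Object value, Map<String, Object> bindings) {
        return "g.V().has(" + renderArg(key, bindings) + ", " + renderArg(value, bindings) + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> bindings = new LinkedHashMap<>();
        System.out.println(hasStep("name", "marko", bindings));
        System.out.println(hasStep("name", new NamedParam("name", "marko"), bindings));
        System.out.println("bindings: " + bindings);
    }
}
```

With a literal, the value appears in the query text itself. With the named parameter, the query text stays the same for every value and the value travels separately in the bindings.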
[[gremlin-java-lambda]]
=== The Lambda Solution
@@ -2362,6 +2389,19 @@ except Exception as e:
----
[[gremlin-python-gvalue]]
=== GValue Parameterization
A `GValue` is an encapsulation of a parameter name and value. A <<traversal-parameterization,subset of gremlin steps>>
are able to accept GValues. When constructing a `GraphTraversal` with such steps in Python, a GValue may be passed in
the traversal to utilize a parameter in place of a literal.
[source,python]
----
g.V().has('name', GValue('name', 'marko'))
g.merge_v(GValue('vertexPattern', {'name': 'marko'}))
----
[[gremlin-python-scripts]]
=== Submitting Scripts
50 changes: 50 additions & 0 deletions docs/src/reference/the-traversal.asciidoc
@@ -5172,6 +5172,56 @@ location. Please see the <<io-step,documentation>> for `io()`-step for more comp
link:++https://tinkerpop.apache.org/javadocs/x.y.z/full/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversal.html#write()++[`write()`]
[[traversal-parameterization]]
== Traversal Parameterization
A subset of Gremlin steps are able to accept parameterized arguments, also known as GValues. GValues can be used to
provide protection against Gremlin injection attacks in cases where untrusted and unsanitized inputs must be passed as
step arguments. Additionally, use of GValues may offer performance benefits in environments that provide
query caching. Note that the reference implementation of the Gremlin language and `gremlin-server` do
not have such a query caching mechanism, and thus will not see any performance improvements through parameterization. Users
should consult the documentation of their specific graph system for details of potential performance benefits via parameterization.
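The injection risk described above can be illustrated without any graph library. In the sketch below, `InjectionSketch`, its methods, and the binding name `vid` are hypothetical. It only contrasts splicing untrusted text into a query string with keeping the query text fixed and carrying the input as a separate value.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of why untrusted input should travel as a parameter.
final class InjectionSketch {
    // Fixed query text; the untrusted value is referenced only by name.
    static final String PARAMETERIZED_SCRIPT = "g.V(vid).values('name')";

    // Unsafe: the untrusted text becomes part of the query structure.
    static String concatenated(String untrustedId) {
        return "g.V(" + untrustedId + ").values('name')";
    }

    // Safer: the untrusted text is carried as data under the name "vid".
    static Map<String, Object> bindingsFor(String untrustedId) {
        return Collections.<String, Object>singletonMap("vid", untrustedId);
    }

    public static void main(String[] args) {
        String malicious = "1).drop().V(2";
        System.out.println("concatenated:  " + concatenated(malicious));
        System.out.println("parameterized: " + PARAMETERIZED_SCRIPT + " with " + bindingsFor(malicious));
    }
}
```

In the concatenated form the input introduces an extra `drop()` step into the query. In the parameterized form the query text is unchanged and the same input is treated as an ordinary value.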
NOTE: There are unique considerations regarding parameters when using `gremlin-groovy` scripts. Groovy allows for parameterization
at arbitrary points in the query in addition to the subset of parameterizable steps documented here. Groovy is also bound by
a comparatively slow script compilation, which makes parameterization essential for performant execution of `gremlin-groovy` scripts.
[cols="1,1"]
|===
|Step | Parameterizable arguments
|<<addedge-step,addE()>> | String edgeLabel
|<<addvertex-step,addV()>> | String vertexLabel
|<<vertex-steps,both()>> | String...edgeLabels
|<<vertex-steps,bothE()>> | String...edgeLabels
|<<call-step,call()>> | Map params
|<<coin-step,coin()>> | double probability
|<<combine-step,combine()>> | Object values
|<<conjoin-step,conjoin()>> | String delimiter
|<<constant-step,constant()>> | E e/value
|<<difference-step,difference()>> | Object values
|<<disjunct-step,disjunct()>> | Object values
|<<from-step,from()>> | Vertex fromVertex
|<<has-step,has()>> | String label
|<<has-step,hasLabel()>> | String label, String... labels
|<<vertex-steps,in()>> | String...edgeLabels
|<<vertex-steps,inE()>> | String...edgeLabels
|<<intersect-step,intersect()>> | Object values
|<<merge-step,merge()>> | Object values
|<<mergeedge-step,mergeE()>> | Map searchCreate
|<<mergevertex-step,mergeV()>> | Map searchCreate
|<<option-step,option()>> | M token, Map m
|<<vertex-steps,out()>> | String...edgeLabels
|<<vertex-steps,outE()>> | String...edgeLabels
|<<product-step,product()>> | Object values
|<<to-step,to()>> | String... edgeLabels, Vertex toVertex
|<<vertex-steps,toE()>> | String...edgeLabels
|===
*Additional References*
<<gremlin-java-gvalue,Java>>, <<gremlin-python-gvalue,Python>>, <<parameterized-scripts,Server>>
[[a-note-on-predicates]]
== A Note on Predicates
Original file line number Diff line number Diff line change
@@ -433,7 +433,7 @@ public default GraphTraversal<S, Vertex> to(final Direction direction, final Str

/**
* Map the {@link Vertex} to its adjacent vertices given a direction and edge labels. The arguments for the
- * labels must be either a {@code String} or a {@link GValue<String>}. For internal use for parameterization.
+ * labels must be either a {@code String} or a {@link GValue<String>}. For internal use for parameterization.
*
* @param direction the direction to traverse from the current vertex
* @param edgeLabels the edge labels to traverse
@@ -4504,7 +4504,7 @@ public default <E2> GraphTraversal<S, E> option(final Traversal<?, E2> traversal
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#choose-step" target="_blank">Reference Documentation - Choose Step</a>
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergev-step" target="_blank">Reference Documentation - MergeV Step</a>
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergee-step" target="_blank">Reference Documentation - MergeE Step</a>
- * @since 3.0.0-incubating
+ * @since 4.0.0
*/
public default <M, E2> GraphTraversal<S, E> option(final GValue<M> token, final Traversal<?, E2> traversalOption) {
this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, token, traversalOption);
@@ -4525,7 +4525,7 @@ public default <M, E2> GraphTraversal<S, E> option(final GValue<M> token, final
* @return the traversal with the modulated step
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergev-step" target="_blank">Reference Documentation - MergeV Step</a>
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergee-step" target="_blank">Reference Documentation - MergeE Step</a>
- * @since 3.7.3
+ * @since 4.0.0
*/
public default <M, E2> GraphTraversal<S, E> option(final M token, final GValue<Map<Object, Object>> m) {
this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, token, m);
@@ -4542,7 +4542,7 @@ public default <M, E2> GraphTraversal<S, E> option(final M token, final GValue<M
* @return the traversal with the modulated step
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergev-step" target="_blank">Reference Documentation - MergeV Step</a>
* @see <a href="http://tinkerpop.apache.org/docs/${project.version}/reference/#mergee-step" target="_blank">Reference Documentation - MergeE Step</a>
- * @since 3.7.3
+ * @since 4.0.0
*/
public default <M, E2> GraphTraversal<S, E> option(final Merge merge, final GValue<Map<Object, Object>> m, final VertexProperty.Cardinality cardinality) {
this.asAdmin().getGremlinLang().addStep(GraphTraversal.Symbols.option, merge, m, cardinality);