From 6e5f00d32c1af617c7a74e21cb2173338047a3fe Mon Sep 17 00:00:00 2001 From: Ken Hu <106191785+kenhuuu@users.noreply.github.com> Date: Tue, 19 Nov 2024 13:46:43 -0800 Subject: [PATCH] Update documentation around TinkerPop HTTP API and serializers. --- docs/src/dev/io/graphson.asciidoc | 92 ++-- docs/src/dev/provider/index.asciidoc | 492 ++++++++---------- .../reference/gremlin-applications.asciidoc | 166 ++---- docs/src/reference/gremlin-variants.asciidoc | 77 ++- docs/src/reference/intro.asciidoc | 21 +- docs/src/upgrade/release-4.x.x.asciidoc | 114 +++- 6 files changed, 456 insertions(+), 506 deletions(-) diff --git a/docs/src/dev/io/graphson.asciidoc b/docs/src/dev/io/graphson.asciidoc index cede81a2da4..38ad6aa637b 100644 --- a/docs/src/dev/io/graphson.asciidoc +++ b/docs/src/dev/io/graphson.asciidoc @@ -19,9 +19,11 @@ limitations under the License. image:gremlin-graphson.png[width=350,float=left] GraphSON is a JSON-based format that is designed for human readable output that is easily supported in any programming language through the wide-array of JSON parsing libraries that -exist on virtually all platforms. GraphSON is considered both a "graph" format and a generalized object serialization -format. That characteristic makes it useful as a serialization format for Gremlin Server where arbitrary objects -of varying types may be returned as results. +exist on virtually all platforms. GraphSON versions 1 to 3 were considered to be both a "graph" format and a +generalized object serialization format. That characteristic makes it useful as a serialization format for Gremlin +Server where arbitrary objects of varying types may be returned as results. However, starting in GraphSON 4, GraphSON +is only intended to be a network serialization format that is only able to serialize specific types defined by the +format. It is only meant to be used with the TinkerPop HTTP API. 
When considering GraphSON as a "graph" format, the relevant feature to consider is the `writeGraph` and `readGraph` methods on the `GraphSONWriter` and `GraphSONReader` interfaces, respectively. These methods write the entire `Graph` @@ -48,7 +50,7 @@ writer.writeObject(os, graph); Generalized object serialization will be discussed later in this section, so for now the focus will be on the "graph" format. Unlike GraphML, GraphSON does not use an edge list format. It uses an adjacency list. In the adjacency list, each vertex is essentially a line in the file and the vertex line contains a list of all the edges associated with -that vertex. The GraphSON 3.0 representation looks like this for the Modern toy graph: +that vertex. The GraphSON 4.0 representation looks like this for the Modern toy graph: [source,json] ---- @@ -102,13 +104,11 @@ mime type is made explicit on requests to avoid breaking changes or unexpected r Version 4.0 of GraphSON was first introduced on TinkerPop 4.0.0 and is represented by the `application/vnd.gremlin-v4.0+json` mime type. There also exists an untyped version: -`application/vnd.gremlin-v3.0+json;types=false`. It is very similar to GraphSON 3.0, with just several key differences: +`application/vnd.gremlin-v4.0+json;types=false`. It is very similar to GraphSON 3.0, with just several key differences: many underused or duplicated types have been removed, labels are now list of strings and request/response formats have changed quite a bit, and custom types have been replaced with Provider Defined Type (PDT). -=== Core - -==== Boolean +=== Boolean Matches the JSON Boolean and doesn't have type information. @@ -122,7 +122,7 @@ true true ---- -==== Composite PDT +=== Composite PDT JSON Object with two required keys: "type" and "fields" + "type" is a JSON String + @@ -161,7 +161,7 @@ JSON Object with two required keys: "type" and "fields" + } ---- -==== DateTime +=== DateTime JSON String representing a datetime in the ISO-8601 format. 
@@ -178,7 +178,7 @@ JSON String representing a datetime in the ISO-8601 format. "2007-12-03T10:15:30+01:00" ---- -==== Double +=== Double A JSON Number with the same range as a IEEE754 double precision floating point or a JSON String with one of the following values: "-Infinity", "Infinity", "NaN" @@ -196,7 +196,7 @@ following values: "-Infinity", "Infinity", "NaN" 100.0 ---- -==== Float +=== Float A JSON Number with the same range as a IEEE754 single precision floating point or a JSON String with one of the following values: "-Infinity", "Infinity", "NaN" @@ -214,7 +214,7 @@ following values: "-Infinity", "Infinity", "NaN" 100.0 ---- -==== Integer +=== Integer A JSON Number with the same range as a 4-byte signed integer. @@ -231,7 +231,7 @@ A JSON Number with the same range as a 4-byte signed integer. 100 ---- -==== List +=== List List is a JSON Array. The type is used to distinguish between different collection types that are also mapped to JSON Array. The untyped version converts complex types to JSON String. @@ -257,7 +257,7 @@ Array. The untyped version converts complex types to JSON String. [ 1, "person", true, null ] ---- -==== Long +=== Long A JSON Number with the same range as a 8-byte signed integer. @@ -274,7 +274,7 @@ A JSON Number with the same range as a 8-byte signed integer. 100 ---- -==== Map +=== Map Map is a JSON Array to provide the ability to allow for non-String keys, which is not possible in JSON. The untyped version converts complex types to JSON String. @@ -325,7 +325,7 @@ version converts complex types to JSON String. } ---- -==== Null +=== Null Matches the JSON Null and doesn't have type information. @@ -339,7 +339,7 @@ null null ---- -==== Primitive PDT +=== Primitive PDT JSON Object with two required keys: "type" and "value" + "type" is a JSON String + @@ -364,7 +364,7 @@ JSON Object with two required keys: "type" and "value" + } ---- -==== Set +=== Set A JSON Array. The untyped version converts complex types to JSON String. 
@@ -389,7 +389,7 @@ A JSON Array. The untyped version converts complex types to JSON String. [ null, 2, "person", true ] ---- -==== String +=== String Matches the JSON String and doesn't have type information. @@ -403,7 +403,7 @@ Matches the JSON String and doesn't have type information. "abc" ---- -==== UUID +=== UUID JSON String form of UUID. @@ -420,9 +420,7 @@ JSON String form of UUID. "41d2e28a-20a4-4ab0-b379-d810dede3786" ---- -=== Graph Structure - -==== Edge +=== Edge JSON Object (required keys are: id, label, inVLabel, outVLabel, inV, outV) + "id" is any GraphSON 4.0 type + @@ -508,7 +506,7 @@ The untyped version has one additional required key "type" which is always "vert } ---- -==== Graph +=== Graph `TinkerGraph` has a custom serializer that is registered as part of the `TinkerIoRegistry`. Graph is a JSON Object with two required keys: "vertices" and "edges" + @@ -2165,7 +2163,7 @@ two required keys: "vertices" and "edges" + } ---- -==== Path +=== Path Object with two required keys: "labels" and "objects" + "labels" is a `g:List` of `g:Set` of labels of the steps traversed + @@ -2272,7 +2270,7 @@ Object with two required keys: "labels" and "objects" + } ---- -==== Property +=== Property JSON Object with two required keys: "key" and "value" + "key" is a `String` + @@ -2300,7 +2298,7 @@ JSON Object with two required keys: "key" and "value" + } ---- -==== Tree +=== Tree JSON Object with one or more possibly nested "key" "value" pairs "key" is an Element (`g:Vertex`, `g:Edge`, `g:VertexProperty`) @@ -2429,7 +2427,7 @@ JSON Object with one or more possibly nested "key" "value" pairs ] ---- -==== Vertex +=== Vertex JSON Object with required keys: "id", "label", "properties" + "id" is any GraphSON 4.0 type + @@ -2613,7 +2611,7 @@ The untyped version has one additional required key "type" which is always "vert } ---- -==== VertexProperty +=== VertexProperty JSON Object with required keys: "id", "value", "label", "properties" + "id" is any type GraphSON 4.0 type 
+ @@ -2649,9 +2647,7 @@ JSON Object with required keys: "id", "value", "label", "properties" + } ---- -=== Graph Process - -==== BulkSet +=== BulkSet JSON Array that contains the expanded entries of the BulkSet. Note: BulkSet is serialized to g:List so there is no BulkSet deserializer. @@ -2673,7 +2669,7 @@ Note: BulkSet is serialized to g:List so there is no BulkSet deserializer. [ "marko", "josh", "josh" ] ---- -==== Direction +=== Direction JSON String of the enum value. @@ -2690,7 +2686,7 @@ JSON String of the enum value. "OUT" ---- -==== T +=== T JSON String of the enum value. @@ -2707,9 +2703,7 @@ JSON String of the enum value. "label" ---- -=== RequestMessage - -==== Standard +=== Standard Request The following `RequestMessage` is an example of a simple sessionless request for a script evaluation with parameters. @@ -2751,9 +2745,7 @@ The following `RequestMessage` is an example of a simple sessionless request for } ---- -=== ResponseMessage - -==== Standard Result +=== Standard Result The following `ResponseMessage` is a typical example of the typical successful response Gremlin Server will return when returning results from a script. @@ -2953,7 +2945,7 @@ The following `ResponseMessage` is a typical example of the typical successful r } ---- -==== Error Result +=== Error Result The following `ResponseMessage` is a typical example of the typical successful response Gremlin Server will return when returning results from a script. @@ -2988,8 +2980,6 @@ The following `ResponseMessage` is a typical example of the typical successful r } ---- -=== Extended - Note that the "extended" types require the addition of the separate `GraphSONXModuleV4d0` module as follows: [source,java] @@ -3000,7 +2990,7 @@ mapper = GraphSONMapper.build(). version(GraphSONVersion.V4_0).create().createMapper() ---- -==== BigDecimal +=== BigDecimal A JSON Number. @@ -3017,7 +3007,7 @@ A JSON Number. 123456789987654321123456789987654321 ---- -==== BigInteger +=== BigInteger A JSON Number. 
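Because the untyped `BigDecimal` and `BigInteger` forms are plain JSON Numbers, a reader of the format has to take care not to lose digits. The following is a minimal Python sketch of one way to read such values without rounding. It relies on the standard `json` module, which parses integer literals into arbitrary-precision `int` values and accepts a `parse_float` hook. The helper name `parse_graphson_number` and the decimal sample value are hypothetical and not part of TinkerPop.

```python
import json
from decimal import Decimal


def parse_graphson_number(text: str):
    """Parse an untyped JSON Number without rounding it to a double.

    Hypothetical helper for illustration only: integer literals become
    Python ints (arbitrary precision) and literals with a fraction or
    exponent become Decimal instead of float.
    """
    return json.loads(text, parse_float=Decimal)


# The integer literal is the example value used in this document.
big_integer = parse_graphson_number("123456789987654321123456789987654321")

# Made-up decimal value with more digits than a double can hold.
big_decimal = parse_graphson_number("123456789987654321.123456789987654321")

print(type(big_integer).__name__, big_integer)
print(type(big_decimal).__name__, big_decimal)
```

A parser that maps every JSON Number to an IEEE 754 double would silently round both values, so drivers written in such languages may need a comparable hook or a big-number library.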
@@ -3034,7 +3024,7 @@ A JSON Number. 123456789987654321123456789987654321 ---- -==== Byte +=== Byte A JSON Number with the same range as a 1-byte signed integer. @@ -3051,7 +3041,7 @@ A JSON Number with the same range as a 1-byte signed integer. 1 ---- -==== Binary +=== Binary JSON String containing base64-encoded bytes @@ -3068,7 +3058,7 @@ JSON String containing base64-encoded bytes "c29tZSBieXRlcyBmb3IgeW91" ---- -==== Char +=== Char A JSON String containing a single UTF-8 encoded character. @@ -3085,7 +3075,7 @@ A JSON String containing a single UTF-8 encoded character. "x" ---- -==== Duration +=== Duration JSON String with ISO-8601 seconds based representation. The following example is a `Duration` of five days. @@ -3102,7 +3092,7 @@ JSON String with ISO-8601 seconds based representation. The following example is "PT120H" ---- -==== Short +=== Short A JSON Number with the same range as a 2-byte signed integer. diff --git a/docs/src/dev/provider/index.asciidoc b/docs/src/dev/provider/index.asciidoc index 995cf533502..20f79e87673 100644 --- a/docs/src/dev/provider/index.asciidoc +++ b/docs/src/dev/provider/index.asciidoc @@ -971,19 +971,15 @@ extensible nature of Gremlin Server, it is difficult to provide an authoritative It is however possible to describe the core communication protocol using the standard out-of-the-box configuration which should provide enough information to develop a driver for a specific language. -image::gremlin-server-flow.png[width=300,float=right] +Gremlin Server is distributed with a configuration that utilizes HTTP with a custom API. Under this configuration, +Gremlin Server accepts requests containing a Gremlin script, evaluates that script and then streams back the results in +HTTP chunks. -Gremlin Server is distributed with a configuration that utilizes link:http://en.wikipedia.org/wiki/WebSocket[WebSocket] -with a custom sub-protocol. 
Under this configuration, Gremlin Server accepts requests containing a Gremlin script, -evaluates that script and then streams back the results. The notion of "streaming" is depicted in the diagram to the -right. - -The diagram shows an incoming request to process the Gremlin script of `g.V()`. Gremlin Server evaluates that script, -getting an `Iterator` of vertices as a result, and steps through each `Vertex` within it. The vertices are batched -together given the `resultIterationBatchSize` configuration. In this case, that value must be `2` given that each -"response" contains two vertices. Each response is serialized given the requested serializer type (JSON is likely -best for non-JVM languages) and written back to the requesting client immediately. Gremlin Server does not wait for -the entire result to be iterated, before sending back a response. It will send the responses as they are realized. +Let's use the incoming request to process the Gremlin script of `g.V()` as an example. Gremlin Server evaluates that +script, getting an `Iterator` of vertices as a result, and steps through each `Vertex` within it. The vertices are +batched together into an HTTP chunk. Each response is serialized given the requested serializer type (GraphBinary is +recommended) and written back to the requesting client immediately. Gremlin Server does not wait for the entire result +to be iterated, before sending back a response. It will send the responses as they are realized. This approach allows for the processing of large result sets without having to serialize the entire result into memory for the response. It places a bit of a burden on the developer of the driver however, because it becomes necessary to @@ -992,340 +988,231 @@ Server returns for a single request. Again, this description of Gremlin Server' out-of-the-box configuration. It is quite possible to construct other flows, that might be more amenable to a particular language or style of processing. 
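The aggregation burden described above can be illustrated with a minimal Python sketch. It assumes the driver sees the raw HTTP/1.1 chunked body; most HTTP client libraries decode chunked transfer encoding transparently, so a real driver rarely needs this code. `decode_chunked` and `encode_chunked` are hypothetical helper names, and the sample payload is made up for illustration rather than captured from Gremlin Server.

```python
import json


def encode_chunked(parts):
    """Frame byte strings as an HTTP/1.1 chunked body (demo helper only)."""
    framed = b"".join(b"%x\r\n%s\r\n" % (len(p), p) for p in parts)
    return framed + b"0\r\n\r\n"  # zero-length chunk terminates the body


def decode_chunked(raw: bytes) -> bytes:
    """Concatenate the data of every chunk into a single payload."""
    payload = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The chunk size is hexadecimal; drop any chunk extension after ';'.
        size = int(raw[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk reached; trailers are ignored in this sketch
        payload += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(payload)


# Made-up response split at arbitrary points, as a server might chunk it.
parts = [b'{"result":{"data":[1,', b'2,3]},"status":', b'{"code":200}}']
body = decode_chunked(encode_chunked(parts))
message = json.loads(body)
print(message["result"]["data"])
```

The sketch buffers the whole body before deserializing it, which is the simplest approach. A driver that wants to surface results while they are still arriving would need a serializer that can read incrementally.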
-It is recommended but not required that a driver include a `User-Agent` header as part of any web socket -handshake request to Gremlin Server. Gremlin Server uses the user agent in building usage metrics -as well as debugging. The standard format for connection user agents is: +NOTE: TinkerPop provides a test server which may be useful for testing drivers. Details can be found +link:https://tinkerpop.apache.org/docs/current/dev/developer/#gremlin-socket-server-tests[here] + +It is recommended but not required that a driver include a `User-Agent` header as part of any HTTP request to Gremlin +Server. Gremlin Server uses the user agent in building usage metrics as well as debugging. The standard format for +connection user agents is: +[[user-agent-format]] `"[Application Name] [GLV Name].[Version] [Language Runtime Version] [OS].[Version] [CPU Architecture]"` For example: `"MyTestApplication Gremlin-Java.3.5.4 11.0.16.1 Mac_OS_X.12.6.1 aarch64"` -To formulate a request to Gremlin Server, a `RequestMessage` needs to be constructed. The `RequestMessage` is a -generalized representation of a request that carries a set of "standard" values in addition to optional ones that are -dependent on the operation being performed. A `RequestMessage` has these fields: - -[width="100%",cols="3,10",options="header"] -|========================================================= -|Key |Description -|requestId |A link:http://en.wikipedia.org/wiki/Globally_unique_identifier[UUID] representing the unique identification for the request. -|op |The name of the "operation" to execute based on the available `OpProcessor` configured in the Gremlin Server. To evaluate a script, use `eval`. -|processor |The name of the `OpProcessor` to utilize. The default `OpProcessor` for evaluating scripts is unnamed and therefore script evaluation purposes, this value can be an empty string. -|args |A `Map` of arbitrary parameters to pass to Gremlin Server. 
The requirements for the contents of this `Map` are dependent on the `op` selected. -|========================================================= - -This message can be serialized in any fashion that is supported by Gremlin Server. New serialization methods can -be plugged in by implementing a `ServiceLoader` enabled `MessageSerializer`, however Gremlin Server provides for -JSON serialization by default which will be good enough for purposes of most developers building drivers. -A `RequestMessage` to evaluate a script with variable bindings looks like this in JSON: - -[source,js] ----- -{ "requestId":"1d6d02bd-8e56-421d-9438-3bd6d0079ff1", - "op":"eval", - "processor":"", - "args":{"gremlin":"g.V(x).out()", - "bindings":{"x":1}, - "language":"gremlin-groovy"}} ----- +The following section provides an in-depth description of the TinkerPop HTTP API. The HTTP API is used for +communicating the requests and responses that were described earlier. -The above JSON represents the "body" of the request to send to Gremlin Server. When sending this "body" over -WebSocket, Gremlin Server can accept a packet frame using a "text" (1) or a "binary" (2) opcode. Using "text" -is a bit more limited in that Gremlin Server will always process the body of that request as JSON. Generally speaking -"text" is just for testing purposes. +=== HTTP API -The preferred method for sending requests to Gremlin Server is to use the "binary" opcode. In this case, a "header" -will need be sent in addition to to the "body". The "header" basically consists of a "mime type" so that Gremlin -Server knows how to deserialize the `RequestMessage`. So, the actual byte array sent to Gremlin Server would be -formatted as follows: +This section describes the TinkerPop HTTP API which should be implemented by both graph system providers and graph +driver providers. There is only one endpoint that currently needs to be supported which is `POST /gremlin`. 
This +endpoint is a Gremlin evaluator which takes in a Gremlin script request and responds with the serialized results. The +formats below use a bit of pseudo-JSON to help represent request and response bodies. The actual format of the request +and response bodies will be determined by the serializers defined via the "Accept" and "Content-Type" headers. As a +result, a generic type definition in this document like "number" could translate to a "long" for a serializer that +supports types like GraphBinary. -image::gremlin-server-request.png[] +==== HTTP Request -The first byte represents the length of the "mime type" string value that follows. Given the default configuration of -Gremlin Server, this value should be set to `application/json`. The "payload" represents the JSON message above -encoded as bytes. - -NOTE: Gremlin Server will only accept masked packets as it pertains to a WebSocket packet header construction. - -When Gremlin Server receives that request, it will decode it given the "mime type", pass it to the requested -`OpProcessor` which will execute the `op` defined in the message. In this case, it will evaluate the script -`g.V(x).out()` using the `bindings` supplied in the `args` and stream back the results in a series of -`ResponseMessages`. A `ResponseMessage` looks like this: +To formulate a request to Gremlin Server, a `RequestMessage` needs to be constructed. The `RequestMessage` is a +generalized representation of a request. This message can be serialized in any fashion that is supported by Gremlin +Server, which by default is GraphBinary. An HTTP request that contains a `RequestMessage` has the following form: -[width="100%",cols="3,10",options="header"] -|========================================================= -|Key |Description -|requestId |The identifier of the `RequestMessage` that generated this `ResponseMessage`. 
-|status | The `status` contains a `Map` of three keys: `code` which refers to a `ResultCode` that is somewhat analogous to an link:http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html[HTTP status code], `attributes` that represent a `Map` of protocol-level information, and `message` which is just a human-readable `String` usually associated with errors. -|result | The `result` contains a `Map` of two keys: `data` which refers to the actual data returned from the server (the type of data is determined by the operation requested) and `meta` which is a `Map` of meta-data related to the response. -|========================================================= +[source,text] +---- +POST /gremlin HTTP/1.1 +Accept: +Content-Type: +Gremlin-Hints: + +{ + "gremlin": string, + "timeoutMs": number, + "bindings": object, + "g": string, + "language" : string, + "materializeProperties": string, + "bulkResults": boolean +} +---- -In this case the `ResponseMessage` returned to the client would look something like this: +An actual, complete request might look like the following: -[source,js] +[source,text] ---- -{"result":{"data":[{"id": 2,"label": "person","type": "vertex","properties": [ - {"id": 2, "value": "vadas", "label": "name"}, - {"id": 3, "value": 27, "label": "age"}]}, - ], "meta":{}}, - "requestId":"1d6d02bd-8e56-421d-9438-3bd6d0079ff1", - "status":{"code":206,"attributes":{},"message":""}} +POST /gremlin HTTP/1.1 +content-length: 61 +host: 127.0.0.1 +content-type: application/vnd.gremlin-v4.0+json +accept-encoding: deflate +accept: application/vnd.graphbinary-v4.0 +user-agent: NotAvailable Gremlin-Java.4.0.0 11.0.25 Windows_11.10.0 amd64 +{ + "gremlin": "g.V()", + "language": "gremlin-lang" +} ---- -Gremlin Server is capable of streaming results such that additional responses will arrive over the WebSocket connection until -the iteration of the result on the server is complete. Each successful incremental message will have a `ResultCode` -of `206`. 
Termination of the stream will be marked by a final `200` status code. Note that all messages without a -`206` represent terminating conditions for a request. The following table details the various status codes that -Gremlin Server will send: +===== Expected Request HTTP Headers -[width="100%",cols="2,2,9",options="header"] +[width="100%",cols="3,10,3,3",options="header"] |========================================================= -|Code |Name |Description -|200 |SUCCESS |The server successfully processed a request to completion - there are no messages remaining in this stream. -|204 |NO CONTENT |The server processed the request but there is no result to return (e.g. an `Iterator` with no elements) - there are no messages remaining in this stream. -|206 |PARTIAL CONTENT |The server successfully returned some content, but there is more in the stream to arrive - wait for a `SUCCESS` to signify the end of the stream. -|401 |UNAUTHORIZED |The request attempted to access resources that the requesting user did not have access to. -|403 |FORBIDDEN |The server could authenticate the request, but will not fulfill it. -|407 |AUTHENTICATE |A challenge from the server for the client to authenticate its request. -|497 |REQUEST ERROR SERIALIZATION |The request message contained an object that was not serializable. -|498 |REQUEST ERROR MALFORMED REQUEST |The request message was not properly formatted which means it could not be parsed at all or the "op" code was not recognized such that Gremlin Server could properly route it for processing. Check the message format and retry the request. -|499 |REQUEST ERROR INVALID REQUEST ARGUMENTS |The request message was parseable, but the arguments supplied in the message were in conflict or incomplete. Check the message format and retry the request. -|500 |SERVER ERROR |A general server error occurred that prevented the request from being processed. 
-|595 |SERVER ERROR FAIL STEP | A server error that is produced when the `fail()` step is triggered. The returned exception will include information consistent with the `Failure` interface. -|596 |SERVER ERROR TEMPORARY |A server error occurred, but it was temporary in nature and therefore the client is free to retry it's request as-is with the potential for success. -|597 |SERVER ERROR EVALUATION |The script submitted for processing evaluated in the `ScriptEngine` with errors and could not be processed. Check the script submitted for syntax errors or other problems and then resubmit. -|598 |SERVER ERROR TIMEOUT |The server exceeded one of the timeout settings for the request and could therefore only partially responded or did not respond at all. -|599 |SERVER ERROR SERIALIZATION |The server was not capable of serializing an object that was returned from the script supplied on the request. Either transform the object into something Gremlin Server can process within the script or install mapper serialization classes to Gremlin Server. +|Name |Description |Required |Default +|Accept |Serializer MIME types supported for the response. Must be a mimetype (see <>). |No |`application/vnd.gremlin-v4.0+json;types=false` +|Accept-Encoding |The requested compression algorithm of the response. Valid values: `deflate`. |No |N/A +|Authorization |Header used with Basic authorization. |No |N/A +|Content-Length |The size of the payload |Yes |N/A +|Content-Type |The MIME type of the serialized body |No |None +|Gremlin-Hints |A semi-colon separated list of key/value pair metadata that could be helpful to the server in processing a particular request in some way. Must be a hints (see table below). |No |N/A +|User-Agent |The user agent. Follow the format specified by <>. 
|No |<> |========================================================= -NOTE: Please refer to the link:https://tinkerpop.apache.org/docs/x.y.z/dev/io[IO Reference Documentation] for more -examples of `RequestMessage` and `ResponseMessage` instances. - -NOTE: Tinkerpop provides a test server which may be useful for testing drivers. Details can be found -link:https://tinkerpop.apache.org/docs/current/dev/developer/#gremlin-socket-server-tests[here] - -=== OpProcessors Arguments - -The following sections define a non-exhaustive list of available operations and arguments for embedded `OpProcessors` -(i.e. ones packaged with Gremlin Server). +===== Request Header Value Options -==== Common - -All `OpProcessor` instances support these arguments. - -[width="100%",cols="2,2,9",options="header"] +[width="100%",cols="3,10",options="header"] |========================================================= -|Key |Type |Description -|batchSize |Int |When the result is an iterator this value defines the number of iterations each `ResponseMessage` should contain - overrides the `resultIterationBatchSize` server setting. +|Name |Options +|mimetype |A MIME type listed in <>. +|hints | mutations: yes, no, unknown - Indicates if the Gremlin contains steps that can mutate the graph. |========================================================= -==== Standard OpProcessor +The body of the request should be a `RequestMessage` which is a `Map`. The `RequestMessage` should be serialized using +the serializer specified by the `Content-Type` header. The following are the key value pairs allowed in a +`RequestMessage`: -The "standard" `OpProcessor` handles requests for the primary function of Gremlin Server - executing Gremlin. -Requests made to this `OpProcessor` are "sessionless" in the sense that a request must encapsulate the entirety -of a transaction. There is no state maintained between requests. 
A transaction is started when the script is first -evaluated and is committed when the script completes (or rolled back if an error occurred). +===== Request Message Format -[width="100%",cols="3,10a",options="header"] +[width="100%",cols="3,10,3,3",options="header"] |========================================================= -|Key |Description -|processor |As this is the default `OpProcessor` this value can be set to an empty string. -|op |[width="100%",cols="3,10",options="header"] -!========================================================= -!Key !Description -!`authentication` !A request that contains the response to a server challenge for authentication. -!`eval` !Evaluate a Gremlin script provided as a `String`. -!========================================================= +|Key |Description |Value |Required +|gremlin |The Gremlin query to execute. |String containing script |Yes +|timeoutMs |The maximum time a query is allowed to execute in milliseconds. |Number between 0 and 2^31-1 |No +|bindings |A map used during query execution. Its usage depends on "language". For "gremlin-groovy", these are the variable bindings. For "gremlin-lang", these are the parameter bindings. |Object (Map) |No +|g |The name of the graph traversal source to which the query applies. Default: "g" |String containing traversal source name |No +|language |The name of the ScriptEngine to use to parse the gremlin query. Default: "gremlin-lang" |String containing ScriptEngine name |No +|materializeProperties |Whether to include all properties for results. One of "tokens" or "all". 
|String |No +|bulkResults |Whether the results should be bulked by the server (only applies to GraphBinary) |Boolean |No |========================================================= -**`authentication` operation arguments** - -[width="100%",cols="2,2,9",options="header"] -|========================================================= -|Key |Type |Description -|sasl |String | *Required* The response to the server authentication challenge. This value is dependent on the SASL authentication mechanism required by the server and is Base64 encoded. -|saslMechanism |String | The SASL mechanism: `PLAIN` or `GSSAPI`. Note that it is up to the server implementation to use or disregard this setting (default implementation in Gremlin Server ignores it). -|========================================================= +==== HTTP Response -**`eval` operation arguments** +When Gremlin Server receives that request, it will decode it according to the MIME type given in the `Content-Type` +header and execute it using the `ScriptEngine` specified by the `language` field. For the example request above, it +will evaluate the script `g.V()` and stream back the results in HTTP chunks. When the chunks are combined, they will +form a single `ResponseMessage`. The HTTP response containing the `ResponseMessage` has the following form: -[width="100%",cols="2,2,9",options="header"] -|========================================================= -|Key |Type |Description -|gremlin |String | *Required* The Gremlin script to evaluate. -|bindings |Map |A map of key/value pairs to apply as variables in the context of the Gremlin script. -|language |String |The flavor of Gremlin used (e.g. `gremlin-groovy`). -|aliases |Map |A map of key/value pairs that allow globally bound `Graph` and `TraversalSource` objects to -be aliased to different variable names for purposes of the current request. 
The value represents the name of the -global variable and its key represents the new binding name as it will be referenced in the Gremlin query. For -example, if the Gremlin Server defines two `TraversalSource` instances named `g1` and `g2`, it would be possible -to send an alias pair with key of "g" and value of "g2" and thus allow the script to refer to "g2" simply as "g". -|evaluationTimeout |Long |An override for the server setting that determines the maximum time to wait for a script to execute on the server. -|========================================================= +[source,text] +---- +HTTP/1.1 200 +Content-type: +Transfer-Encoding: chunked +Gremlin-RequestId: +{ + "result": list, + "status": object +} +---- -==== Session OpProcessor +NOTE: While this response message is expected for all serialized responses, there may be some errors that are not +serialized. In that case, the `Content-Type` of the response should be `application/json` and the JSON should contain a +`message` key. -The "session" `OpProcessor` handles requests for the primary function of Gremlin Server - executing Gremlin. It is -like the "standard" `OpProcessor`, but instead maintains state between sessions and allows the option to leave all -transaction management up to the calling client. It is important that clients that open sessions, commit or roll -them back, however Gremlin Server will try to clean up such things when a session is killed that has been abandoned. -It is important to consider that a session can only be maintained with a single machine. In the event that multiple -Gremlin Server are deployed, session state is not shared among them. +===== Response Message Format [width="100%",cols="3,10a",options="header"] |========================================================= |Key |Description -|processor |This value should be set to `session` -|op | -[cols="3,10",options="header"] +|result |A map that contains the result data. 
+[width="100%",cols="3,10,3,3",options="header"] +!========================================================= +!Name !Description !Type !Required +!data !A list of result objects. !Array !Yes +!========================================================= +|status |A map that contains the status of the result. +[width="100%",cols="3,10,3,3",options="header"] +!========================================================= +!Name !Description !Type !Required +!code !The actual <> of the result. !Number !Yes +!exception !A class of exception if an error occurred. !String !No +!message !The error message if an error occurred. !String !No !========================================================= -!Key !Description -!`authentication` !A request that contains the response to a server challenge for authentication. -!`eval` !Evaluate a Gremlin script provided as a `String`. -!`close` !Deprecated. Gremlin-Server will only return a `NO CONTENT` message. |========================================================= -NOTE: The "close" message related to sessions was deprecated as of 3.3.11. Closing sessions now relies on closing the connections. The function to accept `close` message on the server was removed -in 3.5.0, but has been added back as of 3.5.2. Servers wishing to be compatible with older versions of the driver need only send back a `NO_CONTENT` for -this message (which is what Gremlin Server does as of 3.5.0). Drivers wishing to be compatible with servers prior to -3.3.11 may continue to send the message on calls to `close()`, otherwise such code can be removed. - -**`authentication` operation arguments** +===== Expected Response HTTP Headers -[width="100%",cols="2,2,9",options="header"] +[width="100%",cols="3,10,3,3",options="header"] |========================================================= -|Key |Type |Description -|saslMechanism |String | The SASL mechanism: `PLAIN` or `GSSAPI`.
Note that it is up to the server implementation to use or disregard this setting (default implementation in Gremlin Server ignores it). -|sasl |String | *Required* The response to the server authentication challenge. This value is dependent on the SASL authentication mechanism required by the server and is Base64 encoded. +|Name |Description |Required |Default +|Content-Type |The MIME type of the serialized body which is based on the request's `Accept` header. May also be "application/json". |Yes |N/A +|Gremlin-RequestId |The server generated UUID that is used as a request ID. |Yes |N/A +|Transfer-Encoding |The server should attempt to chunk all responses. |No |"chunked" |========================================================= -**`eval` operation arguments** +===== Response Header Value Options -[width="100%",options="header"] -|========================================================= -|Key |Type |Description -|gremlin |String | *Required* The Gremlin script to evaluate. -|session |String | *Required* The session identifier for the current session - typically this value should be a UUID (the session will be created if it doesn't exist). -|manageTransaction |Boolean |When set to `true` the transaction for the current request is auto-committed or rolled-back as are done with sessionless requests - defaulted to `false`. -|bindings |Map |A map of key/value pairs to apply as variables in the context of the Gremlin script. -|evaluationTimeout |Long |An override for the server setting that determines the maximum time to wait for a script to execute on the server. -|language |String |The flavor of Gremlin used (e.g. `gremlin-groovy`) -|aliases |Map |A map of key/value pairs that allow globally bound `Graph` and `TraversalSource` objects to -be aliased to different variable names for purposes of the current request. The value represents the name the -global variable and its key represents the new binding name as it will be referenced in the Gremlin query. 
For -example, if the Gremlin Server defines two `TraversalSource` instances named `g1` and `g2`, it would be possible -to send an alias pair with key of "g" and value of "g2" and thus allow the script to refer to "g2" simply as "g". -|========================================================= - -**`close` operation arguments** -[width="100%",cols="2,2,9",options="header"] +[width="100%",cols="3,10",options="header"] |========================================================= -|Key |Type |Description -|session |String | *Required* The session identifier for the session to close. -|force |Boolean | Determines if the session should be force closed when the client is closed. Force closing will not -attempt to close open transactions from existing running jobs and leave it to the underlying graph to decided how to -proceed with those orphaned transactions. Setting this to `true` tends to lead to faster close operation and release -of resources which can be desirable if Gremlin Server has a long session timeout and a long script evaluation timeout -as attempts to close long run jobs can occur more rapidly. If not provided, this value is `false`. +|Name |Options +|mimetype |A MIME type listed in <>. +|uuid |A randomly generated UUID string. |========================================================= -==== Traversal OpProcessor - -Both the Standard and Session OpProcessors allow for Gremlin scripts to be submitted to the server. The -`TraversalOpProcessor` however allows Gremlin `Bytecode` to be submitted to the server. Supporting this `OpProcessor` -makes it possible for a link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[Gremlin Language Variant] -to submit a `Traversal` directly to Gremlin Server in the native language of the GLV without having to use a script in -a different language. - -Unlike Standard and Session OpProcessors, the Traversal OpProcessor does not simply return the results of the -`Traversal`. 
It instead returns `Traverser` objects which allows the client to take advantage of -link:https://tinkerpop.apache.org/docs/x.y.z/reference/#barrier-step[bulking]. To describe this interaction more -directly, the returned `Traverser` will represent some value from the `Traversal` result and the number of times it -is represented in the full stream of results. So, if a `Traversal` happens to return the same vertex twenty times -it won't return twenty instances of the same object. It will return one in `Traverser` with the `bulk` value set to -twenty. Under this model, the amount of processing and network overhead can be reduced considerably. - -To demonstrate consider this example: - -[gremlin-groovy] ----- -cluster = Cluster.open() -client = cluster.connect() -aliased = client.alias("g") -g = traversal().with(org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph.instance()) <1> -rs = aliased.submit(g.V().both().barrier().both().barrier()).all().get() <2> -aliased.submit(g.V().both().barrier().both().barrier().count()).all().get().get(0).getInt() <3> -rs.collect{[value: it.getObject().get(), bulk: it.getObject().bulk()]} <4> ----- - -<1> All commands through this step are just designed to demonstrate bulking with Gremlin Server and don't represent -a real-world way that this feature would be used. -<2> Submit a `Traversal` that happens to ensure that the server uses bulking. Note that a `Traverser` is returned -and that there are only six results. -<3> In actuality, however, if this same `Traversal` is iterated there are thirty results. Without bulking, the previous -request would have sent back thirty traversers. -<4> Note that the sum of the bulk of each `Traverser` is thirty. +[[http-status-codes]] +===== Response Status Codes -The full iteration of a `Traversal` is thus left to the client. It must interpret the bulk on the `Traverser` and -unroll it to represent the actual number of times it exists when iterated. 
The unrolling is typically handled -directly within TinkerPop's remote traversal implementations. +The following table details the HTTP status codes that Gremlin Server will send: -[width="100%",cols="3,10a",options="header"] +[width="100%",cols="2,2,9",options="header"] |========================================================= -|Key |Description -|processor |This value should be set to `traversal` -|op | -[cols="3,10",options="header"] -!========================================================= -!Key !Description -!`authentication` !A request that contains the response to a server challenge for authentication. -!`bytecode` !A request that contains the `Bytecode` representation of a `Traversal`. +|Code |Name |Description +|200 |SUCCESS |The server successfully processed a request to completion - there are no messages remaining in this stream. +|204 |NO CONTENT |The server processed the request but there is no result to return (e.g. an `Iterator` with no elements) - there are no messages remaining in this stream. +|206 |PARTIAL CONTENT |The server successfully returned some content, but there is more in the stream to arrive - wait for a `SUCCESS` to signify the end of the stream. +|400 |BAD REQUEST |There was a problem with the HTTP request. +|401 |UNAUTHORIZED |The request attempted to access resources that the requesting user did not have access to. +|403 |FORBIDDEN |The server could authenticate the request, but will not fulfill it. +|404 |NOT FOUND |The server was unable to find the requested resource. +|405 |METHOD NOT ALLOWED |The request used an unsupported method. The server only supports POST. +|413 |REQUEST ENTITY TOO LARGE |The request was too large or the query could not be compiled due to size limitations. +|500 |INTERNAL SERVER ERROR |A general server error occurred that prevented the request from being processed. +|505 |HTTP VERSION NOT SUPPORTED |A server error indicating that an unsupported version of HTTP is being used. Only HTTP/1.1 is supported. 
|========================================================= -**`authentication` operation arguments** +===== Trailing Headers -[width="100%",cols="2,2,9",options="header"] -|========================================================= -|Key |Type |Description -|sasl |String | *Required* The response to the server authentication challenge. This value is dependent on the SASL authentication mechanism required by the server and is Base64 encoded. -|========================================================= +Error responses will have trailing headers in addition to the status object in the response body. This information is +duplicated and should be the same, so graph driver providers should use whichever is easier for them. The trailers, +however, will only contain the `Status` and `Exception` without the `Message`. -**`bytecode` operation arguments** +==== HTTP Examples -[width="100%",cols="2,2,9",options="header"] -|========================================================= -|Key |Type |Description -|gremlin |String | *Required* The `Bytecode` representation of a `Traversal`. -|aliases |Map | *Required* A map with a single key/value pair that refers to a globally bound `TraversalSource` object -to be aliased to different variable names for purposes of the current request. The value represents the name of the -global variable and its key represents the new binding name as it will be referenced in the Gremlin query. For -example, if the Gremlin Server defines two `TraversalSource` instances named `g1`, it would be possible -to send an alias pair with key of "g" and value of "g1" and thus allow the script to refer to "g1" simply as "g". Note -that unlike users of `alias` in other contexts, in this case, the key can *only* be set to "g" and there can be only -one key value pair present (since only one `Traversal` is being submitted, there is no sense to having more than a -single alias). 
-|========================================================= +For examples of actual requests and responses, take a look at the IO documentation for +link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#_requestmessage[GraphSON requests] and +link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#_responsemessage[GraphSON responses]. -=== Authentication and Authorization +=== HTTP Request Interceptor -Gremlin Server supports link:https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer[SASL-based] -authentication. A SASL implementation provides a series of challenges and responses that a driver must comply with -in order to authenticate. Gremlin Server supports the "PLAIN" SASL mechanism, which is a cleartext -password system, for all link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[Gremlin Language Variants]. -Other SASL mechanisms supported for selected clients are listed in the -link:https://tinkerpop.apache.org/docs/x.y.z/reference/#security[security section of the Gremlin Server reference documentation]. +A graph driver may support HTTP request intercepting which provides a means for the user of your graph driver to update +the headers and body of the HTTP request before it is sent to the server. This enables use cases where a graph system +provider's server implementation has additional capabilities that aren't included in the base Gremlin Server. Although +every graph system provider is expected to support the protocol defined by the TinkerPop HTTP API, this doesn't +preclude them from including additional functionality. Be aware that if you choose to not provide this functionality, +then your graph driver may not have access to some graph provider's features, or, possibly, it may not be able to +connect at all. -When authentication is enabled, an incoming request is intercepted before it is evaluated by the `ScriptEngine`. 
The -request is saved on the server and a `AUTHENTICATE` challenge response (status code `407`) is returned to the client. +=== Authentication and Authorization -The client will detect the `AUTHENTICATE` and respond with an `authentication` for the `op` and an `arg` named `sasl`. -In case of the "PLAIN" SASL mechanism the `arg` contains the password. The password should be either, an encoded -sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL), where the form is : `usernamepassword`, or a Base64 -encoded string of the former (which in this instance would be `AHVzZXJuYW1lAHBhc3N3b3Jk`). Should Gremlin Server be -able to authenticate with the provided credentials, the server will return the results of the original request as it -normally does without authentication. If it cannot authenticate given the challenge response from the client, it will -return `UNAUTHORIZED` (status code `401`). +By default, Gremlin Server only supports +link:https://en.wikipedia.org/wiki/Basic_access_authentication[basic HTTP authentication]. This is handled by the +`HttpBasicAuthenticationHandler` which is the only `AbstractAuthenticationHandler` provided with the Gremlin Server. +Other common HTTP authentication schemes that are sent via an HTTP header can be supported by implementing a custom +`AbstractAuthenticationHandler`. Because the communication protocol is HTTP/1.1, authentication should be header-based +and should not include negotiation. -NOTE: Gremlin Server does not support the "authorization identity" as described in link:https://tools.ietf.org/html/rfc4616[RFC4616]. +When basic authentication is enabled, an incoming request is intercepted before it is evaluated by the `ScriptEngine`. +The request is examined for an `Authorization` header. If one doesn't exist then "401 Unauthorized" error response is +returned. In addition to authenticating users at the start of a connection, Gremlin Server allows providers to authorize users on a per request basis. 
If @@ -1341,6 +1228,39 @@ link:https://tinkerpop.apache.org/docs/x.y.z/reference/#security[reference docum NOTE: While Gremlin Server supports this authorization feature it is not a feature that TinkerPop requires of graph providers as part of the agreement between client and server. +[[serializers]] +=== Serializers + +In order to serialize and deserialize the requests and responses, your graph driver will need to implement +link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphbinary[GraphBinary]. The Gremlin Server is capable of +returning both GraphBinary and GraphSON; however, GraphBinary is a more compact format which can lead to increased +performance as fewer bytes need to be sent over the wire. For this reason, drivers only need to support GraphBinary. +link:https://tinkerpop.apache.org/docs/x.y.z/dev/io/#graphson[GraphSON] can be used by applications that only support JSON serialization. + +The following table lists the serializers supported by the Gremlin Server and their MIME types. These MIME types should +be used in the `Content-Type` and `Accept` HTTP headers. + +[width="100%",cols="3,5,5",options="header"] +|========================================================= +|Name |Description |MIME type +|Untyped GraphSON 4.0 |A JSON-based graph format |application/vnd.gremlin-v4.0+json;types=false +|Typed GraphSON 4.0 |A JSON-based graph format with embedded type information used for serialization |application/vnd.gremlin-v4.0+json;types=true +|GraphBinary 4.0 |A binary graph format |application/vnd.graphbinary-v4.0 +|========================================================= + +==== IO Tests + +The IO test suite is a collection of files that contain the expected outcome of serialization of certain types. These +tests can be used to determine if a particular serializer has been correctly implemented. In general, a driver should +be able to "round trip" each of these types.
That is, it should be able to both read from and write to those exact same +bytes. Not all programming languages provide library types that will match the specification of the corresponding type +defined by the serializer. In this case, it is not possible to completely round trip that type and you may skip that +test. The GraphBinary test files can be found +link:https://github.com/apache/tinkerpop/tree/x.y.z/gremlin-test/src/main/resources/org/apache/tinkerpop/gremlin/structure/io/graphbinary[here]. +The link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-util/src/test/java/org/apache/tinkerpop/gremlin/structure/io/AbstractTypedCompatibilityTest.java:[Java implementation] +can be used as a reference on how these files can be used and its +link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-util/src/test/java/org/apache/tinkerpop/gremlin/structure/io/Model.java[model] +shows the Java representation of those files. [[gremlin-plugins]] == Gremlin Plugins diff --git a/docs/src/reference/gremlin-applications.asciidoc b/docs/src/reference/gremlin-applications.asciidoc index 2f1d7f3d588..f076f1ae1cf 100644 --- a/docs/src/reference/gremlin-applications.asciidoc +++ b/docs/src/reference/gremlin-applications.asciidoc @@ -462,8 +462,8 @@ NOTE: Gremlin Server is the replacement for link:https://github.com/tinkerpop/re NOTE: Please see the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/[Provider Documentation] for information on how to develop a driver for Gremlin Server. -By default, communication with Gremlin Server occurs over link:http://en.wikipedia.org/wiki/WebSocket[WebSocket] and -exposes a custom sub-protocol for interacting with the server. +By default, communication with Gremlin Server occurs over HTTP/1.1. The TinkerPop HTTP API is described in the +link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_http_api[HTTP provider documentation]. WARNING: Gremlin Server allows for the execution of remotely submitted "scripts" (i.e. 
arbitrary code sent by a client to the server). Developers should consider the security implications involved in running Gremlin Server without the @@ -491,17 +491,13 @@ $ bin/gremlin-server.sh conf/gremlin-server-modern.yaml [INFO] ServerGremlinExecutor - Initialized GremlinExecutor and preparing GremlinScriptEngines instances. [INFO] ServerGremlinExecutor - Initialized gremlin-groovy GremlinScriptEngine and registered metrics [INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[tinkergraph[vertices:0 edges:0], standard] -[INFO] OpLoader - Adding the standard OpProcessor. -[INFO] OpLoader - Adding the session OpProcessor. -[INFO] OpLoader - Adding the traversal OpProcessor. [INFO] GremlinServer - Executing start up LifeCycleHook [INFO] Logger$info - Loading 'modern' graph data. [INFO] GremlinServer - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled [INFO] GremlinServer - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled -[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3 -[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3 -[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0 with org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1 -[INFO] AbstractChannelizer - Configured application/vnd.graphbinary-v1.0-stringd with org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1 +[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v4.0+json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV4 +[INFO] AbstractChannelizer - Configured application/json with org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV4 +[INFO] 
AbstractChannelizer - Configured application/vnd.graphbinary-v4.0 with org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV4 [INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1. [INFO] GremlinServer$1 - Channel started at port 8182. ---- @@ -824,63 +820,47 @@ They are all still evaluated locally. [[connecting-via-http]] === Connecting via HTTP -image:gremlin-rexster.png[width=225,float=left] While the default behavior for Gremlin Server is to provide a -WebSocket-based connection, it can also be configured to support plain HTTP web service. -The HTTP endpoint provides for a communication protocol familiar to most developers, with a wide support of -programming languages, tools and libraries for accessing it. As a result, HTTP provides a fast way to get started -with Gremlin Server. It also may represent an easier upgrade path from link:https://github.com/tinkerpop/rexster[Rexster] -as the API for the endpoint is very similar to Rexster's link:https://github.com/tinkerpop/rexster/wiki/Gremlin-Extension[Gremlin Extension]. - -IMPORTANT: TinkerPop provides and supports this HTTP endpoint as a convenience and for legacy reasons, but users should -prefer the recommended approach of bytcode based requests as described in <> -section. +image:gremlin-rexster.png[width=225,float=left] The HTTP endpoint provides for a communication protocol familiar to +most developers, with a wide support of programming languages, tools and libraries for accessing it. As a result, HTTP +provides a fast way to get started with Gremlin Server. -Gremlin Server provides for a single HTTP endpoint - a Gremlin evaluator - which allows the submission of a Gremlin -script as a request. For each request, it returns a response containing the serialized results of that script. -To enable this endpoint, Gremlin Server needs to be configured with the `HttpChannelizer`, which replaces the default. 
-The `WsAndHttpChannelizer` may also be configured to enable both WebSockets and the REST endpoint in the configuration -file: - -[source,yaml] -channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer +Gremlin Server implements the link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_http_api[TinkerPop HTTP API]. +It provides a single endpoint which allows for the submission of a Gremlin script as a request. For each request, it +returns a response containing the serialized results of that script. -[source,yaml] -channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer - -The `HttpChannelizer` is already configured in the `gremlin-server-rest-modern.yaml` file that is packaged with the Gremlin +The `HttpChannelizer` is already configured in the `gremlin-server-modern.yaml` file that is packaged with the Gremlin Server distribution. To utilize it, start Gremlin Server as follows: [source,text] -bin/gremlin-server.sh conf/gremlin-server-rest-modern.yaml +bin/gremlin-server.sh conf/gremlin-server-modern.yaml Once the server has started, issue a request. Here's an example with link:http://curl.haxx.se/[cURL]: -[source,text] -$ curl "http://localhost:8182?gremlin=100-1" - -which returns: - -[source,js] -{ - "result":{"data":99,"meta":{}}, - "requestId":"0581cdba-b152-45c4-80fa-3d36a6eecf1c", - "status":{"code":200,"attributes":{},"message":""} -} - -The above example showed a `GET` operation, but the preferred method for this endpoint is `POST`: - [source,text] curl -X POST -d "{\"gremlin\":\"100-1\"}" "http://localhost:8182" -which returns: +returns: [source,js] { - "result":{"data":99,"meta":{}}, - "requestId":"ef2fe16c-441d-4e13-9ddb-3c7b5dfb10ba", - "status":{"code":200,"attributes":{},"message":""} + "result": { + "data": { + "@type": "g:List", + "@value": [ + { + "@type": "g:Int32", + "@value": 99 + } + ] + } + }, + "status": { + "code": 200 + } } +`POST` is the only supported method for this endpoint. 
This means that `GET` with query parameters is not supported. + It is also preferred that Gremlin scripts be parameterized when possible via `bindings`: [source,text] @@ -897,6 +877,9 @@ Gremlin script. The caveat is that these arguments will always be treated as `S types are preserved or to pass complex objects such as lists or maps, use `POST` which will at least support the allowed JSON data types. +NOTE: The Gremlin Server doesn't support link:https://en.wikipedia.org/wiki/HTTP_pipelining[HTTP pipelining]. Attempts +to use this feature will cause the server to throw an error and may lead to results being sent out-of-order. + Passing the `Accept` header with a valid MIME type will trigger the server to return the result in a particular format. Note that in addition to the formats available given the server's `serializers` configuration, there is also a basic `text/plain` format which produces a text representation of results similar to the Gremlin Console: @@ -993,7 +976,7 @@ The following table describes the various YAML configuration options that Gremli |authentication.config |A `Map` of configuration settings to be passed to the `Authenticator` when it is constructed. The settings available are dependent on the implementation. |_none_ |authorization.authorizer |The fully qualified classname of an `Authorizer` implementation to use. |_none_ |authorization.config |A `Map` of configuration settings to be passed to the `Authorizer` when it is constructed. The settings available are dependent on the implementation. |_none_ -|channelizer |The fully qualified classname of the `Channelizer` implementation to use. A `Channelizer` is a "channel initializer" which Gremlin Server uses to define the type of processing pipeline to use. By allowing different `Channelizer` implementations, Gremlin Server can support different communication protocols (e.g. WebSocket). 
|`WebSocketChannelizer` +|channelizer |The fully qualified classname of the `Channelizer` implementation to use. A `Channelizer` is a "channel initializer" which Gremlin Server uses to define the type of processing pipeline to use. By allowing different `Channelizer` implementations, Gremlin Server can support different communication protocols (e.g. HTTP). |`HttpChannelizer` |enableAuditLog |The `AuthenticationHandler`, `AuthorizationHandler` and processors can issue audit logging messages with the authenticated user, remote socket address and requests with a gremlin query. For privacy reasons, the default value of this setting is false. The audit logging messages are logged at the INFO level via the `audit.org.apache.tinkerpop.gremlin.server` logger, which can be configured using the `logback.xml` file. |_false_ |graphManager |The fully qualified classname of the `GraphManager` implementation to use. A `GraphManager` is a class that adheres to the TinkerPop `GraphManager` interface, allowing custom implementations for storing and managing graph references, as well as defining custom methods to open and close graphs instantiations. To prevent Gremlin Server from starting when all graphs fails, the `CheckedGraphManager` can be used.|`DefaultGraphManager` |graphs |A `Map` of `Graph` configuration files where the key of the `Map` becomes the name to which the `Graph` will be bound and the value is the file name of a `Graph` configuration file. |_none_ @@ -1343,36 +1326,16 @@ authorization and protective measures against malicious script execution. Clien chosen. Script execution options are covered <>. This section starts with authentication. -Gremlin Server supports a pluggable authentication framework using -link:https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer[SASL] (Simple Authentication and -Security Layer). Depending on the client used to connect to Gremlin Server, different authentication -mechanisms are accessible, see the table below. 
- -[width="70%",cols="3,5,3",options="header"] -|========================================================= -|Client |Authentication mechanism |Availability -|HTTP |BASIC |3.0.0-incubating -1.3+v|Gremlin-Java/ -Gremlin-Console |PLAIN SASL (username/password) |3.0.0-incubating -|Pluggable SASL |3.0.0-incubating -|GSSAPI SASL (Kerberos) |3.3.0 -|Gremlin.NET |PLAIN SASL |3.3.0 -1.2+v|Gremlin-Python |PLAIN SASL |3.2.2 -|GSSAPI SASL (Kerberos) |3.4.7 -|Gremlin.Net |PLAIN SASL |3.2.7 -|Gremlin-Javascript |PLAIN SASL |3.3.0 -|Gremlin-go |PLAIN SASL |3.5.4 -|========================================================= +By default, Gremlin Server supports HTTP basic authentication. By default, Gremlin Server is configured to allow all requests to be processed (i.e. no authentication). To enable authentication, Gremlin Server must be configured with an `Authenticator` implementation in its YAML file. Gremlin -Server comes packaged with two implementations called `SimpleAuthenticator` for plain text authentication using HTTP -BASIC or PLAIN SASL and `Krb5Authenticator` for Kerberos authentication using GSSAPI SASL. +Server comes packaged with an implementation called `SimpleAuthenticator` for plain text authentication using HTTP +BASIC. ==== Plain text authentication -The `SimpleAuthenticator` implements the "PLAIN" SASL mechanism (i.e. plain text) to authenticate a request. It also -supports handling basic authentication requests from http clients. It validates +The `SimpleAuthenticator` supports handling basic authentication requests from http clients. It validates username/password pairs against a graph database, which must be provided to it as part of the configuration. [source,yaml] @@ -1524,59 +1487,6 @@ Obviously, this data will not be retained and usable with Gremlin Server. It wou TinkerGraph to persist that data or to manually persist it (e.g. write the graph data to Gryo) once changes are complete. 
Alternatively, use a persistent graph to hold the credentials and configure Gremlin Server accordingly. -[[krb5authenticator]] -==== Kerberos Authentication - -The `Krb5Authenticator` implements the "GSSAPI" SASL mechanism (i.e. Kerberos) to authenticate a request from a Gremlin -client. It can be applied in an existing Kerberos environment and validates whether a -link:https://www.roguelynn.com/words/explain-like-im-5-kerberos/[valid authentication proof and service ticket are -offered]. - -[source,yaml] -authentication: { - authenticator: org.apache.tinkerpop.gremlin.server.auth.Krb5Authenticator, - config: { - principal: gremlinserver/hostname.your.org@YOUR.REALM, - keytab: /etc/security/keytabs/gremlinserver.service.keytab}} - -`Krb5Authenticator` needs a Kerberos service principal and a keytab that holds the secret key for that principal. The keytab -location and service name, e.g. gremlinserver, are free to be chosen. `Krb5Authenticator` finds the KDC's hostname and -port from the krb5.conf file with Kerberos configurations. This file can reside at either the -https://web.mit.edu/kerberos/krb5-devel/doc/mitK5defaults.html[default location] or a location to be specified as a -system property in the JAVA_OPTIONS environment variable of Gremlin Server: - -[source, bash] -export JAVA_OPTIONS="${JAVA_OPTIONS} -Xms512m -Xmx4096m -Djava.security.krb5.conf=/etc/krb5.conf" - -Gremlin clients have to specify the service name as the `protocol` connection parameter. For Gremlin-Console the -`protocol` is an entry in the remote.yaml file, for Gremlin-java the client builder has a `protocol()` method. - -In addition to the `protocol`, the Gremlin client needs to specify a `jaasEntry`, an entry in the -link:https://en.wikipedia.org/wiki/Java_Authentication_and_Authorization_Service[JAAS] configuration file. 
As a -start one can define a conf/gremlin-jaas.conf file with a `GremlinConsole` jaasEntry: - -[source, jaas] -GremlinConsole { - com.sun.security.auth.module.Krb5LoginModule required - doNotPrompt=true - useTicketCache=true; -}; - -This configuration tells Gremlin Console to pass authentication requests from Gremlin Server to the Krb5LoginModule, which is -part of the java standard library. The Krb5LoginModule does not prompt the user for a username and password but uses the -ticket cache that is normally refreshed when a user logs in to a host within the Kerberos realm. - -The Gremlin client needs the location of the JAAS configuration file to be passed as a system property to the JVM. For -Gremlin-Console the easiest way to do this is to pass it to the run script via the JAVA_OPTIONS environment property. -If the krb5.conf Kerberos configuration file is not available from the -https://web.mit.edu/kerberos/krb5-devel/doc/mitK5defaults.html[default location] it has to be provided as a system -property as well: - -[source, bash] -JAAS_OPTION="-Djava.security.auth.login.config=conf/gremlin-jaas.conf" -KRB5_OPTION="-Djava.security.krb5.conf=/etc/krb5.conf" -export JAVA_OPTIONS="${JAVA_OPTIONS} ${KRB5_OPTION} ${JAAS_OPTION}" - [[authorization]] ==== Authorization diff --git a/docs/src/reference/gremlin-variants.asciidoc b/docs/src/reference/gremlin-variants.asciidoc index 7e4445bc5c1..5ef6a470570 100644 --- a/docs/src/reference/gremlin-variants.asciidoc +++ b/docs/src/reference/gremlin-variants.asciidoc @@ -69,7 +69,8 @@ providing more detailed information about usage, configuration and known limitat [[gremlin-go]] == Gremlin-Go -WARNING: Gremlin-Go is not available in this Milestone release, please consider testing with Java or Python. +IMPORTANT: 4.0 Milestone Release - Gremlin-Go is not available in this milestone, please consider testing with Java or +Python. 
image:gremlin-go.png[width=130,float=right] Apache TinkerPop's Gremlin-Go implements Gremlin within the link:https://go.dev/[Go] language and can therefore be used on different operating systems. Go's syntax has the similar constructs as Java including "dot notation" for function chaining (`a.b.c`) and round bracket function arguments (`a(b,c)`). Something unlike Java is that Gremlin-Go requires a @@ -792,20 +793,21 @@ The following table describes the various configuration options for the Gremlin [width="100%",cols="3,10,^2",options="header"] |========================================================= |Key |Description |Default -|connectionPool.channelizer |The fully qualified classname of the client `Channelizer` that defines how to connect to the server. |`Channelizer.WebSocketChannelizer` +|auth.type |Type of Auth to submit on requests that require authentication. Can be: `basic` or `sigv4`. |_""_ +|auth.username |The username to submit on requests that require basic authentication. |_none_ +|auth.password |The password to submit on requests that require basic authentication. |_none_ +|auth.region |The region setting for sigv4 authentication. |_none_ +|auth.serviceName |The service name setting for sigv4 authentication. |_none_ +|connectionPool.connectionSetupTimeoutMillis | Duration of time in milliseconds provided for connection setup to complete which includes WebSocket protocol handshake and SSL handshake. |15000 |connectionPool.enableSsl |Determines if SSL should be enabled or not. If enabled on the server then it must be enabled on the client. |false -|connectionPool.keepAliveInterval |Length of time in milliseconds to wait on an idle connection before sending a keep-alive request. Set to zero to disable this feature. |180000 +|connectionPool.idleConnectionTimeout | Duration of time in milliseconds that the driver will allow a channel to not receive read or writes before it automatically closes. 
|180000 |connectionPool.keyStore |The private key in JKS or PKCS#12 format. |_none_ |connectionPool.keyStorePassword |The password of the `keyStore` if it is password-protected. |_none_ |connectionPool.keyStoreType |`JKS` (Java 8 default) or `PKCS12` (Java 9+ default)|_none_ |connectionPool.maxResponseContentLength |The maximum length in bytes that a message can be received from the server. |2147483647 -|connectionPool.maxInProcessPerConnection |The maximum number of in-flight requests that can occur on a connection. |4 -|connectionPool.maxSimultaneousUsagePerConnection |The maximum number of times that a connection can be borrowed from the pool simultaneously. |16 |connectionPool.maxSize |The maximum size of a connection pool for a host. |128 |connectionPool.maxWaitForConnection |The amount of time in milliseconds to wait for a new connection before timing out. |3000 |connectionPool.maxWaitForClose |The amount of time in milliseconds to wait for pending messages to be returned from the server before closing the connection. |3000 -|connectionPool.minInProcessPerConnection |The minimum number of in-flight requests that can occur on a connection. |1 -|connectionPool.minSimultaneousUsagePerConnection |The maximum number of times that a connection can be borrowed from the pool simultaneously. |8 |connectionPool.reconnectInterval |The amount of time in milliseconds to wait before trying to reconnect to a dead host. |1000 |connectionPool.resultIterationBatchSize |The override value for the size of the result batches to be returned from the server. |64 |connectionPool.sslCipherSuites |The list of JSSE ciphers to support for SSL connections. If specified, only the ciphers that are listed and supported will be enabled. If not specified, the JVM default is used. |_none_ @@ -814,20 +816,17 @@ The following table describes the various configuration options for the Gremlin |connectionPool.trustStore |File location for a SSL Certificate Chain to use when SSL is enabled. 
If this value is not provided and SSL is enabled, the default `TrustManager` will be used. |_none_ |connectionPool.trustStorePassword |The password of the `trustStore` if it is password-protected |_none_ |connectionPool.validationRequest |A script that is used to test server connectivity. A good script to use is one that evaluates quickly and returns no data. The default simply returns an empty string, but if a graph is required by a particular provider, a good traversal might be `g.inject()`. |_''_ -|connectionPool.connectionSetupTimeoutMillis | Duration of time in milliseconds provided for connection setup to complete which includes WebSocket protocol handshake and SSL handshake. |15000 -|connectionPool.idleConnectionTimeout | Duration of time in milliseconds that the driver will allow a channel to not receive read or writes before it automatically closes. |180000 +|bulkResults |Sets whether the server should attempt to get bulk results or not. |false +|enableUserAgentOnConnect |Enables sending a user agent to the server during connection requests. More details can be found in provider docs link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[here].|true |hosts |The list of hosts that the driver will connect to. |localhost -|jaasEntry |Sets the `AuthProperties.Property.JAAS_ENTRY` properties for authentication to Gremlin Server. |_none_ |nioPoolSize |Size of the pool for handling request/response operations. |available processors |password |The password to submit on requests that require authentication. |_none_ |path |The URL path to the Gremlin Server. |_/gremlin_ |port |The port of the Gremlin Server to connect to. The same port will be applied for all hosts. |8192 -|protocol |Sets the `AuthProperties.Property.PROTOCOL` properties for authentication to Gremlin Server. |_none_ |serializer.className |The fully qualified class name of the `MessageSerializer` that will be used to deserialize responses from the server. 
Note that the serializer configured on the client should be supported by the server configuration. |_none_ |serializer.config |A `Map` of configuration settings for the serializer. |_none_ |username |The username to submit on requests that require authentication. |_none_ |workerPoolSize |Size of the pool for handling background work. |available processors * 2 -|enableUserAgentOnConnect |Enables sending a user agent to the server during connection requests. More details can be found in provider docs link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[here].|true |========================================================= Please see the link:https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/tinkerpop/gremlin/driver/Cluster.Builder.html[Cluster.Builder javadoc] to get more information on these settings. @@ -835,25 +834,28 @@ Please see the link:https://tinkerpop.apache.org/javadocs/x.y.z/core/org/apache/ [[gremlin-java-transactions]] === Transactions -Transactions with Java are best described in <> section of this -documentation as Java covers both embedded and remote use cases. +IMPORTANT: 4.0 Milestone Release - Transactions with Java aren't currently supported. [[gremlin-java-serialization]] === Serialization Remote systems like Gremlin Server and Remote Gremlin Providers respond to requests made in a particular serialization format and respond by serializing results to some format to be interpreted by the client. For JVM-based languages, -there are two options for serialization: GraphSON and GraphBinary. It is important that the client and server -have the same serializers configured in the same way or else one or the other will experience serialization exceptions -and fail to always communicate.
Discrepancy in serializer registration between client and server can happen fairly -easily as different graph systems may automatically include serializers on the server-side, thus leaving the client -to be configured manually. As an example: +there is a single option for serialization: GraphBinary. + +IMPORTANT: 4.0 Milestone Release - There is temporary support for GraphSON in the Java driver which will help with +testing, but it is expected that the drivers will only support GraphBinary when 4.0 is fully released. + +It is important that the client and server have the same serializers configured in the same way or else one or the +other will experience serialization exceptions and fail to always communicate. Discrepancy in serializer registration +between client and server can happen fairly easily as different graph systems may automatically include serializers on +the server-side, thus leaving the client to be configured manually. As an example: [source,java] ---- IoRegistry registry = ...; // an IoRegistry instance exposed by a specific graph provider TypeSerializerRegistry typeSerializerRegistry = TypeSerializerRegistry.build().addRegistry(registry).create(); -MessageSerializer serializer = new GraphBinaryMessageSerializerV1(typeSerializerRegistry); +MessageSerializer serializer = new GraphBinaryMessageSerializerV4(typeSerializerRegistry); Cluster cluster = Cluster.build(). serializer(serializer). create(); @@ -864,7 +866,7 @@ GraphTraversalSource g = traversal().with(DriverRemoteConnection.using(client, " The `IoRegistry` tells the serializer what classes from the graph provider to auto-register during serialization. Gremlin Server roughly uses this same approach when it configures its serializers, so using this same model will ensure compatibility when making requests. Obviously, it is possible to switch to GraphSON or GraphBinary by using -the appropriate `MessageSerializer` (e.g. 
`GraphSONMessageSerializerV3` or `GraphBinaryMessageSerializerV1` respectively) +the appropriate `MessageSerializer` (e.g. `GraphSONMessageSerializerV4` or `GraphBinaryMessageSerializerV4` respectively) in the same way and building that into the `Cluster` object. [[gremlin-java-lambda]] @@ -1024,6 +1026,29 @@ g2Client.submit("g.V()") The above code demonstrates how the `alias` method can be used such that the script need only contain a reference to "g" and "g1" and "g2" are automatically rebound into "g" on the server-side. +==== RequestInterceptor + +Gremlin-Java allows for modification of the underlying HTTP request through the use of `RequestInterceptors`. This is +intended to be an advanced feature which means that you will need to understand how the implementation works in order +to safely utilize it. Gremlin-Java is written in a way that you should be able to interact with a TinkerPop-enabled +server without having to use interceptors. This is intended for cases where the server has special capabilities. + +A `RequestInterceptor` is simply a `UnaryOperator`. A list of these is maintained and will be run sequentially for +each request. When building a `Cluster` instance, the methods `addInterceptorAfter()`, `addInterceptorBefore()`, +`addInterceptor()`, and `removeInterceptor()` can be used to add or remove interceptors. It's important to remember +that order matters so if one interceptor depends on another's output then ensure they are added in the correct order. +Note that `Auth` is also implemented using interceptors, and `Auth` is always run last after your list of interceptors +has already run. By default, the `PayloadSerializingInterceptor` with the name `serializer` is added to your list of +interceptors. This interceptor is used for serializing the body of the request. The first interceptor is provided with +a `org.apache.tinkerpop.gremlin.driver.HttpRequest` that contains a `RequestMessage` in the body.
As a reminder, +`RequestMessage` is immutable and only certain keys can be added to it. If you want to customize the body by adding +other fields, you will need to make a different copy of the `RequestMessage` or completely change the body to contain a +different data type. The very last interceptor should have a `org.apache.tinkerpop.gremlin.driver.HttpRequest` that +contains a byte[] in the body. + +For an example of a simple `RequestInterceptor` that only modifies the header of the request, take a look at +link:https://github.com/apache/tinkerpop/blob/x.y.z/gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/auth/Basic.java[basic authentication]. + [[gremlin-java-dsl]] === Domain Specific Languages @@ -1243,7 +1268,8 @@ All examples can then be run using your IDE of choice. [[gremlin-javascript]] == Gremlin-JavaScript -WARNING: Gremlin-JavaScript is not available in this Milestone release, please consider testing with Java or Python. +IMPORTANT: 4.0 Milestone Release - Gremlin-JavaScript is not available in this milestone, please consider testing with +Java or Python. image:gremlin-js.png[width=130,float=right] Apache TinkerPop's Gremlin-JavaScript implements Gremlin within the JavaScript language. It targets Node.js runtime and can be used on different operating systems on any Node.js 6 or @@ -1692,7 +1718,8 @@ node modern-traversals.js anchor:gremlin-DotNet[] [[gremlin-dotnet]] == Gremlin.Net -WARNING: Gremlin.Net is not available in this Milestone release, please consider testing with Java or Python. +IMPORTANT: 4.0 Milestone Release - Gremlin.Net is not available in this milestone, please consider testing with Java or +Python. image:gremlin-dotnet-logo.png[width=371,float=right] Apache TinkerPop's Gremlin.Net implements Gremlin within the C# language.
It targets .NET Standard and can therefore be used on different operating systems and with different .NET @@ -2154,7 +2181,7 @@ Some connection options can also be set on individual requests made through the vertices = g.with_('evaluationTimeout', 500).V().out('knows').to_list() ---- -The following options are allowed on a per-request basis in this fashion: `batchSize`, `bulked`, `language`, `materializeProperties`, +The following options are allowed on a per-request basis in this fashion: `batchSize`, `bulkResults`, `language`, `materializeProperties`, `userAgent`, and `evaluationTimeout`. anchor:python-imports[] @@ -2254,7 +2281,7 @@ can be passed to the `Client` or `DriverRemoteConnection` instance as keyword ar |enable_user_agent_on_connect |Enables sending a user agent to the server during connection requests. More details can be found in provider docs link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[here].|True -|enable_bulked_result |Enables bulking of results on the server. |False +|bulk_results |Enables bulking of results on the server. |False |========================================================= Note that the `transport_factory` can allow for additional configuration of the `AiohttpTransport`, which allows diff --git a/docs/src/reference/intro.asciidoc b/docs/src/reference/intro.asciidoc index 947de8d6709..86d4bbceda0 100644 --- a/docs/src/reference/intro.asciidoc +++ b/docs/src/reference/intro.asciidoc @@ -364,18 +364,15 @@ image:rexster-character-3.png[width=125,float=left] A JVM-based graph may be hos <>. Gremlin Server exposes the graph as an endpoint to which different clients can connect, essentially providing a remote GTM. 
Gremlin Server supports multiple methods for clients to interface with it: -* Websockets with a link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_graph_driver_provider_requirements[custom sub-protocol] -** String-based Gremlin scripts -** Bytecode-based Gremlin traversals -* HTTP for string-based scripts - -Users are encouraged to use the bytecode-based approach with websockets because it allows them to write Gremlin -in the language of their choice. Connecting looks somewhat similar to the <> approach -in that there is a need to create a `GraphTraversalSource`. In the embedded approach, the means for that object's -creation is derived from a `Graph` object which spawns it. In this case, however, the `Graph` instance exists only on -the server which means that there is no `Graph` instance to create locally. The approach is to instead create a -`GraphTraversalSource` anonymously with `AnonymousTraversalSource` and then apply some "remote" options that describe -the location of the Gremlin Server to connect to: +* link:https://tinkerpop.apache.org/docs/x.y.z/dev/provider/#_http_api[HTTP] for string-based scripts (both +gremlin-lang and gremlin-groovy) + +Connecting looks somewhat similar to the <> approach in that there is a need to create a +`GraphTraversalSource`. In the embedded approach, the means for that object's creation is derived from a `Graph` object +which spawns it. In this case, however, the `Graph` instance exists only on the server which means that there is no +`Graph` instance to create locally. 
The approach is to instead create a `GraphTraversalSource` anonymously with +`AnonymousTraversalSource` and then apply some "remote" options that describe the location of the Gremlin Server to +connect to: [source,java,tab] ---- diff --git a/docs/src/upgrade/release-4.x.x.asciidoc b/docs/src/upgrade/release-4.x.x.asciidoc index 239dfa11081..ba57204d125 100644 --- a/docs/src/upgrade/release-4.x.x.asciidoc +++ b/docs/src/upgrade/release-4.x.x.asciidoc @@ -21,15 +21,45 @@ image::https://raw.githubusercontent.com/apache/tinkerpop/master/docs/static/ima *4.0.0* -== TinkerPop 4.0.0 +== TinkerPop 4.0.0.M1 *Release Date: NOT OFFICIALLY RELEASED YET* Please see the link:https://github.com/apache/tinkerpop/blob/4.0.0/CHANGELOG.asciidoc#release-4-0-0[changelog] for a complete list of all the modifications that are part of this release. +NOTE: 4.0.0.M1 is a milestone release. It is meant as a preview version to try out the new HTTP API features in +the server and drivers (Java/Python only). As this is a milestone version only, you can expect breaking changes to +occur in future milestones for 4.0.0 on the way to its General Availability release. Items that have important +limitations and constraints pertinent to this milestone will be highlighted throughout the documentation inside an +"IMPORTANT" box that starts with "4.0 Milestone Release". + === Upgrading for Users +==== Result Bulking from Server +In previous versions, when a traversal was submitted through the DriverRemoteConnection (DRC) via the Bytecode processor, +the results from the server were bulked as Traverser, which provided a form of result optimization across the wire. +Starting with 4.0, with the removal of Bytecode and Traverser serializer, this optimization is now achieved via +`GraphBinaryV4` response message serialization, and can be controlled through cluster setting or per request option.
+ +Per request option setting will always override cluster settings, and regardless of cluster or request option settings, +bulking will only occur if the script processing language is set to `gremlin-lang` and the serializer is set to `GraphBinaryV4`. + +[source,java] +---- +// cluster setting +Cluster cluster = Cluster.build().bulkResults(true).create(); + +// per request option +GraphTraversalSource g = traversal().with(DriverRemoteConnection.using(cluster)); +List result = g.with("language", "gremlin-lang").with("bulkResults", true).inject(1).toList(); +---- + +By default, the cluster setting of `bulkResults` is false. To remain consistent with previous behavior, remote traversal +submitted through the DRC will always send a request option setting `bulkResults` to `true`. This implies that if `gremlin-lang` +script engine and `GraphBinaryV4` serializer are used, then server will bulk results before sending regardless of cluster setting, +and can only be disabled via per request option. + ==== BulkSet Behavior Changes Starting with 4.0, steps which return BulkSet (e.g. `aggregate()`) will have results returned in different format depending on embedded or remote usage. @@ -44,6 +74,7 @@ This is a placeholder to summarize configuration-related changes. * `maxContentLength` setting for Gremlin Driver has been renamed to `maxResponseContentLength` and now blocks incoming responses that are too large based on total response size. * `maxContentLength` setting for Gremlin Server has been renamed to `maxRequestContentLength`. +* `enableCompression` setting has been removed. ==== Simplification to g creation @@ -276,7 +307,7 @@ g.with_strategies(TraversalStrategy( ==== Changes to Serialization -The GLVs will only support GraphBinary V4 and GraphSON support has been removed. This means that the serializer option +The GLVs will only support GraphBinaryV4 and GraphSON support will be removed. 
This means that the serializer option that was available in most GLVs has been removed. GraphBinary is a more compact format and has support for the same types. This should lead to increased performance for users upgrading from any version of GraphSON to GraphBinary. @@ -292,6 +323,39 @@ PDT. Primitive PDTs are string-based representations of a primitive type support contain a map of fields. You should consult your provider's documentation to determine what types of fields a particular PDT may contain. +==== Changes to Authentication and Authorization + +With the move to HTTP, the only authentication option supported out-of-the-box is HTTP basic access authentication +(username/password). The SASL-based authentication mechanisms are no longer supported (e.g. Kerberos). Your graph +system provider may choose to implement other authentication mechanisms over HTTP which you would have to use via a +request interceptor. Refer to your provider's documentation to determine if other authentication mechanisms are +available. + +==== Transactions Disabled + +IMPORTANT: 4.0 Milestone Release - Transactions are currently disabled and use of `tx()` will return an error. + +==== Result Bulking Changes + +Previous versions of the Gremlin Server would attempt to "bulk" the result if bytecode was used in the request. This +"bulking" increased performance by sending similar results once with a count of occurrences. Starting in 4.0, Gremlin +Server will bulk based on a newly introduced `bulked` field in the `RequestMessage`. It only applies to GraphBinary and +`gremlin-lang` requests and other requests won't be bulked. This can be toggled in the language variants by setting a +boolean value with `enableBulkedResult()` in the `Cluster` settings. + +==== Gremlin Java Changes + +Connection pooling has been updated to work with HTTP. Previously, connections could only be opened one at a time, but +this has changed and now many connections can be opened at the same time. 
This supports bursty workloads where many +queries may be issued within a short period of time. Connections are no longer closed based on how "busy" they are +based on the `minInProcessPerConnection` and `minSimultaneousUsagePerConnection`, rather they are closed based on an +idle timeout called `idleConnectionTimeout`. Because the number of connections can increase much faster and connections +are closed based on a timeout, the `minConnectionPoolSize` option has been removed and there may be zero connections +available if the driver has been idle for a while. + +The Java driver can currently handle a response that is a maximum of 2^31-1 (`Integer.MAX_VALUE`) bytes in size. +Queries that return more data will have to be separated into multiple queries that return less data. + === Upgrading for Providers ==== Renaming NoneStep to DiscardStep @@ -302,8 +366,9 @@ no elements matching the predicate. ==== Changes to Serialization The V4 versions of GraphBinary and GraphSON are being introduced. Support for the older versions of GraphBinary (V1) -and GraphSON (V1-3) is removed. Additionally, the GLVs will only use GraphBinary, the Gremlin Server, however, can -still serialize both GraphSON and GraphBinary. The following is a list of the major changes to the GraphBinary format: +and GraphSON (V1-3) is removed. Upon the full release of 4.0, the GLVs will only use GraphBinary, however, the Gremlin +Server will support both GraphSON and GraphBinary. The following is a list of the major changes to the GraphBinary +format: * Removed type serializers: ** Period @@ -348,6 +413,16 @@ still serialize both GraphSON and GraphBinary. The following is a list of the ma * `Element` (Vertex, Edge, VertexProperty) properties are no longer null and are `List` of `Property`. * Custom is replaced with Provider Defined Types +One of the biggest differences is in datetime support. 
Previously, in the Java implementation, `java.util.Date`, +`java.sql.Timestamp` and most types from the `java.time` package had serializers. This isn't the case in GraphSON 4 +as only `java.time.OffsetDateTime` is supported. Java provides methods to convert amongst these classes so they should +be used to convert your data to and from `java.time.OffsetDateTime`. + +The `GraphSONSerializerProvider` is not used in GraphSON 4. The `GraphSONSerializerProvider` uses the +`ToStringSerializer` for any unknown type and was used in previous GraphSON versions. Because GraphSON 4 is only +intended to serialize specific types and not used as a general serializer, GraphSON 4 serializers will throw an error +when encountering unknown types. + ==== Graph System Providers ===== AbstractAuthenticatorHandler Constructor @@ -357,4 +432,35 @@ for the implementations. Gremlin Server formerly supported the two-arg `Authenti instantiating new custom instances. It now expects implementations of `AbstractAuthenticationHandler` to use a three-arg constructor that takes `Authenticator`, `Authorizer`, and `Settings`. +===== GraphManager Changes + +The `beforeQueryStart()`, `onQueryError()`, and `onQuerySuccess()` methods of `GraphManager` have been removed. These were +originally intended to give providers more insight into when execution occurs in the server and the outcome of that +execution. However, they depended on `RequestMessage` containing a Request ID, which isn't the case anymore. + +===== Gremlin Server Updates + +The `OpProcessor` extension point of the server has been removed. In order to extend the functionality of the Gremlin +Server, you have to implement your own `Channelizer`. + +If you are a provider that makes use of the Gremlin Server, you may need to update server configuration YAML files that +you provide to your users. With the change from WebSockets to HTTP, some of the previous default values are invalid and +some of the fields no longer exist.
See link:https://tinkerpop.apache.org/docs/4.0.0/reference/#_configuring_2[options] +for an updated list. One of the most important changes is to the `Channelizer` configuration as only the +`HttpChannelizer` remains and the rest have been removed. + ==== Graph Driver Providers + +===== Application Layer Protocol Support + +HTTP/1.1 is now the only supported application-layer protocol and WebSockets support is dropped. Please follow the +instructions in the +link:https://tinkerpop.apache.org/docs/4.0.0/dev/provider/#_graph_driver_provider_requirements[provider documentation] +for more detailed information. The subprotocol remains fairly similar but has been adjusted to work better with HTTP. +Also, the move to HTTP means that SASL has been removed as an authentication mechanism and only HTTP basic remains. + +===== Request Interceptor + +It is strongly recommended that every graph driver provider give a way for users to intercept and modify the HTTP +request before it is sent off to the server. This capability is needed in cases where the graph system provider has +additional functionality that can be enabled by modifying the HTTP request.
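The interceptor model described in this patch (a list of `UnaryOperator` instances run sequentially, with a serializing step that leaves a `byte[]` in the body) could be sketched roughly as follows. This is only an illustration of the chaining idea for driver providers: `SketchRequest`, `applyAll()`, `exampleChain()`, and the `X-Example-Tenant` header are hypothetical names invented here, and none of them are the actual `org.apache.tinkerpop.gremlin.driver.HttpRequest` or `Cluster` builder API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class InterceptorChainSketch {

    // Hypothetical stand-in for a driver's HTTP request type. It is NOT the
    // real TinkerPop HttpRequest class; it only carries headers and a body.
    static final class SketchRequest {
        final Map<String, String> headers = new LinkedHashMap<>();
        Object body;

        SketchRequest(Object body) {
            this.body = body;
        }
    }

    // Runs every interceptor in list order, feeding each one the output of
    // the previous one, so ordering matters when interceptors depend on each other.
    static SketchRequest applyAll(List<UnaryOperator<SketchRequest>> interceptors,
                                  SketchRequest request) {
        SketchRequest current = request;
        for (UnaryOperator<SketchRequest> interceptor : interceptors) {
            current = interceptor.apply(current);
        }
        return current;
    }

    // Builds an example chain: a header-only interceptor followed by a
    // stand-in "serializer" that turns the body into bytes.
    static List<UnaryOperator<SketchRequest>> exampleChain() {
        List<UnaryOperator<SketchRequest>> chain = new ArrayList<>();
        // Assumed, provider-specific header used purely for illustration.
        chain.add(r -> {
            r.headers.put("X-Example-Tenant", "demo");
            return r;
        });
        // Placeholder for payload serialization; a real driver would use its
        // configured message serializer here instead of String bytes.
        chain.add(r -> {
            r.body = String.valueOf(r.body).getBytes(StandardCharsets.UTF_8);
            return r;
        });
        return chain;
    }

    public static void main(String[] args) {
        SketchRequest result = applyAll(exampleChain(), new SketchRequest("g.V()"));
        System.out.println(result.headers);
        System.out.println(((byte[]) result.body).length);
    }
}
```

Keeping each interceptor a plain function from request to request means users can add, remove, or reorder steps without the driver needing to know what each step does, which is the flexibility the recommendation above is asking driver providers to expose.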