
GID Graph format

Mehdi edited this page Aug 23, 2022 · 2 revisions

This page briefly describes the format of the GID Graph, which is produced by the Metadata Plugin and consumed by the Graph Plugin.

Version 1

A GID Graph represents a graph with both internal and external nodes and the edges between them. GID stands for Global Identifier: every node ID in the graph is globally unique because it was generated by the Metadata Database (PostgreSQL).

Below is an example of a GID Graph as used to communicate between the Metadata and Graph Plugins:

{
    "index": 1,
    "product": "test",
    "version": "0.0.1",
    "nodes": [0, 1, 2],
    "numInternalNodes": 3,
    "edges": [[0, 1], [1, 2]]
}
  • index is the ID of the product's package version, generated by the Metadata Database. It is needed to retrieve a graph from the Graph Database by the global index of its package version.
  • product is the name of the package being saved, i.e. <groupId>.<artifactId>
  • version is the version of the package being saved
  • nodes is the array of GIDs of the graph's nodes. Important! All internal nodes must be listed first, followed by all external nodes. This order is what makes it possible to differentiate between internal and external nodes
  • numInternalNodes is the number of internal nodes listed in the nodes array, so the first numInternalNodes entries of nodes are internal and the remaining entries are external
  • edges is an array of pairs of node GIDs that represents the edges of the graph. NB! If an edge refers to a node that is not listed in the nodes array, the Graph Plugin throws an IllegalArgumentException when consuming the GID graph
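The two rules above (internal nodes first, and every edge endpoint listed in nodes) can be checked on the consumer side roughly as follows. This is a minimal sketch, not the Graph Plugin's actual code: the class GidGraph, its constructor, and the method isInternal are hypothetical names chosen for illustration, and the JSON is assumed to have already been parsed into plain Java arrays.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical in-memory model of a Version 1 GID Graph.
 * Not the Graph Plugin's actual class; JSON parsing is assumed to happen elsewhere.
 */
class GidGraph {
    final long index;             // package version ID from the Metadata Database
    final String product;         // <groupId>.<artifactId>
    final String version;
    final long[] nodes;           // internal GIDs first, then external GIDs
    final int numInternalNodes;
    final long[][] edges;         // each edge is a pair of node GIDs

    GidGraph(long index, String product, String version,
             long[] nodes, int numInternalNodes, long[][] edges) {
        if (numInternalNodes < 0 || numInternalNodes > nodes.length) {
            throw new IllegalArgumentException("numInternalNodes out of range: " + numInternalNodes);
        }
        Set<Long> known = new HashSet<>();
        for (long gid : nodes) {
            known.add(gid);
        }
        // Mirror the documented rule: every node used in an edge must appear in nodes.
        for (long[] edge : edges) {
            if (edge.length != 2) {
                throw new IllegalArgumentException("edge must be a pair of GIDs");
            }
            for (long gid : edge) {
                if (!known.contains(gid)) {
                    throw new IllegalArgumentException("edge refers to unknown node " + gid);
                }
            }
        }
        this.index = index;
        this.product = product;
        this.version = version;
        this.nodes = nodes;
        this.numInternalNodes = numInternalNodes;
        this.edges = edges;
    }

    /** A node is internal iff its position in nodes is below numInternalNodes. */
    boolean isInternal(long gid) {
        for (int i = 0; i < nodes.length; i++) {
            if (nodes[i] == gid) {
                return i < numInternalNodes;
            }
        }
        throw new IllegalArgumentException("unknown node " + gid);
    }
}
```

For the Version 1 example above (nodes [0, 1, 2], numInternalNodes 3), all three nodes are internal; with numInternalNodes set to 2, node 2 would be treated as external. Classifying by position rather than by a per-node flag is exactly why the ordering rule matters.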

Version 2

{
    "index": 1,
    "product": "test",
    "version": "0.0.1",
    "nodes": [0, 1, 2],
    "numInternalNodes": 3,
    "edges": [],
    "callsites_info": {
        "[0, 1]": {
            "line": 31,
            "receiver_type_ids": [5, ...],
            "call_type": "virtual"
        }, ...
    },
    "types_map": {"5": "/java.util/Collections", ...},
    "gid_to_uri": {"0": "/java.util/Collections.emptySet()Set", ...}
}

Version 2 extends the first version with additional data that allows call graphs to be stitched on demand. The additional data is the following:

  • callsites_info holds the call-site information needed to find all potential targets of a call. Each entry is keyed by a pair of node GIDs written as a string (e.g. "[0, 1]") and contains line, the source line of the call; receiver_type_ids, the IDs of the receiver types used to make the call; and call_type, the bytecode instruction used for the call, which indicates, for example, whether the call uses dynamic dispatch.
  • types_map maps the IDs used to refer to types to their full names. For example, in the receiver_type_ids of the example above, the ID 5 is used instead of the full name of the type /java.util/Collections.
  • gid_to_uri, similarly to types_map, maps node GIDs to the full URIs of the methods, so that the graph can refer to methods by their IDs instead of their full string names.
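A consumer reading Version 2 data has to turn the stringified edge keys of callsites_info back into GID pairs and resolve the compact IDs through types_map and gid_to_uri. The sketch below shows one way to do that; CallSiteDecoder, parseEdgeKey, resolveReceiverTypes, and resolveMethod are hypothetical names, not FASTEN's API, and the maps are assumed to have been parsed from JSON already (JSON object keys are strings, hence the Long.toString lookups).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for reading Version 2 GID Graph data (not FASTEN's actual API). */
class CallSiteDecoder {

    /** Parses a callsites_info key such as "[0, 1]" into a pair of node GIDs. */
    static long[] parseEdgeKey(String key) {
        String body = key.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("not an edge key: " + key);
        }
        String[] parts = body.substring(1, body.length() - 1).split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("edge key must hold two GIDs: " + key);
        }
        return new long[] {Long.parseLong(parts[0].trim()), Long.parseLong(parts[1].trim())};
    }

    /** Replaces receiver type IDs with full type names using types_map. */
    static List<String> resolveReceiverTypes(List<Long> receiverTypeIds, Map<String, String> typesMap) {
        List<String> names = new ArrayList<>();
        for (long id : receiverTypeIds) {
            String name = typesMap.get(Long.toString(id));
            if (name == null) {
                throw new IllegalArgumentException("type id not in types_map: " + id);
            }
            names.add(name);
        }
        return names;
    }

    /** Looks up the full method URI of a node GID in gid_to_uri. */
    static String resolveMethod(long gid, Map<String, String> gidToUri) {
        String uri = gidToUri.get(Long.toString(gid));
        if (uri == null) {
            throw new IllegalArgumentException("gid not in gid_to_uri: " + gid);
        }
        return uri;
    }
}
```

Applied to the Version 2 example, the key "[0, 1]" parses to GIDs 0 and 1, receiver type ID 5 resolves to /java.util/Collections, and GID 0 resolves to /java.util/Collections.emptySet()Set. Failing loudly on an unknown ID follows the same spirit as the IllegalArgumentException rule for edges in Version 1.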