Add basic docs for integration setup and config (#1613)

Signed-off-by: Simeon Widdis <[email protected]>
(cherry picked from commit 8f7950f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-project · Mar 23, 2024 · 2881340 · 2881340
1 parent 2c4d5b5
commit 2881340

Showing 3 changed files with 128 additions and 0 deletions.
diff --git a/docs/integrations/README.md b/docs/integrations/README.md
@@ -0,0 +1,8 @@
+# OpenSearch Integrations
+
+This is the developer documentation for OpenSearch Integrations.
+
+Some major documents to look at:
+- [Setup](setup.md) explains the major steps of the integration setup process behind the scenes,
+  which gives context for how integration content is assembled.
+- [Config](config.md) describes the integration config, for developing integrations directly.
diff --git a/docs/integrations/config.md b/docs/integrations/config.md
@@ -0,0 +1,85 @@
+# Integration Configuration
+
+**Date:** March 22, 2024
+
+The bulk of an integration's functionality is defined in its config. Let's look a bit at the config
+for the current [Nginx integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/nginx-1.0.0.json),
+with some fields pruned for legibility, to get a better understanding of what information it
+contains.
+
+```json5
+{
+  "name": "nginx",
+  "version": "1.0.0",
+  "workflows": [
+    {
+      "name": "queries"
+    },
+    {
+      "name": "dashboards"
+    }
+  ],
+  "components": [
+    {
+      "name": "communication",
+      "version": "1.0.0"
+    },
+    {
+      "name": "http",
+      "version": "1.0.0"
+    },
+    {
+      "name": "logs",
+      "version": "1.0.0"
+    }
+  ],
+  "assets": [
+    {
+      "name": "nginx",
+      "version": "1.0.0",
+      "extension": "ndjson",
+      "type": "savedObjectBundle",
+      "workflows": ["dashboards"]
+    },
+    {
+      "name": "create_table",
+      "version": "1.0.0",
+      "extension": "sql",
+      "type": "query"
+    },
+    {
+      "name": "create_mv",
+      "version": "1.0.0",
+      "extension": "sql",
+      "type": "query",
+      "workflows": ["dashboards"]
+    }
+  ],
+  "sampleData": {
+    "path": "sample.json"
+  }
+}
+```
+
+There are generally four key parts to an integration's functionality; most of what's left is metadata or is used for rendering.
+
+- `assets` are the items that are associated with the integration, including queries, dashboards,
+  and index patterns. Originally the assets were just one `ndjson` file of exported Saved Objects
+  (today a `savedObjectBundle`), but to support further options it was transformed to a list with
+  further types. The assets are available under the [directory of the same name](https://github.com/opensearch-project/dashboards-observability/tree/4e1e0e585/server/adaptors/integrations/__data__/repository/nginx/assets).
+  The currently supported asset types are:
+  - `savedObjectBundle`: a saved object export. This typically includes an index pattern and a dashboard querying it, and it indicates that the integration expects data that conforms to this index pattern (see `components` below).
+  - `query`: a SQL query that is sent to OpenSearch Spark. You can read more about it at the
+    [opensearch-spark repository](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md).
+- `workflows` are conditional flags that toggle whether or not an asset should be installed. They're
+  selected by the user before installing the integration. By default, an asset is included under
+  every workflow. Currently, workflows are only enabled for integrations that support S3 data source
+  installations, and the selected assets are run in order of type (`query`s are always run before `savedObjectBundle`s).
+- `components` define the format of the data expected by saved queries and dashboards. They are
+  typically shared between related integrations to allow things like correlation by field. The
+  current standard components defined here and in the
+  [OpenSearch Catalog](https://github.com/opensearch-project/opensearch-catalog) are heavily
+  inspired by [OpenTelemetry](https://opentelemetry.io/). The components can be used for validation
+  when connecting an integration to an index pattern. It's highly recommended to reuse existing
+  components where possible.
+- `sampleData` is loaded after the rest of the integration setup process when users select the "Try it" option.
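
The workflow and ordering rules described in config.md can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the `Asset` type, `TYPE_ORDER`, and `selectAssets` are hypothetical names that mirror the config fields above, not the plugin's actual definitions or install code.

```typescript
// Hypothetical types mirroring the config fields shown in config.md.
type AssetType = "savedObjectBundle" | "query";

interface Asset {
  name: string;
  version: string;
  extension: string;
  type: AssetType;
  workflows?: string[]; // omitted means the asset is included under every workflow
}

// Assumed ordering from the text: queries run before saved object bundles.
const TYPE_ORDER: Record<AssetType, number> = { query: 0, savedObjectBundle: 1 };

// Keep assets with no workflow restriction or with at least one selected workflow,
// then order them by type. Array.prototype.sort is stable, so assets of the same
// type keep their order from the config.
function selectAssets(assets: Asset[], enabledWorkflows: string[]): Asset[] {
  const enabled = new Set(enabledWorkflows);
  return assets
    .filter((a) => a.workflows === undefined || a.workflows.some((w) => enabled.has(w)))
    .sort((a, b) => TYPE_ORDER[a.type] - TYPE_ORDER[b.type]);
}

// The asset list from the pruned Nginx config above.
const nginxAssets: Asset[] = [
  { name: "nginx", version: "1.0.0", extension: "ndjson", type: "savedObjectBundle", workflows: ["dashboards"] },
  { name: "create_table", version: "1.0.0", extension: "sql", type: "query" },
  { name: "create_mv", version: "1.0.0", extension: "sql", type: "query", workflows: ["dashboards"] },
];

const queriesOnly = selectAssets(nginxAssets, ["queries"]).map((a) => a.name);
const withDashboards = selectAssets(nginxAssets, ["queries", "dashboards"]).map((a) => a.name);
```

With only the `queries` workflow selected, this sketch keeps just `create_table`, since it is the only asset without a `workflows` restriction. Selecting `dashboards` as well yields both queries first, followed by the `nginx` saved object bundle.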
diff --git a/docs/integrations/setup.md b/docs/integrations/setup.md
@@ -0,0 +1,35 @@
+# Integrations Setup
+
+**Date:** March 22, 2024
+
+When an integration is being installed, there are several steps executed in the process of getting
+everything up and running. This document describes the major steps of installing an integration that
+happen behind the scenes, to make it clearer how to implement content. It's generally recommended to read this along with the [Config document](config.md).
+
+Currently, two types of integration assets are supported with a synchronous install. The full
+installation process installs these separately, in two major chunks.
+
+- The frontend side of the setup is in
+  [setup_integration.tsx](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L450).
+  This is where the installation flow is selected based on the type of integration being installed,
+  integration `query`s are run if available, and eventually the build request is sent to the
+  backend.
+- On the backend the request is routed to a
+  [builder](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/integrations_builder.ts#L32)
+  that handles some further reference tidying (rewriting UUIDs to avoid collisions, modifying which
+  index is read, etc.) and makes the final integration instance object.
+
+This process is a little confusing and perhaps more convoluted than it needs to be. This is known to
+the author in hindsight.
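
As a rough illustration of the "rewriting UUIDs to avoid collisions" step mentioned for the builder, a two-pass remap could look like the sketch below. The `SavedObject` shape and the `remapIds` function are hypothetical simplifications for this example. They are not taken from `integrations_builder.ts`, which may work differently.

```typescript
// Hypothetical, simplified saved-object shape; real saved objects carry more fields.
interface SavedObject {
  id: string;
  type: string;
  references: Array<{ id: string; name: string; type: string }>;
}

// Assign a fresh ID to every object, then rewrite references to match.
// The ID generator is injected so callers can pass a UUID function of their choice.
function remapIds(objects: SavedObject[], newId: () => string): SavedObject[] {
  // First pass: one fresh ID per original ID.
  const idMap = new Map<string, string>();
  for (const obj of objects) {
    idMap.set(obj.id, newId());
  }
  // Second pass: rewrite IDs and references. References that point outside
  // the bundle have no entry in the map and are left unchanged.
  return objects.map((obj) => ({
    ...obj,
    id: idMap.get(obj.id) ?? obj.id,
    references: obj.references.map((ref) => ({ ...ref, id: idMap.get(ref.id) ?? ref.id })),
  }));
}

// Deterministic generator for the example; a real install would use random UUIDs.
let counter = 0;
const remapped = remapIds(
  [
    { id: "pattern-1", type: "index-pattern", references: [] },
    { id: "dash-1", type: "dashboard", references: [{ id: "pattern-1", name: "ref_0", type: "index-pattern" }] },
  ],
  () => `new-${++counter}`
);
```

Building the full ID map before rewriting anything means a reference still resolves correctly even when the object it points to appears later in the bundle.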
+
+## Query Mapping
+
+If working on S3-based integrations, it's worth noting that queries have some values
+[substituted](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/public/components/integrations/components/setup_integration.tsx#L438) when installing. They are:
+
+- `{s3_bucket_location}` to locate data.
+- `{s3_checkpoint_location}` to store intermediate results, which is required by Spark.
+- `{object_name}` used for giving tables a unique name per-integration to avoid collisions.
+
+For some query examples, it can be worth looking at the assets for the
+[VPC integration](https://github.com/opensearch-project/dashboards-observability/blob/4e1e0e585/server/adaptors/integrations/__data__/repository/aws_vpc_flow/assets/README.md).
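
The placeholder substitution described under Query Mapping can be sketched as a plain string replacement. This is a minimal sketch, not the code in `setup_integration.tsx`: the `QuerySubstitutions` interface, the `fillQueryTemplate` function, and the example query and bucket paths are all made up for illustration. Only the three placeholder names come from the list above.

```typescript
// Hypothetical shape for the values supplied during setup.
interface QuerySubstitutions {
  s3BucketLocation: string;
  s3CheckpointLocation: string;
  objectName: string;
}

// Replace every occurrence of each placeholder. Using split/join avoids having to
// escape the braces in a regular expression and sidesteps the special meaning of
// "$" in String.prototype.replace replacement strings.
function fillQueryTemplate(query: string, subs: QuerySubstitutions): string {
  const replacements: Array<[string, string]> = [
    ["{s3_bucket_location}", subs.s3BucketLocation],
    ["{s3_checkpoint_location}", subs.s3CheckpointLocation],
    ["{object_name}", subs.objectName],
  ];
  return replacements.reduce(
    (acc, [placeholder, value]) => acc.split(placeholder).join(value),
    query
  );
}

// Illustrative template only; not copied from any shipped integration.
const template =
  "CREATE EXTERNAL TABLE {object_name} (message STRING) USING json LOCATION '{s3_bucket_location}'";

const filled = fillQueryTemplate(template, {
  s3BucketLocation: "s3://example-bucket/logs/",
  s3CheckpointLocation: "s3://example-bucket/checkpoints/",
  objectName: "nginx_logs",
});
```

Here `filled` contains the table name `nginx_logs` and the bucket path in place of the placeholders. A placeholder that does not appear in the template, such as `{s3_checkpoint_location}` in this example, is simply skipped.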