lyft · chandanchowdhury · Sep 24, 2024 · Sep 24, 2024
diff --git a/README.md b/README.md
@@ -5,16 +5,14 @@ Cartography is a Python tool that consolidates infrastructure assets and the rel
 ![Visualization of RDS nodes and AWS nodes](docs/root/images/accountsandrds.png)
 
 ## Why Cartography?
-Cartography aims to enable a broad set of exploration and automation scenarios.  It is particularly good at exposing otherwise hidden dependency relationships between your service's assets so that you may validate assumptions about security risks.
+Cartography aims to enable a broad set of exploration and automation scenarios. It is particularly good at exposing otherwise hidden dependency relationships between your service's assets so that you may validate assumptions about security risks.
 
-Service owners can generate asset reports, Red Teamers can discover attack paths, and Blue Teamers can identify areas for security improvement.   All can benefit from using the graph for manual exploration through a web frontend interface, or in an automated fashion by calling the APIs.
+Service owners can generate asset reports, Red Teamers can discover attack paths, and Blue Teamers can identify areas for security improvement. All can benefit from using the graph for manual exploration through a web frontend interface, or in an automated fashion by calling the APIs.
 
-Cartography is not the only [security](https://github.com/dowjones/hammer) [graph](https://github.com/BloodHoundAD/BloodHound) [tool](https://github.com/Netflix/security_monkey) [out](https://github.com/vysecurity/ANGRYPUPPY) [there](https://github.com/duo-labs/cloudmapper), but it differentiates itself by being fully-featured yet generic and [extensible](https://lyft.github.io/cartography/dev/writing-analysis-jobs.html) enough to help make anyone better understand their risk exposure, regardless of what platforms they use.  Rather than being focused on one core scenario or attack vector like the other linked tools, Cartography focuses on flexibility and exploration.
+Cartography is not the only [security](https://github.com/dowjones/hammer) [graph](https://github.com/BloodHoundAD/BloodHound) [tool](https://github.com/Netflix/security_monkey) [out](https://github.com/vysecurity/ANGRYPUPPY) [there](https://github.com/duo-labs/cloudmapper), but it differentiates itself by being fully-featured yet generic and [extensible](https://lyft.github.io/cartography/dev/writing-analysis-jobs.html) enough to help make anyone better understand their risk exposure, regardless of what platforms they use. Rather than being focused on one core scenario or attack vector like the other linked tools, Cartography focuses on flexibility and exploration.
 
 You can learn more about the story behind Cartography in our [presentation at BSidesSF 2019](https://www.youtube.com/watch?v=ZukUmZSKSek).
 
-## Install and configure
-Start [here](https://lyft.github.io/cartography/install.html).
 
 ## Supported platforms
 
@@ -29,16 +27,57 @@ Start [here](https://lyft.github.io/cartography/install.html).
 - [Microsoft Azure](https://lyft.github.io/cartography/modules/azure/index.html) -  CosmosDB, SQL, Storage, Virtual Machine
 - [Kubernetes](https://lyft.github.io/cartography/modules/kubernetes/index.html) - Cluster, Namespace, Service, Pod, Container
 - [PagerDuty](https://lyft.github.io/cartography/modules/pagerduty/index.html) - Users, teams, services, schedules, escalation policies, integrations, vendors
-- [Crowdstrike Falcon](https://lyft.github.io/cartography/modules/crowdstrike/index.html) - Hosts, Spotlight vulnerabilites, CVEs
+- [Crowdstrike Falcon](https://lyft.github.io/cartography/modules/crowdstrike/index.html) - Hosts, Spotlight vulnerabilities, CVEs
 - [NIST CVE](https://lyft.github.io/cartography/modules/cve/index.html) - Common Vulnerabilities and Exposures (CVE) data from NIST database
 - [Lastpass](https://lyft.github.io/cartography/modules/lastpass/index.html) - users
 - [BigFix](https://lyft.github.io/cartography/modules/bigfix/index.html) - Computers
 - [Duo](https://lyft.github.io/cartography/modules/duo/index.html) - Users, Groups, Endpoints
 - [Kandji](https://lyft.github.io/cartography/modules/kandji/index.html) - Devices
 - [SnipeIT](https://lyft.github.io/cartography/modules/snipeit/index.html) - Users, Assets
 
+
+## Philosophy
+Here are some points that can help you decide if adopting Cartography is a good fit for your problem.
+
+### What Cartography is
+- A simple Python script that pulls data from multiple providers and writes it to a Neo4j graph database in batches.
+- A powerful analysis tool that captures the current snapshot of the environment, building a uniquely useful inventory where you can ask complex questions such as:
+  - Which identities have access to which datastores?
+  - What are the cross-tenant permission relationships in the environment?
+  - What are the network paths in and out of the environment?
+  - What are the backup policies for my datastores?
+- Battle-tested in production by [many companies](#who-uses-cartography).
+- Straightforward to extend with your own custom plugins.
+- Provides a useful data-plane that you can build CSPM applications on top of.
+
+### What Cartography is not
+- A near-real-time capability.
+  - Cartography is not designed for very fast updates. Cartography writes to the database in batches (not streamed).
+  - Cartography is also limited by how most upstream sources only provide APIs to retrieve assets in a batched manner.
+- By itself, Cartography does not capture data changes over time.
+  - Although we do include a [drift detection](docs/root/usage/drift-detect.md) feature.
+  - It's also possible to implement other processes in your Cartography installation to make this happen.
+
+
+## Install and configure
+
+### Trying out Cartography on a test machine
+Start [here](https://lyft.github.io/cartography/install.html) to set up a test graph and get data into it.
+
+### Setting up Cartography in production
+When you are ready to try it in production, read [here](docs/root/ops.md) for recommendations on getting Cartography running in your environment.
+
 ## Usage
-Start with our [tutorial](https://lyft.github.io/cartography/usage/tutorial.html). Our [data schema](https://lyft.github.io/cartography/usage/schema.html) is a helpful reference when you get stuck.
+
+### Querying the database directly
+
+![poweruser.png](docs/root/images/poweruser.png)
+
+Now that data is in the graph, you can quickly start with our [querying tutorial](https://lyft.github.io/cartography/usage/tutorial.html). Our [data schema](https://lyft.github.io/cartography/usage/schema.html) is a helpful reference when you get stuck.
+
+### Building applications around Cartography
+Directly querying Neo4j is already very useful as a sort of "Swiss Army knife" for security data problems, but you can also build applications and data pipelines around Cartography. View this doc on [applications](docs/root/usage/applications.md).
+
 
 ## Community
 

diff --git a/docs/root/images/app-direct.png b/docs/root/images/app-direct.png
diff --git a/docs/root/images/app-with-api.png b/docs/root/images/app-with-api.png
diff --git a/docs/root/images/basic-dataflow.png b/docs/root/images/basic-dataflow.png
diff --git a/docs/root/images/parallel-crons.png b/docs/root/images/parallel-crons.png
diff --git a/docs/root/images/pipeline-hive-mode.png b/docs/root/images/pipeline-hive-mode.png
diff --git a/docs/root/images/pipeline-neodash.png b/docs/root/images/pipeline-neodash.png
diff --git a/docs/root/images/poweruser.png b/docs/root/images/poweruser.png
diff --git a/docs/root/install.md b/docs/root/install.md
@@ -1,43 +1,49 @@
-# Cartography Installation
+# Cartography Installation On Test Machine
 
 .. _cartography-installation:
 
-Time to set up the server that will run Cartography.  Cartography _should_ work on both Linux and Windows servers, but bear in mind we've only tested it in Linux so far.  Cartography supports Python 3.10. Older versions of Python may work but are not explicitly supported.
+Time to set up a test machine to run Cartography. Cartography _should_ work on both Linux and Windows, but bear in mind we've only tested it on Linux so far.
 
-1. **Run the Neo4j graph database version 4.4.\*** or higher on your server.
+1. Ensure that you have Python 3.10 set up on your machine.
 
-        ⚠️ Neo4j 5.x will probably work but Cartography does not explicitly support it yet.
+    - Older or newer versions of Python may work but are not explicitly supported. You will probably have more luck with newer versions.
 
-    1. If you prefer **Docker**, follow the Neo4j Docker [official docs](https://github.com/neo4j/docker-neo4j) to run a container with version 4.4.\* or higher.
+1. **Run the Neo4j graph database version 4.4.\*** or higher on your server. 4.3 and lower will _not_ work.
 
-        - If you are using an ARM-based machine like an M1 Mac, you should use an ARM image otherwise performance will be very slow - Neo4j keeps ARM builds [here](https://hub.docker.com/r/arm64v8/neo4j/).
+        ⚠️ Neo4j 5.x will probably work since it's included in our test suite, but we do not explicitly support it yet.
 
-        - If you're just playing around, you can specify the `--env=NEO4J_AUTH=none` argument to your `docker` command to run a Neo4j container without authentication.
+    1. If you prefer **Docker** (recommended), run `docker run --publish=7474:7474 --publish=7687:7687 -v data:/data --env=NEO4J_AUTH=none neo4j:4.4-community` to spin up a Neo4j container. Refer to the Neo4j Docker [official docs](https://github.com/neo4j/docker-neo4j) for more information.
+
+        - Note that we are just playing around here on a test instance and have specified `--env=NEO4J_AUTH=none` to turn off authentication.
+
+        - If you experience very slow write performance using an ARM-based machine like an M1 Mac, you should use an ARM image. Neo4j keeps ARM builds [here](https://hub.docker.com/r/arm64v8/neo4j/).
 
     1. Else if you prefer a **manual install**,
 
         1. Neo4j requires a JVM (JDK/JRE 11 or higher) to be installed. One option is to install [Amazon Coretto 11](https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/what-is-corretto-11.html).
 
-                ⚠️ Make sure you have `JAVA_HOME` environment variable set. The following works for Mac OS: `export JAVA_HOME=$(/usr/libexec/java_home)`
+            ⚠️ Make sure you have `JAVA_HOME` environment variable set. The following works for Mac OS: `export JAVA_HOME=$(/usr/libexec/java_home)`
 
-        1. Go to the [Neo4j download page](https://neo4j.com/download-center/#community), and download Neo4j Community Edition 4.4.\*. If you prefer Docker, you can view Neo4j's instructions [here].
+        1. Go to the [Neo4j download page](https://neo4j.com/download-center/#community), and download Neo4j Community Edition 4.4.\*.
 
-        1. [Install](https://neo4j.com/docs/operations-manual/current/installation/) Neo4j on the server you will run Cartography on.
+        1. [Install](https://neo4j.com/docs/operations-manual/current/installation/) Neo4j.
 
-                ⚠️ For local testing, you might want to turn off authentication via property `dbms.security.auth_enabled` in file /NEO4J_PATH/conf/neo4j.conf
+            ⚠️ For local testing, you might want to turn off authentication via property `dbms.security.auth_enabled` in file NEO4J_PATH/conf/neo4j.conf
 
-4. Configure your data sources. See the configuration section of each relevant intel module for more details.
+1. Configure your data sources. See the configuration section of [each relevant intel module](../root/modules) for more details.
 
-5. **Get and run Cartography**
+1. **Get and run Cartography**
 
-    1. Run `pip install cartography` to install our code.
+    1. Run `pip install cartography`
 
-    1. Finally, to sync your data:
+        - This will install cartography in the current Python virtual environment. We recommend creating a separate virtual environment for just Cartography and its dependencies.
+
+    1. Finally, let's sync some data into the test graph. In this example we will use AWS. Refer to each module's [specific configuration section](../root/modules) on how to set them up.
 
         - For one account using the `default` profile defined in your AWS config file, run
 
             ```
-            cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687>
+            cartography --neo4j-uri bolt://localhost:7687
             ```
 
         - Or for a specific account defined as a separate profile in your AWS config file, set the `AWS_PROFILE` environment variable, for example
@@ -54,6 +60,11 @@ Time to set up the server that will run Cartography.  Cartography _should_ work
 
         You can view a full list of Cartography's CLI arguments by running `cartography --help`
 
-        The sync will pull data from your configured accounts and ingest data to Neo4j!  This process might take a long time if your account has a lot of assets.
+        If everything worked, the sync will pull data from your configured accounts and ingest data to Neo4j! This process might take a long time if your account has a lot of assets.
+
+        If you encounter errors, review these references:
+        - Ensure your ~/.aws/credentials and ~/.aws/config files are set up correctly: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html
+        - Review the various AWS environment variables: https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-envvars.html
+        - Cartography uses the boto3 Python library to access AWS, so remember that boto3's standard order of precedence when retrieving credentials applies: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials
 
-    1. See our [Operations Guide](ops.html) for tips on running Cartography in production.
+    1. Enjoy! Next, set up other data providers, see our [Operations Guide](ops.html) for tips on running Cartography in production, view our [usage instructions](../../README.md#usage) for querying tips, and think of [applications](../root/usage/applications.md) to build around it.
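The two sync invocations in the install steps above (default profile, or a named profile selected through `AWS_PROFILE`) could be wrapped in a small helper when you want to script them. This is only a sketch: `build_sync_command` and the profile name are hypothetical, and the only things taken from the docs are the `cartography --neo4j-uri bolt://localhost:7687` command and the `AWS_PROFILE` environment variable.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def build_sync_command(
    neo4j_uri: str = "bolt://localhost:7687",
    aws_profile: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one Cartography sync against a test graph.

    Hypothetical helper, not part of Cartography itself.
    """
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if aws_profile is not None:
        # Selects a named profile from the AWS config file.
        env["AWS_PROFILE"] = aws_profile
    return ["cartography", "--neo4j-uri", neo4j_uri], env


# Example use (commented out because it needs Cartography and Neo4j running):
# import subprocess
# argv, env = build_sync_command(aws_profile="my-test-profile")  # hypothetical name
# subprocess.run(argv, env=env, check=True)
```

Leaving `aws_profile` unset keeps whatever credentials boto3 would normally resolve, which matches the `default` profile case described above.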
diff --git a/docs/root/modules/aws/config.md b/docs/root/modules/aws/config.md
@@ -6,7 +6,8 @@ Follow these steps to analyze AWS assets with Cartography.
 
 ### Single AWS Account Setup
 
-1. Set up an AWS identity (user, group, or role) for Cartography to use.  Ensure that this identity has the built-in AWS [SecurityAudit policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html#jf_security-auditor) (arn:aws:iam::aws:policy/SecurityAudit) attached.  This policy grants access to read security config metadata. The SecurityAudit policy does not yet containe permissions for `inspector2`, so you will also need the [AmazonInspector2ReadOnlyAccess policy](https://docs.aws.amazon.com/inspector/latest/user/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonInspector2ReadOnlyAccess).
+1. Set up an AWS identity (user, group, or role) for Cartography to use. Ensure that this identity has the built-in AWS [SecurityAudit policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_job-functions.html#jf_security-auditor) (arn:aws:iam::aws:policy/SecurityAudit) attached. This policy grants access to read security config metadata.
+   1. If you want to use AWS Inspector, the SecurityAudit policy does not yet contain permissions for `inspector2`, so you will also need the [AmazonInspector2ReadOnlyAccess policy](https://docs.aws.amazon.com/inspector/latest/user/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonInspector2ReadOnlyAccess).
 1. Set up AWS credentials to this identity on your server, using a `config` and `credential` file.  For details, see AWS' [official guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
 1. [Optional] Configure AWS Retry settings using `AWS_MAX_ATTEMPTS` and `AWS_RETRY_MODE` environment variables. This helps in API Rate Limit throttling and TooManyRequestException related errors. For details, see AWS' [official guide](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables).
 

diff --git a/docs/root/ops.md b/docs/root/ops.md
@@ -1,7 +1,31 @@
-# Cartography operations guide
+# Cartography Production Operations
 
 This document contains tips for running Cartography in production.
 
+## Deployments
+
+### Simple
+
+The simplest production deployment involving Cartography looks something like this:
+
+![basic-dataflow.png](images/basic-dataflow.png)
+
+- Configure a Neo4j database. Specifics on this are out of scope of this document; refer to Neo4j's resources on how to
+  do this.
+- Configure a scheduled task (e.g. a cron job) that runs Cartography and can access one or more data providers. See the
+  [modules](../root/modules) section for specifics on each. We recommend that you run the cron job on a separate machine
+  from the Neo4j database.
+
+### Parallel jobs
+If a single cartography job takes longer than you would like, you can configure jobs to run in parallel where each job syncs different resources.
+
+![parallel-crons.png](images/parallel-crons.png)
+
+Make sure that two jobs never sync the same resource type at the same time. Otherwise you will encounter race conditions where one job may delete the resources synced by the other.
+
+The above diagram shows AWS and GitHub running on different jobs, but you can get more granular than that: as an example, you can have job 1 run AWS S3 and job 2 run AWS RDS in parallel with no negative effects.
+
+
 ## Maintaining a up-to-date picture of your infrastructure
 
 Running `cartography` ensures that your Neo4j instance contains the most recent snapshot of your infrastructure. Here's

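One way to guard against the race condition described under "Parallel jobs" is to check the job plan before scheduling anything. The sketch below assumes you keep a plan mapping job names to resource types; the plan format, the `aws:s3`-style labels, and `find_conflicts` are all hypothetical and not part of Cartography.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_conflicts(jobs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return every resource type that is assigned to more than one job."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for job_name, resource_types in jobs.items():
        # set() ignores a resource type listed twice within the same job.
        for resource_type in set(resource_types):
            owners[resource_type].append(job_name)
    return {rt: sorted(names) for rt, names in owners.items() if len(names) > 1}


# Hypothetical plan: S3 and GitHub in one job, RDS in another.
plan = {
    "job-1": ["aws:s3", "github"],
    "job-2": ["aws:rds"],
}

conflicts = find_conflicts(plan)
if conflicts:
    raise SystemExit(f"resource types scheduled in more than one job: {conflicts}")
```

The check is deliberately conservative: it rejects any overlap between jobs, even if their schedules would never actually coincide.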
diff --git a/docs/root/usage/applications.md b/docs/root/usage/applications.md
@@ -0,0 +1,29 @@
+# Building around Cartography
+
+This document shows patterns on how Cartography data fits in as part of a production system.
+
+## DB driver
+
+The quickest way to build an application around the graph is to use the Neo4j database driver and send queries to it:
+
+![app-direct.png](../images/app-direct.png)
+
+
+## API
+A more mature application will want to define a formal API around it like this:
+
+![app-with-api.png](../images/app-with-api.png)
+
+This way, database queries are abstracted behind the questions that your users will want to ask the graph.
+
+## As part of a data pipeline
+
+It can be beneficial to periodically extract graph data into data warehouses like Hive. This way you can have historical data. Hive is also easily paired with Mode for dashboards.
+
+![pipeline-hive-mode.png](../images/pipeline-hive-mode.png)
+
+## Other useful dashboard options
+
+[Neodash]() is great for mocking up views on top of graph data and can help you build a "home-made CSPM" very quickly.
+
+![pipeline-neodash.png](../images/pipeline-neodash.png)
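The "DB driver" and "API" patterns described in this file can be sketched together in a few lines. This assumes the official `neo4j` Python driver is installed and that the graph uses the `AWSAccount` and `RDSInstance` labels and the `RESOURCE` relationship; check them, and the `id` properties used here, against the data schema. `GraphQuestions`, `neo4j_runner`, and the method names are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A "runner" takes a Cypher string plus parameters and returns rows as dicts.
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def neo4j_runner(uri: str = "bolt://localhost:7687") -> QueryRunner:
    """DB driver pattern: build a runner backed by the neo4j Python driver."""
    # Imported lazily so the rest of the sketch loads without the driver.
    from neo4j import GraphDatabase

    # No auth here, which only suits a test instance with authentication off.
    driver = GraphDatabase.driver(uri)

    def run(query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]

    return run


class GraphQuestions:
    """API pattern (hypothetical): callers ask questions instead of writing Cypher."""

    # Label, relationship, and property names are assumptions about the schema.
    RDS_BY_ACCOUNT = (
        "MATCH (a:AWSAccount)-[:RESOURCE]->(r:RDSInstance) "
        "WHERE a.id = $account_id "
        "RETURN r.id AS rds_id"
    )

    def __init__(self, run: QueryRunner) -> None:
        self._run = run

    def rds_instances_in_account(self, account_id: str) -> List[str]:
        rows = self._run(self.RDS_BY_ACCOUNT, {"account_id": account_id})
        return sorted(row["rds_id"] for row in rows)
```

Passing the runner in as a plain callable keeps the question layer independent of the database connection, so it can be exercised with a stub in tests and wired to `neo4j_runner()` in a real service.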