From dde77d1869a17aed6f9fdd33385fdd7e02146366 Mon Sep 17 00:00:00 2001
From: vinhegde <73926849+vinayakhegde1@users.noreply.github.com>
Date: Thu, 19 Oct 2023 21:51:36 +0530
Subject: [PATCH] removed-igestion_server-readme.md (#3227)

---
 ingestion_server/README.md | 173 ------------------------------------
 1 file changed, 173 deletions(-)
 delete mode 100644 ingestion_server/README.md

diff --git a/ingestion_server/README.md b/ingestion_server/README.md
deleted file mode 100644
index 726368dbf52..00000000000
--- a/ingestion_server/README.md
+++ /dev/null
@@ -1,173 +0,0 @@

# Ingestion server

## Introduction

Ingestion Server is a small private API for copying data from an upstream
source and loading it into the Openverse API. This is a two-step process:

1. The data is copied from the upstream catalog database into the downstream
   API database.
2. Data from the downstream API database gets indexed in Elasticsearch.

Performance depends on the size of the target Elasticsearch cluster, database
throughput, and the bandwidth available to the ingestion server. The primary
bottleneck is indexing to Elasticsearch.

## How indexing works

![How indexing works](../readme_assets/howitworks.png)

## Safety and security considerations

The server is designed to fail gracefully in the event of network
interruptions, full disks, and similar faults. If a task fails to complete
successfully, the whole process is rolled back with zero impact to production.

The server is designed to run in a private network only. You must not expose
the private Ingestion Server API to the public internet.

## Notifications

If a `SLACK_WEBHOOK` variable is provided, the ingestion server will provide
periodic updates on the progress of a data refresh and relay any errors that
occur during the process.

## Data refresh limit

The `DATA_REFRESH_LIMIT` variable can be used to limit the number of rows
pulled from the upstream catalog database. If the server is running in an
`ENVIRONMENT` that is not `prod` or `production`, this limit is automatically
set to 100k records.

## Running on the host

1. Create environment variables from the template file.

   ```bash
   just env
   ```

2. Install Python dependencies.

   ```bash
   just install
   ```

3. Start the Gunicorn server.

   ```bash
   pipenv run gunicorn
   ```

## Running the tests

### Integration Tests

The integration tests can be run using `just ingestion_server/test-local`. Note
that if a `.env` file exists in the folder you're running `just` from, it may
interfere with the integration test variables and cause unexpected failures.

### Making requests

To make cURL requests to the server:

```bash
pipenv run \
  curl \
  -XPOST localhost:8001/task \
  -H "Content-Type: application/json" \
  -d '{"model": <model>, "action": <action>}'
```

Replace `<model>` and `<action>` with the correct values. For example, to
download and index all new images, `<model>` will be `"image"` and `<action>`
will be `"INGEST_UPSTREAM"`.

## Configuration

All configuration is performed through environment variables. See the
`env.template` file for a comprehensive list of all environment variables; the
ones with sane defaults have been commented out.

Pipenv will automatically load `.env` files when running commands with
`pipenv run`.
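For illustration, here is a minimal sketch of a local `.env` that sets only the
variables discussed in this README. The values are placeholders, and
`env.template` remains the authoritative reference:

```bash
# Illustrative values only; see env.template for the complete list.
ENVIRONMENT=local            # anything other than prod/production caps upstream pulls at 100k rows
DATA_REFRESH_LIMIT=100000    # optional explicit cap on rows pulled from the upstream catalog DB
SLACK_WEBHOOK=https://hooks.slack.com/services/...   # optional; enables progress notifications
```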
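Similarly, the request from the "Making requests" section above, spelled out
with the example values given there (triggering a full image ingestion):

```bash
pipenv run \
  curl \
  -XPOST localhost:8001/task \
  -H "Content-Type: application/json" \
  -d '{"model": "image", "action": "INGEST_UPSTREAM"}'
```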
## Mapping database tables to Elasticsearch

In order to synchronize a given table to Elasticsearch, the following
requirements must be met:

- The database table must have an autoincrementing integer primary key named
  `id`.
- A `SyncableDocType` must be defined in `es_syncer/elasticsearch_models`. The
  `SyncableDocType` must implement the static method
  `database_row_to_elasticsearch_doc`.
- The table name must be mapped to the corresponding Elasticsearch
  `SyncableDocType` in the `database_table_to_elasticsearch_model` map.

Example from `es_syncer/elasticsearch_models.py`:

```python
class Image(SyncableDocType):
    title = Text(analyzer="english")
    identifier = Text(index="not_analyzed")
    creator = Text()
    creator_url = Text(index="not_analyzed")
    tags = Text(multi=True)
    created_on = Date()
    url = Text(index="not_analyzed")
    thumbnail = Text(index="not_analyzed")
    provider = Text(index="not_analyzed")
    source = Text(index="not_analyzed")
    license = Text(index="not_analyzed")
    license_version = Text(index="not_analyzed")
    foreign_landing_url = Text(index="not_analyzed")
    meta_data = Nested()

    class Meta:
        index = 'image'

    @staticmethod
    def database_row_to_elasticsearch_doc(row, schema):
        return Image(
            pg_id=row[schema['id']],
            title=row[schema['title']],
            identifier=row[schema['identifier']],
            creator=row[schema['creator']],
            creator_url=row[schema['creator_url']],
            created_on=row[schema['created_on']],
            url=row[schema['url']],
            thumbnail=row[schema['thumbnail']],
            provider=row[schema['provider']],
            source=row[schema['source']],
            license=row[schema['license']],
            license_version=row[schema['license_version']],
            foreign_landing_url=row[schema['foreign_landing_url']],
            meta_data=row[schema['meta_data']],
        )


# Table name -> Elasticsearch model
database_table_to_elasticsearch_model = {
    'image': Image
}
```

## Deployment

This codebase is deployed as a Docker image to the GitHub Container Registry,
[ghcr.io](https://ghcr.io). The deployed image is then pulled in the production
environment. See the [`ci_cd.yml`](../.github/workflows/ci_cd.yml) workflow for
how the image is published to GHCR.

The published image can be deployed using the minimal
[`docker-compose.yml`](docker-compose.yml) file defined in this folder (do not
forget to update the `.env` file for production). The repository `justfile` can
be used, but the environment variable `IS_PROD` must be set to `true` in order
for it to reference the production `docker-compose.yml` file here. The version
of the image to use can also be explicitly defined using the `IMAGE_TAG`
environment variable (e.g. `IMAGE_TAG=v2.1.1`).

### Old Docker Hub images

- [openverse/ingestion_server](https://hub.docker.com/r/openverse/ingestion_server)
- [creativecommons/ingestion_server](https://hub.docker.com/r/creativecommons/ingestion_server)
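Putting the deployment steps above together, here is a minimal sketch of a
production rollout run from the repository root using plain Docker Compose.
The exact `just` recipes are not documented in this README, so this sketch
sidesteps the `justfile`, and the image tag is only an example:

```bash
# Sketch only: deploy the published image using the compose file in this folder.
export IS_PROD=true          # makes the justfile (if used) target the production compose file
export IMAGE_TAG=v2.1.1      # pin an explicit image version instead of the default

# Pull the pinned image from ghcr.io and start the service in the background.
docker compose -f ingestion_server/docker-compose.yml pull
docker compose -f ingestion_server/docker-compose.yml up -d
```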
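Finally, to show how the table-to-Elasticsearch mapping described earlier
generalizes, here is a purely hypothetical sketch of registering a second
table (say, `audio`) following the same pattern as the `Image` example. Every
field name here is assumed for illustration and is not taken from any real
schema:

```python
# Hypothetical sketch only: a second table synced with the same pattern as Image.
class Audio(SyncableDocType):
    title = Text(analyzer="english")
    identifier = Text(index="not_analyzed")
    creator = Text()
    url = Text(index="not_analyzed")
    license = Text(index="not_analyzed")

    class Meta:
        index = 'audio'

    @staticmethod
    def database_row_to_elasticsearch_doc(row, schema):
        # Translate a database row (indexed via the schema map) into an ES doc.
        return Audio(
            pg_id=row[schema['id']],
            title=row[schema['title']],
            identifier=row[schema['identifier']],
            creator=row[schema['creator']],
            url=row[schema['url']],
            license=row[schema['license']],
        )


# Register the new table alongside the existing mapping.
database_table_to_elasticsearch_model = {
    'image': Image,
    'audio': Audio,  # hypothetical addition
}
```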