Merge pull request #36 from dlcs/feature/composite_id
Handle "compositeId" parameter
donaldgray authored Nov 22, 2023
2 parents 2d6911e + 62d1ce8 commit a675af0
Showing 5 changed files with 30 additions and 15 deletions.
17 changes: 9 additions & 8 deletions README.md
@@ -24,7 +24,7 @@ Additionally, the project uses:
The project ships with a [`docker-compose.yml`](docker-compose.yml) that can be used to get a local version of the component running:

```bash
docker-compose up
docker compose up
```

> Note that for the Composite Handler to be able to interact with the target S3 bucket, the Docker Compose setup assumes that the `AWS_PROFILE` environment variable has been set and that a valid AWS session is available.
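
For example, a minimal sketch of starting the stack with a named profile (the profile name `dlcs-dev` is illustrative; any profile with a valid session for the target bucket works):

```bash
# Illustrative profile name - substitute your own.
export AWS_PROFILE=dlcs-dev
docker compose up
```
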
@@ -58,7 +58,7 @@ The administrator user can be used to browse the database and manage the queue (
There are 3 possible entrypoints to make the above easier (see the sketch after this list):

* `entrypoint.sh` - this will wait for Postgres to be available and run `manage.py migrate` and `manage.py createcachetable` if `MIGRATE=True`. It will run `manage.py createsuperuser` if `INIT_SUPERUSER=True` (also needs `DJANGO_SUPERUSER_*` envvars)
* `entrypoint-api.sh` - this runs the above, then runs `python manage.py runserver 0.0.0.0:8000`
* `entrypoint-api.sh` - this runs the above, then starts an nginx instance fronting a gunicorn process
* `entrypoint-worker.sh` - this runs the above, then runs `python manage.py qcluster`
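
As a rough sketch of how an entrypoint is typically invoked (the superuser values are illustrative, and the container still needs database and queue settings supplied via the environment):

```bash
# Sketch: run the API entrypoint with migrations and superuser creation enabled.
docker run --rm -it \
  -e MIGRATE=True \
  -e INIT_SUPERUSER=True \
  -e DJANGO_SUPERUSER_USERNAME=admin \
  -e DJANGO_SUPERUSER_EMAIL=admin@example.com \
  -e DJANGO_SUPERUSER_PASSWORD=changeme \
  dlcs/composite-handler:local /srv/dlcs/entrypoint-api.sh
```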

## Configuration
@@ -88,6 +88,7 @@ The following environment variables are supported:
| `ENGINE_WORKER_MAX_ATTEMPTS` | `0` | Engine | The number of processing attempts a single task will undergo before it is abandoned. Setting this value to `0` will cause a task to be retried forever. |
| `MIGRATE` | None | API, Engine | If `"True"`, will run migrations and `createcachetable` on startup when an entrypoint is used. |
| `INIT_SUPERUSER` | None | API, Engine | If `"True"`, will attempt to create a superuser. Needs the standard Django envvars (e.g. `DJANGO_SUPERUSER_USERNAME`, `DJANGO_SUPERUSER_EMAIL`, `DJANGO_SUPERUSER_PASSWORD`) to be set when an entrypoint is used. |
| `GUNICORN_WORKERS` | `2` | API | The value of the [`--workers`](https://docs.gunicorn.org/en/stable/run.html) argument when running gunicorn. |
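
For instance, a minimal environment sketch combining a few of these settings (values are illustrative):

```bash
# Illustrative values only - see the table above for defaults and semantics.
export GUNICORN_WORKERS=4
export MIGRATE=True
export ENGINE_WORKER_MAX_ATTEMPTS=3
```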

Note that in order to access the S3 bucket, the Composite Handler assumes that valid AWS credentials are available in the environment - this can be in the form of [environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html), or in the form of ambient credentials.
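
A sketch of the environment-variable route (placeholder values shown):

```bash
# Placeholder values - use real credentials, or rely on ambient credentials instead.
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=eu-west-1
```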

@@ -96,15 +97,15 @@ Note that in order to access the S3 bucket, the Composite Handler assumes that v
The project ships with a [`Dockerfile`](./Dockerfile):

```bash
docker build -t dlcs/composite-handler:latest .
docker build -t dlcs/composite-handler:local .
```

This will produce a single image that can be used to execute any of the supported Django commands, including running the API and the engine:

```bash
docker run dlcs/composite-handler:latest python manage.py migrate # Apply any pending DB schema changes
docker run dlcs/composite-handler:latest python manage.py createcachetable # Create the cache table (if it doesn't exist)
docker run dlcs/composite-handler:latest python manage.py runserver 0.0.0.0:8000 # Run the API
docker run dlcs/composite-handler:latest python manage.py qcluster # Run the engine
docker run dlcs/composite-handler:latest python manage.py qmonitor # Monitor the workers
docker run dlcs/composite-handler:local python manage.py migrate # Apply any pending DB schema changes
docker run dlcs/composite-handler:local python manage.py createcachetable # Create the cache table (if it doesn't exist)
docker run --env-file .env -it --rm dlcs/composite-handler:local /srv/dlcs/entrypoint-api.sh # Run the API
docker run --env-file .env -it --rm dlcs/composite-handler:local /srv/dlcs/entrypoint-worker.sh # Run the engine
docker run dlcs/composite-handler:local python manage.py qmonitor # Monitor the workers
```
3 changes: 3 additions & 0 deletions src/app/api/schemas/member.schema.json
@@ -71,6 +71,9 @@
},
"incrementSeed": {
"type": "integer"
},
"compositeId": {
"type": "string"
}
},
"required": [
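
A hypothetical member fragment exercising the new field (all values are illustrative; the schema only constrains `compositeId` to be a string):

```bash
# Hypothetical fragment of a member payload; field values are illustrative.
cat <<'JSON'
{
  "incrementSeed": 0,
  "compositeId": "my-composite"
}
JSON
```
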
5 changes: 3 additions & 2 deletions src/app/engine/builder.py
@@ -4,7 +4,7 @@
class MemberBuilder:
STATIC_FIELDS = {"mediaType": "image/jpeg", "family": "I"}

STRIP_FIELDS = ["@type", "originFormat", "incrementSeed"]
STRIP_FIELDS = ["@type", "originFormat", "incrementSeed", "compositeId"]

FORMAT_FIELDS = [
"id",
@@ -26,7 +26,8 @@ def __init__(self, template):
def __build_template(self, original_template):
template = dict(original_template) | self.STATIC_FIELDS
for strip_field in self.STRIP_FIELDS:
template.pop(strip_field)
if strip_field in template:
template.pop(strip_field)
return template

def build_member(self, dlcs_uri):
15 changes: 11 additions & 4 deletions src/app/engine/s3.py
@@ -26,9 +26,13 @@ def __build_bucket_base_url(self):
else:
return f"https://s3.amazonaws.com/{self._bucket_name}"

def put_images(self, submission_id, images):
def put_images(self, images, submission_id, composite_id, customer_id, space_id):
s3_uris = []

key_prefix = self.__get_key_prefix(
submission_id, composite_id, customer_id, space_id
)

with tqdm.tqdm(
desc=f"[{submission_id}] Upload images to S3",
unit=" image",
@@ -39,14 +43,17 @@ def put_images(self, submission_id, images):
# same order as the list of images provided to it. '.map(...)' gives us that,
# whilst '.submit(...)' does not.
for s3_uri in executor.map(
self.__put_image, repeat(submission_id), images
self.__put_image, repeat(key_prefix), images
):
s3_uris.append(s3_uri)
progress_bar.update(1)
return s3_uris

def __put_image(self, submission_id, image):
object_key = f"{self._object_key_prefix}/{submission_id}/{os.path.basename(image.filename)}"
def __get_key_prefix(self, submission_id, composite_id, customer, space):
return f"{self._object_key_prefix}/{customer}/{space}/{composite_id or submission_id}"

def __put_image(self, key_prefix, image):
object_key = f"{key_prefix}/{os.path.basename(image.filename)}"
with open(image.filename, "rb") as file:
self._client.put_object(Bucket=self._bucket_name, Key=object_key, Body=file)
return f"{self._bucket_base_url}/{object_key}"
5 changes: 4 additions & 1 deletion src/app/engine/tasks.py
Expand Up @@ -57,7 +57,10 @@ def __rasterize_composite(member, pdf_path):

def __push_images_to_dlcs(member, images):
__update_status(member, "PUSHING_TO_DLCS", image_count=len(images))
return s3_client.put_images(member.id, images)
composite_id = member.json_data.get("compositeId")
customer = member.collection.customer
space = member.json_data["space"]
return s3_client.put_images(images, member.id, composite_id, customer, space)


def __build_dlcs_requests(member, dlcs_uris):
