A sample application to create a ZIP file in a Backblaze B2 bucket from a set of files also in Backblaze B2

backblaze-b2-samples/b2-zip-files

Backblaze B2 Zip Files Example

This web app accepts a list of files to be compressed and the name of a ZIP file to be created. Reading data from cloud object storage, compressing it, and writing the compressed data back can take some time, so the app responds with HTTP status 202 ACCEPTED as soon as it has received and parsed a request, then launches a background job to perform the work.

The app is implemented in Python using the Flask web application framework and the flask-executor task queue. You can run the app in a Docker container, in the Flask development server, or in the Gunicorn WSGI HTTP Server.
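The accept-then-work pattern described above can be sketched with only the Python standard library. This is an illustration, not the app's code: the app uses Flask and flask-executor, whereas this sketch swaps in concurrent.futures.ThreadPoolExecutor, and the names handle_request and zip_job, as well as the 400 status for a malformed request, are assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker pool; the real app uses flask-executor instead.
executor = ThreadPoolExecutor(max_workers=2)


def zip_job(files, target):
    """Placeholder for the slow work: read the files, compress, write the ZIP."""
    return f"would zip {len(files)} file(s) into {target}"


def handle_request(body):
    """Parse the request, queue the job, and return an HTTP status at once."""
    try:
        payload = json.loads(body)
        files, target = payload["files"], payload["target"]
    except (ValueError, KeyError, TypeError):
        # Assumed behavior: a malformed request is rejected and nothing is queued.
        return 400, None
    future = executor.submit(zip_job, files, target)
    # The caller gets 202 immediately; the job completes in the background.
    return 202, future
```

The returned future is only there so the example can be inspected; an HTTP client would see just the status code.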

Create a Backblaze B2 Account, Bucket and Application Key

Follow these instructions, as necessary:

Be sure to copy the application key as soon as you create it, as you will not be able to retrieve it later!

Configuration

The app reads its configuration from a set of environment variables. The easiest way to manage these in many circumstances is via a .env file. Copy the included .env.template to .env, or create a new .env file:

% cp .env.template .env

Now edit .env, pasting in your application key, its ID, bucket name, and endpoint:

LOGLEVEL=DEBUG
AWS_ACCESS_KEY_ID='<Your Backblaze B2 Application Key ID>'
AWS_SECRET_ACCESS_KEY='<Your Backblaze B2 Application Key>'
AWS_ENDPOINT_URL='<Your bucket endpoint, prefixed with https://, for example, https://s3.us-west-004.backblazeb2.com>'
BUCKET_NAME='<Your Backblaze B2 bucket name>'
SHARED_SECRET='<A long random string known only to the app and its authorized clients>'
PORT=8000

You can configure different buckets for input and output files if you wish by replacing the BUCKET_NAME line with the following:

INPUT_BUCKET_NAME='<Bucket with files to be zipped>'
OUTPUT_BUCKET_NAME='<Bucket for zip files>'

Note that, if you do use two buckets, your application key needs to have permissions to access both.
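As a rough sketch of how the one-bucket and two-bucket settings could be resolved in Python: the function name resolve_buckets is hypothetical, and giving BUCKET_NAME precedence when all three variables are set is an assumption, not documented behavior of the app.

```python
import os


def resolve_buckets(env=os.environ):
    """Return (input_bucket, output_bucket) from environment-style settings."""
    single = env.get("BUCKET_NAME")
    if single:
        # One bucket serves as both the source of files and the ZIP destination.
        return single, single
    input_bucket = env.get("INPUT_BUCKET_NAME")
    output_bucket = env.get("OUTPUT_BUCKET_NAME")
    if not (input_bucket and output_bucket):
        raise ValueError(
            "Set BUCKET_NAME, or both INPUT_BUCKET_NAME and OUTPUT_BUCKET_NAME"
        )
    return input_bucket, output_bucket
```

Passing a plain dictionary instead of os.environ makes the function easy to try out without touching the real environment.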

Running the App in Docker

The easiest way to run the app is via Docker, since Docker is then the only prerequisite, and you can pass it the environment variables in .env with the --env-file option. Gunicorn is installed in the Docker container and is configured to listen on port 8000, so you will need to use Docker's -p option to bind port 8000 in the container to an available port on your machine. For example, to make the app available on port 80 of your machine, you would run:

% docker run -p 80:8000 --env-file .env ghcr.io/backblaze-b2-samples/b2-zip-files:latest
[2024-06-28 23:04:47 +0000] [1] [DEBUG] Current configuration:
  config: python:config.gunicorn
  wsgi_app: None
...
DEBUG:app.py:Connected to B2, my-bucket exists.

Once the app is running, you can send it a request.

If the app does not start correctly, see the Troubleshooting section below.

You can publish the image to a repository and run it in a container on any cloud provider that supports Docker. For example, to deploy the app to AWS Fargate for Amazon ECS, you would push your image to Amazon Elastic Container Registry, then create an Amazon ECS Linux task for the Fargate launch type.

Download the Source Code

% git clone git@github.com:backblaze-b2-samples/b2-zip-files.git
Cloning into 'b2-zip-files'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
...
% cd b2-zip-files

Running the App on the Local Machine

Create a Python Virtual Environment

Virtual environments allow you to encapsulate a project's dependencies; we recommend that you create a virtual environment thus:

% python3 -m venv .venv

You must then activate the virtual environment before installing dependencies:

% source .venv/bin/activate

You will need to reactivate the virtual environment, with the same command, if you close your Terminal window and return to the app later.

Install Python Dependencies

% pip install -r requirements.txt

Running the App in the Flask development server

Once you have configured the app, created a virtual environment and installed the dependencies, the simplest way to run the app is in the Flask development server. By default, the app will listen on http://127.0.0.1:5000:

% flask run
DEBUG:app.py:Connected to B2, my-bucket exists.
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit

You can use the --host and --port options to configure a different interface and/or port:

% flask run --host=0.0.0.0 --port=8000 
DEBUG:app.py:Connected to B2, my-bucket exists.
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.69.12:8000
INFO:werkzeug:Press CTRL+C to quit
...

Once the app is running, you can send it a request.

Running the App in Gunicorn

Gunicorn does not read environment variables from a .env file, but you can use the shell to work around that if you are running Gunicorn from the command line:

% (export $(cat .env | xargs) && gunicorn --config python:config.gunicorn app:app)
[2024-06-28 14:21:43 -0700] [56698] [INFO] Starting gunicorn 22.0.0
[2024-06-28 14:21:43 -0700] [56698] [INFO] Listening at: http://0.0.0.0:8000 (56698)
[2024-06-28 14:21:43 -0700] [56698] [INFO] Using worker: sync
[2024-06-28 14:21:43 -0700] [56711] [INFO] Booting worker with pid: 56711
[2024-06-28 14:21:43 -0700] [56712] [INFO] Booting worker with pid: 56712
[2024-06-28 14:21:43 -0700] [56713] [INFO] Booting worker with pid: 56713
DEBUG:app.py:Connected to B2, my-bucket exists.
...

Once the app is running, you can send it a request.

If you are running Gunicorn as a service, you must ensure that you set the above variables in its environment.
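If exporting the variables from the shell is not convenient, one possible approach is to read the file from Python before starting the server. The following is a minimal sketch, not a full implementation of .env syntax: parse_env is a hypothetical helper that handles only KEY=value lines, comments, and matching surrounding quotes.

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings
```

A launcher script could then call os.environ.update(parse_env(open(".env").read())) before importing the app.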

Sending Requests to the App

However you run the app, clients send requests in the same way, setting the Authorization and Content-Type HTTP headers and sending a JSON payload.

  • The Authorization header must be of the form Authorization: Bearer <your shared secret>
  • The Content-Type header must specify JSON content: Content-Type: application/json
  • The payload must be JSON, of the form:
    {
      "files": [
        "path/to/first/file.pdf",
        "path/to/second/file.txt",
        "path/to/third/file.csv"
      ],
      "target": "path/to/output/file.zip"
    }
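The three requirements above could be checked on the server side roughly as follows. This is a sketch under assumptions rather than the app's actual code: check_request is a hypothetical name, the headers are taken to be a plain dictionary with exactly these key spellings, and the 401, 415 and 400 status codes are illustrative choices.

```python
import hmac
import json


def check_request(headers, body, shared_secret):
    """Return (status, payload) after applying the three rules listed above."""
    expected = f"Bearer {shared_secret}"
    supplied = headers.get("Authorization", "")
    # compare_digest avoids leaking, via timing, how much of the secret matched.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401, None
    # Ignore any parameters such as "; charset=utf-8" after the media type.
    media_type = headers.get("Content-Type", "").split(";")[0].strip()
    if media_type != "application/json":
        return 415, None
    try:
        payload = json.loads(body)
    except ValueError:
        return 400, None
    if (not isinstance(payload, dict)
            or not isinstance(payload.get("files"), list)
            or not isinstance(payload.get("target"), str)):
        return 400, None
    return 202, payload
```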

To send a request with curl from the Mac/Linux command line, using the -i option to show the response headers (adjust the host and port in the URL to match how you are running the app):

% curl -i -d '
{
  "files": [
    "path/to/first/file.pdf",
    "path/to/second/file.txt",
    "path/to/third/file.csv"
  ],
  "target":"path/to/output/file.zip"
}
' http://127.0.0.1:8080 -H 'Content-Type: application/json' -H 'Authorization: Bearer my-long-random-string-of-characters'
HTTP/1.1 202 ACCEPTED
Server: gunicorn
Date: Fri, 28 Jun 2024 23:17:24 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 0
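The same request can be built from Python with the standard library's urllib. This is a minimal sketch: build_zip_request is a hypothetical helper, and the URL and secret you pass in are placeholders for your own values.

```python
import json
import urllib.request


def build_zip_request(url, shared_secret, files, target):
    """Build a POST request with the headers and JSON payload the app expects."""
    body = json.dumps({"files": files, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {shared_secret}",
        },
    )
```

To send it, pass the result to urllib.request.urlopen and inspect the status attribute of the response, which should be 202 if the app accepted the request.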

Note that, as mentioned above, the app responds to the request immediately with 202 ACCEPTED. You should be able to see the app's progress in the Flask/Gunicorn/Docker log output. For example:

[2024-06-28 23:17:24 +0000] [27] [DEBUG] POST /
DEBUG:app.py:Request: {
  "files": [
    "path/to/first/file.pdf",
    "path/to/second/file.txt",
    "path/to/third/file.csv"
  ],
  "target":"path/to/output/file.zip"
}
DEBUG:app.py:Opening my-bucket/path/to/output/file.zip for writing as a ZIP
DEBUG:app.py:Writing my-bucket/path/to/first/file.pdf to ZIP
DEBUG:app.py:Wrote my-bucket/path/to/first/file.pdf to ZIP
...
DEBUG:app.py:Finished writing my-bucket/path/to/output/file.zip in 11.175 seconds.
DEBUG:app.py:Read 1667163 bytes, wrote 1116999 bytes, compression ratio was 67%
DEBUG:app.py:Currently using 70 MB
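The steps in this log can be illustrated with Python's zipfile module. The sketch below works on in-memory bytes standing in for objects in Backblaze B2, so it is not how the app reads and writes the bucket. The ratio formula, bytes written as a percentage of bytes read, is inferred from the numbers in the log (1116999 / 1667163 is roughly 0.67) rather than taken from the app's source.

```python
import io
import zipfile


def zip_bytes(sources):
    """Compress {name: bytes} into an in-memory ZIP; return (zip_data, read, wrote)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in sources.items():
            # Each entry is stored under its original name.
            archive.writestr(name, data)
    zip_data = buffer.getvalue()
    read = sum(len(data) for data in sources.values())
    return zip_data, read, len(zip_data)


def compression_ratio(read, wrote):
    """Output size as a whole-number percentage of input size (assumed formula)."""
    return round(100 * wrote / read)
```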

If you are building a server application, you can use Event Notifications to have Backblaze B2 send your app a webhook request when the ZIP file has been created. Alternatively, your app can periodically poll the target file name until it is available. Here's a minimal example of how to do so using the AWS SDK for Python, Boto3.

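import time

import boto3
from botocore.exceptions import ClientError

# Placeholders: use your output bucket and the "target" from your request
bucket = 'my-bucket'
key = 'path/to/output/file.zip'

# Credentials are read from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY;
# recent Boto3 versions also honor AWS_ENDPOINT_URL
s3_client = boto3.client('s3')

while True:
    try:
        # Get information on the object
        s3_client.head_object(
            Bucket=bucket,
            Key=key
        )
        print(f'{bucket}/{key} is available')
        break
    except ClientError as err:
        if err.response['ResponseMetadata']['HTTPStatusCode'] == 404:
            # The object was not found - sleep for a second then try again
            time.sleep(1)
        else:
            # Some other problem! Re-raise with the original traceback
            raise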
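The polling loop above retries indefinitely, which may not be what you want if the ZIP job fails. One possible variant, sketched here with only the standard library, gives up after a timeout and doubles the wait between attempts. The name wait_until and its parameters are hypothetical; check is any function you supply that returns True once the object exists.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0, max_interval=10.0):
    """Call check() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; back off up to max_interval.
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)
```

In this setting, check would be a small function that calls head_object and returns False when the response status is 404.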

Troubleshooting

Append the following line to your .env to get more verbose log output:

S3FS_LOGGING_LEVEL=DEBUG

Here are some common errors you might see:


DEBUG:s3fs:Nonretryable error: Could not connect to the endpoint URL: "https://s3.us-west-004.backblazeb2.com/my-bucket?list-type=2&max-keys=1&encoding-type=url"

The app cannot connect to Backblaze B2. Check that AWS_ENDPOINT_URL is correct, and that it is accessible from your environment.


DEBUG:s3fs:Client error (maybe retryable): An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The key '0041234567890120000000001' is not valid

There are two possible causes of this error:

  • The AWS_ACCESS_KEY_ID value is not valid - check that the value matches the application key ID in the Backblaze web UI.
  • AWS_ENDPOINT_URL is set to the wrong value, so, although you have the right key, you're sending it to the wrong Backblaze B2 region.

DEBUG:s3fs:Client error (maybe retryable): An error occurred (InvalidAccessKeyId) when calling the GetBucketLocation operation: Malformed Access Key Id

The AWS_ACCESS_KEY_ID value fails the basic checks on key length and structure. Check that the value matches the application key ID in the Backblaze web UI.


DEBUG:s3fs:Client error (maybe retryable): An error occurred (SignatureDoesNotMatch) when calling the ListObjectsV2 operation: Signature validation failed

The AWS_SECRET_ACCESS_KEY value is incorrect. If you have not saved the application key, delete it in the Backblaze web UI and create a new one.


DEBUG:s3fs:Client error (maybe retryable): An error occurred (NoSuchBucket) when calling the GetBucketLocation operation: The specified bucket does not exist: my-bucket

The BUCKET_NAME value is incorrect. Check the bucket name in the Backblaze web UI, or create a bucket if you have not already done so.
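The diagnoses above can be summarized as a simple lookup from the error text to the setting to check. This is only an illustrative sketch: HINTS and hint_for are hypothetical names, and the matching is a plain substring search on a log line, not part of the app or of s3fs.

```python
# Error code as it appears in parentheses in the log line -> what to check.
HINTS = {
    "InvalidAccessKeyId": "Check AWS_ACCESS_KEY_ID, then AWS_ENDPOINT_URL (wrong region?).",
    "SignatureDoesNotMatch": "Check AWS_SECRET_ACCESS_KEY; create a new key if it was not saved.",
    "NoSuchBucket": "Check BUCKET_NAME, or create the bucket.",
}


def hint_for(log_line):
    """Return a troubleshooting hint for a log line, or None if unrecognized."""
    for code, hint in HINTS.items():
        if f"({code})" in log_line:
            return hint
    if "Could not connect to the endpoint URL" in log_line:
        return "Check AWS_ENDPOINT_URL and that it is reachable from your environment."
    return None
```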


Going Further

Feel free to fork this repository and use it as a starting point for your own app. Let us know at [email protected] if you come up with something interesting!
