How to build and deploy without docker? #10

Open
donnyv opened this issue Apr 29, 2019 · 8 comments

Comments

@donnyv
donnyv commented Apr 29, 2019

Not a fan of docker. How would I build and deploy without it?

@mojodna
Owner

mojodna commented Apr 29, 2019

Fair enough.

Use the steps in the Dockerfile as instructions to translate to your target environment ;-) Those are for Ubuntu 16.04.

Short version:

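```shell
# install the server's Python dependencies
pip install -r requirements-server.txt
# serve the app with Gunicorn using gevent workers
gunicorn -k gevent virtual.web:app
```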

You'll almost certainly need to install some dependencies for pip to be able to install everything correctly.
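If you'd rather keep the Gunicorn settings in a file than on the command line, Gunicorn accepts a Python config file via `-c`. The sketch below is an assumption-laden starting point, not something taken from the Dockerfile: the bind address assumes Nginx (or another proxy) sits in front on the same host, the worker count follows the generic `(2 x cores) + 1` rule of thumb from the Gunicorn docs, and the longer timeout is a guess to accommodate slow uncached tile renders.

```python
# gunicorn.conf.py: a hypothetical starting point; tune for your own hardware and load.
import multiprocessing

# Listen only on localhost, assuming a reverse proxy (e.g. Nginx) forwards requests here.
bind = "127.0.0.1:8000"

# Same worker type as `-k gevent`: cooperative workers that can overlap network waits.
worker_class = "gevent"

# Generic rule of thumb from the Gunicorn docs; treat it as a placeholder.
workers = multiprocessing.cpu_count() * 2 + 1

# Max simultaneous clients per worker (applies to gevent/eventlet/gthread; 1000 is the default).
worker_connections = 1000

# Seconds before a silent worker is killed and restarted (default 30); raised here as an
# assumption because uncached tiles rendered from remote sources can be slow.
timeout = 60
```

Run it with `gunicorn -c gunicorn.conf.py virtual.web:app`.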

@donnyv
Author

donnyv commented Apr 29, 2019

I'm assuming you haven't tried this with Windows?

@mojodna
Owner

mojodna commented Apr 29, 2019

Nope, sorry. I develop locally on Ubuntu using Gunicorn or Flask's built-in web server and usually deploy to AWS Lambda.

I'm not aware of any specific reasons that it wouldn't work (since rasterio is known to work on Windows), but it's been years since I've used Python on Windows.

@donnyv
Author

donnyv commented Apr 29, 2019

How hard would it be to swap out Gunicorn for Nginx?

@mojodna
Owner

mojodna commented Apr 29, 2019

Nginx is usually configured as a reverse proxy in front of something that speaks HTTP (like Gunicorn) or the uwsgi protocol (like uWSGI), so I don't think you'd actually swap them; rather, you'd layer Nginx in front and, if you want, change out the application server.

This looks helpful: https://www.nginx.com/blog/maximizing-python-performance-with-nginx-parti-web-serving-and-caching/
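To make the HTTP/WSGI distinction concrete: WSGI (PEP 3333) is a Python calling convention, not a network protocol. An application server such as Gunicorn accepts HTTP (for example, proxied from Nginx) and turns each request into a call to a callable like the one below; `virtual.web:app` is a Flask app, which is such a callable. The app and the tile path here are hypothetical, and the second half uses the standard library's `wsgiref` helpers to fake a single request the way a server would.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal, hypothetical WSGI app: the interface Gunicorn or uWSGI calls into."""
    path = environ.get("PATH_INFO", "/")
    body = f"requested {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Simulate what an application server does for one incoming HTTP request:
# build a WSGI environ dict, call the app, and collect the status, headers, and body.
environ = {}
setup_testing_defaults(environ)          # fills in required WSGI keys with test values
environ["PATH_INFO"] = "/tiles/0/0/0.png"  # hypothetical request path

captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


response_body = b"".join(app(environ, start_response))
print(captured["status"], response_body)
```

Nginx never calls this function directly; it forwards HTTP to Gunicorn (or uwsgi-protocol traffic to uWSGI), and the application server performs the call.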

@donnyv
Author

donnyv commented Apr 29, 2019

Yeah, I've done that before. My worry is with Gunicorn: these performance numbers make me a little uncomfortable. Even with Nginx in front of it, I would probably need to load balance across multiple instances to handle large loads. How has the performance been for you on AWS Lambda?
https://www.appdynamics.com/blog/engineering/a-performance-analysis-of-python-wsgi-servers-part-2/

@mojodna
Owner

mojodna commented Apr 29, 2019

Alas, if only those were the worst of the performance numbers. marblecutter (and, by extension, rasterio / GDAL) is by far the greater bottleneck, especially when rendering tiles out of remote COGs (Cloud-Optimized GeoTIFFs).

It's not uncommon for tile requests to take 1s or more with an empty cache.

CPU is a consideration (especially when data needs to be reprojected), but the biggest driver is network / storage latency. GDAL attempts to minimize the number of upstream requests (and to parallelize them where possible), but even so, there are usually 2-3 sequential round-trips to find the IFD, read the IFD, and request the overlapping regions of the source image. If latency is 100ms (not uncommon with S3), that's 200-300ms spent waiting before any rendering happens, so most server processes will spend much of their time waiting for upstream data. Rendering from a local fileserver (rather than remote sources) or from an endpoint with lower latency should help more than any tuning of your application server.
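A back-of-the-envelope version of that arithmetic, assuming the round trips are strictly sequential (each depends on the previous response) and ignoring CPU time, TLS setup, and transfer time. The function names are illustrative. The second function shows why blocking workers suffer: a worker that sits idle for 200-300ms per uncached tile tops out at a handful of tiles per second, which is one reason to prefer async workers like gevent that can overlap the waiting across requests.

```python
def io_wait_floor_ms(round_trips: int, rtt_ms: float) -> float:
    """Lower bound on time spent waiting for upstream data for one uncached tile,
    assuming each round trip must finish before the next one starts."""
    return round_trips * rtt_ms


def max_tiles_per_sec_blocking(request_ms: float) -> float:
    """Throughput ceiling for one synchronous worker that blocks for the whole request."""
    return 1000.0 / request_ms


# 2-3 round trips (find IFD, read IFD, read overlapping regions) at 100ms each.
for trips in (2, 3):
    floor = io_wait_floor_ms(trips, 100)
    print(
        f"{trips} round trips @ 100ms: >= {floor:.0f}ms waiting, "
        f"<= {max_tiles_per_sec_blocking(floor):.1f} tiles/s per blocking worker"
    )
```

Halving the latency (e.g. by rendering closer to the data) halves the waiting floor, which no amount of application-server tuning can do.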

Lambda ends up working really well for this, in part because of its single-invocation per request model, but more because it can scale out almost instantaneously to absorb spikes. I typically run a CloudFront cache in front, so popular areas don't need to be constantly re-rendered (configuring Nginx as a cache would help immensely here, if certain regions are commonly requested).
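The effect of a cache in front of the renderer can be sketched in-process with `functools.lru_cache`. This is only an analogy for CloudFront or an Nginx cache: `render_tile_uncached` is a hypothetical stand-in for the expensive render, and an in-process cache isn't shared between Gunicorn workers or Lambda invocations, which is why caching at the HTTP layer is the more useful option in practice.

```python
from functools import lru_cache

render_calls = 0  # counts how often the expensive path actually runs


def render_tile_uncached(z: int, x: int, y: int) -> bytes:
    """Hypothetical stand-in for an expensive render (remote COG reads, reprojection)."""
    global render_calls
    render_calls += 1
    return f"tile {z}/{x}/{y}".encode()


@lru_cache(maxsize=4096)
def get_tile(z: int, x: int, y: int) -> bytes:
    # Only cache misses reach the renderer; repeat requests for popular tiles
    # are answered from memory.
    return render_tile_uncached(z, x, y)


# A popular tile requested 100 times, plus one request for a neighboring tile.
for _ in range(100):
    get_tile(12, 654, 1583)
get_tile(12, 655, 1583)

print(render_calls, get_tile.cache_info())
```

Only two renders happen for 101 requests; the same logic is what makes a shared HTTP cache pay off when certain regions are requested far more often than others.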

If your data is on a publicly accessible HTTP endpoint, give tiles.rdnt.io a shot to see how it performs. That's deployed on Lambda (with 1536MB allocated, mainly for the corresponding CPU increase) and will attempt to minimize latency by rendering from the AWS region closest to your data (only determinable for S3-hosted data right now).

If pre-rendering to tiles is feasible (small target area, uniform region popularity, requirement for low latency), that's ideal. Otherwise, the trade-off with marblecutter-virtual (and friends) is that initial tile requests will be slow in exchange for having access to extremely large regions immediately.
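To judge whether pre-rendering is feasible, it helps to count tiles: each zoom level has four times as many tiles as the one above it, so the cost is dominated by the highest zoom you need. A sketch, assuming standard Web Mercator XYZ tiles; `tile_count`, `WORLD`, and the bbox convention `(min_lon, min_lat, max_lon, max_lat)` are illustrative choices.

```python
import math


def lonlat_to_tile(lon: float, lat: float, z: int) -> tuple[int, int]:
    """Standard Web Mercator (XYZ / 'slippy map') tile indices for a point at zoom z."""
    n = 2 ** z
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the right/bottom edges (or beyond the Mercator limits) into valid tiles.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(bbox: tuple[float, float, float, float], min_zoom: int, max_zoom: int) -> int:
    """Number of tiles covering bbox across zooms min_zoom..max_zoom (inclusive)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    total = 0
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(min_lon, max_lat, z)  # top-left corner
        x1, y1 = lonlat_to_tile(max_lon, min_lat, z)  # bottom-right corner
        total += (x1 - x0 + 1) * (y1 - y0 + 1)
    return total


# Whole Web Mercator extent (latitudes clipped at roughly +/-85.0511 degrees).
WORLD = (-180.0, -85.0511, 180.0, 85.0511)
print(tile_count(WORLD, 0, 2))
```

For scale, the whole world at zoom 18 alone is 4^18 (about 68.7 billion) tiles, while a city-sized bbox at the same zoom is several orders of magnitude fewer, which is where pre-rendering becomes practical.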

@donnyv
Author

donnyv commented Apr 29, 2019

Thanks for the great write up! You gave me a lot to think about.


2 participants