Support for cloud object storage (S3, Swift) ? #231

Open
sguimmara opened this issue Jul 8, 2022 · 12 comments

@sguimmara

Hello,

Does IIPImage support accessing images stored somewhere other than a filesystem, such as cloud object storage (Swift in particular)?

Thank you

@ruven
Owner

ruven commented Jul 8, 2022

Not at the moment. To do it you'd have to find a way to mount your swift storage as a virtual file system.

But it would indeed be good to be able to access things like Swift and AWS storage directly through IIPImage. It risks being quite inefficient, however, unless your images are optimized for cloud storage: you'd have to use something like Cloud Optimized GeoTIFF (https://www.cogeo.org/) to make sure random access is fast enough.

@sguimmara
Author

Thanks for the answer!

Indeed, we are currently facing a dilemma. We must move our JPEG 2000 images from ordinary filesystems into Swift storage, but we were worried that IIPImage would no longer be able to serve them. From what I understand of your answer, our current options are:

  • duplicate each image to a lighter COG and serve them directly from a normal HTTP server, or
  • serve the original JPEG 2000 images without modification (they have overviews and block layout) with IIPImage from object storage (but we would have to mount a virtual FS).

@ruven
Owner

ruven commented Jul 8, 2022

As your images are currently in JPEG2000 format, I think your best and most flexible option at this stage would be to mount a virtual FS with your swift storage.

In the longer term, I'll look into adding native swift support to IIPImage to avoid the need for a virtual FS. In such a configuration, a format such as COG would be much faster than using JPEG2000 or normal TIFF.

@scossu
Contributor

scossu commented Jul 8, 2022

When I started evaluating IIIF image servers for my institution, I was initially taken aback by the lack of storage options of IIPImage. Afterwards, I actually found this limitation to be a good thing, that keeps IIPImage simple and reliable. S3 is several times slower than an SSD directly attached to the server or mounted via NFS over a fiber-channel network.

We ended up writing a small piece of Python middleware that does the following:

  • Intercept an image request to IIPImage
  • Check for the requested image in a fast storage volume ("cache"), accessible to IIPImage
  • If it's present, forward the request to IIPImage
  • If not, block the request, pull the image from S3 into the cache, and when done, forward to IIPImage

Along with some basic maintenance tools such as clearing the cache volume on demand and routinely by pruning older files, it's a relatively simple, low-maintenance addition that allows you to have your image sources anywhere, without depending on the image server features. Also, it allows you to provide fast access to frequently used sources without paying a fortune to store all your images in an SSD.
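The steps above could be sketched roughly as follows. This is not the middleware described in this comment, only a minimal illustration of the cache-or-fetch idea: `ensure_cached`, `prune_cache` and the `fetch` callback are hypothetical names, and the actual download from S3 or Swift is left to whatever client library you use.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Callable

# Hypothetical callback: downloads object `key` to the local path `dest`.
Fetcher = Callable[[str, Path], None]


def ensure_cached(key: str, cache_dir: Path, fetch: Fetcher) -> Path:
    """Return the local path for `key`, fetching it into the cache if absent."""
    root = cache_dir.resolve()
    target = (root / key).resolve()
    # Refuse keys such as "../x" that would escape the cache directory.
    if root not in target.parents:
        raise ValueError(f"key escapes cache directory: {key!r}")
    if target.exists():
        # Cache hit: refresh the timestamps so pruning keeps hot files.
        os.utime(target)
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file, then rename atomically, so the image
    # server never sees a partially written image.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(key, tmp)
        os.replace(tmp, target)
    finally:
        tmp.unlink(missing_ok=True)
    return target


def prune_cache(cache_dir: Path, max_age_seconds: float) -> int:
    """Delete cached files older than max_age_seconds; return how many."""
    cutoff = time.time() - max_age_seconds
    removed = 0
    for path in cache_dir.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

With boto3, for instance, `fetch` could wrap `client.download_file(bucket, key, str(dest))`. The request would be forwarded to IIPImage only once `ensure_cached` has returned.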

@sguimmara
Author

@scossu Thanks for the report!

Our situation is that we have ~150 TB of JPEG 2000 images (around 4 million files) that are currently served from a filesystem through IIPImage. The vast majority of those images will rarely, if ever, be served, and will remain in the archive for years without anyone touching them. More are added every month.

Converting everything to COG in advance seems like huge overkill in terms of processing power and storage cost, since COG would be 3 to 10 times bigger than the original image to maintain lossless quality.

My initial thought was to keep the JPEG2000 archive as is, but generate a lossy COG copy with gdal_translate for browser use (served via a simple HTTP server), and serve the original JPEG2000 in direct download if requested. Generating a COG on the fly is not instant (several seconds) however, so the almost zero latency of Cloud Optimized GeoTIFF would be offset by this initial conversion time.
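As a rough sketch of that conversion step, assuming GDAL 3.1 or later (which ships the COG output driver) and a GDAL build that can read JPEG 2000: the helper names and the default quality of 85 are illustrative choices, not something settled in this thread.

```python
import subprocess
from typing import List


def build_cog_command(src: str, dst: str, quality: int = 85) -> List[str]:
    """Build a gdal_translate command that writes a JPEG-compressed COG."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    return [
        "gdal_translate",
        "-of", "COG",                  # GDAL's COG output driver
        "-co", "COMPRESS=JPEG",        # lossy compression for browser viewing
        "-co", f"QUALITY={quality}",   # assumed default, tune as needed
        src,
        dst,
    ]


def convert_to_cog(src: str, dst: str, quality: int = 85) -> None:
    """Run the conversion; raises CalledProcessError if gdal_translate fails."""
    subprocess.run(build_cog_command(src, dst, quality), check=True)
```

Timing `convert_to_cog` on a few representative images would show whether the "several seconds" estimate holds for this archive.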

In your scenario, the file would be fetched from object storage into a nearby cache to be served as is by IIPImage.

Off the top of my head, I don't know which scenario would have the lowest latency from user request to image display, but it seems yours should be faster, since there would be no conversion step. However, once the COG is generated, it would be served by a simple HTTP server. This would appear to scale better and would take a lot of work off the backend.

@ruven
Owner

ruven commented Jul 9, 2022

@scossu's solution is indeed a good option. The only drawback is that the very first request to a new image not in cache will be very slow as you have to copy the whole file across first. All subsequent requests will, however, be very fast.

Regarding the use of COG directly through HTTP, it really depends what you want to be able to do with the images. Don't forget that COG is still a TIFF file. The only difference between COG and classic TIFF is how the internal TIFF metadata in the file is ordered. COG puts all this information at the beginning of the file, whereas in classic TIFF, this can be scattered throughout. A COG HTTP request will give you direct access to the compressed tiles and not to transcoded images as you would get with an image server such as IIPImage. You also won't be able to get anything that isn't a tile, such as image overviews, arbitrary regions or be able to apply any image processing, unless you handle this through some client-side javascript.
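To illustrate why the metadata ordering matters, here is a minimal sketch of the first thing a COG client does: request the start of the file with an HTTP Range header and parse the TIFF header to find the first image file directory (IFD). The function names are hypothetical and the 16 KiB initial read in the usage note is an arbitrary assumption.

```python
import struct
from typing import Dict, Tuple


def range_header(start: int, length: int) -> Dict[str, str]:
    """HTTP Range header for `length` bytes starting at byte `start`."""
    return {"Range": f"bytes={start}-{start + length - 1}"}


def parse_tiff_header(data: bytes) -> Tuple[str, bool, int]:
    """Return (struct byte-order char, is_bigtiff, first IFD offset)."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    if data[:2] == b"II":
        endian = "<"   # little-endian
    elif data[:2] == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic == 42:
        # Classic TIFF: 4-byte offset of the first IFD.
        (offset,) = struct.unpack(endian + "I", data[4:8])
        return endian, False, offset
    if magic == 43:
        # BigTIFF: offset size (8), a zero word, then an 8-byte IFD offset.
        if len(data) < 16:
            raise ValueError("need at least 16 bytes for BigTIFF")
        (offset,) = struct.unpack(endian + "Q", data[8:16])
        return endian, True, offset
    raise ValueError("unknown TIFF magic number")
```

A client would send `range_header(0, 16384)` with its first request. In a COG, the IFDs and tile offsets should already be inside that block, whereas a classic TIFF may need further round trips to locate them.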

If you want the fastest possible access to tiles with no intervening image server and no need for client-side JS, then the old Zoomify or Deepzoom approach would be your best bet: you just pre-generate the JPEG tiles and store them all as separate files on your cloud server, which would serve them directly to the browser.
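For reference, a small sketch of how the Deep Zoom layout maps an image onto static files. It assumes the usual Deep Zoom conventions (level 0 is a single pixel, each level doubles the resolution, tiles live under `<name>_files/<level>/<col>_<row>.<format>`); the 256-pixel tile size is an assumed default and tile overlap is ignored for simplicity.

```python
from typing import Tuple


def dz_max_level(width: int, height: int) -> int:
    """Highest pyramid level: ceil(log2(max(width, height)))."""
    return (max(width, height) - 1).bit_length()


def dz_level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Pixel dimensions of the image at a given pyramid level."""
    scale = 2 ** (dz_max_level(width, height) - level)
    # Ceiling division, never smaller than one pixel.
    return max(1, -(-width // scale)), max(1, -(-height // scale))


def dz_tile_count(width: int, height: int, level: int,
                  tile_size: int = 256) -> Tuple[int, int]:
    """Number of tile columns and rows at a given level."""
    w, h = dz_level_size(width, height, level)
    return -(-w // tile_size), -(-h // tile_size)


def dz_tile_path(name: str, level: int, col: int, row: int,
                 fmt: str = "jpg") -> str:
    """Relative path of one pre-generated tile."""
    return f"{name}_files/{level}/{col}_{row}.{fmt}"
```

Summing `dz_tile_count` over all levels gives the number of objects per image, which is worth estimating before applying this approach to millions of files.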

> since COG would be 3 to 10 times bigger than the original image to maintain lossless quality.

By the way, lossless tiled pyramid TIFF (COG or not) will be about twice as large as lossless JPEG2000 (and similar in size to the raw image size).

@sguimmara
Author

> it really depends what you want to be able to do with the images

The goal is to be able to visualize the (8-bit) images in a browser. The users are the general public. So: pan and zoom, that's it. No need for image processing.

> You also won't be able to get anything that isn't a tile, such as image overviews, arbitrary regions or be able to apply any image processing, unless you handle this through some client-side javascript.

If we switched to COG, we would use OpenLayers with the GeoTIFF source. This would remove most of the load from the server and reduce costs.

If we stayed with JPEG2000s, we would probably keep IIPImage, with the aforementioned drawbacks.

@joesong168

joesong168 commented Feb 28, 2023

> or client-side JS, then the old Zoomify or Deepzoom

Agreed, this would be the fastest way to serve tiles at scale. However, maybe we could get both speed and dynamic serving. If we extend the idea of IIP serving to include the computing power of the browser, why couldn't we manipulate tiles in the browser with something like WASM and keep all tiles static on the image server?

@ruven
Owner

ruven commented Mar 5, 2023

> If we extend the idea of IIP serving to include the computing power of the browser, why couldn't we manipulate tiles in the browser with something like WASM and keep all tiles static on the image server?

Yes, if you use something like COG, you would have direct access to the raw image data and would be able to offload all processing to the browser itself through JS and WASM.

@joesong168

Excuse me, what's COG?

@sguimmara
Author

> Excuse me, what's COG?

@joesong168 Cloud Optimized GeoTIFF is a way to stream imagery (a bit like what IIPServ does). The benefit of COGs is that they don't need a particular service (a simple HTTP server is enough).

@joesong168

@sguimmara Thanks for your explanation. Have you tried JuiceFS? It is a cloud-native file system that might suit your needs: https://juicefs.com/en/
