dask_image.imread.imread: differences between using for local file and hosted file #268

Closed
rkoo19 opened this issue Aug 11, 2022 · 2 comments


rkoo19 commented Aug 11, 2022

What happened:

Hello,

I've noticed that dask_image.imread.imread does not work on my end with a remote, REST API based storage system: it raises an index error. The same file read from my local machine works. To clarify, the error only arises when I try to plot the image. I assumed the Dask arrays returned by dask_image.imread.imread("http://192.168.49.2:8080/v1/objects/dask-demo-bucket/sample.jpg") and dask_image.imread.imread("./sample.jpg") were different. However, when I print their shapes, they are the same.

I also noticed that there is a similar method dask.array.image.imread. What is the difference between this method and dask_image.imread.imread?

Thanks in advance!

What you expected to happen: Expected dask.array.image.imread to work with HTTP file pointers the same way it does for local files

Minimal Complete Verifiable Example:

import dask_image.imread
import skimage.io
import matplotlib.pyplot as plt

# Line of issue:
img = dask_image.imread.imread("http://192.168.49.2:8080/v1/objects/dask-demo-bucket/sample.jpg")[0]

plt.figure(figsize=(10, 10))
# Causes error
skimage.io.imshow(img[:, :, 0])

Environment:

  • Dask version: 2022.8.0
  • Python version: 3.10.4
  • Operating System: Linux
  • Install method (conda, pip, source): pip

GenevieveBuckley (Collaborator) commented Aug 12, 2022

Hi @rkoo19

1. Example data access

I'm not able to access this file at all, are you sure it's accessible?
http://192.168.49.2:8080/v1/objects/dask-demo-bucket/sample.jpg

2. Opening a different example file

I can access this file from a browser, so perhaps we can use that as a test case.
https://blog.dask.org/images/threads.jpg

However, attempting to open it directly from Python returns a 403 error:

import imageio.v3 as io
io.imread("https://blog.dask.org/images/threads.jpg")

Dask is passing your filename directly off to a reader function like this. (Well, technically dask-image passes it to pims, which can pass it to many different types of readers, and imageio is one of them. And dask.array.image.imread() passes the filename to scikit-image, which then passes it to imageio.)

Proposed fix

To fix this, we can write our own function to read the image (ref: this stackoverflow answer)

import requests
from io import BytesIO
import imageio.v3 as io

def url_image_reader(url):
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors such as 403 or 404
    byte_content = BytesIO(response.content)
    # Will likely error if given non-image data; you may need to add a check.
    image = io.imread(byte_content)
    return image

result = url_image_reader("https://blog.dask.org/images/threads.jpg")
print(result.shape)  # works
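
The comment in url_image_reader notes that a check for non-image data may be needed. Here is a minimal sketch of one such check, assuming only JPEG, PNG, and GIF are expected and looking only at the leading magic bytes. The helper names sniff_image_format and require_image_bytes are hypothetical and not part of requests or imageio.

```python
# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_format(data):
    """Return the format name whose signature prefixes `data`, or None."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def require_image_bytes(data):
    """Return `data` unchanged, or raise ValueError if no signature matches."""
    if sniff_image_format(data) is None:
        raise ValueError("response body does not look like a JPEG, PNG, or GIF")
    return data
```

Inside url_image_reader this could wrap the response body, for example BytesIO(require_image_bytes(response.content)), so that an HTML error page fails early with a clear message rather than inside the image decoder.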

The dask.array.image.imread() function provides an option to pass in a reader function (dask-image does not have this option). So, I tried that.

When I did that, dask tried to parse the URL string with glob. That doesn't work for URLs, so we want to get rid of this line.
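
To illustrate why the glob step is a problem: glob matches patterns against the local filesystem, so a URL matches nothing and the list of files comes back empty. A small standard-library sketch, assuming no local path happens to have this name:

```python
from glob import glob

url = "https://blog.dask.org/images/threads.jpg"

# glob only inspects the local filesystem, so a URL yields no matches.
matches = sorted(glob(url))
print(matches)  # [] unless a local path with this exact name exists
```

With an empty list there is no first file to sample for shape and dtype, which is why a version that takes an explicit list of filenames sidesteps the issue.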

So I made a copy of that function, and just removed the glob parsing.

Edited imread function:
import os

try:
    from skimage.io import imread as sk_imread
except (AttributeError, ImportError):
    pass

from dask.array.core import Array
from dask.base import tokenize


def add_leading_dimension(x):
    return x[None, ...]


def custom_imread(filenames, imread=None, preprocess=None):
    """Read a stack of images into a dask array
    Parameters
    ----------
    filenames: list of strings
        A list of filename strings, eg: ['myfile_01.png', 'myfile_02.png']
    imread: function (optional)
        Optionally provide custom imread function.
        Function should expect a filename and produce a numpy array.
        Defaults to ``skimage.io.imread``.
    preprocess: function (optional)
        Optionally provide custom function to preprocess the image.
        Function should expect a numpy array for a single image.
    Examples
    --------
    >>> from dask.array.image import imread
    >>> im = imread('2015-*-*.png')  # doctest: +SKIP
    >>> im.shape  # doctest: +SKIP
    (365, 1000, 1000, 3)
    Returns
    -------
    Dask array of all images stacked along the first dimension.
    Each separate image file will be treated as an individual chunk.
    """
    imread = imread or sk_imread

    name = "imread-%s" % tokenize(filenames, map(os.path.getmtime, filenames))

    # Read the first image to learn the per-image shape and dtype.
    sample = imread(filenames[0])
    if preprocess:
        sample = preprocess(sample)

    # One task per file; each task yields a single chunk of the stacked array.
    keys = [(name, i) + (0,) * len(sample.shape) for i in range(len(filenames))]
    if preprocess:
        values = [
            (add_leading_dimension, (preprocess, (imread, fn))) for fn in filenames
        ]
    else:
        values = [(add_leading_dimension, (imread, fn)) for fn in filenames]
    dsk = dict(zip(keys, values))

    chunks = ((1,) * len(filenames),) + tuple((d,) for d in sample.shape)

    return Array(dsk, name, chunks, sample.dtype)

And now I can open the image like this:

filenames = ["https://blog.dask.org/images/threads.jpg"]
result = custom_imread(filenames, imread=url_image_reader)

print(result)
# dask.array<imread, shape=(1, 417, 418, 3), dtype=uint8, chunksize=(1, 417, 418, 3), chunktype=numpy.ndarray>

result.compute().shape
# (1, 417, 418, 3)
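
The chunk layout reported above follows from how the edited function builds its keys and chunks: one chunk per file along the first axis, and a single chunk spanning each image axis. A plain-Python sketch of that bookkeeping, using the sample shape from the output above. The second URL and the graph name are placeholders introduced here for illustration only.

```python
# Shape of one decoded image, taken from the output above.
sample_shape = (417, 418, 3)

# The second entry is a hypothetical placeholder to show the multi-file case.
filenames = [
    "https://blog.dask.org/images/threads.jpg",
    "https://example.com/another.jpg",
]
name = "imread-placeholder"  # stands in for the tokenized graph name

# One graph key per file: (name, file index, 0, 0, 0).
keys = [(name, i) + (0,) * len(sample_shape) for i in range(len(filenames))]

# Chunk sizes per axis: size-1 chunks along the stacking axis,
# then one full-size chunk for each image axis.
chunks = ((1,) * len(filenames),) + tuple((d,) for d in sample_shape)

# The overall array shape is the sum of the chunk sizes along each axis.
shape = tuple(sum(axis_chunks) for axis_chunks in chunks)

print(keys)
print(chunks)
print(shape)
```

Because the shape and dtype are taken from the first image only, this construction assumes every file decodes to the same shape and dtype as the sample.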

3. Differences between dask_image.imread.imread and dask.array.image.imread

I also noticed that there is a similar method dask.array.image.imread. What is the difference between this method and dask_image.imread.imread?

I commented briefly on that in #265 (comment), and there is a larger issue about it in #229.
It is confusing, and not ideal.

rkoo19 (Author) commented Aug 15, 2022

Hi Genevieve,

Thanks for the reply and help! :) The edited imread function you provided works well.

rkoo19 closed this as completed Aug 15, 2022