Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option recursive to listdir #1275

Closed
adriantre opened this issue Jun 5, 2023 · 4 comments
Closed

Add option recursive to listdir #1275

adriantre opened this issue Jun 5, 2023 · 4 comments
Assignees
Labels

Comments

@adriantre
Copy link

papszFiles = VSIReadDir(path_c)

Would it be a quick fix to use VSIReadDirRecursive in fiona.listdir to enable recursive listing within VSI?

Current workaround:

def listdir_vsi_recursive(root: str) -> list[str]:
    """Walk directory and return list of all files in subdirectories.
    Also supports listing filenames within a GDAL Virtual File System like
    cloud buckets. https://gdal.org/user/virtual_file_systems.html
    Args:
        root: root directory or blob in bucket
    Returns
        List of absolute filepaths withing the root
    """
    dirs = [root]
    files = []
    while dirs:
        dir = dirs.pop()
        try:
            subdirs = fiona.listdir(dir)
            dirs.extend([os.path.join(dir, subdir) for subdir in subdirs])
        except FionaValueError:
            files.append(dir)
    return files

Should maybe return file paths relative to root to be consistent with listdir as is. (In my workaround I include root).

@sgillies
Copy link
Member

sgillies commented Jun 6, 2023

@adriantre I'm interested. Can you tell me more about how you would use this?

@adriantre
Copy link
Author

Thanks Sean!

I started using TorchGeo and found that there were only some minor blockers for it to be able to read files directly from cloud storage through GDAL VSI. The only blocker is they are pattern-matching files using glob.

root = "/vsigs/my_bucket'
filename_glob = "T*_B02_10m.tif"
pathname = os.path.join(root, "**", filename_glob)
for filepath in glob.iglob(pathname, recursive=True):
    ...

I don't know how easy it would be to mirror the behaviour of glob, allowing wildcards, but it would then be a very easy fix in that repo :P

A similar option is Pathlib.rglob.

Another approach would be to create a fiona.walk which mirrors os.walk.

for root, subdirs, files in os.walk(walk_dir):
    for file in files:
        asb_path_for_file = os.path.join(root, file
        #file is filenames only but os.path.join will magically yield the abs path

I created a PR to TorchGeo with the above workaround. But I think swapping out the usage of glob would be a cleaner solution.

@sgillies
Copy link
Member

sgillies commented Jun 8, 2023

@adriantre after some reflection, I'm inclined not to do this. First of all, the workaround isn't terrible, and not adding a new feature to fiona (or rasterio) saves us testing, documentation, &c. But mostly I don't want fiona and rasterio to become cloud storage APIs. GDAL's VSI system has a whole cloud storage API, but I believe Python developers are much better off using their cloud's Python SDK (boto3, for example) or a dedicated cross-cloud library like https://libcloud.apache.org/.

@sgillies sgillies closed this as completed Jun 8, 2023
@sgillies sgillies self-assigned this Jun 8, 2023
@adriantre
Copy link
Author

I understand. Cross-cloud is why I was looking to GDAL/Fiona, to have a platform-agnostic solution. libcloud looks promising.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants