[BUG] GeoTIFFDataset returns CRSError: Invalid projection: : (Internal Proj Error: proj_create: unrecognized format / unknown name) #4756

Open
robmarkcole opened this issue Aug 30, 2024 · 8 comments
Labels
bug Bug fixes

Comments

@robmarkcole

Describe the problem

The GeoTIFFs are in EPSG:32648, and loading them raises an error.

Code to reproduce issue

import fiftyone as fo

name = "data-v1"
dataset_dir = "data"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.GeoTIFFDataset,
    label_field="location",
    name=name,
)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Lightning studio
  • Python version (python --version): Python 3.10.10
  • FiftyOne version (fiftyone --version): FiftyOne v0.22.1
  • FiftyOne installed from (pip):

Other info/logs

---------------------------------------------------------------------------
CRSError                                  Traceback (most recent call last)
Cell In[3], line 2
      1 # Create the dataset
----> 2 dataset = fo.Dataset.from_dir(
      3     dataset_dir=dataset_dir,
      4     dataset_type=fo.types.GeoTIFFDataset,
      5     label_field="location",
      6     name=name,
      7 )
      9 # View summary info about the dataset
     10 print(dataset)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/fiftyone/core/dataset.py:5354, in Dataset.from_dir(cls, dataset_dir, dataset_type, data_path, labels_path, name, persistent, overwrite, label_field, tags, dynamic, **kwargs)
   5260 """Creates a :class:`Dataset` from the contents of the given directory.
   5261 
   5262 You can create datasets with this method via the following basic
   (...)
   5351     a :class:`Dataset`
   5352 """
   5353 dataset = cls(name, persistent=persistent, overwrite=overwrite)
-> 5354 dataset.add_dir(
   5355     dataset_dir=dataset_dir,
   5356     dataset_type=dataset_type,
   5357     data_path=data_path,
...
--> 348     self._local.crs = _CRS(self.srs)

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pyproj/_crs.pyx:2378, in pyproj._crs._CRS.__init__()

CRSError: Invalid projection: : (Internal Proj Error: proj_create: unrecognized format / unknown name)

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time
@robmarkcole robmarkcole added the bug Bug fixes label Aug 30, 2024
@swheaton
Contributor

This seems like an issue potentially unique to your data? It will be hard to debug without an example. Can you provide a small, non-sensitive test image that reproduces your issue?

At the very least, please paste the whole stack trace as it seems to be chopped off in the middle. As it stands, it seems like rasterio or pyproj just doesn't support that CRS? Hard to tell.

@robmarkcole
Author

example_32648.tif.zip

I have no issue opening it with rasterio.

@swheaton
Contributor

Thank you. However, I am not able to reproduce the error with your example.

>>> import fiftyone as fo
>>> ds=fo.Dataset.from_dir("data", dataset_type=fo.types.GeoTIFFDataset, label_field="location")
 100% |███████████████████████████████████████████████████████████████████████████████████| 1/1 [269.5ms elapsed, 0s remaining, 3.7 samples/s]
>>> ds.first().location
<GeoLocation: {
    'id': '66d1cf90bf68b63e4b2b7aff',
    'tags': [],
    'point': [104.06648594333119, 1.2362626908143242],
    'line': None,
    'polygon': [
        [
            [104.06303347762334, 1.2397351518883317],
            [104.06993598244, 1.2397376011299024],
            [104.06993840007095, 1.2327902182699888],
            [104.06303591319048, 1.2327877827578355],
            [104.06303347762334, 1.2397351518883317],
        ],
    ],
}>

My package versions (fresh environment pip install just now)

fiftyone==0.22.1
pyproj==3.6.1
rasterio==1.3.10
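
For comparing environments, a small sketch like the one below prints the same three versions. It uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_versions(["fiftyone", "pyproj", "rasterio"])
    for name, ver in versions.items():
        print(f"{name}=={ver}" if ver else f"{name} is not installed")
```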

@robmarkcole
Author

OK, the issue is that there are also PNG files in that folder.

@swheaton
Contributor

Got it, yeah, that won't work: the importer doesn't check the file extension before trying to open each file.
Can we close this? 🙌🏼
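
In the meantime, one possible workaround is to filter the folder yourself so that only TIFF files reach the importer. The sketch below uses only the standard library; `find_geotiffs` and `GEOTIFF_SUFFIXES` are hypothetical names, and it assumes a flat folder where the extension reliably identifies GeoTIFFs.

```python
from pathlib import Path

# Assumption: GeoTIFFs in the folder end in .tif or .tiff (any letter case)
GEOTIFF_SUFFIXES = {".tif", ".tiff"}


def find_geotiffs(dataset_dir):
    """Return sorted paths of the files directly inside dataset_dir that
    have a .tif/.tiff extension, skipping PNGs and any subdirectories."""
    return sorted(
        str(p)
        for p in Path(dataset_dir).iterdir()
        if p.is_file() and p.suffix.lower() in GEOTIFF_SUFFIXES
    )
```

The resulting list could then be handed to the importer instead of the whole directory. As far as I recall, the GeoTIFF importer accepts an `image_path` argument that takes a list of paths or a glob pattern such as `"*.tif"`, but please verify that against the docs for your FiftyOne version.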

@robmarkcole
Author

I feel it would be good to at least raise a warning if a non-TIFF file is opened.

@swheaton
Contributor

OK, would you like to submit a PR proposal?

One idea that is idiomatic with other fiftyone methods is to add a skip_failures=True argument to the GeoTIFFImporter. If True, the importer would ignore any file it fails to read and not add that sample. If False, an exception would be raised. We could also wrap the exception you saw, because whatever came out of rasterio/pyproj was obviously not helpful.
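
Roughly, the behavior could look like the sketch below. This is not FiftyOne's actual importer code: `ImportFailure`, `import_samples`, and `load_fn` are hypothetical names, and `load_fn` stands in for whatever reads a single GeoTIFF.

```python
class ImportFailure(Exception):
    """Hypothetical wrapper error that names the file that could not be read."""


def import_samples(paths, load_fn, skip_failures=True):
    """Apply load_fn to each path and collect the results.

    If skip_failures is True, files that fail to load are left out of the
    result. Otherwise the first failure is re-raised with the path attached.
    """
    samples = []
    for path in paths:
        try:
            samples.append(load_fn(path))
        except Exception as e:
            if skip_failures:
                # Ignore this file entirely; no sample is added for it
                continue
            # Wrap the low-level error so the message says which file failed
            raise ImportFailure(
                f"Failed to read '{path}' as a GeoTIFF: {e}"
            ) from e
    return samples
```

With `skip_failures=False`, a stray PNG would then produce an error that names the offending file rather than a bare `CRSError`.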

@robmarkcole
Author

Sounds excellent. Happy to take it on, but no promises on the timeframe.
