Add Crop Type Mapping tutorial #2449

burakekim · 2024-12-05T15:45:48Z

Adding a new tutorial according to #2418

This tutorial demonstrates how to combine Sentinel-2 and ~~CDL~~ EuroCrops datasets using the ~~Sentinel2CDLDataModule~~ Sentinel2EuroCropsDataModule. It covers training a semantic segmentation model, along with evaluation and inference steps.

nilsleh · 2024-12-06T13:47:19Z

@burakekim Can you add the tutorial to the table of contents in the rst file? Then it can be viewed in the CI docs build for review as well :)

docs/index.rst

burakekim · 2025-01-04T13:48:49Z

The initial and somewhat end-to-end draft is now out:

- downloads a Sentinel-2 patch with rasterio's windowed reading
- prepares EuroCrops
- visualizes the Sentinel-2 patch and its corresponding EuroCrops mask (with matplotlib and on a dynamic map)
- trains and loads a dummy model for qualitative and quantitative evaluation
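
The windowed-reading step in the list above comes down to converting geographic bounds into pixel offsets, so that only a small patch of the tile is fetched. The following is a minimal sketch of that conversion in plain Python, assuming a north-up raster with square pixels. The function name, the tile origin, and the bounds are hypothetical and not taken from the tutorial; rasterio offers the equivalent `rasterio.windows.from_bounds`, whose result can be passed to `dataset.read(window=...)`.

```python
import math


def window_from_bounds(bounds, origin, pixel_size):
    """Return (col_off, row_off, width, height) covering `bounds`.

    bounds: (minx, miny, maxx, maxy) in the raster's CRS
    origin: (x, y) of the raster's top-left corner
    pixel_size: edge length of a square pixel in CRS units
    """
    minx, miny, maxx, maxy = bounds
    x0, y0 = origin
    # Columns grow eastward from the origin, rows grow southward.
    col_off = math.floor((minx - x0) / pixel_size)
    row_off = math.floor((y0 - maxy) / pixel_size)
    col_end = math.ceil((maxx - x0) / pixel_size)
    row_end = math.ceil((y0 - miny) / pixel_size)
    return col_off, row_off, col_end - col_off, row_end - row_off


# Hypothetical 10 m tile whose top-left corner is at (300000, 5400000).
window = window_from_bounds(
    bounds=(305000, 5390000, 307560, 5392560),
    origin=(300000, 5400000),
    pixel_size=10,
)
```

Rounding the start down and the end up guarantees that the window fully covers the requested bounds, at the cost of up to one extra pixel per side.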

There are quite a few things I want to correct and improve:

- train on GPU for a reasonable number of epochs, with proper dataloader and Trainer hyperparameters
- maybe host the pretrained model + Sentinel-2 patch on HF?
- use a bigger Sentinel-2 patch for training and possibly download another patch for inference or use some sort of opportunistic sampling (can we do that with GridSampler?) for proper evaluation that tones down potential spatial autocorrelation
- EuroCrops has over 300 labels, but each country has its own distinct subset. The number of classes Slovakia has is still high. Shall we just turn this into a binary crop classification?
- addressing the question above comes down to what we want to do with the trained model, i.e., does it add value to form multi-class classification?
- there is a skimage dependency to visualize Sentinel-2 with percentile normalization; and folium, pyproj, shapely for plotting the Sentinel-2 and EuroCrops bounds on a dynamic map -- or is it fine to download 3rd party libraries for individual case studies?

P.S. In the next iteration, I am thinking of renaming the case study to Crop Type Classification. That would describe the task better

docs/tutorials/case_studies.rst

Co-authored-by: Adam J. Stewart

adamjstewart · 2025-01-04T21:19:46Z

Still need to actually look at the code, but here are responses to your TODOs:

> train on GPU for a reasonable number of epochs, with proper dataloader and Trainer hyperparameters

Note that this needs to run in CI, preferably in seconds, not days. We can monkeypatch certain hyperparams to make this faster, but it shouldn't require a GPU.
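
As a simpler alternative to monkeypatching, the notebook could switch hyperparameters itself when it detects an automated run. The sketch below is only an illustration of that idea: the full-size values and the helper name are assumptions, not the tutorial's settings. `fast_dev_run`, `max_epochs`, and `accelerator` are Lightning `Trainer` arguments, and GitHub Actions sets the `CI` environment variable.

```python
import os

# Settings a reader would use for a real run (illustrative values only).
TRAINER_KWARGS = {"max_epochs": 50, "accelerator": "auto"}

# Overrides for automated runs: a single batch of train/val on CPU.
CI_OVERRIDES = {"fast_dev_run": True, "accelerator": "cpu"}


def trainer_kwargs(environ=os.environ):
    """Return Trainer keyword arguments, shrunk when running in CI."""
    kwargs = dict(TRAINER_KWARGS)
    if environ.get("CI"):
        kwargs.update(CI_OVERRIDES)
    return kwargs
```

The resulting dictionary would then be unpacked into the trainer, e.g. `Trainer(**trainer_kwargs())`, so the same notebook cell serves both readers and CI.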

> maybe host the pretrained model + Sentinel-2 patch on HF?

Happy to do this if it makes the above faster while still getting good results.

> use a bigger Sentinel-2 patch for training and possibly download another patch for inference or use some sort of opportunistic sampling (can we do that with GridSampler?) for proper evaluation that tones down potential spatial autocorrelation

Avoid big data: this needs to run in CI, where we have very limited storage, and we don't want to wait 10 min to download data during a tutorial. Not sure what you mean by opportunistic sampling, but there are various GeoDataset splitting methods that you can use to chop a tile into east/west splits, grids, etc.
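
To make the east/west idea concrete, here is a minimal plain-Python sketch that cuts a bounding box along the x axis so that training and test pixels come from disjoint areas. It is not TorchGeo's API; the helper name and the example extent are hypothetical, and TorchGeo's own splitting utilities operate on datasets rather than bare tuples.

```python
def split_east_west(bounds, west_fraction=0.75):
    """Split (minx, miny, maxx, maxy) into a west box and an east box."""
    minx, miny, maxx, maxy = bounds
    # The cut is a vertical line at the requested fraction of the width.
    cut = minx + (maxx - minx) * west_fraction
    west = (minx, miny, cut, maxy)
    east = (cut, miny, maxx, maxy)
    return west, east


# Hypothetical extent: train on the western 75%, test on the eastern 25%.
train_roi, test_roi = split_east_west((0.0, 0.0, 1000.0, 500.0))
```

Because the two boxes only share an edge, samples drawn from one never overlap samples drawn from the other, which limits leakage through spatial autocorrelation.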

> EuroCrops has over 300 labels, but each country has its own distinct subset. The number of classes Slovakia has is still high. Shall we just turn this into a binary crop classification? addressing the question above comes down to what we want to do with the trained model, i.e., does it add value to form multi-class classification?

I think both add value. Basically, we should have some kind of binary semantic segmentation application, and some kind of multiclass semantic segmentation application. They don't both have to be for agriculture though. For binary, something like building mapping may make more sense.

Also, tasks involving agriculture benefit greatly from time-series data. I'm planning on extending this tutorial for time series once we add support for it. So don't worry too much about the details right now, they will change in the future. This will also make the big data problem even worse, so keep the images small for now.

> there is a skimage dependency to visualize Sentinel-2 with percentile normalization; and folium, pyproj, shapely for plotting the Sentinel-2 and EuroCrops bounds on a dynamic map -- or is it fine to download 3rd party libraries for individual case studies?

Would prefer to avoid any additional dependencies if we can. Any reason we can't plot a static map with matplotlib? eurocrops.plot(sample) and sentinel2.plot(sample) should get you pretty far. If we do need to add additional deps, they need to be installed in .github/workflows/tutorials.yaml and .github/workflows/release.yaml like we did with planetary_computer. But I'm trying to get rid of those too, since they aren't absolutely necessary and aren't tracked by dependabot like our formal deps.
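
For the percentile normalization specifically, the skimage dependency could probably be replaced with a few lines of NumPy. The sketch below is one possible version, not the tutorial's code: the function name, the 2/98 percentile defaults, and the random stand-in image are assumptions, and it stretches the whole array at once rather than band by band.

```python
import numpy as np


def percentile_stretch(image, lower=2, upper=98):
    """Rescale an array to [0, 1] between two percentiles, clipping outliers."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = np.percentile(image, [lower, upper])
    if hi <= lo:
        # Constant image: nothing to stretch.
        return np.zeros_like(image)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0)


# Random stand-in for an RGB patch of raw reflectance values.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 10000, size=(64, 64, 3))
stretched = percentile_stretch(rgb)
```

The output is a float array in [0, 1], which matplotlib's `imshow` accepts directly for RGB display.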

> P.S. In the next iteration, I am thinking of renaming the case study to Crop Type Classification. That would describe the task better

I agree with the rename. Both "Crop Classification" and "Crop Type Mapping" are common names. I think the latter may actually be even more common, and more technically correct. A computer vision person may argue that this is semantic segmentation, not classification. Of course, semantic segmentation is just pixelwise classification, so the distinction isn't too important.

burakekim · 2025-01-11T16:00:14Z

Re: Using a dynamic map of overlaid mask and Sentinel-2 AOIs: I think it just looks nice, which is not a legit reason to keep it. I will discard it later

Re: Naming: Changed to crop type mapping, although I do not think the mapping wording perfectly fits the task -- not in the technical sense, but in the practical sense. We do not often say, "In this tutorial, we map crop types[...]" but rather, "In this tutorial, we classify crop types[...]" Still, I am fine with the current naming. As for classification vs segmentation, I lean toward segmentation

Re: CPU-friendly pipeline: With the current setup, each epoch takes ~8 minutes. I was targeting a training+test of 30mins on my laptop, which gives me a budget of 3 epochs. However, with only 3 epochs, the predictions look poor for 10-class classification. We can do one or both of the following: increase the tutorial time budget, or lower the number of classes from 10 to, say, 3 (the three most frequently occurring classes in our AOI). What are your priorities?

Re: Uploading the weights and Sentinel-2 patch to HF: This does not seem to offer much convenience unless we specifically want users to take the pretrained weights and Sentinel-2 patch and skip straight to inference. I do not think that is the case, because it bypasses the end-to-end showcase of TorchGeo's (TG) capabilities, which is the key feature of this tutorial

Re: Multi-class setting: I first listed all the crop type classes that fall under our AOI, got the 10 crop type classes with the highest occurrence, and set that as the classes argument

Thinking about it, we could consider a method that lets users choose the top-N crop types by occurrence within their AOI. Otherwise, for larger AOIs, the number of crop types might make the task unwieldy for those with specific experimental needs: ours is fairly small and initially had 30 classes, whereas Estonia, for example, has 239, and we likely do not need a 239-class classification task. This would be another PR though
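
A minimal sketch of such a top-N selection, assuming the crop code of every parcel inside the AOI is already available as a plain list. The helper name and the example codes are hypothetical, and occurrence is counted per parcel here, not per unit of area.

```python
from collections import Counter


def top_n_classes(labels, n):
    """Return the n most frequent labels, most frequent first."""
    # Parcels without a label (None) are ignored.
    counts = Counter(label for label in labels if label is not None)
    return [label for label, _ in counts.most_common(n)]


# Hypothetical per-parcel crop codes for an area of interest.
parcel_codes = ["wheat", "maize", "wheat", None, "barley", "wheat", "maize", "rye"]
classes = top_n_classes(parcel_codes, n=2)
```

The returned list could then be passed as the classes argument mentioned above, with every remaining crop type falling outside the selected set.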

adamjstewart · 2025-01-11T18:17:00Z

> With the current setup, each epoch takes ~8 minutes. I was targeting a training+test of 30mins on my laptop, which gives me a budget of 3 epochs.

Change this from minutes to seconds and then this might get merged. Imagine presenting a tutorial in person and clicking "run" and telling people to just wait 30 min and it will be ready. It isn't hard to hack this to be fast in CI, but it should also be fast in person as well. This is where pre-trained models can help.

Exact number of crop types that is appropriate depends on the region, but 10 sounds okay to me.

burakekim and others added 3 commits November 29, 2024 20:38

define cdl and s2 datamodules

3dc8f5e

Merge branch 'microsoft:main' into tutorial_be

890d9d5

debugging length none error

5e3736e

github-actions bot added the documentation (Improvements or additions to documentation) label Dec 5, 2024

burakekim added 5 commits December 5, 2024 16:14

markdowns

c9fc49b

cdl to eurocrops

c2cb387

no spatiotemporal intersection error

c15ff01

solve spatiotemporal error + new France S2 + plotting

857d45f

vectordataset getitem taking forever

786e997

adamjstewart mentioned this pull request Dec 6, 2024

Add additional tutorials #2418

Open

25 tasks

set up training

78c90fe

burakekim and others added 2 commits December 6, 2024 17:55

add to index.rst and author info

83098f2

Merge branch 'main' into tutorial_be

d2ffb64

adamjstewart modified the milestones: 0.6.2, 0.6.3 Dec 8, 2024

burakekim and others added 5 commits January 3, 2025 20:07

handle Nones in get_label

068c731

make ruff hapy

1b18387

unit test for the win

3590666

Merge branch 'main' into tutorial_be

a07e9e1

I might have messed up the index.rst

427da4c

adamjstewart reviewed Jan 3, 2025

docs/index.rst (outdated, resolved)

burakekim added 7 commits January 4, 2025 01:01

solved the nodata and code-got-stuck errors

f3d26c9

solved the nodata and code-got-stuck errors

247aaa8

set up training -- train on WS, need GPU

c96043f

first complete draft and case_studies.rst addition

44b2e8b

Merge branch 'main' into tutorial_be

68f5b2c

revert index.rst?

4464a88

revert index.rst for real?

cf2fdd3

burakekim added 2 commits January 4, 2025 14:32

index.rst spaces on a sunny day

33641de

ok this was the last one

1e54f03

some docstring

b1f930b

adamjstewart reviewed Jan 4, 2025

docs/tutorials/case_studies.rst (outdated, resolved)

Update docs/tutorials/case_studies.rst

c50ba47

Co-authored-by: Adam J. Stewart

adamjstewart mentioned this pull request Dec 18, 2024

Time Series Support #2382

Open

29 tasks

burakekim and others added 3 commits January 10, 2025 00:00

remove print statement

4593510

Merge branch 'main' into eurocrops_handlenones

863aaa9

rename to crop type mapping

a884c2b

burakekim changed the title ~~Add Land cover mapping tutorial~~ Add Crop Type Mapping tutorial Jan 11, 2025

Merge branch 'eurocrops_handlenones' into tutorial_be

2007429

github-actions bot added the datasets (Geospatial or benchmark datasets) and testing (Continuous integration testing) labels Jan 11, 2025

burakekim and others added 4 commits January 11, 2025 16:13

top10 hcat classes for training

d5038a2

Merge branch 'eurocrops_handlenones' into tutorial_be

65fd5a1

Merge branch 'main' into tutorial_be

15a921d

10-class training

83c0a75
