AWS 2-4: Add Celery Task for Global Land Cover Data Fetching #3633

Merged
rajadain merged 6 commits into develop from tt/3628/add-celery-task-for-stac-fetching-2 on Jul 22, 2024

@rajadain rajadain commented Jul 15, 2024

Overview

Adds an endpoint allowing for analysis of Global Land Use / Land Cover data, using the Impact Observatory Annual LULC dataset available on AWS Open Registry.

The new endpoint works identically to the existing analyze/land endpoint, except it doesn't include Active River Area.

This implementation uses the Python toolchain for querying and fetching data from STAC, as done in https://github.com/rajadain/mmw-io-10m-lulc-summary. This could potentially be switched for an MMW-Geoprocessing version, if WikiWatershed/mmw-geoprocessing#117 ever becomes more performant.
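
As a rough illustration of the STAC query step, here is a minimal sketch of building an item-search request body as described by the STAC API specification. It is not the code from this PR or from the linked repository. The collection ID `example-lulc-collection` and the polygon are hypothetical placeholders, and the function name `build_stac_search` is invented for this example.

```python
import json


def build_stac_search(collection, geometry, year, limit=100):
    """Build a STAC API item-search body covering one calendar year.

    `collection` is a placeholder ID here; the real ID is whatever the
    target STAC catalog advertises.
    """
    return {
        "collections": [collection],
        # GeoJSON geometry that returned items must intersect
        "intersects": geometry,
        # Closed RFC 3339 interval spanning the requested year
        "datetime": f"{year}-01-01T00:00:00Z/{year}-12-31T23:59:59Z",
        "limit": limit,
    }


# Hypothetical area of interest, for illustration only
aoi = {
    "type": "Polygon",
    "coordinates": [[
        [-75.3, 39.9], [-75.1, 39.9], [-75.1, 40.1],
        [-75.3, 40.1], [-75.3, 39.9],
    ]],
}

body = build_stac_search("example-lulc-collection", aoi, 2019)
print(json.dumps(body, indent=2))
```

A body like this would be POSTed to the catalog's search endpoint, and the matching items' assets would then be read to build the histogram.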

Input validation and caching are deferred to a later card, with this PR focusing on core implementation.

Closes #3628

Demo

xh --verbose :8000/api/analyze/global-land/2019/ huc=0204020310 Authorization:"Token 4b91f3a5b5a635cb2aa6cefa5000ffdd28828b31"
POST /api/analyze/global-land/2019/ HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate, br
Authorization: Token 4b91f3a5b5a635cb2aa6cefa5000ffdd28828b31
Connection: keep-alive
Content-Length: 20
Content-Type: application/json
Host: localhost:8000
User-Agent: xh/0.22.0

{
    "huc": "0204020310"
}
HTTP/1.1 200 OK
Allow: POST, OPTIONS
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Mon, 15 Jul 2024 15:35:39 GMT
Location: /api/jobs/caf54a2a-5f89-47dd-b5cd-903b40d74934/
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept, Cookie, Origin

{
    "job": "caf54a2a-5f89-47dd-b5cd-903b40d74934",
    "job_uuid": "caf54a2a-5f89-47dd-b5cd-903b40d74934",
    "status": "started",
    "messages": [
        "The `job` field will be deprecated in an upcoming release. Please switch to using `job_uuid` instead."
    ]
}
xh --verbose :8000/api/jobs/caf54a2a-5f89-47dd-b5cd-903b40d74934/ Authorization:"Token 4b91f3a5b5a635cb2aa6cefa5000ffdd28828b31"
GET /api/jobs/caf54a2a-5f89-47dd-b5cd-903b40d74934/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, br
Authorization: Token 4b91f3a5b5a635cb2aa6cefa5000ffdd28828b31
Connection: keep-alive
Host: localhost:8000
User-Agent: xh/0.22.0
HTTP/1.1 200 OK
Allow: GET, OPTIONS
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Mon, 15 Jul 2024 15:36:09 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept, Cookie, Origin

{
    "job_uuid": "caf54a2a-5f89-47dd-b5cd-903b40d74934",
    "status": "complete",
    "result": {
        "survey": {
            "name": "global_land_io_2019",
            "displayName": "Global Land Use/Cover 2019",
            "categories": [
                {
                    "area": 14941947.428126818,
                    "code": "water",
                    "coverage": 0.02144532388706757,
                    "ioclass": 1,
                    "type": "Water"
                },
                {
                    "area": 109588951.93420611,
                    "code": "trees",
                    "coverage": 0.15728676465889277,
                    "ioclass": 2,
                    "type": "Trees"
                },
                {
                    "area": 48464.99853478383,
                    "code": "flooded_vegetation",
                    "coverage": 0.00006955904481421345,
                    "ioclass": 4,
                    "type": "Flooded vegetation"
                },
                {
                    "area": 29756139.065063003,
                    "code": "crops",
                    "coverage": 0.042707287182504744,
                    "ioclass": 5,
                    "type": "Crops"
                },
                {
                    "area": 488978611.8600828,
                    "code": "built_area",
                    "coverage": 0.7018030785899226,
                    "ioclass": 7,
                    "type": "Built area"
                },
                {
                    "area": 535341.2912358101,
                    "code": "bare_ground",
                    "coverage": 0.0007683447847676015,
                    "ioclass": 8,
                    "type": "Bare ground"
                },
                {
                    "area": 0.0,
                    "code": "snow_ice",
                    "coverage": 0.0,
                    "ioclass": 9,
                    "type": "Snow/ice"
                },
                {
                    "area": 0.0,
                    "code": "clouds",
                    "coverage": 0.0,
                    "ioclass": 10,
                    "type": "Clouds"
                },
                {
                    "area": 52896720.20292212,
                    "code": "rangeland",
                    "coverage": 0.07591964185203046,
                    "ioclass": 11,
                    "type": "Rangeland"
                }
            ]
        }
    },
    "error": "",
    "started": "2024-07-15T15:35:38.989489Z",
    "finished": "2024-07-15T15:35:54.471039Z"
}
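
One quick way to sanity-check a result like the one above is to confirm that the coverage fractions add up to 1 and to see which class dominates. The sketch below does that with the values copied from the demo response. The helper name `summarize_survey` is made up for this example and is not part of the PR.

```python
import math


def summarize_survey(survey):
    """Return total coverage, total area, and the dominant class code."""
    categories = survey["categories"]
    dominant = max(categories, key=lambda c: c["coverage"])
    return {
        "total_coverage": sum(c["coverage"] for c in categories),
        "total_area": sum(c["area"] for c in categories),
        "dominant": dominant["code"],
    }


# Area and coverage values copied from the demo response above
survey = {
    "name": "global_land_io_2019",
    "categories": [
        {"code": "water", "area": 14941947.428126818, "coverage": 0.02144532388706757},
        {"code": "trees", "area": 109588951.93420611, "coverage": 0.15728676465889277},
        {"code": "flooded_vegetation", "area": 48464.99853478383, "coverage": 0.00006955904481421345},
        {"code": "crops", "area": 29756139.065063003, "coverage": 0.042707287182504744},
        {"code": "built_area", "area": 488978611.8600828, "coverage": 0.7018030785899226},
        {"code": "bare_ground", "area": 535341.2912358101, "coverage": 0.0007683447847676015},
        {"code": "snow_ice", "area": 0.0, "coverage": 0.0},
        {"code": "clouds", "area": 0.0, "coverage": 0.0},
        {"code": "rangeland", "area": 52896720.20292212, "coverage": 0.07591964185203046},
    ],
}

summary = summarize_survey(survey)
# Coverage is a fraction of the analyzed area, so the parts should sum to 1
assert math.isclose(summary["total_coverage"], 1.0, abs_tol=1e-9)
print(summary)
```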

Testing Instructions

  • Check out this branch and reprovision the app and worker
    vagrant reload app worker --provision
  • Run ./scripts/debugserver.sh
  • Go to http://localhost:8000 and log in with an account. If you don't have one, register and activate it using the confirmation link in the output of debugserver above.
  • Go to your account settings page and copy your API token
  • Use xh, httpie, curl, or your API client of choice to test the new endpoint:
    curl -X POST -d '{"huc": "0204020310"}' -H "Authorization: Token $YOUR_API_TOKEN" -H "Content-Type: application/json" http://localhost:8000/api/analyze/global-land/2019/
    • Ensure it succeeds
  • Get the job details
    curl -H "Authorization: Token $YOUR_API_TOKEN" http://localhost:8000/api/jobs/$YOUR_JOB_ID/
    • Ensure it succeeds and has similar output to the above
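
The two curl steps above can also be scripted. Below is a minimal sketch using only the Python standard library. The function names are invented for this example, and the polling loop assumes that any status other than "started" means the job is done, since "started" and "complete" are the only statuses shown in this thread.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"


def build_analyze_request(token, huc, year=2019, base_url=BASE_URL):
    """Build the POST request that starts a global land analysis job."""
    return urllib.request.Request(
        f"{base_url}/api/analyze/global-land/{year}/",
        data=json.dumps({"huc": huc}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


def fetch_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(token, job_uuid, fetch=fetch_json, interval=2.0,
                 attempts=30, base_url=BASE_URL):
    """Poll the jobs endpoint until the status is no longer "started"."""
    request = urllib.request.Request(
        f"{base_url}/api/jobs/{job_uuid}/",
        headers={"Authorization": f"Token {token}"},
    )
    for _ in range(attempts):
        job = fetch(request)
        if job["status"] != "started":
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_uuid} still running after {attempts} polls")
```

With the debug server running, something like `started = fetch_json(build_analyze_request(token, "0204020310"))` followed by `wait_for_job(token, started["job_uuid"])` should return the same payload as the second curl command.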

@rajadain
Member Author

Rebased atop latest develop. Added a commit 76f8190 to have the correct legend on the map:

image

Contributor

@rachelekm rachelekm left a comment

I can't seem to get past this error with provisioning the worker. It's from the azavea.docker task to install docker, and I tried to update the task module locally by adding allow_downgrade: yes and update_cache: yes but to no avail (I get a different Unsupported parameters error). Not sure if you had issues with provisioning or suggestions to update the cache to allow for a downgrade following a rebase on develop?

TASK [azavea.docker : Install Docker] ******************************************
Thursday 18 July 2024  20:43:10 +0000 (0:00:00.356)       0:01:02.820 ********* 
fatal: [worker]: FAILED! => {"cache_update_time": 1706595872, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'docker-ce=5:25.*' 'docker-ce-cli=5:25.*'' failed: E: Packages were downgraded and -y was used without --allow-downgrades.\n", "rc": 100, "stderr": "E: Packages were downgraded and -y was used without --allow-downgrades.\n", "stderr_lines": ["E: Packages were downgraded and -y was used without --allow-downgrades."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSuggested packages:\n  aufs-tools cgroupfs-mount | cgroup-lite\nThe following packages will be DOWNGRADED:\n  docker-ce docker-ce-cli\n0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 105 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Suggested packages:", "  aufs-tools cgroupfs-mount | cgroup-lite", "The following packages will be DOWNGRADED:", "  docker-ce docker-ce-cli", "0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 105 not upgraded."]}

@rajadain
Member Author

rajadain commented Jul 19, 2024

Ah yes, that error says that previously Docker v26 was installed, and now that we're asking for Docker v25 it would be a downgrade. This requires manual intervention to fix, which is why the automated provisioning fails.

The way around this is to destroy and recreate the Worker:

vagrant destroy -f worker && vagrant up worker

@rajadain
Member Author

The Impact Observatory API was down yesterday during demo, but is back up now: https://api.impactobservatory.com/stac-aws/

@rajadain
Member Author

Actually wait, if you haven't destroyed your worker VM yet hold off. I'm going to rebase this PR atop the latest develop, which may help with that matter.

rajadain added 6 commits July 22, 2024 10:35
This is based on preparatory work done in https://github.com/rajadain/mmw-io-10m-lulc-summary/blob/main/main.py

We create a Celery task that takes a STAC URL, collection, asset,
and optional filters, and generates a histogram from it.

There is an additional method to format this into an MMW-Geoprocessing-like
shape, which allows for compatibility and modularity.
This task takes the output of the STAC Histogram Query and
calculates the area and coverage percent from it, and
formats it into the standard MMW output.

We remove IO Class 0 NODATA from the standard set of
outputs so that we don't include it in the final output.
This follows the format of the Analyze Land endpoint.

Input validation and caching are deferred to a later commit.
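
To make the histogram-to-output step described in these commits concrete, here is a minimal sketch, not the code from this PR. It assumes the histogram is a mapping of class value to pixel count, that every pixel covers a fixed area, and that coverage is computed after dropping class 0. The areas in the demo output are not whole multiples of a fixed pixel size, so the actual task presumably computes area more carefully. The class codes and labels are copied from the demo response; `format_categories` and `pixel_area` are hypothetical names.

```python
# Class value -> (code, display label), as seen in the demo response
IO_CLASSES = {
    1: ("water", "Water"),
    2: ("trees", "Trees"),
    4: ("flooded_vegetation", "Flooded vegetation"),
    5: ("crops", "Crops"),
    7: ("built_area", "Built area"),
    8: ("bare_ground", "Bare ground"),
    9: ("snow_ice", "Snow/ice"),
    10: ("clouds", "Clouds"),
    11: ("rangeland", "Rangeland"),
}
NODATA_CLASS = 0


def format_categories(histogram, pixel_area):
    """Turn {class value: pixel count} into area/coverage records.

    Assumption: every pixel covers `pixel_area`, and NODATA pixels are
    excluded before the coverage fraction is computed.
    """
    counts = {k: v for k, v in histogram.items() if k != NODATA_CLASS}
    total = sum(counts.values())
    categories = []
    for ioclass, (code, label) in IO_CLASSES.items():
        count = counts.get(ioclass, 0)
        categories.append({
            "area": count * pixel_area,
            "code": code,
            "coverage": count / total if total else 0.0,
            "ioclass": ioclass,
            "type": label,
        })
    return categories


# Made-up pixel counts, for illustration only
example = format_categories({0: 5, 1: 10, 2: 30, 7: 60}, pixel_area=100.0)
for category in example:
    print(category["code"], category["area"], category["coverage"])
```

Classes missing from the histogram still appear with zero area and coverage, which matches the snow_ice and clouds entries in the demo output.
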
@rajadain rajadain force-pushed the tt/3628/add-celery-task-for-stac-fetching-2 branch from 76f8190 to fffe8cd Compare July 22, 2024 14:36
@rajadain
Member Author

Alright this is ready to be reviewed again. If this still doesn't work, then yes please destroy the worker VM and try again.

@rachelekm rachelekm self-requested a review July 22, 2024 17:52
Contributor

@rachelekm rachelekm left a comment

Success! I force pulled the above commits and still ran into the same caching issues so I did have to destroy my worker VM after all -- but that did the trick and this is working as expected:

rachelemorino@Racheles-MacBook-Pro ~ % curl -X POST -d '{"huc": "0204020310"}' -H "Authorization: Token $YOUR_API_TOKEN" -H "Content-Type: application/json" http://localhost:8000/api/analyze/global-land/2019/
{
    "job": "f673d52c-7e15-43a8-816a-bfff4cf305e0",
    "job_uuid": "f673d52c-7e15-43a8-816a-bfff4cf305e0",
    "status": "started",
    "messages": [
        "The `job` field will be deprecated in an upcoming release. Please switch to using `job_uuid` instead."
    ]
}
rachelemorino@Racheles-MacBook-Pro ~ % curl -H "Authorization: Token $YOUR_API_TOKEN" http://localhost:8000/api/jobs/$YOUR_JOB_ID/
{
    "job_uuid": "f673d52c-7e15-43a8-816a-bfff4cf305e0",
    "status": "complete",
    "result": {
        "survey": {
            "name": "global_land_io_2019",
            "displayName": "Global Land Use/Cover 2019",
            "categories": [
                {
                    "area": 14941947.428126818,
                    "code": "water",
                    "coverage": 0.02144532388706757,
                    "ioclass": 1,
                    "type": "Water"
                },
                {
                    "area": 109588951.93420613,
                    "code": "trees",
                    "coverage": 0.15728676465889277,
                    "ioclass": 2,
                    "type": "Trees"
                },
                ...
            ]
        }
    },
    "error": "",
    "started": "2024-07-22T17:38:25.276397Z",
    "finished": "2024-07-22T17:38:23.403943Z"
}

@rajadain rajadain merged commit e0d8de1 into develop Jul 22, 2024
2 checks passed
@rajadain rajadain deleted the tt/3628/add-celery-task-for-stac-fetching-2 branch July 22, 2024 17:57
@rajadain
Member Author

Thanks for reviewing!

Labels
AWS Funding Source: AWS
Development

Successfully merging this pull request may close these issues.

AWS 2-4: Add Celery Task for Global Land Cover Data Fetching
2 participants