Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization of Earthkit Data Extraction for Specific Region #487

Open
vinayk7 opened this issue Oct 15, 2024 · 2 comments
Open

Optimization of Earthkit Data Extraction for Specific Region #487

vinayk7 opened this issue Oct 15, 2024 · 2 comments

Comments

@vinayk7
Copy link

vinayk7 commented Oct 15, 2024

Problem Description:

Hello, I'm using Earthkit to extract weather data from a GRIB file provided by ECMWF. Specifically, I'm working with the u-vector wind data at a pressure level of 1000 hPa. Below is the code I've implemented to extract the data from a specific region (latitude and longitude bounds). As you can see I am simply extracting all the data and slicing it then searching on it, since I am new to weather analysis and python in general I want to know that is it the most optimum way or there exists some function in datakit that will extract it much faster ?

import earthkit.data
import numpy as np

data = earthkit.data.from_source("file", "20241006060000-0h-oper-fc.grib2")

u = data.sel(param="u", level=1000)[0]         # select wind data u-vector
u_data = u.data()

# The following info obtained about variable u_data while debugging
# u_data has 3 dimensions:
# at 0: ndarray with shape (721, 1440) | min value is -90.0 and max value is 90.0 | size is 1038240
# at 1: ndarray with shape (721, 1440) | min value is -180.0 and max value is 179.75 | size is 1038240
# at 2: ndarray with shape (721, 1440) | min value is -30.25118637084961 and max value is 26.03006362915039 | size is 1038240

shp = u_data.shape
print(shp)  # (3, 721, 1440)

lat_data = u_data[0, :, 0]   # Latitude values (take first column, all rows)
lon_data = u_data[1, 0, :]   # Longitude values (take first row, all columns)
temp_data = u_data[2]        # Wind (u-vector) data

lat_bounds = (30.0, 40.0)    # Example: from 30°N to 40°N
lon_bounds = (-100.0, -90.0) # Example: from 100°W to 90°W

# Data about the latitude flow in GRIB file is arranged from +90 (North pole) to -90 (South pole)
# 90, 80, .. 40, 30. So higher latitude degree occurs at lower index, we adjust for the fact below
lat_min_idx = np.abs(lat_data - lat_bounds[1]).argmin()
lat_max_idx = np.abs(lat_data - lat_bounds[0]).argmin()

# Ensure latitude min and max indices are within bounds
lat_min_idx = max(0, lat_min_idx)
size_lat_array = shp[1]
lat_max_idx = min(size_lat_array - 1, lat_max_idx)

# Data about the longitude flow in GRIB file is arranged from -180 (West) to +180 (East)
# no adjustment is needed
lon_min_idx = np.abs(lon_data - lon_bounds[0]).argmin()
lon_max_idx = np.abs(lon_data - lon_bounds[1]).argmin()

# Ensure longitude indices are within bounds
lon_min_idx = max(0, lon_min_idx)
size_lon_array = shp[2]
lon_max_idx = min(size_lon_array - 1, lon_max_idx)

# Extract the wind data (u-vector) for the specified region
region_temp_data = temp_data[lat_min_idx:lat_max_idx + 1, lon_min_idx:lon_max_idx + 1]
@sandorkertesz
Copy link
Collaborator

@vinayk7, unfortunately, earthkit does not offer a solution for subarea selection/cropping right now. However, all these features + grid handling will be available in the near future. Limited interpolation is already available in earthkit-regrid.

@vinayk7
Copy link
Author

vinayk7 commented Oct 25, 2024

Thanks for the information @sandorkertesz I will try earthkit-regrid to interpolate my subset data and earthkit-plots to plot it and get an image. Let's see how it goes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants