Gap filling with NaN values added to Level 2 #283
base: develop
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -217,6 +217,7 @@ def toL2( | |
|
||
|
||
ds = clip_values(ds, vars_df) | ||
ds = fill_gaps(ds) | ||
return ds | ||
|
||
|
||
|
@@ -770,6 +771,35 @@ def calcCorrectionFactor(Declination_rad, phi_sensor_rad, theta_sensor_rad, | |
|
||
return CorFac_all | ||
|
||
def fill_gaps(ds): | ||
'''Fill data gaps with nan values | ||
|
||
Parameters | ||
---------- | ||
ds : xarray.Dataset | ||
Data set to gap fill | ||
|
||
Returns | ||
------- | ||
ds_filled : xarray.Dataset | ||
Gap-filled dataset | ||
''' | ||
# Determine time range of dataset | ||
min_date = ds.to_dataframe().index.min() | ||
max_date = ds.to_dataframe().index.max() | ||
|
||
# Determine common time interval | ||
time_diffs = np.diff(ds['time'].values) | ||
common_diff = pd.Timedelta(pd.Series(time_diffs).mode()[0]) | ||
|
||
# Determine gap filled index | ||
full_time_range = pd.date_range(start=min_date, | ||
How will it behave if there is a relative time shift in the datetime indexes? Let's say we have a time series where the indices come from two periods, 01 and 02:
In this case, there will be a gap between 2023-01-05T02:00 and 2023-06-01T14:10. Both periods have hourly sample rates, but the second part has an offset of 10 minutes. In this case, the generated indexes from
I do think that it is a relevant issue. We have cases where daily winter transmissions are taken as isolated hourly values and surrounded by NaN in the resampling process: We should have a better handling of mixed sample rates. Potentially with a
||
end=max_date, | ||
freq=common_diff) | ||
|
||
# Apply gap-filled index to dataset | |
ds_filled = ds.reindex({'time': full_time_range}, fill_value=np.nan) | ||
return ds_filled | ||
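The time-shift concern raised in the review can be reproduced with a small sketch. It follows the same steps as `fill_gaps` (modal time step, `date_range`, `reindex`) but uses a pandas `Series` as a stand-in for the `xarray.Dataset`, on the assumption that both reindex by exact label match. The variable name `t_u` and the sample timestamps are hypothetical, chosen to match the two periods described in the comment.

```python
import numpy as np
import pandas as pd

# Two hypothetical hourly periods; the second is offset by 10 minutes.
one_hour = pd.Timedelta(hours=1)
period_01 = pd.date_range("2023-01-05T00:00", "2023-01-05T02:00", freq=one_hour)
period_02 = pd.date_range("2023-06-01T14:10", "2023-06-01T16:10", freq=one_hour)
time = period_01.append(period_02)

# Stand-in for one data variable of the dataset (illustrative values only).
t_u = pd.Series(np.arange(len(time), dtype=float), index=time)

# Same logic as fill_gaps: most common time step, then a regular index.
time_diffs = np.diff(time.values)
common_diff = pd.Timedelta(pd.Series(time_diffs).mode()[0])
full_time_range = pd.date_range(start=time.min(), end=time.max(), freq=common_diff)

# Reindexing matches labels exactly, so timestamps that are not on the
# generated grid are dropped rather than moved to the nearest grid point.
t_u_filled = t_u.reindex(full_time_range, fill_value=np.nan)

kept = int(t_u_filled.notna().sum())
print(f"common step: {common_diff}, original samples: {len(t_u)}, kept: {kept}")
```

In this sketch the regular index starts at 00:00 and steps hourly, so every generated timestamp falls on the hour and the three samples at 14:10, 15:10 and 16:10 do not survive the reindex.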
|
||
def _checkSunPos(ds, OKalbedos, sundown, sunonlowerdome, TOA_crit_nopass): | ||
'''Check sun position | ||
|
Are the nan values still available in the output files or are they removed later in the pipeline?
write.py#L166
Lcsv = Lx.to_dataframe().dropna(how="all")
write.py#L471
df = df.dropna(how="all")
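To illustrate the question, here is a minimal sketch of what `dropna(how="all")` does to rows that were inserted purely as NaN padding. The column names and values are hypothetical, and the snippet only mimics the two quoted `write.py` lines; it does not reproduce the rest of the writing code.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a two-hour gap between 01:00 and 04:00.
time = pd.to_datetime(["2023-01-05T00:00", "2023-01-05T01:00", "2023-01-05T04:00"])
df = pd.DataFrame({"t_u": [1.0, 2.0, 3.0], "p_u": [900.0, np.nan, 905.0]}, index=time)

# Gap filling: reindex onto a regular hourly index, adding all-NaN rows.
full_index = pd.date_range(time.min(), time.max(), freq=pd.Timedelta(hours=1))
df_filled = df.reindex(full_index)

# The write step: rows where every column is NaN are removed again.
df_written = df_filled.dropna(how="all")

print(len(df_filled), len(df_written))
```

Rows with at least one valid value (such as 01:00, where only `p_u` is missing) are kept, while the padded rows at 02:00 and 03:00 are dropped, so under these assumptions the filled gaps would not reach the output file.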
I also think that it should be handled in `write` rather than in the processing.
Actually, I now remember that `resample_dataset` is called when writing the L2 data:
pypromice/src/pypromice/process/get_l2.py
Lines 42 to 48 in 3357e62
I thought this would fill the gaps within L2_raw and L2_tx data with NaN, but apparently it doesn't!
So maybe it could be fixed there?
I can see that I did not make another resample after joining the raw and tx L2 data:
pypromice/src/pypromice/process/join_l2.py
Lines 101 to 102 in 3357e62
So there may be some gaps between the raw and tx data. For instance, for a station that failed from Jan 2024 and was visited in Jun 2024, we'll have raw data until Jan 2024 and tx data from Jun 2024.
Also note that we use the attribute `aws.L2.attrs['format']` to determine if a 10 min resample is needed on top of an hourly resample. `aws.L2.attrs['format']` is inherited from `aws.L1A.attrs['format']`, which is inherited from `aws.L1[-1].attrs['format']`: the format of the last logger file. If that last file is a "STM", then no 10 min data is produced, even though there has been 10 min data in older logger files.
The use of `.mode` is also difficult to control because, in the presence of both `raw` and `STM` data, it depends on the number of occurrences of a given sample rate.
Note that these gaps do not exist in the level 3 files, potentially because there is a `resample` in `join_l3` after we merge data covering different periods:
pypromice/src/pypromice/process/join_l3.py
Lines 531 to 536 in 3357e62
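The point about `.mode` can be shown with a short sketch. The helper `common_interval` is hypothetical and simply repeats the modal-step computation from `fill_gaps`; the timestamps are made up to represent a 10 min raw period followed by an hourly transmitted period of varying length.

```python
import numpy as np
import pandas as pd


def common_interval(index):
    """Return the most frequent time step of a DatetimeIndex (hypothetical helper)."""
    return pd.Timedelta(pd.Series(np.diff(index.values)).mode()[0])


ten_min = pd.Timedelta(minutes=10)
one_hour = pd.Timedelta(hours=1)

# Seven raw samples at 10 min spacing: six 10 min steps.
raw = pd.date_range("2024-01-01T00:00", periods=7, freq=ten_min)

# A short hourly period (three 1 h steps) and a longer one (eleven 1 h steps).
tx_short = pd.date_range("2024-06-01T00:00", periods=4, freq=one_hour)
tx_long = pd.date_range("2024-06-01T00:00", periods=12, freq=one_hour)

# The chosen interval flips depending on which sample rate has more steps.
print(common_interval(raw.append(tx_short)))
print(common_interval(raw.append(tx_long)))
```

With the short hourly period the modal step is 10 minutes, and with the longer one it becomes 1 hour, so the grid used for gap filling depends on how much data each source contributes rather than on an explicit setting.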