Gap filling with NaN values added to Level 2 #283

PennyHow · 2024-08-12T13:14:26Z

I have added a gap filling step so that Level 2 data should not have any gaps. Instead, gaps should now be filled with NaN values. The following steps occur:

1. Determine the datetime range of the dataset
2. Determine the most common time interval (e.g. 10-minute, hourly)
3. Generate an index with no gaps
4. Reindex the dataset to the new index
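For readers who want to see the four steps in isolation, here is a minimal sketch using a pandas DataFrame. It is not the code from this PR (which works on the Level 2 dataset inside L1toL2.py); the function name `fill_gaps_sketch`, the column name `t_u` and the sample values are assumptions made for illustration.

```python
import pandas as pd


def fill_gaps_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex a time-indexed frame onto a gap-free index (hypothetical helper)."""
    # 1. Datetime range of the dataset
    min_date, max_date = df.index.min(), df.index.max()
    # 2. Most common interval between consecutive timestamps
    time_diffs = df.index.to_series().diff().dropna()
    common_diff = pd.Timedelta(time_diffs.mode()[0])
    # 3. Index with no gaps
    full_time_range = pd.date_range(start=min_date, end=max_date, freq=common_diff)
    # 4. Reindex; timestamps missing from the original index get NaN rows
    return df.reindex(full_time_range)


# Hourly data with the station offline between 02:00 and 05:00 (made-up values)
index = pd.to_datetime(
    [
        "2024-01-01 00:00",
        "2024-01-01 01:00",
        "2024-01-01 02:00",
        "2024-01-01 05:00",
        "2024-01-01 06:00",
    ]
)
df = pd.DataFrame({"t_u": [-10.0, -10.5, -11.0, -9.0, -8.5]}, index=index)
filled = fill_gaps_sketch(df)
print(filled)
```

In this sketch the rows for 03:00 and 04:00 appear in `filled` with NaN values instead of being absent.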

I found that gaps occur when stations are visited: the station is offline while maintenance is carried out, in most cases only for a couple of hours. However, these gaps were not represented as NaN values in the Level 2 dataset. The dataset simply jumped from the hour the station went offline to the hour it came back online.

I am open to suggestions for where this functionality should go. For now, it is in the L1toL2 processing as a step at the end before the Level 2 dataset is returned. However, another option could be for this to go in the write function.

ladsmund · 2024-08-20T06:51:54Z

src/pypromice/process/L1toL2.py

+    common_diff = pd.Timedelta(pd.Series(time_diffs).mode()[0])
+
+    # Determine gap filled index
+    full_time_range = pd.date_range(start=min_date, 


How will it behave if there is a relative time shift in the date time indexes?

Let's say we have a time series where the indices come from two periods, 01 and 02:

period_01_indexes = [2023-01-03T10:00, 2023-01-03T11:00, ..., 2023-01-05T02:00]
period_02_indexes = [2023-06-01T14:10, 2023-06-01T15:10, ..., 2024-07-12T18:10]

In this case, there will be a gap between 2023-01-05T02:00 and 2023-06-01T14:10. Both periods have hourly sample rates but the second part has an offset of 10 minutes.

The indexes generated by pd.date_range will then ignore the offset in the second period.

How will xarray.Dataset.reindex behave when the indexes are slightly off?

Is it an irrelevant case? Why?
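One way to check this is a small experiment. The sketch below uses pandas `Series.reindex` rather than the xarray call in the PR, and shortened, made-up periods; as far as I know both default to exact label matching (`method=None`), but that should be verified against the xarray version in use.

```python
import pandas as pd

# Two hourly periods; the second is shifted by 10 minutes (shortened for illustration)
period_01 = pd.date_range("2023-01-03T10:00", "2023-01-03T13:00", freq="60min")
period_02 = pd.date_range("2023-01-03T18:10", "2023-01-03T21:10", freq="60min")
index = period_01.append(period_02)
series = pd.Series(range(len(index)), index=index, dtype=float)

# Regular hourly grid anchored at the first timestamp
grid = pd.date_range(start=index.min(), end=index.max(), freq="60min")
reindexed = series.reindex(grid)

# Count how many original samples survive on the new grid
kept_total = int(reindexed.notna().sum())
kept_from_period_02 = int(reindexed.index.isin(period_02).sum())
print(kept_total, kept_from_period_02)
```

With exact matching, none of the `:10` timestamps land on the `:00` grid, so the samples from the second period are replaced by NaN rows rather than shifted. Passing `method="nearest"` with a `tolerance` to `reindex` is one possible way to keep them, at the cost of relabelling their timestamps.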

I do think that it is a relevant issue.

We have cases where daily winter transmissions are taken as isolated hourly values and surrounded by NaN in the resampling process:
#244

We should handle mixed sample rates better, potentially with a time_bnds variable, which I have seen in many other CF-compliant datasets.

ladsmund · 2024-08-20T07:00:23Z

src/pypromice/process/L1toL2.py

@@ -217,6 +217,7 @@ def toL2(


    ds = clip_values(ds, vars_df)
+    ds = fill_gaps(ds)


Are the NaN values still available in the output files, or are they removed later in the pipeline?

write.py#L166 Lcsv = Lx.to_dataframe().dropna(how="all")

write.py#L471 df = df.dropna(how="all")
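To illustrate the concern with these two lines, here is a small pandas sketch of what `dropna(how="all")` does to a gap-filled frame. The column names and values are made up; only the `dropna` call mirrors the lines quoted above.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01 00:00", periods=4, freq="60min")
df = pd.DataFrame(
    {
        "t_u": [-10.0, np.nan, np.nan, -9.0],
        "p_u": [800.0, np.nan, 801.0, 802.0],
    },
    index=index,
)

# Rows where every column is NaN are dropped; partially filled rows are kept
written = df.dropna(how="all")
print(written)
```

The 01:00 row, which is NaN in every column (as a gap-filled row would be), is removed, so the gap would reappear in the written file.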

I also think that it should be handled in the write step rather than in the processing.

Actually, I now remember that resample_dataset is called when writing the L2 data:

pypromice/src/pypromice/process/get_l2.py

Lines 42 to 48 in 3357e62

    # Write out level 2
    if outpath is not None:
        if not os.path.isdir(outpath):
            os.mkdir(outpath)
        if aws.L2.attrs['format'] == 'raw':
            prepare_and_write(aws.L2, outpath, aws.vars, aws.meta, '10min')
        prepare_and_write(aws.L2, outpath, aws.vars, aws.meta, '60min')

I thought this would fill the gaps within L2_raw and L2_tx data with NaN but apparently it doesn't!
So maybe it could be fixed there?

I can see that I did not make another resample after joining the raw and tx L2 data:

pypromice/src/pypromice/process/join_l2.py

Lines 101 to 102 in 3357e62

    # Resample to hourly, daily and monthly datasets and write to file
    prepare_and_write(all_ds, outpath, variables, metadata, resample = False)

So there may be some gaps between the raw and tx data. For instance, for a station that failed in Jan 2024 and was visited in Jun 2024, we'll have raw data until Jan 2024 and tx data from Jun 2024.

Also note that we use the attribute aws.L2.attrs['format'] to determine whether a 10 min resample is needed on top of an hourly resample. aws.L2.attrs['format'] is inherited from aws.L1A.attrs['format'], which is inherited from aws.L1[-1].attrs['format']: the format of the last logger file. If that last file is an "STM" file, no 10 min data is produced, even though older logger files contained 10 min data.

The use of .mode is also difficult to control because, when both raw and STM data are present, the result depends on the number of occurrences of each sample rate.
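A small sketch of that dependence, using made-up timestamps and a hypothetical helper `most_common_interval` that mirrors the `.mode()[0]` step from the diff:

```python
import pandas as pd


def most_common_interval(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Most frequent step between consecutive timestamps (hypothetical helper)."""
    return pd.Timedelta(index.to_series().diff().dropna().mode()[0])


# One hour of 10-minute data followed by either a short or a long hourly record
ten_min = pd.date_range("2024-01-01 00:00", periods=7, freq="10min")
hourly_short = pd.date_range("2024-01-01 02:00", periods=3, freq="60min")
hourly_long = pd.date_range("2024-01-01 02:00", periods=12, freq="60min")

interval_short = most_common_interval(ten_min.append(hourly_short))
interval_long = most_common_interval(ten_min.append(hourly_long))
print(interval_short, interval_long)
```

The same 10-minute segment yields a 10-minute interval in the first case and an hourly one in the second, purely because of how many samples each rate contributes.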

Note that these gaps do not exist in the level 3 files, potentially because there is a resample in join_l3 after we merge data covering different periods:

pypromice/src/pypromice/process/join_l3.py

Lines 531 to 536 in 3357e62

    v = pypromice.resources.load_variables(variables)
    m = pypromice.resources.load_metadata(metadata)
    if outpath is not None:
        prepare_and_write(l3_merged, outpath, v, m, "60min")
        prepare_and_write(l3_merged, outpath, v, m, "1D")
        prepare_and_write(l3_merged, outpath, v, m, "M")
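As a rough illustration of why a resample after merging can remove such gaps, here is a pandas-only sketch. It does not reproduce what prepare_and_write does internally (that code is not shown in this thread); it only shows that a plain pandas resample emits NaN for empty bins. The timestamps and values are made up.

```python
import pandas as pd

# "raw" data ends at 02:00 and "tx" data starts at 06:00 (made-up periods)
raw = pd.date_range("2024-01-01 00:00", periods=3, freq="60min")
tx = pd.date_range("2024-01-01 06:00", periods=3, freq="60min")
merged = pd.Series(1.0, index=raw.append(tx))

# Resampling builds every hourly bin between the first and last timestamp;
# bins without samples get NaN
hourly = merged.resample("60min").mean()
print(hourly)
```

Here the bins for 03:00, 04:00 and 05:00 exist in `hourly` with NaN values, so the merged series has no missing timestamps after the resample.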

Gap filling step added

a23df02

PennyHow requested a review from ladsmund August 12, 2024 13:14

ladsmund reviewed Aug 20, 2024

ladsmund force-pushed the develop branch from 7b2ca2b to 80adcd8 on October 8, 2024 09:49


PennyHow commented Aug 12, 2024

ladsmund Aug 20, 2024 (edited)

BaptisteVandecrux Aug 20, 2024

ladsmund Aug 20, 2024

BaptisteVandecrux Aug 20, 2024
