Describe Station averages #10

Open
qjhart opened this issue Nov 27, 2017 · 1 comment


qjhart commented Nov 27, 2017

There are three sets of ETo estimations at each station that we compare for various reasons: daily station estimates, long term estimates, and raster-based estimates. We use the following prefixes for these:

s_eto = CIMIS reported station ETo. These are the station data as reported contemporaneously with the raster data calculations. That means we have an estimate of these for each day we have Spatial CIMIS estimations. These are stored in the station table. There are about 500K entries in this table, one for each day and each station.

r_eto = Spatial CIMIS calculated ETo. These are the ETo estimations that are calculated each day from the combination of the station data and the GOES-estimated Rs. We have these for every day Spatial CIMIS was calculated as well. These are stored in the raster table. There are about 1M entries in this table, more than in the station table, since we get the estimation regardless of whether the station reported a result for that particular day or not.

lt_eto = Long term average ETo. These are long term averages as supplied by the CIMIS program. We have a daily long term estimate, but these are independent of year, since they are averages. These are kept in the cimis_15day table. There are 134 stations, and 366 days per station in that table.

In addition, we often use 15-day window running averages of these data. We do this especially when summarizing our yearly data into 52 weekly average values. These values are what are used for calculating errors and differences. They are also the inputs to the FFT transformations that further summarize our yearly ETo estimations into 5 FFT parameters: three powers and two phase components. We add a 15 to the prefix designations: s15_eto, r15_eto, lt15_eto.
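As a rough illustration of these two steps, the sketch below computes a centered 15-day running average and then reduces the series to five Fourier parameters. It assumes the three powers are the mean and the amplitudes of the annual and semi-annual harmonics (called p0, p1, p2 here) and that the two phases belong to those two harmonics (h1, h2). The function names, the wrap-around at the ends of the year, and the normalization are assumptions made for illustration, not the project's actual code.

```python
import cmath
import math

def running_mean_15(daily):
    """Centered 15-day running mean; the window wraps around the ends of the series."""
    n = len(daily)
    return [sum(daily[(i + k) % n] for k in range(-7, 8)) / 15.0 for i in range(n)]

def harmonic(values, k):
    """Complex Fourier coefficient of harmonic k, normalized by the series length."""
    n = len(values)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values)) / n

def fft_params(values):
    """Return (p0, p1, p2, h1, h2): the mean, two amplitudes, and two phases in radians."""
    c1, c2 = harmonic(values, 1), harmonic(values, 2)
    p0 = harmonic(values, 0).real
    return p0, 2 * abs(c1), 2 * abs(c2), cmath.phase(c1), cmath.phase(c2)

# Synthetic year of daily values (made up): mean of 4 with an annual swing of 2.
daily = [4.0 + 2.0 * math.cos(2 * math.pi * d / 366) for d in range(366)]
p0, p1, p2, h1, h2 = fft_params(running_mean_15(daily))
```

On this synthetic series the running average leaves the mean untouched and damps the annual amplitude only slightly, which is why a 15-day window is a gentle smoother for a yearly cycle.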

Station - Raster Comparisons

Long Term Average Comparisons

DWR has supplied some long term averages for about 122 stations. Our interest is to compare these data with the Spatial CIMIS raster long term averages. The raster long term average data exist in the table fft.raster_15avg_ed. There is one entry for every pixel, so we just need to extract the station pixels. We created a table of each station's associated pid from compare.station_xy, which combines station_info with the CIMIS boundaries, so we can just use that.

Station Location differences

Note, however, that the lt_* data place some stations considerably far from the station_info locations reported by the et.water.ca.gov website. We are assuming the station_info is correct; these are the stations located more than 500 m from the position reported by et.water.ca.gov.

station_id  longitude  latitude  diff (m)
135         -114.666   33.557    15431
196         -122.144   38.685    11337
88          -119.605   34.932    6388
84          -121.311   39.271    2088
152         -118.994   34.232    1407
114         -121.29    36.359    1305
170         -122.02    38.004    1264
194         -120.851   37.719    911
136         -116.154   33.516    868
175         -114.726   33.389    863
74          -116.973   33.09     758
56          -120.761   37.093    752
79          -122.421   38.549    698
62          -117.222   33.49     691
77          -122.41    38.434    614
90          -120.479   41.433    589
200         -116.258   33.746    553
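
One way to produce a distance column like this is a great-circle distance between the two reported coordinate pairs. The sketch below uses the haversine formula on a spherical Earth. The function names, the Earth radius, and the choice of formula are assumptions, since the issue does not say how diff was computed; the coordinates in the example are made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation

def distance_m(lon1, lat1, lon2, lat2):
    """Haversine distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def mismatched(lt_xy, info_xy, threshold_m=500.0):
    """Return [(station_id, distance)] for stations farther apart than the threshold, largest first."""
    rows = []
    for sid, (lon, lat) in lt_xy.items():
        if sid in info_xy:
            d = distance_m(lon, lat, *info_xy[sid])
            if d > threshold_m:
                rows.append((sid, round(d)))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical stations "a" and "b" with made-up coordinates.
lt_xy = {"a": (-120.0, 38.0), "b": (-121.0, 37.0)}
info_xy = {"a": (-120.0, 38.01), "b": (-121.0, 37.001)}
far = mismatched(lt_xy, info_xy)
```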

We can then calculate the ratio lt_p0/r_p0 to compare the DWR long term averages with the raster long term averages.
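
The lookup and ratio can be sketched as a simple join on pid. The dictionaries below stand in for the station-to-pid table and for the p0 column of fft.raster_15avg_ed; their layout, the function name, and the sample values are hypothetical.

```python
def lt_ratios(station_pid, raster_p0, lt_p0):
    """Return {station_id: lt_p0 / r_p0} for stations present in all three inputs."""
    ratios = {}
    for sid, pid in station_pid.items():
        r_p0 = raster_p0.get(pid)
        if sid in lt_p0 and r_p0:  # skip stations with a missing or zero raster value
            ratios[sid] = lt_p0[sid] / r_p0
    return ratios

# Made-up values: station 3 has no raster pixel and is dropped.
station_pid = {1: 100, 2: 200, 3: 300}
raster_p0 = {100: 4.0, 200: 5.0}
lt_p0 = {1: 4.4, 2: 4.5, 3: 3.0}
ratios = lt_ratios(station_pid, raster_p0, lt_p0)
```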

Contemporaneous Station Data Comparisons

When we are looking for biases in the station vs. raster estimations, we look at these data. One important table we have is the compare.ymd15 table. It compares the s_eto and r_eto estimations for every 15-day time window in Spatial CIMIS history: for each 15-day window, we calculate the average station and raster ETo for that window. You can think of this as a 15x reduction in the data to compare, since we only look at those average values. There are about 69K entries in this table, one for each station and each 15-day window with overlapping data, so each entry is an average of 15 days, or sometimes fewer. The range of overlapping windows varies across these comparisons. The Station-Raster Dates and Count Google Sheet shows the starting and stopping dates for the comparisons, and how many window entries overlap.
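
A minimal sketch of this reduction, assuming fixed, non-overlapping 15-day windows indexed from the date ordinal. The window layout, the record shape, and the function names are assumptions for illustration; the actual compare.ymd15 table may be built differently.

```python
from collections import defaultdict
from datetime import date

def window_index(day, size=15):
    """Index of the fixed 15-day window containing `day` (assumed window layout)."""
    return day.toordinal() // size

def window_means(records):
    """records: iterable of (station_id, date, eto). Returns {(station_id, window): (mean, n_days)}."""
    sums = defaultdict(lambda: [0.0, 0])
    for sid, day, eto in records:
        if eto is None:  # the station did not report that day
            continue
        acc = sums[(sid, window_index(day))]
        acc[0] += eto
        acc[1] += 1
    return {key: (total / n, n) for key, (total, n) in sums.items()}

def overlapping(station_records, raster_records):
    """Keep only the windows in which both station and raster averages exist."""
    s15 = window_means(station_records)
    r15 = window_means(raster_records)
    return {k: (s15[k][0], r15[k][0]) for k in s15.keys() & r15.keys()}

# Made-up data: one station window with 3 missing days, two raster windows.
start = 15 * 49000
station = [(7, date.fromordinal(start + i), None if i < 3 else 2.0) for i in range(15)]
raster = [(7, date.fromordinal(start + i), 3.0) for i in range(30)]
pairs = overlapping(station, raster)
```

Here the station window is an average of 12 days rather than 15, which matches the "or sometimes fewer" case, and the second raster window is dropped because no station data overlap it.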

Now, we can take just the overlapping time windows from this ymd15 table and calculate our FFT transform parameters from them. Note that for each raster location we are calculating special FFT parameters, specific to the time windows that overlap with the station data. That way, when we calculate a ratio, it compares estimates from the same time period.

Combined Ratio Comparisons

The Long Term / Station / Raster Ratios tab in the Google Sheet shows a summary of the long_term and station ratios. Note there are two estimates from the raster data: r_p0 is the long term raster value, and s_r_p0 is the raster value from the data that overlap the station information. The two ratios are then s_p0_ratio = s_p0/s_r_p0 and lt_p0_ratio = lt_p0/r_p0. The ratios are fairly similar, but there are some differences. In that sheet, the column station_overlap_yrs shows the length of the comparison overlap. It's been suggested that, for the station ratios, we only look at stations with an overlap of 5 years or more.

If you are interested in seeing the largest differences, you could compare these in two ways. You could look at the biggest differences in the p0 ratios, by looking at | (s_p0 / s_r_p0) - 1 |, where the absolute value orders stations by how far the ratio is from 1. If we are looking for a station to raster conversion, this ratio can be used. Or you could just look at the absolute value of the difference of s_p0 and s_r_p0, | s_p0 - s_r_p0 |. Here the values are equivalent to the average daily difference in ETo.
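
The two orderings can be sketched as below, together with the suggested minimum overlap of 5 years. The row layout reuses the column names from the sheet, but the function name, the station labels, and the values are made up for illustration.

```python
def rank_stations(rows, min_overlap_yrs=5):
    """rows: dicts with station_id, s_p0, s_r_p0, station_overlap_yrs.

    Returns two lists of station ids, largest difference first:
    one ordered by |s_p0 / s_r_p0 - 1| and one by |s_p0 - s_r_p0|.
    """
    kept = [r for r in rows if r["station_overlap_yrs"] >= min_overlap_yrs]
    by_ratio = sorted(kept, key=lambda r: abs(r["s_p0"] / r["s_r_p0"] - 1), reverse=True)
    by_diff = sorted(kept, key=lambda r: abs(r["s_p0"] - r["s_r_p0"]), reverse=True)
    return [r["station_id"] for r in by_ratio], [r["station_id"] for r in by_diff]

# Made-up rows: "C" is excluded because its overlap is under 5 years.
rows = [
    {"station_id": "A", "s_p0": 2.0, "s_r_p0": 1.6, "station_overlap_yrs": 8},
    {"station_id": "B", "s_p0": 6.0, "s_r_p0": 5.4, "station_overlap_yrs": 6},
    {"station_id": "C", "s_p0": 9.0, "s_r_p0": 3.0, "station_overlap_yrs": 2},
]
by_ratio, by_diff = rank_stations(rows)
```

The two orderings can disagree: a station with low ETo can have a large ratio error but a small absolute difference, as "A" and "B" do here.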

Since we plan to create a single multiplier for p0, we will look at the ratio. The tab Rapid Change in s_p0/r_p0 in the Google Sheet shows the stations that have the most rapid change in the s_p0/r_p0 ratio in the images. Higher numbers mean more rapid changes from one station to another.

Ratio Splines

These ratios are then used as input to a 3-d spline parameterization, using GRASS's v.vol.rst. An example invocation looks like:

 # r = ratio prefix (s or lt), s = smooth, z = zscale, t = tension
 p0=${r}_s${s}_z${z}_t${t}_p0
 v.vol.rst --overwrite input=ratio wcolumn=${r}_p0_ratio \
   cross_input=Z@2km maskmap=state@2km \
   tension=${t} zscale=${z} smooth=${s} cross_output=${p0} \
   where="${r}_p0_ratio is not null and station_overlap_yrs > 4"

A result of running a set of these splines is shown in the Splines Cloud directory.

The three parameters that are modified are:

  • tension, which affects the ability of a point to pull the interpolation to it. Higher tensions allow for sharper bends in the fit. If you look at the t10 files, you can most easily see where the stations most differ from the rasters (the ratio is farthest from 1).
  • zscale, which affects how much the elevation affects the spline. We have this low, which makes this almost a 2-d fit.
  • smooth, which affects how far the spline can miss the input data. Higher smoothing allows the fit to not match the points exactly. This would, however, work against our desire for a matching layer, so the values are kept low.
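
Running a set of these splines amounts to sweeping the three parameters. The sketch below only builds the command lines, using the same options as the example invocation; the helper names and the grid values (in particular the zscale of 0.1) are placeholders, not the values actually used.

```python
from itertools import product

def vol_rst_command(r, t, z, s):
    """Build one v.vol.rst argument list mirroring the example invocation."""
    out = f"{r}_s{s}_z{z}_t{t}_p0"
    where = f"{r}_p0_ratio is not null and station_overlap_yrs > 4"
    return [
        "v.vol.rst", "--overwrite", "input=ratio", f"wcolumn={r}_p0_ratio",
        "cross_input=Z@2km", "maskmap=state@2km",
        f"tension={t}", f"zscale={z}", f"smooth={s}",
        f"cross_output={out}", f"where={where}",
    ]

def parameter_sweep(prefixes, tensions, zscales, smooths):
    """One command per combination of ratio prefix, tension, zscale, and smooth."""
    return [vol_rst_command(r, t, z, s)
            for r, t, z, s in product(prefixes, tensions, zscales, smooths)]

# Placeholder grid: 2 prefixes x 3 tensions x 1 zscale x 2 smooth values.
commands = parameter_sweep(["lt", "s"], [3, 7, 10], [0.1], [0, 0.02])
```

Each argument list could then be passed to `subprocess.run` from inside a GRASS session; building lists rather than shell strings avoids quoting problems with the where clause.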

Some of the parameters used result in an overshoot; that is, the spline cannot be made to fit the data without going beyond the bounds of the input data. This is an indication that the spline is probably not very reliable.
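
A simple way to flag overshoot, assuming the interpolated ratio values have been read out of the output layer into a list, is to compare their range with the range of the input station ratios. The function name and the sample numbers are hypothetical.

```python
def overshoot(input_ratios, interpolated):
    """Return (amount below the input minimum, amount above the input maximum)."""
    lo, hi = min(input_ratios), max(input_ratios)
    vals = [v for v in interpolated if v is not None]  # None marks masked cells
    below = max(0.0, lo - min(vals))
    above = max(0.0, max(vals) - hi)
    return below, above

# Made-up example: the surface dips to 0.85 and peaks at 1.25.
below, above = overshoot([0.9, 1.0, 1.2], [0.85, 1.0, 1.25, None])
```

Both values are zero when the interpolated surface stays within the range of the station ratios.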

You can see the data are pretty similar between the lt_ and s_ values. For s=0 you need to increase tension to 7 before you remove overshoot, and the result is a ratio that is probably a bit too blotchy. For s=0.02, you do get some overshoot at t=3, but the results are more believable.

Big Drivers for the Spline

Note there may be some indication of systematic changes west of the Central Valley, but they are not very clear. The LA stations show the biggest bend, but there are large bends up the west coast, and in NE CA (one station) as well.

  • In LA, the stations driving the spline are station_id=204, with a very high ratio of 1.2, near station_id=133, with a low ratio of 0.9.

  • In NE CA, it's just station_id=57, with a ratio of 1.15.

  • In the West it's more convoluted, but it involves station_id=109, which has a ratio of 1.005 but is surrounded by stations with a higher ratio, and then station_id=122,212,140,167, which are high, near station_id=166,42,70, which are low.


qjhart commented Nov 27, 2017

Ricardo suggested that we not include stations with comparison overlaps of less than 5 years. That would eliminate about 38 stations from the 159 stations we have. That seems like a pretty good idea, as these stations with little overlap can have large errors.
