
Account for job start/end time not exactly matching forecast data points #54

Merged (23 commits) Aug 1, 2023

Conversation

@tlestang (Collaborator) commented Jul 23, 2023

Currently, the average carbon intensity for a job is estimated over a time interval spanning $N + 1$ data points, where $N$ is the ceiling of the ratio between the job duration and the timestep between data points. Moreover, cats approximates the job start time as the time of the first available data point. For carbonintensity.org.uk, this is the previous top of the hour or half hour.

example:
The current time is 12:48 and my job is 194 minutes long. The job start time will be approximated as 12:30, and the job duration will be approximated as $210 min = 7 \times 30 min$, where $30 min$ is the interval between two data points.
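The rounding described above can be sketched as follows (an illustrative snippet, not the actual cats code; `approximate_window` is a hypothetical helper name):

```python
import math
from datetime import datetime

def approximate_window(now, duration_mins, step_mins=30):
    # snap the start back to the previous data point (top of the hour or half hour)
    snapped = now.replace(minute=(now.minute // step_mins) * step_mins,
                          second=0, microsecond=0)
    # round the duration up to a whole number of 30-minute steps
    n_steps = math.ceil(duration_mins / step_mins)
    return snapped, n_steps * step_mins

start, duration = approximate_window(datetime(2023, 7, 23, 12, 48), 194)
# start is 12:30, duration is 210 minutes (7 x 30 min)
```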

The approach followed in the PR is similar to changes included in #43 , but the implementation fits within the current structure by only enhancing the WindowedForecast class. These changes are covered by a new test test_average_intensity_with_offset.

implementation:
The average carbon intensity over a given time window is estimated by summing the intensity values at the midpoints between each pair of consecutive data points in the window. The first midpoint (midpt[0]) is overridden to account for the fact that the first point (used to compute midpt[0]) should be located at the job start time. The corresponding intensity value is interpolated between the first and second data points. Similarly, the last midpoint (midpt[-1]) is overridden to account for the fact that the last point should be located at the job end time. The corresponding intensity value is interpolated between the penultimate and last data points in the window.

For a given candidate window, the first (last) data point is interpolated using the available data point directly preceding (following) the start (end) of the job.
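The endpoint interpolation works like ordinary linear interpolation between the two bracketing data points. A minimal sketch (`interp` is a hypothetical helper, not a function from cats/forecast.py):

```python
from datetime import datetime

def interp(before, after, t):
    # before and after are (datetime, value) pairs with before[0] <= t <= after[0];
    # the value at t is read off the straight line joining the two points
    frac = (t - before[0]) / (after[0] - before[0])
    return before[1] + frac * (after[1] - before[1])

value = interp((datetime(2023, 1, 1, 13, 0), 40.0),
               (datetime(2023, 1, 1, 13, 30), 50.0),
               datetime(2023, 1, 1, 13, 18))
# 40 + (18/30) * 10 = 46.0
```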

edit 2023-07-24 13:20: Changed the implementation to handle short jobs correctly.

@tlestang (Collaborator, Author)

Looks like this only works with Python 3.10+, because I'm relying on being able to pass a sorting key to bisect. Probably worth working around this to support 3.9 as well.
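For context, the `key` argument to the bisect functions was only added in Python 3.10; a common workaround on 3.9 is to bisect a precomputed list of keys instead. A sketch with illustrative data (not from the cats codebase):

```python
import bisect

data = [(1, "a"), (3, "b"), (5, "c")]  # sorted by first element

# Python 3.10+ allows: bisect.bisect_left(data, 4, key=lambda d: d[0])
keys = [d[0] for d in data]            # works on 3.9 and earlier
i = bisect.bisect_left(keys, 4)
# i is the index where a record with key 4 would be inserted
```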

@andreww (Collaborator) left a comment

Changes all look good to me. One question - what happens if somebody submits a task that lasts less than 30 mins? For now, we maybe just want a test case (and I hope it 'just works'). In principle I guess we could start fitting some smooth function to the CI data, but I don't think we want to go there.

This allows handling of short jobs that fit between two consecutive
data points.  It also makes the implementation simpler, in my opinion.
@tlestang (Collaborator, Author) commented Jul 24, 2023

what happens if somebody submits a task that lasts less than 30 mins?

Well that's a very good question, and I don't think this case was handled correctly.

In response to this I changed the implementation slightly so that it 'just works' for short jobs; in other words, they shouldn't be a corner case anymore. I'm adding a new test with a 6-minute job to check that this is correct.

Details: instead of computing the mid-points over the (over-estimated) window and then fixing them, it now builds the window, interpolating both ends, and computes the midpoints as if nothing happened. Besides handling short jobs naturally, I think this actually makes the code a lot easier to understand.

Example: You run cats at 12:48 for a 6-minute job. The second candidate window is 13:18 to 13:24, located between the data points at 13:00 (data[1]) and 13:30 (data[2]). Both carbon intensity values, at 13:18 and 13:24, are interpolated by drawing a straight line between the data points at 13:00 and 13:30.

I guess we could start fitting some smooth function to the CI data

I think linear interpolation is enough. If you look at the carbon intensity timeseries, it's already very smooth. In other words the signal doesn't exhibit large variations within 30 min. I guess forecast providers provide data points at an interval ensuring the timeseries is well resolved.

(Review thread on cats/forecast.py — outdated, resolved)
@colinsauze (Member)

Example: You run cats at 12:48 for a 6-minute job. The second candidate window is 13:18 to 13:24, located between the data points at 13:00 (data[1]) and 13:30 (data[2]). Both carbon intensity values, at 13:18 and 13:24, are interpolated by drawing a straight line between the data points at 13:00 and 13:30.

I guess we could start fitting some smooth function to the CI data

I think linear interpolation is enough. If you look at the carbon intensity timeseries, it's already very smooth. In other words the signal doesn't exhibit large variations within 30 min. I guess forecast providers provide data points at an interval ensuring the timeseries is well resolved.

I'd agree that linear interpolation is good enough here. Getting things within the right 30 minute period will achieve what we need in terms of emissions minimisation.

Part of me is tempted to say the best strategy for sub 30 minute jobs is to randomly determine when they run within the lowest 30 minute window available. This way if many people were launching small jobs they wouldn't all try to run at the same time (even when distributed across multiple unrelated systems).
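The randomised-start idea could look something like this (purely illustrative, not part of cats; `random_start` is a hypothetical helper):

```python
import random
from datetime import datetime, timedelta

def random_start(window_start, window_mins, job_mins):
    # pick a uniformly random start inside the lowest-intensity window,
    # leaving enough slack for the job to finish within the window
    slack = max(window_mins - job_mins, 0)
    return window_start + timedelta(minutes=random.uniform(0, slack))

s = random_start(datetime(2023, 1, 1, 13, 0), 30, 6)
# s falls somewhere between 13:00 and 13:24
```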

@tlestang (Collaborator, Author)

Part of me is tempted to say the best strategy for sub 30 minute jobs is to randomly determine when they run within the lowest 30 minute window available. This way if many people were launching small jobs they wouldn't all try to run at the same time (even when distributed across multiple unrelated systems).

Smart. I don't really want to add more features to this PR but that's probably a good starting point for a new one. Or an issue.

@colinsauze (Member)

Yes, it's definitely something for a different PR/issue and not critical for getting a minimum viable product ready.

@Llannelongue (Contributor) left a comment

Thanks for this. I've just run a small test that I had implemented in #43, and for some reason I don't get the expected result. It looks like lbound computes the wrong value here, but it may also well be that I ran the test incorrectly for the current structure. Do you know why it fails?

import math
from datetime import datetime

from cats.forecast import CarbonIntensityPointEstimate, WindowedForecast

CI_forecast = [
    CarbonIntensityPointEstimate(datetime=datetime(2023,1,1,8,30), value=26),
    CarbonIntensityPointEstimate(datetime=datetime(2023,1,1,9,0), value=40),
    CarbonIntensityPointEstimate(datetime=datetime(2023,1,1,9,30), value=50),
    CarbonIntensityPointEstimate(datetime=datetime(2023,1,1,10,0), value=60),
    CarbonIntensityPointEstimate(datetime=datetime(2023,1,1,10,30), value=25),
]
wf = WindowedForecast(data=CI_forecast, duration=70, start=datetime(2023,1,1,9,15))
print(wf[0])
start_new = 45
end_new = 60 + (25-60)/30*25
expected_result_trapezoidal = ((start_new+50)*15/2 + (50+60)*30/2 + (60+end_new)*25/2)/70
print(expected_result_trapezoidal)
assert math.isclose(wf[0].value, expected_result_trapezoidal)

(I believe the integral between 9:15 and 10:25 should be 49.9.., with the left bound 45 and right bound 30.83..)

# second data point (index + 1) in the window. The ending
# intensity value is interpolated between the last and
# penultimate data points in the window.
window_start = self.start + index * self.data_stepsize
A Contributor commented:

One potential downside of doing all the calculations in the __getitem__ method is that it requires redoing calculations every time (I think?): as in, min already calculates all windows, but then w[0] does it again, etc. An alternative would be to calculate all windows once, store them in self, and only access them after that. Or are there performance benefits to doing everything here?

Reply from the Collaborator (Author):

Overall I think performance should not be a concern here. Even then, only the first window is computed twice. But yes, if you were to run min twice, you'd compute all windows twice.
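The caching alternative discussed above can be sketched like this (illustrative names, not the cats implementation): do the per-window work once in __init__ and let __getitem__ simply index the stored results.

```python
class Windows:
    def __init__(self, values):
        # the expensive per-window calculation is done once, up front
        self._cache = [self._compute(v) for v in values]

    @staticmethod
    def _compute(value):
        return value * 2  # stand-in for the real window averaging

    def __getitem__(self, index):
        # min(...) and w[0] now reuse the same stored results
        return self._cache[index]

    def __len__(self):
        return len(self._cache)

w = Windows([3, 1, 2])
# w[0] == 6; min(w._cache) == 2 without recomputation
```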

@tlestang (Collaborator, Author)

Thanks for testing @Llannelongue, because I think you've exposed a mistake! In __getitem__,

        return CarbonIntensityAverageEstimate(
            start=window_start,
            end=window_end,
            value=sum(midpt) / (self.ndata - 1),
        )

dividing by self.ndata - 1 assumes all midpoints are regularly spaced: true for the inner points, but not for the first and last ones, which are based on interpolated data points. I've also made this assumption in the tests, I believe... 🤕

tlestang and others added 4 commits July 27, 2023 11:34
Inner midpoints are separated by the same timestep, but the first (interpolated) and second points are not. The same goes for the penultimate and last (interpolated) points.
@tlestang (Collaborator, Author)

I changed the average calculation to account for the difference in weights. @Llannelongue I added a test case based on your test above.

I believe the integral between 9:15 and 10:25 should be 49.9.., with the left bound 45 and right bound 30.83..

Agreed. I think it is now ;)
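The corrected weighting amounts to a plain trapezoidal rule over irregularly spaced points. A sketch reproducing the review's 9:15–10:25 example (assumes a piecewise-linear signal; not the actual cats code):

```python
def trapezoidal_average(points):
    """Time-weighted mean of (minutes, value) pairs via the trapezoidal rule."""
    area = sum((v1 + v2) / 2 * (t2 - t1)
               for (t1, v1), (t2, v2) in zip(points, points[1:]))
    return area / (points[-1][0] - points[0][0])

# window 9:15-10:25 as minutes from the start, with interpolated endpoints:
# left bound 45, data points 50 and 60, right bound 60 + (25-60)/30*25 = 30.83..
pts = [(0, 45.0), (15, 50.0), (45, 60.0), (70, 60 + (25 - 60) / 30 * 25)]
avg = trapezoidal_average(pts)
# avg is approximately 49.97, matching the 49.9.. quoted above
```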

@sadielbartholomew sadielbartholomew removed their request for review August 1, 2023 14:01
@tlestang (Collaborator, Author) commented Aug 1, 2023

Thank you @andreww @colinsauze and @Llannelongue for reviewing this. It really helped make this better!

@tlestang tlestang merged commit f625fa2 into main Aug 1, 2023
3 checks passed
@tlestang tlestang deleted the adjust_integration_window branch August 1, 2023 14:18