Generate TimeOut errors with pandas.apply
- Free software: Apache Software License 2.0
- Documentation: https://pdtimeout.readthedocs.io.
- Define a pandas DataFrame
import pandas as pd

df = pd.DataFrame({'number': [1, 0.5, 0.2, 2, 0.3]}, index=[0, 2, 4, 6, 8])
Index | Number |
---|---|
0 | 1 |
2 | 0.5 |
4 | 0.2 |
6 | 2 |
8 | 0.3 |
- Define a function to apply on the DataFrame and set the `timeout` value
@timeout(4)
def sleep_and_triple(row):
    number_ = row['number']
    time.sleep(number_)
    return number_ * 3
This function first sleeps for `number` seconds and then returns the triple of the input value.
Since the highest number is 2, a timeout value of 4 should not trigger a TimeOut error.
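
One way a decorator of this kind could enforce a time limit is sketched below. This is not pdtimeout's implementation, which may work differently: `simple_timeout` is a hypothetical name, the 0.5 second limit is chosen only to keep the example quick, and plain dicts stand in for DataFrame rows so the sketch runs without pandas. The call runs in a worker thread, and the wrapper stops waiting once the limit has passed.

```python
import functools
import threading
import time


def simple_timeout(seconds):
    """Hypothetical stand-in for a timeout decorator (not pdtimeout's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except BaseException as exc:  # hand errors back to the caller
                    outcome["error"] = exc

            # Daemon thread: an abandoned call cannot keep the interpreter alive.
            worker = threading.Thread(target=target, daemon=True)
            worker.start()
            worker.join(seconds)  # wait at most `seconds`
            if worker.is_alive():
                raise TimeoutError("Time expired")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]
        return wrapper
    return decorator


@simple_timeout(0.5)
def sleep_and_triple(row):
    number_ = row["number"]
    time.sleep(number_)
    return number_ * 3


fast_result = sleep_and_triple({"number": 0.1})  # finishes within the limit

try:
    sleep_and_triple({"number": 2})  # sleeps longer than the limit
    timed_out = False
except TimeoutError:
    timed_out = True
```

A limitation of this thread-based approach is that the slow call is only abandoned, not stopped: it keeps running in the background until it finishes or the program exits.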
- Apply function on DataFrame
df['result'] = df.apply(sleep_and_triple, axis=1)
print(df)
Index | Number | Result |
---|---|---|
0 | 1 | 3 |
2 | 0.5 | 1.5 |
4 | 0.2 | 0.6 |
6 | 2 | 6 |
8 | 0.3 | 0.9 |
- Change the timeout value to 1.7 seconds and re-apply the function on the DataFrame
@timeout(1.7)
def sleep_and_triple(row):
    number_ = row['number']
    time.sleep(number_)
    return number_ * 3

df['result'] = df.apply(sleep_and_triple, axis=1)
print(df)
The following TimeOut error is triggered:
>>> "TimeoutError: ('Time expired', 'occurred at index 6')"
The row index (pandas `.loc`) of the row triggering the TimeOut error is given in the error message.
The row with index 6 sleeps for 2 seconds, which is longer than the timeout value, so the error is triggered.
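
The reasoning can be checked with a little arithmetic: a row should time out only when its sleep duration exceeds the limit. The sketch below uses plain lists that mirror the example DataFrame; `rows_over_limit` is a hypothetical helper, not part of the package.

```python
index = [0, 2, 4, 6, 8]
numbers = [1, 0.5, 0.2, 2, 0.3]


def rows_over_limit(limit):
    """Return the index labels whose sleep time exceeds `limit` seconds."""
    return [idx for idx, n in zip(index, numbers) if n > limit]


print(rows_over_limit(4))    # → []
print(rows_over_limit(1.7))  # → [6]
```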
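- Lower the timeout value to 1 second and set `replace_value`: rows that reach the timeout receive this value instead of raising an error
@timeout(1, replace_value='TimeOut')
def sleep_and_triple(row):
    number_ = row['number']
    time.sleep(number_)
    return number_ * 3

df['result'] = df.apply(sleep_and_triple, axis=1)
print(df)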
Index | Number | Result |
---|---|---|
0 | 1 | TimeOut |
2 | 0.5 | 1.5 |
4 | 0.2 | 0.6 |
6 | 2 | TimeOut |
8 | 0.3 | 0.9 |
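
The replace-instead-of-raise behaviour shown in the table can be pictured as a wrapper that catches the `TimeoutError` and returns a placeholder. The sketch below is an assumption about the general idea, not the package's code: `replace_on_timeout` and `triple_or_expire` are hypothetical names, and the timeout is simulated (no real waiting) for rows whose value is 1 or more.

```python
import functools


def replace_on_timeout(replace_value):
    """Hypothetical wrapper: return `replace_value` when a TimeoutError occurs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except TimeoutError:
                return replace_value
        return wrapper
    return decorator


@replace_on_timeout("TimeOut")
def triple_or_expire(row):
    # Simulated timeout: rows that would sleep 1 second or more "expire".
    if row["number"] >= 1:
        raise TimeoutError("Time expired")
    return row["number"] * 3


rows = [{"number": n} for n in [1, 0.5, 0.2, 2, 0.3]]
results = [triple_or_expire(row) for row in rows]
```

Because the placeholder is a string while the other results are floats, a pandas column holding such mixed values has `object` dtype.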
The `time_apply` decorator can be used to monitor the execution time of each row as follows:
@time_apply()
def sleep_and_triple(row):
    number_ = row['number']
    time.sleep(number_)
    return number_ * 3

df['result'] = df.apply(sleep_and_triple, axis=1)
print(df)
Index | Number | Result |
---|---|---|
0 | 1 | 1.000667 |
2 | 0.5 | 0.500193 |
4 | 0.2 | 0.205290 |
6 | 2 | 2.005164 |
8 | 0.3 | 0.301278 |
The returned value (`number * 3`) is replaced by the execution time, in seconds, for the associated row.
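
A timing decorator with this behaviour could look roughly like the sketch below. It is an illustration under assumptions, not the package's implementation: `simple_time_apply` is a hypothetical name, it is applied without parentheses (unlike `time_apply()` above), and a plain dict stands in for a DataFrame row.

```python
import functools
import time


def simple_time_apply(func):
    """Hypothetical sketch: return the call's duration instead of its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        func(*args, **kwargs)  # the function's own return value is discarded
        return time.perf_counter() - start
    return wrapper


@simple_time_apply
def sleep_and_triple(row):
    number_ = row["number"]
    time.sleep(number_)
    return number_ * 3


elapsed = sleep_and_triple({"number": 0.05})
print(f"{elapsed:.6f}")  # slightly more than the requested sleep time
```

The small excess over the sleep duration, also visible in the table above, comes from the overhead of the call and of `time.sleep` itself.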
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.