
raise error if not 200 status code #82

Closed
wants to merge 1 commit

Conversation

hunterowens
Collaborator

Currently, if get_status_changes or get_trips runs and gets back a non-200 status code, the __describe() method is called but doesn't actually raise a Python error, so the code continues to run.

This PR wraps the request block in a try/except that runs __describe and then raises an error, which should stop program execution.
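
In rough terms, the change looks something like the sketch below (names and structure are illustrative only, not the actual client code in this repo):

```python
import requests

def get_trips(url, headers=None):
    # illustrative sketch only; the real client builds the request differently
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        # describe the failure (roughly what __describe() reports today)...
        print(f"Requested {url}, Response Code: {response.status_code}")
        # ...then raise so the calling program actually stops
        raise requests.HTTPError(
            f"Unexpected status {response.status_code} for {url}", response=response
        )
    return response.json()
```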

@thekaveman
Contributor

I have mixed feelings about this. I see plenty of cases where a given page/request will time out with a 504 or similar but then the next page/request succeeds. When doing a large backfill, it's helpful to be able to pass over these errors, especially when the next backfill request may partially overlap the current one, such that the timeout/miss is handled "eventually". Here's a recent example from my logs:

Requesting trips from Lime
Time range: 2019-09-08T12:00:00-07:00 to 2019-09-09T12:00:00-07:00
Requested https://data.lime.bike/api/partners/v1/mds/santa_monica/trips?min_end_time=1567969200000&max_end_time=1568055600000, Response Code: 504

# more error output...

Validating trips @ 0.3.2
0 records, 0 passed, 0 failed
Skipping data load
trips complete
Requesting trips from Lime
Time range: 2019-09-08T00:00:00-07:00 to 2019-09-09T00:00:00-07:00
Got payload with 1000 trips
Got payload with 1000 trips
Got payload with 522 trips
Validating trips @ 0.3.2
2522 records, 2522 passed, 0 failed
Loading trips
No POSTGRES_HOST_PORT environment variable set, defaulting to: 5432
trips complete

The request for 2019-09-08T12:00:00-07:00 to 2019-09-09T12:00:00-07:00 failed but then the backfill kept going and the request for 2019-09-08T00:00:00-07:00 to 2019-09-09T00:00:00-07:00 succeeded.

For the backfill case, a try/except at the ETL level would allow for restarting/retrying/continuing. But if we raise in the middle of paging, there isn't really a clean way to continue with subsequent pages.
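
For illustration, handling it at the ETL level might look something like this (query_trips, load_trips, and time_ranges are placeholders, not functions from this library):

```python
import requests

def backfill(provider, time_ranges, query_trips, load_trips):
    # placeholder ETL loop: a failed window is logged and skipped rather than
    # aborting the whole backfill, since a later overlapping window may cover it
    for start, end in time_ranges:
        try:
            trips = query_trips(provider, start, end)
        except requests.HTTPError as err:
            print(f"trips request failed for {start} to {end}: {err}")
            continue
        load_trips(trips)
```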

@ian-r-rose
Contributor

Perhaps using a library like this one could allow for retrying API endpoints, while still allowing for non-ephemeral failures to propagate up to the system?
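
For example, with a retry library such as tenacity (whether or not that is the one linked above), transient 5xx responses could be retried a few times before the failure propagates:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# retry transient failures with exponential backoff; after the last attempt the
# original exception is re-raised so non-ephemeral failures still surface
@retry(reraise=True, stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=60))
def get_page(url, headers=None):
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # a 504 here triggers another attempt
    return response.json()
```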

@thekaveman
Contributor

Ah... let me back up, because I wasn't thinking clearly with my earlier comment. Obviously, if a given page throws an error, we wouldn't get anything back to build the next query, so that case is moot. It isn't so much the retry/backoff aspect I'm concerned about here - that could be handled by a user of this library with standard try/except semantics or a helper library like the one @ian-r-rose linked.

I think the core issue is that if we raise during any one page request, all previously successful pages are thrown away. So if pages 1...9 return data and then page 10 times out, we lose everything previously gathered. (Thinking especially of the Bird case, with potentially many pages each representing only a few minutes of the overall request window.)
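
One hypothetical way around that, sketched here with illustrative names only, is for the paging loop to keep what it has already fetched and hand back partial results when a later page fails, instead of discarding everything:

```python
def get_all_pages(first_url, get_page):
    # get_page is a placeholder for whatever performs the HTTP request
    pages, url = [], first_url
    while url:
        try:
            payload = get_page(url)
        except Exception as err:
            print(f"page request failed at {url}: {err}")
            break  # keep pages 1..N-1 instead of losing them with a raise
        pages.append(payload)
        url = payload.get("links", {}).get("next")  # MDS 0.3.x style next link
    return pages
```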

This is probably more an argument in favor of something like what is proposed/discussed in #13 and CityofSantaMonica/mds-provider-services#61 than anything else. On that note, @hunterowens, would you mind rebasing on the latest master? The client underwent some significant refactors since 5f1c51e (the commit this branch is based on).

@ian-r-rose
Contributor

> I think the core issue is that if we raise during any one page request, all previously successful pages are thrown away. So if pages 1...9 return data and then page 10 times out, we lose everything previously gathered. (Thinking especially of the Bird case, with potentially many pages each representing only a few minutes of the overall request window.)

I wonder if, in practice, using something like exponential backoff would be effective in mitigating failures like the ones you describe, @thekaveman. We would then only raise if the last attempt (maybe the 10th try; it could be configurable) failed. We would still lose prior pages, but maybe not very often, and a true failure-after-ten-retries would be something worth looking at.
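
A minimal version of that, with the attempt count configurable, might look like:

```python
import time
import requests

def get_with_backoff(url, max_attempts=10, base_delay=1.0):
    # keep retrying with exponentially growing delays; only the final failure is raised
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == max_attempts:
                raise  # a true failure-after-N-retries is worth looking at
            time.sleep(base_delay * 2 ** (attempt - 1))
```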

> This is probably more an argument in favor of something like what is proposed/discussed in #13 and CityofSantaMonica/mds-provider-services#61 than anything else. On that note, @hunterowens, would you mind rebasing on the latest master? The client underwent some significant refactors since 5f1c51e (the commit this branch is based on).

I don't think it would be a problem to move to an asynchronous execution model, though it probably wouldn't help much, since each page link is derived from the previous one, so the available concurrency is pretty low (unless you start to parallelize across different providers or different time windows).
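
To illustrate what that cross-provider or cross-window parallelism might look like (fetch_one and jobs are placeholders, not part of this library):

```python
from concurrent.futures import ThreadPoolExecutor

def backfill_concurrently(jobs, fetch_one, max_workers=4):
    # jobs: iterable of (provider, start, end); each provider's pages are still
    # fetched sequentially inside fetch_one, since every next link depends on
    # the previous response
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: fetch_one(*job), jobs))
```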

@thekaveman
Contributor

These changes would need to be rebased on the latest master. My points above may be less relevant now, given the MDS 0.4.x changes to static-file-backed status_changes and trips endpoints, which no longer support paging.

Suggest opening a new PR if changes are still desired.

@thekaveman closed this May 21, 2020