Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purge bad Yahoo data from requests_cache #1204

Draft
wants to merge 8 commits into
base: dev
Choose a base branch
from

Conversation

ValueRaider
Copy link
Collaborator

@ValueRaider ValueRaider commented Nov 26, 2022

Improvements to requests_cache session use to encourage users use.

- Replace 'datetime.now()' with midnight

Some financial data was being fetched using parameters derived from datetime.now(), so obviously will never get reused in session cache. Replace with midnight <- Moved to separate PR #1284

  • Purge bad Yahoo data

When Yahoo returns garbage data, then automatically remove from session cache. Garbage = parsing fails to find expected keys etc. Can disable this if e.g. debugging problem, via yfinance.disable_prune_session_cache(). @fredrik-corneliusson did I miss any fetches?

  • Update README

Copy link
Contributor

@fredrik-corneliusson fredrik-corneliusson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great with documentation of how to use session, but not enthusiastic about the extra layer of added complexity when it should be up to the user to handle the session cache.
Most stuff there with handling errors can be done by the actual owner of the cache with request hooks.
If yfinance adds this support of only caching successful requests it should probably also be done by registering request hooks (documented at requests_cache site) in order to not complicate the class.

tests/ticker.py Outdated Show resolved Hide resolved
yfinance/base.py Outdated
@@ -147,7 +147,9 @@ def history(self, period="1mo", interval="1d",
return utils.empty_df()

if end is None:
end = int(_time.time())
dt_now = _pd.Timestamp(_datetime.datetime.utcnow()).tz_localize("UTC")
midnight = _pd.Timestamp.combine(dt_now.tz_convert(tz).date()+_datetime.timedelta(days=1), _datetime.time(0)).tz_localize(tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would caching it as being data up to next day make the cache return old data?
For example, I do a request for daily data before the exchange close then it will cache data missing the days data.
Then later that day after close when ticker data for that day is present the old cached data will be returned as long as it is done with the same session object.
Tip: I think the easiest way to get midnight is to use Timestamps floor method:

pd.Timestamp.utcnow().tz_convert("CET").floor("D")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem equally applies to following: Ticker.history(period="1d") Here user must evict old data, via requests_cache expiration.

Good suggestion on midnight.

@ValueRaider
Copy link
Collaborator Author

ValueRaider commented Nov 29, 2022

The pruning is targeting situations when the request itself was successful, and what failed was the subsequent processing in yfinance because Yahoo embedded incomplete/wrong/no data. This happens when long lists of tickers trigger their spam system e.g. #1087

So I'm thinking if those heavy users would be reluctant to use a session cache.

@ValueRaider
Copy link
Collaborator Author

I've moved the checking/pruning into a hook function.

@asafravid
Copy link
Collaborator

Nice feature!

@ValueRaider ValueRaider changed the title Improve session caching to encourage use Purge bad Yahoo data from requests_cache Feb 7, 2023
@ValueRaider ValueRaider marked this pull request as draft October 19, 2023 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants