Inconsistent results between two successive runs #1982

Open
KamarajuKusumanchi opened this issue Jul 13, 2024 · 5 comments

Comments

@KamarajuKusumanchi

Describe bug

The OHLC data returned by yf.download() is not reproducible across runs. For example, if I download OHLC data for SPY twice, some of the values differ slightly between the two downloads. Ideally, they should be identical.

Simple code that reproduces your problem

Consider the following code. It downloads daily OHLC data for SPY and writes it to a file.

 % cat sp500_daily_ohlc.py 
import yfinance as yf
from datetime import datetime, date

df = yf.download(["SPY"], date(2023, 1, 1), date(2024, 7, 2))

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

Run the code twice

 % python sp500_daily_ohlc.py
[*********************100%%**********************]  1 of 1 completed
writing data into daily_20240713_193451.csv
 % python sp500_daily_ohlc.py
[*********************100%%**********************]  1 of 1 completed
writing data into daily_20240713_193457.csv

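Ideally, these two files should be identical, but they are not. In the rows shown below, only the Adj Close column (the second-to-last field) differs; Open, High, Low, Close and Volume match.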

 % diff daily_20240713_193451.csv daily_20240713_193457.csv | wc -l
466
 % diff daily_20240713_193451.csv daily_20240713_193457.csv | head -n 20
2,6c2,6
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700
< 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.34478759765625,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604736328125,104189600
< 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451721191406,73978100
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542419433594,74850700
> 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319885253906,85934100
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
> 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451416015625,73978100
8,10c8,10
< 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.1429138183594,68881100
< 2023-01-12,396.6700134277344,398.489990234375,392.4200134277344,396.9599914550781,388.5523986816406,90157700
< 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.0597839355469,63903900
---
> 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.14288330078125,68881100
> 2023-01-12,396.6700134277344,398.489990234375,392.4200134277344,396.9599914550781,388.55242919921875,90157700
> 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.059814453125,63903900
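Because `diff` flags any change in the last bits of a float, a tolerance-based comparison makes it easier to see which columns actually move between runs. Below is a minimal pandas sketch using the 2023-01-03 row from each run, copied from the first hunk of the diff above. It assumes the CSV header row is `Date,Open,High,Low,Close,Adj Close,Volume` (the diff starts at line 2, so line 1 is not shown), and `column_drift` is a hypothetical helper, not part of yfinance.

```python
import io

import pandas as pd

# Assumed header row: the diff above starts at line 2, so line 1 is not shown.
HEADER = "Date,Open,High,Low,Close,Adj Close,Volume\n"

# The 2023-01-03 row from each run, copied from the diff above.
RUN_1 = HEADER + "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700\n"
RUN_2 = HEADER + "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542419433594,74850700\n"


def column_drift(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    """Largest absolute difference per column, over the dates and columns both frames share."""
    a, b = a.align(b, join="inner")
    return (a - b).abs().max()


df1 = pd.read_csv(io.StringIO(RUN_1), index_col="Date", parse_dates=True)
df2 = pd.read_csv(io.StringIO(RUN_2), index_col="Date", parse_dates=True)

drift = column_drift(df1, df2)
print(drift)

# Columns that changed at all between the two runs
changed = drift[drift > 0].index.tolist()
print(changed)  # ['Adj Close']
```

For the real files, replace the in-memory strings with the CSV paths, e.g. `pd.read_csv("daily_20240713_193451.csv", index_col="Date", parse_dates=True)`.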

Debug log

Code with debug mode enabled

 % cat sp500_daily_ohlc.py 
import yfinance as yf
from datetime import datetime, date

yf.enable_debug_mode()

df = yf.download(["SPY"], date(2023, 1, 1), date(2024, 7, 2))

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

Output on the first run

 % python sp500_daily_ohlc.py
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       SPY: Yahoo GET parameters: {'period1': '2023-01-01 00:00:00-05:00', 'period2': '2024-07-02 00:00:00-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/SPY
DEBUG        params=frozendict.frozendict({'period1': 1672549200, 'period2': 1719892800, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'})
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          loaded persistent cookie
DEBUG          reusing cookie
DEBUG          crumb = 'tz63UYSUdiS'
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       SPY: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-01 13:30:00
DEBUG       SPY: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-01 09:30:00-04:00
DEBUG       SPY: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG       SPY: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
writing data into daily_20240713_194042.csv

Output from the second run

 % python sp500_daily_ohlc.py
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       SPY: Yahoo GET parameters: {'period1': '2023-01-01 00:00:00-05:00', 'period2': '2024-07-02 00:00:00-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/SPY
DEBUG        params=frozendict.frozendict({'period1': 1672549200, 'period2': 1719892800, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'})
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          loaded persistent cookie
DEBUG          reusing cookie
DEBUG          crumb = 'tz63UYSUdiS'
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       SPY: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-01 13:30:00
DEBUG       SPY: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-01 09:30:00-04:00
DEBUG       SPY: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG       SPY: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
writing data into daily_20240713_194050.csv
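Both debug logs above show identical request parameters (same URL, `period1`, `period2` and crumb), which suggests the variation originates in Yahoo's response rather than in the request yfinance builds. As a sanity check, `period1` and `period2` are simply the start and end dates at midnight in the UTC offsets printed on the "Yahoo GET parameters" line (-05:00 in January, -04:00 in July). A standard-library sketch reproduces both numbers:

```python
from datetime import datetime, timedelta, timezone

# UTC offsets copied from the "Yahoo GET parameters" debug line above.
EST = timezone(timedelta(hours=-5))  # 2023-01-01 00:00:00-05:00
EDT = timezone(timedelta(hours=-4))  # 2024-07-02 00:00:00-04:00

# Unix epoch seconds for midnight local time on the start and end dates.
period1 = int(datetime(2023, 1, 1, tzinfo=EST).timestamp())
period2 = int(datetime(2024, 7, 2, tzinfo=EDT).timestamp())

print(period1, period2)  # 1672549200 1719892800
```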

Differences

 % diff daily_20240713_194042.csv daily_20240713_194050.csv | wc -l
452
 % diff daily_20240713_194042.csv daily_20240713_194050.csv | head -n 20
2c2
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.75421142578125,74850700
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700
4,5c4,5
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447265625,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604431152344,104189600
---
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
7c7
< 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.30755615234375,65358100
---
> 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.3075256347656,65358100
11,13c11,13
< 2023-01-17,398.4800109863281,400.2300109863281,397.05999755859375,397.7699890136719,389.3452453613281,62677300
< 2023-01-18,399.010009765625,400.1199951171875,391.2799987792969,391.489990234375,383.1982116699219,99632300
< 2023-01-19,389.3599853515625,391.0799865722656,387.260009765625,388.6400146484375,380.40869140625,86958900
---
> 2023-01-17,398.4800109863281,400.2300109863281,397.05999755859375,397.7699890136719,389.34521484375,62677300

Bad data proof

No response

yfinance version

0.2.40

Python version

3.12.3

Operating system

Debian GNU/Linux 12 (bookworm)

@ValueRaider
Collaborator

ValueRaider commented Jul 14, 2024

Try the branch in PR #1984 (how to run)

@KamarajuKusumanchi
Author

> Try the branch in PR #1984 (how to run)

I tried it, but it did not fix the issue. Consider the following code:

 % cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date

auto_adjust = True
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

I ran this twice. Then I changed auto_adjust to False:

 % cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date

auto_adjust = False
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

I ran this twice as well. The files are different across all four runs:

 % md5sum daily_auto_adjust_*.csv
f470fbd816e09be9037edef54a0f4e59  daily_auto_adjust_False_20240714_142500.csv
001d65b7fa991f4d3aa2c1b8a99cd12d  daily_auto_adjust_False_20240714_142503.csv
ee0e0a82eb6f7307e3a540f3cf30da26  daily_auto_adjust_True_20240714_142435.csv
3b1ff5053c992fcd5965230c96333dbd  daily_auto_adjust_True_20240714_142442.csv
 % diff daily_auto_adjust_False_20240714_142500.csv daily_auto_adjust_False_20240714_142503.csv | head -n 20
2,8c2,8
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700
< 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3446960449219,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604431152344,104189600
< 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451416015625,73978100
< 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.3075256347656,65358100
< 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.1428527832031,68881100
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700
> 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319580078125,85934100
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
> 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451110839844,73978100
> 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.30755615234375,65358100
> 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.14288330078125,68881100
10c10
< 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.0597839355469,63903900
---
> 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.059814453125,63903900
 % diff daily_auto_adjust_True_20240714_142435.csv daily_auto_adjust_True_20240714_142442.csv | head
2c2
< 2023-01-03,376.2290102149442,378.2453768730092,369.8275195343226,372.75421142578125,74850700
---
> 2023-01-03,376.2290410170059,378.2454078401519,369.827549812291,372.7542419433594,74850700
4,5c4,5
< 2023-01-05,373.635161717404,373.7526153349045,370.73786295793764,371.3447265625,76970500
< 2023-01-06,374.50629665205355,381.0056756304061,371.3740906518094,379.8604431152344,104189600
---
> 2023-01-05,373.63519242321297,373.7526460503659,370.73789342564294,371.3447570800781,76970500
> 2023-01-06,374.5063267394853,381.00570623999096,371.37412048760314,379.8604736328125,104189600
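With `auto_adjust=True`, Open, High and Low change between runs too, not just Close. The numbers above are consistent with auto-adjust scaling Open, High and Low by the ratio Adj Close / Close, so any jitter in Yahoo's Adj Close propagates into every price column. The check below uses the raw 2023-01-03 prices (identical in every `auto_adjust=False` run above) and the adjusted Close from `daily_auto_adjust_True_20240714_142435.csv`. `auto_adjust_row` is a hypothetical helper that only illustrates the ratio; it is not yfinance's source.

```python
# Raw (unadjusted) prices for 2023-01-03, identical in every auto_adjust=False run above.
RAW_OPEN = 384.3699951171875
RAW_HIGH = 386.42999267578125
RAW_LOW = 377.8299865722656
RAW_CLOSE = 380.82000732421875

# Adj Close that appears as "Close" in daily_auto_adjust_True_20240714_142435.csv.
ADJ_CLOSE = 372.75421142578125


def auto_adjust_row(open_, high, low, close, adj_close):
    """Illustrative only: scale O/H/L by adj_close / close, as the output above suggests."""
    ratio = adj_close / close
    return open_ * ratio, high * ratio, low * ratio, adj_close


adj_open, adj_high, adj_low, adj_close = auto_adjust_row(
    RAW_OPEN, RAW_HIGH, RAW_LOW, RAW_CLOSE, ADJ_CLOSE
)
# Compare with the first "<" row of the auto_adjust=True diff above.
print(adj_open, adj_high, adj_low)
```

Since the ratio is recomputed from Adj Close on every download, a change of a few units in the last place of Adj Close shows up in all four adjusted price columns, which matches the wider diffs in the `auto_adjust=True` files.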

@ValueRaider
Collaborator

You should have seen a new warning message.

@KamarajuKusumanchi
Author

> You should have seen a new warning message.

Yes, I see the warning message, but it does not help with reproducibility. For example,

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2))

gives the warning message, but neither

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=True)

nor

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=False)

gives reproducible results.

@ValueRaider
Copy link
Collaborator

ValueRaider commented Jul 14, 2024

It's not meant to fix reproducibility. Adj Close itself isn't consistent coming from Yahoo.
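If reproducible adjusted prices matter, one workaround is to keep the raw Close (identical across all the runs above) and apply the dividend adjustment locally, so the result depends only on inputs you have saved. Below is a minimal sketch, assuming the common convention that a cash dividend D with ex-date t scales every price before t by (1 - D / previous close). This is not confirmed to be Yahoo's exact formula, stock splits are ignored, and `adjust_for_dividends` is a hypothetical helper. The dividend Series could come from `yf.Ticker("SPY").dividends` after aligning it to the same date index as the price data; the example below uses made-up numbers.

```python
import pandas as pd


def adjust_for_dividends(close: pd.Series, dividends: pd.Series) -> pd.Series:
    """Back-adjust a close series for cash dividends (splits are not handled).

    Assumed convention, not confirmed to match Yahoo's Adj Close: a dividend D
    on ex-date t multiplies every close before t by (1 - D / close[t-1]);
    factors from several dividends compound.
    """
    # Dates without an entry mean "no dividend that day".
    dividends = dividends.reindex(close.index, fill_value=0.0)
    # Per-day multiplier; the first row has no previous close, so use 1.0.
    step = (1.0 - dividends / close.shift(1)).fillna(1.0)
    # Product of all multipliers from each row onward, shifted up one row so
    # the ex-date itself (and everything after it) is left unadjusted.
    factor = step.iloc[::-1].cumprod().iloc[::-1].shift(-1).fillna(1.0)
    return close * factor


# Hypothetical data: a 1.02 dividend going ex on the third day.
idx = pd.date_range("2024-03-13", periods=4, freq="D")
close = pd.Series([100.0, 102.0, 101.0, 103.0], index=idx)
divs = pd.Series([1.02], index=[idx[2]])

adjusted = adjust_for_dividends(close, divs)
print(adjusted)
```

The two rows before the ex-date are scaled by 1 - 1.02 / 102 = 0.99, and the later rows are unchanged. Because the computation is deterministic, running it twice on the same saved Close and dividend data produces identical output.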
