-
Hey @ghill2 how are you creating the Ticks? Are you loading from the catalog or parquet? Or are you parsing them directly from some raw data (CSV etc.)? There are a couple of things we can do here, but I'm keen to hear a little more about how you're loading the data, if you don't mind sharing? Also, how long are we actually talking?
-
Apologies for the lack of detail in my initial post!

```python
from time import perf_counter

import pandas as pd

from nautilus_trader.backtest.data.providers import TestInstrumentProvider
from nautilus_trader.backtest.data.wranglers import QuoteTickDataWrangler
from nautilus_trader.model.identifiers import Venue

start_date = pd.Timestamp("2019-01-01", tz="UTC").to_pydatetime()
end_date = pd.Timestamp("2020-01-01", tz="UTC").to_pydatetime()

instrument = TestInstrumentProvider.default_fx_ccy("EUR/USD", venue=Venue("SIM"))

ticks_df = pd.read_feather("/data/EURUSD-2019-T1.feather")
print(ticks_df.dtypes)
ticks_df.set_index("date", inplace=True)
print(ticks_df)

wrangler = QuoteTickDataWrangler(instrument=instrument)

start = perf_counter()
tick_objs = wrangler.process(ticks_df)
stop = perf_counter()
print(f"Elapsed time {stop - start} secs")
```

The `date` column is `datetime64[ns]` and the frame is `[29168849 rows x 3 columns]`.
-
Hey @ghill2 - thanks so much for the detailed example! I'll grab some similar data and do some performance testing. I think saving as parquet will provide some speed-ups, but we need to spend a little time looking at this; give us a couple of days and we'll come back with some more details.
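For anyone wanting to try the parquet route in the meantime, here is a minimal sketch of the one-off conversion and a timed reload, assuming the same feather file as the example above (paths are hypothetical). Note this only speeds up the file I/O; the object-creation cost inside `wrangler.process()` is unchanged.

```python
# One-off conversion from feather to parquet, then a timed reload.
# Requires pyarrow (or fastparquet) to be installed.
from time import perf_counter

import pandas as pd

ticks_df = pd.read_feather("/data/EURUSD-2019-T1.feather")
ticks_df.to_parquet("/data/EURUSD-2019-T1.parquet")  # run once

start = perf_counter()
ticks_df = pd.read_parquet("/data/EURUSD-2019-T1.parquet")
print(f"Parquet load: {perf_counter() - start:.2f} secs")
```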
-
Also interested in any solutions or ideas for this.
-
I had a bit of a look at this, but there's nothing obvious here other than that creating objects (classes) is pretty slow in Python (even though we're actually doing this in Cython). You might see some speed-up by caching the objects with pickle and reusing them (though in my tests it didn't seem to be a huge improvement). The next steps here would be refactoring the class creation and using some more Cython tricks (memory pools or other optimisations), or storing this data in parquet and streaming it up via the Cython parquet API in a background process. This will still involve pickling the objects, which I think will still be a pretty big blocker. @cjdsellers may have some more thoughts on this.
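The pickle caching mentioned above might look something like this: wrangle once, then reload the object list on subsequent runs. The cache path is hypothetical, and `wrangler` and `ticks_df` are from the earlier example.

```python
# Cache the wrangled tick objects so later runs skip wrangler.process().
# As noted above, unpickling still pays a per-object cost, so the win is modest.
import pickle
from pathlib import Path

cache = Path("/data/EURUSD-2019-T1.ticks.pkl")

if cache.exists():
    with cache.open("rb") as f:
        tick_objs = pickle.load(f)
else:
    tick_objs = wrangler.process(ticks_df)  # the slow path, run once
    with cache.open("wb") as f:
        pickle.dump(tick_objs, f, protocol=pickle.HIGHEST_PROTOCOL)
```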
-
@limx0 has given a good summary of where we're at here. I think we would see the best improvements by leveraging a lower-level parquet API, and possibly bypassing the standard Python class initialization. I'm currently investigating re-writing some of the core objects using the PyO3 Rust bindings for Python. I don't have any hard figures showing this would be faster than Cython, however. Anecdotally, others have seen a 3x speedup of PyO3 over Cython, but it all depends on what is being measured and compared to. Another idea is chunking and parallelizing the object creation using some multiprocessing (see the sketch below). So in any case, @limx0 and I are looking at the parquet Cython API nearer term, which could pay off performance-wise for object creation. More medium term, expect to see some Rust coming into the codebase.
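A hedged sketch of that chunk-and-parallelize idea, assuming the wrangler setup from the earlier example. Each worker builds its own instrument and wrangler, and the resulting objects are pickled back to the parent process, which (per the previous comment) may eat much of the gain.

```python
# Split the tick frame across worker processes, wrangle each chunk,
# then flatten the per-chunk lists back into one ordered list.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

from nautilus_trader.backtest.data.providers import TestInstrumentProvider
from nautilus_trader.backtest.data.wranglers import QuoteTickDataWrangler
from nautilus_trader.model.identifiers import Venue


def process_chunk(chunk):
    # Each worker process builds its own wrangler state
    instrument = TestInstrumentProvider.default_fx_ccy("EUR/USD", venue=Venue("SIM"))
    wrangler = QuoteTickDataWrangler(instrument=instrument)
    return wrangler.process(chunk)


def parallel_process(ticks_df, workers=4):
    chunks = np.array_split(ticks_df, workers)  # preserves row order per chunk
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)
    return [tick for chunk_ticks in results for tick in chunk_ticks]
```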
-
Hi @cjdsellers and @limx0, For my information, once this list of tick objects is built, would it be possible to create them 'on the fly' as the backtest consumes them instead? Could a single quote tick instance be reused and updated in place, rather than instantiating millions of objects? And might a map keyed by timestamp work better than a list here? Thanks for your help,
-
These are all good thoughts. Actually, the initial versions of the platform generated the data objects 'on the fly', similar to your suggestions. Something to bear in mind is that right now the design is very simple: all data objects are built up front and then sorted, which has made development and debugging much easier. Delaying the instantiations and sorting 'on the fly' could introduce a host of bugs; in fact we've seen some previously with this method.

The problem with the idea of a single quote tick instance is that there can be thousands of quote ticks existing in a running system, queued up in caches, indicators, bar builders etc., so we can't simply hold one reference and update the backing struct in C. However, this is possibly a good path: hold only the number of total quote ticks the system needs at any one time, and recycle the objects by simply reassigning the backing struct members (this would have to be coded extremely carefully though, and departs from the simple design mentioned above). Object pools and recycling are a common pattern in high-performance OO systems, so this deserves further thought and investigation (see the illustration below). Now that the platform is becoming more stable and mature, introducing complexity for the trade-off of performance gains could start to be worth it.

Your thoughts on list vs map are interesting, however we need to know all of the timestamps up front so we can sort the data stream in order (the stream can, and often does, include any class which inherits `Data`).

If you're able to run any performance tests and show improvements, then we're more than willing to merge any improvements you might be able to make at the wrangler level.
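For illustration only, here is the pool-and-recycle pattern in plain Python, using a hypothetical mutable tick type; the real thing would live in Cython/C and reassign the backing struct members rather than Python attributes.

```python
# Toy object pool: acquire() reuses a pre-built instance and overwrites its
# fields in place, so no new objects are created on the hot path.
class MutableTick:
    __slots__ = ("bid", "ask", "ts")

    def __init__(self):
        self.bid = self.ask = self.ts = 0


class TickPool:
    def __init__(self, size):
        self._free = [MutableTick() for _ in range(size)]  # built up front

    def acquire(self, bid, ask, ts):
        tick = self._free.pop()  # recycle an existing instance
        tick.bid, tick.ask, tick.ts = bid, ask, ts
        return tick

    def release(self, tick):
        self._free.append(tick)  # hand it back once every consumer is done
```

The hard part, as noted above, is knowing when every consumer (cache, indicator, bar builder) has finished with an instance so it can safely be released back to the pool.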
-
I really appreciate the input on this @cjdsellers @limx0 @yohplala. I agree about the slow Python object initialization. My test showed that 5.5 min (78%) of the total test time is spent calling the Python Decimal object init method (the _value attribute of the BaseDecimal class). Will using the low-level pyarrow API avoid this? Also, what are your thoughts on initializing the Python decimal objects in C or C++ instead? Would this approach help?
I understand this approach doesn't avoid every init method, but from my understanding the only way to avoid all of them would be (like you suggested) to re-code Quantity, Price and BaseDecimal in Rust so they can be initialized using faster calls and with parallelization.
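For reference, a profile like the one described can be taken roughly like this, assuming `wrangler` and `ticks_df` from the earlier example:

```python
# Profile the wrangling step; with the numbers above, Decimal.__init__
# should dominate the cumulative-time listing.
import cProfile
import pstats

cProfile.runctx("wrangler.process(ticks_df)", globals(), locals(), "wrangle.prof")
pstats.Stats("wrangle.prof").sort_stats("cumulative").print_stats(20)
```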
-
My current plan is to replace the internal `Decimal` value with a Rust-backed fixed-point type. Then PyO3 can expose this to Python (keeping the same API), and we avoid the overhead of that `Decimal` initialization.
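To illustrate the fixed-point idea in plain Python (the actual plan is a Rust struct exposed via PyO3, and these names are hypothetical): construction becomes a couple of integer operations instead of a `decimal.Decimal` parse.

```python
# Toy fixed-point value: a raw integer plus a precision, e.g. raw=112345
# with precision=5 represents 1.12345. Creating one is just attribute
# assignment, with no decimal string parsing on the hot path.
class FixedPoint:
    __slots__ = ("raw", "precision")

    def __init__(self, raw: int, precision: int):
        self.raw = raw
        self.precision = precision

    @classmethod
    def from_str(cls, value: str, precision: int):
        # Sketch only: a real implementation would parse digits exactly
        # rather than round-tripping through float.
        return cls(round(float(value) * 10**precision), precision)

    def as_float(self) -> float:
        return self.raw / 10**self.precision
```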
-
As I work on checking my strategy rules, it takes a considerable amount of time to initialize Tick objects before adding them to the engine (29,168,849 items, 1 yr of data).
I understand initializing 29,168,849 items is not a light task and should take some time. However, what are your thoughts on how to speed up the Tick object creation, such as bypassing expensive Python function calls or creating the objects at the C level in separate threads?
I am aware of the catalog's streaming functionality, but I'm trying to avoid adding time to the backtest.
Thank you,
George