Proposal to add `sink` feature to data downloaders #71
Comments
I think current
And about the issue: changing signature into the form
From my point of view,
oh, what is the blueprint of this pkg? (And I never investigated it before)
Still not sure whether this kind of generic method is worth the cost.
Well, here are some more thoughts in support of this proposal.
and so on. Now, when a new author creates a package, there is usually a high barrier to including large external dependencies. I think it is much easier to convince a package maintainer to include a lightweight and stable interface that provides conversion abilities for their package than to convince them to include the full-fledged TimeSeries.jl, especially if they are not going to use it (some may prefer to work with DataFrames, others may prefer to roll their own formats). I see this proposal as a step towards uniting authors, so that they do not fracture the ecosystem. And MarketData.jl can serve as a good example of how to do things the right way.
I guess it's better to move this to a corresponding issue/Zulip, so as not to deviate from the main topic. But in short, there is no blueprint as a written document, apart from the original documentation: https://timestampsjl.readthedocs.io/en/latest/. It's going to change to some degree of course, but mainly you can think of it as a row-wise table. It has some drawbacks, but also provides huge advantages: https://julialang.zulipchat.com/#narrow/stream/282925-backtesting/topic/Timeseries.20format/near/232870653 I've used this approach in my implementation of a backtesting strategy (it's called TimedEvent, but it is absolutely the same thing as Timestamp):
By the way, that's one of the reasons why I proposed this change to MarketData.jl: it was much easier to convert CSV.Rows to Timestamps directly than by going through TimeArray.
I have made a small prototype that one can toy with, to get a feel for whether this approach is good or bad.
Ah, okay. I think your point is that you want an interface to bridge the gap between time series data structures and normal table-like data, right? (But I think the scope is quite a bit bigger than the topic in the original post :p) Hmm, I just recalled that I asked about a related issue (JuliaData/Tables.jl#40). Great, I'm going to try it out.
Ah, that's quite an interesting discussion. But after playing for some time with the "timeseries" interface, I do not think it is necessary to ask to add such functionality to
Well, I am more interested in gluing together time series data structures (current and future ones). And yes, it's bigger than the original topic, but at the same time it is a side effect of the original intention to have a "sink"-agnostic data source. It's rather tiresome to have different packages where each one invents its own way to store the resulting data.
@Arkoniak I'm still busy with TimeSeries.jl (JuliaStats/TimeSeries.jl#482). How about just going with your interface package (https://github.com/Arkoniak/ProtoMarketData.jl) for now?
I've been playing with the `yahoo` data source and one thing occurs to me: in its current implementation the user is locked into `TimeArray`. That's not always convenient; the user may prefer to work with other data formats: `DataFrames`, `Temporal`, or maybe some other custom format. What I am proposing is to give an interface like this:

Now, `SINK` can be anything: `DataFrame`, `TimeArray`, or whatever the user wants. We can emit `TimeArray` by default, for example, but that wouldn't limit the user.

In order to do that we can wrap `CSV.File` in a special structure which should conform to the `Tables.jl` protocol. The idea is that if we, for example, define `yahoo` as

then this function provides a `DataFrame` sink by default. For it to work with `TimeArray`, one only needs to implement a `csv -> TimeArray` interface, which can look like

and something similar for `Temporal`.

The problem with this direct approach is that it is very non-general. If in some other data source the datetime column is not located at the first position, it will break. So, we can do something smarter, like defining a structure
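As a rough illustration (not the code from the original post), such a structure might look like the sketch below. It borrows the name `TimeDataWrapper` and the `meta` field from this proposal; the field layout, the helper names `timestamps` and `valuenames`, and the sample data are all assumptions made up for the example, and it uses only Base and `Dates` so it runs without extra packages.

```julia
using Dates

# Hypothetical wrapper: `data` is any column-oriented table and `meta`
# records where the timestamp column lives. A real version would also
# implement the Tables.jl interface (omitted to keep this dependency-free).
struct TimeDataWrapper{D, M}
    data::D
    meta::M
end

# Made-up data standing in for a parsed CSV. Note that the timestamp
# column is deliberately NOT the first one.
raw = (Open = [10.0, 10.5],
       timestamp = [Date(2020, 1, 2), Date(2020, 1, 3)],
       Close = [10.4, 10.9])

wrapped = TimeDataWrapper(raw, (timestamp = :timestamp,))

# Helpers a sink could rely on instead of assuming column positions.
timestamps(w::TimeDataWrapper) = getproperty(w.data, w.meta.timestamp)
valuenames(w::TimeDataWrapper) =
    filter(!=(w.meta.timestamp), collect(propertynames(w.data)))
```

The point of the design is that a sink asks `meta` for the timestamp column rather than hard-coding "first column".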
and use it
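One way the wrapper might be used, again only as a sketch under assumptions: `fetch_prices` is a hypothetical stand-in for a downloader such as `yahoo` (it returns hard-coded data instead of fetching CSV), the positional `sink` argument and its `identity` default are illustrative choices, and `rowsink` is a toy sink invented for the example.

```julia
using Dates

# Same hypothetical wrapper as described in the proposal, repeated so the
# snippet runs on its own.
struct TimeDataWrapper{D, M}
    data::D
    meta::M
end

# Stand-in for a downloader: a real one would fetch and parse CSV.
# The result is passed to whatever `sink` the caller supplies.
function fetch_prices(sink = identity)
    data = (timestamp = [Date(2020, 1, 2), Date(2020, 1, 3)],
            Close = [10.4, 10.9])
    return sink(TimeDataWrapper(data, (timestamp = :timestamp,)))
end

# A toy sink: a vector of (timestamp, close) tuples.
rowsink(w::TimeDataWrapper) =
    collect(zip(getproperty(w.data, w.meta.timestamp), w.data.Close))

rows = fetch_prices(rowsink)   # caller picks the output format
plain = fetch_prices()         # no sink given: the wrapper itself comes back
```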
This structure should implement the corresponding `Tables.jl` methods and at the same time provide the necessary information in the `meta` field (like where the datetime column is located). So, every sink that can use this structure can convert the data source to its own format without any problems.

We can do it in a few small steps:

1. `MarketData.jl`. As long as `TimeDataWrapper` lives inside `MarketData.jl`, functions like `TimeArray(x::TimeDataWrapper)` are not type piracy. As a result, we get a function that can extract its data to `TimeArray` and `DataFrame` formats. Just to clarify, `DataFrame` support comes from the fact that `TimeDataWrapper` follows the Tables.jl API.
2. `MarketDataInterface.jl` and ask the owner of `TimeSeries.jl` to provide support for this package.
3. `Temporal.jl` and ask him to provide support.
4. `Timestamps.jl` and can write the necessary support for them as well.

As a result, we will have a generic method that works with multiple sinks. Instead of being forced into a particular package for financial data, users will be able to use a single package for data sourcing and any package they like for further data processing. It's a win-win situation.
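To make the "multiple sinks" idea more concrete, here is a small sketch of two different sinks consuming the same wrapper. `ToySeries` and `rowtable` are hypothetical stand-ins for types and functions that sink packages would own; they are not the API of `TimeSeries.jl`, `Temporal.jl`, or `Timestamps.jl`. The sketch also assumes `data` is a NamedTuple of vectors.

```julia
using Dates

# Hypothetical wrapper from the proposal, repeated so the snippet is
# self-contained.
struct TimeDataWrapper{D, M}
    data::D
    meta::M
end

# Toy column-oriented sink, standing in for a type owned by some
# time series package.
struct ToySeries
    timestamp::Vector{Date}
    values::Matrix{Float64}
    colnames::Vector{Symbol}
end

# The sink's owner adds one constructor; it reads the timestamp location
# from `meta` instead of assuming it is the first column.
function ToySeries(w::TimeDataWrapper)
    ts = w.meta.timestamp
    cols = [n for n in propertynames(w.data) if n != ts]
    vals = hcat((Float64.(getproperty(w.data, n)) for n in cols)...)
    return ToySeries(getproperty(w.data, ts), vals, cols)
end

# A second, row-oriented sink: a vector of NamedTuples, one per row.
function rowtable(w::TimeDataWrapper)
    n = length(getproperty(w.data, w.meta.timestamp))
    return [map(col -> col[i], w.data) for i in 1:n]
end

# Made-up data with the timestamp in the middle column.
w = TimeDataWrapper((Open = [10.0, 10.5],
                     timestamp = [Date(2020, 1, 2), Date(2020, 1, 3)],
                     Close = [10.4, 10.9]),
                    (timestamp = :timestamp,))

series = ToySeries(w)
rows = rowtable(w)
```

Both sinks work from the same source object, and neither needs to know how the data was downloaded.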
As a further step, `Quandl.jl` can be revived and go through the same procedure, so we will have multiple financial data sources with the same consistent logic.

If this proposal is OK, I can try the first step and we will see how it works out.