Replies: 17 comments 21 replies
-
In my opinion, open data should be retrievable. Speaking as an SP, I think there needs to be an SLA for retrievability that includes the maximum amount of data a system is required to serve, in GiB/hour, so that no system is overwhelmed. One problem we see is that our percentage of successful polled retrievals drops when too much data is polled or retrieved, and that needs to be taken into account. If a system like Saturn reduces the frequency of data retrieval, that problem may be mitigated. The second critical aspect of retrievals is whether Fast Retrievals (unsealed copies) are required. That should also be part of the SLA.
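To make the cap idea concrete, here is a minimal sketch of how a retrieval score could ignore requests that go beyond an agreed hourly cap. Everything in it is an assumption for illustration: `RetrievalSLA`, its fields, and `sla_success_rate` are hypothetical names and are not part of any existing Fil+ tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

GIB = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class RetrievalSLA:
    # Hypothetical SLA terms; Fil+ does not define these today.
    max_gib_per_hour: float
    fast_retrieval_required: bool


def sla_success_rate(
    attempts: Iterable[Tuple[int, bool]], sla: RetrievalSLA
) -> Optional[float]:
    """Success rate over one hour of (requested_bytes, succeeded) attempts.

    An attempt that would push the hour's requested volume past the SLA cap
    is skipped, so an SP is not penalized for load it never agreed to serve.
    Returns None when no attempt fits under the cap.
    """
    budget = sla.max_gib_per_hour * GIB
    used = 0
    counted = 0
    succeeded_count = 0
    for size, succeeded in attempts:
        if used + size > budget:
            continue  # beyond the agreed cap: excluded from the score
        used += size
        counted += 1
        succeeded_count += int(succeeded)
    return succeeded_count / counted if counted else None
```

Whether failed attempts should consume the hourly budget, as they do here, is a design choice the SLA itself would have to settle.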
-
In my opinion it should be possible; that's also the main difference from the EFIL+ program. But performance shouldn't be critical, as it is a free-of-charge program.
-
Circling back to the answer I already gave: #880 (comment)
-
High-quality datasets must be the powerful engines of future Filecoin growth, but any attempt to define high-quality data concretely will add friction and internal conflict to the project, which we really wouldn't want to see. Mechanically asking SPs with DataCap to turn on the retrieval function isn't the final solution. Using incentives so that SPs willingly become retrieval servers reduces internal conflict and friction rather than adding controversy.
-
Great discussion topic. I would like to push from an angle that data clients hold the SPs accountable for serving retrievals. An abundant number of datacap applications indicated that the clients wanted their data retrievable. I am aligned with many of the opinions expressed above, and drafted a proposal for Fil+ Retrieval Guidelines & Requirements for Data Clients below. Why HTTP over Graphsync or Bitswap is a separate topic but the TLDR is that it's the shortest path to enable retrievals on Filecoin today due to its ability to retrieve from PieceCIDs. Fil+ Retrieval Guidelines & Requirements for Data Clients:
-
I have received some feedback from Korean SPs. Due to the language barrier, I'm sharing it here on their behalf. "Retrieval takes up bandwidth. Uploading CAR files will nearly saturate my bandwidth, which could affect my WindowPoSt submissions and WinningPoSt, i.e. block rewards. As a small SP, this is not good for my participation." "In the long run, retrieval should be available to everyone but at a cost. Whoever wants the data must pay for the retrieval to reduce the risk that miners may face."
-
We have noticed a lot of challenges and controversies about retrieval success in the community. Frankly, I think the investment needed to verify the retrieval success rate of nodes in this early phase of Filecoin far outweighs the benefits. Moreover, it introduces huge internal friction because network conditions differ so much from place to place. Retrieving from the same node may produce different results, and retrievers will have difficulty agreeing on them. This will keep increasing community friction and drive more followers away from the project.
-
Supporting retrieval puts a lot of pressure on both clients and SPs, whether in terms of the cost of technology or of communication.
-
I would like to draw your attention to the fact that over 1 EiB of validation data is currently stored into the Filecoin network offline (not online) by default. Why should we expect data retrieval and downloading to be done online? Until a real retrieval network is built, any online retrieval directly from the client to the storage network is only a technical experiment, and it will certainly pose risks to the stability of storage network nodes. Making it a required review condition for Fil+ is obviously not appropriate. I suggest that this rule be changed as soon as possible.
-
@BobbyChoii @TakiChain Both of you are in public disputes on GitHub and Slack. Why aren't you responsive on either platform while voicing your opinions here?
-
The screenshot below should point us in the right direction. Currently, the Filecoin network's greatest value is helping clients store real and useful data, not retrieving it online. It's enough that clients who have stored data can copy and recover it when needed. We shouldn't give Filecoin an unbearable responsibility. If there are redundant barriers, we should remove them. In my opinion, retrieval based on the Filecoin network is a great project, but it will most likely be implemented in Filecoin's layer 2 network, not at the present moment.
-
Retrieval is something that in the long term filecoin needs to support. There is much more data in the world that is useful when it can be gotten back when needed. I think one of the major stumbling blocks today is that we talk about "retrieval" as a binary thing, a yes-or-no. Instead, I think we will have more success in thinking about retrieval as a sliding scale, because different data is going to need different amounts of retrieval, and the costs should be proportional to that. I hope ultimately we can get some "tiers" of retrieval to think about, so e.g.
These reservations of bandwidth can make it clearer what the underlying infrastructure costs are going to be, and can let SPs price deals to offset the costs of the retrieval burden they're taking on.
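As a rough illustration of how a tier could translate into a bandwidth reservation, here is a minimal sketch. The tier names borrow the 'archival' and 'warm' labels used elsewhere in this thread, but the `daily_fraction` values, `RetrievalTier`, and `reserved_mbps` are all hypothetical and only show the arithmetic, not any agreed definition.

```python
from dataclasses import dataclass

TIB = 1024 ** 4          # bytes in one TiB
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class RetrievalTier:
    name: str
    # Hypothetical knob: share of the stored data a client may retrieve per day.
    daily_fraction: float


# Illustrative numbers only; no tier values have been agreed in this thread.
TIERS = {
    "archival": RetrievalTier("archival", 0.01),
    "warm": RetrievalTier("warm", 0.25),
}


def reserved_mbps(stored_tib: float, tier: RetrievalTier) -> float:
    """Average egress (megabits per second) needed to honor the tier."""
    bits_per_day = stored_tib * TIB * 8 * tier.daily_fraction
    return bits_per_day / SECONDS_PER_DAY / 1e6
```

Under these assumed numbers, 1 TiB stored at the 'archival' tier works out to roughly 1 Mbps of sustained egress, and the 'warm' tier to 25 times that, which is the kind of figure an SP could fold into a deal price.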
-
To augment @willscott's point, I've noticed retrieval is perceived primarily as a burden on SPs. But I want to point out that data that can be stored and retrieved, at any of the levels @willscott describes, is valuable to a much wider market of potential customers than data that is stored but impossible to get back. Even offering 'archival' data that can be retrieved reliably with the parameters @willscott describes converts an SP's offering into a business that is objectively valuable to large enterprises needing crash recovery. Getting to 'warm' retrieval opens the market of potential clients much further. It also makes an SP an appropriate "L3" backup for Saturn. But it's probably also sufficient for many SPs to simply be archival storage, because a lot of people need that. Ultimately, I think what we want is alignment between a Fil+ client's needs and the service an SP is offering. And I think it may be worth distinguishing certain 'tranches' of SPs & clients between the tiers @willscott describes.
-
An additional perspective - it is beneficial to the Filecoin network to have a working end-to-end data storage and retrieval flow (regardless of retrieval tier, as @willscott mentions), not just a pure storage solution with no retrieval. The different teams working on Filecoin have made significant improvements to the retrieval side, but if SPs choose not to serve retrievals, many data use cases on Filecoin will be considered incomplete. Furthermore, although there are many retrieval-related projects in progress (such as Saturn, Rhea, etc.) to help shoulder the majority of retrieval requests on the network, these projects all depend on Filecoin SPs serving retrievals, as Filecoin is the layer where we have incentives/guarantees that data is stored (and ideally available to be retrieved).
-
I know the current macroeconomic environment is hard for all of us, and I am trying not to put more burden on anyone in our community. I want us all to succeed in the long term. Having an end-to-end solution for storage is imperative to Filecoin's long-term success.
-
Retrievability is bound to be a highly controversial point. For the same SP, clients in different regions often get totally different results when they try to retrieve, so whose results should we take? As far as I know, many notaries in China have run into obstacles when using the public Internet to download data from other countries. According to the Fil+ rules, would we then need to restrict non-Chinese SPs from participating in the Fil+ program?
-
Thank you, Fil+ community, for the feedback & comments both on this thread and in our Slack threads. We have added the final version of the retrieval guidelines to the README at https://github.com/filecoin-project/filecoin-plus-large-datasets/tree/main under Retrieval Guidelines for Data Clients.
-
Open data stored in verified deals comes with the claim that the data stored should be retrievable. This is theoretically driven both by the DataCap applicants' demands/claims/desires (see the application template, where they specify whether the data is open and retrievable to anyone) and by the value proposition of open data to begin with. However - historically, Fil+ has neither checked/validated this nor had a declared retrievability metric or policy. As part of the push in the Quality Phase, this is a good time to kick-start a conversation on the challenges, potential pathways, and tools that we can implement to better assess when onboarded open data is actually retrievable.
IMO (will keep this updated as we continue having the conversation), what we need to do is:
For those who don't believe we should be doing this, here's why I think we have to. At a baseline, this is a relatively good test of whether clients are actually onboarding data with SPs that meet their claimed needs. If they are not, then this should be a low-false-positive path to identifying potential abuse, or at least a good way to encourage clients to consider SPs that can actually serve their needs over time. As a reminder, here's the definition of quality data today (see https://filplus.storage/ for more details, scope, roadmap, goals):