Support IPNI+Carpark indexing from the client #24
Questions about feasibility:
Strawman proposal: this invocation should simply be called

In my opinion, what is needed to finalize this decision is an estimate of how hard triggering the E-IPFS indexing process from a client invocation will be, from someone who understands this code well. If it's a lot, we can cancel this ticket and push it later. I'd like to review this in sprint planning tomorrow and come to a final agreement.

Tagging relevant folks who may be able to estimate this well: @alanshaw @vasco-santos @Gozala. Also tagging @reidlw, since this would affect the iteration if it turns out we really can't separate the IPNI work from the write to R2.
Today a bucket event puts the info in a queue that the E-IPFS indexing handlers consume from. @alanshaw knows this far better than me, but while the code described here is quite easy to put together, I think it won't work out of the box. IIRC E-IPFS expects things to go to specific configured S3 buckets, which in this case would not happen, since the writes go to R2.

I would also call out that calling this something different from

My understanding of this would be that
@vasco-santos got it. So I'm hearing that the client has to do the block indexing for any of this to work?
@hannahhoward that is my understanding. Alternatively, we could put together a small service in Cloudflare Workers that we call to index the data (kind of analogous to what the E-IPFS lambdas do), but that seems a distraction from shipping the client thing.
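To make "the client has to do the block indexing" concrete: building the index means walking the CAR file and recording, for each block, its CID and its byte offset/length within the CAR. A real client would presumably use something like `CarIndexer` from `@ipld/car`; the dependency-free sketch below (all names are ours, not from the codebase) just illustrates the shape of the work:

```typescript
interface BlockIndexEntry {
  cid: Uint8Array;     // binary CID of the block
  blockOffset: number; // byte offset of the block data within the CAR
  blockLength: number; // byte length of the block data
}

// Decode an unsigned LEB128 varint; returns [value, bytesConsumed].
function decodeVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  let size = 0;
  for (;;) {
    const b = bytes[offset + size];
    value += (b & 0x7f) * 2 ** shift; // avoid 32-bit overflow of <<
    size++;
    if ((b & 0x80) === 0) return [value, size];
    shift += 7;
  }
}

// Byte length of a binary CID starting at `offset`.
function cidByteLength(bytes: Uint8Array, offset: number): number {
  // CIDv0 is a bare sha2-256 multihash: 0x12 0x20 + 32 digest bytes.
  if (bytes[offset] === 0x12 && bytes[offset + 1] === 0x20) return 34;
  // CIDv1: version varint, codec varint, then the multihash
  // (hash-code varint, digest-size varint, digest bytes).
  let o = offset;
  o += decodeVarint(bytes, o)[1]; // version
  o += decodeVarint(bytes, o)[1]; // codec
  o += decodeVarint(bytes, o)[1]; // multihash code
  const [digestLen, digestLenSize] = decodeVarint(bytes, o);
  o += digestLenSize + digestLen;
  return o - offset;
}

// Walk a CARv1: a varint-prefixed dag-cbor header, then sections of
// varint(length) || CID || block bytes.
function indexCar(car: Uint8Array): BlockIndexEntry[] {
  const entries: BlockIndexEntry[] = [];
  const [headerLen, headerVarintSize] = decodeVarint(car, 0);
  let offset = headerVarintSize + headerLen; // skip the header
  while (offset < car.length) {
    const [sectionLen, varintSize] = decodeVarint(car, offset);
    const cidStart = offset + varintSize;
    const cidLen = cidByteLength(car, cidStart);
    entries.push({
      cid: car.subarray(cidStart, cidStart + cidLen),
      blockOffset: cidStart + cidLen,
      blockLength: sectionLen - cidLen,
    });
    offset = cidStart + sectionLen;
  }
  return entries;
}
```

The resulting `(multihash, offset, length)` entries are exactly what the bucket-event pipeline computes server-side today; the question in this thread is who runs this walk and where the entries land.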
Here is the plan I wanted to propose:
This way we do all of the client-side work that would be necessary for shipping https://github.com/web3-storage/RFC/blob/main/rfc/ipni-w3c.md and in the future can rewrite

At the moment there are a couple of unknowns I'd like to flag:
While writing this it occurred to me that we could reduce the verification burden if we were to embrace blake3 and inclusion proofs, or just the BAO structure as described in github.com/storacha/RFC/pull/8. Specifically, clients could produce claims with inclusion proofs that could be verified without having to fetch the content and run compute over it, and consequently would not require running next to where the data is. I personally would be very interested in exploring this approach even if it may imply more engineering effort to ship, because overall it would be a bigger win.
I'm realizing now that we probably don't have blake3 support in pre-signed URLs, so we may not be able to do this as easily as I claimed. We would still need to verify that the sha-256 hash matches the blake3 one and then do all the inclusion claims with blake3. Or we make
Responses to unknowns:
My inclination for now is to do option
Yes, again let's do
Agree this needs coverage. @alanshaw, can you provide any insight on how this is done and what we might need to put in this handler?
I would request we leave anything Blake3 or Merkle Reference related out of this. Absolutely committing to Blake3 as soon as it makes sense in the stack, but not within this set of tickets.
Request, if possible, to have @gammazero work on the client side of this, since it's part of the IPNI RFC he built. Or at least pair/help with it. I would like @gammazero to understand this client code for the future.
I'm starting to think we may have another unaccounted-for problem at hand: https://filecoinproject.slack.com/archives/C06EDB1NADU/p1711579957117159. Specifically, I'm under the impression that sha-256 checksum verification does not appear to be available in R2, which is a major setback, as a lot in our system assumes that uploaded content will correspond to the multihash; even concurrency (of same-content uploads) is managed by it.
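If R2 really won't verify a sha-256 checksum on presigned uploads, the digest can only come from the client (plus whatever later verification job we add), so the multihash must be treated as unverified until checked. A minimal sketch of client-side digesting with Web Crypto (in browsers and Cloudflare Workers `crypto.subtle` is global; the Node import below is just for portability of this sketch):

```typescript
import { webcrypto } from "node:crypto"; // browsers/Workers: use the global `crypto` instead

// Digest bytes with SHA-256 and return a lowercase hex string, so the
// client can carry the hash alongside the presigned upload and the
// service can re-verify later.
async function sha256Hex(bytes: Uint8Array): Promise<string> {
  const digest = await webcrypto.subtle.digest("SHA-256", bytes);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```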
I agree with @hannahhoward here. Let's do
I think

For 2, there would need to be new service logic that would:
We will want this, but I am not sure if we need this to begin with.
We need to catalog all the consumers of DynamoDB (CARPark) so that they can be converted to do IPNI queries.
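For converting those consumers, it helps to have the shape of an IPNI query in front of us. IPNI indexers such as cid.contact expose lookups by multihash over plain HTTP; the sketch below builds the lookup URL (the deterministic part) and does an illustrative fetch. The response-shape comment reflects the IPNI find API as we understand it:

```typescript
// Build the IPNI find URL for a base58btc-encoded multihash.
function ipniLookupUrl(indexerBase: string, multihashB58: string): string {
  return new URL(`/multihash/${multihashB58}`, indexerBase).toString();
}

// Illustrative lookup: returns provider records for the multihash,
// or null when the indexer has never seen it.
async function findProviders(indexerBase: string, multihashB58: string): Promise<unknown> {
  const res = await fetch(ipniLookupUrl(indexerBase, multihashB58));
  if (res.status === 404) return null; // not indexed
  if (!res.ok) throw new Error(`indexer lookup failed: ${res.status}`);
  return res.json(); // { MultihashResults: [{ Multihash, ProviderResults: [...] }] }
}
```

A consumer that today does a DynamoDB `GetItem` keyed on the block multihash would instead call `findProviders("https://cid.contact", mh)` and read the CAR location out of the provider metadata.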
I need some help setting up a dev environment to test the client change.
I am also up for what folks here are suggesting:
Start with option i, and then go to iii. Please note that this is important and should not be a iii that gets deprioritised later. @Gozala and I talked earlier today about at least keeping in the state a "cause" property that links to where a given thing came from, so that in the future we can check whether it was us who computed this.
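To make the "cause" idea concrete, here is a hedged sketch of what such a record could look like. Every field name below is our assumption for illustration, not an agreed schema; the point is only that each derived index entry links back, by CID, to the invocation or event that produced it:

```typescript
// Hypothetical record shape; all names here are assumptions.
interface IndexedBlockRecord {
  multihash: string; // base58btc multihash of the block
  carCid: string;    // CID of the CAR holding the block in carpark
  offset: number;    // byte offset of the block within the CAR
  length: number;    // byte length of the block
  cause: string;     // CID of the invocation (or bucket event) that
                     // produced this record, for provenance checks
}

const example: IndexedBlockRecord = {
  multihash: "zQmExampleMultihash",       // placeholder value
  carCid: "bagbaieraExampleCarCid",       // placeholder value
  offset: 96,
  length: 262144,
  cause: "bafyreiExampleInvocationCid",   // placeholder value
};
```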
option i
I shared with @Gozala how Hoverboard uses the mentioned table. A few hints for folks following:
We could not find where the schema is defined 🙄 but we can see it in the AWS dashboard. The important things are:
A note that I am not an expert on the E-IPFS + Hoverboard part, so I would love someone to double-check these assumptions.
@gammazero is working on encoding the final structure in CBOR.
What
We should support an `ipni/offer` action that triggers CARPark indexing + IPNI publishing. For now, this would not include the work described in #10, and we would continue to use the CARPark database; we would just trigger new indexing actions from the client rather than from a bucket event.