Sequence Order vs Delivery Order #23438
-
I have a question which is a bit theoretical but which, I think, may have practical value. I understand that Fluid data structures provide eventual consistency through a "total order broadcast", and the documentation states that "each client receives every operation relayed from the server with enough information to apply them in the correct order." However, it is unclear whether a Fluid service should be expected to deliver the operations in that same order. I realize that local changes are applied optimistically, so the change order observed by application code will usually differ anyway, but I feel this may still be important for some applications.

For example, some of the legacy data structures (such as Task Manager) appear to use the delivery order of the sequenced op messages to achieve "consensus". I say that because I do not see the sequence number being read in processCore; I might be misunderstanding, though. Indeed, "consensus" seems like a misnomer if the service-level delivery order is what achieves it.

Another reason I think this is important is the implementation of a custom Fluid service. If I were to write my own service, would I need to ensure that it delivered operation messages to clients in the same order as the sequence numbers it attached to them?

By now y'all may be sensing that I have an ulterior motive for asking this question. I am working with a layered application that maintains a legacy data structure, with Fluid data structures used as an intermediate format. I am sheepish about this because I realize it is not the intended use for Fluid and undermines some of its awesome benefits. Nevertheless, I am seeing potential for success given the performance characteristics of Fluid Relay and the participant management in the SDK. The sticking point, however, is this ordering.
I am currently working around the optimistic local changes and using the delivery order in my own data-merging algorithm. This works, but I do not know whether it is really safe. Obviously this consistent delivery order is not part of the Fluid Relay SLA, but is it inherent to the Fluid Framework itself?
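For what it's worth, a merge algorithm can avoid depending on delivery order entirely by applying ops strictly by their sequence number. Below is a minimal sketch of that idea; the `SequencedOp` shape and `OrderedApplier` class are hypothetical (the real field on Fluid's sequenced messages is also named `sequenceNumber`, but everything else here is illustrative, not Fluid API):

```typescript
// Hypothetical op shape for illustration; mirrors the sequenceNumber field
// that Fluid's sequenced messages carry.
interface SequencedOp {
  sequenceNumber: number;
  contents: unknown;
}

// Applies ops strictly in sequence order, regardless of the order in which
// the service happened to deliver them. Out-of-order arrivals are buffered
// until the gap fills; duplicates are dropped.
class OrderedApplier {
  private nextSeq = 1;
  private readonly pending = new Map<number, SequencedOp>();
  public readonly applied: number[] = [];

  push(op: SequencedOp): void {
    if (op.sequenceNumber < this.nextSeq) return; // already applied: duplicate
    this.pending.set(op.sequenceNumber, op);
    // Drain every op that is now contiguous with what we've applied.
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq)!;
      this.pending.delete(this.nextSeq);
      this.applied.push(next.sequenceNumber);
      this.nextSeq++;
    }
  }
}
```

With this approach the merge result is a function of the sequence numbers alone, so it stays correct even if a service delivered ops out of order.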
Replies: 2 comments 1 reply
-
@vladsud, can you answer this question?
-
The Fluid protocol that a service implements has two parts: an ordering service (which broadcasts ops) and storage (for long-term storage of everything: ops, blobs, summaries). All ops that are received (and sequenced) by the ordering service must eventually be flushed to the storage layer. While ops could be stored forever by some services, the current minimum requirement is that all ops are stored for at least 30 days. This is required for offline scenarios: we support clients staying offline that long. Clients that are offline catch up to the latest state using ops, and they can find those ops only in storage (the ordering service can never keep ops for longer than 24 hours due to various compliance reasons).

For completeness: we have Azure Fluid Relay and SharePoint Embedded as two production services, plus a number of implementations for testing, including a local service that runs on the local box. All of them implement the Fluid protocol, and there is an appropriate storage-driver implementation for each service that abstracts the differences between them (such that the rest of the layers do not even need to know which service we are dealing with).

Clients can't rely on any specifics regarding ops when it comes to the ordering service, as (for example) a client can lose its socket connection and thus miss some number of ops while it reconnects. Duplicate ops are also possible (a reconnecting client can receive the same ops over the new connection). As such, the client does not impose any requirements: the client is totally OK with ops having gaps or with duplicate ops coming in. Depending on the particular storage implementation (storage driver used), the client will request missing ops either from storage or from the ordering service (or both). Perf / latency would be affected, of course, but reliability will not.
As for your question about DDSs (Distributed Data Structures): it's worth mentioning that there is a layer (on the client) between the DDS layer and the driver (storage abstraction) that is responsible for ensuring that the DDS layer always receives ops in order, with no gaps and no duplicates. This layer has other responsibilities that aid these goals, such as controlling socket connections and reconnections. Hope that helps.
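The behavior of that client-side layer can be sketched roughly as follows. This is not the actual Fluid implementation, just an illustration under stated assumptions: ops carry a `sequenceNumber`, duplicates from a reconnect are dropped, and when a gap is detected the missing range is fetched (from storage or the ordering service) before newer ops are delivered to the DDS layer. The `InboundOpQueue` name and `FetchOps` callback are hypothetical:

```typescript
type Op = { sequenceNumber: number };
// Hypothetical callback standing in for a storage/ordering-service fetch;
// returns the ops in the inclusive range [from, to].
type FetchOps = (from: number, to: number) => Op[];

// Tolerates gaps and duplicates on the inbound side, but emits ops to the
// layer above exactly once, in sequence order.
class InboundOpQueue {
  public readonly delivered: number[] = [];
  private nextSeq = 1;

  constructor(private readonly fetchMissing: FetchOps) {}

  receive(op: Op): void {
    if (op.sequenceNumber < this.nextSeq) return; // duplicate from reconnect
    if (op.sequenceNumber > this.nextSeq) {
      // Gap detected: backfill the missing range before delivering this op.
      for (const missing of this.fetchMissing(this.nextSeq, op.sequenceNumber - 1)) {
        this.deliver(missing);
      }
    }
    this.deliver(op);
  }

  private deliver(op: Op): void {
    if (op.sequenceNumber !== this.nextSeq) return; // stale or duplicate
    this.delivered.push(op.sequenceNumber);
    this.nextSeq++;
  }
}
```

The layer above (the DDS layer, in Fluid's case) then only ever sees a contiguous, duplicate-free stream, which is why DDS code can safely process ops in the order they arrive.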
@phudlow-trimble,