Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor(tree): Use TreeChunk in more places #24001

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

yann-achard-MS
Copy link
Contributor

@yann-achard-MS yann-achard-MS commented Mar 6, 2025

Description

Core Changes

  1. initializeForest now takes in a ITreeCursorSynchronous in field mode instead of an array of ITreeCursorSynchronous in node mode.
  2. Editing methods in DefaultEditBuilder now take in a single TreeChunk instead of a single ITreeCursorSynchronous (in either node or field mode).
  3. Deltas now represent each subsequence of trees as a single TreeChunk instead of an array of ITreeCursorSynchronous in node mode.

The shift from an array of ITreeCursorSynchronous in node mode to either a single ITreeCursorSynchronous in field mode or a TreeChunk (1 and 3 above) is an improvement in efficiency.

The shift to TreeChunk (2 and 3 above) reflects the fact that these layers need to retain copies of the trees.

Other Changes

  • IForestSubscription implementers now provide a chunkField method that creates one optimized for the specific forest.
  • MappedEditBuilder was added to make it easy for LazyField editing methods to keep taking in insertion content as cursors while the underlying editors use TreeChunks.

Breaking Changes

None

Future Work

In the long run we want TreeChunks to be passed to the forest.

@github-actions github-actions bot added base: main PRs targeted against main branch area: dds Issues related to distributed data structures area: dds: tree labels Mar 6, 2025
@yann-achard-MS yann-achard-MS marked this pull request as ready for review March 7, 2025 00:03
@Copilot Copilot bot review requested due to automatic review settings March 7, 2025 00:03
@yann-achard-MS yann-achard-MS requested a review from a team as a code owner March 7, 2025 00:03

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR refactors the editing and change‐building logic to use TreeChunk earlier in the pipeline, replacing usages of ITreeCursorSynchronous where inserted content is concerned. Key changes include:

  • Introducing MappedEditBuilder to map content from one type to another using TreeChunk.
  • Updating chunking and delta conversion functions to work with TreeChunk rather than cursors.
  • Adjusting the APIs in forests, edit builders, and visitor functions accordingly.

Reviewed Changes

File Description
packages/dds/tree/src/feature-libraries/default-schema/mappedEditBuilder.ts New builder that wraps a base builder and mapDelegate to transform content to TreeChunk.
packages/dds/tree/src/feature-libraries/chunked-forest/chunkTree.ts Adds an early return for empty fields and uses TreeChunk in chunking functions.
packages/dds/tree/src/feature-libraries/modular-schema/modularChangeFamily.ts Replaces cursor-to-chunk conversion with direct use of TreeChunk.
packages/dds/tree/src/core/forest/forest.ts Exposes a new chunkField method to generate TreeChunk from a field cursor.
packages/dds/tree/src/feature-libraries/flex-tree/lazyField.ts Refactors lazy field editors to use getEditor() (MappedEditBuilder) and adjusts conversion helpers accordingly.
packages/dds/tree/src/feature-libraries/default-schema/defaultEditBuilder.ts Updates field edit methods to work with TreeChunk, removing obsolete compressor parameters.
(Other files) Other changes adjust types and callbacks in delta, visitor, and anchor APIs to support TreeChunk.

Copilot reviewed 49 out of 49 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

packages/dds/tree/src/feature-libraries/flex-tree/lazyField.ts:305

  • [nitpick] Ensure that wrapping 'newContent' in an array for 'cursorForMapTreeField' properly matches the expected TreeChunk format. Verify that this change correctly handles single-node values as intended.
this.valueFieldEditor().set(cursorForMapTreeField([newContent]));

packages/dds/tree/src/feature-libraries/default-schema/defaultEditBuilder.ts:210

  • [nitpick] Confirm that the removal of the idCompressor parameter in the call to buildTrees does not affect the ID generation guarantees. Double-check that all downstream code accounts for this change.
const build = this.modularBuilder.buildTrees(fill.localId, newContent, revision);

packages/dds/tree/src/feature-libraries/initializeForest.ts:38

  • [nitpick] Verify that converting 'content' to a TreeChunk via 'forest.chunkField' before initialization is appropriate and that the input cursor is valid for this transformation.
const delta: DeltaRoot = deltaForRootInitialization(forest.chunkField(content));
* This can be used to chunk data that is then inserted into the forest.
*
* @remarks
* Like {@link chunkField}, but forces the results into a single TreeChunk.
Copy link
Contributor

@CraigMacomber CraigMacomber Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Like {@link chunkField}, but forces the results into a single TreeChunk.
* Like {@link chunkField}, but forces the results into a single TreeChunk.
* While any TreeChunk is compatible with any forest, this method creates one optimized for this specific forest.
* The provided data must be compatible with the forest's current schema.

@@ -89,9 +85,10 @@ export function visitDelta(
const rootDestructions: Delta.DetachedNodeDestruction[] = [];
const refreshers: NestedMap<Major, Minor, ITreeCursorSynchronous> = new Map();
delta.refreshers?.forEach(({ id: { major, minor }, trees }) => {
for (let i = 0; i < trees.length; i += 1) {
const treeCursors = nodeCursorsFromChunk(trees);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either here or on nodeCursorsFromChunk there should probably be a TODO that we should not split up sequences into individual nodes during insert and instead store them as a sequence.

Comment on lines +177 to +179
if (length === 0) {
return [];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. chunkRange would actually handle this case correctly if the assert in it were removed.

Changing the assert in chunkRange to be:
assert((length === 0 && !skipLastNavigation) === (cursor.mode === CursorLocationType.Nodes), "should be in nodes mode or past end");

seems like a better fix to me.

It would be good to at least add a regression test for chunkField(emptyChunk.cursor(), someCompressor) to src/test/feature-libraries/chunked-forest/chunkTree.spec.ts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert triggers when length is not zero and cursor.mode === CursorLocationType.Nodes is true. Maybe I misunderstood what changes you wanted me to make. Do you want me to remove the call to cursor.firstNode(); from chunkField?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got that assert slightly wrong. Working around JS not having XOR by using ===, I need to negate one of the sides....

I made #24009 which cleans this up based on your changes and test.

/**
* An IDefaultEditBuilder implementation based on another IDefaultEditBuilder that uses a different content type for insertions.
*/
export class MappedEditBuilder<TBase, TAdapted> implements IDefaultEditBuilder<TAdapted> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context on why this file is added in this PR would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the following to the PR description:

MappedEditBuilder was added to make it easy for LazyField editing methods to keep taking in insertion content as cursors while the underlying editors use TreeChunks.

@CraigMacomber
Copy link
Contributor

CraigMacomber commented Mar 7, 2025

At the high level, what I think is important here (and worth highlighting in the PR description) is not so much the conversion from cursors to TreeChunks. We have efficient ways to convert field cursors to tree chunks and back.

What I think actually matters here is some but not yet all places which used to use arrays of node cursors to encode a sequence of nodes now use one of the types for actually doing efficient sequences of nodes (either a field cursor, or a TreeChunk, in this case tree chunk).

Generally I think taking data in as cursors makes a ton on sense (it makes it possible for someone to express a tree from existing data without copying it), but for cases where we need to internally hold onto a copy of the tree, TreeChunks make sense. Turns out all the editing APIs needs internal copies of the tree (need to copy them into ops, store them in deltas, but them in forests etc) so moving those code paths to TreeChunk is reasonable and more type safe than using cursors, so its a win.

I think the PR description should have its emphasis and details tweaked to better align with that. For example in the list of cases using different things, field cursors vs arrays of node cursors should be called out, since one of those is fine and the other is bad/debt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: dds: tree area: dds Issues related to distributed data structures base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants