Allow Append()ing normal ShortStrings (length ≦ 255) to Ropes, switch DOM implementation to using that, and use that for generating reference links #157
This change reverts 0ad4342 and increases to 255 the length of strings that can be passed directly to Rope≫Append() without getting truncated. Otherwise, without this change, any string passed directly to Rope≫Append() whose length exceeds a particular system-dependent limit (which in practice seems to be 15) will get silently truncated. Relates to #153.
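The silent truncation described above can be modeled in a few lines. This is a minimal Python sketch, not Wattsi's Pascal code: `store_inline` is a hypothetical helper that stands in for a fixed-capacity inline buffer, and the capacities 15 and 255 are the two limits mentioned in the description.

```python
def store_inline(text: str, capacity: int) -> str:
    """Model a fixed-capacity inline buffer that drops overflow without warning.

    Hypothetical helper for illustration; it is not part of Wattsi.
    """
    data = text.encode("utf-8")
    # Anything past `capacity` bytes is discarded. A multi-byte character cut
    # in half at the boundary is dropped as well (errors="ignore").
    return data[:capacity].decode("utf-8", errors="ignore")


# A 40-character string, the same length as a Git SHA-1 in hex.
sha = "0123456789abcdef0123456789abcdef01234567"

truncated = store_inline(sha, 15)   # old effective limit: only 15 bytes survive
intact = store_inline(sha, 255)     # proposed limit: the whole string fits

print(len(truncated), len(intact))
```

Nothing signals the loss in the 15-byte case, which is why the caller only notices when the output is wrong.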
This change makes the DOM implementation append strings directly when setting attribute values and when appending text nodes and comments. That in turn allows passing non-constant strings to Element≫AppendChild() and Document≫AppendChild() and Element≫SetAttribute(). Otherwise, without this change, the strings we use for attribute values and text nodes and comments either must be constants, or else we have to create CutRopes, append strings to them, and then use TText.CreateDestructively() (to create text nodes and comments) and Element≫SetAttributeDestructively() (to set attribute values).
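The difference between the two ways of creating a text node can be sketched roughly as follows. This is a Python model under stated assumptions, not the Wattsi API: the `Rope` and `Text` classes and their methods are hypothetical, and the sketch assumes that "destructively" means the node takes over the rope's contents and leaves the source rope empty.

```python
class Rope:
    """Hypothetical stand-in for a rope: an ordered list of string fragments."""

    def __init__(self):
        self.fragments = []

    def append(self, text):
        self.fragments.append(text)

    def take(self):
        """Hand over all fragments and leave this rope empty."""
        taken, self.fragments = self.fragments, []
        return taken

    def __str__(self):
        return "".join(self.fragments)


class Text:
    """Hypothetical text node holding its data as fragments."""

    def __init__(self, fragments):
        self.fragments = fragments

    @classmethod
    def create(cls, text):
        # Direct path: the caller passes an ordinary string.
        return cls([text])

    @classmethod
    def create_destructively(cls, rope):
        # Indirect path: the node takes ownership of the scratch rope's contents.
        return cls(rope.take())

    @property
    def data(self):
        return "".join(self.fragments)


# Indirect path: build a scratch rope first, then give it away.
scratch = Rope()
scratch.append("commit ")
scratch.append("0123456789abcdef0123456789abcdef01234567")
via_scratch = Text.create_destructively(scratch)

# Direct path: one call with a non-constant string.
direct = Text.create("commit " + "0123456789abcdef0123456789abcdef01234567")
```

Both paths end with the same node data; the direct path simply skips the scratch rope, which is empty after the destructive call.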
Haven't gotten a chance to look at this in detail, but I appreciate all the work here! I do wonder, what will happen if we pass a >255 character string? Silent or loud failure?
…vely Currently the longest string we’re appending directly to Ropes is the 40-character SourceGitSHA string for the snapshots, and the second-longest string is the 30-character “This section is non-normative.”. This change switches that SourceGitSHA-handling code to instead use Scratch.Append(@SourceGitSHA) and TText.CreateDestructively(Scratch), so that we can reduce to 30 the size of ShortStrings we allow to be appended directly to Ropes.
The larger the size of ShortStrings we allow to be appended directly to Ropes, the more memory Wattsi consumes at runtime. We don’t need the size to be anywhere near as big as 255: the existing code never actually needs to directly append strings longer than 30 characters. If we ever do need to append strings longer than 30, we can at that time just bump UTF8InlineSize up to whatever new size we actually need. Upping the size to 30 from the old size of 15 seems to increase the memory consumption by about 15–20%, or around 80–100MB. (In comparison, upping it to 255 seems to roughly double the memory consumption.)
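As a back-of-envelope check on those figures, the percentage and megabyte ranges together imply a baseline memory footprint. The sketch below only rearranges the numbers quoted above; `implied_baseline_mb` is a hypothetical helper, and the result is a rough range, not a measurement.

```python
def implied_baseline_mb(increase_mb: float, increase_fraction: float) -> float:
    """Baseline memory implied by an absolute increase and a relative increase."""
    return increase_mb / increase_fraction


# Pair every quoted absolute increase (80-100MB) with every quoted
# relative increase (15-20%) to bracket the implied baseline.
estimates = [
    implied_baseline_mb(mb, fraction)
    for mb in (80, 100)
    for fraction in (0.15, 0.20)
]
low, high = min(estimates), max(estimates)

print(f"implied baseline: roughly {low:.0f}-{high:.0f} MB")
```

Under the same reading, "roughly double" at 255 would mean an increase about as large as that whole baseline, compared with 80–100MB at 30.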
I’ve pushed two more commits that update things a bit. The main relevant commit reduces the string-length limit all the way down to 30, and adds some code that, if we try to directly pass a string whose length is bigger than 30, causes Wattsi to fail with an error message. For example, if we try to pass the string “This string has more than 30 characters”, it would fail with this error message:
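A minimal sketch of that kind of length check, in Python rather than Wattsi's Pascal. The function name, the exception type, and the message text are all hypothetical and are not Wattsi's actual output; only the limit of 30 and the two sample strings come from the discussion.

```python
INLINE_LIMIT = 30  # the limit chosen in this PR


def append_checked(parts: list, text: str, limit: int = INLINE_LIMIT) -> None:
    """Append `text` to `parts`, failing loudly instead of truncating.

    Hypothetical helper for illustration; the error wording is invented.
    """
    size = len(text.encode("utf-8"))
    if size > limit:
        raise ValueError(
            f"string of {size} bytes exceeds the inline limit of {limit}"
        )
    parts.append(text)


parts = []
append_checked(parts, "This section is non-normative.")  # exactly 30 characters

try:
    append_checked(parts, "This string has more than 30 characters")
    failed = False
except ValueError:
    failed = True  # the over-long string is rejected, not silently cut
```

The check runs once per append and costs only a length comparison, so the caller finds out immediately rather than by inspecting truncated output.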
The other commit rewrites part of the SourceGitSHA-handling code (for snapshots) to use Scratch.Append(@SourceGitSHA) and TText.CreateDestructively(Scratch) for appending, because those SourceGitSHA values are 40 characters, so we’d otherwise need to set the limit at 40.
The point of choosing 30 as the limit is that it’s the length of the longest string the current code appends directly. So we can safely set it to that, and we could bump it up later if we ever need to do anything with longer strings. And the point of setting it to 30 instead of 255 is that the larger we make it, the more memory Wattsi consumes at runtime. I haven’t done any serious profiling, but what I have found from some limited testing is that at 255, it seems to roughly double the amount of memory that Wattsi consumes at runtime. But if we set it at 30, it seems to only increase the memory consumption by about 15–20%, or around 80–100MB (on my system).
I’ve tested this and found that after all the changes in this PR, Wattsi produces output identical to what the current code in the main branch outputs. I also found that it doesn’t significantly increase the build time: it does seem to increase it slightly, but just by maybe half a second or less.
Anyway, I meant to say this earlier, but I think the point of making this change is not only to help those of us who’ve had to waste time learning the hard way about this unwelcome quirk in the code (and don’t want to have to relearn it the next time we touch the code) but also to help any future contributors who come along later to not have to waste their time unnecessarily suffering through what we’ve had to suffer through. The thing in the existing code seems to just be a performance optimization for a performance problem that we don’t actually have in practice. So we can safely dispense with that optimization, and let ourselves and future contributors get strings into attribute values and text nodes in the same simple way we can with DOM implementations in any other tools/runtimes, rather than making everybody continue to do the oddball thing the current code forces.
Also for the record here, a table showing string lengths used in the current code
Hmm. Well, a 15-20% memory increase and a 0.5-second increase in time spent is not trivial. Would it be possible instead to just have an error when you try to Append() incorrectly?
I opened #158 with a change that adds a Contributing section to the README.md, and that takes the existing error message we emit when the string is longer than the system-set limit (15), and updates that message to instead be:
Ah, sorry for not being clear. What I'm wondering is if it's possible to catch whatever went wrong in 3b11b2d. If I recall, the issue was that I used
To recap, I think we've discovered two classes of failures:
We have nice errors for (1). We don't have nice errors for (2). Can we add them? Is there a big performance cost to doing so? (As is hinted by the fact that there seem to be some related conditionally compiled-out asserts.) It's totally fine if there's no way to detect it, or detect it cheaply! I just wanted to check, since you've done so much great work here already.
Relates to #153. This change reverts 0ad4342 and increases to 255 the length of strings that can be passed directly to Rope≫Append() without getting truncated. Otherwise, without this change, any string passed directly to Rope≫Append() whose length exceeds a particular system-dependent limit (which in practice seems to be 15) will get silently truncated.
This change allows us to append strings to Ropes in the way we’d intuitively expect to — rather than instead needing to forever remember that we otherwise have to:
Further, the change switches the DOM implementation to append strings directly when setting attribute values and when appending text nodes and comments. That in turn allows passing non-constant strings to Element≫AppendChild() and Document≫AppendChild() and Element≫SetAttribute().
Otherwise, without that change, the strings we use for attribute values and text nodes and comments either must be constants, or else we have to create CutRopes, append strings to them, and then use TText.CreateDestructively() (to create text nodes and comments) and Element≫SetAttributeDestructively() (to set attribute values).
The change also includes a commit that switches the code for generating reference links to using a normal Append() and simple string concatenation, rather than needing to append a Scratch Rope built using an ExtractedData CutRope.