Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diff: add new diff package based on Go 1.19 patience diff implementation #157

Closed
thepudds opened this issue May 26, 2022 · 5 comments · Fixed by #205
Closed

diff: add new diff package based on Go 1.19 patience diff implementation #157

thepudds opened this issue May 26, 2022 · 5 comments · Fixed by #205

Comments

@thepudds
Copy link
Contributor

thepudds commented May 26, 2022

Hi there👋, I would be interested in sending a PR to add a new diff package to go-internal based on the patience diff implementation Russ added in CL 384255 during 1.19 dev cycle.

Compared to the the current pkg/diff used by go-internal, it is faster and much more memory efficient, running in O(n log n) time and I think it is O(n + m) space.

I currently have a copy here:
https://github.com/thepudds/patience-diff

I think go-internals would be a better home for it, including I think it is generally useful for the community, it would bring go-internal down to zero external dependencies (which is part of my selfish reason for being interested in a PR), as well as help deal w/ current testscript diff performance issues such as:

https://github.com/rogpeppe/go-internal/blob/master/testscript/cmd.go#L144-L148

	// pkg/diff is quadratic at the moment.
	// If the product of the number of lines in the inputs is too large,
	// don't call pkg.Diff at all as it might take tons of memory or time.
	// We found one million to be reasonable for an average laptop.
	const maxLineDiff = 1_000_000

A sample performance comparison (which I originally posted in pkg/diff#26 (comment)):

For this example [~130k lines], it runs in 0.07 sec and uses 38 MB RSS, vs. pkg/diff runs in 19 sec and 18 GB RSS (roughly ~250x slower and ~500x more RAM).
[...]
The patience-diff version is not a minimal diff:

$ wc -l *.out
96935 patience-diff.out
56391 pkg-diff.out

...but at least in theory, it is often friendlier to humans.

@mvdan
Copy link
Collaborator

mvdan commented May 26, 2022

Thumbs up from me! Will you also help keep it in sync with upstream? :)

cc @josharian

@thepudds
Copy link
Contributor Author

If there is interest, some details include:

  1. Presumably we would keep the API identical? It is fairly minimal: https://pkg.go.dev/internal/diff

  2. Would rogpeppe/go-internal/diff be a reasonable import path? An alternative would be to use the word patience somewhere, but that is a detail that might not be critical and might change some day, and also wouldn't be following what the original did, including you can see in the comments that Russ preferred not to call it patience diff:

Some systems call this approach a “patience diff,” named for the “patience sorting” algorithm, itself named for a solitaire card game. We avoid that name for two reasons. First, the name has been used for a few different variants of the algorithm, so it is imprecise. Second, the name is frequently interpreted as meaning that you have to wait longer (to be patient) for the diff, meaning that it is a slower algorithm, when in fact the algorithm is faster than the standard one.

@thepudds
Copy link
Contributor Author

Will you also help keep it in sync with upstream? :)

👍

@rogpeppe
Copy link
Owner

rogpeppe/go-internal/diff seems like the right import path to me.

With that input from @mvdan, I think we have consensus enough to move forward.
Presumably we can also change the code to use the new package, which will remove the last remaining external dependency from go-internal, which makes me very happy :)

Thanks for suggesting this!

@mvdan
Copy link
Collaborator

mvdan commented May 26, 2022

I'd also like for pkg/diff to improve, because in a way I'd love for a small diff package to exist in Go without having to pull in all of go-internal. But I guess that, for now, adding go-internal as a module dependency isn't a big deal, given that it itself will have zero dependencies.

mvdan added a commit to mvdan/go-internal that referenced this issue Mar 22, 2023
From Go tip as of March 21st 2023,
at commit 5f1a0320b92a60ee1283522135e00bff540ea115.

The only change is to replace the internal/txtar dependency
with our own txtar package.
It seems like upstream has its own tiny copy of x/tools/txtar,
presumably so that even low level packages can use txtar in tests.

Fixes rogpeppe#157.
mvdan added a commit that referenced this issue Mar 22, 2023
From Go tip as of March 21st 2023,
at commit 5f1a0320b92a60ee1283522135e00bff540ea115.

The only change is to replace the internal/txtar dependency
with our own txtar package.
It seems like upstream has its own tiny copy of x/tools/txtar,
presumably so that even low level packages can use txtar in tests.

Fixes #157.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants