Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial benchmarks for intersection types + a bit of speedup #11924

Merged
merged 29 commits into from
Jan 7, 2025

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Dec 19, 2024

Pull Request Description

Addressing #11846 by writing a set of benchmarks comparing the same algorithm with different kinds of intersection type values. The fix for #11846 is then:

  • using Truffle Node types for performing operations
  • e.g. new nodes like EnsoMultiValue.NewNode and FindIndexNode and AllTypesWith are introduced
  • all these nodes speculate on the intersection type to be the same as before and provide some speed up
  • as such we need a fast way to compare two intersection types to be the same
  • hence the introduction of an internal EnsoMultiType replacing Type[]
  • it has a factory method and a cache yielding same EnsoMultiType instance for Type[] with equal elements
  • guaranteeing quick comparison of two EnsoMultiType instances via ==
  • thus all the nodes can build various inline caches and speed things up

Originally the intersection types benchmarks were 60 times slower than the base benchmark. Now they are just 10-15 times slower. Given the current memory overhead of EnsoMultiValue and nature of the benchmark (million of + operation dispatches), that's a reasonable result. Moreover additional speed up is underway in #11975 and will be visible once these benchmarks get executed daily.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 19, 2024
@JaroslavTulach JaroslavTulach self-assigned this Dec 19, 2024
@JaroslavTulach JaroslavTulach marked this pull request as draft December 19, 2024 16:07
@JaroslavTulach JaroslavTulach changed the title Optimize building can casting of EnsoMultiValue Optimize building and casting of EnsoMultiValue Dec 19, 2024
@JaroslavTulach JaroslavTulach linked an issue Dec 20, 2024 that may be closed by this pull request
go 0 0


make_vector type n =
Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dedicated benchmark for EnsoMultiValue instances based on the idea of ArrayProxy one. Currently the more complicated benchmarks refuse to compile and the compiler bails out. The initial results are:

sbt:enso> runtime-benchmarks/benchOnly MultiValueBenchmarks
Benchmark                                                 Mode  Cnt    Score    Error  Units
MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  214.330 ±  4.179  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  219.803 ± 11.872  ms/op
MultiValueBenchmarks.sumOverFloat1                        avgt    5    0.079 ±  0.006  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  219.525 ±  6.393  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  203.788 ±  9.843  ms/op
MultiValueBenchmarks.sumOverInteger0                      avgt    5    0.074 ±  0.001  ms/op

After 630ec62 the results are better:

MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  30.109 ± 0.661  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  26.988 ± 0.446  ms/op
MultiValueBenchmarks.sumOverFloat1                        avgt    5   0.078 ± 0.003  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  27.821 ± 0.856  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  27.961 ± 0.263  ms/op
MultiValueBenchmarks.sumOverInteger0                      avgt    5   0.078 ± 0.002  ms/op

and there are no bailouts. Time to really speed things up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* The <b>base benchmark</b> for this suite. Measures how much it takes to access an Atom in a
* Vector, read {@code re:Float} field out of it and sum all of them together.
*/
@Benchmark
Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the base benchmark. The results after 4dacf53 are:

# the base one
MultiValueBenchmarks.sumOverComplexBaseBenchmark0         avgt    5  0.139 ± 0.013  ms/op

# these two are supposed to be faster
MultiValueBenchmarks.sumOverInteger1                      avgt    5  0.065 ± 0.003  ms/op
MultiValueBenchmarks.sumOverFloat2                        avgt    5  0.073 ± 0.002  ms/op

# these should catch up with sumOverComplexBaseBenchmark0 one day
MultiValueBenchmarks.sumOverComplexAndFloat5              avgt    5  8.580 ± 0.326  ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3  avgt    5  9.118 ± 0.483  ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6              avgt    5  8.110 ± 0.160  ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  9.393 ± 0.648  ms/op

still 60 times slower than it should be.

Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest slowdown is currently in reorderOnly branch:

reorderOnly

It seems to always allocate new arrays. Rather it should treat EnsoMultiType as compilation constants have everything ready from previous run.

Attempting to make things better in a follow up PR:

if (reorderOnly) {
var copyTypes = allTypesWith.executeAllTypes(dispatch, mv.extra);
if (i == 0 && dispatch.typesLength() == 1) {
return newNode.newValue(copyTypes, 1, mv.values);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if seems to speed sumOverFloatComplexRecastedToFloat4 up twice:

[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  1.071 ± 0.074  ms/op
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4  avgt    5  2.322 ± 0.086  ms/op

@JaroslavTulach JaroslavTulach added the CI: Clean build required CI runners will be cleaned before and after this PR is built. label Jan 4, 2025
@JaroslavTulach JaroslavTulach marked this pull request as ready for review January 4, 2025 06:38
@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Jan 4, 2025

Benchmarks are written. They have been sped up just like:

MultiValueBenchmarks.*5

There are possible ways to speed things up even more, but let's leave them for subsequent PRs like:

if (at != checks.length) {
// request for Number & Integer may yield two integers collision
// request for Number & Float may yield two floats collision
// request for Number & Integer & Float must yield one collision
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests in Conversion_Spec that combine these types. Thus such a combination must be supported (and it clashes with assert checking each type is in multi value only once) - but it doesn't have to be fast. Thus transferToInterpreter().

@JaroslavTulach JaroslavTulach added the CI: Keep up to date Automatically update this PR to the latest develop. label Jan 4, 2025
@JaroslavTulach JaroslavTulach changed the title Optimize building and casting of EnsoMultiValue Initial benchmarks for intersection types + a bit of speedup Jan 6, 2025

/**
* These benchmarks compare performance of {@link EnsoMultiValue}. They create a vector in a certain
* configuration representing numbers and then they perform {@code sum} operation on it.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Akirathan, @hubertp, @4e6 (and also @radeusgd):

  • these are the benchmarks for intersection types
  • a Vector is created by make_vector typ method
  • various types include Integer, Float, Complex as non-intersection type benchmarks to compare to
  • then there are various types mixing Float and Complex together into different intersection types

Ideally the intersection types benchmark results should be close to base benchmark sumOverComplexBaseBenchmark0 with some "reasonable overhead".

Are you OK with these benchmarks?

@JaroslavTulach JaroslavTulach removed the CI: Keep up to date Automatically update this PR to the latest develop. label Jan 7, 2025
return false;
}
final EnsoMultiType other = (EnsoMultiType) obj;
return Arrays.deepEquals(this.types, other.types);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't only deepEquals be under TruffleBoundary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes/No.

  • technically the previous statements are eligible for PE
  • however I don't want PE to call into equals and hashCode at all
  • only callbacks from ALL_TYPES.computeIfAbset should access these two methods
  • only findOrCreateSlow (itself by a boundary) should be calling ALL_TYPES.computeIfAbset
  • so maybe we don't need @TruffleBoundary here at all

However excluding code from PE multiple times hurts nothing...

@JaroslavTulach JaroslavTulach merged commit 2ead3f5 into develop Jan 7, 2025
47 of 49 checks passed
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/MultiType11846 branch January 7, 2025 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Clean build required CI runners will be cleaned before and after this PR is built. CI: No changelog needed Do not require a changelog entry for this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Benchmark EnsoMultiValue and speed it up
2 participants