Initial benchmarks for intersection types + a bit of speedup #11924
Conversation
make_vector type n =
A dedicated benchmark for EnsoMultiValue instances, based on the idea of the ArrayProxy one. Currently the more complicated benchmarks refuse to compile and the compiler bails out. The initial results are:
sbt:enso> runtime-benchmarks/benchOnly MultiValueBenchmarks
Benchmark Mode Cnt Score Error Units
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 214.330 ± 4.179 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 219.803 ± 11.872 ms/op
MultiValueBenchmarks.sumOverFloat1 avgt 5 0.079 ± 0.006 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 219.525 ± 6.393 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 203.788 ± 9.843 ms/op
MultiValueBenchmarks.sumOverInteger0 avgt 5 0.074 ± 0.001 ms/op
After 630ec62 the results are better:
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 30.109 ± 0.661 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 26.988 ± 0.446 ms/op
MultiValueBenchmarks.sumOverFloat1 avgt 5 0.078 ± 0.003 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 27.821 ± 0.856 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 27.961 ± 0.263 ms/op
MultiValueBenchmarks.sumOverInteger0 avgt 5 0.078 ± 0.002 ms/op
and there are no bailouts. Time to really speed things up.
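To make the measured workload concrete, here is a minimal plain-Java sketch of what the sumOver* benchmarks do conceptually: walk a vector of number-like values and sum their re field. The Complex record and sumRe method are hypothetical stand-ins modeled on the description above, not the actual benchmark code.

```java
import java.util.ArrayList;
import java.util.List;

public class SumWorkloadSketch {
  // Hypothetical stand-in for an Atom carrying a re field.
  record Complex(double re, double im) {}

  // Mirrors the measured loop: read the re field of each element
  // and accumulate the sum.
  static double sumRe(List<Complex> vector) {
    double sum = 0.0;
    for (Complex c : vector) {
      sum += c.re();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Complex> vector = new ArrayList<>();
    for (int i = 1; i <= 4; i++) {
      vector.add(new Complex(i, -i));
    }
    System.out.println(sumRe(vector)); // prints 10.0
  }
}
```

The benchmarks differ only in how the elements are wrapped (plain Float, Complex, or various intersection-typed multi values), so any extra cost over this loop is overhead of the value representation.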
Some speedup achieved …nsoMultiValue instance
 * The <b>base benchmark</b> for this suite. Measures how long it takes to access an Atom in a
 * Vector, read the {@code re:Float} field out of it and sum all of them together.
 */
@Benchmark
This is now the base benchmark. The results after 4dacf53 are:
# the base one
MultiValueBenchmarks.sumOverComplexBaseBenchmark0 avgt 5 0.139 ± 0.013 ms/op
# these two are supposed to be faster
MultiValueBenchmarks.sumOverInteger1 avgt 5 0.065 ± 0.003 ms/op
MultiValueBenchmarks.sumOverFloat2 avgt 5 0.073 ± 0.002 ms/op
# these should catch up with sumOverComplexBaseBenchmark0 one day
MultiValueBenchmarks.sumOverComplexAndFloat5 avgt 5 8.580 ± 0.326 ms/op
MultiValueBenchmarks.sumOverComplexFloatRecastedToFloat3 avgt 5 9.118 ± 0.483 ms/op
MultiValueBenchmarks.sumOverFloatAndComplex6 avgt 5 8.110 ± 0.160 ms/op
MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 9.393 ± 0.648 ms/op
still 60 times slower than it should be.
The biggest slowdown is currently in the reorderOnly branch: it seems to always allocate new arrays. Rather, it should treat EnsoMultiType instances as compilation constants and have everything ready from the previous run. Attempting to make things better in a follow-up PR:
if (reorderOnly) {
  var copyTypes = allTypesWith.executeAllTypes(dispatch, mv.extra);
  if (i == 0 && dispatch.typesLength() == 1) {
    return newNode.newValue(copyTypes, 1, mv.values);
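The caching idea behind "treat EnsoMultiType as compilation constants" can be sketched in plain Java as a one-entry inline cache keyed by object identity. MultiType, cachedKey and allTypesWith are hypothetical names for illustration; in the real Truffle node the cached values would be compilation-final fields.

```java
import java.util.Arrays;

public class ReorderCacheSketch {
  // Hypothetical stand-in for EnsoMultiType; identity serves as the cache key.
  static final class MultiType {
    final String[] types;
    MultiType(String... types) { this.types = types; }
  }

  // One-entry cache: remembers the array computed for the last MultiType seen.
  private MultiType cachedKey;
  private String[] cachedTypes;

  String[] allTypesWith(MultiType dispatch) {
    if (dispatch == cachedKey) {
      return cachedTypes; // fast path: no new array is allocated
    }
    // slow path: compute once and remember for the next call
    cachedKey = dispatch;
    cachedTypes = Arrays.copyOf(dispatch.types, dispatch.types.length);
    return cachedTypes;
  }
}
```

With the same MultiType seen repeatedly (as in the benchmark loop), the allocation happens only on the first call; every later call returns the cached array.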
This if seems to speed sumOverFloatComplexRecastedToFloat4 up by a factor of two:
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 1.071 ± 0.074 ms/op
[info] MultiValueBenchmarks.sumOverFloatComplexRecastedToFloat4 avgt 5 2.322 ± 0.086 ms/op
Benchmarks are written and they have been sped up. There are possible ways to speed things up even more, but let's leave them for subsequent PRs.
if (at != checks.length) {
  // request for Number & Integer may yield two integers collision
  // request for Number & Float may yield two floats collision
  // request for Number & Integer & Float must yield one collision
There are tests in Conversion_Spec that combine these types. Thus such a combination must be supported (and it clashes with the assert checking that each type is in a multi value only once) - but it doesn't have to be fast. Thus the transferToInterpreter().
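The pattern referred to here is the usual Truffle fast-path/slow-path split: the rare collision case is pushed behind a deoptimization point so it never pollutes compiled code. A plain-Java sketch (countMatches and the no-op transferToInterpreter stub are hypothetical; the real code calls Truffle's CompilerDirectives.transferToInterpreter()):

```java
public class SlowPathSketch {
  // No-op stand-in for CompilerDirectives.transferToInterpreter();
  // in real Truffle code this deoptimizes, marking the branch as cold.
  static void transferToInterpreter() {}

  // Counts how many requested checks were matched by the found types.
  // A mismatch (collision) is legal but rare, so it goes to the slow path.
  static int countMatches(String[] checks, String[] found) {
    int at = 0;
    for (String f : found) {
      for (String c : checks) {
        if (c.equals(f)) {
          at++;
          break;
        }
      }
    }
    if (at != checks.length) {
      transferToInterpreter(); // rare collision: correctness over speed
    }
    return at;
  }
}
```

The point is that the collision branch stays correct without being optimized, which matches "must be supported but doesn't have to be fast".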
/**
 * These benchmarks compare performance of {@link EnsoMultiValue}. They create a vector in a certain
 * configuration representing numbers and then they perform {@code sum} operation on it.
@Akirathan, @hubertp, @4e6 (and also @radeusgd):
- these are the benchmarks for intersection types
- a Vector is created by the make_vector typ method
- various typ values include Integer, Float, Complex as non-intersection type benchmarks to compare to
- then there are various typ values mixing Float and Complex together into different intersection types

Ideally the intersection types benchmark results should be close to the base benchmark sumOverComplexBaseBenchmark0 with some "reasonable overhead".
Are you OK with these benchmarks?
  return false;
}
final EnsoMultiType other = (EnsoMultiType) obj;
return Arrays.deepEquals(this.types, other.types);
Shouldn't only deepEquals be under TruffleBoundary?
Yes/no:
- technically the previous statements are eligible for PE
- however I don't want PE to call into equals and hashCode at all
- only callbacks from ALL_TYPES.computeIfAbsent should access these two methods
- only findOrCreateSlow (itself behind a boundary) should be calling ALL_TYPES.computeIfAbsent
- so maybe we don't need @TruffleBoundary here at all

However excluding code from PE multiple times hurts nothing...
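The interning scheme being discussed can be sketched in plain Java: equals/hashCode exist only so a map can canonicalize instances, and after interning, callers compare with ==. The MultiType class and findOrCreateSlow names below loosely mirror the PR's EnsoMultiType/findOrCreateSlow; this is an illustration, not the actual code.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

public class InternSketch {
  // Hypothetical stand-in for EnsoMultiType: interned so == works afterwards.
  static final class MultiType {
    private final String[] types;
    private MultiType(String[] types) { this.types = types; }

    // equals/hashCode are only ever invoked from inside the interning map.
    @Override
    public boolean equals(Object obj) {
      return obj instanceof MultiType other && Arrays.deepEquals(types, other.types);
    }

    @Override
    public int hashCode() {
      return Arrays.deepHashCode(types);
    }
  }

  private static final ConcurrentHashMap<MultiType, MultiType> ALL_TYPES =
      new ConcurrentHashMap<>();

  // Slow-path factory: the only caller of computeIfAbsent. Returns the
  // canonical instance, so later comparisons can use reference equality.
  static MultiType findOrCreateSlow(String... types) {
    var candidate = new MultiType(types.clone());
    return ALL_TYPES.computeIfAbsent(candidate, k -> k);
  }
}
```

Because every Type[] with equal elements maps to the same canonical instance, compiled code never needs to call equals at all; a plain == suffices, which is what the PR description means by comparing EnsoMultiType instances via ==.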
Pull Request Description

Addressing #11846 by writing a set of benchmarks comparing the same algorithm with different kinds of intersection type values. The fix for #11846 then:
- introduces Node types for performing operations: EnsoMultiValue.NewNode, FindIndexNode and AllTypesWith
- introduces EnsoMultiType replacing Type[]
- uses a single EnsoMultiType instance for Type[] with equal elements
- compares EnsoMultiType instances via ==
Originally the intersection types benchmarks were 60 times slower than the base benchmark. Now they are just 10-15 times slower. Given the current memory overhead of EnsoMultiValue and the nature of the benchmark (millions of + operation dispatches), that's a reasonable result. Moreover, additional speedup is underway in #11975 and will be visible once these benchmarks get executed daily.

Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
- the code follows the Scala and Java style guides

EnsoMultiValue.firstDispatch to speed benchmarks up #11975