[exporter] exporter batcher - byte size based batching #12091
Description
This is a POC of serialized-size-based batching (version 2).
Configuration is supported via an additional field on
MaxSizeConfig
. We will validate that at most one of these fields is specified (TODO) and switch between item-count-based batching and byte-size-based batching accordingly.
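A minimal sketch of the intended validation, assuming the byte limit sits next to the existing item limit on MaxSizeConfig; the field names MaxSizeItems/MaxSizeBytes and the mapstructure keys are assumptions for illustration, not necessarily what this PR uses:

```go
package exporterbatcher

import "errors"

// MaxSizeConfig caps an outgoing batch either by item count or by
// serialized byte size. Field names here are hypothetical.
type MaxSizeConfig struct {
	MaxSizeItems int `mapstructure:"max_size_items"` // item-count-based limit (assumed name)
	MaxSizeBytes int `mapstructure:"max_size_bytes"` // byte-size-based limit (assumed name)
}

// Validate enforces that at most one limit is set, so the batcher can
// unambiguously pick item-count-based or byte-size-based batching.
func (c MaxSizeConfig) Validate() error {
	if c.MaxSizeItems > 0 && c.MaxSizeBytes > 0 {
		return errors.New("at most one of max_size_items and max_size_bytes can be specified")
	}
	return nil
}
```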
To get the byte size of OTLP protos, this PR updates
pdata/internal/cmd/pdatagen/internal/templates/message.go.tmpl
to expose a Size() method. This change applies to all pdatagen-generated files.
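For context, the serialized byte size of a payload is already reachable through the public pdata API; the runnable snippet below measures a payload with plog.ProtoMarshaler (assuming its LogsSize helper, which is separate from the internal generated Size() this PR adds):

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/plog"
)

func main() {
	// Build a one-record payload.
	ld := plog.NewLogs()
	lr := ld.ResourceLogs().AppendEmpty().ScopeLogs().AppendEmpty().LogRecords().AppendEmpty()
	lr.Body().SetStr("hello, batcher")

	// Measure the protobuf-encoded size without marshaling into a buffer.
	var m plog.ProtoMarshaler
	fmt.Println("serialized byte size:", m.LogsSize(ld))
}
```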
Getting serialized byte size
Getting the serialized byte size is an expensive operation that is invoked repeatedly during byte-size-based merge-splitting. The POC includes two optimizations that improve performance by (1) reducing how many times sizes are calculated and (2) reducing the cost of each calculation.
Caching size in Request
We repeatedly call
request.ByteSize()
while merge-splitting. By caching the previous result in request.ByteSize(), we were able to improve performance, especially in the case where many requests are merged into a single batch.
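A minimal sketch of the caching idea, assuming a request type that wraps plog.Logs; the names logsRequest and cachedByteSize are illustrative, not the PR's actual implementation:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/collector/pdata/plog"
)

type logsRequest struct {
	ld plog.Logs
	// cachedByteSize memoizes the serialized size. Zero doubles as the
	// "not yet computed" marker, which is harmless: an empty payload is
	// simply recomputed.
	cachedByteSize int
}

// ByteSize computes the serialized size at most once. Any mutation of
// the request (e.g. merging in another request) must reset
// cachedByteSize to 0 so the next call recomputes it.
func (r *logsRequest) ByteSize() int {
	if r.cachedByteSize == 0 {
		var m plog.ProtoMarshaler
		r.cachedByteSize = m.LogsSize(r.ld)
	}
	return r.cachedByteSize
}

func main() {
	ld := plog.NewLogs()
	ld.ResourceLogs().AppendEmpty().ScopeLogs().AppendEmpty().LogRecords().AppendEmpty().Body().SetStr("x")
	req := &logsRequest{ld: ld}
	fmt.Println(req.ByteSize()) // computed
	fmt.Println(req.ByteSize()) // served from cache
}
```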
Proto size calculation
While filling the batch, we repeatedly evaluate the size of each new item and the remaining space, and both calculations are expensive. This PR shows an optimization that updates the remaining space iteratively from the new item's size:

capacity_left -= delta_size(new_item)

where delta_size(new_item) is the growth of the batch's serialized size when the item is appended: the item's own byte size plus the protobuf field overhead (tag byte and length-prefix varint).
This helped significantly improve performance. For benchmark results without this optimization, see #12017.
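A sketch of this bookkeeping, under the assumption that each appended item is encoded as a length-delimited protobuf sub-message (1-byte field tag + varint length prefix + payload); deltaSize and capacityLeft are illustrative names:

```go
package main

import "fmt"

// varintLen returns the number of bytes the varint encoding of v uses.
func varintLen(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

// deltaSize is the growth of the batch's serialized size when a
// sub-message of itemSize bytes is appended: 1 tag byte + length prefix
// + payload (assuming a field number small enough for a 1-byte tag).
func deltaSize(itemSize int) int {
	return 1 + varintLen(uint64(itemSize)) + itemSize
}

func main() {
	const maxSizeBytes = 1000
	capacityLeft := maxSizeBytes

	items := []int{200, 300, 600} // serialized sizes of incoming items
	for _, itemSize := range items {
		d := deltaSize(itemSize)
		if d > capacityLeft {
			fmt.Printf("item of %d bytes triggers a split (delta %d > capacity %d)\n", itemSize, d, capacityLeft)
			break
		}
		// Update the remaining space incrementally instead of
		// re-serializing the whole batch to measure it.
		capacityLeft -= d
		fmt.Printf("added item of %d bytes, capacity left %d\n", itemSize, capacityLeft)
	}
}
```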
Performance Benchmark
Benchmark 1 - 1000 small requests merge into one batch
This benchmark merges 1000 requests of 10 logs each; the accumulated batch is ~1 MB. (A sketch of this setup appears after Benchmark 3.)
Benchmark 2 - Every incoming request causes a split
Case 1: Merging 10 requests, each containing 10001 logs (~1 MB), slightly above the limit
Case 2: Merging 10 requests, each containing 9999 logs (~1 MB), slightly below the limit
Benchmark 3 - A huge request splits into 10 batches
This benchmark merge-splits a request with 100000 logs (~10 MB) into 10 batches.
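A hypothetical shape for Benchmark 1 using the standard testing package; it merges 1000 requests of 10 records each by copying their ResourceLogs into one accumulated batch. This approximates the described workload but is not the PR's actual benchmark code:

```go
package batchbench

import (
	"testing"

	"go.opentelemetry.io/collector/pdata/plog"
)

// makeRequest builds one request with the given number of log records,
// each carrying a ~100-byte body so 1000 x 10 records total ~1 MB.
func makeRequest(records int) plog.Logs {
	ld := plog.NewLogs()
	lrs := ld.ResourceLogs().AppendEmpty().ScopeLogs().AppendEmpty().LogRecords()
	body := "a log body padded out to roughly one hundred bytes so the accumulated batch lands near one megabyte"
	for i := 0; i < records; i++ {
		lrs.AppendEmpty().Body().SetStr(body)
	}
	return ld
}

func BenchmarkMerge1000SmallRequests(b *testing.B) {
	reqs := make([]plog.Logs, 1000)
	for i := range reqs {
		reqs[i] = makeRequest(10)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		batch := plog.NewLogs()
		for _, r := range reqs {
			// Copy rather than move so every iteration merges the
			// same inputs.
			for j := 0; j < r.ResourceLogs().Len(); j++ {
				r.ResourceLogs().At(j).CopyTo(batch.ResourceLogs().AppendEmpty())
			}
		}
	}
}
```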
Link to tracking issue
Fixes #3262
Testing
Documentation
TODO: ByteSize() should return int64 instead of int