[POC][DEPRECATED] exporter batcher - byte size based batching #12017
DEPRECATED
See #12091 for the updated POC.
Description
This is a POC of serialized-size-based batching.
Configuration is supported via an additional field on `MaxSizeConfig`. We will validate that at most one of the above fields is specified (TODO) and switch between item-count-based batching and byte-size-based batching accordingly.
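The "at most one field" check marked TODO above could look roughly like the sketch below. The type `sizeConfig`, its field names `MaxSizeItems` and `MaxSizeBytes`, and the treatment of zero as "unset" are all assumptions made for illustration; they are not taken from this PR.

```go
package main

import "errors"

// sizeConfig is a hypothetical stand-in for MaxSizeConfig.
// The field names are assumptions, not the PR's actual fields.
type sizeConfig struct {
	MaxSizeItems int // item-count limit; 0 means unset (assumed convention)
	MaxSizeBytes int // serialized-byte limit; 0 means unset (assumed convention)
}

type batchMode int

const (
	modeNone  batchMode = iota // no limit configured
	modeItems                  // item-count-based batching
	modeBytes                  // byte-size-based batching
)

var errBothLimits = errors.New("at most one of the item limit and the byte limit may be set")

// mode validates the config and reports which batching strategy applies.
func (c sizeConfig) mode() (batchMode, error) {
	switch {
	case c.MaxSizeItems < 0 || c.MaxSizeBytes < 0:
		return modeNone, errors.New("size limits must be non-negative")
	case c.MaxSizeItems > 0 && c.MaxSizeBytes > 0:
		return modeNone, errBothLimits
	case c.MaxSizeBytes > 0:
		return modeBytes, nil
	case c.MaxSizeItems > 0:
		return modeItems, nil
	default:
		return modeNone, nil
	}
}
```

Returning the mode from the validation step keeps the "which limit is active" decision in one place, so the batcher can switch on it once instead of re-inspecting both fields.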
To get the byte size of OTLP protos, this PR updates `pdata/internal/cmd/pdatagen/internal/templates/message.go.tmpl` to expose an interface `Size()`. This change will apply to all `pdatagen`-generated files.
Performance Benchmark
Conclusions
`maxSizeLimit` is large enough for most requests. (Assumption 1)
Benchmark 1
This benchmark merge-splits 1000 requests x 10 logs / request. None of the incoming requests results in a split. The accumulated batch is ~1 MB.
Benchmark 2
The following benchmarks test the case where every request would result in a split.
Case 1: Merging 10 requests, each containing 10001 logs / ~1 MB and slightly above the limit
Case 2: Merging 10 requests, each containing 9999 logs / ~1 MB and slightly below the limit
Benchmark 3
This benchmark merge-splits a request with 100000 logs / 9.6 MB into 10 batches. Under Assumption 1, this should rarely occur in practice.
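For reference, splitting one oversized request by serialized size can be sketched as a single pass over per-record sizes. This is a simplified illustration, not the PR's implementation: `splitBySize` is a hypothetical helper, and it works on plain integer sizes rather than pdata log records.

```go
package main

// splitBySize partitions per-record serialized sizes into consecutive
// chunks whose total stays within maxBytes. A single record larger than
// maxBytes is emitted as its own chunk (an assumption of this sketch).
func splitBySize(recordSizes []int, maxBytes int) [][]int {
	var chunks [][]int
	start, sum := 0, 0
	for i, s := range recordSizes {
		// Close the current chunk if adding this record would exceed the limit.
		if i > start && sum+s > maxBytes {
			chunks = append(chunks, recordSizes[start:i])
			start, sum = i, 0
		}
		sum += s
	}
	if start < len(recordSizes) {
		chunks = append(chunks, recordSizes[start:])
	}
	return chunks
}
```

The chunks are sub-slices of the input, so no record data is copied. The per-record size lookup is the part that a real implementation would pay for on every split.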
Optimization 1
Instead of splitting precisely, simply put the new request into a new batch if it goes beyond capacity.
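A minimal sketch of this idea, assuming a hypothetical `request` type that only carries its serialized size. How a single request larger than the limit is handled is an assumption here: the sketch lets it through as its own oversized batch.

```go
package main

// request is a hypothetical stand-in for an exporter request; only its
// serialized size matters for this sketch.
type request struct {
	sizeBytes int
}

// mergeNoSplit groups requests into batches without ever splitting a
// request: when the next request would push the current batch past
// maxBytes, the current batch is closed and the request starts a new one.
func mergeNoSplit(reqs []request, maxBytes int) [][]request {
	var batches [][]request
	var cur []request
	curBytes := 0
	for _, r := range reqs {
		if len(cur) > 0 && curBytes+r.sizeBytes > maxBytes {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, r)
		curBytes += r.sizeBytes
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}
```

With three 400-byte requests and a 1000-byte limit, this yields two batches (two requests, then one), leaving 200 bytes of the first batch unused where a precise split would have filled it.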
Running Benchmark 2 again with optimization:
Case 1: Merging 10 requests, each containing 10001 logs / ~1 MB and slightly above the limit
Takeaway: if the size limit is not large enough, this optimization might hurt performance.
Case 2: Merging 10 requests, each containing 9999 logs / ~1 MB and slightly below the limit
Optimization 2
Optimization 2 comes at the cost of accuracy. For example, in Benchmark 1 the actual byte size is 1010000 while the estimated byte size is 1011010, a delta of 1010 bytes (about 0.1%). The delta is expected to be larger in real workloads, where logs carry more metadata.
(The original cost is not high either, so optimization 2 is not essential)
Link to tracking issue
Fixes #3262
Testing
Documentation
TODO: ByteSize() should return int64 instead of int