Producer: Aggregate by Predicted Shard #57

etspaceman · 2023-02-22T18:45:23Z

The KPL will attempt to leverage a shard-map and aggregate records that share a predicted shard. The logic for this is as follows:

- If the shard map is not available for whatever reason (e.g. an error occurred), no aggregation is performed.
- If the shard map is stale, the KPL will produce a batch and detect whether the predicted shard differs from the shard the batch actually landed on. If it detects a difference, the shard map is refreshed and the records are resent.
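
To make the prediction step concrete, here is a rough sketch of grouping records by predicted shard. It is not the KPL's or kinesis4cats's actual code. The shard ids, the two-shard `SHARD_MAP`, and the function names are all made up for illustration. The only thing taken from Kinesis itself is that a partition key is mapped to a 128-bit hash key via MD5.

```python
import hashlib

MAX_HASH = 2**128 - 1

# Hypothetical shard map: (shard_id, start_hash, end_hash), ranges inclusive.
SHARD_MAP = [
    ("shardId-000000000000", 0, 2**127 - 1),
    ("shardId-000000000001", 2**127, MAX_HASH),
]


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def predict_shard(partition_key, shard_map):
    """Return the shard id whose range contains the key's hash, or None."""
    hk = hash_key(partition_key)
    for shard_id, start, end in shard_map:
        if start <= hk <= end:
            return shard_id
    return None


def group_by_predicted_shard(records, shard_map):
    """Batch (partition_key, data) records that share a predicted shard.

    With no shard map, or no prediction for a record, that record is left
    in a batch of its own, i.e. it is not aggregated.
    """
    if not shard_map:
        return [[r] for r in records]
    groups = {}
    singles = []
    for r in records:
        shard = predict_shard(r[0], shard_map)
        if shard is None:
            singles.append([r])
        else:
            groups.setdefault(shard, []).append(r)
    return list(groups.values()) + singles


records = [("a", b"1"), ("abc", b"2"), ("a", b"3")]
batches = group_by_predicted_shard(records, SHARD_MAP)
```

With this map, `"a"` and `"abc"` hash into different halves of the key space, so `batches` holds two groups. Passing an empty shard map yields one batch per record.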

The KCL silently discards any record inside an aggregated record whose calculated hash key falls outside the range of the shard it was received from. So re-sending the records, aggregated against the proper shard-id, is vital to receiving all records. This can result in duplicates though, as some records of the original request may already have been produced successfully.

An example: consider a scale-up operation, where new shards are created and the hash ranges of the new shards are smaller than those in the previous shard map. Some records from the original attempt may have been consumed successfully by the KCL, because their calculated hash keys fell into the smaller range. Others may not have. A retry of the request would therefore result in duplicates being consumed by the KCL.
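
A toy simulation of that scale-up scenario, to show where the duplicate comes from. Everything here is illustrative: the shard names, the two maps, and the helper functions are hypothetical. It also assumes the aggregated record is routed by the hash key of its first sub-record, which is a simplification rather than a statement about the KPL.

```python
import hashlib

MAX_HASH = 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def locate(hk, shard_map):
    """Find the shard whose inclusive hash range contains hk."""
    for shard_id, (start, end) in shard_map.items():
        if start <= hk <= end:
            return shard_id
    return None


def deaggregate(sub_records, shard_range):
    """Consumer side: keep only sub-records hashing into the shard's range."""
    start, end = shard_range
    return [pk for pk in sub_records if start <= hash_key(pk) <= end]


def simulate():
    # The producer still believes there is one shard; the stream was split.
    stale_map = {"parent": (0, MAX_HASH)}
    live_map = {"child-a": (0, 2**127 - 1), "child-b": (2**127, MAX_HASH)}

    batch = ["a", "abc"]  # partition keys aggregated into one record

    # First attempt: routed by the first sub-record (assumption, see above).
    predicted = locate(hash_key(batch[0]), stale_map)
    actual = locate(hash_key(batch[0]), live_map)
    consumed = deaggregate(batch, live_map[actual])  # "abc" is dropped here

    # Mismatch detected: refresh the map, regroup, and resend everything.
    if predicted != actual:
        regrouped = {}
        for pk in batch:
            regrouped.setdefault(locate(hash_key(pk), live_map), []).append(pk)
        for shard_id, subs in regrouped.items():
            consumed += deaggregate(subs, live_map[shard_id])
    return consumed


consumed = simulate()
```

Here `"a"` is consumed on the first attempt and again after the resend, while `"abc"` only arrives on the retry, so the consumer ends up with one duplicate.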

This also seems like quite a bit of overhead to save a few requests. So the default today in kinesis4cats's Producer differs from the KPL: it only aggregates records that share a partition key. This means the shard map can be stale or empty and aggregation can still occur safely.
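
For comparison, a minimal sketch of aggregating by partition key only. This is not the kinesis4cats implementation, and the function name and record shape are hypothetical. The point is that no shard map is consulted: every record in a group has the same partition key, hence the same hash key, so the whole group lands on the same shard.

```python
def group_by_partition_key(records):
    """Batch (partition_key, data) records that share a partition key."""
    groups = {}
    for pk, data in records:
        groups.setdefault(pk, []).append((pk, data))
    return groups


records = [("user-1", b"a"), ("user-2", b"b"), ("user-1", b"c")]
by_key = group_by_partition_key(records)
```

The trade-off is fewer records per aggregate when partition keys are highly varied, in exchange for never needing a retry caused by a stale shard map.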

kinesis4cats should also offer the capabilities that the KPL offers, as an opt-in feature. This issue is for that feature implementation.

References:

https://github.com/awslabs/amazon-kinesis-producer/pull/277/files#diff-d1cbd3793299fd3c06ee3b0232d378f6341e9b5eb64eaa9b7b2094a9c911431fR211

https://github.com/awslabs/amazon-kinesis-producer/blob/master/aws/kinesis/core/aggregator.h

https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client/src/main/java/software/amazon/kinesis/retrieval/AggregatorUtil.java#L145-L150
