Releases: redpanda-data/redpanda
v24.3.4
Features
- Introduces the node config `crash_loop_sleep_sec`, which sets the time the broker sleeps before terminating the process when the limit on the number of consecutive times a broker can crash has been reached. This is most useful in Kubernetes environments where setting this value allows customers to have ssh access into a crash looping pod for a short window of time. by @pgellert in #24825
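The effect of `crash_loop_sleep_sec` can be pictured with a small sketch. This is an illustration only, not Redpanda's implementation: the function `handle_startup`, its `consecutive_crashes` and `crash_limit` parameters, and the injectable `sleep` callable are all hypothetical names; only `crash_loop_sleep_sec` comes from the note above.

```python
import time
from typing import Callable, Optional


def handle_startup(consecutive_crashes: int,
                   crash_limit: int,
                   crash_loop_sleep_sec: Optional[float],
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Hypothetical startup check, not Redpanda's actual code.

    Returns "start" while the crash limit has not been reached; otherwise
    optionally sleeps and then returns "terminate".
    """
    if consecutive_crashes < crash_limit:
        return "start"
    # Limit reached: pause first (if configured) so an operator has a
    # window to inspect the pod, then give up.
    if crash_loop_sleep_sec:
        sleep(crash_loop_sleep_sec)
    return "terminate"
```

Passing `sleep` as a parameter is only a convenience for exercising the sketch without actually waiting.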
Bug Fixes
- Avoid large allocations for the kafka response sequencing map. by @pgellert in #24743
- Fixes Iceberg metadata serialization to avoid writing an extraneous empty Avro block. This would previously prevent some query engines (e.g. BigQuery) from reading tables created by Iceberg Topics. by @andrwng in #24920
- Fixes a bug where failing to audit an authentication event could lead to a broker crash. by @pgellert in #24738
- Fixes a crash during partition shutdown. This can happen during partition moves (cross core/broker) or at broker shutdown. by @bharathv in #24939
- Fixes an issue where transactions incorrectly time out due to incorrect cleanup of evicted producers. by @bharathv in #24879
- Remove partial kvstore snapshots at startup. by @ztlpn in #24845
- #24914 Fixes integer overflow issues when given a schema via the `POST /subject/{subject}/version` endpoint where version was > INT_MAX or a negative value was provided. by @michael-redpanda in #24916
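The class of bug behind the integer overflow fix above can be illustrated with an explicit range check. The sketch below is not the Schema Registry code: `parse_schema_version` is a hypothetical helper, and treating every negative value as invalid is an assumption made for the example.

```python
INT32_MAX = 2**31 - 1  # the INT_MAX bound for a 32-bit signed integer


def parse_schema_version(raw: str) -> int:
    """Parse a version string, rejecting values that do not fit the range.

    Hypothetical helper: assumes valid versions lie in 0..INT32_MAX.
    """
    value = int(raw)  # Python ints are unbounded, so the check is explicit
    if value < 0 or value > INT32_MAX:
        raise ValueError(f"version {raw!r} is out of range")
    return value
```

Rejecting out-of-range input before it reaches a fixed-width integer is what prevents the wraparound.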
Improvements
- stable leadership under load by @mmaslankaprv in #24773
- PR #24672 [v24.3.x] raft: Make load_snapshot exception safe by @Lazin
- PR #24747 [v24.3.x] [CORE-8485] Reset translation state on snapshot by @mmaslankaprv
- PR #24764 [v24.3.x] [CORE-8787] Schema Registry: Support normalize=true for protobuf by @BenPope
- PR #24765 [v24.3.x] [CORE-8754] Handle new TLS error code by @michael-redpanda
- PR #24783 [CORE-8787] [v24.3.x] schema_registry: Normalization improvements by @BenPope
- PR #24785 [v24.3.x] bazel: Update protobuf to v29.0 by @BenPope
- PR #24786 [v24.3.x] bazel: add rp_util and compat by @rockwotj
- PR #24790 [v24.3.x] c/topic_table: notify ntp delta waiters in batches by @ztlpn
- PR #24799 [v24.3.x] [CORE-8450] schema_registry/protobuf: Optimize construction of iobuf by @BenPope
- PR #24805 [v24.3.x] rm_stm: improved logging related to producer eviction by @bharathv
- PR #24809 Manual backport 24772 v24.3.x 935 by @michael-redpanda
- PR #24836 [v24.3.x] ducktape: bump spark version by @ztlpn
- PR #24841 [v24.3.x] r/stm_manager: stop state machines before waiting for gate by @mmaslankaprv
- PR #24872 [v24.3.x] pandaproxy/sr: Fix normalized rendering for custom options by @IoannisRP
- PR #24903 [v24.3.x] CORE-8804 Make brokers field no longer required by @michael-redpanda
- PR #24906 [v24.3.x] Fix partitions local summary by @bashtanov
- PR #24910 [v24.3.x] pandaproxy/sr: Add rendering support for extension range options by @IoannisRP
Full Changelog: v24.3.3...v24.3.4
v24.2.16
Features
- #24826 Introduces the node config `crash_loop_sleep_sec`, which sets the time the broker sleeps before terminating the process when the limit on the number of consecutive times a broker can crash has been reached. This is most useful in Kubernetes environments where setting this value allows customers to have ssh access into a crash looping pod for a short window of time. by @pgellert in #24846
Bug Fixes
- Avoid large allocations for the kafka response sequencing map. by @pgellert in #24742
- Fixes a bug where failing to audit an authentication event could lead to a broker crash. by @pgellert in #24739
- Fixes an issue where transactions incorrectly time out due to incorrect cleanup of evicted producers. by @bharathv in #24878
- Remove partial kvstore snapshots at startup. by @ztlpn in #24843
- #24915 Fixes integer overflow issues when given a schema via the `POST /subject/{subject}/version` endpoint where version was > INT_MAX or a negative value was provided. by @michael-redpanda in #24917
Improvements
- #24668 stable leadership under load by @mmaslankaprv in #24708
Full Changelog: v24.2.15...v24.2.16
v24.3.3
Bug Fixes
- Fixes a bug in Redpanda's Iceberg manifest list Avro definition that previously resulted in an end-of-file (EOF) error when reading manifest list Avro files written by other engines. This could previously crash Redpanda or block Redpanda from appending Iceberg data, and could also prevent certain query engines from successfully reading Iceberg data written by Redpanda. by @andrwng in #24650
- Fixes a bug which may lead to `archival_metadata_stm` inconsistencies when reconfiguring clusters with recovered compacted topics. by @mmaslankaprv in #24678
- #24684 Fixes an issue that blocked the compaction of consumer offsets with group transactions. by @bharathv in #24688
- Fixes a rare bug leading to offset translation inconsistency in recovered topics. by @mmaslankaprv in #24628
Improvements
- Added metrics for pandaproxy resource usage. by @IoannisRP in #24603
- Adds logging to mention data removed by compaction. by @andrwng in #24736
- Move failed authorization log statements from the `kafka` logger to a new `kafka/authz` logger, allowing for fine-grained control over log statements for failed authorization. by @rockwotj in #24718
- rpk now supports well-known protobuf types when encoding/decoding records using Schema Registry. by @r-vasquez in #24699
- PR #24591 [v24.3.x] pandaproxy: add missing internal metrics by @IoannisRP
- PR #24608 [v24.3.x] `storage`: add `tombstones_removed` metric to `probe` by @WillemKauf
- PR #24619 [v24.3.x] Offset translator consistency validation by @mmaslankaprv
- PR #24627 [v24.3.x] rpk remote debug bundle: job-id help text change by @r-vasquez
- PR #24705 [v24.3.x] kafka/client: replace std::vector with chunked vector by @IoannisRP
- PR #24729 [v24.3.x] rpk bundle: Fix race condition in SASL credential redaction by @r-vasquez
Full Changelog: v24.3.2...v24.3.3
v24.2.15
Bug Fixes
- Fixes a bug where failing to audit an authentication event could lead to a broker crash. by @pgellert in #24739
- Fixes a bug which may lead to `archival_metadata_stm` inconsistencies when reconfiguring clusters with recovered compacted topics. by @mmaslankaprv in #24680
- #24685 Fixes an issue that blocked the compaction of consumer offsets with group transactions. by @bharathv in #24689
- Fixes a rare bug leading to offset translation inconsistency in recovered topics. by @mmaslankaprv in #24629
Full Changelog: v24.2.14...v24.2.15
v24.2.14
Bug Fixes
- Fixes a bug in which a segment being rolled and closed could race, leading to a triggered `vassert`. by @WillemKauf in #24559
Improvements
- Added metrics for pandaproxy resource usage. by @IoannisRP in #24604
- Show leader id in `/v1/cluster/partitions` response. by @ztlpn in #24584
Full Changelog: v24.2.13...v24.2.14
v24.3.2
Features
- Improve the user messages when the `topic_partitions_reserve_shard0` cluster config is used and a user tries to create a topic with more partitions than the core-based partition limit. by @pgellert in #24461
Bug Fixes
- Ensure the `redpanda_cloud_storage_cloud_log_size` metric is consistent across all replicas. It was previously updated only infrequently, and only from the leader replica, which led to inconsistent or stale values. by @nvartolomei in #24364
- Fixed a bug in which sliding window compaction may become stuck on failing to build an index map for a single segment. by @WillemKauf in #24424
- Fixes a bug in which a segment being rolled and closed could race, leading to a triggered `vassert`. by @WillemKauf in #24560
- Fixes a bug in which segments which may have tombstones in them were not considered eligible for self-compaction. by @WillemKauf in #24500
- Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore). by @andrwng in #24455
- Fixes a bug where `rpk` wasn't parsing `--help` when used alongside `--redpanda-id` in `rpk cloud <provider> byoc apply` by @r-vasquez in #24396
- Fixes a bug where serializing manifests for Iceberg topics with decimal fields could cause Redpanda to crash or upload invalid manifests by @oleiman in #24467
- Fixes a crash resulting from incorrect cleanup of log readers used for iceberg translation. by @bharathv in #24576
- Fixes a race that could prevent Iceberg translation from happening following a leadership change. by @andrwng in #24562
- Fixes accounting of the Iceberg commit lag metric, which could remain erroneously high in some cases even though the translation is fully caught up. Additionally, the change ensures that only partition leaders emit lag metrics while followers emit 0 lag. by @bharathv in #24575
- If a discrete disk is used for cloud storage cache Redpanda previously rejected writes if that disk (cache disk) was full (in degraded state). This is incorrect since the cache disk isn't in the way of writes. From now on, reject writes only if the data disk is full (in degraded state). by @nvartolomei in #24486
- #24428 Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types. by @pgellert in #24430
- #24587 Redpanda will now permit topics to be created with `redpanda.remote.[read|write]` set to `true` when a license is expired or missing provided that the cluster config `cloud_storage_enabled` is set to `false`. by @michael-redpanda in #24588
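One reading of the license rule in the last fix above can be written as a small predicate. This is a sketch under stated assumptions rather than the broker's logic: `may_create_topic` and its parameter names are hypothetical, and it assumes that a topic without either remote property set is never blocked by the license check.

```python
def may_create_topic(remote_read: bool,
                     remote_write: bool,
                     license_valid: bool,
                     cloud_storage_enabled: bool) -> bool:
    """Hypothetical check mirroring the rule described in #24587."""
    wants_remote = remote_read or remote_write
    if not wants_remote or license_valid:
        return True
    # License expired or missing: only allow the remote properties when
    # cloud storage is disabled cluster-wide.
    return not cloud_storage_enabled
```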
Improvements
- Adds additional debug log messages in the datalake coordinator regarding files to be committed to Iceberg. by @andrwng in #24563
- Beta version of Iceberg support was incorrectly classified as "enterprise only". by @oleiman in #24443
- Leader balancer: don't treat each core as independent and balance total number of leaders on each node as well. by @ztlpn in #24440
- Show leader id in `/v1/cluster/partitions` response. by @ztlpn in #24585
- #24539 Disable datalake services in recovery mode by @ztlpn in #24549
- `rpk topic describe` now supports the `--format` flag to display the output in either JSON or YAML. by @r-vasquez in #24438
Full Changelog: v24.3.1...v24.3.2
v24.2.13
Features
- Improve the user messages when the `topic_partitions_reserve_shard0` cluster config is used and a user tries to create a topic with more partitions than the core-based partition limit. by @pgellert in #24462
Bug Fixes
- Ensure the `redpanda_cloud_storage_cloud_log_size` metric is consistent across all replicas. It was previously updated only infrequently, and only from the leader replica, which led to inconsistent or stale values. by @nvartolomei in #24365
- Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore). by @andrwng in #24454
- Fixes a bug where `rpk` wasn't parsing `--help` when used alongside `--redpanda-id` in `rpk cloud <provider> byoc apply` by @r-vasquez in #24397
- If a discrete disk is used for cloud storage cache Redpanda previously rejected writes if that disk (cache disk) was full (in degraded state). This is incorrect since the cache disk isn't in the way of writes. From now on, reject writes only if the data disk is full (in degraded state). by @nvartolomei in #24484
- #24431 Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types. by @pgellert in #24432
- PR #24200 [v24.2.x] cst/cache: fix use-after-move caused by calling get_exception twice by @nvartolomei
- PR #24329 [v24.2.x] Fixed race condition between appends and prefix truncation by @mmaslankaprv
- PR #24335 rm_stm: remove always true assert on transaction_ga feature by @bharathv
- PR #24349 [v24.2.x] c/balancer_planner: check if topic exists in node count map by @mmaslankaprv
- PR #24372 [v24.2.x] c/controller_backend: allow `shutdown_partition` to fail on app shutdown by @bashtanov
- PR #24459 [v24.2.x] raft/c: fix an indefinite hang in transfer leadership by @bharathv
Full Changelog: v24.2.12...v24.2.13
v24.3.1
Features
- Added support for Iceberg Topics (various improvements below)
- New REST API for mounting/unmounting topics by @mmaslankaprv in #23167
- adds rpk cluster storage topic mount, unmount, list-mount, status-mount, cancel-mount by @gene-redpanda in #23575
- Add leadership pinning: ability to set preferred racks for topic partition leaders. To configure, set `redpanda.leaders.preference` topic config property or `default_leaders_preference` cluster config property. by @ztlpn in #23691
- Enable `node_local_core_assignment` feature by default by @ztlpn in #23453
- Adds Schema Registry support for the JavaScript Data Transforms SDK by @oleiman in #21491
- Adds list-mountable to allow listing mountable topics by @gene-redpanda in #23924
- Adds the topic property `delete.retention.ms`, as well as the cluster property `tombstone_retention_ms`. Configuring these allows for the removal of tombstone records in compacted topics with tiered storage disabled in `redpanda`. by @WillemKauf in #23662
- Schema Registry: Support `normalize=true` by @BenPope in #22519
- Schema Registry: added support for the "verbose" query parameter on the schema compatibility checker endpoint by @pgellert in #22877
- Schema Registry: verbose compatibility error reporting is now supported for JSON as well by @pgellert in #23208
- #17984 Adds a new broker configuration `transaction_max_timeout_ms`. The configuration controls the maximum allowed user-set timeout for transactions. If a client-requested transaction timeout exceeds this configuration, the broker will return an error during transactional producer initialization. This guardrail prevents hanging transactions from blocking consumer progress. The default value is 15 minutes. by @bharathv in #21504
- rpk: Add `rpk registry mode` to manage the schema registry mode. by @r-vasquez in #22675
- rpk: supports triggering on-demand partition balancer by @daisukebe in #22855
- Added support for using PKCS#12 files for TLS services by @michael-redpanda in #21313
- Adds admin API endpoint for enterprise feature info `GET /v1/features/enterprise` by @oleiman in #23314
- A new metric (cluster_features_enterprise_license_expiry_sec) is added for easier monitoring of the enterprise license's expiry time. by @pgellert in #23367
- After the cluster is first formed, a trial license is automatically loaded to provide an evaluation period of enterprise features. by @pgellert in #23893
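The `transaction_max_timeout_ms` guardrail listed among the features above amounts to a bounds check at producer initialization. The following is a minimal sketch, not broker code: `init_transactional_producer` and `TransactionTimeoutTooLarge` are hypothetical names, and the only values taken from the note are the property name and its 15-minute default.

```python
DEFAULT_TRANSACTION_MAX_TIMEOUT_MS = 15 * 60 * 1000  # 15 minutes, per the note


class TransactionTimeoutTooLarge(Exception):
    """Hypothetical error raised when the requested timeout is too large."""


def init_transactional_producer(
        requested_timeout_ms: int,
        max_timeout_ms: int = DEFAULT_TRANSACTION_MAX_TIMEOUT_MS) -> int:
    """Accept the client's timeout only if it does not exceed the cap."""
    if requested_timeout_ms > max_timeout_ms:
        raise TransactionTimeoutTooLarge(
            f"{requested_timeout_ms} ms exceeds the limit of {max_timeout_ms} ms")
    return requested_timeout_ms
```

Failing at initialization, rather than later, means a misconfigured client learns about the problem before it can open a long-lived transaction.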
Improvements
- `--regex` flag in `rpk topic describe` now supports internal topics. by @r-vasquez in #23487
- A number of optimizations to local storage compaction. by @WillemKauf in #23380
- Add an LRU caching layer to Rust transform SDK Schema Registry client by @oleiman in #19859
- Add support for differentiating tombstone records from empty-string value records in `rpk produce` and `rpk consume`. by @WillemKauf in #23264
- Added support for Metadata API v8 by @michael-redpanda in #22669
- Added the `vectorized_kafka_rpc_connections_rejected_rate_limit` metric, which counts incoming Kafka connections rejected due to the connection rate limit (if set), analogously to the existing `vectorized_kafka_rpc_connections_rejected` metric, which counts connections rejected due to hitting the open connection limit. by @travisdowns in #22803
- Adds a shard label to some consumer group metrics. by @ballard26 in #23339
- Adds support for setting schema registry connection parameters in the `rpk` stanza of `redpanda.yaml`. by @andrewstucki in #24017
- Adds the `cloud_storage_backend::oracle` value, and helps the `s3_client` properly configure for OCI storage. by @WillemKauf in #22902
- Adds the ability to configure Node UUID and ID overrides at broker startup. by @oleiman in #22972
- Allow `rpk cluster self-test start` to run, even in a cluster with mixed versions of `redpanda` (before and after `cloudcheck` addition in `24.2.x`). by @WillemKauf in #21370
- Allows `DeleteRecords` requests from Kafka clients or `rpk topic trim-prefix` to be called with `truncation_offset <= start_offset` without returning an error. The request is instead treated as a no-op. by @WillemKauf in #22905
- Allows the self-test to be completely compatible with a mixed version cluster, in the case of a rolling upgrade. by @WillemKauf in #22831
- Deprecate `leader_balancer_mode` cluster config property. by @ztlpn in #23780
- Implements `@redpanda-data/transform-sdk-sr.SchemaFormat` for the WASM Transforms JS module by @oleiman in #23164
- Improve handling of boolean property values during a `CreateTopics` request by making parsing case-insensitive. by @WillemKauf in #23682
- Improve handling of boolean values during a `CreateTopics` request by no longer silently ignoring an invalid value, instead throwing a configuration error. by @WillemKauf in #23682
- Improve handling of certain invalid topic configuration parameters that would lead to a timeout failure instead of a graceful error code during a `CreateTopics` request. by @WillemKauf in #23682
- Improve property configuration descriptions. by @Deflaimun in #23347
- Minimizes data loss in recovery scenarios by @mmaslankaprv in #24071
- Reduce the memory overhead of many small segments. by @rockwotj in #22962
- Return core assignments from health report in `/v1/cluster/partitions` admin API output. by @ztlpn in #22695
- Schema Registry: 5 new compatibility checks are added for protobuf (ONEOF_FIELD_REMOVED, MULTIPLE_FIELDS_MOVED_TO_ONEOF, REQUIRED_FIELD_{ADDED,REMOVED}, FIELD_NAMED_TYPE_CHANGED, MESSAGE_REMOVED) by @pgellert in #22798
- Schema Registry: Improve AVRO Normalization by @BenPope in #22519
- Schema Registry: now reports more specific error messages for Avro and Protobuf schemas when they are incompatible with earlier schemas. by @pgellert in #22958
- Set the default value of `topic_partitions_reserve_shard0` to zero. This means that we no longer weight shard 0 as if it has 2 more partitions than it actually has, leading to more even partition distribution in cases where the total number of partitions is close to the vCPU count. by @travisdowns in #22841
- The command line is now printed to the log at startup by the Redpanda process. by @travisdowns in #22826
- Upgrade data transforms tinygo compiler to version 0.34.0 by @rockwotj in #23969
- #17682 Schema Registry: Remove spurious log entry: `No syntax specified for the proto file` by @BenPope in #22633
- #21536 `rpk topic describe-storage` can be used now with internal topics. by @r-vasquez in #22338
- #22333 rpk debug bundle: include the result of `uname -a` by @JFlath in #22334
- #22666 Allows users to query the value of a cluster property with `rpk cluster config get` using either the original property name, or any of its aliases. Whereas before, `rpk cluster config get` using a property's aliased name would return a `Property {} not found` result. by @WillemKauf in #22674
- [#23038](https://github.com/redpanda-dat...
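The two `CreateTopics` boolean-handling improvements in the list above (case-insensitive parsing, and an error instead of silently ignoring a bad value) combine naturally into one parsing routine. The sketch below is illustrative only: `parse_bool_property` is a hypothetical helper, and using `ValueError` to stand in for the configuration error is an assumption.

```python
def parse_bool_property(name: str, raw: str) -> bool:
    """Parse a boolean topic property case-insensitively.

    Hypothetical helper: invalid values raise instead of being ignored.
    """
    normalized = raw.lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"invalid boolean value {raw!r} for property {name!r}")
```

For example, `parse_bool_property("redpanda.remote.read", "TRUE")` is accepted, while a value such as `"yes"` is rejected with an error rather than dropped.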
v24.2.12
Bug Fixes
- Fixed an issue where creating a topic with a huge number of partitions could lead to a crash. by @IoannisRP in #24232
Improvements
- Schema Registry: Add some metrics for resource usage taken by in-memory schemas by @BenPope in #24270
Full Changelog: v24.2.11...v24.2.12
v24.2.11
Bug Fixes
- Construct audit metrics probe during service initialization to prevent null pointer access. by @michael-redpanda in #24127
- Fixed an issue where creating a topic with a huge number of partitions could lead to a crash. by @IoannisRP in #24232
- Fixes a bug in which upload candidates made from segments with missing batches would trigger metadata related errors in the `ntp_archiver_service`, due to assigned start offsets being lower than they should be. by @WillemKauf in #24106
- #24076 Fixes a rare bug during remote partition manifest downloads where broken pipe exceptions weren't retried in an edge case. by @pgellert in #24080
- #24144 This fixes a bug in the audit client where if the cluster config value `kafka_batch_max_bytes` was greater than `audit_client_max_buffer_size`, the audit client ends up not producing any messages and becomes stuck filling up the audit log buffers. by @pgellert in #24148
- #24207 Redpanda neglected to include ECDSA based ciphers in the cipher strings used for TLSv1.2 and below. This caused TLS connections that used ECDSA based certificates to fail cipher negotiation when using TLSv1.2 and below. ECDSA ciphers are now in the list of supported ciphers. by @michael-redpanda in #24209
Full Changelog: v24.2.10...v24.2.11