Code and tests fixes to make the full test suite pass #4970

emasab · 2025-02-18T18:02:21Z

Includes a task to run the test suite on demand on Semaphore CI.

A description can be found in each commit message.

confluent-cla-assistant · 2025-02-18T18:02:32Z

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

due to the latency increase applying to all RPCs, including ApiVersions, leading to the timeout happening before the produce request is sent. The error is IN_QUEUE instead of IN_FLIGHT, and the status becomes NOT_PERSISTED instead of POSSIBLY_PERSISTED. Fixed by using the mock cluster instead of sockem and applying the latency only to the Produce request.

Given that only a single request can be in flight, in some cases the second request still had not been sent when purging the buffers. A condition in `on_request_sent` waits until the second request has been sent before purging the buffers, so that the test exercises the scenario it is meant to cover.

Script to run tests in batches, with several modes and for a variable…

eb0999c

… number of iterations

airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 856ce9b to 0ef423a Compare February 18, 2025 18:42

emasab added 13 commits February 18, 2025 19:48

Run all tests pipeline

6171178

Change cnd_timedwait_abs to take a monotonic clock value

a9e2304

as timeout and checking after wakeups if it's been reached. Avoids yielding earlier than requested because of spurious wakeups. Fixes flakiness in many tests, especially 0080.
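The idea in this commit message can be illustrated with a minimal sketch. It is not the librdkafka implementation: `waiter_t`, `waiter_init`, `mono_now_us` and `wait_ready_until` are hypothetical names, and it assumes a POSIX system where `pthread_condattr_setclock` accepts `CLOCK_MONOTONIC` (for example Linux). The deadline is an absolute monotonic time, and the wait is repeated after every wakeup until either the predicate holds or the deadline has really passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC time in microseconds. */
static int64_t mono_now_us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical waiter: a flag protected by a mutex and a condition
 * variable bound to the monotonic clock. */
typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t cnd;
        int ready;
} waiter_t;

static void waiter_init(waiter_t *w) {
        pthread_condattr_t attr;
        int r;

        r = pthread_condattr_init(&attr);
        assert(r == 0);
        /* Timed waits on this condition variable use CLOCK_MONOTONIC. */
        r = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        assert(r == 0);
        r = pthread_cond_init(&w->cnd, &attr);
        assert(r == 0);
        pthread_condattr_destroy(&attr);
        r = pthread_mutex_init(&w->lock, NULL);
        assert(r == 0);
        (void)r;
        w->ready = 0;
}

/* Wait until w->ready is set or the absolute monotonic deadline is
 * reached. Returns 0 if ready, ETIMEDOUT otherwise. */
static int wait_ready_until(waiter_t *w, int64_t deadline_us) {
        struct timespec abs_ts;
        int r;

        abs_ts.tv_sec  = (time_t)(deadline_us / 1000000);
        abs_ts.tv_nsec = (long)(deadline_us % 1000000) * 1000;

        pthread_mutex_lock(&w->lock);
        while (!w->ready) {
                r = pthread_cond_timedwait(&w->cnd, &w->lock, &abs_ts);
                if (r != 0)
                        break; /* ETIMEDOUT (deadline reached) or an error */
                /* r == 0 with !ready is a spurious wakeup: wait again
                 * against the same absolute deadline. */
        }
        r = w->ready ? 0 : ETIMEDOUT;
        pthread_mutex_unlock(&w->lock);
        return r;
}
```

Because the deadline is absolute, repeating the wait after a spurious wakeup neither extends nor shortens the total time waited.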

Fix for a minimum latency of 500ms in case of leader change,

7595275

because of the fetch backoff left over from the previous broker. Resets the fetch backoff when the partition joins a new broker.
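A simplified sketch of the behavior described above, under stated assumptions: `partition_t`, `partition_join_broker` and `partition_can_fetch` are hypothetical names for illustration and do not mirror librdkafka's internal structures. The backoff is modeled as an absolute timestamp that is cleared whenever the partition moves to a different broker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified partition state. */
typedef struct {
        int32_t broker_id;        /* broker currently serving fetches */
        int64_t fetch_backoff_us; /* no fetch before this time, 0 = none */
} partition_t;

/* Move the partition to a broker. A backoff set while fetching from the
 * previous broker says nothing about the new one, so it is cleared. */
static void partition_join_broker(partition_t *p, int32_t new_broker_id) {
        assert(p != NULL);
        if (p->broker_id != new_broker_id) {
                p->broker_id        = new_broker_id;
                p->fetch_backoff_us = 0;
        }
}

/* Returns 1 if a fetch may be sent at time now_us, 0 otherwise. */
static int partition_can_fetch(const partition_t *p, int64_t now_us) {
        assert(p != NULL);
        return now_us >= p->fetch_backoff_us;
}
```

Without the reset, the first fetch to the new leader would wait out the remainder of the old backoff, which matches the added latency the commit title mentions.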

Fix for test 0119, remove ACLs in the test that created them to preve…

e311638

…nt authorization issues when removing all the topics on final cleanup

Fix test 0126 memory leak

d7441fa

Remove brokers that aren't up from mock metadata response.

2b9c9c2

This is in line with KRaft behavior

More tests on fast metadata refresh,

4569cad

ensuring that the topic is marked as errored only in case of permanent errors, that no error is surfaced to the application unless it's an authorization error, and that produce requests can continue with the cached metadata

Don't remove a topic from cache before its expiration in case of a te…

39c9a5d

…mporary error

Don't mark the topic as unknown even when it's already known and we'r…

bb45035

…e in the metadata propagation period

Only update partition leaders if the topic has no errors.

2da83d6

Similar to the Java client logic. Avoids a segmentation fault if the rktp is missing, such as when a topic is deleted and re-created with the same name.

Don't set errors other than TOPIC_AUTHORIZATION_FAILED as permanent t…

87f1f18

…opic errors
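The metadata commits above share one rule: only TOPIC_AUTHORIZATION_FAILED counts as a permanent topic error, and a temporary error must not evict a cached topic before it expires. The sketch below illustrates that rule only. All type and function names are hypothetical, the error enum is a small stand-in rather than librdkafka's error codes, and the cache entry is reduced to two fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant topic-level error codes. */
typedef enum {
        TOPIC_ERR_NONE,
        TOPIC_ERR_UNKNOWN_TOPIC_OR_PART,
        TOPIC_ERR_LEADER_NOT_AVAILABLE,
        TOPIC_ERR_TOPIC_AUTHORIZATION_FAILED
} topic_err_t;

/* Hypothetical, reduced metadata cache entry. */
typedef struct {
        int errored;        /* 1 once a permanent error was seen */
        int64_t expires_us; /* time at which the entry expires */
} topic_cache_entry_t;

/* Only an authorization failure is treated as permanent. */
static int topic_err_is_permanent(topic_err_t err) {
        return err == TOPIC_ERR_TOPIC_AUTHORIZATION_FAILED;
}

/* Apply a topic error from a metadata response to the cache entry.
 * Temporary errors leave the entry untouched. */
static void topic_cache_on_error(topic_cache_entry_t *e, topic_err_t err) {
        assert(e != NULL);
        if (topic_err_is_permanent(err))
                e->errored = 1;
}

/* The cached metadata stays usable until it expires or is errored. */
static int topic_cache_usable(const topic_cache_entry_t *e, int64_t now_us) {
        assert(e != NULL);
        return !e->errored && now_us < e->expires_us;
}
```

With this shape, a produce request arriving during the metadata propagation period can still use the cached entry after a temporary error such as an unknown topic or partition.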

airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 0ef423a to b02a4eb Compare February 18, 2025 18:48

emasab added a commit to mfleming/librdkafka that referenced this pull request Feb 18, 2025

PR confluentinc#4970: Code and tests fixes to make the full test suit…

bc59ef4

…e pass

emasab added 6 commits February 18, 2025 20:31

Fix for issuing a metadata refresh after offsets_for_times call faile…

6ebc3d0

…d and it was needed but not sent. Current leader epoch is now taken from metadata cache instead of from the requested topic partitions to ensure it's correctly validated on the leader.

Connection close debug logs

8f7d4ae

Filter jmx output for improved test speed

bebba16

Deprecate api.version.request, api.version.fallback.ms and

6794ed6

`broker.version.fallback` configuration properties.

Fix for test 0084, avoid purging the rk_ops queue when terminating th…

8a686ce

…e consumer group

Avoid triggering rk_telemetry.termination_cnd when

86a2765

no one's listening

airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from b02a4eb to 8a7a17a Compare February 18, 2025 19:32

emasab added a commit to mfleming/librdkafka that referenced this pull request Feb 18, 2025

PR confluentinc#4970: Code and tests fixes to make the full test suit…

3f5c130

…e pass

emasab added 4 commits February 19, 2025 13:52

Fix flaky test 0059. Given the timestamp is so old it's possible that…

1229422

… broker retention timer is triggered, deleting all produced records and generating an offset out of range error while querying.

Test 0044: use name of the created topic

2e1abcd

Additional test fixes

877f2ec

Fix flakiness in test 0061

212f65f

skip events generated before the assignment that lead to a test failure

emasab added 5 commits February 19, 2025 13:52

Fix flakiness with metadata propagation in test 0085

a498113

Fix flakiness in test 0102:

c707976

- Avoid full metadata refresh during metadata propagation time after topic creation
- Rebalance events order after max.poll.interval.ms exceeded

Fix flakiness in test 0137: don't consider error count during read me…

bf7248c

…ssages verification. Log warnings for the errors to identify the cause.

Subscription version to avoid stale metadata

53fb83d

calls causing an unknown topic or partition error
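The commit message gives no implementation details, so the following is only one plausible reading of a "subscription version", with all names hypothetical: each subscription change bumps a counter, every metadata request records the counter value at the time it is issued, and a response whose recorded value is older than the current one is discarded instead of being applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer state: only the version counter is modeled. */
typedef struct {
        int32_t subscription_version;
} consumer_t;

/* Hypothetical metadata request: remembers the version it was made for. */
typedef struct {
        int32_t subscription_version;
} metadata_request_t;

/* Changing the subscription invalidates all outstanding requests. */
static void consumer_subscribe(consumer_t *c) {
        assert(c != NULL);
        c->subscription_version++;
}

static metadata_request_t metadata_request_new(const consumer_t *c) {
        metadata_request_t req;
        assert(c != NULL);
        req.subscription_version = c->subscription_version;
        return req;
}

/* A response is stale if the subscription changed after the request. */
static int metadata_response_is_stale(const consumer_t *c,
                                      const metadata_request_t *req) {
        assert(c != NULL && req != NULL);
        return req->subscription_version < c->subscription_version;
}
```

Dropping stale responses means a reply that describes the previous subscription cannot report the newly subscribed topics as unknown.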

PR #4908: Fix assignment lost, on illegal generation, during a commit

30f2a9c

airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 8a7a17a to 30f2a9c Compare February 19, 2025 12:54
