Code and tests fixes to make the full test suite pass #4970

Draft
wants to merge 29 commits into
base: master

emasab
Contributor

@emasab emasab commented Feb 18, 2025

Includes a task to run the test suite on demand on Semaphore CI.

A description can be found in each commit message.

@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 856ce9b to 0ef423a Compare February 18, 2025 18:42
as timeout and checking after wakeups if it's been reached.
Avoids yielding earlier than requested because of spurious wakeups.
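
The deadline check described above could look roughly like the sketch below. It is an illustration only, not the code from this PR: `now_us` and `sleep_until_deadline` are hypothetical names, and `nanosleep` stands in for any blocking call that may return earlier than requested. The absolute deadline is computed once, and each wakeup is checked against it before the function returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in microseconds. */
static int64_t now_us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Hypothetical helper: block for at least timeout_us microseconds.
 * The blocking call may return early, so the absolute deadline is
 * re-checked after every wakeup and the wait is resumed for the
 * remaining time only. */
static void sleep_until_deadline(int64_t timeout_us) {
        const int64_t deadline = now_us() + timeout_us;

        for (;;) {
                int64_t remaining = deadline - now_us();
                if (remaining <= 0)
                        return; /* Deadline reached */

                struct timespec ts = {
                        .tv_sec  = (time_t)(remaining / 1000000),
                        .tv_nsec = (long)((remaining % 1000000) * 1000)
                };
                /* An early return just leads to another loop iteration. */
                nanosleep(&ts, NULL);
        }
}
```

Because the deadline is absolute, repeated early wakeups cannot shorten or lengthen the total wait beyond the requested time plus scheduling delay.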
Fix flakiness in many tests, especially 0080,
caused by the fetch backoff left over from the previous broker.
Resets the fetch backoff when the partition joins a
new broker.
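
A simplified model of the backoff reset, assuming a partition object that stores a backoff timestamp. The struct and function names here (`toppar_model_t`, `toppar_join_broker`, `toppar_can_fetch`) are hypothetical and do not correspond to the library's internal types.

```c
#include <stdint.h>

/* Hypothetical, simplified partition state. */
typedef struct toppar_model_s {
        int32_t broker_id;           /* Broker currently serving the partition */
        int64_t fetch_backoff_until; /* Timestamp before which no fetch is sent,
                                      * 0 means no backoff */
} toppar_model_t;

/* Called when the partition is handed over to a new broker: a backoff
 * that was set because of the previous broker no longer applies. */
static void toppar_join_broker(toppar_model_t *tp, int32_t new_broker_id) {
        tp->broker_id           = new_broker_id;
        tp->fetch_backoff_until = 0;
}

/* Returns 1 if a fetch may be sent at time `now`, else 0. */
static int toppar_can_fetch(const toppar_model_t *tp, int64_t now) {
        return now >= tp->fetch_backoff_until;
}
```

Without the reset, the first fetch on the new broker would be delayed until the old backoff expired, which is the kind of delay the commit message attributes the flakiness to.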
due to latency increase applying to all RPCs,
including ApiVersions, leading to the timeout happening
before the produce request is sent.
The error is IN_QUEUE instead of IN_FLIGHT, and the
status becomes NOT_PERSISTED instead of POSSIBLY_PERSISTED.
Fixed by using the mock cluster instead of sockem and applying
the latency only to the Produce request.
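
The reasoning above can be captured in a small model. It assumes the Produce response never arrives before the message timeout, and that the Produce request can only be sent after one delayed round trip when the latency applies to every RPC. The enum and function names are hypothetical and only mirror the status names mentioned in the commit message.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the statuses in the commit message. */
typedef enum {
        MODEL_NOT_PERSISTED,     /* Message never left the queue */
        MODEL_POSSIBLY_PERSISTED /* Message was in flight, outcome unknown */
} model_msg_status_t;

/* Status reported when the message timeout fires.
 * Assumption: the Produce response does not arrive before the timeout. */
static model_msg_status_t
status_at_timeout(int latency_ms, int timeout_ms, bool latency_on_all_rpcs) {
        /* With latency on all RPCs the handshake (e.g. ApiVersions) is
         * delayed too, so the Produce request is sent only afterwards. */
        int produce_sent_at_ms = latency_on_all_rpcs ? latency_ms : 0;

        if (produce_sent_at_ms >= timeout_ms)
                return MODEL_NOT_PERSISTED; /* Timed out while still queued */

        return MODEL_POSSIBLY_PERSISTED; /* Timed out while in flight */
}
```

With a latency larger than the timeout, only the variant that delays the Produce request alone reaches the in-flight case the test is meant to cover.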
Given that only a single request can be in flight,
in some cases the second request had not yet been sent
when purging the buffer.
A condition in `on_request_sent` makes the test
wait until the second request has been sent before purging the buffers,
so that it exercises the scenario it is meant to test.
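
One way to express such a condition is a counter protected by a mutex and a condition variable. This is a sketch under assumptions, not the test's actual code: `note_request_sent` is a hypothetical function that an `on_request_sent` callback would call, `wait_produce_requests_sent` is a hypothetical helper for the test thread, and POSIX threads are used for brevity.

```c
#include <pthread.h>
#include <stdint.h>

#define MODEL_API_KEY_PRODUCE 0 /* Kafka protocol ApiKey for Produce */

static pthread_mutex_t sent_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sent_cond  = PTHREAD_COND_INITIALIZER;
static int produce_sent_cnt      = 0;

/* Hypothetical: invoked from the request-sent callback for every request.
 * Only Produce requests are counted. */
static void note_request_sent(int16_t api_key) {
        if (api_key != MODEL_API_KEY_PRODUCE)
                return;
        pthread_mutex_lock(&sent_lock);
        produce_sent_cnt++;
        pthread_cond_broadcast(&sent_cond);
        pthread_mutex_unlock(&sent_lock);
}

/* Hypothetical: called by the test before purging. Blocks until at
 * least `expected` Produce requests have been sent. The predicate is
 * re-checked in a loop, so spurious wakeups are harmless. */
static void wait_produce_requests_sent(int expected) {
        pthread_mutex_lock(&sent_lock);
        while (produce_sent_cnt < expected)
                pthread_cond_wait(&sent_cond, &sent_lock);
        pthread_mutex_unlock(&sent_lock);
}
```

In this sketch the purge would be issued only after `wait_produce_requests_sent(2)` returns.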
…nt authorization issues when removing all the topics on final cleanup
ensuring the topic is marked as errored only
in case of permanent errors. No error is surfaced to the application unless it's an authorization error, and
produce requests can continue with the cached metadata.
Similar to the Java client logic. Avoids a segmentation fault
if the rktp is missing, such as in cases of topic deletion and re-creation with the same name.
@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 0ef423a to b02a4eb Compare February 18, 2025 18:48
emasab added a commit to mfleming/librdkafka that referenced this pull request Feb 18, 2025
…d an it was needed but not sent.

The current leader epoch is now taken from the metadata
cache instead of from the requested topic partitions
to ensure it's correctly validated on the leader.
`broker.version.fallback` configuration properties.
@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from b02a4eb to 8a7a17a Compare February 18, 2025 19:32
emasab added a commit to mfleming/librdkafka that referenced this pull request Feb 18, 2025
… broker retention timer is triggered, deleting all produced records and generating an offset out of range error while querying.
skip events generated before the assignment
that lead to a test failure
- avoid full metadata refresh during metadata propagation time after topic creation
- Rebalance events order after max.poll.interval.ms exceeded
…ssages verification. Log warnings for the errors to identify the cause.
calls cause an unknown topic or partition error
@airlock-confluentinc airlock-confluentinc bot force-pushed the dev_run_all_tests_no_flakyness branch from 8a7a17a to 30f2a9c Compare February 19, 2025 12:54