sys/ztimer: implement ztimer_mbox_get_timeout() and use it to fix race in gnrc_sock_recv() #21113

Merged: 3 commits into RIOT-OS:master from sys/ztimer/ztimer_mbox_get_timeout on Jan 10, 2025

Conversation

maribu
Member

@maribu maribu commented Dec 31, 2024

Contribution description

This implements ztimer_mbox_get_timeout() and salvages the test app from #18977 with minor tweaking.

On top of ztimer_mbox_get_timeout(), the timeout of gnrc_sock_recv() is now implemented race-free.

Testing procedure

Run the provided test app. (Optionally, also set ENABLE_DEBUG to 1 in sys/ztimer/utils.c to confirm that the test app indeed triggers the race in which a message is received just in time but the timeout is not cancelled in time.)

Also do some testing with GNRC's SOCK implementation and proper timeout handling.

Issues/PRs references

Better alternative to #18977

This function fetches a message from an mbox, possibly blocking if the
mbox has no message - but with a specified timeout.
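
To make the contract described above concrete, here is a minimal host-side sketch in C. It is not RIOT code: `model_mbox_t`, `model_msg_t`, `model_mbox_get_timeout()` and `model_recv()` are hypothetical stand-ins, and the return convention (0 on success, `-ETIMEDOUT` on timeout) is an assumption. The real signature of ztimer_mbox_get_timeout() should be checked in RIOT's ztimer header. The model is single-threaded, so an empty mailbox always ends in a timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only (not RIOT's mbox_t / msg_t). */
typedef struct { int value; } model_msg_t;
typedef struct { model_msg_t slot; bool full; } model_mbox_t;

/*
 * Stand-in for "get a message from an mbox, but give up after a timeout".
 * Single-threaded model: no sender can run while we "wait", so an empty
 * mailbox always results in a timeout and the timeout value is unused.
 * Assumed convention: 0 on success, -ETIMEDOUT on timeout.
 */
static int model_mbox_get_timeout(model_mbox_t *mbox, model_msg_t *msg,
                                  uint32_t timeout)
{
    (void)timeout;
    if (mbox->full) {
        *msg = mbox->slot;
        mbox->full = false;
        return 0;
    }
    return -ETIMEDOUT;
}

/*
 * Caller pattern: the timeout is reported through the return value, so
 * nothing timeout-related is ever left behind in the mailbox.
 */
static int model_recv(model_mbox_t *mbox, uint32_t timeout, int *out)
{
    model_msg_t msg;
    int res = model_mbox_get_timeout(mbox, &msg, timeout);
    if (res < 0) {
        return res;
    }
    *out = msg.value;
    return 0;
}
```

In this model, a receive on a full mailbox yields the message, and a second receive on the now empty mailbox yields `-ETIMEDOUT` without any side effect on the mailbox.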
@maribu maribu requested a review from benpicco December 31, 2024 00:03
@github-actions github-actions bot added Area: network Area: Networking Area: tests Area: tests and testing framework Area: timers Area: timer subsystems Area: sys Area: System labels Dec 31, 2024
@maribu maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch and removed Area: network Area: Networking Area: tests Area: tests and testing framework Area: timers Area: timer subsystems Area: sys Area: System labels Dec 31, 2024
@riot-ci

riot-ci commented Dec 31, 2024

Murdock results

✔️ PASSED

56ea5cd sys/net/gnrc_sock: fix race in gnrc_sock_recv()

| Success | Failures | Total | Runtime |
| 10270 | 0 | 10271 | 18m:35s |

Artifacts

@maribu maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from aff3a6a to 97e862c Compare December 31, 2024 10:23
@github-actions github-actions bot added Area: network Area: Networking Area: tests Area: tests and testing framework Area: timers Area: timer subsystems Area: sys Area: System labels Dec 31, 2024
@maribu maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from 97e862c to a1fd9e3 Compare January 10, 2025 13:17
@maribu
Member Author

maribu commented Jan 10, 2025

Fixed a typo found by codespell and squashed.

Contributor

@benpicco benpicco left a comment

That's a much cleaner solution than what has been there before.

@maribu maribu enabled auto-merge January 10, 2025 14:46
@benpicco
Contributor

Uh, the provided test fails on CI:

main(): This is RIOT! (Version: buildtest)
Testing ztimer_mbox_get_timeout()
=================================
testing mbox already full prior call: OK
testing timeout is reached: OK
testing timeout is reached despite message received (race): OK
Running test for reception prior timeout 1000 times: main.c:96 => failed condition
*** RIOT kernel panic:
CONDITION FAILED.

*** halted.

Implement the timeout using ztimer_mbox_get_timeout() to fix a race
condition.
@maribu maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from a1fd9e3 to 56ea5cd Compare January 10, 2025 15:19
@maribu
Member Author

maribu commented Jan 10, 2025

Let's try again with an even more relaxed timeout on native.

@maribu maribu added this pull request to the merge queue Jan 10, 2025
Merged via the queue into RIOT-OS:master with commit ade999a Jan 10, 2025
25 checks passed
@maribu
Member Author

maribu commented Jan 10, 2025

Thx!

@maribu maribu deleted the sys/ztimer/ztimer_mbox_get_timeout branch January 10, 2025 20:23
maribu added a commit to maribu/RIOT that referenced this pull request Jan 10, 2025
This reverts commit e3d0068, which
added a workaround for two bugs:

- ztimer triggering too early (fixed in
  RIOT-OS#20924)
- gnrc_sock_recv() returning when an old "timeout" message is still
  in the message queue (fixed in
  RIOT-OS#21113)

With those bugs fixed, the workaround should no longer be needed.
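
The second bug in the list above (an old "timeout" message still sitting in the message queue) can be replayed deterministically with a small host-side model in C. This is only a sketch under stated assumptions: `queue_t`, `MSG_DATA`, `MSG_TIMEOUT` and all function names are hypothetical and do not reflect RIOT's actual message or timer code. The model only captures the idea that a timeout delivered as a queued message can outlive the receive call it was meant for.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message types for this model (not RIOT's msg_t). */
enum { MSG_DATA = 1, MSG_TIMEOUT = 2 };

#define QUEUE_LEN 4

/* Small FIFO standing in for a thread's message queue. */
typedef struct {
    int items[QUEUE_LEN];
    size_t head;
    size_t count;
} queue_t;

static int queue_put(queue_t *q, int type)
{
    if (q->count == QUEUE_LEN) {
        return -1;
    }
    q->items[(q->head + q->count) % QUEUE_LEN] = type;
    q->count++;
    return 0;
}

static int queue_get(queue_t *q, int *type)
{
    if (q->count == 0) {
        return -1;
    }
    *type = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return 0;
}

/* Receive where the timeout is signalled by a message in the same queue. */
static int recv_via_timeout_msg(queue_t *q)
{
    int type;
    if (queue_get(q, &type) != 0) {
        return -EAGAIN; /* model only: real code would block here */
    }
    return (type == MSG_TIMEOUT) ? -ETIMEDOUT : 0;
}

/* Race: data arrives, but the timer fires before it can be cancelled. */
static int replay_stale_timeout(void)
{
    queue_t q = { {0}, 0, 0 };
    queue_put(&q, MSG_DATA);    /* packet arrives just in time */
    queue_put(&q, MSG_TIMEOUT); /* timer fires before being cancelled */
    int first = recv_via_timeout_msg(&q);
    assert(first == 0);         /* first receive sees the packet */
    queue_put(&q, MSG_DATA);    /* next packet arrives */
    return recv_via_timeout_msg(&q); /* stale timeout surfaces first */
}

/* Same sequence when the timeout never becomes a queued message. */
static int replay_timeout_as_return_code(void)
{
    queue_t q = { {0}, 0, 0 };
    queue_put(&q, MSG_DATA);    /* packet arrives; late timer enqueues nothing */
    int first = recv_via_timeout_msg(&q);
    assert(first == 0);
    queue_put(&q, MSG_DATA);    /* next packet arrives */
    return recv_via_timeout_msg(&q);
}
```

In this model, `replay_stale_timeout()` ends with a spurious `-ETIMEDOUT` on the second receive even though a packet is waiting, while `replay_timeout_as_return_code()` returns 0 for the same sequence of events.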
@maribu
Member Author

maribu commented Jan 10, 2025

> Let's try again with an even more relaxed timeout on native.

The test is flaky on native when built with LLVM 😢

Labels
Area: network Area: Networking Area: sys Area: System Area: tests Area: tests and testing framework Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
4 participants