Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasional terminate_at_block failures #61

Merged
merged 13 commits into from
May 3, 2024
Merged

Conversation

linh2931
Copy link
Member

@linh2931 linh2931 commented Apr 23, 2024

https://github.com/AntelopeIO/spring/actions/runs/8707646627/job/23883711651#step:4:729 reports a failure of nodeos_read_terminate_at_block_test. It was triggered by

assert head == termAtBlock, f"head {head} termAtBlock {termAtBlock}"

with AssertionError: head 76 termAtBlock 75. This indicates nodoes did not stop at the specified terminate-at block, but a block after it.

The PR fixes the problem by log_irreversible checking each block which is to become LIB and make sure it not pass the terminate-at block.

Resolved #38

@linh2931 linh2931 requested review from heifner and greg7mdp April 23, 2024 12:25
@heifner
Copy link
Member

heifner commented Apr 23, 2024

I don't think this is correct. The expectation is to stop a node at a given block. Not around a given block. There is clearly a bug in the code which can be confirmed by looking at the logs. The node stopped after the requested block.

@linh2931
Copy link
Member Author

I don't think this is correct. The expectation is to stop a node at a given block. Not around a given block. There is clearly a bug in the code which can be confirmed by looking at the logs. The node stopped after the requested block.

Thanks. I looked at the logs before this and did not see anything wrong. Am trying to reproduce it locally with more logs.

Any reason why the implementation is conf.terminate_at_block <= chain_head.block_num(), not conf.terminate_at_block == chain_head.block_num()?

@heifner
Copy link
Member

heifner commented Apr 23, 2024

Any reason why the implementation is conf.terminate_at_block <= chain_head.block_num(), not conf.terminate_at_block == chain_head.block_num()?

My guess it is just extra paranoid, in case it was missed elsewhere.

@linh2931
Copy link
Member Author

Any reason why the implementation is conf.terminate_at_block <= chain_head.block_num(), not conf.terminate_at_block == chain_head.block_num()?

My guess it is just extra paranoid, in case it was missed elsewhere.

I missed in the log that between two received blocks from peers, a vote message comes in and advances LIB over the target terminate-at block 75. Should we keep the implementation as is and update the test, or check conf.terminate_at_block <= chain_head.block_num() for irreversible mode in log_irreversible?

debug 2024-04-16T14:56:19.364 nodeos    net_plugin.cpp:3872           process_signed_block ] received signed_block: #79 block age in    secs = 0, connection - 2, header validated, lib #74
2024-04-16T14:56:17.500 signed by eosio [trxs: 0, lib: 74, net: 0, cpu: 100, elapsed: 62, time: 298, latency: 1866 ms]
...
debug 2024-04-16T14:56:19.365 net-2     net_plugin.cpp:3724           handle_message       ] ["localhost:9776:blk - 409f852" - 1 127.0. 0.1:9776] received vote: block #79:f53fc362f2497434.., strong, key qVbh4IjYZpRGo8U_..
info  2024-04-16T14:56:19.366 nodeos    controller.cpp:3374           log_applied          ] Received block 3509b93472fabef8... #75 @   2024-04-16T14:56:17.500 signed by eosio [trxs: 0, lib: 74, net: 0, cpu: 100, elapsed: 62, time: 298, latency: 1866 ms]
debug 2024-04-16T14:56:19.366 nodeos    net_plugin.cpp:4039           on_irreversible_bloc ] on_irreversible_block, blk num = 75, id =  0000004b3509b93472fabef8b669c485332603eeed7b7deb54cdc79d1d053903
debug 2024-04-16T14:56:19.366 nodeos    net_plugin.cpp:3263           update_chain_info    ] updating chain info lib 75, fork 79
debug 2024-04-16T14:56:19.366 nodeos    controller.cpp:4235           clear_expired_input_ ] removed 0 expired transactions of the 27   input dedup list, pending block time 2024-04-16T14:56:18.000
info  2024-04-16T14:56:19.366 nodeos    controller.cpp:3374           log_applied          ] Received block 31f3e8138a42c788... #76 @   2024-04-16T14:56:18.000 signed by defproducera [trxs: 0, lib: 74, net: 0, cpu: 100, elapsed: 36, time: 176, latency: 1366 ms]
debug 2024-04-16T14:56:19.366 nodeos    net_plugin.cpp:4039           on_irreversible_bloc ] on_irreversible_block, blk num = 76, id =  0000004c31f3e8138a42c7884536f2edcaa2f2b9ec808474c5b29962f219763f
debug 2024-04-16T14:56:19.366 nodeos    net_plugin.cpp:3263           update_chain_info    ] updating chain info lib 76, fork 79
...
debug 2024-04-16T14:56:19.828 nodeos    net_plugin.cpp:3872           process_signed_block ] received signed_block: #80 block age in    secs = 0, connection - 2, header validated, lib #76
debug 2024-04-16T14:56:19.828 nodeos    producer_plugin.cpp:682       on_incoming_block    ] received incoming block 80                 0000005054531612fc7a2424da5802de4d6f12e31aaf1aa5f3674d63a42cdd29
info  2024-04-16T14:56:19.828 nodeos    controller.cpp:3915           push_block           ] Reached configured maximum block 75;       terminating

@heifner
Copy link
Member

heifner commented Apr 23, 2024

I missed in the log that between two received blocks from peers, a vote message comes in and advances LIB over the target terminate-at block 75. Should we keep the implementation as is and update the test, or check conf.terminate_at_block <= chain_head.block_num() for irreversible mode in log_irreversible?

Vote messages do not advance LIB. LIB advances as QCs received in block headers are received. The key thing is we want to avoid applying a block past terminate-at-block. We need to add checks so that doesn't happen. It might be possible to check that only in apply_block, but I have not looked in detail.

@linh2931 linh2931 changed the title Fix nodeos_read_terminate_at_block_test flakiness Fix nodeos_read_terminate_at_block_test failure Apr 23, 2024
@arhag arhag linked an issue Apr 24, 2024 that may be closed by this pull request
@linh2931 linh2931 changed the title Fix nodeos_read_terminate_at_block_test failure Fix occasional terminate_at_block failures Apr 24, 2024
libraries/chain/controller.cpp Outdated Show resolved Hide resolved
libraries/chain/controller.cpp Show resolved Hide resolved
… block logs; move terminate-at-block for irreversible mode to apply_block
@linh2931 linh2931 requested a review from heifner April 29, 2024 15:44
libraries/chain/controller.cpp Outdated Show resolved Hide resolved
libraries/chain/controller.cpp Outdated Show resolved Hide resolved
@linh2931 linh2931 requested a review from heifner April 29, 2024 17:55
Base automatically changed from savanna to main April 29, 2024 18:39
@ericpassmore
Copy link
Contributor

Note:start
group: IF
category: INTERNALS
summary: Correctly terminate-at-block when running in irreversible mode.
Note:end

@linh2931 linh2931 requested review from greg7mdp and removed request for greg7mdp May 2, 2024 01:37
shutdown();
return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@linh2931 linh2931 merged commit 6bb273f into main May 3, 2024
36 checks passed
@linh2931 linh2931 deleted the terminate_at_block_fix branch May 3, 2024 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test Failure: nodeos_read_terminate_at_block_if_lr_test
4 participants