Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supernode unable to progress range sync due to unknown RPC error #6989

Open
jimmygchen opened this issue Feb 12, 2025 · 1 comment
Open

Supernode unable to progress range sync due to unknown RPC error #6989

jimmygchen opened this issue Feb 12, 2025 · 1 comment
Labels
das Data Availability Sampling peerdas-devnet-4 syncing

Comments

@jimmygchen
Copy link
Member

I've been trying to sync a fresh supernode to peerdas-devnet-4, and finalized sync seems to frequently get stuck.

I've observed RPC requests sent and completed but eventually the batch failed with RPC Error - but i don't see any RPC error for this batch during this period.

See some partial logs below

  • 15 requests were sent for this batch epoch = 41, range request id = 35
  • 15 requests were completed
  • and a Batch failed. RPC Error followed immediately, with no RPC error logs found in this period.

I'm suspecting we have an RPC error log suspressed somewhere, it's not obvious from the log what the rpc error is, and the node just keep retrying to sync the same batch and unable to make progress.

Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 36/35/RangeSync/41/1, peer: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, epoch: 41, slots: 32, method: BlocksByRange, service: sync, module: network::sync::network_context:788
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 37/35/RangeSync/41/1, peer: 16Uiu2HAmH5PwwsfXLCGqSi2xFgPu7Zy9xrzXKTqc5ZwxkkdhYbVy, columns: [105, 37, 35, 36, 76, 54, 96, 26], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 38/35/RangeSync/41/1, peer: 16Uiu2HAm9PijSZpm5QUphXRoBtkhUZPkGJ4Rgxk4Bny91oZPYZLG, columns: [100, 3, 13, 126, 63, 7, 103, 85, 41], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 39/35/RangeSync/41/1, peer: 16Uiu2HAm35d5GWdmRH2XtjM4PMi6MXyCSwXgZi5KwkGNijbSmARB, columns: [89, 8, 79, 110, 75, 15, 109, 58, 1], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 40/35/RangeSync/41/1, peer: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, columns: [93, 45, 46, 60, 47, 115], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 41/35/RangeSync/41/1, peer: 16Uiu2HAkzUzciBZ9mW18RAwhGowdz6SRbt36gS1NZupVXUkGENW7, columns: [18, 43, 57, 30, 120, 39, 95, 2, 56, 98, 127], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 42/35/RangeSync/41/1, peer: 16Uiu2HAm4SC4chiQADwP7M8xu2nigb7GsabhKfULHZ2gMAcw6fmD, columns: [24, 4, 71, 102, 44, 21, 118, 87, 11, 53], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 43/35/RangeSync/41/1, peer: 16Uiu2HAmNSdQ4yXgyBUpQ5yGRE32L4t4kkQahfaHCxcmZHbtT7md, columns: [99, 49, 6, 80, 40, 114, 61], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 44/35/RangeSync/41/1, peer: 16Uiu2HAmAAZ5wP6fvpe1b9tWNmgA2Wn8MsrNYVhkw5WohcAoaHKR, columns: [25, 121, 104, 108, 111, 48, 92, 14], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 45/35/RangeSync/41/1, peer: 16Uiu2HAmJyaVGkRGR9ACqompkoge8T2x4KFH4KbzDkh7zz6uN2JX, columns: [125, 77, 17, 74, 23, 123, 72, 84, 59, 124, 107], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 46/35/RangeSync/41/1, peer: 16Uiu2HAmChnSSEeAB9w5Gw2zqF23ufsrPAUqZWWo3mUgJhWJWRP4, columns: [112, 62, 91, 32, 42], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 47/35/RangeSync/41/1, peer: 16Uiu2HAmR4AjDzeAMa8ud5E4L3tu92D3GfBJ3z92MbqUEwHiFgnN, columns: [69, 70, 116, 83, 119, 94, 31, 0, 9, 68, 64, 65, 90], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 48/35/RangeSync/41/1, peer: 16Uiu2HAkwLdYWXA39aykj6uLy6knHo7V6CaMVU7Xq2smc6Ysw5eh, columns: [10, 16, 19, 27, 117, 113, 12, 50, 97, 29], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 49/35/RangeSync/41/1, peer: 16Uiu2HAmNBbHKNxDDgZ1wjRRaodUavr9oiQ7awRcmT2m42VATZpn, columns: [88, 67, 101, 55, 122, 82, 5, 81, 20, 51], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 50/35/RangeSync/41/1, peer: 16Uiu2HAmLCfZdW9qfBvmvVVCKSPDKNeyAZTgx3eZSxF11Ao85JCb, columns: [33, 38, 86, 66, 73, 52, 22, 78, 106, 34, 28], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:19.875 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 45/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:23.952 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 48/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:24.485 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 41/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:25.326 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 42/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:25.433 DEBG Sync RPC request completed, count: 143, method: DataColumnsByRange, id: 39/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:26.803 DEBG Sync RPC request completed, count: 95, method: DataColumnsByRange, id: 46/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:27.350 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 50/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:28.968 DEBG Sync RPC request completed, count: 133, method: DataColumnsByRange, id: 43/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:29.719 DEBG Sync RPC request completed, count: 152, method: DataColumnsByRange, id: 44/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:31.173 DEBG Sync RPC request completed, count: 152, method: DataColumnsByRange, id: 37/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:31.272 DEBG Sync RPC request completed, count: 171, method: DataColumnsByRange, id: 38/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:32.228 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 49/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:35.236 DEBG Sync RPC request completed, count: 247, method: DataColumnsByRange, id: 47/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.363 DEBG Sync RPC request completed, count: 27, method: BlocksByRange, id: 36/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.363 DEBG Sync RPC request completed, count: 114, method: DataColumnsByRange, id: 40/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.396 DEBG Batch failed. RPC Error, batch_state: [D,D,p,p,p,], request_id: 35, peer_id: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, batch_state: Downloading(16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, 35), batch_epoch: 41, chain: 1, s
ervice: range_sync, service: sync, module: network::sync::range_sync::chain:867

Full logs attached, I restarted the node a few times, so you may want to search for the timestamps from the partial logs above:
beacon.log.zip

The kurtosis config to sync peerdas-devnet-4:

participants:
  - cl_type: lighthouse
    cl_image: sigp/lighthouse:latest-unstable
    cl_extra_params:
      - --subscribe-all-data-column-subnets
  - cl_type: lighthouse
    cl_image: sigp/lighthouse:latest-unstable
network_params:
  network: peerdas-devnet-4
snooper_enabled: false
additional_services:
  - prometheus_grafana
@jimmygchen
Copy link
Member Author

Adding the missing error log in #6990 revealed some issues with converting the request into RPC block.

This seems to happen on supernode only, and the node just end up retrying the requests infintely:

Feb 12 06:33:08.426 DEBG Failed to convert range block components into RpcBlock, e: Not all columns consumed: [0xb5881ee675d848a305e4d1b8cdb9e4362a48bf5aeccb10e81f31e1912ca5b484, 0xd8d843527ffa6b571aead040368505ce514786e335874f676e9b2a85360dceaa, 0x9d41ee5b284a1fb9208b2
1e9fc97216494a957486e825d1dd25caa4607b2e5dc, 0x3c57315ef06f5a025211a736567b3d197f8c5a2f7984e6a59f912a065669e176, 0x22295c04e5e4ea56c3d27205fa589faf7de98ec307dedd44c9c93927200d5e17, 0x49cc7e2a469bd3fc84b76396b0549fbe791361395138df3795b19be37b3ba489, 0x674c7f1ba66468158a2
59720fef5c7690668597d9ede92a5c55e8404ad6aba7b, 0xb6ebe1564ef372bd4f992c4ae737f2cf8643e20ab32a44dd5b9454e69e1145ba, 0x8d12dd2690cf45235b9f316e91bc7cde75b22f886d8be23f9ccb3ff0f0b09be0, 0x0786a520fb77400cd575651e67fcc095e7f19cd7bfe1dfdb035d5574e490e36e, 0x9e2c304c26fff5048
f1c9ded3d73cdc425f1a46dfa0cd7f86258a06ad8347f78, 0xc2d20f20f0e31c8a54d3f5c187d605010b8ce19781f108c367a730412acb2127, 0xf39f7606244ca89c506fbcbc3ce21a459e5897e9b2ac6d4cfe10a0a10db0f1d0, 0x622da1635fad1f73ffe29681ce0d4a61b02003a0e724d0d21297303c0a77e306, 0x05761cc798f7ab7
bc636bcf1fcc9fc929e5078070b1b5270c3b2652b6036b8da, 0x2e0c04fa2438eefc718a393b4888e892278921c4493d8d530280fa7baaad22bd, 0xcee48390612703524b511a1d9bdc7f3b22d9e5c8f8317a3938af1f3d8798a746, 0xf23f8d8d0b4a08f8a6325171b48db50e2b8a890d30fdecb5d95d6853fe397963, 0xb1ead5b592d38
1bb9c0a7267b14f0ef26c49d48289af6199c352c2e010433352, 0xad84762c9baba36a0c76ca7a59b8236c4cb2917f039f77de80bac2e59227d88d, 0xff4538a243e1ad4936590e22d81df3bc468baf7a0dac1059659dd0a215816067, 0x7dc87ca2225327f9c0d8ec67e171df91985e2518fe144e13577b1e43160600f6, 0xe9ec5d71fb2
c7efff1f0e99e3006607a0ced9fd4ef9dc05883a54d3fd2ec6642, 0xd1c08b642682855a483e107e51e57bb250ca30b82201de4cb2edcd560a5e884a, 0xd118881255e60553d7fe8e4cb40fabb4f684bce14bace7faf34bd92bff1a1ea9, 0x1418d55ec3aad99a9284cc895c5ae911d9b4abcfeccbf2a6852699caa43909a1, 0xa89488159
9a669aca372c74aa10a8c6d527a3c601f0ca3ca0e0a60554923e548, 0x3c5b1b8900b8ffd59afbd4a5370447f7cdac58400d4b34909ad39bf88f612bd2, 0xb88796d98edc345d70a5571473cedad3a782a9087b489b13fb72117c155dce16, 0x1030e17ac4fc17bf36d77f8d8d410ae400d4be4977a062994ee952936ebbca9b], id: 2491
/RangeSync/5/2343, service: sync, module: network::sync::network_context:488
Feb 12 06:33:09.514 DEBG Failed to convert range block components into RpcBlock, e: No column for block 0xcb4f04f78fdd1e6f5f0fbdd6fbe1fc160c370509fd65f995416a68a99be0b902 index 72, id: 2484/RangeSync/7/2343, service: sync, module: network::sync::network_context:488
Feb 12 06:33:17.137 DEBG Failed to convert range block components into RpcBlock, e: No column for block 0x967b83b5570927a2741d0f7c2a60fd9c2ae77895297d7f63311da8fcda773dd6 index 13, id: 2505/RangeSync/6/2343, service: sync, module: network::sync::network_context:488

beacon.log.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
das Data Availability Sampling peerdas-devnet-4 syncing
Projects
None yet
Development

No branches or pull requests

1 participant