[Vnodes] All c-s write load threads failed with WriteTimeoutException: 3 replica were required but only 2 acknowledged the write #22699

Open
1 of 2 tasks
juliayakovlev opened this issue Feb 5, 2025 · 0 comments
Labels
triage/master Looking for assignee

Packages

Scylla version: 2025.1.0~rc1-20250202.28b889668011 with build-id e46f69dbaac1a5eefd731a8d83206c0af4cbd7fa
Kernel Version: 6.8.0-1021-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

All cassandra-stress write load threads failed with WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency ALL (3 replica were required but only 2 acknowledged the write)
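As a reminder of why a single unresponsive replica is enough to fail these writes, here is a minimal sketch of how many acknowledgements each consistency level needs. It is a single-datacenter simplification, and `required_acks` is a hypothetical helper written for illustration, not a Scylla or driver API.

```python
def required_acks(consistency: str, replication_factor: int) -> int:
    """Replica acknowledgements a write needs (single-DC simplification)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency.upper()]


rf = 3  # replication_factor from the stress command's -schema option
print("ALL needs", required_acks("ALL", rf), "acks")
print("QUORUM needs", required_acks("QUORUM", rf), "acks")
```

With RF=3, a write acknowledged by only 2 replicas would satisfy QUORUM but times out at ALL, which matches the exception text.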

WriteTimeoutExceptions started as soon as the write load started (as in the original test).
First command started at 11:18:38,858:

< t:2025-02-04 11:18:38,858 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2025-02-04 11:18:38.856: (CassandraStressEvent Severity.NORMAL) period_type=begin event_id=8ff8d446-c1b5-42a1-92e0-b9da689abc4e: node=Node perf-regression-predefined-steps-ub-loader-node-662c4d6b-2 [34.241.172.220 | 10.4.1.69]
< t:2025-02-04 11:18:38,858 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > stress_cmd=cassandra-stress write  cl=ALL n=162500001 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode connectionsPerHost=8 cql3 native -rate threads=150 -col 'size=FIXED(1024) n=FIXED(1)'  -pop seq=162500002..325000002
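A quick sanity check of the command above: the `-pop seq=` range should contain as many keys as the `n=` operation count. The sketch below assumes the `seq=LOW..HIGH` range is inclusive on both ends; `seq_range_size` is a hypothetical helper, not part of cassandra-stress.

```python
def seq_range_size(spec: str) -> int:
    """Number of keys in an inclusive LOW..HIGH sequence range."""
    low, high = (int(part) for part in spec.split(".."))
    return high - low + 1


population = seq_range_size("162500002..325000002")  # from -pop seq=...
operations = 162_500_001                             # from n=...
print(population, population == operations)
```

Under that assumption the range holds exactly 162500001 keys, so each key in the range is written once.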

First WriteTimeoutException at 11:19:04,908

< t:2025-02-04 11:19:04,908 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.69>: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during SIMPLE write query at consistency ALL (3 replica were required but only 2 acknowledged the write)
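The delay between the start of the first command and the first WriteTimeoutException can be computed from the two SCT log timestamps above. This is a minimal sketch that assumes the `t:` field always uses the `YYYY-MM-DD HH:MM:SS,mmm` layout shown in these lines; `parse_sct_time` is a hypothetical helper, not an SCT function.

```python
from datetime import datetime


def parse_sct_time(value: str) -> datetime:
    """Parse an SCT log timestamp such as '2025-02-04 11:18:38,858'."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S,%f")


load_start = parse_sct_time("2025-02-04 11:18:38,858")
first_timeout = parse_sct_time("2025-02-04 11:19:04,908")
elapsed = (first_timeout - load_start).total_seconds()
print(f"first timeout {elapsed:.2f} s after load start")
```

The first timeout appears about 26 seconds into the load.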

Compactions and bloom filter rebuilds were running during the load.
There are no errors in the node logs. The cluster was not overloaded.

Image

Image

There was only one lsa-timing report, at 11:51:16.097197 on perf-regression-predefined-steps-ub-db-node-662c4d6b-3:

Feb 04 11:51:16.097197 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - compact took 103998 us, trying to release 2.625 MiB non-preemptibly,
 reserve: {goal: 0, max: 30}, at 0x22bcc8e 0x22bc720 0x22bc6f8 0x3de7c21 0x17b0c52 0x1648e55 0x1fb579d
                                                                                               --------                                                                                               seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::async<row_cache
::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::e
xternal_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_upda
te<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_u
pdater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#2}, seastar::future<void>::then_impl_nrvo<
seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_ca
che::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attrib
utes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::upda
te(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#2}, seastar::futur
e<void> >(row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::
update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda(seastar::interna
l::promise_base_with_type<void>&&, seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_
updater, replica::memtable&, auto:1, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, auto:1&&, (auto:2&&)...)::{lambda()#2}&, seastar::
future_state<seastar::internal::monostate>&&)#1}, void>
                                                                                               --------                                                                                               seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::f
inally_body<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memta
ble&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::t
hread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#3}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}&&)::{lambda()#3}, false> >(seastar::future<void>::finally_body<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() 
const::{lambda()#2}&&)::{lambda()#3}, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<seastar::async<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, auto:1, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#2}>(seastar::thread_attributes, auto:1&&, (auto:2&&)...)::{lambda()#3}, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
                                                                                               --------                                                                                               seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#3}, false>, seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#3}, false> >(seastar::future<void>::finally_body<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#3}, false>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<row_cache::do_update<row_cache::update(row_cache::external_updater, replica::memtable&, basic_preemption_source&)::$_0>(row_cache::external_updater, replica::memtable&, auto:1, basic_preemption_source&)::{lambda()#1}::operator()() const::{lambda()#3}, false>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               seastar::internal::coroutine_traits_base<void>::promise_type
                                                                                               --------
                                                                                               N7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureIvE16handle_exceptionIZN7replica20dirty_memory_manager9flush_oneERNS7_13memtable_listEONS7_12flush_permitEE3$_0Qoooooosr3stdE16is_invocable_r_vINS4_IT_EETL0__NSt15__exception_ptr13exception_ptrEEaaeqsr3stdE12tuple_size_vINSt11conditionalIXsr3stdE9is_same_vINS1_18future_stored_typeIJSE_EE4typeENS1_9monostateEEESt5tupleIJEESP_IJSN_EEE4typeEELi0Esr3stdE16is_invocable_r_vIvSH_SJ_Eaaeqsr3stdE12tuple_size_vIST_ELi1Esr3stdE16is_invocable_r_vISE_SH_SJ_Eaagtsr3stdE12tuple_size_vIST_ELi1Esr3stdE16is_invocable_r_vIST_SH_SJ_EEES5_OSE_EUlSU_E_ZNS5_17then_wrapped_nrvoIS5_SV_EENS_8futurizeISE_E4typeEOT0_EUlOS3_RSV_ONS_12future_stateISO_EEE_vEE
                                                                                               --------
                                                                                               N7seastar12continuationINS_8internal22promise_base_with_typeIvEEZNS_6futureIvE16handle_exceptionIZZZZN7replica20dirty_memory_manager17flush_when_neededEvENK3$_0clEvENKUlvE0_clEvENKUlT_E_clINS7_12flush_permitEEEDaSB_EUlNSt15__exception_ptr13exception_ptrEE_Qoooooosr3stdE16is_invocable_r_vINS4_ISB_EETL0__SG_Eaaeqsr3stdE12tuple_size_vINSt11conditionalIXsr3stdE9is_same_vINS1_18future_stored_typeIJSB_EE4typeENS1_9monostateEEESt5tupleIJEESQ_IJSO_EEE4typeEELi0Esr3stdE16is_invocable_r_vIvSK_SG_Eaaeqsr3stdE12tuple_size_vISU_ELi1Esr3stdE16is_invocable_r_vISB_SK_SG_Eaagtsr3stdE12tuple_size_vISU_ELi1Esr3stdE16is_invocable_r_vISU_SK_SG_EEES5_OSB_EUlSV_E_ZNS5_17then_wrapped_nrvoIS5_SW_EENS_8futurizeISB_E4typeEOT0_EUlOS3_RSW_ONS_12future_stateISP_EEE_vEE
Feb 04 11:51:16.097208 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - - segments to release: 30
Feb 04 11:51:16.097212 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - - processed 50 regions: reclaimed from 1, compacted 49
Feb 04 11:51:16.097217 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - - evicted memory: 0.501 MiB
Feb 04 11:51:16.097220 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - - compacted segments: 56
Feb 04 11:51:16.097224 perf-regression-predefined-steps-ub-db-node-662c4d6b-3 scylla[4975]:  [shard  8:mt2c] lsa-timing - - compacted memory: 4.415 MiB
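To put the lsa-timing report in more readable units, the duration and the requested release size can be pulled out of the first log line. This is a rough sketch: the regular expression is an assumption based only on the single line quoted above, not on a documented log format.

```python
import re

# Pattern derived from the one lsa-timing line in this report (an assumption).
LSA_RE = re.compile(
    r"lsa-timing - compact took (?P<us>\d+) us, "
    r"trying to release (?P<mib>[\d.]+) MiB"
)

line = (
    "[shard  8:mt2c] lsa-timing - compact took 103998 us, "
    "trying to release 2.625 MiB non-preemptibly,"
)
match = LSA_RE.search(line)
compact_ms = int(match["us"]) / 1000   # microseconds -> milliseconds
release_mib = float(match["mib"])
print(f"compact took {compact_ms:.3f} ms to release {release_mib} MiB")
```

The reported compaction took roughly 104 ms on shard 8.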

First c-s failure at 11:52:52,746

< t:2025-02-04 11:52:52,746 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:CRITICAL > 2025-02-04 11:52:52.731: (CassandraStressLogEvent Severity.CRITICAL) period_type=one-time event_id=a5ae9ee3-1617-4073-89f4-73da80663cf7: type=OperationOnKey regex=Operation x10 on key\(s\) \[ line_number=49187 node=Node perf-regression-predefined-steps-ub-loader-node-662c4d6b-1 [34.252.249.97 | 10.4.2.241]
< t:2025-02-04 11:52:52,746 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:CRITICAL >
 java.io.IOException: Operation x10 on key(s) [4c4f34364f4e314c3231]: Error executing: (WriteTimeoutException): Cassandra timeout during SIMPLE write query at consistency ALL (3 replica were required but only 2 acknowledged the write)
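The key in the failed operation is printed as hex. A small sketch for decoding it, assuming the key bytes are plain ASCII (which holds for this particular value):

```python
key_hex = "4c4f34364f4e314c3231"  # key from the IOException above
key_bytes = bytes.fromhex(key_hex)
key_text = key_bytes.decode("ascii")
print(len(key_bytes), key_text)
```

The decoded 10-byte key is `LO46ON1L21`, which could be used to look up the owning replicas for that partition.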

Write timeouts and failures

Image

Write latency less than 3 ms

Image

The same test, run in parallel, succeeded: https://argus.scylladb.com/tests/scylla-cluster-tests/61fcd3ee-6299-44c1-b22c-bdb92793e11c

I did not find anything suspicious.

Impact

Load failed

How frequently does it reproduce?

This is the first and only time the issue has occurred. This failure was not seen in version 2024.3.

Installation details

Cluster size: 3 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • perf-regression-predefined-steps-ub-db-node-662c4d6b-3 (34.242.222.143 | 10.4.1.185) (shards: 14)
  • perf-regression-predefined-steps-ub-db-node-662c4d6b-2 (3.252.164.88 | 10.4.3.174) (shards: 14)
  • perf-regression-predefined-steps-ub-db-node-662c4d6b-1 (34.245.114.22 | 10.4.1.67) (shards: 14)

OS / Image: ami-0cf9c4cdcacdd63c2 (aws: undefined_region)

Test: scylla-enterprise-perf-regression-predefined-throughput-steps-vnodes
Test id: 662c4d6b-2d8e-41d9-ab83-71e847b55403
Test name: scylla-enterprise/perf-regression/scylla-enterprise-perf-regression-predefined-throughput-steps-vnodes
Test method: performance_regression_gradual_grow_throughput.PerformanceRegressionPredefinedStepsTest.test_read_gradual_increase_load
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 662c4d6b-2d8e-41d9-ab83-71e847b55403
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 662c4d6b-2d8e-41d9-ab83-71e847b55403

Logs:

Jenkins job URL
Argus

@juliayakovlev added the triage/master (Looking for assignee) label Feb 5, 2025