Don't wait for remaining speculative retries if plan is exhausted and there is no running fibers #1177

muzarski · 2025-01-22T20:04:23Z

This issue concerns the speculative_execution::execute function, specifically this part in the current version of the driver:

scylla-rust-driver/scylla/src/policies/speculative_execution.rs

Lines 201 to 214 in 4a9367c

    
            res = async_tasks.select_next_some() => {
                if let Some(r) = res {
                    if !can_be_ignored(&r) {
                        return r;
                    } else {
                        last_error = Some(r)
                    }
                }
                if async_tasks.is_empty() && retries_remaining == 0 {
                    return last_error.unwrap_or({
                        Err(EMPTY_PLAN_ERROR)
                    });
                }
            }

There is a small issue with this code. If a fiber returns None (i.e., the plan is exhausted), there may still be some speculative retries remaining, and the driver will wait for them. This is the expected behaviour in most scenarios: other fibers may still be running, so we must not return from the function prematurely. However, we can safely return from the function if both of the following conditions are met:

a fiber returns None, i.e., the plan is empty/exhausted
no other fibers are running at that moment. If some are, we should wait for them to complete.

The second condition could be checked by introducing an additional variable that tracks the number of currently running fibers. In our experience, the logic of speculative_execution::execute is error-prone. This is why, in addition to the change described in this issue, we should think of a way to rewrite the logic in a safer manner.


Lorak-mmk · 2025-01-23T10:45:40Z

Second condition could be checked by introducing additional variable that keeps track of number of currently running fibers.

No need for a new variable; this is just async_tasks.len().

If we do this, a trickier scenario is one where the plan is exhausted first, but some executions are still pending.
In that case:

We should not start new executions.
When the running executions finish, we should return.

I think all of this can be achieved just by setting retries_remaining to 0.

            res = async_tasks.select_next_some() => {
                if let Some(r) = res {
                    if !can_be_ignored(&r) {
                        return r;
                    } else {
                        last_error = Some(r)
                    }
                } else {
                    retries_remaining = 0
                }
                if async_tasks.is_empty() && retries_remaining == 0 {
                    return last_error.unwrap_or({
                        Err(EMPTY_PLAN_ERROR)
                    });
                }
            }
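
To make the effect of that else branch easier to check, here is a minimal synchronous sketch of the arm's decision logic, not the driver's actual code. Everything in it is an assumption made for illustration: the Step enum, the on_fiber_finished function, the u32/String result type, the stand-in can_be_ignored (which treats every error as ignorable), and the "empty plan" string used in place of EMPTY_PLAN_ERROR. The still_running parameter plays the role of async_tasks.len() after the finished fiber has been removed.

```rust
/// Outcome of handling one finished fiber (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Step {
    /// Return this result to the caller of `execute`.
    Return(Result<u32, String>),
    /// Keep waiting for other fibers or for the next speculative retry.
    KeepWaiting,
}

/// Stand-in for the driver's `can_be_ignored`; here every error is ignorable.
fn can_be_ignored(r: &Result<u32, String>) -> bool {
    r.is_err()
}

/// Models the body of the `select_next_some()` arm.
/// `res` is what the finished fiber returned (`None` means the plan is exhausted).
/// `still_running` corresponds to `async_tasks.len()` after that fiber finished.
fn on_fiber_finished(
    res: Option<Result<u32, String>>,
    still_running: usize,
    retries_remaining: &mut usize,
    last_error: &mut Option<Result<u32, String>>,
) -> Step {
    match res {
        // A result that cannot be ignored is returned right away.
        Some(r) if !can_be_ignored(&r) => return Step::Return(r),
        // An ignorable error is remembered in case nothing better arrives.
        Some(r) => *last_error = Some(r),
        // Plan exhausted: starting further speculative executions is pointless.
        None => *retries_remaining = 0,
    }
    if still_running == 0 && *retries_remaining == 0 {
        // Nothing is running and nothing more will be started.
        let fallback = || Err("empty plan".to_string());
        return Step::Return(last_error.take().unwrap_or_else(fallback));
    }
    Step::KeepWaiting
}
```

With this model, a None from the only running fiber returns immediately even if retries were left, while a None received when another fiber is still running only zeroes retries_remaining and keeps waiting for that fiber.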
