Is your feature request related to a problem? Please describe
Context:
Created a 5-node cluster and created an index with one primary (1P) and one search replica.
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles      cluster_manager name
172.18.0.3 26           90          5   2.19    1.84    1.69     d         data            -               opensearch-node5
172.18.0.4 23           90          5   2.19    1.84    1.69     -         coordinating    -               opensearch-node1
172.18.0.2 27           90          6   2.19    1.84    1.69     d         data            -               opensearch-node3
172.18.0.5 23           90          5   2.19    1.84    1.69     d         data            -               opensearch-node4
172.18.0.6 30           90          5   2.19    1.84    1.69     m         cluster_manager *               opensearch-node2
The shard assignment is as follows:
index    shard prirep state   docs store ip         node
products 0     p      STARTED 0    230b  172.18.0.3 opensearch-node5
products 0     s      STARTED 0    230b  172.18.0.2 opensearch-node3
To simulate a node drop, I stopped node3 (I am running the cluster locally using Docker). The search replica then becomes unassigned:
index    shard prirep state      docs store ip         node
products 0     p      STARTED    0    230b  172.18.0.3 opensearch-node5
products 0     s      UNASSIGNED
After 1 minute, AllocationService tries to allocate the search shard to node4, and the allocation fails with an exception.
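Describe the solution you'd like
This happens because, when moveToUnassigned is called in ShardRouting (OpenSearch/server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java, line 443 in d0a65d3), we set the recoverySource to ExistingStoreRecoverySource for the search replica.
In this scenario the shard is recovered on a different node, which has no files for it in its local store, so recovery fails with an exception. The proposed solution is to always use EmptyStoreRecoverySource when the shard is a search replica.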
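To make the failure mode concrete, here is a minimal toy model of it. It does not use OpenSearch's actual classes: the `RecoverySourceType` enum and the `canRecover` helper are hypothetical stand-ins for ExistingStoreRecoverySource and EmptyStoreRecoverySource and for the recovery step, and the only assumption is the one stated above, namely that an existing-store recovery needs the shard's files on the target node.

```java
import java.util.Set;

public class RecoverySourceSketch {
    // Hypothetical stand-ins for ExistingStoreRecoverySource / EmptyStoreRecoverySource.
    public enum RecoverySourceType { EXISTING_STORE, EMPTY_STORE }

    // Toy recovery check: an existing-store recovery only works if the target
    // node already holds files for the shard; an empty-store recovery starts
    // from an empty directory and therefore works on any node.
    public static boolean canRecover(RecoverySourceType source,
                                     String targetNode,
                                     Set<String> nodesWithLocalFiles) {
        if (source == RecoverySourceType.EXISTING_STORE) {
            return nodesWithLocalFiles.contains(targetNode);
        }
        return true;
    }

    public static void main(String[] args) {
        // Only the stopped node3 ever held the search replica's files.
        Set<String> nodesWithLocalFiles = Set.of("opensearch-node3");

        // Reallocation to node4 with the existing-store source fails.
        System.out.println(canRecover(RecoverySourceType.EXISTING_STORE,
                "opensearch-node4", nodesWithLocalFiles)); // false

        // With the empty-store source the same allocation can proceed.
        System.out.println(canRecover(RecoverySourceType.EMPTY_STORE,
                "opensearch-node4", nodesWithLocalFiles)); // true
    }
}
```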
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response
@vinaykpud This will impact cases where a search replica (SR) node is restarted. Can you check our recovery logic to see if that's the case? I.e., we still want to diff local segments and ensure we only fetch what's required to recover.
@mch2 Yes. If the SR node restarts, we should consider loading the available local files instead of starting with an empty directory. In this case, the node restart causes the search replica to become unassigned. When the node comes back up, the allocator attempts to reassign the search replica to the same node. Since the shard was previously assigned to this node, it should already have the necessary files. Therefore, if any local files exist, we should load them.
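A rough sketch of the behaviour discussed in these two comments, under stated assumptions: it is not OpenSearch code, the `choose` and `filesToFetch` helpers and the enum are hypothetical, and files are compared by name only, whereas a real diff would presumably also compare segment metadata such as checksums. The idea is to reuse the local store when any files are present, fall back to an empty store otherwise, and fetch only what is missing locally.

```java
import java.util.Set;
import java.util.TreeSet;

public class SearchReplicaRecoveryChoice {
    // Hypothetical stand-ins for the two recovery sources named in the issue.
    public enum RecoverySourceType { EXISTING_STORE, EMPTY_STORE }

    // Reuse the local store only if the node still has files for the shard
    // (for example after a restart); otherwise start from an empty store.
    public static RecoverySourceType choose(Set<String> localFiles) {
        return localFiles.isEmpty()
                ? RecoverySourceType.EMPTY_STORE
                : RecoverySourceType.EXISTING_STORE;
    }

    // Name-based diff: everything the source has that is missing locally.
    public static Set<String> filesToFetch(Set<String> sourceFiles, Set<String> localFiles) {
        Set<String> missing = new TreeSet<>(sourceFiles);
        missing.removeAll(localFiles);
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative file names, not taken from the issue.
        Set<String> sourceFiles = Set.of("_0.cfs", "_1.cfs", "segments_2");

        // Restarted node: some files survive, so only the rest is fetched.
        Set<String> afterRestart = Set.of("_0.cfs");
        System.out.println(choose(afterRestart) + " " + filesToFetch(sourceFiles, afterRestart));

        // Fresh node: nothing local, so everything is fetched.
        Set<String> freshNode = Set.of();
        System.out.println(choose(freshNode) + " " + filesToFetch(sourceFiles, freshNode));
    }
}
```

In this sketch the restart case from the comments and the node-drop case from the issue differ only in whether `localFiles` is empty, which is why a single check can cover both.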