
[RW Separation] Search replica recovery flow breaks when search shard allocated to new node after node drop #17334

Open
vinaykpud opened this issue Feb 12, 2025 · 3 comments
Labels: enhancement, Search:Performance

Comments

@vinaykpud
Contributor

vinaykpud commented Feb 12, 2025

Is your feature request related to a problem? Please describe

Context:
Created a 5-node cluster and an index with 1 primary and 1 search replica.

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles      cluster_manager name
172.18.0.3           26          90   5    2.19    1.84     1.69 d         data            -               opensearch-node5
172.18.0.4           23          90   5    2.19    1.84     1.69 -         coordinating    -               opensearch-node1
172.18.0.2           27          90   6    2.19    1.84     1.69 d         data            -               opensearch-node3
172.18.0.5           23          90   5    2.19    1.84     1.69 d         data            -               opensearch-node4
172.18.0.6           30          90   5    2.19    1.84     1.69 m         cluster_manager *               opensearch-node2

The shard assignment was as follows:

index    shard prirep state   docs store ip         node
products 0     p      STARTED    0  230b 172.18.0.3 opensearch-node5
products 0     s      STARTED    0  230b 172.18.0.2 opensearch-node3

To simulate a node drop, since I am running the cluster locally with Docker, I stopped node3 (a plain docker stop of its container). The shard assignment then looked like this:

index    shard prirep state      docs store ip         node
products 0     p      STARTED       0  230b 172.18.0.3 opensearch-node5
products 0     s      UNASSIGNED

After about 1 minute (the default index.unassigned.node_left.delayed_timeout), the AllocationService tries to allocate the search shard to node4, and it fails with the exception below:

2025-02-09 13:07:53 "stacktrace": ["org.opensearch.indices.recovery.RecoveryFailedException: [products3][0]: Recovery failed on {opensearch-node8}{eHuGysErRFuGUCFO2KxGuw}{Wby_7fTEToWk5bnavKBlbA}{172.18.0.9}{172.18.0.9:9300}{dimr}{zone=zone3, shard_indexing_pressure_enabled=true}",
2025-02-09 13:07:53 "at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$32(IndexShard.java:3902) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$10(StoreRecovery.java:618) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:347) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:123) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2919) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:994) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]",
2025-02-09 13:07:53 "at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]",
2025-02-09 13:07:53 "at java.base/java.lang.Thread.run(Thread.java:1575) [?:?]",
2025-02-09 13:07:53 "Caused by: org.opensearch.index.shard.IndexShardRecoveryException: failed to fetch index version after copying it over",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:717) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:125) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "... 8 more",
2025-02-09 13:07:53 "Caused by: org.opensearch.index.shard.IndexShardRecoveryException: shard allocated for local recovery (post api), should exist, but doesn't, current files: []",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:702) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:125) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "... 8 more",
2025-02-09 13:07:53 "Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in store(ByteSizeCachingDirectory(HybridDirectory@/usr/share/opensearch/data/nodes/0/indices/PFeBY9eRTaKcRhDKEM2WAQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@52539624)): files: []",
2025-02-09 13:07:53 "at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:808) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]",
2025-02-09 13:07:53 "at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:764) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]",
2025-02-09 13:07:53 "at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:542) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]",
2025-02-09 13:07:53 "at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:526) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]",
2025-02-09 13:07:53 "at org.opensearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:135) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.store.Store.readSegmentsInfo(Store.java:255) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:237) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:692) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:125) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]",
2025-02-09 13:07:53 "... 8 more"] }

Describe the solution you'd like

This happens because in ShardRouting,

when moveToUnassigned is called, the recoverySource is set to ExistingStoreRecoverySource for the search replica.
Since this scenario involves recovering the shard on another node, there are no files in the local store to recover from, and recovery fails with the exception above. The proposed solution is to always use EmptyStoreRecoverySource for search replicas.
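A minimal sketch of that idea (illustrative only, not the actual moveToUnassigned implementation; it assumes a ShardRouting#isSearchOnly() accessor is available to identify search replicas):

```java
// Hypothetical sketch of the proposed behaviour, NOT the actual OpenSearch code path.
// Assumes ShardRouting#isSearchOnly() identifies search replicas.
import org.opensearch.cluster.routing.RecoverySource;
import org.opensearch.cluster.routing.ShardRouting;

final class SearchReplicaRecoverySourceHelper {

    /**
     * Chooses the recovery source for a shard being moved to UNASSIGNED.
     * A search replica never copies segments from the primary, so if it lands on a
     * node with no local files, ExistingStoreRecoverySource fails with
     * "no segments* file found in store". Starting from an empty store avoids that.
     */
    static RecoverySource recoverySourceOnUnassign(ShardRouting shard, RecoverySource current) {
        if (shard.isSearchOnly()) {
            return RecoverySource.EmptyStoreRecoverySource.INSTANCE;
        }
        // Non-search shards keep whatever the routing logic already chose.
        return current;
    }
}
```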

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@mch2
Member

mch2 commented Feb 13, 2025

@vinaykpud This will impact cases where a SR node is restarted. Can you check our recovery logic to see if that's the case? I.e., we still want to diff local segments and ensure we only fetch what's required to recover.

@vinaykpud
Contributor Author

Added an integ test reproducing this: d89c1cf

@vinaykpud
Copy link
Contributor Author

vinaykpud commented Feb 14, 2025

@mch2 Yes. If the SR node restarts, we should consider loading the available local files instead of starting with an empty directory. In this case, the node restart causes the search replica to become unassigned. When the node comes back up, the allocator attempts to reassign the search replica to the same node. Since the shard was previously assigned to this node, it should already have the necessary files. Therefore, if any local files exist, we should load them.
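A rough illustration of that check, reusing Store#readLastCommittedSegmentsInfo() (the method already visible in the stack trace above); where exactly this decision would live in the real recovery path is an assumption:

```java
// Illustrative only: decide whether a search replica can reuse its local store
// (e.g. after its node restarted) or must start from an empty one (fresh node).
// The exact hook point in the recovery path is an assumption.
import java.io.IOException;

import org.apache.lucene.index.IndexNotFoundException;
import org.opensearch.cluster.routing.RecoverySource;
import org.opensearch.index.store.Store;

final class SearchReplicaLocalStoreCheck {

    static RecoverySource chooseRecoverySource(Store store) throws IOException {
        try {
            // Throws IndexNotFoundException if there is no segments_N file on disk.
            store.readLastCommittedSegmentsInfo();
            // A local commit exists: recover from the existing store so we only
            // need to fetch the segments that are missing locally.
            return RecoverySource.ExistingStoreRecoverySource.INSTANCE;
        } catch (IndexNotFoundException e) {
            // Nothing on disk (fresh node): start from an empty store instead of failing.
            return RecoverySource.EmptyStoreRecoverySource.INSTANCE;
        }
    }
}
```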
