-
Notifications
You must be signed in to change notification settings - Fork 683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[store] Introduce ChainStoreUpdateAdapter and EpochStoreUpdateAdapter and use in epoch_sync #12866
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,11 +19,11 @@ use near_primitives::utils::{get_block_shard_id, get_outcome_id_block_hash, inde | |
use near_primitives::views::LightClientBlockView; | ||
|
||
use crate::{ | ||
get_genesis_height, DBCol, Store, CHUNK_TAIL_KEY, FINAL_HEAD_KEY, FORK_TAIL_KEY, | ||
get_genesis_height, DBCol, Store, StoreUpdate, CHUNK_TAIL_KEY, FINAL_HEAD_KEY, FORK_TAIL_KEY, | ||
HEADER_HEAD_KEY, HEAD_KEY, LARGEST_TARGET_HEIGHT_KEY, TAIL_KEY, | ||
}; | ||
|
||
use super::StoreAdapter; | ||
use super::{StoreAdapter, StoreUpdateAdapter, StoreUpdateHolder}; | ||
|
||
#[derive(Clone)] | ||
pub struct ChainStoreAdapter { | ||
|
@@ -46,6 +46,12 @@ impl ChainStoreAdapter { | |
Self { store, genesis_height } | ||
} | ||
|
||
pub fn store_update(&self) -> ChainStoreUpdateAdapter<'static> { | ||
ChainStoreUpdateAdapter { | ||
store_update: StoreUpdateHolder::Owned(self.store.store_update()), | ||
} | ||
} | ||
|
||
/// The chain head. | ||
pub fn head(&self) -> Result<Tip, Error> { | ||
option_to_not_found(self.store.get_ser(DBCol::BlockMisc, HEAD_KEY), "HEAD") | ||
|
@@ -390,6 +396,95 @@ impl ChainStoreAdapter { | |
} | ||
} | ||
|
||
pub struct ChainStoreUpdateAdapter<'a> { | ||
store_update: StoreUpdateHolder<'a>, | ||
} | ||
|
||
impl Into<StoreUpdate> for ChainStoreUpdateAdapter<'static> { | ||
fn into(self) -> StoreUpdate { | ||
self.store_update.into() | ||
} | ||
} | ||
|
||
impl ChainStoreUpdateAdapter<'static> { | ||
pub fn commit(self) -> io::Result<()> { | ||
let store_update: StoreUpdate = self.into(); | ||
store_update.commit() | ||
} | ||
} | ||
|
||
impl<'a> StoreUpdateAdapter for ChainStoreUpdateAdapter<'a> { | ||
fn store_update(&mut self) -> &mut StoreUpdate { | ||
&mut self.store_update | ||
} | ||
} | ||
|
||
impl<'a> ChainStoreUpdateAdapter<'a> { | ||
pub fn new(store_update: &'a mut StoreUpdate) -> Self { | ||
Self { store_update: StoreUpdateHolder::Reference(store_update) } | ||
} | ||
|
||
/// USE THIS FUNCTION WITH CARE; proceed only if you know what you're doing. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: down the road nobody is going to "know what they're doing". It will not be clear what "use with care" means. I suggest to just warn specifically what things can go wrong if you use it directly. (You already mentioned one, but maybe there are more and maybe what you mentioned can be explained a bit more.) |
||
/// Typically while saving the block header we would also like to update | ||
/// block_header_hashes_by_height and update block_merkle_tree | ||
pub fn set_block_header_only(&mut self, header: &BlockHeader) { | ||
self.store_update.insert_ser(DBCol::BlockHeader, header.hash().as_ref(), header).unwrap(); | ||
} | ||
|
||
/// USE THIS FUNCTION WITH CARE; proceed only if you know what you're doing. | ||
/// Typically block_header_hashes_by_height is saved while saving the block header | ||
pub fn set_block_header_hashes_by_height( | ||
&mut self, | ||
height: BlockHeight, | ||
hash_set: &HashSet<CryptoHash>, | ||
) { | ||
self.store_update | ||
.set_ser(DBCol::HeaderHashesByHeight, &index_to_bytes(height), hash_set) | ||
.unwrap(); | ||
} | ||
|
||
/// USE THIS FUNCTION WITH CARE; proceed only if you know what you're doing. | ||
/// Typically block_merkle_tree is saved while saving the block header | ||
pub fn set_block_merkle_tree( | ||
&mut self, | ||
block_hash: &CryptoHash, | ||
block_merkle_tree: &PartialMerkleTree, | ||
) { | ||
self.store_update | ||
.set_ser(DBCol::BlockMerkleTree, block_hash.as_ref(), block_merkle_tree) | ||
.unwrap(); | ||
} | ||
|
||
pub fn set_block_ordinal(&mut self, block_ordinal: NumBlocks, block_hash: &CryptoHash) { | ||
self.store_update | ||
.set_ser(DBCol::BlockOrdinal, &index_to_bytes(block_ordinal), block_hash) | ||
.unwrap(); | ||
} | ||
|
||
pub fn set_block_height(&mut self, hash: &CryptoHash, height: BlockHeight) { | ||
self.store_update | ||
.set_ser(DBCol::BlockHeight, &borsh::to_vec(&height).unwrap(), hash) | ||
.unwrap(); | ||
} | ||
|
||
pub fn set_header_head(&mut self, header_head: &Tip) { | ||
self.store_update.set_ser(DBCol::BlockMisc, HEADER_HEAD_KEY, header_head).unwrap(); | ||
} | ||
|
||
pub fn set_final_head(&mut self, final_head: &Tip) { | ||
self.store_update.set_ser(DBCol::BlockMisc, FINAL_HEAD_KEY, final_head).unwrap(); | ||
} | ||
|
||
/// This function is normally clubbed with set_block_header_only | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But then we can just unite both methods. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I was thinking of doing that but thought of leaving the primitives. I had plans in the future to integrate functions like these into a helper module like, I can follow up in a different PR |
||
pub fn update_block_header_hashes_by_height(&mut self, header: &BlockHeader) { | ||
let height = header.height(); | ||
let mut hash_set = | ||
self.store_update.store.chain_store().get_all_header_hashes_by_height(height).unwrap(); | ||
hash_set.insert(*header.hash()); | ||
self.set_block_header_hashes_by_height(height, &hash_set); | ||
} | ||
} | ||
|
||
fn option_to_not_found<T, F>(res: io::Result<Option<T>>, field_name: F) -> Result<T, Error> | ||
where | ||
F: std::string::ToString, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the change above looks like a simple rewrite, it is definitely NOT trivial!! There are couple of things happening here that are different.
We are no longer using chain_store_update but instead using the standard store_update.
Originally we relied on functions like
save_block_header_no_update_tree
,force_save_header_head
andsave_final_head
in chain store update along withupdate.commit()?
which internally callsupdate.finalize()
.(!!) With the new code, we are directly going and making changes to the store layer instead of relying on the weird crazy behavior of
update.finalize()
We also need to save the new blocks into store, i.e. call
chain_store_update.commit()?;
before actually reading from the store right below that as we are no longer relying on the non-committed, temp to store values inupdate.get_block_header(...)
part of the code.I had to print out all the DB ops from the old code and the new code and compare them to check if we are doing things properly. This is tested well manually.
One tiny thing to be noted is that as part of the finalize behavior in the old chain store update code, we were calling
crate::state_sync::update_sync_hashes(...)
on all the blocks we were updating, but this isn't required as part of epoch sync. cc @marcelo-gonzalez.One of those crazy things that's a byproduct of using the heavy hammer of chain store update... :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why? We likely want to sync state just after we finished epoch sync, so we need to make sure
update_sync_hashes
is called for some blocks, I just don't understand which ones.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the original code in debug mode and we are calling this function below for
last_header
Here,
update_sync_hashes
only triggers at the epoch boundary, which happens in epoch sync atlast_header
, i.e.proof.current_epoch.first_block_header_in_epoch
.Here, the key we are adding is
DBCol::StateSyncNewChunks
and the description says something like thisMakes me believe it's not super important and would eventually be updated with the next block processing as well with a different code path.
However, even after all this, in case further convincing is required, after epoch sync, we anyway do block sync for the current epoch so it anyway doesn't make sense to think about state_sync related things for that epoch. state_sync related things would get reset with the next epoch processing.
@marcelo-gonzalez could you please check if this argument makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to reiterate, even though this code path is different, I manually tested this by looking into the actual store_update db ops and the only difference was one entry, that of
DBCol::StateSyncNewChunks
that I described above.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so in the previous code where we call save_block_header_no_update_tree on those headers in epoch sync, we first call it on
proof.current_epoch.first_block_header_in_epoch
, and like the name says, it's the first block in an epoch. Soupdate_sync_hashes()
will just set all zeros in the new chunks column for that block hash since it sees it's the first block in an epoch.Then every other block header we're calling
save_block_header_no_update_tree()
on is in the previous epoch, and none of the other headers in that epoch have been saved before all this. This has us return early fromupdate_sync_hashes()
without doing anything bc of this check here. Note the comment about epoch sync thereSo tldr, only
proof.current_epoch.first_block_header_in_epoch
actually does something when we callupdate_sync_hashes()
and the others can safely be skipped. Actually there are several other comments in that file about epoch sync possibly meaning that some stuff has not been processed befeore. It would def simplify things if we could make this assumption: ifupdate_sync_hashes()
is called on some header, then if its prev header in the same epoch, then it has already been called previously on that prev header. Right now if that isn't the case we just silently return and assume it's bc of epoch sync, but it would be good to be able to WARN in that case that somethng is wrongThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand correctly, this PR does make a behavior change where
update_sync_hashes()
will no longer be called onproof.current_epoch.first_block_header_in_epoch
, but it's ok because that block is not going to be in the current epoch, so we header sync until we get to the current epoch and thenupdate_sync_hashes()
works as normal when we get to the new epochThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ Yes, that was my reasoning and wanted your confirmation!