add something for w1d2
Signed-off-by: Alex Chi <[email protected]>
skyzh committed Jan 20, 2024
1 parent a95d866 commit 5cff2ec
Showing 2 changed files with 25 additions and 2 deletions.
6 changes: 5 additions & 1 deletion mini-lsm-book/src/week1-01-memtable.md
@@ -24,6 +24,8 @@ You will also notice that the `MemTable` structure does not have a `delete` inte

In this task, you will need to implement `MemTable::get` and `MemTable::put` to enable modifications of the memtable.

We use the `bytes` crate to store data in the memtable. `bytes::Bytes` is similar to `Arc<[u8]>`: cloning a `Bytes`, or taking a slice of one, does not copy the underlying data, so both operations are cheap. Each clone or slice is simply a new reference to the same storage area, which is freed once no references to it remain.
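
The sharing behavior can be illustrated with the standard library alone. The following is a minimal sketch that uses `Arc<[u8]>` as a stand-in for `Bytes`; it is an analogy, not the `bytes` crate itself, and the helper `share` is a hypothetical name introduced only for this example. It does not model zero-copy slicing.

```rust
use std::sync::Arc;

/// Hypothetical helper: returns a second handle to the same buffer
/// without copying the bytes.
fn share(buf: &Arc<[u8]>) -> Arc<[u8]> {
    Arc::clone(buf)
}

fn main() {
    // Stand-in for `Bytes`: a reference-counted, immutable byte buffer.
    let value: Arc<[u8]> = Arc::from(&b"hello world"[..]);

    // Cloning only bumps the reference count; both handles point at the same storage.
    let alias = share(&value);
    assert!(Arc::ptr_eq(&value, &alias));
    assert_eq!(Arc::strong_count(&value), 2);

    // Dropping one handle keeps the storage alive for the other.
    drop(value);
    assert_eq!(Arc::strong_count(&alias), 1);
    assert_eq!(&alias[..], b"hello world");
}
```

The storage is released only when the last handle is dropped, which is the property the memtable relies on when it hands out keys and values.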

## Task 2: A Single Memtable in the Engine

In this task, you will need to modify:
@@ -148,11 +150,13 @@ Now that you have multiple memtables, you may modify your read path `get` functi
* Is it possible to use other data structures as the memtable in LSM? What are the pros/cons of using the skiplist?
* Why do we need a combination of `state` and `state_lock`? Can we only use `state.read()` and `state.write()`?
* Why does the order in which we store and probe the memtables matter?
* Is the memory layout of the memtable efficient / does it have good data locality? (Think of how `Bytes` is implemented...) What are the possible optimizations to make the memtable more efficient?
* So we are using `parking_lot` locks in this tutorial. Is its read-write lock a fair lock? What might happen to the readers trying to acquire the lock if there is one writer waiting for existing readers to stop?

We do not provide reference answers to the questions. Feel free to discuss them in the Discord community.

## Bonus Tasks

* **More Memtable Formats.** You may implement other memtable formats. For example, BTree memtable, vector memtable, and ART memtable.

{{#include copyright.md}}
21 changes: 20 additions & 1 deletion mini-lsm-book/src/week1-02-merge-iterator.md
@@ -10,8 +10,27 @@ In this chapter, you will:

## Task 1: Memtable Iterator

self-referential struct

## Task 2: Merge Iterator

error handling, order requirement

## Task 3: LSM Iterator

## Task 4: Read Path - Scan

## Test Your Understanding

* Why do we need a self-referential structure for memtable iterator?
* If we want to get rid of self-referential structure and have a lifetime on the memtable iterator (i.e., `MemtableIterator<'a>`, where `'a` = memtable or `LsmStorageInner` lifetime), is it still possible to implement the `scan` functionality?
* Suppose we (1) create an iterator on the skiplist memtable and then (2) someone inserts new keys into the memtable. Will the iterator see the new keys?
* Why do we need to ensure the merge iterator returns data in the iterator construction order?
* Is it possible to implement a Rust-style iterator (i.e., `next(&mut self) -> Option<(Key, Value)>`) for LSM iterators? What are the pros/cons?
* The scan interface is like `fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>)`. How can we make this API compatible with Rust-style ranges (i.e., `key_a..key_b`)? If you implement this, try passing a full range `..` to the interface and see what happens.
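
To make the `Bound`-based scan interface concrete, here is a minimal sketch of a range scan over an ordered map. It assumes a `BTreeMap` as a stand-in for the memtable and returns a collected `Vec` rather than an iterator; the `Table` alias and the free function `scan` are hypothetical names for this illustration and are not the engine's actual types.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for a memtable: an ordered map from key to value.
type Table = BTreeMap<Vec<u8>, Vec<u8>>;

/// Collects every entry whose key lies between `lower` and `upper`.
fn scan(table: &Table, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<(Vec<u8>, Vec<u8>)> {
    table
        .range::<[u8], _>((lower, upper))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut table = Table::new();
    for key in ["a", "b", "c", "d"] {
        table.insert(key.as_bytes().to_vec(), format!("value-{key}").into_bytes());
    }

    // Inclusive lower bound "b", exclusive upper bound "d".
    let hits = scan(
        &table,
        Bound::Included("b".as_bytes()),
        Bound::Excluded("d".as_bytes()),
    );
    let keys: Vec<&[u8]> = hits.iter().map(|(k, _)| k.as_slice()).collect();
    assert_eq!(keys, vec!["b".as_bytes(), "c".as_bytes()]);

    // Unbounded on both sides returns every entry.
    assert_eq!(scan(&table, Bound::Unbounded, Bound::Unbounded).len(), 4);
}
```

Each side of the range is chosen independently as `Included`, `Excluded`, or `Unbounded`, which is what lets a single function signature cover prefix scans, half-open ranges, and full scans.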

## Bonus Task

* **Foreground Iterator.** In this tutorial we assume that all operations are short, so the iterator can hold a reference to the memtable. If a user holds an iterator for a long time, the whole memtable (which might be 256MB) stays in memory even after it has been flushed to disk. To solve this, we can provide a `ForegroundIterator` / `LongIterator` to our users. The iterator periodically re-creates its underlying storage iterator so that the old resources can be garbage collected.

{{#include copyright.md}}
