Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local-file-storage: scalability + race fixes #41

Closed

Conversation

ryanflorence
Copy link
Contributor

@ryanflorence ryanflorence commented Jan 8, 2025

This commit fixes race conditions and potential performance issues

Race conditions:

  • Multiple calls to set caused race conditions with the central metafile index.
  • Not only did this cause the set to fail, but it often corrupted the metafile causing a permanent, unrecoverable error since the metafile could no longer be parsed.

Scalability:

  • The single metadata file would eventually grow large in a successful application and become both a memory issue and performance bottleneck
  • Additionally, storing all files in a single directory would eventually slow down the file system’s access of files in the directory.

Changes:

  • Removes race conditions with atomic reads/writes
    • Removes the metafile and instead relies on SHA-256 hashes for file keys
    • Stores metadata alongside each file as .meta.json instead of central index
    • removes filename collision handling since it’s unnecessary with hash-based names
  • Performance
    • Shards files into subdirectories using first 8 chars of SHA-256 hash to maintain file system performance
    • Without a metafile, it is no longer a memory or performance bottleneck

Should rebase/squash the commits before merging, added the failing test first to prove the bug.

@ryanflorence ryanflorence force-pushed the ryan/local-file-race-conditions branch from 45c1213 to 6281172 Compare January 8, 2025 00:29
This commit fixes race conditions and potential performance issues

## Race conditions:

- Multiple calls to `set` caused race conditions with the central metafile index.
- Not only did this cause the set to fail, but it often corrupted the metafile causing a permanent, unrecoverable error since the metafile could no longer be parsed.

## Scalability:

- The single metadata file would eventually grow large in a successful application and become both a memory issue and performance bottleneck
- Additionally, storing all files in a single directory would eventually slow down the file system’s access of files in the directory.

## Changes:

- Removes race conditions with atomic reads/writes
  - Removes the metafile and instead relies on SHA-256 hashes for file keys
  - Stores metadata alongside each file as .meta.json instead of central index
  - removes filename collision handling since it’s unnecessary with hash-based names
- Performance
  - Shards files into subdirectories using first 8 chars of SHA-256 hash to maintain file system performance
  - Without a metafile, it is no longer a memory or performance bottleneck

Should rebase/squash the commits before merging, added the failing test first to prove the bug.
@ryanflorence ryanflorence force-pushed the ryan/local-file-race-conditions branch from 6281172 to a5b04fd Compare January 8, 2025 00:32
Copy link
Owner

@mjackson mjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm going to switch this up so it uses the web crypto API instead of node:crypto, but other than that it looks great 👍

@mjackson
Copy link
Owner

mjackson commented Jan 8, 2025

Merged in a5b04fd

@mjackson mjackson closed this Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants