Add sparse file support #35

Open
wants to merge 5 commits into base: main

@zwhitchcox zwhitchcox commented Dec 7, 2024

Add support for sparse files using the old GNU format.

This is the format GNU tar currently produces by default for sparse files. Extended headers are supported as well.

Also supports a raw mode, which reads the sparse file map and emits only the chunks that actually contain data, as described by the sparse map. This is useful for saving memory while streaming a sparse file.
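To make the difference between the two modes concrete, here is a minimal sketch of how the hole-free data and a sparse map relate to the full file. The `SparseSegment` type and the `expandSparse` helper are hypothetical names used only for illustration; they are not this library's API.

```typescript
// Hypothetical shape for one sparse map entry (an assumption, not the library's type).
interface SparseSegment {
  offset: number; // where this data belongs in the reconstructed file
  length: number; // how many data bytes are stored in the archive
}

// Rebuild the full file contents from the raw (hole-free) data and the sparse map.
function expandSparse(
  raw: Uint8Array,
  map: SparseSegment[],
  realSize: number,
): Uint8Array {
  const out = new Uint8Array(realSize); // zero-filled, so the holes are already zeros
  let cursor = 0; // read position within the raw data
  for (const { offset, length } of map) {
    if (offset + length > realSize || cursor + length > raw.length) {
      throw new RangeError("sparse segment out of bounds");
    }
    out.set(raw.subarray(cursor, cursor + length), offset);
    cursor += length;
  }
  return out;
}

// Four stored bytes describing an 8-byte file with a 4-byte hole in the middle.
const raw = new Uint8Array([1, 2, 3, 4]);
const map: SparseSegment[] = [
  { offset: 0, length: 2 },
  { offset: 6, length: 2 },
];
const full = expandSparse(raw, map, 8);
console.log(Array.from(full)); // [1, 2, 0, 0, 0, 0, 3, 4]
```

In this picture, the default mode corresponds to `full`, while raw mode corresponds to handing the caller `raw` together with `map`, so the zeros never have to be held in memory.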

* Add test cases for multiple extended headers
* Fix a bug where 80+ extended headers would crash tar
Owner

@mjackson mjackson left a comment

Thanks for the PR, @zwhitchcox! I'd like to see a few small changes before merging.

});
```

If you prefer the raw data chunks as they appear in the archive (without reconstructing zeros), you can call `entry.bytes({ raw: true })`:
Owner

When you say "without reconstructing zeros" do you mean that the resulting byte array will have zeroes in it? That's just spacer data, right? Forgive my ignorance, but when would that ever be useful?

Author

This is the default behavior of tar.

Basically, if someone sends you a sparse tarball, you don't have to do anything differently if you just pipe to a write stream for the file. So, for someone with no knowledge of sparse files, it "just works".

However, if you're a more advanced user, you can get the sparse offsets and lengths and use fs.write(fd, buffer, offset, length), which will be more efficient because you're not writing the extra data.
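As a rough sketch of that advanced path, the snippet below writes only the data segments at their file positions and then sets the final size. `writeSparse` and `SparseSegment` are hypothetical names, and the sparse map is assumed to be available as offset/length pairs. Note that in Node's `fs.writeSync(fd, buffer, offset, length, position)` the `offset` argument is the offset within the buffer, while the location in the file is the fifth argument, `position`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical shape for one sparse map entry (an assumption, not the library's type).
interface SparseSegment {
  offset: number; // position of the data in the output file
  length: number; // number of data bytes for this segment
}

// Write only the data segments; the gaps between them are never written.
function writeSparse(
  filePath: string,
  raw: Uint8Array,
  map: SparseSegment[],
  realSize: number,
): void {
  const fd = fs.openSync(filePath, "w");
  try {
    let cursor = 0; // read position within the raw data
    for (const { offset, length } of map) {
      let written = 0;
      // Loop in case a single call writes fewer bytes than requested.
      while (written < length) {
        written += fs.writeSync(
          fd,
          raw,
          cursor + written, // offset within the buffer
          length - written, // bytes still to write
          offset + written, // position in the file
        );
      }
      cursor += length;
    }
    // Extend (or trim) the file to its real size so a trailing hole is kept.
    fs.ftruncateSync(fd, realSize);
  } finally {
    fs.closeSync(fd);
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "sparse-"));
const target = path.join(dir, "out.bin");
writeSparse(
  target,
  new Uint8Array([1, 2, 3, 4]),
  [
    { offset: 0, length: 2 },
    { offset: 6, length: 2 },
  ],
  8,
);
const result = fs.readFileSync(target);
console.log(Array.from(result)); // [1, 2, 0, 0, 0, 0, 3, 4]
```

Whether the unwritten ranges end up as real holes on disk depends on the filesystem; either way, reading the file back yields zeros there.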
