Memory allocation #6

Open
aquac opened this issue Dec 18, 2020 · 2 comments

Comments

@aquac

aquac commented Dec 18, 2020

The current version of the code seems to load the whole file into memory, which fails for large files.
For example, in my case I get the following error:
memory allocation of 5404574964 bytes failed

It would be great if the code read the file sequentially, piece by piece.

@aquac
Author

aquac commented Dec 18, 2020

A temporary solution to this, and perhaps also a nice new feature, would be an option that parses only the first X bytes of a file and matches the regex against those.
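To make the "first X bytes" idea concrete, here is a minimal sketch using only the Rust standard library. It is not bgrep's actual code: `read_prefix` and `read_file_prefix` are hypothetical helper names, and the sketch assumes the resulting buffer would then be handed to the existing matching logic unchanged.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Read at most `limit` bytes from any reader into a buffer.
/// (Hypothetical helper, not part of bgrep.)
fn read_prefix<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `take` wraps the reader so that it reports EOF after `limit` bytes,
    // which bounds the size of the allocation below.
    reader.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Load only the first `limit` bytes of the file at `path`.
/// (Hypothetical helper, not part of bgrep.)
fn read_file_prefix(path: &Path, limit: u64) -> io::Result<Vec<u8>> {
    read_prefix(File::open(path)?, limit)
}
```

With this shape, memory use is bounded by `limit` rather than by the file size, at the cost of missing any match that starts or ends beyond the first `limit` bytes.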

@gahag
Owner

gahag commented Dec 18, 2020

Traditional grep doesn't have this exact issue because its regexes are limited to lines, so it reads one line of the file at a time. bgrep shouldn't have such a restriction, because a binary pattern might contain the byte that is equivalent to a line break character, even when it doesn't represent an actual textual line break. If I recall correctly, the regex crate has no support for arbitrary buffered reading/matching, and that's why bgrep reads the entire file into memory. I believe the only feasible alternative for very large files is using memory maps (I know ripgrep can do this), and I'm willing to support implementing such a feature, even though it would be non-trivial and would probably require some unsafe code.
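The line-break point can be illustrated with a small sketch that uses only the standard library. It uses a naive literal byte search as a stand-in for the regex engine (an assumption made to keep the example self-contained), and `contains_bytes` and `found_linewise` are hypothetical names, not bgrep functions.

```rust
use std::io::{BufRead, Cursor};

/// Naive literal search: true if `needle` occurs anywhere in `haystack`.
/// (Stand-in for real regex matching, for illustration only.)
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` would panic, so an empty needle is handled first.
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

/// Search one "line" at a time, splitting the input on the 0x0A byte
/// the way a text-oriented tool would.
fn found_linewise(data: &[u8], needle: &[u8]) -> bool {
    Cursor::new(data)
        .split(b'\n') // each item is one line with the 0x0A delimiter removed
        .filter_map(Result::ok)
        .any(|line| contains_bytes(&line, needle))
}
```

For the bytes `00 01 0A 02 03` and the pattern `01 0A 02`, searching the whole buffer finds the pattern, while the line-wise search does not, because the `0A` byte is consumed as a line delimiter and the pattern ends up split across two lines.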

For handling only the first X bytes of a file, one can combine the head command and bgrep with a pipe:

head -c X my-large-file.bin | bgrep ...
