Memory allocation #6

aquac · 2020-12-18T07:42:14Z

The current version of the code seems to load the whole file into memory, which fails for large files.
For example, in my case I get the following error:
memory allocation of 5404574964 bytes failed

It would be great if the code would read the file sequentially / piece by piece.

aquac · 2020-12-18T07:51:09Z

A temporary solution to this, and maybe also a nice new feature, would be an option to parse only the first X bytes of a file and check those against the regex.

gahag · 2020-12-18T19:58:30Z

The traditional grep doesn't have this exact issue because its regexes are limited to lines, so it reads the file one line at a time. With bgrep, we shouldn't have such a restriction, because a binary pattern might contain the byte value of a line break even when it doesn't represent an actual textual line break. If I recall correctly, the Regex crate has no support for arbitrary buffered reading/matching, and that's why bgrep reads the entire file into memory. I believe the only feasible alternative for very large files is using memory maps (I know ripgrep can do this), and I'm willing to support implementing such a feature, even though it would be non-trivial and would probably require some unsafe code.

For handling only the first X bytes of a file, one can combine the head command and bgrep with a pipe:

head -c X my-large-file.bin | bgrep ...
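
If this were ever added as a built-in option, a minimal sketch using only the Rust standard library could look like the following. Note that `read_prefix` is a hypothetical helper name for illustration and is not part of bgrep; the sketch only shows how a reader can be capped at X bytes so the buffer never grows beyond that.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Reads at most `limit` bytes from the start of the file at `path`.
/// Hypothetical helper for illustration; not part of bgrep.
fn read_prefix(path: &Path, limit: u64) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let mut buf = Vec::new();
    // `take` caps the reader, so `read_to_end` stops after `limit` bytes
    // even when the file itself is much larger.
    file.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}
```

The returned buffer could then be handed to the existing matching code in place of the full file contents. If the file is shorter than `limit`, the whole file is returned.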
