I'm not sure if this is a feature request or just a request for clarification, but I'm looking for a canonical way to generate a WACZ file from multiple WARC files.
I am dealing with some web collections that span multiple WARCs but should be represented as a single WACZ. From the command line I can get this to work by putting all the WARC files in a single folder and running:
wacz create -o test.wacz -f warcs/*.warc
However, I have failed in multiple attempts to invoke this cleanly from within a Java wrapper. I've tried various combinations of escaping and quoting the parameters, to no avail. Either way, I assume this relies on either OS or Python expansion of the * wildcard, and it's not clear what would and would not be expected to work in terms of wildcards, regular expressions, etc.
What I'm looking for, ideally, is either for the -f parameter to be repeatable (in the way that -i is in ffmpeg) so that each file can be listed explicitly, or to be able to specify a -d parameter pointing to a directory explicitly expected to contain multiple WARC files. The directory option would probably need to let you specify which file extensions to consider, or should clearly document what happens when non-WARC content is found in the directory.
Sorry, missed this earlier! The -f warcs/*.warc form relies on shell expansion to fill in the file list. The -f flag already works as you are suggesting: it expects a list of filenames (relative to the current working directory, or absolute) after the -f param.
E.g. -f warcs/a.warc warcs/b.warc ... warcs/n.warc should work.
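Since no shell is involved when a process is launched from Java, the wildcard has to be expanded by the wrapper itself and each WARC passed as its own argument. A minimal sketch of that approach follows, assuming `wacz` is on the PATH and the WARCs sit in a `warcs` directory; the class name `WaczCommand` and its `build` helper are hypothetical names for illustration and not part of py-wacz.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: expands *.warc in Java and passes every file
// to -f as a separate argument, so no shell globbing or quoting is needed.
public class WaczCommand {

    // Builds: wacz create -o <output> -f <warc1> <warc2> ...
    static List<String> build(Path warcDir, Path output) throws IOException {
        List<String> warcs = new ArrayList<>();
        // The glob is matched by Java against each file name in warcDir.
        // Adjust the pattern if your files use another extension.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(warcDir, "*.warc")) {
            for (Path p : stream) {
                warcs.add(p.toAbsolutePath().toString());
            }
        }
        if (warcs.isEmpty()) {
            throw new IOException("No .warc files found in " + warcDir);
        }
        Collections.sort(warcs); // deterministic order across runs

        List<String> cmd = new ArrayList<>();
        cmd.add("wacz"); // assumption: the wacz executable is on the PATH
        cmd.add("create");
        cmd.add("-o");
        cmd.add(output.toString());
        cmd.add("-f");
        cmd.addAll(warcs);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build(Path.of("warcs"), Path.of("test.wacz"));
        // Each list element becomes exactly one argv entry of the child process.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Because `ProcessBuilder` hands each list element to the child process as a single argument, paths containing spaces need no extra escaping or quoting.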