py-wacz: Add a way to list pages in the WACZ #1

ikreymer · 2021-02-04T02:52:52Z

Something that could help with testing browsertrix-crawler is a way to list pages in the WACZ, maybe something like:

> wacz extract pages --url
https://example.com/

Not sure if it's generally useful outside of testing, but maybe? It could allow listing just the URL, or all the other fields.
Of course, easy enough to just unzip pages/pages.jsonl so not sure if we want this.
Just opening this to keep track.
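
For reference, the "just unzip it" route can be sketched in a few lines of Python. This is only a rough sketch, not py-wacz code: `list_pages` is a hypothetical helper name, and it assumes the page list lives at pages/pages.jsonl with one JSON object per line, where page records carry a `url` field and the leading header line does not.

```python
import json
import zipfile
from typing import Iterator


def list_pages(wacz_path: str, pages_file: str = "pages/pages.jsonl") -> Iterator[dict]:
    """Yield one dict per page record found in the WACZ page list.

    Hypothetical helper for illustration; not part of py-wacz.
    """
    with zipfile.ZipFile(wacz_path) as wacz:
        with wacz.open(pages_file) as lines:
            for raw in lines:
                raw = raw.strip()
                if not raw:
                    continue  # ignore blank lines
                record = json.loads(raw)
                # Assumption: the header line has no "url" key, page records do.
                if "url" in record:
                    yield record
```

Printing `page["url"]` for each `page` in `list_pages("crawl.wacz")` would give roughly what the proposed `--url` option shows; printing the whole record would cover the "all the other fields" case.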

(Extracting URLs from index could be another feature later)

ikreymer transferred this issue from webrecorder/specs Nov 4, 2021