On a page where extra content is loaded on scrolling down, the 'first' block that loads is captured and displayed in replay, but the links in this block aren't crawled. The second block that loads is also captured and displays in replay.
Ran into this on a 42k-page crawl.
Tested to confirm with only this URL and `--scopeType any --depth 1`. Adding `--postLoadDelay 4` doesn't solve the issue.
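For context, a minimal reproduction command along the lines described might look like the sketch below. Only the flags come from the report; the Docker invocation, image tag, and URL are placeholders, not taken from the original crawl.

```shell
# Hypothetical reproduction sketch: the URL and image tag are placeholders.
# --postLoadDelay waits the given number of seconds after page load.
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url "https://example.com/infinite-scroll-page" \
  --scopeType any \
  --depth 1 \
  --postLoadDelay 4 \
  --generateWACZ
```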
Will transfer the WACZ to info at webrecorder.
Ah, link extraction currently happens before autoscrolling.
However, the new autoclick behavior, available in the 1.5.0 beta, might be able to address that.
I actually ran a crawl this weekend on a site to be archived and hit the same issue.
I've tested with v1.5.0-beta.2, and there is a difference: only the "first view" (without scrolling down and loading extra content) is captured, so all captured content is clickable. But not all of the content that loads when the page is scrolled to the end is captured.
The links are actually plain `<a href="...">` elements; autoclick is enabled by default, so do I just have to add `--selectLinks 'a[href]->href'`? (command-line browsertrix-crawler)
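For reference, enabling autoclick together with an explicit link selector might look like the following sketch. Note that `a[href]->href` is the crawler's default `--selectLinks` value anyway; the behavior list, image tag, and URL here are assumptions, not verified against the 1.5.0 beta.

```shell
# Hypothetical sketch: add the autoclick behavior (new in the 1.5.0 beta)
# to the default behaviors and state the link selector explicitly.
# The image tag and URL are placeholders.
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:1.5.0-beta.2 crawl \
  --url "https://example.com/infinite-scroll-page" \
  --behaviors autoscroll,autoplay,autofetch,siteSpecific,autoclick \
  --selectLinks 'a[href]->href' \
  --generateWACZ
```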
(The workaround for now: manually load the complete page, collect all links, and put them in a seed file.)
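That manual workaround can be sketched as a small script: given the HTML of the fully scrolled page saved to disk, collect every `a[href]` value into a list of seeds. The function names and the example markup below are hypothetical; only the idea (extract links, feed them to the crawler as seeds) comes from the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags in document order."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's base URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_seeds(html, base_url):
    """Return the unique links found in html, preserving first-seen order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    seen, seeds = set(), []
    for link in parser.links:
        if link not in seen:
            seen.add(link)
            seeds.append(link)
    return seeds

if __name__ == "__main__":
    # Placeholder content standing in for the fully scrolled page's HTML.
    html = '<a href="/page1">1</a><a href="https://example.com/page2">2</a>'
    for url in extract_seeds(html, "https://example.com/"):
        print(url)
```

The printed URLs can then be written one per line into a seed file for the next crawl.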