I am using Stagehand for browser automation. While it is an excellent tool for automation, I have encountered a performance bottleneck when crawling sites repeatedly. Here's the current challenge and a request for a potential improvement:
Current Behavior:
Every time I hit a site for crawling, Stagehand generates XPaths by sending screenshots and HTML to LLMs. This process is time-consuming, especially when crawling the same site multiple times in parallel.
Feature Request:
Save XPaths after the first execution:
On the first run, Stagehand should save the successfully executed XPaths in a local cache or database.
These XPaths can then be reused on subsequent runs without needing to reprocess the page.
Fallback for outdated XPaths:
If a saved XPath fails during execution, Stagehand should fall back to the current behavior: sending the screenshot and HTML to the LLM to generate a new XPath.
The newly generated XPath should replace the outdated one in the cache.
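To make the requested flow concrete, here is a minimal sketch of the save, reuse, and fall-back steps. Everything in it is an assumption for illustration: `resolveXPath`, `GenerateXPath`, and `TryXPath` are hypothetical names, not part of Stagehand's API, and the two callbacks stand in for the LLM call and for executing an action in the browser.

```typescript
// Hypothetical sketch: none of these names come from Stagehand's API.

// Stand-in for the expensive LLM call that produces an XPath for an instruction.
type GenerateXPath = (instruction: string) => Promise<string>;
// Stand-in for running the action; resolves to false if the XPath no longer works.
type TryXPath = (xpath: string) => Promise<boolean>;

async function resolveXPath(
  cache: Map<string, string>,
  url: string,
  instruction: string,
  generate: GenerateXPath,
  tryXPath: TryXPath
): Promise<string> {
  const key = `${url}\n${instruction}`;
  const cached = cache.get(key);

  // Fast path: a saved XPath exists and still works, so the LLM is skipped.
  if (cached !== undefined && (await tryXPath(cached))) {
    return cached;
  }

  // Slow path: no saved XPath, or the saved one is outdated. Regenerate it.
  const fresh = await generate(instruction);
  if (!(await tryXPath(fresh))) {
    cache.delete(key); // do not keep an entry that is known to be broken
    throw new Error(`No working XPath for "${instruction}" on ${url}`);
  }

  // Replace the outdated entry (or create the first one).
  cache.set(key, fresh);
  return fresh;
}
```

With this shape, the LLM is only called on the first run for a given URL and instruction, or after a saved XPath stops working.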
Benefits:
Performance Boost: This approach will significantly reduce the overhead of repeatedly crawling the same site.
Efficiency: Ensures that only failed or outdated XPaths are recalculated, optimizing resource usage.
Durability: Maintains Stagehand's self-healing code generation capabilities without compromising speed.
Suggestion:
It would be great to have an interface or configuration option to enable and manage this feature, such as viewing or clearing cached XPaths when needed.
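As a rough idea of what such an interface could look like, the sketch below keeps entries in memory and supports enabling, listing, and clearing them per site. The class `XPathCacheManager` and all of its members are hypothetical names chosen for this example; Stagehand does not necessarily expose anything like it.

```typescript
// Hypothetical sketch: an illustrative management interface, not a Stagehand API.
interface CachedXPath {
  url: string;
  instruction: string;
  xpath: string;
  savedAt: number; // milliseconds since the Unix epoch
}

class XPathCacheManager {
  enabled: boolean;
  private entries: CachedXPath[] = [];

  constructor(enabled: boolean = true) {
    this.enabled = enabled;
  }

  // Store an XPath, replacing any earlier entry for the same URL and instruction.
  save(url: string, instruction: string, xpath: string): void {
    if (!this.enabled) return;
    this.entries = this.entries.filter(
      (e) => !(e.url === url && e.instruction === instruction)
    );
    this.entries.push({ url, instruction, xpath, savedAt: Date.now() });
  }

  // View cached entries, optionally limited to URLs that start with a prefix.
  list(urlPrefix: string = ""): CachedXPath[] {
    return this.entries.filter((e) => e.url.startsWith(urlPrefix));
  }

  // Clear cached entries for a prefix (or everything) and report how many were removed.
  clear(urlPrefix: string = ""): number {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => !e.url.startsWith(urlPrefix));
    return before - this.entries.length;
  }
}
```

For example, `manager.clear("https://shop.example")` would drop only the entries saved for that site, while `manager.clear()` would empty the cache.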
Thank you for considering this request! It would greatly enhance the usability and efficiency of Stagehand for repetitive crawling workflows.
We should do a better job of documenting this, but we do have a cache already. It's only a cache hit if both the prompt AND the DOM are unchanged. Have you tried this yet?
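The hit condition described here (prompt and DOM both unchanged) can be pictured as a cache keyed on both inputs. The sketch below illustrates that idea only; it is not Stagehand's actual cache implementation, and `hashText`, `cacheKey`, and `lookup` are hypothetical names. The hash is a small non-cryptographic one used purely to keep keys short.

```typescript
// Hypothetical sketch: illustrates a prompt + DOM cache key, not Stagehand's real code.

// FNV-1a style hash over UTF-16 code units (non-cryptographic).
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// The key depends on both inputs, so changing either one yields a different key.
function cacheKey(prompt: string, dom: string): string {
  return hashText(prompt + "\u0000" + dom);
}

const responses = new Map<string, string>();

// Returns the cached result only when the prompt AND the DOM match a stored entry.
function lookup(prompt: string, dom: string): string | undefined {
  return responses.get(cacheKey(prompt, dom));
}

// Store a result for one prompt/DOM pair.
responses.set(cacheKey("click item 1", "<li>1</li>"), "//li[1]");
```

Under a scheme like this, any change in the page's HTML between runs (a timestamp, a rotating ad, a session token) would produce a cache miss, which may be relevant when the same site is crawled repeatedly.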