Request: Add XPath Caching and Reuse in Stagehand #404

Open
Amirsohail007 opened this issue Jan 14, 2025 · 1 comment

@Amirsohail007

I am using Stagehand for browser automation. While it is an excellent tool for automation, I have encountered a performance bottleneck when crawling sites repeatedly. Here's the current challenge and a request for a potential improvement:

Current Behavior:
Every time I hit a site for crawling, Stagehand generates XPaths by sending screenshots and HTML to LLMs. This process is time-consuming, especially when crawling the same site multiple times in parallel.

Feature Request:

  1. Save XPaths after the first execution:

    • On the first run, Stagehand should save the successfully executed XPaths in a local cache or database.
    • These XPaths can then be reused on subsequent runs without needing to reprocess the page.
  2. Fallback for outdated XPaths:

    • If a saved XPath fails during execution, Stagehand should fall back to the current behavior: sending the screenshot and HTML to the LLMs to generate a new XPath.
    • The newly generated XPath should replace the outdated one in the cache.
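
The two steps above can be sketched roughly as follows. This is only an illustration of the requested control flow, not Stagehand's API: `XPathCache`, `actWithCache`, `execute`, and `generate` are hypothetical names, with `generate` standing in for the LLM call and `execute` for running an XPath against the page.

```typescript
// Hypothetical sketch: none of these names exist in Stagehand.
// `generate` stands in for the LLM round trip (screenshot + HTML -> XPath).
type XPathGenerator = (instruction: string) => Promise<string>;
// `execute` runs an XPath against the page and resolves to true on success.
type XPathExecutor = (xpath: string) => Promise<boolean>;

class XPathCache {
  private store = new Map<string, string>();

  get(key: string): string | undefined {
    return this.store.get(key);
  }

  set(key: string, xpath: string): void {
    this.store.set(key, xpath);
  }
}

async function actWithCache(
  cache: XPathCache,
  url: string,
  instruction: string,
  execute: XPathExecutor,
  generate: XPathGenerator,
): Promise<{ xpath: string; source: "cache" | "llm" }> {
  const key = `${url}::${instruction}`;

  // Step 1: reuse a saved XPath if one exists and it still works.
  const cached = cache.get(key);
  if (cached !== undefined && (await execute(cached))) {
    return { xpath: cached, source: "cache" };
  }

  // Step 2: cache miss or outdated XPath, so fall back to the LLM.
  const fresh = await generate(instruction);
  if (!(await execute(fresh))) {
    throw new Error(`Generated XPath failed for: ${instruction}`);
  }

  // Replace the outdated entry (or create a new one).
  cache.set(key, fresh);
  return { xpath: fresh, source: "llm" };
}
```

With this shape, the LLM is only consulted on the first run for a given URL and instruction, or when the saved XPath no longer executes successfully.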

Benefits:

  • Performance Boost: This approach will significantly reduce the overhead of repeatedly crawling the same site.
  • Efficiency: Ensures that only failed or outdated XPaths are recalculated, optimizing resource usage.
  • Durability: Maintains Stagehand's self-healing code generation capabilities without compromising speed.

Suggestion:
It would be great to have an interface or configuration option to enable and manage this feature, such as viewing or clearing cached XPaths when needed.

Thank you for considering this request! It would greatly enhance the usability and efficiency of Stagehand for repetitive crawling workflows.

@kamath
Contributor

kamath commented Jan 18, 2025

Thanks so much for the detailed write-up!

We should do a better job documenting this, but we do have a cache already. It's only a cache hit if both the prompt AND the DOM are unchanged. Have you tried this yet?

For more details, check out #255
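
To illustrate the cache-hit condition described above (prompt and DOM both unchanged), here is a minimal sketch. It is not Stagehand's actual cache implementation: `fnv1a`, `cacheKey`, and `PromptDomCache` are hypothetical names, and the small non-cryptographic hash is used only to keep the example self-contained (a real implementation would more likely use something like SHA-256).

```typescript
// Hypothetical sketch: not Stagehand's real cache code.
// FNV-1a-style 32-bit hash over UTF-16 code units (non-cryptographic).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// The key depends on BOTH inputs; JSON.stringify on a pair keeps the
// boundary between prompt and DOM unambiguous before hashing.
function cacheKey(prompt: string, dom: string): string {
  return fnv1a(JSON.stringify([prompt, dom]));
}

class PromptDomCache<T> {
  private store = new Map<string, T>();

  // Returns the cached value only if prompt and DOM match a saved entry.
  lookup(prompt: string, dom: string): T | undefined {
    return this.store.get(cacheKey(prompt, dom));
  }

  save(prompt: string, dom: string, value: T): void {
    this.store.set(cacheKey(prompt, dom), value);
  }
}
```

Under this scheme any change to the page's DOM produces a different key, so a site whose markup changes between runs would miss the cache even if the target element's XPath is still valid. That is the gap the original request is aimed at.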
