
Crawler is blocked if some User-agents are disallowed #39

Open

froozeify opened this issue Dec 27, 2024 · 2 comments

Comments

@froozeify

In my robots.txt I have directives dedicated to disallowing crawling for some specific User-agents:

```
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /core/*.css$
....
```

The SEO tab explains that it's due to the robots.txt configuration:
[screenshot of the SEO tab]

If I remove the Bytespider directive, the site is crawled correctly.

Would it be possible to have a setting to ignore specific User-agent rules (or even to ignore the robots.txt file entirely)?
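
For reference, a minimal sketch of the per-group matching a crawler is expected to do, using Python's urllib.robotparser and a simplified version of the robots.txt above (that parser treats wildcard rules like /core/*.css$ literally, so they're omitted here; "MyCrawler" is just a placeholder agent name):

```python
from urllib.robotparser import RobotFileParser

# Simplified robots.txt: a group that blocks one specific agent,
# plus a catch-all group that allows everyone else.
robots_txt = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Each group should apply only to the User-agent it names:
print(parser.can_fetch("Bytespider", "/core/app.css"))  # False: blocked by its own group
print(parser.can_fetch("MyCrawler", "/core/app.css"))   # True: falls back to the "*" group
```

A crawler that reports "/core/app.css" as disallowed for an agent other than Bytespider is applying the wrong group, which matches the behavior described in this issue.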

@janreges
Owner

Hi @froozeify,

This bug in robots.txt processing with multiple User-agents has already been fixed in 9c2c989 and will be published in the next version in the next few days.

You can also use the --ignore-robots-txt flag. It is listed in the README.md but missing from the documentation website; it will be added when the next version is released.
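
For anyone hitting this before the fix ships, the flag is passed on the command line, e.g. something like `crawler --url=https://example.com/ --ignore-robots-txt` (the `--url` parameter name here is an assumption based on the README examples; only `--ignore-robots-txt` is confirmed above). Note this skips robots.txt entirely rather than ignoring a single User-agent group.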

@froozeify
Author

froozeify commented Dec 28, 2024

Thank you @janreges for your answer. Do you know if there will be a checkbox for this in the GUI version? (Or maybe I just didn't find it 😃)
