
Crawler is blocked if some User-agents are disallowed #39

Open

froozeify opened this issue Dec 27, 2024 · 2 comments

Comments

@froozeify

In my robots.txt I have directives dedicated to disallowing crawling for some specific User-agents:

```
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /core/*.css$
....
```

The SEO tab explains that it's due to the robots.txt configuration:
[screenshot of the SEO tab]

If I remove the Bytespider directive, the site is crawled correctly.

Would it be possible to have a setting to ignore specific User-agent rules (or even to ignore the robots.txt file entirely)?
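
For reference, a minimal sketch of the per-group matching a crawler is expected to do, using Python's urllib.robotparser and a simplified version of the robots.txt above (that parser treats wildcard rules like /core/*.css$ literally, so they're omitted here; "MyCrawler" is just a placeholder agent name):

```python
from urllib.robotparser import RobotFileParser

# Simplified robots.txt: a group that blocks one specific agent,
# plus a catch-all group that allows everyone else.
robots_txt = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Each group should apply only to the User-agent it names:
print(parser.can_fetch("Bytespider", "/core/app.css"))  # False: blocked by its own group
print(parser.can_fetch("MyCrawler", "/core/app.css"))   # True: falls back to the "*" group
```

A crawler that reports "/core/app.css" as disallowed for an agent other than Bytespider is applying the wrong group, which matches the behavior described in this issue.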

@janreges
Owner

Hi @froozeify,

This bug in robots.txt processing with multiple User-agents has already been fixed in 9c2c989 and will be published in the next version in the next few days.

You can also use the --ignore-robots-txt flag. It is listed in the README.md but missing from the documentation website; it will be added when the next version is released.
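
For anyone hitting this before the fix ships, the flag is passed on the command line, e.g. something like `crawler --url=https://example.com/ --ignore-robots-txt` (the `--url` parameter name here is an assumption based on the README examples; only `--ignore-robots-txt` is confirmed above). Note this skips robots.txt entirely rather than ignoring a single User-agent group.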

@froozeify
Author

froozeify commented Dec 28, 2024

Thank you @janreges for your answer. Do you know if there will be a checkbox for this in the GUI version? (Or maybe I just didn't find it 😃)
