Amazon blocking and retrying failed requests #110

Open
BigDatalex opened this issue Dec 12, 2022 · 0 comments

Labels
enhancement New feature or request

@BigDatalex
Contributor

There are two ways we notice that Amazon is blocking our requests:

  1. The response has a 200 status code, but the page content asks us to solve a puzzle to prove that we are human.
  2. The response has a 503 status code.
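A minimal sketch of how the two cases could be told apart, assuming we only have the status code and the decoded body. The marker strings in `CAPTCHA_MARKERS` and the names `BlockStatus` and `classify_response` are hypothetical; the real wording of Amazon's puzzle page would need to be checked against an actual blocked response.

```python
from enum import Enum

# Hypothetical marker strings: the exact text of Amazon's puzzle page is an
# assumption and should be verified against a real blocked response.
CAPTCHA_MARKERS = ("captcha", "enter the characters you see below")


class BlockStatus(Enum):
    OK = "ok"
    CAPTCHA = "captcha"          # case 1: 200 status, but a puzzle page
    UNAVAILABLE = "unavailable"  # case 2: 503 status
    OTHER_ERROR = "other_error"  # anything else, not covered by this issue


def classify_response(status: int, body: str) -> BlockStatus:
    """Classify a response according to the two blocking patterns."""
    if status == 503:
        return BlockStatus.UNAVAILABLE
    if status == 200:
        text = body.lower()
        # Heuristic substring check; may give false positives on normal pages.
        if any(marker in text for marker in CAPTCHA_MARKERS):
            return BlockStatus.CAPTCHA
        return BlockStatus.OK
    return BlockStatus.OTHER_ERROR
```

The substring check is deliberately simple; a stricter check (for example, looking for the expected product or search-result elements instead) might be more robust.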

The first case usually occurs before the 503 errors start. While working on the latest PR #109, I noticed that the blocking is temporary rather than permanent: after several 503 errors for some start URLs, we receive 200 responses again that include the expected page content.
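Because the blocking seems to lift on its own, one option would be to wait progressively longer between retries. A minimal sketch of exponential backoff, where `retry_delay`, the 60-second `base`, and the one-hour `cap` are hypothetical values chosen for illustration, not anything measured in PR #109:

```python
def retry_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff: base * 2**attempt, capped at `cap`.
    The defaults are hypothetical and would need tuning.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))
```

With the defaults, the first seven delays are 60, 120, 240, 480, 960, 1920, and 3600 seconds (the seventh would be 3840 but is capped).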

In PR #100, the Retry Middleware was disabled for Amazon because it interferes with the custom AmazonSchedulerMiddleware. It would be great to add a retry middleware back in that retries failed requests at a later point. The easiest approach might be to reschedule only the requests that returned 503 errors; the other requests would need additional processing to detect the unexpected page content.
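A framework-agnostic sketch of the "only reschedule 503s" idea, assuming we track failed URLs ourselves instead of relying on the built-in Retry Middleware. `DeferredRetryQueue`, `record_failure`, `pop_ready`, and `needs_content_check` are hypothetical names; how this would be wired into a Scrapy downloader middleware's `process_response`, and how it would coexist with AmazonSchedulerMiddleware, is not covered here.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingRetry:
    ready_at: float                   # heap is ordered by this timestamp only
    url: str = field(compare=False)
    attempt: int = field(compare=False)


class DeferredRetryQueue:
    """Hypothetical helper: holds 503-failed URLs until they may be retried."""

    def __init__(self, base_delay: float = 60.0, max_attempts: int = 5):
        self.base_delay = base_delay
        self.max_attempts = max_attempts
        self._heap: list[PendingRetry] = []
        self._attempts: dict[str, int] = {}
        # URLs that failed in another way (e.g. 200 with a puzzle page);
        # these need a separate content check and are not retried here.
        self.needs_content_check: list[str] = []

    def record_failure(self, url: str, status: int, now: float) -> bool:
        """Queue `url` for a later retry. Returns True if it was queued."""
        if status != 503:
            self.needs_content_check.append(url)
            return False
        attempt = self._attempts.get(url, 0)
        if attempt >= self.max_attempts:
            return False  # give up on this URL
        self._attempts[url] = attempt + 1
        delay = self.base_delay * (2 ** attempt)  # exponential backoff
        heapq.heappush(self._heap, PendingRetry(now + delay, url, attempt + 1))
        return True

    def pop_ready(self, now: float) -> list[str]:
        """Remove and return every URL whose retry time has arrived."""
        ready = []
        while self._heap and self._heap[0].ready_at <= now:
            ready.append(heapq.heappop(self._heap).url)
        return ready
```

For example, with `base_delay=60`, a URL that gets a 503 at time 0 becomes ready at 60; if it fails again at 60, it becomes ready at 180. Keeping the retry state separate from the scheduler might make it easier to avoid the interference seen in PR #100, but that is untested.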

@BigDatalex BigDatalex added the enhancement New feature or request label Dec 12, 2022