ForecastSolar can't recover when API rate limit is hit #106771

Closed
hmmbob opened this issue Dec 31, 2023 · 37 comments

Comments

@hmmbob
Contributor

hmmbob commented Dec 31, 2023

The problem

I've been rebooting my system quite a few times, and apparently I've been rate-limited by ForecastSolar. The errors are now filling up my log 😄 They appear about every 90 seconds.

What version of Home Assistant Core has the issue?

core-2024.1.0b2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Forecast Solar

Link to integration documentation on our website

https://www.home-assistant.io/integrations/forecast_solar/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-12-31 12:18:18.272 ERROR (MainThread) [homeassistant.components.forecast_solar] Unexpected error fetching forecast_solar data: Rate limit for API calls reached. (error 429)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/forecast_solar/coordinator.py", line 67, in _async_update_data
    return await self.forecast.estimate()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/forecast_solar/__init__.py", line 156, in estimate
    data = await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/forecast_solar/__init__.py", line 125, in _request
    raise ForecastSolarRatelimit(data["message"])
forecast_solar.exceptions.ForecastSolarRatelimit: Rate limit for API calls reached. (error 429)

Additional information

No response

@home-assistant

Hey there @klaasnicolaas, @frenck, mind taking a look at this issue as it has been labeled with an integration (forecast_solar) you are listed as a code owner for? Thanks!


@hmmbob changed the title from "ForecastSolar throws stack race when API rate limit is hit" to "ForecastSolar throws trace when API rate limit is hit" on Dec 31, 2023
@klaasnicolaas
Member

So what is the issue? 🤷🏻

@hmmbob
Contributor Author

hmmbob commented Dec 31, 2023

As discussed on Discord, it would be great if this error were caught and logged as a single line instead of a full stack trace.

@frenck added this to the 2024.1.0 milestone on Jan 2, 2024
@havlejan

I have the same issue after updating Home Assistant to version 2024.1.5. I updated on the evening of Saturday, 20 January. Since then, the Forecast.Solar integration has reported an error.

@jgibson02

I have also been running into this error over the past week. Twice I've tried disabling the integration for at least a day and re-enabling it, and it still hits this error. I also tried signing up for a trial of the Personal tier license key and adding that to my integration, but that didn't work either.

@klaasnicolaas removed this from the 2024.1.0 milestone on Jan 24, 2024
@klaasnicolaas
Member

I did some research to get more clarity about where things go wrong and especially how the problem can persist for days.

By default, if you make more than 12 requests to the API in an hour, you get a rate limit error. This can be caused, for example, by many restarts of Home Assistant. What I noticed is that after a rate limit exception an error appears in the logs every minute, so the integration apparently tries to send a request to the API every minute.

As a result, the reset time keeps being pushed forward (a rolling reset time). That leaves you in limbo: the problem does not resolve itself after waiting an hour and can last for days.
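
A minimal sketch of why a rolling reset never clears under frequent retries. The one-hour block length, the `first_success` helper, and the model in which every rejected call restarts the block are assumptions for illustration, not documented behaviour of the Forecast.Solar API:

```python
from datetime import datetime, timedelta
from typing import Optional

BLOCK = timedelta(hours=1)  # assumed block length after a rejected call


def first_success(retry_every: timedelta, horizon: timedelta) -> Optional[timedelta]:
    """Return the elapsed time of the first accepted call, or None if no
    call is accepted within the horizon.

    Assumes a rolling block: each call made while blocked restarts the block.
    """
    start = datetime(2024, 1, 1)
    blocked_until = start + BLOCK  # the limit was just hit at `start`
    now = start
    while now - start <= horizon:
        now += retry_every
        if now >= blocked_until:
            return now - start
        blocked_until = now + BLOCK  # the rejected call pushes the reset forward
    return None


print(first_success(timedelta(seconds=90), timedelta(days=7)))  # None
print(first_success(timedelta(hours=1), timedelta(days=7)))     # 1:00:00
```

Under these assumptions a client that retries every 90 seconds is still blocked after a week, while one that stays quiet for the full hour recovers on its first call.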

@asciidisco

I'm running into the problem as well, with an anonymous account (so no API key). From the code it's quite clear that it should only update hourly when no API key is set. I believe the problem for me (and maybe for others as well) is that if the API call fails, Home Assistant tries to re-initialize the integration after 1 minute (or 90 seconds, which I believe is more accurate), receives the same error, tries again a minute later, and so on, pushing the next possible successful call further into the future.

To come full circle on the issue, it would probably be best not to let the integration go into an erroneous state on this error, so that the integration itself can handle the interval without being re-initialized over and over.

@chinezbrun

This comment was marked as duplicate.

@klaasnicolaas
Member

Just stating that "you also have the issue" does not help solve the problem and only pollutes the thread, so please don't do that. If you would like to stay informed, you will find a subscribe button on the right of the sidebar and you will receive notifications 😉

./Klaas

@EinSchwerd

I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.

@iancg

iancg commented Feb 9, 2024

> I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.

I've had the same. I wonder whether failed accesses are counted against you as a tally. For example, I had this happening for 24+ hours before I noticed, so I would have accumulated 3 * 24 * 40 = 2880 rejected requests, which at 12 calls per IP per hour is going to take 10 days to clear ;-( Equally, it could just be a bug in the rate limiting at Forecast.Solar.
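
The estimate above can be reproduced as a short calculation. This sketch assumes the 40 stands for one retry every 90 seconds (the interval reported at the top of the thread); the meaning of the leading factor 3 is not stated in the comment, so it is carried through unchanged:

```python
RETRY_SECONDS = 90      # retry interval reported in this thread (assumed source of the 40)
ALLOWED_PER_HOUR = 12   # free-tier limit per IP per hour
FACTOR = 3              # leading factor in the estimate; its meaning is not stated
HOURS_BLOCKED = 24

rejected_per_hour = 3600 // RETRY_SECONDS
rejected_total = FACTOR * HOURS_BLOCKED * rejected_per_hour
hours_to_clear = rejected_total / ALLOWED_PER_HOUR

print(rejected_per_hour, rejected_total, hours_to_clear / 24)  # 40 2880 10.0
```

This only holds if rejected calls really are tallied against the quota, which the thread does not confirm.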

@K-Ko

K-Ko commented Feb 9, 2024

Knut here. One way would be to check responses with HTTP code 429 for the "retry at" value, either in the response body

image

or the headers

image

This should be stored somewhere and checked before next call.
(This does not even have to be deleted, as every call in the far future will always be after this timestamp)

The zone states the reason/scope of the limit (IP ... or API key ...), e.g. for logging.

@iancg

iancg commented Feb 9, 2024

https://github.com/home-assistant-libs/forecast_solar/blob/master/forecast_solar/exceptions.py shows that the ForecastSolarRatelimit exception being thrown includes reset_at

Looking at https://github.com/home-assistant/core/blob/dev/homeassistant/components/forecast_solar/coordinator.py around line 67, it needs to catch ForecastSolarRatelimit and adjust the time at which the next retry can be done.

Further looking at https://github.com/home-assistant/core/blob/dev/homeassistant/helpers/update_coordinator.py I can see that there is update_interval which controls the frequency of the polls, but next_refresh isn't available so I can't see how to make the update coordinator delay until the desired time - setting the update interval only affects the refresh after the next.

@hmmbob changed the title from "ForecastSolar throws trace when API rate limit is hit" to "ForecastSolar can't recover when API rate limit is hit" on Feb 9, 2024
@iancg

iancg commented Feb 9, 2024

Maybe something like: 4364b17 (totally untested) might work?

I've tried but failed to get my local HA to load the revised code as a custom component. It looks like I may need to set up a proper HA dev environment to try this (I've only ever made very minor changes to HACS-installed custom components before).

@Dutchy-79

Same here,

Logger: homeassistant.components.forecast_solar
Source: helpers/update_coordinator.py:313
Integration: Forecast.Solar (documentation, issues)
First occurred: February 10, 2024 at 10:55:29 (4544 occurrences)
Last logged: 12:12:20

Unexpected error fetching forecast_solar data: Rate limit for API calls reached. (error 429)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 313, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/forecast_solar/coordinator.py", line 67, in _async_update_data
    return await self.forecast.estimate()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/forecast_solar/__init__.py", line 156, in estimate
    data = await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/forecast_solar/__init__.py", line 125, in _request
    raise ForecastSolarRatelimit(data["message"])
forecast_solar.exceptions.ForecastSolarRatelimit: Rate limit for API calls reached. (error 429)

Not sure what you need from me to help you solve this.

@ol3k

ol3k commented Feb 15, 2024

> > I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.
>
> I've had the same. I wonder whether failed accesses are counted against you as a tally. For example, I had this happening for 24+ hours before I noticed, so I would have accumulated 3 * 24 * 40 = 2880 rejected requests, which at 12 calls per IP per hour is going to take 10 days to clear ;-( Equally, it could just be a bug in the rate limiting at Forecast.Solar.

I hit the rate limit too. Because of the internal integration reloads, it never recovered.
I cannot confirm your calculation: my test calls showed that the rate limit would end about 1–2 hours in the future.

I got it working again:

  1. I disabled the integration to stop the API calls.
  2. Test calls: with an example API call, you can check when you are allowed to call again:
     curl -v -H 'Accept: text/csv' 'https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67'
     As already mentioned above, the response tells you when you are allowed to try next, what the limits are, and how many calls are currently registered:
       - x-ratelimit-period
       - x-ratelimit-limit
       - x-retry-at
  3. Wait until the curl call returns data again, for example:
     2024-02-15;6707
     2024-02-16;12691
  4. Enable the integration: I enabled my entries one by one because I have several planes/strings. Re-enabling everything at once could trigger the limit again.
  5. The integration is working again.

Anyway: the ForecastSolarRatelimit exception (HTTP code 429) should not be treated as an integration fault. Maybe just log the unsuccessful call with a hint that the data is stale, but without reloading or immediately starting a new attempt.

For now, this behavior leads to never-ending reloads and API rate limiting.
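
The three headers listed in the steps above could be read programmatically along these lines. This is only a sketch: the `parse_rate_limit` helper is hypothetical, the sample values are invented rather than captured from a real response, and the assumption that `x-retry-at` is an ISO 8601 timestamp should be checked against an actual 429 response:

```python
from datetime import datetime
from typing import Mapping, Optional


def parse_rate_limit(headers: Mapping[str, str]) -> dict:
    """Extract the rate-limit headers named above, case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None else None

    retry_at = lowered.get("x-retry-at")
    return {
        "period": as_int("x-ratelimit-period"),
        "limit": as_int("x-ratelimit-limit"),
        # Assumption: the timestamp is ISO 8601; adjust if the API differs.
        "retry_at": datetime.fromisoformat(retry_at) if retry_at else None,
    }


# Hypothetical sample values, not captured from a real response.
sample = {
    "X-Ratelimit-Period": "3600",
    "X-Ratelimit-Limit": "12",
    "X-Retry-At": "2024-02-15T13:00:00+01:00",
}
info = parse_rate_limit(sample)
print(info["limit"], info["period"], info["retry_at"].isoformat())
```

Missing headers come back as `None`, so a caller can tell "no limit information" apart from a parsed value.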

@klaasnicolaas
Member

One idea to solve this is to set update_interval to the time delta until the reset when this exception is received, and to restore the original update_interval after the next successful API call.

We have already done some testing with this, but ran into problems with the coordinator that we have not yet been able to figure out or solve.
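
The interval idea can be sketched independently of Home Assistant. The `RefreshInterval` class and the 30-second margin below are hypothetical, and the sketch deliberately leaves out the coordinator integration, which is where the problems mentioned above arise:

```python
from datetime import datetime, timedelta, timezone


class RefreshInterval:
    """Tracks the polling interval across rate-limit and success events."""

    def __init__(self, normal: timedelta) -> None:
        self.normal = normal
        self.current = normal

    def on_rate_limited(self, reset_at: datetime, now: datetime) -> None:
        # Wait until the reset, plus a small margin so we do not poll early.
        remaining = max(reset_at - now, timedelta(0))
        self.current = remaining + timedelta(seconds=30)

    def on_success(self) -> None:
        # A successful call means the limit has cleared: go back to normal.
        self.current = self.normal


now = datetime(2024, 2, 9, 12, 0, tzinfo=timezone.utc)
interval = RefreshInterval(normal=timedelta(hours=1))
interval.on_rate_limited(reset_at=now + timedelta(minutes=45), now=now)
print(interval.current)  # 0:45:30
interval.on_success()
print(interval.current)  # 1:00:00
```

Clamping the remaining time at zero keeps the interval positive even if the reported reset time is already in the past.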

@menloperk

menloperk commented Feb 21, 2024

Not sure if it's the same error, but this integration has not been working for some time now!
This is in the logs:
image

Update: disabling the integration for at least an hour and then reenabling makes the integration work again. So indeed it seems it just can't recover when API limit is hit.

@AIR-Force007

AIR-Force007 commented Feb 26, 2024

I have run into the same scenario as above; in my case the error was quite quick to resolve.

  1. Disable the solar integration.
  2. Run the command: curl -v -H 'Accept: text/csv' 'https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67'
  3. I noticed that the limit is tied to the IP address of the WAN side.
  4. So I just reset the internet connection to get a new public address, enabled the integration, and all was well.
  5. With a static IP address it may be much more difficult to get a new IP. You can use a VPN to change the device's outgoing IP, or just block the device for an hour or so, but this step is a bit annoying to do.

I do agree that the integration should have a waiting period, or check past requests and not request more than required. That would help to minimise the requests, as mentioned above.

@thitcher

> Not sure if it's same error but this integration is not working anymore for some time now! This is in the logs: image
>
> Update: disabling the integration for at least an hour and then reenabling makes the integration work again. So indeed it seems it just can't recover when API limit is hit.

Same here,
blocked HA for one hour from internet, reconnected again...voila!

@bertybassett

bertybassett commented Feb 26, 2024

A brand new install was working for 25 minutes; then I made a single change to the configuration and now I get 429 errors.

Point to note: I made no reboots throughout, so why should I hit the API limit?

I will block it from the internet for an hour, but that doesn't seem right.

@thitcher

After I reconnected to the internet, I waited another 1-2 hours (to be on the safe side) and then changed a few values. That worked.

@jwdeboer

This is the 'formal' response from forecast.solar

In such cases you should

  • Deactivate/uninstall the integration
  • Wait at least 60 minutes
  • Check your parameters and preferably via direct API call (e.g. via curl)
  • If the direct API call works, reactivate/reinstall the integration

https://doc.forecast.solar/facing429

@ciaocibai

> brand new install was working for 25 minutes then I made a single change to the configuration and now I get 429 errors.
>
> Point to note I made no reboots throughout so why should I hit the API limit?
>
> will block from internet for an hour but that doesn't seem right.

I just did a brand new install and was given the same error from the get go, no idea why that would be. I've reinstalled the plugin, waited one hour and still the same issue.

@jwdeboer

jwdeboer commented Mar 9, 2024

> I just did a brand new install and was given the same error from the get go, no idea why that would be. I've reinstalled the plugin, waited one hour and still the same issue.

Try this API call (from a desktop, phone, etc.): https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67

It should return a 429 response, and at the bottom you will see a timestamp that states when you may try again.

Make sure you do not make any API call until that moment.

After that moment, try again via the browser. When you get a response that includes forecast numbers, you should be good to go and can enable the integration again.

Every time you make an API call during your block window, the block is extended.

@bj00rn

bj00rn commented Mar 13, 2024

Same issue here; maybe a fix can be implemented.

I think that the problem is related to coordinator.async_config_entry_first_refresh() being called during integration setup.
When the first refresh fails, coordinator.async_config_entry_first_refresh() raises ConfigEntryNotReady, and HA then reschedules the setup, which in turn extends the rate limit.

    async def async_config_entry_first_refresh(self) -> None:
        """Refresh data for the first time when a config entry is setup.

        Will automatically raise ConfigEntryNotReady if the refresh
        fails. Additionally logging is handled by config entry setup
        to ensure that multiple retries do not cause log spam.
        """
        await self._async_refresh(
            log_failures=False, raise_on_auth_failed=True, raise_on_entry_error=True
        )
        if self.last_update_success:
            return
        ex = ConfigEntryNotReady()
        ex.__cause__ = self.last_exception
        raise ex

During setup, shouldn't the integration skip the setup retry (or delay it, if possible) when status code 429 is received?

Something along the lines of

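from forecast_solar.exceptions import ForecastSolarRatelimit
from homeassistant.exceptions import ConfigEntryNotReady

# Inside async_setup_entry:
try:
    await coordinator.async_config_entry_first_refresh()
except ConfigEntryNotReady as err:
    # The coordinator stores the original error in __cause__.
    if not isinstance(err.__cause__, ForecastSolarRatelimit):
        raise  # re-raise any other error so HA retries setup
    # Rate limited: suppress the error so setup completes without a retry.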

@klaasnicolaas I did a quick test branch on my fork: dev...bj00rn:core:fix-rate-limit-error-in-setup
WorksOnMyMachine(TM). The downside is that the model can no longer be safely derived from the data coordinator if we are rate limited during setup. I can open a PR if this seems like a viable solution.

Cheers

@klaasnicolaas
Member

This won't fix it, especially for users who have previously set up the integration but simply made too many requests. The problem lies in the updateCoordinator and the function that retrieves the data.

./Klaas

@bj00rn

bj00rn commented Mar 14, 2024

> This won't fix it, especially for users who have previously set up the integration but simply made too many requests. The problem lies in the updateCoordinator and the function that retrieves the data.
>
> ./Klaas

@klaasnicolaas Are you sure? When I say setup, I mean when async_setup_entry is called, not configuration in the config flow.

I think this will actually fix the problem (or at least a critical part of it).
For me the problem arises on frequent reboots of HA. I already have three instances of the integration configured, one for each PV string which amplifies the problem.

Maybe my understanding of how integration setup works is incorrect, but here it goes:

  1. When HA is booted (or integration is reloaded manually) async_setup_entry is called in which you call coordinator.async_config_entry_first_refresh.
  2. If an exception is raised in async_config_entry_first_refresh the exception ConfigEntryNotReady is raised by the coordinator base class.
  3. HA will then automatically schedule a retry of async_setup_entry (seems that the delay is 80-ish seconds here, with some kind of random seed).
  4. When the retry happens the rate limit will be extended, another ConfigEntryNotReady will be raised by the next async_setup_entry and we are stuck in a loop where the rate limit is extended every 80 seconds.
image

But maybe I am missing something here? Is there more to this issue?

Suppressing the rate limit error in async_setup_entry (all other errors, such as authentication failures, will still be raised) will let the integration set up correctly on reload instead of being stuck in an endless retry cycle. Even for public accounts, I think the default update_interval of 1h should eventually allow the rate limit to clear, since 12 calls are allowed per hour, but only if the integration has been allowed to set up correctly.

https://developers.home-assistant.io/docs/config_entries_index/#setting-up-an-entry

> During startup, Home Assistant first calls the normal component setup, and then call the method async_setup_entry(hass, entry) for each entry. If a new Config Entry is created at runtime, Home Assistant will also call async_setup_entry(hass, entry) (example)

https://developers.home-assistant.io/docs/integration_setup_failures/#integrations-using-async_setup_entry

> Raise the ConfigEntryNotReady exception from async_setup_entry in the integration's __init__.py, and Home Assistant will automatically take care of retrying set up later. To avoid doubt, raising ConfigEntryNotReady in a platform's async_setup_entry is ineffective because it is too late to be caught by the config entry setup.

@K-Ko

K-Ko commented Mar 14, 2024

As I said here, I am thinking (independently of a concrete implementation, because I'm not familiar with HA) of an abstract logic that covers integration installation, boot-up, normal run mode, etc.

  • Integration installation comes e.g. with a default .retry-at flag file with 1970-01-01 00:00:00 in it
  • On API fetch, the actual system timestamp is checked against "retry at" time in the flag file
    • if "now" is after "retry at", fetch
    • if not, just skip
  • If then at some point in time a 429 response comes up, the header "retry at" or response body "retry at" will be written to the flag file and is thus simply observed at the next "run".

The integration can then work as it does now: if it runs fine for days or weeks, the (last) "retry at" is (far) before "now" and everything runs smoothly :-)
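
The flag-file logic described above might look like the following minimal sketch. The `RetryAtGate` class, the ISO 8601 file contents, and the temporary directory are assumptions for illustration; a real integration would more likely use Home Assistant's own storage than a bare file:

```python
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class RetryAtGate:
    """Persists the last "retry at" timestamp and gates API calls on it."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if not self.path.exists():
            self.store(EPOCH)  # default: fetching is allowed immediately

    def store(self, retry_at: datetime) -> None:
        self.path.write_text(retry_at.isoformat())

    def may_fetch(self, now: datetime) -> bool:
        retry_at = datetime.fromisoformat(self.path.read_text().strip())
        return now > retry_at


with tempfile.TemporaryDirectory() as tmp:
    gate = RetryAtGate(Path(tmp) / ".retry-at")
    now = datetime(2024, 3, 14, 12, 0, tzinfo=timezone.utc)
    print(gate.may_fetch(now))                          # True
    gate.store(now + timedelta(hours=1))                # a 429 reported "retry at"
    print(gate.may_fetch(now + timedelta(minutes=5)))   # False
    print(gate.may_fetch(now + timedelta(hours=2)))     # True
```

Because the timestamp lives on disk, the gate also survives restarts, which is the case the in-memory update interval cannot cover.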

@bj00rn

bj00rn commented Mar 14, 2024

> As I said here, I think (independent of concrete implementation because not familiar with HA) about an abstract logic during integration installation, boot up, normal run mode etc.
>
>   • Integration installation comes e.g. with a default .retry-at flag file with 1970-01-01 00:00:00 in it
>   • On API fetch, the actual system timestamp is checked against "retry at" time in the flag file
>     • if "now" is after "retry at", fetch
>     • if not, just skip
>   • If then at some point in time a 429 response comes up, the header "retry at" or response body "retry at" will be written to the flag file and is thus simply observed at the next "run".
>
> Then the integration can work as now, if it runs for days/weeks fine, the (last) "retry at" is (far) before "now" and it runs smoothly :-)

The problem as I see it is purely related to Home Assistant's integration setup logic. The integration never gets a chance to set up if it is rate limited from startup (startup being when the integration is loaded, whether due to a reboot or to the integration being added or reloaded), as there is no documented way of postponing or scheduling the retry of integration setup. Retries will be fired every 80 seconds indefinitely. On my server the integration has made ~100k requests over the last month due to this problem.

Once the integration has set up correctly, it should be possible to implement logic that observes the response and delays further requests if the rate limit is reached. Such logic probably won't be necessary, though, since even the public API still supports 12 requests/hour per IP. The default delay in the integration is 1h, so there should be no major problems with the rate limit being hit.

The only exceptions I can think of would probably be when you are using a public account and

  • you have 10+ instances of the integration
  • repeatedly reload the integration either manually or by restarting home assistant
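
As a rough cross-check, the ~100k figure is consistent with the 80-second retry interval and the three configured instances mentioned earlier in this thread. The 30-day month with uninterrupted retries is an assumption:

```python
RETRY_SECONDS = 80   # setup retry interval reported above
INSTANCES = 3        # configured integration instances, one per PV string
DAYS = 30            # assumption: a 30-day month of uninterrupted retries

retries_per_instance = DAYS * 24 * 3600 // RETRY_SECONDS
total_requests = retries_per_instance * INSTANCES
print(retries_per_instance, total_requests)  # 32400 97200
```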

@K-Ko

K-Ko commented Mar 18, 2024

There is another finding on my side.

Independent of a 429 during setup, which is not recognized, return code 400 also leads to an endless loop!

Here is one example of calls with an invalid location: each results in a 400, but the integration bombards the API with requests every 80 seconds :-( ...

image

image

Is it possible that not only a 429 is not recognized, but that the response code is not checked for 200 OK at all?

At the moment all calls are fully answered to give the requester the chance to analyse the response, but in future such "false" requests may be intercepted more generically.

@bj00rn

bj00rn commented Mar 18, 2024

> There is another finding on my side.
>
> Independent of a 429 during setup, which is not recognized, return code 400 also leads to an endless loop!
>
> Here is one example of calls with an invalid location: each results in a 400, but the integration bombards the API with requests every 80 seconds :-( ...
>
> image
>
> image
>
> Is it possible that not only a 429 is not recognized, but that the response code is not checked for 200 OK at all?
>
> At the moment all calls are fully answered to give the requester the chance to analyse the response, but in future such "false" requests may be intercepted more generically.

This looks like a related but separate issue. I think that during the config flow an API request should ideally be made to confirm the options provided (location, API key, etc.) before the form is submitted. Any rate limit exception raised during the config flow should probably prevent submitting, just to be on the safe side.

@K-Ko

K-Ko commented Mar 18, 2024

Depending on how sophisticated you want the checks to be:

@bj00rn

bj00rn commented Mar 18, 2024

> Depending on how sophisticated you want the checks to be:

So to recap, I see three separate but related issues with the integration:

  1. A rate limiting error during component setup (when the integration is loaded) will cause an endless request->rate limit loop.
  2. Options should be validated against the API during the config flow, to avoid creating broken instances of the integration that will never set up correctly and cause an endless request loop.
  3. The general problem of requests being rate limited on data refresh after the integration has set up correctly. This can probably be handled by the integration by postponing the next refresh. Having multiple instances of the integration might make this one a bit tricky, though, since requests are rate limited by IP.

@K-Ko

K-Ko commented Mar 18, 2024

> 1. A rate limiting error during component setup (when the integration is loaded), will cause an endless request->rate limit loop.

Not only rate limits: any response code not equal to 200 should trigger some kind of alert with the response error message.
response.message.text (as here)

@bj00rn

bj00rn commented Mar 18, 2024

> trigger a kind of alert with the response error message.
> response.message.text (as here)

Yes, you are correct here, but under nominal circumstances (the integration has been configured correctly) the retry cycle is the desired behaviour. Examples would be servers being down, DNS resolution failures, etc. The integration should then try to reload. For rate limiting errors this makes no sense, though, since a rate limit will never be resolved by making another request.

Edit: I'm beginning to suspect it's probably better to do proper validation in the config flow/options flow and not call coordinator.async_config_entry_first_refresh at all during async_setup_entry.

That way the integration always gets created and requests will only occur at the set refresh interval of the integration. Any errors that arise can be handled by the integration from there on.

I made a PR to the lib to support validation.
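
The distinction drawn in this exchange (retry on transient failures, but not on rate limits or bad input) can be summarised as a small decision function. The `Action` enum and `action_for_status` are hypothetical names for illustration and are not part of the forecast_solar library or Home Assistant:

```python
from enum import Enum


class Action(Enum):
    USE_DATA = "use the returned forecast"
    WAIT_FOR_RESET = "make no calls until the rate limit resets"
    FIX_CONFIG = "surface the error; retrying cannot help"
    RETRY_LATER = "transient failure; retry after a delay"


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to how the caller should react."""
    if status == 200:
        return Action.USE_DATA
    if status == 429:
        return Action.WAIT_FOR_RESET
    if 400 <= status < 500:
        # e.g. 400 for an invalid location: the same request will fail again.
        return Action.FIX_CONFIG
    return Action.RETRY_LATER


for code in (200, 400, 429, 503):
    print(code, action_for_status(code).name)
```

Only the last branch corresponds to the automatic setup retry that makes sense for outages such as a server being down.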

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows bot closed this as not planned on Jun 27, 2024
@github-actions bot locked and limited conversation to collaborators on Jul 27, 2024