Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.x] Adds support for reconnecting to Redis if disconnected by server #281

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

joedixon
Copy link
Collaborator

@joedixon joedixon commented Dec 7, 2024

This PR adds support for attempting to reconnect to Redis if the connection is dropped.

It does this by listening for a disconnection and starting a configurable timeout. During the timeout, Reverb will attempt to reconnect every second. Any events that are triggered along the way are queued up and will be processed if the connection is re-established. If at the end of that timeout, it has not bee possible to reconnect, the server is stopped and it's up to the process monitor to attempt to restart Reverb which will kick off the process again.

Allowing Reverb to attempt the reconnection should prevent backoff from the process monitor when attempting to restart the server fails.

@joedixon
Copy link
Collaborator Author

@nathanaelytj I still have some tests to write here, but wondering if you could give it an early test and see if it works as you expect. Details of the behaviour are in the PR description.

@nathanaelytj
Copy link

Thank you @joedixon I have checked the fix. It's working as intended for issue when redis is disconnected and then reconnected.
Unintentionally, I found during the test that it will crash when redis is restarting in orchestration environment like docker swarm / kubernetes and Internal DNS failed to response container address.

image

I think this is okay to ignore, since orchestration like docker swarm / kubernetes has it's own healthcheck and after container crashed, it will restarted automatically.

@joedixon
Copy link
Collaborator Author

Thanks @nathanaelytj! I'm not super familiar with your configuration - do you have any idea why the code I'm introducing didn't handle that situation?

@nathanaelytj
Copy link

@joedixon I'm using docker swarm setup. When the container all gone from the service / replica 0, the bound service DNS along with it's accosiated virtual-IP is flushed, and that's when name or service not known. After a moment, a new container pop and it re-registered within Docker Swarm Internal DNS, sometimes the propagation is not fast enough so when reverb internal healthcheck run, it meet with name or service not known or something similar.

For non docker swarm / kubernetes setup, this error probably can be replicated when we are connecting to redis server through DNS, and then after connection established, we delete the DNS record. It will probably return error something similar to that.

@joedixon
Copy link
Collaborator Author

Do you think there is a different exception/error I can catch to trigger the timeout as opposed to a hard fail?

@nathanaelytj
Copy link

nathanaelytj commented Dec 20, 2024

After analyzing the stacktrace and experiment with the code I found that this line

/**
* Start the Http server
*/
public function start(): void
{
$this->loop->run();
}

must caught ErrorException for timeout to trigger. Here's my experiment:

    /**
     * Start the Http server
     */
    public function start(): void
    {
        try {
            $this->loop->run();
        } catch (\ErrorException $e) {
            Log::info("Error caught during restart");
        }
    }

The final result is:

image

It successfully trigger the timeout sequence and reconnection.

@nathanaelytj
Copy link

Btw bellow is full stack trace I use during analysis:

ErrorException

  Redis::get(): php_network_getaddresses: getaddrinfo for base_redis failed: Name or service not known

  at vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:116
    112▕     public function command($method, array $parameters = [])
    113▕     {
    114▕         $start = microtime(true);
    115▕
  ➜ 116▕         $result = $this->client->{$method}(...$parameters);
    117▕
    118▕         $time = round((microtime(true) - $start) * 1000, 2);
    119
    120$this->events?->dispatch(new CommandExecuted(

  1   vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:116
      Redis::get()

  2   vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php:531
      Illuminate\Redis\Connections\Connection::command()

  3   vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php:54
      Illuminate\Redis\Connections\PhpRedisConnection::command()

  4   vendor/laravel/framework/src/Illuminate/Cache/RedisStore.php:71
      Illuminate\Redis\Connections\PhpRedisConnection::get()

  5   vendor/laravel/framework/src/Illuminate/Cache/Repository.php:117
      Illuminate\Cache\RedisStore::get()

  6   vendor/laravel/framework/src/Illuminate/Cache/CacheManager.php:453
      Illuminate\Cache\Repository::get()

  7   vendor/laravel/framework/src/Illuminate/Support/Facades/Facade.php:361
      Illuminate\Cache\CacheManager::__call()

  8   vendor/laravel/reverb/src/Servers/Reverb/Console/Commands/StartServer.php:106
      Illuminate\Support\Facades\Facade::__callStatic()

  9   vendor/react/event-loop/src/Timer/Timers.php:102
      Laravel\Reverb\Servers\Reverb\Console\Commands\StartServer::Laravel\Reverb\Servers\Reverb\Console\Commands\{closure}()

  10  vendor/react/event-loop/src/StreamSelectLoop.php:185
      React\EventLoop\Timer\Timers::tick()

  11  vendor/laravel/reverb/src/Servers/Reverb/Http/Server.php:41
      React\EventLoop\StreamSelectLoop::run()

  12  vendor/laravel/reverb/src/Servers/Reverb/Console/Commands/StartServer.php:74
      Laravel\Reverb\Servers\Reverb\Http\Server::start()

  13  vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36
      Laravel\Reverb\Servers\Reverb\Console\Commands\StartServer::handle()

  14  vendor/laravel/framework/src/Illuminate/Container/Util.php:43
      Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()

  15  vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:95
      Illuminate\Container\Util::unwrapIfClosure()

  16  vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:35
      Illuminate\Container\BoundMethod::callBoundMethod()

  17  vendor/laravel/framework/src/Illuminate/Container/Container.php:694
      Illuminate\Container\BoundMethod::call()

  18  vendor/laravel/framework/src/Illuminate/Console/Command.php:213
      Illuminate\Container\Container::call()

  19  vendor/symfony/console/Command/Command.php:279
      Illuminate\Console\Command::execute()

  20  vendor/laravel/framework/src/Illuminate/Console/Command.php:182
      Symfony\Component\Console\Command\Command::run()

  21  vendor/symfony/console/Application.php:1094
      Illuminate\Console\Command::run()

  22  vendor/symfony/console/Application.php:342
      Symfony\Component\Console\Application::doRunCommand()

  23  vendor/symfony/console/Application.php:193
      Symfony\Component\Console\Application::doRun()

  24  vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:198
      Symfony\Component\Console\Application::run()

  25  vendor/laravel/framework/src/Illuminate/Foundation/Application.php:1205
      Illuminate\Foundation\Console\Kernel::handle()

  26  artisan:13
      Illuminate\Foundation\Application::handleCommand()

@joedixon
Copy link
Collaborator Author

So the issue is that exception is being thrown too high up the stack to be caught by the reconnection code?

@nathanaelytj
Copy link

I think the connection class at laravel/framework throwing Exception earlier than reconnection code try catch block.

Using reactphp the code is like operating in a closed loop.
It calls back to itself in a set time.
Unhandled exception during loop can cause the loop to stop, that's why laravel reverb seems to exit with hard fail and exception thrown.

My experimental code caught the exception, but it's doing nothing waiting for the next tick. At the next tick the whole run sequence triggered. During this tick it threw another exception but caught by reconnection code try catch block because now DNS resolved but reverb is not subscibed. Then the reconnection triggered restoring all function to healthy state.

What do you think?

@joedixon joedixon force-pushed the fix/redis-reconnection branch from 762915c to 71a0fb6 Compare January 2, 2025 13:37
@joedixon joedixon force-pushed the fix/redis-reconnection branch from 71a0fb6 to a53bf7f Compare January 20, 2025 14:07
@joedixon joedixon force-pushed the fix/redis-reconnection branch from 7d5dbe3 to 96b452e Compare January 20, 2025 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants