Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential growth of watchers when ZK client disconnects/reconnects #10

Open
jonpither opened this issue Jan 9, 2014 · 2 comments
Open

Comments

@jonpither
Copy link

Hi,

If for whatever reason the ZK client drops a connnection and reconnects before the SESSION_EXPIRY is raised, the number of watchers managed by the client will double each time.

During some network issues we had a few processes grind down to a halt because of this.

This is because Avout raises a watcher to manage the internal state of atoms and refs. I.e. see invalidateCache in avout.refs.

What should happen here is that each time the watcher is triggered, it should filter out events like {:event-type :None, :keeper-state :Disconnected}, perhaps by only ever being interested in event-type NodeDataChanged.

When these events get raised Avout currently registers brand new watchers even though the original watcher will still carry on firing. In ZK watchers are "fire-once" but not in the case of some events like the disconnect/connect.

This is the same for addWatch in refs and atoms.

juxtin pushed a commit to juxtin/avout that referenced this issue Jun 10, 2014
As @jafingerhut pointed out, there are two apparently incorrect uses
of swap!, one in atoms.clj and one in refs.clj. In both cases, the
error is in the removeWatch implementation. This resolves liebke#11, and
possibly issues liebke#3 and liebke#10.
juxtin pushed a commit to juxtin/avout that referenced this issue Jun 11, 2014
As @jafingerhut pointed out, there are two apparently incorrect uses
of swap!, one in atoms.clj and one in refs.clj. In both cases, the
error is in the removeWatch implementation. This resolves liebke#11, and
possibly issues liebke#3 and liebke#10.
juxtin pushed a commit to juxtin/avout that referenced this issue Jun 11, 2014
As @jafingerhut pointed out, there are two apparently incorrect uses
of swap!, one in atoms.clj and one in refs.clj. In both cases, the
error is in the removeWatch implementation (shown to be broken by the
new tests in my previous commit). This commit fixes resolves liebke#11 and
resolves liebke#3. I can't tell for sure yet, but I suspect that it might
help with liebke#10.
@juxtin juxtin mentioned this issue Jun 11, 2014
@juxtin
Copy link
Contributor

juxtin commented Jun 30, 2014

Hi @jonpither,

I know it's been a while and I'm not sure if this is something you're still running into, but I've got some time and I'd like to see if I can fix this issue.

I think I've got the cause nailed down, but I could use some help reproducing the issue so I can be sure. Can you point me in the right direction?

@jonpither
Copy link
Author

Hi @holguinj

It's a tough one to reproduce. You want to cause a session disconnect but NOT a session timeout. I can't recall how I programmatically nailed it down, but you can set a long-time out and effect some disconnects. I've since moved project so I don't have the sort of test infrastructural code available.

Once you've got the disconnects happening you can then query the amount of watchers registered before/after. What should happen is that the watcher gets fired on a disconnect (and removes itself as per usual), then re-registers.

What currently happens is that the watcher fires, doesn't remove itself, and re-registers itself. Hence an exponential growth.

Good luck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants