Exponential growth of watchers when ZK client disconnects/reconnects #10

jonpither · 2014-01-09T12:11:04Z

Hi,

If for whatever reason the ZK client drops its connection and reconnects before SESSION_EXPIRY is raised, the number of watchers managed by the client doubles each time.

During some network problems, this brought a few of our processes grinding to a halt.

This is because Avout registers a watcher to manage the internal state of atoms and refs; see, for example, invalidateCache in avout.refs.

What should happen is that each time the watcher is triggered, it filters out events such as {:event-type :None, :keeper-state :Disconnected}, perhaps by only ever acting on the NodeDataChanged event type.
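A minimal sketch of that filter, assuming events arrive as maps keyed like the one above. It is written in Python purely for illustration (Avout itself is Clojure), and `handle_event`, `invalidate_cache` and `reregister` are hypothetical names, not Avout or ZooKeeper APIs. It follows the suggestion literally: anything other than NodeDataChanged is ignored, on the assumption that the original watch is still registered after a session event.

```python
def handle_event(event, invalidate_cache, reregister):
    """Hypothetical watcher callback that only acts on data changes.

    Session events such as {"event-type": "None", "keeper-state": "Disconnected"}
    are assumed not to consume the watch, so re-registering on them would
    leave two live watchers. Returns True if a new watch was registered.
    """
    if event.get("event-type") != "NodeDataChanged":
        # Connection-state notification: the original watch is still live.
        return False
    invalidate_cache()  # refresh local state from the changed node
    reregister()        # the fired watch is gone, so arm a new one
    return True


if __name__ == "__main__":
    log = []
    disconnect = {"event-type": "None", "keeper-state": "Disconnected"}
    change = {"event-type": "NodeDataChanged", "keeper-state": "SyncConnected"}
    handle_event(disconnect, lambda: log.append("invalidate"), lambda: log.append("watch"))
    handle_event(change, lambda: log.append("invalidate"), lambda: log.append("watch"))
    print(log)  # ['invalidate', 'watch']
```

Only the data-change event causes a cache invalidation and a new watch; the disconnect is a no-op.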

When these events are raised, Avout currently registers a brand-new watcher even though the original watcher is still registered and will carry on firing. ZK watchers are normally fire-once, but not for session events such as disconnect/reconnect.

The same applies to addWatch in both refs and atoms.

@jafingerhut

As @jafingerhut pointed out, there are two apparently incorrect uses of swap!, one in atoms.clj and one in refs.clj. In both cases, the error is in the removeWatch implementation. This resolves liebke#11, and possibly issues liebke#3 and liebke#10.

@jafingerhut

As @jafingerhut pointed out, there are two apparently incorrect uses of swap!, one in atoms.clj and one in refs.clj. In both cases, the error is in the removeWatch implementation (shown to be broken by the new tests in my previous commit). This commit resolves liebke#11 and resolves liebke#3. I can't tell for sure yet, but I suspect that it might help with liebke#10.

juxtin · 2014-06-30T05:47:54Z

Hi @jonpither,

I know it's been a while and I'm not sure if this is something you're still running into, but I've got some time and I'd like to see if I can fix this issue.

I think I've got the cause nailed down, but I could use some help reproducing the issue so I can be sure. Can you point me in the right direction?

jonpither · 2014-07-03T08:06:26Z

Hi @holguinj

It's a tough one to reproduce. You want to cause a session disconnect but NOT a session timeout. I can't recall how I nailed it down programmatically, but you can set a long session timeout and then induce some disconnects. I've since moved projects, so I no longer have the test infrastructure code available.

Once you've got the disconnects happening, you can query the number of watchers registered before and after. What should happen is that the watcher fires on a disconnect (and removes itself, as usual), then re-registers.

What currently happens is that the watcher fires, doesn't remove itself, and re-registers anyway; hence the exponential growth.
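The before/after count can be modelled without a ZooKeeper server. This is a toy in-memory model, not ZooKeeper's client: `FakeClient`, `make_watcher` and `count_after_disconnects` are hypothetical names. It assumes, as described in this thread, that a data change consumes the watches it fires while a session event reaches every registered watch without removing it, and that each re-registration adds a brand-new watcher object. Under those assumptions a watcher that re-arms on every event doubles on each disconnect, while one that only re-arms on NodeDataChanged stays at one.

```python
class FakeClient:
    """Toy model of a ZK client's watch table (hypothetical, not a real API)."""

    def __init__(self):
        self.watchers = []

    def register(self, callback):
        self.watchers.append(callback)

    def fire_session_event(self, state):
        # Session events reach every watcher but do NOT remove them.
        event = {"event-type": "None", "keeper-state": state}
        for cb in list(self.watchers):
            cb(event)

    def fire_data_change(self):
        # Data watches are fire-once: take them all out, then call each.
        fired, self.watchers = self.watchers, []
        event = {"event-type": "NodeDataChanged", "keeper-state": "SyncConnected"}
        for cb in fired:
            cb(event)


def make_watcher(client, filtered):
    def watcher(event):
        if filtered and event["event-type"] != "NodeDataChanged":
            return  # session event: this watch is still registered
        # Register a brand-new watcher object (the naive path does this on every event).
        client.register(make_watcher(client, filtered))
    return watcher


def count_after_disconnects(filtered, n):
    """Number of registered watchers after n disconnect events."""
    client = FakeClient()
    client.register(make_watcher(client, filtered))
    for _ in range(n):
        client.fire_session_event("Disconnected")
    return len(client.watchers)


if __name__ == "__main__":
    for n in range(4):
        print(n, count_after_disconnects(False, n), count_after_disconnects(True, n))
```

Each printed row is the number of disconnects, then the watcher count for the naive and filtered versions: the naive column goes 1, 2, 4, 8 while the filtered column stays at 1.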

Good luck.

juxtin mentioned this issue Jun 11, 2014

Fix remove-watch #14

Merged
