
Python core dump or memcached/redis I/O error can cause other-process apply_async to loop for thundering_herd_timeout #66

Open
kylegibson opened this issue Sep 22, 2016 · 4 comments


@kylegibson
Member

See here: https://github.com/PolicyStat/jobtastic/blob/master/jobtastic/cache/base.py#L91

The timeout doesn't mean that add will block and wait. It just means that if the key can be set, it will be set with that timeout. So if the key already exists, add returns immediately, and the surrounding retry loop becomes a busy loop.
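A minimal sketch of why this spins, using a toy in-memory stand-in for a Django-style cache backend (the class and names here are illustrative, not jobtastic's actual code): add() never waits for the key to become available; the timeout only sets the new key's expiry.

```python
import time

class InMemoryCache:
    """Toy stand-in for a Django-style cache backend (illustration only)."""

    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout):
        # add() is non-blocking: it returns False immediately if the key
        # already exists; timeout only controls when a *new* key expires.
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._store[key] = (value, now + timeout)
        return True

cache = InMemoryCache()
acquired_first = cache.add("herd-lock", 1, timeout=60)   # key absent: True
acquired_second = cache.add("herd-lock", 1, timeout=60)  # key exists: False, instantly
```

So a retry written as `while not cache.add(key, value, timeout): pass` returns from each call instantly and spins at full speed until the key expires or is deleted.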

@winhamwr
Contributor

I think this is actually #9. I'm going to write up a potential solution to this problem on that issue, based on our discussion.

@winhamwr
Contributor

This is actually different from #9; I was wrong. Because we call lock.acquire in both the memcached and redis backends with no timeout possibility, if for some reason the lock.release doesn't happen in another thread/process, the lock acquisition will wait forever.

I think that's probably low impact, since it would require the Python process to have core dumped, or the call to lock.release() or self.cache.delete to have failed because of network or memcached/redis problems. In that case, though, the other processes waiting on the lock would loop forever.
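The fix direction is a bounded acquire: give up after a deadline instead of waiting forever. A minimal sketch using the stdlib threading.Lock (whose acquire() accepts a timeout) to simulate a holder that died before releasing; the helper name is hypothetical, and a real backend would use the equivalent bounded-wait option on its memcached/redis lock:

```python
import threading

def acquire_with_deadline(lock, deadline_seconds):
    """Try to take the lock, but give up after deadline_seconds
    instead of waiting forever (guards against a lost release)."""
    return lock.acquire(timeout=deadline_seconds)

lock = threading.Lock()
lock.acquire()  # simulate a holder that crashed before calling release()

# Instead of blocking forever, this returns False after ~0.1s.
got_it = acquire_with_deadline(lock, 0.1)
```

With an unbounded `lock.acquire()`, the second call above would hang indefinitely; the deadline converts a stuck lock into a fast, handleable failure.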

@winhamwr winhamwr changed the title Busy waiting in cache.base Python core dump or memcached/redis I/O error can cause other-process apply_async to loop for thundering_herd_timeout Sep 22, 2016
@winhamwr
Contributor

Probably the behavior we want is for those other processes to fail quickly when they're stuck waiting on a lock. thundering_herd_timeout governs how long we should wait for a queued task to complete, which depends on how busy your workers are and how long the task takes to run.

This timeout is different: it's about how long you would expect a process to take to run through the apply_async method, which does no work other than checking the cache and possibly queueing a task. My 20-minute reporting/analytics/backup task should fail within seconds if it's stuck waiting for the task to actually be queued; it shouldn't block a (probably web-serving) process.
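One way this separation of timeouts might look, sketched with hypothetical names (HerdLockTimeout, safe_apply_async, try_acquire_lock are all assumptions, not jobtastic API): a short bound covers only the cache-check/queueing step, while thundering_herd_timeout remains the bound on the queued task itself.

```python
class HerdLockTimeout(Exception):
    """Hypothetical error: queueing took longer than the short bound."""

def safe_apply_async(try_acquire_lock, queue_task, queue_timeout=5.0):
    # queue_timeout bounds only the lock/queueing step; it is separate
    # from thundering_herd_timeout, which bounds the task's own runtime.
    if not try_acquire_lock(queue_timeout):
        raise HerdLockTimeout(
            "could not acquire herd lock within %.1fs" % queue_timeout)
    return queue_task()

# Happy path: lock is available, the task gets queued right away.
result = safe_apply_async(lambda t: True, lambda: "queued")

# Stuck lock: fail fast instead of spinning for thundering_herd_timeout.
try:
    safe_apply_async(lambda t: False, lambda: "queued")
    failed_fast = False
except HerdLockTimeout:
    failed_fast = True
```

The caller (probably a web process) then gets an exception in seconds and can retry or degrade gracefully, rather than blocking a request for the full herd timeout.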

@winhamwr
Contributor

I created #68 to document how I think a feature might look to do simultaneous execution prevention.
