This repository was archived by the owner on Apr 29, 2020. It is now read-only.

Add log lines when daemon set handling goroutine exits#1053

Open
mpuncel wants to merge 2 commits into square:master from mpuncel:mpuncel/log-ds-returns

Conversation

mpuncel (Collaborator) commented May 29, 2018

Occasionally we see problems in the DS farm where the main control loop
gets blocked sending updates to daemon set handlers via channels. This
is likely because the handler goroutines are exiting, but the logs don't
currently reveal the cause. This commit adds log lines at every place
where these goroutines might exit, which should shed light on the issue.

// so that the timer would be stopped after
err = nil
case <-ctx.Done():
ds.logger.Warnln("goroutine exiting because ")
Collaborator

Can you elaborate on this message?

Occasionally the daemon set farm locks up, with the farm goroutine
blocking forever while attempting to send an update to a daemon set
worker goroutine. This can happen due to a race: the worker goroutine
might exit, for any of a number of reasons, after the farm goroutine
checks the child map and determines that a worker already exists, but
before the farm goroutine sends the update.

This commit sidesteps the problem by buffering the per-daemon-set update
channel so that the farm goroutine will never block sending to a worker.

If a worker dies, an existing routine grabs a mutex protecting the child
map, clears out the child entry, and drains the buffered channel. The
next time an update is seen for the daemon set, the farm loop should
know that it needs to spawn another worker.
@mpuncel mpuncel force-pushed the mpuncel/log-ds-returns branch from 0918e7e to 782d378 Compare February 16, 2019 12:49