Child count gets too much increased due to missing exit tracking #6

Open
achimnol opened this issue Jul 20, 2017 · 1 comment
achimnol commented Jul 20, 2017

TensorFlow code often spawns many threads, and the jail then reports "too many" threads even though the actual number of threads is within the configured limit.

Potential solutions:

  • Read "/proc/{pid}/status" directly to get the actual number of threads from the OS. This may add some overhead whenever the child spawns new processes or threads.
  • Guard the childCount variable with explicit locks.
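The first option can be sketched as follows. This is a minimal Python illustration, not the jail's code (the jail itself uses goroutines, so it is written in Go). The helper names `parse_thread_count` and `read_thread_count` are hypothetical; the only thing taken from the OS is the `Threads:` line that Linux exposes in "/proc/{pid}/status".

```python
import re


def parse_thread_count(status_text):
    """Return the integer in the 'Threads:' line of /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s*(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found")
    return int(match.group(1))


def read_thread_count(pid):
    """Read the thread count of a live process (Linux only, hypothetical helper)."""
    with open("/proc/{}/status".format(pid)) as f:
        return parse_thread_count(f.read())


# Parsing is kept separate from file access so it can be checked on sample text.
sample = "Name:\tpython\nPid:\t1234\nPPid:\t1\nThreads:\t17\nSigQ:\t0/1024\n"
print(parse_thread_count(sample))
```

Each call opens and parses one file per process, which is where the overhead mentioned above would come from if it ran on every spawn event.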

Even so, TensorFlow seems to increase the number of threads when we call regressors repeatedly.
We need to find a good solution for this.

NOTE:
Even the following code produces far more threads than the number of CPU cores allocated to the container:

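import tensorflow as tf

# Restrict TensorFlow to one intra-op thread, one inter-op thread, and one CPU device.
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1,
                        allow_soft_placement=True, device_count={'CPU': 1})
session = tf.Session(config=config)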
achimnol added the bug label Jul 20, 2017

achimnol commented

Adding locks did not change anything. This was expected, because we increment and decrement childCount within a single goroutine that receives waitpid results via a channel.
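To illustrate why a lock adds nothing in that design, here is a rough Python analogue. It is only an analogy under stated assumptions: a `queue.Queue` stands in for the Go channel, a single consumer thread stands in for the goroutine, and `run_counter` and `simulate` are hypothetical names. Only the consumer ever touches the counter, so there is no shared mutable state to protect.

```python
import queue
import threading


def run_counter(events, result):
    """Apply +1/-1 events until None arrives; only this thread touches the count."""
    child_count = 0
    while True:
        delta = events.get()
        if delta is None:  # sentinel: no more events
            break
        child_count += delta
    result.append(child_count)


def simulate(num_producers=4, spawns_each=1000, exits_each=990):
    """Feed spawn/exit events from several threads into one counting thread."""
    events = queue.Queue()
    result = []
    owner = threading.Thread(target=run_counter, args=(events, result))
    owner.start()

    def producer():
        for _ in range(spawns_each):
            events.put(+1)  # a child appeared
        for _ in range(exits_each):
            events.put(-1)  # a child exited

    producers = [threading.Thread(target=producer) for _ in range(num_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    events.put(None)
    owner.join()
    return result[0]


print(simulate())
```

The final count equals the number of spawn events minus the number of exit events regardless of how the producers interleave, because every update is serialized through the queue.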

After writing a function that reads procfs recursively to get the thread counts of all children, I found that the original jail implementation is correct, and that the numThreads value in "/proc/{pid}/status" contains only the direct child threads.
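A recursive procfs count of this kind might look roughly like the sketch below. It is not the function from the referenced commit; it is a Python approximation that assumes the `PPid:` and `Threads:` lines of "/proc/{pid}/status" are enough to rebuild the process tree. The names `read_status_fields`, `snapshot`, and `count_threads_recursive` are hypothetical.

```python
import os


def read_status_fields(pid, proc_root="/proc"):
    """Return (ppid, threads) parsed from <proc_root>/<pid>/status."""
    fields = {}
    with open(os.path.join(proc_root, str(pid), "status")) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return int(fields["PPid"]), int(fields["Threads"])


def snapshot(proc_root="/proc"):
    """Map every visible pid to its (ppid, threads) pair (Linux only)."""
    table = {}
    for name in os.listdir(proc_root):
        if name.isdigit():
            try:
                table[int(name)] = read_status_fields(name, proc_root)
            except (OSError, KeyError, ValueError):
                continue  # the process exited while we were scanning
    return table


def count_threads_recursive(root_pid, table):
    """Sum the thread counts of root_pid and all of its descendants."""
    children = {}
    for pid, (ppid, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        if pid not in table:
            continue
        total += table[pid][1]
        stack.extend(children.get(pid, []))
    return total


# A hand-written table in place of a live /proc: pid -> (ppid, threads).
fake = {10: (1, 3), 11: (10, 5), 12: (10, 1), 13: (11, 2), 99: (1, 7)}
print(count_threads_recursive(10, fake))  # 3 + 5 + 1 + 2 = 11
```

On a live system the call would be `count_threads_recursive(pid, snapshot())`. The snapshot is not atomic, so processes that start or exit during the scan can make the total slightly off.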

So we need to find a way to further reduce the number of threads used by TensorFlow itself.

achimnol added a commit that referenced this issue Jul 20, 2017
 * Add a utility function that reads procfs recursively to count all
   child processes and threads, but it gives the same result as
   the original child counting mechanism via waitpid and ptrace.