[enhancement] limit cmake jobs for Windows backend build #2240
Open: icfaust wants to merge 9 commits into uxlfoundation:main from icfaust:dev/fix_windows_todo
Commits (9)
63a92f8 Update build_backend.py (icfaust)
0a65818 Update build_backend.py (icfaust)
2df497f formatting (icfaust)
38985d3 fix memfree (icfaust)
ee7cbf9 Update build_backend.py (icfaust)
8d427e8 Update build_backend.py (icfaust)
73d787b Update build_backend.py (icfaust)
6bcc736 Update build_backend.py (icfaust)
050adc8 formatting (icfaust)
build_backend.py

```diff
@@ -131,16 +131,32 @@
     cmake_args += ["-DADD_ONEDAL_RPATH=ON"]

     cpu_count = multiprocessing.cpu_count()
+    # convert to max supported pointer to a memory size in kB
+    memfree = sys.maxsize >> 10
     # limit parallel cmake jobs if memory size is insufficient
     # TODO: add on all platforms
     if IS_LIN:
         with open("/proc/meminfo", "r") as meminfo_file_obj:
-            memfree = meminfo_file_obj.read().split("\n")[1].split(" ")
-            while "" in memfree:
-                memfree.remove("")
-            memfree = int(memfree[1])  # total memory in kB
-        cpu_count = min(cpu_count, floor(max(1, memfree / 2**20)))
+            next(meminfo_file_obj)  # skip MemTotal
+            memfree = int(
+                next(meminfo_file_obj).strip().split()[-2]
+            )  # total free physical memory in kB
+    elif IS_WIN:
+        txt = subprocess.run(
+            [
+                "powershell",
+                "Get-CIMInstance",
+                "Win32_OperatingSystem",
+                "|",
+                "Select",
+                "FreePhysicalMemory",
+            ],
+            stdout=subprocess.PIPE,
+            text=True,
+        )
+        memfree = int(txt.stdout.strip().split()[-1])  # total free physical memory in kB
+
+    mem_per_proc = 20  # 2**20 kB or 1GB
+    cpu_count = min(cpu_count, floor(max(1, memfree >> mem_per_proc)))

     make_args = ["cmake", "--build", abs_build_temp_path, "-j " + str(cpu_count)]

     make_install_args = [
```

The review threads below refer to lines +158 to +159, i.e. the `mem_per_proc` and `cpu_count` lines.
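For context on the Windows branch above: `Get-CIMInstance Win32_OperatingSystem | Select FreePhysicalMemory` prints a column header, a separator row, and then the value in kB, so taking the last whitespace-separated token of stdout recovers the number. A minimal sketch of the parsing, with a made-up figure:

```python
# Typical stdout of
#   Get-CIMInstance Win32_OperatingSystem | Select FreePhysicalMemory
# (the figure below is made up):
txt_stdout = """
FreePhysicalMemory
------------------
          16322180
"""

# .strip().split()[-1] takes the last token, i.e. the kB value
memfree = int(txt_stdout.strip().split()[-1])
print(memfree)  # 16322180 (free physical memory in kB)
```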
Conversation

---

I'm not getting the idea here. `sys.maxsize` returns the maximum possible addressable memory range of the operating system in bytes, which on a modern machine would be 2^63 (expressing it in KiB would require division by 2^10). Dividing that by 128 would not have any effect, since the available memory might be in the range of 2^33 through 2^38 or so on a modern machine.
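A quick sanity check of the magnitudes being discussed (a minimal sketch; the shift by 7 mirrors the division by 128 mentioned above):

```python
import sys

# On 64-bit CPython, sys.maxsize is 2**63 - 1 (the largest Py_ssize_t);
# it says nothing about how much RAM is actually installed.
print(sys.maxsize)       # 9223372036854775807

# Even divided by 128, the result (~2**56) still dwarfs real machines,
# whose memory sits around 2**33..2**38 bytes (8 GB..256 GB).
print(sys.maxsize >> 7)  # 72057594037927935
```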
---

Changed; I was a little confused about the `Py_ssize_t` definition and how it relates to `size_t`. I have switched the division operators to bit shifts for clarity and directness.
---

Given the exact numbers that you are arriving at, is this trying to limit addressable space to accommodate a `float64` cast to `size_t`? What's the idea behind these?
---

Yeah, I am assuming that the underlying processes have a memory bound (not just Linux or Windows). I wanted a general value I could easily get out of Python as a conservative upper bound on the memory that can be used; I then use that as a very large initial value when splitting it across the parallel jobs. I am using `Py_ssize_t`, like `sizeof`, to yield the largest byte value possible for a platform. This at least gives all platforms some sort of bound without having to explicitly handle other platforms beyond Windows/Linux (e.g. macOS).
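Condensed from the diff above, the pattern being described is roughly the following (a sketch; `IS_LIN` stands in for the build script's platform flag):

```python
import sys
import platform

IS_LIN = platform.system() == "Linux"

# Conservative default: largest pointer-sized value, shifted from bytes
# to kB. On platforms with no specific probe (e.g. macOS) this leaves
# the job count effectively uncapped.
memfree = sys.maxsize >> 10

if IS_LIN:
    # Override with the real figure: the second line of /proc/meminfo
    # is "MemFree:  <value> kB".
    with open("/proc/meminfo") as f:
        next(f)                             # skip MemTotal
        memfree = int(next(f).split()[-2])  # MemFree value in kB
```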
---

Am I understanding correctly that the purpose of this code is to spawn multiple compilation processes with a memory limit for each, such that the sum would not exceed system memory? Or does this code serve some other purpose?
---

Maybe a reasonable compromise would be to just get the total RAM in the system and divide by 4 GB to get a target number of max parallel processes. I don't think the problem is approachable otherwise: new processes can be spawned and terminated by the user while the compilation is running either way, which also changes the calculation.

It's also quite an odd thing to have in a project's build script. I am not aware of any other ML library limiting the number of parallel jobs for the user based on available RAM; typically they just OOM on their own, and the user then adjusts the thread count as needed.

Was there a CI failure on Windows due to OOM?
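A minimal sketch of that compromise, assuming a POSIX system (`os.sysconf` with these keys is not available on Windows, which would still need its own probe):

```python
import os
import multiprocessing

# Total physical RAM in bytes (POSIX-only sysconf keys).
total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

# One job per 4 GB of total RAM, capped by core count, at least one job.
max_jobs = max(1, min(multiprocessing.cpu_count(), total_ram // (4 * 2**30)))
```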
---

No error that I saw for Windows; I assume it was out of memory on Linux at some point? @Alexsandruss
---

The 1 GB constraint was chosen in #1196 based on CI VM parameters, to prevent build failures due to OOM on Linux only. A change to some other working weak constraint will be fine. Also, `MemFree` might be changed to `MemAvailable`.
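A minimal sketch of that change, looking the field up by name rather than by line position (`MemAvailable` also counts reclaimable caches, so it estimates usable memory better than `MemFree`; it requires Linux 3.14+):

```python
def mem_available_kb():
    """Return the MemAvailable figure from /proc/meminfo, in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])  # "MemAvailable: <value> kB"
    raise RuntimeError("MemAvailable not present in /proc/meminfo")
```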
---

@david-cortes-intel Maybe we should allow for external control of the parallel processes (something like `MAX_JOBS` from PyTorch: https://github.com/pytorch/pytorch/blob/main/setup.py#L13)? The explanation shows it was done to match the CI rather than to fix a consistent build error. That way it can be hardware-agnostic, and we can easily add it externally to the CI YAMLs without massively changing any of the build scripts. Let me know what you think. If so, then I think we would need to close this PR and make a ticket. @Alexsandruss, was there a ticket generated for this TODO?
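A minimal sketch of such an override (the `MAX_JOBS` name follows PyTorch's convention and is not an existing knob in this build script):

```python
import os
import multiprocessing

# Let the user or CI pin the job count; otherwise fall back to core count.
cpu_count = int(os.environ.get("MAX_JOBS", multiprocessing.cpu_count()))
```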
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think a user-configurable variable would be the best solution here.