Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sub-RFC for increased availability of NUMA API #1545

Open
wants to merge 12 commits into
base: dev/vossmjp/rfc_numa_support
Choose a base branch
from

Conversation

aleksei-fedotov
Copy link
Contributor

Description

Add sub-RFC to #1535 for increased availability of NUMA API.

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

Copy link
Contributor

@vossmjp vossmjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small wording issues.

@aleksei-fedotov aleksei-fedotov force-pushed the dev/aleksei-fedotov/rfc_numa_support-increased-availability branch from 83b8788 to 5e8b79e Compare November 14, 2024 09:27
Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this proposal.


Having a dependency on a shared HWLOC library has advantages:
1. Code reuse with all of the positive consequences out of this, including
relying on the same code that has been tested and debugged, allowing the OS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in fact most of Linux OSes has obsolete hwloc versions and relying on it does not provide benefits.
IMO, having most up-to-date static HWLOC together with recent versions of oneTBB has benefits and fixes / new features are available to oneTBB immediately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I did not know that. I will consider this in the future changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rewrite it to smth like this:
1. Reliability. Using a tested and debugged shared library, oneTBB benefits from established, reliable functionality.
2. Code Reuse. Reuse the same code across different processes, improving cache locality and reducing memory footprint, which is the primary purpose of shared libraries.
3. Drop-In Replacement. Use your version of HWLOC without recompiling oneTBB. It can be useful in the following cases:

  • You need to apply a hotfix to support your hardware that has not yet been integrated into the HWLOC project.
  • You use a HWLOC version that may never be upstreamed. For example, if hardware unavailable to the broader market.
  • You want to test a development version of HWLOC on your system.

so that the necessary variant of ~tbbbind~ library can be found and loaded.
2. The drop of support for HWLOC 1.x allows to not introducing additional
~tbbbind~ variant of the library, yet maintaining support for popular
versions of HWLOC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you still introduce new variant of tbbbind, as ALL distributions already ship only libtbbbind_2_5.so

It means, all TBB distribution across all package managers need to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or they may continue supporting the current variant with system-specific version of HWLOC as it is still a working approach.


** Disadvantages
By default still no diagnostics if users failed to setup environment with their
own version of HWLOC library correctly. Although, specifying ~TBB_VERSION=1~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why user would be its own version of hwloc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We introduced that approach in the past and it may be backward incompatible if we change it at once. Perhaps, we decide to prioritize own version of HWLOC someday, but AFAIK still need to leave the possibility to specify a user one.

@@ -0,0 +1,125 @@
# -*- fill-column: 80; -*-

#+title: Link ~tbbbind~ with static HWLOC to improve predictability of NUMA support API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the title is too long. its okay though since it is not a part of the main doc set, but i suggest rephrasing it to something like that:
Link tbbbind with Static HWLOC for NUMA API Predictability

rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
** Common Advantages
- Explicitly tells that the functionality being used is not going to work
instead of just being silent.
- Does not require additional variant of ~tbbbind~ library to be distributed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Does not require additional variant of ~tbbbind~ library to be distributed
- Avoids the need to distribute an additional variant of ~tbbbind~ library.

- Explicitly tells that the functionality being used is not going to work
instead of just being silent.
- Does not require additional variant of ~tbbbind~ library to be distributed
along with the others.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
along with the others.

rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
words, it does not provide complete solution to the problem.

** Disadvantages of Issuing a Warning
- The warning may still not be visible, especially if standard streams are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The warning may still not be visible, especially if standard streams are
- The warning may be unnoticed, especially if standard streams are

closed.

** Disadvantages of Throwing an Exception
- May break existing code as it does not expect an exception to be thrown.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- May break existing code as it does not expect an exception to be thrown.
- May break existing code that does not expect an exception to be thrown.

@vossmjp vossmjp added the RFC label Jan 9, 2025
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
refuse working requiring one of the ~tbbbind~ variant to be loaded (e.g., throw
an exception).

Comparing these alternative approaches to the one proposed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No necessarily.

rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
rfcs/proposed/numa_support/tbbbind-link-static-hwloc.org Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants