`ceil_div` return common type and optimize #3229
I find the previous implementation much simpler. Please keep signed and unsigned separate.
I don't quite agree... With two separate functions we need to duplicate 15 lines of code, which is not great.

```cpp
template <class _Tp,
          class _Up,
          _CUDA_VSTD::enable_if_t<_CCCL_TRAIT(_CUDA_VSTD::is_integral, _Tp), int> = 0,
          _CUDA_VSTD::enable_if_t<_CCCL_TRAIT(_CUDA_VSTD::is_integral, _Up), int> = 0>
_CCCL_NODISCARD _LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX14 decltype(_Tp{} / _Up{})
ceil_div(const _Tp __a, const _Up __b) noexcept
{
  _CCCL_ASSERT(__b > _Up{0}, "cuda::ceil_div: b must be positive");
  using _Common  = decltype(_Tp{} / _Up{});
  using _UCommon = _CUDA_VSTD::make_unsigned_t<_Common>;
  if constexpr (_CUDA_VSTD::is_signed_v<_Tp>)
  {
    _CCCL_ASSERT(__a >= _Tp{0}, "cuda::ceil_div: a must be non negative");
  }
  auto __a1 = static_cast<_UCommon>(__a);
  auto __b1 = static_cast<_UCommon>(__b);
  // ...
```
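For readers without the CCCL macros at hand, the single-function approach argued for above can be sketched in plain C++17. This is only an illustration, not the code from the PR: the name `ceil_div_sketch` is hypothetical, `assert` stands in for `_CCCL_ASSERT`, and the final quotient-plus-remainder step is an assumption, since the quoted snippet is cut off before the actual computation.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-alone sketch; NOT the libcu++ implementation.
template <class T,
          class U,
          std::enable_if_t<std::is_integral_v<T>, int> = 0,
          std::enable_if_t<std::is_integral_v<U>, int> = 0>
[[nodiscard]] constexpr auto ceil_div_sketch(const T a, const U b) noexcept -> decltype(T{} / U{})
{
  // Result type follows the usual arithmetic conversions of a / b.
  using Common  = decltype(T{} / U{});
  using UCommon = std::make_unsigned_t<Common>;

  assert(b > U{0} && "b must be positive");
  if constexpr (std::is_signed_v<T>)
  {
    assert(a >= T{0} && "a must be non-negative");
  }

  // Both operands are known to be non-negative here, so the unsigned casts preserve their values.
  const auto a1 = static_cast<UCommon>(a);
  const auto b1 = static_cast<UCommon>(b);

  // Assumed final step: quotient plus one if there is a remainder.
  // This form cannot overflow, unlike (a + b - 1) / b.
  const UCommon quotient = a1 / b1;
  const UCommon round_up = static_cast<UCommon>(a1 % b1 != 0);
  return static_cast<Common>(quotient + round_up);
}

// Compile-time checks of values and of the deduced return type.
static_assert(ceil_div_sketch(10, 3) == 4);
static_assert(ceil_div_sketch(9, 3) == 3);
static_assert(ceil_div_sketch(0, 5) == 0);
static_assert(std::is_same_v<decltype(ceil_div_sketch(7, 2u)), unsigned>);
static_assert(std::is_same_v<decltype(ceil_div_sketch(short{1}, short{1})), int>);
static_assert(std::is_same_v<decltype(ceil_div_sketch(1, 1L)), long>);
```

The `if constexpr` branch is what lets one template cover both signed and unsigned dividends: the non-negativity check is simply not instantiated when `T` is unsigned, so no second overload is needed.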
🟨 CI finished in 2h 02m: Pass: 79%/170 | Total: 3d 02h | Avg: 26m 16s | Max: 1h 23m | Hits: 36%/17647

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 170)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
🟨 CI finished in 2h 27m: Pass: 79%/170 | Total: 2d 18h | Avg: 23m 36s | Max: 1h 24m | Hits: 15%/17650

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 170)

| # | Runner |
|---|---|
| 125 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
🟨 CI finished in 2h 28m: Pass: 78%/164 | Total: 2d 15h | Avg: 23m 08s | Max: 1h 16m | Hits: 419%/15310

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 164)

| # | Runner |
|---|---|
| 122 | linux-amd64-cpu16 |
| 19 | linux-amd64-gpu-v100-latest-1 |
| 12 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
🟩 CI finished in 5h 36m: Pass: 100%/135 | Total: 2d 18h | Avg: 29m 35s | Max: 1h 44m | Hits: 360%/23291

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 135)

| # | Runner |
|---|---|
| 92 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Fixes #2845, #2391
Description
`ceil_div` returns the resulting type of the operation and has been optimized for CUDA.
Features
`a`, `b`, and both
#### DO NOT MERGE