Refactor `limits` and `climits` #3221

davebayer · 2024-12-30T10:06:49Z

This PR refactors the `limits` and `climits` modules.

Changes:

- Builtins for `huge_val`, `nan` and `nans` are now defined in the standard way and moved to the `__cccl/builtin.h` module.
- `limits` and `climits` are implemented directly in their headers instead of in `detail/libcxx/include/`.
- The contents of `__cuda/climits_prelude.h` are moved into `climits`.
- Removed the repetitive implementation of `numeric_limits`.
- Use `bit_cast` for infinity and NaN values when no builtin is supported.
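The builtin-first, `bit_cast`-fallback approach from the first and last items can be sketched roughly as follows. This is a minimal host-only illustration, not the PR's code: the `SKETCH_*` macros, `bit_cast_sketch`, and the `float_*` helpers are hypothetical names, the real header presumably uses its own macro names and a `constexpr` `cuda::std::bit_cast`, and the fallback bit patterns assume IEEE-754 binary32 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical feature-detection wrapper: evaluates to 0 on compilers
// that do not provide __has_builtin.
#if defined(__has_builtin)
#  define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define SKETCH_HAS_BUILTIN(x) 0
#endif

// Expose each builtin under a single name, only when the compiler has it.
#if SKETCH_HAS_BUILTIN(__builtin_huge_valf)
#  define SKETCH_BUILTIN_HUGE_VALF() __builtin_huge_valf()
#endif
#if SKETCH_HAS_BUILTIN(__builtin_nanf)
#  define SKETCH_BUILTIN_NANF(str) __builtin_nanf(str)
#endif
#if SKETCH_HAS_BUILTIN(__builtin_nansf)
#  define SKETCH_BUILTIN_NANSF(str) __builtin_nansf(str)
#endif

// memcpy-based stand-in for bit_cast (not constexpr, unlike std::bit_cast).
template <class To, class From>
To bit_cast_sketch(const From& from)
{
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Positive infinity: exponent bits all set, mantissa zero.
inline float float_infinity()
{
#ifdef SKETCH_BUILTIN_HUGE_VALF
  return SKETCH_BUILTIN_HUGE_VALF();
#else
  return bit_cast_sketch<float>(std::uint32_t{0x7F800000u});
#endif
}

// Quiet NaN: exponent bits all set, most significant mantissa bit set.
inline float float_quiet_nan()
{
#ifdef SKETCH_BUILTIN_NANF
  return SKETCH_BUILTIN_NANF("");
#else
  return bit_cast_sketch<float>(std::uint32_t{0x7FC00000u});
#endif
}

// Signaling NaN: quiet bit clear, another mantissa bit set.
inline float float_signaling_nan()
{
#ifdef SKETCH_BUILTIN_NANSF
  return SKETCH_BUILTIN_NANSF("");
#else
  return bit_cast_sketch<float>(std::uint32_t{0x7FA00000u});
#endif
}

// Host-side sanity check of the three special values.
inline void check_special_values()
{
  assert(std::isinf(float_infinity()) && float_infinity() > 0.0f);
  assert(std::isnan(float_quiet_nan()));
  assert(std::isnan(float_signaling_nan()));
}
```

Trying the builtin first lets compilers that know it fold the value to a constant; the bit-pattern fallback only relies on the floating-point layout, which is why it is tied to the IEEE-754 assumption above.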

copy-pr-bot · 2024-12-30T10:06:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


miscco · 2025-01-02T18:22:28Z

/ok to test


miscco · 2025-01-03T08:46:42Z

/ok to test

miscco · 2025-01-07T11:38:05Z

/ok to test

github-actions · 2025-01-07T13:42:28Z

🟨 CI finished in 2h 02m: Pass: 82%/170 | Total: 2d 12h | Avg: 21m 19s | Max: 1h 14m | Hits: 26%/22534

🟨 libcudacxx: Pass: 75%/48 | Total: 15h 58m | Avg: 19m 58s | Max: 1h 03m | Hits: 32%/9818

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  73%/46  | Total: 15h 19m | Avg: 19m 59s | Max:  1h 03m | Hits:  32%/9818  
  🟩 arm64              Pass: 100%/2   | Total: 39m 02s | Avg: 19m 31s | Max: 20m 34s
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 04m | Avg: 16m 01s | Max: 19m 52s
  🔍 nvcc               Pass:  72%/44  | Total: 14h 54m | Avg: 20m 20s | Max:  1h 03m | Hits:  32%/9818  
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/20  | Total:  6h 57m | Avg: 20m 51s | Max:  1h 01m
  🔍 GCC                Pass:  42%/21  | Total:  5h 23m | Avg: 15m 25s | Max:  1h 03m
  🟩 Intel              Pass: 100%/1   | Total: 21m 01s | Avg: 21m 01s | Max: 21m 01s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 14m | Avg: 33m 34s | Max: 38m 57s | Hits:  32%/9818  
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 08s | Max: 33m 17s
🟨 ctk
  🟨 11.1               Pass:  42%/7   | Total:  1h 17m | Avg: 11m 07s | Max: 30m 06s | Hits:  30%/2240  
  🟩 12.5               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 08s | Max: 33m 17s
  🟨 12.6               Pass:  79%/39  | Total: 13h 38m | Avg: 20m 59s | Max:  1h 03m | Hits:  33%/7578  
🟨 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 04m | Avg: 16m 01s | Max: 19m 52s
  🟨 nvcc11.1           Pass:  42%/7   | Total:  1h 17m | Avg: 11m 07s | Max: 30m 06s | Hits:  30%/2240  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 02m | Avg: 31m 08s | Max: 33m 17s
  🟨 nvcc12.6           Pass:  77%/35  | Total: 12h 34m | Avg: 21m 33s | Max:  1h 03m | Hits:  33%/7578  
🟨 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  1h 15m | Avg: 18m 46s | Max: 21m 57s
  🟩 Clang10            Pass: 100%/1   | Total: 23m 52s | Avg: 23m 52s | Max: 23m 52s
  🟩 Clang11            Pass: 100%/1   | Total: 19m 54s | Avg: 19m 54s | Max: 19m 54s
  🟩 Clang12            Pass: 100%/1   | Total: 19m 34s | Avg: 19m 34s | Max: 19m 34s
  🟩 Clang13            Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
  🟩 Clang14            Pass: 100%/1   | Total: 19m 32s | Avg: 19m 32s | Max: 19m 32s
  🟩 Clang15            Pass: 100%/1   | Total: 19m 28s | Avg: 19m 28s | Max: 19m 28s
  🟩 Clang16            Pass: 100%/1   | Total: 19m 08s | Avg: 19m 08s | Max: 19m 08s
  🟩 Clang17            Pass: 100%/1   | Total: 20m 12s | Avg: 20m 12s | Max: 20m 12s
  🟩 Clang18            Pass: 100%/8   | Total:  3h 00m | Avg: 22m 34s | Max:  1h 01m
  🟥 GCC6               Pass:   0%/2   | Total:  4m 28s | Avg:  2m 14s | Max:  2m 17s
  🟥 GCC7               Pass:   0%/2   | Total:  5m 24s | Avg:  2m 42s | Max:  2m 43s
  🟥 GCC8               Pass:   0%/1   | Total:  2m 48s | Avg:  2m 48s | Max:  2m 48s
  🟥 GCC9               Pass:   0%/3   | Total:  7m 36s | Avg:  2m 32s | Max:  2m 59s
  🟩 GCC10              Pass: 100%/1   | Total: 20m 18s | Avg: 20m 18s | Max: 20m 18s
  🟩 GCC11              Pass: 100%/1   | Total: 21m 51s | Avg: 21m 51s | Max: 21m 51s
  🟩 GCC12              Pass: 100%/1   | Total: 23m 47s | Avg: 23m 47s | Max: 23m 47s
  🟨 GCC13              Pass:  60%/10  | Total:  3h 57m | Avg: 23m 46s | Max:  1h 03m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 21m 01s | Avg: 21m 01s | Max: 21m 01s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s | Hits:  30%/2240  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 34m 20s | Avg: 34m 20s | Max: 34m 20s | Hits:  27%/2477  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 38m 57s | Hits:  36%/5101  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 02m | Avg: 31m 08s | Max: 33m 17s
🟨 jobs
  🟨 Build              Pass:  80%/41  | Total: 12h 09m | Avg: 17m 47s | Max: 38m 57s | Hits:  32%/9818  
  🟥 NVRTC              Pass:   0%/4   | Total:  1h 42m | Avg: 25m 33s | Max: 28m 47s
  🟩 Test               Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
🟨 gpu
  🟨 v100               Pass:  75%/48  | Total: 15h 58m | Avg: 19m 58s | Max:  1h 03m | Hits:  32%/9818  
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 12m 21s | Avg: 12m 21s | Max: 12m 21s
  🟩 90a                Pass: 100%/2   | Total: 16m 06s | Avg:  8m 03s | Max: 12m 11s
🟨 std
  🟨 11                 Pass:  33%/6   | Total:  1h 09m | Avg: 11m 37s | Max: 24m 18s
  🟨 14                 Pass:  40%/5   | Total:  1h 23m | Avg: 16m 47s | Max: 30m 06s | Hits:  30%/2240  
  🟨 17                 Pass:  69%/13  | Total:  4h 03m | Avg: 18m 42s | Max: 34m 20s | Hits:  37%/4954  
  🟨 20                 Pass:  95%/23  | Total:  9h 19m | Avg: 24m 20s | Max:  1h 03m | Hits:  25%/2624

🟨 cub: Pass: 82%/47 | Total: 1d 00h | Avg: 30m 53s | Max: 1h 13m | Hits: 2%/3144

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  82%/45  | Total:  1d 00h | Avg: 32m 02s | Max:  1h 13m | Hits:   2%/3144  
  🟩 arm64              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  5m 01s
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 42s | Avg:  4m 21s | Max:  4m 28s
  🔍 nvcc               Pass:  82%/45  | Total:  1d 00h | Avg: 32m 03s | Max:  1h 13m | Hits:   2%/3144  
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/19  | Total:  9h 10m | Avg: 28m 58s | Max:  1h 01m
  🔍 GCC                Pass:  61%/21  | Total:  7h 07m | Avg: 20m 21s | Max:  1h 01m
  🟩 Intel              Pass: 100%/1   | Total: 57m 57s | Avg: 57m 57s | Max: 57m 57s
  🟩 MSVC               Pass: 100%/4   | Total:  4h 37m | Avg:  1h 09m | Max:  1h 13m | Hits:   2%/3144  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
🔍 gpu: v100 🔍
  🟩 h100               Pass: 100%/2   | Total: 20m 29s | Avg: 10m 14s | Max: 15m 58s
  🔍 v100               Pass:  82%/45  | Total: 23h 51m | Avg: 31m 48s | Max:  1h 13m | Hits:   2%/3144  
🔍 jobs: Build 🔍
  🔍 Build              Pass:  80%/40  | Total: 20h 37m | Avg: 30m 55s | Max:  1h 13m | Hits:   2%/3144  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 20s | Avg: 18m 20s | Max: 18m 20s
  🟩 GraphCapture       Pass: 100%/1   | Total: 24m 32s | Avg: 24m 32s | Max: 24m 32s
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 21m | Avg: 27m 05s | Max: 34m 26s
  🟩 TestGPU            Pass: 100%/2   | Total:  1h 30m | Avg: 45m 12s | Max:  1h 01m
🟨 ctk
  🟨 11.1               Pass:  42%/7   | Total:  4h 26m | Avg: 38m 05s | Max:  1h 01m | Hits:   2%/786   
  🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
  🟨 12.6               Pass:  89%/38  | Total: 17h 26m | Avg: 27m 32s | Max:  1h 13m | Hits:   2%/2358  
🟨 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 42s | Avg:  4m 21s | Max:  4m 28s
  🟨 nvcc11.1           Pass:  42%/7   | Total:  4h 26m | Avg: 38m 05s | Max:  1h 01m | Hits:   2%/786   
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
  🟨 nvcc12.6           Pass:  88%/36  | Total: 17h 17m | Avg: 28m 49s | Max:  1h 13m | Hits:   2%/2358  
🟨 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  3h 29m | Avg: 52m 24s | Max: 58m 00s
  🟩 Clang10            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
  🟩 Clang11            Pass: 100%/1   | Total: 56m 36s | Avg: 56m 36s | Max: 56m 36s
  🟩 Clang12            Pass: 100%/1   | Total: 58m 06s | Avg: 58m 06s | Max: 58m 06s
  🟩 Clang13            Pass: 100%/1   | Total: 55m 51s | Avg: 55m 51s | Max: 55m 51s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 26s | Max: 34m 26s
  🟥 GCC6               Pass:   0%/2   | Total: 56m 19s | Avg: 28m 09s | Max: 28m 59s
  🟥 GCC7               Pass:   0%/2   | Total:  1h 01m | Avg: 30m 42s | Max: 30m 43s
  🟥 GCC8               Pass:   0%/1   | Total: 29m 55s | Avg: 29m 55s | Max: 29m 55s
  🟥 GCC9               Pass:   0%/3   | Total:  1h 24m | Avg: 28m 08s | Max: 29m 15s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
  🟩 GCC11              Pass: 100%/1   | Total:  6m 03s | Avg:  6m 03s | Max:  6m 03s
  🟩 GCC12              Pass: 100%/3   | Total: 26m 29s | Avg:  8m 49s | Max: 15m 58s
  🟩 GCC13              Pass: 100%/8   | Total:  2h 37m | Avg: 19m 38s | Max:  1h 01m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 57m 57s | Avg: 57m 57s | Max: 57m 57s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:   2%/786   
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:   2%/786   
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 13m | Hits:   2%/1572  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
🟨 std
  🟨 11                 Pass:  40%/5   | Total:  3h 11m | Avg: 38m 23s | Max: 57m 56s
  🟨 14                 Pass:  50%/4   | Total:  2h 57m | Avg: 44m 22s | Max:  1h 01m | Hits:   2%/786   
  🟨 17                 Pass:  75%/12  | Total:  7h 58m | Avg: 39m 51s | Max:  1h 12m | Hits:   2%/1572  
  🟩 20                 Pass: 100%/26  | Total: 10h 03m | Avg: 23m 13s | Max:  1h 13m | Hits:   3%/786   
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 20m 29s | Avg: 10m 14s | Max: 15m 58s
  🟩 90a                Pass: 100%/1   | Total:  4m 34s | Avg:  4m 34s | Max:  4m 34s

🟨 thrust: Pass: 82%/46 | Total: 16h 18m | Avg: 21m 16s | Max: 1h 14m | Hits: 27%/9260

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  81%/44  | Total: 16h 09m | Avg: 22m 01s | Max:  1h 14m | Hits:  27%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 40s | Avg:  4m 50s | Max:  5m 05s
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 18s
  🔍 nvcc               Pass:  81%/44  | Total: 16h 08m | Avg: 22m 00s | Max:  1h 14m | Hits:  27%/9260  
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/19  | Total:  6h 19m | Avg: 19m 58s | Max: 43m 37s
  🔍 GCC                Pass:  57%/19  | Total:  1h 42m | Avg:  5m 25s | Max: 12m 47s
  🟩 Intel              Pass: 100%/1   | Total: 49m 38s | Avg: 49m 38s | Max: 49m 38s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 05m | Avg:  1h 01m | Max:  1h 14m | Hits:  27%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 10m
🔍 jobs: Build 🔍
  🔍 Build              Pass:  80%/40  | Total: 14h 42m | Avg: 22m 03s | Max:  1h 14m | Hits:   9%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 36m 39s | Avg: 12m 13s | Max: 22m 13s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total:  1h 00m | Avg: 20m 01s | Max: 34m 47s
🟨 ctk
  🟨 11.1               Pass:  42%/7   | Total:  2h 29m | Avg: 21m 22s | Max:  1h 14m | Hits:   9%/1852  
  🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 10m
  🟨 12.6               Pass:  89%/37  | Total: 11h 28m | Avg: 18m 35s | Max:  1h 10m | Hits:  32%/7408  
🟨 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 29s | Avg:  5m 14s | Max:  5m 18s
  🟨 nvcc11.1           Pass:  42%/7   | Total:  2h 29m | Avg: 21m 22s | Max:  1h 14m | Hits:   9%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 10m
  🟨 nvcc12.6           Pass:  88%/35  | Total: 11h 17m | Avg: 19m 21s | Max:  1h 10m | Hits:  32%/7408  
🟨 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  2h 14m | Avg: 33m 38s | Max: 37m 32s
  🟩 Clang10            Pass: 100%/1   | Total: 43m 37s | Avg: 43m 37s | Max: 43m 37s
  🟩 Clang11            Pass: 100%/1   | Total: 37m 02s | Avg: 37m 02s | Max: 37m 02s
  🟩 Clang12            Pass: 100%/1   | Total: 38m 58s | Avg: 38m 58s | Max: 38m 58s
  🟩 Clang13            Pass: 100%/1   | Total: 36m 13s | Avg: 36m 13s | Max: 36m 13s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 07m | Avg:  9m 38s | Max: 34m 47s
  🟥 GCC6               Pass:   0%/2   | Total:  6m 25s | Avg:  3m 12s | Max:  3m 14s
  🟥 GCC7               Pass:   0%/2   | Total:  6m 24s | Avg:  3m 12s | Max:  3m 58s
  🟥 GCC8               Pass:   0%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s
  🟥 GCC9               Pass:   0%/3   | Total: 10m 14s | Avg:  3m 24s | Max:  4m 05s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 00m | Avg:  7m 33s | Max: 12m 47s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 49m 38s | Avg: 49m 38s | Max: 49m 38s
  🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 14m | Avg:  1h 14m | Max:  1h 14m | Hits:   9%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 08m | Avg:  1h 08m | Max:  1h 08m | Hits:   9%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 42m | Avg: 54m 18s | Max:  1h 10m | Hits:  39%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 10m
🟨 std
  🟨 11                 Pass:  40%/5   | Total:  1h 11m | Avg: 14m 14s | Max: 34m 09s
  🟨 14                 Pass:  50%/4   | Total:  1h 57m | Avg: 29m 21s | Max:  1h 14m | Hits:   9%/1852  
  🟨 17                 Pass:  75%/12  | Total:  6h 04m | Avg: 30m 22s | Max:  1h 10m | Hits:   9%/3704  
  🟩 20                 Pass: 100%/23  | Total:  6h 47m | Avg: 17m 42s | Max:  1h 10m | Hits:  54%/3704  
🟨 gpu
  🟨 v100               Pass:  82%/46  | Total: 16h 18m | Avg: 21m 16s | Max:  1h 14m | Hits:  27%/9260  
🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 18m 36s | Avg:  9m 18s | Max: 12m 30s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 31s | Avg:  4m 31s | Max:  4m 31s

🟨 cudax: Pass: 96%/26 | Total: 3h 19m | Avg: 7m 39s | Max: 24m 28s | Hits: 30%/312

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  95%/22  | Total:  3h 08m | Avg:  8m 34s | Max: 24m 28s | Hits:  30%/312   
  🟩 arm64              Pass: 100%/4   | Total: 10m 33s | Avg:  2m 38s | Max:  2m 44s
🔍 ctk: 12.0 🔍
  🔍 12.0               Pass:  66%/3   | Total: 26m 46s | Avg:  8m 55s | Max: 12m 52s | Hits:  30%/156   
  🟩 12.5               Pass: 100%/2   | Total: 17m 41s | Avg:  8m 50s | Max:  9m 25s
  🟩 12.6               Pass: 100%/21  | Total:  2h 34m | Avg:  7m 22s | Max: 24m 28s | Hits:  30%/156   
🔍 cudacxx: nvcc12.0 🔍
  🔍 nvcc12.0           Pass:  66%/3   | Total: 26m 46s | Avg:  8m 55s | Max: 12m 52s | Hits:  30%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 41s | Avg:  8m 50s | Max:  9m 25s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  2h 34m | Avg:  7m 22s | Max: 24m 28s | Hits:  30%/156   
🚨 cxx: GCC9 🚨
  🟩 Clang9             Pass: 100%/1   | Total: 12m 52s | Avg: 12m 52s | Max: 12m 52s
  🟩 Clang10            Pass: 100%/1   | Total: 16m 08s | Avg: 16m 08s | Max: 16m 08s
  🟩 Clang11            Pass: 100%/1   | Total: 14m 13s | Avg: 14m 13s | Max: 14m 13s
  🟩 Clang12            Pass: 100%/1   | Total: 13m 44s | Avg: 13m 44s | Max: 13m 44s
  🟩 Clang13            Pass: 100%/1   | Total: 15m 30s | Avg: 15m 30s | Max: 15m 30s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
  🟩 Clang18            Pass: 100%/4   | Total: 32m 58s | Avg:  8m 14s | Max: 24m 28s
  🔥 GCC9               Pass:   0%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
  🟩 GCC12              Pass: 100%/2   | Total: 21m 00s | Avg: 10m 30s | Max: 18m 02s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 50s | Avg:  2m 42s | Max:  2m 53s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 07s | Avg: 11m 07s | Max: 11m 07s | Hits:  30%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 13s | Avg: 11m 13s | Max: 11m 13s | Hits:  30%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 41s | Avg:  8m 50s | Max:  9m 25s
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/13  | Total:  1h 58m | Avg:  9m 06s | Max: 24m 28s
  🔍 GCC                Pass:  88%/9   | Total: 40m 53s | Avg:  4m 32s | Max: 18m 02s
  🟩 MSVC               Pass: 100%/2   | Total: 22m 20s | Avg: 11m 10s | Max: 11m 13s | Hits:  30%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 17m 41s | Avg:  8m 50s | Max:  9m 25s
🔍 jobs: Build 🔍
  🔍 Build              Pass:  95%/24  | Total:  2h 36m | Avg:  6m 32s | Max: 16m 08s | Hits:  30%/312   
  🟩 Test               Pass: 100%/2   | Total: 42m 30s | Avg: 21m 15s | Max: 24m 28s
🔍 std: 17 🔍
  🔍 17                 Pass:  83%/6   | Total: 31m 59s | Avg:  5m 19s | Max: 12m 52s
  🟩 20                 Pass: 100%/20  | Total:  2h 47m | Avg:  8m 22s | Max: 24m 28s | Hits:  30%/312   
🟨 cudacxx_family
  🟨 nvcc               Pass:  96%/26  | Total:  3h 19m | Avg:  7m 39s | Max: 24m 28s | Hits:  30%/312   
🟨 gpu
  🟨 v100               Pass:  96%/26  | Total:  3h 19m | Avg:  7m 39s | Max: 24m 28s | Hits:  30%/312   
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
  🟩 90a                Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 28s | Avg: 5m 14s | Max: 8m 32s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 32s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
  🟩 Test               Pass: 100%/1   | Total:  8m 32s | Avg:  8m 32s | Max:  8m 32s

🟩 python: Pass: 100%/1 | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 170)

#	Runner
125	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

miscco · 2025-01-07T14:06:26Z

/ok to test


github-actions · 2025-01-07T16:01:58Z

🟨 CI finished in 1h 54m: Pass: 95%/170 | Total: 1d 16h | Avg: 14m 24s | Max: 1h 28m | Hits: 77%/22534

🟨 cub: Pass: 82%/47 | Total: 11h 16m | Avg: 14m 23s | Max: 56m 56s | Hits: 99%/3144

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  82%/45  | Total: 11h 06m | Avg: 14m 48s | Max: 56m 56s | Hits:  99%/3144  
  🟩 arm64              Pass: 100%/2   | Total: 10m 06s | Avg:  5m 03s | Max:  5m 09s
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  4m 34s
  🔍 nvcc               Pass:  82%/45  | Total: 11h 07m | Avg: 14m 50s | Max: 56m 56s | Hits:  99%/3144  
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/19  | Total:  3h 18m | Avg: 10m 26s | Max: 56m 56s
  🔍 GCC                Pass:  61%/21  | Total:  6h 36m | Avg: 18m 51s | Max: 35m 42s
  🟩 Intel              Pass: 100%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s
  🟩 MSVC               Pass: 100%/4   | Total: 56m 57s | Avg: 14m 14s | Max: 15m 30s | Hits:  99%/3144  
  🟩 NVHPC              Pass: 100%/2   | Total: 18m 12s | Avg:  9m 06s | Max:  9m 22s
🔍 gpu: v100 🔍
  🟩 h100               Pass: 100%/2   | Total: 20m 20s | Avg: 10m 10s | Max: 16m 09s
  🔍 v100               Pass:  82%/45  | Total: 10h 56m | Avg: 14m 35s | Max: 56m 56s | Hits:  99%/3144  
🔍 jobs: Build 🔍
  🔍 Build              Pass:  80%/40  | Total:  7h 28m | Avg: 11m 12s | Max: 30m 50s | Hits:  99%/3144  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 29m 14s | Avg: 29m 14s | Max: 29m 14s
  🟩 GraphCapture       Pass: 100%/1   | Total: 24m 53s | Avg: 24m 53s | Max: 24m 53s
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 21m | Avg: 27m 13s | Max: 49m 24s
  🟩 TestGPU            Pass: 100%/2   | Total:  1h 32m | Avg: 46m 19s | Max: 56m 56s
🟨 ctk
  🟨 11.1               Pass:  42%/7   | Total:  2h 17m | Avg: 19m 37s | Max: 30m 38s | Hits:  99%/786   
  🟩 12.5               Pass: 100%/2   | Total: 18m 12s | Avg:  9m 06s | Max:  9m 22s
  🟨 12.6               Pass:  89%/38  | Total:  8h 41m | Avg: 13m 42s | Max: 56m 56s | Hits:  99%/2358  
🟨 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  4m 34s
  🟨 nvcc11.1           Pass:  42%/7   | Total:  2h 17m | Avg: 19m 37s | Max: 30m 38s | Hits:  99%/786   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 12s | Avg:  9m 06s | Max:  9m 22s
  🟨 nvcc12.6           Pass:  88%/36  | Total:  8h 32m | Avg: 14m 13s | Max: 56m 56s | Hits:  99%/2358  
🟨 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 21m 20s | Avg:  5m 20s | Max:  6m 29s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 26s | Avg:  6m 26s | Max:  6m 26s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 42s | Avg:  5m 42s | Max:  5m 42s
  🟩 Clang18            Pass: 100%/7   | Total:  2h 10m | Avg: 18m 40s | Max: 56m 56s
  🟥 GCC6               Pass:   0%/2   | Total: 59m 10s | Avg: 29m 35s | Max: 30m 38s
  🟥 GCC7               Pass:   0%/2   | Total: 59m 32s | Avg: 29m 46s | Max: 30m 50s
  🟥 GCC8               Pass:   0%/1   | Total: 29m 22s | Avg: 29m 22s | Max: 29m 22s
  🟥 GCC9               Pass:   0%/3   | Total:  1h 23m | Avg: 27m 51s | Max: 29m 52s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 59s | Avg:  5m 59s | Max:  5m 59s
  🟩 GCC12              Pass: 100%/3   | Total: 25m 58s | Avg:  8m 39s | Max: 16m 09s
  🟩 GCC13              Pass: 100%/8   | Total:  2h 07m | Avg: 15m 52s | Max: 35m 42s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 56s | Avg:  6m 56s | Max:  6m 56s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 30s | Avg: 15m 30s | Max: 15m 30s | Hits:  99%/786   
  🟩 MSVC14.29          Pass: 100%/1   | Total: 13m 11s | Avg: 13m 11s | Max: 13m 11s | Hits:  99%/786   
  🟩 MSVC14.39          Pass: 100%/2   | Total: 28m 16s | Avg: 14m 08s | Max: 14m 17s | Hits:  99%/1572  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 12s | Avg:  9m 06s | Max:  9m 22s
🟨 std
  🟨 11                 Pass:  40%/5   | Total:  1h 37m | Avg: 19m 35s | Max: 30m 50s
  🟨 14                 Pass:  50%/4   | Total:  1h 19m | Avg: 19m 48s | Max: 28m 42s | Hits:  99%/786   
  🟨 17                 Pass:  75%/12  | Total:  2h 37m | Avg: 13m 06s | Max: 29m 52s | Hits:  99%/1572  
  🟩 20                 Pass: 100%/26  | Total:  5h 42m | Avg: 13m 09s | Max: 56m 56s | Hits:  99%/786   
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 20m 20s | Avg: 10m 10s | Max: 16m 09s
  🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s

🟩 libcudacxx: Pass: 100%/48 | Total: 16h 13m | Avg: 20m 17s | Max: 1h 28m | Hits: 49%/9818

🟩 cpu
  🟩 amd64              Pass: 100%/46  | Total: 15h 36m | Avg: 20m 21s | Max:  1h 28m | Hits:  49%/9818  
  🟩 arm64              Pass: 100%/2   | Total: 37m 34s | Avg: 18m 47s | Max: 20m 09s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  2h 18m | Avg: 19m 43s | Max: 29m 09s | Hits:  38%/2240  
  🟩 12.5               Pass: 100%/2   | Total: 35m 47s | Avg: 17m 53s | Max: 26m 54s
  🟩 12.6               Pass: 100%/39  | Total: 13h 19m | Avg: 20m 30s | Max:  1h 28m | Hits:  53%/7578  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 02m | Avg: 15m 42s | Max: 21m 18s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  2h 18m | Avg: 19m 43s | Max: 29m 09s | Hits:  38%/2240  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 35m 47s | Avg: 17m 53s | Max: 26m 54s
  🟩 nvcc12.6           Pass: 100%/35  | Total: 12h 17m | Avg: 21m 03s | Max:  1h 28m | Hits:  53%/7578  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 02m | Avg: 15m 42s | Max: 21m 18s
  🟩 nvcc               Pass: 100%/44  | Total: 15h 10m | Avg: 20m 42s | Max:  1h 28m | Hits:  49%/9818  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  1h 09m | Avg: 17m 26s | Max: 21m 41s
  🟩 Clang10            Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s
  🟩 Clang12            Pass: 100%/1   | Total: 20m 15s | Avg: 20m 15s | Max: 20m 15s
  🟩 Clang13            Pass: 100%/1   | Total: 19m 25s | Avg: 19m 25s | Max: 19m 25s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
  🟩 Clang15            Pass: 100%/1   | Total: 19m 32s | Avg: 19m 32s | Max: 19m 32s
  🟩 Clang16            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
  🟩 Clang17            Pass: 100%/1   | Total: 21m 11s | Avg: 21m 11s | Max: 21m 11s
  🟩 Clang18            Pass: 100%/8   | Total:  2h 17m | Avg: 17m 14s | Max: 21m 18s
  🟩 GCC6               Pass: 100%/2   | Total: 34m 16s | Avg: 17m 08s | Max: 23m 29s
  🟩 GCC7               Pass: 100%/2   | Total: 31m 20s | Avg: 15m 40s | Max: 16m 26s
  🟩 GCC8               Pass: 100%/1   | Total: 20m 05s | Avg: 20m 05s | Max: 20m 05s
  🟩 GCC9               Pass: 100%/3   | Total: 57m 46s | Avg: 19m 15s | Max: 21m 20s
  🟩 GCC10              Pass: 100%/1   | Total: 21m 05s | Avg: 21m 05s | Max: 21m 05s
  🟩 GCC11              Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s
  🟩 GCC12              Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s
  🟩 GCC13              Pass: 100%/10  | Total:  5h 21m | Avg: 32m 08s | Max:  1h 28m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total: 22m 34s | Avg: 22m 34s | Max: 22m 34s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 29m 09s | Avg: 29m 09s | Max: 29m 09s | Hits:  38%/2240  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 34m 28s | Avg: 34m 28s | Max: 34m 28s | Hits:  31%/2477  
  🟩 MSVC14.39          Pass: 100%/2   | Total: 52m 13s | Avg: 26m 06s | Max: 38m 09s | Hits:  63%/5101  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 35m 47s | Avg: 17m 53s | Max: 26m 54s
🟩 cxx_family
  🟩 Clang              Pass: 100%/20  | Total:  5h 05m | Avg: 15m 16s | Max: 21m 41s
  🟩 GCC                Pass: 100%/21  | Total:  8h 14m | Avg: 23m 31s | Max:  1h 28m
  🟩 Intel              Pass: 100%/1   | Total: 22m 34s | Avg: 22m 34s | Max: 22m 34s
  🟩 MSVC               Pass: 100%/4   | Total:  1h 55m | Avg: 28m 57s | Max: 38m 09s | Hits:  49%/9818  
  🟩 NVHPC              Pass: 100%/2   | Total: 35m 47s | Avg: 17m 53s | Max: 26m 54s
🟩 gpu
  🟩 v100               Pass: 100%/48  | Total: 16h 13m | Avg: 20m 17s | Max:  1h 28m | Hits:  49%/9818  
🟩 jobs
  🟩 Build              Pass: 100%/41  | Total: 11h 38m | Avg: 17m 02s | Max: 38m 09s | Hits:  49%/9818  
  🟩 NVRTC              Pass: 100%/4   | Total:  2h 44m | Avg: 41m 07s | Max: 47m 16s
  🟩 Test               Pass: 100%/2   | Total:  1h 48m | Avg: 54m 25s | Max:  1h 28m
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 12m 01s | Avg: 12m 01s | Max: 12m 01s
  🟩 90a                Pass: 100%/2   | Total: 15m 56s | Avg:  7m 58s | Max: 12m 00s
🟩 std
  🟩 11                 Pass: 100%/6   | Total:  2h 02m | Avg: 20m 25s | Max: 25m 37s
  🟩 14                 Pass: 100%/5   | Total:  1h 56m | Avg: 23m 23s | Max: 44m 35s | Hits:  38%/2240  
  🟩 17                 Pass: 100%/13  | Total:  4h 37m | Avg: 21m 21s | Max: 47m 16s | Hits:  65%/4954  
  🟩 20                 Pass: 100%/23  | Total:  7h 34m | Avg: 19m 46s | Max:  1h 28m | Hits:  30%/2624

🟩 thrust: Pass: 100%/46 | Total: 10h 11m | Avg: 13m 17s | Max: 40m 33s | Hits: 99%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 24m 21s | Avg: 12m 10s | Max: 18m 05s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total: 10h 01m | Avg: 13m 40s | Max: 40m 33s | Hits:  99%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  5m 08s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  2h 29m | Avg: 21m 23s | Max: 33m 41s | Hits:  99%/1852  
  🟩 12.5               Pass: 100%/2   | Total: 28m 34s | Avg: 14m 17s | Max: 15m 06s
  🟩 12.6               Pass: 100%/37  | Total:  7h 13m | Avg: 11m 42s | Max: 40m 33s | Hits:  99%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  5m 23s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  2h 29m | Avg: 21m 23s | Max: 33m 41s | Hits:  99%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 34s | Avg: 14m 17s | Max: 15m 06s
  🟩 nvcc12.6           Pass: 100%/35  | Total:  7h 02m | Avg: 12m 04s | Max: 40m 33s | Hits:  99%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  5m 23s
  🟩 nvcc               Pass: 100%/44  | Total: 10h 00m | Avg: 13m 39s | Max: 40m 33s | Hits:  99%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 21m 15s | Avg:  5m 18s | Max:  6m 41s
  🟩 Clang10            Pass: 100%/1   | Total:  7m 06s | Avg:  7m 06s | Max:  7m 06s
  🟩 Clang11            Pass: 100%/1   | Total:  4m 59s | Avg:  4m 59s | Max:  4m 59s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 24s | Avg:  5m 24s | Max:  5m 24s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 48s | Avg:  5m 48s | Max:  5m 48s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
  🟩 Clang18            Pass: 100%/7   | Total: 56m 15s | Avg:  8m 02s | Max: 22m 09s
  🟩 GCC6               Pass: 100%/2   | Total: 58m 23s | Avg: 29m 11s | Max: 32m 38s
  🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 57s | Max: 38m 29s
  🟩 GCC8               Pass: 100%/1   | Total: 36m 02s | Avg: 36m 02s | Max: 36m 02s
  🟩 GCC9               Pass: 100%/3   | Total:  1h 43m | Avg: 34m 31s | Max: 40m 33s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
  🟩 GCC12              Pass: 100%/1   | Total:  6m 01s | Avg:  6m 01s | Max:  6m 01s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 16m | Avg:  9m 36s | Max: 22m 34s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 19m 05s | Avg: 19m 05s | Max: 19m 05s | Hits:  99%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 16m 02s | Avg: 16m 02s | Max: 16m 02s | Hits:  99%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total: 58m 33s | Avg: 19m 31s | Max: 23m 45s | Hits:  99%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 34s | Avg: 14m 17s | Max: 15m 06s
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  2h 02m | Avg:  6m 26s | Max: 22m 09s
  🟩 GCC                Pass: 100%/19  | Total:  5h 59m | Avg: 18m 56s | Max: 40m 33s
  🟩 Intel              Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s
  🟩 MSVC               Pass: 100%/5   | Total:  1h 33m | Avg: 18m 44s | Max: 23m 45s | Hits:  99%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total: 28m 34s | Avg: 14m 17s | Max: 15m 06s
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total: 10h 11m | Avg: 13m 17s | Max: 40m 33s | Hits:  99%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  8h 29m | Avg: 12m 44s | Max: 40m 33s | Hits:  99%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 39m 10s | Avg: 13m 03s | Max: 23m 45s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total:  1h 02m | Avg: 20m 56s | Max: 22m 34s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 39s | Avg:  4m 39s | Max:  4m 39s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  1h 34m | Avg: 18m 49s | Max: 29m 26s
  🟩 14                 Pass: 100%/4   | Total:  1h 36m | Avg: 24m 13s | Max: 38m 29s | Hits:  99%/1852  
  🟩 17                 Pass: 100%/12  | Total:  3h 13m | Avg: 16m 05s | Max: 40m 33s | Hits:  99%/3704  
  🟩 20                 Pass: 100%/23  | Total:  3h 23m | Avg:  8m 49s | Max: 23m 45s | Hits:  99%/3704

🟩 cudax: Pass: 100%/26 | Total: 2h 15m | Avg: 5m 13s | Max: 23m 59s | Hits: 92%/312

🟩 cpu
  🟩 amd64              Pass: 100%/22  | Total:  2h 05m | Avg:  5m 41s | Max: 23m 59s | Hits:  92%/312   
  🟩 arm64              Pass: 100%/4   | Total: 10m 20s | Avg:  2m 35s | Max:  2m 38s
🟩 ctk
  🟩 12.0               Pass: 100%/3   | Total: 14m 37s | Avg:  4m 52s | Max:  8m 48s | Hits:  92%/156   
  🟩 12.5               Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 31s
  🟩 12.6               Pass: 100%/21  | Total:  1h 50m | Avg:  5m 14s | Max: 23m 59s | Hits:  92%/156   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/3   | Total: 14m 37s | Avg:  4m 52s | Max:  8m 48s | Hits:  92%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 31s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 50m | Avg:  5m 14s | Max: 23m 59s | Hits:  92%/156   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/26  | Total:  2h 15m | Avg:  5m 13s | Max: 23m 59s | Hits:  92%/312   
🟩 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  3m 04s | Avg:  3m 04s | Max:  3m 04s
  🟩 Clang10            Pass: 100%/1   | Total:  3m 45s | Avg:  3m 45s | Max:  3m 45s
  🟩 Clang11            Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
  🟩 Clang12            Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
  🟩 Clang13            Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
  🟩 Clang18            Pass: 100%/4   | Total: 32m 25s | Avg:  8m 06s | Max: 23m 59s
  🟩 GCC9               Pass: 100%/1   | Total:  2m 45s | Avg:  2m 45s | Max:  2m 45s
  🟩 GCC10              Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
  🟩 GCC11              Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
  🟩 GCC12              Pass: 100%/2   | Total: 24m 32s | Avg: 12m 16s | Max: 21m 14s
  🟩 GCC13              Pass: 100%/4   | Total: 11m 01s | Avg:  2m 45s | Max:  3m 02s
  🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 48s | Avg:  8m 48s | Max:  8m 48s | Hits:  92%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 47s | Avg:  9m 47s | Max:  9m 47s | Hits:  92%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 31s
🟩 cxx_family
  🟩 Clang              Pass: 100%/13  | Total:  1h 01m | Avg:  4m 44s | Max: 23m 59s
  🟩 GCC                Pass: 100%/9   | Total: 44m 27s | Avg:  4m 56s | Max: 21m 14s
  🟩 MSVC               Pass: 100%/2   | Total: 18m 35s | Avg:  9m 17s | Max:  9m 47s | Hits:  92%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 31s
🟩 gpu
  🟩 v100               Pass: 100%/26  | Total:  2h 15m | Avg:  5m 13s | Max: 23m 59s | Hits:  92%/312   
🟩 jobs
  🟩 Build              Pass: 100%/24  | Total:  1h 30m | Avg:  3m 46s | Max:  9m 47s | Hits:  92%/312   
  🟩 Test               Pass: 100%/2   | Total: 45m 13s | Avg: 22m 36s | Max: 23m 59s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
  🟩 90a                Pass: 100%/1   | Total:  3m 02s | Avg:  3m 02s | Max:  3m 02s
🟩 std
  🟩 17                 Pass: 100%/6   | Total: 19m 18s | Avg:  3m 13s | Max:  5m 29s
  🟩 20                 Pass: 100%/20  | Total:  1h 56m | Avg:  5m 49s | Max: 23m 59s | Hits:  92%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 32s | Avg: 6m 16s | Max: 10m 26s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total: 12m 32s | Avg:  6m 16s | Max: 10m 26s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
  🟩 Test               Pass: 100%/1   | Total: 10m 26s | Avg: 10m 26s | Max: 10m 26s

🟩 python: Pass: 100%/1 | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 170)

#	Runner
125	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

miscco · 2025-01-07T16:40:53Z

/ok to test

github-actions · 2025-01-07T17:58:22Z

🟨 CI finished in 1h 15m: Pass: 99%/170 | Total: 1d 18h | Avg: 14m 52s | Max: 1h 08m | Hits: 72%/22534

🟨 libcudacxx: Pass: 97%/48 | Total: 15h 13m | Avg: 19m 02s | Max: 1h 08m | Hits: 37%/9818

🔍 cpu: amd64 🔍
  🔍 amd64              Pass:  97%/46  | Total: 14h 35m | Avg: 19m 02s | Max:  1h 08m | Hits:  37%/9818  
  🟩 arm64              Pass: 100%/2   | Total: 38m 15s | Avg: 19m 07s | Max: 20m 37s
🔍 ctk: 11.1 🔍
  🔍 11.1               Pass:  85%/7   | Total:  2h 02m | Avg: 17m 29s | Max: 29m 28s | Hits:  34%/2240  
  🟩 12.5               Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 35m 20s
  🟩 12.6               Pass: 100%/39  | Total: 12h 27m | Avg: 19m 10s | Max:  1h 08m | Hits:  38%/7578  
🔍 cudacxx: nvcc11.1 🔍
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 05m | Avg: 16m 18s | Max: 21m 35s
  🔍 nvcc11.1           Pass:  85%/7   | Total:  2h 02m | Avg: 17m 29s | Max: 29m 28s | Hits:  34%/2240  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 35m 20s
  🟩 nvcc12.6           Pass: 100%/35  | Total: 11h 22m | Avg: 19m 30s | Max:  1h 08m | Hits:  38%/7578  
🔍 cudacxx_family: nvcc 🔍
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 05m | Avg: 16m 18s | Max: 21m 35s
  🔍 nvcc               Pass:  97%/44  | Total: 14h 08m | Avg: 19m 17s | Max:  1h 08m | Hits:  37%/9818  
🔍 cxx: GCC6 🔍
  🟩 Clang9             Pass: 100%/4   | Total: 42m 03s | Avg: 10m 30s | Max: 17m 12s
  🟩 Clang10            Pass: 100%/1   | Total: 21m 17s | Avg: 21m 17s | Max: 21m 17s
  🟩 Clang11            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
  🟩 Clang12            Pass: 100%/1   | Total: 20m 47s | Avg: 20m 47s | Max: 20m 47s
  🟩 Clang13            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
  🟩 Clang14            Pass: 100%/1   | Total: 19m 47s | Avg: 19m 47s | Max: 19m 47s
  🟩 Clang15            Pass: 100%/1   | Total: 21m 14s | Avg: 21m 14s | Max: 21m 14s
  🟩 Clang16            Pass: 100%/1   | Total: 21m 18s | Avg: 21m 18s | Max: 21m 18s
  🟩 Clang17            Pass: 100%/1   | Total: 18m 28s | Avg: 18m 28s | Max: 18m 28s
  🟩 Clang18            Pass: 100%/8   | Total:  2h 31m | Avg: 18m 54s | Max:  1h 00m
  🔍 GCC6               Pass:  50%/2   | Total: 34m 42s | Avg: 17m 21s | Max: 23m 02s
  🟩 GCC7               Pass: 100%/2   | Total: 29m 39s | Avg: 14m 49s | Max: 14m 54s
  🟩 GCC8               Pass: 100%/1   | Total: 21m 30s | Avg: 21m 30s | Max: 21m 30s
  🟩 GCC9               Pass: 100%/3   | Total:  1h 00m | Avg: 20m 00s | Max: 21m 53s
  🟩 GCC10              Pass: 100%/1   | Total: 20m 52s | Avg: 20m 52s | Max: 20m 52s
  🟩 GCC11              Pass: 100%/1   | Total: 22m 27s | Avg: 22m 27s | Max: 22m 27s
  🟩 GCC12              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
  🟩 GCC13              Pass: 100%/10  | Total:  3h 28m | Avg: 20m 48s | Max:  1h 08m
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 29m 28s | Avg: 29m 28s | Max: 29m 28s | Hits:  34%/2240  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 36m 27s | Avg: 36m 27s | Max: 36m 27s | Hits:  31%/2477  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 27s | Max: 36m 39s | Hits:  41%/5101  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 35m 20s
🔍 cxx_family: GCC 🔍
  🟩 Clang              Pass: 100%/20  | Total:  5h 24m | Avg: 16m 14s | Max:  1h 00m
  🔍 GCC                Pass:  95%/21  | Total:  6h 41m | Avg: 19m 06s | Max:  1h 08m
  🟩 Intel              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
  🟩 MSVC               Pass: 100%/4   | Total:  2h 18m | Avg: 34m 42s | Max: 36m 39s | Hits:  37%/9818  
  🟩 NVHPC              Pass: 100%/2   | Total: 43m 36s | Avg: 21m 48s | Max: 35m 20s
🔍 jobs: Build 🔍
  🔍 Build              Pass:  97%/41  | Total: 11h 18m | Avg: 16m 32s | Max: 36m 39s | Hits:  37%/9818  
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 44m | Avg: 26m 09s | Max: 31m 32s
  🟩 Test               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 08m
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 16s | Avg:  2m 16s | Max:  2m 16s
🔍 std: 14 🔍
  🟩 11                 Pass: 100%/6   | Total:  1h 40m | Avg: 16m 41s | Max: 23m 02s
  🔍 14                 Pass:  80%/5   | Total:  1h 30m | Avg: 18m 07s | Max: 30m 00s | Hits:  34%/2240  
  🟩 17                 Pass: 100%/13  | Total:  4h 02m | Avg: 18m 37s | Max: 36m 27s | Hits:  31%/4954  
  🟩 20                 Pass: 100%/23  | Total:  7h 58m | Avg: 20m 49s | Max:  1h 08m | Hits:  50%/2624  
🟨 gpu
  🟨 v100               Pass:  97%/48  | Total: 15h 13m | Avg: 19m 02s | Max:  1h 08m | Hits:  37%/9818  
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 12m 14s | Avg: 12m 14s | Max: 12m 14s
  🟩 90a                Pass: 100%/2   | Total: 17m 30s | Avg:  8m 45s | Max: 13m 40s

🟩 cub: Pass: 100%/47 | Total: 14h 07m | Avg: 18m 01s | Max: 1h 00m | Hits: 99%/3144

🟩 cpu
  🟩 amd64              Pass: 100%/45  | Total: 13h 57m | Avg: 18m 36s | Max:  1h 00m | Hits:  99%/3144  
  🟩 arm64              Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  5m 17s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  3h 46m | Avg: 32m 21s | Max: 54m 11s | Hits:  99%/786   
  🟩 12.5               Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max:  9m 36s
  🟩 12.6               Pass: 100%/38  | Total: 10h 02m | Avg: 15m 50s | Max:  1h 00m | Hits:  99%/2358  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 15s | Avg:  4m 07s | Max:  4m 08s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  3h 46m | Avg: 32m 21s | Max: 54m 11s | Hits:  99%/786   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max:  9m 36s
  🟩 nvcc12.6           Pass: 100%/36  | Total:  9h 53m | Avg: 16m 29s | Max:  1h 00m | Hits:  99%/2358  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 15s | Avg:  4m 07s | Max:  4m 08s
  🟩 nvcc               Pass: 100%/45  | Total: 13h 59m | Avg: 18m 39s | Max:  1h 00m | Hits:  99%/3144  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 20m 58s | Avg:  5m 14s | Max:  5m 54s
  🟩 Clang10            Pass: 100%/1   | Total:  6m 43s | Avg:  6m 43s | Max:  6m 43s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 48s | Avg:  5m 48s | Max:  5m 48s
  🟩 Clang13            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 31s | Avg:  5m 31s | Max:  5m 31s
  🟩 Clang18            Pass: 100%/7   | Total:  1h 20m | Avg: 11m 33s | Max: 35m 00s
  🟩 GCC6               Pass: 100%/2   | Total:  1h 35m | Avg: 47m 38s | Max: 47m 43s
  🟩 GCC7               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 54s | Max:  1h 00m
  🟩 GCC8               Pass: 100%/1   | Total: 55m 07s | Avg: 55m 07s | Max: 55m 07s
  🟩 GCC9               Pass: 100%/3   | Total:  2h 46m | Avg: 55m 25s | Max:  1h 00m
  🟩 GCC10              Pass: 100%/1   | Total:  5m 48s | Avg:  5m 48s | Max:  5m 48s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 56s | Avg:  5m 56s | Max:  5m 56s
  🟩 GCC12              Pass: 100%/3   | Total: 26m 08s | Avg:  8m 42s | Max: 16m 06s
  🟩 GCC13              Pass: 100%/8   | Total:  2h 23m | Avg: 17m 54s | Max: 36m 42s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s | Hits:  99%/786   
  🟩 MSVC14.29          Pass: 100%/1   | Total: 12m 42s | Avg: 12m 42s | Max: 12m 42s | Hits:  99%/786   
  🟩 MSVC14.39          Pass: 100%/2   | Total: 28m 40s | Avg: 14m 20s | Max: 14m 24s | Hits:  99%/1572  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max:  9m 36s
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  2h 27m | Avg:  7m 45s | Max: 35m 00s
  🟩 GCC                Pass: 100%/21  | Total: 10h 17m | Avg: 29m 24s | Max:  1h 00m
  🟩 Intel              Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s
  🟩 MSVC               Pass: 100%/4   | Total: 57m 17s | Avg: 14m 19s | Max: 15m 55s | Hits:  99%/3144  
  🟩 NVHPC              Pass: 100%/2   | Total: 18m 52s | Avg:  9m 26s | Max:  9m 36s
🟩 gpu
  🟩 h100               Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 16m 06s
  🟩 v100               Pass: 100%/45  | Total: 13h 47m | Avg: 18m 23s | Max:  1h 00m | Hits:  99%/3144  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total: 10h 53m | Avg: 16m 19s | Max:  1h 00m | Hits:  99%/3144  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 33m 47s | Avg: 33m 47s | Max: 33m 47s
  🟩 GraphCapture       Pass: 100%/1   | Total: 28m 47s | Avg: 28m 47s | Max: 28m 47s
  🟩 HostLaunch         Pass: 100%/3   | Total:  1h 00m | Avg: 20m 03s | Max: 22m 19s
  🟩 TestGPU            Pass: 100%/2   | Total:  1h 11m | Avg: 35m 51s | Max: 36m 42s
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 20m 17s | Avg: 10m 08s | Max: 16m 06s
  🟩 90a                Pass: 100%/1   | Total:  4m 34s | Avg:  4m 34s | Max:  4m 34s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  2h 52m | Avg: 34m 35s | Max:  1h 00m
  🟩 14                 Pass: 100%/4   | Total:  2h 08m | Avg: 32m 08s | Max: 59m 02s | Hits:  99%/786   
  🟩 17                 Pass: 100%/12  | Total:  3h 57m | Avg: 19m 46s | Max:  1h 00m | Hits:  99%/1572  
  🟩 20                 Pass: 100%/26  | Total:  5h 08m | Avg: 11m 52s | Max: 36m 42s | Hits:  99%/786

🟩 thrust: Pass: 100%/46 | Total: 9h 55m | Avg: 12m 56s | Max: 38m 09s | Hits: 99%/9260

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 21m 23s | Avg: 10m 41s | Max: 15m 07s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total:  9h 45m | Avg: 13m 18s | Max: 38m 09s | Hits:  99%/9260  
  🟩 arm64              Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 18s
🟩 ctk
  🟩 11.1               Pass: 100%/7   | Total:  2h 42m | Avg: 23m 12s | Max: 38m 08s | Hits:  99%/1852  
  🟩 12.5               Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 14m 29s
  🟩 12.6               Pass: 100%/37  | Total:  6h 44m | Avg: 10m 55s | Max: 38m 09s | Hits:  99%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 34s | Avg:  4m 47s | Max:  4m 48s
  🟩 nvcc11.1           Pass: 100%/7   | Total:  2h 42m | Avg: 23m 12s | Max: 38m 08s | Hits:  99%/1852  
  🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 14m 29s
  🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 34m | Avg: 11m 16s | Max: 38m 09s | Hits:  99%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 34s | Avg:  4m 47s | Max:  4m 48s
  🟩 nvcc               Pass: 100%/44  | Total:  9h 45m | Avg: 13m 18s | Max: 38m 09s | Hits:  99%/9260  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total: 20m 42s | Avg:  5m 10s | Max:  5m 48s
  🟩 Clang10            Pass: 100%/1   | Total:  7m 10s | Avg:  7m 10s | Max:  7m 10s
  🟩 Clang11            Pass: 100%/1   | Total:  5m 03s | Avg:  5m 03s | Max:  5m 03s
  🟩 Clang12            Pass: 100%/1   | Total:  5m 17s | Avg:  5m 17s | Max:  5m 17s
  🟩 Clang13            Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
  🟩 Clang14            Pass: 100%/1   | Total:  5m 08s | Avg:  5m 08s | Max:  5m 08s
  🟩 Clang15            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
  🟩 Clang16            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
  🟩 Clang17            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
  🟩 Clang18            Pass: 100%/7   | Total: 46m 25s | Avg:  6m 37s | Max: 14m 00s
  🟩 GCC6               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 48s | Max: 37m 18s
  🟩 GCC7               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 06s | Max: 34m 08s
  🟩 GCC8               Pass: 100%/1   | Total: 36m 02s | Avg: 36m 02s | Max: 36m 02s
  🟩 GCC9               Pass: 100%/3   | Total:  1h 47m | Avg: 35m 51s | Max: 38m 09s
  🟩 GCC10              Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
  🟩 GCC11              Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
  🟩 GCC12              Pass: 100%/1   | Total:  5m 52s | Avg:  5m 52s | Max:  5m 52s
  🟩 GCC13              Pass: 100%/8   | Total:  1h 06m | Avg:  8m 22s | Max: 15m 42s
  🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 50s | Avg:  6m 50s | Max:  6m 50s
  🟩 MSVC14.16          Pass: 100%/1   | Total: 18m 07s | Avg: 18m 07s | Max: 18m 07s | Hits:  99%/1852  
  🟩 MSVC14.29          Pass: 100%/1   | Total: 18m 09s | Avg: 18m 09s | Max: 18m 09s | Hits:  99%/1852  
  🟩 MSVC14.39          Pass: 100%/3   | Total: 55m 32s | Avg: 18m 30s | Max: 22m 43s | Hits:  99%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 14m 29s
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total:  1h 50m | Avg:  5m 49s | Max: 14m 00s
  🟩 GCC                Pass: 100%/19  | Total:  5h 57m | Avg: 18m 48s | Max: 38m 09s
  🟩 Intel              Pass: 100%/1   | Total:  6m 50s | Avg:  6m 50s | Max:  6m 50s
  🟩 MSVC               Pass: 100%/5   | Total:  1h 31m | Avg: 18m 21s | Max: 22m 43s | Hits:  99%/9260  
  🟩 NVHPC              Pass: 100%/2   | Total: 28m 37s | Avg: 14m 18s | Max: 14m 29s
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total:  9h 55m | Avg: 12m 56s | Max: 38m 09s | Hits:  99%/9260  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  8h 32m | Avg: 12m 49s | Max: 38m 09s | Hits:  99%/7408  
  🟩 TestCPU            Pass: 100%/3   | Total: 37m 40s | Avg: 12m 33s | Max: 22m 43s | Hits:  99%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 44m 49s | Avg: 14m 56s | Max: 15m 42s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  1h 39m | Avg: 19m 56s | Max: 31m 18s
  🟩 14                 Pass: 100%/4   | Total:  1h 35m | Avg: 23m 50s | Max: 37m 18s | Hits:  99%/1852  
  🟩 17                 Pass: 100%/12  | Total:  3h 15m | Avg: 16m 19s | Max: 38m 09s | Hits:  99%/3704  
  🟩 20                 Pass: 100%/23  | Total:  3h 03m | Avg:  7m 57s | Max: 22m 43s | Hits:  99%/3704

🟩 cudax: Pass: 100%/26 | Total: 2h 14m | Avg: 5m 10s | Max: 16m 17s | Hits: 92%/312

🟩 cpu
  🟩 amd64              Pass: 100%/22  | Total:  2h 04m | Avg:  5m 38s | Max: 16m 17s | Hits:  92%/312   
  🟩 arm64              Pass: 100%/4   | Total: 10m 22s | Avg:  2m 35s | Max:  2m 42s
🟩 ctk
  🟩 12.0               Pass: 100%/3   | Total: 26m 18s | Avg:  8m 46s | Max: 13m 11s | Hits:  92%/156   
  🟩 12.5               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 32s
  🟩 12.6               Pass: 100%/21  | Total:  1h 37m | Avg:  4m 37s | Max: 16m 17s | Hits:  92%/156   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/3   | Total: 26m 18s | Avg:  8m 46s | Max: 13m 11s | Hits:  92%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 32s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 37m | Avg:  4m 37s | Max: 16m 17s | Hits:  92%/156   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/26  | Total:  2h 14m | Avg:  5m 10s | Max: 16m 17s | Hits:  92%/312   
🟩 cxx
  🟩 Clang9             Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
  🟩 Clang10            Pass: 100%/1   | Total:  3m 26s | Avg:  3m 26s | Max:  3m 26s
  🟩 Clang11            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
  🟩 Clang12            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
  🟩 Clang13            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
  🟩 Clang14            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
  🟩 Clang15            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
  🟩 Clang16            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
  🟩 Clang17            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
  🟩 Clang18            Pass: 100%/4   | Total: 24m 45s | Avg:  6m 11s | Max: 16m 17s
  🟩 GCC9               Pass: 100%/1   | Total: 13m 11s | Avg: 13m 11s | Max: 13m 11s
  🟩 GCC10              Pass: 100%/1   | Total:  2m 55s | Avg:  2m 55s | Max:  2m 55s
  🟩 GCC11              Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
  🟩 GCC12              Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max: 16m 06s
  🟩 GCC13              Pass: 100%/4   | Total: 10m 59s | Avg:  2m 44s | Max:  2m 57s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 08s | Avg: 10m 08s | Max: 10m 08s | Hits:  92%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 27s | Avg:  9m 27s | Max:  9m 27s | Hits:  92%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 32s
🟩 cxx_family
  🟩 Clang              Pass: 100%/13  | Total: 54m 21s | Avg:  4m 10s | Max: 16m 17s
  🟩 GCC                Pass: 100%/9   | Total: 49m 35s | Avg:  5m 30s | Max: 16m 06s
  🟩 MSVC               Pass: 100%/2   | Total: 19m 35s | Avg:  9m 47s | Max: 10m 08s | Hits:  92%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 32s
🟩 gpu
  🟩 v100               Pass: 100%/26  | Total:  2h 14m | Avg:  5m 10s | Max: 16m 17s | Hits:  92%/312   
🟩 jobs
  🟩 Build              Pass: 100%/24  | Total:  1h 42m | Avg:  4m 15s | Max: 13m 11s | Hits:  92%/312   
  🟩 Test               Pass: 100%/2   | Total: 32m 23s | Avg: 16m 11s | Max: 16m 17s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
  🟩 90a                Pass: 100%/1   | Total:  2m 57s | Avg:  2m 57s | Max:  2m 57s
🟩 std
  🟩 17                 Pass: 100%/6   | Total: 29m 38s | Avg:  4m 56s | Max: 13m 11s
  🟩 20                 Pass: 100%/20  | Total:  1h 44m | Avg:  5m 14s | Max: 16m 17s | Hits:  92%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 37s | Avg: 4m 48s | Max: 7m 36s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  7m 36s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s
  🟩 Test               Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s

🟩 python: Pass: 100%/1 | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 170)

#	Runner
125	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
15	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

miscco · 2025-01-09T13:23:56Z

/ok to test

github-actions · 2025-01-09T15:23:17Z

🟩 CI finished in 1h 57m: Pass: 100%/168 | Total: 4d 01h | Avg: 34m 40s | Max: 1h 16m | Hits: 199%/27782

🟩 libcudacxx: Pass: 100%/48 | Total: 19h 23m | Avg: 24m 14s | Max: 1h 16m | Hits: 328%/12458

🟩 cpu
  🟩 amd64              Pass: 100%/46  | Total: 18h 43m | Avg: 24m 25s | Max:  1h 16m | Hits: 328%/12458 
  🟩 arm64              Pass: 100%/2   | Total: 39m 53s | Avg: 19m 56s | Max: 21m 08s
🟩 ctk
  🟩 12.0               Pass: 100%/8   | Total:  2h 37m | Avg: 19m 41s | Max: 35m 07s | Hits: 328%/4865  
  🟩 12.5               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 37m 09s
  🟩 12.6               Pass: 100%/38  | Total: 15h 36m | Avg: 24m 38s | Max:  1h 16m | Hits: 329%/7593  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 10m | Avg: 17m 30s | Max: 20m 52s
  🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 37m | Avg: 19m 41s | Max: 35m 07s | Hits: 328%/4865  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 37m 09s
  🟩 nvcc12.6           Pass: 100%/34  | Total: 14h 26m | Avg: 25m 28s | Max:  1h 16m | Hits: 329%/7593  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 10m | Avg: 17m 30s | Max: 20m 52s
  🟩 nvcc               Pass: 100%/44  | Total: 18h 13m | Avg: 24m 51s | Max:  1h 16m | Hits: 328%/12458 
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  1h 09m | Avg: 17m 26s | Max: 21m 21s
  🟩 Clang10            Pass: 100%/1   | Total: 22m 48s | Avg: 22m 48s | Max: 22m 48s
  🟩 Clang11            Pass: 100%/1   | Total: 22m 55s | Avg: 22m 55s | Max: 22m 55s
  🟩 Clang12            Pass: 100%/1   | Total: 22m 28s | Avg: 22m 28s | Max: 22m 28s
  🟩 Clang13            Pass: 100%/1   | Total: 21m 58s | Avg: 21m 58s | Max: 21m 58s
  🟩 Clang14            Pass: 100%/1   | Total: 21m 33s | Avg: 21m 33s | Max: 21m 33s
  🟩 Clang15            Pass: 100%/1   | Total: 21m 51s | Avg: 21m 51s | Max: 21m 51s
  🟩 Clang16            Pass: 100%/1   | Total: 20m 58s | Avg: 20m 58s | Max: 20m 58s
  🟩 Clang17            Pass: 100%/1   | Total: 22m 36s | Avg: 22m 36s | Max: 22m 36s
  🟩 Clang18            Pass: 100%/8   | Total:  3h 09m | Avg: 23m 38s | Max: 57m 55s
  🟩 GCC7               Pass: 100%/4   | Total:  1h 05m | Avg: 16m 20s | Max: 18m 28s
  🟩 GCC8               Pass: 100%/1   | Total: 20m 56s | Avg: 20m 56s | Max: 20m 56s
  🟩 GCC9               Pass: 100%/3   | Total: 53m 08s | Avg: 17m 42s | Max: 21m 20s
  🟩 GCC10              Pass: 100%/1   | Total: 23m 27s | Avg: 23m 27s | Max: 23m 27s
  🟩 GCC11              Pass: 100%/1   | Total: 21m 26s | Avg: 21m 26s | Max: 21m 26s
  🟩 GCC12              Pass: 100%/1   | Total: 24m 55s | Avg: 24m 55s | Max: 24m 55s
  🟩 GCC13              Pass: 100%/10  | Total:  4h 27m | Avg: 26m 43s | Max:  1h 16m
  🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 39m | Avg: 33m 14s | Max: 37m 07s | Hits: 328%/7347  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 21m | Avg: 40m 43s | Max: 43m 26s | Hits: 329%/5111  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 37m 09s
🟩 cxx_family
  🟩 Clang              Pass: 100%/20  | Total:  7h 16m | Avg: 21m 48s | Max: 57m 55s
  🟩 GCC                Pass: 100%/21  | Total:  7h 56m | Avg: 22m 41s | Max:  1h 16m
  🟩 MSVC               Pass: 100%/5   | Total:  3h 01m | Avg: 36m 14s | Max: 43m 26s | Hits: 328%/12458 
  🟩 NVHPC              Pass: 100%/2   | Total:  1h 09m | Avg: 34m 55s | Max: 37m 09s
🟩 gpu
  🟩 v100               Pass: 100%/48  | Total: 19h 23m | Avg: 24m 14s | Max:  1h 16m | Hits: 328%/12458 
🟩 jobs
  🟩 Build              Pass: 100%/41  | Total: 15h 23m | Avg: 22m 31s | Max: 43m 26s | Hits: 328%/12458 
  🟩 NVRTC              Pass: 100%/4   | Total:  1h 43m | Avg: 25m 54s | Max: 28m 42s
  🟩 Test               Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 16m
  🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 54s | Avg:  1m 54s | Max:  1m 54s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 14m 15s | Avg: 14m 15s | Max: 14m 15s
  🟩 90a                Pass: 100%/2   | Total: 31m 45s | Avg: 15m 52s | Max: 17m 36s
🟩 std
  🟩 11                 Pass: 100%/6   | Total:  1h 38m | Avg: 16m 23s | Max: 23m 41s
  🟩 14                 Pass: 100%/4   | Total:  1h 34m | Avg: 23m 44s | Max: 28m 42s | Hits: 328%/2393  
  🟩 17                 Pass: 100%/14  | Total:  5h 49m | Avg: 24m 59s | Max: 38m 01s | Hits: 328%/7436  
  🟩 20                 Pass: 100%/23  | Total: 10h 18m | Avg: 26m 53s | Max:  1h 16m | Hits: 330%/2629

🟩 cub: Pass: 100%/47 | Total: 1d 16h | Avg: 52m 20s | Max: 1h 09m | Hits: 26%/3900

🟩 cpu
  🟩 amd64              Pass: 100%/45  | Total:  1d 14h | Avg: 51m 55s | Max:  1h 09m | Hits:  26%/3900  
  🟩 arm64              Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 04m
🟩 ctk
  🟩 12.0               Pass: 100%/8   | Total:  7h 48m | Avg: 58m 36s | Max:  1h 04m | Hits:  26%/1560  
  🟩 12.5               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m
  🟩 12.6               Pass: 100%/37  | Total:  1d 06h | Avg: 50m 15s | Max:  1h 09m | Hits:  27%/2340  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
  🟩 nvcc12.0           Pass: 100%/8   | Total:  7h 48m | Avg: 58m 36s | Max:  1h 04m | Hits:  26%/1560  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m
  🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 04h | Avg: 49m 38s | Max:  1h 09m | Hits:  27%/2340  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
  🟩 nvcc               Pass: 100%/45  | Total:  1d 14h | Avg: 51m 57s | Max:  1h 09m | Hits:  26%/3900  
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  3h 53m | Avg: 58m 19s | Max:  1h 02m
  🟩 Clang10            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
  🟩 Clang11            Pass: 100%/1   | Total: 54m 50s | Avg: 54m 50s | Max: 54m 50s
  🟩 Clang12            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
  🟩 Clang13            Pass: 100%/1   | Total: 57m 55s | Avg: 57m 55s | Max: 57m 55s
  🟩 Clang14            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
  🟩 Clang15            Pass: 100%/1   | Total: 56m 21s | Avg: 56m 21s | Max: 56m 21s
  🟩 Clang16            Pass: 100%/1   | Total: 56m 32s | Avg: 56m 32s | Max: 56m 32s
  🟩 Clang17            Pass: 100%/1   | Total: 59m 30s | Avg: 59m 30s | Max: 59m 30s
  🟩 Clang18            Pass: 100%/7   | Total:  5h 46m | Avg: 49m 32s | Max:  1h 04m
  🟩 GCC7               Pass: 100%/4   | Total:  3h 42m | Avg: 55m 37s | Max: 56m 29s
  🟩 GCC8               Pass: 100%/1   | Total: 54m 46s | Avg: 54m 46s | Max: 54m 46s
  🟩 GCC9               Pass: 100%/3   | Total:  2h 48m | Avg: 56m 16s | Max: 59m 14s
  🟩 GCC10              Pass: 100%/1   | Total: 55m 27s | Avg: 55m 27s | Max: 55m 27s
  🟩 GCC11              Pass: 100%/1   | Total: 57m 20s | Avg: 57m 20s | Max: 57m 20s
  🟩 GCC12              Pass: 100%/3   | Total:  1h 42m | Avg: 34m 14s | Max: 59m 09s
  🟩 GCC13              Pass: 100%/8   | Total:  4h 51m | Avg: 36m 23s | Max: 58m 41s
  🟩 MSVC14.29          Pass: 100%/3   | Total:  3h 13m | Avg:  1h 04m | Max:  1h 05m | Hits:  26%/2340  
  🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  27%/1560  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total: 17h 27m | Avg: 55m 09s | Max:  1h 04m
  🟩 GCC                Pass: 100%/21  | Total: 15h 52m | Avg: 45m 22s | Max: 59m 14s
  🟩 MSVC               Pass: 100%/5   | Total:  5h 27m | Avg:  1h 05m | Max:  1h 09m | Hits:  26%/3900  
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m
🟩 gpu
  🟩 h100               Pass: 100%/2   | Total: 43m 34s | Avg: 21m 47s | Max: 27m 23s
  🟩 v100               Pass: 100%/45  | Total:  1d 16h | Avg: 53m 41s | Max:  1h 09m | Hits:  26%/3900  
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  1d 14h | Avg: 57m 31s | Max:  1h 09m | Hits:  26%/3900  
  🟩 DeviceLaunch       Pass: 100%/1   | Total: 28m 38s | Avg: 28m 38s | Max: 28m 38s
  🟩 GraphCapture       Pass: 100%/1   | Total: 19m 03s | Avg: 19m 03s | Max: 19m 03s
  🟩 HostLaunch         Pass: 100%/3   | Total: 53m 23s | Avg: 17m 47s | Max: 19m 43s
  🟩 TestGPU            Pass: 100%/2   | Total: 57m 41s | Avg: 28m 50s | Max: 29m 15s
🟩 sm
  🟩 90                 Pass: 100%/2   | Total: 43m 34s | Avg: 21m 47s | Max: 27m 23s
  🟩 90a                Pass: 100%/1   | Total: 26m 00s | Avg: 26m 00s | Max: 26m 00s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  4h 33m | Avg: 54m 43s | Max: 55m 37s
  🟩 14                 Pass: 100%/3   | Total:  3h 01m | Avg:  1h 00m | Max:  1h 04m | Hits:  26%/780   
  🟩 17                 Pass: 100%/13  | Total: 12h 59m | Avg: 59m 57s | Max:  1h 06m | Hits:  26%/2340  
  🟩 20                 Pass: 100%/26  | Total: 20h 25m | Avg: 47m 08s | Max:  1h 09m | Hits:  27%/780

🟩 thrust: Pass: 100%/46 | Total: 1d 06h | Avg: 39m 41s | Max: 1h 13m | Hits: 119%/11112

🟩 cmake_options
  🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 43m 13s | Avg: 21m 36s | Max: 31m 56s
🟩 cpu
  🟩 amd64              Pass: 100%/44  | Total:  1d 05h | Avg: 39m 50s | Max:  1h 13m | Hits: 119%/11112 
  🟩 arm64              Pass: 100%/2   | Total:  1h 12m | Avg: 36m 16s | Max: 38m 08s
🟩 ctk
  🟩 12.0               Pass: 100%/8   | Total:  5h 44m | Avg: 43m 01s | Max:  1h 08m | Hits:  62%/3704  
  🟩 12.5               Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
  🟩 12.6               Pass: 100%/36  | Total: 22h 20m | Avg: 37m 13s | Max:  1h 13m | Hits: 147%/7408  
🟩 cudacxx
  🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 38m 13s
  🟩 nvcc12.0           Pass: 100%/8   | Total:  5h 44m | Avg: 43m 01s | Max:  1h 08m | Hits:  62%/3704  
  🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
  🟩 nvcc12.6           Pass: 100%/34  | Total: 21h 07m | Avg: 37m 16s | Max:  1h 13m | Hits: 147%/7408  
🟩 cudacxx_family
  🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 38m 13s
  🟩 nvcc               Pass: 100%/44  | Total:  1d 05h | Avg: 39m 49s | Max:  1h 13m | Hits: 119%/11112 
🟩 cxx
  🟩 Clang9             Pass: 100%/4   | Total:  2h 23m | Avg: 35m 46s | Max: 39m 52s
  🟩 Clang10            Pass: 100%/1   | Total: 40m 33s | Avg: 40m 33s | Max: 40m 33s
  🟩 Clang11            Pass: 100%/1   | Total: 42m 03s | Avg: 42m 03s | Max: 42m 03s
  🟩 Clang12            Pass: 100%/1   | Total: 36m 50s | Avg: 36m 50s | Max: 36m 50s
  🟩 Clang13            Pass: 100%/1   | Total: 38m 32s | Avg: 38m 32s | Max: 38m 32s
  🟩 Clang14            Pass: 100%/1   | Total: 41m 48s | Avg: 41m 48s | Max: 41m 48s
  🟩 Clang15            Pass: 100%/1   | Total: 43m 29s | Avg: 43m 29s | Max: 43m 29s
  🟩 Clang16            Pass: 100%/1   | Total: 39m 39s | Avg: 39m 39s | Max: 39m 39s
  🟩 Clang17            Pass: 100%/1   | Total: 42m 44s | Avg: 42m 44s | Max: 42m 44s
  🟩 Clang18            Pass: 100%/7   | Total:  3h 29m | Avg: 29m 52s | Max: 40m 22s
  🟩 GCC7               Pass: 100%/4   | Total:  2h 17m | Avg: 34m 21s | Max: 38m 37s
  🟩 GCC8               Pass: 100%/1   | Total: 38m 09s | Avg: 38m 09s | Max: 38m 09s
  🟩 GCC9               Pass: 100%/3   | Total:  1h 52m | Avg: 37m 28s | Max: 40m 55s
  🟩 GCC10              Pass: 100%/1   | Total: 40m 35s | Avg: 40m 35s | Max: 40m 35s
  🟩 GCC11              Pass: 100%/1   | Total: 39m 36s | Avg: 39m 36s | Max: 39m 36s
  🟩 GCC12              Pass: 100%/1   | Total: 41m 44s | Avg: 41m 44s | Max: 41m 44s
  🟩 GCC13              Pass: 100%/8   | Total:  3h 33m | Avg: 26m 43s | Max: 44m 36s
  🟩 MSVC14.29          Pass: 100%/3   | Total:  3h 22m | Avg:  1h 07m | Max:  1h 09m | Hits:  75%/5556  
  🟩 MSVC14.39          Pass: 100%/3   | Total:  3h 00m | Avg:  1h 00m | Max:  1h 13m | Hits: 163%/5556  
  🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
🟩 cxx_family
  🟩 Clang              Pass: 100%/19  | Total: 11h 17m | Avg: 35m 40s | Max: 43m 29s
  🟩 GCC                Pass: 100%/19  | Total: 10h 23m | Avg: 32m 49s | Max: 44m 36s
  🟩 MSVC               Pass: 100%/6   | Total:  6h 23m | Avg:  1h 03m | Max:  1h 13m | Hits: 119%/11112 
  🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m
🟩 gpu
  🟩 v100               Pass: 100%/46  | Total:  1d 06h | Avg: 39m 41s | Max:  1h 13m | Hits: 119%/11112 
🟩 jobs
  🟩 Build              Pass: 100%/40  | Total:  1d 04h | Avg: 43m 23s | Max:  1h 13m | Hits:  70%/9260  
  🟩 TestCPU            Pass: 100%/3   | Total: 49m 10s | Avg: 16m 23s | Max: 34m 33s | Hits: 365%/1852  
  🟩 TestGPU            Pass: 100%/3   | Total: 40m 41s | Avg: 13m 33s | Max: 15m 24s
🟩 sm
  🟩 90a                Pass: 100%/1   | Total: 25m 38s | Avg: 25m 38s | Max: 25m 38s
🟩 std
  🟩 11                 Pass: 100%/5   | Total:  2h 37m | Avg: 31m 30s | Max: 32m 18s
  🟩 14                 Pass: 100%/3   | Total:  2h 21m | Avg: 47m 12s | Max:  1h 04m | Hits:  62%/1852  
  🟩 17                 Pass: 100%/13  | Total: 10h 35m | Avg: 48m 53s | Max:  1h 12m | Hits:  75%/5556  
  🟩 20                 Pass: 100%/23  | Total: 14h 07m | Avg: 36m 51s | Max:  1h 13m | Hits: 214%/3704

🟩 cudax: Pass: 100%/24 | Total: 5h 41m | Avg: 14m 12s | Max: 19m 10s | Hits: 62%/312

🟩 cpu
  🟩 amd64              Pass: 100%/20  | Total:  4h 46m | Avg: 14m 18s | Max: 19m 10s | Hits:  62%/312   
  🟩 arm64              Pass: 100%/4   | Total: 55m 05s | Avg: 13m 46s | Max: 14m 31s
🟩 ctk
  🟩 12.0               Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s | Hits:  62%/156   
  🟩 12.5               Pass: 100%/2   | Total: 18m 57s | Avg:  9m 28s | Max:  9m 50s
  🟩 12.6               Pass: 100%/21  | Total:  5h 10m | Avg: 14m 47s | Max: 19m 10s | Hits:  62%/156   
🟩 cudacxx
  🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s | Hits:  62%/156   
  🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 57s | Avg:  9m 28s | Max:  9m 50s
  🟩 nvcc12.6           Pass: 100%/21  | Total:  5h 10m | Avg: 14m 47s | Max: 19m 10s | Hits:  62%/156   
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/24  | Total:  5h 41m | Avg: 14m 12s | Max: 19m 10s | Hits:  62%/312   
🟩 cxx
  🟩 Clang10            Pass: 100%/1   | Total: 14m 30s | Avg: 14m 30s | Max: 14m 30s
  🟩 Clang11            Pass: 100%/1   | Total: 13m 47s | Avg: 13m 47s | Max: 13m 47s
  🟩 Clang12            Pass: 100%/1   | Total: 15m 07s | Avg: 15m 07s | Max: 15m 07s
  🟩 Clang13            Pass: 100%/1   | Total: 15m 23s | Avg: 15m 23s | Max: 15m 23s
  🟩 Clang14            Pass: 100%/1   | Total: 13m 27s | Avg: 13m 27s | Max: 13m 27s
  🟩 Clang15            Pass: 100%/1   | Total: 15m 31s | Avg: 15m 31s | Max: 15m 31s
  🟩 Clang16            Pass: 100%/1   | Total: 15m 52s | Avg: 15m 52s | Max: 15m 52s
  🟩 Clang17            Pass: 100%/1   | Total: 16m 53s | Avg: 16m 53s | Max: 16m 53s
  🟩 Clang18            Pass: 100%/4   | Total:  1h 01m | Avg: 15m 21s | Max: 19m 10s
  🟩 GCC10              Pass: 100%/1   | Total: 17m 22s | Avg: 17m 22s | Max: 17m 22s
  🟩 GCC11              Pass: 100%/1   | Total: 15m 20s | Avg: 15m 20s | Max: 15m 20s
  🟩 GCC12              Pass: 100%/2   | Total: 32m 38s | Avg: 16m 19s | Max: 16m 32s
  🟩 GCC13              Pass: 100%/4   | Total: 51m 15s | Avg: 12m 48s | Max: 14m 14s
  🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 39s | Avg: 11m 39s | Max: 11m 39s | Hits:  62%/156   
  🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 59s | Avg: 11m 59s | Max: 11m 59s | Hits:  62%/156   
  🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 57s | Avg:  9m 28s | Max:  9m 50s
🟩 cxx_family
  🟩 Clang              Pass: 100%/12  | Total:  3h 01m | Avg: 15m 09s | Max: 19m 10s
  🟩 GCC                Pass: 100%/8   | Total:  1h 56m | Avg: 14m 34s | Max: 17m 22s
  🟩 MSVC               Pass: 100%/2   | Total: 23m 38s | Avg: 11m 49s | Max: 11m 59s | Hits:  62%/312   
  🟩 NVHPC              Pass: 100%/2   | Total: 18m 57s | Avg:  9m 28s | Max:  9m 50s
🟩 gpu
  🟩 v100               Pass: 100%/24  | Total:  5h 41m | Avg: 14m 12s | Max: 19m 10s | Hits:  62%/312   
🟩 jobs
  🟩 Build              Pass: 100%/22  | Total:  5h 05m | Avg: 13m 52s | Max: 17m 22s | Hits:  62%/312   
  🟩 Test               Pass: 100%/2   | Total: 35m 42s | Avg: 17m 51s | Max: 19m 10s
🟩 sm
  🟩 90                 Pass: 100%/1   | Total: 11m 02s | Avg: 11m 02s | Max: 11m 02s
  🟩 90a                Pass: 100%/1   | Total: 12m 08s | Avg: 12m 08s | Max: 12m 08s
🟩 std
  🟩 17                 Pass: 100%/4   | Total: 46m 29s | Avg: 11m 37s | Max: 13m 51s
  🟩 20                 Pass: 100%/20  | Total:  4h 54m | Avg: 14m 43s | Max: 19m 10s | Hits:  62%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 54s | Avg: 4m 57s | Max: 7m 40s

🟩 cpu
  🟩 amd64              Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 ctk
  🟩 12.6               Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 cxx
  🟩 GCC13              Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 cxx_family
  🟩 GCC                Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 gpu
  🟩 v100               Pass: 100%/2   | Total:  9m 54s | Avg:  4m 57s | Max:  7m 40s
🟩 jobs
  🟩 Build              Pass: 100%/1   | Total:  2m 14s | Avg:  2m 14s | Max:  2m 14s
  🟩 Test               Pass: 100%/1   | Total:  7m 40s | Avg:  7m 40s | Max:  7m 40s

🟩 python: Pass: 100%/1 | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 ctk
  🟩 12.6               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 cudacxx
  🟩 nvcc12.6           Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 gpu
  🟩 v100               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
+/-	libcu++
	CUB
	Thrust
	CUDA Experimental
	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
+/-	libcu++
+/-	CUB
+/-	Thrust
+/-	CUDA Experimental
+/-	python
+/-	CCCL C Parallel Library
+/-	Catch2Helper

🏃‍ Runner counts (total jobs: 168)

#	Runner
120	`linux-amd64-cpu16`
19	`linux-amd64-gpu-v100-latest-1`
18	`windows-amd64-cpu16`
10	`linux-arm64-cpu16`
1	`linux-amd64-gpu-h100-latest-1-testing`

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

implement `add_sat`
split `signed`/`unsigned` implementation, improve implementation for MSVC
improve device `add_sat` implementation
add `add_sat` test
improve generic `add_sat` implementation for signed types
implement `sub_sat`
allow more msvc intrinsics on x86
add op tests
partially implement `mul_sat`
implement `div_sat` and `saturate_cast`
add `saturate_cast` test
simplify `div_sat` test
Deprecate C++11 and C++14 for libcu++ (#3173)
  * Deprecate C++11 and C++14 for libcu++
  Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Implement `abs` and `div` from `cstdlib` (#3153)
  * implement integer abs functions
  * improve tests, fix constexpr support
  * just use our implementation
  * implement `cuda::std::div`
  * prefer host's `div_t`-like types
  * provide `cuda::std::abs` overloads for floats
  * allow fp abs for NVRTC
  * silence msvc's warning about conversion from floating point to integral
Fix missing radix sort policies (#3174)
  Fixes NVBug 5009941
Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148)
  * introduces new arg{min,max} interface with two output iterators
  * adds fp inf tests
  * fixes docs
  * improves code example
  * fixes exec space specifier
  * trying to fix deprecation warning for more compilers
  * inlines unzip operator
  * trying to fix deprecation warning for nvhpc
  * integrates suppression fixes in diagnostics
  * pre-ctk 11.5 deprecation suppression
  * fixes icc
  * fix for pre-ctk11.5
  * cleans up deprecation suppression
  * cleanup
Extend tuning documentation (#3179)
Add codespell pre-commit hook, fix typos in CCCL (#3168)
  * Add codespell pre-commit hook
  * Automatic changes from codespell.
  * Manual changes.
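
The `add_sat`, `sub_sat`, `mul_sat`, `div_sat` and `saturate_cast` commits listed above implement the C++26 saturation-arithmetic functions, which the standard places in `<numeric>`. This minimal C++17 sketch shows what a generic, intrinsic-free saturating add and subtract must do. The names `add_sat_sketch` and `sub_sat_sketch` are hypothetical. The commit titles mention MSVC intrinsics and a separate device implementation, and neither is reproduced here.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical portable saturating add/sub for integral types (not libcu++'s code).
// Overflow is detected *before* the operation, so signed arithmetic never overflows.
template <class T>
constexpr T add_sat_sketch(T x, T y) noexcept
{
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>, "integral types only");
  using lim = std::numeric_limits<T>;
  if constexpr (std::is_unsigned_v<T>)
  {
    const T r = static_cast<T>(x + y); // wraps modulo 2^N
    return r < x ? lim::max() : r;     // a wrapped result means the true sum exceeded max
  }
  else
  {
    if (y > 0 && x > lim::max() - y) return lim::max();
    if (y < 0 && x < lim::min() - y) return lim::min();
    return static_cast<T>(x + y);
  }
}

template <class T>
constexpr T sub_sat_sketch(T x, T y) noexcept
{
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>, "integral types only");
  using lim = std::numeric_limits<T>;
  if constexpr (std::is_unsigned_v<T>)
  {
    return x < y ? T{0} : static_cast<T>(x - y);
  }
  else
  {
    if (y < 0 && x > lim::max() + y) return lim::max();
    if (y > 0 && x < lim::min() + y) return lim::min();
    return static_cast<T>(x - y);
  }
}
```

For signed types, the bounds `max - y` and `min - y` are computed only when they cannot overflow. This is why the sign of `y` is tested first.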
Fix parameter space for TUNE_LOAD in scan benchmark (#3176)
fix various old compiler checks (#3178)
implement C++26 `std::projected` (#3175)
Fix pre-commit config for codespell and remaining typos (#3182)
Massive cleanup of our config (#3155)
Fix UB in atomics with automatic storage (#2586)
  * Adds specialized local cuda atomics and injects them into most atomics paths.
    Co-authored-by: Georgy Evtushenko <[email protected]>
    Co-authored-by: gonzalobg <[email protected]>
  * Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478
  * Remove extraneous double brackets in unformatted code.
  * Merge unsafe atomic logic into `__cuda_is_local`.
  * Use `const_cast` for type conversions in cuda_local.h
  * Fix build issues from interface changes
  * Fix missing __nanosleep on sm70-
  * Guard __isLocal from NVHPC
  * Use PTX instead of running nothing from NVHPC
  * fixup /s/nvrtc/nvhpc
  * Fixup missing CUDA ifdef surrounding device code
  * Fix codegen
  * Bypass some sort of compiler bug on GCC7
  * Apply suggestions from code review
  * Use unsafe automatic storage atomics in codegen tests
  ---------
  Co-authored-by: Georgy Evtushenko <[email protected]>
  Co-authored-by: gonzalobg <[email protected]>
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Refactor the source code layout for `cuda.parallel` (#3177)
  * Refactor the source layout for cuda.parallel
  * Add copyright
  * Address review feedback
  * Don't import anything into `experimental` namespace
  * fix import
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
new type-erased memory resources (#2824)
s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186)
Document address stability of `thrust::transform` (#3181)
  * Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS
  * Reformat and fix UnaryFunction/BinaryFunction in transform docs
  * Mention transform can use proclaim_copyable_arguments
  * Document cuda::proclaims_copyable_arguments better
  * Deprecate depending on transform functor argument addresses
  Fixes: #3053
turn off cuda version check for clangd (#3194)
[STF] jacobi example based on parallel_for (#3187)
  * Simple jacobi example with parallel for and reductions
  * clang-format
  * remove useless capture list
fixes pre-nv_diag suppression issues (#3189)
Prefer c2h::type_name over c2h::demangle (#3195)
Fix memcpy_async* tests (#3197)
  * memcpy_async_tx: Fix bug in test
    Two bugs, one of which occurs in practice:
    1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice)
    2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. (Has not yet manifested itself)
  * cp_async_bulk_tensor: Pre-emptively fence more in test
Add type annotations and mypy checks for `cuda.parallel` (#3180)
  * Refactor the source layout for cuda.parallel
  * Add initial type annotations
  * Update pre-commit config
  * More typing
  * Fix bad merge
  * Fix TYPE_CHECKING and numpy annotations
  * typing bindings.py correctly
  * Address review feedback
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
Fix rendering of cuda.parallel docs (#3192)
  * Fix pre-commit config for codespell and remaining typos
  * Fix rendering of docs for cuda.parallel
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
Enable PDL for DeviceMergeSortBlockSortKernel (#3199)
  The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC. This commit enables PDL when launching the kernel.
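
The `[STF] jacobi example based on parallel_for (#3187)` entry describes a Jacobi solver built from a `parallel_for` and reductions. The STF code itself is not on this page, so what follows is only a serial C++17 sketch of the numerical scheme. It assumes a 1-D Laplace problem with fixed boundary values. `jacobi_1d_sketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial sketch of Jacobi relaxation for the 1-D Laplace equation u'' = 0 with
// fixed boundary values u.front() and u.back(). Each sweep reads only the previous
// iterate, so the inner loop has no cross-iteration dependencies (a "parallel for").
// The largest per-point change is combined with a max-reduction to test convergence.
// Returns the number of sweeps performed.
inline std::size_t jacobi_1d_sketch(std::vector<double>& u, double tol, std::size_t max_sweeps)
{
  assert(u.size() >= 2 && tol > 0.0);
  std::vector<double> next = u; // boundary entries are copied once and never written
  for (std::size_t sweep = 1; sweep <= max_sweeps; ++sweep)
  {
    double max_delta = 0.0;
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
    {
      next[i]   = 0.5 * (u[i - 1] + u[i + 1]);
      max_delta = std::max(max_delta, std::abs(next[i] - u[i]));
    }
    u.swap(next); // the new iterate becomes the current one
    if (max_delta < tol)
    {
      return sweep;
    }
  }
  return max_sweeps;
}
```

With boundary values 0 and 1, the iterates converge to the straight line between them. This gives a simple correctness check.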
Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647)
  * adds benchmarks for reduce::arg{min,max}
  * preliminary streaming arg-extremum reduction
  * fixes implicit conversion
  * uses streaming dispatch class
  * changes arg benches to use new streaming reduce
  * streaming arg-extrema reduction
  * fixes style
  * fixes compilation failures
  * cleanups
  * adds rst style comments
  * declare vars const and use clamp
  * consolidates argmin argmax benchmarks
  * fixes thrust usage
  * drops offset type in arg-extrema benchmarks
  * fixes clang cuda
  * exec space macros
  * switch to signed global offset type for slightly better perf
  * clarifies documentation
  * applies minor benchmark style changes from review comments
  * fixes interface documentation and comments
  * list-init accumulating output op
  * improves style, comments, and tests
  * cleans up aggregate init
  * renames dispatch class usage in benchmarks
  * fixes merge conflicts
  * addresses review comments
  * addresses review comments
  * fixes assertion
  * removes superseded implementation
  * changes large problem tests to use new interface
  * removes obsolete tests for deprecated interface
Fixes for Python 3.7 docs environment (#3206)
  Co-authored-by: Ashwin Srinath <[email protected]>
Adds support for large number of items to `DeviceTransform` (#3172)
  * moves large problem test helper to common file
  * adds support for large num items to device transform
  * adds tests for large number of items to device interface
  * fixes format
  * addresses review comments
cp_async_bulk: Fix test (#3198)
  * memcpy_async_tx: Fix bug in test
    Two bugs, one of which occurs in practice:
    1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice)
    2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. (Has not yet manifested itself)
  * cp_async_bulk_tensor: Pre-emptively fence more in test
  * cp_async_bulk: Fix test
    The global memory pointer could be misaligned.
cudax fixes for msvc 14.41 (#3200)
avoid instantiating class templates in `is_same` implementation when possible (#3203)
Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209)
  * Fix: make launchers a CUB detail; make kernel source functions hidden.
  * [pre-commit.ci] auto code formatting
  * Address review comments, fix which macro gets fixed.
help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202)
unify macros and cmake options that control the suppression of deprecation warnings (#3220)
  * unify macros and cmake options that control the suppression of deprecation warnings
  * suppress nvcc warning #186 in thrust header tests
  * suppress c++ dialect deprecation warnings in libcudacxx header tests
Fix thread-reduce performance regression (#3225)
cuda.parallel: In-memory caching of build objects (#3216)
  * Define __eq__ and __hash__ for Iterators
  * Define cache_with_key utility and use it to cache Reduce objects
  * Add tests for caching Reduce objects
  * Tighten up types
  * Updates to support 3.7
  * Address review feedback
  * Introduce IteratorKind to hold iterator type information
  * Use the .kind to generate an abi_name
  * Remove __eq__ and __hash__ methods from IteratorBase
  * Move helper function
  * Formatting
  * Don't unpack tuple in cache key
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
Just enough ranges for c++14 `span` (#3211)
use generalized concepts portability macros to simplify the `range` concept (#3217)
  fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR`
Use Ruff to sort imports (#3230)
  * Update pyproject.tomls for import sorting
  * Update files after running pre-commit
  * Move ruff config to pyproject.toml
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
fix tuning_scan sm90 config issue (#3236)
  Co-authored-by: Shijie Chen <[email protected]>
[STF] Logical token (#3196)
  * Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs.
  * Add missing files
  * Check if a task implementation can match a prototype where the void_interface arguments are ignored
  * Implement ctx.abstract_logical_data() which relies on a void data interface
  * Illustrate how to use abstract handles in local contexts
  * Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages
  * Small improvements in the examples
  * Do not try to allocate or move void data
  * Do not use I as a variable
  * fix linkage error
  * rename abstract_logical_data into logical_token
  * Document logical token
  * fix spelling error
  * fix sphinx error
  * reflect name changes
  * use meaningful variable names
  * simplify logical_token implementation because writeback is already disabled
  * add a unit test for token elision
  * implement token elision in host_launch
  * Remove unused type
  * Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens
  * Much simpler is_tuple_invocable_with_filtered implementation
  * Fix buggy test
  * Factorize code
  * Document that we can ignore tokens for task and host_launch
  * Documentation for logical data freeze
Fix ReduceByKey tuning (#3240)
Fix RLE tuning (#3239)
cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233)
  * Forbid non-contiguous arrays as inputs (or outputs)
  * Implement a more robust way to check for contiguity
  * Don't bother if cublas unavailable
  * Fix how we check for zero-element arrays
  * sort imports
  ---------
  Co-authored-by: Ashwin Srinath <[email protected]>
expands support for more offset types in segmented benchmark (#3231)
Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects (#3253)
  * Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects
  * Do not add option twice
ptx: Add add_instruction.py (#3190)
  This file helps create the necessary structure for new PTX instructions.
  Co-authored-by: Allard Hendriksen <[email protected]>
Bump main to 2.9.0. (#3247)
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Drop cub::Mutex (#3251)
  Fixes: #3250
Remove legacy macros from CUB util_arch.cuh (#3257)
  Fixes: #3256
Remove thrust::[unary|binary]_traits (#3260)
  Fixes: #3259
Architecture and OS identification macros (#3237)
Bump main to 3.0.0. (#3265)
  Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Drop thrust not1 and not2 (#3264)
  Fixes: #3263
CCCL Internal macro documentation (#3238)
Deprecate GridBarrier and GridBarrierLifetime (#3258)
  Fixes: #1389
Require at least gcc7 (#3268)
  Fixes: #3267
Drop thrust::[unary|binary]_function (#3274)
  Fixes: #3273
Drop ICC from CI (#3277)
[STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270)
  * Add a test to reproduce a bug observed with parallel_for on a host place
  * clang-format
  * use _CCCL_ASSERT
  * Attempt to debug
  * do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead
  * fix lambda expression
  * clang-format
Enable thrust::identity test for non-MSVC (#3281)
  This seems to be an oversight when the test was added
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Enable PDL in triple chevron launch (#3282)
  It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed to _CCCL_HAS_PDL during the review introducing the feature.
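
Entries #2647 (`DeviceReduce::{ArgMin,ArgMax}`) and #3172 (`DeviceTransform`) add support for large numbers of items. The #2647 body mentions a "streaming arg-extremum reduction" and a signed global offset type. The serial C++ sketch below illustrates only that general idea and is not CUB's dispatch code:

* The input is split into partitions small enough for 32-bit local offsets.
* Each partition's local result is merged into a running result that carries a 64-bit global index.

`streaming_argmin_sketch` and its `max_partition_size` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical sketch of a streaming arg-min: the input is processed in partitions
// that are small enough for 32-bit local offsets, while the running result keeps a
// 64-bit global index. Returns {index, value}; the index is -1 when n == 0.
template <class T>
std::pair<std::int64_t, T> streaming_argmin_sketch(const T* data, std::int64_t n, std::int32_t max_partition_size)
{
  assert(max_partition_size > 0 && n >= 0);
  std::pair<std::int64_t, T> best{-1, T{}};
  for (std::int64_t start = 0; start < n; start += max_partition_size)
  {
    const std::int64_t remaining = n - start;
    const auto count = static_cast<std::int32_t>(remaining < max_partition_size ? remaining : max_partition_size);

    // Local pass over one partition, using only 32-bit offsets.
    const T* part          = data + start;
    std::int32_t local_idx = 0;
    for (std::int32_t i = 1; i < count; ++i)
    {
      if (part[i] < part[local_idx]) // strict '<' keeps the first occurrence
      {
        local_idx = i;
      }
    }

    // Combine with the running result; earlier partitions win ties.
    if (best.first < 0 || part[local_idx] < best.second)
    {
      best = std::make_pair(start + static_cast<std::int64_t>(local_idx), part[local_idx]);
    }
  }
  return best;
}
```

Using a strict `<` both within and across partitions returns the first occurrence of the minimum, regardless of how the input is partitioned.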
Disambiguate line continuations and macro continuations in <nv/target> (#3244)
Drop VS 2017 from CI (#3287)
  Fixes: #3286
Drop ICC support in code (#3279)
  * Drop ICC from code
  Fixes: #3278
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Make CUB NVRTC commandline arguments come from a cmake template (#3292)
Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template as in the feature request template (#3295)
Use process isolation instead of default hyper-v for Windows. (#3294)
  Try improving build times by using process isolation instead of hyper-v
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
[pre-commit.ci] pre-commit autoupdate (#3248)
  * [pre-commit.ci] pre-commit autoupdate
    updates:
    - [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6)
    - [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6)
    - [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1)
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Drop Thrust legacy arch macros (#3298)
  Which were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS
Drop Thrust's compiler_fence.h (#3300)
Drop CTK 11.x from CI (#3275)
  * Add cuda12.0-gcc7 devcontainer
  * Move MSVC2017 jobs to CTK 12.6
    That is the only combination where rapidsai has devcontainers
  * Add /Zc:__cplusplus for the libcudacxx tests
  * Only add escape hatch for affected CTKs
  * Workaround missing cudaLaunchKernelEx on MSVC
    cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which is reported wrongly for MSVC. CTK 12.3 fixed this by additionally detecting _MSC_VER. As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK.
  * Workaround nvcc+MSVC issue
  * Regenerate devcontainers
  Fixes: #3249
  Co-authored-by: Michael Schellenberger Costa <[email protected]>
Drop CUB's util_compiler.cuh (#3302)
  All contained macros were deprecated
Update packman and repo_docs versions (#3293)
  Co-authored-by: Ashwin Srinath <[email protected]>
Drop Thrust's deprecated compiler macros (#3301)
Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305)
Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506)
  * adds support for large number of items to three-way partition
  * adapts interface to use choose_signed_offset_t
  * integrates applicable feedback from device-select pr
  * changes behavior for empty problems
  * unifies grid constant macro
  * fixes kernel template specialization mismatch
  * integrates _CCCL_GRID_CONSTANT changes
  * resolve merge conflicts
  * fixes checks in test
  * fixes test verification
  * improves tests
  * makes few improvements to streaming dispatch
  * improves code comment on test
  * fixes unrelated compiler error
  * minor style improvements
Refactor scan tunings (#3262)
Require C++17 for compiling Thrust and CUB (#3255)
  * Issue an unsuppressable warning when compiling with < C++17
  * Remove C++11/14 presets
  * Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers
  * Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14]
  * Remove CUB_ENABLE_DIALECT_CPP[11|14]
  * Update CI runs
  * Remove C++11/14 CI runs for CUB and Thrust
  * Raise compiler minimum versions for C++17
  * Update ReadMe
  * Drop Thrust's cpp14_required.h
  * Add escape hatch for C++17 removal
  Fixes: #3252
Implement `views::empty` (#3254)
  * Disable pair conversion of subrange with clang in C++17
  * Fix namespace views
  * Implement `views::empty`
    This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view
Refactor `limits` and `climits` (#3221)
  * implement
builtins for huge val, nan and nans * change `INFINITY` and `NAN` implementation for NVRTC cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311) * Add tests demonstrating usage of different iterators * Update documentation of reduce_into by merging import code snippet with the rest of the example * Add documentation for current iterators * Run pre-commit checks and update accordingly * Fix comments to refer to the proper lines in the code snippets in the docs Drop clang<14 from CI, update devcontainers. (#3309) Co-authored-by: Bernhard Manfred Gruber <[email protected]> [STF] Cleanup task dependencies object constructors (#3291) * Define tag types for access modes * - Rework how we build task_dep objects based on access mode tags - pack_state is now responsible for using a const_cast for read only data * Greatly simplify the previous attempt: do not define new types, but use integral constants based on the enums * It seems the const_cast was not necessary, so we can simplify it and not even do some dispatch based on access modes Disable test with a gcc-14 regression (#3297) Deprecate Thrust's cpp_compatibility.h macros (#3299) Remove dropped function objects from docs (#3319) Document `NV_TARGET` macros (#3313) [STF] Define ctx.pick_stream() which was missing for the unified context (#3326) * Define ctx.pick_stream() which was missing for the unified context * clang-format Deprecate cub::IterateThreadStore (#3337) Drop CUB's BinaryFlip operator (#3332) Deprecate cub::Swap (#3333) Clarify transform output can overlap input (#3323) Drop CUB APIs with a debug_synchronous parameter (#3330) Fixes: #3329 Drop CUB's util_compiler.cuh for real (#3340) PR #3302 planned to drop the file, but only dropped its content. This was an oversight. So let's drop the entire file.
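The `limits`/`climits` refactor (#3221) implements builtins for huge_val, nan and nans, and the PR description says it uses bit_cast for inf and nan when no builtin is supported. The host-only sketch below shows that pattern for `float`. It assumes IEEE 754 binary32 and that GCC and Clang provide `__builtin_huge_valf`, `__builtin_nanf` and `__builtin_nansf`; all `sketch_*` names and the `SKETCH_HAS_FP_BUILTINS` macro are hypothetical, and this is not the libcu++ source. The real headers presumably rely on `cuda::std::bit_cast`; here a `memcpy` stand-in keeps the example compilable as C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical memcpy-based stand-in for bit_cast (std::bit_cast needs C++20).
template <class To, class From>
To sketch_bit_cast(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "bit_cast requires equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Assumption: GCC and Clang provide the builtins; other compilers fall back.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_HAS_FP_BUILTINS 1
#endif

// +infinity: exponent bits all set, mantissa zero.
inline float sketch_huge_valf() noexcept {
#ifdef SKETCH_HAS_FP_BUILTINS
  return __builtin_huge_valf();
#else
  return sketch_bit_cast<float>(std::uint32_t{0x7f800000u});
#endif
}

// Quiet NaN: exponent bits all set, most significant mantissa bit set.
inline float sketch_nanf() noexcept {
#ifdef SKETCH_HAS_FP_BUILTINS
  return __builtin_nanf("");
#else
  return sketch_bit_cast<float>(std::uint32_t{0x7fc00000u});
#endif
}

// Signaling NaN: quiet bit clear, another mantissa bit set.
inline float sketch_nansf() noexcept {
#ifdef SKETCH_HAS_FP_BUILTINS
  return __builtin_nansf("");
#else
  return sketch_bit_cast<float>(std::uint32_t{0x7fa00000u});
#endif
}
```

Keeping the builtin as the first choice lets the compiler fold the value; the bit-pattern fallback only matters where no builtin exists.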
Drop cub::ValueCache (#3346) limits offset types for merge sort (#3328) Drop CDPv1 (#3344) Fixes: #3341 Drop thrust::void_t (#3362) Use cuda::std::addressof in Thrust (#3363) Fix all_of documentation for empty ranges (#3358) all_of always returns true on an empty range. [STF] Do not keep track of dangling events in a CUDA graph backend (#3327) * Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources. * replace can_ignore_dangling_events by track_dangling_events which leads to more readable code * When not storing the dangling events, we must still perform the deinit operations that were producing these events ! Extract scan kernels into NVRTC-compilable header (#3334) * Extract scan kernels into NVRTC-compilable header * Update cub/cub/device/dispatch/dispatch_scan.cuh Co-authored-by: Georgii Evtushenko <[email protected]> --------- Co-authored-by: Ashwin Srinath <[email protected]> Co-authored-by: Georgii Evtushenko <[email protected]> Drop deprecated aliases in Thrust functional (#3272) Fixes: #3271 Drop cub::DivideAndRoundUp (#3347) Use cuda::std::min/max in Thrust (#3364) Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361) * implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` Cleanup util_arch (#2773) Deprecate thrust::null_type (#3367) Deprecate cub::DeviceSpmv (#3320) Fixes: #896 Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * fixes spelling * adds tests for large number of segments * fixes narrowing conversion in tests * addresses review comments * fixes includes Compile basic infra test with C++17 (#3377) Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (#3308) * fixes segment 
offset generation * switches to analytical verification * switches to analytical verification for pairs * addresses review comments * introduces segment offset type * adds tests for large number of segments * adds support for large number of segments * drops segment offset type * fixes thrust namespace * removes about-to-be-deprecated cub iterators * no exec specifier on defaulted ctor * fixes gcc7 linker error * uses local_segment_index_t throughout * determine offset type based on type returned by segment iterator begin/end iterators * minor style improvements Exit with error when RAPIDS CI fails. (#3385) cuda.parallel: Support structured types as algorithm inputs (#3218) * Introduce gpu_struct decorator and typing * Enable `reduce` to accept arrays of structs as inputs * Add test for reducing arrays-of-struct * Update documentation * Use a numpy array rather than ctypes object * Change zeros -> empty for output array and temp storage * Add a TODO for typing GpuStruct * Documentation updates * Remove test_reduce_struct_type from test_reduce.py * Revert to `to_cccl_value()` accepting ndarray + GpuStruct * Bump copyrights --------- Co-authored-by: Ashwin Srinath <[email protected]> Deprecate thrust::async (#3324) Fixes: #100 Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342) Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314) * add compiler-specific path * fix device code path * add _CCCL_ASSUME Deprecate thrust::numeric_limits (#3366) Replace `typedef` with `using` in libcu++ (#3368) Deprecate thrust::optional (#3307) Fixes: #3306 Upgrade to Catch2 3.8 (#3310) Fixes: #1724 refactor `<cuda/std/cstdint>` (#3325) Co-authored-by: Bernhard Manfred Gruber <[email protected]> Update CODEOWNERS (#3331) * Update CODEOWNERS * Update CODEOWNERS * Update CODEOWNERS * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix sign-compare warning (#3408) Implement more cmath functions to be usable
on host and device (#3382) * Implement more cmath functions to be usable on host and device * Implement math roots functions * Implement exponential functions Redefine and deprecate thrust::remove_cvref (#3394) * Redefine and deprecate thrust::remove_cvref Co-authored-by: Michael Schellenberger Costa <[email protected]> Fix assert definition for NVHPC due to constexpr issues (#3418) NVHPC cannot decide at compile time where the code would run, so _CCCL_ASSERT within a constexpr function breaks it. Fix this by always using the host definition which should also work on device. Fixes #3411 Extend CUB reduce benchmarks (#3401) * Rename max.cu to custom.cu, since it uses a custom operator * Extend types covered by min.cu to all fundamental types * Add some notes on how to collect tuning parameters Fixes: #3283 Update upload-pages-artifact to v3 (#3423) * Update upload-pages-artifact to v3 * Empty commit --------- Co-authored-by: Ashwin Srinath <[email protected]> Replace and deprecate thrust::cuda_cub::terminate (#3421) `std::linalg` accessors and `transposed_layout` (#2962) Add round up/down to multiple (#3234) [FEA]: Introduce Python module with CCCL headers (#3201) * Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative * Run `copy_cccl_headers_to_aude_include()` before `setup()` * Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path. * Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel * Bug fix: cuda/_include only exists after shutil.copytree() ran. * Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py * Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions) * Replace := operator (needs Python 3.8+) * Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md * Restore original README.md: `pip3 install -e` now works on first pass.
* cuda_cccl/README.md: FOR INTERNAL USE ONLY * Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917) Command used: ci/update_version.sh 2 8 0 * Modernize pyproject.toml, setup.py Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178 * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996 * Install CCCL headers under cuda.cccl.include Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562 Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely. * Factor out cuda_cccl/cuda/cccl/include_paths.py * Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative * Add missing Copyright notice. * Add missing __init__.py (cuda.cccl) * Add `"cuda.cccl"` to `autodoc.mock_imports` * Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.) * Add # TODO: move this to a module-level import * Modernize cuda_cooperative/pyproject.toml, setup.py * Convert cuda_cooperative to use hatchling as build backend. * Revert "Convert cuda_cooperative to use hatchling as build backend." This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe. * Move numpy from [build-system] requires -> [project] dependencies * Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH * Remove copy_license() and use license_files=["../../LICENSE"] instead. 
* Further modernize cuda_cccl/setup.py to use pathlib * Trivial simplifications in cuda_cccl/pyproject.toml * Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code * Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml * Add taplo-pre-commit to .pre-commit-config.yaml * taplo-pre-commit auto-fixes * Use pathlib in cuda_cooperative/setup.py * CCCL_PYTHON_PATH in cuda_cooperative/setup.py * Modernize cuda_parallel/pyproject.toml, setup.py * Use pathlib in cuda_parallel/setup.py * Add `# TOML lint & format` comment. * Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml * Use pathlib in cuda/cccl/include_paths.py * pre-commit autoupdate (EXCEPT clang-format, which was manually restored) * Fixes after git merge main * Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result' ``` =========================================================================== warnings summary =========================================================================== tests/test_reduce.py::test_reduce_non_contiguous /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080> Traceback (most recent call last): File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__ bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result)) ^^^^^^^^^^^^^^^^^ AttributeError: '_Reduce' object has no attribute 'build_result' warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ============================================================== ``` * Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class 
CustomBuildPy` * Introduce cuda_cooperative/constraints.txt * Also add cuda_parallel/constraints.txt * Add `--constraint constraints.txt` in ci/test_python.sh * Update Copyright dates * Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024) For completeness: The other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI. * Remove unused cuda_parallel jinja2 dependency (noticed by chance). * Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead. * Make cuda_cooperative, cuda_parallel testing completely independent. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]" This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2. Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971 * Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Restore original ci/matrix.yaml [skip-rapids] * Use for loop in test_python.sh to avoid code duplication.
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci] * Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]" This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d. * Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460) * Address feedback by @leofang --------- Co-authored-by: Bernhard Manfred Gruber <[email protected]> cuda.parallel: Add optional stream argument to reduce_into() (#3348) * Add optional stream argument to reduce_into() * Add tests to check for reduce_into() stream behavior * Move protocol related utils to separate file and rework __cuda_stream__ error messages * Fix synchronization issue in stream test and add one more invalid stream test case * Rename cuda stream validation function after removing leading underscore * Unpack values from __cuda_stream__ instead of indexing * Fix linting errors * Handle TypeError when unpacking invalid __cuda_stream__ return * Use stream to allocate cupy memory in new stream test Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434) Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419) * Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ Fixes #3404 move to c++17, finalize device optimization fix msvc compilation, update tests Deprectate C++11 and C++14 for libcu++ (#3173) * Deprectate C++11 and C++14 for libcu++ Co-authored-by: Bernhard Manfred Gruber <[email protected]> Implement `abs` and `div` from `cstdlib` (#3153) * implement integer abs functions * improve tests, fix constexpr support * just use our implementation * implement `cuda::std::div` * prefer host's `div_t` like types * provide `cuda::std::abs` overloads for floats * allow fp abs for NVRTC * silence msvc's warning about conversion from
floating point to integral Fix missing radix sort policies (#3174) Fixes NVBug 5009941 Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148) * introduces new arg{min,max} interface with two output iterators * adds fp inf tests * fixes docs * improves code example * fixes exec space specifier * trying to fix deprecation warning for more compilers * inlines unzip operator * trying to fix deprecation warning for nvhpc * integrates suppression fixes in diagnostics * pre-ctk 11.5 deprecation suppression * fixes icc * fix for pre-ctk11.5 * cleans up deprecation suppression * cleanup Extend tuning documentation (#3179) Add codespell pre-commit hook, fix typos in CCCL (#3168) * Add codespell pre-commit hook * Automatic changes from codespell. * Manual changes. Fix parameter space for TUNE_LOAD in scan benchmark (#3176) fix various old compiler checks (#3178) implement C++26 `std::projected` (#3175) Fix pre-commit config for codespell and remaining typos (#3182) Massive cleanup of our config (#3155) Fix UB in atomics with automatic storage (#2586) * Adds specialized local cuda atomics and injects them into most atomics paths. Co-authored-by: Georgy Evtushenko <[email protected]> Co-authored-by: gonzalobg <[email protected]> * Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478 * Remove extraneous double brackets in unformatted code. * Merge unsafe atomic logic into `__cuda_is_local`.
* Use `const_cast` for type conversions in cuda_local.h * Fix build issues from interface changes * Fix missing __nanosleep on sm70- * Guard __isLocal from NVHPC * Use PTX instead of running nothing from NVHPC * fixup /s/nvrtc/nvhpc * Fixup missing CUDA ifdef surrounding device code * Fix codegen * Bypass some sort of compiler bug on GCC7 * Apply suggestions from code review * Use unsafe automatic storage atomics in codegen tests --------- Co-authored-by: Georgy Evtushenko <[email protected]> Co-authored-by: gonzalobg <[email protected]> Co-authored-by: Michael Schellenberger Costa <[email protected]> Refactor the source code layout for `cuda.parallel` (#3177) * Refactor the source layout for cuda.parallel * Add copyright * Address review feedback * Don't import anything into `experimental` namespace * fix import --------- Co-authored-by: Ashwin Srinath <[email protected]> new type-erased memory resources (#2824) s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186) Document address stability of `thrust::transform` (#3181) * Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS * Reformat and fix UnaryFunction/BinaryFunction in transform docs * Mention transform can use proclaim_copyable_arguments * Document cuda::proclaims_copyable_arguments better * Deprecate depending on transform functor argument addresses Fixes: #3053 turn off cuda version check for clangd (#3194) [STF] jacobi example based on parallel_for (#3187) * Simple jacobi example with parallel for and reductions * clang-format * remove useless capture list fixes pre-nv_diag suppression issues (#3189) Prefer c2h::type_name over c2h::demangle (#3195) Fix memcpy_async* tests (#3197) * memcpy_async_tx: Fix bug in test Two bugs, one of which occurs in practice: 1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice) 2. 
The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. (Has not yet manifested itself) * cp_async_bulk_tensor: Pre-emptively fence more in test Add type annotations and mypy checks for `cuda.parallel` (#3180) * Refactor the source layout for cuda.parallel * Add initial type annotations * Update pre-commit config * More typing * Fix bad merge * Fix TYPE_CHECKING and numpy annotations * typing bindings.py correctly * Address review feedback --------- Co-authored-by: Ashwin Srinath <[email protected]> Fix rendering of cuda.parallel docs (#3192) * Fix pre-commit config for codespell and remaining typos * Fix rendering of docs for cuda.parallel --------- Co-authored-by: Ashwin Srinath <[email protected]> Enable PDL for DeviceMergeSortBlockSortKernel (#3199) The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC. This commit enables PDL when launching the kernel. 
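The `abs`/`div` change (#3153) implements `cuda::std::div` and notes that it prefers the host's `div_t`-like types. The host-only sketch below shows the semantics such a function has to reproduce. `sketch_div` and `sketch_abs` are hypothetical names, not the libcu++ implementation. Since C++11, integer division truncates toward zero, so `quot * y + rem == x` holds and `rem` takes the sign of `x`. The member order of `std::div_t` is unspecified, so the sketch assigns the members by name instead of using aggregate initialization.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical constexpr div returning the host's std::div_t.
constexpr std::div_t sketch_div(int x, int y) noexcept {
  std::div_t result{};
  result.quot = x / y;  // truncates toward zero (C++11 and later)
  result.rem = x % y;   // satisfies quot * y + rem == x
  return result;
}

// Hypothetical integer abs; like std::abs, undefined for INT_MIN.
constexpr int sketch_abs(int x) noexcept { return x < 0 ? -x : x; }
```

Being `constexpr`, both functions can be checked at compile time as well as compared against `std::div` at run time.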
Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647) * adds benchmarks for reduce::arg{min,max} * preliminary streaming arg-extremum reduction * fixes implicit conversion * uses streaming dispatch class * changes arg benches to use new streaming reduce * streaming arg-extrema reduction * fixes style * fixes compilation failures * cleanups * adds rst style comments * declare vars const and use clamp * consolidates argmin argmax benchmarks * fixes thrust usage * drops offset type in arg-extrema benchmarks * fixes clang cuda * exec space macros * switch to signed global offset type for slightly better perf * clarifies documentation * applies minor benchmark style changes from review comments * fixes interface documentation and comments * list-init accumulating output op * improves style, comments, and tests * cleans up aggregate init * renames dispatch class usage in benchmarks * fixes merge conflicts * addresses review comments * addresses review comments * fixes assertion * removes superseded implementation * changes large problem tests to use new interface * removes obsolete tests for deprecated interface Fixes for Python 3.7 docs environment (#3206) Co-authored-by: Ashwin Srinath <[email protected]> Adds support for large number of items to `DeviceTransform` (#3172) * moves large problem test helper to common file * adds support for large num items to device transform * adds tests for large number of items to device interface * fixes format * addresses review comments cp_async_bulk: Fix test (#3198) * memcpy_async_tx: Fix bug in test Two bugs, one of which occurs in practice: 1. There is a missing fence.proxy.space::global between the writes to global memory and the memcpy_async_tx. (Occurs in practice) 2. The end of the kernel should be fenced with `__syncthreads()`, because the barrier is invalidated in the destructor. If other threads are still waiting on it, there will be UB. 
(Has not yet manifested itself) * cp_async_bulk_tensor: Pre-emptively fence more in test * cp_async_bulk: Fix test The global memory pointer could be misaligned. cudax fixes for msvc 14.41 (#3200) avoid instantiating class templates in `is_same` implementation when possible (#3203) Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209) * Fix: make launchers a CUB detail; make kernel source functions hidden. * [pre-commit.ci] auto code formatting * Address review comments, fix which macro gets fixed. help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202) unify macros and cmake options that control the suppression of deprecation warnings (#3220) * unify macros and cmake options that control the suppression of deprecation warnings * suppress nvcc warning #186 in thrust header tests * suppress c++ dialect deprecation warnings in libcudacxx header tests Fx thread-reduce performance regression (#3225) cuda.parallel: In-memory caching of build objects (#3216) * Define __eq__ and __hash__ for Iterators * Define cache_with_key utility and use it to cache Reduce objects * Add tests for caching Reduce objects * Tighten up types * Updates to support 3.7 * Address review feedback * Introduce IteratorKind to hold iterator type information * Use the .kind to generate an abi_name * Remove __eq__ and __hash__ methods from IteratorBase * Move helper function * Formatting * Don't unpack tuple in cache key --------- Co-authored-by: Ashwin Srinath <[email protected]> Just enough ranges for c++14 `span` (#3211) use generalized concepts portability macros to simplify the `range` concept (#3217) fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR` Use Ruff to sort imports (#3230) * Update pyproject.tomls for import sorting * Update files after running pre-commit * Move ruff config to pyproject.toml --------- Co-authored-by: Ashwin Srinath <[email protected]> fix 
tuning_scan sm90 config issue (#3236) Co-authored-by: Shijie Chen <[email protected]> [STF] Logical token (#3196) * Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs. * Add missing files * Check if a task implementation can match a prototype where the void_interface arguments are ignored * Implement ctx.abstract_logical_data() which relies on a void data interface * Illustrate how to use abstract handles in local contexts * Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages * Small improvements in the examples * Do not try to allocate or move void data * Do not use I as a variable * fix linkage error * rename abstract_logical_data into logical_token * Document logical token * fix spelling error * fix sphinx error * reflect name changes * use meaningful variable names * simplify logical_token implementation because writeback is already disabled * add a unit test for token elision * implement token elision in host_launch * Remove unused type * Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens * Much simpler is_tuple_invocable_with_filtered implementation * Fix buggy test * Factorize code * Document that we can ignore tokens for task and host_launch * Documentation for logical data freeze Fix ReduceByKey tuning (#3240) Fix RLE tuning (#3239) cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233) * Forbid non-contiguous arrays as inputs (or outputs) * Implement a more robust way to check for contiguity * Don't bother if cublas unavailable * Fix how we check for zero-element arrays * sort imports --------- Co-authored-by: Ashwin Srinath <[email protected]> expands support for more offset types in segmented benchmark (#3231) Add escape hatches to the cmake configuration of the header tests so that we can tests deprecated compilers / dialects (#3253) * Add
escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects * Do not add option twice ptx: Add add_instruction.py (#3190) This file helps create the necessary structure for new PTX instructions. Co-authored-by: Allard Hendriksen <[email protected]> Bump main to 2.9.0. (#3247) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop cub::Mutex (#3251) Fixes: #3250 Remove legacy macros from CUB util_arch.cuh (#3257) Fixes: #3256 Remove thrust::[unary|binary]_traits (#3260) Fixes: #3259 Architecture and OS identification macros (#3237) Bump main to 3.0.0. (#3265) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Drop thrust not1 and not2 (#3264) Fixes: #3263 CCCL Internal macro documentation (#3238) Deprecate GridBarrier and GridBarrierLifetime (#3258) Fixes: #1389 Require at least gcc7 (#3268) Fixes: #3267 Drop thrust::[unary|binary]_function (#3274) Fixes: #3273 Drop ICC from CI (#3277) [STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270) * Add a test to reproduce a bug observed with parallel_for on a host place * clang-format * use _CCCL_ASSERT * Attempt to debug * do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead * fix lambda expression * clang-format Enable thrust::identity test for non-MSVC (#3281) This seems to be an oversight when the test was added Co-authored-by: Michael Schellenberger Costa <[email protected]> Enable PDL in triple chevron launch (#3282) It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed to _CCCL_HAS_PDL during the review introducing the feature.
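The [STF] Logical token change (#3196) mentions helpers that check whether a function can be invoked from a tuple (`is_tuple_invocable_with_filtered` additionally drops tokens first). The sketch below covers only the unfiltered check, assuming C++17. The trait is a hypothetical illustration built on `std::is_invocable`, not the STF implementation.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical trait: true when F is invocable with the element types of a
// std::tuple (checked via std::is_invocable). Non-tuple types yield false.
template <class F, class Tuple>
struct is_tuple_invocable : std::false_type {};

template <class F, class... Ts>
struct is_tuple_invocable<F, std::tuple<Ts...>> : std::is_invocable<F, Ts...> {};

template <class F, class Tuple>
inline constexpr bool is_tuple_invocable_v = is_tuple_invocable<F, Tuple>::value;
```

A filtered variant would first compute a tuple type with the token elements removed and then apply the same check; that step is omitted here.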
Disambiguate line continuations and macro continuations in <nv/target> (#3244) Drop VS 2017 from CI (#3287) Fixes: #3286 Drop ICC support in code (#3279) * Drop ICC from code Fixes: #3278 Co-authored-by: Michael Schellenberger Costa <[email protected]> Make CUB NVRTC commandline arguments come from a cmake template (#3292) Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template than in the feature request template (#3295) Use process isolation instead of default hyper-v for Windows. (#3294) Try improving build times by using process isolation instead of hyper-v Co-authored-by: Michael Schellenberger Costa <[email protected]> [pre-commit.ci] pre-commit autoupdate (#3248) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6) - [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6) - [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1) Co-authored-by: Michael Schellenberger Costa <[email protected]> Drop Thrust legacy arch macros (#3298) Which were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS Drop Thrust's compiler_fence.h (#3300) Drop CTK 11.x from CI (#3275) * Add cuda12.0-gcc7 devcontainer * Move MSVC2017 jobs to CTK 12.6 That is the only combination where rapidsai has devcontainers * Add /Zc:__cplusplus for the libcudacxx tests * Only add escape hatch for affected CTKs * Workaround missing cudaLaunchKernelEx on MSVC cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which is reported wrongly for MSVC. CTK 12.3 fixed this by additionally detecting _MSC_VER.
As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK. * Workaround nvcc+MSVC issue * Regenerate devcontainers Fixes: #3249 Co-authored-by: Michael Schellenberger Costa <[email protected]> Update packman and repo_docs versions (#3293) Co-authored-by: Ashwin Srinath <[email protected]> Drop Thrust's deprecated compiler macros (#3301) Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305) Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506) * adds support for large number of items to three-way partition * adapts interface to use choose_signed_offset_t * integrates applicable feedback from device-select pr * changes behavior for empty problems * unifies grid constant macro * fixes kernel template specialization mismatch * integrates _CCCL_GRID_CONSTANT changes * resolve merge conflicts * fixes checks in test * fixes test verification * improves tests * makes few improvements to streaming dispatch * improves code comment on test * fixes unrelated compiler error * minor style improvements Refactor scan tunings (#3262) Require C++17 for compiling Thrust and CUB (#3255) * Issue an unsuppressable warning when compiling with < C++17 * Remove C++11/14 presets * Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers * Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14] * Remove CUB_ENABLE_DIALECT_CPP[11|14] * Update CI runs * Remove C++11/14 CI runs for CUB and Thrust * Raise compiler minimum versions for C++17 * Update ReadMe * Drop Thrust's cpp14_required.h * Add escape hatch for C++17 removal Fixes: #3252 Implement `views::empty` (#3254) * Disable pair conversion of subrange with clang in C++17 * Fix namespace views * Implement `views::empty` This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view Refactor `limits` and `climits` (#3221) * implement builtins for huge val, nan and nans * change `INFINITY` and `NAN` 
implementation for NVRTC cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311) * Add tests demonstrating usage of different iterators * Update documentation of reduce_into by merging import code snippet with the rest of the example * Add documentation for current iterators * Run pre-commit checks and update accordingly * Fix comments to refer to the proper lines in the code snippets in the docs Drop clang<14 from CI, update devcontainers. (#3309) Co-authored-by: Bernhard Manfred Gruber <[email protected]> [STF] Cleanup task dependencies object constructors (#3291) * Define tag types for access modes * - Rework how we build task_dep objects based on access mode tags - pack_state is now responsible for using a const_cast for read only data * Greatly simplify the previous attempt : do not define new types, but use integral constants based on the enums * It seems the const_cast was not necessarily so we can simplify it and not even do some dispatch based on access modes Disable test with a gcc-14 regression (#3297) Deprecate Thrust's cpp_compatibility.h macros (#3299) Remove dropped function objects from docs (#3319) Document `NV_TARGET` macros (#3313) [STF] Define ctx.pick_stream() which was missing for the unified context (#3326) * Define ctx.pick_stream() which was missing for the unified context * clang-format Deprecate cub::IterateThreadStore (#3337) Drop CUB's BinaryFlip operator (#3332) Deprecate cub::Swap (#3333) Clarify transform output can overlap input (#3323) Drop CUB APIs with a debug_synchronous parameter (#3330) Fixes: #3329 Drop CUB's util_compiler.cuh for real (#3340) PR #3302 planned to drop the file, but only dropped its content. This was an oversight. So let's drop the entire file. 
Drop cub::ValueCache (#3346) limits offset types for merge sort (#3328) Drop CDPv1 (#3344) Fixes: #3341 Drop thrust::void_t (#3362) Use cuda::std::addressof in Thrust (#3363) Fix all_of documentation for empty ranges (#3358) all_of always returns true on an empty range. [STF] Do not keep track of dangling events in a CUDA graph backend (#3327) * Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when the CUDA graph completes. Therefore keeping track of "dangling events" is a waste of time and resources. * replace can_ignore_dangling_events by track_dangling_events which leads to more readable code * When not storing the dangling events, we must still perform the deinit operations that were producing these events ! Extract scan kernels into NVRTC-compilable header (#3334) * Extract scan kernels into NVRTC-compilable header * Update cub/cub/device/dispatch/dispatch_scan.cuh Co-authored-by: Georgii Evtushenko <[email protected]> --------- Co-authored-by: Ashwin Srinath <[email protected]> Co-authored-by: Georgii Evtushenko <[email protected]> Drop deprecated aliases in Thrust functional (#3272) Fixes: #3271 Drop cub::DivideAndRoundUp (#3347) Use cuda::std::min/max in Thrust (#3364) Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361) * implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` Cleanup util_arch (#2773) Deprecate thrust::null_type (#3367) Deprecate cub::DeviceSpmv (#3320) Fixes: #896 Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246) * fixes segment offset generation * switches to analytical verification * switches to analytical verification for pairs * fixes spelling * adds tests for large number of segments * fixes narrowing conversion in tests * addresses review comments * fixes includes Compile basic infra test with C++17 (#3377) Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (#3308) * fixes segment 
offset generation * switches to analytical verification * switches to analytical verification for pairs * addresses review comments * introduces segment offset type * adds tests for large number of segments * adds support for large number of segments * drops segment offset type * fixes thrust namespace * removes about-to-be-deprecated cub iterators * no exec specifier on defaulted ctor * fixes gcc7 linker error * uses local_segment_index_t throughout * determine offset type based on type returned by segment iterator begin/end iterators * minor style improvements Exit with error when RAPIDS CI fails. (#3385) cuda.parallel: Support structured types as algorithm inputs (#3218) * Introduce gpu_struct decorator and typing * Enable `reduce` to accept arrays of structs as inputs * Add test for reducing arrays-of-struct * Update documentation * Use a numpy array rather than ctypes object * Change zeros -> empty for output array and temp storage * Add a TODO for typing GpuStruct * Documentation udpates * Remove test_reduce_struct_type from test_reduce.py * Revert to `to_cccl_value()` accepting ndarray + GpuStruct * Bump copyrights --------- Co-authored-by: Ashwin Srinath <[email protected]> Deprecate thrust::async (#3324) Fixes: #100 Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342) Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314) * add compiler-specific path * fix device code path * add _CCC_ASSUME Deprecate thrust::numeric_limits (#3366) Replace `typedef` with `using` in libcu++ (#3368) Deprecate thrust::optional (#3307) Fixes: #3306 Upgrade to Catch2 3.8 (#3310) Fixes: #1724 refactor `<cuda/std/cstdint>` (#3325) Co-authored-by: Bernhard Manfred Gruber <[email protected]> Update CODEOWNERS (#3331) * Update CODEOWNERS * Update CODEOWNERS * Update CODEOWNERS * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Fix sign-compare warning (#3408) Implement more cmath functions to be usable 
on host and device (#3382) * Implement more cmath functions to be usable on host and device * Implement math roots functions * Implement exponential functions Redefine and deprecate thrust::remove_cvref (#3394) * Redefine and deprecate thrust::remove_cvref Co-authored-by: Michael Schellenberger Costa <[email protected]> Fix assert definition for NVHPC due to constexpr issues (#3418) NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it. Fix this by always using the host definition which should also work on device. Fixes #3411 Extend CUB reduce benchmarks (#3401) * Rename max.cu to custom.cu, since it uses a custom operator * Extend types covered my min.cu to all fundamental types * Add some notes on how to collect tuning parameters Fixes: #3283 Update upload-pages-artifact to v3 (#3423) * Update upload-pages-artifact to v3 * Empty commit --------- Co-authored-by: Ashwin Srinath <[email protected]> Replace and deprecate thrust::cuda_cub::terminate (#3421) `std::linalg` accessors and `transposed_layout` (#2962) Add round up/down to multiple (#3234) [FEA]: Introduce Python module with CCCL headers (#3201) * Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative * Run `copy_cccl_headers_to_aude_include()` before `setup()` * Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path. * Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel * Bug fix: cuda/_include only exists after shutil.copytree() ran. * Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py * Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions) * Replace := operator (needs Python 3.8+) * Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md * Restore original README.md: `pip3 install -e` now works on first pass. 
* cuda_cccl/README.md: FOR INTERNAL USE ONLY * Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917) Command used: ci/update_version.sh 2 8 0 * Modernize pyproject.toml, setup.py Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178 * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996 * Install CCCL headers under cuda.cccl.include Trigger for this change: * https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562 Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely. * Factor out cuda_cccl/cuda/cccl/include_paths.py * Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative * Add missing Copyright notice. * Add missing __init__.py (cuda.cccl) * Add `"cuda.cccl"` to `autodoc.mock_imports` * Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.) * Add # TODO: move this to a module-level import * Modernize cuda_cooperative/pyproject.toml, setup.py * Convert cuda_cooperative to use hatchling as build backend. * Revert "Convert cuda_cooperative to use hatchling as build backend." This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe. * Move numpy from [build-system] requires -> [project] dependencies * Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH * Remove copy_license() and use license_files=["../../LICENSE"] instead. 
* Further modernize cuda_cccl/setup.py to use pathlib * Trivial simplifications in cuda_cccl/pyproject.toml * Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code * Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml * Add taplo-pre-commit to .pre-commit-config.yaml * taplo-pre-commit auto-fixes * Use pathlib in cuda_cooperative/setup.py * CCCL_PYTHON_PATH in cuda_cooperative/setup.py * Modernize cuda_parallel/pyproject.toml, setup.py * Use pathlib in cuda_parallel/setup.py * Add `# TOML lint & format` comment. * Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml * Use pathlib in cuda/cccl/include_paths.py * pre-commit autoupdate (EXCEPT clang-format, which was manually restored) * Fixes after git merge main * Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result' ``` =========================================================================== warnings summary =========================================================================== tests/test_reduce.py::test_reduce_non_contiguous /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080> Traceback (most recent call last): File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__ bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result)) ^^^^^^^^^^^^^^^^^ AttributeError: '_Reduce' object has no attribute 'build_result' warnings.warn(pytest.PytestUnraisableExceptionWarning(msg)) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ============================================================== ``` * Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class 
CustomBuildPy` * Introduce cuda_cooperative/constraints.txt * Also add cuda_parallel/constraints.txt * Add `--constraint constraints.txt` in ci/test_python.sh * Update Copyright dates * Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024) For completeness: The other repo took a long time to install into the pre-commit cache; so long it lead to timeouts in the CCCL CI. * Remove unused cuda_parallel jinja2 dependency (noticed by chance). * Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead. * Make cuda_cooperative, cuda_parallel testing completely independent. * Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]" This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2. Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971 * Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc] * Restore original ci/matrix.yaml [skip-rapids] * Use for loop in test_python.sh to avoid code duplication. 
* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci] * Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc] * Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]" This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d. * Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460) * Address feedback by @leofang --------- Co-authored-by: Bernhard Manfred Gruber <[email protected]> cuda.parallel: Add optional stream argument to reduce_into() (#3348) * Add optional stream argument to reduce_into() * Add tests to check for reduce_into() stream behavior * Move protocol related utils to separate file and rework __cuda_stream__ error messages * Fix synchronization issue in stream test and add one more invalid stream test case * Rename cuda stream validation function after removing leading underscore * Unpack values from __cuda_stream__ instead of indexing * Fix linting errors * Handle TypeError when unpacking invalid __cuda_stream__ return * Use stream to allocate cupy memory in new stream test Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434) Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419) * Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ Fixes #3404 Fix CI issues (#3443) update docs fix review restrict allowed types replace constexpr implementations with generic optimize `__is_arithmetic_integral`

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

Co-authored-by: David Bayer
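The squashed commit message above is terse. As a rough illustration of the technique it names, and of the "use bit_cast for inf and nan when no builtin is supported" item in the PR description, a header can prefer the GCC/Clang builtins `__builtin_huge_valf`, `__builtin_nanf` and `__builtin_nansf` when they are detected, and otherwise reinterpret the IEEE-754 bit patterns. This is a minimal sketch, not the code from this PR: the `HYPO_*` macro and `hypo_*` functions are hypothetical, a 32-bit IEEE-754 `float` is assumed, and `std::memcpy` stands in for `bit_cast` so the sketch also builds as C++17. GCC only gained `__has_builtin` in version 10, so the sketch assumes older GCC provides these builtins without being able to report them.

```cpp
#include <cassert> // assert() in the usage checks
#include <cmath>   // std::isinf, std::isnan, std::signbit
#include <cstdint> // std::uint32_t
#include <cstring> // std::memcpy

// Hypothetical builtin-detection macro (not the libcu++ one).
#ifdef __has_builtin
#  define HYPO_HAS_BUILTIN(x) __has_builtin(x)
#elif defined(__GNUC__) && !defined(__clang__)
// Assumption: GCC < 10 has the float builtins used below but no __has_builtin.
#  define HYPO_HAS_BUILTIN(x) 1
#else
#  define HYPO_HAS_BUILTIN(x) 0
#endif

// Reinterpret a 32-bit pattern as float (C++17 stand-in for std::bit_cast).
inline float hypo_float_from_bits(std::uint32_t bits) noexcept
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// +infinity: exponent bits all set, mantissa zero.
inline float hypo_huge_valf() noexcept
{
#if HYPO_HAS_BUILTIN(__builtin_huge_valf)
  return __builtin_huge_valf();
#else
  return hypo_float_from_bits(0x7F800000u);
#endif
}

// Quiet NaN: exponent bits all set, most significant mantissa bit set.
inline float hypo_nanf() noexcept
{
#if HYPO_HAS_BUILTIN(__builtin_nanf)
  return __builtin_nanf("");
#else
  return hypo_float_from_bits(0x7FC00000u);
#endif
}

// Signaling NaN on most current platforms: exponent bits all set,
// top mantissa bit clear, remaining mantissa non-zero.
inline float hypo_nansf() noexcept
{
#if HYPO_HAS_BUILTIN(__builtin_nansf)
  return __builtin_nansf("");
#else
  return hypo_float_from_bits(0x7FA00000u);
#endif
}
```

Both branches should yield the same classes of values; the practical difference is that the builtins can be folded at compile time, whereas a `std::memcpy`-based fallback cannot appear in a constant expression.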

davebayer added 7 commits December 29, 2024 21:40

implement builtins for huge val, nan and nans

81e689a

change INFINITY and NAN implementation for NVRTC

1d11f82

collapse climits headers

cbde7d4

remove no longer needed msvc includes

4d25dd3

fix missing includes

5ebb259

refactor limits

967fecf

fix tests

7f558c5

davebayer requested review from a team as code owners December 30, 2024 10:06

davebayer requested review from ericniebler and gonidelis December 30, 2024 10:06

numeric_limits must be a class

952ec08

miscco reviewed Jan 2, 2025

libcudacxx/include/cuda/std/climits (outdated, resolved)

libcudacxx/include/cuda/std/__cccl/builtin.h (resolved)

libcudacxx/include/cuda/std/limits (resolved)

libcudacxx/include/cuda/std/limits (outdated, resolved)

move msvc win32 support file to cuda/std/__limits

5440437

davebayer added 2 commits January 2, 2025 22:41

add missing includes in tests

3dc3502

do not use 1.0 / 0.0 for nvrtc inf

3a9d627
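The commit title does not say why `1.0 / 0.0` was avoided for NVRTC. One plausible factor (an assumption, not stated in the PR) is constant evaluation: Clang, for example, rejects `constexpr double inf = 1.0 / 0.0;` with a note that floating point arithmetic produces an infinity. Compiler builtins sidestep the division entirely; libstdc++ itself implements `numeric_limits<double>::infinity()` and `quiet_NaN()` with `__builtin_huge_val()` and `__builtin_nan("")`. The sketch below assumes GCC or Clang, and the `hypo_*` names are hypothetical.

```cpp
#include <cassert> // assert() in the usage checks
#include <cfloat>  // DBL_MAX
#include <cmath>   // std::isinf, std::isnan

// For contrast only; not a constant expression on Clang
// ("floating point arithmetic produces an infinity"):
//   constexpr double bad_inf = 1.0 / 0.0;

// Hypothetical constexpr helpers built on compiler builtins (GCC/Clang assumed).
constexpr double hypo_constexpr_inf() noexcept
{
  return __builtin_huge_val();
}

constexpr double hypo_constexpr_qnan() noexcept
{
  return __builtin_nan("");
}

// The builtin result is usable at compile time, without any division.
constexpr double hypo_inf_value = hypo_constexpr_inf();
static_assert(hypo_inf_value > DBL_MAX, "infinity exceeds the largest finite double");
```

Whether NVRTC or other compilers treat the division the same way as Clang is not established here; the sketch only shows that the builtin route works in a `constexpr` context.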

miscco approved these changes Jan 3, 2025

libcudacxx/include/cuda/std/limits (resolved)

exclude msvc specific headers for other compilers

e5ee5d2

davebayer requested a review from a team as a code owner January 3, 2025 10:25

davebayer requested a review from robertmaynard January 3, 2025 10:25

fix comment

0fc17e7

miscco added 2 commits January 7, 2025 15:01

Fix incorrect detection logic for older gcc

fedd5a4

Fix unrelated nvrtc error?

2cb0c08

miscco reviewed Jan 7, 2025

libcudacxx/include/cuda/std/__cccl/builtin.h (resolved)

Ensure old gcc uses all the builtins

23fdf2c

miscco added 2 commits January 8, 2025 13:42

Merge branch 'main' into pr/davebayer/3221

58a71d5

Merge branch 'main' into pr/davebayer/3221

7e53aa4

miscco merged commit 1503b25 into NVIDIA:main Jan 9, 2025
182 checks passed

davebayer added a commit to davebayer/cccl that referenced this pull request Jan 18, 2025

Refactor limits and climits (NVIDIA#3221)

6b82e05

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request Jan 22, 2025

Refactor limits and climits (NVIDIA#3221)

972076f

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

bernhardmgruber added the backport branch/2.8.x label Jan 22, 2025

bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request Jan 22, 2025

Refactor limits and climits (NVIDIA#3221)

c5bd5e5

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

bernhardmgruber pushed a commit to bernhardmgruber/cccl that referenced this pull request Jan 23, 2025

Refactor limits and climits (NVIDIA#3221)

e8ceb74

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

bernhardmgruber added a commit that referenced this pull request Jan 23, 2025

Refactor limits and climits (#3221) (#3488)

843f505

* implement builtins for huge val, nan and nans
* change `INFINITY` and `NAN` implementation for NVRTC

Co-authored-by: David Bayer

Conversation

davebayer commented Dec 30, 2024 (edited)

copy-pr-bot bot commented Dec 30, 2024

miscco commented Jan 2, 2025

miscco commented Jan 3, 2025

miscco commented Jan 7, 2025

github-actions bot commented Jan 7, 2025

🟨 libcudacxx: Pass: 75%/48 | Total: 15h 58m | Avg: 19m 58s | Max: 1h 03m | Hits: 32%/9818

🟨 cub: Pass: 82%/47 | Total: 1d 00h | Avg: 30m 53s | Max: 1h 13m | Hits: 2%/3144

🟨 thrust: Pass: 82%/46 | Total: 16h 18m | Avg: 21m 16s | Max: 1h 14m | Hits: 27%/9260

🟨 cudax: Pass: 96%/26 | Total: 3h 19m | Avg: 7m 39s | Max: 24m 28s | Hits: 30%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 28s | Avg: 5m 14s | Max: 8m 32s

🟩 python: Pass: 100%/1 | Total: 26m 32s | Avg: 26m 32s | Max: 26m 32s

🏃‍ Runner counts (total jobs: 170)

miscco commented Jan 7, 2025

github-actions bot commented Jan 7, 2025

🟨 cub: Pass: 82%/47 | Total: 11h 16m | Avg: 14m 23s | Max: 56m 56s | Hits: 99%/3144

🟩 libcudacxx: Pass: 100%/48 | Total: 16h 13m | Avg: 20m 17s | Max: 1h 28m | Hits: 49%/9818

🟩 thrust: Pass: 100%/46 | Total: 10h 11m | Avg: 13m 17s | Max: 40m 33s | Hits: 99%/9260

🟩 cudax: Pass: 100%/26 | Total: 2h 15m | Avg: 5m 13s | Max: 23m 59s | Hits: 92%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 32s | Avg: 6m 16s | Max: 10m 26s

🟩 python: Pass: 100%/1 | Total: 39m 14s | Avg: 39m 14s | Max: 39m 14s

🏃‍ Runner counts (total jobs: 170)

miscco commented Jan 7, 2025

github-actions bot commented Jan 7, 2025

🟨 libcudacxx: Pass: 97%/48 | Total: 15h 13m | Avg: 19m 02s | Max: 1h 08m | Hits: 37%/9818

🟩 cub: Pass: 100%/47 | Total: 14h 07m | Avg: 18m 01s | Max: 1h 00m | Hits: 99%/3144

🟩 thrust: Pass: 100%/46 | Total: 9h 55m | Avg: 12m 56s | Max: 38m 09s | Hits: 99%/9260

🟩 cudax: Pass: 100%/26 | Total: 2h 14m | Avg: 5m 10s | Max: 16m 17s | Hits: 92%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 37s | Avg: 4m 48s | Max: 7m 36s

🟩 python: Pass: 100%/1 | Total: 29m 05s | Avg: 29m 05s | Max: 29m 05s

🏃‍ Runner counts (total jobs: 170)

miscco commented Jan 9, 2025

github-actions bot commented Jan 9, 2025

🟩 libcudacxx: Pass: 100%/48 | Total: 19h 23m | Avg: 24m 14s | Max: 1h 16m | Hits: 328%/12458

🟩 cub: Pass: 100%/47 | Total: 1d 16h | Avg: 52m 20s | Max: 1h 09m | Hits: 26%/3900

🟩 thrust: Pass: 100%/46 | Total: 1d 06h | Avg: 39m 41s | Max: 1h 13m | Hits: 119%/11112

🟩 cudax: Pass: 100%/24 | Total: 5h 41m | Avg: 14m 12s | Max: 19m 10s | Hits: 62%/312

🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 54s | Avg: 4m 57s | Max: 7m 40s

🟩 python: Pass: 100%/1 | Total: 25m 18s | Avg: 25m 18s | Max: 25m 18s

🏃‍ Runner counts (total jobs: 168)
