Drop CTK 11.x from CI #3275

Merged on Jan 9, 2025 (8 commits)

@bernhardmgruber bernhardmgruber commented Jan 8, 2025

copy-pr-bot bot commented Jan 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

bernhardmgruber changed the title from "Drop ctk11" to "Drop CTK 11.x from CI" Jan 8, 2025
bernhardmgruber added the "breaking" (Breaking change) label Jan 8, 2025
@@ -41,7 +41,6 @@ workflows:
# verify-codegen:
- {jobs: ['verify_codegen'], project: 'libcudacxx'}
# cudax has different CTK reqs:
- {jobs: ['build'], project: 'cudax', ctk: ['12.0'], std: 17, cxx: ['gcc9', 'clang9']}

@bernhardmgruber bernhardmgruber Jan 8, 2025

Subsumed by the "Old CTK" above.

@bernhardmgruber

/ok to test

bernhardmgruber commented Jan 8, 2025

We can no longer test MSVC2017 with CTK 12.0 (the job was upgraded from CTK 11.1), because rapidsai does not provide devcontainers for that combination. However, devcontainers are available for MSVC2017 with CTK 12.4-12.6.

bernhardmgruber force-pushed the drop_ctk11 branch 2 times, most recently from 561a7f5 to 525149d, January 8, 2025 11:38
bernhardmgruber marked this pull request as ready for review January 8, 2025 11:38
bernhardmgruber requested review from a team as code owners January 8, 2025 11:38

miscco commented Jan 8, 2025

There are a lot of issues about cudaLaunchKernelEx being undefined:

2025-01-08T12:00:41.9723249Z C:/cccl/libcudacxx/test/libcudacxx/force_include.h(112): error: identifier "cudaLaunchKernelEx" is undefined

github-actions bot commented Jan 8, 2025

🟨 CI finished in 1h 13m: Pass: 96%/178 | Total: 1d 11h | Avg: 11m 50s | Max: 1h 10m | Hits: 79%/21062
  • 🟨 libcudacxx: Pass: 94%/50 | Total: 9h 59m | Avg: 11m 58s | Max: 45m 03s | Hits: 77%/7590

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  93%/48  | Total:  9h 51m | Avg: 12m 19s | Max: 45m 03s | Hits:  77%/7590  
      🟩 arm64              Pass: 100%/2   | Total:  7m 05s | Avg:  3m 32s | Max:  3m 44s
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 46s | Max: 20m 48s
      🔍 nvcc               Pass:  93%/46  | Total:  8h 51m | Avg: 11m 33s | Max: 45m 03s | Hits:  77%/7590  
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/20  | Total:  2h 52m | Avg:  8m 36s | Max: 20m 48s
      🟩 GCC                Pass: 100%/21  | Total:  3h 53m | Avg: 11m 05s | Max: 34m 02s
      🟩 Intel              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🔍 MSVC               Pass:  50%/6   | Total:  2h 23m | Avg: 23m 55s | Max: 45m 03s | Hits:  77%/7590  
      🟩 NVHPC              Pass: 100%/2   | Total: 44m 40s | Avg: 22m 20s | Max: 35m 28s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  93%/43  | Total:  7h 41m | Avg: 10m 43s | Max: 45m 03s | Hits:  77%/7590  
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 39m | Avg: 24m 50s | Max: 34m 02s
      🟩 Test               Pass: 100%/2   | Total: 36m 15s | Avg: 18m 07s | Max: 19m 05s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
    🟨 ctk
      🟨 12.0               Pass:  75%/8   | Total:  2h 19m | Avg: 17m 28s | Max: 29m 10s
      🟩 12.5               Pass: 100%/2   | Total: 44m 40s | Avg: 22m 20s | Max: 35m 28s
      🟨 12.6               Pass:  97%/40  | Total:  6h 54m | Avg: 10m 21s | Max: 45m 03s | Hits:  77%/7590  
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 46s | Max: 20m 48s
      🟨 nvcc12.0           Pass:  75%/8   | Total:  2h 19m | Avg: 17m 28s | Max: 29m 10s
      🟩 nvcc12.5           Pass: 100%/2   | Total: 44m 40s | Avg: 22m 20s | Max: 35m 28s
      🟨 nvcc12.6           Pass:  97%/36  | Total:  5h 47m | Avg:  9m 39s | Max: 45m 03s | Hits:  77%/7590  
    🟨 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 39m 25s | Avg:  9m 51s | Max: 16m 50s
      🟩 Clang10            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 00s | Avg:  4m 00s | Max:  4m 00s
      🟩 Clang12            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 58s | Avg:  3m 58s | Max:  3m 58s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 34s | Avg:  4m 34s | Max:  4m 34s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 38m | Avg: 12m 18s | Max: 20m 48s
      🟩 GCC7               Pass: 100%/4   | Total: 50m 19s | Avg: 12m 34s | Max: 29m 10s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
      🟩 GCC9               Pass: 100%/3   | Total: 35m 00s | Avg: 11m 40s | Max: 17m 45s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 38s | Avg:  3m 38s | Max:  3m 38s
      🟩 GCC12              Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 13m | Avg: 13m 18s | Max: 34m 02s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟥 MSVC14.16          Pass:   0%/1   | Total: 45m 03s | Avg: 45m 03s | Max: 45m 03s
      🟨 MSVC14.29          Pass:  33%/3   | Total:  1h 08m | Avg: 22m 47s | Max: 34m 14s | Hits:  31%/2481  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 30m 05s | Avg: 15m 02s | Max: 15m 20s | Hits:  99%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 44m 40s | Avg: 22m 20s | Max: 35m 28s
    🟨 std
      🟩 11                 Pass: 100%/6   | Total:  1h 10m | Avg: 11m 41s | Max: 20m 44s
      🟨 14                 Pass:  60%/5   | Total:  1h 30m | Avg: 18m 04s | Max: 45m 03s
      🟨 17                 Pass:  93%/15  | Total:  3h 29m | Avg: 13m 58s | Max: 34m 14s | Hits:  65%/4962  
      🟩 20                 Pass: 100%/23  | Total:  3h 46m | Avg:  9m 51s | Max: 35m 28s | Hits:  98%/2628  
    🟨 gpu
      🟨 v100               Pass:  94%/50  | Total:  9h 59m | Avg: 11m 58s | Max: 45m 03s | Hits:  77%/7590  
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 14m 12s | Avg: 14m 12s | Max: 14m 12s
      🟩 90a                Pass: 100%/2   | Total: 15m 56s | Avg:  7m 58s | Max: 12m 13s
    
  • 🟨 thrust: Pass: 95%/48 | Total: 10h 04m | Avg: 12m 35s | Max: 1h 02m | Hits: 89%/9260

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/46  | Total:  9h 55m | Avg: 12m 56s | Max:  1h 02m | Hits:  89%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  9m 35s | Avg:  4m 47s | Max:  5m 08s
    🔍 ctk: 12.0 🔍
      🔍 12.0               Pass:  75%/8   | Total:  3h 18m | Avg: 24m 50s | Max: 59m 09s
      🟩 12.5               Pass: 100%/2   | Total: 29m 35s | Avg: 14m 47s | Max: 15m 37s
      🟩 12.6               Pass: 100%/38  | Total:  6h 16m | Avg:  9m 54s | Max:  1h 02m | Hits:  89%/9260  
    🔍 cudacxx: nvcc12.0 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  4m 59s
      🔍 nvcc12.0           Pass:  75%/8   | Total:  3h 18m | Avg: 24m 50s | Max: 59m 09s
      🟩 nvcc12.5           Pass: 100%/2   | Total: 29m 35s | Avg: 14m 47s | Max: 15m 37s
      🟩 nvcc12.6           Pass: 100%/36  | Total:  6h 06m | Avg: 10m 11s | Max:  1h 02m | Hits:  89%/9260  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 50s | Avg:  4m 55s | Max:  4m 59s
      🔍 nvcc               Pass:  95%/46  | Total:  9h 54m | Avg: 12m 55s | Max:  1h 02m | Hits:  89%/9260  
    🔍 cxx: MSVC14.29 🔍
      🟩 Clang9             Pass: 100%/4   | Total: 22m 45s | Avg:  5m 41s | Max:  6m 41s
      🟩 Clang10            Pass: 100%/1   | Total:  7m 04s | Avg:  7m 04s | Max:  7m 04s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 Clang12            Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 10s | Avg:  5m 10s | Max:  5m 10s
      🟩 Clang14            Pass: 100%/1   | Total:  4m 59s | Avg:  4m 59s | Max:  4m 59s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 Clang18            Pass: 100%/7   | Total: 57m 37s | Avg:  8m 13s | Max: 24m 31s
      🟩 GCC7               Pass: 100%/4   | Total:  1h 11m | Avg: 17m 46s | Max: 35m 24s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 07s | Avg:  5m 07s | Max:  5m 07s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 27s | Avg:  5m 09s | Max:  5m 30s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 56s | Avg:  5m 56s | Max:  5m 56s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 21m | Avg: 10m 12s | Max: 24m 16s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  7m 26s | Avg:  7m 26s | Max:  7m 26s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  50%/1852  
      🔍 MSVC14.29          Pass:  33%/3   | Total:  2h 12m | Avg: 44m 10s | Max: 59m 09s | Hits:  99%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 58m 13s | Avg: 19m 24s | Max: 23m 33s | Hits:  99%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 29m 35s | Avg: 14m 47s | Max: 15m 37s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/19  | Total:  2h 04m | Avg:  6m 32s | Max: 24m 31s
      🟩 GCC                Pass: 100%/19  | Total:  3h 10m | Avg: 10m 01s | Max: 35m 24s
      🟩 Intel              Pass: 100%/1   | Total:  7m 26s | Avg:  7m 26s | Max:  7m 26s
      🔍 MSVC               Pass:  71%/7   | Total:  4h 12m | Avg: 36m 08s | Max:  1h 02m | Hits:  89%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total: 29m 35s | Avg: 14m 47s | Max: 15m 37s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/42  | Total:  8h 14m | Avg: 11m 46s | Max:  1h 02m | Hits:  87%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 39m 13s | Avg: 13m 04s | Max: 23m 33s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 11m | Avg: 23m 44s | Max: 24m 31s
    🟨 std
      🟩 11                 Pass: 100%/5   | Total: 44m 57s | Avg:  8m 59s | Max: 25m 44s
      🟨 14                 Pass:  75%/4   | Total:  2h 11m | Avg: 32m 59s | Max:  1h 02m | Hits:  50%/1852  
      🟨 17                 Pass:  92%/14  | Total:  3h 16m | Avg: 14m 03s | Max: 59m 09s | Hits:  99%/3704  
      🟩 20                 Pass: 100%/23  | Total:  3h 23m | Avg:  8m 49s | Max: 24m 31s | Hits:  99%/3704  
    🟨 gpu
      🟨 v100               Pass:  95%/48  | Total: 10h 04m | Avg: 12m 35s | Max:  1h 02m | Hits:  89%/9260  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 27m 57s | Avg: 13m 58s | Max: 22m 25s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
    
  • 🟨 cub: Pass: 97%/49 | Total: 12h 02m | Avg: 14m 44s | Max: 1h 10m | Hits: 59%/3900

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/47  | Total: 11h 52m | Avg: 15m 09s | Max:  1h 10m | Hits:  59%/3900  
      🟩 arm64              Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  5m 05s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/8   | Total:  4h 31m | Avg: 33m 59s | Max:  1h 06m | Hits:   0%/1560  
      🟩 12.5               Pass: 100%/2   | Total: 18m 31s | Avg:  9m 15s | Max:  9m 23s
      🔍 12.6               Pass:  97%/39  | Total:  7h 11m | Avg: 11m 04s | Max:  1h 10m | Hits:  98%/2340  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 37s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  4h 31m | Avg: 33m 59s | Max:  1h 06m | Hits:   0%/1560  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 31s | Avg:  9m 15s | Max:  9m 23s
      🔍 nvcc12.6           Pass:  97%/37  | Total:  7h 02m | Avg: 11m 25s | Max:  1h 10m | Hits:  98%/2340  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 59s | Avg:  4m 29s | Max:  4m 37s
      🔍 nvcc               Pass:  97%/47  | Total: 11h 53m | Avg: 15m 10s | Max:  1h 10m | Hits:  59%/3900  
    🚨 cxx: MSVC14.16 🚨
      🟩 Clang9             Pass: 100%/4   | Total: 22m 48s | Avg:  5m 42s | Max:  6m 13s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 54s | Avg:  5m 54s | Max:  5m 54s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 20m | Avg: 11m 34s | Max: 32m 22s
      🟩 GCC7               Pass: 100%/4   | Total:  2h 16m | Avg: 34m 06s | Max:  1h 06m
      🟩 GCC8               Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
      🟩 GCC9               Pass: 100%/3   | Total: 16m 07s | Avg:  5m 22s | Max:  5m 39s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 GCC12              Pass: 100%/3   | Total: 26m 00s | Avg:  8m 40s | Max: 15m 55s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 56m | Avg: 14m 31s | Max: 26m 43s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  6m 50s | Avg:  6m 50s | Max:  6m 50s
      🔥 MSVC14.16          Pass:   0%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m
      🟩 MSVC14.29          Pass: 100%/3   | Total:  2h 17m | Avg: 45m 52s | Max:  1h 03m | Hits:  33%/2340  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 28m 16s | Avg: 14m 08s | Max: 14m 12s | Hits:  98%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 31s | Avg:  9m 15s | Max:  9m 23s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/19  | Total:  2h 29m | Avg:  7m 53s | Max: 32m 22s
      🟩 GCC                Pass: 100%/21  | Total:  5h 11m | Avg: 14m 49s | Max:  1h 06m
      🟩 Intel              Pass: 100%/1   | Total:  6m 50s | Avg:  6m 50s | Max:  6m 50s
      🔍 MSVC               Pass:  83%/6   | Total:  3h 55m | Avg: 39m 18s | Max:  1h 10m | Hits:  59%/3900  
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 31s | Avg:  9m 15s | Max:  9m 23s
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 55s
      🔍 v100               Pass:  97%/47  | Total: 11h 41m | Avg: 14m 56s | Max:  1h 10m | Hits:  59%/3900  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  97%/42  | Total:  9h 15m | Avg: 13m 13s | Max:  1h 10m | Hits:  59%/3900  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 22s | Avg: 19m 22s | Max: 19m 22s
      🟩 GraphCapture       Pass: 100%/1   | Total: 23m 21s | Avg: 23m 21s | Max: 23m 21s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 18s | Max: 26m 43s
      🟩 TestGPU            Pass: 100%/2   | Total: 57m 16s | Avg: 28m 38s | Max: 32m 22s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/5   | Total:  1h 27m | Avg: 17m 33s | Max:  1h 06m
      🔍 14                 Pass:  75%/4   | Total:  2h 24m | Avg: 36m 05s | Max:  1h 10m | Hits:   0%/780   
      🟩 17                 Pass: 100%/14  | Total:  3h 29m | Avg: 14m 57s | Max:  1h 03m | Hits:  65%/2340  
      🟩 20                 Pass: 100%/26  | Total:  4h 40m | Avg: 10m 48s | Max: 32m 22s | Hits:  98%/780   
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 55s
      🟩 90a                Pass: 100%/1   | Total:  4m 36s | Avg:  4m 36s | Max:  4m 36s
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 03m | Avg: 5m 09s | Max: 19m 35s | Hits: 92%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  1h 53m | Avg:  5m 39s | Max: 19m 35s | Hits:  92%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 27s | Avg:  2m 36s | Max:  2m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 05s | Avg:  9m 05s | Max:  9m 05s | Hits:  92%/156   
      🟩 12.5               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 35s
      🟩 12.6               Pass: 100%/21  | Total:  1h 43m | Avg:  4m 56s | Max: 19m 35s | Hits:  92%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 05s | Avg:  9m 05s | Max:  9m 05s | Hits:  92%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 35s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 43m | Avg:  4m 56s | Max: 19m 35s | Hits:  92%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 03m | Avg:  5m 09s | Max: 19m 35s | Hits:  92%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 10s | Avg:  3m 10s | Max:  3m 10s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 02s | Avg:  3m 02s | Max:  3m 02s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s
      🟩 Clang18            Pass: 100%/4   | Total: 27m 42s | Avg:  6m 55s | Max: 19m 27s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
      🟩 GCC12              Pass: 100%/2   | Total: 22m 40s | Avg: 11m 20s | Max: 19m 35s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 04s | Avg:  2m 46s | Max:  2m 54s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 05s | Avg:  9m 05s | Max:  9m 05s | Hits:  92%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 11s | Avg:  9m 11s | Max:  9m 11s | Hits:  92%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 35s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total: 54m 13s | Avg:  4m 31s | Max: 19m 27s
      🟩 GCC                Pass: 100%/8   | Total: 40m 17s | Avg:  5m 02s | Max: 19m 35s
      🟩 MSVC               Pass: 100%/2   | Total: 18m 16s | Avg:  9m 08s | Max:  9m 11s | Hits:  92%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 35s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 03m | Avg:  5m 09s | Max: 19m 35s | Hits:  92%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  1h 24m | Avg:  3m 50s | Max:  9m 11s | Hits:  92%/312   
      🟩 Test               Pass: 100%/2   | Total: 39m 02s | Avg: 19m 31s | Max: 19m 35s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
      🟩 90a                Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 35s | Avg:  3m 23s | Max:  5m 35s
      🟩 20                 Pass: 100%/20  | Total:  1h 50m | Avg:  5m 30s | Max: 19m 35s | Hits:  92%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 17m 54s | Avg: 4m 28s | Max: 4m 43s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  4m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  4m 43s
      🟩 12.6               Pass: 100%/2   | Total:  8m 39s | Avg:  4m 19s | Max:  4m 28s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  4m 43s
      🟩 nvcc12.6           Pass: 100%/2   | Total:  8m 39s | Avg:  4m 19s | Max:  4m 28s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  4m 43s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 43s | Avg:  4m 43s | Max:  4m 43s
      🟩 Clang18            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 32s | Avg:  4m 32s | Max:  4m 32s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 11s | Avg:  4m 11s | Max:  4m 11s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  4m 43s
      🟩 GCC                Pass: 100%/2   | Total:  8m 43s | Avg:  4m 21s | Max:  4m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  4m 43s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  4m 43s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 8m 04s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  8m 04s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s
      🟩 Test               Pass: 100%/1   | Total:  8m 04s | Avg:  8m 04s | Max:  8m 04s
    
  • 🟩 python: Pass: 100%/1 | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 30m 00s | Avg: 30m 00s | Max: 30m 00s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 178)

# Runner
123 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
21 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

bernhardmgruber commented Jan 8, 2025

Quoting miscco: "There are a lot of issues about cudaLaunchKernelEx being undefined:"

2025-01-08T12:00:41.9723249Z C:/cccl/libcudacxx/test/libcudacxx/force_include.h(112): error: identifier "cudaLaunchKernelEx" is undefined

This seems to be a bug in CTK 12.0, which defines cudaLaunchKernelEx only under #if __cplusplus >= 201103 || defined(__DOXYGEN_ONLY__). MSVC, however, reports __cplusplus as 199711L regardless of the standard mode (unless /Zc:__cplusplus is passed, and we cannot demand that users pass it). This was fixed in a later CTK; I am trying to find out in which version.

Edit: fixed in CTK 12.3
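
To illustrate why the guard quoted above fails on MSVC, here is a minimal sketch of the two ways to check the language standard. It assumes only documented MSVC behavior: without /Zc:__cplusplus, __cplusplus stays at 199711L while _MSVC_LANG carries the real standard value. The names naive_guard, msvc_aware_guard and EXAMPLE_CPP_DIALECT are hypothetical and exist in neither CTK nor CCCL, and how CTK 12.3 actually fixed the header is not shown in this thread.

```cpp
// Sketch only: the names below are hypothetical and appear in neither CTK nor CCCL.

// The guard as quoted above: it inspects only the value of __cplusplus.
constexpr bool naive_guard(long cplusplus_value) {
  return cplusplus_value >= 201103L;
}

// A guard that prefers _MSVC_LANG when the compiler defines it
// (pass 0 for msvc_lang_value to model a compiler without _MSVC_LANG).
constexpr bool msvc_aware_guard(long cplusplus_value, long msvc_lang_value) {
  return (msvc_lang_value != 0 ? msvc_lang_value : cplusplus_value) >= 201103L;
}

// MSVC in C++14 mode without /Zc:__cplusplus:
// __cplusplus == 199711L, _MSVC_LANG == 201402L.
static_assert(!naive_guard(199711L), "naive guard hides the declaration on MSVC");
static_assert(msvc_aware_guard(199711L, 201402L), "MSVC-aware guard keeps it visible");

// GCC or Clang in C++17 mode: __cplusplus == 201703L, _MSVC_LANG is not defined.
static_assert(naive_guard(201703L), "naive guard works where __cplusplus is accurate");
static_assert(msvc_aware_guard(201703L, 0L), "MSVC-aware guard agrees");

// The same idea as a preprocessor condition for whichever compiler builds this file.
#if defined(_MSVC_LANG)
#  define EXAMPLE_CPP_DIALECT _MSVC_LANG
#else
#  define EXAMPLE_CPP_DIALECT __cplusplus
#endif
static_assert(EXAMPLE_CPP_DIALECT >= 201103L, "this sketch needs C++11 or newer");
```

All checks run at compile time, so the snippet can be dropped into any translation unit. A header that tests EXAMPLE_CPP_DIALECT-style logic instead of bare __cplusplus would not depend on users passing /Zc:__cplusplus.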

miscco requested a review from a team as a code owner January 8, 2025 14:26
bernhardmgruber requested a review from a team as a code owner January 8, 2025 14:44

github-actions bot commented Jan 8, 2025

🟨 CI finished in 1h 45m: Pass: 97%/175 | Total: 2d 07h | Avg: 18m 59s | Max: 1h 11m | Hits: 66%/25925
  • 🟨 thrust: Pass: 95%/47 | Total: 18h 01m | Avg: 23m 00s | Max: 1h 09m | Hits: 69%/9260

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/45  | Total: 17h 51m | Avg: 23m 48s | Max:  1h 09m | Hits:  69%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  5m 13s
    🔍 ctk: 12.0 🔍
      🔍 12.0               Pass:  75%/8   | Total:  5h 30m | Avg: 41m 15s | Max:  1h 09m
      🟩 12.5               Pass: 100%/2   | Total:  1h 49m | Avg: 54m 43s | Max: 55m 41s
      🟩 12.6               Pass: 100%/37  | Total: 10h 41m | Avg: 17m 20s | Max:  1h 09m | Hits:  69%/9260  
    🔍 cudacxx: nvcc12.0 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 08s
      🔍 nvcc12.0           Pass:  75%/8   | Total:  5h 30m | Avg: 41m 15s | Max:  1h 09m
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 49m | Avg: 54m 43s | Max: 55m 41s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 10h 31m | Avg: 18m 02s | Max:  1h 09m | Hits:  69%/9260  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 08s
      🔍 nvcc               Pass:  95%/45  | Total: 17h 51m | Avg: 23m 48s | Max:  1h 09m | Hits:  69%/9260  
    🔍 cxx: MSVC14.29 🔍
      🟩 Clang9             Pass: 100%/4   | Total:  2h 13m | Avg: 33m 15s | Max: 40m 27s
      🟩 Clang10            Pass: 100%/1   | Total: 35m 05s | Avg: 35m 05s | Max: 35m 05s
      🟩 Clang11            Pass: 100%/1   | Total: 32m 55s | Avg: 32m 55s | Max: 32m 55s
      🟩 Clang12            Pass: 100%/1   | Total: 32m 20s | Avg: 32m 20s | Max: 32m 20s
      🟩 Clang13            Pass: 100%/1   | Total: 31m 41s | Avg: 31m 41s | Max: 31m 41s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
      🟩 Clang18            Pass: 100%/7   | Total: 46m 11s | Avg:  6m 35s | Max: 13m 28s
      🟩 GCC7               Pass: 100%/4   | Total:  1h 17m | Avg: 19m 29s | Max: 39m 09s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 12m | Avg: 24m 00s | Max: 34m 52s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 54s | Avg:  5m 54s | Max:  5m 54s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 41s | Max: 14m 15s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  60%/1852  
      🔍 MSVC14.29          Pass:  33%/3   | Total:  3h 10m | Avg:  1h 03m | Max:  1h 09m | Hits:  62%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 31m | Avg: 50m 35s | Max:  1h 09m | Hits:  74%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 49m | Avg: 54m 43s | Max: 55m 41s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/19  | Total:  5h 33m | Avg: 17m 32s | Max: 40m 27s
      🟩 GCC                Pass: 100%/19  | Total:  3h 53m | Avg: 12m 18s | Max: 39m 09s
      🔍 MSVC               Pass:  71%/7   | Total:  6h 44m | Avg: 57m 48s | Max:  1h 09m | Hits:  69%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 49m | Avg: 54m 43s | Max: 55m 41s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/41  | Total: 16h 42m | Avg: 24m 26s | Max:  1h 09m | Hits:  61%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 39m 49s | Avg: 13m 16s | Max: 24m 01s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 39m 00s | Avg: 13m 00s | Max: 14m 15s
    🟨 std
      🟩 11                 Pass: 100%/5   | Total:  2h 01m | Avg: 24m 20s | Max: 31m 46s
      🟨 14                 Pass:  75%/4   | Total:  2h 53m | Avg: 43m 26s | Max:  1h 09m | Hits:  60%/1852  
      🟨 17                 Pass:  92%/13  | Total:  6h 50m | Avg: 31m 36s | Max:  1h 03m | Hits:  62%/3704  
      🟩 20                 Pass: 100%/23  | Total:  5h 54m | Avg: 15m 25s | Max:  1h 09m | Hits:  80%/3704  
    🟨 gpu
      🟨 v100               Pass:  95%/47  | Total: 18h 01m | Avg: 23m 00s | Max:  1h 09m | Hits:  69%/9260  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 20m 08s | Avg: 10m 04s | Max: 14m 15s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
    
  • 🟨 libcudacxx: Pass: 97%/49 | Total: 9h 18m | Avg: 11m 23s | Max: 38m 37s | Hits: 72%/12453

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/47  | Total:  9h 11m | Avg: 11m 44s | Max: 38m 37s | Hits:  72%/12453 
      🟩 arm64              Pass: 100%/2   | Total:  6m 54s | Avg:  3m 27s | Max:  3m 35s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/8   | Total:  2h 00m | Avg: 15m 02s | Max: 30m 23s | Hits:  31%/4863  
      🟩 12.5               Pass: 100%/2   | Total: 17m 06s | Avg:  8m 33s | Max:  8m 44s
      🔍 12.6               Pass:  97%/39  | Total:  7h 00m | Avg: 10m 47s | Max: 38m 37s | Hits:  99%/7590  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 45s | Max: 21m 10s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 00m | Avg: 15m 02s | Max: 30m 23s | Hits:  31%/4863  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 06s | Avg:  8m 33s | Max:  8m 44s
      🔍 nvcc12.6           Pass:  97%/35  | Total:  5h 53m | Avg: 10m 06s | Max: 38m 37s | Hits:  99%/7590  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 45s | Max: 21m 10s
      🔍 nvcc               Pass:  97%/45  | Total:  8h 11m | Avg: 10m 55s | Max: 38m 37s | Hits:  72%/12453 
    🚨 cxx: MSVC14.16 🚨
      🟩 Clang9             Pass: 100%/4   | Total: 20m 33s | Avg:  5m 08s | Max:  8m 06s
      🟩 Clang10            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 58s | Avg:  3m 58s | Max:  3m 58s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
      🟩 Clang14            Pass: 100%/1   | Total:  4m 12s | Avg:  4m 12s | Max:  4m 12s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 43m | Avg: 12m 53s | Max: 24m 07s
      🟩 GCC7               Pass: 100%/4   | Total: 27m 02s | Avg:  6m 45s | Max: 17m 48s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s
      🟩 GCC9               Pass: 100%/3   | Total: 34m 45s | Avg: 11m 35s | Max: 17m 48s
      🟩 GCC10              Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 56s | Avg:  3m 56s | Max:  3m 56s
      🟩 GCC12              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 54m | Avg: 17m 25s | Max: 38m 37s
      🔥 MSVC14.16          Pass:   0%/1   | Total: 30m 06s | Avg: 30m 06s | Max: 30m 06s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 10m | Avg: 23m 29s | Max: 30m 23s | Hits:  54%/7344  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 30m 42s | Avg: 15m 21s | Max: 15m 58s | Hits:  99%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 06s | Avg:  8m 33s | Max:  8m 44s
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/20  | Total:  2h 38m | Avg:  7m 55s | Max: 24m 07s
      🟩 GCC                Pass: 100%/21  | Total:  4h 11m | Avg: 11m 58s | Max: 38m 37s
      🔍 MSVC               Pass:  83%/6   | Total:  2h 11m | Avg: 21m 52s | Max: 30m 23s | Hits:  72%/12453 
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 06s | Avg:  8m 33s | Max:  8m 44s
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  97%/42  | Total:  6h 14m | Avg:  8m 55s | Max: 30m 23s | Hits:  72%/12453 
      🟩 NVRTC              Pass: 100%/4   | Total:  2h 18m | Avg: 34m 40s | Max: 38m 37s
      🟩 Test               Pass: 100%/2   | Total: 43m 12s | Avg: 21m 36s | Max: 24m 07s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 56s | Avg:  1m 56s | Max:  1m 56s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/6   | Total:  1h 00m | Avg: 10m 00s | Max: 33m 21s
      🔍 14                 Pass:  80%/5   | Total:  1h 43m | Avg: 20m 37s | Max: 37m 56s | Hits:  32%/2392  
      🟩 17                 Pass: 100%/14  | Total:  3h 09m | Avg: 13m 30s | Max: 38m 37s | Hits:  76%/7433  
      🟩 20                 Pass: 100%/23  | Total:  3h 24m | Avg:  8m 52s | Max: 28m 46s | Hits:  98%/2628  
    🟨 gpu
      🟨 v100               Pass:  97%/49  | Total:  9h 18m | Avg: 11m 23s | Max: 38m 37s | Hits:  72%/12453 
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 13m 38s | Avg: 13m 38s | Max: 13m 38s
      🟩 90a                Pass: 100%/2   | Total: 16m 26s | Avg:  8m 13s | Max: 12m 39s
    
  • 🟨 cub: Pass: 97%/48 | Total: 1d 00h | Avg: 30m 49s | Max: 1h 11m | Hits: 40%/3900

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/46  | Total:  1d 00h | Avg: 31m 57s | Max:  1h 11m | Hits:  40%/3900  
      🟩 arm64              Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 59s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/8   | Total:  7h 27m | Avg: 55m 55s | Max:  1h 04m | Hits:  40%/1560  
      🟩 12.5               Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m
      🔍 12.6               Pass:  97%/38  | Total: 14h 56m | Avg: 23m 36s | Max:  1h 11m | Hits:  40%/2340  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 06s | Avg:  4m 33s | Max:  4m 41s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  7h 27m | Avg: 55m 55s | Max:  1h 04m | Hits:  40%/1560  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m
      🔍 nvcc12.6           Pass:  97%/36  | Total: 14h 47m | Avg: 24m 39s | Max:  1h 11m | Hits:  40%/2340  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 06s | Avg:  4m 33s | Max:  4m 41s
      🔍 nvcc               Pass:  97%/46  | Total:  1d 00h | Avg: 31m 58s | Max:  1h 11m | Hits:  40%/3900  
    🚨 cxx: MSVC14.16 🚨
      🟩 Clang9             Pass: 100%/4   | Total:  3h 40m | Avg: 55m 11s | Max: 57m 00s
      🟩 Clang10            Pass: 100%/1   | Total: 54m 36s | Avg: 54m 36s | Max: 54m 36s
      🟩 Clang11            Pass: 100%/1   | Total: 55m 49s | Avg: 55m 49s | Max: 55m 49s
      🟩 Clang12            Pass: 100%/1   | Total: 55m 14s | Avg: 55m 14s | Max: 55m 14s
      🟩 Clang13            Pass: 100%/1   | Total: 54m 00s | Avg: 54m 00s | Max: 54m 00s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 13s | Avg:  5m 13s | Max:  5m 13s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 33m | Avg: 13m 23s | Max: 39m 50s
      🟩 GCC7               Pass: 100%/4   | Total:  1h 57m | Avg: 29m 18s | Max: 53m 28s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 54m | Avg: 38m 06s | Max: 56m 25s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 02s | Avg:  6m 02s | Max:  6m 02s
      🟩 GCC12              Pass: 100%/3   | Total: 25m 47s | Avg:  8m 35s | Max: 15m 59s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 03m | Avg: 15m 24s | Max: 37m 46s
      🔥 MSVC14.16          Pass:   0%/1   | Total:  1h 11m | Avg:  1h 11m | Max:  1h 11m
      🟩 MSVC14.29          Pass: 100%/3   | Total:  3h 03m | Avg:  1h 01m | Max:  1h 04m | Hits:  40%/2340  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m | Hits:  40%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m
    🔍 cxx_family: MSVC 🔍
      🟩 Clang              Pass: 100%/19  | Total:  9h 15m | Avg: 29m 13s | Max: 57m 00s
      🟩 GCC                Pass: 100%/21  | Total:  6h 37m | Avg: 18m 55s | Max: 56m 25s
      🔍 MSVC               Pass:  83%/6   | Total:  6h 31m | Avg:  1h 05m | Max:  1h 11m | Hits:  40%/3900  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 20m 15s | Avg: 10m 07s | Max: 15m 59s
      🔍 v100               Pass:  97%/46  | Total:  1d 00h | Avg: 31m 43s | Max:  1h 11m | Hits:  40%/3900  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  97%/41  | Total: 21h 32m | Avg: 31m 30s | Max:  1h 11m | Hits:  40%/3900  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 50s | Avg: 19m 50s | Max: 19m 50s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 47s | Avg: 17m 47s | Max: 17m 47s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 04s | Max: 29m 11s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 48s | Max: 39m 50s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/5   | Total:  3h 38m | Avg: 43m 39s | Max: 55m 22s
      🔍 14                 Pass:  75%/4   | Total:  3h 12m | Avg: 48m 12s | Max:  1h 11m | Hits:  40%/780   
      🟩 17                 Pass: 100%/13  | Total:  8h 29m | Avg: 39m 11s | Max:  1h 11m | Hits:  40%/2340  
      🟩 20                 Pass: 100%/26  | Total:  9h 18m | Avg: 21m 29s | Max:  1h 10m | Hits:  40%/780   
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 15s | Avg: 10m 07s | Max: 15m 59s
      🟩 90a                Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 21m | Avg: 5m 53s | Max: 28m 23s | Hits: 90%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 08m | Avg:  6m 26s | Max: 28m 23s | Hits:  90%/312   
      🟩 arm64              Pass: 100%/4   | Total: 12m 38s | Avg:  3m 09s | Max:  3m 15s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  90%/156   
      🟩 12.5               Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 15s
      🟩 12.6               Pass: 100%/21  | Total:  1h 59m | Avg:  5m 41s | Max: 28m 23s | Hits:  90%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  90%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 15s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 59m | Avg:  5m 41s | Max: 28m 23s | Hits:  90%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 21m | Avg:  5m 53s | Max: 28m 23s | Hits:  90%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 57s | Avg:  3m 57s | Max:  3m 57s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 36s | Avg:  3m 36s | Max:  3m 36s
      🟩 Clang18            Pass: 100%/4   | Total: 38m 02s | Avg:  9m 30s | Max: 28m 23s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 47s | Avg:  3m 47s | Max:  3m 47s
      🟩 GCC12              Pass: 100%/2   | Total: 20m 50s | Avg: 10m 25s | Max: 17m 11s
      🟩 GCC13              Pass: 100%/4   | Total: 12m 49s | Avg:  3m 12s | Max:  3m 20s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s | Hits:  90%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 58s | Avg:  9m 58s | Max:  9m 58s | Hits:  90%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 15s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total:  1h 08m | Avg:  5m 41s | Max: 28m 23s
      🟩 GCC                Pass: 100%/8   | Total: 41m 00s | Avg:  5m 07s | Max: 17m 11s
      🟩 MSVC               Pass: 100%/2   | Total: 19m 55s | Avg:  9m 57s | Max:  9m 58s | Hits:  90%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 15s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 21m | Avg:  5m 53s | Max: 28m 23s | Hits:  90%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  1h 35m | Avg:  4m 21s | Max:  9m 58s | Hits:  90%/312   
      🟩 Test               Pass: 100%/2   | Total: 45m 34s | Avg: 22m 47s | Max: 28m 23s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
      🟩 90a                Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 15m 37s | Avg:  3m 54s | Max:  6m 15s
      🟩 20                 Pass: 100%/20  | Total:  2h 05m | Avg:  6m 17s | Max: 28m 23s | Hits:  90%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 23m 53s | Avg: 5m 58s | Max: 7m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 23m 53s | Avg:  5m 58s | Max:  7m 36s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  7m 36s
      🟩 12.6               Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  6m 06s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 12m 37s | Avg:  6m 18s | Max:  7m 36s
      🟩 nvcc12.6           Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  6m 06s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 23m 53s | Avg:  5m 58s | Max:  7m 36s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s
      🟩 Clang18            Pass: 100%/1   | Total:  5m 10s | Avg:  5m 10s | Max:  5m 10s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 01s | Avg:  5m 01s | Max:  5m 01s
      🟩 GCC13              Pass: 100%/1   | Total:  6m 06s | Avg:  6m 06s | Max:  6m 06s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 12m 46s | Avg:  6m 23s | Max:  7m 36s
      🟩 GCC                Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  6m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 23m 53s | Avg:  5m 58s | Max:  7m 36s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 23m 53s | Avg:  5m 58s | Max:  7m 36s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 23s | Avg: 4m 41s | Max: 7m 26s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 23s | Avg:  4m 41s | Max:  7m 26s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 57s | Avg:  1m 57s | Max:  1m 57s
      🟩 Test               Pass: 100%/1   | Total:  7m 26s | Avg:  7m 26s | Max:  7m 26s
    
  • 🟩 python: Pass: 100%/1 | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 28m 40s | Avg: 28m 40s | Max: 28m 40s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
+/- libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 175)

# Runner
120 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
21 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@bernhardmgruber force-pushed the drop_ctk11 branch 2 times, most recently from 3cd8f4c to 98d7089 on January 8, 2025 18:02
Contributor

github-actions bot commented Jan 8, 2025

🟨 CI finished in 2h 47m: Pass: 97%/172 | Total: 1d 18h | Avg: 14m 57s | Max: 1h 08m | Hits: 172%/23877
  • 🟨 cub: Pass: 89%/47 | Total: 14h 26m | Avg: 18m 25s | Max: 1h 08m

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/45  | Total: 13h 20m | Avg: 17m 46s | Max:  1h 08m
      🟩 arm64              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 05s | Max:  1h 01m
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 08s | Avg:  4m 34s | Max:  4m 51s
      🔍 nvcc               Pass:  88%/45  | Total: 14h 17m | Avg: 19m 02s | Max:  1h 08m
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/19  | Total:  2h 37m | Avg:  8m 17s | Max: 35m 18s
      🟩 GCC                Pass: 100%/21  | Total:  6h 03m | Avg: 17m 17s | Max:  1h 02m
      🔥 MSVC               Pass:   0%/5   | Total:  5h 26m | Avg:  1h 05m | Max:  1h 08m
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max:  9m 50s
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 20m 36s | Avg: 10m 18s | Max: 16m 09s
      🔍 v100               Pass:  88%/45  | Total: 14h 05m | Avg: 18m 47s | Max:  1h 08m
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  87%/40  | Total: 11h 31m | Avg: 17m 16s | Max:  1h 08m
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 19s | Avg: 19m 19s | Max: 19m 19s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 04s | Avg: 17m 04s | Max: 17m 04s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 15s | Max: 27m 11s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 12m | Avg: 36m 05s | Max: 36m 53s
    🟨 ctk
      🟨 12.0               Pass:  75%/8   | Total:  2h 40m | Avg: 20m 06s | Max:  1h 07m
      🟩 12.5               Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max:  9m 50s
      🟨 12.6               Pass:  91%/37  | Total: 11h 26m | Avg: 18m 33s | Max:  1h 08m
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 08s | Avg:  4m 34s | Max:  4m 51s
      🟨 nvcc12.0           Pass:  75%/8   | Total:  2h 40m | Avg: 20m 06s | Max:  1h 07m
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max:  9m 50s
      🟨 nvcc12.6           Pass:  91%/35  | Total: 11h 17m | Avg: 19m 20s | Max:  1h 08m
    🟨 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 23m 42s | Avg:  5m 55s | Max:  6m 40s
      🟩 Clang10            Pass: 100%/1   | Total:  7m 03s | Avg:  7m 03s | Max:  7m 03s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 32s | Avg:  5m 32s | Max:  5m 32s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 37s | Avg:  5m 37s | Max:  5m 37s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 31s | Max: 35m 18s
      🟩 GCC7               Pass: 100%/4   | Total: 21m 12s | Avg:  5m 18s | Max:  5m 41s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC9               Pass: 100%/3   | Total: 16m 27s | Avg:  5m 29s | Max:  5m 48s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 GCC11              Pass: 100%/1   | Total:  6m 04s | Avg:  6m 04s | Max:  6m 04s
      🟩 GCC12              Pass: 100%/3   | Total: 26m 40s | Avg:  8m 53s | Max: 16m 09s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 41m | Avg: 35m 09s | Max:  1h 02m
      🟥 MSVC14.29          Pass:   0%/3   | Total:  3h 11m | Avg:  1h 03m | Max:  1h 07m
      🟥 MSVC14.39          Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 08m
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 10s | Avg:  9m 35s | Max:  9m 50s
    🟨 std
      🟩 11                 Pass: 100%/5   | Total: 26m 58s | Avg:  5m 23s | Max:  5m 59s
      🟨 14                 Pass:  66%/3   | Total:  1h 13m | Avg: 24m 35s | Max:  1h 01m
      🟨 17                 Pass:  76%/13  | Total:  5h 13m | Avg: 24m 04s | Max:  1h 07m
      🟨 20                 Pass:  96%/26  | Total:  7h 32m | Avg: 17m 24s | Max:  1h 08m
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 36s | Avg: 10m 18s | Max: 16m 09s
      🟩 90a                Pass: 100%/1   | Total: 27m 00s | Avg: 27m 00s | Max: 27m 00s
    
  • 🟩 libcudacxx: Pass: 100%/48 | Total: 12h 09m | Avg: 15m 11s | Max: 1h 05m | Hits: 203%/12453

    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total: 11h 42m | Avg: 15m 16s | Max:  1h 05m | Hits: 203%/12453 
      🟩 arm64              Pass: 100%/2   | Total: 26m 40s | Avg: 13m 20s | Max: 23m 03s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  2h 22m | Avg: 17m 45s | Max: 31m 26s | Hits: 160%/4863  
      🟩 12.5               Pass: 100%/2   | Total: 39m 49s | Avg: 19m 54s | Max: 31m 06s
      🟩 12.6               Pass: 100%/38  | Total:  9h 07m | Avg: 14m 24s | Max:  1h 05m | Hits: 230%/7590  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 06m | Avg: 16m 38s | Max: 22m 14s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 22m | Avg: 17m 45s | Max: 31m 26s | Hits: 160%/4863  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 39m 49s | Avg: 19m 54s | Max: 31m 06s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  8h 01m | Avg: 14m 08s | Max:  1h 05m | Hits: 230%/7590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 06m | Avg: 16m 38s | Max: 22m 14s
      🟩 nvcc               Pass: 100%/44  | Total: 11h 02m | Avg: 15m 03s | Max:  1h 05m | Hits: 203%/12453 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 41m 13s | Avg: 10m 18s | Max: 17m 54s
      🟩 Clang10            Pass: 100%/1   | Total:  4m 57s | Avg:  4m 57s | Max:  4m 57s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 12s | Avg:  4m 12s | Max:  4m 12s
      🟩 Clang12            Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 35s | Avg:  4m 35s | Max:  4m 35s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 30s | Avg:  4m 30s | Max:  4m 30s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 37m | Avg: 12m 12s | Max: 22m 14s
      🟩 GCC7               Pass: 100%/4   | Total: 39m 21s | Avg:  9m 50s | Max: 17m 43s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 37s | Avg:  3m 37s | Max:  3m 37s
      🟩 GCC9               Pass: 100%/3   | Total: 21m 15s | Avg:  7m 05s | Max: 14m 14s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 06s | Avg:  4m 06s | Max:  4m 06s
      🟩 GCC13              Pass: 100%/10  | Total:  4h 21m | Avg: 26m 09s | Max:  1h 05m
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 36m | Avg: 32m 16s | Max: 37m 25s | Hits: 150%/7344  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 21m | Avg: 40m 32s | Max: 43m 08s | Hits: 280%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 39m 49s | Avg: 19m 54s | Max: 31m 06s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/20  | Total:  2h 54m | Avg:  8m 42s | Max: 22m 14s
      🟩 GCC                Pass: 100%/21  | Total:  5h 37m | Avg: 16m 04s | Max:  1h 05m
      🟩 MSVC               Pass: 100%/5   | Total:  2h 57m | Avg: 35m 35s | Max: 43m 08s | Hits: 203%/12453 
      🟩 NVHPC              Pass: 100%/2   | Total: 39m 49s | Avg: 19m 54s | Max: 31m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/48  | Total: 12h 09m | Avg: 15m 11s | Max:  1h 05m | Hits: 203%/12453 
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total:  9h 00m | Avg: 13m 11s | Max: 43m 08s | Hits: 203%/12453 
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 42m | Avg: 25m 35s | Max: 27m 52s
      🟩 Test               Pass: 100%/2   | Total:  1h 24m | Avg: 42m 09s | Max:  1h 05m
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 09s | Avg:  2m 09s | Max:  2m 09s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 46s | Avg: 12m 46s | Max: 12m 46s
      🟩 90a                Pass: 100%/2   | Total: 32m 45s | Avg: 16m 22s | Max: 19m 32s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total:  1h 15m | Avg: 12m 33s | Max: 24m 37s
      🟩 14                 Pass: 100%/4   | Total:  1h 03m | Avg: 15m 54s | Max: 27m 59s | Hits: 132%/2392  
      🟩 17                 Pass: 100%/14  | Total:  4h 17m | Avg: 18m 25s | Max: 37m 57s | Hits: 205%/7433  
      🟩 20                 Pass: 100%/23  | Total:  5h 30m | Avg: 14m 21s | Max:  1h 05m | Hits: 263%/2628  
    
  • 🟩 thrust: Pass: 100%/46 | Total: 12h 25m | Avg: 16m 12s | Max: 1h 02m | Hits: 140%/11112

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 43m 48s | Avg: 21m 54s | Max: 22m 13s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 11h 42m | Avg: 15m 58s | Max:  1h 02m | Hits: 140%/11112 
      🟩 arm64              Pass: 100%/2   | Total: 42m 36s | Avg: 21m 18s | Max: 38m 02s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  2h 30m | Avg: 18m 46s | Max:  1h 02m | Hits:  61%/3704  
      🟩 12.5               Pass: 100%/2   | Total: 30m 29s | Avg: 15m 14s | Max: 15m 24s
      🟩 12.6               Pass: 100%/36  | Total:  9h 24m | Avg: 15m 41s | Max:  1h 00m | Hits: 179%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 22s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 30m | Avg: 18m 46s | Max:  1h 02m | Hits:  61%/3704  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 29s | Avg: 15m 14s | Max: 15m 24s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  9h 14m | Avg: 16m 18s | Max:  1h 00m | Hits: 179%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 26s | Avg:  5m 13s | Max:  5m 22s
      🟩 nvcc               Pass: 100%/44  | Total: 12h 14m | Avg: 16m 42s | Max:  1h 02m | Hits: 140%/11112 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 23m 34s | Avg:  5m 53s | Max:  6m 35s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 26s | Avg:  6m 26s | Max:  6m 26s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 16s | Avg:  5m 16s | Max:  5m 16s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 37s | Avg:  5m 37s | Max:  5m 37s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 44s | Avg:  5m 44s | Max:  5m 44s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 12s | Avg:  5m 12s | Max:  5m 12s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s
      🟩 Clang18            Pass: 100%/7   | Total: 55m 00s | Avg:  7m 51s | Max: 20m 58s
      🟩 GCC7               Pass: 100%/4   | Total: 20m 05s | Avg:  5m 01s | Max:  5m 37s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 24s | Avg:  5m 08s | Max:  5m 35s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 27s | Avg:  5m 27s | Max:  5m 27s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 08s | Avg:  6m 08s | Max:  6m 08s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 23m | Avg: 25m 28s | Max: 38m 15s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  2h 59m | Avg: 59m 58s | Max:  1h 02m | Hits:  68%/5556  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 29m | Avg: 49m 56s | Max: 59m 42s | Hits: 212%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 29s | Avg: 15m 14s | Max: 15m 24s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 03m | Avg:  6m 29s | Max: 20m 58s
      🟩 GCC                Pass: 100%/19  | Total:  4h 21m | Avg: 13m 47s | Max: 38m 15s
      🟩 MSVC               Pass: 100%/6   | Total:  5h 29m | Avg: 54m 57s | Max:  1h 02m | Hits: 140%/11112 
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 29s | Avg: 15m 14s | Max: 15m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 12h 25m | Avg: 16m 12s | Max:  1h 02m | Hits: 140%/11112 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 10h 23m | Avg: 15m 35s | Max:  1h 02m | Hits:  95%/9260  
      🟩 TestCPU            Pass: 100%/3   | Total: 51m 46s | Avg: 17m 15s | Max: 35m 58s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 10m | Avg: 23m 23s | Max: 27m 37s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 23m 06s | Avg: 23m 06s | Max: 23m 06s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 25m 03s | Avg:  5m 00s | Max:  6m 04s
      🟩 14                 Pass: 100%/3   | Total:  1h 09m | Avg: 23m 16s | Max: 57m 37s | Hits:  61%/1852  
      🟩 17                 Pass: 100%/13  | Total:  4h 40m | Avg: 21m 32s | Max:  1h 02m | Hits:  77%/5556  
      🟩 20                 Pass: 100%/23  | Total:  5h 26m | Avg: 14m 11s | Max: 54m 08s | Hits: 274%/3704  
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 53m | Avg: 7m 14s | Max: 24m 19s | Hits: 41%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 20m | Avg:  7m 00s | Max: 24m 19s | Hits:  41%/312   
      🟩 arm64              Pass: 100%/4   | Total: 33m 44s | Avg:  8m 26s | Max: 15m 17s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 12m 25s | Avg: 12m 25s | Max: 12m 25s | Hits:  40%/156   
      🟩 12.5               Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  5m 12s
      🟩 12.6               Pass: 100%/21  | Total:  2h 31m | Avg:  7m 12s | Max: 24m 19s | Hits:  42%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 12m 25s | Avg: 12m 25s | Max: 12m 25s | Hits:  40%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  5m 12s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  2h 31m | Avg:  7m 12s | Max: 24m 19s | Hits:  42%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 53m | Avg:  7m 14s | Max: 24m 19s | Hits:  41%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 11s | Avg:  3m 11s | Max:  3m 11s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s
      🟩 Clang18            Pass: 100%/4   | Total: 33m 10s | Avg:  8m 17s | Max: 24m 19s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 GCC12              Pass: 100%/2   | Total: 22m 01s | Avg: 11m 00s | Max: 18m 39s
      🟩 GCC13              Pass: 100%/4   | Total: 51m 14s | Avg: 12m 48s | Max: 15m 17s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 12m 25s | Avg: 12m 25s | Max: 12m 25s | Hits:  40%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 20s | Avg: 11m 20s | Max: 11m 20s | Hits:  42%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  5m 12s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total:  1h 00m | Avg:  5m 00s | Max: 24m 19s
      🟩 GCC                Pass: 100%/8   | Total:  1h 19m | Avg:  9m 59s | Max: 18m 39s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 45s | Avg: 11m 52s | Max: 12m 25s | Hits:  41%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  5m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 53m | Avg:  7m 14s | Max: 24m 19s | Hits:  41%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  2h 10m | Avg:  5m 57s | Max: 15m 17s | Hits:  41%/312   
      🟩 Test               Pass: 100%/2   | Total: 42m 58s | Avg: 21m 29s | Max: 24m 19s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 10m 33s | Avg: 10m 33s | Max: 10m 33s
      🟩 90a                Pass: 100%/1   | Total: 12m 23s | Avg: 12m 23s | Max: 12m 23s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 31m 12s | Avg:  7m 48s | Max: 13m 01s
      🟩 20                 Pass: 100%/20  | Total:  2h 22m | Avg:  7m 08s | Max: 24m 19s | Hits:  41%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 19m 05s | Avg: 4m 46s | Max: 5m 02s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 19m 05s | Avg:  4m 46s | Max:  5m 02s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  5m 02s
      🟩 12.6               Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  5m 02s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  5m 02s
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 19s | Avg:  4m 39s | Max:  5m 02s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 19m 05s | Avg:  4m 46s | Max:  5m 02s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
      🟩 Clang18            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 44s | Avg:  4m 44s | Max:  4m 44s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 17s | Avg:  4m 17s | Max:  4m 17s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 10m 04s | Avg:  5m 02s | Max:  5m 02s
      🟩 GCC                Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 19m 05s | Avg:  4m 46s | Max:  5m 02s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 19m 05s | Avg:  4m 46s | Max:  5m 02s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 40s | Avg: 5m 20s | Max: 8m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 40s | Avg:  5m 20s | Max:  8m 33s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 07s | Avg:  2m 07s | Max:  2m 07s
      🟩 Test               Pass: 100%/1   | Total:  8m 33s | Avg:  8m 33s | Max:  8m 33s
    
  • 🟩 python: Pass: 100%/1 | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 26m 55s | Avg: 26m 55s | Max: 26m 55s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
+/- libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 172)

# Runner
120 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
18 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

Contributor

github-actions bot commented Jan 9, 2025

🟨 CI finished in 2h 08m: Pass: 97%/172 | Total: 1d 11h | Avg: 12m 29s | Max: 1h 06m | Hits: 468%/23877
  • 🟨 cub: Pass: 89%/47 | Total: 10h 24m | Avg: 13m 17s | Max: 1h 04m

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  88%/45  | Total: 10h 15m | Avg: 13m 40s | Max:  1h 04m
      🟩 arm64              Pass: 100%/2   | Total:  9m 22s | Avg:  4m 41s | Max:  4m 41s
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 46s | Avg:  4m 23s | Max:  4m 31s
      🔍 nvcc               Pass:  88%/45  | Total: 10h 16m | Avg: 13m 41s | Max:  1h 04m
    🚨 cxx_family: MSVC 🚨
      🟩 Clang              Pass: 100%/19  | Total:  2h 33m | Avg:  8m 04s | Max: 32m 43s
      🟩 GCC                Pass: 100%/21  | Total:  4h 01m | Avg: 11m 28s | Max: 58m 02s
      🔥 MSVC               Pass:   0%/5   | Total:  3h 32m | Avg: 42m 32s | Max:  1h 04m
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 39s | Avg:  8m 49s | Max:  8m 51s
    🔍 gpu: v100 🔍
      🟩 h100               Pass: 100%/2   | Total: 19m 59s | Avg:  9m 59s | Max: 15m 54s
      🔍 v100               Pass:  88%/45  | Total: 10h 04m | Avg: 13m 26s | Max:  1h 04m
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  87%/40  | Total:  7h 39m | Avg: 11m 29s | Max:  1h 04m
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 42s | Avg: 17m 42s | Max: 17m 42s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 31s | Avg: 14m 31s | Max: 14m 31s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 16m | Avg: 25m 37s | Max: 32m 43s
      🟩 TestGPU            Pass: 100%/2   | Total: 56m 06s | Avg: 28m 03s | Max: 28m 16s
    🟨 ctk
      🟨 12.0               Pass:  75%/8   | Total:  2h 39m | Avg: 19m 53s | Max:  1h 04m
      🟩 12.5               Pass: 100%/2   | Total: 17m 39s | Avg:  8m 49s | Max:  8m 51s
      🟨 12.6               Pass:  91%/37  | Total:  7h 28m | Avg: 12m 06s | Max: 58m 02s
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 46s | Avg:  4m 23s | Max:  4m 31s
      🟨 nvcc12.0           Pass:  75%/8   | Total:  2h 39m | Avg: 19m 53s | Max:  1h 04m
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 39s | Avg:  8m 49s | Max:  8m 51s
      🟨 nvcc12.6           Pass:  91%/35  | Total:  7h 19m | Avg: 12m 33s | Max: 58m 02s
    🟨 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 23m 02s | Avg:  5m 45s | Max:  6m 13s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 55s | Avg:  5m 55s | Max:  5m 55s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 25m | Avg: 12m 11s | Max: 32m 43s
      🟩 GCC7               Pass: 100%/4   | Total: 20m 22s | Avg:  5m 05s | Max:  5m 13s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 GCC9               Pass: 100%/3   | Total: 16m 53s | Avg:  5m 37s | Max:  5m 51s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 22s | Avg:  5m 22s | Max:  5m 22s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 GCC12              Pass: 100%/3   | Total: 26m 04s | Avg:  8m 41s | Max: 15m 54s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 41m | Avg: 20m 11s | Max: 58m 02s
      🟥 MSVC14.29          Pass:   0%/3   | Total:  2h 33m | Avg: 51m 08s | Max:  1h 04m
      🟥 MSVC14.39          Pass:   0%/2   | Total: 59m 17s | Avg: 29m 38s | Max: 30m 51s
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 39s | Avg:  8m 49s | Max:  8m 51s
    🟨 std
      🟩 11                 Pass: 100%/5   | Total: 26m 47s | Avg:  5m 21s | Max:  5m 51s
      🟨 14                 Pass:  66%/3   | Total:  1h 13m | Avg: 24m 39s | Max:  1h 02m
      🟨 17                 Pass:  76%/13  | Total:  3h 50m | Avg: 17m 42s | Max:  1h 04m
      🟨 20                 Pass:  96%/26  | Total:  4h 53m | Avg: 11m 18s | Max: 32m 43s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 19m 59s | Avg:  9m 59s | Max: 15m 54s
      🟩 90a                Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
    
  • 🟩 libcudacxx: Pass: 100%/48 | Total: 14h 05m | Avg: 17m 36s | Max: 1h 06m | Hits: 614%/12453

    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total: 13h 22m | Avg: 17m 26s | Max:  1h 06m | Hits: 614%/12453 
      🟩 arm64              Pass: 100%/2   | Total: 43m 08s | Avg: 21m 34s | Max: 22m 05s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  2h 20m | Avg: 17m 37s | Max: 23m 24s | Hits: 615%/4863  
      🟩 12.5               Pass: 100%/2   | Total:  1h 00m | Avg: 30m 09s | Max: 30m 47s
      🟩 12.6               Pass: 100%/38  | Total: 10h 43m | Avg: 16m 56s | Max:  1h 06m | Hits: 614%/7590  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 07m | Avg: 16m 48s | Max: 21m 33s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 20m | Avg: 17m 37s | Max: 23m 24s | Hits: 615%/4863  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 00m | Avg: 30m 09s | Max: 30m 47s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  9h 36m | Avg: 16m 57s | Max:  1h 06m | Hits: 614%/7590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 07m | Avg: 16m 48s | Max: 21m 33s
      🟩 nvcc               Pass: 100%/44  | Total: 12h 57m | Avg: 17m 40s | Max:  1h 06m | Hits: 614%/12453 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 10m | Avg: 17m 44s | Max: 20m 15s
      🟩 Clang10            Pass: 100%/1   | Total:  5m 01s | Avg:  5m 01s | Max:  5m 01s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s
      🟩 Clang12            Pass: 100%/1   | Total: 22m 22s | Avg: 22m 22s | Max: 22m 22s
      🟩 Clang13            Pass: 100%/1   | Total: 17m 43s | Avg: 17m 43s | Max: 17m 43s
      🟩 Clang14            Pass: 100%/1   | Total: 22m 27s | Avg: 22m 27s | Max: 22m 27s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 26s | Avg:  4m 26s | Max:  4m 26s
      🟩 Clang16            Pass: 100%/1   | Total: 21m 44s | Avg: 21m 44s | Max: 21m 44s
      🟩 Clang17            Pass: 100%/1   | Total: 17m 33s | Avg: 17m 33s | Max: 17m 33s
      🟩 Clang18            Pass: 100%/8   | Total:  2h 42m | Avg: 20m 19s | Max:  1h 06m
      🟩 GCC7               Pass: 100%/4   | Total: 46m 07s | Avg: 11m 31s | Max: 16m 48s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s
      🟩 GCC9               Pass: 100%/3   | Total: 35m 33s | Avg: 11m 51s | Max: 17m 23s
      🟩 GCC10              Pass: 100%/1   | Total: 22m 24s | Avg: 22m 24s | Max: 22m 24s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 41s | Avg:  3m 41s | Max:  3m 41s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 00s | Avg:  4m 00s | Max:  4m 00s
      🟩 GCC13              Pass: 100%/10  | Total:  3h 08m | Avg: 18m 48s | Max: 38m 04s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 12m | Avg: 24m 15s | Max: 27m 31s | Hits: 615%/7344  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 59m 53s | Avg: 29m 56s | Max: 31m 13s | Hits: 614%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 09s | Max: 30m 47s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/20  | Total:  5h 48m | Avg: 17m 26s | Max:  1h 06m
      🟩 GCC                Pass: 100%/21  | Total:  5h 03m | Avg: 14m 26s | Max: 38m 04s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 12m | Avg: 26m 31s | Max: 31m 13s | Hits: 614%/12453 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 00m | Avg: 30m 09s | Max: 30m 47s
    🟩 gpu
      🟩 v100               Pass: 100%/48  | Total: 14h 05m | Avg: 17m 36s | Max:  1h 06m | Hits: 614%/12453 
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total: 10h 33m | Avg: 15m 26s | Max: 31m 13s | Hits: 614%/12453 
      🟩 NVRTC              Pass: 100%/4   | Total:  2h 04m | Avg: 31m 01s | Max: 38m 04s
      🟩 Test               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 03s | Max:  1h 06m
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 51s | Avg:  1m 51s | Max:  1m 51s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 45s | Avg: 12m 45s | Max: 12m 45s
      🟩 90a                Pass: 100%/2   | Total: 16m 22s | Avg:  8m 11s | Max: 12m 42s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total:  1h 32m | Avg: 15m 28s | Max: 19m 07s
      🟩 14                 Pass: 100%/4   | Total:  1h 22m | Avg: 20m 34s | Max: 36m 58s | Hits: 615%/2392  
      🟩 17                 Pass: 100%/14  | Total:  3h 50m | Avg: 16m 28s | Max: 29m 58s | Hits: 615%/7433  
      🟩 20                 Pass: 100%/23  | Total:  7h 17m | Avg: 19m 01s | Max:  1h 06m | Hits: 613%/2628  
    
  • 🟩 thrust: Pass: 100%/46 | Total: 8h 21m | Avg: 10m 54s | Max: 53m 10s | Hits: 302%/11112

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 27m 12s | Avg: 13m 36s | Max: 21m 32s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  8h 12m | Avg: 11m 11s | Max: 53m 10s | Hits: 302%/11112 
      🟩 arm64              Pass: 100%/2   | Total:  9m 35s | Avg:  4m 47s | Max:  4m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  2h 15m | Avg: 16m 57s | Max: 53m 10s | Hits: 175%/3704  
      🟩 12.5               Pass: 100%/2   | Total: 27m 37s | Avg: 13m 48s | Max: 14m 06s
      🟩 12.6               Pass: 100%/36  | Total:  5h 38m | Avg:  9m 24s | Max: 33m 27s | Hits: 365%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  5m 07s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  2h 15m | Avg: 16m 57s | Max: 53m 10s | Hits: 175%/3704  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 27m 37s | Avg: 13m 48s | Max: 14m 06s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 28m | Avg:  9m 39s | Max: 33m 27s | Hits: 365%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  5m 07s
      🟩 nvcc               Pass: 100%/44  | Total:  8h 11m | Avg: 11m 10s | Max: 53m 10s | Hits: 302%/11112 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 21m 54s | Avg:  5m 28s | Max:  5m 59s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 18s | Avg:  6m 18s | Max:  6m 18s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 26s | Avg:  5m 26s | Max:  5m 26s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 33s | Avg:  5m 33s | Max:  5m 33s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 30s | Avg:  5m 30s | Max:  5m 30s
      🟩 Clang18            Pass: 100%/7   | Total: 47m 25s | Avg:  6m 46s | Max: 14m 22s
      🟩 GCC7               Pass: 100%/4   | Total: 19m 21s | Avg:  4m 50s | Max:  5m 26s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
      🟩 GCC9               Pass: 100%/3   | Total: 14m 59s | Avg:  4m 59s | Max:  5m 29s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 14s | Avg:  5m 14s | Max:  5m 14s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 14s | Avg:  6m 14s | Max:  6m 14s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 16m | Avg:  9m 31s | Max: 21m 32s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  2h 12m | Avg: 44m 13s | Max: 53m 10s | Hits: 239%/5556  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 34m | Avg: 31m 22s | Max: 33m 27s | Hits: 365%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 27m 37s | Avg: 13m 48s | Max: 14m 06s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 54m | Avg:  6m 00s | Max: 14m 22s
      🟩 GCC                Pass: 100%/19  | Total:  2h 13m | Avg:  7m 00s | Max: 21m 32s
      🟩 MSVC               Pass: 100%/6   | Total:  3h 46m | Avg: 37m 47s | Max: 53m 10s | Hits: 302%/11112 
      🟩 NVHPC              Pass: 100%/2   | Total: 27m 37s | Avg: 13m 48s | Max: 14m 06s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  8h 21m | Avg: 10m 54s | Max: 53m 10s | Hits: 302%/11112 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  6h 37m | Avg:  9m 56s | Max: 53m 10s | Hits: 289%/9260  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 00s | Avg: 16m 20s | Max: 33m 27s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 55m 13s | Avg: 18m 24s | Max: 21m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 23m 26s | Avg:  4m 41s | Max:  5m 46s
      🟩 14                 Pass: 100%/3   | Total:  1h 04m | Avg: 21m 31s | Max: 53m 10s | Hits: 175%/1852  
      🟩 17                 Pass: 100%/13  | Total:  2h 52m | Avg: 13m 14s | Max: 53m 03s | Hits: 302%/5556  
      🟩 20                 Pass: 100%/23  | Total:  3h 34m | Avg:  9m 19s | Max: 33m 27s | Hits: 365%/3704  
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 01m | Avg: 5m 04s | Max: 16m 15s | Hits: 578%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  1h 51m | Avg:  5m 33s | Max: 16m 15s | Hits: 578%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 36s | Avg:  2m 39s | Max:  2m 41s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s | Hits: 574%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 40s
      🟩 12.6               Pass: 100%/21  | Total:  1h 39m | Avg:  4m 43s | Max: 16m 15s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s | Hits: 574%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 40s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 39m | Avg:  4m 43s | Max: 16m 15s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 01m | Avg:  5m 04s | Max: 16m 15s | Hits: 578%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 Clang18            Pass: 100%/4   | Total: 24m 52s | Avg:  6m 13s | Max: 16m 15s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
      🟩 GCC12              Pass: 100%/2   | Total: 18m 11s | Avg:  9m 05s | Max: 15m 02s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 39s | Avg:  2m 39s | Max:  2m 44s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 29s | Avg: 11m 29s | Max: 11m 29s | Hits: 574%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 44s | Avg: 12m 44s | Max: 12m 44s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 40s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total: 51m 25s | Avg:  4m 17s | Max: 16m 15s
      🟩 GCC                Pass: 100%/8   | Total: 35m 07s | Avg:  4m 23s | Max: 15m 02s
      🟩 MSVC               Pass: 100%/2   | Total: 24m 13s | Avg: 12m 06s | Max: 12m 44s | Hits: 578%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 00s | Avg:  5m 30s | Max:  5m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 01m | Avg:  5m 04s | Max: 16m 15s | Hits: 578%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  1h 30m | Avg:  4m 06s | Max: 12m 44s | Hits: 578%/312   
      🟩 Test               Pass: 100%/2   | Total: 31m 17s | Avg: 15m 38s | Max: 16m 15s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 35s | Avg:  2m 35s | Max:  2m 35s
      🟩 90a                Pass: 100%/1   | Total:  2m 44s | Avg:  2m 44s | Max:  2m 44s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 17s | Avg:  3m 19s | Max:  5m 20s
      🟩 20                 Pass: 100%/20  | Total:  1h 48m | Avg:  5m 25s | Max: 16m 15s | Hits: 578%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 20m 43s | Avg: 5m 10s | Max: 6m 29s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 20m 43s | Avg:  5m 10s | Max:  6m 29s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 15s
      🟩 12.6               Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  6m 29s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 15s
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 58s | Avg:  5m 29s | Max:  6m 29s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 20m 43s | Avg:  5m 10s | Max:  6m 29s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 30s | Avg:  4m 30s | Max:  4m 30s
      🟩 Clang18            Pass: 100%/1   | Total:  6m 29s | Avg:  6m 29s | Max:  6m 29s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  6m 29s
      🟩 GCC                Pass: 100%/2   | Total:  9m 44s | Avg:  4m 52s | Max:  5m 15s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 20m 43s | Avg:  5m 10s | Max:  6m 29s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 20m 43s | Avg:  5m 10s | Max:  6m 29s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 00s | Avg: 4m 30s | Max: 7m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  7m 05s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
      🟩 Test               Pass: 100%/1   | Total:  7m 05s | Avg:  7m 05s | Max:  7m 05s
    
  • 🟩 python: Pass: 100%/1 | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 26m 37s | Avg: 26m 37s | Max: 26m 37s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
+/- libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 172)

# Runner
120 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
18 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro,
which MSVC reports incorrectly by default. CTK 12.3 fixed this by additionally detecting _MSC_VER. As a workaround, we
provide our own copy of cudaLaunchKernelEx when it is not available from the CTK.
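
To illustrate the detection problem described above, here is a minimal sketch of a language-version check that does not rely on `__cplusplus` alone. It is not the code from `<cuda_runtime.h>` or from this PR: the names `EXAMPLE_CPLUSPLUS` and `example_has_cxx11` are hypothetical, and the sketch uses `_MSVC_LANG` instead of the `_MSC_VER` check mentioned in the comment.

```cpp
// Sketch only: EXAMPLE_CPLUSPLUS and example_has_cxx11 are hypothetical names,
// not identifiers from the CTK or CCCL.
#if defined(_MSC_VER) && defined(_MSVC_LANG)
// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// while _MSVC_LANG reflects the language standard actually selected.
#  define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
// Other compilers are assumed to report __cplusplus correctly.
#  define EXAMPLE_CPLUSPLUS __cplusplus
#endif

// True when the translation unit is compiled as C++11 or newer.
constexpr bool example_has_cxx11()
{
  return EXAMPLE_CPLUSPLUS >= 201103L;
}

// Fails at compile time on pre-C++11 dialects; on MSVC it passes even
// without /Zc:__cplusplus because the check goes through _MSVC_LANG.
static_assert(example_has_cxx11(), "C++11 or newer is required");
```

A guard of this shape could decide whether a fallback copy of an API is needed, but the actual condition used in the PR may differ.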
Contributor

github-actions bot commented Jan 9, 2025

🟨 CI finished in 1h 38m: Pass: 99%/172 | Total: 1d 03h | Avg: 9m 30s | Max: 38m 07s | Hits: 537%/27777
  • 🟨 libcudacxx: Pass: 97%/48 | Total: 8h 21m | Avg: 10m 26s | Max: 33m 01s | Hits: 670%/12453

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/46  | Total:  8h 14m | Avg: 10m 44s | Max: 33m 01s | Hits: 670%/12453 
      🟩 arm64              Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  3m 43s
    🔍 ctk: 12.6 🔍
      🟩 12.0               Pass: 100%/8   | Total:  1h 22m | Avg: 10m 18s | Max: 23m 41s | Hits: 652%/4863  
      🟩 12.5               Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
      🔍 12.6               Pass:  97%/38  | Total:  6h 40m | Avg: 10m 32s | Max: 33m 01s | Hits: 682%/7590  
    🔍 cudacxx: nvcc12.6 🔍
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 04m | Avg: 16m 04s | Max: 21m 09s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 22m | Avg: 10m 18s | Max: 23m 41s | Hits: 652%/4863  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
      🔍 nvcc12.6           Pass:  97%/34  | Total:  5h 36m | Avg:  9m 53s | Max: 33m 01s | Hits: 682%/7590  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 04m | Avg: 16m 04s | Max: 21m 09s
      🔍 nvcc               Pass:  97%/44  | Total:  7h 17m | Avg:  9m 55s | Max: 33m 01s | Hits: 670%/12453 
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/4   | Total: 16m 10s | Avg:  4m 02s | Max:  4m 58s
      🟩 Clang10            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 10s | Avg:  4m 10s | Max:  4m 10s
      🟩 Clang12            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s
      🟩 Clang14            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 36m | Avg: 12m 02s | Max: 21m 09s
      🟩 GCC7               Pass: 100%/4   | Total: 13m 06s | Avg:  3m 16s | Max:  3m 27s
      🟩 GCC8               Pass: 100%/1   | Total: 18m 37s | Avg: 18m 37s | Max: 18m 37s
      🟩 GCC9               Pass: 100%/3   | Total: 25m 59s | Avg:  8m 39s | Max: 19m 03s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC11              Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s
      🔍 GCC13              Pass:  90%/10  | Total:  2h 21m | Avg: 14m 11s | Max: 33m 01s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 11m | Avg: 23m 42s | Max: 24m 29s | Hits: 662%/7344  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 53m 34s | Avg: 26m 47s | Max: 26m 52s | Hits: 682%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/20  | Total:  2h 27m | Avg:  7m 21s | Max: 21m 09s
      🔍 GCC                Pass:  95%/21  | Total:  3h 31m | Avg: 10m 04s | Max: 33m 01s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 04m | Avg: 24m 56s | Max: 26m 52s | Hits: 670%/12453 
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
    🔍 jobs: NVRTC 🔍
      🟩 Build              Pass: 100%/41  | Total:  5h 53m | Avg:  8m 38s | Max: 26m 52s | Hits: 670%/12453 
      🔍 NVRTC              Pass:  75%/4   | Total:  1h 49m | Avg: 27m 15s | Max: 33m 01s
      🟩 Test               Pass: 100%/2   | Total: 36m 19s | Avg: 18m 09s | Max: 20m 09s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/6   | Total: 37m 46s | Avg:  6m 17s | Max: 21m 13s
      🔍 14                 Pass:  75%/4   | Total:  1h 02m | Avg: 15m 36s | Max: 31m 05s | Hits: 620%/2392  
      🟩 17                 Pass: 100%/14  | Total:  3h 07m | Avg: 13m 25s | Max: 26m 42s | Hits: 683%/7433  
      🟩 20                 Pass: 100%/23  | Total:  3h 31m | Avg:  9m 10s | Max: 33m 01s | Hits: 682%/2628  
    🟨 gpu
      🟨 v100               Pass:  97%/48  | Total:  8h 21m | Avg: 10m 26s | Max: 33m 01s | Hits: 670%/12453 
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
      🟩 90a                Pass: 100%/2   | Total: 16m 06s | Avg:  8m 03s | Max: 12m 22s
    
  • 🟩 cub: Pass: 100%/47 | Total: 8h 28m | Avg: 10m 49s | Max: 35m 23s | Hits: 599%/3900

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  8h 19m | Avg: 11m 05s | Max: 35m 23s | Hits: 599%/3900  
      🟩 arm64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  4m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 25m | Avg: 10m 42s | Max: 28m 07s | Hits: 599%/1560  
      🟩 12.5               Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
      🟩 12.6               Pass: 100%/37  | Total:  6h 43m | Avg: 10m 53s | Max: 35m 23s | Hits: 599%/2340  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  4m 30s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 25m | Avg: 10m 42s | Max: 28m 07s | Hits: 599%/1560  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 34m | Avg: 11m 15s | Max: 35m 23s | Hits: 599%/2340  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  4m 30s
      🟩 nvcc               Pass: 100%/45  | Total:  8h 19m | Avg: 11m 06s | Max: 35m 23s | Hits: 599%/3900  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 22m 53s | Avg:  5m 43s | Max:  6m 13s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 49s | Avg:  6m 49s | Max:  6m 49s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 32s | Max: 35m 23s
      🟩 GCC7               Pass: 100%/4   | Total: 20m 50s | Avg:  5m 12s | Max:  5m 23s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 GCC9               Pass: 100%/3   | Total: 16m 06s | Avg:  5m 22s | Max:  5m 36s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC12              Pass: 100%/3   | Total: 25m 54s | Avg:  8m 38s | Max: 16m 03s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 52m | Avg: 14m 04s | Max: 27m 04s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 58s | Max: 29m 30s | Hits: 599%/2340  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 57s | Avg: 28m 28s | Max: 30m 07s | Hits: 599%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 35m | Avg:  8m 12s | Max: 35m 23s
      🟩 GCC                Pass: 100%/21  | Total:  3h 12m | Avg:  9m 09s | Max: 27m 04s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 20m | Avg: 28m 10s | Max: 30m 07s | Hits: 599%/3900  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 20m 11s | Avg: 10m 05s | Max: 16m 03s
      🟩 v100               Pass: 100%/45  | Total:  8h 08m | Avg: 10m 51s | Max: 35m 23s | Hits: 599%/3900  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 37m | Avg:  8m 26s | Max: 30m 07s | Hits: 599%/3900  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 33s | Avg: 23m 33s | Max: 23m 33s
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 33s | Avg: 18m 33s | Max: 18m 33s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 27m 39s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 13s | Max: 35m 23s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 11s | Avg: 10m 05s | Max: 16m 03s
      🟩 90a                Pass: 100%/1   | Total:  4m 31s | Avg:  4m 31s | Max:  4m 31s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 26m 11s | Avg:  5m 14s | Max:  6m 02s
      🟩 14                 Pass: 100%/3   | Total: 37m 53s | Avg: 12m 37s | Max: 26m 17s | Hits: 599%/780   
      🟩 17                 Pass: 100%/13  | Total:  2h 24m | Avg: 11m 08s | Max: 29m 30s | Hits: 599%/2340  
      🟩 20                 Pass: 100%/26  | Total:  5h 00m | Avg: 11m 32s | Max: 35m 23s | Hits: 599%/780   
    
  • 🟩 thrust: Pass: 100%/46 | Total: 7h 17m | Avg: 9m 30s | Max: 38m 07s | Hits: 365%/11112

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 18m 08s | Avg:  9m 04s | Max: 12m 38s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  7h 07m | Avg:  9m 43s | Max: 38m 07s | Hits: 365%/11112 
      🟩 arm64              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  5m 01s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 21m | Avg: 10m 09s | Max: 26m 32s | Hits: 365%/3704  
      🟩 12.5               Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
      🟩 12.6               Pass: 100%/36  | Total:  5h 27m | Avg:  9m 06s | Max: 38m 07s | Hits: 365%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 03s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 21m | Avg: 10m 09s | Max: 26m 32s | Hits: 365%/3704  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 18m | Avg:  9m 21s | Max: 38m 07s | Hits: 365%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 03s
      🟩 nvcc               Pass: 100%/44  | Total:  7h 07m | Avg:  9m 42s | Max: 38m 07s | Hits: 365%/11112 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 21m 49s | Avg:  5m 27s | Max:  6m 11s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 00s | Avg:  5m 00s | Max:  5m 00s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total: 47m 52s | Avg:  6m 50s | Max: 15m 32s
      🟩 GCC7               Pass: 100%/4   | Total: 19m 14s | Avg:  4m 48s | Max:  5m 05s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 44s | Avg:  5m 14s | Max:  5m 53s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 13s | Avg:  6m 13s | Max:  6m 13s
      🟩 GCC13              Pass: 100%/8   | Total: 58m 46s | Avg:  7m 20s | Max: 12m 38s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 22s | Max: 30m 09s | Hits: 365%/5556  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 36m | Avg: 32m 10s | Max: 38m 07s | Hits: 365%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 54m | Avg:  6m 00s | Max: 15m 32s
      🟩 GCC                Pass: 100%/19  | Total:  1h 56m | Avg:  6m 07s | Max: 12m 38s
      🟩 MSVC               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 46s | Max: 38m 07s | Hits: 365%/11112 
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  7h 17m | Avg:  9m 30s | Max: 38m 07s | Hits: 365%/11112 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 43m | Avg:  8m 35s | Max: 30m 55s | Hits: 365%/9260  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 54s | Avg: 17m 38s | Max: 38m 07s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 40m 43s | Avg: 13m 34s | Max: 15m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 23m 52s | Avg:  4m 46s | Max:  5m 32s
      🟩 14                 Pass: 100%/3   | Total: 37m 48s | Avg: 12m 36s | Max: 26m 32s | Hits: 365%/1852  
      🟩 17                 Pass: 100%/13  | Total:  2h 27m | Avg: 11m 20s | Max: 30m 09s | Hits: 365%/5556  
      🟩 20                 Pass: 100%/23  | Total:  3h 30m | Avg:  9m 07s | Max: 38m 07s | Hits: 365%/3704  
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 10m | Avg: 5m 26s | Max: 19m 24s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 00m | Avg:  6m 00s | Max: 19m 24s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 28s | Avg:  2m 37s | Max:  2m 41s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
      🟩 12.6               Pass: 100%/21  | Total:  1h 47m | Avg:  5m 07s | Max: 19m 24s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 47m | Avg:  5m 07s | Max: 19m 24s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 10m | Avg:  5m 26s | Max: 19m 24s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 07s | Avg:  3m 07s | Max:  3m 07s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang18            Pass: 100%/4   | Total: 27m 26s | Avg:  6m 51s | Max: 18m 49s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
      🟩 GCC11              Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
      🟩 GCC12              Pass: 100%/2   | Total: 23m 28s | Avg: 11m 44s | Max: 19m 24s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 46s | Avg:  2m 41s | Max:  2m 51s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 04s | Avg: 12m 04s | Max: 12m 04s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total: 54m 19s | Avg:  4m 31s | Max: 18m 49s
      🟩 GCC                Pass: 100%/8   | Total: 41m 06s | Avg:  5m 08s | Max: 19m 24s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 58s | Avg: 11m 59s | Max: 12m 04s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 10m | Avg:  5m 26s | Max: 19m 24s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  1h 32m | Avg:  4m 11s | Max: 12m 04s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 38m 13s | Avg: 19m 06s | Max: 19m 24s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 51s | Avg:  2m 51s | Max:  2m 51s
      🟩 90a                Pass: 100%/1   | Total:  2m 43s | Avg:  2m 43s | Max:  2m 43s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 28s | Avg:  3m 22s | Max:  5m 29s
      🟩 20                 Pass: 100%/20  | Total:  1h 57m | Avg:  5m 51s | Max: 19m 24s | Hits: 582%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 20m 20s | Avg: 5m 05s | Max: 5m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 39s
      🟩 12.6               Pass: 100%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 45s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 39s
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang18            Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 39s
      🟩 GCC                Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 59s | Avg: 4m 59s | Max: 7m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
      🟩 Test               Pass: 100%/1   | Total:  7m 55s | Avg:  7m 55s | Max:  7m 55s
    
  • 🟩 python: Pass: 100%/1 | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
+/- libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 172)

# Runner
120 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
18 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

github-actions bot commented Jan 9, 2025

🟩 CI finished in 2h 55m: Pass: 100%/172 | Total: 1d 03h | Avg: 9m 27s | Max: 38m 07s | Hits: 537%/27777
  • 🟩 libcudacxx: Pass: 100%/48 | Total: 8h 13m | Avg: 10m 17s | Max: 33m 01s | Hits: 670%/12453

    🟩 cpu
      🟩 amd64              Pass: 100%/46  | Total:  8h 06m | Avg: 10m 34s | Max: 33m 01s | Hits: 670%/12453 
      🟩 arm64              Pass: 100%/2   | Total:  6m 57s | Avg:  3m 28s | Max:  3m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 22m | Avg: 10m 18s | Max: 23m 41s | Hits: 652%/4863  
      🟩 12.5               Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
      🟩 12.6               Pass: 100%/38  | Total:  6h 33m | Avg: 10m 20s | Max: 33m 01s | Hits: 682%/7590  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 04m | Avg: 16m 04s | Max: 21m 09s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 22m | Avg: 10m 18s | Max: 23m 41s | Hits: 652%/4863  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 28m | Avg:  9m 40s | Max: 33m 01s | Hits: 682%/7590  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 04m | Avg: 16m 04s | Max: 21m 09s
      🟩 nvcc               Pass: 100%/44  | Total:  7h 09m | Avg:  9m 45s | Max: 33m 01s | Hits: 670%/12453 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 16m 10s | Avg:  4m 02s | Max:  4m 58s
      🟩 Clang10            Pass: 100%/1   | Total:  5m 02s | Avg:  5m 02s | Max:  5m 02s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 10s | Avg:  4m 10s | Max:  4m 10s
      🟩 Clang12            Pass: 100%/1   | Total:  4m 28s | Avg:  4m 28s | Max:  4m 28s
      🟩 Clang13            Pass: 100%/1   | Total:  4m 04s | Avg:  4m 04s | Max:  4m 04s
      🟩 Clang14            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 18s | Avg:  4m 18s | Max:  4m 18s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 36m | Avg: 12m 02s | Max: 21m 09s
      🟩 GCC7               Pass: 100%/4   | Total: 13m 06s | Avg:  3m 16s | Max:  3m 27s
      🟩 GCC8               Pass: 100%/1   | Total: 18m 37s | Avg: 18m 37s | Max: 18m 37s
      🟩 GCC9               Pass: 100%/3   | Total: 25m 59s | Avg:  8m 39s | Max: 19m 03s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC11              Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 14m | Avg: 13m 24s | Max: 33m 01s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 11m | Avg: 23m 42s | Max: 24m 29s | Hits: 662%/7344  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 53m 34s | Avg: 26m 47s | Max: 26m 52s | Hits: 682%/5109  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/20  | Total:  2h 27m | Avg:  7m 21s | Max: 21m 09s
      🟩 GCC                Pass: 100%/21  | Total:  3h 23m | Avg:  9m 42s | Max: 33m 01s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 04m | Avg: 24m 56s | Max: 26m 52s | Hits: 670%/12453 
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 08s
    🟩 gpu
      🟩 v100               Pass: 100%/48  | Total:  8h 13m | Avg: 10m 17s | Max: 33m 01s | Hits: 670%/12453 
    🟩 jobs
      🟩 Build              Pass: 100%/41  | Total:  5h 53m | Avg:  8m 38s | Max: 26m 52s | Hits: 670%/12453 
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 41m | Avg: 25m 19s | Max: 33m 01s
      🟩 Test               Pass: 100%/2   | Total: 36m 19s | Avg: 18m 09s | Max: 20m 09s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 12m 26s | Avg: 12m 26s | Max: 12m 26s
      🟩 90a                Pass: 100%/2   | Total: 16m 06s | Avg:  8m 03s | Max: 12m 22s
    🟩 std
      🟩 11                 Pass: 100%/6   | Total: 37m 46s | Avg:  6m 17s | Max: 21m 13s
      🟩 14                 Pass: 100%/4   | Total: 54m 43s | Avg: 13m 40s | Max: 23m 21s | Hits: 620%/2392  
      🟩 17                 Pass: 100%/14  | Total:  3h 07m | Avg: 13m 25s | Max: 26m 42s | Hits: 683%/7433  
      🟩 20                 Pass: 100%/23  | Total:  3h 31m | Avg:  9m 10s | Max: 33m 01s | Hits: 682%/2628  
    
  • 🟩 cub: Pass: 100%/47 | Total: 8h 28m | Avg: 10m 49s | Max: 35m 23s | Hits: 599%/3900

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  8h 19m | Avg: 11m 05s | Max: 35m 23s | Hits: 599%/3900  
      🟩 arm64              Pass: 100%/2   | Total:  9m 37s | Avg:  4m 48s | Max:  4m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 25m | Avg: 10m 42s | Max: 28m 07s | Hits: 599%/1560  
      🟩 12.5               Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
      🟩 12.6               Pass: 100%/37  | Total:  6h 43m | Avg: 10m 53s | Max: 35m 23s | Hits: 599%/2340  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  4m 30s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 25m | Avg: 10m 42s | Max: 28m 07s | Hits: 599%/1560  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
      🟩 nvcc12.6           Pass: 100%/35  | Total:  6h 34m | Avg: 11m 15s | Max: 35m 23s | Hits: 599%/2340  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 57s | Avg:  4m 28s | Max:  4m 30s
      🟩 nvcc               Pass: 100%/45  | Total:  8h 19m | Avg: 11m 06s | Max: 35m 23s | Hits: 599%/3900  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 22m 53s | Avg:  5m 43s | Max:  6m 13s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 49s | Avg:  6m 49s | Max:  6m 49s
      🟩 Clang11            Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 15s | Avg:  5m 15s | Max:  5m 15s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 23s | Avg:  5m 23s | Max:  5m 23s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 28s | Avg:  5m 28s | Max:  5m 28s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 32s | Max: 35m 23s
      🟩 GCC7               Pass: 100%/4   | Total: 20m 50s | Avg:  5m 12s | Max:  5m 23s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 35s | Avg:  5m 35s | Max:  5m 35s
      🟩 GCC9               Pass: 100%/3   | Total: 16m 06s | Avg:  5m 22s | Max:  5m 36s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 34s | Avg:  5m 34s | Max:  5m 34s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 38s | Avg:  5m 38s | Max:  5m 38s
      🟩 GCC12              Pass: 100%/3   | Total: 25m 54s | Avg:  8m 38s | Max: 16m 03s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 52m | Avg: 14m 04s | Max: 27m 04s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 23m | Avg: 27m 58s | Max: 29m 30s | Hits: 599%/2340  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 56m 57s | Avg: 28m 28s | Max: 30m 07s | Hits: 599%/1560  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 35m | Avg:  8m 12s | Max: 35m 23s
      🟩 GCC                Pass: 100%/21  | Total:  3h 12m | Avg:  9m 09s | Max: 27m 04s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 20m | Avg: 28m 10s | Max: 30m 07s | Hits: 599%/3900  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 54s | Avg:  9m 57s | Max: 10m 07s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 20m 11s | Avg: 10m 05s | Max: 16m 03s
      🟩 v100               Pass: 100%/45  | Total:  8h 08m | Avg: 10m 51s | Max: 35m 23s | Hits: 599%/3900  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 37m | Avg:  8m 26s | Max: 30m 07s | Hits: 599%/3900  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 33s | Avg: 23m 33s | Max: 23m 33s
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 33s | Avg: 18m 33s | Max: 18m 33s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 06s | Max: 27m 39s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 02m | Avg: 31m 13s | Max: 35m 23s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 20m 11s | Avg: 10m 05s | Max: 16m 03s
      🟩 90a                Pass: 100%/1   | Total:  4m 31s | Avg:  4m 31s | Max:  4m 31s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 26m 11s | Avg:  5m 14s | Max:  6m 02s
      🟩 14                 Pass: 100%/3   | Total: 37m 53s | Avg: 12m 37s | Max: 26m 17s | Hits: 599%/780   
      🟩 17                 Pass: 100%/13  | Total:  2h 24m | Avg: 11m 08s | Max: 29m 30s | Hits: 599%/2340  
      🟩 20                 Pass: 100%/26  | Total:  5h 00m | Avg: 11m 32s | Max: 35m 23s | Hits: 599%/780   
    
  • 🟩 thrust: Pass: 100%/46 | Total: 7h 17m | Avg: 9m 30s | Max: 38m 07s | Hits: 365%/11112

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 18m 08s | Avg:  9m 04s | Max: 12m 38s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total:  7h 07m | Avg:  9m 43s | Max: 38m 07s | Hits: 365%/11112 
      🟩 arm64              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  5m 01s
    🟩 ctk
      🟩 12.0               Pass: 100%/8   | Total:  1h 21m | Avg: 10m 09s | Max: 26m 32s | Hits: 365%/3704  
      🟩 12.5               Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
      🟩 12.6               Pass: 100%/36  | Total:  5h 27m | Avg:  9m 06s | Max: 38m 07s | Hits: 365%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 03s
      🟩 nvcc12.0           Pass: 100%/8   | Total:  1h 21m | Avg: 10m 09s | Max: 26m 32s | Hits: 365%/3704  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
      🟩 nvcc12.6           Pass: 100%/34  | Total:  5h 18m | Avg:  9m 21s | Max: 38m 07s | Hits: 365%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 45s | Avg:  4m 52s | Max:  5m 03s
      🟩 nvcc               Pass: 100%/44  | Total:  7h 07m | Avg:  9m 42s | Max: 38m 07s | Hits: 365%/11112 
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total: 21m 49s | Avg:  5m 27s | Max:  6m 11s
      🟩 Clang10            Pass: 100%/1   | Total:  6m 55s | Avg:  6m 55s | Max:  6m 55s
      🟩 Clang11            Pass: 100%/1   | Total:  4m 58s | Avg:  4m 58s | Max:  4m 58s
      🟩 Clang12            Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 Clang13            Pass: 100%/1   | Total:  5m 09s | Avg:  5m 09s | Max:  5m 09s
      🟩 Clang14            Pass: 100%/1   | Total:  5m 00s | Avg:  5m 00s | Max:  5m 00s
      🟩 Clang15            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
      🟩 Clang16            Pass: 100%/1   | Total:  5m 18s | Avg:  5m 18s | Max:  5m 18s
      🟩 Clang17            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s
      🟩 Clang18            Pass: 100%/7   | Total: 47m 52s | Avg:  6m 50s | Max: 15m 32s
      🟩 GCC7               Pass: 100%/4   | Total: 19m 14s | Avg:  4m 48s | Max:  5m 05s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
      🟩 GCC9               Pass: 100%/3   | Total: 15m 44s | Avg:  5m 14s | Max:  5m 53s
      🟩 GCC10              Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
      🟩 GCC11              Pass: 100%/1   | Total:  5m 29s | Avg:  5m 29s | Max:  5m 29s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 13s | Avg:  6m 13s | Max:  6m 13s
      🟩 GCC13              Pass: 100%/8   | Total: 58m 46s | Avg:  7m 20s | Max: 12m 38s
      🟩 MSVC14.29          Pass: 100%/3   | Total:  1h 22m | Avg: 27m 22s | Max: 30m 09s | Hits: 365%/5556  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 36m | Avg: 32m 10s | Max: 38m 07s | Hits: 365%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 54m | Avg:  6m 00s | Max: 15m 32s
      🟩 GCC                Pass: 100%/19  | Total:  1h 56m | Avg:  6m 07s | Max: 12m 38s
      🟩 MSVC               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 46s | Max: 38m 07s | Hits: 365%/11112 
      🟩 NVHPC              Pass: 100%/2   | Total: 28m 01s | Avg: 14m 00s | Max: 14m 19s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total:  7h 17m | Avg:  9m 30s | Max: 38m 07s | Hits: 365%/11112 
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  5h 43m | Avg:  8m 35s | Max: 30m 55s | Hits: 365%/9260  
      🟩 TestCPU            Pass: 100%/3   | Total: 52m 54s | Avg: 17m 38s | Max: 38m 07s | Hits: 365%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 40m 43s | Avg: 13m 34s | Max: 15m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 19s | Avg:  4m 19s | Max:  4m 19s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total: 23m 52s | Avg:  4m 46s | Max:  5m 32s
      🟩 14                 Pass: 100%/3   | Total: 37m 48s | Avg: 12m 36s | Max: 26m 32s | Hits: 365%/1852  
      🟩 17                 Pass: 100%/13  | Total:  2h 27m | Avg: 11m 20s | Max: 30m 09s | Hits: 365%/5556  
      🟩 20                 Pass: 100%/23  | Total:  3h 30m | Avg:  9m 07s | Max: 38m 07s | Hits: 365%/3704  
    
  • 🟩 cudax: Pass: 100%/24 | Total: 2h 10m | Avg: 5m 26s | Max: 19m 24s | Hits: 582%/312

    🟩 cpu
      🟩 amd64              Pass: 100%/20  | Total:  2h 00m | Avg:  6m 00s | Max: 19m 24s | Hits: 582%/312   
      🟩 arm64              Pass: 100%/4   | Total: 10m 28s | Avg:  2m 37s | Max:  2m 41s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 12.5               Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
      🟩 12.6               Pass: 100%/21  | Total:  1h 47m | Avg:  5m 07s | Max: 19m 24s | Hits: 582%/156   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
      🟩 nvcc12.6           Pass: 100%/21  | Total:  1h 47m | Avg:  5m 07s | Max: 19m 24s | Hits: 582%/156   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/24  | Total:  2h 10m | Avg:  5m 26s | Max: 19m 24s | Hits: 582%/312   
    🟩 cxx
      🟩 Clang10            Pass: 100%/1   | Total:  3m 52s | Avg:  3m 52s | Max:  3m 52s
      🟩 Clang11            Pass: 100%/1   | Total:  3m 07s | Avg:  3m 07s | Max:  3m 07s
      🟩 Clang12            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang13            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
      🟩 Clang14            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 15s | Avg:  3m 15s | Max:  3m 15s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 Clang18            Pass: 100%/4   | Total: 27m 26s | Avg:  6m 51s | Max: 18m 49s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 54s | Avg:  3m 54s | Max:  3m 54s
      🟩 GCC11              Pass: 100%/1   | Total:  2m 58s | Avg:  2m 58s | Max:  2m 58s
      🟩 GCC12              Pass: 100%/2   | Total: 23m 28s | Avg: 11m 44s | Max: 19m 24s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 46s | Avg:  2m 41s | Max:  2m 51s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 54s | Avg: 11m 54s | Max: 11m 54s | Hits: 582%/156   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 04s | Avg: 12m 04s | Max: 12m 04s | Hits: 582%/156   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/12  | Total: 54m 19s | Avg:  4m 31s | Max: 18m 49s
      🟩 GCC                Pass: 100%/8   | Total: 41m 06s | Avg:  5m 08s | Max: 19m 24s
      🟩 MSVC               Pass: 100%/2   | Total: 23m 58s | Avg: 11m 59s | Max: 12m 04s | Hits: 582%/312   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 09s | Avg:  5m 34s | Max:  5m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/24  | Total:  2h 10m | Avg:  5m 26s | Max: 19m 24s | Hits: 582%/312   
    🟩 jobs
      🟩 Build              Pass: 100%/22  | Total:  1h 32m | Avg:  4m 11s | Max: 12m 04s | Hits: 582%/312   
      🟩 Test               Pass: 100%/2   | Total: 38m 13s | Avg: 19m 06s | Max: 19m 24s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 51s | Avg:  2m 51s | Max:  2m 51s
      🟩 90a                Pass: 100%/1   | Total:  2m 43s | Avg:  2m 43s | Max:  2m 43s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 28s | Avg:  3m 22s | Max:  5m 29s
      🟩 20                 Pass: 100%/20  | Total:  1h 57m | Avg:  5m 51s | Max: 19m 24s | Hits: 582%/312   
    
  • 🟩 cccl: Pass: 100%/4 | Total: 20m 20s | Avg: 5m 05s | Max: 5m 39s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 39s
      🟩 12.6               Pass: 100%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 45s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 39s
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 45s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 39s | Avg:  5m 39s | Max:  5m 39s
      🟩 Clang18            Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s
      🟩 GCC12              Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 20s | Avg:  4m 20s | Max:  4m 20s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 10m 24s | Avg:  5m 12s | Max:  5m 39s
      🟩 GCC                Pass: 100%/2   | Total:  9m 56s | Avg:  4m 58s | Max:  5m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    🟩 jobs
      🟩 Infra              Pass: 100%/4   | Total: 20m 20s | Avg:  5m 05s | Max:  5m 39s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 59s | Avg: 4m 59s | Max: 7m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  7m 55s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
      🟩 Test               Pass: 100%/1   | Total:  7m 55s | Avg:  7m 55s | Max:  7m 55s
    
  • 🟩 python: Pass: 100%/1 | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 27m 36s | Avg: 27m 36s | Max: 27m 36s
    

👃 Inspect Changes

Modifications in project?

Project
+/- CCCL Infrastructure
+/- libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
+/- CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 172)

# Runner
120 linux-amd64-cpu16
23 linux-amd64-gpu-v100-latest-1
18 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@bernhardmgruber bernhardmgruber merged commit 97f4c34 into NVIDIA:main Jan 9, 2025
217 checks passed
@bernhardmgruber bernhardmgruber deleted the drop_ctk11 branch January 9, 2025 11:26
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 18, 2025
* Add cuda12.0-gcc7 devcontainer
* Move MSVC2017 jobs to CTK 12.6
This is the only combination for which rapidsai has devcontainers
* Add /Zc:__cplusplus for the libcudacxx tests
* Only add escape hatch for affected CTKs
* Workaround missing cudaLaunchKernelEx on MSVC
cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which MSVC reports incorrectly by default. CTK 12.3 fixed this by additionally detecting _MSC_VER. As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK.
* Workaround nvcc+MSVC issue
* Regenerate devcontainers
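
The `__cplusplus` problem mentioned above can be illustrated with a minimal sketch. It is not the code from this PR or from `<cuda_runtime.h>`; the names `EXAMPLE_CPLUSPLUS` and `example_has_cpp11` are hypothetical. It relies only on the documented MSVC behavior that `__cplusplus` stays at `199711L` unless `/Zc:__cplusplus` is passed, while `_MSVC_LANG` always carries the real language level:

```cpp
// Hypothetical helper macro: pick the macro that reflects the real C++ dialect.
// MSVC leaves __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// but always reports the selected dialect through _MSVC_LANG.
#if defined(_MSVC_LANG)
#  define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
#  define EXAMPLE_CPLUSPLUS __cplusplus
#endif

// Hypothetical check: true when the translation unit is compiled as C++11 or newer.
constexpr bool example_has_cpp11()
{
  return EXAMPLE_CPLUSPLUS >= 201103L;
}

// A feature gate written like this works on MSVC with or without /Zc:__cplusplus.
static_assert(example_has_cpp11(), "C++11 or newer is required");
```

A header that tests `__cplusplus` alone would take the pre-C++11 branch on MSVC without `/Zc:__cplusplus`, which is presumably why the flag was added for the libcudacxx tests.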

Fixes: NVIDIA#3249

Co-authored-by: Michael Schellenberger Costa <[email protected]>
davebayer added a commit to davebayer/cccl that referenced this pull request Jan 20, 2025
implement `add_sat`

split `signed`/`unsigned` implementation, improve implementation for MSVC

improve device `add_sat` implementation

add `add_sat` test

improve generic `add_sat` implementation for signed types

implement `sub_sat`

allow more msvc intrinsics on x86

add op tests

partially implement `mul_sat`

implement `div_sat` and `saturate_cast`

add `saturate_cast` test

simplify `div_sat` test

Deprecate C++11 and C++14 for libcu++ (#3173)

* Deprecate C++11 and C++14 for libcu++

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Implement `abs` and `div` from `cstdlib` (#3153)

* implement integer abs functions
* improve tests, fix constexpr support
* just use our implementation
* implement `cuda::std::div`
* prefer host's `div_t` like types
* provide `cuda::std::abs` overloads for floats
* allow fp abs for NVRTC
* silence msvc's warning about conversion from floating point to integral

Fix missing radix sort policies (#3174)

Fixes NVBug 5009941

Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148)

* introduces new arg{min,max} interface with two output iterators

* adds fp inf tests

* fixes docs

* improves code example

* fixes exec space specifier

* trying to fix deprecation warning for more compilers

* inlines unzip operator

* trying to fix deprecation warning for nvhpc

* integrates suppression fixes in diagnostics

* pre-ctk 11.5 deprecation suppression

* fixes icc

* fix for pre-ctk11.5

* cleans up deprecation suppression

* cleanup

Extend tuning documentation (#3179)

Add codespell pre-commit hook, fix typos in CCCL (#3168)

* Add codespell pre-commit hook
* Automatic changes from codespell.
* Manual changes.

Fix parameter space for TUNE_LOAD in scan benchmark (#3176)

fix various old compiler checks (#3178)

implement C++26 `std::projected` (#3175)

Fix pre-commit config for codespell and remaining typos (#3182)

Massive cleanup of our config (#3155)

Fix UB in atomics with automatic storage (#2586)

* Adds specialized local cuda atomics and injects them into most atomics paths.

Co-authored-by: Georgy Evtushenko <[email protected]>
Co-authored-by: gonzalobg <[email protected]>

* Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478

* Remove extraneous double brackets in unformatted code.

* Merge unsafe atomic logic into `__cuda_is_local`.

* Use `const_cast` for type conversions in cuda_local.h

* Fix build issues from interface changes

* Fix missing __nanosleep on sm70-

* Guard __isLocal from NVHPC

* Use PTX instead of running nothing from NVHPC

* fixup /s/nvrtc/nvhpc

* Fixup missing CUDA ifdef surrounding device code

* Fix codegen

* Bypass some sort of compiler bug on GCC7

* Apply suggestions from code review

* Use unsafe automatic storage atomics in codegen tests

---------

Co-authored-by: Georgy Evtushenko <[email protected]>
Co-authored-by: gonzalobg <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>

Refactor the source code layout for `cuda.parallel` (#3177)

* Refactor the source layout for cuda.parallel

* Add copyright

* Address review feedback

* Don't import anything into `experimental` namespace

* fix import

---------

Co-authored-by: Ashwin Srinath <[email protected]>

new type-erased memory resources (#2824)

s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186)

Document address stability of `thrust::transform` (#3181)

* Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS
* Reformat and fix UnaryFunction/BinaryFunction in transform docs
* Mention transform can use proclaim_copyable_arguments
* Document cuda::proclaims_copyable_arguments better
* Deprecate depending on transform functor argument addresses

Fixes: #3053

turn off cuda version check for clangd (#3194)

[STF] jacobi example based on parallel_for (#3187)

* Simple jacobi example with parallel for and reductions

* clang-format

* remove useless capture list

fixes pre-nv_diag suppression issues (#3189)

Prefer c2h::type_name over c2h::demangle (#3195)

Fix memcpy_async* tests (#3197)

* memcpy_async_tx: Fix bug in test

Two bugs, one of which occurs in practice:

1. There is a missing fence.proxy.space::global between the writes to
   global memory and the memcpy_async_tx. (Occurs in practice)

2. The end of the kernel should be fenced with `__syncthreads()`,
   because the barrier is invalidated in the destructor. If other
   threads are still waiting on it, there will be UB. (Has not yet
   manifested itself)

* cp_async_bulk_tensor: Pre-emptively fence more in test

Add type annotations and mypy checks for `cuda.parallel`  (#3180)

* Refactor the source layout for cuda.parallel

* Add initial type annotations

* Update pre-commit config

* More typing

* Fix bad merge

* Fix TYPE_CHECKING and numpy annotations

* typing bindings.py correctly

* Address review feedback

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Fix rendering of cuda.parallel docs (#3192)

* Fix pre-commit config for codespell and remaining typos

* Fix rendering of docs for cuda.parallel

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Enable PDL for DeviceMergeSortBlockSortKernel (#3199)

The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC.
This commit enables PDL when launching the kernel.

Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647)

* adds benchmarks for reduce::arg{min,max}

* preliminary streaming arg-extremum reduction

* fixes implicit conversion

* uses streaming dispatch class

* changes arg benches to use new streaming reduce

* streaming arg-extrema reduction

* fixes style

* fixes compilation failures

* cleanups

* adds rst style comments

* declare vars const and use clamp

* consolidates argmin argmax benchmarks

* fixes thrust usage

* drops offset type in arg-extrema benchmarks

* fixes clang cuda

* exec space macros

* switch to signed global offset type for slightly better perf

* clarifies documentation

* applies minor benchmark style changes from review comments

* fixes interface documentation and comments

* list-init accumulating output op

* improves style, comments, and tests

* cleans up aggregate init

* renames dispatch class usage in benchmarks

* fixes merge conflicts

* addresses review comments

* addresses review comments

* fixes assertion

* removes superseded implementation

* changes large problem tests to use new interface

* removes obsolete tests for deprecated interface

Fixes for Python 3.7 docs environment (#3206)

Co-authored-by: Ashwin Srinath <[email protected]>

Adds support for large number of items to `DeviceTransform` (#3172)

* moves large problem test helper to common file

* adds support for large num items to device transform

* adds tests for large number of items to device interface

* fixes format

* addresses review comments

cp_async_bulk: Fix test (#3198)

* memcpy_async_tx: Fix bug in test

Two bugs, one of which occurs in practice:

1. There is a missing fence.proxy.space::global between the writes to
   global memory and the memcpy_async_tx. (Occurs in practice)

2. The end of the kernel should be fenced with `__syncthreads()`,
   because the barrier is invalidated in the destructor. If other
   threads are still waiting on it, there will be UB. (Has not yet
   manifested itself)

* cp_async_bulk_tensor: Pre-emptively fence more in test

* cp_async_bulk: Fix test

The global memory pointer could be misaligned.

cudax fixes for msvc 14.41 (#3200)

avoid instantiating class templates in `is_same` implementation when possible (#3203)

Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209)

* Fix: make launchers a CUB detail; make kernel source functions hidden.

* [pre-commit.ci] auto code formatting

* Address review comments, fix which macro gets fixed.

help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202)

unify macros and cmake options that control the suppression of deprecation warnings (#3220)

* unify macros and cmake options that control the suppression of deprecation warnings

* suppress nvcc warning #186 in thrust header tests

* suppress c++ dialect deprecation warnings in libcudacxx header tests

Fix thread-reduce performance regression (#3225)

cuda.parallel: In-memory caching of build objects (#3216)

* Define __eq__ and __hash__ for Iterators

* Define cache_with_key utility and use it to cache Reduce objects

* Add tests for caching Reduce objects

* Tighten up types

* Updates to support 3.7

* Address review feedback

* Introduce IteratorKind to hold iterator type information

* Use the .kind to generate an abi_name

* Remove __eq__ and __hash__ methods from IteratorBase

* Move helper function

* Formatting

* Don't unpack tuple in cache key

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Just enough ranges for c++14 `span` (#3211)

use generalized concepts portability macros to simplify the `range` concept (#3217)

fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR`

Use Ruff to sort imports (#3230)

* Update pyproject.tomls for import sorting

* Update files after running pre-commit

* Move ruff config to pyproject.toml

---------

Co-authored-by: Ashwin Srinath <[email protected]>

fix tuning_scan sm90 config issue (#3236)

Co-authored-by: Shijie Chen <[email protected]>

[STF] Logical token (#3196)

* Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs.

* Add missing files

* Check if a task implementation can match a prototype where the void_interface arguments are ignored

* Implement ctx.abstract_logical_data() which relies on a void data interface

* Illustrate how to use abstract handles in local contexts

* Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages

* Small improvements in the examples

* Do not try to allocate or move void data

* Do not use I as a variable

* fix linkage error

* rename abstract_logical_data to logical_token

* Document logical token

* fix spelling error

* fix sphinx error

* reflect name changes

* use meaningful variable names

* simplify logical_token implementation because writeback is already disabled

* add a unit test for token elision

* implement token elision in host_launch

* Remove unused type

* Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens

* Much simpler is_tuple_invocable_with_filtered implementation

* Fix buggy test

* Factorize code

* Document that we can ignore tokens for task and host_launch

* Documentation for logical data freeze

Fix ReduceByKey tuning (#3240)

Fix RLE tuning (#3239)

cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233)

* Forbid non-contiguous arrays as inputs (or outputs)

* Implement a more robust way to check for contiguity

* Don't bother if cublas unavailable

* Fix how we check for zero-element arrays

* sort imports

---------

Co-authored-by: Ashwin Srinath <[email protected]>

expands support for more offset types in segmented benchmark (#3231)

Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects (#3253)

* Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects

* Do not add option twice

ptx: Add add_instruction.py (#3190)

This file helps create the necessary structure for new PTX instructions.

Co-authored-by: Allard Hendriksen <[email protected]>

Bump main to 2.9.0. (#3247)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Drop cub::Mutex (#3251)

Fixes: #3250

Remove legacy macros from CUB util_arch.cuh (#3257)

Fixes: #3256

Remove thrust::[unary|binary]_traits (#3260)

Fixes: #3259

Architecture and OS identification macros (#3237)

Bump main to 3.0.0. (#3265)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Drop thrust not1 and not2 (#3264)

Fixes: #3263

CCCL Internal macro documentation (#3238)

Deprecate GridBarrier and GridBarrierLifetime (#3258)

Fixes: #1389

Require at least gcc7 (#3268)

Fixes: #3267

Drop thrust::[unary|binary]_function (#3274)

Fixes: #3273

Drop ICC from CI (#3277)

[STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270)

* Add a test to reproduce a bug observed with parallel_for on a host place

* clang-format

* use _CCCL_ASSERT

* Attempt to debug

* do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead

* fix lambda expression

* clang-format

Enable thrust::identity test for non-MSVC (#3281)

This seems to be an oversight when the test was added

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Enable PDL in triple chevron launch (#3282)

It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed
to _CCCL_HAS_PDL during the review introducing the feature.

Disambiguate line continuations and macro continuations in <nv/target> (#3244)

Drop VS 2017 from CI (#3287)

Fixes: #3286

Drop ICC support in code (#3279)

* Drop ICC from code

Fixes: #3278

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Make CUB NVRTC commandline arguments come from a cmake template (#3292)

Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template as in the feature request template (#3295)

Use process isolation instead of default hyper-v for Windows. (#3294)

Try improving build times by using process isolation instead of hyper-v

Co-authored-by: Michael Schellenberger Costa <[email protected]>

[pre-commit.ci] pre-commit autoupdate (#3248)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6)
- [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6)
- [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Drop Thrust legacy arch macros (#3298)

Which were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS

Drop Thrust's compiler_fence.h (#3300)

Drop CTK 11.x from CI (#3275)

* Add cuda12.0-gcc7 devcontainer
* Move MSVC2017 jobs to CTK 12.6
This is the only combination for which rapidsai has devcontainers
* Add /Zc:__cplusplus for the libcudacxx tests
* Only add escape hatch for affected CTKs
* Workaround missing cudaLaunchKernelEx on MSVC
cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which MSVC reports incorrectly by default. CTK 12.3 fixed this by additionally detecting _MSC_VER. As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK.
* Workaround nvcc+MSVC issue
* Regenerate devcontainers

Fixes: #3249

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Drop CUB's util_compiler.cuh (#3302)

All contained macros were deprecated

Update packman and repo_docs versions (#3293)

Co-authored-by: Ashwin Srinath <[email protected]>

Drop Thrust's deprecated compiler macros (#3301)

Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305)

Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506)

* adds support for large number of items to three-way partition

* adapts interface to use choose_signed_offset_t

* integrates applicable feedback from device-select pr

* changes behavior for empty problems

* unifies grid constant macro

* fixes kernel template specialization mismatch

* integrates _CCCL_GRID_CONSTANT changes

* resolve merge conflicts

* fixes checks in test

* fixes test verification

* improves tests

* makes few improvements to streaming dispatch

* improves code comment on test

* fixes unrelated compiler error

* minor style improvements

Refactor scan tunings (#3262)

Require C++17 for compiling Thrust and CUB (#3255)

* Issue an unsuppressable warning when compiling with < C++17
* Remove C++11/14 presets
* Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers
* Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14]
* Remove CUB_ENABLE_DIALECT_CPP[11|14]
* Update CI runs
* Remove C++11/14 CI runs for CUB and Thrust
* Raise compiler minimum versions for C++17
* Update ReadMe
* Drop Thrust's cpp14_required.h
* Add escape hatch for C++17 removal

Fixes: #3252

Implement `views::empty` (#3254)

* Disable pair conversion of subrange with clang in C++17

* Fix namespace views

* Implement `views::empty`

This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view

Refactor `limits` and `climits` (#3221)

* implement builtins for huge val, nan and nans

* change `INFINITY` and `NAN` implementation for NVRTC

cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311)

* Add tests demonstrating usage of different iterators

* Update documentation of reduce_into by merging import code snippet with the rest of the example

* Add documentation for current iterators

* Run pre-commit checks and update accordingly

* Fix comments to refer to the proper lines in the code snippets in the docs

Drop clang<14 from CI, update devcontainers. (#3309)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

[STF] Cleanup task dependencies object constructors (#3291)

* Define tag types for access modes

* - Rework how we build task_dep objects based on access mode tags
- pack_state is now responsible for using a const_cast for read only data

* Greatly simplify the previous attempt: do not define new types, but use integral constants based on the enums

* It seems the const_cast was not necessary, so we can simplify it and skip the dispatch based on access modes

Disable test with a gcc-14 regression (#3297)

Deprecate Thrust's cpp_compatibility.h macros (#3299)

Remove dropped function objects from docs (#3319)

Document `NV_TARGET` macros (#3313)

[STF] Define ctx.pick_stream() which was missing for the unified context (#3326)

* Define ctx.pick_stream() which was missing for the unified context

* clang-format

Deprecate cub::IterateThreadStore (#3337)

Drop CUB's BinaryFlip operator (#3332)

Deprecate cub::Swap (#3333)

Clarify transform output can overlap input (#3323)

Drop CUB APIs with a debug_synchronous parameter (#3330)

Fixes: #3329

Drop CUB's util_compiler.cuh for real (#3340)

PR #3302 planned to drop the file, but only dropped its content. This
was an oversight. So let's drop the entire file.

Drop cub::ValueCache (#3346)

limits offset types for merge sort (#3328)

Drop CDPv1 (#3344)

Fixes: #3341

Drop thrust::void_t (#3362)

Use cuda::std::addressof in Thrust (#3363)

Fix all_of documentation for empty ranges (#3358)

all_of always returns true on an empty range.
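
The empty-range behavior can be checked with a short sketch. It uses `std::all_of` from the C++ standard library as a stand-in, on the assumption that `thrust::all_of` follows the same semantics as the commit message states; the helper names `all_positive` and `example_usage` are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: true when every element is strictly positive.
// For an empty vector no element can violate the predicate, so the
// result is (vacuously) true.
bool all_positive(const std::vector<int>& values)
{
  return std::all_of(values.begin(), values.end(), [](int x) { return x > 0; });
}

// Usage sketch covering the empty, passing, and failing cases.
void example_usage()
{
  assert(all_positive({}));        // empty range: true
  assert(all_positive({1, 2, 3})); // all elements satisfy the predicate
  assert(!all_positive({1, -2}));  // one element violates the predicate
}
```

Callers that need "non-empty and all elements match" have to test for emptiness separately.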

[STF] Do not keep track of dangling events in a CUDA graph backend (#3327)

* Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when
the CUDA graph completes. Therefore keeping track of "dangling events" is a
waste of time and resources.

* replace can_ignore_dangling_events by track_dangling_events which leads to more readable code

* When not storing the dangling events, we must still perform the deinit operations that were producing these events!

Extract scan kernels into NVRTC-compilable header (#3334)

* Extract scan kernels into NVRTC-compilable header

* Update cub/cub/device/dispatch/dispatch_scan.cuh

Co-authored-by: Georgii Evtushenko <[email protected]>

---------

Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Georgii Evtushenko <[email protected]>

Drop deprecated aliases in Thrust functional (#3272)

Fixes: #3271

Drop cub::DivideAndRoundUp (#3347)

Use cuda::std::min/max in Thrust (#3364)

Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361)

* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`

Cleanup util_arch (#2773)

Deprecate thrust::null_type (#3367)

Deprecate cub::DeviceSpmv (#3320)

Fixes: #896

Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Compile basic infra test with C++17 (#3377)

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for large number of segments

* adds support for large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on type returned by segment iterator begin/end iterators

* minor style improvements

Exit with error when RAPIDS CI fails. (#3385)

cuda.parallel: Support structured types as algorithm inputs (#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Deprecate thrust::async (#3324)

Fixes: #100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342)

Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314)

* add compiler-specific path
* fix device code path
* add _CCCL_ASSUME

Deprecate thrust::numeric_limits (#3366)

Replace `typedef` with `using` in libcu++ (#3368)

Deprecate thrust::optional (#3307)

Fixes: #3306

Upgrade to Catch2 3.8  (#3310)

Fixes: #1724

refactor `<cuda/std/cstdint>` (#3325)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Update CODEOWNERS (#3331)

* Update CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Fix sign-compare warning (#3408)

Implement more cmath functions to be usable on host and device (#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Fix assert definition for NVHPC due to constexpr issues (#3418)

NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it.

Fix this by always using the host definition which should also work on device.

Fixes #3411

Extend CUB reduce benchmarks (#3401)

* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters

Fixes: #3283

Update upload-pages-artifact to v3 (#3423)

* Update upload-pages-artifact to v3

* Empty commit

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Replace and deprecate thrust::cuda_cub::terminate (#3421)

`std::linalg` accessors and `transposed_layout` (#2962)

Add round up/down to multiple (#3234)

[FEA]: Introduce Python module with CCCL headers (#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_aude_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917)

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562

Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```
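
A warning like the one above typically appears when `__init__` raises before an attribute is assigned, because Python still runs `__del__` on the partially constructed object. The following is a minimal sketch of one way to guard against that, not the actual `cuda.parallel` code; `Resource` and `cleanup_calls` are hypothetical names standing in for the real class and the native cleanup binding.

```python
cleanup_calls = []  # hypothetical stand-in for the native cleanup call


class Resource:
    def __init__(self, fail=False):
        # Assign a sentinel first so __del__ can always read the attribute.
        self.build_result = None
        if fail:
            raise ValueError("construction failed before build_result was set")
        self.build_result = object()

    def __del__(self):
        # Skip cleanup when construction never produced a build result.
        if self.build_result is None:
            return
        cleanup_calls.append("cleanup")


try:
    Resource(fail=True)
except ValueError:
    pass  # the partial object is finalized without raising AttributeError

r = Resource()
del r  # in CPython this drops the last reference and runs __del__ at once
```

Only the fully constructed object triggers the cleanup; the failed construction is finalized silently.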

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: The other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2.

Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d.

* Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460)

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

cuda.parallel: Add optional stream argument to reduce_into() (#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test
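
The validation steps listed above (unpacking instead of indexing, and turning a failed unpack into a clear error) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: it assumes `__cuda_stream__` yields a `(version, handle)` pair, and `get_stream_handle`, `GoodStream`, and `BadStream` are hypothetical names.

```python
def get_stream_handle(stream):
    """Return the integer stream handle, or None if no stream was passed."""
    if stream is None:
        return None
    try:
        stream_property = stream.__cuda_stream__
    except AttributeError as e:
        raise TypeError(
            f"{type(stream).__name__} does not implement __cuda_stream__"
        ) from e
    # Assumption: the protocol may be exposed as a method or a plain attribute.
    if callable(stream_property):
        stream_property = stream_property()
    try:
        # Unpacking (rather than indexing) also rejects values of the wrong length.
        version, handle = stream_property
    except (TypeError, ValueError) as e:
        raise TypeError("__cuda_stream__ must yield a (version, handle) pair") from e
    if not isinstance(handle, int):
        raise TypeError("the stream handle must be an int")
    return handle


class GoodStream:
    __cuda_stream__ = (0, 1234)


class BadStream:
    __cuda_stream__ = 5  # not iterable, so unpacking raises TypeError


good_handle = get_stream_handle(GoodStream())
```

Passing `BadStream()` raises a `TypeError` with a message about the expected pair instead of the less helpful error from the failed unpack.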

Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434)

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes #3404

move to c++17, finalize device optimization

fix msvc compilation, update tests

Deprecate C++11 and C++14 for libcu++ (#3173)

* Deprecate C++11 and C++14 for libcu++

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Implement `abs` and `div` from `cstdlib` (#3153)

* implement integer abs functions
* improve tests, fix constexpr support
* just use our implementation
* implement `cuda::std::div`
* prefer host's `div_t` like types
* provide `cuda::std::abs` overloads for floats
* allow fp abs for NVRTC
* silence msvc's warning about conversion from floating point to integral
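
As a reminder of the semantics involved, the standard C and C++ `div` truncates the quotient toward zero, so the remainder takes the sign of the dividend. The sketch below mimics that in Python, whose built-in `divmod` floors instead. It assumes `cuda::std::div` follows the standard semantics, and `c_div` is a hypothetical helper name.

```python
def c_div(numer, denom):
    # Quotient truncated toward zero, as in C99 and C++11 integer division.
    quot = abs(numer) // abs(denom)
    if (numer < 0) != (denom < 0):
        quot = -quot
    # The remainder satisfies numer == quot * denom + rem.
    rem = numer - quot * denom
    return quot, rem


truncated = c_div(-7, 2)
floored = divmod(-7, 2)
```

The two results differ for operands of mixed sign, which is the case that porting code most often gets wrong.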

Fix missing radix sort policies (#3174)

Fixes NVBug 5009941

Introduces new `DeviceReduce::Arg{Min,Max}` interface with two output iterators (#3148)

* introduces new arg{min,max} interface with two output iterators

* adds fp inf tests

* fixes docs

* improves code example

* fixes exec space specifier

* trying to fix deprecation warning for more compilers

* inlines unzip operator

* trying to fix deprecation warning for nvhpc

* integrates suppression fixes in diagnostics

* pre-ctk 11.5 deprecation suppression

* fixes icc

* fix for pre-ctk11.5

* cleans up deprecation suppression

* cleanup

Extend tuning documentation (#3179)

Add codespell pre-commit hook, fix typos in CCCL (#3168)

* Add codespell pre-commit hook
* Automatic changes from codespell.
* Manual changes.

Fix parameter space for TUNE_LOAD in scan benchmark (#3176)

fix various old compiler checks (#3178)

implement C++26 `std::projected` (#3175)

Fix pre-commit config for codespell and remaining typos (#3182)

Massive cleanup of our config (#3155)

Fix UB in atomics with automatic storage (#2586)

* Adds specialized local cuda atomics and injects them into most atomics paths.

Co-authored-by: Georgy Evtushenko <[email protected]>
Co-authored-by: gonzalobg <[email protected]>

* Allow CUDA 12.2 to keep perf, this addresses earlier comments in #478

* Remove extraneous double brackets in unformatted code.

* Merge unsafe atomic logic into `__cuda_is_local`.

* Use `const_cast` for type conversions in cuda_local.h

* Fix build issues from interface changes

* Fix missing __nanosleep on sm70-

* Guard __isLocal from NVHPC

* Use PTX instead of running nothing from NVHPC

* fixup /s/nvrtc/nvhpc

* Fixup missing CUDA ifdef surrounding device code

* Fix codegen

* Bypass some sort of compiler bug on GCC7

* Apply suggestions from code review

* Use unsafe automatic storage atomics in codegen tests

---------

Co-authored-by: Georgy Evtushenko <[email protected]>
Co-authored-by: gonzalobg <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>

Refactor the source code layout for `cuda.parallel` (#3177)

* Refactor the source layout for cuda.parallel

* Add copyright

* Address review feedback

* Don't import anything into `experimental` namespace

* fix import

---------

Co-authored-by: Ashwin Srinath <[email protected]>

new type-erased memory resources (#2824)

s/_LIBCUDACXX_DECLSPEC_EMPTY_BASES/_CCCL_DECLSPEC_EMPTY_BASES/g (#3186)

Document address stability of `thrust::transform` (#3181)

* Do not document _LIBCUDACXX_MARK_CAN_COPY_ARGUMENTS
* Reformat and fix UnaryFunction/BinaryFunction in transform docs
* Mention transform can use proclaim_copyable_arguments
* Document cuda::proclaims_copyable_arguments better
* Deprecate depending on transform functor argument addresses

Fixes: #3053

turn off cuda version check for clangd (#3194)

[STF] jacobi example based on parallel_for (#3187)

* Simple jacobi example with parallel for and reductions

* clang-format

* remove useless capture list

fixes pre-nv_diag suppression issues (#3189)

Prefer c2h::type_name over c2h::demangle (#3195)

Fix memcpy_async* tests (#3197)

* memcpy_async_tx: Fix bug in test

Two bugs, one of which occurs in practice:

1. There is a missing fence.proxy.space::global between the writes to
   global memory and the memcpy_async_tx. (Occurs in practice)

2. The end of the kernel should be fenced with `__syncthreads()`,
   because the barrier is invalidated in the destructor. If other
   threads are still waiting on it, there will be UB. (Has not yet
   manifested itself)

* cp_async_bulk_tensor: Pre-emptively fence more in test

Add type annotations and mypy checks for `cuda.parallel`  (#3180)

* Refactor the source layout for cuda.parallel

* Add initial type annotations

* Update pre-commit config

* More typing

* Fix bad merge

* Fix TYPE_CHECKING and numpy annotations

* typing bindings.py correctly

* Address review feedback

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Fix rendering of cuda.parallel docs (#3192)

* Fix pre-commit config for codespell and remaining typos

* Fix rendering of docs for cuda.parallel

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Enable PDL for DeviceMergeSortBlockSortKernel (#3199)

The kernel already contains a call to _CCCL_PDL_GRID_DEPENDENCY_SYNC.
This commit enables PDL when launching the kernel.

Adds support for large `num_items` to `DeviceReduce::{ArgMin,ArgMax}` (#2647)

* adds benchmarks for reduce::arg{min,max}

* preliminary streaming arg-extremum reduction

* fixes implicit conversion

* uses streaming dispatch class

* changes arg benches to use new streaming reduce

* streaming arg-extrema reduction

* fixes style

* fixes compilation failures

* cleanups

* adds rst style comments

* declare vars const and use clamp

* consolidates argmin argmax benchmarks

* fixes thrust usage

* drops offset type in arg-extrema benchmarks

* fixes clang cuda

* exec space macros

* switch to signed global offset type for slightly better perf

* clarifies documentation

* applies minor benchmark style changes from review comments

* fixes interface documentation and comments

* list-init accumulating output op

* improves style, comments, and tests

* cleans up aggregate init

* renames dispatch class usage in benchmarks

* fixes merge conflicts

* addresses review comments

* addresses review comments

* fixes assertion

* removes superseded implementation

* changes large problem tests to use new interface

* removes obsolete tests for deprecated interface

Fixes for Python 3.7 docs environment (#3206)

Co-authored-by: Ashwin Srinath <[email protected]>

Adds support for large number of items to `DeviceTransform` (#3172)

* moves large problem test helper to common file

* adds support for large num items to device transform

* adds tests for large number of items to device interface

* fixes format

* addresses review comments

cp_async_bulk: Fix test (#3198)

* memcpy_async_tx: Fix bug in test

Two bugs, one of which occurs in practice:

1. There is a missing fence.proxy.space::global between the writes to
   global memory and the memcpy_async_tx. (Occurs in practice)

2. The end of the kernel should be fenced with `__syncthreads()`,
   because the barrier is invalidated in the destructor. If other
   threads are still waiting on it, there will be UB. (Has not yet
   manifested itself)

* cp_async_bulk_tensor: Pre-emptively fence more in test

* cp_async_bulk: Fix test

The global memory pointer could be misaligned.

cudax fixes for msvc 14.41 (#3200)

avoid instantiating class templates in `is_same` implementation when possible (#3203)

Fix: make launchers a CUB detail; make kernel source functions hidden. (#3209)

* Fix: make launchers a CUB detail; make kernel source functions hidden.

* [pre-commit.ci] auto code formatting

* Address review comments, fix which macro gets fixed.

help the ranges concepts recognize standard contiguous iterators in c++14/17 (#3202)

unify macros and cmake options that control the suppression of deprecation warnings (#3220)

* unify macros and cmake options that control the suppression of deprecation warnings

* suppress nvcc warning #186 in thrust header tests

* suppress c++ dialect deprecation warnings in libcudacxx header tests

Fix thread-reduce performance regression (#3225)

cuda.parallel: In-memory caching of build objects (#3216)

* Define __eq__ and __hash__ for Iterators

* Define cache_with_key utility and use it to cache Reduce objects

* Add tests for caching Reduce objects

* Tighten up types

* Updates to support 3.7

* Address review feedback

* Introduce IteratorKind to hold iterator type information

* Use the .kind to generate an abi_name

* Remove __eq__ and __hash__ methods from IteratorBase

* Move helper function

* Formatting

* Don't unpack tuple in cache key
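
The idea of caching build objects under a user-supplied key can be sketched as below. Only the name `cache_with_key` comes from the commit list; its signature here, as well as `build_reduce` and `builds`, are assumptions made for illustration.

```python
import functools


def cache_with_key(key):
    """Cache a function's results under the value returned by key(*args, **kwargs)."""

    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = key(*args, **kwargs)
            if cache_key not in cache:
                cache[cache_key] = func(*args, **kwargs)
            return cache[cache_key]

        return wrapper

    return decorator


builds = []  # records how often the expensive build actually runs


@cache_with_key(lambda dtype, op: (dtype, op))
def build_reduce(dtype, op):
    builds.append((dtype, op))
    return object()  # stand-in for a compiled build object


a = build_reduce("int32", "add")
b = build_reduce("int32", "add")  # served from the cache
c = build_reduce("float32", "add")  # different key, so a new build
```

A key function is useful when the arguments themselves are not hashable or when only some of their properties (such as a dtype or an iterator kind) should decide whether a build can be reused.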

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Just enough ranges for c++14 `span` (#3211)

use generalized concepts portability macros to simplify the `range` concept (#3217)

fixes some issues in the concepts portability macros and then re-implements the `range` concept with `_CCCL_REQUIRES_EXPR`

Use Ruff to sort imports (#3230)

* Update pyproject.tomls for import sorting

* Update files after running pre-commit

* Move ruff config to pyproject.toml

---------

Co-authored-by: Ashwin Srinath <[email protected]>

fix tuning_scan sm90 config issue (#3236)

Co-authored-by: Shijie Chen <[email protected]>

[STF] Logical token (#3196)

* Split the implementation of the void interface into the definition of the interface, and its implementations on streams and graphs.

* Add missing files

* Check if a task implementation can match a prototype where the void_interface arguments are ignored

* Implement ctx.abstract_logical_data() which relies on a void data interface

* Illustrate how to use abstract handles in local contexts

* Introduce an is_void_interface() virtual method in the data interface to potentially optimize some stages

* Small improvements in the examples

* Do not try to allocate or move void data

* Do not use I as a variable

* fix linkage error

* rename abstract_logical_data to logical_token

* Document logical token

* fix spelling error

* fix sphinx error

* reflect name changes

* use meaningful variable names

* simplify logical_token implementation because writeback is already disabled

* add a unit test for token elision

* implement token elision in host_launch

* Remove unused type

* Implement helpers to check if a function can be invoked from a tuple, or from a tuple where we removed tokens

* Much simpler is_tuple_invocable_with_filtered implementation

* Fix buggy test

* Factorize code

* Document that we can ignore tokens for task and host_launch

* Documentation for logical data freeze

Fix ReduceByKey tuning (#3240)

Fix RLE tuning (#3239)

cuda.parallel: Forbid non-contiguous arrays as inputs (or outputs) (#3233)

* Forbid non-contiguous arrays as inputs (or outputs)

* Implement a more robust way to check for contiguity

* Don't bother if cublas unavailable

* Fix how we check for zero-element arrays

* sort imports
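
One possible contiguity check, including the zero-element case mentioned above, can be sketched from an array's shape and byte strides alone. This is an illustration and not the implementation from the PR; `is_c_contiguous` is a hypothetical helper, and treating `strides=None` as contiguous follows the convention of the CUDA Array Interface.

```python
def is_c_contiguous(shape, strides, itemsize):
    """Return True if the layout described by shape/strides (in bytes) is C-contiguous."""
    if strides is None:
        return True  # no explicit strides means a packed C-order layout
    if any(dim == 0 for dim in shape):
        return True  # arrays with zero elements are trivially contiguous
    expected = itemsize
    # Walk from the innermost dimension outwards.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 dimension is never used, so it is ignored.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


packed = is_c_contiguous((2, 3), (12, 4), 4)
every_other_column = is_c_contiguous((2, 3), (24, 8), 4)
```

Here `packed` describes a dense 2x3 array of 4-byte elements, while `every_other_column` describes a strided view that such a check would reject.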

---------

Co-authored-by: Ashwin Srinath <[email protected]>

expands support for more offset types in segmented benchmark (#3231)

Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects (#3253)

* Add escape hatches to the cmake configuration of the header tests so that we can test deprecated compilers / dialects

* Do not add option twice

ptx: Add add_instruction.py (#3190)

This file helps create the necessary structure for new PTX instructions.

Co-authored-by: Allard Hendriksen <[email protected]>

Bump main to 2.9.0. (#3247)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Drop cub::Mutex (#3251)

Fixes: #3250

Remove legacy macros from CUB util_arch.cuh (#3257)

Fixes: #3256

Remove thrust::[unary|binary]_traits (#3260)

Fixes: #3259

Architecture and OS identification macros (#3237)

Bump main to 3.0.0. (#3265)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Drop thrust not1 and not2 (#3264)

Fixes: #3263

CCCL Internal macro documentation (#3238)

Deprecate GridBarrier and GridBarrierLifetime (#3258)

Fixes: #1389

Require at least gcc7 (#3268)

Fixes: #3267

Drop thrust::[unary|binary]_function (#3274)

Fixes: #3273

Drop ICC from CI (#3277)

[STF] Corruption of the capture list of an extended lambda with a parallel_for construct on a host execution place (#3270)

* Add a test to reproduce a bug observed with parallel_for on a host place

* clang-format

* use _CCCL_ASSERT

* Attempt to debug

* do not create a tuple with a universal reference that is out of scope when we use it, use an lvalue instead

* fix lambda expression

* clang-format

Enable thrust::identity test for non-MSVC (#3281)

This seems to be an oversight when the test was added

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Enable PDL in triple chevron launch (#3282)

It seems PDL was disabled by accident when _THRUST_HAS_PDL was renamed
to _CCCL_HAS_PDL during the review introducing the feature.

Disambiguate line continuations and macro continuations in <nv/target> (#3244)

Drop VS 2017 from CI (#3287)

Fixes: #3286

Drop ICC support in code (#3279)

* Drop ICC from code

Fixes: #3278

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Make CUB NVRTC commandline arguments come from a cmake template (#3292)

Propose the same components (thrust, cub, libc++, cudax, cuda.parallel,...) in the bug report template as in the feature request template (#3295)

Use process isolation instead of default hyper-v for Windows. (#3294)

Try improving build times by using process isolation instead of hyper-v

Co-authored-by: Michael Schellenberger Costa <[email protected]>

[pre-commit.ci] pre-commit autoupdate (#3248)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.6](https://github.com/pre-commit/mirrors-clang-format/compare/v18.1.8...v19.1.6)
- [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.3...v0.8.6)
- [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](https://github.com/pre-commit/mirrors-mypy/compare/v1.13.0...v1.14.1)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Drop Thrust legacy arch macros (#3298)

These macros were disabled and could be re-enabled using THRUST_PROVIDE_LEGACY_ARCH_MACROS

Drop Thrust's compiler_fence.h (#3300)

Drop CTK 11.x from CI (#3275)

* Add cuda12.0-gcc7 devcontainer
* Move MSVC2017 jobs to CTK 12.6
This is the only combination for which rapidsai has devcontainers
* Add /Zc:__cplusplus for the libcudacxx tests
* Only add escape hatch for affected CTKs
* Workaround missing cudaLaunchKernelEx on MSVC
cudaLaunchKernelEx requires C++11, but unfortunately <cuda_runtime.h> checks this using the __cplusplus macro, which MSVC reports incorrectly. CTK 12.3 fixed this by additionally detecting _MSC_VER. As a workaround, we provide our own copy of cudaLaunchKernelEx when it is not available from the CTK.
* Workaround nvcc+MSVC issue
* Regenerate devcontainers

Fixes: #3249

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Update packman and repo_docs versions (#3293)

Co-authored-by: Ashwin Srinath <[email protected]>

Drop Thrust's deprecated compiler macros (#3301)

Drop CUB_RUNTIME_ENABLED and __THRUST_HAS_CUDART__ (#3305)

Adds support for large number of items to `DevicePartition::If` with the `ThreeWayPartition` overload (#2506)

* adds support for large number of items to three-way partition

* adapts interface to use choose_signed_offset_t

* integrates applicable feedback from device-select pr

* changes behavior for empty problems

* unifies grid constant macro

* fixes kernel template specialization mismatch

* integrates _CCCL_GRID_CONSTANT changes

* resolve merge conflicts

* fixes checks in test

* fixes test verification

* improves tests

* makes few improvements to streaming dispatch

* improves code comment on test

* fixes unrelated compiler error

* minor style improvements

Refactor scan tunings (#3262)

Require C++17 for compiling Thrust and CUB (#3255)

* Issue an unsuppressable warning when compiling with < C++17
* Remove C++11/14 presets
* Remove CCCL_IGNORE_DEPRECATED_CPP_DIALECT from headers
* Remove [CUB|THRUST|TCT]_IGNORE_DEPRECATED_CPP_[11|14]
* Remove CUB_ENABLE_DIALECT_CPP[11|14]
* Update CI runs
* Remove C++11/14 CI runs for CUB and Thrust
* Raise compiler minimum versions for C++17
* Update ReadMe
* Drop Thrust's cpp14_required.h
* Add escape hatch for C++17 removal

Fixes: #3252

Implement `views::empty` (#3254)

* Disable pair conversion of subrange with clang in C++17

* Fix namespace views

* Implement `views::empty`

This implements `std::ranges::views::empty`, see https://en.cppreference.com/w/cpp/ranges/empty_view

Refactor `limits` and `climits` (#3221)

* implement builtins for huge val, nan and nans

* change `INFINITY` and `NAN` implementation for NVRTC

cuda.parallel: Add documentation for the current iterators along with examples and tests (#3311)

* Add tests demonstrating usage of different iterators

* Update documentation of reduce_into by merging import code snippet with the rest of the example

* Add documentation for current iterators

* Run pre-commit checks and update accordingly

* Fix comments to refer to the proper lines in the code snippets in the docs

Drop clang<14 from CI, update devcontainers. (#3309)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

[STF] Cleanup task dependencies object constructors (#3291)

* Define tag types for access modes

* Rework how we build task_dep objects based on access mode tags; pack_state is now responsible for using a const_cast for read only data

* Greatly simplify the previous attempt : do not define new types, but use integral constants based on the enums

* It seems the const_cast was not necessary, so we can simplify it and not even do some dispatch based on access modes

Disable test with a gcc-14 regression (#3297)

Deprecate Thrust's cpp_compatibility.h macros (#3299)

Remove dropped function objects from docs (#3319)

Document `NV_TARGET` macros (#3313)

[STF] Define ctx.pick_stream() which was missing for the unified context (#3326)

* Define ctx.pick_stream() which was missing for the unified context

* clang-format

Deprecate cub::IterateThreadStore (#3337)

Drop CUB's BinaryFlip operator (#3332)

Deprecate cub::Swap (#3333)

Clarify transform output can overlap input (#3323)

Drop CUB APIs with a debug_synchronous parameter (#3330)

Fixes: #3329

Drop CUB's util_compiler.cuh for real (#3340)

PR #3302 planned to drop the file, but only dropped its content. This
was an oversight. So let's drop the entire file.

Drop cub::ValueCache (#3346)

limits offset types for merge sort (#3328)

Drop CDPv1 (#3344)

Fixes: #3341

Drop thrust::void_t (#3362)

Use cuda::std::addressof in Thrust (#3363)

Fix all_of documentation for empty ranges (#3358)

all_of always returns true on an empty range.
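
This is the usual vacuous-truth convention: with no elements, no element can violate the predicate. A small analogy in Python (not Thrust code; `all_of` here is a hypothetical helper mirroring the algorithm's name):

```python
def all_of(values, pred):
    # Return False as soon as one element fails the predicate.
    for value in values:
        if not pred(value):
            return False
    # No element failed, which includes the case of no elements at all.
    return True


empty_result = all_of([], lambda x: False)
mixed_result = all_of([2, 4, 5], lambda x: x % 2 == 0)
```

Even a predicate that rejects everything yields `True` on an empty input, because it is never called.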

[STF] Do not keep track of dangling events in a CUDA graph backend (#3327)

* Unlike the CUDA stream backend, nodes in a CUDA graph are necessarily done when
the CUDA graph completes. Therefore keeping track of "dangling events" is a
waste of time and resources.

* replace can_ignore_dangling_events by track_dangling_events which leads to more readable code

* When not storing the dangling events, we must still perform the deinit operations that were producing these events!

Extract scan kernels into NVRTC-compilable header (#3334)

* Extract scan kernels into NVRTC-compilable header

* Update cub/cub/device/dispatch/dispatch_scan.cuh

Co-authored-by: Georgii Evtushenko <[email protected]>

---------

Co-authored-by: Ashwin Srinath <[email protected]>
Co-authored-by: Georgii Evtushenko <[email protected]>

Drop deprecated aliases in Thrust functional (#3272)

Fixes: #3271

Drop cub::DivideAndRoundUp (#3347)

Use cuda::std::min/max in Thrust (#3364)

Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (#3361)

* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`

Cleanup util_arch (#2773)

Deprecate thrust::null_type (#3367)

Deprecate cub::DeviceSpmv (#3320)

Fixes: #896

Improves `DeviceSegmentedSort` test run time for large number of items and segments (#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Compile basic infra test with C++17 (#3377)

Adds support for large number of items and large number of segments to `DeviceSegmentedSort` (#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for large number of segments

* adds support for large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on type returned by segment iterator begin/end iterators

* minor style improvements

Exit with error when RAPIDS CI fails. (#3385)

cuda.parallel: Support structured types as algorithm inputs (#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Deprecate thrust::async (#3324)

Fixes: #100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (#3342)

Fix broken `_CCCL_BUILTIN_ASSUME` macro (#3314)

* add compiler-specific path
* fix device code path
* add _CCCL_ASSUME

Deprecate thrust::numeric_limits (#3366)

Replace `typedef` with `using` in libcu++ (#3368)

Deprecate thrust::optional (#3307)

Fixes: #3306

Upgrade to Catch2 3.8  (#3310)

Fixes: #1724

refactor `<cuda/std/cstdint>` (#3325)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Update CODEOWNERS (#3331)

* Update CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Fix sign-compare warning (#3408)

Implement more cmath functions to be usable on host and device (#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions

Redefine and deprecate thrust::remove_cvref (#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Fix assert definition for NVHPC due to constexpr issues (#3418)

NVHPC cannot decide at compile time where the code would run so _CCCL_ASSERT within a constexpr function breaks it.

Fix this by always using the host definition which should also work on device.

Fixes #3411

Extend CUB reduce benchmarks (#3401)

* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters

Fixes: #3283

Update upload-pages-artifact to v3 (#3423)

* Update upload-pages-artifact to v3

* Empty commit

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Replace and deprecate thrust::cuda_cub::terminate (#3421)

`std::linalg` accessors and `transposed_layout` (#2962)

Add round up/down to multiple (#3234)

[FEA]: Introduce Python module with CCCL headers (#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_aude_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894035917)

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894043178

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894044996

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* https://github.com/NVIDIA/cccl/pull/3201#discussion_r1894048562

Unexpected accidental discovery: cuda.cooperative unit tests pass without CCCL headers entirely.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d608da06fcf6851ef6197f88b5e7dbc3bbe.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: The other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a218ed77a075156cd1b332047202adb25aa2.

Error message: https://github.com/NVIDIA/cccl/pull/3201#issuecomment-2594012971

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd8b50a6a293e00a5825b579e125010b13d.

* Implement suggestion by @shwina (https://github.com/NVIDIA/cccl/pull/3201#pullrequestreview-2556918460)

* Address feedback by @leofang
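
The `AttributeError` in `_Reduce.__del__` quoted in the warning summary above is typical of a destructor that runs on an object whose `__init__` raised before the attribute was assigned. The sketch below shows one common way to guard against that. It is an illustration only, not necessarily the change made in this PR; `Reducer`, `release_build_result` and `released` are hypothetical stand-ins.

```python
released = []  # hypothetical record of released resources, for illustration


def release_build_result(build_result):
    """Hypothetical stand-in for a native cleanup call."""
    released.append(build_result)


class Reducer:
    """Hypothetical object that owns a build result needing cleanup."""

    def __init__(self, fail_during_build=False):
        # Assign the attribute before anything that can raise, so that
        # __del__ can always read it, even on a half-constructed object.
        self.build_result = None
        if fail_during_build:
            raise RuntimeError("build failed")
        self.build_result = {"handle": 1}

    def __del__(self):
        # getattr with a default also covers the case where __init__
        # never reached the first assignment.
        build_result = getattr(self, "build_result", None)
        if build_result is None:
            return
        release_build_result(build_result)
```

With this guard, a failed construction raises only the original error, and cleanup runs only for objects that actually hold a build result.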

---------

Co-authored-by: Bernhard Manfred Gruber

cuda.parallel: Add optional stream argument to reduce_into() (#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test
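
The stream validation described in the bullets above can be sketched roughly as follows. This is a hedged illustration, not the code from #3348: the function name `get_stream_handle` is hypothetical, and the sketch assumes that `__cuda_stream__` is a method returning a `(version, handle)` pair with an integer handle.

```python
from typing import Optional


def get_stream_handle(stream) -> Optional[int]:
    """Return the raw stream handle, or None when no stream was passed.

    Hypothetical helper; assumes `__cuda_stream__()` returns (version, handle).
    """
    if stream is None:
        return None

    cuda_stream = getattr(stream, "__cuda_stream__", None)
    if cuda_stream is None:
        raise TypeError(
            f"stream argument {stream!r} does not implement the "
            "'__cuda_stream__' protocol"
        )

    try:
        # Unpack instead of indexing: a non-iterable return raises TypeError,
        # a wrong number of elements raises ValueError.
        version, handle = cuda_stream()
    except (TypeError, ValueError) as e:
        raise TypeError(
            f"could not obtain a (version, handle) pair from {stream!r}"
        ) from e

    if not isinstance(handle, int):
        raise TypeError(f"invalid stream handle {handle!r}")
    return handle
```

Mapping every failure to `TypeError` gives callers a single exception type to handle for an invalid `stream` argument.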

Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (#3434)

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes #3404

Fix CI issues (#3443)

update docs

fix review

restrict allowed types

replace constexpr implementations with generic

optimize `__is_arithmetic_integral`