Problem with register number used in ptx generation #14

Open
babouFomb opened this issue Mar 13, 2017 · 2 comments
@babouFomb

The default maxrregcount is set to 64 in llvm/tools/clang/lib/Driver/Tools.cpp (line 9953).

I rebuilt clang with maxrregcount set to 20 and then rebuilt libomptarget. Neither build produced any errors or warnings.

I tested this on a simple non-separable Sobel filter:

```cpp
void SobelFilter::nonSepSobel_V(uint8_t* in, int h, int w, uint8_t* out_v) {
#pragma omp target data map(to:in[0:w*h]) map(from:out_v[0:w*h])
{
#pragma omp target teams distribute parallel for collapse(2) schedule(static,1)
for(int y=0; y<h; y++) {
  for(int x=0; x<w; x++) {
    out_v[y*w+x] = (-in[(y-1)*w+(x-1)] + in[(y-1)*w+(x+1)]
                    -2*in[y*w+(x-1)] + 2*in[y*w+(x+1)]
                    -in[(y+1)*w+(x-1)] + in[(y+1)*w+(x+1)])/9;
  }
}
}
}
```

I get these linking errors:
```
nvlink error : entry function '_omptgt__0_12c01e8_804' with max regcount of 20 calls function '__kmpc_for_static_init_4' with regcount of 34
nvlink error : entry function '_omptgt__2_12c01e8_804' with max regcount of 20 calls function '__kmpc_for_static_init_4' with regcount of 34
nvlink error : entry function '_omptgt__1_12c01e8_804' with max regcount of 20 calls function '__kmpc_for_static_init_4' with regcount of 34
nvlink error : entry function '_omptgt__0_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_init' with regcount of 34
nvlink error : entry function '_omptgt__2_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_init' with regcount of 34
nvlink error : entry function '_omptgt__1_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_init' with regcount of 34
nvlink error : entry function '_omptgt__0_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_prepare_parallel' with regcount of 34
nvlink error : entry function '_omptgt__2_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_prepare_parallel' with regcount of 34
nvlink error : entry function '_omptgt__1_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_prepare_parallel' with regcount of 34
nvlink error : entry function '_omptgt__0_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_parallel' with regcount of 34
nvlink error : entry function '_omptgt__2_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_parallel' with regcount of 34
nvlink error : entry function '_omptgt__1_12c01e8_804' with max regcount of 20 calls function '__kmpc_kernel_parallel' with regcount of 34
nvlink info : 1442901012 bytes gmem
nvlink info : Function properties for '_omptgt__0_12c01e8_804':
nvlink info : used 34 registers, 336 stack, 4240 bytes smem, 376 bytes cmem[0], 0 bytes lmem
nvlink info : Function properties for '_omptgt__2_12c01e8_804':
nvlink info : used 34 registers, 296 stack, 4240 bytes smem, 360 bytes cmem[0], 0 bytes lmem
nvlink info : Function properties for '_omptgt__1_12c01e8_804':
nvlink info : used 34 registers, 296 stack, 4240 bytes smem, 360 bytes cmem[0], 0 bytes lmem
clang-3.8: error: nvlink command failed with exit code 255 (use -v to see invocation)
make[2]: *** [sobel.out] Error 255
make[1]: *** [CMakeFiles/sobel.out.dir/all] Error 2
make: *** [all] Error 2
```
I had a look at the libomptarget source, but I cannot understand how and when the parameters of the `__kmpc_kernel_*` functions are initialized. And where does the number "34" come from?

@arpith-jacob

This is probably because the OpenMP runtime for the GPU (libomptarget/nvptx) was compiled without a maxrregcount flag. As a consequence, the runtime functions are probably using more than 20 registers. If you really want to do this, you should add the same flag when building the runtime as well.

As an FYI, a newer version of our compiler does not precompile the GPU runtime. It actually compiles it with the user program so you will not have this issue.
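To make the mismatch concrete, here is a minimal C++ sketch that reads the `nvlink error` lines from the original report and lists the register count each runtime function was built with. It is only an illustration: the helper names (`RegMismatch`, `parseNvlinkError`, `calleeRegisterNeeds`, `smallestLinkableCap`) are hypothetical, and the rule it assumes (an entry function's cap must be at least the register count of every function it calls) is inferred from the wording of the error messages, not taken from the nvlink documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// One parsed "nvlink error" line: an entry kernel capped at `cap` registers
// calls a device function that was compiled to use `needed` registers.
struct RegMismatch {
    std::string entry;
    int cap;
    std::string callee;
    int needed;
};

// Parses a single log line. Returns false for lines that are not
// register-count errors (for example the "nvlink info" lines).
bool parseNvlinkError(const std::string& line, RegMismatch& out) {
    static const std::regex pattern(
        R"(nvlink error\s*:\s*entry function '([^']+)' with max regcount of (\d+) calls function '([^']+)' with regcount of (\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) return false;
    assert(m.size() == 5);  // whole match plus four capture groups
    out = RegMismatch{m[1].str(), std::stoi(m[2].str()),
                      m[3].str(), std::stoi(m[4].str())};
    return true;
}

// Collects, per called function, the largest register count reported for it.
std::map<std::string, int> calleeRegisterNeeds(const std::string& log) {
    std::map<std::string, int> needs;
    std::istringstream in(log);
    std::string line;
    RegMismatch mm;
    while (std::getline(in, line)) {
        if (parseNvlinkError(line, mm)) {
            int& n = needs[mm.callee];
            n = std::max(n, mm.needed);
        }
    }
    return needs;
}

// Smallest cap that would satisfy every callee seen in the log,
// under the assumption stated above (cap >= callee register count).
int smallestLinkableCap(const std::map<std::string, int>& needs) {
    int cap = 0;
    for (const auto& kv : needs) cap = std::max(cap, kv.second);
    return cap;
}
```

Feeding the log from the report into `calleeRegisterNeeds` should yield the four `__kmpc_*` functions, each at 34 registers, so under the stated assumption a cap of 20 cannot link against that build of the runtime. Either the cap has to be raised or the runtime has to be rebuilt so that the limit actually takes effect.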

@babouFomb

Thanks for your response.

Adding the --maxrregcount=16 flag to CMakeLists.txt (line 50) and the Makefile (line 36) in libomptarget/DevRTLs/nvptx does not solve the problem.

I get the same errors from nvlink.

I'm sorry, but I don't understand what you mean by "FYI". Also, which newer version of your compiler are you referring to?

Thanks.
