OpenCL version? #8
Comments
@campbx It is not fully optimized yet, but it's faster than any im2col/col2im (explicit GEMM) implementation.
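For readers unfamiliar with the distinction, here is a minimal, purely illustrative sketch (not LibDNN or Caffe code) of the explicit im2col lowering that such implementations perform before calling GEMM; the function name and the single-channel, stride-1, no-padding simplifications are assumptions for brevity:

```c
/* Illustrative im2col sketch for a single-channel 2D input (stride 1, no padding).
 * The expanded "col" buffer is kernel_h * kernel_w times larger than the input,
 * which is the extra memory traffic an implicit-GEMM kernel avoids. */
void im2col_simple(const float *img, int height, int width,
                   int kernel_h, int kernel_w, float *col) {
    int out_h = height - kernel_h + 1;
    int out_w = width  - kernel_w + 1;
    for (int kh = 0; kh < kernel_h; ++kh)
        for (int kw = 0; kw < kernel_w; ++kw)
            for (int oh = 0; oh < out_h; ++oh)
                for (int ow = 0; ow < out_w; ++ow)
                    /* each column of "col" holds one unrolled receptive field */
                    col[((kh * kernel_w + kw) * out_h + oh) * out_w + ow] =
                        img[(oh + kh) * width + (ow + kw)];
}
/* Convolution then becomes a single GEMM:
 *   output[M x (out_h*out_w)] = weights[M x (kernel_h*kernel_w)] * col */
```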
I don't know if AMD would be interested in adding a transparent backend for HSA like this. Actually, libdnn relies on ViennaCL, and there was an official HSA backend initiative, but I don't think it was ever upstreamed.
/cc @gstoner
Post the ROCm 1.3 release, we will be putting out a developer release of the OpenCL Language Runtime and Compiler on ROCm. This will be on our new native GCN ISA compiler. We are holding to our promise to make the stack open source. We had a lot of work to do around OpenCL to make this happen. It is a big shift for us, since we are no longer leveraging our historical two-stage compiler architecture.
The LLVM native GCN ISA code generator has already been upstreamed (http://llvm.org/docs/AMDGPUUsage.html). We have also released the device libraries at https://github.com/RadeonOpenCompute/ROCm-Device-Libs, where you will already find the math intrinsics for OpenCL. You will also see we are active in Clang OpenCL development, and we have moved to a standardized code object loader and compiler ABI: https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc. There were lots of pieces we had to pull back together and clean up for the community. On the ViennaCL side, there was a graduate-school port to the HSA runtime with HSAIL code generation as the backend, but we have not seen much progress on it.
@gstoner ViennaCL is not a strict requirement; the CUDA backend for LibDNN bypasses ViennaCL and is used natively. The same is possible with HSA.
We looked at it a lot; a ViennaCL HIP port might be another way to attach to the platform.
@bhack @gstoner If you look at the current all-purpose ND-pooling (max-pooling) kernel in OpenCL Caffe, it's obvious why: lots of branching, low ILP, no reuse of computed offsets across the feature maps and batch size, quite a few registers lost to arrays, a hard limit of 6 spatial dimensions, and so on (see the illustrative sketch below).
Currently, with FGLRX (OpenCL 2.0), the W9100 can do about 700 images/second in AlexNet, while a GTX 1080 can do up to 3000/second with cuDNN. So there is still a long way to go, but I hope we can reach 1200 images/second on the W9100/RX 480. A step back was the AMDGPU-PRO driver (OpenCL 1.2), which currently reduces the performance of a W9100 to 450 images/second. Optimally, by mid-2017, a Vega card should do 2600 images/second FP32 and 5200 images/second FP16. But that is wishful speculation ;)
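For context, the kind of generic ND max-pooling kernel being described might look roughly like the sketch below. This is an illustrative reconstruction, not the actual OpenCL Caffe kernel; the kernel name, argument list, and MAX_SPATIAL_DIMS constant are assumptions. It shows where the branching, per-thread index arrays, and per-element offset recomputation come from:

```c
/* Illustrative sketch only -- not the actual OpenCL Caffe kernel.
 * Batch/channel indexing is omitted; num_axes <= MAX_SPATIAL_DIMS is assumed. */
#define MAX_SPATIAL_DIMS 6

__kernel void max_pool_nd_sketch(const int num_axes, const int nthreads,
                                 __global const float* bottom,
                                 __global const int* pooled_size,
                                 __global const int* kernel_size,
                                 __global const int* stride,
                                 __global const int* input_size,
                                 __global float* top) {
  for (int index = get_global_id(0); index < nthreads;
       index += (int)get_global_size(0)) {
    /* Per-thread index arrays cost registers and force dynamic indexing. */
    int start[MAX_SPATIAL_DIMS], end[MAX_SPATIAL_DIMS], iter[MAX_SPATIAL_DIMS];
    int offset = index;
    /* Offsets are recomputed from scratch for every output element;
     * nothing is reused across feature maps or the batch. */
    for (int i = num_axes - 1; i >= 0; --i) {
      int p = offset % pooled_size[i];
      offset /= pooled_size[i];
      start[i] = p * stride[i];
      end[i] = min(start[i] + kernel_size[i], input_size[i]);
      iter[i] = start[i];
    }
    float maxval = -FLT_MAX;
    /* Walking an ND window needs this odometer-style carry loop:
     * heavy, data-dependent branching and little ILP. */
    bool done = false;
    while (!done) {
      int in_off = 0;
      for (int i = 0; i < num_axes; ++i)
        in_off = in_off * input_size[i] + iter[i];
      maxval = fmax(maxval, bottom[in_off]);
      int d = num_axes - 1;
      while (d >= 0 && ++iter[d] >= end[d]) {  /* carry into the next axis */
        iter[d] = start[d];
        --d;
      }
      done = (d < 0);
    }
    top[index] = maxval;
  }
}
```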
Are we going outside the libdnn/convolution dogma? ;)
@gstoner As we have already discussed with @naibaf7, this kind of HSAIL kernel approach doesn't improve the competition against cuDNN much.
@bhack I add kernels that have a significant performance effect or are required for work in my other projects as I go along; it might violate the dogma ;)
It would be interesting to know whether AMD, through @GPUOpen-ProfessionalCompute-Libraries, will support the new OpenVX neural network extension via https://github.com/GPUOpen-ProfessionalCompute-Libraries/amdovx-modules/
First, we are working on an optimized deep learning solver for our hardware. This is the only way to close the gap with cuDNN. We have a dedicated team working on this, and some very interesting advisors helping us. We are also working on more optimized versions of Caffe, TensorFlow, and Torch7. We have also been focusing ROCm on the needs of deep learning. There are a number of key capabilities we need to bring to our drivers to help make it easier to drive application optimization and scale.
We are working on some new capabilities for performance tools and debugging beyond what you're seeing in public today. There are a lot of moving parts, but SC 2016 will be the one-year anniversary of the Boltzmann Initiative announcement. In that time, we have gained velocity and focus, and our deep learning solver will get the same intensity. Welcome to the new Radeon Open Compute Program.
On FP16: this is an article I put together, below; we are working hard to get full FP16/INT16 instruction support into the new GCN compiler. This is stage 1 of FP16 support: https://radeonopencompute.github.io/GCN_Float16.html
Fiji family of hardware: Radeon R9 Nano, R9 Fury, R9 Fury X, FirePro S9300 x2.
We will also expose our GCN 3 ISA via an assembler directly supported by the compiler. The new LLVM native GCN ISA compiler supports a disassembler, an assembler, and soon inline assembly, so you will be able to tune your code even further. The ROCm compilers will bring the full richness of FP16 and INT16 via HCC, HIP, and OpenCL. You can find out more on FP16 and other instructions in the GCN version 3 ISA manual, for example:
V_FREXP_EXP_I16_F16: returns the exponent of a half-precision float input, such that the original half float = significand * 2^exponent.
V_ADD_U16: D.u16 = S0.u16 + S1.u16. Supports saturation (unsigned 16-bit integer domain).
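As a concrete illustration of how FP16 is consumed from the OpenCL side (independent of the GCN ISA details above), a minimal sketch using the cl_khr_fp16 extension might look like this; the kernel name and arguments are assumptions:

```c
/* Minimal FP16 sketch, assuming the device reports cl_khr_fp16. */
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void axpy_fp16(const half alpha,
                        __global const half* x,
                        __global half* y) {
    int i = get_global_id(0);
    /* On hardware with native FP16 ALUs (e.g. GCN 3), this maps to 16-bit
     * instructions; otherwise the compiler may promote to float internally. */
    y[i] = alpha * x[i] + y[i];
}
```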
/cc @hughperkins I think he could be interested in the last comments of this thread.
We had this with OpenCL 1.2 and SPIR 1.2; three people in the world used it: my team, Continuum IO, and Codeplay. Honestly, this path is great for compiler prototypes, but when you get deep into your work you want more control over the compiler. My team and Continuum IO moved away from it, since it was an overly constrained solution and did not let you solve the key problems you face when you bring a language that is not like OpenCL onto the platform. On ROCm we give you the full LLVM IR interface, since we have upstreamed the full source of the AMDGPU GCN compiler, and you have low-level access to the ROCr system runtime for when you really want to tune for performance. You can extend the ROCm device-library compiler intrinsics with the now public Open Compute Math Library and Open Compute Kernel Language. We now have a standardized loader interface; see the ABI documentation, which we plumb up via our language runtimes as well. One big thing: the compiler is being developed so we can do true offline compilation, and it can be upgraded independently of the driver. Also, ROCr is a language-independent system runtime, so you can load a binary and a language runtime at execution time, just like you do with CPU-based software development. No more monolithic blob of stuff.
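To illustrate what offline compilation means on the application side, here is a hedged host-code sketch that loads a precompiled code object through the standard OpenCL API instead of compiling source at run time. The helper name, file handling, and surrounding setup (platform, device, context) are assumptions, not ROCm-specific code:

```c
/* Sketch: load a precompiled kernel binary with the standard OpenCL API. */
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

cl_program load_prebuilt_program(cl_context ctx, cl_device_id dev,
                                 const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    size_t size = (size_t)ftell(f);
    rewind(f);
    unsigned char *binary = malloc(size);
    if (fread(binary, 1, size, f) != size) { fclose(f); free(binary); return NULL; }
    fclose(f);

    cl_int err, binary_status;
    /* The driver only needs to load the code object; no online source compile. */
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size,
                                                (const unsigned char **)&binary,
                                                &binary_status, &err);
    free(binary);
    if (err != CL_SUCCESS || binary_status != CL_SUCCESS) return NULL;
    /* Still required, but reduces to finalizing/linking the binary. */
    if (clBuildProgram(prog, 1, &dev, "", NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(prog);
        return NULL;
    }
    return prog;
}
```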
Intel Beignet: 2.0 done https://lists.freedesktop.org/archives/beignet/2017-January/008476.html
Cool, will test next week :)
@naibaf7 Is there a possibility of having upstreamed Intel kernels? Because I think MKL-DNN and MKL 2017 will cover only the CPU.
Thanks @bhack for the link - I will see if I can use it on my laptop with Ubuntu 16.04. I have had some bad experiences installing Intel GPU drivers to support OpenCL on Ubuntu in the past, so I hope it has become a bit more user-friendly ;) ...
Hi Fabian, I couldn't figure out which version of OpenCL libDNN supports. Is it 1.1 or 2.x?
Thanks!
PS: I hope AMD employees @fsword73 and @dagamayank can chip in some optimized code to make libDNN a fast replacement for cuDNN.