Identify features we want that are not in ISO #2
@ibaned / @alanhumphrey - in almost every compiler I've used with intrinsics, a multiply intrinsic followed by an add intrinsic is converted to an FMA (if the compiler has FMA enabled). I would like us to avoid using lots of fancy (but unnecessarily complex) C++ to achieve what a minimal peephole optimizer can do.
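A minimal sketch of leaving the fusion to the compiler instead of exposing a dedicated FMA operation, assuming GCC/Clang vector extensions (`__attribute__((vector_size))`). The type name `v4f` and the function `muladd` are hypothetical. Whether a fused multiply-add is actually emitted depends on the target and on the floating-point contraction settings.

```cpp
// Four floats in one 128-bit GCC/Clang vector-extension type.
typedef float v4f __attribute__((vector_size(16)));

// Written as a plain multiply followed by an add. When the target has FMA
// (e.g. -mfma or a suitable -march) and contraction is allowed
// (e.g. -ffp-contract=fast), GCC and Clang may fuse this into one FMA instruction.
inline v4f muladd(v4f a, v4f b, v4f c) {
  return a * b + c;
}
```

Compiling with something like `-O2 -mfma` and inspecting the assembly is a quick way to confirm whether the two operations were fused.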
Also, I wasn't sure why we still hadn't evaluated the use of …
@nmhamster I personally was unaware of that. The way the ISO interface works, we have template specializations called "ABI"s (a bit of a misnomer). Some of those "ABI"s call intrinsics directly, but I also implemented one where the data type was just …
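To make the "ABI" idea concrete: in the ISO-style interface the second template parameter of `simd<T, Abi>` is a tag type, and each tag selects its own specialization with its own storage. Below is a heavily simplified sketch under that assumption; `simd_sketch`, `scalar_abi`, and `vec_ext_abi` are hypothetical names, and the vector specialization assumes GCC/Clang vector extensions.

```cpp
#include <cstddef>

// Hypothetical ABI tags: empty types whose only job is to select a specialization.
struct scalar_abi {};
struct vec_ext_abi {};

// Primary template is declared only; each ABI provides its own specialization.
template <class T, class Abi>
class simd_sketch;

// Scalar "ABI": a single lane stored as a plain T.
template <class T>
class simd_sketch<T, scalar_abi> {
 public:
  static constexpr std::size_t size() { return 1; }
  simd_sketch(T v = T()) : v_(v) {}
  T operator[](std::size_t) const { return v_; }
  friend simd_sketch operator+(simd_sketch a, simd_sketch b) {
    return simd_sketch(a.v_ + b.v_);
  }

 private:
  T v_;
};

// Vector-extension "ABI": four float lanes in a GCC/Clang vector type.
template <>
class simd_sketch<float, vec_ext_abi> {
  typedef float native_type __attribute__((vector_size(16)));

 public:
  static constexpr std::size_t size() { return 4; }
  simd_sketch(float v = 0.0f) {
    for (std::size_t i = 0; i < size(); ++i) v_[i] = v;  // broadcast to all lanes
  }
  float operator[](std::size_t i) const { return v_[i]; }
  friend simd_sketch operator+(simd_sketch a, simd_sketch b) {
    simd_sketch r;
    r.v_ = a.v_ + b.v_;  // one vector add covers all four lanes
    return r;
  }

 private:
  native_type v_;
};
```

User code written against `simd_sketch<T, Abi>` stays the same; only the tag changes which implementation (plain scalar, compiler vector type, or intrinsics) is used underneath.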
@ibaned - right, I was thinking the same thing. I am interested to see what performance we get. The GCC vector attributes do support boolean operators as …
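For reference, GCC-style vector types evaluate comparisons lane by lane and produce a signed integer vector of the same width, with -1 (all bits set) where the comparison holds and 0 elsewhere. Those masks then combine with `&`, `|`, and `~`. A short sketch, assuming GCC or Clang; the type and function names are illustrative only.

```cpp
// Four float lanes, and a matching four-lane 32-bit int type for comparison masks.
typedef float v4f __attribute__((vector_size(16)));
typedef int v4i __attribute__((vector_size(16)));

// Lane-wise a < b: each lane of the result is -1 (all bits set) if true, 0 if false.
inline v4i less_mask(v4f a, v4f b) { return a < b; }

// Masks combine with the bitwise operators: lo <= x && x < hi, lane by lane.
inline v4i in_range_mask(v4f x, v4f lo, v4f hi) { return (x >= lo) & (x < hi); }

// Number of lanes whose mask is true.
inline int count_true(v4i m) {
  int n = 0;
  for (int i = 0; i < 4; ++i) n += (m[i] != 0);
  return n;
}
```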
I would really recommend having a test suite, looping in James Elliott, and tracking performance across compilers. At LLNL we were toying with these kinds of libraries as I left, and it felt like every month we'd find out that such-and-such a compiler suddenly wasn't optimizing such-and-such a mechanism well anymore. If we have this work in place and a guide saying which compilers do better with which ABIs, we're in a good place.
Rather than relying only on profiling, I think we first need to take a good, hands-on look at the generated code and some of the compiler output. A human in the loop during development is essential to understanding why the compiler behaves as it does. Once that has settled down a little, I think the transition to profile-based ongoing assessment will be useful. I am particularly interested in whether we actually execute the vectorized code even when we generate it, since Intel in particular makes some interesting runtime choices that sometimes mean we don't. In short, we should do some homework here as a preliminary step.
@DavidPoliakoff - Agreed on the test suite, etc. We talked at some length about this on Friday. I also agree with @nmhamster on having a human in the loop initially and doing our homework, e.g., seeing what code is generated and whether we actually execute that vectorized code. I will transition fully to this effort early next week (my SNL contract is 0.60 FTE) and can stay on it for as long as necessary. Thanks @ibaned for getting this conversation started.
Since this issue was originally about missing pieces in the ISO interface, I'm going to answer the question that @alanphumphrey asked in the other issue here, because it fits better. My thinking is that we should try to propose changes to the ISO interface, especially where we see that it cannot be as fast as hand-coding without those changes, or is super inconvenient without them. So far I think there are three changes we can think about individually:
@nmhamster I have some early data on different high-level approaches using a full and very non-trivial Sandia application built using Clang on Mac:
It seems like calling vendor-specific intrinsics can still be way better in many cases.
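To make "calling vendor-specific intrinsics" concrete, here is a sketch of an array add that uses SSE2 intrinsics from `<emmintrin.h>` when the compiler defines `__SSE2__` and falls back to a plain scalar loop otherwise. The function name `add_arrays` is hypothetical and is not taken from the Sandia application mentioned above.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for n doubles.
inline void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE2__)
  // Two doubles per 128-bit SSE2 register; unaligned loads and stores.
  for (; i + 2 <= n; i += 2) {
    __m128d va = _mm_loadu_pd(a + i);
    __m128d vb = _mm_loadu_pd(b + i);
    _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
  }
#endif
  // Scalar remainder (or the whole array when SSE2 is unavailable).
  for (; i < n; ++i) out[i] = a[i] + b[i];
}
```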
@ibaned wrote:
Matthias Kretz had a proposal to permit overloading the ternary operator.
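Because `?:` cannot be overloaded today, SIMD libraries expose lane-wise selection as a named function instead (this thread mentions `if_then_else` in `stk_simd` and `choose` in the prototype). A minimal sketch of such a function on GCC/Clang vector types, written as a bitwise blend through the comparison mask; `choose_sketch` and the type names are assumptions, not the API of either library.

```cpp
// Four float lanes and the matching 32-bit integer mask type.
typedef float v4f __attribute__((vector_size(16)));
typedef int v4i __attribute__((vector_size(16)));

// Lane-wise "m ? a : b". Each lane of m must be -1 (all bits set) or 0,
// as produced by a vector comparison. The float bits are reinterpreted as
// ints (same-size vector cast), blended with the mask, and reinterpreted back.
inline v4f choose_sketch(v4i m, v4f a, v4f b) {
  v4i ai = (v4i)a;
  v4i bi = (v4i)b;
  return (v4f)((m & ai) | (~m & bi));
}
```

For example, `choose_sketch(x < zero, -x, x)` computes a lane-wise absolute value.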
@mhoemmen awesome!
@ibaned It's still just a proposal :-) Not sure how long it will take to get through.
Yep, won't hold my breath :)
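In the spirit of recording things that we might want to create ISO C++ papers about, @alanw0 and the STK team identified that an equivalent of the following would be useful:

```cpp
#include <cmath>
template <class T>
T multiplysign(T a, T b) {
  using std::copysign;  // also allows ADL to find copysign overloads for SIMD types
  return a * copysign(T(1), b);
}
```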
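One reason a dedicated `multiplysign` is attractive: for IEEE-754 values, multiplying by `copysign(1.0, b)` only flips the sign of `a` when `b`'s sign bit is set, which can be done with one AND and one XOR on the bit patterns instead of a `copysign` plus a multiply. A scalar sketch for `double`, assuming 64-bit IEEE-754 doubles and no flush-to-zero mode; `multiplysign_bits` is a hypothetical name, and its results can differ from the multiply-based version only in the sign or payload of NaNs.

```cpp
#include <cstdint>
#include <cstring>

// Returns a with its sign flipped when b's sign bit is set, i.e. the same value
// as a * copysign(1.0, b) for non-NaN a (assuming IEEE-754 doubles, no flush-to-zero).
inline double multiplysign_bits(double a, double b) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
  std::uint64_t ua, ub;
  std::memcpy(&ua, &a, sizeof a);  // memcpy avoids strict-aliasing problems
  std::memcpy(&ub, &b, sizeof b);
  const std::uint64_t sign_bit = std::uint64_t(1) << 63;
  ua ^= (ub & sign_bit);  // flip a's sign bit iff b's sign bit is set
  double r;
  std::memcpy(&r, &ua, sizeof r);
  return r;
}
```

On SIMD registers the same AND/XOR applies lane by lane, which is the kind of specialization a library could provide behind a standard name.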
- `if_then_else` in `stk_simd`, `choose` in prototype
- `cmath` functions (the ISO paper doesn't mention these I think...)
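For the missing `cmath` functions, a SIMD type can always fall back to applying the scalar function lane by lane, and a dedicated overload can later replace the loop with a vectorized implementation where one exists. A sketch, assuming GCC/Clang vector types; `map_lanes`, `simd_sqrt`, and `simd_exp` are hypothetical names.

```cpp
#include <cmath>

// Two double lanes in a 128-bit GCC/Clang vector type.
typedef double v2d __attribute__((vector_size(16)));

// Generic fallback: apply a scalar function to each lane.
template <class F>
inline v2d map_lanes(v2d x, F f) {
  v2d r = x;
  for (int i = 0; i < 2; ++i) r[i] = f(x[i]);
  return r;
}

inline v2d simd_sqrt(v2d x) {
  return map_lanes(x, [](double v) { return std::sqrt(v); });
}

inline v2d simd_exp(v2d x) {
  return map_lanes(x, [](double v) { return std::exp(v); });
}
```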