The current state of IPPL's FFT has a few problems.
CPU
On CPU, the default Kokkos::View layout is LayoutRight, so this loop effectively performs a transposition, which slows down the transform. Heffte's source code states:
//! \brief Constructs a box from the low and high indexes, the span in each direction includes the low and high (uses default order).
box3d(std::array<index, 3> clow, std::array<index, 3> chigh) :
    low(clow), high(chigh), size(...), order({0, 1, 2})
{}
This default order corresponds to LayoutLeft (column-major, first index fastest), though it could be changed to order {2, 1, 0} for CPU runs to match LayoutRight.
Interestingly, the FFTW docs on column-major format (which is LayoutLeft in Kokkos; FFTW itself expects row-major, i.e. LayoutRight) state that simply reversing the order of the dimensions is enough.
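To illustrate why a reversal of indices is enough, here is a minimal sketch in plain C++ (no Kokkos, heffte or FFTW involved). The helper names `offset_right`, `offset_left`, `reversed` and `reversal_matches` are made up for this example and are not part of any of those libraries. It checks that a row-major array with extents (n0, n1, n2) has the same memory layout as a column-major array with extents (n2, n1, n0), so no data has to be moved.

```cpp
#include <array>
#include <cstddef>

using Index3 = std::array<std::size_t, 3>;

// Linear offset in a row-major array (Kokkos LayoutRight): last index is fastest.
inline std::size_t offset_right(Index3 n, Index3 i) {
    return (i[0] * n[1] + i[1]) * n[2] + i[2];
}

// Linear offset in a column-major array (Kokkos LayoutLeft): first index is fastest.
inline std::size_t offset_left(Index3 n, Index3 i) {
    return i[0] + n[0] * (i[1] + n[1] * i[2]);
}

// Reverse the order of the three entries.
inline Index3 reversed(Index3 a) {
    return {a[2], a[1], a[0]};
}

// Returns true if, for every index triple, the row-major offset with extents n
// equals the column-major offset with reversed extents and reversed indices.
inline bool reversal_matches(Index3 n) {
    for (std::size_t i0 = 0; i0 < n[0]; ++i0)
        for (std::size_t i1 = 0; i1 < n[1]; ++i1)
            for (std::size_t i2 = 0; i2 < n[2]; ++i2) {
                Index3 i = {i0, i1, i2};
                if (offset_right(n, i) != offset_left(reversed(n), reversed(i)))
                    return false;
            }
    return true;
}
```

For example, `reversal_matches({4, 3, 5})` holds, which is the same reasoning that would let the box order be flipped for CPU runs instead of copying the data into a transposed buffer.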
Current performance
Comparing the current IPPL FFT to a plain FFTW call shows that IPPL takes 42 times as long for an R2C transform.
Yes, if heffte doesn't have some other restrictions.
It would also be nice if the boxes could be picked to exclude ghost cells, but I don't know if that would be possible.