FFTX
FFTX is a performance-portable, open-source FFT software system for CPUs and GPUs, analogous to FFTW for CPU systems. It supports application-specific optimizations by bringing more of the surrounding algorithm into the analysis and code generation process. FFTX is built on three elements: Spiral, an open-source analysis and code generation tool chain for FFTs and tensor algebra algorithms, developed at Carnegie Mellon University and SpiralGen, Inc.; an FFTX user API implemented in standard C++; and a factored design that makes FFTX / Spiral easier to port across multiple architectures. FFTX can represent larger integrated algorithms that include FFTs, i.e., ones where FFTs are composed with operations such as batching and multiplication by a (potentially matrix-valued) symbol. Combining the substeps of an integrated algorithm can significantly reduce data traffic. FFTX applies Spiral's size-specific analysis and automatic code generation techniques to the integrated algorithm as a whole, leading to implementations with far higher performance than is obtainable from approaches based on black-box FFT implementations. In addition, the integrated approach often reduces the memory footprint.
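To make the idea of an integrated algorithm concrete, here is a minimal sketch of the simplest such pattern: a forward FFT, a pointwise multiplication by a symbol, and an inverse FFT. It is written in plain Python for illustration only and does not use or represent the FFTX API. The function names (`fft`, `ifft`, `apply_symbol`, `circular_convolve`) and the sample data are assumptions made for this example. With a black-box FFT library, each of the three steps is a separate pass over the data; the pipeline in `apply_symbol` is the kind of composition that an integrated approach would analyze and generate code for as one unit.

```python
import cmath


def fft(x, sign=-1):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    sign=-1 gives the forward transform, sign=+1 the unnormalized inverse.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], sign)
    odd = fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(X):
    """Inverse FFT with the usual 1/n normalization."""
    n = len(X)
    return [v / n for v in fft(X, sign=+1)]


def apply_symbol(x, symbol):
    """Three-step pipeline: forward FFT, pointwise multiply, inverse FFT."""
    X = fft(x)                                # pass 1 over the data
    Y = [s * v for s, v in zip(symbol, X)]    # pass 2 over the data
    return ifft(Y)                            # pass 3 over the data


def circular_convolve(x, h):
    """Direct O(n^2) circular convolution, used only as a reference."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


# Hypothetical sample data: a signal and a small smoothing kernel.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

symbol = fft(h)                 # the symbol is the transform of the kernel
y = apply_symbol(x, symbol)
reference = circular_convolve(x, h)
max_err = max(abs(a - b) for a, b in zip(y, reference))
```

By the convolution theorem, `y` should agree with `reference` up to floating-point rounding, so `max_err` serves as a quick correctness check on the pipeline.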
In addition to providing generic FFT capabilities, FFTX is being used to develop integrated algorithms in support of the Exascale Computing Project in the following areas: single-particle imaging in light-source experiments (ExaFEL); solutions to Maxwell's equations (WarpX); and evaluation of plane-wave basis functions in density functional theory for materials science (NWChemEx). FFTX is being deployed on CPU systems, as well as on NVIDIA, AMD, and Intel GPU systems.