Optimizing the domain wall fermion Dirac operator using the R-Stream source-to-source compiler

INSPIRE: 1408296
arXiv: 1512.01542
DOI: 10.22323/1.251.0022

Authors: Lin, Meifeng; Papenhausen, Eric; Langston, M. Harper; Meister, Benoit; Baskaran, Muthu; Izubuchi, Taku; Jung, Chulwoo

Submitted: 4 December 2015

Journal reference: PoS LATTICE2015 022 (2016)

Abstract

The application of the Dirac operator to a spinor field, the Dslash operation, is the most computation-intensive part of lattice QCD simulations. It is often the key kernel to optimize to achieve maximum performance on various platforms. Here we report on a project to optimize the domain wall fermion Dirac operator in the Columbia Physics System (CPS) using the R-Stream source-to-source compiler. Our initial target platform is Intel PC clusters. We discuss the optimization strategies involved before and after the automatic code generation with R-Stream and present some preliminary benchmark results.