# srun -N1 -n4 benchmarks/dslash.py --shm 2048 --device-mem 16000 --grid 64.32.32.32 --mpi 4.1.1.1 --accelerator-threads 8 --Ls 12
OPENMPI detected
AcceleratorCudaInit: using default device 
AcceleratorCudaInit: assume user either uses a) IBM jsrun, or 
AcceleratorCudaInit: b) invokes through a wrapping script to set CUDA_VISIBLE_DEVICES, UCX_NET_DEVICES, and numa binding 
AcceleratorCudaInit: Configure options --enable-summit, --enable-select-gpu=no 
AcceleratorCudaInit: ================================================
SharedMemoryMpi:  World communicator of size 4
SharedMemoryMpi:  Node  communicator of size 4
0SharedMemoryMpi:  SharedMemoryMPI.cc acceleratorAllocDevice 2147483648bytes at 0x14b880000000 for comms buffers 

__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|_ |  |  |  |  |  |  |  |  |  |  |  | _|__
__|_                                    _|__
__|_   GGGG    RRRR    III    DDDD      _|__
__|_  G        R   R    I     D   D     _|__
__|_  G        R   R    I     D    D    _|__
__|_  G  GG    RRRR     I     D    D    _|__
__|_  G   G    R  R     I     D   D     _|__
__|_   GGGG    R   R   III    DDDD      _|__
__|_                                    _|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
  |  |  |  |  |  |  |  |  |  |  |  |  |  |  


Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
Current Grid git commit hash=9c9566b9c9686a63755f39bb8910cf5325ef7177: (HEAD -> feature/gpt, origin/feature/gpt, origin/HEAD) clean

Grid : Message : ================================================ 
Grid : Message : MPI is initialised and logging filters activated 
Grid : Message : ================================================ 
Grid : Message : Requested 2147483648 byte stencil comms buffers 
Grid : Message : MemoryManager Cache 16777216000 bytes 
Grid : Message : MemoryManager::Init() setting up
Grid : Message : MemoryManager::Init() cache pool for recent allocations: SMALL 32 LARGE 8
Grid : Message : MemoryManager::Init() Non unified: Caching accelerator data in dedicated memory
Grid : Message : MemoryManager::Init() Using cudaMalloc

=============================================
              Initialized GPT                
     Copyright (C) 2020 Christoph Lehner     
=============================================
GPT :       1.543473 s : 
                       : DWF Dslash Benchmark with
                       :     fdimensions  : [64, 32, 32, 32]
                       :     precision    : single
                       :     Ls           : 12
                       : 
GPT :       7.958636 s : 1000 applications of Dhop
                       :     Time to complete            : 2.93 s
                       :     Total performance           : 11325.46 GFlops/s
                       :     Effective memory bandwidth  : 7824.86 GB/s
GPT :       7.959499 s : 
                       : DWF Dslash Benchmark with
                       :     fdimensions  : [64, 32, 32, 32]
                       :     precision    : double
                       :     Ls           : 12
                       : 
GPT :      17.420620 s : 1000 applications of Dhop
                       :     Time to complete            : 5.78 s
                       :     Total performance           : 5749.77 GFlops/s
                       :     Effective memory bandwidth  : 7945.14 GB/s
=============================================
               Finalized GPT                 
=============================================
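
The figures reported above can be reduced to per-rank and per-application numbers with a short script. The following is a minimal sketch, not part of GPT or Grid: the function names `parse_dslash_log` and `summarize` are hypothetical, the `LOG` string is an excerpt copied from the output above, and the per-rank value assumes that "Total performance" is summed over all 4 MPI ranks (the log reports a world communicator of size 4 but does not state how the total is aggregated).

```python
import re

# Excerpt of the benchmark lines from the log above.
LOG = """\
                       :     precision    : single
GPT :       7.958636 s : 1000 applications of Dhop
                       :     Time to complete            : 2.93 s
                       :     Total performance           : 11325.46 GFlops/s
                       :     Effective memory bandwidth  : 7824.86 GB/s
                       :     precision    : double
GPT :      17.420620 s : 1000 applications of Dhop
                       :     Time to complete            : 5.78 s
                       :     Total performance           : 5749.77 GFlops/s
                       :     Effective memory bandwidth  : 7945.14 GB/s
"""

# One pattern per reported field; each captures a single number.
FIELDS = {
    "applications": (r"(\d+) applications of Dhop", int),
    "seconds": (r"Time to complete\s*:\s*([\d.]+) s", float),
    "gflops": (r"Total performance\s*:\s*([\d.]+) GFlops/s", float),
    "gbytes_per_s": (r"Effective memory bandwidth\s*:\s*([\d.]+) GB/s", float),
}


def parse_dslash_log(text):
    """Return {precision: {field: value}} for each benchmark block in text."""
    results = {}
    precision = None
    for line in text.splitlines():
        match = re.search(r"precision\s*:\s*(\w+)", line)
        if match:
            # A precision line opens a new benchmark block.
            precision = match.group(1)
            results[precision] = {}
            continue
        if precision is None:
            continue
        for name, (pattern, convert) in FIELDS.items():
            match = re.search(pattern, line)
            if match:
                results[precision][name] = convert(match.group(1))
    return results


def summarize(results, ranks):
    """Derive per-rank GFlops/s and milliseconds per Dhop application.

    Assumption: the reported total performance is the sum over all ranks.
    """
    summary = {}
    for precision, r in results.items():
        summary[precision] = {
            "gflops_per_rank": r["gflops"] / ranks,
            "ms_per_application": 1000.0 * r["seconds"] / r["applications"],
        }
    return summary


results = parse_dslash_log(LOG)
summary = summarize(results, ranks=4)  # 4 ranks, from "-n4" in the srun line
single_to_double = results["single"]["gflops"] / results["double"]["gflops"]

if __name__ == "__main__":
    for precision, s in summary.items():
        print(precision, s)
    print("single/double performance ratio:", round(single_to_double, 3))
```

The single-to-double ratio comes out close to 2 while the effective memory bandwidth is nearly the same in both runs, which is what one would expect if the kernel is limited by memory bandwidth rather than by arithmetic throughput.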