# srun -N1 -n4 benchmarks/dslash.py --shm 2048 --device-mem 16000 --grid 64.32.32.32 --mpi 4.1.1.1 --accelerator-threads 8 --Ls 12 OPENMPI detected AcceleratorCudaInit: using default device AcceleratorCudaInit: assume user either uses a) IBM jsrun, or AcceleratorCudaInit: b) invokes through a wrapping script to set CUDA_VISIBLE_DEVICES, UCX_NET_DEVICES, and numa binding AcceleratorCudaInit: Configure options --enable-summit, --enable-select-gpu=no AcceleratorCudaInit: ================================================ SharedMemoryMpi: World communicator of size 4 SharedMemoryMpi: Node communicator of size 4 0SharedMemoryMpi: SharedMemoryMPI.cc acceleratorAllocDevice 2147483648bytes at 0x14b880000000 for comms buffers __|__|__|__|__|__|__|__|__|__|__|__|__|__|__ __|__|__|__|__|__|__|__|__|__|__|__|__|__|__ __|_ | | | | | | | | | | | | _|__ __|_ _|__ __|_ GGGG RRRR III DDDD _|__ __|_ G R R I D D _|__ __|_ G R R I D D _|__ __|_ G GG RRRR I D D _|__ __|_ G G R R I D D _|__ __|_ GGGG R R III DDDD _|__ __|_ _|__ __|__|__|__|__|__|__|__|__|__|__|__|__|__|__ __|__|__|__|__|__|__|__|__|__|__|__|__|__|__ | | | | | | | | | | | | | | Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. Current Grid git commit hash=9c9566b9c9686a63755f39bb8910cf5325ef7177: (HEAD -> feature/gpt, origin/feature/gpt, origin/HEAD) clean Grid : Message : ================================================ Grid : Message : MPI is initialised and logging filters activated Grid : Message : ================================================ Grid : Message : Requested 2147483648 byte stencil comms buffers Grid : Message : MemoryManager Cache 16777216000 bytes Grid : Message : MemoryManager::Init() setting up Grid : Message : MemoryManager::Init() cache pool for recent allocations: SMALL 32 LARGE 8 Grid : Message : MemoryManager::Init() Non unified: Caching accelerator data in dedicated memory Grid : Message : MemoryManager::Init() Using cudaMalloc ============================================= Initialized GPT Copyright (C) 2020 Christoph Lehner ============================================= GPT : 1.543473 s : : DWF Dslash Benchmark with : fdimensions : [64, 32, 32, 32] : precision : single : Ls : 12 : GPT : 7.958636 s : 1000 applications of Dhop : Time to complete : 2.93 s : Total performance : 11325.46 GFlops/s : Effective memory bandwidth : 7824.86 GB/s GPT : 7.959499 s : : DWF Dslash Benchmark with : fdimensions : [64, 32, 32, 32] : precision : double : Ls : 12 : GPT : 17.420620 s : 1000 applications of Dhop : Time to complete : 5.78 s : Total performance : 5749.77 GFlops/s : Effective memory bandwidth : 7945.14 GB/s ============================================= Finalized GPT =============================================