Warning: OMP_NUM_THREADS=6 is greater than available PU's
AcceleratorCudaInit: ========================
AcceleratorCudaInit: Device Number    : 0
AcceleratorCudaInit: ========================
AcceleratorCudaInit: Device identifier: Tesla V100-SXM2-16GB
AcceleratorCudaInit:   totalGlobalMem: 16911433728 
AcceleratorCudaInit:   managedMemory: 1 
AcceleratorCudaInit:   isMultiGpuBoard: 0 
AcceleratorCudaInit:   warpSize: 32 
AcceleratorCudaInit: IBM Summit or similar - NOT setting device to node rank
AcceleratorCudaInit: ================================================
SharedMemoryMpi:  World communicator of size 6
SharedMemoryMpi:  Node  communicator of size 6
SharedMemoryMpi:  SharedMemoryMPI.cc cudaMalloc 536870912bytes at 0x2000e0000000 for comms buffers 

__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|_ |  |  |  |  |  |  |  |  |  |  |  | _|__
__|_                                    _|__
__|_   GGGG    RRRR    III    DDDD      _|__
__|_  G        R   R    I     D   D     _|__
__|_  G        R   R    I     D    D    _|__
__|_  G  GG    RRRR     I     D    D    _|__
__|_  G   G    R  R     I     D   D     _|__
__|_   GGGG    R   R   III    DDDD      _|__
__|_                                    _|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
__|__|__|__|__|__|__|__|__|__|__|__|__|__|__
  |  |  |  |  |  |  |  |  |  |  |  |  |  |  


Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
Current Grid git commit hash=63b0a19f370f643aa5b97f37bd1a18ea33a209f8: (HEAD, origin/feature/gpt, feature/gpt) clean

Grid : Message : ================================================ 
Grid : Message : MPI is initialised and logging filters activated 
Grid : Message : ================================================ 
Grid : Message : Requested 536870912 byte stencil comms buffers 
Grid : Message : MemoryManager Cache 4194304000 bytes 
Grid : Message : MemoryManager::Init() setting up
Grid : Message : MemoryManager::Init() cache pool for recent allocations: SMALL 32 LARGE 16
Grid : Message : MemoryManager::Init() Non unified: Caching accelerator data in dedicated memory
Grid : Message : MemoryManager::Init() Using cudaMalloc
Grid : Message : 4.946401 s : Grid Default Decomposition patterns
Grid : Message : 4.946409 s : 	OpenMP threads : 6
Grid : Message : 4.946420 s : 	MPI tasks      : 6 1 1 1 
Grid : Message : 4.946438 s : 	vRealF         : 512bits ; 2 2 2 2 
Grid : Message : 4.946455 s : 	vRealD         : 512bits ; 1 2 2 2 
Grid : Message : 4.946468 s : 	vComplexF      : 512bits ; 1 2 2 2 
Grid : Message : 4.946480 s : 	vComplexD      : 512bits ; 1 1 2 2 
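The SIMD lines above give, for each vector type, how the lanes of a 512-bit register are spread over the four lattice dimensions. The sketch below is a reconstruction, not Grid's own source. It assumes that the lane count is 512 bits divided by the element size, and that the lanes are split one factor of 2 per dimension, filling from the last dimension toward the first. The helper name `simd_layout` is hypothetical. Under those assumptions it reproduces the four layouts printed in the log.

```python
# Hypothetical reconstruction of the "vRealF : 512bits ; 2 2 2 2" style
# layouts printed by Grid. This is NOT Grid's source code.

SIMD_BITS = 512  # vector width reported in the log
ND = 4           # four space-time dimensions

# Bits per scalar element for each vector type listed in the log.
ELEMENT_BITS = {
    "vRealF": 32,      # single-precision real
    "vRealD": 64,      # double-precision real
    "vComplexF": 64,   # two 32-bit floats
    "vComplexD": 128,  # two 64-bit doubles
}


def simd_layout(nsimd: int, nd: int = ND) -> list[int]:
    """Spread nsimd lanes over nd dimensions, at most a factor 2 per
    dimension, filling from the last dimension toward the first."""
    if nsimd <= 0 or nsimd & (nsimd - 1):
        raise ValueError(f"lane count {nsimd} is not a power of two")
    layout = [1] * nd
    remaining = nsimd
    for d in reversed(range(nd)):
        if remaining >= 2:
            layout[d] = 2
            remaining //= 2
    if remaining != 1:
        raise ValueError(f"{nsimd} lanes do not fit with a factor 2 per dimension")
    return layout


if __name__ == "__main__":
    for name, bits in ELEMENT_BITS.items():
        lanes = SIMD_BITS // bits
        print(f"{name:10s} {lanes:2d} lanes -> {simd_layout(lanes)}")
```

The product of each layout equals the lane count (16, 8, 8 and 4), which is why the double-precision complex type only splits the last two dimensions.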

=============================================
              Initialized GPT                
    Copyright (C) 2020 Christoph Lehner      
=============================================
GPT :       5.096184 s : 
                       : DWF Dslash Benchmark with
                       :     fdimensions  : [48, 24, 24, 24]
                       :     precision    : single
                       :     Ls           : 12
                       : 
GPT :      26.147219 s : 1000 applications of Dhop
                       :     Time to complete            : 1.88 s
                       :     Total performance           : 5603.03 GFlops/s
                       :     Effective memory bandwidth  : 3871.19 GB/s
GPT :      26.155562 s : 
                       : DWF Dslash Benchmark with
                       :     fdimensions  : [48, 24, 24, 24]
                       :     precision    : double
                       :     Ls           : 12
                       : 
GPT :      48.731163 s : 1000 applications of Dhop
                       :     Time to complete            : 5.19 s
                       :     Total performance           : 2025.24 GFlops/s
                       :     Effective memory bandwidth  : 2798.51 GB/s
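The GFlops/s and GB/s figures above can be cross-checked with a simple cost model. The sketch below assumes the conventional Wilson hopping-term count of 1320 flops per 5D site. It also assumes a memory-traffic model in which each 5D site reads 8 neighbour spinors and writes 1 spinor (24 reals each), plus 8 gauge links (18 reals each) that are shared across the Ls slices. Both are assumptions inferred from the logged numbers, not taken from the GPT source, and the helper names are hypothetical. Because the logged times are rounded, the check compares bytes per flop, which does not depend on the time, and recovers the time from the logged GFlops/s.

```python
# Hypothetical cost model for the DWF Dhop benchmark in the log above.
# Assumptions (not from GPT/Grid source): 1320 flops per 5D site; per 5D site
# 9 spinors of 24 reals moved, plus 8 links of 18 reals amortised over Ls.
from math import prod

FLOPS_PER_SITE = 1320
SPINOR_REALS = 24  # 4 spins x 3 colours x (re, im)
LINK_REALS = 18    # 3x3 complex SU(3) matrix


def reals_per_site(ls: int) -> float:
    """Reals moved per 5D site for one application of Dhop."""
    return 9 * SPINOR_REALS + 8 * LINK_REALS / ls


def dhop_model(fdims: list[int], ls: int, bytes_per_real: int) -> tuple[int, float]:
    """Return (flops, bytes) for a single Dhop application."""
    sites = prod(fdims) * ls
    flops = FLOPS_PER_SITE * sites
    nbytes = reals_per_site(ls) * bytes_per_real * sites
    return flops, nbytes


if __name__ == "__main__":
    # (precision, bytes per real, logged GFlops/s, logged GB/s)
    runs = [("single", 4, 5603.03, 3871.19), ("double", 8, 2025.24, 2798.51)]
    for label, bpr, gflops, gbs in runs:
        flops, nbytes = dhop_model([48, 24, 24, 24], 12, bpr)
        implied_time = 1000 * flops / (gflops * 1e9)  # 1000 applications
        print(f"{label}: implied time {implied_time:.2f} s, "
              f"model {nbytes / flops:.4f} vs logged {gbs / gflops:.4f} bytes/flop")
```

With these assumptions the implied times round to the logged 1.88 s and 5.19 s, and the modelled 228 reals per site (912 bytes in single, 1824 bytes in double) match the ratio of the logged bandwidth to the logged flop rate.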
=============================================
               Finalized GPT                 
=============================================