OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
When copying a 2D or 3D rectangular memory block, performance is better
when using CUDA's cuMemcpy2D/cuMemcpy3D than when copying the data one by
one. That's what this commit does.

Additionally, it permits device-to-device copies, if necessary using a
temporary variable on the host.

include/ChangeLog:

	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typedefs.
	(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
	cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func and
	memcpy3d_func.
	* libgomp.texi (nvptx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker,
	omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copies; invoke new plugins for
	2D and 3D copying when available.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
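For readers unfamiliar with rectangular copies, the following is a minimal host-only sketch of what copying a 2D sub-block means. It is not the libgomp implementation: rect_copy_2d and all of its parameter names are hypothetical, and the sketch assumes row-major arrays with offsets and row lengths given in elements. It issues one memcpy per row, which illustrates the kind of piecewise transfer that a single cuMemcpy2D/cuMemcpy3D call can replace on nvptx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libgomp: copy a ROWS x COLS sub-block
   between two row-major 2D arrays.  Offsets and row lengths are counted
   in elements; ELEMENT_SIZE is in bytes.  */
static void
rect_copy_2d (void *dst, const void *src, size_t element_size,
	      size_t rows, size_t cols,
	      size_t dst_row_off, size_t dst_col_off,
	      size_t src_row_off, size_t src_col_off,
	      size_t dst_row_len, size_t src_row_len)
{
  char *d = (char *) dst;
  const char *s = (const char *) src;

  for (size_t r = 0; r < rows; r++)
    {
      /* Linear element index of the first element of this row's chunk.  */
      size_t d_idx = (dst_row_off + r) * dst_row_len + dst_col_off;
      size_t s_idx = (src_row_off + r) * src_row_len + src_col_off;

      /* One transfer per row; only COLS elements are contiguous.  */
      memcpy (d + d_idx * element_size, s + s_idx * element_size,
	      cols * element_size);
    }
}
```

With a 4x5 source and a 3x3 destination, calling rect_copy_2d (dst, src, sizeof (int), 2, 3, 1, 0, 1, 2, 3, 5) copies the 2x3 block starting at src[1][2] into the block starting at dst[1][0]. The number of separate transfers grows with the number of rows, which is the overhead a single 2D or 3D copy call is meant to avoid.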
Showing 11 changed files:
- include/cuda/cuda.h: 85 additions, 0 deletions
- libgomp/libgomp-plugin.h: 7 additions, 0 deletions
- libgomp/libgomp.h: 2 additions, 0 deletions
- libgomp/libgomp.texi: 5 additions, 0 deletions
- libgomp/oacc-host.c: 2 additions, 0 deletions
- libgomp/plugin/cuda-lib.def: 3 additions, 0 deletions
- libgomp/plugin/plugin-nvptx.c: 116 additions, 0 deletions
- libgomp/target.c: 127 additions, 25 deletions
- libgomp/testsuite/libgomp.c/target-12.c: 3 additions, 3 deletions
- libgomp/testsuite/libgomp.fortran/target-12.f90: 3 additions, 3 deletions
- libgomp/testsuite/libgomp.fortran/target-memcpy-rect-1.f90: 531 additions, 0 deletions