From 2c372e81a996e105571e71108f6427c38ec2a71a Mon Sep 17 00:00:00 2001
From: Tom de Vries <tdevries@suse.de>
Date: Wed, 9 Jan 2019 00:07:45 +0000
Subject: [PATCH] [nvptx, libgomp] Don't launch with num_workers == 0

When using a compiler built with:
...
+#define PTX_DEFAULT_VECTOR_LENGTH PTX_CTA_SIZE
+#define PTX_MAX_VECTOR_LENGTH PTX_CTA_SIZE
...
and running the libgomp testsuite, we run into an execution failure in
parallel-loop-1.c, due to a CUDA launch failure:
...
  nvptx_exec: kernel f6_none_none$_omp_fn$0: launch gangs=480, workers=0, \
    vectors=1024

libgomp: cuLaunchKernel error: invalid argument
...
because workers == 0.

The workers variable is set to 0 here in nvptx_exec:
...
                workers = blocks / actual_vectors;
...
because actual_vectors is 1024 and blocks is 768, so the integer division
truncates to 0.  The value of blocks is reported as:
...
cuOccupancyMaxPotentialBlockSize: grid = 10, block = 768
...

Fix this by ensuring that workers is at least one.
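
As a rough illustration of the arithmetic above, the following standalone
sketch reproduces the truncation and the clamp.  The helper names
workers_unclamped and workers_clamped are hypothetical and do not exist in
plugin-nvptx.c, and MAX is defined locally only to keep the sketch
self-contained:

```c
#include <assert.h>

/* Local stand-in for the MAX macro used by the fix (assumption: defined
   here only so that this sketch compiles on its own).  */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: the computation as it was before the fix.
   Integer division truncates, so blocks < actual_vectors yields 0.  */
static int
workers_unclamped (int blocks, int actual_vectors)
{
  assert (actual_vectors > 0);
  return blocks / actual_vectors;
}

/* Hypothetical helper: the same computation with the clamp added by
   this patch, so the result is never less than one.  */
static int
workers_clamped (int blocks, int actual_vectors)
{
  int workers = workers_unclamped (blocks, actual_vectors);
  return MAX (workers, 1);
}
```

With blocks = 768 and actual_vectors = 1024, the unclamped helper returns 0
and the clamped one returns 1.  Cases where blocks >= actual_vectors are
unaffected by the clamp.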

2019-01-09  Tom de Vries  <tdevries@suse.de>

	* plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least
	one worker.

From-SVN: r267746
---
 libgomp/ChangeLog             | 5 +++++
 libgomp/plugin/plugin-nvptx.c | 1 +
 2 files changed, 6 insertions(+)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 120f0874b27d..fba0ba0562ac 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2019-01-09  Tom de Vries  <tdevries@suse.de>
+
+	* plugin/plugin-nvptx.c (nvptx_exec): Make sure to launch with at least
+	one worker.
+
 2019-01-07  Tom de Vries  <tdevries@suse.de>
 
 	* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Fix
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 572d9ef8d5c3..60553bdf3bd5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1272,6 +1272,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 				      ? vectors
 				      : dims[GOMP_DIM_VECTOR]);
 		workers = blocks / actual_vectors;
+		workers = MAX (workers, 1);
 	      }
 
 	    for (i = 0; i != GOMP_DIM_MAX; i++)
-- 
GitLab