Commit b4e68dd9 authored by Thomas Schwinge

nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred'

For example, this allows '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active.  I suppose,
but have not verified, that the same issue and fix would apply if we were to
allow OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA versions less than 6.0.
Otherwise we use 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
and which appears to do the right thing.  (I've tested '-muniform-simt' code
executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.

Instead, this change uses PTX 'vote.all.pred', which even simplifies the
original code a little; see the following exemplary SASS 'diff', before vs.
after this change:

    [...]
              /*[...]*/                   SYNC                                                        (*"BRANCH_TARGETS .L_x_332"*)        }
      .L_x_332:
    -         /*[...]*/                   VOTE.ANY R9, PT, PT ;
    +         /*[...]*/                   VOTE.ALL P1, PT ;
    -         /*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -         /*[...]*/              @!P1 BRA `(.L_x_333) ;
    +         /*[...]*/               @P1 BRA `(.L_x_333) ;
              /*[...]*/                   BPT.TRAP 0x1 ;
      .L_x_333:
    -         /*[...]*/               @P1 EXIT ;
    +         /*[...]*/              @!P1 EXIT ;
    [...]

	gcc/
	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
	non-full-warp execution, via 'vote.all.pred'.
	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least_6_0):
	New.
	* gcc.target/nvptx/uniform-simt-2.c: Adjust.
	* gcc.target/nvptx/uniform-simt-5.c: New.
parent 395ac041