Jan Hubicka
authored
We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to be more complicated. gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot products in TSVC. Curiously enough breaking these out into microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. gather is loss if indices are sequential, neutral if they are random and win for some strides (4, 8). This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be conistent and disable gather unless we work out a good heuristics on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done. I opened PR116582 with some examples of wins and loses gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.
Name | Last commit | Last update |
---|