    middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support · c2565a31
    Jakub Jelinek authored
    Here is a complete patch to add std::bfloat16_t support on
    x86 (AArch64 and ARM are left for later).  The patch adds almost no
    BFmode optabs, so for binops/unops it extends the operands to SFmode
    first and then truncates the result back to BFmode.
    For {HF,SF,DF,XF,TF}mode -> BFmode conversions, libgcc has
    implementations of all of them so that we avoid double rounding.
    For BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too
    much, it emits a BFmode -> SFmode conversion first and then converts
    to the even wider mode; both steps are exact.
    For BFmode -> HFmode, it first emits a precise BFmode -> SFmode
    conversion and then an SFmode -> HFmode one, because neither format is
    a subset or superset of the other, while SFmode is a superset of both.
    expr.cc then contains a -ffast-math optimization of the BF -> SF and
    SF -> BF conversions if we don't optimize for space (and for the latter
    if -frounding-math isn't enabled either).
    For x86, a truncsfbf2 optab could perhaps be defined for TARGET_AVX512BF16,
    but IMNSHO it should FAIL if !flag_finite_math || flag_rounding_math
    || !flag_unsafe_math_optimizations, because I think the insn doesn't
    raise on sNaNs, hardcodes round-to-nearest and flushes denormals to zero.
    By default (unless -fexcess-precision=16 is used on x86) we use float
    excess precision for BFmode, so we truncate only on explicit casts and
    assignments.
    The patch introduces a single __bf16 builtin, __builtin_nansf16b,
    because (__bf16) __builtin_nansf ("") would turn the sNaN into a qNaN.
    It uses the f16b suffix instead of bf16 because the latter would be
    ambiguous for log vs. logb: __builtin_logbf16 could be either log with
    a bf16 suffix or logb with an f16 suffix.  In other cases libstdc++
    should mostly use __builtin_*f for the std::bfloat16_t overloads
    (std::nextafter is a problem, though, but we have the same problem
    for std::float16_t).
    
    2022-10-14  Jakub Jelinek  <jakub@redhat.com>
    
    gcc/
    	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
    	* tree.h (bfloat16_type_node): Define.
    	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
    	like float16_type_mode.
    	(build_common_tree_nodes): Initialize bfloat16_type_node if
    	BFmode is supported.
    	* expmed.h (maybe_expand_shift): Declare.
    	* expmed.cc (maybe_expand_shift): No longer static.
    	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
    	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
    	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
    	-ffast-math generic implementation for BF -> SF and SF -> BF
    	conversions.
    	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
    	* builtins.def (BUILT_IN_NANSF16B): New builtin.
    	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
    	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
    	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
    	for -msse2.
    	(ix86_mangle_type): Mangle BFmode as DF16b.
    	(ix86_invalid_conversion, ix86_invalid_unary_op,
    	ix86_invalid_binary_op): Remove.
    	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
    	TARGET_INVALID_BINARY_OP): Don't redefine.
    	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
    	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
    	ix86_bf16_type_node, only create it if still NULL.
    	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
    	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
    gcc/c-family/
    	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
    	predefine __BFLT16_*__ macros and for C++23 also
    	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
    	macros for -fbuilding-libgcc.
    	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
    gcc/c/
    	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
    	double.
    gcc/cp/
    	* cp-tree.h (extended_float_type_p): Return true for
    	bfloat16_type_node.
    	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
    	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
    gcc/testsuite/
    	* lib/target-supports.exp (check_effective_target_bfloat16,
    	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
    	New.
    	* gcc.dg/torture/bfloat16-basic.c: New test.
    	* gcc.dg/torture/bfloat16-builtin.c: New test.
    	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
    	* gcc.dg/torture/bfloat16-complex.c: New test.
    	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
    	from bfloat16-builtin-issignaling-1.c.
    	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
    	bfloat16-basic.c.
    	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
    	diagnostics.
    	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
    	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
    	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
    libcpp/
    	* include/cpplib.h (CPP_N_BFLOAT16): Define.
    	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
    	C++.
    libgcc/
    	* config/i386/t-softfp (softfp_extensions): Add bfsf.
    	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
    	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
    	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
    	-msse2.
    	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
    	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
    	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
    	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
    	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
    	* soft-fp/brain.h: New file.
    	* soft-fp/truncsfbf2.c: New file.
    	* soft-fp/truncdfbf2.c: New file.
    	* soft-fp/truncxfbf2.c: New file.
    	* soft-fp/trunctfbf2.c: New file.
    	* soft-fp/trunchfbf2.c: New file.
    	* soft-fp/truncbfhf2.c: New file.
    	* soft-fp/extendbfsf2.c: New file.
    libiberty/
    	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
    	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
    	entry.
    	(cplus_demangle_type): Demangle DF16b.
    	* testsuite/demangle-expected (_Z3xxxDF16b): New test.