Skip to content
Snippets Groups Projects
  • Jakub Jelinek's avatar
    1844a4aa
    libcpp, c, middle-end: Optimize initializers using #embed in C · 1844a4aa
    Jakub Jelinek authored
    This patch actually optimizes #embed, so far in C.
    
    For a simple testcase (for 494447200 bytes long cc1plus):
    cat embed-11.c
    unsigned char a[] = {
      #embed "cc1plus"
    };
    time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
    
    real    0m13.647s
    user    0m7.157s
    sys     0m2.597s
    time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
    
    real    0m28.649s
    user    0m26.653s
    sys     0m1.958s
    
    and when configured against binutils with .base64 support
    time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
    
    real    0m4.283s
    user    0m2.288s
    sys     0m0.859s
    time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
    
    real    0m6.888s
    user    0m5.876s
    sys     0m1.002s
    
    (all times with --enable-checking=yes,rtl,extra compiler).
    
    Even just
    ./cc1plus -E -o embed-11.i embed-11.c
    (which doesn't have this optimization yet and so preprocesses it as
    1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
    preprocessed fine).
    And compiling that embed-11.i with -std=c23 -O0 by unpatched gcc
    I gave up after 400 seconds when it already ate 45GB of RAM and didn't
    produce a single byte into embed-11.s yet.
    
    The patch introduces a new CPP_EMBED token which contains raw memory image
    virtually representing a sequence of int literals.
    To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
    is only emitted if there are 4+ (it actually does that for 64+ right now)
    literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
    CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
    longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
    are limited to INT_MAX elements).  The main reason is that the preprocessor
    doesn't really know in which context #embed directive appears, there could
    be e.g.
    { 25 *
      #embed "whatever"
    * 2 - 15 }
    or similar and dealing with this special case deep in the expression parsing
    is undesirable.
    With the CPP_NUMBERs around it, I believe in the C FE the only places which
    need handling of the CPP_EMBED token are initializer parsing (that is the
    only one which adds actual optimizations for it), comma expressions (I
    believe nothing really cares whether it is 25,13,95 or
    25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
    the parsing just adds one INTEGER_CST to the comma expression, I doubt users
    want to be spammed with millions of -Wunused warnings per #embed),
    whatever uses c_parser_expr_list (function calls, attribute arguments,
    OpenMP sizes clause argument, OpenACC tile clause argument and whatever uses
    c_parser_get_builtin_args (mainly for __builtin_shufflevector).  Please correct
    me if I'm wrong.
    
    The patch introduces a RAW_DATA_CST tree code, which can then be used inside
    of array CONSTRUCTOR elt values.  In some sense RAW_DATA_CST is similar to
    STRING_CST, but right now STRING_CST is used only if the whole array
    initializer is that constant, while RAW_DATA_CST at index idx (should be
    always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
    [30 ... 250] =
      #embed "whatever"
    really does what it would do with a integer sequence there) stands for
    [idx] = RAW_DATA_POINTER (val)[0],
    [idx+1] = RAW_DATA_POINTER (val)[1],
    ...
    [idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
    Another important thing is that unlike STRING_CST which has the data
    embedded in it RAW_DATA_CST doesn't own the data, it has RAW_DATA_OWNER
    which owns the data (that can be a STRING_CST, e.g. used for PCH or LTO
    after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
    standing for data owned by libcpp buffers).  The advantage is that it can be
    cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
    designated initializer to store something into the middle of a 10GB #embed
    array, in no case we need to actually copy data around for that.
    Right now RAW_DATA_CST is only used in initializers of integral arrays where
    the integer type has (host) CHAR_BIT precision, so usually char/signed
    char/unsigned char (for C++ later maybe std::byte); in theory we could say
    allocate 4 times as big buffer for conversions to int array and depending
    on endianity and storage order reversal etc., but I'm not sure if that is
    something that will be actually needed in the wild.
    And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
    CPP_EMBED CPP_NUMBER division in case one uses #embed the usual way and
    doesn't use the boundary literals in weird ways and the values there match
    the surrounding bytes in the owner buffer.
    
    For LTO, in order to avoid copying perhaps gigabytes long data around,
    the hacks in the streamer out/in cause the data owned by libcpp to be
    streamed right into the stream and streamed back as a STRING_CST which
    owns the data.
    
    2024-10-16  Jakub Jelinek  <jakub@redhat.com>
    
    libcpp/
    	* include/cpplib.h (TTYPE_TABLE): Add CPP_EMBED token type.
    	* files.cc (finish_embed): For limit >= 64 and C preprocessing
    	instead of emitting CPP_NUMBER CPP_COMMA separated sequence for the
    	whole embed emit it just for the first and last byte and in between
    	emit a CPP_EMBED token or tokens if too large.
    gcc/
    	* treestruct.def (TS_RAW_DATA_CST): New.
    	* tree.def (RAW_DATA_CST): New tree code.
    	* tree-core.h (struct tree_raw_data): New type.
    	(union tree_node): Add raw_data_cst member.
    	* tree.h (RAW_DATA_LENGTH, RAW_DATA_POINTER, RAW_DATA_OWNER): Define.
    	(gt_ggc_mx, gt_pch_nx): Declare overloads for tree_raw_data *.
    	* tree.cc (tree_node_structure_for_code): Handle RAW_DATA_CST.
    	(initialize_tree_contains_struct): Handle TS_RAW_DATA_CST.
    	(tree_code_size): Handle RAW_DATA_CST.
    	(initializer_zerop): Likewise.
    	(gt_ggc_mx, gt_pch_nx): Define overloads for tree_raw_data *.
    	* gimplify.cc (gimplify_init_ctor_eval): Handle RAW_DATA_CST.
    	* fold-const.cc (operand_compare::operand_equal_p): Handle
    	RAW_DATA_CST.  Formatting fix.
    	(operand_compare::hash_operand): Handle RAW_DATA_CST.
    	(native_encode_initializer): Likewise.
    	(get_array_ctor_element_at_index): Likewise.
    	(fold): Likewise.
    	* gimple-fold.cc (fold_array_ctor_reference): Likewise.  Formatting
    	fix.
    	* varasm.cc (const_hash_1): Handle RAW_DATA_CST.
    	(initializer_constant_valid_p_1): Likewise.
    	(array_size_for_constructor): Likewise.
    	(output_constructor_regular_field): Likewise.
    	* expr.cc (categorize_ctor_elements_1): Likewise.
    	(expand_expr_real_1) <case ARRAY_REF>: Punt for RAW_DATA_CST.
    	* tree-streamer.cc (streamer_check_handled_ts_structures): Mark
    	TS_RAW_DATA_CST as handled.
    	* tree-streamer-in.cc (streamer_alloc_tree): Handle RAW_DATA_CST.
    	(lto_input_ts_raw_data_cst_tree_pointers): New function.
    	(streamer_read_tree_body): Call it for RAW_DATA_CST.
    	* tree-streamer-out.cc (write_ts_raw_data_cst_tree_pointers): New
    	function.
    	(streamer_write_tree_body): Call it for RAW_DATA_CST.
    	(streamer_write_tree_header): Handle RAW_DATA_CST.
    	* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle RAW_DATA_CST.
    	* tree-pretty-print.cc (dump_generic_node): Likewise.
    gcc/c-family/
    	* c-ppoutput.cc (token_streamer::stream): Add special code to spell
    	CPP_EMBED token.
    	* c-lex.cc (c_lex_with_flags): Handle CPP_EMBED.  Formatting fix.
    	* c-common.cc (c_parse_error): Handle CPP_EMBED.
    	(braced_list_to_string): Optimize RAW_DATA_CST surrounded by
    	INTEGER_CSTs which match some bytes before or after RAW_DATA_CST in
    	its owner.
    gcc/c/
    	* c-parser.cc (c_parser_braced_init): Handle CPP_EMBED.
    	(c_parser_get_builtin_args): Likewise.
    	(c_parser_expression): Likewise.
    	(c_parser_expr_list): Likewise.
    	* c-typeck.cc (digest_init): Handle RAW_DATA_CST.  Formatting fix.
    	(init_node_successor): New function.
    	(add_pending_init): Handle RAW_DATA_CST.
    	(set_nonincremental_init): Formatting fix.
    	(output_init_element): Handle RAW_DATA_CST.  Formatting fixes.
    	(maybe_split_raw_data): New function.
    	(process_init_element): Use maybe_split_raw_data.  Handle
    	RAW_DATA_CST.
    gcc/testsuite/
    	* c-c++-common/cpp/embed-20.c: New test.
    	* c-c++-common/cpp/embed-21.c: New test.
    	* c-c++-common/cpp/embed-28.c: New test.
    	* gcc.dg/cpp/embed-8.c: New test.
    	* gcc.dg/cpp/embed-9.c: New test.
    	* gcc.dg/cpp/embed-10.c: New test.
    	* gcc.dg/cpp/embed-11.c: New test.
    	* gcc.dg/cpp/embed-12.c: New test.
    	* gcc.dg/cpp/embed-13.c: New test.
    	* gcc.dg/cpp/embed-14.c: New test.
    	* gcc.dg/cpp/embed-15.c: New test.
    	* gcc.dg/cpp/embed-16.c: New test.
    	* gcc.dg/pch/embed-1.c: New test.
    	* gcc.dg/pch/embed-1.hs: New test.
    	* gcc.dg/lto/embed-1_0.c: New test.
    	* gcc.dg/lto/embed-1_1.c: New test.
    1844a4aa
    History
    libcpp, c, middle-end: Optimize initializers using #embed in C
    Jakub Jelinek authored
    This patch actually optimizes #embed, so far in C.
    
    For a simple testcase (for 494447200 bytes long cc1plus):
    cat embed-11.c
    unsigned char a[] = {
      #embed "cc1plus"
    };
    time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
    
    real    0m13.647s
    user    0m7.157s
    sys     0m2.597s
    time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
    
    real    0m28.649s
    user    0m26.653s
    sys     0m1.958s
    
    and when configured against binutils with .base64 support
    time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
    
    real    0m4.283s
    user    0m2.288s
    sys     0m0.859s
    time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
    
    real    0m6.888s
    user    0m5.876s
    sys     0m1.002s
    
    (all times with --enable-checking=yes,rtl,extra compiler).
    
    Even just
    ./cc1plus -E -o embed-11.i embed-11.c
    (which doesn't have this optimization yet and so preprocesses it as
    1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
    preprocessed fine).
    And compiling that embed-11.i with -std=c23 -O0 by unpatched gcc
    I gave up after 400 seconds when it already ate 45GB of RAM and didn't
    produce a single byte into embed-11.s yet.
    
    The patch introduces a new CPP_EMBED token which contains raw memory image
    virtually representing a sequence of int literals.
    To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
    is only emitted if there are 4+ (it actually does that for 64+ right now)
    literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
    CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
    longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
    are limited to INT_MAX elements).  The main reason is that the preprocessor
    doesn't really know in which context #embed directive appears, there could
    be e.g.
    { 25 *
      #embed "whatever"
    * 2 - 15 }
    or similar and dealing with this special case deep in the expression parsing
    is undesirable.
    With the CPP_NUMBERs around it, I believe in the C FE the only places which
    need handling of the CPP_EMBED token are initializer parsing (that is the
    only one which adds actual optimizations for it), comma expressions (I
    believe nothing really cares whether it is 25,13,95 or
    25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
    the parsing just adds one INTEGER_CST to the comma expression, I doubt users
    want to be spammed with millions of -Wunused warnings per #embed),
    whatever uses c_parser_expr_list (function calls, attribute arguments,
    OpenMP sizes clause argument, OpenACC tile clause argument and whatever uses
    c_parser_get_builtin_args (mainly for __builtin_shufflevector).  Please correct
    me if I'm wrong.
    
    The patch introduces a RAW_DATA_CST tree code, which can then be used inside
    of array CONSTRUCTOR elt values.  In some sense RAW_DATA_CST is similar to
    STRING_CST, but right now STRING_CST is used only if the whole array
    initializer is that constant, while RAW_DATA_CST at index idx (should be
    always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
    [30 ... 250] =
      #embed "whatever"
    really does what it would do with a integer sequence there) stands for
    [idx] = RAW_DATA_POINTER (val)[0],
    [idx+1] = RAW_DATA_POINTER (val)[1],
    ...
    [idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
    Another important thing is that unlike STRING_CST which has the data
    embedded in it RAW_DATA_CST doesn't own the data, it has RAW_DATA_OWNER
    which owns the data (that can be a STRING_CST, e.g. used for PCH or LTO
    after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
    standing for data owned by libcpp buffers).  The advantage is that it can be
    cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
    designated initializer to store something into the middle of a 10GB #embed
    array, in no case we need to actually copy data around for that.
    Right now RAW_DATA_CST is only used in initializers of integral arrays where
    the integer type has (host) CHAR_BIT precision, so usually char/signed
    char/unsigned char (for C++ later maybe std::byte); in theory we could say
    allocate 4 times as big buffer for conversions to int array and depending
    on endianity and storage order reversal etc., but I'm not sure if that is
    something that will be actually needed in the wild.
    And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
    CPP_EMBED CPP_NUMBER division in case one uses #embed the usual way and
    doesn't use the boundary literals in weird ways and the values there match
    the surrounding bytes in the owner buffer.
    
    For LTO, in order to avoid copying perhaps gigabytes long data around,
    the hacks in the streamer out/in cause the data owned by libcpp to be
    streamed right into the stream and streamed back as a STRING_CST which
    owns the data.
    
    2024-10-16  Jakub Jelinek  <jakub@redhat.com>
    
    libcpp/
    	* include/cpplib.h (TTYPE_TABLE): Add CPP_EMBED token type.
    	* files.cc (finish_embed): For limit >= 64 and C preprocessing
    	instead of emitting CPP_NUMBER CPP_COMMA separated sequence for the
    	whole embed emit it just for the first and last byte and in between
    	emit a CPP_EMBED token or tokens if too large.
    gcc/
    	* treestruct.def (TS_RAW_DATA_CST): New.
    	* tree.def (RAW_DATA_CST): New tree code.
    	* tree-core.h (struct tree_raw_data): New type.
    	(union tree_node): Add raw_data_cst member.
    	* tree.h (RAW_DATA_LENGTH, RAW_DATA_POINTER, RAW_DATA_OWNER): Define.
    	(gt_ggc_mx, gt_pch_nx): Declare overloads for tree_raw_data *.
    	* tree.cc (tree_node_structure_for_code): Handle RAW_DATA_CST.
    	(initialize_tree_contains_struct): Handle TS_RAW_DATA_CST.
    	(tree_code_size): Handle RAW_DATA_CST.
    	(initializer_zerop): Likewise.
    	(gt_ggc_mx, gt_pch_nx): Define overloads for tree_raw_data *.
    	* gimplify.cc (gimplify_init_ctor_eval): Handle RAW_DATA_CST.
    	* fold-const.cc (operand_compare::operand_equal_p): Handle
    	RAW_DATA_CST.  Formatting fix.
    	(operand_compare::hash_operand): Handle RAW_DATA_CST.
    	(native_encode_initializer): Likewise.
    	(get_array_ctor_element_at_index): Likewise.
    	(fold): Likewise.
    	* gimple-fold.cc (fold_array_ctor_reference): Likewise.  Formatting
    	fix.
    	* varasm.cc (const_hash_1): Handle RAW_DATA_CST.
    	(initializer_constant_valid_p_1): Likewise.
    	(array_size_for_constructor): Likewise.
    	(output_constructor_regular_field): Likewise.
    	* expr.cc (categorize_ctor_elements_1): Likewise.
    	(expand_expr_real_1) <case ARRAY_REF>: Punt for RAW_DATA_CST.
    	* tree-streamer.cc (streamer_check_handled_ts_structures): Mark
    	TS_RAW_DATA_CST as handled.
    	* tree-streamer-in.cc (streamer_alloc_tree): Handle RAW_DATA_CST.
    	(lto_input_ts_raw_data_cst_tree_pointers): New function.
    	(streamer_read_tree_body): Call it for RAW_DATA_CST.
    	* tree-streamer-out.cc (write_ts_raw_data_cst_tree_pointers): New
    	function.
    	(streamer_write_tree_body): Call it for RAW_DATA_CST.
    	(streamer_write_tree_header): Handle RAW_DATA_CST.
    	* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle RAW_DATA_CST.
    	* tree-pretty-print.cc (dump_generic_node): Likewise.
    gcc/c-family/
    	* c-ppoutput.cc (token_streamer::stream): Add special code to spell
    	CPP_EMBED token.
    	* c-lex.cc (c_lex_with_flags): Handle CPP_EMBED.  Formatting fix.
    	* c-common.cc (c_parse_error): Handle CPP_EMBED.
    	(braced_list_to_string): Optimize RAW_DATA_CST surrounded by
    	INTEGER_CSTs which match some bytes before or after RAW_DATA_CST in
    	its owner.
    gcc/c/
    	* c-parser.cc (c_parser_braced_init): Handle CPP_EMBED.
    	(c_parser_get_builtin_args): Likewise.
    	(c_parser_expression): Likewise.
    	(c_parser_expr_list): Likewise.
    	* c-typeck.cc (digest_init): Handle RAW_DATA_CST.  Formatting fix.
    	(init_node_successor): New function.
    	(add_pending_init): Handle RAW_DATA_CST.
    	(set_nonincremental_init): Formatting fix.
    	(output_init_element): Handle RAW_DATA_CST.  Formatting fixes.
    	(maybe_split_raw_data): New function.
    	(process_init_element): Use maybe_split_raw_data.  Handle
    	RAW_DATA_CST.
    gcc/testsuite/
    	* c-c++-common/cpp/embed-20.c: New test.
    	* c-c++-common/cpp/embed-21.c: New test.
    	* c-c++-common/cpp/embed-28.c: New test.
    	* gcc.dg/cpp/embed-8.c: New test.
    	* gcc.dg/cpp/embed-9.c: New test.
    	* gcc.dg/cpp/embed-10.c: New test.
    	* gcc.dg/cpp/embed-11.c: New test.
    	* gcc.dg/cpp/embed-12.c: New test.
    	* gcc.dg/cpp/embed-13.c: New test.
    	* gcc.dg/cpp/embed-14.c: New test.
    	* gcc.dg/cpp/embed-15.c: New test.
    	* gcc.dg/cpp/embed-16.c: New test.
    	* gcc.dg/pch/embed-1.c: New test.
    	* gcc.dg/pch/embed-1.hs: New test.
    	* gcc.dg/lto/embed-1_0.c: New test.
    	* gcc.dg/lto/embed-1_1.c: New test.