  1. Jan 03, 2024
  2. Nov 14, 2023
    • libcpp, contrib: Update to Unicode 15.1 · d64b7c82
      Jakub Jelinek authored
      The following patch updates to Unicode 15.1 from the 15.0 we had last
      year.  (In plaintext this is just a pseudo-patch: the parts of the wget
      downloaded or regenerated files that are too big are replaced with
      ...; the full patch is attached compressed.)  Apparently Unicode forgot
      to add a new range to Table 4-8, which we are using, but from the other
      files it is clear what should have been added; I've filed a bug report
      against Unicode.
      
      2023-11-14  Jakub Jelinek  <jakub@redhat.com>
      
      contrib/
      	* unicode/README: Adjust glibc git commit hash, number of Unicode
      	data files to be updated and latest Unicode version.
      	* unicode/from_glibc/utf8_gen.py: Update from glibc.
      	* unicode/UnicodeData.txt: Update from Unicode 15.1.
      	* unicode/EastAsianWidth.txt: Likewise.
      	* unicode/DerivedNormalizationProps.txt: Likewise.
      	* unicode/NameAliases.txt: Likewise.
      	* unicode/DerivedCoreProperties.txt: Likewise.
      	* unicode/PropList.txt: Likewise.
      libcpp/
      	* makeucnid.cc (write_copyright): Update copyright year.
      	* makeuname2c.cc (write_copyright): Likewise.
      	(struct generated): Update latest Unicode version.
      	(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH
      	range, which was omitted from Table 4-8 but is clearly
      	expected to be there given the 15.1 additions.
      	* ucnid.h: Regenerated.
      	* uname2c.h: Regenerated.
      	* generated_cpp_wcwidth.h: Regenerated.
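
The range added to generated_ranges above is one whose names are derived from the code point rather than listed individually (Unicode rule NR2: a fixed prefix followed by the code point in uppercase hex). A minimal sketch of that idea follows. The `NameRange` struct, the `kGeneratedRanges` table and `generated_name` are hypothetical names for illustration only; the actual layout of `struct generated` in makeuname2c.cc may differ, and only the single range named in the ChangeLog entry is listed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical descriptor for a block of code points whose names are
// generated as "<prefix><code point in hex>" (Unicode rule NR2).
struct NameRange {
    const char* prefix;
    std::uint32_t first;
    std::uint32_t last;
};

// Illustration only: just the range mentioned in the ChangeLog entry.
static const NameRange kGeneratedRanges[] = {
    {"CJK UNIFIED IDEOGRAPH-", 0x2EBF0, 0x2EE5D},
};

// Returns the generated name for cp, or an empty string if cp is not
// covered by any range in the table.
std::string generated_name(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // must be a valid Unicode code point
    for (const NameRange& r : kGeneratedRanges) {
        if (cp >= r.first && cp <= r.last) {
            char hex[16];
            std::snprintf(hex, sizeof hex, "%04lX",
                          static_cast<unsigned long>(cp));
            return std::string(r.prefix) + hex;
        }
    }
    return std::string();
}
```

With this table, a range missing from Table 4-8 simply means the corresponding names are never produced, which is why the entry had to be added by hand.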
  3. Mar 16, 2023
    • libcpp: Update Unicode copyright years · 99bae6ee
      Jakub Jelinek authored
      I noticed that I forgot to update the copyright years when updating from
      Unicode 15.0.0 (and makeucnid.cc had them hopelessly out of date).
      
      2023-03-16  Jakub Jelinek  <jakub@redhat.com>
      
      	* makeucnid.cc (write_copyright): Update Unicode copyright years
      	up to 2022.
      	* makeuname2c.cc (write_copyright): Likewise.
      	* ucnid.h: Regenerated.
      	* uname2c.h: Regenerated.
  4. Jan 16, 2023
  5. Nov 04, 2022
    • libcpp: Update to Unicode 15 · 2662d537
      Jakub Jelinek authored
      The following pseudo-patch regenerates the libcpp tables with Unicode
      15.0.0, which added 4489 new characters.
      
      As mentioned previously, this isn't just a matter of running the
      two libcpp/make*.cc programs on the new Unicode files; one also needs
      to manually update a table inside makeuname2c.cc according to
      a table in the Unicode text (which is only partially reflected in the
      text files: not 100% accurately in Unicode 14.0.0, but accurately in
      15.0.0).
      I've also added a randomly chosen subset of those 4489 new
      characters to a testcase.
      
      2022-11-04  Jakub Jelinek  <jakub@redhat.com>
      
      gcc/testsuite/
      	* c-c++-common/cpp/named-universal-char-escape-1.c: Add tests for some
      	characters newly added in Unicode 15.0.0.
      libcpp/
      	* makeuname2c.cc (struct generated): Update from Unicode 15.0.0
      	table 4-8.
      	* ucnid.h: Regenerated for Unicode 15.0.0.
      	* uname2c.h: Likewise.
  6. Aug 26, 2022
    • c++: Implement C++23 P2071R2 - Named universal character escapes [PR106648] · eb4879ab
      Jakub Jelinek authored
      The following patch implements the
      C++23 P2071R2 - Named universal character escapes
      paper to support \N{LATIN SMALL LETTER E} etc.
      I've used Unicode 14.0.  There are 144803 character name properties
      (including the ones generated by the Unicode NR1 and NR2 rules)
      and correction/control/alternate aliases; stored together with zero
      terminators they would take 3884745 bytes, which is clearly
      unacceptable for libcpp.
      This patch instead contains a generator which, from the UnicodeData.txt
      and NameAliases.txt files, emits a space-optimized radix tree (208765
      bytes long for 14.0), a single string literal dictionary (59418 bytes),
      the maximum name length (currently 88 chars) and two small helper arrays
      for the NR1/NR2 name generation.
      The radix tree needs 2 to 9 bytes per node; the exact format is
      described in the generator program.  There could be ways to shrink
      the dictionary size somewhat at the expense of slightly slower lookups.
      
      Currently the patch implements strict matching (which is what is needed
      to actually implement the feature for valid code) and loose matching
      per the Unicode UAX44-LM2 algorithm to provide hints.  That algorithm
      essentially ignores spaces, underscores, and any hyphen that sits
      between two alphanumeric characters (with one exception for such a
      hyphen), and matches case insensitively.
      The attachment is a WIP patch that shows how to also implement
      spellcheck.{h,cc} style discovery of misspellings, but I'll need to talk
      to David Malcolm about it, as spellcheck.{h,cc} is in the gcc/ subdir
      (so the WIP incremental patch instead prints all the names to stderr).
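
The loose matching described above can be sketched as a key function: two names match loosely when their keys are equal. This is an illustration under stated assumptions, not the code in charset.cc. `loose_key` and `loosely_equal` are hypothetical helpers, the input is assumed to be ASCII, and the single exception is taken to be the one UAX #44 names (the hyphen in HANGUL JUNGSEONG O-E, which must stay distinct from HANGUL JUNGSEONG OE).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not libcpp's _cpp_uname2c_uax44_lm2): reduce an ASCII
// character name to an uppercase key with spaces, underscores and medial
// hyphens removed.
std::string loose_key(const std::string& name) {
    auto is_alnum = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::string key;
    // Key length at the moment the most recent medial hyphen was dropped.
    std::size_t dropped_at = std::string::npos;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        if (c == ' ' || c == '_')
            continue;  // spaces and underscores are ignored
        const bool medial_hyphen =
            c == '-' && i > 0 && i + 1 < name.size() &&
            is_alnum(name[i - 1]) && is_alnum(name[i + 1]);
        if (medial_hyphen) {
            dropped_at = key.size();
            continue;  // hyphen directly between two alphanumerics
        }
        key.push_back(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    // The one exception: keep the hyphen of HANGUL JUNGSEONG O-E so that it
    // does not collide with HANGUL JUNGSEONG OE.
    if (key == "HANGULJUNGSEONGOE" && dropped_at == key.size() - 1)
        key.insert(dropped_at, 1, '-');
    return key;
}

// Two names match loosely when their keys are identical.
bool loosely_equal(const std::string& a, const std::string& b) {
    return loose_key(a) == loose_key(b);
}
```

Under this sketch "zero-width space" and "ZERO WIDTH SPACE" produce the same key, while "TIBETAN LETTER -A" and "TIBETAN LETTER A" do not, because the hyphen in the former follows a space and is therefore not medial.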
      
      2022-08-26  Jakub Jelinek  <jakub@redhat.com>
      
      	PR c++/106648
      libcpp/
      	* charset.cc: Implement C++23 P2071R2 - Named universal character
      	escapes.  Include uname2c.h.
      	(hangul_syllables, hangul_count): New variables.
      	(struct uname2c_data): New type.
      	(_cpp_uname2c, _cpp_uname2c_uax44_lm2): New functions.
      	(_cpp_valid_ucn): Use them.  Handle named universal character escapes.
      	(convert_ucn): Adjust comment.
      	(convert_escape): Call convert_ucn even for \N.
      	(_cpp_interpret_identifier): Handle named universal character escapes.
      	* lex.cc (get_bidi_ucn): Fix up function comment formatting.
      	(get_bidi_named): New function.
      	(forms_identifier_p, lex_string): Handle named universal character
      	escapes.
      	* makeuname2c.cc: New file.  Small parts copied from makeucnid.cc.
      	* uname2c.h: New generated file.
      gcc/c-family/
      	* c-cppbuiltin.cc (c_cpp_builtins): Predefine
      	__cpp_named_character_escapes to 202207L.
      gcc/testsuite/
      	* c-c++-common/cpp/named-universal-char-escape-1.c: New test.
      	* c-c++-common/cpp/named-universal-char-escape-2.c: New test.
      	* c-c++-common/cpp/named-universal-char-escape-3.c: New test.
      	* c-c++-common/cpp/named-universal-char-escape-4.c: New test.
      	* c-c++-common/Wbidi-chars-25.c: New test.
      	* gcc.dg/cpp/named-universal-char-escape-1.c: New test.
      	* gcc.dg/cpp/named-universal-char-escape-2.c: New test.
      	* g++.dg/cpp/named-universal-char-escape-1.C: New test.
      	* g++.dg/cpp/named-universal-char-escape-2.C: New test.
      	* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_named_character_escapes.
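
From the user's side, the feature and its feature-test macro could be exercised roughly as follows. This is a small sketch rather than one of the testcases listed above; `small_e`, `have_named_escapes` and `check_small_e` are names made up for the example, and the fallback branch is only there so the snippet also builds where the macro is not predefined.

```cpp
#include <cassert>
#include <cstring>

#if defined(__cpp_named_character_escapes) && \
    __cpp_named_character_escapes >= 202207L
// Named escape: the compiler resolves the Unicode name to U+0065.
constexpr char small_e[] = "\N{LATIN SMALL LETTER E}";
constexpr bool have_named_escapes = true;
#else
// Fallback for compilers or language modes without P2071R2 support.
constexpr char small_e[] = "e";
constexpr bool have_named_escapes = false;
#endif

// Hypothetical self-check: either spelling yields the one-character
// string "e".
inline void check_small_e() {
    assert(std::strcmp(small_e, "e") == 0);
}
```

Guarding on the macro keeps the named escape inside a skipped preprocessor group on older compilers, so the source still compiles there.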