-
- Downloads
libstdc++: Fix handling of incomplete UTF-8 sequences in _Unicode_view
Eddie Nolan reported to me that _Unicode_view was not correctly implementing the substitution of ill-formed subsequences with U+FFFD, due to failing to increment the counter when the iterator reaches the end of the sequence before a multibyte sequence is complete. As a result, the incomplete sequence was not completely consumed, and then the remaining character was treated as another ill-formed sequence, giving two U+FFFD characters instead of one. To avoid similar mistakes in future, this change introduces a lambda that increments the iterator and the counter together. This ensures the counter is always incremented when the iterator is incremented, so that we always know how many characters have been consumed. libstdc++-v3/ChangeLog: * include/bits/unicode.h (_Unicode_view::_M_read_utf8): Ensure count of characters consumed is correct when the end of the input is reached unexpectedly. * testsuite/ext/unicode/view.cc: Test incomplete UTF-8 sequences.
Loading
Please register or sign in to comment