Skip to content
Snippets Groups Projects
Unverified Commit 3f04f393 authored by Jonathan Wakely's avatar Jonathan Wakely
Browse files

libstdc++: Fix handling of incomplete UTF-8 sequences in _Unicode_view

Eddie Nolan reported to me that _Unicode_view was not correctly
implementing the substitution of ill-formed subsequences with U+FFFD,
due to failing to increment the counter when the iterator reaches the
end of the sequence before a multibyte sequence is complete.  As a
result, the incomplete sequence was not completely consumed, and then
the remaining character was treated as another ill-formed sequence,
giving two U+FFFD characters instead of one.

To avoid similar mistakes in future, this change introduces a lambda
that increments the iterator and the counter together. This ensures the
counter is always incremented when the iterator is incremented, so that
we always know how many characters have been consumed.

libstdc++-v3/ChangeLog:

	* include/bits/unicode.h (_Unicode_view::_M_read_utf8): Ensure
	count of characters consumed is correct when the end of the
	input is reached unexpectedly.
	* testsuite/ext/unicode/view.cc: Test incomplete UTF-8
	sequences.
parent 9927059b
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment