Unverified Commit 84e39b07 authored 4 months ago by Jonathan Wakely Committed by Jonathan Wakely 4 months ago

libstdc++: Add _Hashtable::_M_locate(const key_type&)

We have two overloads of _M_find_before_node but they have quite
different performance characteristics, which isn't necessarily obvious.

The original version, _M_find_before_node(bucket, key, hash_code), looks
only in the specified bucket, doing a linear search within that bucket
for an element that compares equal to the key. This is the typical fast
lookup for hash containers, assuming the load factor is low so that each
bucket isn't too large.

The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a
and could be naively assumed to calculate the hash code and bucket for
key and then call the efficient _M_find_before_node(bkt, key, code)
function. But in fact it does a linear search of the entire container.
This is potentially very slow and should only be used for a suitably
small container, as determined by the __small_size_threshold() function.
We don't even have a comment pointing out this O(N) performance of the
newer overload.

Additionally, the newer overload is only ever used in exactly one place,
which would suggest it could just be removed. However there are several
places that do the linear search of the whole container with an explicit
loop each time.

This adds a new member function, _M_locate, and uses it to replace most
uses of _M_find_node and the loops doing linear searches. This new
member function does both forms of lookup, the linear search for small
sizes and the _M_find_node(bkt, key, code) lookup within a single
bucket. The new function returns a __location_type which is a struct
that contains a pointer to the first node matching the key (if such a
node is present), or the hash code and bucket index for the key. The
hash code and bucket index allow the caller to know where a new node
with that key should be inserted, for the cases where the lookup didn't
find a matching node.

The result struct actually contains a pointer to the node *before* the
one that was located, as that is needed for it to be useful in erase and
extract members. There is a member function that returns the found node,
i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in
most cases.

This new function greatly simplifies the functions that currently have
to do two kinds of lookup and explicitly check the current size against
the small size threshold.

Additionally, now that try_emplace is defined directly in _Hashtable
(not in _Insert_base) we can use _M_locate in there too, to speed up
some try_emplace calls. Previously it did not do the small-size linear
search.

It would be possible to add a function to get a __location_type from an
iterator, and then rewrite some functions like _M_erase and
_M_extract_node to take a __location_type parameter. While that might be
conceptually nice, it wouldn't really make the code any simpler or more
readable than it is now. That isn't done in this change.

libstdc++-v3/ChangeLog:

	* include/bits/hashtable.h (__location_type): New struct.
	(_M_locate): New member function.
	(_M_find_before_node(const key_type&)): Remove.
	(_M_find_node): Move variable initialization into condition.
	(_M_find_node_tr): Likewise.
	(operator=(initializer_list<T>), try_emplace, _M_reinsert_node)
	(_M_merge_unique, find, erase(const key_type&)): Use _M_locate
	for lookup.

parent a147bfca

No related branches found

No related tags found

No related merge requests found

Hide whitespace changes

Inline Side-by-side

Showing with 145 additions and 188 deletions

Please register or to comment