Skip to content
Snippets Groups Projects
Commit 4a992eca authored by Andi Kleen's avatar Andi Kleen
Browse files

Rebalance file_cache input line cache dynamically

The input context file_cache maintains an array of anchors
to speed up accessing lines before the previous line.
The array has a fixed upper size and the algorithm relies
on the linemap reporting the maximum number of lines in the file
in advance to compute the position of each anchor in the cache.

This doesn't work for C which doesn't know the maximum number
of lines before the files has finished parsing. The code
has a fallback for this, but it is quite inefficient and
effectively defeats the cache, so many accesses have to
go through most of the input buffer to compute line
boundaries. For large files this can be very costly
as demonstrated in PR118168.

Use a different algorithm to maintain the cache without
needing the maximum number of lines in advance. When the cache
runs out of entries and the gap to the last line anchor gets
too large, prune every second entry in the cache. This maintains
even spacing of the line anchors without requiring the maximum
index.

For the original PR this moves the overhead of enabling
-Wmisleading-indentation to 32% with the default cache size.
With a 10k entry cache it becomes noise.

  cc1 -O0 -fsyntax-only mypy.c   -quiet  ran
    1.03 ± 0.05 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation --param=file-cache-lines=10000
    1.09 ± 0.08 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation --param=file-cache-lines=1000
    1.32 ± 0.07 times faster than cc1 -O0 -fsyntax-only  mypy.c   -quiet -Wmisleading-indentation

The code could be further optimized, e.g. use the vectorized
line search functions the preprocessor uses.

Also it seems the input cache always reads the whole file into
memory, so perhaps it should just be using file mmap if possible.

gcc/ChangeLog:

	PR preprocessor/118168
	* input.cc (file_cache_slot::get_next_line): Use new algorithm
	to maintain
	(file_cache_slot::read_line_num): Use binary search for lookup.
parent ae814afa
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment