Opened 2 years ago
Closed 6 months ago
#14853 closed defect (fixed)
COMMON: Rework path handling causes performance regression
| Reported by: | mikrosk | Owned by: | lephilousophe |
|---|---|---|---|
| Priority: | normal | Component: | Common |
| Version: | Keywords: | performance, regression | |
| Cc: | Game: |
Description
https://github.com/scummvm/scummvm/pull/5467 introduced a new (safer) way how to handle file paths.
I was looking forward to this change because I tested its early iteration and it brought quite visible speedup when adding new games.
Surprisingly, when the dust settled after the 2.8.0 release we have found out that our Atari backend regressed in master: adding a game with a lot of files (e.g. Dreamweb with ~100 files) could lead to nearly twice as slow game detection.
Some numbers:
Atari backend, attempting to add a folder with 1000 generated files:
- 2.8.0: 33s
- master: 58s
Linux SDL backend, attempting to add a folder with 100 000 generated files ("for i in $(seq 99999); do touch $(printf "%04d" $i); done"):
- 2.8.0: 17s
- master: 28s
Naturally, no game is going to have 100 000 files, this is just to demonstrate that the powerful CPU can hide even a performance drop to 60% of the original speed.
Fortunately, thanks to LP's responsible approach, each commit is compilable and testable: so in this case the offending commit is https://github.com/scummvm/scummvm/commit/645a35c0 (not so surprising as it directly manipulates the detection code).
Attachments (1)
Change History (13)
comment:1 by , 2 years ago
comment:2 by , 2 years ago
With https://github.com/scummvm/scummvm/pull/5621 merged, I'm happy to report that adding/detecting Dreamweb is now much faster than in 2.8! So the original speed boost I remembered is definitely present/visible.
However, the when it comes to hashing/comparing, there's still some performance decrease if the amount of files is too big: adding a folder with 999 (generated) files is still about 20% slower than in 2.8. The aforementioned PR definitely helped but it's still slower. I can provide a detailed call graph if it helps.
comment:3 by , 7 months ago
| Owner: | set to |
|---|---|
| Resolution: | → fixed |
| Status: | new → pending |
What's the status of this?
IIRC we did some additional fixes.
comment:4 by , 6 months ago
I have just tested it with the generated 999 files scenario:
branch-2-8: 155.72s
latest master: 199.44s
So yes, current master is still about 80% of the 2.8's speed when adding a new game.
comment:5 by , 6 months ago
Thanks for testing again.
Do you have a call graph (or a flame graph) as you proposed? :)
comment:6 by , 6 months ago
Sorry to bug you but since the bug reporter is complaining about scan speed.
Scummvm scans much much faster if you scan from the command line (it also has less false positives in games that falsely scan a inside dir as another version of the game).
The reason is that the command line isn't recursive, so it just checks the top level of the directory given and adds the game or not.
Since I'm in Linux, I can easily automate this for all immediate subdirs in multiple directories like this in the cmd line:
rm -f add.log
find ScummVM ScummVM_Text/ ScummVM_AGS/ -mindepth 1 -maxdepth 1 -type d -exec echo "{}" >> add.log \; -exec scummvm -a -p"{}"
the log file is for if you want to check what games aren't being added in the dirs given (usually because they aren't at the top level dir to be scanned, but sometimes scummvm problems show like the bug I opened about discworld noir and director noir).
Strangely adding games this way tends to add more games even if they don't have the checksums in scummvm. The reason is that adding this way, unlike the gui, will not repeat gameids. And unknown games in the same engine have the same Id (eg: 'Unknown Adventure Game Studio game'). Instead, after the first, it will add the first possibility of the fallbacks (the thing you see when you add a unknown checksum game). Since those possibilities are checked\added by signature files filenames, that one is usually right.
In short, it saves a LOT of time.
comment:7 by , 6 months ago
tends to add more games -> tends to add more games with the correct name
comment:8 by , 6 months ago
Also you probably want the paths given to be absolute. I had them relative just as a example, not sure it works without a absolute path (start from / until the directories holding the game directories you want to scan).
by , 6 months ago
| Attachment: | add-unknown-1000.pdf added |
|---|
comment:9 by , 6 months ago
Thanks to Eero Tamminen who has profiled various use cases I can provide you with a rich set of profiling results:
FOTAQ (3 files):
Time spent in profile = 8.26151s. ... Used cycles: 24.42% 64723246 set256 8.43% 8.56% 8.56% 22356134 22687874 22691592 tolower 6.32% 6.44% 6.63% 16740636 17059548 17577978 Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, unsigned char) 3.75% 3.57% 6.95% 9948728 9471806 18426242 * Common::hashit_lower(char const*) 3.22% 3.24% 8.10% 8530456 8578314 21467158 Common::String::equalsIgnoreCase(Common::String const&) const
DreamWeb (~100 files):
Time spent in profile = 19.08808s. ... Used cycles: 18.64% 114132508 set256 7.44% 7.56% 7.56% 45570176 46284960 46291280 tolower 3.50% 2.12% 4.12% 21462472 12976494 25202188 * Common::hashit_lower(char const*) 3.39% 3.47% 3.47% 20757570 21236280 21240422 strchr 2.90% 2.94% 5.09% 17777074 18000554 31198006 Common::BaseString<char32_t>::append(unsigned long, char32_t) 2.78% 1.59% 3.87% 17021506 9738806 23722970 * Common::String::equalsIgnoreCase(Common::String const&) const 2.74% 2.78% 2.87% 16776518 16997686 17584226 Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, uns igned char) 2.56% 2.62% 2.70% 15673920 16029534 16506548 Common::BaseString<char>::ensureCapacity(unsigned int, bool) 2.18% 2.22% 2.24% 13336494 13581668 13730712 Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool) 2.14% 2.19% 5.42% 13075922 13406536 33164776 Common::BaseString<char>::append(char const*) 2.01% 2.03% 5.40% 12322480 12461080 33079852 Common::BaseString<char>::append(unsigned long, char) 1.88% 11498756 ROM_TOS 1.83% 1.86% 22.39% 11218950 11409332 137121990 Common::punycode_encodefilename(Common::U32String const&) 1.81% 1.82% 21.55% 11097116 11144916 131994772 Audio::MixerImpl::mixCallback(unsigned char*, unsigned int) 1.57% 1.60% 14.95% 9634502 9820150 91569596 Common::punycode_encode(Common::U32String const&) 1.50% 1.50% 6.60% 9162956 9203576 40410036 Common::U32String::operator+=(char32_t) 1.35% 1.36% 1.36% 8289502 8354752 8356456 strlen 1.33% 1.35% 1.35% 8149050 8279324 8280528 Graphics::VectorRendererSpec<unsigned char>::applyScreenShading(GUI::ThemeEngine::ShadingStyle) 1.33% 8138216 Common::U32String::decodeUTF8(char const*, unsigned int) 1.17% 1.27% 1.29% 7180466 7799142 7888284 Common::BaseString<char>::~BaseString() 1.08% 1.10% 6.47% 6597538 6708378 39630474 Common::String::operator+=(char) 0.98% 0.99% 3.04% 5978096 6043216 18636164 Common::BaseString<char>::BaseString(char const*) 0.97% 0.98% 15.80% 5938560 5999262 96773466 AdvancedMetaEngineDetectionBase::detectGame(Common::FSNode const&, Common::HashMap<Common::Path, Common::FS Node, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo> const&, Common::Language, Common::Platform, Common::String const&, unsigned int, bool) 0.94% 5769110 areeven 0.92% 0.94% 41.16% 5655470 5730882 252077854 AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const 0.90% 5523230 Common::String::encodeUTF8(Common::U32String const&, char)
Bud Tucker (nearly 300 files): https://downloads.scummvm.org/frs/demos/tucker/tucker-dos-demo.zip
Time spent in profile = 29.49824s. ... Used cycles: 6.33% 6.44% 6.44% 59955326 60946382 60955538 tolower 5.62% 53219164 set256 4.35% 4.43% 4.43% 41166544 41912456 41919030 strchr 4.31% 4.37% 7.52% 40798584 41343440 71191320 Common::BaseString<char32_t>::append(unsigned long, char32_t) 3.49% 1.07% 2.08% 32986150 10091028 19685358 * Common::hashit_lower(char const*) 3.44% 3.47% 3.56% 32563428 32852604 33728660 Common::BaseString<char>::ensureCapacity(unsigned int, bool) 3.25% 3.28% 8.17% 30739268 31017176 77369806 Common::BaseString<char>::append(char const*) 3.25% 3.29% 3.30% 30723482 31110278 31275202 Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool) 2.87% 2.93% 34.96% 27169450 27706538 330850600 Common::punycode_encodefilename(Common::U32String const&) 2.85% 2.87% 2.87% 27000896 27163522 27167086 timesub.isra.0 2.64% 2.65% 7.11% 24957458 25107594 67282674 Common::BaseString<char>::append(unsigned long, char) 2.51% 2.57% 23.46% 23761364 24303758 222060192 Common::punycode_encode(Common::U32String const&) 2.44% 23120686 ROM_TOS 2.22% 2.25% 9.78% 21015652 21321480 92538426 Common::U32String::operator+=(char32_t) 2.12% 0.92% 2.30% 20105116 8747222 21789654 * Common::String::equalsIgnoreCase(Common::String const&) const 2.02% 19107772 Common::U32String::decodeUTF8(char const*, unsigned int) 1.78% 1.80% 1.86% 16811486 17032388 17607928 Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, unsigned char) 1.70% 1.74% 72.87% 16119048 16474220 689688736 AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const 1.62% 1.74% 1.76% 15320332 16444722 16646742 Common::BaseString<char>::~BaseString() 1.57% 1.57% 1.57% 14829724 14833232 14836462 strlen 1.46% 1.48% 10.29% 13823158 14004126 97346008 Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>::lookupAndCreateIfMissing(Common::Path const&) 1.42% 13462424 Common::String::encodeUTF8(Common::U32String const&, char) 1.42% 1.43% 8.52% 13401228 13553898 80682654 Common::String::operator+=(char) 1.29% 1.31% 2.82% 12189390 12382552 26667974 Common::BaseString<char>::BaseString(Common::BaseString<char> const&) 1.27% 1.30% 9.43% 11974204 12319716 89218530 Common::String::operator+=(char const*) 1.19% 11300100 top_down 1.15% 10922474 common2 1.04% 1.05% 3.11% 9885244 9917700 29458164 Common::BaseString<char>::BaseString(char const*)
Unknown / 1000 empty files: for i in $(seq 999); do touch $(printf "%03d" $i); done
Time spent in profile = 47.15388s. ... Used cycles: 5.28% 5.36% 5.36% 79890114 81068938 81082612 tolower 4.52% 68419008 set256 3.58% 3.63% 76.81% 54137012 548850401162013508 AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const 3.10% 3.16% 3.17% 46916114 47880444 47887554 strchr 3.08% 3.12% 13.15% 46605210 47238284 198976936 Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>::lookupAndCreateIfMissing(Common::Path const&) 2.90% 0.64% 1.25% 43917318 9742630 18956808 * Common::hashit_lower(char const*) 2.87% 2.93% 2.93% 43490364 44285108 44375388 Common::BaseString<char>::~BaseString() 2.81% 2.85% 4.83% 42536860 43145034 73097796 Common::BaseString<char32_t>::append(unsigned long, char32_t) 2.76% 2.81% 8.04% 41766818 42443872 121688982 Common::BaseString<char>::BaseString(Common::BaseString<char> const&) 2.48% 2.52% 20.40% 37592116 38177870 308685436 Common::punycode_encode(Common::U32String const&) 2.45% 2.48% 29.30% 37031828 37564898 443229266 Common::punycode_encodefilename(Common::U32String const&) 2.44% 2.47% 2.47% 36870254 37299992 37419268 Common::BaseString<char>::ensureCapacity(unsigned int, bool) 2.25% 2.29% 2.30% 33989434 34618342 34773002 Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool) 2.20% 33218326 ROM_TOS 2.18% 2.21% 2.21% 32942684 33371704 33376564 timesub.isra.0 2.09% 0.57% 1.42% 31663784 8581180 21473600 * Common::String::equalsIgnoreCase(Common::String const&) const 2.05% 2.10% 5.25% 31077204 31740514 79490836 Common::BaseString<char>::append(char const*) 1.90% 28674866 common2 1.88% 1.92% 4.85% 28485862 29070728 73446690 Common::BaseString<char>::BaseString(char const*) 1.84% 27774706 top_down 1.71% 1.75% 4.58% 25863122 26402936 69338208 Common::BaseString<char>::append(unsigned long, char) 1.66% 25123190 Common::U32String::decodeUTF8(char const*, unsigned int) 1.60% 24202440 less256_d 1.56% 1.58% 4.12% 23645624 23877888 62406432 Common::BaseString<char>::BaseString(Common::BaseString<char>&&) 1.52% 22942516 copy16_d 1.41% 1.44% 6.27% 21405368 21765520 94891948 Common::U32String::operator+=(char32_t) 1.40% 1.38% 1.38% 21170554 20819506 20822328 * strlen 1.36% 1.38% 7.76% 20526426 20841760 117471978 Common::Path::encode(char const*, char) 1.30% 1.32% 3.96% 19729626 19955772 59940910 Common::Path::toString(char) const 1.22% 18518114 Common::String::encodeUTF8(Common::U32String const&, char)
Summary
The more files the selected directory has, the more time "AdvancedMetaEngineDetectionBase::composeFileHashMap()" itself, and especially the functions called by it, start taking:
- FOTAQ (3 files): <0.1%, <2%
- DreamWeb (104): 0.9%, 41%
- Tucker (287): 1.7%, 73%
- Unknown (1000): 3.6%, 77%
See attached simplified PDF callgraph of the last case.
comment:10 by , 6 months ago
Thank you for the elements.
First thing first, the composeFileHashMap function is called once per engine.
The more engines ScummVM gets, the more this function gets called and the slower the whole thing will get.
This figure is not visible in your data as it doesn't count function calls.
I did some new tests on Linux, using Callgrind and a folder with 1000 files (as it was slow enough).
The cost per call of composeFileHashMap was still slightly larger in master than in 2.8.
After more investigation, I didn't find an obvious culprit so I tackled some function standing out.
That led me to improve the performance of punycode encoding functions, scumm_stricmp and scumm_strnicmp.
I also gained some performance by not calling appendComponent when not needed in the detection code.
All of this is currently in master (commit 42b4e7ca10de9d6efb843c58e47a41ff93f6c386).
This makes things faster than master from before, but it can't beat 2.8 as we gained 10 new engines since then.
Anyway, this makes the composeFileHashMap faster than in 2.8 for my test case so I'd say we are not that bad.
Can you test to check how it goes now?
Note that I don't expect to do better.
comment:11 by , 6 months ago
I have come up with a more accurate method of measurement so I have retested everything from scratch. Without further ado:
branch-2-8: 128.08s
master before commit 0d549d86f91c72fd22bb9328d2691937d9289097: 166.24s
master before commit 42b4e7ca10de9d6efb843c58e47a41ff93f6c386: 148.41s
latest master: 144.02s
latest master with 2.8 engines: 133.4s
So I'd say your changes had the impact you intended. Great job!
comment:12 by , 6 months ago
| Status: | pending → closed |
|---|
Thanks for the retesting.
As said before, I can't do much more as we pay now for the new engines which got added in the same time.
To keep figures lower, I would have proposed to build the Atari port with the --disable-detection-full configure option but you already do it.
Closing this.

I see this same slowdown. Doing some quick experiments shows that almost all of the slowdown is explained by
Path::equalsIgnoreCase()andPath::hashIgnoreCase().They seem to be going through the full string three times instead of just once now (strchr for the separator, String constructor, and then the actual hashing or comparison), also creating a bunch of temporary strings in the process.
May be worth it to add a fast specialization for these for the non-escaped variants?