Opened 2 years ago

Closed 6 months ago

#14853 closed defect (fixed)

COMMON: Rework path handling causes performance regression

Reported by: mikrosk Owned by: lephilousophe
Priority: normal Component: Common
Version: Keywords: performance, regression
Cc: Game:

Description

https://github.com/scummvm/scummvm/pull/5467 introduced a new (safer) way how to handle file paths.

I was looking forward to this change because I tested its early iteration and it brought quite visible speedup when adding new games.

Surprisingly, when the dust settled after the 2.8.0 release we have found out that our Atari backend regressed in master: adding a game with a lot of files (e.g. Dreamweb with ~100 files) could lead to nearly twice as slow game detection.

Some numbers:

Atari backend, attempting to add a folder with 1000 generated files:

  • 2.8.0: 33s
  • master: 58s

Linux SDL backend, attempting to add a folder with 100 000 generated files ("for i in $(seq 99999); do touch $(printf "%04d" $i); done"):

  • 2.8.0: 17s
  • master: 28s

Naturally, no game is going to have 100 000 files, this is just to demonstrate that the powerful CPU can hide even a performance drop to 60% of the original speed.

Fortunately, thanks to LP's responsible approach, each commit is compilable and testable: so in this case the offending commit is https://github.com/scummvm/scummvm/commit/645a35c0 (not so surprising as it directly manipulates the detection code).

Attachments (1)

add-unknown-1000.pdf (30.3 KB ) - added by mikrosk 6 months ago.

Download all attachments as: .zip

Change History (13)

comment:1 by wjp, 2 years ago

I see this same slowdown. Doing some quick experiments shows that almost all of the slowdown is explained by Path::equalsIgnoreCase() and Path::hashIgnoreCase().

They seem to be going through the full string three times instead of just once now (strchr for the separator, String constructor, and then the actual hashing or comparison), also creating a bunch of temporary strings in the process.

May be worth it to add a fast specialization for these for the non-escaped variants?

comment:2 by mikrosk, 2 years ago

With https://github.com/scummvm/scummvm/pull/5621 merged, I'm happy to report that adding/detecting Dreamweb is now much faster than in 2.8! So the original speed boost I remembered is definitely present/visible.

However, the when it comes to hashing/comparing, there's still some performance decrease if the amount of files is too big: adding a folder with 999 (generated) files is still about 20% slower than in 2.8. The aforementioned PR definitely helped but it's still slower. I can provide a detailed call graph if it helps.

Last edited 2 years ago by mikrosk (previous) (diff)

comment:3 by lephilousophe, 7 months ago

Owner: set to lephilousophe
Resolution: fixed
Status: newpending

What's the status of this?
IIRC we did some additional fixes.

comment:4 by mikrosk, 6 months ago

I have just tested it with the generated 999 files scenario:

branch-2-8: 155.72s
latest master: 199.44s

So yes, current master is still about 80% of the 2.8's speed when adding a new game.

comment:5 by lephilousophe, 6 months ago

Thanks for testing again.
Do you have a call graph (or a flame graph) as you proposed? :)

comment:6 by i30817, 6 months ago

Sorry to bug you but since the bug reporter is complaining about scan speed.

Scummvm scans much much faster if you scan from the command line (it also has less false positives in games that falsely scan a inside dir as another version of the game).

The reason is that the command line isn't recursive, so it just checks the top level of the directory given and adds the game or not.

Since I'm in Linux, I can easily automate this for all immediate subdirs in multiple directories like this in the cmd line:

rm -f add.log
find ScummVM ScummVM_Text/ ScummVM_AGS/ -mindepth 1 -maxdepth 1 -type d -exec echo "{}" >> add.log \; -exec scummvm -a -p"{}"

the log file is for if you want to check what games aren't being added in the dirs given (usually because they aren't at the top level dir to be scanned, but sometimes scummvm problems show like the bug I opened about discworld noir and director noir).

Strangely adding games this way tends to add more games even if they don't have the checksums in scummvm. The reason is that adding this way, unlike the gui, will not repeat gameids. And unknown games in the same engine have the same Id (eg: 'Unknown Adventure Game Studio game'). Instead, after the first, it will add the first possibility of the fallbacks (the thing you see when you add a unknown checksum game). Since those possibilities are checked\added by signature files filenames, that one is usually right.

In short, it saves a LOT of time.

comment:7 by i30817, 6 months ago

tends to add more games -> tends to add more games with the correct name

comment:8 by i30817, 6 months ago

Also you probably want the paths given to be absolute. I had them relative just as a example, not sure it works without a absolute path (start from / until the directories holding the game directories you want to scan).

by mikrosk, 6 months ago

Attachment: add-unknown-1000.pdf added

comment:9 by mikrosk, 6 months ago

Thanks to Eero Tamminen who has profiled various use cases I can provide you with a rich set of profiling results:

FOTAQ (3 files):

Time spent in profile = 8.26151s.
...
Used cycles:
  24.42%                    64723246                       set256
   8.43%   8.56%   8.56%    22356134  22687874  22691592   tolower
   6.32%   6.44%   6.63%    16740636  17059548  17577978   Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, unsigned char)
   3.75%   3.57%   6.95%     9948728   9471806  18426242 * Common::hashit_lower(char const*)
   3.22%   3.24%   8.10%     8530456   8578314  21467158   Common::String::equalsIgnoreCase(Common::String const&) const

DreamWeb (~100 files):

Time spent in profile = 19.08808s.
...
Used cycles:
  18.64%                   114132508                       set256
   7.44%   7.56%   7.56%    45570176  46284960  46291280   tolower
   3.50%   2.12%   4.12%    21462472  12976494  25202188 * Common::hashit_lower(char const*)
   3.39%   3.47%   3.47%    20757570  21236280  21240422   strchr
   2.90%   2.94%   5.09%    17777074  18000554  31198006   Common::BaseString<char32_t>::append(unsigned long, char32_t)
   2.78%   1.59%   3.87%    17021506   9738806  23722970 * Common::String::equalsIgnoreCase(Common::String const&) const
   2.74%   2.78%   2.87%    16776518  16997686  17584226   Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, uns
igned char)
   2.56%   2.62%   2.70%    15673920  16029534  16506548   Common::BaseString<char>::ensureCapacity(unsigned int, bool)
   2.18%   2.22%   2.24%    13336494  13581668  13730712   Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool)
   2.14%   2.19%   5.42%    13075922  13406536  33164776   Common::BaseString<char>::append(char const*)
   2.01%   2.03%   5.40%    12322480  12461080  33079852   Common::BaseString<char>::append(unsigned long, char)
   1.88%                    11498756                       ROM_TOS
   1.83%   1.86%  22.39%    11218950  11409332 137121990   Common::punycode_encodefilename(Common::U32String const&)
   1.81%   1.82%  21.55%    11097116  11144916 131994772   Audio::MixerImpl::mixCallback(unsigned char*, unsigned int)
   1.57%   1.60%  14.95%     9634502   9820150  91569596   Common::punycode_encode(Common::U32String const&)
   1.50%   1.50%   6.60%     9162956   9203576  40410036   Common::U32String::operator+=(char32_t)
   1.35%   1.36%   1.36%     8289502   8354752   8356456   strlen
   1.33%   1.35%   1.35%     8149050   8279324   8280528   Graphics::VectorRendererSpec<unsigned char>::applyScreenShading(GUI::ThemeEngine::ShadingStyle)
   1.33%                     8138216                       Common::U32String::decodeUTF8(char const*, unsigned int)
   1.17%   1.27%   1.29%     7180466   7799142   7888284   Common::BaseString<char>::~BaseString()
   1.08%   1.10%   6.47%     6597538   6708378  39630474   Common::String::operator+=(char)
   0.98%   0.99%   3.04%     5978096   6043216  18636164   Common::BaseString<char>::BaseString(char const*)
   0.97%   0.98%  15.80%     5938560   5999262  96773466   AdvancedMetaEngineDetectionBase::detectGame(Common::FSNode const&, Common::HashMap<Common::Path, Common::FS
Node, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo> const&, Common::Language, Common::Platform, Common::String const&, unsigned int, bool)
   0.94%                     5769110                       areeven
   0.92%   0.94%  41.16%     5655470   5730882 252077854   AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const
   0.90%                     5523230                       Common::String::encodeUTF8(Common::U32String const&, char)

Bud Tucker (nearly 300 files): https://downloads.scummvm.org/frs/demos/tucker/tucker-dos-demo.zip

Time spent in profile = 29.49824s.
...
Used cycles:
   6.33%   6.44%   6.44%    59955326  60946382  60955538   tolower
   5.62%                    53219164                       set256
   4.35%   4.43%   4.43%    41166544  41912456  41919030   strchr
   4.31%   4.37%   7.52%    40798584  41343440  71191320   Common::BaseString<char32_t>::append(unsigned long, char32_t)
   3.49%   1.07%   2.08%    32986150  10091028  19685358 * Common::hashit_lower(char const*)
   3.44%   3.47%   3.56%    32563428  32852604  33728660   Common::BaseString<char>::ensureCapacity(unsigned int, bool)
   3.25%   3.28%   8.17%    30739268  31017176  77369806   Common::BaseString<char>::append(char const*)
   3.25%   3.29%   3.30%    30723482  31110278  31275202   Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool)
   2.87%   2.93%  34.96%    27169450  27706538 330850600   Common::punycode_encodefilename(Common::U32String const&)
   2.85%   2.87%   2.87%    27000896  27163522  27167086   timesub.isra.0
   2.64%   2.65%   7.11%    24957458  25107594  67282674   Common::BaseString<char>::append(unsigned long, char)
   2.51%   2.57%  23.46%    23761364  24303758 222060192   Common::punycode_encode(Common::U32String const&)
   2.44%                    23120686                       ROM_TOS
   2.22%   2.25%   9.78%    21015652  21321480  92538426   Common::U32String::operator+=(char32_t)
   2.12%   0.92%   2.30%    20105116   8747222  21789654 * Common::String::equalsIgnoreCase(Common::String const&) const
   2.02%                    19107772                       Common::U32String::decodeUTF8(char const*, unsigned int)
   1.78%   1.80%   1.86%    16811486  17032388  17607928   Graphics::VectorRendererSpec<unsigned char>::drawBevelSquareAlg(int, int, int, int, int, unsigned char, unsigned char)
   1.70%   1.74%  72.87%    16119048  16474220 689688736   AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const
   1.62%   1.74%   1.76%    15320332  16444722  16646742   Common::BaseString<char>::~BaseString()
   1.57%   1.57%   1.57%    14829724  14833232  14836462   strlen
   1.46%   1.48%  10.29%    13823158  14004126  97346008   Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>::lookupAndCreateIfMissing(Common::Path const&)
   1.42%                    13462424                       Common::String::encodeUTF8(Common::U32String const&, char)
   1.42%   1.43%   8.52%    13401228  13553898  80682654   Common::String::operator+=(char)
   1.29%   1.31%   2.82%    12189390  12382552  26667974   Common::BaseString<char>::BaseString(Common::BaseString<char> const&)
   1.27%   1.30%   9.43%    11974204  12319716  89218530   Common::String::operator+=(char const*)
   1.19%                    11300100                       top_down
   1.15%                    10922474                       common2
   1.04%   1.05%   3.11%     9885244   9917700  29458164   Common::BaseString<char>::BaseString(char const*)

Unknown / 1000 empty files: for i in $(seq 999); do touch $(printf "%03d" $i); done

Time spent in profile = 47.15388s.
...
Used cycles:
   5.28%   5.36%   5.36%    79890114  81068938  81082612   tolower
   4.52%                    68419008                       set256
   3.58%   3.63%  76.81%    54137012  548850401162013508   AdvancedMetaEngineDetectionBase::composeFileHashMap(Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>&, Common::FSList const&, int, Common::Path const&) const
   3.10%   3.16%   3.17%    46916114  47880444  47887554   strchr
   3.08%   3.12%  13.15%    46605210  47238284 198976936   Common::HashMap<Common::Path, Common::FSNode, Common::Path::IgnoreCase_Hash, Common::Path::IgnoreCase_EqualTo>::lookupAndCreateIfMissing(Common::Path const&)
   2.90%   0.64%   1.25%    43917318   9742630  18956808 * Common::hashit_lower(char const*)
   2.87%   2.93%   2.93%    43490364  44285108  44375388   Common::BaseString<char>::~BaseString()
   2.81%   2.85%   4.83%    42536860  43145034  73097796   Common::BaseString<char32_t>::append(unsigned long, char32_t)
   2.76%   2.81%   8.04%    41766818  42443872 121688982   Common::BaseString<char>::BaseString(Common::BaseString<char> const&)
   2.48%   2.52%  20.40%    37592116  38177870 308685436   Common::punycode_encode(Common::U32String const&)
   2.45%   2.48%  29.30%    37031828  37564898 443229266   Common::punycode_encodefilename(Common::U32String const&)
   2.44%   2.47%   2.47%    36870254  37299992  37419268   Common::BaseString<char>::ensureCapacity(unsigned int, bool)
   2.25%   2.29%   2.30%    33989434  34618342  34773002   Common::BaseString<char32_t>::ensureCapacity(unsigned int, bool)
   2.20%                    33218326                       ROM_TOS
   2.18%   2.21%   2.21%    32942684  33371704  33376564   timesub.isra.0
   2.09%   0.57%   1.42%    31663784   8581180  21473600 * Common::String::equalsIgnoreCase(Common::String const&) const
   2.05%   2.10%   5.25%    31077204  31740514  79490836   Common::BaseString<char>::append(char const*)
   1.90%                    28674866                       common2
   1.88%   1.92%   4.85%    28485862  29070728  73446690   Common::BaseString<char>::BaseString(char const*)
   1.84%                    27774706                       top_down
   1.71%   1.75%   4.58%    25863122  26402936  69338208   Common::BaseString<char>::append(unsigned long, char)
   1.66%                    25123190                       Common::U32String::decodeUTF8(char const*, unsigned int)
   1.60%                    24202440                       less256_d
   1.56%   1.58%   4.12%    23645624  23877888  62406432   Common::BaseString<char>::BaseString(Common::BaseString<char>&&)
   1.52%                    22942516                       copy16_d
   1.41%   1.44%   6.27%    21405368  21765520  94891948   Common::U32String::operator+=(char32_t)
   1.40%   1.38%   1.38%    21170554  20819506  20822328 * strlen
   1.36%   1.38%   7.76%    20526426  20841760 117471978   Common::Path::encode(char const*, char)
   1.30%   1.32%   3.96%    19729626  19955772  59940910   Common::Path::toString(char) const
   1.22%                    18518114                       Common::String::encodeUTF8(Common::U32String const&, char)

Summary

The more files the selected directory has, the more time "AdvancedMetaEngineDetectionBase::composeFileHashMap()" itself, and especially the functions called by it, start taking:

  • FOTAQ (3 files): <0.1%, <2%
  • DreamWeb (104): 0.9%, 41%
  • Tucker (287): 1.7%, 73%
  • Unknown (1000): 3.6%, 77%

See attached simplified PDF callgraph of the last case.

Last edited 6 months ago by mikrosk (previous) (diff)

comment:10 by lephilousophe, 6 months ago

Thank you for the elements.

First thing first, the composeFileHashMap function is called once per engine.
The more engines ScummVM gets, the more this function gets called and the slower the whole thing will get.
This figure is not visible in your data as it doesn't count function calls.

I did some new tests on Linux, using Callgrind and a folder with 1000 files (as it was slow enough).

The cost per call of composeFileHashMap was still slightly larger in master than in 2.8.
After more investigation, I didn't find an obvious culprit so I tackled some function standing out.

That led me to improve the performance of punycode encoding functions, scumm_stricmp and scumm_strnicmp.
I also gained some performance by not calling appendComponent when not needed in the detection code.

All of this is currently in master (commit 42b4e7ca10de9d6efb843c58e47a41ff93f6c386).
This makes things faster than master from before, but it can't beat 2.8 as we gained 10 new engines since then.
Anyway, this makes the composeFileHashMap faster than in 2.8 for my test case so I'd say we are not that bad.

Can you test to check how it goes now?
Note that I don't expect to do better.

comment:11 by mikrosk, 6 months ago

I have come up with a more accurate method of measurement so I have retested everything from scratch. Without further ado:

branch-2-8: 128.08s
master before commit 0d549d86f91c72fd22bb9328d2691937d9289097: 166.24s
master before commit 42b4e7ca10de9d6efb843c58e47a41ff93f6c386: 148.41s
latest master: 144.02s
latest master with 2.8 engines: 133.4s

So I'd say your changes had the impact you intended. Great job!

comment:12 by lephilousophe, 6 months ago

Status: pendingclosed

Thanks for the retesting.
As said before, I can't do much more as we pay now for the new engines which got added in the same time.

To keep figures lower, I would have proposed to build the Atari port with the --disable-detection-full configure option but you already do it.

Closing this.

Note: See TracTickets for help on using tickets.