Opened 17 months ago

Closed 5 months ago

#13953 closed defect (fixed)

TINSEL: DW2 - random setMusicSceneDetails() mutex lock-ups when changing scenes

Reported by: dwatteau Owned by: PushmePullyu
Priority: normal Component: Engine: Tinsel
Version: Keywords: mutex, lock-up
Cc: Game: Discworld II

Description

ScummVM 2.6.1 and current Git HEAD. I can only trigger this on OSX PPC, but I think this may be a general issue that's more likely to be triggered there (see below).

In a full gameplay (9 hours) of Discworld II DOS/GB, the game randomly got stuck on the fade-to-black screen around 15 times when leaving a scene/room for the next one. I had to force quit ScummVM, restart it and load the nearest save. DW1 was very stable on the same setup, in comparison.

Areas where this would frequently happen when leaving the scene were: the cemetery, the vampire's crypt, the University garden, and the place zooming on the two or three camels in Djelibeybi. The latter is where it was the most likely to happen (more than 1 time out of 10).

When this happens, the engine appears to (randomly) get stuck right after Setting music scene details: camels.mus is displayed, when PCMMusicPlayer::setMusicSceneDetails() calls setVol(255). This brings us to MixerImpl::setChannelVolume() calling Common::StackLock lock(_mutex) but never coming back from it.

Attaching some saves and logs below...

FWIW, this port is big-endian ppc32, uses GCC 4.8.5 and SDL 1.2. The bug happens with either a -O2 release build or a -O0 debug build. With Debian ppc32 on the same machine, I can't reproduce it, but I wonder if OSX PPC 10.4/10.5 isn't just "luckier" in triggering some threading issues, as in bug #13661 for SAGA.

Attachments (3)

dw2-gb.055 (2.7 KB ) - added by dwatteau 17 months ago.
DW2 DOS/GB save. Click on the two camels so that you go to the scene where they're zoomed-in. Then, leave and come back to that scene over and over, until the game lockups. This happens 1 time out of 10 on OSX ppc, but I haven't been able to trigger it on other platforms yet.
dw2_gdb_stuck_ppc.txt (2.8 KB ) - added by dwatteau 17 months ago.
GDB backtrace when the engine locks up.
tinsel-music-threadsanitizer.txt (11.8 KB ) - added by dwatteau 17 months ago.
For what it's worth, ThreadSanitizer appears to find a data race in the same area... here's the log.

Download all attachments as: .zip

Change History (8)

by dwatteau, 17 months ago

Attachment: dw2-gb.055 added

DW2 DOS/GB save. Click on the two camels so that you go to the scene where they're zoomed-in. Then, leave and come back to that scene over and over, until the game lockups. This happens 1 time out of 10 on OSX ppc, but I haven't been able to trigger it on other platforms yet.

by dwatteau, 17 months ago

Attachment: dw2_gdb_stuck_ppc.txt added

GDB backtrace when the engine locks up.

comment:1 by raziel-, 17 months ago

@dwatteau

No such lockup on AmigaOS4 / BE / PPC
gcc (adtools build 11.3.0) 11.3.0
Compiled with -O3

DOS/English(GB) version here as well.

But there is one thing to mention, when you stand around (after loading the save) roughly five seconds, a cutscene triggers where the camel dealer arrives.

I couldn't make it so that the cutscenes triggers when i changed rooms at the exact point, but maybe that cutscene playing intervenes with the scene change (exact frame probably?) and causes the lockup?
Then again, it seems to happen on nearly all fade-to-blacks?

If any more tests are needed, please shout

comment:2 by dwatteau, 17 months ago

@raziel- Thanks for your test!

I don't think this is related to the guy who arrives on a camel's back. Because he only appears near the end of the game, but I can also trigger this issue in Act 2 or once he's gone. Similar lock-ups happen in some other parts of the game, too. With the camels it just appears to be a bit more likely to happen, for some reason...

I wonder if this could be related to the camels triggering a very frequent noise with their mouths; i.e. when you leave the scene, you're switching from a background music to another one, but with some timing both the music and SFX will be triggered at the same time, leading to this lock-up or something like that. Just a very wild guess while putting some logs around setMusicSceneDetails() though... I may be completely wrong.

https://www.lspace.org/games/discworld/faq.html#part3.5 mentions that there originally were some random issues around the camels, but I think our engine is based on the later interpreter which fixed this, and lock-ups aren't mentionned there.

by dwatteau, 17 months ago

For what it's worth, ThreadSanitizer appears to find a data race in the same area... here's the log.

comment:3 by PushmePullyu, 10 months ago

Interesting find, thanks for the info! This looks like a deadlock due to a locking order problem with "Audio::MixerImpl::_mutex" and "Tinsel::PCMMusicPlayer::_mutex". The main thread seems to lock them in reverse order compared to the audio thread.

I can reproduce the bug on Linux x86_64 by adding a delayMillis() call in setMusicSceneDetails() (see https://github.com/PushmePullyu/scummvm/tree/tinsel-dw2-repro-music-deadlock, which also includes a commented out potential fix). Starting a new game or changing scenes with this immediately deadlocks.

I will open a PR once I have done some more testing.

comment:4 by PushmePullyu, 5 months ago

PR: https://github.com/scummvm/scummvm/pull/5475

Apologies for the delay, I had accidentally marked this as completed on my to-do list.

comment:5 by bluegr, 5 months ago

Owner: set to PushmePullyu
Resolution: fixed
Status: newclosed

Fixed in the aforementioned PR: https://github.com/scummvm/scummvm/pull/5475

Note: See TracTickets for help on using tickets.