Opened 6 years ago
Closed 2 years ago
Last modified 23 months ago
#9705 closed defect (fixed)
FULLPIPE: Crash on start (Big Endian)
|Reported by:||raziel-||Owned by:||sev-|
|Version:||Keywords:||has-backtrace, Big Endian|
ScummVM 1.10.0git (Feb 11 2017 10:16:18)
Features compiled in: Vorbis FLAC MP3 RGB zLib MPEG2 Theora AAC FreeType2 JPEG PNG cloud (servers, local)
I get a crash after starting the game.
No screen loads, just a black screen or window comes up, no sound no graphics.
No backtrace either, but a stacktrace (attached)
Not sure if it helps though, here's whataddr2line says:
addr2line --section=.text -C -fe scummvm 0x5220D8
Full Pipe (Windows/English)
AmigaOS4 - PPC - SDL - BE
gcc (adtools build 5.3.0) 5.3.0
Change History (52)
by , 6 years ago
comment:1 by , 6 years ago
I think it is just you're running out of memory. The line where you're getting a crash is '_paletteData = (byte *)malloc(1024);'
comment:2 by , 6 years ago
I still have 58% of memory free (2 GB Ram) when the crash happens, i don't believe i'm running out of it.
May htere be another reason?
Anything i can do to test?
comment:3 by , 6 years ago
Putting scummvm in a debugger and starting Fullpipe i get this stacktrace:
Your program has crashed! ip = 0x228ee7c
Program has stopped at an unknown point in memory (TRAP)
[engines/fullpipe/statics.cpp: line 1701]: _ZN8Fullpipe8Movement4loadERNS_10MfcArchiveEPNS_15StaticANIObjectE (scummvm)
[engines/fullpipe/statics.cpp: line 234]: _ZN8Fullpipe15StaticANIObject4loadERNS_10MfcArchiveE (scummvm)
[engines/fullpipe/statics.cpp: line 183]: _ZN8Fullpipe5Scene4loadERNS_10MfcArchiveE (scummvm)
Does this help?
by , 6 years ago
|Attachment:||Debug 101 v1.1.2.png added|
Debug snapshot of crash
comment:4 by , 6 years ago
As you can see in the debug window (top right) there's "abadcafe" in one of the registers, which means there was access to memory that hasn't been initialised.
comment:5 by , 6 years ago
It's a game of luck with this engine/game.
I *once* was able to make it run (only when using the non-stripped debug version though, and only in window mode).
The timer counts down and displays the intro (slow)
The game screen comes up but all of the "sprites" are either reduced to gfx meshup lines or (as in the case of the cookoo's clock a big fat black rectangular. No sprites are actually drawn correct.
After the scene with the hole under the bed, a full screen display opens and seconds after that the upper half of that screen is covered in gfx meshup and my entire system goes down with a shriek
comment:6 by , 5 years ago
comment:7 by , 5 years ago
Please give it a go with the latest master and let me know if things get any further than before. Thanks!
by , 5 years ago
comment:8 by , 5 years ago
Thanks for looking at this.
I'm afraid it's pretty much the same.
I can't reach any game screen.
Window stays at the launcher imager, screen stays black.
(new log attached)
comment:9 by , 5 years ago
According to that backtrace, the crash is happening inside of newlib, again failing to allocate memory just like before, but at a different place. I had hoped the reduced number of heap allocations and significantly reduced memory leakage in the latest code would have improved the situation, but it still seems like the system is having memory trouble. If this is the case, it is probably not a ScummVM bug at this point since the game startup doesn’t leak at this point according to instrumentation.
There is some cruft in
configure that changes the way that AmigaOS builds work which might be causing trouble, like forcing static linkage and making other changes to compiler flags which may not be needed with the more modern GCC 5 compiler you’re using, so maybe a place to start would be get rid of these lines for the AmigaOS backend in
configure and try recompiling:
append_var CXXFLAGS "-mlongcall" LDFLAGS=`echo $LDFLAGS | sed 's/-use-dynld//'` append_var LDFLAGS "-static"
and maybe this one too:
append_var LDFLAGS "-Wl,--export-dynamic"
You might also try using
configure --enable-plugins --default-dynamic to reduce the size of the ScummVM image in memory, to allow the game to have more memory, and/or
-Os for the optimisation flag to have the compiler optimise the image size a bit more.
Let me know if any of that makes a difference, otherwise you will probably need to complain upstream (there’s not much we can reasonably do to work around bugs in the system’s memory allocator, if that’s what’s happening, which it appears so). Thanks!
comment:10 by , 5 years ago
Well, the first three settings you mentioned are for cross-compiling...i don't cross compile, i do native compiling
The export-dynamic one i will take out and see if that makes any difference.
But i only do static builds anyway (right now), as shared objects are even more error prone on my platform than static builds.
I just don't get why Full Pipe is the only game that causes memory probelms (not that i don't say it's possible), but why? Is there so much gfx or other mem needed as in all those other games?
comment:11 by , 5 years ago
So, i changed my configure according to your suggestion.
I still get a crash, but maybe the log gets clearer now?
I get a crash in memorypool and now i get this crash always in memorypool and not some random addresses scattered around the code or other running programs.
Not sure, but maybe that helps?
by , 5 years ago
comment:12 by , 5 years ago
Unfortunately it is still just consistently newlib crashing when allocating more memory.
This engine peaks at 180MiB on startup on my machine, which is quite a lot compared to Starship Titanic or SCI32 which both use less than 40MiB. I am not sure if there are any other engines in ScummVM that use so much memory.
If you are getting super unlucky, newlib might be doing a poor job of managing the many small allocations and running out of pages or something even if there is enough physical RAM to handle the load. I’m not blown away by the difference in number of allocations, it makes about 2.5x the number of allocations made by SCI32, though these are more frequently very small (16 byte) allocations. At face value, this doesn’t seem like something that should be breaking. You might try creating a small test app that makes a lot of small heap allocations and see what happens.
For reference here are some rough numbers on the top 10 allocations by total number of allocations at startup of the full version of Full Pipe on my system, which you might be able to use to create some synthetic test app to crash the allocator:
Category Persistent Bytes # Persistent # Transient Total Bytes # Total # Transient/Total Malloc 16 Bytes 49.31 KiB 3156 22787 405.36 KiB 25943 Ratio: %0.01, %0.08 Malloc 32 Bytes 403.00 KiB 12896 9658 704.81 KiB 22554 Ratio: %0.05, %0.03 Malloc 4.00 KiB 300.00 KiB 75 9776 38.48 MiB 9851 Ratio: %0.00, %0.03 StdioStream 32 Bytes 1 9690 302.84 KiB 9691 Ratio: %0.00, %0.03 Common::MemoryReadStream 528 Bytes 11 9671 453.84 KiB 9682 Ratio: %0.00, %0.03 Fullpipe::MemoryObject2 992.14 KiB 9071 0 992.14 KiB 9071 Ratio: %0.03, %0.00 Malloc 1.00 KiB 8.57 MiB 8774 29 8.60 MiB 8803 Ratio: %0.03, %0.00 Fullpipe::DynamicPhase 1.84 MiB 8579 0 1.84 MiB 8579 Ratio: %0.03, %0.00 Fullpipe::GameVar 572.25 KiB 5232 0 572.25 KiB 5232 Ratio: %0.02, %0.00 Malloc 128 Bytes 152.00 KiB 1216 2685 487.62 KiB 3901 Ratio: %0.00, %0.01
There’s a fairly long tail of other allocations (total numbers are 69580 persistent, 84454 transient) but this seems like a place to start.
Hopefully we will be able to collect a bit more data during release testing to see if there are any other systems which have this problem, though I am not really sure outside of the big-five how many other platforms are even going to be powerful enough to run this game regardless.
comment:13 by , 5 years ago
I'm going to relay and quote answer from an OS dev:
There certainly might be a problem when there are so many small allocations. This is why on AmigaOS it is recommended to use memory pools and item pools for frequent allocations/deallocations of small-size buffers.
I quite doubt such a trivial test has never been done by the Amiga newlib maintainer(s)
One more thing: there's a whole world of difference if the small amounts just keep getting allocated (and are only freed when the program ends), or if they are constantly allocated and deallocated throughout program runtine. Because if the latter is the case, the "loss of memory" problem is much more likely. Could you possibly ask the authors about that?
comment:14 by , 5 years ago
|Priority:||blocker → high|
Yes, the game does these sorts of allocations all over the place throughout its lifetime, though whether or not this is so is not really relevant here since the game is crashing already on startup. :)
We do have code for creating memory pools, Common::MemoryPool. I’m happy to accept a patch to use MemoryPool where necessary in this engine, though unless this shows up in other platforms it’s not something that I plan on spending time working on and so won’t be addressed for the next release.
comment:15 by , 5 years ago
|Game:||→ Full Pipe|
|Summary:||FULLPIPE: Crash on start → FULLPIPE: AmigaOS: Crash on start|
comment:16 by , 5 years ago
A core dev was nice enough to check the demo of FullPipe (which also crashes) with the latest newlib
I tried it with the latest beta version of newlib.
The crash is in the dlmalloc code but is not because of lack of memory.
Assuming that the dlmalloc is not bugged (seems a little unlikely since
many other programs are using it without problems) what could cause the
crash is if data before or after an allocation is trashed (as in a
buffer overflow for instance).
Aynthing i could do, to further corner the problem?
comment:17 by , 4 years ago
Some good soul helped with debugging the crash and i collected some more information.
If another good soul from the ScummVM team could please take a look I'd be grateful:
Fullpipe crashed for me as well.
C++ new operator uses probably malloc too (what else?).
Newlib and malloc started to ring some bells and it seems that ScummVM does sometimes realloc(). I wonder if this could break the allocation scheme? Here is some info: http://forum.hyperion-entertainment.com/viewtopic.php?f=14&t=1591&hilit=realloc
I hacked custom malloc/calloc/realloc/free functions into ScummVM and Fullpipe started without crash 3 times. It crashed on each exit though - probably my fault :) By the way game sprite graphics looks buggy, perhaps endian-related.
I think realloc issue is not fully confirmed but it's one potential clue.
Tried also Fullpipe on Linux with Valgrind and Clang but didn't see much.
Here is my patch. Currently I'm able to start and exit FP successfully: http://capehill.kapsi.fi/scummvm/
There is a primitive attempt to profile drawing, shader compilation hack, memory allocation workaround in fullpipe.cpp and finally, some realloc() workaround attempt (but it didn't change a thing).
Maybe it would be possible to capture all malloc/free/realloc/calloc calls and create a newlib testcase based on that data. Data might be captured over serial, for example.
Could you maybe describe the malloc workaround a bit?
What do you think is going wrong in the scummvm code and what are you "fixing".
Maybe the devs at ScummVM don't want to investigate and simply add that exclusion for AmigaOS4, but history tells me they do want to support all platforms properly.
Workaround just provides global symbols called malloc, free, calloc and realloc and linker then uses these to put the binary together. Newlib's symbols get by-passed: now heap allocations are requested directly from ExecSG.
This impacts also other libraries that are linked with ScummVM and need those symbols, including SDL.
I don't suspect ScummVM doing "wrong" currently.
I recorded memory calls of Fullpipe startup and then tried to replay these with a pure Newlib-based test program and problem did NOT reproduce. It seems complicated.
If there is a heap corruption by ScummVM then maybe it's somehow AmigaOS(+newlib)-specific and doesn't trigger Valgrind on Linux.
On the other hand, after replacing Newlib-calls with IExec ones, issue didn't seem to occur.
We could try to put canaries before and after each heap allocation. If we get lucky, one of the "canaries" gets hurt and then it may be possible to slowly debug more, knowing the size of the allocation for the next attempt.
I see dead birds. Could be a bug on my side but after first tests, of 150000 free'd allocations, there were about 300 allocations where the tail end was smashed. Have to investigate more.
* Birdwatch: good canaries 149822, bad canaries 340
The canary patch (fullpipe.cpp part) is here if you want to play: http://capehill.kapsi.fi/scummvm/
Debugged little bit more and put a trap instruction inside dead canary detector. It seems that there is some bitmap freed:
(0x64D93E40) [engines/fullpipe/fullpipe.cpp:162] scummvm:free()+0x1a4 (section 8 @ 0x236C8) (0x64D93E90) [engines/fullpipe/fullpipe.cpp:162] scummvm:free()+0x1a4 (section 8 @ 0x236C8) (0x64D93EC0) [engines/fullpipe/gfx.cpp:715] scummvm:_ZN8Fullpipe6BitmapD1Ev()+0x50 (section 8 @ 0x36EE8) (0x64D93EE0) scummvm:_ZN6Common14DefaultDeleterIN8Fullpipe6BitmapEEclEPS2_()+0x38 (section 8 @ 0x3D548) (0x64D93F00) [engines/fullpipe/gfx.cpp:255] scummvm:_ZN6Common9ScopedPtrIN8Fullpipe6BitmapENS_14DefaultDeleterIS2_EEE5resetEPS2_()+0x34 (section 8 @ 0x3D59C) (0x64D93F30) [engines/fullpipe/gfx.cpp:422] scummvm:_ZN8Fullpipe7Picture11freePictureEv()+0xf0 (section 8 @ 0x375BC) (0x64D93F50) [engines/fullpipe/gfx.cpp:412] scummvm:_ZN8Fullpipe7PictureD2Ev()+0x30 (section 8 @ 0x377C0) (0x64D93F70) scummvm:_ZN8Fullpipe11StaticPhaseD2Ev()+0x44 (section 8 @ 0xA47C8) (0x64D93F90) scummvm:_ZN8Fullpipe12DynamicPhaseD2Ev()+0x34 (section 8 @ 0xA482C) (0x64D93FB0) scummvm:_ZN8Fullpipe7StaticsD0Ev()+0x54 (section 8 @ 0xA48B0) (0x64D93FD0) scummvm:_ZN6Common14DefaultDeleterIN8Fullpipe7StaticsEEclEPS2_()+0x44 (section 8 @ 0xA3A58) (0x64D93FF0) [engines/fullpipe/statics.cpp:145] scummvm:_ZN6Common8for_eachIPPN8Fullpipe7StaticsENS_14DefaultDeleterIS2_EEEET0_T_S8_S7_()+0x44 (section 8 @ 0xA3AB4) (0x64D94010) [engines/fullpipe/statics.cpp:115] scummvm:_ZN8Fullpipe15StaticANIObjectD0Ev()+0x68 (section 8 @ 0xA21E0) (0x64D94030) [engines/fullpipe/scene.cpp:124] scummvm:_ZN8Fullpipe5SceneD0Ev()+0x118 (section 8 @ 0x84BBC) (0x64D94060) [engines/fullpipe/gameloader.cpp:421] scummvm:_ZN8Fullpipe10GameLoader11unloadSceneEi()+0xf8 (section 8 @ 0x2BB44) (0x64D94090) [engines/fullpipe/gameloader.cpp:387] scummvm:_ZN8Fullpipe10GameLoader12preloadSceneEii()+0x254 (section 8 @ 0x2CB2C) (0x64D940F0) [engines/fullpipe/gameloader.cpp:572] scummvm:_ZN8Fullpipe10GameLoader13updateSystemsEi()+0xc8 (section 8 @ 0x2CDC4) (0x64D94110) [engines/fullpipe/modal.cpp:233] scummvm:_ZN8Fullpipe10ModalIntro6finishEv()+0xc4 (section 8 @ 0x62268) (0x64D94130) [engines/fullpipe/modal.cpp:91] scummvm:_ZN8Fullpipe10ModalIntro4initEi()+0x54 (section 8 @ 0x64198) (0x64D94170) [engines/fullpipe/fullpipe.cpp:630] scummvm:_ZN8Fullpipe14FullpipeEngine12updateScreenEv()+0x13c (section 8 @ 0x24000) (0x64D94190) [engines/fullpipe/fullpipe.cpp:440] scummvm:_ZN8Fullpipe14FullpipeEngine3runEv()+0x5b8 (section 8 @ 0x25394) (0x64D94320) [base/main.cpp:327] scummvm:_Z7runGamePK6PluginR7OSystemRKN6Common6StringE()+0xd48 (section 8 @ 0x80A0) (0x64D94840) [base/main.cpp:599] scummvm:scummvm_main()+0xde8 (section 8 @ 0x953C) (0x64DDA0E0) [backends/platform/sdl/amigaos/amigaos-main.cpp:79] scummvm:main()+0x260 (section 8 @ 0x6AF8)
It could be that heap corruption has something to do with endian issues. To fix endian issues, somebody would have to go through all file loading code in fullpipe engine.
Here is one obvious bug (but doesn't fix graphics):
It gives just bogus reading for BE.
by , 3 years ago
Crashlog with plugins instead of static
comment:18 by , 3 years ago
I added a new crashlog.
This time with a shared build and using plugins.
Someone told me that it "could" be easier to track down a bug, since, if there is an out of bounds read, it will not read somewhere in the huge static binary, but stay inside the .so it was triggered...so, here goes.
by , 3 years ago
...and the serial stack backtrace
comment:19 by , 3 years ago
I now get the crash always at the same place (see the last crashlog - engines/fullpipe/gameload.cpp, 309), is there anything i could try to fix this?
comment:20 by , 3 years ago
|Summary:||FULLPIPE: AmigaOS: Crash on start → FULLPIPE: Crash on start (AmigaOS4)|
by , 3 years ago
comment:21 by , 3 years ago
sev asked me on Discord if I coulp help for that bug. I have an old PowerBook G4 here, and I can reproduce the problem here on Debian PowerPC bullseye/sid.
With fullpipe-demo-de.zip, and a new debug build without -gsplit-dwarf (since it looks like Debian unstable has a GDB bug when using this, at the moment), I got a gdb backtrace which I've attached (gdb-debian-powerpc-scummvm-fullpipe1.txt).
I can do more debug steps and test some patches if this helps.
comment:22 by , 3 years ago
Oh man, thank you both...standing by for testing :-)
comment:23 by , 3 years ago
|Keywords:||Big Endian added|
|Summary:||FULLPIPE: Crash on start (AmigaOS4) → FULLPIPE: Crash on start (Big Endian)|
by , 3 years ago
comment:24 by , 3 years ago
No problem, I'm always happy to help debugging something on some uncommon arch :)
I've also attached a Valgrind dump, with ScummVM run with "-d 7" (it's big, so I've gzipped it). Had to build ScummVM with a null output, though, because of some other bug between Valgrind PPC and X11...
Hope this adds some useful info.
by , 3 years ago
comment:25 by , 3 years ago
After pulling the fix in 285bc1421e04a6ea476b2ce898513ddbb56dcc4c, we go a little further, but it still crashes after pressing Esc after the bed appears.
New gdb log attached.
comment:26 by , 3 years ago
I made a couple of commits based on the outputs which you provided, so if you could rerun it again with -d5 flag, that would be awesome. Also, please press ESC several times until you see the character falling down from a pipe.
comment:27 by , 3 years ago
The last crash is weird. It crashed in
__GI_fseek (fp=0x9b99b0, offset=103312076, whence=0) at fseek.c:36, and that offset is valid, because the file size is 110478134.
I just pushed 73508f6111a80d1d93277117cb183791b36ba01b with more debug output.
comment:28 by , 3 years ago
Ah, I looked up in the Debian sources for "mismatching next->prev_size". It looks like memory override, thus, valgrind would help here a lot.
by , 3 years ago
comment:29 by , 3 years ago
Here's what I get under Valgrind with 73508f6111a80d1d93277117cb183791b36ba01b, and SDL_VIDEODRIVER=dummy SDL_AUDIODRIVER=dummy (because of a Valgrind issue with X11 on ppc).
It's been running for hours (since I can't have X11 and Valgrind, the dummy SDL output doesn't let me press the Esc key), so I've cut it after some time. Hope it still contains some useful information, though.
I'll try running this on OpenBSD/macppc, too, it has some useful malloc() debug flags which may be much faster than Valgrind.
by , 3 years ago
comment:30 by , 3 years ago
And here's what I get on OpenBSD/macppc 6.7, with sysctl vm.malloc_conf=CFGS, which adds some malloc() canaries.
The crash appears when pressing Esc after the bed appears in the intro.
comment:31 by , 3 years ago
I made another commit. Interesting, that this time it did not crash, you just stopped it.
comment:32 by , 3 years ago
Thanks. With 409280f6e0fd08f68a0c7fc6d163540a51aa4517, it still triggers a malloc() canary on OpenBSD. New log.
Also, I see that graphical corruption issue that's reported in #9713.
comment:33 by , 3 years ago
Ah, I had that thing fixed in my branch. I just pushed another change.
by , 3 years ago
comment:34 by , 3 years ago
Thanks. With 9caa6e318c6e6e6607b878ef2451d9986964f5d8 (SharedPtr), it now runs on Debian PowerPC (still with the graphical glitch from #9713, though). It might be enough for AmigaOS too (can't test that), but it still triggers a canary on OpenBSD (which is quite strict).
So here comes gdb-openbsd-macppc-malloc-cfgs-3.txt.
by , 3 years ago
comment:35 by , 3 years ago
Sorry for the long delay, work...
With todays master the crash is finally gone.
Thank you so much for the fixes
and @dwatteau please stay around for more Big Endian madness :-)
I can also confirm #9713 is still valid
comment:36 by , 3 years ago
|Priority:||high → normal|
comment:37 by , 2 years ago
No more (default) crashing, thanks a lot
comment:38 by , 2 years ago
|Status:||new → closed|
comment:39 by , 23 months ago
|Component:||Engine: Fullpipe → Engine: NGI|