Opened 8 years ago

Closed 4 years ago

Last modified 4 years ago

#9705 closed defect (fixed)

FULLPIPE: Crash on start (Big Endian)

Reported by: raziel- Owned by: sev-
Priority: normal Component: Engine: NGI
Version: Keywords: has-backtrace, Big Endian
Cc: Game: Full Pipe

Description

ScummVM 1.10.0git (Feb 11 2017 10:16:18)
Features compiled in: Vorbis FLAC MP3 RGB zLib MPEG2 Theora AAC FreeType2 JPEG PNG cloud (servers, local)

I get a crash after starting the game.
No screen loads, just a black screen or window comes up, no sound no graphics.
No backtrace either, but a stacktrace (attached)

Not sure if it helps though, here's whataddr2line says:

addr2line --section=.text -C -fe scummvm 0x5220D8

Fullpipe::Picture::setPaletteData(unsigned char*)
engines/fullpipe/gfx.cpp:707

Full Pipe (Windows/English)

AmigaOS4 - PPC - SDL - BE
gcc (adtools build 5.3.0) 5.3.0

Attachments (13)

Crashlog_scummvm_2017-02-12_23-44-32.txt (35.8 KB ) - added by raziel- 8 years ago.
Crashlog
Debug 101 v1.1.2.png (230.8 KB ) - added by raziel- 8 years ago.
Debug snapshot of crash
Crashlog_scummvm_2017-19-11_08-21-43.txt (15.7 KB ) - added by raziel- 7 years ago.
log1.txt (16.2 KB ) - added by raziel- 7 years ago.
memorypool.cpp crash
Crashlog_scummvm_2019-10-08_09-26-42.txt (38.4 KB ) - added by raziel- 5 years ago.
Crashlog with plugins instead of static
Debug_Serial.txt (46.5 KB ) - added by raziel- 5 years ago.
...and the serial stack backtrace
gdb-debian-powerpc-scummvm-fullpipe1.txt (27.4 KB ) - added by dwatteau 4 years ago.
valgrind-debian-powerpc-scummvm-fullpipe1.txt.gz (53.4 KB ) - added by dwatteau 4 years ago.
gdb-debian-powerpc-scummvm-fullpipe2.txt (18.4 KB ) - added by dwatteau 4 years ago.
valgrind_sdl_dummy.txt.gz (209.5 KB ) - added by dwatteau 4 years ago.
gdb-openbsd-macppc-malloc-cfgs.txt (14.8 KB ) - added by dwatteau 4 years ago.
gdb-openbsd-macppc-malloc-cfgs-2.txt (13.9 KB ) - added by dwatteau 4 years ago.
gdb-openbsd-macppc-malloc-cfgs-3.txt (11.5 KB ) - added by dwatteau 4 years ago.

Download all attachments as: .zip

Change History (52)

by raziel-, 8 years ago

Crashlog

comment:1 by sev-, 8 years ago

I think it is just you're running out of memory. The line where you're getting a crash is '_paletteData = (byte *)malloc(1024);'

comment:2 by raziel-, 8 years ago

I still have 58% of memory free (2 GB Ram) when the crash happens, i don't believe i'm running out of it.
May htere be another reason?

Anything i can do to test?

comment:3 by raziel-, 8 years ago

Putting scummvm in a debugger and starting Fullpipe i get this stacktrace:

Your program has crashed! ip = 0x228ee7c
Program has stopped at an unknown point in memory (TRAP)

[engines/fullpipe/statics.cpp: line 1701]: _ZN8Fullpipe8Movement4loadERNS_10MfcArchiveEPNS_15StaticANIObjectE (scummvm)
[engines/fullpipe/statics.cpp: line 234]: _ZN8Fullpipe15StaticANIObject4loadERNS_10MfcArchiveE (scummvm)
[engines/fullpipe/statics.cpp: line 183]: _ZN8Fullpipe5Scene4loadERNS_10MfcArchiveE (scummvm)

Does this help?

by raziel-, 8 years ago

Attachment: Debug 101 v1.1.2.png added

Debug snapshot of crash

comment:4 by raziel-, 8 years ago

As you can see in the debug window (top right) there's "abadcafe" in one of the registers, which means there was access to memory that hasn't been initialised.

comment:5 by raziel-, 8 years ago

It's a game of luck with this engine/game.

I *once* was able to make it run (only when using the non-stripped debug version though, and only in window mode).

The timer counts down and displays the intro (slow)
The game screen comes up but all of the "sprites" are either reduced to gfx meshup lines or (as in the case of the cookoo's clock a big fat black rectangular. No sprites are actually drawn correct.
After the scene with the hole under the bed, a full screen display opens and seconds after that the upper half of that screen is covered in gfx meshup and my entire system goes down with a shriek

comment:6 by csnover, 7 years ago

Keywords: has-backtrace added

comment:7 by csnover, 7 years ago

Please give it a go with the latest master and let me know if things get any further than before. Thanks!

comment:8 by raziel-, 7 years ago

Thanks for looking at this.

I'm afraid it's pretty much the same.
I can't reach any game screen.
Window stays at the launcher imager, screen stays black.

(new log attached)

comment:9 by csnover, 7 years ago

According to that backtrace, the crash is happening inside of newlib, again failing to allocate memory just like before, but at a different place. I had hoped the reduced number of heap allocations and significantly reduced memory leakage in the latest code would have improved the situation, but it still seems like the system is having memory trouble. If this is the case, it is probably not a ScummVM bug at this point since the game startup doesn’t leak at this point according to instrumentation.

There is some cruft in configure that changes the way that AmigaOS builds work which might be causing trouble, like forcing static linkage and making other changes to compiler flags which may not be needed with the more modern GCC 5 compiler you’re using, so maybe a place to start would be get rid of these lines for the AmigaOS backend in configure and try recompiling:

append_var CXXFLAGS "-mlongcall"
LDFLAGS=`echo $LDFLAGS | sed 's/-use-dynld//'`
append_var LDFLAGS "-static"

and maybe this one too:

append_var LDFLAGS "-Wl,--export-dynamic"

You might also try using configure --enable-plugins --default-dynamic to reduce the size of the ScummVM image in memory, to allow the game to have more memory, and/or -Os for the optimisation flag to have the compiler optimise the image size a bit more.

Let me know if any of that makes a difference, otherwise you will probably need to complain upstream (there’s not much we can reasonably do to work around bugs in the system’s memory allocator, if that’s what’s happening, which it appears so). Thanks!

comment:10 by raziel-, 7 years ago

Well, the first three settings you mentioned are for cross-compiling...i don't cross compile, i do native compiling

The export-dynamic one i will take out and see if that makes any difference.
But i only do static builds anyway (right now), as shared objects are even more error prone on my platform than static builds.

I just don't get why Full Pipe is the only game that causes memory probelms (not that i don't say it's possible), but why? Is there so much gfx or other mem needed as in all those other games?

Just curious...

comment:11 by raziel-, 7 years ago

So, i changed my configure according to your suggestion.

I still get a crash, but maybe the log gets clearer now?
I get a crash in memorypool and now i get this crash always in memorypool and not some random addresses scattered around the code or other running programs.

Not sure, but maybe that helps?

by raziel-, 7 years ago

Attachment: log1.txt added

memorypool.cpp crash

comment:12 by csnover, 7 years ago

Unfortunately it is still just consistently newlib crashing when allocating more memory.

This engine peaks at 180MiB on startup on my machine, which is quite a lot compared to Starship Titanic or SCI32 which both use less than 40MiB. I am not sure if there are any other engines in ScummVM that use so much memory.

If you are getting super unlucky, newlib might be doing a poor job of managing the many small allocations and running out of pages or something even if there is enough physical RAM to handle the load. I’m not blown away by the difference in number of allocations, it makes about 2.5x the number of allocations made by SCI32, though these are more frequently very small (16 byte) allocations. At face value, this doesn’t seem like something that should be breaking. You might try creating a small test app that makes a lot of small heap allocations and see what happens.

For reference here are some rough numbers on the top 10 allocations by total number of allocations at startup of the full version of Full Pipe on my system, which you might be able to use to create some synthetic test app to crash the allocator:

Category	Persistent Bytes	# Persistent	# Transient	Total Bytes	# Total	# Transient/Total
Malloc 16 Bytes	49.31 KiB	3156	22787	405.36 KiB	25943	Ratio: %0.01, %0.08
Malloc 32 Bytes	403.00 KiB	12896	9658	704.81 KiB	22554	Ratio: %0.05, %0.03
Malloc 4.00 KiB	300.00 KiB	75	9776	38.48 MiB	9851	Ratio: %0.00, %0.03
StdioStream	32 Bytes	1	9690	302.84 KiB	9691	Ratio: %0.00, %0.03
Common::MemoryReadStream	528 Bytes	11	9671	453.84 KiB	9682	Ratio: %0.00, %0.03
Fullpipe::MemoryObject2	992.14 KiB	9071	0	992.14 KiB	9071	Ratio: %0.03, %0.00
Malloc 1.00 KiB	8.57 MiB	8774	29	8.60 MiB	8803	Ratio: %0.03, %0.00
Fullpipe::DynamicPhase	1.84 MiB	8579	0	1.84 MiB	8579	Ratio: %0.03, %0.00
Fullpipe::GameVar	572.25 KiB	5232	0	572.25 KiB	5232	Ratio: %0.02, %0.00
Malloc 128 Bytes	152.00 KiB	1216	2685	487.62 KiB	3901	Ratio: %0.00, %0.01

There’s a fairly long tail of other allocations (total numbers are 69580 persistent, 84454 transient) but this seems like a place to start.

Hopefully we will be able to collect a bit more data during release testing to see if there are any other systems which have this problem, though I am not really sure outside of the big-five how many other platforms are even going to be powerful enough to run this game regardless.

comment:13 by raziel-, 7 years ago

I'm going to relay and quote answer from an OS dev:

There certainly might be a problem when there are so many small allocations. This is why on AmigaOS it is recommended to use memory pools and item pools for frequent allocations/deallocations of small-size buffers.

I quite doubt such a trivial test has never been done by the Amiga newlib maintainer(s)

One more thing: there's a whole world of difference if the small amounts just keep getting allocated (and are only freed when the program ends), or if they are constantly allocated and deallocated throughout program runtine. Because if the latter is the case, the "loss of memory" problem is much more likely. Could you possibly ask the authors about that?

Thank you

comment:14 by csnover, 7 years ago

Priority: blockerhigh

Yes, the game does these sorts of allocations all over the place throughout its lifetime, though whether or not this is so is not really relevant here since the game is crashing already on startup. :)

We do have code for creating memory pools, Common::MemoryPool. I’m happy to accept a patch to use MemoryPool where necessary in this engine, though unless this shows up in other platforms it’s not something that I plan on spending time working on and so won’t be addressed for the next release.

comment:15 by dafioram, 7 years ago

Game: Full Pipe
Summary: FULLPIPE: Crash on startFULLPIPE: AmigaOS: Crash on start

comment:16 by raziel-, 7 years ago

A core dev was nice enough to check the demo of FullPipe (which also crashes) with the latest newlib

I tried it with the latest beta version of newlib.

The crash is in the dlmalloc code but is not because of lack of memory.

Assuming that the dlmalloc is not bugged (seems a little unlikely since
many other programs are using it without problems) what could cause the
crash is if data before or after an allocation is trashed (as in a
buffer overflow for instance).

Aynthing i could do, to further corner the problem?

comment:17 by raziel-, 6 years ago

Some good soul helped with debugging the crash and i collected some more information.

If another good soul from the ScummVM team could please take a look I'd be grateful:

Fullpipe crashed for me as well.

C++ new operator uses probably malloc too (what else?).

Newlib and malloc started to ring some bells and it seems that ScummVM does sometimes realloc(). I wonder if this could break the allocation scheme? Here is some info: http://forum.hyperion-entertainment.com/viewtopic.php?f=14&t=1591&hilit=realloc

I hacked custom malloc/calloc/realloc/free functions into ScummVM and Fullpipe started without crash 3 times. It crashed on each exit though - probably my fault :) By the way game sprite graphics looks buggy, perhaps endian-related.

I think realloc issue is not fully confirmed but it's one potential clue.

Tried also Fullpipe on Linux with Valgrind and Clang but didn't see much.

...

Here is my patch. Currently I'm able to start and exit FP successfully: http://capehill.kapsi.fi/scummvm/

There is a primitive attempt to profile drawing, shader compilation hack, memory allocation workaround in fullpipe.cpp and finally, some realloc() workaround attempt (but it didn't change a thing).

Maybe it would be possible to capture all malloc/free/realloc/calloc calls and create a newlib testcase based on that data. Data might be captured over serial, for example.

...

Could you maybe describe the malloc workaround a bit?
What do you think is going wrong in the scummvm code and what are you "fixing".
Maybe the devs at ScummVM don't want to investigate and simply add that exclusion for AmigaOS4, but history tells me they do want to support all platforms properly.

...

Workaround just provides global symbols called malloc, free, calloc and realloc and linker then uses these to put the binary together. Newlib's symbols get by-passed: now heap allocations are requested directly from ExecSG.

This impacts also other libraries that are linked with ScummVM and need those symbols, including SDL.

I don't suspect ScummVM doing "wrong" currently.

...

I recorded memory calls of Fullpipe startup and then tried to replay these with a pure Newlib-based test program and problem did NOT reproduce. It seems complicated.

If there is a heap corruption by ScummVM then maybe it's somehow AmigaOS(+newlib)-specific and doesn't trigger Valgrind on Linux.

On the other hand, after replacing Newlib-calls with IExec ones, issue didn't seem to occur.

...

We could try to put canaries before and after each heap allocation. If we get lucky, one of the "canaries" gets hurt and then it may be possible to slowly debug more, knowing the size of the allocation for the next attempt.

...

I see dead birds. Could be a bug on my side but after first tests, of 150000 free'd allocations, there were about 300 allocations where the tail end was smashed. Have to investigate more.

* Birdwatch: good canaries 149822, bad canaries 340

The canary patch (fullpipe.cpp part) is here if you want to play: http://capehill.kapsi.fi/scummvm/

...

Debugged little bit more and put a trap instruction inside dead canary detector. It seems that there is some bitmap freed:

(0x64D93E40) [engines/fullpipe/fullpipe.cpp:162] scummvm:free()+0x1a4 (section 8 @ 0x236C8)
(0x64D93E90) [engines/fullpipe/fullpipe.cpp:162] scummvm:free()+0x1a4 (section 8 @ 0x236C8)
(0x64D93EC0) [engines/fullpipe/gfx.cpp:715] scummvm:_ZN8Fullpipe6BitmapD1Ev()+0x50 (section 8 @ 0x36EE8)
(0x64D93EE0) scummvm:_ZN6Common14DefaultDeleterIN8Fullpipe6BitmapEEclEPS2_()+0x38 (section 8 @ 0x3D548)
(0x64D93F00) [engines/fullpipe/gfx.cpp:255] scummvm:_ZN6Common9ScopedPtrIN8Fullpipe6BitmapENS_14DefaultDeleterIS2_EEE5resetEPS2_()+0x34 (section 8 @ 0x3D59C)
(0x64D93F30) [engines/fullpipe/gfx.cpp:422] scummvm:_ZN8Fullpipe7Picture11freePictureEv()+0xf0 (section 8 @ 0x375BC)
(0x64D93F50) [engines/fullpipe/gfx.cpp:412] scummvm:_ZN8Fullpipe7PictureD2Ev()+0x30 (section 8 @ 0x377C0)
(0x64D93F70) scummvm:_ZN8Fullpipe11StaticPhaseD2Ev()+0x44 (section 8 @ 0xA47C8)
(0x64D93F90) scummvm:_ZN8Fullpipe12DynamicPhaseD2Ev()+0x34 (section 8 @ 0xA482C)
(0x64D93FB0) scummvm:_ZN8Fullpipe7StaticsD0Ev()+0x54 (section 8 @ 0xA48B0)
(0x64D93FD0) scummvm:_ZN6Common14DefaultDeleterIN8Fullpipe7StaticsEEclEPS2_()+0x44 (section 8 @ 0xA3A58)
(0x64D93FF0) [engines/fullpipe/statics.cpp:145] scummvm:_ZN6Common8for_eachIPPN8Fullpipe7StaticsENS_14DefaultDeleterIS2_EEEET0_T_S8_S7_()+0x44 (section 8 @ 0xA3AB4)
(0x64D94010) [engines/fullpipe/statics.cpp:115] scummvm:_ZN8Fullpipe15StaticANIObjectD0Ev()+0x68 (section 8 @ 0xA21E0)
(0x64D94030) [engines/fullpipe/scene.cpp:124] scummvm:_ZN8Fullpipe5SceneD0Ev()+0x118 (section 8 @ 0x84BBC)
(0x64D94060) [engines/fullpipe/gameloader.cpp:421] scummvm:_ZN8Fullpipe10GameLoader11unloadSceneEi()+0xf8 (section 8 @ 0x2BB44)
(0x64D94090) [engines/fullpipe/gameloader.cpp:387] scummvm:_ZN8Fullpipe10GameLoader12preloadSceneEii()+0x254 (section 8 @ 0x2CB2C)
(0x64D940F0) [engines/fullpipe/gameloader.cpp:572] scummvm:_ZN8Fullpipe10GameLoader13updateSystemsEi()+0xc8 (section 8 @ 0x2CDC4)
(0x64D94110) [engines/fullpipe/modal.cpp:233] scummvm:_ZN8Fullpipe10ModalIntro6finishEv()+0xc4 (section 8 @ 0x62268)
(0x64D94130) [engines/fullpipe/modal.cpp:91] scummvm:_ZN8Fullpipe10ModalIntro4initEi()+0x54 (section 8 @ 0x64198)
(0x64D94170) [engines/fullpipe/fullpipe.cpp:630] scummvm:_ZN8Fullpipe14FullpipeEngine12updateScreenEv()+0x13c (section 8 @ 0x24000)
(0x64D94190) [engines/fullpipe/fullpipe.cpp:440] scummvm:_ZN8Fullpipe14FullpipeEngine3runEv()+0x5b8 (section 8 @ 0x25394)
(0x64D94320) [base/main.cpp:327] scummvm:_Z7runGamePK6PluginR7OSystemRKN6Common6StringE()+0xd48 (section 8 @ 0x80A0)
(0x64D94840) [base/main.cpp:599] scummvm:scummvm_main()+0xde8 (section 8 @ 0x953C)
(0x64DDA0E0) [backends/platform/sdl/amigaos/amigaos-main.cpp:79] scummvm:main()+0x260 (section 8 @ 0x6AF8)

It could be that heap corruption has something to do with endian issues. To fix endian issues, somebody would have to go through all file loading code in fullpipe engine.

Here is one obvious bug (but doesn't fix graphics):
https://github.com/scummvm/scummvm/blob/master/engines/fullpipe/utils.cpp#L260

It gives just bogus reading for BE.

by raziel-, 5 years ago

Crashlog with plugins instead of static

comment:18 by raziel-, 5 years ago

I added a new crashlog.
This time with a shared build and using plugins.

Someone told me that it "could" be easier to track down a bug, since, if there is an out of bounds read, it will not read somewhere in the huge static binary, but stay inside the .so it was triggered...so, here goes.

by raziel-, 5 years ago

Attachment: Debug_Serial.txt added

...and the serial stack backtrace

comment:19 by raziel-, 5 years ago

I now get the crash always at the same place (see the last crashlog - engines/fullpipe/gameload.cpp, 309), is there anything i could try to fix this?

Thank you

comment:20 by raziel-, 4 years ago

Summary: FULLPIPE: AmigaOS: Crash on startFULLPIPE: Crash on start (AmigaOS4)

comment:21 by dwatteau, 4 years ago

Hi,

sev asked me on Discord if I coulp help for that bug. I have an old PowerBook G4 here, and I can reproduce the problem here on Debian PowerPC bullseye/sid.

With fullpipe-demo-de.zip, and a new debug build without -gsplit-dwarf (since it looks like Debian unstable has a GDB bug when using this, at the moment), I got a gdb backtrace which I've attached (gdb-debian-powerpc-scummvm-fullpipe1.txt).

I can do more debug steps and test some patches if this helps.

comment:22 by raziel-, 4 years ago

@dwatteau
@sev

Oh man, thank you both...standing by for testing :-)

comment:23 by raziel-, 4 years ago

Keywords: Big Endian added
Summary: FULLPIPE: Crash on start (AmigaOS4)FULLPIPE: Crash on start (Big Endian)

comment:24 by dwatteau, 4 years ago

No problem, I'm always happy to help debugging something on some uncommon arch :)

I've also attached a Valgrind dump, with ScummVM run with "-d 7" (it's big, so I've gzipped it). Had to build ScummVM with a null output, though, because of some other bug between Valgrind PPC and X11...

Hope this adds some useful info.

comment:25 by dwatteau, 4 years ago

After pulling the fix in 285bc1421e04a6ea476b2ce898513ddbb56dcc4c, we go a little further, but it still crashes after pressing Esc after the bed appears.

New gdb log attached.

comment:26 by sev-, 4 years ago

I made a couple of commits based on the outputs which you provided, so if you could rerun it again with -d5 flag, that would be awesome. Also, please press ESC several times until you see the character falling down from a pipe.

comment:27 by sev-, 4 years ago

The last crash is weird. It crashed in __GI_fseek (fp=0x9b99b0, offset=103312076, whence=0) at fseek.c:36, and that offset is valid, because the file size is 110478134.

I just pushed 73508f6111a80d1d93277117cb183791b36ba01b with more debug output.

comment:28 by sev-, 4 years ago

Ah, I looked up in the Debian sources for "mismatching next->prev_size". It looks like memory override, thus, valgrind would help here a lot.

by dwatteau, 4 years ago

Attachment: valgrind_sdl_dummy.txt.gz added

comment:29 by dwatteau, 4 years ago

Here's what I get under Valgrind with 73508f6111a80d1d93277117cb183791b36ba01b, and SDL_VIDEODRIVER=dummy SDL_AUDIODRIVER=dummy (because of a Valgrind issue with X11 on ppc).

It's been running for hours (since I can't have X11 and Valgrind, the dummy SDL output doesn't let me press the Esc key), so I've cut it after some time. Hope it still contains some useful information, though.

I'll try running this on OpenBSD/macppc, too, it has some useful malloc() debug flags which may be much faster than Valgrind.

by dwatteau, 4 years ago

comment:30 by dwatteau, 4 years ago

And here's what I get on OpenBSD/macppc 6.7, with sysctl vm.malloc_conf=CFGS, which adds some malloc() canaries.

The crash appears when pressing Esc after the bed appears in the intro.

comment:31 by sev-, 4 years ago

I made another commit. Interesting, that this time it did not crash, you just stopped it.

comment:32 by dwatteau, 4 years ago

Thanks. With 409280f6e0fd08f68a0c7fc6d163540a51aa4517, it still triggers a malloc() canary on OpenBSD. New log.

Also, I see that graphical corruption issue that's reported in #9713.

comment:33 by sev-, 4 years ago

Ah, I had that thing fixed in my branch. I just pushed another change.

by dwatteau, 4 years ago

comment:34 by dwatteau, 4 years ago

Thanks. With 9caa6e318c6e6e6607b878ef2451d9986964f5d8 (SharedPtr), it now runs on Debian PowerPC (still with the graphical glitch from #9713, though). It might be enough for AmigaOS too (can't test that), but it still triggers a canary on OpenBSD (which is quite strict).

So here comes gdb-openbsd-macppc-malloc-cfgs-3.txt.

by dwatteau, 4 years ago

comment:35 by raziel-, 4 years ago

@sev
@dwatteau

Sorry for the long delay, work...

With todays master the crash is finally gone.

Thank you so much for the fixes

and @dwatteau please stay around for more Big Endian madness :-)

I can also confirm #9713 is still valid

comment:36 by sev-, 4 years ago

Priority: highnormal

comment:37 by raziel-, 4 years ago

@sev-

No more (default) crashing, thanks a lot

comment:38 by sev-, 4 years ago

Owner: set to sev-
Resolution: fixed
Status: newclosed

Yay!

comment:39 by digitall, 4 years ago

Component: Engine: FullpipeEngine: NGI
Note: See TracTickets for help on using tickets.