Opened 17 years ago

Closed 17 years ago

Last modified 14 years ago

#889 closed defect (fixed)

SIMON1: Scene stays black on MAC

Reported by: fingolfin Owned by: fingolfin
Priority: high Component: Engine: AGOS
Keywords: Cc:
Game: Simon the Sorcerer 1

Description

Since some weeks, simon1talkie & simon1win are broken (don't have simon1dos, so I dunno about that) on my Mac OS X system (or maybe all BE systems?)

What happens is that after the intro (e.g. I ESC over it), when you should see simon standing in the scene, awiting your commands, you only see the verbs at the bottom of the screen - the upper half, however, stays black.

This was introduced around the time Jamieson rewrote the midi code. More precisely, it works with a checkout from "2003-05-20 12:00 GMT" but not with one from "2003- 05-20 15:00 GMT" (checkouts from 13:00 and 14:00 just crash).

Ticket imported from: #754957. Ticket imported from: bugs/889.

Change History (33)

comment:1 by fingolfin, 17 years ago

Priority: normalhigh

comment:2 by SF/jamieson630, 17 years ago

Hmm, are those checkouts based on GMT/UTC or based on SourceForge time? If the latter, I do have some trivial changes in during that 3-hour block. If it's UTC time, however (and note that the CVS server's timestamps are 3 hours earlier than UTC), then the crash precedes that time of day at which I started committing changes.

Note that this was also the day that we switched from an 0x00-based new() operator to the 0xE7 initialization, and all of the changes between 12:00 and 15:00 GMT (which are actually 09:00 through 12:00 by the CVS server's timestamps) are related to that.

Note also that no music changes occur at the point in the game you are referring to. The theme music continues to play uninterrupted, and no attempt is made to change it. The original game possibly had a music fade functionality for when the screen is fading out, but we currently do not implement it. The ONLY thing that might cause the music to hang is the auto-looping. If this were the case, the program should hang right about the time Simon pulls the monster out of the hat (if you let it run uninterrupted).

Since I don't have a BE system (and simon1win to be working just fine on my Intel), I would recommend taking the first checkout that fails (but doesn't crash) and changing the new () operator back to 0x00 initialization. That will tell you if the new init code is the culprit.

If that doesn't work, you can test the music engine by placing a return; at the start of SimonEngine::loadMusic (line 4705 of simon/simon.cpp in the current CVS). That will prevent music from ever starting. If that takes care of the problem, that is a much better indication that the music engine is the culprit.

I am removing myself from assignment as I have no way to test possible solutions, and very much doubt this is a music- related issue anyway. If after testing the code changes related to the new() behavior, you still believe this is related to the MIDI engine, feel free to indicate why and to place it back on my TODO list.

comment:3 by SF/jamieson630, 17 years ago

Owner: SF/jamieson630 removed

comment:4 by fingolfin, 17 years ago

Yes the times are in GMT (as I specfiied). I didn't actually look which changes were performed in that interval. The hint with the 0xE7 changes to operator new was good - if I change that back to a memset(0) in latest CVS, it works!

Hence, somebody with valgrind should debug this (that person should first remove the memset call in our operator new, or even the whole custom operator new(), because otherwise Valgrind can't detect the memory as unused, I think.

comment:5 by wjp, 17 years ago

Kirben asked me to run valgrind on simon1win. Used semi-current anonymous cvs, with the memset removed from operator new.

http://www.math.leidenuniv.nl/~wpalenst/simon_valgrind

The main block of errors occurred as soon as the music started.

When I use -enull I don't get any errors.

comment:6 by SF/jamieson630, 17 years ago

HAHAHAHA! I had no idea Valgrind's output was so, umm, candid. "Final error counts will be inaccurate. Go fix your program!" That's just too funny.

I will look at the reported MidiDriver_Adlib issues and try to clean up any problems. Unless I am incorrect in my assumption that Fingolfin checked these with native MIDI drivers and not Adlib, I doubt that the reported memory problems are related to the bug post. However, considering what Valgrind said -- "30000 errors... I'm not reporting any more." -- I would guess I need to fix the Adlib issues anyway, before Valgrind will be able to get all the way to the spot this bug report is about.

comment:7 by SF/jamieson630, 17 years ago

Well, I checked the Adlib code that Valgrind is complaining about, and it's extremely straightforward with no uninitialized variables being used. Are there circumstances under which Valgrind can incorrectly determine a variable is uninitialized? (I know very little about Valgrind.) Perhaps running ScummVM with a native driver would at least circumvent Valgrind's problems with the Adlib and FMOPL code...?

comment:8 by wjp, 17 years ago

valgrind only reports uninitialized data at a point where it's actually used, not when it's only being copied. So an uninitialized value can be copied around several times (passed to functions, etc...) before valgrind reports it.

comment:9 by wjp, 17 years ago

All of the errors seem to be caused by an uninitialized AdlibPart::_pitchbend_factor

comment:10 by SF/jamieson630, 17 years ago

Thanks for the information, wjpalenstijn. I'm sure I was reading the Valgrind information wrong -- the line numbers I was looking up were really trivial sections of code.

You are right, _pitchbend_factor was not being initialized. I have corrected this problem in CVS. Does Valgrind run happy with that change? (Well, at least until the point of failure we're looking for here....)

comment:11 by wjp, 17 years ago

Valgrind reports no more errors now, I'm afraid.

comment:12 by SF/jamieson630, 17 years ago

Thanks, wjpalenstijn. Hard to say if no Valgrind errors means no OOB memory access period, or just on Linux. At least we nabbed that Adlib bug.

wooshell ran simon1demo (which has the same problem, according to nnooiissee on IRC) on Solaris, another BE platform. It runs fine on there. It would seem, then, that BE systems are not the common denominator here.

I'm hypothesizing that the problem may be related to struct alignments. There are four structs in simon/vga.h that serve as "views" into blocks of memory that were directly read from disk. VgaFile1Header, VgaFile1Header2, VgaFile1Struct0x8, and VgaFile1Struct0x6 all may require the GCC_PACK attribute to ensure that the members align properly with the raw memory blocks that are read using dynamic pointer casts to these struct types.

Fingolfin, can you try a GCC_PACK on these structs and see if that makes any difference? I may be full of crap here -- but the goofy way those structs are being used makes me uncomfortable. It seems unlikely that the scripts would have survived as far as they do in simon1dos if the script_offs member of VgaFile1Struct0x8 was not being read correctly. But considering that the bug in vc_3_new_sprite() only manifested itself in a few, widely dispersed areas of the game, I think perhaps the scripts COULD execute properly as long as they do before screwing up.

comment:13 by fingolfin, 17 years ago

I doN't think trying GCC_PACK will be useful. Those structs only contain uint16s, they will be tightly packed in any case.

Also, this wouldn't explain why it works with a checkout from before "2003-05-20 15:00 GMT

comment:14 by Kirben, 17 years ago

Does setting struct VC10_state {} to 0 in vga.h make any difference ?

comment:15 by fingolfin, 17 years ago

Nope it doesn't (and would have surprised me if it did, given that this struct is only allocated on the stack, hence the operator new() has no influence on it). We really only need to consider things allocated via new().

comment:16 by SF/jamieson630, 17 years ago

There are 25 occurrences of operator new() in the scummvm/simon directory. 16 of those are in sound.cpp, so I haven't looked closely at them yet. Another one occurs in MidiParser_S1D, and is not used for simon1win. The remaining 8 are in simon.cpp, and refer to the following objects:

SimonEngine (1 new) Item (3 new) FillOrCopyStruct (1 array new) File (2 new) SimonSound (1 new)

Since SimonEngine has so many members, I went through the constructor, matching up inits with member vars. Everything seems to be accounted for, either with explicit inits or implicitly for object types with their own constructors, except for two variables.

_number_of_savegames appears to be no longer in use now that the savegames are being counted on the fly. It should probably just be removed.

_letters_to_print_buf[80] is not cleared on construction. I'm not sure if that actually makes any difference.

So far no other occurrence of new() has struck me as being suspicious-looking. I'll continue to poke around as time permits.

comment:17 by Kirben, 17 years ago

I mentioned _letters_to_print_buf[80] as not been cleared the other night on IRC but fingolfin mentioned it would make no difference. _number_of_savegames is still used by o_unk_132_helper() in simon/items.cpp. That array and var. are now cleared at startup in ScummVM cvs.

comment:18 by Kirben, 17 years ago

Also I had fingolfin try simon1 with all music/sound code disabled the other day and the problem still occured. So the problem isn't in music or sound code.

comment:19 by fingolfin, 17 years ago

Jamieson, feel free to re-check all occurances of new - but I did exactly the same in the past, even before submitting this bug report. I added most of the init stuff in the SimonEngine constructor, and inits for all the other structs allocated via new, for this reason. Of course I may have overlooked something crucial, two pairs of eyes see more :-)

As for _letters_to_print_buf, it doesn't matter if it's inited or not because one can easily prove that only the first _num_letters_to_print chars in it are used at any given time. Since _num_letters_to_print is 0 initially, and since it's only incremented at the same time as something is written to _letters_to_print_buf, that means its initial value is irrelevant. nevertheless, of course I also tried it with a memset for that array, doesn't help.

Finally, it still might be an alignment/endian bug. For example, take this struct: struct Foo { byte x; int y; }; Then at least over here, after the x-byte there are four padding bytes. Now, consider this code: Foo f = { 0, 1 }; byte z = *(int *)&foo; On a little endian machine, this will set z to the value 0, even if operator new does a memset(ptr, 0xE7, len). But on a big endian machine, this would set z to 0xE7 with the memset active.

Hence, we should also look for cases where structs are cast to pointers and then dereferenced. Doing something like that is almost always a bad idea, even if it sometime might be unavoidable.

comment:20 by SF/jamieson630, 17 years ago

I'm not real clear on the conditions under which a compiler will pad a structure, or how it decides how much padding to create. But if I understand what you just said and what you said earlier, Fingolfin, padding would only be created after single-byte members? For example, you said earlier that if a struct contains all uint16's (16-bit values), padding would never be added (under any cicrcumstances?).

With that in mind, I searched for Simon structs containing single-byte members. The following structs qualify:

FillOrCopyStruct MusicInfo GameSpecificSettings VocBlockHeader VC10_state

I will poke around and see if the structs are getting typecast in suspicious ways. (There are enough quirky typecasts in the Simon code, to be sure.) Since I can't actually test anything, I would recommend two checks for Mac testers:

(1) GCC_PACK the above structs. If it *is* an endian / struct alignment / dereferencing issue, adding GCC_PACK should theoretically fix the problem, right?

(2) If those stucts don't work, hell, add GCC_PACK to all Simon structs just to make sure. There appear to be a total of 27 structs that are Simon-specific. (I didn't check classes.)

comment:21 by fingolfin, 17 years ago

Padding depends on the compiler. The exact rules differ a lot between archs, compilers, compile versions, and settings of that compile. E.g. on Mac OS X + GCC 3.3, alignment can be: packaged, "native", 68k, and now they also added a new padding mode to optimize for the G5. For details, look it up in the specs of your compiler.

A typical scheme is that byte sized elements are aligned to byte boundaries, ints (4 bytes) to addresses divisible by 4, doubles (8bytes) to address divisible by 8 etc.. So if you have three bytes in a struct in sequence, they'll occupy sequential memory locations; but if you have a single byte, and then an int, there'll be 3 padding bytes inserted. Etc. Of course there are many ways to vary this.

comment:22 by SF/jamieson630, 17 years ago

Guess an update is in order. Fingolfin added GCC_PACK to every struct used by Simon (excluding a couple sound structs), with no change in the symptoms. Consequently, it would seem unlikely that the problem is a struct alignment issue. Back to the drawing board....

comment:23 by SF/madshi, 17 years ago

Perhaps it would be easier to just compare the two checkouts from 2003-05-20? Can't you use the 12:00 GMT checkout as a base and then piece by piece add what was changed in the 15:00 GMT checkout? That's no "nice" style of chasing bugs, of course. It's more the stupid way. But it might work faster in this case...

comment:24 by Kirben, 17 years ago

Summary: SIMON1: Scene stays blackSIMON1 MAC: Scene stays black

comment:25 by fingolfin, 17 years ago

madshi, to clarify this, we know which precise change in that time frame is "responsible" for the problem. It's the overloaded operator new which "memset"'s some memory to 0xE7. However, that is not the cause of the bug, it merely uncovers it.

BTW, the technique you describe is precisely what I did when filing this report initially. It's actually quite nice a technique :-)

comment:26 by SF/madshi, 17 years ago

Ok fingolfin, thanks for the clarification. Then I've another suggestion:

jamieson630 said that there are 25 occurences of "new()" in the simon directory. How about adding a manual "memset(0)" call after each of those 25 "new()" calls? If the problem is gone after that, we can remove the memsets step by step to find the guilty new() call. If the problem is not gone, a new() call outside of the simon directory will probably be the guilty one. Then it gets even more difficult, but at least we would know a bit more. What do you think?

comment:27 by fingolfin, 17 years ago

That, too, is more or less what we have done. We added memset calls to all involved classes, to ensure they are properly inited. Oh and of course before and after that, we did Valgrind runs, to detect access to uninitialised memory, to no avail.

The curious thing about this is that it only occurs on Big Endian systems, or maybe even only on Macs (which is probably why Valgrind didn't help). Odd.

BTW, Kirben, the name of this bug report was misleading: this bug report is not about "simon1mac", rather it's about regular simon1 running on a mac.

comment:28 by fingolfin, 17 years ago

Summary: SIMON1 MAC: Scene stays blackSIMON1: Scene stays black on MAC

comment:29 by SF/madshi, 17 years ago

>> That, too, is more or less what we have done.

Oh, I read the whole thread and thought I understood it all. Seems I missed some parts. I'm sorry for that.

>> We added memset calls to all involved classes

And the problem was still there?

So, it must be a "new()" call in one of the general parts of scummvm, which brings Simon on Mac into problems (well, of course the *real* problem is something else). Is that right?

Do you have built in stack trace functionality? I mean can you inside of a function find out which function has called you? (I can do that in Delphi). If you can, you should be able to write a test log, which lists all the new() calls (and which function has done the calls) which are done when running simon. Then adding a memset(0) after each of them should finally help in finding the problem. However, might be a lot of work, I don't know how often new() is called in the general scummvm sources. Or you might inside of new() either call memset(0) or memset(somethingDifferent), depending on which unit called new().

Hmmm... Perhaps I should stop here. Until now I've only suggested things, which you've already done... :-) I just thought, I'd drop some ideas...

comment:30 by fingolfin, 17 years ago

Owner: set to fingolfin
Resolution: fixed
Status: newclosed

comment:31 by fingolfin, 17 years ago

Found and fixed the cause of this, finally!

An uint16 member var was accessed something like this: *(int *)&var -> this worked OK on a LE system, but not on BE (but of course in either case it's bad to do this).

comment:32 by fingolfin, 15 years ago

Component: Engine: AGOSEngine: AGI
Game: Simon the Sorcerer 2Space Quest 2

comment:33 by Kirben, 14 years ago

Component: Engine: AGIEngine: AGOS
Game: Space Quest 2Simon the Sorcerer 1
Note: See TracTickets for help on using tickets.