Opened 16 years ago

Closed 15 years ago

Last modified 12 years ago

#889 closed defect (fixed)

SIMON1: Scene stays black on MAC

Reported by: fingolfin Owned by: fingolfin
Priority: high Component: Engine: AGOS
Keywords: Cc:
Game: Simon the Sorcerer 1

Description

Since some weeks, simon1talkie & simon1win are broken
(don't have simon1dos, so I dunno about that) on my Mac
OS X system (or maybe all BE systems?)

What happens is that after the intro (e.g. I ESC over it),
when you should see simon standing in the scene, awiting
your commands, you only see the verbs at the bottom of
the screen - the upper half, however, stays black.

This was introduced around the time Jamieson rewrote the
midi code. More precisely, it works with a checkout from
"2003-05-20 12:00 GMT" but not with one from "2003-
05-20 15:00 GMT" (checkouts from 13:00 and 14:00 just
crash).

Ticket imported from: #754957. Ticket imported from: bugs/889.

Change History (33)

comment:1 Changed 16 years ago by fingolfin

Priority: normalhigh

comment:2 Changed 16 years ago by SF/jamieson630

Hmm, are those checkouts based on GMT/UTC or based on
SourceForge time? If the latter, I do have some trivial
changes in during that 3-hour block. If it's UTC time, however
(and note that the CVS server's timestamps are 3 hours
earlier than UTC), then the crash precedes that time of day
at which I started committing changes.

Note that this was also the day that we switched from an
0x00-based new() operator to the 0xE7 initialization, and all
of the changes between 12:00 and 15:00 GMT (which are
actually 09:00 through 12:00 by the CVS server's
timestamps) are related to that.

Note also that no music changes occur at the point in the
game you are referring to. The theme music continues to play
uninterrupted, and no attempt is made to change it. The
original game possibly had a music fade functionality for when
the screen is fading out, but we currently do not implement
it. The ONLY thing that might cause the music to hang is the
auto-looping. If this were the case, the program should hang
right about the time Simon pulls the monster out of the hat (if
you let it run uninterrupted).

Since I don't have a BE system (and simon1win to be working
just fine on my Intel), I would recommend taking the first
checkout that fails (but doesn't crash) and changing the new
() operator back to 0x00 initialization. That will tell you if the
new init code is the culprit.

If that doesn't work, you can test the music engine by
placing a return; at the start of SimonEngine::loadMusic (line
4705 of simon/simon.cpp in the current CVS). That will
prevent music from ever starting. If that takes care of the
problem, that is a much better indication that the music
engine is the culprit.

I am removing myself from assignment as I have no way to
test possible solutions, and very much doubt this is a music-
related issue anyway. If after testing the code changes
related to the new() behavior, you still believe this is related
to the MIDI engine, feel free to indicate why and to place it
back on my TODO list.

comment:3 Changed 16 years ago by SF/jamieson630

Owner: SF/jamieson630 deleted

comment:4 Changed 16 years ago by fingolfin

Yes the times are in GMT (as I specfiied). I didn't actually look
which changes were performed in that interval.
The hint with the 0xE7 changes to operator new was good - if I
change that back to a memset(0) in latest CVS, it works!

Hence, somebody with valgrind should debug this (that person
should first remove the memset call in our operator new, or even
the whole custom operator new(), because otherwise Valgrind
can't detect the memory as unused, I think.

comment:5 Changed 16 years ago by wjp

Kirben asked me to run valgrind on simon1win. Used
semi-current anonymous cvs, with the memset removed from
operator new.

http://www.math.leidenuniv.nl/~wpalenst/simon_valgrind

The main block of errors occurred as soon as the music started.

When I use -enull I don't get any errors.

comment:6 Changed 16 years ago by SF/jamieson630

HAHAHAHA! I had no idea Valgrind's output was so, umm,
candid. "Final error counts will be inaccurate. Go fix your
program!" That's just too funny.

I will look at the reported MidiDriver_Adlib issues and try to
clean up any problems. Unless I am incorrect in my
assumption that Fingolfin checked these with native MIDI
drivers and not Adlib, I doubt that the reported memory
problems are related to the bug post. However, considering
what Valgrind said -- "30000 errors... I'm not reporting any
more." -- I would guess I need to fix the Adlib issues anyway,
before Valgrind will be able to get all the way to the spot this
bug report is about.

comment:7 Changed 16 years ago by SF/jamieson630

Well, I checked the Adlib code that Valgrind is complaining
about, and it's extremely straightforward with no uninitialized
variables being used. Are there circumstances under which
Valgrind can incorrectly determine a variable is uninitialized? (I
know very little about Valgrind.) Perhaps running ScummVM
with a native driver would at least circumvent Valgrind's
problems with the Adlib and FMOPL code...?

comment:8 Changed 16 years ago by wjp

valgrind only reports uninitialized data at a point where
it's actually used, not when it's only being copied. So an
uninitialized value can be copied around several times
(passed to functions, etc...) before valgrind reports it.

comment:9 Changed 16 years ago by wjp

All of the errors seem to be caused by an uninitialized
AdlibPart::_pitchbend_factor

comment:10 Changed 16 years ago by SF/jamieson630

Thanks for the information, wjpalenstijn. I'm sure I was
reading the Valgrind information wrong -- the line numbers I
was looking up were really trivial sections of code.

You are right, _pitchbend_factor was not being initialized. I
have corrected this problem in CVS. Does Valgrind run happy
with that change? (Well, at least until the point of failure
we're looking for here....)

comment:11 Changed 16 years ago by wjp

Valgrind reports no more errors now, I'm afraid.

comment:12 Changed 16 years ago by SF/jamieson630

Thanks, wjpalenstijn. Hard to say if no Valgrind errors means
no OOB memory access period, or just on Linux. At least we
nabbed that Adlib bug.

wooshell ran simon1demo (which has the same problem,
according to nnooiissee on IRC) on Solaris, another BE
platform. It runs fine on there. It would seem, then, that BE
systems are not the common denominator here.

I'm hypothesizing that the problem may be related to struct
alignments. There are four structs in simon/vga.h that serve
as "views" into blocks of memory that were directly read from
disk. VgaFile1Header, VgaFile1Header2, VgaFile1Struct0x8,
and VgaFile1Struct0x6 all may require the GCC_PACK attribute
to ensure that the members align properly with the raw
memory blocks that are read using dynamic pointer casts to
these struct types.

Fingolfin, can you try a GCC_PACK on these structs and see if
that makes any difference? I may be full of crap here -- but
the goofy way those structs are being used makes me
uncomfortable. It seems unlikely that the scripts would have
survived as far as they do in simon1dos if the script_offs
member of VgaFile1Struct0x8 was not being read correctly.
But considering that the bug in vc_3_new_sprite() only
manifested itself in a few, widely dispersed areas of the
game, I think perhaps the scripts COULD execute properly as
long as they do before screwing up.

comment:13 Changed 16 years ago by fingolfin

I doN't think trying GCC_PACK will be useful. Those structs only
contain uint16s, they will be tightly packed in any case.

Also, this wouldn't explain why it works with a checkout from
before "2003-05-20 15:00 GMT

comment:14 Changed 16 years ago by Kirben

Does setting struct VC10_state {} to 0 in vga.h make any
difference ?

comment:15 Changed 16 years ago by fingolfin

Nope it doesn't (and would have surprised me if it did, given that
this struct is only allocated on the stack, hence the operator new()
has no influence on it). We really only need to consider things
allocated via new().

comment:16 Changed 16 years ago by SF/jamieson630

There are 25 occurrences of operator new() in the
scummvm/simon directory. 16 of those are in sound.cpp, so I
haven't looked closely at them yet. Another one occurs in
MidiParser_S1D, and is not used for simon1win. The remaining
8 are in simon.cpp, and refer to the following objects:

SimonEngine (1 new)
Item (3 new)
FillOrCopyStruct (1 array new)
File (2 new)
SimonSound (1 new)

Since SimonEngine has so many members, I went through the
constructor, matching up inits with member vars. Everything
seems to be accounted for, either with explicit inits or
implicitly for object types with their own constructors, except
for two variables.

_number_of_savegames appears to be no longer in use now
that the savegames are being counted on the fly. It should
probably just be removed.

_letters_to_print_buf[80] is not cleared on construction. I'm
not sure if that actually makes any difference.

So far no other occurrence of new() has struck me as being
suspicious-looking. I'll continue to poke around as time
permits.

comment:17 Changed 16 years ago by Kirben

I mentioned _letters_to_print_buf[80] as not been cleared the
other night on IRC but fingolfin mentioned it would make no
difference.
_number_of_savegames is still used by o_unk_132_helper() in
simon/items.cpp.
That array and var. are now cleared at startup in ScummVM
cvs.

comment:18 Changed 16 years ago by Kirben

Also I had fingolfin try simon1 with all music/sound code
disabled the other day and the problem still occured. So the
problem isn't in music or sound code.

comment:19 Changed 16 years ago by fingolfin

Jamieson, feel free to re-check all occurances of new - but I did
exactly the same in the past, even before submitting this bug
report. I added most of the init stuff in the SimonEngine
constructor, and inits for all the other structs allocated via new,
for this reason. Of course I may have overlooked something
crucial, two pairs of eyes see more :-)

As for _letters_to_print_buf, it doesn't matter if it's inited or not
because one can easily prove that only the first
_num_letters_to_print chars in it are used at any given time.
Since _num_letters_to_print is 0 initially, and since it's only
incremented at the same time as something is written to
_letters_to_print_buf, that means its initial value is irrelevant.
nevertheless, of course I also tried it with a memset for that
array, doesn't help.

Finally, it still might be an alignment/endian bug. For example,
take this struct:
struct Foo { byte x; int y; };
Then at least over here, after the x-byte there are four padding
bytes. Now, consider this code:
Foo f = { 0, 1 };
byte z = *(int *)&foo;
On a little endian machine, this will set z to the value 0, even if
operator new does a memset(ptr, 0xE7, len). But on a big endian
machine, this would set z to 0xE7 with the memset active.

Hence, we should also look for cases where structs are cast to
pointers and then dereferenced. Doing something like that is
almost always a bad idea, even if it sometime might be
unavoidable.

comment:20 Changed 16 years ago by SF/jamieson630

I'm not real clear on the conditions under which a compiler will
pad a structure, or how it decides how much padding to
create. But if I understand what you just said and what you
said earlier, Fingolfin, padding would only be created after
single-byte members? For example, you said earlier that if a
struct contains all uint16's (16-bit values), padding would
never be added (under any cicrcumstances?).

With that in mind, I searched for Simon structs containing
single-byte members. The following structs qualify:

FillOrCopyStruct
MusicInfo
GameSpecificSettings
VocBlockHeader
VC10_state

I will poke around and see if the structs are getting typecast
in suspicious ways. (There are enough quirky typecasts in the
Simon code, to be sure.) Since I can't actually test anything,
I would recommend two checks for Mac testers:

(1) GCC_PACK the above structs. If it *is* an endian / struct
alignment / dereferencing issue, adding GCC_PACK should
theoretically fix the problem, right?

(2) If those stucts don't work, hell, add GCC_PACK to all
Simon structs just to make sure. There appear to be a total
of 27 structs that are Simon-specific. (I didn't check classes.)

comment:21 Changed 16 years ago by fingolfin

Padding depends on the compiler. The exact rules differ a lot
between archs, compilers, compile versions, and settings of that
compile. E.g. on Mac OS X + GCC 3.3, alignment can be:
packaged, "native", 68k, and now they also added a new padding
mode to optimize for the G5.
For details, look it up in the specs of your compiler.

A typical scheme is that byte sized elements are aligned to byte
boundaries, ints (4 bytes) to addresses divisible by 4, doubles
(8bytes) to address divisible by 8 etc.. So if you have three bytes
in a struct in sequence, they'll occupy sequential memory
locations; but if you have a single byte, and then an int, there'll be
3 padding bytes inserted. Etc. Of course there are many ways to
vary this.

comment:22 Changed 16 years ago by SF/jamieson630

Guess an update is in order. Fingolfin added GCC_PACK to
every struct used by Simon (excluding a couple sound
structs), with no change in the symptoms. Consequently, it
would seem unlikely that the problem is a struct alignment
issue. Back to the drawing board....

comment:23 Changed 16 years ago by SF/madshi

Perhaps it would be easier to just compare the two
checkouts from 2003-05-20? Can't you use the 12:00 GMT
checkout as a base and then piece by piece add what was
changed in the 15:00 GMT checkout? That's no "nice" style of
chasing bugs, of course. It's more the stupid way. But it
might work faster in this case...

comment:24 Changed 16 years ago by Kirben

Summary: SIMON1: Scene stays blackSIMON1 MAC: Scene stays black

comment:25 Changed 16 years ago by fingolfin

madshi, to clarify this, we know which precise change in that time
frame is "responsible" for the problem. It's the overloaded
operator new which "memset"'s some memory to 0xE7. However,
that is not the cause of the bug, it merely uncovers it.

BTW, the technique you describe is precisely what I did when filing
this report initially. It's actually quite nice a technique :-)

comment:26 Changed 16 years ago by SF/madshi

Ok fingolfin, thanks for the clarification. Then I've another
suggestion:

jamieson630 said that there are 25 occurences of "new()" in
the simon directory. How about adding a manual "memset(0)"
call after each of those 25 "new()" calls? If the problem is
gone after that, we can remove the memsets step by step to
find the guilty new() call. If the problem is not gone, a new()
call outside of the simon directory will probably be the guilty
one. Then it gets even more difficult, but at least we would
know a bit more. What do you think?

comment:27 Changed 16 years ago by fingolfin

That, too, is more or less what we have done. We added memset
calls to all involved classes, to ensure they are properly inited. Oh
and of course before and after that, we did Valgrind runs, to
detect access to uninitialised memory, to no avail.

The curious thing about this is that it only occurs on Big Endian
systems, or maybe even only on Macs (which is probably why
Valgrind didn't help). Odd.

BTW, Kirben, the name of this bug report was misleading: this bug
report is not about "simon1mac", rather it's about regular simon1
running on a mac.

comment:28 Changed 16 years ago by fingolfin

Summary: SIMON1 MAC: Scene stays blackSIMON1: Scene stays black on MAC

comment:29 Changed 16 years ago by SF/madshi

>> That, too, is more or less what we have done.

Oh, I read the whole thread and thought I understood it all.
Seems I missed some parts. I'm sorry for that.

>> We added memset calls to all involved classes

And the problem was still there?

So, it must be a "new()" call in one of the general parts of
scummvm, which brings Simon on Mac into problems (well, of
course the *real* problem is something else). Is that right?

Do you have built in stack trace functionality? I mean can
you inside of a function find out which function has called
you? (I can do that in Delphi). If you can, you should be able
to write a test log, which lists all the new() calls (and which
function has done the calls) which are done when running
simon. Then adding a memset(0) after each of them should
finally help in finding the problem. However, might be a lot of
work, I don't know how often new() is called in the general
scummvm sources. Or you might inside of new() either call
memset(0) or memset(somethingDifferent), depending on
which unit called new().

Hmmm... Perhaps I should stop here. Until now I've only
suggested things, which you've already done... :-) I just
thought, I'd drop some ideas...

comment:30 Changed 15 years ago by fingolfin

Owner: set to fingolfin
Resolution: fixed
Status: newclosed

comment:31 Changed 15 years ago by fingolfin

Found and fixed the cause of this, finally!

An uint16 member var was accessed something like this: *(int
*)&var -> this worked OK on a LE system, but not on BE (but of
course in either case it's bad to do this).

comment:32 Changed 14 years ago by fingolfin

Component: Engine: AGOSEngine: AGI
Game: Simon the Sorcerer 2Space Quest 2

comment:33 Changed 12 years ago by Kirben

Component: Engine: AGIEngine: AGOS
Game: Space Quest 2Simon the Sorcerer 1
Note: See TracTickets for help on using tickets.