Opened 17 years ago

Closed 17 years ago

Last modified 13 months ago

#8168 closed patch (outdated)

CMI: Smush experiment (don't apply)

Reported by: eriktorbjorn Owned by:
Priority: normal Component: Engine: SCUMM
Keywords: Cc:
Game: Monkey Island 3

Description

This patch doesn't fix any bugs that I'm aware of, but
I'd still be interested to hear if it makes any
difference - for better or worse - on Fingolfin's
iBook, or for anyone else who've had problems with the
CMI smush movies since the "frame skip" patch.

The patch removes the potentially unsafe timer
handling, and probably makes the frame skipping a bit
more forgiving by allowing it to lag behind a little.

That said, the patch still leaves some to be desired.
Pausing smush movies is still broken, for instance. Is
there any easy way to sync the frame to current sound
sample instead of elapsed time?

Ticket imported from: #676610. Ticket imported from: patches/273.

Attachments (3)

cmi-smushtest.diff (3.3 KB ) - added by eriktorbjorn 17 years ago.
Patch against a January 29 CVS snapshot
cmi-smushtest2.diff (5.0 KB ) - added by eriktorbjorn 17 years ago.
Patch against a January 29 CVS snapshot
cmi-smushtest3.diff (3.1 KB ) - added by eriktorbjorn 17 years ago.
Patch against a February 8 CVS snapshot

Download all attachments as: .zip

Change History (33)

by eriktorbjorn, 17 years ago

Attachment: cmi-smushtest.diff added

Patch against a January 29 CVS snapshot

by eriktorbjorn, 17 years ago

Attachment: cmi-smushtest2.diff added

Patch against a January 29 CVS snapshot

comment:1 by eriktorbjorn, 17 years ago

The second version of the patch moves the delay to before
drawing the frame, instead of after. This seems much more
natural to me, and might improve the sync slightly. It also
tries to be a bit more intelligent about frame skipping. In
theory, at least. In reality, it will probably need some
tuning, assuming it works at all.

Since frame skipping is now indicated by setting a variable
in ScummRenderer (should probably be in SmushPlayer instead,
but if so that's for later), it might be possible to make
the decoders aware of frame skipping as well.

comment:2 by fingolfin, 17 years ago

It does help a bit. While frame skipping is still bad, at least now it's avaraged. With current CVS, I get this (I modified it to print out the number of frames skipped, much more useful than knowning which frames it skipped, IMHO :-):

...
WARNING: ScummRenderer: Skipped 1 frames to catch up!
WARNING: ScummRenderer: Skipped 2 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 1 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 1 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 1 frames to catch up!
WARNING: ScummRenderer: Skipped 2 frames to catch up!
WARNING: ScummRenderer: Skipped 6 frames to catch up!
...

With your patch it is more like:

...
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
...

So still a lot is skipped, but it looks better because similar amounts are skipped each time. But later, when you see guybrush writing on his little boat, you see patterns like this (with your patch):

...
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 6 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 20 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 6 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 20 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 4 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 5 frames to catch up!
WARNING: ScummRenderer: Skipped 3 frames to catch up!
WARNING: ScummRenderer: Skipped 20 frames to catch up!
...

Maybe a different codec is used there, I don't know, which makes it slower. The patter repeats a lot. One guess would be that the long deleay comes after decompressing a master frame (i.e. a full screen), which then it only decompressed deltas w.r.t to that master... again just a guess.

comment:3 by eriktorbjorn, 17 years ago

This is discouraging. Out of curiosity, how does this
computer compare to the original minimal system
requirements? I.e. should it be able to handle this in the
first place?

If your computer should be able to handle CMI, then we're
going to have to try and optimize the smush decoder somehow.
At least codecs 44 and 47, which are both used by the intro
movie. I'm not sure exactly where or why it switches.

Once the movie reaches Guybrush it may also have to render
subtitles. If memory serves me, it has to re-render those
for every frame where they are visible, but I don't know how
much extra work that is. Does turning the subtitles off make
any noticeable difference?

Hmm... Is it just me, or does codec 44 decode to one buffer,
and then copy it to another, which in turn will be copied to
the backend? That doesn't seem very efficient to me...

comment:4 by fingolfin, 17 years ago

The original specs are: Pentium 90, PCI graphics card, 16 MB RAM, 4x CD.

Thiis iBook has a G3/500, ATI Rage Mobility (AGP/8MB VRAM), 384 MB RAM, 16x CD/DVD drive

-> yes it should easily be capable of handling this :-)

comment:5 by eriktorbjorn, 17 years ago

Darn, I was hoping you were testing on some really low-end
computer. :-)

It's strange, though. My computer is a few years old (I
forget the exact clockspeed, but it's something like a
400-500 MHz PIII), and the CPU load meter usually stays
around 15-25% during the opening movie, and I'm unlucky if I
lose more than half a dozen frames. At least as long as I
don't do anything silly like moving the window around or use
any of the graphics filters.

I run the game entirely from my hard disk, and my desktop is
at 16 bpp colour depth, if that makes any difference.

So if we can assume that it isn't the actual decoding that
is going too slowly, then what? The screen updates? The
waitForTimer() function? I suppose the latter could be a
problem, if get_msecs() / delay_msecs() behave worse under
OS X than under Linux or Windows. But why should they?
Still, it may be an interesting experiment to remove the
waitForTimer() calls and see if that makes the movie run
much too fast...

By the way, you can probably disregard most of what I said
about codec 44. I thought it was decoding an entire screen
at a time, but apparently it isn't.

comment:6 by SF/ender, 17 years ago

So, erm, given it helps a little - is it applyable? :)

Also, apparantly although ComI uses delta compression on
some codecs, smush is designed so it's possible to skip
decompressing every Xth frame. I'm not sure which frames can
be skipped, you should be able to find out by checking how
often keyframes (non-delta using frame) is decoded.

comment:7 by fingolfin, 17 years ago

I guess the second version of the patch maybe should be applied, though I am not 100% sure it's the right thing....

but w/o the patch, I just played the intro movie on my G4/400 (which is actually way faster than the G3/500 iBook), which has Comi on my fast HD, has 768 MB ram, an ATI Radeo 8500 etc. etc. And guess what: it stuttered a little bit first, and then when guybrush started to talk, it "hung" at the same frame for periods of 5-20 seconds. That's inacceptable. Even if the SDL blitter on OS X would be the bottleneck, that would be no excuse. Heck, I have mplayer (which uses SDL) play back DivX movies smoothly, I can't believe we can't play back 8bit comic graphics smoothly <sigh>.

comment:8 by fingolfin, 17 years ago

Just tried with the patch, and uhm, still no good. It was skipping and hanging and jerking, almost seemed worse than w/o the patch. To be fair I have to mention that right now a process is doing file operations in the back. But of course at the same time I can play back videos (scaled by arbitrary factors, too) just fine.

I will now test it with the GL backend, to see if that makes any speed difference.

Reading the data on the fly from HD, as we do now, might be a very bad thing, and partially cause of the problems we see. If the disk buffer isn't as optimized as it maybe should be, doing that can cause delays, and even a 10ms delay can be too much.

comment:9 by SF/ender, 17 years ago

You -are- absolutely sure your files are not corrupt this time?

I'm running it at the moment on a PIII-600 laptop, over a slow
wireless connection through a samba share - and while it's
skipping a -lot- of frames, the playback is still tollerably
smooth. This was without the patch btw

comment:10 by fingolfin, 17 years ago

Yes I am sure the files are fine. How do you think corrupt files could produce such a behavariour anyway? Note that before the sync changes (when audoi & video were not synced) both played perfectly, they just got out of sync over time.

Anyway, you are comparing Apples (literally <g>) with bananas here, I run on a completly different I/O and video subsystem.

comment:11 by fingolfin, 17 years ago

The subtitles are partially at fault it seems, when I use "-n" then guybrush plays just fine during his diary speech. With subtitles one, it bogs down to 1 frame / 10 secs or so as soon as the big subtitle appears.

comment:12 by SF/ender, 17 years ago

That pretty much narrows it down to video-related processing
as opposed to anything else - I believe the subtitles are
redrawn every frame individually?

comment:13 by eriktorbjorn, 17 years ago

The inefficient language.tab handling is partially to blame,
I guess, but I don't know if it should make *that* much of a
difference. (I've just submitted an experimental patch to
optimize this.)

I know I've asked this before, but is there any way at all
waitForTimer() could be waiting for too long under OS X? The
original frame skip patch made it so that this function is
always called at least once bewteen frames, and the
experimental patch still uses it for timing the delay right.

comment:14 by fingolfin, 17 years ago

I will try the language.tab fix.

Of course it is possible that waitForTimer waits longer, due to the way we implement it that can happen on any target OS if you use very short wait intervalls.

comment:15 by fingolfin, 17 years ago

With the language.tab, it plays perfectly smooth again on my G4. Indeed, quadratic lookup of a translation string, done each frame, does have a very negative impact...

comment:16 by fingolfin, 17 years ago

Just tested on my iBook, too. With this patch and the language patch, the movies play pretty well there. Not perfect, still has a little bit stuttering here and there, but overall it's enjoyable. I think we can apply this patch.

BTW, erik, I know you can't use CVS from work... if you can use websites, though, then you might be able to tunnel CVS, if you want I can provide you with some details how that could work.

comment:17 by eriktorbjorn, 17 years ago

I'm still puzzled that it stutters at all. As I said, my
computer probably is on par with yours, and I can watch
smush movies and compile ScummVM at the same time without
noticeable skipping.

If waitForTimer() is the culprit, then using a timer should
be more precise, though our current implementation obviously
leaves something to be desired.

By the way, I don't think there are any technical reasons to
keep me from comitting stuff to CVS. I can check out things
- even through SSH, if necessary - just fine. Heck, I've
even managed to set up MingW so I can compile ScummVM, even
if it's a bit awkward.

It just feels so much safer to have someone else do it and -
hopefully - look it over first. Some of the stuff I've
submitted has been pretty buggy initially. :-)

comment:18 by eriktorbjorn, 17 years ago

Could we try one more experiment before making any decisions
on the other ones? In this one I keep the timer mechanism,
but I change the wait() function back to use a while-loop again.

It will still call waitForTimer() at least once between
every frame, but if there aren't any pending updates it will
just use it to poll, not to delay. At least, I think that's
what'll happen. I don't know why I didn't do it like this
from the start. I guess I didn't realize just how long
waitForTimer(1) could delay.

This is probably the most important part of the patch, but
I've also added the logic that allows the movie to lag
behind a bit before it starts skipping frames. I still don't
know if this is a good or bad thing, but what the heck.

I considered adding a mutex for _pending_updates, but I
don't see why it would be needed. I don't think there's any
harm if the threads modify it in the "wrong" order, and it's
already declared as volatile... But I'm not sure just when
and where mutexes are needed, so don't take my word for it.

Another thing I hadn't realized was just how badly the
opening smush movie slows down during the credits. We're
really going to need that language.tab patch - or similar -
to get this to work.

by eriktorbjorn, 17 years ago

Attachment: cmi-smushtest3.diff added

Patch against a February 8 CVS snapshot

comment:19 by fingolfin, 17 years ago

Doesn't really perform better, but I think our problem is more file I/O related anyway. We read the data on-demand, and if it isn't retrieve fast enough (after all, CD roms have bad seek times), then you get lag effects. Maybe if the reading would be done via a seperate thread and with double buffering...

comment:20 by eriktorbjorn, 17 years ago

Ah, you're running it from the CD? I've always run it from
hard disk since I find that much more convenient. (At least
as long as I have the disk space to spare.)

comment:21 by fingolfin, 17 years ago

Yes on the iBook I am running it from CD, and on the G4 from HD (which is one of the reasons why it runs smoothly there). I thought that was clear (I did state the speed of my CD drive for that reason :-). Maybe it would even work smooth if run from HD, but I don'T have the space to spare on the HD, plus I think it should be our goal to support playing from CD - after all, the original game allows that just fine, too.

comment:22 by eriktorbjorn, 17 years ago

I guess I just missed/forgot about the CD thing.

I don't remember how fast my CD-ROM is (it's a CD-writer,
and it's a few years old so I figure it can't be *that*
fast), but playing the game from it worked fine for me. I
did miss a few more frames than usual, but nothing
noticeable. (Of course, I had to apply the language.tab
patch to get through the credits section.)

I tried stressing it a bit by grepping through the resource
directory at the same time, but it still didn't skip too
badly. Starting yet another simultaneous grep process proved
too much for it, though.

So basically, I'm out of ideas.

comment:23 by fingolfin, 17 years ago

The faster your CD drive is, the slower it'll playback the video I think. Because the new CD drives first have to spin up to get their maximum speed, and that takes time and briefly increases the access times. Now, if your disk cache is "too big", then enough data will be cached in it to cause the CD drive to spin down in the mean time (this seems to be happening here), so when the cache goes empty, it first has to spin up the drive again, access the data and read it, which is for the first couple milliseconds slower than normal (CD seek times are much much higher than HD seek times).

This all would be no issue, wasn't it for the fact that our code basically assumes that fread() works instantly. This is a bad thing to assume for any video or audio playback code. THe proper solution employed in such situation is double buffering.

comment:24 by fingolfin, 17 years ago

I should follow my own advice and first fully analyze things before stating what their "cause" is <sigh>... so last night I went through the smush playback, and I found out that the CD rom is not really the cause (or at least not the worst part of it). I finally had enough space free on my HD, so I copied the data files there - and skipping was still bade (like every 2nd frame?). Anyway, I then run (again, did that before) gprof while playing the COMI intro smush video. It showed that 33% of the time was spent in calls to memcpy. Ouch. So I commented out the calls to the decoder / memcpy in the codec47 - i.e. I basically took out all the time used to decode the video data (the sound was still being decoded). Guess what - skipping was still very bad! (invisible of course, but we do print out this debug messages whenever a frame is skipped).

So I played some more, and in the end, found out that I could almost completly get rid of all skipping by leaving everything enabled (i.e. doing full decoding) except for the call to _scumm->_system->update_screen() in scumm_renderer.cpp. Ahah! So this is the bottle neck on my iBook.

Of course the data pipeline is quite bad, when you think about it:
1) Decode into buffer
2) copy buffer to scumm screen buffer
3) use copyrect to copy to the first backend buffer
4) backend converts buffer from 8 to 16 bits (GUI is drawn atop)
5) backend uses scaler on the data (yet another "blit"); this costs even when using the 1x scaler, BTW
6) then we tell SDL to blit it to the screen. In my case, SDL has to convert the 16 bit buffer to the 32bit screen depth first, though...

We are using 6 buffers! Note that in the basic case (no GUI visible, 1x scaling), we could directly render the buffer from the decoding (in 1) to the screen, saving 4 stages in between. Ouch. And with the GUI on, the video is still anyway, so we don't have to worry about throughput... I wonder if we should extend the backend to specifically support movie playback. I.e. we could just pass it the pointer to a full screen buffer which is immediatly blitted to the screen, at most with the scaler applied to it (that would still avoid two unnecessary steps).

comment:25 by SF/ender, 17 years ago

I'm not sure that much optimisation is wise. As I've been
saying for months, we need (at least) to remove SMUSH's own
buffers at some stage and use virtscreens, for INSANE to be
able to map actors on a video underlay.

However, direct rendering to OSystem backends would
require more spagetti code to do INSANE, as if it isn't going to
be an ugly enough hack for FT already.

However, if it comes to this, can we just use the overlay
system? It doesn't optimise the patch as much as a -direct-
access, but it reduces the need for a new api with limited use.

comment:26 by fingolfin, 17 years ago

Uhm, Endy, you are not quite up-to-date, we already render directly to the virtScreen :-) And I was not suggesting to remove that, either.

Using the overlay code is no option since the overlay expects 16bit colors. Since we still would have to conver from 8 to 16 bit, we would not save any cost.

The change I had in mind did not actually complicate the backend API much (one additional call), and in fact would have been transparent for existing implementations (that additional call would have an default implementation, so it would "just work"). Also the Smush/Insane code would really not have been more complicated:
I wanted to add a call blitToScreenAndUpdate(byte *buffer) which essentially combines a full screen copy_rect() followed by an update(). As long as no overlay is visible, the data stream could have been optimised. And in the smush code, we only would have to make a tiny change.
However, some benchmarking I performed today rendered this moot: even with this change the skipping is pretty bad here. And of course, if we perform scaling, we loose almost all our advantage again. The main problem there is that the scalers are all from 16 to 16 bit. But our input data is 8 bit, while our output may be 16, 24 or 32 bit.

But in any case, it seems on my system the real bottleneck is SDL_UpdateRects :-(

comment:27 by eriktorbjorn, 17 years ago

I haven't looked at aquadran's recent smush changes yet, but
perhaps we should close this one as obsolete?

comment:28 by fingolfin, 17 years ago

Resolution: outdated
Status: newclosed

comment:29 by fingolfin, 17 years ago

Done.

comment:30 by digitall, 13 months ago

Component: Engine: SCUMM
Game: Monkey Island 3
Note: See TracTickets for help on using tickets.