Opened 12 years ago

Closed 11 years ago

Last modified 5 months ago

#8706 closed patch (wontfix)

Optimisations to DXA Player

Reported by: SF/robinwatts Owned by: sev-
Priority: normal Component: Video
Keywords: Cc:
Game: Broken Sword 1

Description

Currently the DXA player does a bit more work each frame than it needs to.

Firstly, we read into one of the framebuffers. Then we malloc a new buffer and copy into that, decompress from it, then free that new buffer.

Later on we then memcpy the entire frame from one frame buffer to another.

In the new decode methods (12 and 13) we also malloc and free buffers each frame.

This patch reduces this overhead; we now have a couple of extra buffers; an inBuffer which we use to read into, a decompBuffer which we decompress the data into (only required for films which use methods 12 or 13).

Also, by decompressing into the correct buffers to start with we can avoid several memcpys.

I've tested this locally with Broken Sword I and it seems to work fine. Unfortunately my films don't use method 12, so I need to backtrack through the encoder versions and regenerate some films to try.

This patch is uploaded here in the meantime for comments.

Ticket imported from: #1744229. Ticket imported from: patches/811.

Attachments (6)

DXAOptimisations (8.5 KB) - added by SF/robinwatts 12 years ago.
First version of the DXA optimisations
DXAOptimisations2 (8.0 KB) - added by SF/robinwatts 12 years ago.
Second version of the DXA Optimisations
DXAOptimisations3 (9.7 KB) - added by SF/robinwatts 12 years ago.
Third version of the DXA changes
DXAToolsChanges3 (4.1 KB) - added by SF/robinwatts 12 years ago.
Tools changes to match v3 of the decoder diffs.
DXAChanges4 (8.9 KB) - added by SF/robinwatts 12 years ago.
Fourth version of the DXA diffs
DXAToolsChanges4 (4.3 KB) - added by SF/robinwatts 12 years ago.
Tools changes to match v4 of the decoder diffs.

Download all attachments as: .zip

Change History (34)

Changed 12 years ago by SF/robinwatts

Attachment: DXAOptimisations added

First version of the DXA optimisations

comment:1 Changed 12 years ago by SF/robinwatts

Oh, I forgot to say that this includes the 'upscaling' diffs from pathch 1709219 too. I can clean them out later if we want to commit one and not the other. Sorry.

Changed 12 years ago by SF/robinwatts

Attachment: DXAOptimisations2 added

Second version of the DXA Optimisations

comment:2 Changed 12 years ago by SF/robinwatts

Better patch:

* memory failures now result in error() rather than silent failure.
* Removed the contents of patch 1709219 from this patch.
* Tidied the { } style.
* Added code to cope with a new tag telling us how big the decompBuffer should be (no files need be recompressed, but we *can* recompress them (once I've updated the compressor) so they can run in less memory.
* Removed a couple of unrequired params.

File Added: DXAOptimisations2

comment:3 Changed 12 years ago by fingolfin

Patch looks good to me and could from my point of view be checked in. That is -- where does that new MAXD chunk come from? Shouldn't there also be a patch for the encoder tool which generates this tag?

While looking at this patch I noticed that the code in DXAPlayer::decodeNextFrame() seems quite odd to me. So we read a "tag", and if it is equal to CMAP, we do something. And if not? Then we pretend nothing happened and read the next tag, without any attempts at error handling, trying to skip over an unknown tag -- nothing. Is that really intended? The new code for handling the new MAXD chunk does the same. This seems questionable to me... (but this comment is certainly *not* direct against this patch here).

comment:4 Changed 12 years ago by SF/robinwatts

The MAXD chunk is a new one. I plan to do a patch for the encoder tool to generate that one next.

I thought the tag handling in decodeNextFrame looked odd too. It reads a tag which (I think) will either be NULL or CMAP, and then reads the frame.

My patch reads a tag; if it's the new MAXD chunk, then I process it, and read the next tag. This is then compared with CMAP as before. In this way the natural flow of the file isn't changed (i.e. existing files will work unchanged).

I won't commit this patch until I've got the encoding stuff working too.

comment:5 Changed 12 years ago by Kirben

There are currently no checks for other tags when decoding frames, as only CMAP or FRAM tags would exist, with a NULL tag used when they aren't required.

The code clean ups look fine, but the the placemtn of the MAXD chunk, looks like an abuse of the current DXA file format. Wouldn't the MAXD chunk be better placed in the actual headers? along with adding a version check for the header? the high bit of flags seems only space left in the headers, for a version check.

Changed 12 years ago by SF/robinwatts

Attachment: DXAOptimisations3 added

Third version of the DXA changes

comment:6 Changed 12 years ago by SF/robinwatts

Here are the latest versions of my decoder tweaks.

I haven't addressed kirbens comments on the placement of the MAXD chunk yet - will think about that in a bit.

This version of the diffs introduces a ZCNT chunk, that switches the decoder into 'continuous compression' mode. This means that a) we only initialise/finalise the decompressor once (so less mallocing and freeing overall), and b) means that each frame gets to be encoded with the benefit of having the zlib window primed with the previous frame.

I had expected this to offer a significant benefit, but in fact it seems to give me just 1.5% overall on the file I've tested it with. Disappointing to say the least.

I'll upload the corresponding encoder patch next too.

File Added: DXAOptimisations3

Changed 12 years ago by SF/robinwatts

Attachment: DXAToolsChanges3 added

Tools changes to match v3 of the decoder diffs.

comment:7 Changed 12 years ago by SF/robinwatts

The changes to encode_dxa required to support the MAXD and ZCNT extensions.

Given the paltry space benefits that these offer, I'm not sure that the extra complexity is worth it - unless the malloc/frees saved add up to something?

File Added: DXAToolsChanges3

comment:8 Changed 12 years ago by johndoe123

I tried the same what you call 'continuous compression' when coding the encoder, with similar results.
Additionally, it's dangerous to use this since it makes a possible future seeking functionality completely impossible.

Also, I wouldn't add so many chunk types (like Kirben said). I'd try to set a bit in the header (the byte where the scaling is stored has some bits free) and read an extended header after the main header.

I tried so many different approaches to get the files even smaller but nothing really worked well. Most of the time the resulting data, with new block types (RLE etc), for example, was smaller but the changes introduced more 'noise' and zlib produced bigger compressed data.

Other compression algorithms could improve compression. I tried LZMA which produced smaller files but also took considerably longer. Oh well. But I digress, that's another topic :)

comment:9 Changed 12 years ago by SF/robinwatts

For a saving of 1.5%, I think I agree with you - the loss of the ability to seek outweighs the extra benefit.

There may be a halfway house to consider here; we could use Z_FULL_FLUSH rather than Z_FINISH. This should (AIUI) allow us to have a single inflate/deflate instance and yet still to have the ability to seek.

i.e. we'd get the same compression as before, but for less work on each frame.

I also (prior to you introducing decode type 13) experimented with RLE and got similar results.

Some other ideas I had, but haven't tried yet:

* Decode type 13 allows for blocks to be detected as being moved, with motion vectors of +/- 8 (or so). For some films (such as the broken sword 1 intro) the pans are faster than this; would it be worth sending a 'whole frame motion vector' at the start?

* For frames which are static, but which have something moving in the middle of them, would it be worth sending a bbox at the start of each frame? This could limit the encodings work so we only have to code the stuff within that bbox.

* If I read coding13 properly, it codes 4x4 blocks, and sometimes breaks them down into 2x2 blocks. Maybe we could consider coding 32x32s, down thru 16x16s, 8x8s, 4x4s and 2x2s ?

comment:10 Changed 12 years ago by johndoe123

1.
This would probably only work if the whole frame moves and doesn't change in any other way. Most frames, though, even if they seem to look the same, have several pixels changed.

2. & 3.
I tried something some time ago that addresses these two things (they're similar problems):
It used some kind of quad-tree hierarchical approach.
It's quite simple: It starts out with the whole frame, divided into 4 quadrants.
For each quadrant it more or less does the same the encoder currently does with 4x4 blocks, i.e. skip, motion compensation, color reduction etc. Then it stores the smallest result in a buffer.
If the resulting size is not constant it divides the quadrant further into 4 more quadrants. This all works recursively.
The output size was good, a little bit smaller than now, but not good enough.

I was working on an improved encoder, mostly code-wise, that will make it easier to add new compression ideas.
Basically, each of the codecs is a class (derived from a base class, of course) that can encode one single frame. The previous and current frame are given in a list (I also played with using more than just one previous frame).
Each codec class is then inserted into a list. The final encoder that is itself derived of the same base class.
Well, the design wasn't that bad and better than the current one :)

comment:11 Changed 12 years ago by SF/robinwatts

1. The whole frame motion vector thing does work if just a few pixels have changed; we pick the vector which results in the smallest change. (This is an interesting problem for a lossless palettised codec - I guess we'd count the number of pixels that change, rather than SAD or SSD as we'd do in a lossy codec).

Then we code as normal - and hopefully far more of it will have no changes (i.e. the blocks will code small).

2. The idea of a bbox might be more affected by "stray pixels" changing outside that box though.

> "I was working on an improved encoder..."

Does that mean you've stopped work on it now?

comment:12 Changed 12 years ago by johndoe123

1. Ah, ok. That could help.

>> "I was working on an improved encoder..."
> Does that mean you've stopped work on it now?

More or less. I sort of over-engineered the thing and it became strange :)
I'll probably try to do it again, though, since the design I had before it got messy was imo not bad, and adding new codecs was really easy.

comment:13 Changed 12 years ago by fingolfin

Folks, it's great you are thinking about how to improve the DXA codec. However, be careful not to ignore some pragmatic aspects:

* 0.10.0 is out, with the current DXA decoder in it. Any new improved DXA codec which is not fully backward compatible must provide some really noticable improvement, else it just isn't worth the pain, because people won't want to re-encode or re-download cutscenes just for a few bytes

* Being able to seek is a very useful feature. So if you want to do "continous compression", you probably should adapt a B/I/P-frame approach as in MPEG?

* Yes, using Z_FULL_FLUSH etc. can be used to retain seeking capabilites and so on. But the code gets more complex. More complex = more error prone, harder to debug, harder to maintain. I'd not even *think* about including it for less than 5-10% percent.

* Keep CPU constraints in mind, too: The decoder must be fast enough for all sorts of machines. And while the encoder doesn't have to work in real-time, if it gets too slow (which quickly happens the more complicated you make the codec), people won't be too happy either. Again, if the savings are not really really worth the effort, I'd rather not add this complexity.

As it is, we are still bigger than the (lower quality!) MPEG2 videos, but AFAIK a lot smaller than the original Smacker videos, so that's fine already. Improving the compression more would be great, but you have to weigh the gain against the costs. Don't overengineer :).

Anyway: This tracker item seems like the wrong place for this discussion ;-). The actual question that should be discussed here is the MAXD chunk, as once we add that, we will have a hard time moving back. In particular as it will make "new" DXA movies incompatible with older ScummVM versions, a step I'd think about very very hard before taking it -- is it really worth the savings? Do we need MAXD at all? Is it so bad to maybe have to reallocate a buffer a handful of times?

comment:14 Changed 12 years ago by johndoe123

> * backward compatible
I think it's almost a must to stay backward compatible and not that hard to do either.

> * seek, "continous compression"
"Continous compression" means the same approach like "solid archiving" in RAR etc., don't compress each frame independantly but see them as continuous stream.

> * More complex = more error prone, CPU constraints
I agree completely.

> * if it gets too slow
The slowest thing in the encoder is the zlib compression. The codec13 is pretty fast on my machine (16-20ms) while compression is several times slower. But you're right.

> Don't overengineer :).
Very wise words, something every coder should keep in mind :)

> This tracker item seems like the wrong place for this discussion
Well we could move to the mailing list?

This is almost worth a separate library :)

comment:15 Changed 12 years ago by SF/robinwatts

Fingolfin: I agree that it's not worth it for the current levels of compression benefit. (I say as much in one of the comments below).

I'm tempted to lose the MAXD chunk, as (as Kirben says) it is a bit of an abuse of the file format. I'll produce some more patches that do it using an extension to the header.

The upshot will be:

* Code that decompresses faster than the current stuff, given existing films (and new ones).

* A tweaked encoder that will produce films that decompress using slightly less memory than before (but old films will still work fine).

* This will leave us with a fileformat with an extensible header system, so that we can add new header fields transparently in future versions.

* No real extra complexity.

If I do do any experiementation with the codec itself, I'll write it up on a Wiki page and point john_doe at it, so we can discuss things there - but yes, we'd be wanting to drastically improve the compression to make it worthwhile.

Changed 12 years ago by SF/robinwatts

Attachment: DXAChanges4 added

Fourth version of the DXA diffs

comment:16 Changed 12 years ago by SF/robinwatts

New patches. This is the latest encoder patch, exactly as described before, except I've left the ability to do continuous compression in.

Having written the code, and had it come out really quite neatly, I'm loathe to lose it, cos it might be useful in future.

The encoder patch (which I'll upload next) includes code to use it, but it's disabled by default.

This uses bit 0 of the flags byte to signal an 'extended header', which is a series of TAG/SIZE/DATA BLOCK areas at the end of the header terminated with a 0 tag. This means the code can skip over any unknown tags. None of the references I've found for DXA files show bit 0 as being used.

File Added: DXAChanges4

Changed 12 years ago by SF/robinwatts

Attachment: DXAToolsChanges4 added

Tools changes to match v4 of the decoder diffs.

comment:17 Changed 12 years ago by SF/robinwatts

File Added: DXAToolsChanges4

comment:18 Changed 12 years ago by SF/pwigren

If you update the DXA format again, please make encode_dxa able to re-encode older dxa files to the new format, without having to go through the PITA process of converting the SMKs...

comment:19 Changed 12 years ago by fingolfin

Use our patch tracker to make feature requests. Here is not the place to make them, anyway.

comment:20 Changed 11 years ago by sev-

So, with MAXD chunk, is it still backwards-compatible? If it is, I'll commit these patches.

comment:21 Changed 11 years ago by sev-

Owner: set to sev-

comment:22 Changed 11 years ago by Kirben

The MAXD chunk changes are backwards compatible, but don't offer enough benefits (1.5% difference), to make extending the DXA header worth it.

comment:23 Changed 11 years ago by SF/robinwatts

Kirben is right. I should remove the MAXD stuff and just commit the buffering optimisations.

That'll be entirely compatible with what we've currently got, just faster.

I had hoped for more than 1.5%, which might have justified the extensions... ah well.

comment:24 Changed 11 years ago by sev-

So, any plans on this? We really have to make it before release.

comment:25 Changed 11 years ago by SF/robinwatts

The non contentious parts of this patch have gone to svn already.

The other stuff (continuous compression) just doesn't offer enough benefit to justify the need to fork the bitstream. So closing the patch.

comment:26 Changed 11 years ago by SF/robinwatts

Status: newclosed

comment:27 Changed 11 years ago by sev-

Resolution: wontfix

comment:28 Changed 5 months ago by digitall

Component: Video
Game: Broken Sword 1
Note: See TracTickets for help on using tickets.