Opened 12 years ago

Closed 12 years ago

Last modified 10 months ago

#8632 closed patch

WinCE Port: ARM version of PocketPCHalf

Reported by: SF/robinwatts Owned by: SF/knakos
Priority: normal Component: Port: WinCE
Keywords: Cc:
Game:

Description

This patch offers an ARM assembly version of the PocketPCHalf scaler. This scaler is used for "The Curse of Monkey Island" to take a 640x480 screen (in either 565 or 555 format) down to 320x240 by averaging groups of 4 pixels together.

The ARM version is almost exactly the same as the C version, except it gets the rounding of pixel components correct.
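The averaging described above can be sketched in C as follows. This is a minimal illustration for the 565 format only, not the code from the attached patch: the names `average4_rgb565` and `half_scale_rgb565` are hypothetical, pitches are assumed to be measured in pixels, and "correct rounding" is assumed here to mean round-to-nearest (adding 2 before dividing by 4).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average four RGB565 pixels, rounding each
 * component to nearest by adding 2 before the divide by 4. */
static uint16_t average4_rgb565(uint16_t p0, uint16_t p1,
                                uint16_t p2, uint16_t p3)
{
    /* Sum each channel separately so carries cannot leak between fields. */
    uint32_t r = ((p0 >> 11) & 0x1F) + ((p1 >> 11) & 0x1F)
               + ((p2 >> 11) & 0x1F) + ((p3 >> 11) & 0x1F);
    uint32_t g = ((p0 >> 5) & 0x3F) + ((p1 >> 5) & 0x3F)
               + ((p2 >> 5) & 0x3F) + ((p3 >> 5) & 0x3F);
    uint32_t b = (p0 & 0x1F) + (p1 & 0x1F) + (p2 & 0x1F) + (p3 & 0x1F);

    r = (r + 2) >> 2;
    g = (g + 2) >> 2;
    b = (b + 2) >> 2;

    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Hypothetical driver: shrink a src_w x src_h image to half size in each
 * dimension. Pitches are in pixels (an assumption), dimensions must be even. */
static void half_scale_rgb565(const uint16_t *src, int src_pitch,
                              uint16_t *dst, int dst_pitch,
                              int src_w, int src_h)
{
    assert(src_w % 2 == 0 && src_h % 2 == 0);

    for (int y = 0; y < src_h; y += 2) {
        const uint16_t *row0 = src + y * src_pitch;
        const uint16_t *row1 = row0 + src_pitch;
        uint16_t *out = dst + (y / 2) * dst_pitch;

        for (int x = 0; x < src_w; x += 2) {
            out[x / 2] = average4_rgb565(row0[x], row0[x + 1],
                                         row1[x], row1[x + 1]);
        }
    }
}
```

For a 640x480 source this would be called with `src_w = 640` and `src_h = 480`, producing 320x240 output. A 555 variant would use three 5-bit fields at shifts 10, 5 and 0 instead.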

Ticket imported from: #1696852. Ticket imported from: patches/737.

Attachments (2)

diff (3.9 KB) - added by SF/robinwatts 12 years ago.
New ARM PocketPCHalf implementation, plus C changes to call it
diff.2 (7.3 KB) - added by SF/robinwatts 12 years ago.
New PocketPCHalf implementation (unrolled), plus C changes to call it


Change History (9)

Changed 12 years ago by SF/robinwatts

Attachment: diff added

New ARM PocketPCHalf implementation, plus C changes to call it

comment:1 Changed 12 years ago by SF/knakos

Very good. Rounding should look good as well. Why not try unrolling it a couple of times for extra speed? Do you pass any special options to 'as' in your makefile?

comment:2 Changed 12 years ago by SF/knakos

Owner: set to SF/knakos

comment:3 Changed 12 years ago by SF/robinwatts

The inner 'per destination pixel' core is 19 cycles, so unrolling can at best approach 19 cycles per pixel.

On XScales, branch prediction should mean that branches cost just a single cycle, so the inner loop works out at 21 cycles. Unrolling once would get that down to 20, at the cost of more complex logic. That doesn't seem like enough of a win to justify the additional complexity and code size to me, but I'll do it if you want.
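The XScale figures above fit a simple amortization model, sketched below. The function name `cycles_per_pixel` is hypothetical, and the 2-cycle loop overhead is only inferred from the difference between the 21-cycle loop and the 19-cycle core; it is not a measured value.

```c
/* Hypothetical model: amortized cycles per destination pixel when the
 * per-pixel core is repeated `unroll` times per loop iteration and the
 * loop overhead (counter update plus branch) is paid once per iteration. */
static double cycles_per_pixel(int core_cycles, int loop_overhead_cycles,
                               int unroll)
{
    return (double)(core_cycles * unroll + loop_overhead_cycles) / unroll;
}
```

With a 19-cycle core and an assumed 2-cycle overhead, this gives 21 cycles with no unrolling and 20 with the core repeated twice, and it tends towards 19 as the unroll factor grows.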

For non XScales, we lose the branch prediction stuff, which means we have to figure on branches costing 5 cycles or so. Even so, we should be within 26% or so of optimal with this code (assuming we are operating entirely within the D-cache). I'll have a play with an unrolled version with this in mind to see if savings can be made.

No special options passed to as.

Changed 12 years ago by SF/robinwatts

Attachment: diff.2 added

New PocketPCHalf implementation (unrolled), plus C changes to call it

comment:4 Changed 12 years ago by SF/robinwatts

Here is a new version, with unrolled middle loop. Should be 19.5 cycles per pixel on XScale, 20.5 on non XScale.

File Added: diff

comment:5 Changed 12 years ago by SF/knakos

Status: new → closed

comment:6 Changed 12 years ago by SF/knakos

Excellent work Robin. Committed in the trunk in rev. 26442.

* Be aware that The Curse of Monkey Island currently has a memory leak which leads to a crash before long in the WinCE version. This will be taken care of shortly by the dev team.

comment:7 Changed 10 months ago by digitall

Component: Port: WinCE