Opened 12 years ago

Closed 10 years ago

Last modified 12 months ago

#8851 closed patch

Speech for Mac BS1 english

Reported by: criezy
Owned by: sev-
Priority: normal
Component: Engine: Sword1
Keywords:
Cc:
Game: Broken Sword 1

Description

According to some reports on the forum (see http://forum.scummvm.org/viewtopic.php?t=5453), the English version of the Mac BS1 uses big endian for the speech file. However, this is not the case for all Mac BS1 versions (my French version has a little endian speech file identical to the PC version).
Making the game work with a big endian speech file is easy (the attached patch does it and has been tested by Mort; see the aforementioned forum topic).

The difficulty is knowing when to expect big endian speech files and when to expect little endian ones. The patch assumes that the PC version always uses little endian (should be safe) and that for the Mac version (both demo and full) it depends on the language. In the patch only the English language is switched to big endian speech files (see SwordEngine::init()).

If it is indeed linked to the language (and not to something else, such as the original interpreter version), the patch will make the game work for all PC versions and for the French and English Mac versions.

Ticket imported from: #1936137. Ticket imported from: patches/956.

Attachments (4)

bs1-mac-en.diff (4.3 KB ) - added by criezy 12 years ago.
bs1-mac-en-v2.patch (3.4 KB ) - added by sev- 11 years ago.
Updated patch for r38961
bs1-mac-en-v3.patch (6.2 KB ) - added by criezy 11 years ago.
Updated with heuristic to make the decision
bs1-mac-en-v4.patch (6.0 KB ) - added by criezy 10 years ago.
Update patch for r42417


Change History (17)

by criezy, 12 years ago

Attachment: bs1-mac-en.diff added

comment:1 by Kirben, 12 years ago

Actually, the speech files in the English Macintosh demo of Broken Sword 1 are still little endian; the current code in ScummVM was based on testing with this demo.

comment:2 by criezy, 12 years ago

I remember that demo was mentioned in the forum. But as I don't have it and it is not listed on the demo download page, I couldn't check its speech file.

Also, there was a patch for the Mac version of BS1 that corrected bugs with the music and sound (and it is still mentioned on Revolution's support page, although the link is dead). Maybe it also made the original interpreter use a big endian speech file for performance reasons.

Anyway, as we don't know how to choose between little and big endian speech file for the mac version (and any choice will be based on wild guesses) maybe we should actually make that a user setting.

comment:3 by criezy, 12 years ago

I remember that demo was mentioned in the forum. But as I don't have it and it is not listed on the demo download page, I couldn't check its speech file.

Also, there was a patch for the Mac version of BS1 that corrected bugs with the music and sound (and it is still mentioned on Revolution's support page, although the link is dead). Maybe it also made the original interpreter use a big endian speech file for performance reasons.

Anyway, as we don't know how to choose between little and big endian speech file for the mac version (and any choice will be based on wild guesses) maybe we should actually make that a user setting.

comment:4 by sev-, 11 years ago

Owner: set to sev-

comment:5 by fingolfin, 11 years ago

Time to revive this... :)

I don't think a user setting would be a good solution; it's the "last way out" option if all else fails; but first we should try to detect this difference automatically if we can in any way.
First off, one should check if there is some header bit somewhere which indicates this difference (looking at the screenshot in the linked forum post, this sadly does not seem to be the case; but I don't have BE data files for BS1, so I am not 100% sure).

Next, we could try to use (partial) MD5s to detect the respective versions. Drawback: Doesn't work for currently unknown versions.

Lastly, we could use a heuristic: add a global flag _clustersAreBigEndian, default to false. Then, look at some suitable default WAVE from the data files (not sure if that makes sense, I don't know BS1 well enough -- but maybe it's possible to e.g. always look at the sounds from the intro screen or so).
To "guess" whether a set of 16 bit samples is in LE or BE, do this: Read the first few samples in one endianness. Then, compute the differences between consecutive samples. If your endianness was right, then statistically, those differences should mostly be very small (esp. at the start of a sample which starts quiet).

To illustrate this, consider this data in BE:
00 0E 06 99 05 89 03 78 04 A6 08 CF
and the same in LE
0E 00 99 06 89 05 78 03 A6 04 CF 08

If I read the first set as BE data (correct), the differences are less than 0x0800. If I read the second set as BE (wrong), there are differences bigger than 0x8000. Hence the guess would be that the first set is BE, the second is LE.

The idea is to do this once, at startup, for a sound for which we know that it starts out silent. The heuristic then would be extremely accurate.
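The byte strings quoted above can be checked with a short sketch. It is only an illustration of the idea, not code from any of the attached patches: the names `sampleAt`, `maxStep`, `looksPlausible` and `kCutOff` are hypothetical, the samples are assumed to be signed 16-bit PCM, and the 0x0800 cut-off is simply the bound mentioned in this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// The two byte strings quoted in the comment above.
static const uint8_t kBigEndianExample[] = {
    0x00, 0x0E, 0x06, 0x99, 0x05, 0x89, 0x03, 0x78, 0x04, 0xA6, 0x08, 0xCF};
static const uint8_t kLittleEndianExample[] = {
    0x0E, 0x00, 0x99, 0x06, 0x89, 0x05, 0x78, 0x03, 0xA6, 0x04, 0xCF, 0x08};

// Hypothetical helper: decode the i-th 16-bit sample from a raw byte buffer,
// assuming signed PCM data.
static int16_t sampleAt(const uint8_t *bytes, size_t i, bool bigEndian) {
    const uint8_t hi = bigEndian ? bytes[2 * i] : bytes[2 * i + 1];
    const uint8_t lo = bigEndian ? bytes[2 * i + 1] : bytes[2 * i];
    return static_cast<int16_t>(static_cast<uint16_t>((hi << 8) | lo));
}

// Largest absolute difference between two consecutive samples when the
// buffer is read with the given byte order.
static uint32_t maxStep(const uint8_t *bytes, size_t numSamples, bool bigEndian) {
    assert(bytes != nullptr);
    uint32_t worst = 0;
    for (size_t i = 1; i < numSamples; ++i) {
        const int32_t d = int32_t(sampleAt(bytes, i, bigEndian)) -
                          int32_t(sampleAt(bytes, i - 1, bigEndian));
        const uint32_t step = static_cast<uint32_t>(std::abs(d));
        if (step > worst)
            worst = step;
    }
    return worst;
}

// Assumed cut-off, taken from the bound quoted in the comment.
static const uint32_t kCutOff = 0x0800;

// True if every step stays below the cut-off for the given byte order.
static bool looksPlausible(const uint8_t *bytes, size_t numSamples, bool bigEndian) {
    return maxStep(bytes, numSamples, bigEndian) < kCutOff;
}
```

With these six samples, reading either buffer in its own byte order keeps every step below the cut-off, while reading it in the other order produces steps above 0x8000.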

comment:6 by criezy, 11 years ago

The heuristic you propose sounds like a good idea. You could even compute the difference between consecutive samples by first supposing the data are big endian and then little endian and keep the smallest one (this way you don't rely on a hardcoded cut-off).

Not sure which sound should be used though. The sound for the intro sequence is not in the speech file (it is provided in the cut-scene pack). The music is also in other files (AIF for the Mac version). I suppose the first sound in the speech file is when George is brushing himself off after the explosion in the cafe. But my guess is that all speech sounds start at a low level and that two consecutive samples should be fairly similar anyway (even if the sound itself is loud), in other words that you have a smooth wave and not spikes.

comment:7 by bluegr, 11 years ago

Just had a look at criezy's screenshot of the two files (Mac and PC version). It seems that the WAVE files of the Mac version break the WAVE standard. Check here:
http://ccrma.stanford.edu/courses/422/projects/WaveFormat/

The "fmt " sub chunk ID should be 0x666d7420 if the sound data was in BE, but it's "fmt ", which breaks the standard...

by sev-, 11 years ago

Attachment: bs1-mac-en-v2.patch added

Updated patch for r38961

comment:8 by sev-, 11 years ago

I updated the patch so it applies cleanly to current SVN.

We really should look into adding that heuristic and commit it to trunk.

by criezy, 11 years ago

Attachment: bs1-mac-en-v3.patch added

Updated with heuristic to make the decision

comment:9 by criezy, 11 years ago

You must have some mind reading skills: I was planning to work on that patch this week-end.
I have attached an updated version based on your updated patch (i.e. applies cleanly on current SVN) and with the heuristic to decide if the speech files of the mac version are in big or little endian.

This has been tested with the French Mac version (as this is the only one I have) and it correctly detects that it is little endian. It would be a good idea to test the heuristic on other languages (this can be done with the Windows version, but you will need to slightly modify the initialisation of the Sword1 engine to also use the heuristic with this version).
What I can say is that for the French version all the sentences I have tried give similar results (average difference of about 5e+3 for the correct endianness and about 2e+4 for the incorrect one). This gives me some confidence in the stability of the computation (it needs at least 1000 samples in the sentence to give a stable result though, and the attached patch uses the first 2000 samples).

Note: I am computing the sum of the differences between two consecutive samples in double because I get overflow very quickly with the wrong endianness when using uint32.
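As a rough sketch of the comparison described in this comment (not the code from the attached patch): decode the same buffer under both byte orders, average the absolute difference between consecutive samples, and keep the byte order that gives the smaller average. The names `decodeSample`, `averageStep` and `guessBigEndian` are hypothetical, signed 16-bit PCM is assumed, and the default of 2000 samples is the figure quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: decode one signed 16-bit sample from two raw bytes.
static int16_t decodeSample(const uint8_t *p, bool bigEndian) {
    const uint8_t hi = bigEndian ? p[0] : p[1];
    const uint8_t lo = bigEndian ? p[1] : p[0];
    return static_cast<int16_t>(static_cast<uint16_t>((hi << 8) | lo));
}

// Mean absolute difference between consecutive samples for one byte order.
// The sum is accumulated in a double, as the comment describes.
static double averageStep(const uint8_t *bytes, size_t numSamples, bool bigEndian) {
    assert(bytes != nullptr && numSamples >= 2);
    double sum = 0.0;
    for (size_t i = 1; i < numSamples; ++i) {
        const int32_t prev = decodeSample(bytes + 2 * (i - 1), bigEndian);
        const int32_t cur = decodeSample(bytes + 2 * i, bigEndian);
        sum += std::abs(cur - prev);
    }
    return sum / static_cast<double>(numSamples - 1);
}

// Guess the byte order by comparing both interpretations, so that no
// hardcoded cut-off is needed. Only the first maxSamples samples are used.
static bool guessBigEndian(const uint8_t *bytes, size_t numSamples,
                           size_t maxSamples = 2000) {
    const size_t n = numSamples < maxSamples ? numSamples : maxSamples;
    return averageStep(bytes, n, true) < averageStep(bytes, n, false);
}
```

A caller would pass the raw sample data of one speech sentence and its sample count. Going by the figures above, a sentence with fewer than about 1000 samples may not give a stable answer.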

by criezy, 10 years ago

Attachment: bs1-mac-en-v4.patch added

Update patch for r42417

comment:10 by criezy, 10 years ago

I updated the patch so it applies cleanly to current SVN.

Now that it has the requested heuristic, is there anything that still needs to be done before commit to the trunk?

comment:11 by sev-, 10 years ago

Yes, I reviewed it long ago, just forgot to commit. Committed almost as is (waw -> wav in comments ;) )

comment:12 by sev-, 10 years ago

Status: new → closed

comment:13 by digitall, 12 months ago

Component: Engine: Sword1
Game: Broken Sword 1