Opened 2 years ago
Closed 2 years ago
Last modified 2 years ago
#11919 closed defect (invalid)
Reported by: ObjectInSpace
Owned by: sev-
Description (last modified by )
ScummVM should provide the ability for in-game text to be read aloud via synthesized speech. Apart from the convenience of playing interactive fiction without having to look at a screen, this would let all games supported by ScummVM be enjoyed by the estimated 289 million people around the world with vision loss, some of whom enjoy adventure games. Several Z-code interpreters support TTS, as does RetroArch, so it should be feasible for ScummVM as well.
On Windows, an open-source library called Tolk should do most of the heavy lifting: https://github.com/dkager/tolk/
Another library, Universal Speech, appears to do something similar, although I don't think it has been updated as recently: https://github.com/qtnc/UniversalSpeech
Both libraries support JAWS and NVDA, the most popular screen readers, and also offer SAPI speech for broader compatibility.
Microsoft also provides a text-to-speech API via the Xbox SDK. This is specific to Narrator, which is also included in Windows. Similarly, on macOS, iOS and Android, their screen readers are supported through native accessibility APIs.
Ideally, a player of an IF game should hear the game's response to their input read back to them. Graphical adventure games should read aloud the name of the currently selected object or action, along with any text as it appears on screen, such as conversation prompts.
Scenario 1: a player of Zork I types "open mailbox" and hears back "the mailbox is now open. There's a leaflet inside."
Scenario 2: a Day of the Tentacle player moves the cursor over the clock and hears "grandfather clock." The player presses o and hears "open."
Scenario 3: a Grim Fandango player decides to talk to Carla and hears: "1. Busy night? 2. What's the shuttle waiting for? 3. Can I try out your metal detector?"
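To make scenarios 2 and 3 a little more concrete, here is a minimal C++ sketch (C++ because that is what ScummVM is written in). `RecordingSpeaker`, `HoverAnnouncer` and `formatDialogOptions` are hypothetical names, not ScummVM APIs, and the speaker only records strings in place of a real speech backend. The announcer speaks a hotspot name only when it changes, so moving the mouse around the clock does not repeat "grandfather clock" every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a platform speech backend (SAPI, system speech
// synthesis, Speech Dispatcher, ...). Not ScummVM's real interface: it just
// records what would have been spoken.
struct RecordingSpeaker {
    std::vector<std::string> spoken;
    void say(const std::string &text) { spoken.push_back(text); }
};

// Scenario 2: announce the object under the cursor, but only when it changes.
class HoverAnnouncer {
public:
    explicit HoverAnnouncer(RecordingSpeaker &speaker) : _speaker(speaker) {}

    // Called every frame with the name of the hotspot under the cursor,
    // or an empty string when the cursor is over nothing.
    void update(const std::string &hotspotName) {
        if (hotspotName == _last)
            return;                   // same object as last frame: stay quiet
        _last = hotspotName;
        if (!hotspotName.empty())
            _speaker.say(hotspotName);
    }

private:
    RecordingSpeaker &_speaker;
    std::string _last;                // last announced (or cleared) hotspot
};

// Scenario 3: join the dialogue choices into one numbered utterance.
std::string formatDialogOptions(const std::vector<std::string> &options) {
    std::string out;
    for (std::size_t i = 0; i < options.size(); ++i) {
        if (i > 0)
            out += ' ';
        out += std::to_string(i + 1) + ". " + options[i];
    }
    return out;
}
```

Leaving the hotspot (an empty name) resets the announcer, so returning to the clock later announces it again.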
There are a few different ways to achieve this. RetroArch uses optical character recognition (OCR), which extracts the text from screenshots into a machine-readable format using pattern-matching algorithms. Another project, SoniFight (https://github.com/FedUni/SoniFight), essentially reverse-engineers certain games to read the text from known memory addresses. However, both approaches feel somewhat hackish to me. Since these strings already exist in the game, ScummVM should ideally send them directly to each platform's assistive technology through the applicable APIs.
Change History (9)
comment:1 by , 2 years ago
comment:2 by , 2 years ago
comment:3 by , 2 years ago
comment:4 by , 2 years ago
You're right, I did in fact miss this! Thank you for the information, that's huge progress. It sounds like the remaining work is to support TTS on more platforms and in more game engines, with engine support lagging further behind? Is there a general method for locating the text, or does it differ for each game? I'd like to help but I'm not sure where to begin.
comment:5 by , 2 years ago
So...can this be closed?
comment:6 by , 2 years ago
Since it sounds like TTS is a work in progress, I would appreciate a tracking ticket for the remaining work on platform and engine support, as well as any additional features being considered (such as a global preference to enable TTS everywhere). We could use this ticket for that. Otherwise, there's probably not much use in keeping it open.
comment:7 by , 2 years ago
Status: new → closed
Closing as invalid; this is not a bug. If you would like to follow the progress, we regularly add the most significant features, improvements and additions to the NEWS.md file, which then goes into the release ChangeLog. The finer details can be found in the git logs.
Putting additional load on the team by also tracking progress here is not a great idea.
Also, if you think ScummVM "should" support something, you are more than welcome to help with the coding or to find someone who will do it for you. The dev team are volunteers, not hired staff, so we work on the features we like or feel passionate about.
As for why support is not everywhere: ScummVM is a collection of game engines. Each engine's author or subteam needs to add calls to the TTS subsystem wherever the engine displays text. We currently have 86 engines; some of them are complete and their authors have since left the project. Thus, there is very little chance that _all_ engines will get TTS support unless an interested person makes a dedicated effort.
comment:8 by , 2 years ago
What is the process for adding TTS calls to an engine? Where does it "go" to look up the text? This sounds like a great community effort for interested people, of which I am one! I'm just not sure where to start.
comment:9 by , 2 years ago
Here are a few examples for you:
https://github.com/scummvm/scummvm/pull/1953 <-- simple
https://github.com/scummvm/scummvm/pull/2343 <-- more advanced
In each engine, you need to find the place where it prints subtitles. Usually it is something named text*, subtitles* or say*, or something dealing with fonts. In that place, add a call to our TTS subsystem; the interface is pretty straightforward: https://github.com/scummvm/scummvm/blob/master/common/text-to-speech.h
The first PR above, for the Sherlock engine, is what the vast majority of our engines would need, i.e. pretty minimal changes.
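As a rough illustration of that pattern, here is a self-contained C++ sketch. It deliberately does not use ScummVM's real classes: `SpeechSink`, `RecordingSink` and `ExampleEngine` are hypothetical stand-ins, and the actual interface and method signatures should be taken from common/text-to-speech.h and the linked PRs. The point it shows is that the change lives inside the one function that already draws subtitle text, guarded by a null check (for platforms without TTS) and by the per-game option.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the TTS manager declared in common/text-to-speech.h.
// The real class and its signatures should be taken from that header.
class SpeechSink {
public:
    virtual ~SpeechSink() = default;
    virtual void say(const std::string &text) = 0;
};

// Test double that records utterances instead of producing audio.
class RecordingSink : public SpeechSink {
public:
    std::vector<std::string> spoken;
    void say(const std::string &text) override { spoken.push_back(text); }
};

// Hypothetical engine: the only TTS-specific change is inside the function
// that already puts subtitle text on screen.
class ExampleEngine {
public:
    ExampleEngine(SpeechSink *tts, bool ttsEnabled) : _tts(tts), _ttsEnabled(ttsEnabled) {}

    void displaySubtitle(const std::string &text) {
        drawText(text);               // existing rendering path, unchanged
        // Added: speak the same string, but only if a speech backend exists
        // and TTS has been enabled for this game.
        if (_tts != nullptr && _ttsEnabled)
            _tts->say(text);
    }

    const std::vector<std::string> &screen() const { return _screen; }

private:
    // Placeholder for real font rendering; just remembers what was drawn.
    void drawText(const std::string &text) { _screen.push_back(text); }

    SpeechSink *_tts;                 // may be null where TTS is unsupported
    bool _ttsEnabled;
    std::vector<std::string> _screen;
};
```

Because the spoken string is exactly the string being drawn, engines that route all text through one function need only a few added lines; engines with several text paths need the same guard at each of them.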
I am confused by this request. I think you may have missed the fact that text-to-speech is already available in the ScummVM 2.2.0 release for some game engines, including the one for IF games. It is turned off by default, so even if you have enabled TTS for the ScummVM GUI, you still need to open the game settings (select the game in the ScummVM launcher and click Edit) to enable TTS for that game. Perhaps we should add a general option to enable TTS by default for games that support it.
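One way such a general option could work is layered lookup: an explicit per-game setting wins, then a global default, then off, which matches today's behavior. A minimal C++ sketch, assuming configuration domains can be modeled as plain string maps; both key names (`tts_enabled`, `tts_games_default`) and the helper functions are hypothetical and not taken from ScummVM's sources.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: a config domain is modeled as a simple key/value map.
using ConfigDomain = std::map<std::string, std::string>;

// Hypothetical key names, for illustration only.
const char *const kGameKey = "tts_enabled";          // per-game setting
const char *const kGlobalKey = "tts_games_default";  // proposed global default

// Returns whether `key` is present in `domain`; if it is, stores its
// parsed boolean value in `out`.
bool lookupBool(const ConfigDomain &domain, const std::string &key, bool &out) {
    auto it = domain.find(key);
    if (it == domain.end())
        return false;
    out = (it->second == "true" || it->second == "1");
    return true;
}

// Per-game value wins; otherwise use the global default; otherwise off.
bool isTtsEnabled(const ConfigDomain &game, const ConfigDomain &global,
                  bool engineSupportsTts) {
    if (!engineSupportsTts)
        return false;                 // engine has no TTS calls yet
    bool value = false;
    if (lookupBool(game, kGameKey, value))
        return value;
    if (lookupBool(global, kGlobalKey, value))
        return value;
    return false;                     // current default: TTS off
}
```

Keeping the game-level check first means a player can still switch TTS off for one game even when the global default turns it on.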
TTS in ScummVM is currently only supported on Windows (using SAPI), macOS (using the system speech synthesis), and Linux (using Speech Dispatcher). I started working on support for it in the iOS port, but it is not available yet.
Also, only a few game engines support it at this point. For each game engine, we need to find the places where it prints text to the screen so that we can also speak that text. This has to be implemented individually for each engine, and it is easier for some than others depending on how they handle text. So far, from what I remember, it has been implemented in Mortville Manor, Rex Nebular, The Case of the Serrated Scalpel, Lure of the Temptress, and most IF games (maybe all, but it's difficult to say given how many there are; they don't all handle text the same way, so we may have missed a few cases).