
Virtual synth driver which can automatically recognise and switch between certain languages/synths #279

Open
nvaccessAuto opened this issue Jan 1, 2010 · 42 comments

Comments

@nvaccessAuto

Reported by aleksey_s on 2009-02-20 19:56
Often, user works with multilingual information. Also it is not a secret, that quite rarely one synthesizer can speak good enough in more than one language. So NVDA can have feature to automatic recognize language and change synthesizer/voice to the prefered for this particular language. I see two cases this can be implemented in:

  1. Using info provided by some assistive APIs;
  2. Having an internal mechanism to recognize (some of the) languages in the text, no matter where the text comes from;
  3. A combined variant of the two.

I myself do not like 1, because I have had a bad experience with such an implementation in JAWS. Also, I am not sure assistive APIs provide language info for each portion of text data; text often has very mixed cases, and I am even more unsure that AT will split such text into portions and provide the appropriate language info for each.
So for me, 2 is more suitable, and I will try to explain my view of how it can be done in NVDA.

We can implement a virtual synthesizer; let's call it "Auto language". It will split the input text at punctuation characters and build a queue of phrases. For each phrase, the language will be detected by applying some hard-coded rules, and the portion will then be sent to the synthesizer configured for that language. This virtual synthesizer also needs to know when the real synthesizer has finished a portion, so it will append a blank portion with a defined "index" and wait in a separate thread until that index becomes active. Then it will send the next portion, and so on.
To avoid the overhead of switching synthesizers, the "Auto language" synth can initialize all required synthesizers when it initializes itself.
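As a rough illustration of the scheme just described, the splitting and queueing step might look like the sketch below. This is not the actual driver; `detect_language` is a hypothetical stand-in for the hard-coded rules:

```python
import re
from collections import deque

def split_phrases(text):
    """Split text into phrases at punctuation, keeping each mark
    attached to the phrase it ends."""
    parts = re.split(r"(?<=[.,;:!?])\s+", text)
    return [p for p in parts if p]

def detect_language(phrase):
    """Hypothetical hard-coded rule: any Cyrillic letter means Russian."""
    return "ru" if re.search(u"[\u0400-\u04ff]", phrase) else "en"

def build_queue(text):
    """Build a queue of (language, phrase) pairs; each entry would be
    handed to the synthesizer configured for that language."""
    return deque((detect_language(p), p) for p in split_phrases(text))
```

The real driver would additionally append an index marker after each portion and wait for the synthesizer to report that index before sending the next one.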

Requirements

  • It must provide functionality to switch any setting of the synthesizer/voice, including synthesizers, voices, variants or even rate. For example, one might read English more slowly than one's mother tongue.
  • It must be easy to add a recognizer for a new language.

Problems

It is difficult to write rules for most East European languages.
Blocked by #312


Comment 1 by jteh (in reply to comment description) on 2009-02-20 23:20
Replying to aleksey_s:

I myself do not like 1, because I have had a bad experience with such an implementation in JAWS. Also, I am not sure assistive APIs provide language info for each portion of text data.

If the web site is designed well, they should. However, as we all know, not all web sites are designed well. :) Nevertheless, I believe that this is the better option for languages based on the Latin alphabet, e.g. English, German, Italian, etc. Also, I think you will find that more sites are starting to honour the language attribute. What sort of problems have you seen in other screen readers?

We can implement a virtual synthesizer, let's call it "Auto language". It will split the input text at punctuation characters

Note that punctuation characters are different in many languages and are used differently. Splitting based on the detected characters (e.g. Russian and Hebrew characters are obviously quite different from English, etc.) may be better. However, you then have to decide which language to use when you return to a Latin-alphabet language.
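Character-based splitting as suggested here could be sketched with plain Unicode block checks. A minimal illustration; the ranges below are the standard Cyrillic and Hebrew blocks:

```python
def detect_script(ch):
    """Classify a single character by Unicode block."""
    cp = ord(ch)
    if 0x0400 <= cp <= 0x04FF:
        return "cyrillic"
    if 0x0590 <= cp <= 0x05FF:
        return "hebrew"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, punctuation, whitespace carry no script info

def dominant_script(text):
    """Return the most frequent script in the text, or None."""
    counts = {}
    for ch in text:
        script = detect_script(ch)
        if script:
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None
```

As the comment points out, this only identifies the script, not the language: text that returns to the Latin alphabet still needs a tie-breaking rule, for example falling back to the user's configured default.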

I am changing this ticket to restrict its scope to option 2 only, as option 1 will require a completely different implementation.

Changes:
Changed title from "language detection/voice switching feature" to "Virtual synth driver which can automatically recognise and switch between certain languages/synths"


Comment 2 by aleksey_s (in reply to comment 1) on 2009-02-21 14:18
Replying to jteh:

If the web site is designed well, they should. However, as we all know, not all web sites are designed well. :)

And what about forums, where a user can include quotes from other languages, etc.? I can imagine a huge number of cases where this would break.

Nevertheless, I believe that this is the better option for languages based on the Latin alphabet; e.g. English, German, Italian, etc.

Thanks to Olga Yakovleva (author of the festival synthDriver), I played with a Python library (http://personal.unizd.hr/~dcavar/LID/) which shows very promising results. I guess if we rewrite it in C++, there will be no performance degradation. Note that it would be a unique feature; as far as I know, no other screen reader offers it.
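Libraries of this kind typically rank character trigrams and compare that ranking against per-language reference profiles (the Cavnar-Trenkle "out-of-place" measure). The toy sketch below shows the idea; it is not the LID library itself:

```python
from collections import Counter

def trigram_profile(text, top=300):
    """Rank the most frequent character trigrams of a text."""
    text = " " + text.lower() + " "
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(profile, reference):
    """Sum of rank differences; unseen trigrams get a maximum penalty."""
    penalty = len(reference)
    return sum(
        abs(rank - reference.index(g)) if g in reference else penalty
        for rank, g in enumerate(profile)
    )

def guess_language(text, references):
    """Pick the reference profile closest to the text's profile."""
    profile = trigram_profile(text)
    return min(references, key=lambda lang: out_of_place(profile, references[lang]))
```

In practice the reference profiles are built from a few kilobytes of sample text per language; ranking rather than raw counts is what keeps the measure usable on short inputs.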

What sort of problems have you seen in other screen readers?

This is what I have described. The language info provided by web pages is rarely good enough.

Note that punctuation characters are different in many languages and are used differently. Splitting based on detected characters (e.g. Russian and Hebrew characters are obviously quite different to English, etc.)

I do not feel I have fully understood what you mean. :-)
Also, punctuation marks in English and Russian are almost identical.

However, you then have to decide which language to use when you return to a Latin alphabetic language.

Take a look at the library I talked about; it really impresses me.


Comment 3 by OlgaYakovleva (in reply to comment 2) on 2009-02-22 12:38
I studied the implementation of the library. Today I found a project that seems to be more advanced:
http://code.google.com/p/guess-language/
I think it is also worth studying.


Comment 4 by jteh on 2009-05-01 06:59
Changes:
Milestone changed from 0.6 to None


Comment by aleksey_s on 2009-05-04 10:05
(In #312) See the above branch.


Comment 6 by Bernd on 2010-08-21 11:59
Hi Lex and jteh,

is somebody working on this ticket, since ticket 312 has been closed?


Comment 7 by aleksey_s (in reply to comment 6) on 2010-08-26 06:58
Replying to Bernd:

is somebody working on this ticket, since ticket 312 has been closed?

I made a synthDriver which recognizes the Cyrillic and Latin alphabets and uses a defined synthesizer for each. It is possible to specify different settings (such as rate, voice, variant) for each language separately. It preloads both synthesizers at load time, so there is no lag when reading multilanguage texts. One known limitation is that you can't use the same synth for two languages with different settings; e.g. you can't use one sapi5 voice for Russian and another sapi5 voice for English. But you can freely use espeak for one and sapi5 for the other.
I'll attach the driver in question.


Attachment multilang.ini added by aleksey_s on 2010-08-26 07:02
Description:
Config file. Goes where nvda.ini is located


Comment 8 by m11chen on 2011-04-25 08:07
Hi,

I am wondering if this multilang.py synthDriver can be used with the current 2011.1.1 release? Also, would it be possible to modify it to allow automatic synthesizer switching for Chinese, and if so, how should I make such modifications? I am guessing that language detection should be much easier for English-Chinese content than for other mixed Latin-based language content.

Thanks


Attachment multilang.py added by aleksey_s on 2011-04-25 08:22
Description:
Actual driver. Goes under the userConfig/synthDrivers folder.


Comment 9 by aleksey_s (in reply to comment 8) on 2011-04-25 08:28
Replying to m11chen:

I am wondering if this multilan.py synthDriver could be used with the current 2011.1.1 release?

Yes, see the updated attachment. Note, however, that due to the latest changes in the main branch, it is broken for snapshots.

Also, would it be possible to make modifications for it to allow auto synthesizer switching for Chinese, and if so, how should I make such modifications? I am guessing that language detection should be much easier for English-Chinese content compared to other mixed Latin based language content.

Indeed. You should probably create a regular expression or check Unicode character ranges. Subclass Language and define the alphabetRegex class attribute, or reimplement recognize entirely if a regular-expression check is not enough for you. Then add the new language to the languagesList module variable.
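Following that recipe, a new language class might look like the sketch below. The `Language` base class here is a minimal stand-in so the example runs on its own; the actual class in multilang.py may differ in detail:

```python
import re

class Language:
    """Minimal stand-in for the driver's Language base class."""
    alphabetRegex = None

    def recognize(self, text):
        return bool(self.alphabetRegex and self.alphabetRegex.search(text))

class ChineseLanguage(Language):
    short = "zh_TW"
    name = "Chinese"
    # CJK Unified Ideographs block
    alphabetRegex = re.compile(u"[\u4e00-\u9fff]")

class EnglishLanguage(Language):
    short = "en"
    name = "English"
    alphabetRegex = re.compile(r"[a-zA-Z]")

# Detection order matters: the first language whose regex matches wins.
languagesList = (ChineseLanguage(), EnglishLanguage())

def detect(text):
    for lang in languagesList:
        if lang.recognize(text):
            return lang.short
    return None
```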


Comment 10 by m11chen on 2011-04-25 11:19
Hi,

I tried to modify the multilang.py file with the following contents:

    [[zh_TW]]
        synthesizer = sapi5
        voice = HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\IQ John

    class ChineseLanguage(Language):
        short = "zh_TW"
        name = _("Chinese")
        alphabetRegex = re.compile(U"['\U4E00'-'\U9FFF']", re.U|re.I)
        defaultSynth = "sapi5"

But as soon as I put in the new ChineseLanguage class, the virtual synth driver no longer appears in the list of available synthesizers. Sorry, I do not have much programming experience. For the following section:

#with detection order
languagesList = (EnglishLanguage(),RussianLanguage())

Does this mean that only two languages can be specified at a time? Or is it because I do not have the NewFon synthesizer installed and the script fails to load correctly?

Thanks for any help.


Comment 11 by m11chen on 2011-06-05 17:15
Hi,

I just wanted to find out whether this is going to be included in the development priorities anytime soon. It would allow NVDA to be configured with two or more synthesizers to speak two different languages, for example Chinese and English, where the two languages have such different pronunciation tables that most voices are incapable of speaking both fluently.


Comment 13 by TimothyLee on 2011-06-17 02:29
Aleksey, I've attached two Python modules developed by the Hong Kong Blind Union to read Chinese and English using separate voices.

The VirtualSynthDriver class refactors code from your multilang.py file to allow better reuse. VirtualSynthDriver supports the new speech API in trunk, and treats changes in rate, pitch, inflection and volume as relative adjustments to all synths under its management.

The multilang-cjk speech synth inherits from VirtualSynthDriver and performs the actual language detection. There is no need for a separate configuration file, because settings for physical synths can be adjusted via the Settings Dialog Box.

It would be excellent if you can review the VirtualSynthDriver class, and have it added to trunk for use by other virtual synthesizers.
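One plausible reading of the "relative adjustment" mentioned above is to treat the virtual synth's 0-100 setting as an offset around a neutral midpoint, applied on top of each managed synth's own base value. This is an assumption on my part; the attached VirtualSynthDriver may compute it differently:

```python
def apply_relative(virtual_value, base_value, neutral=50):
    """Map a 0-100 virtual setting onto a managed synth's 0-100 scale:
    a virtual value equal to `neutral` leaves the synth's base value
    unchanged; values above/below shift it up/down, clamped to range."""
    return max(0, min(100, base_value + (virtual_value - neutral)))
```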


Comment 14 by m11chen (in reply to comment 13) on 2011-06-17 04:39
Replying to TimothyLee:

Aleksey, I've attached two Python modules developed by the Hong Kong Blind Union to read Chinese and English using separate voices.

The VirtualSynthDriver class refactored code from your multilang.py file to allow better reuse. VirtualSynthDriver supports the new speech API in trunk, and treats change in rate, pitch, inflection and volume as relative adjustment to all synths under its management.

The multilang-cjk speech synth inherits from VirtualSynthDriver and performs the actual language detection. There is no need for a separate configuration file, because settings for physical synths can be adjusted via the Settings Dialog Box.

It would be excellent if you can review the VirtualSynthDriver class, and have it added to trunk for use by other virtual synthesizers.

Thanks for the great work. I am testing this right now with the latest source code, and I have found a few problems.
I placed both multilang-cjk.py and virtualSynthDriver.py in the userConfig/synthDrivers directory. I got the voices to speak SAPI5 for Chinese and eSpeak for English. But when NVDA loads, an error is generated:

ERROR - synthDriverHandler.getSynthList (12:19:29):
Error while importing SynthDriver virtualSynthDriver
Traceback (most recent call last):
File "synthDriverHandler.pyc", line 45, in getSynthList
File "synthDriverHandler.pyc", line 37, in _getSynthDriver
AttributeError: 'module' object has no attribute 'SynthDriver'

This is just an error sound when NVDA loads, and as far as I can tell, it doesn't affect the screen reader's operation.

Another thing I found was that if the text to be spoken contains numbers in the middle, the text after the number does not get spoken. For example:

hello world 123 and this is the text after the number.

This is true regardless of whether NVDA's default language is set to Traditional Chinese or English. I am running English Windows 7 64-bit, and I haven't tested on a Chinese version of Windows yet.

Everything else is great, except for a minor lag when navigating menus with the cursor; I suppose this is because two synthesizers are loaded, and I'm not sure how much it can be improved. Previously I used eSpeak in English at 100% rate, which is about as responsive as NVDA can be, so it would be pretty hard to match with the newly added overhead.

Also, would it be possible to make the speaking of numbers configurable, so that they are spoken in either English or Chinese?

Again, thanks for the great work. I am sure many Chinese users of NVDA will be very excited to hear about this long-sought enhancement.


Comment 15 by TimothyLee on 2011-06-18 02:54
Ah, the error from synthDriverHandler is the result of placing virtualSynthDriver.py in the source\synthDrivers\ folder. It should be placed inside the source\ folder instead.

As for the problem with numbers, it is actually related to a bug we have found with the espeak.dll callback. Apparently sometimes the callback is not invoked at all. We're in the process of debugging espeak right now, and hopefully will have a fix for upstream.

Thanks for the suggestion about an option to read numbers in English. We'll try to get that implemented and post an updated version of multilang-cjk.py here for further testing.


Attachment multilang-cjk.py added by TimothyLee on 2011-06-18 03:34
Description:
Updated dual-voice Chinese/English synthesizer with option to read numbers in English


Comment 16 by m11chen on 2011-06-19 05:09
Hello

Another problem I found with the CJK dual-voice virtual synthesizer is that when a sentence contains both Chinese and English text, and the English text precedes the Chinese text, the speech output of the two languages becomes blurred together at the junction. This is especially obvious when the text is short.

For example:

今天天氣很好hello world

This is spoken without overlap, compared to:

hello world今天天氣很好

Also, Say All seems to get stuck in some situations. I noticed this when reading emails in Windows Mail/Outlook Express: if the email contains previous conversations, the attached messages do not read continuously with Say All.


Comment 17 by m11chen on 2011-06-19 05:18
To be more specific, I think Say All is broken when encountering an empty line break.


Comment 18 by m11chen on 2011-06-19 11:14
Hello

Would it be possible for the CJK dual-voice virtual synthesizer to be configured with two separate SAPI5 voices, one for English and one for Chinese?


Comment 19 by TimothyLee on 2011-06-20 00:55
The overlapped reading of text after English happens if espeak is used to read English. This is how we discovered the callback bug in espeak.

It is currently not possible to configure a physical synthesizer with two different voices because of the way the synthesizers are implemented. Sorry.


Comment 20 by m11chen (in reply to comment 19) on 2011-06-20 14:07
Replying to TimothyLee:

The overlapped reading of text after English happens if espeak is used to read English. This is how we discovered the callback bug in espeak.

It is currently not possible to configure a physical synthesizer with two different voices because of the way the synthesizers are implemented. Sorry.

I have been playing around with all kinds of combinations of primary and English synthesizers, and I find that the issue of English text not being read after a string of numbers is not limited to eSpeak as the English synthesizer. Specifically, I have tested with SAPI4, SAPI5, Festival, and Pico.


Comment 21 by m11chen on 2011-06-24 03:03
Regarding the lag in NVDA's response time when using the CJK dual-voice synthesizer, I noticed that it is actually specific to eSpeak and SAPI4 as the English synthesizer. There seems to be a problem with stopping speech: for example, when moving focus through a list of items such as Outlook Express message titles, eSpeak and SAPI4 sometimes insist on reading the last item in their buffer instead of instantly updating the speech to match the item under the cursor. This behaviour is very obvious, since if the speech output is a long string of text, even pressing Ctrl does not stop it; the result is an apparent lag, with speech not updating fast enough. The problem does not occur with Festival or Pico as the English synthesizer. Sorry for reporting this again if it is intrinsically the same espeak.dll problem as the overlapping of English and Chinese text and work is already under way.


Comment 22 by TimothyLee (in reply to comment 17) on 2011-06-24 04:39
Replying to m11chen:

To be more specific, I think Say All is broken when encountering an empty line break.

I've compiled espeak.dll using the latest source code from SVN, and the Say All problem has gone away.


Comment 23 by m11chen (in reply to comment 22) on 2011-06-24 04:52
Replying to TimothyLee:

Replying to m11chen:

To be more specific, I think Say All is broken when encountering an empty line break.

I've compiled espeak.dll using the latest source code from SVN, and the Say All problem has gone away.

How can I obtain this new DLL? Is it available for download on the eSpeak SourceForge site, or could you provide an attachment?

Thanks.


Attachment espeak-svn20110524.zip added by TimothyLee on 2011-06-24 07:36
Description:
Latest espeak.dll compiled from SVN dated 2011-05-24


Comment 24 by pvagner (in reply to comment 23) on 2011-06-24 08:09
Replying to m11chen:

How can I obtain this new DLL? Is it available for download on the eSpeak SourceForge site, or could you provide an attachment?

I've also compiled one: http://sk.nvda-community.org/download/espeak-1.45.30-win_dll.zip


Comment 25 by m11chen on 2011-07-04 03:44
A few observations from using the latest development version (bzr-main-4530) with the CJK dual-voice virtual synthesizer. There have been some fixes specific to the eSpeak (Rev 4528), SAPI4 (Rev 4525) and SAPI5 (Rev 4526) synthDrivers which seem to affect the performance of the virtual synthesizer. I previously reported in comment 21 that, with the primary synthesizer set to SAPI5 and the English synthesizer set to eSpeak/SAPI4, there were problems stopping speech; for instance, when moving through the message titles in Outlook Express, speech did not stop right away on focus change. This specific test case has now been significantly improved by the recent commits to main. The improvement in the responsiveness of the virtual synthesizer is also apparent when tabbing through items in dialog boxes. In both cases, I can no longer tell the difference between the real eSpeak/SAPI4 synthDriver and the virtual synthesizer when reading mixed English-Chinese text. The only remaining place where I still notice some lag is when moving focus through menu items, such as the Start menu. If this can be fixed, the virtual synthesizer will be close to perfect in terms of responsiveness.


Attachment virtualSynthDriver.py added by TimothyLee on 2011-07-07 08:47
Description:
Updated VirtualSynthDriver class with support for "Say All" command


Comment 26 by TimothyLee on 2011-07-07 08:48
The updated virtualSynthDriver.py allows the "Say All" command (NVDA+down) to function properly.


Comment 27 by m11chen on 2011-07-08 05:05
Hi Timothy,

Thanks for the update for say all. I can confirm that it is fixed with the latest ESpeak DLL.

How might we fix the overlapping of English and Chinese speech?


Comment 28 by m11chen on 2011-07-13 11:41
When using the CJK dual-voice virtual synthesizer with SAPI4 as the English synthesizer, there are unnatural pauses at line breaks. This does not happen with eSpeak, but I think it might be related to the reason eSpeak causes speech to overlap with the primary synthesizer, which does not happen with SAPI4.


Comment 29 by m11chen on 2011-10-14 07:35
Hi Timothy,

Will this be fixed for the new beta release?

Thanks


Comment 30 by m11chen on 2012-12-21 15:07
Hello,

I have been trying to get this to work with the new input composition feature, but for some reason it doesn't work right now. I think it has to do with the locale setting when two synthesizers are specified. Currently I have the primary synthesizer set to SAPI5 and the English synthesizer set to SAPI4. Could anyone else using this give me a hint about where to start on a fix?

Thanks


Attachment multilang-cjk.patch added by TimothyLee on 2013-05-21 08:30
Description:
Patch set dated 2013-05-21


Comment 31 by TimothyLee on 2013-05-21 08:33
I've found a solution to the overlapping speech problem. It involves adding a "busy" attribute to the SynthDriver class to indicate whether speech is being output. Each synthesizer must override this attribute with a custom implementation to support non-overlapping speech when used by the VirtualSynthDriver class.

In the patch set dated 2013-05-21, I've included patches for audiologic, espeak, sapi4 and sapi5. Please test them and comment. Thanks!
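The idea can be sketched as follows. The classes here are simplified stand-ins (the real NVDA SynthDriver API differs), with a fake synth that simulates speech duration:

```python
import time

class SynthDriver:
    """Simplified stand-in for NVDA's SynthDriver base class."""
    @property
    def busy(self):
        # Default: never busy; per the patch, real drivers override this.
        return False

class FakeSynth(SynthDriver):
    """Simulates a synth whose speech lasts 10 ms per character."""
    def __init__(self):
        self._done_at = 0.0

    def speak(self, text):
        self._done_at = time.monotonic() + 0.01 * len(text)

    @property
    def busy(self):
        return time.monotonic() < self._done_at

def speak_sequentially(jobs, poll=0.005):
    """Hand each phrase to its synth only after the previous one is idle,
    so two synths never speak over each other."""
    for synth, text in jobs:
        synth.speak(text)
        while synth.busy:
            time.sleep(poll)
```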


Comment 32 by vgjh2005 on 2014-08-10 16:55
Hi:
I have tested this patch, and it is very useful. Please add support for Vocalizer 2.2/5.5, Aisound and Microsoft Speech Platform, and allow them to be used together. Please also make it into an add-on package.
Thanks a lot!


Comment 33 by taghavi on 2015-02-02 06:55
I have developed a similar open-source add-on named "Dual Voice for NVDA". It lets you use two separate voices for reading non-Latin and Latin languages. This add-on is compatible with SAPI5 (Speech API version 5) and MSSP (Microsoft Speech Platform).
The Dual Voice for NVDA project homepage is at: http://dualvoice.sf.net

@LeonarddeR

Blocked by #4877

@Adriani90

I removed the blocked label, since the speech refactor is now part of NVDA and this opens up new possibilities.

@Adriani90

@ehollig, could you please add the attachments for this issue? The last three attachments should suffice here, since they seem to be the most up-to-date ones.


ehollig commented Sep 14, 2020

Hey @Adriani90, sorry for the delay. Here are some of the files. Let me know if these are not the ones you want.
multilang.py.txt
virtualSynthDriver.py.txt
multilang-cjk.patch.txt
