New text symbol processing framework #332
Comments
Comment 2 by jteh on 2009-07-24 05:21 Another question: how do we deal with punctuation symbols which need to be preserved in the output (e.g. "dot.", "comma,") to provide proper intonation? There need to be special rules for this; e.g. "..." should not become "dot. dot. dot.", otherwise you will hear pauses after each dot. The current code knows how to handle this for English, but I know that some other languages have their own punctuation systems. We can have a field which specifies that the symbol should be preserved in the output, but we also need to somehow cover the exceptions. |
Comment 4 by aleksey_s on 2010-08-01 21:27 Problems that come to mind:
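One way to picture the "preserve" field raised in this comment is a per-symbol flag combined with a rule that collapses a run of the same symbol into a single label. The sketch below is only illustrative: `SYMBOLS`, `speak_symbols` and the "3 dot" wording for repeats are assumptions made for the example, not NVDA's actual code or file format.

```python
import re

# Hypothetical symbol table: symbol -> (spoken label, keep the symbol in output?)
SYMBOLS = {
    ".": ("dot", True),
    ",": ("comma", True),
    "(": ("left paren", False),
}

# One alternative per symbol; each alternative matches a whole run of that symbol.
_PATTERN = re.compile("|".join(f"(?:{re.escape(s)})+" for s in SYMBOLS))


def speak_symbols(text):
    """Replace each run of a known symbol with a single spoken label."""
    def repl(match):
        run = match.group(0)
        symbol = run[0]
        label, preserve = SYMBOLS[symbol]
        spoken = label if len(run) == 1 else f"{len(run)} {label}"
        # Keep one copy of the symbol so the synthesizer still produces
        # the matching pause or intonation, but only once per run.
        suffix = symbol if preserve else ""
        return f" {spoken}{suffix} "

    # Normalise the extra spaces introduced by the replacements.
    return " ".join(_PATTERN.sub(repl, text).split())
```

With this table, "Wait..." yields one label followed by one preserved dot instead of three separate pauses. Language-specific exceptions would still need their own rules, as the comment notes.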
|
Comment 5 by aleksey_s on 2010-08-02 07:27
|
Comment 6 by jteh on 2010-08-13 03:20
|
Comment 7 by aleksey_s (in reply to comment 6) on 2010-08-13 05:19
It will make NVDA very unfriendly to newbies or, better said, to non-geeks, at least in Russian. To give an example: currently, NVDA says something like "levaya krooglaya skobka" for "(" (left paren). That is how we were taught to name this symbol in school, etc. I changed it to "le sko" (the first letters, something like "le par"). That is how things were written on telegraphs. While I know that some users want, or can at least tolerate, such abbreviations for punctuation, I also know well that there are a lot of newbies who will hate NVDA for such geeky stuff. It would be cool if brief labels were built into NVDA, but as I said in my previous comment, adding profiles will increase complexity, and, after all, a user can go to NVDA-community.org and obtain a punctuation file with brief labels if he/she wants.
OK, let it be 4.
Sounds like science fiction. As you said, this is the real world, and almost all synthesizers do not support it. Anyway, if you are too concerned, we can add an option "let synthesizer process the punctuation", and add support for turning on punctuation handling in synthDrivers for synths that support it, such as eSpeak.
Indeed.
Sure.
Why do you think this is a problem? It does not seem like a big deal to me. The more punctuation, the better. However, we can implement the ability to load language-specific punctuation independently for each language, as well as a default. Then language-specific punctuation symbols will be in separate files. (What about gettext translation then? Or maybe language-specific marks can stay untranslated.)
I think so, probably for the near future, while we do not have support for setting the language in synths.
Doesn't matter. Anyway, we don't support changing languages on the fly.
This breaks gettext usage.
I read a lot of multilingual texts here, so loading a new synth each time is a bad idea. However, I have created a virtual synthesizer called "multilang", which can determine the language of the text and speak it with the desired synth. It keeps all needed synths loaded simultaneously, so there is no lag when switching synths.
I still believe that we need a flag on a punctuation symbol indicating that it must be preserved in the text after its label is inserted.
OK, let's go with regular expressions then. BTW, we will be able to handle "." differently when it is between numbers (one point zero two), when it is at the end of a sentence (a very long sentence full stop) and when it is in another context (www dot NVDA dash project dot org).
What about languages which have no concept of sentence endings at all? I remember you or Mick said something about that some time ago.
Same here for Ukrainian and Russian.
We can always go and add missing ones when users complain. |
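The context-dependent handling of "." suggested in comment 7 (between numbers, at a sentence ending, anywhere else) could look roughly like the following. This is a minimal sketch under stated assumptions: the rule list, the names `RULES` and `process`, and the exact patterns are invented for illustration, and only "." is handled (the "dash" in the thread's example would need its own rule).

```python
import re

# Hypothetical rules, tried in this order at every position in the text.
# Each entry is (regular expression, replacement text).
RULES = [
    (r"(?<=\d)\.(?=\d)", " point "),   # "." between digits
    (r"\.(?=\s|$)", " full stop"),     # "." at a sentence ending
    (r"\.", " dot "),                  # "." in any other context
]

# Combine the rules into one expression with one named group per rule,
# so the matching group tells us which replacement to use.
PATTERN = re.compile(
    "|".join(f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(RULES))
)


def process(text):
    """Speak "." according to the context it appears in."""
    def repl(match):
        rule_index = int(match.lastgroup[1:])  # "r2" -> 2
        return RULES[rule_index][1]

    # Collapse the extra spaces introduced by the replacements.
    return " ".join(PATTERN.sub(repl, text).split())
```

For example, `process("Version 1.02 is at www.example.org.")` speaks the first "." as "point", the two inside the host name as "dot", and the last one as "full stop". Languages without a notion of sentence endings would simply omit the second rule.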
Comment 8 by briang1 on 2010-10-04 09:06 Of course, all of this being user configurable could end up in an unholy mess, so some default settings would be needed for each scheme, and presumably each language. It might be a good time to try to tidy up the way symbols are spoken so that globally they are the same in all contexts, and to stop using # as the delimiter for comments. I'd also suggest, as has been touched upon here, that geek-based systems should be avoided. Thus, I think many will, and indeed do, find regular expressions difficult to grasp. Obviously not used Dos, but this should be defaulted off and at least some explanation given in the user guide. It's my vote for four levels of punctuation too, but someone needs to sort out when it's not spoken, i.e. it would be nice to hear punctuation when reading a line at a time, but not in say all, etc. These are my few comments, for what they are worth. |
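The "four levels of punctuation" mentioned above can be pictured as an ordered scale where each symbol is assigned the lowest level at which it is spoken. The sketch below is an assumption-laden illustration: the level names, the `SYMBOL_LEVELS` assignments and `should_speak` are hypothetical and not taken from this thread.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical names for the four punctuation levels."""
    NONE = 0
    SOME = 1
    MOST = 2
    ALL = 3


# Hypothetical assignments: the lowest level at which each symbol is spoken.
SYMBOL_LEVELS = {
    "$": Level.SOME,
    "(": Level.MOST,
    ",": Level.ALL,
}


def should_speak(symbol, current_level):
    """Return True if the symbol is spoken at the user's current level."""
    required = SYMBOL_LEVELS.get(symbol)
    # Unknown symbols are left to the synthesizer; nothing is spoken at NONE
    # because every assigned level is at least SOME.
    return required is not None and current_level >= required
```

A per-context default, for example a lower level during say all than when reading line by line, could then be expressed by passing a different `current_level` for each reading mode.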
Comment 9 by BugHunter on 2010-10-17 10:02 |
Comment 11 by jteh on 2010-12-02 03:51 |
Comment by jteh on 2010-12-28 23:46 However, I've just realised that this will cause problems with regard to indexing. We're moving by line, so the indexes need to be for each line. Because sentences may cross multiple lines, this means that the indexes need to be inserted in the middle of an utterance. While synths supporting markup do allow this, NVDA doesn't currently support speech markup. Unfortunately, this means we probably won't be able to implement better say all until we implement speech markup. There may also be some synths that don't support markup (eSpeak, sapi4 and sapi5 do, but I'm not sure about newfon and audiologic for example). If this is the case, say all by sentence won't be possible for these synths. |
Comment 14 by jteh on 2011-01-17 00:21 |
Comment by mdcurran on 2011-01-20 23:47 |
Comment by jteh on 2011-03-31 06:49 |
Comment 19 by jteh on 2011-04-11 01:00 |
Comment 20 by aleksey_s (in reply to comment 19) on 2011-04-12 19:44
Yes. One of the requirements for the framework was the ability for the user to customise punctuation labels. But if the user customises labels globally (say, shortens Russian labels) and changes the synthesizer to another language (say, English), the synthesizer will be unable to handle punctuation labels in Russian. So changing punctuation labels globally is a bad idea in my view. |
Comment 21 by jteh on 2011-04-12 21:54 |
Comment 22 by jteh on 2011-04-12 21:57 Will this work for you? Otherwise, we'll just delay user configuration to a later stage as I said above. |
Comment 23 by jteh on 2011-04-13 05:02 |
Comment 24 by aleksey_s (in reply to comment 22) on 2011-04-13 05:38
This will work; however, I don't understand why there is such a restrictive limitation. What was so ridiculously complex about designing a system where the user can make locale-dependent changes? Could you please write up your thoughts? I want to collaborate on this. What about a system similar to the one used for the global gesture map? E.g. there is one file where user changes are saved. It is divided into sections, and each section contains info for a specific locale. |
Comment 25 by jteh (in reply to comment 24) on 2011-04-13 06:36 I originally thought locale specific user customisation would make the code far more complex. However, I ended up having to rethink the design anyway, so the new code should handle this more easily in future. I definitely can't implement all of the customisations in one file, as the maps are parsed separately for each locale and I'd rather not maintain two separate file formats. However, we could have symbols-en.dic, symbols-ru.dic, etc. This does present a challenge as far as GUI is concerned. How will the user specify which locale they're configuring? Too many options gets needlessly complex. Imo, they should only be able to configure a single locale at a time. The question is which locale the GUI will choose. Anyway, the initial implementation won't handle user customisation. I'll address that separately. One thing at a time. :) |
Comment by jteh on 2011-04-14 07:30 |
Comment by jteh on 2011-04-14 07:34 |
Comment 29 by jteh on 2011-04-14 10:28 |
Comment 30 by jteh on 2011-04-14 10:37 |
Comment 31 by aleksey_s on 2011-04-14 14:06 Or what about rethinking the pattern-building code? E.g. process 100 symbols at a time, not the whole thing. |
Comment 32 by jteh on 2011-04-14 22:35 I thought about splitting the expression. However, the problem is that you don't want overlapping matches. For example, if the ". sentence ending" pattern preserves the ".", if you apply the normal "." pattern later, it will try to convert it into "dot.", so you will end up with something awful like "dot. dot." for a single sentence ending. You can work around this by using special marker characters to mark sections that are already processed, but this makes the expressions much more complicated. I don't think 90 or so (it's actually less than 99) complex symbols is going to be a problem. If it is, I guess we can cross that bridge later and switch libraries. |
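The overlapping-match problem described in comment 32 can be reproduced with two tiny patterns. The following sketch is illustrative only: `SENTENCE_END`, `PLAIN_DOT` and both functions are hypothetical names, and the patterns are simplified stand-ins for the real symbol expressions.

```python
import re

# Hypothetical complex symbol: a sentence-ending "." that is spoken
# and also preserved in the output.
SENTENCE_END = (r"\.(?=\s|$)", " dot.")
# Hypothetical simple symbol: any other ".".
PLAIN_DOT = (r"\.", " dot ")


def sequential(text):
    """Apply the patterns one after another (the split approach)."""
    for pattern, replacement in (SENTENCE_END, PLAIN_DOT):
        # The second pass also sees the "." preserved by the first pass.
        text = re.sub(pattern, replacement, text)
    return text


# One combined expression: each position is matched by at most one branch.
COMBINED = re.compile(f"(?P<end>{SENTENCE_END[0]})|(?P<dot>{PLAIN_DOT[0]})")


def single_pass(text):
    """Apply both patterns in a single combined substitution."""
    def repl(match):
        if match.lastgroup == "end":
            return SENTENCE_END[1]
        return PLAIN_DOT[1]

    return COMBINED.sub(repl, text)
```

Running `sequential("Done.")` speaks the label twice, because the preserved "." is converted again by the later pattern, whereas `single_pass("Done.")` speaks it once. Avoiding that double conversion without marker characters is the reason for keeping everything in one expression.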
Comment 33 by jteh on 2011-04-15 07:23 The only thing left now for this part is documentation. Any translators should note that info will be inherited from English by default, so you don't need to specify anything that's already specified in English. See the French symbols.dic for an example... or wait for the documentation if you don't understand. :) |
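The inheritance described in comment 33, together with the per-locale user files floated in comment 25, amounts to a layered lookup: English first, then the locale, then any user changes for that locale. The sketch below uses in-memory dictionaries as stand-ins for the symbol files; `LOCALE_SYMBOLS`, `load_symbols`, the sample entries and the layering order are assumptions for illustration, not the actual symbols.dic format or loader.

```python
# Hypothetical stand-ins for the per-locale symbol files.
# A locale only lists what differs from English.
LOCALE_SYMBOLS = {
    "en": {".": "dot", ",": "comma", "(": "left paren"},
    "fr": {".": "point", ",": "virgule"},
    "ru": {"(": "levaya krooglaya skobka"},
}


def load_symbols(locale, user_overrides=None):
    """Build the effective symbol table for one locale.

    user_overrides is a hypothetical mapping of locale -> {symbol: label},
    standing in for one user file per locale.
    """
    symbols = dict(LOCALE_SYMBOLS["en"])             # inherit from English
    if locale != "en":
        symbols.update(LOCALE_SYMBOLS.get(locale, {}))  # locale-specific entries
    if user_overrides:
        # Only the overrides for this locale apply, so shortening Russian
        # labels does not leak into English.
        symbols.update(user_overrides.get(locale, {}))
    return symbols
```

For instance, `load_symbols("fr")` takes "(" from English and "." from French, and a user override of "(" to "le sko" for "ru" leaves the English table untouched.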
Comment 34 by jteh on 2011-04-15 12:18 |
Comment 35 by mdcurran on 2011-04-16 03:39 |
Comment 36 by jteh on 2011-04-18 03:07 |
Reported by aleksey_s on 2009-06-15 19:20
Rationale
Currently, punctuation processing in NVDA is very simple and quite limited. The limitations of the current system are:
So a new punctuation handling system needs to be developed to eliminate the existing barriers.
expected features
implementation
template
The current speechDictionaries system may be extended into a textProcessing module, which will handle speech dicts, punctuation, indentation, character repetition and other stuff
...more detailed info about implementation
need to answer
Blocking #43, #271, #454, #919