Ticket #149 (new enhancement)

Opened 4 months ago

Last modified 4 months ago

Improve SayAll reading

Reported by: aleksey_s Owned by:
Priority: major Milestone: 0.6
Component: Core Version: trunk
Keywords: sayall, synthesizer Cc:
Blocking: Blocked By:

Description

Currently, when reading text in e.g. Notepad, NVDA sends text to the synth line by line. As a result, most synths treat the end of each line as the end of the text and apply end-of-sentence inflection. This of course sounds bad when reading a long paragraph. NVDA should therefore send text to the synth in a different unit (sentence or paragraph).

Attachments

paragraphOffsets.py (1.7 kB) - added by pvagner 4 months ago.
Implementation of the _getParagraphOffsets() method for the NVDA textInfo class. It assumes that blank lines split the text into paragraphs and that a paragraph can't start with a space or a punctuation symbol. For testing purposes you can add it to any of these classes that you are able to test.

Change History

Changed 4 months ago by jteh

  • milestone changed from 0.6p2 to 0.6

Changed 4 months ago by jteh

Sentence is probably better than paragraph, as paragraphs can potentially be quite large, which means increased search time and larger chunks of text being sent to the synth.

Unfortunately, determining where sentences begin and end poses a localisation problem. Different languages have different indications of sentence boundaries and some languages don't have a concept of a sentence at all. Aside from the problem of gathering rules for different languages, as usual, we can't make these determinations based on the NVDA language, as the user might be reading in a language other than their NVDA language at any given time.

There is another option aside from reading by sentence or paragraph. Note that reading by sentence can introduce pauses that the synth would not otherwise introduce. For example, if the synth would normally not have paused after a full stop (.) for some reason, reading by sentence would introduce a pause which the synth would not otherwise have made. An alternative used by some screen readers is to end the current chunk of text only if that chunk ended with characters which would have indicated a pause. For example, if there are three sentences across two lines where the third sentence ends at the end of the second line, the chunk would end only at the end of the second line. This is rather complicated and would result in larger chunks of text than reading by sentence. It still suffers from the localisation problem above, as the pause characters would be specific to each language.
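A rough sketch of the pause-character approach described above, for illustration only. The names chunk_lines and PAUSE_CHARS are hypothetical and not part of NVDA, the character set is an English-centric assumption, and treating a blank line as a chunk boundary is an extra assumption of this sketch.

```python
# Hypothetical set of pause characters; a real list would be language-specific.
PAUSE_CHARS = ".!?:;"


def chunk_lines(lines, pause_chars=PAUSE_CHARS):
    """Join consecutive lines until one of them ends with a pause character."""
    chunks = []
    pending = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Assumption of this sketch: a blank line always closes the chunk.
            if pending:
                chunks.append(" ".join(pending))
                pending = []
            continue
        pending.append(stripped)
        if stripped[-1] in pause_chars:
            # The line ends where the synth would pause anyway, so cut here.
            chunks.append(" ".join(pending))
            pending = []
    if pending:
        # Flush whatever is left at the end of the text.
        chunks.append(" ".join(pending))
    return chunks
```

With three sentences spread over two lines, where the third ends at the end of the second line, both lines come back as a single chunk, which matches the example in the comment above.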

Changed 4 months ago by jteh

  • type changed from defect to enhancement

This is an enhancement, not a defect.

On further discussion, although less efficient, doing this by paragraph is simpler, less prone to error and does not suffer from the localisation problem I described.

Changed 4 months ago by pvagner

Implementation of the _getParagraphOffsets() method for the NVDA textInfo class. It assumes that blank lines split the text into paragraphs and that a paragraph can't start with a space or a punctuation symbol. For testing purposes you can add it to any of these classes that you are able to test.
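For readers without the attachment, here is a minimal standalone sketch of the blank-line rule. It is not the attached paragraphOffsets.py: the function name paragraph_offsets and its (text, offset) signature are hypothetical, it works on a plain string rather than a textInfo object, and it does not reproduce the rule that a paragraph can't start with a space or punctuation.

```python
import re

# A newline followed by one or more whitespace-only lines separates paragraphs.
_SEPARATOR = re.compile(r"\n(?:[ \t]*\n)+")


def paragraph_offsets(text, offset):
    """Return (start, end) offsets of the paragraph containing offset.

    An offset that falls inside a run of blank lines is attributed to the
    paragraph that precedes it.
    """
    start = 0
    for match in _SEPARATOR.finditer(text):
        if offset < match.end():
            return (start, match.start())
        start = match.end()
    # No separator after the offset: the paragraph runs to the end of the text.
    return (start, len(text))
```

For example, in "one\ntwo\n\nthree" an offset of 5 gives (0, 7) and an offset of 10 gives (9, 14).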

Changed 4 months ago by pvagner

  • component changed from Speech to Core

Hello,
I have been playing with this for a while and have attached what I have so far. It can detect paragraph boundaries.
I think that, because of performance, we'll be using lines by default anyway, so I don't see a problem with also adding the ability to read by sentence. I have come up with a regular expression which searches for sentence endings, and it is working really well for me. I will post a method to get sentence offsets once I have completed it.
Oh, BTW, I am changing the component from Speech to Core because, although this affects speech, the implementation itself will go into the core (the textInfo class for the baseNVDAObject).
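The regular expression mentioned above has not been posted, so the following is only a guess at what a sentence-offset helper might look like. The pattern and the name sentence_offsets are hypothetical, the pattern assumes English-like punctuation, and it is subject to the localisation problem raised earlier in this ticket.

```python
import re

# Hypothetical pattern: one or more of . ! ?, optional closing quotes or
# brackets, then whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?]+[\"')\]]*(?:\s+|$)")


def sentence_offsets(text, offset):
    """Return (start, end) offsets of the sentence containing offset.

    Trailing whitespace after the terminator is counted as part of the
    sentence, so consecutive sentences tile the text without gaps.
    """
    start = 0
    for match in _SENTENCE_END.finditer(text):
        if offset < match.end():
            return (start, match.end())
        start = match.end()
    # No terminator after the offset: the sentence runs to the end of the text.
    return (start, len(text))
```

Abbreviations such as "e.g." would be split incorrectly by this pattern, which is one reason a paragraph-based default may be the safer choice.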
