General Progress Update
It has been quite some time since the last general post. A great deal has happened over the past couple of months, including the CSUN conference and preparation therefor, about which I posted separately. I have thus been dreading writing this, as I struggle to remember some of the minor, but nonetheless important, happenings of the last couple of months.
Perhaps the most exciting work on NVDA has been that relating to the new in-process virtual buffers for Mozilla Gecko 1.9, which includes Firefox 3 and Thunderbird 3. Current NVDA users would certainly be aware of this, even if they haven't tried it personally. NVDA 0.6p1, released just before CSUN, was the first release to feature these new buffers. Aside from massive improvements in reliability and accuracy, the new code allows for almost instantaneous rendering of pages in most cases thanks to its in-process workings. The new code also sports a far better design which makes many exciting features possible that were either impossible or extremely difficult to implement in the old code. For example, it took me only about half an hour to plan and implement the links list, a feature coveted by many users and which I use frequently myself. This was a great reinforcement of our design choices. The code in both the virtual buffer library and NVDA itself has continued to improve steadily since CSUN. While still under steady development, the goals in the web access grant from the Mozilla Foundation are almost complete. One major feature which is currently missing is the ability to efficiently navigate and read tables. Also, NVDA currently does not automatically report changes in live regions, although it does dynamically update the buffer. Nevertheless, Mick and I both use NVDA on the web full time and are very satisfied. Mick, who did the majority of the work on this project, has done an absolutely fantastic job.
Aside from assisting Mick with the virtual buffer work, I have worked on a lot of bug fixes, general improvements and code cleanup. One of the largest of the improvements is perhaps the refactoring of the NVDA GUI, which was included in 0.6p1. The NVDA window is now gone and everything is instead accessed from the NVDA system tray menu. A lot of issues with NVDA windows not gaining focus were fixed. In the last few days, I introduced more fixes in this area, including the elimination of the freezing in NVDA dialogs which occurred on some systems, particularly in Windows Vista. These are important changes in terms of user experience. I have also spent some time working on two tools useful in NVDA development. The log viewer simply allows the user to quickly view the NVDA log file right from the NVDA menu, rather than having to find the NVDA log file and open it with a text editor. It can be refreshed with a single key press and refreshes automatically when switching back to the window. In addition, it allows the user to save the current log content. It is thus useful to both users and developers alike. The other tool is the NVDA Python console, which allows developers to interact with the running internals of NVDA using a Python interpreter. Not only is this extremely useful in debugging NVDA, but it can also assist in inspecting the accessibility architecture of other applications.
I have been holding the fort somewhat more than usual in the last month, as Mick became the proud father of a baby girl in April. Congratulations, Mick! This has been a bit of a challenge for me, as I have had to adapt to working with less of his valuable feedback and collaboration. Even so, despite the exhaustion and chaos of early fatherhood, Mick has still managed to make some very significant contributions to NVDA and has been steadily increasing his working hours over the last couple of weeks.
Other than coding, we have done a lot of work in relation to the NVDA project resources and collaboration tools. I have been strongly encouraging users to use Trac to report issues, and Mick and I now make extensive use of it ourselves. Thanks to all of the users who have started to use this resource. It is certainly improving the organisation of the project and makes it much easier for developers and other users to keep track of reported issues. Due to our previous hosting provider becoming ever more unreliable, we moved all of our internet services to a new server. I moved all of the important articles from our old wiki, taken down due to increasing spam and lack of maintenance, to the Trac wiki. Some work still needs to be done to the Trac front page to allow these articles to be found and to provide better direction for new users. I made some long-needed updates to the NVDA web site. Most recently, we migrated the NVDA WordPress blog to Trac, which allows for easier posting, maintenance and integration with the rest of Trac.
Outside of NVDA, both Mick and I have spent a great deal of time testing Mozilla Firefox 3. We have filed several significant bug reports relating to accessibility, some of which have resulted in notable improvements to Mozilla accessibility, sometimes not just for NVDA, but for other assistive technologies as well. We have continued to attend meetings and contribute to the IAccessible2 effort. Due to the increasing number of external issues we are reporting and tracking, we have started the ExternalBugs wiki page to list and provide links to all of these reports.
Although Mick and I talk via phone on a daily basis, we decided in a recent discussion that we will meet in person for another NVDA hack fest some time in June. We have plans for some major improvements to the core of NVDA; specifically, the creation and handling of NVDAObjects. Topics such as proper support for tables and future plans will also be covered. We will probably make another 0.6 preview release some time over the next few months. I will then embark on implementing support for braille in NVDA.
New virtualBuffers now in NVDA, and fun with lines
Since my last blog post on the web access grant, a lot has happened in regards to this topic. A few features talked about in previous postings for the storage code have been implemented, but most importantly, I have taken the next step of completely integrating the storage code into NVDA, giving myself and other NVDA developers the choice of using the new code to interact with Gecko 1.9 documents (such as in Firefox 3 and Thunderbird 3).
In the last post I mentioned that Jamie and I had talked about allowing arbitrary properties on nodes in the buffer, rather than locking it down to just role, value, states, keyboardShortcut and contains. Well, this has been achieved, so now functions such as addTagNodeToBuffer and findBufferFieldIDByProperties take an array of attributes (name and value pairs), or sometimes name and multiValue pairs, if using findBufferFieldIDByProperties to search on multiple values of a given property. There are still particular properties of a node which have their own specific member variables (such as ID, and some other new ones that are generic to nodes or tagNodes).
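To make the idea of searching on name and multi-value pairs a little more concrete, here is a minimal sketch in Python. It is not the library's C++ code: the TagNode class, the matches helper and the signature of find_field_id_by_properties are all assumptions made up for illustration, and the real findBufferFieldIDByProperties may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class TagNode:
    # Hypothetical stand-in for a tag node: an ID, offsets, and arbitrary attributes.
    node_id: int
    start: int
    end: int
    attributes: dict = field(default_factory=dict)


def matches(node, wanted):
    """True if, for every requested name, the node's value is one of the accepted values."""
    return all(node.attributes.get(name) in values for name, values in wanted.items())


def find_field_id_by_properties(nodes, offset, wanted, forward=True):
    """Return the ID of the nearest matching node that starts after (or before) offset."""
    if forward:
        candidates = [n for n in nodes if n.start > offset and matches(n, wanted)]
        chosen = min(candidates, key=lambda n: n.start, default=None)
    else:
        candidates = [n for n in nodes if n.start < offset and matches(n, wanted)]
        chosen = max(candidates, key=lambda n: n.start, default=None)
    return chosen.node_id if chosen is not None else None
```

A quick-navigation key could then be expressed as a search such as find_field_id_by_properties(nodes, caret, {"role": ["heading"]}), with several accepted values listed when more than one role should match.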
The coding that has probably taken up most of my time throughout the last month or so is the code that manages lines in the storage module. A virtualBuffer's job is to render a representation of a document in a flat layout, meaning that every character in the buffer has an index, from 0 to the length of the buffer - every character has an ordered place. But it also has to have an idea of what lines are: it must allow the user (through the AT) to arrow up and down through the buffer by lines of information that are not too long, but also not so short that the user has to press keys more than they need to. Working out exactly how to implement this was hard; what it uses now is my third go at implementing it.
NVDA itself has pretty good text management through its TextInfo classes, so all the storage module had to do to communicate line placement was to allow the querying of line offsets using a particular offset as reference, with the getBufferLineOffsets function.
My first attempt was to simply scan back from the offset, looking for a line feed character, and then do the same forward. This works OK if the only information stored in the buffer is basic text broken up by line feeds. However, if for some reason the AT wanted to somehow tweak where line breaks occurred (perhaps for ease of reading), it would have to insert its own line feeds along with the original information.
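As a rough illustration of that first attempt, the scan can be sketched in a few lines of Python. The function name is hypothetical, and the choice to include the trailing line feed in the returned range is an assumption, not something taken from the library.

```python
def line_offsets_by_line_feed(text, offset):
    """Return (start, end) of the line containing offset, using only hard line feeds."""
    # Scan backwards: the line starts just after the previous line feed, or at 0.
    start = text.rfind("\n", 0, offset) + 1
    # Scan forwards: the line ends just after the next line feed, or at the end of the text.
    next_lf = text.find("\n", offset)
    end = len(text) if next_lf == -1 else next_lf + 1
    return start, end
```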
This way of doing things was OK for testing. In fact, with some initial tests in NVDA, I had NVDA place a line feed at the end of each node it inserted, plus I also had it scan each block of text it added and insert line feeds within the text, to break it up into reasonably sized chunks.
Mutating text in this way is bad for two reasons. First, when the text is navigated by the user, they will see a line feed character at the end of each line, even if the line was only broken due to line length rules, not because a paragraph actually ended. The other major problem is that because Mozilla Gecko provides text with embedded objects, whose events depend on the text offsets staying the same as what Gecko has internally, things could get out of sync pretty quickly.
I then designed a way so that the AT or backend, when adding the text to the storage buffer, could provide a list of offsets where lines should be broken. These would be soft line breaks that did not actually appear in the text, but the buffer would know about them and, when asked for line offsets, could take those into account.
I was happy with this approach for quite a while, as it meant that (1) we were not mutating the text at all and Gecko events would be happy, and (2) users could arrow to the end of a line in the middle of a paragraph and not see line feeds that shouldn't be there.
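The soft line break idea can be sketched like this in Python. It is only an illustration under stated assumptions: the function name is made up, soft breaks are modelled as a plain list of offsets at which a new line starts, and hard line feeds are ignored to keep the example short.

```python
import bisect


def line_offsets_with_soft_breaks(text_length, soft_breaks, offset):
    """Return (start, end) of the line containing offset, given soft break offsets.

    The text itself is never modified; the breaks live beside it.
    """
    breaks = sorted(soft_breaks)
    # Index of the first break strictly greater than offset.
    i = bisect.bisect_right(breaks, offset)
    start = breaks[i - 1] if i > 0 else 0
    end = breaks[i] if i < len(breaks) else text_length
    return start, end
```

Because the breaks are stored separately, the offsets of the text stay exactly as the backend supplied them.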
There were two major problems with this approach. The first was that the AT or backend needed to know the user's chosen maximum line length at render time. The second was that, although individual text blocks would not contain lines longer than the chosen length, there was nothing stopping two text blocks (say part of a paragraph and then some links) from together adding up to much more than the chosen length. Of course this wouldn't be a problem if a line break was enforced at the end of all nodes (as in many popular Windows screen readers), but if NVDA was to support a screen layout, then this problem could be quite evident.
Eventually I decided on the third approach. This was to allow getBufferLineOffsets to receive a maximum line length int, and also an int that indicated whether a screen layout was to be used, and then have it calculate the offsets itself by a set of steps. To accommodate the new way, tag nodes in the buffer also needed a new member variable, and addTagNodeToBuffer needed to be able to receive it. This was an int that indicated whether the tag node was a block element or not, as in, whether the buffer should assume that this node has to enforce both the start and end of lines at its edges.
So, the steps that getBufferLineOffsets takes are:
* Set some initial line offsets to the start and end of the buffer.
* Locate the deepest node at the offset given.
* Move up the node's ancestors until it locates a tagNode that is indicated as being a block element. If one is found, then the line offsets are set to this node's start and end offsets. Also record the start and end of any tagNodes passed in a possibleLineBreaks set.
* Then from the given offset, do a traversal search both backwards and forwards in the tree, locating the closest block elements. If one is closer than the ancestor block element, then set the line offsets to this node's offsets. Again, while traversing, save the start and end offsets of any tagNodes in the possibleLineBreaks set.
* Then scan the text between the now found line offsets, looking for both line feeds and beginnings of words. If a line feed is found before the given offset, and it is the closest one to the offset, then it becomes the line's start offset. The same goes for a line feed on or after the given offset: if it is the closest, it becomes the end offset. The beginning of word offsets are saved in the possibleLineBreaks set.
* Finally, the distance from line start to line end is counted up to make sure it doesn't exceed the maximum line length the user requested. If it does, then the line start is brought forward to an offset either at the max line length, or before it (using the possibleLineBreaks set as an indication of where it is healthy to break), and then the line length is counted up from there again. Of course this never passes the original given offset, and the line end will not end up being before, or too far after, the given offset.
Note that if the user chooses not to use a screen layout, then rather than searching for block elements, it just uses any tagNode, meaning lines will seem to always break at the end of links and other fields etc.
A rather complex set of actions; however, in C++ they really do not take much time at all. I didn't really like this approach at first as it has a danger of being non-symmetrical, in that asking for two different offsets that should be on the same line might give back two different lines, due to the fact that a maximum line length has to be checked. However, I found that as long as I always calculated all soft line breaks, even before the given offset, between clear block line breaks, this would never be a problem.
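The following Python sketch shows a much simplified version of this third approach, mainly to illustrate why wrapping from the last hard boundary keeps the result consistent. It is not the library's code: block element edges are assumed to be supplied as a flat list of offsets rather than found by walking the node tree, the screen layout flag is left out, and the names get_line_offsets and word_starts are hypothetical.

```python
import bisect


def word_starts(text, start, end):
    """Offsets strictly inside (start, end) where a word begins after whitespace."""
    return [i for i in range(start + 1, end)
            if not text[i].isspace() and text[i - 1].isspace()]


def get_line_offsets(text, block_edges, offset, max_line_length):
    """Return (start, end) of the line containing offset."""
    if not 0 <= offset < len(text):
        raise ValueError("offset must lie inside the text")
    # 1. Narrow to the closest block edges around the offset.
    edges = sorted(set(block_edges) | {0, len(text)})
    i = bisect.bisect_right(edges, offset)
    start, end = edges[i - 1], edges[i]
    # 2. Narrow further to hard line feeds inside that range.
    prev_lf = text.rfind("\n", start, offset)
    if prev_lf != -1:
        start = prev_lf + 1
    next_lf = text.find("\n", offset, end)
    if next_lf != -1:
        end = next_lf + 1
    # 3. Wrap greedily from the hard start, so every offset on the same
    #    visual line produces the same answer.
    breaks = word_starts(text, start, end)
    line_start = start
    while end - line_start > max_line_length:
        limit = line_start + max_line_length
        candidates = [b for b in breaks if line_start < b <= limit]
        # Break at the last word start that fits, or mid-word if none does.
        line_end = candidates[-1] if candidates else limit
        if offset < line_end:
            return line_start, line_end
        line_start = line_end
    return line_start, end
```

Since the soft breaks are always computed forwards from the same hard boundary, asking for any two offsets on one line returns the same pair of offsets.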
Around the same time I was improving upon the line offset code, I started re-writing NVDA to use the new virtualBuffer code. At this point in time, the new virtualBuffers for Gecko 1.9 applications have improved quite a lot in comparison to the old virtualBuffers NVDA was using before the grant. Although the backend rendering code is still in Python, the technique of using embedded objects in text with IAccessible2 and so forth proved to be a rendering time improvement of over 50%. This means when NVDA loads a document in Firefox 3, it now takes just under half the time it used to.
As the low-level management of nodes and text is all maintained in C++, it is much more accurate, and we no longer have large chunks of documents mysteriously not being rendered, or complaints from the virtualBuffers that some ID doesn't exist and other fun things we used to have.
We have been waiting for a long time to be able to convert NVDA's virtualBuffer interface code to use the TextInfo classes I spent a lot of time on last year. As we needed to make NVDA work with the C++ virtualBuffer storage module, and because we needed to improve NVDA's rendering patterns for embedded objects and such, I made sure the new virtualBuffers were designed around the TextInfo classes. This means users of NVDA now have the ability to select text within virtualBuffers, and also copy that text to the clipboard if they wish. They can also choose to read the buffers as a screen layout, or as a more conventional node per line layout. It's taken a little while, but I've also now added the quick key navigation (as in press h to jump to a heading, l for a list etc) to the new virtualBuffers; it's great to see that the findBufferFieldIDByProperties function actually works like I'd hoped.
At the moment we're still in discussion on the development list about how particular fields such as links etc should be spoken: should the word link be spoken before or after the text etc. Though I think we've come to a pretty good agreement on most of the fields.
The new virtualBuffers (at least in Gecko 1.9 applications) can be interacted with in regards to activating links, toggling a pass-through mode on and off to interact with edit fields and combo boxes etc. The one thing that still makes the new virtualBuffers incomplete is that they have no support for events: if content changes dynamically in the document, the buffers do not pick up this change. This code, however, will be added when I re-write the rendering code in C++, as it needs to be very fast, and for best accuracy, it should really be in-process so that things don't start disappearing before NVDA's process gets around to actually asking Gecko for them. However, in the meantime I've added a key stroke to tell NVDA to manually re-render the current document, so most websites can be tested well enough.
Over the next little while it's probably going to be more work on NVDA and virtualBuffers in general, to make sure that the user experience is the best it can be. Once this is OK, my next task will be to re-write the rendering code for Gecko virtualBuffers in C++. This should speed up load times by an estimated factor of about three. After that, the fun work will begin on trying to integrate all of the virtualBuffer C++ code so that the rendering code is injected into the Gecko application and rendering takes place in-process, which by estimates should speed up load times by a factor of twelve or so.
More work on Web Access grant
Since my last post on the web access grant, the virtual buffer library code has undergone quite a few changes, both for code readability, and for making sure it will really work the way it should, in all situations.
The first change is that the wm module (that manages all the window messages) has now been removed. So rather than having individual window messages for each API call that must cross a process boundary, we only now have one window message which takes a pointer to the internal function, and a pointer to a struct of arguments, as its wParam and lParam arguments. This means that only one window message needs to be registered, plus it means one less code change when adding new functions to the API.
The next change is that rather than client and internal functions using the storage buffer directly, they use a buffer container instead. This buffer container is a pointer to a struct that contains a handle to the window being virtualized, a handle to the current backend dll being used for this buffer, a pointer to the storage buffer, and a pointer to a win32 Critical Section, which is used to serialize access to the storage buffer. These changes make it much more feasible to have multiple buffers for the same window, and they make sure that the storage buffer cannot be read from while it is being written to, etc. The latter of course depends on the fact that any backends will also properly use that critical section when accessing the storage buffer.
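As a loose analogy only, the buffer container can be pictured in Python with a threading.Lock standing in for the win32 Critical Section. The field and function names here are assumptions, and the storage buffer is reduced to a list of strings purely for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class BufferContainer:
    # Hypothetical stand-ins for the window handle, backend handle and storage buffer.
    window_handle: int
    backend: object
    storage: list = field(default_factory=list)
    # Plays the role of the critical section that serializes access to storage.
    lock: threading.Lock = field(default_factory=threading.Lock)


def append_text(container, text):
    """Writer side: a backend must hold the lock while changing the storage."""
    with container.lock:
        container.storage.append(text)


def read_text(container):
    """Reader side: the client holds the same lock, so it never sees a partial write."""
    with container.lock:
        return "".join(container.storage)
```

As in the post, the protection only works if every reader and writer agrees to take the lock before touching the storage.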
Previously, nodes were only used to represent tags, as in actual nodes with properties, not just text. The only way that text manifested itself was if a node was wider than 0 offsets and it had no children, or as a gap of more than 0 offsets between two sibling nodes. This was very difficult to manage when inserting and removing text, so now text has its own node type.
As Firefox 3 accessibility has a *very* different way of handling text and child nodes (through its embedded object approach) compared to Firefox 2, or other web browsers, Jamie and I had a lot of long phone calls, and a lot of code restructuring was done, to make sure that we can handle the two very different approaches as efficiently as possible.
The main problem was that the virtual buffer library was very ID-centric, meaning that all our API functions took node IDs as arguments. However, in Firefox 3, text itself does not have a unique ID, so this approach just doesn't work at all. So the calls for adding and removing nodes have now been changed to take actual nodes as arguments, not just IDs. Also, the call to add a node now returns the node as it is added, which allows the backend to gain a reference to the created node for later use. To avoid code duplication, much of the adding/removal code has been broken down into much smaller reusable functions, which make the code easier to read, and probably even more efficient.
Two extra functions have been added which handle the merging and splitting of text nodes. When removing a node which is flanked by two text nodes, it's probably best to merge the two text nodes together so as not to cause fragmentation over a long period of time. Otherwise we could end up with a whole bunch of one-character wide text nodes all over the place, if there were a lot of arbitrary removals. The reason for the splitting function is that if Firefox instructs us to add a node to a parent at offset n, offset n may actually be right in the middle of a text node, so before adding the node, we need to split the text node in two at this position, and then add the new node directly after the first text node. Note that a function has not yet been written to take a parent and an offset from Firefox and calculate where exactly in its children the new node must be added; this will get written along with the Gecko backend as it's quite specific to Gecko.
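Here is a small Python sketch of the splitting and merging idea, working on a plain list of a parent's children. The class and function names are hypothetical, and the sketch deliberately takes the child index and position within the text node as given, since the post notes that mapping a Gecko offset onto the children is still to be written.

```python
from dataclasses import dataclass


@dataclass
class TextNode:
    text: str


@dataclass
class TagNode:
    node_id: int


def split_text_node(children, index, position):
    """Replace the text node at index with two text nodes split at position."""
    node = children[index]
    if not isinstance(node, TextNode) or not 0 < position < len(node.text):
        raise ValueError("can only split strictly inside a text node")
    children[index:index + 1] = [TextNode(node.text[:position]),
                                 TextNode(node.text[position:])]


def insert_inside_text(children, index, position, new_node):
    """Split the text node, then add new_node directly after the first half."""
    split_text_node(children, index, position)
    children.insert(index + 1, new_node)


def remove_and_merge(children, index):
    """Remove the node at index; merge its neighbours if both are text nodes."""
    removed = children.pop(index)
    if (0 < index < len(children)
            and isinstance(children[index - 1], TextNode)
            and isinstance(children[index], TextNode)):
        merged = TextNode(children[index - 1].text + children[index].text)
        children[index - 1:index + 1] = [merged]
    return removed
```

Merging on removal is what keeps repeated insertions and removals from leaving many tiny text nodes behind.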
Many other little fixes and tweaks have been made to the code, making sure that it handles different situations properly.
All through the writing of this code, there has been a testing program that tests different actions to perform on the storage buffer, to make sure we don't break anything.
Lately, I have branched NVDA trunk to a virtual buffer testing branch, which has allowed me to pull apart NVDA's old virtual buffers, and start writing a test one using the storage module from the virtual buffer library. It's very basic, only printing about three or so lines to a virtual buffer, with a few links and headings etc, but this is really just to enable me to start writing the necessary code in NVDA that will be used to navigate the new virtual buffer. I must say it's quite nice to finally be able to navigate around the new virtual buffer; it does prove that this code is actually going somewhere.
One thing that Jamie and I were talking about on the phone last night was the use of hard-coded properties in the virtual buffer library, such as role, value, states, contains and shortcut. It has always been thought up until this point that backends will convert their own role and states values to virtual buffer library-specific ones, so that NVDA only has to deal with one set. However, this seems to create a lot of work for the backends, plus rather large mappings need to be written for all the different accessibility APIs. I think we have agreed that the backends will now just use API-specific values for roles and states, and NVDA itself will do any conversions after fetching them from the virtual buffer.
We are also worried about exactly what properties a node should have. Obviously a node needs an ID, as that is what makes it unique, but as far as role, value, states, contains and shortcut are concerned, these are really quite arbitrary to a virtual buffer, and specific to the API used. Our thought is that perhaps rather than having hard-coded properties, we will allow arbitrary properties instead, meaning that when a node is created, the backend that created it can specify a string of name=value pairs, which denote the properties. The property names should also probably have namespaces for the different accessibility APIs, though there may be some properties which are not specific to any API.
All this needs to be thought out a little more, but what we are starting to realize is that the virtual buffer library should only act as a pipeline for information, at the same time tweaking the syntax and structure (i.e. converting a hierarchical model into a flat model), but not in any way changing the actual content. For example, the virtual buffer library should not have any idea about what a 'role' is; it should only know that nodes have properties. It's up to the actual accessibility API to enforce the semantics.
The advantage of this is that we don't need to keep changing the virtual buffer library interface when we want to add some other property, whether it has to do with live regions or with tables. Instead, the backend just needs to make sure it adds that particular property to the node, and of course NVDA needs to know to use that property on the other end, when it's reading from the buffer.
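Purely to illustrate what a string of namespaced name=value pairs might look like, here is a small Python sketch. The exact syntax is an assumption: the ";" separator between pairs, the ":" separator between namespace and name, and the parse_properties function are all invented for this example, as nothing had been decided at this point.

```python
def parse_properties(spec):
    """Parse 'namespace:name=value;...' into a dict keyed by (namespace, name).

    Properties without a namespace get an empty string as their namespace.
    """
    props = {}
    for pair in filter(None, spec.split(";")):
        name, _, value = pair.partition("=")
        namespace, sep, local = name.partition(":")
        if not sep:
            # No namespace given: the property is not specific to any API.
            namespace, local = "", name
        props[(namespace, local)] = value
    return props
```

With something like this, the library only ever sees opaque names and values; deciding what an "IAccessible2:role" means is left to NVDA on the other end.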
Although work is slow, I think we are certainly making progress, and at the very least I am certainly learning a lot. This is new ground, for us at least, and I think we'll get there.
Virtual buffer Library code started
As the web access solution in NVDA will allow users to use both object navigation and a flat model approach, we have to start writing a replacement for NVDA's current virtual buffer code.
At the time of this blog entry, quite a bit of the code has already been written: namely the storage module, whose job is to allow the storage and retrieval of text and fields. Part of the client and management code has also been written.
To access the source code, you can grab it from http://svn.nvda-project.org/nvda/virtualBufferLibrary/
As the virtualBuffer library must be fast and light-weight, it is being written in C++. It will also be written so that parts of it (such as the rendering/updating of the buffer) will be executed in-process. This means that, for instance, if virtualizing some web content from Firefox, some of the code will be executed within Firefox itself.
The basic idea of a virtual buffer is that a screen reader wants to access a flat model of some content in a particular window. It is the job of the virtual buffer to render and update the flat model inside the window, and allow the screen reader to query for certain information about the flat model.
In order to facilitate the execution of code in-process, the library needs to firstly be able to inject itself into another process, and then also be able to send information to and from the code that was injected.
The virtual buffer library itself only manages the window, and allows reading from storage. However, particular backend libraries must also be written which know how to work with particular object models (Mozilla Gecko etc.), in order to render and update content, storing it in storage so that the virtual buffer can read and interact with it.
The library is split up into a few distinct parts: dllMain, client, internal, wm, storage.
DllMain is very small, but its job is to initialize common variables etc. that must exist for any instance of the library, whether it be in or out of process. For now this is just a handle to the opened dll, and it also needs to initialize all the window message values.
Wm keeps and manages all the window messages that will be used to communicate information between processes. As the library will be injecting itself into processes and then intercepting window messages from any window the client tells it to, it is necessary that the library's own window messages be values that don't clash with any other messages already used for that particular window. The win32 API call RegisterWindowMessage is useful for this task as it can assign unique values to the library's window messages. However, this does mean that each time the dll is loaded into a process, all the window message values have to be initialized. RegisterWindowMessage (given the same string argument) will always give back the same message value, at least until the system is rebooted.
Wm also contains quite a few struct types which are used to hold arguments needed for a window message. When a message is sent with SendMessage, it is necessary to allocate either system memory, or virtual memory in the process the message is being sent to, use this memory as the struct, and then give SendMessage a pointer to this memory.
Client contains all the high-level functions that will be used by a screen reader to create and read from the virtual buffer. It can create a buffer, destroy a buffer, get text between two offsets, find out a field ID at a particular offset, get an XML representation of the text and fields between two offsets, find particular text, and even find a field given certain properties.
These client functions all pretty much just send a window message to the given window, passing a buffer handle, and perhaps a pointer to system allocated memory containing further arguments for the message.
Client also contains some functions which can prepare and unprepare a window. These functions are what manage the injection of code into a process, indirectly intercepting a chosen window. They do this by temporarily setting a window message filter hook on the thread that owns the window, then sending the window a message (to make sure the hook gets called at least once), and then unregistering the hook. Because the hook function the library uses is actually part of the library, Windows automatically loads the library into the process that owns the window. However, it is then up to this hook function to intercept the window, and make sure that Windows can't unload the library from the process when the hook is unregistered.
Internal contains the hook function, and a window procedure. The hook function's job is to pass on any window messages it receives, ignoring them, unless it's the window message that client sent in prepare window. If it is this message, the hook function intercepts the window that message was for by retrieving the window's current window procedure, saving it as a property on the window for later use, and setting the window's window procedure to the window procedure in the library. It finally finds out the file path of the library, and uses the win32 API call LoadLibrary to up the reference count of the library, so that Windows won't automatically unload it once the hook function is unregistered. This all means that from then on, any message for that window will travel through the library's own window procedure.
The hook function will also have to load a backend library, and instruct it to render the current content, and also register any events it might need so that it can continue to update the content.
The library's window procedure will pass on any message it receives to the old window procedure, unless it's one of the library's own window messages. If it is one of these messages, then the window procedure will perform the appropriate action and return the result, e.g. ask storage for the text between two offsets, and return the result.
However, if the message is for unpreparing the window, then the window procedure has to restore the window's old window procedure, and call FreeLibrary, allowing Windows to finally unload the library from this process. It would also have to do this if it detected that the window was going to be destroyed.
Storage is the code that manages the actual text and fields. It stores fields as a tree of nodes (with next, previous, parent, firstChild and lastChild relationships). Each node has a given ID, and also contains properties such as a role, value, states etc. It also contains start and end offsets, which are used to work out what text goes with what field. The text is all stored in one large C++ string.
There is also a map of IDs to nodes, to make it easy to locate a node for any given ID.
There are functions to perform all the tasks that the client needs to perform (such as getting text between two offsets, finding text, finding fields, and getting an XML representation for text and fields between two offsets).
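To give a feel for how the storage module hangs together, here is a compact Python sketch of the same ideas: one flat string of text, a tree of nodes with start and end offsets, and a map from IDs to nodes. It is an illustration only. The real module is C++, and the names Node, Storage, add_node, get_text and field_id_at_offset are assumptions; a children list also stands in for the next/previous/firstChild/lastChild links.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    node_id: int
    start: int
    end: int
    role: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)


class Storage:
    def __init__(self, text):
        self.text = text          # all text lives in one flat string
        self.nodes_by_id = {}     # map of IDs to nodes for quick lookup
        self.root = None

    def add_node(self, node, parent_id=None):
        """Add a node under the given parent, or as the root if no parent is given."""
        if parent_id is None:
            self.root = node
        else:
            parent = self.nodes_by_id[parent_id]
            node.parent = parent
            parent.children.append(node)
        self.nodes_by_id[node.node_id] = node
        return node

    def get_text(self, start, end):
        return self.text[start:end]

    def field_id_at_offset(self, offset):
        """Return the ID of the deepest node whose offsets contain offset."""
        node = self.root
        if node is None or not node.start <= offset < node.end:
            return None
        while True:
            child = next((c for c in node.children
                          if c.start <= offset < c.end), None)
            if child is None:
                return node.node_id
            node = child
```

The offsets are what tie the tree to the flat text: a field's text is simply the slice of the string between its start and end offsets.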
No backends have been written yet. But their job will be to render and update the content, using storage to store the rendered content.
This description is a very rough idea of how the library will work. Please look at the actual source code for more detail.
There is still much code to write, and many things to decide upon.
Research with Voice Over, and design decisions, for Web Access Solution
After my last blog entry on the Web Access Solution, I received the Mac laptop NV Access had hired for a week, so I could test out the way Voice Over handled access to the web.
On the whole, I found that using Voice Over, I was able to perform any task I needed to, though the keyboard commands took me quite a while to get used to. That said, it was a nice feeling to be able to unpack the laptop, turn it on, press Command-F5 and have the system start talking, giving me access to 99% of the operating system.
For navigation, Voice Over takes an approach which is sort of a mix between Gnopernicus/Virgo/NVDA (tree-based object navigation) and Jaws/Hal/Window Eyes (flat screen model). Voice Over allows you to navigate by object, though its tree structure is very minimal. It is more as if the order of objects is governed by where they appear on the screen, rather than where they are logically positioned.
When I used Safari, I noticed that Voice Over does not use the flat-model virtual buffer approach to web content that many Windows screen readers use, but simply continues to let the user use its operating-system-wide object navigation. Once you type in the URL and locate the HTML content, you can either tell Voice Over to read all the objects inside the HTML content object, or you can enter the HTML content object and then navigate around the objects within.
It was nice to be able to quickly get an idea of the structure of the page using object navigation, though I did feel yet again that the tree structure was quite minimalistic. I also found it a little hard to review bits of information on a page, as you could only really move between paragraphs and other elements, rather than also being able to move easily between lines and characters etc. There is a mode you can switch into to review an object by character, though it is quite fiddly to do.
Having had a play with Voice Over on the web, we have decided that there are advantages and disadvantages to both object navigation and a flat model approach. We now plan to make sure that NVDA's web access solution uses not one or the other, but both in parallel.
The idea is that when you go to a web page, the content will be loaded into a flat representation, but you will also be able to use NVDA's object navigation at the same time. In fact, each time you move within the flat model, your position in object navigation will be updated. The same goes for moving with object navigation: your position in the flat model will be updated.
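One way to picture keeping the two positions in step is the following Python sketch. It is not NVDA code: Field, deepest_field_at and DualPosition are hypothetical names, and it assumes each object knows the start and end offsets of its text in the flat model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """An object on the page covering offsets start..end (end is exclusive)."""
    name: str
    start: int
    end: int
    children: List["Field"] = field(default_factory=list)


def deepest_field_at(root: Field, offset: int) -> Optional[Field]:
    """Return the innermost object whose text contains offset, or None."""
    if not (root.start <= offset < root.end):
        return None
    for child in root.children:
        found = deepest_field_at(child, offset)
        if found is not None:
            return found
    return root


class DualPosition:
    """Tracks a flat-model offset and an object position, updating both together."""

    def __init__(self, root: Field):
        self.root = root
        self.offset = root.start
        self.current = deepest_field_at(root, self.offset)

    def move_caret(self, offset: int) -> None:
        """Moving in the flat model updates the object position."""
        field_at = deepest_field_at(self.root, offset)
        if field_at is None:
            raise ValueError(f"offset {offset} is outside the document")
        self.offset = offset
        self.current = field_at

    def move_to_object(self, target: Field) -> None:
        """Moving by object updates the flat-model position."""
        self.current = target
        self.offset = target.start
```

With this arrangement neither view is the master: whichever way the user moves, the other position simply follows.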
This means that users can use whatever approach they like to read the page. Some highly structured information might be best navigated by object, but some textual information might be best read in a flat model.
There will most probably also be a setting in NVDA to say whether you want the flat model at all. Some users may only want to use object navigation, and in that case, they shouldn't have to be affected by the rendering of a flat model they never intend to use.
I must admit I was a little surprised at Voice Over's object navigation. Users of Voice Over have been singing its praises for quite a while, saying that Voice Over takes a very different approach to web access. Although I personally totally agree that object navigation is very useful, in truth object navigation has been around in Gnopernicus and Virgo4 for many years. Plus, NVDA has had the ability to navigate a web page (at least in Firefox) by object navigation for over a year, though it seems to me that many Windows users don't find this useful.
So, hopefully with NVDA having both, users can choose which way is best for them.
First Work on Web Access Grant
NV Access (the supporting organisation of NVDA) has just received a grant from the Mozilla Foundation. This grant enables us to implement a web access solution to allow NVDA to work with web content in Mozilla Gecko windows (in such programs as Mozilla Firefox and Mozilla Thunderbird). NVDA does have support for Gecko already, but there are many problems with the current solution, such as slowness in loading pages, errors in rendering, and difficulty keeping pages that contain JavaScript up to date.
Over the last two weeks, we have finally started on the research for the web access solution. Jamie and I started with a phone call where we talked about where exactly the project should go. We sketched out a few very basic options, such as simply fixing up the current virtual buffers by making them faster and more accurate, or going for a completely different approach.
The different approach could be one where the web content is navigated in an object-oriented way. This would mean moving among objects on a particular level, and then moving into an object and navigating the next level down. When navigating to an object, NVDA would speak the object plus any objects inside. This means that rather than rendering the entire document, only the specific object being navigated to would be rendered. Though this fixes a few problems with the current implementation, it also has its drawbacks, in that it may take extra time to navigate from object to object.
Having a good understanding of how other screen readers handle web content is quite important when developing our own solution, so I spent some time looking at various screen readers (namely Jaws, Window Eyes, System Access, Orca and Hal). I looked at how they show their web content and how they let users interact with form fields. I tested with Firefox when possible and, failing this, Internet Explorer.
My findings showed me that some screen readers use two modes: one for arrowing around a flat view of the page, and one where arrowing goes straight through to the application, enabling the user to interact with forms. However, some other screen readers seem to integrate these two modes into one, where arrowing on to a field automatically allows further arrow presses to go to the form field if it supports them, and tabbing away from the field is the means of getting out. The advantage of this method is that the user does not have to worry about two modes, which removes the need to remember just how to toggle between them. The disadvantage is that there is no overall logic that can be easily applied to the arrow keys when arrowing around a web page. Sometimes the arrows move you around the page, and sometimes they move within a form field.
Another variation I found was the way that semantic info such as link, heading, list etc. was presented to the user. In some screen readers, info such as lists appeared as physical text that you had to arrow around; in some, info was spoken but never shown at all; and in others, info was shown in the buffer but only took up the space of one character. Also, whether a field type was spoken before or after the actual content varied with both the field type and the screen reader.
Although this quick look at screen readers hasn't completely set our minds to what is best for NVDA as far as navigation logic and speaking order goes, it has helped us formulate more of an idea as to what questions we will soon be asking screen reader users, as to how NVDA should look and feel when it comes to web access.
One screen reader we haven't yet been able to test is Voice Over (the built-in screen reader in Mac OS X 10.4 and above). Many users of this screen reader report great things about how it takes a more object-oriented approach and gives a very usable access solution to users viewing web pages in Safari (the default Mac web browser).
Before we can make a better decision as to how NVDA's web access solution should allow users to navigate, we do need to try out Voice Over properly, so NV Access has hired a Mac PowerBook for the next week, enabling me to test Voice Over with Safari and other applications, to get a good feel for how things are done the Mac way.
Probably the most important piece of work that I have worked on so far is extending a small C++ library I wrote called Gecko Walker. Now known as MSAA Walker, this software is used to traverse a tree of MSAA objects, logging their name, role, and value to a file. I originally wrote this library before starting the grant to time how long it took to completely traverse an MSAA tree produced from a Mozilla Gecko window with a particular web page loaded.
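The traversal itself is simple in outline. The sketch below shows the general shape in Python rather than C++, using a made-up AccessibleNode class in place of real MSAA objects; it is not MSAA Walker's actual code, and none of these names come from the MSAA API.

```python
import io
import time
from dataclasses import dataclass, field
from typing import List, TextIO, Tuple


@dataclass
class AccessibleNode:
    """Hypothetical stand-in for an MSAA object (not the real IAccessible API)."""
    name: str
    role: str
    value: str = ""
    children: List["AccessibleNode"] = field(default_factory=list)


def walk(root: AccessibleNode, log: TextIO) -> int:
    """Log every node depth-first (parents before children); return the count."""
    count = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        indent = "  " * depth
        log.write(f"{indent}{node.role}: name={node.name!r} value={node.value!r}\n")
        count += 1
        # Push children in reverse so the first child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return count


def timed_walk(root: AccessibleNode) -> Tuple[int, float, str]:
    """Walk into an in-memory log and report (count, seconds, log text)."""
    log = io.StringIO()
    started = time.perf_counter()
    count = walk(root, log)
    elapsed = time.perf_counter() - started
    return count, elapsed, log.getvalue()
```

Returning the object count alongside the elapsed time makes it possible to check that two timed runs really did the same amount of work.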
The code in its original state worked out of process, meaning that it executed in its own process, the one the user started it in. All MSAA objects being retrieved had to be pulled across process boundaries. This is the same way that NVDA currently works with MSAA objects, and is probably also the easiest way to code. I first timed a particular web page using NVDA itself to render it, and it took a total of 36 seconds. This was a very large page, being a quite verbose article from Wikipedia. I then timed how long it took to traverse the same page with MSAA Walker. This only took a total of 12 seconds.
These findings so far show that there is at least a threefold speed-up when traversing a document using pure C++ as opposed to a higher-level language such as Python. This is not to say that Python is bad; it is probably just that in pure C++ there is much less baggage to be carried around when performing a repetitive task on many objects.
Since starting on the grant I have extended MSAA Walker so that it can also execute in-process. This means that it can inject itself into the process containing the MSAA objects, and run inside it. When retrieving MSAA objects, they no longer have to cross process boundaries, which in theory speeds up the traversal.
Once I made the changes, I again ran MSAA Walker on the particular Wikipedia article, and this time instead of running in 12 seconds, or 36 seconds, it ran in a total of 0.8 seconds! When I extended the library I also allowed it to count all the MSAA objects it logged, so I could make sure that the same number of objects was being traversed. And sure enough, both out-of-process and in-process, 5403 DOM nodes were counted.
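For anyone who wants to check the arithmetic, the timings quoted above work out as follows. This is just a small Python calculation on the figures in this post; the function and constant names are made up for the example.

```python
# Timings (in seconds) as reported in this post for the same Wikipedia article.
NVDA_PYTHON_OUT_OF_PROCESS = 36.0
WALKER_OUT_OF_PROCESS = 12.0
WALKER_IN_PROCESS = 0.8
OBJECTS_COUNTED = 5403  # counted in both MSAA Walker runs


def speedup(slower: float, faster: float) -> float:
    """How many times faster the second timing is than the first."""
    return slower / faster


def objects_per_second(objects: int, seconds: float) -> float:
    """Average traversal throughput for a run."""
    return objects / seconds


if __name__ == "__main__":
    # Python out of process vs C++ out of process
    print(round(speedup(NVDA_PYTHON_OUT_OF_PROCESS, WALKER_OUT_OF_PROCESS), 1))  # 3.0
    # C++ out of process vs C++ in-process
    print(round(speedup(WALKER_OUT_OF_PROCESS, WALKER_IN_PROCESS), 1))  # 15.0
    # Python out of process vs C++ in-process
    print(round(speedup(NVDA_PYTHON_OUT_OF_PROCESS, WALKER_IN_PROCESS), 1))  # 45.0
    print(round(objects_per_second(OBJECTS_COUNTED, WALKER_OUT_OF_PROCESS), 2))  # 450.25
    print(round(objects_per_second(OBJECTS_COUNTED, WALKER_IN_PROCESS), 2))  # 6753.75
```

In other words, moving in-process took the traversal from roughly 450 objects per second to well over 6,000.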
Jamie and I had heard from people that running in-process would give us a speed-up, but as with any information, it is important to test it for yourself, especially when a lot of the information about in-process execution comes from programmers of commercial screen readers, where the code is not available for testing. We were very surprised by the 15-fold speed-up over the out-of-process run (12 seconds down to 0.8), and after testing a few other large pages, which also yielded times similar to 0.8 seconds, we became quite a bit more certain about sticking with a virtual buffer approach. However, we are not yet ready to completely drop non-virtual-buffer ideas or to think about coding, as we would still like to fully test Voice Over's web content support and get some answers from users as to how they like things presented.
Over the next week I plan to test Voice Over, have further talks with Jamie about the questions we'll be asking, and perhaps look more closely at useful methods of sharing a large buffer of information between processes. I have already been researching memory-mapped files and other means of sharing memory.
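To give a flavour of the memory-mapped file idea, here is a small Python sketch using the standard mmap module. It is only an illustration of the technique, not a decision about how NVDA will do it: the buffer layout (a 4-byte length followed by UTF-8 text) and the function names are assumptions, both ends run in one process here for simplicity, and real cross-process use would also need some form of synchronisation.

```python
import mmap
import os
import tempfile

HEADER = 4  # bytes reserved at the start of the file for the text length


def create_shared_buffer(path: str, capacity: int) -> None:
    """Create a zero-filled backing file big enough for capacity bytes of text."""
    with open(path, "wb") as f:
        f.truncate(HEADER + capacity)


def write_text(path: str, text: str) -> None:
    """Producer side: map the file and store the text plus its length."""
    data = text.encode("utf-8")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as view:
        if len(data) > len(view) - HEADER:
            raise ValueError("text does not fit in the shared buffer")
        view[HEADER:HEADER + len(data)] = data
        view[0:HEADER] = len(data).to_bytes(HEADER, "little")


def read_text(path: str) -> str:
    """Consumer side: map the same file read-only and decode the text."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        length = int.from_bytes(view[0:HEADER], "little")
        return view[HEADER:HEADER + length].decode("utf-8")


def demo() -> str:
    """Write some text through one mapping and read it back through another."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "buffer.bin")
        create_shared_buffer(path, capacity=1024)
        write_text(path, "heading level 1: General Progress Update")
        return read_text(path)
```

The appeal of this kind of approach is that the reader does not need one cross-process call per object; it just looks at memory the other side has already filled in.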
