Posts for the month of September 2007

Virtual buffer Library code started

As the web access solution in NVDA will allow users to use both object navigation and a flat model approach, we have to start writing a replacement for NVDA's current virtual buffer code.

At the time of this blog entry, quite a bit of the code has already been written: the storage module, whose job is to allow the storage and retrieval of text and fields, and part of the client and management code.

To access the source code, you can grab it from http://svn.nvda-project.org/nvda/virtualBufferLibrary/

As the virtualBuffer library must be fast and light-weight, it is being written in C++. It will also be written so that parts of it (such as the rendering/updating of the buffer) are executed in-process. This means, for instance, that when virtualizing some web content from Firefox, some of the code will be executed within Firefox itself.

The basic idea of a virtual buffer is that a screen reader wants to access a flat model of some content in a particular window. It is the job of the virtual buffer to render and update the flat model inside the window, and allow the screen reader to query for certain information about the flat model.

In order to facilitate the execution of code in-process, the library first needs to be able to inject itself into another process, and then to send information to and from the code that was injected.

The virtual buffer library itself only manages the window and allows reading from storage. Separate backend libraries must also be written which know how to work with particular object models (Mozilla Gecko etc.) in order to render and update content, placing it in storage so that the virtual buffer can read and interact with it.

The library is split up into a few distinct parts: dllMain, client, internal, wm, storage.

DllMain is very small. Its job is to initialize common variables that must exist for any instance of the library, whether it is running in or out of process. For now this is just a handle to the opened dll, plus the initialization of all the window message values.

Wm keeps and manages all the window messages that will be used to communicate information between processes. As the library will be injecting itself into processes and then intercepting window messages from any window the client tells it to, the library's own window messages must have values that don't clash with any other messages already used for that particular window. The Win32 API call RegisterWindowMessage is useful for this task, as it can assign unique values to the library's window messages. This does mean, though, that each time the dll is loaded into a process, all the window message values have to be initialized. RegisterWindowMessage (given the same string argument) will always give back the same message value, at least until the system is rebooted.

Wm also contains quite a few struct types which are used to hold the arguments needed for a window message. When a message is sent with SendMessage, it is necessary to allocate either system memory, or virtual memory in the process the message is being sent to, use this memory to hold the struct, and then give SendMessage a pointer to this memory.

Client contains all the high-level functions that will be used by a screen reader to create and read from the virtual buffer. It can create a buffer, destroy a buffer, get text between two offsets, find out a field ID at a particular offset, get an XML representation of the text and fields between two offsets, find particular text, and even find a field given certain properties.

These client functions all pretty much just send a window message to the given window, passing a buffer handle, and perhaps a pointer to system allocated memory containing further arguments for the message.

Client also contains some functions which can prepare and unprepare a window. These functions manage the injection of code into a process, indirectly intercepting a chosen window. They do this by temporarily setting a window message filter hook on the thread that owns the window, then sending the window a message (to make sure the hook gets called at least once), and then unregistering the hook. Because the hook function the library uses is actually part of the library, Windows automatically loads the library into the process that owns the window. However, it is then up to this hook function to intercept the window, and to make sure that Windows can't unload the library from the process when the hook is unregistered.

Internal contains the hook function and a window procedure. The hook function's job is to pass on any window messages it receives, ignoring them, unless it's the window message that client sent in prepare window. If it is this message, the hook function intercepts the window that message was for by retrieving the window's current window procedure, saving it as a property on the window for later use, and setting the window's window procedure to the window procedure in the library. Finally, it finds out the file path of the library and uses the Win32 API call LoadLibrary to increment the library's reference count, so that Windows won't automatically unload it once the hook function is unregistered. This all means that from then on, any message for that window will travel through the library's own window procedure.

The hook function will also have to load a backend library, and instruct it to render the current content, and also register any events it might need so that it can continue to update the content.

The library's window procedure will pass on any message it receives to the old window procedure, unless it's one of the library's own window messages. If it is one of these messages, then the window procedure will perform the appropriate action and return the result, e.g. ask storage for the text between two offsets and return it.

However, if the message is for unpreparing the window, then the window procedure has to restore the window's old window procedure and call FreeLibrary, allowing Windows to finally unload the library from this process. It would also have to do this if it detected that the window was going to be destroyed.
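The intercept, forward, and restore pattern described above can be modelled without any Win32 calls. The following is only a rough, platform-neutral sketch, not the library's code: the Window struct, the send, prepare names and the two message constants are all hypothetical, and a std::function stands in for a real window procedure. The real library gets its message values from RegisterWindowMessage and also has to deal with LoadLibrary and FreeLibrary, which this model leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

using Message = std::uint32_t;
using Result = long long;
using WindowProc = std::function<Result(Message)>;

// Hypothetical message values, hard-coded only for this sketch.
constexpr Message kGetTextLength = 0xC001;
constexpr Message kUnprepare = 0xC002;

// Hypothetical stand-in for a window. It must not be moved or copied
// while it is prepared, because the installed procedure refers to it.
struct Window {
    WindowProc proc;         // procedure currently receiving messages
    WindowProc savedProc;    // original procedure, kept while intercepted
    std::string bufferText;  // stands in for the storage module
};

// Delivers a message. The procedure is copied first so that it can
// safely replace w.proc while it is still running.
Result send(Window& w, Message m) {
    WindowProc current = w.proc;
    return current(m);
}

// Saves the window's procedure and installs one that answers the
// library's own messages and forwards everything else.
void prepare(Window& w) {
    assert(w.proc && !w.savedProc);
    w.savedProc = w.proc;
    w.proc = [&w](Message m) -> Result {
        if (m == kGetTextLength) {
            return static_cast<Result>(w.bufferText.size());
        }
        if (m == kUnprepare) {
            // Put the original procedure back and forget the saved copy.
            WindowProc original = std::move(w.savedProc);
            w.savedProc = nullptr;
            w.proc = std::move(original);
            return 1;
        }
        return w.savedProc(m);  // not ours: pass it on untouched
    };
}
```

After prepare, sending kGetTextLength is answered from bufferText without the original procedure ever seeing it, any other message reaches the original procedure, and sending kUnprepare puts the original procedure back in place.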

Storage is the code that manages the actual text and fields. It stores fields as a tree of nodes (with next, previous, parent, firstChild and lastChild relationships). Each node has a given ID, and also contains properties such as a role, value, states etc. It also contains start and end offsets, which are used to work out what text goes with what field. The text is all stored in one large C++ string.

There is also a map of IDs to nodes, to make it easy to locate a node for any given ID.

There are functions to perform all the tasks that the client needs to perform (such as getting text between two offsets, finding text, finding fields, and getting an XML representation of the text and fields between two offsets).
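To make the storage layout a little more concrete, here is a minimal sketch of the idea: one flat string, a tree of field nodes with offsets, and a map from IDs to nodes. It is not the code in the repository. The names FieldNode, Storage, setText, addField, findNode, getText and fieldIdAtOffset are all assumptions made up for illustration, the value and states properties are left out, and the end offset is assumed to be exclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical field node; member names mirror the relationships above.
struct FieldNode {
    int id = 0;
    std::string role;
    std::size_t startOffset = 0;  // inclusive
    std::size_t endOffset = 0;    // exclusive (an assumption of this sketch)
    FieldNode* parent = nullptr;
    FieldNode* previous = nullptr;
    FieldNode* next = nullptr;
    FieldNode* firstChild = nullptr;
    FieldNode* lastChild = nullptr;
};

class Storage {
public:
    void setText(std::string text) { text_ = std::move(text); }

    // Adds a field covering [start, end) as the last child of parent.
    // Returns nullptr if the ID is already in use. Top-level nodes
    // (parent == nullptr) are not linked to each other in this sketch.
    FieldNode* addField(int id, const std::string& role, std::size_t start,
                        std::size_t end, FieldNode* parent = nullptr) {
        assert(start <= end && end <= text_.size());
        if (nodes_.count(id) != 0) return nullptr;
        std::unique_ptr<FieldNode> owned(new FieldNode());
        FieldNode* node = owned.get();
        node->id = id;
        node->role = role;
        node->startOffset = start;
        node->endOffset = end;
        node->parent = parent;
        if (parent) {
            node->previous = parent->lastChild;
            if (parent->lastChild) parent->lastChild->next = node;
            else parent->firstChild = node;
            parent->lastChild = node;
        }
        nodes_[id] = std::move(owned);
        return node;
    }

    // Looks a node up by ID through the map; nullptr if unknown.
    FieldNode* findNode(int id) const {
        auto it = nodes_.find(id);
        return it == nodes_.end() ? nullptr : it->second.get();
    }

    // Returns the text in [start, end), clamped to the stored text.
    std::string getText(std::size_t start, std::size_t end) const {
        if (start > text_.size()) start = text_.size();
        if (end > text_.size()) end = text_.size();
        if (end < start) end = start;
        return text_.substr(start, end - start);
    }

    // Returns the ID of the narrowest field containing offset, or -1.
    // On a tie, the field with the lowest ID wins.
    int fieldIdAtOffset(std::size_t offset) const {
        const FieldNode* best = nullptr;
        for (const auto& entry : nodes_) {
            const FieldNode* n = entry.second.get();
            if (offset < n->startOffset || offset >= n->endOffset) continue;
            if (!best || n->endOffset - n->startOffset <
                             best->endOffset - best->startOffset) {
                best = n;
            }
        }
        return best ? best->id : -1;
    }

private:
    std::string text_;  // all rendered text in one flat string
    std::map<int, std::unique_ptr<FieldNode>> nodes_;  // ID to node
};
```

With the text "Home About us", a list field over offsets 0 to 13 and two link fields over 0 to 4 and 5 to 13, asking for the field at offset 2 gives the first link, while offset 4 (the space between the links) falls back to the enclosing list. Picking the narrowest field by scanning the map keeps the sketch short; walking down the tree from the root would be the more natural choice for a large buffer.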

No backends have been written yet, but their job will be to render and update the content, using storage to store the rendered content.

This description is a very rough idea of how the library will work. Please look at the actual source code for more detail.

There is still much code to write, and many things to decide upon.

Research with Voice Over, and design decisions, for Web Access Solution

After my last blog entry on the Web Access Solution, I received the Mac laptop NV Access had hired for a week, so I could test out the way Voice Over handled access to the web.

On the whole, I found that using Voice Over, I was able to perform any task I needed to, though the keyboard commands took me quite a while to get used to. It was a nice feeling, though, to be able to unpack the laptop, turn it on, press Command+F5 and have the system start talking, allowing me access to 99% of the operating system.

For navigation, Voice Over takes an approach which is sort of a mix between Gnopernicus/Virgo/NVDA (tree-based object navigation) and Jaws/Hal/Window Eyes (flat screen model). Voice Over allows you to navigate by object, though its tree structure is very minimal. It's more as if the order of objects is governed by where they appear on the screen, rather than where they are logically positioned.

When I used Safari, I noticed that Voice Over does not use the virtual buffer flat-model approach to web content like many Windows screen readers, but just continues to allow the user to use its operating system wide object navigation. Once you type in the URL, and then locate the html content, you can either tell Voice Over to read all the objects inside the html content object, or you can enter the html content object and then navigate around the objects within.

It was nice to be able to quickly get an idea of the structure of the page using object navigation, though I did feel yet again that the tree structure was quite minimalistic. I also found it a little hard to review bits of information on a page, as you could only really move between paragraphs and other elements, rather than also being able to move easily between lines and characters etc. There is a mode you can switch into to review an object by character, though it is quite fiddly to do.

Having had a play with Voice Over on the web, we have decided that there are advantages and disadvantages to both object navigation and a flat model approach. We now plan to make sure that NVDA's web access solution uses not one or the other, but both in parallel.

The idea will be that when you go to a web page, the content will be loaded into a flat representation, but you will also be able to use NVDA's object navigation at the same time. In fact, each time you move within the flat model, your position in object navigation will be updated. And the same goes for moving with object navigation: your position in the flat model will be updated.

This means that users can use whatever approach they like to read the page. Some highly structured information might be best navigated by object, but some textual information might be best read in a flat model.

There will most probably also be a setting in NVDA to say whether you in fact want the flat model at all. Some users may only want to use object navigation, and in that case, they shouldn't have to be affected by the rendering of a flat model they never intend to use.

I must admit I was a little surprised at Voice Over's object navigation. Users of Voice Over have been singing its praises for quite a while, saying that Voice Over takes a very different approach to web access. Although I personally totally agree that object navigation is very useful, in truth object navigation has been around in Gnopernicus and Virgo4 for many years. Plus, NVDA has had the ability to navigate a web page (at least in Firefox) by object navigation for over a year, though it seems to me that many users of Windows don't find this useful.

So, hopefully with NVDA having both, users can choose which way is best for them.