Honours Project Proposal

Author: Holger Winnemöller

Supervisor: Shaun Bangay

Grahamstown, March 18, 1999



"Investigation into gestures as an input mechanism –
A simple 3D modelling application in a virtual reality environment"


With the availability of standardised, low-cost 3D accelerator hardware and ever-increasing processing resources, the virtual reality (VR) environment is emerging from the shadows of science fiction stories and science laboratories. Yet the process of interacting with this environment remains largely unstandardised, so intuitive and practical ways of communicating with a VR system still have to be found.

In a normal desktop situation the main input devices are the mouse and the keyboard, both of which are of very limited use in a VR context. The keyboard may not be visible to users wearing a head-mounted display, and even if it were (when using LCD shutter glasses, for example), the context switch between the virtual world and the real world in order to use the real keyboard might be confusing and time-consuming. The mouse, on the other hand, is a 2D input device which, with the help of several buttons, can be used to interact with the VR environment. Problems still arise because of the inherently 2D nature of the mouse and, as with the keyboard, because the user might have limited visual access to it.

A more natural interaction method that has been suggested [1] is gesture recognition, meaning that commands to a computer system are issued using movements of the hands.

In this project I want to investigate the usability and feasibility of gesture recognition as a form of command input to a VR system. The model application I will develop to demonstrate this is a simple 3D modelling tool in which 3D objects can be created and modified using gestures as the control mechanism.


The goal of the project is two-fold. The main emphasis will be on creating a gesture-driven application, assuming the availability of a working gesture recognition system. Since the purpose of this application is not to code yet another 3D modelling tool but to evaluate gesture recognition as a command input mechanism, it must not depend on any particular implementation of the gesture recognition component; it will simply receive commands from the input component and process them. The ease of use and success rate of this application will give an indication of how well gesture recognition can serve as a command input method to a VR environment.

The second goal of this project will be to adapt, and hopefully improve, the existing gesture recognition system (as devised by Gail Shaw [2]) to fit the specific needs of the testing application.


The resources found concerning this topic were very similar to those found by Gail Shaw, namely references to the GRANDMA toolkit described in [3] and to JOVE [4]. There were also interesting articles about work on recognising American Sign Language and an application for training military arm signals and commands. These were the main references found with respect to the gesture recognition part of the project.

In addition to that, I found two web sites that gave relevant information and ideas about modelling in 3D.

The first one [5] is a paper on a 3D CAD system using stereo glasses and a 3D mouse as output and input devices, and thus does not apply directly to my project. However, the authors state several concepts about depth perception and visual aids for conveying it, and some of their visual representation ideas might be useful for my project.

The second reference [6] is a paper about the hierarchical organisation of interconnected rigid objects. This might be relevant when creating more complex 3D models from basic primitives (such as spheres, boxes, etc.).


A variety of issues have to be addressed here. Firstly I want to mention issues concerning the existing gesture recognition system (implemented by Gail Shaw):

The existing system provides recognition of six different gestures (originally intended to drive a 3D paint application). This might not be enough for my intended application. On the other hand, I am aware that having to learn too many gestures in order to control an application will demand too much of the user; a compromise will have to be found.

The gestures currently recognisable by the system have flaws. For example, the select gesture (which will be used extensively in my application) may be mistaken for the delete gesture, with obvious consequences. The choice of gestures will have to be optimised.

An idea that crossed my mind was to use a finite state automaton to distinguish different gestures. This would relax the constraint that individual gestures have to be completely independent (gesture 1 + gesture 2 != gesture 3); instead, only each pair of gestures has to be distinctly different. A complex gesture can then be composed of a succession of several simple gestures.

This poses several difficulties in itself. One is that the chance of misinterpreting a gesture grows as the recognition errors of its components accumulate. Another is identifying the components in the first place. If, say, one base component were a stroke to the right and another an arrowhead (as in Shaw's move gesture), one could build up an arrow-to-the-right gesture. The problem would then be to identify where the arrowhead starts and how long the stroke to the right is. The latter could be solved by defining that any succession of strokes to the right is still a stroke to the right; but since the current system does not discriminate spatially, we would not be able to tell apart the individual components of the arrowhead and the stroke to the right. This might be alleviated by considering the angle between components in more detail, but it definitely needs looking into.

If I can work around the above-mentioned problems, I will be able to extend the set of recognisable commands without dramatically increasing the complexity of the algorithm. This will also allow for easier adaptation of the gesture recognition component to other applications.
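The finite state automaton idea can be sketched as follows. This is only an illustration of the principle, not part of the existing system; the primitive names ("stroke_right", "arrowhead"), the states and the resulting commands are all hypothetical placeholders.

```python
# Sketch of composing a complex gesture from recognised primitives
# with a finite state automaton. Unknown transitions reset to "start".
class GestureFSA:
    def __init__(self):
        # transitions: (state, primitive) -> next state
        self.transitions = {
            ("start", "stroke_right"): "stroke_right",
            # any succession of strokes to the right is still a stroke right
            ("stroke_right", "stroke_right"): "stroke_right",
            ("stroke_right", "arrowhead"): "arrow_right",
        }
        # states that complete a composite gesture -> command issued
        self.accepting = {"arrow_right": "MOVE_RIGHT"}
        self.state = "start"

    def feed(self, primitive):
        """Feed one recognised primitive; return a command when complete."""
        self.state = self.transitions.get((self.state, primitive), "start")
        command = self.accepting.get(self.state)
        if command is not None:
            self.state = "start"   # reset after a completed gesture
        return command

fsa = GestureFSA()
fsa.feed("stroke_right")
fsa.feed("stroke_right")           # still only a stroke to the right
command = fsa.feed("arrowhead")    # completes the arrow-to-the-right gesture
```

Note that in this sketch only whole pairs of transitions need to be distinguishable, which is exactly the relaxation described above.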

Another drawback of the current design is that it does not take the user of the system into consideration, so the performance (i.e. the success rate of the recogniser) varies from person to person. It would be very advantageous if the system were able to adjust to different users (just as current speech recognition systems initiate a training session with each new user). Since the existing system does not make use of involved techniques such as neural networks or fuzzy logic, this might be as simple as adjusting the fixed cut-off values for each user (again using some kind of training calibration).
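Such a calibration might look roughly like the following sketch. It assumes the recogniser classifies by comparing a scalar feature of the stroke against cut-off values; the feature, the number of repetitions and the spread factor are all illustrative assumptions, not details of Shaw's system.

```python
# Hypothetical per-user calibration: the user repeats a gesture a few
# times during a training session and the cut-offs are re-centred on
# that user's samples (mean +/- k standard deviations).
import statistics

def calibrate_cutoffs(samples, k=2.0):
    """Return (low, high) cut-offs bracketing the user's samples."""
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return mean - k * spread, mean + k * spread

def matches(feature, low, high):
    """Does a newly measured feature value fall within the cut-offs?"""
    return low <= feature <= high

# five training repetitions of the same gesture by one user
training = [0.92, 1.05, 0.98, 1.10, 0.95]
low, high = calibrate_cutoffs(training)
```

A short training session of this kind would replace the one-size-fits-all cut-off values without requiring neural networks or fuzzy logic.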

Some other issues to be addressed involve the application side of the project.

Comparing several 3D modelling packages and looking for similarities, I found that most applications include the following concepts: vertices, objects and groups. An object is a collection of interconnected vertices, and a group is a collection of logically connected objects. This allows for two very useful concepts: any primitive object can be customised by modifying its vertices, and complex models can be constructed from grouped objects. I intend to incorporate these concepts into my application, because I find them computationally useful and because I believe they are known to the computer-literate user (who will thus be able to concentrate on using the application instead of learning how to use it). The hierarchical structure mentioned earlier could be exploited here: moving one object would automatically move all attached objects as well. Unfortunately I have not yet had the time to familiarise myself with the CoRgi system, but there might already be support for hierarchical methods.
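The vertex/object/group hierarchy can be illustrated with a minimal sketch. The class names are my own placeholders; the CoRgi system may already provide equivalent structures.

```python
# Minimal sketch of the hierarchy: moving a group propagates the
# translation to every object (and hence every vertex) it contains.
class Object3D:
    def __init__(self, vertices):
        self.vertices = vertices        # list of (x, y, z) tuples

    def translate(self, dx, dy, dz):
        self.vertices = [(x + dx, y + dy, z + dz)
                         for (x, y, z) in self.vertices]

class Group:
    def __init__(self, objects):
        self.objects = objects          # logically connected objects

    def translate(self, dx, dy, dz):
        # moving the group moves all attached objects as well
        for obj in self.objects:
            obj.translate(dx, dy, dz)

box = Object3D([(0, 0, 0), (1, 0, 0)])
lid = Object3D([(0, 1, 0), (1, 1, 0)])
model = Group([box, lid])
model.translate(2, 0, 0)                # both objects move together
```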

I intend to use visual aids to facilitate the gesture input mechanism. In particular, I envision a sort of "mousetrail" as seen on some laptop computers, which will probably be partially transparent and fade out over time. The intention is to give users feedback on what the system "sees" of the gestures they perform.
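The fading trail could work roughly as sketched below: recent cursor samples are kept with a timestamp, and each point's transparency is derived from its age. The linear fade and the one-second fade time are illustrative assumptions.

```python
# Hypothetical fading "mousetrail": opacity decreases linearly with
# the age of each sample until the point disappears entirely.
def trail_alpha(age, fade_time=1.0):
    """Opacity in [0, 1]: fully opaque when fresh, invisible after fade_time."""
    return max(0.0, 1.0 - age / fade_time)

def visible_trail(samples, now, fade_time=1.0):
    """Return (position, alpha) pairs for samples that have not fully faded."""
    out = []
    for pos, timestamp in samples:
        alpha = trail_alpha(now - timestamp, fade_time)
        if alpha > 0.0:
            out.append((pos, alpha))
    return out

samples = [((0, 0), 0.0), ((1, 0), 0.5), ((2, 0), 0.9)]
trail = visible_trail(samples, now=1.0)   # the oldest sample has faded out
```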

Other visualisation aids will concern the representation of the model being created. As mentioned above, a very nice idea came from the 3D CAD paper: the authors placed their model in a Cartesian room (all walls perpendicular) and cast parallel light perpendicularly onto each wall, thus creating a 2D shadow of the object in the plane of that wall. This can be useful when a planar view of the object is needed (to check contours, etc.). Other aids listed were texture, shading (of the object itself) and perspective distortion, amongst others. Different view modes (wireframe, solid, semi-transparent) will definitely be useful.
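The wall-shadow idea is computationally cheap: with parallel light cast perpendicularly onto an axis-aligned wall, the shadow of a point is simply its orthogonal projection, i.e. the point with the wall's normal coordinate dropped. A sketch (my own, not from the paper):

```python
# Shadow of a set of vertices on an axis-aligned wall: drop the
# coordinate along the wall's normal axis (0 = x, 1 = y, 2 = z).
def shadow_on_wall(vertices, axis):
    """Orthogonally project vertices onto the wall perpendicular to `axis`."""
    return [tuple(c for i, c in enumerate(v) if i != axis)
            for v in vertices]

corners = [(0, 0, 0), (1, 2, 3)]
floor_shadow = shadow_on_wall(corners, axis=2)   # drop z: view from above
```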

The problem of selecting vertices, objects and groups will also have to be addressed. If selection is done by painting over any of the above, the issue arises of which ones are in front and which are behind, but this should be easily solvable. Another question is whether selection will happen in object space (painting directly over the vertices to be selected, which might be less intuitive) or in projection space (painting over vertices with a brush on a long stick, as it were, which might make positioning less accurate).

The automatic insertion of vertices along edges whose endpoints have been moved far from their initial positions (to smooth the edge) has to be looked into, and of course the opposite as well: the automatic deletion/optimisation of vertices that are very close together.
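Both rules could be driven by simple length thresholds, as in the following sketch. The thresholds and the midpoint strategy are illustrative assumptions.

```python
# Hypothetical automatic vertex maintenance on a single edge:
# subdivide edges stretched beyond max_len, merge near-coincident
# vertices closer than min_len.
import math

def maintain_edge(a, b, max_len=2.0, min_len=0.1):
    """Return the vertex list for edge (a, b) after the smoothing rules."""
    length = math.dist(a, b)
    if length > max_len:
        # insert the midpoint to smooth a stretched edge
        mid = tuple((ca + cb) / 2 for ca, cb in zip(a, b))
        return [a, mid, b]
    if length < min_len:
        # merge near-coincident vertices into one
        return [tuple((ca + cb) / 2 for ca, cb in zip(a, b))]
    return [a, b]
```

Running the rule over every edge after each modification would keep the mesh density roughly uniform without user intervention.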

These are the main aspects I can think of at this stage of the project.


On the software side, the output will be a 3D modelling application that can be driven solely by gestures and that aids the user in doing so. A positive side effect might be the improvement of the existing gesture recognition system.

The intellectual output is intended to be an insight into the feasibility of a gesture mechanism as a means of communicating efficiently and accurately with a VR system (or, for that matter, any other system that supports gesture recognition).


If the gesture input technique proves successful, other applications can be adapted to make use of it. In some cases this could be as simple as writing a front-end to the existing application; in many others, more extensive adaptation will be needed.

Obviously the gesture recognition system itself can always be extended in versatility, reliability and efficiency.


[1] http://vision.ucsd.edu/ieeeMultimedia/node6.html

[2] Shaw, G. "Hand gestures as a method of interacting with Virtual Reality", Rhodes University, November 1998

[3] Rubine, D. "Specifying Gestures by Example", SIGGRAPH 1991

[4] http://www.hitl.washington.edu/projects/knowledge_base/virtual-worlds/...

[5] http://www.igd.fhg.de/~stork/papers/mvd95/mvd95.html

[6] http://www.student.hk-r.se/~pt93mm/thesis/papers/hierarcy-instance/hier_ins.html
