The Work done so far...
("don't you ever think it'll stop!")
The first major part of getting to grips
with the project was to learn all there was to know about the
existing one. This I did by reading the project report by Gail
Shaw and going through printouts of her code. Some suggestions for
improvement came readily to mind, while others took some time. A
lot of code was repeated (the spline would be traversed once for each
feature) without really providing new computational information.
I therefore decided to remodel the system.
The linked list of 3D points that was previously passed around has now
become a structure of its own. In an object-oriented manner, it
now knows its data points and its latest length and angle, and can
compute these values autonomously.
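In sketch form, such a self-contained spline might cache its own length and recompute it only when its points change. This is only a minimal illustration; the names (Point3D, Spline, addPoint) and the caching details are assumptions, not the actual project code:

```cpp
#include <cmath>
#include <iterator>
#include <list>

// Illustrative only -- names and members are assumptions, not the project code.
struct Point3D { double x, y, z; };

class Spline {
public:
    void addPoint(const Point3D& p) {
        points.push_back(p);
        dirty = true;                          // cached length is now stale
    }
    double length() {
        if (dirty) recompute();
        return cachedLength;
    }
    const std::list<Point3D>& data() const { return points; }

private:
    void recompute() {
        cachedLength = 0.0;
        if (points.size() >= 2) {
            auto prev = points.begin();
            for (auto it = std::next(prev); it != points.end(); ++it, ++prev) {
                double dx = it->x - prev->x, dy = it->y - prev->y, dz = it->z - prev->z;
                cachedLength += std::sqrt(dx * dx + dy * dy + dz * dz);
            }
        }
        dirty = false;
    }

    std::list<Point3D> points;
    double cachedLength = 0.0;
    bool dirty = true;
};
```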
The GestureRecognizer presently holds the pieces together.
It holds the spline and the features and obviously knows about
the gestures. New features can easily be plugged into the system,
and repetitive calculations are no longer necessary, since the
spline keeps all information important to the features readily
available. By iterating through the list only once, the spline
calls each feature and causes it to perform its feature-specific
calculations. After all feature values have been extracted, the
evaluation can take place.
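The single-pass idea can be sketched roughly as follows; the class and method names (Feature, accumulate, extract) are illustrative assumptions, not the real interface:

```cpp
#include <memory>
#include <vector>

// Sketch of the single-pass feature extraction; names are assumptions.
struct Point3D { double x, y, z; };

class Feature {
public:
    virtual ~Feature() = default;
    virtual void reset() = 0;
    virtual void accumulate(const Point3D& p) = 0;   // called once per spline point
    virtual double value() const = 0;                // final feature value
};

class GestureRecognizer {
public:
    void addFeature(std::unique_ptr<Feature> f) { features.push_back(std::move(f)); }

    // One traversal of the spline feeds every plugged-in feature.
    std::vector<double> extract(const std::vector<Point3D>& spline) {
        for (auto& f : features) f->reset();
        for (const auto& p : spline)
            for (auto& f : features) f->accumulate(p);
        std::vector<double> values;
        for (auto& f : features) values.push_back(f->value());
        return values;
    }

private:
    std::vector<std::unique_ptr<Feature>> features;
};
```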
The interface to the system has not been changed yet and probably
won't be.
This is not so much a political statement
as the latest development in the recognition part of the
VRGestureRecognizer. The previously implemented recognizers
(Recognizer1 and Recognizer2 respectively) had some severe
shortcomings.
Recognizer1, based on a hard cut-off mechanism, simply determined
whether the computed features of a gesture lay strictly
within the bounds specified by an upper and a lower bound. The
first gesture to lie within these bounds would be declared
recognized. This mechanism relies heavily on the independence of
gestures, as no two gestures may have the same allowed ranges for
their characteristic feature values.
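A minimal sketch of such a cut-off check might look like this (GestureBounds and recognizeByCutoff are made-up names for illustration):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the hard cut-off idea behind Recognizer1; the data layout is assumed.
struct GestureBounds {
    std::string name;
    std::vector<double> lower, upper;   // one bound pair per feature
};

// Returns the first gesture whose bounds contain every feature value,
// or an empty string if none matches.
std::string recognizeByCutoff(const std::vector<double>& featureValues,
                              const std::vector<GestureBounds>& gestures) {
    for (const auto& g : gestures) {
        bool inside = true;
        for (std::size_t i = 0; i < featureValues.size(); ++i)
            if (featureValues[i] < g.lower[i] || featureValues[i] > g.upper[i]) {
                inside = false;
                break;
            }
        if (inside) return g.name;      // first match wins
    }
    return "";
}
```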
Recognizer2 improved on this and calculated an error from an
expected value for each gesture and feature. The gesture that
minimized the error would be chosen. This proved a distinct
improvement, but failed under certain circumstances. Some
features are extremely unreliable for
specific gestures, and the large errors returned from these features
would "drown out" any small errors and dominate the
recognition process.
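The error-minimising idea can be sketched roughly as follows (the names and the absolute-error metric are assumptions, not the actual Recognizer2 code):

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Sketch: sum per-feature errors against expected values, pick the smallest total.
struct GestureTemplate {
    std::string name;
    std::vector<double> expected;       // expected value per feature
};

std::string recognizeByError(const std::vector<double>& featureValues,
                             const std::vector<GestureTemplate>& gestures) {
    std::string best;
    double bestError = std::numeric_limits<double>::max();
    for (const auto& g : gestures) {
        double error = 0.0;
        for (std::size_t i = 0; i < featureValues.size(); ++i)
            error += std::fabs(featureValues[i] - g.expected[i]);
        if (error < bestError) { bestError = error; best = g.name; }
    }
    return best;    // note: one unreliable feature can still dominate this sum
}
```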
This is where the idea of a voting system evolved. Instead of
accumulating all errors in one big result, I looked at the errors
individually. This was first realised for the original three
gestures, where a gesture was recognised if two or three of the
features agreed on it. Soon the concept hardened and was
generalized to any number of gestures.
At the present time any number of features can "vote"
on what they think the gesture to be recognized is. A majority of
two-thirds or more decides the final result.
Not only does this make it possible to ignore recognition
outliers, but an estimate of confidence can also be computed:
the more features agree on a result, the more reliable the result
is, theoretically.
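A rough sketch of this voting step, assuming each feature simply names the gesture it considers most likely (the types and the way the two-thirds rule is encoded are illustrative):

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of the voting scheme; a two-thirds majority decides and the
// vote share doubles as a rough confidence estimate.
struct VoteResult {
    std::string gesture;     // empty if no two-thirds majority was reached
    double confidence = 0.0; // fraction of features that agreed
};

VoteResult recognizeByVoting(const std::vector<std::string>& votes) {
    std::map<std::string, int> tally;
    for (const auto& v : votes) ++tally[v];

    VoteResult result;
    for (const auto& [gesture, count] : tally) {
        double share = static_cast<double>(count) / votes.size();
        if (share >= 2.0 / 3.0 && share > result.confidence) {
            result.gesture = gesture;
            result.confidence = share;
        }
    }
    return result;
}
```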
The next step was to forbid some features from voting for
certain gestures altogether, knowing that they are
"trouble-causers" whose judgement cannot be
trusted, and thus preventing them from spoiling an election. This
was achieved by computing a normalised standard deviation over
sample data, giving an estimate of the reliability of a
feature for a particular gesture. The smaller the normalised
standard deviation, the more reliable the feature is considered. This
new value can also be used to improve
the mechanism of Recognizer2, by weighting the respective errors
with the reliability factor provided by the standard deviation.
Presently, features are allowed or disallowed according to whether their
normalised standard deviation falls below or above an arbitrarily
assigned threshold value.
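In sketch form, the reliability test looks roughly like this; the 0.25 threshold below is purely illustrative (the real threshold is assigned arbitrarily, as described above), and a non-empty sample set is assumed:

```cpp
#include <cmath>
#include <vector>

// Standard deviation of sample feature values, normalised by their mean;
// smaller means more reliable.
double normalisedStdDev(const std::vector<double>& samples) {
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    double variance = 0.0;
    for (double s : samples) variance += (s - mean) * (s - mean);
    variance /= samples.size();

    return std::sqrt(variance) / std::fabs(mean);
}

// A feature may vote for a gesture only if its normalised standard
// deviation over the sample data stays below the threshold.
bool mayVote(const std::vector<double>& samples, double threshold = 0.25) {
    return normalisedStdDev(samples) < threshold;
}
```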
Having developed the new recognizer
system, I was in urgent need of a new feature to improve the
usability of my system. From a very early stage in the
project I had wanted to get some information about the geometrical
shape of the object, as in "is it round", a triangle,
a rectangle and so on. I decided it would be useful to know
how many corners (discontinuities in an otherwise smooth curve)
a spline contains. With the restructured recognition model in place
this was relatively easily accomplished.
I derived a new feature, "KinkyFeature", from the Feature
class, and implemented its ComputeFeatureValue method as a
low-pass filter with debouncing capabilities. This worked very
well for gestures that actually have "kinks" in them
(like the arrow, the cross, the scribble, etc.), but was not
totally reliable for smooth gestures (such as the select or the
circles gestures), as some irregularities were still able to
penetrate the filter. To remedy this I decided to multiply the
number of corners found in a gesture by the sum of the
derivatives of the curve at those points. This made for some
improvement, but still needs looking into.
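The corner-counting idea can be sketched as follows: compute the turning angle along the curve, smooth it with a simple low-pass filter, count peaks above a threshold with a debounce gap, and weight the count by the summed angles. All constants and names here are illustrative assumptions, not the actual KinkyFeature code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point3D { double x, y, z; };

double kinkyFeatureValue(const std::vector<Point3D>& pts) {
    if (pts.size() < 3) return 0.0;

    // Turning angle at every interior point.
    std::vector<double> angle;
    for (std::size_t i = 1; i + 1 < pts.size(); ++i) {
        double ax = pts[i].x - pts[i-1].x, ay = pts[i].y - pts[i-1].y, az = pts[i].z - pts[i-1].z;
        double bx = pts[i+1].x - pts[i].x, by = pts[i+1].y - pts[i].y, bz = pts[i+1].z - pts[i].z;
        double dot = ax * bx + ay * by + az * bz;
        double denom = std::sqrt(ax*ax + ay*ay + az*az) * std::sqrt(bx*bx + by*by + bz*bz);
        angle.push_back(denom > 0.0
            ? std::acos(std::max(-1.0, std::min(1.0, dot / denom)))
            : 0.0);
    }

    // Simple low-pass filter on the angle signal.
    std::vector<double> smooth(angle.size());
    double acc = angle.front();
    for (std::size_t i = 0; i < angle.size(); ++i)
        smooth[i] = acc = 0.7 * acc + 0.3 * angle[i];

    // Count corners with debouncing and sum the angles at those corners.
    const double kCornerThreshold = 0.6;   // radians, arbitrary
    const std::size_t kDebounce = 5;       // samples to skip after a corner
    int corners = 0;
    double angleSum = 0.0;
    for (std::size_t i = 0; i < smooth.size(); ++i)
        if (smooth[i] > kCornerThreshold) {
            ++corners;
            angleSum += smooth[i];
            i += kDebounce;
        }

    return corners * angleSum;             // count weighted by corner sharpness
}
```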
The new recogniser introduced earlier not
only proved to be more adaptable and autonomous than the previous
ones, but also paved the way for the next big step in improving
the gesture recognition system: the ability to learn
new gestures and/or train existing ones.
The system will prompt a user to perform a certain
gesture a number of times and then evaluate the data. This will
be done in several steps: the gesture will be recorded and the
feature values for all features will be calculated and tabulated.
Outliers will then be eliminated from these recorded
feature values, after which the maximum, minimum, average and
normalised standard deviation will be calculated. From these
values the system will be able to judge the quality of a gesture
(i.e. if the majority of features would be disallowed from voting on
that gesture, it is a "bad" gesture) and also be able to
tell whether it is too similar to an existing gesture.
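The evaluation step could be sketched like this (the two-sigma outlier rule and all names are illustrative assumptions, and a reasonably sized sample set is assumed):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Statistics the recogniser needs for one feature of a learned gesture.
struct FeatureStats {
    double minimum, maximum, average, normalisedStdDev;
};

FeatureStats evaluateFeature(std::vector<double> samples) {
    auto mean = [](const std::vector<double>& v) {
        double m = 0.0; for (double x : v) m += x; return m / v.size();
    };
    auto stddev = [](const std::vector<double>& v, double m) {
        double s = 0.0; for (double x : v) s += (x - m) * (x - m);
        return std::sqrt(s / v.size());
    };

    // Eliminate outliers: keep only samples within two standard deviations.
    double m = mean(samples), sd = stddev(samples, m);
    samples.erase(std::remove_if(samples.begin(), samples.end(),
                                 [&](double x) { return std::fabs(x - m) > 2.0 * sd; }),
                  samples.end());

    FeatureStats stats;
    stats.minimum = *std::min_element(samples.begin(), samples.end());
    stats.maximum = *std::max_element(samples.begin(), samples.end());
    stats.average = mean(samples);
    stats.normalisedStdDev = stddev(samples, stats.average) / std::fabs(stats.average);
    return stats;
}
```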
In order to test the system's learning curve, I recorded some data and ran it through the evaluator and the validator. The evaluator successfully pointed out weaknesses and strengths of the features with regard to each gesture. Given a sufficiently large data set, the outliers even for Feature 3 were spotted and the data set thus smoothed. From visual inspection of the resulting graphs, the evaluator proved a valuable tool for learning new gestures or training existing ones. The fact that the validator seems to be too strict in rejecting features for the recognition process needs further looking into.
But for all of those who can't take the suspense any longer: Here are the graphs!!!
Legend:
Feature 0 - Curvature
Feature 1 - Absolute Curvature (no alcohol involved here!)
Feature 2 - LengthOverDisplacement
Feature 3 - KinkyFeature
Note:
Not all gestures were recorded the same number of times. This
results in the frequency distribution having higher values for
often-recorded gestures. Since the absolute values of the
frequency distribution are irrelevant to the discussion, care
should be taken to look only at the spatial distribution and not
the absolute values. For a closer look at the gestures mentioned
in this section follow this link.
![Histogram: Feature 0 (Curvature)]()
In this histogram we can clearly see some of the outliers spotted by the evaluator (e.g. Tick values around 2217 and 2253 or a Scribble value of 2519). The unpredictability of the Arrow values (being grouped in two regions) resulted in Feature 0 not being eligible to vote for this gesture.
![Histogram: Feature 1 (Absolute Curvature)]()
Here it becomes obvious how several gestures yield the same values for Feature 1 (i.e. Select & Cross as well as Hill & Tick). These similarities are obvious when looking at visual representations of these gestures: the Hill is basically a round Tick, and the Cross a "kinky" Select. Such situations will be spotted by the validator, which disallows features from voting for gestures that are too similar.
![Histogram: Feature 2 (LengthOverDisplacement)]()
The graph here is represented differently because otherwise some gestures would overlap others, rendering them invisible. The spikes are of interest here: it looks like a considerable overlap occurs in basically all the gestures (except Select, which for apparent reasons is disallowed anyway). Practice shows, though, that Feature 2 is well able to distinguish most of the gestures (except perhaps Tick and Arrow), so the previously assumed strategy for the validator has to be reviewed.
![Histogram: Feature 3 (KinkyFeature)]()
Feature 3 shows well-defined integer values for the
gestures. An apparent flaw is that Select, Circles and
Hill naturally share the feature value 10, which makes
them indistinguishable. It has been seen that, for a
sufficiently large data set, outliers are removed even
here with great confidence. (Notice the aforementioned unevenly distributed number of times the data was recorded for each gesture; also notice that the spatial distribution is not affected by this.)