Many image manipulation systems can mimic painterly techniques by applying digital filters to the image data. Commonly found filters include charcoal, watercolor and oil. Depending on the size (area) of the picture, the filters used and their implementation, applying these filters is generally very computationally intensive and in most cases far from real-time. To render an object or a scene in a painterly style, we have to deal with three problems and find solutions that can be implemented efficiently in real time. These problems are:

- to render an object in a customary style (i.e. common lighting, textures, etc.)
- to sample and/or filter this rendering
- to display the filtered version of this rendering

In theory it would be possible to render an object directly in a painterly style, but since no established methods for this exist, doing so would prevent us from taking advantage of modern accelerated rendering hardware.

For demonstration purposes we will use the image shown in Figure 1. Even though it is not a rendered image but the digitised version of an actual photograph, we take the initial rendering for granted, as many established methods exist to create (realistic) imagery. Our concern lies with sampling/filtering the image and displaying the modified version of the original.

**Figure 1** - Demo picture, chosen for the clarity of its colours and the presence of well-defined edges (sign) as well as fuzzy regions (tree). [All originals copyright Gunnar Schulz 1999]

To demonstrate the effects created by modern image manipulation software, we display some samples in Figure 2. The degree to which these filters analyse the original picture is unknown to the authors, but we assume that some filters make a single pass over an image, while others detect edges, perform recursive refinements or carry out other expensive operations. Creating the pictures below with Photoshop 5.5 on a PIII system running Win98 with 65MB of RAM took roughly half a second per picture, which amounts to two frames per second. The pictures are small (218 x 218), and the complexity of the algorithms implementing the filters is proportional to the area of the images. As the screen area of an object is inversely proportional to the square of its distance from the viewer, the cost of the filtering stage is highly non-linear in that distance and results in an extreme performance loss when an object approaches the viewer.
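The scaling argument can be written out as a short sketch. The symbols $A$ (screen area of the object), $d$ (distance to the viewer), $T$ (filtering time) and the constant $k$ are introduced here for illustration only and assume a perspective projection and a filter whose cost is proportional to the area:

$$A(d) \propto \frac{1}{d^{2}}, \qquad T(d) \approx k \, A(d) \propto \frac{1}{d^{2}}$$

Under these assumptions, halving the distance roughly quadruples the number of pixels to be filtered, and with it the filtering time.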

[Figure 2 panels: Paint Daubs effect, Spatter effect, Sponge effect]

**Figure 2** - The original picture after different filters have been applied.

We observed this performance loss ourselves after implementing the following rendering scheme:

- Render the object using CoRgi's Visual Representation (including all textures, colors, etc.)
- Grab the frame-buffer-area containing the rendering (determined by a projection of the object's bounding box onto the viewplane)
- Apply the filter (in this case an oil-filter, which works on each pixel and a radius of pixels around it - thus a very expensive filter) to the grabbed sub-frame
- Re-paint the filtered sub-frame (using the glPutPixel routine)

Several tricks and optimisations have been applied in the implementation of the above-mentioned scheme. They are discussed in a separate section, as they are generally applicable and not specific to painterly rendering. The actual filter that we used in our implementation has been applied to the pictures in Figure 3.
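To illustrate why a filter that inspects a radius of pixels around every pixel is so expensive, here is a minimal CPU sketch of one common oil-paint variant. It is an assumption, not the filter used in our implementation: the `Image` type, the `oilFilter` name, the square window and the `levels` parameter are all hypothetical. Each output pixel takes the mean color of the most frequent intensity bin in its neighbourhood.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical packed RGB image type (not part of the original implementation).
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgb;  // 3 bytes per pixel, stored row by row
};

// Oil-paint style filter: for every pixel, build a histogram of intensity
// bins over a square window of the given radius and output the mean color
// of the most frequent bin.
Image oilFilter(const Image& src, int radius, int levels = 16) {
    Image dst = src;
    std::vector<int> count(levels), sumR(levels), sumG(levels), sumB(levels);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            std::fill(count.begin(), count.end(), 0);
            std::fill(sumR.begin(), sumR.end(), 0);
            std::fill(sumG.begin(), sumG.end(), 0);
            std::fill(sumB.begin(), sumB.end(), 0);
            // Window bounds, clamped to the image borders.
            const int y0 = std::max(0, y - radius);
            const int y1 = std::min(src.height - 1, y + radius);
            const int x0 = std::max(0, x - radius);
            const int x1 = std::min(src.width - 1, x + radius);
            for (int ny = y0; ny <= y1; ++ny) {
                for (int nx = x0; nx <= x1; ++nx) {
                    const std::uint8_t* p = &src.rgb[3 * (ny * src.width + nx)];
                    const int bin = (p[0] + p[1] + p[2]) / 3 * levels / 256;
                    ++count[bin];
                    sumR[bin] += p[0];
                    sumG[bin] += p[1];
                    sumB[bin] += p[2];
                }
            }
            // The window always contains the pixel itself, so count[best] >= 1.
            const int best = static_cast<int>(
                std::max_element(count.begin(), count.end()) - count.begin());
            std::uint8_t* out = &dst.rgb[3 * (y * src.width + x)];
            out[0] = static_cast<std::uint8_t>(sumR[best] / count[best]);
            out[1] = static_cast<std::uint8_t>(sumG[best] / count[best]);
            out[2] = static_cast<std::uint8_t>(sumB[best] / count[best]);
        }
    }
    return dst;
}
```

In this sketch every pixel visits up to (2 * radius + 1)^2 neighbours, so a radius of 10 means up to 441 pixel reads per output pixel, on top of the dependence on the image area.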

[Figure 3 panels: Oil filter applied once, Oil filter applied twice]

**Figure 3** - Oil filter algorithm with a radius of 10 pixels applied once and twice respectively.

To overcome the very severe real-time limitations of the first implementation, we designed and implemented our own filtering and re-painting technique. Here we make use of brushes (fundamentally luminance textures of monochrome or grayscale values) which are painted over the scene. Various factors influence the re-rendering. These include:

- Brush Textures
- Number of brushes
- Brush sizes
- Brush Movement
- Source-picture sampling technique

The *brush texture* obviously influences how *strokes* or *dabs* are applied to the painting. Several textures can be used to obtain a heterogeneous mixture of brush shapes.

The *number of brushes* has various effects on the re-rendering. It determines the rendering rate and has to be high enough to cover the painting area while being small enough to allow fast rendering. In practice it should be proportional to the area of the original image to provide a constant resolution. Fewer brushes can be used if the original rendering is *painted over* (as opposed to re-rendering on a blank background) because the brushes are blended with the background (see Figure 4 for details).

The *brush size* has to be chosen in accordance with the number of brushes to cover the entire painting area. During the rendering the size is varied to provide a greater variety in shape and size of brushes.

*Brush movement* is random and roughly follows a Brownian motion. This provides a dynamic appearance, mimicking the inconsistencies one would expect from a human painter aiming to produce an animation in a painterly style. We tested our model with completely random brush positions in every frame, but found the resulting flickering too disturbing and therefore added the Brownian motion.
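A Brownian-style brush movement can be sketched as a small random step per frame. This is a minimal sketch under stated assumptions, not our actual code: the `Brush` record, the `stepBrushes` function, the Gaussian step and the `stepSize` parameter are all hypothetical, and positions are assumed to be normalised to [0, 1].

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical brush record; positions are normalised to [0, 1].
struct Brush {
    float x;
    float y;
};

// Move every brush by a small Gaussian step (a discrete random walk that
// approximates Brownian motion) and clamp it to the painting area.
void stepBrushes(std::vector<Brush>& brushes, float stepSize, std::mt19937& rng) {
    std::normal_distribution<float> jitter(0.0f, stepSize);
    for (Brush& b : brushes) {
        b.x = std::min(1.0f, std::max(0.0f, b.x + jitter(rng)));
        b.y = std::min(1.0f, std::max(0.0f, b.y + jitter(rng)));
    }
}
```

Because each new position depends on the previous one, consecutive frames stay similar, which avoids the flickering of fully random placement. A smaller `stepSize` gives a calmer animation.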

*Sampling of the source picture* is determined by the position of the brushes. The color value of a single pixel at the brush position determines the brush's color. This approach is simple, fast and effective. More elaborate approaches can be imagined and could be used to implement different painting styles.
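The single-pixel sampling can be sketched as follows. The `Rgb` type and the `sampleBrushColor` function are hypothetical names, and the source picture is assumed to be a tightly packed 24-bit RGB buffer stored row by row, with the brush position given in normalised co-ordinates.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 24-bit color value.
struct Rgb {
    std::uint8_t r, g, b;
};

// Return the color of the single source pixel under a brush located at the
// normalised position (u, v), with u and v in [0, 1].
Rgb sampleBrushColor(const std::vector<std::uint8_t>& rgb,
                     int width, int height, float u, float v) {
    int x = static_cast<int>(u * width);
    int y = static_cast<int>(v * height);
    // u == 1 or v == 1 would index one past the last pixel, so clamp.
    if (x < 0) x = 0;
    if (x > width - 1) x = width - 1;
    if (y < 0) y = 0;
    if (y > height - 1) y = height - 1;
    const std::uint8_t* p = &rgb[3 * (y * width + x)];
    return Rgb{p[0], p[1], p[2]};
}
```

A more elaborate sampler (for example one averaging a small neighbourhood) could replace this function without touching the rest of the pipeline.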

[Figure 4 panels: Rendering on blank background, Rendering over original; see Gallery for details]

**Figure 4** - Our Brush-Oil filter with 5000 brushes on a blank background and with 2000 brushes rendered over the original picture.

Our latest approach is fully supported by modern graphics accelerators, and frame rates are interactive. In addition, the great number of factors influencing the re-rendering allows numerous different painterly styles to be implemented. Possible additions in this respect are:

- Analysis of the picture (e.g. following gradients with brushes oriented along them) to emphasise the stroke technique
- Sampling of the color space: we use 24-bit color, but a specific palette could easily be applied
- Adjusting the brush size to the degree of detail in regions of the picture (to allow smaller details to remain distinguishable)

It should be noted that most of these (and other) additions can be implemented without a significant loss in performance. Figure 5 shows a typical picture that has been processed with our brush filters. Please visit the Gallery for more examples.

**Figure 5** - A typical re-rendering with our current painterly-style system.

In order to implement our Painterly-style renderer (PSR), we solved several problems, most of which are not limited to PSR. Several of our techniques are very efficient and due to their generality can be applied to a diversity of other situations.

## Minimising the Scanning Region

A naïve approach to acquire the image to be processed would be to select the entire screen. Since we have already determined that the complexity of a filter is in most cases at best proportional to the area of the image to be processed, we want to minimise this area as much as possible. This has to be done cheaply and efficiently, as we do not want to trade filtering time for time spent on minimising the scanning region. We do this by projecting the bounding box of an object into screen-space and selecting the scanning region by evaluating the extrema of the projected vertices. This is shown in Figure 6. This approach is fast (projection of exactly 8 vertices, independent of the size or shape of the object) and in most cases gives a fairly good approximation of the screen area of an object, while guaranteeing that it contains all of the object's screen pixels.

**Figure 6** - An object (star), its bounding box (green) and the connected extrema of the projected bounding box (red)

## Projection of Bounding Box (BB)

Our next problem was logically linked to the previous one: projecting the vertices of the BB onto screen-space quickly and efficiently. This was done using an innovative method which takes advantage of the capabilities of modern graphics accelerators (those providing an on-board T&L engine). In general this projection can be achieved by a matrix multiplication in 4D (homogeneous space), provided we can obtain the correct multiplication matrix. In other words, the projected vertex $\mathbf{p}$ of another vertex $\mathbf{v}$ can be obtained with:

$$\mathbf{p} = M\mathbf{v} \qquad \text{[Equation 1]}$$

where $M$ is the multiplication matrix. In practice $M$ may itself be composed of several other matrices according to:

$$M = M_1 \cdot M_2 \cdot \ldots \cdot M_n \qquad \text{[Equation 2]}$$

Our method takes advantage of two facts. Firstly, the multiplication of four 4-row vectors $\mathbf{v}_i$ with a 4x4 matrix $M$ can be written as the matrix multiplication of $M$ with another matrix $V$ containing the vectors $\mathbf{v}_i$ as its columns. The results can then be found in the columns of the multiplication result:

$$\mathbf{p}_1 = M\mathbf{v}_1, \quad \mathbf{p}_2 = M\mathbf{v}_2, \quad \mathbf{p}_3 = M\mathbf{v}_3, \quad \mathbf{p}_4 = M\mathbf{v}_4$$

is equivalent to

$$\begin{pmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 & \mathbf{p}_4 \end{pmatrix} = M \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 & \mathbf{v}_4 \end{pmatrix} = MV$$

Since OpenGL stores matrices in column-major order, each of these columns occupies four consecutive floats in memory.

Secondly, the matrix $M$ can easily be obtained in OpenGL with a series of commands querying the current projection and model-view matrices. Combining these two facts allows us to do the following:

- Construct two matrices containing the 8 BB vertices as column vectors (4 apiece) [once off]
- Determine the current projection matrix (including the model-view matrix) [per frame]
- Multiply the projection matrix with each of the BB matrices and read back the results
- The projected vertices are now the column vectors of the resulting matrices (four consecutive floats each in OpenGL's column-major layout)
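The steps above can be checked against a plain CPU reference. The sketch below is not the hardware path; it only shows, under the assumption of OpenGL's column-major layout, that multiplying a matrix by a packed vertex matrix transforms all four vertices at once. The function name `multiply4x4` is hypothetical.

```cpp
// CPU reference for the column-major 4x4 product C = A * B. Element
// (row, col) of a matrix is stored at index col * 4 + row, as in OpenGL.
void multiply4x4(const float a[16], const float b[16], float c[16]) {
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            c[col * 4 + row] = sum;
        }
    }
}
```

If `b` holds four homogeneous vertices, one per group of four consecutive floats, then each group of four floats in `c` is the corresponding transformed vertex. For example, with `a` set to a translation by (1, 2, 3) and the vertex (0, 0, 0, 1) in the first group of `b`, the first group of `c` is (1, 2, 3, 1).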

This method is shown in C++ code in Listing 1. Its advantage is that it uses the hardware matrix multiplication capabilities of modern graphics accelerators and is therefore very fast. In a multi-threaded system, we could thus use a graphics card as a secondary floating point unit, specialised in matrix multiplication (for a 4x4 matrix this amounts to 16*4=64 multiplications and 16*3=48 additions, a total of 112 floating point operations).

Some problems with our method have been raised:

- It is only advantageous if T&L is performed in hardware (as on GeForce cards), as otherwise the transformation commands are performed by the CPU and FPU anyway.
- Transferring data (the BB matrices) over the graphics bus (even for specialised AGPxX buses) is much slower than transferring it over the CPU bus.

While we acknowledge the first point, we assume that as hardware development progresses, more and more graphics adapters will be equipped with on-board T&L engines. In fact NVIDIA is currently developing GPUs (Graphics Processing Units) that can be programmed by the user to perform customised lighting calculations and other effects.

The second point (while we have not seen tests on this or performed any ourselves) could also be valid. Again, the trend in current hardware development is towards faster and wider graphics buses. In addition, we could imagine our method being used in other applications where the data transfer is minimised. For example, the RGB values of an image could be interpreted as co-ordinates in RGB-space. The image could then be loaded onto the graphics card using very efficient vertex arrays. Depending on the manufacturer and implementation, these vertices can be, and are, cached in the graphics card's own memory. Different matrices can then be applied to the *image data* to perform image manipulation tasks, which are otherwise extremely processor intensive.

Our point is that modern graphics accelerators are very powerful (and quite complete) micromachines, containing significant amounts of memory (currently between 16MB and 64MB) and processing power (even if specialised), which could and should be utilised for tasks other than graphics processing.
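The idea of treating RGB values as co-ordinates and transforming them with a matrix can be illustrated with a CPU stand-in. This sketch is an assumption made for illustration and does not run on the graphics card: the function name `applyColorMatrix` is hypothetical, pixels are assumed to be packed 24-bit RGB, and the matrix uses OpenGL's column-major layout with the translation column given in 8-bit units.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Treat every pixel as the homogeneous point (r, g, b, 1) in RGB-space and
// multiply it by a column-major 4x4 matrix, as a T&L unit would do for a vertex.
void applyColorMatrix(std::vector<std::uint8_t>& rgb, const float m[16]) {
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        const float r = rgb[i];
        const float g = rgb[i + 1];
        const float b = rgb[i + 2];
        for (int row = 0; row < 3; ++row) {
            float v = m[row] * r + m[4 + row] * g + m[8 + row] * b + m[12 + row];
            // Clamp to the 8-bit range and round to the nearest integer.
            if (v < 0.0f) v = 0.0f;
            if (v > 255.0f) v = 255.0f;
            rgb[i + row] = static_cast<std::uint8_t>(v + 0.5f);
        }
    }
}
```

As a usage example, the matrix `{-1,0,0,0, 0,-1,0,0, 0,0,-1,0, 255,255,255,1}` inverts an image: the pixel (10, 20, 30) becomes (245, 235, 225). Other matrices give brightness, contrast or channel-mixing operations in the same way.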

## Conversion between image-space and screen-space

By using normalised co-ordinates (in the range [0..1]) for our brush positions, we can easily convert to image-space or screen-space by multiplying the brush co-ordinates by the dimensions of the image or screen respectively. This de-couples the image from the screen and allows them to have independent dimensions.
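The conversion amounts to one multiplication per axis. The following minimal sketch uses hypothetical names (`Pixel`, `toPixel`) and assumes integer pixel co-ordinates with the origin at (0, 0).

```cpp
#include <algorithm>

struct Pixel {
    int x;
    int y;
};

// Map normalised co-ordinates (u, v) in [0, 1] to pixel co-ordinates of a
// target (image or screen) with the given dimensions.
Pixel toPixel(float u, float v, int width, int height) {
    int x = static_cast<int>(u * width);
    int y = static_cast<int>(v * height);
    // u == 1 or v == 1 would land one past the last pixel, so clamp.
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return Pixel{x, y};
}
```

The same brush position can then be used for both spaces: `toPixel(0.5f, 0.5f, 218, 218)` addresses the centre of a 218 x 218 source image, while `toPixel(0.5f, 0.5f, 640, 480)` addresses the centre of a 640 x 480 screen region.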

## Re-painting the scene

When re-painting the filtered scene, we have to make sure that the brushes face the viewer (i.e. the normal of the brush area is anti-parallel to the view vector). In other applications this is called *billboarding* and can be done in several ways. If the current view-point was obtained by using the position and orientation of a virtual viewer, we can simply undo (reverse) the transformations responsible for the view-point when drawing the brushes. A second, more general approach is to save the current matrix stack, reset it to the default rendering context, render the brushes and restore the matrix stack to its last state. For the sake of independence from the virtual viewer, we chose the second method for our implementation. This method is outlined in Listing 2.

```cpp
// allocate space for the matrices
GLfloat modelViewMatrix[16], projectionViewMatrix[16], tempMatrix1[16], tempMatrix2[16];

glGetFloatv(GL_PROJECTION_MATRIX, projectionViewMatrix); // get projection matrix
glGetFloatv(GL_MODELVIEW_MATRIX, modelViewMatrix);       // get modelview matrix

// the results are read back from GL_MODELVIEW_MATRIX below,
// so the modelview stack has to be the current one
glMatrixMode(GL_MODELVIEW);
glPushMatrix();                                  // save original matrix stack (as A)

// load the projection matrix and multiply with modelview to obtain the final matrix
glLoadMatrixf(projectionViewMatrix);
glMultMatrixf(modelViewMatrix);

glPushMatrix();                                  // save this state (as B)
glMultMatrixf(bbMatrix1);                        // multiply with first 4 BB vertices
glGetFloatv(GL_MODELVIEW_MATRIX, tempMatrix1);   // get result in tempMatrix1
glPopMatrix();                                   // return to state B

glMultMatrixf(bbMatrix2);                        // multiply with second set of 4 BB vertices
glGetFloatv(GL_MODELVIEW_MATRIX, tempMatrix2);   // get result in tempMatrix2
glPopMatrix();                                   // restore to original state A
```

**Listing 1** - Using OpenGL to perform parallel vector multiplication (on T&L hardware). The vectors have been stored in bbMatrix1 and bbMatrix2 respectively.

```cpp
glMatrixMode(GL_PROJECTION);
glPushMatrix();                 // save projection matrix
glLoadIdentity();               // reset it to unity
glMatrixMode(GL_MODELVIEW);
glPushMatrix();                 // save modelview matrix
glLoadIdentity();               // reset it to unity

//----------------------
//-- Do the Rendering --
// Use 0 for z-co-ordinate (e.g. glVertex2f())
//----------------------

glPopMatrix();                  // restore modelview matrix
glMatrixMode(GL_PROJECTION);
glPopMatrix();                  // restore projection matrix
glMatrixMode(GL_MODELVIEW);
```

**Listing 2** - Billboarding: rendering objects which always face the viewer.

The code given here may not be ideal or straightforward, but it has proven reliable and generic. If you have suggestions or comments for improvement, please don't hesitate to contact the authors.

Note: