eScience Lectures Notes : Java Optimisation


Slide 1 : 1 / 20 : Java Optimisation

Java Optimisation

Intro

Basic Java Optimisation Hints

1. Minimise object creation and use of Strings

Minimise object creation and use of Strings (2)

2. Take a good look at your method call chains

3. Thread synchronisation is expensive

4. Collect your own garbage, with care

5. Use arrays [] for small collections of objects

6. Be afraid of Reflection and Serialization

7. Never ignore Exceptions

8. Go native ... but only if you have a really good reason

Java3D Specific Hints

9. Set up your Canvas3D with care

10. Play by Java3D's rules

11. Collapse chains of transforms

12. Combine behaviours and schedule them for consistent performance

13. Minimise your reliance on collision detection, or do your own

14. Don't burn time in system callbacks / don't try to run everything at full frame-rate

15. Don't be afraid to step outside Java3D

Java 3D™ API Collateral — 1.2.1 Performance Guide

Resources and Further Reading


Slide 2 : 2 / 20 : Java Optimisation

Source : Hints for Optimising Java3D

15 tips to go from snail-paced to faster-than-a-greased-weasel!

Sam Taylor

November 2001

You've laboured away all semester on your 3D masterpiece. It's packed full of interesting features, clever behaviours, gorgeous geometry and tasty bitmaps. Java3D shouldn't get any sweeter; this is what the Wedge was built for. But there's just one thing ... the performance. Your magnificent application crawls along at a snail's pace. It's real-time alright, but only if you think in geological terms. The universe itself will end before your app does! It chugs. It bites. It blows. It stinks.

So how do you crank up the speed? In this article we'll run through a number of general optimisation techniques for all Java programs as well as providing some hints about how to deal specifically with Java3D.

Before we start, let's get one common misconception out of the way. Many people think that Java is inherently slow. It isn't. There are plenty of examples of Java running just as fast as the equivalent C++. And it's not just on contrived benchmarks or specialised numerical code: full-blown Java applications can be as quick as native compiled code if you are careful to avoid some obvious bottlenecks. So tempting as it may be to blame the Java Virtual Machine (JVM), the problem is almost certainly elsewhere. The trick is to avoid excessive use of the more expensive features of Java. A little careful optimisation and your sedentary Java code will be buzzing on a caffeine high!

One other thing before we start:

"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." - W.A. Wulf

I'm not sure who Mr Wulf is, but that's some sound advice he's offering. Someone who I have heard of is Donald Knuth. Don isn't known for his work on 3D visualisation, but he's still got heaps of street cred so it's worth listening when he says:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth

The root of all evil. Strong words, but the message from both these lads is that optimisation should come second - program correctness must always come first. Make your program work, then make it work faster.

That said, it's easy to go too far the other way, and treat optimisation as one of the final steps in debugging. Although you shouldn't optimise too early, performance is a Fundamental Requirement of any real time application. That's Requirement with a capital R, and like all other requirements it must be considered carefully in the design, not left until the last minute. In fact most of the techniques described in this article relate as much to design as they do to implementation. Java doesn't give you the same opportunities as C++ when it comes to ultra fine code tweaking. Optimising Java code usually means working at a higher level to guarantee that your object structure and class hierarchies are efficient. The bottom line: design an efficient system, build it, debug it, then profile and optimise it.

Enough with the generalisations. In the words of Maverick and Goose in the 80's classic Top Gun, "I feel the need, the need for speed"...

 


Slide 3 : 3 / 20 : 1. Minimise object creation and use of Strings

Basic Java Optimisation Hints

1. Minimise object creation and use of Strings

The Java programming style encourages you to create lots of little objects which don't hang around for terribly long

Especially true when working with strings

How many objects are created by the following code ? ...

Perhaps the cardinal performance sin in Java is to create too many objects. The Java programming style encourages you to create lots of little objects which don't hang around for terribly long. This is especially true when working with strings, where the compiler will magically create many little objects for you. But it's bad practice, and it can really hurt your performance. For example, take a guess how many objects are used by the following code snippet:

    String my_str = "date:   " + new Date() + '\n' + 
                    "user:   " + System.getProperties.getProperty("user.name") + '\n' +
                    "thread: " + Thread.currentThread().getName();

12 objects. You could be stung for as many as 18 objects by some compilers.

The new statement is an expensive operation

Java objects aren't small (18 - 20 bytes at best, but more typically the minimum object size is 40 bytes, plus your data!)

 


Slide 4 : 4 / 20 : 1. Minimise object creation and use of Strings

Basic Java Optimisation Hints : Minimise object creation and use of Strings (2)

1. Minimise object creation and use of Strings

Big Objects...

One big issue : the garbage collection

The answer is twelve (check out StringBuffer if you want to learn more) [1]. Twelve objects for a tiny little trace message! Allocating memory on the heap for just one of those objects is actually quite costly. It may not look like much, but the new statement is an expensive operation. Java objects aren't small either. The minimum size depends on what JVM you're running. If you're very lucky you might get away with a minimum of 18 - 20 bytes, but more typically the minimum object size is 40 bytes. And remember that's the minimum overhead the JVM requires ... actual data will be extra! Big objects are bad because they chew up virtual memory, swamp your valuable memory bandwidth (the most limited hardware resource in current machines), pollute your CPU cache and force the garbage collector to do more work. With our little snippet above, all 12 objects have to be allocated on the heap, those big 40 byte headers have to be filled out with all the information the JVM requires, then the data is copied in, they get used briefly, and then they're discarded. But of course they don't go away immediately. Oh no, they hang around until the garbage collector finally catches them.

And all that takes time.
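
As a rough sketch of the alternative, the same trace message can be built with a single StringBuffer so that far fewer temporary objects are created (this fragment assumes the same imports as the snippet above; the exact object count still depends on your compiler and JVM):

    // Build the message in one StringBuffer instead of letting the compiler
    // create a temporary object for every + concatenation.
    StringBuffer buf = new StringBuffer(128);      // pre-sized to avoid resizing
    buf.append("date:   ").append(new Date()).append('\n');
    buf.append("user:   ").append(System.getProperty("user.name")).append('\n');
    buf.append("thread: ").append(Thread.currentThread().getName());
    String my_str = buf.toString();                // one final String object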

In a loop, a simple Object creation becomes a set of object creations

In a tight loop, even a handful of unnecessary object creations can blow out a big chunk of memory and cost you a lot of time. What kind of tight loop? Well how about the rendering loop called every frame in your Java3D application.

Example of loop : the rendering loop called every frame in your Java3D application

Be very careful about how and when you create objects (new and String use)

Pre-allocate and recycle your objects: get the memory once and then reuse it

Java3D code : Point, Matrix and Vector are ideal candidates for this treatment

"setIdenty" to get a clean Matrix

The answer is to be very careful about how and when you create objects. Take particular note of where you use new and where you work with strings. Where possible you should pre-allocate and recycle your objects: get the memory once and then reuse it. If you always work with a fixed number of objects, don't allocate them every frame; allocate them once before the first frame has started. For Java3D code Point, Matrix and Vector are ideal candidates for this treatment[2]. Working in 3D inevitably involves matrix and vector manipulation. If you allocate your matrices every frame then you'll burn memory, waste cache and force the garbage collector to do much more work (see tip #4). Much cheaper to call setIdentity than to create a new object every time you need a fresh matrix.
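
A minimal sketch of the recycling idea, using a hypothetical per-frame update method: the scratch Transform3D and Vector3f are allocated once as fields and simply overwritten each frame, rather than re-created.

    import javax.media.j3d.Transform3D;
    import javax.media.j3d.TransformGroup;
    import javax.vecmath.Vector3f;

    public class Spinner {
        // Allocated once, reused every frame - no per-frame garbage.
        private final Transform3D scratch = new Transform3D();
        private final Vector3f offset = new Vector3f();

        /** Hypothetical per-frame update: recycle the scratch objects.
            The target needs ALLOW_TRANSFORM_WRITE if it is live. */
        public void updateFrame(TransformGroup target, float x, float y, float z) {
            scratch.setIdentity();           // cheaper than new Transform3D()
            offset.set(x, y, z);             // overwrite, don't reallocate
            scratch.setTranslation(offset);
            target.setTransform(scratch);
        }
    }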

 

 


Slide 5 : 5 / 20 : 2. Take a good look at your method call chains

Basic Java Optimisation Hints

2. Take a good look at your method call chains

General rule of thumb: static methods are cheapest, then final methods, then instance methods, then interface methods and finally synchronized methods:

Methods. Great things aren't they? A lot of fun. But method calls in Java are a bit more involved than in other languages and in the wrong circumstances they can become quite expensive. How expensive? Well that depends in part on your JVM but mostly on the type of method. As a general rule of thumb static methods are cheapest, then final methods, then instance methods, then interface methods and finally synchronized methods:

static < final < instance < interface < synchronized

And the difference is not trivial: in one popular JVM interface methods take three times longer to call than static methods declared in a class. In the same JVM synchronized methods are almost seven times slower than statics!

For a single method call the cost is still pretty minimal, but chaining them together mounts up the cost (methodA() calls methodB() calls methodC())

Long call chains are a common feature of Java's event model

Now for a single method call the cost is still pretty minimal. But once you start chaining them together (methodA() calls methodB() calls methodC()) the costs mount up. Long call chains are a common feature of Java's event model: "this listener registers with that component and when it receives event X it passes on event Y". This is known as delegation and it can be an elegant technique to simplify your design. However, if you delegate your event handling too much you get long call sequences. If such a sequence is called every time the user twitches the mouse, then the costs can blow right out.

Sometimes you don't have any choice about the type of method you use. If you're writing an AWT event listener you simply have to implement one of the well-defined interfaces. But for your own code you should design the shortest possible call chains for the performance critical code. The crucial design tips: keep performance-critical call chains as short as possible, and use the cheaper method types where your design allows it.
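
As a hedged illustration of the cost ordering (the actual numbers depend entirely on your JVM), a crude timing loop like the one below can be used to compare a static call against a synchronized one. Treat it as a sketch, not a rigorous benchmark:

    public class CallCost {
        static int staticCall(int i)      { return i + 1; }
        synchronized int syncCall(int i)  { return i + 1; }

        public static void main(String[] args) {
            CallCost c = new CallCost();
            final int N = 10000000;
            int x = 0;

            long t0 = System.currentTimeMillis();
            for (int i = 0; i < N; i++) x = staticCall(x);
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < N; i++) x = c.syncCall(x);
            long t2 = System.currentTimeMillis();

            System.out.println("static:       " + (t1 - t0) + " ms");
            System.out.println("synchronized: " + (t2 - t1) + " ms");
        }
    }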


Slide 6 : 6 / 20 : 3. Thread synchronisation is expensive

Basic Java Optimisation Hints

3. Thread synchronisation is expensive

Multithreading is good to improve user interface reactivity, but...

Acquiring locks to guarantee thread safety is slow

If you intend to make extensive use of Java's threads you probably ought to grab a couple of good textbooks

AWT uses a couple of threads, Java3D uses lots of threads

You don't have to lock everything : identify a minimal model for synchronisation

This one is common sense really: acquiring locks to guarantee thread safety is slow (see tip #2). Concurrent programming is a major subject in its own right and if you intend to make extensive use of Java's threads you probably ought to grab a couple of good textbooks. Even if you don't create any threads of your own, the Java libraries use their own: AWT uses a couple of threads, Java3D uses lots of threads. You may not be alone... But remember, just because your application has multiple threads doesn't mean you have to lock everything. An important part of designing a concurrent system is to determine what data structures each thread will touch, and so identify a minimal model for synchronisation. Look carefully at the Java class libraries too, especially the data containers (see tip #5), some of which are thread-safe while others are not.
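
A small sketch of the "minimal model for synchronisation" idea, assuming a hypothetical piece of state shared between an input thread and the render loop: only the shared fields are guarded, and the lock is held just long enough to copy them.

    import javax.vecmath.Vector3f;

    /** Hypothetical state shared between an input thread and the render loop. */
    public class SharedTarget {
        private final Vector3f target = new Vector3f();   // the only shared data

        /** Called from the input/AWT thread. */
        public synchronized void set(float x, float y, float z) {
            target.set(x, y, z);
        }

        /** Called from the rendering side: copy out under the lock, then do the
            expensive work on the private copy with no lock held at all. */
        public synchronized void copyInto(Vector3f dest) {
            dest.set(target);
        }
    }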


Slide 7 : 7 / 20 : 4. Collect your own garbage, with care

Basic Java Optimisation Hints

4. Collect your own garbage, with care

Java programs run nicely most of the time, but now and again they grind to a halt... that's garbage collection time !

Automatic memory management

A common performance complaint with Java programs is that they run nicely most of the time, but now and again they grind to a halt. As often as not the culprit is the garbage collector. Automatic memory management is a great feature, but it comes at a price. Collecting that garbage is a slow and difficult task and although most modern JVMs try to minimise the costs, when the collector-man comes knocking you will know about it. So what do you do?

Avoid garbage collection in first place by reusing your objects

Schedule the garbage collector yourself : System.gc()

Do this when things are quiet

Well obviously the best thing is to avoid garbage collection in the first place by reusing your objects (see tip #1). The alternative is to schedule the garbage collector yourself, at a time which suits you best. You can manually invoke the collector at any time by calling: System.gc(). The precise effect of this call will vary from JVM to JVM, but in general it will clean out the memory so that the collector won't need to run for a while. If you do this when things are quiet it can save you some grief when things get busy.
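
A hedged sketch of manual scheduling, assuming your application has some notion of a quiet moment (a menu screen, a level load, a pause) and a hypothetical method called from the main loop:

    // Sketch only: invoke the collector during a known quiet moment so it is
    // less likely to interrupt the busy frames. The exact effect varies by JVM.
    public class GcScheduler {
        private long lastGc = 0;

        /** Call this from your main loop when nothing time-critical is happening. */
        public void maybeCollect(boolean applicationIsIdle) {
            long now = System.currentTimeMillis();
            // Don't hammer the collector - at most once every 30 seconds here.
            if (applicationIsIdle && now - lastGc > 30000) {
                System.gc();
                lastGc = now;
            }
        }
    }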

Java3D : Usability studies : it is better to have a slow but constant frame rate than a fast but variable one

If you regularly schedule the garbage collector you can average out the costs

This technique is especially useful for Java3D applications. Most usability studies have concluded that for interactive applications it is better to have a slow but constant frame rate than a fast but variable one. Anyone who's played Quake will know there's nothing more annoying than a game where everything is ticking over nicely, then just as the action hots up the frame rate bombs out. Averaging 60 frames per second (fps) sounds great, but if that average varies from 20fps to 80fps it will be more annoying to users than if you simply maintain a constant 30fps. So if you regularly schedule the garbage collector you can average out the costs. Sure the highs won't be quite as high, but then the lows won't be quite as low either. (We'll see how this same observation affects behaviours in tip #12 and tip #15.)

 

 


Slide 8 : 8 / 20 : 5. Use arrays [] for small collections of objects

Basic Java Optimisation Hints

5. Use arrays [] for small collections of objects

LinkedList, Set, Map and TreeSet (java.util package) are all very convenient, but they aren't necessarily high performance.

Use a separate Iterator object, means more objects, chained method calls...

Casting and a slow runtime type check

For very small, simple collections of objects or primitive types you are much better to use arrays (say up to 6 - 8 objects).

In the java.util package you'll find a bunch of nice container classes for managing groups of objects. LinkedList, Set, Map and TreeSet are all very convenient, but they aren't necessarily high performance. To access the contents of these containers you typically use a separate Iterator object. That means more objects, chained method calls and other speed sapping overheads. Also, the containers all store their contents as type Object, which means casting and a slow runtime type check with each access. For very small, simple collections of objects or primitive types you are much better to use arrays.

It's important to emphasise that this is only true for small collections: say to a maximum of 6 - 8 objects. When you have a large number of objects a HashMap or some form of tree can be extremely efficient. Obviously you should analyse the complexity of your algorithm and choose a data structure which makes sense. But for small collections it will always be hard to beat an array.
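
A small sketch of the point, using a hypothetical fixed set of waypoints: for half a dozen elements a plain array avoids the Iterator object, the chained method calls and the cast.

    import javax.vecmath.Point3f;

    public class Waypoints {
        // Small, fixed collection: a plain array, no Iterator, no casting.
        private final Point3f[] points = new Point3f[6];

        public Waypoints() {
            for (int i = 0; i < points.length; i++) {
                points[i] = new Point3f(i, 0.0f, 0.0f);   // allocated once, up front
            }
        }

        /** Simple indexed loop - no temporary objects per traversal. */
        public float totalX() {
            float sum = 0.0f;
            for (int i = 0; i < points.length; i++) {
                sum += points[i].x;
            }
            return sum;
        }
    }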

 

 


Slide 9 : 9 / 20 : 6. Be afraid of Reflection and Serialization

Basic Java Optimisation Hints

6. Be afraid of Reflection and Serialization

Java's funkiest features are also its greatest bottlenecks

Reflection is the ability to introspect a class and dynamically work out what methods and fields it has

Invoking a method through reflection is approximately one thousand times (1000x) slower than a normal method call

Serialisation is the ability to take a group of objects and dump them out into an array of bytes

Very handy for loading and saving data, and also useful for sharing data between machines (Java RMI)... But spectacularly slow

Often : abstract features = performance bottlenecks : Dynamic class loading, JDBC, parsing XML documents, LDAP directory access, CORBA networking...

Some of Java's funkiest features are also its greatest bottlenecks. Reflection is the ability to introspect a class and dynamically work out what methods and fields it has. It can be quite useful when working with JavaBeans and sometimes gets used in event processing code. But it really bites when it comes to performance. Invoking a method through reflection is approximately one thousand times (1000x) slower than a normal method call. That's three orders of magnitude! Better go and put the kettle on, we could be here for a while...

Serialisation is the ability to take a group of objects and dump them out into an array of bytes. Very handy for loading and saving data, and also useful for sharing data between machines. Serialisation is used extensively in Java RMI to pass parameters back and forth. A great convenience, but also a great way to burn CPU cycles. Serialisation is spectacularly slow! Gob-smackingly inefficient, so use it only if you want your application to be gob-smackingly unresponsive.

In general you should treat any of Java's more abstract features as performance bottlenecks. Dynamic class loading, JDBC, parsing XML documents, LDAP directory access, CORBA networking: it's all great stuff, but none of it was designed by speed freaks. Handle with care.

N.B. : Java™ Remote Method Invocation (RMI) enables the programmer to create distributed Java-to-Java applications, in which the methods of remote Java objects can be invoked from other Java virtual machines, possibly on different hosts.
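
Coming back to reflection, here is a hedged sketch of why it hurts: the same method invoked directly and then via java.lang.reflect.Method. The exact ratio depends on your JVM, but the reflective path also has to box the argument and the result.

    import java.lang.reflect.Method;

    public class ReflectCost {
        public int bump(int i) { return i + 1; }

        public static void main(String[] args) throws Exception {
            ReflectCost r = new ReflectCost();
            Method m = ReflectCost.class.getMethod("bump", new Class[] { int.class });
            final int N = 1000000;
            int x = 0;

            long t0 = System.currentTimeMillis();
            for (int i = 0; i < N; i++) x = r.bump(x);
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < N; i++) {
                // Each reflective call boxes the int argument and the int result.
                x = ((Integer) m.invoke(r, new Object[] { new Integer(x) })).intValue();
            }
            long t2 = System.currentTimeMillis();

            System.out.println("direct:     " + (t1 - t0) + " ms");
            System.out.println("reflective: " + (t2 - t1) + " ms");
        }
    }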

 

 

 


Slide 10 : 10 / 20 : 7. Never ignore Exceptions

Basic Java Optimisation Hints

7. Never ignore Exceptions

Okay, okay, we've all done it. As a quick and dirty way to make a piece of code compile you ignore the exceptions and end up with something like:

    try {
      dodgeyMethodCall();
    }
    catch (Exception e) {
      ;
    }

It keeps the compiler happy while you get on with worrying about the rest of your algorithm.

That dodgey method full of bugs could be throwing exceptions all the time and you'd never know.

Common fix is to dump the exception out to the System.err stream

But if you're running a Wedge application in full screen mode you probably won't be looking at the console output very much

It keeps the compiler happy while you get on with worrying about the rest of your algorithm. Trouble is, that dodgey method could be throwing exceptions all the time and you'd never know. That's bad for all sorts of obvious reasons, but also because exception handling is expensive and so your performance will take a hammering. The cost of the method invocation blows out, an Exception object is created (and then ignored), and who knows what knock-on effects will occur if dodgeyMethodCall does part but not all of what it should.

The common fix is to dump the exception out to the System.err stream. That's better, but you can fall into the same problem if you don't actually bother checking the text output of your code as it's running. This may seem unlikely, but if you're running a Wedge application in full screen mode you probably won't be looking at the console output very much. That makes it easy to miss important exception traces.
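
A minimal sketch of the least you should do instead: make the failure visible and keep the stack trace, so the dodgey method can't fail silently.

    try {
      dodgeyMethodCall();
    }
    catch (Exception e) {
      // At the very least, report the failure somewhere you will actually look.
      System.err.println("dodgeyMethodCall failed: " + e);
      e.printStackTrace();
    }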

So the lesson is never ignore exceptions: either in your code, or when they're reported to you in your console.

 


Slide 11 : 11 / 20 : 8. Go native ... but only if you have a really good reason

Basic Java Optimisation Hints

8. Go native ... but only if you have a really good reason

When all else fails you may want to use the Java Native Interface (JNI) to jump out to native, compiled code

You might want to do some really low level, grimy optimisation work without having to worry about garbage collectors, strong type checking and the other elegant abstractions Java offers

You should only look to native code if you know exactly what you're doing and why. Think cost-benefit analysis.

Moving data in and out of the JVM's garbage collected memory space is not free. Calling Java code from native code is slow, and you still have to worry about thread safety

Native code brings with it major development headaches: you lose portability

You're much better to concentrate on algorithmic improvements

When all else fails you can use the Java Native Interface (JNI) to jump out to native, compiled code. "Hang on a minute" you might say, "isn't this tantamount to admitting that Java is slow after all?". Err ... well no it isn't quite, but it is an admission that you might want to do some really low level, grimy optimisation work without having to worry about garbage collectors, strong type checking and the other elegant abstractions Java offers. This is especially true if you're one of those sick-o types who hand code tight SIMD assembly routines to crunch through the inner-most loop of an image processing function or a scientific computation. Don't believe the hype about what modern compilers can do - nothing beats hand-crafted assembler!

This is, of course, your absolute last resort. Let's be really clear on this point: you should only look to native code if you know exactly what you're doing and why. Think cost-benefit analysis. Profile your application extensively, understand precisely how much time each routine uses and account for every precious CPU cycle. Then weigh up any performance boost against the costs associated with going native. Moving data in and out of the JVM's garbage collected memory space is not free. Calling Java code from native code is slow, and you still have to worry about thread safety. Once all this is weighed up you may not see a speed gain with native code. Worse still, native code brings with it major development headaches: you lose portability; you have to deal with the seedy underbelly of the JVM; debugging becomes a nightmare. In my experience you're much better to concentrate on algorithmic improvements since native code will only buy you a percentage or two in the margins.

In fact the reason I've included this tip in the article is to try and convince you not to go native. Some gung-ho types dive into native code at the first hint of performance trouble. But a "rush of blood to the head" is not part of many good design methodologies. Less of a recipe for success, most of the time it's either blind enthusiasm or blind panic! My advice is go talk to someone else before ordering that copy of "x86 for Dummies".
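
For completeness, this is roughly what the Java side of a JNI call looks like. It's a sketch only: the matching C implementation, the header generation with javah and the platform-specific library naming are all extra work (and extra headaches).

    public class NativeCruncher {
        // Implemented in C/C++ and compiled into a shared library
        // (e.g. libcruncher.so or cruncher.dll - the name here is hypothetical).
        public native void crunch(float[] data);

        static {
            System.loadLibrary("cruncher");
        }
    }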

 

 


Slide 12 : 12 / 20 : 9. Set up your Canvas3D with care

Java3D Specific Hints

9. Set up your Canvas3D with care

If you work with Tiwi you won't have to worry about this one

To roll your own Java3D initialisation code, you need to get a Canvas3D

If you work with Tiwi you won't have to worry about this one, but if you decide to roll your own Java3D initialisation code take care with the Canvas3D. The crucial step is to make sure you use a GraphicsConfigTemplate3D when you create your GraphicsConfiguration. The wrong way to do things is as follows:

The wrong way to do it

    GraphicsEnvironment ge =
        GraphicsEnvironment.getLocalGraphicsEnvironment();
    GraphicsDevice gs = ge.getDefaultScreenDevice();
    GraphicsConfiguration gc = gs.getDefaultConfiguration();
    Canvas3D aCanvas3D = new Canvas3D(gc);

The right way to do it

    GraphicsEnvironment ge =
        GraphicsEnvironment.getLocalGraphicsEnvironment();
    GraphicsDevice gs = ge.getDefaultScreenDevice();
    GraphicsConfigTemplate3D devconfig = new GraphicsConfigTemplate3D();
    GraphicsConfiguration config = gs.getBestConfiguration(devconfig);
    Canvas3D aCanvas3D = new Canvas3D(config);

A little difference, but depending on your graphics card it can mean a big improvement. The desktop machines in the eScience laboratory are particularly sensitive to this one so take note.

 

 


Slide 13 : 13 / 20 : 10. Play by Java3D's rules

Java3D Specific Hints

10. Play by Java3D's rules

The most obvious thing to affect the performance of Java3D is the scene graph you create

Read the documentation

The most obvious thing to affect the performance of Java3D is the scene graph you create. Java 3D recommends a number of basic things you can do to your scene graph in the interests of efficiency. These are all pretty obvious if you've read the documentation, but as a refresher remember to:

  1. enable only the minimum number of capabilities required for each node in the scene - see SceneGraphObject.setCapability()

  2. compile any static branches of the scene - see BranchGroup.compile()

  3. set minimal bounding volumes for the Nodes in the scene - see Node.setBounds()

The first two won't make a massive difference to the performance of your code [3] but they're good programming practices to follow. Tight specification of bounds is more important, and can really improve picking and collision detection. Think a little more carefully when it comes to the bounds on Behaviour nodes. Java3D tries to encourage you to minimise behaviour bounds, but in tip #12 and tip #15 we'll look at why this isn't necessarily good advice.
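
A small sketch pulling the three points together for a hypothetical branch containing one movable part (the class and method names are made up for illustration):

    import javax.media.j3d.BoundingSphere;
    import javax.media.j3d.BranchGroup;
    import javax.media.j3d.TransformGroup;
    import javax.vecmath.Point3d;

    public class SceneBuilder {
        public static BranchGroup build(TransformGroup movingPart) {
            BranchGroup branch = new BranchGroup();

            // 1. Only the capabilities you really need (this node will be moved).
            movingPart.setCapability(TransformGroup.ALLOW_TRANSFORM_WRITE);
            branch.addChild(movingPart);

            // 3. A deliberately tight bounding volume instead of a huge default.
            movingPart.setBoundsAutoCompute(false);
            movingPart.setBounds(new BoundingSphere(new Point3d(0, 0, 0), 2.0));

            // 2. Compile the branch before it is made live.
            branch.compile();
            return branch;
        }
    }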

 

 


Slide 14 : 14 / 20 : 11. Collapse chains of transforms

Java3D Specific Hints

11. Collapse chains of transforms

Dangers of having too many transforms in a scene

Every position and orientation of every object in a scene is specified with one or more TransformGroups, so inevitably they are going to be one of the most common nodes you use.

Java3D has to multiply together all the transforms from that leaf back to the root of the scene.

Where you can, combine a sequence of transformations together into a single TransformGroup.

Java3D does this automatically (when you compile a branch), but only for transformations which can't be read or written to.

Dear old TransformGroup, what a trusty friend it is. But sometimes your friends can lead you into bad ways. Mother always said not to take sweets from strangers, but she never mentioned the dangers of having too many transforms in a scene. Every position and orientation of every object in a scene is specified with one or more TransformGroups, so inevitably they are going to be one of the most common nodes you use. Trouble is, to render a bit of geometry at one of the leaves in your scene, Java3D has to multiply together all the transforms from that leaf back to the root of the scene. If you have long chains of transforms that can start to cost a bit. Perhaps Mother doesn't know best after all.

Where you can, it pays to combine a sequence of transformations together and so collapse a long chain down to a single TransformGroup. Java3D does this automatically when you compile a branch of your scene graph, but only for transformations which can't be read or written to. If you need to update the positions of objects it's far better to concentrate all those updates into one, or a small number, of transform nodes.
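
A sketch of the collapsing idea: instead of nesting a TransformGroup for the rotation inside another for the translation, the two Transform3Ds are multiplied together and a single node carries the combined result (the update method here is hypothetical).

    import javax.media.j3d.Transform3D;
    import javax.media.j3d.TransformGroup;
    import javax.vecmath.Vector3d;

    public class CollapsedTransform {
        // Scratch objects, allocated once (see tip #1).
        private final Transform3D rotation = new Transform3D();
        private final Transform3D translation = new Transform3D();
        private final Transform3D combined = new Transform3D();

        /** One TransformGroup gets rotation * translation instead of two nested
            groups. The node needs ALLOW_TRANSFORM_WRITE if it is live. */
        public void update(TransformGroup node, double angle, Vector3d offset) {
            rotation.rotY(angle);
            translation.set(offset);                // pure translation
            combined.mul(rotation, translation);    // combined = rotation * translation
            node.setTransform(combined);
        }
    }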

 

 

 


Slide 15 : 15 / 20 : 12. Combine behaviours and schedule them for consistent performance

Java3D Specific Hints

12. Combine behaviours and schedule them for consistent performance

The Java3D documentation tells you to set tight bounding volumes on behaviours so that they only run when they're absolutely required

Recall from tip #4 that it is better to have a constant frame-rate than a highly variable one.

The solution is to concentrate all your code into a small number of intelligent Behaviour nodes with large (or infinite) scheduling bounds.

Behaviour nodes are the smarts in your application. The interesting bits. By using Java3D's interpolators and other behaviours in novel ways you can wire some clever logic into your scene without having to write a single line of code. The Java3D documentation tends to encourage this, and it also tells you to set tight bounding volumes on behaviours so that they only run when they're absolutely required (see tip #10). Wiring logic directly into a scene seems like a pretty cool idea at first and many new Java3D programmers take this approach to heart. Having tight scheduling bounds also seems like a cool optimisation by only running those behaviours that are actually visible.

So the temptation is to try to build everything out of the existing behaviours, and write very simple little behaviours to plug any gaps. It seems like a win-win situation. The Java3D scheduler has plenty of flexibility in running the minimum number of behaviours based on the visible area of the scene. You win too because by keeping your behaviours simple you get plenty of opportunity to reuse them in other applications. But not everything that seems like a good idea actually turns out to be so. Fortran seemed like a good idea at the time. So did the Leyland P.76.

What have I got against behaviours? Essentially it comes back to the point in tip #4 that it is better to have a constant frame-rate than a highly variable one. Turning behaviours on and off all the time is a great way to guarantee inconsistent frame-rate. Sure it improves things for one or two individual frames, but the overall effect is more fluctuation in performance. This is a bad thing. It is also inefficient to use the Java3D behaviour scheduler to arrange what bits of your code should run when. To decide if a behaviour should run, Java3D has to check the bounding volume of the behaviour against the visible volumes of the scene. This means mapping the behaviour volume into world space (remember those long transform chains in tip #11) and then intersecting it with the view volume. Every behaviour, every frame.

The solution is to concentrate all your code into a small number of intelligent Behaviour nodes with large (or infinite) scheduling bounds. That way you get more consistent frame rates, and when it comes to turning on and off certain pieces of code you are almost always in a better position to make that decision than the Java3D scheduler (usually without having to map all sorts of complex volumes through a chain of coordinate transformations).

Good behaviour is rewarded with good performance.
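
A hedged sketch of the "one intelligent behaviour" idea: a single Behavior with effectively infinite scheduling bounds that wakes every frame and decides for itself what actually needs updating (the helper calls in the comment are hypothetical).

    import java.util.Enumeration;
    import javax.media.j3d.Behavior;
    import javax.media.j3d.BoundingSphere;
    import javax.media.j3d.WakeupOnElapsedFrames;
    import javax.vecmath.Point3d;

    /** One behaviour, scheduled every frame, with (near-)infinite bounds. */
    public class MasterBehavior extends Behavior {
        private final WakeupOnElapsedFrames everyFrame = new WakeupOnElapsedFrames(0);

        public MasterBehavior() {
            setSchedulingBounds(new BoundingSphere(new Point3d(), Double.MAX_VALUE));
        }

        public void initialize() {
            wakeupOn(everyFrame);
        }

        public void processStimulus(Enumeration criteria) {
            // Decide here what really needs doing this frame - the application
            // usually knows this far more cheaply than the Java3D scheduler does.
            // e.g. updateCamera(); updateAnimations();   (hypothetical helpers)
            wakeupOn(everyFrame);                    // re-arm for the next frame
        }
    }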

 

 

 

 


Slide 16 : 16 / 20 : 13. Minimise your reliance on collision detection, or do your own

Java3D Specific Hints

13. Minimise your reliance on collision detection, or do your own

You're walking along quietly minding your own business, head stuck in a paper, oblivious to the world around you when smack! you walk into a lamp post. Collision detection - it can get you in the physical world, why not the virtual world too!

The most common form of physical modelling done in a virtual environment

Java3D provides a mechanism to do it automatically for you

Like all forms of physical modelling, collision detection is expensive to perform and the more objects in your scene the more complex it becomes

Java3D's collision detection is not the best and is slow too

Okay for detecting basic two-object collisions in very simple scenes

If you plan to use it, reduce the number of objects that can collide to the barest minimum

If collision detection is a big part of your application, build your own.

A good source of information is actually the computer games industry

Collision detection is perhaps the most common form of physical modelling done in a virtual environment. Because it is such an important part of physical interaction Java3D provides a mechanism to do it automatically for you. But, like all forms of physical modelling, collision detection is expensive to perform and the more objects in your scene the more complex it becomes. Java3D's collision detection system has some fairly nasty limitations and even the odd outright bug. It's slow too. It may be okay for detecting basic two-object collisions in very simple scenes but don't rely on it for anything even remotely complex. If you plan to use it, reduce the number of objects that can collide to the barest minimum.

If collision detection is a big part of your application you will probably need to do your own collision detection. This is not a trivial task, but fortunately collision detection is a well studied problem. A good source of information is actually the computer games industry. Modern games have very demanding requirements for physical modelling. Collisions are the basis of all the interaction in games such as Quake, Unreal and Half-life. Grab a good game programming text, or hit any of the online resources (http://www.gamasutra.com is a ripper) to learn more. But be warned, writing a general purpose collision detector is not for the faint of heart!
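
If you do roll your own, the simplest useful test is a bounding-sphere overlap check. A sketch, assuming each object carries a centre and radius in the same (world) coordinate frame:

    import javax.vecmath.Point3d;

    public class SphereCollider {
        /** True if two bounding spheres overlap. Squared distances avoid the sqrt. */
        public static boolean collides(Point3d centreA, double radiusA,
                                       Point3d centreB, double radiusB) {
            double dx = centreA.x - centreB.x;
            double dy = centreA.y - centreB.y;
            double dz = centreA.z - centreB.z;
            double distSq = dx * dx + dy * dy + dz * dz;
            double reach = radiusA + radiusB;
            return distSq <= reach * reach;
        }
    }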

 

 


Slide 17 : 17 / 20 : 14. Don't burn time in system callbacks / don't try to run everything at full frame-rate

Java3D Specific Hints

14. Don't burn time in system callbacks / don't try to run everything at full frame-rate

Don't burn large amounts of time in your event listening methods or in a behaviour's processStimulus method

If you do start burning serious amounts of time in a callback the AWT event queue will start to fill or Java3D will start to fall behind in its processing.

This effect can snowball

This tip is as relevant to general Java programming as it is to Java3D: don't burn large amounts of time in your event listening methods or in a behaviour's processStimulus method. The thread that calls your listener method is not yours to do with as you see fit: it's an AWT or Java3D thread that has important work to do elsewhere. If you do start burning serious amounts of time in a callback the AWT event queue will start to fill or Java3D will start to fall behind in its processing. This effect can snowball. For example, if you take too long to handle one event by the time you've finished there may be another three events waiting for you, then another seven, and so on. Like garbage collection (see tip #4) this can lead to inconsistent frame rates.

Start being lazy : do not try to do everything at full frame-rate

The solution is to start being lazy. That's lazy in the sense of lazy evaluation, not lazy as in "wearing your socks inside-out instead of washing them". Different kind of lazy.

The basic idea is not to try to do everything at full frame-rate. One simple way to do this is to run different bits of processing code every alternate frame - move objects one frame, check for collisions the next frame. A better solution is to run your complex processing code in a separate, lower priority thread and so decouple the job of rendering from that of updating your application. By making processing code asynchronous to rendering code, you get snappy visuals without crippling what your application can do. Just remember to take care how you synchronise your threads (see tip #3).
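
A rough sketch of the decoupling, assuming a hypothetical doExpensiveUpdate() that does the heavy work: the worker runs at a lower priority, and the callback only hands over a request instead of doing the work itself.

    /** Heavy processing decoupled from the rendering/event threads. Sketch only. */
    public class Worker extends Thread {
        private boolean updateRequested = false;

        public Worker() {
            setPriority(Thread.NORM_PRIORITY - 1);   // below the AWT/Java3D threads
            setDaemon(true);
        }

        /** Cheap: call this from the behaviour or event listener. */
        public synchronized void requestUpdate() {
            updateRequested = true;
            notify();
        }

        public void run() {
            try {
                while (true) {
                    synchronized (this) {
                        while (!updateRequested) wait();
                        updateRequested = false;
                    }
                    doExpensiveUpdate();             // heavy work, no lock held
                }
            } catch (InterruptedException e) {
                // drop out quietly when the application shuts down
            }
        }

        private void doExpensiveUpdate() { /* collisions, AI, physics ... */ }
    }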

 


Slide 18 : 18 / 20 : 15. Don't be afraid to step outside Java3D

Java3D Specific Hints

15. Don't be afraid to step outside Java3D

A conclusion from the three previous tips

The trap that many first-time developers fall into is to do everything with the scene graph

Understand the limits of Java3D

The last three tips have all really been leading us in the same direction. The message is simple: don't be afraid to work outside Java3D. It is great for presenting results and the scene graph does impose a degree of structure on your application, but it doesn't diminish the need for a good design. The trap that many first-time developers fall into is to do everything with the scene graph and then attempt to build up the extra functionality as a bunch of different behaviours. But that's like designing the GUI before you know what your application does!

So the last tip is simply this: understand the limits of Java3D and if it can't do everything you want don't be afraid of building major parts of your application outside its scope. Update the major state of your application in a separate thread, calculate your transformation matrices, perform your own collision detection, and interface with the scene graph through a small number of Behaviour nodes.

 

 


Slide 19 : 19 / 20 : Java 3D™ API Collateral — 1.2.1 Performance Guide

Java 3D™ API Collateral — 1.2.1 Performance Guide

I - Introduction

The Java 3D™ API was designed with high performance 3D graphics as a primary goal. Since this is a new API, many of its performance features are not well known. This document presents the performance features of Java 3D in a number of ways. It describes the specific APIs that were included for performance. It describes which optimizations are currently implemented in Java 3D 1.2.1. And, it describes a number of tips and tricks that application writers can use to improve the performance of their application.

II - Performance in the API

There are a number of things in the API that were included specifically to increase performance. This section examines a few of them.

— Capability bits
Capability bits are the application's way of describing its intentions to the Java 3D implementation.
The implementation examines the capability bits to determine which objects may change at run time. Many optimizations are possible with this feature.

— Compile
There are two compile methods in Java 3D 1.2.1. They are in the BranchGroup and SharedGroup classes. Once an application calls compile(), only those attributes of objects that have their capability bits set may be modified. The implementation may then use this information to "compile" the data into a more efficient rendering format.

— Bounds
Many Java 3D objects require a bounds to be associated with them. These objects include Lights, Behaviors, Fogs, Clips, Backgrounds, BoundingLeafs, Sounds, and Soundscapes. The purpose of these bounds is to limit the spatial scope of the specific object. The implementation may quickly disregard the processing of any objects that are out of the spatial scope of a target object.

— Unordered Rendering
All state required to render a specific object in Java 3D is completely defined by the direct path from the root node to the given leaf. That means that leaf nodes have no effect on other leaf nodes, and therefore may be rendered in any order. There are a few ordering requirements for direct descendants of OrderedGroup nodes or transparent objects. But, most leaf nodes may be reordered to facilitate more efficient rendering.

— Appearance Bundles
A Shape3D node has a reference to a Geometry and an Appearance. An Appearance NodeComponent is simply a collection of other NodeComponent references that describe the rendering characteristics of the geometry. Because the Appearance is nothing but a collection of references, it is much simpler and more efficient for the implementation to check for rendering characteristic changes when rendering. This allows the implementation to minimize state changes in the low level rendering API.

III - Current Optimizations in Java 3D 1.2.1

This section describes a number of optimizations that are currently implemented in Java 3D 1.2.1. Other optimizations will be implemented as the API matures. The purpose of this section is to help application programmers focus their optimizations on things that will complement the current optimizations in Java 3D.

— Hardware
Java 3D uses OpenGL and Direct3D as its low level rendering APIs. It relies on the underlying OpenGL and Direct3D drivers for its low level rendering acceleration. Using a graphics display adapter that offers OpenGL or Direct3D acceleration is the best way to increase overall rendering performance in Java 3D.

— Compile
In the Java 3D 1.2 release, no compile optimizations were implemented. The following compile optimizations are implemented in the Java 3D 1.2.1 release:

— State Sorted Rendering
Since Java 3D allows for unordered rendering for most leaf nodes, the implementation sorts all objects to be rendered on a number of rendering characteristics. The characteristics that are sorted on are, in order, Lights, Texture, Geometry Type, Material, and finally localToVworld transform. The only exception to this is any child of an OrderedGroup node. There is no state sorting for those objects.

— View Frustum Culling
Java 3D implements view frustum culling. The view frustum cull is done when an object is processed for a specific Canvas3D. This cuts down on the number of objects that need to be processed by the low level graphics API.

— Multithreading
The Java 3D API was designed with multithreaded environments in mind. The current implementation is a fully multithreaded system. At any point in time, there may be parallel threads running performing various tasks such as visibility detection, rendering, behavior scheduling, sound scheduling, input processing, collision detection, and others. Java 3D is careful to limit the number of threads that can run in parallel based on the number of CPUs available.

IV - Tips and Tricks   <<=====

This section presents a number of tips and tricks for an application programmer to try when optimizing their application. These tips focus on improving rendering frame rates, but some may also help overall application performance. A number of these optimizations will eventually be handled directly by the Java 3D implementation.

— Move Object vs. Move ViewPlatform
If the application simply needs to transform the entire scene, transform the ViewPlatform instead. This changes the problem from transforming every object in the scene into only transforming the ViewPlatform.
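
As a sketch of moving the viewer instead, assuming the application was built with the SimpleUniverse utility (the wrapper class and method name here are made up for illustration):

    import javax.media.j3d.Transform3D;
    import javax.media.j3d.TransformGroup;
    import javax.vecmath.Vector3d;
    import com.sun.j3d.utils.universe.SimpleUniverse;

    public class MoveViewer {
        /** Translate the ViewPlatform rather than re-transforming every scene object. */
        public static void stepBack(SimpleUniverse universe, double distance) {
            TransformGroup viewTG =
                universe.getViewingPlatform().getViewPlatformTransform();
            Transform3D t = new Transform3D();
            viewTG.getTransform(t);
            Vector3d translation = new Vector3d();
            t.get(translation);                  // current translation
            translation.z += distance;           // move the viewer back along +Z
            t.setTranslation(translation);
            viewTG.setTransform(t);
        }
    }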

— Capability bits
Only set them when needed. Many optimizations can be done when they are not set. So, plan out application requirements and only set the capability bits that are needed.

— Bounds and Activation Radius
Consider the spatial extent of various leaf nodes in the scene and assign bounds accordingly. This allows the implementation to prune processing on objects that are not in close proximity. Note, this does not apply to geometric bounds. Automatic bounds calculation for geometric objects is fine.

— Change Number of Shape3D Nodes
In the current implementation there is a certain amount of fixed overhead associated with the use of the Shape3D node. In general, the fewer Shape3D nodes that an application uses, the better. However, combining Shape3D nodes without factoring in the spatial locality of the nodes to be combined can adversely affect performance by effectively disabling view frustum culling. An application programmer will need to experiment to find the right balance of combining Shape3D nodes while leveraging view frustum culling. The compile() optimization that combines shape nodes will do this automatically, when possible.

— Geometry Type and Format
Most rendering hardware reaches peak performance when rendering long triangle strips. Unfortunately, most geometry data stored in files is organized as independent triangles or small triangle fans (polygons). The Java 3D utility package includes a stripifier utility that will try to convert a given geometry type into long triangle strips. Application programmers should experiment with the stripifier to see if it helps with their specific data. If not, any stripification that the application can do will help. Another option is that most rendering hardware can process a long list of independent triangles faster than a long list of single-triangle triangle fans. The stripifier in the Java 3D utility package will be continually updated to provide better stripification.

— Sharing Appearance/Texture/Material NodeComponents
To assist the implementation in efficient state sorting, and allow more shape nodes to be combined during compilation, applications can help by sharing Appearance/Texture/Material NodeComponent objects when possible.

— Geometry by reference
Using geometry by reference reduces the memory needed to store a scene graph, since Java 3D avoids creating a copy in some cases. However, using this feature prevents Java 3D from creating display lists (unless the scene graph is compiled), so rendering performance can suffer in some cases. It is appropriate if memory is a concern or if the geometry is writable and may change frequently. The interleaved format will perform better than the non-interleaved formats, and should be used where possible. In by-reference mode, an application should use arrays of native data types; referring to TupleXX[] arrays should be avoided.

— Texture by reference and Y-up
Using texture by reference and Y-up format may reduce the memory needed to store a texture object, since Java 3D avoids creating a copy in some cases. Currently, Java3D will not make a copy of the texture image for the following combinations of BufferedImage format and ImageComponent format (byReference and Yup should both be set to true):

On both Solaris and Win32 OpenGL:

    BufferedImage format                        ImageComponent format
    -------------------------------------       ----------------------------------------------------------
    BufferedImage.TYPE_CUSTOM (3BYTE_RGB)       ImageComponent.FORMAT_RGB8 or ImageComponent.FORMAT_RGB
    BufferedImage.TYPE_CUSTOM (4BYTE_RGBA)      ImageComponent.FORMAT_RGBA8 or ImageComponent.FORMAT_RGBA
    BufferedImage.TYPE_BYTE_GRAY                ImageComponent.FORMAT_CHANNEL8

On Win32/OpenGL:

    BufferedImage format                        ImageComponent format
    -------------------------------------       ----------------------------------------------------------
    BufferedImage.TYPE_3BYTE_BGR                ImageComponent.FORMAT_RGB8 or ImageComponent.FORMAT_RGB

On Solaris/OpenGL:

    BufferedImage format                        ImageComponent format
    -------------------------------------       ----------------------------------------------------------
    BufferedImage.TYPE_4BYTE_ABGR               ImageComponent.FORMAT_RGBA8 or ImageComponent.FORMAT_RGBA

— Application Threads
The built-in thread support in the Java language is very powerful, but can be deadly to performance if it is not controlled. Applications need to be very careful in their thread usage. There are a few things to be careful of when using Java threads. First, try to use them in a demand driven fashion. Only let the thread run when it has a task to do. Free running threads can take a lot of CPU cycles from the rest of the threads in the system - including Java 3D threads. Next, be sure the priorities of the threads are appropriate.

Most Java Virtual Machines will enforce priorities aggressively. Too low a priority will starve the thread and too high a priority will starve the rest of the system. If in doubt, use the default thread priority. Finally, see if the application thread really needs to be a thread. Would the task that the thread performs be all right if it only ran once per frame? If so, consider changing the task to a Behavior that wakes up each frame.

— Java 3D Threads
Java 3D uses many threads in its implementation, so it also needs to implement the precautions listed above. In almost all cases, Java 3D manages its threads efficiently. They are demand driven with default priorities. There are a few cases that don't follow these guidelines completely.

— Behaviors
One of these cases is the Behavior scheduler when there are pending WakeupOnElapsedTime criteria. In this case, it needs to wake up when the minimum WakeupOnElapsedTime criterion is about to expire. So, application use of WakeupOnElapsedTime can cause the Behavior scheduler to run more often than might be necessary.

— Sounds
The final special case for Java 3D threads is the Sound subsystem. Due to some limitations in the current sound rendering engine, enabling sounds causes the sound engine to potentially run at a higher priority than other threads. This may adversely affect performance.

— Threads in General
There is one last comment to make on threads in general. Since Java 3D is a fully multithreaded system, applications may see significant performance improvements by increasing the number of CPUs in the system. For an application that does strictly animation, two CPUs should be sufficient. As more features are added to the application (Sound, Collision, etc.), more CPUs could be utilized. Note: When running in the Solaris environment, be sure that native threads are enabled. Green threads will not take advantage of multiple CPUs.

— Switch Nodes for Occlusion Culling
If the application is a first person point of view application, and the environment is well known, Switch nodes may be used to implement simple occlusion culling. The children of the switch node that are not currently visible may be turned off. If the application has this kind of knowledge, this can be a very useful technique.

— Switch Nodes for Animation
Most animation is accomplished by changing the transformations that affect an object. If the animation is fairly simple and repeatable, the flip-book trick can be used to display the animation. Simply put all the animation frames under one switch node and use a SwitchValueInterpolator on the switch node. This increases memory consumption in favor of smooth animations.

— Switch nodes under Writable Transforms
Switch nodes that are descendants of writable TransformGroup nodes can incur extra cost associated with updating the vworld bounds and localToVworld transforms of all children (not just those that are switched on). This is one more reason why it is better to rotate the viewer than the entire scene graph (see "Move Object vs. Move ViewPlatform").

— Link/SharedGroup versus cloneTree
Using multiple Link nodes pointing to a shared subgraph (SharedGroup) can have a performance penalty over a shallow clone of the scene graph. To create a shallow clone of the scene graph, use cloneTree without duplicating the node components. Restrict the use of Link/SharedGroup to those cases where you really need the kind of sharing that it provides.

— OrderedGroup Nodes
OrderedGroup and its subclasses are not as high performing as the unordered group nodes. They disable any state sorting optimizations that are possible. If the application can find alternative solutions, performance will improve.

— LOD Behaviors
For complex scenes, using LOD Behaviors can improve performance by reducing the geometry needed to render objects that don't need a high level of detail. This is another option that increases memory consumption for faster render rates.

— Picking
If the application doesn't need the accuracy of geometry based picking, use bounds based picking. For more accurate picking and better picking performance, use PickRay instead of PickCone/PickCylinder unless you need to pick lines or points. PickCanvas with a tolerance of 0 will use PickRay for picking.
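
A sketch of that advice using the PickCanvas utility from com.sun.j3d.utils.picking (the surrounding Picker wrapper class is made up for illustration):

    import java.awt.event.MouseEvent;
    import javax.media.j3d.BranchGroup;
    import javax.media.j3d.Canvas3D;
    import com.sun.j3d.utils.picking.PickCanvas;
    import com.sun.j3d.utils.picking.PickResult;
    import com.sun.j3d.utils.picking.PickTool;

    public class Picker {
        private final PickCanvas pickCanvas;

        public Picker(Canvas3D canvas, BranchGroup scene) {
            pickCanvas = new PickCanvas(canvas, scene);
            pickCanvas.setMode(PickTool.BOUNDS);   // bounds picking if accuracy allows
            pickCanvas.setTolerance(0.0f);         // tolerance 0 means a PickRay is used
        }

        /** Returns the closest picked object, or null if nothing was hit. */
        public PickResult pick(MouseEvent e) {
            pickCanvas.setShapeLocation(e);
            return pickCanvas.pickClosest();
        }
    }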

 


Slide 20 : 20 / 20 : Conclusions, Resources and Further Reading

Conclusions

Java doesn't compile down to assembly language, but that doesn't mean it is a slow pig.

Modern JVMs : just-in-time compilers and other sophisticated techniques

Much richer set of abstractions than C++  : it is possible to write inefficient code if you're careless

Algorithmic improvements often yield the biggest gains and they are not specific to any one language or API

Java doesn't compile down to assembly language, but that doesn't mean it is a slow pig. Modern JVMs use just-in-time compilers and other sophisticated techniques to bring the performance up to that of C++. But because Java also supports a much richer set of abstractions than C++ it is possible to write inefficient code if you're careless. The secret is to design your application for performance right from the start, and to profile and optimise it once it's working correctly.

This tutorial covered 15 ways to improve the efficiency of your code, but it is hardly the last word on optimisation. Look online; there are numerous sources of further information available. Glen McCluskey recently produced an excellent paper on techniques to improve Java performance. Sun also have a great Java3D Performance Guide with lots of useful suggestions and advice. Over the years JavaWorld has run plenty of articles on the same subject. There are also some great interest groups that can help you if you get stuck. Algorithmic improvements often yield the biggest gains and they are not specific to any one language or API - hit the proceedings of SIGGRAPH and check out what the game development community have done.

So what are you waiting for? Time to move up a gear. Soup up that application of yours: drop the suspension, fit a growly exhaust, a rear spoiler and some fat tires. Try life in the fast lane.

Java 3D™ API Collateral — 1.2.1 Performance Guide

Resources and Further Reading

The first two are absolutely essential reading:

1. Glen McCluskey's "Thirty Ways to Improve the Performance of Your Java Programs" (PDF)

2. Sun's Java 3D API Performance Guide

3. Nate Sammons Optimisation Hints

4. Jonathan Hardwick's Optimisation Page

5. JavaWorld Article on Performance Issues

6. Gamasutra - The Game Developers Site

Footnotes

[1] Actually the answer can be worse than that. Different Java compilers treat strings in different ways but you would be fortunate indeed to get away with less than 12 objects. You could be stung for as many as 18 objects by some compilers.

[2] Take a look at one of the Matrix classes and you'll see things are worse than they first seem. Each matrix is stored in an array of primitive types. For example Matrix3f is stored in a float[9] array. So when you create a matrix you're actually creating two objects: the Matrix object and the array. Ow!

[3] On all but the most recent releases of Java3D (1.2.1) compiling branches doesn't actually do anything!