In this presentation, you’ll learn preferred architecture patterns and practical steps for building your apps to be fast and responsive. Michael Samarin, Paul Houghton, and Timo Saarinen of Futurice show the results of various Series 40 code fragments and reveal where to spend your time when making improvements. They show you the differences that best-practice micro-optimisations make, where these optimisations are useful, and where you should let the tools and processor optimise for you.
Designing and coding Series 40 Java apps for high performance
1. Series 40 Developer Training
Designing and coding Series 40 Java apps for high performance
Michael Samarin, Paul Houghton, Timo Saarinen
Futurice Ltd
2. Today’s Topics
» Performance Basics on Series 40
» Mobile Front End Architecture Patterns
» Choosing GUI, Caching, Threading
» Low Level “Micro” Performance Optimization
3. Miller, R. B. (1968).
Response time in man-computer conversational
transactions.
Proc. AFIPS Fall Joint Computer Conference, Vol. 33, 267-277
» 0.1 second is about the limit for having the
user feel that the system is reacting
instantaneously, meaning that no special
feedback is necessary except to display the
result.
4. » 1.0 second is about the limit for the user's flow of
thought to stay uninterrupted, even though the user will
notice the delay. Normally, no special feedback is
necessary during delays of more than 0.1 but less than
1.0 second, but the user does lose the feeling of
operating directly on the data.
» 10 seconds is about the limit for keeping the user's
attention focused on the dialogue. For longer delays,
users will want to perform other tasks while waiting for
the computer to finish, so they should be given feedback
indicating when the computer expects to be done.
Feedback during the delay is especially important if the
response time is likely to be highly variable, since users
will then not know what to expect.
6. LCDUI Forms
» A fast, simple and standard way of making a UI.
» On full touch Asha devices, Forms look very attractive and
bring huge UX improvements.
» Not as fast as Canvas. Animation on a Form is much
slower than on a Canvas, and there is no way to
influence the vertical scroll position, animate transitions
between screens, or shift to a full screen view. You can,
however, slightly increase the performance when
changing screens by using just one Form and
repopulating it with new Items.
» http://www.developer.nokia.com/Resources/Library/Full_Touch
7. Canvas
» Highly customizable way of making UI.
» You have to take care of render timing yourself, or you can use
Nokia’s FrameAnimator class to quickly create effects such as
kinetic scrolling.
» Any part of your code can call Canvas.repaint() to signal that
painting should occur soon.
» The most important performance tip for navigating through a
Canvas-based UI is to implement your own View class to
represent each screen, and paint all Views on one Canvas rather
than switching from one Canvas to another, which can be slow
and does not give you the possibility of animating the transition
for smooth effect.
8. GameCanvas
» GameCanvas is double buffered with more control over the
painting cycle and threading.
» Unlike Canvas, you should create your own Thread, which calls
GameCanvas.paint() directly to fill the graphics buffer, and then
GameCanvas.flushGraphics() to instantly blit the graphics buffer
onto the screen.
9. LWUIT
» LWUIT (Lightweight User Interface Toolkit) is a toolkit for
creating Swing-like applications without some of the complexity
of Swing.
» Like Form, it offers basic components, but it adds better
layouts, styles and theming, bundling of your own fonts into your
application, and animated screen transitions.
» LWUIT is implemented on top of a Canvas, but it is a large and
complex library written to be a general purpose replacement for
the default UI on many different phones.
» LWUIT and the associated themes and any fonts you include
quickly make your JAR file grow quite large.
» http://projects.developer.nokia.com/LWUIT_for_Series_40
10. Heap Memory
» On Series 40, the heap is only 2 to 4 MB.
» Instances of classes (objects) and primitive types
are created in the heap.
» The total number of methods in classes loaded by the JVM
has a direct impact on how much heap space is left
for other data. These memory allocations are
permanent for the runtime of the application and
are not dynamically unloaded by the JVM once a
class is no longer in use.
11. Recursive Algorithms and Stack Memory
» Variables passed as arguments to a method are passed on
the current thread’s stack. Method variables of primitive
types are also allocated on the stack.
» Recursive algorithms are algorithms where a method calls
itself in a loop to complete a task. As a result, they create
multiple stack frames.
› They use a lot of stack memory. The same method is called repeatedly, and only as the
recursion completes does it unwind the queued stack frames. This extra stack
memory is often not useful. Stack memory per thread is limited, so such heavy
stack use may well cause an OutOfMemoryError well before you are actually out of
heap memory.
› Recursive algorithms can be slow. Each method call includes a certain amount of
overhead, which is not really necessary since a recursive algorithm can be unwound into
a non-recursive equivalent loop that does not include the relatively heavy method call.
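As a minimal sketch of the unwinding described above, the following compares a recursive method with its loop equivalent. The class and method names (SumExample, sumRecursive, sumIterative) are hypothetical and chosen only for illustration.

```java
// Hypothetical example: summing 1..n recursively and as a loop.
class SumExample {
    // Recursive form: one stack frame per call, so stack use grows with n.
    static int sumRecursive(final int n) {
        if (n <= 0) {
            return 0;
        }
        return n + sumRecursive(n - 1);
    }

    // Loop form: constant stack use and no method call per step.
    static int sumIterative(final int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(100)); // prints 5050
        System.out.println(sumIterative(100)); // prints 5050
    }
}
```

Both methods return the same result; only the loop version keeps stack depth independent of n.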
12. Compile Time Optimization and Obfuscation
› Provides basic “free” optimization
› Fixes code redundancy and pre-calculates things whenever possible
› Minimizes memory usage
› Should be the last step in building apps: it takes time and makes debugging difficult
› Doesn’t fix wrong architecture
15. Obfuscation Example: Battle Tank
https://projects.developer.nokia.com/JMEBattleTank
› JAR File size decreased by 4%
(889 -> 852 kB)
› RAM usage decreased by 14%
(161 -> 138 kB)
16. Architecture changes
» Carefully consider architecture of your drawing loop and
input loops and decouple them whenever possible.
» Example: panorama drawing and sensor driving loop.
» Original example:
» http://www.youtube.com/watch?v=PfW4BVHgri8
» After optimization:
» http://www.youtube.com/watch?v=xSRYVYrNNMI
17. WeakReference object Caching
» Best pattern for using all available heap memory, but
never running into the dreaded OutOfMemoryError.
» CLDC 1.1 WeakReference
» When an object is referenced only by a WeakReference, and not
by traditional Object pointers, this is a signal to the
garbage collector that it has permission to collect the
object if memory is running low.
» You have to maintain your own Hashtable of Objects
» To understand this pattern better look at Tantalum 3:
http://projects.developer.nokia.com/Tantalum
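18.
import java.lang.ref.WeakReference;
import java.util.Hashtable;

// Cache whose values are held only through WeakReferences, so the
// garbage collector may reclaim them when memory runs low.
public class WeakHashCache {
    protected final Hashtable hash = new Hashtable();

    // Returns the cached value, or null if it was never stored
    // or has already been garbage collected.
    public Object get(final Object key) {
        if (key == null) {
            return null; // Hashtable.get(null) would throw NullPointerException
        }
        final WeakReference reference = (WeakReference) hash.get(key);
        if (reference != null) {
            return reference.get();
        }
        return null;
    }

    // Stores value under key. A null value removes the entry.
    public void put(final Object key, final Object value) {
        synchronized (hash) {
            if (key == null) {
                return;
            }
            if (value == null) {
                hash.remove(key);
                return;
            }
            hash.put(key, new WeakReference(value));
        }
    }

    public void remove(final Object key) {
        if (key != null) {
            hash.remove(key);
        }
    }

    // True if the key is present, even if its value has since been collected.
    public boolean containsKey(final Object key) {
        if (key != null) {
            return hash.containsKey(key);
        }
        return false;
    }

    public int size() {
        return hash.size();
    }

    public void clear() {
        hash.clear();
    }
}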
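A cache of this kind is typically used with a "get, and reload on miss" pattern, because a weakly held value can disappear at any time. The sketch below is self-contained and uses its own small Hashtable rather than the class above. The names CacheUsageSketch, getOrLoad and loadExpensive are hypothetical, and loadExpensive merely stands in for a slow operation such as decoding an image.

```java
import java.lang.ref.WeakReference;
import java.util.Hashtable;

// Hypothetical usage sketch: look up a weakly cached value, reload on a miss.
class CacheUsageSketch {
    private static final Hashtable cache = new Hashtable();
    static int loads = 0; // counts how often the slow path ran

    // Stand-in for an expensive operation (assumption, not a real API).
    static Object loadExpensive(final String key) {
        loads++;
        return new StringBuffer("data for ").append(key).toString();
    }

    static Object getOrLoad(final String key) {
        final WeakReference ref = (WeakReference) cache.get(key);
        Object value = (ref != null) ? ref.get() : null;
        if (value == null) {
            // Never cached, or collected since: rebuild and cache again.
            value = loadExpensive(key);
            cache.put(key, new WeakReference(value));
        }
        return value;
    }
}
```

As long as the caller keeps a normal reference to the returned object, repeated lookups return the same instance without reloading.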
19. Render Caching
» One of the common performance needs is to make your
application paint, in particular scroll, smoothly and quickly.
» You can paint items each into their own Image, keeping
that pre-painted Image in a cache, and reusing it as the
object moves around the screen. Essentially, this is a
WeakReference cache of pre-painted Images.
» This can achieve a dramatic FPS increase, such as
from 3 to 12 FPS on Asha 305 in this example:
» http://www.youtube.com/watch?v=Z2QcnhROFGc
» To understand this pattern better look at Tantalum 3:
http://projects.developer.nokia.com/Tantalum
20. File System (Flash Memory) Caching
» Flash memory is slow, but faster than the Web.
» Cache downloaded data from the previous session. Improve
the startup time of the app by loading from the disk cache instead of
making new Web requests.
» RMS and the File System (JSR-75) run at the same speed, but RMS
shows no security prompts.
» This can achieve a dramatic startup time decrease, such as
from 10 to 2 seconds on Asha 305 in this example:
» http://www.youtube.com/watch?v=Cn96lET4moU
21. File System (Flash Memory) Caching
» Pitfall: remember that flash memory is still slow.
» Architect your application to use asynchronous loading
and saving of data from and to the disk cache.
» In the Battle Tank example, it was possible to save 28 ms in
each rendered frame by removing synchronous accesses
to flash memory from the loop.
» To understand this pattern better look at Tantalum 3:
http://projects.developer.nokia.com/Tantalum
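One way to keep slow flash writes out of a render loop is to queue them for a worker thread. The following is only a sketch under stated assumptions: AsyncSaveSketch, requestSave and slowWrite are hypothetical names, and slowWrite stores into a Vector where a real app would write to RMS or a file.

```java
import java.util.Vector;

// Hypothetical sketch: the render loop enqueues data, a worker thread writes it.
class AsyncSaveSketch implements Runnable {
    private final Vector pending = new Vector(); // queue of byte[] snapshots
    private final Vector written = new Vector(); // stand-in for flash storage
    private boolean stopped = false;

    // Called from the render loop: returns immediately, no flash access.
    public void requestSave(final byte[] data) {
        synchronized (pending) {
            pending.addElement(data);
            pending.notify();
        }
    }

    // Asks the worker to finish once the queue has been drained.
    public void stop() {
        synchronized (pending) {
            stopped = true;
            pending.notify();
        }
    }

    // Worker thread body: waits for data, then performs the slow write.
    public void run() {
        while (true) {
            final byte[] data;
            synchronized (pending) {
                while (pending.isEmpty() && !stopped) {
                    try {
                        pending.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (pending.isEmpty()) {
                    return; // stopped and nothing left to write
                }
                data = (byte[]) pending.elementAt(0);
                pending.removeElementAt(0);
            }
            slowWrite(data); // outside the lock, so requestSave() never waits on flash
        }
    }

    // Assumption: replace with a real RMS or JSR-75 write in an app.
    private void slowWrite(final byte[] data) {
        written.addElement(data);
    }

    public int writtenCount() {
        return written.size();
    }
}
```

The worker is started once with new Thread(saver).start(), after which the drawing code only ever calls requestSave().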
22. Hash Acceleration
» Some iterative algorithms are slow. Proper use of
collection data structures can increase
performance.
» Vector.contains() scans every element and is very slow on large
collections, but Hashtable.containsKey() is a hash lookup and
very fast. Reconsider your algorithms to use Hashtables.
» Uses can be found in very surprising places. For example,
Font.stringWidth() is slow, but necessary for drawing
multiline text on Canvas. Creating a Hashtable with the
width of each character you have used in the Font can
transform this into a fast operation and increase
Canvas.paint() speed.
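The difference between the two lookups can be sketched with a simple duplicate filter. The class UniqueIds and its method names are hypothetical. Both methods give the same answer, but the Vector version rescans the collection for every element.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical sketch: counting distinct strings two ways.
class UniqueIds {
    // Slow: each contains() call scans the Vector from the start.
    static int countUniqueWithVector(final String[] ids) {
        final Vector seen = new Vector();
        for (int i = 0; i < ids.length; i++) {
            if (!seen.contains(ids[i])) {
                seen.addElement(ids[i]);
            }
        }
        return seen.size();
    }

    // Fast: containsKey() hashes the key instead of scanning.
    static int countUniqueWithHashtable(final String[] ids) {
        final Hashtable seen = new Hashtable();
        for (int i = 0; i < ids.length; i++) {
            if (!seen.containsKey(ids[i])) {
                seen.put(ids[i], Boolean.TRUE); // the value is only a marker
            }
        }
        return seen.size();
    }
}
```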
23. Synchronized vs. Volatile Variables
» Both apply when a variable or Object needs to be accessed from more
than one Thread.
» Marking a variable as volatile is the least restrictive
approach and can have very high performance because no
Thread is blocked.
» Synchronized is more restrictive: only one Thread may enter
the sections synchronized on the same lock at any one time.
» Consider atomic operations on two variables. For example, when
updating firstName and lastName from “John Smith” to “Jane
Marceau”, do so within a synchronized block to avoid briefly exposing
the transitional state “Jane Smith” to other threads.
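The name example above can be sketched as follows. NameHolder and its members are hypothetical names. The two-field update is synchronized, while a single independent flag is merely volatile.

```java
// Hypothetical sketch: synchronized for a two-field update, volatile for one flag.
class NameHolder {
    private String firstName = "John";
    private String lastName = "Smith";

    // A single value written by one thread and read by others can be volatile.
    private volatile boolean changed = false;

    // Both fields change under one lock, so a reader using getFullName()
    // never observes the transitional state "Jane Smith".
    public synchronized void setName(final String first, final String last) {
        firstName = first;
        lastName = last;
        changed = true;
    }

    // Readers take the same lock so they see both fields consistently.
    public synchronized String getFullName() {
        return firstName + " " + lastName;
    }

    public boolean hasChanged() {
        return changed;
    }
}
```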
24. Constants
» Declaring constants as static final gives the compiler and Proguard more
opportunities to optimize the code at the compile step, and this will also
give the ARM processor opportunities for handling these
values with more efficient byte codes.
private static int loopCount = 10;
private static long startTime = System.currentTimeMillis();
private static boolean enableImages = true;
Should be
private static final int LOOP_COUNT = 10;
private static final long START_TIME = System.currentTimeMillis();
private static final boolean ENABLE_IMAGES = true;
25. Primitives
» Use int instead of short, byte or long.
for (int i = 0; i < 3000000; i++) {
    short/int/long a = 123;
    short/int/long b = -44;
    short/int/long c = 12;
    a += c;
    b += a;
    c *= b;
}
Average times spent in loops on Nokia Asha 305 (obfuscated):
int: 710 (580) ms
short: 900 (850) ms, 50% slower
long: 1450 (1150) ms, 100% slower
26. Final in methods
for (int i = 0; i < 1000000; i++) {
    a = finalMethod(1, 2, 3);
}
for (int i = 0; i < 1000000; i++) {
    a = nonFinalMethod(1, 2, 3);
}
public final int finalMethod(final int a, final int b, final int c) {
    final float x = 1.23f, y = 0.05f;
    final float z = x * y;
    final int d = a + b + c;
    return d;
}
public int nonFinalMethod(int a, int b, int c) {
    float x = 1.23f, y = 0.05f;
    float z = x * y;
    int d = a + b + c;
    return d;
}
27. Final in methods
Average times on a Nokia Asha 305:
finalMethod: 650 ms
nonFinalMethod: 940 ms, 45% slower
In this case, the time difference comes from the final keyword before
x and y. This is logical because the value of z can then be precalculated.
The final keywords on parameters a, b and c do not let us precalculate
d or anything else. And because we don’t use z, its being final does not
help us.
28. Static
» Generally static methods and variables should be faster.
Oddly, with some combinations of ARM and JVM, instance
accesses are slightly faster.
for (int i = 0; i < 1000000; i++) {
    staticMethod();
}
for (int i = 0; i < 1000000; i++) {
    nonStaticMethod();
}
private static void staticMethod() {
    b++; // static variable
}
private void nonStaticMethod() {
    a++; // instance variable
}
Average times spent in loops on Nokia Asha 305 (obfuscated):
nonStaticMethod: 570 ms
staticMethod: 680 ms, 20% slower
29. String Concatenation
If you are going to concatenate a large number of small Strings,
use:
StringBuffer.append()
instead of the
String +=
operator. The += operator is much slower because every time you
concatenate one String to another with it, a new
StringBuffer is created under the hood. Depending on the number
of concatenations, a single explicit StringBuffer can be many times
faster than multiple implicit StringBuffers created by String
addition.
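A minimal sketch of the two approaches follows. JoinSketch and its method names are hypothetical. Both methods produce the same String, but the second reuses one buffer for all appends.

```java
// Hypothetical sketch: joining many small Strings two ways.
class JoinSketch {
    // Slow: every += allocates a temporary buffer and a new String.
    static String joinWithPlus(final String[] parts) {
        String result = "";
        for (int i = 0; i < parts.length; i++) {
            result += parts[i];
        }
        return result;
    }

    // Fast: one explicit StringBuffer collects all parts.
    static String joinWithBuffer(final String[] parts) {
        final StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }
}
```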
30. Addition vs. Multiplication vs. Division
for (int i = 0; i < 500000; i++) {
    a = 1.23f;
    b = 1.45f;
    c = 0.004523f;
    c += a;
    a = b + c;
}
for (int i = 0; i < 500000; i++) {
    a = 1.23f;
    b = 1.45f;
    c = 0.004523f;
    c *= a;
    a = b * c;
}
for (int i = 0; i < 500000; i++) {
    a = 1.23f;
    b = 1.45f;
    c = 0.004523f;
    c /= a;
    a = b / c;
}
Average times spent in loops on Nokia Asha 305:
Multiplication: 330 ms
Addition: 360 ms, 9% slower
Division: 560 ms, 70% slower
31. Switch vs. If
The switch statement in C is implemented as a direct jump which is
extremely fast. In Java on Nokia Series 40 phones, switches are
implemented at the bytecode level as a series of if statements.
Therefore, in many cases a switch statement is less efficient than a
manually created series of if..else statements in which the most
frequently occurring case is tested first. If
you prefer to use switch statements for code clarity, then arrange
them so that the most frequent cases appear first.
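The ordering advice can be sketched as an if..else chain that tests the most frequent case first. The constants and the class ActionDispatchSketch are hypothetical, and the assumption that ACTION_MOVE is the most common action is for illustration only.

```java
// Hypothetical sketch: if..else chain ordered by expected frequency.
class ActionDispatchSketch {
    static final int ACTION_MOVE = 0;  // assumed most frequent
    static final int ACTION_FIRE = 1;  // assumed less frequent
    static final int ACTION_PAUSE = 2; // assumed rare

    static String describe(final int action) {
        if (action == ACTION_MOVE) {
            return "move";   // common case exits after one comparison
        } else if (action == ACTION_FIRE) {
            return "fire";
        } else if (action == ACTION_PAUSE) {
            return "pause";  // rare case pays for all comparisons
        }
        return "unknown";
    }
}
```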
32. Hidden References
» All non-static inner classes, including anonymous ones, contain a
hidden reference to the enclosing instance of the parent class.
Even if your code does not take advantage of this, if you
pass an inner class to an execution queue such as the event
dispatch thread (EDT), the parent instance cannot be garbage
collected until the inner class instance has been executed
and can be garbage collected.
// Inside MyCanvas:
midlet.getDisplay().callSerially(new Runnable() {
    public void run() {
        System.out.println("Canvas width: " +
                MyCanvas.this.getWidth());
    }
});
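One way to avoid the hidden reference is a static nested class that copies only the values it needs. The sketch below uses hypothetical names (ScreenSketch, WidthTask) and a plain int field in place of a real Canvas width.

```java
// Hypothetical sketch: anonymous inner class versus static nested class.
class ScreenSketch {
    private final int width;

    ScreenSketch(final int width) {
        this.width = width;
    }

    // Anonymous inner class: keeps ScreenSketch.this reachable
    // for as long as the Runnable is queued.
    Runnable innerTask() {
        return new Runnable() {
            public void run() {
                System.out.println("Canvas width: " + width);
            }
        };
    }

    // Static nested class: no hidden reference to the enclosing instance.
    static final class WidthTask implements Runnable {
        private final int width;

        WidthTask(final int width) {
            this.width = width;
        }

        String describe() {
            return "Canvas width: " + width;
        }

        public void run() {
            System.out.println(describe());
        }
    }

    // Copies the value out, so the queued task does not pin this object.
    WidthTask detachedTask() {
        return new WidthTask(width);
    }
}
```

A task created with detachedTask() can wait in a queue without preventing the ScreenSketch instance from being collected.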
33. Performance summary
» Compare Algorithms
› Talk to colleagues and pick the best algorithm; having
the best possible algorithm is the most effective way to
optimize performance.
» Simple Architecture
› Keep your architecture simple and to the point without
extra layers of method calls or objects for artificial
abstraction. Mobile front end code does not last
forever, so over-engineering and excessive abstraction into
multiple classes will slow you down compared to simple
use of variables.
34. Performance summary
» Manage Memory with WeakReference
Caching
› Avoid memory problems by always accessing image
data in memory using a WeakReference Cache.
› Create a type of virtual memory by duplicating the
WeakReference cache contents in Flash memory
(Record Management System) so that you can quickly
recover items which are no longer available in RAM.
35. Performance summary
» Use micro-optimizations of the code as a habit
› Know the rules of micro-optimisation for memory
performance, logic and calculations. Include those as
you develop, but trust Proguard to add the finishing
touches.
› Help Proguard by making everything possible final or
static final. Avoid static variables in high performance
loops as they are slower than instance variables.
36. Performance summary
» Profile your app towards the end of project
› Profile your application in an emulator.
› Also test the actual run-time of critical code sections on
the phone using System.currentTimeMillis() to see and
carefully measure the effects of your code changes.
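A simple timing helper in the spirit of the measurements in this deck might look like the sketch below. TimingSketch and timeMillis are hypothetical names, and the measured task is an arbitrary placeholder.

```java
// Hypothetical sketch: timing a critical code section on the device.
class TimingSketch {
    // Runs the task the given number of times and returns elapsed wall-clock ms.
    static long timeMillis(final Runnable task, final int rounds) {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < rounds; i++) {
            task.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        final long elapsed = timeMillis(new Runnable() {
            public void run() {
                // Placeholder workload: replace with the code under test.
                final StringBuffer buffer = new StringBuffer();
                for (int i = 0; i < 100; i++) {
                    buffer.append(i);
                }
            }
        }, 10000);
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

Repeating the task many times smooths out the coarse resolution of System.currentTimeMillis(); compare averages over several runs rather than single readings.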
37. JavaOne 2012, San Francisco
› Extreme Mobile Java Performance Tuning, User Experience, and Architecture Patterns
› Wednesday, Oct 3, 11:30AM
› Hotel Nikko – Monterey I/II
› http://tinyurl.com/95moz2l
› Java for Mobile Devices: New Horizons with Fantastic New Devices
› Monday, Oct 1, 8:30AM
› Hotel Nikko – Monterey I/II
› http://tinyurl.com/8lndb3m