This document discusses best practices for highly scalable Java programming on multi-core systems. It begins by outlining software challenges such as parallelism, memory management, and storage management. It then introduces profiling tools such as the Java Lock Monitor (JLM) and the Multi-core SDK (MSDK) for analyzing parallel applications. The document presents techniques such as reducing lock scope and granularity, lock splitting and striping, splitting hot points, and alternatives to exclusive locks. It also recommends reducing memory allocation and using immutable and thread-local data. The document concludes by discussing lock-free programming and its scalability advantages over locking.
Highly Scalable Java Programming for Multi-Core System
1. Highly Scalable Java Programming
for Multi-Core System
Zhi Gan (ganzhi@gmail.com)
http://ganzhi.blogspot.com
2. Agenda
• Software Challenges
• Profiling Tools Introduction
• Best Practice for Java Programming
• Rocket Science: Lock-Free Programming
3. Software challenges
• Parallelism
– More hardware threads per system = more parallelism
needed to achieve high utilization
– Thread-to-thread affinity (shared code and/or data)
• Memory management
– Sharing of cache and memory bandwidth across more threads =
greater need for memory efficiency
– Thread-to-memory affinity (execute thread closest to associated
data)
• Storage management
– Allocate data across DRAM, Disk & Flash according to access
frequency and patterns
19. Split Hot Points: Scalable Counter
– ConcurrentHashMap maintains an independent
counter for each segment of the hash map, with
a separate lock for each counter
– the global count is obtained by summing all the
independent counters
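The per-segment counting idea above can be sketched as a striped counter. This is an illustrative class (the name `StripedCounter` and the thread-id-to-stripe mapping are assumptions, not from the deck); it uses atomic variables per stripe rather than a lock per counter, but the hot-point-splitting principle is the same, and it matches what java.util.concurrent.atomic.LongAdder later standardized:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical striped counter: each stripe absorbs updates from a subset
// of threads, so no single counter becomes a contention hot point.
public class StripedCounter {
    private final AtomicLong[] stripes;

    public StripedCounter(int nStripes) {
        stripes = new AtomicLong[nStripes];
        for (int i = 0; i < nStripes; i++) {
            stripes[i] = new AtomicLong();
        }
    }

    public void increment() {
        // Map the current thread to a stripe so different threads
        // usually touch different counters.
        int idx = (int) (Thread.currentThread().getId() % stripes.length);
        stripes[idx].incrementAndGet();
    }

    // The global count is the sum of all independent stripe counters.
    public long sum() {
        long total = 0;
        for (AtomicLong s : stripes) {
            total += s.get();
        }
        return total;
    }
}
```

Note that `sum()` is only a moment-in-time snapshot under concurrent updates, which is exactly the trade-off ConcurrentHashMap makes for its size count.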
20. Alternatives to Exclusive Locks
• Duplicate the shared resource if possible
• Atomic variables
– counters, sequence number generators, the head
pointer of a linked list
• Concurrent container
– java.util.concurrent package, Amino lib
• Read-Write Lock
– java.util.concurrent.locks.ReadWriteLock
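Of the alternatives listed, ReadWriteLock is the easiest drop-in replacement for a read-mostly structure. A minimal sketch (the `RwCache` class is an assumed example, not from the deck), using java.util.concurrent.locks.ReentrantReadWriteLock so that readers run in parallel and only writers exclude everyone:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly cache: many readers proceed concurrently,
// only a writer takes the exclusive lock.
public class RwCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public V get(K key) {
        lock.readLock().lock();       // shared mode: readers don't block each other
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();      // exclusive mode: blocks readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This only pays off when reads heavily outnumber writes; for write-heavy maps, the concurrent containers on the next slide scale better.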
21. Example of AtomicLongArray

    // Coarse-grained locking over a plain long[] d
    public synchronized void set1(int idx, long val) {
        d[idx] = val;
    }
    public synchronized long get1(int idx) {
        long ret = d[idx];
        return ret;
    }
    Execution Time: 23550 milliseconds

    // Lock-free access through java.util.concurrent.atomic.AtomicLongArray
    private final AtomicLongArray a;
    public void set2(int idx, long val) {
        a.set(idx, val);
    }
    public long get2(int idx) {
        long ret = a.get(idx);
        return ret;
    }
    Execution Time: 842 milliseconds (a 96% reduction)
22. Using Concurrent Containers
• java.util.concurrent package
– since Java 1.5
– ConcurrentHashMap, ConcurrentLinkedQueue,
CopyOnWriteArrayList, etc.
• Amino Lib is another good choice
– LockFreeList, LockFreeStack, LockFreeQueue, etc.
• Thread-safe containers
• Optimized for common operations
• High performance and scalability on multi-core
platforms
• Drawback: not every feature is fully supported
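As a small illustration of replacing an externally locked map with a concurrent container, here is an assumed hit-counter class (not from the deck). ConcurrentHashMap has been available since Java 1.5; the `merge` method used here for an atomic per-key read-modify-write arrived in Java 8:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hit counter shared by many threads; no external lock is needed
// because the container handles synchronization internally.
public class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    public void record(String page) {
        // merge() performs the increment atomically for each key.
        hits.merge(page, 1, Integer::sum);
    }

    public int count(String page) {
        return hits.getOrDefault(page, 0);
    }
}
```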
23. Using Immutable and Thread Local data
• Immutable data
– remains unchanged during its life cycle
– always thread-safe
• Thread Local data
– used by only a single thread
– not shared among different threads
– can replace a global waiting queue or object pool
– used in work-stealing schedulers
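A minimal sketch of the immutable-data idea (the `Point` class is an assumed example): all fields are final and set once in the constructor, there are no setters, and "modification" produces a new instance, so objects can be shared across threads without any locking.

```java
// Immutable value type: once constructed, its state never changes,
// so it is safe to share between threads with no synchronization.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Mutation" returns a fresh instance; the original is untouched.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```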
24. Reduce Memory Allocation
• JVM: Two levels of memory allocation
– first from a thread-local buffer
– then from the global buffer
• The thread-local buffer is exhausted quickly
if the allocation frequency is high
• The ThreadLocal class may be helpful if a
temporary object is needed in a loop
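The ThreadLocal tip above can be sketched as follows (the `Formatter` class is an assumed example): each thread reuses one scratch StringBuilder across loop iterations instead of allocating a new one per iteration, which reduces pressure on the thread-local allocation buffer. `ThreadLocal.withInitial` requires Java 8.

```java
// Per-thread scratch buffer: allocated once per thread, then reused,
// instead of creating a new StringBuilder on every call.
public class Formatter {
    private static final ThreadLocal<StringBuilder> BUF =
        ThreadLocal.withInitial(() -> new StringBuilder(128));

    public static String format(int value) {
        StringBuilder sb = BUF.get();
        sb.setLength(0);              // reset contents; keep the allocated capacity
        return sb.append("value=").append(value).toString();
    }
}
```

Because each thread gets its own buffer, no synchronization is needed; the caveat is that long-lived thread pools keep the buffers alive for the life of the thread.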
26. Using Lock-Free/Wait-Free Algorithm
• Lock-free algorithms allow concurrent updates of
shared data structures without using any
locking mechanism
– solve some of the basic problems associated
with using locks in the code
– help create algorithms that show good
scalability
• Highly scalable and efficient
• Amino Lib
27. Why Lock-Free Often Means Better Scalability? (I)
Lock: all threads wait for one
Lock-free: no wait, but only one can succeed;
other threads need to retry
28. Why Lock-Free Often Means Better Scalability? (II)
Lock: all threads wait for one
Lock-free: no wait, but only one can succeed;
other threads often need to retry
29. Performance of A Lock-Free Stack
Picture from: http://www.infoq.com/articles/scalable-java-components
What if all the previous best practices cannot meet your needs and you would like to optimize your application manually?
MSDK – This tool can be used for detailed performance analysis of concurrent Java applications. It performs an in-depth analysis of the complete execution stack, from the hardware up to the application layer. Information is gathered from all four layers of the stack: hardware, operating system, JVM, and application.
For multi-threaded applications, the lock-free approach differs from the lock-based approach in several respects. When accessing a shared resource, the lock-based approach allows only one thread to enter the critical section while the others wait for it. By contrast, the lock-free approach allows every thread to attempt to modify the shared state. Only one of the threads can succeed; all the other threads become aware that their action failed, so they retry or choose another action.
The real difference appears when something bad happens to the running thread. If a running thread is paused by the OS scheduler, the two approaches behave very differently. Lock-based approach: all other threads wait for this thread, and no one can make progress. Lock-free approach: the other threads are free to perform any operation, and only the paused thread might fail its current operation. From this difference we can see that in a multi-core environment, lock-free has the advantage: it offers better scalability since threads don't wait for each other. It does waste some CPU cycles under contention, but for most cases this is not a problem, since we have more than enough CPU resources.
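The try-and-retry pattern described in these notes can be sketched as a Treiber-style lock-free stack (the class below is an illustrative sketch, not the Amino Lib implementation): each thread prepares its update and attempts a compareAndSet; a thread whose CAS fails simply re-reads the head and retries instead of blocking, so a paused thread never stops the others from making progress.

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free stack: push/pop retry on CAS failure instead of waiting on a lock.
public class LockFreeStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> n = new Node<>(value);
        do {
            n.next = head.get();                  // read the current state
        } while (!head.compareAndSet(n.next, n)); // retry if another thread won
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) {
                return null;                      // empty stack
            }
        } while (!head.compareAndSet(h, h.next)); // retry if head changed underneath us
        return h.value;
    }
}
```

Under contention a failed CAS costs a retry loop iteration (the wasted CPU cycles mentioned above), but no thread ever blocks on another, which is where the scalability advantage comes from.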