Matrix multiplication is a standard benchmark for evaluating the performance of intensive dataparallel operations on recent multi-core processors. This whitepaper shows to use Ateji PX for Java to achieve state-of-the-art parallel performance, by adding one single operator in your existing code.