Three Optimization Tips for C++

                                Andrei Alexandrescu, Ph.D.
                                   Research Scientist, Facebook
                                  andrei.alexandrescu@fb.com




© 2012- Facebook. Do not redistribute.                            1 / 33
This Talk




         • Basics
         • Reduce strength
         • Minimize array writes

Things I Shouldn’t Even

Today’s Computing Architectures



         • Extremely complex
         • Trade reproducible performance for average
           speed
         • Interrupts, multiprocessing are the norm
         • Dynamic frequency control is becoming
           common
         • Virtually impossible to get identical timings
           for experiments

Intuition


         • Ignores aspects of a complex reality
         • Makes narrow/obsolete/wrong assumptions


         • “Fewer instructions = faster code”
         • “Data is faster than computation”
         • “Computation is faster than data”


         • The only good intuition: “I should time this.”

Paradox




      Measuring gives you a
      leg up on experts who
      don’t need to measure

Common Pitfalls


         • Measuring speed of debug builds
         • Different setup for baseline and measured
             ◦ Sequencing: heap allocator
             ◦ Warmth of cache, files, databases, DNS
         • Including ancillary work in measurement
             ◦ malloc and printf are common culprits
         • Mixtures: measure t_a + t_b, improve t_a,
           conclude t_b got improved
         • Optimize rare cases, pessimize others
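A minimal sketch of a measurement harness that avoids several of these pitfalls, assuming a single-threaded kernel. Input allocation and a warm-up call happen before the clock starts, only the kernel runs inside the timed region, and the fastest of several runs is reported. `sumKernel` and `timeBestOf` are hypothetical names chosen for illustration, and best-of-N is one reasonable policy rather than the only one. Build it with optimizations enabled, or the numbers mean little.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical kernel under test: sums a vector of 32-bit values.
std::uint64_t sumKernel(const std::vector<std::uint32_t>& data) {
    return std::accumulate(data.begin(), data.end(), std::uint64_t{0});
}

// Calls fn() `runs` times and returns the fastest run in nanoseconds.
// Only fn() itself is inside the timed region. Results are folded into
// `sink` so the optimizer cannot discard the calls.
template <class Fn>
long long timeBestOf(int runs, Fn fn, std::uint64_t& sink) {
    assert(runs > 0);
    long long best = -1;
    for (int r = 0; r < runs; ++r) {
        auto const t0 = std::chrono::steady_clock::now();
        sink += fn();
        auto const t1 = std::chrono::steady_clock::now();
        long long const ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

Typical use: allocate and fill the input, call the kernel once to warm the caches, then call `timeBestOf(5, [&] { return sumKernel(data); }, sink)`. Print `sink` only after timing so no `printf` lands inside a timed run.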

Optimizing Rare Cases

More generalities




         • Prefer static linking and position-dependent code (PDC)
         • Prefer 64-bit code, 32-bit data
         • Prefer (32-bit) array indexing to pointers
            ◦ Prefer a[i++] to a[++i]
         • Prefer regular memory access patterns
         • Minimize flow, avoid data dependencies
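Here is one reading of the indexing advice, as a sketch. `a[i++]` loads using the current `i`, so the load does not have to wait for the increment. `a[++i]` makes every load depend on the freshly incremented index. Both hypothetical functions below count zero bytes with a 32-bit index and a branch-free accumulation. An optimizing compiler may well emit similar code for both, so treat this as something to time rather than a guaranteed win.

```cpp
#include <cassert>
#include <cstdint>

// Post-increment: the load of a[i] uses the old i, and the
// increment can proceed in parallel with it.
std::uint32_t countZerosPostInc(const unsigned char* a, std::uint32_t n) {
    assert(a != nullptr || n == 0);
    std::uint32_t zeros = 0, i = 0;
    while (i < n) {
        zeros += (a[i++] == 0);  // no branch on the data
    }
    return zeros;
}

// Pre-increment: each load needs the incremented i first.
std::uint32_t countZerosPreInc(const unsigned char* a, std::uint32_t n) {
    if (n == 0) return 0;
    std::uint32_t zeros = (a[0] == 0), i = 0;
    while (i + 1 < n) {
        zeros += (a[++i] == 0);
    }
    return zeros;
}
```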

Storage Pecking Order



         • Use enum for integral constants
         • Use static const for other immutables
            ◦ Beware cache issues
         • Use stack for most variables
         • Globals: aliasing issues
         • thread_local slowest, use local caching
            ◦ 1 instruction in Windows, Linux
            ◦ 3-4 in OSX
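Here is a sketch of the "local caching" advice for `thread_local`, assuming a hot loop that updates a per-thread counter. `tlsEventCount`, `kMaxBatch`, and both functions are hypothetical. The cached version reads the thread-local once, accumulates in a stack variable (which can live in a register), and writes back once. Whether the compiler already hoists the access in the direct version depends on the compiler and platform. The anonymous `enum` illustrates the first item on the list: an integral constant that occupies no storage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time limit: an enum constant needs no storage.
enum : std::uint32_t { kMaxBatch = 4096 };

// Hypothetical per-thread counter. Each access may need extra
// instructions to locate the current thread's copy.
thread_local std::uint64_t tlsEventCount = 0;

// Touches the thread_local on every positive event.
void countEventsDirect(const int* events, std::uint32_t n) {
    assert(n <= kMaxBatch);
    for (std::uint32_t i = 0; i < n; ++i) {
        if (events[i] > 0) ++tlsEventCount;
    }
}

// Local caching: one read, a stack-resident accumulator, one write.
void countEventsCached(const int* events, std::uint32_t n) {
    assert(n <= kMaxBatch);
    std::uint64_t local = tlsEventCount;
    for (std::uint32_t i = 0; i < n; ++i) {
        local += (events[i] > 0);
    }
    tlsEventCount = local;
}
```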

Reduce Strength

Strength reduction



         • Speed hierarchy:
            ◦ comparisons
            ◦ (u)int add, subtract, bitops, shift
            ◦ FP add, sub (separate unit!)
            ◦ Indexed array access
            ◦ (u)int32 mul; FP mul
            ◦ FP division, remainder
            ◦ (u)int division, remainder
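To illustrate moving down this hierarchy: for unsigned operands, a remainder by a power of two can be replaced with a bitwise AND. The function names are hypothetical. Compilers already do this when the divisor is a compile-time constant. Writing it by hand mainly pays off when the divisor is a runtime value known to be a power of two, which the `assert` checks.

```cpp
#include <cassert>
#include <cstdint>

// Integer remainder: the slowest tier of the hierarchy when the
// divisor is only known at run time.
std::uint32_t modPow2Slow(std::uint32_t x, std::uint32_t pow2) {
    return x % pow2;
}

// Same result with a bitwise AND, valid only for power-of-two divisors.
std::uint32_t modPow2Fast(std::uint32_t x, std::uint32_t pow2) {
    assert(pow2 != 0 && (pow2 & (pow2 - 1)) == 0);  // exactly one bit set
    return x & (pow2 - 1);
}
```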

Your Compiler Called




        I get it. a >>= 1 is the
           same as a /= 2.
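The joke holds only for unsigned operands, which is one reason behind the later advice to prefer unsigned integers. Here is a small sketch with hypothetical names. For `uint32_t`, `a / 2` and `a >> 1` are identical, so either spelling compiles to one shift. For signed values, `/` rounds toward zero while an arithmetic right shift rounds toward negative infinity. For example, `-3 / 2` is `-1`, but the floor of `-1.5` is `-2`. As a result, the compiler has to emit extra correction instructions for a signed `a / 2`.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: division by 2 and a right shift by 1 are the same operation,
// so the compiler emits a single shift for either spelling.
std::uint32_t halveUnsigned(std::uint32_t a) { return a / 2; }

// Signed: a / 2 rounds toward zero, but an arithmetic right shift rounds
// toward negative infinity (guaranteed since C++20, implementation-defined
// before). The compiler therefore adds a fix-up for negative a.
std::int32_t halveSigned(std::int32_t a) { return a / 2; }
```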

Integrals



         • Prefer 32-bit ints to all other sizes
            ◦ 64 bit may make some code slower
            ◦ 8, 16-bit computations use conversion to
              32 bits and back
            ◦ Use small ints in arrays
         • Prefer unsigned to signed
            ◦ Except when converting to floating point
         • “Most numbers are small”
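Here is a sketch of "use small ints in arrays" combined with "prefer 32-bit ints" for computation, assuming byte-sized values. The array stores `uint8_t` to keep more of it in cache. The loop index and the accumulator are 32-bit, so the arithmetic runs at the native width without narrowing back to 8 bits. `sumBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Compact storage (1 byte per element), 32-bit computation.
std::uint32_t sumBytes(const std::vector<std::uint8_t>& values) {
    assert(values.size() <= UINT32_MAX);  // 32-bit index must not wrap
    std::uint32_t total = 0;              // native-width accumulator
    for (std::uint32_t i = 0; i < values.size(); ++i) {
        total += values[i];               // uint8_t widens to 32 bits
    }
    return total;
}
```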

Floating Point



         • Double precision as fast as single precision
         • Extended precision just a bit slower
         • Do not mix the three
         • 1-2 FP addition/subtraction units
         • 1-2 FP multiplication/division units
         • SSE accelerates throughput for certain
           computation kernels
         • ints→FPs cheap, FPs→ints expensive
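Here is a sketch of "do not mix the three." With `double` literals in a `float` function, each step is widened to `double` and the result is narrowed back to `float`. With `float` literals, the whole expression stays in one precision. The polynomial and the function names are hypothetical. For these exact inputs all versions agree, and the cost of the conversions, if any, is something to measure on the target.

```cpp
#include <cassert>

// Mixed precision: the double literals promote x and every intermediate
// to double, and the result is narrowed back to float at the end.
float polyMixed(float x) {
    return static_cast<float>(1.0 + x * (2.0 + x * 3.0));
}

// Single precision throughout: float literals keep the expression in float.
float polyFloat(float x) { return 1.0f + x * (2.0f + x * 3.0f); }

// Double precision throughout.
double polyDouble(double x) { return 1.0 + x * (2.0 + x * 3.0); }
```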

Advice




      Design algorithms to
     use minimum operation
            strength

Strength reduction: Example


         • Digit count in base-10 representation
       uint32_t digits10(uint64_t v) {
          uint32_t result = 0;
          do {
             ++result;
             v /= 10;
          } while (v);
          return result;
       }

         • Uses integral division extensively
            ◦ (Actually: multiplication; compilers turn division
              by a constant into a multiply and shift)

Strength reduction: Example

       uint32_t digits10(uint64_t v) {
          uint32_t result = 1;
          for (;;) {
             if (v < 10) return result;
             if (v < 100) return result + 1;
             if (v < 1000) return result + 2;
             if (v < 10000) return result + 3;
             // Skip ahead by 4 orders of magnitude
             v /= 10000U;
             result += 4;
          }
       }

         • More comparisons and additions, fewer /=
         • (This is not loop unrolling!)
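The comparison-based rewrite must return exactly what the division loop returns. A quick way to gain confidence is to compare both at every power-of-ten boundary, where digit-counting bugs tend to hide. This sketch renames the two versions `digits10Divide` and `digits10Compare` so they can coexist. Those names and the `digits10VersionsAgree` helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Division loop: one /= 10 per digit.
std::uint32_t digits10Divide(std::uint64_t v) {
    std::uint32_t result = 0;
    do {
        ++result;
        v /= 10;
    } while (v);
    return result;
}

// Comparisons first, one division per four digits.
std::uint32_t digits10Compare(std::uint64_t v) {
    std::uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        v /= 10000U;  // skip ahead by 4 orders of magnitude
        result += 4;
    }
}

// Probes 10^k - 1, 10^k, and 10^k + 1 for k = 0..19, plus UINT64_MAX.
bool digits10VersionsAgree() {
    std::uint64_t p = 1;
    for (int k = 0; k < 20; ++k) {
        const std::uint64_t probes[] = {p - 1, p, p + 1};
        for (std::uint64_t v : probes) {
            if (digits10Divide(v) != digits10Compare(v)) return false;
        }
        if (k < 19) p *= 10;  // 10^20 would overflow uint64_t
    }
    return digits10Divide(UINT64_MAX) == 20 && digits10Compare(UINT64_MAX) == 20;
}
```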

Minimize Array Writes

Minimize Array Writes: Why?



         •   Disables enregistering
         •   A write is really a read and a write
         •   Aliasing makes things difficult
         •   Maculates (dirties) the cache



         • Generally just difficult to optimize

Minimize Array Writes


       uint32_t u64ToAsciiClassic(uint64_t value, char* dst) {
          // Write backwards.
          auto start = dst;
          do {
             *dst++ = '0' + (value % 10);
             value /= 10;
          } while (value != 0);
          const uint32_t result = dst - start;
          // Reverse in place.
          for (dst--; dst > start; start++, dst--) {
             std::iter_swap(dst, start);
          }
          return result;
       }

Minimize Array Writes
         • Gambit: make one extra pass to compute
            length
       uint32_t uint64ToAscii(uint64_t v, char *const buffer) {
          auto const result = digits10(v);
          uint32_t pos = result - 1;
          while (v >= 10) {
             auto const q = v / 10;
             auto const r = static_cast<uint32_t>(v % 10);
             buffer[pos--] = '0' + r;
             v = q;
          }
          assert(pos == 0);
          // Last digit is trivial to handle
          *buffer = static_cast<uint32_t>(v) + '0';
          return result;
       }
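Neither converter writes a terminating NUL, so callers typically wrap them. Here is a sketch under stated assumptions. `toDecimal` is a hypothetical wrapper, and its 20-byte buffer relies on the largest `uint64_t` having 20 decimal digits. The comparison-based `digits10` and the two-pass `uint64ToAscii` from the slides are repeated (with `std::` qualification and an explicit `char` cast) so the block compiles on its own. Comparing the output against `std::to_string` is a cheap correctness check before timing anything.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

std::uint32_t digits10(std::uint64_t v) {
    std::uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        v /= 10000U;
        result += 4;
    }
}

// Two passes: count the digits, then write each digit once, right to
// left, into its final position. No NUL terminator is written.
std::uint32_t uint64ToAscii(std::uint64_t v, char* const buffer) {
    auto const result = digits10(v);
    std::uint32_t pos = result - 1;
    while (v >= 10) {
        auto const q = v / 10;
        auto const r = static_cast<std::uint32_t>(v % 10);
        buffer[pos--] = static_cast<char>('0' + r);
        v = q;
    }
    assert(pos == 0);
    buffer[0] = static_cast<char>('0' + v);  // last (leading) digit
    return result;
}

// Hypothetical wrapper: 20 bytes hold the longest uint64_t.
std::string toDecimal(std::uint64_t v) {
    char buf[20];
    return std::string(buf, uint64ToAscii(v, buf));
}
```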

Improvements




         •   Fewer array writes
         •   Regular access patterns
         •   Fast on small numbers
         •   Data dependencies reduced

One More Pass




         • Reformulate digits10 as search
         • Convert two digits at a time
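The two-digits-at-a-time idea rests on a 200-character table in which the pair at offsets `2k` and `2k + 1` spells `k` as two ASCII digits. Here is a minimal sketch with hypothetical names (`kDigitPairs`, `emitTwoDigits`). One `% 100` and one `/ 100` produce two output characters, which halves the number of divisions and remainders compared with converting one digit at a time.

```cpp
#include <cassert>
#include <cstdint>

// Entry k (0..99) occupies offsets 2k and 2k + 1: "00", "01", ..., "99".
static const char kDigitPairs[201] =
    "0001020304050607080910111213141516171819"
    "2021222324252627282930313233343536373839"
    "4041424344454647484950515253545556575859"
    "6061626364656667686970717273747576777879"
    "8081828384858687888990919293949596979899";

// Writes the last two decimal digits of v to out[0], out[1]
// and returns the remaining prefix v / 100.
std::uint64_t emitTwoDigits(std::uint64_t v, char* out) {
    auto const i = static_cast<std::uint32_t>(v % 100) * 2;
    out[0] = kDigitPairs[i];
    out[1] = kDigitPairs[i + 1];
    return v / 100;
}
```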

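       // P01 .. P12 are the powers of ten 10^1 .. 10^12.
       constexpr uint64_t P01 = 10ULL;
       constexpr uint64_t P02 = 100ULL;
       constexpr uint64_t P03 = 1000ULL;
       constexpr uint64_t P04 = 10000ULL;
       constexpr uint64_t P05 = 100000ULL;
       constexpr uint64_t P06 = 1000000ULL;
       constexpr uint64_t P07 = 10000000ULL;
       constexpr uint64_t P08 = 100000000ULL;
       constexpr uint64_t P09 = 1000000000ULL;
       constexpr uint64_t P10 = 10000000000ULL;
       constexpr uint64_t P11 = 100000000000ULL;
       constexpr uint64_t P12 = 1000000000000ULL;

       uint32_t digits10(uint64_t v) {
          if (v < P01) return 1;
          if (v < P02) return 2;
          if (v < P03) return 3;
          if (v < P12) {
             if (v < P08) {
                if (v < P06) {
                   if (v < P04) return 4;
                   return 5 + (v >= P05);
                }
                return 7 + (v >= P07);
             }
             if (v < P10) {
                return 9 + (v >= P09);
             }
             return 11 + (v >= P11);
          }
          return 12 + digits10(v / P12);
       }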

       unsigned u64ToAsciiTable(uint64_t value, char* dst) {
          static const char digits[201] =
             "0001020304050607080910111213141516171819"
             "2021222324252627282930313233343536373839"
             "4041424344454647484950515253545556575859"
             "6061626364656667686970717273747576777879"
             "8081828384858687888990919293949596979899";
          uint32_t const length = digits10(value);
          uint32_t next = length - 1;
          while (value >= 100) {
             auto const i = (value % 100) * 2;
             value /= 100;
             dst[next] = digits[i + 1];
             dst[next - 1] = digits[i];
             next -= 2;
          }
          // Handle last 1-2 digits
          if (value < 10) {
             dst[next] = '0' + uint32_t(value);
          } else {
             auto i = uint32_t(value) * 2;
             dst[next] = digits[i + 1];
             dst[next - 1] = digits[i];
          }
          return length;
       }

Summary




         • You can’t improve what you can’t measure
            ◦ Pro tip: You can’t measure what you don’t
              measure
         • Reduce strength
         • Minimize array writes
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Three Optimization Tips for C++

  • 1. Three Optimization Tips for C++ Andrei Alexandrescu, Ph.D. Research Scientist, Facebook andrei.alexandrescu@fb.com © 2012- Facebook. Do not redistribute. 1 / 33
  • 2. This Talk
      • Basics
      • Reduce strength
      • Minimize array writes
  • 3. Things I Shouldn’t Even
  • 4. Today’s Computing Architectures
      • Extremely complex
      • Trade reproducible performance for average speed
      • Interrupts and multiprocessing are the norm
      • Dynamic frequency control is becoming common
      • Virtually impossible to get identical timings for experiments
  • 12. Intuition
      • Ignores aspects of a complex reality
      • Makes narrow/obsolete/wrong assumptions
      • “Fewer instructions = faster code”
      • “Data is faster than computation”
      • “Computation is faster than data”
      • The only good intuition: “I should time this.”
  • 13. Paradox: Measuring gives you a leg up on experts who don’t need to measure.
  • 14. Common Pitfalls
      • Measuring the speed of debug builds
      • Different setup for baseline and measured
          ◦ Sequencing: heap allocator state
          ◦ Warmth of cache, files, databases, DNS
      • Including ancillary work in the measurement
          ◦ malloc and printf are common culprits
      • Mixtures: measure $t_a + t_b$, improve $t_a$, conclude $t_b$ got improved
      • Optimize rare cases, pessimize others
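A minimal timing sketch that sidesteps several of these pitfalls, assuming a hypothetical helper `minTimeNs` and sink `g_sink` (neither is from the talk): setup happens before the call, a few untimed warm-up calls run first, and the fastest of several runs is reported. Build it with optimizations on; timing a debug build is the first pitfall listed.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls fn() `warmups` times untimed, then times `reps`
// calls and returns the fastest one in nanoseconds. The minimum is used
// because noise such as interrupts and context switches only adds time.
template <class Fn>
long long minTimeNs(Fn&& fn, int warmups = 3, int reps = 20) {
    for (int i = 0; i < warmups; ++i) fn();  // warm caches and branch predictors
    long long best = -1;
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}

// Storing results into a volatile sink keeps the optimizer from
// deleting the work being measured.
volatile std::uint64_t g_sink = 0;
```

Allocate and fill input buffers before calling `minTimeNs`, and print the result after it returns, so neither malloc nor printf lands inside the timed region.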
  • 15. Optimizing Rare Cases
  • 16. More generalities
      • Prefer static linking and PDC (position-dependent code)
      • Prefer 64-bit code, 32-bit data
      • Prefer (32-bit) array indexing to pointers
          ◦ Prefer a[i++] to a[++i]
      • Prefer regular memory access patterns
      • Minimize control flow, avoid data dependencies
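One way to "avoid data dependencies" is to split a single accumulation chain into independent ones. The sketch below (hypothetical function names, 32-bit indexing as the slide suggests) contrasts one accumulator with two. For integer sums an optimizing compiler may already split or vectorize the chain on its own; for floating-point sums it generally may not, because reordering FP additions can change the result.

```cpp
#include <cassert>
#include <cstdint>

// One accumulator: each addition waits for the previous one,
// so the adds form a single serial dependency chain.
std::uint64_t sumOneChain(const std::uint32_t* a, std::uint32_t n) {
    std::uint64_t s = 0;
    for (std::uint32_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Two accumulators: even and odd elements feed independent chains that the
// CPU can execute in parallel; they are combined once at the end.
std::uint64_t sumTwoChains(const std::uint32_t* a, std::uint32_t n) {
    std::uint64_t s0 = 0, s1 = 0;
    std::uint32_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n) s0 += a[i];  // odd element count: pick up the last one
    return s0 + s1;
}
```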
  • 17. Storage Pecking Order
      • Use enum for integral constants
      • Use static const for other immutables
          ◦ Beware cache issues
      • Use the stack for most variables
      • Globals: aliasing issues
      • thread_local is slowest; use local caching
          ◦ 1 instruction on Windows, Linux
          ◦ 3-4 on OS X
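"Use local caching" for thread_local can be sketched as: copy the variable into an automatic local, work on the local, and write it back once. The counter and function names below are hypothetical. In a loop this simple a compiler may hoist the access by itself; the pattern matters when the loop also calls functions the compiler cannot see into.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread counter.
thread_local std::uint64_t tl_count = 0;

// Touches the thread_local on every iteration.
void countDirect(std::uint32_t n) {
    for (std::uint32_t i = 0; i < n; ++i) ++tl_count;
}

// Local caching: one thread_local read, work on a stack variable,
// one thread_local write at the end.
void countCached(std::uint32_t n) {
    std::uint64_t local = tl_count;
    for (std::uint32_t i = 0; i < n; ++i) ++local;
    tl_count = local;
}
```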
  • 18. Reduce Strength
  • 19. Strength reduction
      • Speed hierarchy (fastest first):
          ◦ comparisons
          ◦ (u)int add, subtract, bitops, shift
          ◦ FP add, sub (separate unit!)
          ◦ Indexed array access
          ◦ (u)int32 mul; FP mul
          ◦ FP division, remainder
          ◦ (u)int division, remainder
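A small illustration of moving down this hierarchy: advancing an index around a ring of `n` slots. The remainder version pays for an integer division on every step when `n` is not a compile-time constant; the comparison version gets the same answer with an add and a compare. Function names are hypothetical, and both assume `i < n`.

```cpp
#include <cassert>
#include <cstdint>

// Remainder-based: one (u)int remainder per step, the slowest entry above.
std::uint32_t nextSlotMod(std::uint32_t i, std::uint32_t n) {
    return (i + 1) % n;
}

// Comparison-based: same result for any i < n, using an add and a compare.
std::uint32_t nextSlotCmp(std::uint32_t i, std::uint32_t n) {
    ++i;
    return i == n ? 0 : i;
}
```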
  • 20. Your Compiler Called: “I get it. a >>= 1 is the same as a /= 2.” (for unsigned a; for negative signed a the two round differently)
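The equivalence holds for unsigned operands, which is why an optimizing compiler typically emits a shift for unsigned division by 2 without being asked. For negative signed values, division truncates toward zero while an arithmetic right shift rounds toward negative infinity, so hand-writing the shift changes results. Right-shifting a negative value is implementation-defined before C++20 and arithmetic since C++20. Function names are hypothetical.

```cpp
#include <cassert>

// Unsigned: shift and division always agree.
unsigned halfShiftU(unsigned a) { return a >> 1; }
unsigned halfDivU(unsigned a) { return a / 2; }

// Signed: for negative a, division truncates toward zero
// while an arithmetic shift rounds toward negative infinity.
int halfShift(int a) { return a >> 1; }
int halfDiv(int a) { return a / 2; }
```

With `a = -3`, `halfDiv` yields -1 and `halfShift` yields -2 on compilers that use an arithmetic shift (the mainstream ones do).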
  • 21. Integrals
      • Prefer 32-bit ints to all other sizes
          ◦ 64-bit may make some code slower
          ◦ 8-, 16-bit computations use conversion to 32 bits and back
          ◦ Use small ints in arrays
      • Prefer unsigned to signed
          ◦ Except when converting to floating point
      • “Most numbers are small”
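The "conversion to 32 bits and back" is C++'s integral promotion: operands narrower than `int` are promoted before arithmetic. The sketch below checks that at compile time (assuming a platform where `int` is 32 bits) and shows the "small ints in arrays" side: store compactly, accumulate in a 32-bit variable. `sumBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// 8- and 16-bit operands are promoted to int before the operation.
static_assert(std::is_same<decltype(std::uint8_t{} + std::uint8_t{}), int>::value,
              "uint8_t + uint8_t is computed as int");
static_assert(std::is_same<decltype(std::uint16_t{} * std::uint16_t{}), int>::value,
              "uint16_t * uint16_t is computed as int");

// Small elements pack more values per cache line; the 32-bit accumulator
// avoids narrowing back to a small type inside the loop.
std::uint32_t sumBytes(const std::uint8_t* a, std::uint32_t n) {
    std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```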
  • 22. Floating Point
      • Double precision as fast as single precision
      • Extended precision just a bit slower
      • Do not mix the three
      • 1-2 FP addition/subtraction units
      • 1-2 FP multiplication/division units
      • SSE accelerates throughput for certain computation kernels
      • ints→FPs cheap, FPs→ints expensive
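A common way precisions get mixed by accident is an unsuffixed literal: `0.5` is a `double`, so `x * 0.5` with a `float` `x` widens to double, computes, and narrows again if stored in a `float`. A short sketch with hypothetical function names:

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float * double literal is computed in double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float * float literal stays in float");

// Single precision throughout, thanks to the f suffixes.
float scaleF(float x) { return x * 0.5f + 1.0f; }

// Mixed: x is widened to double, computed in double, then narrowed back.
float scaleMixed(float x) { return static_cast<float>(x * 0.5 + 1.0); }
```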
  • 23. Advice: Design algorithms to use minimum operation strength
  • 24. Strength reduction: Example
      • Digit count in base-10 representation

            uint32_t digits10(uint64_t v) {
                uint32_t result = 0;
                do {
                    ++result;
                    v /= 10;
                } while (v);
                return result;
            }

      • Uses integral division extensively
          ◦ (Actually multiplication: compilers turn division by a constant into a multiply and shift)
  • 25. Strength reduction: Example

            uint32_t digits10(uint64_t v) {
                uint32_t result = 1;
                for (;;) {
                    if (v < 10) return result;
                    if (v < 100) return result + 1;
                    if (v < 1000) return result + 2;
                    if (v < 10000) return result + 3;
                    // Skip ahead by 4 orders of magnitude
                    v /= 10000U;
                    result += 4;
                }
            }

      • More comparisons and additions, fewer /=
      • (This is not loop unrolling!)
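Before timing the two versions against each other, it helps to confirm they agree. Here both are renamed (`digits10Loop`, `digits10Skip`, hypothetical names) so they can coexist, and a hypothetical helper `digits10Agree` compares them with the length of `std::to_string`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Slide 24 version: one division per digit.
std::uint32_t digits10Loop(std::uint64_t v) {
    std::uint32_t result = 0;
    do {
        ++result;
        v /= 10;
    } while (v);
    return result;
}

// Slide 25 version: comparisons first, one division per four digits.
std::uint32_t digits10Skip(std::uint64_t v) {
    std::uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        v /= 10000U;
        result += 4;
    }
}

// Reference answer: the length of the decimal string.
bool digits10Agree(std::uint64_t v) {
    auto expected = static_cast<std::uint32_t>(std::to_string(v).size());
    return digits10Loop(v) == expected && digits10Skip(v) == expected;
}
```

Powers of ten and their neighbors are the interesting inputs, since that is where the digit count changes.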
  • 27. Minimize Array Writes
  • 28. Minimize Array Writes: Why?
      • Disables enregistering
      • A write is really a read and a write
      • Aliasing makes things difficult
      • Maculates (dirties) the cache
      • Generally just difficult to optimize
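The enregistering and aliasing points can be seen in a row-sum routine. In the first version the running total lives in `out[r]`; since `out` might overlap `in`, the compiler generally has to store it on every iteration. The second keeps the total in a local (a register candidate) and writes the array once per row. Function names are hypothetical, and the two are only equivalent when `out` does not overlap `in`, which is exactly what the compiler cannot assume in the first version.

```cpp
#include <cassert>
#include <cstdint>

// One array write per element: out[r] is updated inside the inner loop.
void rowSumsDirect(const std::uint32_t* in, std::uint32_t rows,
                   std::uint32_t cols, std::uint32_t* out) {
    for (std::uint32_t r = 0; r < rows; ++r) {
        out[r] = 0;
        for (std::uint32_t c = 0; c < cols; ++c) out[r] += in[r * cols + c];
    }
}

// One array write per row: the total is accumulated in a local variable.
void rowSumsLocal(const std::uint32_t* in, std::uint32_t rows,
                  std::uint32_t cols, std::uint32_t* out) {
    for (std::uint32_t r = 0; r < rows; ++r) {
        std::uint32_t sum = 0;
        for (std::uint32_t c = 0; c < cols; ++c) sum += in[r * cols + c];
        out[r] = sum;
    }
}
```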
  • 29. Minimize Array Writes

            #include <algorithm>  // std::iter_swap
            #include <cstdint>

            uint32_t u64ToAsciiClassic(uint64_t value, char* dst) {
                // Write backwards.
                auto start = dst;
                do {
                    *dst++ = '0' + (value % 10);
                    value /= 10;
                } while (value != 0);
                const uint32_t result = dst - start;
                // Reverse in place.
                for (dst--; dst > start; start++, dst--) {
                    std::iter_swap(dst, start);
                }
                return result;
            }
  • 30. Minimize Array Writes
      • Gambit: make one extra pass to compute the length

            #include <cassert>
            #include <cstdint>

            uint32_t uint64ToAscii(uint64_t v, char *const buffer) {
                auto const result = digits10(v);
                uint32_t pos = result - 1;
                while (v >= 10) {
                    auto const q = v / 10;
                    auto const r = static_cast<uint32_t>(v % 10);
                    buffer[pos--] = '0' + r;
                    v = q;
                }
                assert(pos == 0);
                // Last digit is trivial to handle
                *buffer = static_cast<uint32_t>(v) + '0';
                return result;
            }
  • 31. Improvements
      • Fewer array writes
      • Regular access patterns
      • Fast on small numbers
      • Data dependencies reduced
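Before timing the classic and one-pass converters against each other, it is worth confirming they produce identical text. A minimal harness, assuming the simple loop `digits10` supplies the length and `bothMatch` is a hypothetical helper; neither converter writes a terminating NUL, so the strings are built from the returned length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Simple digit count, used by the one-pass converter.
std::uint32_t digits10(std::uint64_t v) {
    std::uint32_t result = 0;
    do { ++result; v /= 10; } while (v);
    return result;
}

// Classic: write digits backwards, then reverse in place.
std::uint32_t u64ToAsciiClassic(std::uint64_t value, char* dst) {
    char* start = dst;
    do {
        *dst++ = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    const auto result = static_cast<std::uint32_t>(dst - start);
    for (--dst; dst > start; ++start, --dst) std::iter_swap(dst, start);
    return result;
}

// One pass: compute the length first, then write each digit exactly once.
std::uint32_t uint64ToAscii(std::uint64_t v, char* const buffer) {
    std::uint32_t const result = digits10(v);
    std::uint32_t pos = result - 1;
    while (v >= 10) {
        auto const q = v / 10;
        auto const r = static_cast<std::uint32_t>(v % 10);
        buffer[pos--] = static_cast<char>('0' + r);
        v = q;
    }
    assert(pos == 0);
    *buffer = static_cast<char>('0' + v);
    return result;
}

// Both must reproduce std::to_string exactly.
bool bothMatch(std::uint64_t v) {
    char a[20], b[20];  // UINT64_MAX has 20 decimal digits
    std::string const ref = std::to_string(v);
    return std::string(a, u64ToAsciiClassic(v, a)) == ref &&
           std::string(b, uint64ToAscii(v, b)) == ref;
}
```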
  • 33. One More Pass
      • Reformulate digits10 as search
      • Convert two digits at a time
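  • 34.
            #include <cstdint>

            // Assumed definitions (not shown on the slide): PNN == 10^NN
            constexpr uint64_t
                P01 = 10ULL, P02 = 100ULL, P03 = 1000ULL, P04 = 10000ULL,
                P05 = 100000ULL, P06 = 1000000ULL, P07 = 10000000ULL,
                P08 = 100000000ULL, P09 = 1000000000ULL, P10 = 10000000000ULL,
                P11 = 100000000000ULL, P12 = 1000000000000ULL;

            uint32_t digits10(uint64_t v) {
                if (v < P01) return 1;
                if (v < P02) return 2;
                if (v < P03) return 3;
                if (v < P12) {
                    if (v < P08) {
                        if (v < P06) {
                            if (v < P04) return 4;
                            return 5 + (v >= P05);
                        }
                        return 7 + (v >= P07);
                    }
                    if (v < P10) {
                        return 9 + (v >= P09);
                    }
                    return 11 + (v >= P11);
                }
                return 12 + digits10(v / P12);
            }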
  • 35.
            unsigned u64ToAsciiTable(uint64_t value, char* dst) {
                static const char digits[201] =
                    "0001020304050607080910111213141516171819"
                    "2021222324252627282930313233343536373839"
                    "4041424344454647484950515253545556575859"
                    "6061626364656667686970717273747576777879"
                    "8081828384858687888990919293949596979899";
                uint32_t const length = digits10(value);
                uint32_t next = length - 1;
                while (value >= 100) {
                    auto const i = (value % 100) * 2;
                    value /= 100;
                    dst[next] = digits[i + 1];
                    dst[next - 1] = digits[i];
                    next -= 2;
                }
  • 36.
                // Handle last 1-2 digits
                if (value < 10) {
                    dst[next] = '0' + uint32_t(value);
                } else {
                    auto i = uint32_t(value) * 2;
                    dst[next] = digits[i + 1];
                    dst[next - 1] = digits[i];
                }
                return length;
            }
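To exercise the search-based `digits10`, with `>=` in the 4/5/6-digit branch, together with the two-digits-at-a-time converter, here is a self-contained check against `std::to_string`. The `PNN` values are assumed to be powers of ten, and `tableMatches` is a hypothetical helper. Each two-digit value `k` sits at positions `2k` and `2k + 1` of the table, which is why the index is multiplied by 2.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed values for the slide's constants: PNN == 10^NN.
constexpr std::uint64_t P01 = 10ULL, P02 = 100ULL, P03 = 1000ULL,
    P04 = 10000ULL, P05 = 100000ULL, P06 = 1000000ULL, P07 = 10000000ULL,
    P08 = 100000000ULL, P09 = 1000000000ULL, P10 = 10000000000ULL,
    P11 = 100000000000ULL, P12 = 1000000000000ULL;

// Digit count as a search over powers of ten.
std::uint32_t digits10(std::uint64_t v) {
    if (v < P01) return 1;
    if (v < P02) return 2;
    if (v < P03) return 3;
    if (v < P12) {
        if (v < P08) {
            if (v < P06) {
                if (v < P04) return 4;
                return 5 + (v >= P05);
            }
            return 7 + (v >= P07);
        }
        if (v < P10) return 9 + (v >= P09);
        return 11 + (v >= P11);
    }
    return 12 + digits10(v / P12);
}

// Two digits per iteration via the pair table; no NUL is written.
std::uint32_t u64ToAsciiTable(std::uint64_t value, char* dst) {
    static const char digits[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";
    std::uint32_t const length = digits10(value);
    std::uint32_t next = length - 1;
    while (value >= 100) {
        auto const i = (value % 100) * 2;
        value /= 100;
        dst[next] = digits[i + 1];
        dst[next - 1] = digits[i];
        next -= 2;
    }
    if (value < 10) {
        dst[next] = static_cast<char>('0' + value);
    } else {
        auto const i = static_cast<std::uint32_t>(value) * 2;
        dst[next] = digits[i + 1];
        dst[next - 1] = digits[i];
    }
    return length;
}

// Compares the converter's output with std::to_string.
bool tableMatches(std::uint64_t v) {
    char buf[20];  // enough for the 20 digits of UINT64_MAX
    std::uint32_t n = u64ToAsciiTable(v, buf);
    return std::string(buf, n) == std::to_string(v);
}
```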
  • 39. Summary
  • 40. Summary
      • You can’t improve what you can’t measure
          ◦ Pro tip: You can’t measure what you don’t measure
      • Reduce strength
      • Minimize array writes