Three Optimization Tips for C++
1.
Three Optimization Tips for C++
Andrei Alexandrescu, Ph.D.
Research Scientist, Facebook
andrei.alexandrescu@fb.com
© 2012- Facebook. Do not redistribute.
2.
This Talk
• Basics
• Reduce strength
• Minimize array writes
3.
Things I Shouldn’t Even
4.
Today’s Computing Architectures
• Extremely complex
• Trade reproducible performance for average speed
• Interrupts and multiprocessing are the norm
• Dynamic frequency control is becoming common
• Virtually impossible to get identical timings for experiments
5.
Intuition
• Ignores aspects of a complex reality
• Makes narrow/obsolete/wrong assumptions
• “Fewer instructions = faster code”
• “Data is faster than computation”
• “Computation is faster than data”
• The only good intuition: “I should time this.”
13.
Paradox
Measuring gives you a leg up on experts who don’t need to measure
14.
Common Pitfalls
• Measuring the speed of debug builds
• Different setup for baseline and measured
  ◦ Sequencing: heap allocator state
  ◦ Warmth of caches, files, databases, DNS
• Including ancillary work in the measurement
  ◦ malloc and printf are common culprits
• Mixtures: measure t_a + t_b, improve t_a, conclude t_b got improved
• Optimizing rare cases while pessimizing the others
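As a minimal sketch of a harness that sidesteps several of these pitfalls (assuming C++17 and an optimized, non-debug build): one untimed warm-up call runs before the clock starts, setup and printing stay outside the timed loop, and every result is folded into a sink so the optimizer cannot discard the work. `workload` and `nsPerCall` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical workload, used only for illustration.
uint64_t workload(const std::vector<uint32_t>& data) {
    uint64_t sum = 0;
    for (uint32_t x : data) sum += x % 7;
    return sum;
}

// Times `fn` over `reps` calls after one untimed warm-up call and returns
// nanoseconds per call. Every result is added to `sink` so the compiler
// cannot treat the measured work as dead code.
template <class F>
double nsPerCall(F fn, int reps, uint64_t& sink) {
    assert(reps > 0);
    sink += fn();  // warm-up: caches, page faults, lazy initialization
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) sink += fn();
    auto const stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / reps;
}
```

Baseline and candidate would both go through the same helper, in the same process setup, so that neither benefits from a warmer cache or a different allocator state than the other.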
15.
Optimizing Rare Cases
16.
More generalities
• Prefer static linking and position-dependent code (PDC)
• Prefer 64-bit code, 32-bit data
• Prefer (32-bit) array indexing to pointers
  ◦ Prefer a[i++] to a[++i]
• Prefer regular memory access patterns
• Minimize control flow, avoid data dependencies
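To illustrate “avoid data dependencies”, here is a sketch with hypothetical function names that contrasts one accumulator with four independent ones, using 32-bit indexing as the bullets suggest. With a single running sum each addition waits for the previous one; splitting the sum gives the CPU independent chains it can overlap. For integer sums an optimizing compiler may already perform this transformation on its own (it typically cannot for floating point without relaxed FP flags), so this is something to time, not to assume.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One accumulator: every addition depends on the previous one, forming a
// single serial dependency chain.
uint64_t sumSerial(const std::vector<uint32_t>& a) {
    uint64_t s = 0;
    uint32_t const n = static_cast<uint32_t>(a.size());
    for (uint32_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Four independent accumulators: the four chains can proceed in parallel
// on a superscalar CPU and are combined once at the end.
uint64_t sumFourWay(const std::vector<uint32_t>& a) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    uint32_t const n = static_cast<uint32_t>(a.size());
    uint32_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```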
17.
Storage Pecking Order
• Use enum for integral constants
• Use static const for other immutables
  ◦ Beware cache issues
• Use the stack for most variables
• Globals: aliasing issues
• thread_local is slowest; cache it in locals
  ◦ 1 instruction on Windows, Linux
  ◦ 3-4 on OS X
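A sketch of “use local caching” for a `thread_local`, with a hypothetical per-thread counter: read it once into a stack variable, work on the local copy (a register candidate), and write it back once. In a loop this simple an optimizer may hoist the access by itself; caching pays off mostly when the loop also calls functions the compiler cannot see into, which could modify the thread-local.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread counter.
thread_local uint64_t tlsCounter = 0;

// Touches the thread_local on every iteration; each access may involve
// computing the thread-local address.
void countDirect(uint32_t n) {
    for (uint32_t i = 0; i < n; ++i) tlsCounter += i & 1;
}

// Local caching: one read, work on a stack copy, one write-back.
void countCached(uint32_t n) {
    uint64_t local = tlsCounter;
    for (uint32_t i = 0; i < n; ++i) local += i & 1;
    tlsCounter = local;
}
```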
18.
Reduce Strength
19.
Strength reduction
• Speed hierarchy (fastest first):
  ◦ comparisons
  ◦ (u)int add, subtract, bitops, shift
  ◦ FP add, sub (separate unit!)
  ◦ Indexed array access
  ◦ (u)int32 mul; FP mul
  ◦ FP division, remainder
  ◦ (u)int division, remainder
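One way to move an operation up this hierarchy is to change the data design rather than the code. In this hypothetical hash-bucket sketch, restricting the bucket count to a power of two turns an integer remainder (bottom of the list) into a bitwise AND (near the top). When the divisor is a compile-time constant the compiler can often make such rewrites itself; the design choice matters when the divisor is known only at run time.

```cpp
#include <cassert>
#include <cstdint>

// Integer remainder with a run-time divisor: the slowest class of operation.
uint32_t bucketMod(uint32_t hash, uint32_t buckets) {
    return hash % buckets;
}

// Same result when `buckets` is a power of two, using a bitwise AND.
// Valid only for unsigned values and power-of-two bucket counts.
uint32_t bucketMask(uint32_t hash, uint32_t buckets) {
    assert(buckets != 0 && (buckets & (buckets - 1)) == 0);
    return hash & (buckets - 1);
}
```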
20.
Your Compiler Called
I get it. a >>= 1 is the same as a /= 2 (for unsigned a).
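The compiler’s remark holds for unsigned operands, where it already emits a shift for `a / 2`. For signed operands the two differ on negative odd values, as this sketch with hypothetical helper names shows. The signed results assume an arithmetic right shift, which C++20 guarantees and mainstream compilers provided earlier (it was implementation-defined before C++20).

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: shift and division by two always agree.
uint32_t halveByShift(uint32_t a) { return a >> 1; }
uint32_t halveByDivide(uint32_t a) { return a / 2; }

// Signed: division truncates toward zero, while an arithmetic right shift
// rounds toward negative infinity, so -3 / 2 == -1 but -3 >> 1 == -2.
int32_t signedShift(int32_t a) { return a >> 1; }
int32_t signedDivide(int32_t a) { return a / 2; }
```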
21.
Integrals
• Prefer 32-bit ints to all other sizes
  ◦ 64-bit may make some code slower
  ◦ 8- and 16-bit computations use conversion to 32 bits and back
  ◦ Use small ints in arrays
• Prefer unsigned to signed
  ◦ Except when converting to floating point
• “Most numbers are small”
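The “conversion to 32 bits and back” is visible in the type system: C++ promotes 8- and 16-bit operands to `int` before any arithmetic. The checks below assume C++17; `addBytes` is a hypothetical helper showing small ints kept for storage while the computation happens in `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Both operands are promoted to int before the operation.
static_assert(std::is_same_v<decltype(uint8_t{} + uint8_t{}), int>);
static_assert(std::is_same_v<decltype(int16_t{} * int16_t{}), int>);

// The addition happens in int; the cast narrows back to 8 bits,
// wrapping modulo 256.
uint8_t addBytes(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);
}
```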
22.
Floating Point
• Double precision as fast as single precision
• Extended precision just a bit slower
• Do not mix the three
• 1-2 FP addition/subtraction units
• 1-2 FP multiplication/division units
• SSE accelerates throughput for certain computation kernels
• ints→FPs cheap, FPs→ints expensive
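A common way to mix precisions by accident is an unsuffixed literal: `0.5` is a `double`, so a `float` expression that uses it is evaluated in double precision and converted back. This sketch, with hypothetical function names and assuming C++17, shows the types involved. An optimizer may sometimes elide the conversions when it can prove the result is identical, as it can for a multiply by 0.5.

```cpp
#include <cassert>
#include <type_traits>

// `x` is converted to double, multiplied in double precision, and the
// result is converted back to float on return.
float scaleMixed(float x) { return static_cast<float>(x * 0.5); }

// The `f` suffix keeps the whole expression in single precision.
float scaleFloat(float x) { return x * 0.5f; }

static_assert(std::is_same_v<decltype(1.0f * 0.5), double>);
static_assert(std::is_same_v<decltype(1.0f * 0.5f), float>);
```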
23.
Advice
Design algorithms to use minimum operation strength
24.
Strength reduction: Example
• Digit count in base-10 representation

```cpp
#include <cstdint>

uint32_t digits10(uint64_t v) {
    uint32_t result = 0;
    do {
        ++result;
        v /= 10;
    } while (v);
    return result;
}
```

• Uses integral division extensively
  ◦ (Actually: multiplication, since compilers replace division by a constant with a multiply)
25.
Strength reduction: Example
```cpp
#include <cstdint>

uint32_t digits10(uint64_t v) {
    uint32_t result = 1;
    for (;;) {
        if (v < 10) return result;
        if (v < 100) return result + 1;
        if (v < 1000) return result + 2;
        if (v < 10000) return result + 3;
        // Skip ahead by 4 orders of magnitude
        v /= 10000U;
        result += 4;
    }
}
```

• More comparisons and additions, fewer /=
• (This is not loop unrolling!)
26.
27.
Minimize Array Writes
28.
Minimize Array Writes: Why?
• Disables enregistering
• A write is really a read and a write
• Aliasing makes things difficult
• Maculates the cache
• Generally just difficult to optimize
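A small sketch of the first and third points, with hypothetical function names: accumulating straight into an output array element versus accumulating in a local variable and writing once per row. Because `out` may alias `in`, the compiler generally has to keep `out[r]` in memory in the first version rather than in a register.

```cpp
#include <cassert>
#include <cstdint>

// Accumulates into an array element: since `out` may alias `in`, the
// running sum generally has to be stored and reloaded each iteration.
void rowSumsDirect(const uint32_t* in, uint32_t rows, uint32_t cols, uint32_t* out) {
    for (uint32_t r = 0; r < rows; ++r) {
        out[r] = 0;
        for (uint32_t c = 0; c < cols; ++c) out[r] += in[r * cols + c];
    }
}

// Same result with one array write per row: the sum lives in a local
// variable, a register candidate, and is stored once at the end.
void rowSumsLocal(const uint32_t* in, uint32_t rows, uint32_t cols, uint32_t* out) {
    for (uint32_t r = 0; r < rows; ++r) {
        uint32_t sum = 0;
        for (uint32_t c = 0; c < cols; ++c) sum += in[r * cols + c];
        out[r] = sum;
    }
}
```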
29.
Minimize Array Writes
```cpp
#include <algorithm>  // std::iter_swap
#include <cstdint>

uint32_t u64ToAsciiClassic(uint64_t value, char* dst) {
    // Write backwards.
    auto start = dst;
    do {
        *dst++ = '0' + (value % 10);
        value /= 10;
    } while (value != 0);
    const uint32_t result = dst - start;
    // Reverse in place.
    for (dst--; dst > start; start++, dst--) {
        std::iter_swap(dst, start);
    }
    return result;
}
```
30.
Minimize Array Writes
• Gambit: make one extra pass to compute length

```cpp
#include <cassert>
#include <cstdint>

// digits10() as defined earlier.
uint32_t uint64ToAscii(uint64_t v, char *const buffer) {
    auto const result = digits10(v);
    uint32_t pos = result - 1;
    while (v >= 10) {
        auto const q = v / 10;
        auto const r = static_cast<uint32_t>(v % 10);
        buffer[pos--] = '0' + r;
        v = q;
    }
    assert(pos == 0);
    // Last digit is trivial to handle
    *buffer = static_cast<uint32_t>(v) + '0';
    return result;
}
```
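A usage sketch: `uint64ToAscii` writes exactly `digits10(v)` characters and no terminating `'\0'`, so the caller supplies the buffer and the length. Since `UINT64_MAX` has 20 decimal digits, a 20-byte buffer always suffices. `toDecimal` is a hypothetical wrapper; the two functions it relies on are copied from the slides so the block stands alone.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Loop version of digits10, as on the slides.
uint32_t digits10(uint64_t v) {
    uint32_t result = 0;
    do { ++result; v /= 10; } while (v);
    return result;
}

// uint64ToAscii as on the slides: fills buffer[0 .. digits10(v) - 1].
uint32_t uint64ToAscii(uint64_t v, char* const buffer) {
    auto const result = digits10(v);
    uint32_t pos = result - 1;
    while (v >= 10) {
        auto const q = v / 10;
        auto const r = static_cast<uint32_t>(v % 10);
        buffer[pos--] = '0' + r;
        v = q;
    }
    assert(pos == 0);
    *buffer = static_cast<uint32_t>(v) + '0';
    return result;
}

// Hypothetical wrapper: 20 bytes hold the longest uint64_t; the returned
// length is used because no terminator is written.
std::string toDecimal(uint64_t v) {
    char buf[20];
    uint32_t const n = uint64ToAscii(v, buf);
    return std::string(buf, n);
}
```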
31.
Improvements
• Fewer array writes
• Regular access patterns
• Fast on small numbers
• Data dependencies reduced
32.
33.
One More Pass
• Reformulate digits10 as search
• Convert two digits at a time
34.
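```cpp
#include <cstdint>

// Powers of ten, Pnn == 10^nn (assumed definitions; not shown on the slide).
constexpr uint64_t P01 = 10ULL, P02 = 100ULL, P03 = 1000ULL;
constexpr uint64_t P04 = 10000ULL, P05 = 100000ULL, P06 = 1000000ULL;
constexpr uint64_t P07 = 10000000ULL, P08 = 100000000ULL, P09 = 1000000000ULL;
constexpr uint64_t P10 = 10000000000ULL, P11 = 100000000000ULL, P12 = 1000000000000ULL;

uint32_t digits10(uint64_t v) {
    if (v < P01) return 1;
    if (v < P02) return 2;
    if (v < P03) return 3;
    if (v < P12) {
        if (v < P08) {
            if (v < P06) {
                if (v < P04) return 4;
                return 5 + (v >= P05);
            }
            return 7 + (v >= P07);
        }
        if (v < P10) {
            return 9 + (v >= P09);
        }
        return 11 + (v >= P11);
    }
    return 12 + digits10(v / P12);
}
```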
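A comparison tree like this is easy to get subtly wrong: writing `(v < P05)` where `(v >= P05)` is meant would return 6 for v = 10000. A sketch of a boundary check against the plain division loop follows; `digits10Naive` and `agreesWithNaive` are hypothetical names. It probes every power of ten in `uint64_t`, its neighbors, and `UINT64_MAX`, which is where such off-by-one errors surface. It is a spot check, not a proof.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Reference implementation: the straightforward division loop.
uint32_t digits10Naive(uint64_t v) {
    uint32_t result = 0;
    do { ++result; v /= 10; } while (v);
    return result;
}

// Compares a candidate digits10 with the reference at 10^k - 1, 10^k and
// 10^k + 1 for k = 0..19 (10^19 is the largest power of ten in uint64_t),
// plus UINT64_MAX.
bool agreesWithNaive(uint32_t (*candidate)(uint64_t)) {
    uint64_t p = 1;
    for (int k = 0; k <= 19; ++k) {
        for (uint64_t v : {p - 1, p, p + 1}) {
            if (candidate(v) != digits10Naive(v)) return false;
        }
        if (k < 19) p *= 10;
    }
    return candidate(UINT64_MAX) == digits10Naive(UINT64_MAX);
}
```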
35.
```cpp
#include <cstdint>

// digits10() as defined on the previous slide.
unsigned u64ToAsciiTable(uint64_t value, char* dst) {
    static const char digits[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";
    uint32_t const length = digits10(value);
    uint32_t next = length - 1;
    while (value >= 100) {
        auto const i = (value % 100) * 2;
        value /= 100;
        dst[next] = digits[i + 1];
        dst[next - 1] = digits[i];
        next -= 2;
    }
    // Handle last 1-2 digits
    if (value < 10) {
        dst[next] = '0' + uint32_t(value);
    } else {
        auto i = uint32_t(value) * 2;
        dst[next] = digits[i + 1];
        dst[next - 1] = digits[i];
    }
    return length;
}
```
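The 200-character digit-pair table can also be generated at compile time instead of typed by hand. This sketch assumes C++17 (`constexpr` mutation of `std::array`); `makeDigitPairs` and `kDigitPairs` are hypothetical names. Entry `2*n` holds the tens digit of `n` and entry `2*n + 1` its units digit, matching the layout the conversion loop indexes. The array has 200 elements; the slide’s `digits[201]` includes the string literal’s terminating `'\0'`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds "000102...9899" without spelling it out.
constexpr std::array<char, 200> makeDigitPairs() {
    std::array<char, 200> t{};
    for (uint32_t n = 0; n < 100; ++n) {
        t[2 * n] = static_cast<char>('0' + n / 10);
        t[2 * n + 1] = static_cast<char>('0' + n % 10);
    }
    return t;
}

constexpr std::array<char, 200> kDigitPairs = makeDigitPairs();

// Spot checks evaluated by the compiler.
static_assert(kDigitPairs[0] == '0' && kDigitPairs[1] == '0');
static_assert(kDigitPairs[2 * 37] == '3' && kDigitPairs[2 * 37 + 1] == '7');
static_assert(kDigitPairs[198] == '9' && kDigitPairs[199] == '9');
```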
37.
38.
39.
Summary
40.
Summary
• You can’t improve what you can’t measure
  ◦ Pro tip: You can’t measure what you don’t measure
• Reduce strength
• Minimize array writes