Nytro

Optimizing software in C++

Optimizing software in C++
An optimization guide for Windows, Linux and Mac platforms

By Agner Fog.

Technical University of Denmark.

Copyright © 2004 - 2014.

Last updated 2014-08-07.

Contents

1 Introduction
1.1 The costs of optimizing
2 Choosing the optimal platform
2.1 Choice of hardware platform
2.2 Choice of microprocessor
2.3 Choice of operating system
2.4 Choice of programming language
2.5 Choice of compiler
2.6 Choice of function libraries
2.7 Choice of user interface framework
2.8 Overcoming the drawbacks of the C++ language
3 Finding the biggest time consumers
3.1 How much is a clock cycle?
3.2 Use a profiler to find hot spots
3.3 Program installation
3.4 Automatic updates
3.5 Program loading
3.6 Dynamic linking and position-independent code
3.7 File access
3.8 System database
3.9 Other databases
3.10 Graphics
3.11 Other system resources
3.12 Network access
3.13 Memory access
3.14 Context switches
3.15 Dependency chains
3.16 Execution unit throughput
4 Performance and usability
5 Choosing the optimal algorithm
6 Development process
7 The efficiency of different C++ constructs
7.1 Different kinds of variable storage
7.2 Integers variables and operators
7.3 Floating point variables and operators
7.4 Enums
7.5 Booleans
7.6 Pointers and references
7.7 Function pointers
7.8 Member pointers
7.9 Smart pointers
7.10 Arrays
7.11 Type conversions
7.12 Branches and switch statements
7.13 Loops
7.14 Functions
7.15 Function parameters
7.16 Function return types
7.17 Structures and classes
7.18 Class data members (properties)
7.19 Class member functions (methods)
7.20 Virtual member functions
7.21 Runtime type identification (RTTI)
7.22 Inheritance
7.23 Constructors and destructors
7.24 Unions
7.25 Bitfields
7.26 Overloaded functions
7.27 Overloaded operators
7.28 Templates
7.29 Threads
7.30 Exceptions and error handling
7.31 Other cases of stack unwinding
7.32 Preprocessing directives
7.33 Namespaces
8 Optimizations in the compiler
8.1 How compilers optimize
8.2 Comparison of different compilers
8.3 Obstacles to optimization by compiler
8.4 Obstacles to optimization by CPU
8.5 Compiler optimization options
8.6 Optimization directives
8.7 Checking what the compiler does
9 Optimizing memory access
9.1 Caching of code and data
9.2 Cache organization
9.3 Functions that are used together should be stored together
9.4 Variables that are used together should be stored together
9.5 Alignment of data
9.6 Dynamic memory allocation
9.7 Container classes
9.8 Strings
9.9 Access data sequentially
9.10 Cache contentions in large data structures
9.11 Explicit cache control
10 Multithreading
10.1 Hyperthreading
11 Out of order execution
12 Using vector operations
12.1 AVX instruction set and YMM registers
12.2 AVX-512 instruction set and ZMM registers
12.3 Automatic vectorization
12.4 Using intrinsic functions
12.5 Using vector classes
12.6 Transforming serial code for vectorization
12.7 Mathematical functions for vectors
12.8 Aligning dynamically allocated memory
12.9 Aligning RGB video or 3-dimensional vectors
12.10 Conclusion
13 Making critical code in multiple versions for different instruction sets
13.1 CPU dispatch strategies
13.2 Model-specific dispatching
13.3 Difficult cases
13.4 Test and maintenance
13.5 Implementation
13.6 CPU dispatching in Gnu compiler
13.7 CPU dispatching in Intel compiler
14 Specific optimization topics
14.1 Use lookup tables
14.2 Bounds checking
14.3 Use bitwise operators for checking multiple values at once
14.4 Integer multiplication
14.5 Integer division
14.6 Floating point division
14.7 Don't mix float and double
14.8 Conversions between floating point numbers and integers
14.9 Using integer operations for manipulating floating point variables
14.10 Mathematical functions
14.11 Static versus dynamic libraries
14.12 Position-independent code
14.13 System programming
15 Metaprogramming
16 Testing speed
16.1 Using performance monitor counters
16.2 The pitfalls of unit-testing
16.3 Worst-case testing
17 Optimization in embedded systems
18 Overview of compiler options
19 Literature
20 Copyright notice

Download: http://www.agner.org/optimize/optimizing_cpp.pdf
