Nytro

Optimizing software in C++

Optimizing software in C++

An optimization guide for Windows, Linux and Mac platforms

By Agner Fog.

Technical University of Denmark.

Copyright © 2004 - 2014.

Last updated 2014-08-07.

Contents

1 Introduction
1.1 The costs of optimizing
2 Choosing the optimal platform
2.1 Choice of hardware platform
2.2 Choice of microprocessor
2.3 Choice of operating system
2.4 Choice of programming language
2.5 Choice of compiler
2.6 Choice of function libraries
2.7 Choice of user interface framework
2.8 Overcoming the drawbacks of the C++ language
3 Finding the biggest time consumers
3.1 How much is a clock cycle?
3.2 Use a profiler to find hot spots
3.3 Program installation
3.4 Automatic updates
3.5 Program loading
3.6 Dynamic linking and position-independent code
3.7 File access
3.8 System database
3.9 Other databases
3.10 Graphics
3.11 Other system resources
3.12 Network access
3.13 Memory access
3.14 Context switches
3.15 Dependency chains
3.16 Execution unit throughput
4 Performance and usability
5 Choosing the optimal algorithm
6 Development process
7 The efficiency of different C++ constructs
7.1 Different kinds of variable storage
7.2 Integers variables and operators
7.3 Floating point variables and operators
7.4 Enums
7.5 Booleans
7.6 Pointers and references
7.7 Function pointers
7.8 Member pointers
7.9 Smart pointers
7.10 Arrays
7.11 Type conversions
7.12 Branches and switch statements
7.13 Loops
7.14 Functions
7.15 Function parameters
7.16 Function return types
7.17 Structures and classes
7.18 Class data members (properties)
7.19 Class member functions (methods)
7.20 Virtual member functions
7.21 Runtime type identification (RTTI)
7.22 Inheritance
7.23 Constructors and destructors
7.24 Unions
7.25 Bitfields
7.26 Overloaded functions
7.27 Overloaded operators
7.28 Templates
7.29 Threads
7.30 Exceptions and error handling
7.31 Other cases of stack unwinding
7.32 Preprocessing directives
7.33 Namespaces
8 Optimizations in the compiler
8.1 How compilers optimize
8.2 Comparison of different compilers
8.3 Obstacles to optimization by compiler
8.4 Obstacles to optimization by CPU
8.5 Compiler optimization options
8.6 Optimization directives
8.7 Checking what the compiler does
9 Optimizing memory access
9.1 Caching of code and data
9.2 Cache organization
9.3 Functions that are used together should be stored together
9.4 Variables that are used together should be stored together
9.5 Alignment of data
9.6 Dynamic memory allocation
9.7 Container classes
9.8 Strings
9.9 Access data sequentially
9.10 Cache contentions in large data structures
9.11 Explicit cache control
10 Multithreading
10.1 Hyperthreading
11 Out of order execution
12 Using vector operations
12.1 AVX instruction set and YMM registers
12.2 AVX-512 instruction set and ZMM registers
12.3 Automatic vectorization
12.4 Using intrinsic functions
12.5 Using vector classes
12.6 Transforming serial code for vectorization
12.7 Mathematical functions for vectors
12.8 Aligning dynamically allocated memory
12.9 Aligning RGB video or 3-dimensional vectors
12.10 Conclusion
13 Making critical code in multiple versions for different instruction sets
13.1 CPU dispatch strategies
13.2 Model-specific dispatching
13.3 Difficult cases
13.4 Test and maintenance
13.5 Implementation
13.6 CPU dispatching in Gnu compiler
13.7 CPU dispatching in Intel compiler
14 Specific optimization topics
14.1 Use lookup tables
14.2 Bounds checking
14.3 Use bitwise operators for checking multiple values at once
14.4 Integer multiplication
14.5 Integer division
14.6 Floating point division
14.7 Don't mix float and double
14.8 Conversions between floating point numbers and integers
14.9 Using integer operations for manipulating floating point variables
14.10 Mathematical functions
14.11 Static versus dynamic libraries
14.12 Position-independent code
14.13 System programming
15 Metaprogramming
16 Testing speed
16.1 Using performance monitor counters
16.2 The pitfalls of unit-testing
16.3 Worst-case testing
17 Optimization in embedded systems
18 Overview of compiler options
19 Literature
20 Copyright notice

Download: http://www.agner.org/optimize/optimizing_cpp.pdf
