Nytro

Optimizing software in C++

An optimization guide for Windows, Linux and Mac platforms

By Agner Fog. Copenhagen University College of Engineering.

Copyright © 2004 - 2011. Last updated 2011-06-08.

Contents
1 Introduction
1.1 The costs of optimizing
2 Choosing the optimal platform
2.1 Choice of hardware platform
2.2 Choice of microprocessor
2.3 Choice of operating system
2.4 Choice of programming language
2.5 Choice of compiler
2.6 Choice of function libraries
2.7 Choice of user interface framework
2.8 Overcoming the drawbacks of the C++ language
3 Finding the biggest time consumers
3.1 How much is a clock cycle?
3.2 Use a profiler to find hot spots
3.3 Program installation
3.4 Automatic updates
3.5 Program loading
3.6 Dynamic linking and position-independent code
3.7 File access
3.8 System database
3.9 Other databases
3.10 Graphics
3.11 Other system resources
3.12 Network access
3.13 Memory access
3.14 Context switches
3.15 Dependency chains
3.16 Execution unit throughput
4 Performance and usability
5 Choosing the optimal algorithm
6 Development process
7 The efficiency of different C++ constructs
7.1 Different kinds of variable storage
7.2 Integers variables and operators
7.3 Floating point variables and operators
7.4 Enums
7.5 Booleans
7.6 Pointers and references
7.7 Function pointers
7.8 Member pointers
7.9 Smart pointers
7.10 Arrays
7.11 Type conversions
7.12 Branches and switch statements
7.13 Loops
7.14 Functions
7.15 Function parameters
7.16 Function return types
7.17 Structures and classes
7.18 Class data members (properties)
7.19 Class member functions (methods)
7.20 Virtual member functions
7.21 Runtime type identification (RTTI)
7.22 Inheritance
7.23 Constructors and destructors
7.24 Unions
7.25 Bitfields
7.26 Overloaded functions
7.27 Overloaded operators
7.28 Templates
7.29 Threads
7.30 Exceptions and error handling
7.31 Other cases of stack unwinding
7.32 Preprocessing directives
7.33 Namespaces
8 Optimizations in the compiler
8.1 How compilers optimize
8.2 Comparison of different compilers
8.3 Obstacles to optimization by compiler
8.4 Obstacles to optimization by CPU
8.5 Compiler optimization options
8.6 Optimization directives
8.7 Checking what the compiler does
9 Optimizing memory access
9.1 Caching of code and data
9.2 Cache organization
9.3 Functions that are used together should be stored together
9.4 Variables that are used together should be stored together
9.5 Alignment of data
9.6 Dynamic memory allocation
9.7 Container classes
9.8 Strings
9.9 Access data sequentially
9.10 Cache contentions in large data structures
9.11 Explicit cache control
10 Multithreading
10.1 Hyperthreading
11 Out of order execution
12 Using vector operations
12.1 AVX instruction set and YMM registers
12.2 Automatic vectorization
12.3 Explicit vectorization
12.4 Mathematical functions for vectors
12.5 Aligning dynamically allocated memory
12.6 Aligning RGB video or 3-dimensional vectors
12.7 Conclusion
13 Making critical code in multiple versions for different CPUs
13.1 CPU dispatch strategies
13.2 Difficult cases
13.3 Test and maintenance
13.4 Implementation
13.5 CPU dispatching in Gnu compiler
13.6 CPU dispatching in Intel compiler
14 Specific optimization tips
14.1 Use lookup tables
14.2 Bounds checking
14.3 Use bitwise operators for checking multiple values at once
14.4 Integer multiplication
14.5 Integer division
14.6 Floating point division
14.7 Don’t mix float and double
14.8 Conversions between floating point numbers and integers
14.9 Using integer operations for manipulating floating point variables
14.10 Mathematical functions
15 Metaprogramming
16 Testing speed
16.1 The pitfalls of unit-testing
16.2 Worst-case testing
17 Optimization in embedded systems
18 Overview of compiler options
19 Literature
20 Copyright notice

Download:

http://www.agner.org/optimize/optimizing_cpp.pdf
