Contents
1.1 Reasons for using assembly code
1.2 Reasons for not using assembly code
1.3 Operating systems covered by this manual
2.1 Things to decide before you start programming
3 The basics of assembly coding
3.2 Register set and basic instructions
4.3 Function calling conventions
4.4 Name mangling and name decoration
5 Using intrinsic functions in C++
5.1 Using intrinsic functions for system code
5.2 Using intrinsic functions for instructions not available in standard C++
5.3 Using intrinsic functions for vector operations
5.4 Availability of intrinsic functions
6.1 MASM style inline assembly
6.3 Inline assembly in Delphi Pascal
7.3 Libraries in source code form
7.4 Making classes in assembly
8 Making function libraries compatible with multiple compilers and platforms
8.1 Supporting multiple name mangling schemes
8.2 Supporting multiple calling conventions in 32-bit mode
8.3 Supporting multiple calling conventions in 64-bit mode
8.4 Supporting different object file formats
8.5 Supporting other high level languages
9.1 Identify the most critical parts of your code
9.3 Instruction fetch, decoding and retirement
9.4 Instruction latency and throughput
10.1 Choosing shorter instructions
10.2 Using shorter constants and addresses
10.5 Addresses and pointers in 64-bit mode
10.6 Making instructions longer for the sake of alignment
10.7 Using multi-byte NOPs for alignment
11.6 Organizing data for improved caching
11.7 Organizing code for improved caching
11.8 Cache control instructions
12.5 Instruction fetch, decoding and retirement in a loop
12.6 Distribute µops evenly between execution units
12.7 An example of analysis for bottlenecks in vector loops
12.9 Same example on Sandy Bridge
12.13 Vector loops using mask registers (AVX512)
12.17 Loops on processors without out-of-order execution
13.1 Conditional moves in SIMD registers
13.2 Using vector instructions with other types of data than they are intended for
13.5 Accessing unaligned data and partial vectors
13.6 Using AVX instruction set and YMM registers
13.7 Vector operations in general purpose registers
15.1 Checking for operating system support for XMM and YMM registers
16.1 LEA instruction (all processors)
16.5 Rotates through carry (all processors)
16.6 Bit test (all processors)
16.7 LAHF and SAHF (all processors)
16.8 Integer multiplication (all processors)
16.9 Division (all processors)
16.10 String instructions (all processors)
16.11 Vectorized string instructions (processors with SSE4.2)
16.12 WAIT instruction (all processors)
16.13 FCOM + FSTSW AX (all processors)
16.15 FRNDINT (all processors)
16.16 FSCALE and exponential function (all processors)
16.19 FLDCW (most Intel processors)
17.1 XMM versus floating point registers
17.4 Freeing floating point registers (all processors)
17.5 Transitions between floating point and MMX instructions
17.6 Converting from floating point to integer (all processors)
17.7 Using integer instructions for floating point operations
17.8 Using floating point instructions for integer operations
17.9 Moving blocks of data (all processors)
17.10 Self-modifying code (all processors)