Inline assembly is another way of putting assembly code into a C++ file. The keyword asm or _asm or __asm or __asm__ tells the compiler that the code is assembly. Different compilers have different syntaxes for inline assembly. The different syntaxes are explained below.
The advantages of using inline assembly are:
The disadvantages of using inline assembly are:
The following sections illustrate how to make inline assembly with different compilers.
6.1 MASM style inline assembly
The most common syntax for inline assembly is a MASM-style syntax. This is the easiest way of writing inline assembly, and it is supported by most compilers, but not by the GNU compiler. Unfortunately, the syntax for inline assembly is poorly documented, or not documented at all, in the compiler manuals. I will therefore briefly describe the syntax here.
The following examples show a function that raises a floating point number x to an integer power n. The algorithm is to multiply together x^1, x^2, x^4, x^8, etc. according to the bits in the binary representation of n: x^(2^i) is included in the product if bit number i of n is 1. Actually, it is not necessary to code this in assembly, because a good compiler will optimize it almost as much when you just write pow(x,n). My purpose here is just to illustrate the syntax of inline assembly.
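Written as a formula, the algorithm uses the binary digits b_i of |n|. The case n = 13 is only a worked illustration:

```latex
|n| = \sum_{i} b_i \, 2^i, \quad b_i \in \{0, 1\}
\qquad\Longrightarrow\qquad
x^{|n|} = \prod_{i \,:\, b_i = 1} x^{2^i},
\qquad
x^{n} = \frac{1}{x^{|n|}} \ \text{if } n < 0

13 = 1101_2 \quad\Longrightarrow\quad x^{13} = x^{8} \cdot x^{4} \cdot x^{1}
```

The factors are exactly the powers x^(2^i) that the loop produces by repeated squaring.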
First the code in C++ to illustrate the algorithm:
// Example 6.1a. Raise double x to the power of int n.
#include <stdlib.h>                 // declares abs
double ipow (double x, int n) {
   unsigned int nn = abs(n);        // absolute value of n
   double y = 1.0;                  // used for multiplication
   while (nn != 0) {                // loop for each bit in nn
      if (nn & 1) y *= x;           // multiply if bit = 1
      x *= x;                       // square x
      nn >>= 1;                     // get next bit of nn
   }
   if (n < 0) y = 1.0 / y;          // reciprocal if n is negative
   return y;                        // return y = pow(x,n)
}
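One caveat with Example 6.1a: abs(n) has undefined behavior when n is INT_MIN, because the magnitude of INT_MIN does not fit in an int. A minimal variant, assuming you want that case to work, takes the magnitude in unsigned arithmetic, where negation is well defined and wraps around. Otherwise the algorithm is unchanged:

```cpp
#include <climits>   // INT_MIN (used in the usage example below)

// Variant of Example 6.1a that avoids calling abs(INT_MIN).
double ipow(double x, int n) {
    // Magnitude of n as unsigned: 0u - u is well defined (modular),
    // so n == INT_MIN gives 2147483648u with a 32-bit int.
    unsigned int nn = (n < 0) ? 0u - static_cast<unsigned int>(n)
                              : static_cast<unsigned int>(n);
    double y = 1.0;                 // product of the selected powers of x
    while (nn != 0) {               // one iteration per bit of nn
        if (nn & 1u) y *= x;        // include x^(2^i) if bit i is 1
        x *= x;                     // square: x^(2^i) becomes x^(2^(i+1))
        nn >>= 1;                   // move to the next bit
    }
    if (n < 0) y = 1.0 / y;         // reciprocal for negative n
    return y;
}
```

For example, ipow(2.0, -2) returns 0.25, and ipow(2.0, INT_MIN) returns 0.0 because the intermediate power overflows to infinity before the reciprocal is taken.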
And then the optimized code using inline assembly with MASM style syntax:
// Example 6.1b. MASM style inline assembly, 32 bit mode
double ipow (double x, int n) {
   __asm {
      mov   eax, n               // Move n to eax
      // abs(n) is calculated by inverting all bits and adding 1 if n < 0:
      cdq                        // Get sign bit into all bits of edx
      xor   eax, edx             // Invert bits if negative
      sub   eax, edx             // Add 1 if negative. Now eax = abs(n)
      fld1                       // st(0) = 1.0 (FPU instructions leave the flags unchanged)
      jz    L9                   // End if n = 0
      fld   qword ptr x          // st(0) = x, st(1) = 1.0
      jmp   L2                   // Jump into loop
   L1:                           // Top of loop
      fmul  st(0), st(0)         // Square x
   L2:                           // Loop entered here
      shr   eax, 1               // Get each bit of n into carry flag
      jnc   L1                   // No carry. Skip multiplication, goto next
      fmul  st(1), st(0)         // Multiply by x squared i times for bit # i
      jnz   L1                   // End of loop. Stop when eax = 0
      fstp  st(0)                // Discard st(0)
      test  edx, edx             // Test if n was negative
      jns   L9                   // Finish if n was not negative
      fld1                       // st(0) = 1.0, st(1) = x^abs(n)
      fdivr                      // Reciprocal
   L9:                           // Finish
   }                             // Result is in st(0)
#pragma warning(disable:1011)    // Don't warn for missing return value
}
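To follow the jumps in Example 6.1b, it may help to read the same control flow in C++. The sketch below is a hand transcription, not compiler output, and the names ipow_transcribed, sign and e (standing for edx and eax) are only illustrative. It shows two differences from Example 6.1a: the cdq/xor/sub sequence computes abs(n) without a branch, and because the loop is entered at L2, x is squared only when another bit of n remains, which avoids the useless final squaring in 6.1a.

```cpp
#include <cstdint>

// Hand transcription of the control flow of Example 6.1b.
double ipow_transcribed(double x, int n) {
    std::uint32_t u = static_cast<std::uint32_t>(n);   // mov eax, n
    std::uint32_t sign = 0u - (u >> 31);               // cdq: all ones if n < 0, else 0
    std::uint32_t e = (u ^ sign) - sign;               // xor/sub: e = abs(n), unsigned
    double y = 1.0;                                    // fld1
    if (e == 0) return y;                              // jz L9
    for (;;) {                                         // loop entered at "L2"
        std::uint32_t bit = e & 1u;                    // shr eax,1 puts this bit in CF
        e >>= 1;
        if (bit) {
            y *= x;                                    // fmul st(1), st(0)
            if (e == 0) break;                         // jnz L1 falls through when eax = 0
        }
        x *= x;                                        // L1: fmul st(0), st(0)
    }
    if (sign) y = 1.0 / y;                             // test edx,edx / jns / fld1 / fdivr
    return y;
}
```

A zero bit cannot leave e at zero here, because e was nonzero and even before the shift, so the only loop exit is after a multiplication, exactly as in the assembly.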
Note that the function entry and parameters are declared with C++ syntax. The function body, or part of it, can then be coded with inline assembly. The parameters x and n, which are declared with C++ syntax, can be accessed directly in the assembly code using the same names. The compiler simply replaces x and n in the assembly code with the appropriate memory operands, probably [esp+4] and [esp+12]. If the inline assembly code needs to access a variable that happens to be in a register, then the compiler will store it to a memory variable on the stack and then insert the address of this memory variable in the inline assembly code.
The result is returned in st(0) according to the 32-bit calling convention. The compiler will normally issue a warning because there is no return y; statement at the end of the function. This statement is not needed if you know which register to return the value in. The #pragma warning(disable:1011) removes the warning. If you want the code to work with different calling conventions (e.g. 64-bit systems), then it is necessary to store the result in a temporary variable inside the assembly block:
// Example 6.1c. MASM style, independent of calling convention
double ipow (double x, int n) {
   double result;                // Define temporary variable for result
   __asm {
      mov   eax, n
      cdq
      xor   eax, edx
      sub   eax, edx
      fld1
      jz    L9
      fld   qword ptr x
      jmp   L2
   L1:fmul  st(0), st(0)
   L2:shr   eax, 1
      jnc   L1
      fmul  st(1), st(0)
      jnz   L1
      fstp  st(0)
      test  edx, edx
      jns   L9
      fld1
      fdivr
   L9:fstp  qword ptr result     // store result to temporary variable
   }
   return result;
}
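Now the compiler takes care of all aspects of the calling convention, and the code works on all x86 platforms where the compiler supports MASM-style inline assembly. (Note that the Microsoft compiler supports inline assembly only when compiling for 32-bit x86.)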
The compiler inspects the inline assembly code to see which registers are modified, and it will automatically save and restore these registers if required by the register usage convention. Some compilers do not allow the inline assembly code to modify ebp or ebx, because these registers are needed for a stack frame. The compiler will generally issue a warning in this case.
It is possible to remove the automatically generated prolog and epilog code by adding __declspec(naked) to the function declaration. In this case it is the responsibility of the programmer to add any necessary prolog and epilog code and to save any modified registers if necessary. The only thing the compiler takes care of in a naked function is name mangling. Automatic variable name substitution may not work with naked functions because it depends on how the function prolog is made. A naked function cannot be inlined.
Accessing register variables
Register variables cannot be accessed directly by their symbolic names in MASM-style inline assembly. Accessing a variable by name in an inline assembly code will force the compiler to store the variable to a temporary memory location.
If you know which register a variable is in then you can simply write the name of the register. This makes the code more efficient but less portable.
For example, if the code in the above example is used in 64-bit Windows, then x will be in register XMM0 and n will be in register EDX. Taking advantage of this knowledge, we can improve the code:
// Example 6.1d. MASM style, 64-bit Windows
double ipow (double x, int n) {
   const double one = 1.0;       // define constant 1.0
   __asm {                       // x is in xmm0
      mov   eax, edx             // get n into eax
      cdq
      xor   eax, edx
      sub   eax, edx
      movsd xmm1, one            // load 1.0 (movsd leaves the flags unchanged)
      jz    L8                   // n = 0: result is 1.0
      jmp   L2
   L1:mulsd xmm0, xmm0          // square x
   L2:shr   eax, 1
      jnc   L1
      mulsd xmm1, xmm0          // Multiply by x squared i times
      jnz   L1
   L8:movsd xmm0, xmm1          // Put result in xmm0
      test  edx, edx
      jns   L9
      movsd xmm0, one
      divsd xmm0, xmm1          // Reciprocal
   L9: }
#pragma warning(disable:1011)    // Don't warn for missing return value
}
In 64-bit Linux we will have n in register EDI so the line mov eax,edx should be changed to mov eax,edi.
Accessing class members and structure members
Let's take as an example a C++ class containing a list of integers:
// Example 6.2a. Accessing class data members
// define C++ class
class MyList {
protected:
   int length;                   // Number of items in list
   int buffer[100];              // Store items
public:
   MyList();                     // Constructor
   void AddItem(int item);       // Add item to list
   int Sum();                    // Compute sum of items
};
MyList::MyList() {               // Constructor
   length = 0;
}
void MyList::AddItem(int item) { // Add item to list
   if (length < 100) {
      buffer[length++] = item;
   }
}
int MyList::Sum() {              // Member function Sum
   int i, sum = 0;
   for (i = 0; i < length; i++) sum += buffer[i];
   return sum;
}
Below, I will show how to code the member function MyList::Sum in inline assembly. I have not tried to optimize the code; my purpose here is simply to show the syntax.
Class members are accessed by loading 'this' into a pointer register and addressing class data members relative to the pointer with the dot operator (.).
// Example 6.2b. Accessing class members (32-bit)
int MyList::Sum() {
   __asm {
      mov   ecx, this            // 'this' pointer
      xor   eax, eax             // sum = 0
      xor   edx, edx             // loop index, i = 0
      cmp   [ecx].length, 0      // if (this->length != 0)
      je    L9
   L1:add   eax, [ecx].buffer[edx*4]  // sum += buffer[i]
      add   edx, 1               // i++
      cmp   edx, [ecx].length    // while (i < length)
      jb    L1
   L9:
   }                             // Return value is in eax
#pragma warning(disable:1011)
}
Here the 'this' pointer is accessed by its name 'this', and all class data members are addressed relative to 'this'. The offset of a class member relative to 'this' is obtained by writing the member name preceded by the dot operator. The index into the array named buffer must be multiplied by the size of each array element, which is 4 bytes for an int; hence the operand [edx*4].
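In C++ terms, [ecx].length and [ecx].buffer[edx*4] are the address held in 'this' plus a constant member offset, plus a scaled index for the array element. The sketch below makes that arithmetic explicit with offsetof. It assumes a hypothetical struct ListData with public members, because offsetof cannot name the protected members of MyList from outside the class; it is meant to show the addressing, not to replace ordinary member access.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

// Hypothetical stand-in for the data members of MyList.
struct ListData {
    int length;        // number of items in list
    int buffer[100];   // items
};

// Computes the same sum as MyList::Sum, but reads every member through
// "object address + member offset (+ index * element size)".
int SumByOffsets(const ListData* self) {
    const unsigned char* base =
        reinterpret_cast<const unsigned char*>(self);        // mov ecx, this
    int length;
    std::memcpy(&length, base + offsetof(ListData, length),
                sizeof length);                              // [ecx].length
    int sum = 0;                                             // xor eax, eax
    for (int i = 0; i < length; i++) {                       // edx = i
        int item;
        std::memcpy(&item,
                    base + offsetof(ListData, buffer)
                         + static_cast<std::size_t>(i) * sizeof(int),
                    sizeof item);                            // [ecx].buffer[edx*4]
        sum += item;                                         // sum += buffer[i]
    }
    return sum;
}
```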
Some 32-bit compilers for Windows put 'this' in ecx, so the instruction mov ecx,this can be omitted. 64-bit systems require 64-bit pointers, so ecx should be replaced by rcx and edx by rdx. 64-bit Windows has 'this' in rcx, while 64-bit Linux has 'this' in rdi. Structure members are accessed in the same way by loading a pointer to the structure into a register and using the dot operator. There is no syntax check against accessing private and protected members. There is no way to resolve the ambiguity if more than one structure or class has a member with the same name. The MASM assembler can resolve such ambiguities by using the assume directive or by putting the name of the structure before the dot, but this is not possible with inline assembly.
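The ambiguity problem is easy to see with two structures that both have a member named length. In C++ the type of the pointer tells the compiler which offset p->length refers to; in inline assembly a register such as ecx has no type, so .length alone does not say which structure is meant. Packet and Buffer below are hypothetical, and the offsets in the comments assume a 4-byte, 4-byte-aligned int, as on common x86 and x86-64 platforms:

```cpp
#include <cstddef>   // offsetof

// Two hypothetical structures that both have a member called "length".
struct Packet {
    char tag;        // offset 0
    int  length;     // offset 4 (3 bytes of padding after tag)
};

struct Buffer {
    int capacity;    // offset 0
    int reserved;    // offset 4
    int length;      // offset 8
};

// The pointer type selects the offset. No such type information
// exists for "[ecx].length" in MASM-style inline assembly.
int PacketLength(const Packet* p) { return p->length; }
int BufferLength(const Buffer* b) { return b->length; }
```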
Calling functions
Functions are called by their names in inline assembly. Member functions can only be called from other member functions of the same class. Overloaded functions cannot be called, because there is no way to resolve the ambiguity, and it is not possible to use mangled function names. It is the responsibility of the programmer to put any function parameters on the stack or in the right registers before calling a function, and to clean up the stack after the call. It is also the programmer's responsibility to save any registers that must be preserved across the function call, unless these registers are callee-saved.
Because of these complications, I recommend that you leave the assembly block and use C++ syntax when making function calls.
Syntax overview
The syntax for MASM-style inline assembly is not well described in any compiler manual I have seen. I will therefore summarize the most important rules here.
In most cases, the MASM-style inline assembly is interpreted by the compiler without in