2.1 Things to decide before you start programming
Before you start to program in assembly, you have to think about why you want to use assembly language, which part of your program you need to make in assembly, and what programming method to use. If you haven't made your development strategy clear, then you will soon find yourself wasting time optimizing the wrong parts of the program, doing things in assembly that could have been done in C++, attempting to optimize things that cannot be optimized further, making spaghetti code that is difficult to maintain, and making code that is full or errors and difficult to debug.
Here is a checklist of things to consider before you start programming:
Assembly code is error prone, difficult to debug, difficult to make in a clearly structured way, difficult to read, and difficult to maintain, as I have already mentioned. A consistent test strategy can ameliorate some of these problems and save you a lot of time.
My recommendation is to make the assembly code as an isolated module, function, class or library with a well-defined interface to the calling program. Make it all in C++ first. Then make a test program which can test all aspects of the code you want to optimize. It is easier and safer to use a test program than to test the module in the final application.
The test program has two purposes. The first purpose is to verify that the assembly code works correctly in all situations. And the second purpose is to test the speed of the assembly code without invoking the user interface, file access and other parts of the final application program that may make the speed measurements less accurate and less reproducible.
You should use the test program repeatedly after each step in the development process and after each modification of the code.
Make sure the test program works correctly. It is quite common to spend a lot of time looking for an error in the code under test when in fact the error is in the test program.
There are different test methods that can be used for verifying that the code works correctly. A white box test supplies a carefully chosen series of different sets of input data to make sure that all branches, paths and special cases in the code are tested. A black box test supplies a series of random input data and verifies that the output is correct. A very long series of random data from a good random number generator can sometimes find rarely occurring errors that the white box test hasn't found.
The test program may compare the output of the assembly code with the output of a C++ implementation to verify that it is correct. The test should cover all boundary cases and preferably also illegal input data to see if the code generates the correct error responses.
The speed test should supply a realistic set of input data. A significant part of the CPU time may be spent on branch mispredictions in code that contains a lot of branches. The amount of branch mispredictions depends on the degree of randomness in the input data. You may experiment with the degree of randomness in the input data to see how much it influences the computation time, and then decide on a realistic degree of randomness that matches a typical real application.
An automatic test program that supplies a long stream of test data will typically find more errors and find them much faster than testing the code in the final application. A good test program will find most errors, but you cannot be sure that it finds all errors. It is possible that some errors show up only in combination with the final application.
The following list points out some of the most common programming errors in assembly code.
1. Forgetting to save registers. Some registers have callee-save status, for example EBX. These registers must be saved in the prolog of a function and restored in the epilog if they are modified inside the function. Remember that the order of POP instructions must be the opposite of the order of PUSH instructions. See page 28 for a list of callee-save registers.
2. Unmatched PUSH and POP instructions. The number of PUSH and POP instructions must be equal for all possible paths through a function. Example:
Example 2.1. Unmatched push/pop
push ebx
test ecx, ecx
jz Finished
...
pop ebx
Finished: ; Wrong! Label should be before pop ebx
ret
Here, the value of EBX that is pushed is not popped again if ECX is zero. The result is that the RET instruction will pop the former value of EBX and jump to a wrong address.
3. Using a register that is reserved for another purpose. Some compilers reserve the use of EBP or EBX for a frame pointer or other purpose. Using these registers for a different purpose in inline assembly can cause errors.
4. Stack-relative addressing after push. When addressing a variable relative to the stack pointer, you must take into account all preceding instructions that modify the stack pointer. Example:
Example 2.2. Stack-relative addressing
mov [esp+4], edi
push ebp
push ebx
cmp esi, [esp+4] ; Probably wrong!
Here, the programmer probably intended to compare ESI with EDI, but the value of ESP that is used for addressing has been changed by the two PUSH instructions, so that ESI is in fact compared with EBP instead.
5. Confusing value and address of a variable. Example:
Example 2.3. Value versus address (MASM syntax)
.data
MyVariable DD 0 ; Define variable
.code
mov eax, MyVariable ; Gets value of MyVariable
mov eax, offset MyVariable; Gets address of MyVariable
lea eax, MyVariable ; Gets address of MyVariable
mov ebx, [eax] ; Gets value of MyVariable through pointer
mov ebx, [100] ; Gets the constant 100 despite brackets
mov ebx, ds:[100] ; Gets value from address 100
6. Ignoring calling conventions. It is important to observe the calling conventions for functions, such as the order of parameters, whether parameters are transferred on the stack or in registers, and whether the stack is cleaned up by the caller or the called function. See page 27.
7. Function name mangling. A C++ code that calls an assembly function should use extern "C" to avoid name mangling. Some systems require that an underscore (_) is put in front of the name in the assembly code. See page 30.
8. Forgetting return. A function declaration must end with both RET and ENDP. Using one of these is not enough. The execution will continue in the code after the procedure if there is no RET.
9. Forgetting stack alignment. The stack pointer must point to an address divisible by 16 before any call statement, except in 16-bit systems and 32-bit Windows. See page 27.
10. Forgetting shadow space in 64-bit Windows. It is required to reserve 32 bytes of empty stack space before any function call in 64-bit Windows. See page 30.
11. Mixing calling conventions. The calling conventions in 64-bit Windows and 64-bit Linux are different. See page 27.
12. Forgetting to clean up floating point register stack. All floating point stack registers that are used by a function must be cleared, typically with FSTP ST(0), before the function returns, except for ST(0) if it is used for return value. It is necessary to keep track of exactly how many floating point registers are in use. If a functions pushes more values on the floating point register stack than it pops, then the register stack will grow each time the function is called. An exception is generated when the stack is full. This exception may occur somewhere else in the program.
13. Forgetting to clear MMX state. A function that uses MMX registers must clear these with the EMMS instruction before any call or return.
14. Forgetting to clear YMM state. A function that uses YMM registers must clear these with the VZEROUPPER or VZEROALL instruction before any call or return.
15. Forgetting to clear direction flag. Any function that sets the direction flag with STD must clear it with CLD before any call or return.
16. Mixing signed and unsigned integers. Unsigned integers are compared using the JB and JA instructions. Signed integers are compared using the JL and JG instructions. Mixing signed and unsigned integers can have unintended consequences.
17. Forgetting to scale array index. An array index must be multiplied by the size of one array element. For example mov eax, MyIntegerArray[ebx*4].
18. Exceeding array bounds. An array with n elements is indexed from 0 to n - 1, not from 1 to n. A defective loop writing outside the bounds of an array can cause errors elsewhere in the program that are hard to find.
19. Loop with ECX = 0. A loop that ends with the LOOP instruction will repeat 232 times if ECX is zero. Be sure to check if ECX is zero before the loop.
20. Reading carry flag after INC or DEC. The INC and DEC instructions do not change the carry flag. Do not use instructions that read the carry flag, such as ADC, SBB, JC, JBE, SETA, etc. after INC or DEC. Use ADD and SUB instead of INC and DEC to avoid this problem.