5 Using intrinsic functions in C++
As already mentioned, there are three different ways of making assembly code: using intrinsic functions and vector classes in C++, using inline assembly in C++, and making separate assembly modules. Intrinsic functions are described in this chapter. The other two methods are described in the following chapters.
Intrinsic functions and vector classes are highly recommended because they are much easier and safer to use than assembly language syntax.
The Microsoft, Intel, Gnu and Clang C++ compilers have support for intrinsic functions. Most of the intrinsic functions generate one machine instruction each. An intrinsic function is therefore equivalent to an assembly instruction.
Coding with intrinsic functions is a kind of high-level assembly. It can easily be combined with C++ language constructs such as if-statements, loops, functions, classes and operator overloading. Using intrinsic functions is an easier way of doing high-level assembly coding than using .if constructs etc. in an assembler or using the so-called High Level Assembler (HLA).
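As an illustration of how intrinsic functions combine with classes, operator overloading and loops, the following is a minimal sketch and not a complete vector class. The names Vec4f and sum_of_squares are hypothetical and chosen only for this example. It assumes an x86 target with the SSE instruction set and that the array length is a multiple of four.

```cpp
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps, ...

// Hypothetical wrapper around a 128-bit register holding four floats.
struct Vec4f {
    __m128 v;
    Vec4f(__m128 x) : v(x) {}
    Vec4f(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}
    // Store the four elements to memory (no alignment requirement).
    void store(float* p) const { _mm_storeu_ps(p, v); }
};

// Each operator wraps a single intrinsic function (ADDPS and MULPS).
inline Vec4f operator+(Vec4f a, Vec4f b) { return Vec4f(_mm_add_ps(a.v, b.v)); }
inline Vec4f operator*(Vec4f a, Vec4f b) { return Vec4f(_mm_mul_ps(a.v, b.v)); }

// Ordinary C++ control flow mixed with the intrinsic-based type.
// Assumption: n is a multiple of 4.
// Element k of the result holds the sum of squares of data[k], data[k+4], ...
inline Vec4f sum_of_squares(const float* data, int n) {
    Vec4f acc(0.0f, 0.0f, 0.0f, 0.0f);
    for (int i = 0; i < n; i += 4) {
        Vec4f x(_mm_loadu_ps(data + i));   // unaligned load of four floats
        acc = acc + x * x;
    }
    return acc;
}
```

The caller can store the result with acc.store(array) and add the four partial sums with ordinary C++ code.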
The invention of intrinsic functions has made it much easier to do programming tasks that previously required coding with assembly syntax. The advantages of using intrinsic functions are:
The disadvantages of using intrinsic functions are:
5.1 Using intrinsic functions for system code
Intrinsic functions are useful for making system code and for accessing system registers that are not accessible with standard C++. Some of these functions are listed below.
Functions for accessing system registers:
__rdtsc, __readpmc, __readmsr, __readcr0, __readcr2, __readcr3, __readcr4,
__readcr8, __writecr0, __writecr3, __writecr4, __writecr8, __writemsr,
_mm_getcsr, _mm_setcsr, __getcallerseflags.
Functions for input and output:
__inbyte, __inword, __indword, __outbyte, __outword, __outdword.
Functions for atomic memory read/write operations:
_InterlockedExchange, etc.
Functions for accessing FS and GS segments:
__readfsbyte, __writefsbyte, etc.
Cache control instructions (require the SSE or SSE2 instruction set):
_mm_prefetch, _mm_stream_si32, _mm_stream_pi, _mm_stream_si128, _ReadBarrier,
_WriteBarrier, _ReadWriteBarrier, _mm_sfence.
Other system functions:
__cpuid, __debugbreak, _disable, _enable.
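Two of the functions listed above that can be used in user mode are sketched below: __rdtsc for reading the time stamp counter, and _mm_getcsr / _mm_setcsr for reading and writing the MXCSR register. The helper names and the dummy workload are hypothetical. The header x86intrin.h for Gnu and Clang compilers is an assumption that should be checked against the compiler documentation. Many of the other functions in the lists are privileged and will only work in kernel mode.

```cpp
#include <cstdint>
#include <xmmintrin.h>      // _mm_getcsr, _mm_setcsr
#if defined(_MSC_VER)
#include <intrin.h>         // __rdtsc on Microsoft compilers
#else
#include <x86intrin.h>      // __rdtsc on Gnu and Clang compilers (assumption)
#endif

// Count time stamp counter ticks spent in a dummy loop.
// The loop is only a placeholder for the code to be measured.
inline std::uint64_t clock_ticks_for_loop(int iterations) {
    volatile int sink = 0;                 // volatile prevents removal of the loop
    std::uint64_t start = __rdtsc();
    for (int i = 0; i < iterations; i++) sink = sink + i;
    std::uint64_t stop = __rdtsc();
    return stop - start;
}

// Set the flush-to-zero bit (bit 15) in MXCSR.
// Returns the previous MXCSR value so that the caller can restore it.
inline unsigned int enable_flush_to_zero() {
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | 0x8000u);
    return old;
}
```

The previous MXCSR value can be restored with _mm_setcsr(old) when the calculation is finished.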
5.2 Using intrinsic functions for instructions not available in standard C++
Some simple instructions that are not available in standard C++ can be coded with intrinsic functions, for example functions for bit-rotate, bit-scan, etc.:
_rotl8, _rotr8, _rotl16, _rotr16, _rotl, _rotr, _rotl64, _rotr64, _BitScanForward,
_BitScanReverse.
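A minimal sketch of how the rotate and bit-scan functions might be wrapped is shown below. The wrapper names are hypothetical. _BitScanForward and _BitScanReverse are used on Microsoft compilers only; for Gnu and Clang compilers the sketch assumes the builtin functions __builtin_ctz and __builtin_clz as substitutes, and assumes that _rotl is provided by x86intrin.h.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>         // _rotl, _BitScanForward, _BitScanReverse
#else
#include <x86intrin.h>      // _rotl on Gnu and Clang compilers (assumption)
#endif

// Rotate a 32-bit value left by count bits.
inline unsigned int rotate_left(unsigned int x, int count) {
    return _rotl(x, count);
}

// Index of the lowest set bit, or -1 if x is zero.
inline int lowest_set_bit(unsigned int x) {
    if (x == 0) return -1;                 // bit scan is undefined for zero
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);
    return (int)index;
#else
    return __builtin_ctz(x);               // count trailing zeroes
#endif
}

// Index of the highest set bit, or -1 if x is zero.
inline int highest_set_bit(unsigned int x) {
    if (x == 0) return -1;
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, x);
    return (int)index;
#else
    return 31 - __builtin_clz(x);          // count leading zeroes
#endif
}
```

The explicit check for zero is a design choice in this sketch, because the result of a bit scan on a zero operand is not defined.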
5.3 Using intrinsic functions for vector operations
Vector instructions are very useful for improving the speed of code with inherent parallelism. There are intrinsic functions for almost all instructions on vector registers.
The use of these intrinsic functions for vector operations is thoroughly described in manual 1: "Optimizing software in C++".
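As a simple illustration, the following sketch adds two arrays of 32-bit integers, four elements at a time. It assumes the SSE2 instruction set. The function name add_arrays is hypothetical. Unaligned loads and stores are used so that the arrays need no particular alignment, and a scalar loop handles any remaining elements.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics: __m128i, _mm_add_epi32, ...
#include <cstddef>

// c[i] = a[i] + b[i] for n 32-bit integers.
inline void add_arrays(const int* a, const int* b, int* c, std::size_t n) {
    std::size_t i = 0;
    // Main loop: four additions per iteration (PADDD instruction).
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i vc = _mm_add_epi32(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(c + i), vc);
    }
    // Remaining 0 to 3 elements are handled with scalar code.
    for (; i < n; i++) c[i] = a[i] + b[i];
}
```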
5.4 Availability of intrinsic functions
The intrinsic functions are available on newer versions of Microsoft, Gnu and Intel compilers. Most intrinsic functions have the same names in all three compilers. You have to include a header file named intrin.h or emmintrin.h to get access to the intrinsic functions. The Codeplay compiler has limited support for intrinsic vector functions, but the function names are not compatible with the other compilers.
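One possible way of selecting the header file is sketched below. The use of x86intrin.h for non-Microsoft compilers is an assumption, and the function horizontal_sum is a hypothetical example that uses only intrinsic functions declared in emmintrin.h (SSE2).

```cpp
#if defined(_MSC_VER)
#include <intrin.h>        // Microsoft compilers
#else
#include <x86intrin.h>     // Gnu and Clang compilers (assumption)
#endif
#include <emmintrin.h>     // SSE2 intrinsics

// Sum of four integers, computed in a vector register.
inline int horizontal_sum(int a, int b, int c, int d) {
    __m128i v = _mm_setr_epi32(a, b, c, d);
    // Swap the upper and lower 64-bit halves and add: element 0 = a + c, element 1 = b + d.
    __m128i s = _mm_add_epi32(v, _mm_shuffle_epi32(v, 0x4E));
    // Swap adjacent elements and add: element 0 = a + b + c + d.
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1));
    return _mm_cvtsi128_si32(s);           // extract element 0
}
```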
The intrinsic functions are listed in the help documentation for each compiler, in the appropriate header files, on msdn.microsoft.com, in "Intel 64 and IA-32 Architectures Software Developer’s Manual" (developer.intel.com) and in "Intel Intrinsics Guide" (softwareprojects.intel.com/avx/).