_InterlockedCompareExchange vs _InterlockedCompareExchange_np

I am developing very specialized and very optimized kernel code (targeting Windows 7 to Windows 10, x64), and I need some of the Interlocked intrinsics, in particular _InterlockedCompareExchange. The documentation says that there are many variants of the function, but for some of them the description is quite cryptic. MSDN says "The intrinsics with an _np ("no prefetch") suffix prevent a possible prefetch operation from being inserted by the compiler". Does it mean that the compiler may mess up my code by reordering instructions? Should I always use the "_np" version? I tried both functions and the resulting machine code is the same. Any hints?
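For reference, here is a minimal sketch of how either variant is typically used in a compare-exchange retry loop; it assumes MSVC and <intrin.h>, and the helper name increment_capped is made up for illustration. Substituting the _np variant changes nothing in the logic.

    #include <intrin.h>

    /* Bump *counter, but never past 'limit'. _InterlockedCompareExchange
     * returns the previous value of *counter; the store happened only if
     * that value equals 'observed', otherwise we retry with the new value. */
    long increment_capped(volatile long *counter, long limit)
    {
        long observed, desired;
        do {
            observed = *counter;
            desired = (observed < limit) ? observed + 1 : observed;
        } while (_InterlockedCompareExchange(counter, desired, observed) != observed);
        return desired;
    }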

How to use x86intrin.h

In one of my applications, I need to efficiently de-interleave bits in a long stream of data. Ideally, I would like to use the BMI2 pext_u32() and/or pext_u64() x86_64 intrinsic instructions when available. I scoured the internet for doc on x86intrin.h (GCC), but couldn't find much on the subject; so, I am asking the gurus on StackOverflow to help me out.
Where can I find documentation about how to work with functions in x86intrin.h?
Does gcc's implementation of pext_*() already have code behind it to fall back on, or do I need to write the fallback code myself (for conditional compile)?
Is it possible to write a binary that automatically falls back to an alternate implementation if a target does not support the intrinsic? If so, how does one do so?
Is there a known programming pattern that will be recognized by GCC and automatically converted to pext_*() when compiling with optimization enabled and with -mbmi2?
Intel publishes the Intrinsics Guide, which also applies to GCC. You will have to write your own fallback code if you use these intrinsics.
You can achieve automatic switching of implementations by using IFUNC resolvers, but for non-library code, using conditionals or function pointers is probably simpler.
Looking at the gcc/config/i386/i386.md and gcc/config/i386/i386.c files, I don't see anything in GCC 8 which would automatically select the pext instruction without intrinsics in the source code.
The design philosophy of Intel's intrinsics is that you can only use them in functions that will run only on CPUs with the required extensions. Checking for support before every instruction would add far too much overhead, and there would have to be a fallback (there isn't one).
Intel intrinsics are not like GNU C __builtin_popcountll (which does use a fallback if compiled without -mpopcnt). Note, though, that you can enable target options on a per-function basis with attributes.
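As a concrete sketch of the "write your own fallback and dispatch at run time" approach described above, the following uses GCC/Clang's __builtin_cpu_supports for the check; the helper names pext64, pext64_bmi2 and pext64_fallback are made up for illustration.

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    /* Portable bit-by-bit parallel extract: packs the bits of src selected
     * by mask into the low bits of the result. */
    static uint64_t pext64_fallback(uint64_t src, uint64_t mask)
    {
        uint64_t result = 0, out_bit = 1;
        while (mask != 0) {
            uint64_t lowest = mask & -mask;   /* isolate lowest set bit of mask */
            if (src & lowest)
                result |= out_bit;
            out_bit <<= 1;
            mask &= mask - 1;                 /* clear that bit */
        }
        return result;
    }

    /* Fast path: BMI2 enabled for this one function only. */
    __attribute__((target("bmi2")))
    static uint64_t pext64_bmi2(uint64_t src, uint64_t mask)
    {
        return _pext_u64(src, mask);
    }

    static uint64_t pext64(uint64_t src, uint64_t mask)
    {
        if (__builtin_cpu_supports("bmi2"))   /* runtime CPU feature check */
            return pext64_bmi2(src, mask);
        return pext64_fallback(src, mask);
    }

    int main(void)
    {
        /* Extract the 32 odd-position bits of an alternating pattern. */
        uint64_t r = pext64(0xAAAAAAAAAAAAAAAAull, 0xAAAAAAAAAAAAAAAAull);
        printf("%llx\n", (unsigned long long)r);   /* prints ffffffff */
        return 0;
    }

For library code that should pick the implementation once at load time, the IFUNC mechanism mentioned above does the same job without a branch per call.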

What are some "real-life" uses of inline assembly? [duplicate]

Is there anything that we can do in assembly that we can't do in raw C? Or anything which is easier to do in assembly? Is any modern code actually written using inline assembly, or is it simply implemented as a legacy or educational feature?
Inline assembly (and on a related note, calling external functions written purely in assembly) can be extremely useful or absolutely essential for reasons such as writing device drivers, direct access to hardware or processor capabilities not defined in the language, hardware-supported parallel processing (as opposed to multi-threading) such as CUDA, interfacing with FPGAs, performance, etc.
It is also important because some things are only possible by going "beneath" the level of abstraction provided by the Standard (both C++ and C).
The Standard(s) recognize that some things will be inherently implementation-defined, and allow for that throughout the Standard. One of these allowances (perhaps the lowest-level) is recognition of asm. Well, "sort of" recognition:
In C (N1256), it is found in the Standard under "Common extensions":
J.5.10 The asm keyword
1 The asm keyword may be used to insert assembly language directly into the translator output (6.8). The most common implementation is via a statement of the form:
asm ( character-string-literal );
In C++ (N3337), it has similar caveats:
§7.4/1
An asm declaration has the form
asm-definition:
asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note]
An important development in recent years is that attempting to increase performance with inline assembly is often counter-productive, unless you know exactly what you are doing. The compiler/optimizer's register-usage decisions and its awareness of pipeline and branch-prediction behavior are almost always sufficient for most uses.
On the other hand, processors in recent years have added CPU-level support for higher-level operations (such as Intel's AES extensions) that can increase performance by several orders of magnitude for specialized applications.
So:
Legacy feature? Not at all. It is absolutely essential for some requirements.
Educational feature? In an ideal world, only if accompanied by a series of lectures explaining why you'll probably never need it, and if you ever do need it, how to limit its visible surface area to the rest of your application as much as possible.
You also need to code with inlined asm when:
you need to use some processor features not usable in standard C; typically, the add with carry machine instruction is useful in bignum implementations like GMPlib
on today's processors with current optimizing compilers, you usually should not use asm for performance reasons, since compilers optimize better than you can (an old example was implementing memcpy with rep movsb on x86).
you need some asm when you are using or implementing a different ABI. For example, the runtime systems of some OCaml or Common Lisp implementations have different calling conventions, and transitioning to them may require asm; but the current libffi (which itself uses asm) may spare you from writing asm yourself
your brand-new hardware might have a recent instruction set not fully implemented by your compiler (e.g. extensions like AVX512...) for which you might need asm
you want to implement some functionality not implementable in C, e.g. backtrace
In general, you should think more than twice before using asm, and if you do use it, use it in very few places. In short, avoid asm when you can.
The GCC compiler introduced an extended asm feature which has nearly become a de facto standard supported by many other compilers (e.g. Clang/LLVM...) - but the devil is in the details. See also the GCC Inline Assembly HowTo
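As a tiny illustration of that extended asm syntax (and of the add-with-carry bullet above), here is a sketch of a 128-bit add built on the x86-64 add/adc carry chain; the helper name add128 is made up for illustration.

    #include <stdint.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* 128-bit add via add/adc. a and b are little-endian limb pairs:
     * element 0 holds the low 64 bits. */
    static void add128(uint64_t a[2], const uint64_t b[2])
    {
        __asm__ ("addq %2, %0\n\t"   /* a[0] += b[0], sets the carry flag */
                 "adcq %3, %1"       /* a[1] += b[1] + carry              */
                 : "+r"(a[0]), "+r"(a[1])
                 : "r"(b[0]), "r"(b[1])
                 : "cc");
    }

    int main(void)
    {
        uint64_t a[2] = { UINT64_MAX, 0 };   /* 2^64 - 1 */
        uint64_t b[2] = { 1, 0 };
        add128(a, b);                        /* carries into the high limb */
        printf("%" PRIu64 " %" PRIu64 "\n", a[1], a[0]);   /* prints: 1 0 */
        return 0;
    }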
The Linux kernel (and the many libc implementations, e.g. glibc or musl libc, etc...) uses asm (at least to make syscalls), but few other major free software projects use asm instructions directly.
Read also the Linux Assembly HowTo

What's the purpose of using assembly language inside a C program?

What's the purpose of using assembly language inside a C program? Compilers are able to generate assembly language already. In what cases would it be better to write assembly than C? Is performance a consideration?
In addition to what everyone said: not all CPU features are exposed to C. Sometimes, especially in driver and operating system programming, one needs to explicitly work with special registers and/or commands that are not otherwise available.
Also vector extensions.
That was especially true before the advent of compiler intrinsics. Those alleviate the need for inline assembly somewhat.
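To make that concrete, here is a minimal sketch of using SSE2 intrinsics from <emmintrin.h> instead of inline assembly; the compiler chooses the registers and emits the paddd instruction itself (assumes an x86/x86-64 target).

    #include <emmintrin.h>
    #include <stdio.h>

    int main(void)
    {
        __m128i a = _mm_set_epi32(4, 3, 2, 1);      /* lanes 1,2,3,4 from low to high */
        __m128i b = _mm_set_epi32(40, 30, 20, 10);
        __m128i sum = _mm_add_epi32(a, b);          /* four 32-bit adds at once */

        int out[4];
        _mm_storeu_si128((__m128i *)out, sum);
        printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);   /* 11 22 33 44 */
        return 0;
    }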
One more use case for inline assembly has to do with interfacing C with reflected languages. Specifically, assembly is all but necessary if you need to call a function when its prototype is not known at compile time. In other words, when the quantity and datatypes of that function's arguments are but runtime variables. C variadic functions and the stdarg machinery won't help you in this case - they would help you parse a stack frame, but not build one. In assembly, on the other hand, it's quite doable.
This is not an OS/driver scenario. There are at least two technologies out there - Java's JNI and COM Automation - where this is a must. In the case of Automation, I'm talking about the way the COM runtime marshals dual interfaces using their type libraries.
I can think of a very crude C alternative to assembly for that, but it'd be ugly as sin. Slightly less ugly in C++ with templates.
Yet another use case: crash/run-time error reporting. For postmortem debugging, you'd want to capture as much of the program state at the point of crash as possible (i.e. all the CPU registers), and assembly is a much better vehicle for that than C. Postmortem debugging of crashing native code usually involves staring at the assembly anyway.
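On Linux/x86-64 the kernel already hands a signal handler the saved register state, so a sketch of the C side of that idea looks like the following (assuming glibc; capturing every register portably, or without a signal context at all, is where assembly comes in). The fprintf call is not async-signal-safe and is used only to keep the demo short.

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <ucontext.h>

    /* Report the faulting address and instruction pointer, then exit. */
    static void crash_handler(int sig, siginfo_t *info, void *ctx)
    {
        ucontext_t *uc = (ucontext_t *)ctx;
        fprintf(stderr, "signal %d at %p, RIP = %#llx\n",
                sig, info->si_addr,
                (unsigned long long)uc->uc_mcontext.gregs[REG_RIP]);
        _exit(1);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = crash_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        *(volatile int *)0 = 42;   /* deliberately crash */
        return 0;
    }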
Yet another use case - code that is intended for execution in another process without that process' co-operation or knowledge. This is often referred to as "shellcode", but it doesn't have to be shell related. Code like that needs to be very carefully written, and it can't rely on the conveniences of a high level language (like the run time library, or having a data section) that are normally taken for granted. When one is after injecting a significant piece of functionality into a target process, they usually end up loading a dynamic library, but the initial trampoline code that loads the library and passes control to it tends to be in assembly.
I've been only covering cases where assembly is necessary. Hand-optimizing for performance is covered in other answers.
There are a few, although not many, cases where hand-optimized assembly language can be made to run more efficiently than assembly language generated by C compilers from C source code. Also, for developers used to assembly language, some things can just seem easier to write in assembler.
For these cases, many C compilers allow inline assembly.
However, this is becoming increasingly rare as C compilers get better and better at producing efficient code, and most platforms put restrictions on exactly the kind of low-level software that benefits most from being written in assembler.
In general, it is performance, but performance of a very specific kind. For example, the SIMD parallel instructions of a processor might not be generated by the compiler. By utilizing processor-specific data formats and then issuing processor-specific parallel instructions (e.g. ARM NEON or Intel SSE), you can get very fast performance on graphics or signal-processing problems. Even then, some compilers allow these to be expressed in C using intrinsic functions.
While it used to be common to use assembly language inserts to hand-optimize critical functions, those days are largely done. Modern compilers are very good and modern processors have very complicated timing requirements so hand optimized code is often less optimal than expected.
There are various reasons to write inline assembly in C. We can broadly categorize the reasons into necessary and unnecessary.
The unnecessary reasons might be:
platform compatibility
performance concerns
code optimization
etc.
I consider the above unnecessary because they can sometimes be dropped or implemented in pure C. Taking platform compatibility as an example, you could implement a separate version for each platform entirely in C; inline assembly merely reduces the effort. We are not going to say much more about the unnecessary reasons here.
The necessary reasons might be:
something the standard libraries are insufficient to do
an instruction set not supported by the compiler
object code that the compiler generates incorrectly
writing stack-sensitive code
etc.
These reasons are considered necessary because they are almost impossible to handle in pure C. For example, under old versions of DOS, the INT 21h software interrupt was not reentrant. If you wanted to write a virtual driver relying fully on the compiler's INT 21h support, it was impossible: you would need to hook the original INT 21h and make it reentrant, but the compiled code wraps every call with a prologue/epilogue, so you could never get around that restriction without crashing the code. You can try all sorts of tricks in pure C with libraries, but even if you find one that works, it only means you have found a particular arrangement that makes the compiler emit exactly the machine code you need; in other words, you are trying to coerce the compiler into producing specific machine code. So why not just write the inline assembly directly?
This example covers all of the necessary reasons above except unsupported instruction sets, but I think that one is easy to imagine.
In fact, there are more reasons to write inline assembly, but by now you should have some idea of them.
Just as a curiosity, I'm adding here a concrete example of something not-so-low-level you can only do in assembly. I read this in an assembly book from my university time where it was used to show an inherent limitation of C/C++, and how to overcome it with assembly.
The problem is: how do I invoke a function when the exact number of parameters is only known at runtime? In fact, in C/C++ you can easily define a function that takes a variable number of arguments, like printf. But when it comes to calling that function, the compiler wants to know exactly how many parameters must be passed. You may pass more parameters than required; that won't do any harm. But what if the number grows unexpectedly to 100 or 1000 parameters, and must be picked out of an array?
The solution of course is using assembly, where you can dynamically create a stack frame of the proper size, copy the parameters on the stack, invoke the function, and finally reset the stack.
In practice, this would hardly ever be a limitation (except if the library you're using is really, really badly designed). People who use assembly in C have much better reasons to do so, as others have pointed out in their answers. Still, I think it may be an interesting fact to know.
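As a hedged aside, the same runtime-arity problem can these days often be solved without hand-written assembly by using libffi (mentioned in an earlier answer): you describe the argument list at run time and libffi builds the call frame for you. ffi_prep_cif and ffi_call are real libffi entry points; the target function sum3 is made up for illustration.

    #include <ffi.h>
    #include <stdio.h>

    static int sum3(int a, int b, int c) { return a + b + c; }

    int main(void)
    {
        unsigned nargs = 3;                    /* imagine this arrived at run time */
        ffi_type *arg_types[3] = { &ffi_type_sint, &ffi_type_sint, &ffi_type_sint };
        int a = 1, b = 2, c = 3;
        void *arg_values[3] = { &a, &b, &c };

        ffi_cif cif;                           /* the call interface description */
        if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, nargs,
                         &ffi_type_sint, arg_types) == FFI_OK) {
            ffi_arg result;                    /* wide enough for any integer return */
            ffi_call(&cif, FFI_FN(sum3), &result, arg_values);
            printf("%d\n", (int)result);       /* prints 6 */
        }
        return 0;
    }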
I would rather think of it as a way to write very platform-specific code; optimization, though still common, is used less nowadays. Knowledge and usage of assembly in C is also practiced by hats of all colors.

How to create a C compiler for custom CPU?

What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?
Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?
Preferably the compiler itself would be written in C, and built as a native executable for either Linux or Windows.
Please note: I am not asking how to write the compiler itself. I did take that course in college, I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language, I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to myself and anyone else who might make similar assumptions.
Quick overview/tutorial on writing an LLVM backend.
This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.
[ . . . ]
To create a static compiler (one that emits text assembly), you need to implement the following:
Describe the register set.
Describe the instruction set.
Describe the target machine.
Implement the assembly printer for the architecture.
Implement an instruction selector for the architecture.
There's the concept of a cross-compiler, i.e., one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.
Edit: I just spotted a question from a few years ago on a GCC mailing list about how to add a new target, and someone pointed to this
vbcc (at www.compilers.de) is a good and simple retargetable C-compiler written in C. It's much simpler than GCC/LLVM. It's so simple I was able to retarget the compiler to my own CPU with a few weeks of work without having any prior knowledge of compilers.
The short answer is that it doesn't work that way.
The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):
Syntactic analysis (lexing and parsing), plus preprocessing in the case of C, leading to an abstract syntax tree.
Type checking, leading to an annotated abstract syntax tree.
Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.
In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.
1) Short answer:
"No. There's no such thing as a "compiler framework" where you can just add water (plug in your own assembly set), stir, and it's done."
2) Longer answer: it's certainly possible. But challenging. And likely expensive.
If you wanted to do it yourself, I'd start by looking at Gnu CC. It's already available for a large variety of CPUs and platforms.
3) Take a look at this link for more ideas (including the idea of "just build a library of functions and macros"), that would be my first suggestion:
http://www.instructables.com/answers/Custom-C-Compiler-for-homemade-instruction-set/
You can modify existing open source compilers such as GCC or Clang. Other answers have provided you with links about where to learn more. But these compilers are not designed to be easily retargeted; they are only "easier" to retarget than compilers wired for specific targets.
But if you want a compiler that is relatively easy to retarget, you want one in which you can specify the machine architecture in explicit terms, and some tool generates the rest of the compiler (GCC does a bit of this; I don't think Clang/LLVM does much but I could be wrong here).
There's a lot of this in the literature, google "compiler-compiler".
But for a concrete solution for C, you should check out ACE, a compiler vendor that generates compilers on demand for customers. Not free, but I hear they produce very good compilers very quickly. I think it produces standard style binaries (ELF?) so it skips the assembler stage. (I have no experience or relationship with ACE.)
If you don't care about code quality, you can likely write a syntax-directed translation of C to assembler using a C AST. You can get C ASTs from GCC, Clang, maybe ANTLR, and from our DMS Software Reengineering Toolkit.

Resources