What are some "real-life" uses of inline assembly? [duplicate] - c

Is there anything that we can do in assembly that we can't do in raw C? Or anything which is easier to do in assembly? Is any modern code actually written using inline assembly, or is it simply implemented as a legacy or educational feature?

Inline assembly (and on a related note, calling external functions written purely in assembly) can be extremely useful or absolutely essential for reasons such as writing device drivers, direct access to hardware or processor capabilities not defined in the language, hardware-supported parallel processing (as opposed to multi-threading) such as CUDA, interfacing with FPGAs, performance, etc.
It is also important because some things are only possible by going "beneath" the level of abstraction provided by the Standard (both C++ and C).
The Standard(s) recognize that some things will be inherently implementation-defined, and allow for that throughout the Standard. One of these allowances (perhaps the lowest-level) is recognition of asm. Well, "sort of" recognition:
In C (N1256), it is found in the Standard under "Common extensions":
J.5.10 The asm keyword
1 The asm keyword may be used to insert assembly language directly into the translator output (6.8). The most common implementation is via a statement of the form:
asm ( character-string-literal );
In C++ (N3337), it has similar caveats:
§7.4/1
An asm declaration has the form
asm-definition:
asm ( string-literal ) ;
The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note]
It should be noted that an important development in recent years is that attempting to increase performance by using inline assembly is often counter-productive, unless you know exactly what you are doing. Compiler/optimizer register usage decisions, awareness of pipeline and branch prediction behavior, etc., are almost always enough for most uses.
On the other hand, processors in recent years have added CPU-level support for higher-level operations (such as Intel's AES extensions) that can increase performance by a large factor for specialized applications.
So:
Legacy feature? Not at all. It is absolutely essential for some requirements.
Educational feature? In an ideal world, only if accompanied by a series of lectures explaining why you'll probably never need it, and if you ever do need it, how to limit its visible surface area to the rest of your application as much as possible.

You also need to code with inlined asm when:
you need to use some processor features not usable in standard C; typically, the add with carry machine instruction is useful in bignum implementations like GMPlib
on today's processors with current optimizing compilers, you usually should not use asm for performance reasons, since compilers generally optimize better than you can (an old example was implementing memcpy with rep movsw on x86).
you need some asm when you are using or implementing a different ABI. For example, the runtime systems of some OCaml or Common Lisp implementations have different calling conventions, and transitioning to them may require asm; but the current libffi (which itself uses asm) may spare you from writing asm yourself
your brand-new hardware might have a recent instruction set not fully implemented by your compiler (e.g. extensions like AVX512...) for which you might need asm
you want to implement some functionality not implementable in C, e.g. backtrace
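As a rough illustration of the add-with-carry item in the list above, here is a minimal sketch that adds two 128-bit values stored as two 64-bit limbs. The names u128 and add_u128 are made up for this example, the asm path assumes a GCC-compatible compiler targeting x86-64, and a plain C fallback is used everywhere else. It is not how GMP does it, only the general idea.

```c
#include <stdint.h>

/* Hypothetical two-limb integer: lo holds bits 0..63, hi holds bits 64..127. */
typedef struct { uint64_t lo, hi; } u128;

u128 add_u128(u128 a, u128 b) {
#if defined(__GNUC__) && defined(__x86_64__)
    /* addq sets the carry flag; adcq adds the high limbs plus that carry.
     * "+&r" marks a.lo as written before all inputs have been read. */
    __asm__("addq %2, %0\n\t"
            "adcq %3, %1"
            : "+&r"(a.lo), "+r"(a.hi)
            : "r"(b.lo), "r"(b.hi)
            : "cc");
    return a;
#else
    /* Portable fallback: unsigned wrap-around reveals the carry. */
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
#endif
}
```

Calling add_u128 with a low limb of UINT64_MAX and adding 1 should carry into the high limb on both code paths.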
In general, you should think more than twice before using asm, and if you do use it, you should use it in very few places. When in doubt, avoid asm.
The GCC compiler introduced an extended asm feature which has nearly become a de facto standard supported by many other compilers (e.g. Clang/LLVM...) - but the devil is in the details. See also the GCC Inline Assembly HowTo
The Linux kernel (and many libc implementations, e.g. glibc or musl libc) uses asm (at least to make syscalls), but few other major free software projects use asm instructions directly.
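To give a feel for those details, here is a small sketch of GCC-style extended asm: a byte copy built on the x86 string-move instruction rep movsb, in the spirit of the old hand-written memcpy mentioned in the list above. The function name copy_bytes is invented for this example, and on non-x86 targets it simply falls back to memcpy. The operand constraints and the "memory" clobber are exactly the kind of detail that is easy to get wrong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst and returns dst. */
void *copy_bytes(void *dst, const void *src, size_t n) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* rep movsb copies (count register) bytes from [source index] to
     * [destination index] and updates all three registers, so each is a
     * read-write ("+") operand. "memory" tells the compiler that memory
     * it cannot see through the operands is read and written. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

In real code the library memcpy is almost certainly the better choice; the sketch only shows what the constraint syntax looks like.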
Read also the Linux Assembly HowTo

Related


Why does glibc library use assembly

I am looking at this page: https://sys.readthedocs.io/en/latest/doc/01_introduction.html
that explains how glibc does system calls. In one of the examples the code is examined, and it is shown that the last instruction glibc executes to actually make a system call (meaning the interrupt to the CPU) is written in assembly. So why is part of glibc in assembly? Is there some advantage to writing that small part in assembly?
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation? Thank you.
The answer is super simple: since C doesn't cover system calls (because it doesn't cover any physical hardware in general, and prefers to express itself in terms of an abstract machine), there is no C construct glibc can use to perform a system call.
One could argue that the compiler could provide some sort of intrinsic to do that, but since on Linux glibc is effectively part of the compiler's suite of tools (it contains the CRT as well), there is really no need for it; glibc can do the job.
Also, last but not least, on modern CPUs a syscall is usually not an interrupt. Instead, it's a specific instruction (syscall on x86_64).
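For illustration only, a raw system call through the syscall instruction might look roughly like this. The sketch assumes Linux on x86-64 with a GCC-compatible compiler; raw_getpid and HAVE_RAW_SYSCALL are names made up for this example, and on any other platform the function just calls the ordinary getpid wrapper. It is not glibc's actual code.

```c
#include <sys/types.h>
#include <unistd.h>

#if defined(__GNUC__) && defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>   /* SYS_getpid */
#define HAVE_RAW_SYSCALL 1
#endif

/* Hypothetical helper: asks the kernel for the process ID directly. */
pid_t raw_getpid(void) {
#ifdef HAVE_RAW_SYSCALL
    long ret;
    /* The syscall number goes in rax and the result comes back in rax.
     * The syscall instruction itself overwrites rcx and r11. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    return (pid_t)ret;
#else
    return getpid();
#endif
}
```

The value returned should match what the library's own getpid reports.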
I want to address this piece of your question:
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation?
SergeyA correctly points out that there isn't any C construct (even with all of GCC's extensions) that will cause the compiler to emit a syscall instruction. That's not the only thing that the C library is supposed to do that simply can't be written purely in C: the implementations of setjmp and longjmp, makecontext and setcontext, the "entry point" code that calls main, the "trampoline" that you return to when you return from a signal handler, and several other low-level bits all require a little bit of hand-written assembly. (Exercise: what do they all have in common?)
But there's another reason to mix assembly language into a program mostly written in C. This is one of the several implementations of memcpy for x86-64 in glibc. It is 3100 lines of hand-written assembly language and preprocessor macros. What it does could be expressed in four lines of C. Why would anyone go to that much trouble? Speed. Compilers are always getting closer, but they haven't yet quite managed to beat the human brain when it comes to squeezing every last possible cycle out of a critical innermost loop. (It is worth mentioning that in early 2018 the glibc devs spent a bunch of time replacing hand-written assembly implementations of math.h functions with C, because the compilers have caught up on those, and C is ever so much more maintainable.)
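For comparison, the "four lines of C" version referred to above could look something like the following sketch. The name simple_memcpy is invented here so that it does not clash with the library function; it is byte-at-a-time and makes no attempt at the alignment and vector tricks the hand-written version uses.

```c
#include <stddef.h>

/* Hypothetical plain-C copy: same contract as memcpy (no overlap allowed). */
void *simple_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

An optimizing compiler may well recognize this loop and replace it with a call to the library memcpy, which is part of why the hand-tuned version lives in the library in the first place.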
And yet a third answer, which isn't particularly relevant to glibc but comes up a bunch elsewhere, is that maybe you have two different languages in your program because each of them is better at part of your problem. The statistical language R is mostly implemented in C, but a bunch of its mathematical primitives are (or were, I haven't checked in a while) written in FORTRAN, because FORTRAN is still the language that numerical computation wizards think in. Both C and FORTRAN get compiled to machine code, and in principle you could rewrite all the FORTRAN in C, but nobody wants to.

Is it true that it is common for programs written in C to contain assembly code?

I have read this, saying that
For example, it is common for programs that are written primarily in C to contain portions that are in an assembly language for optimization of processor efficiency.
I have never seen a program written primarily in C that also contains assembly code, at least not directly as source code; only their example, the Linux kernel.
Is this statement true and if so, how could it possibly optimize processor efficiency?
Isn't C code just translated into assembly code by the compiler?
No, it's not true. I'd estimate that less than 1% of C programmers even know how to program in assembly, and the need to use it is very rare. It's generally only needed for very special applications, such as some parts of an OS kernel or programming embedded systems, because they need to perform machine operations that don't have corresponding C code (such as directly manipulating CPU registers). In earlier days some programmers would use it for performance-critical sections of code, but compiler optimizations have improved significantly, and CPUs have gotten faster, so this is rarely needed now. It might still be used in the built-in libraries, so that functions like strcpy() will be as fast as possible. But application programmers almost never have to resort to assembly.
Isn't C code just translated into assembly code by the compiler?
Yes, but...
There are situations where you may want to access a specific register or other platform-specific location, and Standard C doesn't provide good ways to do that. If you want to look at a status word or load/read a data register directly, then you often need to drop down to the assembler level.
Also, even in this age of very smart optimizing compilers, it's still possible for a human assembly programmer to write assembly code that will out-perform code generated by the compiler. If you need to wring every possible cycle out of your code, you may need to "go manual" for a couple of routines.
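As a small example of reading a register directly, the sketch below fetches the current stack pointer, something Standard C has no way to name. read_stack_pointer is a made-up name; the asm paths assume a GCC-compatible compiler on x86-64 or AArch64, and other targets get a rough portable approximation (the address of a local variable).

```c
#include <stdint.h>

/* Hypothetical helper: returns the value of the stack pointer register. */
uintptr_t read_stack_pointer(void) {
    uintptr_t sp;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("movq %%rsp, %0" : "=r"(sp));
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("mov %0, sp" : "=r"(sp));
#else
    /* Approximation only: a local variable lives close to the stack top. */
    sp = (uintptr_t)&sp;
#endif
    return sp;
}
```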

What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In that sense, a list of every callable function supported out of the box by any major C compiler (not only ANSI C) would probably be a satisfying answer, as it would at least approximately show the use case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seems like an extension too, so that pretty much strips even the option to use assembly with C, if I'm right.
It is probably easier to understand with code. Imagine a program compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries essentially you have something similar to a freestanding implementation of C (which still has to provide some libraries - say, string.h, but that's nothing you couldn't easily implement yourself in portable C), and that's what normally you start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
That's really all you need to build anything, and after all, it all boils down to that stuff anyway: the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system (typically the syscalls are invoked through some kind of interrupt or a dedicated instruction).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid continuously reinventing the wheel, and to have a set of portable functions that spare you from rewriting everything around the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix of C and assembly as well.
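The memory-mapped I/O approach listed above can be sketched as follows. Everything device-specific here is hypothetical: the register layout, the CTRL_ENABLE bit and the function names are invented for illustration, and on real hardware the pointer would come from the chip's datasheet rather than from an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical layout: bit 0 of the control register enables the device. */
#define CTRL_ENABLE (1u << 0)

/* volatile forces a real load and store on every access, so the compiler
 * cannot cache the value in a register or drop the write. */
void device_enable(volatile uint32_t *ctrl) {
    *ctrl |= CTRL_ENABLE;
}

int device_is_enabled(const volatile uint32_t *ctrl) {
    return (*ctrl & CTRL_ENABLE) != 0;
}
```

On a hosted system the functions can be exercised by passing the address of a plain uint32_t that stands in for the register; on a microcontroller the argument would be a fixed address cast to volatile uint32_t *.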
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering what you can do without is written in C. When running in an OS environment however (using C as an application-level rather than system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory-management and in modern systems will normally prevent unmediated access to such resources. Of course that OS itself is likely to be largely written in C (and/or C++).
In a standalone or bare-metal environment with no OS, C is often used very early in the bootstrap process initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously, Unix compilers need to be told to link the math library; it wasn't linked by default because some very early programs didn't use floating point.
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though parts of it (such as strcpy) are in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions; what you can't do is fancy IO.

What's the purpose of using assembly language inside a C program?

What's the purpose of using assembly language inside a C program? Compilers are able to generate assembly language already. In what cases would it be better to write assembly than C? Is performance a consideration?
In addition to what everyone said: not all CPU features are exposed to C. Sometimes, especially in driver and operating system programming, one needs to explicitly work with special registers and/or commands that are not otherwise available.
Also vector extensions.
That was especially true before the advent of compiler intrinsics. Those alleviate the need for inline assembly somewhat.
One more use case for inline assembly has to do with interfacing C with reflected languages. Specifically, assembly is all but necessary if you need to call a function when its prototype is not known at compile time. In other words, when the quantity and datatypes of that function's arguments are but runtime variables. C variadic functions and the stdarg machinery won't help you in this case - they would help you parse a stack frame, but not build one. In assembly, on the other hand, it's quite doable.
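To make the stdarg limitation concrete, here is a minimal sketch of a variadic function (sum_ints is a made-up name). The callee can walk however many arguments it was given, but every call site still has to spell out its argument list at compile time; nothing in C lets you expand an array of runtime length into a call.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* parse the frame the caller built */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) works because the compiler builds the argument frame for exactly three values. Building such a frame from data known only at run time is the part that needs assembly, or a library that wraps it.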
This is not an OS/driver scenario. There are at least two technologies out there - Java's JNI and COM Automation - where this is a must. In case of Automation, I'm talking about the way the COM runtime is marshaling dual interfaces using their type libraries.
I can think of a very crude C alternative to assembly for that, but it'd be ugly as sin. Slightly less ugly in C++ with templates.
Yet another use case: crash/run-time error reporting. For postmortem debugging, you'd want to capture as much of program state at the point of crash as possible (i. e. all the CPU registers), and assembly is a much better vehicle for that than C. Postmortem debugging of crashing native code usually involves staring at the assembly anyway.
Yet another use case - code that is intended for execution in another process without that process' co-operation or knowledge. This is often referred to as "shellcode", but it doesn't have to be shell related. Code like that needs to be very carefully written, and it can't rely on the conveniences of a high level language (like the run time library, or having a data section) that are normally taken for granted. When one is after injecting a significant piece of functionality into a target process, they usually end up loading a dynamic library, but the initial trampoline code that loads the library and passes control to it tends to be in assembly.
I've been only covering cases where assembly is necessary. Hand-optimizing for performance is covered in other answers.
There are a few, although not many, cases where hand-optimized assembly language can be made to run more efficiently than assembly language generated by C compilers from C source code. Also, for developers used to assembly language, some things can just seem easier to write in assembler.
For these cases, many C compilers allow inline assembly.
However, this is becoming increasingly rare as C compilers get better and better at producing efficient code, and as most platforms put restrictions on the kind of low-level software that benefits most from being written in assembler.
In general, it is performance, but performance of a very specific kind. For example, the SIMD parallel instructions of a processor might not be generated by the compiler. By using processor-specific data formats and then issuing processor-specific parallel instructions (e.g. ARM NEON or Intel SSE), very high performance can be achieved on graphics or signal processing problems. Even then, some compilers allow these to be expressed in C using intrinsic functions.
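As a sketch of the intrinsic route, the function below adds two float arrays four elements at a time using SSE intrinsics when the compiler targets an SSE-capable x86 CPU, and falls back to a plain loop otherwise. add_floats is an invented name; the intrinsics themselves (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) come from the compiler's xmmintrin.h header.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *out, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration; unaligned loads keep it simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the tail, or everything when SSE is unavailable. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

No asm appears anywhere, yet the compiler is told exactly which vector instructions to use, and it still handles register allocation and scheduling.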
While it used to be common to use assembly language inserts to hand-optimize critical functions, those days are largely done. Modern compilers are very good and modern processors have very complicated timing requirements so hand optimized code is often less optimal than expected.
There are various reasons to write inline assembly in C. We can roughly sort them into necessary and unnecessary ones.
Unnecessary reasons might include:
platform compatibility
performance concerns
code optimization
etc.
I consider these unnecessary because they can sometimes be dropped or handled in pure C. Take platform compatibility: you could implement a separate version for each platform entirely in C, although inline assembly might reduce the effort. I won't say much more about the unnecessary reasons here.
Necessary reasons might include:
something the standard libraries are insufficient to do
an instruction set not supported by the compiler
incorrectly generated object code
writing stack-sensitive code
etc.
These reasons count as necessary because the job is almost impossible in pure C. For example, in old versions of DOS the software interrupt INT 21h was not reentrant. If you wanted to write a virtual drive that relied entirely on the compiler's INT 21h support, it was impossible. In that situation you would need to hook the original INT 21h and make it reentrant. However, the compiler wraps every function you write in a prolog/epilog, so you cannot step outside those restrictions without crashing the code. You can try all sorts of tricks using pure C and libraries, but even if one works, it only means you found a particular arrangement for which the compiler happens to generate the machine code you need; in other words, you were trying to coax the compiler into emitting exact machine code. So why not just write the inline assembly directly?
This example illustrates all of the necessary reasons above except the unsupported instruction set, which I think is easy enough to imagine.
In fact, there are more reasons to write inline assembly, but this should give you some idea of them.
Just as a curiosity, I'm adding here a concrete example of something not-so-low-level you can only do in assembly. I read this in an assembly book from my university time where it was used to show an inherent limitation of C/C++, and how to overcome it with assembly.
The problem is: how do I invoke a function when the exact number of parameters is only known at runtime? In fact, in C/C++ you can easily define a function that takes a variable number of arguments, like printf. But when it comes to calling that function, the compiler wants to know exactly how many parameters must be passed. You may pass more parameters than required; that won't do any harm. But what if the number grows unexpectedly to 100 or 1000 parameters, and they must be picked out of an array?
The solution of course is using assembly, where you can dynamically create a stack frame of the proper size, copy the parameters on the stack, invoke the function, and finally reset the stack.
In practice, this would hardly ever be a limitation (unless the library you're using is really badly designed). People who use assembly in C have much better reasons to do so, as others have pointed out in their answers. Still, I think it may be an interesting fact to know.
I would rather think of that as a way to write very specific code for a specific platform; optimization, though still common, is used less nowadays. Knowledge and use of assembly in C is also practiced by hackers of every hat color.
