C and assembly how can it work? - c

I am wondering how mixing C and assembly can be possible as compilers generate code in different ways, for example many C compilers will use registers rather than pushing to the stack while making a function call, These functions will then move those registers into the appropiate memory locations because of this what if you write assembly code or link with an object file created by a different compiler that will call the C function but instead push the arguments to the stack rather than set the registers.
My guess is the C compiler assembly output has done it in such a clever way that it doesn't make a difference and it will still work but I can't be sure looking at the assembly code it doesn't appear it would work.
Can anyone answer my question as I am writing a compiler and need to know this so I don't make any mistakes should I want to link with a C module in the future.

The conventions that are used for calling functions are part of what's called the "application binary interface" (ABI). If this interface is specified, then all code that follows the specification can be linked together.
There is no standard ABI for C. However, most popular platforms have one prevailing C compiler that effectively produces a de-facto standard ABI (e.g. there's one for Windows, one for Linux on x86 (32 and 64 bit), one for Linux on ARM, etc.). ABIs may specify a large number of separate "calling conventions", and your C compiler will typically let you specify the desired convention at the point of function declaration using some vendor extension.
Conversely, if there is no documented ABI for your C compiler, or for an existing bit of object code, then you cannot in general link (or otherwise interact) with it successfully.

Related

What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In such a way, probably a list of every callable function supported by any big C compiler (not only ANSI C) out-of-box would be satisfying as as answer as it'd at least approximately show the use-case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seem like an extension too, so that pretty much strips even the option to use assembly with C if I'm right.
Probably in the matter of code it'd be easier to understand. Imagine a code compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries essentially you have something similar to a freestanding implementation of C (which still has to provide some libraries - say, string.h, but that's nothing you couldn't easily implement yourself in portable C), and that's what normally you start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
That's really all you need to build anything - and after all, it all boils down to that stuff anyway, the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system1 (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid to continuously reinvent the wheel, and to have a set of portable functions that spare you to have to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix or C and assembly as well.
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering what you can do without is written in C. When running in an OS environment however (using C as an application-level rather then system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory-management and in modern systems will normally prevent unmediated access to such resources. Of course that OS itself is likely to largely written in C (and/or C++).
In a standalone of bare-metal environment with no OS, C is often used very early in the bootstrap process initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously Unix compilers need to be told to link the math library, it wasn't linked by default because some very early programs didn't use floating point.
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.

Passing struct between code generated by different compilers

The memory layout of a struct is up to the compiler. So what happens when some code compiled by one compiler uses a struct generated by code compiled by another compiler?
For example, say I have a header file that declares a struct somestruct, and a function that returns the struct. One source file defines that function and is compiled by compiler A. Another source file uses than function and is compiled by compiler B and links against the binary of the other source file.
If the two compilers create two different layouts for somestruct, then what's the layout of the variable returned by the function? Does it defer to one compiler's layout, or will there be a memory bug when the second source file tries to access elements of the struct returned by the first source file? Is it an error at compile time or link time?
The function will return a structure as specified by the ABI of the compiler of the function. The callee compiler, will just treat the function as if it conforms to the ABI of itself.
Assuming the two compilers use a similar ABI, in most cases, no errors will be reported during compile-time or link time or even during runtime. For some compatible compilers like Clang, GCC, and Intel C Compiler on OS X and Linux, no errors should result (if there are errors then it's a bug of the compiler). However in real world it is usually difficult to find fully compatible compilers (in most cases their ABIs are similar but not exactly the same; such ABI errors will be even harder to track down because your app would appear normal and crashes under some really weird circumstances are encountered during runtime).
Just as Basile said, name mangling for C++ poses an additional difference in ABI, but such differences are more easily caught during compile time as the linker literally can't find the symbol of the function, rather than finding a function that is not compatible.
Also, passing structures is another headache in terms of ABI because there are multiple structure-packing ABIs, sometimes even different in "compatible" compilers like GCC/MinGW and MSVC. (See also the -m[no-]ms-bitfields option in GCC, which forces GCC to use the MSVC ABI for structures.) I have also seen some cases where passing structures by pointer is more reliable than passing structures by value.
The layout of data (e.g. structures etc...), and the call protocol (how are call done at the processor level) are defined in a (processor and operating system specific) document called Application Binary Interface. If both compilers are following the same ABI (for the same processor and the same operating system) their generated code should be interoperable.
See e.g. the wikipage for x86 calling conventions and the x86-64 ABI specification.
Name mangling, notably for C++, might also be an issue.
Read also Levine's book on Linkers and Loaders

Assuming a calling convention when combining C and x86 Assembly

I have some assembly routines that are called by and take arguments from C functions. Right now, I'm assuming those arguments are passed on the stack in cdecl order. Is that a fair assumption to make?
Would a compiler (GCC) detect this and make sure the arguments are passed correctly, or should I manually go and declare them cdecl? If so, will that attribute still hold if I specify a higher optimisation level?
Calling conventions mean much more than just argument ordering. There is a good pdf explaining all the details, written by Agner Fog: Calling conventions for different C++ compilers and operating systems.
This is a matter of the ABI for the platform you're writing code for. Almost all platforms follow the Unix System V ABI for C calling convention and other ABI issues, which includes both a general ABI (gABI) document detailing the common ABI characteristics across all CPU architectures, and a processor-specific ABI (psABI) document specific to the particular CPU architecture/family. When it comes to x86, this matches what you refer to as "cdecl". So from a practical standpoint, x86 assembly meant to be called from C should be written to assume "cdecl". Basically the only exception to the universality of this calling convention is Windows API functions, which use their own nonstandard "stdcall" calling convention due to legacy Win16 dll thunk compatibility issues; nonetheless, the "default" calling convention on x86 Windows is still "cdecl".
A more important concern when writing asm to be called from C is whether symbol names should be prefixed with an underscore or not. This varies widely between platforms, with the general trend being that ELF-based platforms don't use the prefix, and most other platforms do...
The quick and dirty way to do it is create a dummy C function that matches the asm function you want to implement, do a few things in the dummy C function with the passed in parameters so you can tell them apart, compile then disassemble. Not foolproof but works often.

How does C code call assembly code (e.g. optimized strlen)?

I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.
So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?
The C standard dictates what each library function must do rather than how it is implemented.
Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilties have to be included as extensions to the C standard.
At the end of the road compiled programs and programs in assembly are all machine language, so they can call each other. The way this is done is by having the assembly code use the same calling conventions (way to prepare for a call, prepare parameters and such) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.
Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.
First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.
"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.
I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.
When C code is compiled by gcc, it's first compiled to assembler instructions, which are then again compiled to a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S.
Assembler code just passes the first stage of C-to-assembler compilation and is then indistinguishable from code compiled from C.
One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
You can write inline assembly in your C code. The syntax for this is highly compiler specific but the asm keyword is ususally used. Look into inline assembly for more information.

What can you do in C without "std" includes? Are they part of "C," or just libraries?

I apologize if this is a subjective or repeated question. It's sort of awkward to search for, so I wasn't sure what terms to include.
What I'd like to know is what the basic foundation tools/functions are in C when you don't include standard libraries like stdio and stdlib.
What could I do if there's no printf(), fopen(), etc?
Also, are those libraries technically part of the "C" language, or are they just very useful and effectively essential libraries?
The C standard has this to say (5.1.2.3/5):
The least requirements on a conforming
implementation are:
— At sequence points, volatile objects
are stable in the sense that previous
accesses are complete and subsequent
accesses have not yet occurred.
— At program termination, all data
written into files shall be identical
to the result that execution of the
program according to the abstract
semantics would have produced.
— The input and output dynamics of
interactive devices shall take place
as specified in
7.19.3.
So, without the standard library functions, the only behavior that a program is guaranteed to have, relates to the values of volatile objects, because you can't use any of the guaranteed file access or "interactive devices". "Pure C" only provides interaction via standard library functions.
Pure C isn't the whole story, though, since your hardware could have certain addresses which do certain things when read or written (whether that be a SATA or PCI bus, raw video memory, a serial port, something to go beep, or a flashing LED). So, knowing something about your hardware, you can do a whole lot writing in C without using standard library functions. Potentially, you could implement the C standard library, although this might require access to special CPU instructions as well as special memory addresses.
But in pure C, with no extensions, and the standard library functions removed, you basically can't do anything other than read the command line arguments, do some work, and return a status code from main. That's not to be sniffed at, it's still Turing complete subject to resource limits, although your only resource is automatic and static variables, no heap allocation. It's not a very rich programming environment.
The standard libraries are part of the C language specification, but in any language there does tend to be a line drawn between the language "as such", and the libraries. It's a conceptual difference, but ultimately not a very important one in principle, because the standard says they come together. Anyone doing something non-standard could just as easily remove language features as libraries. Either way, the result is not a conforming implementation of C.
Note that a "freestanding" implementation of C only has to implement a subset of standard includes not including any of the I/O, so you're in the position I described above, of relying on hardware-specific extensions to get anything interesting done. If you want to draw a distinction between the "core language" and "the libraries" based on the standard, then that might be a good place to draw the line.
What could I do if there's no printf(), fopen(), etc?
As long as you know how to interface the system you are using you can live without the standard C library. In embedded systems where you only have several kilobytes of memory, you probably don't want to use the standard library at all.
Here is a Hello World! example on Linux and Windows without using any standard C functions:
For example on Linux you can invoke the Linux system calls directly in inline assembly:
/* 64 bit linux. */
#define SYSCALL_EXIT 60
#define SYSCALL_WRITE 1
void sys_exit(int error_code)
{
asm volatile
(
"syscall"
:
: "a"(SYSCALL_EXIT), "D"(error_code)
: "rcx", "r11", "memory"
);
}
int sys_write(unsigned fd, const char *buf, unsigned count)
{
unsigned ret;
asm volatile
(
"syscall"
: "=a"(ret)
: "a"(SYSCALL_WRITE), "D"(fd), "S"(buf), "d"(count)
: "rcx", "r11", "memory"
);
return ret;
}
void _start(void)
{
const char hwText[] = "Hello world!\n";
sys_write(1, hwText, sizeof(hwText));
sys_exit(12);
}
You can look up the manual page for "syscall" which you can find how can you make system calls. On Intel x86_64 you put the system call id into RAX, and then return value will be stored in RAX. The arguments must be put into RDI, RSI, RDX, R10, R9 and R8 in this order (when the argument is used).
Once you have this you should look up how to write inline assembly in gcc.
The syscall instruction changes the RCX, R11 registers and memory so we add this to the clobber list make GCC aware of it.
The default entry point for the GNU linker is _start. Normally the standard library provides it, but without it you need to provide it.
It isn't really a function as there is no caller function to return to. So we must make another system call to exit our process.
Compile this with:
gcc -nostdlib nostd.c
And it outputs Hello world!, and exits.
On Windows the system calls are not published, instead it's hidden behind another layer of abstraction, the kernel32.dll. Which is always loaded when your program starts whether you want it or not. So you can simply include windows.h from the Windows SDK and use the Win32 API as usual:
#include <windows.h>
void _start(void)
{
const char str[] = "Hello world!\n";
HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD written;
WriteFile(stdout, str, sizeof(str), &written, NULL);
ExitProcess(12);
}
The windows.h has nothing to do with the standard C library, as you should be able to write Windows programs in any other language too.
You can compile it using the MinGW tools like this:
gcc -nostdlib C:\Windows\System32\kernel32.dll nostdlib.c
Then the compiler is smart enough to resolve the import dependencies and compile your program.
If you disassemble the program, you can see only your code is there, there is no standard library bloat in it.
So you can use C without the standard library.
What could you do? Everything!
There is no magic in C, except perhaps the preprocessor.
The hardest, perhaps is to write putchar - as that is platform dependent I/O.
It's a good undergrad exercise to create your own version of varargs and once you've got that, do your own version of vaprintf, then printf and sprintf.
I did all of then on a Macintosh in 1986 when I wasn't happy with the stdio routines that were provided with Lightspeed C - wrote my own window handler with win_putchar, win_printf, in_getchar, and win_scanf.
This whole process is called bootstrapping and it can be one of the most gratifying experiences in coding - working with a basic design that makes a fair amount of practical sense.
You're certainly not obligated to use the standard libraries if you have no need for them. Quite a few embedded systems either have no standard library support or can't use it for one reason or another. The standard even specifically talks about implementations with no library support, C99 standard 5.1.2.1 "Freestanding environment":
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The headers required by C99 to be available in a freestanding implemenation are <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. These headers define only types and macros so there's no need for a function library to support them.
Without the standard library, you're entire reliant on your own code, any non-standard libraries that might be available to you, and any operating system system calls that you might be able to interface to (which might be considered non-standard library calls). Quite possibly you'd have to have your C program call assembly routines to interface to devices and/or whatever operating system might be on the platform.
You can't do a lot, since most of the standard library functions rely on system calls; you are limited to what you can do with the built-in C keywords and operators. It also depends on the system; in some systems you may be able to manipulate bits in a way that results in some external functionality, but this is likely to be the exception rather than the rule.
C's elegance is in it's simplicity, however. Unlike Fortran, which includes much functionality as part of the language, C is quite dependent on its library. This gives it a great degree of flexibility, at the expense of being somewhat less consistent from platform to platform.
This works well, for example, in the operating system, where completely separate "libraries" are implemented, to provide similar functionality with an implementation inside the kernel itself.
Some parts of the libraries are specified as part of ANSI C; they are part of the language, I suppose, but not at its core.
None of them is part of the language keywords. However, all C distributions must include an implementation of these libraries. This ensures portability of many programs.
First of all, you could theoretically implement all these functions yourself using a combination of C and assembly, so you could theoretically do anything.
In practical terms, library functions are primarily meant to save you the work of reinventing the wheel. Some things (like string and library functions) are easier to implement. Other things (like I/O) very much depend on the operating system. Writing your own version would be possible for one O/S, but it is going to make the program less portable.
But you could write programs that do a lot of useful things (e.g., calculate PI or the meaning of life, or simulate an automata). Unless you directly used the OS for I/O, however, it would be very hard to observe what the output is.
In day to day programming, the success of a programming language typically necessitates the availability of a useful high-quality standard library and libraries for many useful tasks. These can be first-party or third-party, but they have to be there.
The std libraries are "standard" libraries, in that for a C compiler to be compliant to a standard (e.g. C99), these libraries must be "include-able." For an interesting example that might help in understanding what this means, have a look at Jessica McKellar's challenge here:
http://blog.ksplice.com/2010/03/libc-free-world/
Edit: The above link has died (thanks Oracle...)
I think this link mirrors the article: https://sudonull.com/post/178679-Hello-from-the-libc-free-world-Part-1
The CRT is part of the C language just as much as the keywords and the syntax. If you are using C, your compiler MUST provide an implementation for your target platform.
Edit:
It's the same as the STL for C++. All languages have a standard library. Maybe assembler as the exception, or some other seriously low level languages. But most medium/high levels have standard libs.
The Standard C Library is part of ANSI C89/ISO C90. I've recently been working on the library for a C compiler that previously was not ANSI-compliant.
The book The Standard C Library by P.J. Plauger was a great reference for that project. In addition to spelling out the requirements of the standard, Plauger explains the history of each .h file and the reasons behind some of the API design. He also provides a full implementation of the library, something that helped me greatly when something in the standard wasn't clear.
The standard describes the macros, types and functions for each of 15 header files (include stdio.h, stdlib.h, but also float.h, limits.h, math.h, locale.h and more).
A compiler can't claim to be ANSI C unless it includes the standard library.
Assembly language has simple commands that move values to registers of the CPU, memory, and other basic functions, as well as perform the core capabilities and calculations of the machine. C libraries are basically chunks of assembly code. You can also use assembly code in your C programs. var is an assembly code instruction. When you use 0x before a number to make it Hex, that is assembly instruction. Assembly code is the readable form of machine code, which is the visual form of the actual switch states of the circuits paths.
So while the machine code, and therefore the assembly code, is built into the machine, C languages are combined of all kinds of pre-formed combinations of code, including your own functions that might be in part assembly language and in part calling on other functions of assembly language or other C libraries. So the assembly code is the foundation of all the programming, and after that it's anyone's guess about what is what. That's why there are so many languages and so few true standards.
Yes you can do a ton of stuff without libraries.
The lifesaver is __asm__ in GCC. It is a keyword so yes you can.
Mostly because every programming language is built on Assembly, and you can make system calls directly under some OSes.

Resources