Is there any special C standard for microcontrollers?
I ask because so far when I programmed something under Windows OS, it doesn't matter which compiler I used. If I had a compiler for C99, I knew what I could do with it.
But recently I started to program in C for microcontrollers, and I was shocked, that even it's still C in its basics, like loops, variables creation and so, there is some syntax type I have never seen in C for desktop computers. And furthermore, the syntax is changing from version to version. I use AVR-GCC compiler, and in previous versions, you used a function for port I/O, now you can handle a port like a variable in the new version.
What defines what functions and how to have them to be implemented into the compiler and still have it be called C?
Is there any special C standard for microcontrollers?
No, there is the ISO C standard. Because many small devices have special architecture features that need to be supported, many compilers support language extensions. For example because an 8051 has bit addressable RAM, a _bit data type may be provided. It also has a Harvard architecture, so keywords are provided for specifying different memory address spaces which an address alone does not resolve since different instructions are required to address these spaces. Such extensions will be clearly indicated in the compiler documentation. Moreover, extensions in a conforming compiler should be prefixed with an underscore. However, many provide unadorned aliases for backward compatibility, and their use should be deprecated.
... when I programmed something under Windows OS, it doesn't matter which compiler I used.
Because the Windows API is standardized (by Microsoft), and it only runs on x86, so there is no architectural variation to consider. That said, you may still see FAR, and NEAR macros in APIs, and that is a throwback to 16-bit x86 with its segmented addressing, which also required compiler extensions to handle.
... that even it's still C in its basics, like loops, variables creation and so,
I am not sure what that means. A typical microcontroller application has no OS or a simple kernel, you should expect to see a lot more 'bare metal' or 'system-level' code, because there are no extensive OS APIs and device driver interfaces to do lots of work under the hood for you. All those library calls are just that; they are not part of the language; it is the same C language; jut put to different work.
... there is some syntax type I have never seen in C for desktop computers.
For example...?
And furthermore, the syntax is changing from version to version.
I doubt it. Again; for example...?
I use AVR-GCC compiler, and in previous versions, you used a function for port I/O, now you can handle a port like a variable in the new version.
That is not down to changes in the language or compiler, but more likely simple 'preprocessor magic'. On AVR, all I/O is memory mapped, so if for example you include the device support header, it may have a declaration such as:
#define PORTA (*((volatile char*)0x0100))
You can then write:
PORTA = 0xFF;
to write 0xFF to memory mapped the register at address 0x100. You could just take a look at the header file and see exactly how it does it.
The GCC documentation describes target specific variations; AVR is specifically dealt with here in section 6.36.8, and in 3.17.3. If you compare that with other targets supported by GCC, it has very few extensions, perhaps because the AVR architecture and instruction set were specifically designed for clean and efficient implementation of a C compiler without extensions.
What defines what functions and how to have them to be implemented into the compiler and still have it be called C?
It is important to realise that the C programming language is a distinct entity from its libraries, and that functions provided by libraries are no different from the ones you might write yourself - they are not part of the language - so it can be C with no library whatsoever. Ultimately, library functions are written using the same basic language elements. You cannot expect the level of abstraction present in, say, the Win32 API to exist in a library intended for a microcontroller. You can in most cases expect at least a subset of the C Standard Library to be implemented since it was designed as a systems level library with few target hardware dependencies.
I have been writing C and C++ for embedded and desktop systems for years and do not recognise the huge differences you seem to perceive, so can only assume that they are the result of a misunderstanding of what constitutes the C language. The following books may help.
C Programming Language (2nd Edition) by Brian W. Kernighan and Dennis M. Ritchie
Embedded C by Michael J. Pont
Embedded systems are weird and sometimes have exceptions to "standard" C.
From system to system you will have different ways to do things like declare interrupts, or define what variables live in different segments of memory, or run "intrinsics" (pseudo-functions that map directly to assembly code), or execute inline assembly code.
But the basics of control flow (for/if/while/switch/case) and variable and function declarations should be the same across the board.
and in previous versions, you used function for Port I/O, now you can handle Port like variable in new version.
That's not part of the C language; that's part of a device support library. That's something each manufacturer will have to document.
The C language assumes a von Neumann architecture (one address space for all code and data) which not all architectures actually have, but most desktop/server class machines do have (or at least present with the aid of the OS). To get around this without making horrible programs, the C compiler (with help from the linker) often support some extensions that aid in making use of multiple address spaces efficiently. All of this could be hidden from the programmer, but it would often slow down and inflate programs and data.
As far as how you access device registers -- on different desktop/server class machines this is very different as well, but since programs written to run under common modern OSes for these machines (Mac OS X, Windows, BSDs, or Linux) don't normally access hardware directly, this isn't an issue. There is OS code that has to deal with these issues, though. This is usually done through defining macros and/or functions that are implemented differently on different architectures or even have multiple versions on a single system so that a driver could work for a particular device (such an Ethernet chip) whether it were on a PCI card or a USB dongle (possibly plugged into a USB card plugged into a PCI slot), or directly mapped into the processor's address space.
Additionally, the C standard library makes more assumptions than the compiler (and language proper) about the system that hosts the programs that use it (the C standard library). These things just don't make sense when there isn't a general purpose OS or filesystem. fopen makes no sense on a system without a filesystem, and even printf might not be easily definable.
As far as what AVR-GCC and its libraries do -- there are lots of stuff that goes into how this is done. The AVR is a Harvard architecture with memory mapped device control registers, special function registers, and general purpose registers (memory addresses 0-31), and a different address space for code and constant data. This already falls outside of what standard C assumes. Some of the registers (general, special, and device control) are accessible via special instructions for things like flipping single bits and read/writing to some multi-byte registers (a multi-instruction operation) implicitly blocks interrupts for the next instruction (so that the second half of the operation can happen). These are things that desktop C programs don't have to know anything about, and since AVR-GCC comes from regular GCC, it didn't initially understand all of these things either. That meant that the compiler wouldn't always use the best instructions to access control registers, so:
*(DEVICE_REG_ADDR) |= 1; // Set BIT0 of control register REG
would have turned into:
temp_reg = *DEVICE_REG_ADDR;
temp_reg |= 1;
*DEVICE_REG_ADDR = temp_reg;
because AVR generally has to have things in its general purpose registers to do bit operations on them, though for some memory locations this isn't true. AVR-GCC had to be altered to recognize that when the address of a variable used in certain operations is known at compile time and lies within a certain range, it can use different instructions to preform these operations. Prior to this, AVR-GCC just provided you with some macros (that looked like functions) that had inline assembly to do this (and use the single instruction inplemenations that GCC now uses). If they no longer provide the macro versions of these operations then that's probably a bad choice since it breaks old code, but allowing you to access these registers as though they were normal variables once the ability to do so efficiently and atomically was implemented is good.
I have never seen a C compiler for a microcontroller which did not have some controller-specific extensions. Some compilers are much closer to meeting ANSI standards than others, but for many microcontrollers there are tradeoffs between performance and ANSI compliance.
On many 8-bit microcontrollers, and even some 16-bit ones, accessing variables on a stack frame is slow. Some compilers will always allocate automatic variables on a run-time stack despite the extra code required to do so, some will allocate automatic variables at compile time (allowing variables that are never live simultaneously to overlap), and some allow the behavior to be controlled with a command-line options or #pragma directives. When coding for such machines, I sometimes like to #define a macro called "auto" which gets redefined to "static" if it will help things work faster.
Some compilers have a variety of storage classes for memory. You may be able to improve performance greatly by declaring things to be of suitable storage classes. For example, an 8051-based system might have 96 bytes of "data" memory, 224 bytes of "idata" memory which overlaps the first 96 bytes, and 4K of "xdata" memory.
Variables in "data" memory may be accessed directly.
Variables in "idata" memory may only be accessed by loading their address into a one-byte pointer register. There is no extra overhead accessing them in cases where that would be necessary anyway, so idata memory is great for arrays. If array q is stored in idata memory, a reference to q[i] will be just as fast as if it were in data memory, though a reference to q[0] will be slower (in data memory, the compiler could pre-compute the address and access it without a pointer register; in idata memory that is not possible).
Variables in xdata memory are far slower to access than those in other types, but there's a lot more xdata memory available.
If one tells an 8051 compiler to put everything in "data" by default, one will "run out of memory" if one's variables total more than 96 bytes and one hasn't instructed the compiler to put anything elsewhere. If one puts everything in "xdata" by default, one can use a lot more memory without hitting a limit, but everything will run slower. The best is to place frequently-used variables that will be directly accessed in "data", frequently-used variables and arrays that are indirectly accessed in "idata", and infrequently-used variables and arrays in "xdata".
The vast majority of the standard C language is common with microcontrollers. Interrupts do tend to have slightly different conventions, although not always.
Treating ports like variables is a result of the fact that the registers are mapped to locations in memory on most microcontrollers, so by writing to the appropriate memory location (defined as a variable with a preset location in memory), you set the value on that port.
As previous contributors have said, there is no standard as such, mainly due to different architectures.
Having said that, Dynamic C (sold by Rabbit Semiconductor) is described as "C with real-time extensions". As far as I know, the compiler only targets Rabbit processors, but there are useful additional keywords (for example, costate, cofunc, and waitfor), some real peculiarities (for example, #use mylib.lib instead of #include mylib.h - and no linker), and several omissions from ANSI C (for example, no file-scope static variables).
It's still described as 'C' though.
Wiring has a C-based language syntax. Perhaps you might want to see what makes it as such.
Related
I am wondering what methods there are to add typing information to generated C methods. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. However to do that I need the method variables to have typing information, otherwise I could modify a primitive value that looks like a pointer.
An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that has an extra (enum) variable for typing information, however this would cause memory and performance overhead, the transpiled code is namely meant for embedded platforms. If I were to accept the memory overhead the obvious option would be to use a heap handle for all objects and then I'd be able to freely move heap blocks. However I'm wondering if there's a more efficient better approach.
I've come up with a potential solution, namely to predeclare and group variables based whether they're primitives or not (I can do that in the transpiler), and add an offset variable to each method at the end (I need to be able to find it accurately when scanning the stack area), that tells me where the non-primitive variables begin and where they end, so I can only scan those. This means that each method will use an additional 16/32-bit (depending on arch) of memory, however this should still be more memory efficient than the heap handle approach.
Example:
void my_func() {
int i = 5;
int z = 3;
bool b = false;
void* person;
void* person_info = ...;
.... // logic
volatile int offset = 0x034;
}
My aim is for something that works universally across GCC compilers, thus my concerns are:
Can the compiler reorder the variables from how they're declared in
the source code?
Can I force the compiler to put some data in the
method's stack frame (using volatile)?
Can I find the offset accurately when scanning the stack?
I'd like to avoid assembly so this approach can work (by default) across multiple platforms, however I'm open for methods even if they involve assembly (if they're reliable).
Typing information could be somehow encoded in the C function name; this is done by C++ and other implementations and called name mangling.
Actually, you could decide, since all your C code is generated, to adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ and keep their typing information elsewhere (e.g. some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1) or readelf(1) or gdb(1)), so used in both bismon and RefPerSys projects.
Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating points or pointers.
Read the Garbage Collection handbook or at least P.Wilson Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used both in Bismon and in RefPerSys.
Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.
Study also the implementation of other transpilers (compilers to C), including Chicken/Scheme and Bigloo.
Can the GCC compiler reorder the variables from how they're declared in the source code?
Of course yes, depending upon the optimizations you are asking. Some variables won't even exist in the binary (e.g. those staying in registers).
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Better generate a single struct variable containing all your language variables, and leave optimizations to the compiler. You will be surprised (see this draft report).
Can I find the offset accurately when scanning the stack?
This is the most difficult, and depends a lot of compiler optimizations (e.g. if you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC -e.g GCC 9 or GCC 10 on x86-64 for Linux- is capable of tail-call optimizations; check by compiling using gcc -O3 -S -fverbose-asm then looking into the produced assembler code). If you accept some small target processor and compiler specific tricks, this is doable. Study the implementation of the Ocaml compiler.
Send me (to basile#starynkevitch.net) an email for discussion. Please mention the URL of your question in it.
If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development can you afford spending.
If you have exceptions in your language, take also a great care. You could with great caution generate calls to longjmp.
See of course this answer of mine.
With transpiling techniques, the evil is in the details
On Linux (specifically!) see also my manydl.c program. It demonstrates that on a Linux x86-64 laptop you could generate, in practice, hundred of thousands of dlopen(3)-ed plugins. Read then How to write shared libraries
Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.
PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.
There are some architectures which have multiple address spaces, notable examples are true Harvard ones, but for example OpenCL also has this property.
C compilers may provide some solutions to this, one of these is Named Address Spaces, supporting special pointer qualifiers to indicate the pointer's address space, but other solutions might also be present.
For GCC, the corresponding documentation is here: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Named-Address-Spaces.html
For IAR targeting the AVR, the corresponding documentation is here: https://www.iar.com/support/tech-notes/compiler/strings-with-iccavr-2.x/ (note that this is earlier than GCC's support, which GCC likely adapted for the 8 bit AVR target).
For SDCC (Small Device C compiler): http://sdcc.sourceforge.net/doc/sdccman.pdf , starts on Page 36. Covers microcontrollers like the 8051, Z80 and 68HC08.
Some information for OpenCL: https://www.khronos.org/registry/OpenCL/sdk/1.1/docs/man/xhtml/local.html and https://software.intel.com/en-us/articles/the-generic-address-space-in-opencl-20
I didn't know about them, and on the architecture I am using (8 bit AVR), there is another solution to deal with the problem, specialized macros (pgmspace.h) to work with data in the ROM. But there are no type checks on these, and they (in my opinion) make code ugly, so it would seem to me that using Named Address Spaces are a superior, and possibly even more portable way to deal with the problem (portable in that one could easily port such a software to a target having a single address space by providing empty definitions for the address space qualifiers).
However in a previous question in which I learned from their availability, solutions suggesting the use of Named Address Spaces got severely downvoted, here: How to make two otherwise identical pointer types incompatible
Downvoters didn't provide any explanation, and I neither found any myself, for me it seems like Named Address Spaces are a good and perfectly functional way of dealing with the problem.
Could anyone provide an explanation? Why Named Address Spaces probably shouldn't be used? (favoring whatever other method available on the target having multiple distinct address spaces)
Another approach is to steal a technique used in things like the Linux kernel and tools like smatch.
Linux has defines like
#define __user
which mean the code can say things like int foo(const __user char *p). The compiler ignores the __user but tools like smatch are then used to make sure that pointers don't accidentally wander between namespaces.
The problem with these is obvious: they only work on the gcc compiler.
And in the embedded systems branch there are lots of different compilers, each offering its own unique, non-portable way to do this. Sometimes that is fine (most embedded projects never get ported to different compilers) but from a generic point-of-view, it is not.
(The very same issue also exists with extended addresses - if you would for example use a 8 or 16 bit MCU with more than 64kib addressable memory. Compilers then use various non-standard extensions such as near and far.)
One solution to these problems is to make a "wrapper" around the compiler-specific behavior, by making a hardware abstraction layer (HAL) where you specify that the type used for storing data in flash is flash_byte_t or some such, then from your HAL include a compiler-specific header-file containing the actual typedef, such as typedef const __flash uint8_t flash_byte_t;. For example, the application includes "compiler.h" and this one in turn includes "gcc.h". That way you only need to re-write one small header file when you switch compiler.
Also as it turns out, C allows const flash_byte_t just fine even though this was already typedef'd as const. There's a special odd rule in C saying that you can add the same qualifier as many times in a declaration as you like. So const const int x is equivalent to const int x. This means that if the user would put on extra const-qualification, that's fine.
Note that it's mostly AVR being a special exception here, because of its weird Harvard model.
Otherwise, there's an industry de facto standard convention used by most compilers: all const qualified variables with static storage duration should be allocated in flash. Of course the C standard makes no guarantees of this (it is out of scope of the standard), but most embedded compilers behave like that.
I'm unable to understand one of the following sentence from WikiBooks :
Why C, and not assembly language?
" C is a compiled language, which creates fast and efficient executable files. It is also a small "what you see is all you get" language: a C statement corresponds to at most a handful of assembly statements, everything else is provided by library functions. "
Website Link : C Programming/Why learn C? - Wikibooks, open books for an open world
Note : I am a complete beginner and I've started to learn C . So, I need a precise explanation of what the above sentence means.
The assembly is the language for a single processor family, it is directly compiled to the machine code that the processor runs. If one programs in assembly, one needs to rewrite the entire code for the different processor family. Phones usually use ARM processors whereas the desktop computers have 32-bit or 64-bit x86-compatible processors. Each 3 of these potentially need a completely separately written program, and perhaps not even limited to that.
In contrast standard C is a portable language - if you write so-called strictly conforming programs. C11 4p5:
A strictly conforming program shall use only those features of the language and library specified in this International Standard. (3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
With footnote 5 noting that:
Strictly conforming programs are intended to be maximally portable among conforming implementations. Conforming programs may depend upon nonportable features of a conforming implementation
Unlike the assembler whose specifics vary from processor to another, it is possible to write programs in C and then port them to various platforms without any changes into the source code, yet these programs will still be compiled into the assembly language, and performance could - and often will - surpass hand-written assembly when using a modern high-quality optimizing compiler.
Additionally the C standard library, which any conforming hosted implementation needs to provide, provides for a portable way to manage files, dynamic memory, input and output, all of which are not only processor but also operating-system specific when using assembler.
However, C is still quite close to the assembly, to the extent that it has been called a "high-level assembly language" by some.
It makes no sense to say compiled language and interpreted language.
This kind of statement are made by persons without education and could not understand the foundations of programming.
A language is defined mathematically via a way to define languages -- operational, denotational, axiomatic, etc and the programmers implement the language as they wish.
There are machines that run C via interpretation, they dispatch the code at the moment of execution and execute it instead of accumulating some object code that would be executed later by some machine, etc.
It is correct to say compiled implementation, interpreted implementation for the language, but even so it is relative to a given machine. Because when you compile it for x86 processors, the compiled code is interpreted by the datapath and controller of a stack machine for the X86 language, etc.
Basically the statement what you see is all you get means that is almost 1 to 1 correspondence between the operators of the CAM defined in the abstract semantics of ISO 9899 and the current stack-machines on the market, like x86, mips, etc.
C is nothing more than a platform-independent Assembly translator, what you write in C is efficiently "translated" into machine code as it would if you write it directly in Assembly. Thats the point of:
"what you see is all you get" language: a C statement corresponds to at most a handful of assembly statements
Any C sentence you write is directly transformed to ASM by the compiler without abstraction layers, interpreters, etc, unlike other languages.
By definition, C is tinny, it has nothing but the esentials to be considered a turing complete language and nothing more. Any additional feature is achieved via libraries, C ships the std lib (diferent implementations tho) that packs things like RNG, memory management, etc.
That's what this means:
everything else is provided by library functions
It's an old and largely outdated claim about C.
C was originally designed as, roughly, a more readable and portable assembler. For this reason, most of the core language features tended - on most target machines - to be easily translated. Generally more complicated functionality was provided by library functions, including the standard library.
Over time, C (both the language and the standard library) have evolved, and become more complicated. Computing hardware has also become more complicated - for example, supporting a set of more advanced instructions - and C constructs which can be implemented in terms of advanced instructions will translate to more complicated assembler on machines that support older and simpler instruction sets.
The distinction between a "small" language and a "large" one is completely subjective - so some people still continue to describe C as small and simple, both others describe is as large and complex. While simpler than some other languages (like C++), C is now also significantly more complex - by various measures - than quite a few other programming languages.
This quote is absolutely true for the good old K&R C implementation of the 70'. In this old days, C was indeed a thin wrapper around machine instructions, and the programmer could easily guess how the compiler would translate the source:
for loop: a counter in appropriate register, a test at end of loop a goto
function call: push arguments to the stack (with no conversion!), call the sub-routine address. On return put the return value (required to be scalar or pointer) to the appropriate register and use machine return. On return, the caller cleans up the stack
On a symetric point of view, anything that could be executed by the processor could be expressed in C. If you have an array of two integers and know that the internal representation is a valid double, just cast a pointer and use it.
That's all wrong with recent version of the C language and with optimizing compilers. The as if rule allows the optimizer to do anything, provided the observable results are what a natural implementation should have given. Many operations can invoke Undefined Behaviour. For example writing a float at a memory location and using it as an integer is explicitely UB. The optimizer can assume that no UB exists in the program, so it can just optimize out any block containing UB (recent versions of gcc are great at that).
Look for example at this function:
void stopit() {
int i = 0;
while(1) {
i+=1;
}
printf("done");
}
It contains an infinite loop, so the printf should never be reached. But the loop has no observable result, so the compiler is free to optimize it out and translate it the same as:
void stopit() {
printf("done");
}
Another example
int i = 12;
float *f = &i;
*f = 12.5; // UB use an float variable to access an int
printf("0x%04x\n", i); // try to dump the representation of 12.5
This code can legally display 0x000c, because the compiler is free to assume that the *f=0. has not modified i, so it can directly use a cached value and translate the last line directly as printf("0x%04x\n", 12);
So not, recent versions of the C language are no longer a small "what you see is all you get" language
What is true is that C is a low level language. The programmer has full control on allocation/deallocation of dynamic storage. You have a natural access at the byte level for any type, you have the notion of pointer and explicit pointer/integer conversion to allow direct access to well known memory addresses. That indeed allows to program embedded systems or micro-controller in C. The standard even defines two environment levels: a hosted environment where you have full access to the standard library and a freestanding environment where the standard library is not present. This can be specifically interesting for systems with very little memory.
C provides low-level control of memory and resources at the byte and bit level. For example C and assembly language are very common in the programming of microcontrollers (my area of expertise), which have very little memory and most often require bit-level control of input and output ports.
If you write a C program and build it, then look at your listing file, you'll typically see the very close correspondence between your C statements and the few assembly instructions into which the C is assembled.
Another clue to its simplicity is to look at its grammar definition as compared to that for C# or Java or Python, for example. The C grammar is small, terse, compact compared to the "fuller" languages, and it's true, there isn't even input or output defined in C. That typically comes from including stdio.h or similar. In this way, you only get what you need in your executable. That is in start contrast to the "big" languages.
While many in the embedded (microcontroller) programming space still prefer assembly, C is a great way to abstract a little bit things like flow of control and pointers, while still retaining the power to employ practically every instruction the microprocessor or microcontroller is capable of.
Regarding the "what you see is all you get" statement...
C is a "small" language in that provides only a handful of abstractions - that is, high-level language constructs that either hide implementation-specific details (such as I/O, type representations, address representations, etc.) or simplify complex operations (memory management, event processing, etc.). C doesn't provide any support at the language level (either in the grammar or standard library) for things like networking, graphics, sound, etc.; you must use separate, third-party libraries for those tasks, which will vary based on platform (Windows, MacOS, iOS, Linux). Compare that to a language like Java, which provides a class library for just about everything you could ever want to do.
Compared to languages like C++ and Java, not a whole lot of things happen "under the hood" in C. There's no overloading of functions or operators, there are no constructors or destructors that are automatically called when objects are created or destroyed, there's no real support for "generic" programming (writing a function template that can be automatically instantiated for arguments of different types), etc. Because of this, it's often easier to predict how a particular piece of code will perform.
There's no automatic resource management in C - arrays don't grow or shrink as you add or remove elements, there's no automatic garbage collection that reclaims dynamic memory that you aren't using anymore, etc.
The only container provided by the C language is the array - for anything more complex (lists, trees, queues, stacks, etc.) you have to write your own implementation, or use somebody else's library.
C is "close to the machine" in that the types and abstractions it provides are based on what real-world hardware provides. For example, integer and floating-point representations and operations are based on what the native hardware supports. The size of an int is (usually) based on the native CPU's word size, meaning it can only represent a certain range of values (the minimum range required by the language standard is [-32767..32767] for signed integers and [0..65535] for unsigned integers). Operations on int objects are mapped to native ADD/DIV/MUL/SUB opcodes. Languages like Python provide "arbitrary precision" types, which are not limited by what the hardware can natively support - the tradeoff is that operations using these types are often slower, since you're not using native opcodes.
Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In such a way, probably a list of every callable function supported by any big C compiler (not only ANSI C) out-of-box would be satisfying as as answer as it'd at least approximately show the use-case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seem like an extension too, so that pretty much strips even the option to use assembly with C if I'm right.
Probably in the matter of code it'd be easier to understand. Imagine a code compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries essentially you have something similar to a freestanding implementation of C (which still has to provide some libraries - say, string.h, but that's nothing you couldn't easily implement yourself in portable C), and that's what normally you start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
That's really all you need to build anything - and after all, it all boils down to that stuff anyway, the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system1 (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid to continuously reinvent the wheel, and to have a set of portable functions that spare you to have to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix or C and assembly as well.
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering what you can do without is written in C. When running in an OS environment however (using C as an application-level rather then system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory-management and in modern systems will normally prevent unmediated access to such resources. Of course that OS itself is likely to largely written in C (and/or C++).
In a standalone of bare-metal environment with no OS, C is often used very early in the bootstrap process initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously Unix compilers need to be told to link the math library, it wasn't linked by default because some very early programs didn't use floating point.
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.
From a discussion somewhere else:
C++ has no standard ABI (Application Binary Interface)
But neither does C, right?
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
What's your take on this?
C defines no ABI. In fact, it bends over backwards to avoid defining an ABI. Those people, who like me, who have spent most of their programming lives programming in C on 16/32/64 bit architectures with 8 bit bytes, 2's complement arithmetic and flat address spaces, will usually be quite surprised on reading the convoluted language of the current C standard.
For example, read the stuff about pointers. The standard doesn't say anything so simple as "a pointer is an address" for that would be making an assumption about the ABI. In particular, it allows for pointers being in different address spaces and having varying width.
An ABI is a mapping from the execution model of the language to a particular machine/operating system/compiler combination. It makes no sense to define one in the language specification because that runs the risk of excluding C implementations on some architectures.
C has no standard ABI in principle, but in practice, this rarely matters: You do what your OS-vendor does.
Take the calling conventions on x86 Windows, for example: The Windows API uses the so-called 'standard' calling convention (stdcall). Thus, any compiler which wants to interface with the OS needs to implement it. However, stdcall doesn't support all C90 language features (eg calling functions without prototypes, variadic functions). As Microsoft provided a C compiler, a second calling convention was necessary, called the 'C' calling convention (cdecl). Most C compilers on Windows use this as their default calling convention, and thus are interoperable.
In principle, the same could have happened with C++, but as the C++ ABI (including the calling convention) is necessarily far more elaborate, compiler vendors did not agree on a single ABI, but could still interoperate by falling back to extern "C".
The ABI for C is platform specific - it covers issues such as register allocation and calling conventions, which are obviously specific to a particular processor. Here are some examples:
The ARM ABI (includes C++)
The PowerPC Embedded ABI
The several ABIs of x86
x86 has had many calling conventions, which extensions under Windows to declare which one is used. Platform ABIs for embedded Linux have also changed over time, leading to incompatible user space. See some history of the ARM Linux port here, which shows the problems in the transition to a newer ABI.
Although several attempts have been
made at defining a single ABI for a
given architecture across multiple
operating systems (Particularly for
i386 on Unix Systems), the efforts
have not met with such success.
Instead, operating systems tend to
define their own ABIs ...
Quoting ... Linux System Programming page 4.
An ABI, even for C, has parts which are quite platform independent, parts which depend on the processor (which registers should be saved, which are used for passing parameters,...) and parts which depend on the OS (more or less the same factors as for the processor as some choices are not imposed by the architecture but are the result of trade-offs, plus some OS's have a language independent notion of exception and so a compiler for any language has to generate the right thing to handle those, handling of threads may also impose things on the ABI -- if a register points to TLS, you can't use it for what you want).
In theory, every compiler may have its own ABI. But usually, for a couple processor/OS, the ABI is fixed by the OS vendor which often also provide a C compiler and common libraries which use that ABI and competitors prefer to be compatible. (I'd not be surprised if there are exceptions for some OS for which C isn't a major programming language).
But the OS vendor may switch ABI for one reason or the other (new versions of processors may have features that you want to use in the ABI for one - for instance some have asked for a 32bit ABI for x86_64 allowing to use all the registers). During the migration phase - which may be for a very long time - you may have to handle two ABI.
neither does C, right?Right
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.Pretty much might refer to architecture-specific defaults chosen by C compiler vendors being adapted within other languages. So if Keil's ARM C compiler will use left to right little endian parameter ordering and stack to pass arguments and some predetermined register for return value, then extern "C" from other compilers will assume compatibility with such scheme.
While such agreement maybe considered part of ABI, unlike managed execution context such as JVM browser sandbox, this is far from being complete standard ABI by itself.
C does not have a standard ABI. This is easily illustrated by all the calling conventions (cdecl, fastcall and stdcall) that are used out there. Each is a different ABI.
There's no standard ABI because C has always been about maximum runtime performance and the ABI with the highest performance depends on the underlying hardware. As a result, the ABI may use only stack or prefer registers for passing function call arguments and return values as needed for any given hardware.
For example, even amd64 (a.k.a x86-64) has two calling conventions: Microsoft x64 and System V AMD64 ABI. The former puts 4 first arguments to registers and the rest into the stack. The latter puts 6 first arguments to registers and the rest into the stack. I have no idea why Microsoft created non-compatible calling convention for amd64 hardware. For all I know, the Microsoft variant has a slightly worse performance and was created later.
For more information, see https://en.wikipedia.org/wiki/X86_calling_conventions
Prior to the C89 Standard, C compilers for many platforms used essentially the same ABI, save for variations in data sizes. For machines whose stack grows downward, code which calls a function would push the arguments on the stack in order from right to left and then call the function (pushing the return address in the process). A called function would leave its arguments on the stack, and the caller would at its leisure adjust the stack pointer to remove them [or, on some architectures, might adjust the stacked values in place]. While <stdarg.h> made it unnecessary for most programs to rely upon that convention, it remained in use for many years because it was simple and worked pretty well. While there was no "official" document establishing that as a cross-platform "standard", most compilers targeting machines with downward-growing stacks worked that way, leading to a greater level of consistency than exists today.