What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't even use compiler extensions (such as the GCC extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In such a case, a list of every callable function supported out of the box by any big C compiler (not only ANSI C) would probably be satisfying as an answer, as it would at least approximately show the use case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? Inline assembly seems like an extension too, so that pretty much strips even the option to use assembly with C, if I'm right.
It's probably easier to understand in terms of code. Imagine code compiled with only e.g. gcc main.c (output flag permitted) that has no #include and no extern.
int main() {
    // replace_me
    return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing by themselves and only control how a piece of code is repeated or selected aren't the thing I'm looking for (in case it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)

If you remove all libraries you essentially have something similar to a freestanding implementation of C (which still has to provide a few headers, but only ones that define types and macros; everything else - say, string.h - is nothing you couldn't easily implement yourself in portable C), and that's what you normally start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory-mapped I/O (you set up a volatile pointer to some hardware-dependent location and read/write through it; that affects the hardware) - a minimal sketch follows below.
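A minimal sketch of the second approach, with a made-up register address and bit meaning (the volatile-pointer cast itself is plain standard C):

#define LED_REG (*(volatile unsigned int *)0x40021000u)  /* hypothetical device register */

int main() {
    LED_REG |= 1u;   /* set bit 0: e.g. switch an LED on */
    for (;;) { }     /* bare metal: nothing sensible to return to */
}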
That's really all you need to build anything - and after all, it all boils down to that stuff anyway: the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid continuously reinventing the wheel, and to have a set of portable functions that spare you from having to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix of C and assembly as well.

C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with minimal run-time and start-up requirements. Because it can directly access memory and memory-mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering whether you can do without is written in C. When running in an OS environment however (using C as an application-level rather than system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory management, and in modern systems it will normally prevent unmediated access to such resources. Of course, that OS itself is likely to be largely written in C (and/or C++).
In a standalone or bare-metal environment with no OS, C is often used very early in the bootstrap process, initialising hardware and establishing an application execution environment. In fact, on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack pointer and start address from the vector table on start-up; this is enough to run C code that does not rely on library or static-data initialisation - such initialisation can however be written in C before calling main().
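A hedged sketch of such a boot-into-C setup, assuming GCC-style section attributes and a linker script that places .vectors at the start of flash and defines _estack (all names here are illustrative, not from any vendor's headers):

extern unsigned long _estack;            /* top of RAM, provided by the linker script */

void reset_handler(void);

__attribute__((section(".vectors"), used))
static void (* const vectors[])(void) = {
    (void (*)(void))&_estack,            /* entry 0: initial stack pointer, loaded by hardware */
    reset_handler,                       /* entry 1: address executed from reset */
};

void reset_handler(void)
{
    /* Plain C from reset - but no static data is initialised yet
       unless this code copies .data and zeroes .bss itself. */
    for (;;) { }
}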
Note that sizeof is not a function, it is an operator.

I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported (a sketch follows below). And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously, Unix compilers need to be told to link the math library; it wasn't linked by default because some very early programs didn't use floating point.
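As a sketch of that obsolete feature, assuming a C89-era compiler (C99 removed implicit declarations, and modern compilers warn or reject, though they will often still link it):

int main(void)
{
    /* No #include, no prototype: under C89 rules puts() is implicitly
       declared as returning int, its parameters unchecked, and the
       linker resolves it from libc. */
    puts("no header needed");
    return 0;
}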
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.

Related

Why does the glibc library use assembly?

I am looking at this page: https://sys.readthedocs.io/en/latest/doc/01_introduction.html
that goes into an explanation of how glibc does system calls. In one of the examples the code is examined, and it is shown that the last instruction glibc executes to actually perform a system call (meaning the interrupt to the CPU) is written in assembly... So why is part of glibc in assembly? Is there some sort of advantage to writing that small part in assembly?
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation? Thank you.
The answer is super simple - since C doesn't cover system calls (because it doesn't cover any physical hardware in general, and prefers to express itself in terms of an abstract machine), there is no C construct glibc can use to perform a system call.
One could argue that the compiler could provide an intrinsic to do that, but since on Linux glibc is actually part of the compiler suite of tools (it contains the CRT as well) there is really no need for it; glibc can do the job.
Also, last but not least, in modern CPUs a syscall is usually not an interrupt. Instead, it's a specific instruction (syscall on x86_64).
I want to address this piece of your question:
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation?
SergeyA correctly points out that there isn't any C construct (even with all of GCC's extensions) that will cause the compiler to emit a syscall instruction. That's not the only thing that the C library is supposed to do that simply can't be written purely in C: the implementations of setjmp and longjmp, makecontext and setcontext, the "entry point" code that calls main, the "trampoline" that you return to when you return from a signal handler, and several other low-level bits all require a little bit of hand-written assembly. (Exercise: what do they all have in common?)
But there's another reason to mix assembly language into a program mostly written in C. This is one of the several implementations of memcpy for x86-64 in glibc. It is 3100 lines of hand-written assembly language and preprocessor macros. What it does could be expressed in four lines of C. Why would anyone go to that much trouble? Speed. Compilers are always getting closer, but they haven't yet quite managed to beat the human brain when it comes to squeezing every last possible cycle out of a critical innermost loop. (It is worth mentioning that in early 2018 the glibc devs spent a bunch of time replacing hand-written assembly implementations of math.h functions with C, because the compilers have caught up on those, and C is ever so much more maintainable.)
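For comparison, roughly what those four lines of C look like - a naive sketch, not glibc's actual code:

#include <stddef.h>

void *my_memcpy(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    while (n--)
        *d++ = *s++;     /* one byte per iteration: correct, but slow */
    return dst;
}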
And yet a third answer, which isn't particularly relevant to glibc but comes up a bunch elsewhere, is that maybe you have two different languages in your program because each of them is better at part of your problem. The statistical language R is mostly implemented in C, but a bunch of its mathematical primitives are (or were, I haven't checked in a while) written in FORTRAN, because FORTRAN is still the language that numerical computation wizards think in. Both C and FORTRAN get compiled to machine code, and in principle you could rewrite all the FORTRAN in C, but nobody wants to.

What's the purpose of using assembly language inside a C program?

What's the purpose of using assembly language inside a C program? Compilers are able to generate assembly language already. In what cases would it be better to write assembly than C? Is performance a consideration?
In addition to what everyone said: not all CPU features are exposed to C. Sometimes, especially in driver and operating system programming, one needs to explicitly work with special registers and/or commands that are not otherwise available.
Also vector extensions.
That was especially true before the advent of compiler intrinsics. Those alleviate the need for inline assembly somewhat.
One more use case for inline assembly has to do with interfacing C with reflected languages. Specifically, assembly is all but necessary if you need to call a function when its prototype is not known at compile time. In other words, when the quantity and datatypes of that function's arguments are but runtime variables. C variadic functions and the stdarg machinery won't help you in this case - they would help you parse a stack frame, but not build one. In assembly, on the other hand, it's quite doable.
This is not an OS/driver scenario. There are at least two technologies out there - Java's JNI and COM Automation - where this is a must. In case of Automation, I'm talking about the way the COM runtime is marshaling dual interfaces using their type libraries.
I can think of a very crude C alternative to assembly for that, but it'd be ugly as sin. Slightly less ugly in C++ with templates.
Yet another use case: crash/run-time error reporting. For postmortem debugging, you'd want to capture as much of the program state at the point of the crash as possible (i.e. all the CPU registers), and assembly is a much better vehicle for that than C. Postmortem debugging of crashing native code usually involves staring at the assembly anyway.
Yet another use case - code that is intended for execution in another process without that process' co-operation or knowledge. This is often referred to as "shellcode", but it doesn't have to be shell related. Code like that needs to be very carefully written, and it can't rely on the conveniences of a high level language (like the run time library, or having a data section) that are normally taken for granted. When one is after injecting a significant piece of functionality into a target process, they usually end up loading a dynamic library, but the initial trampoline code that loads the library and passes control to it tends to be in assembly.
I've been only covering cases where assembly is necessary. Hand-optimizing for performance is covered in other answers.
There are a few, although not many, cases where hand-optimized assembly language can be made to run more efficiently than assembly language generated by C compilers from C source code. Also, for developers used to assembly language, some things can just seem easier to write in assembler.
For these cases, many C compilers allow inline assembly.
However, this is becoming increasingly rare as C compilers get better and better at producing efficient code, and most platforms put restrictions on some of the low-level type of software that is often the type of software that benefits most from being written in assembler.
In general, it is performance, but performance of a very specific kind. For example, the SIMD parallel instructions of a processor might not be generated by the compiler. By utilizing processor-specific data formats and then issuing processor-specific parallel instructions (e.g. ARM NEON or Intel SSE), very fast performance on graphics or signal processing problems can be achieved. Even then, some compilers allow these to be expressed in C using intrinsic functions.
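For instance, a hedged sketch of the intrinsics route on x86 with SSE (the function name is invented; the _mm_* calls are the usual ones from xmmintrin.h):

#include <xmmintrin.h>   /* SSE intrinsics */

/* Add four pairs of floats with one vector instruction, no inline assembly. */
void add4(float *dst, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);             /* load 4 unaligned floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(dst, _mm_add_ps(va, vb));  /* vertical add, then store */
}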
While it used to be common to use assembly language inserts to hand-optimize critical functions, those days are largely done. Modern compilers are very good and modern processors have very complicated timing requirements so hand optimized code is often less optimal than expected.
There are various reasons to write inline assembly in C. We can roughly categorize them into necessary and unnecessary reasons.
The unnecessary reasons might be:
platform compatibility
performance concerns
code optimization
etc.
I consider the above unnecessary because they can sometimes be dropped or implemented in pure C. For example, for platform compatibility you could write a separate version for each platform; inline assembly might merely reduce the effort. We are not going to talk much about the unnecessary reasons here.
The necessary reasons might be:
something the standard libraries are insufficient to do
an instruction set not supported by compilers
object code generated incorrectly
writing stack-sensitive code
etc.
These reasons are considered necessary because they are almost impossible to handle in pure C. For example, in the old DOS days, the software interrupt INT 21h was not reentrant. If you wanted to write a virtual drive making full use of the INT 21h supported by the compiler, it was impossible to do. In that situation, you would need to hook the original INT 21h and make it reentrant. However, the compiler wraps your every call with a prolog/epilog, so you can never get around that restriction without crashing the code. You can try any trick in pure C with libraries; but even if you successfully found one, it would only mean you had found a particular order in which the compiler generates machine code - in other words, you were trying to make the compiler compile your code to exactly the machine code you wanted. So, why not just write the inline assembly directly?
This example covers all of the necessary reasons above except an unsupported instruction set, but that one is easy to imagine.
In fact, there are more reasons to write inline assembly, but by now you have some idea of them.
Just as a curiosity, I'm adding here a concrete example of something not-so-low-level you can only do in assembly. I read this in an assembly book from my university time where it was used to show an inherent limitation of C/C++, and how to overcome it with assembly.
The problem is: how do I invoke a function when the exact number of parameters is only known at runtime? In fact, in C/C++ you can easily define a function that takes a variable number of arguments, like printf. But when it comes to calling that function, the compiler wants to know exactly how many parameters must be passed. You may pass more parameters than required; that won't do any harm. But what if the number grows unexpectedly to 100 or 1000 parameters, and must be picked out of an array?
The solution of course is using assembly, where you can dynamically create a stack frame of the proper size, copy the parameters on the stack, invoke the function, and finally reset the stack.
In practice, this would hardly ever be a limitation (except if the library you're using is really, really badly designed). People who use assembly in C have much better reasons to do so, as others have pointed out in their answers. Still, I think it may be an interesting fact to know.
I would rather think of it as a way to write very specific code for a specific platform; optimization, though still common, is used less nowadays. Knowledge and usage of assembly in C is also practiced by hats of all colors.

Is it a good idea to compile a language to C?

All over the web, I am getting the feeling that writing a C backend for a compiler is not such a good idea anymore. GHC's C backend is not being actively developed anymore (this is my unsupported feeling). Compilers are targeting C-- or LLVM.
Normally, I would think that GCC is a good old mature compiler that performs well at optimizing code; therefore compiling to C would use the maturity of GCC to yield better and faster code. Is this not true?
I realize that the question greatly depends on the nature of the language being compiled and on other factors such that getting more maintainable code. I am looking for a rather more general answer (w.r.t. the compiled language) that focuses solely on performance (disregarding code quality, ..etc.). I would be also really glad if the answer includes an explanation on why GHC is drifting away from C and why LLVM performs better as a backend (see this) or any other examples of compilers doing the same that I am not aware of.
Let me list my two biggest problems with compiling to C. Whether this is a problem for your language depends on what kind of features you have.
Garbage collection. When you have garbage collection you may have to interrupt regular execution at just about any point in the program, and at this point you need to access all pointers that point into the heap. If you compile to C you have no idea where those pointers might be. C is responsible for local variables, arguments, etc. The pointers are probably on the stack (or maybe in other register windows on a SPARC), but there is no real access to the stack. And even if you scan the stack, which values are pointers? LLVM actually addresses this problem (though I don't know how well, since I've never used LLVM with GC).
Tail calls. Many languages assume that tail calls work (i.e., that they don't grow the stack); Scheme mandates it, Haskell assumes it. This is not the case with C. Under certain circumstances you can convince some C compilers to do tail calls. But you want tail calls to be reliable, e.g., when tail calling an unknown function. There are clumsy workarounds, like trampolining (sketched below), but nothing quite satisfactory.
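To make the trampolining workaround concrete, here is a minimal sketch (all names invented; requires C99 compound literals): each function returns the next step instead of tail-calling it, and a small driver loop runs in constant stack space.

struct cont { struct cont (*next)(void); };

static unsigned long n;
static int result;

static struct cont is_odd(void);

static struct cont is_even(void)
{
    if (n == 0) { result = 1; return (struct cont){ 0 }; }
    n--;
    return (struct cont){ is_odd };   /* the deferred "tail call" */
}

static struct cont is_odd(void)
{
    if (n == 0) { result = 0; return (struct cont){ 0 }; }
    n--;
    return (struct cont){ is_even };
}

int is_even_number(unsigned long m)
{
    struct cont c = { is_even };
    n = m;
    while (c.next)                    /* the trampoline itself */
        c = c.next();
    return result;
}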
While I'm not a compiler expert, I believe that it boils down to the fact that you lose something in translation to C as opposed to translating to e.g. LLVM's intermediate language.
If you think about the process of compiling to C, you create a compiler that translates to C code, then the C compiler translates to an intermediate representation (the in-memory AST), then translates that to machine code. The creators of the C compiler have probably spent a lot of time optimizing certain human-made patterns in the language, but you're not likely to be able to create a fancy enough compiler from a source language to C to emulate the way humans write code. There is a loss of fidelity going to C - the C compiler doesn't have any knowledge about your original code's structure. To get those optimizations, you're essentially back-fitting your compiler to try to generate C code that the C compiler knows how to optimize when it's building its AST. Messy.
If, however, you translate directly to LLVM's intermediate language, that's like compiling your code to a machine-independent high-level bytecode, which is akin to the C compiler giving you access to specify exactly what its AST should contain. Essentially, you cut out the middleman that parses the C code and go directly to the high-level representation, which preserves more of the characteristics of your code by requiring less translation.
Also related to performance, LLVM can do some really tricky stuff for dynamic languages like generating binary code at runtime. This is the "cool" part of just-in-time compilation: it is writing binary code to be executed at runtime, instead of being stuck with what was created at compile time.
Part of the reason for GHC's moving away from the old C backend was that the code produced by GHC was not code gcc could optimise particularly well. So with GHC's native code generator getting better, there was less return for a lot of work. As of 6.12, the NCG's code was only slower than the C-compiled code in very few cases, so with the NCG getting even better in ghc-7, there was no sufficient incentive to keep the gcc backend alive. LLVM is a better target because it's more modular, and one can do many optimisations on its intermediate representation before passing the result to it.
On the other hand, last I looked, JHC still produced C and the final binary from that, typically (exclusively?) by gcc. And JHC's binaries tend to be quite fast.
So if you can produce code the C compiler handles well, that is still a good option, but it's probably not worth jumping through too many hoops to produce good C if you can more easily produce good executables via another route.
One point that hasn't been brought up yet is, how close is your language to C? If you're compiling a fairly low-level imperative language, C's semantics may map very closely to the language you're implementing. If that's the case, it's probably a win, because the code written in your language is likely to resemble the kind of code someone would write in C by hand. That was definitely not the case with Haskell's C backend, which is one reason why the C backend optimized so poorly.
Another point against using a C backend is that C's semantics are actually not as simple as they look. If your language diverges significantly from C, using a C backend means you're going to have to keep track of all those infuriating complexities, and possibly differences between C compilers as well. It may be easier to use LLVM, with its simpler semantics, or devise your own backend, than keep track of all that.
Aside from all the code generator quality reasons, there are also other problems:
The free C compilers (gcc, clang) are a bit Unix centric
Supporting more than one compiler (e.g. gcc on Unix and MSVC on Windows) requires duplication of effort.
Compilers might drag in runtime libraries (or even *nix emulations) on Windows that are painful. Having two different C runtimes to build on (e.g. Linux libc and msvcrt) complicates your own runtime and its maintenance.
You get a big externally versioned blob in your project, which means a major version transition (e.g. a change of mangling that could hurt your runtime lib, or ABI changes like a change of alignment) might require quite some work. Note that this goes for the compiler AND externally versioned (parts of the) runtime library. And multiple compilers multiply this. This is not so bad for C as a backend, though, as in the case where you directly connect to (read: bet on) a backend, like being a gcc/llvm frontend.
In many languages that follow this path, you see C-isms trickle through into the main language. Of course this won't happen to you, but you will be tempted :-)
Language functionality that doesn't directly map to standard C (like nested procedures, and other things that need stack fiddling) is difficult.
If something is wrong, users will be confronted with C-level compiler or linker errors that are outside their field of experience. Parsing them and making them your own is painful, especially with multiple compilers and versions.
Note that point 4 also means that you will have to invest time to just keep things working when the external projects evolve. That is time that generally doesn't really go into your project, and since the project is more dynamic, multiplatform releases will need a lot of extra release engineering to cater for change.
So in short, from what I've seen, while such a move allows a swift start (getting a reasonable code generator for free for many architectures), there are downsides. Most of them are related to loss of control and the poor Windows support of *nix-centric projects like gcc. (LLVM is too new to say much about the long term, but their rhetoric sounds a lot like gcc's did ten years ago.) If a project you are hugely dependent on keeps a certain course (like GCC moving to win64 awfully slowly), then you are stuck with it.
First, decide if you want serious non-*nix support (OS X being more unixy), or only a Linux compiler with a mingw stopgap for Windows. A lot of compilers need first-rate Windows support.
Second, how finished must the product become? What's the primary audience? Is it a tool for the open-source developer who can handle a DIY toolchain, or do you want to target a beginner market (like many 3rd-party products, e.g. RealBasic)?
Or do you really want to provide a well rounded product for professionals with deep integration and complete toolchains?
All three are valid directions for a compiler project. Ask yourself what your primary direction is, and don't assume that more options will be available in time. E.g. evaluate where projects are now that chose to be a GCC frontend in the early nineties.
Essentially the unix way is to go wide (maximize platforms)
The complete suites (like VS and Delphi, the latter of which recently also started to support OS X and has supported Linux in the past) go deep and try to maximize productivity (supporting especially the Windows platform nearly completely, with deep levels of integration).
The 3rd party projects are less clear cut. They go more after self-employed programmers, and niche shops. They have less developer resources, but manage and focus them better.
As you mentioned, whether C is a good target language depends very much on your source language. So here's a few reasons where C has disadvantages compared to LLVM or a custom target language:
Garbage Collection: A language that wants to support efficient garbage collection needs to know extra information that interferes with C. If an allocation fails, the GC needs to find which values on the stack and in registers are pointers and which aren't. Since the register allocator is not under our control we need to use more expensive techniques such as writing all pointers to a separate stack. This is just one of many issues when trying to support modern GC on top of C. (Note that LLVM also still has some issues in that area, but I hear it's being worked on.)
Feature mapping & Language-specific optimisations: Some languages rely on certain optimisations, e.g., Scheme relies on tail-call optimisation. Modern C compilers can do this but are not guaranteed to do this which could cause problems if a program relies on this for correctness. Another feature that could be difficult to support on top of C is co-routines.
Most dynamically typed languages also cannot be optimised well by C compilers. For example, Cython compiles Python to C, but the generated C uses calls to many generic functions which are unlikely to be optimised well even by the latest GCC versions. Just-in-time compilation à la PyPy/LuaJIT/TraceMonkey/V8 is much more suited to give good performance for dynamic languages (at the cost of much higher implementation effort).
Development Experience: Having an interpreter or JIT can also give you a much more convenient experience for developers -- generating C code, then compiling it and linking it, will certainly be slower and less convenient.
That said, I still think it's a reasonable choice to use C as a compilation target for prototyping new languages. Given that LLVM was explicitly designed as a compiler backend, I would only consider C if there are good reasons not to use LLVM. If the source-level language is very high-level, though, you most likely need an earlier higher-level optimisation pass, as LLVM is indeed very low-level (e.g., GHC performs most of its interesting optimisations before calling into LLVM). Oh, and if you're prototyping a language, using an interpreter is probably easiest - just try to avoid features that rely too much on being implemented by an interpreter.
Personally I would compile to C. That way you have a universal intermediary language and don't need to be concerned about whether your compiler supports every platform out there. Using LLVM might get some performance gains (although I would argue the same could probably be achieved by tweaking your C code generation to be more optimizer-friendly), but it will lock you in to only supporting targets LLVM supports, and having to wait for LLVM to add a target when you want to support something new, old, different, or obscure.
As far as I know, C can't query or manipulate processor flags.
This answer is a rebuttal to some of the points made against C as a target language.
Tail call optimizations
Any function that can be tail call optimized is actually equivalent to an iteration (it's an iterative process, in SICP terminology). Additionally, many recursive functions can and should be made tail recursive, for performance reasons, by using accumulators etc.
Thus, in order for your language to guarantee tail call optimization, you would have to detect it and simply not map those functions to regular C functions - but instead create iterations from them.
Garbage collection
It can actually be implemented in C. You can create a run-time system for your language which consists of some basic abstractions over the C memory model - using, for example, your own memory allocators, constructors, special pointers for objects in the source language, etc.
For example, instead of employing regular C pointers for the objects in the source language, a special structure could be created, over which a garbage collection algorithm could be implemented. The objects in your language (more accurately, references) could behave just like in Java, but in C they could be represented along with meta-information (which you wouldn't have if you were working just with pointers).
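A rough sketch of what such a representation might look like (every name here is invented for illustration):

/* Each source-language object carries a header the collector understands. */
struct gc_header {
    struct gc_header *next;   /* intrusive list of all allocations */
    unsigned int size;        /* payload size in bytes */
    unsigned char marked;     /* mark bit for a mark-and-sweep pass */
};

/* A source-language "reference" then points at one of these rather than
   at raw memory, so the collector can always find the meta-information
   just before the payload. */
struct gc_object {
    struct gc_header hdr;
    /* payload follows */
};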
Of course, such a system could have problems integrating with existing C tooling - depends on your implementation and trade-offs that you're willing to make.
Lacking operations
hippietrail noted that C lacks rotate operators (by which I assume he meant circular shifts) that are supported by processors. If such operations are available in the instruction set, they can be added using inline assembly.
The frontend would in this case have to detect the architecture it's compiling for and provide the proper snippets. Some kind of fallback in the form of a regular function should also be provided.
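For reference, the usual portable fallback looks like this - and as a bonus, modern compilers often recognize the pattern and emit a single rotate instruction anyway:

#include <stdint.h>

uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;                                   /* keep shift counts in range */
    return (x << n) | (x >> ((32 - n) & 31));  /* well-defined even for n == 0 */
}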
This answer seems to be addressing some core issues seriously. I'd like to see some more substantiation on which problems exactly are caused by C's semantics.
There's a particular case: if you're writing a programming language with strong security* or reliability requirements.
For one, it would take you years to know a big enough subset of C well enough to be sure that all the C operations you choose to employ in your compilation are safe and don't evoke undefined behaviour. Secondly, you'd then have to find an implementation of C that you can trust (which would mean a tiny trusted code base, and probably won't be very efficient). Not to mention you'll need to find a trusted linker, an OS capable of executing compiled C code, and some basic libraries, all of which would need to be well-defined and trusted.
So in this case you might as well use either assembly language or, if you care about machine independence, some intermediate representation.
*please note that "strong security" here is not related at all to what banks and IT businesses claim to have
Is it a good idea to compile a language to C?
No.
...which begs one obvious question: why do some still think compiling via C is a good idea?
Two big arguments in favour of misusing C in this fashion are that it's stable and standardised:
For GCC, it's C or bust (but there is work underway which may allow other options).
For LLVM, there's the routine breaking of backwards-compatibility in its IR and APIs - what would you prefer: spending time on improving your project or chasing after LLVM?
Providing little more than a promise of stability is somewhat ironic, considering the intended purpose of LLVM.
For these and other reasons, there are various half-done, toy-like, lab-experiment, single-site/use, and otherwise-ignominious via-C backends scattered throughout cyberspace - being abandoned, most have succumbed to bit-rot. But there are some projects which do manage to progress to the mainstream, and their success is then used by via-C supporters to further perpetuate this fantasy.
But if you're one of those supporters, feel free to make fantasy into reality - there's that work happening in GCC, or the resurrected LLVM backend for C. Just imagine it: two well-built, well-maintained via-C backends into which the sum of all prior knowledge can be directed.
They just need you.

Is there any C standard for microcontrollers?

Is there any special C standard for microcontrollers?
I ask because so far when I programmed something under Windows OS, it doesn't matter which compiler I used. If I had a compiler for C99, I knew what I could do with it.
But recently I started to program in C for microcontrollers, and I was shocked that, even though it's still C in its basics (loops, variable declarations and so on), there are some syntax forms I had never seen in C for desktop computers. And furthermore, the syntax changes from version to version. I use the AVR-GCC compiler, and in previous versions you used a function for port I/O; now you can handle a port like a variable in the new version.
What defines which functions a compiler provides and how they are implemented, while the result still gets to be called C?
Is there any special C standard for microcontrollers?
No, there is the ISO C standard. Because many small devices have special architecture features that need to be supported, many compilers support language extensions. For example because an 8051 has bit addressable RAM, a _bit data type may be provided. It also has a Harvard architecture, so keywords are provided for specifying different memory address spaces which an address alone does not resolve since different instructions are required to address these spaces. Such extensions will be clearly indicated in the compiler documentation. Moreover, extensions in a conforming compiler should be prefixed with an underscore. However, many provide unadorned aliases for backward compatibility, and their use should be deprecated.
... when I programmed something under Windows OS, it doesn't matter which compiler I used.
Because the Windows API is standardized (by Microsoft), and it only runs on x86, so there is no architectural variation to consider. That said, you may still see FAR and NEAR macros in APIs, and that is a throwback to 16-bit x86 with its segmented addressing, which also required compiler extensions to handle.
... that even it's still C in its basics, like loops, variables creation and so,
I am not sure what that means. A typical microcontroller application has no OS or only a simple kernel, so you should expect to see a lot more 'bare metal' or 'system-level' code, because there are no extensive OS APIs and device driver interfaces to do lots of work under the hood for you. All those library calls are just that; they are not part of the language; it is the same C language, just put to different work.
... there is some syntax type I have never seen in C for desktop computers.
For example...?
And furthermore, the syntax is changing from version to version.
I doubt it. Again; for example...?
I use AVR-GCC compiler, and in previous versions, you used a function for port I/O, now you can handle a port like a variable in the new version.
That is not down to changes in the language or compiler, but more likely simple 'preprocessor magic'. On AVR, all I/O is memory mapped, so if for example you include the device support header, it may have a declaration such as:
#define PORTA (*((volatile char*)0x0100))
You can then write:
PORTA = 0xFF;
to write 0xFF to the memory-mapped register at address 0x100. You could just take a look at the header file and see exactly how it does it.
The GCC documentation describes target specific variations; AVR is specifically dealt with here in section 6.36.8, and in 3.17.3. If you compare that with other targets supported by GCC, it has very few extensions, perhaps because the AVR architecture and instruction set were specifically designed for clean and efficient implementation of a C compiler without extensions.
What defines what functions and how to have them to be implemented into the compiler and still have it be called C?
It is important to realise that the C programming language is a distinct entity from its libraries, and that functions provided by libraries are no different from the ones you might write yourself - they are not part of the language - so it can be C with no library whatsoever. Ultimately, library functions are written using the same basic language elements. You cannot expect the level of abstraction present in, say, the Win32 API to exist in a library intended for a microcontroller. You can in most cases expect at least a subset of the C Standard Library to be implemented since it was designed as a systems level library with few target hardware dependencies.
I have been writing C and C++ for embedded and desktop systems for years and do not recognise the huge differences you seem to perceive, so can only assume that they are the result of a misunderstanding of what constitutes the C language. The following books may help.
The C Programming Language (2nd Edition) by Brian W. Kernighan and Dennis M. Ritchie
Embedded C by Michael J. Pont
Embedded systems are weird and sometimes have exceptions to "standard" C.
From system to system you will have different ways to do things like declare interrupts, or define what variables live in different segments of memory, or run "intrinsics" (pseudo-functions that map directly to assembly code), or execute inline assembly code.
But the basics of control flow (for/if/while/switch/case) and variable and function declarations should be the same across the board.
and in previous versions, you used function for Port I/O, now you can handle Port like variable in new version.
That's not part of the C language; that's part of a device support library. That's something each manufacturer will have to document.
The C language assumes a von Neumann architecture (one address space for all code and data) which not all architectures actually have, but most desktop/server class machines do have (or at least present with the aid of the OS). To get around this without producing horrible programs, the C compiler (with help from the linker) often supports some extensions that aid in making use of multiple address spaces efficiently. All of this could be hidden from the programmer, but it would often slow down and inflate programs and data.
As far as how you access device registers - on different desktop/server class machines this is very different as well, but since programs written to run under common modern OSes for these machines (Mac OS X, Windows, BSDs, or Linux) don't normally access hardware directly, this isn't an issue. There is OS code that has to deal with these issues, though. This is usually done through defining macros and/or functions that are implemented differently on different architectures, or that even have multiple versions on a single system, so that a driver could work for a particular device (such as an Ethernet chip) whether it were on a PCI card or a USB dongle (possibly plugged into a USB card plugged into a PCI slot), or directly mapped into the processor's address space.
Additionally, the C standard library makes more assumptions than the compiler (and language proper) about the system that hosts the programs that use it (the C standard library). These things just don't make sense when there isn't a general purpose OS or filesystem. fopen makes no sense on a system without a filesystem, and even printf might not be easily definable.
As far as what AVR-GCC and its libraries do - there is a lot that goes into how this is done. The AVR is a Harvard architecture with memory-mapped device control registers, special function registers, and general purpose registers (memory addresses 0-31), and a different address space for code and constant data. This already falls outside of what standard C assumes. Some of the registers (general, special, and device control) are accessible via special instructions for things like flipping single bits, and reads/writes to some multi-byte registers (a multi-instruction operation) implicitly block interrupts for the next instruction (so that the second half of the operation can happen). These are things that desktop C programs don't have to know anything about, and since AVR-GCC comes from regular GCC, it didn't initially understand all of these things either. That meant that the compiler wouldn't always use the best instructions to access control registers, so:
*(DEVICE_REG_ADDR) |= 1; // Set BIT0 of control register REG
would have turned into:
temp_reg = *DEVICE_REG_ADDR;
temp_reg |= 1;
*DEVICE_REG_ADDR = temp_reg;
because AVR generally has to have things in its general purpose registers to do bit operations on them, though for some memory locations this isn't true. AVR-GCC had to be altered to recognize that when the address of a variable used in certain operations is known at compile time and lies within a certain range, it can use different instructions to perform these operations. Prior to this, AVR-GCC just provided you with some macros (that looked like functions) that had inline assembly to do this (and used the single-instruction implementations that GCC now uses). If they no longer provide the macro versions of these operations then that's probably a bad choice since it breaks old code, but allowing you to access these registers as though they were normal variables, once the ability to do so efficiently and atomically was implemented, is good.
I have never seen a C compiler for a microcontroller which did not have some controller-specific extensions. Some compilers are much closer to meeting ANSI standards than others, but for many microcontrollers there are tradeoffs between performance and ANSI compliance.
On many 8-bit microcontrollers, and even some 16-bit ones, accessing variables on a stack frame is slow. Some compilers will always allocate automatic variables on a run-time stack despite the extra code required to do so, some will allocate automatic variables at compile time (allowing variables that are never live simultaneously to overlap), and some allow the behavior to be controlled with a command-line option or #pragma directives. When coding for such machines, I sometimes like to #define a macro called "auto" which gets redefined to "static" if it will help things work faster (a sketch follows below).
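A sketch of that trick (note that redefining a keyword with the preprocessor is not strictly conforming C, but the compilers in question tend to accept it; the configuration macro name is invented):

#ifdef LOCALS_ARE_STATIC      /* hypothetical build switch */
#define auto static           /* "automatic" locals become statically allocated */
#endif

void sample(void)
{
    auto unsigned char count; /* static when the switch is defined */
    /* ... */
}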
Some compilers have a variety of storage classes for memory. You may be able to improve performance greatly by declaring things to be of suitable storage classes. For example, an 8051-based system might have 96 bytes of "data" memory, 224 bytes of "idata" memory which overlaps the first 96 bytes, and 4K of "xdata" memory.
Variables in "data" memory may be accessed directly.
Variables in "idata" memory may only be accessed by loading their address into a one-byte pointer register. There is no extra overhead accessing them in cases where that would be necessary anyway, so idata memory is great for arrays. If array q is stored in idata memory, a reference to q[i] will be just as fast as if it were in data memory, though a reference to q[0] will be slower (in data memory, the compiler could pre-compute the address and access it without a pointer register; in idata memory that is not possible).
Variables in xdata memory are far slower to access than those in other types, but there's a lot more xdata memory available.
If one tells an 8051 compiler to put everything in "data" by default, one will "run out of memory" if one's variables total more than 96 bytes and one hasn't instructed the compiler to put anything elsewhere. If one puts everything in "xdata" by default, one can use a lot more memory without hitting a limit, but everything will run slower. The best is to place frequently-used variables that will be directly accessed in "data", frequently-used variables and arrays that are indirectly accessed in "idata", and infrequently-used variables and arrays in "xdata".
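In Keil C51-style syntax (a vendor extension, not ISO C; the variable names are made up), those placement choices look like this:

unsigned char data  counter;     /* "data": directly addressable, fastest */
unsigned char idata buffer[64];  /* "idata": indirect via a pointer register, good for arrays */
unsigned char xdata log[2048];   /* "xdata": plentiful external RAM, slowest access */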
The vast majority of the standard C language is common with microcontrollers. Interrupts do tend to have slightly different conventions, although not always.
Treating ports like variables is a result of the fact that the registers are mapped to locations in memory on most microcontrollers, so by writing to the appropriate memory location (defined as a variable with a preset location in memory), you set the value on that port.
As previous contributors have said, there is no standard as such, mainly due to different architectures.
Having said that, Dynamic C (sold by Rabbit Semiconductor) is described as "C with real-time extensions". As far as I know, the compiler only targets Rabbit processors, but there are useful additional keywords (for example, costate, cofunc, and waitfor), some real peculiarities (for example, #use mylib.lib instead of #include mylib.h - and no linker), and several omissions from ANSI C (for example, no file-scope static variables).
It's still described as 'C' though.
Wiring has a C-based language syntax. Perhaps you might want to see what makes it so.

What can you do in C without "std" includes? Are they part of "C," or just libraries?

I apologize if this is a subjective or repeated question. It's sort of awkward to search for, so I wasn't sure what terms to include.
What I'd like to know is what the basic foundation tools/functions are in C when you don't include standard libraries like stdio and stdlib.
What could I do if there's no printf(), fopen(), etc?
Also, are those libraries technically part of the "C" language, or are they just very useful and effectively essential libraries?
The C standard has this to say (5.1.2.3/5):
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place as specified in 7.19.3.
So, without the standard library functions, the only behavior that a program is guaranteed to have, relates to the values of volatile objects, because you can't use any of the guaranteed file access or "interactive devices". "Pure C" only provides interaction via standard library functions.
Pure C isn't the whole story, though, since your hardware could have certain addresses which do certain things when read or written (whether that be a SATA or PCI bus, raw video memory, a serial port, something to go beep, or a flashing LED). So, knowing something about your hardware, you can do a whole lot writing in C without using standard library functions. Potentially, you could implement the C standard library, although this might require access to special CPU instructions as well as special memory addresses.
But in pure C, with no extensions, and the standard library functions removed, you basically can't do anything other than read the command line arguments, do some work, and return a status code from main. That's not to be sniffed at; it's still Turing complete, subject to resource limits, although your only resource is automatic and static variables, with no heap allocation. It's not a very rich programming environment.
The standard libraries are part of the C language specification, but in any language there does tend to be a line drawn between the language "as such", and the libraries. It's a conceptual difference, but ultimately not a very important one in principle, because the standard says they come together. Anyone doing something non-standard could just as easily remove language features as libraries. Either way, the result is not a conforming implementation of C.
Note that a "freestanding" implementation of C only has to implement a subset of standard includes not including any of the I/O, so you're in the position I described above, of relying on hardware-specific extensions to get anything interesting done. If you want to draw a distinction between the "core language" and "the libraries" based on the standard, then that might be a good place to draw the line.
What could I do if there's no printf(), fopen(), etc?
As long as you know how to interface the system you are using you can live without the standard C library. In embedded systems where you only have several kilobytes of memory, you probably don't want to use the standard library at all.
Here is a Hello World! example on Linux and Windows without using any standard C functions:
For example on Linux you can invoke the Linux system calls directly in inline assembly:
/* 64 bit linux. */
#define SYSCALL_EXIT  60
#define SYSCALL_WRITE 1

void sys_exit(int error_code)
{
    asm volatile
    (
        "syscall"
        :
        : "a"(SYSCALL_EXIT), "D"(error_code)
        : "rcx", "r11", "memory"
    );
}

int sys_write(unsigned fd, const char *buf, unsigned count)
{
    unsigned ret;

    asm volatile
    (
        "syscall"
        : "=a"(ret)
        : "a"(SYSCALL_WRITE), "D"(fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory"
    );
    return ret;
}

void _start(void)
{
    const char hwText[] = "Hello world!\n";

    sys_write(1, hwText, sizeof(hwText));
    sys_exit(12);
}
You can look up the manual page for "syscall", where you can find how to make system calls. On Intel x86_64 you put the system call id into RAX, and the return value will be stored in RAX. The arguments must be put into RDI, RSI, RDX, R10, R8 and R9, in this order (when the argument is used).
Once you have this you should look up how to write inline assembly in gcc.
The syscall instruction changes the RCX and R11 registers and memory, so we add these to the clobber list to make GCC aware of it.
The default entry point for the GNU linker is _start. Normally the standard library provides it, but without the standard library you need to provide it yourself.
It isn't really a function, as there is no caller function to return to. So we must make another system call to exit our process.
Compile this with:
gcc -nostdlib nostd.c
And it outputs Hello world!, and exits.
On Windows the system calls are not published; instead they're hidden behind another layer of abstraction, kernel32.dll, which is always loaded when your program starts whether you want it or not. So you can simply include windows.h from the Windows SDK and use the Win32 API as usual:
#include <windows.h>
void _start(void)
{
    const char str[] = "Hello world!\n";
    HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;

    WriteFile(stdout, str, sizeof(str), &written, NULL);
    ExitProcess(12);
}
windows.h has nothing to do with the standard C library; you should be able to write Windows programs in any other language too.
You can compile it using the MinGW tools like this:
gcc -nostdlib C:\Windows\System32\kernel32.dll nostdlib.c
Then the compiler is smart enough to resolve the import dependencies and compile your program.
If you disassemble the program, you can see only your code is there, there is no standard library bloat in it.
So you can use C without the standard library.
What could you do? Everything!
There is no magic in C, except perhaps the preprocessor.
The hardest part, perhaps, is writing putchar - as that is platform-dependent I/O; a sketch follows below.
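A sketch of what that might look like on bare metal, assuming a hypothetical memory-mapped UART transmit register (the address is made up; on a hosted system you would wrap a write system call instead):

#define UART_TX (*(volatile unsigned char *)0xF000u)  /* hypothetical register */

int my_putchar(int c)
{
    UART_TX = (unsigned char)c;   /* the only platform-dependent line */
    return c;
}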
It's a good undergrad exercise to create your own version of varargs, and once you've got that, do your own version of vprintf, then printf and sprintf.
I did all of them on a Macintosh in 1986 when I wasn't happy with the stdio routines that were provided with Lightspeed C - I wrote my own window handler with win_putchar, win_printf, win_getchar, and win_scanf.
This whole process is called bootstrapping, and it can be one of the most gratifying experiences in coding - working with a basic design that makes a fair amount of practical sense.
You're certainly not obligated to use the standard libraries if you have no need for them. Quite a few embedded systems either have no standard library support or can't use it for one reason or another. The standard even specifically talks about implementations with no library support, C99 standard 5.1.2.1 "Freestanding environment":
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The headers required by C99 to be available in a freestanding implementation are <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. These headers define only types and macros, so there's no need for a function library to support them.
Without the standard library, you're entirely reliant on your own code, any non-standard libraries that might be available to you, and any operating system calls that you might be able to interface to (which might be considered non-standard library calls). Quite possibly you'd have to have your C program call assembly routines to interface to devices and/or whatever operating system might be on the platform.
You can't do a lot, since most of the standard library functions rely on system calls; you are limited to what you can do with the built-in C keywords and operators. It also depends on the system; in some systems you may be able to manipulate bits in a way that results in some external functionality, but this is likely to be the exception rather than the rule.
C's elegance is in its simplicity, however. Unlike Fortran, which includes much functionality as part of the language, C is quite dependent on its library. This gives it a great degree of flexibility, at the expense of being somewhat less consistent from platform to platform.
This works well, for example, in the operating system, where completely separate "libraries" are implemented to provide similar functionality with an implementation inside the kernel itself.
Some parts of the libraries are specified as part of ANSI C; they are part of the language, I suppose, but not at its core.
None of them is part of the language keywords. However, all C distributions must include an implementation of these libraries. This ensures portability of many programs.
First of all, you could theoretically implement all these functions yourself using a combination of C and assembly, so you could theoretically do anything.
In practical terms, library functions are primarily meant to save you the work of reinventing the wheel. Some things (like string and library functions) are easier to implement. Other things (like I/O) very much depend on the operating system. Writing your own version would be possible for one O/S, but it is going to make the program less portable.
But you could write programs that do a lot of useful things (e.g., calculate PI or the meaning of life, or simulate an automata). Unless you directly used the OS for I/O, however, it would be very hard to observe what the output is.
In day to day programming, the success of a programming language typically necessitates the availability of a useful high-quality standard library and libraries for many useful tasks. These can be first-party or third-party, but they have to be there.
The std libraries are "standard" libraries, in that for a C compiler to be compliant to a standard (e.g. C99), these libraries must be "include-able." For an interesting example that might help in understanding what this means, have a look at Jessica McKellar's challenge here:
http://blog.ksplice.com/2010/03/libc-free-world/
Edit: The above link has died (thanks Oracle...)
I think this link mirrors the article: https://sudonull.com/post/178679-Hello-from-the-libc-free-world-Part-1
The CRT is part of the C language just as much as the keywords and the syntax. If you are using C, your compiler MUST provide an implementation for your target platform.
Edit:
It's the same as the STL for C++. All languages have a standard library. Maybe assembler is the exception, or some other seriously low-level languages. But most medium/high-level languages have standard libs.
The Standard C Library is part of ANSI C89/ISO C90. I've recently been working on the library for a C compiler that previously was not ANSI-compliant.
The book The Standard C Library by P.J. Plauger was a great reference for that project. In addition to spelling out the requirements of the standard, Plauger explains the history of each .h file and the reasons behind some of the API design. He also provides a full implementation of the library, something that helped me greatly when something in the standard wasn't clear.
The standard describes the macros, types and functions for each of 15 header files (including stdio.h and stdlib.h, but also float.h, limits.h, math.h, locale.h and more).
A compiler can't claim to be ANSI C unless it includes the standard library.
Assembly language has simple instructions that move values into CPU registers and memory, and that perform the core operations and calculations of the machine. C libraries are ultimately built from chunks of such code, and you can also use assembly code directly in your C programs. Assembly code is the readable form of machine code, which in turn reflects the actual switch states of the circuit paths.
So while the machine code, and therefore the assembly code, is built into the machine, C programs are combined from all kinds of pre-formed pieces of code, including your own functions that might be in part assembly language and in part calls to other assembly-language routines or other C libraries. So the assembly code is the foundation of all the programming, and after that it's anyone's guess about what is what. That's why there are so many languages and so few true standards.
Yes, you can do a ton of stuff without libraries.
The lifesaver is __asm__ in GCC. It is a keyword (a GCC extension), so yes, you can.
Mostly because every programming language ultimately rests on assembly, and you can make system calls directly under some OSes.
