Something I still don't fully understand. For example, standard C functions such as printf() and scanf() which deal with sending data to the standard output or getting data from the standard input. Will the source code which implements these functions be different depending on if we are using them for Windows or Linux?
I'm guessing the quick answer would be "yes", but do they really have to be different?
I'm probably wrong , but my guess is that the actual function code be the same, but the lower layer functions of the OS that eventually get called by these functions are different. So could any compiler compile these same C functions, but it is what gets linked after (what these functions depend on to work on lower layers) is what gives us the required behavior?
Will the source code which implements these functions be different
depending on if we are using them for Windows or Linux?
Probably. It may even be different on different Linuxes, and for different Windows programs. There are several distinct implementations of the C standard library available for Linux, and maybe even more than one for Windows. Distinct implementations will have different implementation code, otherwise lawyers get involved.
my guess is that the actual function code be the same, but the lower
layer functions of the OS that eventually get called by these
functions are different. So could any compiler compile these same C
functions, but it is what gets linked after (what these functions
depend on to work on lower layers) is what gives us the required
behavior?
It is conceivable that standard library functions would be written in a way that abstracts the environment dependencies to some lower layer, so that the same source for each of those functions themselves can be used in multiple environments, with some kind of environment-specific compatibility layer underneath. Inasmuch as the GNU C library supports a wide variety of environments, it serves as an example of the general principle, though Windows is not among the environments it supports. Even then, however, the environment distinction would be effective even before the link stage. Different environments have a variety of binary formats.
In practice, however, you are very unlikely to see the situation you describe for Windows and Linux.
Yes, they have different implementations.
Moreover you might be using multiple different implementations on the same OS. For example:
MinGW is shipped with its own implementation of standard library which is different from the one used by MSVC.
There are many different implementations of C library even for Linux: glibc, musl, dietlibc and others.
Obviously, this means there is some code duplication in the community, but there are many good reasons for that:
People have different views on how things should be implemented and tested. This alone is enough to "fork" the project.
License: implementations put some restrictions on how they can be used and might require some actions from the end user (GPL requires you to share your code in some cases). Not everyone can follow those requirements.
People have very different needs. Some environments are multithreaded, some are not. printf might need or might not need to use some thread synchronization mechanisms. Some people need locale support, some don't. All this can bloat the code in the end, not everyone is willing to pay for things they do not use. Even strerror is vastly different on different OSes.
Aforementioned synchronization mechanisms are usually OS-specific and work in specific ways. Same can be said about locale handling, signal handling and other things, including the actual data writing and reading.
Some implementations add non-standard extensions that can make your life easier. Not all of those make sense on other OSes. For example glibc adds 'e' mode specifier to open file with O_CLOEXEC flag. This doesn't make sense for Windows.
Many complex things cannot be implemented in pure C and require some compiler-specific extensions. This can tie implementation to a limited number of compilers.
In the end, it is much simpler to have many C libraries, than trying to create a one-size-fits-all implementation.
As you say the higher level parts of the implementation of something like printf, like the code used to format the string using the arguments, can be written in a cross-platform way and be shared between Linux and Windows. I'm not sure if there's a C library that actually does it though.
But to interact with the hardware or use other operating system facilities (such as when printf writes to the console), the libc implementation has to use the OS's interface: the system calls. And these are very different between Windows and Unix-likes, and different even among Unix-likes (POSIX specifies a lot of them but there are OS specific extensions). For example here you can find system call tables for Linux and Windows.
There are two parts to functions like printf(). The first part parses the format string, and assembles an array of characters ready for output. If this part is written in C, there's no reason preventing it being common across all C libraries, and no reason preventing it being different, so long the standard definition of what printf() does is implemented. As it happens, different library developers have read the standard's definition of printf(), and have come up with different ways of parsing and acting on the format string. Most of them have done so correctly.
The second part, the bit that outputs those characters to stdout, is where the differences come in. It depends on using the kernel system call interface; it's the kernel / OS that looks after input/output, and that is done in a specific way. The source code required to get the Linux kernel to output characters is very different to that required to get Windows to output characters.
On Linux, it's usual to use glibc; this does some elaborate things with printf(), buffering the output characters in a pipe until a newline is output, and only then calling the Linux system call for displaying characters on the screen. This means that printf() calls from separate threads are neatly separated, each being on their own line. But the same program source code, compiled against another C library for Linux, won't necessarily do the same thing, resulting in printf() output from different threads being all jumbled up and unreadable.
There's also no reason why the library that contains printf() should be written in C. So long as the same function calling convention as used by the C compiler is honoured, you could write it in assembler (though that'd be slightly mad!). Or Ada (calling convention might be a bit tricky...).
Will the source code which implements these functions be different
Let us try another point-of-view: competition.
No. Competitors in industry are not required by the C spec to share source code to issue a compliant compiler - nor would various standard C library developers always want to.
C does not require "open source".
As per Wikipedia there are many variants of standard C library based on operating system and compilers.
Ref: http://en.wikipedia.org/wiki/C_standard_library
But I want to understand that how plenty of functions which are declared in different headers(eg: stdio.h, string.h, stdlib.h etc. ) are defined in single library. Is the source code file is same for all these header files or there are different libraries for stdio.h, string.h etc? As I am beginner to programming I don't know if multiple source code files can generate a single library like executable. If it is possible then I can understand that libc contains definition of all the standard header files. Where can I see the source code of standard C library?
Is it a static library or dynamic library? If both versions are present in my environment(OS/IDE) which one get linked when I include any standard header file in my source code. Is it IDE dependent? But in case of gcc, programmer does not include libc explicitly.
Is libc a standard name for standard C library?
In windows operating system/environment is it already present or not? If it is present what is the name of it(is it libc only)?
Is there any other standard C library like libm?
Generally speaking, a header (.h) file contains the declarations of functions and variables. Implementation files (.c) contain the actual implementation of the declared functions. Since several implementation files can be translated and linked into a single library binary, you can have one library with multiple headers. Many C library implementations are Open Source, and you can look at their source code at their relative project pages. GNU libc and RedHat newlib are the most prominent. I am sure people will add more in comments.
Implementation defined. You can translate the very same sources into either a static or a dynamic library. It is not uncommon to have both versions installed on your system. Since virtually every executable requires libc, it is usually added to the linker input by default so you don't have to add -lc to every command line.
No. The standard name for the standard C library is "The Standard C Library". Note that virtually all implementations of the standard library extend the library with non-standard functions. These remain non-standard, even if they come as part of the standard library. (alloca() springs to mind.)
MSVCRT.dll or somesuch, if I remember correctly.
libm stands for the math section of the standard library, which is not added to the linker input by default as it is seldom required. There is only one standard C library, the one described by the ISO/IEC 9899 language standard (hence the name). There are many other libraries that can readily be assumed to be present on a given system, but only what's described in the ISO/IEC documents is "the standard".
I'm looking through the glib header files that reside in /usr/include to get a feel for what is going on behind the scenes. All the files I'm looking at simply declare a bunch of macros and functions but I want to take a look at the implementation of these functions.
The glibc source repository is here:
https://sourceware.org/git/?p=glibc.git;a=tree
Note that a lot of the interesting code is under the sysdeps directory, particularly sysdeps/unix/sysv/linux/*. Also worth noting is that stdio is split between stdio-common and libio, and all of the POSIX threads interfaces are implemented under nptl (which also has its own sysdeps tree.
Further, note that there are a lot of functions for which you will simply not find source code at all. Many of the standard functions are simply entry points for making calls to the kernel (syscalls), and these wrappers are automatically generated as part of the build process.
The readable form of the implementation of the functions within the GLibC is contained within its source code, downloadable from its website.
Note that some of the functions are stubs that delegate to system calls, and the complete implementation will be found within the source code of your operating system.
I have heard that in Windows, parameters are passed a single parameter, and then the program splits it into arguments, either in its runtime libraries, or sometimes, in the actual code.
I've heard that most C/C++ compilers do it in runtime libararies (for example, TCC - Tiny C Compiler, which I downloaded)
Are there any C compilers I can download, that don't? Any links to them?
And in such a compiler, would argsv[0] have the whole string?
Added
It's based on what this person (jdedb) said in Super User question Can't pipe or redirect Cygwin grep output, after seeming to suggest that I ask on Stack Overflow.
"It's up to the called program to split the command tail into words, if it wants to operate in Unix (and C language) fashion. (The runtime support libraries of most C and C++ language implementations for Win32 do this splitting behind the scenes."
He said it's the compilers.. But according to Necrolis, it's not the compiler.
(added- Necrolis commented correcting my misreading, compiler!=runtime library)
If you are on Windows, just use GetCommandLine. This is how most CRT wrappers get the command line to split to start with.
As for your actual question, it's not the compiler, but the CRT startup wrapper that they use. If you implement mainCRTstartup, and override the entrypoint with it, you can do whatever you want. A good example of how it works can be seen here.
That "parameter splitting" is the way mandated by the C99 Standard (PDF file) in 5.1.2.2.1.
If an implementation (compiler + library + options) recognizes but does not separate the program name from the other parameters (and parameters from each other) it is not conforming.
Of course, if you use a free-standing implementation none of this applies.
I apologize if this is a subjective or repeated question. It's sort of awkward to search for, so I wasn't sure what terms to include.
What I'd like to know is what the basic foundation tools/functions are in C when you don't include standard libraries like stdio and stdlib.
What could I do if there's no printf(), fopen(), etc?
Also, are those libraries technically part of the "C" language, or are they just very useful and effectively essential libraries?
The C standard has this to say (5.1.2.3/5):
The least requirements on a conforming
implementation are:
— At sequence points, volatile objects
are stable in the sense that previous
accesses are complete and subsequent
accesses have not yet occurred.
— At program termination, all data
written into files shall be identical
to the result that execution of the
program according to the abstract
semantics would have produced.
— The input and output dynamics of
interactive devices shall take place
as specified in
7.19.3.
So, without the standard library functions, the only behavior that a program is guaranteed to have, relates to the values of volatile objects, because you can't use any of the guaranteed file access or "interactive devices". "Pure C" only provides interaction via standard library functions.
Pure C isn't the whole story, though, since your hardware could have certain addresses which do certain things when read or written (whether that be a SATA or PCI bus, raw video memory, a serial port, something to go beep, or a flashing LED). So, knowing something about your hardware, you can do a whole lot writing in C without using standard library functions. Potentially, you could implement the C standard library, although this might require access to special CPU instructions as well as special memory addresses.
But in pure C, with no extensions, and the standard library functions removed, you basically can't do anything other than read the command line arguments, do some work, and return a status code from main. That's not to be sniffed at, it's still Turing complete subject to resource limits, although your only resource is automatic and static variables, no heap allocation. It's not a very rich programming environment.
The standard libraries are part of the C language specification, but in any language there does tend to be a line drawn between the language "as such", and the libraries. It's a conceptual difference, but ultimately not a very important one in principle, because the standard says they come together. Anyone doing something non-standard could just as easily remove language features as libraries. Either way, the result is not a conforming implementation of C.
Note that a "freestanding" implementation of C only has to implement a subset of standard includes not including any of the I/O, so you're in the position I described above, of relying on hardware-specific extensions to get anything interesting done. If you want to draw a distinction between the "core language" and "the libraries" based on the standard, then that might be a good place to draw the line.
What could I do if there's no printf(), fopen(), etc?
As long as you know how to interface the system you are using you can live without the standard C library. In embedded systems where you only have several kilobytes of memory, you probably don't want to use the standard library at all.
Here is a Hello World! example on Linux and Windows without using any standard C functions:
For example on Linux you can invoke the Linux system calls directly in inline assembly:
/* 64 bit linux. */
#define SYSCALL_EXIT 60
#define SYSCALL_WRITE 1
void sys_exit(int error_code)
{
asm volatile
(
"syscall"
:
: "a"(SYSCALL_EXIT), "D"(error_code)
: "rcx", "r11", "memory"
);
}
int sys_write(unsigned fd, const char *buf, unsigned count)
{
unsigned ret;
asm volatile
(
"syscall"
: "=a"(ret)
: "a"(SYSCALL_WRITE), "D"(fd), "S"(buf), "d"(count)
: "rcx", "r11", "memory"
);
return ret;
}
void _start(void)
{
const char hwText[] = "Hello world!\n";
sys_write(1, hwText, sizeof(hwText));
sys_exit(12);
}
You can look up the manual page for "syscall" which you can find how can you make system calls. On Intel x86_64 you put the system call id into RAX, and then return value will be stored in RAX. The arguments must be put into RDI, RSI, RDX, R10, R9 and R8 in this order (when the argument is used).
Once you have this you should look up how to write inline assembly in gcc.
The syscall instruction changes the RCX, R11 registers and memory so we add this to the clobber list make GCC aware of it.
The default entry point for the GNU linker is _start. Normally the standard library provides it, but without it you need to provide it.
It isn't really a function as there is no caller function to return to. So we must make another system call to exit our process.
Compile this with:
gcc -nostdlib nostd.c
And it outputs Hello world!, and exits.
On Windows the system calls are not published, instead it's hidden behind another layer of abstraction, the kernel32.dll. Which is always loaded when your program starts whether you want it or not. So you can simply include windows.h from the Windows SDK and use the Win32 API as usual:
#include <windows.h>
void _start(void)
{
const char str[] = "Hello world!\n";
HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD written;
WriteFile(stdout, str, sizeof(str), &written, NULL);
ExitProcess(12);
}
The windows.h has nothing to do with the standard C library, as you should be able to write Windows programs in any other language too.
You can compile it using the MinGW tools like this:
gcc -nostdlib C:\Windows\System32\kernel32.dll nostdlib.c
Then the compiler is smart enough to resolve the import dependencies and compile your program.
If you disassemble the program, you can see only your code is there, there is no standard library bloat in it.
So you can use C without the standard library.
What could you do? Everything!
There is no magic in C, except perhaps the preprocessor.
The hardest, perhaps is to write putchar - as that is platform dependent I/O.
It's a good undergrad exercise to create your own version of varargs and once you've got that, do your own version of vaprintf, then printf and sprintf.
I did all of then on a Macintosh in 1986 when I wasn't happy with the stdio routines that were provided with Lightspeed C - wrote my own window handler with win_putchar, win_printf, in_getchar, and win_scanf.
This whole process is called bootstrapping and it can be one of the most gratifying experiences in coding - working with a basic design that makes a fair amount of practical sense.
You're certainly not obligated to use the standard libraries if you have no need for them. Quite a few embedded systems either have no standard library support or can't use it for one reason or another. The standard even specifically talks about implementations with no library support, C99 standard 5.1.2.1 "Freestanding environment":
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The headers required by C99 to be available in a freestanding implemenation are <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. These headers define only types and macros so there's no need for a function library to support them.
Without the standard library, you're entire reliant on your own code, any non-standard libraries that might be available to you, and any operating system system calls that you might be able to interface to (which might be considered non-standard library calls). Quite possibly you'd have to have your C program call assembly routines to interface to devices and/or whatever operating system might be on the platform.
You can't do a lot, since most of the standard library functions rely on system calls; you are limited to what you can do with the built-in C keywords and operators. It also depends on the system; in some systems you may be able to manipulate bits in a way that results in some external functionality, but this is likely to be the exception rather than the rule.
C's elegance is in it's simplicity, however. Unlike Fortran, which includes much functionality as part of the language, C is quite dependent on its library. This gives it a great degree of flexibility, at the expense of being somewhat less consistent from platform to platform.
This works well, for example, in the operating system, where completely separate "libraries" are implemented, to provide similar functionality with an implementation inside the kernel itself.
Some parts of the libraries are specified as part of ANSI C; they are part of the language, I suppose, but not at its core.
None of them is part of the language keywords. However, all C distributions must include an implementation of these libraries. This ensures portability of many programs.
First of all, you could theoretically implement all these functions yourself using a combination of C and assembly, so you could theoretically do anything.
In practical terms, library functions are primarily meant to save you the work of reinventing the wheel. Some things (like string and library functions) are easier to implement. Other things (like I/O) very much depend on the operating system. Writing your own version would be possible for one O/S, but it is going to make the program less portable.
But you could write programs that do a lot of useful things (e.g., calculate PI or the meaning of life, or simulate an automata). Unless you directly used the OS for I/O, however, it would be very hard to observe what the output is.
In day to day programming, the success of a programming language typically necessitates the availability of a useful high-quality standard library and libraries for many useful tasks. These can be first-party or third-party, but they have to be there.
The std libraries are "standard" libraries, in that for a C compiler to be compliant to a standard (e.g. C99), these libraries must be "include-able." For an interesting example that might help in understanding what this means, have a look at Jessica McKellar's challenge here:
http://blog.ksplice.com/2010/03/libc-free-world/
Edit: The above link has died (thanks Oracle...)
I think this link mirrors the article: https://sudonull.com/post/178679-Hello-from-the-libc-free-world-Part-1
The CRT is part of the C language just as much as the keywords and the syntax. If you are using C, your compiler MUST provide an implementation for your target platform.
Edit:
It's the same as the STL for C++. All languages have a standard library. Maybe assembler as the exception, or some other seriously low level languages. But most medium/high levels have standard libs.
The Standard C Library is part of ANSI C89/ISO C90. I've recently been working on the library for a C compiler that previously was not ANSI-compliant.
The book The Standard C Library by P.J. Plauger was a great reference for that project. In addition to spelling out the requirements of the standard, Plauger explains the history of each .h file and the reasons behind some of the API design. He also provides a full implementation of the library, something that helped me greatly when something in the standard wasn't clear.
The standard describes the macros, types and functions for each of 15 header files (include stdio.h, stdlib.h, but also float.h, limits.h, math.h, locale.h and more).
A compiler can't claim to be ANSI C unless it includes the standard library.
Assembly language has simple commands that move values to registers of the CPU, memory, and other basic functions, as well as perform the core capabilities and calculations of the machine. C libraries are basically chunks of assembly code. You can also use assembly code in your C programs. var is an assembly code instruction. When you use 0x before a number to make it Hex, that is assembly instruction. Assembly code is the readable form of machine code, which is the visual form of the actual switch states of the circuits paths.
So while the machine code, and therefore the assembly code, is built into the machine, C languages are combined of all kinds of pre-formed combinations of code, including your own functions that might be in part assembly language and in part calling on other functions of assembly language or other C libraries. So the assembly code is the foundation of all the programming, and after that it's anyone's guess about what is what. That's why there are so many languages and so few true standards.
Yes you can do a ton of stuff without libraries.
The lifesaver is __asm__ in GCC. It is a keyword so yes you can.
Mostly because every programming language is built on Assembly, and you can make system calls directly under some OSes.