definition of function printf in C language

I have read that C language does not include instructions for input and for output and that printf, scanf, getchar, putchar are actually functions.
Which primitive C language instructions are used to build the function printf, then?
Thank you.

If you want to use printf, you have to #include <stdio.h>. That file declares the function.
If you were thinking about how printf is implemented: printf may internally call other functions and probably goes down to putc (also part of the C runtime) to write out the characters one by one. Eventually one of the functions needs to actually write the character to the console. How this is done depends on the operating system. On Linux, for example, printf might internally call the Linux write function. On Windows, printf might internally call WriteConsole.
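As a minimal sketch of the "one character at a time" idea, the function below writes a string through putc. The name put_string is hypothetical and chosen only for illustration; a real C library is free to structure this very differently.

```c
#include <stdio.h>

/* Write a NUL-terminated string to `out` one character at a time.
   Returns the number of characters written, or -1 on error.
   (put_string is a hypothetical name, not a standard function.) */
int put_string(const char *s, FILE *out)
{
    int count = 0;
    for (; *s != '\0'; ++s) {
        if (putc(*s, out) == EOF)
            return -1;
        ++count;
    }
    return count;
}
```

Calling put_string("hi!\n", stdout) would hand each character to the stdio layer, which in turn decides when to pass the bytes on to the operating system.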

The function printf is documented here; in fact, it is not part of the C language itself. The language itself does not provide a means for input and output. The function printf is defined in a library, which can be accessed using the compiler directive #include <stdio.h>.

No programming language provides true "primitives" for I/O. Any I/O "primitives" rely on lower abstraction levels, in this language or another.
I/O, at the lowest level, needs to access hardware. You might be looking at BIOS interrupts, hardware I/O ports, memory-mapped device controllers, or something else entirely, depending on the actual hardware your program is running on.
Because it would be a real burden to cater for all these possibilities in the implementation of the programming language, a hardware abstraction layer is employed. Individual I/O controllers are accessed by hardware drivers, which in turn are controlled by the operating system, which is providing I/O services to the application developer through a defined, abstract API. These may be accessed directly (e.g. by user-space assembly), or wrapped further (e.g. by the implementation of a programming language's interpreter, or standard library functions).
Whether you are looking at "commands" like (bash) echo or (Python) print, or library functions like (Java) System.out.println() or (C) printf() or (C++) std::cout, is just a syntactic detail: Any I/O is going through several layers of abstraction, because it is easier, and because it protects you from all kinds of erroneous or malicious program behaviour.
Those are the "primitives" of the respective language. If you're digging down further, you are leaving the realm of the language, and enter the realm of its implementation.
I once worked on a C library implementation myself. Ownership of the project has passed on since, but basically it worked like this:
printf() was implemented by means of vfprintf() (as was, eventually, every function of the *printf() family).
vfprintf() used a couple of internal helpers to do the fancy formatting, writing into a buffer.
If the buffer needed to be flushed, it was passed to an internal writef() function.
This writef() function needed to be implemented differently for each target system. On POSIX, it would call write(). On Win32, it would call WriteFile(). And so on.
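The layering described above can be sketched roughly as follows. This is not the code of the library mentioned; the names fd_printf, fd_vprintf and platform_write are hypothetical, the formatting is delegated to the standard vsnprintf instead of hand-written helpers, and only the POSIX back end is shown.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>   /* write(): POSIX only */

/* Lowest layer: hand a finished buffer to the operating system.
   On Win32 this is where a call such as WriteFile() would go instead. */
static int platform_write(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Middle layer: format into a buffer, then flush it. */
static int fd_vprintf(int fd, const char *fmt, va_list ap)
{
    char buf[256];
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;   /* longer output is truncated in this sketch */
    if (platform_write(fd, buf, (size_t)n) != 0)
        return -1;
    return n;
}

/* Top layer: the variadic front end only collects the arguments. */
static int fd_printf(int fd, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = fd_vprintf(fd, fmt, ap);
    va_end(ap);
    return n;
}
```

With this split, only platform_write would need to change when porting to another system; a call such as fd_printf(1, "%d\n", 42) goes through all three layers.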

Related

Are OS libraries written in assembly or in C

I ask this, because I am getting very conflicting definitions of System calls.
On one hand, I have seen the definition that they are an API the OS provides that a user program can call. Since this API is a high level interface, it has to be implemented in a high level language like C.
On the other hand, I have seen that the actual OS syscalls are machine instructions, for which you have to set certain registers to call (according to some compliance standard set by the OS). But this looks nothing like the UNIX APIs like open(), write() and read(), so what is going on here.
I have also read that these high level interfaces are implemented in the C libraries which do the actual assembly code syscalls. In that case, why do we say the OS provides this interface when it is actually provided by the C language. What if I want to perform a UNIX syscall directly to the OS without having to use C?
There are two open functions - one, the syscall open exposed by the operating system (e.g. Linux), and two, the C-library function open, exposed by the C standard library (e.g. glibc).
You can see two different man pages for these functions - run man 2 open to see the man page regarding the syscall, and man 3 open to see the man page regarding the C standard function.
Functions you mentioned like open, write, and read can be confusing - because they exist both as syscalls and as C standard functions. But they are separate entities entirely - in fact, glibc's open function doesn't even use the open syscall - it uses the openat syscall.
On Windows, where the syscall open doesn't even exist - the C standard library function open does still exist, and uses WinAPI's CreateFile behind the scenes.
What if I want to perform a UNIX syscall directly to the OS without
having to use C?
This is possible - indeed, glibc has to do it to implement C standard library functions. But it's tricky, and involves implementing wrappers for the syscalls and sometimes even handcrafting assembly.
If you want to see things for yourself, you can look at how glibc implements open:
int
__libc_open (const char *file, int oflag, ...)
{
  int mode = 0;
  if (__OPEN_NEEDS_MODE (oflag))
    {
      va_list arg;
      va_start (arg, oflag);
      mode = va_arg (arg, int);
      va_end (arg);
    }
  return SYSCALL_CANCEL (openat, AT_FDCWD, file, oflag, mode);
}
...
weak_alias (__libc_open, open)
Notice that the function ends with a call to the macro SYSCALL_CANCEL, which will end up calling the OS-exposed openat syscall.
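Something similar can be tried from ordinary user code. The sketch below assumes Linux with glibc or musl, where the generic syscall() helper and the SYS_openat constant are available; raw_open is a hypothetical name. The helper is itself a library function, so this bypasses the open wrapper but not the C library altogether.

```c
#define _GNU_SOURCE
#include <fcntl.h>         /* AT_FDCWD, O_* flags */
#include <sys/syscall.h>   /* SYS_openat */
#include <unistd.h>        /* syscall() */

/* Open a file by invoking the openat system call by number,
   without going through the libc open() function.
   Returns a file descriptor, or -1 with errno set on failure. */
long raw_open(const char *path, int flags, int mode)
{
    return syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}
```

The returned descriptor is an ordinary one, so it can be passed to write() and close() as usual.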
Are OS libraries written in assembly or in C
That is a question that can not really be answered as it depends. Technically there are no limitations on the implementation (i.e. it can be written in any language, though C is probably the most common followed by assembly).
The important part here is the ABI. This defines how OS calls can be made.
You can make system calls in assembly (if you know the ABI, you can manually write all the code to comply). The C compiler knows the ABI and will automatically generate all the code required to make a call.
Most languages, though, allow you to make system calls; they will either know the ABI or have a wrapper API that translates the calls from a language call to the appropriate ABI for that OS.
I ask this, because I am getting very conflicting definitions of System calls.
The definitions will depend on the context. You will have to give examples of what the definitions are AND in what context they are being used.
On one hand, I have seen the definition that they are an API the OS provides that a user program can call.
Sure this is one way to look at it.
More strictly, I would say the OS provides a set of interfaces that can be used to perform privileged tasks. Now those interfaces can be exposed via an API provided by a particular environment that makes them easier to use.
Since this API is a high level interface, it has to be implemented in a high level language like C.
Sort of true.
That an environment can expose an API does not mean that it needs a high level language (and C is not a high level language; it is one step above assembly and is considered a low level language). And just because it is exposed by the language does not mean it is implemented in that language.
On the other hand, I have seen that the actual OS syscalls are machine instructions, for which you have to set certain registers to call (according to some compliance standard set by the OS).
OK. Here we have moved from System Calls to syscalls. We should be very careful on how we use these terms to make sure we are not conflating different terms.
I would (and this is a bit abstract still) think about the computer as several levels of abstraction:
         Hardware
------   --------------
         syscalls
OS       --------------
         System Calls (read/write etc..)
------   --------------
         Language Interface (read/write etc..)
You can poke the hardware directly if you want (if you know how), but it is better if you can make syscalls (if you know how), better still to use the OS System Calls, which use a well-defined ABI, and better again to use the language interface (what you would call the API) to call the underlying System Calls.
But this looks nothing like the UNIX APIs like open(), write() and read(), so what is going on here.
Here the UNIX OS provides the open/close/read interface.
The C library provides a very thin API wrapper interface above the OS System Calls. The C compiler will then generate the correct instructions to call the System Calls using the correct ABI, which in turn will call the next layer down in the OS to use the syscalls.
I have also read that these high level interfaces are implemented in the C libraries which do the actual assembly code syscalls.
The high level interface can be written in any language. But the C one is so easy to use that most other languages don't bother doing it themselves but simply call via the C interface.
It's VERRRY rare to ever directly write something in assembly. By writing in C you can compile it for many different CPU architectures whereas by writing in assembly you are basically stuck with one specific architecture. Most operating systems are written in C. We say the OS provides the interface because you are interacting with the operating system which happens to be written in C.

Are functions such as printf() implemented differently for Linux and Windows

Something I still don't fully understand. For example, standard C functions such as printf() and scanf() which deal with sending data to the standard output or getting data from the standard input. Will the source code which implements these functions be different depending on if we are using them for Windows or Linux?
I'm guessing the quick answer would be "yes", but do they really have to be different?
I'm probably wrong, but my guess is that the actual function code would be the same, but the lower layer functions of the OS that eventually get called by these functions are different. So could any compiler compile these same C functions, with what gets linked afterwards (what these functions depend on to work on lower layers) being what gives us the required behavior?
Will the source code which implements these functions be different
depending on if we are using them for Windows or Linux?
Probably. It may even be different on different Linuxes, and for different Windows programs. There are several distinct implementations of the C standard library available for Linux, and maybe even more than one for Windows. Distinct implementations will have different implementation code, otherwise lawyers get involved.
my guess is that the actual function code be the same, but the lower
layer functions of the OS that eventually get called by these
functions are different. So could any compiler compile these same C
functions, but it is what gets linked after (what these functions
depend on to work on lower layers) is what gives us the required
behavior?
It is conceivable that standard library functions would be written in a way that abstracts the environment dependencies to some lower layer, so that the same source for each of those functions themselves can be used in multiple environments, with some kind of environment-specific compatibility layer underneath. Inasmuch as the GNU C library supports a wide variety of environments, it serves as an example of the general principle, though Windows is not among the environments it supports. Even then, however, the environment distinction would be effective even before the link stage. Different environments have a variety of binary formats.
In practice, however, you are very unlikely to see the situation you describe for Windows and Linux.
Yes, they have different implementations.
Moreover you might be using multiple different implementations on the same OS. For example:
MinGW is shipped with its own implementation of standard library which is different from the one used by MSVC.
There are many different implementations of C library even for Linux: glibc, musl, dietlibc and others.
Obviously, this means there is some code duplication in the community, but there are many good reasons for that:
People have different views on how things should be implemented and tested. This alone is enough to "fork" the project.
License: implementations put some restrictions on how they can be used and might require some actions from the end user (GPL requires you to share your code in some cases). Not everyone can follow those requirements.
People have very different needs. Some environments are multithreaded, some are not. printf might need or might not need to use some thread synchronization mechanisms. Some people need locale support, some don't. All this can bloat the code in the end, not everyone is willing to pay for things they do not use. Even strerror is vastly different on different OSes.
Aforementioned synchronization mechanisms are usually OS-specific and work in specific ways. Same can be said about locale handling, signal handling and other things, including the actual data writing and reading.
Some implementations add non-standard extensions that can make your life easier. Not all of those make sense on other OSes. For example glibc adds 'e' mode specifier to open file with O_CLOEXEC flag. This doesn't make sense for Windows.
Many complex things cannot be implemented in pure C and require some compiler-specific extensions. This can tie implementation to a limited number of compilers.
In the end, it is much simpler to have many C libraries, than trying to create a one-size-fits-all implementation.
As you say the higher level parts of the implementation of something like printf, like the code used to format the string using the arguments, can be written in a cross-platform way and be shared between Linux and Windows. I'm not sure if there's a C library that actually does it though.
But to interact with the hardware or use other operating system facilities (such as when printf writes to the console), the libc implementation has to use the OS's interface: the system calls. And these are very different between Windows and Unix-likes, and different even among Unix-likes (POSIX specifies a lot of them but there are OS specific extensions). For example here you can find system call tables for Linux and Windows.
There are two parts to functions like printf(). The first part parses the format string, and assembles an array of characters ready for output. If this part is written in C, there's no reason preventing it being common across all C libraries, and no reason preventing it being different, so long as the standard definition of what printf() does is implemented. As it happens, different library developers have read the standard's definition of printf(), and have come up with different ways of parsing and acting on the format string. Most of them have done so correctly.
The second part, the bit that outputs those characters to stdout, is where the differences come in. It depends on using the kernel system call interface; it's the kernel / OS that looks after input/output, and that is done in a specific way. The source code required to get the Linux kernel to output characters is very different to that required to get Windows to output characters.
On Linux, it's usual to use glibc; this does some elaborate things with printf(), buffering the output characters until a newline is output (when stdout is a terminal), and only then calling the Linux system call for displaying characters on the screen. This means that printf() calls from separate threads are neatly separated, each being on their own line. But the same program source code, compiled against another C library for Linux, won't necessarily do the same thing, resulting in printf() output from different threads being all jumbled up and unreadable.
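The effect of stdio buffering can be observed with a small experiment. This sketch assumes a POSIX system (for fileno and fstat) and uses a fully buffered temporary file instead of a terminal; the helper names kernel_visible_size and demo_buffering are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>   /* fstat(): POSIX */

/* Size of the file behind a stream as the kernel currently sees it. */
static long kernel_visible_size(FILE *f)
{
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write six bytes through stdio and report how many bytes have reached
   the kernel before and after an explicit flush. */
static int demo_buffering(long *before, long *after)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    setvbuf(f, NULL, _IOFBF, 4096);   /* full buffering, 4 KiB buffer */
    fputs("hello\n", f);
    *before = kernel_visible_size(f); /* data still sits in the library buffer */
    fflush(f);
    *after = kernel_visible_size(f);  /* now handed over to the kernel */
    fclose(f);
    return 0;
}
```

Passing _IOLBF instead of _IOFBF to setvbuf requests line buffering, which is the behaviour described above for terminal output.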
There's also no reason why the library that contains printf() should be written in C. So long as the same function calling convention as used by the C compiler is honoured, you could write it in assembler (though that'd be slightly mad!). Or Ada (calling convention might be a bit tricky...).
Will the source code which implements these functions be different
Let us try another point-of-view: competition.
No. Competitors in industry are not required by the C spec to share source code to issue a compliant compiler - nor would various standard C library developers always want to.
C does not require "open source".

system call in C library function

Since the system calls which any C library function (say printf()) makes are OS dependent, does that imply that we have a different printf() function for each OS?
It depends on your definition of "different", because I can think of at least three levels of difference:
Interface differences
High-level code differences
Machine code differences
The C standard specifies an interface, and this interface is supposed to be respected across the board. This means that for any OS with a C standard library, the OS should show your program an outlet called printf, and if your program plugs into it, it can expect it to behave as documented. This means that for all you're concerned, printf is the same across the board.
This doesn't mean that printf has to be the same piece of code in every standard library. If someone told me to write a printf function and told you to write a printf function, we could have a different approach, and that would still be fine as long as we both respected the documented behavior. As a matter of fact, for copyright reasons, you can be certain that the code for Windows's printf is different from Linux's printf code.
And finally, even with the same source code, printf would have to be different to accommodate platform differences. You can't expect an x86 printf to work on ARM, for instance. And as you noted, you can't expect a Linux printf to work on Windows because of platform conventions and system call differences.
So the machine code behind the printf outlet will be different, but the point of the standard is to make it work the same.
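A rough sketch of "same interface, different code per platform" is shown below. The function name write_stdout is hypothetical; the Windows branch uses the Win32 calls GetStdHandle and WriteFile, the other branch uses POSIX write, and callers see the same signature either way.

```c
#include <stddef.h>

#if defined(_WIN32)
#include <windows.h>
/* Windows: go through the Win32 API. */
long write_stdout(const char *buf, size_t len)
{
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (!WriteFile(out, buf, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
}
#else
#include <unistd.h>
/* POSIX: file descriptor 1 (STDOUT_FILENO) is standard output. */
long write_stdout(const char *buf, size_t len)
{
    return (long)write(STDOUT_FILENO, buf, len);
}
#endif
```

The preprocessor selects exactly one of the two definitions at compile time, so the resulting machine code differs per platform while the calling code stays unchanged.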
If you mean "does printf behave differently on different OSes", then the answer is:
externally (from the viewpoint of the function's user), no: its semantics are standardized. That means that a given call to such a function leads to the same results, whatever the OS.
internally, probably: its implementation is free. That means that the computation such a function really does to produce the result can be different.

How to print something without using std lib functions?

In the C language, when printing something on the screen, we usually use printf, puts and so on, which are all declared in <stdio.h> or other header files.
Is there any way to print something on screen without using such functions? That is to say, how is printf realised?
Eventually the C function printf will result in a sys_write system call, directly or by going through write (see man 2 write). The actual implementation depends on the compiler and the standard libraries.
Printing to screen requires access to framebuffer (hardware) and userspace programs are not allowed to have a direct access to it. So what they do is make a system call and kernel performs the required function for them. printf -> write system call -> kernel writes the data into framebuffer and then control is given back to user program.
Even if you don't want to use printf or puts (they are implemented in hosted libc) still you have to use write system call to tell the kernel on which device you want to write the buffer.
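As a small illustration, a number can be printed without printf or puts by doing the digit conversion by hand and passing the bytes to write. This sketch assumes a POSIX system; format_uint and print_uint are hypothetical names.

```c
#include <unistd.h>   /* write(): POSIX */

/* Convert v to decimal digits in out (no terminator); returns the digit count.
   out must have room for at least 20 characters. */
static int format_uint(unsigned long v, char *out)
{
    char tmp[20];
    int n = 0;
    do {                              /* produces digits in reverse order */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (int i = 0; i < n; ++i)       /* reverse into the output buffer */
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Print a number followed by a newline, using only write(). */
static long print_uint(int fd, unsigned long v)
{
    char buf[21];
    int n = format_uint(v, buf);
    buf[n++] = '\n';
    return (long)write(fd, buf, (size_t)n);
}
```

print_uint(1, 42) sends the three bytes "42\n" to file descriptor 1, which is standard output.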
The standard headers are not, necessarily, a library containing functions written in C code.
They are functions with a C "interface"; however, it's very probable that they contain explicit machine code, adapted, in each case, to the target system.
The standard headers provide, in this way, ways of doing special processing that would not be possible to achieve in strict C code.
In the specific case of printf(), the situation is even clearer, because if no header is #include-d, there is no mechanism that performs an input/output operation through C syntax alone.
The ncurses library can help you, but if you want to use a low-level function, use write(), and if you want to do kernel programming, you have to use printk().

What can you do in C without "std" includes? Are they part of "C," or just libraries?

I apologize if this is a subjective or repeated question. It's sort of awkward to search for, so I wasn't sure what terms to include.
What I'd like to know is what the basic foundation tools/functions are in C when you don't include standard libraries like stdio and stdlib.
What could I do if there's no printf(), fopen(), etc?
Also, are those libraries technically part of the "C" language, or are they just very useful and effectively essential libraries?
The C standard has this to say (5.1.2.3/5):
The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place as specified in 7.19.3.
So, without the standard library functions, the only behavior that a program is guaranteed to have, relates to the values of volatile objects, because you can't use any of the guaranteed file access or "interactive devices". "Pure C" only provides interaction via standard library functions.
Pure C isn't the whole story, though, since your hardware could have certain addresses which do certain things when read or written (whether that be a SATA or PCI bus, raw video memory, a serial port, something to go beep, or a flashing LED). So, knowing something about your hardware, you can do a whole lot writing in C without using standard library functions. Potentially, you could implement the C standard library, although this might require access to special CPU instructions as well as special memory addresses.
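A minimal sketch of what such hardware access looks like in C, assuming a device with a single memory-mapped transmit register. The address in EXAMPLE_UART_TX is made up for illustration, and mmio_puts is a hypothetical name; real addresses and register layouts come from the hardware documentation.

```c
#include <stdint.h>

/* Hypothetical address of a device's transmit register. */
#define EXAMPLE_UART_TX ((volatile uint8_t *)0x10000000u)

/* Send each byte of s by storing it to the device register.
   `volatile` tells the compiler that every store is an observable
   access, so none of them may be merged or dropped. */
void mmio_puts(volatile uint8_t *tx_reg, const char *s)
{
    while (*s != '\0')
        *tx_reg = (uint8_t)*s++;
}
```

On a suitable bare-metal target the call would be mmio_puts(EXAMPLE_UART_TX, "hi"); on a desktop OS, dereferencing an arbitrary address like this would normally just crash the process.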
But in pure C, with no extensions, and the standard library functions removed, you basically can't do anything other than read the command line arguments, do some work, and return a status code from main. That's not to be sniffed at, it's still Turing complete subject to resource limits, although your only resource is automatic and static variables, no heap allocation. It's not a very rich programming environment.
The standard libraries are part of the C language specification, but in any language there does tend to be a line drawn between the language "as such", and the libraries. It's a conceptual difference, but ultimately not a very important one in principle, because the standard says they come together. Anyone doing something non-standard could just as easily remove language features as libraries. Either way, the result is not a conforming implementation of C.
Note that a "freestanding" implementation of C only has to implement a subset of standard includes not including any of the I/O, so you're in the position I described above, of relying on hardware-specific extensions to get anything interesting done. If you want to draw a distinction between the "core language" and "the libraries" based on the standard, then that might be a good place to draw the line.
What could I do if there's no printf(), fopen(), etc?
As long as you know how to interface the system you are using you can live without the standard C library. In embedded systems where you only have several kilobytes of memory, you probably don't want to use the standard library at all.
Here is a Hello World! example on Linux and Windows without using any standard C functions:
For example on Linux you can invoke the Linux system calls directly in inline assembly:
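/* 64-bit Linux (x86_64). */
#define SYSCALL_EXIT 60
#define SYSCALL_WRITE 1
/* Terminate the process; the exit system call does not return. */
void sys_exit(int error_code)
{
    asm volatile
    (
        "syscall"
        :
        : "a"(SYSCALL_EXIT), "D"(error_code)
        : "rcx", "r11", "memory"
    );
}
/* Write count bytes from buf to file descriptor fd. */
int sys_write(unsigned fd, const char *buf, unsigned count)
{
    unsigned ret;
    asm volatile
    (
        "syscall"
        : "=a"(ret)
        : "a"(SYSCALL_WRITE), "D"(fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory"
    );
    return ret;
}
/* Entry point used when no C runtime is linked in. */
void _start(void)
{
    const char hwText[] = "Hello world!\n";
    /* sizeof counts the terminating NUL, so subtract 1 to avoid writing it. */
    sys_write(1, hwText, sizeof(hwText) - 1);
    sys_exit(12);
}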
You can look up the manual page for "syscall" to find out how to make system calls. On Intel x86_64 you put the system call number into RAX, and the return value will be stored in RAX. The arguments must be put into RDI, RSI, RDX, R10, R8 and R9, in this order (when the argument is used).
Once you have this you should look up how to write inline assembly in gcc.
The syscall instruction changes the RCX and R11 registers, and the kernel may change memory, so we add these to the clobber list to make GCC aware of it.
The default entry point for the GNU linker is _start. Normally the standard library provides it, but without it you need to provide it.
It isn't really a function as there is no caller function to return to. So we must make another system call to exit our process.
Compile this with:
gcc -nostdlib nostd.c
And it outputs Hello world!, and exits.
On Windows the system calls are not published; instead they are hidden behind another layer of abstraction, kernel32.dll, which is always loaded when your program starts whether you want it or not. So you can simply include windows.h from the Windows SDK and use the Win32 API as usual:
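#include <windows.h>
/* Entry point used when no C runtime is linked in. */
void _start(void)
{
    const char str[] = "Hello world!\n";
    HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    /* sizeof counts the terminating NUL, so subtract 1 to avoid writing it. */
    WriteFile(stdout, str, sizeof(str) - 1, &written, NULL);
    ExitProcess(12);
}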
The windows.h has nothing to do with the standard C library, as you should be able to write Windows programs in any other language too.
You can compile it using the MinGW tools like this:
gcc -nostdlib C:\Windows\System32\kernel32.dll nostdlib.c
Then the compiler is smart enough to resolve the import dependencies and compile your program.
If you disassemble the program, you can see only your code is there, there is no standard library bloat in it.
So you can use C without the standard library.
What could you do? Everything!
There is no magic in C, except perhaps the preprocessor.
The hardest, perhaps is to write putchar - as that is platform dependent I/O.
It's a good undergrad exercise to create your own version of varargs and once you've got that, do your own version of vaprintf, then printf and sprintf.
I did all of them on a Macintosh in 1986 when I wasn't happy with the stdio routines that were provided with Lightspeed C - wrote my own window handler with win_putchar, win_printf, win_getchar, and win_scanf.
This whole process is called bootstrapping and it can be one of the most gratifying experiences in coding - working with a basic design that makes a fair amount of practical sense.
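The printf exercise mentioned above might start out like the following sketch. It uses the standard <stdarg.h> macros instead of a home-made varargs mechanism, supports only %d, %s, %c and %%, and writes into a caller-supplied buffer; mini_snprintf and the sink helpers are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>

/* Output sink: collects characters in a caller-supplied buffer. */
struct sink {
    char  *buf;
    size_t cap;   /* capacity, including room for the terminating NUL */
    size_t len;   /* characters stored so far */
};

static void sink_putc(struct sink *s, char c)
{
    if (s->len + 1 < s->cap)          /* silently drop what does not fit */
        s->buf[s->len++] = c;
}

static void sink_puts(struct sink *s, const char *str)
{
    while (*str != '\0')
        sink_putc(s, *str++);
}

static void sink_put_int(struct sink *s, int value)
{
    char digits[12];
    int n = 0;
    /* Negate in unsigned arithmetic so that INT_MIN is handled too. */
    unsigned int u = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;
    if (value < 0)
        sink_putc(s, '-');
    do {
        digits[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    while (n > 0)
        sink_putc(s, digits[--n]);
}

/* Returns the number of characters stored (excluding the NUL). */
int mini_snprintf(char *buf, size_t cap, const char *fmt, ...)
{
    struct sink s = { buf, cap, 0 };
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt != '%') {
            sink_putc(&s, *fmt);
            continue;
        }
        ++fmt;                        /* look at the conversion character */
        if (*fmt == '\0')
            break;                    /* lone '%' at the end of the format */
        switch (*fmt) {
        case 'd': sink_put_int(&s, va_arg(ap, int)); break;
        case 's': sink_puts(&s, va_arg(ap, const char *)); break;
        case 'c': sink_putc(&s, (char)va_arg(ap, int)); break;
        case '%': sink_putc(&s, '%'); break;
        default:  sink_putc(&s, '%'); sink_putc(&s, *fmt); break;
        }
    }
    va_end(ap);
    if (cap > 0)
        s.buf[s.len] = '\0';
    return (int)s.len;
}
```

Replacing the buffer sink with a platform-specific putchar would turn the same formatting loop into a printf.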
You're certainly not obligated to use the standard libraries if you have no need for them. Quite a few embedded systems either have no standard library support or can't use it for one reason or another. The standard even specifically talks about implementations with no library support, C99 standard 5.1.2.1 "Freestanding environment":
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The headers required by C99 to be available in a freestanding implementation are <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. These headers define only types and macros so there's no need for a function library to support them.
Without the standard library, you're entirely reliant on your own code, any non-standard libraries that might be available to you, and any operating system calls that you might be able to interface to (which might be considered non-standard library calls). Quite possibly you'd have to have your C program call assembly routines to interface to devices and/or whatever operating system might be on the platform.
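Relying on your own code can look like the following sketch, which needs nothing beyond <stddef.h>, one of the headers listed above. The names my_strlen and my_memcpy are hypothetical stand-ins for the <string.h> functions a hosted library would provide.

```c
#include <stddef.h>   /* size_t: available even in a freestanding implementation */

/* Length of a NUL-terminated string, without <string.h>. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

/* Copy n bytes from src to dst; the regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```

Both functions are plain C with no system dependencies, which is why this part of a library tends to be portable while the I/O part is not.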
You can't do a lot, since most of the standard library functions rely on system calls; you are limited to what you can do with the built-in C keywords and operators. It also depends on the system; in some systems you may be able to manipulate bits in a way that results in some external functionality, but this is likely to be the exception rather than the rule.
C's elegance is in its simplicity, however. Unlike Fortran, which includes much functionality as part of the language, C is quite dependent on its library. This gives it a great degree of flexibility, at the expense of being somewhat less consistent from platform to platform.
This works well, for example, in the operating system, where completely separate "libraries" are implemented, to provide similar functionality with an implementation inside the kernel itself.
Some parts of the libraries are specified as part of ANSI C; they are part of the language, I suppose, but not at its core.
None of them is part of the language keywords. However, all C distributions must include an implementation of these libraries. This ensures portability of many programs.
First of all, you could theoretically implement all these functions yourself using a combination of C and assembly, so you could theoretically do anything.
In practical terms, library functions are primarily meant to save you the work of reinventing the wheel. Some things (like string and library functions) are easier to implement. Other things (like I/O) very much depend on the operating system. Writing your own version would be possible for one O/S, but it is going to make the program less portable.
But you could write programs that do a lot of useful things (e.g., calculate PI or the meaning of life, or simulate an automaton). Unless you directly used the OS for I/O, however, it would be very hard to observe what the output is.
In day to day programming, the success of a programming language typically necessitates the availability of a useful high-quality standard library and libraries for many useful tasks. These can be first-party or third-party, but they have to be there.
The std libraries are "standard" libraries, in that for a C compiler to be compliant to a standard (e.g. C99), these libraries must be "include-able." For an interesting example that might help in understanding what this means, have a look at Jessica McKellar's challenge here:
http://blog.ksplice.com/2010/03/libc-free-world/
Edit: The above link has died (thanks Oracle...)
I think this link mirrors the article: https://sudonull.com/post/178679-Hello-from-the-libc-free-world-Part-1
The CRT is part of the C language just as much as the keywords and the syntax. If you are using C, your compiler MUST provide an implementation for your target platform.
Edit:
It's the same as the STL for C++. All languages have a standard library. Maybe assembler as the exception, or some other seriously low level languages. But most medium/high levels have standard libs.
The Standard C Library is part of ANSI C89/ISO C90. I've recently been working on the library for a C compiler that previously was not ANSI-compliant.
The book The Standard C Library by P.J. Plauger was a great reference for that project. In addition to spelling out the requirements of the standard, Plauger explains the history of each .h file and the reasons behind some of the API design. He also provides a full implementation of the library, something that helped me greatly when something in the standard wasn't clear.
The standard describes the macros, types and functions for each of 15 header files (including stdio.h and stdlib.h, but also float.h, limits.h, math.h, locale.h and more).
A compiler can't claim to be ANSI C unless it includes the standard library.
Assembly language has simple commands that move values to registers of the CPU, memory, and other basic functions, as well as perform the core capabilities and calculations of the machine. C libraries are basically chunks of assembly code. You can also use assembly code in your C programs. var is an assembly code instruction. When you use 0x before a number to make it Hex, that is assembly instruction. Assembly code is the readable form of machine code, which is the visual form of the actual switch states of the circuits paths.
So while the machine code, and therefore the assembly code, is built into the machine, C languages are combined of all kinds of pre-formed combinations of code, including your own functions that might be in part assembly language and in part calling on other functions of assembly language or other C libraries. So the assembly code is the foundation of all the programming, and after that it's anyone's guess about what is what. That's why there are so many languages and so few true standards.
Yes you can do a ton of stuff without libraries.
The lifesaver is __asm__ in GCC. It is a keyword so yes you can.
Mostly because every programming language is built on Assembly, and you can make system calls directly under some OSes.

Resources