How do C output functions actually work under the hood? [duplicate] - c

I am trying to learn some C, but I am finding some of the standard functions a bit opaque.
Take putc or putchar as an example. I am trying to work out what drives this at the most basic level. I have tried to follow their definitions back through the GNU compiler source but it just ends up in this enormous tree of source files.
Is there a primitive "print this character" function that all the others are built from? I had assumed that it was just a write() system call, but an answer to a related question said that this is completely implementation specific. So how else can it actually produce the output if not a system call?

So how else can it actually produce the output if not a system call?
It does use a system call, but the specific system call is implementation-dependent.
Unix implementations use the write() system call. Implementations for other operating systems will use whatever is analogous to this.
There could also be standalone implementations that run directly on hardware without an operating system. These freestanding (sometimes called "unhosted") implementations might omit the stdio library, or they could implement its features by accessing the hardware directly. In that case there is no system call; the I/O is done by the stdio library itself.

On Unix-based systems, of which Linux is one, most functions in the stdio library are wrappers one layer above the standard I/O system calls. The operating system provides a set of APIs called system calls. Applications cannot access hardware resources directly, so they invoke these system calls whenever they need to do something privileged, such as writing to the screen or reading from the keyboard.
In Unix, everything is abstracted as a file so whenever you need to write characters to a screen, all you need to do is open some file that represents the "screen" and write those characters there. The kernel will take care of the rest. Quite simply, this is how you'd do this in C:
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFF_SIZE 2

int main(void)
{
    // The character to print plus a newline (deliberately not NUL-terminated)
    const char buffer[BUFF_SIZE] = { 'C', '\n' };

    int terminal = open("/dev/tty", O_WRONLY); // system call to open the terminal
    if (terminal < 0)
        exit(1); // no controlling terminal, or open failed

    // use our newly opened terminal as the default standard output
    if (dup2(terminal, STDOUT_FILENO) < 0)
        exit(1);

    if (write(terminal, buffer, BUFF_SIZE) != BUFF_SIZE) // system call to write to the terminal
        exit(1); // error or short write

    close(terminal);
    return 0;
}
This just goes to show that everything in stdio is layered on top of the basic I/O system calls: read, write, open and so on. If you want to learn more about system calls and OS internals, read the book "Operating Systems: Three Easy Pieces" by Remzi and Andrea Arpaci-Dusseau.

Related

How puts standard library function works in C? [duplicate]

While going through the standard library functions of C (glibc), I found that printf() actually calls the puts() function (_IO_puts). But I am unable to find out how puts actually writes to stdout.
Does it use the write() system call declared in unistd.h, or something else? One thing I found is that puts() calls _IO_xputn through _IO_putn.
Please help. Thank you.

Do read and write C system calls use buffers?

I was talking with a teacher and he told me that the read and write system calls use buffers, because there is a variable in your system spec that controls how many times you can access the device you want to read from or write to, and the system uses a buffer to store data while it is waiting to write to the device.
I saw in another Stack Overflow post (C fopen vs open) that one of the advantages of the fopen and fwrite functions is that they use buffers (which is supposed to be much faster).
I have read the man pages of the read and write system calls, and they do not mention any buffers.
Did I misunderstand something? How do the buffers of the read / write system calls work?
The functions you mention, read and write, are system calls, so their behavior is platform dependent.
As you know, fread and fwrite are C standard library functions. They buffer in user space and in this way optimize performance for typical applications. read and write are different. There is some stub code in userspace C libraries (such as GNU libc) for these functions, but the main purpose of that code is just to provide a convenient wrapper for invoking the right kernel functionality (it is also possible to invoke that functionality directly with syscall()).
If you're interested in the details, here is an example: the wrapper for write system call in the uclibc library.
So the typical implementations of read and write do not do buffering in user space. They may still do buffering in the kernel space, though. Read about the O_DIRECT flag for more details: How are the O_SYNC and O_DIRECT flags in open(2) different/alike?

What is relationship between some C and Unix functions

For example, in C we have fopen, and in Unix we have open. There are some subtle differences between them, but they do the same thing.
Many other functions also exist in both C and Unix. What is the relationship between them? Which one should I prefer?
open is a system call on Unix systems.
fopen is the standard C function to open a file.
There are some advantages to using fopen rather than open.
It is multi-platform: because it is part of the C standard, you can port your program to any platform with a C compiler.
It supports the C standard I/O functions (e.g. fprintf, fscanf).
If you are dealing with text files, those functions can handle the different newline conventions (Unix/Windows).
fopen(3) returns a FILE* on success, but open(2) returns a file descriptor on success, so they do not do the same thing (they do not even return the same type).
However, on Linux, fopen internally uses the open system call (and some others too...).
<stdio.h> file handles take care of buffering. With system calls like open and read, you had better do your own buffering.
See also this & that and read Advanced Linux Programming & syscalls(2). Be aware that on Linux, from the user-land application point of view, a system call is essentially an atomic elementary operation (e.g. the SYSCALL or SYSENTER machine instruction).
Use strace(1) to find out which system calls are executed (by a given process or command).
On Linux, libc implements the standard functions (like fprintf ...) on top of system calls.
Many system calls don't have any libc counterpart (except their wrapper), e.g. poll(2)

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem and I need to execute programs that take input data from files specified via command line arguments or directly from stdin.
I know it is possible to bake-in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library but currently I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargeting" stubs or hooks that you need to implement to hook the library into the platform. These typically include low-level I/O functions such as open(), read(), write(), seek() etc. You can implement them as you wish in order to provide the basic stdin, stdout and stderr streams (in POSIX and most other implementations they have the fixed file descriptors 0, 1 and 2 respectively, and do not need to be explicitly opened), file I/O and, in this case, access to an arbitrary memory block.
open(), for example, will be passed a file or device name (the string may be interpreted any way you wish) and will return a file descriptor. You might recognise "cfgdata:" as a device name to access your "memory file", and return a unique descriptor that is then passed into read(). You use the descriptor to reference the data for managing the stream; probably little more than an index that is incremented by the number of characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
As commented, you could use the POSIX fmemopen function. You'll need a libc providing it, e.g. musl-libc or possibly glibc. BTW for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uclinux

How does scanf() work inside the OS?

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...
scanf() and printf() are functions in libc (the C standard library), and they ultimately call the read() and write() operating system syscalls respectively, on the file descriptors behind the stdin and stdout streams (fscanf and fprintf allow you to specify the file stream you want to read from or write to).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but in ordinary user mode (inside libc). The general rule of thumb with syscalls is to switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security (you need to be very careful about anything that happens in kernel mode; less code in kernel mode means fewer bugs and security holes in the operating system).
By the way, all of this was written from a Unix perspective; I don't know how MS Windows works.
On the OS I am working with, scanf and printf are based on the functions getch() and putch().
I think the OS just provides two streams, one for input and the other for output. The streams abstract away how the output data gets presented or where the input data comes from.
So what scanf and printf do is just add bytes to, or consume bytes from, those streams.
scanf, printf and similar functions cannot be written entirely in portable C/C++. Most of their work (parsing the format string, converting values, buffering) is ordinary C, but the final step that actually enters the kernel or touches the hardware is usually a few lines of assembly language, often introduced with the "asm" keyword. Anything written with "asm" is passed through to the object file as given rather than being translated by the compiler. So in short, scanf, printf etc. rely on a small amount of assembly language at the lowest level, and you can design your own input function the same way using the "asm" keyword.

Resources