How does scanf() work inside the OS? - c

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...

scanf() and printf() are functions in libc (the C standard library), and they call the read() and write() operating system syscalls respectively, talking to the file descriptors stdin and stdout respectively (fscanf and fprintf allow you to specify the file stream you want to read/write from).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but just in ordinary user mode (inside 'libc'), the general rule of thumb with syscalls is you switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching, and for security (you need to be very careful about anything that happens in kernel mode! less code in kernel mode means less bugs/security holes in the operating system).
btw.. all of this was written from a unix perspective, I don't know how MS Windows works.

On my OS I am working with scanf and printf are based on functions getch() ant putch().

I think the OS just provides two streams, one for input and the other for output, the streams abstract away how the output data gets presented or where the input data comes from.
so what scanf & printf are doing are just adding bytes (or consuming bytes) from either streams.

scanf , printf etc internally all these types of functions can't be directly written in c/c++ language. internally they all are written in assembly language by the use of keword "asm", any thing written with keyword "asm" are directly introduced to object file irrespective of compilation (not changed even after compilation), and in assembly language we have got predefined codes which can implement all these functions ...... so in short SCANF PRINTF etc ALL ARE WRITTEN IN ASSEMBLY LANGUAGE INTERNALLY. YOU CAN DESIGN YOUR OWN INPUT FUNCTION USING KEYWORD "ASM".

Related

Do read and write C system calls use buffers?

I was talking with a teacher and he told me that read and write system calls was using buffers, because there is a variable in your system spec that controls how many times you can have access to the device you want to read/write on, and the system uses buffer to stock data while he is waiting for writing on the device.
I saw on an other Stack Overflow post (C fopen vs open) that one of the advantages of fopen and fwrite functions was that those functions were using buffers (which is supposed to be way faster).
I have read the man page of read and write sys calls, and the man pages do not talk about any buffers.
Did I misunderstood something ? How do read / write C syscall buffers work?
The functions you mention, read and write are system calls, therefore their behavior is platform dependent.
As you know, fread and fwrite are C standard library functions. They do buffering in the user space and in this way optimize the performance for typical application. read and write are different. There is some stub code in userspace C libraries (such as GNU libc) for these functions, but the main function of that code is just to provide a convenient wrapper for invoking the right kernel functionality (but it's also possible to invoke that functionality with syscall() directly!)
If you're interested in the details, here is an example: the wrapper for write system call in the uclibc library.
So the typical implementations of read and write do not do buffering in user space. They may still do buffering in the kernel space, though. Read about the O_DIRECT flag for more details: How are the O_SYNC and O_DIRECT flags in open(2) different/alike?

Is using streams over pipes under Linux worthwhile?

When using pipes to communicate between processes under Linux, is there any benefit to creating streams from the pipes using fdopen and then using fread/fwrite on the streams instead of read/write?
Standard Input/Output (stdio)
fdopen is part of the stdio library. From stdio manual, you get this:
The standard I/O library provides a simple and efficient buffered
stream I/O interface. Input and output is mapped into logical data
streams and the physical I/O characteristics are concealed. The
functions and macros are listed below; more information is available
from the individual man pages.
And then:
The stdio library is a part of the library libc and routines are
automatically loaded as needed by the compilers cc(1) and pc(1). The
SYNOPSIS sections of the following manual pages indicate which
include files are to be used, what the compiler declaration for the
function looks like and which external variables are of interest.
Being part of the libc, it means that programs written using these functions will compile in all standard-conforming compilers. If you write a program using open/write (which are POSIX), then your program will only run on POSIX systems.
So you could reason that (a) it's worth because of portability and (b) it's not worth it if you're only using it in Linux, because then using open/write you remove a whole lot of abstraction (from stdio) - keep in mind that under GNU GLibC open/write are wrappers around the syscalls, you're not actually calling then directly, so a small amount of abstraction is present.
Writing into a pipe involves a syscall and a context switch. If you would like to minimize these, you may like to use stdio functions to do buffering in the user space, and this also allows for formatted output with fprintf.
A FILE* created out of a file descriptor using fdopen() will provide the additional features of buffering, error checking (feof(), ferror()) etc which you may or may not need.
I don't see any benefit of using a fdopen() mainly because the pipe itself will do certain level of buffering (on modern Linux, it's 64K).
Besides, in most use-cases where pipes are used in IPC, buffering isn't desirable.
So, I don't see any benefit of using fdopen(). Using read() & write() directly will be sufficient and often desirable in IPC.

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem and I need to execute programs that take input data from files specified via command like arguments or directly from stdin.
I know it is possible to bake-in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library but currently I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargetting" stubs or hooks that you need to implement to hook the library into the platform. This will typically include low-level I/O functions such as open(), read(), write(), seek() etc. You can implement these as you wish to implement the basic stdin, stdout, stderr streams (in POSIX and most other implementations they will have fixed file descriptors 0, 1 and 2 respectively, and do not need to be explicitly opened), file I/O and in this case for managing an arbitrary memory block.
open() for example will be passed a file or device name (the string may be interpreted any way you wish), and will return a file descriptor. You might perhaps recognise "cfgdata:" as a device name to access your "memory file", and you would return a unique descriptor that is then passed into read(). You use the descriptor to reference data for managing the stream; probably little more that an index that is incremented by the number if characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
As commented, you could use the POSIX fmemopen function. You'll need a libc providing it, e.g. musl-libc or possibly glibc. BTW for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uclinux

What is the meaning of low-level I/O? How do I implement it in this program?

I need to write a C program that accepts three command line arguments:
input file one
input file two
name of output file
The program needs to read the data in from files 1 and 2 and concatenate the first file followed by the second file, resulting in the third file.
This seems like it should be pretty easy, but one of the stipulations of the assignment is to only use low-level I/O.
What exactly does that mean (low-level I/O)?
To answer the only question (what is low-level I/O) it probably means operating system native input/output functions.
In POSIX this would be e.g. open(), close(), read() and write().
On Windows e.g. CreateFile(), CloseHandle(), ReadFile() and WriteFile().
Low level basically stands for OS level. This can be done by using System calls.
Application developers often do not have direct access to the system calls, but can access them through an application programming interface (API). The functions that are included in the API invoke the actual system calls. By using the API, certain benefits can be gained:
Portability: as long a system supports an API, any program using that API can compile and run.
Ease of Use: using the API can be significantly easier then using the actual system call.
For more information on system calls have a look here ,here and here.
For your program have a look here.

Understanding the hardware of printf

I was wondering if there was any resources available online that explains what happens with something, like printf of C, that explains what's going on in the very low level (BIOS/kernel calls)
Linux:
printf() ---> printf() in the C library ---> write() in C library ---> write() system call in kernel.
To understand the interface between user space and kernel space, you will need to have some knowledge of how system calls work.
To understand what is going on at the lowest levels, you will need to analyze the source code in the kernel.
The Linux system call quick reference (pdf link) may be useful as it identifies where in the kernel source you might begin looking.
Something like printf, or printf specifically? That is somewhat vague.
printf outputs to the stdout FILE* stream; what that is associated with is system dependent and can moreover be redirected to any other stream device for which the OS provides a suitable device driver. I work in embedded systems, and most often stdout is by default directed to a UART for serial I/O - often that is the only stream I/O device supported, and cannot be redirected. In a GUI OS for console mode applications, the output is 'drawn' graphically in the system defined terminal font to a window, in Windows for example this may involve GDI or DirectDraw calls, which in turn access the video hardware's device driver. On a modern desktop OS, console character output does not involve the BIOS at all other than perhaps initial bootstrapping.
So in short, there typically is a huge amount of software between a printf() call and the hardware upon which it is output.
This is very platform-specific. From a hardware perspective, the back-end implementation of printf() could be directed to a serial port, a non-serial LCD, etc. You're really asking two questions:
How does printf() interpret arguments and format string to generate correct output?
How does output get from printf() to your target device?
You must remember that an OS, kernel, and BIOS are not required for an application to function. Embedded apps typically have printf() and other IO routines write to a character ring buffer. An interrupt may then poll that buffer and manipulate output hardware (LCD, serial port, laser show, etc) to send the buffered output to the correct destination.
By definition, BIOS and kernel calls are platform-specific. What platform are you interested in? Several links to Linux-related information have already been posted.
Also note that printf may not even result in any BIOS or kernel calls, as your platform may not have a kernel or BIOS present (embedded systems are a good example of this).
The printf() takes multiple arguments (variable length arguments function). The user supplies a string and input arguments.
The printf() function creates an internal buffer for constructing output string.
Now, printf() iterates through each character of user string and copies the character to the output string. Printf() only stops at "%".
"%" means there is an argument to convert(Arguments are in the form of char, int, long, float, double or string). It converts it to string and appends to the output buffer. If the argument is a string then it does a string copy.
Finally, printf() may reach at the end of user sting and it copies the entire buffer to the stdout file.

Resources