Do read and write C system calls use buffers?

I was talking with a teacher and he told me that the read and write system calls use buffers, because there is a setting in your system that limits how often you can access the device you want to read/write on, and the system uses a buffer to store the data while it is waiting to write to the device.
I saw in another Stack Overflow post (C fopen vs open) that one of the advantages of the fopen and fwrite functions is that they use buffers (which is supposed to be much faster).
I have read the man pages of the read and write system calls, and they do not mention any buffers.
Did I misunderstand something? How do the read/write syscall buffers work?

The functions you mention, read and write, are system calls, so their behavior is platform dependent.
As you know, fread and fwrite are C standard library functions. They do buffering in user space and in this way optimize performance for typical applications. read and write are different. There is some stub code for them in user-space C libraries (such as GNU libc), but the main purpose of that code is just to provide a convenient wrapper for invoking the right kernel functionality (it's also possible to invoke that functionality directly with syscall()!).
If you're interested in the details, here is an example: the wrapper for write system call in the uclibc library.
So the typical implementations of read and write do not do buffering in user space. They may still do buffering in the kernel space, though. Read about the O_DIRECT flag for more details: How are the O_SYNC and O_DIRECT flags in open(2) different/alike?
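To illustrate how thin these wrappers are, here is a minimal sketch (assuming Linux with glibc) that writes to standard output once through the write() wrapper and once by invoking the syscall directly; neither path buffers anything in user space:

#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    const char *msg = "hello via write()\n";
    /* write() is a thin libc wrapper around the kernel's write syscall;
       no user-space buffering happens here. */
    write(STDOUT_FILENO, msg, strlen(msg));

    const char *msg2 = "hello via syscall()\n";
    /* The same kernel functionality, invoked without the wrapper. */
    syscall(SYS_write, STDOUT_FILENO, msg2, strlen(msg2));
    return 0;
}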

Related

Is using streams over pipes under Linux worthwhile?

When using pipes to communicate between processes under Linux, is there any benefit to creating streams from the pipes using fdopen and then using fread/fwrite on the streams instead of read/write?
Standard Input/Output (stdio)
fdopen is part of the stdio library. From the stdio manual, you get this:
The standard I/O library provides a simple and efficient buffered
stream I/O interface. Input and output is mapped into logical data
streams and the physical I/O characteristics are concealed. The
functions and macros are listed below; more information is available
from the individual man pages.
And then:
The stdio library is a part of the library libc and routines are
automatically loaded as needed by the compilers cc(1) and pc(1). The
SYNOPSIS sections of the following manual pages indicate which
include files are to be used, what the compiler declaration for the
function looks like and which external variables are of interest.
Since stdio is part of libc, programs written using these functions will compile with any standard-conforming compiler. If you write a program using open/write (which are POSIX), then your program will only run on POSIX systems.
So you could reason that (a) it's worth it because of portability, and (b) it's not worth it if you're only targeting Linux, because then by using open/write you drop a whole layer of abstraction (stdio) - keep in mind that under GNU glibc open/write are wrappers around the syscalls, you're not actually calling them directly, so a small amount of abstraction is still present.
Writing into a pipe involves a syscall and a context switch. If you would like to minimize these, you may like to use stdio functions to do buffering in the user space, and this also allows for formatted output with fprintf.
A FILE* created out of a file descriptor using fdopen() will provide the additional features of buffering and error checking (feof(), ferror(), etc.) which you may or may not need.
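As a rough sketch of that trade-off (assuming a POSIX system; the loop and record format here are purely illustrative), wrapping the write end of a pipe with fdopen() lets many small fprintf() calls be coalesced into far fewer write() syscalls:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return 1;

    /* Wrap the write end in a stdio stream: fprintf output is buffered
       in user space and reaches the pipe in larger chunks. */
    FILE *out = fdopen(fds[1], "w");
    if (out == NULL)
        return 1;

    for (int i = 0; i < 100; i++)
        fprintf(out, "record %d\n", i);   /* no syscall per line */

    fflush(out);                          /* one (or a few) write()s for everything */

    char buf[4096];
    ssize_t n = read(fds[0], buf, sizeof buf);
    printf("read %zd bytes from the pipe\n", n);

    fclose(out);
    close(fds[0]);
    return 0;
}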
I don't see much benefit in using fdopen(), mainly because the pipe itself already does a certain level of buffering (on modern Linux, 64K).
Besides, in most use-cases where pipes are used in IPC, buffering isn't desirable.
So, I don't see any benefit of using fdopen(). Using read() & write() directly will be sufficient and often desirable in IPC.

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem and I need to execute programs that take input data from files specified via command-line arguments or directly from stdin.
I know it is possible to bake-in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library but currently I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargeting" stubs or hooks that you need to implement to hook the library into the platform. This will typically include low-level I/O functions such as open(), read(), write(), seek(), etc. You can implement these however you wish in order to provide the basic stdin, stdout and stderr streams (in POSIX and most other implementations they have the fixed file descriptors 0, 1 and 2 respectively and do not need to be explicitly opened), file I/O, and in this case the management of an arbitrary memory block.
open(), for example, will be passed a file or device name (the string may be interpreted any way you wish) and will return a file descriptor. You might perhaps recognise "cfgdata:" as a device name to access your "memory file", and you would return a unique descriptor that is then passed into read(). You use the descriptor to reference data for managing the stream; probably little more than an index that is incremented by the number of characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
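As a rough sketch of such a retargeting hook (the exact stub names and signatures depend on your toolchain's C library; newlib, for instance, uses _read()/_write(), and the cfg_data blob here is made up), serving a baked-in text buffer as the data behind stdin could look like this:

#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical baked-in "file" contents. */
static const char cfg_data[] =
    "param1=42\n"
    "param2=hello\n";
static size_t cfg_pos;                /* current read position in the blob */

/* newlib-style retargeting stub, called by the C library whenever
   stdio needs more input for fd 0 (stdin). */
int _read(int fd, char *buf, int len)
{
    if (fd == STDIN_FILENO) {
        size_t left = sizeof cfg_data - 1 - cfg_pos;
        size_t n = (size_t)len < left ? (size_t)len : left;
        memcpy(buf, cfg_data + cfg_pos, n);
        cfg_pos += n;
        return (int)n;                /* returning 0 signals end-of-file */
    }
    errno = EBADF;
    return -1;
}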
As commented, you could use the POSIX fmemopen function. You'll need a libc providing it, e.g. musl-libc or possibly glibc. BTW, for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uClinux.
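A minimal sketch using fmemopen() (POSIX.1-2008; the buffer contents here are made up) that presents an in-memory buffer as an ordinary FILE* stream:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

int main(void)
{
    static char data[] = "line one\nline two\n";

    /* Open the in-memory buffer as a read-only stream. */
    FILE *fake = fmemopen(data, strlen(data), "r");
    if (fake == NULL)
        return 1;

    /* Code written against a FILE* works unchanged. */
    char line[64];
    while (fgets(line, sizeof line, fake) != NULL)
        fputs(line, stdout);

    fclose(fake);
    return 0;
}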

How to disable Operating System(Ubuntu) cache in C program

As far as I know, I can disable OS cache through use open() with O_DIRECT. But How to do that if I am willing to use fopen() instead of open()?
I think that due to the alignment requirements of the O_DIRECT flag it's not possible (see that question). The f...() I/O family uses an internal buffer to cache I/O operations, and I don't think a standard implementation would align that buffer appropriately.
Edit
For special purposes, I could think of two non-portable solutions:
If you are sure, that your file system doesn't require any special alignment, you could use fdopen():
int fd = open( ....., O_WRONLY|O_DIRECT );
FILE *fp = fdopen( fd, "w" );
If you are working on Linux only, using fopencookie() could be a solution:
Use the cookie to transport the 'real' fd from open() and provide a write function that copies the data to an appropriately aligned buffer and then calls write() (I have never used fopencookie(), but I think it could be worth trying if using a non-standard GNU extension isn't a no-go).
In both cases be aware that the f...() I/O functions still do internal buffering, so real write()s may not occur before you call fflush() or fclose().
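A rough sketch of the fopencookie() idea (GNU-specific; open_direct_stream() and direct_write() are made-up names, and the block-alignment bookkeeping that O_DIRECT actually requires is deliberately elided):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* The cookie carries the real fd obtained from open(..., O_DIRECT). */
static ssize_t direct_write(void *cookie, const char *buf, size_t size)
{
    int fd = *(int *)cookie;
    /* Real code must copy into a block-aligned buffer and round the
       transfer size to the device block size before calling write();
       that bookkeeping is omitted here. */
    return write(fd, buf, size);
}

FILE *open_direct_stream(int fd)
{
    int *cookie = malloc(sizeof *cookie);
    if (cookie == NULL)
        return NULL;
    *cookie = fd;

    cookie_io_functions_t io = { .write = direct_write };
    return fopencookie(cookie, "w", io);
}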
After each read/write on the file, you can call fflush() to force stdio to flush its user-space buffers down to the lower-level buffers. syncfs() may be of use to you to force the kernel to flush its buffers to disk. If you need greater control at a lower level, you will probably just have to use open() instead of fopen().
You may also want to explore the available ioctl() calls for your disk and memory devices to see if caching can be disabled system-wide at that level.
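For completeness, a minimal sketch of the lower-level open()/O_DIRECT route (Linux-specific; the file name and the 4096-byte alignment are assumptions, the real block size depends on the device):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* O_DIRECT requires the buffer, the file offset and the transfer
       size to be suitably aligned, typically to the logical block size. */
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd == -1)
        return 1;

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 'x', 4096);

    /* Bypasses the kernel page cache and goes (more or less) straight
       to the device. */
    if (write(fd, buf, 4096) == -1)
        return 1;

    free(buf);
    close(fd);
    return 0;
}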

How to avoid the buffer mechanism from FileSystem

Take VirtualBox's virtual disk as an example: if VirtualBox didn't bypass the buffering done by the file system of the host OS, the file system in the guest OS would end up moving data from memory to memory.
In fact, I want to write a file system in user space (putting all directories and files in a single big file). But if I use C APIs such as fread and fwrite, the file system in the OS will buffer the data that my user-space file system reads and writes. My user-space file system already implements a buffering mechanism of its own, so if I can't bypass the OS-level buffering it will move data from memory to memory, which is bad.
Does anyone know how to solve this problem?
stdio doesn't support that.
For *NIX: see man open for O_DIRECT, and also man posix_fadvise and man madvise.
For Windows, check CreateFile for FILE_FLAG_NO_BUFFERING. It's probably also a good idea to dig into CreateFileMapping.
Your question isn't very clear, but if all you want to do is use stdio without buffering, then setbuf(file, NULL); will solve your problem. A better solution might be to avoid stdio entirely and use the lower-level I/O primitives read, write, etc. (not part of plain C but specified by POSIX, with nearly equivalent versions available on most non-POSIX systems too).
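A minimal sketch of turning off stdio's own buffer (the file name is made up; setbuf(file, NULL) is equivalent to the setvbuf() call below, and note this removes only the user-space buffer, not the kernel's page cache):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("bigfile.img", "r+b");
    if (fp == NULL)
        return 1;

    /* Unbuffered mode: every fread/fwrite now maps to a read()/write()
       syscall instead of going through a stdio buffer first. */
    setvbuf(fp, NULL, _IONBF, 0);

    char block[4096];
    size_t n = fread(block, 1, sizeof block, fp);
    (void)n;

    fclose(fp);
    return 0;
}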

How does scanf() work inside the OS?

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...
scanf() and printf() are functions in libc (the C standard library), and they call the read() and write() operating system syscalls respectively, talking to the file descriptors stdin and stdout respectively (fscanf and fprintf allow you to specify the file stream you want to read/write from).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not happen in kernel mode, but in ordinary user mode (inside libc). The general rule of thumb with syscalls is that you switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security (you need to be very careful about anything that happens in kernel mode! Less code in kernel mode means fewer bugs/security holes in the operating system).
By the way, all of this was written from a Unix perspective; I don't know how MS Windows works.
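To make that split concrete, here is a small sketch (assuming a Unix-like system; you can watch the resulting syscalls with a tool such as strace): the formatting happens in libc, and the bytes reach the kernel only when stdout's buffer is flushed.

#include <stdio.h>

int main(void)
{
    /* Format conversion runs in user space inside libc; the resulting
       bytes are placed in stdout's buffer. */
    printf("value = %d\n", 42);
    printf("value = %d\n", 43);

    /* When stdout is redirected to a file or pipe it is fully buffered,
       so both lines typically reach the kernel in a single write(). */
    fflush(stdout);
    return 0;
}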
On the OS I am working with, scanf and printf are built on the functions getch() and putch().
I think the OS just provides two streams, one for input and the other for output. The streams abstract away how the output data gets presented and where the input data comes from.
So what scanf and printf are doing is just adding bytes to (or consuming bytes from) those streams.
scanf, printf and similar functions can't be written purely in C/C++: at the lowest level they rely on assembly, introduced with the asm keyword (code written with asm is passed through to the object file largely untouched by compilation), where predefined instructions/routines perform the actual I/O. So, in short, the innermost layer of scanf/printf is assembly, and you can design your own input function the same way using asm.

Resources