What is the relationship between some C and Unix functions?

For example, C has fopen and Unix has open. There are some subtle differences between them, but they do the same thing.
Many other functions also exist in both C and Unix. What is the relationship between them, and which one should I prefer?

open is a system call on Unix systems.
fopen is the standard C function for opening a file.
There are some advantages to using fopen rather than open:
It's multi-platform: because it is part of the C standard, you can port your program to any platform with a C compiler.
It lets you use the C standard I/O functions (e.g. fprintf, fscanf).
If you are dealing with text files, those functions can handle the different newline conventions (Unix/Windows).

fopen(3) returns a FILE * on success, whereas open(2) returns a file descriptor (an int) on success, so they are not doing the same thing (they do not even return the same type).
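To make the difference in return types concrete, here is a minimal sketch that writes the same message once through stdio and once through the POSIX wrappers. The function names write_with_stdio and write_with_syscalls are made up for this illustration, and the error handling is deliberately simple.

```c
#include <fcntl.h>    /* open and the O_* flags (POSIX) */
#include <stdio.h>    /* fopen, fputs, fclose (ISO C) */
#include <string.h>   /* strlen */
#include <unistd.h>   /* write, close (POSIX) */

/* Hypothetical helper: write msg to path through stdio.
   Returns 0 on success, -1 on error. */
int write_with_stdio(const char *path, const char *msg)
{
    FILE *fp = fopen(path, "w");              /* FILE * on success, NULL on error */
    if (fp == NULL)
        return -1;
    int rc = (fputs(msg, fp) == EOF) ? -1 : 0; /* data sits in the stdio buffer */
    if (fclose(fp) == EOF)                     /* flushes the buffer, then closes */
        rc = -1;
    return rc;
}

/* Hypothetical helper: write msg to path through the system call wrappers.
   Returns 0 on success, -1 on error. */
int write_with_syscalls(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* int, -1 on error */
    if (fd == -1)
        return -1;
    size_t left = strlen(msg);
    while (left > 0) {                         /* write may write less than asked */
        ssize_t n = write(fd, msg, left);
        if (n == -1) {
            close(fd);
            return -1;
        }
        msg += n;
        left -= (size_t)n;
    }
    return close(fd);
}
```

Both helpers leave the same bytes in the file; the stdio version is portable ISO C, while the second one needs a POSIX system.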
However, on Linux, fopen internally uses the open system call (and some others too).
<stdio.h> streams handle buffering for you. With system calls like open and read, you had better do your own buffering.
See also Advanced Linux Programming and syscalls(2). Be aware that on Linux, from the point of view of a user-land application, a system call is essentially an atomic elementary operation (e.g. performed by the SYSCALL or SYSENTER machine instruction).
Use strace(1) to find out which system calls are executed (by a given process or command).
On Linux, libc implements the standard functions (like fprintf) on top of system calls.
Many system calls have no libc counterpart other than their thin wrapper, e.g. poll(2).
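As an illustration of a system call that has no stdio-level counterpart, here is a small sketch around poll(2). The helper name readable_within is invented for this example.

```c
#include <poll.h>   /* poll, struct pollfd, POLLIN (POSIX) */

/* Hypothetical helper: returns 1 if fd has data to read within timeout_ms,
   0 if it does not (timeout, or only a hang-up/error condition was reported),
   and -1 if poll itself failed. */
int readable_within(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);   /* thin wrapper around the system call */
    if (rc <= 0)
        return rc;                        /* 0: timeout, -1: error */
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

There is nothing like this in <stdio.h>; a FILE * offers no portable way to ask whether a read would block.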

Related

How exactly does _fsopen() work?

How exactly does _fsopen() work? Does Linux also have a similar way of opening files that prepares the file for subsequent shared reading or writing based on shflag?
Referred article here.
How exactly does _fsopen() work?
You've linked to the docs. It does what they say it does. If you're asking how it is implemented, then we cannot answer, because that information is proprietary.
and Does Linux also have a similar way of opening files that prepares the file for subsequent shared reading or writing based on shflag?
Linux does not have share modes. That's a Windows quirk. Under Linux or other Unix-like operating systems such as macOS, you don't need special flags or modes to share files between processes.
Overall, _fsopen() is an MS-specific variant of the C standard library's fopen() function. In addition to the share-mode flag, which is not relevant to other operating systems, it performs parameter validation in the manner of various other MS extension functions. On Linux, one takes responsibility for validating one's own arguments and simply uses fopen().
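A minimal sketch of what "no share modes" means in practice on Linux: the same file can be opened twice at once with plain fopen(), and the reader sees what the writer has flushed. Both streams live in one process here only to keep the example short, and the helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: opens path for writing and, while that stream is
   still open, opens it again for reading. Returns the number of bytes the
   reader sees, or -1 on error. */
long write_then_read_concurrently(const char *path, const char *msg)
{
    FILE *w = fopen(path, "w");
    if (w == NULL)
        return -1;
    FILE *r = fopen(path, "r");   /* second open needs no special share flag */
    if (r == NULL) {
        fclose(w);
        return -1;
    }
    fputs(msg, w);
    fflush(w);                    /* hand the data to the kernel so r can see it */
    long n = 0;
    while (fgetc(r) != EOF)
        n++;
    fclose(r);
    fclose(w);
    return n;
}
```

If the processes do need to coordinate, that is done separately (for example with advisory locks), not through the open call itself.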
On Windows, files are opened using the CreateFileW function, which uses the NtCreateFile system call.
The dwShareMode argument specifies the file sharing policy and contains a combination of the flags FILE_SHARE_DELETE, FILE_SHARE_READ and FILE_SHARE_WRITE, which map to the shflag argument of _fsopen.
If you want to know what a possible implementation of the function could look like, first keep in mind that MSVCRT tries to support an equivalent of the POSIX file descriptor API. Then check the following functions:
_open_osfhandle allows you to convert an NT HANDLE to a POSIX-like file descriptor.
_fdopen allows you to get a FILE * from a file descriptor (equivalent of the POSIX fdopen function).
So a possible implementation could look like this (in pseudocode):
FILE *_fsopen(...)
{
    HANDLE hFile = CreateFileW(...);
    int fd = _open_osfhandle(hFile, ...);
    return _fdopen(fd, ...);
}
Linux has no notion of a file sharing policy, so there is no equivalent.
PS: Another related function is _wsopen, which combines CreateFileW and _open_osfhandle.

Do read and write C system calls use buffers?

I was talking with a teacher, and he told me that the read and write system calls use buffers: there is a setting in your system's configuration that controls how often you can access the device you want to read from or write to, and the system uses a buffer to hold the data while it is waiting to write to the device.
I saw in another Stack Overflow post (C fopen vs open) that one of the advantages of the fopen and fwrite functions is that they use buffers (which is supposed to be much faster).
I have read the man pages of the read and write system calls, and they do not mention any buffers.
Did I misunderstand something? How do the buffers of the read / write system calls work?
The functions you mention, read and write, are system calls; therefore their behavior is platform dependent.
As you know, fread and fwrite are C standard library functions. They do buffering in user space and in this way optimize performance for typical applications. read and write are different. There is some stub code in userspace C libraries (such as GNU libc) for these functions, but the main purpose of that code is just to provide a convenient wrapper for invoking the right kernel functionality (it is also possible to invoke that functionality with syscall() directly!).
If you're interested in the details, here is an example: the wrapper for the write system call in the uclibc library.
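To show how thin the wrapper is, here is a Linux-specific sketch that issues the same write once through the libc wrapper and once through syscall(2) with SYS_write. The two helper names are invented for the example; syscall() and SYS_write are the glibc/Linux interfaces and are not portable.

```c
#define _GNU_SOURCE        /* makes glibc declare syscall() */
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* write, syscall */

/* Hypothetical helper: the usual way, through the libc wrapper. */
ssize_t write_via_wrapper(int fd, const void *buf, size_t len)
{
    return write(fd, buf, len);
}

/* Hypothetical helper: the same kernel entry point, invoked by number.
   Neither version adds any user-space buffering. */
ssize_t write_via_syscall(int fd, const void *buf, size_t len)
{
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}
```

Running a program that uses either helper under strace should show one write system call per invocation.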
So the typical implementations of read and write do not do buffering in user space. They may still do buffering in the kernel space, though. Read about the O_DIRECT flag for more details: How are the O_SYNC and O_DIRECT flags in open(2) different/alike?
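The user-space buffering of stdio can be observed directly. The sketch below (helper names are hypothetical) writes a short message through a fully buffered stream and asks the kernel for the file size before and after fflush(); it assumes the message is shorter than BUFSIZ.

```c
#include <stdio.h>      /* fopen, setvbuf, fputs, fflush, fclose */
#include <sys/stat.h>   /* stat (POSIX) */

/* Hypothetical helper: size of the file as the kernel currently sees it,
   or -1 on error. */
long kernel_visible_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/* Hypothetical helper: writes msg through stdio and records the file size
   seen by the kernel before and after the explicit flush.
   Returns 0 on success, -1 if the file could not be opened. */
int observe_stdio_buffering(const char *path, const char *msg,
                            long *before_flush, long *after_flush)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* ask for full buffering explicitly */
    fputs(msg, fp);                      /* lands in the stdio buffer only */
    *before_flush = kernel_visible_size(path);
    fflush(fp);                          /* now write() is actually called */
    *after_flush = kernel_visible_size(path);
    fclose(fp);
    return 0;
}
```

With a plain write() instead of fputs(), the size would already be non-zero at the first check, since the data goes straight to the kernel (which may still cache it before it reaches the device).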

Is using streams over pipes under Linux worthwhile?

When using pipes to communicate between processes under Linux, is there any benefit to creating streams from the pipes using fdopen and then using fread/fwrite on the streams instead of read/write?
Standard Input/Output (stdio)
fdopen is part of the stdio library. From stdio manual, you get this:
The standard I/O library provides a simple and efficient buffered
stream I/O interface. Input and output is mapped into logical data
streams and the physical I/O characteristics are concealed. The
functions and macros are listed below; more information is available
from the individual man pages.
And then:
The stdio library is a part of the library libc and routines are
automatically loaded as needed by the compilers cc(1) and pc(1). The
SYNOPSIS sections of the following manual pages indicate which
include files are to be used, what the compiler declaration for the
function looks like and which external variables are of interest.
Being part of libc means that programs written using these functions will compile with any standard-conforming compiler (strictly speaking, fread and fwrite are ISO C, while fdopen itself is specified by POSIX). If you write a program using open/write (which are POSIX), then your program will only run on POSIX systems.
So you could reason that (a) it's worthwhile for portability and (b) it's not worthwhile if you only target Linux, because by using open/write you remove a whole layer of abstraction (stdio). Keep in mind that under GNU glibc open/write are wrappers around the syscalls; you're not actually calling them directly, so a small amount of abstraction is still present.
Writing into a pipe involves a syscall and a context switch. If you would like to minimize these, you may like to use stdio functions to do buffering in the user space, and this also allows for formatted output with fprintf.
A FILE* created out of a file descriptor using fdopen() will provide the additional features of buffering, error checking (feof(), ferror()) etc which you may or may not need.
I don't see any benefit in using fdopen(), mainly because the pipe itself already does a certain amount of buffering (on modern Linux, 64K).
Besides, in most use-cases where pipes are used in IPC, buffering isn't desirable.
So, I don't see any benefit of using fdopen(). Using read() & write() directly will be sufficient and often desirable in IPC.

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem and I need to execute programs that take input data from files specified via command line arguments or directly from stdin.
I know it is possible to bake-in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library but currently I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargeting" stubs or hooks that you need to implement to hook the library into the platform. These will typically include low-level I/O functions such as open(), read(), write(), seek() etc. You can implement them however you wish in order to provide the basic stdin, stdout and stderr streams (in POSIX and most other implementations they have the fixed file descriptors 0, 1 and 2 respectively, and do not need to be explicitly opened), file I/O, and in this case access to an arbitrary memory block.
open(), for example, will be passed a file or device name (the string may be interpreted any way you wish) and will return a file descriptor. You might perhaps recognise "cfgdata:" as a device name to access your "memory file", and you would return a unique descriptor that is then passed into read(). You use the descriptor to reference the data for managing the stream; probably little more than an index that is incremented by the number of characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
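As a rough sketch of the idea, the low-level hooks for such a "memory file" could look like the following. Every name here (memfile_open, memfile_read, memfile_seek, MEMFILE_FD, the sample data) is hypothetical; the real stub names and signatures depend on your C library's retargeting layer, so check its documentation.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strcmp, memcpy */

/* Hypothetical baked-in data; on a real target it might be linked in
   from a resource file. */
static const char memfile_data[] = "42 hello\n";
static size_t memfile_pos;         /* current read offset into the data */

#define MEMFILE_FD 3               /* hypothetical descriptor, after 0, 1, 2 */

/* Recognise the device name and hand out the descriptor. */
int memfile_open(const char *name)
{
    if (strcmp(name, "cfgdata:") != 0)
        return -1;
    memfile_pos = 0;
    return MEMFILE_FD;
}

/* Copy up to len bytes and advance the index; returns 0 at end of data. */
int memfile_read(int fd, void *buf, size_t len)
{
    if (fd != MEMFILE_FD)
        return -1;
    size_t size = sizeof memfile_data - 1;   /* exclude the trailing '\0' */
    size_t left = size - memfile_pos;
    if (len > left)
        len = left;
    memcpy(buf, memfile_data + memfile_pos, len);
    memfile_pos += len;
    return (int)len;
}

/* Set the index directly (absolute offsets only in this sketch). */
long memfile_seek(int fd, long offset)
{
    size_t size = sizeof memfile_data - 1;
    if (fd != MEMFILE_FD || offset < 0 || (size_t)offset > size)
        return -1;
    memfile_pos = (size_t)offset;
    return offset;
}
```

In a real port these bodies would sit behind whatever low-level entry points the library expects, so that fopen("cfgdata:", "r") and fscanf end up calling them.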
As commented, you could use the POSIX fmemopen function. You'll need a libc providing it, e.g. musl-libc or possibly glibc. BTW for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uclinux
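If your libc does provide fmemopen, a minimal sketch looks like this. The helper name first_int_from_memory is made up; the point is only that ordinary stdio functions such as fscanf work on the resulting FILE *.

```c
#define _POSIX_C_SOURCE 200809L   /* fmemopen is POSIX.1-2008 */
#include <stdio.h>                /* fmemopen, fscanf, fclose */
#include <string.h>               /* strlen */

/* Hypothetical helper: treats a block of memory as a read-only file and
   parses the first integer from it. text must be non-empty.
   Returns 0 on success, -1 on error. */
int first_int_from_memory(const char *text, int *out)
{
    /* The cast is safe here because mode "r" never writes to the buffer. */
    FILE *fp = fmemopen((void *)text, strlen(text), "r");
    if (fp == NULL)
        return -1;
    int ok = (fscanf(fp, "%d", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

A program that currently reads from a FILE * can be pointed at such a stream without changing its parsing code; whether stdin itself can be replaced this way depends on the C library.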

How does scanf() work inside the OS?

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...
scanf() and printf() are functions in libc (the C standard library), and they call the read() and write() operating system syscalls respectively, talking to the file descriptors stdin and stdout respectively (fscanf and fprintf allow you to specify the file stream you want to read/write from).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but in ordinary user mode (inside 'libc'). The general rule of thumb with syscalls is to switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security (you need to be very careful about anything that happens in kernel mode! less code in kernel mode means fewer bugs/security holes in the operating system).
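The split between user-mode formatting and the kernel-mode write can be imitated in a few lines. This is a simplified stand-in, not how any particular libc implements printf; the function name print_pair is hypothetical.

```c
#include <stdio.h>    /* snprintf */
#include <unistd.h>   /* write (POSIX) */

/* Hypothetical helper: formats "key=value\n" entirely in user space, then
   hands the finished bytes to the kernel with a single write().
   Returns the number of bytes written, or -1 on error. */
long print_pair(int fd, const char *key, int value)
{
    char buf[128];
    /* Format-string conversion: plain library code, no kernel involved. */
    int len = snprintf(buf, sizeof buf, "%s=%d\n", key, value);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                 /* encoding error or output truncated */
    /* The only point at which the program enters kernel mode. */
    return (long)write(fd, buf, (size_t)len);
}
```

Calling print_pair(1, "x", 7) would send the four bytes of "x=7" plus a newline to standard output in one system call.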
btw.. all of this was written from a unix perspective, I don't know how MS Windows works.
On the OS I am working with, scanf and printf are based on the functions getch() and putch().
I think the OS just provides two streams, one for input and the other for output; the streams abstract away how the output data gets presented or where the input data comes from.
So what scanf and printf are doing is just adding bytes to (or consuming bytes from) one of those streams.
Functions such as scanf and printf cannot be written entirely in portable C/C++: at the lowest level, the step that actually hands data to the operating system or the hardware is usually written in assembly language, for example with the "asm" keyword, whose contents the compiler passes through to the object file essentially unchanged. The parsing and formatting logic above that level is ordinary C. So, in short, the lowest layer underneath scanf, printf etc. relies on assembly language, and you can design your own input function using the keyword "asm".

Resources