How are C file I/O operations handled at a low level?

To extend the title: I am wondering how the OS handles functions like fwrite, fread, fopen, and fclose.
What actually is a stream?
Sorry if I was not clear enough.
BTW, I am using GNU/Linux, Ubuntu 11.04.
A bit better explanation of what I am trying to ask:
I want to know how files are written to the HDD, how they are read into memory, and how a handle to them is later created. Is the BIOS doing that through drivers?

The C library takes a function like fopen and converts that to the proper OS system call. On Linux that is the POSIX open function. You can see the definition for this in a Linux terminal with man 2 open. On Windows the call would be CreateFile which you can see in the MSDN documentation. On Windows NT, that function is in turn another translation of the actual NT kernel function NtCreateFile.
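For illustration, here is a minimal sketch of that relationship on Linux (the file name is arbitrary):

    /* A minimal sketch: the stdio call and, roughly, the POSIX system
       call it is built on (on Linux). */
    #include <stdio.h>
    #include <fcntl.h>   /* open() */
    #include <unistd.h>  /* close() */

    int main(void)
    {
        /* The library call: returns a buffered stream... */
        FILE *f = fopen("example.txt", "r");
        if (f)
            fclose(f);

        /* ...which internally issues roughly this system call and
           stores the resulting file descriptor inside the FILE struct. */
        int fd = open("example.txt", O_RDONLY);
        if (fd >= 0)
            close(fd);
        return 0;
    }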
A stream in the C library is a collection of information stored in a FILE struct. This is usually a 'handle' to the operating system's idea of the file, an area of memory allocated as a 'buffer', and the current read and write positions.
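Conceptually, something like the following; this is an illustrative struct only, not glibc's actual _IO_FILE definition:

    /* A conceptual sketch of the state a FILE stream carries around. */
    struct my_file {
        int    fd;        /* the OS file handle (a file descriptor on Linux) */
        char  *buffer;    /* the I/O buffer */
        size_t buf_size;  /* size of the buffer */
        size_t buf_pos;   /* current position within the buffer */
        long   offset;    /* current read/write position in the file */
        int    flags;     /* open mode, error/EOF indicators, etc. */
    };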
I just noticed you tagged this with 'assembly'. You might then want to know about the really low level details. This seems like a good article.
Now you've changed the question to ask about even lower levels. Well, once the operating system gets a command to open a file, it passes that command to the VFS (Virtual File System). That piece of the operating system looks up the file name, including any directories needed, and does the necessary access checks. If this information is in the RAM cache then no disk access is needed. If not, the VFS sends a read request to the specific file system, probably ext4. The ext4 file system driver will then determine in which disk block that directory is located. It will then send a read command to the disk device driver.
Assuming that the disk driver is AHCI, it will convert the request to read a block into a series of register writes that set up a DMA (Direct Memory Access) transfer. This looks like a good source for some details.
At that point the AHCI controller on the motherboard takes over. It will communicate with the hard disk controller, cooperating to read the data and write it into the DMA memory location.
While this is going on the operating system puts the process on hold so it can continue with other work. The hardware is taking care of things and the CPU isn't required to pay attention. The disk request will take many milliseconds during which the CPU can run millions of instructions.
When the request is complete the AHCI controller will send an interrupt. One of the system CPUs will receive the interrupt, look in its IDT (Interrupt Descriptor Table) and jump to the machine code at that location: the interrupt handler.
The operating system interrupt handler will read some data, find out that it has been interrupted by the AHCI controller, then it will jump into the AHCI driver code. The AHCI driver will read the registers on the controller, determine that the read is complete, put a marker into its operations queue, tell the OS scheduler that it needs to run, then return. Nothing else happens at this point.
The operating system will note that it needs to run the AHCI driver's queue. When it decides to do that (it might have a real-time task running or it might be reading networking packets at the moment) it will then go read the data from the memory block marked for DMA and copy that data to the EXT4 file system driver. That EXT4 driver will then return the data to the VFS which will put it into cache. The VFS will return an operating system file handle to the open system call, which will return that to the fopen library call, which will put that into the FILE struct and return a pointer to that to the program.

fopen et al are usually implemented on top of OS-specific system calls. On Unix, this means the APIs for working with file descriptors: open, read, write, close, and a few others. On Windows, it's CreateFile, ReadFile, etc.
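A minimal sketch of that file-descriptor layer on a Unix-like system (the file name is arbitrary):

    /* The raw system-call layer that fopen/fread/fwrite/fclose wrap. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        ssize_t n;

        int fd = open("example.txt", O_RDONLY);       /* what fopen wraps  */
        if (fd < 0)
            return 1;

        while ((n = read(fd, buf, sizeof(buf))) > 0)  /* what fread wraps  */
            write(STDOUT_FILENO, buf, (size_t)n);     /* what fwrite wraps */

        close(fd);                                    /* what fclose wraps */
        return 0;
    }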

Related

Writing a FreeBSD kernel module that handles arbitrary interrupt and output to device

I would like to write a FreeBSD kernel module that could accept some arbitrary interrupts and, upon receiving these interrupts, output some data to an arbitrary device. Currently, I'm facing several issues:
How would I acquire interrupts through a specific IRQ? On Linux there is the request_irq() call, but it seems there's no similar API for FreeBSD... Say I want to be able to detect all keyboard interrupts through my kernel module (the keyboard is on IRQ 1); how would I do that? (On Linux it is possible by calling free_irq(1, NULL) and then request_irq(1, ...); correct me if I'm wrong though.)
Is it possible at all to write to a device file under /dev from a kernel module? I've read the question Example for reading text files in FreeBSD kernel module; following this example I was able to do reads/writes on regular files, but not on a device file under /dev (the "device" was a pseudo "echo device", the classic one used in char device examples). I was able to open the file, though.
I do understand that it is considered bad practice to do file I/O in the kernel, but I could not think of any other way... If anyone has a better solution please tell me (i.e., write to a device through its device_t node?).
The reason I am doing this in the kernel is that I really need every interrupt to be caught, and running in user space carries the risk of missing interrupts because kernel threads preempt user threads (the interrupts could come very frequently).
I would also appreciate it if anyone could provide some other ideas on how to implement this program (basically, the idea is a kernel module that could do the job of a microcontroller...).
You can register an IRQ handler with bus_setup_intr.
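Roughly like this in a driver's attach routine (a minimal sketch; the interrupt-type flags and the empty handler body are illustrative):

    /* FreeBSD kernel module context: allocate the IRQ resource for the
       device, then hook a handler to it with bus_setup_intr(). */
    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/rman.h>
    #include <machine/resource.h>

    static void
    my_intr(void *arg)
    {
        /* interrupt handler body */
    }

    static int
    my_attach(device_t dev)
    {
        struct resource *irq_res;
        void *intr_cookie;
        int rid = 0;

        irq_res = bus_alloc_resource_any(dev, SYS_RES_IRQ, &rid,
                                         RF_ACTIVE | RF_SHAREABLE);
        if (irq_res == NULL)
            return (ENXIO);

        if (bus_setup_intr(dev, irq_res, INTR_TYPE_MISC | INTR_MPSAFE,
                           NULL, my_intr, NULL, &intr_cookie) != 0)
            return (ENXIO);

        return (0);
    }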
Normally, what one would do in this situation is have a driver collect the interrupts and any other useful data and export them through a device; then a (possibly real-time) process in user space can read from one device, do whatever it needs to do, and write to the other device.

Mocking a memory mapped device in C in userspace

I wish to mock a memory mapped device in C in order to do effective unit testing of a device wrapping library (in Linux).
Now, I know I can mmap a file descriptor into userspace which could in principle represent a mock of said device.
So, AFAICT, my question comes down to this: Is it possible in userspace to create a file descriptor on which mmap can act, with the reading and writing being handled by suitable callbacks?
Alternatively, perhaps this is a solved problem and there is a known kernel driver that can be hooked into?
Considering it's a Linux system, you can implement a very simple FUSE filesystem with just one file on it. The kernel can handle it from there.
The main issue is that you cannot expect the kernel to flush every write immediately. There is an msync() call to flush all outstanding writes, but your System Under Test isn't going to call that. However, I think you can get away with opening the file descriptor using O_DIRECT | O_DSYNC.
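A rough sketch of how the test-harness side might look, assuming the FUSE-backed file lives at the hypothetical path /mnt/mockfs/device:

    /* Open the mock device with O_DIRECT | O_DSYNC, as suggested above,
       then hand the mapping to the library under test. */
    #define _GNU_SOURCE          /* O_DIRECT */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/mockfs/device", O_RDWR | O_DIRECT | O_DSYNC);
        if (fd < 0)
            return 1;

        /* Map the "device registers" exactly as the real code would. */
        void *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED)
            return 1;

        /* ... exercise the device-wrapping library against regs ... */

        munmap(regs, 4096);
        close(fd);
        return 0;
    }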

How to do data cache flush/invalidate from linux user space

I am trying to use cacheable mapped buffers in Linux user space. These buffers will be accessed by the accelerators.
In the ARMv7-A architecture, is there any way to flush/invalidate the data cache explicitly from Linux user space?
I tried __clear_cache(), but it didn't work. Per https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html, my understanding is that it flushes only the instruction cache.
User space applications run in user mode; do we need to set any privileged-mode permissions for cache operations?
More info would be helpful.
There is no way to flush an ARMv7-A/ARMv8-A processor cache from userspace (kernel <= 5.13.x) without writing a kernel driver, such as a simple misc class driver, that lets you trigger an ioctl or sysfs action which in turn calls the kernel API arch_sync_dma_for_device for the area of RAM you wish to flush.
See
#include <linux/dma-noncoherent.h>
for the function prototype for arch_sync_dma_for_device.
So unless the logistics of your project allow you to add a kernel module to the system or rebuild and replace the kernel, you can't flush the processor caches from a userspace application. For legacy projects with product in the field, or projects whose kernel version is locked by digital signing the logistics usually do not support this type of invasive solution.
I have successfully demonstrated such a misc driver that flushes the processor caches on an IPQ ARMv8a implementation for a new product design. The driver took me about two hours to write and test.
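For illustration, the heart of such a misc driver's ioctl handler might look roughly like this. This is a sketch only: the struct layout, the ioctl wiring, and the exact arch_sync_dma_for_device signature (which changed across kernel versions, as did its header) are assumptions:

    #include <linux/fs.h>
    #include <linux/miscdevice.h>
    #include <linux/types.h>
    #include <linux/uaccess.h>
    #include <linux/dma-noncoherent.h>  /* arch_sync_dma_for_device() */

    struct flush_req {                  /* hypothetical user<->kernel ABI */
        __u64 paddr;                    /* physical address of the buffer */
        __u64 size;                     /* length in bytes */
    };

    static long cacheflush_ioctl(struct file *f, unsigned int cmd,
                                 unsigned long arg)
    {
        struct flush_req req;

        if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
            return -EFAULT;

        /* Clean the data cache for the given physical range so the
           accelerator sees what the CPU wrote. */
        arch_sync_dma_for_device(req.paddr, req.size, DMA_TO_DEVICE);
        return 0;
    }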
The __builtin___clear_cache function works in my case (Zynq MP, arm64 + Linux), but I think that is because I use mmapped memory from a custom Linux kernel driver module which allocates a DMA-coherent buffer (dma_alloc_coherent).
Edit: Coming back to this topic, the __builtin___clear_cache function works well in my case on a general /dev/mem mmapped DDR segment. I open /dev/mem without the O_SYNC flag.

Block Device driver read/write from user application

I am trying to implement a "simple file system" for my personal experience. For this, I have created a block device driver with which I will perform read/write operations in units of blocks. Now my question is how I should perform open, read, write, and close operations on the block device from the user application.
What I am actually looking for is a function with which I can open the block device /dev/sbd and have it return the struct block_device on success. And for the read/write functions, I want to issue a struct request to the block device with "buffer, sector_number, number_of_sectors" as parameters.
So far I have only found the block_read() and block_write() functions, but those seem to be BSD-specific. I am using Debian.
Does anyone have any ideas about this?
Thanks.
I've been doing something similar, writing an application-level file system that works with files or devices. What you are writing is not really a device driver, as device drivers are directly handled/used by the kernel; a user application has no way to access one directly. Regardless, I want to point you to the system calls open(2), read(2), write(2), and close(2) (manual page section 2 for all of them). You will need the unistd.h header file to use these. You can set your read/write size to a multiple of your block size when calling read and write. But in the end, you are still going through the kernel.
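For example, something like this minimal sketch (assuming your device node is /dev/sbd and a 512-byte sector size):

    /* Read one sector from the block device through the normal
       file-descriptor API; the kernel's block layer turns this into a
       struct request for your driver. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    #define SECTOR_SIZE 512

    int main(void)
    {
        char buf[SECTOR_SIZE];
        int fd = open("/dev/sbd", O_RDWR);  /* needs sufficient permissions */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Seek to sector 10 and read one sector. */
        if (lseek(fd, 10 * SECTOR_SIZE, SEEK_SET) < 0 ||
            read(fd, buf, SECTOR_SIZE) != SECTOR_SIZE) {
            perror("read");
            close(fd);
            return 1;
        }

        close(fd);
        return 0;
    }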
EDIT: Upon further examination and comments, the device driver really is in the kernel. Normally there is no direct connection between a driver and an application, as there are several layers of code within the kernel to abstract the device so it looks the same as everything else to the application.
There are two ways around this. One is to add one or more entries to the system call table to expose the device driver's read/write routines to the application. Another idea I had was to use the ioctl (I/O control) system call, but that call is meant to control the actual device. For example, the hard disk uses read and write commands to transfer data, but to ask the drive for information about itself, such as its last LBA or its identity data, you would use an ioctl.
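For instance, here is a small sketch of the ioctl path using the standard BLKGETSIZE64 request, which the generic block layer answers for any registered block device (/dev/sbd is the asker's node):

    /* Query a block device for its size in bytes via ioctl. */
    #include <fcntl.h>
    #include <linux/fs.h>    /* BLKGETSIZE64 */
    #include <sys/ioctl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned long long bytes = 0;
        int fd = open("/dev/sbd", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* The ioctl path: control/metadata queries rather than data I/O. */
        if (ioctl(fd, BLKGETSIZE64, &bytes) == 0)
            printf("device size: %llu bytes\n", bytes);

        close(fd);
        return 0;
    }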
Hope this helps.

Why does "read" have to be a system call run in "Kernel Mode"?

As I understand it, the UNIX function read() causes an interrupt (trap) and invokes the system call read. I also remember that it has to switch to "kernel mode" before invoking the system call, and the switch is expensive.
I was wondering why the read operation has to be delegated to a system call in "kernel mode", instead of being done entirely in "user mode".
For example, if there were a service in "user mode" which manages the access permissions of files, the read operation could just query that service without disturbing the kernel.
And for the disk driver, it is said in this link that
Device drivers can run in either user or kernel mode
Does anyone have ideas about this? Why does read have to be in Kernel Mode?
That is not the way operating systems are designed. The purpose of an OS is to manage the computer's hardware and provide resources to its users. Operating systems also have the concepts of user mode and kernel mode (as you said).
By having these concepts, the OS draws a specific line between what a user may do and what they may not. Letting users manage hardware directly is definitely something an OS does not want.
read usually involves a hardware access. Accessing hardware is cumbersome and error prone and can leave the computer in an unusable state. The operating system uses drivers to control the computer's hardware.
Issuing a read (assuming hard disk I/O) generally makes a driver send a set of commands to the disk controller, read its output, pass it to main memory, etc. These are dangerous operations that shouldn't be entrusted to user mode.
If there were a service in user mode to handle this, a context switch would still be needed, because the service would be running as another process.
An operating system that allows this could certainly be built, but modern mainstream operating systems aren't designed that way.
There are other approaches to building operating systems that rely on microkernels. A microkernel does just the minimum to get the machine started and leaves everything else to other modules, meaning that if a module crashes, the system stays up. That is the case for specific drivers, file systems, etc. In microkernel designs such as MINIX or QNX, these components do run in user space.
Hope this helps!
First of all: it's no longer true that calling into the kernel is very expensive. It used to be, when causing an exception/trap/fault/interrupt was the only way to switch from user mode to kernel mode on x86 systems, but that all changed with the addition of the sysenter/sysexit machine instructions, which perform a more lightweight transition.
Even if it is/were expensive in terms of time consumed, system calls that deal with character and block device drivers should run in kernel mode, because dealing with hardware devices involves reading and writing hardware registers, which may be memory mapped or accessed through I/O ports.
These registers must be protected from any access by userspace processes. Otherwise, any process could skip the established API for reading a file and use the hardware registers directly to read and write to the device. In the case of a disk holding a filesystem, this would allow a user process to bypass the filesystem entirely and, hence, the whole security and permission system.
So, if we need to protect these hardware registers so that no user process can touch them, the code that does use them cannot run at the same privilege level as ordinary user processes. Hence, it runs in another (more privileged) mode, which is what is called "kernel mode".
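You can see how thin the read() wrapper over that kernel transition is with a small demonstration (assuming Linux and glibc; the file name is arbitrary):

    /* read() versus issuing the trap by system call number directly:
       both enter kernel mode, the second just skips the libc wrapper. */
    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);  /* any readable file */
        if (fd < 0)
            return 1;

        ssize_t a = read(fd, buf, sizeof(buf));
        ssize_t b = syscall(SYS_read, fd, buf, sizeof(buf));
        printf("read: %zd bytes, SYS_read: %zd bytes\n", a, b);

        close(fd);
        return 0;
    }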
Think about what would happen if you configured a Linux system so that /dev/sda (usually the main hard disk, on which the root filesystem lives) were readable and writable by anybody and everybody:
# chmod 666 /dev/sda
Doing this is more or less the equivalent of exposing the hard disk device to any user process. You could legitimately write a program that opens, reads, and writes files stored on this device, but at the same time you could write a program that opens, reads, and writes ANY file in the partition, no matter what permissions those files have.
That said, there are cases in which a system runs only trusted applications. That kind of system doesn't need the level of protection present in a general-purpose system, and hence it can benefit from the increased speed that comes from not depending on layers of APIs to isolate the process from the hardware. The most widely known example would be a video game console. I recall that Windows CE used to run all its programs and device drivers at the same privilege level, too.
