Will file system be corrupted on raw disk modification? - filesystems

Some OSs allow programs to edit the raw disk, bypassing the file system. If I'm wrong (i.e. only raw reads are allowed, not raw writes), please correct me.
If I'm correct, then what will happen if a program modifies a raw disk, changing some blocks but not updating the disk tables (free block list, FAT, etc.)? Will the file system update them automatically, notify the program, get corrupted, or something else?
For example, if nothing is done, the file system may write a new file to the blocks containing that data, thinking they are currently free (if the free block list was not updated by the program).

Related

What data is held in the operating system's file descriptors?

I've been working with assembly and have been working with file I/O. From what I've learned, the process goes as follows. The CPU makes a system call to the kernel to open a file, e.g. "hello.txt". The kernel then finds that location in the filesystem (persistent storage), makes it accessible for reading and/or writing, and returns a file descriptor that uniquely identifies that file. From my understanding, the file descriptor is an index into a table that stores file data. My question is: what data is stored? Presumably storing the entire file data would get grossly memory-expensive for large files. Does it store file metadata like MIME type, encoding, etc.? Or does it actually store the whole contents?

Memory Mapped I/O in Unix

I am unable to understand how files are managed in memory-mapped I/O. Normally, if we open a file using open or fopen, it returns an fd or a file pointer, respectively. After this open, where does the file reside for processing? Is it in memory (a copy of the file which is on the hard disk) or not? If it is not in memory, where is the data fetched from by subsequent read or write system calls? Or does it fetch data from the hard disk each time read or write is called?
Otherwise, is a copy of the file stored in memory, accessed by the process for further manipulation, and copied back to the hard disk once the process is completed? Which of the above scenarios is the one that actually happens?
The following is the definition given for memory-mapped I/O in the book Advanced Programming in the UNIX Environment (2nd Edition):
Memory-mapped I/O lets us map a file on disk into a buffer in memory so that, when we fetch bytes from the buffer, the corresponding bytes of the file are read. Similarly, when we store data in the buffer, the corresponding bytes are automatically written to the file. This lets us perform I/O without using read or write.
What is mapping a file into memory? Also, they define this memory as being placed between the stack and the heap. What type of data is present in this memory after mapping a file? Does it contain a copy of the file, or the address of the file which resides on the hard disk? And how does the above scenario become true?
Can anyone explain the working mechanism of memory-mapped I/O and the mmap functionality?
Normally when you open a file, the system sets up some bookkeeping structures (metadata) but does not need to read any part of the actual data of the file. When you call read(), the system loads a chunk of the file into (virtual) memory which you allocated for the purpose.
When you memory-map a file, the system again sets up bookkeeping, and also sets up a (virtual) memory "mapping", which means a range of valid addresses which, if used, will reflect reads (or writes) of the underlying file. It does not mean the entire file needs to be read at once, because it can be "paged in" on demand, i.e. the system can give you an address range to use, then wait for you to actually use it before loading any data there. This "page faulting" is supported by a hardware device called the Memory Management Unit, or MMU. The same system is used when you run an executable file: the system can simply map it into virtual memory and read pages (chunks) from disk only as needed.
Is it in memory (a copy of the file which is on the hard disk) or not?
According to Computer Programming and Utilization, when you open a file with fopen, its contents are loaded into memory (partially or wholly).
If it is not in memory, where is the data fetched from by subsequent read or write system calls?
When you fwrite some data, it is eventually copied into the kernel which will then write it to disk (or wherever) after buffering. In general, no part of a file needs to be loaded in order to write.
What is mapping a file into memory?
In this memory, what type of data is present after mapping a file? Does it contain a copy of the file, or the address of the file which resides on the hard disk?
A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource.
It is possible to mmap a file to a region of memory. When this is done, the file can be accessed just like an array in the program. This is more efficient than read or write, as only the regions of the file that a program actually accesses are loaded. Accesses to not-yet-loaded parts of the mmapped region are handled in the same way as swapped-out pages.
After this open, where does the file reside for processing? Is it in memory (a copy of the file which is on the hard disk) or not?
On the disk. It may also be partly or completely in memory if the operating system does a read-ahead, but that isn't detectable by you. You still have to issue reads to get data from the file.
If it is not in memory, where is the data fetched from by subsequent read or write system calls?
From the disk.
Or does it fetch data from the hard disk each time read or write is called?
In effect, but you also have to consider the effect of any caching.
Otherwise, is a copy of the file stored in memory, accessed by the process for further manipulation, and copied back to the hard disk once the process is completed?
No. The file behaves as though it is all on the disk.
And here, they define this memory as being placed between the stack and the heap.
Not in what you quoted.
In this memory, what type of data is present after mapping a file?
The data in the file. The question 'what type of data' doesn't make sense. Data is data.
Does it contain a copy of the file, or the address of the file which resides on the hard disk?
It effectively contains a copy of the file.
And how does the above scenario become true?
Via virtual memory. Too broad to cover here.

How does fwrite/putc write to disk?

Suppose we have an already existing file, say <File>. This file has been opened by a C program for update (r+b). We use fseek to navigate to a point inside <File>, other than the end of it. Now we start writing data using fwrite/fputc. Note that we don't delete any data previously existing in <File>...
How does the system handle those writes? Does it rewrite the whole file to another position in the Disk, now containing the new data? Does it fragment the file and write only the new data in another position (and just remember that in the middle there is some free space)? Does it actually overwrite in place only the part that has changed?
There is a good reason for asking: In the first case, if you continuously update a file, the system can get slow. In the second case, it could be faster but will mess up the File System if done to many files. In the third case, especially if you have a solid state Disk, updating the same spot of a File over and over again may render that part of the Disk useless.
Actually, that's where my question originates from. I've read that, to save disk sectors from overuse, solid state disks move data to less-used sectors, using different techniques. But how exactly do the stdio functions handle such situations?
Thanks in advance for your time! :D
The filesystem driver keeps a kind of dictionary on the disc that records which sectors belong to each file. When you update the content of a file, the filesystem looks up this dictionary, which tells it in which sectors of the disc the file data is located. Then it waits until the disc has spun to the right position and updates the appropriate sectors.
That's the short version.
So when a file is updated, it is normally not moved to a new place. When you write new data to the file by appending to it, and the data doesn't fit into the existing sectors, additional sectors are allocated and the data is written there.
If you delete a file, its sectors are usually marked as free and reused. So only if you create the file anew and rewrite it can it end up in different sectors than before.
But the details can vary depending on the hardware. AFAIK, if you overwrite data on a CD, the data is newly written (as long as the session is not finalized), because you cannot update data on a CD once it is written.
Your understanding is incorrect: "Note that we don't delete any data previously existing in File"
If you seek into the middle of a file and start writing it will write over whatever was at that position before.
How this is done under the covers probably depends on how the controller in the drive implements it. It's supposed to be invisible outside the drive and shouldn't matter.

Getting data from MATLAB Simulink every 0.008s in .txt file

I need to get data from my Simulink model, write it to a .txt file, and have another program read it, all of this every 0.008 s.
Is there any way to do it? All I could manage is to get the data into the workspace.
Also, the system is discrete.
You should use a To File block to save the data to disk. It will figure out the correct buffer size, etc., for you and write the data to disk. You just have to poll from the other program to get new data.
8 milliseconds' worth of data is generally not enough to justify the overhead of disk I/O, so the To File block buffers more than this before writing to disk, and your other program needs more than this to read. This obviously introduces latency.
If you want a lower-latency solution, consider using the UDP or TCP communication blocks that exist in the DSP System Toolbox library.
Of course, it's impossible to say anything without a lot more detail.
How much data? What operating system? What happens if you "miss"? What kind of disk is the file on? Does it really have to be a file on-disk, can't you use e.g. pipes or something to avoid hitting disk? What does the "other program" have to do with the data?
8 milliseconds is not a lot of time for a disk to do anything, you're basically going to be assuming all accesses are in cache in order to work, so factor out the disk. Use a pipe or a RAM disk.
8 milliseconds is also not a lot of time for a typical desktop operating system.

If mmap is faster than legacy file access, where does the time saving come from?

I understand the usage of mmap. A simple read/write operation on a file involves opening the file, allocating the buffer, and calling read (which requires a context switch); the data is then available to the user in the buffer, and changes in the buffer will not be reflected in the file unless it is written explicitly.
Instead, if we use mmap, writing directly to the buffer is nothing but writing into the file.
The Question:
1) The file is on the hard disk and mmapped into the process. Each time I write into the mmapped memory, is it written directly to the file? In that case, doesn't it require any context switch, because the changes are made directly to the file itself? If mmap is faster than legacy file access, where does the time saving come from?
Kindly explain, and correct me if I'm wrong.
Updates to the file are not immediately visible on the disk, but are visible after an unmap or following an msync call. Hence, there is no system call during the updates, and the kernel is not involved. However, since the file is lazily read page by page, as needed, the OS may need to read in portions of the file as you cross page boundaries. The most obvious advantage of memory mapping is that it eliminates kernel-space to user-space data copies. There is also no need for system calls to seek to a specific position in a file.

Resources