Sectors written when over-writing a file? - c

Imagine there is a file of size 5 MB. I open it in write mode in C and then fill it up with junk data of exactly 5 MB. Will the same disk sectors previously used be overwritten or will the OS select new disk sectors for the new data?
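To make the scenario concrete, here is a minimal C sketch of overwriting the existing 5 MB in place (the file name and the junk contents are placeholders). Note that opening with "wb" instead of "r+b" would truncate the file first, which gives the file system more freedom to reallocate blocks, so "r+b" matches the overwrite-in-place scenario better.

    #include <stdio.h>
    #include <string.h>

    #define FILE_SIZE (5 * 1024 * 1024)   /* exactly 5 MB */

    int main(void)
    {
        static char junk[FILE_SIZE];
        memset(junk, 'x', sizeof junk);   /* fill the buffer with junk data */

        /* "r+b" opens the existing file for update without truncating it,
           so the write below replaces the old contents byte for byte. */
        FILE *f = fopen("data.bin", "r+b");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }

        fwrite(junk, 1, sizeof junk, f);  /* overwrite all 5 MB from offset 0 */
        fclose(f);
        return 0;
    }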

It depends on the file system.
Classically, the answer would be 'yes, the same sectors would be overwritten with the new data'.
With journalled file systems, the answer might be different. With flash drive systems, the answer would almost certainly be 'no; new sectors will be written to avoid wearing out the currently written sectors'.

The filesystem could do anything it wishes. But in practice, most traditional file systems will write the data back to the same sectors.
Imagine if it didn't. Every time you wrote to a file, the file system would have to find a new free sector, write to that sector, then update the file system metadata for the file to point to the new sector. This would also cause horrible file fragmentation, because writing a single sector in the middle of your contiguous 5MB file would cause it to fragment. So it's much easier to just write back to the same sector.
The only exception I can think of is JFFS2 because it was designed to support wear leveling on flash.
Now the file system will write to the same sector, but the disk hardware could write anywhere it wants. In fact on SSD/flash drives the hardware, to handle wear leveling, is almost guaranteed to write the data to a different sector. But that is transparent to the OS/file system. (It's possible on hard drives as well due to sector sparing)

Related

How to get size of disk drive sector

Can I get the size of a disk sector via the Linux API/ABI? It is the quantum of disk I/O; normally it equals 512 bytes, but other values are possible (usually a multiple of 512 bytes).
It should not be confused with the size of a logical block or with the sector size of a file system.
On UNIX, a block device is exposed as a file in the file system (/dev/sda, /dev/sr, etc.). That means you can open that file and manipulate its content just as you would the content of the corresponding block device.
So working with a real block device is much like working with a virtual hard disk (the .vhd format, for instance).
But I don't know how to get the sector size in the general case.
At the moment I have a single solution: get the maximal CHS address and the size of the hard drive, both via the BIOS. But I think that's a bad idea, because portability is lost.
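On Linux specifically (so not portable to other UNIXes, but without going through the BIOS), you can ask the kernel for the sector size with ioctl(2) on the opened block device: BLKSSZGET reports the logical sector size (the I/O quantum) and BLKPBSZGET the physical sector size. A minimal sketch, assuming /dev/sda and sufficient permissions:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>       /* BLKSSZGET, BLKPBSZGET */

    int main(void)
    {
        int fd = open("/dev/sda", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        int logical = 0, physical = 0;
        ioctl(fd, BLKSSZGET, &logical);    /* logical sector size (I/O quantum) */
        ioctl(fd, BLKPBSZGET, &physical);  /* physical sector size of the medium */

        printf("logical: %d bytes, physical: %d bytes\n", logical, physical);
        close(fd);
        return 0;
    }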

Giving read() a start position

When you give read a start position - does it slow down read()? Does it have to read everything before the position to find the text it's looking for?
In other words, we have two different read commands,
read(fd,1000,2000)
read(fd,50000,51000)
where, in addition to the file descriptor, we give it two arguments:
read(file descriptor, start, end)
Is there a way to implement read so that the two commands take the same amount of computing time?
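There is no standard read() with that signature, but the closest POSIX call to the hypothetical read(fd, start, end) is pread(2), which reads a given number of bytes starting at a given offset without reading anything before it. A sketch of the two example reads (the file name is an assumption):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.txt", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char buf[1000];

        /* read(fd, 1000, 2000): bytes 1000..1999 */
        ssize_t n1 = pread(fd, buf, sizeof buf, 1000);

        /* read(fd, 50000, 51000): bytes 50000..50999 */
        ssize_t n2 = pread(fd, buf, sizeof buf, 50000);

        printf("read %zd and %zd bytes\n", n1, n2);
        close(fd);
        return 0;
    }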
You don't name a specific file system implementation or one specific language library so I will comment in general.
In general, a file interface will be built directly on top of the OS-level file interface. In the OS-level interface for most types of drives, data can be read in sectors with random access. The drive can seek to the start of a particular sector (without reading data) and can then read that sector without reading any of the data before it in the file. Because data is typically read in sector-sized chunks, if the data you request doesn't align perfectly on a sector boundary, it's possible the OS will read the entire sector containing the first byte you requested, but the extra data won't be much and won't make a meaningful difference in performance, since once the read/write head is positioned correctly a sector is typically read in one DMA transfer.
Disk access times to read a given set of bytes for a spinning hard drive are not entirely predictable so it's not possible to design a function that will take exactly the same time no matter which bytes you're reading. This is because there's OS level caching, disk controller level caching and a difference in seek time for the read/write head depending upon what the read/write head was doing beforehand. If there are any other processes or services running on your system (which there always are) some of them may also be using the disk and contending for disk access too. In addition, depending upon how your files were written and how many bytes you're reading and how well your files are optimized, all the bytes you read may or may not be in one long readable sequence. It's possible the drive head may have to read some bytes, then seek to a new position on the disk and then read some more. All of that is not entirely predictable.
Oh, and some of this is different if it's a different type of drive (like an SSD) since there's no drive head to seek.
When you give read a start position - does it slow down read()?
No. The OS reads the directory entry to find out where the file is located on the disk, then calculates where on the disk your desired read should be, seeks to that position on the disk and starts reading.
Does it have to read everything before the position to find the text it's looking for?
No. Since it reads sectors at a time, it may read a few bytes before what you requested (whatever is before it in the sector), but sectors are not huge (typically 512 bytes to a few KB) and are typically read in one fell swoop using DMA, so that extra part of the sector before your desired data is not likely noticeable.
Is there a way to implement read so that the two commands take the same amount of computing time?
So no, not really. Disk reads, even of an identical number of bytes, vary a bit depending upon the situation, what else might be happening on the computer, and what else might already be cached by the OS or the drive itself.
If you share what problem you're really trying to solve, we could probably suggest alternate approaches rather than relying on a given disk read taking an exact amount of time.
Well, file systems usually split the data in a file into even-sized blocks. In most file systems the allocated blocks are organized in trees with a high branching factor, so computing-wise it takes effectively the same time to find the nth data block as the first data block of the file.
The only general exception to this rule is the brain-damaged floppy disk file system FAT from Microsoft, which should have become extinct in the 1980s, because in it the blocks of a file are organized in a singly-linked list, so to find the nth block you need to scan through n items in the list. Of course decent operating systems then have all sorts of tricks to address the shortcomings here.
Then the next thing is that your reads should touch the same number of blocks or operating-system memory pages. Operating system pages are usually 4K nowadays and disk blocks something like 4K too, so making every offset and count a multiple of 4096, 8192, or 16384 is better design than using round decimal numbers, e.g.
read(fd, 4096, 8192)
read(fd, 50 * 4096, 51 * 4096)
While it does not affect the computing time in a multiprocessing system, the type of media matters a lot: on magnetic disks the heads need to move to the new read position and the platter must spin around to the right spot, whereas SSDs have identical random-access timings regardless of where on the disk the data is located. Additionally, the operating system might cache frequently accessed locations, or expect that the block read after block N will be N + 1, in which case such an access order would be faster. But most of the time you wouldn't care.
Finally: perhaps instead of read you should consider using memory mapped I/O for random accesses!
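A minimal sketch of that memory-mapped alternative (the file name is an assumption, and the file is assumed to be at least a couple of hundred KB): once mmap(2) succeeds, a random access is just a pointer dereference, and the kernel pages in only the pages you actually touch.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct stat st;
        fstat(fd, &st);

        /* Map the whole file read-only; pages are loaded on demand. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Random accesses: only the pages containing these offsets are read. */
        int a = p[4096];
        int b = p[50 * 4096];
        printf("%d %d\n", a, b);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }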
Read typically reads data from the given file descriptor into a buffer. With the signature used in the question, the data read runs from start (arg 2) to end (arg 3); more generically, the number of bytes read is (end - start). So if you have the following reads
read(fd1, 0xffff, 0xffffffff)
and
read(fd2, 0xf, 0xff)
the second read will be quicker, because its end (0xff) minus its start (0xf) is less than the first read's end (0xffffffff) minus its start (0xffff). In other words, fewer bytes are being read.
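In POSIX terms, the hypothetical interface could be a thin wrapper over pread(2), which makes the point explicit: the work grows with (end - start), not with how large start is. A sketch (the name read_range is made up for illustration):

    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical read(fd, start, end): read bytes [start, end) into buf.
       The cost depends on (end - start), not on the value of start. */
    ssize_t read_range(int fd, off_t start, off_t end, void *buf)
    {
        return pread(fd, buf, (size_t)(end - start), start);
    }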

Getting data from MATLAB Simulink every 0.008 s into a .txt file

I need to get data from my Simulink model, write it to a txt file, and have another program read it, all of this every 0.008 s.
Is there any way to do it? All I could manage so far is getting the data into the workspace.
Also, the system is discrete.
You should use a To File block to save the data to disk. It will figure out the correct buffer size, etc., for you and write the data to disk. You just have to poll from the other program to get new data.
8 milliseconds' worth of data is generally not enough to justify the overhead of disk I/O, so the To File block needs more than this to write to disk, and your other program needs more than this to read it back. This obviously introduces latency.
If you want a lower-latency solution, consider using the UDP or TCP communication blocks that exist in the DSP System Toolbox library.
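If you go the UDP route, the receiving side in your other program can be an ordinary socket reader. A minimal C sketch of such a receiver (the port number and the packet layout of eight doubles are assumptions and must match whatever the Simulink UDP Send block is configured to transmit):

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(25000);        /* assumed port; must match the Simulink block */

        if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }

        double sample[8];                    /* assumed packet layout: up to 8 doubles */
        for (;;) {
            ssize_t n = recv(s, sample, sizeof sample, 0);  /* blocks until a packet arrives */
            if (n > 0)
                printf("got %zd bytes, first value %f\n", n, sample[0]);
        }
    }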
Of course, it's impossible to say anything without a lot more detail.
How much data? What operating system? What happens if you "miss"? What kind of disk is the file on? Does it really have to be a file on-disk, can't you use e.g. pipes or something to avoid hitting disk? What does the "other program" have to do with the data?
8 milliseconds is not a lot of time for a disk to do anything, you're basically going to be assuming all accesses are in cache in order to work, so factor out the disk. Use a pipe or a RAM disk.
8 milliseconds is also not a lot of time for a typical desktop operating system.

Is there any way to write a few bytes to a disk sector without reading it first?

I've been experimenting with the performance of reading and writing files on Linux, specifically with O_DIRECT, and I'm wondering, both at the hard-drive level and at the POSIX/Linux API level: is it possible to write only a few bytes to a sector, without destroying the rest of the sector, and without reading it first?
My experience with disk drives is that they expect data to be sent to them in entire sectors. So, basically, there's no way of writing less than an entire sector and if you wish to change the start of a sector without changing the end, you must read the whole sector, modify and write back. That is partly to do with how the disk head interacts with the platter (for physical disks anyway. In the case of flash drives, it's more likely to be with how small a chunk of the flash can be erased in one go).
In a portable way? Probably not.
In Linux and a few other Unix-like systems, you can open the block device for the drive, seek to a position (probably aligned to the sector size) and write some data to it, but I don't know what effect it would have on the remaining portion of that block.
Your best bet is to try it out on a virtual machine and see what happens. (Obviously, you'll have to have permission to write to the block device.)
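To illustrate the granularity point: with O_DIRECT on Linux, the buffer address, file offset, and transfer length generally all have to be aligned to the logical sector size, so the smallest write you can issue is one whole sector. A sketch (the device path and the 512-byte sector size are assumptions; query the real size with the BLKSSZGET ioctl):

    #define _GNU_SOURCE          /* for O_DIRECT */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define SECTOR 512           /* assumed logical sector size */

    int main(void)
    {
        int fd = open("/dev/sdb", O_WRONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* O_DIRECT requires a sector-aligned buffer. */
        void *buf;
        if (posix_memalign(&buf, SECTOR, SECTOR) != 0)
            return 1;
        memset(buf, 0xAB, SECTOR);

        /* Even to change a few bytes, a whole aligned sector must be written;
           to preserve the rest of the sector you would first pread() it,
           patch the bytes in memory, then write it back like this. */
        if (pwrite(fd, buf, SECTOR, 0) < 0)
            perror("pwrite");

        free(buf);
        close(fd);
        return 0;
    }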

FAT File system and lots of writes

I am considering using a FAT file system for an embedded data logging application. The logger will only create one file to which it continually appends 40 bytes of data every minute. After a couple of years of use this would be over one million write cycles. MY QUESTION IS: Does a FAT system change the File Allocation Table every time a file is appended? How does it keep track of where the end of the file is? Does it just put an EndOfFile marker at the end or does it store the length in the FAT table? If it does change the FAT table every time I do a write, I would wear out the flash memory in just a couple of years. Is a FAT system the right thing to use for this application?
My other thought is that I could just store the raw data bytes in the memory card and put an EndOfFile marker at the end of my data every time I do a write. This is less desirable though because it means the only way of getting data out of the logger is through serial transfers and not via a PC and a card reader.
FAT updates the directory table when you modify the file (at least, it will if you close the file, I'm not sure what happens if you don't). It's not just the file size, it's also the last-modified date:
http://en.wikipedia.org/wiki/File_Allocation_Table#Directory_table
If your flash controller doesn't do transparent wear levelling, and your flash driver doesn't relocate things in an effort to level wear, then I guess you could cause wear. Consult your manual, but if you're using consumer hardware I would have thought that everything has wear-levelling somewhere.
On the plus side, if the event you're worried about only occurs every minute, then you should be able to speed that up considerably in a test to see whether 2 years worth of log entries really does trash your actual hardware. Might even be faster than trying to find the relevant manufacturer docs...
No, a flash file system driver is explicitly designed to minimize the wear and spread it across the memory cells, taking advantage of the near-zero seek time. Your data rates are low, so it's going to last a long time. Specifying a yearly replacement of the media is a simple way to minimize the risk.
If your only operation is appending to one file it may be simpler to forgo a filesystem and use the flash device as a data tape. You have to take into account the type of flash and its block size, though.
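As a rough illustration of the "data tape" idea (the device path, the 40-byte record size from the question, and the convention that erased flash reads back as 0xFF are all assumptions):

    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define RECORD_SIZE 40       /* one log entry per minute, per the question */

    /* Append one 40-byte record at the first erased (all-0xFF) slot.
       A real logger would remember or binary-search the end-of-data offset
       instead of scanning linearly on every append. */
    int append_record(const unsigned char rec[RECORD_SIZE])
    {
        int fd = open("/dev/mmcblk0", O_RDWR);   /* assumed raw device path */
        if (fd < 0)
            return -1;

        unsigned char slot[RECORD_SIZE];
        unsigned char blank[RECORD_SIZE];
        memset(blank, 0xFF, sizeof blank);

        off_t off = 0;
        while (pread(fd, slot, sizeof slot, off) == (ssize_t)sizeof slot) {
            if (memcmp(slot, blank, sizeof slot) == 0)
                break;                           /* found the end of the data */
            off += RECORD_SIZE;
        }

        ssize_t n = pwrite(fd, rec, RECORD_SIZE, off);
        close(fd);
        return n == RECORD_SIZE ? 0 : -1;
    }

    int main(void)
    {
        unsigned char rec[RECORD_SIZE] = {0};    /* one dummy log entry */
        return append_record(rec) == 0 ? 0 : 1;
    }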
Large flash chips are divided into sub-pages that are a power-of-two multiple of 264 (256+8) bytes in size, pages that are a power-of-two multiple of that, and blocks which are a power-of-two multiple of that. A blank page will read as all FF's. One can write a page at a time; the smallest unit one can write is a sub-page. Once a sub-page is written, it may not be rewritten until the entire block containing it is erased. Note that on smaller flash chips, it's possible to write the bytes of a page individually, provided one only writes to blank bytes, but on many larger chips that is not possible. I think in present-generation chips, the sub-page size is 528 bytes, the page size is 2048+64 bytes, and the block size is 128K+4096 bytes.
An MMC, SD, CompactFlash, or other such card (basically anything other than SmartMedia) combines a flash chip with a processor to handle PC-style sector writes. Essentially what happens is that when a sector is written, the controller locates a blank page, writes a new version of that sector there along with up to 16 bytes of 'header' information indicating what sector it is, etc. The controller then keeps a map of where all the different pages of information are located.
A SmartMedia card exposes the flash interface directly, and relies upon the camera, card reader, or other device using it to perform such data management according to standard methods.
Note that keeping track of the whereabouts of all 4,000,000 pages on a 2 gig card would require either having 12-16 megs of RAM, or else using 12-16 meg of flash as a secondary lookup table. Using the latter approach would mean that every write to a flash page would also require a write to the lookup table. I wouldn't be at all surprised if slower flash devices use such an approach (so as to only have to track the whereabouts of about 16,000 'indirect' pages).
In any case, the most important observation is that flash write times are not predictable, but you shouldn't normally have to worry about flash wear.
Did you check what happens to the FAT file system's consistency in case of a power failure or a reset of your device?
When your device experiences such a failure, you must lose at most the log entry that you are currently writing. Older entries must stay valid.
No, FAT is not the right thing if you need to read back the data.
You should further consider what happens if the flash memory fills up with data. How do you make space for new data? You need to define the requirements for this case as well.
