When looking into DB storage engines, it seems most use mmap to persist. However, is there a situation where writing to a cache layer and writing binary to disk using read and write makes sense?
What I'm trying to understand is the difference between mmap/munmap and read/write, and when to use one or the other.
If you can feasibly use mmap(), it's usually the better way. When you use read()/write(), you have to perform a system call for every operation (although libraries like stdio minimize this with user-mode buffering), and these context switches are expensive. Even if the file block is in the buffer cache, you have to first switch into the kernel to check for it. Additionally, the kernel needs to copy the data from the kernel buffer to the caller's memory.
On the other hand, when you use mmap(), you only have to perform a system call when you first open and map the file. From then on, the virtual memory subsystem keeps the application memory synchronized with the file contents. Context switches are only necessary when you try to access a file block that hasn't yet been paged in from disk, not for each part of the file you try to read or write. When you modify the mapped memory, it gets written back to the file lazily.
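To make the contrast concrete, here is a minimal sketch of the same job done both ways: summing the bytes of a file. It assumes a POSIX system; the function names sum_with_read and sum_with_mmap are made up for this illustration. The read() version pays one system call and one kernel-to-user copy per 4 KiB chunk, while the mmap() version makes one mapping call and then just touches memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes of a file using read(): one system call and one
 * kernel-to-user copy per buffer-sized chunk. Returns -1 on error. */
long sum_with_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];

    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sum all bytes using mmap(): one mapping call, then plain memory
 * accesses; the kernel pages data in on demand. Returns -1 on error. */
long sum_with_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* mmap rejects length 0 */

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];

    munmap(p, (size_t)st.st_size);
    return sum;
}
```

Both functions return the same result for the same file; which one is faster depends on the access pattern and file size, so measure before committing to either.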
For most practical applications, you should use whichever method fits the logic of the application best. The performance difference between the two methods will only be significant in highly time-critical applications. When implementing a library, you can't tell the needs of client applications, so of course you try to wring every bit of performance out of it. But for many other applications, premature optimization is the root of all evil.
I am developing a system which copies and writes files on NTFS inside a virtual machine. At any time the VM can be powered off (direct shutdown). The power-off is controlled from the outside, so I do not have any way to detect it. As a result, files and entire directories that are being written at that moment get lost. Is there any way to prevent that, or do I have to develop my own file system? I have to store the files on the local disk and cannot send files via the network.
There always exists a [short] period between when your data is written (sent to the API) and when this data is written to the physical hardware. If the system crashes in the middle, the data will be lost.
There is a setting in Windows to disable system write cache for certain disks. This setting can help you ensure that the data is at least sent to the host's hardware. Probably that's the answer you've been looking for.
Writing your own filesystem won't help much because it's mainly the write cache that causes the data to be lost. There can exist a filesystem-level cache as well, though, and I don't know if the write cache setting I mentioned above also affects internal filesystem cache.
If you write data to a file opened with "write through" enabled, the write call only returns after the data is physically written to the disk, so you can be sure it got written. On Windows you normally do that by passing the FILE_FLAG_WRITE_THROUGH flag to CreateFile when you open the file.
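A minimal sketch of what that looks like in C. The helper name write_through is made up for this example. The Windows branch uses CreateFileA with FILE_FLAG_WRITE_THROUGH followed by FlushFileBuffers; the other branch uses open() with O_SYNC, which is my assumption of the closest POSIX equivalent and is only there so the sketch also builds outside Windows. Whether the bytes truly reach the physical medium still depends on the hypervisor and the drive honoring the request.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Write len bytes to path; returns 0 on success, -1 on failure.
 * FILE_FLAG_WRITE_THROUGH asks Windows not to leave the data sitting
 * in the system write cache. */
int write_through(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    DWORD written = 0;
    BOOL ok = WriteFile(h, data, (DWORD)len, &written, NULL)
              && written == (DWORD)len;
    if (ok)
        ok = FlushFileBuffers(h);   /* flush anything still buffered */
    CloseHandle(h);
    return ok ? 0 : -1;
}
#else
#include <fcntl.h>
#include <unistd.h>

/* POSIX counterpart (an assumption, not from the answer above):
 * O_SYNC makes each write() return only after the data has been
 * handed to the storage device. */
int write_through(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    const unsigned char *p = data;
    while (len > 0) {               /* write() may accept fewer bytes */
        ssize_t n = write(fd, p, len);
        if (n <= 0) { close(fd); return -1; }
        p += n;
        len -= (size_t)n;
    }
    return close(fd) == 0 ? 0 : -1;
}
#endif
```

Expect this to be noticeably slower than cached writes, since every call waits for the device.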
I'm running some very specialized experiments for a research project. These experiments call for controlling memory accesses: my application should not, under any circumstances, swap information with the disk. That is, all information the application needs must stay in RAM for the duration of the execution, but it should use as much RAM as possible.
My question is: is there any way I can control disk access by my application, or at least count disk accesses for later analysis?
This is using C and Linux.
Please let me know if I can clarify the question... been working on this for so long I think everybody knows exactly what I'm talking about.
One thing you can do is actually create a ramfs or RAM file system. Are you working on a unix platform? If so you can check out mount and umount on how to create them.
http://linux.die.net/man/8/mount
http://linux.die.net/man/8/umount
Basically what you do is create a file system stored in your RAM. You don't have to deal with all the disk read/write time anymore. If I read your question correctly, you want to avoid disk access if you can. It's very simple to do, since you can have multiple file systems, some located on a hard drive and some in memory.
http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/
http://www.alper.net/linuxunix/linux-ram-based-filesystem/
Hope this all helped.
The mlock system call allows you to lock part or all of your process's virtual memory into RAM, thus preventing it from being written to swap space. Note that another process with root privileges can still access that memory area.
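A rough sketch of both halves of the question on Linux, pinning memory and counting disk-backed page faults. The function names pin_process_memory and major_faults are made up here. mlockall() is the whole-process variant of mlock(), and getrusage() reports ru_majflt, the number of page faults that needed I/O to resolve. Note that mlockall() commonly fails for unprivileged processes because of the RLIMIT_MEMLOCK limit, so check the return value.

```c
#include <assert.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Try to pin all current and future pages of this process in RAM.
 * Returns 0 on success, -1 if the kernel refuses (commonly because of
 * RLIMIT_MEMLOCK or missing privileges). */
int pin_process_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Number of major page faults so far for this process, i.e. faults
 * that could only be resolved by doing I/O (swap-in or file page-in). */
long major_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);   /* RUSAGE_SELF with a valid pointer should not fail */
    (void)rc;
    return ru.ru_majflt;
}
```

Call major_faults() before and after the experiment; if the difference is zero, nothing had to be paged in from disk during the run. The same struct also has ru_inblock and ru_oublock if you want to count block I/O from ordinary file reads and writes.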
I am working on an embedded application without any OS that needs the use of a File System. I've been over this many times with the people in the project and some agree with me that the system must make a proper shut down of the system whenever there is a power failure or else the file system might go crazy.
Some people say that it doesn't matter if you simply power off the system and let nature run its course, but I think that's one of the worst things to do, especially if you know this will bring you a problem and probably shorten your product's life span.
In the last paragraph I just assumed that it is a problem, but my question remains:
Does a power down have any effect on the file system?
Here is a list of various techniques to help an embedded system tolerate a power failure. These may not be practical for your particular application.
1. Use a Journaling File System - Can tolerate incomplete writes due to power failure, OS crash, etc. Most modern filesystems are journaled, but do your homework to confirm.
2. Unless your application needs the write performance, disable all write caching. Check your disk drivers for caching options. Under Linux/Unix, consider mounting the filesystem in sync mode.
3. Unless it must be writable, make it read-only. Try to keep your application executables and operating system files on their own partition(s), with write protections in place (e.g. mount read-only in Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system should still be able to boot (albeit with a fail-safe default configuration).
3a. For data that is only written once (e.g. Configuration Settings), try to keep it mounted as read-only most of the time. If there is a settings change, mount it as R/W temporarily, update the data, and then unmount/remount it as read-only.
3b. Use a technique similar to 3a to handle application/OS updates in the field.
3c. If it is impractical for you to mount the FS as read-only, at least consider opening individual files as read-only (e.g. fp=fopen("configuration.ini", "r")).
4. If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where a partition table may become corrupt and render the entire drive unreadable. Using physically separate devices further isolates against one corrupt device bringing down the whole system. In a perfect world, you would have at least 4 separate devices:
4a. Boot Loader
4b. Operating System & Application Code
4c. Configuration Settings
4d. Application Data
5. Know the characteristics of your storage devices, and control the brand/model/revision of devices used. Some hard disks ignore cache flush commands from the OS. We had cases where some models of CompactFlash cards would corrupt themselves during a power failure, but the "industrial" models did not have this problem. Of course, this information was not published in any datasheet, and had to be gathered by experimental testing. We developed a list of approved CF cards, and kept inventory of those cards. We periodically had to update this list as older cards became obsolete, or the manufacturer would make a revision.
6. Put your temporary files in a RAM Disk. If you keep those writes off-disk, you eliminate them as a potential source of corruption. You also reduce flash wear and tear.
7. Develop automated corruption detection and recovery methods - All of the above techniques will not help you if the application simply hangs because of a missing config file. You need to be able to recover as gracefully as possible:
7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for some reason, switch to the backup. You should also consider mechanisms for making backups whenever the configuration is changed, or after a configuration has been declared "good" by the user (testing vs production mode).
7b. Did your Application Data partition fail to mount? Automatically run chkdsk/fsck.
7c. Did chkdsk/fsck fail to fix the problem? Automatically re-format the partition and get it back to a known state.
7d. Do you have a Boot Loader or other method to restore the OS and application after a failure?
7e. Make sure your system will beep, flash an LED, or something to indicate to the user what happened.
8. Power Failures should be part of your system qualification testing. The only way you will be sure you have a robust system is to test it. Yank the power cord from the system and document what happens. Try yanking the power at multiple points in the system operation (during runtime, while booting, mid configuration, etc). Repeat each test multiple times.
9. If you cannot mitigate all power failure problems, incorporate a battery or Supercapacitor into the system - Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power gets low. Also, batteries will require periodic testing and replacement with age.
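As an illustration of the primary/backup configuration idea (7a), here is a minimal sketch in C. Everything in it is hypothetical: the struct layout, the field names, and the toy checksum are placeholders I made up, and a real system would more likely use a CRC. The point is only the shape of the logic: every copy carries a checksum, and the loader falls back to the backup when the primary is missing, truncated, or fails validation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record: settings followed by a checksum. */
struct config {
    uint32_t baud_rate;
    uint32_t device_id;
    uint32_t checksum;   /* covers the two fields above */
};

/* Toy checksum for illustration only. The XOR constant makes an
 * all-zero (e.g. freshly erased) record fail validation. */
static uint32_t config_checksum(const struct config *c)
{
    return (c->baud_rate * 31u + c->device_id) ^ 0xA5A5A5A5u;
}

/* Write one copy of the configuration; returns 0 on success. */
int config_save(const char *path, struct config c)
{
    assert(path != NULL);
    c.checksum = config_checksum(&c);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int ok = fwrite(&c, sizeof c, 1, f) == 1;
    ok = (fclose(f) == 0) && ok;
    return ok ? 0 : -1;
}

/* Read and validate one copy; returns 0 only if the checksum matches. */
static int config_load_one(const char *path, struct config *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    struct config c;
    size_t n = fread(&c, sizeof c, 1, f);
    fclose(f);
    if (n != 1 || c.checksum != config_checksum(&c))
        return -1;
    *out = c;
    return 0;
}

/* Returns 0 if the primary was used, 1 if the backup was used,
 * -1 if neither copy is valid (caller should apply safe defaults). */
int config_load(const char *primary, const char *backup, struct config *out)
{
    assert(primary != NULL && backup != NULL && out != NULL);
    if (config_load_one(primary, out) == 0)
        return 0;
    if (config_load_one(backup, out) == 0)
        return 1;
    return -1;
}
```

A return value of 1 is a good trigger for 7e (tell the user something went wrong) and for rewriting the primary from the backup.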
In addition to msemack's response (unfortunately my rating is too low to post this as a comment on his answer, so it is a separate answer):
Does a power down have any effect on the file system?
Yes, if proper measures aren't put in place to prevent corruption. See previous answers for file system options to help mitigate. However if ATA flush/sleep aren't properly implemented on your device you may run into the scenario we did. In our scenario the device was corrupt beyond the file system, and fdisk/format would not recover the device.
Instead, an ATA security-erase was required to recover the device once corruption occurred. To avoid this, we issued an ATA sleep command prior to power loss. This required a hold-up time of 400ms to cover the 160ms that the ATA sleep took and to leave some headroom for degradation of the caps over the life of the product.
Notes from our scenario:
fdisk/format failed to repair/recover the drive.
Our power-safe file system's check disk utility returned that the device had bad blocks, but there really weren't any.
flush/sync returned success quickly, so they most likely weren't actually implemented by the device.
Once corrupt, dd could not read the device beyond the 1st partition boundary and returned i/o errors after.
hdparm was used to issue an ATA security-erase, as it was the only method of recovery for some corruption scenarios.
For a non-journalling filesystem, an unexpected power-off can mean corruption of certain data, including the directory structure. This happens if there's unsaved data in the cache, or if the FS is in the process of writing a multi-block update and the interruption happens when only some of the blocks have been written.
Journalling mostly addresses this problem: if there's an interruption in the middle, a recovery routine or check-and-repair operation done by the FS (usually implicitly) brings the filesystem back to a consistent state. However, this state is not always the latest one, i.e. if there was some data in the memory cache, it can be lost even with journalling. This is because journalling saves you from corruption of the filesystem but doesn't do magic.
Write-through mode (no write caching) reduces the possibility of data loss but doesn't solve the problem completely, as the journal itself acts as a cache (for a very short time).
So unfortunately backup or data duplication are the main ways to prevent data loss.
It totally depends on the file system you are using, and on whether losing some data at power-off is acceptable under your project requirements.
One could imagine using a file system that is secured against unexpected power-off and is able to recover from a partial write sequence. So on the application side, if you don't have critical data that absolutely needs to be written before shutting down, there is no need for a specific power-off detection procedure.
Now if you want a more specific answer for your project you will have to give more information on the file system you are using and your project requirements.
Edit: As you have critical application data to save before power-off, I think you have answered the question yourself. The only way to guard against unexpected power-off is to have brown-out detection that alerts your embedded device, coupled with some hardware circuitry that keeps delivering enough power to the device for it to perform the shutdown procedure.
The FAT file system is particularly prone to corruption if a write is in progress or a file is open on shutdown, specifically if there is a buffered operation that has not been flushed. On one project I worked on, the solution was to run a file system integrity check and repair (essentially chkdsk/scandisk) on start-up. This strategy did not prevent data loss, but it did prevent the file system becoming unusable.
A number of vendors provide journalling add-on components for FAT to counter exactly this problem. These include Segger, Quadros and Micrium for example.
Either way, your system should generally adopt an open-write-close approach to file access, or open-write-flush if you feel the need to keep the file open.
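A minimal sketch of the open-write-close pattern in C, with an explicit flush before the close. The helper name append_record is made up for this example. It assumes a C stdio layer plus POSIX fsync() and fileno(); on a bare-metal FAT library the calls will have different names, so treat this as the shape of the pattern rather than code for your target.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Append one record and leave the file closed again, so the window in
 * which a power cut can hit an open, partially buffered file is as
 * short as possible. Returns 0 on success, -1 on any failure. */
int append_record(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "ab");
    if (!f)
        return -1;

    int ok = fwrite(data, 1, len, f) == len;
    ok = ok && fflush(f) == 0;          /* stdio buffer -> OS */
    ok = ok && fsync(fileno(f)) == 0;   /* OS cache -> storage device */
    ok = (fclose(f) == 0) && ok;        /* always close, even on error */
    return ok ? 0 : -1;
}
```

For the open-write-flush variant, keep the FILE pointer around and run the fflush/fsync pair after each record instead of closing.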
I need to save very large amounts of data (>500GB) which is being streamed (800Mb/s) from another device connected to my PC. The speed rules out use of a database, e.g. MySQL/ISAM, and I am looking for a fast, light library which sits on top of the C stdio file lib (i.e. fopen/fclose/fwrite) and which will allow me to write/read a very large file (up to the available disk space).
Behind-the-scenes, the large file can be broken up into smaller files e.g. 1GB and I want the API to take care of these details.
The data arrives at the PC in a compressed binary format and no further processing is needed before writing it to the hard-disk.
The library should work on Windows and Linux.
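To make the splitting requirement concrete, here is a rough sketch of the kind of API meant, written directly on top of stdio. It is not an existing library: the chunk_writer struct, the cw_* function names, and the "<prefix>.NNNN" file naming are all invented for this illustration, and it covers only the write side.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical writer that spreads one logical stream over files
 * named <prefix>.0000, <prefix>.0001, ... of at most chunk_size bytes. */
struct chunk_writer {
    char     prefix[200];
    size_t   chunk_size;
    size_t   used;     /* bytes written to the current chunk */
    unsigned index;    /* number of the next chunk to create */
    FILE    *fp;       /* current chunk, NULL before the first write */
};

int cw_open(struct chunk_writer *w, const char *prefix, size_t chunk_size)
{
    assert(w != NULL && prefix != NULL && chunk_size > 0);
    int n = snprintf(w->prefix, sizeof w->prefix, "%s", prefix);
    if (n < 0 || (size_t)n >= sizeof w->prefix)
        return -1;                       /* prefix too long */
    w->chunk_size = chunk_size;
    w->used = 0;
    w->index = 0;
    w->fp = NULL;
    return 0;
}

/* Close the current chunk (if any) and start the next one. */
static int cw_next_file(struct chunk_writer *w)
{
    char name[256];
    if (w->fp && fclose(w->fp) != 0) {
        w->fp = NULL;
        return -1;
    }
    snprintf(name, sizeof name, "%s.%04u", w->prefix, w->index++);
    w->fp = fopen(name, "wb");
    w->used = 0;
    return w->fp ? 0 : -1;
}

/* Write len bytes, rolling over to a new file at chunk boundaries. */
int cw_write(struct chunk_writer *w, const void *data, size_t len)
{
    assert(w != NULL && data != NULL);
    const unsigned char *p = data;
    while (len > 0) {
        if (!w->fp || w->used == w->chunk_size)
            if (cw_next_file(w) != 0)
                return -1;
        size_t room = w->chunk_size - w->used;
        size_t n = len < room ? len : room;
        if (fwrite(p, 1, n, w->fp) != n)
            return -1;
        w->used += n;
        p += n;
        len -= n;
    }
    return 0;
}

int cw_close(struct chunk_writer *w)
{
    assert(w != NULL);
    int rc = 0;
    if (w->fp) {
        rc = fclose(w->fp) == 0 ? 0 : -1;
        w->fp = NULL;
    }
    return rc;
}
```

With chunk_size set to 1GB this produces the 1GB pieces described above; whether plain stdio can sustain the required rate is a separate question.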
If you need random access into the data, take a look at memory mapped files.
It lets you map a file (or a section of a file) into memory transparently, without having to explicitly allocate memory and read data. It works on Windows/Linux (there is a boost lib that wraps the differences).
On Windows you can handle files >>4gb on a 32bit os by using multiple windows into the file.
edit: Sorry, 800Mb/s!! I don't know any disks that can cope with that. You might be looking at a RAID array of SSD drives.
There used to be image capture cards that used an attached drive as a simple series of bytes with no filesystem to get very high speed sustained writes. I don't know if you are going to need something like that.
For ultimate speed, I suggest you go highly platform specific.
The objective is to get as close as you can to connecting the input device directly to hard drive. One method is to write a driver for the input device that writes directly to the hard drive.
The generic algorithm is to use either a very large circular byte buffer or multiple buffers. You need the extra space to absorb the speed difference between the input device and the output device, given that the input device runs non-stop.
If you can pause the input device, the issue becomes easier.
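A minimal sketch of the circular byte buffer mentioned above. The struct ring type and the ring_* names are invented for this example, and it is deliberately single-threaded: in a real capture path the producer (input device) and consumer (disk writer) run concurrently, so you would need locking or atomics on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ring {
    unsigned char *buf;
    size_t cap;     /* capacity in bytes */
    size_t head;    /* next write position */
    size_t tail;    /* next read position */
    size_t count;   /* bytes currently stored */
};

void ring_init(struct ring *r, unsigned char *storage, size_t cap)
{
    assert(r != NULL && storage != NULL && cap > 0);
    r->buf = storage;
    r->cap = cap;
    r->head = r->tail = r->count = 0;
}

/* Producer side: copy in up to len bytes and return how many were
 * accepted. Fewer than len means the buffer is full, i.e. the disk
 * is not keeping up with the input device. */
size_t ring_put(struct ring *r, const void *data, size_t len)
{
    size_t free_space = r->cap - r->count;
    if (len > free_space)
        len = free_space;
    size_t first = r->cap - r->head;    /* room before wrapping */
    if (first > len)
        first = len;
    memcpy(r->buf + r->head, data, first);
    memcpy(r->buf, (const unsigned char *)data + first, len - first);
    r->head = (r->head + len) % r->cap;
    r->count += len;
    return len;
}

/* Consumer side: copy out up to len bytes and return how many were
 * delivered; these are the bytes to hand to the disk write. */
size_t ring_get(struct ring *r, void *out, size_t len)
{
    if (len > r->count)
        len = r->count;
    size_t first = r->cap - r->tail;    /* bytes before wrapping */
    if (first > len)
        first = len;
    memcpy(out, r->buf + r->tail, first);
    memcpy((unsigned char *)out + first, r->buf, len - first);
    r->tail = (r->tail + len) % r->cap;
    r->count -= len;
    return len;
}
```

Size the buffer from the worst-case stall of the output device: the longer the disk can pause, the more bytes the buffer has to hold at the input rate.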