File System with "Corruption" protection - filesystems

I am developing a system which copies and writes files on NTFS inside a virtual machine. At any time the VM can power off (a direct shutdown). The power-off is controlled from the outside, so I do not have any way to detect it. Because of that, files and even complete directories that are being written get lost. Is there any way to prevent that, or do I have to develop my own file system? I have to store the files on the local disk and cannot send them over the network.

There always exists a [short] period between when your data is written (sent to the API) and when this data is written to the physical hardware. If the system crashes in the middle, the data will be lost.
There is a setting in Windows to disable system write cache for certain disks. This setting can help you ensure that the data is at least sent to the host's hardware. Probably that's the answer you've been looking for.
Writing your own filesystem won't help much because it's mainly the write cache that causes the data to be lost. There can exist a filesystem-level cache as well, though, and I don't know if the write cache setting I mentioned above also affects internal filesystem cache.

If you write data to a file opened with write-through enabled, the write call only returns after the data has been written through the system cache to the device, so you can be sure it got written. You normally do that by passing the FILE_FLAG_WRITE_THROUGH flag to CreateFile when you open the file.
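For reference, a minimal Win32 sketch of that flag (the path and payload are placeholders); adding an explicit FlushFileBuffers() call is a common extra precaution:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Placeholder path; FILE_FLAG_WRITE_THROUGH makes WriteFile return only
           after the data has been sent through the system cache to the device. */
        HANDLE h = CreateFileA("C:\\data\\journal.bin",
                               GENERIC_WRITE,
                               0,                    /* no sharing       */
                               NULL,                 /* default security */
                               OPEN_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        const char payload[] = "state that must survive a power cut";
        DWORD written = 0;
        if (!WriteFile(h, payload, sizeof payload, &written, NULL)) {
            printf("WriteFile failed: %lu\n", GetLastError());
        }

        /* Optional extra safety: explicitly flush any remaining buffers. */
        FlushFileBuffers(h);
        CloseHandle(h);
        return 0;
    }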

Related

fsync on mapped crypted device with dm-crypt?

I have a question about dm-crypt.
Here is my situation: I have an encrypted partition mapped to a virtual device using the cryptsetup command in Linux. I am opening the mapped virtual device in a C program using the open() function.
Can I be sure that when I use the fsync() function all information will be written to the encrypted partition, or is there some buffer in the dm-crypt driver?
I could not find much reference on this. Maybe someone can shed more light on this, as I have not grokked the source, but it seems as though a sync writes to disk.
One data point is the question trim-with-lvm-and-dm-crypt, where a sync changes the disk content reliably, yet the cached content is only updated after an echo 1 > /proc/sys/vm/drop_caches.
Another is the issue that sync hangs on a suspended device, which indicates that the sync goes directly to the device.
A third is this Gentoo discussion, where luksClose works reliably after a sync.
A fourth is this UL answer, which says
the rest of the stuff [dm-crypt] is in kernel and pretty heavily used, so it's
probably fine
It may still be that all these are wrong, and it can happen that sync does not write directly to the encrypted disk, but it seems unlikely.
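For what it's worth, the usual pattern on the writing side looks like the sketch below (the mapper name is a placeholder); fsync() asks the kernel to push the data down through dm-crypt to the underlying partition before returning:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Placeholder mapping created earlier with cryptsetup. */
        int fd = open("/dev/mapper/secret", O_WRONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        const char buf[] = "data destined for the encrypted partition";
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("write");
        }

        /* Flush the page cache and ask the block layer (dm-crypt included)
           to put the data on the physical device before returning. */
        if (fsync(fd) != 0) {
            perror("fsync");
        }

        close(fd);
        return 0;
    }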

linux kernel: how can I copy files before panic?

I have a file on a tmpfs partition which is updated a lot.
I want to copy it to another partition (a flash partition) before a crash/reboot.
It is not an option to keep this file on the flash partition in the first place, because the flash has a limited read/write life cycle and I'm trying to avoid excessive reads/writes to it; too many writes will damage the flash, which is why the file is on tmpfs.
Regarding reboot: I can modify the reboot process to copy the file before rebooting. Is there a neater way?
Regarding a crash: I don't know of any way to do this. Any ideas?
I know that I should not mess with files from kernel space.
Thanks
Once a kernel panic occurs, it's possible that in-core data structures are already corrupted and unreliable. Ideally, your kernel is not expected to panic if the version you are using is a stable and tested release. I would recommend capturing a vmcore, analyzing it with the crash tool, and working with the vendor on the root cause of the kernel panic.
However, if you are referring to an abrupt system shutdown due to a power failure, etc., which could possibly cause loss of the data/file stored in memory, you could write a cron job to sync the file to disk at intervals and tune the kernel on how frequently dirty pages get synced. Having said that, if the file you are writing to is that important, why design it to be kept in memory in the first place?
You should be syncing this file back to the disk every few seconds or at regular intervals. That way you will not lose all the data.
As the reads/writes on the tmpfs file are heavy, it may be worth considering an SSD for this purpose. Read about how file system transaction logs are configured to be stored on SSD drives.
Write a cron job to sync the tmpfs file to the SSD or disk at frequent intervals, or whenever there are updates. You may also want to consider changing some kernel tunables (such as vm.dirty_expire_centisecs=0 and vm.dirty_background_ratio=0) so that dirty pages get synced to disk immediately. A word of caution: doing this will cause higher CPU and I/O load, as pages are synced to disk frequently, although data loss would be kept to a minimum.
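As one rough illustration of such a sync helper in C (the paths are placeholders, and a cron job would be the caller), the copy is fsync()ed so it actually reaches the flash/SSD:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Copy src (on tmpfs) to dst (on flash/SSD) and make it durable. */
    static int copy_durable(const char *src, const char *dst, const char *dstdir)
    {
        char buf[4096];
        ssize_t n;

        int in = open(src, O_RDONLY);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return -1; }

        while ((n = read(in, buf, sizeof buf)) > 0) {
            if (write(out, buf, n) != n) { perror("write"); return -1; }
        }

        if (fsync(out) != 0) { perror("fsync"); return -1; }  /* file data + metadata */
        close(in);
        close(out);

        int dir = open(dstdir, O_RDONLY | O_DIRECTORY);       /* directory entry too  */
        if (dir >= 0) { fsync(dir); close(dir); }
        return 0;
    }

    int main(void)
    {
        /* Placeholder paths. */
        return copy_durable("/tmp/state.db", "/mnt/flash/state.db", "/mnt/flash");
    }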

Embedded File System and power-off

I am working on an embedded application without any OS that needs a file system. I've been over this many times with the people on the project, and some agree with me that the system must perform a proper shutdown whenever there is a power failure, or else the file system might go crazy.
Some people say that it doesn't matter if you simply power off the system and let nature run its course, but I think that's one of the worst things to do, especially if you know this will cause problems and probably shorten your product's life span.
In the last paragraph I just assumed that it is a problem, but my question remains:
Does a power down have any effect on the file system?
Here is a list of various techniques to help an embedded system tolerate a power failure. These may not be practical for your particular application.
Use a Journaling File System - Can tolerate incomplete writes due to power failure, OS crash, etc. Most modern filesystems are journaled, but do your homework to confirm.
Unless your application needs the write performance, disable all write caching. Check your disk drivers for caching options. Under Linux/Unix, consider mounting the filesystem in sync mode.
Unless it must be writable, make it read-only. Try to keep your application executables and operating system files on their own partition(s), with write protections in place (e.g. mount read only in Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system should still be able to boot (albeit with a fail safe default configuration).
3a. For data that is only written once (e.g. configuration settings), try to keep it mounted read-only most of the time. If there is a settings change, mount it as R/W temporarily, update the data, and then unmount/remount it as read-only (a minimal sketch of this remount pattern follows this list).
3b. Use a technique similar to 3a to handle application/OS updates in the field.
3c. If it is impractical for you to mount the FS as read-only, at least consider opening individual files as read-only (e.g. fp=fopen("configuration.ini", "r")).
If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where a partition table may become corrupt and render the entire drive unreadable. Using physically separate devices further isolates against one corrupt device bringing down the whole system. In a perfect world, you would have at least 4 separate devices:
4a. Boot Loader
4b. Operating System & Application Code
4c. Configuration Settings
4d. Application Data
Know the characteristics of your storage devices, and control the brand/model/revision of devices used. Some hard disks ignore cache flush commands from the OS. We had cases where some models of CompactFlash cards would corrupt themselves during a power failure, but the "industrial" models did not have this problem. Of course, this information was not published in any datasheet, and had to be gathered by experimental testing. We developed a list of approved CF cards, and kept inventory of those cards. We periodically had to update this list as older cards became obsolete, or the manufacturer would make a revision.
Put your temporary files in a RAM Disk. If you keep those writes off-disk, you eliminate them as a potential source of corruption. You also reduce flash wear and tear.
Develop automated corruption detection and recovery methods. None of the above techniques will help you if the application simply hangs because of a missing config file. You need to be able to recover as gracefully as possible:
7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for some reason, switch to the backup. You should also consider mechanisms for making backups whenever the configuration is changed, or after a configuration has been declared "good" by the user (testing vs. production mode).
7b. Did your Application Data partition fail to mount? Automatically run chkdsk/fsck.
7c. Did chkdsk/fsck fail to fix the problem? Automatically re-format the partition and get it back to a known state.
7d. Do you have a Boot Loader or other method to restore the OS and application after a failure?
7e. Make sure your system will beep, flash an LED, or something to indicate to the user what happened.
Power Failures should be part of your system qualification testing. The only way you will be sure you have a robust system is to test it. Yank the power cord from the system and document what happens. Try yanking the power at multiple points in the system operation (during runtime, while booting, mid configuration, etc). Repeat each test multiple times.
If you cannot mitigate all power failure problems, incorporate a battery or Supercapacitor into the system - Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power gets low. Also, batteries will require periodic testing and replacement with age.
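As a rough Linux sketch of technique 3a (the mount point, file name, and contents are assumptions, and mount(2) requires root privileges), the settings partition is writable only for the brief window of the update:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* Hypothetical mount point for the settings partition. */
        const char *mnt = "/config";

        /* Temporarily remount the settings partition read-write. */
        if (mount(NULL, mnt, NULL, MS_REMOUNT, NULL) != 0) {
            perror("remount rw");
            return 1;
        }

        /* Update the settings file and push it to the medium. */
        FILE *fp = fopen("/config/settings.ini", "w");
        if (fp != NULL) {
            fputs("[network]\nip=192.168.1.10\n", fp);
            fflush(fp);
            fsync(fileno(fp));
            fclose(fp);
        }
        sync();   /* flush any remaining dirty metadata */

        /* Drop back to read-only as soon as possible. */
        if (mount(NULL, mnt, NULL, MS_REMOUNT | MS_RDONLY, NULL) != 0) {
            perror("remount ro");
            return 1;
        }
        return 0;
    }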
In addition to msemack's response (unfortunately my reputation is too low to post a comment on his answer, hence a separate answer):
Does a power down have any effect on the file system?
Yes, if proper measures aren't put in place to prevent corruption. See the previous answers for file system options to help mitigate this. However, if ATA flush/sleep isn't properly implemented on your device, you may run into the scenario we did. In our scenario the device was corrupted beyond the file system, and fdisk/format would not recover it.
Instead, an ATA security-erase was required to recover the device once corruption occurred. To avoid this, we implemented an ATA sleep command prior to power loss. This required a hold-up time of 400 ms to cover the 160 ms the ATA sleep took and to leave some headroom for degradation of the caps over the life of the product.
Notes from our scenario:
fdisk/format failed to repair/recover the drive.
Our power-safe file system's check disk utility returned that the device had bad blocks, but there really weren't any.
flush/sync returned success quickly, and most likely were not actually implemented.
Once corrupt, dd could not read the device beyond the first partition boundary and returned I/O errors afterwards.
hdparm was used to issue the ATA security-erase, as the only method of recovery for some corruption scenarios.
For a non-journaling filesystem, an unexpected power-off can mean corruption of certain data, including the directory structure. This happens if there is unsaved data in the cache, or if the FS is in the middle of a multi-block update and the interruption happens when only some of the blocks have been written.
Journaling mostly addresses this problem: if there is an interruption in the middle, the recovery routine or the check-and-repair operation done by the FS (usually implicitly) brings the filesystem back to a consistent state. However, this state is not always the latest one; i.e. if there was data in the memory cache, it can be lost even with journaling. This is because journaling saves you from corruption of the filesystem but doesn't do magic.
Write-through mode (no write caching) reduces the possibility of data loss but doesn't solve the problem completely, as the journal itself acts as a cache (for a very short time).
So unfortunately backup or data duplication are the main ways to prevent data loss.
It totally depends on the file system you are using and whether it is acceptable to lose some data at power-off, based on your project requirements.
One could imagine using a file system that is secured against unattended power-off and is able to recover from a partial write sequence. So on the application side, if you don't have critical data that absolutely needs to be written before shutting down, there is no need for a specific power-off detection procedure.
Now if you want a more specific answer for your project you will have to give more information on the file system you are using and your project requirements.
Edit: As you have critical application data to save before power-off, I think you have answered the question yourself. The only way to secure against an unattended power-off is to have brown-out detection that alerts your embedded device, coupled with some hardware circuitry that keeps delivering enough power to the device to perform the shutdown procedure.
The FAT file system is particularly prone to corruption if a write is in progress or a file is open on shutdown, specifically if there is a buffered operation that has not been flushed. On one project I worked on, the solution was to run a file system integrity check and repair (essentially chkdsk/scandisk) on start-up. This strategy did not prevent data loss, but it did prevent the file system from becoming unusable.
A number of vendors provide journalling add-on components for FAT to counter exactly this problem. These include Segger, Quadros and Micrium for example.
Either way, your system should generally adopt an open-write-close approach to file access, or open-write-flush if you feel the need to keep the file open.
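A minimal sketch of that open-write-flush-close discipline in C (the path is a placeholder); fflush() moves the data from the stdio buffer to the OS, and fsync() pushes it on to the medium:

    #include <stdio.h>
    #include <unistd.h>

    /* Append a log record using the open-write-flush-close discipline. */
    static int append_record(const char *path, const char *record)
    {
        FILE *fp = fopen(path, "a");
        if (fp == NULL)
            return -1;

        if (fputs(record, fp) == EOF) {
            fclose(fp);
            return -1;
        }

        fflush(fp);            /* stdio buffer -> OS             */
        fsync(fileno(fp));     /* OS cache     -> storage medium */
        return fclose(fp);
    }

    int main(void)
    {
        /* Placeholder path on the FAT volume. */
        return append_record("/mnt/sd/log.txt", "event: power check\n");
    }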

Ensure that file state on the client is in sync with NFS server

I'm trying to find the proper way to handle stale data on an NFS client. Consider the following scenario:
Two servers mount the same NFS shared storage with a number of files
A client application on server 1 deletes some files
A client application on server 2 tries to access the deleted files and fails with: Stale NFS file handle (nothing strange, the error is expected)
(It may also be useful to know that the cache mount options are set quite high on both servers for performance reasons.)
What I'm trying to understand is:
Is there a reliable method to check that a file is present? In the scenario given above, lstat on the file returns success and the application fails only after trying to move the file.
How can I manually sync the contents of a directory on the client with the server?
Any general advice on how to write reliable file-management code in the case of NFS?
Thanks.
Is there a reliable method to check that a file is present? In the scenario given above, lstat on the file returns success and the application fails only after trying to move the file.
That's just normal NFS behavior.
How can I manually sync the contents of a directory on the client with the server?
That is impossible to do manually, since NFS pretends to be a normal POSIX-compliant file system.
I once tried coding close()/open() in an attempt to somehow mitigate the effects of NFS client-side caching. In my case I needed to read information written to the file on the other server. But even the reopen trick had close to zero effect. And I can't add fdatasync() to the writing side, since that slows the whole application down.
My experience with NFS to date is that there is nothing you can do. In critical code paths I simply coded retries for the file operations that return ESTALE.
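By way of illustration only (the paths and retry count are made up), such a retry wrapper for a rename might look like this:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Retry an NFS operation a few times when the server reports ESTALE. */
    static int rename_retry_estale(const char *oldpath, const char *newpath)
    {
        for (int attempt = 0; attempt < 5; attempt++) {
            if (rename(oldpath, newpath) == 0)
                return 0;
            if (errno != ESTALE)
                return -1;          /* some other error: give up */
            sleep(1);               /* let the client revalidate its cache */
        }
        return -1;
    }

    int main(void)
    {
        /* Placeholder paths on the NFS mount. */
        return rename_retry_estale("/mnt/nfs/in/report.csv", "/mnt/nfs/done/report.csv");
    }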
Any general advice on how to write reliable file-management code in the case of NFS?
Mod me down all you want, but if your customers want reliability then they shouldn't use NFS.
My company, for example, advertises the use of a proper distributed file system (I intentionally omit the brand) if the customer wants reliability. Our core software is not guaranteed to run on NFS and we do not support such configurations. But in our case we really need the guarantee that as soon as data is written to the FS it becomes accessible on all other nodes.
Coherency in NFS can be achieved, but at the cost of performance, making NFS barely usable. (Check its mount options.) NFS caches like crazy to hide the fact that it is a server file system. To make all operations coherent, the NFS client would have to go to the NFS server synchronously for every little operation, bypassing the local cache. And that would never be fast.
But since we are talking Linux here, one can advise customers of the software to evaluate the available cluster file systems. E.g. Red Hat now officially supports GFS. I have heard about people using CodaFS, but have no hard information on it.
I have had success with doing an ls -l on the directory which contains the file.
You could try the 'noac' mount option.
From man nfs:
In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead.
You could have two mounts, one for critical fast changing data that you need synchronized and another mount for other data.
Also, look into NFS locking and its limitations.
As for general advice:
One way to safely replace the contents of a file that is concurrently read from multiple hosts is to write the new content into a temporary file and then rename that file to the final location.
On the same filesystem this operation should be atomic.
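A rough sketch of that temp-file-plus-rename pattern (the paths are placeholders); the fsync() before the rename matters, otherwise the new name may become visible before the data is durable:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Atomically replace 'final' with new content via a temp file + rename. */
    static int replace_atomically(const char *tmp, const char *final, const char *data)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        size_t len = strlen(data);
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        close(fd);

        /* rename() within one filesystem replaces the target in a single step,
           so readers see either the old file or the new one, never a mix. */
        return rename(tmp, final);
    }

    int main(void)
    {
        /* Placeholder paths on the shared filesystem. */
        return replace_atomically("/srv/nfs/data/report.tmp",
                                  "/srv/nfs/data/report.csv",
                                  "id,value\n1,42\n");
    }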

Accessing same resource across restarts in Windows

I will write something to a file or to memory just before a system shutdown or a service shutdown. On the next restart of the system, is it possible to access the same file or the same memory on the disk before the filesystem loads? The actual requirement is like this: we have a driver that sits between the volume-level drivers and the filesystem driver...in that part of the driver code, I want to access some memory or a file.
Thanks & Regards,
calvin
The logical thing here is to read/write this into the registry if it is not too big. Is there a reason you do not want to use the registry?
If you need to access large data and you are writing a volume or device filter and cannot rely on the ZwOpen/Read/Write/Close functions in the kernel, one approach would be to create the file in user mode, get its device name and cluster chain, and store them in the registry. On the next boot, you can get the device and clusters from the registry and do direct I/O on them.
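As a rough kernel-mode sketch of the registry idea (the key path and value name are hypothetical), RtlWriteRegistryValue can persist a small blob under the driver's Services key; the SYSTEM hive is loaded by the boot loader, so the value can be read back with RtlQueryRegistryValues before any filesystem volume is mounted:

    #include <ntddk.h>

    /* Persist a small blob of state under this driver's Services key.
       The key and value names are hypothetical; read it back on the next
       boot with RtlQueryRegistryValues before any volume is mounted. */
    NTSTATUS SaveShutdownState(PVOID data, ULONG length)
    {
        return RtlWriteRegistryValue(RTL_REGISTRY_SERVICES,
                                     L"MyVolumeFilter\\Parameters",
                                     L"LastShutdownState",
                                     REG_BINARY,
                                     data,
                                     length);
    }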
Since you want to access this before the filesystem loads, my first thought is to allocate and use a block of storage space on the hard drive outside of the filesystem. You can create a hidden mini-partition on the drive and use low-level I/O commands to read and write your data.
This is a common task in the world of embedded systems, and we often implement it by adding some sort of non-volatile memory device to the system (flash, battery-backed DRAM, etc.) and reading from and writing to that device. Since you likely don't have the same level of control over the available hardware as embedded developers do, the closest analogue I can think of would be to reserve a chunk of space on a physical disk that you can read from without having to mount it as a filesystem. A dedicated mini-partition might work best because, if you know its size, you can treat it as one big raw-access buffer and avoid having to hassle with filenames, filesystems, etc.
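For illustration (the drive number, offset, and sector size are assumptions), a user-mode sketch of reading such a reserved region with raw disk access is below; the filter driver itself would issue the equivalent reads by sending IRPs to the disk device object rather than calling Win32:

    #include <windows.h>
    #include <stdio.h>

    #define SECTOR_SIZE   512
    #define REGION_OFFSET (1024ULL * 1024ULL)   /* hypothetical reserved offset: 1 MiB */

    int main(void)
    {
        /* Drive number is a placeholder; this requires administrator rights. */
        HANDLE disk = CreateFileA("\\\\.\\PhysicalDrive0",
                                  GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (disk == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        /* Raw disk I/O must be sector-aligned; VirtualAlloc returns page-aligned memory. */
        BYTE *buf = VirtualAlloc(NULL, SECTOR_SIZE, MEM_COMMIT, PAGE_READWRITE);
        if (buf == NULL) {
            CloseHandle(disk);
            return 1;
        }

        LARGE_INTEGER where;
        where.QuadPart = REGION_OFFSET;
        SetFilePointerEx(disk, where, NULL, FILE_BEGIN);

        DWORD got = 0;
        if (ReadFile(disk, buf, SECTOR_SIZE, &got, NULL))
            printf("read %lu bytes from the reserved region\n", got);
        else
            printf("ReadFile failed: %lu\n", GetLastError());

        VirtualFree(buf, 0, MEM_RELEASE);
        CloseHandle(disk);
        return 0;
    }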
