Is it possible to overwrite the MFT with the Windows API? - c

Is it possible to overwrite the MFT (Master File Table) with the Windows API while Windows is up and running?
I know we can read the MFT, but I am asking about writing to it.

Vista restricted raw access but you can probably still do it if you unmount the volume first.
Changes to the file system and to the storage stack to restrict direct disk access and direct volume access in Windows Vista and in Windows Server 2008.
I don't know the type of program you are writing but it might fit in this category:
Backup programs must unmount the volume before they write to the volume. Otherwise, the program writes will collide with file system writes. Such collisions will result in corruption or in system instability.
Writing to a live volume might be possible if you jump through all their hoops, but the risk of corruption is probably too high. You might want to investigate obscure and/or undocumented NTFS I/O control codes instead.

Related

mkstemp and hard disk stress

Are temporary files created with mkstemp synced to disk?
Here is what I have:
Program creates temporary file using mkstemp and sends fd to another program.
This temporary file is mmap-ped by both programs and used heavily (up to 400 MB/sec of writes and 400 MB/sec of reads; up to 60 reads and writes per second).
I can't use memfd_create (may not be supported on target devices).
Let's also assume (and this is almost true) that I can't create this file on tmpfs (like in /tmp).
What I need is a guarantee that such a file will not stress the hard disk. I can't allow it to be written to disk even if this only happens once every 5 seconds. If I can't get such a guarantee, I will look for another way.
Additional info (not important):
I am writing a Wayland compositor for Android devices. Currently the temporary files (Wayland surfaces, actually) are created on tmpfs, and everything works fine as long as SELinux is not enabled. But if I enable SELinux, it prevents fds from being transferred from client to compositor. The only solution I currently know is to create the temporary files in the app's home dir. But if that is dangerous, I will find another way.
Are temporary files created with mkstemp synced to disk?
The mkstemp function does not impart any special properties to files it opens that would prevent them from being synced to disk. The filesystem on which they are created might have such a property, but that's independent of file creation. In particular, files created via mkstemp() will persist indefinitely if not removed.
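To make the point about persistence concrete, here is a minimal sketch using Python's tempfile.mkstemp, which follows the same idea as the C function. The "surface-" prefix and the demo() name are only illustrative. It shows that the file stays on the filesystem until it is explicitly unlinked, and that unlinking only removes the name: the open descriptor keeps working, and nothing here changes whether dirty pages are eventually written back.

```python
import os
import tempfile


def demo():
    """Create a temp file, write to it, unlink it, and read it back via the fd."""
    # mkstemp returns an ordinary read/write descriptor plus the file's path;
    # the file has no special caching or "memory only" semantics.
    fd, path = tempfile.mkstemp(prefix="surface-")
    try:
        os.write(fd, b"pixels")
        existed_after_write = os.path.exists(path)
        # Removing the name does not invalidate the open descriptor.
        os.unlink(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 6)
    finally:
        os.close(fd)
    return existed_after_write, os.path.exists(path), data
```

Unlinking right after creation is a common way to avoid leaving stale files behind when the descriptor is passed to another process, but it is not a guarantee against disk writes.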
What I need is a guarantee that such a file will not stress the hard disk. I can't allow it to be written to disk even if this only happens once every 5 seconds. If I can't get such a guarantee, I will look for another way.
As far as I am aware, even tmpfs filesystems do not guarantee that their contents will remain locked in memory, as opposed to being paged out. They are backed by virtual memory. But if the actual file is comparatively small and all its pages are hot, then they are likely to remain in memory only.
With regard to the larger problem,
everything works fine as long as SELinux is not enabled. But if I
enable SELinux, it prevents fds from being transferred from client to
compositor. The only solution I currently know is to create the temporary
files in the app's home dir.
By default, newly-created files inherit the SELinux type of their parent directory. Your Wayland clients presumably do not have sufficient privilege to modify the SELinux labels of the files they create, but you should be able to administratively create a directory wherever you like with a label conducive to your needs. For example, you could cause a subdirectory of /dev/shm to be created for the purpose (at every boot), and chconned to have an appropriate label. If the clients create their temp files there then they should inherit the SELinux type you choose.

A way to make a file contents snapshot in Linux

What is the best way to create an "atomic" snapshot of file contents in Linux? Emphasis is not on performance, but on getting contents as a whole.
I can think of using sendfile(2) (since 2.6.33) or splice(2), but neither gives any indication that the operation is atomic. Both run entirely in kernel space, but at least sendfile(2) implies it uses mmap(2), and mmap gives no guarantee that writes made by other processes to the same region mapped as MAP_SHARED will stay invisible, even to a MAP_PRIVATE mapping (they probably will be visible, because those are the same pages).
Given that these functions were written with performance in mind and sendfile(2) is optimized for use with DMA, I can only assume that they just copy memory in some background kernel thread, and it is quite possible that other operations affect the data being copied.
So the only possible solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?
So the only possible [filesystem-independent] solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?
Almost; there is also fanotify. Plus, as mentioned in a comment, there are some filesystem-specific options, and some possibilities only available in certain configurations.
The lease break timer is configurable, /proc/sys/fs/lease_break_time in seconds, and the default is 45 seconds.
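As a rough illustration of the lease approach, the sketch below takes a read lease, reads the file, and reports failure if the lease was broken in the meantime. The function names are hypothetical; it assumes Linux, a Python build that exposes fcntl.F_SETLEASE and fcntl.F_GETLEASE, and that it runs in the main thread (needed to install the signal handler). SIGIO is the default lease-break signal, and its default action terminates the process, so a handler is installed first.

```python
import fcntl
import os
import signal

_lease_broken = False


def _on_lease_break(signum, frame):
    # Delivered when another process opens the leased file for writing.
    global _lease_broken
    _lease_broken = True


def lease_break_time():
    """Return the lease break time in seconds, or None if it cannot be read."""
    try:
        with open("/proc/sys/fs/lease_break_time") as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def snapshot_with_read_lease(path):
    """Return the file contents, or None if no lease covered the whole read."""
    global _lease_broken
    _lease_broken = False
    old_handler = signal.signal(signal.SIGIO, _on_lease_break)
    fd = os.open(path, os.O_RDONLY)  # read leases need a read-only descriptor
    try:
        try:
            fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_RDLCK)
        except OSError:
            return None  # e.g. someone has the file open for writing
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:
                break
            chunks.append(chunk)
        intact = (not _lease_broken
                  and fcntl.fcntl(fd, fcntl.F_GETLEASE) == fcntl.F_RDLCK)
        return b"".join(chunks) if intact else None
    finally:
        os.close(fd)  # closing the descriptor also releases the lease
        if old_handler is not None:
            signal.signal(signal.SIGIO, old_handler)
```

A caller that gets None back can simply retry later; the copy has to finish within lease_break_time() seconds of a conflicting open for the result to be trusted.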
"Just give up and try later" is also a bit defeatist; you do have ways to monitor when the snapshot might work. Consider placing an inotify IN_CLOSE_WRITE and IN_CLOSE_NOWRITE watch on the file, and try the snapshot whenever you receive such an event.
fanotify:
For a few years now, I've been monitoring the progress of Linux fanotify, in the hopes that it would grow enough features that it could be used for automagic file versioning. Essentially, whenever someone opens the file with write permissions, the current file would be snapshot to temporary storage, marked with some metadata (timestamp, real human user (backtracked through sudo/su), and so on). When that descriptor is closed, another snapshot is taken, and a helper thread/process diffs the two, annotating the changes (or even pushing it to git).
It is limited to local filesystems, but with 2.6.37 and later kernels (including 3.x), the interface is sufficient for specific files, or an entire mount. In your case, the fanotify interface allows similar features to file leases, except for local filesystems only, but you can simply deny any accesses during the snapshot. (One can argue whether that is a good idea at all, especially if the file to be snapshotted is a system or configuration file; many programmers overlook error checking, because "some files just have to be always accessible, or your system is broken".)
As far as my change monitoring goes, fanotify should now have all sufficient features, but only if an entire mount is monitored. I was hoping to monitor configuration files on multi-admin clusters, but those files reside on the same mount as all system libraries and binaries do, so the monitoring causes considerable overhead. So much so, that it seems more appropriate to just modify SSH configuration, console configuration (getty etc.), sudo configuration, and possibly su, to always include a dynamic library that interposes file access syscalls, and basically does the versioning on behalf of the user. This way service binaries are not affected, only user actions are monitored.
This might work under some circumstances:
1. (Optional) Do something to prevent new processes from opening the file:
   a/ rename the file
   b/ restrict file permissions
2. Find all existing file readers/writers via lsof and kill -STOP them
3. Do your snapshot
4. kill -CONT all readers/writers
5. (Optional) Undo step 1.
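The stop/snapshot/continue steps above could be sketched roughly as follows. Instead of calling lsof, this version scans /proc/<pid>/fd directly, which is an assumption of the sketch (Linux only, and it only sees processes the caller is allowed to inspect). The function names are hypothetical. It does not detect processes that only have the file mmap-ed without an open descriptor, and a process can still open the file between the scan and the copy, so it inherits the "some circumstances" caveat.

```python
import os
import shutil
import signal


def pids_with_file_open(path):
    """Return the PIDs that have `path` open, found by scanning /proc/<pid>/fd."""
    target = os.path.realpath(path)
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            for fd in os.listdir(fd_dir):
                try:
                    if os.readlink(os.path.join(fd_dir, fd)) == target:
                        pids.add(int(entry))
                        break
                except OSError:
                    continue  # descriptor vanished while we were looking
        except OSError:
            continue  # process exited or we lack permission
    return pids


def snapshot_with_openers_stopped(path, dest):
    """Stop every other process holding `path`, copy it, then resume them."""
    stopped = []
    try:
        for pid in pids_with_file_open(path) - {os.getpid()}:
            try:
                os.kill(pid, signal.SIGSTOP)
                stopped.append(pid)
            except OSError:
                pass  # not ours to stop, or already gone
        shutil.copyfile(path, dest)
    finally:
        # Always resume what we stopped, even if the copy failed.
        for pid in stopped:
            try:
                os.kill(pid, signal.SIGCONT)
            except OSError:
                pass
    return stopped
```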

Adding Capability to NFS Server - Compressing/Decompressing Stored/Retrieved Files

I need to build a custom Suse Linux NFS Server that does compression on certain files that are stored on the disk, and decompresses files as they are read from the disk. This needs to be transparent to the remote users of the file system, meaning that if a user saves a 10MB file named XYZZY.tif on /archiveDirectoryOnNFSServer, that when they do a ls -l on that mounted directory, they will see a 10MB file called XYZZY.tif, even though the actual file stored on the disk on the NFS server will be XYZZY.tif.compressed, and it will be 2MB in size.
I'm expecting that I need to build this as a driver that sits below the NFS server software stack, but I'm having difficulty finding where to start. Are there existing NFS servers that provide this level of customization through APIs? Will I need to modify the source of an open-source NFS server, and, if so, is there one that would be easiest to start with, and are they modularly structured such that this will be straightforward? I'm having difficulty locating relevant content on the internet, and any pointers will be greatly appreciated.
IMO that kind of functionality is absolutely not the NFS server's responsibility (an nfs server should, well, serve files over nfs), but the underlying filesystem's. However, there's not that much choice in Linuxland but you could start by checking out fusecompress and btrfs.
This post is a bit old so you may already be aware of some options here, but there are a couple others (both for server side).
http://zfsonlinux.org/
The ZFS filesystem has built-in compression. I typically use lzjb, as it is the fastest compression algorithm and does a reasonable job (MySQL DBs get 2-4x compression; filesystems with non-compressed data get around 4x). You have a choice of algorithm depending on how much CPU time you are willing to spend on compression.
If you want different file types compressed differently, you may consider layering Gluster on top of a set of ZFS filesystems.
Gluster will allow you to store certain file types (by extension) on different underlying filesystems.
In this case, you specify the underlying filesystem as a ZFS volume with the particular options you need. For example, .zip and .png go on an uncompressed filesystem, while things you write once and read many times, like static HTML files, might go on one with higher compression. You pay once when the file is written, but reads should be really fast, since fewer disk blocks are scanned and decompression is very fast.
ZFS will manage the NFS mounts if you use it as your NFS server. You won't want this if you layer Gluster on top.
It's easy to set other attributes dynamically per filesystem (atime/noatime, the number of copies if you want redundancy beyond your normal RAID), and you can add SSDs as cache devices to get more performance.
With these solutions, you still send the full uncompressed files over the wire, so they don't help network performance, but they give you a lot of options if you're trying to speed up disk I/O or get more utilization out of your drives.

Garbage data in file after sudden power loss

I am using flash storage with a FAT32 file system. I am continuously writing data to a file using the file system APIs of an RTOS (SMX). However, after a sudden power-off, the file contains garbage values just above the first file entry on system reboot.
I ran the chkdsk utility, but it doesn't fix the problem.
Any idea how I can get rid of these garbage entries even after unclean power-offs?
If you expect sudden power loss, you'll need to disable all caching/buffering on file writes. Of course you'll also need to deal with partially-written files, but that should at least prevent trailing garbage.
I don't know the API you're using, but this might be done by mounting the drive "synchronously" (e.g., mount -o sync in Linux) or by opening individual files with specific options. If you do disable buffering on individual file writes, you may still run the risk of corrupting the FAT, however, and losing all the files.
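For the "specific options" route, a POSIX-style sketch might look like the following. This is only an analogy: the SMX file API is not known here, so O_SYNC, fsync and the function name are assumptions taken from Linux/POSIX rather than from the asker's RTOS. Each record is appended with synchronous writes and the function returns only after the OS reports the data as flushed.

```python
import os


def append_record_durably(path, record):
    """Append `record` (bytes) and return once the OS reports it flushed."""
    # O_SYNC asks for each write to reach stable storage before returning;
    # O_APPEND keeps records in the order they were written.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # also flush metadata such as the new file length
    finally:
        os.close(fd)
```

Even with this, a power cut in the middle of a write can leave a partial last record, so the reader still needs a way to recognize and discard it, for example a length prefix or a checksum per record.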

Read data from damaged media

Is it possible to read damaged media (cd, hdd, dvd,...) even if windows explorer bombs out?
What I mean to ask is, whether there is a set of APIs or something that can access the disk at a very low level (below explorer?) and read whatever can be retrieved even if it is only partial, especially if you can still see the file is there from explorer, but can't do anything with it because it is damaged somehow (scratch on cd, etc)?
The main problem with Windows Explorer is that it doesn't support resuming copying after a read error. Most superficially scratched CDs, for example, will fail on different areas of the disk every time you eject and reinsert them.
Therefore, with a utility that supports resuming copy operations, it is possible to read the entire contents of a damaged CD by doing "eject/reload/resume" a few times.
In fact, this is what a utility I wrote does, and I've never needed anything fancier to read scratched disks. (It simply uses ReadFile and WriteFile.)
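The resume idea can be sketched in a few lines. This is not the author's utility, just a minimal illustration with a hypothetical function name: the copy continues from however many bytes the destination already holds and stops at the first read error, so it can be called again after the disc has been ejected and reloaded.

```python
import os


def resume_copy(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, continuing from dst's current size.

    Returns True when the end of src was reached, False if a read error
    interrupted the copy (call again to resume from the same point).
    """
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    # Unbuffered source so every read maps to one request to the drive.
    with open(src, "rb", buffering=0) as fin, open(dst, "ab") as fout:
        fin.seek(done)
        while True:
            try:
                chunk = fin.read(chunk_size)
            except OSError:
                return False  # keep what we have; the next call resumes here
            if not chunk:
                return True
            fout.write(chunk)
```

A caller would loop on resume_copy(src, dst), prompting for an eject and reload whenever it returns False.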
One step lower would be opening the raw partition (i.e. disk image) by passing a string such as "\\.\F:" (note: slashes are literal here) to CreateFile. It would allow you to read raw sectors from a drive, but reconstructing files from that data would be hard.
In fact, the "\\.\" syntax allows you to open devices in the "\GLOBAL??" branch of the Windows Object Manager namespace as if they were files. It's not unlike calling dd with /dev/x as a parameter. There is also a "\Device" branch, but that's only accessible via DeviceIoControl() (i.e. ioctl()), meaning there's no simple ReadFile()/WriteFile() interface.
Anything lower level than that would be device-specific, I guess; like reading raw CD-ROM data (including ECC bits) the way some CD-burning programs do. You'd have to do some research on the specific media (CD, flash, DVD) and what your hardware allows you to do on them.
Note: The backslashes seem to get lost on the way to the web page; you need to pass "backslash backslash dot backslash DeviceName" to CreateFile. You need to escape them, too, of course.
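To show what the escaped string looks like in source code, here is a small sketch. Both function names are hypothetical, and the 512-byte sector size is an assumption (data CDs typically use 2048). Python's open() is assumed to hand the path to CreateFile on Windows; reading a volume this way usually needs administrator rights and sector-aligned offsets and lengths.

```python
def volume_device_path(drive_letter):
    r"""Return the device path for a drive letter, e.g. \\.\F: for "F"."""
    letter = drive_letter.rstrip(":\\").upper()
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    # Each literal backslash is doubled in an ordinary string literal.
    return "\\\\.\\" + letter + ":"


def read_raw_sectors(drive_letter, first_sector, count, sector_size=512):
    """Read `count` raw sectors from a volume (Windows only)."""
    with open(volume_device_path(drive_letter), "rb", buffering=0) as volume:
        volume.seek(first_sector * sector_size)
        return volume.read(count * sector_size)
```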
If you want to do it, do it from the Linux side - see: http://sourceforge.net/projects/monkeycity/ opensource
or ready made app and freeware too: http://www.theabsolute.net/sware/dskinv.html
The first step is dd_rescue. After that, you're free to try anything to reconstruct the data.
And there's GNU ddrescue
GNU ddrescue is a data recovery tool. It copies data from one file or block device (hard disc, cdrom, etc) to another, trying to rescue the good parts first in case of read errors.
Make sure to use the 3-arg version (manual):
ddrescue [options] infile outfile [mapfile]
That is, do use a mapfile even if it's optional, because:
If you use the mapfile feature of ddrescue, the data is rescued very efficiently, (only the needed blocks are read). Also you can interrupt the rescue at any time and resume it later at the same point. The mapfile is an essential part of ddrescue's effectiveness. Use it unless you know what you are doing.
And it's also included in Cygwin and Homebrew.
I don't know what layer exists between Windows Explorer and the Win32 APIs. You can try to write a program with the Win32 File I/O stuff. If that doesn't work, then you have to write your own device driver to get any lower.
I've had some luck from the Linux side, or using BartPE (http://www.nu2.nu/pebuilder/), but just seeing the file doesn't always mean the file is going to be recoverable, whether you're trying from Windows or Linux. Your best bet might be to use a trial of a recovery program.
I have had two disks start to disintegrate on me. From the pattern of unreadable sectors I think they had internal flaking of their emulsion. WinXP Explorer just threw up its hands and said the drive didn't even exist.
In both cases I used "GetDataBack for NTFS" from Runtime Software (http://www.runtime.org/). You can download a free trial which will show you what you could get back if you paid for it. When I bought it it was $49, but I see it is now $79.
This program is amazing. It's not necessarily fast as it will reread some sectors over and over, trying to get a consensus value from multiple tries, but when it's done you can get back stuff that you thought was gone forever. I had one drive that it took over 10 hours to analyze, but when it was done I got back over 97% of a 500GB drive. Definitely worth the price.
Another great tool is Beyond Compare. I have rev 2.5.3, but it is currently at 3.?? and costs $30. They have a full-functionality, 30-day trial. It does a great job of copying large quantities of files (and only those that need to be copied) and, unlike Explorer, it doesn't blow up if something fails. It's sort of like a visual rsync for Windows, if you're familiar with that program from the Samba people.
I have no connection with either of the companies mentioned other than being a very satisfied customer.
The gold standard for recovering data from a magnetic storage device would have to be SpinRite. It's a commercial app though, so you probably wouldn't learn much from it.
If you have a Linux machine around, I can recommend dvdisaster. It is originally meant for creating error correction files, but it also reads DVDs into an image and ignores read errors; and you can use different drives one after another to get missing sectors filled in the image.
