Secure File Delete in C - c

Secure File Deletion in C
I need to securely delete a file in C, here is what I do:
use fopen to get a handle of the file
calculate the size using lseek/ftell
get a random seed based on the current time and/or the file size
write (size) bytes of data to the file in a loop, 256 bytes per iteration
fflush/fclose the file handle
reopen the file and redo steps 3-6 about 10~15 times
rename the file then delete it
Is that how it's done? Because I read the name "Gutmann 25 passes" in Eraser, so I guess 25 is the number of times the file is overwritten and 'Gutmann' is the Randomization Algorithm?

You can't do this securely without the cooperation of the operating system - and often not even then.
When you open a file and write to it, there is no guarantee that the OS will put the new file on the same bit of spinning rust as the old one. Even if it does, you don't know whether the new write will use the same chain of clusters as before.
Even then you aren't sure that the drive hasn't mapped out the disk block because of some fault - leaving your plans for world domination on a block that is marked bad but is still readable.
PS - the 25x overwrite is no longer necessary; it was needed on old low-density MFM drives with poor head tracking. On modern GMR drives, overwriting once is plenty.

Yes. In fact it works by overwriting a number of different patterns on the file:
It does so by writing a series of 35 patterns over the
region to be erased.
The selection of patterns assumes that the user doesn't know the
encoding mechanism used by the drive, and so includes patterns
designed specifically for three different types of drives. A user who
knows which type of encoding the drive uses can choose only those
patterns intended for their drive. A drive with a different encoding
mechanism would need different patterns.
More information is in the Wikipedia article on the Gutmann method: https://en.wikipedia.org/wiki/Gutmann_method

@Martin Beckett is correct; there is no such thing as "secure deletion" unless you know everything about what the hardware is doing all the way down to the drive. (And even then, I would not make any bets on what a sufficiently well-funded attacker could recover given access to the physical media.)
But assuming the OS and disk will re-use the same blocks, your scheme does not work for a more basic reason: fflush does not generally write anything to the disk.
On most multi-tasking operating systems (including Windows, Linux, and OS X), fflush merely forces data from the user-space buffer into the kernel. The kernel will then do its own buffering, only writing to disk when it feels like it.
On Linux, for example, you need to call fsync(fileno(handle)). (Or just use file descriptors in the first place.) OS X is similar. Windows has FlushFileBuffers.
Bottom line: The loop you describe is very likely merely to overwrite a kernel buffer 10-15 times instead of the on-disk file. There is no portable way in C or C++ to force data to disk. For that, you need to use a platform-dependent interface.
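For example, a single overwrite pass on a POSIX system might look like the sketch below (file-descriptor based, so the data can be pushed past the kernel's buffers with fsync; the function name and 256-byte chunk size are just illustrative):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* One overwrite pass: fill the existing file with a byte pattern,
       then ask the kernel to push its buffers out to the device.
       Sketch only; see the caveats above about remapped and relocated blocks. */
    static int overwrite_pass(const char *path, unsigned char pattern)
    {
        struct stat st;
        unsigned char buf[256];
        off_t remaining;

        int fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;
        if (fstat(fd, &st) != 0) {
            close(fd);
            return -1;
        }

        memset(buf, pattern, sizeof buf);
        remaining = st.st_size;
        while (remaining > 0) {
            size_t chunk = remaining < (off_t)sizeof buf ? (size_t)remaining : sizeof buf;
            if (write(fd, buf, chunk) != (ssize_t)chunk) {
                close(fd);
                return -1;
            }
            remaining -= chunk;
        }

        /* fflush() would only drain the stdio buffer; fsync() asks the kernel
           to write its own buffers to the drive (FlushFileBuffers on Windows). */
        if (fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }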

The MFT (Master File Table) is similar to the FAT (File Allocation Table).
The MFT keeps a record for each file: the file's offsets on disk, file name, date/time, ID, file size, and even the file data itself if it fits in the record's free space, which is about 512 bytes (a record is 1 KB).
Note: on a new HDD the data areas are set to 0x00 (just so you know).
Let's say you want to overwrite file1.txt. The OS looks up this file's on-disk offset inside its MFT record.
You then overwrite file1.txt with binary zeros (0x00) in binary mode.
You will overwrite the file data on disk; this works precisely because the MFT stores the file's offset on disk.
Afterwards, rename the file and then delete it.
NOTE: the MFT will mark the file as deleted, but you can still recover some metadata about it, e.g. date/time created, modified and accessed, file offset, attributes, and flags.
1 - create a folder in c:\ and move the file into it, renaming it at the same time (use the rename function); rename the file to 0000000000 or anything else without an extension
2 - overwrite the file with 0x00 and check that it was overwritten
3 - change the date/time
4 - clear the attributes
5 - leave the file size untouched, so the OS reuses the empty space sooner
6 - delete the file
7 - repeat steps 1-6 for all files
8 - delete the folder
or
(1, 2, 6, 7, 8)
9 - find the files' records in the MFT and remove them
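A rough sketch of steps 1, 2 and 6 with portable stdio calls (the scratch name is illustrative; resetting timestamps and attributes needs platform-specific calls such as SetFileTime/SetFileAttributes on Windows, and the fsync/FlushFileBuffers caveat from the previous answer still applies):

    #include <stdio.h>
    #include <string.h>

    /* Rename the file to a meaningless name, overwrite its contents with 0x00
       without changing its size, then delete it. Sketch only; this does not
       guarantee the old sectors are unrecoverable. */
    static int wipe_and_delete(const char *path, const char *scratch_name)
    {
        unsigned char zeros[512];
        long size, pos;
        FILE *fp;

        if (rename(path, scratch_name) != 0)          /* step 1 */
            return -1;

        fp = fopen(scratch_name, "r+b");
        if (fp == NULL)
            return -1;

        fseek(fp, 0, SEEK_END);
        size = ftell(fp);
        rewind(fp);

        memset(zeros, 0, sizeof zeros);
        for (pos = 0; pos < size; pos += (long)sizeof zeros) {   /* step 2 */
            long chunk = (size - pos < (long)sizeof zeros) ? size - pos : (long)sizeof zeros;
            if (fwrite(zeros, 1, (size_t)chunk, fp) != (size_t)chunk) {
                fclose(fp);
                return -1;
            }
        }
        fflush(fp);
        fclose(fp);

        return remove(scratch_name);                  /* step 6 */
    }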

The Gutmann method worked fine for older disk technology encoding schemes, but the 35-pass wiping scheme of the Gutmann method is no longer required, as even Gutmann acknowledges. See the Gutmann method article at https://en.wikipedia.org/wiki/Gutmann_method, in particular the Criticism section where Gutmann discusses the differences.
It is usually sufficient to make at most a few random passes to securely delete a file (with possibly an extra zeroing pass).
The secure-delete package from thc.org contains the sfill command to securely wipe disk and inode space on a hard drive.

Related

Is it possible to temporarily hide a file from any system call in linux?

I am trying to implement a filesystem using FUSE, and I want a file to be hidden temporarily when it is deleted. I tried storing all the file names (or their inodes) in an array and checking them whenever a system call like 'open', 'getattr' or 'readdir' is invoked. But that could eat up tons of performance when the number gets really huge. So I wonder, is there a better way to do this? Thanks in advance!
There are two problems with your approach (and with the solution pointed out by Oren Kishon and marked as selected):
The first is that a file has no name per se. The name of a file is not part of the file. The mapping of filenames to files (actually to inodes) is created by the system for the convenience of the user, but the names are completely independent of the files they point to. This means it is easy to know which inode a link points to, but very difficult to do the reverse mapping (finding the directory entry that points to an inode when all you know is the inode). The deletion of a file is a two-phase process. In the first phase you call the unlink(2) system call to erase a link (a directory entry) from the directory it belongs to; the file's blocks are then deallocated, but only if the reference count (which is stored in the inode itself) drops to zero. This is an easy process, as everything starts from the directory entry you want deleted. But if you don't erase that entry, searching for it later will be painful, as you can see below in the second problem.
The second is that if you do this with, let's say, six links (hard links) to the same file, you'll never know when the space should actually be reallocated to another file (because you have run out of unallocated space), since the link reference count on the inode is still six. Even worse, if you add a second reference count to the inode to track the (different) number of names that have been "erased" but not yet deallocated, you then have to search the whole filesystem when you need the space back (because you have no idea where the remaining links are). So you need to maintain a lot of extra information (on top of the space the file occupies in the filesystem): first to gather all the links that once pointed to the file, and second to check whether this really is the file that has to be deallocated when more space is needed in the filesystem.
By the way, your problem does have an easy solution in user space. Just modify the rm command to never erase a file completely (i.e. never unlink the last link to a file); instead, move the last link into a queue in some fixed directory on the same filesystem the file resided on. This keeps the file allocated (you lose the original name, unless you save it in an associated file). A monitor process can watch the amount of free space and, when space is needed, pick the oldest entry from the queue and truly erase it. Beware that if large files have been erased, this will make your system load spike at random times, whenever it is time to actually erase the files you are deallocating.
There's another alternative: use ZFS as your filesystem. This requires a lot of memory and CPU, but it is a complete solution to undeleting files, because ZFS preserves the full history of the filesystem, so you can go back in time to a snapshot in which the file existed, and then make a copy of it, actually recovering it. ZFS can be used on WORM (Write Once Read Many, like DVD) media, which lets you preserve the filesystem state over time (at the expense of never reusing the same space again), but you will never lose a file.
Edit
There is one case in which the file is no longer available to any process other than the ones that already have it open. In this scenario, a process opens a file, then deletes it (deletion just breaks the link that translates the file's name into an inode), but keeps using the file until it finally closes it.
As you probably know, a file can be open in several processes at the same time. Apart from the number of references stored in the disk inode, there is a reference count on the inode in the kernel's in-memory inode table. This is the number of references in the disk inode (the number of directory entries that point to the file's inode) plus one reference for each open file entry that refers to it.
When a file is unlinked (and it should be deleted, because no more links reference the inode), the deallocation does not happen immediately, because the file is still being used by processes. The file is alive, although it no longer appears in the filesystem (there are no more references to it in any directory). Only when the last close(2) of the file takes place is the file deallocated by the system.
But what happens to the directory entry that referenced that file? It can be reused (as I told you in one of the comments) as soon as it has been freed, long before the file is deallocated. A new file (necessarily a different inode, since the old one is still in use) can be created and given the original name (because you decided to name it the same), and there is no problem with this, except that you are now using a different file. The old file is still in use, has no name, and for this reason is invisible to every process except the one using it. This technique is frequently used for temporary files: you create a file with open(2) and immediately unlink(2) it. No other process can access that file, and it will be deallocated as soon as the last close(2) on it is called. No file with these characteristics can survive a reboot of the system (it cannot even survive the process that had it open).
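A minimal sketch of that open-then-unlink idiom (the path is illustrative):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create a scratch file and immediately remove its only name.
           The inode stays allocated while the descriptor is open, so the file
           is usable by this process but invisible to every other one. */
        int fd = open("/tmp/scratch.tmp", O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd < 0)
            return 1;
        unlink("/tmp/scratch.tmp");

        write(fd, "temporary data", 14);
        /* ... keep using the file through fd ... */

        close(fd);   /* last reference gone: the blocks are deallocated here */
        return 0;
    }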
As the question states:
Is it possible to temporarily hide a file from any system call in linux?
The file is hidden from all the system calls that require a name for the file (it has no name anymore) but not from other system calls: e.g. fstat(2) continues to work, while stat(2), link(2), rename(2), open(2), etc. can no longer be used on that file.
If I understand, when unlink is called, you want the file to be marked as deleted rather than actually deleted.
You could implement this mark as an extended attribute, one with "system" namespace (see https://man7.org/linux/man-pages/man7/xattr.7.html) which is not listed as part of the file xattr list.
In your "unlink", do setxattr(system.markdelete). In all of your other calls with path arg, and in readdir, getxattr and treat it as deleted.

exFAT cluster allocation strategy

I am attempting to create an exFAT file system implementation for an embedded processor. I have a question about the cluster allocation of a file.
Suppose I have created a file and then written some data to it. Let's say the file has used 10 clusters and they are contiguous on disk. As far as I understand it, exFAT has a way to mark the file as contiguous in the file's directory entry.
Let's say I create more files and write data to them. Then I go back to my first file and want to write more data to it. But now there are no free clusters at the end of the file, as other files occupy them. I guess it must be possible to add more space to the file, but how is this recorded in exFAT? The original file was contiguous but now it isn't, and the FAT wouldn't have been updated because of that. Does the FAT now have to be written out because the file is fragmented? Thanks
I did an experiment using a USB pen drive and a disk utility. It turns out that if you write more data to the file and it becomes fragmented, the contiguous-cluster flag gets cleared in the directory entry, and a FAT chain then has to be created to indicate where the file's clusters are located.
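For reference, that flag lives in the file's stream extension directory entry; a rough sketch of the layout follows (field names and offsets are taken from the published exFAT specification as I understand it, so verify them against the spec before relying on them):

    #include <stdint.h>

    /* exFAT stream extension directory entry (32 bytes). */
    #pragma pack(push, 1)
    typedef struct {
        uint8_t  EntryType;              /* 0xC0 */
        uint8_t  GeneralSecondaryFlags;  /* bit 0: AllocationPossible,
                                            bit 1: NoFatChain (contiguous file) */
        uint8_t  Reserved1;
        uint8_t  NameLength;
        uint16_t NameHash;
        uint16_t Reserved2;
        uint64_t ValidDataLength;
        uint32_t Reserved3;
        uint32_t FirstCluster;
        uint64_t DataLength;
    } ExfatStreamEntry;
    #pragma pack(pop)

    #define EXFAT_NO_FAT_CHAIN 0x02u

    /* When a previously contiguous file becomes fragmented: clear the flag,
       then write a proper cluster chain for the file into the FAT. */
    static void mark_fragmented(ExfatStreamEntry *entry)
    {
        entry->GeneralSecondaryFlags &= (uint8_t)~EXFAT_NO_FAT_CHAIN;
    }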

How does fwrite/putc write to disk?

Suppose we have an already existing file, say <File>. This file has been opened by a C program for update (r+b). We use fseek to navigate to a point inside <File>, other than the end of it. Now we start writing data using fwrite/fputc. Note that we don't delete any data previously existing in <File>...
How does the system handle those writes? Does it rewrite the whole file to another position in the Disk, now containing the new data? Does it fragment the file and write only the new data in another position (and just remember that in the middle there is some free space)? Does it actually overwrite in place only the part that has changed?
There is a good reason for asking: in the first case, if you continuously update a file, the system can get slow. In the second case, it could be faster but would mess up the file system if done to too many files. In the third case, especially if you have a solid-state disk, updating the same spot of a file over and over again may render that part of the disk useless.
Actually, that's where my question originates from. I've read that, to save disk sectors from overuse, solid-state disks move data to less-used sectors, using various techniques. But how exactly do the stdio functions handle such situations?
Thanks in advance for your time! :D
The filesystem driver keeps a kind of dictionary mapping files to sectors on the disc, so when you update the content of a file, the filesystem looks up the dictionary on the disc, which tells it in which sectors on the disc the file data is located. Then it spins (or waits until the disc arrives there) and updates the appropriate sectors on the disc.
That's the short version.
So in the case of updating the file, the file is normally not moved to a new place. When you write new data to the file, appending to it, and the data doesn't fit into the existing sector, additional sectors are allocated and the data is written there.
If you delete a file, then usually the sectors are marked as free and are reused. So only if you open a new file and rewrite it, it can happen that the file is put in different sectors than before.
But the details can vary, depending on the hardware. AFAIK if you overwrite data on a CD, then the data is newly written (as long as the session is not finalized), because you can not update data on a CD, once it is written.
Your understanding is incorrect: "Note that we don't delete any data previously existing in File"
If you seek into the middle of a file and start writing it will write over whatever was at that position before.
How this is done under the covers probably depends on how the controller in the hard disk implements it. It's supposed to be invisible outside the hard disk and shouldn't matter.
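To make the overwrite-in-place case concrete, here is a small sketch of the update being discussed (the file name and offset are illustrative):

    #include <stdio.h>

    int main(void)
    {
        /* Open an existing file for update without truncating it. */
        FILE *fp = fopen("data.bin", "r+b");
        if (fp == NULL)
            return 1;

        /* Seek into the middle and overwrite four bytes in place.
           The bytes that were at offsets 100-103 are replaced; nothing is
           inserted and the rest of the file is untouched. */
        fseek(fp, 100L, SEEK_SET);
        fwrite("ABCD", 1, 4, fp);

        fclose(fp);
        return 0;
    }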

Alter a file (without fseek or + modes) or concatenate two files with minimal copying

I am writing an audio file to an SD/MMC storage card in real time, in WAVE format, working on an ARM board. Said card is (and must remain) in FAT32 format. I can write a valid WAVE file just fine, provided I know how much I'm going to write beforehand.
I want to be able to put placeholder data in the Chunk Data Size field of the RIFF and data chunks, write my audio data, and then go back and update the Chunk Data Size field in those two chunks so that they have correct values, but...
I have a working filesystem and some stdio functions, with some caveats:
fopen() supports the r, w, and a modes, but not any + modes.
fseek() does not work in write mode.
I did not write the implementations of the above functions (I am using ARM's RL-FlashFS), and I am not certain what the justification for the restrictions/partial implementations is. Adding the missing functionality myself is probably an option, but I would like to avoid it if possible (I have no other need of those features, do not foresee any, and can't really afford to spend too much time on it). Switching to a different implementation is also not an option here.
I have very limited memory available, and I do not know how much audio data will be received, except that it will almost certainly be more than I can keep in memory at any one time.
I could write a file with the raw interleaved audio data in it while keeping track of how many bytes I write, close it, then open it again for reading, open a second file for writing, write the header into the second file, and copy the audio data over. That is, I could post-process it into a properly formatted valid WAVE file. I have done this and it works fine. But I want to avoid post-processing large amounts of data if at all possible.
Perhaps I could somehow concatenate two files in place? (I.e. write the data, then write the chunks to a separate file, then join them in the filesystem, avoiding much of the time spent copying potentially vast amounts of data.) My understanding of that is that, if possible, it would still involve some copying due to block orientation of the storage.
Suggestions?
EDIT:
I really should have mentioned this, but there is no OS running here. I have some stdio functions running on top of a hardware abstraction layer, and that's about it.
This should be possible, but it involves writing a set of FAT table manipulation routines.
The concept of FAT is simple: A file is stored in a chain of "clusters" - fixed size blocks. The clusters do not have to be contiguous on the disk. The Directory entry for a file includes the ID of the first cluster. The FAT contains one value for each cluster, which is either the ID of the next cluster in the chain, or an "End-Of-Chain" (EOC) marker.
So you can concatenate files together by altering the first file's EOC marker to point to the head cluster of the second file.
For your application you could write all the data, rewrite the first cluster (with the correct header) into a new file, then do FAT surgery to graft the new head onto the old tail:
Determine the FAT cluster size (S)
Determine the size of the WAV header up to the first data byte (F)
Write the incoming data to a temp file. Close when stream ends.
Create a new file with the desired name.
Open the temp file for reading, and copy the header to the new file while filling in the size field(s) correctly (as you have done before).
Write min(S-F, bytes_remaining) to the new file.
Close the new file.
If there are no bytes remaining, you are done,
else,
Read the FAT and Directory into memory.
Read the Directory to get
the first cluster of the temp file (T1) (with all the data),
the first cluster of the wav file (W1) (with the correct header).
Read the FAT entry for T1 to find the second temp cluster (T2).
Change the FAT entry for W1 from "EOC" to T2.
Change the FAT entry for T1 from T2 to "EOC".
Swap the FileSize entries for the two files in the Directory.
Write the FAT and Directory back to disk.
Delete the Temp file.
Of course, by the time you do this, you will probably understand the file system well enough to implement fseek(fp,0,SEEK_SET), which should give you enough functionality to do the header fixup through standard library calls.
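The FAT surgery in the list above might look roughly like the sketch below, assuming you already have routines that load the FAT into memory as an array of cluster entries; the fat[] array, the EOC value and the function name are all hypothetical:

    #include <stdint.h>

    #define EOC 0x0FFFFFFFu   /* hypothetical end-of-chain marker */

    /* Hypothetical in-memory copy of the FAT: fat[n] holds the ID of the
       cluster that follows cluster n, or EOC at the end of a chain. */
    extern uint32_t fat[];

    /* Graft the tail of the temp file onto the new file:
       W1 = first cluster of the new file (holds the fixed-up header),
       T1 = first cluster of the temp file (holds all the data). */
    static void graft_chains(uint32_t W1, uint32_t T1)
    {
        uint32_t T2 = fat[T1];   /* second cluster of the temp file */

        fat[W1] = T2;            /* the new file now continues into the old data */
        fat[T1] = EOC;           /* the temp file keeps only its first cluster */

        /* The caller must still swap the FileSize fields in the two directory
           entries and write the FAT and directory back to the card. */
    }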
We are working with exactly the same scenario as you in our project - a recorder application. Since the length of the file is unknown, we write a RIFF header with 0 length at the beginning (to reserve space) and, on closing, go back to position 0 (with fseek) and write the correct header. So I think you have to debug why fseek doesn't work in write mode; otherwise you will not be able to perform this task efficiently.
By the way, you'd better stay away from filesystem-internal workarounds like concatenating blocks, etc. - this is hardly possible, will not be portable, and can bring you new problems. Better to use standard and proven methods instead.
Update
(After finding out that your FS is ARM's RL-FlashFS) why not use rewind (http://www.keil.com/support/man/docs/rlarm/rlarm_rewind.htm) instead of fseek?
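A sketch of that header fix-up using only plain stdio calls, assuming rewind does reposition a stream opened for writing on this library (the 44-byte header and file name are simplified placeholders):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char header[44];          /* placeholder RIFF/WAVE header */
        unsigned long data_bytes = 0;

        FILE *fp = fopen("rec.wav", "wb"); /* plain write mode, no '+' needed */
        if (fp == NULL)
            return 1;

        memset(header, 0, sizeof header);  /* size fields left as zero for now */
        fwrite(header, 1, sizeof header, fp);

        /* ... stream the audio data, adding to data_bytes as it is written ... */

        /* Go back to the start and rewrite the header with the real sizes. */
        rewind(fp);
        /* fill in the RIFF chunk size and data chunk size from data_bytes here */
        fwrite(header, 1, sizeof header, fp);

        fclose(fp);
        return 0;
    }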

Reading a sector on the boot disk

This is a continuation of my question about reading the superblock.
Let's say I want to target the HFS+ file system in Mac OS X. How could I read sector 2 of the boot disk? As far as I know Unix only provides system calls to read from files, which are never stored at that location.
Does this require either 1) the program to run kernel mode, or 2) the program to be written in Assembly? I would prefer to avoid either of these restrictions, particularly the latter.
I've done this myself on the Mac, see my disk editor tool: http://apps.tempel.org/iBored
You'd open the drive using the /dev/diskN or /dev/rdiskN device files (N is a disk index number starting from 0). Then you can use lseek (make sure to use the 64-bit range version!) and read/write calls on the opened file.
Also, use the shell command "ls /dev/disk*" to see which drives currently exist. Note that the drives also appear with an "sM" suffix, where M is the partition number; that way, you can also read partitions directly.
Or, you could just use the shell tool "xxd" or "dd" to read data and then use their output. Might be easier.
You'll not be able to read your root disk and other internal disks unless you run as root, though. You may be able to access other drives as long as they were mounted by the user, or have their permissions disabled. But you may also need to unmount the drive's volumes first. Look for the unmount command in the shell command "diskutil".
Hope this helps.
Update 2017: On OS X 10.11 and later SIP may also prevent you from directly accessing the disk sectors.
In Linux, you can read from the special device file /dev/sda, assuming the hard drive you want to read is the first one. You need to be root to read this file. To read sector 2, you just seek to offset 2*SECTOR_SIZE and read in SECTOR_SIZE bytes.
I don't know if this device file is available on OS X. Check for interestingly named files under /dev such as /dev/sda or /dev/hda.
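A minimal sketch of reading sector 2 that way (the device path and 512-byte sector size are assumptions; it must be run as root):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define SECTOR_SIZE 512   /* assumption: 512-byte sectors */

    int main(void)
    {
        unsigned char sector[SECTOR_SIZE];

        /* /dev/sda on Linux; on OS X use /dev/diskN or /dev/rdiskN as above. */
        int fd = open("/dev/sda", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Seek to sector 2 and read it. Build with _FILE_OFFSET_BITS=64 so
           off_t covers large disks on 32-bit systems. */
        if (lseek(fd, (off_t)2 * SECTOR_SIZE, SEEK_SET) == (off_t)-1 ||
            read(fd, sector, SECTOR_SIZE) != SECTOR_SIZE) {
            perror("read sector");
            close(fd);
            return 1;
        }

        printf("first byte of sector 2: 0x%02x\n", sector[0]);
        close(fd);
        return 0;
    }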
I was also going to suggest hitting the /dev/ device file for the volume, but you might want to contact Amit Singh who has written an hfsdebug utility and has probably done just what you want to do.
How does this work in terms of permissions? Wouldn't reading from /dev/... be insecure since if you read far enough you would be able to read files for which you do not have read access?
