Virtual file in memory - C

I'm adding encryption functionality to an existing program (a plain C console app).
It takes a few files as input parameters, and my task is to secure these files (sensitive data).
What I do now is encrypt the files first (simple XOR with an external application), then decipher them inside the program so the existing program can process them. After everything succeeds, I encrypt the files again (everything is stored locally on the hard disk).
HOWEVER, there is a hole in the security, since all the "open" files sit on the hard disk. If the program fails somewhere in the middle, those files will be left unencrypted.
My problem is that the existing program takes FILE variables as input and works directly with those files. It's not my program, so I don't have the rights to modify it.
What I would need is to write the files into memory instead of to the hard disk.
I know there are some libraries on Linux that enable this, but I'm developing this on Windows.

The FILE data type is used to obtain a file pointer to the file being opened, so your FILE * variable is already in memory. If you want the whole file in memory, you have to allocate a buffer the same size as your file and read the whole file into it.
On Stack Overflow you can find many examples of fread, fwrite, fseek, and so on; a minimal sketch follows.
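For instance, a minimal sketch (assuming the simple XOR scheme from the question; the single-byte key and the helper name are placeholders) that reads the encrypted file into a heap buffer and deciphers it in memory, so no plaintext ever touches the disk:

#include <stdio.h>
#include <stdlib.h>

/* Read the whole file into a malloc'd buffer and XOR-decipher it in place.
   Returns the buffer (caller frees) and stores the size in *out_size. */
unsigned char *load_and_decipher(const char *path, unsigned char key, long *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    fseek(fp, 0, SEEK_END);          /* find the file size */
    long size = ftell(fp);
    rewind(fp);

    unsigned char *buf = malloc(size);
    if (buf != NULL && fread(buf, 1, size, fp) == (size_t)size) {
        for (long i = 0; i < size; i++)
            buf[i] ^= key;           /* XOR decipher, in memory only */
        *out_size = size;
    } else {
        free(buf);
        buf = NULL;
    }
    fclose(fp);
    return buf;
}

This helps only if you can hand the existing program a buffer, though; since it insists on FILE *, see the RAM-drive suggestion below.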
EDIT:
If you want to manipulate files in memory but keep using the existing stdio.h interfaces (fopen, fread, fwrite, etc.), you need a RAM drive. There are free drivers available to create one on Windows. Remember that you have to move the files into the RAM drive to process them, and move them back out when you have finished, or all your changes will be lost.

Related

Is it possible to temporarily hide a file from any system call in linux?

I am trying to implement a filesystem using FUSE, and I want a file to be hidden temporarily when it is deleted. I tried storing all the file names (or their inodes) in an array and checking it whenever a system call like open, getattr, or readdir is invoked. But that could eat up a lot of performance when the number of files gets really huge. So I wonder, is there a better way to do this? Thanks in advance!
There are two problems with your approach (and with the solution pointed out by Oren Kishon, and marked as selected):
First, a file has no name per se. The name of a file is not part of the file. The mapping of file names to files (actually to inodes) is created by the system for the convenience of the user, but the names are completely independent of the files they point to. This means that it is easy to find which inode a link points to, but it is very difficult to do the reverse mapping (finding the directory entries that point to an inode, knowing only the inode).

Deletion of a file is a two-phase process. In the first phase, you call the unlink(2) system call to erase a link (a directory entry) from the directory it belongs to; then all the blocks belonging to the file are deallocated, but only if the reference count (which is stored in the inode itself) drops to zero. This is an easy process, as everything starts from the directory entry you want deleted. But if you don't erase the entry, searching for it later will be painful, as you can see in the second problem below.
Second, if you do this with, let's say, six (hard) links to the same file, you'll never know when the space can actually be reallocated to another file (because you have run out of unallocated space), because the link reference count in the inode is still six. Even worse, if you add a second reference count to the inode to track the (different) number of truly erased links that have not yet been deallocated, you have to search the whole filesystem to resolve it (because you have no idea where the remaining links are). So you need to maintain a lot of extra information (in addition to the space the file occupies in the filesystem): first to gather all the links that once pointed to the file, and second to check whether this is indeed the file to deallocate, in case more space is needed in the filesystem.
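To see the reference count being discussed, here is a quick sketch using the standard POSIX stat(2) call (the path is a placeholder):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    /* st_nlink is the number of directory entries (hard links)
       pointing at this inode; blocks are deallocated only when it
       drops to zero and no process still holds the file open. */
    if (stat("/tmp/somefile", &st) == 0)
        printf("links: %lu\n", (unsigned long)st.st_nlink);
    return 0;
}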
By the way, your problem has an easy solution in user space, though. Just modify the rm command never to erase a file completely (i.e. never unlink the last link to a file): when handling the last link, move the file instead into a queue in some fixed directory on the same filesystem the file resided on. This keeps the files allocated (you lose the original name, although you can save it in an associated file). A monitor process can then check the amount of free space and truly erase the oldest entries in the queue. Beware that if you have erased large files, this will make your system load grow at random times, whenever it is time to actually erase the files you are deallocating.
There's another alternative: use ZFS as your filesystem. This requires a lot of memory and CPU, but it is a complete solution to the undeletion of files, because ZFS conserves the full history of the filesystem, so you can go back in time to a snapshot in which the file existed and make a copy of it, actually recovering it. ZFS can be used on WORM (Write Once Read Many, like DVD) media, and this allows you to conserve the filesystem state over time (at the expense of never reusing the same space again), but you will never lose a file.
Edit
There's one case in which a file is no longer available to any process other than the ones that have it open. In this scenario, a process opens a file and then deletes it (deletion just breaks the link that translates the file's name to its inode), but it continues using the file until it finally closes it.
As you probably know, a file can be opened by several processes at the same time. Apart from the reference count stored in the disk inode, there is a reference count on the inode in the in-kernel inode table. This is the number of references in the disk inode (the number of directory entries that point to the file's inode) plus one reference for each open file entry.
When a file is unlinked and should be deleted (because no more links reference the inode), deallocation doesn't happen immediately if the file is still being used by processes. The file is alive, although it no longer appears in the filesystem (there are no more references to it in any directory). Only when the last close(2) of the file takes place is the file actually deallocated.
But what happened to the directory entry that last referenced that file? It can be reused (as I told you in one of the comments) as soon as it has been freed, long before the file is deallocated. A new file (necessarily a different inode, as the old one is still in use) can be created and given the original name (because you decided to reuse the name), and there is no problem with this, except that you are now using a different file. The old file is still in use, has no name, and for this reason is invisible to all processes except the one using it.

This technique is frequently used for temporary files: you create a file with open(2) and immediately unlink(2) it. No other process can access that file, and it will be deallocated as soon as the last close(2) on it is called. No file of this kind can survive a reboot of the system (it cannot even survive the process that has it open); a sketch follows.
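A minimal sketch of that open-then-unlink idiom (POSIX; the path is a placeholder): the file remains fully usable through its descriptor even though it no longer has a name.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("/tmp/scratch", O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return 1;

    unlink("/tmp/scratch");           /* name gone: invisible to other processes */

    write(fd, "still usable\n", 13);  /* the descriptor still works */

    struct stat st;
    fstat(fd, &st);                   /* fstat(2) works; stat(2) on the path fails */

    close(fd);                        /* last close: blocks are deallocated now */
    return 0;
}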
As the question states:
Is it possible to temporarily hide a file from any system call in linux?
The file is hidden from all the system calls that require a name for the file (it has no name anymore), but not from other system calls: e.g. fstat(2) continues to work, while stat(2), link(2), rename(2), open(2), etc. can no longer be used on that file.
If I understand correctly, when unlink is called, you want the file to be marked as deleted rather than actually deleted.
You could implement this mark as an extended attribute, one with "system" namespace (see https://man7.org/linux/man-pages/man7/xattr.7.html) which is not listed as part of the file xattr list.
In your "unlink" handler, do setxattr(system.markdelete). In all of your other calls that take a path argument, and in readdir, do a getxattr and treat the file as deleted if the mark is present; a sketch follows.
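A minimal sketch of that idea as it might look inside FUSE handlers (the attribute name system.markdelete comes from the answer above; the handler names, wiring, and error handling are simplifying assumptions, and since your FUSE filesystem implements the xattr calls itself, the "system" namespace is under your control):

#include <errno.h>
#include <sys/xattr.h>

#define MARK "system.markdelete"

/* "Delete" = set the mark on the backing file instead of unlinking it. */
static int my_unlink(const char *path)
{
    return setxattr(path, MARK, "1", 1, 0) == 0 ? 0 : -errno;
}

/* Any path-taking handler first checks the mark. */
static int hidden(const char *path)
{
    char v;
    return getxattr(path, MARK, &v, 1) >= 0;   /* succeeds only if marked */
}

static int my_open_check(const char *path)
{
    if (hidden(path))
        return -ENOENT;   /* pretend the file does not exist */
    return 0;             /* fall through to the real open */
}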

How to get memory address from file path without opening file

I have a file. I know its file path, and I want to know its memory address. My problem is that I can't actually open the file (fopen can't be used because the OS thinks the file is in use) to get its memory address.
Ex.
fopen("C:/example/file", "r") returns NULL
From what I understand, the OS returns the memory address only after it confirms the file isn't in use. So is it even possible to bypass the OS?
@Alter: by finding the process ID of the process that has locks on the file, you could get somewhere... You might be able to track your file's contents in memory as part of the memory space allocated to that process.
However, just because a file is locked does not at all mean that the file is in memory. Sometimes just a part of a file is used, like the functions within a DLL, where only the 'used' and necessary chunks of the file would be in memory. Other times, the entire document (file) will be present very nicely and contiguously in memory (consider a text file open in Notepad). It is also possible that the file is locked purely as a placeholder, where the lock is all that matters and none of the file is actually loaded. You really need to know a lot about the process that has locks on the file.
Now, if you simply want to copy the file to another file, then launch the copy before the 'Process' locks the file. You could try a batch file that runs at Windows startup and see if that is early enough to copy the file before a lock is placed on it.
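As an aside, whether fopen fails here usually comes down to sharing modes. A sketch using the Win32 CreateFileA call (the path comes from the question) that requests the most permissive sharing, which sometimes succeeds where fopen's default sharing does not; it still fails if the owning process opened the file with no sharing at all:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Ask for read access while allowing every kind of concurrent access. */
    HANDLE h = CreateFileA("C:/example/file", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("open failed, error %lu\n", GetLastError());
        return 1;
    }
    /* ... read from h with ReadFile ... */
    CloseHandle(h);
    return 0;
}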

Secure File Delete in C

I need to securely delete a file in C; here is what I do:
use fopen to get a handle to the file
calculate the size using fseek/ftell
get a random seed depending on the current time or the file size
write (size) bytes to the file in a loop, 256 bytes per iteration
fflush/fclose the file handle
reopen the file and redo steps 3-6 some 10-15 times
rename the file, then delete it
Is that how it's done? Because I read the name "Gutmann 25 passes" in Eraser, so I guess 25 is the number of times the file is overwritten and 'Gutmann' is the randomization algorithm?
You can't do this securely without the cooperation of the operating system - and often not even then.
When you open a file and write to it, there is no guarantee that the OS is going to put the new file on the same bit of spinning rust as the old one. Even if it does, you don't know if the new write will use the same chain of clusters as it did before.
Even then, you aren't sure the drive hasn't mapped out a disk block because of some fault, leaving your plans for world domination on a block that is marked bad but is still readable.
P.S. The 25x overwrite is no longer necessary; it was needed on old low-density MFM drives with poor head tracking. On modern GMR drives, overwriting once is plenty.
Yes. In fact, it overwrites the file with a number of different patterns:
It does so by writing a series of 35 patterns over the
region to be erased.
The selection of patterns assumes that the user doesn't know the
encoding mechanism used by the drive, and so includes patterns
designed specifically for three different types of drives. A user who
knows which type of encoding the drive uses can choose only those
patterns intended for their drive. A drive with a different encoding
mechanism would need different patterns.
More information is in the Wikipedia article on the Gutmann method (https://en.wikipedia.org/wiki/Gutmann_method).
@Martin Beckett is correct; there is no such thing as "secure deletion" unless you know everything about what the hardware is doing all the way down to the drive. (And even then, I would not make any bets on what a sufficiently well-funded attacker could recover given access to the physical media.)
But assuming the OS and disk will re-use the same blocks, your scheme does not work for a more basic reason: fflush does not generally write anything to the disk.
On most multi-tasking operating systems (including Windows, Linux, and OS X), fflush merely forces data from the user-space buffer into the kernel. The kernel will then do its own buffering, only writing to disk when it feels like it.
On Linux, for example, you need to call fsync(fileno(handle)). (Or just use file descriptors in the first place.) OS X is similar. Windows has FlushFileBuffers.
Bottom line: the loop you describe is very likely merely to overwrite a kernel buffer 10-15 times instead of the on-disk file. There is no portable way in C or C++ to force data to disk; for that, you need a platform-dependent interface, as in the sketch below.
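A minimal sketch of one overwrite pass that actually reaches the kernel's notion of the disk (POSIX fsync; the function name is a placeholder, and the drive's own write cache remains a separate question):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* One overwrite pass: write 'size' bytes of 'pattern', then push the data
   past the stdio buffer (fflush) and past the kernel cache (fsync). */
int overwrite_pass(const char *path, long size, unsigned char pattern)
{
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    unsigned char block[256];
    memset(block, pattern, sizeof block);

    for (long done = 0; done < size; done += sizeof block) {
        size_t n = (size - done) < (long)sizeof block ? (size_t)(size - done)
                                                      : sizeof block;
        fwrite(block, 1, n, fp);
    }

    fflush(fp);               /* user-space buffer -> kernel */
    fsync(fileno(fp));        /* kernel buffer -> disk (Linux/OS X) */
    fclose(fp);
    return 0;
}

On Windows, replace the fsync call with FlushFileBuffers on the underlying HANDLE.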
The MFT (Master File Table) is similar to the FAT (File Allocation Table).
The MFT keeps records containing a file's offsets on disk, its name, date/time stamps, ID, size, and even the file's data if it fits inside the record's empty space (about 512 bytes; one record is 1 KB).
Note: on a new HDD, all data is set to 0x00 (just so you know).
Say you want to overwrite file1.txt. The OS finds the file's offset inside its MFT record.
You then overwrite file1.txt with binary zeros (0x00) in binary mode.
You will overwrite the file's data on disk 100%; this is exactly why the MFT keeps the file's offset on disk.
Afterwards, rename the file and delete it.
NOTE: the MFT will mark the file as deleted, but you can still recover some metadata about it, i.e. date/time (created, modified, accessed), file offset, attributes, and flags.
1 - create a folder in C:\ and move the file into it, renaming it at the same time (use the rename function); rename the file to 0000000000 or anything else without an extension
2 - overwrite the file with 0x00 and check that it was overwritten (see the sketch after this list)
3 - change the date/time
4 - remove all attributes
5 - leave the file size untouched, so the OS reuses the empty space faster
6 - delete the file
7 - repeat for all files (steps 1-6)
8 - delete the folder
or
(1, 2, 6, 7, 8)
9 - find the files' records in the MFT and remove them
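A minimal sketch of the overwrite step (step 2) in standard C, following this answer's in-place zeroing scheme (the function name and the renamed file name are placeholders; rename here assumes the current directory, and forcing the data to physical media still needs the platform calls discussed above):

#include <stdio.h>
#include <string.h>

int zero_file(const char *path)
{
    FILE *fp = fopen(path, "r+b");   /* in place: same MFT record, same offsets */
    if (fp == NULL)
        return -1;

    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    rewind(fp);

    unsigned char zeros[512];
    memset(zeros, 0x00, sizeof zeros);
    for (long done = 0; done < size; done += sizeof zeros) {
        size_t n = (size - done) < (long)sizeof zeros ? (size_t)(size - done)
                                                      : sizeof zeros;
        fwrite(zeros, 1, n, fp);
    }
    fflush(fp);
    fclose(fp);

    rename(path, "0000000000");      /* strip the name (steps 1 and 5) */
    return remove("0000000000");     /* delete (step 6) */
}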
The Gutmann method worked fine for older disk-technology encoding schemes, and the 35-pass wiping scheme of the Gutmann method is no longer required, which even Gutmann acknowledges. See the Gutmann method at https://en.wikipedia.org/wiki/Gutmann_method, in the Criticism section, where Gutmann discusses the differences.
It is usually sufficient to make at most a few random passes to securely delete a file (with possibly an extra zeroing pass).
The secure-delete package from thc.org contains the sfill command to securely wipe disk and inode space on a hard drive.

Reading and piping large files with C

I am interested in writing a utility that modifies PostScript files. It needs to traverse the file, make certain decisions about the page count and dimensions, and then write the output to a file or stdout, making certain modifications to the PostScript code.
What would be a good way to handle file processing in this case on a *NIX system? I'm fairly new to pipes and forking in C, and it is my understanding that, when reading a file directly, I can probably seek back and forth around the input file; but if input is piped directly into the program, I can't simply rewind to the beginning, as the input could be a network stream, for example. Correct?
Rather than store the entire PS file in memory, which could grow huge, it seems to make more sense to buffer the input to disk during my first pass of page analysis, then re-read from the temporary file, produce the output, and remove the temporary file. If that's a viable solution, where would be a good place to store such a file on a *NIX system? I'm not sure how safe such code would be, either: the program could potentially be used by multiple users on the same server. It sounds like I would have to make sure to save the file in a temporary directory unique to the given user account, and to give the temporary file on disk a fairly unique name.
Would appreciate any tips and pointers on this crazy puzzling world of file processing.
Use mkstemp(3) to create your temporary file. It will handle the concurrency issues for you. mmap(2) will let you move around in the file with abandon; a sketch follows.
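A minimal sketch of that combination (POSIX mkstemp + mmap; the template name is a placeholder, and the stdin-copying loop is elided):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    char tmpl[] = "/tmp/psfilter-XXXXXX";
    int fd = mkstemp(tmpl);           /* unique, race-free temp file */
    if (fd < 0)
        return 1;
    unlink(tmpl);                     /* auto-cleanup when fd is closed */

    /* ... first pass: copy stdin into fd here ... */
    off_t size = lseek(fd, 0, SEEK_END);

    /* second pass: the whole stream is now seekable, addressable memory */
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data != MAP_FAILED) {
        /* ... analyze data[0..size-1], write modified output ... */
        munmap(data, size);
    }
    close(fd);
    return 0;
}

The unlink right after mkstemp means the file never needs explicit removal, even if the program crashes mid-run.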
if input is directly piped into the program, I can't simply rewind to the beginning of an input as the input could be a network stream for example, correct?
That's correct. You can only perform random access on a file.
If you read the file once, you could build a table of metadata, which you can use later to seek to specific portions of the file without keeping the whole file in memory; a sketch follows.
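For instance, a sketch of that idea for the PostScript case from the question (the %%Page: DSC comment is standard PostScript convention, but the fixed-size table and function name are simplifying assumptions):

#include <stdio.h>
#include <string.h>

/* First pass: record the byte offset of every %%Page: comment,
   so later passes can fseek straight to any page. */
long scan_pages(FILE *fp, long offsets[], long max)
{
    char line[1024];
    long count = 0;

    for (;;) {
        long pos = ftell(fp);
        if (fgets(line, sizeof line, fp) == NULL)
            break;
        if (strncmp(line, "%%Page:", 7) == 0 && count < max)
            offsets[count++] = pos;
    }
    return count;   /* page count; offsets[i] seeks to page i */
}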
/tmp is the temporary directory on Unix systems. It's specified by the FHS, and it's cleaned out when the system is rebooted.
If you need more persistent data storage than that, there's /var/tmp, which is not cleaned out after reboots. Also FHS.
http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

How exactly is a file opened for reading/writing by applications like msword/pdf?

What I want are the steps that an application takes in order to open a file and allow the user to read it. A file is nothing more than a sequence of bits on the disk. What steps does it take to show the contents of the file?
I want to do this programmatically in C. I don't want to begin with complex formats like Word/PDF, but something simpler. So, which format is best?
If you want to investigate this, start with plain ASCII text. It's just one byte per character, very straightforward, and you can open it in Notepad or any one of its much more capable replacements.
As for what actually happens when a program reads a file... basically it involves making a system call to open the file, which gives you a file handle (just a number that the operating system maps to a record in the filesystem). You then make a system call to read some data from the file, and the OS fetches it from the disk and copies it into a region of RAM that you specify (that would be a character/byte array in your program). Repeat reading as necessary. And when you're done, you issue yet another system call to close the file, which simply tells the OS that you're done with it. So the sequence, in C, is
FILE *f = fopen("file.txt", "rb");
char foo[BLOCK_SIZE];
size_t n;
while ((n = fread(foo, 1, BLOCK_SIZE, f)) > 0) {
    /* do something with the n bytes in foo */
}
fclose(f);
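Since the explanation above is really about system calls rather than stdio, here is the same loop using the raw POSIX calls that stdio wraps; a sketch, assuming a Unix-like system:

#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

int main(void)
{
    int fd = open("file.txt", O_RDONLY);   /* the actual "open" system call */
    if (fd < 0)
        return 1;

    char buf[BLOCK_SIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* do something with the n bytes in buf */
    }
    close(fd);                             /* tell the OS we're done */
    return 0;
}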
If you're interested in what the OS actually does behind the scenes to get data from the disk to RAM, well... that's a whole other can of worms ;-)
Start with plain text files
