I have a file. I know its file path, and I want to know its memory address. My problem is that I can't actually open the file (fopen can't be used because the OS thinks the file is in use) to get its memory address.
Ex.
fopen("C:/example/file", "r") returns null
From what I understand, the OS returns the memory address after it confirms the file isn't in use. So is it even possible to bypass the OS?
Alternatively, by finding the Process ID of the process that holds locks on the file, you could get somewhere... You might be able to track your file's contents in memory as part of the memory space allocated to that process.
However, just because a file is locked does not mean the file is in memory. Sometimes only part of a file is used, like the functions within a DLL, where only the 'used' and necessary chunks of the file would be in memory. Other times, the entire file will be present contiguously in memory (consider a text file open in Notepad). It is also possible that the file is locked purely as a placeholder, where the lock is all that matters and none of the file is actually loaded. You really need to know a lot about the process that holds locks on the file.
Now if you simply want to copy the file to another file, then launch the copy before the process locks the file. You could try a batch file that runs at Windows startup and see if that is early enough to copy the file before a lock is placed on it.
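For illustration, a minimal Win32 C sketch of that early copy step (both paths are hypothetical; this does in C what the batch file would do, and you would schedule it to run at startup):

#include <stdio.h>
#include <windows.h>

int main(void)
{
    /* Hypothetical paths: copy the target file before another
       process has a chance to lock it. */
    if (CopyFileA("C:\\example\\file", "C:\\backup\\file.copy",
                  FALSE /* overwrite any existing copy */)) {
        printf("Copied before the lock was taken.\n");
        return 0;
    }
    /* ERROR_SHARING_VIOLATION here means the file was already locked. */
    fprintf(stderr, "CopyFileA failed, error %lu\n", GetLastError());
    return 1;
}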
I am trying to implement a filesystem using FUSE, and I want a file to be hidden temporarily when it is deleted. I tried storing all the files' names (or their inodes) in an array and checking it whenever a system call like 'open', 'getattr' or 'readdir' is invoked. But that could eat up tons of performance when the number of files gets really huge. So I wonder, is there a better way to do this? Thanks in advance!
There are two problems with your approach (and with the solution pointed out by Oren Kishon, and marked as selected):
First, a file has no name per se. The name of a file is not part of the file. The mapping of file names to files (actually to inodes) is created by the system for the convenience of the user, but the names are completely independent of the files they point to. This means it is easy to know which inode a link points to, but very difficult to do the reverse mapping (getting the directory entry that points to an inode when you know only the inode). The deletion of a file is a two-phase process. In the first phase, you call the unlink(2) system call to erase a link (a directory entry) from the directory it belongs to; then all the blocks belonging to that file are deallocated, but only if the reference count (which is stored in the inode itself) drops to zero. This is an easy process, as everything starts from the directory entry you want deleted. But if you don't erase it, searching for it later will be painful, as you can see in the second problem stated below.
Second, if you do this with, let's say, six links (hard links) to the same file, you'll never know when the space actually needs to be reallocated to another file (because you have run out of unallocated space), because the link reference count in the inode is still six. Even worse, if you add a second reference count to the inode to track the (different) number of truly erased files that have not yet been deallocated, you have to search the whole filesystem (because you have no idea where the links might be). So you need to maintain a lot of information (in addition to the space the file occupies in the filesystem): first, to gather all the links that once pointed to this file, and second, to check whether this is indeed the file that has to be deallocated when more space is needed in the filesystem.
By the way, your problem has an easy solution in user space, though. Just modify the rm command never to erase a file completely (i.e., never unlink the last link to a file), but instead move the file into a queue in some fixed directory in the same filesystem the file resided in, to hold the last link to it. This keeps the files allocated (though you lose any reference to the original name, unless you save it in an associated file). A monitor process can check the amount of free space and select from the queue the oldest erased file, and truly erase it. Beware that if you have erased large files, this will make your system load spike at random times, when it is time to actually erase the files you are deallocating.
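A minimal sketch of that modified-rm idea, assuming a fixed trash directory on the same filesystem (the directory name and timestamp naming scheme are made up, and a real tool would sanitize names containing slashes):

#include <stdio.h>
#include <time.h>

/* "Safe rm": instead of unlink(2)ing the last link, move it into a
   queue directory for a monitor process to reap later. rename(2)
   never crosses filesystems, which matches the requirement that the
   queue live in the same filesystem as the file. */
int safe_remove(const char *name)
{
    char dest[4096];

    /* Timestamp prefix gives the monitor oldest-first order. */
    snprintf(dest, sizeof dest, "/.trashq/%ld-%s", (long)time(NULL), name);
    if (rename(name, dest) != 0) {
        perror("rename");
        return -1;
    }
    return 0;
}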
There's another alternative. Use ZFS as your filesystem. This requires a lot of memory and CPU, but is a complete solution to the undeletion of files, because ZFS conserves the full history of the filesystem: you can go back in time up to a snapshot in which the file existed, and then make a copy of it, actually recovering it. ZFS can be used on WORM (Write Once, Read Many, like a DVD) media, and this allows you to conserve the filesystem state over time (at the expense of never reusing the same data again). But you will never lose a file.
Edit
There's one case in which the file is no longer available to any process other than the ones that already have it open. In this scenario, one process opens a file, then deletes it (deletion just breaks the link that translates the name of the file to the inode in the system) but continues using the file, until it finally closes it.
As you probably know, a file can be opened by several processes at the same time. Apart from the reference count stored in the disk inode, there's a reference count for the inode in the inode table in kernel memory. This is the number of references to the file in the disk inode (the number of directory entries that point to the file's inode) plus one reference for each open file entry that refers to the file.
When a file is unlinked (and should be deleted, because no more links reference its inode), the deallocation doesn't take place immediately, as the file is still being used by processes. The file is alive, although it no longer appears in the filesystem (there are no more references to it in any directory). Only when the last close(2) of the file takes place is the file deallocated by the system.
But what happened to the directory entry that referenced that file? It can be reused (as I told you in one of the comments) as soon as it has been freed, long before the file is deallocated. A new file (necessarily with a different inode, as the old one is still in use) can be created and given the same name as the original (because you decided to name it the same), and there's no problem with this, except that you are now using a different file. The old file is still in use, has no name, and for this reason is invisible to any process except the one using it. This technique is frequently used for temporary files: you create a file with open(2) and immediately unlink(2) it. No other process can access that file, and it will be deallocated as soon as the last close(2) on it is called. A file like this cannot survive a reboot of the system (it cannot even survive the process that had it open).
As the question states:
Is it possible to temporarily hide a file from any system call in linux?
The file is hidden from all the system calls that require a name for the file (it has no name anymore) but not from the others: fstat(2) continues to work, while stat(2) is impossible to use on that file, and the same goes for link(2), rename(2), open(2), etc.
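A minimal sketch of that open-then-unlink pattern (the path is hypothetical): the descriptor keeps working and fstat(2) succeeds, while name-based calls like stat(2) fail with ENOENT:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat sb;
    int fd = open("/tmp/scratch.tmp", O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) { perror("open"); return 1; }

    unlink("/tmp/scratch.tmp");      /* drop the only name */

    write(fd, "hidden", 6);          /* the descriptor still works */
    fstat(fd, &sb);                  /* so does fstat(2)... */
    printf("size without a name: %lld\n", (long long)sb.st_size);

    if (stat("/tmp/scratch.tmp", &sb) != 0)
        perror("stat");              /* ...but stat(2) fails: ENOENT */

    close(fd);                       /* storage is reclaimed here */
    return 0;
}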
If I understand correctly, when unlink is called, you want the file to be marked as deleted rather than actually deleted.
You could implement this mark as an extended attribute, one in the "system" namespace (see https://man7.org/linux/man-pages/man7/xattr.7.html), which is not listed as part of the file's xattr list.
In your "unlink" handler, do setxattr(system.markdelete). In all of your other calls that take a path argument, and in readdir, do getxattr and treat the file as deleted if the mark is present.
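A rough sketch of those two pieces for a passthrough-style FUSE filesystem; the attribute name and the full_path() helper are assumptions, and note that some kernels restrict who may set system.* attributes:

#include <errno.h>
#include <sys/xattr.h>

#define HIDE_XATTR "system.markdelete"   /* hypothetical mark name */

/* Assumed helper mapping a FUSE path to the backing file's path. */
extern void full_path(char out[4096], const char *path);

/* unlink handler: mark the file instead of removing it. */
static int myfs_unlink(const char *path)
{
    char real[4096];
    full_path(real, path);
    if (setxattr(real, HIDE_XATTR, "1", 1, 0) != 0)
        return -errno;
    return 0;
}

/* Shared check for every path-taking handler and for readdir. */
static int is_marked_deleted(const char *real)
{
    char v;
    return getxattr(real, HIDE_XATTR, &v, 1) == 1;  /* mark present? */
}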
I am unable to understand how files are managed with memory-mapped I/O. Normally, if we open a file using open or fopen, it returns an fd or a file pointer respectively. After this open, where does the file reside for processing? Is it in memory (a copy of the file that is on the hard disk) or not? If it is not in memory, where is the data fetched from by subsequent read or write system calls? Does it fetch data from the hard disk each time read or write is called? Or is a copy of the file stored in memory, accessed by the process for further manipulation, and copied back to the hard disk once the process has finished? Which of these scenarios is the one that actually happens?
The following is the definition given for memory-mapped I/O in the book Advanced Programming in the UNIX Environment (2nd Edition):
Memory-mapped I/O lets us map a file on disk into a buffer in memory so that, when we fetch bytes from the buffer, the corresponding bytes of the file are read. Similarly, when we store data in the buffer, the corresponding bytes are automatically written to the file. This lets us perform I/O without using read or write.
What is mapping a file into memory? And here, they say this memory is placed between the stack and the heap. What type of data is present in this memory after mapping a file: a copy of the file, or the address of the file as it resides on the hard disk? And how does the scenario described above become true?
Can anyone explain the working mechanism of memory-mapped I/O and the mmap functionality?
Normally when you open a file, the system sets up some bookkeeping structures (metadata) but does not need to read any part of the actual data of the file. When you call read(), the system loads a chunk of the file into (virtual) memory which you allocated for the purpose.
When you memory-map a file, the system again sets up bookkeeping, and also sets up a (virtual) memory "mapping" which means a range of valid addresses which, if used, will reflect reads (or writes) of the underlying file. It does not mean the entire file needs to be read at once, because it can be "paged in" on demand, i.e. the system can give you an address range to use, then wait for you to actually use it before loading any data there. This "page faulting" is supported by a hardware device called the Memory Management Unit, or MMU. The same system is used when you run an executable file--the system can simply map it into virtual memory and read pages (chunks) from disk only as needed.
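For concreteness, a minimal sketch of mapping a file and reading it through memory (the path is hypothetical, and error handling is abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat sb;
    int fd = open("/tmp/example.txt", O_RDONLY);
    if (fd < 0 || fstat(fd, &sb) != 0) { perror("open/fstat"); return 1; }

    /* Map the whole file. Nothing is read yet; pages fault in on use. */
    char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                        /* the mapping stays valid */

    /* Touching the bytes triggers page faults that read from disk. */
    if (sb.st_size > 0)
        printf("first byte: %c\n", p[0]);

    munmap(p, sb.st_size);
    return 0;
}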
Is it in memory (a copy of the file that is on the hard disk) or not?
According to Computer Programming and Utilization, when you open a file with fopen, its contents are loaded into memory (partially or wholly).
If it is not in memory, where is the data fetched from by subsequent read or write system calls?
When you fwrite some data, it is eventually copied into the kernel which will then write it to disk (or wherever) after buffering. In general, no part of a file needs to be loaded in order to write.
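A small sketch of that write path (path hypothetical): the data goes through stdio's buffer and then the kernel's cache, without the file ever being read:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/tmp/out.txt", "wb");
    if (!fp) { perror("fopen"); return 1; }

    fwrite("hello\n", 1, 6, fp);   /* lands in the stdio buffer */
    fflush(fp);                    /* pushed to the kernel's cache */
    fclose(fp);                    /* the kernel flushes to disk later */
    return 0;
}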
What is mapping a file into memory?
Mapping a file into memory means associating a range of the process's virtual address space with the file's contents, so that ordinary loads and stores on those addresses access the file (see the definition quoted below).
In this memory, what type of data is present after mapping a file? Does it contain a copy of the file, or the address of the file as it resides on the hard disk?
A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource.
It is possible to mmap a file to a region of memory. When this is done, the file can be accessed just like an array in the program. This is more efficient than read or write, as only the regions of the file that a program actually accesses are loaded. Accesses to not-yet-loaded parts of the mmapped region are handled in the same way as swapped out pages.
After this open, where does the file reside for processing? Is it in memory (a copy of the file that is on the hard disk) or not?
On the disk. It may also be partly or completely in memory if the operating system does a read-ahead, but that isn't detectable by you. You still have to issue reads to get data from the file.
If it is not in memory, where is the data fetched from by subsequent read or write system calls?
From the disk.
Or does it fetch data from the hard disk each time read or write is called?
In effect, but you also have to consider the effect of any caching.
Otherwise, is a copy of the file stored in memory, accessed by the process for further manipulation, and copied back to the hard disk once the process has finished?
No. The file behaves as though it is all on the disk.
And here, they say this memory is placed between the stack and the heap.
Not in what you quoted.
In this memory, what type of data is present after mapping a file?
The data in the file. The question 'what type of data' doesn't make sense. Data is data.
Does it contain a copy of the file, or the address of the file as it resides on the hard disk?
It effectively contains a copy of the file.
And how does the scenario described above become true?
Via virtual memory. Too broad to cover here.
I will be dealing with really huge files, of which I want to load only part of the content into memory. So I was wondering whether the call:
FILE *file = fopen("my/link/file.txt", "r");
loads the whole file content into memory, or is it just a pointer to the content? After I open the file, I use fgets() to read it line by line.
And what about fwrite()? Do I need to open and close the file every time I write something, so it doesn't get overloaded, or is that managed in the background?
Another thing: is there maybe a nice bash command, like "time", which could tell me the peak memory usage of my executed program? I am using OS X.
As per the man page for fopen(),
The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it.
So, no, it does not load the content of the file into memory or elsewhere.
To operate on the returned file pointer, as you already know, you need to use fgets() and family.
Also, once you open the file and get a pointer, as long as you don't fclose() it, you can use the pointer any number of times to write into the file (remember to open the file in append mode). You don't need to open and close it for every read and write.
Also, FWIW, if you want to move the file pointer back and forth, fseek() can come in handy.
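A minimal sketch of that line-by-line pattern, reusing the question's path; only one line's worth of data sits in your buffer at a time:

#include <stdio.h>

int main(void)
{
    char line[1024];
    FILE *file = fopen("my/link/file.txt", "r");
    if (!file) { perror("fopen"); return 1; }

    while (fgets(line, sizeof line, file) != NULL) {
        /* process one line here */
    }

    rewind(file);   /* fseek()/rewind() reposition without reopening */
    fclose(file);
    return 0;
}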
fopen does not load the whole file into memory. It creates a file descriptor for the file, like a pointer into the open file table.
In the open file table there is a pointer to the location of the file on the disk.
If you want to go to a specific place in the file, use fseek.
Another option is to use mmap. This creates a new mapping in the virtual address space of the calling process, and you can then access the file like an array. (Not all of the file is loaded into memory; the paging mechanism loads the data on demand.)
fopen does not read the file; fread, fgets and similar functions do.
Personally I've never tried reading and writing a file at the same time.
It should work, though.
You can use multiple file pointers to the same file.
There is no command like time for memory consumption. The simplest way is to look at top. There exist malloc/new replacement libraries which can do that for you.
loads the whole file content into memory, or is it just a pointer to the content?
No. fopen() opens the file with the specified filename and associates it with a stream that can be identified by the FILE pointer.
fread() can be used to read the file contents into a buffer.
Multiple read/write operations can be carried out without any need to open the file multiple times.
Functions like rewind() and fseek() can be used to change the position of the cursor in the file.
I'm adding encryption functionality to an existing program (a plain C console app).
It takes a few files as input parameters, and my task is to secure these files (sensitive data).
What I do now is encrypt the files first (a simple XOR with an external application), then decrypt them back inside the program, and the existing program processes the files. Then, after everything succeeds, I encrypt those files back (everything is stored locally on the hard disk).
HOWEVER, there is a hole in security, since all the "open" (decrypted) files are stored on the hard disk. In case the program fails somewhere in the middle, those files will not be encrypted back.
My problem is that the existing program takes FILE variables as input and works directly with those files. It's not my program, so I don't have the rights to modify it.
What I would need is to write the files into memory instead of to the hard disk.
I know that there are some libraries on Linux that enable this, but I'm developing on Windows.
The FILE data type is used to obtain a file pointer to the file being opened, so your FILE * variable is already in memory. If you want the whole file in memory, then you have to allocate a buffer the same size as your file and read the whole file into it.
On Stack Overflow you can find many examples of fread, fwrite, fseek and so on.
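For example, a minimal sketch of pulling a whole file into a heap buffer (the file name is hypothetical; the size is discovered with fseek/ftell):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("secret.dat", "rb");
    if (!fp) { perror("fopen"); return 1; }

    fseek(fp, 0, SEEK_END);    /* find the file size... */
    long size = ftell(fp);
    rewind(fp);                /* ...then go back to the start */

    char *buf = malloc(size);
    if (buf && fread(buf, 1, size, fp) == (size_t)size) {
        /* decrypt and process buf entirely in memory here */
    }

    free(buf);
    fclose(fp);
    return 0;
}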
EDIT:
If you want to manipulate files in memory but keep using the existing stdio.h interfaces (fopen, fread, fwrite, etc.), you need a ramdrive; free drivers to create one on Windows are available. Remember that you have to move files into the ramdrive to process them, and move them back out when you have finished, or all your changes will be lost.
I'm studying for my operating systems midterm and was wondering if I can get some help.
Can someone explain the checks and what the kernel does during the open() system call?
Thanks!
Very roughly, you can think of the following steps:
Translate the file name into an inode, which is the actual file system object describing the contents of the file, by traversing the filesystem data structures.
During this traversal, the kernel will check that you have sufficient access through the directory path to the file, and check access on the file itself. The precise checks depend on what modes were passed to open.
Create what's sometimes called an open file description within the kernel. There is one of these objects for each file the kernel has opened on behalf of any process.
Allocate an unused index in the per-process file descriptor table, and point it at the open file description.
Return this index from the system call as the file descriptor.
This description should be essentially correct for opening plain files and/or directories, but things are different for various sorts of special files, in particular for devices.
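The last two steps are easy to observe from user space. A small sketch (the file opened is hypothetical): with stdin/stdout/stderr occupying indices 0-2, the first open typically returns 3, and a closed index is the first to be reused:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int a = open("/etc/hostname", O_RDONLY);
    int b = open("/etc/hostname", O_RDONLY);  /* same file, new entry */
    printf("a=%d b=%d\n", a, b);              /* typically a=3 b=4 */

    close(a);
    int c = open("/etc/hostname", O_RDONLY);
    printf("c=%d\n", c);    /* lowest unused index: typically 3 again */

    close(b);
    close(c);
    return 0;
}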
I would go back to what the prof told you - there are a lot of things that happen during open(), depending on what you're opening (i.e. a device, a file, a directory), and unless you write what the professor is looking for, you'll lose points.
That being said, it mostly involves checks to see whether the open is valid (i.e. does the file exist, does the user have permission to read/write it, etc.); then an entry in the kernel handle table is allocated to keep track of the fd and its current file position (and of course, some other things).