Duplicate file descriptor with its own file offset - c

How can one create a new file descriptor from an existing file descriptor such that the new descriptor does not share the same internal file structure/entry in the file table? Specifically attributes such as file offset (and preferably permissions, sharing and modes) should not be shared between the new and old file descriptors.
Under both Windows and Linux, dup() will duplicate the file descriptor, but both descriptors still point to the same file structure in the process' file table. Any seeking on either descriptor will adjust the position for the other descriptors as well.
Note
I've since received answers for both Windows and Linux and adjusted the question a little too often, which has made it difficult for people to answer. I'll adjust my votes and accept the cleanest answer which covers both Windows and Linux. Apologies to all, I'm still new to the SO paradigm. Thanks for the great answers!

So basically, what you really want is to be given a file descriptor, and basically open the same file over again, to get a separate position, sharing, mode, etc. And you want to do this on Windows (where the "file descriptor" is basically a foreign object, not something used directly by the OS or the run-time library at all.
Amazingly enough, there is a way to do that, at least with MS VC++. All but two steps of it use only the Win32 API so porting to other compilers/libraries should be fairly reasonable (I think most supply versions of those two functions). Those are for converting a Unix-style file descriptor to a native Win32 file handle, and converting a native Win32 file handle back to a Unix-style file descriptor.
Convert file-descriptor to native file handle with _get_osfhandle()
Get a name for the file with GetFileInformationByHandleEx(FILE_NAME_INFO)1
Use CreateFile to open a new handle to that file
Create a file descriptor for that handle with _open_osfhandle()
Et voilà, we have a new file descriptor referring to the same file, but with its own permissions, position, etc.
Toward the end of your question, you make it sound like you also want the "permissions", but that doesn't seem to make any real sense -- the permissions attach to the file itself, not to how the file is opened, so opening or reopening the file has no effect on the file's permissions. If you really want to know the, you can get it with GetFileInformationByHandle, but be aware that file permissions in Windows are quite a bit different from the (traditional) file permissions in Unix. Unix has owner/group/world permissions on all files, and most systems also have ACLs (though there's more variation in how they work). Windows either has no permissions at all (e.g., files on FAT or FAT32) or else uses ACLs (e.g., files on NTFS), but nothing that's really equivalent to the traditional owner/group/world permissions most people are accustomed to on Unix.
Perhaps you're using "permissions" to refer to whether the file was open for reading, writing, or both. Getting that is considerably uglier than any of the preceding. The problem is that most of it is in the library, not Win32, so there's probably no way to do it that will be even close to portable between compilers. With MS VC++ 9.0 SP1 (not guaranteed for any other compiler) you can do this:
#include <stdio.h>
int get_perms(int fd) {
int i;
FILE * base = __iob_func();
for (i=0; i<_IOB_ENTRIES; i++)
if (base[i]._file == fd)
return base[i]._flag; // we've found our file
return 0; // file wasn't found.
}
Since this involved some spelunking, I wrote a quick test to verify that it might actually work:
#ifdef TEST
#include <io.h>
void show_perms(int perms, char const *caption) {
printf("File opened for %s\n", caption);
printf("Read permission = %d\n", (perms & _IOREAD)!=0);
printf("Write permission = %d\n", (perms & _IOWRT)!=0);
}
int main(int argc, char **argv) {
FILE *file1, *file2;
int perms1, perms2;
file1=fopen(argv[1], "w");
perms1 = get_perms(_fileno(file1));
fclose(file1);
file2=fopen(argv[1], "r");
perms2 = get_perms(_fileno(file2));
fclose(file2);
show_perms(perms1, "writing");
show_perms(perms2, "reading");
return 0;
}
#endif
And the results seem to indicate success:
File opened for writing
Read permission = 0
Write permission = 1
File opened for reading
Read permission = 1
Write permission = 0
You can then test that returned flag against _IOREAD, _IOWRT, and _IORW, which are defined in stdio.h. Despite my previous warnings, I should probably point out that I suspect (though I certainly can't guarantee) that this part of the library is fairly stable, so the real chances of major changes are probably fairly minimal.
In the other direction, however, there's basically no chance at all that it'll work with any other library. It could (but certainly isn't guaranteed to) work with the other compilers that use the MS library, such as Intel, MinGW or Comeau using MS VC++ as its back-end. Of those, I'd say the most likely to work would be Comeau, and the least likely MinGW (but that's only a guess; there's a good chance it won't work with any of them).
Requires the redistributable Win32 FileID API Library

So, I recommend reading up on this a little more. The dup() and related functions serve to create a duplicate value in the file descriptor table pointing to the same entry in the open file table. This is intended to have the same offset. If you call open(), you will create a new entry the open file table.
It doesn't make any sense to create a duplicate of a file descriptor and that new file descriptor have a different offset in the open file table (this seems to contradict what the word "duplicate" means).
I'm not sure what your question is actually. I mean, it isn't the same thing as a duplicate. You could read:
/proc/self/fd/[descriptor]
and get the string that was used to open that file descriptor; bear in mind this may provide some pitfalls, some of which you actually noted in your observation of calling open() again.
Maybe you can explain a little more and I can try to update to help.

Why don't you just open the file a second time with open() or CreateFile() on windows? This gives you all freedom of different access rights and separate offset.
This of course has the drawback that you you can not open the file exclusively, but it solves your problem very simply.

Related

How to replace the use of tmpnam() in the following code snippet

I fully understand that tmpnam has been deprecated and would like to remove it from a function in an existing file that prevents me from building the project. However, since I am not familiar with it and am unable to experiment with it, I am not sure how best to replicate this functionality.
if ((myfileName = tmpnam(NULL)) == NULL) { return APP_ERROR }
I read the information on tmpnam here but the best I can come up with is to use something like:
if (tmpnam_r == NULL) { return APP_ERROR }
However, since I cannot compile with tmpnam and am unfamiliar with the code in question, I am not confident in properly capturing the original intent.
As best as I can tell, this appears to be testing if the file exists, and if not, simply returns an error, as the next step consists of copying content into myfileName, which should presumably exist following the above check.
The problem with tmpnam() is that it generates a name that is unique and does not exist when it returns, but that name is not guaranteed to be unique by the time you use it in a call to fopen() (or open()).
The key feature of the mkstemp() function is that it creates and opens a file with the new name, so that there isn't a TOCTOU (time of check, time of use) vulnerability. This cuts down the avenues for security risks.
Code designed to use tmpnam() usually needs a file name, so using tmpfile() is usually not an option; it doesn't provide a way to find the file name. If you don't need the file name then using tmpfile() works well and is Standard C, so it is widely available.
The specific case of tmpnam() and tmpnam_s() is interesting. Although tmpnam_s() avoids some string-related problems, it does not change the behaviour of tmpnam() in the way that causes the security problems addressed by mkstemp(). So, independent of the portability issues that arise from attempting to use tmpnam_s() (or any of the other *_s() functions from Annex K of the C11 or C18 standards), it doesn't fix the problem that causes tmpnam() to be deprecated.
You can arrange to use mkstemp() instead of tmpnam() and close the file descriptor before continuing with the other code:
tmpnam(name); // Replace this
int fd = mkstemp(name); // With this…
if (fd >= 0)
close(fd);
It's not great, but it does ensure the file is created, which reduces the security vulnerabilities a bit, but not as much as using the file descriptor directly. You could (should) wrap that into a function.
Note that the mkstemp() returns a file descriptor; if you want a file stream, you can use fdopen() to create a file stream from the file descriptor. And if that fails, you probably want to remove the file (with remove() or unlink()).
So, that gives you a need for fmkstemp():
#include <stdio.h>
#include <stdlib.h> /* mkstemp() */
#include <unistd.h> /* close() */
extern FILE *fmkstemp(char *name); /* Add to a convenient header file */
FILE *fmkstemp(char *name)
{
int fd = mkstemp(name);
FILE *fp = 0;
if (fd >= 0)
{
fp = fdopen(fd, "w+");
if (fp == 0)
{
close(fd);
unlink(name);
}
}
return(fp);
}
Note that after you've used fmkstemp(), you use fclose() to close the file stream (and, behind the scenes, that closes the file descriptor).
Don't forget to remove the temporary file before exit. That's where a function registered with atexit() or one of its variants can be useful.

Basics questions regarding File and I/O System Calls in C (on Linux/UNIX)

I'm working on improving my C programming knowledge, but I am having trouble understanding the man pages for the following Unix system calls:
open
create
close
unlink
read
write
lseek
The man pages for each of these are, for lack of a better term, completely confusing and unintelligible. For example, here is the man page for open:
"Given a pathname for a file, open() returns a file descriptor, a small, nonnegative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.). The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
By default, the new file descriptor is set to remain open across an execve(2) (i.e., the FD_CLOEXEC file descriptor flag described in fcntl(2) is initially disabled; the O_CLOEXEC flag, described below, can be used to change this default). The file offset is set to the beginning of the file (see lseek(2)).
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2)."
I have no idea what this all means. From my understanding, if open returns a negative integer, an error occurred, and if it returns a positive integer, then that integer can be used in further system calls (???). That is, unfortunately, basically the extent of my knowledge and what I can attempt to parse from the man page. I need some help.
What does it mean that it "returns the lowest-numbered file descriptor not currently open for the process"? What process is it referring to? Why is it the lowest-numbered file descriptor, and why does this matter/how would I use this? I hate to sound like an idiot but I honestly have no clue what it's talking about.
Let's take an example. Let's say I wanted to create a new file in a directory, and open up a file from another directory, and copy the file I opened into the file I created, while checking for errors along the way. This is my attempt:
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
int XYZ = creat("XYZ.doc", 0 );
if (XYZ < 0)
printf("file creating error");
int file = open("/usr/.../xx.xx", 0);
if(file < 0)
printf("file opening error");
}
How would I copy the file that I opened into the file that I created? That should be easy. But what if I wanted to copy the file that I opened in reverse to the file that I created? (Maybe that example will illuminate how to use the file offset stuff mentioned in the man page, which I don't currently understand...)
I would like to edit this post to write a layman's terms description next to each of these system calls, thus creating a good online resource for people to study from. Also, if anyone has any good references for these system calls in C, that would be much appreciated as well.
Error checking left out for simplicity sake:
char data[1024]; /* size of this chosen more or less on a whim */
ssize_t n;
while ((n = read(file, data, sizeof(data))) > 0) {
write(XYZ, data, n);
}
close(file);
close(XYZ);

Using File Descriptors with readlink()

I have a situation where I need to get a file name so that I can call the readlink() function. All I have is an integer that was originally stored as a file descriptor via an open() command. Problem is, I don't have access to the function where the open() command executed (if I did, then I wouldn't be posting this). The return value from open() was stored in a struct that I do have access to.
char buf[PATH_MAX];
char tempFD[2]; //file descriptor number of the temporary file created
tempFD[0] = fi->fh + '0';
tempFD[1] = '\0';
char parentFD[2]; //file descriptor number of the original file
parentFD[0] = (fi->fh - 1) + '0';
parentFD[1] = '\0';
if (readlink(tempFD, buf, sizeof(buf)) < 0) {
log_msg("\treadlink() error\n");
perror("readlink() error");
} else
log_msg("readlink() returned '%s' for '%s'\n", buf, tempFD);
This is part of the FUSE file system. The struct is called fi, and the file descriptor is stored in fh, which is of type uint64_t. Because of the way this program executes, I know that the two linked files have file descriptor numbers that are always 1 apart. At least that's my working assumption, which I am trying to verify with this code.
This compiles, but when I run it, my log file shows a readlink error every time. My file descriptors have the correct integer values stored in them, but it's not working.
Does anyone know how I can get the file name from these integer values? Thanks!
If it's acceptable that your code becomes non portable and is tied to being run on a somewhat modern version of Linux, then you can use /proc/<pid>/fd/<fd>. However, I would recommend against adding '0' to the fd as a means to get the string representing the number, because it uses the assumption that fd < 10.
However it would be best if you were able to just pick up the filename instead of relying on /proc. At the very least, you can replace calls to the library's function with a wrapper function using a linker flag. Example of usage is gcc program.c -Wl,-wrap,theFunctionToBeOverriden -o program, all calls to the library function will be linked against __wrap_theFunctionToBeOverriden; the original function is accessible under the name __real_theFunctionToBeOverriden. See this answer https://stackoverflow.com/a/617606/111160 for details.
But, back to the answer not involving linkage rerouting: you can do it something like
char fd_path[100];
snprintf("/proc/%d/fd/%d", sizeof(fd_path), getpid(), fi->fh);
You should now use this /proc/... path (it is a softlink) rather than using the path it links to.
You can call readlink to find the actual path in the filesystem. However, doing so introduces a security vulnerability and I suggest against using the path readlink returns.
When the file the descriptor points at is deleted,unlinked, then you can still access it through the /proc/... path. However, when you readlink on it, you get the original pathname (appended with a ' (deleted)' text).
If your file was /tmp/a.txt and it gets deleted, readlink on the /proc/... path returns /tmp/a.txt (deleted). If this path exists, you will be able to access it!, while you wanted to access a different file (/tmp/a.txt). An attacker may be able to provide hostile contents in the /tmp/a.txt (deleted) file.
On the other hand, if you just access the file through the /proc/... path, you will access the correct (unlinked but still alive) file, even if the path claims to be a link to something else.

Calling fopen on Windows core files returns NULL pointer

I am trying to open a couple different files via their absolute path (determined elsewhere, programmatically), so I can get their SHA1 hash*, some of which are core windows files. fopen() is returning NULL on some (but not all) files when I attempt to open them as follows (normally the filename is gotten via QueryFullProcessImageName but I hardcoded it just in case):
char * filename = "c:\\windows\\system32\\spoolsv.exe";
FILE * currFileRead = fopen(filename, "rb");
if (currFileRead == NULL)
{
printf("Failed to open %s, error %s\n", filename, strerror(errno) );
}
else
{
//hashing code
}
The reported error is 2: "No such file or directory", but obviously they're there. It also only fails for some processes, like spoolsv.exe or winlogon.exe, while svchost.exe and wininint.exe seem to open just fine.
My program has administrative privileges, and I can't figure why some processes would fail while others opened without trouble?
*I'm using a method from LibTomCrypt (http://libtom.org/?page=features) which is open source with a permissive license. The call to sha1_process takes in a hash_state (internal to the library), an unsigned char buffer, and the length of the buffer. I need to read the file with fopen to get the file into memory for hashing.
Because your program is a 32-bit process, when you try to open c:\windows\system32 you actually get c:\windows\syswow64 which does not contain all of the same files.
You can use IsWow64Process to determine whether you are running on a 64-bit system. If you are, you can replace system32 with sysnative in the path to open the actual file, unless you need to support Windows 2003 or Windows XP. Depending on your circumstances, you might need to cope with the possibility that the Windows folder is not c:\windows and/or the possibility that there are other folders named system32.
On the whole it would be more robust to have separate 32-bit and 64-bit versions of your application, or perhaps just the particular part of it that is exhibiting the problem. If you can't leave it up to the user to install the appropriate version, the installer could decide which to install, or you could always install both and have the 32-bit version automatically launch the 64-bit version when running on a 64-bit system.
Having administrative privileges is not always enough, because if the file you want to open is in use and the program that is using it has locked it, then you can't open and read that file.

How to check if a file is already open by another process in C?

I see that standard C has no way of telling if a file is already opened in another process. So the answer should contain several examples for each platform. I need that check for Visual C++ / Windows though.
Windows: Try to open the file in exclusive mode. If it works, no one else has opened the file and will not be able to open the file
HANDLE fh;
fh = CreateFile(filename, GENERIC_READ, 0 /* no sharing! exclusive */, NULL, OPEN_EXISTING, 0, NULL);
if ((fh != NULL) && (fh != INVALID_HANDLE_VALUE))
{
// the only open file to filename should be fh.
// do something
CloseHandle(fh);
}
MS says: dwShareMode
The sharing mode of an object, which can be read, write, both, delete, all of these, or none (refer to the following table).
If this parameter is zero and CreateFile succeeds, the object cannot be shared and cannot be opened again until the handle is closed.
You cannot request a sharing mode that conflicts with the access mode that is specified in an open request that has an open handle, because that would result in the following sharing violation: ERROR_SHARING_VIOLATION.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx
extension:
how to delete a (not readonly) file filesystem which no one has open for read/write?
access right FILE_READ_ATTRIBUTES, not DELETE. DELETE could cause problems on smb share (to MS Windows Servers) - CreateFile will leave with a still open FileHandle /Device/Mup:xxx filename - why ever and whatever this Mup is. Will not happen with access right FILE_READ_ATTRIBUTES
use FILE_FLAG_OPEN_REPARSE_POINT to delete filename. Else you will delete the target of a symbolic link - which is usually not what you want
HANDLE fh;
fh = CreateFile(filename, FILE_READ_ATTRIBUTES, FILE_SHARE_DELETE /* no RW sharing! */, NULL, OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT|FILE_FLAG_DELETE_ON_CLOSE, NULL);
if ((fh != NULL) && (fh != INVALID_HANDLE_VALUE))
{
DeleteFile(filename); /* looks stupid?
* but FILE_FLAG_DELETE_ON_CLOSE will not work on some smb shares (e.g. samba)!
* FILE_SHARE_DELETE should allow this DeleteFile() and so the problem could be solved by additional DeleteFile()
*/
CloseHandle(fh); /* a file, which no one has currently opened for RW is delete NOW */
}
what to do with an open file? If the file is open and you are allowed to do an unlink, you will be left a file where subsequent opens will lead to ACCESS_DENIED.
If you have a temporary folder, then it could be a good idea to rename(filename, tempdir/filename.delete) and delete tempdir/filename.delete.
There's no way tell, unless the other process explicitly forbids access to the file. In MSVC, you'd do so with _fsopen(), specifying _SH_DENYRD for the shflag argument. The notion of being interested whether a file is opened that isn't otherwise locked is deeply flawed on a multitasking operating system. It might be opened a microsecond after you'd have found it wasn't. That's also the reason that Windows doesn't have a IsFileLocked() function.
If you need synchronized access to files, you'll need to add this with a named mutex, use CreateMutex().
Getting the open_files information is DIFFICULT, it's like pulling teeth, and if you don't have an immediate need for it you shouldn't be asking for "several examples for each platform" just for the hell of it. Just my opinion, of course.
Linux and many Unix systems have a system utility called lsof which finds open file handles and stuff. The way it does so is by accessing /dev/kmem, which is a pseudo-file containing a copy of "live" kernel memory, i.e. the working storage of the operating system kernel. There are tables of open files in there, naturally, and the memory structure is open-source and documented, so it's just a matter of a lot of busywork for lsof to go in there, find the information and format it for the user.
Documentation for the deep innards of Windows, on the other hand, is practically nonexistent, and I'm not aware that the data structures are somehow exposed to the outside. I'm no Windows expert, but unless the Windows API explicitly offers this kind of information it may simply not be available.
Whatever is available is probably being used by Mark Russinovich's SysInternals utilities; the first one that comes to mind is FileMon. Looking at those may give you some clues. Update: I've just been informed that SysInternals Handles.exe is even closer to what you want.
If you manage to figure that out, good; otherwise you may be interested in catching file open/close operations as they happen: The Windows API offers a generous handful of so-called Hooks: http://msdn.microsoft.com/en-us/library/ms997537.aspx. Hooks allow you to request notification when certain things happen in the system. I believe there's one that will tell you when a program –systemwide– opens a file. So you can make your own list of files opened for the duration you're listening to your hooks. I don't know for sure but I suspect this may be what FileMon does.
The Windows API, including the hook functions, can be accessed from C. Systemwide hooks will require you to create a DLL to be loaded alongside your program.
Hope these hints help you get started.
For Windows, this code works also:
boolean isClosed(File f) { return f.renameTo(f); }
An opened file can not be renamed, and a rename to same name does not cause another error. So if the rename succeeds, not having really done something, you know the file is not open.
Any such check would be inherently racy. Another process could always open the file between the point where you did the check and the point where you accessed the file.
The answers so far should tell you that finding out the information you've asked for is tricky, non-portable, and often inherently unreliable. So, from my perspective, the real answer is don't do that. Try to find a way to think about your real problem so that this question doesn't arise.
this can't be that hard guys.
do this:
try{
File fileout = new File(path + ".xls");
FileOutPutStream out = new FileOutPutStream(fileout);
}
catch(FileNotFoundException e1){
// if a MS Windows process is already using the file, this exception will be thrown
}
catch(Exception e){
}
You can use something like this. It is not a proper solution. But it works,
bool IsFileDownloadComplete(const std::wstring& dir, const std::wstring& fileName)
{
std::wstring originalFileName = dir + fileName;
std::wstring tempFileName = dir + L"temp";
while(true)
{
int ret = rename(convertWstringToString(originalFileName).c_str(), convertWstringToString(tempFileName).c_str());
if(ret == 0)
break;
Sleep(10);
}
/** File is not open. Rename to original. */
int ret = rename(convertWstringToString(tempFileName).c_str(), convertWstringToString(originalFileName).c_str());
if(ret != 0)
throw std::exception("File rename failed");
return true;
}

Resources