Basic questions regarding File and I/O System Calls in C (on Linux/UNIX)

I'm working on improving my C programming knowledge, but I am having trouble understanding the man pages for the following Unix system calls:
open
creat
close
unlink
read
write
lseek
The man pages for each of these are, for lack of a better term, completely confusing and unintelligible. For example, here is the man page for open:
"Given a pathname for a file, open() returns a file descriptor, a small, nonnegative integer for use in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.). The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
By default, the new file descriptor is set to remain open across an execve(2) (i.e., the FD_CLOEXEC file descriptor flag described in fcntl(2) is initially disabled; the O_CLOEXEC flag, described below, can be used to change this default). The file offset is set to the beginning of the file (see lseek(2)).
A call to open() creates a new open file description, an entry in the system-wide table of open files. This entry records the file offset and the file status flags (modifiable via the fcntl(2) F_SETFL operation). A file descriptor is a reference to one of these entries; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. The new open file description is initially not shared with any other process, but sharing may arise via fork(2)."
I have no idea what this all means. From my understanding, if open returns a negative integer, an error occurred, and if it returns a positive integer, then that integer can be used in further system calls (???). That is, unfortunately, basically the extent of my knowledge and what I can attempt to parse from the man page. I need some help.
What does it mean that it "returns the lowest-numbered file descriptor not currently open for the process"? What process is it referring to? Why is it the lowest-numbered file descriptor, and why does this matter/how would I use this? I hate to sound like an idiot but I honestly have no clue what it's talking about.
Let's take an example. Let's say I wanted to create a new file in a directory, and open up a file from another directory, and copy the file I opened into the file I created, while checking for errors along the way. This is my attempt:
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
    int XYZ = creat("XYZ.doc", 0 );
    if (XYZ < 0)
        printf("file creating error");
    int file = open("/usr/.../xx.xx", 0);
    if(file < 0)
        printf("file opening error");
}
How would I copy the file that I opened into the file that I created? That should be easy. But what if I wanted to copy the file that I opened in reverse to the file that I created? (Maybe that example will illuminate how to use the file offset stuff mentioned in the man page, which I don't currently understand...)
I would like to edit this post to write a layman's terms description next to each of these system calls, thus creating a good online resource for people to study from. Also, if anyone has any good references for these system calls in C, that would be much appreciated as well.

Error checking left out for simplicity's sake:
char data[1024]; /* size of this chosen more or less on a whim */
ssize_t n;
while ((n = read(file, data, sizeof(data))) > 0) {
    write(XYZ, data, n);
}
close(file);
close(XYZ);
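For completeness, here is a minimal sketch of the same loop with the error checking put back, plus one possible way to do the reversed copy the question asks about. The function names copy_fd, copy_fd_reversed and write_all are made up for this illustration. The reversed version assumes the input is a seekable regular file and reads one byte at a time, which is simple rather than fast.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes and EINTR. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy everything from descriptor in to descriptor out. Returns 0 or -1. */
int copy_fd(int in, int out)
{
    char data[1024];
    ssize_t n;

    while ((n = read(in, data, sizeof data)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out, data, (size_t)n) < 0)
            return -1;
    }
    return 0;
}

/* Copy in to out with the bytes in reverse order (in must be seekable). */
int copy_fd_reversed(int in, int out)
{
    off_t pos = lseek(in, 0, SEEK_END);   /* offset of the end == file size */
    char c;

    if (pos < 0)
        return -1;
    while (pos > 0) {
        /* Step back one byte, read it, and append it to the output. */
        if (lseek(in, --pos, SEEK_SET) < 0)
            return -1;
        if (read(in, &c, 1) != 1)
            return -1;
        if (write_all(out, &c, 1) < 0)
            return -1;
    }
    return 0;
}
```

With the descriptors from the question, copy_fd(file, XYZ) would do the forward copy and copy_fd_reversed(file, XYZ) the reversed one. Note that creat("XYZ.doc", 0) creates the file with no permission bits set; a mode such as 0644 is more usual.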

Related

How to replace the use of tmpnam() in the following code snippet

I fully understand that tmpnam has been deprecated and would like to remove it from a function in an existing file that prevents me from building the project. However, since I am not familiar with it and am unable to experiment with it, I am not sure how best to replicate this functionality.
if ((myfileName = tmpnam(NULL)) == NULL) { return APP_ERROR; }
I read the information on tmpnam here but the best I can come up with is to use something like:
if (tmpnam_r == NULL) { return APP_ERROR }
However, since I cannot compile with tmpnam and am unfamiliar with the code in question, I am not confident in properly capturing the original intent.
As best as I can tell, this appears to be testing if the file exists, and if not, simply returns an error, as the next step consists of copying content into myfileName, which should presumably exist following the above check.
The problem with tmpnam() is that it generates a name that is unique and does not exist when it returns, but that name is not guaranteed to be unique by the time you use it in a call to fopen() (or open()).
The key feature of the mkstemp() function is that it creates and opens a file with the new name, so that there isn't a TOCTOU (time of check, time of use) vulnerability. This cuts down the avenues for security risks.
Code designed to use tmpnam() usually needs a file name, so using tmpfile() is usually not an option; it doesn't provide a way to find the file name. If you don't need the file name then using tmpfile() works well and is Standard C, so it is widely available.
The specific case of tmpnam() and tmpnam_s() is interesting. Although tmpnam_s() avoids some string-related problems, it does not change the behaviour of tmpnam() in the way that causes the security problems addressed by mkstemp(). So, independent of the portability issues that arise from attempting to use tmpnam_s() (or any of the other *_s() functions from Annex K of the C11 or C18 standards), it doesn't fix the problem that causes tmpnam() to be deprecated.
You can arrange to use mkstemp() instead of tmpnam() and close the file descriptor before continuing with the other code:
tmpnam(name); // Replace this
int fd = mkstemp(name); // With this…
if (fd >= 0)
close(fd);
It's not great, but it does ensure the file is created, which reduces the security vulnerabilities a bit, but not as much as using the file descriptor directly. You could (should) wrap that into a function.
Note that the mkstemp() returns a file descriptor; if you want a file stream, you can use fdopen() to create a file stream from the file descriptor. And if that fails, you probably want to remove the file (with remove() or unlink()).
So, that gives you a need for fmkstemp():
#include <stdio.h>
#include <stdlib.h> /* mkstemp() */
#include <unistd.h> /* close() */
extern FILE *fmkstemp(char *name); /* Add to a convenient header file */
FILE *fmkstemp(char *name)
{
    int fd = mkstemp(name);
    FILE *fp = 0;
    if (fd >= 0)
    {
        fp = fdopen(fd, "w+");
        if (fp == 0)
        {
            close(fd);
            unlink(name);
        }
    }
    return(fp);
}
Note that after you've used fmkstemp(), you use fclose() to close the file stream (and, behind the scenes, that closes the file descriptor).
Don't forget to remove the temporary file before exit. That's where a function registered with atexit() or one of its variants can be useful.

What is the relation between fopen and open?

I am working on a project for a class and we were given a .c file containing the following code:
int fd = -1;
if (fd < 0)
{
    fd = open ("my_dev", O_RDWR);
    if (fd < 0)
    {
        perror ("open");
        return -1;
    }
    ...
So I understand that it is trying to open a file "my_dev" with read/write access, and then is returning the file descriptor on success or a negative value on failure, but what I don't understand is why it is giving me "permission denied" consistently. I tried to use this code:
int des = open("my_dev", O_CREAT | O_RDWR, 0777);
...
close(des);
to open/create the file (this is called before the other block), but this does not work, and yet if I just use this instead:
FILE* file = fopen("my_dev","w+");
fprintf(file,str);
fclose(file);
I can write to the file, meaning I have write permissions. Now normally, I would just use fopen and fprintf for everything, but for this project, we sort of have to use the teacher's .c file which is going to try to use
open()
which is going to give a "permission denied" error which is in turn going to screw up my code.
I guess my question is how fopen and open relate to each other? Everyone seems to be able to recite that open is a system call whereas fopen is a standard lib function, but I can't seem to find a clear answer for how I can create a file with fopen() that can be opened by open() without a "permission denied" error, or how I can create a file with open() which I can then write to, close and open again with open().
In short, how do I create a file in C that I can write to and open later with open(O_RDWR)?
Sorry if this is a little piecey, I'm super tired.
PS: It should be noted that I am compiling and running on a university computer, so permissions may be "weird" BUT it should be noted that if I create the file with the terminal command "dd" open() will work, and furthermore, I clearly have SOME write permissions since I can indeed write to the file with fopen and fprintf
fopen is a library function provided by the standard C runtime. It returns a stream, and you can call stream functions on it, such as fscanf, fprintf, fread, and fwrite.
open is usually a system call on Unix-like systems, provided by the operating system kernel. It returns a file descriptor, and you can call I/O functions with the fd, such as read and write.
Generally, fopen is implemented using open underneath.
If you want to use standard stream functions on a file descriptor, you can use the POSIX API fdopen, which takes an fd and returns a FILE* stream.

Negative return value in open system call for file created in /proc file system

I have created a file in /proc named "test" (it was created in kernel). The file exists. When I want to open it in user level it returns negative.
int fd;
if((fd=open("/proc/test","O_RDONLY"))<0){
perror("open");
}
The error that I see is open: File exists. I have seen this question but it is not my case.
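You need parentheses in there (now fixed in the question), and the second argument to open() is not a string. It is an int holding flag bits such as O_RDONLY. Passing the string "O_RDONLY" hands open() a pointer value instead, and whatever bits that value happens to contain are treated as flags; if they include O_CREAT and O_EXCL, that would explain the "File exists" error. Use the constant from <fcntl.h>: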
#include <fcntl.h>
int fd;
if ((fd = open("/proc/test", O_RDONLY)) < 0)
perror("open");
I'm not convinced it was a good idea to create a file of any sort in the /proc file system. In fact, I'm a bit surprised you were allowed to. If you are learning to program as root, I hope you have good backups.

Using File Descriptors with readlink()

I have a situation where I need to get a file name so that I can call the readlink() function. All I have is an integer that was originally stored as a file descriptor via an open() command. Problem is, I don't have access to the function where the open() command executed (if I did, then I wouldn't be posting this). The return value from open() was stored in a struct that I do have access to.
char buf[PATH_MAX];
char tempFD[2]; //file descriptor number of the temporary file created
tempFD[0] = fi->fh + '0';
tempFD[1] = '\0';
char parentFD[2]; //file descriptor number of the original file
parentFD[0] = (fi->fh - 1) + '0';
parentFD[1] = '\0';
if (readlink(tempFD, buf, sizeof(buf)) < 0) {
log_msg("\treadlink() error\n");
perror("readlink() error");
} else
log_msg("readlink() returned '%s' for '%s'\n", buf, tempFD);
This is part of the FUSE file system. The struct is called fi, and the file descriptor is stored in fh, which is of type uint64_t. Because of the way this program executes, I know that the two linked files have file descriptor numbers that are always 1 apart. At least that's my working assumption, which I am trying to verify with this code.
This compiles, but when I run it, my log file shows a readlink error every time. My file descriptors have the correct integer values stored in them, but it's not working.
Does anyone know how I can get the file name from these integer values? Thanks!
If it's acceptable that your code becomes non-portable and is tied to being run on a somewhat modern version of Linux, then you can use /proc/<pid>/fd/<fd>. However, I would recommend against adding '0' to the fd as a means to get the string representing the number, because that only works when fd < 10.
However it would be best if you were able to just pick up the filename instead of relying on /proc. At the very least, you can replace calls to the library's function with a wrapper function using a linker flag. Example of usage is gcc program.c -Wl,-wrap,theFunctionToBeOverriden -o program, all calls to the library function will be linked against __wrap_theFunctionToBeOverriden; the original function is accessible under the name __real_theFunctionToBeOverriden. See this answer https://stackoverflow.com/a/617606/111160 for details.
But, back to the answer not involving linkage rerouting: you can do it something like
char fd_path[100];
snprintf(fd_path, sizeof(fd_path), "/proc/%ld/fd/%llu", (long)getpid(), (unsigned long long)fi->fh);
You should now use this /proc/... path (it is a softlink) rather than using the path it links to.
You can call readlink to find the actual path in the filesystem. However, doing so introduces a security vulnerability and I suggest against using the path readlink returns.
When the file the descriptor points at is deleted (unlinked), you can still access it through the /proc/... path. However, when you call readlink on it, you get the original pathname with the text ' (deleted)' appended.
If your file was /tmp/a.txt and it gets deleted, readlink on the /proc/... path returns /tmp/a.txt (deleted). If a file with that exact name exists, you will be able to open it, even though it is a different file from the one you wanted (the original /tmp/a.txt). An attacker may be able to plant hostile contents in a file named /tmp/a.txt (deleted).
On the other hand, if you just access the file through the /proc/... path, you will access the correct (unlinked but still alive) file, even if the path claims to be a link to something else.
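For reference, a minimal Linux-only sketch of looking up the name behind a descriptor is shown here. The helper name fd_to_path is invented for this example, and it uses /proc/self/fd/<fd>, which refers to the calling process without needing getpid(). Given the caveats above, the result is better suited to logging than to reopening the file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Store the target of /proc/self/fd/<fd> in buf as a NUL-terminated string.
   Returns the length of the path, or -1 on error. Linux only. */
ssize_t fd_to_path(int fd, char *buf, size_t size)
{
    char proc_path[64];
    ssize_t len;

    if (size == 0)
        return -1;
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);

    /* readlink() does not append '\0', so leave room and add it ourselves.
       A path longer than size - 1 bytes is silently truncated. */
    len = readlink(proc_path, buf, size - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';
    return len;
}
```

In the FUSE code from the question, a call along the lines of fd_to_path((int)fi->fh, buf, sizeof buf) would replace the single-digit string trick, assuming fi->fh really holds a descriptor that is open in the current process.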

Duplicate file descriptor with its own file offset

How can one create a new file descriptor from an existing file descriptor such that the new descriptor does not share the same internal file structure/entry in the file table? Specifically attributes such as file offset (and preferably permissions, sharing and modes) should not be shared between the new and old file descriptors.
Under both Windows and Linux, dup() will duplicate the file descriptor, but both descriptors still point to the same file structure in the process' file table. Any seeking on either descriptor will adjust the position for the other descriptors as well.
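The shared offset is easy to observe on a POSIX system. The sketch below seeks on the original descriptor and then asks the duplicate where it is; the function name offset_seen_by_dup is made up for this illustration, and fd is assumed to refer to a seekable file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Seek fd to offset 3 and report where a dup() of it ends up.
   Returns the duplicate's offset (3 when the offset is shared), or -1. */
off_t offset_seen_by_dup(int fd)
{
    int copy = dup(fd);
    off_t pos;

    if (copy < 0)
        return -1;
    if (lseek(fd, 3, SEEK_SET) < 0) {   /* move the ORIGINAL descriptor */
        close(copy);
        return -1;
    }
    pos = lseek(copy, 0, SEEK_CUR);     /* query the DUPLICATE's offset */
    close(copy);
    return pos;
}
```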
Note
I've since received answers for both Windows and Linux and adjusted the question a little too often, which has made it difficult for people to answer. I'll adjust my votes and accept the cleanest answer which covers both Windows and Linux. Apologies to all, I'm still new to the SO paradigm. Thanks for the great answers!
So what you really want is to be given a file descriptor and open the same file over again, to get a separate position, sharing, mode, etc. And you want to do this on Windows (where the "file descriptor" is basically a foreign object, not something used directly by the OS or the run-time library at all).
Amazingly enough, there is a way to do that, at least with MS VC++. All but two steps of it use only the Win32 API so porting to other compilers/libraries should be fairly reasonable (I think most supply versions of those two functions). Those are for converting a Unix-style file descriptor to a native Win32 file handle, and converting a native Win32 file handle back to a Unix-style file descriptor.
1. Convert file-descriptor to native file handle with _get_osfhandle()
2. Get a name for the file with GetFileInformationByHandleEx(FILE_NAME_INFO) [1]
3. Use CreateFile to open a new handle to that file
4. Create a file descriptor for that handle with _open_osfhandle()
Et voilà, we have a new file descriptor referring to the same file, but with its own permissions, position, etc.
Toward the end of your question, you make it sound like you also want the "permissions", but that doesn't seem to make any real sense -- the permissions attach to the file itself, not to how the file is opened, so opening or reopening the file has no effect on the file's permissions. If you really want to know them, you can get them with GetFileInformationByHandle, but be aware that file permissions in Windows are quite a bit different from the (traditional) file permissions in Unix. Unix has owner/group/world permissions on all files, and most systems also have ACLs (though there's more variation in how they work). Windows either has no permissions at all (e.g., files on FAT or FAT32) or else uses ACLs (e.g., files on NTFS), but nothing that's really equivalent to the traditional owner/group/world permissions most people are accustomed to on Unix.
Perhaps you're using "permissions" to refer to whether the file was open for reading, writing, or both. Getting that is considerably uglier than any of the preceding. The problem is that most of it is in the library, not Win32, so there's probably no way to do it that will be even close to portable between compilers. With MS VC++ 9.0 SP1 (not guaranteed for any other compiler) you can do this:
#include <stdio.h>
int get_perms(int fd) {
    int i;
    FILE * base = __iob_func();
    for (i=0; i<_IOB_ENTRIES; i++)
        if (base[i]._file == fd)
            return base[i]._flag; // we've found our file
    return 0; // file wasn't found.
}
Since this involved some spelunking, I wrote a quick test to verify that it might actually work:
#ifdef TEST
#include <io.h>
void show_perms(int perms, char const *caption) {
    printf("File opened for %s\n", caption);
    printf("Read permission = %d\n", (perms & _IOREAD)!=0);
    printf("Write permission = %d\n", (perms & _IOWRT)!=0);
}
int main(int argc, char **argv) {
    FILE *file1, *file2;
    int perms1, perms2;
    file1=fopen(argv[1], "w");
    perms1 = get_perms(_fileno(file1));
    fclose(file1);
    file2=fopen(argv[1], "r");
    perms2 = get_perms(_fileno(file2));
    fclose(file2);
    show_perms(perms1, "writing");
    show_perms(perms2, "reading");
    return 0;
}
#endif
And the results seem to indicate success:
File opened for writing
Read permission = 0
Write permission = 1
File opened for reading
Read permission = 1
Write permission = 0
You can then test that returned flag against _IOREAD, _IOWRT, and _IORW, which are defined in stdio.h. Despite my previous warnings, I should probably point out that I suspect (though I certainly can't guarantee) that this part of the library is fairly stable, so the real chances of major changes are probably fairly minimal.
In the other direction, however, there's basically no chance at all that it'll work with any other library. It could (but certainly isn't guaranteed to) work with the other compilers that use the MS library, such as Intel, MinGW or Comeau using MS VC++ as its back-end. Of those, I'd say the most likely to work would be Comeau, and the least likely MinGW (but that's only a guess; there's a good chance it won't work with any of them).
[1] Requires the redistributable Win32 FileID API Library
So, I recommend reading up on this a little more. The dup() and related functions serve to create a duplicate value in the file descriptor table pointing to the same entry in the open file table. This is intended to have the same offset. If you call open(), you will create a new entry in the open file table.
It doesn't make any sense to create a duplicate of a file descriptor and have that new file descriptor carry a different offset in the open file table (this seems to contradict what the word "duplicate" means).
I'm not sure what your question is actually. I mean, it isn't the same thing as a duplicate. You could read:
/proc/self/fd/[descriptor]
and get the string that was used to open that file descriptor; bear in mind this may provide some pitfalls, some of which you actually noted in your observation of calling open() again.
Maybe you can explain a little more and I can try to update to help.
Why don't you just open the file a second time with open(), or CreateFile() on Windows? This gives you full freedom to choose different access rights and a separate offset.
This of course has the drawback that you cannot open the file exclusively, but it solves your problem very simply.
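On Linux, one way to "open the file a second time" when only the descriptor is available is to open its /proc/self/fd entry, which yields a new open file description with its own offset. The sketch below assumes /proc is mounted and that the file is a regular file the process is still permitted to open; the name reopen_fd is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open the file behind fd a second time, giving a descriptor with its own
   open file description (and so its own offset). Linux only.
   flags must not include O_CREAT, since no mode argument is passed. */
int reopen_fd(int fd, int flags)
{
    char proc_path[64];

    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    return open(proc_path, flags);   /* -1 with errno set on failure */
}
```

After int fd2 = reopen_fd(fd, O_RDONLY), seeking on fd leaves the position of fd2 untouched, unlike a descriptor obtained from dup().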

Resources