I'm programming in C and trying to write portable code.
My question is, how can I tell if a file exists and is readable?
I am currently using the code:
f = fopen(filename, "r");
if (f) printf("File exists!");
In my program, filename is set by the user, and should be considered as untrusted input (e.g. filename could be maliciously crafted).
My issue is that the above code is not robust. For example, when used on Windows with the filename "PRN" it will print "File exists!" even though no such file exists on the filesystem.
I know I could filter out the reserved filenames on Windows, as there are only about a dozen of them, but that feels like a hack. Also, I only know what I know. Maybe there are other "reserved" or "special" names that I don't know about.
Is there any simple and portable way to determine if a file exists in C?
Alternatively, if I have to use an OS API, which function should I use?
Related
What is a good way to check that a file exists with case sensitivity in C on Windows?
I have got this to work by comparing the filename with all the file entries in the directory that contains the file. Is there a more efficient method in C?
Use this:
WIN32_FIND_DATAA FindFileData;
HANDLE h = FindFirstFileA(filenametocheck, &FindFileData);
Now FindFileData.cFileName contains the filename as it is stored in NTFS.
All you need to do is compare filenametocheck with FindFileData.cFileName.
Don't forget to close the h handle with FindClose(h) and do error checking.
As written, the comparison only works for a bare file name in the current directory, because cFileName holds just the name without any directory part. If filenametocheck contains a path (e.g. ..\somefile.txt or C:\Somedir\Somefile.txt), you need to do some more work.
For further details read the documentation of FindFirstFile and possibly look into this sample.
Be aware that depending on what exactly you're trying to achieve, this may cause a TOCTOU bug as mentioned in a comment.
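The "some more work" for paths can be sketched roughly like this. The idea is to compare only the last component of what the user typed against the stored name. The helper names last_component() and same_case() are made up for illustration, and stored_name stands in for FindFileData.cFileName; this is a sketch, not a complete solution.

```c
#include <string.h>

/* Return a pointer to the last component of a Windows-style path,
   i.e. the part after the final '\\', '/' or ':' separator. */
const char *last_component(const char *path)
{
    const char *name = path;
    for (const char *p = path; *p != '\0'; ++p) {
        if (*p == '\\' || *p == '/' || *p == ':')
            name = p + 1;
    }
    return name;
}

/* Nonzero if the name the caller typed matches the stored name exactly,
   including case. stored_name stands in for FindFileData.cFileName. */
int same_case(const char *filenametocheck, const char *stored_name)
{
    return strcmp(last_component(filenametocheck), stored_name) == 0;
}
```

Note that this only checks the case of the file name itself. If the case of the directory names matters too, each directory component would need the same treatment.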
As a general question: What's the role of file extension when determining file types?
For example, I can rename a .jpeg file to .png, or even to .txt. Of course, after renaming it to .txt it will not be opened as a picture, nor will it be readable as text.
To determine the file type, it seems the safe way is to parse the first few bytes of the file. If the extension cannot be trusted, it is nothing more than part of the file name.
As a general rule, you should ALWAYS parse the COMPLETE file in order to be sure that the file is what the extension says. As you can easily imagine, it is pretty simple to create a binary file that resembles, say, a BMP (with a correct header) but then contains something different.
You should never trust the extension or the header alone, because otherwise a malicious user could exploit your code to trigger, for example, a buffer overflow. This is absolutely paramount if you are writing programs that must run with root/admin privileges.
Having said the obvious, the file extension nowadays is mainly used so that the OS can associate a program to that particular file (usually calling the program and passing the selected file as first parameter), and then it's up to the program to determine the file content.
It is a little bit different when talking about executable files. Under Unix, a file has to have the "x" flag set in order to be executable, otherwise it will not run, regardless of the extension. Under Windows, there is no such thing and the OS relies on only a few extensions (EXE, COM, BAT, etc.) to determine which files can be executed.
The EXE file, for example, has to start with "MZ" followed by some information about its allocation and size (http://www.delorie.com/djgpp/doc/exe/), and the OS surely checks its internal headers. Other formats (e.g. the COM executable format of the MS-DOS era) are just "pure" machine code with no header, so there is no check done by the OS. It just executes those opcodes, hoping that everything will be fine.
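A rough sketch of what checking the first bytes looks like, using the well-known leading signatures of PNG, JPEG, BMP and the "MZ" of EXE files. The function name guess_type() is made up for illustration. As said above, a matching header proves very little; the rest of the file still has to be parsed and validated.

```c
#include <stddef.h>
#include <string.h>

/* Guess a format from the first bytes of a file. A match only means the
   header looks right; the rest of the file still has to be validated. */
const char *guess_type(const unsigned char *buf, size_t len)
{
    static const unsigned char png[]  = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char jpeg[] = {0xFF, 0xD8, 0xFF};

    if (len >= sizeof png && memcmp(buf, png, sizeof png) == 0)
        return "png";
    if (len >= sizeof jpeg && memcmp(buf, jpeg, sizeof jpeg) == 0)
        return "jpeg";
    if (len >= 2 && buf[0] == 'B' && buf[1] == 'M')
        return "bmp";
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "exe";
    return "unknown";
}
```

The caller would fread() the first few bytes of the file into a buffer and pass the number of bytes actually read as len.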
So, to summarize:
The file extension is mainly used so that the OS can call the appropriate program to open the file (passing the filename as the first parameter, argc/argv in the C language for example)
Windows relies on certain file extensions to know if a file is executable, while Unix/Mac relies on a particular flag (x) associated with the file
Two things that are not well known about file extensions: directory names can have an extension too, and an extension can be much longer than the usual 3 characters.
With the help of file extension, you know how to read the first few and all the rest of the bytes. You also know what program to use to read the file. Or if it is an executable, you know that it is to be executed and not shown as a picture.
Yes, you can change the file extension, but what does that mean? It only means that the OS (or any program that tries to read the file) is still working correctly; you are the one providing bad data to it.
A file extension is not something that some bytes of data inherently have. Extensions are given to those bytes depending on the protocol followed to write them that way. After you have encoded the letters in binary form, you give that binary form a .txt extension so that the text reader knows that these bytes convert to letters. That's the role of the file extension. With a bad file extension, this role is not fulfilled, and the data you saved in binary cannot be understood.
As a general question: What's the role of file extension when determining file types?
The file extension usually identifies the application that opens a file.
If you rename a .JPG to a .PNG while both JPG and PNG are opened by the same application (usually an image viewer), that application can read the image stream and process it correctly despite the incorrect file extension.
The problem arises if you rename the file in such a way that the file gets routed to an application that cannot handle the file's content.
If you rename a .DOCX (Word) file to an AutoCAD extension (.DWG), opening the Word file in AutoCAD is likely to produce errors (unless by chance AutoCAD can read Word files).
I currently have a short program to read and sort a text file in C.
If I want to read many files, is there a substitute for:
FILE *f;
f = fopen("*.txt", "rw");
Thanks in advance.
f = fopen("*.txt", "rw"); won't work in any case: fopen() does not expand wildcards, and "rw" is not one of the standard mode strings (use "r+" to read and write).
The usual way to do this probably depends on your operating system. On Unix-like systems, the simple way is to invoke your program with a command line like "my_pgm *.txt" and let the shell find the matching files. (You'll get multiple arguments, each one being a file name.) I understand that Microsoft OSes would require the program to find the files itself.
To do that more or less portably, I'd probably use opendir() and readdir() to examine directory entries and see whether they matched the desired pattern.
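Something along these lines, as a minimal sketch. opendir() and readdir() are POSIX rather than standard C, and the names has_suffix() and for_each_match() are made up for this example. It matches on a plain suffix like ".txt" instead of a full wildcard pattern.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if name ends with suffix (e.g. ".txt"). */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + (n - s), suffix) == 0;
}

/* Call handle(path) for every entry in dirpath whose name ends with suffix.
   Returns the number of matches, or -1 if the directory cannot be opened. */
int for_each_match(const char *dirpath, const char *suffix,
                   void (*handle)(const char *path))
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (!has_suffix(entry->d_name, suffix))
            continue;
        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dirpath, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;               /* path too long for the buffer: skip */
        if (handle != NULL)
            handle(path);
        ++count;
    }
    closedir(dir);
    return count;
}
```

You would call it as for_each_match(".", ".txt", sort_one_file), where sort_one_file is your own function that does the fopen(), sorting and fclose() for a single path. The sketch does not check that a matching entry is a regular file, so fopen() can still fail on, say, a directory named "notes.txt".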
What I want:
void printFname(FILE * f)
{
char buf[255];
MagicFunction(f,buf);
printf("File name: %s",buf);
}
So, all I need is "MagicFunction", but unfortunately I haven't found one ...
Is there any way to implement using an OS library? (windows.h , cocoa.h, posix.h etc.)
There is no such function. There may be no filename, or more than one filename, that corresponds to the FILE *. On Unix, a program can continue to have a reference to a file after it has been renamed or deleted, which could mean that you have a FILE * with no name. Or more hard links may be made to the file, which means a file can have multiple names; which one would you choose? To further confuse things, a file can be temporarily hidden by mounting a filesystem over a directory containing that file. The file will still be on disk, at its original pathname, but it will be inaccessible at that path because the mount is obscuring it.
It's also possible that the FILE * never corresponded to a file on the filesystem at all; while they usually do, you can create one from any file descriptor using fdopen(), and that file descriptor may be a pipe, socket, or other file-like object that has never had a path on the disk. In some versions of the C library, you can open a string stream (for instance, fmemopen() in glibc), so the FILE * actually just corresponds to a memory buffer.
If you care about the name, it's best to just keep track of what it was named when you opened the file.
There are some hacky ways to approximate getting the filename; if you're just using this for debugging or informational purposes, then they may be sufficient. Most of these will require operating on the file descriptor rather than the FILE *, as the file descriptor is the lower level way of referring to a file. To get the file descriptor, run fileno() on the FILE *, and remember to check for errors in case there is no file descriptor associated with that FILE *.
On Linux, you can do readlink() on "/proc/self/fd/fileno" where fileno is the file descriptor. That will show you what filename the file had when the file was opened, or a string indicating what other kind of file descriptor it is, like a socket or inotify handle. FreeBSD and NetBSD have Linux emulation layers, which include emulation of Linux-style procfs; you may be able to do this on those if you mount a Linux-compatible procfs, though I don't have them available for testing.
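A minimal sketch of the Linux approach, assuming /proc is mounted. The function name fd_to_path() is made up for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Linux only: copy the path behind file descriptor fd into buf.
   Returns the length of the path, or -1 on failure. */
long fd_to_path(int fd, char *buf, size_t size)
{
    char link[64];
    if (size == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    ssize_t n = readlink(link, buf, size - 1);  /* readlink() does not add '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return (long)n;
}
```

Given a FILE *f, you would call fd_to_path(fileno(f), buf, sizeof buf). Note that readlink() silently truncates if the buffer is too small, so a result of exactly size - 1 bytes should be treated with suspicion.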
On Mac OS X, you don't have /proc/self/fd. If you don't care about finding the original filename, but some other filename that refers to the file would work (such that you could pass it to another program), you can construct one: /.vol/deviceid/inode. For example, /.vol/234881030/281363. To get those values, run fstat() on the file descriptor, and use st_dev and st_ino on the resulting struct stat.
On Windows, files and the filesystem work quite differently than Unix. Apparently it's possible to map a file back to its name on Windows. As of Windows Vista, you can simply call GetFinalPathNameByHandle(). This takes a HANDLE; to get the HANDLE from the file descriptor, call _get_osfhandle(). Prior to Windows Vista, you need to do a little more work, as described in this article. Note that on Windows fileno() is named _fileno(), though the former may work with a warning.
Going even further into hacky territory, there are a few more techniques that you could use. You could shell out to lsof, or you could extract the code it uses to resolve pathnames. lsof actually looks directly in kernel memory, extracting information from the kernel's name cache. This has several limitations, outlined in the lsof FAQ. And of course, you need root or equivalent privileges to do this, either directly or with an suid/sgid binary.
And finally, for a portable but slow solution for finding one or more filenames matching an open file, you could find the device and inode number using fstat() on the file descriptor, and then recursively traverse the filesystem stat()ing every file, until you find a file with matching device and inode number. Remember the caveats I mention above; you may find no matching files, more than one matching file, and even if you don't find any matching files, the file might still be there, but hidden by a mount point. And of course, there may be race conditions; something may rename the file in such a way that you never see it while traversing the hierarchy.
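A rough sketch of that last approach, using POSIX opendir()/readdir()/lstat(). The name find_by_inode() and the max_depth limit are additions for this example, not part of any library. It stops at the first match, so it will not report additional hard links.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Search dir (descending at most max_depth levels) for an entry with the
   same device and inode number as target. On success the path is written
   to out and 1 is returned; 0 means no match was found under dir. */
int find_by_inode(const char *dir, const struct stat *target,
                  int max_depth, char *out, size_t outsize)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;                       /* unreadable directory: skip it */

    int found = 0;
    struct dirent *e;
    while (!found && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                   /* path too long for the buffer */

        struct stat st;
        if (lstat(path, &st) != 0)      /* lstat: do not follow symlinks */
            continue;

        if (st.st_dev == target->st_dev && st.st_ino == target->st_ino) {
            if ((size_t)len < outsize) {
                memcpy(out, path, (size_t)len + 1);
                found = 1;
            }
        } else if (S_ISDIR(st.st_mode) && max_depth > 0) {
            found = find_by_inode(path, target, max_depth - 1, out, outsize);
        }
    }
    closedir(d);
    return found;
}
```

To use it, call fstat(fileno(f), &st) on the open file and then find_by_inode("/", &st, 64, buf, sizeof buf). All the caveats above still apply: a result of 0 does not prove the file has no name.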
There is no such standard function.
Do you call fopen() yourself? If so, maintain a FILE * to filename hash table yourself.
Otherwise, it's not possible in general.
I don't think there is such a function, even in windows.h, cocoa.h or unistd.h.
Most probably you will have to write it yourself. Just make a
struct myFile {
FILE *fh;
char *filename;
};
and keep such structures in an array of struct myFile. Then, in MagicFunction(f,b), walk the array looking for the entry whose fh is equal to f.
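Fleshed out a little, that could look like the sketch below. The wrapper names tracked_fopen(), tracked_name() and tracked_fclose() and the fixed table size are made up for this example. It only knows about files opened through the wrapper.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct myFile {
    FILE *fh;
    char *filename;
};

#define MAX_TRACKED 64
static struct myFile tracked[MAX_TRACKED];   /* unused slots have fh == NULL */

/* fopen() wrapper that also records the name the file was opened with. */
FILE *tracked_fopen(const char *filename, const char *mode)
{
    for (size_t i = 0; i < MAX_TRACKED; ++i) {
        if (tracked[i].fh != NULL)
            continue;
        char *copy = malloc(strlen(filename) + 1);
        if (copy == NULL)
            return NULL;
        strcpy(copy, filename);
        FILE *fh = fopen(filename, mode);
        if (fh == NULL) {
            free(copy);
            return NULL;
        }
        tracked[i].fh = fh;
        tracked[i].filename = copy;
        return fh;
    }
    return NULL;                              /* table is full */
}

/* The "MagicFunction": returns NULL if the stream is not tracked. */
const char *tracked_name(FILE *f)
{
    for (size_t i = 0; i < MAX_TRACKED; ++i)
        if (f != NULL && tracked[i].fh == f)
            return tracked[i].filename;
    return NULL;
}

/* fclose() wrapper that frees the slot again. */
int tracked_fclose(FILE *f)
{
    for (size_t i = 0; i < MAX_TRACKED; ++i) {
        if (f != NULL && tracked[i].fh == f) {
            free(tracked[i].filename);
            tracked[i].fh = NULL;
            tracked[i].filename = NULL;
            break;
        }
    }
    return f != NULL ? fclose(f) : EOF;
}
```

With this in place, printFname() can print tracked_name(f), after checking for NULL. The name is whatever string was passed when opening, so it says nothing about later renames.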
I've been wondering about this one. Most books I've read show that when you open a file and find that it does not exist, you should report an error saying there's no such file and then exit the program...
FILE *stream = NULL;
stream = fopen("student.txt", "rt");
if (stream==NULL) {
printf("Cannot open input file\n");
exit(1);
} else {printf("\nReading the student list directory. Wait a moment please...");
But I thought that instead of doing that, why not automatically create a new file when you find that the file you are trying to open does not exist, even if the program will not write to it this time (but will use it next time)? I'm not sure if this is efficient or not. I'm new here and have no programming experience whatsoever, so I'm asking for your opinion: what are the advantages and disadvantages of creating a file when trying to open it, instead of exiting the program as the examples in the books usually do?
FILE *stream = NULL;
stream = fopen("student.txt", "rt");
if (stream == NULL) stream = fopen("student.txt", "wt");
else {
printf("\nReading the student list directory. Wait a moment please...");
Your opinion will be highly appreciated. Thank you.
Because from your example it seems like an input file; if it doesn't exist, there is no point in creating it.
For example, if the program is supposed to open a file and then count how many vowels are in it, I don't see much sense in creating the file if it doesn't exist.
my $0.02 worth.
Argument mode:
``r'' Open text file for reading.
``r+'' Open for reading and writing.
``w'' Truncate file to zero length or create text file for writing.
``w+'' Open for reading and writing. The file is created if it does not
exist, otherwise it is truncated.
``a'' Open for writing. The file is created if it does not exist.
``a+'' Open for reading and writing. The file is created if it does not
exist.
Your question is a simple case. Read the description above: when you call fopen(), you decide which mode to use. Consider why a file is not created for "r" and "r+", and why a file is truncated for "w" and "w+". All of these are reasonable designs.
If your program expects a file to exist and it doesn't, then creating one yourself doesn't make much sense, since it's going to be empty.
If OTOH, your program is OK with a file not existing and knows how to populate one from scratch, then it's perfectly fine to do so.
Either is fine as long as it makes sense for your program. Don't worry about efficiency here -- it's negligible. Worry about correctness first.
You may not have permission to create/write to a file in the directory that the user chooses. You will have to handle that error condition.
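If you do decide to create the file, a sketch like the following keeps "the file does not exist" apart from other failures such as missing permission. The function name open_or_create() is made up for this example. It relies on fopen() setting errno, which POSIX guarantees but plain ISO C does not.

```c
#include <errno.h>
#include <stdio.h>

/* Open path for reading; if it does not exist, create it empty instead.
   *created is set to 1 when a new file was made. Returns NULL on any
   other failure (e.g. no permission), with errno left for the caller. */
FILE *open_or_create(const char *path, int *created)
{
    *created = 0;

    FILE *f = fopen(path, "r");
    if (f != NULL)
        return f;
    if (errno != ENOENT)
        return NULL;            /* exists but unreadable, or another error */

    f = fopen(path, "w+");      /* may still fail: read-only directory, etc. */
    if (f != NULL)
        *created = 1;
    return f;
}
```

There is a small window between the two fopen() calls in which another process could create the file, and "w+" would then truncate it. Where the C11 exclusive mode is available, "w+x" makes the second call fail instead of truncating.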