How to get canonical filename on a case-insensitive filesystem?


Imagine I have a file foO/bar.txt.
On a case-insensitive filesystem, I'm able to open the file as FOO/BaR.tXt.
Now I would like to detect the "canonical" filename (foO/bar.txt), so I can warn my users that they should use the correct spelling if they want their save-files to be usable on systems with case-sensitive filesystems.
(that is: my users can insert relative paths via a text-input; on Windows they sometimes use non-canonical cases; when the project is then opened on a case-sensitive system, the relative paths are broken)
The entire code is in plain old C, and should work cross-platform (Linux, macOS, Windows; the latter two being the obvious candidates for case-insensitive filesystems...)
I tried using glob() (using the filename as the pattern), hoping that it would return the canonicalized filename, but alas, it does not. Also, the Windows equivalent FindFirstFile() happily returns the queried filename rather than the filename as found on disk.
Any idea for a simple solution that involves only stdlib?
(ideally without manually reading the content of the directory and then checking whether there's an exact match...)
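For reference, the fallback the question hopes to avoid (scanning the directory for a case-insensitive match) is short with POSIX opendir()/readdir(), which are not plain ISO C and not available as-is on Windows. This is a minimal sketch under those assumptions: find_on_disk_name() is a hypothetical helper, and strcasecmp() folds case the way the C locale does, whereas NTFS and APFS apply their own Unicode case-folding rules, so non-ASCII names may not match the filesystem's notion of "same name".

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: look in directory `dir` for an entry whose name
 * equals `name` ignoring ASCII case, and copy the on-disk spelling into
 * `out`. An exact-case match wins if one exists (possible on a
 * case-sensitive filesystem). Returns 0 on success, -1 if the directory
 * cannot be read or nothing matches. */
int find_on_disk_name(const char *dir, const char *name,
                      char *out, size_t outlen)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int rc = -1;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcasecmp(e->d_name, name) != 0)
            continue;                 /* not even a case-insensitive match */
        if (strlen(e->d_name) >= outlen)
            continue;                 /* would not fit in `out` */
        strcpy(out, e->d_name);
        rc = 0;
        if (strcmp(e->d_name, name) == 0)
            break;                    /* exact spelling exists: stop here */
    }
    closedir(d);
    return rc;
}
```

To check a relative path such as FOO/BaR.tXt, you would call it once per component (FOO in ".", then BaR.tXt in the directory just found) and compare each result with what the user typed.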

Related

C Programming: How to create a parent directory and insert files manually?

My goal is to, inside my C program, make a new directory. Within this directory, I want to create 15 simple text files. The part that I am stuck on is how to generate the 15 text files inside the newly created directory. I have created the new directory like this:
mkdir("new_dir", 0755);
But I am unsure of how to create a text file within it (in the same program). Any tips for this?

I am guessing you are on some POSIX system. The C11 standard (read n1570) does not know about directories (an abstraction provided by your operating system). If you are on Windows, it has a different WinAPI (you should then use CreateDirectory)
First, your call to mkdir(2) could fail for a variety of reasons, including the directory already existing. And very probably you actually want to create the directory in the home directory, or document that you are creating it in the current working directory (e.g. leave it to your user to cd into an appropriate directory beforehand). Practically speaking, the directory path should be computed at runtime as a string (perhaps using snprintf(3) or asprintf(3)).
So if you wanted to create a directory in the home directory of the user (remember that ~/foo/ is expanded by the shell, not by mkdir, see glob(7); from C you need to fetch the home directory from the environment, see environ(7)), you would code something like:
char pathbuf[256];
snprintf(pathbuf, sizeof(pathbuf), "%s/new_dir", getenv("HOME"));
to compute that string. Of course, you need to handle failure (of getenv(3), or of snprintf). I am leaving these checks as an exercise. You might want to keep the result of getenv("HOME") in some automatic variable.
Then you need to make the directory, and check against failure. At the very least (using perror(3) and see errno(3)):
if (mkdir (pathbuf, 0750)) { perror(pathbuf); exit(EXIT_FAILURE); }
BTW, the mode passed to mkdir should not let every other user write to or access the directory (if it did, you could have a security vulnerability). So I prefer 0750 to your 0755.
Finally, you need to create the files inside it, perhaps using fopen(3), and then write into them. So some code like
int i = somenumber();
snprintf(pathbuf, sizeof(pathbuf),
"%s/new_dir/file%d.txt", getenv("HOME"), i);
FILE* f = fopen(pathbuf, "w");
if (!f) { perror(pathbuf); exit(EXIT_FAILURE); }
fprintf(f, "contents of file %d\n", i);
fclose(f);
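Putting those pieces together for the original goal (one new directory holding 15 text files), a minimal sketch might look like this. make_dir_with_files() is a hypothetical name; it takes the parent directory as a parameter instead of calling getenv("HOME") itself, so the caller decides where the directory goes, and it uses the 0750 mode preferred above.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create `<base>/new_dir` and the files
 * file1.txt .. file<count>.txt inside it. Returns 0 on success, or -1 on
 * the first failure after printing a message with perror(). */
int make_dir_with_files(const char *base, int count)
{
    char dirpath[512];
    int n = snprintf(dirpath, sizeof dirpath, "%s/new_dir", base);
    if (n < 0 || (size_t)n >= sizeof dirpath) {
        fprintf(stderr, "path too long\n");
        return -1;
    }
    if (mkdir(dirpath, 0750) != 0) {
        perror(dirpath);              /* e.g. EEXIST if it already exists */
        return -1;
    }
    for (int i = 1; i <= count; i++) {
        char filepath[600];
        n = snprintf(filepath, sizeof filepath, "%s/file%d.txt", dirpath, i);
        if (n < 0 || (size_t)n >= sizeof filepath) {
            fprintf(stderr, "path too long\n");
            return -1;
        }
        FILE *f = fopen(filepath, "w");
        if (f == NULL) {
            perror(filepath);
            return -1;
        }
        fprintf(f, "This is file number %d\n", i);
        if (fclose(f) != 0) {         /* buffered write errors show up here */
            perror(filepath);
            return -1;
        }
    }
    return 0;
}
```

A caller could then do: const char *home = getenv("HOME"); if (home == NULL || make_dir_with_files(home, 15) != 0) exit(EXIT_FAILURE);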
As Jonathan Leffler wisely commented, there are other ways.
My recommendation is to document some convention. Do you want your program to create a directory in the current working directory, or to create it in some "absolute" path, perhaps related to the home directory of your user? If your program is started by some user (and is not setuid or doesn't have root permissions, see credentials(7)) it is not permitted to create directories or files at arbitrary places (see hier(7)).
If you are on Linux, you'd better read a system programming book like ALP or a newer one. If you use a different OS, you should read the documentation of its system API.

Does file type rely on file extension?

As a general question: What's the role of file extension when determining file types?
For example, I can change a .jpeg file's extension to .png or even .txt. Of course, after changing it to .txt, it will neither be opened as a picture nor be readable.
To determine the file type, it seems the safe way is to parse the first few bytes of the file. If the extension can't be trusted, it is no more than part of the file name.

As a general rule, you should ALWAYS parse the COMPLETE file in order to be sure that the file is what the extension says. As you can easily imagine, it is pretty simple to create a binary file resembling, e.g., a BMP (with a correct header) but containing something different.
You should trust neither the extension nor the header, because otherwise a malicious user could exploit your code, e.g. to trigger a buffer overflow; this is absolutely paramount if you are writing programs that must run with root/admin privileges.
Having said the obvious, the file extension nowadays is mainly used so that the OS can associate a program to that particular file (usually calling the program and passing the selected file as first parameter), and then it's up to the program to determine the file content.
It is a little bit different when talking about executable files. Under Unix, a file has to have the "x" flag set in order to be executable; otherwise it will not run, regardless of the extension. Under Windows, there is no such flag, and the OS relies on a few extensions (EXE, COM, BAT, etc.) to determine which files can be executed.
The EXE file, for example, has to start with "MZ" followed by some information about its allocation and size (http://www.delorie.com/djgpp/doc/exe/), and the OS does check those headers. Other formats (e.g. the COM executable format of the MS-DOS era) are just raw machine code with no header, so the OS does no check: it loads the code and runs it, hoping that everything will be fine.
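On the Unix side, whether a file carries the "x" flag can be read from its mode bits with stat(), without looking at its name at all. A minimal sketch: has_exec_bit() is a hypothetical helper that only inspects permission bits, whereas access(path, X_OK) would instead ask whether the current process may actually execute the file.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: return 1 if `path` is a regular file with at least
 * one execute bit (owner, group or other) set, 0 if not, and -1 if
 * stat() fails. The file name and extension play no part in the check. */
int has_exec_bit(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))
        return 0;                     /* directories also use x, for search */
    return (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}
```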
So, to summarize:
File extension is mainly used so that the OS can call the appropriate program to open it (passing the filename as the first parameter, e.g. via argc/argv in C)
Windows relies on some file extensions to know if a file is executable, while Unix/Mac relies on a particular flag (x) associated with the file
Two things that are not well known about file extensions: directory names can have extensions too, and extensions can be way longer than the usual 3 characters.
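As a starting point for header-based detection, here is a sketch that compares the first bytes of a file against a few well-known signatures (PNG, JPEG, GIF, BMP, and the "MZ" of DOS/Windows executables). sniff_type() is a hypothetical helper and the list of formats is deliberately tiny; as the answer stresses, a matching header only says what the file claims to be, so the rest of the file still has to be parsed defensively.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: guess a file's type from its first bytes ("magic
 * numbers"), ignoring the extension. Returns a short label, "unknown",
 * or NULL if the file cannot be opened. */
const char *sniff_type(const char *path)
{
    unsigned char buf[8] = {0};
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    static const unsigned char png_sig[8] =
        {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    if (n >= 8 && memcmp(buf, png_sig, 8) == 0)
        return "png";
    if (n >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return "jpeg";
    if (n >= 6 && (memcmp(buf, "GIF87a", 6) == 0 ||
                   memcmp(buf, "GIF89a", 6) == 0))
        return "gif";
    if (n >= 2 && buf[0] == 'B' && buf[1] == 'M')
        return "bmp";
    if (n >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "exe (MZ)";
    return "unknown";
}
```

A PNG renamed to picture.txt is still reported as "png", which is exactly the point of looking at the bytes instead of the name.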

With the help of file extension, you know how to read the first few and all the rest of the bytes. You also know what program to use to read the file. Or if it is an executable, you know that it is to be executed and not shown as a picture.
Yes, you can change the file extension, but what does that mean? The OS (or any program that tries to read the file) is still working correctly; you are simply providing bad data to it.
File extension is not something that some bytes of data inherently have. Extensions are given to those bytes depending upon the protocol followed to write them that way. After you have encoded the letters in binary form, you provide that binary form with .txt extension so that the text reader knows that these bytes convert to letters. That's the role of file extension. With bad file extension, this role is not fulfilled, resulting in incomprehension of the data you saved in binary.

As a general question: What's the role of file extension when determining file types?
The file extension usually identifies the application that opens a file.
If you rename a .JPG to a .PNG and the same application (usually an image viewer) opens both JPG and PNG files, that application can read the image stream and process it correctly regardless of the incorrect extension.
The problem arises if you rename the file in such a way that the file gets routed to an application that cannot handle the file's content.
If you rename a .DOCX (Word) file to an AutoCAD extension (.DWG), opening the Word file in AutoCAD is likely to produce errors (unless by chance AutoCAD can read Word files).

Copying a directory using sockets

I'm writing a program in C that sends files across the network using sockets. This works fine for files - they are read into a buffer and then written onto the socket. They are picked up at the other end by reversing this process.
However, how can this apply to directories? I also want to copy directories, keeping the permissions the same (so I don't think mkdir will work). At the moment when I try to run this on a directory, it says the size is -1. How is a directory represented?
To be clear, for example, if I want my program to copy /tmp across the network, it will do this:
/tmp/1.txt - OK
/tmp/2.txt - OK
/tmp/dir/ - Skip
/tmp/dir/3.txt - Can't write to path

There are several possibilities. It would fit fairly well with what you already have to tar the directory you want to transfer, send the resulting archive across the network, and untar it on the other side.
Alternatively, you can walk the directory tree recursively. For each directory you need transfer only the name and whichever attributes you want to preserve, but then you must list the directory contents (probably via readdir()) and transfer each member.
By the way, don't neglect to think about how you're going to handle links, both symbolic ones and hard ones. And if you want your program to be really robust then consider also what to do with special files such as device files and FIFOs.

I guess it is homework; otherwise, why not use FTP, scp, rsync, unison, etc.?
To test whether a file path is a plain file, a device, a directory, etc., use stat(2).
To read a directory, use opendir(3) then loop on readdir(3) (then of course closedir). You don't need to know how a directory is represented.
You probably should be interested in nftw(3) to recursively traverse a file tree.
To make one directory, use mkdir(2)
You should read Advanced Linux Programming
BTW, this answer contains useful information too...
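A minimal sketch of the traversal side with nftw(3), assuming a POSIX system. It walks the tree in pre-order (each directory is reported before its contents, which is the order a receiver needs to recreate it), does not follow symbolic links (FTW_PHYS), and separates regular files from other objects such as FIFOs and device nodes. walk_tree() and the counters are hypothetical; where this sketch prints, a real sender would transmit the path, type and mode, plus the contents for regular files.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* nftw() passes no user-data pointer, so file-scope counters are the
 * usual workaround. */
static int n_dirs, n_files, n_other;

/* Called once per entry, parents before children. */
static int visit(const char *fpath, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_D) {
        n_dirs++;
        printf("dir  %04o %s\n", (unsigned)(sb->st_mode & 07777), fpath);
    } else if (typeflag == FTW_F && S_ISREG(sb->st_mode)) {
        n_files++;                    /* FTW_F also covers FIFOs, devices */
        printf("file %04o %s\n", (unsigned)(sb->st_mode & 07777), fpath);
    } else {
        n_other++;                    /* FTW_SL, FTW_DNR, FTW_NS, specials */
        printf("other     %s\n", fpath);
    }
    return 0;                         /* 0 means keep walking */
}

/* Hypothetical entry point: walk `root` without following symlinks.
 * Returns nftw()'s result: 0 on success, -1 on error. */
int walk_tree(const char *root)
{
    n_dirs = n_files = n_other = 0;
    return nftw(root, visit, 16, FTW_PHYS);
}
```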

How to obtain a file name from the standard FILE structure?

What I want:
void printFname(FILE * f)
{
char buf[255];
MagicFunction(f,buf);
printf("File name: %s",buf);
}
So, all I need is "MagicFunction", but unfortunately I haven't found such a function...
Is there any way to implement using an OS library? (windows.h , cocoa.h, posix.h etc.)

There is no such function. There may be no filename, or more than one filename that correspond with the FILE *. On Unix, a program can continue to have a reference to a file after it has been renamed or deleted, which could mean that you have a FILE * with no name. Or more hard links may be made to the file, which means a file can have multiple names; which one would you choose? To further confuse things, a file can be temporarily hidden, by mounting a filesystem over a directory containing that file. The file will still be on disk, at its original pathname, but the file will be inaccessible at that path because the mount is obscuring it.
It's also possible that the FILE * never corresponded to a file on the filesystem at all; while they usually do, you can create one from any file descriptor using fdopen(), and that file descriptor may be a pipe, socket, or other file-like object that has never had a path on the disk. In some versions of the C library, you can open a string stream (for instance, fmemopen() in glibc), so the FILE * actually just corresponds to a memory buffer.
If you care about the name, it's best to just keep track of what it was named when you opened the file.
There are some hacky ways to approximate getting the filename; if you're just using this for debugging or informational purposes, then they may be sufficient. Most of these will require operating on the file descriptor rather than the FILE *, as the file descriptor is the lower level way of referring to a file. To get the file descriptor, run fileno() on the FILE *, and remember to check for errors in case there is no file descriptor associated with that FILE *.
On Linux, you can call readlink() on "/proc/self/fd/N", where N is the file descriptor. That shows the path the kernel currently associates with the descriptor (it follows renames, and " (deleted)" is appended if the file has been unlinked), or a string indicating what other kind of object the descriptor refers to, like a socket or inotify handle. FreeBSD and NetBSD have Linux emulation layers, which include emulation of Linux-style procfs; you may be able to do this on those if you mount a Linux-compatible procfs, though I don't have them available for testing.
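A minimal Linux-only sketch of the /proc approach, starting from the FILE * via fileno(). filename_from_file() is a hypothetical name, and, as said above, the result is best treated as informational only.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): ask procfs which path the kernel
 * associates with the descriptor behind `f`. Returns 0 and fills `buf`
 * on success, -1 on failure (no descriptor, no /proc, buffer too small). */
int filename_from_file(FILE *f, char *buf, size_t len)
{
    int fd = fileno(f);
    if (fd < 0 || len == 0)
        return -1;

    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    /* readlink() does not NUL-terminate, and a result that fills the
     * whole buffer may have been truncated, so treat that as failure. */
    ssize_t n = readlink(link, buf, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;
    buf[n] = '\0';
    return 0;
}
```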
On Mac OS X, you don't have /proc/self/fd. If you don't care about finding the original filename, but some other filename that refers to the file would work (such that you could pass it to another program), you can construct one: /.vol/deviceid/inode. For example, /.vol/234881030/281363. To get those values, run fstat() on the file descriptor, and use st_dev and st_ino on the resulting struct stat.
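Building that /.vol path is just a matter of formatting two fields from fstat(). A sketch under the assumption that the target macOS system provides /.vol as described above; vol_path_from_file() is a hypothetical name, and on other systems the string is still produced but will not resolve to anything.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: build "/.vol/<st_dev>/<st_ino>" for the file open
 * as `f`. Returns 0 on success, -1 if fstat() fails or `buf` is too small. */
int vol_path_from_file(FILE *f, char *buf, size_t len)
{
    struct stat st;
    int fd = fileno(f);
    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    int n = snprintf(buf, len, "/.vol/%llu/%llu",
                     (unsigned long long)st.st_dev,
                     (unsigned long long)st.st_ino);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```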
On Windows, files and the filesystem work quite differently than Unix. Apparently it's possible to map a file back to its name on Windows. As of Windows Vista, you can simply call GetFinalPathNameByHandle(). This takes a HANDLE; to get the HANDLE from the file descriptor, call _get_osfhandle(). Prior to Windows Vista, you need to do a little more work, as described in this article. Note that on Windows fileno() is named _fileno(), though the former may work with a warning.
Going even further into hacky territory, there are a few more techniques that you could use. You could shell out to lsof, or you could extract the code it uses to resolve pathnames. lsof actually looks directly in kernel memory, extracting information from the kernel's name cache. This has several limitations, outlined in the lsof FAQ. And of course, you need root or equivalent privileges to do this, either directly or with an suid/sgid binary.
And finally, for a portable but slow solution for finding one or more filenames matching an open file, you could find the device and inode number using fstat() on the file descriptor, and then recursively traverse the filesystem stat()ing every file, until you find a file with matching device and inode number. Remember the caveats I mention above; you may find no matching files, more than one matching file, and even if you don't find any matching files, the file might still be there, but hidden by a mount point. And of course, there may be race conditions; something may rename the file in such a way that you never see it while traversing the hierarchy.
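The slow search can be sketched with fstat() plus nftw(3), which is a POSIX XSI function rather than ISO C. find_name_by_inode() and its file-scope state are hypothetical; the sketch stops at the first match and inherits every caveat just listed (no match, several names, mounts, races).

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Search state: nftw() has no user-data argument. */
static dev_t want_dev;
static ino_t want_ino;
static char found_path[4096];

static int match(const char *fpath, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_NS)
        return 0;                     /* stat failed: sb is not valid */
    if (sb->st_dev == want_dev && sb->st_ino == want_ino) {
        snprintf(found_path, sizeof found_path, "%s", fpath);
        return 1;                     /* non-zero stops the walk */
    }
    return 0;
}

/* Hypothetical helper: look under `root` for a name of the file open as
 * `f`. Returns 1 (found_path filled in), 0 if nothing matched, -1 on error. */
int find_name_by_inode(FILE *f, const char *root)
{
    struct stat st;
    int fd = fileno(f);
    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    want_dev = st.st_dev;
    want_ino = st.st_ino;
    found_path[0] = '\0';

    int rc = nftw(root, match, 16, FTW_PHYS);  /* do not follow symlinks */
    if (rc == 1)
        return 1;
    return rc == 0 ? 0 : -1;
}
```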

There is no such standard function.

Do you call fopen() yourself? If so, maintain a FILE * to filename hash table yourself.
Otherwise, it's not possible in general.

I don't think there is such a function, even in windows.h, cocoa.h or unistd.h.
Most probably you will have to write it yourself. Just make a
struct myFile {
FILE *fh;
char *filename;
};
and keep such structures in an array of struct myFile; then, in MagicFunction(f,b), walk the array looking for the entry whose fh equals f.
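A fleshed-out version of that idea, with wrappers around fopen()/fclose() so the table stays in sync. my_fopen(), my_fclose() and the fixed-size table are hypothetical choices, and this MagicFunction() takes an extra buffer-size argument that the question's version lacks; a hash table keyed on the FILE * would scale better than this linear scan.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct myFile {
    FILE *fh;
    char *filename;
};

#define MAX_OPEN 64
static struct myFile open_files[MAX_OPEN];   /* zero-initialized: all free */

/* fopen() wrapper that remembers the name it was given. */
FILE *my_fopen(const char *name, const char *mode)
{
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_files[i].fh != NULL)
            continue;
        char *copy = malloc(strlen(name) + 1);
        if (copy == NULL)
            return NULL;
        FILE *f = fopen(name, mode);
        if (f == NULL) {
            free(copy);
            return NULL;
        }
        strcpy(copy, name);
        open_files[i].fh = f;
        open_files[i].filename = copy;
        return f;
    }
    return NULL;                      /* table full */
}

/* Look f up by address and copy its recorded name into buf. */
int MagicFunction(FILE *f, char *buf, size_t len)
{
    if (f == NULL)
        return -1;                    /* would otherwise match a free slot */
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_files[i].fh == f) {
            snprintf(buf, len, "%s", open_files[i].filename);
            return 0;
        }
    }
    return -1;
}

/* fclose() wrapper that forgets the entry. */
int my_fclose(FILE *f)
{
    if (f == NULL)
        return EOF;
    for (int i = 0; i < MAX_OPEN; i++) {
        if (open_files[i].fh == f) {
            free(open_files[i].filename);
            open_files[i].fh = NULL;
            open_files[i].filename = NULL;
            break;
        }
    }
    return fclose(f);
}
```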

Tcl determine file name from browser upload

I have run into a problem in one of my Tcl scripts where I am uploading a file from a Windows computer to a Unix server. I would like to get just the original file name from the Windows file and save the new file with the same name. The problem is that [file tail windows_file_name] does not work: it returns the whole file name, like "c:\temp\dog.jpg", instead of just "dog.jpg". File tail works correctly on a Unix file name like "/usr/tmp/dog.jpg", so for some reason it is not detecting that the file name is in Windows format. However, Tcl on my Windows computer works correctly for either name format. I am using Tcl 8.4.18; maybe it is too old? Is there another trick to get it to split correctly?
Thanks

The problem here is that on Windows, both \ and / are valid path separators as far as the Windows API is concerned (even though only \ is deemed "official"). On the other hand, in POSIX, the only valid path separator is /, and the only two bytes which can't appear in a pathname component are / and \0 (a byte with value 0).
Hence, on a POSIX system, "C:\foo\bar.baz" is a perfectly valid short filename, and running
file normalize {C:\foo\bar.baz}
would yield /path/to/current/dir/C:\foo\bar.baz. By the same logic, [file tail $short_filename] is the same as $short_filename.
The solution is to either do what Glenn Jackman proposed or to somehow pass the short name from the browser via some other means (some JS bound to an appropriate file entry?). Also you could attempt to detect the user's OS from the User-Agent header.
To make Glenn's idea more agnostic to user's platform, you could go like this:
Scan the file name for "/".
If none found, do set fname [string map {\\ /} $fname] then go to the next step.
Use [file tail $fname] to extract the tail name.
It's not very bullet-proof, but supposedly better than nothing.
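For comparison with the C code elsewhere on this page, the same heuristic could be written in C (upload_tail() is a hypothetical name): if the name contains a /, split on it; otherwise treat \ as the separator. Like the Tcl version, it can misfire on a POSIX filename that legitimately contains a backslash.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of an
 * uploaded file name. '/' is preferred as the separator; only if there
 * is none is '\\' used, as for a Windows-style path. */
const char *upload_tail(const char *name)
{
    const char *sep = strrchr(name, '/');
    if (sep == NULL)
        sep = strrchr(name, '\\');
    return sep ? sep + 1 : name;
}
```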

You could always do [lindex [split $windows_file_name \\] end]
