Spaces in the filename - filesystems

Do any file systems care about spaces at the beginning or end of a file or directory name?
Do they (the file systems) convert "/ directory /" to "/directory/" when creating a file?
English is not my native language, so I apologize for any mistakes.

Yes they do care.
For instance in Linux Ext3 / Ext4:
touch "file1"
touch " file1"
touch "file1 "
This will create three different files: one without spaces, one with a leading space, and one with a trailing space.
The same holds for directories, since Linux follows the Unix principle that everything is a file.
Windows file naming rules advise against using trailing spaces in file or directory names, even though the underlying filesystem may support them.
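The same experiment can be run from C instead of the shell. A minimal sketch, assuming a POSIX system with a writable /tmp on a filesystem such as ext4; the function name count_space_variants is hypothetical. It creates the three names in a fresh temporary directory and counts the resulting entries:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates "file1", " file1" and "file1 " in a fresh
   temporary directory and returns how many entries (excluding "." and "..")
   exist afterwards, or -1 on error. Cleans up after itself. */
int count_space_variants(void) {
    char dir[] = "/tmp/spacesXXXXXX";
    const char *names[] = {"file1", " file1", "file1 "};
    char path[256];
    int count = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    for (int i = 0; i < 3; i++) {
        snprintf(path, sizeof path, "%s/%s", dir, names[i]);
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fclose(f);
    }

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            count++;
    }
    closedir(d);

    /* Remove the files and the directory again. */
    for (int i = 0; i < 3; i++) {
        snprintf(path, sizeof path, "%s/%s", dir, names[i]);
        remove(path);
    }
    rmdir(dir);
    return count;
}
```

On a filesystem that preserves leading and trailing spaces the function should return 3; one that stripped them would report fewer entries.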

Related

Where does fopen() search for File to read?

The question is self-descriptive. I just want to know the search range of fopen() in:
a) Windows
b) Unix-like systems like MacOS & Linux
The case I mean is opening a file for reading, reading and writing, or just writing with a relative path, i.e. "File.txt". I need an answer that covers both text and binary files (if they differ at all in this regard).
Does it scan only the current directory, or does it scan particular folders?
(Scanning the full disk would be painfully slow, right?)
Edit:
Why the downvotes? Because y'all simply don't know?
fopen() doesn't scan at all
It just opens the file you tell it to open.
The path is either absolute, or relative to the current directory.
The behaviour is pretty much the same across platforms.
Of course in Windows paths look a bit different (drive letters, backslashes instead of slashes).
One relevant difference I can think of:
If the path starts with a drive letter and a colon, it will look at another drive.
If there is no backslash after the drive letter and colon, then the location will be relative to that drive's current working directory (as Windows remembers a current directory per drive letter).
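A small POSIX sketch of this behaviour (the helper names are hypothetical, and it assumes a writable /tmp): it creates File.txt in a fresh temporary directory, then tries to open the bare relative name before and after chdir() into that directory. Only the current working directory is consulted; the mode ("r" here, "rb" for binary) does not change how the path is resolved.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if fopen() can open `name` for reading, 0 otherwise. */
static int can_open(const char *name) {
    FILE *f = fopen(name, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical demo: creates File.txt in a fresh temporary directory and
   records whether the relative name "File.txt" can be opened before and
   after chdir() into that directory. Restores the original working
   directory and removes the temporary files.
   Returns 0 on success, -1 if setup fails. */
int demo_relative_fopen(int *before, int *after) {
    char start[4096];
    char dir[] = "/tmp/fopenXXXXXX";
    char path[sizeof dir + 16];

    if (getcwd(start, sizeof start) == NULL || mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/File.txt", dir);
    FILE *f = fopen(path, "w");      /* absolute path: no ambiguity */
    if (f == NULL)
        return -1;
    fclose(f);

    *before = can_open("File.txt");  /* resolved against `start` only */
    if (chdir(dir) != 0)
        return -1;
    *after = can_open("File.txt");   /* now resolves to dir/File.txt */
    if (chdir(start) != 0)
        return -1;

    remove(path);
    rmdir(dir);
    return 0;
}
```

Expect before to be 0 (unless the starting directory happens to contain its own File.txt) and after to be 1: no other folders are searched.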

Auto detect OS in C and handle with their specific line breaks

Is there a way to detect the OS the C code is compiled on, so I can handle its specific line break characters in text files? For example, if I compile my code on a Windows machine it should use \r\n as the line break in text files; on Linux it should use just \n.
I need this for a program that reads text files in binary mode and matches substrings of the file against other strings. This should work on Windows and Linux.
Thanks for your help!
You don't need to know the native storage format. When reading a file, you cannot know whether it was created on Windows, Linux, or another system; it may well have been created on a different system than the one you are working on. When writing in text mode, your program uses the native C library for your OS, which outputs whatever it deems appropriate for \n.
Reading a text file line-ending agnostically comes down to this:
use a binary mode rather than "text mode" (you seem to already do this).
read text until you encounter either an \r or an \n.
if you get an \r, skip any \n characters that immediately follow it;
if you get an \n, skip any \r characters that immediately follow it.
This will work for line endings of \n (Linux and other Unix-like OSes such as Mac OS X), Windows-like \r\n and older Mac OS files ending with \r only. That covers about 99.99% of all "normal" text files you are likely to encounter. There used to be a very rare one that used \r\n\n (or possibly \n\r\r) but even that will be handled correctly.
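The rule above can be sketched in C as a function that rewrites any of these line endings to a single \n, so the buffer can then be searched with ordinary string functions. The name normalize_newlines is hypothetical, and it assumes the file has already been read into memory in binary mode (e.g. with fopen(path, "rb") and fread()).

```c
#include <stddef.h>

/* Hypothetical helper: copies `len` bytes from `in` to `out`, replacing
   every line ending with a single '\n'. After a '\r', any run of '\n'
   that follows is skipped; after a '\n', any run of '\r' is skipped.
   The output is never longer than the input, and because each byte is
   written only after it has been read, `out` may be the same buffer as
   `in`. Returns the number of bytes written. */
size_t normalize_newlines(const char *in, size_t len, char *out) {
    size_t i = 0, o = 0;
    while (i < len) {
        char c = in[i++];
        if (c == '\r' || c == '\n') {
            char skip = (c == '\r') ? '\n' : '\r';
            while (i < len && in[i] == skip)
                i++;
            out[o++] = '\n';
        } else {
            out[o++] = c;
        }
    }
    return o;
}
```

For example, "a\r\nb\r\n" (Windows), "a\rb\r" (old Mac OS) and "a\nb\n" (Unix) all come out as "a\nb\n", while an empty line in a Unix file ("a\n\nb") is preserved.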
The best way would be to check for a predefined macro and #ifdef on it.
You can print all the predefined macros using the command
gcc -dM -E - < /dev/null
and grep (case-insensitively) for "linux" or "win32".
I'd expect to find __linux__ defined on Linux machines and _WIN32 defined on Windows machines.
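If you do want to branch at compile time, a minimal sketch using those macros (the function names are hypothetical) could look like this. Keep in mind it reports the platform the program was compiled for, not where a given file came from, so input files still need the line-ending-agnostic reading described in the previous answer.

```c
/* Conventional line ending of the platform being compiled for.
   _WIN32 is predefined by Windows compilers (MSVC, MinGW);
   __linux__ by GCC and Clang when targeting Linux. */
const char *native_newline(void) {
#if defined(_WIN32)
    return "\r\n";
#else
    return "\n";   /* Linux, macOS and other Unix-like systems */
#endif
}

/* Hypothetical helper naming the target platform. */
const char *platform_name(void) {
#if defined(_WIN32)
    return "Windows";
#elif defined(__linux__)
    return "Linux";
#elif defined(__APPLE__)
    return "Apple";
#else
    return "unknown";
#endif
}
```

Note that a file opened in text mode on Windows already has \n translated to \r\n on output, so native_newline() is mainly useful when writing in binary mode.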

Changing backslashes to forward slashes changes file size

I have two small to medium sized files (2k) that are for all intents and purposes identical. The second file is the result of the first file being duplicated and replacing backslashes with forward slashes. The new file is bigger by 80 bytes (or one byte per line).
I did this with a simple batch script, and at first I thought the script might have unintentionally added some spaces or other artifacts. Or maybe the fact that their extensions are different has something to do with it (one has a tmp extension and the other has a lst extension).
From an editor, I replaced all forward slashes in the new file with backslashes and saved it without changing the extension.
And, hey guess what? The files were the same size again.
Now, before this is written off as a random fluke, I also see the same behavior exhibited in three other pairs of files (in other words six files) created in the same manner as the first. They are all one byte bigger per line in the file. The largest is about 12k bytes, and the smallest is about 2k.
I wouldn't think it has anything to do with escaping because I am on a Windows box using the Windows 7 cmd.exe shell.
Also one other thing. I tried the following:
echo \\\\\ >> a.txt
echo ///// >> b.txt
The files matched in size (7 bytes)
Does anyone have an explanation for this behavior?
I would suggest opening the files with an editor like Notepad++ that shows the type of linefeed (Windows/Mac/Unix). This is most likely your problem if the file size differs 1 byte per line.
Notepad++ can show line endings as small CR/LF symbols (View -> Show Symbol -> Show End of Line) and convert between the Windows/Mac/Unix line endings (Edit -> EOL Conversion).
Unix and classic Mac OS systems usually store files with a one-byte line ending (classic Mac OS: CR, Unix: LF), while Windows uses two bytes (CR LF).
Depending on the programs your batch scripts use, this might occur even though your system is a pure Windows box. The reason you don't get a difference when using an editor is that editors usually keep the file's original line endings.
Okay, I just solved it. @schnaader pointed me in the right direction. It actually has nothing to do with the forward or backslashes.
What happened is that my script added one character of trailing whitespace to each line. The file became the same size again after I reverted the slashes because the editor I used for find and replace (Komodo Edit) is set up to automatically trim trailing whitespace on save.
Funny.
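Both explanations in this thread, CR LF versus LF line endings and trailing whitespace, add one byte per line, and both are easy to check mechanically. A minimal sketch (the struct and function names are hypothetical) that tallies line-ending styles and lines with trailing spaces or tabs in a buffer read in binary mode:

```c
#include <stddef.h>

/* Hypothetical per-file counters. */
struct line_stats {
    size_t crlf;         /* lines ending in "\r\n" (Windows) */
    size_t lf;           /* lines ending in a bare "\n" (Unix) */
    size_t cr;           /* lines ending in a bare "\r" (classic Mac OS) */
    size_t trailing_ws;  /* lines with a space or tab just before the ending */
};

struct line_stats scan_lines(const char *buf, size_t len) {
    struct line_stats s = {0, 0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\r' && buf[i] != '\n')
            continue;
        /* Look at the character just before this line ending. */
        if (i > 0 && (buf[i - 1] == ' ' || buf[i - 1] == '\t'))
            s.trailing_ws++;
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            s.crlf++;
            i++;  /* consume the '\n' of the CR LF pair */
        } else if (buf[i] == '\r') {
            s.cr++;
        } else {
            s.lf++;
        }
    }
    return s;
}
```

Comparing the counts for the two files would show whether the extra byte per line comes from the line endings or from trailing whitespace.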

Tcl determine file name from browser upload

I have run into a problem in one of my Tcl scripts where I am uploading a file from a Windows computer to a Unix server. I would like to get just the original file name from the Windows file and save the new file with the same name. The problem is that [file tail windows_file_name] does not work; it returns the whole file name, like "c:\temp\dog.jpg", instead of just "dog.jpg". File tail works correctly on a Unix file name such as "/usr/tmp/dog.jpg", so for some reason it is not detecting that the file name is in Windows format. However, Tcl on my Windows computer works correctly for either name format. I am using Tcl 8.4.18, so maybe it is too old? Is there another trick to get it to split correctly?
Thanks
The problem here is that on Windows both \ and / are valid path separators as far as the Windows API is concerned (even though only \ is deemed "official" on Windows). In POSIX, on the other hand, the only valid path separator is /, and the only two bytes that can't appear in a pathname component are / and \0 (a byte with value 0).
Hence, on a POSIX system, "C:\foo\bar.baz" is a perfectly valid short filename, and running
file normalize {C:\foo\bar.baz}
would yield /path/to/current/dir/C:\foo\bar.baz. By the same logic, [file tail $short_filename] is the same as $short_filename.
The solution is to either do what Glenn Jackman proposed or to somehow pass the short name from the browser via some other means (some JS bound to an appropriate file entry?). Also you could attempt to detect the user's OS from the User-Agent header.
To make Glenn's idea more agnostic to user's platform, you could go like this:
Scan the file name for "/".
If none is found, do set fname [string map {\\ /} $fname], then go to the next step.
Use [file tail $fname] to extract the tail name.
It's not very bullet-proof, but supposedly better than nothing.
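For illustration, the same steps written in C (upload_tail is a hypothetical name; in Tcl itself you would use string map and file tail as described). It is no more bullet-proof than the Tcl version: for example, a drive-relative name like "c:dog.jpg" contains no separator and is returned unchanged.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the final component of an
   uploaded file name. If the name contains no '/', it is assumed to be a
   Windows-style name and '\\' is used as the separator; otherwise only
   '/' is used, as on POSIX. */
const char *upload_tail(const char *name) {
    char sep = (strchr(name, '/') != NULL) ? '/' : '\\';
    const char *last = strrchr(name, sep);
    return (last != NULL) ? last + 1 : name;
}
```

Both "c:\temp\dog.jpg" and "/usr/tmp/dog.jpg" yield "dog.jpg".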
You could always do [lindex [split $windows_file_name \\] end]

What are reserved filenames for various platforms?

I'm not asking about general syntactic rules for file names. I mean gotchas that jump out of nowhere and bite you. For example, trying to name a file "COM<n>" on Windows?
From: http://www.grouplogic.com/knowledge/index.cfm/fuseaction/view_Info/docID/111.
The following characters are invalid as file or folder names on Windows using NTFS: / ? < > \ : * | " and any character you can type with the Ctrl key.
In addition to the above illegal characters the caret ^ is also not permitted under Windows Operating Systems using the FAT file system.
Under Windows using the FAT file system, file and folder names may be up to 255 characters long.
Under Windows using the NTFS file system, file and folder names may be up to 255 characters long.
Under Windows, the maximum length of a full path under both file systems is 260 characters (MAX_PATH).
In addition to these characters, the following conventions are also illegal:
Placing a space at the end of the name
Placing a period at the end of the name
The following file names are also reserved under Windows:
aux,
com1,
com2,
...
com9,
lpt1,
lpt2,
...
lpt9,
con,
nul,
prn
Full description of legal and illegal filenames on Windows: http://msdn.microsoft.com/en-us/library/aa365247.aspx
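A minimal C sketch of a check against the reserved device names listed above (is_windows_reserved is a hypothetical name). Following Microsoft's naming guidance, it also rejects these names when followed by an extension, such as NUL.TXT. It does not validate the forbidden characters or the trailing space/period rules.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` (a single path component) is
   CON, PRN, AUX, NUL, COM1-COM9 or LPT1-LPT9, compared case-insensitively
   and ignoring anything from the first '.' onward; 0 otherwise. */
int is_windows_reserved(const char *name) {
    static const char *const fixed[] = {"CON", "PRN", "AUX", "NUL"};
    char base[8];
    size_t n = strcspn(name, ".");   /* length up to the first '.' */

    if (n == 0 || n >= sizeof base)
        return 0;                    /* too long to be a device name */
    for (size_t i = 0; i < n; i++)
        base[i] = (char)toupper((unsigned char)name[i]);
    base[n] = '\0';

    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++)
        if (strcmp(base, fixed[i]) == 0)
            return 1;
    /* COM1..COM9 and LPT1..LPT9 */
    if (n == 4 && (strncmp(base, "COM", 3) == 0 || strncmp(base, "LPT", 3) == 0)
        && base[3] >= '1' && base[3] <= '9')
        return 1;
    return 0;
}
```

With this check, "nul.txt", "com1" and "LPT9.log" are rejected, while "COM0", "console" and "readme.txt" pass.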
A tricky Unix gotcha when you don't know:
Files which start with - or -- are legal but a pain in the butt to work with, as many command line tools think you are providing options to them.
Many of those tools have a special marker "--" to signal the end of the options:
gzip -9vf -- -mydashedfilename
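The "--" marker comes from the POSIX utility argument conventions, and C programs that parse options with getopt() get it for free. A sketch (first_operand and its -v flag are hypothetical) showing that getopt() stops at "--", so a following -mydashedfilename is treated as an operand rather than an option:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical parser accepting a single "-v" flag. Returns the index in
   argv of the first operand (non-option argument). getopt() treats "--"
   as the end of options and skips over it. */
int first_operand(int argc, char *argv[], int *verbose) {
    int c;
    *verbose = 0;
    optind = 1;   /* start scanning at argv[1] */
    opterr = 0;   /* do not print diagnostics for unknown options */
    while ((c = getopt(argc, argv, "v")) != -1) {
        if (c == 'v')
            *verbose = 1;
    }
    return optind;
}
```

For the arguments prog -v -- -mydashedfilename, the function sets verbose to 1 and returns 3, the index of -mydashedfilename.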
As others have said, device names like COM1 are not possible as filenames under Windows because they are reserved devices.
However, there is an escape method to create and access files with these reserved names, for example, this command will redirect the output of the ver command into a file called COM1:
ver > "\\?\C:\Users\username\COM1"
Now you will have a file called COM1 that 99% of programs won't be able to open, and that will probably make them freeze if they try to access it.
Here's the Microsoft article that explains how this "file namespace" works. Basically it tells Windows not to do any string processing on the text and to pass it straight through to the filesystem. This trick can also be used to work with paths longer than 260 characters.
The boost::filesystem Portability Guide has a lot of good info.
Well, for MSDOS/Windows, NUL, PRN, LPT<n> and CON. They even cause problems if used with an extension: "NUL.TXT"
Unless you're touching special directories, the only illegal names on Linux are '.' and '..'. Any other name is possible, although accessing some of them from the shell requires using escape sequences.
EDIT: As Vinko Vrsalovic said, files starting with '-' and '--' are a pain from the shell, since those character sequences are interpreted by the application, not the shell.
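The Linux rule above, together with the fact that only / and the NUL byte are excluded from a name component, fits in a few lines of C. A sketch with a hypothetical function name; the 255-byte limit is an assumption based on NAME_MAX for ext4 and most common Linux filesystems.

```c
#include <string.h>

/* Hypothetical check: returns 1 if `name` can be used as a single file or
   directory name on a typical Linux filesystem, 0 otherwise. A C string
   cannot contain a NUL byte, so only '/' needs to be looked for.
   The 255-byte limit is an assumed NAME_MAX (ext4 and similar). */
int is_valid_linux_name(const char *name) {
    size_t len = strlen(name);
    if (len == 0 || len > 255)
        return 0;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 0;
    return strchr(name, '/') == NULL;
}
```

Names such as " file1 ", "-mydashedfilename" or even "C:\foo\bar.baz" are all valid by this check; they are merely awkward to type in a shell.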

Resources