Spaces in Directory Names - file

Is putting a space in a directory name still a big deal? I've been doing some reading, but all the articles are from the early 2000s. Is it a problem now?
For those who don't get what I mean: public_html/space directory/index.html
If this is still an issue, why shouldn't I use spaces when naming files and directories?

Spaces in URLs are still special characters that need to be escaped or encoded (either a + or %20).

Well, I am still crossing fingers when executing external processes (from ant or Java's ProcessBuilder for example). If you just pass this dir to the external process within the command - it may break apart in two arguments which is clearly not what you want.
Some quoting and minding the spaces is still required in some usecases.

Related

Changing backslashes to forward slashes changes file size

I have two small to medium sized files (2k) that are for all intents and purposes identical. The second file is the result of the first file being duplicated and replacing backslashes with forward slashes. The new file is bigger by 80 bytes (or one byte per line).
I did this with a simple batch script,and at first I thought the script might have unintentionally added some spaces or other artifacts. Or maybe the fact that their extensions are different has something to do with it (one has a tmp extension and the other has a lst extension).
From an editor, I replaced all forward slashes in the new file with backslashes and saved it without changing the extension.
And, hey guess what? The files were the same size again.
Now, before this is written off as a random fluke, I also see the same behavior exhibited in three other pairs of files (in other words six files) created in the same manner as the first. They are all one byte bigger per line in the file. The largest is about 12k bytes, and the smallest is about 2k.
I wouldn't think it has anything to do with escaping because I am on a Windows box using the Windows 7 cmd.exe shell.
Also one other thing. I tried the following:
echo \\\\\ >> a.txt
echo ///// >> b.txt
The files matched in size (7 bytes)
Does anyone have an explanation for this behavior?
I would suggest opening the files with an editor like Notepad++ that shows the type of linefeed (Windows/Mac/Unix). This is most likely your problem if the file size differs 1 byte per line.
Notepad++ can show line endings as small CR/LF symbols (View -> Show Symbol -> Show End of Line) and convert between the Windows/Mac/Unix line endings (Edit -> EOL Conversion).
Both Unix and Mac systems are usually storing files with an one byte line ending (Mac: CR, Unix: LF), Windows uses two bytes (CR LF).
Depending on the programs your batch scripts use, this might occur even though your system is a pure Windows box. The reason you don't get a difference when using an editor is that editors usually keep the file's original line endings.
Okay. I just solved it. #schnaader pointed me in the right direction. It actually has nothing to do with the forward or backslashes.
What happened is that my script added one character of trailing white space to each line. Why the file again became the same size after I reverted the slashes is because the editor I used to find and replace (Komodo Edit) is set up to automatically trim trailing white space on file save.
Funny.

How do I find out if a file name has any extension in Unix?

I need to find out if a file or directory name contains any extension in Unix for a Bourne shell scripting.
The logic will be:
If there is a file extension
Remove the extension
And use the file name without the extension
This is my first question in SO so will be great to hear from someone.
The concept of an extension isn't as strictly well-defined as in traditional / toy DOS 8+3 filenames. If you want to find file names containing a dot where the dot is not the first character, try this.
case $filename in
[!.]*.*) filename=${filename%.*};;
esac
This will trim the extension (as per the above definition, starting from the last dot if there are several) from $filename if there is one, otherwise no nothing.
If you will not be processing files whose names might start with a dot, the case is superfluous, as the assignment will also not touch the value if there isn't a dot; but with this belt-and-suspenders example, you can easily pick the approach you prefer, in case you need to extend it, one way or another.
To also handle files where there is a dot, as long as it's not the first character (but it's okay if the first character is also a dot), try the pattern ?*.*.
The case expression in pattern ) commands ;; esac syntax may look weird or scary, but it's quite versatile, and well worth learning.
I would use a shell agnostic solution. Runing the name through:
cut -d . -f 1
will give you everything up to the first dot ('-d .' sets the delimeter and '-f 1' selects the first field). You can play with the params (try '--complement' to reverse selection) and get pretty much anything you want.

What corner cases must we consider when parsing $PATH on Linux?

I'm working on a C application that has to walk $PATH to find full pathnames for binaries, and the only allowed dependency is glibc (i.e. no calling external programs like which). In the normal case, this just entails splitting getenv("PATH") by colons and checking each directory one by one, but I want to be sure I cover all of the possible corner cases. What gotchas should I look out for? In particular, are relative paths, paths starting with ~ meant to be expanded to $HOME, or paths containing the : char allowed?
One thing that once surprised me is that the empty string in PATH means the current directory. Two adjacent colons or a colon at the end or beginning of PATH means the current directory is included. This is documented in man bash for instance.
It also is in the POSIX specification.
So
PATH=:/bin
PATH=/bin:
PATH=/bin::/usr/bin
All mean the current directory is in PATH
I'm not sure this is a problem with Linux in general, but make sure that your code works if PATH has some funky (like, UTF-8) encoding to deal with directories with fancy letters. I suspect this might depend on the filesystem encoding.
I remember working on a bug report of some russian guy who had fancy letters in his user name (and hence, his home directory name which appeared in PATH).
This is minor but I'll added it since it hasn't already been mentioned. $PATH can include both absolute and relative paths. If your crawling the paths list by chdir(2)ing into each directory, you need to keep track of the original working directory (getcwd(3)) and chdir(2) back to it at each iteration of the crawl.
The existing answers cover most of it, but it's worth covering parts of the question that wasn't answered yet:
$ and ~ are not special in the value of $PATH.
If $PATH is not set at all, execvp() will use a default value.

server path / vs \

In some documentation, I have gotten the instructions to write
SERVER_PATH\theme\
When I check _SERVER["DOCUMENT_ROOT"] from php info, it's
/storage/content/75/113475/frilansbyran.se/public_html
this renders of course
/storage/content/75/113475/frilansbyran.se/public_html\theme\
this looks really weird to me what's the difference anyway which should I use? (unix-server)
Using backslashes is typically the windows way of paths, eg:
C:\Windows\System32
Forward slashes are usually used in Unix systems ala:
/usr/home/jdoe
If you are using a Unix-server I would stick to the forward slashes (/'s), though many times in in practice the system is smart enough to use either interchangeably
Path separator in Unix is /. The backslash \ is used to escape some special characters (incl. space) in the directory and file names.
The backslash \ is used as a path separator in the Windows world. It is possible, that 'some' documentation uses it.
If you are on Unix, use slash / only.
In *nix, forward slash [/] is used as the directory separator, so I would use that.
Back slashes [\] are used as directory separators on Windows systems.
To add to what others have said:
in URLs on the web, the Unix-style forward-slash / is always used; a backslash won't work in most situations.
in Windows filepaths, the forward-slash is an acceptable alternative to the backslash, so although it's not the normal way of writing it, C:/Foo/Bar will work.
So if in doubt, use the forward slash.

What are reserved filenames for various platforms?

I'm not asking about general syntactic rules for file names. I mean gotchas that jump out of nowhere and bite you. For example, trying to name a file "COM<n>" on Windows?
From: http://www.grouplogic.com/knowledge/index.cfm/fuseaction/view_Info/docID/111.
The following characters are invalid as file or folder names on Windows using NTFS: / ? < > \ : * | " and any character you can type with the Ctrl key.
In addition to the above illegal characters the caret ^ is also not permitted under Windows Operating Systems using the FAT file system.
Under Windows using the FAT file system file and folder names may be up to 255 characters long.
Under Windows using the NTFS file system file and folder names may be up to 256 characters long.
Under Window the length of a full path under both systems is 260 characters.
In addition to these characters, the following conventions are also illegal:
Placing a space at the end of the name
Placing a period at the end of the name
The following file names are also reserved under Windows:
aux,
com1,
com2,
...
com9,
lpt1,
lpt2,
...
lpt9,
con,
nul,
prn
Full description of legal and illegal filenames on Windows: http://msdn.microsoft.com/en-us/library/aa365247.aspx
A tricky Unix gotcha when you don't know:
Files which start with - or -- are legal but a pain in the butt to work with, as many command line tools think you are providing options to them.
Many of those tools have a special marker "--" to signal the end of the options:
gzip -9vf -- -mydashedfilename
As others have said, device names like COM1 are not possible as filenames under Windows because they are reserved devices.
However, there is an escape method to create and access files with these reserved names, for example, this command will redirect the output of the ver command into a file called COM1:
ver > "\\?\C:\Users\username\COM1"
Now you will have a file called COM1 that 99% of programs won't be able to open, and will probably freeze if you try to access.
Here's the Microsoft article that explains how this "file namespace" works. Basically it tells Windows not to do any string processing on the text and to pass it straight through to the filesystem. This trick can also be used to work with paths longer than 260 characters.
The boost::filesystem Portability Guide has a lot of good info.
Well, for MSDOS/Windows, NUL, PRN, LPT<n> and CON. They even cause problems if used with an extension: "NUL.TXT"
Unless you're touching special directories, the only illegal names on Linux are '.' and '..'. Any other name is possible, although accessing some of them from the shell requires using escape sequences.
EDIT: As Vinko Vrsalovic said, files starting with '-' and '--' are a pain from the shell, since those character sequences are interpreted by the application, not the shell.

Resources