Single Quotes or No Quotes in File Paths in Unix Shells

I am new to Unix systems and am trying to learn some things with the help of the terminal. I have the following question: if we can write a file path without single quotes in the terminal (for example: mv path1 path2), then why do we sometimes use single quotes to specify paths? What is the difference between the two?

This is not a question of the operating system, but of the shell you use. You can actually choose which shell you want to use on a Unix-like system if multiple shells are installed (which is usually the case).
In general, the shell has to interpret the input you type. It has to decide how to handle the tokens of the input: what to treat as the "command" you want to execute and what to treat as arguments. For the arguments, it has to decide whether a string is meant as a single argument or as multiple arguments.
Without quotes (single or double), whitespace characters separate words, and each word typically becomes a separate argument. That is how you specify multiple arguments for a single command. If that is not what you want, you can use quote characters to group multiple words separated by whitespace into a single argument, for example a folder name containing a space. This works because the shell then treats everything from the opening quote up to the next matching quote as part of one argument (inside double quotes, an escaped \" does not end the quoted part; inside single quotes, no escaping is possible at all).
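To make the splitting rule concrete, here is a deliberately simplified sketch in C. split_words is a hypothetical helper, not how any real shell is implemented: it handles only blanks, single quotes, double quotes with \" and \\ escapes, and a backslash outside quotes. Real shells also perform expansions and recognize operators such as ; and &&.

```c
#include <stdlib.h>
#include <string.h>

/* Free the first n words (used on error paths). */
static void free_words(char *words[], int n)
{
    for (int i = 0; i < n; i++)
        free(words[i]);
}

/* Toy word splitter (hypothetical, simplified; no expansions, no operators).
 * Stores malloc'd words in out[] and returns how many were found, or -1 on
 * an unterminated quote, too many words, or allocation failure. */
int split_words(const char *s, char *out[], int max_words)
{
    int n = 0;
    for (;;) {
        while (*s == ' ' || *s == '\t')      /* skip separators */
            s++;
        if (*s == '\0')
            return n;                        /* end of input */
        if (n == max_words)
            goto fail;
        char *word = malloc(strlen(s) + 1);  /* a word never exceeds the rest of the input */
        if (word == NULL)
            goto fail;
        out[n++] = word;
        size_t len = 0;
        while (*s != '\0' && *s != ' ' && *s != '\t') {
            if (*s == '\'') {                /* '...': every character is literal */
                for (s++; *s != '\''; s++) {
                    if (*s == '\0')
                        goto fail;           /* unterminated quote */
                    word[len++] = *s;
                }
                s++;                         /* skip the closing quote */
            } else if (*s == '"') {          /* "...": only \" and \\ are escapes here */
                for (s++; *s != '"'; s++) {
                    if (*s == '\0')
                        goto fail;
                    if (*s == '\\' && (s[1] == '"' || s[1] == '\\'))
                        s++;                 /* drop the backslash, keep the next char */
                    word[len++] = *s;
                }
                s++;
            } else if (*s == '\\' && s[1] != '\0') {
                word[len++] = s[1];          /* \x outside quotes: x is literal */
                s += 2;
            } else {
                word[len++] = *s++;          /* ordinary character */
            }
        }
        word[len] = '\0';
    }
fail:
    free_words(out, n);
    return -1;
}
```

With it, the input mv 'my file.txt' other\ dir/ yields three words: mv, my file.txt, and other dir/. The quotes and the backslash decide where words end; they never reach the command itself.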

Quotes are used to escape spaces in file names; without them, each space must be preceded by a backslash. For instance, these two commands are equivalent:
$ rm spaces\ in\ file\ name
$ rm 'spaces in file name'
If your file path contains no spaces and no other characters that are special to the shell (such as *, ?, $, ;, &, or quote characters), it's generally safe to omit the quotes.

Related

How to safely pass an arbitrary text as parameter to a program in a shell script?

I'm writing a GUI application for character recognition that uses Tesseract. I want to allow the user to specify a custom shell command to be executed with /bin/sh -c when the text is ready.
The problem is the recognized text can contain literally anything, for example && rm -rf some_dir.
My first thought was to do it like many other programs: the user types the command in a text entry, and special strings in the command (like the format specifiers of printf()) are replaced by the appropriate data (in my case, it might be %t). Then the whole string is passed to execvp(). qBittorrent, for example, offers this kind of setting.
The problem is that even if I properly escape the text before replacing %t, nothing prevents the user from adding extra quotes around the specifier:
echo '%t' >> history.txt
So the full command to be executed is:
echo ''&& rm -rf some_dir'' >> history.txt
Obviously, that's a bad idea.
The second option is to let the user choose only an executable (with a file selection dialog), so I can pass the text from Tesseract myself as argv[1] to execvp(). The idea is that the executable can be a script where users put anything they want and access the text with "$1". That way, command injection is not possible (I think). Here's an example script a user could create:
#!/bin/sh
echo "$1" >> history.txt
Are there any pitfalls with this approach? Or is there a better way to safely pass arbitrary text as a parameter to a program in a shell script?
In-Band: Escaping Arbitrary Data In An Unquoted Context
Don't do this. See the "Out-Of-Band" section below.
To make an arbitrary C string (containing no NULs) evaluate to itself when substituted into an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:
Prepend a ' (moving from the required initial unquoted context to a single-quoted context).
Replace each literal ' within the data with the string '"'"'. These characters work as follows:
' closes the initial single-quoted context.
" enters a double-quoted context.
' is, in a double-quoted context, literal.
" closes the double-quoted context.
' re-enters single-quoted context.
Append a ' (returning to the required initial unquoted context).
This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.
However, this only works correctly when the placeholder (such as %t) appears only in an unquoted context (which puts the onus on your users to get things right), and when the shell is strictly POSIX-compliant. Also, in the worst case the string generated by this transform can be up to 5x longer than the original, so be careful about how the memory for the transform is allocated.
(One might ask why '"'"' is advised instead of '\''; this is because backslashes change their meaning when used inside legacy backtick command substitution syntax, so the longer form is more robust.)
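A minimal C sketch of the in-band transform, assuming the result is substituted into an unquoted position of a command run by a POSIX shell. sh_single_quote is a hypothetical name; the out-of-band approaches remain preferable.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Wrap data in single quotes and turn each embedded ' into '"'"'.
 * Returns a malloc'd string (caller frees), or NULL on failure. */
char *sh_single_quote(const char *data)
{
    size_t len = strlen(data), quotes = 0;
    for (const char *p = data; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;

    /* each ' grows from 1 to 5 bytes; plus opening quote, closing quote, NUL */
    if (quotes > (SIZE_MAX - len - 3) / 4)
        return NULL;
    char *out = malloc(len + 4 * quotes + 3);
    if (out == NULL)
        return NULL;

    char *o = out;
    *o++ = '\'';                        /* enter single-quoted context */
    for (const char *p = data; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(o, "'\"'\"'", 5);    /* close, "'", reopen */
            o += 5;
        } else {
            *o++ = *p;                  /* everything else is literal in '...' */
        }
    }
    *o++ = '\'';                        /* back to the unquoted context */
    *o = '\0';
    return out;
}
```

For example, it's becomes 'it'"'"'s', and && rm -rf some_dir becomes '&& rm -rf some_dir'. This only holds as long as the user's template does not wrap the placeholder in quotes of its own.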
Out-Of-Band: Environment Variables, Or Command-Line Arguments
Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.
In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).
Using Environment Variables
Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:
setenv("torrent_name", torrent_name_str, 1);
setenv("torrent_category", torrent_category_str, 1);
setenv("save_path", path_str, 1);
/* the shell script should use "$torrent_name", etc. */
system(user_provided_shell_script);
A few notes:
While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").
Environment space is a limited resource. On modern Linux, environment variables and command-line arguments share a combined limit that is typically around 2 MiB (a quarter of the stack size limit), and any single string is capped at 128 KiB; setting large environment variables can therefore cause execve()-family calls with large command lines to fail. Validating that lengths are within reasonable domain-specific limits is wise.
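One way to implement the name restriction from the first note, as a sketch: ocr_ is a hypothetical prefix chosen for this example, and the check is ASCII-only so that it does not depend on the current locale.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical constant lowercase prefix for every exported variable. */
#define ENV_PREFIX "ocr_"

/* ASCII-only test for [A-Za-z0-9_], independent of the locale. */
static bool is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Accept only ENV_PREFIX followed by one or more [A-Za-z0-9_] characters.
 * Since the prefix starts with a lowercase letter, the whole name is a
 * valid shell variable name in the lowercase (application) namespace. */
bool is_safe_env_name(const char *name)
{
    size_t plen = strlen(ENV_PREFIX);
    if (strncmp(name, ENV_PREFIX, plen) != 0 || name[plen] == '\0')
        return false;
    for (const char *p = name + plen; *p != '\0'; p++)
        if (!is_name_char(*p))
            return false;
    return true;
}
```

A caller would skip or reject any setenv() whose name fails this check; the values themselves need no escaping, since they never pass through the shell's parser.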
Using Command-Line Arguments
This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.
/* You'll need to do the usual fork() before this, and the usual waitpid() after
 * if you want to let it complete before proceeding.
 * Lots of Q&A entries on the site already show the context.
 */
execl("/bin/sh", "sh", "-c", user_provided_shell_script,
      "sh",                 /* this is $0 in the script */
      torrent_name_str,     /* this is $1 in the script */
      torrent_category_str, /* this is $2 in the script */
      path_str,             /* this is $3 in the script */
      (char *)NULL);
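A fuller sketch of the fork()/execl()/waitpid() sequence the comment refers to, assuming a POSIX system. run_hook is a hypothetical name; the untrusted text travels as $1 and is never parsed as shell code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run user_script with /bin/sh -c, passing text as $1.
 * Returns the script's exit status, or -1 if fork/exec/wait failed
 * or the child did not exit normally. */
int run_hook(const char *user_script, const char *text)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", user_script,
              "sh",                /* $0 inside the script */
              text,                /* $1 inside the script */
              (char *)NULL);
        _exit(127);                /* only reached if execl() failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)        /* retry only when interrupted by a signal */
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A trigger command such as echo "$1" >> history.txt then receives the recognized text verbatim, even when it contains quotes or && rm -rf some_dir.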
Any time you're running commands with even the possibility of user input making its way into them, you must escape for the shell context.
There's no built-in function in C to do this, so you're on your own, but the basic idea is to render user parameters as either properly escaped strings or as separate arguments to some kind of execution function (e.g. exec family).

system() not working

I am trying to launch executables from a C source file. When there is a space in the path, i.e.
system("D:\\Games\\Subway Surfers\\Subway_Surfers.exe");
it does not work, but
when I rename the folder to remove the space, it works. Is there a way around this?
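You have to escape the characters correctly when the path contains spaces. The command interpreter splits the command at the space, so the path must be wrapped in double quotes, and those quotes must themselves be escaped inside the C string literal. Every backslash in the path must also be doubled (\\); note that \ followed by a space is not a valid C escape sequence, so it cannot be used to escape the space.
system("\"D:\\Games\\Subway Surfers\\Subway_Surfers.exe\"");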
This command would be interpreted as:
"D:\Games\Subway Surfers\Subway_Surfers.exe"
The quotes around the path ensure that it is not truncated at the space.
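Since the fix is to wrap the whole path in double quotes, the command string can also be built at run time. A sketch, where quote_exe_path is a hypothetical helper; it assumes the path contains no double quotes (Windows file names cannot) and that no further quoted arguments follow, because cmd.exe applies extra rules when a command line contains more than two quote characters.

```c
#include <stdio.h>
#include <string.h>

/* Write "\"<path>\"" into buf.
 * Returns 0 on success, -1 if path contains a quote or buf is too small. */
int quote_exe_path(char *buf, size_t bufsize, const char *path)
{
    if (strchr(path, '"') != NULL)
        return -1;                                 /* cannot be quoted this simply */
    int n = snprintf(buf, bufsize, "\"%s\"", path);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;  /* detect truncation */
}
```

On Windows, the resulting buffer would then be passed to system().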
Thanks, guys. Escape characters didn't work, so I just used the CreateProcess() function. It's long, but it works fine even with spaces.
You need to quote the subdirectory name containing the space character. For example:
system("D:\\Games\\\"Subway Surfers\"\\Subway_Surfers.exe") where \"Subway Surfers\" is the quoted subdirectory name with spaces.
I have found a perfect workaround to use the system() function. It takes a string argument, so I just create a string whose contents are the path, e.g. char path[50] = "D:\\SubwaySurfers\\SubwaySurfers.exe"; and then call the function as
system(path);
However, in some specific applications, such as Apache (Game), it doesn't work whether I use CreateProcess() or system().

C command-line parser for handling comments

I have a tool that takes input and makes output:
$ tool input > output
I'd like to add an option that is a long string, say a "comment" option. This comment text is an argument to the option and is a sentence enclosed in single quotes:
$ tool --comment='I am commenting on the use of comments' input > output_plus_comment
This is different from the usual --foo=bar key-value pairing, where foo is the option name and bar is a one-word value (e.g., true, red, ...).
Is there a good command-line parser library for C that handles this particular case?
Tokenizing the command line into arguments for your program is the responsibility of your shell, not yours. So there's nothing for you to do.
Just put quotation marks around strings that contain spaces, or escape spaces with backslashes on your command line, and your --foo value can contain as many spaces as you like.
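To illustrate, here is a sketch using getopt_long(), which is a GNU extension also available on the BSDs rather than part of POSIX. parse_comment is a hypothetical helper meant to be called once at startup. By the time it runs, the shell has already removed the quotes, so the whole comment arrives inside one argv element.

```c
#include <getopt.h>
#include <stddef.h>

/* Parse --comment=TEXT (or --comment TEXT). Stores the comment, or NULL
 * if absent, and returns the index of the first non-option argument,
 * or -1 on an unknown option. */
int parse_comment(int argc, char *argv[], const char **comment)
{
    static const struct option opts[] = {
        {"comment", required_argument, NULL, 'c'},
        {NULL, 0, NULL, 0}                 /* terminator */
    };
    int c;
    *comment = NULL;
    while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
        switch (c) {
        case 'c':
            *comment = optarg;             /* points into the argv element */
            break;
        default:
            return -1;                     /* getopt_long already printed an error */
        }
    }
    return optind;
}
```

For tool --comment='I am commenting on the use of comments' input, comment points at I am commenting on the use of comments and the returned index refers to input; spaces in the value need no special handling.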

Find and replace multi line file content in files

What I want to do is:
find some_files -name '*.html' -exec sed -i "s/`cat old`/`cat new`/g" {} \;
with old and new containing newline characters and slashes and other special characters, which prevent sed from parsing correctly.
I have read about how to escape newline characters with sed, and about the tr command and printf '%q', but I can't make these work properly, maybe because I don't fully understand their function. Additionally, I don't know which special characters I still have to escape for sed to work.
I'm not sure what you want to do exactly, but if the old file contains newlines, you're probably going to run into trouble. That is because sed works by applying the commands on each line, so trying to match a line with a pattern that represents multiple lines will not work unless you load more lines explicitly.
My suggestion would be to load the whole file into sed's "buffer" before applying the substitute command. Then you'd have to make sure that old and new are escaped correctly. What can make this more confusing is that the escaping for the old file (the pattern) must be different from the escaping for the new file (the replacement).
Let's start by escaping the new file into a "new.tmp" file. For clarity, we'll create a sed script called "escape_new.sed":
#!/bin/sed -f
# Commas used as separators
s,\\,\\\\,g
s,$,\\,g
s,[/&],\\&,g
$ a/
Then run it: sed -f escape_new.sed new > new.tmp
There are three commands we use to escape:
Backslashes should be preceded by another backslash
Newlines should be preceded by a backslash (we do this by adding a backslash before the end of the line).
Ampersands and slashes should be preceded by a backslash (notice that the & at the replacement text is actually an operator that contains the match, therefore if it matches the slash it contains the slash, and if it matches the ampersand, it contains the ampersand).
On the last line (referred to with the "$" symbol), we append (through the "a" command) a slash. This is the closing slash for the substitute command we will be using later. We have to put it here because the backticks will remove any trailing newlines from the input, and that can cause problems (for example, a backslash meant to quote a newline would end up quoting the terminating slash).
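The same three escaping rules can be expressed directly in C when the replacement text is produced by a program rather than by a sed script. A sketch, assuming / as the s command's delimiter; escape_sed_replacement is a hypothetical name, and it does not append the closing slash, since that trick only exists to survive backtick command substitution.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Escape s for the REPLACEMENT part of s/PATTERN/REPLACEMENT/:
 *   \ -> \\,  newline -> backslash + newline,  / -> \/,  & -> \&
 * Returns a malloc'd string (caller frees), or NULL on failure. */
char *escape_sed_replacement(const char *s)
{
    size_t len = strlen(s);
    if (len > (SIZE_MAX - 1) / 2)
        return NULL;
    char *out = malloc(2 * len + 1);     /* worst case: every byte gets a backslash */
    if (out == NULL)
        return NULL;
    char *o = out;
    for (; *s != '\0'; s++) {
        if (*s == '\\' || *s == '\n' || *s == '/' || *s == '&')
            *o++ = '\\';                 /* prefix special characters */
        *o++ = *s;
    }
    *o = '\0';
    return out;
}
```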
Now let's escape the old file. As above, we'll create an "escape_old.sed" script. Before we do it though, we need to load the whole file into the pattern space (sed's internal buffer) so we can replace newline characters. We can do that with the following commands:
: a
$! {
N
b a
}
The first command creates a label called "a". The second command ("{") starts a group of commands. The magic here is the "$!" address prefix: it tells sed to run the group only if the line just read was not the last line of the input ("$" means the last line of the input and "!" negates it). The first command in the group, "N", appends the next line of input to the pattern space. If "N" is executed when there is no next line, it terminates the script, so we must be careful not to execute it on the last line. The second command in the group, "b", branches ("jumps") back to the "a" label. The closing brace closes the group. This group, together with its address prefix, loops through all of the lines, concatenating them, and stops after the last line, allowing any further commands to be executed. We then have the final script:
#!/bin/sed -f
: a
$! {
N
b a
}
s,\\,\\\\,g
s,\n,\\n,g
s,[][/^$.*],\\&,g
As above, we need to escape the special characters. In this case, an actual newline is escaped as a backslash followed by the letter n. In the last command, more characters need to be prefixed by a backslash, because they are special in a regular expression (or, in the case of the slash, in the s command itself). Notice that to match a closing square bracket, it must be the first character inside the square brackets, to prevent sed from interpreting it as the end of the list of characters to match. Therefore, the characters listed between the square brackets are ][/^$.*
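The escape_old.sed rules can also be sketched in C. This assumes GNU sed (which accepts \n inside a regular expression), a basic regular expression, and / as the delimiter; escape_sed_pattern is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Escape s for the PATTERN part of s/PATTERN/REPLACEMENT/:
 *   \ -> \\,  newline -> \n,  ] [ / ^ $ . * -> prefixed with a backslash
 * Returns a malloc'd string (caller frees), or NULL on failure. */
char *escape_sed_pattern(const char *s)
{
    size_t len = strlen(s);
    if (len > (SIZE_MAX - 1) / 2)
        return NULL;
    char *out = malloc(2 * len + 1);     /* each byte becomes at most 2 */
    if (out == NULL)
        return NULL;
    char *o = out;
    for (; *s != '\0'; s++) {
        if (*s == '\n') {
            *o++ = '\\';                 /* real newline -> backslash + n */
            *o++ = 'n';
        } else if (strchr("\\[]/^$.*", *s) != NULL) {
            *o++ = '\\';                 /* regex or delimiter character */
            *o++ = *s;
        } else {
            *o++ = *s;
        }
    }
    *o = '\0';
    return out;
}
```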
And again, we execute it with: sed -f escape_old.sed old > old.tmp
Now we can use these escaped files in the sed command, but again we must load all of the lines into the pattern space. Using the same commands as before, written on a single line, we get the compact form :a;$!{N;ba}, which we can now use in the final expression (without the closing slash character, which is now in the new.tmp file):
find some_files -name '*.html' -exec sed -e ":a;\$!{N;ba};s/`cat old.tmp`/`cat new.tmp`g" -i {} \;
And hopefully it will work =)
Notice that we have escaped the $ symbol with a backslash; otherwise the shell would expand $! (the process ID of the most recently started background command).

Is there a way to prevent sh/bash from performing command substitution?

From a C program I want to call a shell script with a filename as a parameter. Users can control the filename. The C is something like (initialization/error checking omitted):
sprintf(buf, "/bin/sh script.sh \"%s\"", filename);
system(buf);
The target device is actually an embedded system, so I don't need to worry about malicious users. Obviously this would be an attack vector in a web environment. Still, if there is a file name on the system that contains, for example, backquotes, the command will fail because the shell performs command substitution on the name. Is there any way to prevent command substitution?
Well, you could always reimplement system() using a call to fork() and then execv().
http://www.opengroup.org/onlinepubs/000095399/functions/system.html
Try triggering "unalias " in the system function.
Since you tagged this as C, I will provide a C answer. You will need to escape the filename: create a new string that the shell will treat properly, so that, for example, This is a file name becomes This\ is\ a\ file\ name and bad;rm *;filename becomes bad\;rm\ \*\;filename. Then you can pass that to the shell.
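The backslash approach can be sketched as follows; sh_backslash_escape is a hypothetical name. It escapes every character outside a conservative safe set. A newline is the one character a backslash cannot protect (backslash-newline is a line continuation and is removed), so it is emitted inside single quotes instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Characters passed through unchanged; everything else is escaped. */
static int is_shell_safe(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '.' ||
           c == '/' || c == '-';
}

/* Returns a malloc'd escaped copy of s (caller frees), or NULL on failure. */
char *sh_backslash_escape(const char *s)
{
    size_t len = strlen(s);
    if (len > (SIZE_MAX - 1) / 3)
        return NULL;
    char *out = malloc(3 * len + 1);     /* worst case: newline -> 3 bytes */
    if (out == NULL)
        return NULL;
    char *o = out;
    for (; *s != '\0'; s++) {
        if (is_shell_safe(*s)) {
            *o++ = *s;
        } else if (*s == '\n') {
            *o++ = '\'';                 /* single-quoted newline */
            *o++ = '\n';
            *o++ = '\'';
        } else {
            *o++ = '\\';                 /* backslash makes the next char literal */
            *o++ = *s;
        }
    }
    *o = '\0';
    return out;
}
```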
Another way around this would be to run the shell directly with fork and one of the exec functions. Passing arguments directly to programs does not result in shell command line expansion or interpretation.
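As an alternative to writing the fork()/exec() pair by hand, POSIX also provides posix_spawn(), a swapped-in technique not mentioned in the answers above. A sketch, assuming a POSIX system; spawn_and_wait is a hypothetical helper, and argv[0] doubles as the program path.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] with the NULL-terminated argv, without any shell parsing
 * of the arguments. Returns the exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)        /* retry only when interrupted by a signal */
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's case, the argument vector would be {"/bin/sh", "script.sh", filename, NULL} (with a cast if filename is a const char *). The file name reaches the script as $1 without ever being parsed by a shell, so backquotes in it are harmless as long as the script itself uses "$1".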
As sharth said, you should not use system but fork and execv yourself. But to answer the question of how you make strings safe to pass to the shell (in case you insist on using system), you need to escape the string. The simplest way to do this is to first replace every occurrence of ' (single quote) with '\'' (single quote, backslash, single quote, single quote) then add ' (single quote) at the beginning and end of the string. The other fairly easy (but usually less efficient) method is to place a backslash before every single character, but then you still need to do some special quotation mark tricks to handle embedded newlines, so I prefer the first method.

Resources