How to use "&" correctly in a URL in C?

I want to open a URL that contains some "&" characters from a C program. The system only recognizes the URL up to the first "&" (https://chart.googleapis.com/chart?cht=qr) and tells me it doesn't know the commands chs and chl. How can I make it use the whole URL?
char get_qr[100];
sprintf (get_qr, "start https://chart.googleapis.com/chart?cht=qr&chs=500x500&chl=Hello World!");
system (get_qr);

The issue isn’t the URL handling; it’s that you’re launching a shell command via system.
Shell commands need to be shell quoted. In particular, & is a special character of the shell (it separates commands). So put quotes around the URL argument. Note that cmd.exe's start treats its first quoted argument as a window title, so pass an empty title before the URL:
system("start \"\" \"https://chart.googleapis.com/chart?cht=qr&chs=500x500&chl=Hello World!\"");
(In your code there’s no need for sprintf anyway. If your actual code requires sprintf, don’t rely on a fixed-size buffer; allocate a dynamic buffer of the correct size, and use snprintf so the write cannot overflow it.)
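As a rough sketch of the "dynamic buffer of the correct size" advice: a first snprintf() call with a NULL buffer reports the length needed, and a second call fills the allocation. The helper name build_start_command is made up for illustration, and the empty "" is the window title that cmd.exe's start expects when its first argument is quoted. The sketch assumes the URL contains no double quote.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds the command line  start "" "<url>"  in a
 * heap buffer whose size is measured by a first snprintf() pass.
 * Returns NULL on failure; the caller frees the result.
 * Assumption: url contains no double-quote character. */
char *build_start_command(const char *url)
{
    const char *fmt = "start \"\" \"%s\"";

    /* With a NULL buffer and size 0, snprintf() only reports the
     * length the output would have (not counting the final NUL). */
    int len = snprintf(NULL, 0, fmt, url);
    if (len < 0)
        return NULL;

    char *cmd = malloc((size_t)len + 1);
    if (cmd == NULL)
        return NULL;

    snprintf(cmd, (size_t)len + 1, fmt, url);
    return cmd;
}
```

The result can then be handed to system() and freed afterwards.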

Related

How to safely pass an arbitrary text as parameter to a program in a shell script?

I'm writing a GUI application for character recognition that uses Tesseract. I want to allow the user to specify a custom shell command to be executed with /bin/sh -c when the text is ready.
The problem is the recognized text can contain literally anything, for example && rm -rf some_dir.
My first thought was to do what many other programs do: the user types the command in a text entry, and special strings (like in printf()) in the command are replaced by the appropriate data (in my case, it might be %t). Then the whole string is passed to execvp(). qBittorrent, for example, works this way.
The problem is that even if I properly escape the text before replacing %t, nothing prevents the user from adding extra quotes around the specifier:
echo '%t' >> history.txt
So the full command to be executed is:
echo ''&& rm -rf some_dir'' >> history.txt
Obviously, that's a bad idea.
The second option is to let the user choose only an executable (with a file selection dialog), so I can manually put the text from Tesseract as argv[1] for execvp(). The idea is that the executable can be a script where users can put anything they want and access the text with "$1". That way, command injection is not possible (I think). Here's an example script a user can create:
#!/bin/sh
echo "$1" >> history.txt
Are there any pitfalls with this approach? Or maybe there's a better way to safely pass arbitrary text as a parameter to a program in a shell script?
In-Band: Escaping Arbitrary Data In An Unquoted Context
Don't do this. See the "Out-Of-Band" section below.
To make an arbitrary C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:
1. Prepend a ' (moving from the required initial unquoted context to a single-quoted context).
2. Replace each literal ' within the data with the string '"'"'. These characters work as follows:
   ' closes the initial single-quoted context.
   " enters a double-quoted context.
   ' is, in a double-quoted context, literal.
   " closes the double-quoted context.
   ' re-enters single-quoted context.
3. Append a ' (returning to the required initial unquoted context).
This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.
However, this only works correctly when sigils are used only in an unquoted context (thus putting onus on your users to get things right), and when a shell is strictly POSIX-compliant. Also, in a worst-case scenario, you can have the string generated by this transform be up to 5x longer than the original; one thus needs to be cautious around how the memory used for the transform is allocated.
(One might ask why '"'"' is advised instead of '\''; this is because backslashes change their meaning when used inside legacy backtick command substitution syntax, so the longer form is more robust.)
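For illustration only (the out-of-band approaches are still preferable), the transform above could be sketched in C as follows. The function name shell_quote is hypothetical; the buffer is sized for the worst case of five output bytes per input byte, plus the two enclosing quotes and the terminating NUL.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s wrapped in
 * single quotes, with every embedded ' replaced by '"'"'.
 * Returns NULL on allocation failure or size overflow; caller frees. */
char *shell_quote(const char *s)
{
    size_t n = strlen(s);

    /* Worst case: every input byte is a quote (5 bytes each), plus the
     * opening quote, the closing quote and the terminating NUL. */
    if (n > (SIZE_MAX - 3) / 5)
        return NULL;
    char *out = malloc(5 * n + 3);
    if (out == NULL)
        return NULL;

    char *p = out;
    *p++ = '\'';                       /* enter single-quoted context */
    for (; *s != '\0'; s++) {
        if (*s == '\'') {
            memcpy(p, "'\"'\"'", 5);   /* the five bytes  '"'"'  */
            p += 5;
        } else {
            *p++ = *s;                 /* everything else is literal */
        }
    }
    *p++ = '\'';                       /* leave single-quoted context */
    *p = '\0';
    return out;
}
```

The result is only safe where the shell expects an unquoted word; it does not help if the user wraps the placeholder in quotes of their own.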
Out-Of-Band: Environment Variables, Or Command-Line Arguments
Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.
In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).
Using Environment Variables
Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:
setenv("torrent_name", torrent_name_str, 1);
setenv("torrent_category", torrent_category_str, 1);
setenv("save_path", path_str, 1);
/* the shell script should use "$torrent_name", etc. */
system(user_provided_shell_script);
A few notes:
While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").
Environment space is a limited resource: the kernel caps the combined storage for environment variables and command-line arguments. On Linux this cap was historically 128 KiB; newer kernels derive it from the stack size limit, but any single string is still limited to 128 KiB. Setting large environment variables can therefore cause execve()-family calls with large command lines to fail. Validating that length is within reasonable domain-specific limits is wise.
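If variable names are not hardcoded, the name check described in the notes above might look something like this. The prefix "ocr_" and the function name is_safe_var_name are assumptions made for the sketch; it accepts only the prefix followed by at least one lowercase ASCII letter, digit or underscore.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical constant prefix (lowercase 7-bit ASCII). */
#define VAR_PREFIX "ocr_"

/* Hypothetical helper: true if name is VAR_PREFIX followed by one or
 * more characters from [a-z0-9_]. Anything else (uppercase names such
 * as PATH, names containing '=' or spaces, ...) is rejected.
 * The range comparisons assume an ASCII-compatible character set. */
bool is_safe_var_name(const char *name)
{
    size_t plen = strlen(VAR_PREFIX);

    if (strncmp(name, VAR_PREFIX, plen) != 0 || name[plen] == '\0')
        return false;

    for (const char *p = name + plen; *p != '\0'; p++) {
        bool ok = (*p >= 'a' && *p <= 'z') ||
                  (*p >= '0' && *p <= '9') ||
                  *p == '_';
        if (!ok)
            return false;
    }
    return true;
}
```

Only names that pass such a check would then be handed to setenv(); the values themselves need no escaping, just a sensible length limit.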
Using Command-Line Arguments
This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.
/* You'll need to do the usual fork() before this, and the usual waitpid() after
 * if you want to let it complete before proceeding.
 * Lots of Q&A entries on the site already showing the context.
 */
execl("/bin/sh",
      "sh", "-c", user_provided_shell_script, /* argv[0] of the shell itself, then -c and the script */
      "sh",                 /* this is $0 in the script */
      torrent_name_str,     /* this is $1 in the script */
      torrent_category_str, /* this is $2 in the script */
      path_str,             /* this is $3 in the script */
      (char *)NULL);
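The fork()/waitpid() context mentioned in the comment could be sketched roughly like this. The helper name run_script_with_arg is hypothetical, and only a single positional argument is passed to keep the example short; the text reaches the script as "$1" without ever being parsed by the shell.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs script with /bin/sh -c, passing text as $1.
 * Returns the script's exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_script_with_arg(const char *script, const char *text)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: first "sh" is the shell's own argv[0], the second
         * "sh" becomes $0 inside the script, and text becomes $1. */
        execl("/bin/sh", "sh", "-c", script, "sh", text, (char *)NULL);
        _exit(127);              /* only reached if execl() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_script_with_arg("echo \"$1\" >> history.txt", recognized_text) appends the text verbatim, even if it contains && or quotes (recognized_text being whatever buffer holds the OCR output).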
Any time you're running commands with even the possibility of user input making its way into them, you must escape that input for the shell context.
There's no built-in function in C to do this, so you're on your own, but the basic idea is to render user parameters as either properly escaped strings or as separate arguments to some kind of execution function (e.g. exec family).

Strange behavior of argv when passing string containing "!!!!"

I have written a small program that takes some input parameters from *argv[] and prints them. In almost all use cases my code works perfectly fine. A problem only arises when I use more than one exclamation mark at the end of the string I want to pass as an argument ...
This works:
./program -m "Hello, world!"
This does NOT work:
./program -m "Hello, world!!!!"
^^ If I do this, the program output is either twice that string, or the command I entered previous to ./program.
However, what I absolutely don't understand: The following, oddly enough, DOES work:
./program -m 'Hello, world!!!!'
^^ The output is exactly ...
Hello, world!!!!
... just as desired.
So, my questions are:
Why does this strange behavior occur when using multiple exclamation marks in a string?
As far as I know, in C you use "" for strings and '' for single chars. So why do I get the desired result when using '', but not when using "" as I should (in my understanding)?
Is there a mistake in my code or what do I need to change to be able to enter any string (no matter if, what, and how many punctuation marks are used) and get exactly that string printed?
The relevant parts of my code:
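// this is a simplified example that, in essence, does the same
// as my (significantly longer) code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
    if (argc < 3) return 1; // expects: ./program -m <message>
    char *msg = calloc(1024, sizeof(char));
    if (msg == NULL) return 1;
    printf("%s", strncat(msg, argv[2], 1023)); // argv[1] is "-m"
    free(msg);
    return 0;
}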
I already tried copying the content of argv[2] into a char* buffer first and appending a '\0' to it, which didn't change anything.
This is not related to your code but to the shell that starts it.
In bash, and in other shells with csh-style history expansion, !! is shorthand for the last command that was run. History expansion still happens inside double quotes (as do variable substitution, etc.), so when you put !! inside a double-quoted string, the shell substitutes the last command run.
What this means for your program is that all this happens before your program is executed, so there's not much the program can do except check if the string that is passed in is valid.
In contrast, when you use single quotes the shell does not do any substitutions and the string is passed to the program unmodified.
So you need to use single quotes to pass this string. Your users would need to know this if they don't want any substitution to happen. The alternative is to create a wrapper shell script that prompts the user for the string to pass in, then the script would subsequently call your program with the proper arguments.
The shell performs expansion in double-quoted strings. If you read the History Expansion section of the Bash manual page (assuming you use Bash, which is the default on most Linux distributions), you will see that !! means
Refer to the previous command.
So !!!! in your double-quoted string will expand to the previous command, twice.
Such expansion is not made for single-quoted strings.
So the problem is not within your program, it's due to the environment (the shell) calling your program.
In addition to the supplied answers, you should remember that echo is your friend in the shell. If you prefix your command with "echo ", you will see what the shell actually passes to your program.
echo ./program -m "Hello, world!!!!"
This would have shown you some strangeness and might have helped steer you in the right direction.
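The same check can be done from inside the program. Below is a small sketch of a helper (the name dump_args is made up) that prints every argument with its length and in brackets, so that anything the shell expanded before the program started becomes visible.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one line per argument, showing its index,
 * its length in bytes and its exact contents between brackets. */
void dump_args(FILE *out, int argc, char *const argv[])
{
    for (int i = 0; i < argc; i++)
        fprintf(out, "argv[%d] (%zu bytes) = [%s]\n",
                i, strlen(argv[i]), argv[i]);
}
```

Calling dump_args(stderr, argc, argv) at the top of main() shows that with single quotes argv[2] arrives as the 16-byte string Hello, world!!!!, whereas with double quotes it already contains the expanded history text.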

How to "source" a shell file in C?

I have one C program and one shell script, and I'd like to "source" the shell script from my C program.
I tried using the system() function. With it I can run the script, but my colors don't work.
For example instead of CYAN - I defined it as:
CYAN='\e[96m'
it shows only \e[96m and some functions just failed with message:
./myscript.sh: 27: [: y: unexpected operator
Is there some solution?
A program that is not itself the shell cannot "source" a file of shell commands as the shell itself can do. A program can run such a file as a script, either directly or by invoking a shell to run it, but the script then gets its own environment, and any changes it applies to that environment do not propagate to the parent process's environment.
Programs receive their environment as a function of program startup. If you want a variable to be set in a program's environment then by far the easiest thing to do is arrange for it to be set when the program is invoked, either by exporting it from the parent process's environment or by wrapping program launch in a script that arranges for the same. There are additional alternatives on the process startup side, as well.
If a C program wants to alter its environment after startup, then it can use the setenv() and unsetenv() functions. Those are defined by POSIX, not C itself, but if we're talking about sourcing shell commands then it seems reasonable to assume a POSIX context.
Additionally, if you are trying to define CYAN as a shell variable whose contents are an ANSI escape sequence, then your syntax is wrong. No escape sequences at all are recognized within ordinary single quotes (even a closing single quote cannot be escaped). Within double quotes the backslash does function as an escape character, but only in a narrow sense: C-style character codes such as \e are not supported there. If, again, you're processing that in the shell, as opposed to in C, then you appear to want
CYAN=$'\e[96m'
(Note the $, which is essential for \e to be recognized as representing the "escape" character, and which causes the shell to recognize a few other C-style escape sequences as well.)
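One way to combine the points above, sketched under the assumption that the script reads "$CYAN" from its environment instead of defining it: the C program sets the variable with setenv() before running the script. The helper name run_with_cyan is hypothetical. In C the escape character is written "\033", since "\e" is not a standard C escape.

```c
#include <stdlib.h>

/* Hypothetical helper: exports CYAN as the real escape sequence (ESC
 * followed by "[96m"), then runs command through system(). The child
 * shell inherits the variable, so the script can simply use "$CYAN".
 * Returns -1 if setenv() fails, otherwise whatever system() returns. */
int run_with_cyan(const char *command)
{
    /* "\033" is the ESC byte in octal; no shell quoting is involved. */
    if (setenv("CYAN", "\033[96m", 1) != 0)
        return -1;
    return system(command);
}
```

This passes data down to the script only; as explained above, nothing the script changes in its own environment comes back to the C program.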

C - secure execution of system() or exec() with environment variables

I have two strings, both of which can be set by the user, e.g.
char *command = "vim $VAR";
char *myVar = "/tmp/something";
I want to execute *command using *myVar for $VAR.
I tried concatenating them as an environment variable, e.g. (pseudo-code) system("VAR=" + *myVar + "; " + *command), but the user controls myVar so this would be very insecure and buggy.
I considered tokenizing on spaces to directly replace $var and passing the results to exec(), but it's too awkward to worry about tokenizing shell command arguments correctly.
I think the solution is to emulate system() with exec by doing something like exec("sh", "-c", command, "--argument", "VAR", myVar), but I can't see anything in the sh/dash/bash man pages to permit setting environment variables in this way.
Edit: I just saw execvpe() which has an argument for setting environment variables from key=value strings. Would this be safe to use with untrusted input for the value?
How do I do this safely?
You can perform some string replacement on the value of myVar — put it inside single quotes, and replace all single quotes (the character ') by the four-character string '\''. Fiddly but safe if you don't make an implementation mistake. If possible, use a library that does it for you.
If your program is single-threaded, I recommend a different solution that doesn't involve fiddly quoting. You talk of setting environment variables… Well, just do it: make VAR an environment variable.
setenv("VAR", myVar, 1);
system(command);
unsetenv("VAR");
I've omitted error checking, and I assume that VAR isn't needed elsewhere in your program (if it is, this solution becomes more tedious because you need to remember the old value).
If you want fine control over the environment in which the command runs, you can reimplement system on top of fork, execve (or execvpe) and waitpid, or on top of posix_spawn (or posix_spawnp) and waitpid. It's more effort but you gain flexibility.
Note that whatever solution you adopt other than doing string replacement to "vim $VAR" inside the C program, the command will need to be vim "$VAR" and not vim $VAR. This is because in shell syntax, $VAR means “the value of the variable VAR” only if it's inside double quotes — otherwise, $VAR means “take the value of VAR, split it into words, and expand each word as a file name wildcard pattern”.
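A slightly fuller sketch of the setenv()/system() approach, with the error checks and the "remember the old value" step included. The function name run_with_var is hypothetical, and like the snippet above it assumes a single-threaded program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: runs command via system() with the environment
 * variable name temporarily set to value, then restores the previous
 * state. Returns -1 on setup failure, else the result of system(). */
int run_with_var(const char *name, const char *value, const char *command)
{
    /* Copy the old value first: the pointer returned by getenv() may
     * be invalidated by the setenv() call below. */
    const char *old = getenv(name);
    char *saved = NULL;
    if (old != NULL) {
        saved = malloc(strlen(old) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, old);
    }

    if (setenv(name, value, 1) != 0) {
        free(saved);
        return -1;
    }

    int status = system(command);

    /* Put the environment back the way it was. */
    if (saved != NULL)
        setenv(name, saved, 1);
    else
        unsetenv(name);
    free(saved);
    return status;
}
```

For example, run_with_var("VAR", myVar, "vim \"$VAR\"") passes myVar through untouched, provided the command quotes "$VAR" as described above.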
You need to quote the string contained in myVar; this may mean escaping naughty characters (e.g. with a backslash).
You could use g_shell_quote from GLib.
So as Ben pointed out, command is probably loaded at runtime.
I think the best approach is to tokenize command, rather than to tokenize myVar. You can then find which word in command is $VAR and replace that with the value of myVar. Then you can use posix_spawnp as per below.
If you really want command to be an arbitrary shell command, then your only option is to escape myVar before assigning it to an environment variable. Otherwise the shell will expand spaces and other special characters in it regardless of how you set it.
Third option is to make sure command is vim "$VAR" instead of vim $VAR. In that case you can assign it to environment using setenv, then call system, and then unset it after.
Old answer in case command is static:
It looks like what you actually want to do is
extern char **environ; /* the current process environment */
posix_spawnp(NULL, "vim", NULL, NULL, (char*[]){"vim", myVar, NULL}, environ);
wait(NULL);
i.e. exec vim directly without any shell, with myVar as the first argument.
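With error handling added, the direct-spawn idea might be sketched as below. The helper name spawn_with_arg is made up; it starts a program found via PATH with exactly one argument and no shell in between, so the argument is never split or expanded.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the current process environment */

/* Hypothetical helper: runs program (looked up in PATH) with a single
 * argument, bypassing the shell entirely. Returns the program's exit
 * status, or -1 if it could not be started or did not exit normally. */
int spawn_with_arg(const char *program, const char *arg)
{
    /* posix_spawnp() takes a char *const[] argv; the casts only drop
     * const, the strings themselves are not modified here. */
    char *const argv[] = { (char *)program, (char *)arg, NULL };
    pid_t pid;

    /* posix_spawnp() returns 0 on success and an error number on failure. */
    if (posix_spawnp(&pid, program, NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

spawn_with_arg("vim", myVar) would then open the file named by myVar, whatever characters that name contains.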
How about:
fork (if emulating system or spawn; skip this if doing exec)
setenv("VAR", myVar, 1) in the forked child
execl("/bin/sh", "sh", "-c", command, (char *)NULL)

Is there a way to prevent sh/bash from performing command substitution?

From a C program I want to call a shell script with a filename as a parameter. Users can control the filename. The C is something like (initialization/error checking omitted):
sprintf(buf, "/bin/sh script.sh \"%s\"", filename);
system(buf);
The target device is actually an embedded system, so I don't need to worry about malicious users. Obviously this would be an attack vector in a web environment. Still, if there is a filename on the system which, for example, contains backquotes in its name, the command will fail because the shell will perform expansion on the name. Is there any way to prevent command substitution?
Well, you could always reimplement system() using a call to fork() and then execv().
http://www.opengroup.org/onlinepubs/000095399/functions/system.html
Try triggering "unalias " in the system function.
Since you tagged this as C I will provide you with a C answer. You will need to escape the filename: create a new string that will be treated properly by the shell, so that, for example, This is a file name becomes This\ is\ a\ file\ name and bad;rm *;filename becomes bad\;rm\ \*\;filename. Then you can pass that to the shell.
Another way around this would be to run the shell directly with fork and one of the exec functions. Passing arguments directly to programs does not result in shell command line expansion or interpretation.
As sharth said, you should not use system but fork and execv yourself. But to answer the question of how you make strings safe to pass to the shell (in case you insist on using system), you need to escape the string. The simplest way to do this is to first replace every occurrence of ' (single quote) with '\'' (single quote, backslash, single quote, single quote) then add ' (single quote) at the beginning and end of the string. The other fairly easy (but usually less efficient) method is to place a backslash before every single character, but then you still need to do some special quotation mark tricks to handle embedded newlines, so I prefer the first method.
