How to "source" a shell file in C?

I have a C program and a shell script, and I'd like to "source" the shell script from my C program.
I tried the system() function. The script runs, but my colors don't work.
For example, I defined CYAN as:
CYAN='\e[96m'
but it prints only the literal \e[96m, and some functions fail with the message:
./myscript.sh: 27: [: y: unexpected operator
Is there some solution?

A program that is not itself the shell cannot "source" a file of shell commands as the shell itself can do. A program can run such a file as a script, either directly or by invoking a shell to run it, but the script then gets its own environment, and any changes it applies to that environment do not propagate to the parent process's environment.
Programs receive their environment as a function of program startup. If you want a variable to be set in a program's environment then by far the easiest thing to do is arrange for it to be set when the program is invoked, either by exporting it from the parent process's environment or by wrapping program launch in a script that arranges for the same. There are additional alternatives on the process startup side, as well.
If a C program wants to alter its environment after startup, then it can use the setenv() and unsetenv() functions. Those are defined by POSIX, not C itself, but if we're talking about sourcing shell commands then it seems reasonable to assume a POSIX context.
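As a minimal sketch of that last point, assuming a POSIX system: the helper below sets a variable in the C program's own environment with setenv() and then runs a command through system(), whose child shell inherits it. The function name run_with_env is hypothetical and used only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: set NAME=VALUE in this process's environment, then
 * run a shell command. The shell started by system() inherits the variable.
 * Returns the status from system(), or -1 if setenv() failed. */
int run_with_env(const char *name, const char *value, const char *command)
{
    if (setenv(name, value, 1) != 0) {   /* 1 = overwrite an existing value */
        perror("setenv");
        return -1;
    }
    return system(command);
}
```

For the question's case, something like run_with_env("CYAN", "\033[96m", "./myscript.sh") would hand the script a ready-made escape sequence, since C string literals do understand \033. The reverse direction (the script changing the C program's variables) is still not possible.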
Additionally, if you are trying to define CYAN as a shell variable whose contents are an ANSI escape sequence, then your syntax is wrong. No escape sequences at all are recognized within ordinary single quotes (even a closing single quote cannot be escaped). Within double quotes the backslash does function as an escape character, but only in a strict sense: C-style character codes are not supported there. If, again, you're processing that in the shell, as opposed to in C, then you appear to want
CYAN=$'\e[96m'
(Note the $, which is essential for \e to be recognized as representing the "escape" character, and which causes the shell to recognize a few other C-style escape sequences as well.)

Related

Correct fd for a shell prompt

I'm making a custom shell in C and I wonder on which fd I should write my prompts.
mycoolshell $
Looking into other classic shells, I found that dash uses STDERR for its prompts. csh and tcsh use STDOUT. For bash, zsh and BSD sh I wasn't able to find anything. I used
% dash 2>file
echo qwe
echo qwe
% cat file
(dashprompt$)
(dashprompt$)
to check dash's prompt fd. I did the same for csh with csh 1>file, but I had no luck with the other shells.
Is there a standard or POSIX fd for this? Is it ok to use STDIN?
If you wish to be Posix compatible, you'll need to write the prompt to stderr. (See the specification of the PS1 environment variable, below.)
Regardless of strict Posix compatibility, stdin is definitely not correct, since it may not allow write operations. stdout is also not a good idea, since it is usually line-buffered. Some shells (including zsh, I believe) write the prompt to a file descriptor connected to the current terminal (such as /dev/tty) which is probably what stderr is opened as if not redirected, although it is not necessarily the same file descriptor. But using /dev/tty or equivalent is non-standard.
The prompt is only printed if the shell is interactive. A shell is interactive, according to Posix, if it is invoked in one of two ways:
If the -i option is present, or if there are no operands and the shell's standard input and standard error are attached to a terminal, the shell is considered to be interactive. (sh utility, Options)
Clearly, you wouldn't want the shell to spew out prompts if you are using it to execute a script. So you need some mechanism to tell if the shell is being used interactively or as a script processor; Posix's requirement seems reasonably accurate. (See the isatty() library function to see one way to do this test.)
That also shows why your test failed to capture the prompt when stderr was redirected to a file. Redirecting stderr causes the shell to be non-interactive so there will not be a prompt. To do the test properly, you need to force the shell to be interactive using the -i option.
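A minimal sketch of that interactivity test, assuming your shell's own option parser supplies the two inputs (has_i_option and operand_count are hypothetical names, not part of any standard API):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <unistd.h>

/* Decide whether the shell should behave interactively, following the
 * POSIX rule quoted above: -i forces it; otherwise there must be no
 * operands and both stdin and stderr must be terminals. */
int shell_is_interactive(int has_i_option, int operand_count)
{
    if (has_i_option)
        return 1;
    return operand_count == 0
        && isatty(STDIN_FILENO)
        && isatty(STDERR_FILENO);
}
```

With stderr redirected to a file and no -i, isatty(STDERR_FILENO) returns 0, which is why the dash test in the question captured nothing.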
Posix requires that the prompt be modifiable by changing the value of the PS1 environment variable. Here's what Posix has to say, including the requirement that the prompt be printed to stderr: (emphasis added)
PS1
Each time an interactive shell is ready to read a command, the value of this variable shall be subjected to parameter expansion and written to standard error. The default value shall be "$ ". For users who have specific additional implementation-defined privileges, the default may be another, implementation-defined value. The shell shall replace each instance of the character '!' in PS1 with the history file number of the next command to be typed. Escaping the '!' with another '!' (that is, "!!" ) shall place the literal character '!' in the prompt. (Shell command language, Shell Variables)
Most shells allow a much richer set of substitutions in PS1. But the fact that the value is subject to parameter expansion allows extensive customisation. That means that (unlike usual variable expansion) parameter and command references appearing in the value of the PS1 variable are expanded every time the prompt is printed.
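The '!' handling and the choice of stderr could be sketched as below. This is only an illustration under stated assumptions: format_prompt and print_prompt are hypothetical names, the history number is passed in by the caller, and parameter expansion of PS1 (which POSIX also requires) is deliberately left out.

```c
#include <stdio.h>

/* Expand '!' in ps1 to histnum and "!!" to a literal '!'.
 * Parameter expansion is NOT performed here.
 * Returns 0 on success, -1 if out is too small. */
int format_prompt(const char *ps1, int histnum, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    for (const char *p = ps1; *p != '\0'; p++) {
        if (*p == '!' && p[1] == '!') {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = '!';
            p++;                       /* skip the second '!' */
        } else if (*p == '!') {
            int n = snprintf(out + used, outsize - used, "%d", histnum);
            if (n < 0 || (size_t)n >= outsize - used)
                return -1;
            used += (size_t)n;
        } else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return 0;
}

/* Write the prompt to standard error, as POSIX specifies for PS1. */
void print_prompt(const char *ps1, int histnum)
{
    char buf[256];

    if (format_prompt(ps1, histnum, buf, sizeof buf) == 0) {
        fputs(buf, stderr);
        fflush(stderr);
    }
}
```

Keeping the formatting separate from the write makes it easy to test the substitution without a terminal.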

How to safely pass an arbitrary text as parameter to a program in a shell script?

I'm writing a GUI application for character recognition that uses Tesseract. I want to allow the user to specify a custom shell command to be executed with /bin/sh -c when the text is ready.
The problem is the recognized text can contain literally anything, for example && rm -rf some_dir.
My first thought was to do what many other programs do: the user types the command in a text entry, and special strings (like the ones in printf()) in the command are replaced by the appropriate data (in my case, it might be %t). Then the whole string is passed to execvp(). For example, here is a screenshot from qBittorrent:
The problem is that even if I properly escape the text before replacing %t, nothing prevents the user from adding extra quotes around the specifier:
echo '%t' >> history.txt
So the full command to be executed is:
echo ''&& rm -rf some_dir'' >> history.txt
Obviously, that's a bad idea.
The second option is to let the user choose only an executable (with a file selection dialog), so I can pass the text from Tesseract as argv[1] to execvp(). The idea is that the executable can be a script where users can put anything they want and access the text with "$1". That way, command injection is not possible (I think). Here's an example script a user can create:
#!/bin/sh
echo "$1" >> history.txt
Are there any pitfalls with this approach? Or maybe there's a better way to safely pass arbitrary text as a parameter to a program in a shell script?
In-Band: Escaping Arbitrary Data In An Unquoted Context
Don't do this. See the "Out-Of-Band" section below.
To make an arbitrary C string (containing no NULs) evaluate to itself when used in an unquoted context in a strictly POSIX-compliant shell, you can use the following steps:
Prepend a ' (moving from the required initial unquoted context to a single-quoted context).
Replace each literal ' within the data with the string '"'"'. These characters work as follows:
' closes the initial single-quoted context.
" enters a double-quoted context.
' is, in a double-quoted context, literal.
" closes the double-quoted context.
' re-enters single-quoted context.
Append a ' (returning from the single-quoted context to the required initial unquoted context).
This works correctly in a POSIX-compliant shell because the only character that is not literal inside of a single-quoted context is '; even backslashes are parsed as literal in that context.
However, this only works correctly when sigils are used only in an unquoted context (thus putting onus on your users to get things right), and when a shell is strictly POSIX-compliant. Also, in a worst-case scenario, you can have the string generated by this transform be up to 5x longer than the original; one thus needs to be cautious around how the memory used for the transform is allocated.
(One might ask why '"'"' is advised instead of '\''; this is because backslashes change their meaning when used inside legacy backtick command substitution syntax, so the longer form is more robust.)
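For illustration only (the out-of-band approaches remain preferable), the transform described in this section could be sketched in C as follows. The function name shell_quote is hypothetical; the allocation is sized for the worst case of five output bytes per input byte plus the two surrounding quotes and the terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc()ed copy of s, quoted so that it evaluates to
 * itself in an unquoted POSIX shell context. The caller must free() it.
 * Returns NULL on allocation failure or size overflow. */
char *shell_quote(const char *s)
{
    size_t len = strlen(s);
    char *out, *p;

    if (len > (SIZE_MAX - 3) / 5)      /* len * 5 + 3 would overflow */
        return NULL;
    out = malloc(len * 5 + 3);
    if (out == NULL)
        return NULL;

    p = out;
    *p++ = '\'';                       /* enter single-quoted context */
    for (; *s != '\0'; s++) {
        if (*s == '\'') {
            memcpy(p, "'\"'\"'", 5);   /* the five bytes  ' " ' " '  */
            p += 5;
        } else {
            *p++ = *s;
        }
    }
    *p++ = '\'';                       /* back to unquoted context */
    *p = '\0';
    return out;
}
```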
Out-Of-Band: Environment Variables, Or Command-Line Arguments
Data should only be passed out-of-band from code, such that it's never run through the parser at all. When invoking a shell, there are two straightforward ways to do this (other than using files): Environment variables, and command-line arguments.
In both of the below mechanisms, only the user_provided_shell_script need be trusted (though this also requires that it be trusted not to introduce new or additional vulnerabilities; invoking eval or any moral equivalent thereto voids all guarantees, but that's the user's problem, not yours).
Using Environment Variables
Excluding error handling (if setenv() returns a nonzero result, this should be treated as an error, and perror() or similar should be used to report to the user), this will look like:
setenv("torrent_name", torrent_name_str, 1);
setenv("torrent_category", torrent_category_str, 1);
setenv("save_path", path_str, 1);
/* shell script should use "$torrent_name", etc. */
system(user_provided_shell_script);
A few notes:
While values can be arbitrary C strings, it's important that the variable names be restricted -- either hardcoded constants as above, or prefixed with a constant (lowercase 7-bit ASCII) string and tested to contain only characters which are permissible shell variable names. (A lower-case prefix is advised because POSIX-compliant shells use only all-caps names for variables that modify their own behavior; see the POSIX spec on environment variables, particularly the note that "The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities").
Environment space is a limited resource. On modern Linux, each individual environment string or argument is capped at 128 KiB, and the combined storage for environment variables and command-line arguments is bounded as well (typically a quarter of the stack size limit, so about 2 MiB with the default 8 MiB stack); thus, setting large environment variables can cause execve()-family calls with large command lines to fail. Validating that length is within reasonable domain-specific limits is wise.
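If variable names are not hardcoded, the name check from the first note might look like the sketch below. The function name is_safe_env_name and the "myapp_" prefix used in the usage note are hypothetical; the accepted character set (lowercase ASCII letters, digits, underscore) is one conservative choice, not a requirement taken from POSIX.

```c
#include <string.h>

/* Accept a name only if it starts with the given constant prefix, has at
 * least one character after it, does not begin with a digit, and consists
 * solely of lowercase ASCII letters, digits and underscores. */
int is_safe_env_name(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);

    if (strncmp(name, prefix, plen) != 0 || name[plen] == '\0')
        return 0;
    if (name[0] >= '0' && name[0] <= '9')
        return 0;
    for (const char *p = name; *p != '\0'; p++) {
        char c = *p;
        if (!(c == '_' || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')))
            return 0;
    }
    return 1;
}
```

A caller would check is_safe_env_name(name, "myapp_") before passing name to setenv(), and reject anything that fails.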
Using Command-Line Arguments
This version requires an explicit API, such that the user configuring the trigger command knows which value will be passed in $1, which will be passed in $2, etc.
/* You'll need to do the usual fork() before this, and the usual waitpid() after
* if you want to let it complete before proceeding.
* Lots of Q&A entries on the site already showing the context.
*/
execl("/bin/sh",
"sh", /* argv[0] of the shell itself */
"-c", user_provided_shell_script,
"sh", /* this is $0 in the script */
torrent_name_str, /* this is $1 in the script */
torrent_category_str, /* this is $2 in the script */
path_str, /* this is $3 in the script */
(char *)NULL);
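A fuller sketch of the same idea, including the fork() and waitpid() the comment mentions, is given here for a single argument. The function name run_script_with_arg is hypothetical. Note that execl() takes the program path first and then the complete argv list, so the shell's own argv[0] ("sh") comes before "-c".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `script` with /bin/sh -c, passing `text` as $1 without it ever
 * being parsed as shell code. Returns the script's exit status, or -1
 * if the child could not be created or did not exit normally. */
int run_script_with_arg(const char *script, const char *text)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", script,
              "sh",                 /* $0 inside the script */
              text,                 /* $1 inside the script */
              (char *)NULL);
        _exit(127);                 /* only reached if execl() failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_script_with_arg("echo \"$1\" >> history.txt", recognized_text) appends the text verbatim, even if it contains && rm -rf some_dir.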
Any time you're running commands with even the possibility of user input making its way into them, you must escape for the shell context.
There's no built-in function in C to do this, so you're on your own, but the basic idea is to render user parameters as either properly escaped strings or as separate arguments to some kind of execution function (e.g. exec family).

using bash to execute a group of commands from C without storing them in a file

I've got a program which accepts a set of rules in the form of a single rules file.
When one of the conditions is considered met by my program, I seek to treat the block of commands associated with the condition as an independent bash script which needs to be executed. I would rather not deal with storing these commands in files as that leaves an undesirable attack vector. Is there a way to feed a newline-delimited list of bash commands to bash as a single group? I want if conditions and other things from the bash script to function correctly, not just executing each line raw on its own.
Example rules file:
if CONDITION
some nice
bash commands
pkill some process
./launching something!
endif
I want to be able to run the four lines of bash code as a group of bash commands, not independently of each other, when CONDITION is true, as determined by my C program.
Obviously this is from Linux, using C as the programming language.
You could also perhaps popen a bash process.
However, your approach suggests also to embed some scripting interpreter inside your application. Did you consider embedding e.g. lua inside it?
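The popen() suggestion could be sketched like this: the command block is written to the shell's standard input, so no temporary file and no quoting are needed. The function name run_commands_via_pipe is hypothetical, and the shell to run is a parameter ("bash" for the question's case).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <sys/wait.h>

/* Start `shell` (for example "bash") with popen() and feed it `commands`
 * on standard input as one script. Returns the shell's exit status, or
 * -1 on failure.
 * Caveats: a command in the block that reads stdin will consume the rest
 * of the script, and if the shell exits before reading everything, the
 * write may raise SIGPIPE in the calling process. */
int run_commands_via_pipe(const char *shell, const char *commands)
{
    FILE *sh = popen(shell, "w");
    int write_failed, status;

    if (sh == NULL)
        return -1;
    write_failed = (fputs(commands, sh) == EOF);
    status = pclose(sh);            /* flushes, closes, waits for the shell */
    if (write_failed || status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the whole block reaches the shell as a single script, multi-line constructs such as if ... fi behave as they would in a file.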
The simplest approach is probably to use sh -c "string containing commands to be executed". What's tricky is the embedded newlines. If the commands themselves won't contain single quotes, then you can wrap that multi-line string in single quotes. If it can contain single quotes, you'd want to escape the string to ensure that they are unchanged.
So:
read the commands into a buffer
do escape processing on the buffer; replace each ' with '\'' (remembering that the backslash must be in the output, so the string in C looks like "'\\''")
format the command: snprintf(command, sizeof(command), "sh -c '%s'", escaped_buffer);
ensure there was enough room
run system(command);
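Those steps might be sketched as follows, assuming fixed-size buffers are acceptable for the rule blocks in question. Both function names and the buffer sizes are hypothetical choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into dst, replacing each ' with the four bytes '\''.
 * Returns 0 on success, -1 if dst is too small. */
int escape_single_quotes(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    if (dstsize == 0)
        return -1;
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'') ? 4 : 1;
        if (used + need >= dstsize)         /* keep room for the NUL */
            return -1;
        if (*src == '\'')
            memcpy(dst + used, "'\\''", 4);
        else
            dst[used] = *src;
        used += need;
    }
    dst[used] = '\0';
    return 0;
}

/* Escape the block, wrap it in sh -c '...', and run it with system().
 * Returns the value of system(), or -1 if a buffer was too small. */
int run_command_block(const char *commands)
{
    char escaped[4096];
    char command[4200];
    int n;

    if (escape_single_quotes(commands, escaped, sizeof escaped) != 0)
        return -1;
    n = snprintf(command, sizeof command, "sh -c '%s'", escaped);
    if (n < 0 || (size_t)n >= sizeof command)   /* ensure there was room */
        return -1;
    return system(command);
}
```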

cmd- comma to separate parameters Compared to space?

I have some questions on the subject, of commas compared to space, in delimiting parameters.
These are questions that C programmers familiar with the cmd prompt may be able to shed some light on.
I know that when doing
c:\>program a b c
there are 4 parameters [0]=program [1]=a [2]=b [3]=c
According to hh ntcmds.chm concepts..
Shell overview
; and , are used to separate parameters
; or , command1 parameter1;parameter2 Use to separate command parameters.
I see dir a,b gives the same result as dir a b
but
c:\>program a,b,c
gives parameters [0]=program [1]=a,b,c
So do some? or all? windows commands use ; and , ? and is that interpretation within the code of each command, or done by the shell like with space?
And if it is in the code of each command.. how would I know which do it?
I notice that documentation of explorer.exe mentions the comma,e.g. you can do
explorer /e,.
but DIR /? does not mention it, but can use it. And a typical c program doesn't take , as a delimiter at all.. So is it the case that the shell doesn't use comma to delimit, it uses space. And windows commands that do, do so 'cos they are (all?) written to delimit the parameters the shell has given them further when commas are used?
There are two differences here between Unix and Windows:
Internal commands such as DIR are built into the shell; their command line syntax doesn't have to follow the same rules as for regular programs
On Windows, programs are responsible for parsing their own command lines. The shell parses redirects and pipes, then passes the rest of the command line to the program in one string
Windows C programs built using Visual Studio use the command line parser in the Microsoft C runtime, which is similar to a typical Unix shell parser and obeys spaces and quotation marks.
I've never seen a C program that uses , or ; as a command line separator. I was aware of the special case for explorer /e,., but I'd never seen the dir a,b example until just now.
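To make the point concrete: a C program that wanted comma or semicolon separators would have to split its arguments itself after the runtime has built argv. The sketch below shows one way to do that; it is purely illustrative, the name split_on_separators is hypothetical, and it is not a claim about how DIR or any other Windows command is implemented.

```c
#include <string.h>

/* Split `arg` in place on ',' and ';', storing up to `max` pointers to
 * the pieces in `parts`. Returns the number of pieces found. Empty fields
 * are skipped, which is how strtok() behaves. */
size_t split_on_separators(char *arg, char **parts, size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(arg, ",;");
         tok != NULL && n < max;
         tok = strtok(NULL, ",;"))
        parts[n++] = tok;
    return n;
}
```

Applied to the single argument a,b,c from the question, this yields the three pieces a, b and c.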
Batch files use a comma or semicolon as an alternative argument separator.
Test batch file:
@echo %1/%2/%3
Test run:
> test.cmd 1,2,3
1/2/3
> test.cmd 1;2 3
1/2/3
And, as you note, dir uses it, and copy does as well. Those are both shell built-ins and probably run through a parser similar to the one used for batch files (it isn't exactly the same, since you can do things like cd.. or dir/s, which aren't possible for anything else). I guess (note: speculation) this is some sort of backwards compatibility that goes back to the DOS or even CP/M days. Nowadays you should probably just use spaces. And as Tim notes, the C runtime dictates certain things about arguments and how they are supposed to be parsed. Many other languages and frameworks follow that convention, but not necessarily all. PowerShell, for example, has completely different argument handling, which can sometimes be a surprise when interacting with native programs from within it (that said, PowerShell cmdlets and functions are not programs that can be executed from elsewhere, and neither are batch files).

Is there a way to prevent sh/bash from performing command substitution?

From a C program I want to call a shell script with a filename as a parameter. Users can control the filename. The C is something like (initialization/error checking omitted):
sprintf(buf, "/bin/sh script.sh \"%s\"", filename);
system(buf);
The target device is actually an embedded system so I don't need to worry about malicious users. Obviously this would be an attack vector in a web environment. Still, if there is a filename on the system which, for example, contains backquotes in its name, the command will fail because the shell will perform expansion on the name. Is there any way to prevent command substitution?
Well, you could always reimplement system() using a call to fork() and then execv().
http://www.opengroup.org/onlinepubs/000095399/functions/system.html
Try triggering "unalias " in the system function.
Since you tagged this as C I will provide you with a C answer. You will need to escape the filename -- create a new string that will be treated properly by the shell, so that things like This is a file name produces This\ is\ a\ file\ name or bad;rm *;filename becomes bad\;rm\ \*\;filename. Then you can pass that to the shell.
Another way around this would be to run the shell directly with fork and one of the exec functions. Passing arguments directly to programs does not result in shell command line expansion or interpretation.
As sharth said, you should not use system but fork and execv yourself. But to answer the question of how you make strings safe to pass to the shell (in case you insist on using system), you need to escape the string. The simplest way to do this is to first replace every occurrence of ' (single quote) with '\'' (single quote, backslash, single quote, single quote) then add ' (single quote) at the beginning and end of the string. The other fairly easy (but usually less efficient) method is to place a backslash before every single character, but then you still need to do some special quotation mark tricks to handle embedded newlines, so I prefer the first method.
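The fork() and execv() route recommended in these answers could be sketched as below. The function name run_argv is hypothetical. Because the arguments go straight to the new process as an argv array, no shell ever parses the filename, so backquotes and $(...) in it stay literal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A small system()-like helper that takes an argv array instead of a
 * command string. Returns the program's exit status, or -1 if the child
 * could not be created or did not exit normally. */
int run_argv(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                 /* only reached if execv() failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's case the array would be { "sh", "script.sh", filename, NULL } with path "/bin/sh"; inside script.sh the name is then available, unmodified, as "$1".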
