So I was reading The C Programming Language and came across a section where programs take command-line arguments...
For example
find -x -n pattern
Here, -x means "except" (print the lines that do not match),
-n means "number" (print line numbers),
and pattern is what it will look for in the lines of input.
Now the book treats find as *argv[0], -x and -n as *++argv[0], and pattern as *++argv[0] too. How does the computer tell one argument from another?
If all three things are *++argv[0], shouldn't they all end up at argv[1]?
Could anyone please explain in depth?
argv[0] = program name = "find"
argv[1] = first argument = "-x"
argv[2] = second argument = "-n"
argv[3] = third argument = "pattern"
argc = 4, so you know there are no other arguments to process.
Don't be confused by the use of the pre-increment operator in expressions like *++argv[0]. The arguments are passed in separate array elements.
When the shell executes your command, it uses whitespace to divide the command line into the program name and arguments and passes them to your program. Sometimes you need to work around that by using double quotes, for example, if you need to deal with a file whose name contains embedded spaces:
mv some stupid filename sane_filename
This won't work because "some" "stupid" "filename" will be seen as separate arguments.
But you can do this:
mv "some stupid filename" sane_filename
to get a single argument with embedded spaces.
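The ++ pre-increment operator changes the variable it is applied to, and operator precedence decides which variable that is. In (*++argv)[0], argv itself is advanced: after the first ++argv, argv[0] refers to what was originally element 1, after the second to element 2, and so on. In *++argv[0], which parses as *(++argv[0]), it is the pointer argv[0] that is advanced, stepping through the characters of the current argument (past the - of "-x", for example) rather than moving on to the next argument.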
I have written a small program that takes some input parameters from *argv[] and prints them. In almost all use cases my code works perfectly fine. A problem only arises when I use more than one exclamation mark at the end of the string I want to pass as an argument ...
This works:
./program -m "Hello, world!"
This does NOT work:
./program -m "Hello, world!!!!"
^^ If I do this, the program prints either that string twice or the command I entered before ./program.
However, what I absolutely don't understand: The following, oddly enough, DOES work:
./program -m 'Hello, world!!!!'
^^ The output is exactly ...
Hello, world!!!!
... just as desired.
So, my questions are:
Why does this strange behavior occur when using multiple exclamation marks in a string?
As far as I know, in C you use "" for strings and '' for single chars. So why do I get the desired result when using '', but not when using "" as I should (in my understanding)?
Is there a mistake in my code or what do I need to change to be able to enter any string (no matter if, what, and how many punctuation marks are used) and get exactly that string printed?
The relevant parts of my code:
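// this is a simplified example that, in essence, does the same
// as my (significantly longer) code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc < 3) return 1;               // expect: -m "message"
    char *msg = calloc(1024, sizeof(char));
    if (msg == NULL) return 1;
    strncat(msg, argv[2], 1023);          // argv[1] is "-m"; never overflow msg
    printf("%s\n", msg);
    free(msg);
    return 0;
}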
I already tried copying the content of argv[2] into a char* buffer first and appending a '\0' to it, which didn't change anything.
This is not related to your code but to the shell that starts it.
In interactive Bash (and in other shells with csh-style history, such as zsh and tcsh), !! is shorthand for the last command that was run. When you use double quotes, the shell still performs history expansion (along with variable substitution, etc.) within the string, so when you put !! inside a double-quoted string it is replaced with the last command run.
What this means for your program is that all this happens before your program is executed, so there's not much the program can do except check if the string that is passed in is valid.
In contrast, when you use single quotes the shell does not do any substitutions and the string is passed to the program unmodified.
So you need to use single quotes to pass this string. Your users would need to know this if they don't want any substitution to happen. The alternative is to create a wrapper shell script that prompts the user for the string to pass in, then the script would subsequently call your program with the proper arguments.
The shell does expansion in double-quoted strings. If you read the Bash manual page (assuming you use Bash, which is the default on most Linux distributions) and look at the History Expansion section, you will see that !! means
Refer to the previous command.
So !!!! in your double-quoted string will expand to the previous command, twice.
Such expansion is not made for single-quoted strings.
So the problem is not within your program, it's due to the environment (the shell) calling your program.
In addition to the supplied answers, remember that echo is your friend in the shell. If you prefix your command with "echo ", you will see what the shell is actually passing to your program.
echo ./program -m "Hello, world!!!!"
This would have shown you some strangeness and might have helped steer you in the right direction.
I'm trying to send input from the command line to my main function. The input is then sent to the functions checkNum etc.
int main(int argc, char *argv[])
{
int x = checkNum(argv[1]);
int y = checkNum(argv[3]);
int o = checkOP(argv[2]);
…
}
It is supposed to be a calculator, so, for example, when I write this on the command line:
program.exe 4 + 2
it gives me the answer 6 (the code for this is not included).
The problem is when I want to multiply and I type for example
program.exe 3 * 4
It seems like it creates a pointer (or something, not quite sure) instead of giving me the char pointer to the char '*'.
The question is: can I get the input '*' to behave the same way as when I type '+'?
Edit: Writing "*" in the command line works. Is there a way where I only need to type *?
The code is running on Windows, which seems to be part of the problem.
As @JohnBollinger wrote in the comments, you should use
/path/to/program 3 '*' 4
with the program as it is currently written.
But some explanation is clearly required. The shell parses the command line before passing it to your program, and an unquoted * expands to the names of all the files in the current directory (on UNIX; something similar can happen on Windows), each as a separate argument. This is not what you need, and you cannot fix it within your program because by then it is too late. (On UNIX you could make sure you are in an empty directory, where Bash by default leaves an unmatched * as is, but that probably doesn't help.)
Another way around this is to quote the entire argument (and rewrite your program appropriately), i.e.
/path/to/program '3 * 4'
in which case you would need to use strtok_r or strsep to step through the (single) argument passed, splitting it on the space(s).
How the shell handles the command-line arguments is outside the scope and control of your program. There is nothing you can put in the program to tell the shell to avoid performing any of its normal command-handling behavior.
I suggest, however, that instead of relying on the shell for word splitting, you make your program expect the whole expression as a single argument, and for it to parse the expression. That will not relieve you of the need for quotes, but it will make the resulting commands look more natural:
program.exe 3+4
program.exe "3 + 4"
program.exe "4*5"
That will also help if you expand your program to handle more complex expressions, such as those containing parentheses (which are also significant to the shell).
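You can turn off shell globbing (in a Unix shell such as Bash) if you don't want to use single quotes (') or double quotes (").
Run
set -o noglob
or
set -f
(the two are equivalent) to turn off globbing. After that, the shell won't expand any globs, including *. Run set +f to turn it back on. This is a setting of Unix shells; cmd.exe on Windows does not have it.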
I'm trying to write a basic find command for an assignment (without using find). Right now I have an array of files I want to exec something on. The syntax would look like this:
-exec /bin/mv {} ~/.TRASH
I have an array called current that holds all of the files. A second array, called arguments, holds only /bin/mv, {}, and ~/.TRASH (since I shift the -exec out).
I need it so that every file gets passed into {} and exec is called on it.
I'm thinking I should use sed to replace the contents of {} like this (within a for loop):
for i in "${current[@]}"; do
sed "s#$i#{}"
#exec stuff?
done
How do I exec the other arguments though?
You can do something like this:
cmd='-exec /bin/mv {} ~/.TRASH'
current=(test1.txt test2.txt)
for f in "${current[@]}"; do
eval $(sed "s/{}/$f/;s/-exec //" <<< "$cmd")
done
Be very careful with the eval command, though, as it can do nasty things if the input comes from untrusted sources.
Here is an attempt to avoid eval (thanks to @gniourf_gniourf for his comments):
current=( test1.txt test2.txt )
arguments=( "/bin/mv" "{}" ~/.TRASH )
for f in "${current[@]}"; do
"${arguments[@]/\{\}/$f}"
done
You are lucky that your design is not too bad: your arguments are in an array.
But you certainly don't want to use eval.
So, if I understand correctly, you have an array of files:
current=( [0]='/path/to/file1' [1]='/path/to/file2' ... )
and an array of arguments:
arguments=( [0]='/bin/mv' [1]='{}' [2]='/home/alex/.TRASH' )
Note that you don't have the tilde here, since Bash already expanded it.
To perform what you want:
for i in "${current[@]}"; do
( "${arguments[@]//'{}'/"$i"}" )
done
Observe the quotes.
This will replace all the occurrences of {} in the fields of arguments by the expansion of $i, i.e., by the filename[1], and execute this expansion. Note that each field of the array will be expanded to one argument (thanks to the quotes), so that all this is really safe regarding spaces, glob characters, etc. This is really the safest and most correct way to proceed. Every solution using eval is potentially dangerous and broken (unless special quoting is used, e.g., with printf '%q', but that would make the method needlessly awkward). By the way, using sed is also broken in at least two ways.
Note that I enclosed the expansion in a subshell, so that it's impossible for the user to interfere with your script. Without this, and depending on how your full script is written, it's very easy to break your script by (maliciously) changing some variables or cd-ing somewhere else. Running your argument in a subshell, or in a separate process (e.g., a separate instance of bash or sh, though this adds extra overhead), is really mandatory for obvious security reasons!
Note that with your script, the user has direct access to all the Bash builtins (this is a huge pro), compared to some more standard find versions[2]!
[1] Note that POSIX clearly specifies that this behavior is implementation-defined:
If a utility_name or argument string contains the two characters "{}", but not just the two characters "{}", it is implementation-defined whether find replaces those two characters or uses the string without change.
In our case, we chose to replace all occurrences of {} with the filename. This is the same behavior as, e.g., GNU find. From man find:
The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
[2] POSIX also specifies that calling builtins is not defined:
If the utility_name names any of the special built-in utilities (see Special Built-In Utilities), the results are undefined.
In your case, it's well defined!
I think that trying to implement (in pure Bash) a find command is a wonderful exercise that should teach you a lot… especially if you get relevant feedback. I'd be happy to review your code!
I have re-purposed this example to keep it simple, but what I am trying to do is get a nested double-quote string as a single argv value when the bash shell executes it.
Here is the script example:
set -x
command1="key1=value1 \"key2=value2 key3=value3\""
command2="keyA=valueA keyB=valueB keyC=valueC"
echo $command1
echo $command2
the output is:
++ command1='key1=value1 "key2=value2 key3=value3"'
++ command2='keyA=valueA keyB=valueB keyC=valueC'
++ echo key1=value1 '"key2=value2' 'key3=value3"'
key1=value1 "key2=value2 key3=value3"
++ echo keyA=valueA keyB=valueB keyC=valueC
keyA=valueA keyB=valueB keyC=valueC
I also tested that when I do everything directly on the command line, the nested-quote message IS passed as a single argv value, i.e.:
prog.exe argument1 "argument2 argument3"
argv[0] = prog.exe
argv[1] = argument1
argv[2] = argument2 argument3
Using the above example:
command1="key1=value1 \"key2=value2 key3=value3\""
The problem is that my argv comes back like this:
arg[1] = echo
arg[2] = key1=value1
arg[3] = "key2=value2
arg[4] = key3=value3"
where I really want my argv[3] value to be "key2=value2 key3=value3"
I noticed that the debug output (set -x) shows single quotes at the points where my arguments get broken, which kind of suggests it is treating those as argument boundaries... I'm just not sure.
Any idea what is really going on here? How can I change the script?
Thanks in advance.
What is happening is that your nested quotes are literal and not parsed into separate arguments by the shell. The best way to handle this using bash is to use an array instead of a string:
# array elements are separated by whitespace, not commas
args=('key1=value1' 'key2=value2 key3=value3')
prog.exe "${args[@]}"
The Bash FAQ50 has some more examples and use cases for dynamic commands.
A kind of crazy "answer" is to set IFS to double quote like this (save/restore original IFS):
SAVED_IFS=$IFS
IFS=$'\"'
prog.exe $command1
IFS=$SAVED_IFS
It illustrates the word splitting that happens to unquoted arguments but not to variables or text inside "..." quotes. Text inside double quotes (after the various expansions) is passed to the program as a single argument. A bare (unquoted) variable such as $command1, however, undergoes word splitting, which does not care about " characters inside the variable (it takes them literally). A stupid IFS hack forces word splitting to happen at " instead. Also beware of a trailing space at the end of argv[1], which appears because the split happens at the " boundary rather than at the space before it.
jordanm's answer is much better for production use than mine :) The array expansion is quoted, i.e., each array element is expanded as an individual string and no word splitting occurs afterwards. This is essential. If it were unquoted, like ${args[@]}, it would be word split into three arguments instead of two.
If I want to pass data files to a program, how can I indicate that they are data files and not just strings containing file names? Basically I want file redirection, but using command-line arguments so I can make sure the input is correct.
I have been using:
./theapp < datafile1 < datafile2 arg1 arg2 arg3 > outputfile
but I am wondering: is it possible for it to look like this:
./theapp datafile1 datafile2 arg1 arg2 arg3 > outputfile
That is, passing the data files as command-line arguments.
It's a little hard to combine two files into standard input like that. Better would be:
cat datafile1 datafile2 | ./theapp arg1 arg2 arg3 >outputfile
With bash (at least), the second input redirection overrides the first, it does not augment it. You can see that with the two commands:
cat <realfile.txt </dev/null # no output.
cat </dev/null <realfile.txt # outputs realfile.txt.
When you use redirection, your application never even sees >outputfile (for example). It is evaluated by the shell which opens it up and connects it to the standard output of the process you're trying to run. All your program will generally see will be:
./theapp arg1 arg2 arg3
Same with standard input, it's taken care of by the shell.
The only possible problem with that first command above is that it combines the two files into one stream so that your program doesn't know where the first ends and second begins (unless it can somehow deduce this from the content of the files).
If you want to process multiple files and know which they are, there's a time-honoured tradition of doing something like:
./theapp arg1 arg2 arg3 @datafile1 @datafile2 >outputfile
and then having your application recognize the @-prefixed arguments as data files, open them, and process them itself. This is more work than letting the shell do it, though.
From the perspective of your program, all command line arguments are strings, and you have to decide whether they represent file names or not yourself. There are only two bytes that cannot appear in a file name on Unix: 0x00 and 0x2F (NUL and /). [I really mean bytes. Except for HFS+, Unix file systems are completely oblivious to character encoding, although sensible people use UTF-8, of course.]
Shell redirections don't appear in argv at all.
There is a convention, though: treat each element of argv (except argv[0] of course) that does not begin with a dash as the name of a file to process, in the order that they appear. You do NOT have to do any unquoting operations; just pass them to fopen (or open) as is. If the string "-" appears as an element of argv, process standard input at that point until exhausted, then continue looping over argv. And if the string "--" appears in argv, treat everything after that point as a file name, whether or not it begins with a dash. (Including subsequent appearances of "-" or "--").
There may be a handy library module or even a language primitive to deal with this stuff for you, depending on what language you're using. For instance, in Perl, you just write
while (<>) {
... do stuff with $_ ...
}
and you get everything I said in the "There is a convention..." paragraph for free. (But you said C, so, um, you gotta do most of it yourself. I'm not aware of an argument-processing library for plain C that's worth the space it takes on disk. :-( )