Bash and Double-Quotes passing to argv - c

I have re-purposed this example to keep it simple, but what I am trying to do is get a nested double-quote string as a single argv value when the bash shell executes it.
Here is the script example:
set -x
command1="key1=value1 \"key2=value2 key3=value3\""
command2="keyA=valueA keyB=valueB keyC=valueC"
echo $command1
echo $command2
the output is:
++ command1='key1=value1 "key2=value2 key3=value3"'
++ command2='keyA=valueA keyB=valueB keyC=valueC'
++ echo key1=value1 '"key2=value2' 'key3=value3"'
key1=value1 "key2=value2 key3=value3"
++ echo keyA=valueA keyB=valueB keyC=valueC
keyA=valueA keyB=valueB keyC=valueC
I did test as well, that when you do everything on the command line, the nested quote message IS set as a single argv value. i.e.
prog.exe argument1 "argument2 argument3"
argv[0] = prog.exe
argv[1] = argument1
argv[2] = argument2 argument3
Using the above example:
command1="key1=value1 \"key2=value2 key3=value3\""
The error is that my argv is coming back like this:
arg[1] = echo
arg[2] = key1=value1
arg[3] = "key2=value2
arg[4] = key3=value3"
where I really want my argv[3] value to be "key2=value2 key3=value3"
I noticed that debug (set -x) shows a single-quote at the points where my arguments get broken which kinda indicates that it is thinking about the arguments at these break point...just not sure.
Any idea what is really going on here? How can I change the script?
Thanks in advance.

What is happening is that the quotes nested inside the variable's value are literal characters: after $command1 is expanded, the shell does not re-parse them as quoting syntax, so word splitting still breaks the value at every space. The best way to handle this in bash is to use an array instead of a string:
args=('key1=value1' 'key2=value2 key3=value3')
prog.exe "${args[@]}"
The Bash FAQ50 has some more examples and use cases for dynamic commands.
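To make the difference between the string and the array visible, here is a minimal sketch. show_args is a hypothetical helper standing in for prog.exe; it only prints how many arguments it received and what they were.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for prog.exe: it reports its arguments.
show_args() {
    printf '%d args:' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

# A string with embedded (literal) double quotes, as in the question.
command1="key1=value1 \"key2=value2 key3=value3\""

# An array with one element per intended argument.
args=('key1=value1' 'key2=value2 key3=value3')

# Unquoted string: split at every space, quotes stay as ordinary characters.
from_string=$(show_args $command1)

# Quoted array expansion: each element becomes exactly one argument.
from_array=$(show_args "${args[@]}")

echo "$from_string"   # 3 args: [key1=value1] ["key2=value2] [key3=value3"]
echo "$from_array"    # 2 args: [key1=value1] [key2=value2 key3=value3]
```

The string version yields three arguments with the quote characters still attached, while the array version yields the two arguments that were intended.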

A kind of crazy "answer" is to set IFS to double quote like this (save/restore original IFS):
SAVED_IFS=$IFS
IFS=$'\"'
prog.exe $command1
IFS=$SAVED_IFS
It kind of illustrates word splitting, which occurs on unquoted expansions but does not affect text inside "..." quotes. Text inside double quotes (after the various expansions) is passed to the program as a single argument. However, a bare (unquoted) variable such as $command1 undergoes word splitting, which does not care about a " inside the variable's value (it is taken literally). The stupid IFS hack forces word splitting to happen at ". Also beware of the trailing whitespace at the end of argv[1], which appears because of the word splitting at the " boundary.
jordanm's answer is much better for production use than mine :) The array is quoted, i.e. each array element is expanded as an individual string and no word splitting occurs afterwards. This is essential. If it were unquoted, like ${args[@]}, it would be word split into three arguments instead of two.
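Both points can be checked with a short sketch. count_args and first_arg are hypothetical helpers introduced only for this illustration; the data is the same as in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: report the argument count, and show the first
# argument in brackets so that trailing whitespace is visible.
count_args() { echo "$#"; }
first_arg()  { printf '[%s]' "$1"; }

args=('key1=value1' 'key2=value2 key3=value3')

quoted=$(count_args "${args[@]}")   # elements kept intact
unquoted=$(count_args ${args[@]})   # elements word split again

command1='key1=value1 "key2=value2 key3=value3"'

# The IFS hack: split only at the double-quote character.
saved_ifs=$IFS
IFS='"'
hacked_first=$(first_arg $command1)
IFS=$saved_ifs

echo "$quoted"         # 2
echo "$unquoted"       # 3
echo "$hacked_first"   # [key1=value1 ]
```

The last line shows the trailing space that the IFS hack leaves on the first argument.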

Related

Split array element delimited with '.'

I am trying to read below CSV file content line by line in Perl.
CSV File Content:
A7777777.A777777777.XXX3604,XXX,3604,YES,9
B9694396.B216905785.YYY0018,YYY,0018,YES,13
C9694396.C216905785.ZZZ0028,ZZZ,0028,YES,16
I am able to split the line content using the code below, and I can verify the content too:
@column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
I am also trying to find the second part on the first column of CSV file (i.e., A777777777 or B216905785 or C216905785) – the first column delimited with . using the below code and I am unable to get it.
Instead, just a new line printed.
my ($v1, $v2, $v3) = split(".", $column_fields1[0]);
print $v2,"\n";
Can someone suggest how to split the array element and get the above value?
In my code, I need the whole first-column value in one place and only its second part in another.
Below is my code:
use strict;
use warnings;
my $dailybillable_tab_section1_file = "./sql/demanding_01_T.csv";
open(FILE, $dailybillable_tab_section1_file) or die "Could not read from $dailybillable_tab_section1_file, program halting.";
my @column_fields1;
my @column_fields2;
while (<FILE>)
{
    chomp;
    @column_fields1 = split(',', $_);
    print $column_fields1[0],"\n";
    my ($v1, $v2, $v3) = split(".",$column_fields1[0]);
    print $v2,"\n";
    if($v2 ne 'A777777777')
    {
        …
        …
        …
    }
    else
    {
        …
        …
        …
    }
}
close FILE;
split takes a regex as its first argument. You can pass it a string (as in your code), but the contents of the string will simply be interpreted as a regex at runtime.
That's not a problem for , (which has no special meaning in a regex), but it breaks with . (which matches any (non-newline) character in a regex).
Your attempt to fix the problem with split "\." fails because "\." is identical to ".": The backslash has its normal string escape meaning, but since . isn't special in strings, escaping it has no effect. You can see this by just printing the resulting string:
print "\.\n"; # outputs '.', same as print ".\n";
That . is then interpreted as a regex, causing the problems you have observed.
The normal fix is to just pass a regex to split:
split /\./, $string
Now the backslash is interpreted as part of the regex, forcing . to match itself literally.
If you really wanted to pass a string to split (I'm not sure why you'd want to do that), you could also do it like this:
split "\\.", $string
The first backslash escapes the second backslash, giving a two character string (\.), which when interpreted as a regex means the same thing as /\./.
If you look at the documentation for split(), you'll see it gives the following ways to call the function:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
In three of those examples, the first argument to the function is /PATTERN/. That is, split() expects to be given a regular expression which defines how the input string is split apart.
It's very important to realise that this argument is a regex, not a string. Unfortunately, Perl's parser doesn't insist on that. It allows you to use a first argument which looks like a string (as you have done). But no matter how it looks, it's not a string. It's a regex.
So you have confused yourself by using code like this:
split(".",$COLUMN_FIELDS1[0])
If you had made the first argument look like a regex, then you would be more likely to realise that the first argument is a regex and that, therefore, a dot needs to be escaped to prevent it being interpreted as a metacharacter.
split(/\./, $COLUMN_FIELDS1[0])
Update: It's generally accepted among Perl programmers that variables with upper-case names are constants and don't change their values. By using upper-case names for ordinary variables, you are likely to confuse the next person who edits your code (who could well be you in six months' time).

Strange behavior of argv when passing string containing "!!!!"

I have written a small program that takes some input parameters from *argv[] and prints them. In almost all use cases my code works perfectly fine. A problem only arises when I use more than one exclamation mark at the end of the string I want to pass as an argument ...
This works:
./program -m "Hello, world!"
This does NOT work:
./program -m "Hello, world!!!!"
^^ If I do this, the program output is either twice that string, or the command I entered previous to ./program.
However, what I absolutely don't understand: The following, oddly enough, DOES work:
./program -m 'Hello, world!!!!'
^^ The output is exactly ...
Hello, world!!!!
... just as desired.
So, my questions are:
Why does this strange behavior occur when using multiple exclamation marks in a string?
As far as I know, in C you use "" for strings and '' for single chars. So why do I get the desired result when using '', but not when using "" as I should (in my understanding)?
Is there a mistake in my code or what do I need to change to be able to enter any string (no matter if, what, and how many punctuation marks are used) and get exactly that string printed?
The relevant parts of my code:
// this is a simplified example that, in essence, does the same
// as my (significantly longer) code
int main(int argc, char* argv[]) {
char *msg = (char *)calloc(1024, sizeof(char));
printf("%s", strcat(msg, argv[2])); // argv[1] is "-m"
free(msg);
}
I already tried copying the content of argv[2] into a char* buffer first and appending a '\0' to it, which didn't change anything.
This is not related to your code but to the shell that starts it.
In most shells, !! is shorthand for the last command that was run. When you use double quotes, the shell allows for history expansion (along with variable substitution, etc.) within the string, so when you put !! inside of a double-quoted string it substitutes the last command run.
What this means for your program is that all this happens before your program is executed, so there's not much the program can do except check if the string that is passed in is valid.
In contrast, when you use single quotes the shell does not do any substitutions and the string is passed to the program unmodified.
So you need to use single quotes to pass this string. Your users would need to know this if they don't want any substitution to happen. The alternative is to create a wrapper shell script that prompts the user for the string to pass in, then the script would subsequently call your program with the proper arguments.
The shell does expansion in double-quoted strings. And if you read the Bash manual page (assuming you use Bash, which is the default on most Linux distributions) then if you look at the History Expansion section you will see that !! means
Refer to the previous command.
So !!!! in your double-quoted string will expand to the previous command, twice.
Such expansion is not made for single-quoted strings.
So the problem is not within your program, it's due to the environment (the shell) calling your program.
In addition to the supplied answers, you should remember that echo is your friend in the shell. If you prefix your command with "echo ", you will see what the shell is actually sending to your program.
echo ./program -m "Hello, world!!!!"
This would have shown you some strangeness and might have helped steer you in the right direction.
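If single quotes are inconvenient, bash also lets you switch history expansion off. The sketch below assumes bash; set +H is the short form of set +o histexpand. History expansion is normally only active in interactive sessions, so in a script the set +H line is harmless, but typed at a prompt it is what makes the double-quoted string safe.

```bash
#!/usr/bin/env bash
# Turn off history expansion, so that ! has no special meaning.
set +H

# With history expansion off, the double-quoted string is taken as written.
msg="Hello, world!!!!"
printf '%s\n' "$msg"   # Hello, world!!!!
```

Use set -H to turn history expansion back on afterwards.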

Shell script split a string by space

The bash shell script can split a given string by space into a 1D array.
str="a b c d e"
arr=($str)
# arr[0] is a, arr[1] is b, etc. arr is now an array, but what is the magic behind?
But what exactly happens when we run arr=($str)? My understanding is that the parentheses here create a subshell, but what happens after that?
In an assignment, the parentheses simply indicate that an array is being created; this is independent of the use of parentheses as a compound command.
This isn't the recommended way to split a string, though. Suppose you have the string
str="a * b"
arr=($str)
When $str is expanded, the value undergoes both word splitting (which is what allows the array to have multiple elements) and pathname expansion. Your array will now have a as its first element and b as its last element, but one or more elements in between, depending on how many files in the current working directory * matches. A better solution is to use the read command.
read -ra arr <<< "$str"
Now the read command itself splits the value of $str without also applying pathname expansion to the result.
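The difference can be reproduced in a scratch directory. This is a minimal sketch: the directory comes from mktemp -d and the file names file1 and file2 are made up for the illustration.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory that contains exactly two files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1 file2

str="a * b"

# Unquoted expansion: word splitting plus pathname expansion of the *.
arr_unsafe=($str)

# read -ra: word splitting only, the * stays a literal asterisk.
read -ra arr_safe <<< "$str"

echo "${#arr_unsafe[@]}"   # 4  (a file1 file2 b)
echo "${#arr_safe[@]}"     # 3  (a * b)

# Clean up the scratch directory.
cd - > /dev/null || exit 1
rm -r "$workdir"
```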
It seems you've confused
arr=($str) # An array is created from the word-split value of str
with
(some command) # executing some command in a subshell
Note that arr=($str) is different from arr=("$str") in that, in the latter, the double quotes prevent word splitting, i.e. the array will contain only one value: a b c d e.
You can check the difference between the two by printing the number of elements:
echo "${#arr[@]}"
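Put together, the check looks like the sketch below. The variable names split_count and whole_count are only there for the illustration.

```bash
#!/usr/bin/env bash
str="a b c d e"

# Unquoted: the value is word split, one array element per word.
arr=($str)
split_count=${#arr[@]}

# Quoted: no word splitting, the whole string is a single element.
arr=("$str")
whole_count=${#arr[@]}

echo "$split_count"   # 5
echo "$whole_count"   # 1
```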

Trying to get an asterisk * as input to main from command line

I'm trying to send input from the command line to my main function. The input is then sent to the functions checkNum etc.
int main(int argc, char *argv[])
{
int x = checkNum(argv[1]);
int y = checkNum(argv[3]);
int o = checkOP(argv[2]);
…
}
It is supposed to be a calculator so for example in the command line when I write:
program.exe 4 + 2
and it will give me the answer 6 (code for this is not included).
The problem is when I want to multiply and I type for example
program.exe 3 * 4
It seems like it creates a pointer (or something, not quite sure) instead of giving me the char pointer to the char '*'.
The question is can I get the input '*' to behave the same way as when I type '+'?
Edit: Writing "*" in the command line works. Is there a way where I only need to type *?
The code is running on Windows, which seems to be part of the problem.
As @JohnBollinger wrote in the comments, you should use
/path/to/program 3 '*' 4
the way it's written at the moment.
But some explanation is clearly required. The shell parses the command line before passing it to your program. An unquoted * expands to the names of the files in the current directory (UNIX) or something similar (Windows), separated by spaces. This is not what you need. You cannot fix it within your program, as by then it is too late. (On UNIX you could make sure you are in an empty directory, but that probably doesn't help.)
Another way around this is to quote the entire argument (and rewrite your program appropriately), i.e.
/path/to/program '3 * 4'
in which case you would need to use strtok_r or strsep to step through the (single) argument passed, separating it on the space(s).
How the shell handles the command-line arguments is outside the scope and control of your program. There is nothing you can put in the program to tell the shell to avoid performing any of its normal command-handling behavior.
I suggest, however, that instead of relying on the shell for word splitting, you make your program expect the whole expression as a single argument, and for it to parse the expression. That will not relieve you of the need for quotes, but it will make the resulting commands look more natural:
program.exe 3+4
program.exe "3 + 4"
program.exe "4*5"
That will also help if you expand your program to handle more complex expressions, such as those containing parentheses (which are also significant to the shell).
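The same idea can be sketched as a shell function rather than a C program. calc is a hypothetical name, and the sketch assumes the expression is "number operator number" with the three parts separated by spaces; it does not handle 3+4 without spaces or parentheses.

```bash
#!/usr/bin/env bash
# calc takes the whole expression as ONE argument and splits it itself.
calc() {
    local x op y
    # read splits on whitespace but does not perform pathname expansion,
    # so a * in the expression stays a literal asterisk.
    read -r x op y <<< "$1"
    case $op in
        '+') echo $(( x + y )) ;;
        '-') echo $(( x - y )) ;;
        '*') echo $(( x * y )) ;;
        '/') echo $(( x / y )) ;;
        *)   echo "unknown operator: $op" >&2; return 1 ;;
    esac
}

sum=$(calc "4 + 2")
product=$(calc '3 * 4')

echo "$sum"       # 6
echo "$product"   # 12
```

The caller still has to quote the expression, but the quoting is needed only once, around the whole expression.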
You can turn off shell globbing if you don't want to use single quotes (') or double quotes (").
Run
set -o noglob
or
set -f
(the two are equivalent). Now the shell won't expand any globs, including *.
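A minimal sketch of the effect, run in a scratch directory. count_args is a hypothetical helper and a.txt and b.txt are made-up file names.

```bash
#!/usr/bin/env bash
# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Scratch directory with two files for * to match.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

# Globbing on: * is replaced by a.txt and b.txt.
globbed=$(count_args 3 * 4)

# Globbing off: * is passed through unchanged.
set -f
literal=$(count_args 3 * 4)
set +f   # turn globbing back on

echo "$globbed"   # 4
echo "$literal"   # 3

# Clean up the scratch directory.
cd - > /dev/null || exit 1
rm -r "$workdir"
```

Keep in mind that set -f stays in effect for the rest of the session until set +f is run.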

How can *++argv[0] refer to different command line arguments?

So I was reading The C Programming language and came across a section where programs were now allowed to have arguments...
For example
find -x -n pattern
Here, -x means except.
-n means numbered lines...
and pattern is what it will look for, in another few lines of input.
Now they regard find as *argv[0], -x and -n at *++argv[0], and pattern as *++argv[0]. How does a computer know one arg from the other?
If 3 things are all equal to *++argv[0], then they stay at argv[1], but all of them??
Could anyone please explain in depth?
argv[0] = program name = "find"
argv[1] = first argument = "-x"
argv[2] = second argument = "-n"
argv[3] = third argument = "pattern"
argc = 4, so you know there are no other arguments to process.
Don't be confused by the use of the pre-increment operator in expressions like *++argv[0]. The arguments are passed in separate array elements.
When the shell executes your command, it uses whitespace to divide the command line up into the program name and arguments, and passes them to your program. Sometimes you need to work around that by using double quotes, for example, if you need to deal with a file whose name contains embedded spaces:
mv some stupid filename sane_filename
This won't work because "some" "stupid" "filename" will be seen as separate arguments.
But you can do this:
mv "some stupid filename" sane_filename
to get a single argument with embedded spaces.
The ++n preincrement operator changes the variable it is applied to. The first time ++argv is executed, indexing it at 0 actually points to element 1 of the original value argv had, the second time it points to element 2, and so on.
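A rough shell analogy, not the C code from the book: the shift builtin plays a role similar to ++argv. After each shift, $1 names the next argument, just as argv[0] names the next argument after each ++argv. walk_args is a hypothetical function written for this illustration.

```bash
#!/usr/bin/env bash
# Walk the argument list the way a C loop would with ++argv.
walk_args() {
    local seen=""
    while [ "$#" -gt 0 ]; do
        # $1 is always "the current first argument".
        seen="$seen<$1>"
        # shift discards it, so $1 refers to the next one.
        shift
    done
    echo "$seen"
}

walked=$(walk_args -x -n pattern)
echo "$walked"   # <-x><-n><pattern>
```

The same expression ($1 here, argv[0] after ++argv in C) yields a different argument each time because the thing it is applied to has moved on.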
