Shell script split a string by space - arrays

The bash shell script can split a given string by space into a 1D array.
str="a b c d e"
arr=($str)
# arr[0] is a, arr[1] is b, etc. arr is now an array, but what is the magic behind it?
But what exactly happens when we call arr=($str)? My understanding is that the parentheses here create a subshell, but what happens after that?

In an assignment, the parentheses simply indicate that an array is being created; this is independent of the use of parentheses as a compound command.
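To see that no subshell is involved, compare a real subshell, whose variable changes vanish when it exits, with an array assignment, whose result persists in the current shell. A minimal bash sketch (the variable name x is illustrative):

```bash
#!/usr/bin/env bash
x=outer

# A parenthesized command list runs in a subshell: the assignment inside
# is lost when the subshell exits.
(x=inner)
echo "after subshell: x=$x"

# Parentheses on the right-hand side of an assignment build an array in
# the current shell; nothing is forked.
str="a b c d e"
arr=($str)
echo "arr has ${#arr[@]} elements; arr[1]=${arr[1]}"
```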
This isn't the recommended way to split a string, though. Suppose you have the string
str="a * b"
arr=($str)
When $str is expanded, the value undergoes both word splitting (which is what allows the array to have multiple elements) and pathname expansion. Your array will now have a as its first element and b as its last element, but one or more elements in between, depending on how many files in the current working directory * matches. A better solution is to use the read command.
read -ra arr <<< "$str"
Now the read command itself splits the value of $str without also applying pathname expansion to the result.
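The difference can be checked directly. This sketch assumes a throwaway temporary directory containing two files (file1 and file2, chosen only so that * has something to match), and compares the unquoted assignment with read -ra:

```bash
#!/usr/bin/env bash
# Throwaway directory with two files so the unquoted * has matches.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch file1 file2

str="a * b"

# Unquoted expansion: word splitting AND pathname expansion.
arr_glob=($str)

# read -ra: splits on IFS only; no pathname expansion.
read -ra arr_read <<< "$str"

printf 'arr_glob (%d): %s\n' "${#arr_glob[@]}" "${arr_glob[*]}"
printf 'arr_read (%d): %s\n' "${#arr_read[@]}" "${arr_read[*]}"

cd / && rm -rf "$tmpdir"
```

With those two files present, arr_glob ends up as a, file1, file2, b, while arr_read keeps the literal * as its middle element.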

It seems you've confused
arr=($str) # An array is created from the word-split value of str
with
(some command) # executing some command in a subshell
Note that
arr=($str) is different from arr=("$str") in that in the latter, the double quotes prevent word splitting, i.e. the array will contain only one value -> a b c d e.
You can check the difference between the two with:
echo "${#arr[@]}"
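A small side-by-side sketch of the two assignments (the names split and whole are only for illustration):

```bash
#!/usr/bin/env bash
str="a b c d e"

split=($str)     # unquoted: word splitting yields five elements
whole=("$str")   # quoted: no word splitting, a single element

echo "${#split[@]}"   # 5
echo "${#whole[@]}"   # 1
echo "${whole[0]}"    # a b c d e
```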


Using awk to parse a string from a file

I'm still learning Unix and I'm having issues understanding the following line of code.
echo "$lines" | awk '{split($0,a,":"); print a[3],a[2],a[1]}'
I don't understand what is happening with the array a in the line of code above. Is it declaring the array and setting it equal to the string it's parsing? If it is declaring the array a, then, why can't I print out the results later on in the code?
echo "${a[1]}"
The line above prints an empty line and not what was stored in the array a when the string was parsed. I know there is always something in the string that needs to be parsed, and when I access a[1] I know that I'm inside the scope. I just don't see/understand what is happening with the array a that prevents me from printing it out later on in the code.
Your code prints a line for each line of input. If you don't get any output, my first guess would be that you don't have any input.
Given an input of:
lines="ab:cd:ef
ij:kl:m"
the output is:
ef cd ab
m kl ij
awk executes the commands (everything between the single quotes) for each line of input: it first splits the input line $0 at each : into an array a, then prints the first three elements in reverse order.
If you try to access an array element in the shell afterwards, as your echo suggests, you are too late. The array exists only within awk and is gone when awk has finished.
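If you need the pieces in the shell, two options are sketched below: capture what awk prints (its array is gone, but its output survives), or do the splitting in bash so the array lives in the shell. This assumes bash 4 or later for mapfile; the array names are illustrative.

```bash
#!/usr/bin/env bash
lines="ab:cd:ef
ij:kl:m"

# Option 1: keep awk, capture its output lines into a bash array.
mapfile -t from_awk < <(printf '%s\n' "$lines" | awk '{split($0, a, ":"); print a[3], a[2], a[1]}')

# Option 2: split in bash itself. Note that awk's split() numbers
# elements from 1, while bash arrays start at 0.
from_bash=()
while IFS=: read -ra a; do
  from_bash+=("${a[2]} ${a[1]} ${a[0]}")
done <<< "$lines"

printf '%s\n' "${from_awk[@]}"
printf '%s\n' "${from_bash[@]}"
```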

Passing an array as an argument from a Perl script to a R script

I am new to R and I have a Perl script from which I want to call an R script, which calculates something for me (not important what in this context). I want to give as arguments an input file, an array which contains some numbers, and a number for the total number of clusters. medoid.r is the name of my R script.
my $R_out;
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
My current R code looks like this. Right now I just print cluster to see what is inside.
args <- commandArgs(TRUE)
filename = args[1]
cluster = as.vector(args[2])
number_of_cluster = args[3]
matrix = read.table(filename, sep='\t', header=TRUE, row.names=1, quote="")
print(cluster)
Is it possible to give an array as an argument? How can I save it in R? Right now only the first number of the array is stored and printed, but I would like to have every number in a vector or something similar.
If you do this in Perl
$R_out = qx{./script/medoid.r $output @cluster $NUMBER_OF_CLUSTERS};
your command line will look similar to this
./script/medoid.r output 111 222 333 3
assuming that $output is 'output' and @cluster = (111, 222, 333).
If you want to read that in R, you need to assign the first element of args to filename, the last one to number_of_cluster, and everything in between to cluster. In Perl you can use shift and pop for that.
my @args = @_;
my $output = shift @args;
my $number = pop @args;
# now @args only contains the clusters
I don't know if those operators exist in R.
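The same first/middle/last split can be sketched in bash, using the command line from the example above as hypothetical positional parameters. (In R, the corresponding indexing would be args[1], args[length(args)] and args[2:(length(args)-1)].)

```bash
#!/usr/bin/env bash
# Hypothetical argument list, as in the example above:
# output file, three cluster ids, number of clusters.
set -- output 111 222 333 3

output=$1                   # first argument (like Perl's shift)
number=${@: -1}             # last argument (like Perl's pop); the space matters
clusters=("${@:2:$#-2}")    # everything in between

echo "output=$output number=$number clusters=${clusters[*]}"
```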
You cannot pass a full data structure unless you serialize it in some way.
In perl, qx will expect a string as an argument. You may certainly use an array to generate that string, but ultimately it will still be a string. You cannot "pass an array" to a system call, you can only pass command-line text/arguments.
Keep in mind, you are executing a system call running Rscript as a child process. The way you're describing the issue, there is no inter-process communication beyond the command line. Think of it this way: how would you type an array on the command line? You may have some textual way of representing an array, but you can't type an array on the command line. Arrays are stored and accessed in memory differently by various different languages, and thus are not really portable between two languages like you're suggesting.
All that said, there may be a simple solution for you. You haven't provided any information on the type of data you want to pass in your array. If it is simple enough, you may try passing it on the command line as delimited text, and then break it up to use in your Rscript.
Here is an Rscript that shows you what I mean:
args = commandArgs(trailingOnly=TRUE)
filename = args[1]
cluster <- c(strsplit(args[2],"~"))
sprintf("Filename: %s",filename)
sprintf("Cluster list: %s",cluster)
print("Cluster:")
cluster
sprintf("First Item: %s",cluster[[1]][1])
Save it as "test.r" and try executing it with "Rscript test.r test.txt one~two" and you'll get the following output (tested on Rscript 46084, OpenBSD):
[1] "Filename: test.txt"
[1] "Cluster list: c(\"one\", \"two\")"
[1] "Cluster:"
[[1]]
[1] "one" "two"
[1] "First Item: one"
So, all you'd have to do on the Perl side is join() your array using "~" or any other delimiter; which delimiter is safe depends heavily on your data, and you haven't shown it.
Summary: rethink how you want to communicate between Perl and Rscript. Consider sending the data as a delimited string (if it's the right size) and breaking it up on the other side. If that won't work, look into IPC, environment variables, or other options. There is no way to send an array reference on the command line.
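The join/split round trip can be sketched in bash as well; here the array plays the role of the Perl @cluster, and the joined string is what you would pass as the second argument (for example Rscript test.r test.txt "$joined"). The values are hypothetical.

```bash
#!/usr/bin/env bash
clusters=(111 222 333)

# "${array[*]}" joins elements with the first character of IFS; setting IFS
# inside the command substitution keeps the change local to that subshell.
joined=$(IFS='~'; echo "${clusters[*]}")
echo "$joined"   # 111~222~333

# The receiving side splits on the same delimiter (here in bash; the R
# script above does the equivalent with strsplit).
IFS='~' read -ra back <<< "$joined"
echo "${#back[@]} elements: ${back[*]}"
```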
Note: you may want to read up on security risks of different system calls in perl.

bash array with square bracket strings

I want to make an array with string values that contain square brackets, but I keep getting unexpected output.
selections=()
for i in $choices
do
selections+=("role[${filenames[$i]}]")
done
echo ${selections[@]}
If choices were 1 and 2, and the array filenames[1] and filenames[2] held the values 'A', 'B' I want the selections array to hold the strings role[A], and role[B]
Instead, the output I get is just roles.
I can make the code you presented produce the output you wanted, or not, depending on the values I assign to variables filenames and choices.
First, I observe that bash indexed arrays are indexed starting at 0, not 1. If you are using the values 1 and 2 as indices into array filenames, and if that is an indexed array with only two elements, then it may be that ${filenames[2]} expands to nothing. This would be the result if you initialize filenames like so:
# NOT WHAT YOU WANT:
filenames=(A B)
Instead, either assign array elements individually, or add a dummy value at index 0:
# Could work:
filenames=('' A B)
Next, I'm suspicious of choices. Since you're playing with arrays, I speculate that you may have initialized choices as an array, like so:
# NOT CONSISTENT WITH YOUR LATER USAGE:
choices=(1 2)
If you expand an array-valued variable without specifying an index, it is as if you specified index 0. With the above initialization, then, $choices would expand to just 1, not 1 2 as you intend. There are two possibilities: either initialize choices as a flat string:
# Could work:
choices='1 2'
or expand it differently:
# or expand it this way:
for i in "${choices[@]}"
Do not overlook the quotes, by the way: that particular form will expand to one word per array element, but without the quotes the array elements would be subject to word splitting and other expansions (though that's moot for the particular values you're using in this case).
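A quick way to confirm the index-0 behaviour described above (the collector arrays are only for illustration):

```bash
#!/usr/bin/env bash
choices=(1 2)

# Unindexed expansion of an array refers to element 0 only.
from_plain=()
for i in $choices; do from_plain+=("$i"); done

# "${choices[@]}" expands to one word per element.
from_all=()
for i in "${choices[@]}"; do from_all+=("$i"); done

echo "plain: ${from_plain[*]}"   # plain: 1
echo "all:   ${from_all[*]}"     # all:   1 2
```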
The quoting applies also, in general, to your echo command: if you do not quote the expansion then you have to analyze the code much more carefully to be confident that it will do what you intend in all cases. It will be subject not only to word splitting, but pathname expansion and a few others. In your case, there is a potential for pathname expansion to be performed, depending on the names of the files in the working directory (thanks @CharlesDuffy). It is far safer to just quote.
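The pathname-expansion risk is easy to reproduce: role[A] is a glob bracket expression that matches a file literally named roleA. This sketch assumes a throwaway directory in which such a file exists:

```bash
#!/usr/bin/env bash
# Throwaway directory containing a file named roleA (chosen to trigger the problem).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch roleA

selections=('role[A]' 'role[B]')

# Unquoted: role[A] matches the file roleA; role[B] matches nothing and stays literal.
unquoted=$(echo ${selections[@]})
# Quoted: the elements are printed exactly as stored.
quoted=$(echo "${selections[@]}")

echo "unquoted: $unquoted"   # unquoted: roleA role[B]
echo "quoted:   $quoted"     # quoted:   role[A] role[B]

cd / && rm -rf "$tmpdir"
```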
Anyway, here is a complete demonstration incorporating your code verbatim and producing the output you want:
#!/bin/bash
filenames=('' 'A' 'B')
choices="1 2"
selections=()
for i in $choices
do
selections+=("role[${filenames[$i]}]")
done
echo ${selections[@]}
# better:
# echo "${selections[@]}"
Output:
role[A] role[B]
Finally, as I observed in comments, there is no way that your code could output "roles", as you claim it does, given the inputs (variable values) you claim it has. If that's in fact what you see, then either it is not related to the code you presented at all, or your inputs are different than you claim.

Bash and Double-Quotes passing to argv

I have re-purposed this example to keep it simple, but what I am trying to do is get a nested double-quote string as a single argv value when the bash shell executes it.
Here is the script example:
set -x
command1="key1=value1 \"key2=value2 key3=value3\""
command2="keyA=valueA keyB=valueB keyC=valueC"
echo $command1
echo $command2
the output is:
++ command1='key1=value1 "key2=value2 key3=value3"'
++ command2='keyA=valueA keyB=valueB keyC=valueC'
++ echo key1=value1 '"key2=value2' 'key3=value3"'
key1=value1 "key2=value2 key3=value3"
++ echo keyA=valueA keyB=valueB keyC=valueC
keyA=valueA keyB=valueB keyC=valueC
I also tested that when I do everything on the command line, the nested quoted string IS passed as a single argv value, i.e.
prog.exe argument1 "argument2 argument3"
argv[0] = prog.exe
argv[1] = argument1
argv[2] = argument2 argument3
Using the above example:
command1="key1=value1 \"key2=value2 key3=value3\""
The problem is that my argv comes back like this:
arg[1] = echo
arg[2] = key1=value1
arg[3] = "key2=value2
arg[4] = key3=value3"
where I really want my argv[3] value to be "key2=value2 key3=value3"
I noticed that the debug output (set -x) shows a single quote at the points where my arguments get broken, which kind of indicates that it is treating those points as argument boundaries... I'm just not sure.
Any idea what is really going on here? How can I change the script?
Thanks in advance.
What is happening is that your nested quotes are literal and not parsed into separate arguments by the shell. The best way to handle this using bash is to use an array instead of a string:
args=('key1=value1' 'key2=value2 key3=value3')
prog.exe "${args[@]}"
Bash FAQ 50 has more examples and use cases for dynamic commands.
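To see the difference without prog.exe, this sketch uses a hypothetical stand-in function, show_args, that reports how many arguments it received and what each one is:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for prog.exe: print the argument count and each argument.
show_args() {
  printf '%d args:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

# String approach: the embedded double quotes are literal characters, so
# word splitting breaks the value at every space.
command1="key1=value1 \"key2=value2 key3=value3\""
show_args $command1

# Array approach: each element becomes exactly one argument.
args=('key1=value1' 'key2=value2 key3=value3')
show_args "${args[@]}"
```

The first call reports three arguments, with the quote characters still attached to the second and third; the second call reports two, exactly as stored in the array.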
A kind of crazy "answer" is to set IFS to double quote like this (save/restore original IFS):
SAVED_IFS=$IFS
IFS=$'\"'
prog.exe $command1
IFS=$SAVED_IFS
It kind of illustrates word splitting, which occurs on unquoted arguments but does not affect variables or text inside "..." quotes. Text inside double quotes (after the various expansions) is passed to the program as a single argument. However, a bare (unquoted) variable such as $command1 undergoes word splitting, which does not care about the " characters inside the variable (it takes them literally). A stupid IFS hack forces word splitting to happen at ". Also beware of the trailing space at the end of argv[1], which appears because of word splitting at the " boundary.
jordanm's answer is much better for production use than mine :) The array is quoted, i.e. each array element is expanded as an individual string and no word splitting occurs afterwards. This is essential. If it were unquoted, like ${args[@]}, it would be word-split into three arguments instead of two.
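Both claims can be checked with a short sketch. Here the IFS hack is wrapped in a function with a local IFS, a variation on the save/restore lines above that restores IFS automatically; count_args and split_on_quotes are hypothetical helper names.

```bash
#!/usr/bin/env bash
count_args() { echo "$#"; }

command1="key1=value1 \"key2=value2 key3=value3\""

# The IFS hack: split on " instead of whitespace. local keeps the
# change confined to this function.
split_on_quotes() {
  local IFS='"'
  printf '<%s>\n' $command1
}
split_on_quotes
# <key1=value1 >              (note the trailing space)
# <key2=value2 key3=value3>

args=('key1=value1' 'key2=value2 key3=value3')
echo "quoted:   $(count_args "${args[@]}")"   # quoted:   2
echo "unquoted: $(count_args ${args[@]})"     # unquoted: 3
```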

How to change values of bash array elements without loop

array=(a b c d)
I would like to add a character before each element of the array in order to have this
array=(^a ^b ^c ^d)
An easy way to do that is to loop on array elements and change values one by one
for i in "${!array[@]}"
do
array[i]="^${array[i]}"
done
But I would like to know if there is any way to do the same thing without looping on the array as I have to do the same instruction on all elements.
Thanks in advance.
Use Parameter Expansion:
array=("${array[@]/#/^}")
From the documentation:
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
This way also honors whitespace in array values:
array=( "${array[@]/#/^}" )
Note: this will fail if the array is empty and you have previously run
set -u
I don't know how to work around this issue with short code...
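A small sketch of the substitution on an array that contains an element with a space, plus the mirror-image suffix form using % (the _x suffix is arbitrary):

```bash
#!/usr/bin/env bash
array=(a b 'c d')

# Prepend: the pattern is empty and anchored at the start (#), so ^ is
# inserted in front of every element.
array=("${array[@]/#/^}")

# Append: same idea, anchored at the end (%).
suffixed=("${array[@]/%/_x}")

# Each printf prints one element per line: <^a>, <^b>, <^c d>
printf '<%s>\n' "${array[@]}"
# <^a_x>, <^b_x>, <^c d_x>
printf '<%s>\n' "${suffixed[@]}"
```

Because the expansion is quoted, 'c d' stays a single element in both results.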
