I am pulling my hair out manipulating arrays in bash. I have an array of strings, which contain spaces. I would like an array containing all but the first element of my input array.
input=("first string" "second string" "third string")
echo ${#input[#]}
# len(input)=3
# get slice of all except for first element of input
slice=${input[#]:1}
echo ${#slice[#]}
# expect 2, but get 1
echo $slice
# second string third string
# slice should contain ("second string" "third string"), but instead is "second string third string"
Slicing the array clearly works to eliminate the first element, but the result appears to be a concatenation of all remaining strings, rather than an array. Is there a way to slice an array in bash and get an array as a result?
(sorry, I'm not new to bash, but I've never used it for much before, and I can't find any documentation showing why my slice is flattened)
First off, you should always quote variable expansions. Be very wary of any solution that relies on unquoted expansions. ShellCheck.net is a great tool for catching bugs related to quoting (among many other issues).
To your specific issue, slice=${input[#]:1} does not do what you want. It defines a single scalar variable slice rather than an array, meaning the array expansion (denoted by the [#]) will first be munged into a single string using the current IFS. Here's a demo:
$ arr=(1 2 '3 4')
$ IFS=,
$ var="${arr[#]:1}"
$ echo "$var"
2,3 4
To instead declare and populate an array use the =() notation, like so:
$ var=("${arr[#]:1}")
$ printf '%s\n' "${var[#]}"
2
3 4
Indexes are reset, element 1 is now element 0:
slice=("${input[#]:1}")
Element and index are removed, the first element is now index 1, not index 0:
unset input[0]
${#slice[#]} or ${#input[#]} will now be 1 less than the previous value of ${#input[#]}. Starting out with three elements in slice, the values of "${!slice[#]}" and "${!input[#]}", will be 0 1 and 1 2 respectively (for either the first or second approach)
If you don't quote slice=("${input[#]:1}"), each array element is split on whitespace, creating many more elements.
Related
Let's say I have an array of n elements. Each element is a string of comma-separated x,y coordinate pairs, e.g. "581,284". There is no set character length to these x,y values.
Say I wanted to subtract 8 from each x value, and 5 from each y value.
What would be the simplest way to modify x and y, independently of each other, without permanently splitting the x and y values apart?
e.g the first array element "581,284" becomes "573,279", the second array element "1013,562" becomes "1005,274", and so forth.
I worked on this problem for a couple of hours (I'm an amateur at bash), and it seemed as if my approach was awfully convoluted.
Please note that the apostrophes above are only added for emphasis, and are not a part of the problem.
Thank you in advance, I've been racking my head over this for a while now!
Edit: The following excerpt is the approach I was taking. I don't know much about bash, as you can tell.
while read value
do
if [[ -z $offset_list ]]
then
offset_list="$value"
else
offset_list="$offset_list,$value"
fi
done < text.txt
new_offset=${offset_list//,/ }
read -a new_array <<< $new_offset
for value in "${new_array[#]}"
do
if [[ $((value%2)) -eq 1 ]]
then
value=$((value-8));
new_array[$counter]=$value
counter=$((counter+1));
elif [[ $((value%2)) -eq 0 ]]
then
value=$((value-5));
new_array[$counter]=$value
counter=$((counter+1));
fi
done
Essentially I had originally read the coordinate pairs, and stripped the commas from them, and then planned on modifying odd/even values which were populated into the new array. At this point I realized that there had to be a more efficient way.
I believe the following should achieve what you are looking for:
#!/bin/bash
input=("581,284" "1013,562")
echo "Initial array ${input[#]}"
for index in ${!input[#]}; do
value=${input[$index]}
x=${value%%,*}
y=${value##*,}
input[$index]="$((x-8)),$((y+5))"
done
echo "Modified array ${input[#]}"
${!input[#]} allows us to loop over the indexes of the bash array.
${value%%,*} and ${value##*,} relies on bash parameter substitution to remove the everything after or before the comma (respectively). This effectively splits your string into two variables.
From there, it's your required math and variable reassignment to mutate the array.
The syntax to delete an element from an array can be found here: Remove an element from a Bash array
Also, here is how to find the last element of an array: https://unix.stackexchange.com/questions/198787/is-there-a-way-of-reading-the-last-element-of-an-array-with-bash
But how can I mix them (if possible) together to remove the last element of the array ?
I tried this:
TABLE_COLUMNS=${TABLE_COLUMNS[#]/${TABLE_COLUMNS[-1]}}
But it throws:
bad array subscript
You can use unset to remove a specific element of an array given its position.
$ foo=(1 2 3 4 5)
$ printf "%s\n" "${foo[#]}"
1
2
3
4
5
$ unset 'foo[-1]'
$ printf "%s\n" "${foo[#]}"
1
2
3
4
Edit: This is useful for printing elements except the last without altering the array. See chepner's answer for a far more convenient solution to OP.
Substring expansions* could be used on arrays for extracting subarrays, like:
TABLE_COLUMNS=("${TABLE_COLUMNS[#]::${#TABLE_COLUMNS[#]}-1}")
* The syntax is:
${parameter:offset:length}
Both offset and length are arithmetic expressions, an empty offset implies 1. Used on array expansions (i.e. when parameter is an array name subscripted with * or #), the result is at most length elements starting from offset.
There is this typical problem: given a list of values, check if they are present in an array.
In awk, the trick val in array does work pretty well. Hence, the typical idea is to store all the data in an array and then keep doing the check. For example, this will print all lines in which the first column value is present in the array:
awk 'BEGIN {<<initialize the array>>} $1 in array_var' file
However, it is initializing the array takes some time because val in array checks if the index val is in array, and what we normally have stored in array is a set of values.
This becomes more relevant when providing values from command line, where those are the elements that we want to include as indexes of an array. For example, in this basic example (based on a recent answer of mine, which triggered my curiosity):
$ cat file
hello 23
bye 45
adieu 99
$ awk -v values="hello adieu" 'BEGIN {split(values,v); for (i in v) names[v[i]]} $1 in names' file
hello 23
adieu 99
split(values,v) slices the variable values into an array v[1]="hello"; v[2]="adieu"
for (i in v) names[v[i]] initializes another array names[] with names["hello"] and names["adieu"] with empty value. This way, we are ready for
$1 in names that checks if the first column is any of the indexes in names[].
As you see, we slice into a temp variable v to later on initialize the final and useful variable names[].
Is there any faster way to initialize the indexes of an array instead of setting one up and then using its values as indexes of the definitive?
No, that is the fastest (due to hash lookup) and most robust (due to string comparison) way to do what you want.
This:
BEGIN{split(values,v); for (i in v) names[v[i]]}
happens once on startup and will take close to no time while this:
$1 in array_var
which happens once for every line of input (and so is the place that needs to have optimal performance) is a hash lookup and so the fastest way to compare a string value to a set of strings.
not an array solution but one trick is to use pattern matching. To eliminate partial matches wrap the search and array values with the delimiter. For your example,
$ awk -v values="hello adieu" 'FS values FS ~ FS $1 FS' file
hello 23
adieu 99
Suppose I have 3 arrays, A, B and C
I want to do the following:
A=("1" "2")
B=("3" "4")
C=("5" "6")
for i in $A $B $C; do
echo ${i[0]} ${i[1]}
#process data etc
done
So, basically i takes the value of the whole array each time and I am able to access the specific data stored in each array.
On the 1st loop, i should take the value of the 1st array, A, on the 2nd loop the value of array B etc.
The above code just iterates with i taking the value of the first element of each array, which clearly isn't what I want to achieve.
So the code only outputs 1, 3 and 5.
You can do this in a fully safe and supportable way, but only in bash 4.3 (which adds namevar support), a feature ported over from ksh:
for array_name in A B C; do
declare -n current_array=$array_name
echo "${current_array[0]}" "${current_array[1]}"
done
That said, there's hackery available elsewhere. For instance, you can use eval (allowing a malicious variable name to execute arbitrary code, but otherwise safe):
for array_name in A B C; do
eval 'current_array=( "${'"$array_name"'[#]}"'
echo "${current_array[0]}" "${current_array[1]}"
done
If the elements of the arrays don't contain spaces or wildcard characters, as in your question, you can do:
for i in "${A[*]}" "${B[*]}" "${C[*]}"
do
iarray=($i)
echo ${iarray[0]} ${iarray[1]}
# process data etc
done
"${A[*]}" expands to a single string containing all the elements of ${A[*]}. Then iarray=($i) splits this on whitespace, turning the string back into an array.
I am writing a program and trying to break up data, which is stored in an array, in order to make it run faster.
I am attempting to go about it this way:
data_to_analyze=(1 2 3 4 5 6 7 8 9 10)
#original array size
dataSize=(${#data_to_analyze[#]})
#half of that size
let samSmall="$dataSize/2"
#the other half
let samSmall2=("$dataSize - $samSmall -1")
#the first half
smallArray=("${data_to_analyze[#]:0:$samSmall}")
#the rest
smallArray2=("${data_to_analyze[#]:$samSmall:$samSmall2}")
#an array of names(which correspond to arrays)
combArray=(smallArray smallArray2)
sizeComb=(${#combArray[#]})
#for the length of the new array
for ((i=0; i<= $sizeComb ; i++)); do
#through first set of data and then loop back around for the second arrays data?
for sample_name in ${combArray[i]}; do
command
wait
command
wait
done
What I imagine this does is gives only the first array of data to the for loop at first. When the first array is done it should go through again with the second array set.
That leaves me with two questions. Is combArray really passing the two smaller arrays? And is there a better way?
You can make a string that looks like an array reference then use it to indirectly access the elements of the referenced array. It even works for elements that contain spaces!
combArray=(smallArray smallArray2)
for array in "${combArray[#]}"
do
indirect=$array[#] # make a string that sort of looks like an array reference
for element in "${!indirect}"
do
echo "Element: $element"
done
done
#!/bin/bash
data_to_analyze=(1 2 3 4 5 6 7 8 9 10)
dataSize=${#data_to_analyze[#]}
((samSmall=dataSize/2,samSmall2=dataSize-samSmall))
smallArray=("${data_to_analyze[#]:0:$samSmall}")
smallArray2=("${data_to_analyze[#]:$samSmall:$samSmall2}")
combArray=(smallArray smallArray2)
sizeComb=${#combArray[#]}
for ((i=0;i<$sizeComb;i++));do
eval 'a=("${'${combArray[i]}'[#]}")'
for sample_name in "${a[#]}";do
...
done
done
EDIT: removed the double quotes near ${combArray[i]} and replaced <= by < in for