How to remove the last element of an array in bash?

The syntax to delete an element from an array can be found here: Remove an element from a Bash array
Also, here is how to find the last element of an array: https://unix.stackexchange.com/questions/198787/is-there-a-way-of-reading-the-last-element-of-an-array-with-bash
But how can I mix them (if possible) together to remove the last element of the array ?
I tried this:
TABLE_COLUMNS=${TABLE_COLUMNS[@]/${TABLE_COLUMNS[-1]}}
But it throws:
bad array subscript

You can use unset to remove a specific element of an array given its position.
$ foo=(1 2 3 4 5)
$ printf "%s\n" "${foo[@]}"
1
2
3
4
5
$ unset 'foo[-1]'
$ printf "%s\n" "${foo[@]}"
1
2
3
4
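Older bash versions reject negative subscripts in unset (as far as I know, support arrived in bash 4.3). A minimal fallback sketch, assuming the array is dense (indices 0 to n-1 with no gaps), computes the last index from the length instead:

```shell
foo=(1 2 3 4 5)

# Assumption: foo is not sparse, so the last index equals length - 1.
# The subscript is evaluated arithmetically, so "5-1" becomes index 4.
if (( ${#foo[@]} > 0 )); then
  unset "foo[${#foo[@]}-1]"
fi

printf '%s\n' "${foo[@]}"
```

The length check avoids asking unset for index -1 on an empty array.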

Edit: This is useful for printing elements except the last without altering the array. See chepner's answer for a far more convenient solution to OP.
Substring expansions* could be used on arrays for extracting subarrays, like:
TABLE_COLUMNS=("${TABLE_COLUMNS[@]::${#TABLE_COLUMNS[@]}-1}")
* The syntax is:
${parameter:offset:length}
Both offset and length are arithmetic expressions; an empty offset implies 0. Used on array expansions (i.e. when parameter is an array name subscripted with * or @), the result is at most length elements starting from offset.
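To illustrate the expansion on elements that contain spaces, here is a small sketch; the array name cols and its contents are made up for the example:

```shell
cols=("id" "first name" "last name")

# Offset 0, length = element count - 1; the length is an arithmetic expression.
# Assumption: cols is non-empty, otherwise the length would be negative.
all_but_last=("${cols[@]:0:${#cols[@]}-1}")

printf '<%s>\n' "${all_but_last[@]}"
```

This prints <id> and <first name> on separate lines. The quotes around the expansion are what keep "first name" as a single element.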

Related

sliced arrays flatten to len=1?

I am pulling my hair out manipulating arrays in bash. I have an array of strings, which contain spaces. I would like an array containing all but the first element of my input array.
input=("first string" "second string" "third string")
echo ${#input[@]}
# len(input)=3
# get slice of all except for first element of input
slice=${input[@]:1}
echo ${#slice[@]}
# expect 2, but get 1
echo $slice
# second string third string
# slice should contain ("second string" "third string"), but instead is "second string third string"
Slicing the array clearly works to eliminate the first element, but the result appears to be a concatenation of all remaining strings, rather than an array. Is there a way to slice an array in bash and get an array as a result?
(sorry, I'm not new to bash, but I've never used it for much before, and I can't find any documentation showing why my slice is flattened)
First off, you should always quote variable expansions. Be very wary of any solution that relies on unquoted expansions. ShellCheck.net is a great tool for catching bugs related to quoting (among many other issues).
To your specific issue, slice=${input[@]:1} does not do what you want. It defines a single scalar variable slice rather than an array, meaning the array expansion (denoted by the [@]) will first be munged into a single string using the current IFS. Here's a demo:
$ arr=(1 2 '3 4')
$ IFS=,
$ var="${arr[@]:1}"
$ echo "$var"
2,3 4
To instead declare and populate an array use the =() notation, like so:
$ var=("${arr[@]:1}")
$ printf '%s\n' "${var[@]}"
2
3 4
Indexes are reset, element 1 is now element 0:
slice=("${input[@]:1}")
Element and index are removed; the first remaining element is now at index 1, not index 0:
unset 'input[0]'
With either approach, ${#slice[@]} or ${#input[@]} will now be 1 less than the previous value of ${#input[@]}. Starting with three elements in input, the values of "${!slice[@]}" and "${!input[@]}" will be 0 1 and 1 2 respectively.
If you don't quote the expansion in slice=("${input[@]:1}"), each array element is split on whitespace, creating many more elements.
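The difference between the two approaches is easiest to see by printing the indices. A short sketch using the input array from the question:

```shell
input=("first string" "second string" "third string")

slice=("${input[@]:1}")   # new array; indices are renumbered from 0
unset 'input[0]'          # same array; the remaining indices keep their numbers

echo "slice indices: ${!slice[*]}"
echo "input indices: ${!input[*]}"
echo "slice length:  ${#slice[@]}"
```

This prints 0 1 for slice and 1 2 for input, with a length of 2 in both cases.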

Perl: How to delete elements from array without spaces

How to delete the elements of array from index 4 to 10. I am doing like this
#!/usr/bin/perl
@array=1..10;
@array[0..3]=@array[5..8];
@array[4..10]=();
$str=(join ",",@array);
print "$str\n";
and getting output as
6,7,8,9,,,,,,,
How to remove extra ","?
You're doing a bit more than trying to delete elements of the array in that code...
Anyways, you want splice to remove a contiguous range of array elements:
splice @array, 4, 6;
I'm assuming this is some kind of homework question, because if it's not you're doing some REALLY strange things with this code.
But let's take the problem at hand - the easiest way to filter an array is with grep
my @no_empty = grep { defined } @array;
print join ( ",", @no_empty )
In this we use grep to filter out any undef array elements. The element is returned if the expression in the braces evaluates as true - $_ is set to the current element, and used in any tests that implicitly work on $_. defined does, as do regular expression matches.
However if you genuinely just want to truncate the array to an arbitrary number of elements, you can just do:
@array = @array[0..4]
You start with this array:
my @array = 1..10;
Now you have an array of 10 elements.
1 2 3 4 5 6 7 8 9 10
In your next step, you perform an array slice then assign that to another array slice:
@array[0..3] = @array[5..8];
This leaves alone the elements from 5 to 8, but does a list assignment of their values to the elements from 0 to 3, so you end up with some duplicated elements:
6 7 8 9 5 6 7 8 9 10
Now you want to get rid of everything after index 4. You try that with another list assignment:
@array[4..10] = ();
However, that just assigns the list on the right to the list on the left. As with any list assignment, the left hand elements get their corresponding elements from the right hand list. If the right hand list doesn't have enough elements, Perl uses undef for the rest of the left hand elements. That's why @array still has all of its slots (11 of them now, because the slice also assigned to index 10):
6 7 8 9 undef undef undef undef undef undef undef
If you want to choose where the array ends, you can assign to the last index of the array. Whatever number you assign becomes the last index, either shortening or extending the array. Since you wanted to get rid of elements 4 and beyond, you can set the last index to 3 (the index right before the one you want to remove):
$#array = 3;
Now you've truncated the array to four elements:
6 7 8 9
You can also do this with splice, which replaces part of an array:
splice @array, $start_index, $length, @replacement;
You might use the starting index to figure out the length since you want to go all the way to the end. Then, replace that portion of the array with the empty list, which effectively makes the array shorter:
my $start = 4;
splice @array, $start, @array - $start, ();
Leaving off the replacement list is the same as the empty list:
splice @array, $start, @array - $start;
This is much more handy when you want to remove parts in the middle. This removes three elements starting at index 4 (so there will be stuff left over at the end):
splice @array, 4, 3;
Now your array has elements that were at the beginning and end of your array:
6,7,8,9,8,9,10
Without shortening the array
There's another sort of problem. You don't want to change the array, but you don't want to deal with empty fields. You can use a grep to select only the defined elements:
say join ',', grep { defined } @array;
If you have undefined elements in the middle of the array, this might be a problem if you expect columns to line up properly. Removing a column in the middle shifts the other columns. You may not care about that though.
Similarly, you might turn the undefined values into something that makes sense for the problem. A map can inspect the value and decide to pass it through or transform it. In this example, undef values turn into 0:
say join ',', map { defined ? $_ : 0 } @array;
6,7,8,9,0,0,0,0,0,0,0
Or "NULL":
say join ',', map { defined ? $_ : 'NULL' } @array;
6,7,8,9,NULL,NULL,NULL,NULL,NULL,NULL,NULL
Or even just undef as a string:
say join ',', map { defined ? $_ : 'undef' } @array;
6,7,8,9,undef,undef,undef,undef,undef,undef,undef
Sometimes those are handy to see what's going on.
And map can act like grep to filter an array. Use the empty list to pass on no elements from the map:
say join ',', map { defined ? $_ : () } @array;

How to slice a variable into array indexes?

There is this typical problem: given a list of values, check if they are present in an array.
In awk, the trick val in array does work pretty well. Hence, the typical idea is to store all the data in an array and then keep doing the check. For example, this will print all lines in which the first column value is present in the array:
awk 'BEGIN {<<initialize the array>>} $1 in array_var' file
However, initializing the array takes some effort, because val in array checks whether the index val is in array, whereas what we normally have stored in array is a set of values.
This becomes more relevant when providing values from command line, where those are the elements that we want to include as indexes of an array. For example, in this basic example (based on a recent answer of mine, which triggered my curiosity):
$ cat file
hello 23
bye 45
adieu 99
$ awk -v values="hello adieu" 'BEGIN {split(values,v); for (i in v) names[v[i]]} $1 in names' file
hello 23
adieu 99
split(values,v) slices the variable values into an array v[1]="hello"; v[2]="adieu"
for (i in v) names[v[i]] initializes another array names[] with names["hello"] and names["adieu"] with empty value. This way, we are ready for
$1 in names that checks if the first column is any of the indexes in names[].
As you see, we slice into a temp variable v to later on initialize the final and useful variable names[].
Is there any faster way to initialize the indexes of an array instead of setting one up and then using its values as indexes of the definitive?
No, that is the fastest (due to hash lookup) and most robust (due to string comparison) way to do what you want.
This:
BEGIN{split(values,v); for (i in v) names[v[i]]}
happens once on startup and will take close to no time while this:
$1 in array_var
which happens once for every line of input (and so is the place that needs to have optimal performance) is a hash lookup and so the fastest way to compare a string value to a set of strings.
This is not an array solution, but one trick is to use pattern matching. To eliminate partial matches, wrap both the search value and the list of values in the delimiter. For your example:
$ awk -v values="hello adieu" 'FS values FS ~ FS $1 FS' file
hello 23
adieu 99
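One caveat with the pattern-matching trick: ~ treats its right-hand side as a regular expression, so a first field containing metacharacters such as . can match more than intended. A hedged variation, not part of the answer above, swaps ~ for index(), which does a literal substring search. The sample input lines are made up for illustration:

```shell
# "hell" is a partial match and "a.ieu" would match "adieu" as a regex;
# index() compares literally, so neither line is selected.
matches=$(
  printf '%s\n' 'hello 23' 'hell 7' 'a.ieu 1' 'adieu 99' |
    awk -v values="hello adieu" 'index(FS values FS, FS $1 FS)'
)

printf '%s\n' "$matches"
```

Only the hello and adieu lines are printed. Like the original trick, this assumes the values themselves do not contain the delimiter.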

Why does awk seem to randomize the array?

If you look at the output of this awk test, you see that the array seems to be printed in some random order. The order seems to be the same for the same number of input fields. Why does it do so?
echo "one two three four five six" | awk '{for (i=1;i<=NF;i++) a[i]=$i} END {for (j in a) print j,a[j]}'
4 four
5 five
6 six
1 one
2 two
3 three
echo "P04637 1A1U 1AIE 1C26 1DT7 1GZH 1H26 1HS5 1JSP 1KZY 1MA3 1OLG 1OLH 1PES 1PET 1SAE 1SAF 1SAK 1SAL 1TSR 1TUP 1UOL 1XQH 1YC5 1YCQ" | awk '{for (i=1;i<=NF;i++) a[i]=$i} END {for (j in a) print j,a[j]}'
17 1SAF
4 1C26
18 1SAK
5 1DT7
19 1SAL
6 1GZH
7 1H26
8 1HS5
9 1JSP
10 1KZY
20 1TSR
11 1MA3
21 1TUP
12 1OLG
22 1UOL
13 1OLH
23 1XQH
14 1PES
1 P04637
24 1YC5
15 1PET
2 1A1U
25 1YCQ
16 1SAE
3 1AIE
Why does it do so, is there rule for this?
From 8. Arrays in awk --> 8.5 Scanning All Elements of an Array in the GNU Awk user's guide when referring to the for (value in array) syntax:
The order in which elements of the array are accessed by this
statement is determined by the internal arrangement of the array
elements within awk and cannot be controlled or changed. This can lead
to problems if new elements are added to array by statements in the
loop body; it is not predictable whether or not the for loop will
reach them. Similarly, changing var inside the loop may produce
strange results. It is best to avoid such things.
So if you want to print the array in the order you store it, then you have to use the classical for loop:
for (j=1; j<=NF; j++) print j,a[j]
Example:
$ awk '{for (i=1;i<=NF;i++) a[i]=$i} END {for (j=1; j<=NF; j++) print j,a[j]}' <<< "P04637 1A1U 1AIE 1C26 1DT7 1GZH 1H26 1HS5 1JSP 1KZY 1MA3 1OLG 1OLH 1PES 1PET 1SAE 1SAF 1SAK 1SAL 1TSR 1TUP 1UOL 1XQH 1YC5 1YCQ"
1 P04637
2 1A1U
3 1AIE
4 1C26
5 1DT7
6 1GZH
7 1H26
8 1HS5
9 1JSP
10 1KZY
11 1MA3
12 1OLG
13 1OLH
14 1PES
15 1PET
16 1SAE
17 1SAF
18 1SAK
19 1SAL
20 1TSR
21 1TUP
22 1UOL
23 1XQH
24 1YC5
25 1YCQ
Awk uses hash tables to implement associative arrays, and this ordering is an inherent property of that data structure. The location where a particular element is stored depends on the hash of its key. Another factor is the implementation of the hash table: if it is memory efficient, it will limit the range each key gets stored in using the modulus function or some other method. You may also get clashing hash values for different keys, so chaining will occur, again affecting the order depending on which key was inserted first.
The construct (key in array) is perfectly fine when used appropriately to loop over every key, but you cannot count on the order, and you should not update array while in the loop, as you may end up processing array[key] multiple times by mistake.
There is a good description of hash tables in the book Think Complexity.
The issue is the operator you use to get the array indices, not the fact that the array is stored in a hash table.
The in operator provides the array indices in a random(-looking) order (which IS by default related to the hash table but that's an implementation choice and can be modified).
A for loop that explicitly provides the array indices in numerically increasing order also operates on the same hash table that the in operator operates on, but it produces output in a specific order regardless.
It's just 2 different ways of getting the array indices, both of which work on a hash table.
Run man awk and look up the in operator.
If you want to control the output order using the in operator, you can do so with GNU awk (from release 4.0 on) by populating PROCINFO["sorted_in"]. See http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal for details.
Some common ways to access array indices:
To print array elements in an order you don't care about:
{a[$1]=$0} END{for (i in a) print i, a[i]}
To print array elements in numeric order of indices if the indices are numeric and contiguous starting at 1:
{a[++i]=$0} END{for (i=1;i in a;i++) print i, a[i]}
To print array elements in numeric order of indices if the indices are numeric but non-contiguous:
{a[$1]=$0; min=($1<min?$1:min); max=($1>max?$1:max)} END{for (i=min;i<=max;i++) if (i in a) print i, a[i]}
To print array elements in the order they were seen in the input:
{a[$1]=$0; b[++max]=$1} END{for (i=1;i <= max;i++) print b[i], a[b[i]]}
To print array elements in a specific order of indices using gawk 4.0+:
BEGIN{PROCINFO["sorted_in"]=whatever} {a[$1]=$0} END{for (i in a) print i, a[i]}
For anything else, write your own code and/or see gawk asort() and asorti().
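If the awk in use is not gawk, one portable alternative (a sketch, not one of the idioms listed above) is to leave the traversal order alone and sort the output afterwards with sort:

```shell
# for (j in a) still visits the keys in an unspecified order;
# sort -n then orders the printed lines by their leading numeric index.
sorted=$(
  echo "one two three four five six" |
    awk '{for (i=1;i<=NF;i++) a[i]=$i} END {for (j in a) print j, a[j]}' |
    sort -n
)

printf '%s\n' "$sorted"
```

This only works when the index is printed as the first field and the values contain no newlines.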
If you are using gawk or mawk, you can also set an env variable WHINY_USERS, which will sort indices before iterating.
Example:
echo "one two three four five six" | WHINY_USERS=true awk '{for (i=1;i<=NF;i++) a[i]=$i} END {for (j in a) print j,a[j]}'
1 one
2 two
3 three
4 four
5 five
6 six
From mawk's manual:
WHINY_USERS
This is an undocumented gawk feature. It tells mawk to sort array indices before it starts to iterate over the elements of an array.

Bash: Can an array hold the name of another array?

I am writing a program and trying to break up data, which is stored in an array, in order to make it run faster.
I am attempting to go about it this way:
data_to_analyze=(1 2 3 4 5 6 7 8 9 10)
#original array size
dataSize=(${#data_to_analyze[@]})
#half of that size
let samSmall="$dataSize/2"
#the other half
let samSmall2=("$dataSize - $samSmall -1")
#the first half
smallArray=("${data_to_analyze[@]:0:$samSmall}")
#the rest
smallArray2=("${data_to_analyze[@]:$samSmall:$samSmall2}")
#an array of names(which correspond to arrays)
combArray=(smallArray smallArray2)
sizeComb=(${#combArray[@]})
#for the length of the new array
for ((i=0; i<= $sizeComb ; i++)); do
#through first set of data and then loop back around for the second arrays data?
for sample_name in ${combArray[i]}; do
command
wait
command
wait
done
What I imagine this does is gives only the first array of data to the for loop at first. When the first array is done it should go through again with the second array set.
That leaves me with two questions. Is combArray really passing the two smaller arrays? And is there a better way?
You can make a string that looks like an array reference then use it to indirectly access the elements of the referenced array. It even works for elements that contain spaces!
combArray=(smallArray smallArray2)
for array in "${combArray[@]}"
do
    indirect=$array[@] # make a string that sort of looks like an array reference
    for element in "${!indirect}"
    do
        echo "Element: $element"
    done
done
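On bash 4.3 or later, a nameref is another way to follow an array name. This is a sketch of an alternative, not the method used above; the array contents and the collected variable are made up for the example:

```shell
smallArray=("a b" c)
smallArray2=(d "e f")
combArray=(smallArray smallArray2)

collected=()
for name in "${combArray[@]}"; do
  declare -n ref="$name"      # ref becomes an alias for the array named in $name
  for element in "${ref[@]}"; do
    collected+=("$element")
  done
  unset -n ref                # drop the alias itself, not the array it points to
done

printf 'Element: %s\n' "${collected[@]}"
```

Elements containing spaces survive intact here as well, because every expansion is quoted.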
#!/bin/bash
data_to_analyze=(1 2 3 4 5 6 7 8 9 10)
dataSize=${#data_to_analyze[@]}
((samSmall=dataSize/2,samSmall2=dataSize-samSmall))
smallArray=("${data_to_analyze[@]:0:$samSmall}")
smallArray2=("${data_to_analyze[@]:$samSmall:$samSmall2}")
combArray=(smallArray smallArray2)
sizeComb=${#combArray[@]}
for ((i=0;i<$sizeComb;i++));do
    eval 'a=("${'${combArray[i]}'[@]}")'
    for sample_name in "${a[@]}";do
        ...
    done
done
EDIT: removed the double quotes near ${combArray[i]} and replaced <= by < in for
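Note that the question computed the second length as dataSize - samSmall - 1, which drops the last element. A minimal sketch of the split, with an odd-sized array to show where the extra element lands (the names data, half, first and second are made up for the example):

```shell
data=(1 2 3 4 5 6 7)

half=$(( ${#data[@]} / 2 ))    # integer division: 3 for seven elements
first=("${data[@]:0:half}")    # offset and length are arithmetic expressions
second=("${data[@]:half}")     # no length given: everything from index half onward

echo "first:  ${first[*]}"
echo "second: ${second[*]}"
```

Omitting the length in the second slice avoids the off-by-one entirely; the second half simply takes whatever remains.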
