Reading several files into an associative array in bash (>4.0) [duplicate] - arrays

This question already has answers here:
How to pipe input to a Bash while loop and preserve variables after loop ends
(3 answers)
Closed 4 years ago.
I am new to associative arrays in bash, so please forgive me if I sound silly somewhere. Let's say I am reading through a large file and using a bash (version 4.2.46) associative array to store FDR values for genes. For one file, I am simply doing:
declare -A array
while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference; do
array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ;
done < input.txt
Which will store the FDR values that I can print by doing
for key in "${!array[@]}"; do echo "$key->${array[$key]}"; done
# Prints out
"ABHD14B"->0.285807588279,0.898327660004,0.820468496328
"DHFR"->0.464931314555,0.449582575347
...
I naively tried to read several files into my array by doing
declare -A array
find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt -type f -exec cat {} + |
while read ID GeneID geneSymbol chr strand exonStart_0base exonEnd upstreamES upstreamEE downstreamES downstreamEE ID IJC_SAMPLE_1 SJC_SAMPLE_1 IJC_SAMPLE_2 SJC_SAMPLE_2 IncFormLen SkipFormLen PValue FDR IncLevel1 IncLevel2 IncLevelDifference;
do array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR" ;
done
But in this case my array ends up being empty. I can of course cat all the files I need and save them into a single file that I can use as above, but it would be nice to know how to make an associative array to store data from several distinct files.
Thank you very much!

You probably shouldn't be doing this in bash in the first place, but your main problem is that the while loop runs in a subshell induced by the pipeline. Use process substitution to invert the relationship.
(Also, don't give names to all the fields you don't actually use; just split the line into an indexed array and pick out the two fields you actually want.)
while read -r -a fields; do
geneSymbol=${fields[2]} # third column
FDR=${fields[...]} # some number; I'm not counting
array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR"
done < <(find ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt -type f -exec cat {} +)
find probably isn't necessary; just put your while loop inside a for loop:
for f in ./aligned.filtered/rMAT*/MATS_output/SE.MATS.JunctionCountOnly.txt; do
while read -r -a fields; do
...
done < "$f"
done
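Putting the two suggestions together, here is a minimal, self-contained sketch of the per-file loop. It is only an illustration: the scratch files, their four-column layout, and the column indices 2 and 3 are made up for the demo (the real rMATS files have more columns, so the FDR index would differ), and if your files begin with a header line you would need to skip it.

```shell
#!/usr/bin/env bash
# Sketch: accumulate a comma-separated list of FDR values per gene symbol
# across several files. Demo assumption (not from the question): tiny
# whitespace-separated files where index 2 is the gene symbol and index 3
# is the FDR value.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' '1 g1 ABHD14B 0.28' '2 g2 DHFR 0.46' > "$tmpdir/a.txt"
printf '%s\n' '3 g1 ABHD14B 0.89' > "$tmpdir/b.txt"

declare -A array
for f in "$tmpdir"/*.txt; do
  # Redirecting into the loop (no pipe) keeps it in the current shell,
  # so assignments to "array" survive after the loop ends.
  while read -r -a fields; do
    geneSymbol=${fields[2]}
    FDR=${fields[3]}
    array[$geneSymbol]="${array[$geneSymbol]}${array[$geneSymbol]:+,}$FDR"
  done < "$f"
done

# Print every gene with its accumulated values (key order is unspecified).
for key in "${!array[@]}"; do
  echo "$key->${array[$key]}"
done
```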

Related

Read directory name and size into associative array

I want to get the name and size of all directories [at the top level] of a specified directory into an associative array, such that the name is used as the key and the size as the value.
I know that I can use mapfile to read the output of a command (this extracts the directory size) into an indexed array:
mapfile -t inter_arry < <( du -d0 "$completePath"* | sed -E 's/^([0-9]*).*$/\1/' );
(I would then loop through this array and use it to populate the associative array.)
I know that I could create a matching array and populate it with the directory name (though there's no way of knowing if there's been a change in the contents between commands), but how can I extract both the size and the name by modifying my code snippet?
Is there any way to skip the intermediate indexed array?
If there are many items, it's faster, though not prettier, to avoid a loop. Here I use du | awk to create the array initialization string:
declare -A ARR=$(
echo '( '$(
du -d0 "$completePath"* |
awk -F$'\t' '{printf "[%s]=%s ", $2, $1}'
)')'
)
If there are few items (e.g., thousands or fewer), use a loop as @Inian suggests:
declare -A ARR
while IFS=$'\t' read -r size name; do
ARR[$name]=$size
done < <(du -d0 "$completePath"*)
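For reference, a minimal sketch of the loop approach that keys the array on the bare directory name instead of the full path. The scratch directory tree and the use of ${path##*/} to strip the parent path are illustrative assumptions, not part of the answer above; du -d0 is kept as in the question.

```shell
#!/usr/bin/env bash
# Sketch: map each top-level directory name to its du size in one pass.
# Demo assumption: a throwaway directory tree stands in for $completePath.
parent=$(mktemp -d)
trap 'rm -rf "$parent"' EXIT
mkdir -p "$parent/alpha" "$parent/beta gamma"
printf 'hello\n' > "$parent/alpha/file.txt"

declare -A ARR
# du prints "<size><TAB><path>"; IFS=$'\t' splits on the tab only, so
# directory names containing spaces stay intact.
while IFS=$'\t' read -r size path; do
  ARR[${path##*/}]=$size   # key on the basename rather than the full path
done < <(du -d0 "$parent"/*)

for name in "${!ARR[@]}"; do
  printf '%s\t%s\n' "$name" "${ARR[$name]}"
done
```

Sizes depend on the filesystem, so only the keys are predictable here.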

Single line while loop updating array

I am trying to build a while loop that updates the values in an array but I keep getting a command not found error.
i=1
bool=true
declare -a LFT
declare -a RGT
while read -r line; do
${LFT[$i]}=${line:0:1}; ${RGT[$i]}=$(wc -l < temp$i.txt);
if [ ${LFT[$i]} -ne ${RGT[$i]} ]; then
$bool=false;
fi;
((i=i+1));
done<output2.txt
The file I am reading from contains a single digit on each line, and I want to fill the array LFT with those digits. The array RGT should be filled with the line counts of the files named temp*.txt. Then I want to check that the corresponding entries of the two arrays are equal.
However, I keep getting the error command =# not found, where # is whatever digit is on that line of the file. Am I assigning values to the arrays incorrectly? I also get the error command true=false not found, which I assume has something to do with how I assign to the boolean.
Thanks
The issue is on these lines:
${LFT[$i]}=${line:0:1}; ${RGT[$i]}=$(wc -l < temp$i.txt);
Change it to:
LFT[$i]=${line:0:1}; RGT[$i]=$(wc -l < temp$i.txt);
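Valid assignment in shell should be:
var=<expression>
rather than
$var=<expression> ## this will be interpreted by the shell as a command
The same applies to $bool=false: $bool is expanded to true first, so the shell tries to run a command named true=false. Write bool=false instead.
This is one of the common mistakes Bash programmers make. More Bash pitfalls here.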
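With those fixes applied, the whole loop might look like the sketch below. It creates its own scratch copies of output2.txt, temp1.txt and temp2.txt (made-up contents) so it can run on its own, and it compares the entries with (( ... )) rather than [ ... -ne ... ], which is a stylistic choice, not something the answer requires.

```shell
#!/usr/bin/env bash
# Demo setup (assumption): scratch input files in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '2\n1\n' > output2.txt   # one digit per line
printf 'a\nb\n' > temp1.txt     # 2 lines
printf 'c\n' > temp2.txt        # 1 line

i=1
bool=true
declare -a LFT RGT
while read -r line; do
  LFT[i]=${line:0:1}                 # plain name[index]=value, no leading $
  RGT[i]=$(wc -l < "temp$i.txt")     # line count of temp$i.txt
  if (( LFT[i] != RGT[i] )); then    # arithmetic comparison of the pair
    bool=false                       # assign with bool=, not $bool=
  fi
  (( i++ ))
done < output2.txt
echo "all entries match: $bool"
```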

Shell Array Cleared for Unknown Reason [duplicate]

This question already has answers here:
A variable modified inside a while loop is not remembered
(8 answers)
Closed 6 years ago.
I have a pretty simple sh script where I make a system cat call, collect the results and parse some relevant information before storing the information in an array, which seems to work just fine. But as soon as I exit the for loop where I store the information, the array seems to clear itself. I'm wondering if I am accessing the array incorrectly outside of the for loop. Relevant portion of my script:
#!/bin/sh
declare -a QSPI_ARRAY=()
cat /proc/mtd | while read mtd_instance
do
# split result into individual words
words=($mtd_instance)
for word in "${words[@]}"
do
# check for uboot
if [[ $word == *"uboot"* ]]
then
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="uboot"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
elif [[ $word == *"fpga_a"* ]]
then
echo "found it: "$word""
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="fpga_a"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
# other items are added to the array, all successfully
fi
done
echo "length of array: ${#QSPI_ARRAY[@]}"
echo "----------------------"
done
My output is great until I exit the for loop. While within the for loop, the array size increments and I can check that the item has been added. After the for loop is complete I check the array like so:
echo "RESULTING ARRAY:"
echo "length of array: ${#QSPI_ARRAY[@]}"
for qspi in "${QSPI_ARRAY}"
do
echo "qspi instance: $qspi"
done
Here are my results, echoed to my display:
dev: size erasesize name
length of array: 0
-------------
mtd0: 00100000 00001000 "qspi-fsbl-uboot"
QSPI_ARRAY[] at index 0: uboot
length of array: 1
-------------
mtd1: 00500000 00001000 "qspi-fpga_a"
QSPI_ARRAY[] at index 1: fpga_a
length of array: 2
-------------
RESULTING ARRAY:
length of array: 0
qspi instance:
EDIT: After some debugging, it seems I have two different arrays here somehow. I initialized the array like so: QSPI_ARRAY=("a" "b" "c" "d" "e" "f" "g"), and after my for-loop for parsing the array it is still a, b, c, etc. How do I have two different arrays of the same name here?
This structure:
cat /proc/mtd | while read mtd_instance
do
...
done
Means that whatever comes between do and done cannot have any effects inside the shell environment that are still there after the done.
The fact that the while loop is on the right-hand side of a pipe (|) means that it runs in a subshell. Once the loop exits, so does the subshell, and all of its variable settings are discarded with it.
If you want a while loop which makes changes that stick around, don't use a pipe. Input redirection doesn't create a subshell, and in this case, you can just read from the file directly:
while read mtd_instance
do
...
done </proc/mtd
If you had a more complicated command than a cat, you might need to use process substitution. Still using cat as an example, that looks like this:
while read mtd_instance
do
...
done < <(cat /proc/mtd)
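To see the difference directly, here is a small sketch that feeds the same three lines to a while loop via a pipe and via process substitution, plus a third option not mentioned above: bash's lastpipe shell option (bash 4.2 and later), which runs the last element of a pipeline in the current shell when job control is off, as it is in scripts. printf stands in for cat /proc/mtd so the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# printf stands in for "cat /proc/mtd" (assumption, for portability).

piped=()
printf '%s\n' a b c | while read -r x; do piped+=("$x"); done
# The loop ran in a subshell, so piped is still empty here.

substituted=()
while read -r x; do substituted+=("$x"); done < <(printf '%s\n' a b c)
# Process substitution keeps the loop in the current shell.

# bash 4.2+: lastpipe runs the last pipeline element in the current shell,
# but only while job control is off (the default in scripts).
shopt -s lastpipe
lastpiped=()
printf '%s\n' a b c | while read -r x; do lastpiped+=("$x"); done
shopt -u lastpipe

echo "pipe: ${#piped[@]}, process substitution: ${#substituted[@]}, lastpipe: ${#lastpiped[@]}"
```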
In the specific case of your example code, I think you could simplify it somewhat, perhaps like this:
#!/usr/bin/env bash
QSPI_ARRAY=()
while read -r -a words; do
declare -i mtd_num=${words[0]//[!0-9]/}
for word in "${words[@]}"; do
for type in uboot fpga_a; do
if [[ $word == *$type* ]]; then
QSPI_ARRAY[mtd_num]=$type
break 2
fi
done
done
done </proc/mtd
Is this potentially what you are seeing:
http://mywiki.wooledge.org/BashFAQ/024

How to copy an array in Bash?

I have an array of applications, initialized like this:
depends=$(cat ~/Depends.txt)
When I try to parse the list and copy it to a new array using,
for i in "${depends[@]}"; do
if [ $i #isn't installed ]; then
newDepends+=("$i")
fi
done
What happens is that only the first element of depends winds up on newDepends.
for i in "${newDepends[@]}"; do
echo $i
done
^^ This outputs just one thing. So I'm trying to figure out why my for loop is only copying the first element. The whole list is originally in depends, so it's not that, but I'm all out of ideas.
a=(foo bar "foo 1" "bar two") #create an array
b=("${a[@]}") #copy the array into another one
for value in "${b[@]}" ; do #print the new array
echo "$value"
done
The simplest way to copy a non-associative array in bash is:
arrayClone=("${oldArray[@]}")
or, to add elements to a pre-existing array:
someArray+=("${oldArray[@]}")
Newlines/spaces/IFS in the elements will be preserved.
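One caveat worth seeing in practice: "${oldArray[@]}" copies the values but renumbers the indices from 0, so a sparse array does not come out identical. The sketch below (array names made up for illustration) contrasts that with copying index by index.

```shell
#!/usr/bin/env bash
# Sketch: a sparse indexed array (hypothetical example data).
old=([0]=zero [5]=five [9]="nine with spaces")

renumbered=("${old[@]}")          # values kept, indices become 0 1 2

exact=()
for idx in "${!old[@]}"; do       # iterate over the original indices
  exact[idx]=${old[idx]}          # so 0, 5 and 9 are preserved
done

declare -p renumbered exact
```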
For copying associative arrays, Isaac's solutions work great.
The solutions given in the other answers won't work for associative arrays, or for arrays with non-contiguous indices. Here is a more general solution:
declare -A arr=([this]=hello [\'that\']=world [theother]='and "goodbye"!')
temp=$(declare -p arr)
eval "${temp/arr=/newarr=}"
diff <(echo "$temp") <(declare -p newarr | sed 's/newarr=/arr=/')
# no output
And another:
declare -A arr=([this]=hello [\'that\']=world [theother]='and "goodbye"!')
declare -A newarr
for idx in "${!arr[@]}"; do
newarr[$idx]=${arr[$idx]}
done
diff <(declare -p arr) <(declare -p newarr | sed 's/newarr=/arr=/')
# no output
Try this: arrayClone=("${oldArray[@]}")
This works easily.
array_copy() {
set -- "$(declare -p "$1")" "$2"
eval "$2=${1#*=}"
}
# Usage examples:
these=(apple banana catalog dormant eagle fruit goose hat icicle)
array_copy these those
declare -p those
declare -A src dest
src=(["It's a 15\" spike"]="and it's 1\" thick" [foo]=bar [baz]=qux)
array_copy src dest
declare -p dest
Note: when copying associative arrays, the destination must already exist as an associative array. If not, array_copy() will create it as a standard array and try to interpret the key names from the associative source as arithmetic variable names, with ugly results.
Isaac Schwabacher's solution is more robust in this regard, but it can't be tidily wrapped up in a function because its eval step evaluates an entire declare statement and bash treats those as equivalent to local when they're inside a function. This could be worked around by wedging the -g option into the evaluated declare but that might give the destination array more scope than it's supposed to have. Better, I think, to have array_copy() perform only the actual copy into an explicitly scoped destination.
You can copy an array by inserting the elements of the first array into the copy by specifying the index:
#!/bin/bash
array=( One Two Three Go! )
array_copy=()
let j=0
for (( i=0; i<${#array[@]}; i++ ))
do
if [[ $i -ne 1 ]]; then # change the test here to your 'isn't installed' test
array_copy[$j]="${array[$i]}"
let j+=1
fi
done
for k in "${array_copy[@]}"; do
echo "$k"
done
The output of this would be:
One
Three
Go!
A useful document on bash arrays is on TLDP.
The problem is copying an array inside a function so that the copy is visible to the calling code. This solution works for indexed arrays and, if the destination is declared beforehand with declare -A ARRAY, also for associative arrays.
function array_copy
# $1 original array name
# $2 new array name with the same content
{
local INDEX
eval "
for INDEX in \"\${!$1[@]}\"
do
$2[\"\$INDEX\"]=\"\${$1[\$INDEX]}\"
done
"
}
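Starting with Bash 4.3, you can create a name reference (nameref) to the array. Note that this gives the array a second name rather than copying it: changes made through golf also change alpha.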
$ alpha=(bravo charlie 'delta 3' '' foxtrot)
$ declare -n golf=alpha
$ echo "${golf[2]}"
delta 3
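Because declare -n creates a reference rather than a copy, writes through golf land in alpha. A short sketch (bash 4.3+, reusing alpha and golf plus a hypothetical copy array for comparison) makes the difference visible:

```shell
#!/usr/bin/env bash
alpha=(bravo charlie 'delta 3' '' foxtrot)
declare -n golf=alpha       # golf is another name for alpha
copy=("${alpha[@]}")        # hypothetical independent copy, for comparison

golf[0]=changed             # writes through the reference...
echo "${alpha[0]}"          # ...so alpha[0] now holds "changed"
echo "${copy[0]}"           # the copy still holds "bravo"
```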
Managed to copy an array into another.
firstArray=()
secondArray=()
firstArray+=("Element1")
firstArray+=("Element2")
secondArray+=("${firstArray[@]}")
for element in "${secondArray[@]}"; do
echo "${element}"
done
I've found that this works for me (mostly :)) ...
eval "$(declare -p base | sed 's,base,target,')"
Extend the sed command to edit any attributes as necessary; e.g., if the new array has to be writable, edit out the read-only flag (-r).
I've discovered what was wrong. My "isn't installed" test is two for loops that remove excess characters from file names and print the names if they exist on a certain web server. What it wasn't doing was removing a trailing hyphen, so when it checked the names online for availability, they were filtered out: "file" exists, but "file-" doesn't.

Bash: Split a string into an array

First of all, let me state that I am very new to Bash scripting. I have tried to look for solutions for my problem, but couldn't find any that worked for me.
Let's assume I want to use bash to parse a file that looks like the following:
variable1 = value1
variable2 = value2
I split the file line by line using the following code:
cat /path/to/my.file | while read line; do
echo $line
done
From the $line variable I want to create an array that I want to split using = as a delimiter, so that I will be able to get the variable names and values from the array like so:
$array[0] #variable1
$array[1] #value1
What would be the best way to do this?
Set IFS to '=' in order to split the string on the = sign in your lines, i.e.:
while IFS='=' read -r key value; do
array[0]=$key
array[1]=$value
done < file
You may also be able to use the -a argument to specify an array to write into, i.e.:
while IFS='=' read -r -a array; do
...
done < file
Depending on your bash version.
Old completely wrong answer for posterity:
Add the argument -d = to your read statement. Then you can do:
cat file | while read -d = key value; do
$array[0]="$key"
$array[1]="$value"
done
while IFS='=' read -r k v; do
: # do something with $k and $v
done < file
IFS is the 'Internal Field Separator'; setting it to '=' for the read tells bash to split the line on the '=' sign.
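Since the original goal was to get at variable names and values, a minimal sketch that stores each pair in an associative array might look like this. The scratch file, the settings array name, and the choice of IFS=' =' (so the spaces around = are not kept in the key or value) are assumptions for this example.

```shell
#!/usr/bin/env bash
# Demo setup (assumption): a scratch file in the question's format.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
printf '%s\n' 'variable1 = value1' 'variable2 = value2' > "$conf"

declare -A settings
# With IFS=' =', the " = " between key and value counts as one separator,
# so the surrounding spaces end up in neither key nor value.
while IFS=' =' read -r key value; do
  [[ -n $key ]] || continue    # skip blank lines
  settings[$key]=$value
done < "$conf"

for key in "${!settings[@]}"; do
  echo "$key -> ${settings[$key]}"
done
```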
