Fill Two Arrays on the Fly From a File in Bash - arrays

I wrote a bash script that takes a file as its first argument ($1) and needs to read that file line by line within a loop. Based on a condition tested in each iteration, each line from the file will be appended to one of two new arrays, let's say named GOOD and BAD. Lastly, I'll display the total number of elements in each array.
#!/bin/bash
for x in $(cat $1); do
#testing something on x
if [ $? -eq 0 ]; then
#add the current value of x into array called GOOD
else
#add the current value of x into array called BAD
fi
done
echo "Total GOOD elements: ${#GOOD[#]}"
echo "Total BAD elements: ${#BAD[#]}"
What changes should I make to accomplish this?

#!/usr/bin/env bash
# here, we're checking whether each line is more than 5 characters long
# replace with your real test
testMyLine() { (( ${#1} > 5 )); }
good=( ); bad=( )
while IFS= read -r line; do
if testMyLine "$line"; then
good+=( "$line" )
else
bad+=( "$line" )
fi
done <"$1"
echo "Read ${#good[#]} good and ${#bad[#]} bad lines"
Note:
We're using a while read loop to iterate over file contents. This doesn't need to read more than one line into memory at a time (so it won't run out of RAM even with really big files), and it doesn't have unwanted side effects like changing a line containing * to a list of files in the current directory.
We aren't using $?. if foo; then is a much better way to branch on the exit status of foo than foo; if [ $? = 0 ]; then -- in particular, this avoids depending on the value of $? not being changed between when you assign it and when you need it; and it marks foo as "checked", to avoid exiting via set -e or triggering an ERR trap when your boolean returns false.
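As a minimal illustration of that pitfall (the grep pattern and messages are placeholders):
grep -q root /etc/passwd
echo "checking..."        # echo succeeds, so $? is now 0 regardless of grep
if [ $? -eq 0 ]; then     # this tests echo's status, not grep's
    echo "always reached"
fi
# better: branch on the command directly
if grep -q root /etc/passwd; then
    echo "found"
fi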
The use of lower-case variable names is intentional. All-uppercase names are used for shell-builtin variables and names with special meaning to the operating system -- and since defining a regular shell variable overwrites any environment variable with the same name, this convention applies to both types. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html

Related

Create associative array from grep output

I have a grep output and I'm trying to make an associative array from the output that I get.
Here is my grep output:
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
I want to use that output to define an associative array, like this:
array[123456789101]=devid1234
array[111213141516]=devid5678
Is that possible? I'm new at making arrays, and I hope someone can help me with my problem.
Either pipe your grep output to a helper script with a while loop containing a simple 0/1 toggle that reads two lines at a time, taking the last field of each to fill your array, e.g.
#!/bin/bash
declare -A array
declare -i n=0
arridx=
while read -r label value; do # read 2 fields
if [ "$n" -eq 0 ]
then
arridx="${value:1}" # strip 1st and lst 2 chars
arridx="${arridx:0:(-2)}" # save in arridx (array index)
((n++)) # increment toggle
else
arrval="${value:1}" # strip 1st and lst 2 chars
arrval="${arrval:0:(-2)}" # save in arrval (array value)
array[$arridx]="$arrval" # assign to associative array
n=0 # zero toggle
fi
done
for i in "${!array[@]}"; do # output array
echo "array[$i] ${array[$i]}"
done
Or you can use process substitution containing the grep command within the script to do the same thing, e.g.
done < <( your grep command )
You can also add a check under the else clause, e.g. if [[ $label =~ DeviceId ]], to validate that you are on the right line and to catch any variation in the grep output content.
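For instance, a sketch of the toggle loop with that validation added (the warning message is illustrative):
#!/bin/bash
declare -A array
declare -i n=0
arridx=
while read -r label value; do
    if [ "$n" -eq 0 ]; then
        arridx="${value:1}"             # strip leading quote
        arridx="${arridx:0:(-2)}"       # strip trailing ",
        ((n++))
    elif [[ $label =~ DeviceId ]]; then # confirm we're on a DeviceId line
        arrval="${value:1}"
        arrval="${arrval:0:(-2)}"
        array[$arridx]="$arrval"
        n=0
    else
        printf 'warning: expected DeviceId, got: %s\n' "$label" >&2
        n=0                             # resync the toggle
    fi
done
for i in "${!array[@]}"; do echo "array[$i] ${array[$i]}"; done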
Example Input
$ cat dat/grepout.txt
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
Example Use/Output
$ cat dat/grepout.txt | bash parsegrep2array.sh
array[123456789101] devid1234
array[111213141516] devid5678
Parsing out the values is easy, and once you have them you can certainly use those values to build up an array. The trickiest part comes from the fact that you need to combine input from separate lines. Here is one approach; note that this script is verbose on purpose, to show what's going on; once you see what's happening, you can eliminate most of the output:
so.input
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
so.sh
#!/bin/bash
declare -a hardwareInfo
while [[ 1 ]]; do
# read in two lines of input
# if either line is the last one, we don't have enough input to proceed
read lineA < "${1:-/dev/stdin}"
# if EOF or empty line, exit
if [[ "$lineA" == "" ]]; then break; fi
read lineB < "${1:-/dev/stdin}"
# if EOF or empty line, exit
if [[ "$lineB" == "" ]]; then break; fi
echo "$lineA"
echo "$lineB"
hwsn=$lineA
hwsn=${hwsn//HardwareSerialNumber/}
hwsn=${hwsn//\"/}
hwsn=${hwsn//:/}
hwsn=${hwsn//,/}
echo $hwsn
# some checking could be done here to test that the value is numeric
devid=$lineB
devid=${devid//DeviceId/}
devid=${devid//\"/}
devid=${devid//:/}
devid=${devid//,/}
echo $devid
# some checking could be done here to make sure the value is valid
# populate the array
hardwareInfo[$hwsn]=$devid
done
# spacer, for readability of the output
echo
# display the array; in your script, you would do something different and useful
for key in "${!hardwareInfo[@]}"; do echo $key --- ${hardwareInfo[$key]}; done
cat so.input | ./so.sh
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
123456789101
devid1234
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
111213141516
devid5678
111213141516 --- devid5678
123456789101 --- devid1234
I created the input file so.input just for convenience. You would probably pipe your grep output into the bash script, like so:
grep-command | ./so.sh
EDIT #1: There are lots of choices for parsing out the key and value from the strings fed in by grep; the answer from @David C. Rankin shows another way. The best way depends on what you can rely on about the content and structure of the grep output.
There are also several choices for reading two separate lines that are related to each other; David's "toggle" approach is also good, and commonly used; I considered it myself, before going with "read two lines and stop if either is blank".
EDIT #2: I see declare -A in David's answer and in examples on the web; I used declare -a because that's what my version of bash wants (I'm using a Mac). So, just be aware that there can be differences.
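If a script has to run on both, one hedged option is to pick the declaration at runtime (bash gained associative arrays in 4.0; macOS still ships bash 3.2). A sketch:
#!/usr/bin/env bash
if (( BASH_VERSINFO[0] >= 4 )); then
    declare -A hardwareInfo   # real associative array, string keys
else
    declare -a hardwareInfo   # indexed array; numeric keys only
fi
In this particular case the serial numbers are purely numeric, which is the only reason declare -a happens to work as a substitute.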

Shell Array Cleared for Unknown Reason [duplicate]

This question already has answers here:
A variable modified inside a while loop is not remembered
(8 answers)
Closed 6 years ago.
I have a pretty simple sh script where I make a system cat call, collect the results and parse some relevant information before storing the information in an array, which seems to work just fine. But as soon as I exit the for loop where I store the information, the array seems to clear itself. I'm wondering if I am accessing the array incorrectly outside of the for loop. Relevant portion of my script:
#!/bin/sh
declare -a QSPI_ARRAY=()
cat /proc/mtd | while read mtd_instance
do
# split result into individual words
words=($mtd_instance)
for word in "${words[@]}"
do
# check for uboot
if [[ $word == *"uboot"* ]]
then
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="uboot"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
elif [[ $word == *"fpga_a"* ]]
then
echo "found it: "$word""
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="fpga_a"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
# other items are added to the array, all successfully
fi
done
echo "length of array: ${#QSPI_ARRAY[#]}"
echo "----------------------"
done
My output is great until I exit the for loop. While within the for loop, the array size increments and I can check that the item has been added. After the for loop is complete I check the array like so:
echo "RESULTING ARRAY:"
echo "length of array: ${#QSPI_ARRAY[#]}"
for qspi in "${QSPI_ARRAY}"
do
echo "qspi instance: $qspi"
done
Here are my results, echod to my display:
dev: size erasesize name
length of array: 0
-------------
mtd0: 00100000 00001000 "qspi-fsbl-uboot"
QSPI_ARRAY[] at index 0: uboot
length of array: 1
-------------
mtd1: 00500000 00001000 "qspi-fpga_a"
QSPI_ARRAY[] at index 1: fpga_a
length of array: 2
-------------
RESULTING ARRAY:
length of array: 0
qspi instance:
EDIT: After some debugging, it seems I have two different arrays here somehow. I initialized the array like so: QSPI_ARRAY=("a" "b" "c" "d" "e" "f" "g"), and after my for-loop for parsing the array it is still a, b, c, etc. How do I have two different arrays of the same name here?
This structure:
cat /proc/mtd | while read mtd_instance
do
...
done
Means that whatever comes between do and done cannot have any effects inside the shell environment that are still there after the done.
The fact that the while loop is on the right hand side of a pipe (|) means that it runs in a subshell. Once the loop exits, so does the subshell. And all of its variable settings.
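A minimal demonstration of that effect:
count=0
printf '%s\n' a b c | while read -r line; do
    count=$((count + 1))   # modified inside the pipeline's subshell
done
echo "$count"              # prints 0: the change died with the subshell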
If you want a while loop which makes changes that stick around, don't use a pipe. Input redirection doesn't create a subshell, and in this case, you can just read from the file directly:
while read mtd_instance
do
...
done </proc/mtd
If you had a more complicated command than a cat, you might need to use process substitution. Still using cat as an example, that looks like this:
while read mtd_instance
do
...
done < <(cat /proc/mtd)
In the specific case of your example code, I think you could simplify it somewhat, perhaps like this:
#!/usr/bin/env bash
QSPI_ARRAY=()
while read -r -a words; do
declare -i mtd_num=${words[0]//[!0-9]/}
for word in "${words[@]}"; do
for type in uboot fpga_a; do
if [[ $word == *$type* ]]; then
QSPI_ARRAY[mtd_num]=$type
break 2
fi
done
done
done </proc/mtd
Is this potentially what you are seeing:
http://mywiki.wooledge.org/BashFAQ/024

Strange scope issues in bash while loop [duplicate]

I can get this to work in ksh but not in bash which is really driving me nuts.
Hopefully it is something obvious that I'm overlooking.
I need to run an external command where each line of the output will be stored at an array index.
This simplified example looks like it is setting the array in the loop correctly however after the loop has completed those array assignments are gone? It's as though the loop is treated completely as an external shell?
junk.txt
this is a
test to see
if this works ok
testa.sh
#!/bin/bash
declare -i i=0
declare -a array
echo "Simple Test:"
array[0]="hello"
echo "array[0] = ${array[0]}"
echo -e "\nLoop through junk.txt:"
cat junk.txt | while read line
do
array[i]="$line"
echo "array[$i] = ${array[i]}"
let i++
done
echo -e "\nResults:"
echo " array[0] = ${array[0]}"
echo " Total in array = ${#array[*]}"
echo "The whole array:"
echo ${array[@]}
Output
Simple Test:
array[0] = hello
Loop through junk.txt:
array[0] = this is a
array[1] = test to see
array[2] = if this works ok
Results:
array[0] = hello
Total in array = 1
The whole array:
hello
So while in the loop, we assign array[i] and the echo verifies it.
But after the loop I'm back at array[0] containing "hello" with no other elements.
Same results across bash 3, 4 and different platforms.
Because your while loop is in a pipeline, all variable assignments in the loop body are local to the subshell in which the loop is executed. (I believe ksh does not run the command in a subshell, which is why you have the problem in bash.) Do this instead:
while read line
do
array[i]="$line"
echo "array[$i] = ${array[i]}"
let i++
done < junk.txt
Rarely, if ever, do you want to use cat to pipe a single file to another command; use input redirection instead.
UPDATE: since you need to run from a command and not a file, another option (if available) is process substitution:
while read line; do
...
done < <( command args ... )
If process substitution is not available, you'll need to output to a temporary file and redirect input from that file.
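A sketch of that temporary-file fallback, using mktemp (the command is a placeholder, as above):
tmpfile=$(mktemp) || exit 1
command args ... > "$tmpfile"
while read -r line; do
    array[i++]=$line
done < "$tmpfile"
rm -f "$tmpfile"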
If you are using bash 4.2 or later, you can execute these two commands before your loop, and the original pipe-into-the-loop will work, since the while loop is the last command in the pipeline.
set +m # Turn off job control; it's probably already off in a non-interactive script
shopt -s lastpipe
cat junk.txt | while read line; do ...; done
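Put together as a minimal self-contained sketch (bash 4.2+; junk.txt as above):
#!/usr/bin/env bash
set +m             # job control off (normally already off in scripts)
shopt -s lastpipe  # run the last element of a pipeline in the current shell
declare -i i=0
declare -a array
cat junk.txt | while read -r line; do
    array[i]=$line
    let i++
done
echo "Total in array = ${#array[*]}"   # now reports 3 instead of 0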
UPDATE 2: Here is a loop-less solution based on user1596414's comment
array[0]=hello
IFS=$'\n' array+=( $(command) )
The output of your command is split into words based solely on newlines (so that each line is a separate word), and the resulting line-per-slot array is appended to the original. This is very nice if you are only using the loop to build the array. It can also probably be modified to accommodate a small amount of per-line processing, vaguely similar to a Python list comprehension.
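On bash 4+, mapfile (a.k.a. readarray) is another loop-less route that avoids leaving IFS modified and skips glob expansion entirely; a sketch, with command as the placeholder again:
array[0]=hello
mapfile -t -O "${#array[@]}" array < <( command args ... )   # append one line per slot
echo "Total in array = ${#array[*]}"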

Which data structure might be a more efficient implementation?

I was doing an exercise on reading from a setup file in which every line specifies two words and a number. The number denotes the number of words in between the two words specified. Another file, input.txt, has a block of text, and the program attempts to count the number of occurrences in the input file that satisfy the constraint on each line of the setup file (i.e., two particular words a and b should be separated by n words, where a, b and n are specified in the setup file).
So I've tried to do this as a shell script, but my implementation is probably highly inefficient. I used an array to store the words from the setup file, and then did a linear search on the text file to find the words and count the matches. Here's a bit of the code, if it helps:
#!/bin/sh
j=0
count=0;
m=0;
flag=0;
error=0;
while read line; do
line=($line);
a[j]=${line[0]}
b[j]=${line[1]}
num=${line[2]}
c[j]=`expr $num + 0`
j=`expr $j + 1`
done <input2.txt
while read line2; do
line2=($line2)
for (( i=0; $i<=50; i++ )); do
for (( m=0; $m<j; m++)); do
g=`expr $i + ${c[m]}`
g=`expr $g + 1`
if [ "${line2[i]}" == "${a[m]}" ] ; then
for (( k=$i; $k<$g; k++)); do
if [[ "${line2[k]}" == *.* ]]; then
flag=1
break
fi
done
if [ "${b[m]}" == "${line2[g]}" ] ; then
if [ "$flag" == 1 ] ; then
error=`expr $error + 1`
fi
count=`expr $count + 1`
fi
flag=0
fi
if [ "${line2[i]}" == "${b[m]}" ] ; then
for (( k=$i; $k<$g; k++)); do
if [[ "${line2[k]}" == *.* ]]; then
flag=1
break
fi
done
if [ "${a[m]}" == "${line2[g]}" ] ; then
if [ "$flag" == 1 ] ; then
error=`expr $error + 1`
fi
count=`expr $count + 1`
fi
flag=0
fi
done
done
done <input.txt
count=`expr $count - $error`
echo "| Count = $count |"
As you can see, this takes a lot of time.
I was thinking of a more efficient way to implement this, in C or C++, this time. What could be a possible alternative implementation of this, efficiency considered? I thought of hash tables, but could there be a better way?
I'd like to hear what everyone has to say on this.
Here's a fully working possibility. It is not 100% pure bash since it uses (GNU) sed: I'm using sed to lowercase everything and to get rid of punctuation marks. Maybe you won't need this. Adapt to your needs.
#!/bin/bash
input=input.txt
setup=setup.txt
# The Check function
Check() {
# $1 is word1
# $2 is word2
# $3 is number of words between word1 and word2
nb=0
# Get all positions of w1
IFS=, read -a q <<< "${positions[$1]}"
# Check, for each position, if word2 is at distance $3 from word1
for i in "${q[#]}"; do
[[ ${words[$i+$3+1]} = $2 ]] && ((++nb))
done
echo "$nb"
}
# Slurp input file in an array
words=( $(sed 's/[,.:!?]//g;s/\(.*\)/\L\1/' -- "$input") )
# For each word, specify its positions in file
declare -A positions
pos=0
for i in "${words[#]}"; do
positions[$i]+=$((pos++)),
done
# Do it!
while read w1 w2 p; do
# Check that w1 w2 are not empty
[[ -n $w2 ]] || continue
# Check that p is a number
[[ $p =~ ^[[:digit:]]+$ ]] || continue
n=$(Check "$w1" "$w2" "$p")
[[ $w1 != $w2 ]] && (( n += $(Check "$w2" "$w1" "$p") ))
echo "$w1 $w2 $p: $n"
done < <(sed 's/\(.*\)/\L\1/' -- "$setup")
How does it work:
we first read the whole file input.txt into the array words: one word per field. Observe I'm using sed here to delete all punctuation marks (well, only , . : ! ? for testing purposes; add some more if you wish) and to lowercase every letter.
Loop through the array words and for each word, put its position in an associative array positions:
w => "position1,position2,...,positionk,"
Finally, we read the setup.txt file (filtered through sed again to lowercase everything – optional, see below). Do a quick check whether the line is valid (2 words and a number) and then call the Check function (twice, once for each permutation of the given words, unless both words are equal).
The Check function finds all positions of word1 in the file, thanks to the associative array positions, and then, using the array words, checks whether word2 is at the given "distance" from word1.
The second sed is optional. I've filtered the setup.txt file through sed to lowercase everything. This sed adds only very little overhead, so, efficiency-wise, it's not a big deal. You'll be able to add more filtering later to make sure the data is consistent with how the script will use it (e.g., get rid of punctuation marks). Otherwise you could:
Get rid of it altogether: replace the corresponding line (the last line) with just
done < "$setup"
In this case, you'll have to trust the guy/gal who will write the setup.txt file.
Get rid of it as above, but still want to convert everything to lowercase. In this case, below the
while read w1 w2 p; do
line, just add these lines:
w1=${w1,,}
w2=${w2,,}
That's the bash way to lowercase a string.
Caveats. The script will break if:
The number given in the setup.txt file starts with a 0 and contains an 8 or a 9. This is because bash will treat it as an octal number, in which 8's and 9's are not valid. There are workarounds for this (see the sketch after this list).
The text in input.txt doesn't follow proper typographical practices: a punctuation mark is always followed by a space. E.g., if the input file contains
The quick,brown,dog jumps over the lazy fox
then after the sed treatment the text will look like
The quickbrowndog jumps over the lazy fox
and the words quick, brown and dog won't be treated properly. You can replace the sed substitution s/[,.:!?]//g with s/[,.:!?]/ /g to convert these symbols to a space. It's up to you, but in that case, abbreviations such as e.g. and i.e. might not be treated properly… it really depends on what you need to do.
Different character encodings are used… I don't really know how robust you need the script to be, and what languages and encodings you'll consider.
(Add stuff here :).)
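Regarding the octal caveat, the usual workaround is to force base-10 interpretation with the 10# prefix in an arithmetic context; a minimal sketch:
p=09                  # invalid as an octal literal
(( g = 10#$p + 1 ))   # 10# forces base 10, so g is 10
echo "$g"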
About efficiency. I'd say the algorithm is rather efficient. bash is probably not the best suited language for that, but it's a lot of fun, and not that difficult after all if we look at it (less than 20 lines of relevant code, and even less than that!). If you only have 50 files with 50000 words, it's ok, you will not notice too much difference between bash and perl/python/awk/C/you-name-it: bash performs decently quickly for files of this type. Now if you have 100000 files each containing millions of words, well, a different approach should be taken and a different language should be used (but I don't know which one).
If:
it can get complex for the sake of efficiency
the text file can be large
the setup file can have many rows
then I would do it the following way:
As preparation I would create:
A hash map with the index of the word as key and the word as the value (named, say, WORDS). So WORDS[1] would be the first word, WORDS[2] the second, and so on.
A hash map with the words as keys and the list of indexes as values (named, say, INDEXES). So if WORDS[2] and WORDS[5] are "dog" and no others are, then INDEXES["dog"] would yield the numbers 2 and 5. The value can be a dynamic indexed array or a linked list. A linked list is better if there are words that occur many times.
You can read the text file, and populate both structures at the same time.
Processing:
For each row of the setup file I would get the indexes in INDEXES[firstword] and check if WORDS[index + wordsinbetween + 1] equals secondword. If it does, that's a hit.
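Although the question asks about C or C++, the same two structures can be sketched directly in bash with an indexed and an associative array (the sample words and rule are illustrative):
#!/usr/bin/env bash
declare -a WORDS    # index -> word
declare -A INDEXES  # word  -> space-separated list of indexes
i=0
for w in the quick dog jumps over dog; do   # stand-in for reading input.txt
    WORDS[i]=$w
    INDEXES[$w]+="$i "
    ((i++))
done
# rule "dog jumps 0": is word2 exactly (n+1) slots after word1?
w1=dog w2=jumps n=0
hits=0
for idx in ${INDEXES[$w1]}; do
    [[ ${WORDS[idx + n + 1]} == "$w2" ]] && ((++hits))
done
echo "$w1 $w2 $n: $hits"
This prints "dog jumps 0: 1": only the first "dog" is immediately followed by "jumps".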
Notes:
Preparation: You only read the text file once. For each word in the text file, you're doing fast operations whose performance is not really affected by the number of words already processed.
Processing: You only read the setup file once. For each row, you're likewise doing operations whose cost depends only on the number of occurrences of firstword in the text file.
