Parallelizing a for Loop which accesses files - loops

Here is the complete code. In BER_SB, the values of K and SB passed to the rand-src command and the value of sigma passed to the transmit command are calculated in main. The values written to the BER array by BER_SB are then used further in main.
BER_SB()
{
s=$1
mkdir "$1"
cp ex-ldpc36-5000a.pchk ex-ldpc36-5000a.gen "$1"
cd "$1"
rand-src ex-ldpc36-5000a.src $s "$K"x"$SB"
encode ex-ldpc36-5000a.pchk ex-ldpc36-5000a.gen ex-ldpc36-5000a.src ex-ldpc36-5000a.enc
transmit ex-ldpc36-5000a.enc ex-ldpc36-5000a.rec 1 awgn $sigma
decode ex-ldpc36-5000a.pchk ex-ldpc36-5000a.rec ex-ldpc36-5000a.dec awgn $sigma prprp 250
BER="$(verify ex-ldpc36-5000a.pchk ex-ldpc36-5000a.dec ex-ldpc36-5000a.gen ex-ldpc36-5000a.src)"
echo $BER
}
export BER
export -f BER_SB
K=5000 # No of Message Bits
N=10000 # No of encoded bits
R=$(echo "scale=3; $K/$N" | bc) # Code Rate
# Creation of Parity Check and Generator files
make-ldpc ex-ldpc36-5000a.pchk $K $N 2 evenboth 3 no4cycle
make-gen ex-ldpc36-5000a.pchk ex-ldpc36-5000a.gen dense
# Creation of file to write BER values
echo>/media/BER/BER_LDPC36_5000_E.txt -n
S=1; # Variable to control no of blocks of source messages
for Eb_No in 0.5 1.0
do
B=$(echo "10^($S+1)" | bc)
# No of Blocks are increased for higher Eb/No values
S=$(($S+1))
# As we have four cores in our PC so we will divide number of source blocks into four subblocks to process these in parallel
SB=$(echo "$B/4" | bc)
# Calculation of Noise Variance from Eb/No values
tmp=$(echo "scale=3; e(($Eb_No/10)*l(10))" | bc -l)
sigma=$(echo "scale=3; sqrt(1/(2*$R*$tmp))" | bc)
# Call the function to process each subblock
parallel BER_SB ::: 1 2 3 4
BER_T= Here I want to process values of BER variables returned by BER_SB function
done

It is not very clear what you want done. From what you write it seems you want the same 3 lines run 4 times in parallel. That is easily done:
runone() {
mkdir "$1"
cd "$1"
rand-src ex-ldpc36-5000a.src 0 5000 1000
encode ex-ldpc36-5000a.pchk ex-ldpc36-5000a.gen ex-ldpc36-5000a.src ex-ldpc36-5000a.enc
transmit ex-ldpc36-5000a.enc ex-ldpc36-5000a.rec 1 awgn .80
}
export -f runone
parallel runone ::: 1 2 3 4
But that does not use the '1 2 3 4' for anything. If you want the '1 2 3 4' used for anything you will need to describe better what you really want.
Edit:
It is unclear whether you have:
Read the examples: LESS=+/EXAMPLE: man parallel
Walked through the tutorial: man parallel_tutorial
Watched the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
and whether I can assume that the material covered in those is known to you.
In your code you use BER[1]..BER[4], but they are not initialized. You also use BER[x] in the function. Maybe you forgot that a sub-shell cannot pass values in an array back to its parent?
If I were you I would move all the computation into the function and call the function with all needed parameters instead of passing them as environment variables. Something like:
parallel BER_SB ::: 1 2 3 4 ::: 0.5 1.0 ::: $S > computed.out
post process computed.out >>/media/BER/BER_LDPC36_5000_E.txt
To keep the arguments in computed.out you can use --tag. That may make it easier to postprocess.
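The point about sub-shells can be sketched without GNU parallel. In this minimal example, compute is a hypothetical stand-in for BER_SB: it prints its result on standard output instead of relying on an array assignment, and the parent collects the printed lines. That is the same pattern as redirecting the output of parallel to computed.out and post-processing it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for BER_SB: it reports its result on stdout.
compute() {
    local id=$1
    RESULT[$id]=$(( id * 10 ))   # set inside the child only; the parent never sees it
    echo "$id $(( id * 10 ))"    # printed output is what actually reaches the parent
}

RESULT=()
# Each call runs in its own subshell, as a job started by parallel would.
mapfile -t lines < <(for id in 1 2 3 4; do ( compute "$id" ); done)

echo "array elements set in parent: ${#RESULT[@]}"
echo "lines received from children: ${#lines[@]}"
printf '%s\n' "${lines[@]}"
```

The array stays empty in the parent, while the four printed lines arrive intact and can be parsed there.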

Related

reading multiple matches into arrays with bash

The utility 'sas2ircu' can output multiple lines for every hard drive attached to the host. A sample of the output for a single drive looks like this:
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
I have a bash script that executes the sas2ircu command and does the following with the output:
identifies a drive by the RDY string
reads the numerical value of the enclosure (ie, 5) into an array 'enc'
reads the numerical value of the slot (ie, 20) into another array 'slot'
The code I have serves its purpose, but I'm trying to figure out if I can combine it into a single line and run the sas2ircu command once instead of twice.
mapfile -t enc < <(/root/sas2ircu 0 display|grep -B3 RDY|awk '/Enclosure/{print $NF}')
mapfile -t slot < <(/root/sas2ircu 0 display|grep -B2 RDY|awk '/Slot/{print $NF}')
I've done a bunch of reading on awk but I'm still quite novice with it and haven't come up with anything better than what I have. Suggestions?
You should be able to eliminate the grep and combine the awk scripts into a single awk script; the general idea is to capture the enclosure and slot data and then, if/when we see State/RDY, print the enclosure and slot to stdout:
awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
I don't have sas2ircu so I'll simulate some data (based on OP's sample):
$ cat raw.dat
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 7
Slot # : 12
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 9
Slot # : 23
SAS Address : 5003048-0-185f-b21c
State : Off (OFF)
Simulating the sas2ircu call:
$ cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
5 20
7 12
The harder part is going to be reading these into 2 separate arrays and I'm not aware of an easy way to do this with a single command (eg, mapfile doesn't provide a way to split an input file across 2 arrays).
One idea using a bash/while loop:
unset enc slot
while read -r e s
do
enc+=( "$e" )
slot+=( "$s" )
done < <(cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}')
This generates:
$ typeset -p enc slot
declare -a enc=([0]="5" [1]="7")
declare -a slot=([0]="20" [1]="12")
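If mapfile is preferred over the loop, another option is to capture the filtered output once in a variable and then split it by column. This is only a sketch: fake_sas2ircu is a hypothetical function standing in for "/root/sas2ircu 0 display", and it emits data modeled on the sample shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "/root/sas2ircu 0 display" (sample data only).
fake_sas2ircu() {
    printf '%s\n' \
        'Enclosure # : 5' 'Slot # : 20' 'State : Ready (RDY)' \
        'Enclosure # : 7' 'Slot # : 12' 'State : Ready (RDY)' \
        'Enclosure # : 9' 'Slot # : 23' 'State : Off (OFF)'
}

# Run the command once and keep the "enclosure slot" pairs of ready drives.
pairs=$(fake_sas2ircu |
    awk '/Enclosure/{e=$NF} /Slot/{s=$NF} /State.*RDY/{print e, s}')

# Split the saved pairs into two arrays, one column each.
# NF skips the empty line a here-string produces when no drive is ready.
mapfile -t enc  < <(awk 'NF{print $1}' <<<"$pairs")
mapfile -t slot < <(awk 'NF{print $2}' <<<"$pairs")

typeset -p enc slot
```

The two small awk calls only re-read the saved text, so the sas2ircu command itself still runs a single time.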

Populate array in a for loop

I have an array of strings to pass through a script. The script is well-behaved, and will return error code 0 if the string "passes" and non-zero if it "fails." If the string passes, it should be included in a final array to be output or written to file or etc.
The problem I'm having is that the only item ending up in my final array is the first "passing" string.
#!/bin/bash
# validator.sh
if [[ $1 -le 10 ]]; then
exit 0
else
exit 1
fi
#!/bin/bash
# main.sh
numbers=(2 4 6 8 10 12 14 16)
keep=()
for n in ${numbers[@]}; do
if ./validator.sh $n; then
keep+=("$n")
fi
done
echo $keep
Running main.sh produces:
$ ./main.sh
2
but I expect 2 4 6 8 10
Unless you meant keep to be an array of matching elements, change:
keep+=("$n")
to
keep="$keep $n"
That works in any Bourne-compatible shell, which makes it the more portable choice. If you're looking for a Bash-specific solution, the following will also work:
keep+="${n} "
If you DO want it to be an array, then in order to output all elements, you can use:
echo "${keep[@]}"
As noted by @Jetchisel and @kamilCuk in the comments.
Since you wrote you want to output all elements or save them to a file, I had assumed you don't actually need an array here but perhaps you plan to use this data in other ways later:)
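The difference between the two expansions can be seen in a short sketch; the array contents are simply the passing values from the question.

```shell
#!/usr/bin/env bash
keep=(2 4 6 8 10)

first_only=$(echo $keep)          # $keep is shorthand for ${keep[0]}
all_items=$(echo "${keep[@]}")    # expands to every element

echo "$first_only"   # prints: 2
echo "$all_items"    # prints: 2 4 6 8 10
```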

Passing named arrays to another bash script

I would like to pass multiple named arrays to another bash script as they are.
For example, given
outer.sh
echo "outer"
a=(1 2)
b=(3 4 5)
echo ${#a[@]}
echo ${#b[@]}
a=${a[@]} b=${b[@]} sh inner.sh
and inner.sh
echo "inner"
echo ${a[@]}
echo ${b[@]}
echo ${#a[@]}
echo ${#b[@]}
running the outer.sh gives
$ sh outer.sh
outer
2
3
inner
1 2
3 4 5
1
1
That is, the values are preserved but their lengths change, which means they are no longer arrays but strings.
How do I pass multiple named arrays to another bash script as they are?
There are several approaches available, each with their own disadvantages.
The Easy Way: Run The Inner Script Inside A Subshell
This means all variables are inherited, including ones (like arrays!) that can't be passed through the environment.
a=(1 2)
b=(3 4 5)
(. inner)
Of course, it also means that shell settings (IFS, set -e, etc) are inherited too, so inner.sh needs to be written robustly to handle whatever setup it may happen to receive; and you can't rewrite it in a different / non-shell language later.
The Unsafe Way: Pass eval-able code (and trust your caller!)
Modify inner.sh to run [[ $setup ]] && eval "$setup", and then invoke it as:
setup=$(declare -p a b) ./inner
Obviously, this is a severe security risk if you don't control the process environment.
The Hard Way: Deserialize into individual elements
Here, we pass each array as its name, its length, and then its original elements.
inner needs to be modified to copy items off its command-argument list back into the arrays, as in the following example:
while (( $# )); do # iterating over our argument list:
dest_name=$1; shift # expect the variable name first
dest_size=$1; shift # then its size
declare -g -a "$dest_name=( )" # initialize our received variable as empty
declare -n dest="$dest_name" # bash 4.3: make "dest" point to our target name
while (( dest_size )) && (( $# )); do # and then, for up to "size" arguments...
dest+=( "$1" ); shift # pop an argument off the list onto an array
(( dest_size -= 1 )) # and decrease the count left in "size"
done
unset -n dest # end that redirection created above
done
...and then expand into that format in outer:
./inner a "${#a[@]}" "${a[@]}" b "${#b[@]}" "${b[@]}"
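As a self-contained illustration of the eval-based approach, the sketch below passes the output of declare -p through the environment to a child bash process. The inline bash -c script is a stand-in for inner.sh, and the same caveat applies: only eval input from a caller you trust.

```shell
#!/usr/bin/env bash
a=(1 2)
b=(3 4 5 "six seven")   # an element with a space, to show that quoting survives

# Serialize both arrays as shell code and hand it to the child via $setup.
result=$(setup=$(declare -p a b) bash -c '
    [[ $setup ]] && eval "$setup"     # recreate a and b in the child
    echo "${#a[@]} ${#b[@]} ${b[3]}"
')

echo "$result"   # prints: 2 4 six seven
```

The child sees real arrays with the original element counts, which is what the plain a=... b=... environment assignments in the question could not provide.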

Shell Array Cleared for Unknown Reason [duplicate]

I have a pretty simple sh script where I make a system cat call, collect the results and parse some relevant information before storing the information in an array, which seems to work just fine. But as soon as I exit the for loop where I store the information, the array seems to clear itself. I'm wondering if I am accessing the array incorrectly outside of the for loop. Relevant portion of my script:
#!/bin/sh
declare -a QSPI_ARRAY=()
cat /proc/mtd | while read mtd_instance
do
# split result into individual words
words=($mtd_instance)
for word in "${words[@]}"
do
# check for uboot
if [[ $word == *"uboot"* ]]
then
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="uboot"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
elif [[ $word == *"fpga_a"* ]]
then
echo "found it: "$word""
mtd_num=${words[0]}
index=${mtd_num//[!0-9]/} # strip everything except the integers
QSPI_ARRAY[$index]="fpga_a"
echo "QSPI_ARRAY[] at index $index: ${QSPI_ARRAY[$index]}"
# other items are added to the array, all successfully
fi
done
echo "length of array: ${#QSPI_ARRAY[@]}"
echo "----------------------"
done
My output is great until I exit the for loop. While within the for loop, the array size increments and I can check that the item has been added. After the for loop is complete I check the array like so:
echo "RESULTING ARRAY:"
echo "length of array: ${#QSPI_ARRAY[@]}"
for qspi in "${QSPI_ARRAY}"
do
echo "qspi instance: $qspi"
done
Here are my results, echoed to my display:
dev: size erasesize name
length of array: 0
-------------
mtd0: 00100000 00001000 "qspi-fsbl-uboot"
QSPI_ARRAY[] at index 0: uboot
length of array: 1
-------------
mtd1: 00500000 00001000 "qspi-fpga_a"
QSPI_ARRAY[] at index 1: fpga_a
length of array: 2
-------------
RESULTING ARRAY:
length of array: 0
qspi instance:
EDIT: After some debugging, it seems I have two different arrays here somehow. I initialized the array like so: QSPI_ARRAY=("a" "b" "c" "d" "e" "f" "g"), and after my for-loop for parsing the array it is still a, b, c, etc. How do I have two different arrays of the same name here?
This structure:
cat /proc/mtd | while read mtd_instance
do
...
done
means that whatever comes between do and done cannot have any effect on the shell environment that persists after the done.
The fact that the while loop is on the right-hand side of a pipe (|) means that it runs in a subshell. Once the loop exits, so does the subshell, along with all of its variable settings.
If you want a while loop which makes changes that stick around, don't use a pipe. Input redirection doesn't create a subshell, and in this case, you can just read from the file directly:
while read mtd_instance
do
...
done </proc/mtd
If you had a more complicated command than a cat, you might need to use process substitution. Still using cat as an example, that looks like this:
while read mtd_instance
do
...
done < <(cat /proc/mtd)
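The difference is easy to reproduce. In this sketch a counter is incremented in a loop fed by a pipe and again in a loop fed by process substitution; printf stands in for whatever command produces the lines. It assumes bash with its default settings (the lastpipe option off).

```shell
#!/usr/bin/env bash
piped=0
printf '%s\n' a b c | while read -r line; do
    piped=$(( piped + 1 ))             # runs in a subshell; the change is lost
done

redirected=0
while read -r line; do
    redirected=$(( redirected + 1 ))   # runs in the current shell
done < <(printf '%s\n' a b c)

echo "piped=$piped redirected=$redirected"   # prints: piped=0 redirected=3
```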
In the specific case of your example code, I think you could simplify it somewhat, perhaps like this:
#!/usr/bin/env bash
QSPI_ARRAY=()
while read -a words; do
declare -i mtd_num=${words[0]//[!0-9]/}
for word in "${words[@]}"; do
for type in uboot fpga_a; do
if [[ $word == *$type* ]]; then
QSPI_ARRAY[mtd_num]=$type
break 2
fi
done
done
done </proc/mtd
Is this potentially what you are seeing:
http://mywiki.wooledge.org/BashFAQ/024

What is the shell script instruction to divide a file with sorted lines to small files?

I have a large text file with the following format:
1 2327544589
1 3554547564
1 2323444333
2 3235434544
2 3534532222
2 4645644333
3 3424324322
3 5323243333
...
The output should be a set of text files, each named with the number from the first column of the original file as a suffix and containing the corresponding second-column values, as follows:
file1.txt:
2327544589
3554547564
2323444333
file2.txt:
3235434544
3534532222
4645644333
file3.txt:
3424324322
5323243333
...
The script should run on Solaris, but I'm also having trouble with awk and with the options of other commands, such as -c with cut; the system is very limited, so I am looking for commands commonly available on Solaris. I am not allowed to change or install anything on the system. Using a loop is not very efficient because the script takes too long with large files. So aside from using awk and loops, any suggestions?
Something like this perhaps:
$ awk 'NF>1{print $2 > ("file" $1 ".txt")}' input
$ cat file1.txt
2327544589
3554547564
2323444333
or if you have bash available, try this:
#!/bin/bash
while read -r a b
do
[ -z "$a" ] && continue
echo "$b" >> "file$a.txt"
done < input
output:
$ paste file{1..3}.txt
2327544589 3235434544 3424324322
3554547564 3534532222 5323243333
2323444333 4645644333
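Because the input is sorted on the first column, only one output file needs to be open at a time. Some older awk implementations limit the number of simultaneously open files, so a variant that closes each file when the key changes may be more robust on large inputs. This is a sketch under two assumptions: the awk in use supports close(), as a POSIX awk does, and the input really is sorted, since reopening a file with > would truncate it. The sample input is taken from the question, and the temporary directory (created with mktemp, which may not exist on every system) is only there to keep the example self-contained.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > input <<'EOF'
1 2327544589
1 3554547564
2 3235434544
3 3424324322
3 5323243333
EOF

awk 'NF > 1 {
    out = "file" $1 ".txt"
    if (out != prev) {            # key changed: finish the previous file
        if (prev != "") close(prev)
        prev = out
    }
    print $2 > out                # stays open until the key changes
}' input

cat file3.txt
```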
