Read file using awk and iterated over array with nested loop - arrays

I tried to iterate over two arrays (with two nested for loops) using bash.
Unfortunately, bash is really slow when iterating over large arrays, so I tried to use awk.
First
I read in two files (around 200,000 lines) and take the tab-separated column I want to use:
START=($(awk -F'\t' '{print $5}' $inputGenes))
I was always thinking that START is now something like an array, but right now I'm not sure any more.
I have a lot of different "arrays" and go to the next step
Second
Everything is working fine with small files and not using awk, but a normal nested bash loop.
Now I was trying to use awk and I fail.
The two variables $len and $varlen are indicating the size of two arrays (read in like before using awk)
len=${#posVCF[@]}
The loops run, but I get no output because I cannot get the values out of the arrays: $posVCF[$i] returns nothing. I have no idea how to read the values of my array variables.
echo | awk 'BEGIN {for(i=1; i -st $len; i++) {
for (j=1; j -st $varlen; j++) {
if ($posVCF[$i] -gt $START[$j] && $posVCF[$i] -st $END[$j]) {
print $posVCF[$i] " > " $START[$j] " und < " $END[$j]
}
}
}
}'
Am I doing something wrong when reading the files, or do you have any other ideas? I'm really new to programming in bash, but I have to write this in bash.
I hope you can help me, thank you very much.

You need braces to dereference an array element. Not $posVCF[$i] but ${posVCF[$i]} is correct.
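To see the difference, here is a minimal sketch (the array contents are made up for illustration). Without braces, bash expands $arr, which is element 0, and then appends the bracket text literally:

```shell
arr=(10 20 30)       # made-up values for illustration
i=1
wrong="$arr[$i]"     # $arr is element 0, followed by the literal text "[1]"
right="${arr[$i]}"   # braces make bash index the array
echo "$wrong"        # 10[1]
echo "$right"        # 20
```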
I misread your question. Why do you think you need awk? All your variables live in the shell, and a single-quoted awk program never sees them. You can use C-like for loops in bash:
# bash arrays are zero-based; start at 1 only if element 0 holds a header you want to skip
for ((i=0; i < len; i++)); do
  for ((j=0; j < varlen; j++)); do
    if (( ${posVCF[$i]} > ${START[$j]} && ${posVCF[$i]} < ${END[$j]} )); then
      echo "${posVCF[$i]} > ${START[$j]} und < ${END[$j]}"
    fi
  done
done
This is using the bash arithmetic evaluation syntax: (( ... ))
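If speed is the real concern, the comparison can also be done entirely inside awk by letting awk read both files, instead of copying shell arrays into it. This is only a sketch under assumptions: find_hits is a hypothetical helper name, the first file is assumed to hold the interval start in column 1 and the end in column 2 (tab-separated), and the second file is assumed to hold one position per line. Adjust the column numbers to your real files.

```shell
# find_hits INTERVALS POSITIONS
# Assumed layout (hypothetical): INTERVALS has start in column 1 and end in
# column 2; POSITIONS has one position per line.
find_hits() {
  awk -F'\t' '
    NR == FNR { lo[++n] = $1; hi[n] = $2; next }   # first file: remember intervals
    {
      pos = $1 + 0                                 # force numeric comparison
      for (j = 1; j <= n; j++)
        if (pos > lo[j] + 0 && pos < hi[j] + 0)
          print $1 " > " lo[j] " und < " hi[j]
    }
  ' "$1" "$2"
}
```

It would be called as find_hits genes.tsv positions.txt (both file names are placeholders). The nested loop is still quadratic, but it runs inside awk rather than in the shell.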

Related

Parametrized population of a string in bash from 1 to n

Is there an easy way to populate a dynamic string with a size parameter?
Let's say we have:
Case N=1:
echo "Benchmark,Time_Run1" > $LOGDIR/$FILENAME
however, the run variable is parametric and we want to have all Time_Runs from 1 to n:
Case N=4:
echo "Benchmark,Time_Run1,Time_Run2,Time_Run3,Time_Run4" > $LOGDIR/$FILENAME
and the generic solution should be this form:
Case N=n:
echo "Benchmark,Time_Run1,...,Time_Run${n}" > $LOGDIR/$FILENAME
Is there a way to do that in a single loop rather than having two loops, one looping over n to generate the Run${n} and the other, looping n times to append "Time_Run" to the list (similar to Python)? Thanks!
Use a loop from 1 to $n.
{
  printf 'Benchmark'
  for ((i = 1; i <= n; i++)); do
    printf ',Time_Run%d' "$i"
  done
  printf '\n'
} > "$LOGDIR/$FILENAME"   # quoted so paths with spaces still work
One way to populate the output string with a single loop is:
outstr=Benchmark
for ((i=1; i<=n; i++)); do
outstr+=",Time_Run$i"
done
It can also be done without a loop:
eval "printf -v outstr ',%s' Time_Run{1..$n}"
outstr="Benchmark${outstr}"
However, eval is dangerous and should be used only in cases where there is no reasonable alternative. This is not such a case. See Why should eval be avoided in Bash, and what should I use instead?.
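A loop-free variant that avoids eval is also possible, relying on the fact that printf reuses its format string for every extra argument. This is a sketch, not the method from the answers above: make_header is a hypothetical function name, it assumes seq is available, and it assumes n is at least 1 (with n=0, printf would still emit one ",Time_Run0" field).

```shell
# make_header N: print "Benchmark,Time_Run1,...,Time_RunN" (assumes N >= 1)
make_header() {
  local n=$1 outstr
  # The unquoted $(seq ...) is split into one argument per number on purpose;
  # printf repeats the format once for each argument.
  printf -v outstr ',Time_Run%d' $(seq 1 "$n")
  printf 'Benchmark%s\n' "$outstr"
}
```

For example, make_header 4 > "$LOGDIR/$FILENAME" writes the four-run header line from the question.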

How to return an array from read command in a function?

I have a small problem in here with bash
I wrote an array in a simple function and I need to return it as an array with read command and also need to call it somehow.
function myData {
echo 'Enter the serial number of your items : '
read -a sn
return ${sn[@]}
}
for example like this ???
$ ./myapp.sh
Enter the serial number of your items : 92467 90218 94320 94382
myData
echo ${?[@]}
Why don't we have return values here like in other languages?
thanks for your help...
As others mention, the builtin command return is intended to send the exit status to the caller.
If you want to pass the result of processing in the function to the
caller, there are several ways:
Use standard output
If you write something to the standard output within a function, the
caller can capture that output. The standard output is just an unstructured
stream of bytes. If you want it to carry a special meaning such as an
array, you need to define the structure by choosing some character(s)
as a delimiter. If you are sure no element contains a space, tab, or
newline, you can rely on the default value of IFS:
myfunc() {
  echo "92467 90218 94320 94382"
}
ary=( $(myfunc) )
for i in "${ary[@]}"; do
  echo "$i"
done
If the elements of the array may contain whitespace or other special
characters and you need to preserve them (for example, when handling
filenames), you can use the null character as the delimiter:
myfunc() {
  local -a a=("some" "elements" "contain whitespace" $'or \nnewline')
  printf "%s\0" "${a[@]}"
}
mapfile -d "" -t ary < <(myfunc) # mapfile -d requires bash >= 4.4
for i in "${ary[@]}"; do
  echo ">$i" # The leading ">" just indicates the start of each element
done
Pass by reference
Like other languages, bash >= 4.3 has a mechanism to pass a variable by
reference (by name):
myfunc() {
  local -n p="$1" # now p refers to the variable whose name is passed in $1
  local i
  for (( i=0; i<${#p[@]}; i++ )); do
    ((p[i]++)) # increment each value
  done
}
ary=(0 1 2)
myfunc "ary"
echo "${ary[@]}" # array elements are modified
Use the array as a global variable
Its usage and pros/cons hardly need explaining.
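For completeness, a minimal sketch of the global-variable approach applied to the function from the question. It assumes the serial numbers are typed on one line separated by whitespace; the prompt is sent to stderr here so that it does not mix with any captured output.

```shell
# sn is deliberately NOT declared local, so it stays visible to the caller.
myData() {
  printf 'Enter the serial number of your items : ' >&2
  read -r -a sn    # split one input line into the global array sn
}
```

After calling myData, the caller can use "${sn[@]}" directly, for example in for s in "${sn[@]}"; do echo "$s"; done. The drawback is the usual one for globals: any other code that uses the name sn will clash with it.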
Hope this helps.

Find and replace in AIX 5.3

I am running AIX 5.3.
I have two flat text files.
One is a "master" list of network devices, along with their communication settings (CLLIFile.tbl).
The other is a list of specific network devices that need to have one setting changed within the main file (specifically, cn to le). The list file is called DDM2000-030215.txt.
I have gotten as far as looping through DDM2000-030215.txt, pulling the lines I need to change with grep from CLLIFile.tbl, changing cn to le with sed, and sending the output to a file.
The trouble is, all I get are the changed lines. I need to make the changes inside CLLIFile.tbl, because I cannot disturb the formatting or structure.
Here's what we tried, so far:
for i in 'DDM2000-030215.txt'
do
grep -p $ii CLLIFile.tbl| sed s/cn/le/g >> CLLIFileNew.tbl
done
Basically, I need to replace all instances of 'cn' with 'le', within 'CLLIFile.tbl', that are on lines that contain a network element name from 'DDM2000-030215.txt'.
Your sed (on AIX) will not have an -i option (edit the input file),
and you do not want to use a temporary file.
You can try a here-document with vi:
vi CLLIFile.tbl >/dev/null <<END
:1,$ s/cn/le/g
:wq
END
You don't want grep here, because, as you've observed, it only outputs the matching lines. You want to just use sed and have it do the replacement only on the lines that match while passing the other lines through unchanged.
So instead of this:
grep 'pattern' | sed 's/old/new/'
just do this:
sed '/pattern/s/old/new/'
You will have to send the output into a new file, and then move that new file into place to replace the old CLLIFile.tbl. Something like this:
cp CLLIFile.tbl CLLIFile.tbl.bak # make a backup in case something goes awry
sed '/pattern/s/old/new/' CLLIFile.tbl >newclli && mv newclli CLLIFile.tbl
EDIT: Entirely new question, I see. For this, I would use awk:
awk 'NR == FNR { a[++n] = $0; next } { for(i = 1; i <= n; ++i) { if($0 ~ a[i]) { gsub(/cn/, "le"); break } } print }' DDM2000-030215.txt CLLIFile.tbl
This works as follows:
NR == FNR {                     # when processing the first file
                                # (DDM2000-030215.txt)
  a[++n] = $0                   # remember the tokens. This assumes that every
                                # full line of the file is a search token.
  next                          # That is all.
}
{                               # when processing the second file (CLLIFile.tbl)
  for(i = 1; i <= n; ++i) {     # check all remembered tokens
    if($0 ~ a[i]) {             # if the line matches one
      gsub(/cn/, "le")          # replace cn with le
      break                     # and break out of the loop, because that only
                                # needs to be done once.
    }
  }
  print                         # print the line, whether it was changed or not.
}
Note that if the contents of DDM2000-030215.txt are to be interpreted as fixed strings rather than regexes, you should use index($0, a[i]) instead of $0 ~ a[i] in the check.
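A sketch of that fixed-string variant, wrapped in a shell function so the result can be redirected to a new file and moved into place. The name replace_on_listed_lines is hypothetical, and skipping blank lines in the list file is an added assumption so that an empty line is never used as a search token.

```shell
# replace_on_listed_lines LISTFILE TABLEFILE
# Prints TABLEFILE with cn -> le on every line that contains one of the
# names from LISTFILE, compared as plain strings rather than regexes.
replace_on_listed_lines() {
  awk '
    NR == FNR { if ($0 != "") a[++n] = $0; next }   # first file: collect names
    {
      for (i = 1; i <= n; ++i)
        if (index($0, a[i])) { gsub(/cn/, "le"); break }
      print                                         # changed or not
    }
  ' "$1" "$2"
}
```

With a regex match, a name such as dev.1 would also match devx1, because the dot matches any character; index() only matches the literal text.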

How to get index no. of the element of an array which matches regex in bash?

I have a file like this:
c
a
b<
d
f
I need to get the index no. of letter which has < as suffix in a bash script. I thought of reading the file into an array then matching it with the regex .<$. But how do I get the index no. of that element which matches this regex?
I need the index no. because I want to modify this file to get the letter which is pointed to, move the < to the next line, and if it is at the last line, shuffle the order of the lines and place < after the first line.
You need awk '/<$/ { print NR; }' <your-file>, which prints the 1-based line number of every line ending in <.
Grep could be used also:
grep -n \< infile
Then:
grep -n \< infile|cut -d : -f 1
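The approach described in the question (read the file into an array, then test each element against the regex) can be sketched in pure bash as well. The function name find_marked_index is hypothetical; note that the index printed is the zero-based array index, so it is one less than the line number that awk or grep -n reports.

```shell
# find_marked_index FILE: print the zero-based array index of each line ending in "<"
find_marked_index() {
  local -a lines
  local i re='<$'              # keep the regex in a variable to avoid quoting surprises
  mapfile -t lines < "$1"      # one array element per line (bash >= 4)
  for i in "${!lines[@]}"; do  # iterate over the indices, not the values
    if [[ ${lines[i]} =~ $re ]]; then
      echo "$i"
    fi
  done
}
```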
So I build the source file,
$ cat file
c
a
b<
d
f<
With the awk below, the < is moved to the next line; if it is on the last line, the < is moved to the first line.
awk '{ if (/</) a[NR]
       sub(/</,"")
       b[NR]=$0 }
     END { for (i in a) {
             if (i==NR) { b[1]=b[1] "<" }
             else { b[i+1]=b[i+1] "<" }
           }
           for (i=1;i<=NR;i++) print b[i]
         }' file
c<
a
b
d<
f

matrix from data with awk

Warning, not an awk programmer.
I have a file, let's call it file.txt. It has a list of numbers which I will be using to find the information I need from the rest of the directory (which is full of files *.asc). The remaining files do not have the same lengths, but since I will be drawing data based on file.txt, the matrix I will be building will have the same number of rows. All files DO however contain the same number of columns, 3. The first column will be compared to file.txt, the second column of each *.asc file will be used to build the matrix. Here is what I have so far:
awk '
NR==FNR{
A[$1];
next}
$1 in A
{print $2 >> "data.txt";}' file.txt *.asc
This, however, prints the information from each file below the previous file. I want the information side by side, like a matrix. I looked up paste, but it seems to be called before awk, and all examples were only of a couple of files. I tried it still in place of print and did not work.
If anyone could help me out, this would be the last piece to my project. Thanks so much!
You could try:
awk -f ext.awk file.txt *.asc > data.txt
where ext.awk is
NR==FNR {
  A[$1]++
  next
}
FNR==1 {
  if (ARGIND > 2)
    print ""
}
$1 in A {
  printf "%s ", $2
}
END {
  print ""
}
Update
If you do not have GNU Awk, the ARGIND variable is not available. You could then try
NR==FNR {
  A[$1]++
  next
}
FNR==1 {
  if (++ai > 1)
    print ""
}
$1 in A {
  printf "%s ", $2
}
END {
  print ""
}
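Both scripts above print one row per *.asc file. If the values should instead sit side by side, with one row per number from file.txt and one column per *.asc file, the values can be collected first and printed in the END block. This is a sketch under assumptions: build_matrix is a hypothetical name, all values are held in memory, and a number that is missing from one of the files leaves an empty cell.

```shell
# build_matrix KEYFILE DATAFILE...
# One output row per key (in KEYFILE order), one column per DATAFILE.
build_matrix() {
  local keys=$1
  shift
  awk '
    NR == FNR { order[++n] = $1; want[$1]; next }   # first file: keys, in order
    FNR == 1  { col++ }                             # a new data file starts
    $1 in want { cell[$1, col] = $2 }               # remember column 2 for this key
    END {
      for (r = 1; r <= n; r++) {
        line = ""
        for (c = 1; c <= col; c++)
          line = line (c > 1 ? " " : "") cell[order[r], c]
        print line
      }
    }
  ' "$keys" "$@"
}
```

It would be used as build_matrix file.txt *.asc > data.txt.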
