Bad substitution error when passing decremented variable to "sed" command - arrays

I would like to use the sed command to delete two consecutive lines from a file. I can delete a single line using the following syntax, where the variable index holds the line number:
sed -i "${index}d" "$PWD$DEBUG_DIR$DEBUG_MENU"
Since I want to delete consecutive lines, I tested syntax which supposedly decrements / increments the variable, but I am getting a bad substitution error.
Addendum
I am not sure where to post this, but during troubleshooting of this issue I discovered that neither (( index++ )) nor (( index-- )) works as expected.
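For what it's worth, (( index++ )) and (( index-- )) are valid bash/ksh arithmetic commands and do update the variable; a minimal sanity-check transcript (assuming bash):
$ index=4
$ (( index++ ))
$ echo "$index"
5
One caveat: (( index++ )) returns a non-zero exit status when the value before the increment is 0, which can trip up scripts running under set -e.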

Running the single-line sed deletion with index twice in a row / sequentially does work, and resolves a few issues:
#delete command line
sed -i "${index}d" "$PWD$DEBUG_DIR$DEBUG_MENU"
#delete description line
sed -i "${index}d" "$PWD$DEBUG_DIR$DEBUG_MENU"

sed is the right tool for doing s/old/new, that is all. What you're trying to do isn't that, so why try to coerce sed into doing something when there are far more appropriate tools that can do the job far more easily?
The right way to do what your script
#retrieve first matching line number
index=$(sed -n "/$choice/=" "$PWD$DEBUG_DIR$DEBUG_MENU")
#delete matching line plus next line from file
sed -i "/$index[1], (( $index[1]++))/" "$PWD$DEBUG_DIR$DEBUG_MENU"
seems to be trying to do is this:
awk -v choice="$choice" '$0~choice{skip=2} !(skip&&skip--)' "$PWD$DEBUG_DIR$DEBUG_MENU"
For example:
$ seq 20 | awk -v choice="4" '$0~choice{skip=2} !(skip&&skip--)'
1
2
3
6
7
8
9
10
11
12
13
16
17
18
19
20
if you only want to delete the first match:
$ seq 20 | awk -v choice="4" '$0~choice{skip=2;cnt++} !(cnt==1 && skip && skip--)'
1
2
3
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
or the 2nd:
$ seq 20 | awk -v choice="4" '$0~choice{skip=2;cnt++} !(cnt==2 && skip && skip--)'
1
2
3
4
5
6
7
8
9
10
11
12
13
16
17
18
19
20
and to skip 5 lines instead of 2:
$ seq 20 | awk -v choice="4" '$0~choice{skip=5} !(skip&&skip--)'
1
2
3
9
10
11
12
13
19
20
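If you need the file updated in place the way sed -i does it, note that POSIX awk has no such flag; a sketch using a temporary file (GNU awk also offers -i inplace as an extension):
tmp=$(mktemp)
awk -v choice="$choice" '$0~choice{skip=2} !(skip&&skip--)' "$PWD$DEBUG_DIR$DEBUG_MENU" > "$tmp" &&
mv "$tmp" "$PWD$DEBUG_DIR$DEBUG_MENU"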
Just use the right tool for the right job, don't go digging holes to plant trees with a teaspoon.

If you just want the first one, then quit when you see it:
sed -n "/$choice/ {=;q}" file
But you look like you're processing this file multiple times. There must be a simpler way to do it, if you can describe your over-arching goal.
For example, if you just want to remove the matched line and the next line, but only the first time, you can use awk: here we see "4" and "5" are gone, but "14" and "15" remain:
$ seq 20 | awk '/4/ && !seen {getline; seen++; next} 1'
1
2
3
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
With GNU sed, you can use +1 as the second address in an address range:
index=4
seq 10 | sed "$index,+1 d"
Or if you want to use bash/ksh arithmetic expansion: use the post-increment operator
seq 10 | sed "$((index++)),$index d"
Note here that, due to the pipeline, the sed command and the expansion of its arguments run in a subshell: index is incremented to 5 there, but after the sed command ends, index still has the value 4 in the current shell.
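A short transcript illustrating this (assuming bash without the lastpipe option enabled):
$ index=4
$ seq 10 | sed "$((index++)),$index d"
1
2
3
6
7
8
9
10
$ echo "$index"
4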


Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.
Converting my comment to an answer so that the solution is easy to find for future visitors.
You may use this awk:
awk '{for (i=3; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass start position as argument:
awk -v n=3 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
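For the question's example, passing n=4 prints everything from the 4th field:
$ echo $x | awk -v n=4 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}'
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30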
Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve all space after $4:
$ echo "1 2 3 4 5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4 5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4 5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d' ' -f4- <<< $x
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=' ' -v ORS=' ' 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4 5 6 7" |
gawk '{
n=split($0,a,FS,seps) # actual separators goes to seps
for(i=4;i<=n;i++) # loop from 4th
printf "%s%s",a[i],(i==n?RS:seps[i]) # get fields from arrays
}'
One more approach: append each field's value to a variable and, once all the fields have been read, print the variable. Change the value of n= according to the field from which onwards you want the data.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
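With the running example and n=4 this prints:
$ echo "$x" | awk -v n=4 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30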
With GNU awk, you can use the join library function, which has shipped with gawk as an includable library since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '#include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'
Using gnu awk and gensub:
echo $x | awk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/, "\\2", 1) }'
Using gensub, split the string into two sections based on the regular expression and print only the second section. (The third argument, 1, tells gensub to replace the first match; the target defaults to $0.)

Use bash variable as array in awk and filter input file by comparing with array

I have bash variable like this:
val="abc jkl pqr"
And I have a file that looks something like this:
abc 4 5
abc 8 8
def 43 4
def 7 51
jkl 4 0
mno 32 2
mno 9 2
pqr 12 1
I want to throw away the rows from the file whose first field isn't present in val:
abc 4 5
abc 8 8
jkl 4 0
pqr 12 1
My solution in awk doesn't work at all and I don't have any idea why:
awk -v var="${val}" 'BEGIN{split(var, arr)}$1 in arr{print $0}' file
Just turn the values of the variable into array indexes:
awk -v var="${val}" 'BEGIN{split(var, arr)
for (i in arr)
names[arr[i]]
}
$1 in names' file
As commented in the linked question, when you call split() you get values stored in the array, while what you want to set are indexes. The trick is to generate another array whose indexes are those values.
As you can see, $1 in names suffices: you don't have to spell out the action {print $0}, since printing the line is the default.
As a one-liner:
$ awk -v var="${val}" 'BEGIN{split(var, arr); for (i in arr) names[arr[i]]} $1 in names' file
abc 4 5
abc 8 8
jkl 4 0
pqr 12 1
grep -E "$( echo "${val}"| sed 's/ /|/g' )" YourFile
# or
awk -v val="${val}" 'BEGIN{gsub(/ /, "|",val)} $1 ~ val' YourFile
Grep:
It uses a regex (extended version, via option -E) that filters all the lines containing one of the values. The regex is built on the fly in a subshell by a sed that replaces each space separator with a |, meaning OR.
Awk:
It uses the same principle as the grep, but everything happens inside awk (so no subshell). The awk variable val is assigned from the shell variable of the same name. At the start of the script (before the first line is read), the spaces in val are changed to | with BEGIN{gsub(/ /, "|",val)}. Then every line whose first field matches is printed (the default field separator in awk is space/blank, so the first field is the letter group); printing is the default action of the filter $1 ~ val.
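One caveat with both regex variants: they match substrings, so a first field such as abcd would also pass (and the grep matches anywhere in the line, not only the first field). If that matters, anchoring the pattern avoids it; a sketch:
awk -v val="${val}" 'BEGIN{gsub(/ /, "|", val); val="^(" val ")$"} $1 ~ val' YourFile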

Converting continuous streaming text into comma separated, multi-line file

I'm trying to convert a continuous stream of (random) data into comma-separated, line-separated values. I'm converting the continuous data into CSV, and after some number of columns (let's say 80) I need to put a newline and repeat the process until the input ends.
Here's what I did for csv:
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp
'tmp' is the file with following data:
"ZaOAkHEnOsBmD5yZk8cNLC26rIFGSLpzuGHtZgb4VUP4x1Pd21bukeK6wUYNueQQMglvExbnjEaHuoxU0b7Dcne5Y4JP332RzgiI3ZDgHOzm0gjDLVat8au7uckM3t60nqFX0Cy93jXZ5T0IaQ4fw2JfdNF1PbqxDxXv7UGiyysFJ8z16TmYQ9zfBRCZvZirIyRboHNEGgMUFZ18y8XXCGrbpeL0WLstzpSuXetmo47G2xPkDLDcFA6cdM4WAFNpoC2ztspY7YyVsoMZdU7D3u3Lm6dDcKuJKdTV6600GkbLuvAamKGyzMtoqW3liI3ybdTNR9KLz2l7KTjUiGgc3Eci5wnhIosAUMkcSQVxFrZdJ9MVyj6duXAk0CJoRvHYuyfdAr7vjlwjkLkYPtFvAZp6wK3dfetoh3ZmhJhUxqzuxOLDQ9FYcvz64iuIUbgXVZoRnpRoNGw7j3fCwyaqCi..."
I'm generating the continuous sequence from /dev/urandom. What I'm not getting is how to make gawk add a newline character after some number of columns and then repeat.
I got it actually. A simple for loop did that.
Here's my whole code:
for i in $(seq 10)
do
tr -dc A-Za-z0-9 < /dev/urandom | head -c 100 > tmp
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp >> tmp1
done
Any optimizations would be appreciated.
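One possible optimization, as a sketch (same field widths, untested against your exact setup): drop the loop and the temporary file, and let fold break the stream into 100-character records in a single pipeline:
tr -dc A-Za-z0-9 < /dev/urandom | fold -w 100 | head -n 10 |
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, > tmp1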

Same column of different files into the same new file

I have multiple folders Case-1, Case-2, ..., Case-N, and each of them has a file named PPD. I want to extract the 2nd column of every PPD and put them all into one file named 123.dat.
It seems that I cannot use awk in a for loop.
case=$1
for (( i = 1; i <= $case ; i ++ ))
do
file=Case-$i
cp $file/PPD temp$i.dat
awk 'FNR==1{f++}{a[f,FNR]=$2}
END
{for(x=1;x<=FNR;x++)
{for(y=1;y<ARGC;y++)
printf("%s ",a[y,x]);print ""} }'
temp$i.dat >> 123.dat
done
Now 123.dat only has the data of the last PPD, the one in Case-N.
I know I could use join (I have used that command before) if every pair of PPD files shared a column, but it turns out to be extremely slow when I have lots of Case folders.
Maybe
eval paste $(printf ' <(cut -f2 %s)' Case-*/PPD)
There is probably a limit to how many process substitutions you can perform in one go. I did this with 20 columns and it was fine. Process substitutions are a Bash feature, so not portable to other Bourne-compatible shells in general.
The wildcard will be expanded in alphabetical order. If you want the cases in numerical order, maybe use Case-[1-9] Case-[1-9][0-9] Case-[1-9][0-9][0-9] to force the expansion to get the single digits first, then the double digits, etc.
The interaction between the outer shell script and the inner awk invocation isn't working the way you expect.
Every time through the loop, the shell script calls awk a new time, which means that f will be unset, and then that first clause will set it to 1. It will never become 2. That is, you are starting a new awk process for each iteration through the outer loop, and awk is starting from scratch each time.
There are other ways to structure your code, but as a minimal tweak, you can pass in the number $i to the awk invocation using the -v option, e.g. awk -v i="$i" ....
Note that there are better ways to structure your overall solution, as other answerers have already suggested; I meant this response to be an answer to the question "Why doesn't this work?" and not "Please rewrite this code."
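That said, a minimal sketch of that restructuring (assuming, as in the sample data below, that every PPD file has the same number of lines): invoke awk once with all the files, so f really does advance once per file:
awk 'FNR==1{f++} {a[f,FNR]=$2}
     END{for (x=1; x<=FNR; x++) {
           for (y=1; y<=f; y++) printf("%s ", a[y,x])
           print ""
         }}' Case-*/PPD > 123.dat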
The AWK program below can help you.
#!/usr/bin/awk -f
BEGIN {
# Defaults
nrecord=1
nfiles=0
}
BEGINFILE {
# Check if the input file is accessible,
# if not skip the file and print error.
if (ERRNO != "") {
print("Error: ",FILENAME, ERRNO)
nextfile
}
}
{
# Check if the file is accessed for the first time
# if so then increment nfiles. This is to keep count of
# number of files processed.
if ( FNR == 1 ) {
nfiles++
} else if (FNR > nrecord) {
# Track the highest record count seen in any file so far.
nrecord=FNR
}
# Fetch the second column from the file.
array[nfiles,FNR]=$2
}
END {
# Iterate through the array and print the records.
for (i=1; i<=nrecord; i++) {
for (j=1; j<=nfiles; j++) {
printf("%5s", array[j,i])
}
print ""
}
}
Output:
$ ./get.awk Case-*/PPD
1 11 21
2 12 22
3 13 23
4 14 24
5 15 25
6 16 26
7 17 27
8 18 28
9 19 29
10 20 30
Here Case-*/PPD expands to Case-1/PPD, Case-2/PPD, Case-3/PPD and so on. Below are the source files from which the output was generated.
$ cat Case-1/PPD
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
$ cat Case-2/PPD
11 11 11 11
12 12 12 12
13 13 13 13
14 14 14 14
15 15 15 15
16 16 16 16
17 17 17 17
18 18 18 18
19 19 19 19
20 20 20 20
$ cat Case-3/PPD
21 21 21 21
22 22 22 22
23 23 23 23
24 24 24 24
25 25 25 25
26 26 26 26
27 27 27 27
28 28 28 28
29 29 29 29
30 30 30 30

How to read from two files at the same time in shell

I have two files,
A
john 1 2 3 4 5 6 7
Ely 10 9 9 9 9 9 9
Maria 3 5 7 9 2 1 4
Rox 10 10 10 10 10 10 10
B
john 7.5
Ely 4.5
Maria 3,7
Rox 8.5
What I want to do is create another file with only the persons whose average in file A is greater than or equal to 8.5 and whose mark in B is also greater than or equal to 8.5, so in my example the C file would contain only Rox, because only she fulfils the criteria.
I have this
#shell program
echo "Fiserul are numele $1"
filename=$1
filename2=$2
echo "">temp.txt
touch results
compara="8.5"
cat $filename | while read -r line
do
nota=0
media=0
echo " $line"
rem=$( echo "$line"| cut -f 2- -d ' ')
for word in $rem
do
echo "$word"
nota=$(($nota+$word))
echo "Nota=$nota"
done
media=$(($nota / 7))
if [ "$(echo $media '>=' $compara | bc -l)" -eq 1 ];
then
nume=$( echo "$line"| cut -f 1 -d ' ')
echo "$nume $media" >> temp.txt
fi
echo "Media : $media"
done
cat $filename2 | while read -r line
do
So I have in temp.txt the persons who fulfil the criterion for file A, but my question is: how can I compare them with the persons from filename2 and create "results" from them?
I've tried with two while loops but I get an error. Could someone please help?
Thanks!
If you really want to read two files at the same time (which doesn't appear to be your actual question -- join is indeed the right tool for what you're doing), you can open them on different FDs:
while IFS= read -r -u 4 line1 && IFS= read -r -u 5 line2; do
echo "Line from first file: $line1"
echo "Line from second file: $line2"
done 4<file1 5<file2
Use the join command to combine A and B into a single file C:
$ join A.txt B.txt
john 1 2 3 4 5 6 7 7.5
Ely 10 9 9 9 9 9 9 4.5
Maria 3 5 7 9 2 1 4 3,7
Rox 10 10 10 10 10 10 10 8.5
It should be simple to modify your current script to process the data in this form.
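(Note that join expects its inputs sorted on the join field; here the two files happen to list the same names in the same order, so you may want to sort them first or pass GNU join's --nocheck-order.) From the joined form, a hedged sketch of the final filter — it assumes exactly seven grades per person in file A, writes to an output name C.txt chosen here for illustration, and treats a comma decimal like 3,7 as malformed input:
join A.txt B.txt | awk '{
    sum = 0
    for (i = 2; i <= 8; i++) sum += $i   # the seven grades from file A
    if (sum / 7 >= 8.5 && $9 >= 8.5)     # average and the mark from B both >= 8.5
        print $1
}' > C.txt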
