Awk: extract different columns from many different files

File Example
I have between 3 and 10 files with:
- different number of columns
- same number of rows
- inconsistent spacing (sometimes one space, sometimes tabs, sometimes many spaces) **within** the same file, like the example below
> 0 55.4 9.556E+09 33
> 1 1.3 5.345E+03 1
> ........
> 33 134.4 5.345E+04 932
>
........
I need to get column (say) 1 from file1, column 3 from file2, column 7 from file3 and column 1 from file4 and combine them into a single file, side by side.
Trial 1: not working
paste <(cut -d[see below] -f1 file1) <(cut -d[see below] -f3 file2) [...]
where the delimiter was ' ' or empty.
Trial 2: working with 2 files but not with many files
awk '{
a1=$1;b1=$4;
getline <"D2/file1.txt";
print a1,$1,b1,$4
}' D1/file1.txt >D3/file1.txt
Now the more general question:
How can I extract different columns from many different files?

In your paste / cut attempt, replace cut by awk:
$ paste <(awk '{print $1}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $7}' file3) <(awk '{print $1}' file4)

Assuming each of your files has the same number of rows, here's one way using GNU awk. Run like:
awk -f script.awk file1.txt file2.txt file3.txt file4.txt
Contents of script.awk:
FILENAME == ARGV[1] { one[FNR]=$1 }
FILENAME == ARGV[2] { two[FNR]=$3 }
FILENAME == ARGV[3] { three[FNR]=$7 }
FILENAME == ARGV[4] { four[FNR]=$1 }
END {
for (i=1; i<=length(one); i++) {
print one[i], two[i], three[i], four[i]
}
}
Note:
By default, awk separates columns on whitespace. This includes tab characters and spaces, and any amount of these. This makes awk ideal for files with inconsistent spacing. You can also expand the above code to include more files if you wish.
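One way to extend this to any number of files, without writing one rule per file, is to set the wanted column on the command line just before each file name. The sketch below is a variant under stated assumptions, not the script from the answer: the helper name pick_columns, the variable c, and the two sample files are made up for illustration, and it relies on awk processing var=value operands in order, between the files.

```shell
#!/bin/sh
# Sketch: choose one column per file with awk command-line assignments.
# pick_columns, c, and the sample files are hypothetical names.

tmp=$(mktemp -d)
# Two small sample files with mixed spaces and tabs, same number of rows.
printf '0 55.4\t9.556E+09   33\n1   1.3 5.345E+03 1\n' > "$tmp/file1"
printf 'a b   c\nd\te f\n' > "$tmp/file2"

pick_columns() {
    # Arguments: c=N file [c=N file ...]
    awk '
        FNR == 1 { nfiles++ }                   # first line of the next file
        {
            cell[FNR, nfiles] = $c              # c was assigned just before this file
            if (FNR > rows) rows = FNR
        }
        END {
            for (r = 1; r <= rows; r++) {
                line = cell[r, 1]
                for (f = 2; f <= nfiles; f++) line = line OFS cell[r, f]
                print line
            }
        }
    ' "$@"
}

result=$(pick_columns c=1 "$tmp/file1" c=3 "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Two limits of this sketch: a file whose path contains = would be mistaken for an assignment, and an empty file would not advance the file counter.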

The combination of cut and paste should work:
$ cat f1
foo
bar
baz
$ cat f2
1 2 3
4 5 6
7 8 9
$ cat f3
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -d' ' -f3 f3)
foo 2 c
bar 5 g
baz 8 k
Edit: This works with tabs, too:
$ cat f4
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -f3 f4)
foo 2 c
bar 5 g
baz 8 k
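cut treats every single delimiter character as a field boundary, so it miscounts when fields are separated by runs of spaces or by a mix of tabs and spaces, as in the question. A possible workaround, sketched here with a hypothetical helper named col and a made-up sample file, is to normalize the whitespace before cutting:

```shell
#!/bin/sh
# Sketch: make cut usable on input with inconsistent spacing.
# col and the sample file are hypothetical.

tmp=$(mktemp -d)
printf '0 55.4\t9.556E+09   33\n  1   1.3 5.345E+03 1\n' > "$tmp/file1"

col() {
    # Usage: col N FILE
    tr '\t' ' ' < "$2" |    # tabs become spaces
        tr -s ' ' |         # runs of spaces collapse to one
        sed 's/^ //' |      # drop a leading space, if any
        cut -d' ' -f"$1"
}

result=$(col 3 "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Each cut in the paste command could then be swapped for a call to col.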

Related

Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this awk:
awk '{for (i=3; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass start position as argument:
awk -v n=3 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
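To reproduce the output asked for in the question (everything from field 4 on), set the start position to 4. The following is a small sketch around the same loop; the wrapper name from_field and the sample input are assumptions, and unlike the one-liner it ends every record with a newline even when a line has fewer than n fields.

```shell
#!/bin/sh
# Sketch: print fields N..NF of every input line.
# from_field is a hypothetical wrapper name.

from_field() {
    # Usage: from_field N   (reads standard input)
    awk -v n="$1" '{
        for (i = n; i <= NF; ++i)
            printf "%s%s", $i, (i < NF ? OFS : "")
        print ""    # terminate the record even if nothing was printed
    }'
}

result=$(printf '1 2 3 4 5 6\na b\n7 8 9 10 11\n' | from_field 4)
printf '%s\n' "$result"
```

The second input line has only two fields, so it yields an empty output line.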
Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve all space after $4:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4   5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d' ' -f 4- <<<$x
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=' ' -v ORS=' ' 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4 5 6 7" |
gawk '{
    n=split($0,a,FS,seps)                       # actual separators go to seps
    for(i=4;i<=n;i++)                           # loop from 4th
        printf "%s%s",a[i],(i==n?RS:seps[i])    # get fields from arrays
}'
One more approach: append each field to a variable and print the variable once all fields of the line have been read. Set n to the number of the first field you want.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
With GNU awk, you can use the join function which has been a built-in include since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '@include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'
Using GNU awk and gensub:
echo $x | gawk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/,"\\2",1)}'
Using gensub, split the string into two sections based on regular expressions and print the second section only.

bash: store/update text data [closed]

To read the list from a file, I use the code below:
IFS=$'\n' read -d '' -r -a data < ./somefolder/mytext.txt
for i in {0..9} #i know that i have 10 items, thats why i use 0..9
do
echo "${data[$i]}"
done
Let's say the txt file contains 1-10, so it should print the following:
1
2
3
4
5
6
7
8
9
10
Questions:
Is there a simpler way to read/write the text list than this?
How do I save/update/overwrite the data in mytext.txt? Let's say I change 4 to 88, for example.
Full example:
#!bin/bash
IFS=$'\n' read -d '' -r -a data < ./somefolder/mytext.txt
for i in {0..9} #i know that i have 10 items, thats why i use 0..9
do
echo "${data[$i]}"
done
echo "change 4 to anything"
read any
update(){
for n in {0..9}
do
if [[ n == 3 ]]; then
echo any
else
echo "${data[$n]}"
fi
done
}
update > ./somefolder/mytext.txt
#i dont know what i should do, it throws some errors saying syntax error
echo "saved"
exit 0
This is the code and its output; it is not the same as what you describe in the comments.
printf '%s\n' {a..z} > file.txt
cat file.txt
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
A quick way of showing line numbers by using grep
grep -n . file.txt
A function to loop through an array.
func() {
n=1
for f; do
if (( n == 3 )); then
printf '%d %s\n' "$n" foo
else
printf '%d %s\n' "$n" "$f"
fi
((n++))
done
}
mapfile -t array < file.txt
func "${array[@]}"
Output
1 a
2 b
3 foo
4 d
5 e
6 f
7 g
8 h
9 i
10 j
11 k
12 l
13 m
14 n
15 o
16 p
17 q
18 r
19 s
20 t
21 u
22 v
23 w
24 x
25 y
26 z
On the other hand, if you just want to replace the whole content of a certain line with something else, and ed is acceptable/available:
#!/usr/bin/env bash
printf '%s\n' ,n | ed -s file.txt
read -rp 'Change 4 to anything: ' input
printf '%s\n' "4c" "$input" . ,n w | ed -s file.txt
A more flexible version of the previous script.
#!/usr/bin/env bash
total=$(printf '%s\n' '$=' | ed -s file.txt)
printf '%s\n' ,n | ed -s file.txt
read -rp 'Enter the line number you want to change: ' int
if [[ $int == *[!0-9]* ]]; then
printf >&2 '%s is not an int\n' "$int"
exit 1
elif (( int > total )); then
printf >&2 '%s is out of range!\n' "$int"
exit 1
fi
read -rp "Enter the replacement at line $int: " input
printf '%s\n' "${int}c" "$input" . ,n w | ed -s file.txt
Caveat: the file.txt name and path are still hard-coded in the script; just add an additional read for the file.
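If ed is not available, the same line replacement can be sketched with awk, taking the file name as an argument so that nothing is hard-coded. The function name replace_line, the environment variable REPLACEMENT, and the sample file are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: replace line N of a file, with the file name passed as an argument.
# replace_line and the sample file are hypothetical.

replace_line() {
    # Usage: replace_line N TEXT FILE
    # TEXT travels through the environment so awk does not expand backslashes in it.
    REPLACEMENT=$2 awk -v n="$1" '
        NR == n { print ENVIRON["REPLACEMENT"]; next }   # swap in the new text
        { print }                                        # copy every other line
    ' "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$tmp/mytext.txt"
replace_line 4 88 "$tmp/mytext.txt"
result=$(cat "$tmp/mytext.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Writing to a temporary file first and renaming it afterwards avoids truncating the input while awk is still reading it.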

Gnuplot: How to plot bash array without dumping it to a file

I am trying to plot a bash array using gnuplot without dumping the array to a temporary file.
Let's say:
myarray=$(seq 1 5)
I tried the following:
myarray=$(seq 1 5)
gnuplot -p <<< "plot $myarray"
I got the following error:
line 0: warning: Cannot find or open file "1"
line 0: No data in plot
gnuplot> 2
^
line 0: invalid command
gnuplot> 3
^
line 0: invalid command
gnuplot> 4
^
line 0: invalid command
gnuplot> 5''
^
line 0: invalid command
Why doesn't it interpret the array as a data block?
Any help is appreciated.
bash array
myarray=$(seq 1 5)
The myarray is not a bash array, it is a normal variable.
The easiest way is to send the data to stdin and plot "<cat".
seq 5 | gnuplot -p -e 'plot "<cat" w l'
Or with your variable and with using a here-string:
<<<"$myarray" gnuplot -p -e 'plot "<cat" w l'
Or with your variable with redirection with echo or printf:
printf "%s\n" "$myarray" | gnuplot -p -e 'plot "<cat" w l'
And if you want to plot an actual array, just print it on separate lines and then pipe to gnuplot
array=($(seq 5))
printf "%s\n" "${array[@]}" | gnuplot -p -e 'plot "<cat" w l'
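Another option, sketched here, is gnuplot's special file name "-", which reads inline data from the command stream until a line containing only e. The helper name emit_plot_script is hypothetical; it only prints the script so that the text can be inspected, and piping it into gnuplot -p is left as a commented usage line.

```shell
#!/bin/sh
# Sketch: build a gnuplot script with inline data from a list of values.
# emit_plot_script is a hypothetical helper name.

emit_plot_script() {
    echo 'plot "-" with lines'    # "-" means: the data follows inline
    printf '%s\n' "$@"            # one value per line
    echo 'e'                      # a lone e ends the inline data
}

script=$(emit_plot_script 1 2 3 4 5)
printf '%s\n' "$script"

# With a real bash array, assuming gnuplot is installed:
#   emit_plot_script "${array[@]}" | gnuplot -p
```

This also shows why the attempt in the question failed: the values have to follow the plot command as separate data lines, not be pasted into the plot command itself.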
Plot STDIN
gnuplot -p -e 'plot "/dev/stdin"'
Sample:
( seq 5 10; seq 7 12 ) | gnuplot -p -e 'plot "/dev/stdin"'
or
gnuplot -p -e 'plot "/dev/stdin" with steps' < <( seq 5 10; seq 7 12 )
More tuned plot
gnuplot -p -e "set terminal wxt 0 enhanced;set grid;
set label \"Test demo with random values\" at 0.5,0 center;
set yrange [ \"-1\" : \"80\" ] ; set timefmt \"%s\";
plot \"/dev/stdin\" using 1:2 title \"RND%30+40\" with impulse;" < <(
paste <(
seq 2300 2400
) <(
for ((i=101;i--;)){ echo $[RANDOM%30+40];}
)
)
Please note that this is still a single command, so you can copy and paste it into any terminal.

Print duplicate entries in a file using linux commands

I have a file called foo.txt, which consists of:
abc
zaa
asd
dess
zaa
abc
aaa
zaa
I want the output to be stored in another file as:
this text abc appears 2 times
this text zaa appears 3 times
I have tried the following command, but this just writes duplicate entries and their number.
sort foo.txt | uniq --count --repeated > sample.txt
Example of output of above command:
abc 2
zaa 3
How do I add the line "this text appears x times" ?
Awk is your friend:
sort foo.txt | uniq --count --repeated | awk '{print($2" appears "$1" times")}'
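The command above prints "abc appears 2 times", without the "this text" prefix from the question. A sketch that produces the requested wording, and also copes with lines that contain spaces, follows; it uses the short options -c and -d in place of --count and --repeated, and the temporary directory is an assumption made to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: report duplicated lines in the wording asked for in the question.

tmp=$(mktemp -d)
printf '%s\n' abc zaa asd dess zaa abc aaa zaa > "$tmp/foo.txt"

sort "$tmp/foo.txt" | uniq -c -d |
    awk '{
        n = $1                          # the count that uniq put first
        sub(/^ *[0-9]+ /, "")           # strip the count, keep the rest of the line
        print "this text " $0 " appears " n " times"
    }' > "$tmp/sample.txt"

result=$(cat "$tmp/sample.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```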

Print the middle line of any file UNIX

I have to print the middle line of any text file without using sed or awk.
For example, the following file.txt:
line 1
line 2
line 3
line 4
line 5
I need something like:
$ command -flags file.txt
line 3
Is there any command?
Thanks.
Not the most efficient, but works in bash.
Use wc -l to count the lines, and divide by two. Then use tail -n +N | head -n 1 to print just the Nth line (where N starts at 1).
$ cat input.txt
A
B
C
D
E
$ tail -n +$(((`cat input.txt | wc -l` / 2) + 1)) input.txt | head -n 1
C
Note that a file with an even number of lines has no single "middle line".
I cat-ed the file to wc -l so it wouldn't print the filename.
sed -n $(((`cat input.txt| wc -l`/ 2) + 1))p input.txt
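The wc, tail, and head approach can be wrapped in a small function that uses neither sed nor awk. This is a sketch with a hypothetical function name, middle_line, and made-up sample files; note that for a file with an even number of lines it picks the lower of the two middle lines, whereas the commands above pick the upper one.

```shell
#!/bin/sh
# Sketch: print the middle line of a file using only wc, tail, and head.
# middle_line and the sample files are hypothetical.

middle_line() {
    # Usage: middle_line FILE
    total=$(( $(wc -l < "$1") ))            # arithmetic strips any padding from wc
    [ "$total" -gt 0 ] || return 1          # nothing to print for an empty file
    tail -n +"$(( (total + 1) / 2 ))" "$1" | head -n 1
}

tmp=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 > "$tmp/odd.txt"
printf 'line %s\n' 1 2 3 4 > "$tmp/even.txt"
odd=$(middle_line "$tmp/odd.txt")
even=$(middle_line "$tmp/even.txt")
printf '%s\n%s\n' "$odd" "$even"
rm -rf "$tmp"
```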

Resources