Hey guys so I have a grep problem. I want to grep lines from a file that contain a certain number, and then I want to paste certain columns from that line to a file.
For example, if I have the number 1068
File A has
1094 A B C
1068 D E F
1044 G H I
File B has
1092 L M N
1068 X Y Z
1045 Q R S
File C has
1093 A B C
1062 D E F
1041 G H I
I want to grep the line that has 1068 from all files, only paste certain columns, and paste them side by side. Note that File C does not have 1068, but I would like to paste NA instead. So that the final output looks like this:
1068 FileA D F FileB X Z FileC NA NA
Any help would be appreciated! I don't know how you would grep columns, or even check whether the number exists. For example, for File C grep would just come out with nothing, but I want to add NA NA instead. How would I do that?
I don't think this is a job for grep. More of an awk job.
awk -v num=1068 '
BEGIN { printf "%d", num }
# If the file has changed and num was not found in the previous file...
# (FILENAME already names the new file here, so use the saved name.)
FNR==1 && NR!=1 && !found_num { printf " %s NA NA", prev_file }
# At the beginning of each file, reset the flag and remember the file name
# (needs to be after the previous rule).
FNR==1 { found_num = 0; prev_file = FILENAME }
# If we find num, print the data and set the found_num flag.
$1 == num { printf " %s %s %s", FILENAME, $2, $4; found_num = 1 }
END { if (!found_num) printf " %s NA NA", FILENAME; print "" }
' FileA FileB FileC
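To try this out, the three sample files can be recreated in a scratch directory. The following is only a sketch: the wrapper name `lookup`, the variable `prev_file` and the use of `mktemp` are assumptions for illustration. It remembers the name of the file being read so that a file without a match is reported under its own name even when it is not the last one on the command line.

```shell
#!/usr/bin/env bash
# lookup NUM FILE...  (hypothetical wrapper name, not from the answer)
lookup() {
    local num=$1
    shift
    awk -v num="$num" '
        BEGIN { printf "%s", num }
        # A new file started and the previous file had no match: report NA for it.
        FNR == 1 && NR != 1 && !found { printf " %s NA NA", prev_file }
        # Reset the flag and remember which file is being read.
        FNR == 1 { found = 0; prev_file = FILENAME }
        $1 == num { printf " %s %s %s", FILENAME, $2, $4; found = 1 }
        END { if (NR && !found) printf " %s NA NA", prev_file; print "" }
    ' "$@"
}

# Recreate the sample data in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '1094 A B C' '1068 D E F' '1044 G H I' > FileA
printf '%s\n' '1092 L M N' '1068 X Y Z' '1045 Q R S' > FileB
printf '%s\n' '1093 A B C' '1062 D E F' '1041 G H I' > FileC

lookup 1068 FileA FileB FileC   # 1068 FileA D F FileB X Z FileC NA NA
lookup 1068 FileA FileC FileB   # 1068 FileA D F FileC NA NA FileB X Z
```

The second call puts the file without a match in the middle, which is the case where the saved file name matters.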
I want to loop through all elements in an array in awk and print. The values are sourced from the file below:
Ala A Alanine
Arg R Arginine
Asn N Asparagine
Asp D Aspartic acid
Cys C Cysteine
Gln Q Glutamine
Glu E Glutamic acid
Gly G Glycine
His H Histidine
Ile I Isoleucine
Leu L Leucine
Lys K Lysine
Met M Methionine
Phe F Phenylalanine
Pro P Proline
Pyl O Pyrrolysine
Ser S Serine
Sec U Selenocysteine
Thr T Threonine
Trp W Tryptophan
Tyr Y Tyrosine
Val V Valine
Asx B Aspartic acid or Asparagine
Glx Z Glutamic acid or Glutamine
Xaa X Any amino acid
Xle J Leucine or Isoleucine
TERM TERM termination codon
I have tried this:
awk 'BEGIN{FS="\t";OFS="\t"}{if (FNR==NR) {codes[$1]=$2;} else{next}}END{for (key in codes);{print key,codes[key],length(codes)}}' $input1 $input2
And the output is always Cys C 27 and when I replace codes[$1]=$2 for codes[$2]=$1 I get M Met 27.
How can I make my code print out all the values sequentially? I don't understand why my code selectively prints out just one element when I can tell the array length is 27 as expected. (To keep my code minimal I have excluded code within else{next} - Otherwise I just want to print all elements from array codes while retaining the else{***} command)
According to "How to view all the content in an awk array?", the syntax above should work. I tried it with: echo -e "1 2\n3 4\n5 6" | awk '{my_dict[$1] = $2};END {for(key in my_dict) print key " : " my_dict[key],": "length(my_dict)}' and that worked well.
With your shown samples and attempts please try following, written and tested in GNU awk.
awk '
BEGIN{
FS=OFS="\t"
}
{
codes[$1]=$2
}
END{
for(key in codes){
print key,codes[key],length(codes)
}
}' Input_file
Explanation: the same program again, with a comment on each line.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="\t" ##Setting FS and OFS as TAB here.
}
{
codes[$1]=$2 ##Creating array codes with index of 1st field and value of 2nd field
}
END{ ##Starting END block of this program from here.
for(key in codes){ ##Traversing through codes array here.
print key,codes[key],length(codes) ##Printing index and value of current item along with total length of codes.
}
}' Input_file ##Mentioning Input_file name here.
I'm a bit confused about what you are after, but to print the codes in file order together with a sequence number (ignoring the full name), you can do:
awk '{seq[++n]=$2; codes[$2]=$1}
END{for (i=1;i<=n;i++) printf "%s\t%s\t%d\n", codes[seq[i]], seq[i], i}' file
Which uses two arrays to coordinate the sequence number with the single letter in the seq array and then the letter to the code in the codes array.
Example Use/Output
$ awk '{seq[++n]=$2; codes[$2]=$1}
END{for (i=1;i<=n;i++) printf "%s\t%s\t%d\n", codes[seq[i]], seq[i], i}' file
Ala A 1
Arg R 2
Asn N 3
Asp D 4
Cys C 5
Gln Q 6
Glu E 7
Gly G 8
His H 9
Ile I 10
Leu L 11
Lys K 12
Met M 13
Phe F 14
Pro P 15
Pyl O 16
Ser S 17
Sec U 18
Thr T 19
Trp W 20
Tyr Y 21
Val V 22
Asx B 23
Glx Z 24
Xaa X 25
Xle J 26
TERM TERM 27
Resolved: The error was brought about by the introduction of ; here: END{for (key in codes);{print key,codes[key],length(codes)}}.
Solution:
awk 'BEGIN{FS="\t";OFS="\t"}{if (FNR==NR) {codes[$1]=$2;} else{next}}END{for (key in codes){print key,codes[key],length(codes)}}' $input1 $input2
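Why the stray `;` produces exactly one line can be seen in a small side-by-side run. This is a sketch on three of the records from the question; the helper name `sample` is made up for the demo. With the semicolon, the `for` loop has an empty statement as its body, and the braced block that follows runs once after the loop, using whatever key was visited last.

```shell
#!/usr/bin/env bash
# Three tab-separated sample records (a subset of the table in the question).
sample() {
    printf 'Ala\tA\tAlanine\nArg\tR\tArginine\nAsn\tN\tAsparagine\n'
}

# Stray ";" after for (...): the loop body is empty, and the braced block
# is a separate statement that runs once, after the loop has finished.
buggy=$(sample | awk '
    BEGIN { FS = OFS = "\t" }
    { codes[$1] = $2 }
    END { for (key in codes); { print key, codes[key] } }')

# Without the ";" the braced block is the loop body and runs once per key.
fixed=$(sample | awk '
    BEGIN { FS = OFS = "\t" }
    { codes[$1] = $2 }
    END { for (key in codes) { print key, codes[key] } }')

echo "with the semicolon:    $(printf '%s\n' "$buggy" | wc -l | tr -d ' ') line(s)"   # 1
echo "without the semicolon: $(printf '%s\n' "$fixed" | wc -l | tr -d ' ') line(s)"   # 3
```

Which key ends up in the single printed line depends on the traversal order of `for (key in codes)`, which awk does not specify. That would explain why swapping the index and the value changed the element that was printed.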
To read the list from a file, I use the code below:
IFS=$'\n' read -d '' -r -a data < ./somefolder/mytext.txt
for i in {0..9} #i know that i have 10 items, thats why i use 0..9
do
echo "${data[$i]}"
done
Let's say I have 1-10 in the txt file, so it should print like below:
1
2
3
4
5
6
7
8
9
10
Questions:
1. Is there a simpler way to read/write the text list than this?
2. How do I save/update/overwrite the data in mytext.txt? For example, change 4 to 88.
Full example:
#!bin/bash
IFS=$'\n' read -d '' -r -a data < ./somefolder/mytext.txt
for i in {0..9} #i know that i have 10 items, thats why i use 0..9
do
echo "${data[$i]}"
done
echo "change 4 to anything"
read any
update(){
for n in {0..9}
do
if [[ n == 3 ]]; then
echo any
else
echo "${data[$n]}"
fi
done
}
update > ./somefolder/mytext.txt
#i dont know what i should do, it throws some errors saying syntax error
echo "saved"
exit 0
This is the code and output of the code, it is not the same as you describe in the comments.
printf '%s\n' {a..z} > file.txt
cat file.txt
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
A quick way of showing line numbers by using grep
grep -n . file.txt
A function to loop through an array.
func() {
n=1
for f; do
if (( n == 3 )); then
printf '%d %s\n' "$n" foo
else
printf '%d %s\n' "$n" "$f"
fi
((n++))
done
}
mapfile -t array < file.txt
func "${array[@]}"
Output
1 a
2 b
3 foo
4 d
5 e
6 f
7 g
8 h
9 i
10 j
11 k
12 l
13 m
14 n
15 o
16 p
17 q
18 r
19 s
20 t
21 u
22 v
23 w
24 x
25 y
26 z
On the other hand, if you just want to replace the whole content of a certain line with something else, and ed is acceptable/available:
#!/usr/bin/env bash
printf '%s\n' ,n | ed -s file.txt
read -rp 'Change 4 to anything: ' input
printf '%s\n' "4c" "$input" . ,n w | ed -s file.txt
A more flexible version of the previous script.
#!/usr/bin/env bash
total=$(printf '%s\n' '$=' | ed -s file.txt)
printf '%s\n' ,n | ed -s file.txt
read -rp 'Enter the line number you want to change: ' int
if [[ -z $int || $int == *[!0-9]* ]]; then
printf >&2 '%s is not an int\n' "$int"
exit 1
elif (( 10#$int < 1 || 10#$int > total )); then
printf >&2 '%s is out of range!\n' "$int"
exit 1
fi
read -rp "Enter the replacement at line $int: " input
printf '%s\n' "${int}c" "$input" . ,n w | ed -s file.txt
Caveat: the file.txt name and path are still hard-coded in the script; just add an additional read for the file name.
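For the two questions as asked (a simpler way to read the list, and how to overwrite one entry), a plain-bash variant without ed is also possible. The following is a minimal sketch that assumes bash 4 or newer for `mapfile`; the function name `replace_line` and the scratch file are hypothetical.

```shell
#!/usr/bin/env bash
# replace_line FILE LINE_NUMBER NEW_TEXT  (hypothetical helper name)
# Reads FILE into an array, replaces one entry, and writes the array back.
replace_line() {
    local file=$1 lineno=$2 new=$3
    local -a data
    mapfile -t data < "$file"              # one array element per line
    if [[ -z $lineno || $lineno == *[!0-9]* ]]; then
        printf >&2 '%s is not a line number\n' "$lineno"
        return 1
    fi
    lineno=$((10#$lineno))                 # force base 10 so "08" is not read as octal
    if (( lineno < 1 || lineno > ${#data[@]} )); then
        printf >&2 '%s is out of range\n' "$lineno"
        return 1
    fi
    data[lineno-1]=$new                    # arrays are 0-based, lines are 1-based
    printf '%s\n' "${data[@]}" > "$file"   # overwrite the file with the updated list
}

# Demo on a scratch file holding the numbers 1 to 10.
list=$(mktemp)
printf '%s\n' {1..10} > "$list"
replace_line "$list" 4 88
cat "$list"                                # line 4 now reads 88
```

Looping with `"${data[@]}"` instead of `{0..9}` means the number of items does not have to be known in advance.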
I want to print all lines from file 1 where the values of $1 and $4 are found in $1 and $4 of file 2 AND where the value in file 1 $2 is greater than or equal to the value in file 2 $2 AND where the value in file 1 $3 is less than or equal to the value in file 2 $3.
file 1
1 110201809 117658766 a
1 168095261 182305990 b
1 215456074 233436403 c
2 9465687 12905490 d
2 28765309 35235120 e
2 48958595 64702082 f
file 2
1 245371026 249210707 a
2 937388 46504962 h
2 937388 162731186 b
2 2954974 6777829 c
2 9465687 12996275 d
2 14539477 44757554 d
2 14766820 30080818 m
2 16531332 23584565 n
2 17340076 26206255 o
2 18535880 24452180 p
2 28830071 35289330 q
2 36206662 47273732 r
2 48958495 64703082 f
Desired output only prints the lines from file 1 that meet the condition.
desired output
2 9465687 12905490 d
2 48958595 64702082 f
I've tried the following which gave an empty file:
awk 'NR==FNR{ a[$1,$4]= $0; b[$2] = $2 ; c[$3] = $3; next } ($1 $4 in a) && ($2 >= b[$2]) && ($3 <= c[$3])' file2 file1>desired output
I would do this by collecting the second and third columns in separate hashes, e.g.:
parse.awk
NR==FNR {
g[$1,$4] = $2
h[$1,$4] = $3
next
}
($1 SUBSEP $4 in g) && g[$1,$4] >= $2 && h[$1,$4] <= $3
Run it like this:
awk -f parse.awk file1 file2
Output (these are the matching lines of file 2):
2 9465687 12996275 d
2 48958495 64703082 f
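Since the question asks for the lines of file 1, one possible variant reads file 2 first and keeps every range per key, because a `$1,$4` pair can occur more than once in file 2 (the pair `2 d` does in the sample). This is a sketch under that assumption; the function name `lines_within` and the array names `n`, `lo` and `hi` are illustrative only. As a side note, the attempt in the question can never match because `$1 $4 in a` concatenates the two fields directly, while `a[$1,$4]` joins them with SUBSEP.

```shell
#!/usr/bin/env bash
# lines_within FILE2 FILE1  (hypothetical name): print the lines of FILE1 whose
# [$2,$3] interval lies inside some FILE2 interval with the same $1 and $4.
lines_within() {
    awk '
        # First file on the command line (file 2): store every range per key.
        NR == FNR {
            k = $1 SUBSEP $4
            n[k]++
            lo[k, n[k]] = $2
            hi[k, n[k]] = $3
            next
        }
        # Second file (file 1): print the line if it fits inside any stored range.
        {
            k = $1 SUBSEP $4
            for (i = 1; i <= n[k]; i++)
                if ($2 + 0 >= lo[k, i] + 0 && $3 + 0 <= hi[k, i] + 0) { print; break }
        }
    ' "$1" "$2"
}

# Recreate the sample data from the question.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
1 110201809 117658766 a
1 168095261 182305990 b
1 215456074 233436403 c
2 9465687 12905490 d
2 28765309 35235120 e
2 48958595 64702082 f
EOF
cat > "$tmp/file2" <<'EOF'
1 245371026 249210707 a
2 937388 46504962 h
2 937388 162731186 b
2 2954974 6777829 c
2 9465687 12996275 d
2 14539477 44757554 d
2 14766820 30080818 m
2 16531332 23584565 n
2 17340076 26206255 o
2 18535880 24452180 p
2 28830071 35289330 q
2 36206662 47273732 r
2 48958495 64703082 f
EOF

lines_within "$tmp/file2" "$tmp/file1"
# 2 9465687 12905490 d
# 2 48958595 64702082 f
```

The `+ 0` forces a numeric comparison even if an awk implementation treats the stored values as strings.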
I have a file similar to this:
A B C
D E C
F G C
A B X
F G X
A B Q
D E Q
This is what I am looking for:
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
So far I have a kind of complicated work-around.
Using AWK to add a empty line.
awk -v i=3 "NR>0 && $i!=p { print "A" }{ p=$i } 1" file.txt
I can't get awk to add a ">" directly, since it's a newline value. Instead of the "A", awk outputs an empty line. Not really sure why.
Using then
sed -e "s/^$/>/" file.txt
I manage to insert a ">" on the empty line, but the heading after it is still missing.
sed is for doing s/old/new, that is all. What you are attempting to do is not just s/old/new so you shouldn't be considering using sed, just use awk:
$ awk '$3!=p{print ">", $3; p=$3} 1' file
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
awk solution. Assuming that your input file is sorted:
awk '!a[$NF]++{ print ">",$NF }1' file
The output:
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
Could you please try following also and let me know if this helps you.
awk 'NR==1{print ">",$3 RS $0;prev=$3;next} prev!=$3{print ">",$3};1; {prev=$3}' Input_file
Output will be as follows.
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
So in Perl how can I go through a sample file like so:
1 D Z
1 E F
1 G L
2 D I
2 E L
3 D P
3 G L
Here I want to print only the lines whose first-column value appears together with every distinct value of the second column.
The output would look like this:
1 D Z
1 E F
1 G L
cat test
1 D Z
1 E F
1 G L
2 D I
2 E L
3 D P
3 G L
perl -a -lne 'unless ( $h{ $F[1] } ) { print }; $h{ $F[1] } = 1; ' test
1 D Z
1 E F
1 G L
Okay, this isn't as easy as it seems. I've read the file into memory so that I can take three passes over it:
1. Count the number of different values in column 2
2. Record each combination of values in column 1 and column 2
3. Print those lines in the file whose first column has as many occurrences as there are different values of column 2
This could be improved with more information about the input file, but it works fine as it is and I see no reason to optimise it.
use strict;
use warnings 'all';
use List::MoreUtils qw/ uniq /;
my @lines = <>;
my @col2 = uniq map { (split)[1] } @lines;
my %data;
for ( @lines ) {
my ($c1, $c2) = split;
$data{$c1}{$c2} = 1;
}
for ( @lines ) {
my ($c1) = split;
print if keys %{ $data{$c1} } == @col2;
}
output
1 D Z
1 E F
1 G L
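The same three steps can also be sketched in awk, reading the file twice instead of holding it in memory. This is not a Perl answer, only a comparison under the assumption that the input is a regular file that can be opened twice; the function name `full_groups` and the array names are made up for the example.

```shell
#!/usr/bin/env bash
# full_groups FILE  (hypothetical name): print the lines whose column-1 value
# occurs with every distinct column-2 value. FILE is read twice.
full_groups() {
    awk '
        # First pass: count distinct column-2 values and distinct pairs per column-1 value.
        NR == FNR {
            if (!($2 in col2)) { col2[$2]; total++ }
            if (!(($1, $2) in pair)) { pair[$1, $2]; seen[$1]++ }
            next
        }
        # Second pass: keep only the lines of complete groups.
        seen[$1] == total
    ' "$1" "$1"
}

# Recreate the sample file from the question.
sample=$(mktemp)
printf '%s\n' '1 D Z' '1 E F' '1 G L' '2 D I' '2 E L' '3 D P' '3 G L' > "$sample"

full_groups "$sample"
# 1 D Z
# 1 E F
# 1 G L
```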