grep - blacklisting using a file - blacklist

I'm new to using grep. Basically I have two text files: blacklist.txt, and many foo.txt files in different directories.
I started off using:
grep -vE "(insert|blacklist|items|here)" foo.txt > filtered_foo.txt
but my blacklist has grown considerably, so I need to compare the two files instead.
In foo.txt there are four columns, with columns 1, 2 and 3 being unique. I want to delete rows where column 4 matches a string in my blacklist.
Sample of a foo.txt
A1 A2 A3 Bob
B1 B2 B3 Anne
C1 C2 C3 Henry
D1 D2 D3 Ted
blacklist.txt
Anne
Ted
Desired output: filtered_foo.txt
A1 A2 A3 Bob
C1 C2 C3 Henry
I have tried different things in grep such as:
grep -vF "'cat blacklist.txt'" foo.txt > filtered_foo.txt

Use the -f option to get the patterns from a file.
grep -vF -f blacklist.txt foo.txt > filtered_foo.txt
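Note that grep -f matches a blacklist entry anywhere on a line, not just in column 4. If the match really has to be restricted to the fourth column, an awk sketch along these lines should do it (assuming whitespace-separated columns and exact, full-field matches):
awk 'NR==FNR { bad[$0]; next } !($4 in bad)' blacklist.txt foo.txt > filtered_foo.txt
The first file fills the bad array with the blacklist strings; lines from foo.txt are then printed only when their fourth field is not in that array.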

Related

Eliminate duplicate columns based on separate field using awk array?

I am trying to eliminate a set of duplicate rows based on a separate field.
cat file.txt
1 345 a blue
1 345 b blue
3 452 c blue
3 342 d green
3 342 e green
1 345 f green
I would like to remove duplicate rows based on fields 1 and 2, but separately for each colour. Desired output:
1 345 a blue
3 452 c blue
3 342 d green
1 345 f green
I can achieve this output using a for loop that iterates over the colours:
for i in $(awk '{ print $4 }' file.txt | sort -u); do
grep -w "${i}" file.txt |
awk '!x[$1,$2]++' >> output.txt
done
But this is slow. Is there any way to get this output without using a loop?
Thank you.
At least for the example, it is as simple as:
$ awk 'arr[$1,$2,$4]++{next} 1' file
1 345 a blue
3 452 c blue
3 342 d green
1 345 f green
Or, you can negate that:
$ awk '!arr[$1,$2,$4]++' file
You can also use GNU sort for the same task, which may be faster (note that it sorts the output rather than preserving the input order):
$ sort -k4,4 -k2,2 -k1,1 -u file
You could also try this:
awk '!a[$1,$2,$4]++' Input_file
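In case the one-liner is unclear, the same logic written out long-hand looks roughly like this (assuming whitespace-separated fields, as in the sample):
awk '
{
    key = $1 SUBSEP $2 SUBSEP $4   # composite key built from fields 1, 2 and 4
    if (!seen[key]++)              # true only the first time this key appears
        print                      # keep the first occurrence, skip later duplicates
}
' file.txt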

Print duplicate entries in a file using linux commands

I have a file called foo.txt, which consists of:
abc
zaa
asd
dess
zaa
abc
aaa
zaa
I want the output to be stored in another file as:
this text abc appears 2 times
this text zaa appears 3 times
I have tried the following command, but this just writes the duplicate entries and their counts.
sort foo.txt | uniq --count --repeated > sample.txt
Example of output of above command:
abc 2
zaa 3
How do I add the "this text ... appears x times" wording?
Awk is your friend:
sort foo.txt | uniq --count --repeated | awk '{print "this text "$2" appears "$1" times"}'
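If you would rather not sort at all, a single awk pass can produce the same report (just a sketch, assuming one word per line as in the sample; the for loop prints in no particular order, so pipe through sort if that matters):
awk '{ count[$0]++ } END { for (w in count) if (count[w] > 1) print "this text " w " appears " count[w] " times" }' foo.txt > sample.txt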

Merge multiple files by common field - Unix

I have hundreds of files, each with two columns:
For example :
file1.txt
ID Value1
1 40
2 30
3 70
file2.txt
ID Value2
1 50
2 70
3 20
And so on, till
file150.txt
ID Value150
1 98
2 52
3 71
How do I merge these files based on the first column (which is common to all of them)? My output should be:
ID Value1 Value2...........Value150
1 40 50 98
2 30 70 52
3 70 20 71
Thank you.
You can use a combination of cut and paste to merge three or more files. cd to the folder that contains only file1, file2, file3, ... file150:
i=0
cut -f 1 file1 > delim                      ## use the first column (the common IDs) as the base
for file in file*
do
    i=$(($i+1))                             ## counter so the temp files can be told apart from the originals
    cut -f 2 "$file" > "${file}__${i}.temp"
done
paste -d'\t' delim file*__*.temp > output
Another solution is to use join, merging two files at a time:
join -j 1 test1 test2 | join -j 1 test3 - | join -j 1 test4 -
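With hundreds of files, a one-pass awk merge is another option. This is only a sketch: it assumes tab- or space-separated columns, that every file contains the same IDs, and that file*.txt expands in the column order you want (lexicographic globbing puts file10.txt before file2.txt, so you may need to list the files explicitly or zero-pad the names):
awk '
{
    if ($1 in row) {
        row[$1] = row[$1] OFS $2     # append this file's value column
    } else {
        row[$1] = $2                 # first file seen: start the row
        order[++n] = $1              # remember the IDs in their original order
    }
}
END {
    for (i = 1; i <= n; i++)
        print order[i], row[order[i]]
}
' file*.txt > merged.txt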

Matching two files with awk codes

There are two files
first.file
M1
M2
M3
...
second.file
A1 M1
A2 M1
A2 M3
A3 M2
A3 M4
A3 M5
....
I want to match first.file against second.file. My result file should look like this:
result.file
A1 M1
A2 M1
A2 M3
A3 M2
How can I do that with awk?
Thanks in advance.
awk '
BEGIN { while (getline < "first.file") { file1[$0]=1 } }   # load the keys from first.file
$2 in file1 { print }                                      # print lines whose second field is one of those keys
' <second.file
Use the following:
grep -f firstfile secondfile
grep is enough.
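One caveat: plain grep -f treats each line of the pattern file as a regular expression and matches it anywhere on a line, so M1 would also match M10. If that could be a problem, adding -F (fixed strings) and -w (whole words) should be safer, e.g. with the question's file names:
grep -wFf first.file second.file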
Even though we can do this with awk too, I prefer grep.
If you still insist on awk, then I have a very simple solution in awk too:
awk 'FNR==NR{a[$0];next} $2 in a' first.file second.file
Explanation:
Put the first.file entries into an array. Then, while reading second.file, print every line whose second field is found in that array.

Comparing two files using awk or sed

I have two files...
Lookup is 1285 lines long:
cat Lookup.txt
abc
def
ghi
jkl
main is 4,838,869 lines long:
cat main.txt
abc, USA
pqr, UK
xyz, SA
I need to compare Lookup.txt and main.txt and then output the matching lines from main.txt to final.txt.
You don't need awk or sed here, just grep, assuming I am reading your requirements correctly:
% grep -f Lookup.txt main.txt > final.txt
% cat final.txt
abc, USA
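With 1285 patterns against roughly 4.8 million lines, plain grep -f can be slow because every pattern is treated as a regular expression. If the lookup entries are literal strings, telling grep so with -F usually speeds things up considerably:
% grep -Ff Lookup.txt main.txt > final.txt
And if the match should be limited to the first comma-separated field rather than anywhere on the line, an awk sketch of the same idea (the ", " separator is an assumption based on the sample data):
% awk -F', ' 'NR==FNR { keys[$0]; next } $1 in keys' Lookup.txt main.txt > final.txt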
