Matching two files with awk codes

There are two files:
first.file
M1
M2
M3
...
second.file
A1 M1
A2 M1
A2 M3
A3 M2
A3 M4
A3 M5
....
I want to match first.file against second.file. My result file should look like this:
result.file
A1 M1
A2 M1
A2 M3
A3 M2
How can I do that with awk?
Thank you in advance.

awk '
BEGIN { while ((getline line < "first.file") > 0) file1[line] = 1 }  # stop on EOF (0) or error (-1)
$2 in file1 { print }
' < second.file

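Use the following:
grep -wFf first.file second.file
grep is enough here: -f reads the patterns from first.file, -F treats them as fixed strings, and -w makes M1 match only as a whole word (so not M10). Note that it still matches the word anywhere on the line, not only in column 2.
Even though we can do this with awk too, I prefer grep. If you still insist on awk, here is a very simple awk solution as well:
awk 'FNR==NR{a[$0];next} ($2 in a)' first.file second.file
Explanation:
While reading the first file (FNR==NR is only true there), store each line as a key of array a. Then iterate over second.file and print each line whose second field is one of those keys.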
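For comparison, the same lookup-table idea (load the keys from first.file, then test column 2 of second.file) can be sketched in pure bash. This is a minimal sketch assuming bash 4 or later for associative arrays; the function name match_second_column is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of file $2 whose second
# whitespace-separated field appears as a whole line in file $1.
# Requires bash >= 4 for associative arrays (local -A).
match_second_column() {
    local -A keys=()
    local line first key rest

    # Pass 1: remember every non-empty line of the first file as a key.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -n $line ]] && keys["$line"]=1
    done < "$1"

    # Pass 2: print a line of the second file if its field 2 is a known key.
    while IFS= read -r line || [[ -n $line ]]; do
        read -r first key rest <<< "$line"
        [[ -n $key ]] || continue
        if [[ -n ${keys["$key"]+set} ]]; then
            printf '%s\n' "$line"
        fi
    done < "$2"
}
```

Calling match_second_column first.file second.file > result.file should produce the result.file from the question. Like the awk version, it compares column 2 exactly, so M1 does not match M10; for large files, awk is usually much faster than a bash read loop.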

Related

Edit a string in shell script and display it as an array

Input:
1234-A1;1235-A2;2345-B1;5678-C2;2346-D5
Expected Output:
1234
1235
2345
5678
2346
The input shown is user input. I want to store it in an array and do some operations on it to display it as shown in 'Expected Output'.
I have done it in Perl, but want to achieve it in a shell script. Please help.

To split the input text into an array you can use this technique:
IFS=';-' read -r -a arr <<< "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
printf '%s\n' "${arr[@]}"
1234
A1
1235
A2
2345
B1
5678
C2
2346
D5
If you want to keep only 1234, 1235, etc., as in your expected output, you can either use the corresponding array elements (indices 0, 2, 4, ...) or do something like this:
a="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
IFS=';' read -r -a arr <<< "${a//-[A-Z][0-9]/}" # or, for any two-character suffix: <<< "${a//-??/}"
declare -p arr #This asks bash to print the array for us
# Output:
declare -a arr='([0]="1234" [1]="1235" [2]="2345" [3]="5678" [4]="2346")'
# The array can now be printed or used elsewhere in your script. Array indices start at zero.
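The ${a//-??/} pattern assumes every suffix is exactly two characters long. A sketch that instead keeps whatever precedes the first - in each ;-separated item, using read -a and the ${var%%-*} expansion, could look like this; it assumes bash 3.1 or later (for +=), and the function name ids_from is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "1234-A1;1235-A2;..." on ';' and print the
# part of each item before the first '-', one per line.
ids_from() {
    local -a items ids=()
    local item
    # Split the input on ';' only (IFS is changed just for this read).
    IFS=';' read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        # Remove the longest suffix matching "-*", e.g. "-A1" or "-B12".
        ids+=("${item%%-*}")
    done
    printf '%s\n' "${ids[@]}"
}
```

For example, ids_from "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5" prints the five numbers from the expected output, one per line, and a longer suffix such as -B12 is stripped as well.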

@Yash: try:
echo "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5" | awk '{gsub(/-[[:alnum:]]+/,"");gsub(/;/,RS);print}'
This first deletes every - followed by letters and digits (replacing it with an empty string), then replaces every semicolon with RS (the record separator), which is a newline by default.

Thanks @George and @Vipin.
Based on your input, the solution that best suits my environment is as follows:
i=0
a="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
IFS=';' read -r -a arr <<< "${a//-??/}"
#declare -p arr
for var in "${arr[@]}"
do
    echo "var $((i++)) is : $var"
done
Output:
var 0 is : 1234
var 1 is : 1235
var 2 is : 2345
var 3 is : 5678
var 4 is : 2346

Try this:
awk -F'[-;]' '{for(i=1;i<=NF;i++) if(i%2!=0) {print $i}}' f
1234
1235
2345
5678
2346
OR
echo "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"|tr ';' '\n'|cut -d'-' -f1
OR
As @George Vasiliou suggested:
awk -F'[-;]' '{for(i=1;i<=NF;i+=2) {print $i}}' f
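If the data needs to be stored in an array and you are using gawk, try this:
awk -F'[;-]' -v k=1 '{for(i=1;i<=NF;i++) if($i !~ /[[:alpha:]]/) {a[k++]=$i}} END {
  PROCINFO["sorted_in"] = "@ind_str_asc"
  for(k in a) print k,a[k]}' f
1 1234
2 1235
3 2345
4 5678
5 2346
PROCINFO["sorted_in"] = "@ind_str_asc" makes gawk's for (k in a) loop visit the indices in ascending string order, which prints the data in sorted order here. With ten or more entries, string order puts 10 before 2, so "@ind_num_asc" is the safer choice there.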

grep - blacklisting using a file

I'm new to grep. I have two kinds of text files: blacklist.txt, and many foo.txt files in different directories.
I started off using:
grep -vE "(insert|blacklist|items|here)" foo.txt > filtered_foo.txt
but my blacklist has grown very large, so I need to compare the two files instead.
foo.txt has four columns, and columns 1, 2 and 3 are unique. I want to delete rows where column 4 matches a string in my blacklist.
Sample foo.txt:
A1 A2 A3 Bob
B1 B2 B3 Anne
C1 C2 C3 Henry
D1 D2 D3 Ted
blacklist.txt
Anne
Ted
Desired output: filtered_foo.txt
A1 A2 A3 Bob
C1 C2 C3 Henry
I have tried different things in grep such as:
grep -vF "'cat blacklist.txt'" foo.txt > filtered_foo.txt

Use the -f option to get the patterns from a file.
grep -vF -f blacklist.txt foo.txt > filtered_foo.txt
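Note that grep -vF -f drops a row if a blacklisted string appears anywhere in it, so Ted would also remove a row whose name is Teddy, or one whose first three columns contain Ted. If only an exact match on column 4 should count, a sketch using awk (called from bash) could look like the following. The function names filter_blacklist and filter_all, and writing filtered_foo.txt next to each foo.txt, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the rows of file $2 whose 4th column is NOT
# an exact line of the blacklist file $1.
filter_blacklist() {
    awk '
        FILENAME == ARGV[1] { bad[$0]; next }  # first file: load the blacklist
        !($4 in bad)                           # second file: keep the other rows
    ' "$1" "$2"
}

# Hypothetical batch run: for each foo.txt below the current directory,
# write filtered_foo.txt into the same directory.
filter_all() {
    local blacklist=$1 f
    find . -name foo.txt -print | while IFS= read -r f; do
        filter_blacklist "$blacklist" "$f" > "${f%/*}/filtered_foo.txt"
    done
}
```

Testing FILENAME == ARGV[1] instead of the common NR == FNR idiom keeps the filter correct even when blacklist.txt is empty.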

passing array as parameter in bash 3

I've referred to "Passing arrays as parameters in bash", but it failed. Here is my test script (bash 3.0):
The bash version:
GNU bash, version 3.00.16(1)-release (sparc-sun-solaris2.10)
Copyright (C) 2004 Free Software Foundation, Inc.
The script t.sh:
fn() {
local i
local v1="$1"
local v2="$2"
local v3="$3"
echo "v1=$1"
echo "v2=$2"
echo "v3=$3"
declare -a a1=("${!1}")
declare -a a2=("${!2}")
echo "a1:"
for i in ${!a1[*]} ; do
echo " ${a1[$i]}"
done
echo "a2:"
for i in ${!a2[*]} ; do
echo " ${a2[$i]}"
done
}
caller() {
local a=("a1 a2" "a3" "a4")
local b=("b1" "b2" "b3" "b4")
echo "method 1:"
fn "${a[@]}" "${b[@]}" $1 $2 $3
echo "method 2:" # workable on bash 4.2.45
fn a[@] b[@] $1 $2 $3
}
caller c
Output:
method 1:
v1=(a1 a2 a3 a4)
v2=(b1 b2 b3 b4)
v3=c
a1:
a2:
method 2:
v1=a[@]
v2=b[@]
v3=c
t.sh: array assign: line 10: syntax error near unexpected token `('
t.sh: array assign: line 10: `(a1 a2 a3 a4)'
Expected output:
...
a1:
a1 a2
a3
a4
a2:
b1
b2
b3
b4

I'm not sure where you are having the issue, but I've confirmed that method 2 works on Bash 3.2.25:
./caller.sh
method 1:
v1=a1 a2
v2=a3
v3=a4
a1:
a2:
method 2:
v1=a[@]
v2=b[@]
v3=c
a1:
a1 a2
a3
a4
a2:
b1
b2
b3
b4
bash --version
GNU bash, version 3.2.25(1)-release (i586-suse-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.
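Method 1 cannot work as written: "${a[@]}" expands to the separate words a1 a2, a3 and a4, so inside fn the value of $1 is a1 a2 rather than an array name, and ${!1} has no array to point at. Method 2 passes the string a[@], and ${!1} then expands to the elements of a. A minimal sketch of that by-name pattern follows; the function name print_named_array is hypothetical, and declaring the local array before assigning to it (instead of declare -a a1=("${!1}")) is an assumption aimed at the bash 3.0 "array assign" error above, not something verified on 3.0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each element of the array whose NAME is $1.
# Call it as: print_named_array a   (not with "${a[@]}").
# NAME must not be one of this function's own locals (ref, copy, item).
print_named_array() {
    local ref="$1[@]"   # builds the string "a[@]" for indirect expansion
    local -a copy       # declare the array first ...
    copy=("${!ref}")    # ... then fill it from the caller's array
    local item
    for item in "${copy[@]}"; do
        printf ' %s\n' "$item"
    done
}
```

Called inside caller, print_named_array a should print a1 a2, a3 and a4 on separate lines, because bash's dynamic scoping lets the called function see the caller's local arrays.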

Print all subexpressions of a logical expression by omitting one condition each time

Using Perl, I would like to print all subexpressions that I can get by omitting exactly one of the conditions in the main expression.
So if this is the input: C1 and C2 and C3 and C4
This should be the output (the order also matters; I want to omit the first element, then the second, etc.):
C2 and C3 and C4 (first element missing)
C1 and C3 and C4 (second element missing)
C1 and C2 and C4 (third element missing)
C1 and C2 and C3 (fourth element missing)
Note that my expressions only use AND as conjunctions. I know how to split the original expression into conditions:
my @CONDITIONS = split( / and /, $line );
I also know I could do what I want using two nested loops and some if/else to handle the conjunction placement, but I'm quite sure a more elegant Perl solution is out there, and for the life of me I cannot figure it out on my own. Basically, what I'm asking is whether there is a way to join an array without the i-th element.

I like your problem. Based on your expected output, here is my solution:
my $string = "C1 and C2 and C3 and C4";
my @split = split / and /, $string;
for my $counter (0..$#split) {
    # keep every part that is not equal to the one at the current index
    print join ' and ', grep { $_ ne $split[$counter] } @split;
    print "\n";
}
Explanation:
The magic here is the grep, which keeps only the entries of @split that are not equal to the entry at the current loop index. Using a plain string comparison (ne) rather than a regex match means C1 does not also remove C10. For example, we start at index 0:
# $counter == 0
# $split[$counter] contains C1
# grep goes through @split and keeps only the parts
# that are not equal to C1, the value in $split[$counter]
#
# the next iteration sets $counter to 1
# $split[$counter] now contains C2, and
# grep again keeps only the parts of @split that are not equal to C2
# that way, we take every part of @split except the one at the current
# loop position :)
EDIT:
Note that this does not work with strings that contain duplicate entries:
my $string = "C1 and C2 and C3 and C4 and C4";
Output:
C2 and C3 and C4 and C4
C1 and C3 and C4 and C4
C1 and C2 and C4 and C4
C1 and C2 and C3
C1 and C2 and C3
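The duplicate problem disappears if the element is skipped by its index rather than by its value; in Perl that would be a slice such as @split[grep { $_ != $counter } 0..$#split]. A minimal sketch of the same index-based idea in bash, the language of the other examples on this page, follows; the function name omit_each is hypothetical, and bash 3.1 or later is assumed (for +=).

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each condition in an "X and Y and Z" string,
# print the expression with only that condition (by position) removed.
omit_each() {
    local nl=$'\n' c i j out
    local -a conds=()
    # Turn each " and " into a newline, then read one condition per line.
    while IFS= read -r c; do
        conds+=("$c")
    done <<< "${1// and /$nl}"
    for (( i = 0; i < ${#conds[@]}; i++ )); do
        out=""
        for (( j = 0; j < ${#conds[@]}; j++ )); do
            (( j == i )) && continue      # skip by index, not by value
            out+="${out:+ and }${conds[j]}"
        done
        printf '%s\n' "$out"
    done
}
```

With C1 and C2 and C3 and C4 and C4 as input, the last two lines are now both C1 and C2 and C3 and C4, which is what omitting one condition at a time should give.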

Parallel iteration over lists in makefile or CMake file

Is there a way to loop over multiple lists in parallel in a makefile or CMake file?
I would like to do something like the following in CMake, except AFAICT this syntax isn't supported:
set(a_values a0 a1 a2)
set(b_values b0 b1 b2)
foreach(a in a_values b in b_values)
do_something_with(a b)
endforeach(a b)
This would execute:
do_something_with(a0 b0)
do_something_with(a1 b1)
do_something_with(a2 b2)
I would accept an answer in either CMake or Make, though CMake would be preferred. Thanks!

Here you go:
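set(list1 1 2 3 4 5)
set(list2 6 7 8 9 0)
# Assumes both lists are non-empty and have the same length
list(LENGTH list1 len1)
# foreach(... RANGE stop) includes stop, so iterate from 0 to len1 - 1
math(EXPR len2 "${len1} - 1")
foreach(val RANGE ${len2})
  # Fetch the element at the same index from each list
  list(GET list1 ${val} val1)
  list(GET list2 ${val} val2)
  message(STATUS "${val1} ${val2}")
endforeach()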

As of CMake 3.17, the foreach() loop supports a ZIP_LISTS option to iterate through two (or more) lists simultaneously:
set(a_values a0 a1 a2)
set(b_values b0 b1 b2)
foreach(a b IN ZIP_LISTS a_values b_values)
message("${a} ${b}")
endforeach()
This prints:
a0 b0
a1 b1
a2 b2

In make you can use the GNUmake table toolkit to achieve this by handling the two lists as 1-column tables:
include gmtt/gmtt.mk
# declare the lists as tables with one column
list_a := 1 a0 a1 a2 a3 a4 a5
list_b := 1 b0 b1 b2
# note: list_b is shorter and will be filled up with a default value
joined_list := $(call join-tbl,$(list_a),$(list_b), /*nil*/)
$(info $(joined_list))
# Apply a function (simply output "<tuple(,)>") on each table row, i.e. tuple
$(info $(call map-tbl,$(joined_list),<tuple($$1,$$2)>))
Output:
2 a0 b0 a1 b1 a2 b2 a3 /*nil*/ a4 /*nil*/ a5 /*nil*/
<tuple(a0,b0)><tuple(a1,b1)><tuple(a2,b2)><tuple(a3,/*nil*/)><tuple(a4,/*nil*/)><tuple(a5,/*nil*/)>
