Comparing arrays in shell

I am writing a shell script, shown below, that takes a list of files provided by the user in a file, connects to an FTP server, and compares that list against the files on the server. The issue I am having is that when I call my diff function, the list returned contains the files unique to either array. I want only those that are in unique_array1 but not in unique_array2; in short, a list of the files in the user-provided list that are not on the FTP server. Please note that in the user-provided list, each line is a file name, separated by a newline character. My script is as below:
#!/bin/bash
SERVER=ftp://localhost
USER=anonymous
PASS=password
EXT=txt
FILELISTTOCHECK="ftpFileList.txt"
#create a list of files that is on the ftp server
listOfFiles=$(curl $SERVER --user $USER:$PASS 2> /dev/null | awk '{ print $9 }' | grep -E "*.$EXT$")
#read the list of files from the list provided##
#Eg:
# 1.txt
# 2.txt
# 3.txt
#################################################
listOfFilesToCheck=`cat $FILELISTTOCHECK`
unique_array1=$(echo $listOfFiles | sort -u)
unique_array2=$(echo $listOfFilesToCheck | sort -u)
diff(){
awk 'BEGIN{RS=ORS=" "}
{NR==FNR?a[$0]++:a[$0]--}
END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}
#Call the diff function above
Array3=($(diff unique_array1[@] unique_array2[@]))
#get what files are in listOfFiles but not in listOfFilesToCheck
echo ${Array3[@]}

Based on this, you may try the comm command:
Usage: comm [OPTION]... FILE1 FILE2
Compare sorted files FILE1 and FILE2 line by line.
With no options, produce three-column output. Column one contains
lines unique to FILE1, column two contains lines unique to FILE2,
and column three contains lines common to both files.
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
A test program:
#!/bin/bash
declare -a arr1
declare -a arr2
arr1[0]="this"
arr1[1]="is"
arr1[2]="a"
arr1[3]="test"
arr2[0]="test"
arr2[1]="is"
unique_array1=$(printf "%s\n" "${arr1[@]}" | sort -u)
unique_array2=$(printf "%s\n" "${arr2[@]}" | sort -u)
comm -23 <(printf "%s\n" "${unique_array1[@]}") <(printf "%s\n" "${unique_array2[@]}")
Output:
a
this
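
Applied back to the original question, the same comm -23 call yields the files in the user's list that are missing from the server. A minimal runnable sketch, where the two lists are hypothetical stand-ins for the curl output and ftpFileList.txt:

```shell
#!/bin/bash
# Hypothetical stand-ins for the two newline-separated lists
listOfFiles=$'1.txt\n3.txt'                 # what the FTP server has
listOfFilesToCheck=$'1.txt\n2.txt\n3.txt'   # what the user asked for
# Column 1 of comm holds lines unique to the first input, i.e. files the
# user listed that are NOT on the server; -23 suppresses the other columns
comm -23 <(sort -u <<< "$listOfFilesToCheck") <(sort -u <<< "$listOfFiles")
# prints: 2.txt
```

Unlike the awk-based diff, this gives the asymmetric difference, which is exactly what the question asks for.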

Related

Output results from cat into different files with names specified into an array

I would like to do cat on several files, whose names are stored in an array:
cat $input | grep -v "#" | cut -f 1,2,3
Here the content of the array:
echo $input
1.blastp 2.blastp 3.blastp 4.blastp 5.blastp 6.blastp 7.blastp 8.blastp 9.blastp 10.blastp 11.blastp 12.blastp 13.blastp 14.blastp 15.blastp 16.blastp 17.blastp 18.blastp 19.blastp 20.blastp
This works just nicely. Now I am struggling to store the results into the proper output files. So I also want to store the output into files whose names are stored in another array:
echo $out_in
1_pairs.tab 2_pairs.tab 3_pairs.tab 4_pairs.tab 5_pairs.tab 6_pairs.tab 7_pairs.tab 8_pairs.tab 9_pairs.tab 10_pairs.tab 11_pairs.tab 12_pairs.tab 13_pairs.tab 14_pairs.tab 15_pairs.tab 16_pairs.tab 17_pairs.tab 18_pairs.tab 19_pairs.tab 20_pairs.tab
cat $input | grep -v "#" | cut -f 1,2,3 > "$out_in"
My problem is:
When I don't use the "" I will get 'ambiguous redirect' error.
When I use them, a single file will be created that comes by the name:
1_pairs.tab?2_pairs.tab?3_pairs.tab?4_pairs.tab?5_pairs.tab?6_pairs.tab?7_pairs.tab?8_pairs.tab?9_pairs.tab?10_pairs.tab?11_pairs.tab?12_pairs.tab?13_pairs.tab?14_pairs.tab?15_pairs.tab?16_pairs.tab?17_pairs.tab?18_pairs.tab?19_pairs.tab?20_pairs.tab
I don't get why the input array is read with no problem but that's not the case for the output array...
any ideas?
Thanks a lot!
D.
You cannot redirect output that way: the output is a single stream of characters, and the redirection cannot know when to switch to the next file. You need a loop over the input files.
Assuming that the file names do not contain spaces:
for fn in $input; do
grep -v "#" "$fn" | cut -f 1,2,3 >"${fn%%.*}_pairs.tab"
done
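
If the file names might contain spaces, a bash array with quoted expansion avoids the word-splitting caveat. A self-contained sketch, where the sample file and its contents are made up for illustration:

```shell
#!/bin/bash
cd "$(mktemp -d)"                      # work in a scratch directory
# Made-up sample input: one comment line, one data line, tab-separated
printf '#hdr\ta\tb\tc\td\n1\t2\t3\t4\n' > 1.blastp
input=(1.blastp)                       # an array instead of a plain string
for fn in "${input[@]}"; do
    grep -v "#" "$fn" | cut -f 1,2,3 > "${fn%%.*}_pairs.tab"
done
cat 1_pairs.tab                        # comment line gone, first 3 fields kept
```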

How do I echo specific rows and columns from CSVs in a variable?

The below script:
#!/bin/bash
otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
FFF,3207,851,4058,21
GGG,30621,4595,35216,13
HHH,8450,1504,9954,15
III,4963,2157,7120,30
JJJ,51,59,110,54
KKK,87,123,210,59
LLL,573,144,717,20
MMM,617,1841,2458,75
NNN,234,76,310,25
OOO,12433,1908,14341,13
PPP,10627,1428,12055,12
QQQ,510,514,1024,50
RRR,1361,687,2048,34
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
"
IFS="," array1=(${otscurrent})
echo ${array1[4]}
Prints:
$ ./test.sh
12
BBB
I'm trying to get it to print just 12... and I am not even sure how to make it print just row 5, column 4.
The variable is an output of a sqlquery that has been parsed with several sed commands to change the formatting to csv.
otscurrent="$(sqlplus64 user/password@dbserverip/db as sysdba @query.sql |
sed '1,11d; /^-/d; s/[[:space:]]\{1,\}/,/g; $d' |
sed '$d'|sed '$d'|sed '$d' | sed '$d' |
sed 's/Used,MB/Used MB/g' |
sed 's/Free,MB/Free MB/g' |
sed 's/Total,MB/Total MB/g' |
sed 's/Pct.,Free/Pct. Free/g' |
sed '1b;/^Name/d' |
sed '/^$/d'
)"
Ultimately I would like to be able to call on a row and column and run statements on the values.
Initially i was piping that into :
awk -F "," 'NR>1{ if($5 < 10) { printf "%-30s%-10s%-10s%-10s%-10s\n", $1,$2,$3,$4,$5"%"; } else { echo "Nothing to do" } }')"
Which works, but I couldn't run commands from the if/else... or at least I didn't know how.
If you have bash 4.0 or newer, an associative array is an appropriate way to store data in this kind of form.
otscurrent=${otscurrent#$'\n'} # strip leading newline present in your sample data
declare -A data=( )
row=0
while IFS=, read -r -a line; do
for idx in "${!line[@]}"; do
data["$row,$idx"]=${line[$idx]}
done
(( row += 1 ))
done <<<"$otscurrent"
This lets you access each individual item:
echo "${data[0,0]}" # first field of first line
echo "${data[9,0]}" # first field of tenth line
echo "${data[9,1]}" # second field of tenth line
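
A compact, runnable check of that idea, using a two-line made-up stand-in for the CSV:

```shell
#!/bin/bash
# Hypothetical two-row sample in place of the full sqlplus output
csv=$'AAA,33854,4528,38382,12\nBBB,83917,12296,96213,13'
declare -A data=( )
row=0
while IFS=, read -r -a line; do
    for idx in "${!line[@]}"; do
        data["$row,$idx"]=${line[$idx]}   # key is "row,column"
    done
    (( row += 1 ))
done <<<"$csv"
echo "${data[1,0]}"   # second row, first field: BBB
echo "${data[0,4]}"   # first row, fifth field: 12
```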
"I'm trying to get it to just print 12..."
The issue is that IFS="," splits on commas and there is no comma between 12 and BBB. If you want those to be separate elements, add a newline to IFS. Thus, replace:
IFS="," array1=(${otscurrent})
With:
IFS=$',\n' array1=(${otscurrent})
Output:
$ bash test.sh
12
All you need to print the value of the 4th column on the 5th row is:
$ awk -F, 'NR==5{print $4}' <<< "$otscurrent"
3453
and just remember that in awk row (record) and column (field) numbers start at 1, not 0. Some more examples:
$ awk -F, 'NR==1{print $5}' <<< "$otscurrent"
12
$ awk -F, 'NR==2{print $1}' <<< "$otscurrent"
BBB
$ awk -F, '$5 > 50' <<< "$otscurrent"
JJJ,51,59,110,54
KKK,87,123,210,59
MMM,617,1841,2458,75
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
If you'd like to avoid all of the complexity and simply parse your SQL output to produce what you want, without 20 sed commands in between, post a new question showing the raw sqlplus output as the input and what you want as the final output. Someone will post a brief, clear, simple, efficient awk script that does it all in one step, or maybe 2 commands if you still want an intermediate CSV for some reason.

Convert output to arrays then extract values to loop

I want to grep the second and third columns from this output:
1 db1 ADM_DAT 300 yes 95.09
2 db2 SYSAUX 400 yes 94.52
and convert them like array for example:
outputres=("db1 ADM_DAT" "db2 SYSAUX")
and after that to be able to read those values in loop for example:
for i in "${outputres[@]}"; do read -r a b <<< "$i"; unix_command $(cat file|grep $a|awk '{print $1}') $a $b;done
file:
10.1.1.1 db1
10.1.1.2 db2
Final expectation:
unix_command 10.1.1.1 db1 ADM_DAT
unix_command 10.1.1.2 db2 SYSAUX
This is only a theoretical example; I am not sure if it works.
I would use a simple bash while read and keep adding elements into the array with the += syntax:
outputres=()
while read -r _ a b _; do
outputres+=("$a $b")
done < file
Doing so, with your input file, I got:
$ echo "${outputres[@]}" #print all elements
db1 ADM_DAT db2 SYSAUX
$ echo "${outputres[0]}" #print first one
db1 ADM_DAT
$ echo "${outputres[1]}" #print second one
db2 SYSAUX
Since you want to use both values separately, it may be better to use an associative array:
$ declare -A array=()
$ while read -r _ a b _; do array[$a]=$b; done < file
And then you can loop through the values with:
$ for key in ${!array[@]}; do echo "array[$key] = ${array[$key]}"; done
array[db2] = SYSAUX
array[db1] = ADM_DAT
See a basic example of utilization of these arrays:
#!/bin/bash
declare -A array=([key1]='value1' [key2]='value2')
for key in ${!array[@]}; do
echo "array[$key] = ${array[$key]}"
done
echo ${array[key1]}
echo ${array[key2]}
So maybe this can solve your problem: loop through the columns file, fetch the 2nd and 3rd fields, and use them twice: first $a to perform a grep in file, and then both as parameters to cmd_command:
while read -r _ a b _
do
echo "cmd_command $(awk -v patt="$a" '$0~patt {print $1}' file) $a, $b"
done < columns_file
For a sample file file:
$ cat file
hello this is db1
and this is another db2
I got this output (note I am just echoing):
$ while read -r _ a b _; do echo "cmd_command $(awk -v patt="$a" '$0~patt {print $1}' file) $a, $b"; done < columns_file
cmd_command hello db1, ADM_DAT
cmd_command and db2, SYSAUX

Store grep output in an array

I need to search a pattern in a directory and save the names of the files which contain it in an array.
Searching for pattern:
grep -HR "pattern" . | cut -d: -f1
This prints all the filenames that contain "pattern".
If I try:
targets=$(grep -HR "pattern" . | cut -d: -f1)
length=${#targets[#]}
for ((i = 0; i != length; i++)); do
echo "target $i: '${targets[i]}'"
done
This prints only one element, containing a single string with all the filenames.
output: target 0: 'file0 file1 .. fileN'
But I need:
output: target 0: 'file0'
output: target 1: 'file1'
.....
output: target N: 'fileN'
How can I achieve the result without doing a boring split operation on targets?
You can use:
targets=($(grep -HRl "pattern" .))
Note use of (...) for array creation in BASH.
Also you can use grep -l to get only file names in grep's output (as shown in my command).
The above answer (written 7 years ago) assumed that the output filenames won't contain special characters like whitespace or globs. Here is a safe way to read such filenames into an array (it works with older bash versions too):
while IFS= read -rd ''; do
targets+=("$REPLY")
done < <(grep --null -HRl "pattern" .)
# check content of array
declare -p targets
On bash 4.4+ you can use readarray instead of a loop:
readarray -d '' -t targets < <(grep --null -HRl "pattern" .)
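
A quick way to see that the null-delimited variant survives awkward names; the files created here are hypothetical, including one with a space in its name:

```shell
#!/bin/bash
cd "$(mktemp -d)"                          # scratch directory
printf 'pattern here\n' > 'file one.txt'   # name containing a space
printf 'pattern too\n'  > 'file2.txt'
printf 'no match\n'     > 'file3.txt'
# grep --null (-Z) separates filenames with NUL; readarray -d '' splits on NUL
readarray -d '' -t targets < <(grep --null -HRl "pattern" .)
printf 'found %d files\n' "${#targets[@]}"   # 2, despite the space
printf '%s\n' "${targets[@]}"
```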

Grep rows from reference file while keeping the source column?

I have two tables. Table 1 has multiple columns and table 2 has one column. My question is: how can I extract rows from table 1 based on the values in table 2? I guess a simple grep should work, but how can I do a grep on each row? I would like the output to retain the table 2 identifier that matched.
Thanks!
Desired Output:
IPI00004233 IPI00514755;IPI00004233;IPI00106646; Q9BRK5-1;Q9BRK5-2;
IPI00001849 IPI00420049;IPI00001849; Q5SV97-1;Q5SV97-2;
...
......
Table 1:
IPI00436567; Q6VEP3;
IPI00169105;IPI01010102; Q8NH21;
IPI00465263; Q6IEY1;
IPI00465263; Q6IEY1;
IPI00478224; A6NHI5;
IPI00853584;IPI00000733;IPI00166122; Q96NU1-1;Q96NU1-2;
IPI00411886;IPI00921079;IPI00385785; Q9Y3T9;
IPI01010975;IPI00418437;IPI01013997;IPI00329191; Q6TDP4;
IPI00644132;IPI00844469;IPI00030240; Q494U1-1;Q494U1-2;
IPI00420049;IPI00001849; Q5SV97-1;Q5SV97-2;
IPI00966381;IPI00917954;IPI00028151; Q9HCC6;
IPI00375631; P05161;
IPI00374563;IPI00514026;IPI00976820; O00468;
IPI00908418; E7ERA6;
IPI00062955;IPI00002821;IPI00909677; Q96HA4-1;Q96HA4-2;
IPI00641937;IPI00790556;IPI00889194; Q6ZVT0-1;Q6ZVT0-2;Q6ZVT0-3;
IPI00001796;IPI00375404;IPI00217555; Q9Y5U5-1;Q9Y5U5-2;Q9Y5U5-3;
IPI00515079;IPI00018859; P43489;
IPI00514755;IPI00004233;IPI00106646; Q9BRK5-1;Q9BRK5-2;
IPI00064848; Q96L58;
IPI00373976; Q5T7M4;
IPI00375728;IPI86;IPI00383350; Q8N2K1-1;Q8N2K1-2;
IPI01022053;IPI00514605;IPI00514599; P51172-1;P51172-2;
Table 2:
IPI00000207
IPI00000728
IPI00000733
IPI00000846
IPI00000893
IPI00001849
IPI00002214
IPI00002335
IPI00002349
IPI00002821
IPI00003362
IPI00003419
IPI00003865
IPI00004233
IPI00004399
IPI00004795
IPI00004977
You cannot make grep prepend the needle, so there is no way to use -f file2 directly.
Use a loop and prepend manually:
while read -r token; do grep "$token" file1 | xargs -I{} echo "$token" {}; done <file2
Alternatively, you could store both the results of grep and grep -o and paste them:
grep -f 2.txt 1.txt >a
grep -of 2.txt 1.txt >b
paste b a
If you're also fine with using awk, try this:
awk 'FNR==NR { a[$0];next } { for (x in a) if ($0 ~ x) print x, $0 }' 2.txt 1.txt
Explanation: For the first file (as long as FNR==NR), store all needles into array a ({ a[$0];next }). Then (implicitly) loop over all lines of the second file, loop again over all needles and print needle and line if found.
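
To see the idiom in action, here is a sketch on a tiny made-up pair of files (two needles, one of which matches):

```shell
#!/bin/bash
cd "$(mktemp -d)"                               # scratch directory
printf 'IPI00001849\nIPI00009999\n' > 2.txt     # needles (table 2)
printf 'IPI00420049;IPI00001849; Q5SV97-1;\nIPI00436567; Q6VEP3;\n' > 1.txt
# While FNR==NR we are in 2.txt: store each needle as an array key.
# For 1.txt, print every (needle, line) pair where the line matches.
awk 'FNR==NR { a[$0];next } { for (x in a) if ($0 ~ x) print x, $0 }' 2.txt 1.txt
# prints the matching needle followed by its row from 1.txt
```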
