I have two arrays and am trying to find matching values using comm. Array1 contains some additional information in each element that I strip out for the comparison. However, I would like to keep that information after the comparison is complete.
For example:
Array1=("abc",123,"hello" "def",456,"world")
Array2=("abc")
declare -a Array1
declare -a Array2
I then compare the two arrays:
oldIFS=$IFS IFS=$'\n\t'
array3=($(comm -12 <(echo "${Array1[*]}" | awk -F "," {'print $1'} | sort) <(echo "${Array2[*]}" | sort)))
IFS=$oldIFS
Which finds the match of abc:
echo ${array3[0]}
abc
However, what I want is the whole matching element from Array1, including the remaining values that were not part of my comm statement:
abc,123,hello
EDIT: For more clarification
The arrays in this example are populated with dummy data.
My real example pulls information from server logs, which I save into array1. array1 contains (userIDs,hostIPs,count), which I want to cross-reference against a list of userIDs (array2). My goal is to find out which userIDs exist in both array1 and array2, and to save those IDs together with the additional information from array1 (hostIPs,count) into array3.
array1 is populated from a variable that holds the results of a curl command that runs a Splunk search. The data returned looks like this:
"uniqueID=<ID>","<IP>","<hostname>",1
I save the results of the Splunk report as $splunk and then declare array1 with the contents of $splunk minus the header line, since the results come back in CSV format:
array1=( $(echo $splunk | sed 's/ /\n/g' | sed 1d) )
array2 is generated from a master file that I have stored locally, which contains all the application IDs in our ecosystem. For example:
uid=<ID>
I cat the contents of the master file into array2
array2=( $(cat master.txt) )
I then want to find which IDs from array1 exist in array2 and save them as array3. This requires some massaging of the data in array1 to make it match the format of array2.
oldIFS=$IFS IFS=$'\n\t'
array3=($(comm -12 <(echo "${array1[*]}" | sed 's/ /\n/g' | awk -F "\"," {'print $1'} | sed 's/\"//g' | sed 's/|/ /g' | awk -F$'=' -v OFS=$'=' '{ $1 = "uid" }1' | grep -i "OU=People" | sed 's/OU/ou/g' | sort) <(echo "${array2[*]}" | sort)))
IFS=$oldIFS
array3 will then contain the lines that match in both arrays:
uid=<ID>
uid=<ID>
However, I am looking for something more along the lines of:
"uid=<ID>","<IP>","<hostname>",1
"uid=<ID>","<IP>","<hostname>",1
I would do it like this:
join -t, \
<(printf '%s\n' "${Array1[@]}" | sort -t, -k1,1) \
<(printf '%s\n' "${Array2[@]}" | sort)
Use the join command with , as the field delimiter. The first "file" is the first array, one element per line, sorted on the first field (comma delimited); the second "file" is the second array, one element per line, sorted.
The output will be every line where the first element of the first file matches the element from the second file; for the example input it's
abc,123,hello
This makes only one assumption, namely that no array element contains a newline. To make it more robust (assuming GNU Coreutils), we can use NUL as the delimiter:
join -z -t, \
<(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
<(printf '%s\0' "${Array2[@]}" | sort -z)
This prints the output separated by NUL as well; to read the result into an array, we can use readarray:
readarray -d '' -t Array3 < <(
join -z -t, \
<(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
<(printf '%s\0' "${Array2[@]}" | sort -z)
)
readarray -d requires Bash 4.4 or newer. For older Bash, you can use a loop:
while IFS= read -r -d '' element; do
Array3+=("$element")
done < <(
join -z -t, \
<(printf '%s\0' "${Array1[@]}" | sort -z -t, -k1,1) \
<(printf '%s\0' "${Array2[@]}" | sort -z)
)
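If sorting the input is not wanted, the same first-field lookup can probably also be done in pure Bash with an associative array. The following is only a sketch under a few assumptions: the helper name match_by_first_field is made up for illustration, the key is taken to be everything before the first comma, and namerefs require Bash 4.3 or newer.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the answer above): print every element of
# the first array whose first comma-separated field occurs in the second array.
# Usage: match_by_first_field NAME_OF_FULL_ARRAY NAME_OF_KEY_ARRAY
match_by_first_field() {
  local -n _full=$1 _keys=$2   # namerefs to the caller's arrays (Bash 4.3+)
  local -A wanted=()           # set of keys to look for
  local key element

  for key in "${_keys[@]}"; do
    [[ -n $key ]] || continue  # an empty string is not a valid subscript
    wanted[$key]=1
  done

  for element in "${_full[@]}"; do
    key=${element%%,*}         # strip everything from the first comma on
    [[ -n $key ]] || continue
    if [[ -n ${wanted[$key]+set} ]]; then
      printf '%s\n' "$element"
    fi
  done
}

Array1=("abc,123,hello" "def,456,world")
Array2=("abc")

readarray -t Array3 < <(match_by_first_field Array1 Array2)
printf '%s\n' "${Array3[@]}"   # prints: abc,123,hello
```

Unlike the join pipeline, this keeps the elements in their original order, but it still breaks on elements that contain a newline because the result is read back line by line.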
I don't know how to do this with comm, but I do have a solution for you with sed and grep. The following commands match on the regex uid=X followed by a comma, where the second value is either a string of the form uid=x or an array of the form (uid=x uid=y).
# Array 2 (B) is a string
$ A=("uid=1,10.10.10.1,server1,1" "uid=2,10.10.10.2,server2,1")
$ B="uid=1"
$ echo ${A[@]} | grep -oE "([^ ]*${B},[^ ]*)"
uid=1,10.10.10.1,server1,1
# Array 2 (D) is an array
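$ C=(${A[@]} "uid=3,10.10.10.3,server3,1" "uid=4,10.10.10.4,server4,1")
$ D=(${B} "uid=3")
# The comma follows the group so that every alternative, including the last one, must be followed by a comma
$ echo ${C[*]} | grep -oE "([^ ]*($(echo ${D[@]} | sed 's/ /|/g')),[^ ]*)"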
uid=1,10.10.10.1,server1,1
uid=3,10.10.10.3,server3,1
# Content of arrays
$ echo ${A[@]}
uid=1,10.10.10.1,server1,1 uid=2,10.10.10.2,server2,1
$ echo ${B}
uid=1
$ echo ${C[@]}
uid=1,10.10.10.1,server1,1 uid=2,10.10.10.2,server2,1 uid=3,10.10.10.3,server3,1 uid=4,10.10.10.4,server4,1
$ echo ${D[@]}
uid=1 uid=3
uid=1 uid=3
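A variant of the same grep idea, shown here only as a sketch: it feeds grep one element per line and anchors the pattern at the start of the line, so a key such as uid=3 cannot match uid=30 or a value elsewhere in the record. The function name filter_by_key and the uid=30 record are made up for illustration, and the keys are assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read records on stdin (one per line) and print those
# whose first comma-separated field equals one of the keys given as arguments.
# Assumption: the keys contain no regex metacharacters.
filter_by_key() {
  (( $# > 0 )) || return 0   # no keys given: print nothing
  local IFS='|'              # "$*" joins the arguments with the first IFS character
  grep -E "^($*),"
}

C=("uid=1,10.10.10.1,server1,1" "uid=2,10.10.10.2,server2,1"
   "uid=3,10.10.10.3,server3,1" "uid=30,10.10.10.30,server30,1")
D=("uid=1" "uid=3")

# Prints the uid=1 and uid=3 records, but not uid=30
printf '%s\n' "${C[@]}" | filter_by_key "${D[@]}"
```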
I have a React application that uses a .sh file to generate part of the application. However, I have come across an error that prevents the application from reading the data files.
I've looked at versioning and at possible syntax changes in the .sh file, which is emap.sh.
This is the .sh file
#!/bin/bash
object='{}'
find 'public/data' -type f -name '*.csv' | while read -r filename;
do
filename=$(echo "$filename" | sed 's/public\/data\///' | sed 's/ /\//')
city=$(echo "$filename" | cut -d'/' -f1)
medium=$(echo "$filename" | cut -d'/' -f2)
direction=$(echo "$filename" | cut -d'/' -f3)
date=$(echo "$filename" | cut -d'/' -f4)
time=$(echo "$filename" | cut -d'/' -f5| head -c-5)
object=$(echo $object | jq -c ".$city.$medium.$direction[\"$date\"] |= . + [\"$time\"]")
echo $object > public/available.json
done
The expected result is that when we run the application with yarn, it shows the data it has read. Instead, we get a page with no data in the area that depends on the processed information.
The shell variables should be passed to jq in a more robust manner, e.g. along these lines:
jq -c --arg city "$city" \
--arg medium "$medium" \
--arg direction "$direction" \
--arg date "$date" \
--arg time "$time" \
'.[$city][$medium][$direction][$date] += [$time]'
As @OguzIsmail points out, though, you would probably be better off avoiding all the messiness by doing everything with just find and jq.
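For illustration, here is one way the loop from the question might look with the --arg form of the jq call. It is only a sketch: the helper names split_data_path and build_index are made up, the path layout public/data/<city>/<medium>/<direction>/<date> <time>.csv is inferred from the sed and cut calls in the question, and the path components are assumed to contain no slashes or newlines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split a path of the assumed form
#   public/data/<city>/<medium>/<direction>/<date> <time>.csv
# into the global variables city, medium, direction, date and time,
# mirroring the sed/cut calls of the original script.
split_data_path() {
  local rel=${1#public/data/}   # drop the leading directory
  rel=${rel/ //}                # first space becomes a slash, like sed 's/ /\//'
  rel=${rel%.csv}               # drop the extension
  IFS=/ read -r city medium direction date time <<< "$rel"
}

# Hypothetical helper: read file names on stdin and print the JSON index.
build_index() {
  local object='{}' path
  while IFS= read -r path; do
    split_data_path "$path"
    # jq reads the current object from the here-string, not from the loop's stdin
    object=$(jq -c --arg city "$city" \
                   --arg medium "$medium" \
                   --arg direction "$direction" \
                   --arg date "$date" \
                   --arg time "$time" \
                   '.[$city][$medium][$direction][$date] += [$time]' <<< "$object")
  done
  printf '%s\n' "$object"
}

# Possible usage, with the paths from the question:
#   find public/data -type f -name '*.csv' | build_index > public/available.json
```

Writing the file once after the loop, instead of on every iteration, also avoids leaving a half-built available.json behind if jq fails partway through.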
I have a script (let's call it the original) that parses a file (~1000 lines) line by line and generates arguments to execute a C++ program.
#!/bin/bash
# Counter for the output file name; kept separate from the field variable i below
count=0
while IFS='' read -r line || [[ -n "$line" ]]; do
a=$(echo "$line" | cut -c1-2)
b=$(echo "$line" | cut -c3-4)
c=$(echo "$line" | cut -c5-6)
d=$(echo "$line" | cut -c7-8)
e=$(echo "$line" | cut -c9-10)
f=$(echo "$line" | cut -c11-12)
g=$(echo "$line" | cut -c13-14)
h=$(echo "$line" | cut -c15-16)
i=$(echo "$line" | cut -c17-18)
j=$(echo "$line" | cut -c19-20)
k=$(echo "$line" | cut -c21-22)
l=$(echo "$line" | cut -c23-24)
m=$(echo "$line" | cut -c25-26)
n=$(echo "$line" | cut -c27-28)
o=$(echo "$line" | cut -c29-30)
p=$(echo "$line" | cut -c31-32)
./a.out "$a" "$b" "$c" "$d" "$e" "$f" "$g" "$h" "$i" "$j" "$k" "$l" "$m" "$n" "$o" "$p" > "some-folder/output_${count}.txt"
count=$((count + 1))
done < test_10.txt
I want to schedule this job in a batch, so that each run is queued and run on a separate core.
I checked the PBS and qsub writing styles. I could write a PBS file (a simple one, without all the options for now; let's call it callPBS.PBS):
#!/bin/bash
cd $PBS_O_WORKDIR
qsub ./a.out -F "my arguments"
exit 0
I can call this file instead of ./a.out in the original script, but how do I pass "my arguments"? The problem is that they are not fixed.
Secondly, I know qsub takes -o as an option for the output file. But I want the output file name to change for each run. I can pass that as an argument again, but how?
Can I do this in my original script:
callPBS.pbs > $(echo some-folder/output_${i}.txt)
I am sorry if I am missing something here. I am trying to use all that I know!
In a bash script, I define an array:
array=$(awk '{print $4}' /var/log/httpd/sample | uniq -c | cut -d[ -f1)
Now I want to express the following in the script:
if there is no element in the array (that is, the array is empty), then echo "nothing in array".
How can I do that? Thanks a lot.
*Besides, I want to clear the contents of access_log (/var/log/httpd/access_log) periodically, every 5 minutes. How can I do that?*
Saying:
array=$(awk '{print $4}' /var/log/httpd/sample | uniq -c | cut -d[ -f1)
does not define an array. This simply puts the result of the command into the variable array.
If you wanted to define an array, you'd say:
array=( $(awk '{print $4}' /var/log/httpd/sample | uniq -c | cut -d[ -f1) )
You can get the count of the elements in the array by saying echo "${#array[@]}".
To check whether the array contains any elements, you can say:
(( ${#array[@]} )) || echo "Nothing in array"
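As a small sketch of the same check, the array can also be filled with readarray, which stores one line of output per element instead of splitting on every space. The helper name array_is_empty is made up for illustration, the printf calls stand in for the awk pipeline from the question (which needs the log file), and namerefs require Bash 4.3 or newer.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit status 0) if the named array has no elements.
# Usage: array_is_empty NAME_OF_ARRAY
array_is_empty() {
  local -n _ref=$1          # nameref to the caller's array (Bash 4.3+)
  (( ${#_ref[@]} == 0 ))
}

# printf '' stands in for a pipeline that produced no output
readarray -t array < <(printf '')
if array_is_empty array; then
  echo "Nothing in array"   # this branch runs
fi

# Two made-up lines stand in for "uniq -c" style output
readarray -t array < <(printf '%s\n' "  3 10.0.0.1" "  1 10.0.0.2")
if ! array_is_empty array; then
  echo "${#array[@]} elements in array"   # prints: 2 elements in array
fi
```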
First, I am not experienced in scripting, so be gentle with me.
Anyway, I tried making a script for finding files by MIME type (audio, video, text, etc.), and here's the poor result I came up with.
#!/bin/bash
FINDPATH="$1"
FILETYPE="$2"
locate $FINDPATH* | while read FILEPROCESS
do
if file -bi "$FILEPROCESS" | grep -q "$FILETYPE"
then
echo $FILEPROCESS
fi
done
It works, but as you could guess, the performance is not so good.
So, can you help me make it better? Also, I don't want to rely on file extensions.
Update:
Here's what I am using now
#!/bin/bash
FINDPATH="$1"
find "$FINDPATH" -type f | file -i -F "::" -f - | awk -v FILETYPE="$2" -F"::" '$2 ~ FILETYPE { print $1 }'
Forking (exec) is expensive. This runs the file command only once, so it is fast:
find . -print | file -if - | grep "what you want" | awk -F: '{print $1}'
or
locate what.want | file -if -
check man file
-i #print mime types
-f - #read filenames from the stdin
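The same idea can be wrapped in a function along the following lines. This is a sketch rather than a drop-in replacement: the name find_by_mime is made up, find's -exec ... {} + is swapped in for the file -f - pipe (it passes the names in large batches, so file is still started only a few times), and file names are assumed not to contain "::" or newlines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the regular files below a directory whose MIME
# type matches a regular expression.
# Usage: find_by_mime DIRECTORY MIME_REGEX
find_by_mime() {
  local dir=$1 pattern=$2
  # --mime-type prints only the type (no charset); -F '::' sets the separator
  # between file name and result, so names containing a single colon still work.
  find "$dir" -type f -exec file --mime-type -F '::' {} + |
    awk -F'::' -v pattern="$pattern" '$2 ~ pattern { print $1 }'
}

# Example call: list everything that file reports as audio
#   find_by_mime ~/Music '^ *audio/'
```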
#!/bin/bash
find "$1" | file -if - | grep -- "$2" | awk -F: '{print $1}'
#!/usr/bin/env bash
mimetypes=$(sed -E 's/\/.*//g; /^$/d; /^#/d' /etc/mime.types | uniq)
display_help(){
echo "Usage: ${0##*/} [mimetype]"
echo "Available mimetypes:"
echo "$mimetypes"
exit 2
}
[[ $# -lt 1 ]] && display_help
ext=$(sed -E "/^${1}/!d; s/^[^ \t]+[ \t]*//g; /^$/d; s/ /\n/g" /etc/mime.types | sed -Ez 's/\n$//; s/\n/\\|/g; s/(.*)/\.*\\.\\(\1\\)\n/')
find "$PWD" -type f -regex "$ext"