Count unique values in a bash array

I have an array ${sorted[@]}. How can I count how often each element occurs in the array?
e.g:
Array values:
bob
jane
bob
peter
Results:
bob 2
jane 1
peter 1

The command
(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c
Explanation
Counting occurrences of unique lines is done with the idiom sort file | uniq -c.
Instead of using a file, we can also feed strings from the command line to sort using the here string operator <<<.
Lastly, we have to convert the array entries to lines inside a single string. With ${array[*]} the array is expanded to one single string where the array elements are separated by $IFS.
With IFS=$'\n' we set the $IFS variable to the newline character for this command exclusively. The $'...' is called ANSI-C Quoting and allows us to express the newline character as \n.
The subshell (...) is there to keep the change of $IFS local. After the command $IFS will have the same value as before.
Example
array=(fire air fire earth water air air)
(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c
prints
3 air
1 earth
2 fire
1 water
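As an alternative to the sort | uniq -c pipeline, the counting can also be done inside the shell. The following is only a sketch, assuming bash 4.0 or newer (for associative arrays) and non-empty array elements; the names counts and result are illustrative and not part of the answer above.

```shell
#!/usr/bin/env bash
# Count occurrences with an associative array (assumes bash 4.0 or newer).
array=(fire air fire earth water air air)

declare -A counts=()
for item in "${array[@]}"; do
    # Treat a missing entry as 0, then add one for this occurrence.
    counts[$item]=$(( ${counts[$item]:-0} + 1 ))
done

# The key order of an associative array is unspecified, so sort the lines.
result=$(for key in "${!counts[@]}"; do
    printf '%s %d\n' "$key" "${counts[$key]}"
done | sort)

printf '%s\n' "$result"
```

Unlike uniq -c, this prints the value first and the count second, which matches the "bob 2" layout asked for in the question.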

Related

How to get user input as number and echo the stored array value of that number in bash scripting

I have written a script that prints the running node processes together with the cwd of each process. I store the values in an array using a for loop and echo that array.
How can I let the user enter an index into that array, based on the output the script prints, and then show the value stored at that index?
Example Myscript
array=$(netstat -nlp | grep node)
for i in ${array[*]}
do
echo $i
done
The output is something like this:
1056
2064
3024
I want something more advanced. I want to take input from the user, like
Enter the regarding index from above list = 1
And let's suppose the user enters 1.
Then the next output should be
Your selected value is 2064
Is this possible in bash?
First, you're not actually using an array; you are storing a plain string in the variable "array". The string contains words separated by whitespace, so when you use the variable in the for statement, the unquoted value is subject to Word Splitting.
You need to use the array syntax for setting the array:
array=( $(netstat -nlp | grep node) )
However, the unquoted command substitution still exposes you to Filename Expansion. The best way to store the lines of a command into an array is to use the mapfile command with a process substitution:
mapfile -t array < <(netstat -nlp | grep node)
And in the for loop, make sure you quote all the variables and use index @:
for i in "${array[@]}"; do
echo "$i"
done
Notes:
arrays created with mapfile will start at index 0, so be careful of off-by-one errors
I don't know how variables are implemented in bash, but there is this oddity:
if you refer to the array without an index, you'll get the first element:
array=( "hello" "world" )
echo "$array" # ==> hello
If you refer to a plain variable with array syntax and index zero, you'll get the value:
var=1234
echo "${var[0]}" # ==> 1234
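The answer above covers storing the lines, but not the prompt the question asks for. A minimal sketch of that part could look like the following. The function name select_by_index is hypothetical, and the sample values stand in for the real netstat output so that the example runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the element at a user-supplied index, or fail.
select_by_index() {
    local index=$1
    shift
    local -a items=("$@")
    # Accept only non-negative integers.
    if [[ ! $index =~ ^[0-9]+$ ]]; then
        echo "Invalid index: $index" >&2
        return 1
    fi
    index=$((10#$index))   # force base 10 so that input like 08 is not read as octal
    if (( index >= ${#items[@]} )); then
        echo "Index out of range: $index" >&2
        return 1
    fi
    echo "Your selected value is ${items[index]}"
}

# Sample data standing in for: mapfile -t array < <(netstat -nlp | grep node)
array=(1056 2064 3024)

# List the entries together with their zero-based indices.
for i in "${!array[@]}"; do
    printf '%d) %s\n' "$i" "${array[i]}"
done

# Interactive use (commented out so the sketch runs unattended):
# read -rp "Enter the index from the list above = " choice
# select_by_index "$choice" "${array[@]}"
```

Because the indices start at 0, entering 1 selects the second entry (2064), as in the question.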

Sort multiple column String array in bash

I have an array of strings:
arr[0]="1 10 2Z6UVU6h"
arr[1]="1 12 7YzF5mFs"
arr[2]="2 36 qRwAiLg7"
How could I sort by the 2nd column and use the 1st as a tie-break?
Is there anything similar to something like...
sort -k 2,2n -k 1,1 $arr
As long as there are no newline characters in any array element, it's straight-forward: Just printf the array into sort and capture the output:
mapfile -t sorted < <(printf "%s\n" "${arr[@]}" | sort -k2,2n -k1,1)
(The use of process substitution is to avoid having the mapfile run in a subshell, which wouldn't be helpful since the goal is to set the value of $sorted in this shell.)
If the array elements might contain newlines, then you could use NUL as a delimiter in the printf and the sort (option -z for sort). On bash versions before 4.4 you'd have to replace mapfile with an explicit loop, because there mapfile does not offer an option to change the line delimiter (bash 4.4 added mapfile -d). read does (-d '' will cause read to use NUL as a line delimiter), but it only reads one line at a time.
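The NUL-delimited variant described above could be sketched as follows. It assumes a sort that supports -z (GNU coreutils does); the sample array is the one from the question plus one extra row, added here only to exercise the tie-break.

```shell
#!/usr/bin/env bash
arr=("3 10 abc" "1 12 7YzF5mFs" "2 36 qRwAiLg7" "1 10 2Z6UVU6h")

sorted=()
# read -d '' stops at each NUL byte; IFS= and -r keep the element text intact.
while IFS= read -r -d '' element; do
    sorted+=("$element")
done < <(printf '%s\0' "${arr[@]}" | sort -z -k2,2n -k1,1)

printf '%s\n' "${sorted[@]}"
```

The two rows with 10 in the second column come first, ordered by their first column.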

Comparing 2 string elements of an array

a=0
b=1
for a in ${my_array[@]}
do
for b in ${my_array[@]}
do
if [ " ${my_array[a]} " = " ${my_array[b]} " ]
then
continue
((a++))
fi
done
((b++))
done
Hi. I want to compare 2 strings that are in the same array. If they are the same, I want to print just one of them. How can I do that? I wrote some code with 2 variables (a and b): a starts at 0 and refers to the first element of the array, b starts at 1 and refers to the element at index 1. I want to compare them, and if they are the same string, print just one of them, so I use "continue". I think my code is correct, but it doesn't work; there is a mistake which I can't see. Can you help me?
For example, it runs like this:
Enter words :
erica 17
sally 16
john 18
henry 17
john 18
jessica 19
As you can see, "john 18" appears twice. I don't want both of them. My program should check whether 2 strings are the same; if they are, it should use just one of them.
In the if statement, "=" inside [ ... ] compares strings; it does not assign. Bash also accepts "==" there as a synonym.
If I understand correctly, you want to uniquify the elements of an array. If so, the Stack Overflow question "How can I get unique values from an array in Bash?" appears to answer it with the following one-liner:
echo "${my_array[@]}" | tr ' ' '\n' | sort | uniq
Unfortunately, since your array elements contain spaces, the above will not work as expected: echo flattens the array into one space-separated string, and tr then splits every element at its internal spaces as well. The solution is to use printf, which prints each element on its own line, and to drop tr:
printf '%s\n' "${my_array[@]}" | sort | uniq -c
But this changes the order of the elements relative to the original array. Hope that is fine?
You can use sort and uniq to be able to get your desired output:
echo "${my_array[@]}" | tr ' ' '\n' | sort | uniq -c | tr '\n' ' '
And if you have spaces in your input you can use this:
printf '%s\n' "${my_array[@]}" | sort | uniq -c
That way you will get the number of times the string occurred but it will be printed just once.
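If the original order of the elements matters, duplicates can also be dropped without sorting. This is a sketch only, assuming bash 4.0 or newer for associative arrays and non-empty elements; the names seen and unique are illustrative.

```shell
#!/usr/bin/env bash
my_array=("erica 17" "sally 16" "john 18" "henry 17" "john 18" "jessica 19")

declare -A seen=()
unique=()
for entry in "${my_array[@]}"; do
    # Keep an entry only the first time it appears.
    if [[ -z ${seen[$entry]:-} ]]; then
        seen[$entry]=1
        unique+=("$entry")
    fi
done

printf '%s\n' "${unique[@]}"
```

Here the second "john 18" is skipped and the remaining five entries keep their input order.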

How to parse only selected column values using awk

I have a sample flat file which contains the following block
test my array which array is better array huh got it?
INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN
I also have an array(ksh_arr2) which was defined like this
ksh_arr2=$(awk '{if(NR==1){for(i=1;i<=NF;i++){if($i~/^arr/){print i}}}}' testUnix.txt)
and contains the following integers
3 5 8
Now I want to print only the column values at those numbered positions, i.e. the third, fifth and eighth.
I also want the output from the 2nd line onwards.
So I tried the following
awk '{for(i=1;i<=NF;i++){if(NR >=1 && i=${ksh_arr2[i]}) do print$i ; done}}' testUnix.txt
but it is apparently not printing the desired outputs.
What am I missing ? Please help.
How I would approach it:
awk -vA="${ksh_arr2[*]}" 'BEGIN{split(A,B," ")}{for(i in B)print $B[i]}' file
Explanation
-vA="${ksh_arr2[*]}" - Sets the awk variable A to the expanded ksh array
'BEGIN{split(A,B," ")} - Splits the expanded string on spaces
(effectively recreating the array in awk)
{for(i in B)print $B[i]} - For each index of the new array, prints the field whose number
is stored at that index
Edit
If you want to preserve the order of the fields when printing then this would be better
awk -vA="${ksh_arr2[*]}" 'BEGIN{n=split(A,B," ")}{for(i=1;i<=n;i++)print $B[i]}' file
Since no sample output is shown, I don't know if this output is what you want. It is the output one gets from the code provided with the minimal changes required to get it to run:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<=length(a);i++){print $a[i]}}' testUnix.txt
array
array
array
SA
AUS
ARZ
The above code prints out the selected columns in the same order supplied by the variable k.
Notes
The awk code never defined ksh_arr2. I presume that the value of this array was to be passed in from the shell. It is done here using the -v option to set the variable k to the value of ksh_arr2.
It is not possible to pass an array into awk directly. It is possible to pass in a string, as above, and then convert it to an array using the split function. Above, the string k is converted to the awk array a.
awk syntax is different from shell syntax. For instance, awk does not use do or done.
Details
-v k='3 5 8'
This defines an awk variable k. To do this programmatically, replace 3 5 8 with a string or array from the shell.
BEGIN{split(k,a," ");}
This converts the space-separated values in variable k into an array named a.
for(i=1;i<=length(a);i++){print $a[i]}
This prints out each column in array a in order.
Alternate Output
If you want to keep the output from each line on a single line:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<length(a);i++) printf "%s ",$a[i]; print $a[length(a)]}' testUnix.txt
array array array
SA AUS ARZ
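If the column numbers are fixed, they can also be hard-coded; NR>=2 starts the output at the 2nd line:
awk 'NR>=2 { print $3 " " $5 " " $8 }' testUnix.txt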
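Putting the pieces together, passing the column list from the shell, rebuilding it with split, and skipping the first line might look like this. It is a sketch under a few assumptions: the shell array cols and the awk variable list are illustrative names, and the sample file is recreated in a temporary location so that the example is self-contained.

```shell
#!/usr/bin/env bash
# Recreate the sample file from the question in a temporary location.
tmpfile=$(mktemp)
printf '%s\n' \
    'test my array which array is better array huh got it?' \
    'INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN' > "$tmpfile"

cols=(3 5 8)   # column numbers, e.g. as found by the first awk command

# "${cols[*]}" joins the elements with spaces; split() rebuilds them in awk
# and returns the element count, which is stored in n.
# NR >= 2 restricts the output to the 2nd line onwards.
selected=$(awk -v list="${cols[*]}" '
    BEGIN { n = split(list, col, " ") }
    NR >= 2 {
        for (i = 1; i < n; i++) printf "%s ", $col[i]
        print $col[n]
    }' "$tmpfile")

printf '%s\n' "$selected"
rm -f "$tmpfile"
```

Using the count returned by split avoids calling length on an array, which not every awk implementation supports.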

Store grep output containing whitespaces in an array

I want to store some lines of the output of blkid in an array. The problem is that those lines contain whitespace, and the array syntax treats whitespace as the delimiter between array elements, so I end up with split lines in my array instead of one line being one array element.
This is the code I currently have:
devices=($(sudo blkid | egrep '^/dev/sd[b-z]'))
echo ${devices[*]} gives me the following output:
/dev/sdb1: LABEL="ARCH_201108" TYPE="udf"
/dev/sdc1: LABEL="WD" UUID="414ECD7B314A557F" TYPE="ntfs"
But echo ${#devices[*]} gives me 7, whereas I want 2. I want /dev/sdb1: LABEL="ARCH_201108" TYPE="udf" to be the first element in my devices array and /dev/sdc1: LABEL="WD" UUID="414ECD7B314A557F" TYPE="ntfs" to be the second one. How can I accomplish that?
The unquoted command substitution is split into array elements on the IFS value. If you want to split on newlines only, adjust IFS:
IFS_backup=$IFS
IFS=$'\n'
devices=($(sudo blkid | egrep '^/dev/sd[b-z]'))
IFS=$IFS_backup
echo ${#devices[@]}
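Another option, sketched here under the assumption of bash 4.0 or newer, is mapfile, which reads one line per array element without touching IFS and without exposing the lines to filename expansion. The function sample_blkid is a hypothetical stand-in for the real blkid pipeline, so that the example runs without root access or attached disks.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: sudo blkid | grep -E '^/dev/sd[b-z]'
sample_blkid() {
    printf '%s\n' \
        '/dev/sdb1: LABEL="ARCH_201108" TYPE="udf"' \
        '/dev/sdc1: LABEL="WD" UUID="414ECD7B314A557F" TYPE="ntfs"'
}

# -t strips the trailing newline from each line that is stored.
mapfile -t devices < <(sample_blkid)

echo "${#devices[@]}"
printf '%s\n' "${devices[0]}"
```

With the two sample lines, the array has two elements and the first one is the complete /dev/sdb1 line.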

Resources