Spaces in array content getting broken with grep

Spaces in array content getting broken with grep - arrays

I am using array to tackle with spaces in line of my file. But when i am using grep to filter with value of array it is breaking because of spaces.
For example my line is as per below
bbbh.cone.abc.com:/home 'bbbh.cone.abc.com
As it has spaces i am using array as per below.
object1=$(echo "$line" | awk '{print $1}' )
object2=$(echo "$line" | awk '{print $2}' )
object3=$(echo "$line" | awk '{print $3}' )
object4=$(echo "$line" | awk '{print $4}' )
hiteshcharry=("$object1" "$object2" "$object3" "$object4")
grep "${hiteshcharry[#]}" <filename>
It give me error because of spaces.
Below is the example.
I have below line in my file.
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
So i have 2 spaces in my above line. I have written my script in such way so that it can handle a line with maximum 4 spaces.
When i am running below command
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[#]}"
it give me error because of spaces. However when i print the value of array it show me correct value.
Example : -
one of line from my file is as below( it has 2 spaces)
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
I am putting this value in my array named as hiteshcharry. when i am running below command
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[#]}"
It is giving me error because of spaces in value of array. In output it should filter the line having value equal to array named hiteshcharry.
I hope this is clear now.
Output of omnidb command is in picture. So i want to grep the lines having
"st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space
'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'" from
output of omnidb command which is in picture
enter image description here
Thanks. i have added declare -p hiteshcharry and it start printing the each elements of array. But i am error shown in picture .
enter image description here

When you pass your array to grep through "${array[#]}", grep will see each array element as a separate argument. So, the first element would become the pattern to search for, and the second element onwards would become the file names to be searched on. Obviously, that's not what you want.
You can use process substitution to make grep match the strings contained in your array, like this:
omnidb -session "$sessionid" -detail | grep -Fxf <(printf '%s\n' "${hiteshcharry[#]}")
printf will print your array elements one line per element
grep -Fxf treats the about output as a file containing strings to be searched (-F option treats them as strings, not patterns, -x matches the whole line of omnidb output, preventing any partial matches)

Related

Output results from cat into different files with names specified into an array

I would like to do cat on several files, which names are stored in an array:
cat $input | grep -v "#" | cut -f 1,2,3
Here the content of the array:
echo $input
1.blastp 2.blastp 3.blastp 4.blastp 5.blastp 6.blastp 7.blastp 8.blastp 9.blastp 10.blastp 11.blastp 12.blastp 13.blastp 14.blastp 15.blastp 16.blastp 17.blastp 18.blastp 19.blastp 20.blastp
This will work just nicely. Now, I am struggling in storing the results into proper output files. So I want to also store the output into files which names are stored into another array:
echo $out_in
1_pairs.tab 2_pairs.tab 3_pairs.tab 4_pairs.tab 5_pairs.tab 6_pairs.tab 7_pairs.tab 8_pairs.tab 9_pairs.tab 10_pairs.tab 11_pairs.tab 12_pairs.tab 13_pairs.tab 14_pairs.tab 15_pairs.tab 16_pairs.tab 17_pairs.tab 18_pairs.tab 19_pairs.tab 20_pairs.tab
cat $input | grep -v "#" | cut -f 1,2,3 > "$out_in"
My problem is:
When I don't use the "" I will get 'ambiguous redirect' error.
When I use them, a single file will be created that comes by the name:
1_pairs.tab?2_pairs.tab?3_pairs.tab?4_pairs.tab?5_pairs.tab?6_pairs.tab?7_pairs.tab?8_pairs.tab?9_pairs.tab?10_pairs.tab?11_pairs.tab?12_pairs.tab?13_pairs.tab?14_pairs.tab?15_pairs.tab?16_pairs.tab?17_pairs.tab?18_pairs.tab?19_pairs.tab?20_pairs.tab
I don't get why the input array is read with no problem but that's not the case for the output array...
any ideas?
Thanks a lot!
D.

You cannot redirect output that way, the output is a stream of characters and the redirection can not know when to switch to the next file. You need a loop over input files.
Assuming that the file names do not contain spaces:
for fn in $input; do
grep -v "$fn" | cut -f 1,2,3 >"${fn%%.*}_pairs.tab"
done

Shell - Looping Array with command and increment command values

var1=$(echo $getDate | awk '{print $1} {print $2}')
var2=$(echo $getDate | awk '{print $3} {print $4}')
var3=$(echo $getDate | awk '{print $5} {print $6}')
Instead of repeating like the code above, I need to:
loop the same command
increment the values ({print $1} {print $2})
store the value in an array
I was doing something like below but I am stuck maybe someone can help me please:
COMMAND=`find $locationA -type f | wc -l`
getDate=$(find $locationA -type f | xargs ls -lrt | awk '{print $6} {print $7}')
a=1
b=2
for i in $COMMAND
do
i=$(echo $getDate | awk '{print $a} {print $b}')
myarray+=('$i')
a=$((a+1))
b=$((b+1))
done
PS - using ksh
Problem: $COMMAND stores the number of files found in $locationA. I need to loop through the amount of files found and store their dates in an array.

I don't get the meaning of your example code (what is the 'for' loop supposed to do? What is the content of the variable COMMAND?), but in your question you ask to store something in an array, while in the code you wish to simplify, you don't use an array, but simple variables (var1, var2, ....).
If I understand your requirement correctly, your variable getDate contains a string of several words, which are separated by spaces, and you want to assign the first two words to var1, the following two words to var2, and so on. Is this correct?

Now the edited code is at least a bit clearer, though I still don't understand, why you use i as a loop variable, and overwrite it in the first statement inside the loop.
However, a few comments:
If you push '$i' into your array, you will get a literal '$' sign, followed by the letter 'i'. To add a variable i containing to numbers, you need double quotes ("$i").
I don't understand why you want to loop over the cotnent of the variable COMMAND. This variable will always hold a single number, which means that the loop will be executed exactly once.
You could use a counting loop, incrementing loop variable by 2 on each iteration. You would have to precalculate the number of iterations beforehand.
Perhaps an easier alternative, which would work in bash or in zsh (I did not try other shells) is to first turn your variable in an array,
tmparr=($(echo $getDate|fmt -w 1))
and then use a loop to collect pairs of this element:
myarray=()
for ((i=0; i<${#tmparr[*]}; i+=2))
do
myarray+=("${tmparr[$i]} ${tmparr[$((i+1))]}")
done
${myarray[0]} will hold a string consisting of the first to words from getDate, etc.

This one should work on zsh, at least with newer versions:
myarray=()
echo $g|fmt -w 1|paste -s -d " \n"|while read s; do myarray+=("$s"); done
This leaves the first pair in ${myarray[1]}, etc.
It doesn't work with bash (and old zsh versions), because these shells would execute the body of the loop in a subshell.
ADDED:
On a second thought, in zsh this one would be simpler:
myarray=("${(f)$(echo $g|fmt -w 1|paste -s -d ' \n')}")

How can I print a specific field from a specific line in a delimited type file

I have a sorted, delimited type file and I want to extract a specific field in specific line.
This is my input file: somefile.csv
efevfe,132143,27092011080210,howdy,hoodie
adfasdfs,14321,27092011081847,howdy,hoodie
gerg,7659876,27092011084604,howdy,hoodie
asdjkfhlsdf,7690876,27092011084688,howdy,hoodie
alfhlskjhdf,6548,27092011092413,howdy,hoodie
gerg,769,27092011092415,howdy,hoodie
badfa,124314,27092011092416,howdy,hoodie
gfevgreg,1213421,27092011155906,howdy,hoodie
I want to extract 27092011084688 (value from 4th line, 3rd column).
I used awk 'NR==4' but it gave me whole 4th line.

Fairly straightforward:
awk -F',' 'NR == 4 { print $3 }' somefile.csv
Using , as a field separator, take record number 4 and print field 3 in somefile.csv.

$ sed -n "4p" somefile.csv | cut -d, -f3
Edit
What's this?
-n turns of normal output
4p prints the 4th row
-d, makes cut use , as delimiter
-f3 makes cut print the 3rd field

One way using awk:
awk -F, 'NR==4 { print $3 }' file.txt

Use the following:
awk -F ',' 'NR==4 {print $3}'

perl alternative to print element 3 on line 4 in a csv file:
perl -F, -lane 'print $F[2] if $. == 4' somefile.csv

How do I assign the output of a command into an array?

I need to assign the results from a grep to an array... for example
grep -n "search term" file.txt | sed 's/:.*//'
This resulted in a bunch of lines with line numbers in which the search term was found.
1
3
12
19
What's the easiest way to assign them to a bash array? If I simply assign them to a variable they become a space-separated string.

To assign the output of a command to an array, you need to use a command substitution inside of an array assignment. For a general command command this looks like:
arr=( $(command) )
In the example of the OP, this would read:
arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
The inner $() runs the command while the outer () causes the output to be an array. The problem with this is that it will not work when the output of the command contains spaces. To handle this, you can set IFS to \n.
IFS=$'\n' arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
You can also cut out the need for sed by performing an expansion on each element of the array:
arr=($(grep -n "search term" file.txt))
arr=("${arr[#]%%:*}")

Space-separated strings are easily traversable in bash.
# save the ouput
output=$(grep -n "search term" file.txt | sed 's/:.*//')
# iterating by for.
for x in $output; do echo $x; done;
# awk
echo $output | awk '{for(i=1;i<=NF;i++) print $i;}'
# convert to an array
ar=($output)
echo ${ar[3]} # echos 4th element
if you are thinking space in file name use find . -printf "\"%p\"\n"

#Charles Duffy linked the Bash anti-pattern docs in a comment, and those give the most correct answer:
readarray -t arr < <(grep -n "search term" file.txt | sed 's/:.*//')
His comment:
Note that array=( $(command) ) is considered an antipattern, and is the topic of BashPitfalls #50. – Charles Duffy Nov 16, 2020 at 14:07

Assigning command's output to shell variable and get the variables size

I have a file consisting of digits. Usually, each line contains one single number. I would like to count the number of lines in the file that begin with digit '0'. If it's the case, then I would like to do some post-processing.
Although I'm able to retrieve correctly the corresponding line numbers, the total number of retrieved lines is not correct. Below, I'm posting the code that I'm using.
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
# linesToRemove=$(grep -n "^0" ${inputFile} | cut -d":" -f1);
linesNr=${#linesToRemove} # <- here, the error
# linesNr=${#linesToRemove[#]} # <- here, the error
if [ "${linesNr}" -gt "0" ]; then
# do something here, e.g. remove corresponding lines.
awk -v n=$linesToRemove 'NR == n {next} {print}' ${anotherFile} > ${outputFile}
fi
Also, as for the awk-based command, how could I use a shell-variable? I tried the command below, but it's not working correctly, since 'myIndex' is interpreted as a text and not as a variable.
linesToRemove=$(awk -v myIndex="$myIndex" '/^myIndex/ { print NR;}' ${inputFile});
Given the line numbers starting with 0 found in ${inputFile}, I would like to remove the corresponding lines numbers from ${anotherFile}. An example for both ${inputFile} and ${anotherFile} is given below:
// ${inputFile}
0
1
3
0
// ${anotherFile}
2.617300e+01 5.886700e+01 -1.894697e-01 1.251225e+02
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
2.177940e+01 1.249531e+02 1.538853e-01 1.527150e+02
// ${outputFile}
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
In the example above, I need to delete lines 0 and 3 from ${anotherFile}, given that those lines correspond to the lines starting with 0 in ${inputFile}.

If you want to count the number of lines in the file that begins with 0, then this line is wrong.
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
The above says to print the line number when the line start with 0, and your linesToRemove variable will contain all the line numbers, not the total number of lines. Use END{} block to capture the total. eg
linesToRemove=$(awk '/^0/ {c++}END{print c}' ${inputFile});
As for your 2nd question on using variable inside awk, use the regex operator ~. And then set your myIndex variable to include the ^ anchor
linesToRemove=$(awk -v myIndex="^$myIndex" '$0 ~ myIndex{ print NR;}' ${inputFile});
finally, if you just want to remove those lines that start with 0, then just simply remove it
awk '/^0/{next}{print $0>FILENAME}' file
If you want to remove lines from another file using what is captured in input file, here's one way
paste -d"|" inputfile anotherfile | awk '!/^0/{gsub(/^.*\|/,"");print}'
Or just one awk command
awk 'FNR==NR && /^0/{a[FNR]} NR>FNR && (!(FNR in a))' inputfile anotherfile
crude explanation: FNR==NR && /^0/ means process the first file whole line starts with 0 and put its line number into array a. NR>FNR means process the next file and if line number not in array, print the line. See the gawk documentation for what FNR,NR etc means

I think you have to do the following to assign an array:
linesToRemove=( $(awk '/^0/ { print NR; }' ${inputFile}) )
And to get the number of elements do (as you have in a commented line):
linesNr=${#linesToRemove[#]}
To remove the lines from from the file you could do something like:
sedCmd=""
for lineNr in ${linesToRemove[#]}; do
sedCmd="$sedCmd;${lineNr}d"
done
sed "$sedCmd" ${anotherFile} > ${outputFile}

In general if you do this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
instead of this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=${#linesToRemove}
use this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=${echo $linesToRemove|awk '{print NF}'}
POC :
cat temp.sh
#!/usr/bin/ksh
lines=$(awk '/^d/{print NR}' script.sh)
nooflines=$(echo $lines|awk '{print NF}')
echo $nooflines
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay> temp.sh
8
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay>

It greatly depends on the post-processing you are doing, but do you really need the actual count? Why not do something like this:
if grep ^0 $inputfile > /dev/null; then
# There is at least one line with a leading 0
:
fi
grep -v ^0 $inputfile | process-lines-without-leading-zero
grep ^0 $inputfile | process-lines-with-leading-zero
Or, even just:
if grep ^0 $inputfile | process-lines-with-leading-zero; then
# some post processing
:
fi
--EDIT--
Based on what you've said in your comment, I would recommend a different approach. If I understand you correctly, you want to read file a, looking for lines of the form ^0[0-9]*,
and then remove those line numbers from file b. Doing it one line at a time is pretty slow if the files get big. Just do:
cmd=$( grep '^0[0-9]*$' a | sed 's/$/d;/g' )
sed "$cmd" b
The assignment to cmd forms a sed command to delete the lines. Invoking sed on b will omit those lines. You'll need to redirect the sed output appropriately (perhaps to a temp file and then back to b, or just use 'sed -i' if you're using gnu sed.)

Given the large number of edits to this question, it seems easiest to start a new answer. Your problem can be solved with a simple one-liner:
$ sed "$( grep -n ^0 $inputFile | sed 's/:.*/d;/g' )" $anotherFile > $outputFile