Split string after matching the word in shell - arrays

I have a CSV file whose values look like this:
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
I want to split each string just before the colour name. For example:
str1=Wt-Do-U-Do-Wit-The
str1=Yay-Its-Your-Birthday
str1=You-Are-My-Sunshine-Happy-Birthday
str1=You-Are-Special
str1=You-Dont-Look-A-Day-Over-Fabulous
str1=You-My-Friend-Are-Fabulous-Happy-Birthday
To search the string I am using:
if [[ "$string" == *"Black"* ]] && [[ "$string" == *"White"* ]] ; then
echo "It's there!"
else
echo "SOrry"
fi
It is searching fine. But how can I split the string?
Another approach I tried:
colour_arr[0]='Red'
colour_arr[1]='Black'
colour_arr[2]='Navy-Blue'
colour_arr[3]='White'
inarray=$(echo ${colour_arr[@]} | grep -o "$string" | wc -w)
echo "$inarray"
But this is not working.

You can use sed (inspired by this answer).
I simplified the problem a little, since you have already parsed the strings correctly. Using this input file:
This is red colour
Ball is black colour
some more words before red and more after
For the second part of the string, starting with the color name:
sed -n -e 's/^.*\(\(red\|black\).*\)/\1/p' test
gives:
red colour
black colour
red and more after
and
sed -n -e 's/\(^.*\)\(\(red\|black\).*\)/\1/p' test
gives:
This is
Ball is
some more words before
I won't explain all the options, since they are well explained in the answer I referred to. You can apply sed to a bash variable using:
leftpart=$(sed -n -e 's/\(^.*\)\(\(red\|black\).*\)/\1/p' <<< "$INPUT_STRING")
EDIT after the OP changed the input format:
My answer still applies; just replace red with Red. The rest stays the same.
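As an alternative to sed, bash's own regex operator `=~` can do the same split without starting another process. This is only a sketch: the function name `split_at_colour` and the globals `left` and `right` are made up for illustration, and the colour list mirrors the sed example above.

```shell
#!/bin/bash
# Split a string at the last occurrence of a colour word using [[ =~ ]].
# split_at_colour, left and right are hypothetical names, not from the answer.
split_at_colour() {
    # Same idea as the sed pattern: greedy prefix, colour word, remainder.
    local re='^(.*)(red|black)(.*)$'
    if [[ $1 =~ $re ]]; then
        left=${BASH_REMATCH[1]}
        right=${BASH_REMATCH[2]}${BASH_REMATCH[3]}
    else
        # No colour found: keep the whole string on the left.
        left=$1
        right=''
    fi
}

split_at_colour "This is red colour"
printf 'left=[%s] right=[%s]\n' "$left" "$right"
```

Like the sed version, the leading `.*` is greedy, so the split happens at the last colour word on the line.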

For your new input
Input
$ cat f2
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red S
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
Output ( Using gawk )
$ awk 'BEGIN{IGNORECASE=1;FS="[ ,]";OFS=","}match($1,$2){print "str1="substr($1,1,RSTART-2)}' f2
str1=Wt-Do-U-Do-Wit-The
str1=Yay-Its-Your-Birthday
str1=You-Are-My-Sunshine-Happy-Birthday
str1=You-Are-Special
str1=You-Dont-Look-A-Day-Over-Fabulous
str1=You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday
For your old input
Input
$ cat f
"This is red colour",red
"Ball is black colour",black
"Tshirt is white colour",white
"Shoes are blue colour",blue
"This is green colour",green
Output
$ awk 'BEGIN{FS=OFS=","}{gsub(/"/,"");match($1,$2);print "str1="substr($1,1,RSTART-1),"str2=" substr($1,RSTART) }' f
str1=This is ,str2=red colour
str1=Ball is ,str2=black colour
str1=Tshirt is ,str2=white colour
str1=Shoes are ,str2=blue colour
str1=This is ,str2=green colour

OneLiner using awk (gnu for IGNORECASE)
awk -F ',' 'BEGIN{IGNORECASE=1}{sub("-"$NF"$","",$1);print "str1="$1}' YourFile
Self commented code
awk -F ',' '# the field separator is a comma
# before the first line
BEGIN{
# make string comparisons case-insensitive
IGNORECASE=1
}
# for each line
{
# in field 1, remove the pattern (a dash followed by the last field, i.e. the color, anchored at the end)
sub( "-" $NF "$", "", $1)
# print the new content of field 1 with str1= in front
print "str1="$1
}' YourFile

Based on your comments, you need color out of first "dashed" field , not the value of the second field (comma separated).
If color in this first "dashed" field is always the last string (dash separated), you can simply use
a="You-Are-My-Sunshine-Happy-Birthday-Red" ; awk -F- '{print $NF}' <<<"$a"
PS: You can isolate the first field of the whole line with cut or awk:
awk -F, '{print $1}' <<<"$fileline" or cut -d, -f1 <<<"$fileline"
You can combine the two above to achieve what you need.
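For the str1 values asked for in the question, one possible pure-bash sketch uses only parameter expansion. It assumes that the first field always ends with a dash followed by the colour given in the second field (in any letter case); `strip_colour` is a hypothetical helper name.

```shell
#!/bin/bash
# strip_colour is a hypothetical helper; it assumes the first CSV field
# ends with "-<colour>", where <colour> is the second field in any case.
strip_colour() {
    local line=$1
    local name=${line%%,*}     # text before the first comma
    local colour=${line#*,}    # text after the first comma
    # Cut the trailing "-<colour>" by length, so case differences do not matter.
    printf '%s\n' "${name:0:${#name}-${#colour}-1}"
}

strip_colour "You-Are-Special-Navy-Blue,navy-blue"
strip_colour "Wt-Do-U-Do-Wit-The-Black,black"
```

Because the cut is done by length, multi-word colours such as navy-blue need no special handling. To process a whole file, the function could be called from a `while IFS= read -r line` loop.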

Keep it simple:
$ < input.txt
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
$ sed -E 's/(-[^-]+)(,.*)/\2/g' input.txt
Wt-Do-U-Do-Wit-The,black
Yay-Its-Your-Birthday,black
You-Are-My-Sunshine-Happy-Birthday,red
You-Are-Special-Navy,navy-blue
You-Dont-Look-A-Day-Over-Fabulous,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday,pink
(Note: on my OS, OSX, sed -E is for extended regex.)

Related

Spaces in array content getting broken with grep

I am using an array to cope with spaces in the lines of my file. But when I use grep to filter with the values of the array, it breaks because of the spaces.
For example, my line looks like this:
bbbh.cone.abc.com:/home 'bbbh.cone.abc.com
As it has spaces, I am using an array as shown below.
object1=$(echo "$line" | awk '{print $1}' )
object2=$(echo "$line" | awk '{print $2}' )
object3=$(echo "$line" | awk '{print $3}' )
object4=$(echo "$line" | awk '{print $4}' )
hiteshcharry=("$object1" "$object2" "$object3" "$object4")
grep "${hiteshcharry[@]}" <filename>
It gives me an error because of the spaces.
Below is the example.
I have the following line in my file.
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
So I have 2 spaces in the line above. I have written my script so that it can handle a line with at most 4 spaces.
When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces. However, when I print the value of the array, it shows the correct value.
Example:
One line from my file is shown below (it has 2 spaces):
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
I am putting this value in my array named hiteshcharry. When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces in the array values. The output should contain only the lines whose value equals the contents of the array hiteshcharry.
I hope this is clear now.
Output of omnidb command is in picture. So i want to grep the lines having
"st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space
'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'" from
output of omnidb command which is in picture
Thanks. I have added declare -p hiteshcharry and it now prints each element of the array. But I am getting the error shown in the picture.
When you pass your array to grep through "${array[#]}", grep will see each array element as a separate argument. So, the first element would become the pattern to search for, and the second element onwards would become the file names to be searched on. Obviously, that's not what you want.
You can use process substitution to make grep match the strings contained in your array, like this:
omnidb -session "$sessionid" -detail | grep -Fxf <(printf '%s\n' "${hiteshcharry[@]}")
printf prints your array elements, one element per line.
grep -Fxf treats the above output as a file containing the strings to search for (the -F option treats them as fixed strings, not patterns; -x matches only whole lines of the omnidb output, preventing any partial matches).
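The effect of the process substitution can be tried without omnidb. In this sketch the array `wanted` and the text in `sample_output` are made-up stand-ins for the real array and command output.

```shell
#!/bin/bash
# Hypothetical stand-ins for the array and for the omnidb output.
wanted=( "alpha beta" "gamma delta epsilon" )
sample_output=$'alpha beta\nalpha\ngamma delta epsilon\nzeta'

# printf writes one array element per line; grep -f reads those lines as
# fixed strings (-F) that must match a whole input line (-x).
matches=$(printf '%s\n' "$sample_output" |
          grep -Fxf <(printf '%s\n' "${wanted[@]}"))

printf '%s\n' "$matches"
```

The line containing only alpha is not printed, because -x rejects partial matches, and the spaces inside the elements survive because each element is written on a line of its own.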

How do i echo specific rows and columns from csv's in a variable?

The below script:
#!/bin/bash
otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
FFF,3207,851,4058,21
GGG,30621,4595,35216,13
HHH,8450,1504,9954,15
III,4963,2157,7120,30
JJJ,51,59,110,54
KKK,87,123,210,59
LLL,573,144,717,20
MMM,617,1841,2458,75
NNN,234,76,310,25
OOO,12433,1908,14341,13
PPP,10627,1428,12055,12
QQQ,510,514,1024,50
RRR,1361,687,2048,34
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
"
IFS="," array1=(${otscurrent})
echo ${array1[4]}
Prints:
$ ./test.sh
12
BBB
I'm trying to get it to just print 12... And I am not even sure how to make it just print row 5 column 4
The variable is an output of a sqlquery that has been parsed with several sed commands to change the formatting to csv.
otscurrent="$(sqlplus64 user/password@dbserverip/db as sysdba @query.sql |
sed '1,11d; /^-/d; s/[[:space:]]\{1,\}/,/g; $d' |
sed '$d'|sed '$d'|sed '$d' | sed '$d' |
sed 's/Used,MB/Used MB/g' |
sed 's/Free,MB/Free MB/g' |
sed 's/Total,MB/Total MB/g' |
sed 's/Pct.,Free/Pct. Free/g' |
sed '1b;/^Name/d' |
sed '/^$/d'
)"
Ultimately I would like to be able to call on a row and column and run statements on the values.
Initially I was piping that into:
awk -F "," 'NR>1{ if($5 < 10) { printf "%-30s%-10s%-10s%-10s%-10s\n", $1,$2,$3,$4,$5"%"; } else { echo "Nothing to do" } }')"
Which works, but I couldn't run commands from the if/else, or at least I didn't know how.
If you have bash 4.0 or newer, an associative array is an appropriate way to store data in this kind of form.
otscurrent=${otscurrent#$'\n'} # strip leading newline present in your sample data
declare -A data=( )
row=0
while IFS=, read -r -a line; do
for idx in "${!line[@]}"; do
data["$row,$idx"]=${line[$idx]}
done
(( row += 1 ))
done <<<"$otscurrent"
This lets you access each individual item:
echo "${data[0,0]}" # first field of first line
echo "${data[9,0]}" # first field of tenth line
echo "${data[9,1]}" # second field of tenth line
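Since the goal is to run statements on individual values, here is a self-contained sketch of the same idea with a lookup followed by a test. It needs bash 4.0 or newer; the names `csv`, `cell`, `pct` and `status`, the shortened sample data, and the threshold of 20 are assumptions made for the example.

```shell
#!/bin/bash
# Requires bash >= 4.0 for associative arrays.
# csv, cell, pct and status are hypothetical names used for illustration.
csv=$'AAA,33854,4528,38382,12\nBBB,83917,12296,96213,13\nEEE,2472,981,3453,28'

declare -A cell=()
row=0
while IFS=, read -r -a fields; do
    for idx in "${!fields[@]}"; do
        cell["$row,$idx"]=${fields[$idx]}   # key is "row,column", both 0-based
    done
    (( row += 1 ))
done <<<"$csv"

# Look up one value and act on it with ordinary shell statements.
pct=${cell[2,4]}
if (( pct > 20 )); then
    status="${cell[2,0]} is above 20"
else
    status="${cell[2,0]} is fine"
fi
echo "$status"
```

Any command can go inside the if/else branches, which is what was hard to do from within awk.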
"I'm trying to get it to just print 12..."
The issue is that IFS="," splits on commas and there is no comma between 12 and BBB. If you want those to be separate elements, add a newline to IFS. Thus, replace:
IFS="," array1=(${otscurrent})
With:
IFS=$',\n' array1=(${otscurrent})
Output:
$ bash test.sh
12
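With every value in one flat array, a row and column can be turned into an index by simple arithmetic. The sketch below assumes every row has exactly five fields; `cols`, `flat` and the helper `cell` are hypothetical names, and only the first five rows of the sample data are used.

```shell
#!/bin/bash
# cols, flat and cell are hypothetical names; each row must have $cols fields.
csv=$'AAA,33854,4528,38382,12\nBBB,83917,12296,96213,13\nCCC,20399,5396,25795,21\nDDD,27198,4884,32082,15\nEEE,2472,981,3453,28'
cols=5

old_ifs=$IFS
IFS=$',\n'     # split on commas and newlines
set -f         # no filename globbing while the unquoted expansion is split
flat=( $csv )
set +f
IFS=$old_ifs   # restore the default, unlike the one-line assignment form

# cell ROW COL, both counted from 1
cell() {
    echo "${flat[$(( ($1 - 1) * cols + ($2 - 1) ))]}"
}

cell 5 4
cell 1 5
```

Restoring IFS matters here: in `IFS=$',\n' array1=(...)` both words are plain assignments, so the changed IFS stays in effect for the rest of the script.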
All you need to print the value of the 4th column on the 5th row is:
$ awk -F, 'NR==5{print $4}' <<< "$otscurrent"
3453
and just remember that in awk row (record) and column (field) numbers start at 1, not 0. Some more examples:
$ awk -F, 'NR==1{print $5}' <<< "$otscurrent"
12
$ awk -F, 'NR==2{print $1}' <<< "$otscurrent"
BBB
$ awk -F, '$5 > 50' <<< "$otscurrent"
JJJ,51,59,110,54
KKK,87,123,210,59
MMM,617,1841,2458,75
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
If you'd like to avoid all of the complexity and simply parse your SQL output to produce what you want without 20 sed commands in between, post a new question showing the raw sqlplus output as the input and what you want finally output and someone will post a brief, clear, simple, efficient awk script to do it all at one time, or maybe 2 commands if you still want an intermediate CSV for some reason.
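If the aim is to run shell commands depending on a column value, one option is to let bash read the fields itself instead of awk. This is a sketch only: the column names `name`, `used`, `free`, `total` and `pct` are guesses based on the headers mentioned in the sed commands, and the threshold of 50 is arbitrary.

```shell
#!/bin/bash
# Column names are assumptions; adjust them to the real query output.
csv=$'AAA,33854,4528,38382,12\nJJJ,51,59,110,54\nTTT,0,5,5,100'

flagged=()
while IFS=, read -r name used free total pct; do
    [[ -z $name ]] && continue      # skip empty lines
    if (( pct > 50 )); then
        flagged+=( "$name" )        # any shell command could run here
    fi
done <<<"$csv"

printf '%s\n' "${flagged[@]}"
```

For large inputs awk will be faster, but a read loop keeps the if/else in the shell, where other commands are available.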

Comment out items that do not match pattern in array

I have a log file I am trying to comment out lines that do not match my array. I did successfully learn how to create an array and I can echo out the array items but I am having trouble taking anything that doesn't match my array and adding something in front of it. Here is my code, if you have suggestions on another path or ways I can make it better:
for itsSaturday in $(find "$LOCATION" -mindepth 1 -maxdepth 1 -name "*.log" ); do
TEMPFILE="$itsSaturday.$$"
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
for arrayItem in "${someArray[@]}"; do
merged="$theCall$arrayItem"
if ! grep -q "$merged" "$itsSaturday"; then
sed -e '/$merged/! s:$commentOn$theCall::g' "$itsSaturday" > $TEMPFILE && mv $TEMPFILE "$itsSaturday"
fi
done
done
file:
some_additional_text_breakfast
some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
some_additional_text_orange juice
some_additional_text_breakfast
file into:
some_additional_text_breakfast
## You_need_some_additional_text_bacon
some_additional_text_scrambled eggs
some_additional_text_Bloody Mary
## You_need_some_additional_text_orange juice
some_additional_text_breakfast
How can I add a variable before items that do not match my array?
I don't like doing this using bash and sed, but I think the following might be enough:
#! /bin/bash
declare -a someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn="## You_need_"
OIFS="$IFS"
IFS='|' mergedLines="${someArray[*]/#/$theCall}"
IFS="$OIFS"
for i in *.txt
do
TEMPFILE="$i.$$"
sed -r "/$mergedLines/!s/^/$commentOn/" "$i" >> "$TEMPFILE"
done
1. I moved the array and the other constants out of the loop.
2. "${someArray[*]/#/$theCall}" uses bash string substitution to prepend the contents of $theCall to every element in the array.
3. IFS='|' mergedLines="${someArray[*]}" is a convenient trick to combine the elements of an array into a pipe-separated string.
4. Combined, (2) and (3) get me
some_additional_text_breakfast|some_additional_text_scrambled eggs|some_additional_text_Bloody Mary
in mergedLines.
5. Then it's just a matter of using extended regular expressions in sed (for |) and prefixing the non-matching lines.
Your sed pattern used single quotes, so the variables within were not expanded.
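The prefix-and-join step can be checked on its own with a few lines of input. In this sketch `prefixed`, `pattern`, `input` and `result` are hypothetical names, the join is done in a subshell so that IFS never needs restoring, and `sed -E` is used in place of `sed -r` because both GNU and BSD sed accept it.

```shell
#!/bin/bash
someArray=( "breakfast" "scrambled eggs" "Bloody Mary" )
theCall='some_additional_text_'
commentOn='## You_need_'

# Prepend $theCall to every element (an empty pattern anchored with #).
prefixed=( "${someArray[@]/#/$theCall}" )
# Join with | inside a subshell, so the changed IFS does not leak out.
pattern=$(IFS='|'; printf '%s' "${prefixed[*]}")

input=$'some_additional_text_breakfast\nsome_additional_text_bacon\nsome_additional_text_Bloody Mary'

# Double quotes let the shell expand the variables before sed sees them.
result=$(printf '%s\n' "$input" | sed -E "/$pattern/!s/^/$commentOn/")
printf '%s\n' "$result"
```

This works only while the array elements and $commentOn contain no characters that are special to sed, such as / or &.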
Try replacing the inner for-loop with:
PROG=$(printf '%s\n' "${COMMENT[@]}" | while read comment ; do
/bin/echo -n '$0 !~ /'"$comment"'$/ && '
done
echo '1 { printf commentOn } ; { print }')
awk -v commentOn="$commentOn" "$PROG" $itsSaturday > $TEMPFILE && mv $TEMPFILE $itsSaturday
On each file, this creates an awk program that does the work.

Append elements of an array to the end of a line

First let me say I followed questions on stackoverflow.com that relate to my question and it seems the rules are not applying. Let me show you.
The following script:
#!/bin/bash
OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
TODAY=`date +"%m-%d-%y"`
HOSTNAME=`hostname`
WORKSPACES=( "bob" "mel" "sideshow-ws2" )
if ! [ -f $OUTPUT_DIR/$HOSTNAME.csv ] && [ $HOSTNAME == "sideshow" ]; then
echo "$TODAY","$HOSTNAME" > $OUTPUT_DIR/$HOSTNAME.csv
echo "${WORKSPACES[0]}," >> $OUTPUT_DIR/$HOSTNAME.csv
sed -i "/^'"${WORKSPACES[0]}"'/$/'"${WORKSPACES[1]}"'/" $OUTPUT_DIR/$HOSTNAME.csv
sed -i "/^'"${WORKSPACES[1]}"'/$/${WORKSPACES[2]}"'/" $OUTPUT_DIR/$HOSTNAME.csv
fi
I want the output to look like:
09-20-14,sideshow
bob,mel,sideshow-ws2
The sed statements are supposed to append successive array elements to the preceding ones on the same line. Now I know there's a simpler way to do this, like:
echo "${WORKSPACES[0]},${WORKSPACES[1]},${WORKSPACES[2]}" >> $OUTPUT_DIR/$HOSTNAME.csv
But let's say I had 30 elements in the array and I wanted to append them one after the other on the same line? Can you show me how to loop through the elements in an array and append them one after the other on the same line?
Also let's say I had the output of a command like:
df -m /export/ws/$ws | awk '{if (NR!=1) {print $3}}'
and I wanted to append that to the end of the same line.
But when I run it I get:
+ OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
++ date +%m-%d-%y
+ TODAY=09-20-14
++ hostname
+ HOSTNAME=sideshow
+ WORKSPACES=("bob" "mel" "sideshow-ws2")
+ '[' -f /share/es-ops/Build_Farm_Reports/WorkSpace_Reports/sideshow.csv ']'
And the file right now looks like:
09-20-14,sideshow
bob,
I am happy to report that user syme solved this (see below), but then I realized I need the date in the first column:
09-7-14,bob,mel,sideshow-ws2
Can I do this using syme's for loop?
Okay, user syme solved this too. He said "Just add $TODAY to the for loop", like this:
for v in "$TODAY" "${WORKSPACES[@]}"
Okay, now the output looks like this (I changed the elements in the array, by the way):
sideshow
09-20-14,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
Now, the next line below that should start with a , in the first column (skipping the date), followed by:
df -m /export/ws/$v | awk '{if (NR!=1) {print $3}}'
which gives the available space on bob in the first iteration,
and then:
df -m /export/ws/$v | awk '{if (NR!=1) {print $2}}'
which gives the used space on bob in the second iteration.
Then we just move on to the next value in ${WORKSPACES[@]},
which will be mel, and do the available and used values as we did with bob (or $v) above.
I know you geniuses on here will make child's play out of this.
I solved my own last question on this thread:
WORKSPACES2=( "bob" "mel" "sideshow-ws2" )
separator="," # a comma before every value, so the first (date) column stays empty
for v in "${WORKSPACES2[@]}"
do
available=`df -m /export/ws/$v | awk '{if (NR!=1) {print $3}}'`
used=`df -m /export/ws/$v | awk '{if (NR!=1) {print $2}}'`
echo -n "$separator$available$separator$used" >> $OUTPUT_DIR/$HOSTNAME.csv # append, concatenated, the separator and the value to the file
done
produces:
sideshow
09-20-14,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
,470400,1032124,661826,1032124,43443,1032108
echo -n prints text without the trailing linebreak.
To loop over the values of the array, you can use a for-loop:
echo "$TODAY,$HOSTNAME" > $OUTPUT_DIR/$HOSTNAME.csv # with a linebreak
separator="" # defined empty for the first value
for v in "${WORKSPACES[@]}"
do
echo -n "$separator$v" >> $OUTPUT_DIR/$HOSTNAME.csv # append, concatenated, the separator and the value to the file
separator="," # comma for the next values
done
echo >> $OUTPUT_DIR/$HOSTNAME.csv # add a linebreak (if you want it)
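When the elements only need to be joined with commas, the loop can be replaced by bash's `"$*"` expansion, which joins its arguments with the first character of IFS. This is a sketch; `join_by_comma` is a hypothetical helper name.

```shell
#!/bin/bash
# join_by_comma is a hypothetical helper: "$*" joins the arguments using
# the first character of IFS, which is set locally to a comma.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

WORKSPACES=( "bob" "mel" "sideshow-ws2" )
line=$(join_by_comma "${WORKSPACES[@]}")
echo "$line"
```

The result can be appended to the CSV file with `>>` in the same way as in the loop above, and extra leading values such as "$TODAY" can simply be passed as additional arguments.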

How can I print a specific field from a specific line in a delimited type file

I have a sorted, delimited type file and I want to extract a specific field in specific line.
This is my input file: somefile.csv
efevfe,132143,27092011080210,howdy,hoodie
adfasdfs,14321,27092011081847,howdy,hoodie
gerg,7659876,27092011084604,howdy,hoodie
asdjkfhlsdf,7690876,27092011084688,howdy,hoodie
alfhlskjhdf,6548,27092011092413,howdy,hoodie
gerg,769,27092011092415,howdy,hoodie
badfa,124314,27092011092416,howdy,hoodie
gfevgreg,1213421,27092011155906,howdy,hoodie
I want to extract 27092011084688 (value from 4th line, 3rd column).
I used awk 'NR==4' but it gave me whole 4th line.
Fairly straightforward:
awk -F',' 'NR == 4 { print $3 }' somefile.csv
Using , as a field separator, take record number 4 and print field 3 in somefile.csv.
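To make the row and column configurable, the numbers can be passed into awk with -v. In this sketch `csv_field` is a hypothetical wrapper, and a temporary file stands in for somefile.csv.

```shell
#!/bin/bash
# csv_field ROW COL FILE: print one field of one line (both counted from 1).
# The function name is hypothetical; it only wraps the awk call above.
csv_field() {
    awk -F',' -v row="$1" -v col="$2" 'NR == row { print $col; exit }' "$3"
}

# A temporary file stands in for somefile.csv.
tmp=$(mktemp)
printf '%s\n' \
    'efevfe,132143,27092011080210,howdy,hoodie' \
    'adfasdfs,14321,27092011081847,howdy,hoodie' \
    'gerg,7659876,27092011084604,howdy,hoodie' \
    'asdjkfhlsdf,7690876,27092011084688,howdy,hoodie' > "$tmp"

value=$(csv_field 4 3 "$tmp")
echo "$value"
rm -f "$tmp"
```

The exit after printing stops awk from reading the rest of the file, which helps on large inputs.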
$ sed -n "4p" somefile.csv | cut -d, -f3
Edit
What's this?
-n turns off normal output
4p prints the 4th row
-d, makes cut use , as delimiter
-f3 makes cut print the 3rd field
One way using awk:
awk -F, 'NR==4 { print $3 }' file.txt
Use the following:
awk -F ',' 'NR==4 {print $3}'
perl alternative to print element 3 on line 4 in a csv file:
perl -F, -lane 'print $F[2] if $. == 4' somefile.csv
