I've got a .dat file:
#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed
933|Mahinda|Perera|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.123|Firefox
1129|Carmen|Lepland|female|1984-02-18|2010-02-28T04:39:58.781+0000|81.25.252.111|Internet Explorer
4194|Hồ Chí|Do|male|1988-10-14|2010-03-17T22:46:17.657+0000|103.10.89.118|Internet Explorer
8333|Chen|Wang|female|1980-02-02|2010-03-15T10:21:43.365+0000|1.4.16.148|Internet Explorer
8698|Chen|Liu|female|1982-05-29|2010-02-21T08:44:41.479+0000|14.103.81.196|Firefox
8853|Albin|Monteno|male|1986-04-09|2010-03-19T21:52:36.860+0000|178.209.14.40|Internet Explorer
10027|Ning|Chen|female|1982-12-08|2010-02-22T17:59:59.221+0000|1.2.9.86|Firefox
I want to take firstName, lastName and birthday from a specific line by id.
Example: if the input is 933, I want to extract (separated by space):
Mahinda Perera 1989-12-03
This should do it:
#!/bin/sh
id="$1"
awk -F '|' -v ID="$id" '($1==ID){print $2, $3, $5}' infile
Use as:
$ script.sh 933
Mahinda Perera 1989-12-03
awk -F'|' '$1 == 933 {print $2, $3, $5}' file
Mahinda Perera 1989-12-03
If field one equals 933, print fields 2, 3 and 5. (A regex test such as $1 ~ /933/ would also match ids that merely contain 933, for example 9331.)
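The difference between a regex test and an exact comparison on the id field can be checked directly. A minimal sketch, using a made-up two-line sample (the 9331 row is hypothetical and not part of the question's data):

```shell
# Hypothetical sample: the second id merely contains "933".
sample='933|Mahinda|Perera|male|1989-12-03
9331|Test|User|male|1990-01-01'

# Regex match: selects both rows, because 9331 contains 933.
regex_hits=$(printf '%s\n' "$sample" | awk -F'|' '$1 ~ /933/ {print $2, $3, $5}')

# Exact comparison: selects only the row whose id is 933.
exact_hits=$(printf '%s\n' "$sample" | awk -F'|' -v id=933 '$1 == id {print $2, $3, $5}')

printf 'regex:\n%s\nexact:\n%s\n' "$regex_hits" "$exact_hits"
```

Anchoring the regex as /^933$/ would also work, but passing the id with -v and comparing with == avoids having to escape anything in the id.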
With GNU sed:
$ id=933
$ sed -n "/^$id"'|/{s/^[^|]*|\([^|]*\)|\([^|]*\)|[^|]*|\([^|]*\)|.*$/\1 \2 \3/p;q}' infile
Mahinda Perera 1989-12-03
-n prevents printing; the rest does this:
"/^$id"'|/ { # If a line starts with the ID... (notice quoting for parameter expansion)
# Capture second, third and fifth field, discard rest, print
s/^[^|]*|\([^|]*\)|\([^|]*\)|[^|]*|\([^|]*\)|.*$/\1 \2 \3/p
q # Quit to avoid processing the rest of the file for nothing
}'
To get this to run under BSD sed, there has to be another semicolon between the q and the closing brace.
Goes to show that awk is much better suited for the problem.
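The early exit that q gives sed is available in awk through exit. A minimal sketch, assuming the pipe-delimited layout from the question; the function name lookup_person is made up for illustration:

```shell
# lookup_person ID FILE: print firstName lastName birthday for the first
# row whose id field equals ID, then stop reading (like q in sed).
lookup_person() {
    awk -F'|' -v id="$1" '$1 == id { print $2, $3, $5; exit }' "$2"
}

# Example usage with a temporary copy of a few rows from the question.
tmp=$(mktemp)
printf '%s\n' \
    '#id|firstName|lastName|gender|birthday|creationDate|locationIP|browserUsed' \
    '933|Mahinda|Perera|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.123|Firefox' \
    '1129|Carmen|Lepland|female|1984-02-18|2010-02-28T04:39:58.781+0000|81.25.252.111|Internet Explorer' \
    > "$tmp"
lookup_person 1129 "$tmp"
rm -f "$tmp"
```

The header line needs no special handling here, because its first field (#id) never equals a numeric id.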
Use:
awk -F'|' -v id=933 '$1 == id {print $2, $3, $5}' infile.dat
I have a csv file in that values are like:
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
I want to split each string just before the colour name. For example:
str1=Wt-Do-U-Do-Wit-The
str1=Yay-Its-Your-Birthday
str1=You-Are-My-Sunshine-Happy-Birthday
str1=You-Are-Special
str1=You-Dont-Look-A-Day-Over-Fabulous
str1=You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday
To search the string I am using:
if [ "$string" == *"Black"* ] && [ "$string" == *"White"* ] ; then
echo "It's there!"
else
echo "SOrry"
fi
The search works fine, but how can I split the string?
Another approach I tried:
colour_arr[0]='Red'
colour_arr[1]='Black'
colour_arr[2]='Navy-Blue'
colour_arr[3]='White'
inarray=$(echo ${colour_arr[@]} | grep -o "$string" | wc -w)
echo "$inarray"
But this is not working.
You can use sed (inspired by this answer).
I simplified the problem a little, since you already parse the strings correctly, and used this input file:
This is red colour
Ball is black colour
some more words before red and more after
For the second part of the string, starting with the colour name:
sed -n -e 's/^.*\(\(red\|black\).*\)/\1/p' test
gives:
red colour
black colour
red and more after
and
sed -n -e 's/\(^.*\)\(\(red\|black\).*\)/\1/p' test
gives:
This is
Ball is
some more words before
I won't explain all the options, since they are well explained in the answer I referred to. You can apply sed to a bash variable with a here-string (quote the variable so its whitespace is preserved):
leftpart=$(sed -n -e 's/\(^.*\)\(\(red\|black\).*\)/\1/p' <<< "$INPUT_STRING")
EDIT after the OP changed the input format:
my answer still applies; just replace red with Red. The rest applies the same.
For your new input
Input
$ cat f2
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red S
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
Output (using gawk)
$ awk 'BEGIN{IGNORECASE=1;FS="[ ,]";OFS=","}match($1,$2){print "str1="substr($1,1,RSTART-2)}' f2
str1=Wt-Do-U-Do-Wit-The
str1=Yay-Its-Your-Birthday
str1=You-Are-My-Sunshine-Happy-Birthday
str1=You-Are-Special
str1=You-Dont-Look-A-Day-Over-Fabulous
str1=You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday
For your old input
Input
$ cat f
"This is red colour",red
"Ball is black colour",black
"Tshirt is white colour",white
"Shoes are blue colour",blue
"This is green colour",green
Output
$ awk 'BEGIN{FS=OFS=","}{gsub(/"/,"");match($1,$2);print "str1="substr($1,1,RSTART-1),"str2=" substr($1,RSTART) }' f
str1=This is ,str2=red colour
str1=Ball is ,str2=black colour
str1=Tshirt is ,str2=white colour
str1=Shoes are ,str2=blue colour
str1=This is ,str2=green colour
One-liner using awk (GNU awk is needed for IGNORECASE)
awk -F ',' 'BEGIN{IGNORECASE=1}{sub("-"$NF"$","",$1);print "str1="$1}' YourFile
Self-commented code
awk -F ',' '# the field separator is a comma
# before the first line
BEGIN{
    # make comparisons case-insensitive
    IGNORECASE=1
}
# for each line
{
    # remove the pattern (a dash followed by the content of the last field,
    # i.e. the colour, at the end of the string) from field 1
    sub( "-" $NF "$", "", $1)
    # print the new content of field 1, prefixed with str1=
    print "str1="$1
}' YourFile
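IGNORECASE exists only in GNU awk. Where another awk is in use, the same result can be approximated with tolower and substr. A minimal sketch, assuming each line has the form name,colour; strip_colour is a name introduced here for illustration:

```shell
# strip_colour [FILE...]: for each "name,colour" line, drop a trailing
# "-colour" from name (compared case-insensitively) and print "str1=<rest>".
strip_colour() {
    awk -F, '{
        name = $1; colour = $NF
        n = length(name); c = length(colour)
        # Compare the last c+1 characters of name with "-colour", ignoring case.
        if (n > c + 1 && tolower(substr(name, n - c)) == ("-" tolower(colour)))
            name = substr(name, 1, n - c - 1)
        print "str1=" name
    }' "$@"
}

printf '%s\n' 'You-Are-Special-Navy-Blue,navy-blue' 'Wt-Do-U-Do-Wit-The-Black,black' | strip_colour
```

Because the comparison is a plain string test rather than a regex, characters in the colour name that are special in regular expressions need no escaping.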
Based on your comments, you need the colour from the first (dash-separated) field, not the value of the second (comma-separated) field.
If the colour is always the last dash-separated word of that first field, you can simply use:
a="You-Are-My-Sunshine-Happy-Birthday-Red" ; awk -F- '{print $NF}' <<<"$a"
PS: You can isolate the first field of the whole line with cut or awk (note the capital -F; lowercase -f would name a program file):
awk -F, '{print $1}' <<<"$fileline" or cut -d, -f1 <<<"$fileline"
You can combine above two to achieve what you need.
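Both steps can also be done without external tools, using POSIX parameter expansion. A minimal sketch, under the same assumption that the colour is the last dash-separated word of the first field; the variable names are illustrative:

```shell
fileline='You-Are-My-Sunshine-Happy-Birthday-Red,red'

first=${fileline%%,*}     # everything before the first comma
colour=${first##*-}       # text after the last dash
str1=${first%-*}          # text before the last dash

printf 'str1=%s colour=%s\n' "$str1" "$colour"
```

A two-word colour such as Navy-Blue would be cut at its inner dash, so this only fits data where the colour is a single word.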
Keep it simple:
$ cat input.txt
Wt-Do-U-Do-Wit-The-Black,black
Yay-Its-Your-Birthday-Black,black
You-Are-My-Sunshine-Happy-Birthday-Red,red
You-Are-Special-Navy-Blue,navy-blue
You-Dont-Look-A-Day-Over-Fabulous-Green,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday-Pink,pink
$ sed -E 's/(-[^-]+)(,.*)/\2/g' input.txt
Wt-Do-U-Do-Wit-The,black
Yay-Its-Your-Birthday,black
You-Are-My-Sunshine-Happy-Birthday,red
You-Are-Special-Navy,navy-blue
You-Dont-Look-A-Day-Over-Fabulous,green
You-My-Friend-Are-Ridiculously-Fabulous-Happy-Birthday,pink
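(Note: on my OS, OS X, sed -E enables extended regular expressions. The command removes only the last dash-separated word, so the two-word colour Navy-Blue leaves Navy behind.)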
The below script:
#!/bin/bash
otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
FFF,3207,851,4058,21
GGG,30621,4595,35216,13
HHH,8450,1504,9954,15
III,4963,2157,7120,30
JJJ,51,59,110,54
KKK,87,123,210,59
LLL,573,144,717,20
MMM,617,1841,2458,75
NNN,234,76,310,25
OOO,12433,1908,14341,13
PPP,10627,1428,12055,12
QQQ,510,514,1024,50
RRR,1361,687,2048,34
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
"
IFS="," array1=(${otscurrent})
echo ${array1[4]}
Prints:
$ ./test.sh
12
BBB
I'm trying to get it to print just 12, and I am not even sure how to make it print just row 5, column 4.
The variable is the output of a SQL query that has been passed through several sed commands to convert it to CSV.
otscurrent="$(sqlplus64 user/password@dbserverip/db as sysdba @query.sql |
sed '1,11d; /^-/d; s/[[:space:]]\{1,\}/,/g; $d' |
sed '$d'|sed '$d'|sed '$d' | sed '$d' |
sed 's/Used,MB/Used MB/g' |
sed 's/Free,MB/Free MB/g' |
sed 's/Total,MB/Total MB/g' |
sed 's/Pct.,Free/Pct. Free/g' |
sed '1b;/^Name/d' |
sed '/^$/d'
)"
Ultimately I would like to be able to pick a row and column and run statements on the values.
Initially I was piping that into:
awk -F "," 'NR>1{ if($5 < 10) { printf "%-30s%-10s%-10s%-10s%-10s\n", $1,$2,$3,$4,$5"%"; } else { echo "Nothing to do" } }')"
Which works, but I couldn't run commands from the if/else, or at least I didn't know how.
If you have bash 4.0 or newer, an associative array is an appropriate way to store data in this kind of form.
otscurrent=${otscurrent#$'\n'} # strip leading newline present in your sample data
declare -A data=( )
row=0
while IFS=, read -r -a line; do
for idx in "${!line[@]}"; do
data["$row,$idx"]=${line[$idx]}
done
(( row += 1 ))
done <<<"$otscurrent"
This lets you access each individual item:
echo "${data[0,0]}" # first field of first line
echo "${data[9,0]}" # first field of tenth line
echo "${data[9,1]}" # second field of tenth line
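For reuse, the loop can be wrapped in a function that also records the table size. A sketch, assuming bash 4.0 or newer; load_table, nrows and ncols are names introduced here for illustration:

```shell
declare -A data=()
nrows=0
ncols=0

# load_table: read comma-separated lines from stdin into data["row,col"]
# (both zero-based), skipping empty lines.
load_table() {
    local -a line
    local idx
    while IFS=, read -r -a line; do
        if (( ${#line[@]} == 0 )); then
            continue                      # blank line, nothing to store
        fi
        for idx in "${!line[@]}"; do
            data["$nrows,$idx"]=${line[idx]}
        done
        if (( ${#line[@]} > ncols )); then
            ncols=${#line[@]}
        fi
        nrows=$(( nrows + 1 ))
    done
}

load_table <<'EOF'

AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
EOF

echo "${data[0,4]}"                  # fifth field of the first row
echo "$nrows rows, $ncols columns"
```

Feeding the function from a here-document or here-string, rather than a pipe, keeps it in the current shell so that the array is still populated afterwards.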
"I'm trying to get it to just print 12..."
The issue is that IFS="," splits on commas and there is no comma between 12 and BBB. If you want those to be separate elements, add a newline to IFS. Thus, replace:
IFS="," array1=(${otscurrent})
With:
IFS=$',\n' array1=(${otscurrent})
Output:
$ bash test.sh
12
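With every value in one flat array, row r and column c (both counted from 1) of a table with n columns sit at index (r-1)*n + (c-1). A sketch of that arithmetic, assuming bash and five columns per row as in the sample; cell_flat is a made-up helper, and declaring IFS local keeps the change from leaking into the rest of the script:

```shell
# cell_flat ROW COL NCOLS DATA: print one cell of comma/newline-separated DATA.
cell_flat() {
    local IFS=$',\n'      # split on commas and newlines, only inside this function
    local -a flat
    set -f                # no filename globbing during the unquoted expansion
    flat=($4)
    set +f
    echo "${flat[($1 - 1) * $3 + ($2 - 1)]}"
}

otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
"

cell_flat 1 5 5 "$otscurrent"   # row 1, column 5
cell_flat 5 4 5 "$otscurrent"   # row 5, column 4
```

This only works while every row has the same number of fields and no field is empty; otherwise the index arithmetic drifts.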
All you need to print the value of the 4th column on the 5th row is:
$ awk -F, 'NR==5{print $4}' <<< "$otscurrent"
3453
and just remember that in awk row (record) and column (field) numbers start at 1, not 0. Some more examples:
$ awk -F, 'NR==1{print $5}' <<< "$otscurrent"
12
$ awk -F, 'NR==2{print $1}' <<< "$otscurrent"
BBB
$ awk -F, '$5 > 50' <<< "$otscurrent"
JJJ,51,59,110,54
KKK,87,123,210,59
MMM,617,1841,2458,75
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
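To run shell commands depending on a value, one option is to let awk do the selection and have the shell act on what it prints. A minimal sketch, assuming the five-column CSV above with the percentage in column 5; the function name rows_below and the echo standing in for a real command are illustrative:

```shell
# rows_below THRESHOLD: read CSV on stdin and print "name pct" for every
# row whose fifth column is numerically below THRESHOLD.
rows_below() {
    awk -F, -v t="$1" 'NF >= 5 && $5 + 0 < t + 0 { print $1, $5 }'
}

otscurrent='AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
SSS,1,24,25,96'

printf '%s\n' "$otscurrent" | rows_below 13 | while read -r name pct; do
    # Replace this echo with whatever command should run per matching row.
    echo "ALERT: $name has only $pct% free"
done
```

The awk part stays a pure filter, and the if/else logic that needs to start other programs lives in the shell loop.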
If you'd like to avoid all of that complexity and parse your SQL output directly, without 20 sed commands in between, post a new question showing the raw sqlplus output as the input and the final output you want. Someone will post a brief, clear, efficient awk script that does it all in one pass, or perhaps in two commands if you still want an intermediate CSV for some reason.
I am very new to Unix shell scripting and am trying to learn it. Please check my requirement and my approach.
I have an input file with this data:
ABC = A:3 E:3 PS:6
PQR = B:5 S:5 AS:2 N:2
I am trying to parse the data and get the result as
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
The values can be added horizontally and vertically so I am trying to use an array. I am trying something like this:
myarr=(main.conf | awk -F"=" 'NR!=1 {print $1}'))
echo ${myarr[1]}
# Or loop through every element in the array
for i in "${myarr[@]}"
do
:
echo $i
done
or
awk -F"=" 'NR!=1 {
print $1"\n"
STR=$2
IFS=':' read -r -a array <<< "$STR"
for i in "${!array[@]}"
do
echo "$i=>${array[i]}"
done
}' main.conf
But when I add this code to a .sh file and try to run it, I get syntax errors as
$ awk -F"=" 'NR!=1 {
> print $1"\n"
> STR=$2
> FS= read -r -a array <<< "$STR"
> for i in "${!array[@]}"
> do
> echo "$i=>${array[i]}"
> done
>
> }' main.conf
awk: cmd. line:4: FS= read -r -a array <<< "$STR"
awk: cmd. line:4: ^ syntax error
awk: cmd. line:5: for i in "${!array[@]}"
awk: cmd. line:5: ^ syntax error
awk: cmd. line:8: done
awk: cmd. line:8: ^ syntax error
How can I get the expected output?
This is the awk code to do what you want:
$ cat tst.awk
BEGIN { FS="[ =:]+"; OFS="=" }
{
print $1
for (i=2;i<NF;i+=2) {
print $i, $(i+1)
}
print ""
}
and this is the shell script (yes, all a shell script does to manipulate text is call awk):
$ awk -f tst.awk file
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
A UNIX shell is an environment from which to call UNIX tools (find, sort, sed, grep, awk, tr, cut, etc.). It has its own language for manipulating (e.g. creating/destroying) files and processes and sequencing calls to tools but it is NOT intended to be used to manipulate text. The guys who invented shell also invented awk for shell to call to manipulate text.
Read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice and the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
First off, a command that does what you want:
$ sed 's/ = /\n/;y/: /=\n/' main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
This replaces, on each line, the first (and only) occurrence of " = " (an equals sign with a space on each side) with a newline (the s command), then turns all : into = and all spaces into newlines (the y command). Notice that
this works only because there is a space at the end of the first line (otherwise it would be a bit more involved to get the empty line between the blocks) and
this works only with GNU sed because it substitutes newlines; see this fantastic answer for all the details and how to get it to work with BSD sed.
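Where GNU sed is not available, the same three replacements can be written in awk, which accepts \n in replacement strings. A sketch under the same assumption as the sed version, namely that the blank line between blocks comes from a trailing space on the input line; kv_lines is a name made up for illustration:

```shell
# kv_lines [FILE...]: turn "ABC = A:3 E:3" into ABC, A=3, E=3 on separate lines.
kv_lines() {
    awk '{
        sub(/ = /, "\n")     # the first " = " becomes a newline
        gsub(/:/, "=")       # every ":" becomes "="
        gsub(/ /, "\n")      # every remaining space becomes a newline
        print
    }' "$@"
}

printf '%s\n' 'ABC = A:3 E:3 PS:6' 'PQR = B:5 S:5 AS:2 N:2' | kv_lines
```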
As for what you tried, there is almost too much wrong with it to try and fix it piece by piece: from the wild mixing of awk and Bash to syntax errors all over the place. I recommend you read good tutorials for both, for example:
The BashGuide
Effective AWK Programming
A Bash solution
Here is a way to solve the same in Bash; I didn't use any arrays.
#!/bin/bash
# Read line by line into the 'line' variable. Setting 'IFS' to the empty string
# preserves leading and trailing whitespace; '-r' prevents interpretation of
# backslash escapes
while IFS= read -r line; do
# Three parameter expansions:
# Replace ' = ' by newline (escape backslash)
line="${line/ = /\\n}"
# Replace ':' by '='
line="${line//:/=}"
# Replace spaces by newlines (escape backslash)
line="${line// /\\n}"
# Print the modified input line; '%b' expands backslash escapes
printf "%b" "$line"
done < "$1"
Output:
$ ./SO.sh main.conf
ABC
A=3
E=3
PS=6
PQR
B=5
S=5
AS=2
N=2
I have a sorted, delimited type file and I want to extract a specific field in specific line.
This is my input file: somefile.csv
efevfe,132143,27092011080210,howdy,hoodie
adfasdfs,14321,27092011081847,howdy,hoodie
gerg,7659876,27092011084604,howdy,hoodie
asdjkfhlsdf,7690876,27092011084688,howdy,hoodie
alfhlskjhdf,6548,27092011092413,howdy,hoodie
gerg,769,27092011092415,howdy,hoodie
badfa,124314,27092011092416,howdy,hoodie
gfevgreg,1213421,27092011155906,howdy,hoodie
I want to extract 27092011084688 (value from 4th line, 3rd column).
I used awk 'NR==4' but it gave me whole 4th line.
Fairly straightforward:
awk -F',' 'NR == 4 { print $3 }' somefile.csv
Using , as a field separator, take record number 4 and print field 3 in somefile.csv.
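The same idea generalises to any line, field and delimiter. A minimal sketch; field_at is a hypothetical helper name, and the exit stops awk from reading the rest of the file once the line has been printed:

```shell
# field_at LINE FIELD DELIM FILE: print one field of one line, then stop.
field_at() {
    awk -F"$3" -v row="$1" -v col="$2" 'NR == row { print $col; exit }' "$4"
}

# Example usage with the first four lines of somefile.csv from the question.
tmp=$(mktemp)
printf '%s\n' \
    'efevfe,132143,27092011080210,howdy,hoodie' \
    'adfasdfs,14321,27092011081847,howdy,hoodie' \
    'gerg,7659876,27092011084604,howdy,hoodie' \
    'asdjkfhlsdf,7690876,27092011084688,howdy,hoodie' \
    > "$tmp"
field_at 4 3 , "$tmp"    # prints 27092011084688
rm -f "$tmp"
```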
$ sed -n "4p" somefile.csv | cut -d, -f3
Edit
What's this?
-n turns off normal output
4p prints the 4th row
-d, makes cut use , as delimiter
-f3 makes cut print the 3rd field
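One refinement worth considering: sed -n 4p keeps reading to the end of the file. Adding q after the print stops at the requested line, which matters for large inputs. A minimal sketch; line_field is a made-up name, and the braces-and-semicolons form is used because both GNU and BSD sed accept it:

```shell
# line_field LINE FIELD FILE: print one comma-separated field of one line,
# quitting sed as soon as that line has been printed.
line_field() {
    sed -n "$1{p;q;}" "$3" | cut -d, -f"$2"
}

# Example usage with a small temporary file.
tmp=$(mktemp)
printf '%s\n' 'a,1,x' 'b,2,y' 'c,3,z' > "$tmp"
line_field 2 3 "$tmp"    # prints y
rm -f "$tmp"
```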
One way using awk:
awk -F, 'NR==4 { print $3 }' file.txt
Use the following:
awk -F ',' 'NR==4 {print $3}' somefile.csv
perl alternative to print element 3 on line 4 in a csv file:
perl -F, -lane 'print $F[2] if $. == 4' somefile.csv
I need to assign the results from a grep to an array... for example
grep -n "search term" file.txt | sed 's/:.*//'
This resulted in a bunch of lines with line numbers in which the search term was found.
1
3
12
19
What's the easiest way to assign them to a bash array? If I simply assign them to a variable they become a space-separated string.
To assign the output of a command to an array, use a command substitution inside an array assignment. For a general command, written here as command, this looks like:
arr=( $(command) )
In the example of the OP, this would read:
arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
The inner $() runs the command, while the outer () turns the output into an array. The problem is that this breaks when a line of the command's output contains spaces, because each word becomes its own element. To handle this, you can set IFS to \n (note that, written this way, IFS stays changed for the rest of the script):
IFS=$'\n' arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
You can also cut out the need for sed by performing an expansion on each element of the array:
arr=($(grep -n "search term" file.txt))
arr=("${arr[@]%%:*}")
Space-separated strings are easily traversable in bash.
# save the output
output=$(grep -n "search term" file.txt | sed 's/:.*//')
# iterate with for
for x in $output; do echo $x; done;
# awk
echo $output | awk '{for(i=1;i<=NF;i++) print $i;}'
# convert to an array
ar=($output)
echo ${ar[3]} # echoes the 4th element
If you are worried about spaces in file names, use find . -printf "\"%p\"\n"
@Charles Duffy linked the Bash anti-pattern docs in a comment, and those give the most correct answer:
readarray -t arr < <(grep -n "search term" file.txt | sed 's/:.*//')
His comment:
Note that array=( $(command) ) is considered an antipattern, and is the topic of BashPitfalls #50. – Charles Duffy Nov 16, 2020 at 14:07
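A small end-to-end sketch of the readarray approach, assuming bash 4.0 or newer; the temporary file and its contents are invented for the demonstration, and cut replaces sed here only to show an equivalent way of keeping the line number:

```shell
# Hypothetical sample file for the demonstration.
tmp=$(mktemp)
printf '%s\n' 'a search term here' 'nothing to see' 'search term again' > "$tmp"

# grep -n prints "LINE:text"; cut keeps only the line number.
# readarray -t stores each output line as one element, without its newline.
readarray -t arr < <(grep -n "search term" "$tmp" | cut -d: -f1)

echo "matches: ${#arr[@]}"    # number of elements
echo "first:   ${arr[0]}"     # line number of the first match
rm -f "$tmp"
```

Unlike arr=( $(command) ), this does no word splitting or globbing on the output and leaves IFS untouched.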