Replacing column 2 in original with column 2 in new file

I have a file containing thousands of original results and a file containing hundreds of new results. Only column 2 of the new file differs from the original. I also need to keep original results that haven't been changed. How should I go about doing this? Is it possible to create a file3 containing the original results which did not change plus the new results? See below for an example.
Original   New     file3
1:1:1      2:5:2   1:1:1
2:2:2      3:4:3   2:5:2
3:3:3      5:9:5   3:4:3
4:4:4      6:8:6   4:4:4
5:5:5              5:9:5
6:6:6              6:8:6
7:7:7              7:7:7

awk
awk -F':' '{a[$1]=$0}END{for(i in a) print a[i]}' Original_file new_file | sort
Original_file new_file - read both files, the original first and then the new one.
For each line of each file:
1) -F':' - use : as the field separator.
2) a[$1]=$0 - build an associative array whose key is the first column and whose value is the whole line; if the key already exists, the entry is overwritten with the new value.
3) for(i in a) print a[i] - print the array values.
4) sort - sort the results into order.
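With the sample files above, the output reproduces the file3 column of the example:
1:1:1
2:5:2
3:4:3
4:4:4
5:9:5
6:8:6
7:7:7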

You can use the diff command between the old file and the new file.
diff -y Original.txt New.txt
Original          New
1:1:1               1:1:1
2:2:2             | 2:5:2
3:3:3             | 3:4:3
4:4:4               4:4:4
5:5:5             | 5:9:5
6:6:6             | 6:8:6
7:7:7               7:7:7
For each line, if it contains the character "|", use awk to take the value from the new file's side. Otherwise take the value from either side, since both are equal.
Try something like this:
number_of_lines_pipe=$(diff -y Original.txt New.txt | grep -e "|" | wc -l)
number_of_lines_without_pipe=$(diff -y Original.txt New.txt | grep -v "|" | wc -l)
for ((i = 1; i <= $number_of_lines_pipe; i++))
do
    line=$(diff -y Original.txt New.txt | grep -e "|" | sed -n $i'p')
    echo "$line" | awk -F"|" '{ print $2 }' | sed 's/\t *//' >> File3.log
done
for ((i = 1; i <= $number_of_lines_without_pipe; i++))
do
    line=$(diff -y Original.txt New.txt | grep -v "|" | sed -n $i'p')
    echo "$line" | awk -F" " '{ print $1 }' >> File3.log
done
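The same idea in a single pass over the diff output (a sketch; it assumes every key exists in both files, so there are no < or > lines, and that the data contains no embedded whitespace). Unlike the two loops above, it also preserves the original line order:
diff -y Original.txt New.txt | awk '/\|/ { print $NF; next } { print $1 }' > File3.log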

Related

Shell script which reads each line of a CSV file and counts the number of columns in each row

I want a script that reads each row of a CSV file called sample.csv, counts the number of fields in each row, and, if the number is more than a threshold (here, 14), stores the whole line, or just two fields of that line, in another file (Hello.bsd). The script I wrote is below:
while read -r line
do
    echo "$line" > tmp.kk
    count= $(awk -F, '{ print NF; exit }' ~/tmp.kk)
    if [ "$count" -gt 14 ]; then
        field1=$(echo "$line" | awk -F',' '{printf "%s", $1}' | tr -d ',')
        field2=$(echo "$line" | awk -F',' '{printf "%s", $2}' | tr -d ',')
        echo "$field1 $field2" >> Hello.bsd
    fi
done < ~/sample.csv
There is no output from the above code.
I would be grateful if you could help me in this regard.
Best regards,
sina
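As an aside, the script above likely prints nothing for two reasons: the space in count= $(...) makes the shell run awk's output as a command instead of assigning it (so count stays empty and the -gt test never succeeds), and the line is written to ./tmp.kk but read back from ~/tmp.kk. A minimal corrected sketch that also skips the temporary file:
while read -r line
do
    count=$(echo "$line" | awk -F, '{ print NF; exit }')   # note: no space after '='
    if [ "$count" -gt 14 ]; then
        echo "$line" | awk -F',' '{ print $1, $2 }' >> Hello.bsd
    fi
done < ~/sample.csv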
FOR JUST FIRST 2 FIELDS
mawk 'NF=(_=(+__<NF))+_' FS=',' __="14" sample.csv # enter constant or shell variable for the threshold
SAMPLE OUTPUT (here the shell variable a holds the contents of sample.csv):
echo "${a}"
04z,Y7N,=TT,WLq,n54,cb8,qfy,LLG,ria,hIQ,Mmd,8N2,FK=,7a9,
us6,ck6,LvI,tnY,CQm,wBp,gPH,8ly,JAH,Phv,uwm,x1r,MF1,ide,
03I,GEs,Mok,BxK,z2D,IUH,VWn,Zb7,TkP,Ddt,RE9,mv2,XyD,tr5,
A2t,u0z,MLi,3RF,es1,goz,G0S,l=h,8Ka,coN,vHP,snk,tTV,xNF,
RiU,yBI,QrS,N6D,fWG,oOr,CwZ,9lb,f8h,g5I,c1u,D3X,kOo,lKG,
CSj,da4,Y54,S7R,AEj,Vqx,Fem,sqn,l4Z,YEA,OKe,6Bu,0xU,hGc,
1X8,jUD,XZM,pMc,Q6V,piz,6jp,SJp,E3W,zgJ,BuW,5wd,qVg,wBy,
TQC,O9k,RJ9,fie,2AV,XZ4,meR,tEC,U7v,JWH,LTs,ngF,3A3,ZPa,
ONJ,Phw,jrp,UvY,9Kb,qxf,57f,yHo,a0Q,2S=,=Ob,l1b,XjC
echo "${a}" | mawk 'NF=(_=(+__<NF))+_' FS=',' __="14"
04z Y7N
us6 ck6
03I GEs
A2t u0z
RiU yBI
CSj da4
1X8 jUD
TQC O9k
Note that the last line didn't print because it didn't meet the NF threshold (it has only 13 fields).
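For comparison, a more conventional awk spelling of the same condition (a sketch; it prints the first two fields of any row with more than 14 fields):
awk -F',' 'NF > 14 { print $1, $2 }' sample.csv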

Convert field names to lower case using miller

I would like to use miller (mlr) to convert column names to lower case. The closest I get is using the rename verb with a regular expression. \L should change the case, but instead the column names get prefixed with "\L".
I'm using macOS Catalina and miller 5.10.0
echo -e 'A,B,C\n1,2,3' | mlr --csv --opprint rename -r '(.*),\L\1'
prints
\LA \LB \LC
1 2 3
But I would like it to print
a b c
1 2 3
Two example ways:
echo -e 'A,B,C\n1,2,3' | mlr --csv put '
  map inrec = $*;
  $* = {};
  for (oldkey, value in inrec) {
    newkey = tolower(oldkey);
    $[newkey] = value;
  }
'
or
echo -e 'A,B,C\n1,2,3' | mlr --csv -N put -S 'if (NR == 1) {for (k in $*) {$[k] = tolower($[k])}}'
Sometimes, standard tools are easier to use:
echo -e 'A,B,C\n1,2,3' | awk 'NR == 1 {print tolower($0); next} 1'
UPDATE
with Miller:
echo -e 'A,B,C\n1,2,3' |
mlr --csv -N put 'NR == 1 {for (k,v in $*) {$[k] = tolower(v)}}'
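Each of these variants should print the header lowercased and the data row unchanged:
a,b,c
1,2,3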

Bash array values as variables

Is it possible to use array values as variables?
For example, I have this script:
#!/bin/bash
SOURCE=$(curl -k -s $1 | sed 's/{//g;s/}//g;s/,/"\n"/g;s/:/=/g;s/"//g' | awk -F"=" '{ print $1 }')
JSON=$(curl -k -s $1 | sed 's/{//g;s/}//g;s/,/"\n"/g;s/:/=/g;s/"//g' | awk -F"=" '{ print $NF }')
data=$2
readarray -t prot_array <<< "$SOURCE"
readarray -t pos_array <<< "$JSON"
for ((i=0; i<${#prot_array[@]}; i++)); do
echo "${prot_array[i]}" "${pos_array[i]}" | sed 's/NOK/0/g;s/OK/1/g' | grep $2 | awk -F' ' '{ print $2,$3,$4 }'
done
EDIT:
I just added: grep $2 | awk -F' ' '{ print $2,$3,$4 }'
Usage:
./json.sh URL
Sample (very short) output:
DATABASE 1
STATUS 1
I don't want to echo out all the lines; I would like to use DATABASE STATUS as the variable $DATABASE and echo that out.
I just need the DATABASE (or any other) value from the command line.
Is it somehow possible to use something like this?
./json.sh URL $DATABASE
Happy to explain more if needed.
EDIT:
curl output without any formatting etc.:
{
  "VERSION":"R3.1",
  "STATUS":"OK",
  "DATABASES":{
    "READING":"OK"
  },
  "TIMESTAMP":"2017-03-08-16-20-35"
}
Output using script:
VERSION R3.1
STATUS 1
DATABASES 1
TIMESTAMP 2017-03-08-16-21-54
What I want is described above: for example, use DATABASE as the variable $DATABASE and somehow get the value "1".
EDIT:
Random JSON from uconn.edu:
./json.sh https://github.uconn.edu/raw/nam12023/novaLauncher/master/manifest.json
Another:
./json.sh https://gitlab.uwe.ac.uk/dc2-roskilly/angular-qs/raw/master/.npm/nan/2.4.0/package/package.json
Last output begins with:
name nan
version 2.4.0
From command line: ./json.sh URL version
At least it works for me.
I think you want to use jq something like this:
$ curl -k -s "$1" | jq --arg d DATABASES -r '
    "VERSION \(.VERSION)",
    "STATUS \(if .STATUS == "OK" then 1 else 0 end)",
    "DATABASES \(if .[$d].READING == "OK" then 1 else 0 end)",
    "TIMESTAMP \(.TIMESTAMP)"
'
VERSION R3.1
STATUS 1
DATABASES 1
TIMESTAMP 2017-03-08-16-20-35
(I'm probably missing a simpler way to convert a boolean value to an integer.)
Quick explanation:
The ,-separated strings each become a separate output line.
The -r option outputs a raw string, rather than a JSON string value.
The name of the database field is passed using the --arg option.
\(...) is jq's interpolation operator; the contents are evaluated as a JSON expression and the result is inserted into the string.
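To address the original ask directly (pass the field name on the command line, as in ./json.sh URL STATUS), here is a minimal sketch in the same spirit; the OK/NOK-to-1/0 mapping mirrors the sed in the question, and anything else is printed as-is:
curl -k -s "$1" | jq -r --arg k "$2" '
    .[$k] | if . == "OK" then 1 elif . == "NOK" then 0 else . end
'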

create arrays from for loop output

I'm trying to understand what I'm doing wrong here, but can't seem to determine the cause. I would like to create a set of arrays from the output of a for loop in bash. Below is the code I have so far:
for i in `onedatastore list | grep pure02 | awk '{print $1}'`
do
    arr${i}=($(onedatastore show ${i} | sed 's/[A-Z]://' | cut -f2 -d\:)) ;
    echo "Output of arr${i}: ${arr${i}[@]}" ;
done
The output for the condition is as such:
107
108
109
What I want to do is based on these unique IDs is create arrays:
arr107
arr108
arr109
The arrays will have data like such in each:
[oneadmin@opennebula /]$ arr107=($(onedatastore show 107 | sed 's/[A-Z]://' | cut -f2 -d\:))
[oneadmin@opennebula /]$ echo ${arr107[@]}
DATASTORE 107 INFORMATION 107 pure02_vm_datastore_1 oneadmin oneadmin 0 IMAGE vcenter vcenter /var/lib/one//datastores/107 FILE READY DATASTORE CAPACITY 60T 21.9T 38.1T - PERMISSIONS um- u-- --- DATASTORE TEMPLATE CLONE_TARGET="NONE" DISK_TYPE="FILE" DS_MAD="vcenter" LN_TARGET="NONE" RESTRICTED_DIRS="/" SAFE_DIRS="/var/tmp" TM_MAD="vcenter" VCENTER_CLUSTER="CLUSTER01" IMAGES
When I try this in the script section though I get output errors as such:
./test.sh: line 6: syntax error near unexpected token `$(onedatastore show ${i} | sed 's/[A-Z]://' | cut -f2 -d\:)'
I can't seem to figure out the syntax to use on this scenario.
In the end what I want to do is be able to compare different datastores and based on which on has more free space, deploy VMs to it.
Hope someone can help. Thanks
You can use the eval (potentially unsafe) and declare (safer) commands:
for i in $(onedatastore list | grep pure02 | awk '{print $1}')
do
    declare "arr$i=($(onedatastore show ${i} | sed 's/[A-Z]://' | cut -f2 -d\:))"
    eval echo 'Output of arr$i: ${arr'"$i"'[@]}'
done
readarray or mapfile, added in bash 4.0, will read directly into an array:
while IFS= read -r i <&3; do
    readarray -t "arr$i" < <(onedatastore show "$i" | sed 's/[A-Z]://' | cut -f2 -d:)
done 3< <(onedatastore list | awk '/pure02/ {print $1}')
Better, back through bash 3.x, one can use read -a to read to an array:
shopt -s pipefail # cause pipelines to fail if any element does
while IFS= read -r i <&3; do
    IFS=$'\n' read -r -d '' -a "arr$i" \
        < <(onedatastore show "$i" | sed 's/[A-Z]://' | cut -f2 -d: && printf '\0')
done 3< <(onedatastore list | awk '/pure02/ {print $1}')
Alternately, one can use namevars to create an alias for an array with an arbitrarily-named array in bash 4.3:
while IFS= read -r i <&3; do
    declare -a "arr$i"
    declare -n arr="arr$i"
    # this is buggy: expands globs, string-splits on all characters in IFS, etc.
    # ...but, well, it's what the OP is asking for...
    arr=( $(onedatastore show "$i" | sed 's/[A-Z]://' | cut -f2 -d:) )
done 3< <(onedatastore list | awk '/pure02/ {print $1}')
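Once populated, the dynamically named arrays can be read back the same way, via a nameref (bash 4.3+); a small sketch, assuming a datastore ID of 107 was found:
declare -n ds=arr107    # alias 'ds' to the dynamically named array
echo "arr107 has ${#ds[@]} fields; first field: ${ds[0]}"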

How do I echo specific rows and columns from CSVs in a variable?

The below script:
#!/bin/bash
otscurrent="
AAA,33854,4528,38382,12
BBB,83917,12296,96213,13
CCC,20399,5396,25795,21
DDD,27198,4884,32082,15
EEE,2472,981,3453,28
FFF,3207,851,4058,21
GGG,30621,4595,35216,13
HHH,8450,1504,9954,15
III,4963,2157,7120,30
JJJ,51,59,110,54
KKK,87,123,210,59
LLL,573,144,717,20
MMM,617,1841,2458,75
NNN,234,76,310,25
OOO,12433,1908,14341,13
PPP,10627,1428,12055,12
QQQ,510,514,1024,50
RRR,1361,687,2048,34
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
"
IFS="," array1=(${otscurrent})
echo ${array1[4]}
Prints:
$ ./test.sh
12
BBB
I'm trying to get it to just print 12... And I am not even sure how to make it just print row 5 column 4
The variable is an output of a sqlquery that has been parsed with several sed commands to change the formatting to csv.
otscurrent="$(sqlplus64 user/password@dbserverip/db as sysdba @query.sql |
sed '1,11d; /^-/d; s/[[:space:]]\{1,\}/,/g; $d' |
sed '$d'|sed '$d'|sed '$d' | sed '$d' |
sed 's/Used,MB/Used MB/g' |
sed 's/Free,MB/Free MB/g' |
sed 's/Total,MB/Total MB/g' |
sed 's/Pct.,Free/Pct. Free/g' |
sed '1b;/^Name/d' |
sed '/^$/d'
)"
Ultimately I would like to be able to call on a row and column and run statements on the values.
Initially I was piping that into:
awk -F "," 'NR>1{ if($5 < 10) { printf "%-30s%-10s%-10s%-10s%-10s\n", $1,$2,$3,$4,$5"%"; } else { echo "Nothing to do" } }')"
Which works, but I couldn't run commands from the if/else... or at least I didn't know how.
If you have bash 4.0 or newer, an associative array is an appropriate way to store data in this kind of form.
otscurrent=${otscurrent#$'\n'}  # strip leading newline present in your sample data
declare -A data=( )
row=0
while IFS=, read -r -a line; do
    for idx in "${!line[@]}"; do
        data["$row,$idx"]=${line[$idx]}
    done
    (( row += 1 ))
done <<<"$otscurrent"
This lets you access each individual item:
echo "${data[0,0]}" # first field of first line
echo "${data[9,0]}" # first field of tenth line
echo "${data[9,1]}" # second field of tenth line
"I'm trying to get it to just print 12..."
The issue is that IFS="," splits on commas only, and there is no comma between 12 and BBB (they are separated by a newline). If you want those to be separate elements, add a newline to IFS. Thus, replace:
IFS="," array1=(${otscurrent})
With:
IFS=$',\n' array1=(${otscurrent})
Output:
$ bash test.sh
12
All you need to print the value of the 4th column on the 5th row is:
$ awk -F, 'NR==5{print $4}' <<< "$otscurrent"
3453
and just remember that in awk row (record) and column (field) numbers start at 1, not 0. Some more examples:
$ awk -F, 'NR==1{print $5}' <<< "$otscurrent"
12
$ awk -F, 'NR==2{print $1}' <<< "$otscurrent"
BBB
$ awk -F, '$5 > 50' <<< "$otscurrent"
JJJ,51,59,110,54
KKK,87,123,210,59
MMM,617,1841,2458,75
SSS,1,24,25,96
TTT,0,5,5,100
UUU,294,1606,1900,85
If you'd like to avoid all of this complexity and simply parse your SQL output to produce what you want, without 20 sed commands in between, post a new question showing the raw sqlplus output as the input and the output you finally want, and someone will post a brief, clear, simple, efficient awk script to do it all at one time, or maybe two commands if you still want an intermediate CSV for some reason.
