How can I put CSV files in an array in bash? - arrays

So I need to put the content of some of the columns of a CSV file into an array so I can operate on them.
My File looks like this:
userID,placeID,rating,food_rating,service_rating
U1077,135085,2,2,2
U1077,135038,2,2,1
U1077,132825,2,2,2
U1077,135060,1,2,2
U1068,135104,1,1,2
U1068,132740,0,0,0
U1068,132663,1,1,1
U1068,132732,0,0,0
U1068,132630,1,1,1
U1067,132584,2,2,2
U1067,132733,1,1,1
U1067,132732,1,2,2
U1067,132630,1,0,1
U1067,135104,0,0,0
U1067,132560,1,0,0
U1103,132584,1,2,1
U1103,132732,0,0,2
U1103,132630,1,2,0
U1103,132613,2,2,2
U1103,132667,1,2,2
U1103,135104,1,2,0
U1103,132663,1,0,2
U1103,132733,2,2,2
U1107,132660,2,2,1
U1107,132584,2,2,2
U1107,132733,2,2,2
U1044,135088,2,2,2
U1044,132583,1,2,1
U1070,132608,2,2,1
U1070,132609,1,1,1
U1070,132613,1,1,0
U1031,132663,0,0,0
U1031,132665,0,0,0
U1031,132668,0,0,0
U1082,132630,1,1,1
and I want to get the placeID and save it in an array, and in the same position also put the ratings. What I need to do is get an average rating for every placeID.
I have been trying something like
cut -d"," -f2 FileName >> var[#]

Hard to accomplish in bash but pretty straightforward in awk:
awk -F',' 'NR>1 {sum[$2] += $3; count[$2]++}; END{ for (id in sum) { print id, sum[id]/count[id] } }' file.csv
Explanation: -F sets the field separator, and you want field 2 and the average of field 3. We accumulate a sum and a count per ID for every row except the first one (NR>1 skips the header), and at the end we print each unique ID and its average.
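If you really want the data in bash arrays, here is a minimal sketch (not from the original answer) assuming bash 4+ for associative arrays and the same file.csv as above:
declare -A sum count
while IFS=',' read -r user place rating food service; do
    [[ $user == userID ]] && continue    # skip the header row
    (( sum[$place] += rating ))          # accumulate the rating per placeID
    (( count[$place]++ ))                # count ratings per placeID
done < file.csv
for place in "${!sum[@]}"; do
    # bash arithmetic is integer-only, so delegate the division to awk
    awk -v s="${sum[$place]}" -v c="${count[$place]}" -v p="$place" \
        'BEGIN { printf "%s %.2f\n", p, s / c }'
done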

Related

awk compares two column of the same file and print the output in a new column

I am trying to compare values from two columns in a txt file in order to identify whether values of the first are greater than values of the second, and then I would like to have a third column saying "here is major" or "here is less".
here an example:
col 1 col 2
0.19727856043434921 0.3970115896012234
0.23832706720231472 0.25380634765625
0.12105356101624322 0.3748512347530257
0.28048626091230955 0.05645118584277009
I would like to have something like
col 1 col 2
0.19727856043434921 0.3970115896012234 here is less
0.23832706720231472 0.25380634765625 here is less
0.12105356101624322 0.3748512347530257 here is less
0.28048626091230955 0.05645118584277009 here is major
I tried to use awk as follows, but it is not producing the output I expect:
awk 'OFS="\t" {if( $1>$2) print "here is major"; else if( $1<$2) print "here is less"} ' file.txt > file2.txt
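For what it's worth, a minimal sketch of a repair (not from this thread): print the whole line first, then append the verdict. awk compares the fields numerically here since both look like numbers; note that ties fall into the "here is less" branch:
awk 'BEGIN { OFS = "\t" }
     { print $0, ($1 > $2 ? "here is major" : "here is less") }' file.txt > file2.txt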

Using awk or shell to total values in a delimited file

I have a pipe-delimited file that also has delimited values within certain fields, and I want to get the total of the 37th field and have it appended to the end of each line.
Sample file looks like this
1|2|18|45324.56|John|Smith|...etc then the 37th field has |1.99^2.46^79.87|next data field here|etc
I want to add the numbers 1.99, 2.46, and 79.87 together and append the total to the end of the line:
1|2|18|45324.56|John|Smith|...etc then the 37th field has |1.99^2.46^79.87|next data field here|84.32 <- total of all values in $37 (this field can have 1 value or over 100 values in it)
Obviously I can do awk -F'|' '{print $37}' file and it will show me 1.99^2.46^79.87, but I'm uncertain how to total those values, since it's basically delimited data inside of differently delimited data.
Edit: here is a full line of data
1|12|15|29786.31|test|true|2019-12-01|2021-02-28||2019-12-01|2021-02-28|1417.00|t0000000|John|Smith|current|1234 Main St|Dallas|TX|75000|Office|8709999999||||||||Attachment^Attachment|t0000000_4042 - Application Documents.pdf^t0000000_4042 - Lease Agreement.pdf|704405808^704405809^704405810^704523038^704523039^704523593^704523594|2021-03-01^2021-03-01^2021-03-01^2021-02-28^2021-02-28^2021-03-06^2021-03-06|RUBS Income Water/Sewer^RUBS Income Water/Sewer^Utility Billing Revenue^Damages/Cleaning Fees^Damages/Cleaning Fees^RUBS Income Water/Sewer^RUBS Income Water/Sewer|Charge^Charge^Charge^Charge^Charge^Charge^Charge|18.25^15.26^2.99^40.00^25.00^18.88^15.78|18.25^15.26^2.99^40.00^25.00^18.88^15.78|Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note^Charge Code^Charge Code Desc^Transaction Note
awk has a split function that splits a string on a delimiter:
awk -F'|' '{
    n = split($37, a, "\\^")     # split field 37 on the caret (escaped so it is not a regex anchor)
    s = 0
    for (i = 1; i <= n; i++)     # sum the split values
        s += a[i]
    print $0 "|" s               # append the total as a new field and print the line
}' sample_file
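To sanity-check the caret-splitting on its own, a quick sketch (again, the \\^ keeps the caret from being read as a regex anchor):
echo '1.99^2.46^79.87' |
awk -F'\\^' '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
# prints 84.32, matching the expected total above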

Select groups of lines in file where value of one field is the same

I'm not sure how to word this question so I'll try my best to explain it:
Let's say I have a file:
100001,ABC,400
100001,EFG,500
100001,ABC,500
100002,DEF,400
100002,EFG,300
100002,XYZ,1000
100002,ABC,700
100003,DEF,400
100003,EFG,300
I want to grab each row and group them together where the first value in each row is the same. So all 100001's go together, all 100002's go together, etc.
I just need help figuring out the logic. Don't need a specific implementation in a language.
Pseudocode is fine.
I assume the lines are in order by COL1.
I assume "go together" means they are concatenated into one line.
The logic with pseudocode:
while not EOF
    read line
    if not same group
        if not first line
            print accumulated values
        start new group
    append values
print the last group
In awk you can test it with the following code:
awk '
BEGIN { FS = ","; x = ""; last = "" }
{
    if ($1 != last) {        # first line of a new group
        if (x != "")         # flush the previous group, if any
            print x;
        x = $1;              # start the new group with its key
        last = $1;
    }
    x = x ";" $2 ";" $3;     # append this line's values
}
END { print x }              # flush the final group
'
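For example, with the program body above saved as group.awk (a hypothetical name) and run against the sample file, each group comes out as one semicolon-joined line:
$ awk -f group.awk file.txt
100001;ABC;400;EFG;500;ABC;500
100002;DEF;400;EFG;300;XYZ;1000;ABC;700
100003;DEF;400;EFG;300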

Merging CSV lines with the same initial fields and sorting them by their length

I have a huge csv file with 4 fields for each line in this format (ID1, ID2, score, elem):
HELLO, WORLD, 2323, elem1
GOODBYE, BLUESKY, 3232, elem2
HELLO, WORLD, 421, elem3
GOODBYE, BLUESKY, 41134, elem4
ETC...
I would like to merge all lines which have the same ID1,ID2 fields onto one line, eliminating the score field, resulting in:
HELLO, WORLD, elem1, elem3.....
GOODBYE, BLUESKY, elem2, elem4.....
ETC...
where each elem comes from a different line with the same ID1,ID2.
After that I would like to sort the lines on the basis of their length.
I have tried coding it in Java, but it is super slow. I have read online about awk, but I can't really find a good resource for understanding its syntax for CSV files.
I used this command, how can I adapt it to my needs?
awk -F',' 'NF>1{a[$1] = a[$1]","$2}END{for(i in a){print i""a[i]}}' finale.txt > finale2.txt
Your key should be composite, and the delimiter needs to be set to accommodate both the comma and the spaces:
$ awk -F', *' -v OFS=', ' '{k=$1 OFS $2; a[k]=k in a?a[k] OFS $4:$4}
END{for(k in a) print k, a[k]}' file
GOODBYE, BLUESKY, elem2, elem4
HELLO, WORLD, elem1, elem3
Explanation
Set the field separator (FS) to a comma followed by zero or more spaces, and the output field separator (OFS) to the normalized form (comma and one space). Create a composite key from the first two fields separated with OFS (since we're going to use it in the output). Append the fourth field to the array element indexed by the key; the first element is treated specially since we don't want to start with OFS. When all records are done (END block), print all keys and values.
To add the length, keep a parallel counter and increment it each time you append for a key (c[k]++), and use it when printing. That is,
$ awk -F', *' -v OFS=', ' '{k=$1 OFS $2; c[k]++; a[k]=k in a?a[k] OFS $4:$4}
END{for(k in a) print k, c[k], a[k]}' file |
sort -t, -k3n
GOODBYE, BLUESKY, 2, elem2, elem4
HELLO, WORLD, 2, elem1, elem3
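If you need to order by the literal line length instead of the element count, a common sketch is to prefix each line with its length, sort numerically, then strip the prefix (merged.txt here is a hypothetical file holding the merged output from above):
awk '{ print length($0), $0 }' merged.txt | sort -n | cut -d' ' -f2-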

Bash formatting text file into columns

I have a text file with data in it which is set up like a table, but separated with commas, e.g.:
Name, Age, FavColor, Address
Bob, 18, blue, 1 Smith Street
Julie, 17, yellow, 4 John Street
Firstly I have tried using a for loop and placing each 'column', with all its values, into a separate array;
e.g. 'nameArray' would contain Bob, Julie.
Here is the code from my actual script; there are 12 columns, hence c should not be greater than 12.
declare -A Array
for ((c = 1; c <= 12; c++))
{
    for ((i = 1; i <= $total_lines; i++))
    {
        record=$(cat $FILE | awk -F "," 'NR=='$i'{print $'$c';exit}' | tr -d ,)
        Array[$c,$i]=$record
    }
}
From here I then use the 'printf' function to format each array and print them as columns. The issue with this is that I have more than 3 arrays; in my actual code they're all on the same 'printf' line, which I don't like, and I know it is a silly way to do it.
for ((i = 1; i <= $total_lines; i++))
{
    printf "%0s %-10s %-10s...etc \n" "${Array[1,$i]}" "${Array[2,$i]}" "${Array[3,$i]}" ...etc
}
This does however give me the desired output.
I would like to figure out how to do this another way that doesn't require a massive print statement. Also, the first time through the for loop I get an error from 'awk'.
Any advice would be appreciated; I have looked through multiple threads and posts to try and find a suitable solution but haven't found something useful.
Try the column command, like
column -t -s',' "$FILE"
This is a quick suggestion; see the man page for details.
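For example, with the sample file above (people.csv is a hypothetical name for it), the space after each comma stays in the field, so you may want to strip it first:
$ sed 's/, */,/g' people.csv | column -t -s','
which prints an aligned table along the lines of:
Name   Age  FavColor  Address
Bob    18   blue      1 Smith Street
Julie  17   yellow    4 John Street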
