Bash formatting text file into columns - arrays

I have a text file with data laid out like a table, but with the fields separated by commas, e.g.:
Name, Age, FavColor, Address
Bob, 18, blue, 1 Smith Street
Julie, 17, yellow, 4 John Street
First, I tried using a for loop and placing each 'column', with all its values, into a separate array.
e.g. 'nameArray' would contain Bob, Julie.
Here is the code from my actual script. There are 12 columns, which is why c should not be greater than 12.
declare -A Array
for ((c = 1; c <= 12; c++))
{
    for ((i = 1; i <= $total_lines; i++))
    {
        record=$(cat $FILE | awk -F "," 'NR=='$i'{print $'$c';exit}' | tr -d ,)
        Array[$c,$i]=$record
    }
}
From here I use printf to format each array and print them as columns. The issue is that I have more than 3 arrays, and in my actual code they are all in the same printf line, which I don't like; I know it is a silly way to do it.
for ((i = 1; i <= $total_lines; i++))
{
    printf "%0s %-10s %-10s...etc \n" "${Array[1,$i]}" "${Array[2,$i]}" "${Array[3,$i]}" ...etc
}
This does, however, give me the desired output (see image below).
I would like to figure out how to do this another way that doesn't require a massive printf statement. Also, the first time I call the for loop I get an error from awk.
Any advice would be appreciated. I have looked through multiple threads and posts to try to find a suitable solution, but haven't found anything useful.

Try the column command, like:
column -t -s','
This is what I could come up with quickly. See the man page for details.
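For comparison, here is a minimal Python sketch of roughly what `column -t -s','` does. It splits every line on commas, finds the widest entry in each column, and pads each cell to that width. Two choices are assumptions made for this sketch: it strips the spaces that follow each comma, and it separates columns with two spaces. `align_columns` is a hypothetical helper name.

```python
def align_columns(lines, sep=",", gap="  "):
    """Pad separated fields so every column lines up (roughly like column -t)."""
    # Split each line into fields and drop the spaces that follow each comma.
    rows = [[field.strip() for field in line.split(sep)] for line in lines]
    # Width of each column = length of its longest entry.
    ncols = max(len(row) for row in rows)
    widths = [0] * ncols
    for row in rows:
        for i, field in enumerate(row):
            widths[i] = max(widths[i], len(field))
    # Left-justify every field except the last, so no trailing spaces are added.
    out = []
    for row in rows:
        padded = [field.ljust(widths[i]) for i, field in enumerate(row[:-1])]
        out.append(gap.join(padded + row[-1:]))
    return out


data = [
    "Name, Age, FavColor, Address",
    "Bob, 18, blue, 1 Smith Street",
    "Julie, 17, yellow, 4 John Street",
]
for line in align_columns(data):
    print(line)
```

The widths are computed once from the whole file, so the number of columns never has to be hard-coded into a single long format string.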

Related

Using cells as output table of while loop in octave

So I'm implementing a while loop in my code that does some simple calculations. The thing is that I want output that shows not only the final values but the values from every step. The best I could do was to use cell arrays with the following code:
i=1; p=(a+b)/2;
valores=cell(n, 3);
while (i<=n && f(p)!=0);
  if f(a)*f(p)<0;
    a=a; b=p;
  else a=p; b=b;
  endif
  i=i+1; p=(a+b)/2;
  valores(i, :)={i-1 p f(p)}; fprintf('%d %d %d \n', valores{i, :});
endwhile
An example output would be:
1 1.25 -1.40998
2 1.125 -0.60908
3 1.0625 -0.266982
4 1.03125 -0.111148
5 1.01562 -0.0370029
But I have two main issues with this method. First, I couldn't find a way to put a title row on the first line, so I have to explain in a separate sentence what each column is. Second, I don't know how to keep all the columns aligned at fixed positions, instead of each value just being followed by the same amount of space. I assume this last issue has something to do with the way I used fprintf, since I'm not too familiar with it.
In case it helps to understand what I want from this algorithm, I'm trying to calculate the root of a function with the bisection method. Sorry if this was too long or unclear; feel free to give me advice, I'm kinda new here :)
An open-source package called Tablicious can take care of cell, row, and column alignment. Using print statements and whitespace gets tedious and leads to unmaintainable code.
Tablicious is a package for GNU Octave that provides relational data structures for Octave. It includes implementations of table arrays, datetime, string, categorical, and some other related stuff. You can think of it as “pandas for Octave”.
Installation
pkg install https://github.com/apjanke/octave-tablicious/releases/download/v0.3.6/tablicious-0.3.6.tar.gz
Example
pkg load tablicious
Forename = {"Tom"; "Dick"; "Harry"};
Age = [21; 63; 38];
Salary = {"$1"; "$2"; "$3"};
tab = table(Forename, Age, Salary);
prettyprint (tab)
Result
-------------------------------
| Forename | Age | Salary |
-------------------------------
| Tom | 21 | $1 |
| Dick | 63 | $2 |
| Harry | 38 | $3 |
-------------------------------
Documentation can be found here.
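Without an extra package, the two issues in the question come down to two fixes: print a header line once, and use fixed field widths such as `%12.6f` instead of a bare `%d`.

The sketch below shows that idea in Python rather than Octave. The question does not give its `f`, `a`, `b` or `n`, so the function `f(x) = x**2 - 2`, the interval [1, 2] and the step count of 5 are hypothetical choices for illustration.

```python
def bisection_table(f, a, b, n):
    """Run up to n bisection steps on [a, b] and return (step, p, f(p)) rows."""
    # Assumes f(a) and f(b) have opposite signs, as the bisection method requires.
    rows = []
    for i in range(1, n + 1):
        p = (a + b) / 2
        fp = f(p)
        rows.append((i, p, fp))
        if fp == 0:
            break  # landed exactly on the root
        # Keep the half-interval whose endpoints still bracket the root.
        if f(a) * fp < 0:
            b = p
        else:
            a = p
    return rows


def format_table(rows):
    """Header row once, then fixed-width columns so every line aligns."""
    lines = [f"{'step':>4} {'p':>12} {'f(p)':>14}"]
    for i, p, fp in rows:
        lines.append(f"{i:>4d} {p:>12.6f} {fp:>14.6f}")
    return lines


rows = bisection_table(lambda x: x**2 - 2, 1.0, 2.0, 5)
print("\n".join(format_table(rows)))
```

Octave's fprintf accepts the same kind of width and precision specifiers. After printing the header once, a format such as `'%4d %12.6f %14.6f\n'` should give similarly aligned columns.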

awk split string on commas ignore if inside double quotes

I know it may seem that there are 2000 answers to this question online, but I found none for this specific case (e.g. the -vFPAT approach in this and other answers), because I need to use split. I have to split a CSV file with awk in which some values may be inside double quotes. I need to tell the split function to ignore a , when it is inside "" in order to get an array of the elements.
Here is what I tried, based on other answers:
cat try.txt
Hi,I,"am,your",father
maybe,you,knew,it
but,"I,wanted",to,"be,sure"
cat tst.awk
BEGIN {}
{
    n_a = split($0,a,/([^,]*)|("[^"]+")/);
    for (i=1; i<=n_a; i++) {
        collecter[NR][i]=a[i];
    }
}
END {
    for (i=1; i<=length(collecter); i++)
    {
        for (z=1; z<=length(collecter[i]);z++)
        {
            printf "%s\n", collecter[i][z];
        }
    }
}
but no luck:
awk -f tst.awk try.txt
,
,
,
,
,
,
,
,
,
I tried other regular expressions based on similar answers, but none of them works for this particular case.
Please note: double-quoted fields may or may not be present, there may be more than one, and they have no fixed position or length!
Thanks in advance for any help!
GNU awk (gawk) has a function called patsplit that lets you split using an FPAT pattern:
$ awk '{ print "RECORD " NR ":"; n=patsplit($0, a, "([^,]*)|(\"[^\"]+\")"); for (i=1;i<=n;++i) {print i, "|" a[i] "|"}}' file
RECORD 1:
1 |Hi|
2 |I|
3 |"am,your"|
4 |father|
RECORD 2:
1 |maybe|
2 |you|
3 |knew|
4 |it|
RECORD 3:
1 |but|
2 |"I,wanted"|
3 |to|
4 |"be,sure"|
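Python's `re` module has no `patsplit`. Its alternation also simply takes the first alternative that matches, so the gawk pattern is not reused verbatim here.

A rough equivalent matches each field together with the comma (or line start) before it. It assumes well-formed input: no `""` escapes inside quotes, and nothing between a closing quote and the next comma. `split_fields` is a hypothetical helper name.

```python
import re

# A field is either a double-quoted string or a run of non-comma characters,
# preceded by the start of the line or by a comma. The quoted alternative is
# listed first because Python's re uses the first alternative that matches.
FIELD = re.compile(r'(?:^|,)("[^"]*"|[^,]*)')


def split_fields(line):
    """Return the fields of one CSV line, keeping quotes like patsplit does."""
    return FIELD.findall(line)


for line in ['Hi,I,"am,your",father', 'maybe,you,knew,it',
             'but,"I,wanted",to,"be,sure"']:
    print(split_fields(line))
```

Anchoring each match on its leading comma also keeps empty fields, so `a,,b` yields three fields.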
If Python is an alternative, here is a solution:
try.txt:
Hi,I,"am,your",father
maybe,you,knew,it
but,"I,wanted",to,"be,sure"
Python snippet:
import csv

with open('try.txt', newline='') as f:
    reader = csv.reader(f)  # the default quotechar '"' keeps "am,your" as one field
    for row in reader:
        print(row)
The code snippet above will result in:
['Hi', 'I', 'am,your', 'father']
['maybe', 'you', 'knew', 'it']
['but', 'I,wanted', 'to', 'be,sure']

How can I put CSV files in an array in bash?

So I need to put the content of some of the columns of a CSV file into an array so I can operate on them.
My File looks like this:
userID,placeID,rating,food_rating,service_rating
U1077,135085,2,2,2
U1077,135038,2,2,1
U1077,132825,2,2,2
U1077,135060,1,2,2
U1068,135104,1,1,2
U1068,132740,0,0,0
U1068,132663,1,1,1
U1068,132732,0,0,0
U1068,132630,1,1,1
U1067,132584,2,2,2
U1067,132733,1,1,1
U1067,132732,1,2,2
U1067,132630,1,0,1
U1067,135104,0,0,0
U1067,132560,1,0,0
U1103,132584,1,2,1
U1103,132732,0,0,2
U1103,132630,1,2,0
U1103,132613,2,2,2
U1103,132667,1,2,2
U1103,135104,1,2,0
U1103,132663,1,0,2
U1103,132733,2,2,2
U1107,132660,2,2,1
U1107,132584,2,2,2
U1107,132733,2,2,2
U1044,135088,2,2,2
U1044,132583,1,2,1
U1070,132608,2,2,1
U1070,132609,1,1,1
U1070,132613,1,1,0
U1031,132663,0,0,0
U1031,132665,0,0,0
U1031,132668,0,0,0
U1082,132630,1,1,1
I want to get the placeID, save it in an array, and put the ratings at the same position. What I need is the average rating of every placeID.
I have been trying something like
cut -d"," -f2 FileName >> var[#]
Hard to accomplish in bash, but pretty straightforward in awk:
awk -F',' 'NR>1 {sum[$2] += $3; count[$2]++}; END{ for (id in sum) { print id, sum[id]/count[id] } }' file.csv
Explanation: -F sets the field separator; you want field 2 and the average of field 3. We process every row except the first one (NR>1 skips the header row), and at the end we print each unique ID and its average.
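If the averages are needed outside awk, the same sum-and-count idea can be sketched in Python. The column names come from the header in the question. `average_rating_by_place` is a hypothetical helper, and only a few rows of the sample data are embedded here.

```python
import csv
import io
from collections import defaultdict


def average_rating_by_place(lines):
    """Average the 'rating' column for each distinct placeID."""
    total = defaultdict(float)  # placeID -> sum of ratings
    count = defaultdict(int)    # placeID -> number of ratings
    for row in csv.DictReader(lines):  # the header row supplies the keys
        place = row["placeID"]
        total[place] += float(row["rating"])
        count[place] += 1
    return {place: total[place] / count[place] for place in total}


SAMPLE = """userID,placeID,rating,food_rating,service_rating
U1077,135085,2,2,2
U1068,132732,0,0,0
U1067,132732,1,2,2
U1103,132732,0,0,2
U1067,132584,2,2,2
U1103,132584,1,2,1
U1107,132584,2,2,2
"""

for place, avg in average_rating_by_place(io.StringIO(SAMPLE)).items():
    print(place, avg)
```

Unlike awk's `for (id in sum)`, whose order is unspecified, a Python dict keeps insertion order. The places therefore come out in the order they first appear.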

Select groups of lines in file where value of one field is the same

I'm not sure how to word this question, so I'll try my best to explain it.
Let's say I have a file:
100001,ABC,400
100001,EFG,500
100001,ABC,500
100002,DEF,400
100002,EFG,300
100002,XYZ,1000
100002,ABC,700
100003,DEF,400
100003,EFG,300
I want to group together the rows whose first value is the same, so all the 100001 rows go together, all the 100002 rows go together, etc.
I just need help figuring out the logic. I don't need an implementation in a specific language;
pseudocode is fine.
I assume the lines are sorted by the first column.
I assume "go together" means they are concatenated into one line.
The logic in pseudocode:
while not EOF
    read line
    if not same group
        if not first line
            print accumulated values
        start new group
    append values
print the last group
In awk you can test it with the following code:
awk '
BEGIN { FS = ","; x=""; last="";}
{
    if ($1 != last) {
        if (x != "")
            print x;
        x=$1;
        last=$1;
    }
    x=x";"$2";"$3;
}
END {print x;} '
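The same pseudocode maps onto Python's `itertools.groupby`. Like the awk version, groupby only groups consecutive lines, so it relies on the same assumption that the input is sorted by the first column. The `;`-joined output format mirrors the awk code, and `group_lines` is a hypothetical name.

```python
from itertools import groupby


def group_lines(lines):
    """Concatenate consecutive lines that share the same first field."""
    rows = (line.rstrip("\n").split(",") for line in lines if line.strip())
    out = []
    # groupby starts a new group whenever the key (the first field) changes.
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        parts = [key]
        for fields in group:
            parts.extend(fields[1:])  # append the remaining values of this line
        out.append(";".join(parts))
    return out


data = """100001,ABC,400
100001,EFG,500
100001,ABC,500
100002,DEF,400
100002,EFG,300
100002,XYZ,1000
100002,ABC,700
100003,DEF,400
100003,EFG,300
"""
for line in group_lines(data.splitlines()):
    print(line)
```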

Merging csv file's lines with the same initial fields and sorting them by their length

I have a huge csv file with 4 fields for each line in this format (ID1, ID2, score, elem):
HELLO, WORLD, 2323, elem1
GOODBYE, BLUESKY, 3232, elem2
HELLO, WORLD, 421, elem3
GOODBYE, BLUESKY, 41134, elem4
ETC...
I would like to merge all lines that have the same ID1, ID2 fields into a single line, dropping the score field, resulting in:
HELLO, WORLD, elem1, elem3.....
GOODBYE, BLUESKY, elem2, elem4.....
ETC...
where each elem comes from a different line with the same ID1, ID2.
After that I would like to sort the lines by their length.
I tried coding this in Java, but it is very slow. I have read online about awk, but I can't find a good place to learn its syntax for CSV files.
I used this command, how can I adapt it to my needs?
awk -F',' 'NF>1{a[$1] = a[$1]","$2}END{for(i in a){print i""a[i]}}' finale.txt > finale2.txt
Your key should be composite, and the delimiter needs to be set to accommodate a comma followed by spaces.
$ awk -F', *' -v OFS=', ' '{k=$1 OFS $2; a[k]=k in a?a[k] OFS $4:$4}
END{for(k in a) print k, a[k]}' file
GOODBYE, BLUESKY, elem2, elem4
HELLO, WORLD, elem1, elem3
Explanation
Set the field separator (FS) to a comma followed by zero or more spaces, and the output field separator (OFS) to a normalized form (a comma and one space). Create a composite key from the first two fields separated with OFS (since we're going to use it in the output). Append the fourth field to the array element indexed by that key (treating the first element specially, since we don't want to start with OFS). When all records are done (END block), print all keys and values.
To add the length, keep a parallel counter c[k] that is incremented each time you append to a key, and use it when printing. That is,
$ awk -F', *' -v OFS=', ' '{k=$1 OFS $2; c[k]++; a[k]=k in a?a[k] OFS $4:$4}
END{for(k in a) print k, c[k], a[k]}' file |
sort -t, -k3n
GOODBYE, BLUESKY, 2, elem2, elem4
HELLO, WORLD, 2, elem1, elem3
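For a non-awk comparison, here is a Python sketch of the same composite-key merge. It splits each line on a comma followed by optional spaces, keys on (ID1, ID2), collects the fourth field, and sorts by the number of merged elements.

A few assumptions apply:
- Sorting by element count follows this answer's reading of "length".
- Each line has exactly four fields.
- The fifth input row is hypothetical, added only so the counts differ.
- `merge_by_ids` is a hypothetical name.

```python
import re


def merge_by_ids(lines):
    """Merge lines sharing (ID1, ID2), drop the score, sort by element count."""
    merged = {}  # (ID1, ID2) -> list of elem values, in first-seen order
    for line in lines:
        if not line.strip():
            continue
        # Assumes exactly four fields per line: ID1, ID2, score, elem.
        id1, id2, _score, elem = re.split(r",\s*", line.strip())
        merged.setdefault((id1, id2), []).append(elem)
    # Fewest elements first, like `sort -t, -k3n` on the count column.
    items = sorted(merged.items(), key=lambda kv: len(kv[1]))
    return [", ".join([id1, id2, str(len(elems))] + elems)
            for (id1, id2), elems in items]


data = [
    "HELLO, WORLD, 2323, elem1",
    "GOODBYE, BLUESKY, 3232, elem2",
    "HELLO, WORLD, 421, elem3",
    "GOODBYE, BLUESKY, 41134, elem4",
    "HELLO, WORLD, 7, elem5",  # hypothetical extra row so the counts differ
]
for line in merge_by_ids(data):
    print(line)
```

Python's `sorted` is stable, so keys with equal counts keep their first-seen order.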
