Get values of columns of a file with tabs and newlines separating them read via bash script to an array - arrays

Hello, I have been trying for two days to read the numbers in the columns of a file from a bash script. Here is the file, sample.txt:
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
By column I mean that, for example, the first column is:
1
9
6
3
3
6
I need the elements of each column in their own array (col1, col2, etc.) so that I can manipulate the values further.
Here is what I have done so far: using a while loop, I read the file and assign the fields of each line to an array.
If I set IFS=$'\n'
while read -a line
do
IFS=$'\n'
#I can get the whole column 1 with this
echo ${line[0]}
#for column 2 I can get it by this an the others too
echo ${line[1]}
done < sample.txt
That seemed fine at first, but I want to calculate the averages of the columns, and adding another loop (such as a for loop) does not work: ${line[0]} yields all the elements of column 1, but, as far as I can observe, only as a single string that cannot be operated on.
What would be the best way to make those elements members of a given array and then compute the averages on them? Help appreciated.

In bash I'd write
declare -A cols
n=0
while read -ra fields; do
    # store each field under the key "column,row"
    for ((i=0; i<${#fields[@]}; i++)); do
        cols[$i,$n]=${fields[i]}
    done
    ((n++)) # advance the row counter once per line
done < sample.txt
read -a reads the fields of the line into the named array.
I'm using cols as an associative array to fake a multi-dimensional array. That's way easier to deal with than using a dynamic variable name:
eval "column${i}[$n]=\${fields[$i]}"
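For the averages the question asks about, the values do not have to be stored at all: a running sum per column is enough. The following is only a minimal sketch under some assumptions: the input holds plain decimal integers (no leading zeros), `column_averages` is a made-up helper name, and awk is used for the division because bash arithmetic is integer-only.

```shell
#!/bin/bash
# Hypothetical helper: print the average of every column read from stdin.
column_averages() {
    local -a fields sums=()
    local rows=0 i
    while read -ra fields; do
        (( ${#fields[@]} )) || continue      # skip blank lines
        for i in "${!fields[@]}"; do
            (( sums[i] += fields[i] ))       # running sum for column i
        done
        (( rows++ ))
    done
    (( rows )) || return 1                   # no data rows at all
    for i in "${!sums[@]}"; do
        # bash only has integer arithmetic, so let awk do the division
        awk -v s="${sums[i]}" -v n="$rows" -v c="$((i + 1))" \
            'BEGIN { printf "col%d %.2f\n", c, s / n }'
    done
}

column_averages <<'EOF'
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
EOF
```

With a real file, the here-document would be replaced by `column_averages < sample.txt`.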

Related

Modify IFS in bash while building an array

I'm trying to build an array from 4 different arrays in bash with a custom IFS. Can you lend me a hand, please?
#!/bin/bash
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
oldIFS=$IFS
IFS=\;
for i in ${!arr1[@]}; do
arr5+=($(echo ${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}))
done
IFS=$oldIFS
echo ${arr5[@]}
I want the output to be:
1 1 1 1;2 2 2 2;3 3 3 3;4 4 4 4 4 4
But it doesn't work; the output is separated by ordinary spaces:
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4 4
Any ideas?
I tried setting IFS in different places:
1) In the for loop
2) Before arr5()
I checked inside the for loop and after it: IFS does change to ";", but it has no effect on the array creation.
IFS is used during the expansion of ${arr5[*]}, not while creating arr5.
arr1=(1 2 3 4)
arr2=(1 2 3 4)
arr3=(1 2 3 4)
arr4=(1 2 3 4)
arr5=()
for i in "${!arr1[@]}"; do
    arr5+=("${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]}") # one element per group
done
(IFS=";"; echo "${arr5[*]}")
Where possible, it's simpler to just change IFS in a subshell rather than try to save and restore its value manually. (Your attempt fails in the rare but possible case that IFS was unset to begin with.)
That said, if you just want the ;-delimited string and arr5 was a way to get there, just build the string directly:
for i in "${!arr1[@]}"; do
s+="${arr1[i]} ${arr2[i]} ${arr3[i]} ${arr4[i]};"
done
s=${s%;} # Remove the last extraneous semicolon
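To see that IFS only matters when the array is expanded with [*], the join can be wrapped in a small function that changes IFS locally. This is just a sketch; `join_by` is a made-up name, not a bash builtin.

```shell
#!/bin/bash
# join_by is a hypothetical helper name, not a bash builtin.
join_by() {
    local IFS=$1    # "$*" joins with the first character of IFS
    shift
    printf '%s\n' "$*"
}

arr5=("1 1 1 1" "2 2 2 2" "3 3 3 3" "4 4 4 4")
join_by ';' "${arr5[@]}"      # 1 1 1 1;2 2 2 2;3 3 3 3;4 4 4 4
printf '%s\n' "${arr5[*]}"    # joined with spaces: IFS was only changed inside the function
```

Because IFS is declared local, there is nothing to save and restore afterwards, which is the same benefit the subshell gives.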

delete elements in array bash

I'm trying to delete two elements at the same time from an array in a bash script.
My code is:
elegidos = (1 2 3 4 5 6 7)
i=0
j=${#elegidos[@]}
delete=($i $j)
while [ $i -le $j ]; do
#elegidosmenosdos=${elegidos[@]/$i:$j}
echo ${elegidos[@]/$delete}
delete=($i $j)
let "i++"
let "j--"
done
The output I get is:
1 2 3 4 5 6 7
1 2 3 4 5 6 7
2 3 4 5 6 7
1 3 4 5 6 7
and I need 21 different combinations of five elements using the seven numbers.
Example output:
1 2 3 4 5
2 3 4 5 6
7 6 5 4 3
.
.
.
.
.(21 MORE)
Well, it took a minute to suss out that you simply want to do an end-to-middle delete of two elements at a time: working from the ends of the array, delete both end elements, increment/decrement your counters, and repeat until you reach the middle. Why you want to do it this way is a bit of a mystery. You can, of course, simply unset elegidos and remove all values at once. However, if you want to work in from both ends, that's fine too if you have a purpose for it.
You have several problems in your script. In bash, indexed arrays are zero-based. Your array has 7 elements, so the valid indexes are 0-6. Therefore, your j was wrong to begin with. You need to subtract 1 from the number of elements to get the index of the last element, e.g.
i=0
j=$((${#elegidos[@]} - 1))
bash provides a C-style for loop that can greatly simplify your task. While you are free to use a while loop, a C-style loop can handle the index increment and decrement seamlessly, e.g.
for ((; i <= j; i++, j--)); do
Note the i <= j. If you have an odd number of elements in the array, on your last iteration you will simply be deleting one value instead of two. To handle that condition you need a simple test within the loop to check whether [ "$i" = "$j" ] (or using the arithmetic comparison (( i == j ))).
Putting that all together, you could refine your element removal to empty the array two elements at a time with something similar to the following:
#!/bin/bash
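elegidos=(1 2 3 4 5 6 7)
i=0
j=$((${#elegidos[@]} - 1)) # index of the last element
delete=($i $j)
for ((; i <= j; i++, j--)); do
    declare -p elegidos # show the array before each removal
    unset 'elegidos[i]' # quoted so the subscript is not glob-expanded
    [ "$i" != "$j" ] && unset 'elegidos[j]' # only one unset at the middle element
    delete=($i $j)
done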
Example Use/Output
$ bash array_del.sh
declare -a elegidos='([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7")'
declare -a elegidos='([1]="2" [2]="3" [3]="4" [4]="5" [5]="6")'
declare -a elegidos='([2]="3" [3]="4" [4]="5")'
declare -a elegidos='([3]="4")'
You can see above that on the first 3 iterations both end elements are removed. However, notice that on the last iteration (there originally being an odd number of elements), only the single remaining value is removed.
Look things over and let me know if I captured what you were attempting and whether you have any further questions. If your intent was something else, drop a comment and I'm happy to help further.
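If the intent was instead the 21 five-element combinations mentioned in the question, note that choosing 5 of 7 elements is the same as leaving out 2 of them, and there are 21 ways to pick that pair. A minimal sketch of that reading follows; `drop_two` is a made-up function name, and this is one interpretation of the question rather than what the script above does.

```shell
#!/bin/bash
# Hypothetical helper: print every combination obtained by leaving out
# two of the elements passed as arguments.
drop_two() {
    local -a items=("$@") kept
    local n=${#items[@]} i j k
    for ((i = 0; i < n - 1; i++)); do          # first index to leave out
        for ((j = i + 1; j < n; j++)); do      # second index to leave out
            kept=()
            for ((k = 0; k < n; k++)); do
                (( k == i || k == j )) || kept+=("${items[k]}")
            done
            echo "${kept[*]}"
        done
    done
}

elegidos=(1 2 3 4 5 6 7)
drop_two "${elegidos[@]}"    # 21 lines of five numbers each
```

The original array is never modified, so no unset is needed; each combination is built in a temporary array.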

Converting continuous streaming text into comma separated, multi-line file

I'm trying to convert a continuous stream of data (random) into comma separated and line separated values. I'm converting the continuous data into csv and then after some columns (let's say 80), I need to put a newline and repeat the process until.
Here's what I did for csv:
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp
'tmp' is the file with following data:
"ZaOAkHEnOsBmD5yZk8cNLC26rIFGSLpzuGHtZgb4VUP4x1Pd21bukeK6wUYNueQQMglvExbnjEaHuoxU0b7Dcne5Y4JP332RzgiI3ZDgHOzm0gjDLVat8au7uckM3t60nqFX0Cy93jXZ5T0IaQ4fw2JfdNF1PbqxDxXv7UGiyysFJ8z16TmYQ9zfBRCZvZirIyRboHNEGgMUFZ18y8XXCGrbpeL0WLstzpSuXetmo47G2xPkDLDcFA6cdM4WAFNpoC2ztspY7YyVsoMZdU7D3u3Lm6dDcKuJKdTV6600GkbLuvAamKGyzMtoqW3liI3ybdTNR9KLz2l7KTjUiGgc3Eci5wnhIosAUMkcSQVxFrZdJ9MVyj6duXAk0CJoRvHYuyfdAr7vjlwjkLkYPtFvAZp6wK3dfetoh3ZmhJhUxqzuxOLDQ9FYcvz64iuIUbgXVZoRnpRoNGw7j3fCwyaqCi..."
I'm generating the continuous sequence from /dev/urandom. I can't work out how to make gawk start over after a given number of columns by adding a newline character where the columns end.
I got it actually. A simple for loop did that.
Here's my whole code:
for i in $(seq 10)
do
tr -dc A-Za-z0-9 < /dev/urandom | head -c 100 > tmp
gawk '$1=$1' FIELDWIDTHS='4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1' OFS=, tmp >> tmp1
done
Any optimizations would be appreciated.
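One possible optimization is to drop the temporary file and the repeated tr/gawk runs: read the random characters once, let fold cut them into 100-character lines, and split each line by the field widths. The sketch below does the splitting in plain bash instead of gawk's FIELDWIDTHS; `split_fixed` is a made-up helper name, and the output goes to stdout rather than tmp1.

```shell
#!/bin/bash
# split_fixed is a hypothetical helper: cut a line into fields of the given
# widths and join them with commas; it stops once the line is used up.
split_fixed() {
    local line=$1 out= pos=0 w
    shift
    for w in "$@"; do
        (( pos < ${#line} )) || break     # no characters left for this field
        out+="${line:pos:w},"
        (( pos += w ))
    done
    printf '%s\n' "${out%,}"              # drop the trailing comma
}

widths=(4 5 7 1 9 5 10 6 8 3 2 2 8 4 8 8 4 6 9 1)

# 1000 random characters, folded into 100-character lines, no temp file.
# The "|| [[ -n $line ]]" part keeps a final line that lacks a newline.
LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 1000 | fold -w 100 |
while IFS= read -r line || [[ -n $line ]]; do
    split_fixed "$line" "${widths[@]}"
done
```

Note that the widths add up to 110, so with 100-character lines the last two widths never receive any characters.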

Sorting an array in a BASH Script by columns and rows while keeping them intact

I have a test file that looks like this:
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
Each row is supposed to represent a student's grades. The user passes either an 'r' or a 'c' on the command line to choose sorting by rows or by columns, followed by the file name. Sorting by rows would represent getting a student's average, and sorting by columns would represent a particular assignment's average.
I am not doing anything with the choice variable yet because I need to get the array sorted first, so that I can take the averages and then get the median for each column and row.
So I'm not sure how I can sort according to those specific options. Here is what I have so far:
#!/bin/bash
choice="$1"
filename="$2"
c=0
if [ -e "$filename" ]
then
while read line
do
myArray[$c]=$line
c=$(expr $c + 1)
done < "$filename"
else
echo "File does not exist"
fi
printf -- '%s\n' "${myArray[@]}"
FS=$'\n' sorted=($(sort -n -k 1,1<<<"${myArray[*]}"))
echo " "
printf '%s\n' "${sorted[@]}"
This is only sorting the first column, though, and I'm not sure why it's even doing that. Any push in the right direction would be appreciated. Examples would help a ton, thanks!
UPDATE:
With the changes that were suggested I have this so far:
#!/bin/sh
IFS=$'\n';
choice="$1"
filename="$2"
if [ -e "$filename" ]
then
while read line
do
myArray[$c]=$line
c=$(expr $c + 1)
done < "$filename"
else
echo "File does not exist."
fi
printf -- '%s\n' "${myArray[@]}"
width=${myArray[0]// /}
width=${#width}
height=${#myArray[@]}
bar=()
for w in $(seq 0 1 $((${width}-1)))
do
tmp=($(sort -n <<<"${myArray[*]}"))
for h in $(seq 0 1 $((${height}-1)))
do
myArray[h]=${myArray[h]#* }
bar[h]="${bar[h]} ${tmp[h]%% *}"
bar[h]="${bar[h]# }"
done
done
printf -- '%s\n' "${bar[*]}"
But now I am getting some really strange output, with far more numbers than I started with and in a seemingly random order.
Actually, it is sorting the $line values, which are strings. You need to set up each column as an array so that it can be sorted correctly.
UPDATE:
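The following code is really straightforward. No performance aspects are considered, so for large datasets it will take a while to sort column-wise. Your datasets have to contain lines of numbers separated by single spaces to make this work, and, because the column count is taken from the number of non-space characters in the first row, that row must hold single-digit numbers.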
#!/bin/bash
IFS=$'\n';
# here you can place your read line function
ar[0]="5 3 2 8"
ar[1]="1 1 1 1"
ar[2]="3 2 4 5"
printf -- '%s\n' "${ar[*]}" # print the column wise unsorted ar
echo
# sorting
width=${ar[0]// /}
width=${#width}
height=${#ar[@]}
bar=()
for w in $(seq 0 1 $((${width}-1))); do # for each column
#sort -n <<<"${ar[*]}" # debug, see first column removal
tmp=($(sort -n <<<"${ar[*]}")) # sort -n orders the rows by their leading number, i.e. the current "first column"
# rows are strings, see initial definition of ar
#echo
for h in $(seq 0 1 $((${height}-1))); do # update first column
ar[h]=${ar[h]#* } # strip first column
bar[h]="${bar[h]} ${tmp[h]%% *}" # add sorted column to new array
bar[h]="${bar[h]# }" # trim leading space
done
#printf -- '%s\n' "${bar[*]}" # debug, see growing bar
#echo "---"
done
printf -- '%s\n' "${bar[*]}" # print the column wise sorted ar
This prints the unsorted array followed by the column-wise sorted array:
5 3 2 8
1 1 1 1
3 2 4 5
1 1 1 1
3 2 2 5
5 3 4 8
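A variation that does not depend on single-digit values is to split each row with read -ra and collect one newline-separated string per column, then sort each column on its own. This is only a sketch: `sort_columns` is a made-up name, it needs bash 4 or later for mapfile, and it assumes every row has the same number of fields.

```shell
#!/bin/bash
# sort_columns is a hypothetical helper: read whitespace-separated rows from
# stdin and print them with every column sorted numerically on its own.
sort_columns() {
    local -a fields cols=() sorted rows=()
    local i h
    while read -ra fields; do
        for i in "${!fields[@]}"; do
            cols[i]+="${fields[i]}"$'\n'    # one newline-separated string per column
        done
    done
    for i in "${!cols[@]}"; do
        mapfile -t sorted < <(printf '%s' "${cols[i]}" | sort -n)
        for h in "${!sorted[@]}"; do
            rows[h]+="${rows[h]:+ }${sorted[h]}"   # append to output row h
        done
    done
    printf '%s\n' "${rows[@]}"
}

sort_columns <<'EOF'
5 3 2 8
1 1 1 1
3 2 4 5
EOF
```

For the three sample rows this should give the same sorted block as shown above; values such as 10 or 20 are handled as whole numbers because the fields are split on whitespace rather than counted by character.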

What format does matlab need for n-dimensional data input?

I have a 4-dimensional dictionary I made with a Python script for a data mining project I'm working on, and I want to read the data into Matlab to do some statistical tests on the data.
To read a 2-dimensional matrix is trivial. I figured that since my first dimension is only 4-deep, I could just write each slice of it out to a separate file (4 files total) with each file having many 2-dimensional slices, looking something like this:
2 3 6
4 5 8
6 7 3
1 4 3
6 6 7
8 9 0
This, however, does not work: Matlab reads it as a single continuous 6 x 3 matrix. I even took a look at dlmread but could not figure out how to get it to do what I wanted. How do I format this so I can put 3 (or preferably more) dimensions in a single file?
A simple solution is to create a file with two lines only: the first line contains the target array size, the second line contains all your data. Then, all you need to do is reshape the data.
Say your file is
3 2 3
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
You do the following to read the array into the variable data
fid = fopen('myFile'); %# open the file (don't forget the extension)
arraySize = str2num(fgetl(fid)); %# read the first line, convert to numbers
data = str2num(fgetl(fid)); %# read the second line
data = reshape(data,arraySize); %# reshape the data
fclose(fid); %# close the file
Have a look at data to see how Matlab orders elements in multidimensional arrays.
Matlab stores data column-wise. So from your example (assuming it's a 2x3x3 array, i.e. three slices of 2 rows by 3 columns), Matlab will store it as the first, second and third columns of the first "slice", followed by the first, second and third columns of the second slice, and so on, like this:
2
4
3
5
6
8
6
1
7
4
3
3
6
8
6
9
7
0
So you can write the data out in this order from Python (I don't know how) and then read it into Matlab. Then you can reshape it back into a 2x3x3 array and you'll retain your correct ordering.
