How to structure multiple Python arrays for sorting

A Fourier analysis I'm doing outputs 5 data fields, each of which I've collected into a 1-D numpy array: freq bin #, amplitude, wavelength, normalized amplitude, %power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work since dicts only accept 2 terms. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0. , 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers_pct = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)

You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick, except that it computes the inverse shuffling compared to your procedure. By the way, why go through a dictionary and not simply a list of tuples?
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next smallest, and so on. Therefore the five largest amplitudes, in descending order, are at
top5 = order[:-6:-1]
Provided your data are 1-D numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes by using advanced indexing:
freq_bin[top5]
amplitudes[top5]
wavelength[top5]
If you want you can group them together in columns and apply top5 to the resulting n-by-5 array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
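Applied to the sample data from the question, the argsort approach might look like the minimal sketch below. It assumes the goal is the top 3 bins rather than 5. The names top3 and table and the output file name are illustrative, not taken from the original posts.

```python
import numpy as np

# Sample data copied from the question (shortened to 5 bins).
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01,
                    8.24322470e+00, 2.36715913e+01])
freqs = np.array([0.0, 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([np.inf, 404.93827165, 202.46913583,
                        134.97942388, 101.23456791])

# Indices that sort `fourier` in descending order, truncated to the top 3.
top3 = np.argsort(fourier)[::-1][:3]

# One row per selected bin: bin index, amplitude, frequency, wavelength.
table = np.column_stack([top3, fourier[top3], freqs[top3], wavelengths[top3]])
print(table)

# To write the rows to a text file (hypothetical file name), uncomment:
# np.savetxt("top3.txt", table, header="bin amplitude freq wavelength")
```

Because every field is indexed with the same top3 array, the rows stay aligned no matter how many fields are added.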

If I understand correctly, you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can either use numpy or do it with vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish or, if you are working with lists, reverse the list in place with my_sorted_list.reverse().
EDIT:
To get only the first n values, simply slice the array, similar to what @Paul is suggesting. Slicing works like classic list slicing, by specifying start:stop:step (the step can be omitted). In your case, for the top 5 columns it would be [:,-5:]. So in the example above you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
result will be:
array([[15, 13],
[ 3, 4],
[24, 20]])
Hope it helps.

Related

How do I create a string of array combinations given a list of "source code" strings?

Basically, I’m given a list of strings such as:
["structA.structB.myArr[6].myVar",
"structB.myArr1[4].myArr2[2].myVar",
"structC.myArr1[3][4].myVar",
"structA.myArr1[4]",
"structA.myVar"]
These strings describe variables/arrays from multiple structs. The integers in the brackets give the size of each array. If a string contains one or more arrays (1d or 2d), I want to generate a list of strings that goes through every index combination of the arrays in that string. I thought of using for loops, but the issue is that I don't know how many arrays are in a given string before running the script. So I couldn't do something like
for i in range(0, idx1):
    for j in range(0, idx2):
        for k in range(0, idx3):
            arr.append("structA.myArr1[%i][%i].myArr[%i]" % (i, j, k))
The issue is that I don't know how to create multiple/dynamic for loops based on the number of indexes, or how to create a dynamic append statement that changes for each string in the original list, since each string will have a different number of indexes and the arrays will be in different locations within the string.
I was able to write a regex to find all the index for each string in my list of strings:
indexArr = re.findall(r'\[(.*?)\]', myString)
# after looping, indexArr = [['6'],['4','2'],['3','4'],['4']]
However, I'm really stuck on how to achieve the "dynamic for loops" or use recursion for this. I want my final list of strings to look like:
[
["structA.structB.myArr[0].myVar",
"structA.structB.myArr[1].myVar",
...
"structA.structB.myArr[5].myVar"],
["structB.myArr1[0].myArr2[0].myVar",
"structB.myArr1[0].myArr2[1].myVar",
"structB.myArr1[1].myArr2[0].myVar",
…
"structB.myArr1[3].myArr2[1].myVar"],
["structC.myArr1[0][0].myVar",
"structC.myArr1[0][1].myVar",
…
"structC.myArr1[2][3].myVar"],
["structA.myArr1[0]",
…
"structA.myArr1[3]"],
["structA.myVar"] # this will only contain 1 string since there were no arrays
]
I am really stuck on this, any help is appreciated. Thank you so much.
The key is to use itertools.product to generate all possible combinations of a set of ranges and substitute them as array indices of an appropriately constructed string template.
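import itertools
import re
def expand(code):
    # Match every bracketed array size, e.g. "[6]"
    p = re.compile(r'\[(.*?)\]')
    # One range per array size found in the string
    ranges = [range(int(s)) for s in p.findall(code)]
    # Replace each size with a "{}" placeholder for str.format
    template = p.sub("[{}]", code)
    result = [template.format(*s) for s in itertools.product(*ranges)]
    return result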
The result of expand("structA.structB.myArr[6].myVar") is
['structA.structB.myArr[0].myVar',
'structA.structB.myArr[1].myVar',
'structA.structB.myArr[2].myVar',
'structA.structB.myArr[3].myVar',
'structA.structB.myArr[4].myVar',
'structA.structB.myArr[5].myVar']
and expand("structB.myArr1[4].myArr2[2].myVar") is
['structB.myArr1[0].myArr2[0].myVar',
'structB.myArr1[0].myArr2[1].myVar',
'structB.myArr1[1].myArr2[0].myVar',
'structB.myArr1[1].myArr2[1].myVar',
'structB.myArr1[2].myArr2[0].myVar',
'structB.myArr1[2].myArr2[1].myVar',
'structB.myArr1[3].myArr2[0].myVar',
'structB.myArr1[3].myArr2[1].myVar']
and the corner case expand("structA.myVar") naturally works to produce
['structA.myVar']
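To get the nested list asked for in the question, expand can be applied to every source string. The sketch below repeats the function so it runs on its own. The names sources and expanded are illustrative, and it includes the 2d case structC.myArr1[3][4].myVar, which the outputs above do not show.

```python
import itertools
import re

def expand(code):
    # Each "[N]" becomes a range(N) and a "{}" placeholder in the template.
    pattern = re.compile(r'\[(.*?)\]')
    ranges = [range(int(s)) for s in pattern.findall(code)]
    template = pattern.sub("[{}]", code)
    return [template.format(*idx) for idx in itertools.product(*ranges)]

sources = [
    "structA.structB.myArr[6].myVar",
    "structB.myArr1[4].myArr2[2].myVar",
    "structC.myArr1[3][4].myVar",
    "structA.myArr1[4]",
    "structA.myVar",
]

# One inner list per source string, in the same order as `sources`.
expanded = [expand(s) for s in sources]
for group in expanded:
    print(len(group), group[0], "...", group[-1])
```

The string without brackets works because itertools.product() called with no arguments yields exactly one empty tuple, so the template is formatted once with nothing to substitute.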

Array intersection issue (Matlab)

I am trying to carry out the intersection of two arrays in Matlab but I cannot find the way.
The arrays that I want to intersect are:
and
I have tried:
[dur, itimes, inewtimes] = intersect(array2, char(array1));
but no luck.
However, if I try to intersect array1 with array3 (see array3 below), the intersection is performed without any error:
[dur, itimes, inewtimes] = intersect(array3, char(array1));
Why can't I intersect array1 with array2, and how could I do it? Thank you.
Your arrays are in different formats, and you need to make them the same. There are many options: as @Visser suggested, you could convert the date/time into a long int, which allows faster computation, or you can keep them as strings, or even convert them into characters (like what you have done with char(Array2)).
This is my example:
A = {'00:00:00';'00:01:01'} %// Type is a cell array of strings (cellstr)
Z = ['00:00:00';'00:01:01'] %// Type is a char array
Q = {{'00:00:00'};{'00:01:01'}} %// Type is a cell of cells
A = cellstr(A) %// Converting a cellstr to a cellstr essentially does nothing
Z = cellstr(Z) %// Convert char array to cellstr
Q = vertcat(Q{:,:}) %// Convert cell of cells to cell of strings
I = intersect (A,Z)
>>'00:00:00'
'00:01:01'
II = intersect (A,Q)
>>'00:00:00'
'00:01:01'
This keeps your dates in the format of Strings in case you want to export them back into a txt/csv file.
Your first array would look something like this:
array1 = linspace(0,1,86400); % creates 86400 seconds in 1 day
Your second array should be converted using datenum, then use cell2mat to make it a matrix. Lastly, use ismember to find the intersection:
InterSect = ismember(array2,array1);
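The thread is about MATLAB, but the idea of normalizing both inputs to one representation before intersecting carries over to other tools. As a rough NumPy analogue, using made-up time stamps and illustrative variable names, np.intersect1d with return_indices=True returns the common values plus their positions in each input, much like the three outputs of intersect:

```python
import numpy as np

# Hypothetical time stamps. Both inputs are normalized to flat arrays of
# strings before intersecting, mirroring the cellstr conversion above.
times_a = np.array(["00:00:00", "00:01:01", "00:02:02"])
times_nested = [["00:01:01"], ["00:02:02"], ["00:03:03"]]  # list of lists
times_b = np.array([t for inner in times_nested for t in inner])

# common: sorted shared values; idx_a / idx_b: their positions in each input.
common, idx_a, idx_b = np.intersect1d(times_a, times_b, return_indices=True)
print(common)
print(idx_a, idx_b)
```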

Matlab: Delete the item in an N-dimensional array whose Nth dimension is 1, where N is unknown?

I have an N-dimensional array of items whose last dimension is the index of the array.
For example, if the array A contained images, then A(:,:,:,1) would be the first image, A(:,:,:,2) would be the second image, and so forth.
Similarly, if the array just contained integers, then A(:,1) would be the first integer, A(:,2) would be the second integer, and so forth.
-=-=-=-
What I'm trying to do is delete the first item from A when I do not know ahead of time what dimensionality it is.
If A contains images, I want to do this:
A(:,:,:,1) = [];
If A contains integers, I want to do this:
A(:,1) = [];
The problem is since I don't know what dimensionality it is, I don't know how many colons to put, and I don't know how to denote "N-1 colons here" in Matlab.
I'm hoping there is a programmatic way to do this, but I frankly have no idea what to search for if this is possible.
You can either use cell to comma-separated list expansion:
%// Build cell: {':', ':', ..., ':', [1]}
I(1:ndims(A)-1) = {':'};
I{ndims(A)} = 1;
%// Expand cell to comma separated list and delete:
A(I{:}) = [];
Or convert to cell using num2cell and then convert back using cell2mat:
C = num2cell(A,1:ndims(A)-1);
A = cell2mat(C(2:end));
I guess that unless you really need n-dimensional arrays, doing this with a cell array of n-1 dimensional arrays instead (as is C in the above code) should be a smart move in terms of simplicity of notation.
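For comparison, the same "N-1 colons" index can be built programmatically in NumPy by assembling a tuple of slice objects. This is only an analogue of the cell-expansion trick, not MATLAB code, and the function name drop_first_item is made up for illustration.

```python
import numpy as np

def drop_first_item(a):
    # Build the index (:, :, ..., 1:) as a tuple: one full slice per leading
    # axis, then a slice that skips element 0 along the last axis.
    index = (slice(None),) * (a.ndim - 1) + (slice(1, None),)
    return a[index]

images = np.zeros((4, 4, 3, 5))       # hypothetical stack of five 4x4x3 images
numbers = np.arange(6).reshape(1, 6)  # six integers along the last axis

print(drop_first_item(images).shape)
print(drop_first_item(numbers))
```

In NumPy the shorter spellings a[..., 1:] and np.delete(a, 0, axis=-1) give the same result. The explicit tuple is shown only because it mirrors the A(I{:}) construction.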

Split array into smaller unequal-sized arrays dependent on array-column values

I'm quite new to MATLAB and this problem really drives me insane:
I have a huge array of 2 columns and about 31,000 rows. One of the two columns holds a spatial coordinate on a grid, the other a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column; let's say the spatial coordinate are ranging from 0 to 500 - I now want arrays that give me the two column values for spatial coordinate 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you want a data structure holding arrays of different sizes, you will need to place them in a cell array, so you could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The previous code returns a cell array containing the array split according to its first column. The expression @(x)arr(arr(:,1)==x,:) defines an anonymous function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the arguments that follow (taking a single value from each argument per evaluation). Note that arr must be numeric; if it is not, you should map your values to numeric values or use another way to select them.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; check the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data look like, but you could try plotting inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
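The binning and averaging steps can also be sketched outside MATLAB. The NumPy version below uses a small made-up array X and assumes bins of the form [0, 10), [10, 20), and so on. Integer division by the step size assigns each row a bin number, which plays the role of the find(...) mask in the loop above.

```python
import numpy as np

# Hypothetical data: column 0 is the spatial coordinate, column 1 the parameter.
X = np.array([[ 1.0, 2.0],
              [ 4.0, 4.0],
              [12.0, 6.0],
              [18.0, 8.0],
              [25.0, 1.0]])
step_size = 10

# Bin number of every row: 0 for [0, 10), 1 for [10, 20), and so on.
bins = (X[:, 0] // step_size).astype(int)

# One (mean coordinate, mean parameter) point per non-empty bin.
points = np.array([X[bins == b].mean(axis=0) for b in np.unique(bins)])
print(points)
```

Iterating over np.unique(bins) skips empty bins, which avoids taking the mean of an empty selection.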

What type of array should I use to combine numerical values and string values and have the ability to sort them in octave/matlab

I'm trying to link numerical (octave/matlab) values in an array to string values in the same array. How can I go about doing this? The reason I'm trying to do this is to sort the array based on the numerical values.
Example:
array=[1,2,'filename1';3,4,'filename2';5,6,'filename3'] (I know this is incorrect and will give an error)
This is what I'm trying to get it to look like so I can sort based on the first or the second column and have the third column be "linked" / follow the sort. (Please note the numbers will not be a sequential sequence like 1,2,3... I just used that as an example)
1,2,filename1
3,4,filename2
5,6,filename3
If I sort the first numerical column in descending order it should look like this
5,6,filename3
3,4,filename2
1,2,filename1
How can I go about doing this and still get the values of the array individually?
Example:
array(1,1) would be 5
and array(3,3) would be filename1
If you want to know, I plan on creating a playlist of wavfile names based on this sort.
Ps: I'm using Octave/Matlab
You can use 2 separate arrays, one with 2 columns of numbers and one with the strings. When you sort the first array based on the first number, the sort function will return both the sorted list and the reordered indices. You can use the reordered indices to rearrange the strings.
It would be something like this:
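[~, indx] = sort(array1(:,1)); %// sort by the first column only
sort_list = array1(indx,:); %// reorder whole rows
array_string = array_strings(indx); %// strings follow the same order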
I don't think access in that way is possible. Maybe you can do something similar with cells, but it would be more expensive than using 2 arrays.
Use a cell array. You then sort using sortrows:
>> myCell = {1, 2, 'filename1';
3, 4, 'filename2';
5, 6, 'filename3'};
>> array = flipud(sortrows(myCell, 1)) %// "1" => col. "flipud" => descending
ans =
[5] [6] 'filename3'
[3] [4] 'filename2'
[1] [2] 'filename1'
>> array(1,1)
ans =
[5]
>> array(3,3)
ans =
'filename1'
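The two-array idea from the first answer (sort the numbers, then reuse the resulting order for the strings) is not specific to Octave/MATLAB. As an illustrative NumPy sketch, with variable names chosen here rather than taken from the thread:

```python
import numpy as np

numbers = np.array([[1, 2], [3, 4], [5, 6]])
names = np.array(["filename1", "filename2", "filename3"])

# Row order that sorts column 0 in descending order.
order = np.argsort(numbers[:, 0])[::-1]

# Apply the same order to both arrays so rows and names stay linked.
numbers_sorted = numbers[order]
names_sorted = names[order]

print(numbers_sorted[0, 0])  # 5
print(names_sorted[2])       # filename1
```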
