Say Y is a 7-dimensional array, and I need an efficient way to maximize it along the last 3 dimensions that will work on a GPU.
As a result I need a 4-dimensional array with the maximal values of Y and three 4-dimensional arrays with the indices of these values in the last three dimensions.
I can do
[Y7, X7] = max(Y , [], 7);
[Y6, X6] = max(Y7, [], 6);
[Y5, X5] = max(Y6, [], 5);
Then I have already found the values (Y5) and the indices along the 5th dimension (X5). But I still need indices along the 6th and 7th dimensions.
Here's a way to do it. Let N denote the number of dimensions along which to maximize.
1. Reshape Y to collapse the last N dimensions into one.
2. Maximize along the collapsed dimension. This gives the argmax as a linear index over those N dimensions.
3. Unroll the linear index into N subindices, one for each dimension.
The following code works for any number of dimensions (not necessarily 7 and 3 as in your example). To achieve that, it handles the size of Y generically and uses a comma-separated list obtained from a cell array to get N outputs from ind2sub.
Y = rand(2,3,2,3,2,3,2); % example 7-dimensional array
N = 3; % last dimensions along which to maximize
D = ndims(Y);
sz = size(Y);
[~, ind] = max(reshape(Y, [sz(1:D-N) prod(sz(D-N+1:end))]), [], D-N+1);
sub = cell(1,N);
[sub{:}] = ind2sub(sz(D-N+1:D), ind);
As a check, after running the above code, observe for example Y(2,3,1,2,:) (shown as a row vector for convenience):
>> reshape(Y(2,3,1,2,:), 1, [])
ans =
0.5621 0.4352 0.3672 0.9011 0.0332 0.5044 0.3416 0.6996 0.0610 0.2638 0.5586 0.3766
The maximum is seen to be 0.9011, which occurs at the 4th position (where "position" is defined along the N=3 collapsed dimensions). In fact,
>> ind(2,3,1,2)
ans =
4
>> Y(2,3,1,2,ind(2,3,1,2))
ans =
0.9011
or, in terms of the N=3 subindices,
>> Y(2,3,1,2,sub{1}(2,3,1,2),sub{2}(2,3,1,2),sub{3}(2,3,1,2))
ans =
0.9011
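For readers working in NumPy rather than MATLAB, here is a minimal sketch of the same three steps (collapse, maximize, unroll). The function name max_over_last is hypothetical. Note that NumPy indices are 0-based and that reshape and np.unravel_index default to row-major (C) order, whereas MATLAB is column-major, so the linear index values differ from MATLAB's, although the recovered subindices still point at the maximal element.

```python
import numpy as np

def max_over_last(Y, N):
    """Maximum of Y over its last N axes, plus the argmax subindex along each of those axes."""
    lead = Y.shape[:Y.ndim - N]          # axes that are kept
    tail = Y.shape[Y.ndim - N:]          # axes that are collapsed
    flat = Y.reshape(lead + (-1,))       # step 1: collapse the last N axes into one
    ind = flat.argmax(axis=-1)           # step 2: linear (0-based, C-order) index of the max
    vals = flat.max(axis=-1)
    subs = np.unravel_index(ind, tail)   # step 3: N index arrays, each shaped like `lead`
    return vals, subs

rng = np.random.default_rng(0)
Y = rng.random((2, 3, 2, 3, 2, 3, 2))   # example 7-dimensional array
vals, subs = max_over_last(Y, 3)

# Check one position: the subindices point back at the maximum
idx = (1, 2, 0, 1)
print(vals[idx] == Y[idx + tuple(s[idx] for s in subs)])
```

Because reshape and unravel_index both use C order by default, the linear index produced in step 2 and the unrolling in step 3 are consistent with each other.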
As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)

def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
            for k rows, where k = len(my_array)//num_col
    """
    my_array = shapeshifter(num_col=num_col, my_array=my_array)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
        such that looper(2^j) ==> shapeshifter(2^j)
            for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 20 rows of 2 columns, 10 rows of 4 columns, or 5 rows of 8 columns, BUT with 16 columns it cannot be reshaped without clipping data, since 40/16 is not an integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cut off the last values that do not fill a complete window. If this is not possible, I am hoping I can append zeroes so that the array can be reshaped, and then remove the zeroes afterwards. Or maybe even some complicated if-try-break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, this gives you an array containing the maximum of each window.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns).
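Applied to the question's data, a minimal sketch of the np.split approach (the name window_max is just for illustration) shows the ragged last window and the per-window maxima for each window size in the question:

```python
import numpy as np

data = np.linspace(1, 40, 40)

def window_max(a, window):
    # split points at window, 2*window, ...; the last chunk may be shorter than `window`
    chunks = np.split(a, list(range(window, a.size, window)))
    return [len(c) for c in chunks], np.array([c.max() for c in chunks])

for w in [2, 4, 8, 16, 32]:
    sizes, maxima = window_max(data, w)
    print(w, sizes[-1], maxima)
```

With 40 values, window sizes 16 and 32 leave a final chunk of only 8 values, which is exactly the case that made reshape fail.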
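If you really want to pad with zeros, you can use np.pad. Compute the pad width as (-my_array.size) % num_col, so that nothing is added when the size is already a multiple of num_col (otherwise you get an extra row of zeros):
def shapeshifter(num_col, my_array=data):
    return np.pad(my_array, (0, (-my_array.size) % num_col), 'constant', constant_values = 0).reshape(-1, num_col)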
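A minimal sketch of the padding route (pad_and_reshape and its fill parameter are hypothetical names). The pad width (-a.size) % num_col is 0 when the size is already a multiple of num_col, so no all-zero row appears in that case. One assumption worth stating: padding with 0 only leaves the row maxima unchanged when the data are non-negative; for data that can be negative, padding with -np.inf is the safer choice.

```python
import numpy as np

def pad_and_reshape(a, num_col, fill=0):
    # number of fill values needed to reach the next multiple of num_col (0 if already one)
    n_pad = (-a.size) % num_col
    padded = np.pad(a, (0, n_pad), mode='constant', constant_values=fill)
    return padded.reshape(-1, num_col)

data = np.linspace(1, 40, 40)
m8 = pad_and_reshape(data, 8)     # 40 is a multiple of 8: shape (5, 8), no padding
m16 = pad_and_reshape(data, 16)   # padded to 48 values: shape (3, 16)
row_max = m16.max(axis=1)
```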
Warning:
It is also technically possible to use, for example, a.resize(32,2), which will create an ndarray padded with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis yourself, because the -1 trick doesn't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it repeats a from the beginning to fill the new shape.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
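A small sketch contrasting the two behaviours, on a throwaway 5-element array. The method call passes refcheck=False to skip the reference check; that is only safe here because the copy is not shared with anything else.

```python
import numpy as np

a = np.arange(1, 6)                      # [1 2 3 4 5]

cycled = np.resize(a, (2, 4))            # function: repeats a from the start to fill the shape

padded = a.copy()                        # the method works in place, so use a private copy
padded.resize((2, 4), refcheck=False)    # method: missing entries are filled with zeros

print(cycled.tolist())                   # [[1, 2, 3, 4], [5, 1, 2, 3]]
print(padded.tolist())                   # [[1, 2, 3, 4], [5, 0, 0, 0]]
```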
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
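To check that the vectorized version agrees with the loop-based one, and to show how it could replace metalooper for the window sizes 2**j, here is a sketch that repeats both functions under hypothetical snake_case names. It assumes window <= a.size: a larger window would leave an empty body, and np.max of an empty array raises ValueError.

```python
import numpy as np

def window_func_split(a, window, f=np.max):
    # loop version: one Python-level call of f per window
    return np.array([f(c) for c in np.split(a, list(range(window, a.size, window)))])

def window_func_reshape(a, window, f=np.max):
    # vectorized version: reshape the full windows, handle the short tail separately
    tail = -(a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis=-1)
    body = a[:tail].reshape(-1, window)
    return np.r_[f(body, axis=-1), f(a[tail:])]

data = np.linspace(1, 40, 40)
# replacement for metalooper: one result array per window size 2**j
all_windows = [window_func_reshape(data, 2 ** j) for j in [1, 2, 3, 4, 5]]
```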
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Your shapeshifter could then use this in place of reshape.
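As a sketch of that substitution, here is shapeshifter built on reshape_and_truncate (the function is repeated so the snippet runs on its own; the two-argument shapeshifter signature is an assumption). Unlike padding or splitting, truncation silently drops the values that do not fill a complete row, so with 16 columns the last 8 values (33 to 40) are ignored.

```python
import numpy as np

def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size: keep as many whole rows as fit
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)

def shapeshifter(num_col, my_array):
    # hypothetical variant of the question's shapeshifter that truncates instead of failing
    return reshape_and_truncate(my_array, (-1, num_col))

data = np.linspace(1, 40, 40)
shapes = [shapeshifter(2 ** j, data).shape for j in [1, 2, 3, 4, 5]]
maxima16 = shapeshifter(16, data).max(axis=1)
```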
My code:
B = zeros(height(A),1);
col_names = A.Properties.VariableNames; % Replicate header names
for k = 1:height(A)
    % the following 'cellfun' compares each column to the values in A.L{k},
    % and returns a cell array of the result for each of them, then
    % 'cell2mat' converts it to logical array, and 'any' combines the
    % results for all elements in A.L{k} to one logical vector:
    C = any(cell2mat(...
        cellfun(@(x) strcmp(col_names,x),A.L{k},...
        'UniformOutput', false).'),1);
    % then a logical indexing is used to define the columns for summation:
    B(k) = sum(A{k,C});
end
generates the following error message.
Error using cellfun
Input #2 expected to be a cell array, was double instead.
How do I solve this error?
This is what table 'A' looks like:
A.L{1,1} contains:
C = any(cell2mat(...
    cellfun(@(x) strcmp(col_names,x),A.L{k},...
    'UniformOutput', false).'),1);
Here, A.L{k} returns the contents of the cell at the kth position of A.L, whereas A.L(k) returns the cell itself:
tmp = A.L(k);
C = any(cell2mat(...
    cellfun(@(x) strcmp(col_names,x),tmp{1},...
    'UniformOutput', false).'),1);
This is a bit of a hacky way: you first need to get the cell A.L(k) and then the contents of that cell, so you need a temporary variable.
I'm not entirely sure what's going on here, but here's a fabricated example that I think is similar to what you're trying to achieve.
%% Setup - fabricate some data
colNames = {'xx', 'yy', 'zz', 'qq'};
h = 20;
% It looks like 'L' contains something related to the column names
% so I'm going to build something like that.
L = repmat(colNames, h, 1);
% Empty some rows out completely
L(rand(h,1) > 0.7, :) = {''};
% Empty some other cells out at random
L(rand(numel(L), 1) > 0.8) = {''};
A = table(L, rand(h,1), rand(h, 1), rand(h, 1), rand(h, 1), ...
          'VariableNames', ['L', colNames]);
%% Attempt to process each row
varNames = A.Properties.VariableNames;
B = zeros(height(A), 1);
for k = 1:height(A)
    % I think this is what's required - work out which columns are
    % named in "A.L(k,:)". This can be done simply by using ISMEMBER
    % on the row of A.L.
    C = ismember(varNames, A.L(k,:));
    B(k) = sum(A{k, C});
end
If I'm completely off-course here, then perhaps you could give us an executable example.
I am using Python 2.7. As you may have seen from my previous posts, I am learning Python; I have moved on from arrays and am now working on loops. I am also trying to work with operations using arrays.
A1 = np.random.random_integers(35, size=(10.,5.))
A = np.array(A1)
B1 = np.random.random_integers(68, size=(10.,5.))
B = np.array(B1)
D = np.zeros(10,5) #array has 10 rows and 5 columns filled with zeros to give me the array size I want
for j in range (1,5):
    for k in range (1,5):
        D[j,k] = 0
        for el in range (1,10):
            D[j,k] = D[j,k] + A[j] * B[k]
The error I am getting is: setting an array element with a sequence
Is my formatting incorrect?
Because A, B and D are all 2D arrays, D[j,k] is a single element, while A[j] (the same as A[j,:]) is a 1D array which, in this case, has 5 elements. Similarly, B[k] is the same as B[k,:], i.e. also a 5-element array.
A[j] * B[k] is therefore also a five-element array, which cannot be stored in the place of a single element, and you therefore get the error: setting an array element with a sequence.
If you want to select single elements from A and B, then the last line should be
D[j,k] = D[j,k] + A[j,k] * B[j,k]
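A minimal reproduction of the error and of the fix, written for Python 3 with NumPy's Generator API. rng.integers is used here instead of random_integers; its upper bound is exclusive, so the bounds below give the same ranges as the question (1 to 35 and 1 to 68).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(1, 36, size=(10, 5))   # integers in [1, 35]
B = rng.integers(1, 69, size=(10, 5))   # integers in [1, 68]
D = np.zeros((10, 5))

raised = False
try:
    D[0, 0] = A[0] * B[0]    # A[0] * B[0] is a 5-element row, not a single value
except ValueError:
    raised = True            # "setting an array element with a sequence"

D[0, 0] = A[0, 0] * B[0, 0]  # one element from each array fits into D[0, 0]
```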
Some further comments on your code:
# A is already a numpy array, so 'A = np.array(A1)' is redundant and can be omitted
# size must be a tuple of integers; newer NumPy versions reject (10.,5.)
# (random_integers is deprecated; np.random.randint(1, 36, size=(10,5)) draws from the same range)
A = np.random.random_integers(35, size=(10,5))
# Same as above
B = np.random.random_integers(68, size=(10,5))
D = np.zeros([10,5]) # This is the correct syntax for creating a 2D array with the np.zeros() function
for j in range(1,5):
    for k in range(1,5):
        # D[j,k] = 0 You have already defined D to be zero for all elements with the np.zeros function, so there is no need to do it again
        for el in range(1,75):
            D[j,k] = D[j,k] + A[j,k] * B[j,k]
EDIT:
Well, I do not have enough reputation to comment on your post @Caroline.py, so I will do it here instead:
First of all, remember that Python uses zero-based indexing, so 'range(1,5)' gives you '[1,2,3,4]', which means that you would never reach the first index, i.e. index 0. You probably want to use 'range(0,5)', which is the same as just 'range(5)', instead.
I can see that you changed the el range from 75 to 10. If you don't use el for anything, it just means that you perform the last line 9 times (range(1,10) has 9 elements).
I don't know what you want to do, but if you want to store the element-wise product of A and B in D, then this should be right:
for j in range(10):
    for k in range(5):
        D[j,k] = A[j,k] * B[j,k]
or just
D = A * B
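A quick sketch (Python 3, with a seeded generator standing in for the question's random data) confirming that the zero-based double loop and the one-line D = A * B produce the same element-wise product:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(1, 36, size=(10, 5))
B = rng.integers(1, 69, size=(10, 5))

# explicit loops: range(10) and range(5) start at 0, so every row and column is visited
D_loop = np.zeros((10, 5))
for j in range(10):
    for k in range(5):
        D_loop[j, k] = A[j, k] * B[j, k]

D_vec = A * B   # element-wise product in one step
```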
I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its mapSize argument, and the calling function must of course take into account that rows and columns are swapped in numpy arrays (not x, y but y, x).
newSize then contains a tuple of fractions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalent to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
Testing this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and (64, 48) (a tuple) is passed as the mapSize argument.
However when testing this, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and numpy has a problem with that. But what exactly is mismatched, and how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
       [7, 7]])
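Following on from that session, a sketch of the direct NumPy counterpart of repmat(sm(j,i,:), [size(sm,1) size(sm,2)]): pass a third repetition count of 1 so the channel axis is kept, and take the sizes from sm.shape. It also shows that plain broadcasting, sm - sm[j,i,:], gives the same difference without building the tiled array. Here j, i = 0, 1 are 0-based, i.e. MATLAB's sm(1,2,:).

```python
import numpy as np

sm = np.arange(1, 13).reshape(2, 2, 3, order='F')  # same test array as the Octave session
j, i = 0, 1

# MATLAB: repmat(sm(j,i,:), [size(sm,1) size(sm,2)])
rep = np.tile(sm[j, i, :], (sm.shape[0], sm.shape[1], 1))

# Broadcasting gives the same difference without materializing `rep`
diff_tile = sm - rep
diff_bcast = sm - sm[j, i, :]
```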
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent of this?
for i in ...:
    for j in ...:
        data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for an (N,M,L) shaped sm):
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
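A sketch that checks the loop, repeat, and broadcasting versions against each other on a small random sm (Python 3, so range instead of xrange). It also includes one extra observation: the sum over all (i, j) of sm - sm[i,j,:] simplifies to N*M*sm minus the per-channel total over all pixels, which needs no (N*M, N*M, L) intermediate at all.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, L = 3, 4, 2
sm = rng.random((N, M, L))

# loop version: accumulate sm - sm[i,j,:] over every pixel (i, j)
x_loop = np.zeros_like(sm)
for i in range(N):
    for j in range(M):
        x_loop += sm - sm[i, j, :]

sm1 = sm.reshape(N * M, L)
z1 = np.repeat(sm1, N * M, axis=0).reshape(N * M, N * M, L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N, M, L)
x4 = np.sum(sm1 - sm1[:, None, :], 0).reshape(N, M, L)
x5 = np.sum(np.sum(sm[None, :, None, :, :] - sm[:, None, :, None, :], 0), 1)

# closed form: sum_{i,j} (sm - sm[i,j]) = N*M*sm - (sum over all pixels, per channel)
x_closed = N * M * sm - sm.sum(axis=(0, 1))
```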
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of sm[0] and sm[1], the first two slices along the first axis, so both return the same value (sm.shape[1]). You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your Matlab code, although I am not sure that it will work as you expect it to.
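A small sketch reproducing the shapes from the question, using a zero array with the resized image's shape (48, 64, 3) as a stand-in: every len(sm[k]) is 64, which is where the (64,64,192) operand in the error comes from, while sm.shape gives the sizes the repmat translation needs.

```python
import numpy as np

sm = np.zeros((48, 64, 3))   # stand-in for the resized image: rows x columns x channels

# len() of a slice along the first axis is the size of the second axis
print(len(sm[0]), len(sm[1]), len(sm[2]))   # 64 64 64
print(sm.shape)                             # (48, 64, 3)

# what the question's expression builds: 64 tiles of shape (64, 192)
bad = np.array([np.tile(sm[0, 0, :], (len(sm[0]), len(sm[1]))) for k in range(len(sm[2]))])
# the repmat-style tile sized from sm.shape
good = np.tile(sm[0, 0, :], (sm.shape[0], sm.shape[1], 1))
print(bad.shape, good.shape)                # (64, 64, 192) (48, 64, 3)
```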