Numpy reshape list to 3D Array

I have a list of arrays that I would like to be reshaped. Each array is a trial, the columns within each array are features, and the rows are timesteps. I would like the list reshaped to (trial, timestep, feature). As an example, D is what I am trying to convert to a 3D array; the timesteps are not uniform.
A = np.random.rand(3,10) #Trial 1 has 3 timesteps and ten features
B = np.random.rand(10,10) #Trial 2 has 10 timesteps and ten features
C = np.random.rand(7,10) #Trial 3 has 7 timesteps and ten features
D = [A,B,C] #Data as given in the form of a list
How can I get a 3D array with variable timesteps? I am trying to use this as an input to a Keras neural network.

You can do the following:
D = tf.ragged.stack([tf.RaggedTensor.from_tensor(x) for x in [A, B, C]])
This yields a ragged tensor with shape TensorShape([3, None, None]).
Or
values = np.vstack([A, B, C])
D = tf.RaggedTensor.from_row_lengths(values, [x.shape[0] for x in [A, B, C]])
which yields a ragged tensor with shape: (3, None, 10)
Working with ragged tensors in Keras can be tricky.
Typically, for most applications the best choice is to pick a reasonable maximum sequence length, pad the shorter trials, and mask the padded timesteps. For some applications I'm working on that is not a very attractive option, because I have lots of documents with very small sequences and then some with very large sequences. But if you don't feel really comfortable with Keras/TensorFlow mechanics, you should probably avoid using ragged tensors.
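To make the padding-and-masking route concrete, here is a minimal sketch (not part of the original answer); the mask value of 0.0 and the LSTM/Dense layer sizes are illustrative assumptions:
import numpy as np
import tensorflow as tf

A = np.random.rand(3, 10)
B = np.random.rand(10, 10)
C = np.random.rand(7, 10)
trials = [A, B, C]

max_len = max(x.shape[0] for x in trials)       # or pick a sensible cap yourself
padded = np.zeros((len(trials), max_len, 10))   # (trial, timestep, feature)
for i, x in enumerate(trials):
    padded[i, :x.shape[0], :] = x               # zero-pad the missing timesteps

model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(max_len, 10)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
print(model(padded).shape)                      # (3, 1)
Note that with this scheme a real timestep that happens to be all zeros would also be masked, so pick a mask value that cannot occur in your data.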

Related

Tensorflow: analogue for numpy.take?

Is there an analogue for numpy.take?
I want to form an N+1-dimensional array from an N-dimensional array; more precisely, from an array with shape (B, H, W, C) I want to make a (B, H, W, X, C) array.
I suppose that for my case there is a solution even without such a general operation. But I'm really unsure that, if I write code with multiple intermediate operations and tensors (shifting, repeating and so on), TF will be able to optimize it and remove the unnecessary operations. Moreover, I suspect such code would be unclean and just awful.
I want to add a dimension with shifted values, i.e. for the (H,W) -> (H,W,3) case the indices must be
[
  [[0,0],   # [0,-1]; maybe pad with zeros, but for now pad with the edge value
   [0,0],
   [0,1]],
  [[0,0],
   [0,1],
   [0,2]],
  ...
  [[1,0],
   [1,0],
   [1,1]],
  [[1,0],
   [1,1],
   [1,2]],
  ...
]
I thought about tf.scatter_nd (https://www.tensorflow.org/api_docs/python/tf/scatter_nd), but for now I don't understand how to use it. If I understand correctly, I can't use indices with shapes larger than the shape of the updates array (i.e. I can't use indices with shape (3,4,5,3) and updates with shape (3,4,3), or even (3,4,1,3)). If that's so, then this operation seems useless unless I first build an intermediate array with the shape I need in the result.
UPD: maybe I'm wrong and tensor operations (shifting, tiling and so on) are the more appropriate and efficient solution.
But in any case I think an analogue of np.take would be useful.
The closest functions in TensorFlow to np.take are tf.gather and tf.gather_nd.
tf.gather_nd is more general than tf.gather (and np.take), as it can slice through several dimensions at once.
A noticeable restriction of tf.gather[_nd] compared to np.take is that they slice through the first dimensions of the tensor only -- you can't slice through inner dimensions. When you want to slice through an arbitrary dimension (as in your case), you need to transpose the array to put the slice dimension first, gather, then transpose back.
Example code for tf.gather replacing np.take:
import numpy as np
a = np.array([5, 7, 42])
b = np.random.randint(0, 3, (2, 3, 4))
c = a[b]
result_numpy = np.take(a, b)
print(a, b, c, result_numpy)
import tensorflow as tf
a = tf.convert_to_tensor(a)
b = tf.convert_to_tensor(b)
# c = a[b] # does not work
result_tf = tf.gather(a, b)
print(a, b, result_tf)
assert(np.array_equal(result_numpy, result_tf.numpy()))
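And for the inner-dimension case described above, here is a minimal sketch (not from the original answer) of the transpose-gather-transpose pattern; the shapes and index values are arbitrary:
import numpy as np
import tensorflow as tf

x = tf.constant(np.random.rand(2, 5, 4))   # suppose we want to index along axis 1
idx = tf.constant([3, 0, 1])

x_t = tf.transpose(x, perm=[1, 0, 2])      # move the target axis to the front
g = tf.gather(x_t, idx)                    # gather along the first axis
result = tf.transpose(g, perm=[1, 0, 2])   # restore the original axis order
print(result.shape)                        # (2, 3, 4)

# Recent TensorFlow versions also accept an axis argument, which avoids the transposes:
# result = tf.gather(x, idx, axis=1)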

How to structure multiple python arrays for sorting

A Fourier analysis I'm doing outputs 5 data fields, each of which I've collected into 1-D numpy arrays: freq bin #, amplitude, wavelength, normalized amplitude, and % power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work since dicts only accept 2 terms. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0. , 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers_pct = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
You appear to be using numpy, so here is the numpy way of doing this. You have the right function, np.argsort, in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick, only it computes the inverse shuffling compared to your procedure. (By the way, why go through a dictionary and not simply a list of tuples?)
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next smallest, and so on. Therefore
top5 = order[:-6:-1]
Provided your data are 1-D numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes via advanced indexing:
freq_bin[top5]
amplitudes[top5]
wavelength[top5]
If you want, you can group them together as columns and apply top5 to the rows of the resulting n-by-5 array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
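As a sketch of how this fits the data in the question (not from the original answer; it uses three of the question's arrays and takes the top 3, as the question asks, rather than the top 5):
import numpy as np

fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01,
                    8.24322470e+00, 2.36715913e+01])
freqs = np.array([0., 0.00246951, 0.00493902, 0.00740854, 0.00987805])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01,
                 2.01054261e-01, 5.77355886e-01])

order = np.argsort(fourier)   # positions of fourier values, smallest to largest
top3 = order[:-4:-1]          # positions of the 3 largest, i.e. descending amplitude

print(freqs[top3])            # frequencies of the top 3
print(fourier[top3])          # the top 3 values themselves
print(amps[top3])             # corresponding values from another field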
If I understand correctly, you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can use either numpy or vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish, or, if you are working with lists, you can reverse the list in place with my_sorted_list.reverse().
EDIT:
To get only the first n values, you simply have to slice the array, similarly to what @Paul is suggesting. Slicing works like classic list slicing, with start:stop:step arguments (you can omit the step). In your case, the top 5 columns would be [:,-5:]. So in the example above you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
result will be:
array([[15, 13],
       [ 3,  4],
       [24, 20]])
Hope it helps.

MATLAB pairwise differences in Nth dimension

Say I have an N-dimensional matrix A that can be of any size. For example:
A = rand([2,5,3]);
I want to calculate all possible pairwise differences between elements of the matrix along a given dimension. For example, if I wanted to calculate the differences along dimension 3, a shortcut would be to create a matrix like so:
B = cat(3, A(:,:,2) - A(:,:,1), A(:,:,3) - A(:,:,1), A(:,:,3) - A(:,:,2));
However, I want this to be able to operate along any dimension, with a matrix of any size. So, ideally, I'd like to either create a function that takes in a matrix A and calculates all pairwise differences along dimension DIM, or find a built-in MATLAB function that does the same thing.
The diff function seems like it could be useful, but it only calculates differences between adjacent elements, not all possible differences.
Doing my research on this issue, I have found a couple of posts about getting all possible differences, but most of these are for items in a vector (and ignore the dimensionality issue). Does anyone know of a quick fix?
Specific Dimension Cases
If you don't care about a general solution, the dim = 3 case is as simple as a couple of lines of code -
dim = 3
idx = fliplr(nchoosek(1:size(A,dim),2))
B = A(:,:,idx(:,1)) - A(:,:,idx(:,2))
You can move those idx(..) around to the appropriate dimension positions if you happen to know the dimension beforehand. So, let's say dim = 4, then just do -
B = A(:,:,:,idx(:,1)) - A(:,:,:,idx(:,2))
Or let's say dim = 3, but A is a 4D array, then do -
B = A(:,:,idx(:,1),:) - A(:,:,idx(:,2),:)
Generic Case
For an Nth-dim case, it seems you need to welcome a party of reshapes and permutes -
function out = pairwise_diff(A,dim)
%// New permuted dimension order: put dim first
new_permute = [dim setdiff(1:ndims(A),dim)];

%// Permuted A and its 2D reshaped version
A_perm = permute(A,new_permute);
A_perm_2d = reshape(A_perm,size(A,dim),[]);

%// Get pairwise indices for that dimension
N = size(A,dim);
[Y,X] = find(bsxfun(@gt,(1:N)',1:N)); %// or: fliplr(nchoosek(1:N,2))

%// Size of the new permuted array, whose first dimension length equals
%// the number of pairwise combinations
sz_A_perm = size(A_perm);
sz_A_perm(1) = numel(Y);

%// Get the pairwise differences; reshape to a multidimensional array with
%// the same number of dimensions as the input array
diff_mat = reshape(A_perm_2d(Y,:) - A_perm_2d(X,:),sz_A_perm);

%// Permute back to the original dimension sequence as the final output
[~,return_permute] = sort(new_permute);
out = permute(diff_mat,return_permute);
return
So much for a generalization, huh!

Cross products of elements of 3D array and matrix columns without loop in R

I'm working on a fishery stock assessment model and want to speed it up by removing a loop (actually two loops of the same form).
I have an array, A, dim(A)=[L,L,Y], and a matrix, M, dim(M)=[L,Y].
These are used to make a matrix, mat, dim(mat)=[L,Y], by calculating matrix products. My loop looks like:
for (i in 1:Y) {
  mat[, i] <- (A[,,i] %*% M[, i])[, 1]
}
Can anyone help me out? I really need a speed gain.
Also, (don't know if it'll make a difference but) each A[,,i] matrix is lower triangular.
I'm pretty sure this will give you the results you want. Since there is no reproducible example, I can't be absolutely sure; I had to trace some of the linear algebra logic to see what you are trying to accomplish.
library(plyr) # We need this to split the array into a list of 9 matrices
B = lapply(alply(A, 3), function(x) (x%*%M)) # Perform 9 linear algebra multiplications
sapply(1:9, function(i) (B[[i]])[,i]) # Extract the 9 columns you actually want.
I used the following test data:
A = array(rnorm(225), dim = c(5,5,9))
M = matrix(rnorm(45), nrow = 5, ncol = 9)

Multiplying array columns by vector

I'm new to R and I am certain that this is simple, yet I can't seem to find an answer. I have an array with dimensions [36, 21, 12012], and I need to multiply all of the columns by a vector of the same length to create a new array of the same dimensions.
If v is your vector and a is your array, in your case it would be as simple as v * a, because arrays are built column-wise. But in general, you would use sweep. For example to multiply along the rows, sweep(a, MARGIN=2, STATS=v, FUN='*').
