Clarification of the leading dimension in CUBLAS when transposing

For a matrix A, the documentation only states that the corresponding leading dimension parameter lda refers to the:
"leading dimension of two-dimensional array used to store the matrix A"
Thus I presume this is just the number of rows of A, given CUBLAS's column-major storage format. However, when we consider op(A), what does the leading dimension refer to now?

Nothing changes. The leading dimension always refers to the length of the first dimension of the storage array. The operation flags (normal, transpose, conjugate transpose) only tell BLAS how the data within the array is to be interpreted; they have no effect on the array itself, which is always column-major ordered and requires an LDA value for 2D indexing.
So whether the matrix data is stored in transposed form or not, an m x n array always has LDA >= m.
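For concreteness, here is a minimal sketch (host-side C, cuBLAS v2 API) of C = A^T * B, where the array holding A is k x m in column-major order. The function name and the pointers d_A, d_B, d_C are placeholders, and handle creation, allocation and error checking are omitted:
#include <cublas_v2.h>

/* C = A^T * B. The array holding A is k x m (column-major) on the device,
   so lda is k even though op(A) = A^T is m x k. */
void gemm_with_transposed_A(cublasHandle_t handle,
                            const float *d_A,  /* stored k x m, lda = k */
                            const float *d_B,  /* stored k x n, ldb = k */
                            float *d_C,        /* stored m x n, ldc = m */
                            int m, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle,
                CUBLAS_OP_T,   /* op(A) = A^T */
                CUBLAS_OP_N,   /* op(B) = B   */
                m, n, k,
                &alpha,
                d_A, k,        /* lda = rows of the stored array, not of op(A) */
                d_B, k,
                &beta,
                d_C, m);
}
The point to notice is that lda stays k, the first dimension of the array as it sits in memory, regardless of the CUBLAS_OP_T flag.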

If you are using a row-major representation, the leading dimension is the number of columns; with a column-major representation it is the number of rows.

Related

Sparse matrices, sparse accumulator and multiplication

I'm trying to implement the algorithm for multiplying two sparse matrices from this paper: https://crd.lbl.gov/assets/pubs_presos/spgemmicpp08.pdf (the first algorithm - 1D algorithm).
What bothers me is that I'm not sure what an SPA (sparse accumulator) really is. I've done some research, and what I've concluded is that an SPA represents a single row/column of a sparse matrix (I'm mostly unsure about that part) and that it consists of a dense vector with the nonzero values, a list of indices of the nonzero elements (why a list?) and a dense boolean vector of "occupied" flags (True at index i if the element at that position of the active row/column is not zero). Some implementations also keep the number of nonzero entries.
Am I correct? If so, I have some questions. If this structure has a dense boolean vector and we must keep the values, isn't it easier to simply fill one dense vector and ignore that it's sparse? I'm sure that there are reasons why this is more efficient (memory and time), but I don't see why.
Also, as I've already asked, why is everything a vector except the list of indices? Why isn't that also a vector?
Thanks in advance!
Many sparse matrix algorithms use a dense working vector to allow random access to the currently "active" column or row of a matrix.
The sparse MATLAB implementation formalizes this idea by defining an abstract data type called the sparse accumulator, or SPA. The SPA consists of a dense vector of real (or complex) values, a dense vector of true/false "occupied" flags, and an unordered list of the indices whose occupied flags are true.
The SPA represents a column vector whose "unoccupied" positions are zero and whose "occupied" positions have values (zero or nonzero) specified by the dense real or complex vector. It allows random access to a single element in constant time, as well as sequencing through the occupied positions in constant time per element.
Check section 3.1.3 at https://epubs.siam.org/doi/pdf/10.1137/0613024
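To make that description concrete, here is a minimal sketch of a SPA in C. The type and function names (SPA, spa_scatter, spa_gather) are made up for this example and are not taken from the paper:
#include <stdlib.h>
#include <stdbool.h>

typedef struct {
    double *value;    /* dense vector of values, length n           */
    bool   *occupied; /* dense vector of "occupied" flags, length n */
    int    *index;    /* unordered list of occupied indices         */
    int     nnz;      /* number of occupied positions               */
    int     n;        /* length of the represented column           */
} SPA;

SPA *spa_create(int n) {
    SPA *s = malloc(sizeof *s);
    s->value    = calloc(n, sizeof *s->value);
    s->occupied = calloc(n, sizeof *s->occupied);
    s->index    = malloc(n * sizeof *s->index);
    s->nnz = 0;
    s->n = n;
    return s;
}

/* Accumulate v into position i in O(1). */
void spa_scatter(SPA *s, int i, double v) {
    if (!s->occupied[i]) {
        s->occupied[i] = true;
        s->value[i] = v;
        s->index[s->nnz++] = i;   /* remember which entries to visit later */
    } else {
        s->value[i] += v;
    }
}

/* Walk only the occupied entries (O(nnz), not O(n)) and clear the SPA
   so it can be reused for the next column. Returns the entry count. */
int spa_gather(SPA *s, int *out_index, double *out_value) {
    int count = s->nnz;
    for (int t = 0; t < count; ++t) {
        int i = s->index[t];
        out_index[t] = i;
        out_value[t] = s->value[i];
        s->occupied[i] = false;
        s->value[i] = 0.0;
    }
    s->nnz = 0;
    return count;
}
This also hints at why it beats simply using one dense vector and ignoring sparsity: scattering is O(1) either way, but gathering the finished column and resetting the accumulator costs O(nnz) thanks to the index list, rather than an O(n) scan of the whole dense vector for every one of the n columns of the product.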

signrank test in a three-dimensional array in MATLAB

I have a 60x60x35 array and would like to use the Wilcoxon signed rank test to check, for each element, whether the median of its values across the third array dimension (i.e. across the 35 values) is different from zero. Thus, I would like my results in two 60x60 arrays: one with values of 0 or 1 for the test decision, and a separate array with the corresponding p-values.
The problem I am facing is specifying the command so that the desired output has the appropriate dimensions and is calculated across the appropriate dimension of the array.
Thanks for your help and all the best!
So one way to solve your problem is using a nested for-loop. Let's say your data is stored in data:
data = rand(60,60,35);                    % example data
size_data = size(data);
p = zeros(size_data(1), size_data(2));    % preallocate
p(:,:) = NaN;                             % NaN marks entries that were never written
h = zeros(size_data(1), size_data(2));
h(:,:) = NaN;
for k = 1:size_data(1)
    for l = 1:size_data(2)
        tmp_data = data(k,l,:);                            % 1x1x35 slice
        tmp_data = reshape(tmp_data, 1, numel(tmp_data));  % make it a 1x35 vector
        [p(k,l), h(k,l)] = signrank(tmp_data);
    end
end
What I am doing is: I preallocate p and h as 60x60 matrices and then set them to NaN, so you can easily see if something went wrong (0 would be an acceptable result, so it would not stand out). Then I loop over all elements, pull the current 1x1x35 slice into a temporary variable and reshape it, because signrank needs its input to be a vector.
I guess you could skip those loops by using arrayfun over the (k,l) index grid.

How to efficiently store large sparse array

I am tracking particles in a 3D lattice. Each lattice cell is labeled with an index corresponding to an unrolled 3D array
S = x + WIDTH * (y + DEPTH * z)
I am interested in the transition from cell S1 to cell S2. The resulting transition matrix M(S1,S2) is sparsely populated, because particles can only reach nearby cells. Unfortunately, with the indexing of an unrolled 3D array, cells that are geometrically near can have very different indices. For instance, cells sitting on top of each other (say at z and z+1) have their indices shifted by WIDTH*DEPTH. Therefore, if I accumulate into the 2D matrix M(S1,S2), S1 and S2 will be very different even though the cells are adjacent. This is a significant problem, because I can't use the usual sparse matrix storage.
At the beginning I tried storing the matrix in coordinate format:
I, J, VALUE
Unfortunately I need to loop over the entire index set to find the proper (S1,S2) entry and store the accumulated M(S1,S2).
Usually sparse matrices have some underlying structure and therefore the indexing is quite straightforward. In this case, however, I have some trouble figuring out how to index my cells.
I would appreciate your help
Thank you in advance,
There are several approaches. Which is best depends on the operations that need to be performed on the matrix.
A good general purpose one is to use a hash table where the key is the index tuple, in your case (i,j).
If neighboring (in the Euclidean sense) matrix elements must be discoverable, then an alternate strategy is a balanced tree with a Morton Order key. The Morton order value of a key (i,j) is just the integers i and j with their bits interleaved. You should quickly see that index tuples close to each other in the index 2-space are also close in linear Morton order.
Of course if you are building the matrix all at once, after which it's immutable, then you can build the key-value pairs in an array rather than a hash table or balanced tree, sort them (lexicographically for (i,j) pairs and linearly for Morton keys) and then do reads with simple binary search.
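As an illustration of the Morton-order key, here is a minimal sketch in C that interleaves the bits of two 32-bit indices into a single 64-bit key; the helper names are made up for this example:
#include <stdint.h>

/* Spread the bits of a 32-bit value into the even bit positions of a
   64-bit value: ...b3 b2 b1 b0  ->  ...0 b3 0 b2 0 b1 0 b0 */
static uint64_t spread_bits(uint32_t v) {
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

/* Morton (Z-order) key for the index tuple (i, j): the bits of i and j
   interleaved, so tuples close in 2D index space tend to be close in
   the 1D key order. */
uint64_t morton_key(uint32_t i, uint32_t j) {
    return (spread_bits(i) << 1) | spread_bits(j);
}
Sorting the key-value pairs by morton_key, or using it as the key of a balanced tree, keeps entries whose (i,j) index tuples are close in 2D close together in the linear order.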

Mean of a 4D array across selected dimensions

I am using the mean function in MATLAB on a 4D matrix.
The matrix is a 32x2x20x7 array, and for each row and each element of the 4th dimension I wish to find the mean over all columns and all elements of the 3rd dimension.
So basically mean(data(b,:,:,c)) [pseudo-code] for each b, c.
However, when I do this it gives me separate means for each slice of the 3rd dimension. Do you know how I can get it to give me one mean for the above expression, so there would be (32x7 =) 224 means?
You could do it without loops:
data = rand(32,2,20,7); %// example data
squeeze(mean(mean(data,3),2))
The key is to use a second argument to mean, which specifies across which dimension the mean is taken (in your case: dimensions 2 and 3). squeeze just removes singleton dimensions.
This should work:
a = rand(32,2,20,7);          % example data
for i = 1:32
    for j = 1:7
        c = a(i,:,:,j);       % 1x2x20x1 slice for this (row, 4th-dim) pair
        mean(c(:))            % one mean over all 2*20 = 40 values
    end
end
Note that with two calls to mean, there will be small numerical differences in the result depending on the order of operations. As such, I suggest doing this with one call to mean to avoid such concerns:
squeeze(mean(reshape(data,size(data,1),[],size(data,4)),2))
Or if you dislike squeeze (some people do!):
mean(permute(reshape(data,size(data,1),[],size(data,4)),[1 3 2]),3)
Both commands use reshape to combine the second and third dimensions of data, so that a single call to mean on the new larger second dimension will perform all of the required computations.

How to get mean, median, and other statistics over entire matrix, array or dataframe?

I know this is a basic question, but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or data frame to get a single answer, rather than a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix, or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general for a data.frame if you want something to act on all values, you generally will just unlist it first.
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are defunct.
By default, mean and median etc. work over an entire array or matrix.
E.g.:
# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.
# matrix:
mean(as.matrix(m)) # same as before
For data frames, you can coerce them to a matrix first (the reason the default is to work over columns is that a data frame can have columns with strings in it, which you can't take the mean of):
# data frame
mdf <- as.data.frame(m)
# mean(mdf) returns column means
mean( as.matrix(mdf) ) # one value.
Just be careful that your dataframe has all numeric columns before coercing to matrix. Or exclude the non-numeric ones.
You can use the dplyr library (install it via install.packages('dplyr')) and then:
dataframe.mean <- dataframe %>%
    summarise_all(mean)    # use median instead of mean for the median
