I am trying to convert MATLAB code to C. The MATLAB code uses a singular value decomposition (SVD) of 3x3 matrices, which I implemented in C using Numerical Recipes. The MATLAB code then works with the right singular vectors, which in some cases that I tested differ between MATLAB and C: either the second and third columns are swapped, or some values have opposite signs. In other cases the values are identical. Here are some examples:
Expl1: (Identical values without considering round-off error)
Matlab:
-0.3939 0.9010 0.1819
0.6583 0.1385 0.7399
0.6414 0.4112 -0.6477
C:
-0.3939 0.9010 0.1819
0.6584 0.1385 0.7398
0.6414 0.4112 -0.6477
Expl2: (swapped 2nd and 3rd columns)
Matlab:
-0.0309 0.1010 0.9944
-0.0073 -0.9949 0.1008
0.9995 -0.0042 0.0315
C:
-0.0309 0.9944 0.1010
-0.0074 0.1008 -0.9949
0.9995 0.0315 -0.0042
Expl3: (opposite values)
Matlab:
-0.1712 -0.8130 -0.5566
-0.8861 -0.1199 0.4476
0.4306 -0.5698 0.6999
C:
-0.1712 0.8130 0.5566
-0.8861 0.1199 -0.4477
0.4307 0.5698 -0.6999
Would this difference cause erroneous results?
The right singular vectors of a matrix with distinct singular values are unique up to multiplication by a unit-phase factor. For real singular vectors, this comes down to a change of sign.
Also, since each singular vector corresponds to a particular singular value (a diagonal entry of Σ), the order of the singular vectors changes when the singular values appear in a different order on the diagonal of Σ.
Whether these changes cause erroneous results depends heavily on what you intend to do with the right singular vectors later on in your code.
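If the later steps assume a particular convention, one way to make the two implementations comparable is to bring both V matrices into a canonical form before using them. Below is a minimal C sketch of that idea (not part of Numerical Recipes; the function name and layout are just illustrative): it sorts the columns of V by descending singular value and flips the sign of any column whose largest-magnitude entry is negative.

#include <math.h>

/* Bring a 3x3 right-singular-vector matrix V (V[r][j] = row r, column j)
 * and its singular values s[3] into a canonical form:
 *   - columns ordered by descending singular value,
 *   - each column's sign chosen so its largest-magnitude entry is positive.
 * Illustrative normalization only, not taken from the question's code. */
void canonicalize_v(double V[3][3], double s[3])
{
    /* Selection sort of s, carrying the matching columns of V along. */
    for (int i = 0; i < 3; ++i) {
        int imax = i;
        for (int j = i + 1; j < 3; ++j)
            if (s[j] > s[imax]) imax = j;
        if (imax != i) {
            double ts = s[i]; s[i] = s[imax]; s[imax] = ts;
            for (int r = 0; r < 3; ++r) {
                double tv = V[r][i]; V[r][i] = V[r][imax]; V[r][imax] = tv;
            }
        }
    }
    /* Sign convention: largest-magnitude entry of each column is positive. */
    for (int j = 0; j < 3; ++j) {
        int rmax = 0;
        for (int r = 1; r < 3; ++r)
            if (fabs(V[r][j]) > fabs(V[rmax][j])) rmax = r;
        if (V[rmax][j] < 0.0)
            for (int r = 0; r < 3; ++r) V[r][j] = -V[r][j];
    }
}

Note that if your code also keeps U, the same column permutation and sign flips have to be applied to the columns of U, otherwise U*S*V' no longer reconstructs the original matrix.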
I'm trying to implement the algorithm for multiplying two sparse matrices from this paper: https://crd.lbl.gov/assets/pubs_presos/spgemmicpp08.pdf (the first algorithm - 1D algorithm).
What bothers me is that I'm not sure what a SPA (sparse accumulator) really is. I've done some research and what I've concluded is that a SPA represents a single row/column of a sparse matrix (I'm mostly unsure about that part) and that it consists of a dense vector of nonzero values, a list of indices of nonzero elements (why a list?), and a dense boolean vector of "occupied" flags (True at the i-th index if the element at that position of the active row/column is nonzero). Some implementations also keep the number of nonzero entries.
Am I correct? If so, I have some questions. If this structure has a dense boolean vector and we must keep the values, isn't it easier to simply fill one dense vector and ignore that it's sparse? I'm sure that there are reasons why this is more efficient (memory and time), but I don't see why.
Also, as I've already asked, why is everything a vector except the list of indices? Why isn't that also a vector?
Thanks in advance!
Many sparse matrix algorithms use a dense working vector to allow random access to the currently "active" column or row of a matrix.
The sparse MATLAB implementation formalizes this idea by defining an abstract data type called the sparse accumulator, or SPA. The SPA consists of a dense vector of real (or complex) values, a dense vector of true/false "occupied" flags, and an unordered list of the indices whose occupied flags are true.
The SPA represents a column vector whose "unoccupied" positions are zero and whose "occupied" positions have values (zero or nonzero) specified by the dense real or complex vector. It allows random access to a single element in constant time, as well as sequencing through the occupied positions in constant time per element.
Check section 3.1.3 at https://epubs.siam.org/doi/pdf/10.1137/0613024
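To make those three pieces concrete, here is a rough C sketch of a SPA for one active column, assuming 0-based indices and double values; the type and function names (spa_t, spa_scatter, spa_clear) are made up for the example and are not from the paper.

#include <stdbool.h>

/* Sparse accumulator (SPA) for one active column of length n. */
typedef struct {
    double *value;    /* dense values (only meaningful where occupied)         */
    bool   *occupied; /* dense "occupied" flags                                */
    int    *index;    /* unordered list of indices whose occupied flag is true */
    int     nnz;      /* number of occupied positions                          */
    int     n;        /* length of the column                                  */
} spa_t;

/* value[i] += v, recording i in the index list the first time i is touched. */
void spa_scatter(spa_t *spa, int i, double v)
{
    if (!spa->occupied[i]) {
        spa->occupied[i] = true;
        spa->value[i]    = 0.0;
        spa->index[spa->nnz++] = i;   /* O(1) append: this is why the list exists */
    }
    spa->value[i] += v;
}

/* Reset only the occupied positions, so starting the next column costs
 * O(nnz) instead of O(n). */
void spa_clear(spa_t *spa)
{
    for (int k = 0; k < spa->nnz; ++k)
        spa->occupied[spa->index[k]] = false;
    spa->nnz = 0;
}

The index list is what answers the "isn't one dense vector enough?" question: the two dense vectors give O(1) random access while accumulating, and the list lets you gather the finished column's nonzeros and reset the SPA in time proportional to the number of nonzeros rather than to the column length n.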
I need help
First of all, I'm not asking whether the two data sets are equal (A==B), or whether they have similar features, because I already know they are similar.
I have two 2D data sets (they are actually two vector fields), one 'fixed' and one 'experimental', and I want to know HOW equal they are. My idea is to get a number per point that says how equal they are, on a scale from 0 to 1 (including decimals). The goal is an iterative algorithm that finds the experimental data set that best agrees with the fixed one, but first I need a way to measure "how equal they are".
It's like measuring the error in order to minimize it.
If |A| = |B| and the two sets share the same (or nearly the same) sample points, you could simply use the standard deviation of the pairwise differences |a - b|, where a ∈ A and b ∈ B. You don't need a separate temporary array if you use a stable, online algorithm like Welford's; just take the square root at the end to get the standard deviation.
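As a rough illustration, assuming both fields are already stored as flat arrays of equal length, Welford's one-pass algorithm applied to the pairwise differences might look like this in C (the function name and signature are made up for the example):

#include <math.h>
#include <stddef.h>

/* Standard deviation of |a[i] - b[i]| over n pairs, using Welford's stable
 * one-pass update, so no temporary array of differences is needed. */
double stddev_abs_diff(const double *a, const double *b, size_t n)
{
    double mean = 0.0, m2 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double x     = fabs(a[i] - b[i]);
        double delta = x - mean;
        mean += delta / (double)(i + 1);
        m2   += delta * (x - mean);   /* uses the updated mean */
    }
    return (n > 1) ? sqrt(m2 / (double)(n - 1)) : 0.0;
}

Dividing by n - 1 gives the sample standard deviation; divide by n instead if you want the population version. The running mean also comes for free, in case the average error turns out to be a more convenient quantity to minimize.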
I have a variable called x that is a 3D array.
In my code I need to compute the norm of x(1,1,:) (the vector composed of x(1,1,1), x(1,1,2), ...). If I try to use the norm(x(1,i,:)) command, MATLAB returns the error "Input must be 2-D". What can I do?
MATLAB's norm is a "special" function; it doesn't work like many other functions such as sum and mean.
However, vecnorm does behave like those functions. It computes the norm along the first non-singleton dimension, or you can specify along which dimension to compute the norm:
vecnorm(x(1,1,:))
vecnorm(x,2,3) % computes 2-norm for all vectors along 3rd dimension.
Note that this function was introduced in R2017b. For older versions you can emulate the behavior using sqrt(sum(x.^2,3)).
I am using the mean function in MATLAB on a 4D matrix.
The matrix is a 32x2x20x7 array and I wish to find the mean of each row, of all columns and elements of 3rd dimension, for each 4th dimension.
So basically mean(data(b,:,:,c)) [pseudo-code] for each b, c.
However, when I do this it spits out separate means for each 3rd dimension. Do you know how I can get it to give me one mean for the above expression, so that there would be (32x7 =) 224 means?
You could do it without loops:
data = rand(32,2,20,7); %// example data
squeeze(mean(mean(data,3),2))
The key is to use a second argument to mean, which specifies across which dimension the mean is taken (in your case: dimensions 2 and 3). squeeze just removes singleton dimensions.
This should work, storing all (32x7 =) 224 means in a 32x7 matrix:
a = rand(32,2,20,7);          % example data
m = zeros(32,7);              % preallocate the result
for i = 1:32
    for j = 1:7
        c = a(i,:,:,j);       % all columns and 3rd-dim elements for this (i,j)
        m(i,j) = mean(c(:));  % one mean over the 2*20 values
    end
end
Note that with two calls to mean, there will be small numerical differences in the result depending on the order of operations. As such, I suggest doing this with one call to mean to avoid such concerns:
squeeze(mean(reshape(data,size(data,1),[],size(data,4)),2))
Or if you dislike squeeze (some people do!):
mean(permute(reshape(data,size(data,1),[],size(data,4)),[1 3 2]),3)
Both commands use reshape to combine the second and third dimensions of data, so that a single call to mean on the new larger second dimension will perform all of the required computations.
I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix, or data frame so that I get a single answer rather than a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix, or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general for a data.frame if you want something to act on all values, you generally will just unlist it first.
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are defunct.
By default, mean, median, etc. work over an entire array or matrix.
E.g.:
# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.
# matrix:
mean(as.matrix(m)) # same as before
For data frames, you can coerce them to a matrix first (the reason the default is per column is that a data frame can have columns containing strings, which you can't take the mean of):
# data frame
mdf <- as.data.frame(m)
# mean(mdf) returns column means
mean( as.matrix(mdf) ) # one value.
Just be careful that your dataframe has all numeric columns before coercing to matrix. Or exclude the non-numeric ones.
You can use the dplyr library (install it via install.packages('dplyr')) and then:
library(dplyr)
dataframe.mean <- dataframe %>%
  summarise_all(mean)   # replace mean with median for medians