I have a piece of code here and I can't seem to figure out an efficient way of converting it to the Fortran 95 equivalent. I have tried several things already, but I keep getting stuck on making 1D arrays from matrices and the other way around (the point is to reduce calculation time, and if I convert them, I can't think of another way than using loops again).
This is the piece of code:
do i = 1, dim
  do j = 1, dim
    Snorm(i,j) = Sval(j)/Sval(i)
    Bnorm(i,j) = Bval(j)/Bval(i)
    Pnorm(i,j) = Pval(j)/Pval(i)
  enddo
enddo
How would you write that in Fortran 95 code?
The equivalent of the matrix calculations in R is this:
Snorm <- t(Sval %*% t(1/Sval))
Bnorm <- t(Bval %*% t(1/Bval))
Pnorm <- t(Pval %*% t(1/Pval))
The equivalent of it in Python is this:
Snorm = (numpy.dot((Svalmat.T),(1/Svalmat))).T
Bnorm = (numpy.dot((Bvalmat.T),(1/Bvalmat))).T
Pnorm = (numpy.dot((Pvalmat.T),(1/Pvalmat))).T
where Svalmat etc. are the equivalents of Sval, but as column matrices.
Does anyone have an idea?
In my opinion it is not worth changing: it is already valid Fortran 95, especially if your goal is calculation time. Any "clever" tricks with subarrays can introduce array temporaries.
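For example, a sketch of the kind of whole-array alternative being warned about here, using the intrinsic spread (the compiler may well allocate temporaries for both operands):

  ! Array-syntax equivalent of Snorm(i,j) = Sval(j)/Sval(i), no explicit loops
  Snorm = spread(Sval, dim=1, ncopies=dim) / spread(Sval, dim=2, ncopies=dim)
  Bnorm = spread(Bval, dim=1, ncopies=dim) / spread(Bval, dim=2, ncopies=dim)
  Pnorm = spread(Pval, dim=1, ncopies=dim) / spread(Pval, dim=2, ncopies=dim)

It is shorter, but not necessarily faster than the plain loops.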
The obvious things to try are forall and do concurrent:
forall(i=1:dim, j=1:dim)
  Snorm(i,j) = Sval(j)/Sval(i)
  Bnorm(i,j) = Bval(j)/Bval(i)
  Pnorm(i,j) = Pval(j)/Pval(i)
end forall
and the same with do concurrent. (Note that do concurrent is Fortran 2008, not Fortran 95, so forall is the strictly conforming choice here.)
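A sketch of the do concurrent form:

  do concurrent (i = 1:dim, j = 1:dim)
    Snorm(i,j) = Sval(j)/Sval(i)
    Bnorm(i,j) = Bval(j)/Bval(i)
    Pnorm(i,j) = Pval(j)/Pval(i)
  end do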
Notice that your original order of the loops is probably not efficient: Fortran stores arrays in column-major order, so the first index should vary fastest, i.e., the loop over i should be the inner one.
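If you keep plain loops, a cache-friendlier ordering of the same computation would be:

  do j = 1, dim
    do i = 1, dim    ! first index varies fastest: contiguous, column-major access
      Snorm(i,j) = Sval(j)/Sval(i)
      Bnorm(i,j) = Bval(j)/Bval(i)
      Pnorm(i,j) = Pval(j)/Pval(i)
    enddo
  enddo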
I need to multiply parts of a column vector with a fixed row vector. I solved this problem using a for-loop. However, I am wondering if the performance can be improved as I have to perform this kind of computation around 50 million times. Here's my code so far:
multMat = 1:5;
mat = randi(5,10,1);
windowSize = 5;
vout = nan(10,1);
for r = windowSize : 10
    vout(r) = multMat * mat((r - windowSize + 1) : r);
end
I was thinking about using arrayfun. However, first I don't know how to address the cell range (i.e. the previous five cells including the current cell), and second, I am not sure if arrayfun will be any faster than using the loop.
This sliding vector multiplication you're describing is an example of what is known as convolution. The following produces the same result as the loop in your example:
vout = [nan(windowSize-1,1);
        conv(mat,flip(multMat),'valid')];
If your output doesn't really need the leading NaN values, which aren't overwritten in your loop, then the conv expression is sufficient without concatenating the NaN elements to it.
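That is, a minimal sketch of the padding-free variant:

  % Keeps only the fully-overlapping products, dropping the first
  % windowSize-1 positions instead of padding them with NaN
  vout = conv(mat, flip(multMat), 'valid');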
For sufficiently large vectors this is of course not guaranteed to be as fast as you'd like it to be, but MATLAB's built-in convolution implementation is likely to be pretty close to an optimal tool for the job.
I'm doing some Bayesian analysis using Stan and I'm trying to make my code more efficient.
In my Stan model string, I have a variable that is an NxJ matrix. It is declared this way to make use of quick matrix operations and assignments.
However, in the final modeling step (assigning a distribution), I need to transform this NxJ matrix into an N-long array that contains J real values in each of the array's elements.
In other words, I want the following transformation:
matrix[N,J] x;
vector[J] y[N];
for (i in 1:N)
  for (j in 1:J)
    y[i][j] = x[i,j];
Is there any way to do this in a vectorized way without for loops?
Thank you!!!!
No. Loops are very fast in Stan. The only reason to vectorize for speed is if there are derivatives involved. You could shorten it a bit to
for (n in 1:N)
  y[n] = x[n]';
but it wouldn't be any more efficient.
I should qualify this by saying that there is one inefficiency here, which is lack of memory locality. If the matrices are large, they'll be slow to traverse by row because internally they are stored column-major.
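If that ever becomes a bottleneck, one locality-friendly sketch (assuming you are free to transpose the declaration to J x N) copies whole columns, which are contiguous:

  matrix[J, N] x;      // transposed declaration
  vector[J] y[N];
  for (n in 1:N)
    y[n] = col(x, n);  // column access is contiguous in column-major storage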
OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.
I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.
Consider the case where I have a function that provides the square of a Float64. I might write this as:
function mysquare(x::Float64)
    return(x^2);
end
Sometimes, I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:
function mysquare(x::Array{Float64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end
But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:
function mysquare(x::Int64)
    return(x^2);
end
function mysquare(x::Array{Int64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end
Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?
function mysquare{T<:Number}(x::T)
    return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end
This feels sensible, but will my code run as quickly as the case where I avoid parametric types?
In summary, there are two parts to my question:
If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?
When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?
Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.
Julia compiles a specific version of your function for each set of inputs as required. Thus to answer part 1, there is no performance difference. The parametric way is the way to go.
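You can watch this specialization happen with the code-introspection macros; a quick sketch:

  mysquare(x::Number) = x^2

  @code_llvm mysquare(2)      # LLVM IR specialized for Int
  @code_llvm mysquare(2.0)    # LLVM IR specialized for Float64

The two dumps differ because each call compiled its own specialized method.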
As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case however you can use the in-built macro @vectorize_1arg to automatically generate the array version, e.g.:
function mysquare{T<:Number}(x::T)
    return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))
As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array, however, is Float64. One way to handle this would be to change it to
function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(T, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
where I've added the @inbounds macro to boost speed, because we don't need to check for bounds violations on every access; we know the lengths. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(typeof(one(T)^2), n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end
where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to make hyper-robust library code. If you really only will be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.
You only need to provide the scalar version of the function, written in the normal way.
function mysquare(x::Number)
    return(x^2)
end
Append a . to the function name (or prepend it to the operator) to call it on every element of an array:
x = [1 2 3 4]
x2 = mysquare(2) # 4
xs = mysquare.(x) # [1,4,9,16]
xs = mysquare.(x*x') # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y = x .+ 1 # [2 3 4 5]
Note that the dot-call will handle broadcasting, as in the last example.
If you have multiple dot-calls in the same expression, they will be fused, so that y = sqrt.(sin.(x)) makes a single pass/allocation instead of creating a temporary array containing sin.(x) and forwarding it to sqrt. (This is different from Matlab/NumPy/Octave/Python/R, which don't make such a guarantee.)
The macro @. vectorizes everything on a line, so @. y = sqrt(sin(x)) is the same as y = sqrt.(sin.(x)). This is particularly handy with polynomials, where the repeated dots can be confusing.
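For instance, a quick sketch with a throwaway polynomial:

  x = randn(100)

  # Every operator needs its own dot:
  y = 3 .* x.^2 .+ 5 .* x .+ 2

  # @. adds the dots for you, and the whole line fuses into one loop:
  @. y = 3x^2 + 5x + 2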
This is another step in my battle with multi-dimensional arrays in R; my previous question is here :)
I have a big R array with the following dimensions:
> data = array(..., dim = c(x, y, N, value))
I'd like to perform a sort of bootstrap comparing the mean (see here for a discussion about it) obtained with:
> vmean = apply(data, c(1,2,3), mean)
With the mean obtained by sampling the N values randomly with replacement. To explain better: if data[1,1,,1] equals [v1 v2 v3 ... vN], I'd like to replace it with something like [v_k1 v_k2 v_k3 ... v_kN], with the k values sampled with sample(N, N, replace = T).
Of course I want to AVOID a for loop. I've read this, but I don't know how to index this array efficiently while avoiding a loop through x and y.
Any ideas?
UPDATE: the important thing here is that I want a different sample for each entry of the fourth (value) dimension; otherwise it would be simple to do something like:
> dataSample = data[,,sample(N, N, replace = T), ]
Also, there's the compiler package, which speeds up for loops by using a just-in-time (JIT) compiler. Adding these lines at the top of your code enables the compiler for all code:
require("compiler")
compilePKGS(enable=T)
enableJIT(3)
setCompilerOptions(suppressAll=T)
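You can also byte-compile individual functions with cmpfun; a sketch with a made-up loop-heavy function f:

  library(compiler)
  f <- function(x) { s <- 0; for (v in x) s <- s + v; s }  # deliberately loopy
  fc <- cmpfun(f)  # byte-compiled copy of f
  fc(1:1e6)        # same result as f(1:1e6), typically faster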
I have this 'simplified' Fortran code:
real B(100, 200)
real A(100, 200)
... initialize B array code.
do I = 1, 100
  do J = 1, 200
    A(J,I) = B(J,I)
  end do
end do
One of the programming gurus warned me that Fortran accesses data efficiently in column order, while C accesses data efficiently in row order. He suggested that I take a good, hard look at the code and be prepared to switch loops around to maintain the speed of the old program.
Being the lazy programmer that I am, and recognizing the days of effort involved and the mistakes I am likely to make, I started wondering if there might be a #define technique that would let me convert this code safely and easily.
Do you have any suggestions?
In C, multi-dimensional arrays work like this:
#define array_length(a) (sizeof(a)/sizeof((a)[0]))
float a[100][200];
a[x][y] == ((float *)a)[array_length(a[0])*x + y];
In other words, they're really flat arrays and [][] is just syntactic sugar.
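A quick sketch that checks the flat-layout claim (the indices are arbitrary, chosen just for the test):

  #include <assert.h>

  #define array_length(a) (sizeof(a)/sizeof((a)[0]))

  int main(void) {
      float a[100][200];
      int x = 3, y = 7;
      /* Same element, addressed both ways */
      assert(&a[x][y] == &((float *)a)[array_length(a[0])*x + y]);
      return 0;
  }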
Suppose you do this:
#define at(a, i, j) ((typeof(**(a)) *)a)[(i) + array_length((a)[0])*(j)]
float a[100][200];
float b[100][200];
for (i = 0; i < 100; i++)
    for (j = 0; j < 200; j++)
        at(a, j, i) = at(b, j, i);
You're walking sequentially through memory, and pretending that a and b are actually laid out in column-major order. It's kind of horrible in that a[x][y] != at(a, x, y) != a[y][x], but as long as you remember that it's tricked out like this, you'll be fine.
Edit
Man, I feel dumb. The intention of this definition is to make at(a, x, y) == a[y][x], and it does. So the much simpler and easier-to-understand
#define at(a, i, j) (a)[j][i]
would be better than what I suggested above.
Are you sure your FORTRAN guys did things right?
The code snippet you originally posted is already accessing the arrays in row-major order (which is 'inefficient' for FORTRAN, 'efficient' for C).
As illustrated by the snippet of code and as mentioned in your question, getting this 'correct' can be error-prone. Worry about getting the FORTRAN code ported to C first, without worrying about details like this. When the port is working, then you can worry about changing column-order accesses to row-order accesses (if it even really matters after the port is working).
One of my first programming jobs out of college was to fix a long-running C app that had been ported from FORTRAN. The arrays were much larger than yours and it was taking something around 27 hours per run. After fixing it, they ran in about 2.5 hours... pretty sweet!
(OK, it really wasn't assigned, but I was curious and found a big problem with their code. Some of the old timers didn't like me much despite this fix.)
It would seem that the same issue is found here.
real B(100, 200)
real A(100, 200)
... initialize B array code.
do I = 1, 100
  do J = 1, 200
    A(I,J) = B(I,J)
  end do
end do
Your looping (to be good FORTRAN) would be:
real B(100, 200)
real A(100, 200)
... initialize B array code.
do J = 1, 200
  do I = 1, 100
    A(I,J) = B(I,J)
  end do
end do
Otherwise you are marching through the arrays in row-major order, which could be highly inefficient.
At least I believe that's how it would be in FORTRAN - it's been a long time.
Saw you updated the code...
Now, you'd want to swap the loop control variables so that you iterate over the rows and, inside that, over the columns, if you are converting to C.
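A minimal C sketch of that ordering (assuming the same 100 x 200 arrays):

  float A[100][200], B[100][200];
  int i, j;

  /* In C the LAST index varies fastest in memory, so keep j innermost. */
  for (i = 0; i < 100; i++)
      for (j = 0; j < 200; j++)
          A[i][j] = B[i][j];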