Is it possible in modern Fortran to use a vector to index a multidimensional array? That is, given, say,
integer, dimension(3) :: index = [4,6,9]
double precision, dimension(10,10,10) :: data
is there a better (more general) way to access data(4,6,9) than writing data(index(1), index(2), index(3))? It would be good not to have to hard-code the rank of the data array.
(Naively I would like to write data(index) but of course this actually means something different - subset "gathering" - requiring data to be a rank-one array itself.)
For what it's worth this is essentially the same question as multidimensional index by array of indices in JavaScript, but in Fortran instead. Unfortunately the clever answers there won't work with predefined array ranks.
No. And all the workarounds I can think of are ghastly hacks; you're better off writing a function that takes data and index as arguments and returns the element(s) you want.
You might, however, be able to use modern Fortran's capabilities for array rank remapping to do exactly the opposite, which might satisfy your wish to play fast-and-loose with array ranks.
Given the declaration
double precision, dimension(1000), target :: data
you can define a rank-3 pointer
double precision, pointer :: index_3d(:,:,:)
and then set it like this:
index_3d(1:10,1:10,1:10) => data
and hey presto, you can now use both rank-3 and rank-1 indices into data, which is close to what you want to do. I've not used this in anger yet, but a couple of simple tests haven't revealed any serious problems.
I was trying to write a library for linear algebra operations in Haskell. In order to be able to define safe operations for matrices and vectors I wanted to encode their dimensions in their types. After some research I found that using DataKinds one is able to do that, similar to the way it's done here. For example:
data Vector (n :: Nat) a
dot :: Num a => Vector n a -> Vector n a -> a
In the aforementioned article, as well as in some libraries, the size of the vectors is a phantom type and the vector type itself is a wrapper around an Array. In trying to figure out whether there is an array type with its size at the type level in the standard library, I started wondering about the underlying representation of arrays. From what I could gather from this commentary on GHC memory layout, arrays need to store their size on the heap, so a 3-dimensional vector would take up one more word than necessary. Of course we could use the following definition:
data Vector3 a = Vector3 a a a
which might be fine if we only care about 3D geometry, but it doesn't allow for vectors of arbitrary size and also it makes indexing awkward.
So, my question is this. Wouldn't it be useful and a potential memory optimization to have an array type with statically known size in the standard library? As far as I understand, the only thing it would need is a different info table, which would store the size instead of it being stored in each heap object. Also, the compiler could choose between Array and SmallArray automatically.
Wouldn't it be useful and a potential memory optimization to have an array type with statically known size in the standard library?
Sure. I suspect if you wrote up your use case carefully and implemented this, GHC HQ would accept a patch. You might want to do the writeup first and double-check that they're into it to avoid wasting time on a patch they won't accept, though; I certainly don't speak for them.
Also, the compiler could choose between Array and SmallArray automatically.
I'm not an expert here, but I kinda doubt this. Usually supporting polymorphism means you need a uniform representation.
In Fortran, I have a 1D array of type real, real :: work(2*N), which represents N complex numbers. I have no control over the declaration of the array.
Later I need to apply a complex conjugation to work. However, conjg(work(:)) does not work since the array is of type real.
Is there an efficient way to convince the compiler to apply conjg to my array?
The easiest approach is already in the comment by HighPerformanceMark, just multiply the elements representing the imaginary part by -1.
You can also use equivalence between a real array and a complex array. It will be just one array, viewed as both real and complex. It may not be strictly standard conforming (I'm not sure), but it works as long as N is a constant.
The equivalence is used as:
real :: work(2*N)
complex :: cwork(N)
!both work and cwork point to the same data
equivalence (work, cwork)
work = some_initial_value
!this conjugates work at the same time as cwork because they are just different names for the same array
cwork = conjg(cwork)
Use a complex variable, COMPLEX :: temp(N), and apply the conjugation to that. You can then separate the real and imaginary parts and put them back into your work array by using REAL(temp) and AIMAG(temp).
Probably it is better to make your work a complex type from the outset though.
I'm writing a matrix library (part of SciRuby) with multiple storage types ('stypes') and multiple data types ('dtypes'). For example, a matrix's stype may currently be dense, yale (AKA 'csr'), or list-of-lists; and its dtype may be int8, int16, int32, int64, float32, float64, complex64, etc.
It's super easy to write a template processor in Ruby or sed which takes a basic function (like sparse matrix multiplication) and creates a custom version for each possible dtype. I could even write such a template to handle two different dtypes, say if we wanted to multiply an int32 by a float64.
The same can be done in certain cases for different stypes. Eventually, though, you could end up with a very large set of functions, many of which never even get used in the course of most people's use.
It's also easy to use function pointer arrays to enable access to these functions -- and imagining even a 3-dimensional function pointer array is not too hard:
MultFuncs[lhs->stype][lhs->dtype][rhs->dtype](lhs->shape[0], rhs->shape[1], lhs->data, rhs->data, result->data);
// This might point to some function like this:
// i32_f64_dense_mult(size_t, size_t, int32_t*, float64*, float64*);
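A minimal sketch of the function pointer table idea, reduced to two dimensions (lhs dtype by rhs dtype) and to element-wise addition rather than multiplication so that it stays short. The names dtype_t, add_fn, DEFINE_ADD, AddFuncs and add_dispatch are hypothetical, not part of any existing library, and the macro merely stands in for the Ruby or sed template processor mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dtype codes for this sketch; a real library would have more. */
typedef enum { DT_INT32 = 0, DT_FLOAT64 = 1, DT_COUNT = 2 } dtype_t;

/* Every kernel shares one signature so all of them fit in one table. */
typedef void (*add_fn)(size_t n, const void *lhs, const void *rhs, double *out);

/* Stand-in for the template processor: stamp out one kernel per type pair. */
#define DEFINE_ADD(name, LHS_T, RHS_T)                                        \
    static void name(size_t n, const void *lhs, const void *rhs, double *out) \
    {                                                                         \
        const LHS_T *a = lhs;                                                 \
        const RHS_T *b = rhs;                                                 \
        for (size_t i = 0; i < n; ++i)                                        \
            out[i] = (double)a[i] + (double)b[i];                             \
    }

DEFINE_ADD(add_i32_i32, int32_t, int32_t)
DEFINE_ADD(add_i32_f64, int32_t, double)
DEFINE_ADD(add_f64_i32, double, int32_t)
DEFINE_ADD(add_f64_f64, double, double)

/* Const table indexed by [lhs dtype][rhs dtype]. */
static const add_fn AddFuncs[DT_COUNT][DT_COUNT] = {
    [DT_INT32]   = { [DT_INT32] = add_i32_i32, [DT_FLOAT64] = add_i32_f64 },
    [DT_FLOAT64] = { [DT_INT32] = add_f64_i32, [DT_FLOAT64] = add_f64_f64 },
};

/* Look up the kernel for the two runtime dtypes and call it. */
void add_dispatch(dtype_t lhs_dtype, dtype_t rhs_dtype, size_t n,
                  const void *lhs, const void *rhs, double *out)
{
    assert((unsigned)lhs_dtype < DT_COUNT && (unsigned)rhs_dtype < DT_COUNT);
    AddFuncs[lhs_dtype][rhs_dtype](n, lhs, rhs, out);
}
```

Calling add_dispatch(DT_INT32, DT_FLOAT64, n, a, b, out) costs two array lookups and one indirect call, whatever the size of the table. The table itself holds one pointer per type combination.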
The extreme alternative to function pointer arrays, of course, which would be incredibly complicated to code and maintain, is hierarchical switch or if/else statements:
switch(lhs->stype) {
case STYPE_SPARSE:
switch(lhs->dtype) {
case DTYPE_INT32:
switch(rhs->dtype) {
case DTYPE_FLOAT64:
i32_f64_mult(lhs->shape[0], rhs->shape[1], lhs->ija, rhs->ija, lhs->a, rhs->a, result->data);
break;
// ... and so on ...
It also seems that this would be O(sd^2), where s=number of stypes, d=number of dtypes for every operation, whereas the function pointer array would be O(r), where r=number of dimensions in the array.
But there's also a third option.
The third option is to use function pointer arrays for common operations (e.g., copying from one unknown type to another):
SetFuncs[lhs->dtype][rhs->dtype](5, // copy five consecutive items
&to, // destination
dtype_sizeof[lhs->dtype], // dtype_sizeof is a const size_t array giving sizeof(int8_t), sizeof(int16_t), etc.
&from, // source
dtype_sizeof[rhs->dtype]);
And then to call that from a generic sparse matrix multiplication function, which might be declared like this:
void generic_sparse_multiply(size_t* ija, size_t* ijb, void* a, void* b, int dtype_a, int dtype_b);
And that would use SetFuncs[dtype_a][dtype_b] to reference the correct assignment function, for example. The downside, then, is that you might have to implement a whole bunch of these -- IncrementFuncs, DecrementFuncs, MultFuncs, AddFuncs, SubFuncs, etc. -- because you'd never know what types to expect.
So, finally, my questions:
What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
I realize this is an incredibly complicated array of questions.
If you can simply refer me to a good resource and prefer not to answer directly, that's perfectly fine. I used the Google extensively before posting this, but wasn't quite sure what search terms to use.
First of all, try to reduce the complexity of the function(s). You should be able to have a declaration like
Result_t function (Param_t*);
where Param_t is a struct containing all those things you pass around. To use generic types, include an enum in the struct telling which type that is used, and a void* to that type.
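One way the struct-with-a-tag suggestion might look, as a sketch only. The field names, the elem_type_t enum and the sum_elements function are made up for illustration, and Result_t is assumed here to be a plain double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag saying what the void pointer really points to. */
typedef enum { ELEM_INT32, ELEM_FLOAT64 } elem_type_t;

/* Everything a function needs, bundled into one parameter struct. */
typedef struct {
    elem_type_t type;   /* which element type data holds */
    size_t      count;  /* number of elements */
    const void *data;   /* untyped pointer to the elements */
} Param_t;

typedef double Result_t; /* assumption: the result is a single double */

/* Sum the elements, switching on the tag to recover the real type. */
Result_t sum_elements(const Param_t *p)
{
    assert(p != NULL && (p->data != NULL || p->count == 0));
    double total = 0.0;
    switch (p->type) {
    case ELEM_INT32: {
        const int32_t *v = p->data;
        for (size_t i = 0; i < p->count; ++i)
            total += (double)v[i];
        break;
    }
    case ELEM_FLOAT64: {
        const double *v = p->data;
        for (size_t i = 0; i < p->count; ++i)
            total += v[i];
        break;
    }
    }
    return total;
}
```

The caller fills in a Param_t and passes its address, so adding a field later does not change any function signature.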
1. What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Definitely a larger executable. Load time depends on what system the code is for. If it is for a RAM-based system (PC etc.), then the load time might increase, but it shouldn't have any major impact on performance. Though of course it depends on how large "enormous" is :)
2. Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
Probably not; there is only so much that the compiler can optimize. When dealing with generic data types in C, it often boils down to memcpy() in the end, which itself is hopefully implemented to be as fast as copying gets.
3. If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
Ironically, the compiler would probably optimize it into something like an array of function pointers. The compiler most likely cannot, however, predict the nature of the data, especially if it gets set at run time.
I want to implement (what represents abstractly) a two dimensional 4x4 matrix. All the code I write for matrix multiplication et cetera will be entirely "unrolled" as it were -- that is to say, I will not be using loops to access and write data entries in the matrix.
My question is: In C, would it be faster to use a struct as such:
typedef struct {
double e0, e1, e2, e3, e4, ..., e15;
} My4x4Matrix;
Or would this be faster:
typedef double My4x4Matrix[16];
Given that I will be accessing each matrix element individually as such:
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c.e0=a.e0+b.e0;
c.e1=a.e1+b.e1;
...
Or
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c[0]=a[0]+b[0];
c[1]=a[1]+b[1];
...
Or are they exactly the same speed?
Any decent compiler will generate the exact same code, byte-for-byte. However, using arrays allows you a lot more flexibility; when accessing the matrix elements, you can choose whether you want to access fixed locations or address positions with variables.
I also highly question your choice to "unwind" (unroll?) all the operations by hand. Any good compiler can fully unroll loops with a constant number of iterations for you, and can perhaps even generate SIMD code and/or optimally schedule the order of instructions. You'll have a hard time doing better by hand, and you'll end up with code that's hideous to read. The fact that you asked this question suggests to me that you're probably not sufficiently experienced to do better than even a naive optimizing compiler.
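For comparison, a sketch of what the loop-based version could look like. The names mat4_add and mat4_mul are hypothetical, and row-major storage is an assumption. Every loop has a trip count known at compile time, which is the situation where an optimizing compiler is free to unroll or vectorize; whether it actually does depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

enum { MAT4_DIM = 4, MAT4_ELEMS = MAT4_DIM * MAT4_DIM };

/* c = a + b, element by element, over 16 contiguous doubles. */
void mat4_add(const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < MAT4_ELEMS; ++i)
        c[i] = a[i] + b[i];
}

/* c = a * b for row-major 4x4 matrices. c must not alias a or b. */
void mat4_mul(const double *a, const double *b, double *c)
{
    assert(a && b && c && c != a && c != b);
    for (size_t row = 0; row < MAT4_DIM; ++row) {
        for (size_t col = 0; col < MAT4_DIM; ++col) {
            double sum = 0.0;
            for (size_t k = 0; k < MAT4_DIM; ++k)
                sum += a[row * MAT4_DIM + k] * b[k * MAT4_DIM + col];
            c[row * MAT4_DIM + col] = sum;
        }
    }
}
```

Both functions accept a plain double[16] such as the array typedef from the question, since an array argument decays to a pointer to its first element.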
Struct elements (fields) can only be accessed by their names explicitly specified in the program's source, which means that every time you access a field the actual field must be selected and hardcoded at compile time. If you wanted to implement the same thing with arrays, that would mean that you would use explicit constant compile-time array indices (as in your example). In this case the performance of the two will be exactly the same and the code generated will be exactly the same (excluding from consideration "malicious" compilers).
However, note that arrays provide you with an extra degree of freedom: if necessary, you can select array elements by a run-time index. This is something that's not possible with structs. Only you know whether it matters to you.
On the other hand, note also that arrays in C are not copyable, which means that you'll be forced to use memcpy to copy your array-based My4x4Matrix. With struct-based version normal language-level copying will work. With arrays this issue can be worked around by wrapping the actual array in a struct.
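The wrapping workaround can be sketched as follows. Mat4, mat4_identity and mat4_get are hypothetical names, and row-major storage is an assumption. The struct makes plain assignment copy all 16 elements, while the array inside it keeps run-time indexing available.

```c
#include <assert.h>
#include <stddef.h>

/* An array wrapped in a struct: copyable by assignment, indexable at run time. */
typedef struct {
    double e[16];
} Mat4;

/* Build the identity matrix (row-major). */
Mat4 mat4_identity(void)
{
    Mat4 m = { {0.0} };            /* remaining elements are zero-initialized */
    for (size_t i = 0; i < 4; ++i)
        m.e[i * 4 + i] = 1.0;
    return m;                      /* returned by value, no memcpy needed */
}

/* Read one element using indices chosen at run time. */
double mat4_get(const Mat4 *m, size_t row, size_t col)
{
    assert(m != NULL && row < 4 && col < 4);
    return m->e[row * 4 + col];
}
```

With this layout, Mat4 b = a; copies the whole matrix, and a later write to b.e[5] leaves a untouched.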
I guess both are the same speed. The difference between a struct and an array is just its meaning (in human terms.) Both will be compiled as memory addresses.
I would say the best way is to create a test to try it yourself. Results may vary based on system environments and compilers.