I recently inherited a Fortran 2008 code where calls to BLAS/LAPACK are performed without explicit interfaces. That has resulted in some cases where arguments of the wrong type get passed in, e.g. reals being passed as integers.
My greater concern, however, is how arrays in general are passed into BLAS/LAPACK, for example
call daxpy(n, da, dx(start), 1, dy(start), 1)
where start is a valid position in the arrays dx and dy.
My understanding is that by not supplying an explicit interface for daxpy we can get away with passing in a real instead of a real array. Furthermore, I suspect that during execution BLAS/LAPACK see only the memory address of dx(start) and, assuming the storage is contiguous, access the next n entries from there. Please correct me if I am wrong on this one.
The above implementation (lack of explicit interfaces and passing only the start of arrays) makes me a bit uncomfortable, hence I decided to define the BLAS/LAPACK interfaces in the code. So the above call to daxpy has become
call daxpy(n, da, dx(start:), 1, dy(start:), 1)
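For reference, the interface block for daxpy is along these lines (a sketch following the reference BLAS argument list, with the array dummies declared assumed-size as in the original Fortran 77 routine; the exact kinds may differ for your library):

interface
   subroutine daxpy(n, da, dx, incx, dy, incy)
      integer, intent(in)             :: n, incx, incy
      double precision, intent(in)    :: da
      double precision, intent(in)    :: dx(*)
      double precision, intent(inout) :: dy(*)
   end subroutine daxpy
end interface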
My question is, is passing arrays like that considered good practice/safe? And moreover, is there any way to check/guarantee that no temporary arrays are created in the call?
Related
Is it possible in modern Fortran to use a vector to index a multidimensional array? That is, given, say,
integer, dimension(3) :: index = [4,6,9]
double precision, dimension(10,10,10) :: data
is there a better (more general) way to access data(4,6,9) than writing data(index(1), index(2), index(3))? It would be good not to have to hard-code the rank of the data array.
(Naively I would like to write data(index) but of course this actually means something different - subset "gathering" - requiring data to be a rank-one array itself.)
For what it's worth this is essentially the same question as multidimensional index by array of indices in JavaScript, but in Fortran instead. Unfortunately the clever answers there won't work with predefined array ranks.
No. And all the workarounds I can think of are ghastly hacks; you're better off writing a function that takes data and index as arguments and spits out the element(s) you want.
You might, however, be able to use modern Fortran's capabilities for array rank remapping to do exactly the opposite, which might satisfy your wish to play fast-and-loose with array ranks.
Given the declaration
double precision, dimension(1000), target :: data
you can define a rank-3 pointer
double precision, pointer :: index_3d(:,:,:)
and then set it like this:
index_3d(1:10,1:10,1:10) => data
and hey presto, you can now use both rank-3 and rank-1 indices into data, which is close to what you want to do. I've not used this in anger yet, but a couple of simple tests haven't revealed any serious problems.
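Put together, one such simple test might look like this (the comparison in the final print assumes the usual column-major ordering):

program remap_demo
   implicit none
   double precision, dimension(1000), target :: data
   double precision, pointer :: index_3d(:,:,:)
   integer :: i

   data = [(dble(i), i = 1, 1000)]    ! fill with recognisable values
   index_3d(1:10,1:10,1:10) => data   ! rank-remapping pointer assignment

   ! Element (4,6,9) of the rank-3 view is the same storage as
   ! data(4 + (6-1)*10 + (9-1)*100) in the rank-1 array.
   print *, index_3d(4,6,9), data(4 + (6-1)*10 + (9-1)*100)
end program remap_demo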
Is there any advantage to using pointer notation over array notation? I realize that there may be some special cases where pointer notation is better, but it seems to me that array notation is clearer. My professor told us that he prefers pointer notation "because it's C", but it's not something he will be marking. I also know that there are differences between declaring a string as a character array and declaring it as a pointer - I'm just talking about looping through an array in general.
If you write a straightforward loop, both array and pointer forms typically compile to the same machine code.
There are differences, especially with non-constant loop exit conditions, but they only matter if you are trying to optimize the loop for a specific compiler and architecture.
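For example, here is the same operation written both ways; with optimization enabled, mainstream compilers typically generate equivalent code for the two (a sketch):

#include <stddef.h>

/* Array notation. */
void scale_array(double *a, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= factor;
}

/* Pointer notation. */
void scale_pointer(double *a, size_t n, double factor)
{
    for (double *p = a; p != a + n; p++)
        *p *= factor;
}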
So, how about we consider a real world example that relies on both?
These types implement a double-precision floating-point matrix of dynamically determined size, with separate reference-counted data storage:
struct owner {
    long refcount;
    size_t size;
    double data[]; /* C99 flexible array member */
};

struct matrix {
    long rows;
    long cols;
    long rowstep;
    long colstep;
    double *origin;
    struct owner *owner;
};
The idea is that when you need a matrix, you describe it using a local variable of type struct matrix. All data referred to is stored in dynamically allocated struct owner structures, in the C99 flexible array member. After you no longer need the matrix, you must explicitly "drop" it. This allows multiple matrices to refer to the same data: you can even have separate row, column, or diagonal vectors, with any change to one immediately reflected in all the others (because they refer to the same data values).
When a matrix is associated with data, either by creating an empty matrix or by referring to data already referred to by another matrix, the refcount of the owner structure is incremented. Whenever a matrix is dropped, the refcount of the owner structure it refers to is decremented. The owner structure is freed when its refcount drops to zero. This means you only need to remember to "drop" each matrix you used, and the data referred to will be correctly managed and released as soon as it is no longer needed, but never too early.
This all assumes a single-threaded process; multithreaded handling is quite a bit more complicated.
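A sketch of the "drop" step described above, assuming the struct owner and struct matrix definitions given earlier (the function name is illustrative):

#include <stdlib.h>

void matrix_drop(struct matrix *m)
{
    if (m->owner != NULL && --(m->owner->refcount) <= 0)
        free(m->owner);    /* last reference gone: release the data */
    m->owner = NULL;       /* make further use of this matrix an obvious error */
    m->origin = NULL;
    m->rows = m->cols = 0;
}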
To access the element of a struct matrix m at row r, column c, assuming 0 <= r < m.rows and 0 <= c < m.cols, you use m.origin[r*m.rowstep + c*m.colstep].
If you want to transpose a matrix, you simply swap m.rows and m.cols, and m.rowstep and m.colstep. All that changes, is the order in which the data (stored in the owner structure) is read.
(Note that origin points to the double which appears at row 0, column 0, in the matrix; and that rowstep and colstep can be negative. This allows all kinds of weird "views" to the otherwise dull regular data, like mirrors and diagonals and so on.)
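In code, the transpose is just two swaps (a sketch; the function name is illustrative):

static inline void matrix_transpose(struct matrix *m)
{
    long tmp;
    tmp = m->rows;    m->rows = m->cols;       m->cols = tmp;
    tmp = m->rowstep; m->rowstep = m->colstep; m->colstep = tmp;
    /* The data in the owner structure is untouched; only the
       interpretation of (row, column) indices changes. */
}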
If we did not have the C99 flexible array member -- say, if we only had pointers, and no array notation at all -- the owner structure's data member would have to be a pointer. That would mean an additional indirection at the hardware level (slowing down data accesses a bit). We would either need to allocate the memory pointed to by data separately, or use tricks to point to an address following the owner structure itself, but suitably aligned for a double.
Multidimensional arrays do have their uses -- basically, when the sizes of all dimensions (or all but one dimension) are known -- and it's nice to have the compiler take care of the indexing, but that does not mean they are always easier to work with than pointer-based methods. For example, in the above matrix structure case, we can always define a helper preprocessor macro, like
#define MATRIXELEM(m, r, c) ((m).origin[(r)*(m).rowstep + (c)*(m).colstep])
which admittedly has the downside that it evaluates the first parameter, m, three times. (It means that MATRIXELEM(m++,0,0) would actually try to increment m three times.) In this particular case, m is normally a local variable of struct matrix type, which should minimize surprises. One could have e.g.
struct matrix m1, m2;
/* Stuff that initializes m1 and m2, and makes sure they point
to valid matrix data */
MATRIXELEM(m1, 0, 0) = MATRIXELEM(m2, 0, 0);
The "extra" parentheses in such macros ensure that if you use a calculation, for example i + 4*j as row, the index calculation is correct ((i + 4*j)*m.rowstep and not i + 4*j*m.rowstep). In preprocessor macros, those parentheses are not really "extra" at all. In addition to ensure the correct calculation, having the "extra" parentheses also tell other programmers that the macro writer has been careful in avoiding such arithmetic-related bugs. (I for one consider it "good form" to put the parentheses there, even in cases where they are not needed for syntax unambiquity, if it conveys that "assurance" to other developers reading the code.)
And this, after all this text, leads to my most important point: some things are easier for us human programmers to express and understand in array notation than in pointer notation, and vice versa. "Foo"[1] is pretty obviously equal to 'o', whereas *("Foo"+1) is not nearly as obvious. (Then again, neither is 1["Foo"], but you can blame the C standardization folks for that.)
Based on the examples above, I consider the two notations complementary. They do have a large overlap, especially in simple loops (in which case it is okay to just pick one), but being able to use both notations and to pick one based not on one's proficiency in one of them, but on one's judgment of what makes the most sense for readability and maintainability, is an important skill for any C programmer, in my not very humble opinion.
Actually, if you pass an array argument to a function in C, you actually pass a pointer to its beginning. This doesn't really pass an array in the common sense: first, because passing an array would include passing its actual length, and second, because passing an array (as a value) would imply copying it.
In other words, you really pass an iterator pointing to the array's beginning (like std::vector::begin() in C++), but you pretend that you pass the array itself. It's very confusing, in fact. So using pointers represents what is really happening in a much clearer way, and it should definitely be preferred.
There may be some advantages to array notation too, but I don't think they outweigh the drawbacks mentioned. First, using array notation emphasizes the difference between a pointer to a single value and a pointer to a contiguous block. And then, you may specify an expected size for the passed array for your own reference. But that size isn't actually passed to expressions or functions, nor is it checked in any way, which is very confusing.
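As a small illustration of both points (a sketch; the declared bound 10 below is purely for the reader and is not enforced by the language):

#include <stdio.h>

void f(int a[10])                 /* exactly equivalent to: void f(int *a) */
{
    printf("inside f:  sizeof a   = %zu\n", sizeof a);   /* size of a pointer, not of the array */
}

int main(void)
{
    int arr[3] = {1, 2, 3};       /* smaller than the "declared" 10 */
    printf("in main:   sizeof arr = %zu\n", sizeof arr); /* 3 * sizeof(int) */
    f(arr);                       /* arr decays to a pointer to its first element */
    return 0;
}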
I have successfully written a complicated function with the PETSc library (an MPI-based scientific library for solving huge linear systems in parallel). This library provides its own version of "malloc" and its own basic datatypes (e.g. "PetscInt" for the standard "int"). For this function, I had always been using the PETSc equivalents instead of standard stuff such as "malloc" and "int". The function has been extensively tested and always worked fine. Despite the use of MPI, the function is fully serial, and all processors perform it on the same data (each processor has its own copy): no communication is involved at all.
Then, I decided to stop using PETSc and write a standard MPI version. Basically, I rewrote all the code, substituting the PETSc stuff with classic C stuff, not with brute force but paying attention to each substitution (no editor "Replace" tool, I mean! All done by hand). During the substitution, a few minor changes were made, such as declaring two different variables a and b instead of declaring a[2]. These are the substitutions:
PetscMalloc -> malloc
PetscScalar -> double
PetscInt -> int
PetscBool -> created an enum to replicate it, as C (before C99) doesn't have a boolean datatype.
Basically, the algorithms have not been changed during the substitution process. The main function is a "for" loop (actually 4 nested loops). At each iteration, it calls another function; let's call it Disfunction. Well, Disfunction works perfectly outside the 4-loop nest (as I tested it separately), but inside it, in some cases it works and in some it doesn't. Also, I checked the data passed to Disfunction at each iteration: with EXACTLY the same input, Disfunction performs different computations from one iteration to another.
Also, the computed data doesn't seem to be the result of undefined behaviour, as Disfunction always gives back the same results across different runs of the program.
I've noticed that changing the number of processes given to "mpiexec" gives different computational results.
That's my problem. A few other considerations: the program uses "malloc" extensively; the computed data is the same for all processes, correct or not; Valgrind doesn't detect errors (apart from flagging a normal use of printf, which is another problem and off-topic here); Disfunction calls two other functions recursively (extensively tested in the PETSc version as well); the algorithms involved are mathematically correct; Disfunction depends on an integer parameter p > 0: for p = 1, 2, 3, 4, 5 it works PERFECTLY, for p >= 6 it does not.
If asked, I can post the code, but it's long and complicated (scientifically, not programmatically) and I think it would take time to explain.
My idea is that I mess up with memory allocations, but I can't understand where.
Sorry for my English and for the bad formatting.
Well, I don't know if anyone is still interested, but the problem was that the PETSc function PetscMalloc zero-initializes the data, unlike standard C malloc. Stupid mistake... – user3029623
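In other words, where the PETSc version could rely on zero-filled allocations, the plain C version has to do the zeroing itself, for example (a sketch):

#include <stdlib.h>
#include <string.h>

double *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(double));        /* zero-initialized, unlike malloc */
}

double *alloc_zeroed_alt(size_t n)
{
    double *p = malloc(n * sizeof(double));
    if (p != NULL)
        memset(p, 0, n * sizeof(double));    /* explicit zeroing after malloc */
    return p;
}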
The only suggestion I can offer without reference to the code itself is to try to construct progressively simpler test cases that demonstrate your issue.
When you narrow down the iterative process to a single point in your data set or a single step (by eliminating some loops), does the error still occur? If not, that might suggest the loop bounds are wrong.
Does the erroneous output always occur on particular loop indices, especially the first or last? Perhaps there are some ghost or halo values you're missing or some boundary condition that you're not properly accounting for.
I'm writing a matrix library (part of SciRuby) with multiple storage types ('stypes') and multiple data types ('dtypes'). For example, a matrix's stype may currently be dense, yale (AKA 'csr'), or list-of-lists; and its dtype may be int8, int16, int32, int64, float32, float64, complex64, etc.
It's super easy to write a template processor in Ruby or sed which takes a basic function (like sparse matrix multiplication) and creates a custom version for each possible dtype. I could even write such a template to handle two different dtypes, say if we wanted to multiply an int32 by a float64.
The same can be done in certain cases for different stypes. Eventually, though, you could end up with a very large set of functions, many of which never even get used in the course of most people's use.
It's also easy to use function pointer arrays to enable access to these functions -- and imagining even a 3-dimensional function pointer array is not too hard:
MultFuncs[lhs->stype][lhs->dtype][rhs->dtype](lhs->shape[0], rhs->shape[1], lhs->data, rhs->data, result->data);
// This might point to some function like this:
// i32_f64_dense_mult(size_t, size_t, int32_t*, float64*, float64*);
The extreme alternative to function pointer arrays, of course, which would be incredibly complicated to code and maintain, is hierarchical switch or if/else statements:
switch(lhs->stype) {
case STYPE_SPARSE:
    switch(lhs->dtype) {
    case DTYPE_INT32:
        switch(rhs->dtype) {
        case DTYPE_FLOAT64:
            i32_f64_mult(lhs->shape[0], rhs->shape[1], lhs->ija, rhs->ija, lhs->a, rhs->a, result->data);
            break;
        // ... and so on ...
It also seems that this would be O(s*d^2), where s = number of stypes and d = number of dtypes, for every operation, whereas the function pointer array would be O(r), where r = number of dimensions in the array.
But there's also a third option.
The third option is to use function pointer arrays for common operations (e.g., copying from one unknown type to another):
SetFuncs[lhs->dtype][rhs->dtype](5, // copy five consecutive items
                                 &to, // destination
                                 dtype_sizeof[lhs->dtype], // dtype_sizeof is a const size_t array giving sizeof(int8_t), sizeof(int16_t), etc.
                                 &from, // source
                                 dtype_sizeof[rhs->dtype]);
And then to call that from a generic sparse matrix multiplication function, which might be declared like this:
void generic_sparse_multiply(size_t* ija, size_t* ijb, void* a, void* b, int dtype_a, int dtype_b);
And that would use SetFuncs[dtype_a][dtype_b] to reference the correct assignment function, for example. The downside, then, is that you might have to implement a whole bunch of these -- IncrementFuncs, DecrementFuncs, MultFuncs, AddFuncs, SubFuncs, etc. -- because you'd never know what types to expect.
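For concreteness, such a table might be declared and populated roughly like this (the typedef, the enum, and the single entry shown are illustrative only):

#include <stddef.h>
#include <stdint.h>

typedef void (*set_func_t)(size_t n, void *to, size_t to_size,
                           const void *from, size_t from_size);

enum dtype { DTYPE_INT32, DTYPE_FLOAT64, NUM_DTYPES };

/* One concrete instantiation: copy n int32 values into a double destination. */
static void set_f64_from_i32(size_t n, void *to, size_t to_size,
                             const void *from, size_t from_size)
{
    (void)to_size; (void)from_size;   /* element sizes are implied by the types here */
    double *d = to;
    const int32_t *s = from;
    for (size_t i = 0; i < n; i++)
        d[i] = (double)s[i];
}

/* The const table, indexed by [destination dtype][source dtype]. */
static set_func_t const SetFuncs[NUM_DTYPES][NUM_DTYPES] = {
    [DTYPE_FLOAT64][DTYPE_INT32] = set_f64_from_i32,
    /* ... remaining combinations ... */
};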
So, finally, my questions:
What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
I realize this is an incredibly complicated array of questions.
If you can simply refer me to a good resource and prefer not to answer directly, that's perfectly fine. I used the Google extensively before posting this, but wasn't quite sure what search terms to use.
First of all, try to reduce the complexity of the function(s). You should be able to have a declaration like
Result_t function (Param_t*);
where Param_t is a struct containing all those things you pass around. To use generic types, include an enum in the struct telling which type is used, and a void* to the data.
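A minimal sketch of such a struct (the field names are illustrative):

#include <stddef.h>

typedef enum { PARAM_INT32, PARAM_FLOAT64 } param_dtype_t;

typedef struct {
    param_dtype_t dtype;  /* which concrete type 'data' points to */
    size_t        count;  /* number of elements */
    void         *data;   /* pointer to the actual values */
} Param_t;

typedef struct { int status; } Result_t;

Result_t function(Param_t *param);   /* the single generic entry point */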
1. What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
Definitely a larger executable. Load time depends on what system the code is for. If it is for a RAM-based system (a PC, etc.), then the load time might increase, but it shouldn't have any major impact on performance. Though of course it depends on how large "enormous" is :)
2. Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
Probably not; there is only so much the compiler can optimize. When dealing with generic data types in C, it often boils down to memcpy() in the end, which in itself is hopefully implemented to be as fast as copying gets.
3. If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?
Ironically, the compiler would probably optimize it into something like an array of function pointers. However, the compiler likely cannot predict the nature of the data, especially if it is set at runtime.
I have a project that is half in C and half in Fortran 77. [No, not Fortran 90 or 03, Fortran 77.] The code would be much cleaner if I could pass pointers generated on the C side back to Fortran, which would then pass them back as necessary for handling in other C functions. As it is, the C code is filled with global variables that shouldn't be global, and is otherwise on the verge of becoming an unstructured mess. So are there any reasonably reliable ways to pass an opaque pointer between C and Fortran?
If you are on a 32-bit platform, consider casting the pointers to integers and passing those integers to the Fortran code. When the Fortran passes them back, convert the integer back into a pointer, cross your fingers, and use it.
From what I remember (from 25+ years ago), Fortran 77 tends to pass everything to C by pointer anyway - and character strings get passed with a length, and arrays get passed with their dimensions.
If you're on a 64-bit platform, you'll have to work out whether the Fortran 77 compiler provides any 8-byte integers (INTEGER*8?) - my suspicion is that it won't (largely confirmed by looking at the GNU documentation; if you were using Fortran 2003, you'd be in better shape, it seems). If it does, the same trick works. If it does not, you are into much dodgier territory.
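On the C side, the conversion in that integer approach is just a cast; a minimal sketch using C99's intptr_t (the function names are illustrative) might look like:

#include <stdint.h>

struct state;                        /* the opaque C-side object */

intptr_t state_to_handle(struct state *p)
{
    return (intptr_t)p;              /* store on the Fortran side in an integer of matching width */
}

struct state *state_from_handle(intptr_t h)
{
    return (struct state *)h;        /* recover the original pointer */
}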
You could try - against recommendations - using a union of a double and a pointer. In the C, you'd set the pointer in the union from your C code pointer, then copy the double out of the union into a Fortran REAL*8, and as long as no-one touches that except to copy it or pass it back, maybe you will be OK if the gods smile favourably upon your endeavours. Most likely though, the whole thing will explode - this sort of union has an incredible ability to detect when the customer will be most annoyed if something doesn't work and will then proceed to explode at exactly the right moment - part way through the demo, or fifteen minutes after the program goes live.
An alternative to consider (still with gritted teeth) is a union of a 64-bit pointer and an array of two 32-bit integers, and then requiring the Fortran code to pass an array of two integers when you need to return a (64-bit) pointer. Clearly, an array of one integer would work for 32-bit code; maybe just require the calling code to pass an array of two integers in all cases, zeroing the unused integer value in the 32-bit pointer case? That gives you forward migratability.
You can do this with the (non-standard) Cray pointer extension:
http://gcc.gnu.org/onlinedocs/gfortran/Cray-pointers.html