Opaque pointers in F77 - C

I have a project that is half in C and half in Fortran 77. [No, not Fortran 90 or 03, Fortran 77.] The code would be much cleaner if I could pass pointers generated on the C side back to Fortran, which would then pass them back as necessary for handling in other C functions. As it is, the C code is filled with global variables that shouldn't be global, and is otherwise on the verge of becoming an unstructured mess. So are there any reasonably reliable ways to pass an opaque pointer between C and Fortran?

If you are on a 32-bit platform, consider casting the pointers to integers and passing those integers to the Fortran code. When the Fortran passes them back, convert the integer back into a pointer, cross your fingers, and use it.
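A minimal sketch of that trick, assuming the common trailing-underscore linkage convention; struct state and the routine names are made up for illustration:

#include <stdint.h>
#include <stdlib.h>

struct state { double data; };          /* stand-in for the real C-side structure */

/* Fortran side: INTEGER HANDLE ... CALL MKSTATE(HANDLE) */
void mkstate_(int32_t *handle) {
    struct state *s = malloc(sizeof *s);
    *handle = (int32_t)(intptr_t)s;     /* only valid where pointers fit in 32 bits */
}

/* Fortran side: CALL USESTATE(HANDLE) */
void usestate_(int32_t *handle) {
    struct state *s = (struct state *)(intptr_t)*handle;
    s->data += 1.0;                     /* ... work with the structure ... */
}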
From what I remember (from 25+ years ago), Fortran 77 tends to pass everything to C by pointer anyway - and character strings get passed with a length, and arrays get passed with their dimensions.
If you're on a 64-bit platform, you'll have to work out whether the Fortran 77 compiler provides any 8-byte integers (INTEGER*8?) - my suspicion is that it won't (largely confirmed by looking at the GNU documentation; if you were using Fortran 2003, you'd be in better shape, it seems). If it does, the same trick works. If it does not, you are into much dodgier territory.
You could try - against recommendations - using a union of a double and a pointer. In the C, you'd set the pointer in the union from your C code pointer, then copy the double out of the union into a Fortran REAL*8, and as long as no-one touches that except to copy it or pass it back, maybe you will be OK if the gods smile favourably upon your endeavours. Most likely though, the whole thing will explode - this sort of union has an incredible ability to detect when the customer will be most annoyed if something doesn't work and will then proceed to explode at exactly the right moment - part way through the demo, or fifteen minutes after the program goes live.
An alternative to consider (still with gritted teeth) is a union of a 64-bit pointer and an array of two 32-bit integers, requiring the Fortran code to pass an array of two integers whenever you need to return a (64-bit) pointer. Clearly, an array of one integer would suffice for 32-bit code; maybe just require the calling code to pass an array of two integers in all cases, zeroing the unused integer value in the 32-bit case? That gives you a forward-migration path.
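A sketch of that two-integer handle, with made-up names (the Fortran side declares INTEGER H(2)):

#include <stdint.h>
#include <string.h>

union handle {
    void   *ptr;
    int32_t words[2];          /* what the Fortran side sees as INTEGER H(2) */
};

/* pack a C pointer into the two-integer handle the Fortran code carries */
void pack_handle(void *p, int32_t words_out[2]) {
    union handle h;
    memset(&h, 0, sizeof h);   /* zeroes the unused word on 32-bit platforms */
    h.ptr = p;
    memcpy(words_out, h.words, sizeof h.words);
}

/* recover the pointer from the handle Fortran passed back */
void *unpack_handle(const int32_t words_in[2]) {
    union handle h;
    memcpy(h.words, words_in, sizeof h.words);
    return h.ptr;
}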

You can do this with the (non-standard) Cray pointer extension:
http://gcc.gnu.org/onlinedocs/gfortran/Cray-pointers.html

Related

If a `uintptr_t` can be used to store both pointers and numbers, why can't a `void*` do the same?

Enumerations in languages like Swift or Rust support a kind of hybrid "choice plus data" mechanism, such that I could define a type like:
enum SomeOption {
    None,
    Index(int),
    Key(string),
    Callback(fn),
}
Now if I were to implement this in C, my understanding is that something like this would not be valid:
typedef enum {
    is_callback_or_none,
    is_string,
    is_number
} my_choice;

typedef struct {
    my_choice value_type;
    void* value;
} my_option;

my_option x = {
    .value_type = is_number,
    .value = (void*)42
};

if (x.value_type == is_number) {
    int n = (int)x.value;
    // … use n …
}
I'm not sure what exactly I risk in doing this, but according to e.g. Can pointers store values and what is the use of void pointers? the only things I should store in a void* are actual addresses and NULL. [Aside: please turn a blind eye to the separate question of storing callback function pointers in a void* which I forgot was problematic when I made up this example.]
I suppose a more proper way to do this would be to use a union, e.g.:
typedef struct {
    my_choice value_type;
    union {
        int number_value;
        char* string_value;
        void* pointer_value;
    };
} my_option;
…which is probably nicer all around anyway. But I'm wondering specifically about the invalidity of the void* value version. What if I were (instead of the union solution) to simply substitute uintptr_t in place of the void*?
typedef struct {
    my_choice value_type;
    uintptr_t value;
} my_option;
Would storing either a pointer to a string/callback/null or a number within the uintptr_t value field of this struct be valid and [at least POSIX-]portable code? And if so, why is that okay, but not the seemingly equivalent void* value version?
The problem is that I don't understand whether the rules are different for what I can do with a uintptr_t/intptr_t vs. a void*, and if so, why they would be different.
The rules are different because they're out at the edge, within a boundary (or a grey area) between what machines can actually do, and what people want to do, and what a language standard says they can do.
Now, yes, on a "conventional" architecture, pointers and ints are both just binary integers of some size, so clearly it's possible to mix'n'match between the two.
And, again yes, this is clearly something that some people find themselves wanting to do. You've got a big, heterogeneous array of things, and some of them are plain numbers, and some of them are data pointers, and maybe some of them are function pointers, and you've got some way of knowing which is which, and sometimes it really does seem tidy to store them in one big heterogeneous array. Or you've got a function with a parameter that sometimes wants to be an integer and sometimes wants to be a pointer, and you're fine with that, too. (Well, except for all the warnings you get from your compilers, and the lectures from language lawyers and SO regulars.)
But then there's the C Standard, which goes to some pains to distinguish between integers, and data pointers, and function pointers. There are architectures where these truly aren't interchangeable, and where it's a bad idea to try. And the C Standard has always tried to accommodate those architectures.
(If you don't believe me, ask, because examples do exist.)
The C Standard could say that all pointers and all integers are more freely interchangeable. It sounds like that's what Swift and Rust have done. But it also sounds like Swift and Rust would not be implementable on those "exotic" architectures.
These discussions get tricky because they're also at the intersection between language standards and programming practices. If you know you're always going to be using machines where integers and pointers are interchangeable, if you don't care about portability to the other, "exotic" architectures, you could just say so, and ignore the warnings, and move on -- unless your house style guide says that casts are forbidden, or your software development plan says that your code must be strictly conforming and compile without warnings. Then you might find yourself arguing with, or trying to change, the C Standard, just to get around your own SDP. (My advice: make sure that your style guide and your SDP have waiver mechanisms.)
Some day the C standard will probably get less "protective" (enabling?) of those exotic architectures. For example, I've heard it's been debated to drop the accommodation of one's complement and sign/magnitude machines, and to build the definition of two's complement into the next revision of the C Standard. But if that happens, or if other guarantees/accommodations change, it won't mean that C compilers and C programs for the exotic machines can't be written any more -- it will just mean that programmers for those machines will have to apply their own rules (like, "don't assign between int and void * and void (*)()") that aren't actually in the standard any more. (Or, equivalently, it means that strictly conforming code written for "normal" architectures won't automatically be portable to the exotic ones. Also, I guess, that the vendors of compilers for the exotic architectures won't be able to claim standards compliance any more.)
Even if void * values are represented as numbers, that does not mean the compiler handles them as it does numbers.
A uintptr_t is a number; C 2018 7.20.1.4 1 says it designates an unsigned integer type. So it behaves like other unsigned integer types: You can put any number within its range and get the same number back (and you can do arithmetic with it). The paragraph further says any valid void * can be converted to uintptr_t and that converting it back will produce the original pointer (or something equivalent, such as a pointer to the same place but with a different representation). So you can store pointers in uintptr_t objects.
However, the C standard does not say there is a range of numbers you can put into void * and get them back. 6.3.2.3 5 says that when an integer is converted to a pointer type, the result is implementation-defined (except that converting a constant zero to void * yields a null pointer, per 6.3.2.3 3). 6.3.2.3 6 says when you convert a pointer to an integer, the result is implementation-defined. (7.20.1.4 overrides this when the number is a uintptr_t that came from a pointer originally.)
So, if you store a number in a void *, how do you know it will work? The C standard does not guarantee to you that it will work. You would need some documentation for the compiler that says it will work.
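To make the asymmetry concrete, a small sketch (uintptr_t is optional in the standard, so this assumes the implementation provides it):

#include <stdint.h>
#include <assert.h>

int main(void) {
    int x = 42;

    /* guaranteed: pointer -> uintptr_t -> pointer round trip */
    uintptr_t u = (uintptr_t)(void *)&x;
    int *p = (int *)(void *)u;
    assert(p == &x);

    /* also guaranteed: uintptr_t is an ordinary unsigned integer type,
       so any number in its range survives storage and retrieval */
    uintptr_t n = 42;
    assert(n == 42);

    /* NOT guaranteed: number -> void * -> number; both conversions are
       implementation-defined (6.3.2.3 5 and 6) */
    void *v = (void *)(uintptr_t)42;
    (void)v;
    return 0;
}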

Passing a section of an array to Fortran BLAS

I recently inherited a Fortran 2008 code in which calls to BLAS/LAPACK are performed without explicit interfaces. That has resulted in some cases where arguments of the wrong type are passed, e.g. reals being passed as integers.
My greater concern, however, is how arrays in general are passed into BLAS/LAPACK, for example
call daxpy(n, da, dx(start), 1, dy(start), 1)
where start is a valid position in the arrays dx and dy.
My understanding is that by not supplying an explicit interface for daxpy we can get away with passing in a real instead of a real array. Furthermore, during execution I suspect BLAS/LAPACK see only the memory address of dx(start) and, assuming contiguous storage, access the next n entries. Please correct me if I am wrong on this one.
The above implementation (lack of explicit interfaces and passing only the start of arrays) makes me a bit uncomfortable, hence I decided to define the BLAS/LAPACK interfaces in the code. So the above call to daxpy has become
call daxpy(n, da, dx(start:), 1, dy(start:), 1)
My question is, is passing arrays like that considered good practice/safe? And moreover, is there any way to check/guarantee that no temporary arrays are created in the call?
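For reference, my understanding of the access pattern in C terms (a sketch, assuming positive increments; the real routine also handles negative ones):

/* what daxpy effectively does with the addresses it receives:
   y := a*x + y over n entries, striding by incx/incy */
void daxpy_like(int n, double a, const double *x, int incx,
                double *y, int incy) {
    for (int i = 0; i < n; i++)
        y[(long)i * incy] += a * x[(long)i * incx];
}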

What does impossibility to return arrays actually mean in C?

I'm not trying to replicate the usual question about C not being able to return arrays but to dig a bit more deeply into it.
We cannot do this:
char f(void)[8] {
    char ret[8];
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    char obj_a[8];
    obj_a = f();
}
But we can do:
struct s { char arr[10]; };

struct s f(void) {
    struct s ret;
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    struct s obj_a;
    obj_a = f();
}
So, I was skimming the ASM code generated by gcc -S, and it seems to work with the stack, addressing -x(%rbp) as with any other C function return.
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
Extra data: I am using Linux and gcc on x64 Intel.
First of all, yes, you can encapsulate an array in a structure, and then do anything you want with that structure (assign it, return it from a function, etc.).
Second of all, as you've discovered, the compiler has little difficulty emitting code to return (or assign) structures. So that's not the reason you can't return arrays, either.
The fundamental reason you cannot do this is that, bluntly stated, arrays are second-class data structures in C. All other data structures are first-class. What are the definitions of "first-class" and "second-class" in this sense? Simply that second-class types cannot be assigned.
(Your next question might be, "Other than arrays, are there any other second-class data types?", and I think the answer is "Not really, unless you count functions".)
Intimately tied up with the fact that you can't return (or assign) arrays is that there are no values of array type, either. There are objects (variables) of array type, but whenever you try to take the value of one, you immediately get a pointer to the array's first element. [Footnote: more formally, there are no rvalues of array type, although an object of array type can be thought of as an lvalue, albeit a non-assignable one.]
So, quite aside from the fact that you can't assign to an array, you can't even generate a value to try to assign. If you say
char a[10], b[10];
a = b;
it's as if you had written
a = &b[0];
So we've got an array on the left, but a pointer on the right, and we'd have a massive type mismatch even if arrays somehow were assignable. Similarly (from your example) if we try to write
a = f();
and somewhere inside the definition of function f() we have
char ret[10];
/* ... fill ... */
return ret;
it's as if that last line said
return &ret[0];
and, again, we have no array value to return and assign to a, merely a pointer.
(In the function call example, we've also got the very significant issue that ret is a local array, perilous to try to return in C. More on this point later.)
Now, part of your question is probably "Why is it this way?", and also "If you can't assign arrays, why can you assign structures containing arrays?"
What follows is my interpretation and my opinion, but it's consistent with what Dennis Ritchie describes in his paper The Development of the C Language.
The non-assignability of arrays arises from three facts:
1. C is intended to be syntactically and semantically close to the machine hardware. An elementary operation in C should compile down to one or a handful of machine instructions taking one or a handful of processor cycles.
2. Arrays have always been special, especially in the way they relate to pointers; this special relationship evolved from, and was heavily influenced by, the treatment of arrays in C's predecessor language B.
3. Structures weren't initially in C.
Due to point 2, it's impossible to assign arrays, and due to point 1, it shouldn't be possible anyway, because a single assignment operator = shouldn't expand to code that might take N thousand cycles to copy an N thousand element array.
And then we get to point 3, which really ends up leading to a contradiction.
When C got structures, they initially weren't fully first-class either, in that you couldn't assign or return them. But the reason you couldn't was simply that the first compiler wasn't smart enough, at first, to generate the code. There was no syntactic or semantic roadblock, as there was for arrays.
And the goal all along was for structures to be first-class, and this was achieved relatively early on. The compiler caught up, and learned how to emit code to assign and return structures, right around the time that the first edition of K&R was going to print.
But the question remains, if an elementary operation is supposed to compile down to a small number of instructions and cycles, why doesn't that argument disallow structure assignment? And the answer is, yes, it's a contradiction.
I believe (though this is more speculation on my part) that the thinking was something like this: "First-class types are good, second-class types are unfortunate. We're stuck with second-class status for arrays, but we can do better with structs. The no-expensive-code rule isn't really a rule, it's more of a guideline. Arrays will often be large, but structs will usually be small, tens or hundreds of bytes, so assigning them won't usually be too expensive."
So a consistent application of the no-expensive-code rule fell by the wayside. C has never been perfectly regular or consistent, anyway. (Nor, for that matter, are the vast majority of successful languages, human as well as artificial.)
With all of this said, it may be worth asking, "What if C did support assigning and returning arrays? How might that work?" And the answer will have to involve some way of turning off the default behavior of arrays in expressions, namely that they tend to turn into pointers to their first element.
Sometime back in the '90s, IIRC, there was a fairly well-thought-out proposal to do exactly this. I think it involved enclosing an array expression in [ ] or [[ ]] or something. Today I can't seem to find any mention of that proposal (though I'd be grateful if someone can provide a reference). At any rate, I believe we could extend C to allow array assignment by taking the following three steps:
1. Remove the prohibition of using an array on the left-hand side of an assignment operator.
2. Remove the prohibition of declaring array-valued functions. Going back to the original question, make char f(void)[8] { ... } legal.
3. (This is the biggie.) Have a way of mentioning an array in an expression and ending up with a true, assignable value (an rvalue) of array type. For the sake of argument I'll posit a new operator or pseudofunction called arrayval( ... ).
[Side note: Today we have a "key definition" of array/pointer correspondence, namely that:
A reference to an object of array type which appears in an expression decays (with three exceptions) into a pointer to its first element.
The three exceptions are when the array is the operand of a sizeof operator, or a & operator, or is a string literal initializer for a character array. Under the hypothetical modifications I'm discussing here, there would be a fourth exception, namely when the array was an operand of this new arrayval operator.]
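A quick illustration of the decay rule and two of its exceptions:

#include <stdio.h>

int main(void) {
    char a[8];
    printf("%zu\n", sizeof a);   /* prints 8: no decay under sizeof */
    char (*whole)[8] = &a;       /* &a is a pointer to the whole array */
    char *first = a;             /* everywhere else, a decays to &a[0] */
    (void)whole; (void)first;
    return 0;
}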
Anyway, with these modifications in place, we could write things like
char a[8], b[8] = "Hello";
a = arrayval(b);
(Obviously we would also have to decide what to do if a and b were not the same size.)
Given the function prototype
char f(void)[8];
we could also do
a = f();
Let's look at f's hypothetical definition. We might have something like
char f(void)[8] {
    char ret[8];
    /* ... fill ... */
    return arrayval(ret);
}
Note that (with the exception of the hypothetical new arrayval() operator) this is just about what Dario Rodriguez originally posted. Also note that — in the hypothetical world where array assignment was legal, and something like arrayval() existed — this would actually work! In particular, it would not suffer the problem of returning a soon-to-be-invalid pointer to the local array ret. It would return a copy of the array, so there would be no problem at all — it would be just about perfectly analogous to the obviously-legal
int g(void) {
    int ret;
    /* ... compute ... */
    return ret;
}
Finally, returning to the side question of "Are there any other second-class types?", I think it's more than a coincidence that functions, like arrays, automatically have their address taken when they are not being used as themselves (that is, as functions or arrays), and that there are similarly no rvalues of function type. But this is mostly an idle musing, because I don't think I've ever heard functions referred to as "second-class" types in C. (Perhaps they have, and I've forgotten.)
Footnote: Because the compiler is willing to assign structures, and typically knows how to emit efficient code for doing so, it used to be a somewhat popular trick to co-opt the compiler's struct-copying machinery in order to copy arbitrary bytes from point a to point b. In particular, you could write this somewhat strange-looking macro:
#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
                         *(struct foo *)(a))
that behaved more or less exactly like an optimized in-line version of memcpy(). (And in fact, this trick still compiles and works under modern compilers today.)
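For instance, a sketch of how that macro might be exercised (repeating the definition so the snippet stands alone):

#include <stdio.h>

#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
                         *(struct foo *)(a))

int main(void) {
    char src[6] = "Hello", dst[6];
    MEMCPY(dst, src, 6);   /* one struct assignment copies all six bytes */
    printf("%s\n", dst);
    return 0;
}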
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
It has nothing to do with capability per se. Other languages do provide the ability to return arrays, and you already know that in C you can return a struct with an array member. On the other hand, yet other languages have the same limitation that C does, and even more so. Java, for instance, cannot return arrays, nor indeed objects of any type, from methods. It can return only primitives and references to objects.
No, it is simply a question of language design. As with most other things to do with arrays, the design points here revolve around C's provision that expressions of array type are automatically converted to pointers in almost all contexts. The value provided in a return statement is no exception, so C has no way of even expressing the return of an array itself. A different choice could have been made, but it simply wasn't.
For arrays to be first-class objects, you would expect at least to be able to assign them. But that requires knowledge of the size, and the C type system is not powerful enough to attach sizes to any types. C++ could do it, but doesn't due to legacy concerns—it has references to arrays of particular size (typedef char (&some_chars)[32]), but plain arrays are still implicitly converted to pointers as in C. C++ has std::array instead, which is basically the aforementioned array-within-struct plus some syntactic sugar.
Bounty hunting.
The authors of C did not aspire to be language or type system designers. They were tool designers. C was a tool to make system programming easier.
refs: Brian Kernighan on Pascal; Dennis Ritchie on C
There was no compelling case for C to do anything unexpected, especially as UNIX and C were ushering in the era of least surprise. Copying arrays, and inventing complex syntax to do so, was the metaphorical equivalent of a toaster having a setting to burn the toast; it did not fit the C model.
Everything in C, the language, is effectively constant time, constant size. C, the standard, seems preoccupied with doing away with this core feature that made C so popular; so expect the, uh, standard C/2023.feb07 to feature a punctuation nightmare that enables arrays as r-values.
The decision of the C authors makes eminent sense if you view the programming world pragmatically. If you view it as a pulpit for treasured beliefs, then get on board with C/2023.feb07 before C/2023.feb08 nullifies it.
I'm afraid that, in my mind, it's not so much a debate about first- or second-class objects; it's a religious discussion of good practice and applicable practice for deeply embedded applications.
Returning a structure either means a root structure being changed by stealth in the depths of the call sequence, or a duplication of data and the passing of large chunks of duplicated data. The main applications of C are still largely concentrated in the deeply embedded domain. There you have small processors that don't need to be passing large blocks of data. You also have engineering practice that necessitates operating without dynamic RAM allocation, with minimal stack and often no heap. It could be argued that returning a structure is the same as modification via pointer, but abstracted in syntax... I'm afraid I'd argue that's not in the C philosophy of "what you see is what you get" in the same way a pointer to a type is.
Personally, I would argue you have found a loophole, whether standard-approved or not. C is designed in such a way that allocation is explicit. As a matter of good practice you pass address-bus-sized objects, normally in an aspirational one cycle, referring to memory that has been allocated explicitly at a controlled time within the developer's ken. This makes sense in terms of code efficiency and cycle efficiency, and offers the most control and clarity of purpose. I'm afraid, in code inspection, I'd throw out a function returning a structure as bad practice. C does not enforce many rules; it's a language for professional engineers in many ways, as it relies upon the user enforcing their own discipline. Just because you can, doesn't mean you should... It does offer some pretty bulletproof ways to handle data of very complex size and type, utilising compile-time rigour and minimising the dynamic variations of footprint at runtime.

Storing pointers to C structures in R as integers

I've been working on using a C library from R by writing custom C functions using the library's functionality, and then accessing these C functions from R using the .C interface.
In some of the C code, I allocate space for some custom structures and want to store pointers to them in R so I can use these structures in successive calls to .C. While toying around with the .C function, I noticed I can simply cast the pointer to the C structure to int and store it in R as an integer. Passing this integer to later calls via .C works fine; I can keep track of my structures and use them without problems.
My somewhat naive question: what is wrong with storing these pointers in integers in R? It works fine so I'm assuming there has to be some downside, but I couldn't find any info on it.
R's integers are 32 bits even on a 64-bit platform. Therefore, on a 64-bit system this won't work (the pointers will be 64 bits wide).
R has functionality for this. See the 'Writing R Extensions' manual, the section on 'External pointers and weak references'.
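A minimal sketch of the external-pointer approach (this uses the .Call interface rather than .C; my_state and the function names are hypothetical):

#include <stdlib.h>
#include <R.h>
#include <Rinternals.h>

typedef struct { double value; } my_state;   /* stand-in for the real structure */

static void my_state_finalizer(SEXP ptr) {
    my_state *s = (my_state *) R_ExternalPtrAddr(ptr);
    if (s) {
        free(s);
        R_ClearExternalPtr(ptr);   /* avoid a dangling address after free */
    }
}

SEXP my_state_new(void) {
    my_state *s = malloc(sizeof *s);
    SEXP ptr = PROTECT(R_MakeExternalPtr(s, R_NilValue, R_NilValue));
    R_RegisterCFinalizerEx(ptr, my_state_finalizer, TRUE);   /* free on GC */
    UNPROTECT(1);
    return ptr;
}

SEXP my_state_use(SEXP ptr) {
    my_state *s = (my_state *) R_ExternalPtrAddr(ptr);
    s->value += 1.0;   /* ... work with the structure ... */
    return R_NilValue;
}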
If you are willing to switch to C++ (which doesn't mean you have to rewrite all of your code), you can use the Rcpp package, which makes this easier. See for example External pointers with Rcpp

What is the point behind unions in C?

I'm going through O'Reilly's Practical C Programming book, and having read the K&R book on the C programming language, I am really having trouble grasping the concept behind unions.
They take the size of the largest data type that makes them up...and the most recently assigned one overwrites the rest...but why not just use / free memory as needed?
The book mentions that it's used in communication, where you need to set flags of the same size; and on a googled website, that it can eliminate odd-sized memory chunks...but is it of any use in a modern, non-embedded memory space?
Is there something crafty you can do with it and CPU registers? Is it simply a hold over from an earlier era of programming? Or does it, like the infamous goto, still have some powerful use (possibly in tight memory spaces) that makes it worth keeping around?
Well, you almost answered your question: Memory.
Back in the day, memory was scarce, and even saving a few kilobytes was worthwhile.
But even today there are scenarios where unions would be useful. For example, if you'd like to implement some kind of variant datatype. The best way to do this is using a union.
This doesn't sound like much, but let's just assume you want to use a variable either storing a 4 character string (like an ID) or a 4 byte number (which could be some hash or indeed just a number).
If you use a classic struct, this would be 8 bytes long (at least; if you're unlucky there are padding bytes as well). Using a union it's only 4 bytes. So you're saving 50% memory, which isn't a lot for one instance, but imagine having a million of these.
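A sketch of that 4-byte variant (the names are illustrative):

#include <stdint.h>

union id_or_num {
    char     id[4];   /* a 4-character ID (not NUL-terminated) */
    uint32_t num;     /* or a 4-byte number / hash */
};
/* sizeof(union id_or_num) == 4; a struct holding both members would need 8 */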
While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.
One use of unions is having two variables occupy the same space, with a second variable in the struct deciding which data type to read it as.
e.g. you could have a boolean 'isDouble', and a union 'doubleOrLong' which has both a double and a long. If isDouble == true, interpret the union as a double; otherwise interpret it as a long.
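In C that might look like the following (a sketch, with names taken from the description above):

#include <stdbool.h>

struct value {
    bool isDouble;        /* which member of the union is currently live */
    union {
        double d;
        long   l;
    } doubleOrLong;
};

double read_as_double(const struct value *v) {
    /* the flag tells us how to interpret the shared storage */
    return v->isDouble ? v->doubleOrLong.d : (double) v->doubleOrLong.l;
}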
Another use of unions is accessing data types in different representations. For instance, if you know how a double is laid out in memory, you could put a double in a union, access it as a different data type like a long, directly access its bits, its mantissa, its sign, its exponent, whatever, and do some direct manipulation with it.
You don't really need this nowadays since memory is so cheap, but in embedded systems it has its uses.
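For example, assuming IEEE 754 doubles, a union lets you pull the fields out directly (reading a member other than the one last written reinterprets the bytes, which C sanctions, though C++ does not):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    union { double d; uint64_t u; } pun = { .d = -2.5 };
    printf("sign     = %llu\n", (unsigned long long)(pun.u >> 63));
    printf("exponent = 0x%llx\n", (unsigned long long)((pun.u >> 52) & 0x7FF));
    printf("mantissa = 0x%llx\n", (unsigned long long)(pun.u & 0xFFFFFFFFFFFFFULL));
    return 0;
}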
The Windows API makes use of unions quite a lot. LARGE_INTEGER is an example of such a usage. Basically, if the compiler supports 64-bit integers, use the QuadPart member; otherwise, set the low DWORD and the high DWORD manually.
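Simplified from its winnt.h declaration, LARGE_INTEGER looks roughly like this:

typedef union _LARGE_INTEGER {
    struct {
        unsigned long LowPart;    /* DWORD in the real header */
        long          HighPart;   /* LONG in the real header */
    } u;
    long long QuadPart;           /* LONGLONG in the real header */
} LARGE_INTEGER;

/* with 64-bit support:    li.QuadPart = 1234567890123LL;
   without, set the parts: li.u.LowPart = ...; li.u.HighPart = ...; */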
It's not really a hold over, as the C language was created in 1972, when memory was a real concern.
You could make the argument that in the modern, non-embedded space, you might not want to use C as a programming language to begin with. If you've chosen C as your implementation language, you're looking to harness the benefits of C: it's efficient and close to the metal, which results in tight, fast binaries.
As such, when choosing to use C, you'd still want to take advantage of its benefits, and that includes memory-space efficiency. To that end, the union works very well, allowing you to have some degree of type safety while enforcing the smallest memory footprint available.
One place where I have seen it used is in the Doom 3/idTech 4 Fast Inverse Square Root implementation.
For those unfamiliar with this algorithm, it essentially requires treating a floating point number as an integer. The old Quake (and earlier) version of the code does this by the following:
float y = 2.0f;
// treat the bits of y as an integer
long i = * ( long * ) &y;
// do some stuff with i
// treat the bits of i as a float
y = * ( float * ) &i;
original source on GitHub
This code takes the address of a floating point number y, casts it to a pointer to a long (i.e., a 32-bit integer in Quake days), and dereferences it into i. Then it does some incredibly bizarre bit-twiddling stuff, and the reverse.
There are two disadvantages of doing it this way. One is that the convoluted address-of, cast, dereference process forces the value of y to be read from memory rather than from a register¹, and ditto on the way back. On Quake-era computers, however, floating point and integer registers were completely separate, so you pretty much had to push to memory and back to deal with this restriction.
The second is that, at least in C++, doing such casting is deeply frowned upon, even when doing what amounts to voodoo such as this function does. There are more compelling arguments too: the cast-and-dereference violates the strict aliasing rule, so a compiler is entitled to assume the float and long accesses never overlap and miscompile the code.
So, in Doom 3, id included the following bit in their new implementation (which uses a different set of bit twiddling, but a similar idea):
union _flint {
    dword i;
    float f;
};
...
union _flint seed;
seed.i = /* look up some tables to get this */;
double r = seed.f; // <- access the bits of seed.i as a floating point number
original source on GitHub
Theoretically, on an SSE2 machine, this can be accessed through a single register; I'm not sure in practice whether any compiler would do this. It's still somewhat cleaner code in my opinion than the casting games in the earlier Quake version.
¹ ignoring "sufficiently advanced compiler" arguments
