I'm working on some Zig bindings, but as the language doesn't have complete C ABI support I'm trying to hack it to at least work somewhat. From the issue above I know that normal struct and union parameters <= 16 bytes are broken down into pieces without any modification, with the exception of structs or unions that consist only of floats and are <= 16 bytes. My question is: what does C do to those structs?
Edit (additional information):
When passing a struct as a parameter in C, C handles different kinds of structs differently. For structs that are 16 bytes or smaller there are two kinds: "normal" structs and structs that are made up only of floats. I know how C passes normal structs, by just breaking them down into their pieces and pushing them onto the stack, but I don't know how it passes float-only structs.
Platform information:
64-bit macOS, compiler is apple clang version 11.0.3
From Modern C by Jens Gustedt,
Representations of values on a computer can vary “culturally” from architecture to architecture or are determined by the type the programmer gave to the value. Therefore, we should try to reason primarily about values and not about representations if we want to write portable code.
If you already have some experience in C and in manipulating bytes and bits, you will need to make an effort to actively “forget” your knowledge for most of this section. Thinking about concrete representations of values on your computer will inhibit you more than it helps.
Takeaway - C programs primarily reason about values and not about their representation.
Question 1: What kind of 'representations' of values, is author talking about? Could I be given an example, where this 'representation' varies from architecture to architecture and also an example of how representations of values are determined by type programmer gave to value?
Question 2: What's the purpose of specifying a data type in C language, I mean that's the rule of the language but I have heard that's how a compiler knows how much memory to allocate to an object? Is that the only use, albeit crucial? I've heard there isn't a need to specify a data type in Python.
What kind of 'representations' of values, is author talking about?
https://en.wikipedia.org/wiki/Two%27s_complement vs https://en.wikipedia.org/wiki/Ones%27_complement vs https://en.wikipedia.org/wiki/Offset_binary. Generally https://en.wikipedia.org/wiki/Signed_number_representations.
But also the vast space of floating point number formats https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers - IEEE 754, minifloat, bfloat16, etc.
Could I be given an example, where this 'representation' varies from architecture to architecture
Your PC uses two's complement vs https://superuser.com/questions/1137182/is-there-any-existing-cpu-implementation-which-uses-ones-complement .
Ach - but of course, most notably https://en.wikipedia.org/wiki/Endianness .
also an example of how representations of values are determined by type programmer gave to value?
(float)1 is represented in IEEE 754 as 0b00111111100000000000000000000000 https://www.h-schmidt.net/FloatConverter/IEEE754.html .
(unsigned)1 with 32-bit int is represented as 0b00.....0001.
What's the purpose of specifying a data type in C language,
Use computer resources efficiently. There is no point in reserving 2 gigabytes to store 8 bits of data. The type determines the range of values that can be "contained" in a variable. You communicate that "upper/lower range" of allowed values to the compiler, and the compiler generates nice and fast code. (There is also Ada, where you literally specify the range of a type, like type Day_type is range 1 .. 31;).
Programs are written using https://en.wikipedia.org/wiki/Harvard_architecture . Variables at block scope are put on stack https://en.wikipedia.org/wiki/Stack_(abstract_data_type)#Hardware_stack . The idea is that you have to know in advance how many bytes to reserve from the stack. Types communicate just that.
have heard that's how a compiler knows how much memory to allocate to an object?
The type tells the compiler how much memory to allocate for an object, but it also communicates the range of values and the representation (float and _Float32 might be similar, yet they are distinct types). Arithmetic rules differ too: overflowing the addition of two ints is undefined behavior, while overflowing the addition of two unsigned values is well defined and wraps around.
Is that the only use, albeit crucial?
The most important use of types is to clearly communicate the purpose of your code to other developers.
char character;
int numerical_variable;
uint_least8_t variable_with_8_bits_that_is_optimized_for_size;
uint_fast8_t variable_with_8_bits_that_is_optimized_for_speed;
wchar_t wide_character;
FILE *this_is_a_file;
I've heard there isn't a need to specify a data type in Python.
This is literally the difference between statically typed programming languages and dynamically typed programming languages. https://en.wikipedia.org/wiki/Type_system#Type_checking
The standard doesn't seem to impose any padding requirements on struct members, even though it does prohibit reordering (6.7.2.1p6). How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Is it even sensible of the standard not to require that padding be minimal?
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs (even if I limit myself to just uint8_t arrays as members, compilers seem to be allowed to add padding in between them), and I'm finding it a little weird to have to resort to offset arithmetic there.
How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Essentially, the "extra" padding may allow significant compiler optimizations.
Unfortunately, I don't know if any compilers actually do that (and therefore cannot provide any estimate on its likelihood of occurring).
As a simple example, consider a 32-bit or 64-bit architecture, where the ABI states that string literals and character arrays are aligned to 32-bit or 64-bit boundary. Many of the C library functions are (also) implemented by the C compiler itself; see e.g. these lists for GCC. The compiler can track the parameters to see if they refer to a string literal or (the beginning of a) character array, and if so, replace e.g. strcmp() with an optimized built-in version (which does the comparison in 32-bit units, rather than char-at-a-time).
As a more complicated example, consider a RISC hardware architecture, where unaligned byte access is slower than aligned native word access. (For example, the former may be implemented in hardware as the latter, followed by a bit shift.) Such an architecture could have an ABI that requires all structure members to be word-aligned. Then, the C compiler would be required to add more-than-minimal padding.
Traditionally, the C standards committee has been very careful to not exclude any kind of hardware architecture from correctly implementing the language.
Is it even sensible of the standard not to require that padding be minimal?
The purpose of the C standard used to be to ensure that C code would behave in the same manner if compiled with different compilers, and to allow implementation of the language on any sufficiently capable hardware architecture. In that sense, it is very sensible for the standard not to require minimal padding, as some ABIs may require more than minimal padding for whatever reason.
With the introduction of the Microsoft "extensions", the purpose of the C standard has shifted significantly, to binding C to C++ to ensure a C++ compiler can compile C code with minimal differences to C++ compilation, and to provide interfaces that can be marketed as "safer" with the actual purpose of balkanizing developers and binding them to a single vendor implementation. Because this is contrary to the previous purpose of the standard, and it is clearly non-sensible to standardize single-vendor functions like fscanf_s() while not standardizing multi-vendor functions like getline(), it may not be possible to define what sensible means anymore in the context of the C standard. It definitely does not match "good judgment"; it probably now refers to "being perceptible by the senses".
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs
You are making the same mistake C programmers make, over and over again. Structs are not suitable for representing serialized objects. You should not use a struct to represent a network object, or a file header, because of the C struct rules.
Instead, you should use a simple character buffer, and either accessor functions (to extract or pack each member or field from the buffer), or conversion functions (to convert the buffer contents to a struct and vice versa).
The underlying reason why even experienced programmers like the asker still would prefer to use a struct instead, is that the accessors/conversion involves a lot of extra code; having the compiler do it instead would be much better: less code, simpler code, easier to maintain.
And I agree. It would even be quite straightforward if a new keyword, say serialized_struct, was introduced to define a serialized data structure with completely different member rules from traditional C structs. (Note that this support would not affect e.g. linking at all, so it really is not as complicated as one might think.) Additional attributes or keywords could be used to specify explicit byte order, and the compiler would do all the conversion details for us, in whatever way the compiler sees best for the specific architecture it compiles for. This support would only be available for new code, but it would be hugely beneficial in cutting down on interoperability issues -- and it would make a lot of serialization code simpler!
Unfortunately, when you combine the C standard committee's traditional dislike of adding new keywords with the overall change of direction from interoperability to vendor lock-in, there is no chance at all for anything like this to be included in the C standard.
Of course, as described in the comments, there are lots of C libraries that implement one serialization scheme or other. I've even written a few myself (for rather peculiar use cases, though). A sensible approach (poor pun intended) would be to pick a vibrant one (well maintained, with a lively community around the library), and use it.
I've been working on using a C library from R by writing custom C-functions using the library's functionality, and then accessing these C-functions from R using the .C-Interface.
In some of the C-code, I allocate space for some custom structures and want to store pointers to them in R so I can use these structures in successive calls to .C. While toying around with the .C function, I noticed I can simply cast the pointer to the C-structure to int and store it in R as an integer. Passing this integer to later calls via .C works fine, I can keep track of my structures and use them without problems.
My somewhat naive question: what is wrong with storing these pointers in integers in R? It works fine so I'm assuming there has to be some downside, but I couldn't find any info on it.
R's integers are 32 bits even on a 64-bit platform. Therefore, this won't work on a 64-bit system, where the pointers are 64 bits wide and do not fit in an R integer.
R has functionality for this. See the 'Writing R Extensions' manual, the section on 'External pointers and weak references'.
If you are willing to switch to C++ (which doesn't mean you have to rewrite all of your code), you can use the Rcpp package, which makes this easier. See for example External pointers with Rcpp.
How can I store 128- or 256-bit data types (ints and floats) without using any data structures (like arrays and others) or external libraries on a 64-bit machine? I am using Code::Blocks.
You cannot do this using the C language as laid out in the standard. However, depending on your compiler and architecture, you may have compiler-specific support for larger integer sizes. For instance, GCC supplies limited support for 128-bit integers, which you can use like so:
__int128 foo; //foo is a 128-bit signed integer.
However, the real question is probably why you are so determined not to use a library or basic language features like arrays to implement your own.
I have a project that is half in C and half in Fortran 77. [No, not Fortran 90 or 03, Fortran 77.] The code would be much cleaner if I could pass pointers generated on the C side back to Fortran, which would then pass them back as necessary for handling in other C functions. As it is, the C code is filled with global variables that shouldn't be global, and is otherwise on the verge of becoming an unstructured mess. So are there any reasonably reliable ways to pass an opaque pointer between C and Fortran?
If you are on a 32-bit platform, consider casting the pointers to integers and passing those integers to the Fortran code. When the Fortran passes them back, convert the integer back into a pointer, cross your fingers, and use it.
From what I remember (from 25+ years ago), Fortran 77 tends to pass everything to C by pointer anyway - and character strings get passed with a length, and arrays get passed with their dimensions.
If you're on a 64-bit platform, you'll have to work out whether the Fortran 77 compiler provides any 8-byte integers (INTEGER*8?) - my suspicion is that it won't (largely confirmed by looking at the GNU documentation; if you were using Fortran 2003, you'd be in better shape, it seems). If it does, the same trick works. If it does not, you are into much dodgier territory.
You could try - against recommendations - using a union of a double and a pointer. In the C, you'd set the pointer in the union from your C code pointer, then copy the double out of the union into a Fortran REAL*8, and as long as no-one touches that except to copy it or pass it back, maybe you will be OK if the gods smile favourably upon your endeavours. Most likely though, the whole thing will explode - this sort of union has an incredible ability to detect when the customer will be most annoyed if something doesn't work and will then proceed to explode at exactly the right moment - part way through the demo, or fifteen minutes after the program goes live.
An alternative to consider (still with gritted teeth) is a union of a 64-bit pointer and an array of two 32-bit integers, and then requiring the Fortran code to pass an array of two integers when you need to return a (64-bit) pointer. Clearly, an array of one integer would work for 32-bit code; maybe just require the calling code to pass an array of two integers in all cases, zeroing the unused integer value in the 32-bit pointer case? That gives you forward migratability.
You can do this with the (non-standard) Cray pointer extension:
http://gcc.gnu.org/onlinedocs/gfortran/Cray-pointers.html