Does casting remove endian dependency in C/C++?


i.e. if we cast a C or C++ unsigned char array named arr as (unsigned short*)arr and then assign to it, is the result the same independent of machine endianness?
Side note - I saw the discussion on IBM and elsewhere on SO with example:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
...stating that the value of x will depend on the layout of endian, and hence the endianness of the machine. That means dereferencing an array is endian-dependent, but what about assigning to it?
*(short*) endian = 1;
Are all future short-casted dereferences then guaranteed to return 1, regardless of endianness?
After reading the responses, I wanted to post some context:
In this struct
struct pix {
unsigned char r;
unsigned char g;
unsigned char b;
unsigned char a;
unsigned char y[2];
};
replacing unsigned char y[2] with unsigned short y makes no individual difference, but if I make an array of these structs and put that in another struct, then I've noticed that the size of the container struct tends to be higher for the "unsigned short" version, so, since I intend to make a large array, I went with unsigned char[2] to save space overhead. I'm not sure why, but I imagine it's easier to align the uchar[2] in memory.
Because I need to do a ton of math with that variable y, which is meant to be a single short-length numerical value, I find myself casting to short a lot just to avoid individually accessing the uchar bytes... sort of a fast way to avoid ugly byte-specific math, but then I thought about endianness and whether my math would still be correct if I just cast everything like
*(unsigned short*)this->operator()(x0, y0).y = (ySum >> 2) & 0xFFFF;
...which is a line from a program that averages 4-adjacent-neighbors in a 2-D array, but the point is that I have a bunch of these operations that need to act on the uchar[2] field as a single short, and I'm trying to find the lightest (i.e. without an endian-based if-else statement every time I need to access or assign), endian-independent way of working with the short.

Because of the strict aliasing rule, this is undefined behaviour, so anything might happen. If you did the same through a union, however, the answer is no: the result depends on machine endianness.

Each possible value of short has a so-called "object representation"[*], which is a sequence of byte values. When an object of type short holds that value, the bytes of the object hold that sequence of values.
You can think of endianness as just being one of the ways in which the object representation is implementation-dependent: does the byte with the lowest address hold the most significant bits of the value, or the least significant?
Hopefully this answers your question. Provided you've safely written a valid object representation of 1 as a short into some memory, when you read it back from the same memory you'll get the same value again, regardless of what the object representation of 1 actually is in that implementation. And in particular regardless of endianness. But as the others say, you do have to avoid undefined behavior.
[*] Or possibly there's more than one object representation for the same value, on exotic architectures.
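One way to "safely write a valid object representation" without the cast is memcpy. The following is a minimal sketch, not the asker's code; put_short and get_short are hypothetical helper names. It shows that a value written this way reads back unchanged, whatever byte order the implementation uses.

```c
#include <string.h>

/* Hypothetical helpers: store and load a short through a byte buffer.
   memcpy copies the object representation byte for byte, so neither the
   aliasing rules nor alignment requirements are violated. */
static void put_short(unsigned char *dst, short v)
{
    memcpy(dst, &v, sizeof v);
}

static short get_short(const unsigned char *src)
{
    short v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

The bytes inside the buffer still differ between little-endian and big-endian machines; only the round trip through the same machine is guaranteed.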

Yes, all future dereferences will return 1 as well: As 1 is in range of type short, it will end up in memory unmodified and won't change behind your back once it's there.
However, the code itself violates effective typing: It's illegal to access an unsigned char[2] as a short, and may raise a SIGBUS if your architecture doesn't support unaligned access and you're particularly unlucky.
However, character-wise access of any object is always legal, and a portable version of your code looks like this:
short value = 1;
unsigned char *bytes = (unsigned char *)&value;
How value is stored in memory is of course still implementation-defined, i.e., you can't know what the following will print without further knowledge about the architecture:
assert(sizeof value == 2); // check for size 2 shorts
printf("%i %i\n", bytes[0], bytes[1]);
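If the byte layout of the unsigned char y[2] field itself must be the same on every machine, shifts and masks avoid the question entirely, because they operate on values rather than on memory layout. A minimal sketch, assuming 8-bit bytes and a convention (chosen here purely for illustration) that y[0] holds the low byte; store_u16 and load_u16 are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical helpers. Convention assumed for this sketch:
   y[0] always holds the low byte, y[1] the high byte. */
static void store_u16(unsigned char y[2], uint16_t v)
{
    y[0] = (unsigned char)(v & 0xFFu);
    y[1] = (unsigned char)((v >> 8) & 0xFFu);
}

static uint16_t load_u16(const unsigned char y[2])
{
    return (uint16_t)(y[0] | ((unsigned)y[1] << 8));
}
```

A line such as the averaging assignment in the question could then call store_u16 on the y field with (ySum >> 2) & 0xFFFF instead of casting. Compilers commonly reduce helpers like these to a single load or store, though that is an optimization and not a guarantee.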

Related

Assigning a short to int * fails

I understand that I can assign a variable to a bigger type if the value fits, and that it's OK to do so. For example:
short s = 2;
int i = s;
long l = i;
long long ll = l;
When I try to do it with pointers it fails and I don't understand why. I have integers that I pass as arguments to functions expecting a pointer to a long long. And it hasn't failed, yet..
The other day I was going from short to int and something weird happened. I hope someone can explain it to me. This is the minimal code to reproduce it:
short s = 2;
int* ptr_i = &s; // here ptr_i is the pointer to s, ok , but *ptr_i is definitely not 2

When I try to do it with pointers it fails and I don't understand why.
A major purpose of the type system in C is to reduce programming mistakes. A default conversion may be disallowed or diagnosed because it is symptomatic of a mistake, not because the value cannot be converted.
In int *ptr_i = &s;, &s is the address of a short, typically a 16-bit integer. If ptr_i is set to point to the same memory and *ptr_i is used, it attempts to refer to an int at that address, typically a 32-bit integer. This is generally an error; loading a 32-bit integer from a place where there is a 16-bit integer, and we do not know what is beyond it, is not usually a desired operation. The C standard does not define the behavior when this is attempted.
In fact, there are multiple things that can go wrong with this:
As described above, using *ptr_i when we only know there is a short there may produce undesired results.
The short object may have alignment that is not suitable for an int, which can cause a problem either with the pointer conversion or with using the converted pointer.
The C standard does not define the result of converting short * to int * except that, if it is properly aligned for int, the result can be converted back to short * to produce a value equal to the original pointer.
Even if short and int are the same width, say 32 bits, and the alignment is good, the C standard has rules about aliasing that allow the compiler to assume that an int * never accesses an object that was defined as short. In consequence, optimization of your program may transform it in unexpected ways.
I have integers that I pass as arguments to functions expecting a pointer to a long long.
C does allow default conversions of integers to integers that are the same width or wider, because these are not usually mistakes.
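The usual fix is to convert the value, not the pointer. A minimal sketch, assuming a callee that takes a long long *; add_one and add_one_to_short are hypothetical names, not functions from the question:

```c
/* Hypothetical callee that expects a pointer to long long. */
static void add_one(long long *p)
{
    *p += 1;
}

/* Widen into a real long long object, pass that object's address,
   then narrow the result back. */
static short add_one_to_short(short s)
{
    long long tmp = s;      /* value conversion, always well defined */
    add_one(&tmp);
    return (short)tmp;      /* assumes the result still fits in a short */
}
```

Here the callee reads and writes a genuine long long, so no memory beyond the original short is ever touched.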

Casting uint64_t on bitfield

I found code where a bit-field is used for network messages. I would like to know what the cast bitfield_struct data = *(bitfield_struct *)&tmp; exactly does and how its syntax works. Won't it violate the strict aliasing rule? Here is part of the code:
typedef struct
{
unsigned var1 : 1;
unsigned var2 : 13;
unsigned var3 : 8;
unsigned var4 : 10;
unsigned var5 : 7;
unsigned var6 : 12;
unsigned var7 : 7;
unsigned var8 : 6;
} bitfield_struct;
void print_data(u_int64_t * raw, FILE * f, int no_object)
{
uint64_t tmp = ntohll(*raw);
bitfield_struct data = *(bitfield_struct *)&tmp;
...
}

Won't it violate the strict aliasing rule?
Yes it will, so the code invokes undefined behavior. It is also highly non-portable:
We don't know the size of the abstract item called "addressable storage unit" that the given system uses. It isn't necessarily 64 bits, so there could in theory be padding and other nasty things hidden in the bit-field. 64 bit unsigned is fishy.
Neither do we know if the bit-field uses the same bit order as uint64_t. Nor can we know if they use the same endianness.
If individual bits (or fields) of the uint64_t need to be accessed, I would recommend doing so using bitwise shifts, as that makes the code fully portable even between architectures of different endianness. Then you don't need the non-portable ntohll call either.
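A minimal sketch of the shift-based approach follows. The bit positions are an assumption made for illustration: var1 is taken to be the most significant bit of the 64-bit message and var8 the lowest six bits, with the message arriving most significant byte first. The real wire format is not given in the question, and decoded_msg, field, load_be64 and decode are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical decoded form: one plain member per field. */
typedef struct {
    unsigned var1, var2, var3, var4, var5, var6, var7, var8;
} decoded_msg;

/* Assemble a 64-bit value from 8 bytes, most significant byte first.
   This replaces ntohll and works the same on any host byte order. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Extract `width` bits whose lowest bit sits `shift` bits above bit 0. */
static unsigned field(uint64_t v, unsigned shift, unsigned width)
{
    return (unsigned)((v >> shift) & ((UINT64_C(1) << width) - 1));
}

/* Assumed layout: var1 is bit 63, var8 occupies bits 5..0. */
static decoded_msg decode(uint64_t v)
{
    decoded_msg m;
    m.var1 = field(v, 63, 1);
    m.var2 = field(v, 50, 13);
    m.var3 = field(v, 42, 8);
    m.var4 = field(v, 32, 10);
    m.var5 = field(v, 25, 7);
    m.var6 = field(v, 13, 12);
    m.var7 = field(v, 6, 7);
    m.var8 = field(v, 0, 6);
    return m;
}
```

The widths (1, 13, 8, 10, 7, 12, 7, 6) come from the struct in the question and sum to 64; only the ordering is assumed.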

What it does (or attempts to do) is quite straightforward.
uint64_t tmp = ntohll(*raw);
This line takes the value that raw points to, converts it from network byte order to host byte order (which reverses the bytes on a little-endian host) and copies it into tmp.
bitfield_struct data = *(bitfield_struct *)&tmp;
This line reinterprets the data in tmp (which is a uint64_t) as type bitfield_struct and copies it into data. This is basically the equivalent of doing:
/* Create a bitfield_struct pointer that points to tmp */
bitfield_struct *p = (bitfield_struct *)&tmp;
/* Copy the value in tmp to data */
bitfield_struct data = *p;
This is needed because bitfield_struct and uint64_t are incompatible types, and you cannot assign one to the other with just bitfield_struct data = tmp;
The code presumably continues to access fields within the bitfield through data, such as data.var1.
Now, as people have pointed out, there are several issues which make this code unreliable and non-portable.
Bit-fields are heavily implementation-dependent. Solution? Read the manual and figure out how your specific compiler variant treats bit-fields. Or don't use bitfields at all.
There is no guarantee that a uint64_t and a bitfield_struct have the same alignment. That means there could be padding, which can completely offset your expectations and leave you with wrong data. One solution is to use memcpy to copy instead of pointers, which might let you avoid this particular issue. Another is to specify packed alignment using the mechanism provided by your compiler.
The code invokes UB when strict aliasing rules are applied. Solution? Most compilers will have a no-strict-aliasing flag that can be enabled, at a performance cost. Or even better, create a union type with bitfield_struct and uint64_t and use this to reinterpret between one and the other. This is allowed even with the strict-aliasing rules. Using memcpy is also legal, since it treats the data as an array of chars.
However, the best thing to do is not use this piece of code at all. As you may have noticed, it relies too much on compiler and platform specific stuff. Instead, try to accomplish the same thing using bit masks and shifts. This gets rid of all three problems mentioned above, without needing special compiler flags or having to face any real question of portability. Most importantly, it saves other developers reading your code, from having to worry about such things in the future.
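For completeness, the memcpy variant mentioned above can be sketched as follows. It removes the aliasing and alignment problems only; which bits land in which member is still decided by the compiler. copy_bits is a hypothetical helper name, and the size check is an addition for this sketch.

```c
#include <stdint.h>
#include <string.h>

typedef struct
{
    unsigned var1 : 1;
    unsigned var2 : 13;
    unsigned var3 : 8;
    unsigned var4 : 10;
    unsigned var5 : 7;
    unsigned var6 : 12;
    unsigned var7 : 7;
    unsigned var8 : 6;
} bitfield_struct;

/* Copy the bytes of tmp into *out.
   Returns 1 on success, 0 if the two types differ in size on this
   implementation, in which case nothing is copied. */
static int copy_bits(uint64_t tmp, bitfield_struct *out)
{
    if (sizeof *out != sizeof tmp)
        return 0;
    memcpy(out, &tmp, sizeof tmp);
    return 1;
}
```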

Right to left:
&tmp Take address of tmp
(bitfield_struct *)&tmp Address of tmp is address to data of type bitfield_struct
*(bitfield_struct *)&tmp Extract value out of tmp, assuming it's bitfield_struct data
bitfield_struct data = *(bitfield_struct *)&tmp; Store tmp to data, assuming that tmp is bitfield_struct
So it's just copy using extra pointers to avoid compilation errors/warnings of incompatible types.
What you may not understand is bit-addressing of structure.
unsigned var1 : 1;
unsigned var2 : 13;
Here you will find some more info about it: https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm

should pointers be signed or unsigned in c

I have a function get_picture() that takes a picture. It returns a pointer of type uint8_t * (to where the picture is stored) and takes a pointer to a variable that stores the length of the picture.
Here is the declaration:
uint8_t * get_picture(int *piclength)
Here I call it in main():
unsigned int address, value;
address = (unsigned int)get_picture((int*)& value);
My question is: because address stores an address (which is positive), should I actually define it as an int?

I'm not sure you understand pointers.
If your function returns a uint8_t * then you should be storing it in uint8_t * not an int.
As an example:
uint8_t* get_picture(int* piclength);
int piclength;
uint8_t* address;
address = get_picture(&piclength);

If you really want to convert a data-pointer to an integer, use the dedicated typedef instead of some random (and possibly too small) type:
uintptr_t / intptr_t (Optional typedefs in <stdint.h>)
Still, the need is rare, and I don't see it here.
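A minimal sketch of that conversion, for the case where the numeric value really is wanted (for logging, say). The get_picture body here is a hypothetical stand-in that returns a static buffer, and format_address is an illustrative helper; PRIxPTR comes from <inttypes.h>.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the real get_picture(). */
static uint8_t picture[16];

static uint8_t *get_picture(int *piclength)
{
    *piclength = (int)sizeof picture;
    return picture;
}

/* Write the pointer's integer value into buf as hexadecimal.
   Returns what snprintf returns. */
static int format_address(char *buf, size_t n, const void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* wide enough to hold a void * */
    return snprintf(buf, n, "0x%" PRIxPTR, bits);
}
```

The pointer itself stays in a uint8_t * variable; only the copy used for display is converted.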

It depends on what you are really after. Your code is fine as is if you want address to contain the address of where that picture lives. Likewise you could use an int, since bits is bits, int is the same number of bits as unsigned int and whatever consumes address can be fed those bits. It makes more sense as a human to think of addresses as unsigned, but the compiler and hardware don't care, bits is bits.
But depending on what you are doing, you may want, as mentioned already, to preserve this address using a pointer of the same type. See Dragan's answer.
If you want to "see" the address then it depends on how you want to see it, converting it to an unsigned int is one easy and generic way to do it.
Yes, this is very system dependent and the size of int varies by toolchain and target and may or may not completely hold an address for that system, so some masking may be required by the consumer of that variable.
So your code is fine, if I understand the question. Signed or unsigned is in the eye of the beholder; a value is only signed or unsigned for particular operations. Addresses are themselves neither signed nor unsigned, they are just bits on an address bus. For a sane compiler, unsigned int and int are the same size and store the same number of bits, so as long as the compiler defines them to be at least as wide as the addresses it uses for pointers, this will work just fine with int or unsigned int. Int feels a little wrong and unsigned int feels right, but those are human emotions. The hardware doesn't care, so long as the bits don't change on their way to the address bus.
Now if for some reason the code we don't see prints this variable as a decimal, for example printf("%d\n",address); (why would you printf on a microcontroller?), then it may look strange to humans, but it will still be the right decimal interpretation of the bit pattern that is the address. printf("0x%X\n",address); would make more sense and be more generic. If your printf supports it, you could just use printf("%p",(void *)address); with Dragan's uint8_t * address declaration, which is what many folks here are probably thinking of based on classical C training.
The other view is that bits are bits and have no meaning whatsoever to the hardware until used, and then only for that use. An address is only an address on the address bus. When you do math on it to compute another address, it is not an address but a bit pattern being fed into the ALU, and whether it is signed or unsigned might depend on the operation (add and subtract don't know signed from unsigned; multiply and divide do).
If you choose not to use uint8_t * address as the declaration, then unsigned int "feels" better and is less likely to mess you up (if you have enough bits in an (unsigned) int for that compiler to store an address in the first place). A signed int feels a little wrong, but technically should work. My rule is to use signed only when you specifically need signed, and unsigned everywhere else; it saves on a lot of bugs. Unfortunately, C libraries traditionally do it the other way around, which made a big mess before the stdint.h types came about.

C Casting from pointer to uint32_t to a pointer to a union containing uint32_t

I'd like to know if casting a pointer to uint32_t to a pointer of a union containing a uint32_t will lead to defined behavior in C, i.e.
typedef union
{
uint8_t u8[4];
uint32_t u32;
} T32;
void change_value(T32 *t32)
{
t32->u32 = 5678;
}
int main()
{
uint32_t value = 1234;
change_value((T32 *)&value); // value is 5678 afterwards
return EXIT_SUCCESS;
}
Is this valid C? Many thanks in advance.

The general answer to your question is no, this is in general not defined. If the union contains a member that has a stricter alignment requirement than uint32_t, the union must have that stricter alignment, and accessing the uint32_t through a pointer to the union would then lead to UB. This could happen, e.g., if you replace uint8_t in your example with double.
In your particular case, though, the behavior is well defined. uint8_t, if it exists, is most likely nothing other than unsigned char and all character types always have the least alignment requirement.
Edit:
As R.. mentions in his comments there are other issues with your approach. First, theoretically, uint8_t could be different from unsigned char if there is an unsigned "extended integer type" of that width. This is very unlikely, I never heard of such an architecture. Second, your approach is subject to aliasing issues, so you should be extremely careful.

At the risk of incurring downvotes... Conceptually, there is nothing wrong with what you are trying to do. That is, define a piece of storage that can be viewed as four bytes or as a 32-bit integer, and then reference and modify that storage using a pointer.
However, I would ask why you would want to write code where its intent is obscured. What you are really doing is forcing the next programmer who reads your code to think for minutes and maybe even try a little test program. Thus, this programming style is "expensive".
You could just as easily have defined value as:
T32 value;
// etc.
change_value(&value);
and then avoid the cast and subsequent angst.
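Spelled out in full, that suggestion might look like the sketch below. T32 and change_value are taken from the question; demo is a hypothetical wrapper added so the snippet is self-contained.

```c
#include <stdint.h>

typedef union
{
    uint8_t  u8[4];
    uint32_t u32;
} T32;

static void change_value(T32 *t32)
{
    t32->u32 = 5678;
}

/* The object is a T32 from the start, so no pointer cast is needed
   and no alignment or aliasing question arises. */
static uint32_t demo(void)
{
    T32 value;
    value.u32 = 1234;
    change_value(&value);
    return value.u32;
}
```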

Since all union members are guaranteed to start at the same memory address, your program as written does not lead to undefined behavior.

Difference between accessing Memory Mapped Registers using char and int

I have been reading about accessing Memory Mapped Registers of peripheral devices and it seems you can do multiple ways. For example:
Method 1:
#define MyReg 0x30610000
volatile int *ptrMyReg;
ptrMyReg = (volatile int *) MyReg;
*ptrMyReg = 0x7FFFFFFF; /* Turn ON all bits */
Method 2:
#define MyReg 0x30610000
volatile unsigned char *ptrMyReg;
ptrMyReg = (volatile unsigned char *) MyReg;
*ptrMyReg = 0x7FFFFFFF; /* Turn ON all bits */
Question: Is there any specific reason as to why one would choose one over another?
Assume: Size of int on architecture is 4 bytes.

*ptrMyReg = 0x7FFFFFFF;
In the second case, *ptrMyReg is of type unsigned char, so 0x7FFFFFFF will be converted to unsigned char (i.e., the value after conversion will be 0xFF) before assignment, and only one byte will be written. I don't think this is what you want if you originally intended to write 4 bytes.

Well, the second example isn't valid code, since your typecast doesn't match. If you fix that to be:
ptrMyReg = (volatile unsigned char *)MyReg;
Then, yes, they're different. In the second case, that constant gets truncated, and you will write only 0xFF to either the most- or least-significant byte of the word at 0x30610000, depending on endianness. Regardless, it's the single byte at 0x30610000 that will be written to, and not others.
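The difference can be observed on ordinary memory. In this minimal sketch, fake_reg is a hypothetical stand-in for the register at 0x30610000 so that the code can run on a hosted system; it assumes 8-bit bytes. Real peripheral registers may react differently to byte-wide accesses.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped register. */
static volatile uint32_t fake_reg;

/* Method 1: a single 32-bit store of the whole constant. */
static void write_word(void)
{
    volatile uint32_t *p = &fake_reg;
    *p = 0x7FFFFFFF;
}

/* Method 2: the constant is converted to unsigned char (0xFF) and only
   the byte at the lowest address is written. */
static void write_byte(void)
{
    volatile unsigned char *p = (volatile unsigned char *)&fake_reg;
    *p = (unsigned char)0x7FFFFFFF;
}
```

Starting from zero, write_byte leaves 0x000000FF in fake_reg on a little-endian machine and 0xFF000000 on a big-endian one, while write_word leaves 0x7FFFFFFF on both.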

The CPU architecture may require that all accesses to peripheral registers are, e.g., 32 bits wide. If so, a byte access may cause a CPU exception or silently erroneous execution. This is the case on many ARM SoCs.

In method 2, you aren't going to access the entire int by dereferencing a pointer to char (unless, of course, sizeof(int) == 1 on your platform).
Other than that, you should look at your hardware. It may behave differently when accessed using memory operands of different sizes.
