Data of a struct into a union - c

I have declared the following union:
typedef union
{
    struct
    {
        uint32_t data;
    };
    uint8_t w[4];
} xxx_data_t;
I am trying to access a memory chip over SPI, which only accepts 1 byte of input at a time.
I want to send the variable data, and I have thought of decomposing that 32-bit variable into four 8-bit variables (1 byte each), thus forming the w[4] array.
My question is: is this valid? Does this decompose my 32-bit variable?
EXAMPLE
I declare xxx_data_t my_variable.
my_variable.data = 300, which in hexadecimal is 0x12C. Will the array be my_variable.w = {0, 0, 1, 44}? (0x2C is 44 in decimal.)
data (32 bits)   = 300 = 0x 00 00 01 2C
w (4 × 8 bits)   =         [0] [0] [1] [44]
Thanks all.

I think it is right. You can write a test program to verify it.
#include <stdio.h>
#include <stdint.h>

typedef union
{
    struct
    {
        uint32_t data;
    };
    uint8_t w[4];
} data_t;

int main(void)
{
    data_t d = {.data = 0x12c};
    for (int i = 0; i < 4; i++) {
        printf("%d\n", d.w[i]);
    }
    return 0;
}
Output (on a little-endian machine):
44
1
0
0

My question is: is this valid?
Yes, you may do type punning between different types using union. Type punning to/from a character type (which uint8_t almost certainly is) is always safe. Alignment/padding shouldn't be an issue either, in this specific case.
Please note that you can only do this in C - you cannot do it in C++. So think twice before using C++ for hardware-related programming.
which in hexadecimal is 0x12C. Will the array be my_variable.w[4]=[0,0,1,44]??
It depends on CPU endianness. What is CPU endianness?
So you can either get 00 00 01 2C on a big-endian computer such as PowerPC, or you can get 2C 01 00 00 on a little-endian computer such as x86.
As for which endianness you actually want, it is the network endianness. In the case of SPI, network endianness is the same as the byte order that the part you are communicating with expects. You have to look that up in the datasheet, in case it's a "dumb" part like an ADC, display or similar. If it's a "smart" part like another MCU, then you can probably specify the network endianness yourself.
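If you need a fixed byte order on the wire regardless of the host CPU, a common alternative to the union is to decompose the value with shifts. A minimal sketch (the helper name is my own):

#include <stdint.h>

/* Decompose a 32-bit value into 4 bytes, most significant byte first
   (big-endian on the wire), independent of host CPU endianness. */
static void u32_to_bytes_be(uint32_t value, uint8_t w[4])
{
    w[0] = (uint8_t)(value >> 24);
    w[1] = (uint8_t)(value >> 16);
    w[2] = (uint8_t)(value >> 8);
    w[3] = (uint8_t)(value);
}

Reverse the indices for little-endian output; either way the result no longer depends on how the host stores a uint32_t.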

Endian-independent way of using memcpy() from smaller to larger integer pointer

Suppose I have two arrays.
uint8_t src[SIZE] = { 0 };
uint32_t dst[SIZE] = { 0 };
uint8_t *srcPtr;   // Points to current src value
uint32_t *dstPtr;  // Points to current dst value
src holds values that sometimes need to be put into dst. Importantly, the values from src may be 8-bit, 16-bit, or 32-bit, and aren't necessarily properly aligned. So, suppose I wish to use memcpy() like below, to copy a 16-bit value
memcpy(dstPtr, srcPtr, 2);
Will I run into an endianness issue here? This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00, the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
But if I were on a big-endian system, srcPtr would be 00 then 08, and the bytes at dstPtr would be 00 08 00 00 (I presume), which would take on a value of 524288.
What would be an endian-independent way to write this copy?
Will I run into an endianness issue here?
Not necessarily endianness issues per se, but yes, the specific approach you describe will run into issues with integer representation.
This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00, the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
You seem to be making an assumption there: either that more bytes of the destination will be modified than you actually copy, or perhaps that relevant parts of the destination are pre-set to all zero bytes.
But you need to understand that memcpy() will copy exactly the number of bytes requested. No more than that will be read from the specified source, and no more than that will be modified in the destination. In particular, the data types of the objects to which the source and destination pointers point have no effect on the operation of memcpy().
What would be an endian-independent way to write this copy?
The most natural way to do it would be via simple assignment, relying on the compiler to perform the necessary conversion:
*dstPtr = *srcPtr;
However, I take your emphasis on the prospect that the arrays might not be aligned as a concern that it may be unsafe to dereference the source and/or destination pointer. That will not, in fact, be the case for pointers to char, but it might be the case for pointers to other types. For cases where you take memcpy() as the only safe way to read from the arrays, the most portable method for converting value representations is still to rely on the implementation. For example:
uint8_t *srcPtr = /* ... */;
uint32_t *dstPtr = /* ... */;
uint16_t srcVal;
uint32_t dstVal;

memcpy(&srcVal, srcPtr, sizeof(srcVal));
dstVal = srcVal;  // conversion is automatically performed
memcpy(dstPtr, &dstVal, sizeof(dstVal));
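A quick check of that round trip, with a misaligned 16-bit value chosen arbitrarily. Note that reading srcVal this way still interprets the source bytes in host order, so the result below assumes a little-endian host:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    // The 16-bit value 8, stored little-endian at an odd (misaligned) offset
    uint8_t src[3] = { 0xff, 0x08, 0x00 };
    uint32_t dst;

    uint16_t srcVal;
    memcpy(&srcVal, src + 1, sizeof(srcVal));  // safe even when misaligned
    uint32_t dstVal = srcVal;                  // value-preserving widening
    memcpy(&dst, &dstVal, sizeof(dstVal));

    printf("%u\n", (unsigned)dst);  // prints 8 on a little-endian host
    return 0;
}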
Will I run into an endianness issue here?
Yes. You're not copying, you're converting from one format to another (packing several unsigned integers into a single larger unsigned integer).
What would be an endian-independent way to write this copy?
The simple way is to make the conversion explicit, like:
for (int i = 0; i < something; i++) {
    dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4 + 1] << 8) |
              ((uint32_t)src[i*4 + 2] << 16) | ((uint32_t)src[i*4 + 3] << 24);
}
However, for cases where plain memcpy() gives the right result it's likely to be faster, and endianness won't change after compiling; so you could do something like:
#ifdef BIG_ENDIAN
    for (int i = 0; i < something; i++) {
        dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4 + 1] << 8) |
                  ((uint32_t)src[i*4 + 2] << 16) | ((uint32_t)src[i*4 + 3] << 24);
    }
#else
    memcpy(dest, src, something * 4);
#endif
Note: you'd also have to define the BIG_ENDIAN macro when appropriate, e.g. by passing a -D BIG_ENDIAN command-line argument to the compiler when you know the target architecture needs it.
I'm storing 16-bit values in src which aren't 16-bit-aligned which then need to be put into a 64-bit integer
That adds another problem - some architectures do not allow misaligned accesses. You need to use explicit conversion (read 2 separate uint8_t, not a misaligned uint16_t) to avoid this problem too.
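A sketch of that explicit conversion for a single value (assuming the source bytes are little-endian; the helper name is my own):

#include <stdint.h>

/* Read a 16-bit little-endian value from a possibly misaligned buffer
   by assembling it from individual bytes. */
static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

Because each access is a single byte, this never performs a misaligned load, and the result is the same on any host.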

Why does an integer type need to be little-endian?

I am curious about little-endian, and I know that most computers use the little-endian method.
So, I experimented with a program, and the source is below.
int main(void)
{
    int flag = 31337;
    char c[10] = "abcde";
    int flag2 = 31337;
    return 0;
}
When I looked at the stack via gdb, I noticed values like 0x00007a69 0x00007a69 ... and 0x62610000 0x00656463 ...
So, I have two questions.
For one thing, how can the value of char c[10] be below flag?
I expected the value of flag2 at the top of the stack, the value of char c[10] below flag2, and the value of flag below char c[10], like this:
7a69
"abcde"
7a69
Second, I expected the values to be stored in little-endian order.
Indeed, the value of "abcde" was stored as '6564636261'.
However, the value of 31337 didn't appear to be stored little-endian; it was just '7a69'. I thought it should be '697a'.
Why doesn't the integer type conform to little-endian?
There is some confusion in your understanding of endianness, stack and compilers.
First, the locations of variables on the stack may not have anything to do with their order in the code. The compiler is free to move them around however it wants, unless they are part of a struct, for example. Usually it tries to make as efficient use of memory as possible, so reordering is needed. For example, having char, int, char, int would require 16 bytes (on a 32-bit machine), whereas int, int, char, char would require only 12 bytes; see the sketch below.
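A sketch of that size difference (exact sizes depend on the ABI, but 16 and 12 are typical with a 4-byte int):

#include <stdio.h>

struct interleaved { char a; int b; char c; int d; };  // padding after each char
struct grouped     { int b; int d; char a; char c; };  // chars packed at the end

int main(void)
{
    printf("%zu %zu\n", sizeof(struct interleaved), sizeof(struct grouped));
    // Typically prints: 16 12
    return 0;
}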
Second, there is no "endianness" in char arrays. They are just that: arrays of values. If you put "abcde" there, the values have to be in that order. If you used, for example, UTF-16, then endianness would come into play, since one code unit (not necessarily one character) would require two bytes (on a normal 8-bit-byte machine). These would be stored depending on endianness.
Decimal value 31337 is 0x00007a69 as a 32-bit hexadecimal value. If you ask a debugger to show it, it will show it as such whatever the endianness. The only way to see how it sits in memory is to dump it as bytes. Then it would be 0x69 0x7a 0x00 0x00 in little-endian.
Also, even though little-endian is very popular, that's mainly because x86 hardware is popular. Many processors have used big-endian order (SPARC, PowerPC, MIPS, amongst others), and some (like older ARM processors) could run in either one, depending on the requirements.
There is also a term "network byte order", which actually is big endian. This relates to times before little endian machines became most popular.
Integer byte order is an arbitrary processor design decision. Why for example do you appear to be uncomfortable with little-endian? What makes big-endian a better choice?
Well probably because you are a human used to reading numbers from left-to-right; but the machine hardly cares.
There is in fact a reasonable argument that it is intuitive for the least-significant-byte to be placed in the lowest order address; but again, only from a human intuition point-of-view.
GDB shows you 0x62610000 0x00656463 because it is interpreting the data (...abcde...) as 32-bit words on a little-endian system.
It could be either way, but the reasonable default is to use native endianness.
Data in memory is just a sequence of bytes. If you tell it to show it as a sequence (array) of short ints, it changes what it displays. Many debuggers have advanced memory view features to show memory content in various interpretations, including string, int (hex), int (decimal), float, and many more.
You got a few excellent answers already.
Here is a little code to help you understand how variables are laid out in memory, either using little-endian or big-endian:
#include <stdio.h>

void show_var(const char *varname, unsigned char *ptr, size_t size)
{
    size_t i;
    printf("%s:\n", varname);
    for (i = 0; i < size; i++) {
        printf("pos %zu = %2.2x\n", i, *ptr++);
    }
    printf("--------\n");
}

int main(void)
{
    int flag = 31337;
    char c[10] = "abcde";
    show_var("flag", (unsigned char *)&flag, sizeof(flag));
    show_var("c", (unsigned char *)c, sizeof(c));
}
On my Intel i5 Linux machine it produces:
flag:
pos 0 = 69
pos 1 = 7a
pos 2 = 00
pos 3 = 00
--------
c:
pos 0 = 61
pos 1 = 62
pos 2 = 63
pos 3 = 64
pos 4 = 65
pos 5 = 00
pos 6 = 00
pos 7 = 00
pos 8 = 00
pos 9 = 00
--------

Copying bytes to struct gives wrong values [duplicate]

This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I'm trying to copy a byte array to a struct:
Actual bytes:
00000000 | ff 53 4d 42 72 00 00 00 00 08 01 c8 00 00 00 00 | .SMBr...........
Destination structure:
typedef struct {
    uint8_t  protocol[4];
    uint8_t  command;
    uint32_t status;
    uint8_t  flags;
    uint16_t flags2;
    uint16_t pidHigh;
    uint16_t somethingElse;
} MyStruct;
But for some reason, bytes in myStruct.status are not what they're supposed to be:
printf("%x", bytes[4]);
=> 72 // Ok
printf("%x", myStruct.command);
=> 72 // Ok
printf("%02x%02x%02x%02x", bytes[5], bytes[6], bytes[7], bytes[8]);
=> 00000000 // Ok
printf("%"PRIX32, myStruct.status);
=> C8010800 // What?! Why did it take the next 4 bytes... and reversed them?
Code used to copy those bytes:
MyStruct myStruct;
memcpy(&myStruct, bytes, 16);
This code is running on ARM (an iPhone 5), which might explain the little-endianness of the output, but it doesn't explain why there's a +4-byte offset in the bytes that have been copied.
What's going on here?
The memory layout of a struct is going to conform to the alignment requirements of its members. On 32-bit ARM, 16-bit values need 2-byte alignment, and 32-bit and larger values require 4-byte alignment. Padding bytes are inserted between structure elements when one member's end doesn't line up with the next member's alignment. Because of this padding, copying or casting arrays of bytes to a struct is not going to work as you expect.
Unfortunately, there is no great way around this. You can choose to pack your structures, which may reduce their performance. You can copy each element individually. Or you can carefully arrange your structures so that they are tightly packed (assuming you are aware of the alignment rules for all platforms your code will run on).
For example: if you rearrange your struct in this way, it will be perfectly packed with no padding bytes in the middle or at the end (it is an even multiple of 4). A sketch of the copy-each-element-individually option follows after it.
typedef struct {
    uint32_t status;          // +0
    uint16_t flags2;          // +4
    uint16_t pidHigh;         // +6
    uint16_t somethingElse;   // +8
    uint8_t  command;         // +10
    uint8_t  flags;           // +11
    uint8_t  protocol[4];     // +12
} MyStruct;
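And a sketch of the copy-each-element-individually option, parsing the original wire layout at fixed offsets (the offsets follow the byte dump in the question; the function name is my own). Note that this copies bytes verbatim, so multi-byte fields are still interpreted in host byte order:

#include <stdint.h>
#include <string.h>

/* Parse the wire format field by field, so host-side struct padding
   never matters. Offsets match the on-the-wire layout. */
static void parse_header(const uint8_t *bytes, MyStruct *out)
{
    memcpy(out->protocol, bytes + 0, 4);
    out->command = bytes[4];
    memcpy(&out->status, bytes + 5, sizeof(out->status));
    out->flags = bytes[9];
    memcpy(&out->flags2, bytes + 10, sizeof(out->flags2));
    memcpy(&out->pidHigh, bytes + 12, sizeof(out->pidHigh));
    memcpy(&out->somethingElse, bytes + 14, sizeof(out->somethingElse));
}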
The compiler aligns the elements in the struct so that each one starts at an offset that is a multiple of its own alignment (4 bytes for uint32_t here).
So basically command, which only needs 1 byte, is followed by 3 bytes of padding before status.
You can tell the compiler not to do that with:
#pragma pack(1)
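A minimal sketch of the pragma's effect, checked with offsetof and sizeof (both MSVC and GCC/Clang accept this pragma; the unpacked numbers are typical, not guaranteed):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#pragma pack(1)
typedef struct {
    uint8_t  command;
    uint32_t status;
} PackedStruct;
#pragma pack()

int main(void)
{
    // With pack(1) there is no padding: status sits at offset 1 and the
    // struct is 5 bytes. Without it, status would typically start at
    // offset 4 and sizeof would be 8.
    printf("offset of status: %zu\n", offsetof(PackedStruct, status));
    printf("sizeof: %zu\n", sizeof(PackedStruct));
    return 0;
}

Beware that packed structs can make member accesses slower, or even fault, on architectures that require aligned loads.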

Structure and pointer

I'm having a problem getting the memory address of a member variable of a structure. I've tried two ways, one of which didn't work properly. It would be very good if you could give me some advice.
First, I defined a structure named BITMAP_HEADER.
struct BITMAP_HEADER
{
    WORD  bfType;
    DWORD bfSize;
    WORD  bfReserved1;
    WORD  bfReserved2;
    DWORD bfOffBits;
};
Second, I defined and initialized some variables. Please look at the code below before you read the next line. In case you ask why I used a character pointer: I needed to access each byte of the integer bfSize.
struct BITMAP_HEADER bitmap_header ;
char* pSize = (char*)&bitmap_header.bfSize;
Third, I got the memory address of bfSize in two different ways and printed the values.
1. printf("%X\n", *pSize) ;
2. printf("%X\n", (unsigned char)*(((char*)&bitmap_header)+2)) ;
(1) directly takes the address of bitmap_header.bfSize.
(2) takes the address of the whole structure BITMAP_HEADER and advances the pointer by 2 bytes.
Finally, here is the result.
2D
F6
For your information, here is the hex data of the structure BITMAP_HEADER.
42 4D / F6 C6 2D 00 / 00 00 / 00 00 / 36 00 00 00
Why didn't the first method work? I thought the two methods were exactly the same.
You're running into structure padding here. The compiler is inserting two bytes of padding between the bfType and bfSize fields, to align bfSize to a 4-byte boundary, since bfSize is a DWORD.
Generally speaking, you cannot rely on being able to calculate exact offsets within a structure, since the compiler might add padding between members. You can control this to some degree using compiler-specific features; for example, on MSVC, the pack pragma, but I would not recommend this. Structure padding exists to satisfy member alignment restrictions, and some architectures will fault on unaligned accesses. (Others might fix up unaligned accesses in hardware, but typically do this rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
As for raw data whose structure is known in advance, it is usually better to read it into an array and use defined offsets to access the required fields. This way you won't have to worry about the compiler's behaviour (which might often not be what you expected). Your code would look like:
#define FIELD_TYPE  0
#define FIELD_SIZE  2
#define FIELD_RES1  6
#define FIELD_RES2  8
#define FIELD_OFF   10
#define SIZE_HEADER 14

static uint8_t header[SIZE_HEADER];
<...>
uint8_t *pheader = header;
DWORD offset_bits = *(DWORD *)(pheader + FIELD_OFF);  // see P.P.S. below
P.S. To make this code portable, the size of WORD and endianness must be considered; a few #ifdef .. #else .. #endif blocks should help with that.
P.P.S. It would be even better to use manual logical operations and shift operators instead of casting, but I left it this way for the sake of brevity.
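For completeness, a sketch of that shift-based variant (assuming the field is stored little-endian, as in the BMP file format):

/* Assemble the little-endian DWORD at FIELD_OFF byte by byte;
   this works regardless of host endianness or alignment. */
DWORD offset_bits = (DWORD)pheader[FIELD_OFF]
                  | ((DWORD)pheader[FIELD_OFF + 1] << 8)
                  | ((DWORD)pheader[FIELD_OFF + 2] << 16)
                  | ((DWORD)pheader[FIELD_OFF + 3] << 24);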

Handling xor with different key sizes and endianness

I am playing around with xor decoding via a small C file, and am running into issues with endianness... I am a bit stuck on how to work around them. This is really the first time I've played this deeply with bitwise operations in C.
If I use a one-byte xor key and pick up several xor-encoded values into a uint8_t pointer, my basic code works fine. Walk each byte, xor it against the key, and store the result in a decoded byte array/buffer and then print it back to the console.
However, if I try a two-byte xor key, then endianness starts to get in the way. I currently put the key into a uint32_t, because I don't plan on dealing with xor keys greater than 32 bits. On a little-endian system, an xor key of 0xc39f gets stored as 0x9fc3. The bytes to be decoded are big-endian if I play them back one byte at a time, but they, too, get flipped to little-endian if I try to play them back two bytes at a time (the same size as the xor key).
I am tempted to #include <byteswap.h> and then call bswap_32(). But while this will work on little-endian, it might have the opposite effect on big-endian. I assume I'd then need ugly #ifdefs to use bswap_32() only on little-endian archs. I figure there has got to be a more portable way to do this.
Random sample string:
g e n e r a t e
67 65 6e 65 72 61 74 65
Xor 0xc39f
a4 fa ad fa b1 fe b7 fa
If I play back the xor-encoded buffer with two-byte (uint16_t) pointers, I get this (via a basic printf):
0xfaa4 0xfaad 0xfeb1 0xfab7
And with four-byte pointers (uint32_t):
0xfaadfaa4 0xfab7feb1
I would expect for the above, to get instead for two-byte pointers:
0xa4fa 0xadfa 0xb1fe 0xb7fa
And four-byte pointers:
0xa4faadfa 0xb1feb7fa
Thoughts?
Edit: Any takers? Current answers aren't adequate to my needs.
You're overthinking this—just treat your xor key as an endianless binary blob, and convert it to a native uint32_t for performance:
void xor_encrypt_slow(uint8_t *data, size_t len, uint8_t key[4])
{
    // key is a 4-byte xor key
    size_t i;
    for (i = 0; i < len; i++)
        data[i] ^= key[i % 4];
}

void xor_encrypt_fast(uint8_t *data, size_t len, uint8_t key[4])
{
    // Convert key to a 32-bit value (memcpy avoids a misaligned read)
    uint32_t key32;
    memcpy(&key32, key, sizeof(key32));
    // This assumes that data is aligned on a 4-byte boundary; if not, adjust
    // accordingly
    size_t i;
    for (i = 0; i + 3 < len; i += 4)
        *(uint32_t *)(data + i) ^= key32;
    // Handle the remainder, if len is not a multiple of 4
    for ( ; i < len; i++)
        data[i] ^= key[i % 4];
}
Try using the htonl() function, which is designed exactly for this purpose. It stands for "host to network long" and is defined to swap (or not swap) bytes to make the resulting value big-endian, as required before transmitting it over the network.
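A minimal usage sketch (htonl() is declared in <arpa/inet.h> on POSIX systems):

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  // htonl() on POSIX systems

int main(void)
{
    uint32_t key = 0xc39f;
    uint32_t key_be = htonl(key);  // big-endian byte order on every host
    // Viewed as raw bytes, key_be is always 00 00 c3 9f
    const uint8_t *b = (const uint8_t *)&key_be;
    printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
    return 0;
}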
