Modeling hardware registers with out of order data fields - c

I have the following memory structure:
struct {
uint16_t MSB_VALUE : 8;
uint16_t : 8;
uint16_t LSB_VALUE;
} BIG_VALUE;
This structure, all together, represents a 32-bit section of memory that is fixed by hardware. The value of BIG_VALUE can be represented using Verilog concatenation notation thus:
BIG_VALUE = { MSB_VALUE[7:0], LSB_VALUE[15:0] }
I would like to be able to write a union (or something) such that I can access the value of BIG_VALUE using dot notation. Maybe something stupid like this:
union {
uint32_t val;
struct {
uint16_t MSB_VALUE : 8;
uint16_t : 8;
uint16_t LSB_VALUE;
} sub;
} BIG_VALUE;
But, the issue is that the MSB comes before the LSB in memory (with an 8-bit gap too), and so calling BIG_VALUE.val isn't going to get the hoped-for value.
I have a vague idea of something to try, but I'm just confusing myself. Is there a way to do this within the union/struct formalism, or should I give up now? Giving up, I guess, means having to manually split up the 24-bit value and then to store those into the appropriate fields. Maybe I could write a function to do that later, if it makes sense.
Having this work means that I could store a 24-bit value using dot notation and have the data go into the appropriate locations in memory. For example:
BIG_VALUE.val = 0x0031FFFE
Then
BIG_VALUE.MSB_VALUE == 0x31
and
BIG_VALUE.LSB_VALUE == 0xFFFE
But the memory layout would be
addr : 0x0031
addr + 2 : 0xFFFE
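One way to get the "write a function" fallback mentioned above is a pair of accessors that split and rejoin the 24-bit value by hand. This is only a sketch under assumptions: the names big_value_regs, big_value_write and big_value_read are made up for illustration, the register is modeled as two plain 16-bit halfwords rather than bit fields (bit-field layout is implementation-defined), and MSB_VALUE is assumed to sit in the low 8 bits of the first halfword, as the 0x0031 in the layout above suggests.

```c
#include <stdint.h>

/* Hypothetical register image: two 16-bit halfwords, as in the question.
   Assumption: MSB_VALUE lives in bits 7:0 of `hi`; bits 15:8 are unused. */
typedef struct {
    uint16_t hi;  /* MSB_VALUE in the low byte, upper byte is the gap */
    uint16_t lo;  /* LSB_VALUE */
} big_value_regs;

/* Split a 24-bit value across the two halfwords. */
static inline void big_value_write(volatile big_value_regs *r, uint32_t val)
{
    r->hi = (uint16_t)((val >> 16) & 0xFFu);
    r->lo = (uint16_t)(val & 0xFFFFu);
}

/* Rejoin the two halfwords into { MSB_VALUE[7:0], LSB_VALUE[15:0] }. */
static inline uint32_t big_value_read(const volatile big_value_regs *r)
{
    return ((uint32_t)(r->hi & 0xFFu) << 16) | r->lo;
}
```

With this, big_value_write(&regs, 0x0031FFFE) leaves 0x0031 in the first halfword and 0xFFFE in the second, regardless of host byte order, at the cost of losing the plain dot-notation assignment.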

Related

Why does this union organize data in a different way than expected?

In the following code, I have a union with a struct and a uint32 (I am trying to convert from big-endian to little endian). I store a hex value in each of the bytes, and then I print the uint32 equivalent. I thought that structs order values in memory in the way that they are declared, but the output of this code is 0x302010 instead of the 0x102030 I expected. Does anyone know why this is happening?
typedef union raw {
struct {
uint8_t LSB;
uint8_t MID;
uint8_t MSB;
};
uint32_t raw;
} raw;
int main()
{
raw myraw;
myraw.LSB = 0x10;
myraw.MID = 0x20;
myraw.MSB = 0x30;
printf("%x", myraw.raw);
return 0;
}
The names you gave the struct members are actually a hint as to what happened.
Your machine apparently used little endian byte ordering. This means that multibyte types store the least significant byte first. So the value you store in the LSB member is the least significant byte of the raw field.
However, note that not all 4 bytes of the raw field have corresponding fields in the inner struct. That means that not all bytes of raw have been set, meaning that the value of the field as a whole is indeterminate.
You should add an additional field to the inner struct to match the bytes in raw. You should also reverse the order of the fields if you want to switch between big endian and little endian.
typedef union raw {
struct {
uint8_t MSB;
uint8_t MID1;
uint8_t MID2;
uint8_t LSB;
};
uint32_t raw;
} raw;
I thought that structs order values in memory in the way that they are declared.
They are. Your problem is almost certainly that you're on a little-endian architecture, where the most significant byte is stored at the highest memory address.
So, the fact that you have them ordered LSB, MID, MSB, is actually the same order they would have in the uint32_t (except the MSB isn't the most significant byte of your uint32_t, just of the 24-bit value you're using).
If you want to reverse a big-endian value 0x00102030 with this method, you're going to need something like:
typedef union raw {
struct {
uint8_t MSB;
uint8_t INR1;
uint8_t INR2;
uint8_t LSB;
};
uint32_t raw;
} raw;
myraw.MSB = 0x00;
myraw.INR1 = 0x10;
myraw.INR2 = 0x20;
myraw.LSB = 0x30;
Just keep in mind it's not always a good idea to read from a union field that wasn't the last one written to. C11 has this to say on the matter:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
If you want something that's more portable, you can use a function along the lines of:
uint32_t reverseEndian(uint32_t input)
{
return (input & 0x000000ffU) << 24 |
(input & 0x0000ff00U) << 8 |
(input & 0x00ff0000U) >> 8 |
(input & 0xff000000U) >> 24;
}
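In the same spirit, the value can be built from individual bytes with shifts instead of a union, so the result does not depend on the host byte order. A minimal sketch, assuming the three bytes arrive most significant first; the name be24_to_u32 is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: combine three bytes, most significant first,
   into one value. Shifts operate on values, not on memory, so the
   result is the same on little- and big-endian hosts. */
static uint32_t be24_to_u32(const uint8_t bytes[3])
{
    return ((uint32_t)bytes[0] << 16) |
           ((uint32_t)bytes[1] << 8)  |
            (uint32_t)bytes[2];
}
```

Passing the bytes 0x10, 0x20, 0x30 in that order gives 0x102030, which is the value the question expected.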

Bit ordering in a byte when using bitfields

"C: A Reference Manual" states that "The precise manner in which components (and especially bit fields) are packed into a structure is implementation dependent but is predictable for each implementation".
I read that some compilers pack bit fields left to right (MSB to LSB) on big-endian machines, but right to left (LSB to MSB) on little-endian machines.
Is there a reason for, or an advantage to, representing bit fields in two different ways depending on the endianness?
I've not implemented this, but I can imagine that it has to do with working with bit fields in registers, and reading/writing entire words to/from the structure when possible. If you implement it that way, instead of doing byte-level accesses, you will of course "feel" the endianness as the word is byte-swapped in memory.
So if you have
struct color {
uint32_t red : 8;
uint32_t green : 8;
uint32_t blue : 8;
uint32_t alpha : 8;
};
When you do
struct color orange = { .red = 255, .green = 127, .blue = 0, .alpha = 0 };
It might be implemented (since the fields are conveniently sized) as
struct color orange;
uint32_t *tmp = (uint32_t *) &orange;
*tmp = 0xff7f0000; /* The field values, mapping red to the MSBs. */
Now, since the above does one single uint32_t-sized memory write, the value will be byte-swapped on a little-endian machine but not on a big-endian one, i.e. when viewed byte by byte, the representations are different.
Layout of bit fields inside a structure is implementation defined. It is not a good idea to use them if you need portable code.
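If portability matters, the usual alternative is to keep the value in a plain uint32_t and pack the fields with explicit shifts and masks. A rough sketch, assuming red should land in the most significant byte as in the example above; pack_rgba and rgba_green are hypothetical names, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helpers; red occupies the most significant byte.
   The field positions are fixed by the shifts, not by the compiler. */
static uint32_t pack_rgba(uint8_t red, uint8_t green, uint8_t blue, uint8_t alpha)
{
    return ((uint32_t)red << 24) | ((uint32_t)green << 16) |
           ((uint32_t)blue << 8) | (uint32_t)alpha;
}

/* Extract the green field again (bits 23:16). */
static uint8_t rgba_green(uint32_t packed)
{
    return (uint8_t)((packed >> 16) & 0xFFu);
}
```

Here pack_rgba(255, 127, 0, 0) yields 0xff7f0000 on every implementation; how that word is then laid out in memory still depends on the endianness of the machine.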

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is pre defined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for(int i = 3; i >=0; i--) {
*(mem+mipsaddr+i) = value;
value = value >> 8;
// in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr+1];
uint8_t test2 = mem[mipsaddr+2];
uint8_t test3 = mem[mipsaddr+3];
uint32_t test4 = (mem[mipsaddr]<<24) + (mem[mipsaddr+1]<<16) +
(mem[mipsaddr+2]<<8) + mem[mipsaddr+3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo= 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, the order in which the bytes are stored depends on the architecture (that is, the CPU). Either the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations and then merge them to form the original value again, you need to consider the endianness of your particular system.
In your for-loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32 bit (four byte) access. Depending on your architecture, this will either become 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
It seems to be a problem of endianness, which shows up because you cast a (uint8_t *) to a (uint32_t *) and read all four bytes in one access.
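For a simulator of a big-endian target, one option is to route every word access through a pair of helpers that fix the byte order explicitly. A sketch, assuming the simulated memory is meant to be big-endian (as the store loop in the question implies); store_be32 and load_be32 are made-up names.

```c
#include <stdint.h>

/* Hypothetical helper: write a word most significant byte first. */
static void store_be32(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}

/* Hypothetical helper: read the word back in the same order. The casts
   to uint32_t keep the << 24 shift out of signed int arithmetic. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because both helpers touch memory one byte at a time, they give the same answer on any host, and they also avoid the alignment and aliasing questions that the pointer cast raises.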

How do I unpack bits from a structure's stream_data in c code?

Ex.
typedef struct
{
bool streamValid;
dword dateTime;
dword timeStamp;
stream_data[800];
} RadioDataA;
Ex. Where stream_data[800] contains:
Variable        Length (in bits)
packetID        8
packetL         8
versionMajor    4
versionMinor    4
radioID         8
etc.
I need to write:
void unpackData(radioDataA *streamData, MA_DataA *maData)
{
//unpack streamData (from above) & put some of the data into maData
//How do I read in bits of data? I know it's by groups of 8 but I don't understand how.
//MAData is also a struct.
}
I'm not sure I understood it right, but why can't you just do:
memcpy(maData, streamData->stream_data, sizeof(MA_DataA));
This will fully copy data contained in the array of bytes to the structure.
Your types are inconsistent or unspecified. I believe you are trying to extract packed data from a byte stream. If so, assume buf contains your data packed in order with the lengths specified. The following code should then extract each field correctly:
int packetID = buf[0];
int packetL = buf[1];
int versionMajor = (buf[2] >> 4);
int versionMinor = (buf[2] & 0x0F);
int radioID = buf[3];
As you can see, the byte-aligned values are straightforward copies. However, the 4-bit fields must be masked and/or shifted to extract only the desired data. For more information on bitwise operations refer to the excellent Bit Twiddling Hacks code snippets.
I'm just trying to unpack data and output it. I'm just stuck on how to work with bits and keeping an index and determining how to truncate it into my different variables.
The stream_data[800] is of type byte. Sorry!!
I don't think memcpy will work because it's not a 1:1 direct transfer.
hope you get what I mean!
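For the "keeping an index" part, one common approach is a small bit reader that tracks a bit offset into the byte array and hands back the next N bits. This is a sketch only: read_bits is a hypothetical helper, and it assumes the fields are packed most significant bit first, which matches the buf[2] >> 4 extraction of versionMajor above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read `count` bits (1..32), most significant bit
   first, starting at bit offset *bitpos, and advance *bitpos past them.
   The caller must make sure the buffer is long enough. */
static uint32_t read_bits(const uint8_t *buf, size_t *bitpos, unsigned count)
{
    uint32_t value = 0;
    while (count-- > 0) {
        size_t byte = *bitpos / 8;                      /* which byte    */
        unsigned shift = 7u - (unsigned)(*bitpos % 8);  /* which bit     */
        value = (value << 1) | ((buf[byte] >> shift) & 1u);
        (*bitpos)++;
    }
    return value;
}
```

Starting with a bit position of 0, the fields would then be read in table order: read_bits(stream, &pos, 8) for packetID, 8 more for packetL, then 4, 4 and 8 for versionMajor, versionMinor and radioID.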

Safe, efficient way to access unaligned data in a network packet from C

I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full length was used, but it's not).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1- or 3- or 5-byte <data> section then the next 16-bit fieldID ends up not on a 16-bit boundary and I anticipate alignment issues. It's been a while since I've done anything like this from scratch so I'm a little unsure of the details. Any feedback welcome. Thanks.
To avoid alignment issues in this case, access all data as an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
uint16_t unaligned_uint16( void* p)
{
// this assumes big-endian values in data stream
// (which is common, but not universal in network
// communications) - this may or may not be
// appropriate in your case
unsigned char* pByte = (unsigned char*) p;
uint16_t val = (pByte[0] << 8) | pByte[1];
return val;
}
The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian (network) byte order in the packet
uint16_t length = (packet[2] << 8) | packet[3];
uint8_t *data = packet + 4;
packet += 4 + length;
If your processor supports it, you can type-pun or use a union (but beware of strict aliasing).
uint16_t fieldID = ntohs(*(uint16_t *)packet);
uint16_t length = ntohs(*(uint16_t *)(packet + 2));
Note that unaligned accesses aren't always supported (e.g. they might generate a fault of some sort); on other architectures they're supported, but there's a performance penalty.
If the packet isn't aligned, you could always copy it into a static buffer and then read it:
static char static_buffer[65540];
memcpy(static_buffer, packet, packet_size); // make sure packet_size <= 65540
uint16_t fieldId = ntohs(*(uint16_t *)static_buffer);
uint16_t length = ntohs(*(uint16_t *)(static_buffer + 2));
Personally, I'd just go for option #1, since it'll be the most portable.
Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endian-ness, you can memcpy from the 'real' byte pointer into whatever you want/need that is properly aligned and you will be fine.
(this works because the generated code will load/store the data as bytes, which is alignment safe. It's when the generated assembly has instructions loading and storing 16/32/64 bits of memory in a mis-aligned manner that it all falls apart).
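Putting the byte-pointer approach together, a walk over the tagged fields might look roughly like this. It is a sketch under assumptions: fieldID and length are taken to be big-endian on the wire (the question does not say), and tlv_field, read_be16 and tlv_next are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one <fieldID><length><data> entry. */
typedef struct {
    uint16_t id;
    uint16_t length;
    const uint8_t *data;  /* points into the packet, not a copy */
} tlv_field;

/* Byte-wise read, so the pointer may be at any alignment.
   Assumption: big-endian values in the data stream. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Parse one field at p. Returns the number of bytes consumed,
   or 0 if the remaining bytes cannot hold a complete field. */
static size_t tlv_next(const uint8_t *p, size_t remaining, tlv_field *out)
{
    if (remaining < 4)
        return 0;
    out->id = read_be16(p);
    out->length = read_be16(p + 2);
    if (remaining - 4 < out->length)
        return 0;
    out->data = p + 4;
    return 4u + (size_t)out->length;
}
```

A caller would loop, advancing the pointer by the returned count until it gets 0. A field that follows an odd-length data section starts at an odd address, which is harmless here because every access is a single byte.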

Resources