structure memory layout in multithreaded code

The following code is multithreaded and runs for thread id = 0 and id = 1 simultaneously.
typedef struct
{
    unsigned char pixels[4];
} FourPixels;
main()
{
    FourPixels spixels[];
    //copy on spixels
    spixels[id] = gpixels[id];
    //example : remove blue component
    spixels[id].pixels[0] &= 0xFC;
    spixels[id].pixels[1] &= 0xFC;
    spixels[id].pixels[2] &= 0xFC;
    spixels[id].pixels[3] &= 0xFC;
}
We see that thread id = 0 fetches 4 chars, and thread id = 1 fetches another set of 4 chars.
I want to know how the structures spixels[0] and spixels[1] are laid out in memory. Is it something like this?
spixels[0]                              spixels[1]
pixels[0] pixels[1] pixels[2] pixels[3] pixels[0] pixels[1] pixels[2] pixels[3]
2000      2001      2002      2003      2004      2005      2006      2007
The question is: are spixels[0] and spixels[1] guaranteed to be placed contiguously, as shown above?

Yes, they will be laid out contiguously as you say. Now, probably someone will come and say that it is not guaranteed on all platforms, because the alignment of the struct could be more than its size, so you could have a gap between the two struct "bodies" due to implicit padding after the first one. But no matter, because the alignment on any sane compiler and platform will be just 1 byte (as in char).
If I were writing code that relied on this, I'd add a compile-time assertion that the size of two of those structs should be exactly 8 bytes, and then I'd be 100% confident.
Edit: here's an example of how a compile-time check might work:
struct check {
char floor[sizeof(FourPixels[2]) - 8];
char ceiling[8 - sizeof(FourPixels[2])];
};
The idea is that if the size is not 8, one of the arrays will have negative size. If it is 8, they'll both have zero size. Note that this is a compiler extension (GCC supports zero-length arrays for example), so you may want to look for a better way. I'm more of a C++ person, and we have fancier tricks for this (in C++11 it's built in: static_assert()).
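In C11 and later, the same check can be sketched with the standard static assertion, which avoids relying on the zero-length array extension. This is only a sketch: FourPixels is the struct from the question, while four_pixels_stride is a hypothetical helper added here to show the runtime view of the same fact.

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stddef.h>

typedef struct {
    unsigned char pixels[4];
} FourPixels;

/* Compilation stops here if the compiler adds any padding to the struct. */
static_assert(sizeof(FourPixels) == 4, "FourPixels must be exactly 4 bytes");
static_assert(sizeof(FourPixels[2]) == 8, "two FourPixels must occupy 8 bytes");

/* Hypothetical helper: byte distance between two consecutive array elements. */
static size_t four_pixels_stride(void)
{
    FourPixels a[2];
    return (size_t)((unsigned char *)&a[1] - (unsigned char *)&a[0]);
}
```

If both assertions compile, spixels[1] starts exactly 4 bytes after spixels[0], which is the layout drawn in the question.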

An array is guaranteed by the standard to be contiguous. It's also guaranteed that the first entry will be on a low address in memory, and the next will be on a higher, etc.
In the case of the structure's pixels array, pixels[1] will always come directly after pixels[0], and the same holds for the following entries.

Yes, arrays are placed in contiguous memory locations.
This is what makes pointer arithmetic possible.

Related

C - Big-endian struct interconvert with little-endian struct

I have two structs with the same data members: one is a big-endian struct, the other little-endian. I need to convert between them, but my code contains a lot of repetition with only small changes. How can I make it more elegant and avoid the repeated code? (By repeated code I mean the near-identical branches for mode == 1 and mode == 2, which differ only in the direction of the assignment. It doesn't look elegant, but it works.)
Here is my code:
#pragma scalar_storage_order big-endian
typedef struct {
int a1;
short a2;
char a3;
int a4;
} test_B;
#pragma scalar_storage_order default
typedef struct {
int a1;
short a2;
char a3;
int a4;
} test_L;
void interconvert(test_L *little, test_B *big, int mode) {
    // if mode == 1 , convert little to big
    // if mode == 2 , convert big to little
    // it may be difficult and redundant when the struct has lots of data member!
    if(mode == 1) {
        big->a1 = little->a1;
        big->a2 = little->a2;
        big->a3 = little->a3;
        big->a4 = little->a4;
    }
    else if(mode == 2) {
        little->a1 = big->a1;
        little->a2 = big->a2;
        little->a3 = big->a3;
        little->a4 = big->a4;
    }
    else return;
}
Note: The above code must run on gcc-7 or higher, because of the #pragma scalar_storage_order

An answer was posted which suggested using memcpy for this problem, but that answer has been deleted. Actually that answer was right, if used correctly, and I want to explain why.
The #pragma specified by the OP is central, as the OP points out:
Note: the above code must run on gcc-7 or higher because of the #pragma scalar_storage_order
The struct from the OP:
#pragma scalar_storage_order big-endian
typedef struct {
int a1;
short a2;
char a3;
int a4;
} test_B;
means that the statement "test_B.a2=256" writes the bytes 1 and 0, in that order, into the two consecutive bytes belonging to the a2 member. This is big-endian. The similar statement "test_L.a2=256" would instead store the bytes 0 and 1 (little-endian).
The following memcpy:
memcpy(&test_L, &test_B, sizeof test_L)
would make the bytes for test_L.a2 equal to 1 and 0, because that is the RAM content of test_B.a2. But now, reading test_L.a2 in little-endian mode, those two bytes mean 1. We wrote 256 and read back 1. This is exactly the wanted conversion.
To use this mechanism correctly, it is sufficient to write into one struct, memcpy() it to the other, and read the other member by member. What was big-endian becomes little-endian and vice versa. Of course, if the intention is to process the data and perform calculations on it, it is important to know the endianness of the data: if it matches the default mode, no transformation has to be done before the calculations, but the transformation has to be applied afterwards. Conversely, if the incoming data does not match the "default endianness" of the processor, it must be transformed first.
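The byte-level effect described above (write 256 as big-endian, read the same bytes back as little-endian, get 1) can be checked without the GCC pragma. The following is a minimal portable sketch, not the OP's code: store_be16, load_le16 and reinterpret_be_as_le16 are hypothetical helper names, and the explicit shifts stand in for what the pragma does on member access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into two bytes, most significant byte first (big-endian). */
static void store_be16(unsigned char out[2], uint16_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xFF);
}

/* Interpret two bytes with the least significant byte first (little-endian). */
static uint16_t load_le16(const unsigned char in[2])
{
    assert(in != NULL);
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* Store as big-endian, then reinterpret the very same bytes as little-endian. */
static uint16_t reinterpret_be_as_le16(uint16_t v)
{
    unsigned char raw[2];
    store_be16(raw, v);
    return load_le16(raw);
}
```

Calling reinterpret_be_as_le16(256) yields 1: the bytes in memory stay the same, and only their interpretation changes.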
EDIT
After the OP's comment below, I investigated further and took a look at https://gcc.gnu.org/onlinedocs/gcc/Structure-Layout-Pragmas.html
There are three #pragma settings available to choose the byte layout: big-endian, little-endian, and default. One of the first two is equal to the last: if the target machine is little-endian, default means little-endian; if it is big-endian, default means big-endian. This is only logical.
So a memcpy() between big-endian and default does nothing on a big-endian machine, which is also logical. I should stress that memcpy() does absolutely nothing per se: it only moves data from a RAM area treated in one manner to another area treated in another manner. The two areas are treated differently only when a normal member access is performed: that is where #pragma scalar_storage_order comes into play. And as I wrote before, it is important to know the endianness of the data entering the program. If it comes from a TCP network, for example, we know it is big-endian; more generally, if it is taken from outside the program and follows a protocol, we should know its endianness.
To convert from one endianness to the other, one should use little-endian and big-endian, NOT default, because default is certainly equal to one of the former two.
Still another edit
Prompted by the comments, and by Jamesdlin who used an online compiler, I tried it too. At this URL, http://tpcg.io/lLe5EW,
there is a demonstration that assigning to a member of one struct, doing a memcpy to the other, and reading that member back performs the endian conversion. That's all.

Endianness macro in C

I recently saw this post about endianness macros in C and I can't really wrap my head around the first answer.
Code supporting arbitrary byte orders, ready to be put into a file called order32.h:
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
O32_LITTLE_ENDIAN = 0x03020100ul,
O32_BIG_ENDIAN = 0x00010203ul,
O32_PDP_ENDIAN = 0x01000302ul
};
static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
{ { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)
#endif
You would check for little endian systems via
O32_HOST_ORDER == O32_LITTLE_ENDIAN
I do understand endianness in general. This is how I understand the code:
Create example of little, middle and big endianness.
Compare test case to examples of little, middle and big endianness and decide what type the host machine is of.
What I don't understand are the following aspects:
Why is a union needed to store the test case? Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed? And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?
Why the check for CHAR_BIT? One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide? Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?

Why is a union needed to store the test case?
The entire point of the test is to alias the array with the magic value the array will create.
Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed?
Well, more or less. If uint32_t exists it is exactly 32 bits wide, but nothing guarantees that those 32 bits span exactly four chars; that holds only when CHAR_BIT is 8. It would fail only on some really fringe architecture you will never encounter.
And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?
The inner brace is for the array.
Why the check for CHAR_BIT?
Because that's the actual guarantee. If that doesn't blow up, everything will work.
One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide?
Because in fact it always is, these days.
Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?
Lots of other choices would work also.

The initialization has two sets of braces because the inner braces initialize the bytes array. So bytes[0] is 0, bytes[1] is 1, etc.
The union allows a uint32_t to lie on the same bytes as the char array and be interpreted in whatever the machine's endianness is. So if the machine is little endian, 0 is in the low order byte and 3 is in the high order byte of value. Conversely, if the machine is big endian, 0 is in the high order byte and 3 is in the low order byte of value.

{{0, 1, 2, 3}} is the initializer for the union, which will result in bytes component being filled with [0, 1, 2, 3].
Now, since the bytes array and the uint32_t occupy the same space, you can read the same value as a native 32-bit integer. The value of that integer shows how the array was shuffled, which tells you which endianness your system uses.
There are only 3 popular possibilities here - O32_LITTLE_ENDIAN, O32_BIG_ENDIAN, and O32_PDP_ENDIAN.
As for the char / uint8_t question, I don't know. I think it makes more sense to just use uint8_t with no checks.
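As a rough illustration of the union read described in these answers, the sketch below wraps the probe in a function and maps the result to a name. It is not the original order32.h: the ORDER_* macros, host_order and host_order_name are hypothetical names, and the static_assert mirrors the header's CHAR_BIT check under the assumption of a C11 compiler.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

static_assert(CHAR_BIT == 8, "this probe assumes 8-bit chars");

#define ORDER_LITTLE 0x03020100u
#define ORDER_BIG    0x00010203u
#define ORDER_PDP    0x01000302u

/* Fill the four bytes with 0,1,2,3 and read them back as one 32-bit value. */
static uint32_t host_order(void)
{
    /* The inner braces initialize the bytes array, the first union member. */
    const union { unsigned char bytes[4]; uint32_t value; } probe = { { 0, 1, 2, 3 } };
    return probe.value;
}

/* Map the probed value to a human-readable name. */
static const char *host_order_name(void)
{
    switch (host_order()) {
    case ORDER_LITTLE: return "little-endian";
    case ORDER_BIG:    return "big-endian";
    case ORDER_PDP:    return "PDP-endian";
    default:           return "unknown";
    }
}
```

On a little-endian machine byte 0 lands in the low-order position, so host_order() equals ORDER_LITTLE; on a big-endian machine it equals ORDER_BIG.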

C socket sending multiple fields message with binary protocol

How do I construct a request message from a given message specification and then send it to the server through a C socket? A binary protocol is used for client-server communication. Are the following approaches correct?
Given message specification:
Field         Format  Length  Values
------------  ------  ------  -----------
requesID      Uint16  2       20
requestNum    Uint16  2       100
requestTitle  String  10      data string
/************** approach 1 ****************/
typedef unsigned short uint16;
typedef struct {
uint16 requesID [2];
uint16 requestNum [2];
unsigned char requestTitle [10];
}requesMsg;
…
requesMsg rqMsg;
memcpy(rqMsg.requesID, "\x0\x14", 2); //20
memcpy(rqMsg.requestNum, "\x0\x64", 2); //100
memcpy(rqMsg.requestTitle, "title01 ", 10);
…
send(sockfd, &rqMsg, sizeof(rqMsg), 0);
/************** approach 2 ****************/
unsigned char rqMsg[14];
memset(rqMsg, 0, 14);
memcpy(rqMsg, "\x0\x14", 2);
memcpy(rqMsg+2, "\x0\x64", 2);
memcpy(rqMsg+4, "title01 ", 10);
…
send(sock, &rqMsg, sizeof(rqMsg), 0);

I'm afraid you are misunderstanding something: The length column appears to tell you the length in bytes, so if you receive a uint16 you receive 2 bytes.
Your first approach could lead to serious problems through data structure alignment. If I were in your shoes, I'd prefer the second approach and fill the bytes into a byte array myself.
A general note about filling fields here: It's pointless to use memcpy for "native" fields like uint16. It might work, but it is simply a waste of runtime. You can fill in the fields of a struct by simply assigning them a value, like rqMsg.requesID = 20;
Another issue is the byte order, or endianness, of your binary protocol.
As a whole package, I'd implement a "serializeRequest" function that takes the fields of your struct and converts them into a byte array according to the protocol.
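A serialization function of that kind might look roughly like the sketch below. Several things here are assumptions rather than part of the question: the protocol is taken to be big-endian (matching the "\x0\x14" bytes the OP writes), the title is taken to be space padded, and RequestMsg, put_u16_be, set_title and serialize_request are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REQUEST_TITLE_LEN = 10, REQUEST_MSG_LEN = 2 + 2 + REQUEST_TITLE_LEN };
static_assert(REQUEST_MSG_LEN == 14, "wire size from the message specification");

typedef struct {
    uint16_t requestID;
    uint16_t requestNum;
    char     requestTitle[REQUEST_TITLE_LEN]; /* space padded, no NUL terminator */
} RequestMsg;

/* Write a 16-bit value most significant byte first (assumed protocol order). */
static void put_u16_be(unsigned char *out, uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);
    out[1] = (unsigned char)(v & 0xFF);
}

/* Copy at most REQUEST_TITLE_LEN characters and pad the rest with spaces. */
static void set_title(RequestMsg *msg, const char *title)
{
    size_t n = strlen(title);
    if (n > REQUEST_TITLE_LEN)
        n = REQUEST_TITLE_LEN;
    memset(msg->requestTitle, ' ', REQUEST_TITLE_LEN);
    memcpy(msg->requestTitle, title, n);
}

/* Lay the fields out byte by byte, so struct padding never reaches the wire. */
static size_t serialize_request(const RequestMsg *msg,
                                unsigned char out[REQUEST_MSG_LEN])
{
    put_u16_be(out, msg->requestID);
    put_u16_be(out + 2, msg->requestNum);
    memcpy(out + 4, msg->requestTitle, REQUEST_TITLE_LEN);
    return REQUEST_MSG_LEN;
}
```

The resulting buffer can then be passed to send() together with the returned length, independent of how the compiler pads or orders the bytes of RequestMsg in memory.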

Both of them are at least partially correct, but I much prefer the first one because it allows quick and natural data manipulation and access, and leaves less room for errors than manual indexing. As a bonus, you can even copy and assign structure values as a whole, and in C it works as expected.
But for any outgoing data you should make sure to use a "packed" struct. Not only will it reduce the amount of data transmitted to the same size as the array-based implementation, it will also make sure that the field alignments are the same in all the programs involved. For most C compilers I have tried (GCC included) this can be done with the __attribute__((__packed__)) attribute, but other compilers require different attributes or even a different keyword.
Endianness control may also be required if your application is going to run on different architectures (ARM clients vs. an x86_64 server is a major example). I just use some simple macros like these to preprocess each field individually before doing any calculations or data output:
/* uint16_t and uint32_t come from <stdint.h>; the argument is parenthesized and cast to an unsigned type */
#define BYTE_SWAP16(num) ( (uint16_t)((((uint16_t)(num) & 0xFFu) << 8) | (((uint16_t)(num) >> 8) & 0xFFu)) )
#define BYTE_SWAP32(num) ( (((uint32_t)(num) >> 24) & 0xFFu) | (((uint32_t)(num) << 8) & 0xFF0000u) | (((uint32_t)(num) >> 8) & 0xFF00u) | (((uint32_t)(num) << 24) & 0xFF000000u) )
But you can use different approaches like BCD encoding, separate decoding functions or something else.
Also notice that uint16_t is already a 2-byte value. You probably don't need two of them to store your single values.

C: Memcpy vs Shifting: Whats more efficient?

I have a byte array containing 16- and 32-bit data samples, and to convert them to Int16 and Int32 I currently just do a memcpy of 2 (or 4) bytes.
Because memcpy probably isn't optimized for lengths of just two bytes, I was wondering whether it would be more efficient to convert the bytes to an Int32 using integer arithmetic (or a union).
I would like to know how the efficiency of calling memcpy compares with bit shifting, because the code runs on an embedded platform.

I would say that memcpy is not the way to do this. However, finding the best way depends heavily on how your data is stored in memory.
To start with, you don't want to take the address of your destination variable. If it is a local variable, you will force it to the stack rather than giving the compiler the option to place it in a processor register. This alone could be very expensive.
The most general solution is to read the data byte by byte and arithmetically combine the result. For example:
uint16_t res = ( (((uint16_t)char_array[high]) << 8)
| char_array[low]);
The expression in the 32-bit case is a bit more complex, as you have more alternatives. You might want to check the assembler output to see which is best.
Alt 1: Build pairs, and combine them:
uint16_t low16 = ... as example above ...;
uint16_t high16 = ... as example above ...;
uint32_t res = ( (((uint32_t)high16) << 16)
| low16);
Alt 2: Shift in 8 bits at a time:
uint32_t res = char_array[i0];
res = (res << 8) | char_array[i1];
res = (res << 8) | char_array[i2];
res = (res << 8) | char_array[i3];
All examples above are neutral to the endianness of the processor used, as the index values decide which part to read.
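Wrapped into small functions, the byte-by-byte approach above could be sketched as follows. The function names are hypothetical, and the index order is fixed per function (least significant byte first for the _le variants, most significant first for _be) rather than passed in as in the answer's snippets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine 2 bytes, least significant byte first. */
static uint16_t load_u16_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Combine 4 bytes, least significant byte first. */
static uint32_t load_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Combine 4 bytes, most significant byte first, shifting in 8 bits at a time. */
static uint32_t load_u32_be(const unsigned char *p)
{
    uint32_t res;
    assert(p != NULL);
    res = p[0];
    res = (res << 8) | p[1];
    res = (res << 8) | p[2];
    res = (res << 8) | p[3];
    return res;
}
```

Because the result depends only on the byte indices, the same source gives the same values on little-endian and big-endian processors, and the pointer does not need to be aligned.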
The next kind of solution is possible if 1) the endianness (byte order) of the device matches the order in which the bytes are stored in the array, and 2) the array is known to be placed at an aligned memory address. The latter requirement depends on the machine, but you are safe if the char array holding 16-bit values starts on an even address, and in the 32-bit case on an address divisible by four. In this case you could simply read the address, after some pointer tricks:
uint16_t res = *(uint16_t *)&char_array[xxx];
where xxx is the array index corresponding to the first byte in memory. Note that this might not be the same as the index of the lowest byte.
I would strongly suggest the first class of solutions, as it is endianness-neutral.
Anyway, both of them are way faster than your memcpy solution.

memcpy is not valid for "shifting" (moving data by an offset shorter than its length within the same array); attempting to use it for such invokes very dangerous undefined behavior. See http://lwn.net/Articles/414467/
You must either use memmove or your own shifting loop. For sizes above about 64 bytes, I would expect memmove to be a lot faster. For extremely short shifts, your own loop may win. Note that memmove has more overhead than memcpy because it has to determine which direction of copying is safe. Your own loop already knows (presumably) which direction is safe, so it can avoid an extra runtime check.
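For the overlapping case described here, a minimal sketch using memmove might look like this. shift_left is a hypothetical helper name, and the choice to leave the trailing bytes untouched is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift the contents of buf left by n bytes.
 * Source and destination overlap, so memmove is required instead of memcpy.
 * The last n bytes keep their old values. */
static void shift_left(unsigned char *buf, size_t len, size_t n)
{
    assert(buf != NULL && n <= len);
    memmove(buf, buf + n, len - n);
}
```

Shifting the six bytes "abcdef" left by two gives "cdefef": the first four bytes are moved, and the final two are left as they were.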

Pointer Dereferencing = Program Crash

unsigned int *pMessageLength, MessageLength;
char *pszParsePos;
...
//DATA into pszParsePos
...
printf("\nMessage Length\nb1: %d\nb2: %d\nb3: %d\nb4: %d\n",
pszParsePos[1],pszParsePos[2],pszParsePos[3],pszParsePos[4]);
pMessageLength= (unsigned int *)&pszParsePos[1];
MessageLength = *((unsigned int *)&pszParsePos[1]);
//Program Dies
Output:
Message Length
b1: 0
b2: 0
b3: 0
b4: 1
I don't understand why this is crashing my program. Could someone explain it, or at least suggest an alternative method that won't crash?
Thanks for your time!

A bus error means that you're trying to access data with incorrect alignment. Specifically, it seems the processor requires int to be aligned more strictly than at an arbitrary address, and if pszParsePos itself is aligned, say, on an int boundary (which depends on how you initialize it, but will happen, e.g., if you use malloc), then &pszParsePos[1] certainly isn't.
One way to fix this would be constructing MessageLength explicitly, i.e., something like
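MessageLength = ((unsigned int)(unsigned char)pszParsePos[1] << 24) | ((unsigned int)(unsigned char)pszParsePos[2] << 16) | ((unsigned int)(unsigned char)pszParsePos[3] << 8) | (unsigned int)(unsigned char)pszParsePos[4];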
(or the other way around if it's supposed to be little-endian). If you really want to type-pun, make sure that the pointer you're accessing is properly aligned.
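If the value in the buffer is already in the host's byte order, another option that avoids the misaligned access is to copy the bytes into a properly aligned local variable. The sketch below assumes exactly that; load_u32_unaligned is a hypothetical name, and for data with a fixed wire byte order the shift-based construction above still applies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an arbitrarily aligned address.
 * memcpy copies byte by byte, so p needs no particular alignment.
 * The result is in host byte order. */
static uint32_t load_u32_unaligned(const char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

In the question's terms, load_u32_unaligned(&pszParsePos[1]) would replace the cast-and-dereference that triggers the bus error.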

Here's what I think is going wrong:
You added in a comment that you are running on the Blackfin processor. I looked this up on some web sites, and they say that the Blackfin requires what are called aligned accesses. That is, if you are reading or writing a 32-bit value to/from memory, then the physical address must be a multiple of 4 bytes.
Arrays in C are indexed beginning with [0], not [1]. A 4-byte array of char ends with element [3].
In your code, you have a 4-byte array of char which:
You treat as though it began at index 1.
You convert via pointer casts to a DWORD via 32-bit memory fetch.
I suspect your 4-char array is aligned to a 4-byte boundary, but as you begin your memory access at position +1 byte, you get a bus error from the misaligned access.
