Pointer casting problem with struct array member - c

I've run across this source in a legacy code base and I don't really know why exactly it behaves the way it does.
In the following code, the pData struct member contains either the data or a pointer to the real data in shared memory. The message is sent using IPC (msgsnd() and msgrcv()). With the pointer casts (currently commented out), it fails using GCC 4.4.1 on an ARM target: the member uLen gets modified. When using memcpy(), everything works as expected. I can't really see what is wrong with the pointer casting. What is wrong here?
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
unsigned char pData[8000];
} message_t;
// changing the pointer in the struct
{
unsigned char *pData = <some_pointer>;
#if 0
*((unsigned int *)pMessage->pData) = (unsigned int)pData;
#else
memcpy(pMessage->pData, &pData, sizeof(unsigned int));
#endif
}
// getting the pointer out
{
#if 0
unsigned char *pData = (unsigned char *)(*((unsigned int *)pMessage->pData));
#else
unsigned char *pData;
memcpy(&pData, pMessage->pData, sizeof(int));
#endif
}

I suspect it's an alignment problem and either GCC or the processor is trying to compensate. The structure is defined as:
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
unsigned char pData[8000];
} message_t;
Assuming normal alignment restrictions and a 32-bit processor, the offsets of each field are:
mtype 0 (alignment 4)
uRespQueue 4 (alignment 2)
uID 6 (alignment 2)
uLen 8 (alignment 2)
pData 10 (alignment 1)
On all but the most recent versions of the ARM architecture, memory accesses must be aligned, and with the cast:
*((unsigned int *)pMessage->pData) = (unsigned int)pData;
you are attempting to write a 32-bit value to a misaligned address. To correct the alignment, the processor appears to have truncated the LSBs of the address. The resulting write happened to overlap the uLen field, causing the problem.
To be able to handle this correctly, you need to make sure that you write the value to a properly aligned address. Either offset the pointer to align it or make sure pData is aligned to be able to handle 32-bit data. I would redefine the structure to align the pData member for 32-bit access.
typedef struct {
long mtype;
unsigned short uRespQueue;
unsigned short uID;
unsigned short uLen;
union { /* this will add 2-bytes of padding */
unsigned char *pData;
unsigned char rgData[8000];
};
} message_t;
The structure should still occupy the same amount of bytes since it has a 4-byte alignment due to the mtype field.
Then you should be able to access the pointer:
unsigned char *pData = ...;
/* setting the pointer */
pMessage->pData = pData;
/* getting the pointer */
pData = pMessage->pData;
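The layout claims above can be checked on a given toolchain with a small sketch like the following. It assumes a C11 compiler (anonymous unions, _Alignof); the helper names are made up for illustration and are not part of the original code base.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* The redefined message from the answer (anonymous union needs C11). */
typedef struct {
    long mtype;
    unsigned short uRespQueue;
    unsigned short uID;
    unsigned short uLen;
    union {
        unsigned char *pData;
        unsigned char rgData[8000];
    };
} message_t;

/* Hypothetical helper: returns 1 if the pointer member starts at an offset
   that satisfies the alignment requirement of a pointer on this platform. */
static int pdata_is_pointer_aligned(void)
{
    return offsetof(message_t, pData) % _Alignof(unsigned char *) == 0;
}

/* Hypothetical helper: stores a pointer through the union member, reads it
   back, and reports whether the neighbouring uLen kept its value. */
static int ulen_survives_pointer_store(unsigned char *p)
{
    message_t msg;
    assert(p != NULL);
    msg.uLen = 1234;
    msg.pData = p;
    return msg.uLen == 1234 && msg.pData == p;
}
```

Because the compiler places the union itself, the check should hold on any conforming implementation; the sketch only makes that visible.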

That is a very nasty thing to do (the part that's compiled out). You're basically trying to hack the code: instead of copying the data into the message (into the 8000 bytes provided for it), you put a pointer there and pass it through IPC.
The main issue is sharing memory between processes. Who knows what happens to that pointer after you send it? Who knows what happens to the data it points to? It's a very bad habit to send out a pointer to data that is not under your control (i.e. not protected or properly shared).
Another thing that might happen, and is probably what you're actually asking about, is alignment. The array is made of chars and the previous member in the struct is a short, so the compiler may place them right next to each other. Recasting char[] to int * means that you take a memory area and represent it as something else, without telling the compiler. You're stomping over uLen with the cast.
memcpy is the proper way to do it.
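A minimal sketch of the memcpy approach, written so that the copy size follows the pointer type instead of assuming sizeof(unsigned int) equals the size of a pointer (the original code relies on that, and it does not hold on typical 64-bit targets). The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    long mtype;
    unsigned short uRespQueue;
    unsigned short uID;
    unsigned short uLen;
    unsigned char pData[8000];
} message_t;

/* Hypothetical helper: copy the pointer value byte by byte into the payload.
   memcpy has no alignment requirement, so the odd offset of pData is harmless. */
static void message_put_ptr(message_t *m, unsigned char *p)
{
    assert(m != NULL);
    memcpy(m->pData, &p, sizeof p);
}

/* Hypothetical helper: read the pointer value back out of the payload. */
static unsigned char *message_get_ptr(const message_t *m)
{
    unsigned char *p;
    assert(m != NULL);
    memcpy(&p, m->pData, sizeof p);
    return p;
}
```

The pointer value that comes out is only meaningful to a receiver that has the same memory mapped at the same address, which is the concern raised above.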

The point here is the line "int header = (*((int*)(txUserPtr) - 4));".
An illustration of UserTypes and struct pointer casting would be of great help!
typedef union UserTypes
{
AUser AUser;
BUser BUser;
CUser CUser;
DUser DUser;
} UserTypes;
typedef struct AUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} AUser;
typedef struct BUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} BUser;
typedef struct CUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} CUser;
typedef struct DUser
{
int userId;
int dbIndex;
ChannelType ChanType;
} DUser;
//this is the function I want to test
void Fun(UserTypes * txUserPtr)
{
int header = (*((int*)(txUserPtr) - 4));
//the problem is here
//how should i set incoming pointer "txUserPtr" so that
//Fun() would skip following lines.
// I don't want to execute error()
if((header & 0xFF000000) != (int)0xAA000000)
{
error("sth error\n");
}
/*the following is the rest */
}
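One detail worth spelling out: in the expression (int*)(txUserPtr) - 4 the subtraction is done in units of int, so Fun() reads the int located four ints (not four bytes) before the object. A test could therefore hand Fun() a pointer into a larger buffer whose first int carries the 0xAA marker. The sketch below shows only that idea; the names header_is_valid and make_valid_user and the buffer sizes are assumptions, and a plain int array stands in for the real UserTypes object.

```c
#include <assert.h>
#include <stddef.h>

enum { HEADER_INTS = 4, USER_INTS = 3 };  /* assumed sizes for the sketch */

/* Same test as in Fun(): reads the int located four ints before user. */
static int header_is_valid(const void *user)
{
    int header;
    assert(user != NULL);
    header = *((const int *)user - 4);
    return (header & 0xFF000000) == (int)0xAA000000;
}

/* Hypothetical fixture: storage holds four header ints followed by the
   user data. Returns the address that would be passed to Fun(). */
static void *make_valid_user(int storage[HEADER_INTS + USER_INTS])
{
    storage[0] = (int)0xAA000000u;  /* top byte 0xAA; lower bytes are ignored */
    return &storage[HEADER_INTS];   /* storage[0] sits exactly 4 ints before */
}
```

Whether the real UserTypes object may legitimately live in such a buffer depends on how the production code allocates it, which the question does not show.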

Related

Is this valid C and valid eBPF: can I save the address of any object in a long, which is 8 bytes in size?

This is my code:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
struct mystruct{
int i;
int j;
long k;
long l;
char str[11];
};
int main()
{
struct mystruct obj;
obj.i=5;obj.j=55;obj.k=6;obj.k=1000001;obj.l=2000007;memcpy(obj.str,"hello",sizeof("hello"));
long addr=(long)((uint8_t *)&obj);
struct mystruct *myobj=(struct mystruct *)(addr);
printf("%d %d %ld %ld %s\n",myobj->i,myobj->j,myobj->k,myobj->l,myobj->str); /* %ld is the conversion for long */
printf("%zu=%zu\n",sizeof(long),sizeof(myobj));
return 0;
}
So I would like to know: can I save the address of any object (struct, union, etc.) or of any variable (int, long, char) in a long variable? The code shows that the size of a struct pointer is the same as the size of long. Is my code above OK?
Also, does the eBPF verifier allow this?
Also, if I have a map with values of type long:
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, __u32);
__type(value, long);
__uint(max_entries, 2);
} hash_map1 SEC(".maps");
can I extract a struct object like the following from the long map value?
struct hash_elem {
int cnt;
struct bpf_spin_lock lock;
};
Is this possible in eBPF/XDP?
long addr=(long)((uint8_t *)&obj);
struct mystruct *myobj=(struct mystruct *)(addr);
I don't see a reason why the BPF verifier should reject this; you are just casting into and from long.
This will be compiled into a 64-bit BPF register anyway and type information will be lost (unless using BTF). The verifier will then infer a type for its own purpose and will correctly recognize this as a pointer to the stack (PTR_TO_STACK).
can I extract a struct object like the following from the long map value?
No, that wouldn't make sense. If you store a pointer to the stack in the map, then when your program exits, the pointer may be pointing to invalid memory.
Why not simple:
uintptr_t addr=(uintptr_t)&obj;
struct mystruct *myobj=(struct mystruct *)(addr);
The uintptr_t type is guaranteed to be able to hold any pointer to void converted to an integer, so that converting back yields the original pointer, and it is widely used in BPF code.
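As a plain user-space illustration (it says nothing about what the BPF verifier accepts), the round trip through uintptr_t could look like the sketch below. The helper names address_of and object_at are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mystruct {
    int i;
    int j;
    long k;
    long l;
    char str[11];
};

/* Store an object's address in an integer type wide enough to hold it. */
static uintptr_t address_of(struct mystruct *obj)
{
    assert(obj != NULL);
    return (uintptr_t)(void *)obj;
}

/* Recover the pointer. The result is usable only while the object
   it refers to is still alive. */
static struct mystruct *object_at(uintptr_t addr)
{
    return (struct mystruct *)(void *)addr;
}
```

Going through void * in both directions matches the guarantee the standard gives for uintptr_t.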

Struct member alignment -- different sizeof using 16-bit and 32-bit compiler

I have a structure used to construct messages to a control board. I need to maintain software compatibility between a C167 16-bit Keil compiler and a 32-bit Tricore gcc compiler.
typedef struct
{
unsigned char new_weld_status[2];
UINT32 new_weld_count;
UINT16 new_weld_fail_count;
} NEW_PULSE_DATA;
The array new_weld_status[2] takes up 2 bytes on the 16-bit compiler but 4 bytes on the 32-bit compiler. I was thinking of replacing every new_weld_status[2] with a union when compiling with gcc. But is there a gcc switch that makes the chars fit/align in 2 bytes?
Thanks
Note that your structure layout creates the problem on a 32-bit system. Many (most) 32-bit CPU architectures require 4-byte alignment for 32-bit words, thus the new_weld_count requires 'padding' to provide proper memory alignment.
typedef struct
{
unsigned char new_weld_status[2]; //a
//char padding_1[2]; //hidden padding
UINT32 new_weld_count; //a
UINT16 new_weld_fail_count; //a
} NEW_PULSE_DATA;
The following redefinition of your structure completely avoids the problem.
typedef struct
{
UINT32 new_weld_count; //a
UINT16 new_weld_fail_count; //a
unsigned char new_weld_status[2]; //a
} NEW_PULSE_DATA;
NEW_PULSE_DATA ex_PULSE_DATA;
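To make sure both compilers really agree on the reordered layout, the expected offsets can be pinned down at compile time. The sketch below assumes C11 (static_assert) and assumes that UINT32 and UINT16 are exact-width 32-bit and 16-bit types; an older compiler would need a different compile-time check.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: UINT32 and UINT16 are exact-width types. */
typedef uint32_t UINT32;
typedef uint16_t UINT16;

typedef struct {
    UINT32 new_weld_count;
    UINT16 new_weld_fail_count;
    unsigned char new_weld_status[2];
} NEW_PULSE_DATA;

/* The build fails if a compiler lays the struct out differently. */
static_assert(offsetof(NEW_PULSE_DATA, new_weld_count) == 0, "count offset");
static_assert(offsetof(NEW_PULSE_DATA, new_weld_fail_count) == 4, "fail count offset");
static_assert(offsetof(NEW_PULSE_DATA, new_weld_status) == 6, "status offset");
static_assert(sizeof(NEW_PULSE_DATA) == 8, "total size");

/* Hypothetical helper so the size can also be inspected at run time. */
static size_t pulse_data_size(void)
{
    return sizeof(NEW_PULSE_DATA);
}
```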
However, the above approach is not the one typically used to transport struct(ured) data across networks or other message transports. A more common and much better approach is to use a serialization/deserialization layer (aka marshalling) to place the structures into 'over the wire' formats. Your current approach conflates in-memory storage and addressing with the communication format.
//you need to decide on the size of wire format data,
//Both ends of the protocol must agree on these sizes,
#define new_weld_count_SZ sizeof(ex_PULSE_DATA.new_weld_count)
#define new_weld_fail_count_SZ sizeof(ex_PULSE_DATA.new_weld_fail_count)
#define new_weld_status_SZ sizeof(ex_PULSE_DATA.new_weld_status)
//Then you define a network/message format
typedef struct
{
byte new_weld_count[new_weld_count_SZ];
byte new_weld_fail_count[new_weld_fail_count_SZ];
byte new_weld_status[new_weld_status_SZ];
} MESSAGE_FORMAT_PULSE_DATA;
Then you would implement serialization & deserialization functions on both ends of the transport. The following example is simplistic, but conveys the gist of what you need.
byte*
PULSE_DATA_serialize( MESSAGE_FORMAT_PULSE_DATA* msg, NEW_PULSE_DATA* data )
{
//copy each field from the in-memory struct into its wire-format slot
memcpy(&(msg->new_weld_count), &(data->new_weld_count), new_weld_count_SZ);
memcpy(&(msg->new_weld_fail_count), &(data->new_weld_fail_count), new_weld_fail_count_SZ);
memcpy(&(msg->new_weld_status), &(data->new_weld_status), new_weld_status_SZ);
return (byte*)msg;
}
NEW_PULSE_DATA*
PULSE_DATA_deserialize( NEW_PULSE_DATA* data, MESSAGE_FORMAT_PULSE_DATA* msg )
{
//copy each wire-format slot back into the in-memory struct
memcpy(&(data->new_weld_count), &(msg->new_weld_count), new_weld_count_SZ);
memcpy(&(data->new_weld_fail_count), &(msg->new_weld_fail_count), new_weld_fail_count_SZ);
memcpy(&(data->new_weld_status), &(msg->new_weld_status), new_weld_status_SZ);
return data;
}
Note that I have omitted the obligatory network byte order conversions, because I assume you have already worked out your byte order issues between the two CPU domains. If you have not considered byte order (big-endian vs. little-endian), then you need to address that issue as well.
When you send a message, the sender does the following,
//you need this declared & assigned somewhere
NEW_PULSE_DATA data;
//You need space for your message
MESSAGE_FORMAT_PULSE_DATA msg;
result = send(PULSE_DATA_serialize( &msg, &data ));
When you receive a message, the recipient does the following,
//recipient needs this declared somewhere
NEW_PULSE_DATA data;
//Need buffer to store received data
MESSAGE_FORMAT_PULSE_DATA msg;
result = receive(&msg,sizeof(msg));
//appropriate receipt checking here...
PULSE_DATA_deserialize( &data, &msg );
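Put together, the sender and receiver steps amount to a round trip like the one sketched below. This is an illustration under assumptions, not the answer's exact code: the wire field sizes are written out as 4, 2 and 2 bytes, UINT32/UINT16 are taken to be exact-width types, byte is taken to be unsigned char, and no byte order conversion is done, so it is only valid between machines of the same endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t UINT32;      /* assumption: exact-width typedefs */
typedef uint16_t UINT16;
typedef unsigned char byte;   /* assumption: byte is unsigned char */

typedef struct {
    unsigned char new_weld_status[2];
    UINT32 new_weld_count;
    UINT16 new_weld_fail_count;
} NEW_PULSE_DATA;

/* Wire format: only byte arrays, with sizes both ends agree on. */
typedef struct {
    byte new_weld_count[4];
    byte new_weld_fail_count[2];
    byte new_weld_status[2];
} MESSAGE_FORMAT_PULSE_DATA;

static byte *PULSE_DATA_serialize(MESSAGE_FORMAT_PULSE_DATA *msg, const NEW_PULSE_DATA *data)
{
    assert(msg != NULL && data != NULL);
    memcpy(msg->new_weld_count, &data->new_weld_count, sizeof msg->new_weld_count);
    memcpy(msg->new_weld_fail_count, &data->new_weld_fail_count, sizeof msg->new_weld_fail_count);
    memcpy(msg->new_weld_status, data->new_weld_status, sizeof msg->new_weld_status);
    return (byte *)msg;
}

static NEW_PULSE_DATA *PULSE_DATA_deserialize(NEW_PULSE_DATA *data, const MESSAGE_FORMAT_PULSE_DATA *msg)
{
    assert(msg != NULL && data != NULL);
    memcpy(&data->new_weld_count, msg->new_weld_count, sizeof msg->new_weld_count);
    memcpy(&data->new_weld_fail_count, msg->new_weld_fail_count, sizeof msg->new_weld_fail_count);
    memcpy(data->new_weld_status, msg->new_weld_status, sizeof msg->new_weld_status);
    return data;
}
```

The in-memory struct may contain padding on the 32-bit side, but the padding never reaches the wire because each field is copied individually.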
A union wouldn't change the members' alignment inside a struct. What you are interested in is padding. The compiler may insert any number of bytes/bits between struct members to satisfy alignment requirements. On gcc-compatible compilers you may use __attribute__((__packed__)), as Acorn already pointed out, but this does not take care of endianness. The most compatible version between platforms (including platforms with different alignment and different endianness) would be to use (sadly!) get/set functions that look like this:
typedef struct {
unsigned char data[2+4+2];
} NEW_PULSE_DATA;
unsigned char NEW_PULSE_DATA_get_new_weld_status(NEW_PULSE_DATA *t, size_t idx) {
return t->data[idx];
}
void NEW_PULSE_DATA_set_new_weld_status(NEW_PULSE_DATA *t, size_t idx, unsigned char value) {
t->data[idx] = value;
}
UINT32 NEW_PULSE_DATA_get_new_weld_count(NEW_PULSE_DATA *t) {
return (UINT32)t->data[2]<<24
| (UINT32)t->data[3]<<16
| (UINT32)t->data[4]<<8
| (UINT32)t->data[5];
}
void NEW_PULSE_DATA_set_new_weld_count(NEW_PULSE_DATA *t, UINT32 val) {
t->data[2] = val>>24;
t->data[3] = val>>16;
t->data[4] = val>>8;
t->data[5] = val;
}
UINT16 NEW_PULSE_DATA_get_new_weld_fail_count(NEW_PULSE_DATA *t) {
return (UINT16)t->data[6]<<8
| (UINT16)t->data[7];
}
void NEW_PULSE_DATA_set_new_weld_fail_count(NEW_PULSE_DATA *t, UINT16 val) {
t->data[6] = val>>8;
t->data[7] = val;
}
This is the only "good" way of being 100% sure that NEW_PULSE_DATA looks exactly the same on different platforms (at least on platforms with the same number of bits per char, i.e. the same CHAR_BIT value). However, sizeof(NEW_PULSE_DATA) may still differ between platforms, because the compiler may insert padding at the end of the struct (after the last member). So you may want to change the NEW_PULSE_DATA type to be just an array of bytes:
typedef unsigned char NEW_PULSE_DATA[2+4+2];
unsigned char NEW_PULSE_DATA_get_new_weld_status(NEW_PULSE_DATA t, size_t idx) {
return t[idx];
}
void NEW_PULSE_DATA_set_new_weld_status(NEW_PULSE_DATA t, size_t idx, unsigned char value) {
t[idx] = value;
}
UINT32 NEW_PULSE_DATA_get_new_weld_count(NEW_PULSE_DATA t) {
return (UINT32)t[2]<<24
| (UINT32)t[3]<<16
| (UINT32)t[4]<<8
| (UINT32)t[5];
}
void NEW_PULSE_DATA_set_new_weld_count(NEW_PULSE_DATA t, UINT32 val) {
t[2] = val>>24;
t[3] = val>>16;
t[4] = val>>8;
t[5] = val;
}
UINT16 NEW_PULSE_DATA_get_new_weld_fail_count(NEW_PULSE_DATA t) {
return (UINT16)t[6]<<8
| (UINT16)t[7];
}
void NEW_PULSE_DATA_set_new_weld_fail_count(NEW_PULSE_DATA t, UINT16 val)
{
t[6] = val>>8;
t[7] = val;
}
For gcc and other compilers, you can use __attribute__((packed)):
This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required.
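A short sketch of what the attribute does to the structure from the question. It relies on the GCC/Clang __attribute__((packed)) extension and assumes UINT32/UINT16 are exact-width types; the type and helper names are made up so that both layouts can sit side by side.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint32_t UINT32;   /* assumption: exact-width typedefs */
typedef uint16_t UINT16;

/* Natural layout: the compiler may pad after new_weld_status. */
typedef struct {
    unsigned char new_weld_status[2];
    UINT32 new_weld_count;
    UINT16 new_weld_fail_count;
} PULSE_DATA_NATURAL;

/* Packed layout (GCC/Clang extension): members are placed back to back. */
typedef struct __attribute__((packed)) {
    unsigned char new_weld_status[2];
    UINT32 new_weld_count;
    UINT16 new_weld_fail_count;
} PULSE_DATA_PACKED;

static_assert(offsetof(PULSE_DATA_PACKED, new_weld_count) == 2, "no padding before count");
static_assert(sizeof(PULSE_DATA_PACKED) == 8, "2 + 4 + 2 bytes");

/* Hypothetical helpers for comparing the two layouts at run time. */
static size_t natural_size(void) { return sizeof(PULSE_DATA_NATURAL); }
static size_t packed_size(void)  { return sizeof(PULSE_DATA_PACKED); }
```

Members of a packed struct may be misaligned, so they are best accessed through the struct itself; taking the address of new_weld_count and dereferencing it elsewhere can fault on strict-alignment targets.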

Casting structures to buffers in embedded code

It's sometimes necessary to cast a data structure into a pointer so that the data can be sent, for example, over an interface, or written out to some other stream. In these cases, I usually do something like this:
typedef struct {
int field1;
char field2;
} testStruct;
int main()
{
char *buf;
testStruct test;
buf = (char *)&test;
// write(buf, sizeof(test)) or whatever you need to do
return 0;
}
Recently in some microprocessor code, however, I saw something similar to this:
typedef struct {
int field1;
char field2;
} testStruct;
int main()
{
char buf[5];
testStruct test;
*(testStruct *)buf = test;
// write(buf, sizeof(test)) or whatever you need to do
return 0;
}
To me, the former feels a little more safe. You just have one pointer, and you assign the address of the structure to the pointer.
In the latter case, it seems like if you allocate the wrong size to the array buf by accident, you'll end up with undefined behavior, or a segfault.
With optimizations on, I get a -Wstrict-aliasing warning from gcc. However, again, this code runs on a microprocessor, so is there something I might be missing there?
There's no pointers in the structures, or anything, it's very straight forward.
(testStruct *)buf may generate a mis-aligned address for a testStruct leading to a bus fault. Do not use.
A union is better. It helps cope with aliasing issues as well as alignment ones.
Also see @Steve Summit's good comment.
Consider a master type like testStruct_all.
typedef struct { // OP's structure
int field1;
char field2;
} testStruct1;
typedef struct { // Perhaps another structure to send
double field1;
char field2;
} testStruct2;
// A union of all possible structures used in this app
typedef union {
testStruct1 tS1;
testStruct2 tS2;
char buf[1];
} testStruct_all;
int main(void) {
testStruct_all ux;
foo(&ux.tS1); // populate ux.tSn of choice.
write(ux.buf, sizeof ux.tS1);
read(ux.buf, sizeof ux.tS1);
// the union ensures alignment and avoids aliasing issues
bar(&ux.tS1);
return 0;
}
write() usually accepts a void * (@user58697), so code could drop the buf member and use:
write(&ux, sizeof ux.tS1); // or whatever you need to do
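Another option, not taken from the answers above, is to copy the object into a byte buffer with memcpy instead of casting the buffer to a struct pointer. memcpy has no alignment requirement and does not run into strict aliasing, and sizing the buffer from the type avoids the hard-coded 5. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int field1;
    char field2;
} testStruct;

/* Copy the object representation into a byte buffer sized from the type. */
static void testStruct_to_bytes(unsigned char out[sizeof(testStruct)], const testStruct *in)
{
    assert(out != NULL && in != NULL);
    memcpy(out, in, sizeof *in);
}

/* Rebuild the object from a buffer produced by testStruct_to_bytes(). */
static void testStruct_from_bytes(testStruct *out, const unsigned char in[sizeof(testStruct)])
{
    assert(out != NULL && in != NULL);
    memcpy(out, in, sizeof *out);
}
```

The bytes include any padding and use the host's byte order, so this is suitable only when both ends share the same compiler layout.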

using memcpy for structs

I have a problem when using memcpy on a struct.
Consider the following struct
struct HEADER
{
unsigned int preamble;
unsigned char length;
unsigned char control;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
If I use memcpy to copy data from a receive buffer to this struct the copy is OK, but if i redeclare the struct to the following :
struct HEADER
{
unsigned int preamble;
unsigned char length;
struct CONTROL control;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
struct CONTROL
{
unsigned dir : 1;
unsigned prm : 1;
unsigned fcb : 1;
unsigned fcb : 1;
unsigned function_code : 4;
};
Now if I use the same memcpy code as before, the first two variables (preamble and length) are copied OK. The control is totally messed up, and the last three variables are shifted up by one, i.e. crc = 0, source = crc, destination = source...
Anyone got any good suggestions for me?
Do you know that the format in the receive buffer is correct, when you add the control in the middle?
Anyway, your problem is that bitfields are the wrong tool here: you can't depend on the layout in memory being anything in particular, least of all the exact same one you've chosen for the serialized form.
It's almost never a good idea to try to directly copy structures to/from external storage; you need proper serialization. The compiler can add padding and alignment between the fields of a structure, and using bitfields makes it even worse. Don't do this.
Implement proper serialization/deserialization functions:
unsigned char * header_serialize(unsigned char *put, const struct HEADER *h);
unsigned char * header_deserialize(unsigned char *get, struct HEADER *h);
These go through the structure and read/write as many bytes as you feel are needed (possibly for each field):
static unsigned char * uint32_serialize(unsigned char *put, uint32_t x)
{
*put++ = (x >> 24) & 255;
*put++ = (x >> 16) & 255;
*put++ = (x >> 8) & 255;
*put++ = x & 255;
return put;
}
unsigned char * header_serialize(unsigned char *put, const struct HEADER *h)
{
const uint8_t ctrl_serialized = (h->control.dir << 7) |
(h->control.prm << 6) |
(h->control.fcb << 5) |
(h->control.function_code);
put = uint32_serialize(put, h->preamble);
*put++ = h->length;
*put++ = ctrl_serialized;
put = uint32_serialize(put, h->destination);
put = uint32_serialize(put, h->source);
put = uint32_serialize(put, h->crc);
return put;
}
Note how this needs to be explicit about the endianness of the serialized data, which is something you always should care about (I used big-endian). It also explicitly builds a single uint8_t version of the control fields, assuming the struct version was used.
Also note that there's a typo in your CONTROL declaration; fcb occurs twice.
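The matching deserializer is only declared above, so here is one way it might look. It is a sketch under assumptions: the byte layout mirrors header_serialize (big-endian, dir in bit 7, prm in bit 6, fcb in bit 5, function code in the low four bits), the duplicated fcb member is left out of struct CONTROL, and the pointers are const-qualified, which differs slightly from the declared prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct CONTROL {
    unsigned dir : 1;
    unsigned prm : 1;
    unsigned fcb : 1;
    unsigned function_code : 4;
};

struct HEADER {
    unsigned int preamble;
    unsigned char length;
    struct CONTROL control;
    unsigned int destination;
    unsigned int source;
    unsigned int crc;
};

/* Read one big-endian 32-bit value and return the advanced read pointer. */
static const unsigned char *uint32_deserialize(const unsigned char *get, uint32_t *x)
{
    *x = ((uint32_t)get[0] << 24) | ((uint32_t)get[1] << 16) |
         ((uint32_t)get[2] << 8)  |  (uint32_t)get[3];
    return get + 4;
}

/* Consumes 18 bytes: 4 + 1 + 1 + 4 + 4 + 4. */
static const unsigned char *header_deserialize(const unsigned char *get, struct HEADER *h)
{
    uint32_t v;
    uint8_t ctrl;
    assert(get != NULL && h != NULL);
    get = uint32_deserialize(get, &v); h->preamble = v;
    h->length = *get++;
    ctrl = *get++;
    h->control.dir = (ctrl >> 7) & 1u;
    h->control.prm = (ctrl >> 6) & 1u;
    h->control.fcb = (ctrl >> 5) & 1u;
    h->control.function_code = ctrl & 0x0Fu;
    get = uint32_deserialize(get, &v); h->destination = v;
    get = uint32_deserialize(get, &v); h->source = v;
    get = uint32_deserialize(get, &v); h->crc = v;
    return get;
}
```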
Using struct CONTROL control; instead of unsigned char control; leads to a different alignment inside the struct and so filling it with memcpy() produces a different result.
Memcpy copies the values of bytes from the location pointed by source directly to the memory block pointed by destination.
The underlying type of the objects pointed by both the source and destination pointers are irrelevant for this function; The result is a binary copy of the data.
So if there is any structure padding then you will have messed up results.
Check sizeof(struct CONTROL) -- I think it would be 2 or 4 depending on the machine. Since you are using unsigned bitfields (and unsigned is shorthand of unsigned int), the whole structure (struct CONTROL) would take at least the size of unsigned int -- i.e. 2 or 4 bytes.
And using unsigned char control takes 1 byte for this field. So there will definitely be a mismatch starting with the control variable.
Try rewriting struct CONTROL as below:
struct CONTROL
{
unsigned char dir : 1;
unsigned char prm : 1;
unsigned char fcb : 1;
unsigned char fcb : 1; /* duplicate name carried over from the question; rename before compiling */
unsigned char function_code : 4;
};
The clean way would be to use a union, like this:
struct HEADER
{
unsigned int preamble;
unsigned char length;
union {
unsigned char all;
struct CONTROL control;
} uni;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
The user of the struct can then choose the way he wants to access the thing.
struct HEADER thing = {... };
if (thing.uni.control.dir) { ...}
or
#if ( !FULL_MOON ) /* Update: stacking of bits within a word appears to depend on the phase of the moon */
if (thing.uni.all & 1) { ... }
#else
if (thing.uni.all & 0x80) { ... }
#endif
Note: this construct does not solve endianness issues; those will need explicit conversions.
Note 2: you'll have to check the bit-endianness of your compiler, too.
Also note that bitfields are not very useful, especially if the data goes over the wire and the code is expected to run on different platforms with different alignment and/or endianness. Plain unsigned char or uint8_t plus some bitmasking yields much cleaner code. For example, check the IP stack in the BSD or Linux kernels.
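A minimal sketch of the bitmasking style for the control byte. The bit positions are an assumption taken from the serialization example earlier in this thread (dir = bit 7, prm = bit 6, fcb = bit 5, function code = bits 0..3); the macro and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit assignments within the control byte. */
#define CTRL_DIR       0x80u
#define CTRL_PRM       0x40u
#define CTRL_FCB       0x20u
#define CTRL_FUNC_MASK 0x0Fu

/* Build a control byte from its parts. */
static uint8_t control_make(int dir, int prm, int fcb, unsigned function_code)
{
    assert(function_code <= CTRL_FUNC_MASK);
    return (uint8_t)((dir ? CTRL_DIR : 0u) |
                     (prm ? CTRL_PRM : 0u) |
                     (fcb ? CTRL_FCB : 0u) |
                     (function_code & CTRL_FUNC_MASK));
}

/* Test a single flag such as CTRL_DIR. */
static int control_has(uint8_t control, uint8_t flag)
{
    return (control & flag) != 0;
}

/* Extract the 4-bit function code. */
static unsigned control_function_code(uint8_t control)
{
    return control & CTRL_FUNC_MASK;
}
```

Since the whole field is a single uint8_t, its position in the byte stream and in memory no longer depends on how a compiler packs bitfields.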

Dereferencing pointer to array of void

I am attempting to learn more about C and its arcane hidden powers, and I attempted to make a sample struct containing a pointer to a void, intended to use as array.
EDIT: Important note: This is for raw C code.
Let's say I have this struct.
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
void* data;
} mystruct;
I want data to hold max elements of either unsigned chars, unsigned short ints, or unsigned long ints; the data_t enum contains values for those 3 cases.
enum data_t {gi8, gi16, gi32}; //For 8, 16 and 32 bit uints.
Then I have this function that initializes and allocates one of these structs, and is supposed to return a pointer to the new struct.
mystruct* new(unsigned char foo, unsigned int bar, long value) {
mystruct* new;
new = malloc(sizeof(mystruct)); //Allocate space for the struct.
assert(new != NULL);
new->foo = foo;
new->max = bar;
int i;
switch(type){
case gi8: default:
new->data = (unsigned char *)calloc(new->max, sizeof(unsigned char));
assert(new->data != NULL);
for(i = 0; i < new->max; i++){
*((unsigned char*)new->data + i) = (unsigned char)value;
//Can I do anything with the format new->data[n]? I can't seem
//to use the [] shortcut to point to members in this case!
}
break;
}
return new;
}
The compiler returns no warnings, but I am not too sure about this method. Is it a legitimate way to use pointers?
Is there a better way?
I forgot to show the call; it looks like: mystruct* P; P = new(0,50,1024);
Unions are interesting but not what I wanted. Since I will have to handle every specific case individually anyway, casting seems as good as a union. I specifically wanted to have much larger 8-bit arrays than 32-bit arrays, so a union doesn't seem to help. For that I'd make it just an array of longs :P
No, you cannot dereference a void* pointer, it is forbidden by the C language standard. You have to cast it to a concrete pointer type before doing so.
As an alternative, depending on your needs, you can also use a union in your structure instead of a void*:
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
union {
unsigned char *uc;
unsigned short *us;
unsigned int *ui;
} data;
} mystruct;
At any given time, only one of data.uc, data.us, or data.ui is valid, as they all occupy the same space in memory. Then, you can use the appropriate member to get at your data array without having to cast from void*.
What about
typedef struct mystruct
{
unsigned char foo;
unsigned int max;
enum data_t type;
union
{
unsigned char *chars;
unsigned short *shortints;
unsigned long *longints;
};
} mystruct;
That way, there is no need to cast at all. Just use data_t to determine which of the pointers you want to access.
Is type supposed to be an argument to the function? (Don't name this function or any variable new or any C++ programmer who tries to use it will hunt you down)
If you want to use array indices, you can use a temporary pointer like this:
unsigned char *cdata = (unsigned char *)new->data;
cdata[i] = value;
I don't really see a problem with your approach. If you expect a particular size (which I think you do given the name gi8 etc.) I would suggest including stdint.h and using the typedefs uint8_t, uint16_t, and uint32_t.
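Combining the suggestions above (pass type as an argument, use the stdint.h typedefs, index through a typed temporary pointer), the constructor might be sketched as follows. The names mystruct_new and mystruct_free are hypothetical, and returning NULL on failure instead of asserting is a design choice of this sketch, not something the question prescribes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum data_t { gi8, gi16, gi32 };   /* 8, 16 and 32 bit unsigned elements */

typedef struct mystruct {
    unsigned char foo;
    unsigned int max;
    enum data_t type;
    void *data;
} mystruct;

/* Allocates a struct plus max elements of the requested width, each set
   to value (truncated to the element type). Returns NULL on failure. */
static mystruct *mystruct_new(unsigned char foo, unsigned int max,
                              enum data_t type, long value)
{
    mystruct *s;
    unsigned int i;
    assert(max > 0);               /* precondition of this sketch */
    s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->foo = foo; s->max = max; s->type = type; s->data = NULL;
    switch (type) {
    case gi8: {
        uint8_t *d = calloc(max, sizeof *d);   /* typed pointer allows d[i] */
        for (i = 0; d != NULL && i < max; i++) d[i] = (uint8_t)value;
        s->data = d;
        break;
    }
    case gi16: {
        uint16_t *d = calloc(max, sizeof *d);
        for (i = 0; d != NULL && i < max; i++) d[i] = (uint16_t)value;
        s->data = d;
        break;
    }
    case gi32: {
        uint32_t *d = calloc(max, sizeof *d);
        for (i = 0; d != NULL && i < max; i++) d[i] = (uint32_t)value;
        s->data = d;
        break;
    }
    }
    if (s->data == NULL) { free(s); return NULL; }
    return s;
}

static void mystruct_free(mystruct *s)
{
    if (s != NULL) { free(s->data); free(s); }
}
```

A caller reads the array back the same way, for example ((uint16_t *)s->data)[i] when s->type is gi16.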
A pointer is merely an address in the memory space. You can choose to interpret it however you wish. Review union for more information on how you can interpret the same memory location in multiple ways.
Casting between pointer types is common in C and C++, and the use of void* implies that you don't want users to accidentally dereference it (dereferencing a void* will cause an error, but dereferencing the same pointer when cast to int* will not).
