Casting arrays and structs - c

Suppose I have some complex struct
struct icmphdr
{
u_int8_t type; // Type of ICMP message
u_int8_t code; // Packet code
u_int16_t checksum; // Datagram checksum
/* Parts of the packet below don’t have to appear */
union
{
struct
{
u_int16_t id;
u_int16_t sequence;
} echo;
u_int32_t gateway;
struct
{
u_int16_t __unused;
u_int16_t mtu;
} frag;
} un;
};
and a
char buf[SIZE];//for some integer SIZE
What is the meaning and the purpose of this cast?
ip=(struct icmphdr*)buf; //ip was formerly defined as some struct iphdr *ip;

The likely scenario behind your code is this:
The programmer wanted to create a data protocol and represent the various contents as a struct, to ease programming and improve code readability.
The underlying API probably only allows data transmissions on byte basis. This means that the struct will have to be passed as a "chunk of bytes". Your particular code appears to be the receiver: it has a chunk of raw bytes and states that the data in those bytes corresponds to a struct.
Formally & theoretically, the C standard does not define what happens when you cast between pointers to different data types. In theory, anything can happen if you do. But in practice/the real world, such casts are well-defined as long as there is some sort of guarantee about the structure of the data.
Here is where you can get problems. Many computers have alignment requirements, meaning that the compiler is free to insert so-called padding bytes between the members, or after the last member, of your struct/union. These padding bytes may not necessarily be the same between two compilations, and they may certainly not be the same between two different systems.
So you have to either ensure that both the sender and the receiver have no padding enabled, or that they have the same padding. Otherwise you cannot use structs/unions, they will cause the program to crash and burn.
The quick & dirty way to ensure that struct padding isn't enabled is to use a non-standard compiler extension such as #pragma pack(1), which is commonly supported by many compilers.
The professional, portable way is to add a compile-time assert to check that the size of the struct is indeed as intended. With C11, it would look like
static_assert(sizeof(struct icmphdr) ==
(sizeof(uint8_t) +
sizeof(uint8_t) + ... /* all individual members' types */ ),
"Error: padding detected");
If the compiler doesn't support static_assert, there are several ways to achieve something similar with various macros, or even a runtime assert().
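As a rough sketch of that compile-time check, assuming a C11 compiler and the fixed-width types from <stdint.h>: the struct name icmp_wire_hdr and the expected size of 8 bytes are illustrative assumptions here, not taken from any system header.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical mirror of the ICMP header from the question, written with
   the standard fixed-width types instead of u_int8_t and friends. */
struct icmp_wire_hdr {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
    union {
        struct { uint16_t id; uint16_t sequence; } echo;
        uint32_t gateway;
        struct { uint16_t unused; uint16_t mtu; } frag;
    } un;
};

/* 1 + 1 + 2 bytes of fixed fields plus a 4-byte union: 8 bytes when the
   compiler inserts no padding. The build fails if that ever changes. */
static_assert(sizeof(struct icmp_wire_hdr) == 8,
              "Error: padding detected in icmp_wire_hdr");
```

Spelling out the expected total as a literal keeps the check readable; summing sizeof of each member, as in the snippet above, works equally well.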

That's pretty bad. Don't ever make a char buffer and cast it to a struct, because the alignment will be wrong (ie, the char buffer is going to have some random starting address because strings can start anywhere, but ints need/should have addresses multiples of four on most architectures).
The solution is not to do nasty casts like that. Make a proper union that will have the alignment of the most restrictive of its members, or use a special element to force the alignment you need if you have to (see the definition of sockaddr_storage in your /usr/include/sys/socket.h or similar).
Illustration
You create a buffer on the stack and read some data into it:
char buf[1024]; ssize_t nread = read(fd, buf, sizeof(buf));
Now you pretend the buffer was the struct:
CHECK(nread >= (ssize_t)sizeof(struct icmphdr));
struct icmphdr* hdr = (struct icmphdr*)buf;
hdr->un.gateway; // probable SIGSEGV on eg Itanium!
By reinterpreting the buffer as a struct, we bypassed the compiler's checks. If we're unlucky, &hdr->un.gateway won't be a multiple of four, and accessing it as an integer will barf on some platforms.
Illustration of solution
struct icmphdr hdr; ssize_t nread = read(fd, &hdr, sizeof(hdr));
CHECK(nread == (ssize_t)sizeof(hdr));
hdr.un.gateway; // OK
Let the compiler help you. Don't do grotty casts. When you make a buffer, tell the compiler what you're going to use the buffer for so it can put it in the correct place in memory for you.

Related

How safe is casting a struct to uint8_t * or char * and accessing it via bytestream?

The following logic works fine but I'm uncertain of the caveats with what the standard says and whether it's totally safe to cast a struct to uint8_t * or char * to send to a message queue (which itself takes in a pointer to the buffer as well) or even a function?
My understanding is as long as uint8_t is considered a byte (which char is), it could be used to address any set of bytes
typedef struct
{
uint8_t a;
uint8_t b;
uint16_t c;
} __attribute__((packed)) Pkt;
int main()
{
Pkt pkt = {.a = 4, .b = 12, .c = 300};
mq_send(mq, (char *) &pkt, sizeof(pkt), 0);
}
Perhaps it's similar to passing a cast pointer to a function (on the receiver end), and it's parsing the data according to bytes
typedef struct
{
uint8_t a;
uint8_t b;
uint16_t c;
} __attribute__((packed)) Pkt;
void foo(uint8_t *ptr)
{
uint8_t a = *ptr++;
uint8_t b = *ptr++;
uint16_t str = (*(ptr+1) << 8) | *ptr;
printf ("A: %d, B: %d, C: %d\n", a, b, str);
}
int main()
{
Pkt pkt = {.a = 4, .b = 12, .c = 300};
foo((uint8_t *) &pkt);
}
C deliberately allows accessing the bytes of an object and supports communicating objects by transmitting the bytes that represent them and reconstructing them from the transmitted bytes. However, it should be done correctly, and there are some issues to deal with.
A character type should be used.
The preferred type to work with is unsigned char. This is preferred for two reasons:
The C standard defines the behavior of using character types to access the representations of objects. The character types are char, signed char, and unsigned char. The standard does not require that uint8_t be a character type. Although it may have the same size and general properties of unsigned char, it may be an extended integer type rather than an alias of unsigned char (or of char). In this case, the C standard does not define the behavior of accessing the bytes of an object with uint8_t.
unsigned char is preferred over char or signed char to avoid problems with signed integers in various C operations.
The sender and the receiver must agree on the representations of the objects or the protocol used for sending them.
If the sender and the receiver are compiled with the same C implementation using the same definitions for the objects being transmitted (such as the same structure definitions), they will agree on the representations. Between diverse C implementations, though, it is necessary to ensure there is clear agreement on how the transmitted bytes represent objects. As shown in your code, the structure is packed, which should take care of the problem that there may be padding inside structures. Other considerations include:
The order of bytes within integers. Little-endian-first (bytes in order from least significant to most) and big-endian-first (the reverse) are common, although others are possible. Big endian is most common in network protocols.
Representations of non-integers, such as floating-point formats. The IEEE-754 floating-point standard specifies some interchange formats, which are very widely used.
Structures are laid out identically, including the types of members.
Theoretically, the order of bits within bytes must be agreed, but this is not an issue if the network service is operating at the byte level.
Note that, of course, some objects are inherently impossible to send via bytes representations due to needing context in the running program, such as pointers and file handles.
Additional note
Another hazard to guard against is interpreting a byte buffer as another object. The C standard defines the behavior of accessing the bytes of an object (for example, something defined as a structure) using a character type, but it does not define the reverse. Sometimes naïve programmers will create an array of character type, read a network message into it, and then convert a pointer into the array to a pointer to a structure type. This runs afoul of two issues:
The conversion is not defined if the alignment is not correct. (This should not be a problem with a packed structure, which we would expect to have an alignment requirement of one byte.)
Accessing an array of characters as a different, incompatible type is not defined by the C standard.
The proper way to reassemble received bytes into an object is to copy them either into memory declared as the desired type or memory allocated (as with malloc) for the purpose of interpreting it as the intended object. This can be done by copying bytes from a buffer into the target memory or by directly passing the target memory to the network read routine, for it to fill in the bytes directly.
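A minimal sketch of that copy-into-a-typed-object approach follows. The Msg type and the msg_from_bytes helper are hypothetical names for illustration, and the sketch assumes the sender and the receiver already agree on layout and byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message type; the name and fields are for illustration only. */
typedef struct {
    uint8_t  a;
    uint8_t  b;
    uint16_t c;
} Msg;

/* Copy sizeof(Msg) received bytes into a properly typed, properly aligned
   object instead of casting the buffer pointer.
   Returns 0 on success, -1 if the buffer is too short. */
int msg_from_bytes(Msg *out, const unsigned char *buf, size_t len)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

Because the destination is declared as a Msg, neither the alignment issue nor the aliasing issue arises; compilers typically optimize the memcpy into a few plain loads.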

Initializing a struct pointer with char array

I've encountered a similar problem to the one described in another thread (perf_event_open - how to monitoring multiple events). I was able to solve it and the code is working, but I want to understand why this part actually works and how this is not a violation of any kind:
char buf[4096];
struct read_format* rf = (struct read_format*) buf;
struct read_format is defined as follows:
struct read_format {
uint64_t nr;
struct {
uint64_t value;
uint64_t id;
} values[/*2*/]; };
How does the compiler know to which value uint64_t nr should be initialized? Or how to initialize the inner struct right?
This code is incorrect in Standard C:
char buf[4096];
read(fd1, buf, 4096); // Assume error handling, omitted for brevity
struct read_format* rf = (struct read_format*) buf;
printf("%llu\n", rf->nr);
There are two issues -- and these are distinct issues which should not be conflated -- :
buf might not be correctly aligned for struct read_format. If it isn't, the behaviour is undefined.
Accessing rf->nr violates the strict aliasing rule and the behaviour is undefined. An object with declared type char cannot be read or written by an expression of type unsigned long long. Note that the converse is not true.
Why does it appear to work? Well, "undefined" does not mean "must explode". It means the C Standard no longer specifies the program's behaviour. This sort of code is somewhat common in real code bases. The major compiler vendors -- for now -- include logic so that this code will behave as "expected", otherwise too many people would complain.
The "expected" behaviour is that accessing *rf should behave as if there exists a struct read_format object at the address, and the bytes of that object are the same as the bytes of buf, similar to if the two were in a union.
The code could be made compliant with a union:
union
{
char buf[4096];
struct read_format rf;
} u;
read(fd1, u.buf, sizeof u.buf);
printf("%llu\n", u.rf.nr);
The strict aliasing rule is "disabled" for union members accessed by name; and this also addresses the alignment problem since the union will be aligned for all members.
It's up to you whether to be compliant, or trust that compilers will continue to put practicality ahead of maximal optimization within the constraints permitted by the Standard.
It doesn't. The buffer is left uninitialized (a local array such as char buf[4096] starts with indeterminate contents), and the struct pointer is initialized with a pointer to the buffer.
It looks completely whack; however it really isn't. The read function is going to read as many structures into the buffer as fit.
The outer structure is variable-length. The advance loop looks like this:
struct read_format *current = rf;
if (readstructs(..., &current, 4096)) {
for (;current;current=current->nr?((struct read_format *)((char *)current + current->nr)):NULL) {
}
}
These things appear in system-level OS calls to decrease the complexity of copying memory across security boundaries. The read side is easy and well-taught. The writer performs the operations necessary in filling the buffer to ensure this simple reader does not violate any system-level constraints. The code will work despite looking like it violates types left and right because the writer has set it up to work. In particular, the pointer will be aligned.
I've seen a similar method used in old file formats. Unfortunately that only follows the rules of the platform that wrote it (usually something ancient and far more permissive than a modern system) and leads to having to write a byte-at-a-time reader because the host doing the reading doesn't correspond.

Should all structs that are expected to be read from binary be marked as packed?

I know that some structs may or may not have padding added between elements.
My current project is reading input from /dev/input files. The binary layout of these files is defined in <linux/input.h>:
struct input_event {
struct timeval time;
__u16 type;
__u16 code;
__s32 value;
};
I wonder, though, why this struct is not marked with a packed attribute. This means that the /dev/input files (which are packed bit by bit) are not guaranteed to match the packing of the struct. Thus the logic
struct input_event event;
read(fd, &event, sizeof(event));
Is not defined to work across all archs.
Do I have a fallacy in my logic? Or is it safe to assume that some things are not going to be packed?
Packing by Layout
In the current case, you are safe. Your struct input_event is already laid out packed.
struct timeval time; /* 8 bytes on 32-bit; typically 16 bytes on 64-bit Linux */
__u16 type; /* 2 bytes */
__u16 code; /* 2 bytes */
__s32 value; /* 4 bytes */
This means the members form clean 32-bit blocks, so no padding is required. This article explains how the size of struct members (especially chars) and their layout affect the padding and thus also the final size of a struct.
Packing by the Preprocessor
Packing structs via the preprocessor seems to be a good solution at first sight. Looking a little closer, several downsides show up, and one of them hits you exactly at the point you care about (see also #pragma pack effect).
Performance
Padding ensures that each struct member sits at an address suited to the machine's access width (4-byte words on a 32-bit machine and 8-byte words on a 64-bit machine), so it can be read in a single access. Consequently, packing such structures can leave members straddling word boundaries, and the machine then needs extra work (multiple accesses or byte-by-byte loads) to read them.
Different Platforms (Archs, Compilers)
Preprocessor directives such as #pragma pack are heavily vendor- and architecture-specific. Using them therefore makes your code less portable or, in the worst case, non-portable.
Conclusion
As the author of this article (already mentioned above) states, even NTP directly reads data from the network into structures.
So, carefully laying out your structures and maybe padding them by hand might be the safest and also most portable solution.
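One way to sketch "laying out carefully and padding by hand", assuming C11 and fixed-width types: the struct below is a made-up example, with members ordered so each is naturally aligned and the one gap filled by an explicit reserved byte. The offsetof checks make the build fail if a compiler lays it out differently.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical record: every field is naturally aligned, so no
   compiler-inserted padding is expected and no packing directive is needed. */
struct sample_record {
    uint32_t id;        /* offset 0 */
    uint16_t kind;      /* offset 4 */
    uint8_t  flags;     /* offset 6 */
    uint8_t  reserved;  /* offset 7: explicit padding byte, kept zero */
    uint32_t value;     /* offset 8 */
};

static_assert(offsetof(struct sample_record, kind)  == 4, "kind moved");
static_assert(offsetof(struct sample_record, flags) == 6, "flags moved");
static_assert(offsetof(struct sample_record, value) == 8, "value moved");
static_assert(sizeof(struct sample_record) == 12, "unexpected padding");
```

This only pins down padding. Byte order still has to be agreed on separately.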
If you insist on directly loading structures into memory from binary images, then C is not your friend. Padding is allowed, and the basic types can have different widths and endianness. You are not even guaranteed 8 bits in a byte. However, packing the structures and sticking to int32_t and so on will help a great deal; it's effectively portable bar endianness.
But the best solution is to load the structure from the stream portably. This is even possible with real numbers, though a bit fiddly.
This is how to read a 16 bit integer portably. See my github project for the rest of the functions (similar logic)
https://github.com/MalcolmMcLean/ieee754
/**
Get a 16-bit big-endian signed integer from a stream.
Does not break, regardless of host integer representation.
@param[in] fp - pointer to a stream opened for reading in binary mode
@returns the 16 bit value as an integer
*/
int fget16be(FILE *fp)
{
int c1, c2;
c2 = fgetc(fp);
c1 = fgetc(fp);
return ((c2 ^ 128) - 128) * 256 + c1;
}
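The same idea extends to wider integers. The helper below is a hypothetical sketch written for this answer (it is not taken from the linked project): it reads an unsigned 32-bit big-endian value and, unlike the function above, reports end-of-file or read errors.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read an unsigned 32-bit big-endian integer from a
   stream opened in binary mode.
   Returns 0 on success, -1 on EOF or read error (*out is left untouched). */
int fget32be_u(FILE *fp, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        /* Most significant byte arrives first, so shift and append. */
        v = (v << 8) | (uint32_t)(unsigned char)c;
    }
    *out = v;
    return 0;
}
```

Building the value arithmetically from individual bytes means the result does not depend on the host's own byte order.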

typecast array to struct in c

I have a structure like this
struct packet
{
int seqnum;
char type[1];
float time1;
float pri;
float time2;
unsigned char data[512];
};
I am receiving packet in an array
char buf[529];
I want to extract seqnum, data and every other field separately. Does the following typecast work? It is giving junk values for me.
struct packet *pkt;
pkt=(struct packet *)buf;
printf(" %d",pkt->seqnum);
No, that likely won't work and is generally a bad and broken way of doing this.
You must use compiler-specific extensions to make sure there's no invisible padding between your struct members, for something like that to work. With gcc, for instance, you do this using the __attribute__() syntax.
It is, thus, not a portable idea.
It's much better to be explicit about it, and unpack each field. This also gives you a chance to have a well-defined endianness in your network protocol, which is generally a good idea for interoperability's sake.
No, that isn't generally valid code. You should make the struct first and then memcpy stuff into it:
struct packet p;
memcpy(&p.seqnum, buf + 0, 4);
memcpy(&p.type[0], buf + 4, 1);
memcpy(&p.time1, buf + 5, 4);
And so forth.
You must take great care to get the type sizes and endianness right.
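To make the size and endianness point concrete, here is a hedged sketch of a full decoder for the 529-byte buffer. It rests on assumptions the question does not confirm: the fields are packed in declaration order (4 + 1 + 4 + 4 + 4 + 512 bytes), integers and floats are sent big-endian, and float is IEEE-754 binary32. The helper names are made up, and seqnum is declared int32_t rather than int to pin down its width.

```c
#include <stdint.h>
#include <string.h>

struct packet {
    int32_t seqnum;              /* the question uses int; fixed width assumed */
    char type[1];
    float time1;
    float pri;
    float time2;
    unsigned char data[512];
};

/* Read a big-endian 32-bit value (assumed wire byte order). */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reinterpret 32 wire bits as a float; assumes IEEE-754 binary32. */
static float load_be_float(const unsigned char *p)
{
    uint32_t bits = load_be32(p);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Decode field by field, using explicit wire offsets. */
void packet_decode(struct packet *out, const unsigned char buf[529])
{
    out->seqnum  = (int32_t)load_be32(buf + 0);
    out->type[0] = (char)buf[4];
    out->time1   = load_be_float(buf + 5);
    out->pri     = load_be_float(buf + 9);
    out->time2   = load_be_float(buf + 13);
    memcpy(out->data, buf + 17, sizeof out->data);
}
```

If the sender actually uses little-endian order, only load_be32 needs to change; the offsets stay the same.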
First of all, you cannot know in advance where the compiler will insert padding bytes in your structure for performance optimization (cache line alignment, integer alignment etc) since this is platform-dependent. Except, of course, if you are considering building the app only on your platform.
Anyway, in your case it seems like you are getting data from somewhere (network ?) and it is highly probable that the data has been compacted (no padding bytes between fields).
If you really want to typecast your array to a struct pointer, you can still tell the compiler to remove the padding bytes it might add. Note that this depends on the compiler you use and is not part of standard C. With gcc, you might add this attribute at the end of your structure definition:
struct my_struct {
int blah;
/* Blah ... */
} __attribute__((packed));
Note that it will affect the performance for member access, copy etc ...
Unless you have a very good reason to do so, don't ever use the __attribute__((packed)) thing !
The other solution, which is much more advisable, is to do the parsing on your own. You just allocate an appropriate structure and fill its fields by picking the right bytes out of your buffer. A sequence of memcpy calls is likely to do the trick here (see Kerrek's answer).

Send a struct over a socket with correct padding and endianness in C

I have several structures defined to send over different Operating Systems (tcp networks).
Defined structures are:
struct Struct1 { uint32_t num; char str[10]; char str2[10];}
struct Struct2 { uint16_t num; char str[10];}
typedef Struct1 a;
typedef Struct2 b;
The data is stored in a text file.
Data Format is as such:
123
Pie
Crust
Struct1 a is stored as 3 separate parameters. However, struct2 is two separate parameters with both 2nd and 3rd line stored to the char str[] . The problem is when I write to a server over the multiple networks, the data is not received correctly. There are numerous spaces that separate the different parameters in the structures. How do I ensure proper sending and padding when I write to server? How do I store the data correctly (dynamic buffer or fixed buffer)?
Example of write: write(fd,&a, sizeof(typedef struct a)); Is this correct?
Problem Receive Side Output for struct2:
123( , )
0 (, Pie)
0 (Crust,)
Correct Output
123(Pie, Crust)
write(fd,&a, sizeof(a)); is not correct; at least not portably, since the C compiler may introduce padding between the elements to ensure correct alignment. sizeof(typedef struct a) doesn't even make sense.
How you should send the data depends on the specs of your protocol. In particular, protocols define widely varying ways of sending strings. It is generally safest to send the struct members separately; either by multiple calls to write or writev(2). For instance, to send
struct { uint32_t a; uint16_t b; } foo;
over the network, where foo.a and foo.b already have the correct endianness, you would do something like:
struct iovec v[2];
v[0].iov_base = &foo.a;
v[0].iov_len = sizeof(uint32_t);
v[1].iov_base = &foo.b;
v[1].iov_len = sizeof(uint16_t);
writev(fp, v, 2);
Sending structures over the network is tricky. You might run into the following problems:
1. Byte endianness issues with integers.
2. Padding introduced by your compiler.
3. String parsing (i.e. detecting string boundaries).
If performance is not your goal, I'd suggest creating encoders and decoders for each struct to be sent and received (ASN.1, XML or custom). If performance is really required, you can still use structures and solve (1) by fixing an endianness (i.e. network byte order) and ensuring your integers are stored as such in those structures, and (2) by fixing a compiler and using the pragmas or attributes to enforce a "packed" structure.
GCC, for example, uses __attribute__((__packed__)) as such:
struct mystruct {
uint32_t a;
uint16_t b;
unsigned char text[24];
} __attribute__((__packed__));
(3) is not easy to solve. Using null-terminated strings in a network protocol and depending on them being present would make your code vulnerable to several attacks. If strings need to be involved, I'd use a proper encoding method such as the ones suggested above.
The easy way would be to write two functions for each structure: one to convert from textual representation to the struct and one to convert a struct back to text. Then you just send the text over the network and on the receiving side convert it to your structures. That way endianness does not matter.
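A minimal sketch of such a pair of functions for Struct2 might look like this. The function names and the "num str" line format are assumptions made up for illustration, and the sketch only handles strings without whitespace.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct Struct2 { uint16_t num; char str[10]; };

/* Format as "num str\n". Returns the number of characters written
   (excluding the terminating NUL), or -1 if the output does not fit. */
int struct2_to_text(const struct Struct2 *s, char *out, size_t cap)
{
    /* %.9s stops after 9 characters even if str is not NUL-terminated. */
    int n = snprintf(out, cap, "%" PRIu16 " %.9s\n", s->num, s->str);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse the format above. Returns 0 on success, -1 on malformed input. */
int struct2_from_text(const char *in, struct Struct2 *s)
{
    /* %9s writes at most 9 characters plus the NUL into the 10-byte field. */
    return sscanf(in, "%" SCNu16 " %9s", &s->num, s->str) == 2 ? 0 : -1;
}
```

A real protocol would also need a rule for strings containing spaces or newlines, for example a length prefix.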
There are conversion functions to ensure portability of binary integers across a network. Use htons, htonl, ntohs and ntohl to convert 16 and 32 bit integers from host to network byte order and vice versa.
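As a sketch of how those functions can be combined with an explicit buffer layout, assuming a POSIX system that provides <arpa/inet.h>: the 6-byte format and the foo_encode/foo_decode names below are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: a 32-bit value followed by a 16-bit value,
   both in network byte order, with no padding in between. */
enum { FOO_WIRE_SIZE = 6 };

void foo_encode(unsigned char out[FOO_WIRE_SIZE], uint32_t a, uint16_t b)
{
    uint32_t na = htonl(a);   /* host to network order, 32 bit */
    uint16_t nb = htons(b);   /* host to network order, 16 bit */
    memcpy(out, &na, sizeof na);
    memcpy(out + 4, &nb, sizeof nb);
}

void foo_decode(const unsigned char in[FOO_WIRE_SIZE], uint32_t *a, uint16_t *b)
{
    uint32_t na;
    uint16_t nb;
    memcpy(&na, in, sizeof na);
    memcpy(&nb, in + 4, sizeof nb);
    *a = ntohl(na);           /* network to host order */
    *b = ntohs(nb);
}
```

The buffer can then be handed to write() or send() as FOO_WIRE_SIZE bytes, so compiler padding never reaches the wire.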

Resources