Reading uninitialized unsigned int arrays from a packet in C

I'm stuck on a problem reading bytes in my C TCP socket server, which receives requests from a Python client. I have the following struct as my receive template:
struct ofp_connect {
uint16_t wildcards; /* identifies ports to use below */
uint16_t num_components;
uint8_t pad[4]; /* Align to 64 bits */
uint16_t in_port[0];
uint16_t out_port[0];
struct ofp_tdm_port in_tport[0];
struct ofp_tdm_port out_tport[0];
struct ofp_wave_port in_wport[0];
struct ofp_wave_port out_wport[0];
};
OFP_ASSERT(sizeof(struct ofp_connect) == 8);
I can read the first two 32-bit fields properly, but my problem is that in_port[0], which comes after the pad field, seems to be wrong. This is how it is currently being read:
uint16_t portwin, portwout, * wportIN;
wportIN = (uint16_t*)&cflow_mod->connect.in_port; //where cflow_mod is the main struct which encompasses connect struct template described above
memcpy(&portwin, wportIN, sizeof(portwin) );
DBG("inport:%d:\n", ntohs(portwin));
Unfortunately this doesn't give me the expected in-port number. I can check in Wireshark that the client is sending the right packet format, but I feel the way I read the in/out ports is wrong. Or is it because of the way Python sends the data? Can you provide some advice on where and why I'm going wrong? Thanks in advance.

The declaration of struct ofp_connect violates the following clause of the ISO C standard:
6.7.2.1 Structure and union specifiers ... 18 As a special case, the last element of a structure with more than one named member may have
an incomplete array type; this is called a flexible array member.
Note that in your case in_port and out_port would have to be declared as in_port[] and out_port[] to take advantage of the clause above, in which case you would have two flexible array members, which that clause prohibits. The zero-length array declaration is an extension supported by many compilers (including gcc, for example) with the same semantics, but in your case in_port and out_port share the same space (essentially whatever bytes follow the ofp_connect structure). Moreover, for this to work, you have to allocate some space after the structure for the flexible array members. Since, as you said, struct ofp_connect is part of a larger structure, accessing in_port returns the 'value' stored in the containing structure's member that follows the connect sub-struct.
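One way to sidestep the overlapping zero-length arrays is to keep only the fixed 8-byte part in a struct and read the variable part from the raw receive buffer by offset. The following is a minimal sketch, not the asker's actual code: the names connect_hdr and read_port are hypothetical, and it assumes the 16-bit ports are packed back to back immediately after the header, which the question does not confirm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* ntohs (POSIX) */

/* Fixed-size part only; the variable-length port lists are NOT members. */
struct connect_hdr {
    uint16_t wildcards;
    uint16_t num_components;
    uint8_t  pad[4];
};

/*
 * Read the idx-th 16-bit port that follows the fixed header in a raw
 * receive buffer. Returns 0 on success, -1 if the buffer is too short.
 * Assumption: the ports are packed back to back right after the header.
 */
int read_port(const uint8_t *buf, size_t len, size_t idx, uint16_t *out)
{
    size_t off = sizeof(struct connect_hdr) + idx * sizeof(uint16_t);
    uint16_t net;

    if (off > len || len - off < sizeof net)
        return -1;
    memcpy(&net, buf + off, sizeof net);  /* safe for unaligned data */
    *out = ntohs(net);                    /* network to host byte order */
    return 0;
}
```

Because the function checks the received length first, a short packet is reported as an error instead of being read past its end.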

Related

Change of flexible array into pointer

I am working to get rid of MISRA violations in my C code. It violates rule 18.7.
struct abc {
struct header;
uint8_t data[]; /* Line 1 */
};
Here, Line 1 is causing the MISRA violations.
I tried to convert it into:
struct abc {
struct header;
uint8_t *data;
};
Can we do it like the above, or does it violate something?
Your solution is semantically different and won't work even if it clears the violation.
The intent here is to create a structure that can act as a header for the contiguous data that follows it. So for example if you have:
struct Message
{
struct abc info ;
char data[128] ;
} message ;
Such that message.info.data and message.data refer to the same thing and casting a struct abc to a struct Message allows a function to be defined for passing any object with a struct abc header. Effectively supporting polymorphism in C.
Replacing it with:
struct abc
{
struct header;
uint8_t* data;
};
is semantically different because the data member does not refer to the data contiguous with header. The copy semantics also differ, and it is unlikely in the context of the code that uses the original structure that it will work as intended.
GCC supports the following syntax:
struct abc
{
struct header;
uint8_t data[0] ;
} ;
but it is likely that is not MISRA compliant either. A compliant solution is to have:
struct abc
{
struct header;
uint8_t data[1] ;
} ;
But that inserts an extra character and any code that uses this as a header may need to accommodate that when accessing the data through the data member.
Safety-related systems ban dynamic memory allocation, and therefore MISRA-C:2012 does so as well. This is the rationale for rule 18.7: flexible array members are closely associated with dynamic allocation and therefore not allowed.
The reason why dynamic allocation is banned is that there can be no non-deterministic behavior in these kinds of systems. In addition, it doesn't make any sense to use dynamic allocation in microcontroller/RTOS applications.
You can swap the flexible array member for a pointer if it makes sense for your application. But if it is some manner of protocol or data structure header, you probably want a fixed-size array instead. (And mind struct padding: storing data communication protocols in structs can be problematic because of alignment and endianness.)
Yes you can, as it makes the structure size deterministic and static, but it also forces you to allocate and then release the needed space for data with malloc() and free(), or to explicitly make it point to some already available space somewhere, each time you instantiate the structure.
What you probably want to do here is to specify a definite length to your array. If however this structure is meant to actually describe the header of a data block, you may use data[1] then let your index exceed this value to access the rest (ISO C forbids 0-length arrays, though).
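The fixed-size alternative mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the contents of struct header, the MSG_MAX_DATA bound and the abc_set_data helper are all hypothetical, since the question does not show them. It avoids both the flexible array member and dynamic allocation; whether it satisfies every other MISRA rule in a given project is not something this sketch can promise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX_DATA 128u  /* hypothetical upper bound on payload size */

struct header {            /* hypothetical header contents */
    uint8_t type;
    uint8_t length;        /* number of valid bytes in data */
};

struct abc {
    struct header hdr;
    uint8_t data[MSG_MAX_DATA];  /* fixed capacity, no flexible member */
};

/* Copy a payload into the message; returns 0 on success, -1 if too big. */
int abc_set_data(struct abc *msg, const uint8_t *src, size_t n)
{
    if (n > MSG_MAX_DATA)
        return -1;
    memcpy(msg->data, src, n);
    msg->hdr.length = (uint8_t)n;
    return 0;
}
```

The trade-off is that every struct abc occupies the full capacity even when the payload is short, in exchange for a size that is known at compile time.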

How to create a C struct with specific size to send over socket to DalmatinerDB?

I'm trying to create a C client for dalmatinerdb but am having trouble understanding how to combine the variables, write them to a buffer and send them to the database. The fact that dalmatinerdb is written in Erlang makes it more difficult. However, by looking at a Python client for dalmatinerdb I have (probably) found the necessary variable sizes and order.
The Erlang client has a function called "encode", see below:
encode({stream, Bucket, Delay}) when
is_binary(Bucket), byte_size(Bucket) > 0,
is_integer(Delay), Delay > 0, Delay < 256->
<<?STREAM,
Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;
According to the official dalmatinerdb protocol we can see the following:
-define(STREAM, 4).
-define(DELAY_SIZE, 8). % bits
-define(BUCKET_SS, 8). % bits
Let's say i would like to create this kind of structure in C,
would it look something like the following:
struct package {
unsigned char[1] mode; // = "4"
unsigned char[1] delay; // = for example "5"
unsigned char[1] bucketNameSize; // = "5"
unsigned char[1] bucketName; // for example "Test1"
};
Update:
I realized that the dalmatinerdb frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't give me any clue whether it's right or wrong. Therefore I will try to create a second struct with the actual values.
The Erlang code snippet which encodes values looks like this:
encode({stream, Metric, Time, Points}) when
is_binary(Metric), byte_size(Metric) > 0,
is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
is_integer(Time), Time >= 0->
<<?SENTRY,
Time:?TIME_SIZE/?SIZE_TYPE,
(byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;
The different sizes:
-define(SENTRY, 5)
-define(TIME_SIZE, 64)
-define(METRIC_SS, 16)
-define(DATA_SS, 32)
Which gives me:
<<?5,
Time:?64/?SIZE_TYPE,
(byte_size(Metric)):?16/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?32/?SIZE_TYPE, Points/binary>>;
My guess is that my struct containing a value should look like this:
struct Package {
unsigned char sentry;
uint64_t time;
unsigned char metricSize;
uint16_t metric;
unsigned char pointSize;
uint32_t point;
};
Any comments on this structure?
The binary created by the encode function has this form:
<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>
First let's replace all the preprocessor macros with their actual values:
<<4, Delay:8/unsigned-integer,
(byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>
Now we can more easily see that this binary contains:
a byte of value 4
the value of Delay as a byte
the size of the Bucket binary as a byte
the value of the Bucket binary
Because of the Bucket binary at the end, the overall binary is variable-sized.
A C99 struct that resembles this value can be defined as follows:
struct EncodedStream {
unsigned char mode;
unsigned char delay;
unsigned char bucket_size;
unsigned char bucket[];
};
This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value set in the bucket_size field, and you are presumably using this structure by allocating memory large enough to hold the fixed-size fields together with the variable-sized bucket field, where bucket itself is allocated to hold bucket_size bytes. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
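As a rough illustration of the allocation described above, the sketch below builds one stream message on the heap. The helper name make_stream and its error handling are assumptions of this example, not part of any dalmatinerdb client; the value 4 for mode comes from the ?STREAM definition in the question.

```c
#include <stdlib.h>
#include <string.h>

struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];   /* C99 flexible array member */
};

/*
 * Allocate and fill a stream message. The caller frees the result.
 * Returns NULL if the bucket name is empty, longer than 255 bytes,
 * or if allocation fails. On success *total is the number of bytes to send.
 */
struct EncodedStream *make_stream(unsigned char delay, const char *bucket,
                                  size_t *total)
{
    size_t n = strlen(bucket);
    struct EncodedStream *msg;

    if (n == 0 || n > 255)
        return NULL;
    msg = malloc(sizeof *msg + n);   /* fixed fields plus bucket bytes */
    if (msg == NULL)
        return NULL;
    msg->mode = 4;                   /* ?STREAM from the protocol definition */
    msg->delay = delay;
    msg->bucket_size = (unsigned char)n;
    memcpy(msg->bucket, bucket, n);  /* no terminating NUL on the wire */
    *total = sizeof *msg + n;
    return msg;
}
```

Since every fixed field is a single byte, there is no padding or endianness to worry about in this particular message, so the allocated block can be written to the socket as it is.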
Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.
The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:
struct Wrong {
unsigned char sentry;
uint64_t time;
uint16_t metric_size;
unsigned char metric[];
uint32_t points_size;
unsigned char points[];
};
There are two problems with the Wrong struct:
Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, the C compiler will align the time field on an 8-byte boundary, which means there will be padding of 7 bytes following the sentry field. But the Erlang binary contains no such padding.
Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.
Solving the padding and alignment issue requires using a packed struct, which you can achieve with compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be solved by using two structs instead:
typedef struct __attribute((__packed__)) {
unsigned char sentry;
uint64_t time;
uint16_t size;
unsigned char metric[];
} Metric;
typedef struct __attribute((__packed__)) {
uint32_t size;
unsigned char points[];
} Points;
Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work, but if not — and it's likely you're not — the data values your C code reads and writes won't match Erlang.
Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory — let's call it p — to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:
Points* points = (Points*)(p + sizeof(Metric) + <length of Metric.metric>);
Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
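For comparison, here is a sketch of a different technique from the packed-struct approach above: it writes each field into a plain byte buffer in big-endian order, so it needs neither compiler-specific packing nor pointer casts. The function names put_be and encode_value are hypothetical, and the points are treated as opaque bytes that the caller has already encoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the low `bytes` bytes of v at dst, most significant byte first. */
static void put_be(unsigned char *dst, uint64_t v, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        dst[i] = (unsigned char)(v >> (8 * (bytes - 1 - i)));
}

/*
 * Encode sentry, time, metric and points into out.
 * Returns the number of bytes written, or 0 if out is too small
 * or a length does not fit its size field.
 */
size_t encode_value(unsigned char *out, size_t cap, uint64_t time,
                    const unsigned char *metric, size_t metric_len,
                    const unsigned char *points, size_t points_len)
{
    size_t off = 0;

    if (metric_len > UINT16_MAX || points_len > UINT32_MAX)
        return 0;
    /* 15 = sentry (1) + time (8) + metric size (2) + points size (4) */
    if (cap < 15 + metric_len || cap - 15 - metric_len < points_len)
        return 0;
    out[off++] = 5;                                     /* ?SENTRY */
    put_be(out + off, time, 8);             off += 8;   /* ?TIME_SIZE, 64 bits */
    put_be(out + off, metric_len, 2);       off += 2;   /* ?METRIC_SS, 16 bits */
    memcpy(out + off, metric, metric_len);  off += metric_len;
    put_be(out + off, points_len, 4);       off += 4;   /* ?DATA_SS, 32 bits */
    memcpy(out + off, points, points_len);  off += points_len;
    return off;
}
```

The shifts produce the same bytes on little-endian and big-endian hosts, so no separate byte-swapping step is needed before sending the buffer.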

Data Structure inside a Union (C Programming)

Anytime structures are thrown inside other structures I just get confused for some reason. I'm writing a driver for an I2C (2-wire Serial Interface) device and I'm using the manufacturer's drivers as a reference for creating mine. I have this union statement below (which is defined in a header file) and I just can't understand a few lines inside it. As brief background, the main snippet below sets up the TWI_statusReg variable, which holds the information from a status register every time I'm transmitting/receiving data across the I2C bus. This data register is 8 bits long and belongs to an Atmel ATmega328P microcontroller. Here are my questions...
1.) It's hard to formulate this question in words, but can you explain in easy terms why you would declare a data struct inside a union like this? What key points should I pick out from this?
2.) In the ".c" header definition file which is too long to post here, there is a single line that says the following
TWI_statusReg.all = 0;
I know there is a char variable in the header file called 'all' as seen in the main snippet of code below. However, I'm not understanding what happens when it gets assigned a zero. Is this setting all the bits in the status register to zero?
3.) The two lines
unsigned char lastTransOK:1;
unsigned char unusedBits:7;
are confusing to me, specifically what the colon operator is doing.
The main snippet of CODE
/****************************************************************************
Global definitions
****************************************************************************/
union TWI_statusReg // Status byte holding flags.
{
unsigned char all;
struct
{
unsigned char lastTransOK:1;
unsigned char unusedBits:7;
};
};
extern union TWI_statusReg TWI_statusReg;
1) The main reason for writing such a union is convenience. Instead of manually applying bit masks every time you need to access specific bits, you now have aliases for those bits.
2) Unions let you refer to memory as if its components were different variables representing different types. Unions only allocate space for the biggest component inside them. So if you have
union Example {
char bytes[3];
uint32_t num;
};
such a union would take 4 bytes, since its biggest type uint32_t takes 4 bytes of space. It would probably make more sense to have a union like this though, since you're using that space anyway and it's more convenient:
union Example {
char bytes[4];
uint32_t num;
};
The bytes array will let you access the individual bytes of num.
Your guess is correct - writing value to all will set the corresponding bits of the union.
3) This construct is called a bit field; the number after the colon is the width of the member in bits. It is an optimization of memory usage: if you were to use a struct of 2 chars as flags it would take 2 bytes of memory, whereas two 1-bit bit fields take only 1 byte (and you still have 6 more "unused" bits).
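The three points can be tried out with a small sketch like the one below. The union is copied from the question; the function demo_status is a hypothetical helper added for illustration. The anonymous inner struct needs C11 or a compiler extension, and using unsigned char as a bit-field type is implementation-defined, though gcc and clang accept it.

```c
union TWI_statusReg {
    unsigned char all;                  /* the whole status byte */
    struct {
        unsigned char lastTransOK : 1;  /* 1 bit wide */
        unsigned char unusedBits  : 7;  /* the remaining 7 bits */
    };                                  /* anonymous struct (C11) */
};

/* Clear every flag at once, then set one flag by name. */
union TWI_statusReg demo_status(void)
{
    union TWI_statusReg reg;

    reg.all = 0;          /* all eight bits become 0 */
    reg.lastTransOK = 1;  /* sets one bit; which bit is implementation-defined */
    return reg;
}
```

After the call, reading all gives a non-zero byte because all and the bit fields occupy the same storage. Which bit of the byte lastTransOK maps to depends on the compiler, so code that must match a hardware register layout should check the compiler's documentation.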

Why does such a struct contain two array fields containing only one element?

Please Note: This question is not a duplicate of ( One element array in struct )
The following code is excerpted from the Linux kernel source (version: 3.14)
struct files_struct
{
atomic_t count;
struct fdtable __rcu *fdt;
struct fdtable fdtab;
spinlock_t file_lock ____cacheline_aligned_in_smp;
int next_fd;
unsigned long close_on_exec_init[1];
unsigned long open_fds_init[1];
struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
I just wonder why close_on_exec_init and open_fds_init are defined as arrays containing one element, rather than just defined as unsigned long close_on_exec_init; and unsigned long open_fds_init;.
These fields are an optimization so Linux doesn't have to perform as many allocations for a typical process that has no more than BITS_PER_LONG open file descriptors.
The close_on_exec_init field provides the initial storage for fdt->close_on_exec when a files_struct is allocated. (See dup_fd in fs/file.c.)
Each bit of fdt->close_on_exec is set if the corresponding file descriptor has the “close-on-exec” flag set. Thus Linux only needs to allocate additional space for fdt->close_on_exec if the process has more open file descriptors than the number of bits in an unsigned long.
The open_fds_init field serves the same function for the fdt->open_fds field. The fd_array field serves the same function for the fdt->fd field. (Note that fd_array has a size of BITS_PER_LONG.)
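The pattern described here, a pointer that starts out aimed at storage embedded in the same struct and only moves to the heap when more room is needed, can be sketched in user-space C as follows. This is a simplified, hypothetical stand-in (struct bitset, bitset_init and bitset_set are invented names), not the kernel's actual code.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified stand-in; not the kernel's structures. */
struct bitset {
    unsigned long *bits;     /* points at init[] until the set must grow */
    size_t nwords;
    unsigned long init[1];   /* inline storage for the common small case */
};

void bitset_init(struct bitset *s)
{
    s->init[0] = 0;
    s->bits = s->init;       /* array name decays to a pointer, no & needed */
    s->nwords = 1;
}

/* Set bit n, moving to heap storage only when the inline word is too small. */
int bitset_set(struct bitset *s, size_t n)
{
    size_t word = n / BITS_PER_WORD;

    if (word >= s->nwords) {
        size_t want = word + 1;
        unsigned long *p = calloc(want, sizeof *p);

        if (p == NULL)
            return -1;
        memcpy(p, s->bits, s->nwords * sizeof *p);
        if (s->bits != s->init)
            free(s->bits);   /* never free the embedded array */
        s->bits = p;
        s->nwords = want;
    }
    s->bits[word] |= 1UL << (n % BITS_PER_WORD);
    return 0;
}
```

As long as every bit index stays below BITS_PER_WORD, no allocation happens at all, which is the saving the answer describes for a typical process.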
The close_on_exec_init and open_fds_init fields formerly had type struct embedded_fd_set, but were changed to bare arrays in this commit. The commit message doesn't explain why the author chose to use one-element arrays instead of bare scalars. Perhaps the author (David Howells) simply wanted to avoid using the & operator.
My best guess: The addresses of these fields are used much more often than their actual values. In this case, making them size-1 arrays saves typing & every time their address is needed, since in C using the name of an array in an expression is in nearly all cases exactly equivalent to taking the address of its first element:
int x;
int y[1];
function_that_needs_address_of_int(&x);
function_that_needs_address_of_int(y);
function_that_needs_address_of_int(&y[0]); // Identical to previous line
(As others have pointed out in the comments, it can't be that the fields are being used as a hack for variable-length arrays, since there is more than one and they don't appear at the end of the struct.)
[EDIT: As pointed out by user3477950, an array name is not always identical to the address of its first element -- in certain contexts, like the argument to sizeof, they mean different things. (That's the only context I can think of for C; in C++, passing an array name as an argument can also enable a template parameter's type to be inferred to be a reference type.)]

Send a struct over a socket with correct padding and endianness in C

I have several structures defined to send between different operating systems over TCP networks.
Defined structures are:
struct Struct1 { uint32_t num; char str[10]; char str2[10];}
struct Struct2 { uint16_t num; char str[10];}
typedef Struct1 a;
typedef Struct2 b;
The data is stored in a text file.
Data Format is as such:
123
Pie
Crust
Struct1 a is stored as 3 separate parameters. However, struct2 is two separate parameters with both 2nd and 3rd line stored to the char str[] . The problem is when I write to a server over the multiple networks, the data is not received correctly. There are numerous spaces that separate the different parameters in the structures. How do I ensure proper sending and padding when I write to server? How do I store the data correctly (dynamic buffer or fixed buffer)?
Example of write: write(fd,&a, sizeof(typedef struct a)); Is this correct?
Problem Receive Side Output for struct2:
123( , )
0 (, Pie)
0 (Crust,)
Correct Output
123(Pie, Crust)
write(fd,&a, sizeof(a)); is not correct; at least not portably, since the C compiler may introduce padding between the elements to ensure correct alignment. sizeof(typedef struct a) doesn't even make sense.
How you should send the data depends on the specs of your protocol. In particular, protocols define widely varying ways of sending strings. It is generally safest to send the struct members separately, either with multiple calls to write or with a single writev(2). For instance, to send
struct { uint32_t a; uint16_t b; } foo;
over the network, where foo.a and foo.b already have the correct endianness, you would do something like:
struct iovec v[2];
v[0].iov_base = &foo.a;
v[0].iov_len = sizeof(uint32_t);
v[1].iov_base = &foo.b;
v[1].iov_len = sizeof(uint16_t);
writev(fp, v, 2);
Sending structures over the network is tricky. You might run into the following problems:
1) Byte endianness issues with integers.
2) Padding introduced by your compiler.
3) String parsing (i.e. detecting string boundaries).
If performance is not your goal, I'd suggest creating encoders and decoders for each struct to be sent and received (ASN.1, XML or custom). If performance is really required, you can still use structures and solve (1) by fixing an endianness (i.e. network byte order) and ensuring your integers are stored as such in those structures, and (2) by fixing a compiler and using its pragmas or attributes to enforce a "packed" structure.
GCC, for example, uses __attribute__((__packed__)) as such:
struct mystruct {
uint32_t a;
uint16_t b;
unsigned char text[24];
} __attribute__((__packed__));
(3) is not easy to solve. Using null-terminated strings in a network protocol and depending on the terminator being present would make your code vulnerable to several attacks. If strings need to be involved, I'd use a proper encoding method such as the ones suggested above.
The easy way would be to write two functions for each structure: one to convert from textual representation to the struct and one to convert a struct back to text. Then you just send the text over the network and on the receiving side convert it to your structures. That way endianness does not matter.
There are conversion functions to ensure portability of binary integers across a network. Use htons, htonl, ntohs and ntohl to convert 16 and 32 bit integers from host to network byte order and vice versa.
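A minimal sketch of how these conversion functions can be combined with field-by-field copying, so that neither host byte order nor compiler padding reaches the wire. The struct is the mystruct from the answer above without the packed attribute; the names mystruct_pack, mystruct_unpack and MYSTRUCT_WIRE_SIZE are hypothetical, and the fixed 24-byte text field is an assumption of this example rather than a recommendation for sending strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */

struct mystruct {
    uint32_t a;
    uint16_t b;
    unsigned char text[24];
};

#define MYSTRUCT_WIRE_SIZE (4 + 2 + 24)  /* bytes on the wire, no padding */

/* Write the fields one by one in network byte order. */
void mystruct_pack(const struct mystruct *s, unsigned char *out)
{
    uint32_t a = htonl(s->a);
    uint16_t b = htons(s->b);

    memcpy(out, &a, 4);
    memcpy(out + 4, &b, 2);
    memcpy(out + 6, s->text, 24);
}

/* Inverse of mystruct_pack: rebuild the struct from received bytes. */
void mystruct_unpack(struct mystruct *s, const unsigned char *in)
{
    uint32_t a;
    uint16_t b;

    memcpy(&a, in, 4);
    memcpy(&b, in + 4, 2);
    memcpy(s->text, in + 6, 24);
    s->a = ntohl(a);
    s->b = ntohs(b);
}
```

The sender would then call write(fd, wire, MYSTRUCT_WIRE_SIZE) on the packed buffer instead of writing the struct itself, and the receiver would read exactly that many bytes before unpacking.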

Resources