I'm writing a C program that passes a structure over a socket.
This is my struct:
typedef struct{
char type; //message type
char* sender; //sender
char* receiver; //receiver
unsigned int msglen; //msg length
char* msg; //text
} msg_t;
this is my send function:
void send_message(int socket, char* msg)
{
msg_t message;
bzero(&message,sizeof(message));
message.msg = msg;
if(send(socket,&message,sizeof(msg_t),0) < 0)
{
perror("ERROR: send fail\n");
}
}
and this is my receive function:
msg_t rec_message(int socket)
{
msg_t buff;
bzero(&buff,sizeof(buff));
if(recv(socket,&buff,sizeof(buff),0) < 0)
{
perror("ERROR: receive failed\n");
}
return buff;
}
When I send messages as plain strings everything works fine, but when I switch to the structure the client seems to send the message and then gives me this:
ERROR: receive failed: connection reset by peer
and the server this:
ERROR: receive failed: invalid argument
What am I doing wrong?
The question has several problems that need to be addressed. Perhaps it would be best to focus, first, on the msg_t structure itself. Here is a model of what it probably looks like; both in memory, as well as 'on the wire' as it is transmitted:
According to the above, msg_t is 40 bytes long. This can be confirmed by printing out its size:
printf("sizeof(msg_t): %zu\n", sizeof(msg_t));
"So what's with all the empty white blocks?"
In order to make things speedy at run time, the compiler 'aligns' each field in the 'msg_t' structure at an offset that is "natural/native" to the CPU addressing architecture. On my 64-bit system, the pointer fields must sit on eight-byte offsets, so the compiler leaves empty, unused space after the smaller 'type' and 'msglen' fields. Notice that the offsets of the structure fields are: 0, 8, 16, 24, 32; all multiples of 8 bytes.
On a 32-bit system, you might find that these offsets are at multiples of 4 bytes.
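The layout described above can be checked on any given machine rather than assumed. The following is a minimal sketch, not part of the original question; the helper names padding_after_type and print_msg_layout are hypothetical, and the numbers printed depend on the compiler and target.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

typedef struct {
    char type;            /* message type */
    char *sender;         /* sender */
    char *receiver;       /* receiver */
    unsigned int msglen;  /* msg length */
    char *msg;            /* text */
} msg_t;

/* The first member of a structure always starts at offset 0. */
static_assert(offsetof(msg_t, type) == 0, "first member starts at offset 0");

/* Number of filler bytes the compiler placed between 'type' and 'sender'. */
size_t padding_after_type(void)
{
    return offsetof(msg_t, sender) - (offsetof(msg_t, type) + sizeof(char));
}

/* Print the total size and the offset of every field. */
void print_msg_layout(void)
{
    printf("sizeof(msg_t): %zu\n", sizeof(msg_t));
    printf("type     at offset %zu\n", offsetof(msg_t, type));
    printf("sender   at offset %zu\n", offsetof(msg_t, sender));
    printf("receiver at offset %zu\n", offsetof(msg_t, receiver));
    printf("msglen   at offset %zu\n", offsetof(msg_t, msglen));
    printf("msg      at offset %zu\n", offsetof(msg_t, msg));
    printf("padding after type: %zu\n", padding_after_type());
}
```

Calling print_msg_layout() from main() on a typical 64-bit system should show the 0, 8, 16, 24, 32 offsets discussed above; a 32-bit build will usually show smaller ones.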
While an 8-byte alignment of structure fields is optimum for memory access, it is not so great when structures are sent over the wire. It is preferable for wire/protocol structures to be aligned at 1-byte; thus eliminating unused 'filler' bytes in the structure.
One way to change the alignment of a structure (supported by many compilers, but perhaps not defined by the C language itself) is '#pragma pack()'; which is used as shown below:
#pragma pack(1)
typedef struct{
char type; //message type
char* sender; //sender
char* receiver; //receiver
unsigned int msglen; //msg length
char* msg; //text
} msg_t;
#pragma pack()
In the above structure definition, the first '#pragma pack(1)' causes the following structure to be 1-byte aligned. The next '#pragma pack()' returns the compiler to its default alignment (8 bytes on my system). This "packed" structure looks like this:
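As a rough check of what packing does, the sketch below asserts at compile time that the packed structure is exactly as large as the sum of its members. It assumes a compiler that honours '#pragma pack' (gcc, clang and MSVC do) and C11 static_assert; the names packed_msg_t and packed_msg_size are made up for this illustration.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t, offsetof */

#pragma pack(1)
typedef struct {
    char type;            /* message type */
    char *sender;         /* sender */
    char *receiver;       /* receiver */
    unsigned int msglen;  /* msg length */
    char *msg;            /* text */
} packed_msg_t;
#pragma pack()

/* With 1-byte packing there is no filler: the size is the sum of the members. */
static_assert(sizeof(packed_msg_t) ==
                  sizeof(char) + 3 * sizeof(char *) + sizeof(unsigned int),
              "packed_msg_t should contain no padding");

/* Convenience accessor so the size can be printed or compared at run time. */
size_t packed_msg_size(void)
{
    return sizeof(packed_msg_t);
}
```

On a 64-bit system this gives 29 bytes instead of 40. Note that the pointer members are now misaligned, which some CPUs handle slowly or not at all.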
Next, examine the fields in the structure. The 'sender' field is a 'char *'. A 'char *' is an address where the sender string can be found on the 'sending' machine (or endpoint).
To be blunt, this 'address' is of no value at all to the 'receiver' machine (or endpoint); as the 'receiver' has no access to the memory of the 'sender'.
The same is true of the 'receiver' field; and the 'msg' field. All of these are addresses of strings on the 'sender' machine; which are of no value to the 'receiver' machine.
Most likely, the 'intent' is to send the actual 'sender', 'receiver' and 'msg' strings. To do that, a structure similar to the following might be used:
#pragma pack(1)
typedef struct{
char type; //message type
char sender[15]; //sender
char receiver[15]; //receiver
char msg[30]; //text
} msg_t;
#pragma pack()
This structure looks like this:
Now, the actual strings are in the structure; not just their address in memory. This will do what was actually intended.
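A minimal sketch of filling such a structure follows. The helpers copy_field and make_msg are hypothetical names introduced here; they truncate over-long input and always leave each field NUL-terminated, which is one possible policy, not the only one.

```c
#include <assert.h>
#include <string.h>

#pragma pack(1)
typedef struct {
    char type;          /* message type */
    char sender[15];    /* sender */
    char receiver[15];  /* receiver */
    char msg[30];       /* text */
} msg_t;
#pragma pack()

/* Copy src into a fixed-size field: truncate if needed, zero-fill the rest,
   and always keep a terminating NUL inside the field. */
void copy_field(char *dst, size_t dst_size, const char *src)
{
    size_t n;
    assert(dst && src && dst_size > 0);
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;
    memset(dst, 0, dst_size);
    memcpy(dst, src, n);
}

/* Build a message whose strings live inside the structure itself. */
msg_t make_msg(char type, const char *sender, const char *receiver,
               const char *text)
{
    msg_t m;
    m.type = type;
    copy_field(m.sender, sizeof m.sender, sender);
    copy_field(m.receiver, sizeof m.receiver, receiver);
    copy_field(m.msg, sizeof m.msg, text);
    return m;
}
```

The resulting object contains no pointers, so passing &m and sizeof m to send() transmits the text itself.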
Unfortunately, it does limit the length of each string; and it also contains a lot of unused/wasted space. Perhaps it would be nice to remove that limitation and allow more flexibility. It might be better to send these fields like this:
Notice that each 'variable-length' string is prefixed with one byte that indicates the length of the string that follows. (This is how strings are stored in the PASCAL language). This byte allows the following string to be from 0-255 bytes long. No wasted space on the wire.
Unfortunately, this 'wire format' cannot be produced directly using C structures.
Let's go back now to the structure defined in the question; with some slight modification:
typedef struct{
char type; //message type
char* sender; //sender
char* receiver; //receiver
char* msg; //text
} msg_t;
Notice that I have returned the structure to its natural/native 8-byte alignment by eliminating the '#pragma pack()' stuff. I have also removed the 'msglen' field (it is not really needed).
Most likely, the sender, receiver, and msg fields of the structure will be initialized to point to strings (perhaps allocated with malloc(), etc.). What you do to send this structure over the wire, using the efficient layout above, is to send each field individually.
First, send the one-byte 'type'. Then send the one-byte length of the 'sender' string (i.e., strlen(sender) + 1, so that the terminating NUL is included). Then send the 'sender' string, followed by the one-byte length of the 'receiver' string, followed by the 'receiver' string, followed by the one-byte length of the 'msg' string, followed by the 'msg' string.
On the 'receiver' endpoint, you first read the one-byte 'type' (which would clue you in that there will be three length-prefixed strings to follow). Reading the next byte would tell you the size of the following string (and allow you to malloc() memory for the 'sender' field of the msg_t structure at the receiver endpoint). Then read the 'sender' string into exactly the right-sized, malloc()ed memory. Do the same to read the receiver string length and the receiver string; and finally, the msg length and string.
If you find a PASCAL string (limited to 255 bytes) a bit tight, change the length prefix from one byte to multiple bytes.
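The field-by-field scheme described above can be sketched as a pair of functions that work on a byte buffer, which keeps the wire format separate from the socket calls. This is only an illustration under stated assumptions: encode_msg, decode_msg and the two static helpers are hypothetical names, and the length byte counts the terminating NUL, so each string is limited to 254 characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char type;       /* message type */
    char *sender;    /* sender */
    char *receiver;  /* receiver */
    char *msg;       /* text */
} msg_t;

/* Append one length-prefixed string (the length byte counts the NUL).
   Returns the new write position, or NULL if the string does not fit. */
static unsigned char *put_string(unsigned char *p, unsigned char *end,
                                 const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > 255 || (size_t)(end - p) < 1 + n)
        return NULL;
    *p++ = (unsigned char)n;
    memcpy(p, s, n);
    return p + n;
}

/* Serialize m into out; returns the number of bytes written, 0 on failure. */
size_t encode_msg(const msg_t *m, unsigned char *out, size_t cap)
{
    unsigned char *p = out, *end = out + cap;
    assert(m && out);
    if (cap < 1)
        return 0;
    *p++ = (unsigned char)m->type;
    if (!(p = put_string(p, end, m->sender))) return 0;
    if (!(p = put_string(p, end, m->receiver))) return 0;
    if (!(p = put_string(p, end, m->msg))) return 0;
    return (size_t)(p - out);
}

/* Read one length-prefixed string into freshly malloc()ed memory. */
static const unsigned char *get_string(const unsigned char *p,
                                       const unsigned char *end, char **out)
{
    size_t n;
    if (p == end)
        return NULL;
    n = *p++;
    if (n == 0 || (size_t)(end - p) < n || p[n - 1] != '\0')
        return NULL;
    *out = malloc(n);
    if (!*out)
        return NULL;
    memcpy(*out, p, n);
    return p + n;
}

/* Rebuild a msg_t from received bytes; 0 on success, -1 on bad input.
   On success the caller free()s the three strings. */
int decode_msg(const unsigned char *in, size_t len, msg_t *m)
{
    const unsigned char *p = in, *end = in + len;
    assert(in && m);
    m->sender = m->receiver = m->msg = NULL;
    if (len < 1)
        return -1;
    m->type = (char)*p++;
    if ((p = get_string(p, end, &m->sender)) &&
        (p = get_string(p, end, &m->receiver)) &&
        (p = get_string(p, end, &m->msg)))
        return 0;
    free(m->sender);
    free(m->receiver);
    free(m->msg);
    m->sender = m->receiver = m->msg = NULL;
    return -1;
}
```

The encoded buffer would then be handed to send(); since a stream socket may deliver fewer bytes than requested, both send() and recv() normally need to be called in a loop until the whole message has been transferred.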
The following logic works fine but I'm uncertain of the caveats with what the standard says and whether it's totally safe to cast a struct to uint8_t * or char * to send to a message queue (which itself takes in a pointer to the buffer as well) or even a function?
My understanding is as long as uint8_t is considered a byte (which char is), it could be used to address any set of bytes
typedef struct
{
uint8_t a;
uint8_t b;
uint16_t c;
} __attribute__((packed)) Pkt;
int main()
{
Pkt pkt = {.a = 4, .b = 12, .c = 300};
mq_send(mq, (char *) &pkt, sizeof(pkt), 0);
}
Perhaps it's similar to passing a cast pointer to a function (on the receiver end), and it's parsing the data according to bytes
typedef struct
{
uint8_t a;
uint8_t b;
uint16_t c;
} __attribute__((packed)) Pkt;
void foo(uint8_t *ptr)
{
uint8_t a = *ptr++;
uint8_t b = *ptr++;
uint16_t str = (*(ptr+1) << 8) | *ptr;
printf ("A: %d, B: %d, C: %d\n", a, b, str);
}
int main()
{
Pkt pkt = {.a = 4, .b = 12, .c = 300};
foo((uint8_t *) &pkt);
}
C deliberately allows accessing the bytes of an object and supports communicating objects by transmitting the bytes that represent them and reconstructing them from the transmitted bytes. However, it should be done correctly, and there are some issues to deal with.
A character type should be used.
The preferred type to work with is unsigned char. This is preferred for two reasons:
The C standard defines the behavior of using character types to access the representations of objects. The character types are char, signed char, and unsigned char. The standard does not require that uint8_t be a character type. Although it may have the same size and general properties of unsigned char, it may be an extended integer type rather than an alias of unsigned char (or of char). In this case, the C standard does not define the behavior of accessing the bytes of an object with uint8_t.
unsigned char is preferred over char or signed char to avoid problems with signed integers in various C operations.
The sender and the receiver must agree on the representations of the objects or the protocol used for sending them.
If the sender and the receiver are compiled with the same C implementation using the same definitions for the objects being transmitted (such as the same structure definitions), they will agree on the representations. Between diverse C implementations, though, it is necessary to ensure there is clear agreement on how the transmitted bytes represent objects. As shown in your code, the structure is packed, which should take care of the problem that there may be padding inside structures. Other considerations include:
The order of bytes within integers. Little-endian-first (bytes in order from least significant to most) and big-endian-first (the reverse) are common, although others are possible. Big endian is most common in network protocols.
Representations of non-integers, such as floating-point formats. The IEEE-754 floating-point standard specifies some interchange formats, which are very widely used.
Structures are laid out identically, including the types of members.
Theoretically, the order of bits within bytes must be agreed, but this is not an issue if the network service is operating at the byte level.
Note that, of course, some objects are inherently impossible to send via bytes representations due to needing context in the running program, such as pointers and file handles.
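To make the byte-order point concrete, here is a small sketch that writes and reads a 16-bit value in big-endian (network) order using shifts, so the result does not depend on the host's endianness. The function names put_u16_be and get_u16_be are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store v at p[0..1], most significant byte first (network byte order). */
void put_u16_be(unsigned char *p, uint16_t v)
{
    assert(p);
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFF);
}

/* Load a 16-bit big-endian value from p[0..1]. */
uint16_t get_u16_be(const unsigned char *p)
{
    assert(p);
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}
```

For the value 300 (0x012C) this puts 0x01 in the first byte and 0x2C in the second on every host, whereas copying the raw bytes of a uint16_t would give the opposite order on a little-endian machine.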
Additional note
Another hazard to guard against is interpreting a byte buffer as another object. The C standard defines the behavior of accessing the bytes of an object (for example, something defined as a structure) using a character type, but it does not define the reverse. Sometimes naïve programmers will create an array of character type, read a network message into it, and then convert a pointer into the array to a pointer to a structure type. This runs afoul of two issues:
The conversion is not defined if the alignment is not correct. (This should not be a problem with a packed structure, which we would expect to have an alignment requirement of one byte.)
Accessing an array of characters as a different, incompatible type is not defined by the C standard.
The proper way to reassemble received bytes into an object is to copy them either into memory declared as the desired type or memory allocated (as with malloc) for the purpose of interpreting it as the intended object. This can be done by copying bytes from a buffer into the target memory or by directly passing the target memory to the network read routine, for it to fill in the bytes directly.
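A minimal sketch of that copying approach, using the Pkt type from the question, might look like this. The function name pkt_from_bytes is hypothetical, and __attribute__((packed)) assumes gcc or clang. It only handles the "same implementation on both ends" case; byte order still has to be agreed separately.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t a;
    uint8_t b;
    uint16_t c;
} __attribute__((packed)) Pkt;

/* Copy received bytes into a real Pkt object instead of casting the
   buffer pointer to Pkt *. Returns 0 on success, -1 if buf is too short. */
int pkt_from_bytes(Pkt *out, const unsigned char *buf, size_t len)
{
    assert(out && buf);
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

The other option mentioned above is to skip the intermediate buffer and pass the address of a Pkt object directly to the receive routine.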
I've got the following struct:
struct fetch_info_t {
u_int8_t grocery_type;
u_int8_t arg[1024];
} __attribute__((packed));
I'd like to send this over a socket to a server, to request data. I'd very much like to avoid any libraries, such as protobuf.
grocery_type can be any value between 1 and 255. Some grocery types, say type 128, must provide additional information. It's not enough to provide type 128; I'd also like to provide Cheeses as a string. That said, type 129 must provide a number, a u_int32_t, and not a string, unlike 128.
Basically I've allocated 1024 bytes for the additional information the system may require. The question is, how do I send it over a socket, or more specifically, how do I populate arg with the right information in a system-independent way? I know htonl could be used on the number, but how do I actually set the buffer value to that?
I'd imagine that the info sending would actually eventually be casting the struct pointer to unsigned char array and send it like that over a socket. Let me please know if there's a better way.
You cannot directly assign the 32-bit value into the array
because the correct alignment is not guaranteed.
memcpy() will just copy the bytes, with no alignment problem.
u_int32_t the_value=htonl( ... );
struct fetch_info_t the_info;
the_info.grocery_type=129;
memcpy(the_info.arg, &the_value, sizeof(the_value));
Then, because your structure is packed, you can send it with
send(my_socket, &the_info,
sizeof(the_info.grocery_type)+sizeof(the_value), 0);
In case you need to send a string
char *the_text= ... ;
size_t the_size=strlen(the_text)+1;
struct fetch_info_t the_info;
the_info.grocery_type=128;
memcpy(the_info.arg, the_text, the_size);
send(my_socket, &the_info,
sizeof(the_info.grocery_type)+the_size, 0);
Note that the '\0' is transmitted here.
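The receiving side can undo this with the same memcpy() idea. The sketch below assumes the bytes of one request have already been read into a buffer, that type 129 carries a 32-bit number in network byte order as in the code above, and a POSIX system for ntohl(); the function name parse_number_request is hypothetical.

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the 32-bit argument of a type-129 request from received bytes.
   Returns 0 on success, -1 if the buffer is too short or the type differs. */
int parse_number_request(const unsigned char *buf, size_t len, uint32_t *value)
{
    uint32_t net;
    assert(buf && value);
    if (len < 1 + sizeof net || buf[0] != 129)
        return -1;
    memcpy(&net, buf + 1, sizeof net);  /* no alignment requirement */
    *value = ntohl(net);                /* network order -> host order */
    return 0;
}
```

Because the number of bytes that follow depends on grocery_type, the receiver has to read the type byte first and then decide how much more to read.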
I'm trying to create a C client for dalmatinerdb but I'm having trouble understanding how to combine the variables, write them to a buffer and send it to the database. The fact that dalmatinerdb is written in Erlang makes it more difficult. However, by looking at a Python client for dalmatinerdb I have (probably) found the necessary variable sizes and order.
The erlang client has a function called "encode", see below:
encode({stream, Bucket, Delay}) when
is_binary(Bucket), byte_size(Bucket) > 0,
is_integer(Delay), Delay > 0, Delay < 256->
<<?STREAM,
Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;
According to the official dalmatinerdb protocol we can see the following:
-define(STREAM, 4).
-define(DELAY_SIZE, 8). /bits
-define(BUCKET_SS, 8). /bits
Let's say I would like to create this kind of structure in C,
would it look something like the following:
struct package {
unsigned char mode[1]; // = "4"
unsigned char delay[1]; // = for example "5"
unsigned char bucketNameSize[1]; // = "5"
unsigned char bucketName[1]; // for example "Test1"
};
Update:
I realized that the dalmatinerdb frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't give me any clue whether it's right or wrong. Therefore I will try to create a second struct with the actual values.
The Erlang code snippet which encodes values looks like this:
encode({stream, Metric, Time, Points}) when
is_binary(Metric), byte_size(Metric) > 0,
is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
is_integer(Time), Time >= 0->
<<?SENTRY,
Time:?TIME_SIZE/?SIZE_TYPE,
(byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;
The different sizes:
-define(SENTRY, 5)
-define(TIME_SIZE, 64)
-define(METRIC_SS, 16)
-define(DATA_SS, 32)
Which gives me:
<<?5,
Time:?64/?SIZE_TYPE,
(byte_size(Metric)):?16/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?32/?SIZE_TYPE, Points/binary>>;
My guess is that my struct containing a value should look like this:
struct Package {
unsigned char sentry;
uint64_t time;
unsigned char metricSize;
uint16_t metric;
unsigned char pointSize;
uint32_t point;
};
Any comments on this structure?
The binary created by the encode function has this form:
<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>
First let's replace all the preprocessor macros with their actual values:
<<4, Delay:8/unsigned-integer,
(byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>
Now we can more easily see that this binary contains:
a byte of value 4
the value of Delay as a byte
the size of the Bucket binary as a byte
the value of the Bucket binary
Because of the Bucket binary at the end, the overall binary is variable-sized.
A C99 struct that resembles this value can be defined as follows:
struct EncodedStream {
unsigned char mode;
unsigned char delay;
unsigned char bucket_size;
unsigned char bucket[];
};
This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value set in the bucket_size field, and you are presumably using this structure by allocating memory large enough to hold the fixed-size fields together with the variable-sized bucket field, where bucket itself is allocated to hold bucket_size bytes. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
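A sketch of the allocation described above follows. The function make_stream is a hypothetical name; the range checks mirror the guards in the Erlang encode clause (non-empty bucket, delay greater than 0), and the bucket name is stored without a terminating NUL because the size byte already gives its length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];  /* C99 flexible array member */
};

/* Allocate and fill an EncodedStream for the given bucket name.
   Returns NULL on an invalid argument or allocation failure.
   The caller free()s the result. */
struct EncodedStream *make_stream(unsigned char delay, const char *bucket)
{
    size_t n;
    struct EncodedStream *s;
    assert(bucket);
    n = strlen(bucket);
    if (delay == 0 || n == 0 || n > 255)
        return NULL;
    s = malloc(sizeof *s + n);  /* fixed fields plus n bucket bytes */
    if (!s)
        return NULL;
    s->mode = 4;                /* value of ?STREAM */
    s->delay = delay;
    s->bucket_size = (unsigned char)n;
    memcpy(s->bucket, bucket, n);
    return s;
}
```

The number of bytes to put on the wire is then sizeof(struct EncodedStream) + s->bucket_size. Since every fixed member is a single byte, no padding is expected in front of bucket.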
Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.
The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:
struct Wrong {
unsigned char sentry;
uint64_t time;
uint16_t metric_size;
unsigned char metric[];
uint32_t points_size;
unsigned char points[];
};
There are two problems with the Wrong struct:
Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, the C compiler will align the time field on an 8-byte boundary, which means there will be padding of 7 bytes following the sentry field. But the Erlang binary contains no such padding.
Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.
Solving the padding and alignment issue requires using a packed struct, which you can achieve with compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be solved by using two structs instead:
typedef struct __attribute((__packed__)) {
unsigned char sentry;
uint64_t time;
uint16_t size;
unsigned char metric[];
} Metric;
typedef struct __attribute((__packed__)) {
uint32_t size;
unsigned char points[];
} Points;
Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work, but if not — and it's likely you're not — the data values your C code reads and writes won't match Erlang.
Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory — let's call it p — to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:
Points* points = (Points*)(p + sizeof(Metric) + <length of Metric.metric>);
Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
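As an alternative to overlaying two packed structs on the buffer, the same bytes can be written one field at a time, which sidesteps both packing attributes and host endianness. This is a sketch of that swapped-in technique rather than the struct-based approach described above; encode_entry and put_be are hypothetical names, and the field widths are taken from the defines quoted in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write the low 'nbytes' bytes of v at p, most significant byte first.
   Returns the position just past the written bytes. */
static unsigned char *put_be(unsigned char *p, uint64_t v, size_t nbytes)
{
    size_t i;
    for (i = 0; i < nbytes; i++)
        p[i] = (unsigned char)(v >> (8 * (nbytes - 1 - i)));
    return p + nbytes;
}

/* Build <<5, Time:64, MetricLen:16, Metric, PointsLen:32, Points>>.
   Returns a malloc()ed buffer (caller frees) and stores its length in
   *out_len, or returns NULL on allocation failure. */
unsigned char *encode_entry(uint64_t time,
                            const void *metric, uint16_t metric_len,
                            const void *points, uint32_t points_len,
                            size_t *out_len)
{
    size_t total = 1 + 8 + 2 + (size_t)metric_len + 4 + (size_t)points_len;
    unsigned char *buf, *p;
    assert(metric && points && out_len);
    buf = malloc(total);
    if (!buf)
        return NULL;
    p = buf;
    *p++ = 5;                      /* value of ?SENTRY */
    p = put_be(p, time, 8);        /* 64-bit time, big-endian */
    p = put_be(p, metric_len, 2);  /* 16-bit metric size */
    memcpy(p, metric, metric_len);
    p += metric_len;
    p = put_be(p, points_len, 4);  /* 32-bit points size */
    memcpy(p, points, points_len);
    *out_len = total;
    return buf;
}
```

The Erlang guard also requires the points binary to be a multiple of ?DATA_SIZE bytes; that value is not shown in the question, so the sketch leaves the check to the caller.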
I am building this file system in C. At the moment I am on the first step of the assignment which is just to create a simple file system that works in memory. My question is more based around C than it is around a Unix File System. I am trying to "emulate" a raw disk. I have the following structure:
struct disk {
void *data;
unsigned int numOfBlocks;
};
Let's pretend a block on this disk will be 512 Bytes (like the original Unix file system). I have some functions defined to create a disk, read from a disk, and write to a disk. It is then my job to implement the various things such as data blocks, i-node blocks, the super block etc.
Look at the void *data variable above. I want this to be a two-dimensional array. It should be an array of block arrays. So... what makes the most sense to me is to use something like this.
unsigned char *data[30][512]; // Assuming the disk holds 30 blocks of 512 bytes each
Here comes the question: If I have other structures defined to represent an i-node, a super block, or a data block, and each also has a struct size of 512 bytes, how can I properly cast this unsigned char * to the i-node struct, or the data struct, etc?
Thanks.
I'm not sure I understand the question.
From what you say:
My question isn't how to access the characters in the data variable...it's how to take a 512 byte array of unsigned char for instance and convert it to some other type of struct that is 512 bytes long.
If you have:
unsigned char block[512];
and:
//total struct size: 512 bytes
struct something {
//members here
};
You could do this:
int main(void)
{
struct something *ptr;
ptr = block;
return 0;
}
You can just use the array bytes to hold the structure data. To avoid warnings, cast the address:
ptr = (struct something *)block;
After you assigned memory to the pointer you can use it as you normally would.
note: I may be wrong since I'm a beginner.
A void * can point to anything, so if you control what the void * means, then the definition you have is enough.
To access data as a two-dimensional array, you just need to cast it to the right type: (char (*)[512])data
or
char (*array)[512] = data;
byte = array[x][y];
The compiler does not need to know the size of the first dimension; it is your responsibility to make sure your program never accesses an invalid location.
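An alternative to casting a block pointer to a struct pointer is to copy a block into, or out of, an object of the desired type with memcpy(), which avoids any alignment or aliasing questions. The following is only a sketch: struct inode_block is a made-up 512-byte structure, and read_block / write_block are hypothetical helper names.

```c
#include <assert.h>  /* assert, static_assert (C11) */
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512

struct disk {
    void *data;
    unsigned int numOfBlocks;
};

/* Hypothetical on-disk structure, exactly one block long. */
struct inode_block {
    unsigned int size;
    unsigned char payload[BLOCK_SIZE - sizeof(unsigned int)];
};
static_assert(sizeof(struct inode_block) == BLOCK_SIZE,
              "an on-disk structure must fill exactly one block");

/* Copy block n of the disk into *out (any BLOCK_SIZE-byte object).
   Returns 0 on success, -1 if n is out of range. */
int read_block(const struct disk *d, unsigned int n, void *out)
{
    assert(d && d->data && out);
    if (n >= d->numOfBlocks)
        return -1;
    memcpy(out, (const unsigned char *)d->data + (size_t)n * BLOCK_SIZE,
           BLOCK_SIZE);
    return 0;
}

/* Copy a BLOCK_SIZE-byte object into block n of the disk. */
int write_block(struct disk *d, unsigned int n, const void *in)
{
    assert(d && d->data && in);
    if (n >= d->numOfBlocks)
        return -1;
    memcpy((unsigned char *)d->data + (size_t)n * BLOCK_SIZE, in, BLOCK_SIZE);
    return 0;
}
```

With this shape the disk stays a flat run of numOfBlocks * 512 bytes, and each caller decides which structure type a given block is read into.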
I'm stuck with a problem reading bytes in my C TCP socket server, which receives requests from a Python client. I have the following struct as my receive template:
struct ofp_connect {
uint16_t wildcards; /* identifies ports to use below */
uint16_t num_components;
uint8_t pad[4]; /* Align to 64 bits */
uint16_t in_port[0];
uint16_t out_port[0];
struct ofp_tdm_port in_tport[0];
struct ofp_tdm_port out_tport[0];
struct ofp_wave_port in_wport[0];
struct ofp_wave_port out_wport[0];
};
OFP_ASSERT(sizeof(struct ofp_connect) == 8);
I can read the first two 16-bit fields properly, but my problem is that in_port[0], after the pad field, seems to be wrong. The way it's currently being read is
uint16_t portwin, portwout, * wportIN;
wportIN = (uint16_t*)&cflow_mod->connect.in_port; //where cflow_mod is the main struct which encompasses connect struct template described above
memcpy(&portwin, wportIN, sizeof(portwin) );
DBG("inport:%d:\n", ntohs(portwin));
Unfortunately this doesn't give me the expected inport number. I can check in Wireshark that the client is sending the right packet format, but I feel the way I read the in/out port is wrong. Or is it because of the way Python sends the data? Can you provide some advice on where and why I'm going wrong? Thanks in advance.
The declaration of struct ofp_connect violates the following clause of the ISO C standard:
6.7.2.1 Structure and union specifiers ... 18 As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
Note that in your case in_port and out_port would have to be declared as in_port[] and out_port[] to take advantage of the clause above, in which case you would have two flexible array members, which the clause prohibits. The zero-length array declaration is a convention adopted by many compilers (including gcc, for example) which has the same semantics, but in your case both in_port and out_port share the same space (essentially whatever bytes follow the ofp_connect structure). Moreover, for this to work, you have to allocate some space after the structure for the flexible array members. Since, as you said, struct ofp_connect is part of a larger structure, accessing in_port returns the 'value' stored in the containing structure's member that follows the connect sub-struct.
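One way around the zero-length members is to read the ports straight out of the received byte buffer at computed offsets. The sketch below assumes, without confirmation from the question, that the buffer starts at the ofp_connect header and that the 16-bit ports follow its 8 fixed bytes back to back in network byte order; read_port and OFP_CONNECT_HEADER_LEN are hypothetical names.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* wildcards (2) + num_components (2) + pad (4); assumed fixed header size */
#define OFP_CONNECT_HEADER_LEN 8

/* Read the index-th 16-bit port that follows the fixed header in msg.
   Returns 0 on success, -1 if the message is too short. */
int read_port(const unsigned char *msg, size_t msg_len, size_t index,
              uint16_t *port)
{
    size_t off = OFP_CONNECT_HEADER_LEN + index * sizeof(uint16_t);
    uint16_t net;
    assert(msg && port);
    if (msg_len < off + sizeof net)
        return -1;
    memcpy(&net, msg + off, sizeof net);  /* safe for any alignment */
    *port = ntohs(net);                   /* network order -> host order */
    return 0;
}
```

If the real message places other data between the header and the ports, the offset calculation has to change accordingly; comparing the offsets against the Wireshark capture is a quick way to confirm them.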