Sending structs using TCP sockets - file

I am reading a file into structs and then writing them to the server.
Here is a snippet of the code:
struct b{
uint16_t num;
char str[10];
} x;
struct a{
uint32_t pid;
char str1[10];
char str2[10];
} y;
while(fscanf(fp,"%s",buff) != EOF)
while(1){
c = getchar();
if (c == '\n')
break;
else
buff[i]= c ;
i++;
write(fd, &b,sizeof(b))
Text File Format
123
George
Lee
How do I read the file and store its contents in each struct?
When I write to the server, should it look like this?
write(fd, &y, sizeof(struct a));
write(fd, &x, sizeof(struct b));
How do I ensure correct padding and endianness?
This is how I run the program: ./a.out IP PORT < file.txt

When you say "read and store the file to each struct", could you clarify that question?
To ensure the right padding and endianness, you have to do two things:
Send the struct one field at a time. It is a pain, but because each integer member is declared with a fixed-width type (uint16_t and its family), sizeof() of each member gives the same answer on every platform.
To ensure endianness, use the host-to-network-order family of functions. See: http://www.gnu.org/s/hello/manual/libc/Byte-Order.html
That means calling htons() (host to network short) or htonl() (host to network long) on each integer member of your struct when sending, and ntohs() or ntohl() when receiving. The char arrays need no conversion.
Often people ignore this; if both machines are modern Intel boxes, which is true for the vast majority of us, you can get away with the code you have for sending structs. But as you mentioned, that doesn't guarantee padding or endianness!
You could also use the #pragma pack() directive to specify how padding should be handled. It is a compiler extension, not part of the C specification, so it works only if you can rely on a toolchain that supports it (such as GNU) for both your client and server.
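As a minimal sketch of the field-by-field approach, assuming the struct b from the question and a wire layout of two bytes for num followed by the ten bytes of str. The names b_pack, b_unpack and B_WIRE_SIZE are made up for illustration:

```c
#include <arpa/inet.h> /* htons, ntohs (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct b {
    uint16_t num;
    char str[10];
};

/* Assumed wire layout: 2 bytes for num, then 10 bytes for str, no padding. */
enum { B_WIRE_SIZE = 2 + 10 };

/* Copy x into out one field at a time, with num in network byte order. */
static void b_pack(const struct b *x, unsigned char out[B_WIRE_SIZE])
{
    assert(x != NULL && out != NULL);
    uint16_t net_num = htons(x->num);
    memcpy(out, &net_num, sizeof net_num);
    memcpy(out + 2, x->str, sizeof x->str);
}

/* Reverse of b_pack: rebuild the struct from the 12 wire bytes. */
static void b_unpack(const unsigned char in[B_WIRE_SIZE], struct b *x)
{
    assert(in != NULL && x != NULL);
    uint16_t net_num;
    memcpy(&net_num, in, sizeof net_num);
    x->num = ntohs(net_num);
    memcpy(x->str, in + 2, sizeof x->str);
}
```

The 12-byte buffer, not the struct itself, is what you would hand to write(), so compiler padding never reaches the wire.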

Don't use send/recv (sockets) or read/write (file descriptors) with structures. It's virtually guaranteed to break later on. Among other problems, the padding and/or alignment of the structure members can change depending on the compiler, the compilation options, and the environment.
Instead, marshal the data into an independent format (like text) and send that. If the data must be absolutely the same on both sides, encode it into base 64 and prepend a checksum.
If you absolutely must transmit binary data, remember to convert each of the structure members to network byte order (man byteorder) and send each one individually.
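A rough sketch of the text-marshalling idea, assuming the struct a from the question, one record per line, and strings that contain no whitespace. The function names a_to_text and a_from_text and the line format are my own choices, not anything the question prescribes:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct a {
    uint32_t pid;
    char str1[10];
    char str2[10];
};

/* Format y as "pid str1 str2\n".
 * Returns the number of characters written, or -1 if buf is too small. */
static int a_to_text(const struct a *y, char *buf, size_t cap)
{
    assert(y != NULL && buf != NULL);
    /* %.9s stops after 9 characters even if an array is not NUL-terminated. */
    int n = snprintf(buf, cap, "%" PRIu32 " %.9s %.9s\n",
                     y->pid, y->str1, y->str2);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse one line produced by a_to_text.
 * Returns 0 on success, -1 if the line is malformed. */
static int a_from_text(const char *line, struct a *y)
{
    assert(line != NULL && y != NULL);
    /* %9s leaves room for the terminating NUL in the 10-byte arrays. */
    int got = sscanf(line, "%" SCNu32 " %9s %9s", &y->pid, y->str1, y->str2);
    return got == 3 ? 0 : -1;
}
```

Because both sides exchange only characters, neither padding nor byte order matters. The cost is that the receiver has to find the end of each line before parsing.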

Related

Best practice for parsing data of mixed type?

I am wondering whether there is a known best practice for parsing a data packet that mixes several types.
For instance, let's say the data is 10 bytes, and it consists of:
Byte 0-1: manufacturer ID (int)
Byte 2: type (int)
Byte 3-4: device id (ascii char)
I could simply define each data type size and location as #define, and parse it using those defines. But I am wondering if there is any structure to organise this better.
Best practice is to assume all data from outside the program (e.g. from the user, from a file, from a network, from a different process) is potentially incorrect (and potentially unsafe/malicious).
Then, based on the assumption of "potential incorrectness", define types to distinguish between "unchecked, potentially incorrect data" and "checked, known correct data". For your example, you could use uint8_t packet[10]; as the data type for unchecked data and a normal structure (with padding and without __attribute__((packed));) for the checked data. This makes it extremely difficult for a programmer to accidentally use unsafe data when they think they're using safe/checked data.
Of course you will also need code to convert between these data types, which needs to do as many sanity checks as possible (and possibly also worry about things like endianness). For your example these checks could be:
are any of the bytes that are supposed to be ASCII characters >= 0x80, and are any of them invalid (e.g. maybe control characters like backspace are not permitted).
is the manufacturer ID valid (e.g. maybe there's an enumeration that it needs to match)
is the type valid (e.g. maybe there's an enumeration that it needs to match)
Note that this function should return some kind of status to indicate if the conversion was successful or not, and in most cases this status should also give an indication of what the problem was if the conversion wasn't successful (so that the caller can inform the user or log the problem or handle the problem in the most suitable way for the problem). For example, maybe "unknown manufacturer ID" means that the program needs to be updated to handle a new manufacturer and that the data was correct, and "invalid manufacturer ID" means that the data was definitely wrong.
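To make that concrete, here is a sketch of such a conversion function. Everything specific in it is an assumption for illustration: the big-endian manufacturer ID, the table of known manufacturers, the rule that ID 0 is reserved, the maximum type value, and all of the names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PACKET_SIZE = 10 };

/* Checked data: only packet_parse() should ever fill this in. */
struct device_info {
    uint16_t mfg;
    uint8_t type;
    char devid[3]; /* two ASCII characters plus a terminating NUL */
};

enum parse_status {
    PARSE_OK,
    PARSE_INVALID_MFG,  /* value that can never be valid */
    PARSE_UNKNOWN_MFG,  /* plausible value that this build does not know */
    PARSE_INVALID_TYPE,
    PARSE_BAD_DEVID     /* non-printable or non-ASCII byte */
};

/* Hypothetical tables: real values would come from the protocol spec. */
static const uint16_t known_mfg[] = { 0x0001, 0x0002 };
enum { TYPE_MAX = 3 };

static enum parse_status packet_parse(const uint8_t raw[PACKET_SIZE],
                                      struct device_info *out)
{
    assert(raw != NULL && out != NULL);

    /* Bytes 0-1: manufacturer ID, assumed big-endian. */
    uint16_t mfg = (uint16_t)((raw[0] << 8) | raw[1]);
    if (mfg == 0)
        return PARSE_INVALID_MFG; /* assumption: 0 is reserved */
    int known = 0;
    for (size_t i = 0; i < sizeof known_mfg / sizeof known_mfg[0]; i++)
        if (known_mfg[i] == mfg)
            known = 1;
    if (!known)
        return PARSE_UNKNOWN_MFG;

    /* Byte 2: type. */
    if (raw[2] > TYPE_MAX)
        return PARSE_INVALID_TYPE;

    /* Bytes 3-4: device ID, printable ASCII only. */
    for (size_t i = 3; i <= 4; i++)
        if (raw[i] < 0x20 || raw[i] > 0x7E)
            return PARSE_BAD_DEVID;

    /* Only now, with every check passed, touch the checked struct. */
    out->mfg = mfg;
    out->type = raw[2];
    out->devid[0] = (char)raw[3];
    out->devid[1] = (char)raw[4];
    out->devid[2] = '\0';
    return PARSE_OK;
}
```

The rest of the program would accept only struct device_info, so unchecked bytes cannot reach it without passing through packet_parse() first.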
You can describe the packet layout with a packed struct, like this:
struct packet {
uint16_t mfg;
uint8_t type;
uint16_t devid;
} __attribute__((packed));
The packed attribute (or your platform's equivalent) is required to avoid implicit padding which doesn't exist in the protocol.
Once you have the above struct, you simply cast (part of) a char array which you received from wherever:
char buf[1000];
struct packet *p = (struct packet *)(buf + N);
For a fully portable version, I suggest you do the read in this fashion:
struct {
    uint16_t e1;
    uint8_t e2;
    uint16_t e3;
} s;
uint8_t rbuf[5];
/* Real code must check the return value: read() may deliver fewer than 5 bytes. */
read(sock, rbuf, sizeof(rbuf));
/* Copy each field out of the raw buffer, then fix its byte order. */
memcpy(&s.e1, &rbuf[0], sizeof(s.e1));
s.e2 = rbuf[2];
memcpy(&s.e3, &rbuf[3], sizeof(s.e3));
s.e1 = ntohs(s.e1);
s.e3 = ntohs(s.e3);
You may be tempted to do what other answers suggest, something like:
struct s {
    uint16_t e1;
    uint8_t e2;
    uint16_t e3;
} __attribute__((packed));
struct s d;
read(sock, &d, sizeof(d));
d.e1 = ntohs(d.e1);
d.e3 = ntohs(d.e3);
However, this code is not fully portable and can cause problems: the packed attribute is a compiler extension, and d.e3 sits at an unaligned address. The compiler handles direct member access for you, but dereferencing an ordinary uint16_t pointer to such a member is undefined behavior. Under some circumstances this layout is fine and even desirable (less cache pollution, since more structs fit in each cache line, and possibly simpler code), but in other cases it can cause a bus error and make your code incompatible with some architectures.
Beyond that, you should follow other best practices, like reading as many structs as possible per read() call and writing cleaner code for the net-to-host byte-order translations. But I think avoiding the non-standard attribute should come first.
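Note that if you DON'T do unaligned access (that is, if the protocol keeps every field at its natural alignment), all of this (even the __packed__ attribute) is completely unnecessary, and you can read the structs like:
struct {
    uint16_t e1;
    uint8_t e2;
    uint8_t pad; /* explicit padding byte, so e3 stays 2-byte aligned */
    uint16_t e3;
} d;
read(rsock, &d, sizeof(d));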

Copying structure to char* buffer

Basically, I have a custom structure that contains different kinds of data. For example:
typedef struct example_structure{
uint8_t* example_1[4];
int example_2[4];
int example_3;
} example_structure;
What I need to do is copy the contents of this structure to a const char* buffer so I can send the copied data (buffer) using winsock2's send(SOCKET s, const char* buffer, int len, int flags) function. I tried using memcpy(), but wouldn't I just copy the addresses held in the pointers and not the data?
Yes, if you copied or sent that structure through a socket you would end up copying/sending pointers, which would obviously be meaningless to the recipient. However, if the recipient is running on different hardware (e.g. not the same endianness), all of the data may be meaningless anyway. On top of that, differences in the amount of padding between structure members may also become a problem.
For non-trivial situations it is best to use an existing protocol (such as protobuf), or roll your own protocol, keeping in mind the potential differences in hardware representation of your data.
You need to design a protocol before you can encode the data in accord with that protocol. Decide exactly how the data will be encoded at the byte level. Then write code to encode and decode to that format that you decided on.
Do not skip the step of actually documenting the wire protocol at the byte level. It will save you pain later, I promise.
See this answer for a bit more detail.
const char* buffer
You cannot write through a const char* pointer, so you can't copy anything into it. In send()'s signature, the const only promises that send() will not modify your data. You probably don't need to copy anything at all. Just call send like this, where ex_struc is a variable of type example_structure:
send(s, (const char*)&ex_struc, sizeof(ex_struc), flags);
But then there is the problem of the pointers in your structure (uint8_t* example_1[4];).
Sending pointers between different applications or machines does not make sense.
Hmm, your struct contains uint8_t * fields, which look like C strings... It does not make sense to copy or send a pointer, which is just a memory address in the sending process's user space.
If your struct had been (note, no pointers):
typedef struct example_structure{
uint8_t example_1[4];
int example_2[4];
int example_3;
} example_structure;
and provided you transfer it on exactly same architecture (same hardware, same compiler, same compiler options), you could do simply:
example_structure ex_struc;
// initialize the struct
...
send(s, (const char*)&ex_struc, sizeof(ex_struc), flags);
And even in that case, I would strongly advise you to define and use a protocol. As already said by @DavidSchwartz, it could save you time and headaches later...
But as you have pointers, you cannot do that and must define a protocol.
It could be as follows (but you are free to prefer little-endian order, or 2 or 8 bytes for each int, depending on your actual data):
one byte (or two) for length of first uint8_t array, followed by the array
above repeated 3 more times
four bytes in big endian order for first int of example_2
above repeated 3 more times
four bytes in big endian order for int of example_3
This clearly defines the format of a message.
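An encoder for that format might look roughly like the sketch below. It assumes each uint8_t array is at most 255 bytes long and that int fits in 32 bits. Since the struct itself does not record the array lengths, the sketch takes them in a separate, hypothetical len parameter. The names put_be32 and example_encode are invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct example_structure {
    uint8_t *example_1[4];
    int example_2[4];
    int example_3;
} example_structure;

/* Write v as four bytes, most significant first; return the next position. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Encode s into buf. len[i] is the length of s->example_1[i] (hypothetical
 * side information). Returns bytes written, or 0 if buf is too small. */
static size_t example_encode(const example_structure *s, const uint8_t len[4],
                             unsigned char *buf, size_t cap)
{
    assert(s != NULL && len != NULL && buf != NULL);

    /* 4 length bytes + the array data + 5 ints of 4 bytes each. */
    size_t need = 4 + 5 * 4;
    for (int i = 0; i < 4; i++)
        need += len[i];
    if (need > cap)
        return 0;

    unsigned char *p = buf;
    for (int i = 0; i < 4; i++) {
        *p++ = len[i];                        /* one-byte length prefix */
        if (len[i] > 0)
            memcpy(p, s->example_1[i], len[i]);
        p += len[i];
    }
    for (int i = 0; i < 4; i++)
        p = put_be32(p, (uint32_t)s->example_2[i]);
    p = put_be32(p, (uint32_t)s->example_3);
    return (size_t)(p - buf);
}
```

The pointed-to bytes are copied into the message and the pointers themselves never leave the process. The decoder on the other side would walk the same layout in the same order.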

read/write struct to a socket

I am trying to write a struct from the client to the server over a socket.
The struct is:
typedef struct R
{
int a;
int b;
double c;
double d;
double result[4];
}R;
The struct is the same in both programs (server and client), and I malloc space for the struct in both.
The client program:
struct R* r;
malloc..
...(fill the struct with data)
write(socket_fd,(void*)r,sizeof(R));
The server program:
struct R* r;
malloc..
read(client_fd,(R*)r,sizeof(R));
This does not pass the struct from the client to the server.
How do I write the struct to the server over a socket?
Some basic elements of network programming are:
One read or write call might not transfer all the bytes you asked for. Check the return value of the call: it is the number of bytes actually read or written. If fewer bytes were written than requested, call write in a loop until all the data has been written. The same applies to read.
The endianness of the machine also matters. The network does not convert anything: if you write an int on a little-endian machine (e.g. x86), its bytes arrive in little-endian order, and a big-endian receiver will misread them. Use APIs such as htons/htonl and ntohs/ntohl from POSIX to convert to and from network (big-endian) byte order. Note that these cover only 16- and 32-bit integers; the double members of your struct need an agreed encoding of their own.
These are just starting points, but the most likely reasons of data not reaching destination in the form as you expected.
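The loop for the first point could be sketched like this. The helper names write_all and read_all are made up, and retrying on EINTR is one common policy rather than the only reasonable one:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Call write() until all len bytes are sent.
 * Returns 0 on success, -1 on error (errno is set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: just retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Call read() until exactly len bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closed early. */
static int read_all(int fd, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* end of stream before the full message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With these, the client would call write_all(socket_fd, r, sizeof(R)) and the server read_all(client_fd, r, sizeof(R)). That handles short reads and writes, but the byte-order point still has to be addressed separately.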
I'm assuming you are getting some data, but not in the form you are expecting. If so, it might also help to add a breakpoint in gdb and inspect the memory of r, in the client code.
You could fill the struct in the sender code with 0xdeadbeef or similar magic debug values (http://en.wikipedia.org/wiki/Magic_number_%28programming%29#Magic_debug_values) to identify your data in the client memory more easily. I have found that very helpful for debugging. As some of the other answers mentioned, endianness and partial data might be the problem. Checking the return values and error codes will help too.

Invalid sizeof() struct, gap between members

I have a struct like this:
typedef struct _HEADER_IO
{
uint8_t field1 : 2;
uint8_t field2 : 4;
uint8_t field3 : 1;
uint8_t field4 : 1;
uint16_t field5;
uint8_t field6;
} HEADER_IO;
It's basically a message header that will be sent over TCP. The server reads it to learn what data follows in the buffer. However, for some reason, instead of the size being 4 bytes (2+4+1+1 bits in the first byte, plus 2 bytes for field5 and 1 byte for field6), the size is 6 bytes.
Looking it up in memory view it is:
XX AA XX XX XX AA
Instead of:
XX XX XX XX
Where AA are never set no matter what I do. This is a problem because I am planning for the header to be send() to a server and the extra bytes are included making the server interpret the header wrong. What am I doing wrong?
In general, it's a bad idea to use bitfields for things like this, since you can't know beforehand exactly which byte the bits will end up in, and since there are padding and alignment issues.
In my opinion, it's better to "own up" to the fact that you need more control over the external representation than what C structures give you, and do it manually. You can of course keep the struct as the in-memory (internal) representation.
Basically, you would write a function like:
size_t header_serialize(unsigned char *buf, size_t max, const HEADER_IO *header);
whose job it would be to, in the memory at buf, build the proper byte sequence that represents header.
To clarify (based on comments), the intent is to read the fields from header, not just do e.g.
memcpy(buf, header, sizeof *header); /* DON'T DO THIS! */
Instead, you're supposed to assemble the expected external representation, byte by byte, from the fields of header. That way, you always get the same external representation regardless of what the compiler does to the in-memory format of header.
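One possible body for header_serialize is sketched below. The bit positions within the first byte (field1 in the top two bits, field4 in the lowest) and the big-endian order of field5 are assumptions, since the question does not say what the server expects. The struct tag is dropped here because identifiers like _HEADER_IO are reserved in C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t field1 : 2;
    uint8_t field2 : 4;
    uint8_t field3 : 1;
    uint8_t field4 : 1;
    uint16_t field5;
    uint8_t field6;
} HEADER_IO;

enum { HEADER_WIRE_SIZE = 4 };

/* Build the 4-byte external form of header in buf.
 * Returns the number of bytes written, or 0 if max is too small. */
size_t header_serialize(unsigned char *buf, size_t max, const HEADER_IO *header)
{
    assert(buf != NULL && header != NULL);
    if (max < HEADER_WIRE_SIZE)
        return 0;

    /* Byte 0: the four small fields, packed by explicit shifts
     * (assumed order: field1 highest, field4 lowest). */
    buf[0] = (unsigned char)((header->field1 << 6) |
                             (header->field2 << 2) |
                             (header->field3 << 1) |
                             header->field4);
    /* Bytes 1-2: field5, most significant byte first (assumed). */
    buf[1] = (unsigned char)(header->field5 >> 8);
    buf[2] = (unsigned char)(header->field5 & 0xFF);
    /* Byte 3: field6. */
    buf[3] = header->field6;
    return HEADER_WIRE_SIZE;
}
```

Because every output byte is computed from field values, the result is the same 4 bytes whether the compiler makes the in-memory struct 4 bytes or 6.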
In standard C you can't help the fact that struct members can have padding inserted between them. You have to write a function to decode the data and store it in your struct before processing. This is because on some architectures unaligned memory access (reading from a pointer not aligned to, for example, 4 bytes) is very expensive and C will automatically pad your structures to avoid the cost. There's no standard way to turn the feature on or off.
For example in GCC you can add __attribute__((packed)) after the struct definition and Visual Studio has some #pragma commands (see http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html) that are also supported by GCC but beware that overall this is non-standard.
Since your comments mentioned it's a Windows program, probably it would work if you add this before the struct definition:
#pragma pack(push,1)
And this after it:
#pragma pack(pop)
While it would be more portable to write code to more manually decode the header, the above approach should be faster.
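If you go the pragma route, it may be worth having the compiler verify the layout you are relying on. This is a sketch assuming a C11 compiler that supports #pragma pack (GCC, Clang and MSVC do); the assertion messages are mine, and the struct tag is dropped because identifiers like _HEADER_IO are reserved:

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t field1 : 2;
    uint8_t field2 : 4;
    uint8_t field3 : 1;
    uint8_t field4 : 1;
    uint16_t field5;
    uint8_t field6;
} HEADER_IO;
#pragma pack(pop)

/* Fail the build, instead of silently breaking the protocol,
 * if this compiler lays the struct out differently. */
static_assert(sizeof(HEADER_IO) == 4, "HEADER_IO must be 4 bytes");
static_assert(offsetof(HEADER_IO, field5) == 1, "field5 must follow the bitfield byte");
static_assert(offsetof(HEADER_IO, field6) == 3, "field6 must be the last byte");
```

Note that this pins down only the size and the byte offsets. The order of the bitfields inside the first byte and the byte order of field5 are still up to the compiler and the CPU.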

Send a struct over a socket with correct padding and endianness in C

I have several structures defined to send between different operating systems (over TCP networks).
The defined structures are:
struct Struct1 { uint32_t num; char str[10]; char str2[10]; };
struct Struct2 { uint16_t num; char str[10]; };
typedef struct Struct1 a;
typedef struct Struct2 b;
The data is stored in a text file.
Data Format is as such:
123
Pie
Crust
Struct1 a is stored as 3 separate parameters. However, struct2 is two separate parameters with both 2nd and 3rd line stored to the char str[] . The problem is when I write to a server over the multiple networks, the data is not received correctly. There are numerous spaces that separate the different parameters in the structures. How do I ensure proper sending and padding when I write to server? How do I store the data correctly (dynamic buffer or fixed buffer)?
Example of write: write(fd,&a, sizeof(typedef struct a)); Is this correct?
Problem Receive Side Output for struct2:
123( , )
0 (, Pie)
0 (Crust,)
Correct Output
123(Pie, Crust)
write(fd,&a, sizeof(a)); is not correct; at least not portably, since the C compiler may introduce padding between the elements to ensure correct alignment. sizeof(typedef struct a) doesn't even make sense.
How you should send the data depends on the specs of your protocol. In particular, protocols define widely varying ways of sending strings. It is generally safest to send the struct members separately; either by multiple calls to write or writev(2). For instance, to send
struct { uint32_t a; uint16_t b; } foo;
over the network, where foo.a and foo.b already have the correct endianness, you would do something like:
struct iovec v[2];
v[0].iov_base = &foo.a;
v[0].iov_len = sizeof(uint32_t);
v[1].iov_base = &foo.b;
v[1].iov_len = sizeof(uint16_t);
writev(fp, v, 2);
Sending structures over the network is tricky. You might run into the following problems:
(1) Byte endianness issues with integers.
(2) Padding introduced by your compiler.
(3) String parsing (i.e. detecting string boundaries).
If performance is not your goal, I'd suggest creating encoders and decoders for each struct to be sent and received (ASN.1, XML or custom). If performance is really required, you can still use structures and solve (1) by fixing an endianness (i.e. network byte order) and ensuring your integers are stored as such in those structures, and (2) by fixing a compiler and using the pragmas or attributes to enforce a "packed" structure.
GCC, for example, uses __attribute__((packed)) as such:
struct mystruct {
uint32_t a;
uint16_t b;
unsigned char text[24];
} __attribute__((__packed__));
(3) is not easy to solve. Using null-terminated strings in a network protocol and depending on the terminator being present would make your code vulnerable to several attacks. If strings need to be involved, I'd use a proper encoding method such as the ones suggested above.
The easy way would be to write two functions for each structure: one to convert from textual representation to the struct and one to convert a struct back to text. Then you just send the text over the network and on the receiving side convert it to your structures. That way endianness does not matter.
There are conversion functions to ensure portability of binary integers across a network. Use htons, htonl, ntohs and ntohl to convert 16 and 32 bit integers from host to network byte order and vice versa.

Resources