I'm a bit new to C, but I've done my homework (some tutorials, books, etc.) and I need to program a simple server to handle requests from clients and interact with a db. I've gone through Beej's Guide to Network programming, but I'm a bit unsure how to piece together and handle different parts of the data getting sent back and forth.
For instance, say the client is sending some information that the server will put in multiple fields. How do I piece together that data to be sent and then break it back up on the server side?
Thanks,
Eric
If I understand correctly, you're asking, "how does the server understand the information the client sends it"?
If that's what you're asking, the answer is simple: it's mutually agreed upon ahead of time that the data structures each uses will be compatible. I.e. you decide upon what your communication protocol will be ahead of time.
So, for example, if I have a client-server application where the client connects and can ask for things such as "time", "date" and can say "settime " and "setdate ", I need to write my server in such a way that it will understand those commands.
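As a rough sketch of what "understanding those commands" can mean on the server side, the code below assumes each command arrives as one newline-terminated line with an optional argument after a single space. The `enum command` values and `parse_command()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command set for the "time/date" protocol described above. */
enum command { CMD_UNKNOWN, CMD_TIME, CMD_DATE, CMD_SETTIME, CMD_SETDATE };

/* Parses one line such as "settime 12:30\n".
 * On return, *arg points at the argument, or is NULL if there is none.
 * The line is modified in place: the newline and the space become NULs. */
enum command parse_command(char *line, char **arg)
{
    line[strcspn(line, "\r\n")] = '\0';   /* strip a trailing newline, if any */

    char *space = strchr(line, ' ');
    *arg = NULL;
    if (space != NULL) {
        *space = '\0';                    /* split the command from its argument */
        *arg = space + 1;
    }

    if (strcmp(line, "time") == 0)    return CMD_TIME;
    if (strcmp(line, "date") == 0)    return CMD_DATE;
    if (strcmp(line, "settime") == 0) return CMD_SETTIME;
    if (strcmp(line, "setdate") == 0) return CMD_SETDATE;
    return CMD_UNKNOWN;
}
```

The client has to follow exactly the same rules (one command per line, a single space before the argument); that shared rule set is the protocol.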
Obviously, in the above case it's trivial, since it'd just be a text-based protocol. But let's say you're writing an application that will return a struct of information, i.e.
struct Person {
char* name;
int age;
int heightInInches;
// ... other fields ...
};
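You might write the entire struct out from the server/client. In this case there are a few things to be aware of:
You need to convert every integer field with the hton*/ntoh* functions.
You need to make sure that your client and server both understand the struct in question. In particular, a pointer member such as name cannot be sent as-is: that would transmit a memory address that is meaningless on the other machine, so you have to send the characters themselves (typically preceded by their length).
You may or may not have to worry about alignment: different C compilers may insert different amounts of padding between fields, which can burn you if the client and server are built differently (or it may not).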
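Here is a minimal sketch of sending such a struct field by field instead of as raw memory. The wire layout (a 16-bit name length, the name bytes without a NUL, then two 32-bit integers, all in network byte order) is an assumption chosen for this example, not any standard. `person_encode()`/`person_decode()` are hypothetical helpers, and only the three fields shown above are handled.

```c
#include <arpa/inet.h>   /* htonl, ntohl, htons, ntohs (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct Person {
    char *name;
    int age;
    int heightInInches;
};

/* Wire layout (assumed): uint16 name length | name bytes | uint32 age | uint32 height.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t person_encode(const struct Person *p, unsigned char *buf, size_t cap)
{
    size_t len = strlen(p->name);
    size_t need = 2 + len + 4 + 4;
    if (len > UINT16_MAX || need > cap)
        return 0;

    uint16_t n16 = htons((uint16_t)len);
    uint32_t age = htonl((uint32_t)p->age);
    uint32_t ht  = htonl((uint32_t)p->heightInInches);

    memcpy(buf, &n16, 2);
    memcpy(buf + 2, p->name, len);        /* send the characters, not the pointer */
    memcpy(buf + 2 + len, &age, 4);
    memcpy(buf + 6 + len, &ht, 4);
    return need;
}

/* Decodes a buffer produced by person_encode(). The name is heap-allocated
 * and must be freed by the caller. Returns 0 on success, -1 on error. */
int person_decode(const unsigned char *buf, size_t size, struct Person *p)
{
    uint16_t n16;
    uint32_t age, ht;
    if (size < 2)
        return -1;
    memcpy(&n16, buf, 2);
    size_t len = ntohs(n16);
    if (size < 2 + len + 8)
        return -1;

    p->name = malloc(len + 1);
    if (p->name == NULL)
        return -1;
    memcpy(p->name, buf + 2, len);
    p->name[len] = '\0';

    memcpy(&age, buf + 2 + len, 4);
    memcpy(&ht, buf + 6 + len, 4);
    p->age = (int)ntohl(age);
    p->heightInInches = (int)ntohl(ht);
    return 0;
}
```

Because every field is copied into a byte buffer at a fixed position, neither compiler padding nor the size of a pointer on either machine affects what goes over the wire.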
In general, though, when writing a client/server app, the most important thing to get right is the communication protocol.
I'm not sure if this quite answers your question, though. Is this what you were after, or were you asking more about how, exactly, do you use the send/recv functions?
First, you define how the packet will look - what information will be in it. Make sure the definition is in an architecture-neutral format. That means that you specify it in a sequence that does not depend on whether the machine is big-endian or little-endian, for example, nor on whether you are compiling with 32-bit long or 64-bit long values. If the content is of variable length, make sure the definition contains the information needed to tell how long each part is - in particular, each variable length part should be preceded by a suitable count of its length.
When you need to package the data for transmission, you will take the raw (machine-specific) values and write them into a buffer (think 'character array') at the appropriate positions, in the appropriate format.
This buffer will be sent across the wire to the receiver, which will read it into another buffer, and then reverse the process to obtain the information from the buffer into local variables.
There are functions such as ntohs() to convert from a network ('n') to host ('h') format for a 'short' (meaning 16-bit) integer, and htonl() to convert from a host 'long' (32-bit integer) to network format - etc.
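One way to write values into the buffer without depending on the host's byte order or on sizeof(long) is to use shifts and masks on fixed-width types. The sketch below is one common idiom, not the only one. The helper names `put_u16`/`put_u32`/`get_u16`/`get_u32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write/read unsigned integers in big-endian ("network") order one byte at a
 * time. Only shifts and masks are used, so the bytes produced are the same on
 * big- and little-endian hosts. The put_* helpers return a pointer just past
 * what they wrote, so calls can be chained. */
unsigned char *put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFF);
    return p + 2;
}

unsigned char *put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

uint16_t get_u16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For example, `put_u32(put_u16(buf, 0x1234), 0xDEADBEEF)` fills `buf` with `12 34 DE AD BE EF` on any machine, and the receiver reverses it with `get_u16(buf)` and `get_u32(buf + 2)`.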
One good book for networking is Stevens' "UNIX Network Programming, Vol 1, 3rd Edn". You can find out more about it at its web site, including example code.
As already mentioned above, what you need is a previously agreed means of communication. One thing that helps me is to use XML to communicate.
E.g. if you need to send the time to the client, put it in a tag called time.
Then parse it on the client side and read the tag value.
The biggest advantage is that once you have a parser in place on the client side, even if you have to send some new information, you just have to agree on a tag name that will be parsed on the client side.
It helps me; I hope it helps you too.
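To make the idea concrete, here is a deliberately naive sketch of reading one tag's value on the client. The server side would just produce something like `<time>12:30</time>` with `snprintf`. `xml_tag_value()` is a hypothetical helper. It ignores attributes, nesting, CDATA and entity escapes, so for anything real you would want an actual XML parser library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the text between <tag> and </tag> into out (NUL-terminated).
 * Returns 0 on success, -1 if the tag is missing or out is too small.
 * Toy only: no attributes, nesting, CDATA or &entity; handling. */
int xml_tag_value(const char *doc, const char *tag, char *out, size_t cap)
{
    char open[64], close[64];
    if (snprintf(open, sizeof open, "<%s>", tag) >= (int)sizeof open ||
        snprintf(close, sizeof close, "</%s>", tag) >= (int)sizeof close)
        return -1;

    const char *start = strstr(doc, open);
    if (start == NULL)
        return -1;
    start += strlen(open);

    const char *end = strstr(start, close);
    if (end == NULL || (size_t)(end - start) >= cap)
        return -1;

    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}
```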
Related
I am working on a Linux-based project consisting of a "core" application, written in C, and a web server, probably written in Python. The core and web server must be able to communicate with each other over TCP/IP. My focus is on the core application, in C.
Because of the different programming languages used for the core and web server, I am looking for a message protocol which is easy to use in both languages. Currently I think JSON is a good candidate. My question, however, is not so much about the message protocol, but about how I would determine the amount of bytes to read from (and maybe send to) the socket, specifically when using a message protocol like JSON, or XML.
As I understand it, whether you use JSON, XML, or some other message protocol, you cannot include the size of the message in the message itself, because in order to parse the message, you would need the entire message and therefore need to know the size of it in advance. Note that by "message" I mean the data formatted according to the used message protocol.
I've been thinking and reading about the solution to this, and have come to the following two possibilities:
Determine the largest possible size of a message, say 500 bytes, and based on that determine the buffer size, say 512 bytes, and add padding to each message so that 512 bytes are sent;
Prepend each message with its size in "plain text". If the size is stored in an Int (4 bytes), then the receiver first reads 4 bytes from the socket and using those 4 bytes, determines how many bytes to read next for the actual message;
Because none of the solutions I've read were specifically about using a message protocol like JSON, I think I may be missing something.
So, which of the two possibilities I offered is the best, or, am I not aware of some other solution to this problem?
Kind regards.
This is a classic problem encountered with streams, including those of TCP, often called the "message boundary problem." You can search around for more detailed answers than what I can give here.
To determine boundaries, you have some options:
Fixed length with padding, like you said. Not advisable unless your messages are very small.
Prepend with size, like you said. If you want to get fancy and support large messages without wasting too many bytes, you can use a variable-length quantity, where you use a bit to determine whether to read more bytes for the size. @alnitak mentioned a drawback in the comments that I neglected: you can't start sending until you know the size.
Bound with some byte you don't use anywhere else (JSON and XML are text-only, so '\0' works with ASCII or any UTF). Simple but slower on the receiving end because you have to scan every byte this way.
Edit: JSON, XML, and many other formats can also be parsed on-the-fly to determine boundaries (e.g. each { must be closed with } in JSON), but I don't see any advantage to doing this.
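Here is a sketch of the "prepend with size" option over a stream socket, assuming a fixed 4-byte big-endian length header. The names `send_message`, `recv_message`, `write_all` and `read_all` are hypothetical. The important detail is the loops: `send()` and `recv()` on a TCP socket may move fewer bytes than requested, so a single `recv()` is not guaranteed to return a whole message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Loop until all len bytes have been sent (send may be partial). */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loop until exactly len bytes have been received. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;              /* error, or peer closed mid-message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame = 4-byte big-endian length, then that many bytes of JSON/XML/etc. */
int send_message(int fd, const char *msg, size_t len)
{
    if (len > UINT32_MAX)
        return -1;
    unsigned char hdr[4] = {
        (unsigned char)(len >> 24), (unsigned char)(len >> 16),
        (unsigned char)(len >> 8),  (unsigned char)len
    };
    if (write_all(fd, hdr, sizeof hdr) < 0)
        return -1;
    return write_all(fd, msg, len);
}

/* Reads one frame into buf and NUL-terminates it.
 * Returns the payload length, or -1 on error or if the message does not fit
 * (in that case the stream is out of sync and the connection should be closed). */
long recv_message(int fd, char *buf, size_t cap)
{
    unsigned char hdr[4];
    if (read_all(fd, hdr, sizeof hdr) < 0)
        return -1;
    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                   ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
    if (len >= cap)
        return -1;
    if (read_all(fd, buf, len) < 0)
        return -1;
    buf[len] = '\0';
    return (long)len;
}
```

The Python side can produce the same header with `struct.pack("!I", len(data))`, since `!I` is a 4-byte unsigned integer in network order.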
If this isn't just a learning experience, you can instead use an existing protocol to do this all for you. HTTP (inefficient) or gRPC (more efficient), for example.
Edits: I originally said something totally wrong about having to include a checksum to handle packet loss in spite of TCP... TCP won't advance until those packets are properly received, so that's not an issue. IDK what I was thinking.
Question is simple: how to send exactly one byte to some server.
I am trying to create a program for Windows in C which should be able to send exactly one byte to a given IP address. So, I simply need a function like this:
void sendByte(unsigned int ip, unsigned char byte){/**/}
I searched on MSDN and I found some functions which provide such a service (of course, the program has to be allowed through the firewall etc.; I did that already), but these functions also write a header. For example, when I try to send one byte, it actually sends 1025 bytes. The first 1024 bytes are junk data I don't want to be sent (headers like Content-Length:... etc.). Is there a way to send exactly one byte?
I have also found some libraries and stand-alone executable files which provide something like that. But I don't want to include anything except the standard C libraries and <windows.h>, and I don't want to use third-party stand-alone executables.
I have searched on StackOverflow and MSDN, but I found nothing helpful. Is there an easy way to do it using only the standard libraries and windows.h? No other libraries, plugins or programs, please.
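The extra bytes with Content-Length headers sound like an HTTP request built by a higher-level API. A plain socket sends only the payload you give it, although the OS still adds the IP and TCP/UDP headers that every packet needs. The sketch below uses the BSD socket API so it can be tested on POSIX systems. Winsock (`<winsock2.h>`, part of the Windows SDK, linked against `ws2_32`) offers the same `socket`/`sendto` calls. On Windows you must call `WSAStartup` first and use `closesocket` instead of `close`. The `port` parameter and the `int` return value are additions to the question's `sendByte` signature, because a UDP or TCP destination needs a port. The choice of UDP is an assumption; TCP would use `connect` + `send` instead.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends a single-byte UDP datagram to ip:port.
 * ip and port are in host byte order (e.g. 0x7F000001 for 127.0.0.1).
 * Returns 0 on success, -1 on failure. */
int sendByte(unsigned int ip, unsigned short port, unsigned char byte)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in to;
    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_port = htons(port);
    to.sin_addr.s_addr = htonl(ip);

    /* The payload is exactly one byte; only the mandatory IP/UDP headers
     * are added by the network stack. */
    ssize_t n = sendto(fd, &byte, 1, 0, (struct sockaddr *)&to, sizeof to);
    close(fd);
    return n == 1 ? 0 : -1;
}
```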
Suppose I have a C struct defined as follows :
typedef struct servData {
char max_word[MAX_WORD];
char min_word[MAX_WORD];
int word_count ;
} servSendData ;
where 'MAX_WORD' could be any value.
Now if I have an instance of this structure :
servSendData myData ;
If I populate this instance and then send it over the network, will there be any portability issues, considering that I want both the server and the client to run on either a 64-bit or a 32-bit system?
I am going to send and receive data as follows :
//server side
strcpy(myData.max_word, "some large word") ;
strcpy(myData.min_word, "small") ;
myData.word_count=100 ;
send(sockFd, (char*)&myData, sizeof(myData), 0);
//client side
recv(sockFd, (char*)&myData, sizeof(myData), 0);
printf("large word is %s\n", myData.max_word) ;
printf("small word is %s\n", myData.min_word) ;
printf("total words is %d\n", myData.word_count) ;
Yes, there definitely will be portability issues.
Alignment of structure members can be different even among different compilers on the same platform, let alone different platforms. And that's all assuming that sizeof(int) is the same across all of them (though granted, it usually is --- but do you really want to rely on "usually" and hope for the best?).
This holds even if MAX_WORD is the same on both computers (I'll assume they are from here on out; if they're not, then you're in trouble here).
What you need to do is send (and receive) each field separately. There is also a problem with sizeof(int) and endianness, so I've added a call to htonl() to convert from system to network byte order (the inverse function is ntohl()). They both return uint32_t which has a fixed, known, size.
send(sockFd, myData.max_word, sizeof(myData.max_word), 0); // or just MAX_WORD
send(sockFd, myData.min_word, sizeof(myData.min_word), 0);
uint32_t count = htonl(myData.word_count); // convert to network byte order
send(sockFd, &count, sizeof(count), 0);
// error handling! (check every send/recv return value)
ssize_t ret;
if((ret = recv(sockFd, myData.max_word, sizeof(myData.max_word), 0)) != (ssize_t)sizeof(myData.max_word))
{
// handle error or read more data (recv may return fewer bytes than requested)
}
... // and so on
// remember to convert back from network byte order on recv!
// also keep in mind the third field is now `uint32_t`, and not `int` in the stream
As other replies have stated, there are real problems in copying a C structure between different machines with different compilers, word sizes and endianness. One common way to resolve this issue is to transform your data into a machine-independent format, transfer it across the network and then transform it back into a structure on the receiver. This is such a common requirement that multiple technologies already exist to do this; the two that spring to my mind initially are gsoap and rpcgen, although there are probably many other options.
I've mostly used gsoap, and after you get past the initial learning curve you can develop robust solutions that scale well (with multiple threads) and that handle both the networking and the data translations for you.
If you don't want to go down this route then the safest approach is to write routines that convert your data to/from a standard string format (if you have issues with Unicode you'll need to take that into account as well) and then send that across the network.
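A minimal sketch of the string-format route for the `servSendData` struct from the question. It assumes `MAX_WORD` is 32 on both ends and that the words contain no whitespace (so "some large word" would have to become e.g. "some_large_word"). `serv_to_text()`/`serv_from_text()` are hypothetical helpers. Each message is one text line, so it can be sent with any of the framing schemes discussed elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 32   /* assumption: must be the same on client and server */

typedef struct servData {
    char max_word[MAX_WORD];
    char min_word[MAX_WORD];
    int word_count;
} servSendData;

/* Encodes as one text line: "<max_word> <min_word> <word_count>\n".
 * Returns the length written, or -1 if out is too small. */
int serv_to_text(const servSendData *d, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s %s %d\n", d->max_word, d->min_word, d->word_count);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parses a line produced by serv_to_text(). Returns 0 on success. */
int serv_from_text(const char *line, servSendData *d)
{
    /* %31s = MAX_WORD - 1, leaving room for the terminating NUL. */
    return sscanf(line, "%31s %31s %d",
                  d->max_word, d->min_word, &d->word_count) == 3 ? 0 : -1;
}
```

Integer size, padding and byte order no longer matter, because only characters cross the network; the cost is a larger message and a parsing step.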
You have to take care of endianness.
You should use the htons()/htonl() and ntohs()/ntohl() functions to convert between host byte order and network (big-endian) byte order.
You can use structure packing. With most C compilers, you can enforce a specific structure alignment. It is sometimes used for what you need it - to transfer a struct over a network.
Note that this still leaves endianness issues, so this is not a universal solution.
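A sketch of what structure packing looks like. `#pragma pack(push, 1)` / `#pragma pack(pop)` is accepted by GCC, Clang and MSVC. The struct names here are made up for the example. Packing removes the padding but, as said above, does nothing about byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: common ABIs insert 3 padding bytes after 'tag'
 * so that 'value' starts on a 4-byte boundary. */
struct natural {
    uint8_t tag;
    uint32_t value;
};

/* Packed layout: no padding, so the struct is exactly 5 bytes.
 * 'value' is still stored in host byte order, so convert it with htonl()
 * before sending. Taking the address of a packed member and dereferencing
 * it can fault on CPUs that require aligned access. */
#pragma pack(push, 1)
struct packed {
    uint8_t tag;
    uint32_t value;
};
#pragma pack(pop)
```

On a typical x86-64 build, `sizeof(struct natural)` is 8 while `sizeof(struct packed)` is 5.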
If you are not writing embedded software, sending data between applications without serializing properly is rarely a good idea.
The same goes for using plain sockets directly, which is not very convenient and feels a bit like "reinventing the wheel".
Many libraries can help you with both! Of course, you don't have to use them, but reading their documentation, and understanding how they work will help you make better choices. Things you have not yet planned can come out of the box (like, what happens when you want to update your system, and the message format changes?)
For serialization, have a read on those general purpose formats:
Human readable: JSON, XML, YAML, others...
Binary: Protobuf, TPL, Avro, BSON, MessagePack, and many others
For socket abstraction, look up
Boost ASIO
ZeroMQ
nanomsg
Many others
Question
I am wondering why, when connecting sockets, we use functions like hton to take care of endianness, when we could have passed the IP as a plain char array.
Say we want to connect to 184.54.12.169.
There is an explanation for this, but I cannot figure out why we use integers instead of chars, and so get ourselves into endianness hell.
I think char out_ip[] = "184.54.12.169" could theoretically have done the job.
Please explain the subtleties I'm missing.
The basic networking APIs are low-level functions. These are very thin wrappers around kernel system calls. Removing these low-level functions and forcing everything to use strings would be rather bad for a low-level API like that, especially considering how tedious string handling is in C. As a concrete hurdle, IP strings are not even fixed length, so handling them is a lot more complex than handling plain 32-bit integers. And moving string handling into the kernel goes against what a kernel is supposed to do; handling arbitrary user strings is really a user-space problem.
So, you want to create higher-level functions which would accept strings and do the conversion in the library. But, adding such higher level "convenience" functions all over the place in the core libraries would bloat them, because certainly passing IP numbers is not the only place for such convenience. These functions would need to be maintained forever and included everywhere, after they became part of standard (official like POSIX, or de-facto) libraries.
So, removing the low-level functions is not really an option, and adding more functions for higher-level API in the same library is not a good option either.
So the solution is to use another library to provide a higher-level networking API, which could, for example, handle address strings directly. Not sure what's out there for C, but it's almost a given for other languages, which also have "real" strings built in, so using them is not a hassle.
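For C, POSIX does ship a small conversion layer of this kind: `inet_pton()` turns a dotted-quad string into the binary address (already in network byte order) and `inet_ntop()` goes back; `getaddrinfo()` goes further and also accepts host names. A minimal sketch, where `make_addr()` is a hypothetical helper:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Fills a sockaddr_in from a dotted-quad string and a port.
 * inet_pton() parses and validates the string and stores the address in
 * network byte order, so no manual htonl() is needed for the address.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address. */
int make_addr(const char *ip, unsigned short port, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);           /* the port still needs converting */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

After `make_addr("184.54.12.169", 80, &sa)`, the result can be passed straight to `connect(fd, (struct sockaddr *)&sa, sizeof sa)`. Internally the address is the 4-byte value `0xB8360CA9` (B8 = 184, 36 = 54, 0C = 12, A9 = 169).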
Because that's how an IP is transmitted in a packet. The "www.xxx.yyy.zzz" string form is really just a human readable form of a 4 byte integer that allows us to see the hierarchical nature a little easier. Sending a whole string would take up a lot more space as well.
Take the number 127536: as a string it requires 7 bytes (six digits plus the terminating NUL), not four. In addition, you need to parse it.
In short, the integer form is more compact and you never have to deal with invalid values.
Currently, on my embedded system (coded in C), I have a lot of debug-assistive print statements which are executed when a remote tool is hooked up to the system that can display the messages to a PC. These help to understand the general system status, but as the messages are going over a slow CAN bus, I believe they may be clogging the tubes and causing other problems with trying to get any useful data logged.
The basic gist of it is this:
It's like a printf, but ultimately in a special message format that gets sent from the embedded system to the tool over the CAN bus. For this purpose, I can replace this generic print message with special debugging messages that send it a unique ID followed by only the variable parameters (i.e. the argc/argv). I am wondering if that is the right way to go about it, or if there is a magic bullet I've missed, or something else I haven't thought of.
So, I found this question which starts out well for my purposes:
printf() debugging library using string table "decoder ring"
But I do not have the constraint of making it as easy as a printf. I can maintain a table of strings on the remote tool's side (since it's a Windows executable and therefore not as code size limited). I am the sole person responsible for this code and would prefer to try and lighten up the code size as well as the CAN bus traffic while debugging.
My current thoughts are thus:
printf("[%d] User command: answer call\n", (int)job);
This becomes
debug(dbgUSER_COMMAND_ANSWER_CALL, job);
dbgUSER_COMMAND_ANSWER_CALL being part of an enum of possible debug messages
and the remote side has something like
switch(messagetype)
{
case dbgUSER_COMMAND_ANSWER_CALL:
/* retrieve an integer from the data portion of the message and put it into a variable */
printf("[%d] User command: answer call\n", (int)avariable);
break;
}
That's relatively straightforward and it would be fantastic if all my messages came in that same format. Where it gets tricky, though, is where some of my statements have to print strings which are not constant (the name of the device, for example).
printf("[%d] %02X:%02X:%02X:%02X:%02X:%02X (%s)\n", /* a bunch of parameters here */);
So, should I make it so that the contents of the debug message are 1) the message type, 2) length of the first parameter, 3) the parameter, 4) length of the next parameter, 5) the parameter, so on and so forth
Or have I overlooked something more obvious or easy?
Thanks
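The layout proposed above (message type first, then a length before each parameter) could be encoded roughly as follows. This sketch assumes a 1-byte message ID, 1-byte lengths (so each parameter is at most 255 bytes), integers sent as 4 big-endian bytes and a 64-byte staging buffer. `struct dbg_msg` and the `dbg_*` functions are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds [msg id][len][bytes][len][bytes]... in a staging buffer. */
struct dbg_msg {
    uint8_t buf[64];   /* assumption: big enough for one debug record */
    size_t used;
    int overflow;      /* set if any parameter did not fit */
};

void dbg_begin(struct dbg_msg *m, uint8_t id)
{
    m->buf[0] = id;
    m->used = 1;
    m->overflow = 0;
}

/* Appends one length-prefixed parameter. */
void dbg_add_bytes(struct dbg_msg *m, const void *data, size_t len)
{
    if (len > 255 || m->used + 1 + len > sizeof m->buf) {
        m->overflow = 1;
        return;
    }
    m->buf[m->used++] = (uint8_t)len;
    memcpy(&m->buf[m->used], data, len);
    m->used += len;
}

void dbg_add_u32(struct dbg_msg *m, uint32_t v)
{
    uint8_t be[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                      (uint8_t)(v >> 8),  (uint8_t)v };
    dbg_add_bytes(m, be, sizeof be);
}

void dbg_add_str(struct dbg_msg *m, const char *s)
{
    dbg_add_bytes(m, s, strlen(s));   /* no NUL terminator on the wire */
}
```

Since the PC-side table already knows each message's format, fixed-size parameters could drop their length byte; only the variable-length strings strictly need one.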
I'm assuming you use CAN because that's the connection you already have to your device. You haven't really provided enough information about your diagnostic needs, but I can give an example of what we do where I work. We use a custom build tool to comb through our sources building up a string table. Our code uses something like:
log( LOGH(T0722T), //"Position"
LOG_DOT_HEX_VALUE(i),
LOG_TEXT(T0178T), //"Id"
LOG_DOT_VALUE(uniqueId % 10000),
0 );
This would record some data which could be decoded into:
<timestamp> H Position.00B Id.0235
We allow use of T0000T to have the tool lookup (or generate) a unique number for us. The tool builds up an enum using the TxxxxT numbers for the compiler, and a file containing the ordered list of strings. Every build generates a string table which matches the enum numbering. This system also ties into a database system used for generating internationalized strings, but that's not really relevant to your question.
Each element is a short (16 bits). We allow 12 bits for values and use the high 4 bits for type and flag info: is the encoded data a string id; is it a signal (high/low) or just an event; is it a value; is it decimal, hexadecimal or base64url; is it concatenated with the preceding item or separate.
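A sketch of packing one such 16-bit element. The answer only says that the high 4 bits carry type and flag info. The specific split below (3 bits of kind, 1 "concatenate" bit) and the `LOG_KIND_*`/`log_*` names are hypothetical, chosen just to illustrate the masking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 16-bit log element:
 *   bits 15..13  kind (string id, decimal value, hex value, signal)
 *   bit  12      "concatenate with previous item" flag
 *   bits 11..0   12-bit payload (string id or value)              */
enum log_kind {
    LOG_KIND_STRING_ID = 0,
    LOG_KIND_DEC_VALUE = 1,
    LOG_KIND_HEX_VALUE = 2,
    LOG_KIND_SIGNAL    = 3
};

#define LOG_VALUE_MASK 0x0FFFu
#define LOG_CONCAT_BIT 0x1000u

uint16_t log_pack(enum log_kind kind, int concat, uint16_t value)
{
    return (uint16_t)(((unsigned)kind << 13) |
                      (concat ? LOG_CONCAT_BIT : 0u) |
                      (value & LOG_VALUE_MASK));   /* values above 4095 are truncated */
}

enum log_kind log_kind_of(uint16_t w) { return (enum log_kind)(w >> 13); }
int log_is_concat(uint16_t w)         { return (w & LOG_CONCAT_BIT) != 0; }
uint16_t log_value_of(uint16_t w)     { return (uint16_t)(w & LOG_VALUE_MASK); }
```

With this layout, four elements take 8 bytes, which is exactly one classic CAN data frame.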
We record data in a large ring buffer for querying if needed. This allows the system to run without interference but the data can be extracted if a problem is noted. It is certainly possible to constantly send the data, and if that's desired I'd suggest limiting any one message to a single CAN payload (that's 8 bytes assuming you only use a single ID). The message I provided above would fit in one CAN message if properly encoded (assuming the receiving side created the timestamps).
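A minimal sketch of such a ring buffer for 16-bit log elements, where the newest entries overwrite the oldest. The size of 256 entries and the `log_ring`/`ring_*` names are assumptions for illustration. Real firmware would size it to the available RAM and guard it if interrupts also log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_RING_SIZE 256u   /* assumption: a power of two, sized to available RAM */

/* Overwrite-oldest ring buffer. Logging never blocks and never touches the
 * CAN bus; the contents are only pulled out when someone asks for them. */
struct log_ring {
    uint16_t data[LOG_RING_SIZE];
    uint32_t head;     /* total number of elements ever written */
};

void ring_put(struct log_ring *r, uint16_t w)
{
    r->data[r->head % LOG_RING_SIZE] = w;
    r->head++;
}

/* Copies up to max of the most recent elements, oldest first, and returns
 * how many were copied. Not safe against a concurrent ring_put() from an
 * interrupt without extra locking. */
size_t ring_dump(const struct log_ring *r, uint16_t *out, size_t max)
{
    uint32_t count = r->head < LOG_RING_SIZE ? r->head : LOG_RING_SIZE;
    if (count > max)
        count = (uint32_t)max;
    uint32_t start = r->head - count;
    for (uint32_t i = 0; i < count; i++)
        out[i] = r->data[(start + i) % LOG_RING_SIZE];
    return count;
}
```

When the PC tool requests a dump, the elements returned by `ring_dump()` can be sent four at a time, one CAN frame each.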
We do have the ability to include arbitrary data (ASCII or Hexadecimal), but I never use it. It's usually a waste of precious space in the logs. Logging is always a balance in embedded environments.