Multiple calls to send() are merged into one call to recv() - c

I have a client-server application.
The client sends a string followed by an integer, using two distinct send() calls. The two pieces of data are supposed to be stored in two different variables on the server.
The problem is that both arrive in a single recv() call: the two strings sent by the two distinct send()s are concatenated and end up together in the buffer of the first recv().
server.c:
printf("Incoming connection from client %s:%i accepted\n",inet_ntoa(clientSocketAddress.sin_addr),ntohs(clientSocketAddress.sin_port));
memset(buffer,0,sizeof(buffer));
int sizeofMessage;
if ((sizeofMessage=recv(clientSocket,buffer,MAXBUFFERSIZE,0))<0)
{
printf("recv failed.");
closesocket(serverSocket);
clearWinsock();
return EXIT_FAILURE;
}
char* Name=buffer;
printf("Name: %s\n",Name);
if ((recv(clientSocket,buffer,MAXBUFFERSIZE,0))<0)
{
printf("bind failed.");
closesocket(serverSocket);
clearWinsock();
return EXIT_FAILURE;
}
int integer=ntohs(atoi(buffer));
printf("integer: %i\n",intero);
client.c:
if (send(clientSocket,Name,strlen(Name),0)!=strlen(Name))
{
printf("send failed");
closesocket(clientSocket);
clearWinsock();
return EXIT_FAILURE;
}
printf("client send: %s",Name);
int age=35;
itoa(htons(age),buffer,10);
sizeofBuffer=strlen(buffer);
if (send(clientSocket,buffer,sizeofBuffer,0)!=sizeofBuffer)
{
printf("bind failed.");
closesocket(clientSocket);
clearWinsock();
return EXIT_FAILURE;
}
How can I fix it? What am I doing wrong?

TCP is a streaming protocol. It is not aware of any kind of "message" boundaries, and it does not add such information based on individual calls to send().
Because of this, any number of send()s on the sending side can lead to any number of recv()s (up to the number of bytes sent) on the receiving side.
To get around this behaviour, define and implement an application-level protocol to distinguish the different "messages" being sent.
Also, one cannot rely on recv()/send() receiving/sending as many bytes as those two functions were told to receive/send. It is essential to check their return values to learn how many bytes they actually received/sent, and to loop around them until all the data intended to be received/sent actually has been.
For examples of how this "looping" could be done:
for writing, you might like to have a look at this answer: https://stackoverflow.com/a/24260280/694576 and
for reading, at this answer: https://stackoverflow.com/a/20149925/694576
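As a minimal sketch of what such loops look like (assuming POSIX sockets; send_all() and recv_n() are illustrative names, not standard calls):

#include <sys/types.h>
#include <sys/socket.h>

/* Send exactly len bytes, looping over partial send()s.
   Returns 0 on success, -1 on error (check errno). */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes, looping over partial recv()s.
   Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_n(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}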

This is how TCP works. Treat it as a byte stream.
Put some basic protocol on top of it: delimit your application messages with some known byte value, or prepend your messages with a length field.
Or switch to UDP, which gives you the datagram semantics you are looking for, if you can tolerate/recover from occasional packet loss.
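A minimal sketch of the delimiter variant, assuming both ends agree on '\n' as the terminator (recv_line() is an illustrative name; a real implementation would read into a buffer and scan it rather than calling recv() one byte at a time):

#include <sys/types.h>
#include <sys/socket.h>

/* Read one '\n'-terminated message into out (at most max-1 bytes).
   Returns the message length, or -1 on error/EOF. */
static ssize_t recv_line(int fd, char *out, size_t max)
{
    size_t used = 0;
    while (used + 1 < max) {
        char c;
        ssize_t n = recv(fd, &c, 1, 0);
        if (n <= 0)
            return -1;
        if (c == '\n')
            break;              /* delimiter reached: message complete */
        out[used++] = c;
    }
    out[used] = '\0';
    return (ssize_t)used;
}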

You can add a short time interval, such as sleep(5), between the two messages, if timing does not matter too much in your application.

Related

Still sending one message with TCP_NODELAY

From what I've understood of Nagle's algorithm, it tries to coalesce multiple messages into one packet where possible, to use less bandwidth.
My problem is that for a university project I have to disable this; I have to first send a name, then a year, a month, a day and finally a filename.
On the server side I will have to assemble these into one string: name/year/month/day/filename.
It is explicitly stated that my client/server should work with the clients/servers of other students. So I am not allowed to just put a \0 or some other character at the end of every message and process it on the server, because any student could have chosen a different end character.
My code looks like this
int main(int argc, char *argv[])
{
int sockfd;
int yes=1;
struct sockaddr_in their_addr;
struct hostent *he;
if ((he=gethostbyname(argv[1])) == NULL) {
perror("Client: gethostbyname");
return EXIT_FAILURE;
}
if ((sockfd = socket(PF_INET,SOCK_STREAM,IPPROTO_TCP))==-1) {
perror("Client: socket");
return EXIT_FAILURE;
}
their_addr.sin_family = AF_INET;
their_addr.sin_port = htons(PORT);
their_addr.sin_addr = *((struct in_addr*)he->h_addr);
memset(&(their_addr.sin_zero), '\0', 8);
if (connect(sockfd,(struct sockaddr *)&their_addr,sizeof(struct sockaddr))==-1) {
perror("Client: connect");
return EXIT_FAILURE;
}
if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, (char *)&yes, sizeof(int))==-1) {
perror("Client: setsockopt");
return EXIT_FAILURE;
}
if (send(sockfd,argv[2],strlen(argv[2]),0)==-1) {
perror("Client: send username");
return EXIT_FAILURE;
}
if (send(sockfd,argv[4],4,0)==-1) {
perror("Client: send year");
return EXIT_FAILURE;
}
I thought that this would work because of the line
setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, (char *)&yes, sizeof(int))
sometimes also written like this (neither of them works anyway):
setsockopt(sockfd, SOL_TCP, TCP_NODELAY, &yes, sizeof(yes));
I did not find anything saying that this should be done (I had always used 0 instead of IPPROTO_TCP):
sockfd = socket(PF_INET,SOCK_STREAM,IPPROTO_TCP);
but I found some code with this, so I tried it out; it still did not work.
On the server side I also have very standard code with 5 recv() calls; I tried setting TCP_NODELAY there too and it still did not work. I doubt the server code will help, as the problem seems to come from the client sending everything as one message.
So I would like to know what I am doing wrong and how to effectively get 5 different messages instead of one (what I am currently doing is a sleep(1) between each send, which is clearly not optimal).
Thank you in advance for the response.
There are no 'messages' end-to-end in TCP; it's a byte stream protocol. The protocol is free to combine the bytes from multiple sends as it wishes, or to split one send into multiple segments. This means that if you want discrete messages then you have to invent them. The usual methods include sending a length ahead of the actual message bytes; or having a specific terminating character (which the receiver must then scan for); or using fixed-length messages (I would advise against this as it's inflexible).
All of those would require establishing a standard approach for all students to use. But that's how it is in real life: communication requires the protocols to be agreed in advance. I don't know your teacher's opinion, but I'd award good marks if you collectively defined a message standard and wrote it up as part of submitting your work.
The "wait between messages" approach which you discovered for yourself is very much a cross-your-fingers and hope solution; you hope your wait time exceeds the time taken to transmit the message, which could be quite large if there is a network burp. And the receiver hopes that either (a) all bytes are delivered at once, or (b) that if it polls for data then a 'no more' indication means that it has read the whole message.
While it does say in Linux's own header files that TCP_NODELAY 'disables Nagle' ;)
user#user-OptiPlex-9020:~$ cat /usr/include/linux/tcp.h |grep -i nagle
#define TCP_NODELAY 1 /* Turn off Nagle's algorithm. */
so, ehm, yeah, there is that... a couple of sequential send()s still end up in one recv(). EVEN if other file descriptors get send()'t to in between by the same process. So that doesn't quite work as documented.
As in: send(1,"aaa"); send(2,"aaa"); send(3,"aaa"); send(1,"bbb"); send(2,"bbb") etc. can still end up at the other end of file descriptor 1 as "aaabbb" in the recv(). So it doesn't -quite- turn it off... it does seem to keep the bytes sent in one send() together in one recv() though; no "aaabb" and then the last "b" in the next recv(). It just merges them until the MTU is full (as long as the whole payload fits) or it takes too long ;)
From the looks of it, it does seem to merge the payloads a bit less than without the option, so it still affects things somehow... but without diving into the code or running long-term statistics, that's hard to tell; it just 'has fewer large merged packets than without it'.

Message Ordering with Asynchronous I/O (epoll)

Say that I've implemented an epoll-based TCP server where each thread runs something very similar to the below (taken from the epoll manpage, where kdpfd is the epoll file descriptor and listener is a socket that is listening on a port):
struct epoll_event ev, *events;
for(;;) {
nfds = epoll_wait(kdpfd, events, maxevents, -1);
for(n = 0; n < nfds; ++n) {
if(events[n].data.fd == listener) {
client = accept(listener, (struct sockaddr *) &local,
&addrlen);
if(client < 0){
perror("accept");
continue;
}
setnonblocking(client);
ev.events = EPOLLIN | EPOLLET;
ev.data.fd = client;
if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
fprintf(stderr, "epoll set insertion error: fd=%d\n",
client);
return -1;
}
}
else
do_use_fd(events[n].data.fd);
}
}
For the do_use_fd(events[n].data.fd) above, say we want to write everything we receive to stdout:
int do_use_fd(int fd) {
int err;
char buf[512];
while ((err = read(fd, buf, 512)) > 0) {
write(1, buf, err);
}
if (err == -1 && errno != EAGAIN && errno != EWOULDBLOCK) {
// do some error handling
return -1;
}
return 0;
}
Now, say I have 10k+ connections, all of whom send me a lot of messages over a prolonged period of time. Assume that my clients send me the message hello, my name is {client's name} every few seconds. Assume that (somehow) this message is large enough that it has to be transferred as multiple packets.
As such, read(fd, buf, 512) may occasionally return -1 with an errno indicating it would block. Because of that, I think the above solution could end up with something like the following output:
hello, my nam
hello, my name is Pau
e is John Le
hello, my name is Geo
nnon
l McCartney
rge
hello, my name is Ringo
Starr
Harrison
because as soon as a read blocks on one connection, another read can start on a different connection. Instead, I'd like the following to be printed:
hello, my name is John Lennon
hello, my name is Paul McCartney
hello, my name is George Harrison
hello, my name is Ringo Starr
Is there a recommended way of dealing with this issue? One option would be to keep a buffer per connection, check whether the message is complete, and only print once it is. But with 10k+ connections, would this be a good idea? On one hand, something tells me this solution does not scale well. On the other hand, if the messages are only 500 bytes, with 10k connections this solution only takes up 5MB.
Thanks in advance.
I think using a buffer per connection would be OK in your case. It may, however, be more elegant to create a buffer per incomplete message. That would mean you somehow have to know when your message is done, so you would need a small protocol, such as a length field or a terminator (and possibly a timeout to kill incomplete messages after a certain time). This would also guarantee that no unused memory stays allocated, as a buffer could be released right after its message is complete and passed up. You could, for example, access these buffers through a hashmap keyed by the connection 5-tuple. If you decide to use a message-bound identifier, which of course incurs extra overhead, you could even demultiplex messages from a single TCP connection used to transmit multiple messages at a time.
If you need to enforce ordering among these messages you will have to detail your situation, because ordering is a tough problem in many situations.
Edit: Sorry, I have a lot to do at the moment, so I could not answer any sooner. You are correct that the connection-based approach is easier. Message-based becomes more advantageous the more sparsely the connections are used. If you can expect all connections to receive messages at all times, it is just overhead. If connections are sometimes idle for a while, it may reduce memory usage considerably. Also note that your application's memory usage then no longer scales with the number of clients but with the number of messages, which is usually nice, because message rates typically vary. You are also correct about the ordering on a TCP stream. As long as you send only one complete message at a time over the connection, TCP will ensure ordering. Some applications, e.g. HTTP/2, reuse the same TCP connection to send multiple messages at the same time. In that case TCP will not help, because message fragments arrive in an unspecified order and you need to demultiplex them (e.g. via stream IDs in HTTP/2).
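A minimal sketch of the per-connection buffer approach, assuming a "4-byte big-endian length, then payload" protocol and messages of at most 512 bytes (struct conn_buf and feed() are illustrative names, not from the question): you keep one conn_buf per fd and call feed() with whatever read() returned.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

#define MAXMSG 512

struct conn_buf {
    char   data[4 + MAXMSG];   /* length header + largest expected payload */
    size_t used;               /* bytes accumulated so far */
};

/* Append newly read bytes; deliver (here: print) each complete message. */
static void feed(struct conn_buf *cb, const char *buf, size_t n)
{
    while (n > 0) {
        size_t space = sizeof cb->data - cb->used;
        size_t take = n < space ? n : space;
        memcpy(cb->data + cb->used, buf, take);
        cb->used += take;
        buf += take;
        n -= take;

        /* Drain as many complete messages as the buffer holds. */
        while (cb->used >= 4) {
            uint32_t len;
            memcpy(&len, cb->data, 4);
            len = ntohl(len);
            if (len > MAXMSG) {        /* protocol error: reset the stream */
                cb->used = 0;
                break;
            }
            if (cb->used < 4 + len)
                break;                 /* message still incomplete */
            printf("%.*s\n", (int)len, cb->data + 4);
            memmove(cb->data, cb->data + 4 + len, cb->used - 4 - len);
            cb->used -= 4 + len;
        }
    }
}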

reading Ethernet data packet from raw sockets using recvfrom() in MSG_DONTWAIT mode

I am using raw sockets to send and receive Ethernet data packets in C using recvfrom(). I want to read in non-blocking mode, so I am using MSG_DONTWAIT. But recvfrom() always returns -1, whether a packet has been received or not. I am new to C programming.
I am able to receive my payload, but I always get the message "ERROR: Recvfrom: Resource temporarily unavailable".
Code Snippet:
if ((sock = socket(AF_PACKET, SOCK_RAW, htons(0x8851))) < 0) {
perror("ERROR: Socket");
exit(1);
}
while(1) {
int flag=0;
n=recvfrom(sock, buffer, 2048, MSG_DONTWAIT, NULL, NULL);
if (n == -1) {
perror("ERROR: Recvfrom");
close(sock);
flag=1;
}
if (flag==0) {
// Read Packet
}
}
If you use the MSG_DONTWAIT argument for recvfrom(), the system call will always immediately return whether or not there is any data to be read. If there's no data, the return value is -1 and then errno will be set to EAGAIN. In your application, I'm not completely sure that MSG_DONTWAIT is the right choice. If the only thing you're doing is reading packets from that one socket, you shouldn't use MSG_DONTWAIT. So, your program in practice will print lots of error messages in a loop. If you remove that error message for the case when errno == EAGAIN, your program would be slightly better but not much better: it would spin in a loop, consuming all CPU resources.
If, however, you are reading from multiple file descriptors at the same time, then using non-blocking I/O is the right choice. But instead of your loop, you should have a loop that polls for the readiness of multiple file descriptors using select(), poll() or epoll_wait(). Since you're running on Linux, I highly recommend epoll_wait() as it's the most scalable approach of these. See the epoll, epoll_wait and epoll_create manual pages on Linux for more information.
For now, I strongly recommend not using MSG_DONTWAIT and checking whether the function call ever returns. If it never returns, it means no packets are being received.
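As a concrete illustration of the epoll approach, a minimal sketch assuming Linux and the same 0x8851 EtherType as in the question (raw packet sockets require root/CAP_NET_RAW). Because epoll_wait() blocks until the socket is readable, recvfrom() no longer needs MSG_DONTWAIT here:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <arpa/inet.h>

int main(void)
{
    char buffer[2048];

    int sock = socket(AF_PACKET, SOCK_RAW, htons(0x8851));
    if (sock < 0) { perror("socket"); exit(1); }

    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); exit(1); }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sock };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev) < 0) {
        perror("epoll_ctl"); exit(1);
    }

    for (;;) {
        struct epoll_event got;
        /* Block here until the socket is readable; no busy loop. */
        if (epoll_wait(epfd, &got, 1, -1) < 0) { perror("epoll_wait"); break; }
        ssize_t n = recvfrom(sock, buffer, sizeof buffer, 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); continue; }
        /* ... process the n-byte packet here ... */
    }
    return 0;
}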

TCP server using select()

I need to write a TCP server that can handle multiple connections; I followed this guide and wrote up the following program:
static void _handle_requests(char* cmd,int sessionfd){
//TODO: extend
printf("RECEIVED: %s\n",cmd);
if (!strcmp(cmd,BAR)){
barrier_hit(&nodebar,sessionfd);
}else if (!strcmp(cmd, BYE)){
}else if (!strcmp(cmd, HI)){
}
}
void handle_requests(void){
listen(in_sock_fd,QUEUELEN);
fd_set read_set, active_set;
FD_ZERO(&active_set);
FD_SET(in_sock_fd, &active_set);
int numfd = 0;
char cmd[INBUFLEN];
for (;;){
read_set = active_set;
numfd = select(FD_SETSIZE,&read_set,NULL,NULL,NULL);
for (int i = 0;i < FD_SETSIZE; ++i){
if (FD_ISSET(i,&read_set)){
if (i == in_sock_fd){
//new connection
struct sockaddr_in cliaddr;
socklen_t socklen = sizeof cliaddr;
int newfd = accept(in_sock_fd,(struct sockaddr*)&cliaddr, &socklen);
FD_SET(newfd,&active_set);
}else{
//already active connection
read(i,cmd,INBUFLEN);
_handle_requests(cmd,i);
}
}
}
}
}
...and a single client that connect()s to the server and does two consecutive write() calls on the socket file descriptor.
n = write(sm_sockfd, "hi", 3);
if (n < 0) {
perror("SM: ERROR writing to socket");
return 1;
}
//...later
n = write(sm_sockfd, "barrier", 8);
if (n < 0) {
perror("SM: 'barrier msg' failed");
exit(1);
}
The thing is, the server only picks up the first message ("hi"); afterwards, the select call hangs. Since the write ("barrier") on the client's end succeeded, shouldn't that session file descriptor be ready for reading? Have I made any glaring mistakes?
Thanks; sorry if this is something obvious, I'm completely unfamiliar with C's networking library, and the project is due very soon!
You have a misunderstanding of how TCP sockets work. There is no message boundary in TCP, i.e. if you send first "hi" and then "barrier", you can't expect the corresponding receives to return "hi" and "barrier". It's possible that they return "hibarrier". It's also in theory possible (although very rare) that they would return "h", "i", "b", "a", "r", "r", "i", "e", "r".
You really need to consider how you delimit your messages. One possibility is to send the length of a message as 32-bit integer in network byte order (4 bytes) prior to the message. Then when you receive the message, you first read 4 bytes and then read as many bytes as the message length indicates.
Do note that TCP may return partial reads, so you need to somehow handle those. One possibility is to have a buffer which holds the bytes read, and then append to this buffer when more bytes are read and handle the contents of the buffer when the first four bytes of the buffer (i.e. the message length) indicate that you have the full message.
If you want a sequential packet protocol that preserves packet boundaries, you may want to consider SCTP. However, it's not widely supported by operating system kernels currently so what I would do is the 32-bit length trick to have a packet-oriented layer on top of TCP.
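A minimal sketch of that framing on the receiving side, assuming blocking sockets (with select you would instead accumulate into a per-fd buffer, as described above); recv_n() and recv_msg() are illustrative names:

#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Receive exactly len bytes, looping over partial recv()s. */
static int recv_n(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;          /* error, or peer closed the connection */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read one "4-byte length + payload" message into out (cap bytes).
   Returns the payload length, or -1 on error/oversized message. */
static ssize_t recv_msg(int fd, char *out, size_t cap)
{
    uint32_t len;
    if (recv_n(fd, &len, sizeof len) < 0)
        return -1;
    len = ntohl(len);
    if (len > cap)
        return -1;
    if (recv_n(fd, out, len) < 0)
        return -1;
    return (ssize_t)len;
}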
Do this:
int nbrRead = read(i,cmd,INBUFLEN);
and print out the value of nbrRead. You will see that you received everything in one go. TCP is a streaming protocol; if you do 3 or more sequential sends, the chance is very high that you will receive them all at once.
Also make sure that INBUFLEN is large enough; 2048 will be more than enough for your example.

Socket programming for multi-clients with 'select()' in C

This is a question about socket programming for multiple clients.
While I was thinking about how to turn my single-client server program into a multi-client one, I got stuck on how to implement it, and even after searching everywhere, some confusion remains.
I was thinking of implementing it with select(), because it is less heavyweight than fork; and I have many global variables that must not be shared, so I had not considered using threads.
So, to use select(), I have the general knowledge of the FD_* functions to use, but here is my question: the examples on websites generally only show the multi-client server program...
I use sequential recv() and send() calls in the client and in the server program, which work really well with a single client and server, but I have no idea how they must be changed for multiple clients.
Does the client also have to be non-blocking?
What are all the requirements for select()?
The things I did in my server program to make it multi-client:
1) I set my socket option for address reuse, with SO_REUSEADDR.
2) I set my server to non-blocking mode with O_NONBLOCK, using fcntl().
3) I passed the timeout argument as zero.
and the proper use of the FD_* functions after the above.
But when I run my client program once and then more of them, from the second client on, the client program blocks and never gets accepted by the server.
I guess the reason is that I put my server program's main logic inside the 'recv returned > 0' case.
For example, with my server code
(I'm using temp and read as fd_sets, where read is the master set in this case):
for example with my server code,
(I`m using temp and read as fd_set, and read as master in this case)
int main(void)
{
int conn_sock, listen_sock;
struct sockaddr_in s_addr, c_addr;
int rq, ack;
char path[100];
int pre, change, c;
int conn, page_num, x;
int c_len = sizeof(c_addr);
int fd;
int flags;
int opt = 1;
int nbytes;
fd_set read, temp;
if ((listen_sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
{
perror("socket error!");
return 1;
}
memset(&s_addr, 0, sizeof(s_addr));
s_addr.sin_family = AF_INET;
s_addr.sin_addr.s_addr = htonl(INADDR_ANY);
s_addr.sin_port = htons(3500);
if (setsockopt(listen_sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(int)) == -1)
{
perror("Server-setsockopt() error ");
exit(1);
}
flags = fcntl(listen_sock, F_GETFL, 0);
fcntl(listen_sock, F_SETFL, flags | O_NONBLOCK);
//fcntl(listen_sock, F_SETOWN, getpid());
bind(listen_sock, (struct sockaddr*) &s_addr, sizeof(s_addr));
listen(listen_sock, 8);
FD_ZERO(&read);
FD_ZERO(&temp);
FD_SET(listen_sock, &read);
while (1)
{
temp = read;
if (select(FD_SETSIZE, &temp, (fd_set *) 0, (fd_set *) 0,
(struct timeval *) 0) < 1)
{
perror("select error:");
exit(1);
}
for (fd = 0; fd < FD_SETSIZE; fd++)
{
//CHECK all file descriptors
if (FD_ISSET(fd, &temp))
{
if (fd == listen_sock)
{
conn_sock = accept(listen_sock, (struct sockaddr *) &c_addr, &c_len);
FD_SET(conn_sock, &read);
printf("new client got session: %d\n", conn_sock);
}
else
{
nbytes = recv(fd, &conn, 4, 0);
if (nbytes <= 0)
{
close(fd);
FD_CLR(fd, &read);
}
else
{
if (conn == Session_Rq)
{
ack = Session_Ack;
send(fd, &ack, sizeof(ack), 0);
root_setting();
c = 0;
while (1)
{
c++;
printf("in while loop\n");
recv(fd, &page_num, 4, 0);
if (c > 1)
{
change = compare_with_pre_page(pre, page_num);
if (change == 1)
{
page_stack[stack_count] = page_num;
stack_count++;
}
else
{
printf("same as before page\n");
}
} //end of if
else if (c == 1)
{
page_stack[stack_count] = page_num;
stack_count++;
}
printf("stack count:%d\n", stack_count);
printf("in page stack: <");
for (x = 0; x < stack_count; x++)
{
printf(" %d ", page_stack[x]);
}
printf(">\n");
rq_handler(fd);
if (logged_in == 1)
{
printf("You are logged in state now, user: %s\n",
curr_user.ID);
}
else
{
printf("not logged in.\n");
c = 0;
}
pre = page_num;
} //end of while
} //end of if
}
} //end of else
} //end of fd_isset
} //end of for loop
} //end of outermost while
}
If a code explanation is needed: what I was trying to build with this code is a kind of web-page 'browser' for the server.
I wanted every client to get a session with the server, to fetch a login page and so on.
But the execution result is as I described above.
Why is that?
Must the socket in the client program also be in non-blocking mode to be used with a non-blocking server program that uses select()?
Or should I use fork or threads to create the multiple clients and manage them with select()?
The reason I ask is that, after considering this problem a lot, select() seems only suited to something like a multi-client chat program, where many 'forked' or 'threaded' clients can wait, as in a chat room.
What do you think?
Is select() also a possible or proper thing to use for a normal multi-client program?
If there is something I missed that would make my multi-client program work fine, please share some of your knowledge, or the requirements for the proper use of select().
I didn't know multi-client communication was this hard before :)
I also considered using epoll, but I think I first need to understand select() well.
Thanks for reading.
Besides the fact that you want to go from single-client to multi-client, it's not very clear what's blocking you here.
Are you sure you fully understood how select is supposed to work? The manual (man 2 select on Linux) may be helpful, as it provides a simple example. You can also check Wikipedia.
To answer your questions:
First of all, are you sure you need non-blocking mode for your sockets? Unless you have a good reason to do so, blocking sockets are also fine for multi-client networking.
Usually, there are basically two ways to deal with multiple clients in C: fork, or select. The two aren't really used together (or I don't know how :-) ). Models using lightweight threads are essentially asynchronous programming (did I mention it also depends on what you mean by 'asynchronous'?) and may be a bit overkill for what you seem to do (a good example in C++ is Boost.Asio).
As you probably already know, the main problem when dealing with more than one client is that I/O operations, like a read, are blocking, not letting us know when there's a new client or when a client has said something.
The fork way is pretty straightforward: the server socket (the one which accepts the connections) lives in the main process, and each time it accepts a new client, it forks a whole new process just to monitor this new client; this new process is dedicated to it. Since there's one process per client, we don't care whether I/O operations block or not.
The select way allows us to monitor multiple clients in one and the same process: it is a multiplexer telling us when something happens on the sockets we give it. The basic idea, on the server side, is to put the server socket in the read_fds FD_SET given to select. Each time select returns, you need to do a special check for it: if the server socket is set in the read_fds set (using FD_ISSET(...)), it means you have a new client connecting; you can then call accept on your server socket to create the connection.
Then you have to put all your client sockets in the fd_sets you give to select, in order to monitor any change on them (e.g. incoming messages).
I'm not really sure what you don't understand about select, so that's it for the big explanation. But long story short, select is a clean and neat way to do single-threaded, synchronous networking, and it can absolutely manage multiple clients at the same time without any fork or threads. Be aware, though, that if you absolutely want to use non-blocking sockets with select, you have to handle extra error conditions that wouldn't exist the blocking way (the Wikipedia example shows this well: they have to check whether errno is EWOULDBLOCK).
EDIT: Okay, with a little more code it's easier to see what's wrong.
select's first parameter should be nfds + 1, i.e. "the highest-numbered file descriptor in any of the three sets, plus 1" (cf. the manual), not FD_SETSIZE, which is the maximum size of an FD_SET. Usually that highest descriptor is the last accept-ed client socket (or the server socket at the beginning).
You shouldn't do the "CHECK all file descriptors" for loop like that. FD_SETSIZE is, e.g. on my machine, equal to 1024. That means that once select returns, even if you have just one client, you would pass through the loop 1024 times! You can start fd at 0 (like in the Wikipedia example), but since 0 is stdin, 1 stdout and 2 stderr, unless you're monitoring one of those you can start directly at your server socket's fd (it is probably the first of the monitored sockets, since socket numbers generally increase) and iterate until fd equals "nfds" (the currently highest fd).
Not sure whether it is mandatory, but before each call to select you should clear (with FD_ZERO, for example) and re-populate your read fd_set with all the sockets you want to monitor (i.e. your server socket and all your client sockets). Once again, take inspiration from the Wikipedia example.
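Pulling those points together, a minimal sketch of such a loop (serve() is a hypothetical function assuming listen_sock is already bound and listening, with blocking sockets):

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>

void serve(int listen_sock)
{
    fd_set master, readfds;
    int maxfd = listen_sock;
    char buf[2048];

    FD_ZERO(&master);
    FD_SET(listen_sock, &master);

    for (;;) {
        readfds = master;                     /* re-populate every iteration */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return;
        }
        for (int fd = listen_sock; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &readfds))
                continue;
            if (fd == listen_sock) {          /* new connection */
                int client = accept(listen_sock, NULL, NULL);
                if (client < 0) { perror("accept"); continue; }
                FD_SET(client, &master);
                if (client > maxfd)
                    maxfd = client;           /* keep nfds correct */
            } else {                          /* data from an existing client */
                ssize_t n = recv(fd, buf, sizeof buf, 0);
                if (n <= 0) {                 /* error or client disconnected */
                    close(fd);
                    FD_CLR(fd, &master);
                } else {
                    /* handle the n received bytes here */
                }
            }
        }
    }
}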
