I've modified the UDP code in the Linux kernel to implement send and receive buffers that handle out-of-order delivery of packets. In the new code, whenever I try to deliver multiple packets to the socket from the receive buffer, I get a kernel crash. My code snippet:
while (!skb_queue_empty(&sk->sk_receive_queue)) {
    skb = skb_peek(&sk->sk_receive_queue);
    qb = QUIC_SKB_CB(skb);
    //Check if this is the packet to be received
    if (qb->sequence != qp->first_rcv) {
        printk("First packet in queue not yet received\nFirst packet seq %u\nExpected packet seq %u\n", qb->sequence, qp->first_rcv);
        //break;
        goto drop;
    }
    skb_unlink(skb, &sk->sk_receive_queue);
    if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf))
        goto drop;
    rc = 0;
    ipv4_pktinfo_prepare(sk, skb);
    bh_lock_sock(sk);
    if (!sock_owned_by_user(sk))
        rc = __udp_queue_rcv_skb(sk, skb);
    else if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) {
        bh_unlock_sock(sk);
        goto drop;
    }
    bh_unlock_sock(sk);
    printk("Packets left in read buffer = %u\n", skb_queue_len(&sk->sk_receive_queue));
}
return rc;
However, when I remove the while loop, the code runs fine, though I only manage to deliver one packet from the buffer. Also, the crash happens after bh_lock_sock(sk), i.e. while the packet is being delivered to the socket. I figured this out by commenting out the lines between locking and unlocking the socket.
What could possibly be going wrong with the loop?
Thanks.
I figured out what I was doing wrong. I was using sk->sk_receive_queue as my receive buffer, assuming UDP doesn't really use it. But when __udp_queue_rcv_skb is called, it uses the list sk->sk_receive_queue itself, and hence interfered with the queue I had already populated.
I just defined another receive buffer in the struct udp_opt and I'm using it now.
I'm building a multi-client<->server messaging application over TCP.
I created a non blocking server using epoll to multiplex linux file descriptors.
When a fd receives data, I read() /or/ recv() into buf.
I know that I need to either specify a data length* at the start of the transmission, or use a delimiter** at the end of the transmission to segregate the messages.
*using a data length:
char *buffer_ptr = buffer;
do {
    switch (recvd_bytes = recv(new_socket, buffer_ptr, rem_bytes, 0)) {
        case -1: return SOCKET_ERR;
        case 0: return CLOSE_SOCKET;
        default: break;
    }
    buffer_ptr += recvd_bytes;
    rem_bytes -= recvd_bytes;
} while (rem_bytes != 0);
**using a delimiter:
void get_all_buf(int sock, std::string & inStr)
{
    int n = 1, total = 0, found = 0;
    char c;
    char temp[1024*1024];
    // Keep reading up to a '\n'
    while (!found) {
        n = recv(sock, &temp[total], sizeof(temp) - total - 1, 0);
        if (n == -1) {
            /* Error, check 'errno' for more details */
            break;
        }
        total += n;
        temp[total] = '\0';
        found = (strchr(temp, '\n') != 0);
    }
    inStr = temp;
}
My question: Is it OK to loop over recv() until one of those conditions is met? What if a client sends a bogus message length or no delimiter, or there is packet loss? Won't I be stuck looping on recv() in my program forever?
Is it OK to loop over recv() until one of those conditions is met?
Probably not, at least not for production-quality code. As you suggested, the problem with looping until you get the full message is that it leaves your thread at the mercy of the client -- if a client decides to only send part of the message and then wait for a long time (or even forever) without sending the last part, then your thread will be blocked (or looping) indefinitely and unable to serve any other purpose -- usually not what you want.
What if a client sends a bogus message length
Then you're in trouble (although if you've chosen a maximum message size, you can detect obviously bogus message lengths that are larger than that size, and defend yourself by e.g. forcibly closing the connection).
or there is packet loss?
If there is a reasonably small amount of packet loss, the TCP layer will automatically retransmit the data, so your program won't notice the difference (other than the message officially "arriving" a bit later than it otherwise would have). If there is really bad packet loss (e.g. someone pulled the Ethernet cable out of the wall for 5 minutes), then the rest of the message might be delayed for several minutes or more (until connectivity recovers, or the TCP layer gives up and closes the TCP connection), trapping your thread in the loop.
So what is the industrial-grade, evil-client-and-awful-network-proof solution to this dilemma, so that your server can remain responsive to other clients even when a particular client is not behaving itself?
The answer is this: don't depend on receiving the entire message all at once. Instead, you need to set up a simple state-machine for each client, such that you can recv() as many (or as few) bytes from that client's TCP socket as it cares to send to you at any particular time, and save those bytes to a local (per-client) buffer that is associated with that client, and then go back to your normal event loop even though you haven't received the entire message yet. Keep careful track of how many valid received-bytes-of-data you currently have on-hand from each client, and after each recv() call has returned, check to see if the associated per-client incoming-data-buffer contains an entire message yet, or not -- if it does, parse the message, act on it, then remove it from the buffer. Lather, rinse, and repeat.
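The per-client buffering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the wire format (a 2-byte big-endian length prefix), the MAX_MSG limit, and the names client_buf, client_feed and client_next_msg are all hypothetical and not part of the original question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN 2      /* assumed: 2-byte big-endian length prefix */
#define MAX_MSG 1024   /* assumed: largest payload the server accepts */

/* One of these lives alongside each connected client. */
struct client_buf {
    unsigned char data[HDR_LEN + MAX_MSG];
    size_t used;       /* number of valid bytes currently buffered */
};

/* Append the bytes that a single recv() call just returned.
 * Returns 0 on success, -1 if they do not fit (caller should drain
 * complete messages first, or drop the client). */
int client_feed(struct client_buf *cb, const void *src, size_t n)
{
    assert(cb->used <= sizeof(cb->data));
    if (n > sizeof(cb->data) - cb->used)
        return -1;
    memcpy(cb->data + cb->used, src, n);
    cb->used += n;
    return 0;
}

/* Try to extract one complete message into out.
 * Returns the payload length (>= 0) on success,
 * -1 if the message is not complete yet (go back to the event loop),
 * -2 if the announced length is bogus (close the connection). */
int client_next_msg(struct client_buf *cb, unsigned char *out, size_t out_cap)
{
    if (cb->used < HDR_LEN)
        return -1;
    size_t len = ((size_t)cb->data[0] << 8) | cb->data[1];
    if (len > MAX_MSG || len > out_cap)
        return -2;
    if (cb->used < HDR_LEN + len)
        return -1;
    memcpy(out, cb->data + HDR_LEN, len);
    /* Remove the consumed message, keeping any bytes of the next one. */
    memmove(cb->data, cb->data + HDR_LEN + len, cb->used - HDR_LEN - len);
    cb->used -= HDR_LEN + len;
    return (int)len;
}
```

In an epoll loop, each readable event would do one non-blocking recv(), pass the result to client_feed(), and then call client_next_msg() repeatedly until it returns -1, so no client can hold the thread inside a read loop.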
I'm testing the CAN interface on an embedded device (SOC / ARM core / Linux) using SocketCAN, and I want to send data as fast as possible for testing, using efficient code.
I can open the CAN device ("can0") as a BSD socket, and send frames with "write". This all works well.
My desktop can obviously generate frames faster than the CAN transmission rate (I'm using 500000 bps). To send efficiently, I tried using a "select" on the socket file descriptor to wait for it to become ready, followed by the "write". However, the "select" seems to return immediately regardless of the state of the send buffer, and "write" also doesn't block. This means that when the buffer fills up, I get an error from "write" (return value -1), and errno is set to 105 ("No buffer space available").
This means I have to wait an arbitrary amount of time, then try the write again, which seems very inefficient (polling!).
Here's my code (C, edited for brevity):
printf("CAN Data Generator\n");
int skt; // CAN raw socket
struct sockaddr_can addr;
struct canfd_frame frame;
const int WAIT_TIME = 500;
// Create socket:
skt = socket(PF_CAN, SOCK_RAW, CAN_RAW);
// Get the index of the supplied interface name:
unsigned int if_index = if_nametoindex(argv[1]);
// Bind CAN device to socket created above:
addr.can_family = AF_CAN;
addr.can_ifindex = if_index;
bind(skt, (struct sockaddr *)&addr, sizeof(addr));
// Generate example CAN data: 8 bytes; 0x11,0x22,0x33,...
// ...[Omitted]
// Send CAN frames:
fd_set fds;
const struct timeval timeout = { .tv_sec=2, .tv_usec=0 };
struct timeval this_timeout;
int ret;
ssize_t bytes_writ;
while (1)
{
    // Use 'select' to wait for socket to be ready for writing:
    FD_ZERO(&fds);
    FD_SET(skt, &fds);
    this_timeout = timeout;
    ret = select(skt+1, NULL, &fds, NULL, &this_timeout);
    if (ret < 0)
    {
        printf("'select' error (%d)\n", errno);
        return 1;
    }
    else if (ret == 0)
    {
        // Timeout waiting for buffer to be free
        printf("ERROR - Timeout waiting for buffer to clear.\n");
        return 1;
    }
    else
    {
        if (FD_ISSET(skt, &fds))
        {
            // Ready to write!
            bytes_writ = write(skt, &frame, CAN_MTU);
            if (bytes_writ != CAN_MTU)
            {
                if (errno == 105)
                {
                    // Buffer full!
                    printf("X"); fflush(stdout);
                    usleep(20); // Wait for buffer to clear
                }
                else
                {
                    printf("FAIL - Error writing CAN frame (%d)\n", errno);
                    return 1;
                }
            }
            else
            {
                printf("."); fflush(stdout);
            }
        }
        else
        {
            printf("-"); fflush(stdout);
        }
    }
    usleep(WAIT_TIME);
}
When I set the per-frame WAIT_TIME to a high value (e.g. 500 uS) so that the buffer never fills, I see this output:
CAN Data Generator
...............................................................................
................................................................................
...etc
Which is good! At 500 uS I get 54% CAN bus utilisation (according to canbusload utility).
However, when I try a delay of 0 to max out my transmission rate, I see:
CAN Data Generator
................................................................................
............................................................X.XX..X.X.X.X.XXX.X.
X.XX..XX.XX.X.XX.X.XX.X.X.X.XX..X.X.X.XX..X.X.X.XX.X.XX...XX.X.X.X.X.XXX.X.XX.X.
X.X.XXX.X.XX.X.X.X.XXX.X.X.X.XX.X.X.X.X.XX..X..X.XX.X..XX.X.X.X.XX.X..X..X..X.X.
.X.X.XX.X.XX.X.X.X.X.X.XX.X.X.XXX.X.X.X.X..XX.....XXX..XX.X.X.X.XXX.X.XX.XX.XX.X
.X.X.XX.XX.XX.X.X.X.X.XX.X.X.X.X.XX.XX.X.XXX...XX.X.X.X.XX..X.XX.X.XX.X.X.X.X.X.
The initial dots "." show the buffer filling up; Once the buffer is full, "X" starts appearing meaning that the "write" call failed with error 105.
Tracing through the logic, this means the "select" must have returned and the "FD_ISSET(skt, &fds)" was true, although the buffer was full! (or did I miss something?).
The SocketCAN docs just say "Writing CAN frames can be done similarly, with the write(2) system call"
This post suggests using "select".
This post suggests that "write" won't block for CAN priority arbitration, but doesn't cover other circumstances.
So is "select" the right way to do it? Should my "write" block? What other options could I use to avoid polling?
After a quick look at canbusload:184, it seems that it computes efficiency (#data/#total bits on the bus).
On the other hand, according to this, max efficiency for CAN bus is around 57% for 8-byte frames, so you seem not to be far away from that 57%... I would say you are indeed flooding the bus.
When setting a 500uS delay, 500kbps bus bitrate, 8-byte frames, it gives you a (control+data) bitrate of 228kbps, which is lower than max bitrate of the CAN bus, so, no bottleneck here.
Also, since in this case only 1 socket is being monitored, you don't need pselect, really. All you can do with pselect and 1 socket can be done without pselect and using write.
(Disclaimer: from here on, this is just guessing since I cannot test it right now, sorry.)
As for why pselect behaves this way: the buffer could have byte semantics, so it tells you there is still room for more bytes (at least 1), not necessarily for more can_frames. So when it returns, pselect does not tell you that you can send a whole CAN frame. I guess you could solve this by using SIOCOUTQ and the maximum size of the send buffer, SO_SNDBUF, but I'm not sure whether it works for CAN sockets (the nice thing would be to use the SO_SNDLOWAT option, but it is not changeable in Linux's implementation).
So, to answer your questions:
Is "select" the right way to do it?
Well, you can do it both ways, either (p)select or write, since you are only waiting for one file descriptor, there is no real difference.
Should my "write" block? It should if there is no single byte available in the send buffer.
What other options could I use to avoid polling? Maybe by ioctl'ing SIOCOUTQ and getsockopt'ing SO_SNDBUF and subtracting... you will need to check this yourself. Alternatively, maybe you could set the send buffer size to a multiple of sizeof(can_frame) and see whether it still signals you when less than sizeof(can_frame) is available.
Anyhow, if you are interested in more precise timing, you could use a BCM socket. There, you can instruct the kernel to send a specific frame at a specific interval. Once set, the process runs in kernel space, without any system call. In this way, the user-kernel buffer problem is avoided. I would test different rates until canbusload shows no rise in bus utilization.
select and poll worked correctly for me with SocketCAN. However, careful configuration is required.
some background:
between user app and the HW, there are 2 buffers:
socket buffer, where its size (in bytes) is controlled by the setsockopt's SO_SNDBUF option
driver's qdisc, where its size (in packets) is controlled by the "ifconfig can0 txqueuelen 5" command.
data path is: user app "write" command --> socket buffer -> driver's qdisc -> HW TX mailbox.
2 flow control points exist along this path:
when there is no free TX mailbox, the driver freezes its qdisc (__QUEUE_STATE_DRV_XOFF) to prevent more packets from being dequeued from the qdisc into the HW. it is unfrozen when a TX mailbox becomes free (upon the TX completion interrupt).
when the socket buffer goes above half of its capacity, poll/select blocks until the socket buffer drops back below half of its capacity.
now, assume that socket buffer has room for 20 packets, while driver's qdisc has room for 5 packets. lets assume also that HW have single TX mailbox.
poll/select let user write up to 10 packets.
those packets are moved down to socket buffer.
5 of those packets continue and fill driver's qdisc.
driver dequeue 1st packet from driver's qdisc, put it into HW TX mailbox and freeze driver's qdisc (=no more dequeue). now there is room for 1 packet in driver's qdisc
6th packet is moved down successfully from socket buffer to driver's qdisc.
7th packet is moved down from socket buffer to driver's qdisc, but since there is no room - it is dropped and error 105 ("No buffer space available") is generated.
what is the solution?
in the above assumptions, lets configure socket buffer for 8 packets. in this case, poll/select will block user app after 4 packets, ensuring that there is room in driver's qdisc for all of those 4 packets.
however, the socket buffer is configured in bytes, not in packets. the translation is as follows: each CAN packet occupies ~704 bytes in the socket buffer (most of them for the socket structure). so, to configure the socket buffer for 8 packets, the size in bytes should be 8*704:
int size = 8*704;
setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
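Putting the two pieces together, a sender could shrink the socket buffer and then wait for poll() to report the socket writable before each write. The following is only a sketch: the helper names are hypothetical, the 704-byte per-packet figure is the one quoted above and may differ between kernels and drivers, and the right packet count depends on your txqueuelen.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define BYTES_PER_PACKET 704  /* figure quoted above; assumed, may vary */

/* Size the socket send buffer so that it holds roughly `packets` frames. */
int limit_sndbuf_packets(int fd, int packets)
{
    int size = packets * BYTES_PER_PACKET;
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
}

/* Block until fd is writable.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret;
    return (pfd.revents & POLLOUT) ? 1 : -1;
}

/* Wait for room in the socket buffer, then write one frame.
 * Returns the number of bytes written, or -1 on timeout or error. */
ssize_t send_when_ready(int fd, const void *frame, size_t len, int timeout_ms)
{
    if (wait_writable(fd, timeout_ms) <= 0)
        return -1;
    return write(fd, frame, len);
}
```

With a CAN_RAW socket, fd would be the bound socket and frame a struct can_frame; with txqueuelen 5 as in the example above, limit_sndbuf_packets(fd, 8) corresponds to the 8*704 setting.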
I have a loop that captures packets with pcap_next_ex, and in each iteration I make a lot of function calls to process the packet. This work can be simulated by a Sleep() call in the loop. What happens when I call Sleep() in a pcap_next_ex() loop?
pcap_pkthdr* header = NULL;
UCHAR* content = NULL;
pcap = pcap_open(adapterName.c_str(), 65536, PCAP_OPENFLAG_PROMISCUOUS, 1000, NULL, NULL);
//Set to nonblock mode?
INT res;
// Parenthesize the assignment so that res holds the return value, not the result of the comparison
while ((res = pcap_next_ex(pcap, &header, const_cast<const UCHAR**>(&content))) >= 0)
{
    if (res != FALSE) // 0 means the read timeout expired without a packet
    {
        if (content)
        {
            //Here i do the stuff which I will simulate with a Sleep() call
            Sleep(200);
        }
    }
}
I have seen code which uses pcap_next_ex and saves the packets in a vector to process them later in another thread. This method notably reduces the time spent in the loop, but it does not convince me much. Shall I use this method?
I would like to use other WinPcap functions which capture packets in "non-blocking" mode and raise an event for each packet that arrives... What is the best method to avoid losing packets with WinPcap?
Any help will be appreciated. Regards.
WinPcap stores packets it captures into a ring buffer the size of which is limited.
If the number of bytes of packets reaches the ring buffer size, the old packets are discarded so that WinPcap can store new packets.
So, you should call pcap_next_ex as frequently as possible so that you can get as many packets as possible before they are discarded.
Calling pcap_next_ex in a dedicated thread and processing the packets in another thread is good practice, because that way pcap_next_ex can be called as frequently as possible.
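One way to hand packets from the capture thread to the processing thread is a single-producer/single-consumer ring. The sketch below is written in portable C11 rather than against the WinPcap API, and everything in it (spsc_ring, ring_push, ring_pop, the SLOTS and SNAPLEN sizes) is a hypothetical illustration, not something taken from WinPcap itself. It assumes exactly one thread pushes and exactly one thread pops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SLOTS   64     /* assumed queue depth */
#define SNAPLEN 1514   /* assumed maximum bytes kept per packet */

struct pkt {
    size_t len;
    unsigned char data[SNAPLEN];
};

struct spsc_ring {
    struct pkt slot[SLOTS];
    atomic_size_t head;   /* next slot to read, advanced by the consumer */
    atomic_size_t tail;   /* next slot to write, advanced by the producer */
};

/* Capture thread: copy one packet into the ring.
 * Returns 0 on success, -1 if the ring is full (packet is dropped). */
int ring_push(struct spsc_ring *r, const unsigned char *data, size_t len)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == SLOTS)
        return -1;
    struct pkt *p = &r->slot[tail % SLOTS];
    p->len = len < SNAPLEN ? len : SNAPLEN;   /* truncate oversized packets */
    memcpy(p->data, data, p->len);
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}

/* Processing thread: take the oldest packet out of the ring.
 * Returns 0 on success, -1 if the ring is empty. */
int ring_pop(struct spsc_ring *r, struct pkt *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return -1;
    *out = r->slot[head % SLOTS];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}
```

The capture loop would call ring_push(header->caplen bytes of content) right after each successful pcap_next_ex and do nothing else, while the slow work (the Sleep(200) stand-in) moves to the thread calling ring_pop. A full ring still means lost packets, but the loss then happens in your own code where you can count it.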
I wrote a simple multi-threaded TCP/IP server in ANSI C (the client is C#). Everything works fine, except that when the server doesn't receive a proper signal from the client (for example when the client crashes), it won't end the thread and close its socket. Eventually this could become a problem if those threads accumulate.
I have the threads stored in a linked list, so iterating through them isn't a problem. However, they are all blocked in recv() by default, and since a dead client won't send anything, they become stuck in memory.
What is the proper way of maintaining a list of online clients (or of detecting threads with a broken connection)?
struct tListItem {
    pthread_t thisThread;
    char* name;
    int c_sockfd;
    int run;
    tListItem* next;
    tListItem* prev;
};
struct tList {
    tListItem* head;
    int count;
};
code of thread:
while(param->run)
{
    bzero(&buf, sizeof(buf));
    if ((readLen = recv(param->c_sockfd, buf, BUFFSIZE, 0)) == -1)
    {
        perror("Read error");
        param->run = 0;
    }
    else if (readLen > 0) {
        printf("%s: %s \n", param->name, buf);
        parseIncoming(param->c_sockfd, param, buf);
    }
}
and here is my attempt to detect broken connection, but this causes the server to end with no message:
void* maintenance() {
    tListItem *item;
    char buf[4] = "PNG";
    while(1)
    {
        usleep(2000000);
        item= threadList->head;
        while(item != 0)
        {
            if ((send(item->c_sockfd, buf, 3, NULL)) == -1)
            {
                perror("Write error");
                item->run = 0;
            }
            item = item->next;
        }
    }
}
There's a few common ways this is dealt with:
1. Implement a heartbeat/ping-pong in your protocol on top of TCP. That is, periodically the client and/or server sends a heartbeat message to the other end. If the server has not received any data or heartbeat messages within a period of time, e.g. two times the heartbeat period, or if sending the heartbeat message from the server fails, then consider the connection to be dead and close it.
2. Implement an overall data timeout. Each time the server receives data, you record the current time. Periodically you check each connection for when it last received data, and time out/close connections that haven't received data in a while.
3. Enable TCP keepalive. This is basically a last resort if you cannot do either 1. or 2. It'll help you detect dead peers, as the TCP keepalives will break the connection if the peer cannot be reached. (Though it will not help you detect idle clients.) Note that the default for keepalives is in the order of hours.
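For the keepalive option, the default timing can be shortened per socket. The sketch below assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are available; other systems use different option names. The helper name enable_keepalive and the values passed to it are only illustrative.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on TCP keepalive for fd and override the system-wide timing.
 * idle_s:     seconds of silence before the first probe is sent
 * interval_s: seconds between unanswered probes
 * probes:     unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(c_sockfd, 60, 10, 3) right after accept() would make a blocked recv() on an unreachable peer fail after roughly 60 + 3*10 seconds instead of hours.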
In all cases you should always be calling read()/recv() or otherwise monitoring the socket for read events, so you learn as quickly as possible if the connection actively breaks.
It's also quite hard to implement this if you're doing blocking read()/recv() calls. You would normally need to set a timeout on the read() so you can wake up periodically and send a heartbeat message or check whether the client has been idle for too long. This is best done by using select()/poll() or the like, so you get a timeout instead of doing a blocking read() that might never return.
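A minimal sketch of such a timed read, using poll() in front of recv(). The function name recv_or_timeout and the RECV_TIMEOUT code are hypothetical; the per-client thread would use this in place of its plain recv() call.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define RECV_TIMEOUT (-2)   /* hypothetical code: nothing arrived in time */

/* Wait up to timeout_ms for data on fd, then read it.
 * Returns > 0 bytes read, 0 if the peer closed the connection,
 * -1 on error, RECV_TIMEOUT if the wait expired. */
ssize_t recv_or_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return RECV_TIMEOUT;
    /* Readable, closed, or in error: recv() reports which one it is. */
    return recv(fd, buf, len, 0);
}
```

On RECV_TIMEOUT the thread can send its heartbeat or compare the time of the last received data against the idle limit, and on 0 or -1 it can clean up and exit, so no separate maintenance thread has to write to sockets owned by other threads.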
I have written a forward proxy that I'm going to use on both Windows and Linux, with the required changes for each OS. However, I keep seeing some race conditions. I believe they are mostly due to my misunderstanding of how to tell which packet is the last one (the FIN signal). Currently I do a select on a set of sockets. Whichever socket gets signalled, I do a read() on it. If read() returns 0, I assume it is a FIN packet and I close that socket. Can it happen that my read() returns a non-zero value but that packet also carries the FIN? (I think it can.) If so, I end up not closing some sockets even though they have been closed.
I am not sure how proxies detect which socket has closed, or which packet is the last one on an established connection.
My code looks like the following:
I have 100 fds which I have accepted from clients. I store them in an array sock_array[total_size].
select(copy_of_sock_array,timeout)
for(int cnt=0;cnt<total_size;cnt++)
{
    if(FD_ISSET(sock_array[cnt],sock_array))
    {
        ret = recv(sock_array[cnt],buffer,len);
        if(ret<=0){
            /*This must be a FIN packet */
            /* Close corresponding socket which is opened with outer world */
            close(/*corresponding socket*/);
        }
    }
}
Does this look ok?
Thanks
You need to do a non-blocking read, and keep reading from the socket until you get a return value that indicates you should stop reading.
ssize_t r = 0;
for (;;) {
    r = recv(sock, buf, bufsz, MSG_DONTWAIT);
    if (r <= 0) {
        if (r < 0 && errno == EINTR) {
            continue;
        }
        break;
    }
    /* ... handle data in buf .. */
}
if (r < 0) {
    if (errno == EAGAIN) {
        /* ... wait in select again ... */
    } else {
        /* ... handle error ... */
    }
} else {
    /* got FIN */
}
Note that just because FIN is received does not necessarily mean the connection should be closed. The FIN merely indicates that no more data will be sent, but the peer may still be willing to accept more data. This can happen in HTTP where the client only wants a single response, so it delivers a FIN after its request. It still expects to receive the response though.
Your proxy likely has two sockets, say sock1 and sock2. So receipt of the FIN on sock1 should mean that this indication be forwarded onto sock2 after any data that has been queued on it has been delivered (and the mirror is true as well). You can forward the FIN by using shutdown.
shutdown(sock2, SHUT_WR);
When FIN has been received from both sock1 and sock2, you can call close on both sockets.
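As a rough illustration of forwarding the FIN, the sketch below relays one chunk in one direction and half-closes the other side when it sees end-of-stream. The name relay_once and the result codes are hypothetical. For brevity it uses blocking calls and assumes select has already reported `from` as readable, and MSG_NOSIGNAL assumes Linux.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum relay_result { RELAY_ERROR = -1, RELAY_EOF = 0, RELAY_DATA = 1 };

/* Move one chunk of data from `from` to `to`.
 * When `from` reports end-of-stream (the peer sent FIN), the FIN is
 * forwarded with shutdown(to, SHUT_WR); neither socket is closed here,
 * because the opposite direction may still carry data. */
int relay_once(int from, int to)
{
    char buf[4096];
    ssize_t n = recv(from, buf, sizeof(buf), 0);
    if (n < 0)
        return RELAY_ERROR;
    if (n == 0) {
        shutdown(to, SHUT_WR);
        return RELAY_EOF;
    }
    ssize_t off = 0;
    while (off < n) {
        /* MSG_NOSIGNAL: get EPIPE instead of SIGPIPE if `to` is gone. */
        ssize_t w = send(to, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
        if (w < 0)
            return RELAY_ERROR;
        off += w;
    }
    return RELAY_DATA;
}
```

The proxy would keep one flag per direction, set when relay_once returns RELAY_EOF, and call close on both sockets only once both flags are set.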
So addressing your questions.
Can it happen that my read() gives non zero value. But that packet does contain FIN (I think it can happen).
Yes, this may happen. This is why you continue reading until you get an indication to stop. Well, technically, you don't have to. You can defer that until you have processed some other connection if you have per connection fairness issues. But, you need to come back to it and finish reading before you enter your select wait.
So, I do not close some sockets though they have got closed. I am not sure how proxies detect which socket has closed? Or which is a last packet on the established connection.
As I described, as a (transparent) proxy, the socket can be safely closed once you have forwarded a FIN on it and a FIN has been received on it. If you are not a transparent proxy, you play by a different set of rules, since you really are the server for the client in that case. So, you can close the socket whenever the application protocol you are implementing permits you to do so.
Sockets have a well defined behavior. If you receive data and the connection is closed after that, you'll need two read()s. The first will return the data and the second one will return 0, to signal the end of connection.
You always have to read until the syscall returns 0.
And you don't need a non-blocking read to detect this!