WinPcap code - capture loses packets in loop - C

I have a loop that captures packets with pcap_next_ex, and in each iteration I make a lot of function calls to process the packets. This work can be simulated by a Sleep() call in the loop. So what happens when I call Sleep() in a pcap_next_ex() loop?
pcap_pkthdr* header = NULL;
const UCHAR* content = NULL;
pcap = pcap_open(adapterName.c_str(), 65536, PCAP_OPENFLAG_PROMISCUOUS, 1000, NULL, NULL);
// Set to non-blocking mode?
INT res;
while ((res = pcap_next_ex(pcap, &header, &content)) >= 0) // assign res first; '>=' would otherwise bind before '='
{
    if (res != FALSE)
    {
        if (content)
        {
            // Here I do the work, which I will simulate with a Sleep() call
            Sleep(200);
        }
    }
}
I have seen code that uses pcap_next_ex and saves the packets into a vector so that another thread can process them later. That approach reduces the time spent per iteration noticeably, but it does not fully convince me. Should I use it?
I would like to use other WinPcap functions that capture packets in non-blocking mode and raise an event for each packet that arrives... What is the best way to avoid losing packets with WinPcap?
Any help will be appreciated. Regards.

WinPcap stores the packets it captures in a ring buffer of limited size.
When the captured packets fill the ring buffer, the oldest packets are discarded so that WinPcap can store new ones.
So you should call pcap_next_ex as frequently as possible, to retrieve as many packets as possible before they are discarded.
Calling pcap_next_ex in a dedicated thread and processing the packets in another thread is good practice, because it lets you call pcap_next_ex as frequently as possible.
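A minimal sketch of that two-thread pattern, assuming Windows threads and an unbounded linked-list queue (a real version would bound the queue and reuse buffers). Note that pcap_next_ex reuses its header/content storage, so each packet must be copied before being queued:
#include <pcap.h>
#include <windows.h>
#include <stdlib.h>
#include <string.h>

// Shared, lock-protected queue between the capture and worker threads.
struct node { struct pcap_pkthdr hdr; UCHAR* data; struct node* next; };
static struct node *head = NULL, *tail = NULL;
static CRITICAL_SECTION lock; // call InitializeCriticalSection(&lock) before starting

DWORD WINAPI capture_thread(LPVOID arg)
{
    pcap_t* pcap = (pcap_t*)arg;
    struct pcap_pkthdr* header;
    const UCHAR* content;
    int res;
    while ((res = pcap_next_ex(pcap, &header, &content)) >= 0)
    {
        if (res == 0)
            continue; // read timeout expired, no packet
        struct node* n = (struct node*)malloc(sizeof(*n));
        n->hdr = *header; // copy: header/content are invalidated by the next call
        n->data = (UCHAR*)malloc(header->caplen);
        memcpy(n->data, content, header->caplen);
        n->next = NULL;
        EnterCriticalSection(&lock);
        if (tail) tail->next = n; else head = n;
        tail = n;
        LeaveCriticalSection(&lock);
    }
    return 0;
}
The worker thread pops nodes from head under the same critical section, runs the expensive processing, and frees the buffers; the capture loop itself never blocks for longer than a queue push, so the kernel ring buffer drains quickly.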

Related

Is it OK to loop over recv / read to read all data from socket

I'm building a multi-client<->server messaging application over TCP.
I created a non-blocking server using epoll to multiplex Linux file descriptors.
When a fd receives data, I read() /or/ recv() into buf.
I know that I need to either specify a data length* at the start of the transmission, or use a delimiter** at the end of the transmission to segregate the messages.
*using a data length:
char *buffer_ptr = buffer;
do {
    switch (recvd_bytes = recv(new_socket, buffer_ptr, rem_bytes, 0)) {
        case -1: return SOCKET_ERR;
        case 0:  return CLOSE_SOCKET;
        default: break;
    }
    buffer_ptr += recvd_bytes;
    rem_bytes  -= recvd_bytes;
} while (rem_bytes != 0);
**using a delimiter:
void get_all_buf(int sock, std::string &inStr)
{
    int n = 1, total = 0, found = 0;
    char c;
    char temp[1024 * 1024];
    // Keep reading up to a '\n'
    while (!found) {
        n = recv(sock, &temp[total], sizeof(temp) - total - 1, 0);
        if (n == -1) {
            /* Error, check 'errno' for more details */
            break;
        }
        total += n;
        temp[total] = '\0';
        found = (strchr(temp, '\n') != 0);
    }
    inStr = temp;
}
My question: Is it OK to loop over recv() until one of those conditions is met? What if a client sends a bogus message length or no delimiter, or there is packet loss? Won't I be stuck looping recv() in my program forever?
Is it OK to loop over recv() until one of those conditions is met?
Probably not, at least not for production-quality code. As you suggested, the problem with looping until you get the full message is that it leaves your thread at the mercy of the client -- if a client decides to only send part of the message and then wait for a long time (or even forever) without sending the last part, then your thread will be blocked (or looping) indefinitely and unable to serve any other purpose -- usually not what you want.
What if a client sends a bogus message length
Then you're in trouble (although if you've chosen a maximum message size, you can detect obviously bogus message lengths that are larger than that size and defend yourself by, e.g., forcibly closing the connection).
or there is packet loss?
If there is a reasonably small amount of packet loss, the TCP layer will automatically retransmit the data, so your program won't notice the difference (other than the message officially "arriving" a bit later than it otherwise would have). If there is really bad packet loss (e.g. someone pulled the Ethernet cable out of the wall for 5 minutes), then the rest of the message might be delayed for several minutes or more (until connectivity recovers, or the TCP layer gives up and closes the TCP connection), trapping your thread in the loop.
So what is the industrial-grade, evil-client-and-awful-network-proof solution to this dilemma, so that your server can remain responsive to other clients even when a particular client is not behaving itself?
The answer is this: don't depend on receiving the entire message all at once. Instead, you need to set up a simple state-machine for each client, such that you can recv() as many (or as few) bytes from that client's TCP socket as it cares to send to you at any particular time, and save those bytes to a local (per-client) buffer that is associated with that client, and then go back to your normal event loop even though you haven't received the entire message yet. Keep careful track of how many valid received-bytes-of-data you currently have on-hand from each client, and after each recv() call has returned, check to see if the associated per-client incoming-data-buffer contains an entire message yet, or not -- if it does, parse the message, act on it, then remove it from the buffer. Lather, rinse, and repeat.
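A minimal sketch of that per-client state machine, assuming a 4-byte length prefix in host byte order and a hypothetical handle_message() callback:
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_MSG 65536

void handle_message(const uint8_t *msg, uint32_t len); // hypothetical callback

struct client {
    uint8_t buf[MAX_MSG];
    size_t  have; // valid bytes accumulated so far
};

// Call whenever epoll reports this client's socket readable.
// Returns -1 on error/disconnect/bogus length, 0 otherwise.
int client_readable(int fd, struct client *c)
{
    ssize_t n = recv(fd, c->buf + c->have, sizeof(c->buf) - c->have, 0);
    if (n <= 0)
        return -1; // error or peer closed
    c->have += (size_t)n;

    // Extract as many complete length-prefixed messages as we have.
    while (c->have >= 4) {
        uint32_t len;
        memcpy(&len, c->buf, 4);
        if (len > MAX_MSG - 4)
            return -1; // bogus length: close the connection
        if (c->have < 4 + len)
            break; // message not complete yet; come back later
        handle_message(c->buf + 4, len);
        memmove(c->buf, c->buf + 4 + len, c->have - (4 + len));
        c->have -= 4 + len;
    }
    return 0;
}
Because the function consumes whatever bytes happen to be available and returns, a slow or malicious client only ever stalls its own buffer, never the event loop.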

SocketCAN select() and write() don't block

I'm testing the CAN interface on an embedded device (SOC / ARM core / Linux) using SocketCAN, and I want to send data as fast as possible for testing, using efficient code.
I can open the CAN device ("can0") as a BSD socket, and send frames with "write". This all works well.
My desktop can obviously generate frames faster than the CAN transmission rate (I'm using 500000 bps). To send efficiently, I tried using a "select" on the socket file descriptor to wait for it to become ready, followed by the "write". However, the "select" seems to return immediately regardless of the state of the send buffer, and "write" also doesn't block. This means that when the buffer fills up, I get an error from "write" (return value -1), and errno is set to 105 ("No buffer space available").
This means I have to wait an arbitrary amount of time and then retry the write, which seems very inefficient (polling!).
Here's my code (C, edited for brevity):
printf("CAN Data Generator\n");
int skt; // CAN raw socket
struct sockaddr_can addr;
struct canfd_frame frame;
const int WAIT_TIME = 500;
// Create socket:
skt = socket(PF_CAN, SOCK_RAW, CAN_RAW);
// Get the index of the supplied interface name:
unsigned int if_index = if_nametoindex(argv[1]);
// Bind CAN device to socket created above:
addr.can_family = AF_CAN;
addr.can_ifindex = if_index;
bind(skt, (struct sockaddr *)&addr, sizeof(addr));
// Generate example CAN data: 8 bytes; 0x11,0x22,0x33,...
// ...[Omitted]
// Send CAN frames:
fd_set fds;
const struct timeval timeout = { .tv_sec=2, .tv_usec=0 };
struct timeval this_timeout;
int ret;
ssize_t bytes_writ;
while (1)
{
// Use 'select' to wait for socket to be ready for writing:
FD_ZERO(&fds);
FD_SET(skt, &fds);
this_timeout = timeout;
ret = select(skt+1, NULL, &fds, NULL, &this_timeout);
if (ret < 0)
{
printf("'select' error (%d)\n", errno);
return 1;
}
else if (ret == 0)
{
// Timeout waiting for buffer to be free
printf("ERROR - Timeout waiting for buffer to clear.\n");
return 1;
}
else
{
if (FD_ISSET(skt, &fds))
{
// Ready to write!
bytes_writ = write(skt, &frame, CAN_MTU);
if (bytes_writ != CAN_MTU)
{
if (errno == 105)
{
// Buffer full!
printf("X"); fflush(stdout);
usleep(20); // Wait for buffer to clear
}
else
{
printf("FAIL - Error writing CAN frame (%d)\n", errno);
return 1;
}
}
else
{
printf("."); fflush(stdout);
}
}
else
{
printf("-"); fflush(stdout);
}
}
usleep(WAIT_TIME);
}
When I set the per-frame WAIT_TIME to a high value (e.g. 500 µs) so that the buffer never fills, I see this output:
CAN Data Generator
...............................................................................
................................................................................
...etc
Which is good! At 500 µs I get 54% CAN bus utilisation (according to the canbusload utility).
However, when I try a delay of 0 to max out my transmission rate, I see:
CAN Data Generator
................................................................................
............................................................X.XX..X.X.X.X.XXX.X.
X.XX..XX.XX.X.XX.X.XX.X.X.X.XX..X.X.X.XX..X.X.X.XX.X.XX...XX.X.X.X.X.XXX.X.XX.X.
X.X.XXX.X.XX.X.X.X.XXX.X.X.X.XX.X.X.X.X.XX..X..X.XX.X..XX.X.X.X.XX.X..X..X..X.X.
.X.X.XX.X.XX.X.X.X.X.X.XX.X.X.XXX.X.X.X.X..XX.....XXX..XX.X.X.X.XXX.X.XX.XX.XX.X
.X.X.XX.XX.XX.X.X.X.X.XX.X.X.X.X.XX.XX.X.XXX...XX.X.X.X.XX..X.XX.X.XX.X.X.X.X.X.
The initial dots "." show the buffer filling up; once the buffer is full, "X" starts appearing, meaning that the "write" call failed with error 105.
Tracing through the logic, this means the "select" must have returned and "FD_ISSET(skt, &fds)" was true even though the buffer was full! (Or did I miss something?)
The SocketCAN docs just say "Writing CAN frames can be done similarly, with the write(2) system call".
This post suggests using "select".
This post suggests that "write" won't block for CAN priority arbitration, but doesn't cover other circumstances.
So is "select" the right way to do it? Should my "write" block? What other options could I use to avoid polling?
After a quick look at canbusload:184, it seems that it computes efficiency (#data bits / #total bits on the bus).
On the other hand, according to this, max efficiency for CAN bus is around 57% for 8-byte frames, so you seem not to be far away from that 57%... I would say you are indeed flooding the bus.
With a 500 µs delay, a 500 kbps bus bitrate and 8-byte frames, you get a (control+data) bitrate of 228 kbps, which is lower than the max bitrate of the CAN bus, so there is no bottleneck here.
Also, since in this case only 1 socket is being monitored, you don't really need pselect: everything you can do with pselect on a single socket can be done without pselect, using write alone.
(Disclaimer: from here on this is just guessing, since I cannot test it right now, sorry.)
As for why pselect behaves this way: the buffer may have byte semantics, so it tells you there is still room for more bytes (at least 1), not necessarily room for a whole can_frame. That is, on return, pselect does not guarantee you can send a complete CAN frame. I guess you could work around this by combining SIOCOUTQ with the send buffer size SO_SNDBUF and subtracting, but I am not sure it works for CAN sockets (the nice thing would be to use the SO_SNDLOWAT option, but it is not changeable in Linux's implementation).
So, to answer your questions:
Is "select" the right way to do it?
Well, you can do it either way, with (p)select or with plain write; since you are only waiting on one file descriptor, there is no real difference.
Should my "write" block? It should if there is no single byte available in the send buffer.
What other options could I use to avoid polling? Maybe by ioctl'ing SIOCOUTQ, getsockopt'ing SO_SNDBUF and subtracting... you will need to check this yourself. Alternatively, you could set the send buffer size to a multiple of sizeof(can_frame) and see how it signals when less than sizeof(can_frame) bytes are available.
Anyhow, if you are interested in more precise timing, you could use a BCM socket. There, you can instruct the kernel to send a specific frame at a specific interval. Once set up, the transmission runs in kernel space, without any further system calls, so the user/kernel buffer problem is avoided. I would test different rates until canbusload shows no further rise in bus utilisation.
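A minimal sketch of that BCM approach, assuming a classic CAN frame, a hypothetical 0x123 CAN ID and a 500 µs cycle time:
#include <linux/can.h>
#include <linux/can/bcm.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// One BCM message: the header followed by the frame(s) to cycle.
struct bcm_tx { struct bcm_msg_head head; struct can_frame frame; };

int start_cyclic_send(const char *ifname)
{
    int s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM);
    struct sockaddr_can addr;
    memset(&addr, 0, sizeof(addr));
    addr.can_family = AF_CAN;
    addr.can_ifindex = if_nametoindex(ifname);
    if (s < 0 || connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    struct bcm_tx msg;
    memset(&msg, 0, sizeof(msg));
    msg.head.opcode        = TX_SETUP;              // create a cyclic TX job
    msg.head.flags         = SETTIMER | STARTTIMER; // set the interval and start now
    msg.head.count         = 0;                     // 0: cycle forever at ival2
    msg.head.ival2.tv_sec  = 0;
    msg.head.ival2.tv_usec = 500;                   // 500 us period
    msg.head.can_id        = 0x123;                 // hypothetical CAN ID
    msg.head.nframes       = 1;
    msg.frame.can_id       = 0x123;
    msg.frame.can_dlc      = 8;
    memcpy(msg.frame.data, "\x11\x22\x33\x44\x55\x66\x77\x88", 8);

    // One write arms the kernel-side timer; no further syscalls needed.
    return write(s, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? s : -1;
}
After this single write the kernel keeps retransmitting the frame on its own timer until the socket is closed or a TX_DELETE message is sent.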
select and poll worked correctly for me with SocketCAN. However, careful configuration is required.
Some background:
Between the user app and the HW there are 2 buffers:
the socket buffer, whose size (in bytes) is controlled by setsockopt's SO_SNDBUF option;
the driver's qdisc, whose size (in packets) is controlled by the "ifconfig can0 txqueuelen 5" command.
The data path is: user app "write" call -> socket buffer -> driver's qdisc -> HW TX mailbox.
2 flow-control points exist along this path:
when there is no free TX mailbox, the driver freezes its qdisc (__QUEUE_STATE_DRV_XOFF) to prevent more packets from being dequeued from the qdisc into the HW; it is un-frozen when a TX mailbox becomes free (upon the TX-completion interrupt).
when the socket buffer fills above half of its capacity, poll/select blocks until the socket buffer drains below half of its capacity.
Now assume that the socket buffer has room for 20 packets, while the driver's qdisc has room for 5 packets. Let's also assume the HW has a single TX mailbox.
poll/select lets the user app write up to 10 packets (half the socket buffer).
Those packets are moved down to the socket buffer.
5 of those packets continue on and fill the driver's qdisc.
The driver dequeues the 1st packet from its qdisc, puts it into the HW TX mailbox and freezes the qdisc (= no more dequeuing). Now there is room for 1 packet in the qdisc.
The 6th packet moves down successfully from the socket buffer to the driver's qdisc.
The 7th packet is moved down from the socket buffer to the driver's qdisc, but since there is no room, it is dropped and error 105 ("No buffer space available") is generated.
What is the solution?
Under the above assumptions, configure the socket buffer for 8 packets. In this case, poll/select blocks the user app after 4 packets, ensuring that there is room in the driver's qdisc for all 4 of them.
However, the socket buffer is configured in bytes, not packets. The translation works as follows: each CAN packet occupies ~704 bytes of socket buffer (most of it for the socket structure). So, to size the socket buffer for 8 packets, set it to 8*704 bytes:
int size = 8*704;
setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));

Receiving UDP packets in a loop crashes linux kernel

I've modified the UDP code in the Linux kernel to implement send and receive buffers that handle out-of-order delivery of packets. In the new code, whenever I try to deliver multiple packets to the socket from the receive buffer, I get a kernel crash. My code snippet:
while (!skb_queue_empty(&sk->sk_receive_queue)) {
    skb = skb_peek(&sk->sk_receive_queue);
    qb = QUIC_SKB_CB(skb);
    // Check if this is the packet to be received
    if (qb->sequence != qp->first_rcv) {
        printk("First packet in queue not yet received\nFirst packet seq %u\nExpected packet seq %u\n", qb->sequence, qp->first_rcv);
        //break;
        goto drop;
    }
    skb_unlink(skb, &sk->sk_receive_queue);
    if (sk_rcvqueues_full(sk, skb, sk->sk_rcvbuf))
        goto drop;
    rc = 0;
    ipv4_pktinfo_prepare(sk, skb);
    bh_lock_sock(sk);
    if (!sock_owned_by_user(sk))
        rc = __udp_queue_rcv_skb(sk, skb);
    else if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) {
        bh_unlock_sock(sk);
        goto drop;
    }
    bh_unlock_sock(sk);
    printk("Packets left in read buffer = %u\n", skb_queue_len(&sk->sk_receive_queue));
}
return rc;
However, when I remove the while loop from the code, the code runs fine, though I only manage to send one packet from the buffer. Also, the crash happens after bh_lock_sock(sk), i.e. while the packet is being delivered to the socket. I figured this out by commenting out the lines between locking and unlocking the socket.
What could possibly be going wrong with the loop?
Thanks.
I figured out what I was doing wrong. I was using sk->sk_receive_queue as the receive buffer, assuming UDP doesn't really use it. So when the function __udp_queue_rcv_skb was called, it used the list sk->sk_receive_queue and hence interfered with the queue I had already populated.
I just defined another receive buffer in the struct udp_opt and I'm using it now.
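A minimal sketch of that fix, assuming a hypothetical reorder_queue field and with up pointing at the socket's udp_opt (kernel-code fragments, not a standalone program):
/* In the protocol's private socket options (hypothetical field name): */
struct udp_opt {
    /* ... existing fields ... */
    struct sk_buff_head reorder_queue;  /* private buffer for out-of-order skbs */
};

/* At socket initialisation: */
skb_queue_head_init(&up->reorder_queue);

/* Buffer out-of-order packets here instead of in sk->sk_receive_queue,
 * so that __udp_queue_rcv_skb() can use sk_receive_queue undisturbed: */
skb_queue_tail(&up->reorder_queue, skb);

/* The delivery loop then peeks/unlinks from the private queue: */
while (!skb_queue_empty(&up->reorder_queue)) {
    skb = skb_peek(&up->reorder_queue);
    /* ... sequence check as before, then skb_unlink() and deliver ... */
}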

AT commands in embedded systems

I am using embedded C and trying to build an application for a GPRS terminal. My main problem is working with AT commands. I send an AT command over the serial line, but if it is a network-oriented command, its response can take some time, and because of that I spend a lot of time waiting while the processor does nothing. The idea is to do this waiting in parallel, as would happen in a different thread. Does anyone have an idea how to do that, given that my system does not support threads?
I had the idea of using timers, because we have an interrupt that is called every 5 ms, but I don't know how many seconds I have to wait for a response, and if I compare strings in the interrupt to check whether the whole message has been received, that could be very inefficient, right?
You could either use interrupts (configure the serial interface to interrupt when data is available) or use an RTOS, something like FreeRTOS, to run two threads: one for the main code and the other to block and wait for the serial data.
Update: based on your comments, you say you don't know the size of the data. That's fine: in the interrupt handler, check for the byte that terminates the data. This is a simple and generic example; you should check the examples for your MCU:
void on_serial_char()
{
    //disable interrupts
    disable_interrupts();
    //read byte
    byte = serial.read();
    //check if it's the terminating byte
    if (byte == END) {
        //set the flag here
        MESSAGE_COMPLETE = 1;
    }
    //add byte to buffer
    buf[length++] = byte;
    //enable interrupts
    enable_interrupts();
}
And check for that flag in your main loop:
...
if (MESSAGE_COMPLETE) {
    //process data
    ...
    //you may want to clear the flag here
    MESSAGE_COMPLETE = 0;
    //send next command
    ...
}
You can simply call a packetHandler in each main loop cycle.
This handler checks whether new characters are available from the serial port.
The packetHandler builds up the response message byte by byte; if the message is complete (CR LF found), it calls a messageReceived function, otherwise it simply returns to the main loop.
int main()
{
    init();
    for (;;)
    {
        packetHandler();
    }
}

char msgBuffer[80];
int pos = 0;

void packetHandler()
{
    char ch;
    while (isCharAvailable())
    {
        ch = getChar();
        msgBuffer[pos++] = ch;
        // terminate on newline, or when the buffer is full (avoids overflow)
        if (ch == '\n' || pos >= (int)sizeof(msgBuffer) - 1)
        {
            msgBuffer[pos] = '\0'; // make it a proper C string
            messageReceived(msgBuffer);
            pos = 0;
        }
    }
}
It sounds like you are rather close to the hardware drivers. If so, the best way is to use DMA, if the MCU supports it, and then use the flag from the DMA hardware to determine when to start parsing the received data.
The second-best option is to use RX interrupts: store every received byte in a simple FIFO, such as a circular buffer, then set a flag once a complete message has been received. One buffer for incoming data and one for the latest valid data received may be necessary.
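A minimal sketch of such an RX-interrupt circular buffer, assuming a hypothetical UART_RX data register and a single producer (the ISR) with a single consumer (the main loop):
#include <stdint.h>

#define RX_BUF_SIZE 128 /* power of two so the index wraps cheaply */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint16_t rx_head; /* written by the ISR only */
static volatile uint16_t rx_tail; /* read by the main loop only */

/* UART RX interrupt: store the byte and advance the head. */
void uart_rx_isr(void)
{
    uint8_t byte = UART_RX; /* hypothetical data register */
    uint16_t next = (rx_head + 1) & (RX_BUF_SIZE - 1);
    if (next != rx_tail) /* drop the byte if the FIFO is full */
    {
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
}

/* Main loop: returns 1 and stores a byte if one is available. */
int uart_rx_pop(uint8_t *out)
{
    if (rx_tail == rx_head)
        return 0; /* FIFO empty */
    *out = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_BUF_SIZE - 1);
    return 1;
}
The main loop drains the FIFO with uart_rx_pop() and does the string comparison there, so the ISR stays short and no response bytes are lost while the processor is busy.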

C - How to use both aio_read() and aio_write()

I am implementing a game server where I need to both read and write. So I accept an incoming connection and start reading from it using aio_read(), but when I need to send something, I stop reading using aio_cancel() and then use aio_write(). Within the write's callback I resume reading. So I am reading all the time, but when I need to send something, I pause reading.
This works about 20% of the time; in the other cases the call to aio_cancel() fails with "Operation now in progress", and I cannot cancel it (even by retrying in an endless while loop). So my queued write operation never happens.
How do I use these functions properly? What did I miss?
EDIT:
Used under Linux 2.6.35. Ubuntu 10 - 32 bit.
Example code:
void handle_read(union sigval sigev_value) { /* handle data or disconnection */ }
void handle_write(union sigval sigev_value) { /* free writing buffer memory */ }

void start()
{
    const int acceptorSocket = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(struct sockaddr_in));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);
    bind(acceptorSocket, (struct sockaddr*)&addr, sizeof(struct sockaddr_in));
    listen(acceptorSocket, SOMAXCONN);
    struct sockaddr_in address;
    socklen_t addressLen = sizeof(struct sockaddr_in);
    for (;;)
    {
        const int incomingSocket = accept(acceptorSocket, (struct sockaddr*)&address, &addressLen);
        if (incomingSocket == -1)
        { /* handle error ... */ }
        else
        {
            // tell the socket to append outgoing messages when writing:
            const int currentFlags = fcntl(incomingSocket, F_GETFL, 0);
            if (currentFlags < 0) { /* handle error ... */ }
            if (fcntl(incomingSocket, F_SETFL, currentFlags | O_APPEND) == -1) { /* handle another error ... */ }
            // start reading:
            struct aiocb* readingAiocb = new struct aiocb;
            memset(readingAiocb, 0, sizeof(struct aiocb));
            readingAiocb->aio_nbytes = MY_SOME_BUFFER_SIZE;
            readingAiocb->aio_fildes = incomingSocket;
            readingAiocb->aio_buf = mySomeReadBuffer;
            readingAiocb->aio_sigevent.sigev_notify = SIGEV_THREAD;
            readingAiocb->aio_sigevent.sigev_value.sival_ptr = (void*)mySomeData;
            readingAiocb->aio_sigevent.sigev_notify_function = handle_read;
            if (aio_read(readingAiocb) != 0) { /* handle error ... */ }
        }
    }
}
//called at any time from the server side:
void send(void* data, const size_t dataLength)
{
    //... some thread-safety precautions not needed here ...
    const int cancellingResult = aio_cancel(socketDesc, readingAiocb);
    if (cancellingResult != AIO_CANCELED)
    {
        //this happens ~80% of the time; wrapping the previous call in an endless while loop does not help:
        if (cancellingResult == AIO_NOTCANCELED)
        {
            puts(strerror(aio_return(readingAiocb))); // "Operation now in progress"
            /* don't know what to do... */
        }
    }
    //otherwise it's okay to send:
    else
    {
        aio_write(...);
    }
}
If you wish to have separate AIO queues for reads and writes, so that a write issued later can execute before a read issued earlier, then you can use dup() to create a duplicate of the socket, and use one to issue reads and the other to issue writes.
However, I second the recommendations to avoid AIO entirely and simply use an epoll()-driven event loop with non-blocking sockets. This technique has been shown to scale to high numbers of clients - if you are getting high CPU usage, profile it and find out where that's happening, because the chances are that it's not your event loop that's the culprit.
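A minimal sketch of such an epoll()-driven loop, assuming hypothetical on_readable()/on_writable() callbacks into the game-server logic and omitting error handling:
#include <sys/epoll.h>
#include <sys/socket.h>
#include <fcntl.h>

void on_readable(int fd); // hypothetical: recv() until EAGAIN, feed the parser
void on_writable(int fd); // hypothetical: flush this fd's pending output

void event_loop(int acceptorSocket)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = acceptorSocket };
    epoll_ctl(ep, EPOLL_CTL_ADD, acceptorSocket, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == acceptorSocket) {
                int client = accept(acceptorSocket, NULL, NULL);
                fcntl(client, F_SETFL, fcntl(client, F_GETFL, 0) | O_NONBLOCK);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {
                if (events[i].events & EPOLLIN)
                    on_readable(fd); // a real server also registers EPOLLOUT,
                if (events[i].events & EPOLLOUT)
                    on_writable(fd); // but only while output is pending
            }
        }
    }
}
Because the sockets are non-blocking, reads and writes never pause the loop, so there is nothing to cancel when you want to send.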
First of all, consider dumping aio. There are lots of other ways to do asynchronous I/O that are not as braindead (yes, aio is braindead). Lots of alternatives; if you're on Linux you can use libaio (io_submit and friends). aio(7) mentions this.
Back to your question.
I haven't used aio in a long time but here's what I remember. aio_read and aio_write both put requests (aiocb) on some queue. They return immediately even if the requests will complete some time later. It's entirely possible to queue multiple requests without caring what happened to the earlier ones. So, in a nutshell: stop cancelling read requests and keep adding them.
/* populate read_aiocb */
rc = aio_read(&read_aiocb);
/* time passes ... */
/* populate write_aiocb */
rc = aio_write(&write_aiocb);
Later you're free to wait using aio_suspend, poll using aio_error, wait for signals etc.
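For instance, a small sketch of waiting with aio_suspend on the two requests from the snippet above (names assumed from that snippet):
#include <aio.h>

/* Block until at least one of the two outstanding requests completes. */
const struct aiocb *list[2] = { &read_aiocb, &write_aiocb };
if (aio_suspend(list, 2, NULL) == 0) {
    if (aio_error(&read_aiocb) == 0) {       /* read finished */
        ssize_t n = aio_return(&read_aiocb); /* bytes transferred */
        /* ... consume the data, then aio_read() again ... */
    }
    if (aio_error(&write_aiocb) == 0) {      /* write finished */
        /* ... free or reuse the write buffer ... */
    }
}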
I see you mention epoll in your comment. You should definitely go for libaio.
Unless I'm mistaken, POSIX AIO (that is, aio_read(), aio_write() and so on) is guaranteed to work only on seekable file descriptors. From the aio_read() manpage:
The data is read starting at the absolute file offset aiocbp->aio_offset, regardless of the current file position. After this request, the value of the current file position is unspecified.
For devices which do not have an associated file position, such as network sockets, POSIX AIO is, AFAICS, undefined. Perhaps it happens to work on your current setup, but that seems more by accident than by design.
Also, on Linux, POSIX AIO is implemented in glibc with the help of userspace threads.
That is, where possible, use non-blocking I/O and epoll(). However, epoll() does not work for seekable file descriptors such as regular files (the same goes for the classical select()/poll()); in that case, POSIX AIO is an alternative to rolling your own thread pool.
There should be no reason to stop or cancel an aio read or write request just because you need to make another read or write. If that were the case, it would defeat the whole point of asynchronous reading and writing, since their main purpose is to let you set up a read or write operation and then move on. Since multiple requests can be queued, it would be much better to set up a couple of asynchronous reader/writer pools, where you can grab a set of pre-initialized aiocb structures from an "available" pool whenever you need them, and return them to a "finished" pool when they're done so you can access the buffers they point to. While they're in the middle of an asynchronous read or write, they sit in a "busy" pool and aren't touched. That way you won't have to keep creating aiocb structures on the heap dynamically every time you need to make a read or write operation, although that's okay to do... it's just not very efficient if you never plan on going over a certain limit, or plan to have only a certain number of "in-flight" requests.
BTW, keep in mind that with a couple of different in-flight asynchronous requests, your asynchronous read/write handler can itself be interrupted by another read/write event. So you really don't want to do a whole lot in your handler. In the scenario described above, your handler would basically move the aiocb struct that triggered it from one pool to the next along the "available"->"busy"->"finished" stages. Your main code, after reading from the buffer pointed to by the aiocb structures in the "finished" pool, would then move the structure back to the "available" pool.
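A minimal sketch of that pool scheme, assuming fixed-size pools and glossing over the synchronization a real completion handler needs:
#include <aio.h>
#include <string.h>

#define POOL_SIZE 16
#define BUF_SIZE  4096

enum slot_state { SLOT_AVAILABLE, SLOT_BUSY, SLOT_FINISHED };

struct slot {
    struct aiocb cb;
    char         buf[BUF_SIZE];
    volatile enum slot_state state;
};

static struct slot pool[POOL_SIZE];

/* Completion callback: just mark the slot; the main loop does the rest. */
static void on_complete(union sigval sv)
{
    ((struct slot *)sv.sival_ptr)->state = SLOT_FINISHED;
}

/* Grab an available slot and queue an asynchronous read on fd. */
static struct slot *queue_read(int fd)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        struct slot *s = &pool[i];
        if (s->state != SLOT_AVAILABLE)
            continue;
        memset(&s->cb, 0, sizeof(s->cb));
        s->cb.aio_fildes = fd;
        s->cb.aio_buf    = s->buf;
        s->cb.aio_nbytes = sizeof(s->buf);
        s->cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
        s->cb.aio_sigevent.sigev_value.sival_ptr = s;
        s->cb.aio_sigevent.sigev_notify_function = on_complete;
        s->state = SLOT_BUSY;
        if (aio_read(&s->cb) != 0) { /* could not queue the request */
            s->state = SLOT_AVAILABLE;
            return NULL;
        }
        return s;
    }
    return NULL; /* pool exhausted */
}

/* Main loop: after consuming s->buf from a SLOT_FINISHED slot,
 * set s->state = SLOT_AVAILABLE to recycle it. */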
