If I want to sniff packets in Linux without setting any filters, I see 2 options.
Use libpcap
Use raw socket myself like https://www.binarytides.com/packet-sniffer-code-in-c-using-linux-sockets-bsd-part-2/
Why is libpcap better than using raw sockets myself?
Three reasons:
1) It's way easier to correctly set up.
2) It's portable, even to Windows, which uses a quite similar yet different API for sockets.
3) It's MUCH faster.
1 and 2, IMO, don't need much explanation. I'll dive into 3.
To understand why libpcap is (generally) faster, we need to understand the bottlenecks in the socket API.
The two biggest bottlenecks that libpcap tends to avoid are syscalls and copies.
How it does so is platform-specific.
I'll tell the story for Linux.
Linux, since 2.0 IIRC, implements what it calls the AF_PACKET socket family, and later added PACKET_MMAP on top of it. I don't exactly recall the benefits of the former on its own, but the latter is critical: it avoids both copying from kernel to userspace (there are still a few copies kernel-side) and syscalls.
With PACKET_MMAP you allocate a big ring buffer in userspace and associate it with an AF_PACKET socket. This ring buffer contains a bit of metadata (most importantly, a marker that says whether a region is ready for user processing) and the packet contents.
When a packet arrives on a relevant interface (generally one you bind your socket to), the kernel makes a copy into the ring buffer and marks that location as ready for userspace*.
If the application was waiting on the socket, it gets notified*.
So, why is this better than raw sockets? Because after setting up the socket you can get by with few or no syscalls, depending on whether you busy-poll the buffer yourself or wait with poll until a few packets are ready, and because you don't need the copy from the socket's internal RX buffer into your own buffers, since the ring buffer is shared with you.
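For illustration, here is a minimal sketch of that mechanism on Linux, assuming TPACKET_V2 and arbitrary ring dimensions; error handling is omitted and it needs CAP_NET_RAW (e.g. root) to run. It sketches the raw kernel API directly, not what libpcap's internals literally look like:

#include <stdio.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    int version = TPACKET_V2;
    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));

    /* 64 blocks of 4 KiB, 2 frames per block -> 128 frames of 2 KiB. */
    struct tpacket_req req = {
        .tp_block_size = 4096,
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = 128,
    };
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    size_t map_len = (size_t)req.tp_block_size * req.tp_block_nr;
    unsigned char *ring = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

    unsigned int frame = 0;
    for (;;) {
        struct tpacket2_hdr *hdr =
            (struct tpacket2_hdr *)(ring + (size_t)frame * req.tp_frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER)) {
            /* Nothing ready yet: sleep until the kernel fills a frame. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);
            continue;
        }

        /* The packet bytes live inside the shared ring: no copy to userspace. */
        unsigned char *pkt = (unsigned char *)hdr + hdr->tp_mac;
        printf("captured %u bytes\n", hdr->tp_snaplen);
        (void)pkt;   /* a real sniffer would parse or dump the packet here */

        /* Hand the frame back to the kernel and move on. */
        hdr->tp_status = TP_STATUS_KERNEL;
        frame = (frame + 1) % req.tp_frame_nr;
    }

    munmap(ring, map_len);
    close(fd);
    return 0;
}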
libpcap does all of that for you. And it does so on Mac, *BSD, and pretty much any platform that provides faster capture methods.
*It's a bit more complex on version 3, where the granularity is in "blocks" instead of packets.
I need to know where I can zeroize the received/transmitted network packets in the e1000 Linux driver. I need this to pass a compliance requirement, but I'm not able to find where in the e1000 code to do the zeroization of the network packet buffer (or if it already does the zeroization somewhere, that would be great).
I saw that the kernel does ring zeroization when the interface goes up or down, in the file Intel_LAN_15.0.0_Linux_Source_A00/Source/base_driver/e1000e-2.4.14/src/netdev.c, in the e1000_clean_rx_ring() and e1000_clean_tx_ring() functions:
/* Zero out the descriptor ring */
memset(rx_ring->desc, 0, rx_ring->size);
But I'm not able to find where it should be done for each packet that the system receives/sends.
So, does anybody know where in the code the buffer zeroization for the tx/rx packets should happen? I bet it will introduce some overhead, but I have to do it anyway.
We're using the Intel EF multi-port network card: https://www-ssl.intel.com/content/www/us/en/network-adapters/gigabit-network-adapters/gigabit-et-et2-ef-multi-port-server-adapters-brief.html?
and kernel 3.4.107 (the linux-image-3.4.107-0304107-generic_3.4.107-0304107.201504210712_amd64.deb package).
EDIT: @skgrrwasme correctly pointed out that the e1000_clean_tx_ring and e1000_clean_rx_ring functions seem to do the zeroizing work, but as they are only called when the hardware is down, they don't cover our compliance need.
So, it seems that the functions doing the work for each packet are e1000_clean_rx_irq and e1000_clean_tx_irq, but those functions don't zeroize data: they only free memory, without a memset() of 0 to overwrite it (and that's what is required). Since it is enough to zeroize data on rx or tx, what I think could be done is inside e1000_clean_tx_irq(), which calls e1000_unmap_and_free_tx_resource(); but in fact that only frees the buffer, it doesn't zeroize it:
if (buffer_info->skb) {
    dev_kfree_skb_any(buffer_info->skb);
    buffer_info->skb = NULL;
}
So what I think is that we could add the memset inside dev_kfree_skb_any(). That function calls one of two functions:
void dev_kfree_skb_any(struct sk_buff *skb)
{
    if (in_irq() || irqs_disabled())
        dev_kfree_skb_irq(skb);
    else
        dev_kfree_skb(skb);
}
So, something easy would be a call to skb_recycle_check(skb), which will do a:
memset(skb, 0, offsetof(struct sk_buff, tail));
Does this make sense? I think that with this, the memory will be overwritten with zeroes, and the work will be done, but I'm not sure...
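Alternatively (hypothetical and untested on my side), zeroing the packet payload itself, rather than only the sk_buff header, right before the free in e1000_unmap_and_free_tx_resource() might look roughly like this (it only covers the linear data area; paged fragments would need their own handling):

if (buffer_info->skb) {
    /* hypothetical: overwrite the linear payload before the skb is freed */
    memset(buffer_info->skb->data, 0, skb_headlen(buffer_info->skb));
    dev_kfree_skb_any(buffer_info->skb);
    buffer_info->skb = NULL;
}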
TL;DR
As far as I can tell, both the transmit and receive buffers are already cleaned by the driver. I don't think you need to do anything.
Longer Answer
I don't think you have to worry about it. The transmit and receive buffer clearing functions, e1000_clean_tx_irq and e1000_clean_rx_irq, seem to be called in any interrupt configuration, and for both transmit and receive. Interrupts can be triggered with any of the following interrupt signaling methods: legacy, MSI, or MSI-X. It appears that ring buffer cleaning happens in any interrupt mode, but the cleaning functions are called from different locations.
Since you have two types of transfers (transmit and receive) and three different types of interrupt invocations (Legacy, MSI, and MSI-X), you have a total of six scenarios where you need to make sure things are cleaned. Fortunately, five of the six situations handle the packets by scheduling a job for NAPI. These scenarios are transmit and receive for Legacy and MSI interrupts, and receive for MSI-X. Part of NAPI handling those packets is calling the e1000_clean function as a callback. If you look at the code, you'll see that it calls the buffer cleaning functions for both TX and RX.
The outlier is the MSI-X TX handler. However, it seems to directly call the TX buffer cleaning function, rather than having NAPI handle it.
Here are the relevant interrupt handlers that weren't specifically listed above:
Legacy (both RX and TX)
MSI (both RX and TX)
MSI-X RX
Notes
All of my function references will open a file in the e1000e driver called netdev.c. They will open a window in the Linux Cross Reference database.
This post discusses the e1000e driver, but some of the function names are "e1000...". I think a lot of the e1000 code was reused in the newer e1000e driver, so some of the names carried over. Just know that it isn't a typo.
The e1000_clean_tx_ring and e1000_clean_rx_ring functions that you referred to appear to only be called when the driver is trying to free resources or the hardware is down, not during any actual packet handling. The two I referenced above do seem to be, though. I'm not sure exactly what the difference between them is, but they appear to get the job done.
I am looking to implement some kind of transmission protocol in C, to use on custom hardware. I have the ability to send and receive through RF, but I need to rely on some protocol that validates the integrity of the packets sent/received, so I thought it would be a good idea to implement some kind of UDP library.
Of course, if there is any way that I can modify the existing implementations of UDP or TCP so they work over my RF device, it would be of great help. The only thing that I think needs to be changed is the way that a single bit is sent; if I could change that in the UDP library (sys/socket.h), it would save me a lot of time.
UDP does not exist in standard C99 or C11.
It is generally part of some Internet Protocol layer implementation. That is very complex software (as soon as you want some performance).
I would suggest using some existing operating system kernel (e.g. Linux) and writing a network driver (e.g. for the Linux kernel) for your device. Life is too short to write a competitive UDP-like layer (that could take you dozens of years).
Addendum
Apparently, the mention of UDP in the question is confusing. Per your comments (which should go inside the question), you just want some serial protocol on a small 8-bit PIC 18F4550 microcontroller (32 Kbytes ROM + 2 Kbytes RAM). Without knowing additional constraints, I would suggest a tiny "textual" protocol (e.g. ASCII lines, no more than 128 bytes per line, \n terminated ....) and I would put some simple hex checksum inside it. In the 1980s, Hayes modems had such things.
What you should then do is define and document the protocol first (e.g. as a BNF syntax of the message lines), then implement it (probably with buffering and finite state automaton techniques). You might invent some message format like e.g. DOFOO?123,456%BE53 followed by a newline, meaning: run the command DOFOO with arguments 123 then 456, with hex checksum BE53.
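A minimal sketch of validating one such line, assuming an invented checksum rule (a 16-bit sum of every byte before the '%', written as hex); the real protocol, command names, and checksum are of course yours to define:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Invented rule for this sketch: checksum = 16-bit sum of every byte
   before the '%', compared against the hex field after it. */
static int line_checksum_ok(const char *line)
{
    const char *pct = strrchr(line, '%');
    if (!pct)
        return 0;

    uint16_t sum = 0;
    for (const char *p = line; p < pct; p++)
        sum = (uint16_t)(sum + (unsigned char)*p);

    return sum == (uint16_t)strtoul(pct + 1, NULL, 16);
}

int main(void)
{
    char line[128];
    if (fgets(line, sizeof(line), stdin)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the terminator */
        printf("checksum %s\n", line_checksum_ok(line) ? "ok" : "bad");
    }
    return 0;
}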
I need to perform data filtering based on the source unicast IPv4 address of datagrams arriving to a Linux UDP socket.
Of course, it is always possible to manually perform the filtering based on the information provided by recvfrom, but I am wondering if there could be another more intelligent/efficient approach (if possible, not using libpcap).
Any ideas?
If it's a single source you need to allow, then just use connect(2) and the kernel will do the filtering for you. As a bonus, connected UDP sockets are more efficient. This, of course, does not work for more than one source.
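A minimal sketch of that approach, using an example peer of 192.0.2.10:5000 and local port 6000 (note that a connected UDP socket filters on the source port as well as the source address):

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    /* Listen on local UDP port 6000 (example). */
    struct sockaddr_in local = { 0 };
    local.sin_family = AF_INET;
    local.sin_port = htons(6000);
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&local, sizeof(local));

    /* "Connect" to the one allowed source (example address/port). */
    struct sockaddr_in peer = { 0 };
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5000);
    inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr);
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));

    /* From now on the kernel delivers only datagrams from 192.0.2.10:5000,
       and send()/recv() can be used without per-call addresses. */
    char buf[1500];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    printf("got %zd bytes\n", n);

    close(fd);
    return 0;
}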
As already stated, NetFilter (the Linux firewall) can help you here.
You could also use the UDP options of xinetd and tcpd to perform filtering.
What proportion of datagrams are you expecting to discard? If it is very high, then you may want to review your application design (for example, to make the senders not send so many datagrams which are to be discarded). If it is not very high, then you don't really care about how much effort you spend discarding them.
Suppose discarding a packet takes the same amount of (runtime) effort as processing it normally; if you discard 1% of packets, you will only be spending 1% of time discarding. However, realistically, discarding is likely to be much easier than processing messages.
If I am writing to a socket file descriptor using write() byte by byte:
Is every byte now a packet?
Will the socket add a TCP/IP header to every byte?
Or is there a buffering mechanism (I personally doubt it, since I don't explicitly flush)?
For example:
write(fd, "a", 1);
write(fd, "b", 1);
write(fd, "c", 1);
Will this be less efficient than say
write(fd, "abc", 3);
I have to ask this here because I do not have the expertise to monitor TCP/IP header in traffic. Thanks.
No, not every byte will become a packet. Some may be coalesced due to Nagle's Algorithm and other things. There will be one TCP header per packet, not per byte.
That said, you should avoid calling write/send byte by byte because each one is a system call, which is expensive (on the local machine, not in terms of how it ends up on the network).
Adding to John's answer, you can disable Nagle's Algorithm (via TCP_NODELAY) and then the first version will become slower.
And for the reverse, you can call writev() instead of write(), which will cause the first version to perform exactly as the second.
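For concreteness, a small sketch combining both suggestions; the freshly created TCP socket is only there to show the TCP_NODELAY call (which works even before connecting), and STDOUT_FILENO stands in for a connected socket so the snippet runs as-is:

#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Disable Nagle: small writes go out immediately instead of coalescing. */
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

    /* Gather three 1-byte pieces into a single write: one syscall, and the
       stack sees the same 3 bytes as write(fd, "abc", 3) would give it. */
    struct iovec iov[3] = {
        { .iov_base = "a", .iov_len = 1 },
        { .iov_base = "b", .iov_len = 1 },
        { .iov_base = "c", .iov_len = 1 },
    };
    writev(STDOUT_FILENO, iov, 3);   /* would be writev(fd, iov, 3) on the socket */

    close(fd);
    return 0;
}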
It really depends on the implementation of the TCP/IP stack and on the segmentation implemented in the OS. Most OSes have a lot of optimization already built in.
If you're looking at a worst-case situation, a TCP header is 20 bytes, an IP header is 20 bytes, plus the frame header (depending on the link protocol you're using, probably Ethernet, which is 14 bytes), so you could expect roughly 20 + 20 + 14 = 54 bytes of headers on top of your payload. That being said, the majority of the traffic on the internet is dominated by ACKs; however, your network stack should combine the payloads.
We have a client/server communication system over UDP set up in Windows. The problem we are facing is that when the throughput grows, packets are getting dropped. We suspect that this is due to the UDP receive buffer being continuously polled, causing the buffer to be blocked and incoming packets to be dropped. Is it possible that reading this buffer will cause incoming packets to be dropped? If so, what are the options to correct this? The system is written in C. Please let me know if this is too vague and I can try to provide more info. Thanks!
The default socket buffer size in Windows sockets is 8k, or 8192 bytes. Use the setsockopt Windows function to increase the size of the buffer (refer to the SO_RCVBUF option).
But beyond that, increasing the size of your receive buffer will only delay the time until packets get dropped again if you are not reading the packets fast enough.
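For reference, a minimal sketch of the setsockopt call; POSIX headers are shown, and on Windows it is the same call with winsock2.h, WSAStartup, and a (char *) cast on the option value. The 1 MB value is an arbitrary example:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    /* Ask for a 1 MB receive buffer (arbitrary example size). */
    int rcvbuf = 1 * 1024 * 1024;
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    /* Read back what the OS actually granted (it may cap or adjust it). */
    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &granted, &len);
    printf("receive buffer is now %d bytes\n", granted);

    close(sock);
    return 0;
}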
Typically, you want two threads for this kind of situation.
The first thread exists solely to service the socket. In other words, the thread's sole purpose is to read a packet from the socket, add it to some kind of properly-synchronized shared data structure, signal that a packet has been received, and then read the next packet.
The second thread exists to process the received packets. It sits idle until the first thread signals a packet has been received. It then pulls the packet from the properly-synchronized shared data structure and processes it. It then waits to be signaled again.
As a test, try short-circuiting the full processing of your packets and just write a message to the console (or a file) each time a packet has been received. If you can successfully do this without dropping packets, then breaking your functionality into a "receiving" thread and a "processing" thread will help.
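A rough sketch of that two-thread structure, using POSIX threads for brevity (on Windows you would use CreateThread and its condition variables instead); the fixed-size ring queue, packet size, and port number are arbitrary examples:

#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define QSIZE  1024
#define PKTMAX 2048

struct packet { char data[PKTMAX]; ssize_t len; };

static struct packet queue[QSIZE];
static int q_head, q_tail, q_count;
static pthread_mutex_t q_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_ready = PTHREAD_COND_INITIALIZER;

/* Thread 1: do nothing but read the socket and enqueue. */
static void *receiver(void *arg)
{
    int sock = *(int *)arg;
    for (;;) {
        struct packet pkt;
        pkt.len = recv(sock, pkt.data, sizeof(pkt.data), 0);
        if (pkt.len < 0)
            continue;

        pthread_mutex_lock(&q_lock);
        if (q_count < QSIZE) {                 /* if full, drop in userspace */
            queue[q_tail] = pkt;
            q_tail = (q_tail + 1) % QSIZE;
            q_count++;
            pthread_cond_signal(&q_ready);
        }
        pthread_mutex_unlock(&q_lock);
    }
    return NULL;
}

/* Thread 2: wait for packets and do the (slow) processing. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_count == 0)
            pthread_cond_wait(&q_ready, &q_lock);
        struct packet pkt = queue[q_head];
        q_head = (q_head + 1) % QSIZE;
        q_count--;
        pthread_mutex_unlock(&q_lock);

        printf("processing %zd bytes\n", pkt.len);   /* real work goes here */
    }
    return NULL;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5000);               /* example port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    pthread_t rx, wk;
    pthread_create(&rx, NULL, receiver, &sock);
    pthread_create(&wk, NULL, worker, NULL);
    pthread_join(rx, NULL);
    return 0;
}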
Yes, the stack is allowed to drop packets — silently, even — when its buffers get too full. This is part of the nature of UDP, one of the bits of reliability you give up when you switch from TCP. You can either reinvent TCP — poorly — by adding retry logic, ACK packets, and such, or you can switch to something in-between like SCTP.
There are ways to increase the stack's buffer size, but that's largely missing the point. If you aren't reading fast enough to keep buffer space available already, making the buffers larger is only going to put off the time it takes you to run out of buffer space. The proper solution is to make larger buffers within your own code, and move data from the stack's buffers into your program's buffer ASAP, where it can wait to be processed for arbitrarily long times.
Is it possible that reading this buffer will cause incoming packets to be dropped?
Packets can be dropped if they're arriving faster than you read them.
If so, what are the options to correct this?
One option is to change the network protocol: use TCP, or implement some acknowledgement + 'flow control' using UDP.
Otherwise you need to see why you're not reading fast/often enough.
If the CPU is 100% utilized then you need to do less work per packet or get a faster CPU (or use multithreading and more CPUs if you aren't already).
If the CPU is not 100%, then perhaps what's happening is:
You read a packet
You do some work, which takes x msec of real-time, some of which is spent blocked on some other I/O (so the CPU isn't busy, but it's not being used to read another packet)
During those x msec, a flood of packets arrive and some are dropped
A cure for this would be to change the threading.
Another possibility is to do several simultaneous reads from the socket (each of your reads provides a buffer into which a UDP packet can be received).
Another possibility is to see whether there's a (O/S-specific) configuration option to increase the number of received UDP packets which the network stack is willing to buffer until you try to read them.
First step: increase the receive buffer size; Windows pretty much grants all reasonable size requests.
If that doesn't help, your consumer code seems to have some fairly slow areas. I would use threading, e.g. with pthreads, and utilize a producer-consumer pattern to put the incoming datagrams in a queue on another thread and then consume from there, so your receive calls don't block and the buffer does not fill up.
Third step: modify your application-level protocol to allow for batched packets, and batch packets at the sender to reduce the UDP header overhead of sending a lot of small packets (see the batching sketch after this list).
Fourth step: check your network gear; switches etc. can give you detailed output about their traffic statistics, buffer overflows, and so on. If that is an issue, get faster switches or possibly swap out a faulty one.
... just FYI, I'm running UDP multicast traffic on our backend continuously at an average of ~30 Mbit/s with peaks at 70 Mbit/s, and my drop rate is near nil.
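To illustrate the batching idea from the third step, a rough sender-side sketch; the 2-byte length prefix, the message contents, and the 1400-byte budget are all arbitrary choices for this example:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define BATCH_MAX 1400   /* stay under a typical MTU minus IP/UDP headers */

/* Append one message to the batch with a 2-byte length prefix.
   Returns 0 when the batch is full (caller should send it and reset). */
static int batch_add(unsigned char *batch, size_t *used,
                     const void *msg, uint16_t len)
{
    if (*used + 2 + len > BATCH_MAX)
        return 0;
    batch[*used]     = (unsigned char)(len >> 8);
    batch[*used + 1] = (unsigned char)(len & 0xff);
    memcpy(batch + *used + 2, msg, len);
    *used += 2 + len;
    return 1;
}

int main(void)
{
    unsigned char batch[BATCH_MAX];
    size_t used = 0;

    batch_add(batch, &used, "hello", 5);
    batch_add(batch, &used, "world", 5);

    /* A single sendto(sock, batch, used, ...) now carries both messages,
       paying the 28-byte IP+UDP header cost once instead of twice. */
    printf("batched %zu bytes in one datagram\n", used);
    return 0;
}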
Not sure about this, but on Windows it's not possible for polling the socket to cause a packet to drop. Windows collects the packets separately from your polling, so it shouldn't cause any drops.
I am assuming you're using select() to poll the socket? As far as I know, that can't cause a drop.
The packets could be lost due to an increase in unrelated network traffic anywhere along the route, or full receive buffers. To mitigate this, you could increase the receive buffer size in Winsock.
Essentially, UDP is an unreliable protocol in the sense that packet delivery is not guaranteed and no error is returned to the sender on delivery failure. If you are worried about packet loss, it would be best to implement acknowledgment packets into your communication protocol, or to port it to a more reliable protocol like TCP. There really aren't any other truly reliable ways to prevent UDP packet loss.