I'm using libnetfilter_queue and iptables with the NFQUEUE target to store incoming packets in three different queues with --queue-num x.
I successfully create the three queues with libnetfilter_queue functions, bind them, listen to them, and read from them as follows:
/* given 'h' as a handle to one of my three queues */
fd = nfq_fd(h);
while ((rv = recv(fd, buf, sizeof(buf), 0)) > 0) {
    nfq_handle_packet(h, buf, rv);
}
The callback function, triggered by nfq_handle_packet, calls nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);, which releases the packet as soon as it has been processed.
The problem is: I don't want every packet to be sent right away, since I need to store them in a custom struct (written below).
So I came across a potential solution: I could set an NF_DROP verdict instead of NF_ACCEPT on every packet I want to queue (so it isn't forwarded immediately), store it in my custom struct, and then re-inject it later as needed.
Sounds great, but here's the catch: I don't know how to re-inject my queued packets from my userspace application whenever I choose. Is it correct to call nfq_set_verdict again at a later point in my code, but with an NF_ACCEPT verdict? Or should I open a socket (maybe a raw one)?
This is my custom struct:
struct List {
    int queue;
    int pktsize;
    unsigned char *buffer;
    struct nfq_q_handle *qh;
    struct nfqnl_msg_packet_hdr *hdr;
    struct List *next;
};
representing a packet caught by the rule above.
These are the queues where I store packets:
struct List *List0 = NULL; // low priority
struct List *List1 = NULL; // medium priority
struct List *List2 = NULL; // high priority
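For what it's worth, here is a minimal sketch of how a caught packet could be appended to one of these lists. The `list_push` name and the FIFO (append-at-tail) behavior are my own assumptions, not from the original code; the struct is restated, with the libnetfilter_queue types forward-declared, so the sketch is self-contained:

```c
#include <stdlib.h>
#include <string.h>

/* forward declarations so this compiles without libnetfilter_queue headers */
struct nfq_q_handle;
struct nfqnl_msg_packet_hdr;

struct List {
    int queue;
    int pktsize;
    unsigned char *buffer;
    struct nfq_q_handle *qh;
    struct nfqnl_msg_packet_hdr *hdr;
    struct List *next;
};

/* append a copy of the packet payload to the tail of *head (FIFO order);
   returns 0 on success, -1 on allocation failure */
int list_push(struct List **head, int queue, const unsigned char *data,
              int size, struct nfq_q_handle *qh)
{
    struct List *node = calloc(1, sizeof(*node));
    if (!node)
        return -1;
    node->buffer = malloc(size);
    if (!node->buffer) {
        free(node);
        return -1;
    }
    memcpy(node->buffer, data, size);
    node->queue = queue;
    node->pktsize = size;
    node->qh = qh;

    while (*head)               /* walk to the tail */
        head = &(*head)->next;
    *head = node;
    return 0;
}
```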
I'm running Ubuntu 14.04 with kernel 3.13.0-57-generic.
Any suggestions would be appreciated.
Your idea makes sense. In fact I've seen a very similar scheme implemented in a commercial product I worked on. It had to process individual packets at high rates, so it would always copy the incoming packet and immediately set an NF_DROP verdict. It would then perform the processing, and if it decided that the packet should be forwarded, it would send the copy to the outbound interface. So you're not alone.
As far as I know, nfq_set_verdict can be called only once per packet. Once the verdict is set, NFQUEUE sends the packet to the destination (which is packet heaven in your case). It doesn't keep an extra copy of the packet just in case you change your mind. So to send the packet back to the network you'll have to store a copy of it and send it using your own socket. And yes, if you want to send the received packet as-is (including headers) the outbound socket would have to be raw.
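To make the raw-socket route concrete, here's a rough sketch of a re-injection helper. The `reinject_packet` name is mine; it assumes the stored buffer starts at the IP header (as an NFQUEUE payload does), and it needs root or CAP_NET_RAW to run. With IPPROTO_RAW the kernel implies IP_HDRINCL and fills in the IP checksum, id, and total length for you (see raw(7)):

```c
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* hypothetical helper: re-send 'len' bytes starting at the IP header;
   returns the byte count sent, or -1 on error (errno set) */
int reinject_packet(const unsigned char *pkt, size_t len)
{
    /* IPPROTO_RAW implies IP_HDRINCL: we supply the IP header ourselves */
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (fd < 0)
        return -1;              /* needs CAP_NET_RAW / root */

    const struct iphdr *iph = (const struct iphdr *)pkt;
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = iph->daddr;   /* route by the packet's own dest */

    ssize_t sent = sendto(fd, pkt, len, 0,
                          (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return (int)sent;
}
```

Note that this re-enters the stack at the IP layer on output; the original Ethernet header is not reused.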
I don't know if this will fit your application model, but Frottle just holds the packets in limbo until it decides whether to accept or drop them. The "novelty" of this approach is that you aren't required to call nfq_set_verdict from within the NFQUEUE callback itself; you can call it later, outside the netfilter loop proper. It uses more kernel memory, but the alternative is just to use more userspace memory, so it isn't much of a loss.
Hope this helps!
The Situation
I am currently writing a kernel module that should handle a custom network protocol based on UDP.
What I do (in rough pseudocode) is create a UDP socket with
sock_create(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &sk);
where sk is the socket pointer.
So, instead of actively polling the kernel for new UDP data, I registered the data ready callback with
sk->sk_data_ready = myudp_data_ready;
And here is the full code of the myudp_data_ready function:
void myudp_data_ready(struct sock *sk)
{
    struct sk_buff *skb;
    int err;

    if ((skb = skb_recv_datagram(sk, 0, 1, &err)) == NULL)
        goto Bail;

    // ...
    // HERE, MY CUSTOM UDP-BASED PROTOCOL WILL BE IMPLEMENTED
    // ...

    skb_free_datagram(sk, skb);
    return;

Bail:
    return;
}
The Problem
The problem is that, at first, I get all UDP packets perfectly fine: skb_recv_datagram returns a socket buffer that I can handle.
However, after some time, it stops working.
What I already tried
In /proc/net/udp I can see that the rx_queue grows until it's full, and then packets are dropped.
That is the point at which I stop getting any packets in my code (obviously).
This seems odd.
If I understood correctly, the kernel uses a reference count in socket buffers.
When this count drops to zero, the buffer is freed and unlinked from the receive queue.
I had a look at the skb->users field, which is supposed to be the reference count.
It is set to 1, which means my code is the only place holding a reference to the skb.
But neither skb_free_datagram nor kfree_skb seems to free the buffer, as the rx_queue keeps growing.
And I have no clue why.
Do you have any advice?
Am I missing something?
Some more information
I am using Ubuntu 20.04 with kernel version 5.4.0-52.
I have a simple user-land application sending UDP packets to the specific port the kernel module is listening on.
Thank you for your help.
I'm listening on a TCP socket in Linux with recv or recvfrom.
Who takes care that I get the TCP data in the right order?
Is it the kernel, so that if packet 2 arrives before packet 1, it will drop both or hold packet 2 until packet 1 arrives?
Or do I need to take care of TCP packet ordering in user space?
On Linux based systems in any normal scenario, this is handled by the kernel.
You can find the source code in the kernel's tcp_data_queue() (net/ipv4/tcp_input.c); here's an abridged version:
/* Queue data for delivery to the user.
* Packets in sequence go to the receive queue.
* Out of sequence packets to the out_of_order_queue.
*/
if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
/* packet is in order */
}
if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
/* already received */
}
if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp)))
goto out_of_window;
if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
/* Partial packet, seq < rcv_next < end_seq */
}
/* append to out-of-order queue */
tcp_data_queue_ofo(sk, skb);
The reordering itself happens in tcp_data_queue_ofo(), which keeps the out-of-order segments in a red-black tree.
To quote Wikipedia,
At the lower levels of the protocol stack, due to network congestion, traffic load balancing, or unpredictable network behaviour, IP packets may be lost, duplicated, or delivered out of order. TCP detects these problems, requests re-transmission of lost data, rearranges out-of-order data, [...]
It's an inherent property of the protocol that you will receive the data in the correct order (or not at all).
Note that TCP is a stream protocol, so you can't even detect packet boundaries. A call to recv/recvfrom may return a portion of a packet, and it may return bytes that came from more than one packet.
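A tiny self-contained demonstration of that boundary loss, using a Unix-domain socketpair as a stand-in for a TCP connection (it behaves the same way here); the `stream_demo` name is mine:

```c
#include <sys/socket.h>
#include <unistd.h>

/* send two 5-byte "packets" and show that one recv() can return both */
ssize_t stream_demo(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    send(sv[0], "Hello", 5, 0);   /* "packet" 1 */
    send(sv[0], "World", 5, 0);   /* "packet" 2 */

    char buf[64];
    /* both messages are already buffered, so a single recv returns
       all 10 bytes: there is no boundary between them in the stream */
    ssize_t n = recv(sv[1], buf, sizeof(buf), 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```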
I've been working with sockets (in C, with no prior experience in socket programming) for the past few days.
I have to collect WiFi packets on a Raspberry Pi, do some processing, and send the formatted information to another device over sockets (both devices are connected to the same network).
The challenge I'm facing is receiving the data over the sockets.
The data is sent successfully from the sending side, but on the receiving side, sometimes junk or previous data is received.
On Sending Side (client):
int server_socket = socket(AF_INET, SOCK_STREAM, 0);
//connecting to the server with connect function
send(server_socket, &datalength, sizeof(datalength),0); //datalength is an integer containing the number of bytes that are going to be sent next
send(server_socket, actual_data, sizeof(actual_data),0); //actual data is a char array containing the actual character string data
On Receiving Side (Server Side):
int server_socket = socket(AF_INET, SOCK_STREAM, 0);
//bind the socket to the ip and port with bind function
//listen to the socket for any clients
//int client_socket = accept(server_socket, NULL, NULL);
int bytes;
recv(client_socket, &bytes, sizeof(bytes),0);
char* actual_message = malloc(bytes);
int rec_bytes = recv(client_socket, actual_message, bytes,0);
*The above lines are not the actual code, but the flow and procedure are similar (with exception handling and comments).
Sometimes I get the actual data for all the packets quickly (without any errors or packet loss). But sometimes the bytes value (the integer sent to indicate the size of the byte stream in the next transaction) is received as junk, so my code breaks at that point.
Also, sometimes the number of bytes I receive on the receiving side is less than the number expected (known from the received integer). In that case, I check for the condition and retrieve the remaining bytes.
The rate at which packets arrive is very high (around 1000 packets in less than a second, and I have to dissect, format, and send each one over the socket). I've tried different ideas (using SOCK_DGRAM, but there is some packet loss; inserting delays between transactions; opening and closing a new socket for each packet; adding an acknowledgement after each packet), but none of them meets my requirement (quick transfer of packets with zero packet loss).
Kindly, suggest a way to send and receive varying length of packets at a quick rate over sockets.
I see a few main issues:
I think your code ignores the possibility of a full buffer in the send function.
It also seems to me that your code ignores the possibility of partial data being collected by recv. (Never mind, I just saw the new comment on that.)
In other words, you need to manage a user-land buffer for send and handle fragmentation in recv.
The code uses sizeof(int) which might be a different length on different machines (maybe use uint32_t instead?).
The code doesn't translate to and from network byte order. This means that you're sending the memory structure of the int instead of an integer that can be read by different machines (some machines store the bytes backwards, some forward, some mix and match).
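A sketch of what byte-order-safe framing of the length prefix might look like (the helper names are mine, not from the question):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* write a 4-byte length prefix in network byte order into out[0..3] */
void encode_length(uint32_t len, unsigned char out[4])
{
    uint32_t be = htonl(len);   /* host -> network (big-endian) */
    memcpy(out, &be, sizeof(be));
}

/* read the 4-byte network-order prefix back into a host-order integer */
uint32_t decode_length(const unsigned char in[4])
{
    uint32_t be;
    memcpy(&be, in, sizeof(be));
    return ntohl(be);           /* network -> host */
}
```

Because both sides agree on network byte order and on a fixed-width uint32_t, the prefix reads the same on any architecture.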
Notice that when you send larger data using TCP/IP, it will be fragmented into smaller packets.
This depends, among others, on the MTU network value (which often runs at ~500 bytes in the wild and usually around ~1500 bytes in your home network).
To handle these cases you should probably use an evented network design rather than blocking sockets.
Consider routing the send through something similar to this (if you're going to use blocking sockets):
int send_complete(int fd, const void *data, size_t len) {
    size_t act = 0;
    while (act < len) {
        ssize_t tmp = send(fd, (const char *)data + act, len - act, 0);
        if (tmp < 0) {
            if (errno != EWOULDBLOCK && errno != EAGAIN && errno != EINTR)
                return -1; // connection error
            // add `select` here to poll the socket before retrying
            continue;
        }
        act += (size_t)tmp;
    }
    return (int)act;
}
As for the sizeof issues, I would replace the int with a specific byte length integer type, such as int32_t.
A few more details
Please notice that sending the integer separately doesn't guarantee that it will be received separately, or that the integer itself won't be fragmented.
The send function writes to the system's buffer for the socket, not to the network (just like recv reads from the available buffer and not from the wire).
You can't control where fragmentation occurs or how the TCP packets are packed (unless you implement your own TCP/IP stack).
I'm sure it's clear to you that the "junk" value is data that was actually sent by the other side. This means that the code isn't reading the integer you send, but reading another piece of data.
It's probably a question of alignment to the message boundaries, caused by an incomplete read or an incomplete send.
P.S.
I would consider using the WebSocket protocol on top of the TCP/IP layer.
It guarantees a binary packet header that works across CPU architectures (endianness) and offers a wider variety of client connectivity (such as connecting from a browser, etc.).
It will also solve the packet-alignment issue you're experiencing (not because it won't exist, but because it is resolved in whatever WebSocket parser you adopt).
I am currently writing a kernel module that modifies packet payloads as a learning experience. I have the packet modifications done, but now I want to send out this new modified packet after the original (I don't want to drop the original). I can't seem to find a kernel function that sends skbs for transmission. I've tried dev_queue_xmit(nskb), but that causes a kernel panic; I also tried skb->next = nskb, but that does nothing. Do I have to implement the skb list handling myself? I am unsure how to do that, since the article I found seems to be outdated.
EDIT:
So I was able to fix the kernel panic when calling dev_queue_xmit(nskb): I was accidentally calling dev_queue_xmit(skb), which would free skb and cause a panic from netfilter. The problem now is that everything runs, but I'm not seeing duplicate packets being sent out; there is no trace of the second packet ever being sent. TCPDump on the machine doesn't see anything, and TCPDump on the target doesn't see anything either. The following is my code.
unsigned int in_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state)
{
    struct sk_buff *nskb = skb_copy(skb, GFP_KERNEL);
    struct iphdr *iph = ip_hdr(skb);
    /* Various other variables not relevant to the problem */
    __u32 saddr, daddr;

    saddr = ntohl(iph->saddr);
    if (saddr == ipToInt(10,0,2,12) || saddr == ipToInt(10,0,2,13)) {
        /* For loop that saves the payload contents into a variable */

        /* Here is where the problem is:
           I have this if statement to prevent a feedback loop,
           then if the IP matches, I call dev_queue_xmit(nskb),
           which is supposed to send out sk_buffs, but TCPDump doesn't
           show anything on any computer */
        if (saddr == ipToInt(10,0,2,13)) {
            dev_queue_xmit(nskb);
        }

        /* Rest of the code that isn't relevant to sending packets */
    }
    return NF_ACCEPT;
}
My network setup is as follows: three Ubuntu Server VMs, all SSH'd into from the host computer (macOS, if it matters; I don't know at this point). The computer running the above kernel module spoofs the other two VMs bidirectionally. The other two VMs talk to each other via a netcat session. I'm hoping that when I send one message from the VM with IP 10.0.2.13, 10.0.2.12 sees two copies of the same message. I know the acknowledgement-number mishap will break the connection, but I'm not even getting that far. TCPDump on any of the three computers shows nothing besides the packets that are supposed to be sent.
I have so far tried dev_queue_xmit(nskb) as well as nskb->dev->netdev_ops->ndo_start_xmit(nskb, skb->dev).
As far as I remember, dev_queue_xmit() is the right procedure for sending. The question is how you prepared the skb you want to send. Also, give us the call trace from dmesg when the kernel panic occurred. Do you set skb->dev?
I figured it out: skb_copy doesn't copy the Ethernet header of an skb, so the sent packet never reaches its destination.
I am calling the select system call every 1 second. If two or more packets arrive within that second, read() treats them as one packet. I want to know how many packets have arrived, and to read each packet separately. Please resolve the issue without reducing that 1-second interval.
TCP is a streaming protocol, which doesn't expose individual packets at the application level.
If you need to process individual packets, you will need to switch to a datagram protocol, such as UDP, which is designed for that kind of usage. Note, however, that this will require changes to both sender and receiver code, as well as a protocol redesign to manually handle retransmissions.
You're doing it wrong, or else you need to describe more about what you're doing. You normally call select() as frequently as possible and block waiting for input. Why are you arbitrarily clamping at 1s? Whenever you're notified of readability, read greedily until you get EWOULDBLOCK/EAGAIN, at which point you go back to select() to wait for more input.
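Sketching that pattern (the `drain_socket` name and the callback shape are my own; MSG_DONTWAIT stands in for putting the socket in non-blocking mode):

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* block in select() until fd is readable, then drain it until EAGAIN;
   returns total bytes read, 0 on immediate EOF, -1 on error */
ssize_t drain_socket(int fd, void (*on_data)(const char *, ssize_t))
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;

    char buf[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0) {
            if (on_data)
                on_data(buf, n);    /* hand each chunk to the caller */
            total += n;
            continue;
        }
        if (n == 0)
            return total;           /* peer closed the connection */
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            return total;           /* fully drained: back to select() */
        if (errno == EINTR)
            continue;
        return -1;
    }
}
```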
For TCP you can define a protocol header and put the size in it (1, 2, or 4 bytes; you need to define it).
For each signaled socket you can do these steps:
1. Read the header (size = the header size, which must be constant).
2. Get the data size from the header.
3. Read the data (size = the data size from step 2).
4. Process the packet.
5. If there is more to read, go back to step 1.
EDIT:
Let's say this is your header:
struct header {
    int data_size;
    // add more fields if you like
};
In your code you read as usual, but in 2 steps:
int res;
struct header hdr;
unsigned char data[MAX_SIZE];

res = recv(s, &hdr, sizeof(hdr), 0);
// now res should be == sizeof(hdr)
res = recv(s, data, hdr.data_size, 0);
// now res should be == hdr.data_size
Now you have a full packet you can process.
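Note that the two recv() calls above assume each one returns the full count, which TCP doesn't guarantee. A hedged sketch of the same two-step read that loops over short reads and uses a fixed-width, network-byte-order length (the `read_full`/`read_packet` names are mine):

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#define MAX_SIZE 65536

/* read exactly 'len' bytes, looping over short reads;
   returns 0 on success, -1 on error or EOF mid-message */
static int read_full(int s, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(s, p, len, 0);
        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* step 1 + step 3 combined: returns the payload length, or -1 on error */
int read_packet(int s, unsigned char data[MAX_SIZE])
{
    uint32_t size_be;
    if (read_full(s, &size_be, sizeof(size_be)) < 0)
        return -1;
    uint32_t size = ntohl(size_be);   /* fixed-width, network byte order */
    if (size > MAX_SIZE)
        return -1;                    /* refuse bogus lengths */
    if (read_full(s, data, size) < 0)
        return -1;
    return (int)size;
}
```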