Problem: On raw sockets, recvfrom can capture more bytes than sendto can send, preventing me from retransmitting packets larger than the MTU.
Background: I'm writing an application that captures and retransmits packets. Basically, host A sends data to X, which logs it and forwards it to B; all are Linux machines. I'm using a raw socket so I can capture all the data; it's created with socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)).
Then, there's code waiting for and reading incoming packets:
const int buffer_size = 2048;
uint8_t* buffer = new uint8_t[buffer_size];
sockaddr_ll addr = {0};
socklen_t addr_len = sizeof(addr);
int received_bytes = recvfrom(_raw_socket, buffer, buffer_size, 0, (struct sockaddr*)&addr, &addr_len);
Packet processing follows, and the loop finishes by sending the packet out again:
struct sockaddr_ll addr;
memset(&addr, 0, sizeof(struct sockaddr_ll));
addr.sll_family = AF_PACKET;
addr.sll_protocol = eth_hdr->type;
addr.sll_ifindex = interface().id();
addr.sll_halen = HardwareAddress::byte_size;
memcpy(&(addr.sll_addr), eth_hdr->dest_mac, HardwareAddress::byte_size);
// Try to send packet
if(sendto(raw_socket(), data, length, 0, (struct sockaddr*)&addr, sizeof(addr)) < 0)
The problem is that I don't expect to receive packets larger than the Ethernet MTU (1500 bytes), and I shouldn't, since I'm using raw sockets that process each packet individually. But sometimes I do receive packets larger than the MTU. I thought it might be an error in my code, but Wireshark confirms the oversized packets, so there must be some reassembly going on at a lower level, such as in the network controller itself.
Well, OK; then I don't think there's a way to disable this for just one application, and I can't change the host configuration, so I might increase the buffer size. But the problem is that when I call sendto with anything larger than the MTU (actually 1514 bytes, because of the Ethernet header), I get errno EMSGSIZE ("Message too long"). And that's the problem stated above: I can't send out the same packet I received. What could be a possible solution for this? And what buffer size would I need to always capture the whole packet?
EDIT: I have just checked the machines with ethtool -k <interface> and got tcp-segmentation-offload: on on all of them, so it seems it really is the NIC merging segments. But I wonder why sendto doesn't behave like recvfrom: if packets can be reassembled automatically on receive, why aren't they segmented automatically on send?
A side note: the application itself needs to send those packets; setting up forwarding with iptables etc. won't work.
Your network card probably has segmentation offload enabled, which means the hardware can re-assemble TCP segments before they reach the OS or your code.
You can check whether that is the case by running ethtool -k.
While transparently capturing TCP traffic and re-transmitting it at such a low level is often more trouble than it is worth (you are often better off doing this at the application layer: terminate the TCP connection and set up a new TCP connection towards your host B), you cannot capture and re-send packets if your network card has altered them. You need to:
Turn off generic-segmentation-offload
Turn off generic-receive-offload
Turn off tcp-segmentation-offload
Turn off udp-fragmentation-offload if you are also dealing with UDP
Turn off rx-vlan-offload/tx-vlan-offload if your packets are VLAN encapsulated
Possibly turn off rx-checksumming and tx-checksumming. With both enabled, raw sockets either work or are broken, depending on your kernel version and type of network card.
These can be turned on and off with the ethtool -K command; the exact syntax is described in the ethtool manpage (for example, something like ethtool -K eth0 gro off gso off tso off, assuming eth0 is your interface).
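As for the capture buffer size: with receive offloads still active, the kernel can hand you merged packets far larger than the wire MTU (up to around 64 KiB). A minimal sketch, assuming a Linux packet socket, that sizes the buffer for the worst case and uses MSG_TRUNC so recvfrom reports the full packet length even when it exceeds the buffer:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* GRO/TSO can merge segments well past the wire MTU, so size the
   buffer for the worst case and ask the kernel for the real length. */
enum { CAPTURE_BUFFER_SIZE = 65536 };

ssize_t capture_packet(int raw_socket, uint8_t *buffer)
{
    struct sockaddr_ll addr;
    socklen_t addr_len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));

    /* With MSG_TRUNC, the return value is the full packet length,
       even when it is longer than the buffer we supplied. */
    ssize_t len = recvfrom(raw_socket, buffer, CAPTURE_BUFFER_SIZE,
                           MSG_TRUNC, (struct sockaddr *)&addr, &addr_len);
    if (len > CAPTURE_BUFFER_SIZE)
        fprintf(stderr, "truncated: full packet was %zd bytes\n", len);
    return len;
}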
{
    in_addr addr;
    EFVI_CHECK(::inet_aton("11.231.75.7", &addr));
    ef_filter_spec filter_spec;
    ef_filter_spec_init(&filter_spec, EF_FILTER_FLAG_NONE);
    ef_filter_spec_set_ip4_local(&filter_spec, IPPROTO_TCP, addr.s_addr, htons(21723));
    ef_vi_filter_add(&res_.vi, res_.dh, &filter_spec, nullptr);
}
{
    // problem is here
    ef_filter_spec filter_spec;
    ef_filter_spec_init(&filter_spec, EF_FILTER_FLAG_NONE);
    ef_filter_spec_set_port_sniff(&filter_spec, 0);
    ef_vi_filter_add(&res_.vi, res_.dh, &filter_spec, nullptr);
}
I'm trying to sniff packets sent to a specific network card at a specific port using the Solarflare ef_vi API.
However, when I use ef_filter_spec_set_port_sniff(&filter_spec, 0), it actually sniffs all packets received on that network card; the previous filters are effectively ignored.
When I use ef_filter_spec_set_ip4_local alone, it intercepts the packets instead of sniffing them (i.e. the packets are not passed on to the kernel).
How can I do this with ef_vi?
What you can do is receive all the packets and process only those that match your port. As far as I know, there is no way to sniff only a specific port using the ef_vi API.
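A minimal sketch of that software-side filtering, assuming each sniffed packet arrives as a raw Ethernet frame (the helper and its plain-pointer interface are illustrative, not part of the ef_vi API):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>      /* ntohs */
#include <netinet/in.h>     /* IPPROTO_TCP */

/* Return true if the raw Ethernet frame is an IPv4 TCP packet whose
   destination port (host byte order) matches `port`. */
static bool frame_matches_tcp_port(const uint8_t *frame, size_t len, uint16_t port)
{
    uint16_t ethertype, dst_port;

    if (len < 14 + 20 + 20)                  /* Ethernet + minimal IP + TCP */
        return false;
    memcpy(&ethertype, frame + 12, 2);
    if (ntohs(ethertype) != 0x0800)          /* not IPv4 */
        return false;

    const uint8_t *ip = frame + 14;
    if ((ip[0] >> 4) != 4 || ip[9] != IPPROTO_TCP)
        return false;

    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IP header length, incl. options */
    if (14 + ihl + 4 > len)
        return false;
    memcpy(&dst_port, ip + ihl + 2, 2);      /* TCP dest port is bytes 2..3 */
    return ntohs(dst_port) == port;
}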
There are two Linux C programs, one called 'sender', the other called 'receiver'. Both programs use raw (packet) sockets on the same network interface (eth0). They communicate using a custom Ethernet protocol (EtherType). Yes, the point is exactly to have access to raw Ethernet frames.
The sockets are opened somehow like this:
sock = socket(AF_PACKET, SOCK_RAW, htons(MY_CUSTOM_ETH_PROTOCOL));
The receiver issues this to read from the raw socket:
recv(sock, eth_frame, MAX_ETH_FRAME_LEN, 0);
The sender issues this to write to the raw socket:
struct sockaddr_ll sa;
memset(&sa, 0, sizeof(sa));
sa.sll_family = AF_PACKET;
memcpy(sa.sll_addr, dst_mac, 6);
sa.sll_halen = 6;
sa.sll_ifindex = itf_idx;
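The actual send then looks something like this (a sketch; eth_frame and frame_len stand for the assembled Ethernet frame and its length, which are not shown here):
sendto(sock, eth_frame, frame_len, 0, (struct sockaddr *)&sa, sizeof(sa));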
I hope it's unnecessary to share how I assemble valid Ethernet frames, how I get the network interface index, etc.
The problem: if the two programs run on the same machine, the receiver can't see the Ethernet frames emitted by the sender. However, Wireshark can see them all.
If the two programs run on separate machines connected by a switch, the receiver receives the Ethernet frames emitted by the sender.
In the first case, there is no indication of any error.
What can this be? I need to make the raw socket capable of receiving all raw Ethernet frames that are put on the wire by other raw sockets.
The resolution: if I create the socket like this:
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
all Ethernet frames are received, including locally generated (outgoing) ones, which a packet socket bound to a specific protocol apparently does not see. Since my software already filters on the Ethernet frame type, it can select the frames belonging to the custom protocol.
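For reference, the user-space filtering can stay very small (a sketch; eth_frame is the receive buffer and MY_CUSTOM_ETH_PROTOCOL the custom EtherType from above):

#include <linux/if_ether.h>   /* struct ethhdr */
#include <arpa/inet.h>        /* ntohs */

/* After recv() on the ETH_P_ALL socket, keep only our protocol. */
struct ethhdr *eth = (struct ethhdr *)eth_frame;
if (ntohs(eth->h_proto) == MY_CUSTOM_ETH_PROTOCOL) {
    /* handle the frame */
}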
Thanks!
I'm writing a UDP server/client application in which the server sends data and the client receives it. When a packet is lost, the client should send a NACK to the server. I set the socket to O_NONBLOCK so that I can notice when the client does not receive a packet:
if ((bytes = recvfrom(....)) != -1) {
    // do something
} else {
    // send nack
}
My problem is that if the server never starts sending packets, the client behaves as if a packet were lost and starts sending NACKs to the server (recvfrom fails when no data is available). I'd like some advice on how to tell these cases apart: the server has not started sending at all, versus the server is sending but a packet really was lost.
You are using UDP. For this protocol it is perfectly OK to throw away packets if there is a need to do so, so it is not reliable in terms of "what is sent will arrive". What you have to do in your client is check whether all the packets you need have arrived, and if not, politely ask your server to resend the ones you did not receive. Implementing this is not trivial.
If you have to use UDP to transfer a largish chunk of data, then design a small application-level protocol that handles possible packet loss and re-ordering (that's part of what TCP does for you). I would go with something like this:
Datagrams small enough that, together with the IP and UDP headers, they stay below the MTU (say 1024 bytes of payload), to avoid IP fragmentation.
Fixed-length header for each datagram that includes the data length and a sequence number, so you can stitch the data back together and detect missed, duplicated, and re-ordered parts (one possible layout is sketched after this list).
Acknowledgements from the receiving side of what has been successfully received and put together.
Timeout and retransmission on the sending side when these acks don't come within an appropriate time.
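A sketch of what such a fixed-length header could look like (field names and sizes are illustrative assumptions):

#include <stdint.h>

/* Hypothetical wire header preceding each datagram's payload;
   all multi-byte fields travel in network byte order. */
struct app_header {
    uint32_t seq;     /* sequence number for re-ordering and gap detection */
    uint16_t length;  /* number of payload bytes that follow */
    uint16_t flags;   /* e.g. bit 0 set = this datagram is an ACK */
};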
Have a loop calling either select() or poll() to determine whether data has arrived; if so, call recvfrom() to read the data.
You can set a timeout for receiving data as follows:
ssize_t
recv_timeout(int fd, void *buf, size_t len, int flags)
{
    ssize_t ret;
    struct timeval tv;
    fd_set rset;

    // init set
    FD_ZERO(&rset);
    // add to set
    FD_SET(fd, &rset);

    // wait at most the configured idle timeout (e.g. 60 seconds)
    tv.tv_sec = config.idletimeout;
    tv.tv_usec = 0;

    // select() returns as soon as fd is readable, or after the timeout;
    // its first argument is the highest fd plus one, not the fd itself
    ret = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ret == 0) {
        log_message(LOG_INFO, "Idle Timeout (after select)");
        return 0;
    } else if (ret < 0) {
        log_message(LOG_ERR,
                    "recv_timeout: select() error \"%s\". Closing connection (fd:%d)",
                    strerror(errno), fd);
        return -1;
    }

    return recvfrom(fd, buf, len, flags, NULL, NULL);
}
Note that select() reporting readiness is not an absolute guarantee. Normally, read() should return up to the maximum number of bytes that you've specified, which possibly includes zero bytes (this is actually a valid thing to happen!), and it should never block after previously having reported readiness. However, from the select(2) manpage:
Under Linux, select() may report a socket file descriptor as "ready
for reading", while nevertheless a subsequent read blocks. This could
for example happen when data has arrived but upon examination has
wrong checksum and is discarded. There may be other circumstances in
which a file descriptor is spuriously reported as ready. Thus it may
be safer to use O_NONBLOCK on sockets that should not block.
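Following that advice, a minimal sketch of the non-blocking pattern (the helper names are illustrative):

#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Set O_NONBLOCK so a spurious readiness report from select()
   yields EAGAIN/EWOULDBLOCK instead of a blocking read. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns bytes read, 0 on a spurious wakeup, -1 on a real error. */
static ssize_t try_recv(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* nothing there after all; go back to select() */
    return n;
}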
Look up the sliding window protocol.
The idea is that you divide your payload into packets that each fit in a single UDP datagram, then number them. You can visualize the buffers as a ring of slots, numbered sequentially in some fashion, e.g. clockwise.
Then you start sending from 12 o'clock, moving to 1, 2, 3... In the process, you may (or may not) receive ACK packets from the server that contain the slot number of a packet you sent.
If you receive an ACK, you can remove that packet from the ring and place the next unsent packet, which is not already in the ring, into the freed slot.
If you receive a NAK for a packet you sent, it means that packet was received by the server with corrupted data, and you resend it from the ring slot reported in the NAK.
This class of protocol allows transmission over channels with data or packet loss (like RS-232, UDP, etc.). If your underlying transmission protocol does not provide checksums, then you need to add a checksum to each ring packet you send, so the server can check its integrity and report back to you.
ACK and NAK packets from the server can also be lost. To handle this, associate a timer with each ring slot; if you don't receive either an ACK or a NAK for a slot before the timer reaches a timeout limit you set, retransmit the packet and reset the timer.
Finally, to detect fatal connection loss (i.e. the server went down), you can establish a maximum timeout value across all the packets in the ring: count how many consecutive timeouts you get for single slots, and if this count exceeds the maximum you have set, consider the connection lost.
Obviously, this class of protocol requires dataset reassembly on both sides based on packet numbers, since packets may not be sent or received in sequence. The ring helps with this, since packets are removed only after successful transmission and, on the receiving side, only when the previous packet number has already been removed and appended to the growing dataset. However, this is only one strategy; there are others. One possible slot layout is sketched below.
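A minimal sketch of one ring slot (sizes and field names are assumptions, not a fixed design):

#include <stdint.h>
#include <time.h>

#define RING_SLOTS  16     /* window size; an arbitrary choice here */
#define PAYLOAD_MAX 1024

/* One slot of the sliding-window ring described above. */
struct ring_slot {
    uint32_t seq;                 /* sequence number of the packet in flight */
    uint16_t len;                 /* payload length */
    uint8_t  data[PAYLOAD_MAX];   /* payload kept for possible retransmission */
    time_t   sent_at;             /* drives the per-slot retransmission timer */
    int      in_use;              /* non-zero while waiting for an ACK/NAK */
};

struct ring_slot ring[RING_SLOTS];

/* On ACK(seq): clear in_use and refill the slot with the next unsent packet.
   On NAK(seq) or timer expiry: retransmit ring[seq % RING_SLOTS]. */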
Hope this helps.
I am working on a LAN-based solution with a "server" that has to control a number of "players".
My protocol of choice is UDP because it's easy, I do not need connections, my traffic consists only of short commands from time to time, and I want to use a mix of broadcast messages for syncing and single-target messages for player-individual commands.
Multicast TCP would be an alternative, but it's more complicated, not exactly suited for the task, and often not well supported by hardware.
Unfortunately I am running into a strange problem:
The first datagram sent to a specific IP using "sendto" is lost.
Any datagram sent shortly afterwards to the same IP is received.
But if I wait some time (a few minutes), the first "sendto" is lost again.
Broadcast datagrams always work.
Local sends (to the same computer) always work.
I presume the operating system or the router/switch has some translation table from IP to MAC addresses which is forgotten when not used for some minutes, and that this unfortunately causes datagrams to be lost.
I could observe that behaviour with different router/switch hardware, so my suspicion is the Windows networking layer.
I know that UDP is by definition "unreliable", but I cannot believe this goes so far that packets can get lost even when the physical connection is working and everything is well defined. It would then be literally worthless.
Technically, I am opening a UDP socket and binding it to a port and INADDR_ANY.
Then I am using "sendto" and "recvfrom".
I never call connect; I don't want to, because I have several players, and as far as I know UDP should work without connect.
My current workaround is to regularly send dummy datagrams to all specific player IPs. That solves the problem, but it is somewhat unsatisfying.
Question: Does anybody know this problem? Where does it come from? How can I solve it?
Edit:
I boiled it down to the following test program:
#include <winsock2.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#pragma comment(lib, "ws2_32.lib")

int _tmain(int argc, _TCHAR* argv[])
{
    WSADATA wsaData;
    WSAStartup(MAKEWORD(2, 2), &wsaData);
    SOCKET Sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    SOCKADDR_IN Local = {0};
    Local.sin_family = AF_INET;
    Local.sin_addr.S_un.S_addr = htonl(INADDR_ANY);
    Local.sin_port = htons(1234);
    bind(Sock, (SOCKADDR*)&Local, sizeof(Local));
    printf("Press any key to send...\n");
    int Ret, i = 0;
    char Buf[4096];
    SOCKADDR_IN Remote = {0};
    Remote.sin_family = AF_INET;
    Remote.sin_addr.S_un.S_addr = inet_addr("192.168.1.12"); // Replace this with a valid LAN IP which is not the host's own
    Remote.sin_port = htons(1235);
    while(true) {
        _getch();
        sprintf(Buf, "ping %d", ++i);
        printf("Multiple sending \"%s\"\n", Buf);
        // Ret = connect(Sock, (SOCKADDR*)&Remote, sizeof(Remote));
        // if (Ret == SOCKET_ERROR) printf("Connect Error!\n");
        Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
        if (Ret != (int)strlen(Buf)) printf("Send Error!\n");
        Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
        if (Ret != (int)strlen(Buf)) printf("Send Error!\n");
        Ret = sendto(Sock, Buf, strlen(Buf), 0, (SOCKADDR*)&Remote, sizeof(Remote));
        if (Ret != (int)strlen(Buf)) printf("Send Error!\n");
    }
    return 0;
}
The program opens a UDP socket and, on every keystroke, sends 3 datagrams in a row to a specific IP.
Run it with Wireshark observing your UDP traffic, press a key, wait a while, and press a key again.
You do not need a receiver on the remote IP; it makes no difference, except that you won't get the black-marked "not reachable" packets.
This is what you get: the first send initiates an ARP request for the IP. While that request is pending, the first 2 of the 3 successive sends are lost.
The second keystroke (after the ARP resolution completed) properly sent 3 messages.
You may now repeat sending messages, and it will work until you wait again (about a minute, until the address translation is forgotten), at which point you will see dropouts again.
That means: when UDP messages are sent while an ARP request is pending, there is no send buffer, and all messages get lost except the last one!
Also, "sendto" does not block until successful delivery, and there is no error return!
Well, that surprises me and makes me a little sad, because it means I have to live with my current workaround or implement an ACK system that sends only one message at a time and then waits for a reply, which would no longer be easy and would imply many difficulties.
I'm posting this long after it's been answered by others, but it's directly related.
Winsock drops UDP packets if there's no ARP entry for the destination address (or the gateway for the destination).
Thus it's quite likely that some of the first UDP packets get dropped, as at that time there's no ARP entry; and unlike most other operating systems, Winsock queues only 1 packet while the ARP request completes.
This is documented as follows:
ARP queues only one outbound IP datagram for a specified destination
address while that IP address is being resolved to a MAC address. If a
UDP-based application sends multiple IP datagrams to a single
destination address without any pauses between them, some of the
datagrams may be dropped if there is no ARP cache entry already
present. An application can compensate for this by calling the
Iphlpapi.dll routine SendArp() to establish an ARP cache entry, before
sending the stream of packets.
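A minimal sketch of that workaround (assuming a dotted-quad destination address string; the helper name is illustrative):

#include <winsock2.h>
#include <iphlpapi.h>
#include <stdio.h>
#pragma comment(lib, "iphlpapi.lib")
#pragma comment(lib, "ws2_32.lib")

/* Resolve the destination's MAC up front so the ARP cache is warm
   before the UDP burst starts. */
int prime_arp_cache(const char *dest_ip)
{
    ULONG mac[2];                 /* room for the 6-byte MAC address */
    ULONG mac_len = sizeof(mac);
    IPAddr dest = inet_addr(dest_ip);

    DWORD err = SendARP(dest, 0, mac, &mac_len);
    if (err != NO_ERROR) {
        printf("SendARP failed: %lu\n", err);
        return -1;
    }
    return 0;
}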
The same behavior can be observed on Mac OS X and FreeBSD:
When an interface requests a mapping for an address not in the cache, ARP queues the message which requires the mapping and broadcasts a message on the associated network requesting the address mapping. If a response is provided, the new mapping is cached and any pending message is transmitted. ARP will queue at most one packet while waiting for a response to a mapping request; only the most recently "transmitted" packet is kept.
UDP packets are supposed to be buffered on receipt, but a UDP packet (or the ethernet frame holding it) can be dropped at several points on a given machine:
network card does not have enough space to accept it,
OS network stack does not have enough buffer memory to copy it to,
firewall/packet filtering drop-rule match,
no application is listening on destination IP and port,
receive buffer of the listening application socket is full.
The first two points are about too much traffic, which is not likely the case here. I'll assume point 4 does not apply and that your software is waiting for the data. Point 5 is about your application not processing network data fast enough; that also does not seem to be the case.
Translation between MAC and IP addresses is done via Address Resolution Protocol. This does not cause packet drop if your network is properly configured.
I would disable the Windows firewall and any anti-virus/deep-packet-inspection software and check what's on the wire with Wireshark. This will most likely point you in the right direction: if you can sniff those first packets on the "sent-to" machines, then check the local configuration (firewall, etc.); if you can't, then check your network, because something in the path is interfering with your traffic.
Hope this helps.
It's your computer doing an ARP request. When you first start sending, your computer does not know the receiver's MAC address, so it is unable to send any packets. It uses the receiver's IP address to issue an ARP request for the MAC address. During this process, any UDP packets you try to send cannot be sent out, as the destination MAC address is still unknown.
Once your computer receives the MAC address, it can start sending. However, the MAC address will only remain in your computer's ARP cache for 2 minutes (if no further activity is detected between you and the receiver) or 10 minutes (a full clear of the ARP cache, even if the connection is active). That is why you experience this problem every few minutes.
Does anybody know that problem?
The real problem is that you assumed that UDP packet sending is reliable. It isn't.
The fact that you lose the first packet with your current network configuration is really a secondary issue. You might be able to fix that, but at any point you are still vulnerable to packet losses.
If packet loss is a problem for you, then you really should use TCP. You can build a reliable protocol on UDP, but unless you have a very good reason to do so, it's not recommended.
I'm trying to build an RTP packet metric analyzer in C, but I ran into a strange issue. I'm cutting down implementation details for ease of exposition.
Since the RTP packets are carried in UDP, my socket is initialized with the following parameters:
sock_domain = AF_INET;
sock_type = SOCK_DGRAM;
sock_proto = IPPROTO_UDP;
and bound with these parameters:
socket_addr.sin_family = AF_INET;
socket_addr.sin_port = htons(10000); // My server is streaming on port 10000
socket_addr.sin_addr.s_addr = INADDR_ANY;
and joined the multicast group:
mgroup_req.imr_multiaddr.s_addr = inet_addr("224.1.0.1"); // Multicast group of the stream
mgroup_req.imr_interface.s_addr = INADDR_ANY;
and set the multicast group from which I receive:
mgroup_addr.sin_family = socket_addr.sin_family;
mgroup_addr.sin_port = socket_addr.sin_port;
mgroup_addr.sin_addr.s_addr = mgroup_req.imr_multiaddr.s_addr;
and receive packets with the recvfrom function:
errcode = recvfrom(sockfd, &recvbuff, IPMAXSIZE, 0, (struct sockaddr *)&mgroup_addr, &mgroup_addr_len)
Something goes wrong: while I can receive standard UDP packets addressed to the multicast group 224.1.0.1, the RTP packets are not received the first time, but sending them again does the trick.
Scenario 1:
I send n UDP packets: They are correctly received
Scenario 2:
I send n RTP packets: Nothing happens
I send again the same n RTP packets: They are correctly received
whatever the number n of packets is... so weird but true.
Edit:
On the analyzer side I'm running a sniffer, and it shows both packet bursts, so the messages do arrive at the analyzer side; it's not a sender-related problem.
Question:
the code is exactly the same (read: Same executable) for the RTP scenario AND the UDP scenario.
What am I doing wrong?
Side notes:
Suggestions of RTP-management or other high-level RTP libraries are a no-no; I MUST work at this level of abstraction because of the metrics I need to analyze.
Existing network metrics analyzers are also a no-no; I MUST do this with my own code.
Thanks in advance.
I'm not convinced that you need to name the mcast group address again as the recvfrom address, as you've already joined the group. Is receiving packets from elsewhere a genuine concern? I would just use recv().
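For illustration, a minimal receive loop along those lines (a sketch reusing the group and port from the question; most error handling omitted):

#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sockfd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    /* Bind to the local port the stream is sent to. */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(10000);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sockfd, (struct sockaddr *)&addr, sizeof(addr));

    /* Join the multicast group once... */
    struct ip_mreq req = {0};
    req.imr_multiaddr.s_addr = inet_addr("224.1.0.1");
    req.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &req, sizeof(req));

    /* ...then just recv(); no per-call source address needed. */
    char buf[65536];
    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n < 0) { perror("recv"); break; }
        printf("got %zd bytes\n", n);
    }
    close(sockfd);
    return 0;
}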
The issue was due to the board I was working on: it was configured (not by my code) to redirect RTP packets directly to the MPEG decoder.
It turns out that packets consumed by the decoder are not forwarded to the kernel. I still have to understand why the double send worked.