Linux kernel UDP reception timestamp

I've been reading the network timestamping documentation of the Linux kernel and something is not clear to me.
Where is the timestamp provided by SO_TIMESTAMPNS generated: in hardware or in the kernel? If it is the kernel, is it generated as soon as the interrupt for a new packet is raised?
SO_TIMESTAMPING should also allow the generation of hardware timestamps. Is this supported by all NICs? How does SO_TIMESTAMPING with the option SOF_TIMESTAMPING_RX_HARDWARE relate to SO_TIMESTAMPNS?
In that case, is the hardware timestamp referring to the system clock or to the NIC clock? In the second case, how do I read the NIC clock to compute elapsed time?

The socket option used for software timestamping is SO_TIMESTAMPNS. It returns the time from the system clock. The timestamp is not generated in hardware; rather, it is a snapshot of the system time taken when the interrupt is handled in software. We can access this timestamp through the ancillary data (CMSG) that is not part of the socket payload, using:
int level, type;
struct cmsghdr *cm;
struct timespec *ts = NULL;

for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
    level = cm->cmsg_level;
    type = cm->cmsg_type;
    if (SOL_SOCKET == level && SO_TIMESTAMPNS == type) {
        ts = (struct timespec *) CMSG_DATA(cm);
        printf("SW TIMESTAMP %ld.%09ld\n",
               (long)ts->tv_sec, (long)ts->tv_nsec);
    }
}
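For completeness, the option has to be enabled before recvmsg() is called, and the msghdr above needs a control buffer large enough for the ancillary data. A minimal sketch of that setup, assuming sd is an already-bound UDP socket:
int on = 1;
char data[1500];
char ctrl[CMSG_SPACE(sizeof(struct timespec))];
struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
struct msghdr msg = {
    .msg_iov = &iov, .msg_iovlen = 1,
    .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
};

/* Ask the kernel for nanosecond software timestamps on reception. */
if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on)) < 0)
    perror("setsockopt SO_TIMESTAMPNS");

/* recvmsg() fills msg_control with the timestamp ancillary data,
 * which the CMSG loop above then walks. */
if (recvmsg(sd, &msg, 0) < 0)
    perror("recvmsg");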
The SO_TIMESTAMPING socket option offers many different flags; some of them are:
SOF_TIMESTAMPING_TX_HARDWARE // Transmit timestamp generated in hardware by the NIC clock
SOF_TIMESTAMPING_RX_HARDWARE // Receive timestamp generated in hardware by the NIC clock
SOF_TIMESTAMPING_TX_SOFTWARE // Transmit timestamp generated by the kernel/driver from the system clock
SOF_TIMESTAMPING_RX_SOFTWARE // Receive timestamp generated by the kernel/driver from the system clock
This socket option is not supported by all Network Interface Cards (NICs), although many Ethernet NICs do support it. To find out whether a particular interface's driver supports SO_TIMESTAMPING, use:
ethtool -T ethX // where X corresponds to your particular interface
This lists all the timestamping capabilities that ethX supports.
To use the hardware timestamping feature provided by a particular NIC, use code like:
int flags;
flags = SOF_TIMESTAMPING_TX_HARDWARE
      | SOF_TIMESTAMPING_RX_HARDWARE
      | SOF_TIMESTAMPING_TX_SOFTWARE
      | SOF_TIMESTAMPING_RX_SOFTWARE
      | SOF_TIMESTAMPING_RAW_HARDWARE;
if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)) < 0)
    printf("ERROR: setsockopt SO_TIMESTAMPING\n");
int level, type;
struct cmsghdr *cm;
struct timespec *ts = NULL;

for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
    level = cm->cmsg_level;
    type = cm->cmsg_type;
    if (SOL_SOCKET == level && SO_TIMESTAMPING == type) {
        /* CMSG_DATA here is a struct scm_timestamping, i.e. an array of
         * three timespecs: ts[0] is the software timestamp and ts[2] is
         * the raw hardware timestamp. */
        ts = (struct timespec *) CMSG_DATA(cm);
        printf("HW TIMESTAMP %ld.%09ld\n",
               (long)ts[2].tv_sec, (long)ts[2].tv_nsec);
    }
}
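Note that SO_TIMESTAMPING only controls what is reported to the socket; on most NICs the driver must also be told to generate hardware timestamps in the first place, via the SIOCSHWTSTAMP ioctl. A minimal sketch, assuming the interface is named eth0:
#include <linux/net_tstamp.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>

struct hwtstamp_config cfg = { 0 };
struct ifreq ifr = { 0 };

cfg.tx_type   = HWTSTAMP_TX_ON;
cfg.rx_filter = HWTSTAMP_FILTER_ALL;   /* timestamp all received packets */
strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
ifr.ifr_data = (void *)&cfg;
if (ioctl(sd, SIOCSHWTSTAMP, &ifr) < 0)
    perror("SIOCSHWTSTAMP");
With SOF_TIMESTAMPING_RAW_HARDWARE, the value reported in ts[2] comes from the NIC's own clock rather than the system clock.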

Related

DPDK19.11.10: HW offload for IPV4 with VLAN tag is not working properly

I am using DPDK 19.11.10 on CentOS.
The application works fine with HW offloading if I send only the IPv4 packet without the VLAN header.
If I add the VLAN header to the IPv4 packet, HW offloading does not work.
If I capture a pcap on the Ubuntu gateway, the IP header is corrupted and shows up as a fragmented IP packet, even though we are not fragmenting the IP packet.
We verified the capabilities like this:
if (!(dev->tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT)) {
    rte_panic(" VLAN offload not supported");
}
Below is my code:
.offloads = (DEV_TX_OFFLOAD_IPV4_CKSUM |
             DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM |
             DEV_TX_OFFLOAD_VLAN_INSERT),

m->l2_len = L2_HDR_SIZE;
m->l3_len = L3_IPV4_HDR_SIZE;
ip_hdr->check = 0;
m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM;
ip_hdr = rte_pktmbuf_mtod(m, struct iphdr *);
vlan1_hdr = (struct vlan1_hdr *) rte_pktmbuf_prepend(m, sizeof(struct vlan1_hdr));
eth_hdr = (struct ethernet_hdr *) rte_pktmbuf_prepend(m, (uint16_t)sizeof(struct ethernet_hdr));
Once I receive the packet on the Ubuntu gateway, the IP packet is corrupted and appears as a fragmented IP packet.
The same code works fine if I remove the VLAN header.
Does anything else need to be added here?
By the sound of it:
You might misunderstand how HW Tx VLAN offload is supposed to work;
Your code does not update m->l2_len when it inserts a VLAN header.
First of all, your code enables support for HW Tx VLAN offload but, oddly enough, never actually attempts to use it. If one wants to use hardware Tx VLAN offload, they should set PKT_TX_VLAN in m->ol_flags and fill out m->vlan_tci; the VLAN header will then be added by the hardware.
However, your code prepends the header itself, as if there were no intent to use a hardware offload in the first place. Your code does m->l2_len = L2_HDR_SIZE;, which, I presume, accounts only for the Ethernet header. When your code prepends a VLAN header, this variable has to be updated accordingly:
m->l2_len += sizeof(struct rte_vlan_hdr);
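For the offload path, the fix is the opposite: let the NIC insert the tag and do not prepend it in software. A sketch, where VLAN_TCI is a placeholder for your desired tag value:
m->ol_flags |= PKT_TX_VLAN;   /* ask the NIC to insert the VLAN tag */
m->vlan_tci = VLAN_TCI;       /* desired TCI value */
/* No rte_pktmbuf_prepend() for the VLAN header in this case, so
 * m->l2_len stays at the plain Ethernet header size. */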
Most DPDK NIC PMDs support HW VLAN offload in the RX direction, but only a limited number of PMDs support the DEV_TX_OFFLOAD_VLAN_INSERT feature, namely:
Aquantia Atlantic
Marvell OCTEON CN9K/CN10K SoC
Cisco VIC adapter
Pensando NIC
OCTEON TX2
Wangxun 10 Gigabit Ethernet NIC
Intel NICs - i40e, ice, iavf, ixgbe, igb
To enable HW VLAN INSERT one needs to (as sketched below):
check whether DEV_TX_OFFLOAD_VLAN_INSERT is present in the capabilities reported by rte_eth_dev_info_get;
configure the TX offloads for the port with DEV_TX_OFFLOAD_VLAN_INSERT;
set ol_flags = PKT_TX_VLAN and vlan_tci = [desired TCI] in the mbuf descriptor.
This allows the driver code in its xmit function to check the mbuf's ol_flags for PKT_TX_VLAN and enable the VLAN insert offload in hardware by registering the appropriate command with the packet descriptor before DMA.
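A sketch of the first two steps, assuming port_id is a valid port and with error handling omitted for brevity:
struct rte_eth_dev_info dev_info;
struct rte_eth_conf port_conf = { 0 };

/* 1. Check the reported TX offload capabilities. */
rte_eth_dev_info_get(port_id, &dev_info);
if (!(dev_info.tx_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT))
    rte_panic("TX VLAN insert not supported\n");

/* 2. Request the offload when configuring the port. */
port_conf.txmode.offloads = DEV_TX_OFFLOAD_VLAN_INSERT;
rte_eth_dev_configure(port_id, 1, 1, &port_conf);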
From the DPDK side, the following conditions have to be satisfied:
at a given instant there should be only one thread accessing and updating the mbuf;
no modification is to be done on the original mbuf (with payload).
If the intention is to perform the VLAN insert in SW (especially if the HW or virtual NIC PMD does not support it), in DPDK one has to do the following:
ensure refcnt is 1, to prevent multiple threads accessing and modifying the intended buffer;
ensure there is enough headroom to shift the packet 4 bytes to accommodate the EtherType and VLAN values;
ensure pkt_len and data_len are within bounds (greater than 60 bytes and at least 4 bytes below the MTU);
ensure the mbuf offload descriptor is not enabled for PKT_TX_VLAN;
update data_len on the modified mbuf by 4 bytes;
update the total pkt_len by 4;
(optional, for performance) prefetch the 4 bytes just before the mtod address of the mbuf.
Note: all of the above is easily achieved by using the DPDK function rte_vlan_insert. To use it, follow these steps:
Do not configure the port with DEV_TX_OFFLOAD_VLAN_INSERT.
Update ol_flags with PKT_TX_VLAN and set vlan_tci to the desired value.
Invoke rte_vlan_insert with the mbuf.
Sample code:
/* Get burst of RX packets, from first port of pair. */
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);

if (unlikely(nb_rx == 0))
    continue;

for (int i = 0; i < nb_rx; i++) {
    bufs[i]->ol_flags = PKT_TX_VLAN;
    bufs[i]->vlan_tci = 0x10;
    rte_vlan_insert(&bufs[i]);
}

/* Send burst of TX packets, to second port of pair. */
const uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_rx);

/* Free any unsent packets. */
if (unlikely(nb_tx < nb_rx)) {
    uint16_t buf;
    for (buf = nb_tx; buf < nb_rx; buf++)
        rte_pktmbuf_free(bufs[buf]);
}

Add TCP Option to ACK packet during 3-way handshake

I'm writing a kernel module using netfilter. I want to handle only the ACK for SYN/ACK (the final ACK of the TCP three-way handshake). I use the skb_is_tcp_pure_ack function, but ACKs for data are also processed.
What can I do? My kernel version is 3.10.0-514.16.1.el7.x86_64.
Current code looks like this:
struct iphdr *iph;
struct tcphdr *tcph;
struct net *net;
unsigned int hdr_len;
unsigned int tcphoff;

if (!skb_is_tcp_pure_ack(skb)) {
    return NF_ACCEPT;
}
/* add tcp option */

/* A netfilter instance to use */
static struct nf_hook_ops nfho __read_mostly = {
    .hook     = ato_hookfn,
    .pf       = PF_INET,
    .hooknum  = NF_INET_POST_ROUTING,
    .priority = NF_IP_PRI_LAST,
    .owner    = THIS_MODULE,
};
From the TCP state machine, you want to match ACK packets only when the state is TCP_SYN_SENT. Try adding another condition to check that, something like:
if (skb_is_tcp_pure_ack(skb)
    && skb->sk->sk_state == TCP_SYN_SENT) {
    /*
     * ACK(-only) packet during three-way handshake
     */
}
Also, note that skb_is_tcp_pure_ack was introduced in kernel version 4.x and does not exist in versions below that.
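If the helper is unavailable on an older kernel, one can approximate the check directly from the headers. A hedged sketch, assuming iph and tcph already point at the packet's IP and TCP headers:
/* True for a segment that carries the ACK flag and nothing else:
 * no SYN/FIN/RST and no payload bytes. */
static bool is_pure_ack(const struct iphdr *iph, const struct tcphdr *tcph)
{
    unsigned int payload = ntohs(iph->tot_len)
                           - iph->ihl * 4 - tcph->doff * 4;

    return tcph->ack && !tcph->syn && !tcph->fin && !tcph->rst
           && payload == 0;
}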

Linux kernel: packet is forwarded to another interface, but isn't sent onwards

I've made a virtual network interface that forwards any packet it receives to a physical interface eth2, with the intention of sending it to whichever destination is required.
The virtual interface's hard_start_xmit function receives the packet, changes the source MAC and IP addresses, sets the handling device to be the physical interface, and sets the packet type to PACKET_OUTGOING.
static int forward_packet(struct sk_buff *skb, struct net_device *dev)
{
    char switched_device_name[] = "eth2";
    struct net_device *switched_device = dev_get_by_name(switched_device_name);
    int ifrx_output;
    u32 switched_device_ip;

    if (!switched_device) {
        printk("Error: Couldn't get device %s for switching.\n", switched_device_name);
        return -1;
    }
    skb->dev = switched_device;
    skb->input_dev = switched_device;
    skb->pkt_type = PACKET_OUTGOING;
    switched_device_ip = get_interface_ip(switched_device);
    change_source_mac_and_ip(skb, switched_device->dev_addr, switched_device_ip);

    ifrx_output = netif_rx(skb);
    printk("netif_rx returned %d.\n", ifrx_output);
    return 0;
}
I tested this code, and while the packet does appear in the physical interface's tcpdump, there's no trace of it at the destination machine.
The only reason I can think of for this is that the physical interface treats it as an incoming packet with no need to pass it on. However, as far as I can tell, only skb->pkt_type determines that.
What other packet changes could I be missing?

Raspberry Pi spidev.h SPI Communication

I am trying to establish SPI communication from an RPi (master) to an EtherCAT device (slave).
The data transfer follows a scheme:
I have to transfer 2 bytes that address registers, and the following bytes transfer data until a chip-select change terminates the communication.
This is my attempt. With cs_change, I can tell my SPI communication to deselect chip select before the next transfer starts.
char transfer(UINT8 data, char last)
{
    char last_transfer = last;
    int ret;
    uint8_t tx[] = { data };
    uint8_t rx[ARRAY_SIZE(tx)] = { };
    struct spi_ioc_transfer tr = {
        .tx_buf = (unsigned long)tx,
        .rx_buf = (unsigned long)rx,
        .len = ARRAY_SIZE(tx),
        .delay_usecs = delay,
        .speed_hz = speed,
        .bits_per_word = bits,
        .cs_change = 0,
    };

    if (last_transfer)
        tr.cs_change = 1;

    ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
    if (ret < 1)
        printf("can't send spi message");

    return rx[tr.len - 1];
}
First problem:
I think deselecting chip select only at the start of the next transfer is too late.
So my first question: is there another way to control my chip-select signal, maybe another library I can use?
Second problem:
I want to read from SPI without writing to it. How can I realise that (with a simple read(fd...)?)
I hope you guys can support me :)
This looks like the spidev_test.c application you are referring to. The implementation seems correct, as the slave SPI device is deselected after your last transfer: it stays selected until after the last transfer in a message.
Each SPI device is deselected when it is not in active use, allowing other drivers to talk to other devices, since your SPI bus may be shared among other slave SPI devices.
Moreover, you are going for full duplex, while a standard read() operation is obviously only half duplex.
So whenever your SPI slave device wants to send data to the SPI controller (master), it must have some interrupt mechanism so that your driver/app can set the SPI controller to "SPI_IOC_RD_MODE" and issue an "SPI_IOC_MESSAGE" providing only rx_buf; or you can go for a simple read() operation, as the transfer would be half duplex after setting the SPI controller to read mode.
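A minimal sketch of such a receive-only transfer, assuming fd is the open spidev descriptor; with tx_buf left unset, the controller clocks out zeros while reading:
uint8_t rx[4] = { 0 };
struct spi_ioc_transfer tr = {
    .rx_buf = (unsigned long)rx,   /* no tx_buf: zeros are shifted out */
    .len    = sizeof(rx),
};

if (ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 1)
    perror("read-only transfer");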
You can use SPI_NO_CS mode and toggle the CS pins as GPIOs using the wiringPi library (from http://wiringpi.com/ ).
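A rough sketch of that approach, where CS_PIN is a placeholder for whatever wiringPi pin your chip select is wired to:
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>
#include <wiringPi.h>

#define CS_PIN 10   /* placeholder pin number */

int open_spi_no_cs(const char *dev)
{
    int fd = open(dev, O_RDWR);
    uint8_t mode = SPI_MODE_0 | SPI_NO_CS;   /* controller leaves CS alone */

    if (fd < 0)
        return -1;
    if (ioctl(fd, SPI_IOC_WR_MODE, &mode) < 0) {
        close(fd);
        return -1;
    }
    wiringPiSetup();
    pinMode(CS_PIN, OUTPUT);
    digitalWrite(CS_PIN, HIGH);   /* CS idle (deselected) */
    return fd;
}
Then bracket each message yourself: digitalWrite(CS_PIN, LOW) before the transfers and digitalWrite(CS_PIN, HIGH) once the exchange is complete.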

How the bonding driver takes RX packets from enslaved interfaces

I have a question regarding how the bonding driver takes RX packets from its enslaved interfaces. I found that bonding uses dev_add_pack() to set handlers for LACPDU and ARP packets, but I didn't find other handlers (for other packet types).
Could you please help me solve this problem?
The bonding driver registers its own Rx handler when a slave interface is enslaved to a bond master; in bond_enslave() you can see:
res = netdev_rx_handler_register(slave_dev, bond_handle_frame,
                                 new_slave);
So bond_handle_frame() hijacks the packets received by the slave interface, so that the bond master receives them instead:
static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
{
    struct sk_buff *skb = *pskb;
    struct slave *slave;
    struct bonding *bond;
    int (*recv_probe)(const struct sk_buff *, struct bonding *,
                      struct slave *);
    int ret = RX_HANDLER_ANOTHER;

    skb = skb_share_check(skb, GFP_ATOMIC);
    if (unlikely(!skb))
        return RX_HANDLER_CONSUMED;

    *pskb = skb;

    slave = bond_slave_get_rcu(skb->dev);
    bond = slave->bond;

    if (bond->params.arp_interval)
        slave->dev->last_rx = jiffies;

    recv_probe = ACCESS_ONCE(bond->recv_probe);
    if (recv_probe) {
        ret = recv_probe(skb, bond, slave);
        if (ret == RX_HANDLER_CONSUMED) {
            consume_skb(skb);
            return ret;
        }
    }

    if (bond_should_deliver_exact_match(skb, slave, bond)) {
        return RX_HANDLER_EXACT;
    }

    skb->dev = bond->dev;

    if (bond->params.mode == BOND_MODE_ALB &&
        bond->dev->priv_flags & IFF_BRIDGE_PORT &&
        skb->pkt_type == PACKET_HOST) {

        if (unlikely(skb_cow_head(skb,
                                  skb->data - skb_mac_header(skb)))) {
            kfree_skb(skb);
            return RX_HANDLER_CONSUMED;
        }
        memcpy(eth_hdr(skb)->h_dest, bond->dev->dev_addr, ETH_ALEN);
    }

    return ret;
}
I checked the bonding code and found that the driver doesn't inspect incoming RX packets except for a few types (LACPDU, ARP), and only when it operates in the corresponding modes. The driver sets handlers for these packets with the dev_add_pack() function.
To set a global hook in practice, you can use nf_register_hook(), which provides an interface for setting up your own netfilter procedure to intercept packets.
nf_register_hook() seems more powerful than dev_add_pack(), but I think it needs more care, because a wrong condition in the hook can drop a lot of packets.
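A minimal sketch of that approach (the hook function's first arguments changed across kernel versions; the signature shown here matches mid-3.x kernels, and hook_fn is an illustrative name):
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

static unsigned int hook_fn(const struct nf_hook_ops *ops,
                            struct sk_buff *skb,
                            const struct net_device *in,
                            const struct net_device *out,
                            int (*okfn)(struct sk_buff *))
{
    /* Inspect skb here; NF_ACCEPT lets the packet continue,
     * NF_DROP would discard it. */
    return NF_ACCEPT;
}

static struct nf_hook_ops ops = {
    .hook     = hook_fn,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

/* In module init: nf_register_hook(&ops);
 * in module exit: nf_unregister_hook(&ops). */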
