In user space, I encapsulated an L3 packet using SOCK_RAW (including the IP header) and sent it to kernel space with sock_sendmsg(), using a msghdr structure:
struct msghdr {
void *msg_name; /* optional address */
struct iovec *msg_iov; /* scatter/gather array */
...
};
I don't clearly understand the role of msg_name. I have already specified the source and destination IPs in the L3 header. Why do I need msg_name?
The msg_name and msg_namelen fields of struct msghdr have the same function as the dest_addr and addrlen arguments to sendto: they specify the destination address. They are intended to be used with normal unconnected datagram sockets. For instance, when sending UDP packets with sendmsg on an AF_INET/SOCK_DGRAM socket, you supply only the payload, not the headers, in the iovec, and the destination address goes in msg_name + msg_namelen.
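A minimal sketch of the UDP case just described, assuming IPv4 and an unconnected AF_INET/SOCK_DGRAM socket (send_udp_payload is a hypothetical helper name). The iovec carries only the payload, and the destination goes in msg_name/msg_namelen:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send `len` bytes of payload as one UDP datagram to
 * `dst` on an unconnected AF_INET/SOCK_DGRAM socket. The kernel adds the
 * IP and UDP headers; msg_name/msg_namelen play the role of sendto()'s
 * dest_addr/addrlen. Returns the number of bytes sent, or -1 on error. */
ssize_t send_udp_payload(int fd, const struct sockaddr_in *dst,
                         const void *payload, size_t len)
{
    struct iovec iov;
    iov.iov_base = (void *)payload;   /* payload only, no headers */
    iov.iov_len = len;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);      /* no ancillary data, no flags */
    msg.msg_name = (void *)dst;       /* destination address... */
    msg.msg_namelen = sizeof *dst;    /* ...and its size */
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return sendmsg(fd, &msg, 0);
}
```

sendto(fd, payload, len, 0, (struct sockaddr *)dst, sizeof *dst) does the same thing; sendmsg() only becomes necessary when you also want multiple iovecs or ancillary data.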
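raw(7), the man page describing SOCK_RAW sockets, indicates that you are allowed to put the header into the iovec when using raw sockets (note in particular the discussion of IP_HDRINCL), but it does not make clear what you should set msg_name and msg_namelen to in that case. On Linux, you can leave them at 0 only if the socket has been connect()ed: on an unconnected raw socket, sendmsg() fails with EDESTADDRREQ, because the kernel uses that address to route the packet even with IP_HDRINCL. So set msg_name to a struct sockaddr_in holding the same destination as your IP header (raw(7) says sin_port should be 0).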
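For the IP_HDRINCL case, a sketch along these lines copies the destination out of the IP header you built into msg_name, which Linux uses for the route lookup. It assumes IPv4 and a buffer that already begins with a complete IPv4 header; prepare_hdrincl_msg and send_hdrincl_packet are hypothetical names, and actually sending requires CAP_NET_RAW (typically root):

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: prepare a msghdr for a packet that already starts
 * with an IPv4 header. msg_name gets the header's daddr so the kernel can
 * route the packet. Returns 0, or -1 if the buffer is shorter than a
 * minimal IPv4 header. */
int prepare_hdrincl_msg(const unsigned char *pkt, size_t len,
                        struct sockaddr_in *dst, struct iovec *iov,
                        struct msghdr *msg)
{
    struct iphdr ip;
    if (len < sizeof ip)
        return -1;
    memcpy(&ip, pkt, sizeof ip);        /* copy to avoid unaligned access */

    memset(dst, 0, sizeof *dst);
    dst->sin_family = AF_INET;
    dst->sin_addr.s_addr = ip.daddr;    /* both are in network byte order */
    dst->sin_port = 0;                  /* raw(7): ignored, set to 0 */

    iov->iov_base = (void *)pkt;        /* whole packet, IP header included */
    iov->iov_len = len;

    memset(msg, 0, sizeof *msg);
    msg->msg_name = dst;
    msg->msg_namelen = sizeof *dst;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    return 0;
}

/* Hypothetical helper: open a raw socket and send one prebuilt packet.
 * Needs CAP_NET_RAW. Returns bytes sent, or -1 on error. */
ssize_t send_hdrincl_packet(const unsigned char *pkt, size_t len)
{
    struct sockaddr_in dst;
    struct iovec iov;
    struct msghdr msg;
    if (prepare_hdrincl_msg(pkt, len, &dst, &iov, &msg) != 0)
        return -1;

    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (fd < 0)
        return -1;
    int on = 1;  /* IPPROTO_RAW already implies this; set for clarity */
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = sendmsg(fd, &msg, 0);
    close(fd);
    return n;
}
```

Alternatively, connect() the raw socket to the destination once and then leave msg_name NULL and msg_namelen 0 on every send.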
I am currently writing an eBPF program in C to track egress network packets using tc-bpf. The program has its definition as follows:
SEC("classifier")
int bpf_tc_sample(struct __sk_buff *skb) {
[...]
return TC_ACT_OK;
}
In my program, I need to get the struct socket associated with the socket buffer's sock.
If the socket buffer were a struct sk_buff, this could be done by accessing skb->sk->sk_socket. In this case, however, skb->sk is a struct bpf_sock, not a struct sock, and struct bpf_sock has no sk_socket member.
Could anyone please let me know whether there is a way to access the struct socket from a struct __sk_buff in an eBPF program, either through a BPF helper or by using a kprobe?
Any insight will be appreciated.
What is the difference between iov.iov_base and msg.msg_control ?
I'm looking at some code examples (the iputils open source ping).
When sending data using sendmsg the packet is set in iov.iov_base
When reading data using recvmsg the packet is read from msg->msg_control directly.
What is the relationship between struct iovec and struct msghdr ? Is there a difference when reading/sending data ?
Sorry for the silly question. I haven't found an answer so far and I'm confused.
Thanks!
Ancillary data or control messages (.msg_controllen bytes at .msg_control) is data provided or verified by the kernel, whereas the normal payload (in iovecs) is just data received from the other endpoint, unverified and unchecked by the kernel (except for checksum, if the protocol has one).
For IP sockets (see man 7 ip), there are several socket options that cause the kernel to provide ancillary data on received messages. For example:
IP_RECVORIGDSTADDR socket option tells the kernel to provide an IP_ORIGDSTADDR type ancillary message (with a struct sockaddr_in as data), identifying the original destination address of the datagram received
IP_RECVOPTS socket option tells the kernel to provide an IP_OPTIONS type ancillary message containing all IP option headers (up to 40 bytes for IPv4) for incoming datagrams
Ping and traceroute use ICMP messages over IP; see man 7 icmp (and man 7 raw) for details.
Because most ICMP responses do not contain useful data filled in by the sender, the iovecs don't usually contain anything interesting. Instead, the interesting data is in the IP message headers and options.
For example, an ICMP Echo Reply packet has just an 8-byte (64-bit) header: 8-bit type (0), 8-bit code (0), 16-bit checksum, 16-bit id, and 16-bit sequence number, followed by whatever data the Echo Request carried, echoed back unchanged. To get the IP headers with the interesting fields, you need the kernel to provide them as ancillary data control messages.
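To illustrate how the payload and the control messages arrive through different parts of struct msghdr, here is a sketch that reads one datagram with recvmsg() and looks for an IP_ORIGDSTADDR control message. It assumes a UDP socket (rather than ping's ICMP socket) so it runs without privileges, and that IP_RECVORIGDSTADDR has already been enabled on it; recv_with_origdst is a hypothetical name:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram on a socket that already has
 * IP_RECVORIGDSTADDR enabled. The payload is written to buf through the
 * iovec; the original destination arrives separately as an IP_ORIGDSTADDR
 * control message. Returns the payload length or -1; *found is set to 1
 * if the control message was present. */
ssize_t recv_with_origdst(int fd, void *buf, size_t len,
                          struct sockaddr_in *origdst, int *found)
{
    union {
        char bytes[CMSG_SPACE(sizeof(struct sockaddr_in))];
        struct cmsghdr align;             /* aligns the control buffer */
    } ctrl;
    struct iovec iov;
    iov.iov_base = buf;                   /* where the payload goes */
    iov.iov_len = len;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.bytes;         /* where the kernel's extras go */
    msg.msg_controllen = sizeof ctrl.bytes;

    *found = 0;
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_ORIGDSTADDR) {
            memcpy(origdst, CMSG_DATA(c), sizeof *origdst);
            *found = 1;
        }
    }
    return n;
}
```

Enabling the option is one call: setsockopt(fd, IPPROTO_IP, IP_RECVORIGDSTADDR, &on, sizeof on) with int on = 1. The same loop handles IP_OPTIONS after IP_RECVOPTS; only the type check and the payload type change.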
The background:
As described in the sendmsg() and related man pages, we have
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
struct msghdr {
void *msg_name; /* Optional address */
socklen_t msg_namelen; /* Size of address */
struct iovec *msg_iov; /* Scatter/gather array */
size_t msg_iovlen; /* # elements in msg_iov */
void *msg_control; /* Ancillary data */
size_t msg_controllen; /* Ancillary data buffer len */
int msg_flags; /* Flags (unused) */
};
struct iovec {
void *iov_base; /* Starting address */
size_t iov_len; /* Number of bytes to transfer */
};
with man 3 cmsg describing how to construct and access such ancillary data,
struct cmsghdr {
size_t cmsg_len; /* Data byte count, including header
(type is socklen_t in POSIX) */
int cmsg_level; /* Originating protocol */
int cmsg_type; /* Protocol-specific type */
unsigned char cmsg_data[]; /* Data itself */
};
struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *msgh);
struct cmsghdr *CMSG_NXTHDR(struct msghdr *msgh, struct cmsghdr *cmsg);
size_t CMSG_ALIGN(size_t length);
size_t CMSG_SPACE(size_t length);
size_t CMSG_LEN(size_t length);
unsigned char *CMSG_DATA(struct cmsghdr *cmsg);
These ancillary data messages are always sufficiently aligned for the current architecture (so that the data items can be accessed directly), so to construct a proper ancillary message (SCM_CREDENTIALS to pass user, group, and process ID information over a Unix domain socket, or SCM_RIGHTS to pass file descriptors), these macros have to be used. The man 3 cmsg man page contains example code for these.
Suffice it to say that to loop over each ancillary data part in a given message (struct msghdr msg), you use something that boils down to the following (CMSG_FIRSTHDR() and CMSG_NXTHDR() do essentially this for you):
char *const end = (char *)msg.msg_control + msg.msg_controllen;
for (char *ptr = (char *)msg.msg_control;
     ptr + sizeof (struct cmsghdr) <= end;
     ptr += CMSG_ALIGN(((struct cmsghdr *)ptr)->cmsg_len)) {
    struct cmsghdr *const cmsg = (struct cmsghdr *)ptr;
    if (cmsg->cmsg_len < sizeof (struct cmsghdr))
        break; /* malformed entry: stop instead of looping forever */
    /* level is cmsg->cmsg_level and type is cmsg->cmsg_type, and
       CMSG_DATA(cmsg) is sufficiently aligned for the level and type,
       so you can use ((datatype *)CMSG_DATA(cmsg)) to obtain a pointer
       to the type corresponding to this level and type ancillary payload.
       The exact size of the payload is
           (cmsg->cmsg_len - CMSG_LEN(0))
       so e.g. an SCM_RIGHTS ancillary message, with
           cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS
       has exactly
           (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof (int)
       new file descriptors as a payload.
    */
}
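Going the other way, constructing the SCM_RIGHTS message mentioned above looks roughly like this sketch, which follows the pattern of the man 3 cmsg example. send_fd and recv_fd are hypothetical names, and it assumes a connected AF_UNIX socket (for instance one end of a socketpair()):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: pass one file descriptor as SCM_RIGHTS ancillary
 * data, together with one dummy byte of ordinary payload. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;
    union {
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;              /* forces correct alignment */
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));   /* header + one int */
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one SCM_RIGHTS descriptor.
 * Returns the new descriptor, or -1 if none arrived. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov;
    iov.iov_base = &byte;
    iov.iov_len = 1;
    union {
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS &&
            c->cmsg_len == CMSG_LEN(sizeof(int))) {
            int fd;
            memcpy(&fd, CMSG_DATA(c), sizeof fd);
            return fd;
        }
    }
    return -1;
}
```

The union with struct cmsghdr only exists to give the control buffer the alignment the CMSG_* macros expect. unix(7) notes that at least one byte of real data should accompany ancillary data (on Linux this is required for stream sockets), which is why a dummy byte is sent.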
I have a UDP socket on which I have set the IP_PKTINFO option,
and I pass an ancillary message containing an in_pktinfo struct to set a custom source address:
struct in_pktinfo {
unsigned int ipi_ifindex;
struct in_addr ipi_spec_dst;
struct in_addr ipi_addr;
};
struct in_addr is just a struct wrapping a 32-bit unsigned integer (in_addr_t) in network byte order:
struct in_addr {
in_addr_t s_addr; // load with inet_aton()
};
On my implementation:
Datagrams arriving on a transparent UDP socket (TPROXY) are analyzed, processed,
and then forwarded to a specific destination (port address translation).
So it is crucial for me to set the source IP as well as the source port.
I can set a specific source port by opening a new socket and binding it to that port,
but because of the large number of datagrams, that is very inefficient.
Questions:
Is there any way to set the source port without multiple sockets/binds and without using a raw socket?
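For reference, sending with an IP_PKTINFO control message, as described in the question, generally looks like the following sketch. It assumes IPv4 and an unconnected UDP socket; send_from_ip is a hypothetical name. struct in_pktinfo has no port field, so this controls only the source address:

```c
#define _GNU_SOURCE  /* struct in_pktinfo is a GNU/BSD extension */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one UDP datagram to dst, asking the kernel to
 * use src_ip as the source address via IP_PKTINFO. The source port is
 * still whatever port the socket is bound to. */
ssize_t send_from_ip(int fd, const struct sockaddr_in *dst,
                     struct in_addr src_ip, const void *buf, size_t len)
{
    struct iovec iov;
    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    union {
        char bytes[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)dst;
    msg.msg_namelen = sizeof *dst;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(struct in_pktinfo));

    struct in_pktinfo pi;
    memset(&pi, 0, sizeof pi);            /* ipi_ifindex 0: routing picks */
    pi.ipi_spec_dst = src_ip;             /* source address to use */
    memcpy(CMSG_DATA(c), &pi, sizeof pi);

    return sendmsg(fd, &msg, 0);
}
```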
In Linux, Ubuntu 14.04:
I'm writing code that uses a socket to send pure UDP datagrams, consisting of the UDP header plus payload, without any part of the IP header.
I have created the socket
sokt_fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
Also, I have prepared the UDP header.
I want to leave the IP encapsulation process to the kernel.
I want to send the datagram over any available IP interface. (I do not want to specify the source IP, and also leave this task to the kernel).
Do I need to specify the destination IP address before sending the datagram?
I must use sendto() to send the datagram; how must I fill in the sockaddr data structure?
#include <sys/socket.h>
struct sockaddr
{
unsigned short sa_family;// address family, AF_xxx
char sa_data[14];// 14 bytes of protocol address
};
Don't use the sockaddr structure. Use sockaddr_in instead and cast it when you have to pass a sockaddr* to a function.
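struct sockaddr_in myaddr;
int s;
memset(&myaddr, 0, sizeof(myaddr)); // clear sin_zero and any padding
myaddr.sin_family = AF_INET;
myaddr.sin_port = htons(3490);
inet_aton("63.161.169.137", &myaddr.sin_addr); // expects a struct in_addr *
s = socket(PF_INET, SOCK_STREAM, 0);
bind(s, (struct sockaddr *)&myaddr, sizeof(myaddr)); // pass a pointer, cast to sockaddr *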
The socket API is designed for many address families; besides AF_INET, there are others such as infrared and Bluetooth. Since AF_INET is only one of the families, the API functions use the generic sockaddr type in their parameters.
There is a nice explanation of this in Chapter 3, "Sockets Introduction", of the well-known book Unix Network Programming, Volume 1: The Sockets Networking API, by Richard Stevens et al. Let me quote:
Most socket functions require a pointer to a socket address structure
as an argument. Each supported protocol suite defines its own socket
address structure. The names of these structures begin with
sockaddr_ and end with a unique suffix for each protocol suite.
For the IP (Internet Protocol) suite, the structure is sockaddr_in. Since your example specifies the AF_INET address family when creating the socket, you would use the more specific sockaddr_in structure instead of the more generic sockaddr. The socket API uses the generic sockaddr pointer in its function prototypes so that the same functions can accept the address structure of any protocol suite.
With regard to using send() versus sendto(), I have found that sendto() is used more commonly with UDP and send() with (connected) TCP sockets. So, to answer your question about the destination IP address: with UDP you don't have to specify it up front; instead, it is supplied as an argument to sendto().
For a given udp_datagram and datagram_length, your code might look something like this:
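struct sockaddr_in dest_addr;
memset(&dest_addr, 0, sizeof(dest_addr));
dest_addr.sin_family = AF_INET;
// inet_addr() accepts only a dotted-quad string (no hostnames) and already
// returns the address in network byte order, so no htonl() here.
dest_addr.sin_addr.s_addr = inet_addr("1.2.3.4");
// For a SOCK_RAW socket, sin_port is ignored when sending (raw(7) says to
// set it to 0); the destination port is the one in your UDP header.
dest_addr.sin_port = 0;
sendto(socket_fd,
       udp_datagram,        // UDP header + payload that you built
       datagram_length,
       0,
       (struct sockaddr *)&dest_addr,
       sizeof(dest_addr));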
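Since the question has you building the UDP header yourself, a sketch of that step may help. It writes the four 16-bit header fields (source port, destination port, length, checksum) in network byte order and leaves the checksum at 0, which IPv4 interprets as "no checksum"; a raw socket passes it through unchanged rather than computing it. build_udp_datagram is a hypothetical name:

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an 8-byte UDP header followed by the payload
 * into out. Returns the datagram length, or 0 if out is too small or the
 * total would not fit in the 16-bit length field. */
size_t build_udp_datagram(unsigned char *out, size_t out_len,
                          uint16_t src_port, uint16_t dst_port,
                          const void *payload, size_t payload_len)
{
    const size_t hdr_len = 8;
    size_t total = hdr_len + payload_len;
    if (total > out_len || total > 0xFFFF)
        return 0;

    uint16_t hdr[4];
    hdr[0] = htons(src_port);          /* source port */
    hdr[1] = htons(dst_port);          /* destination port */
    hdr[2] = htons((uint16_t)total);   /* header + payload length */
    hdr[3] = 0;                        /* checksum: 0 = none (IPv4 only) */

    memcpy(out, hdr, hdr_len);
    memcpy(out + hdr_len, payload, payload_len);
    return total;
}
```

The filled buffer and the returned length are what you would pass to sendto() as udp_datagram and datagram_length.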
The address API really wanted to be object-oriented, but had to deal with the fact that C isn't an OO language. sockaddr can be seen as the "base class" and the parameter type that bind, connect, sendto, recvfrom, etc. use when they need an address. However, you must provide a "subclassed" address matching the socket domain that you're using. This is because Berkeley sockets can be used for a wide and extensible range of protocols. IPv4 and IPv6 are the most typical, but UNIX-based installs also support sockets as filesystem objects ("addressed" by path), and, for instance, a hypervisor driver can install support for special inter-VM or guest-to-host sockets. See man 7 socket for an overview.
If you use IPv4, you need to use sockaddr_in. If you use IPv6, you need to use sockaddr_in6. In both cases, you need to cast your pointer to a sockaddr*.
To fill in a sockaddr_in, you need to do something like this:
struct sockaddr_in sin;  /* not named inet_addr, which would shadow inet_addr() */
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_port = htons(port);
sin.sin_addr.s_addr = htonl(ip_address_as_number);
struct sockaddr* addr = (struct sockaddr*)&sin;
htons and htonl stand for "host to network (short)" and "host to network (long)", respectively. You need them because the port and address fields are stored in network byte order (big endian), exactly as they appear in the packet headers, while your machine's native byte order may be little endian.
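A sketch of the IPv4/IPv6 point above that lets inet_pton() parse the address instead of converting a number by hand, and fills a struct sockaddr_storage large enough for either family. fill_sockaddr is a hypothetical name, and only numeric addresses are accepted (getaddrinfo() is the usual route for hostnames):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *ss from a numeric IPv4 or IPv6 string and a
 * host-order port. Stores the matching address length in *len and returns
 * 0, or returns -1 if ip is neither an IPv4 nor an IPv6 literal. Callers
 * pass (struct sockaddr *)ss and *len to bind/connect/sendto. */
int fill_sockaddr(const char *ip, uint16_t port,
                  struct sockaddr_storage *ss, socklen_t *len)
{
    memset(ss, 0, sizeof *ss);
    struct sockaddr_in *v4 = (struct sockaddr_in *)ss;
    if (inet_pton(AF_INET, ip, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);          /* network byte order */
        *len = sizeof *v4;
        return 0;
    }

    memset(ss, 0, sizeof *ss);               /* start clean for IPv6 */
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;
    if (inet_pton(AF_INET6, ip, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        *len = sizeof *v6;
        return 0;
    }
    return -1;
}
```

For port 80, the two bytes of sin_port are 0x00 0x50 in memory on any host, which is the big-endian order htons() produces.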
I am trying to send an OAM Ethernet frame using a raw socket. I was successful in doing so.
The send function I have written is:
int send_frame(sock_info *info,char *buf,int length)
{
struct sockaddr_ll dest_addr;
memset(&dest_addr,0,sizeof(struct sockaddr_ll));
dest_addr.sll_family = PF_PACKET;
dest_addr.sll_protocol = htons(0x8902); /* CFM/OAM EtherType, in hex (decimal 8902 is a different value) */
dest_addr.sll_ifindex = info->if_index;
dest_addr.sll_halen = ETH_MAC_ADDR_LEN;
dest_addr.sll_pkttype = PACKET_OTHERHOST;
dest_addr.sll_hatype = ARPHRD_ETHER;
memset(dest_addr.sll_addr,0,8);
dest_addr.sll_addr[0] = 0x00;
dest_addr.sll_addr[1] = 0xE0;
dest_addr.sll_addr[2] = 0x0C;
dest_addr.sll_addr[3] = 0x00;
dest_addr.sll_addr[4] = 0x95;
dest_addr.sll_addr[5] = 0x02;
return sendto(info->sock_fd, buf, length, 0, (struct sockaddr*) &dest_addr, sizeof(struct sockaddr_ll));
}
I was unable to capture the packet using Wireshark. After trying many things, I found out that the buffer used for sending must contain all the Ethernet frame fields (starting from the destination address). When I added the destination and source addresses and the other Ethernet fields to the buffer, I was able to capture the packet with Wireshark. So the send function doesn't use the MAC address stored in dest_addr.sll_addr.
My question is: then what is the sll_addr field of struct sockaddr_ll for? The manual says it is the destination MAC address.
To me it sounds like it works as the manual page describes it (man 7 packet):
SOCK_RAW packets are passed to and from the device driver without any
changes in the packet data. When receiving a packet, the address is
still parsed and passed in a standard sockaddr_ll address structure.
When transmitting a packet, the user supplied buffer should contain the
physical layer header. That packet is then queued unmodified to the
network driver of the interface defined by the destination address.
Some device drivers always add other headers. SOCK_RAW is similar to
but not compatible with the obsolete PF_INET/SOCK_PACKET of Linux 2.0.
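The buffer here refers to the 2nd parameter of sendto(). So with SOCK_RAW, struct sockaddr_ll does not format the packet: on send it only selects the outgoing interface (sll_ifindex), and sll_addr is ignored; on receive, the kernel fills in sll_addr with the parsed link-layer address. If you want the kernel to build the Ethernet header from sll_addr and sll_protocol, maybe you want to use SOCK_DGRAM instead (or libpcap)?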
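With a SOCK_DGRAM packet socket, the kernel builds the link-layer header itself from sockaddr_ll. A sketch, assuming Ethernet, the EtherType and MAC address from the question's code, and the hypothetical helper names fill_ll_dest and send_l2_payload; sending needs CAP_NET_RAW:

```c
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill a sockaddr_ll for sending on a SOCK_DGRAM
 * packet socket. With SOCK_DGRAM the kernel builds the Ethernet header
 * from sll_addr and sll_protocol; with SOCK_RAW sll_addr would be ignored. */
void fill_ll_dest(struct sockaddr_ll *sll, int ifindex, uint16_t ethertype,
                  const unsigned char mac[6])
{
    memset(sll, 0, sizeof *sll);
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(ethertype);  /* becomes the frame's EtherType */
    sll->sll_ifindex = ifindex;            /* outgoing interface */
    sll->sll_halen = 6;                    /* Ethernet MAC length */
    memcpy(sll->sll_addr, mac, 6);         /* destination MAC */
}

/* Hypothetical helper: send payload (everything after the Ethernet header)
 * to mac on interface ifname. Needs CAP_NET_RAW. Returns bytes sent or -1. */
ssize_t send_l2_payload(const char *ifname, uint16_t ethertype,
                        const unsigned char mac[6],
                        const void *payload, size_t len)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;
    int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ethertype));
    if (fd < 0)
        return -1;
    struct sockaddr_ll sll;
    fill_ll_dest(&sll, (int)ifindex, ethertype, mac);
    ssize_t n = sendto(fd, payload, len, 0,
                       (struct sockaddr *)&sll, sizeof sll);
    close(fd);
    return n;
}
```

The buffer passed here starts at the OAM payload, after the EtherType; the kernel prepends the destination MAC, the interface's own MAC as the source, and the EtherType.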