Why does struct sockaddr contain an address family field? Isn't the address family already fixed with the call to socket()?
sockaddr is used in more places than just connect and bind, including places where you don't have some external knowledge of the address family involved - getaddrinfo being one.
Additionally, whilst I don't believe the following matches practice anywhere, I can see it having been in the mind of whoever designed this stuff originally: the call to socket() defines the protocol family, while sockaddr contains the address family. In practice I believe these are always the same, but you could theoretically have a protocol capable of supporting two different address types.
EDIT: There's another way that the parameter is useful. If you're using datagram (UDP) sockets and you have a socket in a "connected" state with a default destination address, you can clear out that address by calling connect() with a sockaddr with sa_family set to AF_UNSPEC.
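For instance, a minimal sketch of that trick; the descriptor sock is assumed to be a UDP socket that was previously connect()ed to a default destination:

#include <string.h>
#include <sys/socket.h>

/* Dissolve a UDP socket's default destination by "connecting"
 * to an address whose family is AF_UNSPEC. */
int dissolve_udp_destination(int sock)
{
    struct sockaddr sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    return connect(sock, &sa, sizeof(sa));  /* 0 on success, -1 on error */
}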
If you look at the getaddrinfo interface, which is the only modern correct way to convert between interchange representations of addresses (names or numeric addresses) and the sockaddr structures, I think you'll see why it's needed.
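A sketch of the usual getaddrinfo() loop, with hypothetical host/port parameters; note that the caller only learns the address family from each result's ai_family and the sa_family inside ai_addr:

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to host:port without knowing in advance whether the
 * name resolves to IPv4, IPv6, or both. */
int connect_by_name(const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* let the resolver pick the family */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        /* rp->ai_addr->sa_family tells us what kind of sockaddr this is */
        fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                    /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}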
With that said, the whole struct sockaddr stuff is a huge mess of misdesigns, especially the userspace endian conversion.
Another good instance of why the sa_family field is needed is the getsockname and getpeername interfaces. If the program inherited the file descriptor from another program, and doesn't already know what type of socket it is, it needs to be able to determine that in order to make new connections or even convert the address to a representation suitable for interchange.
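A sketch of that situation, assuming fd was inherited from another process; the switch is just illustrative:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Given a socket fd inherited from elsewhere, find out what
 * family of address it is bound to. */
void print_socket_family(int fd)
{
    struct sockaddr_storage ss;        /* big enough for any sockaddr_* */
    socklen_t len = sizeof(ss);
    memset(&ss, 0, sizeof(ss));

    if (getsockname(fd, (struct sockaddr *)&ss, &len) == -1) {
        perror("getsockname");
        return;
    }
    switch (ss.ss_family) {
    case AF_INET:  puts("IPv4 socket");        break;
    case AF_INET6: puts("IPv6 socket");        break;
    case AF_UNIX:  puts("Unix domain socket"); break;
    default:       printf("family %d\n", ss.ss_family);
    }
}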
If you look at the network code for 4.2BSD, where the sockets interface originated, you'll see that the sockaddr is passed to the network interface drivers but the socket fd is not.
The sa_family field is used to tell what type of address will be in the sa_data field. In a lot of applications, the address family is assumed to be IPv4. However, many applications also support IPv6.
I started playing around with sockets in C, and when I bind a socket to an address using the bind() system call I have to specify the addrlen parameter.
Why is the address length necessary in a socket?
The bind function (syscall) is a generic function that has to cope with several types of addresses: IPv4, IPv6, Bluetooth, Unix domain sockets, and so on. Each address type may have a different size from the others, so you have to make it clear to bind which address you're passing by passing its size.
bind is a syscall; the userspace bind function is just a wrapper used to interact with kernel space. When you create a socket via the socket syscall, a record is created in the file descriptor table of the calling process, and that record includes the type of the socket.
When you call bind and pass an address to it, the address has to be copied into kernel space, but how big is that address? The userspace bind wrapper doesn't know anything about the socket you're binding, because the socket record lives in kernel space, so it has no way of knowing how large the address needs to be. bind really just copies the address data into kernel space and notifies the kernel about it.
On the other hand, bind could not determine the address type at runtime either, because there is no runtime type checking in pure C.
So bind knows nothing about the address by itself, and you have to specify the address size so that the address structure can be copied to kernel space completely.
There are several different sorts of socket addresses.
Each has its own sockaddr_* structure, e.g. sockaddr_in for AF_INET, sockaddr_in6 for AF_INET6, etc.
Passing the length allows the kernel to check that the passed data is consistent with the socket type.
The syscalls are protocol-independent; for example, they apply to
IPv4 (sockaddr_in)
IPv6 (sockaddr_in6)
Unix Domain Sockets (sockaddr_un)
all of which may have address structures of different lengths. This is why you have to specify the length of the corresponding structure.
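A throwaway sketch that prints those sizes, just to make the point concrete (the exact numbers vary by platform):

#include <stdio.h>
#include <netinet/in.h>   /* sockaddr_in, sockaddr_in6 */
#include <sys/un.h>       /* sockaddr_un */
#include <sys/socket.h>   /* sockaddr */

int main(void)
{
    printf("sockaddr     : %zu bytes\n", sizeof(struct sockaddr));
    printf("sockaddr_in  : %zu bytes\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6 : %zu bytes\n", sizeof(struct sockaddr_in6));
    printf("sockaddr_un  : %zu bytes\n", sizeof(struct sockaddr_un));
    return 0;
}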
bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr))
It takes three arguments: the socket file descriptor, the address to which it is bound, and the size of that address.
Why is the address length necessary in a socket?
It's the size limit of the address to which the socket is bound: it tells bind how many bytes of the address structure are valid.
For example, I would handle it as below:
if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
{
    error("ERROR on binding");
}
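For context, a minimal sketch of how serv_addr might be prepared before that call; the port 8080 is arbitrary and perror() stands in for the post's error() helper:

#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("ERROR opening socket");
        exit(1);
    }

    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    serv_addr.sin_port = htons(8080);               /* example port */

    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        perror("ERROR on binding");
        exit(1);
    }
    return 0;
}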
I'm sending some ping packets via a raw socket in C, on my linux machine.
int sock_fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
This means that I specify the IP packet header when I write to the socket (IP_HDRINCL is implied).
Writing to the socket with send fails, telling me I need to specify an address.
If I use sendto then it works. For sendto I must specify a sockaddr_in struct to use, which includes the fields sin_family, sin_port and sin_addr.
However, I have noticed a few things:
The sin_family is AF_INET - which was already specified when the socket was created.
The sin_port is naturally unused (ports are not a concept for IP).
It doesn't matter what address I use, so long as it is an external address (the IP packet specifies 8.8.8.8 and the sin_addr specifies 1.1.1.1).
It seems none of the extra fields in sendto are actually used to great extent. So, is there a technical reason why I have to use sendto instead of send or is it just an oversight in the API?
Writing to the socket with send fails, telling me I need to specify an address.
It fails, because the send() function can only be used on connected sockets (as stated here). Usually you would use send() for TCP communication (connection-oriented) and sendto() can be used to send UDP datagrams (connectionless).
Since you want to send "ping" packets, or more correctly ICMP datagrams, which are clearly connectionless, you have to use the sendto() function.
It seems none of the extra fields in sendto are actually used to great extent. So, is there a technical reason why I have to use sendto instead of send or is it just an oversight in the API?
Short answer:
When you are not allowed to use send(), then there is only one option left, called sendto().
Long answer:
It is not just an oversight in the API. If you want to send a UDP datagram by using an ordinary socket (e.g. SOCK_DGRAM), sendto() needs the information about the destination address and port, which you provided in the struct sockaddr_in, right? The kernel will insert that information into the resulting IP header, since the struct sockaddr_in is the only place where you specified who the receiver will be. Or in other words: in this case the kernel has to take the destination info from your struct as you don't provide an additional IP header.
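As a sketch of that case, here is a plain UDP send where the destination exists only in the struct handed to sendto(); the address 192.0.2.1 and port 5353 are arbitrary placeholders:

#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int send_udp_example(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5353);                       /* example port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* example address */

    const char msg[] = "hello";
    /* The kernel builds the IP/UDP headers from dst; there is no other
     * place the destination could come from. */
    ssize_t n = sendto(fd, msg, sizeof(msg) - 1, 0,
                       (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return (int)n;
}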
Because sendto() is not only used for UDP but also raw sockets, it has to be a more or less "generic" function which can cover all the different use cases, even when some parameters like the port number are not relevant/used in the end.
For instance, by using IPPROTO_RAW (which automatically implies IP_HDRINCL), you show your intention that you want to create the IP header on your own. Thus the last two arguments of sendto() are actually redundant information, because they're already included in the data buffer you pass to sendto() as the second argument. Note that, even when you use IP_HDRINCL with your raw socket, the kernel will fill in the source address and checksum of your IP datagram if you set the corresponding fields to 0.
If you want to write your own ping program, you could also change the last argument in your socket() function from IPPROTO_RAW to IPPROTO_ICMP and let the kernel create the IP header for you, so you have one thing less to worry about. Now you can easily see how the two sendto()-parameters *dest_addr and addrlen become significant again because it's the only place where you provide a destination address.
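A rough sketch of that IPPROTO_ICMP variant, assuming Linux's struct icmphdr and raw-socket privileges; the helper names are mine and error handling is minimal:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>   /* struct icmphdr, ICMP_ECHO (Linux) */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* One's-complement checksum used by ICMP. */
static unsigned short icmp_checksum(const void *data, size_t len)
{
    const unsigned short *p = data;
    unsigned long sum = 0;
    while (len > 1) { sum += *p++; len -= 2; }
    if (len == 1) sum += *(const unsigned char *)p;
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return (unsigned short)~sum;
}

int send_ping(const char *dest_ip)
{
    /* The kernel builds the IP header; we only supply the ICMP part. */
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (fd == -1)
        return -1;

    struct icmphdr icmp;
    memset(&icmp, 0, sizeof(icmp));
    icmp.type = ICMP_ECHO;
    icmp.un.echo.id = htons(getpid() & 0xffff);
    icmp.un.echo.sequence = htons(1);
    icmp.checksum = icmp_checksum(&icmp, sizeof(icmp));

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;                 /* sin_port stays 0: unused for ICMP */
    inet_pton(AF_INET, dest_ip, &dst.sin_addr);

    /* This sockaddr is where the kernel learns the destination for the IP header. */
    ssize_t n = sendto(fd, &icmp, sizeof(icmp), 0,
                       (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return (int)n;
}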
The language and APIs are very old and have grown over time. Some APIs can look weird from today's perspective, but you can't change the old interfaces without breaking a huge amount of existing code. Sometimes you just have to get used to things that were defined/designed many years or decades ago.
Hope that answers your question.
The send() call is used when the sockets are in a TCP SOCK_STREAM connected state.
From the man page:
the send() call may be used only when the socket is in a connected state (so that the intended recipient is known).
Since your application obviously does not connect with any other socket, we cannot expect send() to work.
In addition to InvertedHeli's answer, the dest_addr passed to sendto() will be used by the kernel to determine which network interface to use.
For example, if dest_addr has the IP 127.0.0.1 and the raw packet has destination address 8.8.8.8, your packet will still be routed to the lo interface.
I came across this snippet of code which appeared in the guts of setting up a socket:
#define PORT xxxx
struct sockaddr_in self;
self.sin_family = PF_INET;
self.sin_port = htons(PORT);
I understand that we need to convert byte order of the data that we are transmitting over a network to Network Byte Order but I don't get why we need to convert port number to that as well when setting up a socket. I mean, when we do bind, isn't it a "local" thing? Say the port we intend to be bound is 1 and the machine actually uses little endian; now since we converted it into Network Byte Order, wouldn't we be binding a totally different port to the socket?
Let's assume you are using TCP. The port number is going to be in the packet header, which is transmitted over the network, so it has to be in network byte order.
Are you asking why you, the application programmer, do it instead of the library doing it internally? If so, the only technical advantage I can think of is that it allows the application to do the conversion once, cache it, and use it many times without requiring many conversions.
On TCP you only need to use it once per connection and typically won't make millions of connections. But on UDP you use it every time you send a packet, and it's reasonable to assume you'd make millions or billions of such calls.
Then, for myriad calls to, say, sendto() for UDP, the already-converted address is supplied to the OS, which can copy it as-is directly into outgoing network packets.
The alternative of doing it in the kernel would require every call to sendto() to take what the app knows as the same address over and over, and reconvert it every time.
Since sendto() benefits from this, that was perhaps sufficient reason for them to have the rest of the API work the same way.
You transmit the port number over the network: it is part of the TCP header carried inside the IP packet. Look up the RFC (ietf.org/rfc/rfc793.txt).
The struct sockaddr_in is just an IPv4-specific overlay of the generic struct sockaddr:
struct sockaddr {
    unsigned short sa_family;
    char           sa_data[14];
};
The port number and IP address are held together here, in sa_data[14]: the first 2 bytes hold the port number and the next 4 bytes hold the IP address. The remaining 8 bytes are unused; these are the 8 bytes you clear to zeroes via sin_zero[8] when you use sockaddr_in.
This is the information that goes out over the network, including the port number, so it is kept in network byte order.
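A small sketch that makes the overlay visible: fill in a sockaddr_in, view it through struct sockaddr, and the first two bytes of sa_data are the port in network byte order (port 7777 and address 10.0.0.1 are arbitrary):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in in;
    memset(&in, 0, sizeof(in));
    in.sin_family = AF_INET;
    in.sin_port = htons(7777);                 /* 7777 == 0x1E61 */
    inet_pton(AF_INET, "10.0.0.1", &in.sin_addr);

    /* View the same bytes through the generic struct sockaddr. */
    struct sockaddr *sa = (struct sockaddr *)&in;
    printf("family: %u\n", sa->sa_family);
    printf("port bytes:    %02x %02x\n",
           (unsigned char)sa->sa_data[0], (unsigned char)sa->sa_data[1]);
    printf("address bytes: %02x %02x %02x %02x\n",
           (unsigned char)sa->sa_data[2], (unsigned char)sa->sa_data[3],
           (unsigned char)sa->sa_data[4], (unsigned char)sa->sa_data[5]);
    return 0;
}

Regardless of the host's endianness, the port bytes print as 1e 61, because htons() already put them in network byte order.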
Machines can use different byte orders (little or big endian). To standardize that, you use a uniform encoding when communicating over the network. That's why you have to convert to network byte order regardless of whether the host is little or big endian; what matters is that the encoding is uniform and understood correctly by every device and piece of software on the network.
I'm wondering how feasible it is to be able to convert an AF_INET socket to use an AF_UNIX instead. The reason for this is that I've got a program which will open a TCP socket, but which we cannot change. To reduce overhead we would therefore like to instead tie this socket to use an AF_UNIX one for its communication instead.
So far, my idea has been to use LD_PRELOAD to achieve this by intercepting bind() and accept(); however, it is not clear how best to do this, or even if this is the best approach.
So far, in bind(), if the socket type is AF_INET and its IP/port matches the socket I wish to convert to AF_UNIX, I close the socket fd there and open an AF_UNIX one instead. However, this seems to be causing problems further on in accept(), because I am unsure what to do when the sockfd in accept() matches the one I want to redirect to an AF_UNIX socket.
Any help kindly appreciated.
Jason
Your idea sounds perfectly feasible. In fact, I think it sounds like the best way to achieve what you want. I wouldn't expect very different, or even measurably different, overhead/performance though.
Of course you'd also have to intercept socket() in addition to bind() and accept(). In bind(), you could, for example, convert the requested TCP port to a fixed pathname /tmp/converted_socket.<port-number> or something like that.
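A hedged sketch of what such an interposed bind() might look like; the /tmp/converted_socket.<port> mapping is just the example from above, and it assumes socket() has also been intercepted so that sockfd is already an AF_UNIX socket:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Compile as a shared object and load with LD_PRELOAD. */
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
{
    static int (*real_bind)(int, const struct sockaddr *, socklen_t);
    if (!real_bind)
        real_bind = dlsym(RTLD_NEXT, "bind");

    if (addr->sa_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)addr;
        unsigned short port = ntohs(in->sin_port);

        /* Map the requested TCP port onto a fixed pathname. */
        struct sockaddr_un un;
        memset(&un, 0, sizeof(un));
        un.sun_family = AF_UNIX;
        snprintf(un.sun_path, sizeof(un.sun_path),
                 "/tmp/converted_socket.%u", port);

        /* NOTE: this only works if sockfd is already an AF_UNIX socket,
         * i.e. socket() was intercepted (or the fd recreated) as well;
         * otherwise the kernel rejects the mismatched family. */
        return real_bind(sockfd, (const struct sockaddr *)&un, sizeof(un));
    }
    return real_bind(sockfd, addr, addrlen);
}

Built with something like cc -shared -fPIC -o shim.so shim.c -ldl and run under LD_PRELOAD=./shim.so, this would rewrite matching bind() calls; accept() would need similar treatment, as described above.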
I had a similar problem and came up with unsock, a shim library that does what you describe, and more.
unsock supports other address types like AF_VSOCK and AF_TIPC, and the Firecracker VM multiplexing proxy as well.
There are three key insights I want to share:
Since sockets are created for a particular address family using socket(2), and then later connected or bound using connect(2)/bind(2), you may be tempted to simply intercept socket and fix the address there.
One problem is that you may want to selectively intercept certain addresses only, which you don't know at the time of the call.
The other problem is that file descriptors may be passed to you from another process (e.g., via AF_UNIX ancillary messages), so you may not be able to intercept socket(2) in the first place.
In other words, you need to intercept connect(2), bind(2), and sendto(2).
When you intercept connect(2), bind(2), and sendto(2), you need to retroactively change the address family for socket(2). Thankfully, you can just create a new socket and use dup3(2) to reassign the new socket to the existing file descriptor. This saves a lot of housekeeping!
accept(2) and recvfrom(2) also need to be intercepted, and the returned addresses converted back to something the caller understands. This will inevitably break certain assumptions unless you maintain a mapping back to the actual, non-AF_INET address.
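Going back to the second point, here is a minimal sketch of the dup3(2) trick, assuming an AF_UNIX stream socket at an arbitrary path; the helper name is mine:

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Replace whatever socket is behind `fd` with a connected AF_UNIX
 * socket, keeping the same file descriptor number. */
int swap_to_unix(int fd, const char *path)
{
    int ufd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (ufd == -1)
        return -1;

    struct sockaddr_un un;
    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);

    if (connect(ufd, (struct sockaddr *)&un, sizeof(un)) == -1 ||
        dup3(ufd, fd, 0) == -1) {
        close(ufd);
        return -1;
    }
    /* dup3 atomically closed the old socket behind fd and made fd refer
     * to the AF_UNIX one; the temporary ufd is no longer needed. */
    close(ufd);
    return fd;
}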
Why does connect(2) take a struct sockaddr * rather than something like a struct in_addr *? I'm curious because the man page of connect(2) is pretty short, and the pointer is normally cast anyway.
sockaddr_in and in_addr aren't even similar; there's no way that would work. They're different because more than an address is usually needed. For example, a port number is needed to connect an IP socket.
How the connect(2) call works depends on the socket domain you're using (i.e. the value passed as the first parameter to socket(2)). If you're using the AF_INET protocol family, then connect expects a sockaddr_in. If you're using the AF_INET6 protocol family, then it expects a sockaddr_in6. Other protocol families have their own address structures.
Whichever protocol family you're using, you should only use that family's address structure, cast to a struct sockaddr, when calling connect or any other socket functions.
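For AF_INET that looks roughly like the following sketch; the helper name, address, and port are placeholders:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int connect_ipv4(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;              /* the AF_INET address structure */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    /* Cast to the generic struct sockaddr * only at the call site. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}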