What is the purpose of the sa_data field in a sockaddr? - c

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The actual structure passed for the addr argument will depend on the address family. The sockaddr structure is defined as something like:
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
};
So for an IPv4 address (AF_INET), the actual struct that will be passed is this:
/* Source http://linux.die.net/man/7/ip */
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address. */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
Does the bind code read the sockaddr.sa_family value and depending on the value it finds, it will then cast the sockaddr struct into the appropriate struct such as sockaddr_in?
Why is the sa_data set to 14 characters? If I understand correctly, the sa_data field is just a field with enough memory to fit all address family types? Presumably the original designers anticipated that 14 characters would be wide enough to fit all future types.

According to the glibc manual:
The length 14 of sa_data is essentially arbitrary.
And the FreeBSD developers handbook mentions the following:
Please note the vagueness with which the sa_data field is declared,
just as an array of 14 bytes, with the comment hinting there can be
more than 14 of them.
This vagueness is quite deliberate. Sockets is a very powerful
interface. While most people perhaps think of it as nothing more than
the Internet interface—and most applications probably use it for that
nowadays—sockets can be used for just about any kind of interprocess
communications, of which the Internet (or, more precisely, IP) is only
one.
Yes, the sa_family field is used to recognize how to treat the struct passed (which is cast to struct sockaddr* in a call to bind). You can read more about how it works also in a FreeBSD developers handbook.
And actually there are "polymorphic" (sub)types of sockaddr in which the address data occupies more than the 14 bytes of sa_data, for example:
struct sockaddr_un {
sa_family_t sun_family; /* AF_UNIX */
char sun_path[108]; /* pathname */
};
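As a minimal sketch of how such a larger subtype is filled in (assuming a POSIX system; the helper name make_unix_addr is hypothetical and not part of any API), the length handed to bind() has to cover the whole concrete struct, not just sizeof(struct sockaddr):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill *out for the given filesystem path.
 * Returns the length to pass as addrlen, or 0 if the path does not fit. */
static socklen_t make_unix_addr(struct sockaddr_un *out, const char *path)
{
    assert(out != NULL && path != NULL);

    size_t n = strlen(path);
    if (n >= sizeof out->sun_path)
        return 0;                        /* no room for the terminating NUL */

    memset(out, 0, sizeof *out);         /* clear padding and the unused tail of sun_path */
    out->sun_family = AF_UNIX;
    memcpy(out->sun_path, path, n + 1);  /* copy the path including its NUL */
    return (socklen_t)sizeof *out;
}
```

A caller would then pass the result on, for example bind(fd, (struct sockaddr *)&addr, len), where fd is an AF_UNIX socket.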

The sockaddr struct is used as a tagged union. By reading the sa_family field it can be cast to a struct of the proper form.
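The tagged-union idea can be sketched as follows. This is only an illustration, not how any particular kernel or libc implements it; the helper name sockaddr_kind is hypothetical. It reads the family through the generic pointer, which is the access POSIX-style socket code relies on:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: name the concrete struct an address really is,
 * judging only by its family tag. */
static const char *sockaddr_kind(const struct sockaddr *sa)
{
    assert(sa != NULL);

    switch (sa->sa_family) {
    case AF_INET:  return "sockaddr_in";
    case AF_INET6: return "sockaddr_in6";
    case AF_UNIX:  return "sockaddr_un";
    default:       return "unknown";
    }
}
```

Once the tag is known, the code can convert the pointer back to the matching concrete type (or copy the bytes into an object of that type) and read the family-specific members.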
The 14 bytes is arbitrary. It's big enough to hold IPv4 addresses, but not big enough to hold IPv6 addresses. There is also a sockaddr_storage struct which is big enough for both. Reading the Microsoft docs on SOCKADDR_STORAGE, it comes in at 128 bytes, so much larger than needed for IPv6. Checking some Linux headers, it seems to be at least that large there as well.
For reference, the IPv6 struct is:
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct in6_addr {
unsigned char s6_addr[16]; // IPv6 address
};
As you can see, the 16 byte s6_addr field is already bigger than the 14 byte sa_data field on its own. Total size after the sa_family field is 26 bytes.
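The size relationships above can be checked at compile time. This is a sketch that assumes a C11 compiler and common Linux or BSD-derived headers; the helper fits_in_storage is hypothetical:

```c
#include <assert.h>      /* static_assert (C11) */
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Expected to hold on common platforms; a failure stops compilation. */
static_assert(sizeof(struct sockaddr_in) <= sizeof(struct sockaddr),
              "an IPv4 address fits in struct sockaddr");
static_assert(sizeof(struct sockaddr_in6) > sizeof(struct sockaddr),
              "an IPv6 address does not fit in struct sockaddr");
static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6),
              "struct sockaddr_storage can hold an IPv6 address");

/* Hypothetical helper: can an address of len bytes be stored in a
 * variable declared as struct sockaddr_storage? */
static int fits_in_storage(size_t len)
{
    return len <= sizeof(struct sockaddr_storage);
}
```

In practice this is why code that receives an address of unknown family, for example from accept() or recvfrom(), usually declares a struct sockaddr_storage rather than a struct sockaddr.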

Related

How does "transfer/casting" between 2 structs that have different structures works in C?

I'm learning HTTP protocol following a tutorial which gives an understandable piece of code and here's part of it.
struct sockaddr_in address;
...
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons( PORT );
memset(address.sin_zero, '\0', sizeof address.sin_zero);
if (bind(server_fd, (struct sockaddr *)&address, sizeof(address))<0)
{
perror("In bind");
exit(EXIT_FAILURE);
}
The example code works well, although I don't understand the kind of transfer that happens between the two structs.
the definition of struct sockaddr_in in <netinet/in.h> is
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
the definition of struct sockaddr in <sys/socket.h> is
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
They have different structures, so how does the "transfer/casting" work there?
I don't understand the kind of transfer between two structs.
There is no data transfer between different structs, nor any conversion of structure objects. In bind(server_fd, (struct sockaddr *)&address, sizeof(address)), a pointer to a struct is converted to a different object pointer type. This is explicitly allowed by C.
The C language specification does not define any behavior for accessing the struct via the converted pointer. Any attempt to do so would violate the strict aliasing rule, but that's not your problem. The example you presented demonstrates an utterly standard usage idiom for the bind() function, for which it was designed. Therefore, you can rely on the bind() implementation to do the right thing with it, by whatever magic is required.
Conceptually, though, you can observe that the first two members of struct sockaddr and struct sockaddr_in have the same data types. You could imagine, then, that bind is able to access those two members via the converted pointer, despite it constituting a strict-aliasing violation. Although C does not define behavior for that, POSIX implicitly requires that it work in at least this case. Having then done that, the second of those members indicates the address family, by which bind() can invoke the appropriate behavior for the address's actual type.
That is a variation on C-style polymorphism. It is helped out by the third bind argument, the size of the address object, which enables bind() to copy the address object without knowing its true effective data type.
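The role of the length argument can be sketched like this. It is a hypothetical stand-in for what an implementation might do internally, not the actual code of any bind(); the name store_address is invented for illustration. The bytes are copied without knowing the concrete type, and only afterwards is the family tag inspected:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers */

/* Hypothetical helper: copy len bytes of an address of unknown concrete
 * type into generic storage. Returns 0 on success, -1 on a bad length. */
static int store_address(struct sockaddr_storage *dst,
                         const struct sockaddr *src, socklen_t len)
{
    assert(dst != NULL && src != NULL);

    /* The address must at least contain the header that precedes sa_data. */
    if ((size_t)len < offsetof(struct sockaddr, sa_data) ||
        (size_t)len > sizeof *dst)
        return -1;

    memset(dst, 0, sizeof *dst);
    memcpy(dst, src, (size_t)len);   /* plain byte copy, no type knowledge needed */
    return 0;
}
```

After the copy, dst->ss_family tells the caller which concrete struct the bytes represent.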
These structure types and the bind() API could have been defined a bit differently to avoid the implied strict-aliasing violation, but that wasn't necessary in early C, where member names corresponded directly to offsets from the beginning of the structure. And where those names were global, which is why you see the sin_ and sa_ prefixes in those member names, and similar in many other structure types provided by the system. Nowadays, it's best to just accept that that's how bind() is used, and it's up to the system to provide a bind() implementation that accommodates it.
The casting works.
Looking at the two structures:
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
The first two members (sin_len and sa_len, sin_family and sa_family) are not problematic, because they have the same data types. The padding for sa_family_t works exactly the same in both structures.
Looking at the reference,
in_port_t Equivalent to the type uint16_t as described in <inttypes.h>
in_addr_t Equivalent to the type uint32_t as described in <inttypes.h>
For windows, struct in_addr looks like below:
struct in_addr {
union {
struct {
u_char s_b1;
u_char s_b2;
u_char s_b3;
u_char s_b4;
} S_un_b;
struct {
u_short s_w1;
u_short s_w2;
} S_un_w;
u_long S_addr;
} S_un;
};
and that for a linux is:
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
The whole confusion you might have is because of how the contents align. However, it is a well-thought historic design. It is intended to accommodate implementation-dependent aspects in the design.
Here, "implementation-dependent" refers to the fact that the definition of struct in_addr is not consistent across all systems, as seen above.
In a nutshell, this entire magic works because of two things: the exact size and padding of the first two members, and the fact that sa_data[14] is a char array, or more precisely an array of a 1-byte data type. This design trick with a union inside a struct has been widely used.
Unix Network Programming Volume 1 states:
The reason the sin_addr member is a structure, and not just an in_addr_t, is historical. Earlier releases (4.2BSD) defined the in_addr structure as a union of various structures, to allow access to each of the 4 bytes and to both of the 16-bit values contained within the 32-bit IPv4 address. This was used with class A, B, and C addresses to fetch the appropriate bytes of the address. But with the advent of subnetting and then the disappearance of the various address classes with classless addressing, the need for the union disappeared. Most systems today have done away with the union and just define in_addr as a structure with a single in_addr_t member.
Not what you asked for, but good to know:
The same header states:
The sockaddr_in structure is used to store addresses for the Internet address family. Values of this type shall be cast by applications to struct sockaddr for use with socket functions.
So, sockaddr_in is a struct specific to IP-based communication and sockaddr is more of a generic structure for socket operations.
Just a try:
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
int main(void)
{
printf("sizeof(struct sockaddr_in) = %zu bytes\n", sizeof(struct sockaddr_in));
printf("sizeof(struct sockaddr) = %zu bytes\n", sizeof(struct sockaddr));
return 0;
}
Prints:
sizeof(struct sockaddr_in) = 16 bytes
sizeof(struct sockaddr) = 16 bytes
I think this cast breaks the strict aliasing rule and then is undefined behaviour if the bind function dereferences the pointer.
In practice the code assumes that all fields of struct sockaddr_in are contiguous, so you can access a buffer of bytes either as a struct sockaddr_in or as a struct sockaddr equivalently. But the fields of a structure are not guaranteed to be contiguous. If in_port_t is two bytes long, for example, there may very well be a hole between sin_port and sin_addr with a compiler for a 32-bit machine, because it may want to align the sin_addr field on a 32-bit boundary.
This way of coding is frequent when you develop a communication interface driver: you receive a buffer of bytes that needs to be interpreted as a data structure (for example: the first byte is an address, the following bytes are a length, and so on). Casting from one structure to another avoids copying data.
Note that compilers usually provide non-standard ways to guarantee that all fields of a structure are contiguous. For example, with gcc it is __attribute__((packed)).
Now, to answer your question: provided the structures are packed and there is no undefined behaviour, the cast basically does nothing. sa_data will be the array of bytes located after the field sin_family. So this array will consist of sin_port, followed by sin_addr, followed by the array sin_zero.
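Whether sa_data really overlays sin_port, sin_addr and sin_zero on a given platform can be checked with offsetof. This is a sketch with a hypothetical helper name; the result depends on the platform ABI, although it is expected to be true with common Linux and macOS headers:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: returns 1 if sa_data covers exactly
 * sin_port (2 bytes), sin_addr (4 bytes) and sin_zero, in that order. */
static int sa_data_overlays_in_fields(void)
{
    size_t base = offsetof(struct sockaddr, sa_data);

    return offsetof(struct sockaddr_in, sin_port) == base
        && offsetof(struct sockaddr_in, sin_addr) == base + 2
        && offsetof(struct sockaddr_in, sin_zero) == base + 6;
}
```

If the function returns 0, the compiler has inserted padding somewhere and the two layouts do not line up byte for byte.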
EDIT: I compiled the following structures on STM32H7 (ARM Cortex-M7, 32-bit architecture) with arm-none-eabi-gcc:
struct in_addr {
uint32_t s_addr;
};
struct sockaddr_in {
uint8_t sin_len;
uint16_t sin_family;
uint16_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct sockaddr {
uint8_t sa_len;
uint16_t sa_family;
char sa_data[14];
};
The size of sockaddr_in is 20.
The size of sockaddr is 18.
Note that if sa_family_t is of type char and not short, then, due to alignment, both structures are the same size.

UNIX sendto() without destination port

I'm new to unix socket programming. I have some questions about unix sendto() function.
ssize_t sendto(int sockfd,
const void *buf,
size_t len,
int flags,
const struct sockaddr *dest_addr,
socklen_t addrlen);
My questions are:
(1) If port number in dest_addr is not set, how would the receiver host deal with the packet?
(2) How does this function process the info in dest_addr? To send an IPv6 packet with this function, I have to use the sockaddr_in6 struct, which is totally different from the sockaddr_in struct:
struct sockaddr_in {
short sin_family; // e.g. AF_INET, AF_INET6
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
and why do we need a cast to struct sockaddr in sendto()? Looks like only sa_family is meaningful to this function. Then what about other fields?
Regarding 1: there is no special "undefined" value. Whatever value happens to be stored in this memory location will be taken as the desired port number.
Regarding 2: each struct sockaddr starts with an address family identifier. If the family is set to AF_INET, all functions will know to expect a struct sockaddr_in. Likewise for the other families. In that sense, I believe the example you posted (which seems to imply AF_INET6 can be a valid value of sin_family in a struct sockaddr_in) to be misleading.
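Both points suggest initializing the whole structure before use, so that no field holds leftover memory contents. A minimal sketch for an IPv6 destination, assuming a POSIX system (the helper name make_in6_dest is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: build an IPv6 destination for sendto().
 * Returns 0 on success, -1 if ip is not a valid IPv6 literal. */
static int make_in6_dest(struct sockaddr_in6 *out, const char *ip, uint16_t port)
{
    assert(out != NULL && ip != NULL);

    memset(out, 0, sizeof *out);      /* no stale bytes in port, flowinfo or scope id */
    out->sin6_family = AF_INET6;      /* the tag that sendto() dispatches on */
    out->sin6_port = htons(port);     /* network byte order */

    if (inet_pton(AF_INET6, ip, &out->sin6_addr) != 1)
        return -1;
    return 0;
}
```

The result would be passed as sendto(fd, buf, len, 0, (struct sockaddr *)&dest, sizeof dest), where fd is an AF_INET6 datagram socket.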

Extracting IP and port from multiple sockaddr_storage's into a char*

I have a fixed length array, every entry is of type struct contact
typedef struct contact
{
int fd;
union
{
struct sockaddr_in v4addr;
struct sockaddr_in6 v6addr;
struct sockaddr_storage stor;
};
char buf[FRAME_BUF_LEN];
int len;
char name[32];
} contact_t;
and I need to extract the IP and port for every entry into a char*.
The result should look like this
192.168.0.1 1234\n192.168.0.2 1235\n192.168.0.3 1236\n //and so on..
I honestly have no clue how to get the information and allocate the correct size for the final char*.
Use (for example) struct sockaddr_storage stor's member ss_family to determine the address family and, depending on this, choose v4addr or v6addr to be used with inet_ntop().
The port number comes in network byte order, so it must be passed to ntohs() before being used.
The members of v4addr and v6addr to be used could be drawn from <netinet/in.h>:
/* Structure describing an Internet socket address. */
struct sockaddr_in
{
[...]
in_port_t sin_port; /* Port number. */
struct in_addr sin_addr; /* Internet address. */
[...]
};
/* Ditto, for IPv6. */
struct sockaddr_in6
{
[...]
in_port_t sin6_port; /* Transport layer port # */
[...]
struct in6_addr sin6_addr; /* IPv6 address */
[...]
};
To create buffers with a size unknown at compile time, use dynamic memory allocation in general.
For successively allocating a memory block of increasing size, as when looping through your array and adding address:port tuples, use realloc() in particular.
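Putting these pieces together, one possible sketch looks like this. It works on a plain array of struct sockaddr_storage rather than on the contact_t from the question, and both helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: write "ip port\n" for one address into buf.
 * Returns the number of characters written, or -1 on failure. */
static int format_addr(const struct sockaddr_storage *ss, char *buf, size_t cap)
{
    char ip[INET6_ADDRSTRLEN];
    unsigned port;

    if (ss->ss_family == AF_INET) {
        struct sockaddr_in v4;
        memcpy(&v4, ss, sizeof v4);           /* copy out the IPv4 view */
        if (!inet_ntop(AF_INET, &v4.sin_addr, ip, sizeof ip))
            return -1;
        port = ntohs(v4.sin_port);
    } else if (ss->ss_family == AF_INET6) {
        struct sockaddr_in6 v6;
        memcpy(&v6, ss, sizeof v6);           /* copy out the IPv6 view */
        if (!inet_ntop(AF_INET6, &v6.sin6_addr, ip, sizeof ip))
            return -1;
        port = ntohs(v6.sin6_port);
    } else {
        return -1;                            /* unsupported address family */
    }

    int n = snprintf(buf, cap, "%s %u\n", ip, port);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Hypothetical helper: join all addresses into one malloc'd string.
 * The caller frees the result; returns NULL on failure. */
static char *addrs_to_string(const struct sockaddr_storage *addrs, size_t count)
{
    assert(addrs != NULL || count == 0);

    size_t len = 0;
    char *out = malloc(1);
    if (!out)
        return NULL;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        char line[INET6_ADDRSTRLEN + 8];      /* ip, space, 5 digits, newline, NUL */
        int n = format_addr(&addrs[i], line, sizeof line);
        if (n < 0) {
            free(out);
            return NULL;
        }
        char *grown = realloc(out, len + (size_t)n + 1);
        if (!grown) {
            free(out);
            return NULL;
        }
        out = grown;
        memcpy(out + len, line, (size_t)n + 1);   /* append including the NUL */
        len += (size_t)n;
    }
    return out;
}
```

With the contact_t from the question, the same formatting step could be applied to each entry's stor member inside the loop.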

Why do we cast sockaddr_in to sockaddr when calling bind()?

The bind() function accepts a pointer to a sockaddr, but in all examples I've seen, a sockaddr_in structure is used instead, and is cast to sockaddr:
struct sockaddr_in name;
...
if (bind (sock, (struct sockaddr *) &name, sizeof (name)) < 0)
...
I can't wrap my head around why is a sockaddr_in struct used. Why not just prepare and pass a sockaddr?
Is it just convention?
No, it's not just convention.
sockaddr is a generic descriptor for any kind of socket operation, whereas sockaddr_in is a struct specific to IP-based communication (IIRC, "in" stands for "InterNet"). As far as I know, this is a kind of "polymorphism" : the bind() function pretends to take a struct sockaddr *, but in fact, it will assume that the appropriate type of structure is passed in; i. e. one that corresponds to the type of socket you give it as the first argument.
I don't know if it's very relevant for this question, but I would like to provide some extra info which may make the typecast more understandable, as many people who haven't spent much time with C get confused seeing such a typecast.
I use macOS, so I am taking examples based on header files from my system.
struct sockaddr is defined as follows:
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
struct sockaddr_in is defined as follows:
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
Starting from the very basics, a pointer just contains an address. So struct sockaddr * and struct sockaddr_in * are pretty much the same: they both just store an address. The only relevant difference is how the compiler treats the objects they point to.
So when you say (struct sockaddr *) &name, you are just telling the compiler to treat this address as pointing to a struct sockaddr.
So let's say the pointer is pointing to location 1000. If a struct sockaddr * stores this address, the compiler treats the memory from 1000 to 1000 + sizeof(struct sockaddr) as holding the members in that structure's definition. If a struct sockaddr_in * stores the same address, it treats the memory from 1000 to 1000 + sizeof(struct sockaddr_in) that way.
After the typecast, the compiler considers the same sequence of bytes, up to sizeof(struct sockaddr).
struct sockaddr *a = (struct sockaddr *) &name; // consider &name = 1000
Now if I access a->sa_len, the compiler reads sizeof(__uint8_t) bytes starting at location 1000, which is the same size and position as sin_len in sockaddr_in. So this accesses the same sequence of bytes.
Same pattern is for sa_family.
After that there is a 14 byte character array in struct sockaddr which stores data from in_port_t sin_port (typedef'd 16 bit unsigned integer = 2 bytes ) , struct in_addr sin_addr (simply a 32 bit ipv4 address = 4 bytes) and char sin_zero[8](8 bytes). These 3 add up to make 14 bytes.
Now these three are stored in this 14 bytes character array and we can access any of these three by accessing appropriate indices and typecasting them again.
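Reading the values back out of the 14-byte array can be sketched as below. This assumes an AF_INET address and the layout described above (port in the first two bytes of sa_data, IPv4 address in the next four); the helper names are hypothetical. The bytes are copied with memcpy rather than dereferenced through a cast pointer, which sidesteps alignment concerns:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: port in host byte order, taken from sa_data[0..1]. */
static uint16_t port_from_sa_data(const struct sockaddr *sa)
{
    uint16_t net_port;

    assert(sa != NULL && sa->sa_family == AF_INET);
    memcpy(&net_port, sa->sa_data, sizeof net_port);
    return ntohs(net_port);
}

/* Hypothetical helper: IPv4 address in host byte order, from sa_data[2..5]. */
static uint32_t ipv4_from_sa_data(const struct sockaddr *sa)
{
    uint32_t net_addr;

    assert(sa != NULL && sa->sa_family == AF_INET);
    memcpy(&net_addr, sa->sa_data + 2, sizeof net_addr);
    return ntohl(net_addr);
}
```

Real code would normally convert the pointer back to struct sockaddr_in * and use sin_port and sin_addr directly; the byte-level version only makes the overlay visible.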
user529758's answer already explains the reason to do this.
This is because bind can bind other types of sockets than IP sockets, for instance Unix domain sockets, which have sockaddr_un as their type. The address for an AF_INET socket has the host and port as their address, whereas an AF_UNIX socket has a filesystem path.

What does the abbreviation "s_", "ai_", "sin_", "in" (if such) in the IP structures mean?

Pretty simple questions. And yes, maybe not (that) important, but I'm really curious what do they mean and I couldn't find their meanings.
// ipv4
struct sockaddr_in {
short int sin_family; // Address family, AF_INET
unsigned short int sin_port; // Port number
struct in_addr sin_addr; // Internet address
unsigned char sin_zero[8]; // Same size as struct sockaddr
};
// ipv4
struct in_addr {
uint32_t s_addr; // that's a 32-bit int (4 bytes)
};
// generic: used for both IPv4 and IPv6
struct addrinfo {
int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
int ai_socktype; // SOCK_STREAM, SOCK_DGRAM
int ai_protocol; // use 0 for "any"
size_t ai_addrlen; // size of ai_addr in bytes
struct sockaddr *ai_addr; // struct sockaddr_in or _in6
char *ai_canonname; // full canonical hostname
struct addrinfo *ai_next; // linked list, next node
};
// generic: used for both IPv4 and IPv6
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
sin_ means sockaddr_in, ai_ means addrinfo, sa_ means sockaddr. I'm not sure about the s_ in in_addr. The sockets API was designed with pre-standard early 1980s C compilers in mind, which might have had a single namespace for all struct members.
larsman is mostly right, but it's not merely a matter of legacy single-namespace considerations. All the structs defined in standard headers use names of this form to avoid stepping on the application's namespace for macros. If struct members were not prefixed with ai_, sin_, etc., then whatever member names were included in the struct (including extensions not even specified in the C or POSIX standards) would clash and result in errors if an application defined the same name as a preprocessor macro. By using these "struct-local namespaces" that can be reserved by simple pattern rules in the standards (for instance, netdb.h reserves ai_*) there is a clear distinction between names reserved for use by the implementation and names reserved for use by the application, and new extensions or new revisions of the standard will not result in clashes.
The Microsoft definition of in_addr could imply that the "S_" prefix means struct as per #R..'s answer and seeing "un" for union, however they use a capital "S" unlike POSIX land.
typedef struct in_addr {
union {
struct {
u_char s_b1,s_b2,s_b3,s_b4;
} S_un_b;
struct {
u_short s_w1,s_w2;
} S_un_w;
u_long S_addr;
} S_un;
} IN_ADDR, *PIN_ADDR, FAR *LPIN_ADDR;
For the set of socket structures, "s" usually means "sock", short for "socket", so there does not appear to be a definitive reason.
For curiosity the MSDN page on IPv6 Support shows struct in6_addr containing the name s6_addr implying the IPv6 version of an IPv4 "socket" structure rather than just a "IPv4 structure".
