Casting between sockaddr and sockaddr_in6

I've been reading Beej's Guide to Network Programming, and in one of his examples he casts a pointer to struct sockaddr to a struct sockaddr_in6 pointer, as shown below.
void *addr;
char *ipver;
// get the pointer to the address itself,
// different fields in IPv4 and IPv6:
if (p->ai_family == AF_INET) { // IPv4
struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
addr = &(ipv4->sin_addr);
ipver = "IPv4";
} else { // IPv6
struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
addr = &(ipv6->sin6_addr);
ipver = "IPv6";
}
How is this possible, since the sizes of the structs are different?

ai_family and ai_addr are fields of the addrinfo struct, so presumably the code you are quoting had called getaddrinfo() beforehand.
The result of getaddrinfo() is a NULL-terminated linked list of addrinfo structs, where the addrinfo::ai_addr field is a pointer to an allocated memory block that is of sufficient size to hold a socket address of the reported addrinfo::ai_family type. The size of the address is reported in the addrinfo::ai_addrlen field.
For AF_INET, the addrinfo::ai_addr field is pointing at a memory block containing a sockaddr_in struct.
For AF_INET6, the addrinfo::ai_addr field is pointing at a memory block containing a sockaddr_in6 struct.
That is why the type-casts work.
The addrinfo::ai_addr field is declared as struct sockaddr* so it can be passed as-is to the addr parameter of the bind() and connect() functions without type-casting. The addrinfo::ai_addrlen field can be passed as-is to their addrlen parameter.
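As a minimal sketch of the point above, the following helper resolves a numeric address literal with getaddrinfo() and casts ai_addr according to ai_family. The function name numeric_to_text and the use of AI_NUMERICHOST (to avoid any DNS lookup) are assumptions made for illustration; they are not part of the quoted guide.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parses a numeric IPv4/IPv6 literal via getaddrinfo()
 * and writes its canonical text form to out.
 * Returns the address family on success, -1 on failure. */
int numeric_to_text(const char *literal, char *out, socklen_t outlen)
{
    struct addrinfo hints, *res;
    const void *addr = NULL;
    int family = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* literal only, no name lookup */

    if (getaddrinfo(literal, NULL, &hints, &res) != 0)
        return -1;

    /* ai_family says which concrete struct ai_addr really points at;
     * ai_addrlen confirms the block is large enough for that struct. */
    if (res->ai_family == AF_INET &&
        res->ai_addrlen >= sizeof(struct sockaddr_in)) {
        addr = &((struct sockaddr_in *)res->ai_addr)->sin_addr;
    } else if (res->ai_family == AF_INET6 &&
               res->ai_addrlen >= sizeof(struct sockaddr_in6)) {
        addr = &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr;
    }

    if (addr != NULL && inet_ntop(res->ai_family, addr, out, outlen) != NULL)
        family = res->ai_family;

    freeaddrinfo(res);
    return family;
}
```

Calling numeric_to_text("::1", buf, sizeof buf) with a buffer of INET6_ADDRSTRLEN bytes should take the sockaddr_in6 branch, while "127.0.0.1" takes the sockaddr_in branch.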

Related

Why do we cast sockaddr_in to sockaddr when calling bind()?

The bind() function accepts a pointer to a sockaddr, but in all examples I've seen, a sockaddr_in structure is used instead, and is cast to sockaddr:
struct sockaddr_in name;
...
if (bind (sock, (struct sockaddr *) &name, sizeof (name)) < 0)
...
I can't wrap my head around why a sockaddr_in struct is used. Why not just prepare and pass a sockaddr?
Is it just convention?
No, it's not just convention.
sockaddr is a generic descriptor for any kind of socket address, whereas sockaddr_in is a struct specific to IP-based communication (IIRC, "in" stands for "InterNet"). As far as I know, this is a kind of "polymorphism": the bind() function is declared to take a struct sockaddr *, but in fact it assumes that the appropriate type of structure is passed in, i.e. the one that corresponds to the type of socket you give it as the first argument.
I don't know if it's very relevant to this question, but I would like to provide some extra info that may make the typecast more understandable, since many people who haven't spent much time with C get confused when they see such a cast.
I use macOS, so I am taking examples based on header files from my system.
struct sockaddr is defined as follows:
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
struct sockaddr_in is defined as follows:
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
Starting from the very basics, a pointer just contains an address, so struct sockaddr * and struct sockaddr_in * hold the same kind of value. The only relevant difference is how the compiler interprets the object being pointed to.
So when you write (struct sockaddr *) &name, you are telling the compiler to treat this address as pointing to a struct sockaddr.
Say the pointer holds address 1000. Through a struct sockaddr *, the compiler treats the bytes from 1000 up to 1000 + sizeof(struct sockaddr) as holding the members of that structure definition. Through a struct sockaddr_in * holding the same address, it treats the bytes from 1000 up to 1000 + sizeof(struct sockaddr_in) that way.
After the cast, the same sequence of bytes is simply read through the struct sockaddr layout.
struct sockaddr *a = (struct sockaddr *) &name; // consider &name = 1000
Now if I access a->sa_len, the compiler reads sizeof(__uint8_t) bytes starting at location 1000, which is exactly where sin_len lives in sockaddr_in, so both names refer to the same byte.
The same holds for sa_family.
After that comes the 14-byte character array sa_data in struct sockaddr, which overlays in_port_t sin_port (a typedef'd 16-bit unsigned integer = 2 bytes), struct in_addr sin_addr (a 32-bit IPv4 address = 4 bytes) and char sin_zero[8] (8 bytes). These three add up to 14 bytes.
So these three fields occupy the 14-byte character array, and any of them can be reached by reading the appropriate bytes of sa_data and converting them back to the right type.
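The layout argument above can be checked with a short sketch. The helper names are hypothetical, and the last function assumes that sin_port overlays the first two bytes of sa_data, which holds for the macOS definitions quoted above and for the usual Linux ones but is not something the C language itself promises.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 when the family field sits at the same offset in both structs. */
int family_offsets_match(void)
{
    return offsetof(struct sockaddr, sa_family) ==
           offsetof(struct sockaddr_in, sin_family);
}

/* Reads the address family through the generic pointer type. */
int family_via_generic(const struct sockaddr_in *in)
{
    const struct sockaddr *sa = (const struct sockaddr *)in;
    return sa->sa_family;
}

/* Assumption: sin_port occupies the first two bytes of sa_data.
 * memcpy() is used so that no alignment assumption is needed. */
uint16_t port_via_sa_data(const struct sockaddr_in *in)
{
    const struct sockaddr *sa = (const struct sockaddr *)in;
    uint16_t net_port;

    memcpy(&net_port, sa->sa_data, sizeof net_port);
    return ntohs(net_port);  /* convert from network byte order */
}
```

With a sockaddr_in whose sin_family is AF_INET and whose sin_port is htons(8080), both helpers should hand back the values that were stored through the sockaddr_in view.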
user529758's answer already explains the reason to do this.
This is because bind() can bind other types of sockets than IP sockets, for instance Unix domain sockets, which have sockaddr_un as their address type. An AF_INET address consists of a host and a port, whereas an AF_UNIX address is a filesystem path.
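To illustrate why one generic parameter type is convenient, here is a rough sketch of a function that accepts any struct sockaddr * and dispatches on the family, in the same spirit as bind(). The name describe_sockaddr and its output format are made up for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: writes a short description of the address to out.
 * Returns 0 on success, -1 for a family it does not handle. */
int describe_sockaddr(const struct sockaddr *sa, char *out, size_t outlen)
{
    switch (sa->sa_family) {
    case AF_INET: {
        /* The family tells us the object is really a sockaddr_in. */
        const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
        char ip[INET_ADDRSTRLEN];

        if (inet_ntop(AF_INET, &in->sin_addr, ip, sizeof ip) == NULL)
            return -1;
        snprintf(out, outlen, "inet %s:%u", ip, (unsigned)ntohs(in->sin_port));
        return 0;
    }
    case AF_UNIX: {
        /* Same pointer, different concrete type: a filesystem path.
         * Assumes sun_path is NUL-terminated. */
        const struct sockaddr_un *un = (const struct sockaddr_un *)sa;

        snprintf(out, outlen, "unix %s", un->sun_path);
        return 0;
    }
    default:
        return -1;
    }
}
```

The caller casts in the same way as for bind(), for example describe_sockaddr((struct sockaddr *)&name, buf, sizeof buf).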

Writing into h_addr_list

I am writing a wrapper for gethostbyname() function, which, before returning a pointer to the hostent structure, should allow for executing getaddrinfo() and eventually mapping the returned IPv6 structures to the IPv4 ones. However, I am having a problem with casting the returned in_addr structures properly in order to populate the h_addr_list of hostent addresses - in case the family identified equals AF_INET, of course.
I am basically doing the following:
strcpy(&s[0],name);
hp->h_name = strdup(s);
hp->h_addrtype = AF_INET;
hp->h_length = sizeof(struct in_addr);
struct sockaddr *sa= res->ai_addr;
// Segmentation fault:
memcpy(hp->h_addr_list[0], &(((struct sockaddr_in *)sa)->sin_addr.s_addr), hp->h_length);
Any hints? I haven't written any C code in a long time, so sorry if I am asking a stupid question. Thanks.
The s_addr member (in e.g. saddr->sin_addr.s_addr) is not a pointer. You have to use the address-of operator to make it a pointer.
And hp->h_addr_list[0] is a pointer, so when you use the address-of operator here, you get the address of that pointer, and will copy to the completely wrong address.
Ok, allocating blocks for the hostent and h_addr_list worked for me, some more context:
hp=(struct hostent *)calloc(1,sizeof(struct hostent));
hp->h_name = strdup(s);
hp->h_aliases = NULL;
hp->h_addrtype = AF_INET;
hp->h_length = sizeof(struct in_addr);
hp->h_addr_list = (char **)calloc(2,sizeof(char *));
hp->h_addr_list[0] = calloc(1, hp->h_length); // one in_addr-sized slot instead of a hard-coded 4
struct sockaddr *sa = res->ai_addr;
memcpy(hp->h_addr_list[0], (char *)&(((struct sockaddr_in *)sa)->sin_addr.s_addr), hp->h_length);

Reasoning behind C sockets sockaddr and sockaddr_storage

I'm looking at functions such as connect() and bind() in C sockets and notice that they take a pointer to a sockaddr struct. I've been reading and to make your application AF-Independent, it is useful to use the sockaddr_storage struct pointer and cast it to a sockaddr pointer because of all the extra space it has for larger addresses.
What I am wondering is how functions like connect() and bind() that ask for a sockaddr pointer go about accessing the data from a pointer that points at a larger structure than the one they are expecting. Sure, you pass the size of the structure you are providing, but what is the actual syntax that the functions use to get the IP address out of pointers to larger structures that you have cast to struct sockaddr *?
It's probably because I come from OOP languages, but it seems like kind of a hack and a bit messy.
Functions that expect a pointer to struct sockaddr simply access whatever you pass through that type: when you hand them a pointer to a struct sockaddr_storage cast to struct sockaddr *, they read it as if it were a struct sockaddr, at least until they have looked at the address family.
struct sockaddr_storage is designed to be large enough to hold either a struct sockaddr_in or a struct sockaddr_in6.
You don't create your own struct sockaddr; you usually create a struct sockaddr_in or a struct sockaddr_in6, depending on which IP version you're using. To avoid having to know in advance which IP version you will use, you can use a struct sockaddr_storage, which can hold either. You then cast its address to struct sockaddr * when calling connect(), bind(), etc., and those functions access it that way.
You can see all of these structs below (the padding is implementation specific, for alignment purposes):
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct sockaddr_in {
short sin_family; // address family, AF_INET
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct sockaddr_storage {
sa_family_t ss_family; // address family
// all this is padding, implementation specific, ignore it:
char __ss_pad1[_SS_PAD1SIZE];
int64_t __ss_align;
char __ss_pad2[_SS_PAD2SIZE];
};
So as you can see, the function first looks at the address family. If it says IPv4, the function treats the memory as a struct sockaddr_in and reads the 4-byte sin_addr; if it says IPv6, it treats the memory as a struct sockaddr_in6 and reads the full 16-byte sin6_addr.
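As a small sketch of the family-independent approach, the helper below fills a struct sockaddr_storage from an address literal and returns the length that would be passed alongside it to connect() or bind(). The name fill_storage and the "try IPv4, then IPv6" order are assumptions for illustration only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: stores an IPv4 or IPv6 literal plus port in *ss.
 * Returns the size of the concrete struct used, or 0 if parsing fails. */
socklen_t fill_storage(struct sockaddr_storage *ss, const char *literal,
                       uint16_t port)
{
    /* Two typed views of the same storage block. */
    struct sockaddr_in *v4 = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;

    memset(ss, 0, sizeof *ss);
    if (inet_pton(AF_INET, literal, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);
        return sizeof *v4;
    }

    memset(ss, 0, sizeof *ss);
    if (inet_pton(AF_INET6, literal, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        return sizeof *v6;
    }
    return 0;
}
```

A caller would then write something like connect(fd, (struct sockaddr *)&ss, len), where len is the value returned here, so the callee knows how many bytes of the storage are meaningful.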
In C++ classes with at least one virtual function are given a TAG. That tag allows you to dynamic_cast<>() to any of the classes your class derives from and vice versa. The TAG is what allows dynamic_cast<>() to work. More or less, this can be a number or a string...
In C we are limited to structures. However, structures can also be assigned a TAG. In fact, if you look at all the structures that theprole posted in his answer, you will notice that they all start with 2 bytes (an unsigned short) which represents what we call the family of the address. This defines exactly what the structure is and thus its size, fields, etc.
Therefore you can do something like this:
int bind(int fd, struct sockaddr *in, socklen_t len)
{
switch(in->sa_family)
{
case AF_INET:
if(len < sizeof(struct sockaddr_in))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in *p = (struct sockaddr_in *) in;
...
}
break;
case AF_INET6:
if(len < sizeof(struct sockaddr_in6))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in6 *p = (struct sockaddr_in6 *) in;
...
}
break;
[...other cases...]
default:
errno = EINVAL; // family not supported
return -1;
}
}
As you can see, the function can check the len parameter to make sure that the length is enough to fit the expected structure, and can therefore reinterpret_cast<>() (as it would be called in C++) your pointer. Whether the data in the structure is correct is up to the caller; there is not much choice on that end. These functions are expected to verify all sorts of things before they use the data, and to return -1 and set errno whenever a problem is found.
So in effect, you have a struct sockaddr_in or struct sockaddr_in6 that you (reinterpret) cast to a struct sockaddr and the bind() function (and others) cast that pointer back to a struct sockaddr_in or struct sockaddr_in6 after they checked the sa_family member and verified the size.
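For readers who want something they can compile, here is a reduced, runnable variant of the check-then-cast pattern. It is only a sketch under stated assumptions: check_sockaddr is a hypothetical name, it validates family and length but does nothing else, and the choice of EAFNOSUPPORT for unknown families is mine rather than something taken from the pseudocode above.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns 0 if len is large enough for the struct implied by sa_family,
 * otherwise -1 with errno set. */
int check_sockaddr(const struct sockaddr *sa, socklen_t len)
{
    size_t need;

    /* The family field itself must lie inside the caller's buffer. */
    if (sa == NULL || len < offsetof(struct sockaddr, sa_data)) {
        errno = EINVAL;
        return -1;
    }

    switch (sa->sa_family) {
    case AF_INET:
        need = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        need = sizeof(struct sockaddr_in6);
        break;
    default:
        errno = EAFNOSUPPORT;  /* family not handled by this sketch */
        return -1;
    }

    if (len < need) {
        errno = EINVAL;        /* buffer too short for that family */
        return -1;
    }
    return 0;                  /* safe to cast to the concrete type */
}
```

Only after a check like this succeeds would the callee cast sa to struct sockaddr_in * or struct sockaddr_in6 * and read the family-specific fields.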

struct type conversion in C

In the snippet below, a sockaddr_in is converted to a sockaddr with '(struct sockaddr *) &sin'. I want to know which members of sockaddr_in map to which members of sockaddr.
Below is the snippet.
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
// IPv4 AF_INET sockets:
struct sockaddr_in {
short sin_family; // address family, AF_INET
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_port = htons(floodport);
sin.sin_addr.s_addr = inet_addr(argv[1]);
//type conversion
(struct sockaddr *) &sin // what values of sockaddr_in would be mapped to sockaddr ?
This conversion is used in sendto() in socket programming, as below:
sendto(s, datagram, iph->tot_len, 0, (struct sockaddr *) &sin, sizeof(sin));
Thanks in advance .
The only purpose for struct sockaddr is to have one type to pass around to functions like sendto() etc.
You don't use it for anything else; for actual addresses there are other structs, such as:
struct sockaddr_in for the legacy IPv4
struct sockaddr_in6 for IPv6
struct sockaddr_un for Unix sockets
struct sockaddr_bth for Bluetooth
struct sockaddr_storage, which is as large as the largest address struct on your architecture and can neatly be used for storing addresses whose type you don't know.
The functions that use struct sockaddr will only read sa_family and do the opposite cast internally (if they understand what's inside sa_family).

Isn't struct sockaddr_in supposed to work for both IPv4 and IPv6?

Specifically, sin_addr seems to be located at different memory locations for IPv4 and IPv6 socket addresses. This results in weirdness:
#include <stdio.h>
#include <netinet/in.h>
int main(int argc, char ** argv) {
struct sockaddr_in sa;
printf("sin_addr in sockaddr_in = %p\n", (void *)&sa.sin_addr);
printf("sin_addr in sockaddr_in6 = %p\n", (void *)&((struct sockaddr_in6*)&sa)->sin6_addr);
return 0;
}
Output:
sin_addr in sockaddr_in = 0x7fffa26102b4
sin_addr in sockaddr_in6 = 0x7fffa26102b8
Why aren't these 2 values the same ?
Since this is pointing to the same data (the address to connect to), this should be located at the same address. Otherwise, how are you supposed to call inet_ntop with a sockaddr_in that you don't know is IPv4 or IPv6 ?
Why aren't these 2 values the same ?
sockaddr_in and sockaddr_in6 are different structs used for different address families (IPv4 and IPv6, respectively). They are not required to be compatible with each other in any way except one: each must have an address family field at the same offset and of the same size as sa_family in struct sockaddr (typically a 16-bit integer). sockaddr_in always has that field set to AF_INET, and sockaddr_in6 always has that field set to AF_INET6. By standardizing the family field in this way, any sockaddr-based API can access that field and know how to interpret the rest of the struct data as needed. That is also why sockaddr-based APIs usually take a size value (a socklen_t) as input/output as well: sockaddr_in and sockaddr_in6 are different byte sizes, so APIs need to be able to validate the size of any buffers you pass around.
Since this is pointing to the same data (the address to connect to), this should be located at the same address.
No, it should not. The location of the address field within the struct is specific to the type of address family the struct belongs to. There is no requirement that sockaddr_in and sockaddr_in6 should store their addresses at the exact same offset.
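The difference can be made visible without any casting by asking the compiler for the field offsets directly. This is a small sketch with hypothetical function names; on the common Linux and BSD-derived layouts I would expect offsets of 4 and 8, which would match the 4-byte gap in the output quoted in the question, but the exact numbers are platform-specific.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>

/* Offset of the IPv4 address field inside struct sockaddr_in. */
size_t v4_addr_offset(void)
{
    return offsetof(struct sockaddr_in, sin_addr);
}

/* Offset of the IPv6 address field inside struct sockaddr_in6.
 * sin6_flowinfo sits between the port and the address, which pushes
 * the address further into the struct than in the IPv4 case. */
size_t v6_addr_offset(void)
{
    return offsetof(struct sockaddr_in6, sin6_addr);
}

/* Prints both offsets so the layouts can be compared side by side. */
void print_addr_offsets(void)
{
    printf("sin_addr  offset: %zu\n", v4_addr_offset());
    printf("sin6_addr offset: %zu\n", v6_addr_offset());
}
```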
Otherwise, how are you supposed to call inet_ntop with a sockaddr_in that you don't know is IPv4 or IPv6 ?
sockaddr_in can only be used with IPv4 and nothing else, and sockaddr_in6 can only be used with IPv6 and nothing else. If you have a sockaddr_in then you implicitly know you have an IPv4 address, and if you have a sockaddr_in6 then you implicitly know you have an IPv6 address. You have to specify that information to inet_ntop() so it knows how to interpret the data you pass in to it:
struct sockaddr_in sa;
inet_ntop(AF_INET, &(sa.sin_addr), ...);
and for IPv6:
struct sockaddr_in6 sa;
inet_ntop(AF_INET6, &(sa.sin6_addr), ...);
To help you write family-agnostic code, you should be using sockaddr_storage instead of sockaddr_in or sockaddr_in6 directly when possible. sockaddr_storage is large enough in size to hold both sockaddr_in and sockaddr_in6 structs. Since both structs define a family field at the same offset and size, sockaddr_storage can be used with any API that operates on sockaddr* pointers (connect(), accept(), bind(), getsockname(), getpeername(), etc).
However, inet_ntop() does not fall into that category, so you have to pull apart a sockaddr_storage manually when using inet_ntop(), eg:
struct sockaddr_storage sa;
switch (sa.ss_family)
{
case AF_INET:
inet_ntop(AF_INET, &(((struct sockaddr_in*)&sa)->sin_addr), ...);
break;
case AF_INET6:
inet_ntop(AF_INET6, &(((struct sockaddr_in6*)&sa)->sin6_addr), ...);
break;
}
No, for IPv6 you need to use in6_addr (which stores the 128-bit network address) and sockaddr_in6.
Details can be referenced here
For writing code which supports dual stack i.e. ipv4 and 6 along use this
