From what I understand struct addrinfo is used to prep the socket address structure and struct sockaddr contains socket address information. But what does that actually mean? struct addrinfo contains a pointer to a struct sockaddr. Why keep them separate? Why can't we combine all things within sockaddr into addr_info?
I'm just guessing here but is the reason for their separation is to save space when passing structs? For example in the bind() call, all it needs is the port number and the internet address. So both of these are grouped in a struct sockaddr. So, we can just pass this small struct instead of the larger struct addrinfo?
struct addrinfo {
int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
int ai_socktype; // SOCK_STREAM, SOCK_DGRAM
int ai_protocol; // use 0 for "any"
size_t ai_addrlen; // size of ai_addr in bytes
struct sockaddr *ai_addr; // struct sockaddr_in or _in6
char *ai_canonname; // full canonical hostname
struct addrinfo *ai_next; // linked list, next node
};
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct addrinfo is returned by getaddrinfo(), and contains, on success, a linked list of such structs for a specified hostname and/or service.
The ai_addr member isn't actually a struct sockaddr, because that struct is merely a generic one that contains common members for all the others, and is used in order to determine what type of struct you actually have. Depending upon what you pass to getaddrinfo(), and what that function found out, ai_addr might actually be a pointer to struct sockaddr_in, or struct sockaddr_in6, or whatever else, depending upon what is appropriate for that particular address entry. This is one good reason why they're kept "separate", because that member might point to one of a bunch of different types of structs, which it couldn't do if you tried to hardcode all the members into struct addrinfo, because those different structs have different members.
This is probably the easiest way to get this information if you have a hostname, but it's not the only way. For an IPv4 connection, you can just populate a struct sockaddr_in structure yourself, if you want to and you have the data to do so, and avoid going through the rigamarole of calling getaddrinfo(), which you might have to wait for if it needs to go out into the internet to collect the information for you. You don't have to use struct addrinfo at all.
Related
I'm trying a test code that connects to a remote host with data given by gethostbyname() function. In examples I found they do the following:
struct hostent* pHostent = gethostbyname(host);
struct sockaddr_in remoteAddr;
// ... code
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
// ... more code
I'm trying to understand what's being done here.
Is it legal since data types are different? Maybe a memcpy()
should have been used?
Why does this work? Meaning what data
actually resides in both places?
We can start by looking at the actual struct layouts:
struct hostent {
char *h_name;
char **h_aliases;
int h_addrtype
int h_length;
char **h_addr_list;
}
#define h_addr h_addr_list[0]
struct sockaddr_in {
short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
uint32_t s_addr; // IPv4 address
};
The gethostbyname() function can give you either IPv4 or IPv6 addresses, depending on the value of h_addrtype. So the h_addr_list need to be able to hold either IPv4 or IPv6 addresses. To accomplish this the addresses are stored as raw memory pointed to by char* pointers. To get the actual address, you need to cast the memory to the correct address type, as you found in your code:
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
So to answer your questions:
The pointer types are different, but the data pointed to are the same type.
No, the data is in one place only, it's just referenced to by pointers of different type.
I'm trying to understand the linux kernel network code and I come up across many functions like
static inline struct tcp_sock *tcp_sk(const struct sock *sk)
{
return (struct tcp_sock *)sk;
}
The contents of the structures sock and tcp_sock are quite different. How does this mapping exactly happen? Do the structure members with the same name and type get mapped to each other?
Edit 1: To be more specific, in the function tcp_v4_connect
struct tcp_sock *tp = tcp_sk(sk);
tp->write_seq = secure_tcp_sequence_number(inet->inet_saddr,
inet->inet_daddr,
inet->inet_sport,
usin->sin_port);
err = tcp_connect(sk);
And then in the function tcp_connect called above,
struct tcp_sock *tp = tcp_sk(sk);
tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
The variable write_seq exists only in the struct tcp_sock so how can it's value have been passed on to the function tcp_connect when only the struct sock *sk was passed to it?
Thank you.
Do the structure members with the same name and type get mapped to each other?
No.
struct sock sk; /* our sock struct */
struct tcp_sock *tcp_sk = (struct tcp_sock *)&sk;
The tcp_sk pointer now simply points to memory pointed to by &sk. By accessing tcp_sk fields now you will access the memory assigned to this pointer but in order to get expected result the data stored at both addresses should be valid.
Now, struct sock (network layer representation of sockets) has struct sock_common as it's first member:
306 struct sock {
307 /*
308 * Now struct inet_timewait_sock also uses sock_common, so please just
309 * don't add nothing before this first member (__sk_common) --acme
310 */
311 struct sock_common __sk_common;
And struct tcp_sock has struct inet_connection_sock as it's first member
struct tcp_sock {
138 /* inet_connection_sock has to be the first member of tcp_sock */
139 struct inet_connection_sock inet_conn;
and this struct has struct inet_sock as it's first member:
90 struct inet_connection_sock {
91 /* inet_sock has to be the first member! */
92 struct inet_sock icsk_inet;
and guess what struct is the first member of inet_sock... It is struct sock:
172 struct inet_sock {
173 /* sk and pinet6 has to be the first two members of inet_sock */
174 struct sock sk;
So by accessing tcp_sk.inet_conn.icsk_inet.sk you are accessing our sock struct.
As the construct is merely a cast, nothing is "mapped", nothing at the memory location that sk points to is actually changed.
This only works if the structures are similar or are at least starting with the same fields at the same offsets.
In linux kernel code you can see that struct sock and struct tcp_sock have all of the fields of struct sock in common (tcp_sock contains an inet_connection_sockwhich contains an inet_sock which in turn contains a sock as first elements, respectively). So, "the contents of the structures...are quite different", is not quite true. They are actually identical (at least for the size of a struct sock).
In order to be able to do this properly, the system somehow "must know" that it can safely cast a struct sock to a struct tcp_sock - This information is normally hidden somewhere in the common header, in this case in the struct sock part (I haven't checked recently, but I guess here the information is in sk_family)
After your edit:
What is actually passed to your function is a pointer to a memory region. Whatever has there been before is not changed by the cast (as I said above). If we can safely assume that sk was a pointer to a struct tcp_sock some time before, we can safely cast it back to such a beast. This is often done in code that moves around data that has a generic and a specific part (as in this case). All socket (TCP, UDP, generic, Unix sockets,...) data has common data that is collected in a struct sock at the beginning of all specific socket structures. So, on incoming socket data, all of the generic parts of that structures are first handled by functions handling a struct sock, later one there must be some big switch...case that looks into the specific parts, then passing on a specific pointer type.
This allows a layeredprogramming approach, where everything is first cast to a generic struct sock, then the generic part is handled by functions common for all socket types, later on, the pointer is cast back to what it originally was, and the specific part is handled by specialist functions. All the time, the memory area that is basically used as a "backpack" on a struct sock to hold specific data is simply left unchanged.
I have a function which takes in "struct sockaddr *" as a parameter (let's call this input_address), and then I need to operate on that address, which may be a sockaddr_in or sockaddr_in6, since I support both IPv4 and IPv6.
I'm getting some memory corruption and trying to track it down to it's source, and in the process found some code that seems suspect, so I would like to validate if this is the right way to do things.
struct sockaddr_storage *input_address_storage = (struct sockaddr_storage *) input_address;
struct sockaddr_storage result = [UtilityClass performSomeOperation: *input_address_storage];
At first I thought the cast in the first line was safe, but then in the second line I need to dereference that pointer, which seems like it may be wrong. The reason I am concerned is that it may end up copying memory that is beyond where the original structure is (since sockaddr_in is shorter than sockaddr_in6). I am not sure if this could cause a memory corruption (my guess is no), but nevertheless this code gives me a bad feeling.
I can't change the fact my function takes a "struct sockaddr *", so it seems like it would be difficult to work around this type of code, and yet I want to avoid copying from a memory location where I shouldn't be.
If anyone can validate whether what I am doing is wrong, and the best way to fix this, I'd appreciate it.
EDIT: An admin had changed my C tag for C# for some reason. The code I gave is primarily C, with one function call from objective C that doesn't really matter. That call could have been C.
The problem with your approach is that you are converting an existing struct sockaddr* into a struct sockaddr_storage*. Imagine what happens if the original was a ``struct sockaddr_in. Sincesizeof(struct sockaddr_in) < sizeof(struct sockaddr_storage)`, the memory-sanitizer complains of unbound memory reference.
struct sockaddr_storage is essentially a container to contain either your struct sockaddr_in or struct sockaddr_in6.
Hence, it is useful when you want to pass in a struct sockaddr* object but want to allocate enough memory for both sockaddr_in and sockaddr_in6.
A good example is the recvfrom(3) call:
ssize_t recvfrom(int socket, void *restrict buffer, size_t length,
int flags, struct sockaddr *restrict address,
socklen_t *restrict address_len);
Since address requires a struct sockaddr* object, we will construct a struct sockaddr_storage first, and pass it in:
struct sockaddr_storage address;
socklen_t address_length = sizeof(struct sockaddr_storage);
ssize_t ret = recvfrom(fd, buffer, buffer_length, 0, (struct sockaddr*)&address, &address_length);
if (address.ss_family == AF_INET) {
DoIpv4Work((struct sockaddr_in*)&address, ...);
} else if (address.ss_family == AF_INET6) {
DoIpv6Work((struct sockaddr_in6*)&address, ...);
}
The difference in your approach and mine is that I allocate a struct sockaddr_storage and then use it as struct sockaddr, but you do the REVERSE, and use a struct sockaddr and then use it as struct sockaddr_storage.
The bind() function accepts a pointer to a sockaddr, but in all examples I've seen, a sockaddr_in structure is used instead, and is cast to sockaddr:
struct sockaddr_in name;
...
if (bind (sock, (struct sockaddr *) &name, sizeof (name)) < 0)
...
I can't wrap my head around why is a sockaddr_in struct used. Why not just prepare and pass a sockaddr?
Is it just convention?
No, it's not just convention.
sockaddr is a generic descriptor for any kind of socket operation, whereas sockaddr_in is a struct specific to IP-based communication (IIRC, "in" stands for "InterNet"). As far as I know, this is a kind of "polymorphism" : the bind() function pretends to take a struct sockaddr *, but in fact, it will assume that the appropriate type of structure is passed in; i. e. one that corresponds to the type of socket you give it as the first argument.
I don't know if its very much relevant for this question, but I would like to provide some extra info which may make the typecaste more understandable as many people who haven't spent much time with C get confused seeing such a typecaste.
I use macOS, so I am taking examples based on header files from my system.
struct sockaddr is defined as follows:
struct sockaddr {
__uint8_t sa_len; /* total length */
sa_family_t sa_family; /* [XSI] address family */
char sa_data[14]; /* [XSI] addr value (actually larger) */
};
struct sockaddr_in is defined as follows:
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
Starting from the very basics, a pointer just contains an address. So struct sockaddr * and struct sockaddr_in * are pretty much the same. They both just store an address. Only relevant difference is how compiler treats their objects.
So when you say (struct sockaddr *) &name, you are just tricking the compiler and telling it that this address points to a struct sockaddr type.
So let's say the pointer is pointing to a location 1000. If the struct sockaddr * stores this address, it will consider memory from 1000 to sizeof(struct sockaddr) possessing the members as per the structure definition. If struct sockaddr_in * stores the same address it will consider memory from 1000 to sizeof(struct sockaddr_in).
When you typecasted that pointer, it will consider the same sequence of bytes upto sizeof(struct sockaddr).
struct sockaddr *a = &name; // consider &name = 1000
Now if I access a->sa_len, the compiler would access from location 1000 to sizeof(__uint8_t) which is same bytes size as in case of sockaddr_in. So this should access the same sequence of bytes.
Same pattern is for sa_family.
After that there is a 14 byte character array in struct sockaddr which stores data from in_port_t sin_port (typedef'd 16 bit unsigned integer = 2 bytes ) , struct in_addr sin_addr (simply a 32 bit ipv4 address = 4 bytes) and char sin_zero[8](8 bytes). These 3 add up to make 14 bytes.
Now these three are stored in this 14 bytes character array and we can access any of these three by accessing appropriate indices and typecasting them again.
user529758's answer already explains the reason to do this.
This is because bind can bind other types of sockets than IP sockets, for instance Unix domain sockets, which have sockaddr_un as their type. The address for an AF_INET socket has the host and port as their address, whereas an AF_UNIX socket has a filesystem path.
I'm looking at functions such as connect() and bind() in C sockets and notice that they take a pointer to a sockaddr struct. I've been reading and to make your application AF-Independent, it is useful to use the sockaddr_storage struct pointer and cast it to a sockaddr pointer because of all the extra space it has for larger addresses.
What I am wondering is how functions like connect() and bind() that ask for a sockaddr pointer go about accessing the data from a pointer that points at a larger structure than the one it is expecting. Sure, you pass it the size of the structure you are providing it, but what is the actual syntax that the functions use to get the IP Address off the pointers to larger structures that you have cast to struct *sockaddr?
It's probably because I come from OOP languages, but it seems like kind of a hack and a bit messy.
Functions that expect a pointer to struct sockaddr probably typecast the pointer you send them to sockaddr when you send them a pointer to struct sockaddr_storage. In that way, they access it as if it was a struct sockaddr.
struct sockaddr_storage is designed to fit in both a struct sockaddr_in and struct sockaddr_in6
You don't create your own struct sockaddr, you usually create a struct sockaddr_in or a struct sockaddr_in6 depending on what IP version you're using. In order to avoid trying to know what IP version you will be using, you can use a struct sockaddr_storage which can hold either. This will in turn be typecasted to struct sockaddr by the connect(), bind(), etc functions and accessed that way.
You can see all of these structs below (the padding is implementation specific, for alignment purposes):
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct sockaddr_in {
short sin_family; // e.g. AF_INET, AF_INET6
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct sockaddr_storage {
sa_family_t ss_family; // address family
// all this is padding, implementation specific, ignore it:
char __ss_pad1[_SS_PAD1SIZE];
int64_t __ss_align;
char __ss_pad2[_SS_PAD2SIZE];
};
So as you can see, if the function expects an IPv4 address, it will just read the first 4 bytes (because it assumes the struct is of type struct sockaddr. Otherwise it will read the full 16 bytes for IPv6).
In C++ classes with at least one virtual function are given a TAG. That tag allows you to dynamic_cast<>() to any of the classes your class derives from and vice versa. The TAG is what allows dynamic_cast<>() to work. More or less, this can be a number or a string...
In C we are limited to structures. However, structures can also be assigned a TAG. In fact, if you look at all the structures that theprole posted in his answer, you will notice that they all start with 2 bytes (an unsigned short) which represents what we call the family of the address. This defines exactly what the structure is and thus its size, fields, etc.
Therefore you can do something like this:
int bind(int fd, struct sockaddr *in, socklen_t len)
{
switch(in->sa_family)
{
case AF_INET:
if(len < sizeof(struct sockaddr_in))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in *p = (struct sockaddr_in *) in;
...
}
break;
case AF_INET6:
if(len < sizeof(struct sockaddr_in6))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in6 *p = (struct sockaddr_in6 *) in;
...
}
break;
[...other cases...]
default:
errno = EINVAL; // family not supported
return -1;
}
}
As you can see, the function can check the len parameter to make sure that the length is enough to fit the expected structure and therefore they can reinterpret_cast<>() (as it would be called in C++) your pointer. Whether the data is correct in the structure is up to the caller. There is not much choice on that end. These functions are expected to verify all sorts of things before it uses the data and return -1 and errno whenever a problem is found.
So in effect, you have a struct sockaddr_in or struct sockaddr_in6 that you (reinterpret) cast to a struct sockaddr and the bind() function (and others) cast that pointer back to a struct sockaddr_in or struct sockaddr_in6 after they checked the sa_family member and verified the size.