I'm trying out some test code that connects to a remote host using data returned by the gethostbyname() function. The examples I found do the following:
struct hostent* pHostent = gethostbyname(host);
struct sockaddr_in remoteAddr;
// ... code
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
// ... more code
I'm trying to understand what's being done here.
Is it legal since data types are different? Maybe a memcpy() should have been used?
Why does this work? Meaning what data actually resides in both places?
We can start by looking at the actual struct layouts:
struct hostent {
char *h_name;
char **h_aliases;
int h_addrtype;
int h_length;
char **h_addr_list;
};
#define h_addr h_addr_list[0]
struct sockaddr_in {
short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
uint32_t s_addr; // IPv4 address
};
A struct hostent can describe either IPv4 or IPv6 addresses; the h_addrtype field tells you which. So h_addr_list needs to be able to hold either kind. To accomplish this, the addresses are stored as raw memory pointed to by char* pointers. To get the actual address, you cast the pointer to the correct address type, as in your code:
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
So to answer your questions:
The pointer types are different, but the data they point to is of the same type.
No, the data is in one place only; it's just referenced by pointers of different types.
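On the memcpy() question: copying the bytes is a reasonable alternative to the cast, since it makes no assumption about how the char* buffer is aligned. Here is a minimal sketch of that approach. The helper name fill_ipv4_from_hostent is made up for illustration and is not part of any library; it also checks h_addrtype and h_length before trusting the entry.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not a library function): fill an IPv4 socket
 * address from the first entry of a hostent.
 * Returns 0 on success, -1 if the entry is not a usable IPv4 result. */
int fill_ipv4_from_hostent(const struct hostent *he, unsigned short port,
                           struct sockaddr_in *out)
{
    assert(out != NULL);

    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof(out->sin_addr) ||
        he->h_addr_list == NULL || he->h_addr_list[0] == NULL)
        return -1;

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);   /* port goes in network byte order */

    /* Copy the raw address bytes instead of casting the char pointer. */
    memcpy(&out->sin_addr, he->h_addr_list[0], sizeof(out->sin_addr));
    return 0;
}
```

You would call it with the pointer returned by gethostbyname() and then pass the filled-in struct to connect().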
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The actual structure passed for the addr argument will depend on the address family. The sockaddr structure is defined as something like:
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
};
So for an IPv4 address (AF_INET), the actual struct that will be passed is this:
/* Source http://linux.die.net/man/7/ip */
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address. */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
Does the bind code read the sockaddr.sa_family value and, depending on the value it finds, cast the sockaddr struct to the appropriate struct such as sockaddr_in?
Why is sa_data set to 14 characters? If I understand correctly, the sa_data field is just a field with enough memory to fit all address family types? Presumably the original designers anticipated that 14 characters would be wide enough to fit all future types.
According to the glibc manual:
The length 14 of sa_data is essentially arbitrary.
And the FreeBSD developers handbook mentions the following:
Please note the vagueness with which the sa_data field is declared,
just as an array of 14 bytes, with the comment hinting there can be
more than 14 of them.
This vagueness is quite deliberate. Sockets is a very powerful
interface. While most people perhaps think of it as nothing more than
the Internet interface—and most applications probably use it for that
nowadays—sockets can be used for just about any kind of interprocess
communications, of which the Internet (or, more precisely, IP) is only
one.
Yes, the sa_family field is used to recognize how to treat the struct passed (which is cast to struct sockaddr* in a call to bind). You can read more about how it works in the FreeBSD Developers' Handbook.
And there are in fact "polymorphic" (sub)types of sockaddr in which the address data takes more than the 14 bytes of sa_data, for example:
struct sockaddr_un {
sa_family_t sun_family; /* AF_UNIX */
char sun_path[108]; /* pathname */
};
The sockaddr struct is used as a tagged union. By reading the sa_family field it can be cast to a struct of the proper form.
The 14 bytes is arbitrary. It's big enough to hold IPv4 addresses, but not big enough to hold IPv6 addresses. There is also a sockaddr_storage struct which is big enough for both. Reading the Microsoft docs on SOCKADDR_STORAGE, it comes in at 128 bytes, so much larger than needed for IPv6. Checking some Linux headers, it seems to be at least that large there as well.
For reference, the IPv6 struct is:
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct in6_addr {
unsigned char s6_addr[16]; // IPv6 address
};
As you can see, the 16-byte s6_addr field is already bigger than the 14-byte sa_data field on its own. Total size after the sa_family field is 26 bytes.
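If you want to check these sizes on your own platform rather than trust the numbers above, a small sketch like the following can help. The two helper names are invented for this example; the exact sizes are implementation-specific, so the code only compares them and hard-codes none.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: size of the concrete sockaddr type used for a
 * given family, or 0 for families this sketch does not know about. */
size_t sockaddr_size_for_family(int family)
{
    switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);
    case AF_INET6: return sizeof(struct sockaddr_in6);
    case AF_UNIX:  return sizeof(struct sockaddr_un);
    default:       return 0;
    }
}

/* Hypothetical helper: nonzero if that family's address structure fits
 * inside a plain struct sockaddr. */
int fits_in_generic_sockaddr(int family)
{
    size_t n = sockaddr_size_for_family(family);

    /* sockaddr_storage is meant to be large enough for every family. */
    assert(n <= sizeof(struct sockaddr_storage));
    return n != 0 && n <= sizeof(struct sockaddr);
}
```

On typical systems this reports that AF_INET fits in struct sockaddr while AF_INET6 does not, which is why sockaddr_storage is used when the family is not known in advance.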
I have a fixed-length array in which every entry is of type struct contact:
typedef struct contact
{
int fd;
union
{
struct sockaddr_in v4addr;
struct sockaddr_in6 v6addr;
struct sockaddr_storage stor;
};
char buf[FRAME_BUF_LEN];
int len;
char name[32];
} contact_t;
and I need to extract the IP and port for every entry into a char*.
The result should look like this
192.168.0.1 1234\n192.168.0.2 1235\n192.168.0.3 1236\n //and so on..
I honestly have no clue how to get the information and allocate the correct size for the final char*.
Use (for example) the ss_family member of struct sockaddr_storage stor to determine the address family and, depending on this, choose v4addr or v6addr to be used with inet_ntop().
The port number comes in network byte order, so it must be passed to ntohs() before being used.
The members of v4addr and v6addr to use can be found in <netinet/in.h>:
/* Structure describing an Internet socket address. */
struct sockaddr_in
{
[...]
in_port_t sin_port; /* Port number. */
struct in_addr sin_addr; /* Internet address. */
[...]
};
/* Ditto, for IPv6. */
struct sockaddr_in6
{
[...]
in_port_t sin6_port; /* Transport layer port # */
[...]
struct in6_addr sin6_addr; /* IPv6 address */
[...]
};
To create buffers whose size is unknown at compile time, use dynamic memory allocation.
To grow a memory block step by step, as when looping through your array and appending address/port pairs, use realloc() in particular.
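Putting those pieces together, here is one possible sketch. It uses a cut-down stand-in for the question's struct (peer_t, holding only the address union) and an invented function name, format_peers; both are assumptions made for illustration. Entries whose family is neither AF_INET nor AF_INET6 are skipped.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Reduced stand-in for the question's contact_t: address union only. */
typedef struct peer {
    union {
        struct sockaddr_in v4addr;
        struct sockaddr_in6 v6addr;
        struct sockaddr_storage stor;
    };
} peer_t;

/* Hypothetical helper: returns a malloc'd "IP port\n..." string, or NULL
 * on failure. The caller must free() the result. */
char *format_peers(const peer_t *peers, size_t count)
{
    size_t used = 0;
    char *out = malloc(1);
    if (out == NULL)
        return NULL;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        char ip[INET6_ADDRSTRLEN];
        char line[INET6_ADDRSTRLEN + 8]; /* IP, space, port, '\n', NUL */
        const void *addr;
        unsigned port;

        switch (peers[i].stor.ss_family) {
        case AF_INET:
            addr = &peers[i].v4addr.sin_addr;
            port = ntohs(peers[i].v4addr.sin_port);
            break;
        case AF_INET6:
            addr = &peers[i].v6addr.sin6_addr;
            port = ntohs(peers[i].v6addr.sin6_port);
            break;
        default:
            continue; /* unused or non-IP entry */
        }
        if (inet_ntop(peers[i].stor.ss_family, addr, ip, sizeof ip) == NULL) {
            free(out);
            return NULL;
        }
        int n = snprintf(line, sizeof line, "%s %u\n", ip, port);
        assert(n > 0 && (size_t)n < sizeof line);

        /* Grow the result by exactly one line (plus the terminator). */
        char *tmp = realloc(out, used + (size_t)n + 1);
        if (tmp == NULL) {
            free(out);
            return NULL;
        }
        out = tmp;
        memcpy(out + used, line, (size_t)n + 1);
        used += (size_t)n;
    }
    return out;
}
```

Calling realloc() once per entry keeps the sketch short; for large arrays you may prefer to grow the buffer in bigger steps.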
From what I understand struct addrinfo is used to prep the socket address structure and struct sockaddr contains socket address information. But what does that actually mean? struct addrinfo contains a pointer to a struct sockaddr. Why keep them separate? Why can't we combine all things within sockaddr into addr_info?
I'm just guessing here, but is the reason for their separation to save space when passing structs? For example, in the bind() call, all it needs is the port number and the internet address. So both of these are grouped in a struct sockaddr. So, we can just pass this small struct instead of the larger struct addrinfo?
struct addrinfo {
int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
int ai_socktype; // SOCK_STREAM, SOCK_DGRAM
int ai_protocol; // use 0 for "any"
socklen_t ai_addrlen; // size of ai_addr in bytes
struct sockaddr *ai_addr; // struct sockaddr_in or _in6
char *ai_canonname; // full canonical hostname
struct addrinfo *ai_next; // linked list, next node
};
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct addrinfo is returned by getaddrinfo(), and contains, on success, a linked list of such structs for a specified hostname and/or service.
The ai_addr member doesn't necessarily point to a plain struct sockaddr, because that struct is merely a generic one containing the members common to all the others, and it is used to determine what type of struct you actually have. Depending on what you pass to getaddrinfo() and what that function finds, ai_addr might actually point to a struct sockaddr_in, a struct sockaddr_in6, or something else, whichever is appropriate for that particular address entry. This is one good reason why they're kept "separate": that member might point to any of a number of different struct types, which wouldn't be possible if all the members were hardcoded into struct addrinfo, because those structs have different members.
This is probably the easiest way to get this information if you have a hostname, but it's not the only way. For an IPv4 connection, you can just populate a struct sockaddr_in structure yourself, if you want to and you have the data to do so, and avoid going through the rigamarole of calling getaddrinfo(), which you might have to wait for if it needs to go out into the internet to collect the information for you. You don't have to use struct addrinfo at all.
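To make the relationship concrete, here is a small sketch that asks getaddrinfo() for an address and copies whatever ai_addr points at into a struct sockaddr_storage. The function name first_address is made up for this example. It sets AI_NUMERICHOST and AI_NUMERICSERV so that no DNS or service lookup happens; drop those flags if you want real name resolution.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a numeric host/port pair and copy the
 * first result into *out, storing its length in *out_len.
 * Returns 0 on success, -1 on failure. */
int first_address(const char *host, const char *port,
                  struct sockaddr_storage *out, socklen_t *out_len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    assert(out != NULL && out_len != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV; /* no lookups */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    if (res->ai_addrlen > sizeof *out) {
        freeaddrinfo(res);
        return -1;
    }

    /* ai_addr points at a sockaddr_in, sockaddr_in6, ...; ai_addrlen
     * says how many bytes that concrete struct occupies. */
    memset(out, 0, sizeof *out);
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *out_len = res->ai_addrlen;

    freeaddrinfo(res);
    return 0;
}
```

After the call, out->ss_family tells you which concrete struct was copied, and the length in *out_len is what you would hand to connect() or bind().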
I am new to network programming. The following structure definitions are quite confusing to me. Here h_addr_list is defined as a string array, but it is used to store an array of in_addr structures. Why wasn't it defined as struct in_addr **h_addr_list rather than char **h_addr_list?
struct hostent
{
char *h_name; /* Official domain name of host */
char **h_aliases; /* Null-terminated array of domain names */
int h_addrtype; /* Host address type (AF_INET) */
int h_length; /* Length of an address, in bytes */
char **h_addr_list; /* Null-terminated array of in_addr structs */
};
struct in_addr
{
unsigned int s_addr; /* Network byte order (big-endian) */
};
The structure definition dates back to the era before C supported void * (or void at all, or prototypes). In those days, char * was the 'universal pointer'. This accounts for some of the weirdnesses of the networking function interfaces.
It also dates back to the era when there were many different networking systems (IPX/SPX, SNA, TCP/IP, …). These days, TCP/IP is dominant, but even now, you could have an array of IPv4 or an array of IPv6 addresses being returned, so specifying either struct in_addr or struct in6_addr would cause problems.
The intention was that you'd have an array of pointers to appropriate structure types. Nowadays, it would be written void **h_addr_list — an array of void *. But this option was not available when the structures were first defined, and the rest is history (you don't change an interface after it is standardized if you can avoid it).
When the struct was created, the creator wasn't sure whether AF_INET was going to be the winning address type.
What if h_addrtype is something other than AF_INET? Then h_addr_list will contain addresses that are not struct in_addr.
Now, a couple of decades later, we find that IPv4 addresses are running out. More and more, struct in_addr will be replaced by IPv6 addresses.
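In practice this means code that walks h_addr_list should look at h_addrtype (and h_length) before deciding how to interpret each entry. A minimal sketch, with an invented helper name hostent_addr_to_text, might look like this:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: write the text form of entry number `index` of
 * he->h_addr_list into buf. Returns 0 on success, -1 if the address
 * type is unsupported, the length is unexpected, or the index is past
 * the end of the list. */
int hostent_addr_to_text(const struct hostent *he, size_t index,
                         char *buf, socklen_t buflen)
{
    size_t expected;

    assert(he != NULL && buf != NULL);

    /* h_addrtype decides what the raw bytes behind each char* mean. */
    switch (he->h_addrtype) {
    case AF_INET:  expected = sizeof(struct in_addr);  break;
    case AF_INET6: expected = sizeof(struct in6_addr); break;
    default:       return -1;
    }
    if (he->h_length != (int)expected || he->h_addr_list == NULL)
        return -1;

    /* The list is terminated by a NULL pointer. */
    for (size_t i = 0; he->h_addr_list[i] != NULL; i++) {
        if (i == index)
            return inet_ntop(he->h_addrtype, he->h_addr_list[i],
                             buf, buflen) != NULL ? 0 : -1;
    }
    return -1;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either address type.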
I'm looking at functions such as connect() and bind() in C sockets and notice that they take a pointer to a sockaddr struct. I've been reading and to make your application AF-Independent, it is useful to use the sockaddr_storage struct pointer and cast it to a sockaddr pointer because of all the extra space it has for larger addresses.
What I am wondering is how functions like connect() and bind() that ask for a sockaddr pointer go about accessing the data from a pointer that points at a larger structure than the one it is expecting. Sure, you pass it the size of the structure you are providing it, but what is the actual syntax that the functions use to get the IP Address off the pointers to larger structures that you have cast to struct *sockaddr?
It's probably because I come from OOP languages, but it seems like kind of a hack and a bit messy.
Functions that expect a pointer to struct sockaddr probably treat the pointer you send them as a struct sockaddr * even when it really points to a struct sockaddr_storage. In that way, they access it as if it were a struct sockaddr.
struct sockaddr_storage is designed to be large enough to hold both a struct sockaddr_in and a struct sockaddr_in6.
You don't create your own struct sockaddr; you usually create a struct sockaddr_in or a struct sockaddr_in6, depending on which IP version you're using. To avoid having to know in advance which IP version you will be using, you can use a struct sockaddr_storage, which can hold either. Its address is in turn cast to struct sockaddr * when passed to connect(), bind(), etc., and accessed that way.
You can see all of these structs below (the padding is implementation specific, for alignment purposes):
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
struct sockaddr_in {
short sin_family; // address family: AF_INET
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct sockaddr_storage {
sa_family_t ss_family; // address family
// all this is padding, implementation specific, ignore it:
char __ss_pad1[_SS_PAD1SIZE];
int64_t __ss_align;
char __ss_pad2[_SS_PAD2SIZE];
};
So, as you can see, if the address family says IPv4, the function reads just the 4 bytes of sin_addr; otherwise, for IPv6, it reads the full 16 bytes of sin6_addr.
In C++ classes with at least one virtual function are given a TAG. That tag allows you to dynamic_cast<>() to any of the classes your class derives from and vice versa. The TAG is what allows dynamic_cast<>() to work. More or less, this can be a number or a string...
In C we are limited to structures. However, structures can also be assigned a TAG. In fact, if you look at all the structures that theprole posted in his answer, you will notice that they all start with 2 bytes (an unsigned short) which represents what we call the family of the address. This defines exactly what the structure is and thus its size, fields, etc.
Therefore you can do something like this:
int bind(int fd, struct sockaddr *in, socklen_t len)
{
switch(in->sa_family)
{
case AF_INET:
if(len < sizeof(struct sockaddr_in))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in *p = (struct sockaddr_in *) in;
...
}
break;
case AF_INET6:
if(len < sizeof(struct sockaddr_in6))
{
errno = EINVAL; // wrong size
return -1;
}
{
struct sockaddr_in6 *p = (struct sockaddr_in6 *) in;
...
}
break;
[...other cases...]
default:
errno = EINVAL; // family not supported
return -1;
}
}
As you can see, the function can check the len parameter to make sure that the length is enough to fit the expected structure, and therefore it can reinterpret_cast<>() (as it would be called in C++) your pointer. Whether the data in the structure is correct is up to the caller. There is not much choice on that end. These functions are expected to verify all sorts of things before they use the data, and to return -1 and set errno whenever a problem is found.
So in effect, you have a struct sockaddr_in or struct sockaddr_in6 that you (reinterpret) cast to a struct sockaddr and the bind() function (and others) cast that pointer back to a struct sockaddr_in or struct sockaddr_in6 after they checked the sa_family member and verified the size.
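For a version of that pattern you can actually compile, here is a small sketch of a function that takes a generic struct sockaddr * plus a length, checks both, and pulls out the port. The name sockaddr_port is invented for illustration, and the choice of errno values is an assumption of this sketch; it says nothing about what a real bind() implementation does internally.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port (host byte order) stored in a
 * generic socket address, or -1 with errno set on error. */
int sockaddr_port(const struct sockaddr *sa, socklen_t len)
{
    /* We need at least enough bytes to read the family "tag". */
    if (sa == NULL ||
        len < offsetof(struct sockaddr, sa_family) + sizeof sa->sa_family) {
        errno = EINVAL;
        return -1;
    }

    switch (sa->sa_family) {
    case AF_INET: {
        struct sockaddr_in in4;
        if (len < sizeof in4) {
            errno = EINVAL;          /* too short for this family */
            return -1;
        }
        memcpy(&in4, sa, sizeof in4); /* reinterpret as sockaddr_in */
        return ntohs(in4.sin_port);
    }
    case AF_INET6: {
        struct sockaddr_in6 in6;
        if (len < sizeof in6) {
            errno = EINVAL;
            return -1;
        }
        memcpy(&in6, sa, sizeof in6); /* reinterpret as sockaddr_in6 */
        return ntohs(in6.sin6_port);
    }
    default:
        errno = EAFNOSUPPORT;        /* family this sketch doesn't handle */
        return -1;
    }
}
```

The caller can keep its address in a struct sockaddr_storage and pass (struct sockaddr *)&storage together with the size of the concrete struct it filled in.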