What is the need of separate address structure in sockaddr_in? - c

This is the internet(IPv4) socket address structure defined in netinet/in.h
struct sockaddr_in {
uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
in_addr_t s_addr;
};
Why is a separate structure needed just for the address field?
Why can't we use the following structure instead?
struct sockaddr_in {
uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
in_addr_t sin_addr;
char sin_zero[8];
};

It's for historical reasons. In the early days of socket programming, struct in_addr contained a union of various structures so you could get to the individual bytes. This union became unnecessary when subnetting and classless addressing came along, but switching out the struct for a simple unsigned long would break a lot of code, so it just stayed that way.
If you're interested in network programming and you haven't yet picked up a copy of UNIX Network Programming, then I'd highly recommend doing so; it's a goldmine for little details like this.

Related

Sockets. Setting s_addr field of sockaddr_in structure

I'm trying a test code that connects to a remote host with data given by gethostbyname() function. In examples I found they do the following:
struct hostent* pHostent = gethostbyname(host);
struct sockaddr_in remoteAddr;
// ... code
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
// ... more code
I'm trying to understand what's being done here.
Is it legal, since the data types are different? Maybe a memcpy() should have been used?
Why does this work? That is, what data actually resides in both places?
We can start by looking at the actual struct layouts:
struct hostent {
char *h_name;
char **h_aliases;
int h_addrtype;
int h_length;
char **h_addr_list;
};
#define h_addr h_addr_list[0]
struct sockaddr_in {
short sin_family;
unsigned short sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
struct in_addr {
uint32_t s_addr; // IPv4 address
};
The gethostbyname() function can give you either IPv4 or IPv6 addresses, as indicated by the value of h_addrtype, so h_addr_list needs to be able to hold either kind. To accomplish this, the addresses are stored as raw memory pointed to by char* pointers. To get the actual address, you need to cast the memory to the correct address type (struct in_addr when h_addrtype is AF_INET), as you found in your code:
remoteAddr.sin_addr.s_addr = ((struct in_addr*) (pHostent->h_addr))->s_addr;
So to answer your questions:
The pointer types are different, but the data pointed to is of the same type.
No, the data is in one place only; it's just referenced by pointers of different types.
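On the memcpy() question, a copy-based version can be sketched as follows. This is only an illustration, not the code from the examples the asker found: fill_ipv4_addr is a hypothetical helper name, and it assumes the entry should be rejected unless it is a 4-byte AF_INET address.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the first IPv4 address of a hostent into sa.
 * Returns 0 on success, -1 if the entry is not a 4-byte AF_INET address. */
int fill_ipv4_addr(struct sockaddr_in *sa, const struct hostent *he)
{
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof sa->sin_addr ||
        he->h_addr_list == NULL || he->h_addr_list[0] == NULL)
        return -1;

    /* Byte-wise copy: no pointer cast, so the alignment of the source
     * buffer does not matter. */
    memcpy(&sa->sin_addr, he->h_addr_list[0], sizeof sa->sin_addr);
    sa->sin_family = AF_INET;
    return 0;
}
```

The result is the same four bytes in sin_addr as with the cast; the difference is that the length and address family are checked before anything is copied.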

What does "struct sockaddr_in servaddr, cliaddr;" mean?

struct sockaddr_in servaddr, cliaddr;
I am a newbie in socket programming.
What does this statement mean in C socket programming?
Are we creating a struct named sockaddr_in, with servaddr and cliaddr as its members? Why is their data type not mentioned?
What does this statement mean in C socket programming?
It declares two uninitialized variables of the type struct sockaddr_in.
Are we creating a struct named sockaddr_in?
No. It must be defined already to declare variables, or your program is ill-formed.
and are the servaddr and cliaddr the members?
Nope.
Why is their datatype not mentioned?
It is. See the answer to your first question.
In Unix sockets, header netinet/in.h defines a struct type sockaddr_in, something like this:
struct sockaddr_in {
short sin_family; // e.g. AF_INET, AF_INET6
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
struct in_addr {
unsigned long s_addr; // load with inet_pton()
};
(ref: http://beej.us/guide/bgnet/output/html/multipage/sockaddr_inman.html)
Your servaddr and cliaddr are the names of two variables of type struct sockaddr_in. Each of them has (in Unix, at least) four members (short sin_family, etc.). One of the members, sin_addr, is of the struct type in_addr, which is defined a couple of lines later and has a single member, unsigned long s_addr. There is a reason for a struct with only one member; you can find an explanation for that on Stack Overflow too.
If you want to use sockaddr_in, you must #include <netinet/in.h>, or the Windows equivalent of netinet/in.h (unless you define the struct yourself).
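Since the declaration leaves both variables uninitialized, they are normally filled in before use. A minimal sketch of that step, assuming a POSIX system; make_ipv4_sockaddr is a hypothetical helper name, and the address and port in the usage line are only example values:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *sa with an IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string. */
int make_ipv4_sockaddr(struct sockaddr_in *sa, const char *ip,
                       unsigned short port)
{
    memset(sa, 0, sizeof *sa);      /* also zeroes sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);     /* port in network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

With this, `struct sockaddr_in servaddr; make_ipv4_sockaddr(&servaddr, "127.0.0.1", 3490);` leaves servaddr ready to be passed (cast to struct sockaddr *) to calls such as bind() or connect().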

What is the purpose of the sa_data field in a sockaddr?

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The actual structure passed for the addr argument will depend on the address family. The sockaddr structure is defined as something like:
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
};
So for an IPv4 address (AF_INET), the actual struct that will be passed is this:
/* Source http://linux.die.net/man/7/ip */
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address. */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
Does the bind code read the sockaddr.sa_family value and depending on the value it finds, it will then cast the sockaddr struct into the appropriate struct such as sockaddr_in?
Why is sa_data set to 14 characters? If I understand correctly, the sa_data field is just a field with enough memory space to fit all address family types. Presumably the original designers anticipated that 14 characters would be wide enough to fit all future types.
According to the glibc manual:
The length 14 of sa_data is essentially arbitrary.
And the FreeBSD developers handbook mentions the following:
Please note the vagueness with which the sa_data field is declared,
just as an array of 14 bytes, with the comment hinting there can be
more than 14 of them.
This vagueness is quite deliberate. Sockets is a very powerful
interface. While most people perhaps think of it as nothing more than
the Internet interface—and most applications probably use it for that
nowadays—sockets can be used for just about any kind of interprocess
communications, of which the Internet (or, more precisely, IP) is only
one.
Yes, the sa_family field is used to recognize how to treat the struct passed (which is cast to struct sockaddr* in a call to bind). You can read more about how it works in the FreeBSD Developers' Handbook.
And actually there are "polymorphic" (sub)types of sockaddr that need more room than the 14 bytes of sa_data, for example:
struct sockaddr_un {
sa_family_t sun_family; /* AF_UNIX */
char sun_path[108]; /* pathname */
};
The sockaddr struct is used as a tagged union. By reading the sa_family field it can be cast to a struct of the proper form.
The 14 bytes is arbitrary. It's big enough to hold IPv4 addresses, but not big enough to hold IPv6 addresses. There is also a sockaddr_storage struct which is big enough for both. Reading the Microsoft docs on SOCKADDR_STORAGE, it comes in at 128 bytes, so much larger than needed for IPv6. Checking some Linux headers, it seems to be at least that large there as well.
For reference, the IPv6 struct is:
struct sockaddr_in6 {
u_int16_t sin6_family; // address family, AF_INET6
u_int16_t sin6_port; // port number, Network Byte Order
u_int32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // IPv6 address
u_int32_t sin6_scope_id; // Scope ID
};
struct in6_addr {
unsigned char s6_addr[16]; // IPv6 address
};
As you can see, the 16-byte s6_addr field is already bigger than the 14-byte sa_data field on its own. The total size after the sin6_family field is 26 bytes.
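A short sketch of why sockaddr_storage is the usual choice when the address family is not known in advance. store_sockaddr is a hypothetical helper name; the code assumes only that struct sockaddr_storage is large enough for every supported address type, which is what POSIX specifies.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: keep a copy of an address of any family.
 * Returns 0 on success, -1 if len exceeds the storage size. */
int store_sockaddr(struct sockaddr_storage *ss,
                   const struct sockaddr *sa, socklen_t len)
{
    if (len > sizeof *ss)
        return -1;
    memset(ss, 0, sizeof *ss);
    memcpy(ss, sa, len);    /* ss->ss_family now identifies the contents */
    return 0;
}
```

A plain struct sockaddr variable could not serve as the destination here, because a struct sockaddr_in6 does not fit into it.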

Why is sin_addr inside the structure in_addr?

My doubt is related to the following structure of sockets in UNIX :
struct sockaddr_in {
short sin_family; // e.g. AF_INET, AF_INET6
unsigned short sin_port; // e.g. htons(3490)
struct in_addr sin_addr; // see struct in_addr, below
char sin_zero[8]; // zero this if you want to
};
Here the member sin_addr is of type struct in_addr.
But I don't get why someone would want to do that, as all struct in_addr has is:
struct in_addr {
unsigned long s_addr; // load with inet_pton()
};
All in_addr has is just one member, s_addr. Why can't we have something like this:
struct sockaddr_in {
short sin_family; // e.g. AF_INET, AF_INET6
unsigned short sin_port; // e.g. htons(3490)
unsigned long s_addr ;
char sin_zero[8]; // zero this if you want to
};
struct in_addr is sometimes very different from that, depending on what system you're on. On Windows, for example:
typedef struct in_addr {
union {
struct {
u_char s_b1,s_b2,s_b3,s_b4;
} S_un_b;
struct {
u_short s_w1,s_w2;
} S_un_w;
u_long S_addr;
} S_un;
} IN_ADDR, *PIN_ADDR, FAR *LPIN_ADDR;
The only requirement is that it contain a member s_addr (on Windows, s_addr is a macro that expands to S_un.S_addr).
struct in_addr is more than just an integer because it might contain more than a single in_addr_t. On many systems it contains a union, and the reason for that implementation is class A/B/C addressing, which is no longer used.
Unix Network Programming Volume 1 explains the historical reason in detail:
The reason the sin_addr member is a structure, and not just an in_addr_t,
is historical. Earlier releases (4.2BSD) defined the in_addr structure as a
union of various structures, to allow access to each of the 4 bytes and to
both of the 16-bit values contained within the 32-bit IPv4 address. This was
used with class A, B, and C addresses to fetch the appropriate bytes of the
address. But with the advent of subnetting and then the disappearance of the
various address classes with classless addressing, the need for the
union disappeared. Most systems today have done away with the union and
just define in_addr as a structure with a single in_addr_t member.
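For comparison, the per-byte access that the old union provided can be had today with shifts on the single s_addr member. A minimal sketch; ipv4_octets is a hypothetical helper name, not part of any socket API:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: split an IPv4 address into its four octets,
 * most significant (leftmost in dotted notation) first. */
void ipv4_octets(struct in_addr addr, uint8_t out[4])
{
    uint32_t host = ntohl(addr.s_addr);   /* network -> host byte order */
    out[0] = (uint8_t)(host >> 24);
    out[1] = (uint8_t)(host >> 16);
    out[2] = (uint8_t)(host >> 8);
    out[3] = (uint8_t)host;
}
```

Converting with ntohl() first keeps the result independent of the machine's endianness.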
Because the in_addr structure may contain more than one member.
http://pubs.opengroup.org/onlinepubs/009604599/basedefs/netinet/in.h.html

What does the abbreviation "s_", "ai_", "sin_", "in" (if such) in the IP structures mean?

Pretty simple questions. And yes, maybe not (that) important, but I'm really curious what do they mean and I couldn't find their meanings.
// ipv4
struct sockaddr_in {
short int sin_family; // Address family, AF_INET
unsigned short int sin_port; // Port number
struct in_addr sin_addr; // Internet address
unsigned char sin_zero[8]; // Same size as struct sockaddr
};
// ipv4
struct in_addr {
uint32_t s_addr; // that's a 32-bit int (4 bytes)
};
// protocol-independent (used with both IPv4 and IPv6)
struct addrinfo {
int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
int ai_socktype; // SOCK_STREAM, SOCK_DGRAM
int ai_protocol; // use 0 for "any"
size_t ai_addrlen; // size of ai_addr in bytes
struct sockaddr *ai_addr; // struct sockaddr_in or _in6
char *ai_canonname; // full canonical hostname
struct addrinfo *ai_next; // linked list, next node
};
// generic socket address (any address family)
struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};
sin_ means sockaddr_in, ai_ means addrinfo, sa_ means sockaddr. I'm not sure about the s_ in in_addr. The sockets API was designed with pre-standard early 1980s C compilers in mind, which might have had a single namespace for all struct members.
larsman is mostly right, but it's not merely a matter of legacy single-namespace considerations. All the structs defined in standard headers use names of this form to avoid stepping on the application's namespace for macros. If struct members were not prefixed with ai_, sin_, etc., then whatever member names were included in the struct (including extensions not even specified in the C or POSIX standards) would clash and result in errors if an application defined the same name as a preprocessor macro. By using these "struct-local namespaces" that can be reserved by simple pattern rules in the standards (for instance, netdb.h reserves ai_*) there is a clear distinction between names reserved for use by the implementation and names reserved for use by the application, and new extensions or new revisions of the standard will not result in clashes.
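The macro argument can be illustrated with a small sketch. The macros port and family below are hypothetical application-level names chosen for the example; they coexist with struct sockaddr_in only because its members are called sin_port and sin_family.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical application macros with very ordinary names. */
#define port   8080
#define family AF_INET

struct sockaddr_in make_listener_addr(void)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = family;        /* member is sin_family: no clash */
    sa.sin_port = htons(port);     /* member is sin_port: no clash   */
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    return sa;
}
```

Had the members been named plainly port and family, an expression like sa.port would be rewritten by the preprocessor to sa.8080 and fail to compile.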
The Microsoft definition of in_addr could imply that the "S_" prefix means struct, as per R..'s answer, given the "un" for union; however, they use a capital "S", unlike POSIX land.
typedef struct in_addr {
union {
struct {
u_char s_b1,s_b2,s_b3,s_b4;
} S_un_b;
struct {
u_short s_w1,s_w2;
} S_un_w;
u_long S_addr;
} S_un;
} IN_ADDR, *PIN_ADDR, FAR *LPIN_ADDR;
For the set of socket structures, "s" usually means "sock", short for "socket", so there does not appear to be a definitive reason.
As a curiosity, the MSDN page on IPv6 support shows struct in6_addr containing the name s6_addr, implying the IPv6 version of an IPv4 "socket" structure rather than just an "IPv4 structure".
