struct sockaddr, do we really need to give addrlen

I'm currently writing a C library that uses network structures such as sockaddr.
In my code, I am using this ternary expression to deduce the size of such a structure:
addrlen = (sourceaddr->sa_family == AF_INET) ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6);
However, most of the standard Unix functions, such as bind, have this kind of signature:
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
It contains an explicit declaration of the address length, addrlen.
My question is: "Why should I need an explicit indication of the address length in my library if I can deduce it from sa_family?"
For instance, the bind call can change to this signature:
int bind(int sockfd, const struct sockaddr *addr);
And use internally:
__bind(sockfd, addr,
addr->sa_family == AF_INET ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6));
Thank you very much

It is linked to the history of the socket API across different operating systems. Each system had its own socket implementation, and these APIs were not standardized among them.
socklen_t is a strange typedef for the size of a struct sockaddr in
accept(), getpeername(), and connect().
The reason for this typedef is that in BSD, those functions took an
int*, and the POSIX people screwed it all up when they decided to make
such arguments point to an integer of (potentially) another size,
under a new typedef.
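A minimal sketch of the deduction the question proposes, assuming only IPv4 and IPv6 need to be supported. The name sockaddr_len_from_family is hypothetical and not part of any standard API. The sketch also hints at why the standard calls keep an explicit addrlen: for a family such as AF_UNIX the meaningful length depends on the contents of the address, so it cannot be derived from sa_family alone.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not a standard function): returns the size of the
 * concrete address structure behind `addr`, or 0 when the family does not
 * map to one fixed size in this sketch. */
static socklen_t sockaddr_len_from_family(const struct sockaddr *addr)
{
    switch (addr->sa_family) {
    case AF_INET:
        return (socklen_t)sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t)sizeof(struct sockaddr_in6);
    default:
        /* e.g. AF_UNIX: the length depends on the stored path, so give up. */
        return 0;
    }
}
```

A wrapper in the library could call this helper and report an error when it returns 0, instead of silently assuming IPv6 for every family that is not AF_INET.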

Related

Safely converting from struct sockaddr to struct sockaddr_storage

I have a function which takes in "struct sockaddr *" as a parameter (let's call this input_address), and then I need to operate on that address, which may be a sockaddr_in or sockaddr_in6, since I support both IPv4 and IPv6.
I'm getting some memory corruption and trying to track it down to its source, and in the process found some code that seems suspect, so I would like to validate whether this is the right way to do things.
struct sockaddr_storage *input_address_storage = (struct sockaddr_storage *) input_address;
struct sockaddr_storage result = [UtilityClass performSomeOperation: *input_address_storage];
At first I thought the cast in the first line was safe, but then in the second line I need to dereference that pointer, which seems like it may be wrong. The reason I am concerned is that it may end up copying memory that is beyond where the original structure is (since sockaddr_in is shorter than sockaddr_in6). I am not sure if this could cause a memory corruption (my guess is no), but nevertheless this code gives me a bad feeling.
I can't change the fact my function takes a "struct sockaddr *", so it seems like it would be difficult to work around this type of code, and yet I want to avoid copying from a memory location where I shouldn't be.
If anyone can validate whether what I am doing is wrong, and the best way to fix this, I'd appreciate it.
EDIT: An admin had changed my C tag to C# for some reason. The code I gave is primarily C, with one function call from Objective-C that doesn't really matter. That call could have been C.

The problem with your approach is that you are converting an existing struct sockaddr* into a struct sockaddr_storage*. Imagine what happens if the original was a struct sockaddr_in. Since sizeof(struct sockaddr_in) < sizeof(struct sockaddr_storage), dereferencing the converted pointer reads past the end of the original object, and a memory sanitizer complains about an out-of-bounds memory reference.
struct sockaddr_storage is essentially a container large enough to hold either your struct sockaddr_in or your struct sockaddr_in6.
Hence, it is useful when you need to pass in a struct sockaddr* object but want to allocate enough memory for both sockaddr_in and sockaddr_in6.
A good example is the recvfrom(3) call:
ssize_t recvfrom(int socket, void *restrict buffer, size_t length,
int flags, struct sockaddr *restrict address,
socklen_t *restrict address_len);
Since address requires a struct sockaddr* object, we will construct a struct sockaddr_storage first, and pass it in:
struct sockaddr_storage address;
socklen_t address_length = sizeof(struct sockaddr_storage);
ssize_t ret = recvfrom(fd, buffer, buffer_length, 0, (struct sockaddr*)&address, &address_length);
if (address.ss_family == AF_INET) {
    DoIpv4Work((struct sockaddr_in*)&address, ...);
} else if (address.ss_family == AF_INET6) {
    DoIpv6Work((struct sockaddr_in6*)&address, ...);
}
The difference between your approach and mine is that I allocate a struct sockaddr_storage and then use it as a struct sockaddr, whereas you do the reverse: you start with a struct sockaddr and then use it as a struct sockaddr_storage.
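When the function must keep its struct sockaddr * parameter, one way to obtain a sockaddr_storage without reading past the original object is to copy only as many bytes as the address family implies. This is a sketch under the assumption that only AF_INET and AF_INET6 are handled; sockaddr_to_storage is a hypothetical name, not an existing API.

```c
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the address behind `src` into caller-provided
 * storage. Returns 0 on success, -1 if the family is not handled here. */
static int sockaddr_to_storage(const struct sockaddr *src,
                               struct sockaddr_storage *dst)
{
    size_t len;

    switch (src->sa_family) {
    case AF_INET:
        len = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        len = sizeof(struct sockaddr_in6);
        break;
    default:
        return -1;              /* family not handled in this sketch */
    }

    memset(dst, 0, sizeof(*dst));   /* zero the part the copy will not fill */
    memcpy(dst, src, len);          /* copies only the bytes the source owns */
    return 0;
}
```

The caller then passes the filled sockaddr_storage by value or by pointer, and the original input_address is never dereferenced as a larger type.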

What care the parameter sizeof(sockaddr_in)

In functions such as connect(), the last parameter is the size of the sockaddr_in structure.
That is fine, but sin_zero is unused (the system doesn't "pop sin_zero", as I understood from stackoverflow.com/questions/28280581/how-kernels-recognize-sin-zero-sockaddr-in-structure-pushed), so why do the functions need this parameter?
For example, in asm (or when C is compiled to asm), the RET of connect() must specify the number of bytes to remove from the stack (RET n), namely the bytes of the connect() arguments. Why is n a fixed number (it doesn't change whether or not I specify sin_zero) if I can choose whether to push sin_zero in asm and whether to assign a value to sin_zero in C? And if I write a program in asm and don't push sin_zero onto the stack, the program works perfectly... but n is still the same number!

A sockaddr_in is a structure containing an internet address. This structure is defined in <netinet/in.h>. For example, server and client addresses can be declared as struct sockaddr_in serv_addr, cli_addr; Here is the definition:
struct sockaddr_in {
    short sin_family;
    u_short sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
sin_family – use AF_INET
sin_port – port number (in network byte order => use htons(port))
sin_addr – Internet address described by struct in_addr
struct in_addr {
    in_addr_t s_addr; /* 32-bit address, network byte order */
};
Set s_addr to INADDR_ANY => any local internet address.
sin_zero[] – set to 0 with bzero() or memset(). Padding to make the structure the same size as struct sockaddr.
bind() example
int mysock,err;
struct sockaddr_in myaddr;
mysock = socket(AF_INET,SOCK_STREAM,0);
myaddr.sin_family = AF_INET;
myaddr.sin_port = htons( portnum );
myaddr.sin_addr.s_addr = htonl(INADDR_ANY);
bzero(&(myaddr.sin_zero),sizeof(myaddr.sin_zero));
err = bind(mysock, (struct sockaddr *) &myaddr, sizeof(myaddr));
I have found some info about sin_zero (Unix network programming chapter 3.2)
The POSIX specification requires only three members in the structure: sin_family, sin_addr, and sin_port. It is acceptable for a POSIX-compliant implementation to define additional structure members, and this is normal for an Internet socket address structure. Almost all implementations add the sin_zero member so that all socket address structures are at least 16 bytes in size.
And the definition of sin_zero
unsigned char __pad[__SOCK_SIZE__ - sizeof(short int)
- sizeof(unsigned short int) - sizeof(struct in_addr)];
};
#define sin_zero __pad
Most of the net code does not use sockaddr_in; it uses sockaddr. When you use a function like sendto, you must explicitly cast a pointer to sockaddr_in, or to whatever address type you're using, to a pointer to sockaddr. sockaddr_in is the same size as sockaddr, but only because of a slight hack.
That hack is sin_zero. Really, the length of the useful data in sockaddr_in is shorter than sockaddr. But the difference is padded in sockaddr_in using a small buffer; that buffer is sin_zero.
On some architectures, not clearing sin_zero won't cause any problems. But on other architectures it might. It's required by specification to clear sin_zero, so you must do this if you intend your code to be bug free now and in the future.
Please, have a look at page.
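One common way to follow the zeroing advice is to clear the whole structure before filling in the fields, which covers sin_zero and any other implementation-specific padding in one step. The sketch below assumes an IPv4 wildcard address; make_ipv4_any is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill `addr` with the IPv4 wildcard address and the
 * given port (host byte order in, network byte order stored). */
static void make_ipv4_any(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof(*addr));          /* clears sin_zero as well */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
}
```

The result would then be passed as (struct sockaddr *)&addr together with sizeof(addr), as in the bind() example above.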

I think the kernel uses the size parameter to tell whether your sockaddr is an IPv6 address, because an IPv6 address structure is bigger than the other network address structures on Linux and Unix.

Maybe it's an architecture issue: on different architectures (64-bit vs. 32-bit), the sizes of the primitive types declared in sockaddr_in may differ. That may be why the size is a parameter of the connect() function.

socket structure : casting?

Context
I am self-learning how sockets work.
Admittedly, I am not a C guru, but I learn fast.
I read this page :
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/topic/rzab6/rzab6xafunixsrv.htm
Problem
I am stuck at this line :
rc = bind(sd, (struct sockaddr *)&serveraddr, SUN_LEN(&serveraddr));
I just cannot figure out what we get from this cast (struct sockaddr *)&serveraddr.
My test so far :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
int main(void)
{
    /*JUST TESTING THE CAST THING NOTHING ELSE HERE*/
    struct sockaddr_in localaddr ;
    struct sockaddr_in * mi;
    struct sockaddr * toto;
    localaddr.sin_family = AF_INET;
    localaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    localaddr.sin_port = 38999;
    /* DID I DEFINED MI CORRECTLY ? */
    mi = (struct sockaddr*)&localaddr;
    toto = (struct sockaddr*)&localaddr;
    printf("mi %d\n",mi->sin_family);
    printf("mi %d\n",mi->sin_port);
    printf("toto %d\n",toto->sa_family);
    /*ERROR*/
    printf("toto %d\n",toto->sa_port);
}
SUM UP
Could someone please tell me what is really passed to the bind function concerning the structure cast ?
What members do we have in that structure ?
How can I check it ?
Thanks

Here's struct sockaddr (as found on BSD-derived systems, which have the sa_len member):
struct sockaddr {
    uint8_t sa_len;
    sa_family_t sa_family;
    char sa_data[14];
};
and, for instance, here's struct sockaddr_in:
struct sockaddr_in {
    uint8_t sin_len;
    sa_family_t sin_family;
    in_port_t sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
and struct sockaddr_in6:
struct sockaddr_in6 {
    uint8_t sin6_len;
    sa_family_t sin6_family;
    in_port_t sin6_port;
    uint32_t sin6_flowinfo;
    struct in6_addr sin6_addr;
    uint32_t sin6_scope_id;
};
You'll note that they all begin with the same two members: a length and an address family (named sa_len and sa_family in struct sockaddr, with sin_ and sin6_ prefixes in the other two). Thus, functions like bind() can accept a pointer to a generic struct sockaddr and know that, regardless of which specific struct it actually points to, these two members are in common. "In common" here means "laid out the same way in memory", so there won't be any weirdness where both structs have a family member but in totally different places. Technically the length member is optional, but if it's not there, none of the structs will have it, so sa_family will still be aligned in the same way, and often the datatype of sa_family_t will be widened to make up the difference in size. So bind() can access sa_family, determine exactly what type of struct it is, and proceed accordingly, e.g. something like:
int bind(int socket, const struct sockaddr *address, socklen_t address_len) {
    if ( address->sa_family == AF_INET ) {
        const struct sockaddr_in * real_struct = (const struct sockaddr_in *)address;
        /* Do stuff with IPv4 socket */
    }
    else if ( address->sa_family == AF_INET6 ) {
        const struct sockaddr_in6 * real_struct = (const struct sockaddr_in6 *)address;
        /* Do stuff with IPv6 socket */
    }
    /* etc */
    return 0;
}
(Pedantic note: technically, according to the C standard [section 6.5.2.3, paragraph 6 of C11], you're only supposed to inspect common initial parts of structs like this if you embed them within a union, but in practice it'll almost always work without one, and for simplicity I haven't used one in the above code).
It's basically a way of getting polymorphism when you don't actually have real OOP constructs. In other words, it means you don't have to have a bunch of functions like bind_in(), bind_in6(), and all the rest of it, one single bind() function can handle them all because it can figure out what type of struct you actually have (provided that you set the sa_family member correctly, of course).
The reason you need the actual cast is because C's type system requires it. You have a generic pointer in void *, but beyond that everything has to match, so if a function accepts a struct sockaddr * it just won't let you pass anything else, including a struct sockaddr_in *. The cast essentially tells the compiler "I know what I'm doing, here, trust me", and it'll relax the rules for you. bind() could have been written to accept a void * instead of a struct sockaddr * and a cast would not have been necessary, but it wasn't written that way, because:
It's semantically more meaningful - bind() isn't written to accept any pointer whatsoever, just one to a struct which is "derived" from struct sockaddr; and
The original sockets API was released in 1983, which was before the ANSI C standard in 1989, and its predecessor - K&R C - just didn't have void *, so you were going to have to cast it to something in any case.

Casts appear in socket code because C doesn't have inheritance.
struct sockaddr is an abstract supertype of struct sockaddr_in and friends. The syscalls take the abstract type, you want to pass an actual instance of a derived type, and C doesn't know that converting a struct sockaddr_in * to a struct sockaddr * is automatically safe because it has no idea of the relationship between them.
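To make the "cast to the abstract type, then cast back" pattern concrete, here is a small sketch. struct sockaddr itself has no port member (which is why toto->sa_port in the question does not compile); the port only becomes reachable after checking sa_family and casting back to the family-specific type. sockaddr_port is a hypothetical helper name, and only the two IP families are handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port in host byte order, or -1 when the
 * address family carries no port in this sketch. */
static int sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}
```

A caller would fill a struct sockaddr_in, store the port with htons(), and pass (struct sockaddr *)&localaddr, just as it would to bind().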

Why do we cast sockaddr_in to sockaddr when calling bind()?

The bind() function accepts a pointer to a sockaddr, but in all examples I've seen, a sockaddr_in structure is used instead, and is cast to sockaddr:
struct sockaddr_in name;
...
if (bind (sock, (struct sockaddr *) &name, sizeof (name)) < 0)
...
I can't wrap my head around why a sockaddr_in struct is used. Why not just prepare and pass a sockaddr?
Is it just convention?

No, it's not just convention.
sockaddr is a generic descriptor for any kind of socket operation, whereas sockaddr_in is a struct specific to IP-based communication (IIRC, "in" stands for "InterNet"). As far as I know, this is a kind of "polymorphism": the bind() function pretends to take a struct sockaddr *, but in fact it will assume that the appropriate type of structure is passed in, i.e. one that corresponds to the type of socket you give it as the first argument.

I don't know if it's very relevant to this question, but I would like to provide some extra info which may make the typecast more understandable, as many people who haven't spent much time with C get confused on seeing such a typecast.
I use macOS, so I am taking examples based on header files from my system.
struct sockaddr is defined as follows:
struct sockaddr {
    __uint8_t sa_len; /* total length */
    sa_family_t sa_family; /* [XSI] address family */
    char sa_data[14]; /* [XSI] addr value (actually larger) */
};
struct sockaddr_in is defined as follows:
struct sockaddr_in {
    __uint8_t sin_len;
    sa_family_t sin_family;
    in_port_t sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
Starting from the very basics, a pointer just contains an address. So struct sockaddr * and struct sockaddr_in * are pretty much the same: they both just store an address. The only relevant difference is how the compiler treats the objects they point to.
So when you say (struct sockaddr *) &name, you are just telling the compiler to treat this address as pointing to a struct sockaddr.
So let's say the pointer is pointing to location 1000. If a struct sockaddr * stores this address, the compiler will treat the memory from 1000 to 1000 + sizeof(struct sockaddr) as holding the members given by that structure definition. If a struct sockaddr_in * stores the same address, it will do the same for the memory from 1000 to 1000 + sizeof(struct sockaddr_in).
When you typecast that pointer, it will consider the same sequence of bytes, up to sizeof(struct sockaddr).
struct sockaddr *a = (struct sockaddr *) &name; // consider &name = 1000
Now if I access a->sa_len, the compiler accesses sizeof(__uint8_t) bytes starting at location 1000, which is the same size and position as sin_len in sockaddr_in. So this accesses the same sequence of bytes.
The same pattern holds for sa_family.
After that there is a 14-byte character array in struct sockaddr, which covers the data of in_port_t sin_port (a typedef'd 16-bit unsigned integer = 2 bytes), struct in_addr sin_addr (simply a 32-bit IPv4 address = 4 bytes) and char sin_zero[8] (8 bytes). These three add up to 14 bytes.
These three are stored in this 14-byte character array, and we can access any of them by accessing the appropriate indices and typecasting them again.
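The byte-level overlap described above can be checked with a short sketch. It assumes the layout shown in this answer, where sin_port occupies the first two bytes of sa_data; that holds on common Linux, macOS and BSD headers but is not guaranteed by POSIX. port_via_sa_data is a hypothetical name, and memcpy is used so that the example does not depend on pointer aliasing.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: read an IPv4 port through the generic sa_data bytes.
 * Assumes sin_port sits at the start of sa_data (a layout assumption). */
static unsigned port_via_sa_data(const struct sockaddr_in *in)
{
    struct sockaddr generic;
    in_port_t port_be;
    size_t n = sizeof(generic) < sizeof(*in) ? sizeof(generic) : sizeof(*in);

    memset(&generic, 0, sizeof(generic));
    memcpy(&generic, in, n);                       /* same leading bytes */
    memcpy(&port_be, generic.sa_data, sizeof(port_be));
    return ntohs(port_be);                         /* back to host order */
}
```

Real code should cast back to struct sockaddr_in and read sin_port directly; indexing sa_data by hand is shown here only to illustrate the layout.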
user529758's answer already explains the reason to do this.

This is because bind can bind other types of sockets than IP sockets, for instance Unix domain sockets, which use sockaddr_un as their address type. The address of an AF_INET socket consists of a host and a port, whereas the address of an AF_UNIX socket is a filesystem path.
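For comparison with the sockaddr_in examples, here is a sketch of preparing a Unix domain address. make_unix_addr is a hypothetical helper; the length it returns (offset of sun_path plus the path and its terminator) is one common convention, and passing sizeof(struct sockaddr_un) is also common.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill `addr` for a filesystem path and return the
 * address length to pass to bind() or connect(), or 0 if the path does
 * not fit into sun_path. */
static socklen_t make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    size_t n = strlen(path);

    if (n >= sizeof(addr->sun_path))
        return 0;                           /* no room for the terminator */

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, n + 1);    /* copy including the '\0' */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n + 1);
}
```

The call site looks the same as for IP sockets: bind(sock, (struct sockaddr *)&addr, len), with only the concrete struct and the length differing.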

Reasoning behind C sockets sockaddr and sockaddr_storage

I'm looking at functions such as connect() and bind() in C sockets and notice that they take a pointer to a sockaddr struct. I've read that, to make your application AF-independent, it is useful to use a sockaddr_storage struct and cast a pointer to it to a sockaddr pointer, because of all the extra space it has for larger addresses.
What I am wondering is how functions like connect() and bind(), which ask for a sockaddr pointer, go about accessing the data from a pointer that points at a larger structure than the one they are expecting. Sure, you pass the size of the structure you are providing, but what is the actual syntax that the functions use to get the IP address out of pointers to larger structures that you have cast to struct sockaddr *?
It's probably because I come from OOP languages, but it seems like kind of a hack and a bit messy.

Functions that expect a pointer to struct sockaddr simply treat the pointer you pass as a struct sockaddr *, even when it really points to a struct sockaddr_storage. In that way, they access it as if it were a struct sockaddr.
struct sockaddr_storage is designed to be large enough to hold both a struct sockaddr_in and a struct sockaddr_in6.
You don't create your own struct sockaddr; you usually create a struct sockaddr_in or a struct sockaddr_in6, depending on which IP version you're using. To avoid having to know in advance which IP version you will be using, you can use a struct sockaddr_storage, which can hold either. Its address is in turn cast to struct sockaddr * when passed to connect(), bind(), etc., and accessed that way.
You can see all of these structs below (the padding is implementation specific, for alignment purposes):
struct sockaddr {
    unsigned short sa_family; // address family, AF_xxx
    char sa_data[14]; // 14 bytes of protocol address
};
struct sockaddr_in {
    short sin_family; // address family, AF_INET
    unsigned short sin_port; // e.g. htons(3490)
    struct in_addr sin_addr; // see struct in_addr, below
    char sin_zero[8]; // zero this if you want to
};
struct sockaddr_in6 {
    u_int16_t sin6_family; // address family, AF_INET6
    u_int16_t sin6_port; // port number, Network Byte Order
    u_int32_t sin6_flowinfo; // IPv6 flow information
    struct in6_addr sin6_addr; // IPv6 address
    u_int32_t sin6_scope_id; // Scope ID
};
struct sockaddr_storage {
    sa_family_t ss_family; // address family
    // all this is padding, implementation specific, ignore it:
    char __ss_pad1[_SS_PAD1SIZE];
    int64_t __ss_align;
    char __ss_pad2[_SS_PAD2SIZE];
};
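So as you can see, the function first reads the address family, which sits at the start of every one of these structs. If it is AF_INET, the function treats the struct as a struct sockaddr_in and reads the 4-byte sin_addr; if it is AF_INET6, it treats it as a struct sockaddr_in6 and reads the full 16-byte sin6_addr.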
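As for the "actual syntax" asked about in the question, a sketch of how code on the receiving side might pull the IP address out of a generic pointer is shown below. It uses the standard inet_ntop() call; format_ip is a hypothetical helper name, and only the two IP families are handled.

```c
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: write the textual IP address behind `sa` into `buf`.
 * Returns `buf` on success, NULL on an unsupported family or a buffer that
 * is too small. */
static const char *format_ip(const struct sockaddr *sa, char *buf, socklen_t buflen)
{
    switch (sa->sa_family) {
    case AF_INET:
        return inet_ntop(AF_INET,
                         &((const struct sockaddr_in *)sa)->sin_addr,
                         buf, buflen);
    case AF_INET6:
        return inet_ntop(AF_INET6,
                         &((const struct sockaddr_in6 *)sa)->sin6_addr,
                         buf, buflen);
    default:
        return NULL;
    }
}
```

A caller that keeps its address in a struct sockaddr_storage would pass (struct sockaddr *)&storage and a buffer of at least INET6_ADDRSTRLEN bytes.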

In C++ classes with at least one virtual function are given a TAG. That tag allows you to dynamic_cast<>() to any of the classes your class derives from and vice versa. The TAG is what allows dynamic_cast<>() to work. More or less, this can be a number or a string...
In C we are limited to structures. However, structures can also be assigned a TAG. In fact, if you look at all the structures that theprole posted in his answer, you will notice that they all start with 2 bytes (an unsigned short) which represents what we call the family of the address. This defines exactly what the structure is and thus its size, fields, etc.
Therefore you can do something like this:
int bind(int fd, struct sockaddr *in, socklen_t len)
{
    switch(in->sa_family)
    {
    case AF_INET:
        if(len < sizeof(struct sockaddr_in))
        {
            errno = EINVAL; // wrong size
            return -1;
        }
        {
            struct sockaddr_in *p = (struct sockaddr_in *) in;
            ...
        }
        break;
    case AF_INET6:
        if(len < sizeof(struct sockaddr_in6))
        {
            errno = EINVAL; // wrong size
            return -1;
        }
        {
            struct sockaddr_in6 *p = (struct sockaddr_in6 *) in;
            ...
        }
        break;
    [...other cases...]
    default:
        errno = EINVAL; // family not supported
        return -1;
    }
}
As you can see, the function can check the len parameter to make sure that the length is enough to fit the expected structure, and therefore it can reinterpret_cast<>() (as it would be called in C++) your pointer. Whether the data in the structure is correct is up to the caller; there is not much choice on that end. These functions are expected to verify all sorts of things before they use the data, and to return -1 and set errno whenever a problem is found.
So in effect, you have a struct sockaddr_in or struct sockaddr_in6 that you (reinterpret) cast to a struct sockaddr and the bind() function (and others) cast that pointer back to a struct sockaddr_in or struct sockaddr_in6 after they checked the sa_family member and verified the size.
