Clarity on bind() socket function - c

While I was reading how to make a TCP client/server connection in C, I had a doubt about the bind() function.
I read that you need this function to "bind" a socket you've created to a local IPEndPoint, because the client/server connection takes a socket pair, made by LocalIP:LocalPort, RemoteIP:RemotePort. Thus my questions are:
What happens, and what does the kernel do, when a client doesn't call bind() but calls connect() immediately after creating a socket (this is a common thing; I do it too in a client program, but I don't understand why I don't need to bind)?
... and above all ...
Why does a server program call bind(), specifying INADDR_ANY as the LocalIP address? I read that this constant is useful to specify a generic IP address, that is, the server accepts data from anywhere on the Internet. But... is this a specification of a local IP address, or does it indicate where the clients can reach the server? I'm getting very confused...

1) You usually only need to call bind() when you are creating a server socket. It is occasionally needed for a client socket, but most of the time it is not. If you want to wait for incoming connections on a certain port, you have to bind to it. If you want to connect out to some IP and port, there's no need to bind. By default, the server socket's bind takes exclusive ownership of that TCP port on the address it binds to: no other socket can bind to the same address and port until your application exits or you close the socket.
2) You are specifying which IP on the local computer to bind to. A single computer can have many IP addresses. Your computer may have a wired and wireless connection. Each has its own IP on the local network. You can specifically bind to one of those IPs and not the other. You could even have one application bound to port 473 (for example) on one IP and an entirely different application bound to port 473 on the other IP. If you specify INADDR_ANY, you are binding to all valid IPs the machine has. So it doesn't matter what IP the client used to get to you, it will work.
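A minimal sketch of the exclusivity described in point 1, assuming a POSIX system with a loopback interface and default socket options. The helper names (bind_tcp, local_port, second_bind_errno) are made up for illustration. The first socket binds 127.0.0.1 on a kernel-chosen port; a second bind() to the same address and port should then fail with EADDRINUSE. Options such as SO_REUSEADDR or SO_REUSEPORT change these rules in platform-specific ways and are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to ip:port (port in host byte order).
 * Returns the socket on success, or -1 with errno set. */
static int bind_tcp(const char *ip, unsigned short port)
{
    assert(ip != NULL);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;          /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port of a bound socket in host byte order (0 on error). */
static unsigned short local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Bind 127.0.0.1 on a kernel-chosen port, then try the same address again.
 * Returns the errno of the second bind() (0 if it unexpectedly succeeded),
 * or -1 if the first bind failed. */
static int second_bind_errno(void)
{
    int first = bind_tcp("127.0.0.1", 0);   /* port 0: let the kernel pick */
    if (first < 0)
        return -1;
    int second = bind_tcp("127.0.0.1", local_port(first));
    int err = (second < 0) ? errno : 0;
    if (second >= 0)
        close(second);
    close(first);
    return err;
}
```

Calling second_bind_errno() from a test program and comparing the result with EADDRINUSE is enough to see the conflict.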

What happens, and what does the kernel do, when a client doesn't call bind() but calls connect() immediately after creating a socket (this is a common thing; I do it too in a client program, but I don't understand why I don't need to bind)?
When you make an outbound connection without first binding the socket to an IP/port, the kernel will pick a source IP and port automatically, based on routing tables and what ports are available.
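To see the automatic choice, one can ask the kernel afterwards with getsockname(). The sketch below assumes a POSIX system; loopback_listener and connect_unbound are hypothetical helper names, and the listener exists only so that connect() has something to reach.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Demo scaffolding: a TCP listener on 127.0.0.1 with a kernel-chosen port.
 * Stores the port (host byte order) in *port; returns the socket or -1. */
static int loopback_listener(unsigned short *port)
{
    assert(port != NULL);
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);               /* 0: let the kernel pick */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Connect to 127.0.0.1:port WITHOUT calling bind() first, then report the
 * local address the kernel picked in *local.
 * Returns the connected socket, or -1 on error. */
static int connect_unbound(unsigned short port, struct sockaddr_in *local)
{
    assert(local != NULL);
    struct sockaddr_in peer;
    socklen_t len = sizeof *local;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    peer.sin_port = htons(port);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0 ||
        getsockname(fd, (struct sockaddr *)local, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After connect_unbound() returns, local->sin_port holds a non-zero ephemeral port and local->sin_addr holds the source IP selected for that route (the loopback address in this demo).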
Why does a server program call bind(), specifying INADDR_ANY as the LocalIP address? I read that this constant is useful to specify a generic IP address, that is, the server accepts data from anywhere on the Internet. But... is this a specification of a local IP address, or does it indicate where the clients can reach the server? I'm getting very confused...
What you've read is inaccurate -- the IP address in the sockaddr passed to bind() doesn't indicate where the server will accept connections from. It indicates what local IP addresses the socket should be attached to. INADDR_ANY indicates that you want to listen for connections on the specified port on any and all IP addresses attached to the machine. On servers with multiple IP addresses, it's often useful to specify one IP address to bind() to, so that other sockets can be bound to the same port on other IPs. It's also often useful to bind to a port on localhost only.
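A small sketch of that choice, assuming a POSIX system. listen_on and can_connect are hypothetical helpers: the first listens on whatever local address string it is given ("0.0.0.0" is the textual form of INADDR_ANY), the second just reports whether a TCP connection to an address succeeds.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Parse a dotted-quad IPv4 string into *a; 0 on success, -1 if invalid. */
static int fill_addr(struct sockaddr_in *a, const char *ip, unsigned short port)
{
    memset(a, 0, sizeof *a);
    a->sin_family = AF_INET;
    a->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &a->sin_addr) == 1 ? 0 : -1;
}

/* Listen on the given LOCAL address with a kernel-chosen port.
 * "0.0.0.0" means INADDR_ANY (all local addresses); "127.0.0.1" means
 * loopback only. Stores the port in *port; returns the socket or -1. */
static int listen_on(const char *ip, unsigned short *port)
{
    assert(ip != NULL && port != NULL);
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (fill_addr(&addr, ip, 0) < 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Returns 1 if a TCP connection to ip:port succeeds, 0 otherwise. */
static int can_connect(const char *ip, unsigned short port)
{
    assert(ip != NULL);
    struct sockaddr_in addr;
    if (fill_addr(&addr, ip, port) < 0)
        return 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;
    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}
```

On Linux, where the whole 127.0.0.0/8 range is loopback, can_connect("127.0.0.2", port) should fail against a listener created with listen_on("127.0.0.1", ...) and succeed against one created with listen_on("0.0.0.0", ...). Other systems may not have 127.0.0.2 configured, so treat that as a Linux-specific experiment.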

Related

Why do we require the ip address of server in itself in socket programming in C

I am fairly new to socket programming. While programming a simple client-server application, I observed that we bind the server to the server address structure.
// define the server address
struct sockaddr_in server_address = {0}; // zero every field, including padding
server_address.sin_family = AF_INET;
server_address.sin_port = htons(9002);
server_address.sin_addr.s_addr = htonl(INADDR_ANY); //focus on this line --- line 4
// bind the socket to our specified IP and port, and report failures
if (bind(server_socket, (struct sockaddr*) &server_address, sizeof(server_address)) < 0)
    perror("bind"); // needs <stdio.h>
As shown in the code, why do we need line 4? We know the server socket is going to bind to the server machine's own IP; it can't bind to any other IP. What is the significance of INADDR_ANY (or any other IP) in this context?
A host machine can have more than one network interface installed (i.e., be connected to multiple networks at a time), and there is at least one IP address associated with each interface (consider, for example, IP aliasing).
INADDR_ANY binds a socket to all available interfaces. Otherwise, you have to specify the IP address of a particular interface to bind to.
Now imagine a server with, say, three network cards. One is dedicated to production (the application), another to backups (so that backup traffic does not congest the production NIC), and the third to maintenance (it has no access to the outside world and is used only internally, for administrators to access the machine).
Suppose an ssh daemon runs on it. The server MUST NOT be reachable via ssh from the outside world (i.e. through the production NIC), so you cannot bind ssh to NIC1's IP, and you cannot bind it to 0.0.0.0 either. NIC3 is dedicated to administration, so you bind the ssh daemon to its IP.
Makes sense, doesn't it?
The same applies to backups and the main app. Quite often the main app is not meant to be accessible from the LAN, so it must be bound to NIC1's IP. And the backup scheduler's listener must not be triggerable by connections from outside, so you bind it to the IP of whichever NIC is connected to the backup server (NIC2).
Usually local databases are bound to loopback ONLY, so that they are not accessible via any of the NICs. General services are often available via any of the NICs installed on the server (think httpd, DNS #LAN, etc.).
That's why you have to bind: you have to choose how the socket will be accessible, whether via loopback, via one particular NIC, or via any NIC/LO.
It is to bind to all available interfaces in your machine/device. Explanation taken from here:
The IP address INADDR_ANY:
When you wrote your simple FTP server in project 1, you probably bound
your listening socket to the special IP address INADDR_ANY. This
allowed your program to work without knowing the IP address of the
machine it was running on, or, in the case of a machine with multiple
network interfaces, it allowed your server to receive packets destined
to any of the interfaces. In reality, the semantics of INADDR_ANY are
more complex and involved.
In the simulator, INADDR_ANY has the following semantics: When
receiving, a socket bound to this address receives packets from all
interfaces. For example, suppose that a host has interfaces 0, 1 and 2.
If a UDP socket on this host is bound using INADDR_ANY and UDP port
8000, then the socket will receive all packets for port 8000 that
arrive on interfaces 0, 1, or 2. If a second socket attempts to Bind
to port 8000 on interface 1, the Bind will fail since the first socket
already "owns" that port/interface.
When sending, a socket bound with INADDR_ANY binds to the default IP
address, which is that of the lowest-numbered interface.

Bind in TCP/UDP Sockets

The bind() function is used to assign a name (a sockaddr struct) to a socket descriptor. Why is it required for a TCP server but not for a TCP client? And why is it required for both the UDP client and server?
I have also written correctly working code without using bind() in a UDP client.
I do not understand why bind() is not used universally i.e. in all cases above.
Binding is only required if there is no other way for the computer to know which program to send the packets to. For connectionless programs this is only the receiving end.
Please have a look at the post "socket connect() vs bind()".
It does a much better job of explaining than I can. If you've got any questions afterwards, feel free to ask :)
A client that calls connect() is implicitly bound to an available ephemeral port provided by the kernel. It need not bind explicitly because it is the initiator of the connection. A server needs to bind explicitly because it has to tell the outside world (the clients) how to reach it. The server listens on that port, and a client that knows the published port initiates a connection to it.
The server can then send packets to the client because, once the connection is established, the peer details (IP and port) are known and are part of the connection identifier.
The above applies to both TCP and UDP. (With UDP, connect() is optional: an unbound client socket is bound implicitly on its first sendto(), and the server learns the client's address from recvfrom().)
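The UDP case can be sketched in one process, assuming a POSIX system. udp_implicit_bind is a hypothetical helper: the receiving socket binds explicitly, the sending socket never calls bind(), and the function compares the source port the receiver saw with the port the kernel gave the sender. The two-second receive timeout is only there so the demo cannot hang.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send one datagram from an unbound UDP socket to a bound one on loopback.
 * *seen_by_server: source port reported by recvfrom() on the receiver.
 * *client_local:   port getsockname() reports for the sender afterwards.
 * Returns 0 on success, -1 on error. */
static int udp_implicit_bind(unsigned short *seen_by_server,
                             unsigned short *client_local)
{
    assert(seen_by_server != NULL && client_local != NULL);
    struct sockaddr_in srv, from, cli;
    socklen_t len = sizeof srv;
    struct timeval tv = { 2, 0 };
    char buf[16];
    int rc = -1;

    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);
    if (server < 0 || client < 0)
        goto out;

    /* Receiving end: must bind so the kernel knows where to deliver. */
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);                /* kernel-chosen port */
    if (bind(server, (struct sockaddr *)&srv, sizeof srv) < 0 ||
        getsockname(server, (struct sockaddr *)&srv, &len) < 0)
        goto out;
    setsockopt(server, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    /* Sending end: no bind(); the first sendto() binds it implicitly. */
    if (sendto(client, "ping", 4, 0,
               (struct sockaddr *)&srv, sizeof srv) != 4)
        goto out;

    len = sizeof from;
    if (recvfrom(server, buf, sizeof buf, 0,
                 (struct sockaddr *)&from, &len) < 0)
        goto out;
    len = sizeof cli;
    if (getsockname(client, (struct sockaddr *)&cli, &len) < 0)
        goto out;

    *seen_by_server = ntohs(from.sin_port);
    *client_local = ntohs(cli.sin_port);
    rc = 0;
out:
    if (server >= 0)
        close(server);
    if (client >= 0)
        close(client);
    return rc;
}
```

Both output ports should be equal and non-zero, which is why the receiver can reply to the sender even though the sender never bound anything itself.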

C Get IP for listening server

I am writing a client/server program using C sockets. I am specifying that the server can listen on any network interface by using INADDR_ANY in sockaddr_in.sin_addr.s_addr. This is equivalent to an IP of 0.0.0.0. Is it possible for me to get the actual IP that the server is listening on? (e.g. 192.168.1.100)
When you bind a listening socket to INADDR_ANY, the socket listens on all available local IPs. There is no way to determine from the socket which IP(s) it is listening on. If you need that information, then you have to enumerate the local IPs separately (in which case you could just bind() each IP to its own socket individually if you need to retrieve pre-accept binding details). However, once accept() has returned an established client connection, you can use getsockname() on the accepted socket to know which specific IP accepted the connection.
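A sketch of that difference, assuming a POSIX system; wildcard_vs_accepted is a hypothetical helper. It listens on INADDR_ANY, connects to itself over loopback, and records what getsockname() reports for the listening socket and for the accepted socket.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fills `listening` with the address getsockname() reports for a socket
 * listening on INADDR_ANY, and `accepted` with the address it reports for
 * a connection accepted on that socket. Both buffers are buflen bytes
 * (INET_ADDRSTRLEN is enough). Returns 0 on success, -1 on error. */
static int wildcard_vs_accepted(char *listening, char *accepted, size_t buflen)
{
    assert(listening != NULL && accepted != NULL);
    struct sockaddr_in addr, local;
    socklen_t len = sizeof addr;
    int rc = -1, cli = -1, conn = -1;

    int srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);               /* kernel-chosen port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(srv, 1) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0)
        goto out;
    /* The listening socket only knows the wildcard address. */
    if (!inet_ntop(AF_INET, &addr.sin_addr, listening, (socklen_t)buflen))
        goto out;

    /* Reach the listener through one concrete local IP: loopback. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;
    conn = accept(srv, NULL, NULL);
    if (conn < 0)
        goto out;

    /* The accepted socket knows which local IP the client actually used. */
    len = sizeof local;
    if (getsockname(conn, (struct sockaddr *)&local, &len) < 0 ||
        !inet_ntop(AF_INET, &local.sin_addr, accepted, (socklen_t)buflen))
        goto out;
    rc = 0;
out:
    if (conn >= 0)
        close(conn);
    if (cli >= 0)
        close(cli);
    close(srv);
    return rc;
}
```

In this demo the listening socket reports 0.0.0.0 while the accepted socket reports 127.0.0.1, because that is the address the client connected to.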
I have finally been able to find a solution that works.
Edit: link is dead so see: Internet Archive link.
Hopefully it can be helpful to others as it has been to me.

C Programming Using a specific Host IP to connect to server

I am looking at a machine with multiple IP addresses (e.g. Ethernet and wireless) and need my client to connect to a server using a specific client IP address. I can do this easily in Java (by calling Socket() with 4 args), but the only reference I can find to using a specific host IP address involves bind(), which as far as I know is only used for servers.
Synopsis:
The client has 2 IP addresses (Ethernet and wireless) and is trying to connect to a server using a specific one (no need to worry about "finding" the IP addresses, as they will be in a config file or a #define).
I'm not looking for a Windows-only answer, but for portable C (I usually use Mac OS X/Linux, but also Windows).
I've never done this before, but I think you can bind a socket to an IP address and then call connect() with that socket. The pertinent section from http://pubs.opengroup.org/onlinepubs/009695399/functions/connect.html states:
If the socket has not already been bound to a local address, connect()
shall bind it to an address which, unless the socket's address family
is AF_UNIX, is an unused local address.
Which implies that bind can be used before connect.
Your reference is correct. If you use bind(), then that address will be used to bind the client socket to the interface you want to use. bind() is not only used for server sockets.
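A minimal sketch of bind() followed by connect() on a client, assuming POSIX sockets and IPv4. connect_from and fill_addr are hypothetical names; loopback_listener is demo scaffolding so the example can be tried without a real server. Passing 0 as the local port keeps the kernel's automatic port choice while still pinning the source IP.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Parse a dotted-quad IPv4 string into *a; 0 on success, -1 if invalid. */
static int fill_addr(struct sockaddr_in *a, const char *ip, unsigned short port)
{
    memset(a, 0, sizeof *a);
    a->sin_family = AF_INET;
    a->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &a->sin_addr) == 1 ? 0 : -1;
}

/* Connect to remote_ip:remote_port using local_ip as the source address.
 * local_port 0 lets the kernel pick the source port.
 * Returns the connected socket, or -1 on error. */
static int connect_from(const char *local_ip, unsigned short local_port,
                        const char *remote_ip, unsigned short remote_port)
{
    assert(local_ip != NULL && remote_ip != NULL);
    struct sockaddr_in local, remote;
    if (fill_addr(&local, local_ip, local_port) < 0 ||
        fill_addr(&remote, remote_ip, remote_port) < 0)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* bind() pins the source address; connect() then uses it as-is. */
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Demo scaffolding: TCP listener on 127.0.0.1 with a kernel-chosen port. */
static int loopback_listener(unsigned short *port)
{
    assert(port != NULL);
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (fill_addr(&addr, "127.0.0.1", 0) < 0)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

In a real client, local_ip would be the Ethernet or wireless address read from the config file, for example connect_from(configured_ip, 0, server_ip, server_port), where those variable names are placeholders.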
https://beej.us/guide/bgnet/html/multi/syscalls.html#bind
By using the AI_PASSIVE flag, I'm telling the program to bind to the
IP of the host it's running on. If you want to bind to a specific
local IP address, drop the AI_PASSIVE and put an IP address in for the
first argument to getaddrinfo().
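The two getaddrinfo() modes from that quote can be compared side by side. This is a sketch assuming a POSIX system and IPv4 only; resolve_bind_address is a hypothetical helper, and the port string "9002" used in the usage note is just the port from the earlier question. The feature-test macro at the top is there only so that strict -std= compiler modes expose getaddrinfo().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() under strict -std= modes */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve the IPv4 address a socket would be bound to.
 * node == NULL: AI_PASSIVE is set, which yields the wildcard address.
 * node != NULL: a specific local IP such as "127.0.0.1" is used as given.
 * Returns 0 on success, or a non-zero getaddrinfo() error code. */
static int resolve_bind_address(const char *node, const char *service,
                                struct sockaddr_in *out)
{
    assert(service != NULL && out != NULL);
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = (node == NULL) ? AI_PASSIVE : 0;

    int rc = getaddrinfo(node, service, &hints, &res);
    if (rc != 0)
        return rc;
    /* With AF_INET in the hints, ai_addr points at a struct sockaddr_in. */
    memcpy(out, res->ai_addr, sizeof *out);
    freeaddrinfo(res);
    return 0;
}
```

resolve_bind_address(NULL, "9002", &addr) should give INADDR_ANY with port 9002, while resolve_bind_address("127.0.0.1", "9002", &addr) should give the loopback address. Either result can be passed to bind() as (struct sockaddr *)&addr.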

What client-side situations need bind()?

I'm learning C socket programming. When would you use bind() on the client-side? What types of program will need it and why? Where can I find an example?
On the client side, you would only use bind() if you want to use a specific client-side port, which is rare. Usually on the client, you specify the IP address and port of the server machine, and the OS picks which local port you will use. Generally you don't care, but in some cases there may be a firewall on the client side that only allows outgoing connections from certain ports. In that case, you need to bind to one of those ports before the connection attempt will work.
An example would be the data connection of an active FTP connection. In this case, the server connects from its port 20 to the IP and port specified by a PORT or EPRT command.
A classic example of a client program using bind() is the (obsolete) rlogin / rsh family of network clients. These clients were intended to be used within networks with strong trust relationships - in some cases the server machine trusts the client machine to tell it the username of the user that is connecting. This required that the client program connect from a low port (a port less than 1024), because such ports are restricted to the root user and thus (in theory) prove that the client being used is authorised by the system administrator.
The NFS protocol has similar trust relationships, and similarly the client makes connections from a low port number, using bind().
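The server-side half of that convention can be sketched as a check on the peer's source port. This assumes POSIX sockets and IPv4; is_reserved_port, peer_is_privileged and RESERVED_PORT_LIMIT are hypothetical names, and the code illustrates the historical check only. It is not a recommendation to authenticate clients this way.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ports below this value can normally only be bound by root on Unix. */
#define RESERVED_PORT_LIMIT 1024

/* 1 if port (host byte order) is a usable reserved port, 0 otherwise. */
static int is_reserved_port(unsigned short port)
{
    return port > 0 && port < RESERVED_PORT_LIMIT;
}

/* For an accepted IPv4 connection: 1 if the client connected from a
 * reserved source port (so it must have called bind() itself before
 * connect()), 0 if not, -1 on error. */
static int peer_is_privileged(int conn_fd)
{
    assert(conn_fd >= 0);
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    memset(&peer, 0, sizeof peer);
    if (getpeername(conn_fd, (struct sockaddr *)&peer, &len) < 0 ||
        peer.sin_family != AF_INET)
        return -1;
    return is_reserved_port(ntohs(peer.sin_port));
}
```

A client that lets the kernel choose its port gets an ephemeral port, so peer_is_privileged() reports 0 for it. Only a client that explicitly bound a port below 1024, which normally requires root, would pass.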
Another example is IRC clients that allow the user to specify a particular source IP address to connect from. This is to accommodate users with many IP addresses assigned to their machine, each with a different "vanity" domain name assigned to it. Choosing which IP to connect from (with bind()) allows the user to choose which domain name to appear as on IRC.
A good situation would be a p2p case: you're communicating with a STUN server over a bound socket, and the STUN server tells you the port from which it is receiving your socket's messages (which can differ from the one you specified when you bound your socket, depending on your network and more specifically on your NAT type). This makes you aware of the port translation your NAT is actually doing, and you can give this information to potential peers that want to connect to you.
Binding the socket is useful because some NATs assign ports dynamically (binding to port x twice might not give you the same "real" port). So you can directly use the socket you bound to listen on the port.
I suppose you should bind() in the case of UDP sockets.
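The bind() function is one of the "key" functions. It associates your socket (server or client) with an address (IP + port). On Windows, Winsock requires an explicit bind() before listen() or before receiving on an unconnected socket, whereas connect() and sendto() bind an unbound socket implicitly. There is a good book about it: "Network Programming for Microsoft Windows" by Anthony Jones and Jim Ohlund.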
bind() can be used to attach names to sockets. Thus, if you write software that uses a particular TCP port, you need to bind to it, and then you know which TCP port it is using.

Resources