Unix sockets: when to use the bind() function?

I don't have a clear idea of when I need to use the bind() function.
I guess it should be used whenever I need to receive data (i.e. with the recv() or recvfrom() functions), whether I'm using TCP or UDP, but somebody told me this is not the case.
Can anyone clarify a bit?
EDIT: I've read the answers, but I'm still not clear. Let's take an example where I have a UDP client that sends data to the server and then has to get a response. I have to use bind() here, right?

This answer is a little bit long-winded, but I think it will help.
When we do computer networking, we're really just doing inter-process communication. Let's say that on your own computer you have two programs that want to talk to each other. You might use a pipe to send the data from one program to the other. When you type ls | grep pdf, you are taking the output of ls and feeding it into grep. In this way, you have unidirectional communication between two separate programs, ls and grep.
When you do this, someone needs to keep track of the Process ID (PID) of each process. That PID is a unique identifier for each process and it helps us track who the "source" and "destination" processes are for the data we want to transfer.
So now let's say you have data from a web server that you want to transfer to a browser. Well, this is the same scenario as above: interprocess communication between two programs, the "server" and the "browser".
Except this time those two programs are on different computers. The mechanism for interprocess communication across two computers is called a "socket".
So great. You take some data, lob it over the wire, and the other computer receives it. Except that computer doesn't know what to do with that data. Remember we said we need a PID to know which processes are communicating? The same is true in networking. When your computer receives HTML data, how does it know to send it to "firefox" rather than "pidgin"?
Well, when you transmit network data, you specify that it goes to a specific "port". Port 80 is usually used for the web, port 23 for telnet, port 443 for HTTPS, etc.
That "port" is bound to a specific socket, which belongs to a specific process on the machine. This is why we have ports, and this is why we use bind(): to tell the operating system which socket, and therefore which process, should receive data arriving on that port.
This should explain the answers people have posted. If you are a sender, you don't care what the outgoing port is, so you usually don't use bind() to specify that port. If you are a receiver, well, everyone else has to know where to look for you. So you bind() your program to port 80 and then tell everyone to make sure to transmit data there.
To answer your homework question: yes, you probably want to use bind() for your server. But the clients don't need to use bind(); they just need to make sure they transmit data to whatever port you've chosen.
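For the UDP case in the question, the split can be sketched in C as follows. This is a minimal sketch assuming Linux/POSIX sockets and IPv4 on the loopback interface; the helper names udp_server_open, udp_echo_once and local_port are hypothetical. Only the server calls bind(); the client never does, yet it still receives the reply, because its first sendto() makes the kernel assign it an ephemeral port. Port 0 is used only so the sketch cannot collide with a real service; a real server would bind the fixed port it tells its clients about.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server side: bind() to a known address so clients know where to send.
   Port 0 asks the kernel for any free port (sketch only; a real server
   binds the port it publishes). */
int udp_server_open(struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* *bound now holds the address clients should use */
}

/* Server side: receive one datagram and send it back to its sender.
   recvfrom() reports the client's address, including the ephemeral port
   the kernel gave the unbound client socket. */
ssize_t udp_echo_once(int server_fd)
{
    char buf[512];
    struct sockaddr_in peer;
    socklen_t plen = sizeof peer;
    ssize_t n = recvfrom(server_fd, buf, sizeof buf, 0,
                         (struct sockaddr *)&peer, &plen);
    if (n < 0)
        return -1;
    return sendto(server_fd, buf, (size_t)n, 0,
                  (struct sockaddr *)&peer, plen);
}

/* Local port currently assigned to a socket: 0 if none yet, -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A client would simply create socket(AF_INET, SOCK_DGRAM, 0), sendto() the server's address and then recv() the reply; local_port() on that client reports 0 before the first sendto() and a kernel-chosen port afterwards.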

After reading your updated question, I would suggest not using the bind() function in the client. The function is used when writing your own server, to bind the socket (created by a call to socket()) to a local address, i.e. an IP address and a port number.
For further help look at this tutorial

bind() is useful when you are writing a server that awaits data from clients by "listening" on a known port. With bind() you set the port on which that same socket will then listen().
If you are writing the client, you don't need to call bind(); you can simply call recv() to obtain the data sent from the server. Your local port will be set to an "ephemeral" value when the TCP connection is established (or, for UDP, when the socket first sends a datagram).
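The TCP sequence described above (server: socket(), bind(), listen(); client: socket(), connect(), no bind()) can be sketched like this. It is a minimal illustration assuming Linux/POSIX and IPv4 loopback; tcp_listener_open, tcp_client_connect and local_port are hypothetical helper names, and port 0 again only stands in for a real, fixed server port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server: socket() -> bind() -> listen(). Loopback plus port 0 (kernel
   picks a free port) keeps the sketch self-contained. */
int tcp_listener_open(struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client: socket() -> connect(), with no bind(). connect() makes the
   kernel choose the local address and an ephemeral local port. */
int tcp_client_connect(const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)server, sizeof *server) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Local port currently assigned to a socket: 0 if none yet, -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

After tcp_client_connect() returns, the client can recv() whatever the server sends on the socket it got from accept(), without ever having called bind().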

You use bind whenever you want to bind to a local address. You mostly use this for opening a listening socket on a specific address/port, but it can also be used to fix the address/port of an outgoing TCP connection.
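To illustrate the second use mentioned above, here is a minimal sketch of a client that fixes the local address/port of an outgoing TCP connection by calling bind() before connect(). It assumes Linux/POSIX and IPv4; the names tcp_listener_open, tcp_connect_from and local_endpoint are hypothetical. Passing a local port of 0 pins only the address and still lets the kernel pick the port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* A loopback listener to connect to (kernel-chosen port). */
int tcp_listener_open(struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client that fixes its own end of the connection: bind() to
   local_ip:local_port first, then connect(). */
int tcp_connect_from(const char *local_ip, unsigned short local_port,
                     const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(local_port);

    if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        connect(fd, (const struct sockaddr *)server, sizeof *server) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads back the local IP and port a socket is actually using. */
int local_endpoint(int fd, char ip[INET_ADDRSTRLEN], unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0 ||
        inet_ntop(AF_INET, &addr.sin_addr, ip, INET_ADDRSTRLEN) == NULL)
        return -1;
    *port = ntohs(addr.sin_port);
    return 0;
}
```

If the requested local port is already taken, bind() fails (typically with EADDRINUSE) and tcp_connect_from() returns -1, which is one reason clients usually leave the port to the kernel.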

You need to call bind() only in your server. It's needed specifically to bind a port number to your socket.

Related

Can a RAW socket be bound to an ip:port instead of an interface?

I need to write a proxy server in C on Linux (Ubuntu 20.04). The purpose of this proxy server is as follows. There are illogical governmental barriers to accessing the free internet. Some are:
Name resolution: I ping telegram.org and many other sites which the government doesn't want me to access. I ask 8.8.8.8 to resolve the name, but the reply comes back, on behalf of the server, saying that the name resolves to 10.10.34.35!
Let's concentrate on this one, because once it is solved, many other problems will be solved too. For this, I need to set up the following configuration:
1. A server outside my country is required. I have prepared it; it's a VPS. Let's call it RS (Remote Server).
2. A local proxy server is required. Let's call it PS. PS runs on the local machine (the client) and knows RS's IP. I need it to gather all requests that are about to be sent through the only NIC available on the client, process them, scramble them, and send them to RS in a way that is hidden from the government.
3. The server-side program should run on RS on a specific port to receive the packet, unscramble it, and send it to the internet on behalf of the client. After receiving the response from the internet, it should send it back to the client via PS.
4. PS delivers the response to the client application that originated the request, after it has unscrambled it and recovered the original response from the internet.
This is the design, and some parts of it remain unclear to me. Since I'm not an expert in network programming, I'm going to ask about the parts where I'm running into trouble or that are not clear to me.
Now I'm on part 2. Please check whether I'm right. There are two types of sockets I'm considering: a RAW socket and a stream socket. A RAW socket is opened this way:
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
And a stream socket is opened this way:
socket(AF_INET, SOCK_STREAM, 0);
For RAW sockets we use sockaddr_ll, and for stream sockets we use sockaddr_in. May I use stream sockets between client applications and PS? I think not, because I need the whole RAW packet. I need to know the protocol and maybe some other information about the packet, because the whole packet should be reconstructed transparently on RS. For example, I should know whether it was a ping packet (ICMP) or a web request (TCP). For this, I need the packet header in PS. So I can't use a stream socket, because it doesn't expose the packet header. But until now, I've only used RAW sockets on interfaces and have not written a proxy server that receives RAW packets. Is that possible? In other words, I have the following questions before I can go to the next step:
Can a RAW socket be bound to localhost:port instead of an interface so that it may receive all low-level packets containing packet headers (RAW packets)?
I can configure a proxy server for the browser. But can I put the whole system behind the proxy server so that packets from other apps, like ping, are routed through it automatically?
Do I really need RAW sockets in PS? Could I change the design so that the data in the packets' payload is sufficient?
Maybe I'm wrong in some of the concepts and will appreciate your guidance.
Thank you
Can a RAW socket be bound to localhost:port instead of an interface so that it may receive all low-level packets containing packet headers (RAW packets)?
No, that doesn't make sense. Raw packets don't have port numbers, so how would the kernel know which socket to deliver them to?
It looks like you are trying to write a VPN. You can do this on Linux by creating a fake network interface called a "tun interface". You create a tun interface, and whenever Linux tries to send a packet through the interface, instead of going to a network cable, it goes to your program! Then you can do whatever you like with the packet. Of course, it works both ways - you can send packets from your program back to Linux through the tun interface, and Linux will act like they just arrived on a network cable.
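The tun part of that approach can be sketched as below. This is a minimal sketch assuming Linux's /dev/net/tun driver and a process with CAP_NET_ADMIN (typically root); the function name tun_open and the interface name used in the usage note are hypothetical. Bringing the interface up, giving it an address, and editing the routing table are done separately (for example with the ip tool) and are not shown.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Opens (creating if needed) a TUN interface called `name` and returns a
   file descriptor. Each read() on it yields one IP packet that the kernel
   routed to the interface; each write() injects one IP packet as if it had
   arrived on that interface. Returns -1 with errno set on failure. */
int tun_open(const char *name)
{
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI;   /* raw IP packets, no extra header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        int saved = errno;                 /* keep ioctl()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

PS would then loop on read(fd, buf, sizeof buf) to obtain complete IP packets (with headers, so ICMP and TCP are both visible), scramble them, and send them to RS over an ordinary UDP or TCP socket; replies from RS are unscrambled and write()n back to the same fd. A call such as tun_open("tunsketch0") fails with EPERM without the needed privilege.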
Then, you can set up your routing table so that all traffic goes to the tun interface, except for traffic to the VPN server ("RS"), which goes to your real ethernet/wifi interface. Otherwise you'd have an endless loop where your VPN program PS tried to send packets to RS but they just went back to PS.

Retrieving local IP before a connection is made

I am trying to determine which local IP would be used on a socket for a TCP connection towards a given host on Linux, using C.
Let me give an example. I could connect my socket and use getsockname() on the file descriptor to get the local IP (and local TCP port); but can I do this without opening the connection?
I could read the routing table and make a decision based on that - but the networking subsystem must have that algorithm already, for when the connection is actually open. In short, I'd like to know if there is an API to access the routing algorithms without having to parse the rules myself or opening an actual connection. The solution - if any - will probably be Linux only but that's OK.
EDIT: someone on IRC suggested I create a UDP socket and call connect() on it. No network traffic is generated at that point, but I should then be able to use getsockname() on it.
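The IRC suggestion can be sketched as follows, assuming Linux and IPv4; local_ip_for is a hypothetical helper name. For a UDP socket, connect() only records the peer address and performs the route lookup, so getsockname() afterwards reports the source address the kernel picked, without any packet being sent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes into `out` the local IPv4 address the kernel would use to reach
   dest_ip. Returns 0 on success, -1 on error (bad address, no route...). */
int local_ip_for(const char *dest_ip, char out[INET_ADDRSTRLEN])
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);   /* arbitrary: nothing is actually sent */

    struct sockaddr_in local;
    socklen_t len = sizeof local;
    int rc = -1;
    if (inet_pton(AF_INET, dest_ip, &dst.sin_addr) == 1 &&
        connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0 &&   /* route lookup only */
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, INET_ADDRSTRLEN) != NULL)
        rc = 0;
    close(fd);
    return rc;
}
```

For an external destination such as 8.8.8.8 this should report the address of the interface the routing table would use; if there is no route, connect() fails (typically with ENETUNREACH) and the helper returns -1.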
The only solution I know to this is what traceroute does. Send a packet with a TTL of 1 and see which interface the ICMP return comes in on. As I recall there are lots of incompatibilities between different hosts, so there's probably several different types of messages you might need to send/receive to get the data you need.

When to use bind() when programming with sockets?

I am writing a simple sender and receiver program that transmits over UDP, so it's connectionless, but I am having trouble figuring out whether my receiver program, my sender program, or both need to call bind(). My receiver program (client) will sit in an infinite loop waiting to receive data from the sender (server) and then print out the data. I'm not quite sure what bind() does exactly besides associating an address/port with a specific socket. Why do I need to call bind()?
You need to call bind(2) so that the OS knows which application to route network packets to. When you call bind with a specific port for a given protocol (e.g. TCP or UDP), you're asking it "whenever you see a network packet on port XXXXX, please give it to me".
Say, for example, that two copies of your program were running, and they both wanted to listen for UDP packets on the same port. If they both call bind() on the same port, one will succeed and one will fail, since the OS arbitrates which socket is bound to each port. Then any packet received on that port will be given to whichever instance of the program succeeded in binding to it.
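The "one succeeds, one fails" behaviour can be observed with a short sketch, assuming Linux and IPv4 loopback; udp_bind_loopback and bound_port are hypothetical helper names. Neither socket sets SO_REUSEADDR or SO_REUSEPORT, which are the options that relax this rule.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a new UDP socket to 127.0.0.1:port (port 0 = kernel picks one).
   Returns the fd, or -1 with errno set; EADDRINUSE means another socket
   already owns that address/port. */
int udp_bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;   /* keep bind()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port a socket is bound to, or 0 on error. */
unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Calling udp_bind_loopback() twice with the same port returns -1 with errno == EADDRINUSE the second time; once the first socket is closed, the port can be bound again.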
When you want to give a socket a fixed address and/or port, you use bind().
When developing a network application, you need to specify the address and port to bind to. If you bind to localhost (127.0.0.1), your application cannot communicate with the rest of the network; it can only talk to processes on your own system. If you bind to your network address, it cannot be reached through localhost; it only communicates over the network. If you bind to 0 (INADDR_ANY, i.e. 0.0.0.0), it can be reached both through localhost and through the network.
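The difference between binding to the loopback address and to INADDR_ANY can be sketched as below, assuming Linux and IPv4; tcp_listen_at and can_connect are hypothetical helpers. Both listeners accept connections to 127.0.0.1; only the INADDR_ANY one would also accept connections arriving on the machine's other interfaces, which a single-host sketch cannot demonstrate.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* TCP listener bound to `ip` (host byte order, e.g. INADDR_ANY or
   INADDR_LOOPBACK) on a kernel-chosen port; *bound receives the result. */
int tcp_listen_at(uint32_t ip, struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(ip);
    addr.sin_port = htons(0);

    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if a TCP connection to ip:port succeeds, 0 otherwise. */
int can_connect(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);

    int ok = inet_pton(AF_INET, ip, &dst.sin_addr) == 1 &&
             connect(fd, (struct sockaddr *)&dst, sizeof dst) == 0;
    close(fd);
    return ok;
}
```

getsockname() on the INADDR_ANY listener reports 0.0.0.0, while the loopback listener reports 127.0.0.1.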

how can a process execute network code

I am a beginner to networking and I have a few questions regarding networking.
1) How can a process execute code that is sent from a different computer on the network? Generally a process's code segment cannot be changed once it's loaded, to ensure protection. (Also, such code could execute arbitrary instructions that corrupt the process's memory.)
2) Also, can a process listen on multiple ports? And can multiple processes listen on the same port? For example, two HTTPS servers associated with port 80. How do you distinguish between the processes, and how do you ensure protection?
3) Also, I would like to know how listen is implemented in sockets. Is it implemented with software interrupts?
Any good book recommendations are very much appreciated.
Thanks & Regards,
Mousey.
Q: How can a process execute code sent from another machine?
A: Generally, this is a bad idea, as the security concerns are difficult to fully explore. However, it can be done by saving the network-delivered code to a separate executable and then launching that new program. It can also be done by treating the received bytes directly as code: copy them into memory that is mapped executable (ordinary heap and stack pages are usually non-executable on modern systems, so on POSIX this means something like mmap() or mprotect() with PROT_EXEC), cast the address to a function pointer, and call it. Again though, this is almost certainly a bad idea.
Q: Can a process listen on multiple ports simultaneously?
A: Yes. By the way, HTTPS is port 443. HTTP is port 80.
Q: Can multiple processes listen on the same port (with the same protocol, on the same address)?
A: No. Other processes might be able to eavesdrop and also receive the packets, but they're not directly bound to the port. In general, only one process can be bound to a given protocol/port/address 3-tuple.
Q: How is blocking while listening on a socket implemented?
A: By the operating system, in its own fashion. Generally a thread is moved into the "blocking" state when it calls accept, read, or poll/select on a non-ready socket, and will not receive CPU time until some data have arrived.
1) How can a process execute code that is sent from a different computer on the network? Generally a process's code segment cannot be changed once it's loaded, to ensure protection.
This has nothing to do with networking. Once you receive the data through a socket, it's in your local memory. What you do after that is OS-specific. For example, on Windows, you can use VirtualProtect to mark pages as executable.
2) Also, can a process listen on multiple ports?
Sure, just create a different socket for each port you want to listen to. Of course, to use them simultaneously, you either need to use non-blocking sockets or run each socket in a separate thread.
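Besides non-blocking sockets or threads, a common alternative (not mentioned in the answer) is to block in a single poll() call that watches all the sockets at once. A minimal sketch, assuming Linux/POSIX, IPv4 loopback and UDP sockets; udp_open_loopback, wait_readable and MAX_WATCHED are hypothetical names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a UDP socket bound to 127.0.0.1 on a kernel-chosen port and
   reports that port through *port. Returns the fd or -1. */
int udp_open_loopback(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

#define MAX_WATCHED 16

/* Waits in one poll() call on all n sockets. Returns the index of the
   first readable socket, -1 on timeout, or -2 on error. */
int wait_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfd[MAX_WATCHED];
    if (n < 1 || n > MAX_WATCHED)
        return -2;
    for (int i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    int ready = poll(pfd, (nfds_t)n, timeout_ms);
    if (ready < 0)
        return -2;
    if (ready == 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (pfd[i].revents & POLLIN)
            return i;
    return -2;   /* only error flags such as POLLERR/POLLNVAL were reported */
}
```

A server would call wait_readable() in a loop and then recv() from the socket whose index it returned; the same pattern works for TCP listening sockets, where "readable" means accept() will not block.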
3) Also, I would like to know how listen is implemented in sockets. Is it implemented with software interrupts?
This is entirely OS-specific. listen just sets up the socket so that it can accept connections. Any connection requests that arrive after this (this probably happens somewhere in the TCP/IP driver) are put in a queue by the OS. When you later call accept, the OS pulls out the first pending connection from this queue and returns a socket to that.
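The queueing behaviour described above can be observed from user space. The following is a minimal sketch assuming Linux and IPv4 loopback; tcp_listener_open and has_pending_connection are hypothetical helper names. A client's connect() completes before the server has called accept(), and the listening socket polls as readable while a connection waits in the queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Loopback listener with the given backlog (the queue length hint passed
   to listen()); kernel-chosen port returned through *bound. */
int tcp_listener_open(int backlog, struct sockaddr_in *bound)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof *bound;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* A listening socket is readable when at least one established connection
   is waiting in its queue. Returns 1 if so, 0 on timeout, -1 on error. */
int has_pending_connection(int listen_fd, int timeout_ms)
{
    struct pollfd p = { .fd = listen_fd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

After accept() dequeues the only pending connection, has_pending_connection() returns 0 again.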

Detect whether a socket program is connecting to itself

How, in C, can I detect whether a program is connecting to itself?
For example, I've set up a listener on port 1234, then I set up another socket to connect to an arbitrary address on port 1234. I want to detect whether I'm connecting to my own program. Is there any way?
Thanks,
Dave
Linux provides tools that I think can solve this problem. If the connection is to the same machine, you can run
fuser -n tcp <port-number>
and get back a list of processes using that port. You can then look in /proc and find out whether there is a process, with a PID other than your own, running the same binary you are. A bit of chewing gum and baling wire will help keep the whole contraption together.
I don't think you can easily ask questions about a process on another machine.
One of the parameters to the accept() function is a pointer to a struct sockaddr.
When you call accept() on the server side it will fill in the address of the remote machine connecting to your server socket.
If that address matches the address of any of the interfaces on that machine then that indicates that the client is on the same machine as the server.
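That check can be sketched with getifaddrs(), assuming Linux and IPv4 only; is_local_ipv4 and accept_and_classify are hypothetical helper names. It compares against the addresses actually assigned to interfaces, so it would miss, for example, 127.0.0.2: Linux treats all of 127.0.0.0/8 as local, but the loopback interface normally lists only 127.0.0.1.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if addr equals the IPv4 address of one of this machine's
   interfaces (loopback included), 0 if not, -1 if getifaddrs() fails. */
int is_local_ipv4(struct in_addr addr)
{
    struct ifaddrs *list;
    if (getifaddrs(&list) < 0)
        return -1;

    int found = 0;
    for (struct ifaddrs *it = list; it != NULL; it = it->ifa_next) {
        if (it->ifa_addr == NULL || it->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin = (const struct sockaddr_in *)it->ifa_addr;
        if (sin->sin_addr.s_addr == addr.s_addr) {
            found = 1;
            break;
        }
    }
    freeifaddrs(list);
    return found;
}

/* Accepts one connection; accept() fills in the client's address, which is
   then compared with the local interface addresses. *is_local receives the
   result of is_local_ipv4() (0 for non-IPv4 peers). Returns the fd or -1. */
int accept_and_classify(int listen_fd, int *is_local)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    int fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
    if (fd < 0)
        return -1;
    *is_local = (peer.sin_family == AF_INET) ? is_local_ipv4(peer.sin_addr) : 0;
    return fd;
}
```

Note that this only tells you the client runs on the same machine, not that it is the same program or instance.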
You could send a sequence of magic packets upon connection, calculated in a deterministic way. The trick is to do this so that sender and receiver will always calculate the same packet contents if they are the same instance of the program. A little more information about what your program is would be helpful here, but most likely you can compute some sort of hash over a bunch of program state and come up with something fairly unique to that instance of the program.
I assume you mean not just the same program, but the same instance of it running on the same machine.
Do you care about the case where you're connecting back to yourself via the network (perhaps you have two network cards, or a port-forwarding router, or some unusual routing out on the internet somewhere)?
If not, you could check whether the arbitrary address resolves to loopback (127.0.0.1), or any of the other IP addresses you know are you. I'm not a networking expert, so I may have missed some possibilities.
If you do care about that "indirect loopback" case, do some handshaking including a randomly-generated number which the two endpoints share via memory. I don't know whether there are security concerns in your situation: if so bear in mind that this is almost certainly subject to MITM unless you also secure the connection.
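The random-number handshake can be sketched as below, assuming Linux, TCP, and /dev/urandom as the randomness source; instance_nonce, init_instance_nonce, send_nonce and peer_is_self are hypothetical names. The nonce lives in process memory, so the accepting side can compare what arrives with its own copy. As noted above, this offers no protection against a man-in-the-middle.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Random value generated once per running instance; both the connecting
   and the accepting code paths read it from the same process memory. */
static uint64_t instance_nonce;

/* Fills instance_nonce from /dev/urandom. Returns 0 on success, -1 on error. */
int init_instance_nonce(void)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, &instance_nonce, sizeof instance_nonce);
    close(fd);
    return n == (ssize_t)sizeof instance_nonce ? 0 : -1;
}

/* Connecting side: send our nonce right after connect(). Host byte order
   is acceptable because only this same process compares the value. */
int send_nonce(int fd)
{
    ssize_t n = send(fd, &instance_nonce, sizeof instance_nonce, 0);
    return n == (ssize_t)sizeof instance_nonce ? 0 : -1;
}

/* Accepting side: read the peer's nonce. Returns 1 if it equals ours (this
   instance connected to itself), 0 if it differs, -1 on error/short read. */
int peer_is_self(int fd)
{
    uint64_t theirs;
    ssize_t n = recv(fd, &theirs, sizeof theirs, MSG_WAITALL);
    if (n != (ssize_t)sizeof theirs)
        return -1;
    return theirs == instance_nonce ? 1 : 0;
}
```

With a 64-bit random value, a different instance matching by chance is very unlikely, and the check works whether the connection loops back directly or via a router or second network card.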
