I'm confused about the use and consequences of INADDR_ANY when binding a socket. Of course, a socket bound to INADDR_ANY listens on all the local interfaces. My question is about what consequences this has.
I remember reading that binding to a specific IP address allows the kernel to handle the demultiplexing but can't find the reference any more.
Will the use of INADDR_ANY have consequences of this kind, or will I simply receive data from all my local IPs? What are the benefits and problems of each kind of binding?
Other questions that discuss this:
bind with INADDR_ANY
Question about INADDR_ANY
EDIT: Found the reference. It's from Unix Network Programming (Stevens)
One advantage in binding a non-wildcard IP address is that the
demultiplexing of a given destination IP address to a given server
process is then done by the kernel.
What does this really mean?
Binding to specific interfaces is something to use only in very special circumstances, when the application needs to "know" the local IP addresses and the immediate network layout. A routing daemon program is perhaps the best example.
Another, more pedestrian example: if you have a multi-homed machine (i.e. a machine with more than one connection to the Internet, possibly through different ISPs) you can bind to a specific interface to make sure that traffic goes through a given link. By binding separately to each network interface, the application could also detect conditions such as a link going down.
Implementations of protocols that need to "know" the local IP address (infamous examples: FTP, SIP, UPnP) tend to bind sockets to specific interfaces. (I consider them to be bad protocols, since they violate the isolation between transport and application layers).
Save for these cases, it is normally over-engineering to bind to specific interfaces, because addresses and interfaces may change, and the program must detect these conditions to update the respective sockets.
You are not going to be able to measure any performance difference between using a specific IP or all of them. You might wish to use a specific one based on the needs of your application... for example, if you know you should never have a (legitimate) connection from an external facing IP, you would not want to receive input from it, for security reasons.
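To make the two kinds of binding concrete, here is a minimal sketch in C, assuming IPv4 only. The helper name bind_tcp is made up for illustration; passing NULL selects the wildcard INADDR_ANY, and port 0 asks the kernel to pick a free port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to `ip`, or to the
 * wildcard address (INADDR_ANY) when `ip` is NULL.
 * Returns the socket descriptor, or -1 on error. */
int bind_tcp(const char *ip, uint16_t port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);

    if (ip == NULL)
        sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local addresses */
    else if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;                               /* not a dotted-quad IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After listen(), a socket from bind_tcp(NULL, 8080) accepts connections addressed to any local IP, while one from bind_tcp("127.0.0.1", 8080) only ever sees connections addressed to the loopback address; anything else is turned away by the kernel before it reaches the process.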
I am trying to find out whether the machines in a network are running a certain app. More like, I am trying to resolve addresses of nodes in a network.
I wrote a small program based on ARP, but it works only on the local network (same subnet). What I want to do is resolve addresses outside the subnet, i.e. all other nodes.
I read these answers: UDP broadcast packets across subnets
and Broadcast on different subnets
But they all talk about changing router setting or creating a multicast network.
From what I read, for multicasting to work I need to create a multicast network beforehand. Is it really necessary?
And for changing router setting, does it really have to be a "special" router?
This is all for a college assignment and would be demonstrating it probably on an ad-hoc network or something like that.
I am open to ideas to solve the original problem.
PS:
1. I am a beginner in networking, so do excuse me for any fault or misinterpretation.
2. I am using sockets and C (no other option).
Edit 1:
I am well aware ARP is useless outside the subnet. I mentioned it because I used it and it helped explaining the problem.
Edit 2:
The original problem is:
Building a chat application, nothing fancy sending messages from one point to another, without using a central server of any kind. Not even a hybrid network with a central store is allowed.
i.e. if A and B are two clients, A should directly connect to B and vice versa.
I did some research and decided to use a P2P architecture. Now I am stuck on how A will discover the address of B. If I know the subnet of B, I can brute force and locate B, but since I don't have such information, what do I do?
ARP is not intended to be routed outside the local network, where in IPv4, the "local network" typically corresponds to a subnet. You should not expect ARP traffic to transit routers from inside to outside or vice versa.
Similarly, UDP broadcasts generally do not propagate outside the local network, and it's a good thing that they don't, for reasons related to both security and traffic volume.
From what I read, for multicasting to work I need to create a multicast network beforehand. Is it really necessary?
Basically, yes. Your routers need to be configured to support multicasting (which may be their default). All participants need to agree on and join the same multicast group. There might not be a need for any new networking hardware, but multicast communication has its own protocols and network requirements; it is not merely a broadcast that can traverse network boundaries.
And for changing router setting, does it really have to be a "special" router?
If you mean changing router settings so that UDP broadcasts are routed between networks, you do indeed need a router that exposes this capability. But I urge you not to do this, as it will let broadcasts from all other sources, for all other reasons transit the router, too. At minimum, this will significantly increase the noisiness of all networks involved, but it could produce bona fide misbehavior of applications and services other than yours.
The Limited Broadcast (255.255.255.255) cannot cross a router. ARP requests, which are sent to the link-layer broadcast address, likewise stay on the local LAN, and ARP only works for IPv4. A Network Broadcast (the last address of a network, where the host part is all ones) sent from another network is a Directed Broadcast, and by default it cannot cross a router because it is a security risk (see RFC 2644, Changing the Default for Directed Broadcasts in Routers).
Some routers can be configured to forward directed broadcasts, but this can be dangerous.
Multicast is a form of broadcast. Multicast routing is very different than unicast routing, and every router in a path must be configured for multicast routing. Also, hosts must subscribe to a multicast group before they will even listen for packets from a multicast group. Additionally, there are some multicast groups that all hosts listen for, but those are link-local multicasts that cannot be forwarded off the local LAN.
Adding to what other answers have provided:
ARP is not useful for a system in another subnet. Even if you were able to send an ARP request to a system in the other subnet, and receive its response somehow -- providing you with that system's MAC address -- you could not use it to send a packet to that system because Ethernet does not provide a routing mechanism, and so the system will never see any Ethernet packet you address to it.
If you are simply trying to identify which systems within another IP subnet are live, you can probably do this by other means. Take a look at the nmap command, for example. It supports a wide variety of IP communications methods that will be routed to the other subnet and can often detect what machines are present and which services are available on the machines found.
And you can of course duplicate what nmap does yourself. For example, if you want to find out which systems in subnet 192.168.10.0/24 are listening on TCP port 80, one way is to simply attempt to connect to port 80 on each system in that subnet. In general, there are four answers you may receive back:
Connection success (No error: the machine is present and there is a program listening to that port)
Connection refused (errno ECONNREFUSED: the machine is present but there is no program listening to that port)
No route to host (EHOSTUNREACH: there is no machine answering to that IP address)
No response (ETIMEDOUT: several reasons why this can happen; for example, the system could have firewall settings causing it to simply ignore the request)
(And there are other less likely possibilities as well.) Using other IP access methods (ICMP/ping, UDP packets) will have a different matrix of possible results.
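A minimal sketch of such a probe in C, assuming IPv4 and a timeout chosen by the caller. The name probe_tcp is made up for illustration; it returns 0 when the connection succeeds and otherwise the errno value matching one of the cases above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect to ip:port, waiting at most
 * timeout_ms. Returns 0 on success, otherwise an errno value such as
 * ECONNREFUSED, EHOSTUNREACH or ETIMEDOUT. */
int probe_tcp(const char *ip, uint16_t port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Non-blocking, so that we can enforce our own timeout with poll(). */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    int err = 0;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        err = errno;
        if (err == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;          /* nothing came back in time */
            } else if (n < 0) {
                err = errno;
            } else {
                /* The connect finished; SO_ERROR holds its outcome. */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }
    close(fd);
    return err;
}
```

Scanning 192.168.10.0/24 for port 80 is then a loop over the host addresses that calls probe_tcp for each and inspects the return value.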
As others have explained, multicast mechanisms would only be helpful for discovering a set of cooperating machines that are pre-configured to join a multicast group.
Can someone possibly explain (within the size of a stackoverflow answer) the code required in order to emulate a network interface? I just know that there is virtualization software out there like Qemu that does this specific type of hardware emulation, but have no idea how this would work. Lots of books will show you how to create a program that listens on a TCP socket, but not create a host that gets its own IP address.
VirtualBox is open source. As a VM, with networking support, it should be sufficient to demonstrate to you what to do, along with a working implementation. https://www.virtualbox.org/wiki/Downloads
It really depends on what you mean and what you want to achieve. If you want to emulate some real hardware, you need to use the hypervisor's primitives to emulate most of the aspects described in the datasheet of the corresponding adapter. If you want to introduce some service, e.g. a DNS or HTTP service visible in the internal network, you need to port some userland stack (e.g. lwIP or Slirp, or only part of one if you need just UDP or lower) and teach it to communicate with the hypervisor's internal network.
I have a server that binds to a specific IP address (In a linux system). We are considering the option to bind(0), ie to bind to ANY interface. Are there any problems with this?
Depends. You may
want to have different processes binding to different IPs. In this case you don't want any of them to bind to all.
want the server to be accessible only from the internal network (for instance, when the other interface is accessible from the outside world).
want something that I can't think of at the moment.
Basically, it's not the binding that has disadvantages, but the effect thereof, and it all depends on whether this is what you want.
The most obvious issue would be that if you depend on network topology to provide access control and safety (firewalls etc) then the presence of multiple interfaces might mean that not all access to the server is being protected at the same degree. This could especially be a problem if interfaces are added or changed in the future.
Also, if your server has a notion of "its own IP" for its own purposes and it is not programmed for the eventuality that different client connections will be established to different "local" IPs, then that might present a problem. Of course, you'd need to read the source to weigh in on this.
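If the server does need to know which local IP a given client connected to after switching to the wildcard bind, it can ask the kernel per connection with getsockname(). A minimal sketch in C, assuming IPv4; the helper name local_ip_of is made up for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: write the local IPv4 address of a connected
 * socket into `buf` as text. Returns 0 on success, -1 on error. */
int local_ip_of(int conn_fd, char *buf, size_t buflen)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;

    if (getsockname(conn_fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    if (local.sin_family != AF_INET)
        return -1;                       /* this sketch handles IPv4 only */
    if (inet_ntop(AF_INET, &local.sin_addr, buf, (socklen_t)buflen) == NULL)
        return -1;
    return 0;
}
```

Called on the descriptor returned by accept(), this reports the address the client actually used, even though the listening socket itself is bound to 0.0.0.0.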
I'm writing a program in C on GNU/Linux that uses UDP to communicate messages between various instances of the program, either on a single machine, or across a network. Each instance of the program has its own unique internal application layer address that it uses to differentiate between instances that run on a single machine (and thus share an IP address). Currently, the whole system communicates on a single UDP port.
This works fine between instances of the program running on separate machines, as these all have unique IP addresses, and thus unique IP/port pairs to bind to. The problem is running multiple instances on a single machine. In this case, only the first instance of the program manages to bind its socket and the others fail since the port is already in use.
Is there a way to bind multiple datagram sockets to a single port? I realize this is not normally advisable, but since I have unique application layer addresses that I can use to resolve the ambiguity, it would be helpful in this case. Essentially, I want to be able to do the following:
Bind all instances of the program on a single machine to the same common protocol port
When a message is received, each instance will use recv with the MSG_PEEK flag set to determine if the message's application layer address matches the instance's internal address.
For the single instance on a given machine where the addresses match, a regular call to recv will remove the message from the input queue for processing by the appropriate instance.
Essentially, I wish to use UDP as a common communication medium with more specific addressing occurring at the application layer.
Is there a standard way of doing this in GNU C? I realize that I could write a top level governing program to listen to all messages on the socket and reroute them to the appropriate instance, but this seems unnecessarily complicated, and breaks the program operating identically with multiple instances across a network vs across a shared single IP. I also know I could use multiple ports, but this adds the need to assign each instance a separate free port and keep track of these across the entire network of instances.
Essentially, I wish to "Broadcast" a message to a group of instances sharing a single IP address and let them sort out who the message belongs to at the application layer.
Thoughts?
You can do such binding with setsockopt(SO_REUSEPORT), but I think it would not help. You will have several sockets, each with its own packet queue, and each packet will go in one queue only. MSG_PEEK will do no good.
A top-level instance rerouting messages to the different consumers looks like the right solution.
You can't usefully have multiple sockets bound to the same IP/port combination.
Use some message queue / message passing interface, and forget about UDP.
For example, see 0MQ (zeromq) http://www.zeromq.org/
If it's a client/server style app, the client side need not bind.
When the server responds to a client that hasn't called bind, it responds to the source port of the client's datagram, which the OS chose when the client first sent (without bind).
The client then reads the reply from that same socket.
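A minimal sketch of the client side in C, assuming IPv4. The helper name send_unbound is made up for illustration; it sends from a socket that was never passed to bind() and reports the port the kernel assigned at send time.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send `msg` to ip:port from a fresh UDP socket
 * without calling bind(), and store the kernel-assigned local port in
 * *local_port. Returns the socket (for reading replies), or -1. */
int send_unbound(const char *ip, uint16_t port, const char *msg,
                 uint16_t *local_port)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* The first send makes the kernel pick an ephemeral source port. */
    if (sendto(fd, msg, strlen(msg), 0,
               (struct sockaddr *)&dst, sizeof dst) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in me;
    socklen_t len = sizeof me;
    if (getsockname(fd, (struct sockaddr *)&me, &len) < 0) {
        close(fd);
        return -1;
    }
    *local_port = ntohs(me.sin_port);
    return fd;
}
```

The server sees that port as the source of the datagram (via recvfrom) and replies to it; the client simply calls recvfrom on the returned descriptor. Several clients on one machine each get a different ephemeral port, so they do not collide.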
Most of the applications I've seen that use TCP, do roughly the following to connect to remote host:
1. get the hostname (or address) from the configuration/user input (textual)
2. either resolve the hostname into an address and add the port, or use getaddrinfo()
3. from the above, fill in the sockaddr_* structure with one of the remote addresses
4. use connect() to get the socket connected to the remote host
5. if that fails, possibly go to (3) and retry, or just complain about the error
(2) is blocking in the stock library implementation, and (4) seems to be most frequently non-blocking, which leaves room for a lot of similar yet subtly different code that all serves the same purpose: asynchronously connecting to a remote host by its hostname.
So the question: what are the good reasons not to have the additional single call like following:
int sockfd = connect_by_name(const char *hostname, const char *servicename)
?
I can come up with three:
historic: because that's what the API is
provide for custom per-application policy mechanism for address selection/connection retry: this seems a bit superficial, since for the common case ("get me a tube to talk to remote host") the underlying OS should know better
provide the visual feedback to the user about the exact step involved ("name resolution" vs "connection attempt"): this seems rather important, lookup+connection attempt may take time
Only the last of them seems to be compelling enough to rewrite the resolve/connect code for every client app (as opposed to at least having and using a widely used library that would implement the connect_by_name() semantics in addition to the existing sockets API), so surely there should be some more reasons that I am missing ?
(one of the reasons behind the question is that this kind of API would appear to help the portability to IPv6, as well as possibly to other stream transport protocols significantly)
Or, maybe such a library exists and my google-fu failed me ?
(edited: corrected the definition to look like it was meant to look, thanks LnxPrgr3)
Implementing such an API with non-blocking characteristics within the constraints of the standard library (which, crucially, isn't supposed to start its own threads or processes to work asynchronously) would be problematic.
Both the name lookup and the connecting part of the process require waiting for a remote response. If either of these is not to block, then that requires a way of doing asynchronous work and signalling the change in state of the socket to the calling application. connect is able to do this, because the work of the connect call is done in the kernel, and the kernel can mark the socket as writable when the connect is done. However, name lookup is not able to do this, because the work of a name lookup is done in userspace, and without starting a new thread (which is verboten in the standard library), giving that name lookup code a way to be woken up to continue work is a difficult problem.
You could do it by having your proposed call return two file descriptors - one for the socket itself, and another that you are told "Do nothing with this file descriptor except to check regularly if it is readable. If this file descriptor becomes readable, call cbn_do_some_more_work(fd)". That is clearly a fairly uninspiring API!
The usual UNIX approach is to provide a set of simple, flexible tools, working on a small set of object types, that can be combined in order to produce complex effects. That applies to the programming API as much as it does to the standard shell tools.
Because you can build higher level APIs such as the one you propose on top of the native low level APIs.
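As a rough illustration, a blocking version of the proposed call can be layered on getaddrinfo() and connect() in a few lines of C. This is only a sketch under the assumption that blocking is acceptable and TCP is wanted; it does not address the asynchronous case.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Blocking sketch of the proposed call: resolve hostname/servicename and
 * return a connected TCP socket, or -1 if resolution or every connection
 * attempt failed. */
int connect_by_name(const char *hostname, const char *servicename)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(hostname, servicename, &hints, &res) != 0)
        return -1;

    /* Try each returned address in order until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Because the address family comes from getaddrinfo(), the caller never touches a sockaddr_in or sockaddr_in6 directly, which is where the IPv6 portability benefit mentioned in the question comes from.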
The socket API is not just for TCP, but can also be used for other protocols that may have different end point conventions (i.e. the Unix-local protocol where you have a name only and no service). Or consider DNS which uses sockets to implement itself. How does the DNS code connect to the server if the connection code relies on DNS?
If you would like a higher level abstraction, one library to check out is ACE.
There are several questions in your question. For instance, why not
standardize an API with such a connect_by_name? That would certainly
be a good idea. It would not fit every purpose (see the DNS example
from R Samuel Klatchko) but for the typical network program, it would
be OK. A paper exploring such APIs is "Simplifying Internet Applications Development
With A Name-Oriented Sockets Interface" by Christian Vogt. Note
that another difficulty for such an API would be "callback"
applications, for instance a SIP client asking to be called back: the
application has no easy way to know its own name and therefore often
prefers to be called back by address, despite the problems this causes,
for instance with NAT.
Now, another question is "Is it possible to build such
connect_by_name subroutine today?" Partly yes (with the caveats
mentioned by caf) but, if written in userspace, in an ordinary
library, it would not be completely "name-oriented" since the Unix
kernel still manages the connections using IP addresses. For instance,
I would expect a "real" connect_by_name routine to be able to
survive renumbering (for instance because a mobile host renumbered),
which is quite difficult to do in userspace.
Finally, yes, there already exist a lot of libraries with similar
semantics. For an HTTP client (the most common case for a program whose
network abilities are not the main feature, for instance an XML
processor), you have Neon and libcURL. With libcURL, you can
simply write things like:
#define URL "http://www.velib.paris.fr/service/stationdetails/42"
...
curl_easy_setopt(curl, CURLOPT_URL, URL);
result = curl_easy_perform(curl);
which is even higher-layer than connect_by_name since it uses a
URL, not a domain name.