disadvantages of binding to ANY interface - c

I have a server that binds to a specific IP address (on a Linux system). We are considering the option to bind(0), i.e. to bind to ANY interface. Are there any problems with this?

Depends. You may:
- want to have different processes binding to different IPs. In this case you don't want any of them to bind to all.
- want the server to be accessible only from the internal network (for instance, when the other interface is accessible from the outer world).
- want something that I can't think of at the moment.
Basically, it's not the binding that has disadvantages, but the effect thereof, and it all depends on whether this is what you want.

The most obvious issue would be that if you depend on network topology to provide access control and safety (firewalls etc.), then the presence of multiple interfaces might mean that not all access to the server is protected to the same degree. This could especially be a problem if interfaces are added or changed in the future.
Also, if your server has a notion of "its own IP" for its own purposes and it is not programmed for the eventuality that different client connections will be established to different "local" IPs, then that might present a problem; of course, you'd need to read the source to weigh in on this.
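To make the choice concrete, here is a minimal sketch of what changes between the two options. The helper name listen_on is made up for illustration, and it assumes IPv4 and a TCP server; the only difference between "bind to ANY" and "bind to one address" is the value placed in sin_addr.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket, or -1 on error.
 * ip == NULL means the wildcard address (INADDR_ANY, i.e. "bind(0)");
 * otherwise ip is a dotted-quad IPv4 string such as "127.0.0.1". */
int listen_on(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (ip == NULL) {
        addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* every local address */
    } else if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return -1;                                 /* not a valid IPv4 string */
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts of the server on the same port. */
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this sketch, listen_on(NULL, 8080) accepts connections arriving on any local address, while listen_on("127.0.0.1", 8080) only accepts connections addressed to loopback (the port number is just an example).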

Related

Synchronize sqlite files without a public IP

I'm trying to come up with a way to sync a sqlite database between two computers.
If this were on a machine with a public IP this would not be difficult but I'm trying to find a way to make this work for ANY two devices, and most computers don't have a static IP.
What are some of the ways I can tackle this problem?
Assuming you just need to find the peer's IP address...
1. Broadcast a UDP packet onto the LAN, if the machines are on the same LAN segment. You can also try using admin-scoped multicast, but mileage will vary according to the network setup and gear in use.
2. If trying to find two machines across the internet (assuming you can solve the NAT address translation issue), you need to bounce off a node that will hold info for you. E.g. write a packet to dweet.io, sparkfun, or another website that will store and forward data. You can also read Twitter feeds etc.; basically you need a known reference point for both to talk to. Look into how malware creates command and control networks for some ideas. Or search for rendezvous servers and protocols.
3. If the address range is small, probe all possible addresses. But be careful, as you might trigger antivirus or ISP action.
4. If you want something more browser-based, look at WebRTC; it is not quite what you are after, but some of the techniques for discovery might be interesting.
5. If you have access, you can play with your DNS records. Essentially this is a variation of (2).
There are more options too, but they get more special-purpose or become a bit too stealthy for general use. Also see how mesh networks are formed.
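As a rough illustration of the LAN broadcast option, the sending side could look like the sketch below. The port number, function names, and message format are all assumptions made for the example; nothing here is a standard discovery protocol.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define DISCOVERY_PORT 40404   /* arbitrary port chosen for this sketch */

/* Create a UDP socket that is allowed to send to broadcast addresses.
 * Returns the socket, or -1 on error. */
int open_broadcast_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one announcement to the limited broadcast address 255.255.255.255.
 * Returns 0 on success, -1 on error (for example, no usable interface). */
int announce(int fd, const char *msg)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(DISCOVERY_PORT);
    dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);

    ssize_t n = sendto(fd, msg, strlen(msg), 0,
                       (struct sockaddr *)&dst, sizeof dst);
    return n == (ssize_t)strlen(msg) ? 0 : -1;
}
```

The other machine would bind a UDP socket to DISCOVERY_PORT on INADDR_ANY and call recvfrom(); the source address it gets back is the announcing peer's IP. Routers normally do not forward such broadcasts, which is why this only works on the same segment.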

Difference binding to INADDR_ANY and a specific IP

I'm confused about the use and consequences of INADDR_ANY when binding a socket. Of course, a socket bound to INADDR_ANY listens on all the local interfaces. My question is about what consequences this has.
I remember reading that binding to a specific IP address allows the kernel to handle the demultiplexing but can't find the reference any more.
Will the use of INADDR_ANY have consequences of this kind, or will I simply receive data from all my local IPs? What are the benefits and problems of using each kind of binding?
Other questions that discuss this:
bind with INADDR_ANY
Question about INADDR_ANY
EDIT: Found the reference. It's from Unix Network Programming (Stevens)
One advantage in binding a non-wildcard IP address is that the
demultiplexing of a given destination IP address to a given server
process is then done by the kernel.
What does this really mean?
Binding to specific interfaces is something to use only in very special circumstances, when the application needs to "know" the local IP addresses and the immediate network layout. A routing daemon program is perhaps the best example.
Another, more pedestrian example: if you have a multi-homed machine (i.e. a machine with more than one connection to the Internet, possibly different ISPs) you can bind to a specific interface to make sure that the connection goes through a given connection. Binding separately to each network interface, the application could detect link down etc.
Implementations of protocols that need to "know" the local IP address (infamous examples: FTP, SIP, UPnP) tend to bind sockets to specific interfaces. (I consider them to be bad protocols, since they violate the isolation between transport and application layers).
Save for these cases, it is normally over-engineering to bind to specific interfaces, because addresses and interfaces may change, and the program must detect these conditions to update the respective sockets.
You are not going to be able to measure any performance difference between using a specific IP or all of them. You might wish to use a specific one based on the needs of your application... for example, if you know you should never have a (legitimate) connection from an external facing IP, you would not want to receive input from it, for security reasons.
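If a server bound to INADDR_ANY still needs to know which local address a particular client connected to, it can ask the kernel per connection with getsockname() on the accepted socket. A minimal sketch, assuming IPv4 (the helper name local_address_of is made up):

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Store the local IPv4 address of a connected socket in *out.
 * For a listener bound to INADDR_ANY, this is the address the client
 * actually dialled. Returns 0 on success, -1 on error. */
int local_address_of(int conn_fd, struct in_addr *out)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;

    if (getsockname(conn_fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    if (local.sin_family != AF_INET)
        return -1;              /* this sketch only handles IPv4 */

    *out = local.sin_addr;
    return 0;
}
```

The result can be turned into text with inet_ntop() for logging, or compared against a list of addresses the application considers internal before serving the request.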

emulating a network interface

Can someone possibly explain (within the size of a stackoverflow answer) the code required in order to emulate a network interface? I just know that there is virtualization software out there like Qemu that does this specific type of hardware emulation, but have no idea how this would work. Lots of books will show you how to create a program that listens on a TCP socket, but not create a host that gets its own IP address.
VirtualBox is open source. As a VM, with networking support, it should be sufficient to demonstrate to you what to do, along with a working implementation. https://www.virtualbox.org/wiki/Downloads
It really depends on what you mean and what you want to achieve. If you want to emulate real hardware, you need to use the hypervisor's primitives to emulate most of the behaviour described in the datasheet of the corresponding adapter. If you want to introduce some service, e.g. a DNS or HTTP service visible in the internal network, you need to teach some userland stack (e.g. lwIP or Slirp, or only part of one if you need just UDP or lower layers) to communicate with the hypervisor's internal network.

how to restrict number of proposed ports by getaddrinfo

One of my stand-alone Java applications (no sources available) picks a random available port to listen on.
At this stage I assume it uses the getaddrinfo system call to obtain addresses to bind against.
Since I'm maintaining hundreds of various servers with assigned ports, the black-box app sometimes kicks in and picks one of 'the assigned' ports, which causes my small servers to fail on startup...
I'm wondering, is there a way to restrict the range of ports proposed by the OS?
I would mostly be interested in system config solutions,
but if there are no other solutions I'm also able to hack bind()/getaddrinfo (this would require some hints as well...)
thanks
You should be able to control it through proc entries. For example, here is a system-wide setting:
/proc/sys/net/ipv4/ip_local_port_range
You can modify them. Or there may be utilities available for the same purpose.
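For reference, that file holds two numbers (the lowest and highest port the kernel will hand out automatically). A small sketch that reads them could look like this; the function names are made up, and the sample values in the comment are only an example of the format:

```c
#include <stdio.h>

/* Parse the two whitespace-separated numbers found in
 * /proc/sys/net/ipv4/ip_local_port_range, e.g. "32768\t60999\n".
 * Returns 0 on success, -1 if the text is not a valid range. */
int parse_port_range(const char *text, int *low, int *high)
{
    if (sscanf(text, "%d %d", low, high) != 2)
        return -1;
    if (*low < 0 || *high > 65535 || *low > *high)
        return -1;
    return 0;
}

/* Read and parse the range stored at path.
 * Returns 0 on success, -1 on error. */
int read_port_range(const char *path, int *low, int *high)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_port_range(line, low, high) : -1;
}
```

Calling read_port_range("/proc/sys/net/ipv4/ip_local_port_range", &lo, &hi) shows the current range; changing it means writing two new numbers to the same file as root. Note that this range only governs ports chosen by the kernel (for instance when an application binds to port 0). If the application picks its random port number itself, the setting will not help.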
If an OS-wide change is not what you had in mind, configure the JVM's Java Security Manager so that SecurityManager.checkListen(NNN) throws SecurityException for any of the port numbers you want to reserve.
Take a look at:
http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec70.html
It's the solution to my problem; with it I could limit the port ranges.

Why is separate getaddrinfo-like() + connect() not refactored into a (theoretical) connect_by_name()?

Most of the applications I've seen that use TCP do roughly the following to connect to a remote host:
1. get the hostname (or address) from the configuration/user input (textual)
2. either resolve the hostname into an address and add the port, or use getaddrinfo()
3. from the above, fill in the sockaddr_* structure with one of the remote addresses
4. use connect() to get the socket connected to the remote host
5. if that fails, possibly go to (3) and retry, or just complain about the error
(2) is blocking in the stock library implementation, and (4) seems to be non-blocking most of the time, which leaves room for a lot of somewhat similar yet different code whose purpose is to connect asynchronously to a remote host by its hostname.
So the question: what are the good reasons not to have an additional single call like the following:
int sockfd = connect_by_name(const char *hostname, const char *servicename)
?
I can come up with three:
1. historic: because that's what the API is
2. provide for custom per-application policy mechanism for address selection/connection retry: this seems a bit superficial, since for the common case ("get me a tube to talk to remote host") the underlying OS should know better
3. provide visual feedback to the user about the exact step involved ("name resolution" vs "connection attempt"): this seems rather important, since lookup + connection attempt may take time
Only the last of them seems to be compelling enough to rewrite the resolve/connect code for every client app (as opposed to at least having and using a widely used library that would implement the connect_by_name() semantics in addition to the existing sockets API), so surely there should be some more reasons that I am missing?
(one of the reasons behind the question is that this kind of API would appear to help the portability to IPv6, as well as possibly to other stream transport protocols significantly)
Or, maybe such a library exists and my google-fu failed me?
(edited: corrected the definition to look like it was meant to look, thanks LnxPrgr3)
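For reference, the five steps listed above collapse into something like the following blocking sketch. It is only one possible reading of the proposed connect_by_name(): it assumes a stream (TCP-style) socket, tries the addresses in the order getaddrinfo() returns them, and applies no retry or timeout policy of its own.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Blocking sketch: returns a connected socket, or -1 if resolution
 * failed or none of the resolved addresses accepted the connection. */
int connect_by_name(const char *hostname, const char *servicename)
{
    struct addrinfo hints, *results, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM;  /* stream socket */

    /* Step 2: resolve host and service together (this call blocks). */
    if (getaddrinfo(hostname, servicename, &hints, &results) != 0)
        return -1;

    int fd = -1;
    /* Steps 3 to 5: try each returned address until one connects. */
    for (ai = results; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(results);
    return fd;
}
```

A caller would write something like connect_by_name("example.org", "http"). Because the address family comes from getaddrinfo(), the same code works for IPv4 and IPv6, which is the portability benefit mentioned in the question.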
Implementing such an API with non-blocking characteristics within the constraints of the standard library (which, crucially, isn't supposed to start its own threads or processes to work asynchronously) would be problematic.
Both the name lookup and the connecting part of the process require waiting for a remote response. If either of these is not to block, then that requires a way of doing asynchronous work and signalling the change in state of the socket to the calling application. connect is able to do this, because the work of the connect call is done in the kernel, and the kernel can mark the socket as writable when the connect is done. However, name lookup is not able to do this, because the work of a name lookup is done in userspace, and without starting a new thread (which is verboten in the standard library), giving that name lookup code a way to be woken up to continue work is a difficult problem.
You could do it by having your proposed call return two file descriptors - one for the socket itself, and another that you are told "Do nothing with this file descriptor except to check regularly if it is readable. If this file descriptor becomes readable, call cbn_do_some_more_work(fd)". That is clearly a fairly uninspiring API!
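The connect half of this can be sketched as follows. The function names are made up, and the sketch assumes the address has already been resolved: the kernel reports completion of a non-blocking connect by making the socket writable, and the outcome is then read from SO_ERROR.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers building addr */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start a non-blocking connect. Returns the socket, or -1 on immediate
 * failure. The caller confirms the outcome later with finish_connect(). */
int start_connect(const struct sockaddr *addr, socklen_t addrlen)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    if (connect(fd, addr, addrlen) < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;   /* connected already, or still in progress */
}

/* Wait up to timeout_ms for the kernel to report the outcome.
 * Returns 0 if connected, -1 on timeout or connection error. */
int finish_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;

    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return -1;
    return 0;
}
```

In a real event loop the descriptor would be added to the application's own poll/select set instead of being waited on here. There is no equivalent descriptor to wait on for a getaddrinfo() call in progress, which is the asymmetry described above.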
The usual UNIX approach is to provide a set of simple, flexible tools, working on a small set of object types, that can be combined in order to produce complex effects. That applies to the programming API as much as it does to the standard shell tools.
Because you can build higher level APIs such as the one you propose on top of the native low level APIs.
The socket API is not just for TCP, but can also be used for other protocols that may have different end point conventions (i.e. the Unix-local protocol where you have a name only and no service). Or consider DNS which uses sockets to implement itself. How does the DNS code connect to the server if the connection code relies on DNS?
If you would like a higher level abstraction, one library to check out is ACE.
There are several questions in your question. For instance, why not
standardize an API with such a connect_by_name? That would certainly
be a good idea. It would not fit every purpose (see the DNS example
from R Samuel Klatchko) but for the typical network program, it would
be OK. A paper exploring such APIs is "Simplifying Internet Applications Development
With A Name-Oriented Sockets Interface" by Christian Vogt. Note
that another difficulty for such an API would be "callback"
applications, for instance a SIP client asking to be called back: the
application has no easy way to know its own name and therefore often
prefers to be called back by address, despite the problems this causes, for
instance with NAT.
Now, another question is "Is it possible to build such a
connect_by_name subroutine today?" Partly yes (with the caveats
mentioned by caf) but, if written in userspace, in an ordinary
library, it would not be completely "name-oriented" since the Unix
kernel still manages the connections using IP addresses. For instance,
I would expect a "real" connect_by_name routine to be able to
survive renumbering (for instance because a mobile host was renumbered),
which is quite difficult to do in userspace.
Finally, yes, there already exist many libraries with similar
semantics. For an HTTP client (the most common case for a program whose
network abilities are not the main feature, for instance an XML
processor), you have Neon and libcURL. With libcURL, you can
simply write things like:
#define URL "http://www.velib.paris.fr/service/stationdetails/42"
...
curl_easy_setopt(curl, CURLOPT_URL, URL);
result = curl_easy_perform(curl);
which is even higher-layer than connect_by_name since it uses a
URL, not a domain name.

Resources