Persistent TCP connections, long timeouts, and IP-hopping mobile devices

We have an app with a long polling scheme over HTTP (although this question could apply to any TCP-based protocol). Our timeout is fairly high, 30 minutes or so.
What we see sometimes is mobile devices hopping from IP to IP fairly often, every minute or so, which causes dozens of long-lived sockets to pile up on the server. I can't help but think this is causing more load than necessary.
So I am guessing that some IP gateways are better than others at closing connections when a device hops off. The strategies I can think of to deal with this are:
Decrease the timeout (at the cost of battery life on the device)
Close the last active connection when a user reconnects (requires cookie or user ID tracking)
Any others?

I would look into closing the last active connection using a cookie or some sort of ID on your server. Yes, it's more work, but as soon as the user hops addresses, you can find the old socket and clean up the resources right away. It should be fairly easy to tie to a username or something like that.
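A rough sketch of that bookkeeping (the names here are hypothetical, not code from the question):

```python
# Rough sketch: remember the current long-poll socket per user ID and close
# the previous one when the same user reconnects from a new address.
import threading

active = {}                  # user_id -> socket currently held for that user
lock = threading.Lock()

def register(user_id, sock):
    """Call when a user starts a new long poll."""
    with lock:
        old = active.get(user_id)
        active[user_id] = sock
    if old is not None and old is not sock:
        try:
            old.close()      # free the stale socket's resources right away
        except OSError:
            pass

def unregister(user_id, sock):
    """Call when a long poll completes normally."""
    with lock:
        if active.get(user_id) is sock:
            del active[user_id]
```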
The other problem you may run into, even if the user equipment isn't hopping addresses, is that some mobile networks (and maybe your own network) have a stateful firewall that cleans up unused sockets. That will cause connectivity problems, since a new connection will require the SYN/SYN-ACK handshake again. Just something to keep in mind if you're noticing connectivity problems.
If you do decide to play with keepalives, please don't be too aggressive; chatty applications are the plague of mobile networks, and ones that hammer the network when they lose connection to the server can cause all sorts of problems for the network (and for you, if the carrier catches on). At least have some sort of backoff mechanism for retrying connectivity, and maybe even try to find out why the device is switching IP addresses every minute. If everything is functioning properly, that shouldn't occur.
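For the retry side, a minimal sketch of a capped exponential backoff with jitter (the delay values are illustrative, not carrier guidance):

```python
# Minimal sketch: capped exponential backoff with jitter for reconnect attempts.
# The base delay and cap are illustrative values only.
import random
import time

def reconnect_with_backoff(connect, base=5.0, cap=300.0):
    delay = base
    while True:
        try:
            return connect()                      # connect() is whatever opens your socket
        except OSError:
            time.sleep(random.uniform(0, delay))  # jitter avoids synchronized retry storms
            delay = min(delay * 2, cap)
```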
***I work for a mobile operator in Canada, however, my comments do not reflect the position of my employer.

If you can, turn on TCP keepalive on the sockets and give them a fairly low timer (e.g. every 1-5 minutes). As long as you're reading from the socket, you'll detect an unreachable peer faster, and with less resource utilization on the phone than decreasing your 30-minute application timeout.
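A minimal sketch of what that looks like with Linux-style socket options (constant names and availability differ on other platforms):

```python
# Minimal sketch: enable TCP keepalive with short timers (Linux option names;
# macOS/Windows expose different constants).
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)      # turn keepalive on
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 120)   # seconds idle before the first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)   # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)      # failed probes before the peer is declared dead
s.connect(("example.com", 80))                               # placeholder endpoint
```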

Related

Sending UDP and TCP packets on the same network line - how to prevent UDP drops? [duplicate]

Consider the prototypical multiplayer game server.
Clients connecting to the server are allowed to download maps and scripts. It is straightforward to create a TCP connection to accomplish this.
However, the server must continue to be responsive to the rest of the clients via UDP. If TCP download connections are allowed to saturate available bandwidth, UDP traffic will suffer severely from packet loss.
What might be the best way to deal with this issue? It definitely seems like a good idea to "throttle" the TCP upload connection somehow by keeping track of time and calling send() on a regular interval. That way, if UDP packet loss starts to occur more frequently, the TCP connections can be throttled further. Will the OS tend to bunch the data together rather than sending it off in a steady stream? How often would I want to call send()? I imagine doing it too often would cause the data to be buffered and coalesced first, rendering the method ineffective, while doing it too infrequently would provide insufficient (and inefficient use of) bandwidth. Similar considerations apply to how much data to send each time.
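For what it's worth, a minimal sketch of the timed, fixed-size send loop being described (chunk size and interval are arbitrary):

```python
# Minimal sketch of pacing a TCP transfer: push a fixed-size chunk per tick
# instead of handing the whole buffer to send() at once. Numbers are arbitrary.
import time

def paced_send(sock, data, chunk=8192, interval=0.05):
    sent = 0
    while sent < len(data):
        sent += sock.send(data[sent:sent + chunk])
        time.sleep(interval)   # ~8 KiB every 50 ms caps this transfer near 160 KiB/s
```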
It sounds a lot like you're solving a problem the wrong way:
If you're worried about losing UDP packets, you should consider not using UDP.
If you're worried about sharing bandwidth between two functions, you should consider having separate pipes (bandwidth) for them.
Traffic shaping (which is what this sounds like) is typically addressed in the OS. You should look in that direction before making strange changes to your application.
If you haven't already gotten the application working and experienced this problem, you are probably prematurely optimizing.
To avoid saturating the bandwidth, you need to apply some sort of rate limiting. TCP actually already does this, but it might not be effective in some cases. For example, it has no idea whether you consider the TCP or the UDP traffic to be more important.
To implement any form of rate limiting involving UDP, you will first need to calculate UDP loss rate. UDP packets will need to have sequence numbers, and then the client has to count how many unique packets it actually got, and send this information back to the server. This gives you the packet loss rate. The server should monitor this, and if packet loss jumps after a file transfer is started, start lowering the transfer rate until the packet loss becomes acceptable. (You will probably need to do this for UDP anyway, since UDP has no congestion control.)
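A minimal sketch of that sequence-number bookkeeping (the header layout here is made up for illustration):

```python
# Minimal sketch: prefix each datagram with a sequence number; the receiver
# counts unique sequence numbers and derives a loss rate to report back.
import struct

HEADER = struct.Struct("!I")   # 4-byte big-endian sequence number (illustrative format)

def wrap(seq, payload):
    return HEADER.pack(seq) + payload

class LossCounter:
    def __init__(self):
        self.seen = set()
        self.highest = -1

    def record(self, datagram):
        (seq,) = HEADER.unpack_from(datagram)
        self.seen.add(seq)
        self.highest = max(self.highest, seq)

    def loss_rate(self):
        expected = self.highest + 1
        if expected <= 0:
            return 0.0
        return 1.0 - len(self.seen) / expected   # fraction of datagrams that never arrived
```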
Note that while I mention "server" above, it could really be done in either direction, or both, depending on who needs to send what. Imagine a game with player-created maps that are transferred over peer-to-peer connections.
While lowering the transfer rate can be as simple as calling your send function less frequently, attempting to control TCP this way will no doubt conflict with the existing rate control TCP has. As suggested in another answer, you might consider looking into more comprehensive ways to control TCP.
In this particular case, I doubt it would be an issue, unless you really need to send lots of UDP information while the clients are transferring files.
I would expect most games to just show a loading screen or a lobby while this is happening. Neither should require much UDP traffic unless your game has its own VoIP.
Here is an excellent article series that explains some of the possible uses of both TCP and UDP, specifically in the context of network games. TCP vs. UDP
In a later article from the series, he even explains a way to make UDP 'almost' as reliable as TCP (with code examples).
And as always: measure your results. You have no way of knowing whether your code is making the connections faster or slower unless you measure.
"# If you're worried about losing UDP packets, you should consider not using UDP."
Right on. UDP means no guarantee of packet delivery, especially over the internet. Check TCP's speed instead; it is quite acceptable on modern internet connections for most users playing games.

Phone doesn't send all stored SSIDs while capturing Wi-Fi probe requests

I built a script with Scapy to capture probe requests on a monitoring Wi-Fi interface.
I successfully capture the requests, and some of the SSIDs contained in them, but most of the networks stored on the phone never get broadcast.
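The capture side is essentially a Scapy sniffer along these lines (a trimmed-down sketch, not the full script; the interface name is an assumption):

```python
# Trimmed-down sketch of a probe-request sniffer; assumes wlan0mon is already
# in monitor mode on the channel of interest.
from scapy.all import sniff, Dot11ProbeReq, Dot11Elt

def handle(pkt):
    if pkt.haslayer(Dot11ProbeReq):
        ssid = pkt[Dot11Elt].info.decode(errors="replace")  # first tagged element is the SSID
        if ssid:                                            # empty SSID = broadcast (wildcard) probe
            print(f"{pkt.addr2} probing for {ssid!r}")

sniff(iface="wlan0mon", prn=handle, store=False)
```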
And there isn't a clear pattern to why this happens; some phones don't broadcast SSIDs at all.
I'm trying to find an explanation for this behaviour, but haven't found one, apart from the idea that hidden networks have to be probed for by name in order for the phone to connect to them. Even that doesn't hold here, and most of the SSIDs that do get broadcast belong to visible networks.
Another behaviour I see is on iPhones, which only seem to probe for the network they are currently connected to, and nothing else (no network -> no SSIDs).
I have tried putting the interface on various channels, and the broadcast networks vary, but the great majority of the networks saved on the device still never show up.
Is there a reason behind this? Or a way to force the device to broadcast them all?
You seem to assume that the phone would do a probe request for each and every known network, permanently.
This is not the case, and not just for phones, but in general. Quoting the Wi-Fi Alliance[*]:
What are passive and active scanning?
The reason for client scanning is to determine a suitable AP to which the client may [emphasis mine] need to roam now or in the future. A client can use two scanning methods: active and passive. During an active scan, the client radio transmits a probe request and listens for a probe response from an AP [emphasis mine]. With a passive scan, the client radio listens on each channel for beacons [emphasis mine again] sent periodically by an AP. A passive scan generally takes more time, since the client must listen and wait for a beacon versus actively probing to find an AP. Another limitation with a passive scan is that if the client does not wait long enough on a channel, then the client may miss an AP beacon.
So it is entirely application/OS dependent whether the phone STA does an active scan, sending probe requests, or just sits there listening for beacons (or does nothing at all).
From what I remember (it's been a few years since I worked on or looked at Android code, so it may have changed), Android will not do an active scan, and thus will not send probe requests for known SSIDs, unless you're on the Wi-Fi networks settings screen. It will just listen for beacons.
There is some Wi-Fi 802.11 design rationale behind this:
STAs are supposed to be mobile. After all, if you're not moving from time to time, there's not much point in using Wi-Fi (except marketing or laziness, and of course smartphones changed that); you might as well get wired.
...and if you're mobile, it's reasonable to think you're running on a battery,
...and so you want to save battery life: you would rather do passive scans, listening for beacons, than active scans sending probe requests, because passive scanning uses less power.
This idea of power-saving alternative capabilities is spread all over the place in the 802.11 design, hidden under the carpet, when you're a STA.
So it is fully OS-stack/application dependent whether the STA 1/ just listens for beacons, 2/ actively sends a probe request for every known AP, or 3/ sends a broadcast probe request, and also whether it does so continuously, periodically, or only in a known state (e.g. screen ON and the user opening the Wi-Fi networks settings screen).
Now there may be some other considerations, like some regional regulations that mandate that you first listen to beacons to decide if you can or cannot use some channels. But the main point is above.
*: http://www.wi-fi.org/knowledge-center/faq/what-are-passive-and-active-scanning
EDIT:
On the programming side:
1/ What you seem to have is an IOP (interoperability) problem: you expect a specific behavior from STAs regarding active vs. passive scanning and the probe requests involved, and this is not how it works in the real world. Depending on your application's final goal, this may be a flaw in the design, or just a minor nuisance. You may want to restrict yourself to some specific device brand, or try to cover all cases, which has a development cost.
2/ ...OR you were just surprised by your observations and are looking for an explanation. In such cases of surprising results, it goes without saying: go straight to Wireshark to check your program's observations (if your program is a packet sniffer) or its behavior (if your program is a client/server/layer-XYZ protocol implementation).
On the 802.11 strategies regarding active vs passive scan and power saving:
From "802.11 Wireless Networks: The Definitive Guide, 2nd Edition", by Matthew S. Gast ("member of the IEEE 802.11 working group, and serves as chair of 802.11 Task Group M. As chair of the Wi-Fi Alliance's Wireless Network Management marketing task group, he is leading the investigation of certification requirements for power saving, performance optimization, and location and timing services" - from his publisher bio). A book i can highly recommend.
p. 171:
ScanType (active or passive)
Active scanning uses the transmission of Probe Request frames to identify networks in the area. Passive scanning saves battery power by listening for Beacon frames.
p. 172:
Passive Scanning
Passive scanning saves battery power because it does not require transmitting. In passive scanning, a station moves to each channel on the channel list and waits for Beacon frames.
Also, a bit old (2003), but these guys know their stuff about networking. About scanning strategies:
From Cisco "802.11 Wireless LAN Fundamentals", chapter 5 "mobility".
Page 153:
Roaming Algorithms
The mechanism to determine when to roam is not defined by the IEEE 802.11 specification and is, therefore, left to vendors to implement. [...] The fact that the algorithms are left to vendor implementation provide vendors an opportunity to differentiate themselves by creating new and better performing algorithms than their competitors. Roaming algorithms become a vendor's "secret sauce," and as a result are kept confidential.
Page 154 "Determining Where to Roam":
There is no ideal technique for scanning. Passive scanning has the benefit of not requiring the client to transmit probe requests but runs the risk of potentially missing an AP because it might not receive a beacon during the scanning duration. Active scanning has the benefit of actively seeking out APs to associate to but requires the client to actively transmit probes. Depending on the implementation for the 802.11 client, one might be better suited than the other. For example, many embedded systems use passive scanning as the preferred method [emphasis mine] [...]
Other interesting stuff on page 155, "Preemptive AP Discovery".

Possible causes for lack of data loss over my localhost UDP protocol?

I just implemented my first UDP server/client. The server is on localhost.
I'm sending 64 KB of data from the client to the server, which the server is supposed to send back. Then the client checks how much of the 64 KB is still intact, and it all is. Always.
What are the possible causes of this behaviour? I was expecting at least some data loss.
client code: http://pastebin.com/5HLkfcqS
server code: http://pastebin.com/YrhfJAGb
PS: A newbie in network programming here, so please don't be too harsh. I couldn't find an answer for my problem.
The reason why you are not seeing any lost datagrams is that your network stack is simply not running into any trouble.
Your localhost connection can easily cope with what you throw at it; a loopback connection can move several hundred megabytes of data per second on a decent CPU.
To see dropped datagrams you should increase the probability of interference. You have several opportunities:
increase the load on the network
keep your CPU busy with other tasks
use a "real" network and transfer data between real machines
run your code over a DSL line
set up a virtual machine and simulate network outages (VMware Workstation is able to do this)
And this might be an interesting read: What would cause UDP packets to be dropped when being sent to localhost?
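For illustration, one way to provoke drops even on loopback is to overflow the receiver's socket buffer; a minimal sketch (buffer size and packet count are arbitrary):

```python
# Minimal sketch: shrink the receiver's buffer and send a burst without reading,
# so the kernel has to drop datagrams even over loopback. Numbers are arbitrary.
import socket

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)   # deliberately tiny receive buffer
recv.bind(("127.0.0.1", 0))
recv.setblocking(False)
port = recv.getsockname()[1]

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
SENT = 10000
for i in range(SENT):
    send.sendto(i.to_bytes(4, "big"), ("127.0.0.1", port))

received = 0
while True:
    try:
        recv.recv(4)
        received += 1
    except BlockingIOError:
        break

print(f"sent {SENT}, received {received}, dropped {SENT - received}")
```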

How would one go about to measure differences in clock time on two different devices?

I'm currently in an early phase of developing a mobile app that depends heavily on timestamps.
A master device is connected to several client devices over wifi, and issues various commands to these. When the client devices receive commands, they need to mark the (relative) timestamp when the command is executed.
While all this is simple enough, I haven't come up with a solution for how to deal with clock differences. For example, the master device might have its clock at 12:01:01, while client A is at 12:01:02 and client B at 12:01:03. Mostly, I can expect these devices to be set to similar times, as they sync over NTP. However, the nature of my application requires millisecond precision, so I would like to safeguard against discrepancies.
A short delay between issuing a command and executing the command is fine, however an incorrect timestamp of when that command was executed is not.
So far, I'm thinking of something along the lines of having the master device ping each client device to determine the transaction time, and then request the client to send its "local" time. Based on this, I can calculate the time difference between master and client. Once the time difference is known, the client can adapt its timestamps accordingly.
I am not very familiar with networking, though, and I suspect that pinging a device is not a very reliable method of establishing transaction time, since a lot of factors apply and latency may change.
I assume that there are many real-world settings where such timing issues are important, and thus there should be solutions already. Does anyone know of any? Is it enough to simply divide response time by two?
Thanks!
One heads over to RFC 5905 for NTPv4 and learns from the folks who have really put their noodle to this problem how to figure it out.
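The core of what that RFC formalizes starts from a four-timestamp exchange; a minimal sketch of the offset/delay arithmetic (simplified, not the full NTP algorithm):

```python
# Simplified sketch of the NTP-style offset/delay estimate (see RFC 5905).
# t0: client send time, t1: server receive time,
# t2: server reply time, t3: client receive time (t0/t3 on the client clock).
def offset_and_delay(t0, t1, t2, t3):
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # estimated client-vs-server clock offset
    delay = (t3 - t0) - (t2 - t1)            # round-trip time minus server processing
    return offset, delay

# Example: client sends at 10.000, server stamps 10.012 and 10.013,
# reply arrives at 10.006 on the client clock -> offset ~ 9.5 ms, delay ~ 5 ms.
print(offset_and_delay(10.000, 10.012, 10.013, 10.006))
```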
Or you simply make sure NTP is working properly on your servers so that you don't have this problem in the first place.

Socket stress test

What will flood the network better?
1- Opening one socket to an HTTP web server and writing data till it crashes
2- Opening multiple sockets and writing data till they crash
3- Opening a socket and sending TCP packets till it crashes
4- Opening multiple sockets and sending TCP packets till they crash?
It sounds like what you are looking to do is to test how the web server reacts to various flavors of Denial of Service attacks.
Something to consider is what Denial of Service logic the web server has, and how Denial of Service protection is typically implemented in web servers. For instance, is there logic to cap the number of concurrent connections, or the number of concurrent connections from the same IP, or to monitor the amount of traffic so as to throttle it, or to disconnect if the amount of traffic exceeds some threshold?
One thing to consider is not to just push lots of bytes through the TCP/IP socket. The web server is interpreting the bytes and expects the HTTP protocol to be used. So what happens if you do strange and unusual things with the HTTP protocol, as well as with other protocols that are built on top of HTTP?
For options 3 and 4, it sounds like you are considering bypassing the TCP/IP stack with its windowing logic and just sending a stream of TCP protocol packets, ignoring reply packets, until something smokes. This would be more of a test of the TCP stack's robustness on the server than of the robustness of the web server itself.
The ability to reach network saturation depends on network conditions. It is possible to write your "flooder" in such a way that it slows you down because you are causing intermediate devices to drop packets, and the server itself ends up not seeing its maximum load.
Your application should start with one connection, and monitor the data rate. Continue to add connections if the aggregate data rate for all connections continues to rise. Once it no longer rises, you have hit a throughput limit. If it starts to drop, you are exceeding capacity of your system, and are either causing congestion control to kick in, or the server is unable to efficiently handle that many connections. If the throughput limit is much lower than what you expected, then you probably need to debug your network, or tune your TCP/socket parameters. If it is the server that is slowing down, you will need to profile it to see why it is not able to handle the connection load.
Also, check the data rate of each connection, and see if certain connections are much faster than others. If that happens, then the server has a fairness problem which should also be addressed. This has less to do with server and network performance as it has to do with good user experience, though. The presence of such a problem could be exploited in a denial of service attack.
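A rough sketch of that ramp-up strategy (host, port, payload size, and the improvement threshold are all assumptions; this is not a hardened load generator):

```python
# Rough sketch: add connections while the aggregate send rate keeps rising,
# and stop once another connection no longer improves throughput.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 8080     # assumed test server
PAYLOAD = b"x" * 65536
counters = []                      # bytes pushed per connection
stop = threading.Event()

def writer(idx):
    s = socket.create_connection((HOST, PORT))
    while not stop.is_set():
        counters[idx] += s.send(PAYLOAD)

def aggregate_rate(interval=2.0):
    before = sum(counters)
    time.sleep(interval)
    return (sum(counters) - before) / interval

best = 0.0
while True:
    counters.append(0)
    threading.Thread(target=writer, args=(len(counters) - 1,), daemon=True).start()
    rate = aggregate_rate()
    print(f"{len(counters)} connections: {rate / 1e6:.1f} MB/s")
    if rate <= best * 1.05:        # less than ~5% improvement: throughput limit reached
        break
    best = rate

stop.set()
```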
