Faster file transfer than FTP

FTP is a pure TCP-connect protocol, and thus AFAIK "as fast as it gets" when considering TCP file transfer options.
However, there are some other products that do not run over TCP - examples are the commercial products BI.DAN-GUN, fasp and FileCatalyst. The latter points out problems with pure TCP, and one can read more on Wikipedia, e.g. starting from the Network Congestion article.
What other alternatives are there? In particular Open Source ones? Also, one would think this should be an RFC of sorts - a standard protocol specific to large-file transfer, probably running over UDP. Does anyone know of such a protocol, or an initiative? (Google's SPDY is interesting, but doesn't directly address fast large-file transfer.)

Why do you think using TCP makes the transfer slower? TCP is usually able to use all available bandwidth. Using UDP instead is unlikely to be faster. In fact, if you tried to make a reliable UDP based file transfer, you'd likely end up implementing an inferior alternative to TCP - since you'd have to implement the reliability yourself.
What is problematic about FTP is that it performs multiple synchronous request-response commands for every file you transfer, and opens a new data connection for every file. This results in extremely inefficient transfers when a lot of smaller files are being transferred, because much of the time is spent waiting for requests/responses and establishing data connections instead of actually transferring data.
A simple way to get around this issue is to pack the files/folders into an archive. While you can, of course, just make the archive, send it using FTP or similar, and unpack it on the other side, the time spent packing and unpacking may be unacceptable. You can avoid this delay by doing the packing and unpacking on-line. I'm not aware of any software that integrates such on-line packing/unpacking. You can, however, simply use the nc and tar programs in a pipeline (Linux, on Windows use Cygwin):
First run on the receiver:
nc -l -p 7000 | tar xf - -C <destination_folder>
This will make the receiver wait for a connection on port number 7000. Then run on the sender:
cd /some/folder
tar cf - ./* | nc -q0 <ip_address_of_receiver> 7000
This will make the sender connect to the receiver and start the transfer. The sender creates the tar archive and sends it to the receiver, which extracts it - all at the same time. If you need to, you can reverse the roles of sender and receiver (by having the receiver connect to the sender).
This online-tar approach has neither of the two performance problems of FTP: it performs no request-response commands, and it uses only a single TCP connection.
However, note that this is not secure: anybody who connects to the receiver before your sender does can send it their own tar archive. If this is an issue, use a VPN in combination with appropriate firewall rules.
EDIT: you mentioned packet loss as a problem with TCP performance, and it is a significant one, if the FileCatalyst page is to be believed. It is true that TCP may perform sub-optimally on links with high packet loss. This is because TCP reacts aggressively to packet loss, since it assumes the loss is due to congestion; see additive increase/multiplicative decrease. I'm not aware of any free/open source file transfer programs that attempt to overcome this with custom protocols. You may, however, try out different TCP congestion avoidance algorithms. In particular, try TCP Vegas, which does not use packet loss as a signal to reduce the transmission rate.
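On Linux you can switch the congestion avoidance algorithm per socket without changing the application protocol. Here is a minimal sketch, assuming the tcp_vegas kernel module is loaded (modprobe tcp_vegas); TCP_CONGESTION is a Linux-specific socket option:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Select the congestion control algorithm for this socket. */
    const char *algo = "vegas";
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
        perror("setsockopt(TCP_CONGESTION)");

    /* Read back which algorithm is actually in use. */
    char buf[16] = {0};
    socklen_t len = sizeof buf;
    getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
    printf("congestion control: %s\n", buf);
    return 0;
}

System-wide, the same switch can be made with sysctl net.ipv4.tcp_congestion_control=vegas.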

There are a number of open source projects trying to tackle file transfer via UDP.
Take a look at UFTP, Tsunami or UDT; each project is at a different stage of development.
Depending on the bandwidth, latency and packet loss you are working with, each project will produce a different result. There is a blog article that tries to compare the three projects; here is the link: http://www.filecatalyst.com/open-source-fast-file-transfers

Not open source, but Aspera's transfer speeds are worth checking out; they are claimed not to degrade with packet loss or network delay. You can see a chart here.
It also uses a proprietary protocol called fasp.

Consider using UDP based file transfer - have a look at JSCAPE MFT Server, which implements a proprietary protocol known as AFTP (Accelerated File Transfer Protocol). Please review this link:
http://www.jscape.com/blog/bid/80668/UDP-File-Transfer-up-to-100x-Faster-than-TCP
Please keep in mind that JSCAPE's AFTP works optimally over long distance connections which have network latency. If there is no network latency AFTP will not perform any better than plain old regular FTP (over TCP).
Yes, I do work for JSCAPE LLC.


View - but not intercept - all IPv4 traffic to Linux computer

Is there a way to view all the IPv4 packets sent to a Linux computer?
I know I can capture the packets at the Ethernet level using libpcap. This can work, but I don't really want to have to defragment the IPv4 packets myself. Does libpcap provide this functionality and I'm just missing it?
One thing that kinda works is using a tun device. I can capture all the IPv4 traffic by routing all traffic to the tun device via something like ip route add default via $TUN_IP dev $TUNID. This also stops outbound traffic though, which is not what I want.
I just want to see the IPv4 packets, not intercept them. (Or, even better, optionally intercept them.)
Edit: I'm specifically looking for a programmatic interface to do this. E.g. something I can use from within a C program.
Yes, you can see all the packets that arrive at your network interface. There are several options for accessing or viewing them. Here is a small list of possible solutions, ordered from the easiest to the hardest to utilize:
Wireshark
I'd say this is pretty much the standard when it comes to protocol analyzers with a GUI. It has tons of options, a nice GUI, great filtering capabilities, and it reassembles IP datagrams. It uses libpcap and can also show the raw Ethernet frame data; for example, it allows you to see layer 2 packets like ARP. Furthermore, you can capture the complete data arriving at your network interface to a file that can later be analyzed (also in Wireshark).
tcpdump
Very powerful, with features similar to Wireshark's, but a command line utility; it also uses libpcap. It can capture/dump the complete interface traffic to a file. You can view the dumped data in Wireshark, since the format is compatible.
ngrep
This is known as the "network grep" and is similar to tcpdump, but it supports regular expressions (regex) for filtering the payload data. It can save captured data in the file format supported by Wireshark and tcpdump (it also uses libpcap).
libnids
Quotation from the official git repository:
"Libnids is a library that provides a functionality of one of NIDS
(Network Intrusion Detection System) components, namely E-component. It means
that libnids code watches all local network traffic [...] and provides convenient information on them to
analyzing modules of NIDS. Libnids performs:
assembly of TCP segments into TCP streams
IP defragmentation
TCP port scan detection"
libpcap
Of course you can also write your own programs using the library directly. Needless to say, this requires more effort; see the minimal sketch after this list.
Raw or Packet Sockets
In case you want to do all the dirty work yourself, this is the low-level option, which of course also allows you to do everything you want. The tools listed above use these as a common basis. Raw sockets operate on OSI layer 3 and packet sockets on layer 2.
Note: This is not meant to be a complete list of available tools or options. I'm sure there are much more but these are the most common ones I can think of.
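To illustrate the libpcap option, here is a minimal capture sketch in C. The device name eth0 is an assumption (use pcap_findalldevs() to enumerate interfaces), and error handling is kept to the bare minimum:

#include <pcap/pcap.h>
#include <stdio.h>

/* Callback invoked by libpcap for every captured packet. */
static void handle_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes)
{
    (void)user; (void)bytes;
    printf("captured packet: %u bytes\n", h->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_open_live("eth0", 65535, 1 /* promiscuous */, 1000, errbuf);
    if (!p) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    /* Capture only IPv4 traffic. */
    struct bpf_program fp;
    if (pcap_compile(p, &fp, "ip", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
        pcap_setfilter(p, &fp) == -1) {
        fprintf(stderr, "filter error: %s\n", pcap_geterr(p));
        return 1;
    }

    pcap_loop(p, -1, handle_packet, NULL);   /* run until interrupted */
    pcap_close(p);
    return 0;
}

Compile with gcc sniff.c -lpcap and run it as root. Note that libpcap hands you raw frames: the IP defragmentation asked about in the question is still up to you (or to a library like libnids above).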
Technically, libpcap only hands you a copy of each received packet; capturing does not stop the original from reaching its destination. If you want firewall-like behavior, you need a layer that receives each packet, checks it against predefined rules, and forwards it only if it passes, so that packets violating the rules never reach their destination; plain libpcap capture does not give you that.
But that needs a lot of effort, and I don't think you want to waste your life on it.
Wireshark, as mentioned by @Barmar, can do the job already.
If you need a command line option, I would say tcpdump is one of the best monitoring tools. For example, to capture all IPv4 HTTP packets to and from port 80 that actually carry payload, the command would be:
tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
(The arithmetic subtracts the IP and TCP header lengths from the total IP length, so only packets with a non-empty TCP payload match.) For more information and options, see the tcpdump man page.
Please be specific if you need to write a program for it; then we can help with how to do it.

Sending UDP and TCP packets on the same network line - how to prevent UDP drops? [duplicate]

Consider the prototypical multiplayer game server.
Clients connecting to the server are allowed to download maps and scripts. It is straightforward to create a TCP connection to accomplish this.
However, the server must continue to be responsive to the rest of the clients via UDP. If TCP download connections are allowed to saturate available bandwidth, UDP traffic will suffer severely from packet loss.
What might be the best way to deal with this issue? It definitely seems like a good idea to "throttle" the TCP upload connection somehow by keeping track of time and calling send() on a regular interval. This way, if UDP packet loss starts to occur more frequently, the TCP connections can be throttled further. Will the OS still tend to bunch the data together rather than sending it off in a steady stream? How often should I call send()? I imagine doing it too often would cause the data to be buffered together first, rendering the method ineffective, and doing it too infrequently would provide insufficient (and inefficient use of) bandwidth. Similar considerations exist with regard to how much data to send each time.
It sounds a lot like you're solving a problem the wrong way:
If you're worried about losing UDP packets, you should consider not using UDP.
If you're worried about sharing bandwidth between two functions, you should consider having separate pipes (bandwidth) for them.
Traffic shaping (which is what this sounds like) is typically addressed in the OS. You should look in that direction before making strange changes to your application.
If you haven't already gotten the application working and experienced this problem, you are probably prematurely optimizing.
To avoid saturating the bandwidth, you need to apply some sort of rate limiting. TCP actually already does this, but it might not be effective in some cases. For example, it has no idea whether you consider the TCP or the UDP traffic to be more important.
To implement any form of rate limiting involving UDP, you will first need to calculate UDP loss rate. UDP packets will need to have sequence numbers, and then the client has to count how many unique packets it actually got, and send this information back to the server. This gives you the packet loss rate. The server should monitor this, and if packet loss jumps after a file transfer is started, start lowering the transfer rate until the packet loss becomes acceptable. (You will probably need to do this for UDP anyway, since UDP has no congestion control.)
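Here is a minimal sketch of that receiver-side bookkeeping, assuming each datagram carries a 32-bit sequence number starting at 0 (all names are illustrative; duplicate detection and sequence wraparound are omitted):

#include <stdint.h>

struct loss_tracker {
    uint32_t highest_seq;   /* highest sequence number seen so far */
    uint32_t received;      /* datagrams actually received */
};

/* Call this for every datagram that arrives. */
void on_datagram(struct loss_tracker *t, uint32_t seq)
{
    if (seq > t->highest_seq)
        t->highest_seq = seq;
    t->received++;
}

/* Fraction of expected datagrams that never arrived. */
double loss_rate(const struct loss_tracker *t)
{
    uint32_t expected = t->highest_seq + 1;
    if (expected == 0 || t->received >= expected)
        return 0.0;
    return 1.0 - (double)t->received / (double)expected;
}

The receiver periodically reports loss_rate() back to the sender, and the sender lowers its transfer rate whenever the reported loss jumps.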
Note that while I mention "server" above, it could really be done in either direction, or both, depending on who needs to send what. Imagine a game with player-created maps that transfers those maps over peer-to-peer connections.
While lowering the transfer rate can be as simple as calling your send function less frequently, attempting to control TCP this way will no doubt conflict with the rate control TCP already has. As suggested in another answer, you might consider looking into more comprehensive ways of controlling TCP.
In this particular case, I doubt it would be an issue, unless you really need to send lots of UDP information while the clients are transferring files.
I would expect most games to just show a loading screen or a lobby while this is happening. Neither should require much UDP traffic unless your game has its own VoIP.
Here is an excellent article series that explains some of the possible uses of both TCP and UDP, specifically in the context of network games. TCP vs. UDP
In a later article from the series, he even explains a way to make UDP 'almost' as reliable as TCP (with code examples).
And as always: measure your results. You have no way of knowing whether your code is making the connections faster or slower unless you measure.
"# If you're worried about losing UDP packets, you should consider not using UDP."
Right on. UDP means no guarentee of packet delivery, especially over the internet. Check the TCP speed which is quite acceptable in modern day internet connections for most users playing games.

Do I need to check data integrity after sending file over ftp?

I need to transfer some files from a remote computer (on the local network) and I plan to do it via FTP.
Apparently FTP is based on TCP, and if I remember my lessons well, the difference between TCP and UDP is that TCP checks that network packets are correctly sent and received.
After asking myself whether I need to add checksum verification, my conclusion was that I don't. Am I correct?
I'm aware of the differences between binary transfer and text transfer and plan to do only binary transfers (working only on Windows).
Do I really need to checksum big files transferred by binary FTP?
To be clear, I need data integrity, to verify that no bits were altered during the exchange. Man in the middle is not (much of) an issue because the operation will be done on a private network.
Yes, you do.
A man in the middle can alter TCP packets on their way from the FTP server to your site, or can even act as a malicious FTP site and suppress the original traffic completely. And even without an attacker, TCP's checksum is only 16 bits, so on very large transfers a corrupted segment can occasionally slip through undetected.
Therefore you need to verify somehow that the file you received is really the file you wanted to receive. Checksums are suitable for this task.
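As a minimal sketch of such a verification, here is a function that computes a file's SHA-256 with OpenSSL's EVP API (any strong hash would do); compute it on both sides after the transfer and compare the digests:

#include <openssl/evp.h>
#include <stdio.h>

/* Returns 0 on success and fills out/outlen with the file's SHA-256. */
int sha256_file(const char *path, unsigned char out[EVP_MAX_MD_SIZE],
                unsigned int *outlen)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

    unsigned char buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);     /* hash the file in chunks */

    EVP_DigestFinal_ex(ctx, out, outlen);
    EVP_MD_CTX_free(ctx);
    fclose(f);
    return 0;
}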

How do I block packets coming on port 23 on my computer?

I am using the libpcap library. I have written a packet sniffer C program using pcap.h. Now I want to block packets coming in on port 23 on my computer via the eth0 device. I tried the pcap filter functions, but they are not useful for blocking.
Please explain how to code this functionality in a C program.
Libpcap is just used for packet capturing, i.e. making packets available for use in other programs. It does not perform any network setup, like blocking or opening ports. In this sense pcap is a purely passive monitoring tool.
I am not sure what you want to do. As far as I see it, there are two possibilities:
You actually want to block the packets, so that your computer will not process them in any way. You should use a firewall for that and just block this port. Any decent firewall should be able to do that fairly easily. But be aware that this also means no one will be able to telnet into your system (port 23 is the telnet port). So if telnet is how you reach a remote system, you will have effectively locked yourself out.
You still want other programs (e.g. telnetd) to listen on port 23, but all this traffic is annoying you in your application. Libpcap has a filtering facility for that, which is quite powerful. You can pass small filter expressions (such as "not tcp port 23") to libpcap and have it report only the packets that match; look up filtering in the pcap documentation for more information. This will, however, not block the traffic; it just stops pcap from capturing it.
Actually, using pcap you are not able to build a firewall. This is because the packets seen inside your sniffer (built using pcap) are just copies of the packets that, with or without the sniffer, are consumed by the network stack.
In other words: using filters in pcap only means you will not see copies of the original packets (as far as I know, pcap compiles filters and adds them at the kernel level so that the copy is never made); the original packet will go to the network stack anyway.
Your problem can most probably be solved with netfilter. You can register a hook in NF_IP_PRE_ROUTING and there decide to drop or accept given traffic; see the sketch below.
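Here is a minimal sketch of such a hook as a kernel module, dropping all inbound TCP traffic to port 23. It is untested and assumes a recent kernel (the 4.13+ registration API, where the hook point is named NF_INET_PRE_ROUTING):

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <net/net_namespace.h>

/* Drop every inbound TCP segment addressed to port 23 (telnet). */
static unsigned int drop_port23(void *priv, struct sk_buff *skb,
                                const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);

    if (iph && iph->protocol == IPPROTO_TCP) {
        /* The transport header may not be set yet in PRE_ROUTING,
         * so locate the TCP header via the IP header length. */
        struct tcphdr *tcph =
            (struct tcphdr *)((unsigned char *)iph + iph->ihl * 4);
        if (ntohs(tcph->dest) == 23)
            return NF_DROP;
    }
    return NF_ACCEPT;
}

static struct nf_hook_ops ops = {
    .hook     = drop_port23,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init blocker_init(void)
{
    return nf_register_net_hook(&init_net, &ops);
}

static void __exit blocker_exit(void)
{
    nf_unregister_net_hook(&init_net, &ops);
}

module_init(blocker_init);
module_exit(blocker_exit);
MODULE_LICENSE("GPL");

In practice an iptables rule (iptables -A INPUT -p tcp --dport 23 -j DROP) achieves the same thing without writing kernel code.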

Maximizing performance on udp

I'm working on a project with two clients, one for sending and the other for receiving UDP datagrams, between two machines wired directly to each other.
Each datagram is 1024 bytes in size, and it is sent using Winsock (blocking).
They are both running on very fast (separate) machines, with 16 GB RAM, 8 CPUs and RAID 0 drives.
I'm looking for tips to maximize my throughput. Tips should be at the Winsock level, but if you have some other tips, that would be great also.
Currently I'm getting 250-400 Mbit/s transfer speed; I'm looking for more.
Thanks.
Since I don't know what else your applications do besides sending and receiving, it's difficult to know what might be limiting them, but here are a few things to try. I'm assuming that you're using IPv4, and I'm not a Windows programmer.
Maximize the packet size that you are sending. On 100 Mbit/s Ethernet the maximum frame is 1518 bytes: Ethernet uses 18 of that, IPv4 uses 20-60 (usually 20, though), and UDP uses 8 bytes. That means that typically you should be able to send 1472 bytes of UDP payload per packet.
If you are using gigabit Ethernet equipment that supports it, the frame size increases to 9000 bytes (jumbo frames), so sending something closer to that size should speed things up.
If you are sending any acknowledgments from your listener to your sender then try to make sure that they are sent rarely and can acknowledge more than just one packet at a time. Try to keep the listener from having to say much, and try to keep the sender from having to wait on the listener for permission to keep sending.
On the computer that the sender application lives on consider setting up a static ARP entry for the computer that the receiver lives on. Without this every few seconds there may be a pause while a new ARP request is made to make sure that the ARP cache is up to date. Some ARP implementations may do this request well before the ARP entry expires, which would decrease the impact, but some do not.
Turn off as many other users of the network as possible. If you are using an Ethernet switch, concentrate on anything that introduces traffic to or from the computers and network devices your applications run on or use (this includes broadcast messages, like many ARP requests). If it's a hub, you may want to quiet down the entire network. Windows tends to send out a constant stream of junk to networks, which in many cases isn't useful.
There may be limits set on how much of the network bandwidth one application or user can have, or on how much network bandwidth the OS will let itself use. These can probably be changed in the registry if they exist.
It is not uncommon for network interface chips to not actually support the maximum bandwidth of the network all the time. There are chips which may miss packets because they are busy handling a previous packet as well as some which just can't send packets as close together as Ethernet specifications would allow. Additionally the rest of the system might not be able to keep up even if it is.
Some things to look at:
Connected UDP sockets (some info) shortcut several operations in the kernel, so they are faster (see Stevens' UNP book for details).
Socket send and receive buffers - play with the SO_SNDBUF and SO_RCVBUF socket options to balance out spikes and packet drops (see the sketch after this list).
See if you can bump up link MTU and use jumbo frames.
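Here is a minimal Winsock sketch combining the first two points: a connected UDP sender with enlarged buffers. The 4 MB buffer size, the port and the address are illustrative; tune them for your link:

#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    /* Larger buffers absorb bursts and reduce drops in the stack. */
    int bufsize = 4 * 1024 * 1024;
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (const char *)&bufsize, sizeof bufsize);
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, (const char *)&bufsize, sizeof bufsize);

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(7000);                      /* illustrative port */
    peer.sin_addr.s_addr = inet_addr("192.168.1.2");  /* illustrative address */

    /* connect() on a UDP socket fixes the peer, so plain send() can be
     * used and the kernel skips a per-datagram route/address lookup. */
    connect(s, (struct sockaddr *)&peer, sizeof peer);

    char payload[1024] = {0};
    send(s, payload, sizeof payload, 0);

    closesocket(s);
    WSACleanup();
    return 0;
}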
Use a 1 Gbps network and upgrade your network hardware...
Test the packet limit of your hardware with an already proven piece of code such as iperf:
http://www.noc.ucf.edu/Tools/Iperf/
I'm linking a Windows build; it might be a good idea to boot off a Linux LiveCD and try a Linux build for comparison of IP stacks.
More likely your NIC isn't performing well; try an Intel Gigabit Server Adapter:
http://www.intel.com/network/connectivity/products/server_adapters.htm
For TCP connections it has been shown that using multiple parallel connections better utilizes the available bandwidth. I'm not sure if that applies to UDP, but it might help with some of the latency issues of packet processing.
So you might want to try multiple threads of blocking calls.
As well as Nikolai's suggestion of tuning the send and receive buffers, switch to overlapped I/O if you can and keep many recvs pending; this also helps minimise the number of datagrams dropped by the stack due to lack of buffer space.
If you're looking for reliable data transfer, consider UDT.
