Do I need to check data integrity after sending a file over FTP?

I need to transfer some files from a remote computer (on the local network) and I plan to do it via FTP.
FTP runs on top of TCP, and if I remember my lessons correctly, the difference between TCP and UDP is that TCP checks that network packets are correctly sent and received.
After asking myself whether I need to add checksum verification, my conclusion was that I don't. Am I correct?
I'm aware of the differences between binary and text transfers and plan to do only binary transfers (working only on Windows).
Do I really need to checksum big files transferred by binary FTP?
To be clear, I need data integrity to verify that no bits were altered during the exchange. Man-in-the-middle is not (much of) an issue because the operation will be done on a private network.

Yes, you do.
A man in the middle can alter any TCP packets on the way from the FTP server to your site, or he can even act as a malicious FTP site and suppress the original traffic completely.
Therefore you need to verify somehow that the file you received is really the file you wanted to receive. Checksums are suitable for this task.
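If it helps, here is a minimal sketch in C of the kind of check you would run on both machines: compute a CRC-32 of the file on the sender and on the receiver and compare the two printed values. Note that CRC-32 only catches accidental corruption; if deliberate tampering is a concern, swap in a cryptographic hash such as SHA-256.

    /* Minimal sketch: CRC-32 (zlib/IEEE polynomial) of a file, read in binary
     * mode. Run it on both machines and compare the printed values. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
    {
        crc = ~crc;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");   /* "rb": binary mode matters on Windows */
        if (!f) { perror("fopen"); return 1; }

        unsigned char buf[64 * 1024];
        uint32_t crc = 0;                 /* seed; crc32_update handles the inversions */
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            crc = crc32_update(crc, buf, n);

        fclose(f);
        printf("%08" PRIX32 "  %s\n", crc, argv[1]);
        return 0;
    }

If the two hex values match, the bytes made it across intact.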

Related

Deny a client's TCP connect request before accept()

I'm trying to code a TCP server in C. I just noticed that accept() only returns once the connection is already established.
Some clients flood me with random data; some clients just send random data once. After that I want to close their current connection and block future connections for a few minutes (or more, depending on how much load the program is under).
I can save bad clients' IP addresses in an array, and I can save timestamps too, but I can't find any function to abort a current connection or to deny future connections from bad clients.
I found a Windows-only function called WSAAccept that lets you deny connections based on your own criteria, but I don't use Windows.
I tried coding a raw TCP server, which gives you access to the packet from the very beginning, including the whole TCP header, and it doesn't accept connections automatically. I tried handling connections on the program side, including SYN, ACK and the other TCP signals. It worked, but then I noticed that the raw TCP server receives every packet on my network interface, so when other programs generate heavy traffic my program becomes laggy too.
I tried using libnetfilter, which lets you filter all the traffic on your network interface. It works too, but like the raw TCP server it receives the whole interface's packets, which makes it slow when there is a lot of traffic. I also compared libnetfilter with iptables; libnetfilter is slower than iptables.
So, in summary, how can I abort a client's current and future connections without hurting other clients' connections?
I'm running Linux (Debian 10).
Once you do blacklisting at the packet level you can very quickly become vulnerable to trivial attacks based on IP spoofing. Against a very basic implementation, an attacker could use your packet-level blacklisting to blacklist anyone he wants just by sending you many packets with a forged source IP address. Usually you don't want to touch this kind of filtering (unless you really know what you are doing); just trust your firewall etc.
So I really recommend just closing the file descriptor immediately after getting it from accept().
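A rough sketch of what that looks like, assuming an IPv4 listener; is_blacklisted() stands for the lookup you would implement over your saved IP/timestamp array:

    /* Rough sketch: accept the connection, look the peer address up in your
     * own blacklist, and close the descriptor immediately if it is listed.
     * is_blacklisted() is a hypothetical helper over your saved IP/timestamp
     * array; the listener is assumed to be IPv4. */
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int is_blacklisted(const struct in_addr *addr);   /* your lookup */

    void serve_loop(int listen_fd)
    {
        for (;;) {
            struct sockaddr_in peer;
            socklen_t len = sizeof peer;
            int fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
            if (fd < 0)
                continue;                 /* real code should inspect errno (EINTR, ...) */

            if (is_blacklisted(&peer.sin_addr)) {
                close(fd);                /* drop the bad client right away */
                continue;
            }

            /* ... hand fd over to the normal per-client handling ... */
        }
    }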

How to distinguish between different types of packets in the same HTTPS traffic?

There's something that bothers me: I'd like to distinguish between a packet coming from Youtube and a packet coming from Wikipedia: they both travel over HTTPS and they both come from port 443.
Since they travel over HTTPS, their payload is not readable and I can't do full Deep Packet Inspection: I can only look at the Ethernet, IP and TCP headers. I could look at the source IP address of each packet and see where it actually comes from, but to know whether it is from Youtube or Wikipedia I would already have to know the IP addresses of those two sites.
What I'm trying to figure out is a way to tell a video stream over HTTPS (like Youtube's) apart from a simple HTML transfer (Wikipedia's) without inspecting the payload.
Edit 1: in a Wireshark session started while playing a video I got tons of packets. Maybe I should start looking at the timing between packets coming from the same address.
If you are just interested in following the data stream in Wireshark you can use the TCP stream index; the filter would be something like tcp.stream == 12
The stream index starts at zero with the first stream that Wireshark encounters and increments for each new stream (persistent connection).
So two different streams between the same IPs would have two different numbers. For example a video stream might be 12 and an audio stream, between the same IP addresses, might be 13.
If you started the capture before the stream was initiated you'll be able to see the original traffic setting up the SSL connection (much of which is in clear text).
You may consider looking at the server certificate: it will tell you whether it's Youtube (Google) or Facebook, giving you an idea of which SSL connection goes to which site.
You can try looking at the TCP header options, but generally the traffic is encrypted for a reason... so that it cannot be seen by a man in the middle. If that were possible, it would by definition be a poor encryption standard. Since you have the capture and all the information known to the user agent, you are not "in the middle", but you will need to use the user agent's information to do the decryption before you can really see inside the stream.
This link: Reverse ip, find domain names on ip address
indicates several methods.
I suggest running an nslookup-style reverse lookup on the IP from within a C program.
And remember that address/IP values can be nested within the data of the packet, so it may (and probably will) take some investigation of the packet data to get to the originator of the packet.
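As a hedged sketch of doing that from C, getnameinfo() performs the reverse (PTR) lookup that nslookup would do; the address below is just a placeholder, and for big CDNs the PTR name may not clearly identify the service:

    /* Sketch: reverse-resolve an IPv4 address from C with getnameinfo().
     * The address is an example placeholder. */
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        inet_pton(AF_INET, "93.184.216.34", &sa.sin_addr);   /* example address */

        char host[1025];
        int rc = getnameinfo((struct sockaddr *)&sa, sizeof sa,
                             host, sizeof host, NULL, 0, NI_NAMEREQD);
        if (rc != 0) {
            fprintf(stderr, "getnameinfo: %s\n", gai_strerror(rc));
            return 1;
        }
        printf("PTR name: %s\n", host);
        return 0;
    }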
Well, you have encountered a dilemma: how to get at the information users are exchanging with their servers when they have explicitly encrypted it for privacy. The quick answer is that you can't; only if you can break into the SSL connection will you get more information.
Even the SSL certificate exchanged between server and client will not be of much help, as it only identifies the server (and not the virtual host you are talking to behind that connection), and with the feature known as HTTP virtual hosts more than one server can be listening for connections on the same port of the same address.
SSL parameters are negotiated just after the connection is made, and the virtual server is normally selected with the Host header field of the HTTP request (see RFC 2616), but that occurs after the SSL negotiation has finished, so you don't have access to it.
The only thing you can do for sure is try to identify Youtube connections by the volumes and connection patterns this kind of traffic exhibits.

HTTPS protocol file integrity

I understand that when you send a file from a client to a server using HTTP/HTTPS, you have the guarantee that all the data sent arrived successfully at the destination. However, if you are sending a huge file and the internet connection suddenly goes down, not all packets are sent and therefore you lose the logical integrity of the file.
Is there any point I am missing in my statement?
I would like to know if there is a way for the destination node to check the file's logical integrity without using "custom code/API".
HTTPS is just HTTP over a TLS layer, so all of this applies to HTTPS, too:
HTTP is typically transported over TCP/IP. Now, TCP has retransmission (i.e. lost packets will be resent) and checksums (i.e. the probability that data gets altered without the receiver noticing and re-requesting a packet is minor). So if you're really just transferring data, you're basically set (as long as your HTTP server is configured to send the length of your file in bytes, which, at least for static files, it usually is).
If your transfer is stopped before the whole file size advertised in the server's reply to the HTTP GET has been reached, your client will know! Many HTTP libraries/clients can restart HTTP transmissions (if the server supports it).
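As one illustration (not the only way to do it), here is a sketch using libcurl, assuming it is available: it checks how many bytes are already on disk and asks the server to resume from that offset, which libcurl turns into a Range request. The URL and file names are placeholders.

    /* Hedged sketch, assuming libcurl is available: resume an interrupted
     * download from the byte offset already on disk. URL/file names are
     * placeholders. */
    #include <curl/curl.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        const char *url  = "https://example.com/big.bin";   /* placeholder */
        const char *path = "big.bin.part";                  /* placeholder */

        struct stat st;
        curl_off_t have = (stat(path, &st) == 0) ? (curl_off_t)st.st_size : 0;

        FILE *out = fopen(path, "ab");            /* append to what we already have */
        if (!out) { perror("fopen"); return 1; }

        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) { fclose(out); return 1; }

        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);          /* default callback writes to FILE* */
        curl_easy_setopt(curl, CURLOPT_RESUME_FROM_LARGE, have); /* sends "Range: bytes=<have>-" */
        curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);

        CURLcode rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        fclose(out);
        return rc == CURLE_OK ? 0 : 1;
    }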
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15
even specifies an MD5 checksum header field (Content-MD5). You can configure web servers to use that field, and clients might use it to verify the overall file integrity.
EDIT: Content-MD5 as specified by RFC 2616 seems to be deprecated. You can now use a content digest, which is much more flexible.
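For the older Content-MD5 form, verification boils down to hashing the body, base64-encoding the digest and comparing strings. A sketch assuming OpenSSL is available; body, body_len and header_value stand for whatever your client actually received:

    /* Hedged sketch, assuming OpenSSL: hash the received body with MD5 and
     * base64-encode it the way a Content-MD5 header carries it, then compare
     * the strings. */
    #include <openssl/evp.h>
    #include <string.h>

    int body_matches_content_md5(const unsigned char *body, size_t body_len,
                                 const char *header_value)
    {
        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int md_len = 0;
        if (!EVP_Digest(body, body_len, md, &md_len, EVP_md5(), NULL))
            return 0;

        unsigned char b64[((EVP_MAX_MD_SIZE + 2) / 3) * 4 + 1];
        EVP_EncodeBlock(b64, md, (int)md_len);    /* NUL-terminates its output */

        return strcmp((const char *)b64, header_value) == 0;
    }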
Also, you mention that you want to check the file that a client sends to a server. That problem might be quite a bit harder -- whilst you're usually in total control of your web server, you can't force an arbitrary client (e.g. a browser) to hash its file before uploading.
If, on the other hand, you're actually in control of the client's HTTP implementation, you could most probably also use something more file-transfer oriented than plain HTTP -- think WebDAV, AtomPub etc., which are protocols on top of HTTP, or even more file-exchange oriented protocols like rsync (which I'd heartily recommend if you're actually syncing stuff -- it reduces network usage to a minimum if both sides' versions only differ partially). If for some reason you're in the position that your users share most of their data within a well-defined circle (for example, you're building something where photographers share their albums), you might even just use BitTorrent, which has per-chunk hashing, extensive load-balancing options, and allows for "plain old HTTP seeds".
There are several issues here:
As Marcus stated in his answer, TCP protects your bytes from being accidentally corrupted, but it doesn't help if the download was interrupted.
HTTPS additionally ensures that those bytes weren't tampered with between the server and the client (you).
If you want to verify the integrity of a file (whose transfer was or was not interrupted) you should use a checksum designed to protect against accidental file corruption (e.g. CRC32; there may be better ones, you should check).
If in addition you use HTTPS, then you're safe from intentional attacks too, because you know your checksum is OK and the file parts you got weren't tampered with.
If you use a checksum but don't use HTTPS (though you really should), you are safe against accidental data corruption but not against malicious attacks. That can be mitigated, but it's outside the scope of this question.
In HTTP/1.1, the recipient can always detect whether it received a complete message (either by comparing the Content-Length, or by properly handling transfer-encoding: chunked).
(Adding content hashes can help if you suspect bit errors on the transport layer.)

Implementing FTP Server/Client in C

I have been given an assignment that requires implementing the FTP protocol. I have gone through the documentation in RFC 959.
I am confused about a couple of implementation details:
1) If a file needs to be transferred, what function can be used? Can a simple send() call be used for a non-text file?
2) Is there a good tutorial that covers implementing the modes and file structures, and which of them are essential?
Hoping for a reply soon.
FTP transfers files over a plain TCP connection, and you can transfer any kind of file with it. There is no difference between text files and binary files; they are all just sequences of bytes.
For the file transmission it is sufficient to open the data connection and call the write function repeatedly until the entire file is transmitted (check the return value of write to know how many bytes it actually sent).
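A minimal sketch of that send loop over the data connection, handling short writes (POSIX write(); on Windows you would call send() on the SOCKET instead):

    /* Minimal sketch: read the file in binary chunks and keep calling write()
     * until every byte of each chunk has gone out, handling short writes. */
    #include <stdio.h>
    #include <unistd.h>

    int send_file(int data_fd, const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;

        char buf[8192];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
            size_t off = 0;
            while (off < n) {                       /* write() may send fewer bytes */
                ssize_t w = write(data_fd, buf + off, n - off);
                if (w < 0) { fclose(f); return -1; }
                off += (size_t)w;
            }
        }

        int err = ferror(f);
        fclose(f);
        return err ? -1 : 0;
    }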
The rest of the FTP protocol is text based and goes over a separate control connection (a different port).
There is a good tutorial on using FTP directly through netcat that can be useful for understanding how things work. Understanding active and passive mode is also useful, since you are going to implement at least one of them.
Also, use Wireshark to follow a TCP stream and see the data you are sending/receiving; it can be very useful for debugging.
The protocol specification won't give you a file structure. The protocol is there to define rules and states.
The development/programming part is up to you. You just need to respect the FTP protocol in order to stay standards-compliant and compatible with other clients and servers.
Best regards

Faster file transfer than FTP

FTP is a pure TCP-connect protocol, and thus AFAIK "as fast as it gets" when considering TCP file transfer options.
However, there are some other products that do not run over TCP - examples are the commercial products BI.DAN-GUN, fasp and FileCatalyst. The latter product points out problems with pure TCP, and one can read more on Wikipedia, e.g. starting from Network Congestion.
What other alternatives are there? ... in particular Open Source ones? Also, one would think that this should be an RFC of sorts - a standard protocol specific to transferring large files, probably running over UDP. Does anyone know of such a protocol, or an initiative? (Google's SPDY is interesting, but doesn't directly address fast large-file transfer.)
Why do you think using TCP makes the transfer slower? TCP is usually able to use all available bandwidth. Using UDP instead is unlikely to be faster. In fact, if you tried to make a reliable UDP-based file transfer, you'd likely end up implementing an inferior alternative to TCP - since you'd have to implement the reliability yourself.
What is problematic about FTP is that it performs multiple synchronous request-response commands for every file you transfer, and opens a new data connection for every file. This results in extremely inefficient transfers when a lot of smaller files are being transferred, because much of the time is spent waiting for requests/responses and establishing data connections instead of actually transferring data.
A simple way to get around this issue is to pack the files/folders into an archive. While you can, of course, just make the archive, send it using FTP or similar, and unpack it on the other side, the time spent packing and unpacking may be unacceptable. You can avoid this delay by doing the packing and unpacking on-line. I'm not aware of any software that integrates such on-line packing/unpacking. You can, however, simply use the nc and tar programs in a pipeline (on Linux; on Windows use Cygwin):
First run on the receiver:
nc -l -p 7000 | tar xf - -C <destination_folder>
This will make the receiver wait for a connection on port number 7000. Then run on the sender:
cd /some/folder
tar cf - ./* | nc -q0 <ip_address_of_receiver> 7000
This will make the sender connect to the receiver and start the transfer. The sender will create the tar archive and send it to the receiver, which will extract it - all at the same time. If you need to, you can reverse the roles of sender and receiver (by having the receiver connect to the sender).
This online-tar approach has neither of the two performance issues of FTP: it doesn't perform any request-response commands, and it uses only a single TCP connection.
However, note that this is not secure: anybody could connect to the receiver before our sender does and send their own tar archive. If this is an issue, a VPN can be used, in combination with appropriate firewall rules.
EDIT: you mentioned packet loss as a problem with TCP performance, which is a significant problem if the FileCatalyst page is to be believed. It is true that TCP may perform sub-optimally on links with high packet loss. This is because TCP usually reacts aggressively to packet loss, since it assumes the loss is due to congestion; see Additive_increase/multiplicative_decrease. I'm not aware of any free/open source file transfer programs that attempt to overcome this with custom protocols. You may however try out different TCP congestion avoidance algorithms. In particular, try Vegas, which does not use packet loss as a signal to reduce the transmission rate.
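If you want to experiment with that on Linux, here is a sketch of selecting the congestion control algorithm for a single socket via the TCP_CONGESTION option; the named algorithm must be available on the machine (see /proc/sys/net/ipv4/tcp_available_congestion_control):

    /* Hedged, Linux-specific sketch: ask the kernel to use a particular TCP
     * congestion control algorithm (e.g. "vegas") on one socket. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        const char *algo = "vegas";
        if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
            perror("setsockopt(TCP_CONGESTION)");   /* e.g. module not loaded */

        char in_use[16] = {0};
        socklen_t len = sizeof in_use;
        if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, in_use, &len) == 0)
            printf("congestion control in use: %s\n", in_use);

        /* ... connect() and transfer as usual ... */
        return 0;
    }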
There are a number of open source projects trying to tackle file transfer via UDP.
Take a look at UFTP, Tsunami or UDT; each project is at a different stage of development.
Depending on the bandwidth, the latency and the packet loss you are working with, each project will produce a different result. There is a blog article that tries to compare the three projects; here is the link: http://www.filecatalyst.com/open-source-fast-file-transfers
Not open source, but Aspera's transfer speeds are worth checking out and are not affected by packet loss or network delay. You can see a chart here.
It also uses a proprietary protocol called fasp.
Consider using UDP-based file transfer; have a look at JSCAPE MFT Server, which implements a proprietary protocol known as AFTP (Accelerated File Transfer Protocol). Please review this link:
http://www.jscape.com/blog/bid/80668/UDP-File-Transfer-up-to-100x-Faster-than-TCP
Please keep in mind that JSCAPE's AFTP works optimally over long distance connections which have network latency. If there is no network latency AFTP will not perform any better than plain old regular FTP (over TCP).
Yes I do work for JSCAPE LLC.

Resources