I have been given an assignment that requires implementing the FTP protocol. I have gone through the documentation in RFC 959.
I am confused about a couple of implementation details:
1) If a file needs to be transferred, what function can be used? Can a simple send() call be used for a non-text file?
2) Is there a good tutorial that covers implementing the transfer modes and file structures, and which of them are essential?
Hope to get a reply soon.
FTP transfers files through a plain TCP connection, and you can transfer any kind of file with it. There is no difference between text files and binary files; they are all just sequences of bytes.
For the file transmission it is sufficient to open a connection and call write() repeatedly until the entire file has been transmitted (check the return value of write() to know how many bytes it actually sent).
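A minimal sketch of that loop (POSIX assumed; send_file is a name made up for illustration), pushing a file over an already-connected data socket:

    /* Send a whole file over a connected TCP socket. write() may accept
     * fewer bytes than requested, so loop until everything has gone out. */
    #include <stdio.h>
    #include <unistd.h>

    int send_file(int sock, FILE *fp)       /* 0 on success, -1 on error */
    {
        char buf[8192];
        size_t n;

        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            size_t off = 0;
            while (off < n) {
                ssize_t w = write(sock, buf + off, n - off);
                if (w < 0)
                    return -1;          /* check errno; retry on EINTR */
                off += (size_t)w;
            }
        }
        return ferror(fp) ? -1 : 0;
    }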
The rest of the FTP protocol is text-based and is sent over a separate connection on a different port (the control connection).
There is a good tutorial on using FTP directly through netcat that can be useful for understanding how things work. Understanding active and passive mode is also useful, since you are going to implement at least one of them.
Also, use Wireshark to follow a TCP stream and see the data you are sending and receiving; it can be very useful for debugging.
The protocol implementation won't give you a file structure; the protocol is there to define rules and states.
The development and programming part is up to you. You just need to respect the FTP protocol in order to stay interoperable with other clients and servers.
Best regards
Related
I need to transfer some files from a remote computer (on the local network) and I plan to do it via FTP.
Apparently, FTP is based on TCP, and if I remember my lessons correctly, the difference between TCP and UDP is that TCP checks that network packets are correctly sent and received.
After asking myself whether I need to add checksum verification, my conclusion was that I don't. Am I correct?
I'm aware of the differences between binary transfer and text transfer and plan to do only binary transfers (working only on Windows).
Do I really need to checksum big files transferred by binary FTP?
To be clear, I need data integrity, to verify that some bits were not altered during the exchange. Man-in-the-middle is not (much of) an issue because the operation will be done on a private network.
Yes, you do.
A man in the middle can alter any TCP packet on the way from the FTP server to your site, or can even act as a malicious FTP site and suppress the original traffic completely.
Therefore you need to verify somehow that the file you received is really the file you wanted to receive. Checksums are suitable for this task.
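For example, here is a minimal sketch of the verification step, assuming OpenSSL is available (compile with -lcrypto); you would compare the resulting digest against one published by the sender out of band:

    /* Hash a received file with SHA-256 via OpenSSL's EVP API so the
     * digest can be compared against the sender's published checksum. */
    #include <stdio.h>
    #include <openssl/evp.h>

    /* Returns 0 on success, -1 on error; digest/digest_len receive the
     * raw SHA-256 output. */
    int sha256_file(const char *path,
                    unsigned char digest[EVP_MAX_MD_SIZE],
                    unsigned int *digest_len)
    {
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return -1;

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        if (ctx == NULL) {
            fclose(f);
            return -1;
        }
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

        unsigned char buf[8192];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);

        int read_error = ferror(f);
        EVP_DigestFinal_ex(ctx, digest, digest_len);
        EVP_MD_CTX_free(ctx);
        fclose(f);
        return read_error ? -1 : 0;
    }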
I am trying to write a client-server program in C on Windows. The objective is to receive the directory listing from the server. I am trying to develop the client and server in a way that makes efficient use of network resources.
One way to implement it is for the server to make a single send() call per file's info. So if there are 100 files, it makes 100 calls, which I feel is a waste of network resources. As far as I know, the default buffer size for send() or recv() on Windows is 8 KB, but the info for a single file will hardly be 1 KB. So is there a way to make a send() call that sends the info of multiple files (the file info records are stored in structures, so they basically form a linked list)? Maybe I could send info for at least 8 files in a single send() call; that should reduce the total number of send() calls to at most 13.
So basically, is there a way to send a linked list via send()? Please let me know if you can think of any alternative method.
Good question! +1 for that.
But do you really want or need to write your code against Winsock? There are good reasons to do so, including that it's fun and a challenge. But if you don't need to, you might want to consider using the libcurl FTP library, which is free, multi-platform (including Win32, of course), just works, and might make your job a lot easier.
The only way I know of to do this with FTP is to use multiple connections to the FTP server. If the server allows this, there can be a big boost in listing performance, because the many protocol exchanges needed to list a complete folder tree can run in parallel.
Rgds,
Martin
TCP is a byte stream. There is no guarantee of a 1-to-1 relation between the number of items you want to send and the number of calls to send() (or recv()) you need to make. That is simply not how TCP works. You format the data the way you need to, and then you keep calling send() until all of the data has been sent (each call returns how many bytes it actually accepted).
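As a minimal sketch of that idea (Winsock assumed, WSAStartup already called; the file_info struct and the function names are made up for illustration), you can format each record as a text line, batch lines into one buffer, and flush it through a send-all loop:

    #include <winsock2.h>
    #include <stdio.h>
    #include <string.h>

    struct file_info {
        char name[256];
        unsigned long size;
        struct file_info *next;
    };

    /* send() may accept fewer bytes than requested; keep calling it. */
    static int send_all(SOCKET s, const char *buf, int len)
    {
        int off = 0;
        while (off < len) {
            int n = send(s, buf + off, len - off, 0);
            if (n == SOCKET_ERROR)
                return -1;
            off += n;
        }
        return 0;
    }

    /* Batch many records per send() call instead of one call per file. */
    int send_listing(SOCKET s, const struct file_info *head)
    {
        char buf[8192];
        int used = 0;
        const struct file_info *p;

        for (p = head; p != NULL; p = p->next) {
            char line[300];
            int n = snprintf(line, sizeof line, "%s %lu\r\n",
                             p->name, p->size);
            if (used + n > (int)sizeof buf) {   /* batch full: flush it */
                if (send_all(s, buf, used) == -1)
                    return -1;
                used = 0;
            }
            memcpy(buf + used, line, n);
            used += n;
        }
        return (used > 0) ? send_all(s, buf, used) : 0;
    }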
Regarding FTP, please read RFC 959 and RFC 3659 to learn how the FTP protocol actually works. Before the introduction of the MLST and MLSD commands, directory listings had no standardized format; FTP servers were free to use whatever formatting they wanted, and many just piped through the raw output of the OS's own directory-listing command. Indy, for example, includes several dozen parsers in its FTP client for handling non-standard directory listings.
I am designing a file server using socket programming in C. I send calls like open(), write(), etc. as plain strings using stream sockets and parse them at the server end, i.e. if it is an open call, the server extracts the path, mode, and flags. Is this OK, or should I be using some kind of struct to store the file system calls and send that to the server, where the server simply accesses the fields?
Is there some standard way I don't know about?
Thanks
You're basically starting to define your own protocol. It would be a lot easier if you sent numbers describing operations instead of strings.
If you're serious about this, you might want to look into RPC, e.g. RFC 707 (you did ask for a standard way, right?).
Yes, there is a standard way. Look into NFS, AFP, CIFS, and WebDAV, then pick one.
You already have answers for the standard way, so I'll give you a few caveats you should look out for.
If you intend to deploy your file server in an untrusted environment (e.g. on the Internet), think about securing it right away. Securing it is not just a question of slapping encryption on top: you need to know how you intend to authenticate your users, how you want to authorize different types of access to different parts of the server, how you will ensure the authenticity and integrity of the data, and how you intend to keep the data confidential.
You'll also need to think about your server's availability. That means it should be fault-tolerant: connections can (and will) break, whether or not they're broken on purpose, so you need to detect that, either with some kind of keep-alive (which will fail if the client has left) or with some kind of activity timeout (which will expire if the client has left). You also need to think about how many clients you are willing to support simultaneously, which can radically change the architecture of your server.
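For the keep-alive option, a minimal sketch (POSIX shown; Windows exposes the same socket option) is simply:

    /* Ask the TCP stack to probe idle connections so dead peers are
     * eventually detected. Probe timing uses OS defaults here. */
    #include <sys/socket.h>

    int enable_keepalive(int sock)          /* 0 on success, -1 on error */
    {
        int on = 1;
        return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
    }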
As for the open, close, read, write, etc. commands, most file transfer protocols don't go into so much detail, but it may be interesting to do so depending on your situation. If your files are huge and you only need some chunks of them, or if you want to be able to lock files to work on them exclusively, you may want to go into such detail. If you don't have those requirements, simpler, transactional commands such as get and put (rather than open, read, read, read, close and open, write, write some more, close) may be both easier to implement and easier to work with.
If you want a human being to interact with your server and give it commands, text is a good approach: it's easy to debug when sniffing, and humans understand text and can type it easily. If there are no humans involved, using integers as commands is probably a better approach: you can structure your commands to start with an integer followed by a number of parameters, and always simply expect the same thing on the server's end (and switch on the command you receive). Even in that case, though, it may be a good idea to have human-readable values in your integers. For example, putting 'READ' in an integer as the read command uses as many bytes as 0x00000001, but is far easier to recognize when sniffed with Wireshark.
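A minimal sketch of that trick, with hypothetical command names:

    /* 4-byte command tags that are fixed-width on the wire yet still
     * human-readable in a packet capture. */
    #include <stdint.h>

    #define MAKE_TAG(a, b, c, d) \
        ((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | \
         (uint32_t)(c) << 8  | (uint32_t)(d))

    enum command {
        CMD_OPEN = MAKE_TAG('O', 'P', 'E', 'N'),
        CMD_READ = MAKE_TAG('R', 'E', 'A', 'D'),
        CMD_WRIT = MAKE_TAG('W', 'R', 'I', 'T'),
        CMD_CLOS = MAKE_TAG('C', 'L', 'O', 'S')
    };

    /* On the wire: send the tag in network byte order (htonl) followed
     * by the command's parameters; the server switches on the tag. */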
Finally, you should really take a look at the standard approaches and try to understand the trade-offs made in each case. Ask yourself, for example, why HTTP has such verbose headers and why WebDAV uses it. Why does FTP use two connections (one for commands and one for data) while many other protocols use only one? How did NFS evolve to where it is now, and why? Understanding the answers to these questions will help you design your own protocol, if, after understanding them, you still feel you need one.
I'm back with a question on using C FILE streams in socket programming. I was reading about it and saw mixed reviews: some people say it's not reliable (i.e. a leaky abstraction?).
Has anyone got a view on using C FILE streams in socket programming?
Yes. Don't.
The TCP and UDP protocols have too many semantics to be easily mapped onto your usual file stream APIs. That's not to say it's impossible or even difficult, but there are likely to be lots and lots of gotchas and edge cases that will give you wildly unpredictable behaviour. I also cannot think, off the top of my head, of any application where you might want to treat a socket as an ordinary file.
At the end of the day, once you've dealt with binding, listening, and accepting, none of which you can do with C FILE streams, and wrapped the resulting file descriptor in a FILE stream, all you are going to do is call fread() and fwrite(), maybe fgetc(), so you may as well leave it as an ordinary file descriptor, use recv() and send(), and save yourself the hassle of wrapping. You may save yourself the hassle of dealing with buffering, but having control of the buffering lets you tune your buffers to the application's requirements, reducing network overhead and gaining some speed.
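For illustration only, a minimal sketch (POSIX assumed) of what that wrapping looks like; note that the C standard also requires a file-positioning call between a read and a write on an update stream, which a socket cannot perform, hence the buffering workaround:

    /* Wrap an accepted connection in a FILE stream via fdopen().
     * Binding, listening, and accepting still need the raw descriptor. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    FILE *wrap_socket(int listen_fd)
    {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            return NULL;

        FILE *fp = fdopen(conn, "r+");      /* read/write stream */
        if (fp == NULL) {
            close(conn);
            return NULL;
        }
        setvbuf(fp, NULL, _IONBF, 0);       /* unbuffered: avoids stale
                                               data stuck in stdio
                                               buffers                 */
        return fp;
    }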
It depends on the kind of application you're writing. FILE streams are not suitable for non-blocking, asynchronous, or select/poll-based I/O. That may be no problem for a command-line program that performs the sequential task of connecting to a server, making a request, and getting the results. It also works all right for a run-from-inetd-only server process. But if your application will be doing anything event-based, you're in trouble. If you really want to use FILE streams with sockets in an event-based application, you can make it possible using threads, but I doubt it's a good idea...
So, for a CS project I'm supposed to sniff a network stream and build files from that stream. For example, if the program is pointed to ~/dumps/tmp/ then the directory structure would be this:
~/dumps/tmp
/192.168.0.1/
page1.html
page2.html
[various resources for pages1 & 2]
downloaded file1
/192.168.0.2/
so on and so forth.
I'm doing this in C and pcap on Linux (since I already know C++, and figure the learning experience would be good).
Thus far, I've been looking at various header formats for TCP/IP
[image: TCP header diagram]
As I figure it, I can sort the packets by their source/destination addresses and then order them correctly by sequence and acknowledgement numbers.
But that leaves me with a big question mark: how do I figure out which packets a-z are part of an HTML file and which packets A-Z are part of some random file being downloaded, etc.?
Also, what other kinds of header formats should I be looking up? Currently, I have TCP, Ethernet, and UDP, and I'll get around to things like FTP (but I'm pretty sure FTP is built on top of TCP, as is HTTP).
I'd post more hyperlink pictures, but I apparently need reputation to do that, sorry.
So, in short, how do I find files in a network stream, and am I missing any major protocols that I'll need to be able to read?
REPLY
I can't figure out how to reply, so this will have to do.
I have used pcap on several occasions and will do so again for this project, but I won't use any of Wireshark's stuff (although it is a great program) because I want to actually learn this kind of stuff.
Yeah, I'll look into the OSI model; any suggestions on a good site that covers the common protocols?
And I guess I should stop, before this 'question' becomes a discussion.
Where a file begins and ends is not part of TCP. You have to deal with the protocol carried over TCP. For example, for HTTP, you have to read the Content-Length header, which should equal the length of the HTTP body (the full HTML page). Then you accumulate the body over one or more TCP packets until you have the total content, as indicated by Content-Length.
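A minimal sketch of that accumulation step (the helper below is hypothetical and deliberately naive: it treats header names as case-sensitive and ignores chunked transfer encoding):

    #include <stdlib.h>
    #include <string.h>

    /* resp is the reassembled, NUL-terminated response so far. Returns
     * the body length from Content-Length, or -1 if the headers are
     * incomplete or no Content-Length is present; *body is set to the
     * first byte after the blank line. Keep appending TCP payloads and
     * retrying until the full body has arrived. */
    long http_body_length(const char *resp, const char **body)
    {
        const char *hdr_end = strstr(resp, "\r\n\r\n");
        if (hdr_end == NULL)
            return -1;                 /* headers not fully received yet */
        *body = hdr_end + 4;           /* first byte of the entity body  */

        const char *cl = strstr(resp, "Content-Length:");
        if (cl == NULL || cl > hdr_end)
            return -1;                 /* e.g. chunked: not handled here */
        return strtol(cl + 15, NULL, 10);  /* 15 = strlen("Content-Length:") */
    }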
Since this is a school assignment, you may be limited in what tools you can use, but you might want to look into Wireshark. If I were given this task as a real-world project, I'd take Wireshark, look into how to use its stream extraction and protocol parsing capabilities, and wrap something around them to automate it all and get the desired result.
You need to capture from the Ethernet device in promiscuous mode; libpcap takes care of opening the device that way and handing you the raw packets, which you can then store and analyze.
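A minimal capture skeleton with libpcap (link with -lpcap; the device name "eth0" is an assumption, use pcap_findalldevs() to enumerate real devices):

    /* Open a device in promiscuous mode and print the length of each
     * captured packet; parsing the Ethernet/IP/TCP headers comes next. */
    #include <stdio.h>
    #include <pcap.h>

    static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes)
    {
        (void)user; (void)bytes;
        printf("captured %u bytes (on-wire %u)\n", h->caplen, h->len);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_live("eth0", 65535, 1 /* promisc */,
                                   1000, errbuf);
        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }
        pcap_loop(p, -1, on_packet, NULL);  /* -1: run until error/break */
        pcap_close(p);
        return 0;
    }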
As this is for CS school, I would start with the OSI model, which gives you a good overview and a logical structure for network protocols.
Files live at layers 6 (presentation, e.g. MIME) and 7 (application, various protocols).
Then you need to go through each protocol and work out which of them carry files and how you can extract them.