Efficient way to transfer data between 2 clients over TCP with intermediate server - c

This is for educational purpose (University assignment).
I need to write client-server programs in C for Linux. (I already have that part: a client connects to the server, and it sends and receives files without a problem.)
When a client connects to the server, it sends the server a list of the files on that client, so the server has a list of all the files on its clients.
Client A can then request a file "test.txt" from the server; the server knows the file is on client B, and the file should be transferred from B to A. I'm trying to think about the best way of doing this:
recv() from B into a buffer and immediately send() the buffer to A?
recv() the whole file from B, save it on the server, then send it to A?
My programs should support this behavior: if A asks for a file from B, and then C asks for a file from B, C should NOT have to wait until the A<-->B transfer ends. And this is where I get stuck.
Thank you very much!!
Edit: My server uses threads: whenever a new client connects, a new thread is created to serve it. My client, at the moment, does not use threads (this can be changed).

If you want to do several transfers at once using sockets you have two options:
Blocking sockets and threads
This is the way you've written your server. The problem with threads is that they can lead to bugs that are rather hard to debug. Combine that with network bugs that in themselves can be rather hard to debug and you have a potential nightmarish debugging session on your hands.
Non-blocking sockets and select()
This way doesn't need threads; instead it uses select() to see which sockets have data waiting to be read. Combine that with some loops and you can transfer several files concurrently. Setting sockets to be non-blocking is easy; using select() correctly can be slightly trickier, but weighed against the potential for thread + network bugs, this is the way I personally prefer to write network code.
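For reference, a minimal sketch of making a socket non-blocking with fcntl() (standard POSIX; the helper name is mine):

    #include <fcntl.h>

    /* Make an existing socket non-blocking. Returns 0 on success, -1 on error. */
    int set_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }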
Regarding your actual problem, I would suggest something like this:
Client A connects to server S. You need to bind the local side for the next step.
A also opens another socket for data transfer on the next port upwards.
S sends file list to A. How you build up the file list I leave to you.
A requests file F from S.
S checks which client has F.
S sends "send F to A on port X" to B. You can check which remote port is used, and then you know which port to send the file on.
B receives and executes the command, as sketched below.
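To make that last step concrete, here is a rough sketch of how B could parse and execute such a command. The exact command format ("SEND <file> <host> <port>") is my invention, not something prescribed above:

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    /* B receives e.g. "SEND test.txt 192.168.1.5 5001" and connects back to A. */
    void execute_send_command(const char *cmd)
    {
        char file[256], host[64];
        int port;
        if (sscanf(cmd, "SEND %255s %63s %d", file, host, &port) != 3)
            return;                       /* malformed command */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            /* open 'file' and push it through fd in a read()/send() loop */
        }
    }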

You have to connect 2 sockets per client, so you are going to have 4 sockets, like this:
socket 1: A <-> C
socket 2: B <-> C
sockets 3 and 4: A <-> C <-> B
So you will have to use sockets 1 and 2 to transfer files just between client A and server C (socket 1) and between client B and server C (socket 2).
Then you will have to develop a bridge in server C, using sockets 3 and 4, to transfer data from A to B.
A multithreading solution should work, I guess!
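A minimal sketch of such a bridge on server C (blocking sockets, one direction; from_b and to_a are assumed to be the two already-connected data sockets):

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Relay bytes from one connected socket to another until EOF or error. */
    void bridge(int from_b, int to_a)
    {
        char buf[4096];
        ssize_t n;
        while ((n = recv(from_b, buf, sizeof buf, 0)) > 0) {
            ssize_t sent = 0;
            while (sent < n) {            /* send() may accept fewer than n bytes */
                ssize_t m = send(to_a, buf + sent, n - sent, 0);
                if (m < 0)
                    return;               /* error handling omitted */
                sent += m;
            }
        }
    }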

What I've got so far:
Server is running only with select().
Client has a socket in listen() mode, call it dataSocket (on top of its regular socket).
If a client A wants a file, it tells the server the file name. The server finds the client that has the file and makes a connect() to its dataSocket. (I managed to implement that.)
Small files are transferred without a problem. Big files fail.
I think it's because something is wrong with my client.
I've implemented the client with a select() call, and if it sees someone doing a connect(), it opens a new thread and serves the request. But something there is wrong...
The code is like this:
FD_SET(globalSocket, &origin);
FD_SET(dataSocket, &origin);
FD_SET(STDIN, &origin); // #define STDIN 0
while (1)
{
    readfds = origin;
    select(fdmax + 1, &readfds, NULL, NULL, NULL);
    for (i = 0; ...)
    {
        if (FD_ISSET(i, &readfds))
        {
            if (i == STDIN)
                // get user input
            if (i == dataSocket)
            {
                printf("someone wants a file from me");
                pthread_create(...);
            }
        }
    }
}
One problem that I saw: on my first request for a file, it prints the line "someone wants a file from me" 2 or 3 times, but only one thread is created.
And when I try to send a large file, I do get some of it, but then a "Connection reset by peer" pops up...
I hope somebody here will be able to answer my not-so-well-explained question.
Thank you.

Related

Does reading from a socket wait or get EOF?

I'm implementing a simple connection between a client and a server in C.
On the client side, I'm in a loop, reading BUFFER_SIZE bytes at a time from a file and sending them to the server side (error handling omitted).
//client side
bytesNumInput = read(inputFileFD, bufInput, BUFFER_SIZE);
bytesSend = write(sockfd, bufInput, bytesNumInput);
Of course the server is also in a loop.
//server side
bytesRecv = read(sockfd, bufOutput, BUFFER_SIZE);
Now, my questions are:
Can I get EOF in the middle of the connection if the server reads faster than the client?
Does the read function wait to get all the data or is it the same as reading from a file?
Is it possible the server will handle 2 read iterations in 1 write iteration?
Can I get EOF in the middle of the connection if the server reads faster than the client?
No. EOF means the peer has disconnected. If the connection is still alive, read() will block until (a) at least one byte is transferred, (b) EOF occurs, or (c) an error occurs.
Does the read function wait to get all the data or is it the same as reading from a file?
See (a) above.
Is it possible the server will handle 2 reads in 1 write iteration?
Yes. TCP is a byte-stream protocol, not a messaging protocol.
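Because of that, a receiver that needs an exact number of bytes has to loop over short reads; a standard sketch (the helper name readn is just conventional):

    #include <sys/types.h>
    #include <unistd.h>

    /* Read exactly n bytes from fd, looping over short reads.
     * Returns bytes read (n unless EOF came first), or -1 on error. */
    ssize_t readn(int fd, void *buf, size_t n)
    {
        size_t left = n;
        char *p = buf;
        while (left > 0) {
            ssize_t r = read(fd, p, left);
            if (r < 0)
                return -1;   /* error; a real version would retry on EINTR */
            if (r == 0)
                break;       /* EOF: peer closed the connection */
            p += r;
            left -= r;
        }
        return n - left;
    }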
No, the server will wait for incoming data. Sockets provide flow control.
The question is not clear to me, but read() always tries to get all the requested data; if there is not that much available, it will return what is there.
Yes, sockets do not have message semantics, just a flow of bytes.

SSL renegotiation with full duplex socket communication

I have a very simple client-server setup with one blocking socket doing full-duplex communication, and I've enabled SSL/TLS in the application. The model is that of a typical producer-consumer: the client produces data, sends it to the server, and the server processes it. The only catch is that once in a while the server sends data back to the client, which the client handles accordingly. Below is a very simple pseudo-code of the application:
 1 Client:
 2 -------
 3 while (true)
 4 {
 5     if (poll(pollin, timeout=0) || 0 < SSL_pending(ssl))
 6     {
 7         SSL_read();
 8         // Handle WANT_READ or WANT_WRITE appropriately.
 9         // If no error, handle the received control message.
10     }
11     // produce data.
12     while (!poll(pollout))
13         ; // Wait until the pipe is ready for a send().
14     SSL_write();
15     // Handle WANT_READ or WANT_WRITE appropriately.
16     if (time to renegotiate)
17         SSL_renegotiate(ssl);
18 }
19
20 Server:
21 -------
22 while (true)
23 {
24     if (poll(pollin, timeout=1s) || 0 < SSL_pending(ssl))
25     {
26         SSL_read();
27         // Handle WANT_READ or WANT_WRITE appropriately.
28         // If no error, consume data.
29     }
30     if (control message needs to be sent)
31     {
32         while (!poll(pollout))
33             ; // Wait until the pipe is ready for a send().
34         SSL_write();
35         // Handle WANT_READ or WANT_WRITE appropriately.
36     }
37 }
The trouble happens when, for testing purposes, I force SSL renegotiation (lines 16-17). The session starts nice and easy, but after a while, I get the following errors:
Client:
-------
error:140940F5:SSL routines:SSL3_READ_BYTES:unexpected record
Server:
-------
error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert unexpected message
Turns out, around the same time the client initiates a renegotiation (line 14), the server ends up sending application data to the client (line 34). The client, as part of the renegotiation process, receives this application data and bombs with an "unexpected record" error. Similarly, when the server does the subsequent receive (line 26), it ends up receiving renegotiation data when it was expecting application data.
What am I doing wrong? How should I handle/test SSL renegotiations over a full-duplex channel? Note that there are no threads involved; it's a simple single-threaded model with reads/writes happening on either end of the socket.
UPDATE: To verify that there is nothing wrong with the application I have written, I could even reproduce this quite comfortably with OpenSSL's s_client and s_server implementations. I started an s_server and, once the s_client had connected to the server, I programmatically sent a bunch of application data from the server to the client and a bunch of 'R' (renegotiation requests) from the client to the server. Eventually, they both fail in exactly the same manner as described above.
s_client:
RENEGOTIATING
4840:error:140940F5:SSL routines:SSL3_READ_BYTES:unexpected record:s3_pkt.c:1258:
s_server:
Read BLOCK
ERROR
4838:error:140943F2:SSL routines:SSL3_READ_BYTES:sslv3 alert unexpected message:s3_pkt.c:1108:SSL alert number 10
4838:error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure:s3_pkt.c:1185:
UPDATE 2:
Ok. As suggested by David, I reworked the test application to use non-blocking sockets, always do SSL_read and SSL_write first, and select based on what they return, and I still get the same errors during renegotiation (SSL_write ends up getting application data from the other side in the midst of renegotiation). The question is: at any point in time, if SSL_read returns WANT_READ, can I assume it is because there is nothing in the pipe and go ahead with SSL_write, since I have something to write? If not, that's probably why I end up with errors. Either that, or I am doing the renegotiation all wrong. Note that if SSL_read returns WANT_WRITE, I always do a select and call SSL_read again.
You're trying to "look through" the SSL black box. This is a huge mistake.
if (poll(pollin, timeout=0) || 0 < SSL_pending(ssl))
{
SSL_read();
You're making the assumption that in order for SSL_read to make forward progress, it needs to read data from the socket. This is an assumption that can be false. For example, if a renegotiation is in progress, the SSL engine may need to send data next, not read data.
while (!poll(pollout))
; // Wait until the pipe is ready for a send().
SSL_write();
How do you know the SSL engine wants to write data to the pipe? Did it give you a WANT_WRITE indication? If not, maybe it needs to read renegotiation data in order to send.
To use SSL in non-blocking mode, just attempt the operation you want to do. If you want to read decrypted data, call SSL_read. If you want to send encrypted data, call SSL_write. Only call poll if the SSL engine tells you to, with a WANT_READ or WANT_WRITE indication.
Update: You have a "half of each" hybrid between blocking and non-blocking approaches. This cannot possibly work. The problem is simple: until you call SSL_read, you don't know whether or not it needs to read from the socket. If you call poll first, you will block even if SSL_read does not need to read from the socket. If you call SSL_read first, it will block if it does need to read from the socket. SSL_pending won't help you: if SSL_read needs to write to the socket to make forward progress, SSL_pending will return zero, but calling poll will block forever.
You have two sane choices:
Blocking. Leave the sockets set blocking. Just call SSL_read when you want to read and SSL_write when you want to write. They will block. Blocking sockets can block, that's how they work.
Non-blocking. Set the sockets non-blocking. Just call SSL_read when you want to read and SSL_write when you want to write. They will not block. If you get a WANT_READ indication, poll in the read direction. If you get a WANT_WRITE indication, poll in the write direction. Note that it is perfectly normal for SSL_read to return WANT_WRITE, and then you poll in the write direction. Similarly, SSL_write can return WANT_READ, and then you poll in the read direction.
Your code would (mostly) work if the implementation of SSL_read was basically, "read some data then decrypt it" and SSL_write was "encrypt some data and send it". The problem is, these functions actually run a sophisticated state machine that reads and writes to the socket as needed and ultimately causes the effect of giving you decrypted data or encrypting your data and sending it.
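A minimal sketch of the non-blocking pattern described above, with real OpenSSL calls (error handling trimmed; fd is the underlying non-blocking socket):

    #include <openssl/ssl.h>
    #include <poll.h>

    /* Attempt a read; poll only in the direction the SSL engine asks for,
     * then try SSL_read() again. */
    int ssl_read_step(SSL *ssl, int fd, char *buf, int len)
    {
        int n = SSL_read(ssl, buf, len);
        if (n > 0)
            return n;                        /* got decrypted data */
        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ: {
            struct pollfd p = { fd, POLLIN, 0 };
            poll(&p, 1, -1);                 /* engine needs the socket readable */
            return 0;                        /* caller retries SSL_read() */
        }
        case SSL_ERROR_WANT_WRITE: {
            struct pollfd p = { fd, POLLOUT, 0 };
            poll(&p, 1, -1);                 /* engine needs the socket writable */
            return 0;
        }
        default:
            return -1;                       /* real error or clean shutdown */
        }
    }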
After spending time debugging my application with OpenSSL, I figured out the answer to the question I originally posted. I am sharing it here in case it helps others like me.
The question I had posted originally had to do with a clear error from OpenSSL indicating that it was receiving application data in the middle of a handshake. What I failed to understand was that OpenSSL gets confused when it receives application data in the middle of a handshake. It's fine to receive handshake data when receiving/sending application data but not the other way around (at least with OpenSSL). That's the thing that I failed to realize. This is also the reason most SSL-enabled applications run fine because most of them are half-duplex in nature (HTTPS for instance) which implicitly guarantees no application data asynchronously arriving at the time of handshake.
What this means is that if you are designing a custom client-server full-duplex protocol (which is the case I am in) and want to slap SSL onto it, it's the application's responsibility to initiate a renegotiation when neither end is sending any data. This is clearly documented in Mozilla's NSS API. Not to mention there is an open ticket in OpenSSL's bug repository regarding this issue. The moment I changed my application to initiate a handshake when there is nothing for the client/server to say to one another, I no longer faced the above errors.
Also, I agree with David's comments about blocking sockets and I've read many of his arguments in the OpenSSL mailing list as well. But, the sad thing is that most legacy applications are built around poll and blocking sockets and they "Just Work Fine (TM)". The issue arises when dealing with SSL renegotiation. I still believe at least my application can deal with SSL renegotiation in the presence of blocking sockets since it is a very confined and custom protocol and we (as the application developer) can decide to do the renegotiation when the protocol is quiescent. If that doesn't work, I will go the non-blocking socket route.

Can I use select to send data on multiple interfaces as fast as each interface can process it?

I am an experienced network programmer and am faced with a situation where I need some advice.
I am required to distribute some data over several outgoing interfaces (via different TCP socket connections, each corresponding to one interface). However, the important part is that I should be able to send MORE/most of the data over the interface with better bandwidth, i.e. the one that can send faster.
The opinion I had was to use the select API (on both Unix and Windows) for this purpose. I have used select, poll, and even epoll in the past, but always for READING from multiple sockets whenever data is available.
Here I intend to write successive packets to several interfaces in sequence, then monitor each of them for write readiness (via select's write descriptors), and whichever becomes available first (meaning it was able to send its packet first), I would keep sending more packets via that descriptor.
Will I be able to achieve my intention here? I.e., if I have an interface with a 10Mbps link and another with a 1Mbps link, I hope to get most of the packets out via the faster interface.
Update 1: I was wondering what select's behavior would be in this case. When you call select on read descriptors, the one on which data is available is returned. However, in my scenario, when we are writing to the descriptors and waiting for select to return the one that finished writing first, does select ensure returning only when the entire packet is written? Say I tried writing 1200 bytes in one go: will it only return when the entire 1200 are written, or when there is a permanent error? I am not sure how select would behave here and failed to find any documentation describing that.
I'd adapt the producer/consumer pattern, in this case with one producer and several consumers.
Let the main thread handle your source (be the producer) and spawn off one thread for each connection (the consumers).
The threads, in parallel, each pull a chunk of the source and send it over their connection, one by one.
The thread holding the fastest connection is expected to send the most chunks in this setup.
Using poll/epoll/select for writing is rather tricky. The reason is that sockets are mostly ready for writing unless their socket send buffer is full. So, polling for 'writable' is apt to just spin without ever waiting.
You need to proceed as follows:
When you have something to write to a socket, write it, in a loop that terminates when all the data has been written or write() returns -1 with errno == EAGAIN/EWOULDBLOCK.
At that point you have a full socket send buffer. So, you need to register this socket with the selector/poll/epoll for writability.
When you have nothing else to do, select/poll/epoll and repeat the writes that caused the associated sockets to be polled for writability.
Do those writes the same way as at (1) but this time, if the write completes, deregister the socket for writability.
In other words, you must only select/poll for writability if you already know the socket's send buffer is full, and you must stop doing so as soon as you know it isn't; see the sketch below.
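A sketch of that write path (non-blocking socket; the pending-buffer bookkeeping via *off is a placeholder for however your application tracks unsent data):

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Try to flush pending data. Returns 1 when everything is sent (deregister
     * the fd for writability), 0 when the send buffer is full (register the fd
     * for writability), -1 on error. */
    int try_flush(int fd, const char *buf, size_t len, size_t *off)
    {
        while (*off < len) {
            ssize_t n = send(fd, buf + *off, len - *off, 0);
            if (n < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    return 0;
                return -1;
            }
            *off += (size_t)n;
        }
        return 1;
    }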
How you fit all this into your application is another question.

Using C sockets: Address already in use

So the basic premise of my program is that I'm supposed to create a TCP session, direct traffic through it, and detect any connection losses. If the connection does break, I need to close the sockets and reopen them (using the same ports) in such a way that it will seem like the connection (almost) never died. It should also be noted that the two programs will be treated as proxies (data gets sent to them; if the connection breaks, it gets stored until the connection is fixed, then the data is sent off).
I've done some research and gone ahead and used setsockopt() with the SO_REUSEADDR option to set the socket options so that I can reuse the address.
Here's the basic algorithm I do to detect a connection break using signals:
After initial setup of sockets, begin sending data
After x seconds, set a flag to false, which will prevent all other data from being sent
Send a single piece of data to let the other program know the connection is still open, reset timer to x seconds
If I receive same piece of data from the program, set the flag to true to continue sending
If I don't receive the data after x seconds, close the socket and attempt to reconnect
(step 5 is where I'm getting the error).
Essentially one program is a client(on one VM) and one program is a server(on another VM), each sending and receiving data to/from each other and to/from another program on each VM.
My question is: given that I'm still getting this error after setting the socket options, why am I not allowed to re-bind the address after a connection break has been detected?
The server is the one complaining when a disconnect is detected (I close the socket, open a new one, set the option, and attempt to bind the port with the same information).
One other thing of note is the way I'm receiving the data from the sockets. If I have a socket open, I'm basically reading it by doing the following:
while ((x = recv(socket, buff, 1, 0)) >= 0) {
    // add to buffer
    // send out to other program if connection is alive
}
Since I'm using the timer to close/reopen the socket, and this is in a different thread, will this prevent the socket from closing?
SO_REUSEADDR only allows limited reuse of ports. Specifically, it does not allow reuse of a port that some other socket is currently actively listening for incoming connections on.
There seems to be an epidemic here of people calling bind() and then setsockopt() and wondering why the setsockopt() doesn't fix an error that had already happened on bind().
You have to call setsockopt() first.
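In other words, the order matters. A minimal sketch (assume addr has already been filled in):

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    /* Must happen BEFORE bind(), or it has no effect on this bind. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        perror("bind");
    listen(fd, SOMAXCONN);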
But I don't understand your problem. Why do you think you need to use the same ports? Why are you setting a flag preventing you from sending data? You don't need any of this. Just handle the errors on send() when and if they arise, creating a new connection when necessary. Don't try to out-think TCP. Many have tried, few if any have succeeded.

Socket read() hangs for a while when there is no data to read

Hi, I'm writing a simple HTTP port forwarder. I read data from port 80 and pass the data to my lighttpd server on port 8080.
As long as I write() data to the socket on port 8080 (forwarding the request) there's no problem, but when I read() data from that socket (forwarding the response), the last read() hangs a while (about 1 or 2 seconds) before realizing there's no more data and returning 0.
I tried to set the socket to non-blocking, but this doesn't work, as sometimes it returns EWOULDBLOCK even if there's some data left (lighttpd + CGI can be quite slow).
I tried to set a timeout with select(), but, as above, a slow CGI could hit the timeout when there's actually some data left to transmit.
Update: SOLVED. It was the keepalive after all. After I disabled it in my lighttpd configuration file, the whole thing runs flawlessly.
Well, for the sake of completion, and as per my comment:
It is likely that the HTTP server itself (lighttpd in your case) is maintaining a persistent connection to your proxy because your proxy relayed a header containing "Connection: keep-alive". This header aids when the client wants to make multiple requests over the same connection. So, because lighttpd received this header, it assumed it was going to receive further requests and kept the socket open, causing read() to block in your proxy.
Disabling keep-alive in your lighttpd configuration is one way to fix it, but you could also strip the "Connection: keep-alive" header before you relay it to your web server.
Using both non-blocking sockets and select is the right way to go. Returning EWOULDBLOCK doesn't mean that the entire stream of data is finished being received; it means that, instantaneously, there is nothing to read right now. That's exactly what you want, because it means that read won't wait even half a second for more data to show up. If the data isn't immediately available, it will return.
Now, obviously, this means you will need to call read multiple times to get the complete data. The general format for doing this is a select loop. In pseudocode:
do
    select ( my_sockets )
    if ( select error )
        handle_error
    else
        for each ( socket in my_sockets ) do
            if ( socket is ready ) then
                nonblocking read from socket
                if ( no data was read ) then
                    close socket
                    remove socket from my_sockets
                endif
            endif
        loop
    endif
loop
The idea is that select will tell you which sockets have data available for reading right now. If you read one of those sockets, you are guaranteed either to get data or to get a return value of 0, indicating that the remote end closed the socket.
If you use this method, you will never be stuck in a read call that is not reading data, for any length of time. The blocking operation is the select call, and you can also select over writeable sockets if you need to write, and set a timeout if you need to do things periodically.
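For completeness, a compact C rendering of one pass of that loop (master is an fd_set of open sockets and fdmax the highest descriptor; both are assumed to be maintained elsewhere):

    fd_set readfds = master;
    if (select(fdmax + 1, &readfds, NULL, NULL, NULL) < 0) {
        /* handle error (e.g. EINTR) */
    } else {
        for (int fd = 0; fd <= fdmax; fd++) {
            if (!FD_ISSET(fd, &readfds))
                continue;
            char buf[4096];
            ssize_t n = recv(fd, buf, sizeof buf, 0); /* ready: won't block */
            if (n <= 0) {              /* 0 = remote end closed, <0 = error */
                close(fd);
                FD_CLR(fd, &master);
            } else {
                /* forward the n bytes just read */
            }
        }
    }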
Don't do that!
Keep-alives boost performance for other clients. Instead, fix your client: send a Connection: close header in your request and make sure your request doesn't claim HTTP/1.1 compliance. (If for no other reason than that you probably don't handle chunked encoding either.)
I guess I would use non-blocking I/O to its full extent. Instead of setting timeouts, I'd rather wait for events:
while (select(...)) {
    switch (...) {
    case ...: // Handle accepting new connection
    case ...: // Handle reading from socket
    ...
    }
}
A single-threaded, blocking forwarder will cause problems anyway with multiple clients.
Sorry, I don't remember the exact calls. It can also be awkward in some cases (IIRC, you need to handle writes too), but there are libraries that simplify the task.
