UPDATE:
I edited my question to focus it more on the problem.
Context
Coding to understand how load balancing works.
Debian, 64-bit.
Load balancer on 127.0.0.1:36001
Backend on 127.0.0.1:36000
curl calling the load balancer (port 36001).
Problem
I took this code to create my socket code.
I created a very naive load balancer, a server, and a curl request to the load balancer.
My problem is that I don't understand how to pass the client's port/IP to the backend so that it can answer curl directly when splicing.
The code
curl request
curl localhost:36001 -d "salute"
loadbalancer
static int pool_of_conn[2],
           pool_of_pipes[2][2];
int sentinel, efd;
struct epoll_event event;
struct epoll_event *events;
...
off64_t offset = 0;
pipe(pool_of_pipes[0]);
/* This splice works, but the backend sees the load balancer's IP and port
   as the peer. How do I put the client's address here? How do I alter
   events[i].data.fd? */
bytes = splice(events[i].data.fd, NULL, pool_of_pipes[0][1], NULL, 4096,
               SPLICE_F_MOVE);
if (bytes == 0)
    break;
splice(pool_of_pipes[0][0], NULL, pool_of_conn[0], NULL, bytes,
       SPLICE_F_MOVE);
Could you help me, please?
The fundamental issue is that HTTP goes over TCP, which is a connection-oriented protocol. So even if you could make the connection to the backend appear to come from the client's IP address, the TCP connection is still established between the load balancer and the backend, and the return TCP packets use that same connection. To work around this, you would need HTTP to use different TCP connections for the request and the reply, and you would also have to manipulate local IP routing so that the backend's reply still reaches the actual client.
If your intent is to build an HTTP load balancer, the easiest solution probably lies somewhere at the application-protocol (i.e. HTTP) level. You could use HTTP redirects, or something more complex like wrapping the client request in extra headers before forwarding it. I'm sure existing HTTP load balancers offer plenty of examples and ideas.
That said, if you want to solve this at the TCP connection level, transparently to the application protocol, it is doable, but it does require quite a bit of additional code. Also, the backend's reply still goes through the load balancer. My suggestion would be to look into the IP_TRANSPARENT socket option. It allows your load balancer to use the client's IP address as the source when connecting to the backend. (Note that the return traffic also goes through the load balancer, so your splice() comes in handy.)
Here is an old SO question about IP_TRANSPARENT: Transparent proxying - how to pass socket to local server without modification?
And here is the Linux kernel documentation for it: https://www.kernel.org/doc/Documentation/networking/tproxy.txt
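To make that concrete, here is a minimal sketch of the connect-to-backend side. The helper name is mine, and client_addr is assumed to hold the peer address obtained from accept() on the load balancer; IP_TRANSPARENT requires CAP_NET_ADMIN plus the routing/iptables setup described in tproxy.txt.

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to the backend using the client's
 * address as the source, so the backend sees the client's IP/port. */
int connect_as_client(const struct sockaddr_in *client_addr,
                      const struct sockaddr_in *backend_addr)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Allow binding to a non-local address (the client's). */
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0 ||
        bind(fd, (const struct sockaddr *)client_addr,
             sizeof(*client_addr)) < 0 ||
        connect(fd, (const struct sockaddr *)backend_addr,
                sizeof(*backend_addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}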
Related
I am currently working on a multi-threaded proxy server that supports keep-alive connections. I see some weird issues while handling requests from the Firefox browser. I connect to my local proxy using localhost:10001/http://url, and I can access all the links on this host. The process is as below.
1. Create a socket and bind it to port 10001.
2. Accept connections, and fork() when a client is connected.
3. Keep processing the client's requests over the persistent connection.
Now the problem is that when I open a new tab in Firefox to access a second URL with a different host using localhost:10001/http://url2, the strange thing is that the request goes to the client socket connection created during the first connection. I initially thought it might be due to my code, but then I tried the same thing using telnet, and each new connection created a separate process. Are there any specific settings that make the Firefox browser do this?
HTTP keep-alive is a way to reuse an underlying TCP connection for multiple requests, so one can skip the overhead of creating a new TCP connection every time. Since the target of the connection (your proxy) is the same every time in your case, it makes sense for the browser to reuse the same TCP connection. The comparison with telnet is flawed, since with telnet you open a new TCP connection each time.
Whether HTTP keep-alive gets used is determined by the HTTP version, the Connection header, and the behavior of both server and client. Both server and client can decide to close the idle connection at any time after a request is done, i.e. they are not required to keep it open. Additionally, they can signal that they would like to keep the connection open by sending the Connection: keep-alive HTTP header, or that they want to close it after the request with Connection: close. These headers have default values depending on the HTTP version: keep-alive is on by default with HTTP/1.1 and off with HTTP/1.0 unless explicitly specified.
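For illustration (a made-up exchange, not from the question): an HTTP/1.0 client must opt in to keep-alive explicitly, and the server confirms it in its response:

GET /index.html HTTP/1.0
Connection: keep-alive

HTTP/1.0 200 OK
Connection: keep-alive
Content-Length: 42

Under HTTP/1.1 the connection would stay open by default, and either side would send Connection: close to end it after the current request.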
Apart from that, the "proxy" you are implementing with URLs like http://proxy/real-url is not a real HTTP proxy. A real HTTP proxy would be configured as a proxy inside the browser, and the URLs you use would stay the same, which also means that no URL rewriting would need to be done by the proxy. Worse, your idea of a proxy effectively merges all hosts into the same origin (i.e. the origin is the proxy) and thus disables a major security concept of the browser: the same-origin policy. This means, for example, that some rogue advertisement server would, under your implementation, share an origin with eBay, and could therefore get access to the eBay cookies, hijack the session, and misuse it for identity theft.
HTTP persistent connections are also used with the proxy, not only with the final destination.
For Firefox, you could try to alter its behavior with the proxy by setting network.http.proxy.version to 1.0. But you'll have to enhance your proxy (and perhaps completely rethink its inner workings) to be able to deal with these reused connections; I'm sure the behavior is not limited to Firefox.
Also, make sure your proxy doesn't answer with HTTP/1.1, because it isn't one.
My issue is simple: I want to make an HTTP GET request from a server. My client program takes a URL and sends it to the server program. I want to know how I can take this URL and make an HTTP GET request for the specific page that was entered from the client.
If I resolve the URL to get an IP address, I could open a socket to that IP address, but then how would I make the request for the actual page? Would I just send an HTTP GET request directly to that IP address with the directory in the request?
For instance, if I ran a Google search for the word "Test" I'd be presented with the following URL:
https://www.google.com/?gws_rd=ssl#q=Test
My understanding is that a GET request could look like this:
GET /?gws_rd=ssl#q=Test HTTP/1.1
Host: www.google.com
So, if I'm understanding this correctly, would I resolve the IP, open a socket, and then just send this GET request directly to the socket?
Lastly, if I try throwing a URL such as the one above into my server code, it's unable to resolve an IP address. This means that if I'm making a more complex request than something like www.google.com, I'd have to parse the string and match only the host. Is there a simple way to handle this other than using regular expressions? I'm familiar with regular expressions from Python and C#, but if I can cut down on the complexity of the program by approaching this differently, I'd like to know.
Update: I'm attempting to match the URL with a POSIX regular expression and extract the domain name from it. I'm not having much luck so far, as this implementation is oppressively confusing.
Yes, once a socket has been opened you can send requests as in your example, as described in RFC 2616. One correction, though: the fragment part of the URL (#q=Test) is only used client-side and is never sent to the server, so the request line would just be GET /?gws_rd=ssl HTTP/1.1.
If you don't want to use regular expressions or strchr to split your URL, you could also send the entire URL in the request line:
GET http://www.google.com/?gws_rd=ssl HTTP/1.1
However, you will still need to find the hostname in the URL to make a call to something like gethostbyname (or its modern replacement, getaddrinfo).
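A minimal sketch of doing that without regular expressions, using strcspn to cut the host out of the URL and getaddrinfo to resolve it (the URL is just an example):

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    /* Example URL; a real program would take it from the client. */
    const char *url = "https://www.google.com/?gws_rd=ssl#q=Test";
    char host[256];

    /* Skip the scheme, if any. */
    const char *p = strstr(url, "://");
    p = p ? p + 3 : url;

    /* The host ends at the first '/', ':', '?' or '#'. */
    size_t len = strcspn(p, "/:?#");
    if (len == 0 || len >= sizeof(host))
        return 1;
    memcpy(host, p, len);
    host[len] = '\0';

    /* getaddrinfo replaces gethostbyname and handles IPv6 too. */
    struct addrinfo hints = { 0 }, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return 1;

    printf("host: %s\n", host);   /* prints: host: www.google.com */
    freeaddrinfo(res);
    return 0;
}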
I have a basic packet sniffer like the one at http://www.binarytides.com/packet-sniffer-code-c-linux/
I have extended it to process packets only on port 80 (HTTP). I am not sure how to get the host web address from the data. Can you help me here?
What I am trying to do is parse a subset of the HTTP header in order to identify the host web address.
I found something similar to what I need: https://github.com/joyent/http-parser/blob/master/http_parser.h#L194
but the code is too complex...
Alternatively, where can I find a bytewise breakdown of the HTTP header, like the one for TCP at http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure
You need to grab the TCP payload and look at the HTTP request text. A typical HTTP request looks like:
GET /index.html HTTP/1.1
Host: www.foo.com
Unlike TCP, HTTP headers have no fixed bytewise layout; they are plain text lines separated by CRLF. Note that the host name does not normally follow GET (the request line carries only the path, unless the client is talking to a proxy); it is carried in the Host header, so you can extract the web host address from there.
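A minimal sketch of that extraction, assuming the whole request header arrives in one TCP segment and payload is a NUL-terminated copy of it (the helper name is mine):

#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: scan the header lines of an HTTP request for
 * "Host:" and copy its value into out. Returns 1 on success. */
static int extract_host(const char *payload, char *out, size_t outlen)
{
    const char *line = payload;

    while (line && *line) {
        if (strncasecmp(line, "Host:", 5) == 0) {
            line += 5;
            while (*line == ' ' || *line == '\t')  /* skip whitespace */
                line++;
            size_t len = strcspn(line, "\r\n");    /* value ends at EOL */
            if (len >= outlen)
                len = outlen - 1;
            memcpy(out, line, len);
            out[len] = '\0';
            return 1;
        }
        line = strstr(line, "\r\n");               /* next header line */
        if (line)
            line += 2;
    }
    return 0;
}

int main(void)
{
    const char *req = "GET /index.html HTTP/1.1\r\n"
                      "Host: www.foo.com\r\n"
                      "User-Agent: demo\r\n\r\n";
    char host[256];

    if (extract_host(req, host, sizeof(host)))
        printf("host: %s\n", host);                /* www.foo.com */
    return 0;
}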
I have a module that needs to return a reference URI in its payload. If called via an SSL connection, I need to build a URI that has the https prefix. I can easily get the port number from the request, but the problem is the user could have picked any port for SSL (and in fact this particular Apache instance always starts out with a non-standard SSL port).
All of the parsed URIs in the request structure already have the http/https prefix removed. I'm contemplating resorting to the r->server->defn_name field, which actually contains the conf file for the request's virtual server. If I see that it ends with httpd-ssl.conf, I can guess this is an SSL connection. Still, it feels like a hack, and in reality the user could name that conf file something else, which would break this approach.
Another approach would be to read the config file and find the SSL VirtualHost Listen port, but I haven't been able to accomplish this either.
It seems like I am missing some very simple way to tell whether the request was made via https, but I have scanned all of the structures available from the request_rec and I don't see anything obvious.
There is a nice function defined in the httpd.h header file that will give you the scheme for a request:
/* ap_http_scheme() returns "https" when mod_ssl handles the connection. */
if (apr_strnatcmp(ap_http_scheme(r), "https") == 0) {
    ssl = TRUE;
} else {
    ssl = FALSE;
}
I must develop a proxy server that works with HTTP 1.0 only, on Linux, written in C.
I need some hints to start developing.
I assume you are confident in using Linux and the C language (no hints for that; otherwise, don't start by developing a proxy).
Read and understand RFC 1945, HTTP/1.0 (pay attention to the parts that specifically mention proxies).
Determine what kind of proxy you want (web/caching/content-filter/anonymizer/transparent/non-transparent/reverse/gateway/tunnel/...).
Start developing the server.
Basic steps (a minimal listen/accept skeleton is sketched after the list):
Open a port
Listen on the port
Get each request sent from the client to that port (maybe make the whole thing multithreaded to handle more than one request at a time)
Determine whether it is a valid HTTP 1.0 request
Extract the request components
Rebuild the request according to what type of proxy you are
Send the new request
Get the response
Send the response back to the client
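The promised sketch covers only the open/listen/accept part of those steps, assuming a fork-per-client design (threads would work equally well) and an arbitrary port 8080; request parsing and forwarding per RFC 1945 are left out:

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handle one client; real parsing/forwarding per RFC 1945 goes here. */
static void handle_client(int client_fd)
{
    char buf[4096];
    ssize_t n = read(client_fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        /* ... validate the request, rebuild it, forward it to the
         * origin server, and relay the response back ... */
    }
    close(client_fd);
}

int main(void)
{
    int one = 1;
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);               /* arbitrary proxy port */

    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 16) < 0) {
        perror("bind/listen");
        return 1;
    }

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;
        if (fork() == 0) {                     /* one process per client */
            close(listen_fd);
            handle_client(client_fd);
            _exit(0);
        }
        close(client_fd);
        while (waitpid(-1, NULL, WNOHANG) > 0) /* reap finished children */
            ;
    }
}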
How to create a proxy server:
Open a port to listen on
Catch all incoming requests on that port
Determine the web address requested
Open a connection to the host and forward the request
Receive response
Send the response back to the requesting client
Additionally: Use threads to allow for multiple requests to the server.