I am working on a school project in which I have to analyze .pcap files in C using the libpcap library. I am new to networking, but I do know that TCP is at layer 4 and HTTP is at layer 7 of the OSI model. I want to pick out HTTP packets and print their source/destination ports, but I'm a little confused about how to distinguish HTTP packets from plain TCP packets.
Here is an example, which I don't understand:
EDIT: Here is another example, where the source port is 80 and the length is 100. The 54th byte is 48, which is the same as for an HTTP 1.1 response packet. It is a TCP packet.
https://i.stack.imgur.com/RQs6v.png
The destination port here is 80, which is HTTP. However, Wireshark does not list this packet as HTTP; it is just TCP.
https://i.stack.imgur.com/TsVuO.png
My question is: how can I determine, based on the bytes, whether a packet is HTTP or just TCP?
You cannot determine whether a packet is HTTP just by looking at its headers. HTTP is an application-level protocol, so to identify an HTTP stream you have to check the innermost payload of the packet. In other words, HTTP packets are distinguishable only by looking at what comes after the TCP header. Wireshark already does this for you and marks packets that look like HTTP as such. You can filter the packets Wireshark identified as HTTP by typing http in the filter bar at the top.
In your case, the packet you show has Length = 0, so there really isn't anything to analyze other than the various headers of the different layers. The packet is not HTTP.
Determining HTTP traffic "based on bytes" can be done by looking at the payload: HTTP requests and responses have known formats. For example HTTP 1.1 requests start with <METHOD> <URI> HTTP/1.1\r\n, and responses with HTTP/1.1 <CODE> <MSG>\r\n.
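As a rough illustration of the payload check described above, here is a minimal sketch in C. The function name looks_like_http and the list of methods are my own choices, not part of libpcap. It assumes you have already skipped the link, IP and TCP headers (for Ethernet with no IP or TCP options that is 14 + 20 + 20 = 54 bytes, but the IP and TCP header lengths should really be read from the headers themselves). Note that it only recognizes the first segment of a message; later segments of a long response carry no start line.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first len bytes at p begin with prefix, else 0. */
static int starts_with(const unsigned char *p, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    return len >= n && memcmp(p, prefix, n) == 0;
}

/* Hypothetical helper: returns 1 if the TCP payload begins like an
 * HTTP/1.x request or response, 0 otherwise. */
int looks_like_http(const unsigned char *payload, size_t len)
{
    /* Common request methods; extend the list if you need others. */
    static const char *const methods[] = {
        "GET ", "POST ", "PUT ", "DELETE ", "HEAD ",
        "OPTIONS ", "PATCH ", "CONNECT ", "TRACE "
    };
    size_t i;

    if (len == 0)
        return 0;                      /* pure ACK/SYN/FIN: nothing to inspect */
    if (starts_with(payload, len, "HTTP/1."))
        return 1;                      /* response status line */
    for (i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (starts_with(payload, len, methods[i]))
            return 1;                  /* request line */
    return 0;
}
```

A packet with an empty payload, like the one in the screenshot, returns 0 here regardless of its port numbers.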
Related
For servers built on raw TCP sockets, like nginx, how do servers correctly detect and handle non-HTTP messages among HTTP messages (skip those bytes and move on to the next valid HTTP message)?
It doesn't. If the first request on the connection doesn't start with a valid HTTP request line, it closes the connection. No other detection, no skipping, no moving to the next message.
A search for 'raw' at nginx.org returned no results.
I build an HTTP packet using the libnet build functions and send it with libnet_write. In Wireshark I can see that the packet is sent successfully and that the TCP and IP headers are all right. But the client cannot parse my packet. For example, the client doesn't load the HTML when I send an "HTTP/1.1 200 OK" packet, and it doesn't jump to the redirect site when I send an "HTTP/1.1 302 Moved Temporarily" packet.
You provide no code in your question, and no example of the packet you construct, but HTTP runs over TCP, and you mention writing only a single packet. This cannot work. Establishing a TCP connection requires writing a SYN packet, reading the SYN/ACK response, and writing a final ACK.
Only after this 3-packet exchange can a TCP packet with data be sent.
Also, it's not at all clear why you are trying to do this. If you want data to be received by an HTTP server, you should construct and send it with the normal socket APIs.
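For comparison, here is a minimal sketch of sending data with the normal socket API on a POSIX system. The helper name connect_and_send is hypothetical. The point is that connect() makes the kernel perform the SYN, SYN/ACK, ACK exchange for you, so the data is only sent on an established connection.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port over TCP and send msg.
 * Returns the connected socket (the caller closes it) or -1 on error. */
int connect_and_send(const char *ip, unsigned short port, const char *msg)
{
    struct sockaddr_in addr;
    size_t left = strlen(msg);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* connect() triggers the three-way handshake inside the kernel. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* send() may accept fewer bytes than requested, so loop until done. */
    while (left > 0) {
        ssize_t n = send(fd, msg, left, 0);
        if (n < 0) {
            close(fd);
            return -1;
        }
        msg += n;
        left -= (size_t)n;
    }
    return fd;
}
```

For example, connect_and_send("127.0.0.1", 8080, "GET / HTTP/1.0\r\n\r\n") would send a request to a server assumed to be listening on local port 8080.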
UPDATE :
I edited my question to focus it more on the problem.
Context
Coding to understand how loadbalancing works.
Debian 64 bits.
loadbalancer on 127.0.0.1:36001
backend on 127.0.0.1:36000
curl calling on loadbalancer (36001).
Problem
I used this code to create my socket: socket code
I created a very naive load balancer and a server, and I send a curl request to the load balancer.
My problem is that I don't understand how to pass the client's IP/port to the backend so that it answers curl directly when splicing.
The code
curl request
curl localhost:36001 -d "salute"
loadbalancer
static int pool_of_conn[2],
pool_of_pipes[2][2];
int sentinel , efd;
struct epoll_event event;
struct epoll_event *events;
...
off64_t offset = 0;
pipe(pool_of_pipes[0]);
/* This splice works but sends the loadbalancer ip and port. How to put the client's here ? How to alter events[i].data.fd ? */
bytes = splice(events[i].data.fd, NULL, pool_of_pipes[0][1], NULL, 4096, SPLICE_F_MOVE);
if (bytes == 0)
break;
splice(pool_of_pipes[0][0], NULL, pool_of_conn[0], NULL, bytes, SPLICE_F_MOVE);
Could you help me please ?
The fundamental issue is that HTTP runs over TCP, which is a connection-oriented protocol. So even if you could make the connection to the backend appear to come from the client's IP address, the TCP connection is established between the load balancer and the backend, and the return TCP packets use that connection too. To work around this, you would need HTTP to use a different TCP connection for the request and the reply, and you would also have to manipulate local IP routing so that the backend's reply still goes to the actual client.
If your intent is to make an HTTP load balancer, the easiest solution probably lies somewhere at the application protocol (i.e. HTTP) level. You could use HTTP redirects or something more complex, like wrapping the client request in some headers before forwarding it. I'm sure existing HTTP load balancers offer plenty of examples and ideas.
That said, if you want to solve this at the TCP connection level, transparently to the application protocols, it is doable, but it requires quite a bit of additional code. Also, the backend's reply still goes through the load balancer. My suggestion would be to look into the socket option IP_TRANSPARENT. It allows your load balancer to use the client's IP address as the source address when connecting to the backend. (Note that the return traffic also goes through the load balancer, so your splice() comes in handy.)
Here is an old SO question about IP_TRANSPARENT: Transparent proxying - how to pass socket to local server without modification?
and here is the Linux kernel documentation for it https://www.kernel.org/doc/Documentation/networking/tproxy.txt
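To give an idea of what this looks like in code, here is a minimal, Linux-only sketch. The helper names enable_transparent and connect_as_client are hypothetical. It assumes the process has CAP_NET_ADMIN (otherwise setsockopt fails with EPERM) and that routing is set up as described in the tproxy documentation so that the backend's replies come back through the load balancer; none of that is shown here.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19   /* value from <linux/in.h>, for older libc headers */
#endif

/* Hypothetical helper: mark fd as transparent so it may bind to a
 * non-local address. Returns 0 on success, -1 with errno set otherwise. */
int enable_transparent(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof one);
}

/* Hypothetical helper: open a TCP connection to the backend whose source
 * address is the client's address. Returns the socket or -1 on error. */
int connect_as_client(const struct sockaddr_in *client,
                      const struct sockaddr_in *backend)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Order matters: set the option, bind to the foreign (client)
     * address, then connect to the backend. */
    if (enable_transparent(fd) < 0 ||
        bind(fd, (const struct sockaddr *)client, sizeof *client) < 0 ||
        connect(fd, (const struct sockaddr *)backend, sizeof *backend) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The socket returned by connect_as_client could then take the place of pool_of_conn[0] in the splice() calls from the question.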
I have a basic packet sniffer like http://www.binarytides.com/packet-sniffer-code-c-linux/
I have extended it to process only packets on port 80 (HTTP). I am not sure how to get the host web address from the data. Can you help me here?
What I am trying to do is parse a subset of the HTTP headers in order to identify the host web address.
I found something similar to what I need : https://github.com/joyent/http-parser/blob/master/http_parser.h#L194
but the code is too complex...
Or where can I find a bytewise breakdown of the HTTP header, like the one for TCP at http://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_segment_structure
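You need to grab the TCP data, then look for "GET". HTTP headers are plain text lines rather than a fixed byte layout. A typical HTTP request looks like:
GET /index.html HTTP/1.1
Host: www.foo.com
The request line gives the method and the path. The web host name is carried in the Host header on one of the following lines, so you can extract the web host address from there. (Only when the request is sent to a proxy does the full URL, including the host, follow GET directly.)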
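Here is a minimal sketch of that extraction in C, much simpler than http-parser. It relies on the fact that HTTP/1.1 requests carry the host name in a "Host:" header line. The function name extract_host is hypothetical, and it assumes the whole header block is in one TCP segment, which is common for small requests but not guaranteed.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: copy the value of the Host header found in
 * payload[0..len) into out as a NUL-terminated string.
 * Returns 0 on success, -1 if there is no Host header or out is too small. */
int extract_host(const char *payload, size_t len, char *out, size_t outsz)
{
    const char *p = payload;
    const char *end = payload + len;

    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        size_t linelen;

        if (eol == NULL)
            eol = end;                      /* last, unterminated line */
        linelen = (size_t)(eol - p);
        if (linelen > 0 && p[linelen - 1] == '\r')
            linelen--;                      /* strip the CR of CRLF */
        if (linelen == 0)
            break;                          /* blank line: end of headers */

        /* Header names are case-insensitive. */
        if (linelen >= 5 && strncasecmp(p, "Host:", 5) == 0) {
            const char *v = p + 5;
            const char *vend = p + linelen;

            while (v < vend && (*v == ' ' || *v == '\t'))
                v++;                        /* skip leading whitespace */
            while (vend > v && (vend[-1] == ' ' || vend[-1] == '\t'))
                vend--;                     /* trim trailing whitespace */
            if ((size_t)(vend - v) >= outsz)
                return -1;
            memcpy(out, v, (size_t)(vend - v));
            out[vend - v] = '\0';
            return 0;
        }
        if (eol == end)
            break;
        p = eol + 1;
    }
    return -1;
}
```

Called on the TCP payload of a request that contains the line "Host: www.foo.com", it fills out with www.foo.com.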
I have a project in C in which I receive HTTP GET requests (on port 10000), process them, and send an appropriate response. I use the Winsock libraries for network connections. I also have a module that receives HTTPS requests on a different port (10001). The client chooses which port to send to: HTTP messages go to port 10000 and HTTPS requests go to port 10001.
This is how I know whether an incoming request is HTTP or HTTPS.
There is now a requirement that the client specify only one port number, whether for HTTP or HTTPS, i.e. it will send only to port 10000.
So when an HTTPS message arrives on port 10000, it will be fully encrypted, but I want that message to go to port 10001. Is there any way to differentiate an HTTP request from an HTTPS request at the server level?
If the first byte coming from the client is 0x16, it's the beginning of an SSL handshake. Since that is not a possible start of an HTTP request, you can differentiate requests by this property.
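A minimal sketch of that check follows. The enum and the function name classify_first_bytes are hypothetical. The HTTP branch assumes the request starts with a standard method name, all of which begin with an uppercase ASCII letter. To look at the first byte without consuming it, you can call recv() with the MSG_PEEK flag, which Winsock also supports, and then hand the socket to the HTTP or HTTPS code path.

```c
#include <stddef.h>

enum proto { PROTO_UNKNOWN, PROTO_HTTP, PROTO_TLS };

/* Hypothetical helper: classify a new connection from the first bytes
 * the client sent. */
enum proto classify_first_bytes(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return PROTO_UNKNOWN;          /* nothing received yet */
    if (buf[0] == 0x16)
        return PROTO_TLS;              /* TLS record type 22: handshake */
    if (buf[0] >= 'A' && buf[0] <= 'Z')
        return PROTO_HTTP;             /* assumption: GET, POST, ... */
    return PROTO_UNKNOWN;
}
```

Anything classified as PROTO_UNKNOWN is probably best rejected rather than guessed at.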