Sending Image Data via HTTP Websockets in C

I'm currently trying to build a library similar to ExpressJS in C. I have the ability to send any text (with res.send() functionality) or textually formatted file (.html, .txt, .css, etc.).
However, sending image data seems to cause a lot more trouble! I'm trying to use pretty much the exact same process I used for reading textual files. I saw this post and answer which uses a MAXLEN variable, which I would like to avoid. First, here's how I'm reading the data in:
// fread char *, goes 64 chars at a time
char *read_64 = malloc(sizeof(char) * 64);
// the entirety of the file data is placed in full_data
int *full_data_max = malloc(sizeof(int)), full_data_index = 0;
*full_data_max = 64;
char *full_data = malloc(sizeof(char) * *full_data_max);
full_data[0] = '\0';
// start reading 64 characters at a time from the file while fread gives positive feedback
size_t fread_response_length = 0;
while ((fread_response_length = fread(read_64, sizeof(char), 64, f_pt)) > 0) {
// internal array checker to make sure full_data has enough space
full_data = resize_array(full_data, full_data_max, full_data_index + 65, sizeof(char));
// copy contents of read_64 into full_data
for (int read_data_in = 0; read_data_in < fread_response_length / sizeof(char); read_data_in++) {
full_data[full_data_index + read_data_in] = read_64[read_data_in];
}
// update the entirety data current index pointer
full_data_index += fread_response_length / sizeof(char);
}
full_data[full_data_index] = '\0';
I believe the error is related to this component. Perhaps it has something to do with how I calculate the data length from the fread() return values? I'll take you through how the HTTP response is created as well.
I split the response sending into two components (as per the response on this question here). First I send my header, which looks good (29834 seems a bit large for image data, but that is an unjustified thought):
HTTP/1.1 200 OK
Content-Length: 29834
Content-Type: image/jpg
Connection: Keep-Alive
Access-Control-Allow-Origin: *
I send this first using the following code:
int *head_msg_len = malloc(sizeof(int));
// internal header builder that builds the aforementioned header
char *main_head_msg = create_header(status, head_msg_len, status_code, headers, data_length);
// send header
int bytes_sent = 0;
while ((bytes_sent = send(sock, main_head_msg + bytes_sent, *head_msg_len - bytes_sent / sizeof(char), 0)) < sizeof(char) * *head_msg_len);
Sending the image data (body)
Then I use a similar setup to try sending the full_data element that has the image data in it:
bytes_sent = 0;
while ((bytes_sent = send(sock, full_data + bytes_sent, full_data_index - bytes_sent, 0)) < full_data_index);
So, this all seems reasonable to me! I've even taken a look at the original file and the file I got back from curl, and they each start and end with the exact same sequence:
Original (| implies a skip for easy reading):
�PNG
�
IHDR��X��d�IT pHYs
|
|
|
RU�X�^Q�����땵I1`��-���
#QEQEQEQEQE~��#��&IEND�B`�
Post using curl:
�PNG
�
IHDR��X��d�IT pHYs
|
|
|
RU�X�^Q�����땵I1`��-���
#QEQEQEQEQE~��#��&IEND�B`
However, trying to open the file that was created after curling results in corruption errors. Similar issues occur on the browser as well. I'm curious if this could be an off by one or something small.
Edit:
If you would like to see the full code, check out this branch on Github.
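One thing that may be worth checking in the two send loops above: they assign the return value of send() to bytes_sent instead of adding to it, so after a short write the offset and the remaining length no longer line up. A minimal sketch of a loop that keeps a running total is shown here. The helper name send_all is made up for illustration and is not part of the library in the question.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the question's code): keep calling send()
 * until all `len` bytes have been written.
 * Returns 0 on success, -1 if send() reports an error. */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;
    size_t total = 0;                 /* running total across all calls */

    while (total < len) {
        ssize_t n = send(sock, p + total, len - total, 0);
        if (n < 0)
            return -1;                /* caller can inspect errno */
        total += (size_t)n;           /* accumulate, do not overwrite */
    }
    return 0;
}
```

With something like this, the header and the body could each be sent with one call, for example send_all(sock, full_data, full_data_index). Because the length is passed explicitly, embedded zero bytes in the image data are not a problem.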

Related

what is wrong with the following while loop?

I have a C Winsock code section where a client receives a comma-delimited stream of file fingerprints, as shown below. I need to extract the fingerprints from the stream using strtok_s() in a while loop. My problem is that most of the time the client does not extract the exact number of fingerprints sent by the server, even though the data received (observed by debugging) is exactly what the server sent.
What am I missing here?
recv_size = recv(clnt_sock, fp_buf, BUF_LEN, 0);
received_fp_size += recv_size;
if (0 != (last_string_len = recv_size % 33))
strncpy(last_string, &fp_buf[(recv_size - last_string_len)], last_string_len);//
while (recv_size > 0)
{
unique_fp = strtok_s(fp_buf, ",", &strtk);
k:
while (unique_fp != NULL)
{
memcpy(unique_fp_buf[unique_files_count], unique_fp, 32);
unique_fp = strtok_s(NULL, ",", &strtk);
unique_files_count++;
}
recv_size = recv(clnt_sock, fp_buf, BUF_LEN, 0);
received_fp_size += recv_size;
if (last_string_len > 0)
{
unique_fp = strtok_s(fp_buf, ",", &strtk);
strncat_s(last_string, unique_fp, strlen(unique_fp));
memcpy(unique_fp, last_string, 32);
last_string_len = 0;
goto k;
}
}
The reason behind the if (0 != (last_string_len = recv_size % 33)) line is that the server sends a multiple of 33-byte strings (32 for the fingerprint and 1 for the comma delimiter).
One problem is that you never check that fp_buf actually contains a complete token. For instance if the first call only receives 20 bytes, your code will fail by copying a partial fingerprint.
I think another problem is here:
memcpy(unique_fp, last_string, 32);
It seems you are copying into the receive buffer and therefore overwriting data that you haven't processed yet. Further, you may overwrite a token.
Maybe you actually wanted:
memcpy(unique_fp_buf[unique_files_count], last_string, 32);
^^^^^^^^^^^
unique_fp = strtok_s(NULL, ",", &strtk);
unique_files_count++;
Besides that, I think you are making the code much more complicated than needed. The use of a goto kind of tells you that your design is wrong.
Instead of using a last_string you could do:
1) Call recv
2) Process all complete fingerprints
3) Copy the remainder (i.e. the last partial fingerprint) to the start of fp_buf
4) Call recv with an offset into fp_buf
5) Repeat from step 2 (i.e. use a while loop, don't use goto)
Step 4 could be something like:
recv_size = recv(clnt_sock, fp_buf + length_of_remainder , BUF_LEN - length_of_remainder, 0);
In that way you don't have to handle the last_string stuff
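Steps 2 and 3 could be sketched roughly as follows. This is only an illustration: consume_fingerprints, FP_LEN and REC_LEN are invented names, and it assumes every fingerprint is exactly 32 bytes followed by one comma, as described in the question.

```c
#include <stddef.h>
#include <string.h>

#define FP_LEN   32               /* fingerprint length, from the question */
#define REC_LEN  (FP_LEN + 1)     /* fingerprint plus the comma delimiter */

/* Hypothetical helper: copy every complete record in buf[0..len) into out,
 * move the trailing partial record to the start of buf and return its
 * length. *count is advanced by the number of fingerprints stored. */
static size_t consume_fingerprints(char *buf, size_t len,
                                   char out[][FP_LEN], size_t max_out,
                                   size_t *count)
{
    size_t pos = 0;

    while (len - pos >= REC_LEN && *count < max_out) {
        memcpy(out[*count], buf + pos, FP_LEN);   /* the comma is skipped */
        (*count)++;
        pos += REC_LEN;
    }
    memmove(buf, buf + pos, len - pos);           /* keep the remainder */
    return len - pos;
}

/* Intended use inside the receive loop (rem starts at 0):
 *   recv_size = recv(clnt_sock, fp_buf + rem, BUF_LEN - rem, 0);
 *   rem = consume_fingerprints(fp_buf, rem + recv_size,
 *                              unique_fp_buf, max_files, &unique_files_count);
 */
```

Working with fixed record lengths instead of strtok_s also means the buffer does not have to be NUL-terminated after each recv.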

Getting Host field from TCP packet payload

I'm writing a kernel module in C, and trying to get the Host field from a TCP packet's payload, carrying http request headers.
I've managed to do something similar with FTP (scan the payload and look for FTP commands), but I can't seem to be able to do the same and find the field.
My module is connected to the POST_ROUTING hook.
Each packet that reaches that hook with a destination port of 80 is treated as an HTTP packet, and my module starts to parse it.
For some reason, I can't get the Host line (as a matter of fact, I only see the server's HTTP 200 OK).
Do these headers always appear in packets that use port 80?
If so, what is the best way to parse those packets' payloads? Going char by char seems like a lot of work. Is there a better way?
Thanks
EDIT:
Got some progress.
every packet I get from the server, I can read the payload with no problem. but every packet I send - it's like the payload is empty.
I thought it's a problem of skb pointer, but i'm getting the TCP ports fine. just can't seem to read this damn payload.
this is how i parse it:
unsigned char* user_data = (unsigned char *)((int)tcphd + (int)(tcphd->doff * 4));
unsigned char *it;
for (it = user_data; it != tail; ++it) {
unsigned char c = *(unsigned char *)it;
http_command[http_command_index] = c;
http_command_index++;
}
where tail:
tail = skb_tail_pointer(skb);
The pointer doesn't advance at all on the loop. it's like it's empty from the start or something, and I can't figure out why.
help, please.
I've managed to solve this.
Using this, I've figured out how to parse all of the packet's payload.
I hope this code explains it:
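/* Offset of the TCP payload, counted from the start of the IP header. */
int http_command_offset = iphd->ihl*4 + tcphd->doff*4;
int http_command_length = skb->len - http_command_offset;
http_command = kmalloc(http_command_length + 1, GFP_ATOMIC);
/* skb_copy_bits returns 0 on success; terminate the copy so it can be used as a string. */
if (http_command && skb_copy_bits(skb, http_command_offset, (void*)http_command, http_command_length) == 0)
http_command[http_command_length] = '\0';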
skb_copy_bits just copies the entire payload into the buffer I've created. Parsing it now is pretty simple.
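For the parsing step itself, something along these lines might work once the payload is in a flat buffer. It is a userspace sketch, not kernel code: find_host is an invented name, and in a module the includes would be the kernel's own string headers rather than the ones used here.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: scan an HTTP request held in payload[0..len) line by
 * line and copy the value of the Host header into out.
 * Returns 0 on success, -1 if no Host header is found before the blank line
 * that ends the headers, or if out is too small. */
static int find_host(const char *payload, size_t len, char *out, size_t outsz)
{
    const char *p = payload;
    const char *end = payload + len;

    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        const char *line_end = eol ? eol : end;

        if (line_end > p && line_end[-1] == '\r')
            line_end--;                       /* drop the CR of a CRLF */
        if (line_end == p)
            break;                            /* blank line: headers are over */

        if (line_end - p >= 5 && strncasecmp(p, "Host:", 5) == 0) {
            const char *v = p + 5;
            size_t n;

            while (v < line_end && (*v == ' ' || *v == '\t'))
                v++;                          /* skip optional whitespace */
            n = (size_t)(line_end - v);
            if (n + 1 > outsz)
                return -1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 0;
        }
        if (!eol)
            break;                            /* last line had no newline */
        p = eol + 1;
    }
    return -1;
}
```

It takes an explicit length, so it does not depend on the payload being NUL-terminated, and it walks whole lines with memchr instead of copying char by char.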

How to Match a URL in UDP payload using POSIX regexec and libpcap in C

I'm trying to capture the URL from a UDP payload using libpcap in C with POSIX regex. I have tried every method I could think of, but nothing returns a hit.
I have pasted the part of my code where I'm trying to capture the URL that comes with the UDP payload.
size_udp = 8;
udp = (struct sniff_udp*)(pktptr + ETHER_HDRLEN + size_udp);
payload_udp = (u_char *)(pktptr + ETHER_HDRLEN + size_ip + size_udp);
size_payload_udp = ntohs(ip->ip_len) - (size_ip + size_udp);
int reg,sh;
regex_t re;
regmatch_t pm;
char *hit;
reg = regcomp(&re, ( "\.youtube\.com", "\.googlevideo\.com","ytimg"), REG_EXTENDED);
sh = regexec(&re, &payload_udp, 2, &pm, REG_EXTENDED);
strcpy(hit, payload_udp + (pm.rm_so - pm.rm_eo));
if(
(strstr(hit,"youtube") != NULL)
|| (strstr(hit,"googlevideo") != NULL)
|| (strstr(hit,"video") != NULL)
|| (strstr(hit,"ytimg") != NULL)
)
{
//Writing to dump file
pcap_dump(usr, pkthdr, pktptr - lnkhdrlen);
}
This is my code. I would like to know why the regex doesn't match the YouTube URL in the UDP payload.
Thank You for your suggestion
One possible reason is this line:
reg = regcomp(&re, ( "\.youtube\.com", "\.googlevideo\.com","ytimg"), REG_EXTENDED);
In your second argument the expressions for youtube and googlevideo are unused: the parenthesized list is a comma expression, which evaluates to its last operand. That is, what is actually compiled is this:
reg = regcomp(&re, "ytimg", REG_EXTENDED);
Your compiler should have warned about this...
Moreover, in
sh = regexec(&re, &payload_udp, 2, &pm, REG_EXTENDED);
some of the arguments do not make sense. pm is a single match structure, yet you tell regexec that it can store 2. &payload_udp is the address of the pointer to your payload, not a pointer to the string you are searching. REG_EXTENDED is only needed when compiling the regex, not when executing it.
sh (the return value) already tells you whether there was a match (it returns 0) or not (it returns REG_NOMATCH), so there is no need to copy and strstr. By the way, your strcpy copies (without limit) to whatever arbitrary memory location the uninitialized hit happens to point to, and it keeps copying until it finds a '\0' byte.
Finally, if your udp payload is not a null-terminated string (or at least starts with the null-terminated string you want to match against) the approach with regexec will not help.
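Putting those points together, a rough sketch could look like this. The helper names compile_video_regex and payload_matches are invented for illustration. The three patterns are joined with | so that all of them are compiled, and the payload is copied into a NUL-terminated buffer before regexec sees it. Replacing embedded zero bytes with spaces is my own assumption, made so that the search does not stop at the first zero byte.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compile the three patterns from the question as one
 * extended regex using alternation. Returns regcomp()'s result (0 = ok). */
static int compile_video_regex(regex_t *re)
{
    return regcomp(re, "\\.youtube\\.com|\\.googlevideo\\.com|ytimg",
                   REG_EXTENDED | REG_NOSUB);
}

/* Hypothetical helper: returns 1 if the payload matches, 0 if it does not,
 * -1 if memory could not be allocated. */
static int payload_matches(const regex_t *re,
                           const unsigned char *payload, size_t len)
{
    char *copy = malloc(len + 1);
    size_t i;
    int rc;

    if (!copy)
        return -1;
    for (i = 0; i < len; i++)          /* assumption: zero bytes become ' ' */
        copy[i] = payload[i] ? (char)payload[i] : ' ';
    copy[len] = '\0';                  /* regexec needs a terminated string */

    rc = regexec(re, copy, 0, NULL, 0);   /* REG_NOSUB: no match array needed */
    free(copy);
    return rc == 0;
}
```

In the capture callback this would be called as payload_matches(&re, payload_udp, size_payload_udp), with the regex compiled once up front and released with regfree when capturing ends.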

Missing characters from input stream from fastcgi request

I'm trying to develop simple RESTful api using FastCGI (and restcgi). When I tried to implement POST method I noticed that the input stream (representing request body) is wrong. I did a little test and looks like when I try to read the stream only every other character is received.
Body sent: name=john&surname=smith
Received: aejh&unm=mt
I've tried more clients just to make sure it's not the client messing with the data.
My code is:
int main(int argc, char* argv[]) {
// FastCGI initialization.
FCGX_Init();
FCGX_Request request;
FCGX_InitRequest(&request, 0, 0);
while (FCGX_Accept_r(&request) >= 0) {
// FastCGI request setup.
fcgi_streambuf fisbuf(request.in);
std::istream is(&fisbuf);
fcgi_streambuf fosbuf(request.out);
std::ostream os(&fosbuf);
std::string str;
is >> str;
std::cerr << str; // this way I can see it in apache error log
// restcgi code here
}
return 0;
}
I'm using fast_cgi module with apache (not sure if that makes any difference).
Any idea what I am doing wrong?
The problem is in fcgio.cpp
The fcgi_streambuf class is defined using char_type, but the int underflow() method downcasts its return value to (unsigned char). It should cast to (char_type).
I encountered this problem as well, on an unmodified Debian install.
I found that the problem went away if I supplied a buffer to the fcgi_streambuf constructor:
const size_t LEN = ... // whatever, it doesn't have to be big.
vector<char> v (LEN);
fcgi_streambuf buf (request.in, &v[0], v.size());
iostream in (&buf);
string s;
getline(in, s); // s now holds the correct data.
After finding no answer anywhere (not even FastCGI mailing list) I dumped the original fastcgi libraries and tried using fastcgi++ libraries instead. The problem disappeared. There are also other benefits - c++, more features, easier to use.
Use is.read() not is >> ...
Sample from restcgi documentation:
clen = strtol(clenstr, &clenstr, 10);
if (*clenstr)
{
cerr << "can't parse \"CONTENT_LENGTH="
<< FCGX_GetParam("CONTENT_LENGTH", request->envp)
<< "\"\n";
clen = STDIN_MAX;
}
// *always* put a cap on the amount of data that will be read
if (clen > STDIN_MAX) clen = STDIN_MAX;
*content = new char[clen];
is.read(*content, clen);
clen = is.gcount();

HTTP protocol: end of a message body

I built a program that parses the header and I would like to read the message body in case I receive a POST.
For headers, I have been able to look for the blank line that terminates them to determine when the header ends. I am having more issues with the message body. Am I supposed to look at the "Content-Length" field to know when to stop reading input? In my current code (below), it will not stop until I hit the red cross (stop loading page) in Firefox.
Here is the code:
size_t n;
unsigned char newChar;
int index = 0;
int capacity = 50;
char *option = (char *) malloc(sizeof(char) * capacity);
while ( ( n = read( req->socket, &newChar, sizeof(newChar) ) ) > 0 ) {
if (newChar == '\0' || newChar == '\n') break; // This is not working
if (index == capacity) {
capacity *= 2;
option = (char *) realloc(option, sizeof(char) * capacity);
assert(option != NULL);
}
option[index++] = newChar;
fprintf(stderr, "%c", newChar);
}
if (index == capacity) {
capacity *= 2;
option = (char *) realloc(option, sizeof(char) * capacity);
assert(option != NULL);
}
option[index] = '\0';
The correct input gets printed, but I wonder why it won't stop until the stop loading button is pressed. I'd like to know if there is any other solution or if I need to use the "Content-Length" field in the header.
Thank you very much,
Jary
There are a few things to consider, and you'll want to decide how to handle each of these cases.
In HTTP 1.0, closing the connection was used to signal the end of the data.
This was improved in HTTP 1.1, which supports persistent connections. For HTTP 1.1 you typically set or read the Content-Length header to know how much data to expect.
Finally, HTTP 1.1 also allows "chunked" mode: each chunk is preceded by its size, and you know you've reached the end when a chunk of size 0 is found.
Also, do you know about libcurl? It will certainly help you avoid having to reinvent the wheel.
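For the Content-Length case, a minimal sketch might look like the following. Both helper names are invented, it assumes the header block has already been read into a NUL-terminated string, and it does not cover chunked encoding.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */
#include <unistd.h>

/* Hypothetical helper: find Content-Length in a NUL-terminated header block.
 * Returns the value, or -1 if the header is absent or malformed. */
static long content_length(const char *headers)
{
    const char *p = headers;

    while (*p) {
        if (strncasecmp(p, "Content-Length:", 15) == 0) {
            char *end;
            long v = strtol(p + 15, &end, 10);
            return (end != p + 15 && v >= 0) ? v : -1;
        }
        p = strchr(p, '\n');           /* move on to the next header line */
        if (!p)
            break;
        p++;
    }
    return -1;
}

/* Hypothetical helper: read exactly `want` body bytes from fd.
 * Returns the number of bytes read, which is less than `want` only if the
 * peer closed the connection early or read() failed. */
static size_t read_body(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    return got;
}
```

The point is that the loop stops after a known number of bytes instead of waiting for a terminator character, so it returns even when the client keeps the connection open.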
This code blocks on the read() waiting for another character which never comes.
Additionally, RFC2616, 3.7.1 states "HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP. In addition, if the text is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks."
So you're going to need to catch more than just "\n".

Resources