C String Checking - c

I'm new to C specifically and I'm trying to check some strings.
The following is my code, with comments marking the behaviour I don't understand:
if (strstr(recBuff, "GET / HTTP/1.0\r\n\r\n") != NULL)
//Send HTTP/1.0 200
//This gets recognised fine
else if (strstr(recBuff, "GET / HTTP/1.0\r\r") != NULL)
//Send HTTP/1.0 200
//This gets recognised fine
else if (strstr(recBuff, "GET / HTTP/1.0\r\n") != NULL)
//Do something else
//This never gets picked up, and instead goes to the final else...
else
//HTTP/1.0 404
//Etc
I guess my question is why is strstr picking up \r\n\r\n and acting on it, but just \r\n by itself goes through all the way until the final else? There's an else for \r\n\r\n that works, but the else for a single \r\n doesn't work for a single \r\n.
TL;DR "GET / HTTP/1.0\r\n\r\n" gets picked up, but "GET / HTTP/1.0\r\n" doesn't.

You've not reduced your code to an SSCCE (Short, Self-Contained, Correct Example), so we can't tell what you're doing wrong. Most likely, the data does not actually contain the byte sequence you think it does. Only a hex dump or something similar will show that for sure.
Here's an SSCCE which shows that your code can work if given the correct data:
#include <stdio.h>
#include <string.h>
int main(void)
{
char *examples[] =
{
"YYYYGET / HTTP/1.0\r\nExample 1 Single CRLF",
"YYYYGET / HTTP/1.0\r\n\r\nExample 2 Double CRLF",
"YYYYGET / HTTP/1.0\r\r\nExample 3 Double CR",
"YYYYGET / HTTP/1.0\n\nExample 4 Double NL",
};
for (int i = 0; i < 4; i++)
{
char *recBuff = examples[i];
printf("Data:\n%s\n", recBuff);
if (strstr(recBuff, "GET / HTTP/1.0\r\n\r\n") != NULL)
printf("Option 1 - double CRLF\n");
else if (strstr(recBuff, "GET / HTTP/1.0\r\r") != NULL)
printf("Option 2 - double CR\n");
else if (strstr(recBuff, "GET / HTTP/1.0\r\n") != NULL)
printf("Option 3 - single CRLF\n");
else
printf("Option 4 - no match\n");
}
return 0;
}
Sample output
$ ./counter-example
Data:
YYYYGET / HTTP/1.0
Example 1 Single CRLF
Option 3 - single CRLF
Data:
YYYYGET / HTTP/1.0
Example 2 Double CRLF
Option 1 - double CRLF
Data:
YYYYGET / HTTP/1.0
Example 3 Double CR
Option 2 - double CR
Data:
YYYYGET / HTTP/1.0
Example 4 Double NL
Option 4 - no match
$
So, if you are not seeing something similar with your code, you aren't getting the data you thought you were getting.
The YYYY part is not necessary to the reproduction; neither is the Example n information. The trailing part makes sure the fairly difficult to discriminate strings are recognizable; the YYYY is arguably fluff since the HTTP protocol would not start with such garbage.
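To see which line-ending bytes recBuff really holds, something along these lines could stand in for the hex dump mentioned above. It is only a sketch: show_bytes is a hypothetical helper name, not part of the original code, and it simply renders CR and LF as visible escapes and any other non-printable byte as \xHH.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a printable rendering of buf[0..len) into out.
 * CR and LF become the two-character escapes \r and \n, other non-printable
 * bytes become \xHH. The result is NUL-terminated and truncated if out is
 * too small. Returns the number of characters written (excluding the NUL).
 */
size_t show_bytes(const char *buf, size_t len, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        char piece[5];               /* longest piece is "\xHH" plus NUL */
        int n;

        if (c == '\r')
            n = snprintf(piece, sizeof piece, "\\r");
        else if (c == '\n')
            n = snprintf(piece, sizeof piece, "\\n");
        else if (c >= 0x20 && c < 0x7F)
            n = snprintf(piece, sizeof piece, "%c", c);
        else
            n = snprintf(piece, sizeof piece, "\\x%02X", c);

        if (used + (size_t)n >= out_size)
            break;                   /* no room left: stop and keep it terminated */
        memcpy(out + used, piece, (size_t)n + 1);
        used += (size_t)n;
    }
    return used;
}
```

Calling it on the received buffer with the byte count returned by recv() (rather than strlen()) and printing the result before the strstr() chain shows at a glance whether the request ends in \r\n, \n, or something else.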

Related

Sending Image Data via HTTP Websockets in C

I'm currently trying to build a library similar to ExpressJS in C. I have the ability to send any text (with res.send() functionality) or textually formatted file (.html, .txt, .css, etc.).
However, sending image data seems to cause a lot more trouble! I'm trying to use pretty much the exact same process I used for reading textual files. I saw this post and answer which uses a MAXLEN variable, which I would like to avoid. First, here's how I'm reading the data in:
// fread char *, goes 64 chars at a time
char *read_64 = malloc(sizeof(char) * 64);
// the entirety of the file data is placed in full_data
int *full_data_max = malloc(sizeof(int)), full_data_index = 0;
*full_data_max = 64;
char *full_data = malloc(sizeof(char) * *full_data_max);
full_data[0] = '\0';
// start reading 64 characters at a time from the file while fread gives positive feedback
size_t fread_response_length = 0;
while ((fread_response_length = fread(read_64, sizeof(char), 64, f_pt)) > 0) {
// internal array checker to make sure full_data has enough space
full_data = resize_array(full_data, full_data_max, full_data_index + 65, sizeof(char));
// copy contents of read_64 into full_data
for (int read_data_in = 0; read_data_in < fread_response_length / sizeof(char); read_data_in++) {
full_data[full_data_index + read_data_in] = read_64[read_data_in];
}
// update the entirety data current index pointer
full_data_index += fread_response_length / sizeof(char);
}
full_data[full_data_index] = '\0';
I believe the error is related to this component here. Likely something with calculating data length with fread() responses perhaps? I'll take you through the HTTP response creating as well.
I split the response sending into two components (as per the response on this question here). First I send my header, which looks good (29834 seems a bit large for image data, but that is an unjustified thought):
HTTP/1.1 200 OK
Content-Length: 29834
Content-Type: image/jpg
Connection: Keep-Alive
Access-Control-Allow-Origin: *
I send this first using the following code:
int *head_msg_len = malloc(sizeof(int));
// internal header builder that builds the aforementioned header
char *main_head_msg = create_header(status, head_msg_len, status_code, headers, data_length);
// send header
int bytes_sent = 0;
while ((bytes_sent = send(sock, main_head_msg + bytes_sent, *head_msg_len - bytes_sent / sizeof(char), 0)) < sizeof(char) * *head_msg_len);
Sending the image data (body)
Then I use a similar setup to try sending the full_data element that has the image data in it:
bytes_sent = 0;
while ((bytes_sent = send(sock, full_data + bytes_sent, full_data_index - bytes_sent, 0)) < full_data_index);
So, this all seems reasonable to me! I've even compared the original file with the file I get back from curl, and they each start and end with the exact same sequence:
Original (| implies a skip for easy reading):
�PNG
�
IHDR��X��d�IT pHYs
|
|
|
RU�X�^Q�����땵I1`��-���
#QEQEQEQEQE~��#��&IEND�B`�
Post using curl:
�PNG
�
IHDR��X��d�IT pHYs
|
|
|
RU�X�^Q�����땵I1`��-���
#QEQEQEQEQE~��#��&IEND�B`
However, trying to open the file that was created after curling results in corruption errors. Similar issues occur on the browser as well. I'm curious if this could be an off by one or something small.
Edit:
If you would like to see the full code, check out this branch on Github.
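One thing that may be worth checking in the two send loops above: send() returns the number of bytes written by that one call, so assigning it straight back to bytes_sent loses the running total whenever a write is partial. A minimal sketch of a loop that keeps a separate total is shown here. The name send_all is made up for illustration, and this is not claimed to be the cause of the corruption, only a common pattern for binary payloads.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: keeps calling send() until all len bytes of data
 * have been written. Returns 0 on success and -1 on error. The length is
 * passed explicitly, so embedded NUL bytes in image data are not a problem.
 */
int send_all(int sock, const void *data, size_t len)
{
    const char *p = data;
    size_t sent = 0;                 /* running total across calls */

    while (sent < len) {
        ssize_t n = send(sock, p + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;           /* add this call's count to the total */
    }
    return 0;
}
```

With a helper like this, the header and the body would each be sent with one call, for example send_all(sock, full_data, full_data_index), and the Content-Length value should equal that same byte count.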

extract number from http request

I want to extract some number from HTTP Get requests in C.
for example if my HTTP request is like:
GET /getUIKVal?mdn=9860436150 HTTP/1.1
Host: api.end.point
I want number 9860436150 to be printed as output.
I have already tried with sscanf() and atoi()
You can simply use sscanf like below.
char* line = "GET /getUIKVal?mdn=9860436150 HTTP/1.1";
long long val ;
int ret = sscanf(line, "%*[^=]=%lld",&val);
printf("%lld\n", val) ;
Where %*[^=]= will read and discard everything up to and including the first =
and %lld will read the actual number into val.
You could use strstr to identify marker mdn= and then either scan a number or a string. Note that for scanning the number you don't need to copy the respective contents; The code below shows how:
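const char* content = "GET /getUIKVal?mdn=9860436150 HTTP/1.1";
const char* startOfNumber = strstr(content, "mdn=");
if (startOfNumber) {
    /* step over the marker so we point at the first digit */
    startOfNumber += strlen("mdn=");
    /* 9860436150 does not fit in 32 bits, so use long long */
    long long number;
    /* sscanf reads from the string; plain scanf would read from stdin */
    if (sscanf(startOfNumber, "%lld", &number) == 1) {
        printf("the number is... %lld", number);
    } else {
        printf("no valid number after 'mdn='");
    }
} else {
    printf("marker 'mdn=' not found.");
}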
I actually prefer the strstr solution over one where scanf identifies both the marker and the number: when a single format string does both jobs, it is hard to tell which part of the input failed to match.
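The same idea can also be written with strtoll instead of sscanf, which reports overflow and makes the "no digits" case explicit. This is only a sketch: extract_mdn is a hypothetical function name, and the marker "mdn=" is taken from the example request above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: finds "mdn=" in request and parses the decimal
 * number that follows it. Returns 1 and stores the value in *out on
 * success; returns 0 if the marker is missing, no digits follow it,
 * or the value does not fit in a long long.
 */
int extract_mdn(const char *request, long long *out)
{
    static const char marker[] = "mdn=";
    const char *start = strstr(request, marker);
    char *end;
    long long value;

    if (start == NULL)
        return 0;                        /* marker not present */
    start += sizeof(marker) - 1;         /* skip past "mdn=" */

    errno = 0;
    value = strtoll(start, &end, 10);
    if (end == start || errno == ERANGE)
        return 0;                        /* no digits, or out of range */

    *out = value;
    return 1;
}
```

Note that strtoll accepts leading whitespace and a sign, so a stricter parser might first check that the character right after the marker is a digit.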

Remove byte-order-mark in R/C

This SO post has an example of a server that generates json with a byte order mark. RFC7159 says:
Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
Currently yajl and hence jsonlite choke on the BOM. I would like to follow the RFC suggestion and ignore the BOM from the UTF8 string if present. What is an efficient way to do this? A naive implementation:
if(substr(json, 1, 1) == "\uFEFF"){
json <- substring(json, 2)
}
However substr is a bit slow for large strings, and I am not sure this is the correct way to do this. Is there a more efficient way in R or C to remove the BOM if present?
A simple solution:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::string stripBom(std::string x) {
if (x.size() < 3)
return x;
if (x[0] == '\xEF' && x[1] == '\xBB' && x[2] == '\xBF')
return x.substr(3);
return x;
}
/*** R
x <- "\uFEFFabcdef"
print(x)
print(stripBom(x))
identical(x, stripBom(x))
utf8ToInt(x)
utf8ToInt(stripBom(x))
*/
gives
> x <- "\uFEFFabcdef"
> print(x)
[1] "abcdef"
> print(stripBom(x))
[1] "abcdef"
> identical(x, stripBom(x))
[1] FALSE
> utf8ToInt(x)
[1] 65279 97 98 99 100 101 102
> utf8ToInt(stripBom(x))
[1] 97 98 99 100 101 102
EDIT: What might also be useful is seeing how R does it internally -- there are a number of situations where R strips BOM (e.g. for its scanners and file readers). See:
https://github.com/wch/r-source/blob/bfe73ecd848198cb9b68427cec7e70c40f96bd72/src/main/scan.c#L455-L458
https://github.com/wch/r-source/blob/bfe73ecd848198cb9b68427cec7e70c40f96bd72/src/main/connections.c#L3950-L3957
Based on Kevin's Rcpp example I used the following C function to check for the bom:
SEXP R_parse(SEXP x) {
/* get data from R */
const char* json = translateCharUTF8(asChar(x));
/* ignore BOM as suggested by RFC */
if(json[0] == '\xEF' && json[1] == '\xBB' && json[2] == '\xBF'){
warning("JSON string contains UTF8 byte-order-mark!");
json = json + 3;
}
/* parse json */
char errbuf[1024];
yajl_val node = yajl_tree_parse(json, errbuf, sizeof(errbuf));
}
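The BOM check can be pulled out into a small standalone function, which keeps it independent of R and yajl. The sketch below assumes a NUL-terminated UTF-8 string. skip_utf8_bom is a hypothetical name, and the comparison goes through unsigned char so it does not depend on whether plain char is signed.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns a pointer past a leading UTF-8 byte order
 * mark (EF BB BF) if one is present, otherwise returns s unchanged.
 * The && chain stops at the first mismatch, so a string shorter than
 * three bytes is never read past its terminating NUL.
 */
const char *skip_utf8_bom(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;

    if (u[0] == 0xEF && u[1] == 0xBB && u[2] == 0xBF)
        return s + 3;
    return s;
}
```

No copy is made, so the cost is constant regardless of the length of the JSON text, which addresses the concern about substr being slow on large strings.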

Formal grammar of XML

I'm trying to build a small parser for XML files in C. I know I could find a finished solution, but I need just some basic functionality for an embedded project. I'm trying to create a grammar that describes XML without attributes, just tags, but it does not seem to work and I have not been able to figure out why.
Here is the grammar:
XML : FIRST_TAG NIZ
NIZ : VAL NIZ | eps
VAL : START VAL END
| STR
| eps
Here is the part of the C code that implements this grammar:
void check() {
getSymbol();
if( sym == FIRST_LINE )
{
niz();
}
else {
printf("FIRST_LINE EXPECTED");
exit(1);
}
}
void niz() {
getSymbol();
if( sym == ERROR )
return;
if( sym == START ) {
back = 1;
val();
niz();
}
printf(" EPS OR START EXPECTED\n");
}
void val() {
getSymbol();
if( sym == ERROR )
return;
if( sym == START ) {
back = 0;
val();
getSymbol();
if( sym != END ) {
printf("END EXPECTED");
exit(1);
}
return;
}
if( sym == EMPTY_TAG || sym == STR)
return;
printf("START, STR, EMPTY_TAG OR EPS EXPECTED\n");
exit(1);
}
void getSymbol() {
int pom;
if(back == 1) {
back = 0;
return;
}
sym = getNextToken(cmd + offset, &pom);
offset += pom + 1;
}
EDIT: Here is the example of XML file that does not satisfy this grammar:
<?xml version="1.0"?>
<VATCHANGES>
<DATE>15/08/2012</DATE>
<TIME>1452</TIME>
<EFDSERIAL>01KE000001</EFDSERIAL>
<CHANGENUM>1</CHANGENUM>
<VATRATE>A</VATRATE>
<FROMVALUE>16.00</FROMVALUE>
<TOVALUE>18.00</TOVALUE>
<VATRATE>B</VATRATE>
<FROMVALUE>2.00</FROMVALUE>
<TOVALUE>0.00</TOVALUE>
<VATRATE>C</VATRATE>
<FROMVALUE>5.00</FROMVALUE>
<TOVALUE>0.00</TOVALUE>
<DATE>25/05/2010</DATE>
<CHANGENUM>2</CHANGENUM>
<VATRATE>C</VATRATE>
<FROMVALUE>0.00</FROMVALUE>
<TOVALUE>4.00</TOVALUE>
</VATCHANGES>
It gives END EXPECTED at the output.
First, your grammar needs some work. Assuming the preamble is handled correctly, you have a basic error in the definition of NIZ.
NIZ : VAL NIZ | eps
VAL : START VAL END
| STR
| eps
So we enter NIZ and look for VAL first. The problem is that both VAL and NIZ have eps as a possible production. If VAL produces nothing (i.e. eps), it consumes no tokens in the process (as it must, since eps is the empty production), and NIZ reduces to:
NIZ: eps NIZ | eps
which isn't good: the rule can be applied any number of times without consuming any input, so the grammar is ambiguous.
Consider something more along these lines. I wrote this off the top of my head, without aiming for anything beyond a purely basic construction.
XML: START_LINE ELEMENT
ELEMENT: OPENTAG BODY CLOSETAG
OPENTAG: lt id(n) gt
CLOSETAG: lt fs id(n) gt
BODY: ELEMENT | VALUE
VALUE: str | eps
This is super basic. Terminals include:
lt: '<'
gt: '>'
fs: '/'
str: any alphanumeric string excluding chars lt or gt.
id(n): any alphanumeric string excluding chars lt, gt, or fs.
I can almost feel the wrath of the XML purists raining down on me right now, but the point I'm trying to get across is that, when a grammar is well-defined, the RDP (recursive descent parser) will literally write itself. Obviously the lexer (i.e. the token engine) needs to handle the terminals accordingly. Note: the id(n) is an id-stack to ensure you properly close the innermost tag, and is an attribute of your parser in accordance with how it manages tag ids. It's not traditional, but it makes things MUCH easier.
This can/should clearly be expanded to include stand-alone element declarations and short-cut element closure. For example, this grammar allows for elements of this form:
<ElementName>...</ElementName>
but not of this form:
<ElementName/>
Nor does it account for short-cut termination such as:
<ElementName>...</>
Accounting for such additions will obviously complicate the grammar considerably, but it will also make the parser substantially more robust. Like I said, the sample above is basic with a capital B. If you're really going to embark on this, these are things you want to consider when designing your grammar, and consequently your RDP.
Anyway, just consider how a few reworks in your grammar can/will substantially make this easier on you.
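To illustrate how directly such a grammar maps onto code, here is a rough recursive descent sketch. It is not the asker's parser and it makes several assumptions: it works on the raw string rather than on tokens from getNextToken, it relaxes BODY to allow any number of child elements and text runs (which the sample file needs), it uses the C call stack in place of the id-stack, and the function names are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip whitespace between tokens. */
static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Length of the tag name starting at p (0 if there is none). */
static size_t name_len(const char *p)
{
    size_t n = 0;
    while (isalnum((unsigned char)p[n]) || p[n] == '_')
        n++;
    return n;
}

/*
 * ELEMENT : '<' id '>' BODY '<' '/' id '>'
 * BODY    : (ELEMENT | str)*      (assumption: repetition is allowed)
 * Returns a pointer just past the element, or NULL on a syntax error.
 */
static const char *parse_element(const char *p)
{
    const char *name;
    size_t len;

    p = skip_ws(p);
    if (*p != '<')
        return NULL;
    name = p + 1;
    len = name_len(name);
    if (len == 0 || name[len] != '>')
        return NULL;                     /* attributes are not supported */
    p = name + len + 1;

    for (;;) {
        const char *q = skip_ws(p);
        if (q[0] == '<' && q[1] == '/') {
            /* closing tag: its id must match the innermost open tag */
            q += 2;
            if (strncmp(q, name, len) != 0 || q[len] != '>')
                return NULL;
            return q + len + 1;
        }
        if (*q == '<') {
            p = parse_element(q);        /* nested ELEMENT */
            if (p == NULL)
                return NULL;
        } else if (*q == '\0') {
            return NULL;                 /* input ended before the close tag */
        } else {
            while (*q != '\0' && *q != '<')
                q++;                     /* str: consume text up to next '<' */
            p = q;
        }
    }
}

/* XML : optional "<?xml ... ?>" line, then exactly one ELEMENT. */
int xml_is_well_formed(const char *doc)
{
    const char *p = skip_ws(doc);

    if (strncmp(p, "<?xml", 5) == 0) {
        p = strstr(p, "?>");
        if (p == NULL)
            return 0;
        p += 2;
    }
    p = parse_element(p);
    return p != NULL && *skip_ws(p) == '\0';
}
```

Each grammar rule becomes one function (or one branch of a loop), and the recursion in parse_element is what guarantees that the innermost tag is closed first. Empty-element tags such as <ElementName/> are still rejected, as in the basic grammar above.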

HTTP protocol: end of a message body

I built a program that parses the header and I would like to read the message body in case I receive a POST.
For headers, I have been able to look for the blank line (\r\n\r\n) to determine when the header ends. I am having more trouble with the message body. Am I supposed to look at the "Content-Length" field to know when to stop reading input? With my current code (below), reading does not stop until I hit the red cross (stop loading page) in Firefox.
Here is the code:
size_t n;
unsigned char newChar;
int index = 0;
int capacity = 50;
char *option = (char *) malloc(sizeof(char) * capacity);
while ( ( n = read( req->socket, &newChar, sizeof(newChar) ) ) > 0 ) {
if (newChar == '\0' || newChar == '\n') break; // This is not working
if (index == capacity) {
capacity *= 2;
option = (char *) realloc(option, sizeof(char) * capacity);
assert(option != NULL);
}
option[index++] = newChar;
fprintf(stderr, "%c", newChar);
}
if (index == capacity) {
capacity *= 2;
option = (char *) realloc(option, sizeof(char) * capacity);
assert(option != NULL);
}
option[index] = '\0';
The correct input gets printed, but I wonder why it won't stop until the stop loading button is pressed. I'd like to know if there is any other solution or if I need to use the "Content-Length" field in the header.
Thank you very much,
Jary
There are a few cases to consider, and you'll need to decide how to handle each of them:
For HTTP 1.0, closing the connection was used to signal the end of the data.
This was improved in HTTP 1.1, which supports persistent connections. For HTTP 1.1 you typically set or read the Content-Length header to know how much data to expect.
Finally, HTTP 1.1 also allows "chunked" transfer encoding: each chunk is preceded by its size, and you know you've reached the end when a chunk of size 0 arrives.
Also, do you know about libcurl? It will certainly save you from reinventing the wheel.
This code blocks on the read() waiting for another character which never comes.
Additionally, RFC2616, 3.7.1 states "HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP. In addition, if the text is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks."
So you're going to need to catch more than just "\n".
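For the Content-Length case, the approach can be sketched roughly as follows: find the header value in the already-received header block, then read exactly that many bytes instead of waiting for a terminator. Both function names are hypothetical, the header block is assumed to be a NUL-terminated string, and chunked encoding is not handled.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>     /* strncasecmp (POSIX) */
#include <sys/types.h>
#include <unistd.h>      /* read (POSIX) */

/*
 * Hypothetical helper: scans the header block line by line for a
 * Content-Length field (header names are case-insensitive) and returns
 * its value, or -1 if the field is absent or malformed.
 */
long parse_content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof(name) - 1;
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (strncasecmp(line, name, name_len) == 0) {
            char *end;
            long value = strtol(line + name_len, &end, 10);
            return (end != line + name_len && value >= 0) ? value : -1;
        }
        line = strchr(line, '\n');       /* advance to the next header line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/*
 * Hypothetical helper: reads until len bytes have arrived, the peer
 * closes the connection, or an error occurs. Returns the number of
 * bytes actually read, or -1 on error.
 */
ssize_t read_exact(int fd, char *buf, size_t len)
{
    size_t total = 0;

    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                       /* peer closed before len bytes */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Because read_exact stops as soon as the announced number of bytes has arrived, it does not sit in read() waiting for a byte the browser will never send on a kept-alive connection.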
