a stack overflow (about "evhttp_uri_parse")

The code reads a file that contains many URLs and passes every URL through "evhttp_uri_parse" to get the host and path. However, evhttp_uri_parse sometimes fails to parse and returns NULL. A possible reason is a stack overflow.
FILE *fp=fopen(argv[1],"rb");
if(NULL==fp)
{
printf("open url_file is error %d::%s\n",errno,strerror(errno));
return 0;
}
char url_buf[2048];
memset(url_buf,'\0',sizeof(url_buf));
fgets(url_buf,sizeof(url_buf),fp);
while(!feof(fp))
{
if(strlen(url_buf)>1)
{
printf("url_buf::%s",url_buf);
#if 1
struct evhttp_uri *ev_uri=NULL;
ev_uri=evhttp_uri_parse(url_buf);
if(ev_uri==NULL)
{
printf("parse uri error::%d,%s\n",errno,strerror(errno));
}
const char *host=evhttp_uri_get_host(ev_uri);
const char *path=evhttp_uri_get_path(ev_uri);
printf("query host::%s,path::%s\n",host,path);
evhttp_uri_free(ev_uri);
#endif
}
memset(url_buf,'\0',sizeof(url_buf));
fgets(url_buf,sizeof(url_buf),fp);
}
fclose(fp);

fgets(url_buf,sizeof(url_buf)+1,fp) should be changed to fgets(url_buf,sizeof(url_buf),fp).
fgets keeps the '\n' at the end of the string it reads. Try removing it before parsing and see if it helps.

If a URL is, for any reason, longer than the buffer, fgets will not return the complete URL. With a 2048-byte buffer it stores at most 2047 characters and writes the null character in the 2048th position; the rest of the line is left for the next call.
That is why passing sizeof(url_buf)+1 is a bad idea: it allows fgets to write one byte past the end of the url_buf array, which is undefined behavior.
So check whether the string you got ends with a newline character and, if it does, replace it with a null character. If there is no newline character in the string, you may want to keep reading until you reach one to get the complete URL.
This applies only if your URLs are delimited by newlines.
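As a rough sketch of the advice above, the helper below reads one line, strips the trailing newline, and reports whether the line fit in the buffer. The name read_url_line and its return convention are made up for this illustration and are not part of libevent; it also assumes one URL per line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libevent function): reads one line into buf.
 * Returns  1 if a complete line was read (line ending stripped),
 *          0 if the line did not fit (the rest of it is discarded),
 *         -1 at end of file or on a read error. */
static int read_url_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    size_t len = strcspn(buf, "\r\n");
    if (buf[len] != '\0') {          /* line ending found: complete line */
        buf[len] = '\0';
        return 1;
    }

    /* No newline: either the buffer is full or this is the last line
     * of a file that does not end with a newline. */
    int c = fgetc(fp);
    if (c == '\n' || c == EOF)
        return 1;                    /* the line fit exactly */

    while ((c = fgetc(fp)) != '\n' && c != EOF)
        ;                            /* skip the rest of the long line */
    return 0;
}
```

In the loop from the question, the idea would be to call evhttp_uri_parse(url_buf) only when this returns 1 and url_buf is not empty, and to skip evhttp_uri_get_host and evhttp_uri_get_path when the parse result is NULL.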

Related

How to get blank input with scanf

I have created a console-based program that gets commands from the user. I want to check whether the user gave blank input (just hit Enter) and, if so, show the user a message. I use wscanf_s to get input from the user. I have written the following code:
else if (!wcscmp(g_c_Commands, L"console"))
{
wchar_t console_command[MAX_PATH] = { 0 };
wscanf_s(L"%s", console_command, MAX_PATH - 1);
if (!wcscmp(console_command, L"--local"))
{
CallPsExecuteWindow(arg_computer_name);
}
else if (!wcscmp(console_command, L"--ip"))
{
wchar_t remote_host[MAX_PATH] = { 0 };
wscanf_s(L"%s", remote_host, MAX_PATH - 1);
CallPsExecuteWindow(remote_host);
}
else
{
wprintf(L"\n\t");
WarningMessage(L"%s", L"[Wrong] Usage: console --local / --ip [ADDRESS].");
wprintf(L"\n\n");
}
}

Instead of "%s", you should use "%[^ \t\n]" after skipping blanks:
console_command[0] = L'\0';
wscanf_s(L"%*[ \t]"); // skip blanks if any
wscanf_s(L"%[^ \t\n]", console_command, (unsigned)MAX_PATH); // read word, stop on white space
The first wscanf_s will fail if no blanks are pending, but you can ignore the error.
The second wscanf_s will fail if there are no more words pending on the line, i.e. if the pending character is a newline. In that case console_command is left unmodified, so it still contains an empty string.
Parsing this input as wide strings is a major pain. Most platforms have standardized on UTF-8 encoding; it is a shame you must deal with such cumbersome APIs.
It is also very disturbing to have to pass an unsigned int value as the number of elements in the destination array when the C Standard specifies that this extra argument should be a size_t, which has a different width on most 64-bit platforms including Windows.
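To see the scanset behaviour in isolation, here is a small portable sketch that applies the same "%[^ \t\n]" conversion to a line already held in memory, using plain sscanf on narrow strings instead of wscanf_s. The helper name first_word and the 64-byte word size are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first word of `line` into `word`.
 * Returns 1 if a word was read, 0 if the line is blank; in that case
 * `word` stays an empty string because sscanf stores nothing. */
static int first_word(const char *line, char word[64])
{
    word[0] = '\0';
    line += strspn(line, " \t");     /* skip leading blanks, if any */
    /* A scanset does not skip whitespace on its own, so a pending
     * newline makes the conversion fail instead of being skipped. */
    return sscanf(line, "%63[^ \t\n]", word) == 1;
}
```

A caller can then treat a return value of 0 as "the user just hit Enter" and print the usage message.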

Extracting JSON text from char response buffer

I receive a large chunk of text from my Wi-Fi module, which is saved in my response buffer:
char wifiResponseBuffer[500];
The contents can be seen below :
AT+CIPSEND=84
> GET http://api.noteu.co.uk/v1/poll/get/?seria
SEND OK
+IPD,308:{"data":[{"line1":" Facebook Note ","line2":"Nathan Weighill also","line3":" commented on Harry ","line4":" Bailey's photo.","beep":1,"received_time":1424976639},{"line1":" Gmail Message ","line2":"","line3":"Noteu Error","line4":"","beep":1,"received_time":1424976640}],"summary":{"note_count":2}}
OK
OK
Unlink
I have a JSON parser library, but I need to extract the actual JSON text from the response before it can be parsed. The JSON runs from the first occurrence of { to the last occurrence of }.
What combination of string functions can I use in C to find the indexes of these characters and then extract the JSON text?
Any help is greatly appreciated,
Jack

strchr is used to find the first occurrence of a character in a string. strrchr finds the last.
Here's a brief example of how you might use these:
#include <string.h>
int test(void)
{
    char wifiResponseBuffer[500];
    int wifi_len;
    // get the WiFi data, leaving 1 byte for a NUL terminator
    wifi_len = get_wifi_response(wifiResponseBuffer, sizeof(wifiResponseBuffer)-1);
    if (wifi_len < 0)
        return -1; // error
    // NUL-terminate to use with strxxx functions
    wifiResponseBuffer[wifi_len] = '\0';
    // Find start of JSON data: the first {
    const char *json_start = strchr(wifiResponseBuffer, '{');
    if (json_start == NULL)
        return -1;
    // Find end of JSON data: the last }
    const char *json_end = strrchr(json_start, '}');
    if (json_end == NULL)
        return -1;
    // Pass the JSON data, braces included, to the library
    size_t json_len = json_end - json_start + 1;
    do_something_with_json_data(json_start, json_len);
    return 0;
}
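If the JSON library expects a NUL-terminated string rather than a pointer and a length, one option is to copy the span into its own buffer. The sketch below assumes exactly that; extract_json is a hypothetical helper name, and the caller chooses the size of the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text from the first '{' to the last '}'
 * (both included) out of `response` into `out` and NUL-terminates it.
 * Returns the JSON length, or -1 if the braces are missing or `out`
 * is too small to hold the text plus the terminator. */
static long extract_json(const char *response, char *out, size_t out_size)
{
    const char *start = strchr(response, '{');
    if (start == NULL)
        return -1;

    const char *end = strrchr(start, '}');
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start) + 1;
    if (len >= out_size)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return (long)len;
}
```

Searching for the last '}' from start, rather than from the beginning of the buffer, guarantees that the closing brace is never found before the opening one.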

Bus Error on void function return

I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These files follow a set format: the first line has a variable length (and contains no information I'm trying to query), followed by a series of capitalized letters with a newline every sixty (60) characters. The letters are what I want to pull down, reformatted to eighty (80) characters per line.
I have the call itself in a single function:
//finds and saves the fastas for each protein (assuming on exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
//Local variables
URL_FILE *handle;
char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";
//Build full URL
/*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
strcat (url, entry->title);
strcat (url, ".fasta");
//Open URL
/*printf ("u:%s\n", url); /*This line was used for debugging.*/
handle = url_fopen (url, "r");
//If there is data there
if (handle != NULL) {
//Skip the first line as it's got useless info
do {
url_fread(buffer, 1, 1, handle);
} while (buffer[0] != '\n');
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
//Print it
printFastaEntry (entry->title, sequence, matchType, outFile);
}
url_fclose (handle);
return;
}
With proteinEntry being defined as:
//Entry for fasta formatable data
typedef struct proteinEntry {
char title[7];
struct proteinEntry *next;
} proteinEntry;
The url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code can be found here; the functions mimic the file functions for which they are named.
As you can see I've been doing some debugging with the URL generator (uniprot URLs follow the same format for different proteins), I got it working properly and can pull down the data from the site and save it to file in the proper format that I want. I set the read buffer to 1 because I wanted to get a program that was very simplistic but functional (if inelegant) before I start playing with things, so I would have a base to return to as I learned.
I've tested the url_<function> calls and they are giving no errors. So I added incremental printf calls after each line to identify exactly where the bus error is occurring and it is happening at return;.
My understanding of bus errors is that it's a memory access issue wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).
Can anyone point me in the right direction to fix my mistake please?
EDIT: As @BLUEPIXY pointed out, I had a potential url_fclose (NULL). As @deltheil pointed out, I had sequence as a fixed-size array. This also made me notice I'm repeating my bad memory allocation for url, so I updated it and it now works. Thanks for your help!

If we look at e.g. http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do), we have:
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
which is a 60-character string.
When you try to read this sequence with:
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
url_fread(buffer, 1, 1, handle);
if (buffer[0] != '\n') {
strcat (sequence, buffer);
}
}
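The problem is that sequence is not expandable and not large enough: it is a fixed-length array of size 2, so it can hold one character plus the terminator. Every further strcat writes past its end and overwrites other data on the stack, which typically includes the function's return address; that would explain why the crash only shows up at return;.
So make sure to choose a size large enough to hold any sequence, or implement the ability to expand the buffer on the fly.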
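The "expand on the fly" option could look roughly like the sketch below. The struct seq_buf type and the seq_append function are invented for this example, and the starting capacity of 128 bytes is an arbitrary choice.

```c
#include <stdlib.h>

/* Hypothetical growable string; initialize it with {NULL, 0, 0}. */
struct seq_buf {
    char  *data;   /* NUL-terminated once something has been appended */
    size_t len;    /* characters stored, not counting the terminator */
    size_t cap;    /* bytes currently allocated */
};

/* Appends one character, doubling the allocation when it is full.
 * Returns 0 on success, or -1 if memory runs out (buffer unchanged). */
static int seq_append(struct seq_buf *s, char c)
{
    if (s->len + 2 > s->cap) {       /* room for c and the terminator? */
        size_t new_cap = s->cap ? s->cap * 2 : 128;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    s->data[s->len++] = c;
    s->data[s->len] = '\0';
    return 0;
}
```

In pullFasta, the strcat (sequence, buffer) call would then become something like seq_append(&seq, buffer[0]), with seq.data passed to printFastaEntry and released with free afterwards. The same overflow applies to url, which is exactly full before the first strcat, so it needs a larger array as well.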

Tainted string in C

I'm running the Coverity tool on my file operation function and getting the following error.
As you can see below, I call snprintf() before passing the variable in question to the line number shown in the error message. I guessed that snprintf() would perform some sanitization of the string, but the warning is still shown.
Error:TAINTED_STRING (TAINTED string "fn" was passed to a tainted string sink content.) [coverity]
char fn[100]; int id = 0;
char* id_str = getenv("ID");
if (id_str) {
id = atoi(id_str);
}
memset(fn, '\0', sizeof(fn));
snprintf(fn, 100, LOG_FILE, id);
if(fn[100-1] != '\0') {
fn[100-1] = '\0';
}
log_fp = fopen (fn, "a");
Any help would be highly appreciated.

Try the following:
char* id_str = getenv("ID");
if (id_str) {
id_str = strdup(id_str);
id = atoi(id_str);
free( id_str );
}
The fn string passed to fopen is tainted by an environment variable. Using strdup may act as "sanitizing".

Error:TAINTED_STRING is warning that (as far as Coverity can tell) some aspect of the behaviour is influenced by some external input and that the external input is not examined for 'safeness' before it influences execution.
In this particular example it would appear that Coverity is wrong because the value of LOG_FILE is "/log/test%d.log" and is used with an int in the snprintf, meaning that the content of char fn[100] is always well defined.
So a reasonable course of action would be to mark the error as a non-issue so that it is ignored on future runs.

Coverity wants to make sure you sanitize any string that comes from outside your program, be it from getenv, argv, or a file read.
You can write a function that sanitizes the input (the tainted string) and put a Coverity annotation comment above it, which tells Coverity that the input string has been sanitized; the static-analysis warning will then go away.
// coverity[ +tainted_string_sanitize_content : arg-0 ]
int sanitize_mystring(char* s)
{
    // Do some string validation; validated(), SUCCESS and FAILED are placeholders
    if (validated(s))
        return SUCCESS;
    else
        return FAILED;
}
The // coverity[ +tainted_string_sanitize_content : arg-0 ] line is the annotation Coverity is looking for.
Hope this helps.
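As an illustration of what "some string validation" might mean for the ID environment variable, the sketch below replaces atoi with a checked strtol call. The function name parse_log_id and the accepted range of 0 to 9999 are assumptions made for this example; whether Coverity accepts this as sanitization depends on how the analysis is configured.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical validator: accepts a decimal number in [0, 9999] with
 * nothing after it (the range is an assumption for this example).
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
static int parse_log_id(const char *s, int *out)
{
    if (s == NULL || *s == '\0')
        return -1;

    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);

    /* Reject overflow, trailing characters, and out-of-range values. */
    if (errno != 0 || *end != '\0' || v < 0 || v > 9999)
        return -1;

    *out = (int)v;
    return 0;
}
```

Unlike atoi, this reports input such as "12abc" or an overflowing number as an error instead of quietly returning a value.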

fgets() seems to overflow input to other variables

I'm doing a read from a file, but the input seems to "overflow" into other variables.
I have these 2 variables:
char str[250]; //used to store input from stream
char *getmsg; //already points to some other string
The problem is, when I use fgets() to read the input
printf("1TOKEN:%s\n",getmsg);
fp=fopen("m.txt","r");
fp1=fopen("m1.txt","w");
if(fp!=NULL && fp1!=NULL)
printf("2TOKEN:%s\n",getmsg);
while(fgets(str,250,fp)!=NULL){
printf("3TOKEN:%s\n",getmsg);
printf("read:%s",str);
printf("4TOKEN:%s\n",getmsg);
I get something like this:
1TOKEN:c
2TOKEN:c
3TOKEN:b atob atobbody
read:a b atob atobbody
4TOKEN:b atob atobbody
You see how str kind of flows into getmsg. What happened there? How can I avoid this from happening?
Thanks in advance :)
In the code, "getmsg" is called "token". I thought it might have something to do with identical names or something, so I changed it to getmsg; same error, so I changed it back...
if(buf[0]=='C'){
int login_error=1;
fp=fopen("r.txt","r");
if(fp!=NULL){
memcpy(&count,&buf[1],2);
pack.boxid=ntohs(count);
memcpy(pack.pword,&buf[3],10);
printf("boxid:%u pword:%s\n",pack.boxid,pack.pword);
while(fgets(str,250,fp)!=NULL){
/*"getmsg"===>*/ token=strtok(str," ");
token=strtok(NULL," ");//receiver uname
token1=strtok(NULL," ");//pword
token2=strtok(NULL," ");//boxid
sscanf(token2,"%hu",&count);//convert char[] to unsigned short
if(pack.boxid==count && strcmp(token1,pack.pword)==0){//uname & pword found
login_error=0;
printf("found:token:%s\n",token);
break;
}
}
if(login_error==1){
count=65535;
pack.boxid=htons(count);
}
if(login_error==0){
count=0;
pack.boxid=htons(count);
}
fclose(fp);
}
printf("1TOKEN:%s\n",token);
if(login_error==0){
int msg_error=1;
fp=fopen("m.txt","r");
fp1=fopen("m1.txt","w");
if(fp!=NULL && fp1!=NULL){
printf("2TOKEN:%s\n",token);
while(fgets(str,250,fp)!=NULL){
printf("3TOKEN:%s\n",token);
printf("read:%s",str);
token1=strtok(str," ");//sender
token2=strtok(NULL," ");//receiver
token3=strtok(NULL," ");//subject
token4=strtok(NULL," ");//body
printf("m.txt:token1:%s token2:%s token3:%s token4:%s\n",token1,token2,token3,token4);
if(msg_error==1 && strcmp(token,token2)==0){//message found
msg_error=0;
count=0;
pack.boxid=htons(count);
strcpy(pack.uname,token1);
strcpy(pack.subject,token3);
strcpy(pack.body,token4);
printf("pack:uname:%s subject:%s body:%s token:%s token2:%s strcmp:%d\n",pack.uname,pack.subject,pack.body,token,token2,strcmp(token,token2));
continue;
}
fprintf(fp1,"%s %s %s %s\n",token1,token2,token3,token4);
}
if(msg_error==1){
count=65534;
pack.boxid=htons(count);
}
printf("count:%u -> boxid:%u\n",count,pack.boxid);
fclose(fp);
fclose(fp1);
}
str[0]='c';
memcpy(&str[1],&pack.boxid,2);
memcpy(&str[3],pack.uname,8);
memcpy(&str[11],pack.subject,20);
memcpy(&str[31],pack.body,200);
str[231]='\0';
bytes=232;
}
}
Below is m.txt, which is used to store senders, receivers, subjects and message bodies (the naming pattern is quite obvious >.^):
a b atob atobbody
a c atoc atoccc
b c btoc btoccccc
b a btoa btoaaaaa
So I'm trying to get a msg stored in m.txt for the recipient "c", but it flows over, and by much coincidence, it returns the msg for "b"...

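It looks like getmsg is pointing to the third character of your str buffer:
`str` is "a b atob atobbody"
            ^
            |
            \__ `getmsg` is pointing there.
Therefore, every time you change str by calling fgets(), the string pointed to by getmsg also changes, since it uses the same memory. strtok() does not copy anything: it returns pointers into the buffer it tokenizes, so token still points into str after the loop over r.txt has finished.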
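One way around this is to copy the token into its own storage before str is reused. The sketch below assumes the field of interest is the second space-separated one, as in the r.txt loop; second_field is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the second space-separated field of `line`
 * into `out`. `line` is modified by strtok. Returns 1 on success, or 0
 * if the line has fewer than two fields. */
static int second_field(char *line, char *out, size_t outsz)
{
    char *tok = strtok(line, " \n");
    if (tok != NULL)
        tok = strtok(NULL, " \n");
    if (tok == NULL)
        return 0;

    /* Copy the token: `out` no longer depends on the contents of `line`. */
    snprintf(out, outsz, "%s", tok);
    return 1;
}
```

With the receiver name held in a separate array, later fgets(str, 250, fp) calls can overwrite str without changing the name being compared against.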