Elegant parsing of query string in C - c

I'm trying to parse a URL query string in C and I don't see how to do it elegantly. Any hints or suggestions would be greatly appreciated:
static void readParams(char * string, char * param, char * value) {
char arg[100] = {0}; // Not elegant, brittle
char value2[1024] = {0};
sscanf(string, "%[^=]=%s", arg, value2);
strcpy(param, arg);
strcpy(value, value2);
}
char * contents = "username=ted&age=25";
char * splitted = strtok (contents,"&");
char * username;
char * age;
while (splitted != NULL)
{
char param[100]; // Not elegant, brittle
char value[100];
char * t_str = strdup(splitted);
readParams(t_str, param, value);
if (strcmp(param, "username") == 0) {
username = strdup(value);
}
if (strcmp(param, "age") == 0) {
age = strdup(value); // This is a string, can do atoi
}
splitted = strtok (NULL, "&");
}
The problem I kept having is that, because of strtok, anything more intelligent I tried to do before the last strtok call seemed to break the while loop.

You either need to write a complete parser yourself or settle for a library that does it for you.
uriparser should provide all you need (plus it supports Unicode).

In general, strtok modifies the source string, which breaks it for use by other functions. Here is a bare-bones example of using strtok to tokenize a copy of the string:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define MX_SPLIT 128
char **split( char **result, char *working, const char *src, const char *delim)
{
int i;
strcpy(working, src); // working will get chopped up instead of src
char *p=strtok(working, delim);
for(i=0; p!=NULL && i < (MX_SPLIT -1); i++, p=strtok(NULL, delim) )
{
result[i]=p;
result[i+1]=NULL; // mark the end of result array
}
return result;
}
void foo(const char *somestring)
{
int i=0;
char *result[MX_SPLIT]={NULL};
char working[256]={0x0}; // assume somestring is never bigger than 256 - a weak assumption
char mydelim[]="!##$%^&*()_-";
split(result, working, somestring, mydelim);
while(result[i]!=NULL)
{
printf("token # %d=%s\n", i, result[i]);
i++; // advance, otherwise the loop never ends
}
}

I do:
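char querystring[]="a=1&b&c=3&d=&meh=5";
char *end=querystring+strlen(querystring); // remember the end before strtok inserts '\0's
int pc=0;
char *tok;
char *otok;
char *val;
for(tok=strtok(querystring,"&");tok!=NULL;tok=(otok<end)?strtok(otok,"&"):NULL) {
pc++;
otok=tok+strlen(tok)+1; // start of the next "&"-separated pair
tok=strtok(tok,"=");
fprintf(stderr,"param%d: %s ",pc,tok?tok:"");
val=strtok(NULL,"=");
fprintf(stderr,"value%d: %s\n",pc,val?val:""); // a missing value prints as empty
}
Remember that strtok destroys the original, so make a copy of the query string before this if you still need it.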
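The otok bookkeeping above exists because strtok keeps a single hidden position, so a nested call overwrites the outer one. Where POSIX strtok_r is available, each level can carry its own state instead. The following is only a sketch under that assumption; print_pairs is a hypothetical helper name, and like strtok it skips empty tokens and modifies the string it is given.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r; assumes a POSIX system */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each pair and returns how many were seen.
   qs is modified in place, so pass a writable copy. */
int print_pairs(char *qs)
{
    int count = 0;
    char *outer_state;  /* position within the "&"-separated list */
    char *pair;

    for (pair = strtok_r(qs, "&", &outer_state); pair != NULL;
         pair = strtok_r(NULL, "&", &outer_state)) {
        char *inner_state;  /* position within one "name=value" pair */
        char *name = strtok_r(pair, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);

        count++;
        fprintf(stderr, "param%d: %s value%d: %s\n",
                count, name ? name : "", count, value ? value : "");
    }
    return count;
}
```

Called on a writable array holding "a=1&b&c=3&d=&meh=5", this visits five pairs; b and d are reported with an empty value.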

Assumptions are not a bad thing in general, especially in fast, robust and defensive code (consider, for example, that your input string may have an invalid format).
To get the most flexible code, however, you need to allocate (and free after use!) memory for the strings yourself. The size should be the total length of the input string (another place where a reasonable limit is a must), since in general it is unknown how long the param and value parts are.
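A minimal sketch of that approach, assuming one heap copy sized from the input is acceptable: the copy is split in place with strchr, so no fixed-size param or value buffers are needed. The names struct query_param and parse_query are made up for this illustration, and the sketch does no percent-decoding.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: one name/value pair pointing into a shared copy. */
struct query_param {
    char *name;
    char *value;   /* "" when the pair has no "=" or nothing after it */
};

/*
 * Splits a copy of `query` into at most `max` pairs stored in `out`.
 * On success returns the number of pairs and sets *storage to the
 * single heap block every name/value points into; free(*storage) when done.
 * Returns -1 if the allocation fails.
 */
int parse_query(const char *query, struct query_param *out, int max,
                char **storage)
{
    size_t len = strlen(query);
    char *copy = malloc(len + 1);   /* sized from the input, no fixed buffers */
    int n = 0;
    char *p;

    if (copy == NULL)
        return -1;
    memcpy(copy, query, len + 1);

    p = copy;
    while (*p != '\0' && n < max) {
        char *amp = strchr(p, '&');   /* end of this pair, if any */
        char *eq;

        if (amp != NULL)
            *amp = '\0';
        eq = strchr(p, '=');
        if (eq != NULL)
            *eq++ = '\0';
        out[n].name = p;
        /* without "=", point the value at the pair's terminator, i.e. "" */
        out[n].value = (eq != NULL) ? eq : p + strlen(p);
        n++;
        if (amp == NULL)
            break;
        p = amp + 1;
    }
    *storage = copy;
    return n;
}
```

The caller looks up the names it cares about (for example with strcmp against "username" and "age") and releases everything with a single free(storage).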

Related

C: Splitting a string into two strings, and returning a 2 - element array

I am trying to write a method that takes a string and splits it into two strings based on a delimiter string, similar to .split in Java:
char * split(char *tosplit, char *culprit) {
char *couple[2] = {"", ""};
int i = 0;
// Returns first token
char *token = strtok(tosplit, culprit);
while (token != NULL && i < 2) {
couple[i++] = token;
token = strtok(NULL, culprit);
}
return couple;
}
But I keep getting the Warnings:
In function ‘split’:
warning: return from incompatible pointer type [-Wincompatible-pointer-types]
return couple;
^~~~~~
warning: function returns address of local variable [-Wreturn-local-addr]
... and of course the method doesn't work as I hoped.
What am I doing wrong?
EDIT: I am also open to other ways of doing this besides using strtok().
A few things:
First, the function is declared to return a pointer to a (sequence of) character(s), i.e. a char *, but couple is a (sequence of) pointer(s) to char. Hence, the return type should be char **.
Second, you return the address of a local variable, which - once the function has finished - goes out of scope and must not be accessed afterwards.
Third, you define an array of 2 pointers, whereas your while-loop may write beyond these bounds.
If you really want to split into two strings, the following method should work:
char ** split(char *tosplit, char *culprit) {
static char *couple[2];
if ((couple[0] = strtok(tosplit, culprit)) != NULL) {
couple[1] = strtok(NULL, culprit);
}
return couple;
}
I'd caution your use of strtok, it probably does not do what you want it to. If you think it does anything like a Java split, read the man page and then re-read it again seven times. It is literally tokenizing the string based on any of the values in delim.
I think you are looking for something like this:
#include <stdio.h>
#include <string.h>
char* split( char* s, char* delim ) {
char* needle = strstr(s, delim);
if (!needle)
return NULL;
needle[0] = 0;
return needle + strlen(delim);
}
int main() {
char s[] = "Fluffy furry Bunnies!";
char* res = split(s, "furry ");
printf("%s%s\n", s, res );
}
Which prints out "Fluffy Bunnies!".
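To see the difference from strtok that this answer warns about, here is a small sketch that counts what strtok produces when it is handed the same delimiter string. count_tokens is a hypothetical helper written for this illustration only.

```c
#include <string.h>

/* Hypothetical helper: counts the tokens strtok produces.
   s is modified in place. */
int count_tokens(char *s, const char *delims)
{
    int n = 0;
    char *tok;

    /* every character in delims is a separator on its own */
    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

With a writable copy of "Fluffy furry Bunnies!" and the delimiter string "furry ", strtok treats f, u, r, y and the space as individual separators, so the result is three tokens ("Fl", "B" and "nnies!") rather than the two halves the strstr version returns.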
First of all, strtok modifies the memory of tosplit, so be certain that is what you want. If so, then consider this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
* NOTE: unsafe (and leaky) implementation using strtok
*
* src is duplicated with strdup, so the original is left untouched.
* Returns:
* allocated NULL-terminated array of tokens that you must free yourself
* (the strdup'ed copy that the tokens point into is never freed here)
*/
char **__split(char *src, const char *delim)
{
size_t idx = 0;
char *next;
char **dest = NULL;
do {
dest = realloc(dest, (idx + 1)* sizeof(char *));
next = strtok(idx > 0 ? NULL:strdup(src), delim);
dest[idx++] = next;
} while(next);
return dest;
}
int main() {
int x = 0;
char **here = NULL;
here = __split("hello,there,how,,are,you?", ",");
while(here[x]) {
printf("here: %s\n", here[x]);
x++;
}
}
You can implement a much safer and non leaky version (note the strdup) of this but hopefully this is a good start.
The type of couple is char** but you have defined the function return type as char*. Furthermore you are returning the pointer to a local variable. You need to pass the pointer array into the function from the caller. For example:
#include <stdio.h>
#include <string.h>
char** split( char** couple, char* tosplit, char* culprit )
{
// strtok returns the first token
char *token = strtok( tosplit, culprit);
for( int i = 0; token != NULL && i < 2; i++ )
{
couple[i] = token;
token = strtok(NULL, culprit);
}
return couple;
}
int main()
{
char* couple[2] = {"", ""};
char tosplit[] = "Hello World" ;
char** strings = split( couple, tosplit, " " ) ;
printf( "%s, %s", strings[0], strings[1] ) ;
return 0;
}

Overwriting parts of a string with parts of another string

I'm trying to overwrite a part of a string with parts of another String.
Basically, I want to access a given index of a string, write a given number of chars from another given index of another string.
So a function like memcpy(stringa[indexa], stringb[indexb], length);, except that this does not work.
Using strncpy would also suffice.
More code, as requested:
void mymemset(char* memloc, char* cmd, int data_blocks[], int len)
{
int i = 0;
while(i < len)
{
//missing part. Where I want the "memcpy" operation to take place
i++;
}
return;
}
memloc is the string we want to overwrite, cmd is the string we are overwriting from, data_blocks contains information about where in memloc we are supposed to write, and len is the number of operations we are executing. So I want to overwrite at location data_blocks[i], from cmd 8 chars at a time.
EDIT: I think I just forgot an &, so sorry to have confused you and thanks for your time. This seems to work:
void mymemset(char* memloc, char* cmd, int data_blocks[], int len)
{
int i = 0;
while(i < len)
{
memcpy(&memloc[data_blocks[i]], &cmd[i*8], 8);
i++;
}
return;
}
Takes 8 bytes at a time from cmd, stores them in memloc at the index given by data_blocks[i]. As commented, data_blocks contains information about different indexes in memloc that is available, and segmentation of the string cmd can occur.
Supposing stringa and stringb are declared as follows
char stringa[] = "Hello" ;
char stringb[] = "World" ;
This should work:
memcpy(&stringa[1], &stringb[1], 2) ;
Your example should not compile, or if it does compile, it is likely to crash or cause undefined behaviour:
memcpy(stringa[1], stringb[1], 2) ;
Your naming is confusing: memset works on bytes. If you manipulate strings, you have an extra precaution to take: think of the \0.
I think you want something like that:
void my_str_overwrite(char* dest, const char* ref, int idx, size_t count)
{
size_t input_len = strlen(dest);
if(idx < 0 || input_len < (size_t)idx + count)
{
// Error: not enough space
return;
}
for(size_t i=0; i<count; i++)
{
dest[idx+i] = ref[i];
}
return;
}
You don't need to pass the whole data_blocks[] array; you are only interested in one element of it, which contains the offset for your copy, if I understood correctly.
As you don't modify cmd, it should be const.
The code above does not handle the terminating NUL byte, which should be appended to memloc if it is actually a string.
So I want to overwrite at location data_blocks[i], from cmd 8 chars at a time.
This one is confusing. If you know that you only want 8 bytes to be copied each time you call the function, then in the code above make count a local variable within the function and fix its value: size_t count = 8;
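If the whole data_blocks array does need to be processed in one call, as in the question, the same bounds checking can be applied per block. This is only a sketch: copy_blocks and the memloc_len parameter are hypothetical additions, and it assumes cmd holds at least len * 8 bytes.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8  /* the question copies 8 chars at a time */

/*
 * Hypothetical bounds-checked variant of the question's mymemset.
 * memloc_len is the number of writable bytes in memloc (excluding any
 * terminator you want to keep); cmd must hold at least len * BLOCK_SIZE bytes.
 * Returns 0 on success, -1 if any block would fall outside memloc;
 * in that case nothing is written.
 */
int copy_blocks(char *memloc, size_t memloc_len, const char *cmd,
                const int data_blocks[], int len)
{
    int i;

    /* validate every offset first so a failure leaves memloc untouched */
    for (i = 0; i < len; i++) {
        if (data_blocks[i] < 0 ||
            (size_t)data_blocks[i] > memloc_len ||
            memloc_len - (size_t)data_blocks[i] < BLOCK_SIZE)
            return -1;
    }
    for (i = 0; i < len; i++)
        memcpy(&memloc[data_blocks[i]], &cmd[(size_t)i * BLOCK_SIZE], BLOCK_SIZE);
    return 0;
}
```

Passing strlen(memloc) as memloc_len keeps every write inside the existing string, so its terminating \0 is never overwritten.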
If the strings are the same size then you can just use memcpy:
#include <string.h>
char text[] = "Hello James!";
char name[] = "Jenny";
char* pos = strstr(text, "James");
memcpy(pos, name, strlen(name)); // strlen excludes the '\0', so the rest of text is kept
If they're not then you must reallocate the string as the length will change
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define STR "Hello James!"
void replace(char** src, char* find, char* rep) {
char* ret = NULL;
char* pos = strstr(*src, find);
if (!pos)
return; // no changes
size_t l = 1 + strlen(*src) + strlen(rep) - strlen(find);
ret = malloc(l);
if (!ret)
return; // allocation failed, *src is left unchanged
ret[l-1] = 0;
int ind = (int)(pos - *src);
memcpy(ret, *src, ind); // text before the match
printf("ind: %d; %.*s\n", ind, ind, ret); // ret is not terminated yet, so limit the print
memcpy(&ret[ind], rep, strlen(rep)); // the replacement
memcpy(&ret[ind+strlen(rep)], &pos[strlen(find)], strlen(pos)-strlen(find)); // text after the match
printf("%s\n", ret);
free(*src);
*src = ret;
}
int main() {
char *str = NULL;
str = (char*)malloc(sizeof(char) * (strlen(STR)+1));
assert(str);
strcpy(str, STR);
printf("before: %s\n", str);
replace(&str, "James", "John");
printf("after: %s\n", str);
free(str);
return 0;
}
This code is not optimized.

Want to pass a single char pointer from a double pointer

I have to write a function which takes in 2 double pointers (both to char type). The first double pointer has a string of query values and the 2nd one has stopwords. The idea is to eliminate the stopwords from the query string and return all the words without those stopwords.
For example
Input - query: “the”, “new”, “store”, “in”, “SF”
stopwords: “the”, “in”
OUTPUT
new
store
SF
I have written the following code while trying to use strtok which takes in only single pointers to char types. How do I access the contents of a double pointer?
Thanks
#include <stdio.h>
void remove_stopwords(char **query, int query_length, char **stopwords, int stopwords_length) {
char *final_str;
final_str = strtok(query[0], stopwords[0]);
while(final_str != NULL)
{
printf("%s\n", final_str);
final_str = strtok(NULL, stopwords);
}
}
For simplicity's sake, you can assume a double pointer to be equivalent to a 2d array (it is not!). However, this means that you can use array-convention to access contents of a double pointer.
#include <stdio.h>
#include <string.h>
char *query[5] = {"the","new","store","in","SF"};
char *stopwords[2] = {"the","in"};
char main_array[256];
void remove_stopwords(char **query,int query_length, char **stopwords, int stopwords_length);
int main()
{
remove_stopwords(query,5,stopwords,2);
puts(main_array);
return 0;
}
void remove_stopwords(char **query,int query_length, char **stopwords, int stopwords_length)
{
int i,j,found;
for(i=0;i<query_length;i++)
{
found=0;
for(j=0;j<stopwords_length;j++)
{
if(strcmp(query[i],stopwords[j])==0)
{
found=1;
break;
}
}
if(found==0)
{
printf("%s ",query[i]);
strncat(main_array,query[i],strlen(query[i]));
}
}
}
Output: new store SF newstoreSF
@Binayaka Chakraborty's solution solved the problem, but I thought it might be useful to provide an alternative that uses pointers only and shows appropriate use of strtok(), which may have been misunderstood in the question.
In particular, the second parameter of strtok() is a pointer to a string that lists all the single-character delimiters to be used. One cannot use strtok() to split a string based on multi-character delimiters, as appears to have been the intention in the question.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void remove_stopwords(char *query, char **stopwords) {
char *final_str = strtok(query, " ");
while(final_str != NULL) {
int isStop = 0;
char **s;
for (s = stopwords; *s; s++) {
if (strcmp(final_str,*s) == 0) {
isStop = 1;
}
}
if (!isStop) printf("%s ", final_str);
final_str = strtok(NULL, " ");
}
}
int main() {
const char *q = "the new store in SF";
char *query = malloc(strlen(q)+1);
/* We copy the string before calling remove_stopwords() because
strtok must be able to modify the string given as its first
parameter */
strcpy(query,q);
char *stopwords[] = {"the", "in", NULL};
remove_stopwords(query,stopwords);
return 0;
}
The approach shown here also avoids the need to hard code the sizes of the arrays involved, which therefore reduces potential for bugs.

Reversing a string in C using pointers?

Language: C
I am trying to write a C function with the prototype char *strrev2(const char *string) as part of interview preparation. The closest (working) solution is below, but I would like an implementation that does not use malloc. Is this possible? Since it returns a char pointer, if I use malloc the matching free has to happen in another function.
char *strrev2(const char *string){
int l=strlen(string);
char *r=malloc(l+1);
for(int j=0;j<l;j++){
r[j] = string[l-j-1];
}
r[l] = '\0';
return r;
}
[EDIT] I have already written implementations using a buffer and without the char. Thanks tho!
No - you need a malloc.
Other options are:
Modify the string in-place, but since you have a const char * and you aren't allowed to change the function signature, this is not possible here.
Add a parameter so that the user provides a buffer into which the result is written, but again this is not possible without changing the signature (or using globals, which is a really bad idea).
You may do it this way and make the caller responsible for freeing the memory. Or you can let the caller pass in an allocated char buffer, so that both the allocation and the free are done by the caller:
void strrev2(const char *string, char* output)
{
// place the reversed string onto 'output' here
}
For caller:
char buffer[100];
char *input = "Hello World";
strrev2(input, buffer);
// the reversed string now in buffer
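One way the placeholder body could be filled in is sketched below. The name strrev_into and the extra size parameter are assumptions added for this illustration; they let the function refuse a buffer that is too small instead of overflowing it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical name and signature: writes the reverse of `string` into
 * `output`, which has room for `size` bytes including the terminator.
 * Returns output, or NULL if the buffer is too small.
 */
char *strrev_into(const char *string, char *output, size_t size)
{
    size_t len = strlen(string);
    size_t j;

    if (size < len + 1)
        return NULL;
    for (j = 0; j < len; j++)
        output[j] = string[len - j - 1];
    output[len] = '\0';
    return output;
}
```

With the caller code above, strrev_into(input, buffer, sizeof buffer) leaves "dlroW olleH" in buffer.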
You could use a static char[1024]; (1024 is an example size), store all strings used in this buffer and return the memory address which contains each string. The following code snippet may contain bugs but will probably give you the idea.
#include <stdio.h>
#include <string.h>
char* strrev2(const char* str)
{
static char buffer[1024];
static int last_access; //Points to leftmost available byte;
//Check if buffer has enough room for the new string plus its terminating '\0'
if( strlen(str) < (size_t)(1024 - last_access) )
{
char* return_address = &(buffer[last_access]);
int i;
//FixMe - Make me faster
for( i = 0; i < strlen(str) ; ++i )
{
buffer[last_access++] = str[strlen(str) - 1 - i];
}
buffer[last_access] = 0;
++last_access;
return return_address;
}else
{
return 0;
}
}
int main()
{
char* test1 = "This is a test String";
char* test2 = "George!";
puts(strrev2(test1));
puts(strrev2(test2));
return 0 ;
}
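Reverse the string in place (this needs a writable char *, so it does not fit the const char * signature from the question):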
char *reverse (char *str)
{
register char c, *begin, *end;
begin = end = str;
while (*end != '\0') end ++;
while (begin < --end)
{
c = *begin;
*begin++ = *end;
*end = c;
}
return str;
}

Making specific word in string uppercase C

I'm trying very hard to figure out a way to parse a string and "highlight" the search term in the result by making it uppercase.
I've tried using strstr and moving a pointer along and "toupper"ing the characters, to no avail.
char * highlight( char *str, char *searchstr ) {
char *pnt=str;
int i;
pnt=strstr(str,searchstr);
while(pnt){
printf("ststr retured: %s\n", pnt);
for(i=0;i<strlen(searchstr);i++) {
printf("%c",toupper(pnt[i]));
}
printf("\n");
pnt=pnt+strlen(searchstr);
pnt=strstr(pnt,searchstr);
}
return str;
}
Any advice is greatly appreciated.
Since Schot mentioned every occurrence:
#include <ctype.h>
#include <stdio.h>
#include <string.h>
char *highlight(char *str, char *searchstr) {
char *pnt = str;
while (pnt = strstr(pnt, searchstr)) {
char *tmp = searchstr;
while(*(tmp++)) { *pnt = toupper(*pnt); pnt++; }
}
return str;
}
int main() {
char s[] = "hello world follow llollo";
char search[] = "llo";
puts(highlight(s, search));
return 0;
}
output is:
$ ./a.out
heLLO world foLLOw LLOLLO
You appreciate that the function takes the string as an argument and then returns that same string, while having -not- modified that string? all the function does is print to stdout the capital characters.
At some point, you would need to change the string itself, e.g.;
pnt[i] = toupper( pnt[i] );
Like Blank Xavier said, you probably want to modify the actual string. toupper does not change the value of the character you supply, but returns a new character that is its uppercase version. You have to explicitly assign it back to the original string.
Some additional tips:
Never do multiple strlen calls on a string that doesn't change, do it once and store the result.
You can express the promise of not changing searchstr by declaring it as const char *.
Below is an example with a (in my opinion) easy method of looping through all strstr matches:
#include <string.h>
#include <ctype.h>
char *highlight(char *s, const char *t)
{
char *p;
size_t i, len = strlen(t);
for (p = s; (p = strstr(p, t)); p += len)
for (i = 0; i < len; i++)
p[i] = toupper(p[i]);
return s;
}
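Two edge cases are worth guarding in a loop like the one above. The sketch below is a variation, not the original answerer's code, and highlight_safe is a made-up name. It returns early for an empty search string, because strstr then matches at the current position and p += 0 never advances, and it casts to unsigned char before calling toupper, since passing a negative char value to toupper is undefined behaviour.

```c
#include <ctype.h>
#include <string.h>

/* Same loop as above, with two defensive tweaks (both are additions,
   not part of the original answer). */
char *highlight_safe(char *s, const char *t)
{
    char *p;
    size_t i, len = strlen(t);

    if (len == 0)   /* strstr(p, "") returns p, so the loop would never advance */
        return s;
    for (p = s; (p = strstr(p, t)) != NULL; p += len)
        for (i = 0; i < len; i++)
            p[i] = (char)toupper((unsigned char)p[i]);
    return s;
}
```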
