C String parsing errors with strtok(),strcasecmp() - c

So I'm new to C and the whole string manipulation thing, but I can't seem to get strtok() to work. It seems everywhere everyone has the same template for strtok being:
char* tok = strtok(source,delim);
do
{
{code}
tok=strtok(NULL,delim);
}while(tok!=NULL);
So I try to do this with the delimiter being a space, and it seems that strtok() not only returns NULL after the first run (the first entry into the while/do-while) no matter how big the string is, but also wrecks the source, turning the source string into the same thing as tok.
Here is a snippet of my code:
char* str;
scanf("%ms",&str);
char* copy = malloc(sizeof(str));
strcpy(copy,str);
char* tok = strtok(copy," ");
if(strcasecmp(tok,"insert"))
{
printf(str);
printf(copy);
printf(tok);
}
Then, here is some output for the input "insert a b c d e f g"
aaabbbcccdddeeefffggg
"Insert" seems to disappear completely, which I think is the fault of strcasecmp(). Also, I would like to note that I realize strcasecmp() seems to all-lower-case my source string, and I do not mind. Anyhoo, the input "insert insert insert" yields absolutely nothing in output. It's as if those functions just eat up the word "insert" no matter how many times it is present. I may end up just using some of the C functions that read the string char by char, but I would like to avoid this if possible. Thanks a million guys, I appreciate the help.

With the second snippet of code you have five problems. The first is that your format for the scanf function is non-standard: the 'm' modifier is not part of ISO C. It is a POSIX.1-2008 extension (supported by glibc, for example) that makes scanf allocate the buffer itself, so the code is not portable. (See e.g. here for a good reference of the standard function.) Note also that %s stops at the first whitespace, so each call reads only one word.
The second problem is the address-of operator on a pointer, which means that you pass a pointer to a pointer to char (i.e. char**) to the scanf function. That is what the 'm' extension expects, but with the standard %s format scanf wants a plain char*. Strings (either in pointer-to-character form or array form) already are pointers or decay to them, so you don't use the address-of operator for string arguments.
The third problem, once you switch to the standard format, is that the pointer str is uninitialized. You have to remember that uninitialized local variables are truly uninitialized, and their values are indeterminate. In reality, it means that their values will be seemingly random. So str will point to some "random" memory unless you point it at a buffer you own before calling scanf.
The fourth problem is with the malloc call, where you use the sizeof operator on a pointer. This returns the size of the pointer and not the size of what it points to; you want strlen(str) + 1 instead.
The fifth problem is that strcpy then copies the whole string into that allocation, which is typically only 4 or 8 bytes depending on whether you're on a 32 or 64 bit platform (see the fourth problem). Any longer string overflows it, so the memory strtok works on cannot be trusted.
So, five problems in only four lines of code. That's pretty good! ;)

It looks like you're trying to print space delimited tokens following the word "insert" 3 times. Does this do what you want?
#include <stdio.h>
#include <string.h>
#include <strings.h> // strcasecmp (POSIX)
#include <stdlib.h>
int main(int argc, char **argv)
{
    char str[BUFSIZ] = {0};
    char *copy;
    char *tok;
    int i;
    // safely read a string and chop off any trailing newline
    if(fgets(str, sizeof(str), stdin)) {
        size_t n = strlen(str);
        if(n && str[n-1] == '\n')
            str[n-1] = '\0';
    }
    // copy the string so we can trash it with strtok
    copy = strdup(str);
    if(!copy)
        return 1;
    // look for the first space-delimited token
    tok = strtok(copy, " ");
    // check that we found a token and that it is equal to "insert"
    if(tok && strcasecmp(tok, "insert") == 0) {
        // iterate over all remaining space-delimited tokens
        while((tok = strtok(NULL, " "))) {
            // print the token 3 times
            for(i = 0; i < 3; i++) {
                fputs(tok, stdout);
            }
        }
        putchar('\n');
    }
    free(copy);
    return 0;
}

Related

How does the compiler allocate memory for an array of strings in C?

I typed up this block of code for an assignment:
char *tokens[10];
void parse(char* input);
void main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL,...) -- this state is stored in the private memory of the Standard C Library, which means strtok is neither reentrant nor thread-safe: two threads, or two interleaved tokenizing loops, cannot use it at the same time. Note that POSIX provides a reentrant version called strtok_r.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
You are correct that strtok can return more than 10 results. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, and check for it, reporting an error if it's exceeded, or dynamically allocate the tokens array with malloc, and realloc it if it gets too big. Then the error occurs when you run out of memory.
Note that you can also work around the problem of strtok modifying your input string by strduping before passing it to strtok. Then you'll have to free the new string after both it and tokens, which points to it, are going out of scope.
tokens is an array of pointers.
The distinction between strings and pointers is often fuzzy. In some situations strings are better thought of as arrays, in other situations as pointers.
Anyway... in your example input is an array and tokens is an array of pointers to a place within input.
The data inside input is changed with each call to strtok()
So, step by step
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
//            ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                 ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                      ^-- tokens[2] points here
// next strtok returns NULL

How to store each sentence as an element of an array?

So, suppose I have an array (program asks me to write some text):
char sentences[] = "The first sentence.The second sentence.The third sentence";
And I need to store each sentence as an array, where I can have access to any word, or to store the sentences in a single array as elements.
(sentences[0] = "The first sentence"; sentences[1] = "The second sentence";)
How to print out each sentence separately I know:
char* sentence_1 = strtok(sentences, ".");
char* sentence_2 = strtok(NULL, ".");
char* sentence_3 = strtok(NULL, ".");
printf("#1 %s\n", sentence_1);
printf("#2 %s\n", sentence_2);
printf("#3 %s\n", sentence_3);
But how to make program store those sentences in 1 or 3 arrays I have no idea.
Please, help!
If you keep everything in main, the sentences array lives for as long as main runs (it never goes out of scope while you use it), so you can simply store pointers into it:
#include <string.h>
#include <stdio.h>
int main()
{
    char sentences[] = "The first sentence.The second sentence.The third sentence";
    char* sentence[3];
    unsigned int i;
    sentence[0] = strtok(sentences, ".");
    for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
    {
        sentence[i] = strtok(NULL, ".");
    }
    for (i=0;i<sizeof(sentence)/sizeof(sentence[0]);i++)
    {
        // %u matches the unsigned int index
        printf("%u: %s\n",i,sentence[i]);
    }
    return 0;
}
In the general case, you first have to duplicate your input string:
char *sentences_dup = strdup(sentences);
sentence[0] = strtok(sentences_dup, ".");
There are several reasons for that:
you don't know the lifespan/scope of the input, and it is generally a pointer/a parameter, so your sentences could become invalid as soon as the input memory is freed/goes out of scope
the passed buffer may be const: you cannot modify its memory (strtok modifies the passed buffer)
replace sentences[] with *sentences in the example above and you're pointing at a read-only string literal: you have to make a copy of the buffer.
Don't forget to store the duplicated pointer, because you will need to free it at some point.
Another alternative is to also duplicate there:
for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
    // strtok returns NULL when no tokens remain; strdup(NULL) is undefined
    char *t = strtok(NULL, ".");
    sentence[i] = t ? strdup(t) : NULL;
}
so you can free your big tokenized string at once, and the sentences have their own, independent memory.
EDIT: the remaining problem here is that you still have to know in advance how many sentences there are in your input.
For that, you could count the dots, and then allocate the proper number of pointers.
int j,nb_dots=0;
char pathsep = '.';
int nb_sentences;
int len = strlen(sentences);
char** sentence;
// first count how many dots we have
for (j=0;j<len;j++)
{
if (sentences[j]==pathsep)
{
nb_dots++;
}
}
nb_sentences = nb_dots+1; // one more!!
// allocate the array of strings
sentence=malloc((nb_sentences) * sizeof(*sentence));
Now that we have the number of strings, we can perform our strtok loop. Just be careful to use nb_sentences and not sizeof(sentence)/sizeof(sentence[0]), which is now irrelevant (worth 1) because sentence is now a pointer rather than an array. Also note that nb_sentences is an upper bound: strtok skips empty tokens, so a trailing dot yields one sentence fewer.
But at this point you could also get rid of strtok completely, as proposed in another answer of mine.

how to put a parsed string inside of malloc/calloc/dynamic memory?

So I'm doing a few practice questions for a final exam coming up. and I'm having a lot of trouble with dynamic memory.
So the question wants to basically parse through 2 different sources and compare them to find the similar words. (one from a csv file and one from a cgi input)
so I figured I'd use malloc/calloc to put a string in each array slot and then compare each slot. but I'm having some issues with my code:
char buffer[100],buffer2[100],tmp[100],line[100];
char *token,*tok,*input;
int main()
{
char s[100]="search=cat+or+dog+store";
char *search=(char*)calloc(10,sizeof(char));
strcpy(buffer,s);
sscanf(buffer,"search=%s",buffer);
int k=0;
tok=strtok(buffer,"+");
while(tok!=NULL)
{
strcpy(&search[k],tok);
k++;
tok=strtok(NULL,"+");
}
printf("%d\n",k);
strcpy(&search[k],"\0");
***printf("%s",&search[0]);
printf("%s",&search[1]);
printf("%s",&search[2]);
printf("%s",&search[3]);***
char* csv=(char*)calloc(10,sizeof(char));
char tmp2[100];
FILE *fp;
fp=fopen("web.csv","r");
while(fgets(line,sizeof(line),fp)!=NULL)
{
strcpy(buffer2,line);
token=strtok(buffer2,",");
while(token!=NULL)
{
strcpy(csv,token);
csv++;
token=strtok(NULL,",");
}
strcpy(csv,"\0");
free(csv);
free(search);
return(0);
}
The part I put between *** *** is there to test whether the strings were put inside the calloc'd memory, but either nothing prints out or something weird prints out. The same code was used for the bottom part, and both are either empty or only print weird fragments of the strings.
When I add the free(csv) and free(search), it says "pointer being freed was not allocated". I looked it up but I can't seem to find an answer to why it does this.
thank you!
You seem to be trying to create an array of pointers. So let me show you what that looks like
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXT 10
int main( void )
{
    char s[100]="search=cat+or+dog+store";
    char buffer[100];
    char **search = calloc( MAXT, sizeof(char *) );
    if ( sscanf( s, "search=%s", buffer ) != 1 )
        return 1;
    int t = 0;
    char *token = strtok( buffer, "+" );
    while ( token != NULL && t < MAXT )
    {
        search[t++] = token;
        token = strtok( NULL, "+" );
    }
    for ( int i = 0; i < t; i++ )
        printf( "%s\n", search[i] );
    free( search );
}
Things to look for
search is declared as a char ** meaning pointer to a char pointer, which can be used like an array of char pointers
in the calloc, the allocation is for 10 items of type char *, i.e. an array of 10 pointers
in the sscanf, the input and output strings must not be the same string. I changed the arguments so that s is the input, and buffer is the output. Also, you should always check that the return value from sscanf is equal to the number of items requested.
in the while loop, I've added a check t < MAXT to avoid running past the end of the pointer array
search is an array of pointers, and strtok returns a pointer, so the line search[t++]=token; stores the pointer in the array. The string itself is still in the buffer.
This line here:
strcpy(&search[k],"\0");
What you are doing is copying the string literal "\0" to the k'th position in memory (which works... but gross). I believe you are trying to do this:
search[k] = '\0';
Notice the single quotes (''): that is a character rather than a string literal.
You should also not cast the result of malloc or calloc: the (char *) in char *search = (char *)malloc(...) is unnecessary in C.
MAINLY:
You should also consider that printf("%s", string) only prints up until the nearest terminator ('\0') in 'string'. Reference here.
So check what you are buffering, and see if you can build any new conclusions...
And, when you print your string, you only need to printf("%s", search)
I highly suggest you use malloc(), especially for strings, because calloc() initializes all bytes to zero. Since '\0' == 0, a missing terminator can go unnoticed, which could make things more difficult for you to diagnose.

"integer from pointer without cast" when adding nullbyte to pointer

I was messing around with all of the string functions today and while most worked as expected, especially because I stopped trying to modify literals (sigh), there is one warning and oddity I can't seem to fix.
#include <stdio.h>
#include <string.h>
int main() {
char array[] = "Longword";
char *string = "Short";
strcpy(array, string); // Short
strcat(array, " "); // Short (with whitespace)
strcat(array, string); // Short Short
strtok(array, " "); // Short
if (strcmp(array, string) == 0)
{
printf("They are the same!\n");
}
char *substring = "or";
if (strstr(array, substring) != NULL)
{
printf("There's a needle in there somewhere!\n");
char *needle = strstr(array, substring);
int len = strlen(needle);
needle[len] = "\0"; // <------------------------------------------------
printf("Found it! There ya go: %s",needle);
}
printf("%s\n", array);
return 0;
}
Feel free to ignore the first few operations - I left them in because they modified array in a way, that made the strstr function useful to begin with.
The point in question is the second if statement, line 32 if you were to copy it in an editor.
(EDIT: Added arrow to the line. Sorry about that!)
This line is wrong:
needle[len] = "\0";
Double quotes make a string literal, which is an array of char that decays to a char * here. But needle[len] is a char. To make a char literal you use single quotes:
needle[len] = '\0';
See Single quotes vs. double quotes in C or C++
Your second strcat call overruns the end of array (it holds only 9 bytes, while "Short Short" needs 12 including the terminator), corrupting whatever happens to be after it in memory. Once that happens, the later code might do just about anything, which is why writing past the end of an array is undefined behavior.

string parsing in C

I'm trying to pass a string to chdir(). But I always seem to have some trailing stuff that makes the chdir() fail.
#define IN_LEN 128
int main(int argc, char** argv) {
int counter;
char command[IN_LEN];
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
size_t path_len; char path[IN_LEN];
...
fgets(command, IN_LEN, stdin)
counter = 0;
tmp = strtok(command, delim);
while(tmp != NULL) {
*(tokens+counter) = tmp;
tmp = strtok(NULL, delim);
counter++;
}
if(strncmp(*tokens, cd_command, strlen(cd_command)) == 0) {
path_len = strlen(*(tokens+1));
strncpy(path, *(tokens+1), path_len-1);
// this is where I try to remove the trailing junk...
// but it doesn't work on a second system
if(chdir(path) < 0) {
error_string = strerror(errno);
fprintf(stderr, "path: %s\n%s\n", path, error_string);
}
// just to check if the chdir worked
char buffer[1000];
printf("%s\n", getcwd(buffer, 1000));
}
return 0;
}
There must be a better way to do this. Can anyone help out? I've tried to use scanf, but when the program calls scanf, it just hangs.
Thanks
It looks like you've forgotten to append a null '\0' to path string after calling strncpy(). Without the null terminator chdir() doesn't know where the string ends and it just keeps looking until it finds one. This would make it appear like there are extra characters at the end of your path.
You have (at least) 2 problems in your example.
The first one (which is causing the immediately obvious problems) is the use of strncpy() which doesn't necessarily place a '\0' terminator at the end of the buffer it copies into. In your case there's no need to use strncpy() (which I consider dangerous for exactly the reason you ran into). Your tokens will be '\0' terminated by strtok(), and they are guaranteed to be smaller than the path buffer (since the tokens come from a buffer that's the same size as the path buffer). Just use strcpy(), or if you want the code to be resilient to someone coming along later and mucking with the buffer sizes use something like the non-standard strlcpy().
As a rule of thumb don't use strncpy().
Another problem with your code is that the tokens allocation isn't right.
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
will allocate an area as large as your input string buffer, but you're storing pointers to strings in that allocation, not chars. You'll have fewer tokens than characters (by definition), but each token pointer is probably 4 or 8 times larger than a character (depending on the platform's pointer size). If your string has enough tokens, you'll overrun this buffer.
For example, assume IN_LEN is 14 and the input string is "a b c d e f g". If you use spaces as the delimiter, there will be 7 tokens, which will require a pointer array with 28 bytes (assuming 4-byte pointers; 56 bytes with 8-byte pointers). Quite a few more than the 14 allocated by the malloc() call.
A simple change to:
char** tokens = (char**) malloc((sizeof(char*) * IN_LEN) / 2);
should allocate enough space (is there an off-by-one error in there? Maybe a +1 is needed).
A third problem is that you potentially access *tokens and *(tokens+1) even if zero or only one token was added to the array. You'll need to add some checks of the counter variable before dereferencing those pointers.
