Explanation of what my code is doing (C)

char *extractSubstring(char *str)
{
char temp[256];
char *subString; // the "result"
printf("%s\n", str); //prints #include "hello.txt"
strcpy(temp, str); //copies string before tokenizing
subString = strtok(str,"\""); // find the first double quote
subString = strtok(NULL,"\""); // find the second double quote
printf("%s\n", subString); //prints hello.txt
strcpy(str, temp); //<---- the problem
printf("%s", subString); //prints hello.txt"
return subString;
}
After the second strcpy, why does a quotation mark appear at the end of subString? When I comment out the second strcpy line, the program works. The printfs will be deleted from my program; I was just using them to show what was happening.
Can someone please explain to me what is going on? Thank you.

It is important to realize that strtok() modifies the source string in-place, and returns pointers into it.
Thus, the two calls to strtok() turn str into
#include \0hello.txt\0
           ^ subString points here
(For simplicity, I don't show the final terminating \0).
Now, the second ("problematic") strcpy() changes str back to:
#include "hello.txt"
          ^ subString still points here
This is what makes the " reappear in subString.
One way to fix it is by tokenizing a copy and keeping the original intact. Just make sure that your function doesn't return a pointer to an automatic variable (that would go out of scope the moment the function returns).

The first thing to know is that strtok modifies its first argument (str). If that argument is a string literal (for example, when calling extractSubstring like so: extractSubstring("#include \"hello.txt\"");), modifying it is undefined behaviour.
You already copy str into temp, so you should use temp in your calls to strtok. When the tokenizing is done, copy subString into memory that you either allocate on the heap (malloc) or that the caller passes to extractSubstring as an extra parameter. You can't return a pointer into a local array, because the array goes out of scope when the function returns.
So in summary:
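subString = strtok(temp, "\"");
subString = strtok(NULL, "\"");
if (subString == NULL) return NULL; // no quoted part found
char *ret = malloc(strlen(subString) + 1); // +1 for the terminating '\0'
if (ret == NULL) return NULL;
strcpy(ret, subString); // strcpy also copies the '\0', so no manual termination is needed
return ret;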

Related

strcat & Overwrite

I have read the documentation of strcat() C library function on a few websites.
I have also read here: Does strcat() overwrite or move the null?
However, one question is still left: can strcat() be used to overwrite the characters in the destination string (assuming that dest has enough space for the source string, so there will be no errors)?
I ran the following code and found that it doesn't overwrite the characters of dest...
char dest[20] = "Hello World";
char src[] = "char";
strcat(dest+1, src);
printf("dest: %s", dest);
Assume that the goal is to have a destination string that contains "Hchar World".
(I know that strcat() also copies the terminating null character ('\0') to dest, so if printf() were called it would print Hchar, which is what I mistakenly expected to happen...)
Is that a possible task to do with strcat()? If not, is strcpy() the answer to the question?
If a '\0' (null character) is assigned in the middle of the string, will strcat() always start appending at the first '\0' it meets? For example, if I had:
char str[] = "Hello";
str[2]= 0;
strcat(str, "ab");
I just want to be sure and clarify the misunderstanding. I will be glad to read explanations.
As noted in the comments, the strcat function will always (attempt to) append the string given as its second argument (traditionally called src) to that given as its first (dest); it will produce undefined behaviour if either string is not null-terminated or if the destination buffer is not large enough.
The cppreference site gives better documentation (for both C and C++) than the website you linked. From that site's strcat page:
(1) … The character src[0] replaces the null terminator at the end of dest. The resulting byte string is null-terminated.
And:
Notes
Because strcat needs to seek to the end of dest on each call, it is inefficient to concatenate many strings into one using strcat.
So, in the code you show, calling strcat(dest+1, src) has the same effect as calling strcat(dest, src): both scan forward to the end of "Hello World" and append "char" there. However, calling strcpy(dest+1, src) will produce the result you want (printing Hchar).
strcat writes the src string at the end of dst.
If you want to overwrite part of dst with strcat, you first need to make dst "end" where you want the overwrite to start.
Take a look at this code sample:
#include <stdio.h>
#include <string.h>
int main()
{
char dst[20] = "Hello world";
char src[] = "char";
dst[1] = '\0';
strcat(dst, src);
printf("%s\n", dst);
return (0);
}
However, this is not the aim of strcat, and as said in the comments, the use of strcpy would be more appropriate here.
#include <stdio.h>
#include <string.h>
int main()
{
char dst[20] = "Hello world";
char src[] = "char";
strcpy(dst + 1, src);
printf("%s\n", dst);
return (0);
}

How does the compiler allocate memory for an array of strings in C?

I typed up this block of code for an assignment:
#include <stdio.h>
#include <string.h>
char *tokens[10];
void parse(char* input);
int main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL, ...). This state is stored in private memory inside the Standard C Library, so strtok is not safe to call from several threads at once, and you can't interleave two tokenizations even in a single thread. Note that there is a reentrant version, strtok_r, specified by POSIX and available in some other libraries.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
strtok can return more than 10 results for a long enough input. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, check for it and report an error if it's exceeded, or dynamically allocate the tokens array with malloc and realloc it when it gets too big. Then the only error left is running out of memory.
Note that you can also work around strtok modifying your input string by strdup-ing it before passing it to strtok. Then you have to free the new string once you're done with both it and tokens, which points into it.
tokens is an array of pointers.
The distinction between strings and pointers is often fuzzy. In some situations strings are better thought of as arrays, in other situations as pointers.
Anyway... in your example, input is an array and tokens is an array of pointers to places within input.
The data inside input is changed by the calls to strtok().
So, step by step:
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
//            ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                 ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                      ^-- tokens[2] points here
// next strtok returns NULL

Dereference C string pointer into variable

I have the following simple program which creates a pointer to the first character of a string:
char str[] = "Hello world";
char *p = &str[0];
How can I then get this string back into a variable using only the pointer?
Dereferencing the pointer just gives the first character of the string - as somewhat expected - so I'm assuming that there is no 'simple' way to achieve this and it will instead require writing extra code.
The current way I would approach this would be as follows:
Iterate from the pointer until a null terminator is reached to find the length of the string
Create a new char array with this length
Iterate through again inserting characters into this array
Is there a library function to achieve this, or if not, a simpler way that doesn't involve iterating twice?
Yes, you have to "do it by hand". There are no string objects in C, so you need to take care of the memory yourself.
You can use malloc, strlen and memcpy:
char str[] = "Hello world";
char *p = malloc(strlen(str) + 1);
if (!p) { abort(); }
memcpy(p, str, strlen(str) + 1);
You can use strcpy and forget about one strlen:
char *p = malloc(strlen(str) + 1);
if (!p) { abort(); }
strcpy(p, str);
Or you can use strdup, which is specified by POSIX (and, since C23, by the C standard) and available as an extension on many other platforms:
char *p = strdup(str);
if (!p) { abort(); }
...
Is there a library function to achieve this, or if not, a simpler way that doesn't involve iterating twice?
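As said in a comment, strdup() will do exactly what you want. But there is another problem, from your point of view: strdup() (like malloc() followed by strcpy()) still walks the string twice, once to find its length and once to copy it, because a C string does not store its length and there is no other way to duplicate it.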
By definition, a string in C is a sequence of characters somewhere in memory whose last character is a NUL (with a single L), the value 0 in a char. References to strings are pointers to the first character of that sequence. Note that two different pointers can refer to the same memory (so they are not really different strings), or one can point into the middle of another string. These cases are somewhat special but not uncommon. The memory for strings must be managed by the programmer, who is the only one who knows where to allocate and deallocate space for them; functions like strcpy() do nothing special in this regard. They are (presumably) well written and optimized, so the actual copying may not be as plain as I described, but the idea is the same.
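Try this code, which uses a pointer to the whole array (char (*)[12]) rather than a pointer to its first character; note that it still refers to str itself and does not make a copy:
#include <stdio.h>
int main(){
char str[] = "Hello world"; // 11 characters + '\0' = 12
char (*p)[12] = &str; // p points to the whole array
printf("%c\n",(*p)[0]);
printf("%c\n",(*p)[1]);
printf("%c\n",(*p)[2]);
printf("%c\n",(*p)[3]);
printf("%s\n",(*p));
return 0;
}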
Here's how I would make a copy of a string using only the standard library functions:
#include <stdio.h> // printf
#include <stdlib.h> // malloc
#include <string.h> // strcpy
int main(void)
{
char str[] = "Hello world"; // your original string
char *p = malloc(strlen(str) + 1); // allocate enough space to hold the copy in p
if (!p) { // malloc returns a NULL pointer when it fails
puts("malloc failed.");
exit(EXIT_FAILURE);
}
strcpy(p, str); // now we can safely use strcpy to put a duplicate of str into p
printf("%s\n", p); // print out this duplicate to verify
return 0;
}

strtok affects the input buffer

I am using strtok to tokenize a string. Does strtok affect the original buffer? For example:
char buf[] = "This Is Start Of life";
char *pch = strtok(buf," ");
while(pch)
{
printf("%s \n", pch);
pch = strtok(NULL," ");
}
printf("Original Buffer:: %s ",buf);
Output:
This
Is
Start
Of
life
Original Buffer:: This
I read that strtok returns a pointer to the next token, so how is buf getting affected? Is there a way to retain the original buffer (without the overhead of an extra copy)?
Follow-up question: from the answers so far, I guess there is no way to retain the buffer. So what if I allocate the original buffer dynamically? If strtok modifies it, will there be a memory leak when I free the original buffer, or does strtok take care of freeing memory?
strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as argument to strtok(). Therefore the original string gets affected.
strtok() breaks up the string: it replaces the delimiter that ends each token with a null character ('\0') and returns a pointer to the beginning of that token. So after you run strtok(), those delimiter characters in the buffer have been replaced by null characters. You can read more at link1 and link2.
As the example output in link2 shows, the output you are getting is expected, since strtok replaces the delimiter characters.
Each call such as strtok(NULL, " ") finds a token, puts '\0' in place of the delimiter that follows it, and thereby modifies the string. So you need to make a copy of the original string before tokenizing it.
Please try the following:
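#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char buf[] = "This Is Start Of life";
char *buf1;
/* calloc() allocates the memory and zero-initializes it (every byte is '\0') */
buf1 = calloc(strlen(buf) + 1, sizeof(char));
if (buf1 == NULL)
return 1;
strcpy(buf1, buf); /* keep an untouched copy before strtok modifies buf */
char *pch = strtok(buf, " ");
while(pch)
{
printf("%s \n", pch);
pch = strtok(NULL, " ");
}
printf("Original Buffer:: %s\n", buf1);
free(buf1);
return 0;
}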

C String parsing errors with strtok(),strcasecmp()

So I'm new to C and the whole string manipulation thing, but I can't seem to get strtok() to work. It seems everywhere everyone has the same template for strtok being:
char* tok = strtok(source,delim);
do
{
{code}
tok=strtok(NULL,delim);
}while(tok!=NULL);
So I try to do this with the delimiter being the space character, and it seems that strtok() not only returns NULL after the first pass (the first entry into the while/do-while), no matter how big the string is, but also wrecks the source, turning the source string into the same thing as tok.
Here is a snippet of my code:
char* str;
scanf("%ms",&str);
char* copy = malloc(sizeof(str));
strcpy(copy,str);
char* tok = strtok(copy," ");
if(strcasecmp(tok,"insert"))
{
printf(str);
printf(copy);
printf(tok);
}
Then, here is some output for the input "insert a b c d e f g"
aaabbbcccdddeeefffggg
"Insert" seems to disappear completely, which I think is the fault of strcasecmp(). I would also like to note that I realize strcasecmp() seems to lower-case my source string, and I do not mind. Anyhoo, the input "insert insert insert" yields absolutely nothing in the output. It's as if those functions just eat up the word "insert" no matter how many times it is present. I may end up using C functions that read the string character by character, but I would like to avoid that if possible. Thanks a million, guys; I appreciate the help.
With the second snippet of code you have five problems. The first is that your scanf format is non-standard: the 'm' in "%ms" is a POSIX extension (it makes scanf allocate the buffer for you), not ISO C. Note also that %s stops reading at the first whitespace, so for the input "insert a b c d e f g" str only ever holds "insert". (See e.g. here for a good reference of the standard function.)
The second problem, if you drop the 'm' to stay with standard C, is that you use the address-of operator on a pointer, which means that you pass a pointer to a pointer to char (char **) to scanf. scanf wants its arguments as pointers, but a string (in pointer or array form) already is the pointer that %s needs, so you don't use the address-of operator for string arguments. (With POSIX %ms, a char ** is exactly what scanf expects, because it stores the address of the buffer it allocated there.)
The third problem, again without the 'm', is that the pointer str is uninitialized. Uninitialized local variables really are uninitialized: their values are indeterminate, which in practice means seemingly random, so str would point to some "random" memory.
The fourth problem is with the malloc call, where you use the sizeof operator on a pointer. This gives the size of the pointer, not the length of the string it points to; use strlen(str) + 1 instead.
The fifth problem follows from the fourth: malloc(sizeof(str)) allocates only the size of a pointer (typically 4 or 8 bytes, depending on whether you're on a 32- or 64-bit platform), so strcpy(copy, str) writes past the end of that buffer whenever str, including its '\0', needs more room than that.
So, five problems in only four lines of code. That's pretty good! ;)
It looks like you're trying to print space delimited tokens following the word "insert" 3 times. Does this do what you want?
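#define _POSIX_C_SOURCE 200809L // strdup() and strcasecmp() are POSIX, not ISO C
#include <stdio.h>
#include <string.h>
#include <strings.h> // strcasecmp
#include <stdlib.h>
int main(void)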
{
char str[BUFSIZ] = {0};
char *copy;
char *tok;
int i;
// safely read a string and chop off any trailing newline
if(fgets(str, sizeof(str), stdin)) {
int n = strlen(str);
if(n && str[n-1] == '\n')
str[n-1] = '\0';
}
// copy the string so we can trash it with strtok
copy = strdup(str);
if(copy == NULL)
return 1;
// look for the first space-delimited token
tok = strtok(copy, " ");
// check that we found a token and that it is equal to "insert"
if(tok && strcasecmp(tok, "insert") == 0) {
// iterate over all remaining space-delimited tokens
while((tok = strtok(NULL, " "))) {
// print the token 3 times
for(i = 0; i < 3; i++) {
fputs(tok, stdout);
}
}
putchar('\n');
}
free(copy);
return 0;
}
