The strtok Function With char * Argument - c

I was just playing around with some code, and ended up typing something along the lines of the following piece of code. The issue seems to be that the char *string line isn't actually interchangeable with a char string[], but I can't seem to wrap my head around why strtok(...) throws a "segmentation violation" if my argument is initialized as a char* to a string, or why it would even require an initialization of char[] instead?
#include <stdio.h>
#include <string.h>
//extern char *strtok (char *__restrict __s, const char *__restrict __delim);
char *string = "Hello world whats up?";
/*
SEGV - Must be char string[] in order to execute.
e.g. char string[] = "Hello world whats up?";
*/
char *delim = "\t ";
char *token;
int main (argc, argv)
int argc;
char **argv;
{
token = strtok(string, delim);
while ( token != NULL ) {
printf("%s\n", token);
token = strtok(NULL, delim);
}
}

The strtok function (potentially) modifies the string passed to it as its first argument. That is the critical point, here.
In your code snippet (not using the [] version), your string variable is initialized to the address of a string literal. Modifying a string literal is undefined behavior in C, and the literal is likely to be placed in read-only memory. Thus, when you call strtok and that function finds a delimiter character, it attempts to replace that character with a nul, which requires writing to memory it is not allowed to write to, and your program crashes.
However, in your version using the [] syntax, you are declaring a (modifiable) array of characters and initializing it with a copy of the string literal.
In summary:
char* pc = "Hello, World!"; // pc points to a CONSTANT string literal
char ca[] = "Hello, World!"; // ca is a 'normal' array initialized with data

In C, all string literals are non-modifiable; they are in essence read-only.
When you define and initialize string you make it point to the first character of such a literal string.
This is the reason it's recommended to use const char * for literal strings.
If you want to modify the string in any way, and strtok modifies the string it tokenizes, then you must use an explicit modifiable array.

Related

I am unable to use strtok over a char

I am unable to use strtok on char *a, but on b it works.
It works with malloc and with an array.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DELIM ":"
int main()
{
char *a = "xgdgsf: duh d";//unable to use strtok on it
char *token=NULL;
printf("%s",a);
char *b = malloc(strlen(a) + 1);
strcpy(b, a);//I can use strtok on it
token=strtok(b,DELIM);
printf("\n%s",token);
token=strtok(a,DELIM);
printf("\n%s",token);
return 0;
}
You have passed a to strtok, and a points to a string literal. You can't do that because strtok modifies the string.
If you had declared the pointer as
const char *a = "xgdgsf: duh d";
then the compiler would give you a warning.
The second call to strtok should be
token=strtok(NULL, DELIM);
to extract the second token from b.
Change
char *a = "xgdgsf: duh d";
to
char *a = strdup("xgdgsf: duh d");
In C, string literals are read only. However, do remember that you need to free the pointer after calling strdup. Do it with free(a) when you are done with the string.
As others have pointed out, the problem is that a points to a string literal. strtok modifies its input (it overwrites the delimiters with a string terminator) and the behavior on attempting to modify a string literal is undefined - you may get a runtime error, the code may work as expected, or something else may happen. String literals are supposed to be immutable, but it's not guaranteed that they'll be stored in read-only memory.
There are several ways to resolve this. One is to declare a as an array and initialize it with the string:
char a[] = "xgdgsf: duh d";
Another is to dynamically allocate a buffer and copy the contents of the string to it:
char *a = malloc( strlen( "xgdgsf: duh d" ) + 1 );
if ( a )
strcpy( a, "xgdgsf: duh d" );
If it's available on your system, you can use the non-standard function strdup to do the same thing:
char *a = strdup( "xgdgsf: duh d" );
When declaring a pointer to a string literal, it's usually a good idea to qualify it as const:
const char *a = "xgdgsf: duh d";
that way the compiler will yak at you if you try to modify (assign to) *a or a[i] or pass it to a function that expects a plain char * as an input. That way mistakes like using strtok on a literal are caught at compile time, not runtime.

How To Assign char* to an Array variable

I have recently started to code in C and I am having quite a lot of fun with it.
But I ran into a little problem that I have tried all the solutions I could think of but to no success. How can I assign a char* variable to an array?
Example
int main()
{
char* sentence = "Hello World";
//sentence gets altered...
char words[] = sentence;
//code logic here...
return 0;
}
This of course gives me an error. Answer greatly appreciated.
You need to give the array words a length
char words[100]; // For example
Then use strncpy to copy the contents
strncpy(words, sentence, 100);
Just in case add a null character if the string sentence is too long
words[99] = 0;
Turn all the compiler warnings on and trust what it says. Your array initializer must be a string literal or an initializer list. As such it needs an explicit size or an initializer. Even if you had explicitly initialized it still wouldn't have been assignable in the way you wrote.
words = sentence;
Please consult this SO post with quotation from the C standard.
As for:
How To Assign char* to an Array variable ?
You can do it by populating your "array variable" with the content of string literal pointed to by char *, but you have to give it an explicit length before you can do it by copying. Don't forget to #include <string.h>
char* sentence = "Hello World";
char words[32]; //explicit length
strcpy (words, sentence);
printf ("%s\n", words);
Or in this way:
char* sentence = "Hello World";
char words[32];
size_t len = strlen(sentence) + 1;
strncpy (words, sentence, (len < 32 ? len : 31));
if (len >= 32) words[31] = '\0';
printf ("%s\n", words);
BTW, your main() should return an int.
I think you can do it with strcpy :
#include <string.h>
#include <stdio.h>
int main()
{
char* sentence = "Hello World";
char words[12];
//sentence gets altered...
strcpy(words, sentence);
//code logic here...
printf("%s", words);
return 0;
}
..if I didn't misunderstand. The above code will copy the string into the char array.
How To assign char* to an Array variable?
The code below may be useful for some occasions since it does not require copying a string or knowing its length.
char* sentence0 = "Hello World";
char* sentence1 = "Hello Tom!";
char *words[10]; // char *words[10] array can hold char * pointers to 10 strings
words[0] = sentence0;
words[1] = sentence1;
printf("sentence0= %s\n",words[0]);
printf("sentence1= %s\n",words[1]);
Output
sentence0= Hello World
sentence1= Hello Tom!
The statement
char* sentence = "Hello World";
Sets the pointer sentence to point to read-only memory where the character sequence "Hello World\0" is stored.
words is an array and not a pointer, you cannot make an array "point" anywhere since it is a
fixed address in memory, you can only copy things to and from it.
char words[] = sentence; // error
instead declare an array with a size then copy the contents of what sentence points to
char* sentence = "Hello World";
char words[32];
strcpy_s(words, sizeof(words), sentence); // C11 Annex K (optional); otherwise use strcpy/strncpy
The string is now duplicated, sentence is still pointing to the original "Hello World\0" and the words
array contains a copy of that string. The array's content can be modified.
In addition to the other answers, I'll try to explain the logic behind arrays declared without a size. They exist only for convenience: if the compiler can count the elements of the initializer, it does so for you. Creating an array with no size at all is impossible.
In your example you try to use a pointer (char *) as an array initializer. That is not possible because the compiler does not know how many elements lie behind your pointer, so it cannot initialize the array.
The relevant statement in the standard is:
6.7.8 Initialization
...
22 If an array of unknown size is initialized, its size is determined
by the largest indexed element with an explicit initializer. At the
end of its initializer list, the array no longer has incomplete type.
I guess you want to do the following:
#include <stdio.h>
#include <string.h>
int main()
{
char* sentence = "Hello World";
//sentence gets altered...
char *words = sentence;
printf("%s",words);
//code logic here...
return 0;
}

difference between string pointer and string array

I was writing code to reinforce my knowledge and got a segmentation fault, so I clearly have some gaps to fill. The problem is with strtok(). When I run the first code there is no problem, but with the second I get a segmentation fault. What is my "imperfect knowledge"? Thank you for your answers.
First code
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "team_name=fenerbahce";
char *token;
token = strtok(str,"=");
while(token != NULL)
{
printf("%s\n",token);
token = strtok(NULL,"=");
}
return 0;
}
Second code
#include <stdio.h>
#include <string.h>
int main() {
char *str= "team_name=fenerbahce";
char *token;
token = strtok(str,"=");
while(token != NULL)
{
printf("%s\n",token);
token = strtok(NULL,"=");
}
return 0;
}
From strtok -
This function is destructive: it writes the '\0' characters in the elements of the string str. In particular, a string literal cannot be used as the first argument of strtok.
And in the second case, str points to a string literal, which typically resides in read-only memory. Any attempt to modify a string literal leads to undefined behavior.
String literals are the strings you write in "". For every such string, no matter where it is used, global storage is automatically allocated to hold it. When you initialize an array with it, its content is copied into new memory, that of the array. Otherwise you just store a pointer to its global storage.
So this:
int main()
{
const char *str= "team_name=fenerbahce";
}
is roughly equivalent to:
const char __unnamed_string[] = { 't', 'e', /* ... */ '\0' };
int main()
{
const char *str= __unnamed_string;
}
And when assigning the string to array, like this:
int main()
{
char str[] = "team_name=fenerbahce";
}
it is roughly equivalent to this:
const char __unnamed_string[] = { 't', 'e', /* ... */ '\0' };
int main()
{
char str[sizeof(__unnamed_string)];
/* copy every byte of the literal, including '\0', into the local array */
for (size_t i = 0; i < sizeof(__unnamed_string); ++i)
str[i] = __unnamed_string[i];
}
As you can see there is a difference. In the first case you're just storing a single pointer and in the second - you're copying the whole string into local.
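Note: String literals must not be modified, so you should store their address in a pointer to const. The quotations below come from N4296, a working draft of the C++ standard; in C the literal has type array of char rather than array of const char, but modifying it is still undefined behavior.
N4296, § 2.13.5/8, states: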
Ordinary string literals and UTF-8 string literals are also referred
to as narrow string literals. A narrow string literal has type “array
of n const char”, where n is the size of the string as defined below,
and has static storage duration
The reason behind this decision is probably that such arrays can then be stored in read-only segments, which gives the implementation room to optimize the program.
Note1:
N4296, § 2.13.5/16, states:
Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above.
Which means exactly what I said: for every string literal, an unnamed global object is created holding its content.
char *str= "team_name=fenerbahce";
char str[]= "team_name=fenerbahce";
The "imperfect" knowledge is about the difference between arrays and pointers: when you create a string through a pointer, it refers to memory you must not modify.
Whenever a string literal appears in your program, some memory is set aside to store its characters. Below I refer to this as the "memory allocated at the start".
When you create a string using an array, the array receives its own copy of those characters, so additional memory is allocated.
When you create a string using a pointer, the pointer simply holds the address of the memory that already contains the string (the one allocated at the start).
You have to assume that the memory allocated at the start is not writable. Writing to it is undefined behavior, which in practice usually means a segmentation fault, so don't do it.
The array's memory, by contrast, is writable. That is why you can modify the string with a function like strtok only in the array case.

difference between character array initialized with string literal and one using strcpy

Please help me understand the difference between using an initialized character array like char line[80]="1:2" (which doesn't work!) and using char line[80] followed by strcpy(line,"1:2").
As I understand it, in the first case I have a character array that has been allocated memory, and I am copying a string literal into it. The same is done in the second case. But obviously I am wrong, so what is wrong with my understanding?
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void tokenize(char* line)
{
char* cmd = strtok(line,":");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, ":");
}
}
int main(){
char line[80]; //char line[80]="1:2" won't work
/*
char *line;
line = malloc(80*sizeof(char));
strcpy(line,"1:2");
*/
strcpy(line,"1:2");
tokenize(line);
return 0;
}
You are wrong. The result of these two code snippets
char line[80] = "1:2";
and
char line[80];
strcpy( line, "1:2" );
is the same. That is this statement
char line[80] = "1:2";
does work.:)
There is only one difference. When the array is initialized by the string literal, all elements of the array that have no corresponding character in the string literal are initialized to '\0', that is, they are zero-initialized. When strcpy is used, the elements of the array that were not overwritten by characters of the string literal have indeterminate values.
Apart from the zero initialization of its "tail", the array is initialized by the string literal the same way as if strcpy had been applied.
The exact equivalence of the result exists between these statements
char line[80] = "1:2";
and
char line[80];
strncpy( line, "1:2", sizeof( line ) );
that is when you use function strncpy and specify the size of the target array.
If you mean that you passed on to function tokenize directly the string literal as an argument of the function then the program has undefined behaviour because you may not change string literals.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

char* and char[] are not same type? [duplicate]

Possible Duplicate:
Why do I get a segmentation fault when writing to a string?
I'm experiencing a strange issue with my C code. I'm trying to split string using strtok function, but I get access violation exception. Here's my code:
char *token;
char *line = "LINE TO BE SEPARATED";
char *search = " ";
token = strtok(line, search); // this call causes the crash
However, if I change char *line to char line[], everything works as expected and I don't get any error.
Can anyone explain why I get that (strange to me) behavior with strtok? I thought char* and char[] were exactly the same type.
UPDATE
I'm using MSVC 2012 compiler.
strtok() modifies the string that it parses. If you use:
char* line = "...";
then a string literal is being modified, which is undefined behaviour. When you use:
char line[] = "...";
then a copy of the string literal is being modified.
When assigning "LINE TO BE SEPARATED" to char *line, you make line point to a constant string stored in the program executable. You are not allowed to modify it. You should declare this kind of variable as const char *.
When declared as char[] inside a function, your string is an array on that function's stack, initialized with a copy of the literal. Thus, you are able to modify it.
char *s = "any string";
is a definition of a pointer that points to a string, i.e. an array of chars. In the above example s points to a constant string.
char s[] = "any string";
is a definition of an array of chars. In the above example s is an array of chars that contains the characters {'a','n','y',' ','s','t','r','i','n','g','\0'}.
strtok changes the content of your input string: it replaces the delimiters in your string with '\0' (the null character).
So you cannot use strtok with constant strings like this:
char *s = "any string";
You can use strtok with dynamically allocated memory or with arrays, like:
char *s = malloc(20 * sizeof(char)); //dynamic allocation for string
strcpy(s,"any string");
char s[20] = "any string"; //array of 20 chars holding a copy of the literal
char s[] = "any string"; //array sized to fit the literal
To answer your question: char* and char[] are not same type?
This:
char *line = "LINE TO BE SEPARATED";
Is a pointer to a string literal, which is typically stored in read-only memory. You can't change this string.
This, however:
char line[] = "LINE TO BE SEPARATED";
Is now a character array (the quoted text was copied into the array), placed on the stack when it is declared inside a function. You are allowed to modify the characters in this array.
So they are both character arrays, but placed in different parts of memory.

Resources