strtok - char array versus char pointer [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
strtok wont accept: char *str
When using the strtok function, using a char * instead of a char [] results in a segmentation fault.
This runs properly:
char string[] = "hello world";
char *result = strtok(string, " ");
This causes a segmentation fault:
char *string = "hello world";
char *result = strtok(string, " ");
Can anyone explain what causes this difference in behaviour?

char string[] = "hello world";
This line initializes string to be a big-enough array of characters (in this case char[12]). It copies those characters into your local array as though you had written out
char string[] = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
The other line:
char* string = "hello world";
does not initialize a local array, it just initializes a local pointer. The compiler is allowed to set it to a pointer to an array which you're not allowed to change, as though the code were
const char literal_string[] = "hello world";
char* string = (char*) literal_string;
The reason C allows this without a cast is mainly to let ancient code continue compiling. You should pretend that the type of a string literal in your source code is const char[], which can convert to const char*, but never convert it to a char*.

In the second example:
char *string = "hello world";
char *result = strtok(string, " ");
the pointer string is pointing to a string literal, which cannot be modified (as strtok() would like to do).
You could do something along the lines of:
char *string = strdup("hello world");
char *result = strtok(string, " ");
so that string is pointing to a modifiable copy of the literal.

strtok modifies the string you pass to it (or tries to anyway). In your first code, you're passing the address of an array that's been initialized to a particular value -- but since it's a normal array of char, modifying it is allowed.
In the second code, you're passing the address of a string literal. Attempting to modify a string literal gives undefined behavior.

In the second case (char *), the string is typically in read-only memory. String constants should be treated as const char * (formally they have type char[N] in C, but modifying them is undefined behaviour), and if you used that type to declare the variable the compiler would warn you when you tried to modify it. For historical reasons, you're allowed to use string constants to initialize variables of type char * even though they can't be modified. (Some compilers let you turn this historic license off, e.g. with gcc's -Wwrite-strings.)

The first case creates a (non const) char array that is big enough to hold the string and initializes it with the contents of the string. The second case creates a char pointer and initializes it to point at the string literal, which is probably stored in read only memory.
Since strtok wants to modify the memory pointed at by the argument you pass it, the latter case causes undefined behavior (you're passing in a pointer that points at a (const) string literal), so it's unsurprising that it crashes.

Because the second one declares a pointer (that can change) to a constant string...
So depending on your compiler / platform / OS / memory map... the "hello world" string will be stored as a constant (in an embedded system, it may be stored in ROM) and trying to modify it will cause that error.

Related

strcpy() on a string inside a struct - undefined? [duplicate]

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
(19 answers)
Closed 2 years ago.
I have a struct defined similar to the following:
struct csv_headers {
char field1[256];
char field2[256];
};
And then later in my code I have:
struct csv_headers fields;
strcpy(fields.field1, str_ary[0]);
strcpy(fields.field2, str_ary[1]);
Where str_ary[n] is some string.
Does this cause undefined behaviour the same way the following code would?
char str[] = "some text";
strcpy(str, "text");
Since strcpy should not be used for string literals?
If so, what is the accepted way to copy a string into a string declared inside a struct? My code example above compiles with no warnings using -Wextra (assuming I typed it correctly - I don't have the code in front of me right now).
I am aware of the potential dangers of using strcpy and don't wish to discuss that here, that is for illustrative purposes only. I just want to know the accepted way of copying a string into a struct containing strings.

Does this cause undefined behaviour the same way the following code would?
char str[] = "some text";
strcpy(str, "text");
Since strcpy should not be used for string literals?
str isn’t a string literal, it’s a char[10] (initialised by copying data from a string literal [1]), and it’s a valid target for strcpy. This code does not cause undefined behaviour; it’s well-defined and valid.
Likewise, fields.field1 and fields.field2 in your example are regular char[]s. That they’re members of a struct isn’t important in this context.
However, be sure that the source is valid: str_ary[0] looks fishy, unless str_ary is an array of zero-terminated strings, i.e. something like char *[] or char[][N].
[1] The initialisation of a string array from a string literal happens as if by strcpy, i.e. it’s almost exactly the same as writing
char str[sizeof("some text")];
strcpy(str, "some text");
In fact, compilers could (and at least in some cases do) generate the same code for this initialisation as for the direct assignment. By contrast, char *str = "some text"; doesn’t perform copying, and here str is a pointer to a string literal, and therefore read-only, and not a valid target for strcpy.

Does this cause undefined behaviour the same way the following code would?
Well, your snippet does not cause problems.
char str1[] = "foo"; // str1 is a regular array initialized to "foo"
char *str2 = "bar"; // str2 is a pointer, pointing to a string literal
strcpy(str1, "FOO"); // Ok
strcpy(str2, "BAR"); // Undefined behavior

As long as the string fits in the target space, even
char str[] = "some text";
strcpy(str, "text");
is well-defined.
char field1[256]; declares an array with room for 256 characters. This allows you to store a null-terminated string of up to 255 characters (the array size, minus one for the null terminator).
char str[256] = "some text"; additionally initializes the array to contain the string "some text". You can still copy a string of up to 255 characters into the array, since that's how much room there is.
char str[] = "some text"; is exactly identical to char str[10] = "some text";. With the empty brackets, str is still an array of characters. What the empty brackets mean is that the size of the array (what would normally go inside the brackets) is determined automatically. There is no other difference compared with brackets with a size in it. For an array of characters, the size is the length of the string plus one for the null terminator.
In all cases involving an array, the string used to initialize the array (if there is one) is stored in the array itself. It isn't accessible in any other way, and if you change the array, the string that was used to initialize it is not present in memory anymore (at least in the sense that there's no way to find it — there's no guarantee that it won't be present in a memory dump, but you have no way to access it from your program).
The critical difference is not between char str[10] and char str[], but between char str[…] and char *p. With a * instead of brackets, char *p declares a pointer, not an array.
char *p = "some text";
does two things. One thing is that it arranges for a string literal "some text" to be stored somewhere that's accessible for the whole duration of the program. Another thing is that it declares a variable p that points to this string literal. You cannot modify the memory where "some text" is stored (in practice, on many platforms, it's in read-only memory). On the other hand, you can make something else point to that memory, and you can change p to point to something else.
char *p = "some text";
char *q;
puts(p); // prints "some text"
q = p;
p = "hello";
puts(q); // prints "some text"
puts(p); // prints "hello"

Shouldn't it be impossible to point directly to text in C?

I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?

Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.

In the context of assignment to a char pointer the 'value' of a string literal is the address of its first character.
so
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'

Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
Now, in the first case, the literal array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf expects a format string as its first argument; here ptr_str itself is the format string, and because it contains no conversion specifiers, printf simply prints every character until it reaches the \0. (Writing printf("%s", ptr_str) would be safer, since it also works when the text contains a % character.) That's all that happened. This is how you ended up pointing directly to the text.
Note that the second case is quite different from the first: there, a copy is made, and that copy can be modified (trying to modify the literal in the first case would be undefined behavior). We are basically initializing a char array with the content of the string literal.

Pointer to Char Initialization in C

If this code is correct:
char v1[ ] = "AB";
char v2[ ] = {"AB"};
char v3[ ] = {'A', 'B'};
char v4[2] = "AB";
char v5[2] = {"AB"};
char v6[2] = {'A', 'B'};
char *str1 = "AB";
char *str2 = {"AB"};
Then why this other one is not?
char *str3 = {'A', 'B'};
To the best of my knowledge (please correct me if I'm wrong at any point) "AB" is a string literal and 'A' and 'B' are characters (integers,scalars). In char *str1 = "AB"; the string literal "AB" is defined and the char pointer is set to point to that string literal (to the first element). With char *str3 = {'A', 'B'}; two characters are defined and stored in subsequent memory positions, and the char pointer "should" be set to point to the first one. Why is that not correct?
In a similar way, a regular char array like v3[] or v6[2] can indeed be initialized with {'A', 'B'}. The two characters are defined, the array is set to point to them and thus, being "turned into" or treated like a string literal. Why a char pointer like char *str3 does not behave in the same way?
Just for the record, gcc compiler warnings I get are "initialization makes pointer from integer without a cast" when it gets to the 'A', and "excess elements in scalar initializer" when it gets to the 'B'.
Thanks in advance.

There is one thing you need to learn about constant string literals. Except when used to initialize an array (for example in the case of v1 in your example code) constant string literals are themselves arrays. For example if you use the literal "AB" it is stored somewhere by the compiler as an array of three characters: 'A', 'B' and the terminator '\0'.
When you initialize a pointer to point to a literal string, as in the case of str1 and str2, then you are making those pointers point to the first character in those arrays. You don't actually create an array named str1 (for example) you just make it point somewhere.
The definition
char *str1 = "AB";
is equivalent to
char *str1;
str1 = "AB";
Or rather
char unnamed_array_created_by_compiler[] = "AB";
char *str1 = unnamed_array_created_by_compiler;
There are also other problematic things with the definitions you show. First of all, the arrays v3, v4, v5 and v6 each have exactly two char elements (stated explicitly, or deduced from the two initializers in the case of v3). That means you cannot use them as strings in C, since strings need the special terminator character '\0'.
In fact, if you check the sizes of v1 and v2 you will see that they are indeed three bytes large: one for each of the characters plus the terminator.
Another important thing you miss is the constant part: string literals are arrays of char, but they are really read-only, even if not stored as such. That's why you should never create a pointer to char (like str1 and str2) to point to them; you should create pointers to constant char. I.e.
const char *str1 = "AB";

Double quotes (" ") denote a string and single quotes (' ') denote a character. For a string literal, memory has been allocated to hold it; for a character constant it has not. A pointer has to point to some memory, so you must give it storage to point to, whereas an array of characters provides its own storage.

C String manipulation pointer vs array notation [duplicate]

This question already has answers here:
getting segmentation fault in a small c program
(3 answers)
Closed 7 years ago.
Why does the first version make the program crash, while the second one doesn't? Aren't they the same thing?
Pointer Notation
char *shift = "mondo";
shift[3] = shift[2];
Array Notation
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
MWE
int main( void )
{
char *shift = "mondo";
shift[3] = shift[2];
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
return 0;
}

No! This is one of the important issues in C. In the first, you create a pointer to what is usually a read-only part of memory, i.e. you cannot change it, only read it. The second makes an array of characters, i.e. a contiguous block of memory to which you have both read and write access, meaning you can both read and change the values of the array.

First one points to a string literal (usually in a read only section of code, should really be const char * but able to get away with it due to historical reasons).
The second one creates an array and then populates that array.
Therefore they are not the same.

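The first typically places the string in a read-only section of the program image (.rodata, or .text on some toolchains), while the second creates an array in writable storage (the stack for a local variable, or .data for a static one). Memory in the read-only section is, effectively, const: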
char *string = "AAAA";
This creates what is effectively a const char *, since the string literal is placed in a read-only section. Because that section is typically mapped read-only, an attempt to write to it will generate an access violation or segmentation fault.
You want to do this:
char string[] = "AAAA";
This will work as expected: it allocates a modifiable array of five chars (four capital As plus the terminator). The name string refers to that array and decays to a pointer to its first element in most expressions.

This creates a pointer to an existing string:
char *shift = "mondo";
This creates a new array of characters:
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
In the second case, you are allowed to modify the characters because they are the ones that you just created.
In the first case, you are just pointing to an existing string, which should never be modified. The details of where the string is stored are up to the particular compiler. For example, it can store the string in unmodifiable memory. The compiler is also allowed to do tricks to save space. For example:
char *s1 = "hello there";
char *s2 = "there";
s2 might actually point to the same letter 't' that is at the seventh position of the string that s1 points to.
To avoid confusion, prefer to use const pointers with string literals:
const char *shift = "mondo";
This way, the compiler will let you know if you accidentally try to modify it.

Whenever you define a string using
char * str = "hello";
you should treat it as if the compiler had been given
const char * str = "hello";
because the literal usually ends up in a read-only location of program memory.
An array name, by contrast, behaves roughly like a constant pointer to modifiable characters:
char * const array;
That's why the compiler complains when you try to change the base address of an array.
All of this is done implicitly by the compiler.

Different string initialization yields different behavior?

How come when I use the following method, to be used to convert all the characters in a string to uppercase,
while (*postcode) {
*postcode = toupper(*postcode);
postcode++;
}
Using the following argument works,
char wrong[20];
strcpy(wrong, "la1 4yt");
But the following, doesn't, despite them being the same?
char* wrong = "la1 4yt";
My program crashes in an attempt to write to an illegal address (a segfault, I presume). Is it an issue with not mallocing? Not being null-terminated? It shouldn't be...
Through debugging I notice it crashes on the attempt to assign the first character as its uppercase.
Any help appreciated!

char* wrong = "la1 4yt";
This declares a pointer to a string constant. The constant cannot be modified, which is why your code crashes. If you wrote the more pedantic
const char* wrong = "la1 4yt"; // Better
then the compiler would catch the mistake. You should probably do this any time you declare a pointer to a string literal rather than creating an array.
This, on the other hand, allocates read/write storage for twenty characters so writing to the space is fine.
char wrong[20];
If you wanted to initialize it to the string above you could do so and then would be allowed to change it.
char wrong[20] = "la1 4yt"; // Can be modified
char wrong[] = "la1 4yt"; // Can be modified; only as large as required

char * whatever = "some cont string";
Is read-only.

In the second variant, "la1 4yt" is a constant and therefore is in a read-only segment. Only the pointer (wrong) to the constant is writeable. That's why you get the segfault. In the first example however, everything is writable.
This one might be interesting: http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-equivalent-in-c/

See Question 8.5 in the C FAQ list.

When you do
char wrong[20] = "la1 4yt";
the compiler copies the elements of the string literal {'l', 'a', '1', ' ', '4', 'y', 't', '\0'} to the corresponding elements of the wrong array; when you do
char *wrong = "la1 4yt";
the compiler assigns to wrong the address of the string literal.
String literals are char[] (arrays of char), not const char[] ... but you cannot change them!!
Quote from the Standard:
6.4.5 String literals
6 It is unspecified whether these arrays are distinct provided
their elements have the appropriate values. If the program
attempts to modify such an array, the behavior is undefined.
When I use a string literal to initialize a char *, I usually also tell the compiler I will not be changing the contents of that string literal by adding a const to the definition.
const char *wrong = "la1 4yt";
Edit
Suppose you had
char *test1 = "example test";
char *test2 = "test";
And the compiler created 1 single string literal and used that single string literal to initialize both test1 and test2. If you were allowed to change the string literal ...
test1[10] = 'x'; /* attempt to change the 's' */
printf("%s\n", test2); /* print "text", not "test"! */
