I was writing code to reinforce my knowledge and got a segmentation fault, which told me I have some imperfect knowledge to fill in. The problem is with strtok(). When I run the first program there is no problem, but the second one gives a segmentation fault. What is my "imperfect knowledge"? Thank you in advance for your answers.
First code
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "team_name=fenerbahce";
char *token;
token = strtok(str,"=");
while(token != NULL)
{
printf("%s\n",token);
token = strtok(NULL,"=");
}
return 0;
}
Second code
#include <stdio.h>
#include <string.h>
int main() {
char *str= "team_name=fenerbahce";
char *token;
token = strtok(str,"=");
while(token != NULL)
{
printf("%s\n",token);
token = strtok(NULL,"=");
}
return 0;
}
From the strtok documentation:
This function is destructive: it writes the '\0' characters in the elements of the string str. In particular, a string literal cannot be used as the first argument of strtok.
In the second case, str points to a string literal, which typically resides in read-only memory. Any attempt to modify a string literal leads to undefined behavior.
String literals are the strings you write between double quotes. For every such string, no matter where it is used, storage with static duration is automatically set aside to hold it. When you use it to initialize an array, its content is copied into new memory, that of the array. Otherwise you just store a pointer to that static storage.
So this:
int main()
{
const char *str= "team_name=fenerbahce";
}
Is roughly equivalent to:
const char __unnamed_string[] = { 't', 'e', /* ... */ '\0' };
int main()
{
const char *str= __unnamed_string;
}
And when assigning the string to array, like this:
int main()
{
char str[] = "team_name=fenerbahce";
}
To this:
const char __unnamed_string[] = { 't', 'e', /* ... */ '\0' };
int main()
{
char str[sizeof(__unnamed_string)]; /* sizeof(char) is 1 by definition */
for (size_t i = 0; i < sizeof(__unnamed_string); ++i)
    str[i] = __unnamed_string[i];
}
As you can see, there is a difference: in the first case you store only a single pointer, while in the second you copy the whole string into a local array.
Note: string literals must not be modified, so you should store their address in a pointer to const (const char *).
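The C++ working draft N4296, § 2.13.5/8, states the following (in C the type is "array of char" rather than "array of const char", but modifying a string literal is still undefined behavior):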
Ordinary string literals and UTF-8 string literals are also referred
to as narrow string literals. A narrow string literal has type “array
of n const char”, where n is the size of the string as defined below,
and has static storage duration
The reason behind this decision is probably that such arrays can then be stored in read-only segments, which lets the implementation optimize the program (for example, by merging identical literals). For more info about this decision see.
Note1:
In N4296, § 2.13.5/16 states:
Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above.
This means exactly what I said: for every string literal, an unnamed object with static storage duration is created to hold its content.
char *str= "team_name=fenerbahce";
char str[]= "team_name=fenerbahce";
The "imperfect" knowledge is about the difference between arrays and pointers, in particular about the memory you cannot modify when you create a string through a pointer.
When you write a string literal, memory is set aside to store its characters. Below, I'll call this the "memory allocated at the start".
When you create a string using an array, you create a new array holding a copy of the same characters, so additional memory is allocated.
When you create a string using a pointer, the pointer simply holds the address of the memory allocated at the start.
You have to assume that the memory allocated at the start is not writable: writing to it is undefined behavior, which in practice often means a segmentation fault, so don't do it.
The array, on the other hand, is writable. That's why you can modify the string with a function like strtok only in that case.
Related
I was solving a challenge on CodeSignal in C. Even though the correct libraries were included, I couldn't use the strrev function in the IDE, so I looked up a similar solution and modified it to work. That part is fine. However, I don't understand the distinction between a string literal and an array. Reading about this online has left me a bit confused. If C stores all strings as an array with each character terminated by \0 (null terminated), how can there be any such thing as a string literal? Also, if strings are stored as arrays, does inputString store the address of the array, or is it itself an array of all the individual characters?
Thanks in advance for any clarification provided!
Here is the original challenge, C:
Given the string, check if it is a palindrome.
#include <stdbool.h>
#include <string.h>

bool solution(char * inputString) {
    // inputString points to the first character of a null-terminated string:
    // one character per index, followed by a single '\0' after the last one
    // inputString is a pointer that stores the memory address of that first character
    // bonus: it can be indexed like an array starting at index 0
    // The solution function returns bool (true/1 for a palindrome, false/0 otherwise)
    int begin = 0;
    // The first element is at position 0, so that is where the 'counter' starts
    int end = (int)strlen(inputString) - 1;
    // The last element is at the length of the string minus 1, since indexing starts at 0
    while (end > begin) {
        if (inputString[begin++] != inputString[end--]) {
            return 0;
        }
    }
    return 1;
}
A string is also an array of characters. I think what you don't understand is the difference between a char pointer and a string. Let me explain with an example:
Imagine I have the following:
char str[20]="helloword";
In most expressions, str decays to the address of the first character of the array, here the address of 'h'. Now try printing the following:
printf("%c",str[0]);
You can see that it prints the element at that address, which is 'h'.
If I now declare a char pointer, it can point to whatever char address I want:
char *c_pointer = str+1;
Now print the element of c_pointer:
printf("%c",c_pointer[0]);
You can see that it prints 'e', since that is the element at the second address of the original string str.
In addition, what printf("%s", string) does is print every character from the starting address (string) up to, but not including, the first address whose element is '\0'.
The linked question/answers in the comments pretty much cover this, but saying the same thing a slightly different way helps sometimes.
A string literal is a quoted string in the source code, such as "hello"; here it is assigned to a char pointer. It is considered read-only: any attempt to modify it results in undefined behavior. I believe most implementations put string literals in read-only memory. IMO, it's a shortcoming of C (fixed in C++) that a const char* type isn't required for assigning a string literal. Consider:
int main(void)
{
char* str = "hello";
}
str points to a string literal. If you try to modify it like this:
#include <string.h>
...
str[2] = 'f'; // BAD, undefined behavior
strcpy(str, "foo"); // BAD, undefined behavior
you've broken the rules. String literals are read-only. In fact, you should get in the habit of assigning them to const char* so the compiler can warn you if you try to do something stupid:
const char* str = "hello"; // now you should get some compiler help if you
// ever try to write to str
The string "hello" resides somewhere in memory, and str points to it:
str
|
|
+-------------------> "hello"
If you assign a string to an array, things are different:
int main(void)
{
char str2[] = "hello";
}
str2 is not read only, you are free to modify it as you want. Just take care not to exceed the buffer size:
#include <string.h>
...
str2[2] = 'f'; // this is OK
strcpy(str2, "foo"); // this is OK
strcpy(str2, "longer than hello"); // this is _not_ OK, we've overflowed the buffer
In memory, str2 is an array
str2 = { 'h', 'e', 'l', 'l', 'o', '\0' }
and is present right there in automatic storage. It doesn't point to some string elsewhere in memory.
In most cases, str2 can be used as a char* because in C, in most contexts, an array decays to a pointer to its first element. So, you can pass str2 to a function with a char* argument. One instance where this is not true is with sizeof:
sizeof(str) // this is the size of a pointer (typically 4 or 8 depending on your
// architecture). It _does not matter_ how long the string that
// str points to is
sizeof(str2) // this is 6, string length plus the NUL terminator.
I want to pass a string to a C function through a pointer to char and modify it there, but I get a segmentation fault. Why?
Note: I know that storing the string in a character array instead solves the problem.
I tried storing it in a character array and passing the array name to the function, and it works, but I need to understand what the problem is with passing a pointer to char.
void convertToLowerCase(char* str){
int i=0;
while(str[i] != '\0')
{
if(str[i]>='A'&& str[i]<='Z'){
str[i]+=32;
}
i++;
}
}
int main(void){
char *str = "AHMEDROSHDY";
convertToLowerCase(str);
}
I expect str to become "ahmedroshdy", but the actual result is a segmentation fault.
This (you had char str* which is a syntax error, fixed that):
char *str = "AHMEDROSHDY";
is a pointer to a string literal, so it must not be modified; the literal is typically stored in read-only memory.
You modify it here, in str[i]+=32;, which is not allowed.
Use an array instead, as @xing suggested, i.e. char str[] = "AHMEDROSHDY";.
To be more precise:
char *str = "AHMEDROSHDY";
str is a pointer to the string literal, and string literals in C are not modifiable.
From the C standard's list of undefined behavior (Annex J.2):
The behavior is undefined in the following circumstances: ...
The program attempts to modify a string literal
Generally, you can initialize a pointer with any string literal, as in char *str = "Hello". I think this means "Hello" evaluates to the address of 'H'. However, the code below isn't allowed.
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char name[64];
} Student;
Student initialization(char *str) {
//Student tmp = {}; strcpy(tmp.name, str) //(*1)This is allowed.
//Student tmp = {"Hello"}; //(*2)This is allowed.
Student tmp = {str}; //(*3)This is not allowed.
return tmp;
}
int main(void) {
(...)
}
Could anyone tell me why (*2) is allowed but (*3) is not? Compiling this code produces the warning below.
warning: initialization makes integer from pointer without a cast [-Wint-conversion]
Student tmp = {str};
^
In all these cases you are trying to initialize a char array (the name member). With that in mind, things become easier. Just as with a plain char array, if you write a string literal directly in the initializer, as in (*2), the array is initialized with the content of the string literal.
In (*3), however, str is a pointer: the argument has already decayed to a pointer to the first character of whatever string the caller passed. A pointer cannot initialize a char array. Because of brace elision, the compiler instead tries to use str to initialize name[0], a single char, which is why it warns about making an integer from a pointer. Note that this would fail even if str pointed to a char array that is not a literal, for the same reason. The standard allows initializing a char array directly from a string literal, not from a pointer.
From the C standard, 6.7.9p14:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
I know the C language has dynamic length strings whereby it uses the special character null (represented as 0) to terminate a string - rather than maintaining the length.
I have this simple C code that creates a string with a null character at index 5:
#include <stdio.h>
#include <stdlib.h>
int main () {
char * s= "sdfsd\0sfdfsd";
printf("%s",s);
s[5]='3';
printf("%s",s);
return 0;
}
Thus, printing the string only outputs the characters before index 5. The code then changes the character at index 5 to '3'. Given my understanding, I assumed the second print would show the full string with the 3 in place of the null, so that the combined output would be:
sdfsdsdfsd3sfdfsd
but instead it outputs:
sdfsdsdfsd
Can someone explain this?
This program exhibits undefined behavior because it modifies a string literal, which must be treated as read-only. char* s = "..." makes s point to memory you must not modify; C++ (since C++11) disallows pointing a non-const char* at a string literal, but C still allows it, so we have to be careful (see this SO answer for more details and a C99 standard quote).
Change the assignment line to:
char s[] = "sdfsd\0sfdfsd";
This creates an array on the stack and copies the string into it as an initializer. In this case, modifying s[5] is valid and you get the result you expect.
String literals cannot be changed: the compiler typically puts them in a read-only data section (though this varies by platform), and in any case the effect of attempting to modify a string literal is undefined.
In your code:
char * s= "sdfsd\0sfdfsd";
Here, s is char pointer pointing to a string "sdfsd\0sfdfsd" stored in read-only memory, making it immutable.
Here you are trying to modify the content of read-only memory:
s[5]='3';
which leads to undefined behavior.
Instead, you can use char[]:
#include <stdio.h>
int main () {
char a[] = "sdfsd\0sfdfsd";
char * s = a;
printf("%s",s);
s[5]='3';
printf("%s\n",s);
return 0;
}
This operation has failed:
s[5] = '3';
You're trying to change a string literal, which must be treated as read-only. In my testing, the program exited with a segfault:
Segmentation fault (core dumped)
You should store it in an array (or allocated memory) before any attempts to change it:
char s[] = "sdfsd\0sfdfsd";
With the above change, the program works as intended.
#include <stdio.h>
int main(){
char x[10] = "aa\0a";
x[2] = '1';
puts(x);
printf("\n\n\nPress Enter to exit...");
getchar(); // standard replacement for the non-standard getch() from <conio.h>
return 0;
}
Output: aa1a
#include <string.h>
int main()
{
char *array[10]={};
char* token;
token = "testing";
array[0] = "again";
strcat(array[0], token);
}
Why does it cause a segmentation fault?
I'm a little confused.
Technically, this isn't valid C before C23. (It is valid C++, though.)
char *array[10]={};
You should use
char *array[10] = {0};
This declares an array of 10 pointers to char and initializes them all to null pointers.
char* token;
token = "testing";
This declares token as a pointer to char and points it at a string literal which is non-modifiable.
array[0] = "again";
This points the first char pointer of array at a string literal which (again) is a non-modifiable sequence of char.
strcat(array[0], token);
strcat concatenates one string onto the end of another string. For it to work the first string must be contained in writeable storage and have enough excess storage to contain the second string at and beyond the first terminating null character ('\0') in the first string. Neither of these hold for array[0] which is pointing directly at the string literal.
What you need to do is something like this. (You need to #include <string.h> and <stdlib.h>.)
I've gone for runtime calculation of sizes and dynamic allocation of memory, as I'm assuming you are testing for a case where the strings may not be of known size in the future. With the strings known at compile time you can avoid some (or most) of the work; but then you may as well write "againtesting" as a single string literal.
char* token = "testing";
char* other_token = "again";
/* Include extra space for string terminator */
size_t required_length = strlen(token) + strlen(other_token) + 1;
/* Dynamically allocate a big enough buffer */
array[0] = malloc( required_length );
if ( array[0] == NULL ) return 1; /* allocation failed */
strcpy( array[0], other_token );
strcat( array[0], token );
/* More code... */
/* Free allocated buffer */
free( array[0] );
How this works: char *array[10] is an array of 10 char * pointers (basically 10 same things as token).
token = "testing" refers to static storage for "testing" that is set up in your program's memory at build time. At run time, the address of that static "testing" is stored in token.
array[0] = "again" does basically the same thing.
Then strcat(array[0], token) takes the address stored in array[0] and tries to append token's content to the string at that address. That gives you a segfault, since array[0] points into the read-only data segment of your program's memory.
How to do this properly:
char * initial = "first"; // pointer to static "first" string
char * second = "another"; // another one
char string[20]; // local array of 20 bytes
strcpy(string, initial); // copies first string into your read-write memory
strcat(string, second); // adds the second string there
Actually, if you don't want to shoot yourself in the foot, the better way to do something like the last two lines is:
snprintf(string, sizeof(string), "%s%s", initial, second);
snprintf then makes sure that you don't write more than 20 bytes into string. strcat and strcpy would happily go past the limit into invalid memory and cause another run-time segfault, or something worse (think security exploits), if the copied string were longer than the destination space.
To create an array of characters, char *array[10]={}; should instead be char array[10]={};
The segmentation fault occurs because array[0] points to "again", a string literal, and modifying string literals is a no-no (undefined behavior).
If you're planning on changing the strings involved you should really allocate enough memory for what you need. For example instead of char *token; token = "testing"; you could use, say char token[20] = "testing";, which allows enough room for a 19 character string (plus the null byte at the end).
Similarly, you could use char array[10][20] = {"testing"}; to create an array of 10 strings and set the first one to testing.
If array is declared as char array[10], each element holds only one character, so you cannot put a string in array[0].
Assign a single character instead, like this: array[0] = 'a';