Related
As far as I know, a string literal can't be modified for example:
char* a = "abc";
a[0] = 'c';
That would not work since string literal is read-only. I can only modify it if:
char a[] = "abc";
a[0] = 'c';
However, in this post,
Parse $PATH variable and save the directory names into an array of strings, the first answer modified a string literal at these two places:
path_var[j]='\0';
array[current_colon] = path_var+j+1;
I'm not very familiar with C so any explanation would be appreciated.
In programming, there are quite a few rules that are up to you to follow, even though they are not — necessarily — enforced. And "String literals in C are not modifiable" is one of those. So is "Strings returned by getenv should not be modified".
There are some real-world analogies that apply. Here's one: If you're at an intersection, and the light is red, you're not supposed to cross. But, much of the time, if you break the rule, and cross, you might get away with it. You might get a ticket from a policeman — or you might not. You might cause a crash — or you might not. But if you get lucky, and neither of these things happens, that does not imply that crossing the intersection against the red light was okay — it's still quite true that it was very much against the rules.
Similarly, in C, if you write some code that modifies a string literal, or a string returned from getenv, you might get away with it. The compiler might give you a warning or error message — or it might not. Your program might crash — or it might not. But if the program seems to work, that does not imply that these strings are actually modifiable — they're not.
Code blocks from the post you linked:
const char *orig_path_var = getenv("PATH");
char *path_var = strdup(orig_path_var ? orig_path_var : "");
const char **array;
array = malloc((nb_colons+1) * sizeof(*array));
array[0] = path_var;
array[current_colon] = path_var+j+1;
First block:
In the 1st line getenv() returns a pointer to a string which is pointed to by orig_path_var. The string that get_env() returns should be treated as a read-only string as the behaviour is undefined if the program attempts to modify it.
In the 2nd line strdup() is called to make a duplicate of this string. The way strdup() does this is by calling malloc() and allocating memory for the size of the string + 1 and then copying the string into the memory.
Since malloc() is used, the string is stored on the heap, this allows us to edit the string and modify it.
Second block:
In the 1st line we can see that array points to a an array of char * pointers. There is nb_colons+1 pointers in the array.
Then in the 2nd line the 0th element of array is initilized to path_var (remember it is not a string literal, but a copy of one).
In the 3rd line, the current_colonth element of array is set to path_var+j+1. If you don't understand pointer arithmetic, this just means it assigns the address of the j+1th char of path_var to array[current_colon].
As you can see, the code is not operating on const string literals like orig_path_var. Instead it uses a copy made with strdup(). This seems to be where your confusion stems from so take a look at this:
char *strdup(const char *s);
The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).
The above text shows what strdup() does according to its man page.
It may also help to read the malloc() man page.
In the example
char* a = "abc";
the token "abc" produces a literal object in the program image, and denotes an expression which yields that object's address.
In the example
char a[] = "abc";
The token "abc" is serves as an array initializer, and doesn't denote a literal object. It is equivalent to:
char a[] = { 'a', 'b', 'c', 0 };
The individual character values of "abc" are literal data is recorded somewhere and somehow in the program image, but they are not accessible as a string literal object.
The array a isn't a literal, needless to say. Modifying a doesn't constitute modifying a literal, because it isn't one.
Regarding the remark:
That would not work since string literal is read-only.
That isn't accurate. The ISO C standard (no version of it to date) doesn't specify any requirements for what happens if a program tries to modify a string literal. It is undefined behavior. If your implementation stops the program with some diagnostic message, that's because of undefined behavior, not because it is required.
C implementations are not required to support string literal modification, which has the benefits like:
standard-conforming C programs can be translated into images that can be be burned into ROM chips, such that their string literals are accessed directly from that ROM image without having to be copied into RAM on start-up.
compilers can condense the storage for string literals by taking advantage of situations when one literal is a suffix of another. The expression "string" + 2 == "ring" can yield true. Since a strictly conforming program will not do something like "ring"[0] = 'w', due to that being undefined behavior, such a program will thereby avoid falling victim to the surprise of "string" unexpectedly turning into "stwing".
There are several reasons for which you had better not to modify them:
The first is that the operating system and/or the compiler can enforce the non-writable property of string literals, putting them in read-only memory (e.g. ROM) or in the .text segment.
second, the compiler is allowed to merge string literals together, so if you modify (and do it successfully) you can get surprises later because other literals (that have been merged because e.g. one of them is a suffix of the other) change apparently by no reason.
if you need an initialized string that is modifiable, you can do it by allocating an array with a declaration, as in (which you can freely modify):
char array[100] = "abc"; // initialized to { 'a' ,'b', 'c', '\0',
// /* and 96 more '\0' characters */
// };
there are many similar questions regarding this Topic, but they do not answer the following question:
Taking a swing
I am going to take a swing, if you want go straight to the question in the next heading. Please correct me if I make any wrong assumptions here.
Lets assume, I have this string declaration
char* cpHelloWorld = "Hello World!";
I understand the Compiler will make a char* to an anonymous Array stored somewhere in the Memory (by the way: where is it stored?).
If I have this declaration
char cHelloWorld[] = "Hello World!";
There will be no anonymous Array, as the Compiler will create the Array cHelloWorld right away.
The first difference between these two variables is that I can change cpHelloWorld, whereas the Array cHelloWorld is read-only, and I would have to redeclare it if I want to Change it.
My question is following
cpHelloWorld = "This is a pretty long Phrase, compared to the Hello World phrase above.";
How does my application allocate at runtime a new, bigger (anonymous) Array at runtime? Should I use this approach with the pointer, as it seems easier to use or are there any cons? On paper, I would have used malloc, if I had to work with dynamic Arrays.
My guess is that the Compiler (or runtime Environment?) creates a new anonymous Array every time I change the Content of my Array.
char* cpHelloWorld = "Hello World!";
is a String Literal stored in read-only memory. You cannot modify the contents of this string.
char cHelloWorld[] = "Hello World!";
is an array of char initialized to "Hello World!\0".
(note: where the brackets are placed)
The amount of memory allocated at run-time by the compiler is set by the initialization "This is a pretty long ... phrase above."; The compiler will initialize the literal allowing 1 char for each char in the initialization string +1 for the required nul-terminating character.
Whether you use a statically declared array (e.g. char my_str[ ] = "stuff";) or you seek to dynamically allocate storage for the characters, largely depends on whether you know what, and how much, of whatever you wish to store. Obviously, if you know beforehand what the string is, using a string literal or an initialized array of type char is a simple way to go.
However, when you do NOT know what will be stored, or how much, then declaring a pointer to char (e.g. char *my_string; and then once you have the data to store, you can allocate storage for my_string (e.g. my_string = malloc (len * sizeof *my_string); (of course sizeof *my_string will be 1 for character arrays, so that can be omitted) (note: parenthesis are required with sizeof (explicit type), e.g. sizeof (int), but are optional when used with a variable)
Then simply free whatever you have allocated when the values are no longer needed.
As a matter of fact all strings known to the compiler at compile-time are allocated in the data segment of the program. The pointer itself is located on the stack.
There is no memory allocation at run-time, so it is nothing like malloc. There are no performance drawbacks here.
Each of the constant "anonymous" strings used in these contexts exists at its fixed address. The only dynamic part is the actual pointer assignment. You should get the same string address each time you execute a specific pointer assignment from a specific anonymous string (each string has its own address).
I was doing some pointers and arrays practice in C and I noticed all my four methods returned the same answer.My question is are there disadvantages of using any one of my below methods? I am stunned at how all these four give me the same output. I just noticed you can use a pointer as if it was an array and you can also use an array as if it was a pointer?
char *name = "Madonah";
int i= 0;
for (i=0;i<7; i++){
printf("%c", *(name+i));
}
char name1 [7] = "Madonah";
printf("\n");
int j= 0;
for (j=0;j<7; j++){
printf("%c", name1[j]);
}
char *name2 = "Madonah";
printf("\n");
int k= 0;
for (k=0;k<7; k++){
printf("%c", name2[k]);
}
char name3 [7] = "Madonah";
printf("\n");
int m= 0;
for (m=0;m<7; m++){
printf("%c", *(name+m));
}
Results:
Madonah
Madonah
Madonah
Madonah
It is true that pointers and arrays are equivalent in some context, "equivalent" means neither that they are identical nor even interchangeable. Arrays are not pointers.
It is pointer arithmetic and array indexing that are equivalent, pointers and arrays are different.
which one is preferable and the advantages/Disadvantages?
It depends how you want to use them. If you do not wanna modify string then you can use
char *name = "Madonah";
It is basically equivalent to
char const *name = "Madonah";
*(name + i) and name[i] both are same. I prefer name[i] over *(name + i) as it is neat and used most frequently by C and C++ programmers.
If you like to modify the string then you can go with
char name1[] = "Madonah";
In C, a[b], b[a], *(a+b) are equivalent and there's no difference between these 3. So you only have 2 cases:
char *name = "Madonah"; /* case 1 */
and
char name3 [7] = "Madonah"; /* case 2 */
The former is a pointer which points to a string literal. The latter is an array of 7 characters.
which one is preferred depends on your usage of name3.
If you don't intend to modify then the string then you can use (1) and I would also make it const char* to make it clear and ensure the string literal is not modified accidentally. Modifying string literal is undefined behaviour in C
If you do need to modify it then (2) should be used as it's an array of characters that you can modify. One thing to note is that in case (2), you have explicitly specified the size of the array as 7. That means the character array name3 doesn't have a null terminator (\0) at the end. So it can't be used as a string. I would rather not specify the size of the array and let the compiler calculate it:
char name3 [] = "Madonah"; /* case 2 */
/* an array of 8 characters */
Just in addition to what others said, I will add an image for better illustration. If you have
char a[] = "hello";
char *p = "world";
What happens in first case enough memory is allocated for a (6 characters) on the stack usually, and the string "hello" is copied to memory which starts at a. Hence, you can modify this memory region.
In the second case "world" is allocated somewhere else(usually in read only region), and a pointer to that memory is returned which is simply stored in p. You can't modify the string literal in this case via p.
Here is how it looks:
But for your question stick to notation which is easier, I prefer [].
More info on relationship between arrays and pointers is here.
In all cases, in C pointer + index is the same as pointer[index]. Also, in C an array name used in an expression is treated as a pointer to the first element of the array. Things get a little more mystifying when you consider that addition is commutative, which makes index + pointer and index[pointer] legal also. Compilers will usually generate similar code no matter how you write it.
This allows cool things like "hello"[2] == 'l' and 2["hello"] == 'l'.
it is convenient to use array syntax for random access
int getvalue(int *a, size_t x){return a[x];}
and pointer arithmetic syntax for sequential access
void copy_string(char *dest, const char *src){while((*dest++=*src++));}
Many compilers can optimize sequential accesses to pointers better than using array indices.
For practicing pointer and array idioms in C, those are all valid code blocks and illustrative of the different ways of expressing the same thing.
For production code, you wouldn't use any of those; you would use something both easier for humans to read and more likely to be optimized by the compiler. You'd dispense with the loop entirely and just say:
printf("%s", name);
(Note that this requires name include the \0 character at the end of the string, upon which printf relies. Your name1 and name3 definitions, as written, do not allocate the necessary 8 bytes.)
If you were trying to do something tricky in your code that required one of the more verbose methods you posted, which method you chose would depend on exactly what tricky thing you were trying to do (which you would of course explain in code comments -- otherwise, you would look at your own code six months later and ask yourself, "What the heck was I doing here??"). In general, for plucking characters out of a string, name[i] is a more common idiom than *(name+i).
char name[] = "Madonah";
char* name = "Madonah";
When declared inside a function, the first option yields additional operations every time the function is called. This is because the contents of the string are copied from the RO data section into the stack every time the function is called.
In the second option, the compiler simply sets the variable to point to the address of that string in memory, which is constant throughout the execution of the program.
So unless you're planning to change the contents of the local array (not the original string - that one is read-only), you may opt for the second option.
Please note that all the details above are compiler-implementation dependent, and not dictated by the C-language standard.
can I know which one is easy to use [...]
Try sticking to use the indexing operator [].
One time your code gets more complex and you then notice that things are "more easy" to code using pointer arithmetical expressions.
This typically will be the case when you also start to use the address-of operator: &.
If for example you see your self in the need to code:
char s[] = "Modonah";
char * p = &s[2]; /* This gives you the address of the 'd'. */
you will soon notice that it's "easier" to write:
char * p = s + 2; /* This is the same as char * p = &s[2];. */
An array is just a variable that contains several values, each value has an index, you probably know that already.
Pointers are one of those things you dont need to know about until you realize you need to use them. Pointers are not variables themselves they are literally pointers to variables.
An example of how and why you might want to use a pointer.
You create a variable in one function which you want to use in another function.
You could pass your variable to the new function in the function header.
This effectively COPIES the values from the original variable to a new variable local to the new function. Changes made to it in the new function only change the new variable in the new function.
But what if you wanted changes made in the new function to change the original variable where it is in the first function ?
You use a pointer.
Instead of passing the actual variable to the new function, you pass a pointer to the variable. Now changes made to the variable in the new function are reflected in the original variable in the first function.
This is why in your own example using the pointer to the array and using the actual array while in the same function has identical results. Both of them are saying "change this array".
I often want to do something like this:
unsigned char urlValid[66] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
...or:
unsigned char listOfChar[4] = "abcd";
...that is, initialize a character array from a string literal and ignoring the null terminator from that literal. It is very convenient, plus I can do things like sizeof urlValid and get the right answer.
But unfortunately it gives the error initializer-string for array of chars is too long.
Is there a way to either:
Turn off errors and warnings for this specific occurrence (ie, if there's no room for null terminator when initialising a char array)
Do it better, maintaining convenience and readability?
You tagged your question as both C and C++. In reality in C language you would not receive this error. Instead, the terminating zero simply would not be included into the array. I.e. in C it works exactly as you want it to work.
In C++ you will indeed get the error. In C++ you'd probably have to accommodate the terminating zero and remember to subtract 1 from the result of sizeof.
Note also, that as #Daniel Fischer suggested in the comments, you can "decouple" definition from initialization
char urlValid[66];
memcpy(urlValid, "ab...", sizeof urlValid);
thus effectively simulating the behavior of C language in C++.
Well, in C++ you should always use std::string. It's convenient and not prone to memory leaks etc.
You can, however, initialize an array without specifying the size:
char urlValid[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
This works since the compiler can deduce the correct size from the string literal. Another advantage is that you don't have to change the size if the literal changes.
Edit:You should not use unsigned char for strings.
Initialise with an actual array of chars?
char urlValid[] = {'a','b','c','d','e','f',...};
there are two simple solutions.
You can either add an extra element into the array like this:
unsigned char urlValid[67] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
or leave out the size of the array all together:
unsigned char urlValid[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";
char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is
the string is broken into substrings,
each terminating with a '\0', where
the '\0' replaces any characters
contained in string s2. The first call
uses the string to be tokenized as s1;
subsequent calls use NULL as the first
argument. A pointer to the beginning
of the current token is returned; NULL
is returned if there are no more
tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, becasue strtok keeps a reference to the rest of the string, it's not reeentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value that x points can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and NULL. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only regions of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is a option you can set to make those strings writable.