Below is the implementation of strlen.c from "The Standard C Library":
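#include <string.h>
size_t strlen(const char *s)
{
    const char *sc;
    for (sc = s; *sc != '\0'; ++sc)
        ;   /* empty loop body: just advance sc to the terminating '\0' */
    return (sc - s);   /* number of characters before the '\0' */
}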
Is my understanding of the legality of sc = s correct?
sc=s is a legal assignment because, since both variables are declared as const, both protect the object that is pointed to by s. In this case, it is legal to change where sc or s point to, but any assignment (or reference?) to *s or sc would be illegal.
I think what you are asking is what the const keyword means. If not please clarify your question.
The way I like to think of it is any const variable can be stored in ROM (Read Only Memory) and variables that are not declared const can be stored in RAM (Random Access Memory). This kind of depends on the kind of computer you are working with so the const data may not actually be stored in ROM but it could be.
So you can do anything you want with the pointer itself but you can not change the data in the memory it points to.
This means you can reference the pointer and pass that around as much as you like. Also you can assign a different value to the pointer.
Say you have this code
const char* foo = "hello";
const char* bar = "world";
It's perfectly legal to do
foo = bar;
Now both point to "world".
It's also legal to do
const char *myPtr = bar;
myPtr = foo;
What you are not allowed to do is change the actual data in memory, so you are not allowed to do
foo[0] = 'J';
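A minimal sketch of the assignments above, wrapped in a hypothetical function reassign_demo so it can be compiled and checked; the commented-out line is the one a conforming compiler must reject.

```c
/* Hypothetical helper: performs the legal pointer assignments from above
   and returns where foo/myPtr end up pointing. */
const char *reassign_demo(void)
{
    const char *foo = "hello";
    const char *bar = "world";

    foo = bar;               /* OK: foo now points to "world" as well */
    const char *myPtr = bar; /* OK: another pointer to the same data */
    myPtr = foo;             /* OK: a pointer to const can be reassigned */

    /* foo[0] = 'J'; */      /* not allowed: *foo is const, compile error */

    return myPtr;            /* string literals have static lifetime */
}
```

Returning the pointer is safe here because it points at a string literal, which lives for the whole program.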
You are correct.
const char * sc declares a pointer to a const char. In essence, it means that sc points to a variable of type char (or in this case, a contiguous array of chars) and that you cannot use sc to modify the pointed-to variable.
Note that sc itself is not a const variable. The const applies to the pointed variable, and not to the pointer. You can thus change the value of the pointer, i.e. the variable to which it points.
For more insight into the different uses of const with pointers, see this question: What is the difference between const int*, const int * const, and int const *?
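To make the point that sc itself is not const concrete, here is a sketch of the same loop under a hypothetical name, my_strlen, so it does not clash with the library's strlen. The pointer is moved freely; only writing through it is forbidden.

```c
#include <stddef.h>

/* Hypothetical my_strlen: same logic as the strlen above, renamed to
   avoid clashing with the standard library's strlen. */
size_t my_strlen(const char *s)
{
    const char *sc = s;      /* OK: copying the address is allowed */

    while (*sc != '\0')      /* OK: reading through sc is allowed */
        ++sc;                /* OK: sc itself is not const, so it can move */
    /* *sc = 'x'; */         /* would not compile: *sc is read-only */

    return (size_t)(sc - s); /* distance from start to the '\0' */
}
```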
Is my understanding of the legality of sc = s correct?
Yes; only the last part needs some detail.
... but any assignment (or reference?) to *s or sc would be illegal.
(I suspect OP means "... or *sc would be illegal.")
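Reading what s or sc points to is OK, as in char ch = *sc;
Writing through them directly, as in *sc = 'x';, violates a constraint of the language: the compiler must issue a diagnostic, and most compilers reject it outright. If the const is cast away, as in *(char *)sc = 'x';, the behavior is undefined (UB) whenever the pointed-to object was itself defined const or is a string literal.
(See good additional detail by @rici)
With UB, the assignment may work, it might not on Tuesdays, code may crash, etc. It is not defined by C what happens. Certainly, code should not attempt it.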
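Whether a write through a const-qualified pointer is defined depends on the object behind it. A minimal sketch, assuming a hypothetical helper overwrite_first that casts the const away: calling it on a modifiable array is well defined, while calling it on a string literal would be UB.

```c
/* Hypothetical helper: writes through a pointer whose const was cast away.
   Only defined if the object pointed to was not itself defined const and
   is not a string literal. */
void overwrite_first(const char *sc)
{
    *(char *)sc = 'x';
}
```

For example, char buf[] = "abc"; overwrite_first(buf); is fine because buf is a modifiable array, whereas overwrite_first("abc"); is undefined behavior.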
Still learning more C and am a little confused. In my references I find cautions about assigning a pointer that has not been initialized. They go on to give examples. Great answers yesterday by the way from folks helping me with pointers, here:
Precedence, Parentheses, Pointers with iterative array functions
On follow up I briefly asked about the last iteration of the loop and potentially pointing the pointer to a non-existent place (i.e. because of my references cautioning against it). So I went back and looked more and find this:
If you have a pointer
int *pt;
then use it without initializing it (i.e. I take this to mean without a statement like *pt= &myVariable):
*pt = 606;
you could end up with a really bad day, depending on where in memory this pointer happens to point. The part I'm having trouble with is that, when working with a string of characters, something like this would be OK:
char *str = "Sometimes I feel like I'm going crazy.";
Where the reference says, "Don't worry about where in the memory the string is allocated; it's handled automatically by the compiler". So there is no need to initialize it with *str = &str[0]; or *str = str;. Does that mean the compiler is automatically doing char str[n]; in the background?
Why is it that this is handled differently? Or, am I completely misunderstanding?
In this case:
char *str = "Sometimes I feel like I'm going crazy.";
You're initializing str to contain the address of the given string literal. You're not actually dereferencing anything at this point.
This is also fine:
char *str;
str = "Sometimes I feel like I'm going crazy.";
Because you're assigning to str and not actually dereferencing it.
This is a problem:
int *pt;
*pt = 606;
Because pt is not initialized and then it is dereferenced.
You also can't do this for the same reason (plus the types don't match):
*pt= &myVariable;
But you can do this:
pt= &myVariable;
After which you can freely use *pt.
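A compilable sketch of that fix, wrapped in a hypothetical function set_through_pointer: the pointer is assigned first (no *), and only then dereferenced.

```c
/* Hypothetical helper: points pt at a real int before dereferencing it. */
int set_through_pointer(void)
{
    int myVariable = 0;
    int *pt;

    pt = &myVariable;   /* assign the pointer itself: no '*' */
    *pt = 606;          /* now safe: writes into myVariable */

    return myVariable;
}
```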
When you write sometype *p = something;, it's equivalent to sometype *p; p = something;, not sometype *p; *p = something;. That means when you use a string literal like that, the compiler figures out where to put it and then puts its address there.
The statement
char *str = "Sometimes I feel like I'm going crazy.";
is equivalent to
char *str;
str = "Sometimes I feel like I'm going crazy.";
Simplifying, the string literal can be thought of as:
const char literal[] = "Sometimes I feel like I'm going crazy.";
so the expression
char *str = "Sometimes I feel like I'm going crazy.";
is logically equivalent to:
const char literal[] = "Sometimes I feel like I'm going crazy.";
const char *str = literal;
Of course, literals do not have names.
But you can't dereference a char pointer that does not point to memory allocated for an actual object.
/* Wrong */
char *c;
*c = 'a';
/* Wrong - you assign the pointer with the integer value */
char *d = 'a';
/* Correct */
char *d = malloc(1);
*d = 'a';
/* Correct */
char x;
char *e = &x;
*e = 'b';
The same examples with int:
/* Wrong - you assign the pointer with the integer value */
int *p = 666;
/* Wrong - you dereference a pointer that does not point to allocated memory */
int *r;
*r = 666;
/* Correct */
int *s = malloc(sizeof(*s));
*s = 666;
/* Correct */
int t;
int *u = &t;
*u = 666;
And finally, something similar to string literals: compound literals.
/* Correct */
int *z = (int[]){666,567,234};
z[2] = 0;
*z = 5;
/* Correct */
const int *w = (const int[]){666,567,234}; /* read-only: w[0] = 1; would not compile */
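A runnable sketch of the compound-literal examples (C99 or later), in a hypothetical function compound_literal_demo. Note that inside a function a compound literal has automatic storage, like a local array, so pointers to it are only valid until the end of the enclosing block.

```c
/* Hypothetical helper exercising modifiable and read-only compound literals. */
int compound_literal_demo(void)
{
    int *z = (int[]){666, 567, 234};              /* modifiable array */
    z[2] = 0;
    *z = 5;                                       /* same as z[0] = 5 */

    const int *w = (const int[]){666, 567, 234};  /* read-only array */
    /* w[0] = 1; */                               /* would not compile */

    return z[0] + z[1] + z[2] + w[0];             /* 5 + 567 + 0 + 666 */
}
```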
Good job on coming up with that example. It does a good job of showing the difference between declaring a pointer (like char *text;) and assigning to a pointer (like text = "Hello, World!";).
When you write:
char *text = "Hello!";
it is essentially the same as saying:
char *text; /* Note the '*' before text */
text = "Hello!"; /* Note that there's no '*' on this line */
(Just so you know, the first line can also be written as char* text;.)
So why is there no * on the second line? Because text is of type char*, and "Hello!" (an array of char) is converted to a char* pointing at its first character. There is no disagreement here.
Also, the following three lines are identical, as far as the compiler is concerned:
char *text = "Hello!";
char* text = "Hello!";
char * text = "Hello!";
The placement of the space before or after the * makes no difference. The second line is arguably easier to read, as it drives the point home that text is a char*. (But be careful! This style can burn you if you declare more than one variable on a line!)
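The multiple-declaration pitfall mentioned above, as a small sketch with a hypothetical function size_of_second: the * belongs to a, not to the type, so b is a plain char.

```c
#include <stddef.h>

/* Hypothetical helper showing that in "char* a, b;" only a is a pointer. */
size_t size_of_second(void)
{
    char* a, b;     /* a is a char*, but b is a plain char */
    b = 'x';        /* fine: b is a char */
    a = &b;         /* fine: a is a pointer to char */

    return (*a == 'x') ? sizeof b : 0;  /* sizeof(char), not sizeof(char *) */
}
```

Writing char *a, *b; (or one declaration per line) avoids the surprise.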
As for:
int *pt;
*pt = 606; /* Unsafe! */
you might say that *pt is an int, and so is 606, but it's more accurate to say that pt (without a *) is a pointer to memory that should contain an int. Whereas *pt (with a *) refers to the int inside the memory that pt (without the *) is pointing to.
And since pt was never initialized, using *pt (either to assign to it or to read from it) is unsafe.
Now, the interesting part about the lines:
int *pt;
*pt = 606; /* Unsafe! */
is that they'll compile (although possibly with a warning). That's because the compiler sees *pt as an int, and 606 as an int as well, so there's no disagreement. However, as written, the pointer pt doesn't point to any valid memory, so assigning to *pt will likely cause a crash, or corrupt data, or usher in the end of the world, etc.
It's important to realize that *pt is not a variable (even though it is often used like one). *pt just refers to the value in the memory whose address is contained in pt. Therefore, whether *pt is safe to use depends on whether pt contains a valid memory address. If pt isn't set to valid memory, then the use of *pt is unsafe.
So now you might be wondering: What's the point of declaring pt as an int* instead of just an int?
It depends on the case, but in many cases, there isn't any point.
When programming in C and C++, I use the advice: If you can get away with declaring a variable without making it a pointer, then you probably shouldn't declare it as a pointer.
Very often programmers use pointers when they don't need to. At the time, they aren't thinking of any other way. In my experience, when it's brought to their attention to not use a pointer, they will often say that it's impossible not to use a pointer. And when I prove them otherwise, they will usually backtrack and say that their code (which uses pointers) is more efficient than the code that doesn't use pointers.
(That's not true for all programmers, though. Some will recognize the appeal and simplicity of replacing a pointer with a non-pointer, and gladly change their code.)
I can't speak for all cases, of course, but C compilers these days are usually smart enough to compile both pointer code and non-pointer code to be practically identical in terms of efficiency. Not only that, but depending on the case, non-pointer code is often more efficient than code that uses pointers.
There are 4 concepts which you have mixed up in your example:
declaring a pointer. int *p; or char *str; are declarations of the pointers
initializing a pointer at declaration. char *str = "some string"; declares the pointer and initializes it.
assigning a value to the pointer. str = "other string"; assigns a value to the pointer. Similarly p = (int*)606; would assign the value of 606 to the pointer. Though, in the first case the value is legal and points to the location of the string in static memory. In the second case you assign an arbitrary address to p. It might or might not be a legal address. So, p = &myint; or p = malloc(sizeof(int)); are better choices.
assigning a value to what the pointer points to. *p = 606; assigns the value to the 'pointee'. Whether that is OK depends on whether the value of the pointer p is valid. If you did not initialize the pointer, the behavior is undefined (it may seem to work if you are lucky :-)).
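The four concepts side by side, as a sketch in a hypothetical function four_concepts. It uses const char * for the string pointer, since string literals must not be modified, and shows both &myint and malloc as valid targets.

```c
#include <stdlib.h>

/* Hypothetical helper walking through the four steps listed above. */
int four_concepts(void)
{
    int myint = 0;
    int *p;                           /* 1. declaration: no valid address yet */
    const char *str = "some string";  /* 2. declaration with initialization */

    str = "other string";             /* 3. assign the pointer: another literal */
    p = &myint;                       /* 3. assign the pointer: points to myint */
    *p = 606;                         /* 4. assign the pointee: writes myint */

    int *q = malloc(sizeof *q);       /* 3. heap memory is also a valid target */
    if (q == NULL)
        return -1;
    *q = myint + (str[0] == 'o');     /* 606 + 1 */

    int result = *q;
    free(q);
    return result;
}
```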
Many good explanations over here. The OP has asked
Why is it that this is handled differently?
It is a fair question: he means why, not how.
Short answer
It is a design decision.
Long answer
When you use a literal in an assignment, the compiler has two options: either it places the literal in the generated assembly instruction (maybe allowing variable-length assembly instructions to accommodate different literal byte lengths), or it places the literal somewhere the CPU can reach it (memory, registers...). For ints, it seems a good choice to place them in the assembly instruction, but for strings... almost all strings used in programs (?) are too long to be placed in an assembly instruction. Given that arbitrarily long assembly instructions are bad for general-purpose CPUs, C designers have decided to optimize this use case for strings and save the programmer one step by allocating memory for him. This way, the behaviour is consistent across machines.
Counterexample
Just to see that, for other languages, this is not necessarily the case, consider Python. There, int constants are objects placed in memory, and every object has an id. So if you get the id of two different variables that were assigned the same literal, you will often get the same id, since they refer to the same constant object already placed in memory by the interpreter (this is common in CPython but not guaranteed for every value). It is useful to stress that in Python, the id is equivalent to an address in Python's abstract machine.
Each byte of memory is stored in its own numbered pigeon-hole. That number is the "address" of that byte.
When your program compiles, it builds up a data-table of constants. At run-time these are copied into memory somewhere. So upon execution, in memory is the string (here at the 100,000th byte):
#100000 Sometimes I feel like I'm going crazy.\0
The compiler has generated code, such that when the variable str is created, it is automatically initialised with the address of where that string came to be stored. So in this example's case, str -> 100000. This is where the name pointer comes from, str does not actually contain that string-data, it holds the address of it (i.e. a number), "pointing" to it, saying "that piece of data at this address".
So if str was treated like an integer, it would contain the value 100000.
When you dereference a pointer, like *str = '\0', it's saying: The memory str points at, put this '\0' there.
So when the code defines a pointer, but without any initialisation, it could be pointing anywhere, perhaps even to memory the executable doesn't own (or owns, but can't write to).
For example:
int *pt;   // not initialised: what does 'pt' point at?
It does not hold a valid address. So if the code tries to dereference it, it could be pointing anywhere in memory, and the result is undefined.
But the case of:
int number = 605;
int *pt = &number;
*pt = 606;
Is perfectly valid, because the compiler has generated some space for the storage of number, and now pt contains the address of that space.
So when we use the address-of operator & on a variable, it gives us the number in memory where the variable's content is stored. So if the variable number happened to be stored at byte 100040:
int number = 605;
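printf( "Number is stored at %p\n", (void *)&number );
We would get output along the lines of the following (the exact format of %p is implementation-defined, and is usually hexadecimal):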
Number is stored at 100040
Arrays behave similarly. They are not pointers, but an array name used in an expression converts to the address of its first element.
// words (after conversion), words_ptr1 and words_ptr2 all end up being the same address
char words[] = "Sometimes I feel like I'm going crazy.";
char *words_ptr1 = &(words[0]);
char *words_ptr2 = words;
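A checkable sketch of that claim, in a hypothetical function same_address. It also shows the one place where the array is clearly not a pointer: sizeof measures the whole array.

```c
/* Hypothetical helper: returns 1 if all three expressions give the address
   of words[0] and sizeof sees the full array rather than a pointer. */
int same_address(void)
{
    char words[] = "Sometimes I feel like I'm going crazy.";
    char *words_ptr1 = &(words[0]);
    char *words_ptr2 = words;     /* array converts to a pointer to words[0] */

    if (sizeof words != sizeof "Sometimes I feel like I'm going crazy.")
        return 0;
    return words_ptr1 == words_ptr2 && words_ptr2 == words;
}
```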
There are answers here with very good and detailed information.
I will post another answer, perhaps targeted more directly at the OP.
Rephrasing it a bit:
Why is
int *pt;
*pt = 606;
not ok (non working case), and
char *str = "Sometimes I feel like I'm going crazy.";
is ok (working case)?
Consider that:
char *str = "Sometimes I feel like I'm going crazy.";
is equivalent to
char *str;
str = "Sometimes I feel like I'm going crazy.";
The closest "analogous", working case for int is (using a compound literal instead of a string literal)
int *pt = (int[]){ 686, 687 };
or
int *pt;
pt = (int[]){ 686, 687 };
So, the differences with your non-working case are three-fold:
Use pt = ... instead of *pt = ...
Use a compound literal, not a value (by the same token, str = 'a' wouldn't work).
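Compound literals are not always safe to use this way: at file scope a compound literal has static storage duration, but inside a function it lives only until the end of the enclosing block, so pt must not be used after that.
Also, compound literals are not standard C++; g++ accepts them as an extension but may reject the use above with the error "taking address of temporary array".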
A string variable can be declared either as an array of characters char txt[] or using a character pointer char* txt. The following illustrates the declaration and initialization of a string:
char* txt = "Hello";
In fact, as illustrated above, txt is a pointer to the first character of the string literal.
Whether we are able to modify (read/write) a string variable or not, depends on how we declared it.
6.4.5 String literals (ISO)
6. It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Actually, if we declare txt as above, the compiler typically places the string literal in a read-only data section such as .rodata (this is platform dependent), even though txt is not declared as const char*. So we cannot modify it, and we should not even try. With -Wwrite-strings, gcc warns about this declaration, and with -Werror the warning becomes a compile failure. It is therefore better to declare the pointer as a pointer to const:
const char* txt = "Hello";
On the other hand, we can declare a string variable as an array of characters:
char txt[] = "Hello";
In that case, the compiler will arrange for the array to get initialized from the string literal, so you can modify it.
Note: An array of characters can be used as if it were a pointer to its first character. That's why we can use the txt[0] or *txt syntax to access the first character. The conversion to a pointer even happens implicitly, so the cast below is optional:
char txt[] = "Hello";
char* ptxt = (char*) txt;   /* same as: char* ptxt = txt; */
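A small sketch contrasting the two declarations, in a hypothetical function modify_copy: the array holds its own copy of "Hello" and can be changed through a pointer, while the pointer to the literal is only read.

```c
/* Hypothetical helper: writing through a pointer into a char array is fine;
   a pointer to the string literal itself must stay read-only. */
char modify_copy(void)
{
    char txt[] = "Hello";          /* modifiable copy of the literal */
    char *ptxt = txt;              /* implicit array-to-pointer conversion */
    const char *lit = "Hello";     /* points at the literal itself */

    ptxt[0] = 'J';                 /* OK: changes txt[0] */
    /* ((char *)lit)[0] = 'J'; */  /* undefined behaviour: literal */

    return (lit[0] == 'H') ? txt[0] : '?';
}
```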
I have written the simple C code shown below. With this snippet I want to verify where the const string abcd is stored. I first guessed that it would be stored in the .data section, as read-only data. After a test on Debian, however, things are different from what I initially guessed: by checking the assembly code generated by gcc, I find it is placed in the stack frame of function p. But when I try it later on OS X, the string is stored in the .data section again. Now I am confused by this. Is there any standard for where const strings are stored?
#include<stdio.h>
char *p()
{
char p[] = "abcd";
return p;
}
int main()
{
char *pp = p();
printf("%s\n",pp);
return 0;
}
UPDATE: rici's answer set me straight. On OS X, the initial literal is stored in .data and then copied into the function's stack frame, so the array becomes a local variable of this function. gcc on Debian handles this situation differently: it stores the literal's characters directly into the stack frame instead of copying them from .data. I'm sorry for my carelessness.
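In your case, the array is located on the stack, and returning a pointer to it to main causes undefined behavior. But if you have static char p[] = "abcd"; or char *p = "abcd";, the data has static storage duration and is typically placed in .data (or, for a string literal, often in a read-only section such as .rodata).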
There is a huge difference between:
const char s[] = "abcd";
and
const char* t = "abcd";
The first of these declares s to be an array object initialized from the string "abcd". s will have an address distinct from that of any other object in the program. The character string itself might be a compile-time artifact; the initialization is a copy so the character string does not need to be present at runtime if the compiler can find some other way of performing the initialization (such as a store immediate operation).
The second declaration declares t to be a pointer to a string constant. The string constant now must be present at runtime, because expressions like t+1, which are pointers inside the string, are valid. The language standard does not guarantee that every occurrence of string literals in the program is unique, nor does it guarantee that all occurrences are merged (although good compilers will try to do the second.) It does, however, guarantee that they have static lifetime.
Consequently, this is undefined behaviour, because the lifetime of the array s ends when the function returns:
const char *gimme_a_string() {
const char s[] = "abcd";
return s;
}
However, this is fine:
const char *gimme_a_string() {
const char *s = "abcd";
return s;
}
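Two ways to return a usable string, sketched as hypothetical functions from_literal and from_static_array: both rely on static lifetime, either of the literal itself or of an array declared static.

```c
/* Returns a pointer to a string literal: valid for the whole program. */
const char *from_literal(void)
{
    return "abcd";
}

/* Returns a pointer to a static array: it outlives every call, and every
   call returns the same object. */
const char *from_static_array(void)
{
    static const char s[] = "abcd";
    return s;
}
```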
Also:
const char s[] = "abcd";
const char t[] = "abcd";
printf("%d\n", s == t);
is guaranteed to print 0, while
const char* s = "abcd";
const char* t = "abcd";
printf("%d\n", s == t);
might print either 0 or 1, depending on the implementation. (As written, it will almost certainly print 1. However, if the two declarations are in separate compilation units and link-time optimization (LTO) is not enabled, it is likely to print 0.)
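The same comparison wrapped in two hypothetical functions, so the guarantee and the non-guarantee can be checked separately: distinct arrays never compare equal, while pointers to equal literals may or may not.

```c
/* Two array objects are always distinct, so their addresses differ. */
int arrays_are_distinct(void)
{
    const char s[] = "abcd";
    const char t[] = "abcd";
    return s != t;           /* always 1 */
}

/* Equal string literals may or may not share storage. */
int literals_may_share(void)
{
    const char *s = "abcd";
    const char *t = "abcd";
    return s == t;           /* 0 or 1, implementation-dependent */
}
```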
Since the array form is initialized with a copy the non-const version is fine:
char s[] = "abcd";
s[3] = 'C';
But the char pointer version should point to const, so that an attempt to modify the literal is rejected at compile time instead of causing undefined behaviour.
// Will produce a warning on most compilers with compile option -Wall or equivalent
char* s = "abcd";
// *** UNDEFINED BEHAVIOUR *** Can cause random program breakage
s[3] = 'C';
Technically, the non-const declaration of s is legal (which is why the compiler only warns) because it is the attempt to modify the constant which is UB. But you should always heed compiler warnings; it is better to think of the declaration / initialization as wrong, because it is.
In the following program, p is declared as a pointer (the pointer is constant, but the string is not). But still the program does not work and stops abruptly, saying "untitled2.exe has stopped working".
#include<stdio.h>
#include<stdlib.h>
int main(){
char * const p = "hello";
*p = 'm';
return 0;
}
Why this unexpected behaviour?
Although p itself is declared as a pointer to a non-const object, it points to a string literal. A string literal is an object which, although not const-qualified with regard to its type, is immutable.
In other words, p is pointing to an object which is not const, but behaves as if it were.
Read more on ANSI/ISO 9899:1990 (C90), section 6.1.4.
You are getting a Windows error because you are invalidly accessing memory. On other systems you might get a SEGFAULT or SEGV or a Bus error.
*p = 'm';
tries to change the first letter of the string literal "hello" from 'h' to 'm'.
char * const p = "hello";
defines a constant pointer p and initialises it with the address of the string literal "hello". In C a string literal has type char[6] (not const char[6], as it would in C++), so no const qualifier is discarded and the initialization is valid C without a cast. The array must nevertheless not be modified, so writing through p leads to undefined behaviour.
Mind that const char * forbids you to modify the contents of the memory being pointed to, but does not forbid changing the address, while char * const permits you to modify the contents but fixes the address. There is also a combined version, const char * const, which fixes both.
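A sketch of what each const placement permits, using writable arrays so that the allowed writes are actually defined. The function const_placement is hypothetical; the commented-out lines are the ones the compiler rejects.

```c
/* Hypothetical helper contrasting the three const placements. */
char const_placement(void)
{
    char buf1[] = "hello", buf2[] = "world";

    const char *a = buf1;         /* pointer to const char */
    a = buf2;                     /* OK: the pointer may move */
    /* a[0] = 'W'; */             /* error: chars are read-only through a */

    char *const b = buf1;         /* const pointer to char */
    b[0] = 'J';                   /* OK: buf1 is writable */
    /* b = buf2; */               /* error: b cannot be re-pointed */

    const char *const c = buf2;   /* both the pointer and the chars fixed */
    /* c = buf1; c[0] = 'x'; */   /* both are errors */

    return a[0] == c[0] ? b[0] : '?';
}
```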
Although this is valid C code, whether "hello" ends up in writable memory depends on your OS and toolchain; the C standard simply leaves writing to it undefined. As a rule of thumb, string literals are stored in a read-only part of the executable image, so attempting to write to *p typically gives you a memory-protection error such as SIGSEGV.
The correct way is to copy the contents of the string into an array (here an automatic one, on the stack) and work there:
char p[] = "hello";
Now you can modify *p because it is located on the stack which is read/write. If you require the same globally then put it into the global scope.
Here's my code:
char *const p1 = "john";
p1[2] = 'z'; //crashes
printf("%s\n", p1);
I know p1 is a "read-only" variable, but I thought I could still modify the string ("john"). I appreciate any tips or advice.
You cannot safely modify string literals, even if the pointer doesn't look const. They will often be allocated in read-only memory, hence your crashes - and when they're not in read-only memory modifying them can have unexpected consequences.
If you copy to an array this should work:
char tmp[] = "john";
char *const p1 = tmp;
p1[2] = 'z'; // ok
The keyword const means that whatever it qualifies should stay constant; in char *const p1 that is the pointer p1 itself, not the characters it points to.
Also, even if you declare this string as char *p1 = "john";, it would still be a string literal, and changing it would cause undefined behaviour. You should declare it as char p1[] = "john"; to get the behaviour you are looking for.
1) Forget the "const" qualifier. This is WRONG, in C and C++, on ANY platform:
char *p1 = "john";
p1[2] = 'z'; //crashes
Here's why:
Why do I get a segmentation fault when writing to a string initialized with "char *s" but not "char s[]"?
http://c-faq.com/decl/strlitinit.html
2) So how do you get around the access violation?
Simple: you allocate writable memory (instead of writing to a string constant, which is probably allocated in read-only memory):
#define BUFSIZE 80
...
char p1[BUFSIZE];
strcpy (p1, "john");
p1[2] = 'z'; //no problem
3) OK: so then what's the deal with "const"?
You were right about that part. Here's (one of many) discussions about the (subtle) difference between a "const pointer" and a "pointer to a const":
http://www.codeguru.com/cpp/cpp/cpp_mfc/general/article.php/c6967
Your pointer points to memory that isn't allowed to be changed (the string literal).
Do this instead: char p1[100] = "john";
This is because you are not supposed to modify constants: after all, they are called constants for a reason. Modifying a constant is undefined behavior according to the C standard, which often means that your program is going to crash.
Note that it has nothing to do with your pointer being constant: the crash is because what your pointer points to is a string constant.
Here is how to do what you are trying to do legally:
char p1[] = "john";
p1[2] = 'z'; //no longer crashes :)
printf("%s\n", p1);
There are two, unrelated, problems in that code. First, the const is not in the right place.
char * const p = "john";
char const * p = "john";
const char * p = "john";
The latter two are pointers to unmodifiable strings. If you had done this, then the code would not have compiled.
The first option, char * const, does not make the string read-only. It declares a pointer to modifiable data, where the pointer itself cannot be changed to point to another string. But that's not relevant to your problem.
Your const is not relevant here. You have attempted to modify a string that shouldn't be modified. String literals should never be modified, and this is the cause of your crash.
p1 is a constant pointer, not a pointer to constant data. Moreover, it points to a string literal, which typically resides in a read-only section of the program image (often next to the code), and modern operating systems and processors usually protect that memory against writes.
You should not be surprised that it crashes, but in general the behaviour is undefined.