This question already has answers here:
In C, why can't an integer value be assigned to an int* the same way a string value can be assigned to a char*?
(5 answers)
Why it is possible to assign string to character pointer in C but not an integer value to an integer pointer
(3 answers)
Assigning strings to pointer in C Language
(4 answers)
Why must int pointer be tied to variable but not char pointer?
(8 answers)
Closed 4 years ago.
Still learning more C and am a little confused. In my references I find cautions about assigning a pointer that has not been initialized. They go on to give examples. Great answers yesterday by the way from folks helping me with pointers, here:
Precedence, Parentheses, Pointers with iterative array functions
On follow up I briefly asked about the last iteration of the loop and potentially pointing the pointer to a non-existent place (i.e. because of my references cautioning against it). So I went back and looked more and find this:
If you have a pointer
int *pt;
then use it without initializing it (i.e. I take this to mean without a statement like *pt= &myVariable):
*pt = 606;
you could end up with a real bad day depending on where in memory this pointer has been assigned to. The part I'm having trouble with is when working with a string of characters something like this would be ok:
char *str = "Sometimes I feel like I'm going crazy.";
Where the reference says, "Don't worry about where in the memory the string is allocated; it's handled automatically by the compiler". So no need to say initialize *str = &str[0]; or *str = str;. Meaning, the compiler is automatically char str[n]; in the background?
Why is it that this is handled differently? Or, am I completely misunderstanding?
In this case:
char *str = "Sometimes I feel like I'm going crazy.";
You're initializing str to contain the address of the given string literal. You're not actually dereferencing anything at this point.
This is also fine:
char *str;
str = "Sometimes I feel like I'm going crazy.";
Because you're assigning to str and not actually dereferencing it.
This is a problem:
int *pt;
*pt = 606;
Because pt is not initialized and then it is dereferenced.
You also can't do this for the same reason (plus the types don't match):
*pt= &myVariable;
But you can do this:
pt= &myVariable;
After which you can freely use *pt.
When you write sometype *p = something;, it's equivalent to sometype *p; p = something;, not sometype *p; *p = something;. That means when you use a string literal like that, the compiler figures out where to put it and then puts its address there.
The statement
char *str = "Sometimes I feel like I'm going crazy.";
is equivalent to
char *str;
str = "Sometimes I feel like I'm going crazy.";
Simplifying the string literal can be expressed as:
const char literal[] = "Sometimes I feel like I'm going crazy.";
so the expression
char *str = "Sometimes I feel like I'm going crazy.";
is logically equivalent to:
const char literal[] = "Sometimes I feel like I'm going crazy.";
const char *str = literal;
Of course literals do not have the names.
But you can't dereference the char pointer which does not have allocated memory for the actual object.
/* Wrong */
char *c;
*c = 'a';
/* Wrong - you assign the pointer with the integer value */
char *d = 'a';
/* Correct */
char *d = malloc(1);
*d = 'a';
/* Correct */
char x
char *e = &x;
*e = 'b';
The last example:
/* Wrong - you assign the pointer with the integer value */
int *p = 666;
/* Wrong you dereference the pointer which references to the not allocated space */
int *r;
*r = 666;
/* Correct */
int *s = malloc(sizeof(*s));
*s = 666;
/* Correct */
int t;
int *u = &t;
*u = 666;
And the last one - something similar to the string literals = the compound literals:
/* Correct */
int *z = (int[]){666,567,234};
z[2] = 0;
*z = 5;
/* Correct */
int *z = (const int[]){666,567,234};
Good job on coming up with that example. It does a good job of showing the difference between declaring a pointer (like char *text;) and assigning to a pointer (like text = "Hello, World!";).
When you write:
char *text = "Hello!";
it is essentially the same as saying:
char *text; /* Note the '*' before text */
text = "Hello!"; /* Note that there's no '*' on this line */
(Just so you know, the first line can also be written as char* text;.)
So why is there no * on the second line? Because text is of type char*, and "Hello!" is also of type char*. There is no disagreement here.
Also, the following three lines are identical, as far as the compiler is concerned:
char *text = "Hello!";
char* text = "Hello!";
char * text = "Hello!";
The placement of the space before or after the * makes no difference. The second line is arguably easier to read, as it drives the point home that text is a char*. (But be careful! This style can burn you if you declare more than one variable on a line!)
As for:
int *pt;
*pt = 606; /* Unsafe! */
you might say that *pt is an int, and so is 606, but it's more accurate to say that pt (without a *) is a pointer to memory that should contain an int. Whereas *pt (with a *) refers to the int inside the memory that pt (without the *) is pointing to.
And since pt was never initialized, using *pt (either to assign to or to de-reference) is unsafe.
Now, the interesting part about the lines:
int *pt;
*pt = 606; /* Unsafe! */
is that they'll compile (although possibly with a warning). That's because the compiler sees *pt as an int, and 606 as an int as well, so there's no disagreement. However, as written, the pointer pt doesn't point to any valid memory, so assigning to *pt will likely cause a crash, or corrupt data, or usher about the end of the world, etc.
It's important to realize that *pt is not a variable (even though it is often used like one). *pt just refers to the value in the memory whose address is contained in pt. Therefore, whether *pt is safe to use depends on whether pt contains a valid memory address. If pt isn't set to valid memory, then the use of *pt is unsafe.
So now you might be wondering: What's the point of declaring pt as an int* instead of just an int?
It depends on the case, but in many cases, there isn't any point.
When programming in C and C++, I use the advice: If you can get away with declaring a variable without making it a pointer, then you probably shouldn't declare it as a pointer.
Very often programmers use pointers when they don't need to. At the time, they aren't thinking of any other way. In my experience, when it's brought to their attention to not use a pointer, they will often say that it's impossible not to use a pointer. And when I prove them otherwise, they will usually backtrack and say that their code (which uses pointers) is more efficient than the code that doesn't use pointers.
(That's not true for all programmers, though. Some will recognize the appeal and simplicity of replacing a pointer with a non-pointer, and gladly change their code.)
I can't speak for all cases, of course, but C compilers these days are usually smart enough to compile both pointer code and non-pointer code to be practically identical in terms of efficiency. Not only that, but depending on the case, non-pointer code is often more efficient than code that uses pointers.
There are 4 concepts which you have mixed up in your example:
declaring a pointer. int *p; or char *str; are declarations of the pointers
initializing a pointer at declaration. char *str = "some string"; declares the pointer and initializes it.
assigning a value to the pointer. str = "other string"; assigns a value to the pointer. Similarly p = (int*)606; would assign the value of 606 to the pointer. Though, in the first case the value is legal and points to the location of the string in static memory. In the second case you assign an arbitrary address to p. It might or might not be a legal address. So, p = &myint; or p = malloc(sizeof(int)); are better choices.
assigning a value to what the pointer points to. *p = 606; assigns the value to the 'pointee'. Now it depends, if the value of the pointer 'p' is legal or not. If you did not initialize the pointer, it is illegal (unless you are lucky :-)).
Many good explanations over here. The OP has asked
Why is it that this is handled differently?
It is a fair question, he means why, not how.
Short answer
It is a design decision.
Long answer
When you use a literal in an asigment, the compiler has two options: either it places the literal in the generated assembly instruction (maybe allowing variable length assembly instructions to accomodate different literal byte lenghts) or it places the literal somewhere the cpu can reach it (memory, registers...). For ints, it seems a good choice to place them on the assembly instruction, but for strings... almost all strings used in programs (?) are too long to be placed on the assembly instruction. Given that arbitrarily long assembly instructions are bad for general purpose CPUs, C designers have decided to optimize this use case for strings and save the programmer one step by allocating memory for him. This way, the behaviour is consistent across machines.
Counterexample
Just to see that, for other languages, this has not to be necessarily the case, check this. There (it is Python), int constants are actually placed in memory and given an id, always. So, if you try to get the address of two different variables that were asigned the same literal, it will return the same id (since they are refereing to the same literal, already placed in memory by the Python loader). It is useful to stress that in Python, the id is equivalent to an address in the Python's abstract machine.
Each byte of memory is stored in its own numbered pigeon-hole. That number is the "address" of that byte.
When your program compiles, it builds up a data-table of constants. At run-time these are copied into memory somewhere. So upon execution, in memory is the string (here at the 100,000th byte):
#100000 Sometimes I feel like I'm going crazy.\0
The compiler has generated code, such that when the variable str is created, it is automatically initialised with the address of where that string came to be stored. So in this example's case, str -> 100000. This is where the name pointer comes from, str does not actually contain that string-data, it holds the address of it (i.e. a number), "pointing" to it, saying "that piece of data at this address".
So if str was treated like an integer, it would contain the value 100000.
When you dereference a pointer, like *str = '\0', it's saying: The memory str points at, put this '\0' there.
So when the code defines a pointer, but without any initialisation, it could be pointing anywhere, perhaps even to memory the executable doesn't own (or owns, but can't write to).
For example:
int *pt = blah; // What does 'pt' point at?
It does not have an address. So if the code tries to dereference it, it's just pointing off anywhere in memory, and this gives indeterminate results.
But the case of:
int number = 605;
int *pt = &number
*pt = 606;
Is perfectly valid, because the compiler has generated some space for the storage of number, and now pt contains the address of that space.
So when we use the address-of operator & on a variable, it gives us the number in memory where the variable's content is stored. So if the variable number happened to be stored at byte 100040:
int number = 605;
printf( "Number is stored at %p\n", &number );
We would get the output:
Number is stored at 100040
Similarly with string-arrays, these are really just pointers too. The address is the memory-number of the first element.
// words, words_ptr1, words_ptr2 all end up being the same address
char words[] = "Sometimes I feel like I'm going crazy."
char *words_ptr1 = &(words[0]);
char *words_ptr2 = words;
There are answers here with very good and detailed information.
I will post another answer, perhaps targeting more straightly to the OP.
Rephrasing it a bit:
Why is
int *pt;
*pt = 606;
not ok (non working case), and
char *str = "Sometimes I feel like I'm going crazy.";
is ok (working case)?
Consider that:
char *str = "Sometimes I feel like I'm going crazy.";
is equivalent to
char *str;
str = "Sometimes I feel like I'm going crazy.";
The closest "analogous", working case for int is (using a compound literal instead of a string literal)
int *pt = (int[]){ 686, 687 };
or
int *pt;
pt = (int[]){ 686, 687 };
So, the differences with your non-working case are three-fold:
Use pt = ... instead of *pt = ...
Use a compound literal, not a value (by the same token, str = 'a' wouldn't work).
Compound literals are not always guaranteed to work, since the lifetime of its storage depends on standard/implementation.
In fact, its use as above may give the compilation error taking address of temporary array.
A string variable can be declared either as an array of characters char txt[] or using a character pointer char* txt. The following illustrates the declaration and initialization of a string:
char* txt = "Hello";
In fact, as illustrated above, txt is a pointer to the first character of the string literal.
Whether we are able to modify (read/write) a string variable or not, depends on how we declared it.
6.4.5 String literals (ISO)
6. It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Actually, if we declare a string txt like we previously did, the compiler will declare the string literal in a read-only data section .rodata (platform dependent) even if txt is not declared as const char*. So we can not modify it. Actually, we should not even try to modify it. In this case gcc can fire warnings (-Wwrite-strings) or even fail due to -Werror. In this cas, it is better to declare string variable as const pointers:
const char* txt = "Hello";
On the other hand, we can declare a string variable as an array of characters:
char txt[] = "Hello";
In that case, the compiler will arrange for the array to get initialized from the string literal, so you can modify it.
Note: An array of characters can be used as if it was a pointer to its first character. That's why we can use txt[0] or *txt syntax to access the first character. And we can even explicitly convert an array of characters to a pointer:
char txt[] = "Hello";
char* ptxt = (char*) txt;
Related
I'm new to C and have a trouble understanding pointers. The part that I'm confused is about char * and int *.
For example, we can directly assign a pointer for char, like
char *c = "c"; and it doesn't make errors.
However, if I assign a pointer for int like I just did, for example int * a = 10;,
it makes errors. I need to make an additional space in memory to assign a pointer for int,
like int *b = malloc(sizeof(int)); *b = 20; free(b);...
Can anyone tell me why?
I think you're misunderstanding what a pointer is and what it means. In this case:
int* a = 10;
You're saying "create a pointer (to an int) and aim it at the literal memory location 0x0000000A (10).
That's not the same as this:
int n = 10;
int* a = &n;
Which is "create a pointer (to an int) and aim it at the memory location of n.
If you want to dynamically allocate this:
int* a = malloc(sizeof(int));
*a = 10;
Which translates to "create a pointer (to an int) and aim it at the block of memory just allocated, then assign to that location the value 10.
Normally you'd never allocate a single int, you'd allocate a bunch of them for an array, in which case you'd refer to it as a[0] through a[n-1] for an array of size n. In C *(x + y) is generally the same as x[y] or in other words *(x + 0) is either just *x or x[0].
In your example, you do not send c to the char 'c'. You used "c" which is a string-literal.
For string-literals it works as explained in https://stackoverflow.com/a/12795948/5280183.
In initialization the right-hand side (RHS) expression must be of or convertible to the type of the variable declared.
If you do
char *cp = ...
int *ip = ...
then what is in ... must be convertible to a pointer to a char or a pointer to an int. Also remember that a char is a single character.
Now, "abc" is special syntax in C - a string literal, for creating an array of (many) immutable characters. It has the type char [size] where size is the number of characters in the literal plus one for the terminating null character. So "c" has type char [2]. It is not char. An array in C is implicitly converted to pointer to the first element, having then the type "pointer to the element type". I.e. "c" of type char [2] in char *cp = "c"; is implicitly converted to type char *, which points to the first of the two characters c and \0 in the two-character array. It is also handily of type char * and now we have char *cp = (something that has type char * after conversions);.
As for int *, you're trying to pass an integer value to a pointer. It does not make sense.
A pointer holds an address. If you'd ask for the address of Sherlock Holmes, the answer would be 221 Baker Street. Now instead what you've done is "address of Sherlock Holmes is this photo I took of him in this morning".
The same incorrect code written for char * would be
char *cp = 'c'; // or more precisely `char *p = (char)'c';
and it would give you precisely the same error proving that char *cp and int *cp work alike.
Unfortunately C does not have int string literals nor literals for integer arrays, though from C99 onwards you could write:
int *ip = (int[]){ 5 };
or
const int *ip = (const int[]){ 5 };
Likewise you can always point a char * or int * to a single object of that type:
char a = 'c';
char *pa = &a;
int b = 42;
int *pb = &b;
now we can say that a and *pa designate the same char object a, and likewise b and *pb designate the same int object b. If you change *pa you will change a and vice versa; and likewise for b and *pb.
A string literal like "c" is actually an array expression (the type of "c" is "2-element array of char).
Unless it is the operand of the sizeof or & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "array of T" will be converted, or "decay" to an expression of type "pointer to T" and its value will be the address of the first element of the array.
So when you write
char *c = "c";
it’s roughly equivalent to writing
char string[] = "c";
char *c = &string[0];
You’re assigning a pointer to a pointer, so the compiler doesn’t complain.
However, when you write
int *a = 10;
you’re assigning an int value to a pointer and the types are not compatible, so the compiler complains. Pointers are not integers; they may have an integer representation, but that’s not guaranteed, and it won’t be the same size as int.
Are there differences between int pointer and char pointer in c?
short answer: NO
All pointers in C are created equal in the last decades. During some time there were pointers of different sizes, in some platforms like Windows, due to a thing called memory model.
Today the compiler sets the code to use 32-bit or 64 or any size pointers depending on the platform, but once set all pointers are equal.
What is not equal is the thing a pointer points to, and it is that you are confused about.
consider a void* pointer. malloc() allocates memory in C. It always return a void* pointer and you then cast it to anything you need
sizeof(void*) is the same as sizeof(int*). And is the same as sizeof(GiantStruct*) and in a 64-bit compiler is 8 bytes or 64 bits
what about char* c = "c"?
you must ask yourself what is "c". is is char[2]. And it is a string literal. Somewhere in memory the system will allocate an area of at least 2 bytes, put a 'c' and a zero there, since string literals are NULL terminated in C, take that address in put in the address allocated to your pointer c.
what about int* a = 10?
it is the same thing. It is somewhat ok until the operating system comes in. What is a? a is int* pointer to an int. And what is *a? an int. At which address? 10. Well, truth is the system will not let you access that address just because you set a pointer to point to it. In the '80s you could. Today there are very strong rules on this: you can only access memory allocated to your processes. Only. So the system aborts your program as soon as you try to access.
an example
if you write
int* a = malloc(300);
the system will allocate a 300-bytes area, take the area address and write it for you at the address allocated to a. When you write *a=3456 the system will write this value to the first sizeof(int) bytes at that address.
You can for sure write over all 300, but you must somewhat manipulate the pointer to point inside the area. Only inside that area.
In fact C language was designed just for that.
Let us write "c" to the end of that 300-byte area alocated to a:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char* c = "c";
int* a = (int*)malloc(300);
*a = 3456;
printf("Area content is %d\n", *a);
char* pToEnd = (char*)a + 298;
*(pToEnd) = 'c';
*(pToEnd + 1) = 0;
printf("At the end of the 300-byte area? [Should be 'c'] '%s'\n", pToEnd);
printf("The pointer c points to '%s'\n", c);
//*c = 'x'; // error: cancel program
c = pToEnd;
printf("The pointer c now points do the end of the 300-byte area:'%s'\n", c);
free(a);
return 0;
};
and see
Area content is 3456
At the end of the 300-byte area? [Should be 'c'] 'c'
The pointer c points to 'c'
The pointer c now points do the end of the 300-byte area:'c'
Also note that as soon as you try to write over c, the char pointer in your example, your program will also cancel. It is a read only area --- see the comment on line 14. But you can point c inside the area in the example and use it, since the pointer in itself is not constant.
And when you write c = pToEnd in the example above access to the string literal is lost forever. In some sense is a memory leak, but it is a static allocated area. The leak is not the area but the address.
Note also that free(a) at the end, even if being a int* will free all 300 bytes
The direct answer to your question, is that, C language is a strict typed language. And before the machine code is produced, a compiler checks for some possible errors. The types are different, and that is all that the compiler wants to know.
But from the bit-byte point of view, there are no difference, for C language, with the above two pointers.
Take a look at your code with online compiler
void main(){
int * a = 10; /* <= this presumably gives you an error. */
}
What I did is copy pasted your code that gives you the error. And online compiler had emitted a warning, not an error. Even more than that. Change the C standard to be of c17 and you still get a mere warning.
The note about assigning a string literal to a pointer had been discussed broadly at SO. It deserves a different topic.
To sum it up. For the C language there is no meaningful difference in the above two pointers. It is a user, who takes a responsibility to know a layout of a memory in his/her program. All in all, the language was designed as a high level tool that operates over linear memory.
What is the correct way to use int* x?
Mention any related link if possible as I was unable to find one.
Because the literal "hello" evaluates to a pointer to constant memory initialised with the string "hello" (and a nul terminator), i.e. the value you get is of char* type.
If you want a pointer to number 12 then you'll need to store the value 12 somewhere, e.g. in another int, and then take a pointer to that:
int x_value = 12;
int* x = &x_value;
However in this case you're putting the 12 on the stack, and so that pointer will become invalid once you leave this function.
You can at a pinch abuse that mechanism to make yourself a pointer to 12; depending on endianness that would probably be
int* x = (int*)("\x0c\x00\x00");
Note that this is making assumptions about your host's endianness and size of int, and that you would not be able to modify that 12 either (but you can change x to point to something else), so this is a bad idea in general.
Because the compiler creates a static (constant) string "hello" and lets x point to that, where it doesn't create a static (constant) int.
A string literal creates an array object. This object has static storage duration (meaning it exists for the entire execution of the program), and is initialized with the characters in the string literal.
The value of a string literal is the value of the array. In most contexts, there is an implicit conversion from char[N] to char*, so you get a pointer to the initial (0th) element of the array. So this:
char *s = "hello";
initializes s to point to the initial 'h' in the implicitly created array object. A pointer can only point to an object; it does not point to a value. (Incidentally, that really should be const char *s, so you don't accidentally attempt to modify the string.)
String literals are a special case. An integer literal does not create an object; it merely yields a value. This:
int *ptr = 42; // INVALID
is invalid, because there is no implicit conversion of 42 from int* to int. This:
int *ptr = &42; // INVALID
is also invalid, because the & (address-of) operator can only be applied to an object (an "lvalue"), and there is no object for it to apply to.
There are several ways around this; which one you should use depends on what you're trying to do. You can allocate an object:
int *ptr = malloc(sizeof *ptr); // allocation an int object
if (ptr == NULL) { /* handle the error */ }
but a heap allocation can always fail, and you need to deallocate it when you're finished with it to avoid a memory leak. You can just declare an object:
int obj = 42;
int *ptr = &obj;
You just have to be careful with the object's lifetime. If obj is a local variable, you can end up with a dangling pointer. Or, in C99 and later, you can use a compound literal:
int *ptr = &(int){42};
(int){42} is a compound literal, which is similar in some ways to a string literal. In particular, it does create an object, and you can take that object's address.
But unlike with string literals, the lifetime of the (anonymous) object created by a compound literal depends on the context in which it appears. If it's inside a function definition, the lifetime is automatic, meaning that it ceases to exist when you leave the block containing it -- just like an ordinary local variable.
That answers the question in your title. The body of your question:
What is the correct way to use int* x?
is much more general, and it's not a question we can answer here. There are a multitude of ways to use pointers correctly -- and even more ways to use them incorrectly. Get a good book or tutorial on C and read the section that discusses pointers. Unfortunately there are also a lot of bad books and tutorials. Question 18.10 of the comp.lang.c FAQ is a good starting point. (Bad tutorials can often be identified by the casual use of void main(), and by the false assertion that arrays are really pointers.)
Q1. Why can't we assign int *x=12? You can provided that 12 is a valid memory address which holds an int. But with a modern OS specifying a hard memory address is completely wrong (perhaps except embedded code). The usage is typically like this
int y = 42; // simple var
int *x = &y; // address-of: x is pointer to y
*x = 12; // write a new value to y
This looks the same as what you asked, but it is not, because your original declaration assigns the value 12 to x the pointer itself, not to *x its target.
Q2. Why can't we assign int *x = "12"? Because you are trying to assign an incompatible type - a char pointer to int pointer. "12" is a string literal which is accessed via a pointer.
Q3. But we can assign char* x= "hello"
Putting Q1 and Q2 together, "hello" generates a pointer which is assigned to the correct type char*.
Here is how it is done properly:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *x;
x = malloc(sizeof(int));
*x = 8;
printf("%d \n", *x);
}
So I want to know exactly how many ways are there to declare string. I know similar questions have been asked for several times, but I think my focus is different. As a beginner in C, I want to know which method of declaration is correct and preferable, so that I can stick to it.
We all know we can declare string in the two following ways.
char str[] = "blablabla";
char *str = "blablabla";
After reading some Q&A in stack-overflow, I was told string literal is placed in the read-only part of the memory. So in order to create modifiable string, you need to create a character array of the length of the string + 1 in the stack and copy each characters to array.
So this is what the 1st line of code doing.
However, what the 2nd line of code does is to create a character pointer and assign the pointer the address of the 1st character located in the read-only part of the memory. So this kind of declaration does not involved copying character by character.
Please let me know if I am wrong.
So far it seems quite understandable, but what really confuses me is the modifiers. For instance, some suggests
const char *str = "blablabla";
instead of
char *str = "blablabla";
because if we do something like
*(str + 1) = 'q';
It will cause undefined behavior which is horrible.
But some go even further and suggest something like
static const char *str = "blablabla";
and say this will place the string into the static memory
which will never gets modified to avoid the undefined behavior.
So which is actually the #right# way to declare a string?
Besides, I am also interested in knowing the scope when declaring string.
For example,
(You can ignore the examples, both of them are buggy as pointed out by the others)
#include <stdio.h>
char **strPtr();
int main()
{
printf("%s", *strPtr());
}
char **strPtr()
{
char str[] = "blablabla";
char *charPtr = str;
char **strPtr = &charPtr;
return strPtr;
}
will print some garbage value.
But
#include <stdio.h>
char **strPtr();
int main()
{
printf("%s", *strPtr());
}
char **strPtr()
{
char *str = "blablabla";
/*As point out by other I am returning the address of a local variable*/
/*This causes undefined behavior*/
char **strPtr = &str;
return strPtr;
}
will work perfectly fine. (NO it doesn't, it is undefined behavior.)
I think I should leave it as another question.
This question is getting too long.
A lot of your confusion comes from a common mis-understanding about C and strings: one which is explicitly stated in the title of your question. The C language does not have a native string type, so in fact there are exactly zero ways to declare a string in C.
Spend some time reading Does C Have a String type? which does a good job of explaining that.
This is evident from the fact that you can't (sensibly) do the following:
char *a, *b;
// code to point a and b at some "strings"
if (a == b)
{
// do something if strings compare equal
}
The if statement will compare the values of the pointers, not the contents of the memory they address. So, if a and b pointed to two different areas of memory, each containing identical data, the comparison a == b would fail. The only time the comparison would evaluate as "true" (i.e. something other than zero), would be if a and b held the same address (i.e. pointed to the same location in memory).
What C has is a convention, and some syntactic sugar to make life easier.
The convention is that a "string" is represented as a sequence of char terminated with the value zero (usually referred to as NUL and represented by the special character escape sequence '\0'). This convention comes from the API of the original standard library (back in the 70's) which provides a set of string primitives such as strcpy(). These primitives were so fundamental to doing anything truly useful in the language that life was made easier for programmers by adding syntactic sugar (this is all before the language escaped from the lab).
The syntactic sugar is the existence of "string literals": no more, and no less. In source code, any sequence of ASCII characters, enclosed in double quotes, is interpreted as a string literal and the compiler produces a copy (in "read-only" memory) of the characters plus a terminating NUL byte to conform to the convention. Modern compilers detect duplicated literals and only produce a single copy - but it's not a requirement of the standard last time I looked. Thus this:
assert("abc" == "abc");
may or may not raise an assertion - which reinforces the statement that C does not have a native string type. (For that matter, neither does C++ - it has a String class!)
With that out of the way, how do you use string literals to initialize a variable?
The first (and most common) form you will see is this
char *p = "ABC";
Here, the compiler sets aside 4 bytes (assuming sizeof(char) ==1) of memory in a "read only" section of the program and initializes it with [0x41, 0x42, 0x43, 0x00]. It then initializes p with the address of that array. You should note that there is some const casting going on here as the underlying type of a string literal is const char * const (a constant pointer to a constant character). Which is why you would normally be advised to write this as:
const char *p = "ABC";
Which is a "pointer to a constant char" - another way of saying "pointer to read only memory".
The next two forms use string literals to initialize arrays
char p1[] = "ABC";
char p2[3] = "ABC";
Note that there is a critical difference between the two. the first line creates a 4 byte array. The second creates a 3 bytes array.
In the first case, as before, the compiler creates a 4 byte constant array containing [0x41, 0x42, 0x43, 0x00]. Note that it adds the trailing NUL to form a "C String". It then reserves four bytes of RAM (on the stack for a local variable, or in "static" memory for variables at file scope) and inserts code to initialize it at run time by copying the "read only" array into the allocated RAM. You are now free to modify elements of p1 at will.
In the second case, the compiler creates a 3 byte constant array containing [0x41, 0x42, 0x43]. Note that there is no trailing NUL. It then reserves 3 bytes of RAM (on the stack for a local variable, or in "static" memory for variables at file scope) and inserts code to initialize it at run time by copying the "read only" array into the allocated RAM. You are again now free to modify elements of p2 at will.
The difference in sizes of the two arrays p1 and p2 is critical. The following code (if you ran it) would demonstrate it.
char p1[] = "ABC";
char p2[3] = "ABC";
printf ("p1 = %s\n", p1); // Will print out "p1 = ABC"
printf ("p2 = %s\n", p2); // Will print out "p2 = ABC!##$%^&*"
The output of the second printf is unpredictable, and could theoretically result in your code crashing. It tends to seem to work, simply because so much of RAM is filled with zeroes that eventually printf finds a terminating NUL.
Hope this helps.
I'm learning C programming in a self-taught fashion. I know that numeric pointer addresses must always be initialized, either statically or dynamically.
However, I haven't read about the compulsory need of initializing char pointer addresses yet.
For example, would this code be correct, or is a pointer address initialization needed?
char *p_message;
*p_message = "Pointer";
I'm not entirely sure what you mean by "numeric pointer" as opposed to "char pointer". In C, a char is an integer type, so it is an arithmetic type. In any case, initialization is not required for a pointer, regardless of whether or not it's a pointer to char.
Your code has the mistake of using *p_message instead of p_message to set the value of the pointer:
*p_message = "Pointer" // Error!
This wrong because given that p_message is a pointer to char, *p_message should be a char, not an entire string. But as far as the need for initializing a char pointer when first declared, it's not a requirement. So this would be fine:
char *p_message;
p_message = "Pointer";
I'm guessing part of your confusion comes from the fact that this would not be legal:
char *p_message;
*p_message = 'A';
But then, that has nothing to do with whether or not the pointer was initialized correctly. Even as an initialization, this would fail:
char *p_message = 'A';
It is wrong for the same reason that int *a = 5; is wrong. So why is that wrong? Why does this work:
char *p_message;
p_message = "Pointer";
but this fail?
char *p_message;
*p_message = 'A';
It's because there is no memory allocated for the 'A'. When you have p_message = "Pointer", you are assigning p_message the address of the first character 'P' of the string literal "Pointer". String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap.
But chars, like ints, need to be allocated either on the stack or the heap. Either you need to declare a char variable so that there is memory on the stack:
char myChar;
char *pChar;
pChar = &myChar;
*pChar = 'A';
Or you need to allocate memory dynamically on the heap:
char* pChar;
pChar = malloc (1); // or pChar = malloc (sizeof (char)), but sizeof(char) is always 1
*pChar = 'A';
So in one sense char pointers are different from int or double pointers, in that they can be used to point to string literals, for which you don't have to allocate memory on the stack (statically) or heap (dynamically). I think this might have been your actual question, having to do with memory allocation rather than initialization.
If you are really asking about initialization and not memory allocation: A pointer variable is no different from any other variable with regard to initialization. Just as an uninitialized int variable will have some garbage value before it is initialized, a pointer too will have some garbage value before it is initialized. As you know, you can declare a variable:
double someVal; // no initialization, will contain garbage value
and later in the code have an assignment that sets its value:
someVal = 3.14;
Similarly, with a pointer variable, you can have something like this:
int ary [] = { 1, 2, 3, 4, 5 };
int *ptr; // no initialization, will contain garbage value
ptr = ary;
Here, ptr is not initialized to anything, but is later assigned the address of the first element of the array.
Some might say that it's always good to initialize pointers, at least to NULL, because you could inadvertently try to dereference the pointer before it gets assigned any actual (non-garbage) value, and dereferencing a garbage address might cause your program to crash, or worse, might corrupt memory. But that's not all that different from the caution to always initialize, say, int variables to zero when you declare them. If your code is mistakenly using a variable before setting its value as intended, I'm not sure it matters all that much whether that value is zero, NULL, or garbage.
Edit. OP asks in a comment: You say that "String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap", so how does allocation occur?
That's just how the language works. In C, a string literal is an element of the language. The C11 standard specifies in §6.4.5 that when the compiler translates the source code into machine language, it should transform any sequence of characters in double quotes to a static array of char (or wchar_t if they are wide characters) and append a NUL character as the last element of the array. This array is then considered immutable. The standard says: If the program attempts to modify such an array, the behavior is undefined.
So basically, when you have a statement like:
char *p_message = "Pointer";
the standard requires that the double-quoted sequence of characters "Pointer" be implemented as a static, immutable, NUL-terminated array of char somewhere in memory. Typically implementations place such string literals in a read-only area of memory such as the text block (along with program instructions). But this is not required. The exact way in which a given implementation handles memory allocation for this array / NUL terminated sequence of char / string literal is up to the particular compiler. However, because this array exists somewhere in memory, you can have a pointer to it, so the above statement does work legally.
An analogy with function pointers might be useful. Just as the code for a function exists somewhere in memory as a sequence of instructions, and you can have a function pointer that points to that code, but you cannot change the function code itself, so also the string literal exists in memory as a sequence of char and you can have a char pointer that points to that string, but you cannot change the string literal itself.
The C standard specifies this behavior only for string literals, not for character constants like 'A' or integer constants like 5. Setting aside memory to hold such constants / non-string literals is the programmer's responsibility. So when the compiler comes across statements like:
char *charPtr = 'A'; // illegal!
int *intPtr = 5; // illegal!
the compiler does not know what to do with them. The programmer has not set aside such memory on the stack or the heap to hold those values. Unlike with string literals, the compiler is not going to set aside any memory for them either. So these statements are illegal.
Hopefully this is clearer. If not, please comment again and I'll try to clarify some more.
Initialisation is not needed, regardless of what type the pointer points to. The only requirement is that you must not attempt to use an uninitialised pointer (that has never been assigned to) for anything.
However, for aesthetic and maintenance reasons, one should always initialise where possible (even if that's just to NULL).
First of all, char is a numeric type, so the distinction in your question doesn't make sense. As written, your example code does not even compile:
char *p_message;
*p_message = "Pointer";
The second line is a constraint violation, since the left-hand side has arithmetic type and the right-hand side has pointer type (actually, originally array type, but it decays to pointer type in this context). If you had written:
char *p_message;
p_message = "Pointer";
then the code is perfectly valid: it makes p_message point to the string literal. However, this may or may not be what you want. If on the other hand you had written:
char *p_message;
*p_message = 'P';
or
char *p_message;
strcpy(p_message, "Pointer");
then the code would be invoking undefined behavior by either (first example) applying the * operator to an invalid pointer, or (second example) passing an invalid pointer to a standard library function which expects a valid pointer to an object able to store the correct number of characters.
not needed, but is still recommended for a clean coding style.
Also the code you posted is completely wrong and won't work, but you know that and only wrote that as a quick example, right?
I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran