I am trying to understand a bit more about incrementing arrays vs pointers. For example, the following works:
char string[] = "Hi";
char * pstring = string;
while(*pstring)
printf("%c", *pstring++);
But if I remove the pointer, it will not:
char string[] = "Hi";
while(*string)
printf("%c", *string++);
Why does the first one work? What's the main difference between iteration through a pointer and array?
There's really quite a lot of rationales for why the second one doesn't work. I'm sure you know this, but I'll just reiterate: an array is not a pointer. Arrays are only converted to pointers. This is automatic, and it happens "very often", but arrays and pointers are still different types. In terms of the language, you simply cannot assign to an array (lvalues of array type are not "modifiable lvalues"), and, specifically, the "hidden" string = string + 1 assignment in string++ does not work. You can, of course, assign to a pointer variable. For this reason, in the C standard, ++ is defined to work only on numeric ("real") and pointer types, and this means you aren't allowed to give it an array.
In terms of why the rules are like this, one train of thought starts by noticing that an array type is only "complete" when it has a size—i.e. if you have a variable of array type, its size is a part of its type. Trying to mutate string with arithmetic, as you do here, would require changing the type of string, because the size would change, but this is not allowed in C. (Note that char string[]; is an invalid declaration because it doesn't specify a size; when you added the initializer you told C to infer the size (3) from it.) In the pointer version, pstring is a char*, and so is pstring + 1, so there's no issue. In the array version, you'd need to have char string[3] before the loop and char string[1] afterwards. Worse, the final size of string would depend on the data in it, so there'd be no way to predict it from the language's point of view. Best not open that can of worms, no?
The idea of incrementing string also breaks down because, in C, an "array object" is more than "a bunch of contiguous elements". When you declare string, yes, you create a bunch of char objects that are contiguous in memory, but you also "bless" (this is not a technical term, unless you're using Perl :)) that memory into being a char[3] object. Probably, in terms of the actual machine, this "blessing" doesn't actually do or mean anything, but, in terms of the abstract machine that C programs run on, there is a difference. Specifically, there is neither an char[2] object located at memory address string + 1 nor an char[1] at string + 2. Thus, were you to increment string, there would be no array for string to refer to anymore.
I suppose you can boil all this down to the intuition that an array is really just "a bunch of variables". That is, when you declared char string[3];, that should feel like you did char string_0, string_1, string_2;. This is just like if you had struct { int x; char y; } test;—this feels like writing int test_x; char test_y;. "Incrementing a group of variables" is quite meaningless, so of course string++ and test++ are disallowed. With string, you have the option to create a char *pstring, such that pstring = &string_0, pstring + 1 = &string_1, pstring + 2 = &string_2, but that doesn't change the fact that doing arithmetic on string itself (especially destructively incrementing it) doesn't make sense.
Here's my two bits....
Why does the first one work?
The pointer "pstring" is a 'variable'. This means that the pointer "pstring" can be re-assigned a new value.
pstring++ is "pstring = pstring + 1" (allowed).
Other valid pointer operations are:
Assignment of pointers of the same type.
Adding or Subtracting a pointer and an integer.
Subtracting or Comparing two pointers to members of the same array.
Assigning or Comparing a pointer to zero(NULL).
What's the main difference between iteration through a pointer and array?
The name of the array(synonymous with the location of the first element) is not a "variable" and will always refer to the same storage.
Though an integer can be added to or subtracted from an array name, re-assigning a new value to an array name is illegal.
string++ is "string = string + 1" (not allowed).
The difference in coding is extrapolated further in the following:
char string[] = "Hi";
int i = 0;
while(*(string+i)){ // or string[i]
printf("%c", *(string+i));// or string[i]
i++;
}
Related
char string1[3][4]={"koo","kid","kav"}; //This is a 2D array
char * string[3]={"koo","kid","kav"}; //This is an array of 3 pointers pointing to 1D array as strings are stored as arrays in memory
char (*string1Ptr)[4]=string1; //This is a pointer to a 1D array of 4 characters
//I want to know differences between string1Ptr(pointer to array mentioned in question) and string(array of pointers mentioned in question). I only typed string1 here to give string1Ptr an address to strings
Besides the fact that string can point to strings of any size and string1Ptr can only point to strings of size 4 only(otherwise pointer arithmetic would go wrong), I don't see any differences between them.
For example,
printf("%s\n", string1[2]); // All print the same thing, ie, the word "kav"
printf("%s\n", string1Ptr[2]);
printf("%s\n", string[2]);
They all seem to perform the same pointer arithmetic.(My reason for assuming string and string1Ptr are almost similar besides for the difference I stated above)
So what are the differences between string and string1Ptr? Any reason to use one over the other?
PS: I'm a newbie so please go easy on me.
Also, I did check C pointer to array/array of pointers disambiguation, it didn't seem to answer my question.
char string1[3][4]={"koo","kid","kav"}; //This is a 2D array
char * string[3]={"koo","kid","kav"}; //This is an array of 3 pointers pointing to 1D array as strings are stored as arrays in memory
char (*string1Ptr)[4]=string1; //This is a pointer to a 1D array of 4 characters
Besides the fact that string can point to strings of any size and
string1Ptr can only point to strings of size 4 only(otherwise
pointer arithmetic would go wrong), I don't any differences between
them.
They are absolutely, fundamentally different, but C goes to some trouble to hide the distinction from you.
string is an array. It identifies a block of contiguous memory wherein its elements are stored. Those elements happen to be of type char * in this example, but that's a relatively minor detail. One can draw an analogy here to a house containing several rooms -- the rooms are physically part of and exist inside the physical boundaries of the house. I can decorate the rooms however I want, but they always remain the rooms of that house.
string1Ptr is a pointer. It identifies a chunk of memory whose contents describe how to access another, different chunk of memory wherein an array of 4 chars resides. In our real estate analogy, this is like a piece of paper on which is written "42 C Street, master bedroom". Using that information, you can find the room and redecorate it as you like, just as in the other case. But you can also replace the paper with a locator for a different room, maybe in a different house, or with random text, or you can even burn the whole envelope, without any of that affecting the room on C Street.
string1, for its part, is an array of arrays. It identifies a block of contiguous memory where its elements are stored. Each of those elements is itself an array of 4 chars, which, incidentally, happens to be just the type of object to which string1Ptr can point.
For example,
printf("%s\n", string1[2]); // All print the same thing, ie, the word "kav"
printf("%s\n", string1Ptr[2]);
printf("%s\n", string[2]);
They all seem to perform the same pointer arithmetic.(My reason for
assuming string and string1Ptr are almost similar besides for the
difference I stated above)
... and that is where C hiding the distinction comes in. One of the essential things to understand about C arrays is that in nearly all expressions,* values of array type are silently and automatically converted to pointers [to the array's first element]. This is sometimes called pointer "decay". The indexing operator is thus an operator on pointers, not on arrays, and indeed it does have similar behavior in your three examples. In fact, the pointer type to which string1 decays is the same as the type of string1Ptr, which is why the initialization you present for the latter is permitted.
But you should understand that the logical sequence of operations is not the same in those three cases. First, consider
printf("%s\n", string1Ptr[2]);
Here, string1Ptr is a pointer, to which the indexing operator is directly applicable. The result is equivalent to *(string1Ptr + 2), which has type char[4]. As a value of array type, that is converted to a pointer to the first element (resulting in a char *).
Now consider
printf("%s\n", string1[2]);
string1 is an array, so first it is converted to a pointer to its first element, resulting in a value of type char(*)[4]. This is the same type as string1Ptr1, and evaluation proceeds accordingly, as described above.
But this one is a bit more different:
printf("%s\n", string[2]);
Here, string is a pointer, so the indexing operation applies directly to it. The result is equivalent to *(string + 2), which has type char *. No automatic conversions are performed.
Any reason to use one over the other?
Many, in both directions, depending on your particular needs at the time. Generally speaking, pointers are more flexible, especially in that they are required for working with dynamically allocated memory. But they suffer from the issues that
a pointer may be in scope, but not point to anything, and
declaring a pointer does not create anything for it to point to. Also,
even if a pointer points to something at one time during an execution of the program, and its value is not subsequently written by the program, it can nevertheless stop pointing to anything. (This most often is a result of the pointer outliving the object to which it points.)
Additionally, it can be be both an advantage and a disadvantage that
a pointer can freely be assigned to point to a new object, any number of times during its lifetime.
Generally speaking, arrays are easier to use for many purposes:
declaring an array allocates space for all its elements. You may optionally specify initial values for them at the point of declaration, or in some (but not all) cases avail yourself of default initialization.
the identifier of an array is valid and refers to the array wherever it is in scope.
Optionally, if an initializer is provided then an array declaration can use it to automatically determine the array dimension(s).
* But only nearly all. There are a few exceptions, with the most important being the operand of a sizeof operator.
The difference between string1 and string is the same as the difference between:
char s1[4] = "foo";
char *s2 = "foo";
s1 is a writable array of 4 characters, s2 is a pointer to a string literal, which is not writable. See Why do I get a segmentation fault when writing to a string initialized with "char *s" but not "char s[]"?.
So in your example, it's OK to do string1[0][0] = 'f'; to change string1[0] to "foo", but string[0][0] = 'f'; causes undefined behavior.
Also, since string is an array of pointers, you can reassign those pointers, e.g. string[0] = "abc";. You can't assign to string1[0] because the elements are arrays, not pointers, just as you can't reassign s1.
The reason that string1Ptr works is because the string1 is a 2D array of char, which is guaranteed to be contiguous. string1Ptr is a pointer to an array of 4 characters, and when you index it you increment by that number of characters, which gets you to the next row of the string1 array.
I am looking for a solution for my problem (newbie here).
I have an array of strings (char** arrayNum) and I would like to store the number of elements of that array at the first index.
But I can't find the right way to do convert the number of elements as a string (or character).
I have tried itoa, casting (char), +'0', snprintf ... nothing works.
As every time I ask a question on Stack Overflow, I am sure the solution will be obvious. Thanks in advance for your help.
So I have an array of strings char** arrayNum which I populate, leaving the index 0 empty.
When I try to assign a string to arrayNum[0]:
This works: arrayNum[0] = "blabla";
This does not work: arrayNum[0] = (char) ((arraySize - 1)+'0');
I have tried countless others combinations, I don't even remember...
arrayNum can be thought of as an array of strings (char *). So you will naturally have trouble trying to assign a char (or indeed any type other than char *) to an element of this array.
I think it would preferable to store the length of the array separately to the array. For example, using a struct. To do otherwise invites confusion.
If you really, really want to store the length in the first element, then you could do something like:
arrayNum[0] = malloc(sizeof(char));
arrayNum[0][0] = (char) ((arraySize - 1)+'0');
This takes advantage of the fact that arrayNum is strictly an array of pointers and each of those pointers is a pointer to a char. So it can point to a single character or an array of characters (a "string").
Compare this for clarity with (say):
struct elements {
int length;
char **data;
};
arrayNum is not an "array of strings."
It might be useful for you to think about it that way, but it is important for you to know what it really is. It is an array of pointers where each pointer is a pointer to char.
Sometimes a pointer to char is a "string," and sometimes it's a pointer into the middle of a string, and sometimes it's just a pointer to some character somewhere. It all depends on how you use it.
The C programming language does not really have strings. It has string literals, but a string literal is just a const array of characters that happens to end with a \000. The reason you can write arrayNum[0] = "blabla"; is because the value of the string literal "blabla" is a pointer to the first 'b' in "blabla", and the elements of the arrayNum array are pointers to characters.
It's your responsibility to decide whether arrayNum[i] points to the first character of some string, or whether it just happens to point to some single character; and it's your responsibility to decide and keep track of whether it points to something that needs to be freed() or whether it points to read-only memory, or whether it points to something on the stack, or whether it points to/into some staticly allocated data structure.
The language doesn't care.
I'm learning C programming in a self-taught fashion. I know that numeric pointer addresses must always be initialized, either statically or dynamically.
However, I haven't read about the compulsory need of initializing char pointer addresses yet.
For example, would this code be correct, or is a pointer address initialization needed?
char *p_message;
*p_message = "Pointer";
I'm not entirely sure what you mean by "numeric pointer" as opposed to "char pointer". In C, a char is an integer type, so it is an arithmetic type. In any case, initialization is not required for a pointer, regardless of whether or not it's a pointer to char.
Your code has the mistake of using *p_message instead of p_message to set the value of the pointer:
*p_message = "Pointer" // Error!
This wrong because given that p_message is a pointer to char, *p_message should be a char, not an entire string. But as far as the need for initializing a char pointer when first declared, it's not a requirement. So this would be fine:
char *p_message;
p_message = "Pointer";
I'm guessing part of your confusion comes from the fact that this would not be legal:
char *p_message;
*p_message = 'A';
But then, that has nothing to do with whether or not the pointer was initialized correctly. Even as an initialization, this would fail:
char *p_message = 'A';
It is wrong for the same reason that int *a = 5; is wrong. So why is that wrong? Why does this work:
char *p_message;
p_message = "Pointer";
but this fail?
char *p_message;
*p_message = 'A';
It's because there is no memory allocated for the 'A'. When you have p_message = "Pointer", you are assigning p_message the address of the first character 'P' of the string literal "Pointer". String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap.
But chars, like ints, need to be allocated either on the stack or the heap. Either you need to declare a char variable so that there is memory on the stack:
char myChar;
char *pChar;
pChar = &myChar;
*pChar = 'A';
Or you need to allocate memory dynamically on the heap:
char* pChar;
pChar = malloc (1); // or pChar = malloc (sizeof (char)), but sizeof(char) is always 1
*pChar = 'A';
So in one sense char pointers are different from int or double pointers, in that they can be used to point to string literals, for which you don't have to allocate memory on the stack (statically) or heap (dynamically). I think this might have been your actual question, having to do with memory allocation rather than initialization.
If you are really asking about initialization and not memory allocation: A pointer variable is no different from any other variable with regard to initialization. Just as an uninitialized int variable will have some garbage value before it is initialized, a pointer too will have some garbage value before it is initialized. As you know, you can declare a variable:
double someVal; // no initialization, will contain garbage value
and later in the code have an assignment that sets its value:
someVal = 3.14;
Similarly, with a pointer variable, you can have something like this:
int ary [] = { 1, 2, 3, 4, 5 };
int *ptr; // no initialization, will contain garbage value
ptr = ary;
Here, ptr is not initialized to anything, but is later assigned the address of the first element of the array.
Some might say that it's always good to initialize pointers, at least to NULL, because you could inadvertently try to dereference the pointer before it gets assigned any actual (non-garbage) value, and dereferencing a garbage address might cause your program to crash, or worse, might corrupt memory. But that's not all that different from the caution to always initialize, say, int variables to zero when you declare them. If your code is mistakenly using a variable before setting its value as intended, I'm not sure it matters all that much whether that value is zero, NULL, or garbage.
Edit. OP asks in a comment: You say that "String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap", so how does allocation occur?
That's just how the language works. In C, a string literal is an element of the language. The C11 standard specifies in §6.4.5 that when the compiler translates the source code into machine language, it should transform any sequence of characters in double quotes to a static array of char (or wchar_t if they are wide characters) and append a NUL character as the last element of the array. This array is then considered immutable. The standard says: If the program attempts to modify such an array, the behavior is undefined.
So basically, when you have a statement like:
char *p_message = "Pointer";
the standard requires that the double-quoted sequence of characters "Pointer" be implemented as a static, immutable, NUL-terminated array of char somewhere in memory. Typically implementations place such string literals in a read-only area of memory such as the text block (along with program instructions). But this is not required. The exact way in which a given implementation handles memory allocation for this array / NUL terminated sequence of char / string literal is up to the particular compiler. However, because this array exists somewhere in memory, you can have a pointer to it, so the above statement does work legally.
An analogy with function pointers might be useful. Just as the code for a function exists somewhere in memory as a sequence of instructions, and you can have a function pointer that points to that code, but you cannot change the function code itself, so also the string literal exists in memory as a sequence of char and you can have a char pointer that points to that string, but you cannot change the string literal itself.
The C standard specifies this behavior only for string literals, not for character constants like 'A' or integer constants like 5. Setting aside memory to hold such constants / non-string literals is the programmer's responsibility. So when the compiler comes across statements like:
char *charPtr = 'A'; // illegal!
int *intPtr = 5; // illegal!
the compiler does not know what to do with them. The programmer has not set aside such memory on the stack or the heap to hold those values. Unlike with string literals, the compiler is not going to set aside any memory for them either. So these statements are illegal.
Hopefully this is clearer. If not, please comment again and I'll try to clarify some more.
Initialisation is not needed, regardless of what type the pointer points to. The only requirement is that you must not attempt to use an uninitialised pointer (that has never been assigned to) for anything.
However, for aesthetic and maintenance reasons, one should always initialise where possible (even if that's just to NULL).
First of all, char is a numeric type, so the distinction in your question doesn't make sense. As written, your example code does not even compile:
char *p_message;
*p_message = "Pointer";
The second line is a constraint violation, since the left-hand side has arithmetic type and the right-hand side has pointer type (actually, originally array type, but it decays to pointer type in this context). If you had written:
char *p_message;
p_message = "Pointer";
then the code is perfectly valid: it makes p_message point to the string literal. However, this may or may not be what you want. If on the other hand you had written:
char *p_message;
*p_message = 'P';
or
char *p_message;
strcpy(p_message, "Pointer");
then the code would be invoking undefined behavior by either (first example) applying the * operator to an invalid pointer, or (second example) passing an invalid pointer to a standard library function which expects a valid pointer to an object able to store the correct number of characters.
not needed, but is still recommended for a clean coding style.
Also the code you posted is completely wrong and won't work, but you know that and only wrote that as a quick example, right?
I have seen in several pieces of code a string declared as char*. How does this work, surely it is a pointer to a single char, not an array of chars which makes up a string. If I wished to take string input to a method that would be called like this:
theMethod("This is a string literal");
What datatype should the parameter be?
surely it is a pointer to a single char, not an array of chars
It's a pointer to the first character of an array of char. One can access each element of the array using a pointer to its first element by performing pointer arithmetic and "array" indexing.
What datatype should the parameter be?
const char *, if you don't wish to modify the characters from within the function (this is the general case), and char * if you do.
This is a common beginner-C confusion. A pointer to any type, T *, is ambiguously either a pointer to a single object of type T, or a pointer to an element within a linear array of objects of type T, size unspecified. You, the programmer, are responsible for knowing which is which, and passing around length information as necessary. If you get it wrong, the compiler stands by and watches as your program drives off the undefined-behavior cliff.
To the extent C has strings (there is a strong case to be made that it doesn't really) they take shameless advantage of this ambiguity, such that when you see char * or const char * in a C program, it almost always will be a pointer to a string, not a single char. The same is not true of pointers to any other type.
per definition is "string" of type char * (or unsigned char * or const char *) however, it is a pointer to the first character of that character chain (i dont want to use the words array or vector). The Difference is to see in char: 'x' (single quote)
this is good old c programming (sometimes i could cry for loosing it)
char *p = "i am here";
for (q=p; ++q; *q) { // so lets start with p walk through and end wit the /0 after the last e
if (*q=='h') { // lets find the first 'h' and cut the string there
*(q-1)=0;
break;
}
}
i used no const and other probs here, i just try to clearify
I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran