character pointer takes the address of what - c

I read that:
char a[] = "string";
is a: "string"
whereas
char *ptr = "string"
is ptr: [__] ---> "string"
I am little confused. One thing I know is that pointers always store the address. In case of character pointer what address does it store? What does this block represent (block which I made pointing to string). Is it the starting address of the "string".
And in case of array? How can I clearly differentiate between char pointer and char array?

Diagrams may help.
char *ptr = "string";
+-------+ +----------------------------+
| ptr |--------->| s | t | r | i | n | g | \0 |
+-------+ +----------------------------+
char a[] = "string";
+----------------------------+
| s | t | r | i | n | g | \0 |
+----------------------------+
Here, ptr is a variable that holds a pointer to some (constant) data. You can subsequently change the memory address that it points at by assigning a new value to ptr, such as ptr = "alternative"; — but you cannot legitimately change the contents of the array holding "string" (it is officially readonly or const, and trying to modify it may well crash your program, or otherwise break things unexpectedly).
By contrast, a is the constant address of the first byte of the 7 bytes of data that is initialized with the value "string". I've not shown any storage for the address because, unlike a pointer variable, there isn't a piece of changeable storage that holds the address. You cannot change the memory address that a points to; it always points to the same space. But you can change the contents of the array (for example, strcpy(a, "select");).
When you call a function, the difference disappears:
if (strcmp(ptr, a) == 0)
…string is equal to string…
The strcmp() function takes two pointers to constant char data (so it doesn't modify what it is given to scrutinize), and both ptr and a are passed as pointer values. There's a strong case for saying that only pointers are passed to functions — never arrays — even if the function is written using array notation.
Nevertheless, and this is crucial, arrays (outside of paramter lists) are not pointers. Amongst other reasons for asserting that:
sizeof(a) == 7
sizeof(ptr) == 8 (for 64-bit) or sizeof(ptr) == 4 (32-bit).

In case of character pointer what address does it store? What does this block represent (block which I made pointing to string). Is it the starting address of the "string".
This blocks represents a WORD or DWORD (achitecture dependant), the content of this block is a memory address, a random location defined at compile time. That memory address is the address of first character of the string.
In practice, the difference is how much stack memory it uses.
For example when programming for microcontrollers where very little memory for the stack is allocated, makes a big difference.
char a[] = "string"; // the compiler puts {'s','t','r','i','n','g', 0} onto STACK
char *b = "string"; // the compiler puts just the pointer onto STACK
// and {'s','t','r','i','n','g',0} in static memory area.
Maybe this will help you understand.
assert(a[0] == 's'); // no error.
assert(b[0] == 's'); // no error.
assert(*b == 's'); // no error.
b++; // increment the memory address, so points to 't'
assert(*b == 's'); // assertion failed
assert(*b == 't'); // no error.

char a[] = "string"; initializes the value of the array of chars called a with the value string. And the size of a.
char *a = "string"; creates an unnamed static array of chars somewhere in memory and return the address of the first element of this unnamed array to a.
In the first one, a stores the address of the first element of the array. So when we index something like a[4], this means 'take' the 4th element after the begin of the object named a.
In the second, a[4] means 'take' the 4th element after the object that a points to.
And for your last question:
A char array is a 'block' of contiguous elements of type char. A char pointer is a reference to an element of the type char.
Due to pointer arithmetics, a pointer can be used to simulate (and access) an array.
Maybe those 3 links help make the difference more clear:
http://c-faq.com/decl/strlitinit.html
http://c-faq.com/aryptr/aryptr2.html
http://c-faq.com/aryptr/aryptrequiv.html

You may find it useful to think of:
char * a = "string";
as the same as:
char SomeHiddenNameYouWillNeverKnowOrSee[] = "string"; /* may be in ReadOnly memory! */
char * a = &SomeHiddenNameYouWillNeverKnowOrSee[0];

Did you ever tried to open some executabe file with a text editor ? It appears merely as garbage, but in the middle of the garbage you can see some readable strings. These are all the litteral strings defined in you program.
printf("my literal text");
char * c = "another literal text"; // should be const char *, see below
If your program contains the above code you may be able to find my literal textand another literal text in program's binary (actually it depends on the details of the binary format, but it often works). If you are Linux/Unix user you can also use the strings command for that.
By the way, if you write the above code, C++ compilers will emit some warning (g++ say: warning: deprecated conversion from string constant to ‘char*’ because such strings are not of type char * but const char [] (const char array) which decay to const char * when assigned to a pointer.
This also is the case with C compilers, but the above error is so very common that this warning is usually disabled. gcc does not even include in -Wall, you have to explicitely enable it through -Wwrite-strings. The warning is warning: initialization discards ‘const’ qualifier from pointer target type.
It merely reminds that you are theoretically not allowed to change the literal texts through pointers.
The executable may loads such strings in a read only part of Data segment memory. If you try to change the content of string it can raise a memory error. Also the compiler is allowed to optimise literal text storage by merging identical strings for instance. The pointer just contains the address in (read only) memory where the literal strings will be loaded.
On the other hand
char c[] = "string"; is mere syntaxic sugar for char c[7] = {'s', 't', 'r', 'i', 'n', 'g', 0}; If you do sizeof(c) in your code it will be 7 bytes (the size of the array, not the size of a pointer). This is an array on stack with an initialiser. Internally the compiler can do wathever it likes to initialize the array. It can be characters constants loaded one by one in the array, or it can involved a memcpy of some hiden string literal. The thing is that you have no way to tell the difference from your program and find out where the data comes from. Only the result matters.
By the way a thing that is slightly confusing is that if you define some function parameter of the type char c[], then it won't be an array but an alternative syntax for char * c.

In your example, ptr contains the address of the first char in the string.
As for the difference between a char array and a string, in C terms there is no difference other than the fact that by convention what we call "string" is a char array where the final char is a NULL, to terminate the string.
i.e. even if we have an array of char with 256 potential elements, if the first (0th) char is null (0) then the length of the string is 0.
Consider a variable str which is a char array of 5 chars, containing the string 'foo'.
*ptr => str[0] 'f'
str[1] 'o'
str[2] 'o'
str[3] \0
str[4] ..
A char *ptr to this array would reference the first element (index = 0) and the 4th element (index = 3) would be null, marking the end of the 'string'. The 5th element (index = 4) will be ignored by string handling routines which respect the null terminator.

If you are asking what a contains in each case then:
char a[] = "string";
// a is a pointer.
// It contains the address of the first element of the array.
char *a = "string";
// Once again a is a pointer containing address of first element.
As rnrneverdies has explained in his answer, the difference is in where the elements are stored.

Related

Why can I store a string in the memory address of a char?

I'm starting to understand pointers and how to dereference them etc. I've been practising with ints but I figured a char would behave similarly. Use the * to dereference, use the & to access the memory address.
But in my example below, the same syntax is used to set the address of a char and to save a string to the same variable. How does this work? I think I'm just generally confused and maybe I'm overthinking it.
int main()
{
char *myCharPointer;
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
myCharPointer = "This is a string.";
printf("%s\n", myCharPointer);
return 0;
}
First, you need to understand how "strings" work in C.
"Strings" are stored as an array of characters in memory. Since there is no way of determining how long the string is, a NUL character, '\0', is appended after the string so that we know where it ends.
So for example if you have a string "foo", it may look like this in memory:
--------------------------------------------
| 'f' | 'o' | 'o' | '\0' | 'k' | 'b' | 'x' | ...
--------------------------------------------
The things after '\0' are just stuff that happens to be placed after the string, which may or may not be initialised.
When you assign a "string" to a variable of type char *, what happens is that the variable will point to the beginning of the string, so in the above example it will point to 'f'. (In other words, if you have a string str, then str == &str[0] is always true.) When you assign a string to a variable of type char *, you are actually assigning the address of the zeroth character of the string to the variable.
When you pass this variable to printf(), it starts at the pointed address, then goes through each char one by one until it sees '\0' and stops. For example if we have:
char *str = "foo";
and you pass it to printf(), it will do the following:
Dereference str (which gives 'f')
Dereference (str+1) (which gives 'o')
Dereference (str+2) (which gives another 'o')
Dereference (str+3) (which gives '\0' so the process stops).
This also leads to the conclusion that what you're currently doing is actually wrong. In your code you have:
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
When printf() sees the %s specifier, it goes to address pointed to by myCharPointer, in this case it contains 'G'. It will then try to get next character after 'G', which is undefined behaviour. It might give you the correct result every now and then (if the next memory location happens to contain '\0'), but in general you should never do this.
Several comments
Static strings in c are treated as a (char *) to a null terminated
array of characters. Eg. "ab" would essentially be a char * to a block of memory with 97 98 0. (97 is 'a', 98 is 'b', and 0 is the null termination.)
Your code myCharPointer = &charMemoryHolder; followed by printf("%s\n", myCharPointer) is not safe. printf should be passed a null terminated string, and there's no guarantee that memory contain the value 0 immediately follows your character charMemoryHolder.
In C, string literals evaluate to pointers to read-only arrays of chars (except when used to initialize char arrays). This is a special case in the C language and does not generalize to other pointer types. A char * variable may hold the address of either a single char variable or the start address of an array of characters. In this case the array is a string of characters which has been stored in a static region of memory.
charMemoryHolder is a variable that has an address in memory.
"This is a string." is a string constant that is stored in memory and also has an address.
Both of these addresses can be stored in myCharPointer and dereferenced to access the first character.
In the case of printf("%s\n", myCharPointer), the pointer will be dereferenced and the character displayed, then the pointer is incremented. It repeasts this until finds a null (value zero) character and stops.
Hopefully you are now wondering what happens when you are pointing to the single 'G' character, which is not null-terminated like a string constant. The answer is "undefined behavior" and will most likely print random garbage until it finds a zero value in memory, but could print exactly the correct value, hence "undefined behavior". Use %c to print the single character.

Something I don't get about C strings

A few questions regarding C strings:
both char* and char[] are pointers?
I've learned about pointers and I can tell that char* is a pointer, but why is it automatically a string and not just a char pointer that points to 1 char; why can it hold strings?
Why, unlike other pointers, when you assign a new value to the char* pointer you are actually allocating new space in memory to store the new value and, unlike other pointers, you just replace the value stored in the memory address the pointer is pointing at?
A pointer is not a string.
A string is a constant object having type array of char and, also, it has the property that the last element of the array is the null character '\0' which, in turn, is an int value (converted to char type) having the integer value 0.
char* is a pointer, but char[] is not. The type char[] is not a "real" type, but an incomplete type. The C language is specified in such a way that, in the moment that you define a concrete variable (object) having array of char type, the size of the array is well determined in some way or another. Thus, none variable has type char[] because this is not a type (for a given object).
However, automatically every object having type array of N objects of type char is promoted to char *, that is, a pointer to char pointing to the initial object of the array.
On the other hand, this promotion is not always performed. For example, the operator sizeof() will give different results for char* than for an array of N chars. In the former case, the size of a pointer to char is given (which is in general the same amount for every pointer...), and in the last case gives you the value N, that is, the size of the array.
The behaviour is differente when you declare function arguments as char* and char[]. Since the function cannot know the size of the array, you can think of both declarations as equivalent.
Actually, you are right here: char * is a pointer to just 1 character object. However, it can be used to access strings, as I will explain you now: In the paragraph 1. I showed you that the strings are considered objects in memory having type array of N chars for some N. This value N is big enough to allow an ending null character (as all "string" is supposed to be in C).
So, what's the deal here?
The key point to understand this issues is the concept of object (in memory).
When you have a string or, more generally, an array of char, this means that you have figured out some manner to hold an array object in memory.
This object determines a portion of RAM memory that you can access safely, because C has assigned enough memory for it.
Thus, when you point to the first byte of this object with a char* variable, actually you have guaranteed access to all the adjacent elements to the "right" of that memory place, because those places are well defined by C as having the bytes of the array above.
Briefly: the adjacent (to the right) bytes of the byte pointed by a char* variable can be accessed, they are valid places to access, so the pointer can be "iterated" to walk through these bytes, up to the end of the string, without "risks", since all the bytes in an array are contiguous well defined positions in memory.
This is a complicated question, but it reveals that you are not understanding the relationship between pointers, arrays, and string literals in C.
A pointer is just a variable pointing to a position in memory.
A pòinter to char points to just 1 object having type char.
If the adjacent bytes of the pointed position correspond to an array of chars, they will be accessible by the pointer, so the pointer can "walk on" the memory bytes occupied by the array object.
A string literal is considered as an array of char object, which implictely add an ending byte with value 0 (the null character).
In any case, an array of T object has a well defined "size".
A string literal has an additional property: it's a constant object.
Try to fit and gather these concepts in your mind to figure out what's going on.
And ask me for clarification.
ADDITIONAL REMARKS:
Consider the following piece of code:
#include <stdio.h>
int main(void)
{
char *s1 = "not modifiable";
char s2[] = "modifiable";
printf("%s ---- %s\n\n", s1, s2);
printf("Size of array s2: %d\n\n", (int)sizeof(s2));
s2[1] = '0', s2[3] = s2[5] = '1', s2[4] = '7',
s2[6] = '4', s2[7] = '8', s2[9] = '3';
printf("New value of s2: %s\n\n",s2);
//s1[0] = 'X'; // Attempting to modify s1
}
In the definition and initialization of s1 we have the string literal "not modifiable", which has constant content and constant address. Its address is assigned to the pointer s1 as initialization.
Any attempt to modify the bytes of the string will give some kind of error, because the array content is read-only.
In the definition and initialization of s2, we have the string literal "modifiable", which has, again, constant content and constant address. However, what happens now is that, as part of the initialization, the content of the string is copied to the array of char s2. The size of the array s2 is not specified (the declaration char s2[] gives an incomplete type), but after initialization the size of the array is well determined and defined as the exact size of the copied string (plus 1 character used to hold the null character, or end-of-string mark).
So, the string literal "modifiable" is used to initialize the bytes of the array s2, which is modifiable.
The right manner to do that is by changing a character at the time.
For more handy ways of modifying and assigning strings, it has to be used the standard header <string.h>.
char *s is a pointer, char s[] is an array of characters. Ex.
char *s = "hello";
char c[] = "world";
s = c; //Legal
c = address of some other string //Illegal
char *s is not a string; it points to an address. Ex
char c[] = "hello";
char *s = &c[3];
Assigning a pointer is not creating memory; you are pointing to memory. Ex.
char *s = "hello";
In this example when you type "hello" you are creating special memory to hold the string "hello" but that has nothing to do with the pointer, the pointer simply points to that spot.

How are pointers "variables whose values are an address" when we can do char *string = "string"?

I've always understood pointers as follows:
int x = 5;
int *y = &x;
printf("%d", *y);
I store 5 at some memory location and allow myself to access that value with x.
I create an integer pointer y, setting its memory location to the memory location of x.
I print the value stored at the address that y holds.
However, I can at the same time do char *string = "neato" and it's totally functional. To me this looks like "create a character pointer, holding the memory address 'neato'". How does that make any sense?
Furthermore, if I set it, I would try to do it as *string = "more neat" but that gives an error. I instead need to do string = "more neat". The first attempt intuitively looks like "change the value stored at the memory address held by string to 'more neat'", but it doesn't work. The second looks to me like "for the memory address held by 'string', change it to 'more neato'. And that totally doesn't make sense to me.
What am I confusing? If in order to access the value stored at a pointer I need to do printf("%d", *pointer), how is setting its value not along those lines as well?
The unary * operator has two different (but related) purposes. In a declaration, it denotes that the type is a pointer type, but does not imply that the pointer is being dereferenced.
When used in an expression, it denotes deferencing the pointer. That's why your example works where it's a declaration, but when you dereference it you can't assign (because the appropriate type to assign would be a char, if you could modify string literals anyhow).
The equivalent way to do it outside the declaration looks like:
const char *s = "hello"; /* initialize pointer value */
s = "goodbye"; /* assign to pointer value */
The above initialization and assignment are equivalent.
"neato" is of the type const char[]. Arrays decay to pointers when appropriate, so the assignment is one pointer to another.
It follows then that your pointer should be to const char. Writing to a memory location occupied by a string literal invokes undefined behavior. This however is valid:
char str[] = "neato";
str[0] = 'p';
Furthermore, if I set it, I would try to do it as *string = "more neat"
Well, you have a pointer to char, so that assignment doesn't make sense (also what I said above about writing to a string literal.)
For example,
char *ptr = "neato";
char arr[] = "neato";
are completely different. ptr is a pointer to the string literal "neato" and the compiler usually stores the string literal in read-only memory. So you can't change the string literal ptr points to, but you can change the value of ptr, namely the address.
*ptr = "more neat"; // error, even if *ptr were writable, it should be a character
*ptr = 'b'; // error
ptr = "more neat"; // ok, you just create another string literal and ptr now points to it
The second one is just an abbreviation of
char arr[] = {'n', 'e', 'a', 't', 'o', '\0'};
In this case, you can change the characters in the array, but you can't change the address of arr(yeah, it's an array)
*arr = 'b'; // ok
arr = "more neat" // error, the value of arr, namely the address of the array cannot be changed
Initialisation and Assignment are different, even if they look pretty similar, the meaning of the operation is often different.
The basic type of both char * and the string "neato" is a char; String literals are simply arrays of characters, often located in read-only addresses. "n" must be located in an address (as must be the next character 'e' and so on). That address is stored in the variable char *ptr;
The second part of the question is with why *ptr = "more neat"; is invalid.
*var dereferences the address -- in this case the 1-character wide memory address, that contains already the character 'n'. You can't put either the address of the new string literal (probably representable in 4 or 8 bytes) to a single char; nor can you put the 9 characters and the terminating ascii-zero in it.
We may study this by taking a memory dump of a 16-bit (Big Endian) machine;
0FFE: .. .. // other variables, return addresses etc.
1000: F0 00 // The pointer "var" is located here
1002: // Top of stack
F000: "n" "e" "a" "t" "o" 00 // Address of constant string is F000
F006: "m" "o" "r" "e" ... // Address of next string is F006
*var accesses the single byte memory at F000. var is itself located at address 0000 and is in this machine two bytes, because addresses are here 16-bit wide. After the new assignment var="more neat"; The memory dump is:
1000: F0 06 // Pointer holds a new address

C: Behaviour of arrays when assigned to pointers

#include <stdio.h>
main()
{
char * ptr;
ptr = "hello";
printf("%p %s" ,"hello",ptr );
getchar();
}
Hi, I am trying to understand clearly how can arrays get assign in to pointers. I notice when you assign an array of chars to a pointer of chars ptr="hello"; the array decays to the pointer, but in this case I am assigning a char of arrays that are not inside a variable and not a variable containing them ", does this way of assignment take a memory address specially for "Hello" (what obviously is happening) , and is it possible to modify the value of each element in "Hello" wich are contained in the memory address where this array is stored. As a comparison, is it fine for me to assign a pointer with an array for example of ints something as vague as thisint_ptr = 5,3,4,3; and the values 5,3,4,3 get located in a memory address as "Hello" did. And if not why is it possible only with strings? Thanks in advanced.
"hello" is a string literal. It is a nameless non-modifiable object of type char [6]. It is an array, and it behaves the same way any other array does. The fact that it is nameless does not really change anything. You can use it with [] operator for example, as in "hello"[3] and so on. Just like any other array, it can and will decay to pointer in most contexts.
You cannot modify the contents of a string literal because it is non-modifiable by definition. It can be physically stored in read-only memory. It can overlap other string literals, if they contain common sub-sequences of characters.
Similar functionality exists for other array types through compound literal syntax
int *p = (int []) { 1, 2, 3, 4, 5 };
In this case the right-hand side is a nameless object of type int [5], which decays to int * pointer. Compound literals are modifiable though, meaning that you can do p[3] = 8 and thus replace 4 with 8.
You can also use compound literal syntax with char arrays and do
char *p = (char []) { "hello" };
In this case the right-hand side is a modifiable nameless object of type char [6].
The first thing you should do is read section 6 of the comp.lang.c FAQ.
The string literal "hello" is an expression of type char[6] (5 characters for "hello" plus one for the terminating '\0'). It refers to an anonymous array object with static storage duration, initialized at program startup to contain those 6 character values.
In most contexts, an expression of array type is implicitly converted a pointer to the first element of the array; the exceptions are:
When it's the argument of sizeof (sizeof "hello" yields 6, not the size of a pointer);
When it's the argument of _Alignof (a new feature in C11);
When it's the argument of unary & (&arr yields the address of the entire array, not of its first element; same memory location, different type); and
When it's a string literal in an initializer used to initialize an array object (char s[6] = "hello"; copies the whole array, not just a pointer).
None of these exceptions apply to your code:
char *ptr;
ptr = "hello";
So the expression "hello" is converted to ("decays" to) a pointer to the first element ('h') of that anonymous array object I mentioned above.
So *ptr == 'h', and you can advance ptr through memory to access the other characters: 'e', 'l', 'l', 'o', and '\0'. This is what printf() does when you give it a "%s" format.
That anonymous array object, associated with the string literal, is read-only, but not const. What that means is that any attempt to modify that array, or any of its elements, has undefined behavior (because the standard explicitly says so) -- but the compiler won't necessarily warn you about it. (C++ makes string literals const; doing the same thing in C would have broken existing code that was written before const was added to the language.) So no, you can't modify the elements of "hello" -- or at least you shouldn't try. And to make the compiler warn you if you try, you should declare the pointer as const:
const char *ptr; /* pointer to const char, not const pointer to char */
ptr = "hello";
(gcc has an option, -Wwrite-strings, that causes it to treat string literals as const. This will cause it to warn about some C code that's legal as far as the standard is concerned, but such code should probably be modified to use const.)
#include <stdio.h>
main()
{
char * ptr;
ptr = "hello";
//instead of above tow lines you can write char *ptr = "hello"
printf("%p %s" ,"hello",ptr );
getchar();
}
Here you have assigned string literal "hello" to ptr it means string literal is stored in read only memory so you can't modify it. If you declare char ptr[] = "hello";, then you can modify the array.
Say what?
Your code allocates 6 bytes of memory and initializes it with the values 'h', 'e', 'l', 'l', 'o', and '\0'.
It then allocates a pointer (number of bytes for the pointer depends on implementation) and sets the pointer's value to the start of the 5 bytes mentioned previously.
You can modify the values of an array using syntax such as ptr[1] = 'a'.
Syntactically, strings are a special case. Since C doesn't have a specific string type to speak of, it does offer some shortcuts to declaring them and such. But you can easily create the same type of structure as you did for a string using int, even if the syntax must be a bit different.

C strings confusion

I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran

Resources