String array initialization in C - c

In C it's possible to initialize a string array with such line of code:
char *exStr[] = { "Zorro", "Alex", "Celine", "Bill", "Forest", "Dexter" };
On my debugger the array is initialized as below:
This is different as standard two-dimensional array where each string occupy the same amount of byte; can someone point me on the right direction to understand:
What's means exactly the declaration "char *exStr[] = ...";
How can I re-create the same variable's structure from my program.

What's means exactly the declaration "char *exStr[] = ...";
It means 'array of poniter to char'. Char data of literals like "Zorro" are usually placed in read-only data segments, and array elements inlcude only address of the first char of literal.
How can I re-create the same variable's structure from my program.
You can do something like:
char zorro[] = {'Z', 'o', 'r', 'r', 'o', '\0'}; // initialized every char for clarity
char alex[] = "Alex";
char celine[] = "Celine";
...
char* exStr[] = {
&zorro[0], // explicitly referenced for clarity
alex,
celine,
...
};

1.
As the other answers and comments said, the expression "char *exStr[]" means "array of pointers to char".
a) How to read it
The best way to read it (as well as other more complex C declarations) is to start at the thing's name (in this case "exStr") and work your way towards the extremities of the declaration, like this:
First go right, adding each encountered meaningful symbol(s) to the meaning of the expression
Stop going right at the first closing paranthesis ")", or when the expression ends
When you cannot go right anymore, resume where you started and go left, again adding each encountered symbol to the meaning of the expression
Stop going left at the paranthesis "(" that corresponds to the ")" that stopped you on the right, or at the beginning of the expression
When stopped going left at a "(" paranthethis and you haven't reached the beginning of the expression, resume going right immediately after the ")" that corresponds to it
Keep going right and left until you reach the margins of the expression both ways
In your case, you would go like that:
start at exStr: that's the variable's name
go right: []: exStr is an array
go right: stop: there's nothing there, the "=" sign stops us
go left: *: exStr is an array of pointers
go left: char: exStr is an array of pointers to char
go left: stop: there's nothing there, the "=" sign stops us
b) Why are you seeing that each array element occupies a different amount of bytes
When you have a value like "Zorro", it's a C string.
A C string is an array of bytes that starts at a given address in memory and ends with 0. In case of "Zorro" it will occupy 6 bytes: 5 for the string, the 6th for the 0.
You have several ways of creating C strings. A few common ones are:
A) use it literally, such as:
printf("Zorro");
B) use it literraly but store it in a variable:
char *x = "Zorro";
C) allocate it dynamically and copy data into it
size_t n = 2;
char *p = malloc((n+1) * sizeof(char));
char c = getch(); // read a character from the console
p[0] = c;
p[1] = 0;
// do something with p...
free(p);
Whenever you're putting a string value like "Zorro" in your program, the compiler will reserve memory for just enough to hold this string plus the 0 terminator. It will initialize this memory with whatever you provided inside "", it will add the 0, and secretly keep a pointer to that memory. It will prevent you from modifying that memory (you cannot change that string).
In your code sample, it did this for every string that appeared in the exStr initialization.
That's why you see each array element occupying a different amount of memory. If you look closer to your debugger output, you'll see that the compiler reserved memory for a string immediately after the preceding one, and each string occupies its length in bytes plus the 0 terminator. E.g. "Zorro" starts at 02f, and occupies positions 02f - 034, which are 6 bytes (5 for Zorro and 1 for the 0 terminator). Then "Alex" starts at 035, occupies 5 bytes: 035 - 039, and so on.
2.
To create a similar array programmatically:
If all you have are some static strings like in your example, then your code sample is good enough.
Otherwise if you plan to put dynamic values into your array (or if you plan to change the original strings in the program), you would do something like:
#define COUNT 5
char *strings[COUNT];
int i;
for (i = 0; i < COUNT; i++) {
int n = 32; // or some other suitable maximum value, or even a computed value
strings[i] = malloc((n+1) * sizeof(char));
// put up to 32 arbitrary characters into strings[i], e.g. read from a file or from console; don't forget to add the 0 terminator
}
// use strings...
// free memory for the strings when done
for (i = 0; i < COUNT; i++) {
free(strings[i]);
}

What's means exactly the declaration "char *exStr[] = {......};
Here, exStr is an array of char pointers, initialized by the supplied brace enclosed initializers.
To elaborate, the starting address of each string literal in the initializer list is stored into each element of the array.
Point to note, the contents of the array elements (being string literals) are not modifiable.
Regarding the second part,
How can I re-create the same variable's structure from my program.
is not clear. Can you elaborate on that part?
Just in case, if you meant how to access, then, it's just like the normal array access. As long as you're within bounds, you're good to go.

For the second part of your question, I guess you are asking for an array of pointers to variable length strings.
It can be done like this:
#include <stdio.h>
int main(void) {
char* ma[2]; // Array of char pointers for pointing to char-strings
char* pZorro = "Zorro"; // Char string - not modifiable
char AlexString[5] = {"Alex"}; // Char string - modifiable
ma[0] = pZorro; // Make the first pointer point to the "Zorro" string
ma[1] = AlexString; // Make the second pointer point to the "Alex" string
printf("%s\n", ma[0]);
printf("%s\n", ma[1]);
// strcpy(ma[0], "x"); // Run time error! Can't change "Zorro"
strcpy(ma[1], "x"); // OK to change "Alex"
printf("%s\n", ma[0]);
printf("%s\n", ma[1]);
return 0;
}
The output will be:
Zorro
Alex
Zorro
x

Related

I want to know how double quotes are used in C

In the code char * str = "hello";, I understand that code "hello" is to allocate the word hello to any other memory and then put the first value of that allocated memory into the variable str.
But when I use the code char str[10] = "hello";, I understood that the word hello is included in each element of the array.
If then, on the top, the code "hello" returns the address of the memory
and on the bottom, the code "hello" returns the word h e l l o \n.
I want to know why they are different and if I'm wrong, I want to know what double quotes return.
C is a bit quirky. You have two distinct use cases here. But let's first start with what "hello" is.
Your "hello" in the program source code is a character string literal. That is a character sequence enclosed in double quotes. When the compiler is compiling this source code, it appends a zero byte to the sequence, so that standard library functions like strlen() can work on it. The resulting zero-terminated sequence is then used by the compiler to "initialize an array of static storage duration and length just sufficient to contain the sequence array of constant characters" (n1570 ISO C draft, 6.4.5/6). That length is 6: The 5 characters h, e, l, l and o as well as the appended zero byte.
"Static storage duration" means that the array exists the entire time the program is running (as opposed to objects with automatic local storage duration, e.g. local variables, and those with dynamic storage duration, which are created via malloc() or calloc()).
You can memorize the address of that array, as in char *str = "hello";. This address will point to valid memory during the lifetime of the program.
The second use case is a special syntax for initializing character arrays. It is just syntactic sugar for this common use case, and a deviation from the fact that you cannot normally initialize arrays with arrays.1
This time you don't define a pointer, you define a proper array of 10 chars. You then use the string literal to initialize it. You always can use the generic method to initialize a character array by listing the individual array elements, separated by commas, in curly braces (by the way, this generic method works also for the other kind of compound types, namely structs):
char str[10] = { 'h', 'e', 'l', 'l', 'o', '\0' };
This is entirely equivalent to
char str[10] = "hello";
Now your array has more elements (10) than the number of characters in the initializing array produced from the string literal (6); the standard stipulates that "subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration". Those global and static variables are initialized with zero, which means that the character array str ends with 4 zero characters.
It is immediately obvious why Dennis Ritchie added the somewhat anti-paradigmatic initialization of character arrays via a string literal, probably after the second time he had to do it with the generic array initialization syntax. Designing your own language has its benefits.
1 For example, static char src[] = "123"; char dest[] = src; doesn't work. You have to use strcpy().
The initialization:
char * str = "hello";
in most C implementations makes sure that the string hello is placed in a constant data section of the executable memory. Exactly six bytes are written, the last one being the string terminator '\0'.
str char pointer contains the address of the first character 'h', so that anyone accessing the string knows that the following bytes have to be read until the terminator character is found.
The other initialization
char str[10] = "hello"; // <-- string must be enclosed in double quotes
is very similar, as str points to the first character of the string and that the following characters are written in the following memory locations (included the string terminator).
But:
Even if only six bytes are explicitly initialized, ten bytes are allocated because that's the size of the array. In this case, the four trailing bytes will contain zeroes
Data is not constant and can be changed, while in the previous example it wasn't possible because such initialization, in most C implementations, instructs the compiler to use a constant data section
You seem to be mixing up some things:
char str[10] = "hello';
This does not even compile: when you start with a double-quote, you should end with one:
char str[10] = "hello";
In memory, this has following effect:
str[0] : h
str[1] : e
str[2] : l
str[3] : l
str[4] : o
str[5] : 0 (the zero character constant)
str[6] : xxx
str[7] : xxx
str[8] : xxx
str[9] : xxx
(By xxx, I mean that this can be anything)
As a result, the code will not return hello\n (with an end-of-line character), just hello\0 (the zero character).
The double quotes just mention the beginning and the ending of a string constant and return nothing.

Is there any benefit of using pointer arithmetic on character array in C?

I have a question regarding the operation on character array.
I was inspired by This answer here that discusses char* vs char[]. In this answer, it says considering a case where we want to cut off the first character of a word,
char* p = "hello";
p++;
is a much faster operation, than if we declared using character array, and therefore had to index into every element.
char s[] = "hello";
s[0] = 'e';
s[1] = 'l';
s[2] = 'l';
s[3] = 'o';
s[4] = '/0';
But my question is, in the second case, cannot I just use another pointer that points to the start of s, i.e.
char* ptr = s;
and then treat it the same as char* p? Would it be as fast?
You mean something like this?
char s[] = "Hello";
char *ptr = s+1;
Yes, you can do that, and it results in ptr pointing to the string "ello". It is just as fast; like the other example, it should be simple arithmetic and not require any copying of data.
I think it is a misconception to think of this as a "pointer versus array" issue. Pointers and arrays in C are inseparable. Both examples involve an array, and two different pointers into that array. In the first example, the array is created by the string literal "Hello" and doesn't have a name; the values of p before and after p++ both point into that array. In the second example, the array is named s, and the expression s decays into a pointer to the zeroth element of that array; s+1 is a pointer to the first element.
The main difference is that in the second example, the array can be modified, and may have a finite lifetime. So your program now has to manage issues of aliasing and ownership. Suppose one part of the code accesses the string through s, and another part accesses it through ptr. Are you prepared for the fact that if the first code modifies the string, the second code will see it change? Does your code correctly keep track of when the lifetime of s ends (e.g. the end of the block where it's defined), and does it make sure not to use ptr beyond that point?
By contrast, the string literal in the first example has static lifetime, and survives until the end of the program; so ptr = p+1 remains valid indefinitely. It also must not be modified at all, or undefined behavior results. So if your program is otherwise bug free, you don't have to worry about some other part of your code changing the string behind your back.
The way I understand these things:
char *p = "hello";
Six bytes (including terminating '\0') are put in READ ONLY memory, and the address of the first byte is put into the variable p that has the datatype char *. Yes, the code can increment/decrement p to point further into the string.
char s[] = "hello";
Six bytes (including terminating '\0') are put in READ/WRITE memory, and referring to s means referring to the initial byte of the string. Yes, another pointer can be declared: char *p = s; And, yes, the code can manipulate the data stored in those six bytes.
char *p = &s[1]; // don't need to increment.
If you want to really play with things:
printf( "%.3s\n", &"Hi there"[3] );
will print "the". There is no name and no pointer, but there is an unnamed READONLY array that can be (carefully!) accessed.
Consider the following example that 'decorates' up to the 31 day of a month with the correct 'ordinal' suffix:
for( int DoM = 1; DoM <= 31; DoM++ ) {
int i = DoM;
if( i > 20 ) i %= 10; // Change 21-99 to 1-10. Don't wanna think about 100+
if( i > 3 ) i = 0; // Every other one ends with "th"
// 0 1 2 3
printf( "%d%s\n", DoM, &"th\0st\0nd\0rd"[ i * 3 ] );
}

Pointer arithmetic in C when used as a target array for strcat()

When studying string manipulation in C, I've come across an effect that's not quite what I would have expected with strcat(). Take the following little program:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
strcat(string + 1, "fghij");
printf("%s", string);
return 0;
}
I would expect this program to print out bcdefghij. My thinking was that in C, strings are arrays of characters, and the name of an array is a pointer to its first element, i.e., the element with index zero. So the variable string is a pointer to a. But if I calculate string + 1 and use that as the destination array for concatenation with strcat(), I get a pointer to a memory address that's one array element (1 * sizeof(char), in this case) away, and hence a pointer to the b. So my thinking was that the target destination is the array starting with b (and ending with the invisible null character), and to that the fghij is concatenated, giving me bcdefghij.
But that's not what I get - the output of the program is abcdefghij. It's the exact same output as I would get with strcat(string, "fghij"); - the addition of the 1 to string is ignored. I also get the same output with an addition of another number, e.g. strcat(string + 4, "fghij");, for that matter.
Can somebody explain to me why this is the case? My best guess is that it has to do with the binding precedence of the + operator, but I'm not sure about this.
Edit: I increased the size of the original array with char string[20] so that it will, in any case, be big enough to hold the concatenated string. Output is still the same, which I think means the array overflow is not key to my question.
You will get an output of abcdefghij, because your call to strcat hasn't changed the address of string (and nor can you change that – it's fixed for the duration of the block in which it is declared, just like the address of any other variable). What you are passing to strcat is the address of the second element of the string array: but that is still interpreted as the address of a nul-terminated string, to which the call appends the second (source) argument. Appending that second argument's content to string, string + 1 or string + n will produce the same result in the string array, so long as there is a nul-terminator at or after the n index.
To print the value of the string that you actually pass to the strcat call (i.e., starting from the 'b' character), you can save the return value of the call and print that:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
char* result = strcat(string + 1, "fghij"); // strcat will return the "string + 1" pointer
printf("%s", result); // bcdefghij
return 0;
}
char string[] = "abcde";
strcat(string + 1, "fghij");
Append five characters to a full string array. Booom. Undefined behavior.
Adding something to a string array is a performance optimization that tells the runtime that the string is known to be at least that many characters long.
You seem to believe that a string is a thing of its own and not an array, and strcat is doing something to its first argument. That's not how that works. Strings are arrays*; and strcat is modifying the array contents.
*Somebody's going to come by and claim that heap allocated strings are not arrays. OP is not dealing with heap yet.
Arrays are non-modibfiable lvalues. For example you may not write
char string[20] = "abcde";
char string2[] = ""fghij"";
string = string2;
Used in expressions arrays with rare exceptions are implicitly converted to pointers to their first elements.
If you will write for example string + 1 then the address of the array will not be changed.
In this call
strcat(string + 1, "fghij");
elements of the array string are being overwritten starting from the second element of the array.
In this statement
printf("%s", string);
there is outputted the whole array starting from its first character (again the array designator used as an argument is converted to a pointer to its first element).
You could write for example
printf("%s", string + 1);
In this case the array is outputted starting from its second element.
These are just two pointers to different parts of the same memory inside the same array. There is nothing in your code which creates a second array. "the name of an array is a pointer to its first element" well, not really, it decays into a pointer to its first element whenever used in an expression. So in case of string + 1, this decay first happens to the string operand and then you get pointer arithmetic afterwards. You can actually never do pointer arithmetic on array types, only on decayed pointers. Details here: Do pointers support "array style indexing"?
As for strcat, it basically does two things: call strlen on the original string to find where it ends, then call strcpy to append the new string at the position where the null terminator was stored. It's the very same thing as typing strcpy(&src[strlen(src)], dst);
Therefore it won't matter if you pass string + 1 or string, because in either case strcat will look for the null terminator and nothing else.

Using a string in CS50 library

Hi all I have a question regarding a passing a string to a function in C. I am using CS50 library and I know they are passing string as a char array (char pointer to a start of array) so passing is done by reference. My function is receiving array as argument and it returns array. When I change for example one of the element of array in function this change is reflected to original string as I expect. But if I assign new string to argument, function returns another string and original string is not change. Can you explain the mechanics behind this behaviour.
#include <stdlib.h>
#include <cs50.h>
#include <stdio.h>
string test(string s);
int main(void)
{
string text = get_string("Text: ");
string new_text = test(text);
printf("newtext: %s\n %s\n", text, new_text);
printf("\n");
return 0;
}
string test(string s)
{
//s[0] = 'A';
s = "Bla";
return s;
}
First example reflects change in the first letter on both text and newtext strings, but second example prints out text unchanged and newtext as "Bla"
Thanks!
This is going to take a while.
Let's start with the basics. In C, a string is a sequence of character values including a 0-valued terminator. IOW, the string "hello" is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings are stored in arrays of char (or wchar_t for "wide" strings, which we won't talk about here). This includes string literals like "Bla" - they're stored in arrays of char such that they are available over the lifetime of the program.
Under most circumstances, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", so most of the time when we're dealing with strings we're actually dealing with expressions of type char *. However, this does not mean that an expression of type char * is a string - a char * may point to the first character of a string, or it may point to the first character in a sequence that isn't a string (no terminator), or it may point to a single character that isn't part of a larger sequence.
A char * may also point to the beginning of a dynamically allocated buffer that has been allocated by malloc, calloc, or realloc.
Another thing to note is that the [] subscript operator is defined in terms of pointer arithmetic - the expression a[i] is defined as *(a + i) - given an address value a (converted from an array type as described above), offset i elements (not bytes) from that address and dereference the result.
Another important thing to note is that the = is not defined to copy the contents of one array to another. In fact, an array expression cannot be the target of an = operator.
The CS50 string type is actually a typedef (alias) for the type char *. The get_string() function performs a lot of magic behind the scenes to dynamically allocate and manage the memory for the string contents, and makes string processing in C look much higher level than it really is. I and several other people consider this a bad way to teach C, at least with respect to strings. Don't get me wrong, it's an extremely useful utility, it's just that once you don't have cs50.h available and have to start doing your own string processing, you're going to be at sea for a while.
So, what does all that nonsense have to do with your code? Specifically, the line
s = "Bla";
What's happening is that instead of copying the contents of the string literal "Bla" to the memory that s points to, the address of the string literal is being written to s, overwriting the previous pointer value. You cannot use the = operator to copy the contents of one string to another; instead, you'll have to use a library function like strcpy:
strcpy( s, "Bla" );
The reason s[0] = A worked as you expected is because the subscript operator [] is defined in terms of pointer arithmetic. The expression a[i] is evaluated as *(a + i) - given an address a (either a pointer, or an array expression that has "decayed" to a pointer as described above), offset i elements (not bytes!) from that address and dereference the result. So s[0] is pointing to the first element of the string you read in.
This is difficult to answer correctly without a code example. I will make one but it might not match what you are doing.
Let's take this C function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
s[4] = 'X';
}
}
return s;
}
That function will accept a pointer to a character array and if the pointer is not NULL and the zero-terminated array is longer than 4 characters, it will replace the fifth character at index 4 with an 'X'. There are no references in C. They are always called pointers. They are the same thing, and you get access to a pointed-at value with the dereference operator *p, or with array syntax like p[0].
Now, this function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
char *new_s = malloc(len+1);
strcpy(new_s, s);
new_s[4] = 'X';
return new_s;
}
}
s = malloc(1);
s[0] = '\0';
return s;
}
That function returns a pointer to a newly allocated copy of the original character array, or a newly allocated empty string. (By doing that, the caller can always print it out and call free on the result.)
It does not change the original character array because new_s does not point to the original character array.
Now you could also do this:
const char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
return "string was longer than 4";
}
}
s = "string was not longer than 4";
return s;
}
Notice that I changed the return type to const char* because a string literal like "string was longer than 4" is constant. Trying to modify it would crash the program.
Doing an assignment to s inside the function does not change the character array that s used to point to. The pointer s points to or references the original character array and then after s = "string" it points to the character array "string".

C strings confusion

I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran

Resources