Related
I have a question regarding the operation on character array.
I was inspired by This answer here that discusses char* vs char[]. In this answer, it says considering a case where we want to cut off the first character of a word,
char* p = "hello";
p++;
is a much faster operation, than if we declared using character array, and therefore had to index into every element.
char s[] = "hello";
s[0] = 'e';
s[1] = 'l';
s[2] = 'l';
s[3] = 'o';
s[4] = '/0';
But my question is, in the second case, cannot I just use another pointer that points to the start of s, i.e.
char* ptr = s;
and then treat it the same as char* p? Would it be as fast?
You mean something like this?
char s[] = "Hello";
char *ptr = s+1;
Yes, you can do that, and it results in ptr pointing to the string "ello". It is just as fast; like the other example, it should be simple arithmetic and not require any copying of data.
I think it is a misconception to think of this as a "pointer versus array" issue. Pointers and arrays in C are inseparable. Both examples involve an array, and two different pointers into that array. In the first example, the array is created by the string literal "Hello" and doesn't have a name; the values of p before and after p++ both point into that array. In the second example, the array is named s, and the expression s decays into a pointer to the zeroth element of that array; s+1 is a pointer to the first element.
The main difference is that in the second example, the array can be modified, and may have a finite lifetime. So your program now has to manage issues of aliasing and ownership. Suppose one part of the code accesses the string through s, and another part accesses it through ptr. Are you prepared for the fact that if the first code modifies the string, the second code will see it change? Does your code correctly keep track of when the lifetime of s ends (e.g. the end of the block where it's defined), and does it make sure not to use ptr beyond that point?
By contrast, the string literal in the first example has static lifetime, and survives until the end of the program; so ptr = p+1 remains valid indefinitely. It also must not be modified at all, or undefined behavior results. So if your program is otherwise bug free, you don't have to worry about some other part of your code changing the string behind your back.
The way I understand these things:
char *p = "hello";
Six bytes (including terminating '\0') are put in READ ONLY memory, and the address of the first byte is put into the variable p that has the datatype char *. Yes, the code can increment/decrement p to point further into the string.
char s[] = "hello";
Six bytes (including terminating '\0') are put in READ/WRITE memory, and referring to s means referring to the initial byte of the string. Yes, another pointer can be declared: char *p = s; And, yes, the code can manipulate the data stored in those six bytes.
char *p = &s[1]; // don't need to increment.
If you want to really play with things:
printf( "%.3s\n", &"Hi there"[3] );
will print "the". There is no name and no pointer, but there is an unnamed READONLY array that can be (carefully!) accessed.
Consider the following example that 'decorates' up to the 31 day of a month with the correct 'ordinal' suffix:
for( int DoM = 1; DoM <= 31; DoM++ ) {
int i = DoM;
if( i > 20 ) i %= 10; // Change 21-99 to 1-10. Don't wanna think about 100+
if( i > 3 ) i = 0; // Every other one ends with "th"
// 0 1 2 3
printf( "%d%s\n", DoM, &"th\0st\0nd\0rd"[ i * 3 ] );
}
I'm new to C and have a trouble understanding pointers. The part that I'm confused is about char * and int *.
For example, we can directly assign a pointer for char, like
char *c = "c"; and it doesn't make errors.
However, if I assign a pointer for int like I just did, for example int * a = 10;,
it makes errors. I need to make an additional space in memory to assign a pointer for int,
like int *b = malloc(sizeof(int)); *b = 20; free(b);...
Can anyone tell me why?
I think you're misunderstanding what a pointer is and what it means. In this case:
int* a = 10;
You're saying "create a pointer (to an int) and aim it at the literal memory location 0x0000000A (10).
That's not the same as this:
int n = 10;
int* a = &n;
Which is "create a pointer (to an int) and aim it at the memory location of n.
If you want to dynamically allocate this:
int* a = malloc(sizeof(int));
*a = 10;
Which translates to "create a pointer (to an int) and aim it at the block of memory just allocated, then assign to that location the value 10.
Normally you'd never allocate a single int, you'd allocate a bunch of them for an array, in which case you'd refer to it as a[0] through a[n-1] for an array of size n. In C *(x + y) is generally the same as x[y] or in other words *(x + 0) is either just *x or x[0].
In your example, you do not send c to the char 'c'. You used "c" which is a string-literal.
For string-literals it works as explained in https://stackoverflow.com/a/12795948/5280183.
In initialization the right-hand side (RHS) expression must be of or convertible to the type of the variable declared.
If you do
char *cp = ...
int *ip = ...
then what is in ... must be convertible to a pointer to a char or a pointer to an int. Also remember that a char is a single character.
Now, "abc" is special syntax in C - a string literal, for creating an array of (many) immutable characters. It has the type char [size] where size is the number of characters in the literal plus one for the terminating null character. So "c" has type char [2]. It is not char. An array in C is implicitly converted to pointer to the first element, having then the type "pointer to the element type". I.e. "c" of type char [2] in char *cp = "c"; is implicitly converted to type char *, which points to the first of the two characters c and \0 in the two-character array. It is also handily of type char * and now we have char *cp = (something that has type char * after conversions);.
As for int *, you're trying to pass an integer value to a pointer. It does not make sense.
A pointer holds an address. If you'd ask for the address of Sherlock Holmes, the answer would be 221 Baker Street. Now instead what you've done is "address of Sherlock Holmes is this photo I took of him in this morning".
The same incorrect code written for char * would be
char *cp = 'c'; // or more precisely `char *p = (char)'c';
and it would give you precisely the same error proving that char *cp and int *cp work alike.
Unfortunately C does not have int string literals nor literals for integer arrays, though from C99 onwards you could write:
int *ip = (int[]){ 5 };
or
const int *ip = (const int[]){ 5 };
Likewise you can always point a char * or int * to a single object of that type:
char a = 'c';
char *pa = &a;
int b = 42;
int *pb = &b;
now we can say that a and *pa designate the same char object a, and likewise b and *pb designate the same int object b. If you change *pa you will change a and vice versa; and likewise for b and *pb.
A string literal like "c" is actually an array expression (the type of "c" is "2-element array of char).
Unless it is the operand of the sizeof or & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "array of T" will be converted, or "decay" to an expression of type "pointer to T" and its value will be the address of the first element of the array.
So when you write
char *c = "c";
it’s roughly equivalent to writing
char string[] = "c";
char *c = &string[0];
You’re assigning a pointer to a pointer, so the compiler doesn’t complain.
However, when you write
int *a = 10;
you’re assigning an int value to a pointer and the types are not compatible, so the compiler complains. Pointers are not integers; they may have an integer representation, but that’s not guaranteed, and it won’t be the same size as int.
Are there differences between int pointer and char pointer in c?
short answer: NO
All pointers in C are created equal in the last decades. During some time there were pointers of different sizes, in some platforms like Windows, due to a thing called memory model.
Today the compiler sets the code to use 32-bit or 64 or any size pointers depending on the platform, but once set all pointers are equal.
What is not equal is the thing a pointer points to, and it is that you are confused about.
consider a void* pointer. malloc() allocates memory in C. It always return a void* pointer and you then cast it to anything you need
sizeof(void*) is the same as sizeof(int*). And is the same as sizeof(GiantStruct*) and in a 64-bit compiler is 8 bytes or 64 bits
what about char* c = "c"?
you must ask yourself what is "c". is is char[2]. And it is a string literal. Somewhere in memory the system will allocate an area of at least 2 bytes, put a 'c' and a zero there, since string literals are NULL terminated in C, take that address in put in the address allocated to your pointer c.
what about int* a = 10?
it is the same thing. It is somewhat ok until the operating system comes in. What is a? a is int* pointer to an int. And what is *a? an int. At which address? 10. Well, truth is the system will not let you access that address just because you set a pointer to point to it. In the '80s you could. Today there are very strong rules on this: you can only access memory allocated to your processes. Only. So the system aborts your program as soon as you try to access.
an example
if you write
int* a = malloc(300);
the system will allocate a 300-bytes area, take the area address and write it for you at the address allocated to a. When you write *a=3456 the system will write this value to the first sizeof(int) bytes at that address.
You can for sure write over all 300, but you must somewhat manipulate the pointer to point inside the area. Only inside that area.
In fact C language was designed just for that.
Let us write "c" to the end of that 300-byte area alocated to a:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char* c = "c";
int* a = (int*)malloc(300);
*a = 3456;
printf("Area content is %d\n", *a);
char* pToEnd = (char*)a + 298;
*(pToEnd) = 'c';
*(pToEnd + 1) = 0;
printf("At the end of the 300-byte area? [Should be 'c'] '%s'\n", pToEnd);
printf("The pointer c points to '%s'\n", c);
//*c = 'x'; // error: cancel program
c = pToEnd;
printf("The pointer c now points do the end of the 300-byte area:'%s'\n", c);
free(a);
return 0;
};
and see
Area content is 3456
At the end of the 300-byte area? [Should be 'c'] 'c'
The pointer c points to 'c'
The pointer c now points do the end of the 300-byte area:'c'
Also note that as soon as you try to write over c, the char pointer in your example, your program will also cancel. It is a read only area --- see the comment on line 14. But you can point c inside the area in the example and use it, since the pointer in itself is not constant.
And when you write c = pToEnd in the example above access to the string literal is lost forever. In some sense is a memory leak, but it is a static allocated area. The leak is not the area but the address.
Note also that free(a) at the end, even if being a int* will free all 300 bytes
The direct answer to your question, is that, C language is a strict typed language. And before the machine code is produced, a compiler checks for some possible errors. The types are different, and that is all that the compiler wants to know.
But from the bit-byte point of view, there are no difference, for C language, with the above two pointers.
Take a look at your code with online compiler
void main(){
int * a = 10; /* <= this presumably gives you an error. */
}
What I did is copy pasted your code that gives you the error. And online compiler had emitted a warning, not an error. Even more than that. Change the C standard to be of c17 and you still get a mere warning.
The note about assigning a string literal to a pointer had been discussed broadly at SO. It deserves a different topic.
To sum it up. For the C language there is no meaningful difference in the above two pointers. It is a user, who takes a responsibility to know a layout of a memory in his/her program. All in all, the language was designed as a high level tool that operates over linear memory.
I'm confused as to why I can create a string by using just char*. Isn't a char* just a pointer to a single char? What makes me suddenly allowed to treat this thing as a string? Can I do this with other objects? What about:
int* arr;
Do I now have an array of ints? Is it dynamically sized? Can I just start calling
arr[0] and arr[1]
and expect it to work?
I've been googling for quite awhile now, and can't find any clear answer to this question.
This is one part of C I've always just accepted as the way it works...no more.
Pointers point to things. int *arr; currently does not point anywhere. You cannot treat it as pointing to anything (array or otherwise).
You can point it at something. If you point it at a single int, you can treat it as pointing to a single int. If you point it at an element of an array of ints, you can treat it as pointing to an element of an array.
char behaves the same as int (or any other object type) in this respect. You can only treat a char * as pointing to a string if you first point that char * to a string.
Isn't a char* just a pointer to a single char?
It is a pointer to a memory location that should hold a char. It can be a single allocated char, or an array of several char's.
What makes me suddenly allowed to treat this thing as a string?
When the char* pointer points to a memory location where several chars[] are allocated one after another (an array), then you can access them with an array index [idx]. But you have to either:
A. Use a statically allocated array in code:
char myArray[5];
B. or Dynamic allocation (malloc() in C language):
int chars_in_my_array = 10;
char* myArray = (char*)malloc( sizeof(char) * chars_in_my_array );
Can I do this with other objects? What about: int* arr; Do I now have an array of ints?
no you don't:
int* p; //'p' is a *pointer* to an int
int arr[10]; //'arr' is an array of 10 integers in memory.
You can set 'p' to point to the first element in 'arr' by doing this:
p = &arr[0]; //'p' points to the address of the first item in arr[]
Is it dynamically sized?
No. That's something you'd see in a higher-level construct like 'std::string' in C++, or 'String' in Java. In C, memory is managed manually (or you use existing library functions to do this for you. see the standard header file 'string.h')
Can I just start calling arr[0] and arr[1] and expect it to work?
Not unless you do this:
int arr[5];
arr[0] = 100; arr[1] = 200;
...Because you statically allocate an array of five integers. Then you can access them with an array subscript.
However you can NOT do this:
int* p;
p[0] = 100; p[1] = 200; // NO!!
Because 'p' is just a variable that contains a memory address to an integer. It doesn't actually allocate any memory unless you explicitly allocate memory for it. You can do this though:
int* p;
int arr[10];
p = &arr[0];
p[0] = 100; p[1] = 200; // This is OK
This works because 'p' now points to memory that has been (statically) allocated for an array. So we can access the array through the pointer 'p'.
You can also dynamically allocate the array:
int* p;
p = (int*)malloc( sizeof(int) * 10 );
if( p != NULL ) {
p[0] = 100;
p[1] = 200;
}
Here we allocate memory for 10 integers and 'p' points to the first item. Now we can use 'p' as an array. But only for 10 items maximum!
I've been googling for quite awhile now, and can't find any clear
answer to this question. This is one part of C I've always just
accepted as the way it works...no more.
Really? Try googling 'arrays in c', 'memory allocation in C', 'static vs dynamic allocation C', and functions such as malloc() and free(). There's a plethora of information available, but hope this helps for starters :)
#include <stdio.h>
int main(void){
char *p = "Hello";
p = "Bye"; //Why is this valid C code? Why no derefencing operator?
int *z;
int x;
*z = x
z* = 2 //Works
z = 2 //Doesn't Work, Why does it work with characters?
char *str[2] = {"Hello","Good Bye"};
print("%s", str[1]); //Prints Good-Bye. WHY no derefrencing operator?
// Why is this valid C code? If I created an array with pointers
// shouldn't the element print the memory address and not the string?
return 0;
}
My Questions are outlined with the comments. In gerneal I'm having trouble understanding character arrays and pointers. Specifically why I can acess them without the derefrencing operator.
In gerneal I'm having trouble understanding character arrays and pointers.
This is very common for beginning C programmers. I had the same confusion back about 1985.
p = "Bye";
Since p is declared to be char*, p is simply a variable that contains a memory address of a char. The assignment above sets the value of p to be the address of the first char of the constant string "Bye", in other words the address of the letter "B".
z = 2
z is declared to be char*, so the only thing you can assign to it is the memory address of a char. You can't assign 2 to z, because 2 isn't the address of a char, it's a constant integer value.
print("%s", str[1]);
In this case, str is defined to be an array of two char* variables. In your print statement, you're printing the second of those, which is the address of the first character in the string "Good Bye".
When you type "Bye", you are actually creating what is called a String Literal. Its a special case, but essentially, when you do
p = "Bye";
What you are doing is assigning the address of this String literal to p(the string itself is stored by the compiler in a implementation dependant way (I think) ). Technically address to the first element of a char array, as Richard J. Ross III explains.
Since it is a special case, it does not work with other types.
By the way, you should likely get a compiler warning for lines like char *p = "Hello";. You should be required to define them as const char *p = "Hello"; since modifying them is undefined as the link explains.
As to the printing code.
print("%s", str[1]);
This doesnt need a dereferencing operation, since internally %s requires a pointer(specifically char *) to be passed, thus the dereferencing is done by printf. You can test this by passing a value when printf is expecting a pointer. You should get a runtime crash when it tries to dereference it.
p = "Bye";
Is an assignment of the address of the literal to the pointer.
The
array[n]
operator works in a similar way as a dereferrence of the pointer "array" increased by n. It is not the same, but it works that way.
Remember that "Hello", "Bye" all are char * not char.
So the line, p="Bye"; means that pointer p is pointing to a const char *i.e."Bye"
But in the next case with int *
*z=2 means that
`int` pointed by `z` is assigned a value of 2
while, z=2 means the pointer z points to the same int, pointed by 2.But, 2 is not a int pointer to point other ints.So, the compiler flags the error
You're confusing something: It does work with characters just as it works with integers et cetera.
What it doesn't work with are strings, because they are character arrays and arrays can only be stored in a variable using the address of their first element.
Later on, you've created an array of character pointers, or an array of strings. That means very simply that the first element of that array is a string, the second is also a string. When it comes to the printing part, you're using the second element of the array. So, unsurprisingly, the second string is printed.
If you look at it this way, you'll see that the syntax is consistent.
I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran