Is string an array or a pointer? - c

I got this tiny little program down here, that comes with preassigned 'count' number and then parses it to format of 'xxx', where x is a 0 or corresponding cipher (e.g from '6' I got 006 and from 234 I get 234). When I get it like this
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
int count = 0;
char number[2] = {0};
int base0 = count % 10;
int base1 = ((count % 100) - base0) / 10;
int base2 = ((count % 1000) - base1) / 100;
sprintf(number, "%d%d%d", base2, base1, base0); //print into the number variable
printf("%s\n", number);
}
everything is working fine, but if I switch 'number' variable definition with
char* number = NULL;
I get segmentation fault. Why is that? It should just point to the beggining of the string.

In C, a string is represented as an array of characters; the line
char number[2] = { 0 };
therefore defines a two character array ('string') initialised with two null characters.
An array is a series of contiguous (read 'next to each other') blocks of memory that can each be individually accessed via the subscript operator (square brackets).
When you changed the above line to
char* number = NULL;
You are no longer defining a character array. In this case you are now defining a pointer to char variable. Because arrays are basically blocks of memory next to each other, if you know where the first element is, it is possible to then locate any of the others. So since the deference operator allows you to access the value pointed at by a pointer, you can access the ith element in a string with
*(char + i)
which in C is exactly equivalent to
char[i]
Now in your case, you haven't set the char* to point to actual memory containing a real string, you have assigned the NULL pointer, which means 'this pointer points nowhere'. Treating a char* as a string requires it to be dereferenced (following the pointer); however, dereferencing a NULL pointer is an undefined behaviour, which in this situation has caused a seg fault

NULL is a well-defined "nowhere" - it's an invalid pointer value that's guaranteed not to point to a function or an object in memory. When you set number to NULL, you're saying that number isn't pointing anywhere meaningful.
Trying to work through an invalid pointer leads to undefined behavior - in this case, a segfault.
Arrays and pointers are not the same thing. Array expressions will "decay" (be converted) to pointer types in most circumstances, but the object is always an array.

"everything is working fine,"
Wait there. You already have sever shortage of required memory. In case of
sprintf(number, "%d%d%d", base2, base1, base0);
number needs much more memory to stay within the bounds, so as is, your program invokes undefined behavior by accessing out of bound memory. You need to allocate enough memory to hold the final sting to be held by number.
Anyways, NULL is defined to be an invalid memory location, accessing which invoke UB again, so you cannot use it to store anything, at all.

char number[2]; gives you two bytes of memory for number
char * number = null; gives you zero bytes of memory for *number.
When you do sprintf, you are writing three bytes into the number.
When say "char *number = null", you are writing into null, which results in segfault.
When you do sprintf, you are writing 3 bytes into 2 bytes of allocated memory - you are overwriting 1 extra byte into stack.

Related

C Programming: error: assignment to expression with array type

Hello I am a beginner in C and I was working with structures when i have this problem:
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char nom[20];
char prenom[20];
int note;
} Etu;
int main() {
Etu E[5];
E[0].nom = "reda";
printf("%s", E[0].nom);
return 0;
}
With this one I have this error (error: assignment to expression with array type). So I decided to do it using pointers and it actually working this is the code I used:
#include <stdio.h>
#include <stdlib.h>
typedef struct {
char *nom;
char *prenom;
int note;
} Etu;
int main() {
Etu E[5];
E[0].nom = "reda";
printf("%s", E[0].nom);
return 0;
}
So the question is what is the difference between both of them since strings are pointers.
thank you ..
Strings are not pointers, they are sequences of characters terminated with a null byte. These sequences are stored in arrays, which are not pointers either, but objects that live in memory and to which a pointer can point. A pointer is a variable that contains the address of an object in memory.
In the first example, you try and store a string into an array with =. C does not support this type of assignment and complains with an explicit error message. You can copy the string into the array with strcpy, assuming the destination array is large enough to store the bytes of the string, including the null terminator. "reda" uses 5 bytes, so it is OK to copy it into E[0].nom:
#include <stdio.h>
#include <string.h>
typedef struct {
char nom[20];
char prenom[20];
int note;
} Etu;
int main() {
Etu E[5];
strcpy(E[0].nom, "reda");
printf("%s\n", E[0].nom);
return 0;
}
The second example uses a different approach: nom is now defined as a pointer to char. Defining the array E in main leaves it uninitialized, so you can only read from the pointers after you set their value to point to actual arrays of char somewhere in memory. E[0].nom = "reda"; does just that: set the address of the string literal "reda" into the member nom of E[0]. The string literal will be placed by the compiler into an array of its own, including a null terminator, and that must not be changed by the program.
First part, you try to copy two array of character (string is not a pointer, it is array of character that is terminated by null character \0).
If you want to copy value of an array to another, you can use memcpy, but for string, you can also use strcpy.
E[0].nom = "reda";
change to:
strcpy(E[0].nom,"reda");
Second part, you make the pointer point to string literal. The pointer points to first character of the string (r in this case).
You can see How to copy a string using a pointer
And Assigning strings to pointer in C Language
It is a lower-level concept actually.
If we say char nom[20] than its memory location will be decided at compile time and it will use stack memory (see the difference between stack memory and heap memory to gain more grasp on lover level programming). so we need to use it with indexes.
such as,
nom[0] = 'a';
nom[1] = 'b'; // and so on.
On the other hand if we create a string using the double quotes method (or the second way in which you used pointers). The double quoted strings are identified as const char* by the compiler and the string is placed at the heap memory rather than the stack memory. moreover their memory location is not decided at compile time.
a very efficient way of testing the difference between these two types of strings is that when you use sizeof() with char[20] nom it will return 20 * sizeof(char) which evaluates to 20 without any doubt. it clearly states that whether you use only 5 characters or 2 characters it will always occupy space of 20 charaters (in stack memory)
but in case of const char* if you use sizeof operator on this variable than it will return the size of the pointer variable which depends on the hardware (32bit will have 4bytes pointer) and (64bit will have 8bytes pointer). So, the sizeof operator do not shows how many characters are stored in the pointer. it just shows the size of pointer. but the benefit of using const char* is that only take memory that is required if 4 characters are stored than it will consume 5 byte memory (last byte is additionally added in case of strings) and in case of 8 characters it will consume 9 bytes and so on but note that the characters will be stored in heap(memory)
pointer --------------------> "characters"
(on stack) (heap)
What you have been doing in the first problem is that you were assigning a string that is only stored on heap to a data types that is only stored on stack. it is not the matter of string. it the matter of memory model
I hope you might have understood. I not i can elaborate more in comments.
an additional information. Stack Memory is faster compared to the Heap Memory.

What is the difference between using strcpy and equating the addresses of strings?

I am not able to understand the difference between strcpy function and the method of equating the addresses of the strings using a pointer.The code given below would make my issue more clear. Any help would be appreciated.
//code to take input of strings in an array of pointers
#include <stdio.h>
#include <strings.h>
int main()
{
//suppose the array of pointers is of 10 elements
char *strings[10],string[50],*p;
int length;
//proper method to take inputs:
for(i=0;i<10;i++)
{
scanf(" %49[^\n]",string);
length = strlen(string);
p = (char *)malloc(length+1);
strcpy(p,string);//why use strcpy here instead of p = string
strings[i] = p; //why use this long way instead of writing directly strcpy(strings[i],string) by first defining malloc for strings[i]
}
return 0;
}
A short introduction into the magic of pointers:
char *strings[10],string[50],*p;
These are three variables with distinct types:
char *strings[10]; // an array of 10 pointers to char
char string[50]; // an array of 50 char
char *p; // a pointer to char
Then the followin is done (10 times):
scanf(" %49[^\n]",string);
Read C string from input and store it into string considering that a 0 terminator must fit in also.
length = strlen(string);
Count non-0 characters until 0 terminator is found and store in length.
p = (char *)malloc(length+1);
Allocate memory on heap with length + 1 (for 0 terminator) and store address of that memory in p. (malloc() might fail. A check if (p != NULL) wouldn't hurt.)
strcpy(p,string);//why use strcpy here instead of p = string
Copy C string in string to memory pointed in p. strcpy() copies until (inclusive) 0 terminator is found in source.
strings[i] = p;
Assign p (the pointer to memory) to strings[i]. (After assignment strings[i] points to the same memory than p. The assignment is a pointer assignment but not the assignment of the value to which is pointed.)
Why strcpy(p,string); instead of p = string:
The latter would assign address of string (the local variable, probably stored on stack) to p.
The address of allocated memory (with malloc()) would have been lost. (This introduces a memory leak - memory in heap which cannot be addressed by any pointer in code.)
p would now point to the local variable in string (for every iteration in for loop). Hence afterwards, all entries of strings[10] would point to string finally.
char *strings[10]---- --------->1.
strcpy(strings[i],string) ----->2.
strings[i] = string ----------->3.
p = (char *)malloc(length+1); -|
strcpy(p,string); |-> 4.
strings[i] = p;----------------|
strings is an array of pointers, each pointer must point to valid memory.
Will lead undefined behavior since strings[i] is not pointing to valid memory.
Works but every pointer of strings will point to same location thus each will have same contents.
Thus create the new memory first, copy the contents to it and assign that memory to strings[i]
strcpy copies a particular string into allocated memory. Assigning pointers doesn't actually copy the string, just sets the second pointer variable to the same value as the first.
strcpy(char *destination, char *source);
copies from source to destination until the function finds '\0'. This function is not secure and should not be used - try strncpy or strlcpy instead. You can find useful information about these two functions at https://linux.die.net/man/3/strncpy - check where your code is going to run in order to help you choose the best option.
In your code block you have this declaration
char *strings[10],string[50],*p;
This declares three pointers, but they are quite different. *p is an ordinary pointer, and must have space allocated for it (via malloc) before you can use it. string[50] is also a pointer, but of length 50 (characters, usually 1 byte) - and it's allocated on the function stack directly so you can use it right away (though the very first use of it should be to zero out the memory unless you've used a zeroing allocator like Solaris' calloc. Finally, *strings[10] is a double pointer - you have allocated an array of 10 pointers, each element of which (strings[1], strings[9] etc) must be allocated for before use.
The only one of those which you can assign to immediately is string, because the space is already allocated. Each of those pointers can be addressed via subscripts - but in each case you must ensure that you do not walk off the end otherwise you'll incur a SIGSEGV "segmentation violation" and your program will crash. Or at least, it should, but you might instead get merely weird results.
Finally, pointers allocated to must be freed manually otherwise you'll have memory leaks. Items allocated on the stack (string) do not need to be freed because the compiler handles that for you when the function ends.

Why the char has to be a pointer instead of a type of char?

#include <stdio.h>
typedef struct {
char * name;
int age;
} person;
int main() {
person john;
/* testing code */
john.name = "John";
john.age = 27;
printf("%s is %d years old.", john.name, john.age);
}
This a well-working code, I just got a small question.
In the struct part, after I delete the * before name, this code no longer works, but no matter the age's type is, int or a pointer, it always works fine. So can anyone tell me why name has to be a pointer rather than just a type of char?
char type is short for character and can hold one character. C has no string type, instead a string in C is an array of char terminated with '\0' - the null character (null terminated strings).
Thus to use a string you need a pointer to memory that contains lots of characters. So why does it work for an int with or without the *. Well we can either have the age as an int or we can have a pointer to memory that stores the age. Either works well. But we can't store a string in one character.
This has to do with format specifiers you've in printf function. %s tries to output the string (reads a portion of memory), %d interprets everything in gets like an integer, thus even a pointer sort of works, however, you shouldn't to that, it's undefined behavior.
I suggest you to read some good books on C to get a good grasp on such things, a good list is here The Definitive C Book Guide and List
but no matter the age's type is int or a pointer, it always works fine.
That's undefined behaviour.
To elaborate, a double-quote delimited string (as seen above) is a string literal, and when used as an initializer, it basically gives you a pointer to the starting of the literal thereby it needs a pointer variable to be stored. So, name has to be a pointer.
OTOH, the initializer 27 is an integer literal (integer constant) and it needs to be stored into an int variable , not an int *. If you use 27 to initialize an int * and use that, it works (rather, seem to work) because that way, it invokes undefined behavior later, by attempting to use invalid memory location.
FWIW, if you try something like
typedef struct {
char * name;
int *age;
} person;
and then
john.age = 27; //incompatible assigment
compiler will warn you about wrong conversion from integer to pointer.
char *name: name is a pointer to type char. Now, when you make it to point to "John", the compiler stores the John\0 i.e., 5 chars to some memory and returns you the starting address of that memory. So, when you try to read using %s (string format specifier), the name variable returns you the whole string reading till \0.
char name : Here name is just one char having 1 byte of memory. So, you can't store anything more than one char. Also, when you would try to read, you should always read just one char (%c) because trying to read more than that will take you to the memory region which is not assigned to you and hence, will invoke Undefined Behavior.
int age : age is allocated 4 bytes, so you can store an integer to this memory and read as well, printf("%d", age);
int *age : age is a pointer to type int and it stores the address of some memory. Unlike strings, you do not read integers using address (loosely saying, just for the sake of avoiding complexity). You have to dereference it. So first, you need to allocate some memory, store any integer into it and return the address of this memory to age. Or else, if you don't want to allocate memory, you can use compiler's help by assigning a value to age like this, *age = 27. In this case, compiler will store 27 to some random memory and will return the address to age which can be dereferenced using *age, like printf("%d", *age);

pointer related queries

Guys i have few queries in pointers. Kindly help to resolve them
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
question.1 : what is the difference between these 2 types of declaration ?
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
question.2 : i didn't get how is it working
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
question.3 how many bytes are being allocated to the pointer c?
when i try to input a string, it takes just a word and not the whole string. why so ?
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
question.4 when i try to input a charcter why does it throw a segmentation fault?
Thanks in advance.. Waiting for your reply guys..
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
the %s expects a pointer to array of chars.
char *c=malloc(sizeof(char)); // you are allocating only 1 byte aka char, not array of char!
scanf("%s",c); // you need pass a pointer to array, not a pointer to char
printf("%s",c);// you are printing a array of chars, but you are sending a char
you need do this:
int sizeofstring = 200; // max size of buffer
char *c = malloc(sizeof(char))*sizeofstring; //almost equals to declare char c[200]
scanf("%s",c);
printf("%s",c);
question.3 how many bytes are being allocated to the pointer c? when i
try to input a string, it takes just a word and not the whole string.
why so ?
In your code, you only are allocating 1 byte because sizeof(char) = 1byte = 8bit, you need allocate sizeof(char)*N, were N is your "string" size.
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
Here you are declaring two variables, a and b, and initializing them. "this is an array of characters" is a string literal, which in C has type array of char. a has type array of char. In this specific case, the array does not get converted to a pointer, and a gets initialized with the array "this is an array of characters". b has type pointer to char, the array gets converted to a pointer, and b gets initialized with a pointer to the array "this is an array of characters".
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
In an expression, *b dereferences the pointer b, so it evaluates to the char pointed by b, i.e: T. This is not an address (which is what "%s" is expecting), so you get undefined behavior, most probably a crash (but don't try to do this on embedded systems, you could get mysterious behaviour and corrupted data, which is worse than a crash). In the second case, %s expects a pointer to a char, gets it, and can proceed to do its thing.
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
In C, sizeof returns the size in bytes of an object (= region of storage). In C, a char is defined to be the same as a byte, which has at least 8 bits, but can have more (but some standards put additional restrictions, e.g: POSIX requires 8-bit bytes, i.e: octets). So, you are allocating 1 byte. When you call scanf(), it writes in the memory pointed to by d without restraint, overwriting everything in sight. scanf() allows maximum field widths, so:
Allocate more memory, at least enough for what you want + 1 terminating ASCII NUL.
Tell scanf() to stop, e.g: scanf("%19s") for a maximum 19 characters (you'll need 20 bytes to store that, counting the terminating ASCII NUL).
And last (if markdown lets me):
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
c is not a pointer, so you are trying to store an address where you shouldn't. In scanf, "%c" expects a pointer to char, which should point to an object (=region of storage) with enough space for the specified field width, 1 by default. Since c is not a pointer, the above may crash in some platforms (and cause worse things on others).
I see several problems in your code.
Question 1: The difference is:
a gets allocated in writable memory, the so-called data segment. Here you can read and write as much as you want. sizeof a is the length of the string plus 1, the so-called string terminator (just a null byte).
b, however, is just a pointer to a string which is located in the rodata. That means, in a data area which is read only. sizeof b is whatever is the pointer size on your system, maybe 4 or 8 on a PC or 2 on many embedded systems.
Question 2: The printf() format wants a pointer to a string. With *b, you dereferene the pointer you have and give it the first byte of data, which is a t (ASCII 84 or something like that). The callee, however, treats it as a pointer, dereferences it and BAM.
With b, however, everything goes fine, as it is exactly the right call.
Question 3: malloc(sizeof(char)) allocates exactly one byte. sizeof(char) is 1 by definition, so the call is effectively malloc(1). The input just takes a word because %s is defined that way.
Question 4:
char c=malloc(sizeof(char)); // 4)
shound give you a warning: malloc() returns a pointer which you try to put into a char. ITYM char *...
As you continue, you give that pointer to scanf(), which receives e.g. instead of 0x80043214 a mere 0x14, interprets it as a pointer and BAM again.
The correct way would be
char * c=malloc(1024);
scanf("%1024s", c);
printf("%s", c);
Why? Well, you want to read a string. 1 byte is too small, better allocate more.
In scanf() you should take care that you don't allow reading more than your buffer can hold - thus the limitation in the format specifier.
and on printing, you should use %s, because you want the whole string to be printed and not only the first character. (At least, I suppose so.)
Ad Q1: The first is an array of chars with a fixed pointer a pointing to it. sizeof(a) will return something like 20 (strlen(a)+1). Trying to assign something to a (like a = b) will fail, since a is fixed.
The second is a pointer pointing to an array of char and hence is the sizeof(b) usually 4 on 32-bit or 8 on 64-bit. Assigning something to b will work, since the pointer can take a new value.
Of course, *a or *b work on both.
Ad Q2: printf() with the %s argument takes a pointer to a char (those are the "strings" in C). Hence, printf("%s", *b) will crash, since the "pointer" used by printf() will contain the byte value of *b.
What you could do, is printf("%c", *b), but that would only print the first character.
Ad Q3: sizeof(char) is 1 (by definition), hence you allocate 1 byte. The scanf will most likely read more than one byte (remember that each string will be terminated by a null character occupying one char). Hence the scanf will trash memory, likely to cause memory sometime later on.
Ad 4: Maybe that's the trashed memory.
Both declaration are the same.
b point to the first byte so when you say *b it's the first character.
printf("%s", *b)
Will fail as %s accepts a pointer to a string.
char is one byte.

C strings confusion

I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 8 (0x123B). Anything after that does not belong to name variable, that would be called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block for 20 bytes, no, it is not 21, the reason I added 1 on to the size is for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The address of name is %p\n", name);
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00005678
The address of name is 0x00009876
Now, this is where the illusion that 'arrays and pointers are the same comes into play here'
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987A to retrieve the value at that memory address, i.e. 'r' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the expression *(name + subscript_value) is commutative, that is in the reverse,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of *name and what is the address, the answer is, the *name will now contain 't' and assign it to ch, and the pointer memory address is 0x9879.
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'....if the pointer has not been mallocd and any attempt to dereference it, is guaranteed to crash and burn.
Also: remember this, for every malloc there is a corresponding free, if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
Make sure to check that name is, in fact, a valid point before trying to free its memory. That's what the if statement does.
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
void strcpy(char *str1, char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a static character array allocated in the function. It will go out of scope and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Editted to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, low and behold you have a pointer to a string.
char *name, on it's own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just like after declaring the value of an integer, you might later set its value (number = 42), after declaring a pointer to char, you might later set its value to be a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares array and char* name declares pointer. The two are different animals.
However, array in C can be implicitly converted to pointer to its first element. This gives you ability to perform pointer arithmetic and iterate through array elements (it does not matter elements of what type, char or not). As #which mentioned, you can use both, indexing operator or pointer arithmetic to access array elements. In fact, indexing operator is just a syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish difference between array and pointer to first element of array. It is possible to query size of array declared as char name[15] using sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get size of pointer on your platform (i.e. 4 bytes):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
The dual nature of arrays, the implicit conversion of array to pointer to its first element, in C (and also C++) language, pointer can be used as iterator to walk through array elements:
/ *skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
In C a string is actually just an array of characters, as you can see by the definition. However, superficially, any array is just a pointer to its first element, see below for the subtle intricacies. There is no range checking in C, the range you supply in the variable declaration has only meaning for the memory allocation for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave of confusion and avoid misleading people, some extra words on the more intricate difference between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, array is not an lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //compile error, multidimensional arrays can not decay into pointer
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, don't know dimensions of array, so subscripting is not possible
int z = r[1]: //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia; since a[x] is equivalent to *(a + x) you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b' the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or **char, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran

Resources