I'm having trouble understanding the following code:
const char *suit[4] = {"Hearts", "Diamonds", "Clubs", "Spades"};
I don't understand what is stored in the array suit, are they pointers? And if so, where are the strings stored?
Also, is the pointer constant, or the array constant?
I would appreciate a full detailed explanation of this code, and what is going on in memory!
Thanks in advance.
We learn a lot by using cdecl.org. This is what it tells us about suit:
declare suit as array 4 of pointer to const char
So:
the array contains 4 pointers.
each pointer points at a char (in this case, the first character of each string).
the pointers are not const, and neither is the array.
The strings are literals; where they are stored is implementation-specific.
In ASCII art:
              "Clubs"
               ^
               |   "Spades"
               |   ^
               |   |
     +---+---+---+---+
suit |   |   |   |   |
     +---+---+---+---+
       |   |
       |   v
       |   "Diamonds"
       v
       "Hearts"
Note that suit itself is not a pointer; it's the name of the array.
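const char * is commonly used as a string type, since a C string is just an array of characters reached through a pointer to its first element. So here you have an array of const char * (strings). The string literals themselves must not be modified; they are typically stored in a read-only section of the compiled binary (such as .rodata), though the exact location is implementation-specific. The const says that the data pointed to may not be changed through these pointers.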
hello to all programmers, I can't understand something
1. char a[]="hello";
2. char* b="salam";
The first question is why we can't modify 2, for example with b[0]='m'. I know that 2 gets stored as a compile-time constant, but I can't understand what that means. What is the nature of 2?
and second question:
3.
char a[]="hello";
char* c=a;
c[0]='r';
Now we can modify and then print c, but we couldn't modify 2. Why?
I can't understand the concept of those pointers; please explain it to me.
char a[] = "hello"; is a null-terminated array of characters. The array is initialized with the characters you specify and its size is deduced by the compiler; in this case it has space for 6 characters (five letters plus the terminating '\0'). The characters are copied into the array and are mutable, so you can change them at will, e.g. a[0] = 'x' will change hello to xello.
char* c = a; just makes the pointer c point to the first element of a. The same operations can be performed through c, as you are really operating on a.
char* b = "salam" is a different animal. b is a pointer to a string literal, which is not meant to be modified. The literal is not copied into an array you own like a; it is usually stored in some read-only section of memory. Either way, modifying it through b is undefined behavior, i.e. b[0] = 'x' is not allowed by the language rules, even though it compiles.
char a[]="hello";
This creates an array like this:
   +---+---+---+---+---+----+
a: | h | e | l | l | o | \0 |
   +---+---+---+---+---+----+
The array is modifiable and you can write other characters to it later if you like (although you cannot write more than 5 or 6 of them).
char* b="salam";
This uses a string literal to create a constant string somewhere, and the variable b is then a pointer to it. I like to draw it like this:
   +-------+
b: |   *   |
   +---|---+
       |
       V
   +---+---+---+---+---+----+
   | s | a | l | a | m | \0 |
   +---+---+---+---+---+----+
There are two differences here: (1) b is a pointer, not an array as a was. (2) the string here (that b points to) is probably in nonwritable memory. But a was definitely in writable memory.
char* c=a;
Now c is a pointer, pointing at the earlier-declared array a. The picture looks like this:
   +---+---+---+---+---+----+
a: | h | e | l | l | o | \0 |
   +---+---+---+---+---+----+
     ^
     |
      \
       |
   +---|---+
c: |   *   |
   +-------+
And the array a was modifiable, so there's no problem doing c[0] = 'r', and we end up sounding like Scooby-Doo and saying:
   +---+---+---+---+---+----+
a: | r | e | l | l | o | \0 |
   +---+---+---+---+---+----+
     ^
     |
      \
       |
   +---|---+
c: |   *   |
   +-------+
The key difference (which can be quite subtle) is that a string literal in source code like "hello" can be used in two very different ways. When you say
char a[] = "hello";
the string literal is used as the initial value of the array a. But the array a is an ordinary, modifiable array, and there's no problem writing to it later.
Most other uses of string literals, however, work differently. When you say
char *b = "salam";
or
printf("goodbye\n");
those string literals are used to create and initialize "anonymous" string arrays somewhere, which are referred to thereafter via pointers. The arrays are "anonymous" in that they don't have names (identifiers) to refer to them, and they're also usually placed in read-only memory, so you're not supposed to try to write to them.
Let's start off with your first question:
We have 2 strings, a and b
char a[] = "hello";
char *b = "salam";
The first string can be modified because a is an array of its own: the characters of "hello" are copied into it, and the array lives in writable memory (the stack if a is a local variable, the data segment if it is global or static).
The second is a pointer to a string literal. We must not modify string literals, since C specifies that doing so is undefined behavior.
b just holds the address of wherever in the program that string is stored. The pointer should preferably be declared as pointing to const, since the string can't be modified anyway.
const char *b = "salam";
Now let's look at the second question:
The code you provided for the second question is perfectly valid,
char a[] = "hello";
char *c = a;
c[0] = 'r';
We have a, which stores the actual string and if using ASCII it consists of 6 bytes 'h', 'e', 'l', 'l', 'o', '\0'
c points to a; we can verify this with the following code:
#include <stdio.h>
int main(void) {
    char a[] = "hello";
    char *c = a;
    c[0] = 'r';
    /* %p expects a void *, so cast both pointers */
    printf("a: %p\nc: %p\n", (void *)a, (void *)c);
    return 0;
}
And we'll get output as such
a: 0x7ffe3c94ecf2
c: 0x7ffe3c94ecf2
Both lines show the same address, the start of the array. When we write
c[0] // equivalent to *(c + 0)
we take the address that c holds, add 0 elements to it, and dereference the result. This is how subscripting works in general: a[1] is *(a + 1), and so on.
So in this run c points to
0x7ffe3c94ecf2
and c + 0 is
0x7ffe3c94ecf2
so c[0] = 'r' accesses that address and modifies the character stored there.
We could declare a pointer to an integer by writing int*. We already saw a pointer type char** argv. This is a pointer to pointers to characters.
Seems that argv is a pointer to multiple pointers which point to chars.
In C strings are represented by the pointer type char*. Under the hood they are stored as a list of characters, where the final character is a special character called the null terminator.
Is it the case with above char** where the pointers are stored as characters in the string ?
A pointer can point to a single object, or it can point to an array of objects.
In the case of the argv parameter to main, which is declared as char *argv[] (or equivalently char **argv, since it is a function parameter), it is a pointer to the first element of an array of char *.
In memory it looks something like this:
argv
-----
| .-|----> ------
----- | | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 1 | \0 |
| | ----------------------------------
------
| | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 2 | \0 |
| | ----------------------------------
------
| | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 3 | \0 |
| | ----------------------------------
------
...
When we define a char *argv[] for example :
Example 1:
const char *p[5] = {"ali", "reza", "hamid", "saeed", "mohsen"};
for(int i = 0;i < 5;i++)
printf("%s\n", p[i]);
Example 2 : (Here p points to the first of 5 char* elements; note that new is C++, not C)
char **p;
p = new char*[5];
for(int i = 0;i < 5;i++)
p[i] = new char[10];
This happens in memory :
Yes.
A pointer p to type T can point to a single T, or to an array of T. In the latter case you can index into the array using pointer arithmetics, such as p[n]. In the same way, argv[n]'s pointees are not single chars, but nul-terminated arrays of chars, AKA C-style strings.
A pointer is a reference to a memory address - pointer contains address to a variable. A pointer to pointer is a form of indirection where the pointer contains address to the other pointer variable. The second pointer variable contains address where the value is stored.
argv stands for argument vector; it refers to the arguments passed to a program via the command line. As a pointer, argv refers to the first element of an array of char * pointers; since the vector is laid out as an array, the other pointers can be found by indexing from the first.
Memory-Address: |0xA0|0xA1|0xA2|0xA3|0xA4|0xA5|0xA6|0xA7|
Memory-Content: | 0x123 | 0x456 |
|-------4-Byte------|
|<- int* = 0x123
A pointer in C contains the address of a specific region in memory (ignoring virtual memory).
The pure address marks the start-position (here 0xA0) and the range is bounded by the size of the actual C-type.
But the content may be a pointer as well. (Here just 32-Bit addresses!)
Memory-Address: |0xA0|0xA1|0xA2|0xA3|0xA4|0xA5|0xA6|0xA7|
Memory-Content: | 0xA4 | 0x123 |
|-------4-Byte------|
|<- int** = 0xA4 |<- int* = 0x123
So you can construct any pointer hierarchy in memory.
I'm quite confused because from what I've learned, pointers store addresses of the data they are pointing to. But in some codes, I see strings often assigned to pointers during initialization.
What exactly happens to the string?
Does the pointer automatically assign an address to store the string and point itself to that address?
How does "dereferencing" work in pointers to strings?
In case of
char *p = "String";
the compiler allocates memory for "String" (most likely in a read-only data section) and sets the pointer p to point to the first byte of that memory.
p --------------+
                |
                |
                V
             +------+------+------+------+------+------+------+
             |      |      |      |      |      |      |      |
             | 'S'  | 't'  | 'r'  | 'i'  | 'n'  | 'g'  | '\0' |
             |      |      |      |      |      |      |      |
             +------+------+------+------+------+------+------+
               x100   x101   x102   x103   x104   x105   x106
Q: I see strings often assigned to pointers during initialization.
I think what you are calling a string is actually a string literal.
According to C11 standard, chapter §6.4.5
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz". [...]
The expression "xyz" yields the address of the first element of the string literal, which is then stored in the pointer, as you've seen at initialization time.
Q: Does the pointer automatically assign an address to store the string and point itself to that address?
A: No, the memory for storing the string literal is allocated at compile time by the compiler. Whether a string literal is stored in a read only memory or read-write memory is compiler dependent. Standard only mentions that any attempt to modify a string literal results in undefined behavior.
Q: How does "dereferencing" work in pointers to strings?
A: Just the same way as it does for a pointer to any other variable.
I'm trying to write a C99 program and I have an array of strings implicitly defined as such:
char *stuff[] = {"hello","pie","deadbeef"};
Since the array dimensions are not defined, how much memory is allocated for each string? Are all strings allocated the same amount of elements as the largest string in the definition? For example, would this following code be equivalent to the implicit definition above:
char stuff[3][9];
strcpy(stuff[0], "hello");
strcpy(stuff[1], "pie");
strcpy(stuff[2], "deadbeef");
Or is each string allocated just the amount of memory it needs at the time of definition (i.e. stuff[0] holds an array of 6 elements, stuff[1] holds an array of 4 elements, and stuff[2] holds an array of 9 elements)?
Pictures can help — ASCII Art is fun (but laborious).
char *stuff[] = {"hello","pie","deadbeef"};
+----------+ +---------+
| stuff[0] |--------->| hello\0 |
+----------+ +---------+ +-------+
| stuff[1] |-------------------------->| pie\0 |
+----------+ +------------+ +-------+
| stuff[2] |--------->| deadbeef\0 |
+----------+ +------------+
The memory allocated for the 1D array of pointers is contiguous, but there is no guarantee that the pointers held in the array point to contiguous sections of memory (which is why the pointer lines are different lengths).
char stuff[3][9];
strcpy(stuff[0], "hello");
strcpy(stuff[1], "pie");
strcpy(stuff[2], "deadbeef");
+---+---+---+---+---+---+---+---+---+
| h | e | l | l | o | \0| x | x | x |
+---+---+---+---+---+---+---+---+---+
| p | i | e | \0| x | x | x | x | x |
+---+---+---+---+---+---+---+---+---+
| d | e | a | d | b | e | e | f | \0|
+---+---+---+---+---+---+---+---+---+
The memory allocated for the 2D array is contiguous. The x's denote uninitialized bytes. Note that stuff[0] is an array that decays to a pointer to the 'h' of 'hello', stuff[1] decays to a pointer to the 'p' of 'pie', and stuff[2] decays to a pointer to the first 'd' of 'deadbeef' (and stuff[3] would give a non-dereferenceable pointer to the byte beyond the null byte after 'deadbeef').
The pictures are quite, quite different.
Note that you could have written either of these:
char stuff[3][9] = { "hello", "pie", "deadbeef" };
char stuff[][9] = { "hello", "pie", "deadbeef" };
and you would have the same memory layout as shown in the 2D array diagram (except that the x's would be zeroed).
char *stuff[] = {"hello","pie","deadbeef"};
Is not a multidimensional array! It is simply an array of pointers.
how much memory is allocated for each string?
The number of characters plus a null terminator. Same as any string literal.
I think you want this:
char foo[][10] = {"hello","pie","deadbeef"};
Here, 10 is the amount of space per string and all the strings are in contiguous memory. Thus, there will be padding for strings less than size 10.
In the first example, it is a jagged array I suppose.
It declares an array of pointers to char (neither the pointers nor the chars are declared const, although the literals must not be modified). So each string literal can be as long as you like; the length of a string is independent of the array's dimensions.
In the second one, each row (string) can hold at most 9 chars, as specified by your column size, and that count includes the null terminator.
Are all strings allocated the same amount of elements as the largest
string in the definition?
No, only 3 pointers are allocated, and they point to 3 string literals.
char *stuff[] = {"hello","pie","deadbeef"};
and
char stuff[3][9];
are not at all equivalent. The first is an array of 3 pointers, whereas the second is a 2D array of chars.
For the first, only the pointers are allocated in the array, and the string literals they point to may be stored in a read-only section. The second, if declared inside a function, is allocated in automatic storage (usually the stack) and holds the characters itself.
I know this topic was already discussed several times and I think I basically know the difference between arrays and pointer but I am interested in how arrays are exactly stored in mem.
for example:
const char **name = {{'a',0},{'b',0},{'c',0},0};
printf("Char: %c\n", name[0][0]); // This does not work
but if its declared like this:
const char *name[] = {"a","b","c"};
printf("Char: %c\n", name[0][0]); // Works well
everything works out fine.
When you define a variable like
char const* str = "abc";
char const** name = &str;
it looks something like this:
+---+ +---+ +---+---+---+---+
| *-+---->| *-+--->| a | b | c | 0 |
+---+ +---+ +---+---+---+---+
When you define a variable using the form
char const* name[] = { "a", "b", "c" };
You have an array of pointers. This looks something like that:
+---+ +---+---+
| *-+---->| a | 0 |
+---+ +---+---+
| *-+---->| b | 0 |
+---+ +---+---+
| *-+---->| c | 0 |
+---+ +---+---+
What may be confusing is that when you pass this array somewhere, it decays into a pointer and you get this:
+---+     +---+     +---+---+
| *-+---->| *-+---->| a | 0 |
+---+     +---+     +---+---+
          | *-+---->| b | 0 |
          +---+     +---+---+
          | *-+---->| c | 0 |
          +---+     +---+---+
That is, you get a pointer to the first element of the array. Incrementing this pointer moves on to the next element of the array.
A string literal converts implicitly to char const*.
The curly braces initializer doesn't.
Not relevant to your example, but worth knowing: up till and including C++03 a string literal could also implicitly convert to char* (no const), for compatibility with old C, but happily in C++11 this unsafe conversion was finally removed.
The reason the first snippet does not work is that the compiler re-interprets the sequence of characters as the value of a pointer, and then ignores the rest of the initializers. In order for the snippet to work, you need to tell the compiler that you are declaring an array, and that the elements of that array are arrays themselves, like this:
const char *name[] = {(char[]){'a',0},(char[]){'b',0},(char[]){'c',0},0};
With this modification in place, your program works and produces the desired output.
Your first example declares a pointer to a pointer to char. The second declares an array of pointers to char. The difference is that there's one more layer of indirection in the first one. It's a bit hard to describe without a drawing.
In a fake assembly style,
char **name = {{'a',0},{'b',0},{'c',0},0};
would translate to something like:
t1: .byte 'a', 0
.align somewhere; possibly somewhere convenient
t2: .byte 'b', 0
.align
t3: .byte 'c', 0
.align
t4: .dword t1, t2, t3, 0
name: .dword t4
while the second one,
char *name[] = {"a","b","c"};
might generate the same code for t1, t2, and t3, but then would do
name: .dword t1, t2, t3
Does that make sense?
Arrays are stored in memory as a contiguous sequence of objects, where the type of that object is the base type of the array. So, in the case of your array:
const char *name[] = {"a","b","c"};
The base type of the array is const char * and the size of the array is 3 (because your initialiser has three elements). It would look like this in memory:
| const char * | const char * | const char * |
Note that the elements of the array are pointers - the actual strings aren't stored in the array. Each one of those strings is a string literal, which is an array of char. In this case, they're all arrays of two chars, so somewhere else in memory you have three unnamed arrays:
| 'a' | 0 |
| 'b' | 0 |
| 'c' | 0 |
The initialiser sets the three elements of your name array to point to the initial elements of these three unnamed arrays. name[0] points to the 'a', name[1] points to the 'b' and name[2] points to the 'c'.
You have to look at what happens when you declare a variable, and where the memory to store the data for the variable goes.
First, what does it mean to simply write:
char x = 42;
you get enough bytes to hold a char on the stack, and those bytes are set to the value 42.
Secondly, what happens when you declare an array:
char x[] = "hello";
you get 6 bytes on the stack, and they are set to the characters h, e, l, l, o, and the value zero.
Now what happens if you declare a character pointer:
const char* x = "hello";
The bytes for "hello" are stored somewhere in static memory, and you get enough bytes to hold a pointer on the stack, and its value is set to the address of the first byte of that static memory that holds the value of the string.
So now what happens when you declare it as in your second example? You get three separate strings stored in static memory, "a", "b", and "c". Then on the stack you get an array of three pointers, each set to the memory location of those three strings.
So what is your first example trying to do? It looks like you want a pointer to an array of pointers, but the question is where will this array of pointers go? This is like my pointer example above, where something should be allocated in static memory. However, it just happens that you cannot declare a two dimensional array in static memory using brace initialisation like that. So you could do what you want by declaring the array as a variable outside of the function:
const char* name_pointers[] = {"a", "b", "c"};
then inside the function:
const char** name = name_pointers;