In C it's possible to initialize a string array with a line of code such as:
char *exStr[] = { "Zorro", "Alex", "Celine", "Bill", "Forest", "Dexter" };
In my debugger, the array is initialized as shown below:
This is different from a standard two-dimensional array, where each string occupies the same number of bytes. Can someone point me in the right direction to understand:
What exactly does the declaration "char *exStr[] = ..." mean?
How can I re-create the same variable's structure in my program?
What exactly does the declaration "char *exStr[] = ..." mean?
It means 'array of pointers to char'. The character data of literals like "Zorro" is usually placed in a read-only data segment, and the array elements contain only the address of the first char of each literal.
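To make that concrete, here is a small sketch (not taken from the question) showing that every element of exStr is one pointer of the same size, while the strings those pointers refer to have different lengths. The helper names exStr_count and exStr_len are made up for illustration, and const is added because the elements point at string literals.

```c
#include <stddef.h>
#include <string.h>

/* The array from the question; const added because each element
   points at a string literal, which must not be modified. */
static const char *exStr[] = { "Zorro", "Alex", "Celine", "Bill", "Forest", "Dexter" };

/* Number of elements: each element is one pointer, whatever the string length. */
static size_t exStr_count(void)
{
    return sizeof exStr / sizeof exStr[0];
}

/* Length of the string element i points to, not counting the '\0'. */
static size_t exStr_len(size_t i)
{
    return strlen(exStr[i]);
}
```

So sizeof exStr is 6 pointers' worth of bytes, even though "Zorro" needs 6 bytes of character data and "Celine" needs 7.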
How can I re-create the same variable's structure in my program?
You can do something like:
char zorro[] = {'Z', 'o', 'r', 'r', 'o', '\0'}; // initialized every char for clarity
char alex[] = "Alex";
char celine[] = "Celine";
...
char* exStr[] = {
&zorro[0], // explicitly referenced for clarity
alex,
celine,
...
};
1.
As the other answers and comments said, the expression "char *exStr[]" means "array of pointers to char".
a) How to read it
The best way to read it (as well as other more complex C declarations) is to start at the thing's name (in this case "exStr") and work your way towards the extremities of the declaration, like this:
First go right, adding each encountered meaningful symbol(s) to the meaning of the expression
Stop going right at the first closing parenthesis ")", or when the expression ends
When you cannot go right anymore, resume where you started and go left, again adding each encountered symbol to the meaning of the expression
Stop going left at the parenthesis "(" that corresponds to the ")" that stopped you on the right, or at the beginning of the expression
When you stopped going left at a "(" parenthesis and you haven't reached the beginning of the expression, resume going right immediately after the ")" that corresponds to it
Keep going right and left until you reach the margins of the expression both ways
In your case, you would go like this:
start at exStr: that's the variable's name
go right: []: exStr is an array
go right: stop: there's nothing there, the "=" sign stops us
go left: *: exStr is an array of pointers
go left: char: exStr is an array of pointers to char
go left: stop: we have reached the beginning of the declaration
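As an illustration of the same right-then-left reading (these declarations are made up, not from the question), moving the * inside parentheses changes the meaning completely:

```c
#include <stddef.h>

static char *names[6];   /* names: array of 6 pointers to char */
static char (*row)[6];   /* row: pointer to an array of 6 chars */
static char grid[3][6];  /* grid: array of 3 arrays of 6 chars */

/* first_row: function taking no arguments, returning a pointer
   to an array of 6 chars (here, the first row of grid). */
static char (*first_row(void))[6]
{
    row = grid;          /* grid converts to a pointer to its first row */
    return row;
}
```

Reading first_row with the rule: go right, (void) makes it a function taking no arguments; the ")" stops you, so go left, * makes it return a pointer; resume right after the ")", [6] makes that a pointer to an array of 6; finally go left, char.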
b) Why are you seeing that each array element occupies a different amount of bytes
When you have a value like "Zorro", it's a C string.
A C string is an array of bytes that starts at a given address in memory and ends with 0. In case of "Zorro" it will occupy 6 bytes: 5 for the string, the 6th for the 0.
You have several ways of creating C strings. A few common ones are:
A) use it literally, such as:
printf("Zorro");
B) use it literally but store it in a variable:
char *x = "Zorro";
C) allocate it dynamically and copy data into it
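// requires <stdio.h> for getchar() and <stdlib.h> for malloc()/free()
size_t n = 2;
char *p = malloc((n + 1) * sizeof(char)); // room for n characters plus the 0 terminator
if (p != NULL) {
    int c = getchar(); // read a character from standard input (getch() is non-standard)
    p[0] = (char)c;
    p[1] = 0;
    // do something with p...
    free(p);
}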
Whenever you put a string value like "Zorro" in your program, the compiler will reserve just enough memory to hold this string plus the 0 terminator. It will initialize this memory with whatever you provided inside "", add the 0, and secretly keep a pointer to that memory. You are not allowed to modify that memory: attempting to change the string is undefined behavior, and on many systems it crashes because the string lives in read-only memory.
In your code sample, it did this for every string that appeared in the exStr initialization.
That's why you see each array element occupying a different amount of memory. If you look more closely at your debugger output, you'll see that the compiler reserved memory for each string immediately after the preceding one, and each string occupies its length in bytes plus the 0 terminator. E.g. "Zorro" starts at 02f and occupies positions 02f - 034, which are 6 bytes (5 for Zorro and 1 for the 0 terminator). Then "Alex" starts at 035 and occupies 5 bytes: 035 - 039, and so on.
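The byte count can be checked with a short sketch. The helper literal_bytes is hypothetical; it only adds up strlen + 1 for each string and says nothing about where the compiler actually places them, since the standard does not require the literals to be stored next to each other.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: total bytes needed to store every string,
   counting each one's '\0' terminator. */
static size_t literal_bytes(const char *const strs[], size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strlen(strs[i]) + 1; /* characters plus the terminator */
    }
    return total;
}
```

For the six names in the question it returns 37 (6 + 5 + 7 + 5 + 7 + 7), which matches a contiguous layout like the one in the debugger.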
2.
To create a similar array programmatically:
If all you have are some static strings like in your example, then your code sample is good enough.
Otherwise if you plan to put dynamic values into your array (or if you plan to change the original strings in the program), you would do something like:
#define COUNT 5
char *strings[COUNT];
int i;
for (i = 0; i < COUNT; i++) {
    int n = 32; // or some other suitable maximum value, or even a computed value
    strings[i] = malloc((n + 1) * sizeof(char)); // needs <stdlib.h>
    if (strings[i] == NULL) {
        // handle the allocation failure (e.g. free what was allocated so far)
    }
    // put up to n arbitrary characters into strings[i], e.g. read from a file or from the console; don't forget to add the 0 terminator
}
// use strings...
// free memory for the strings when done
for (i = 0; i < COUNT; i++) {
    free(strings[i]);
}
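A more complete sketch of the dynamic approach, assuming the strings are copied from existing source strings rather than read from a file or the console. The names copy_string, make_strings and free_strings are hypothetical; copy_string does what POSIX strdup() does, using only standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies src into a new malloc'ed buffer,
   like POSIX strdup(). Returns NULL if the allocation fails. */
static char *copy_string(const char *src)
{
    size_t n = strlen(src) + 1;   /* characters plus the '\0' */
    char *dst = malloc(n);
    if (dst != NULL) {
        memcpy(dst, src, n);
    }
    return dst;
}

/* Hypothetical helper: frees the first count strings, then the array. */
static void free_strings(char **strs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(strs[i]);
    }
    free(strs);
}

/* Hypothetical helper: builds a heap array of count modifiable copies of
   src[0..count-1]. Returns NULL, after cleaning up, if anything fails. */
static char **make_strings(const char *const src[], size_t count)
{
    char **strs = malloc(count * sizeof *strs);
    if (strs == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        strs[i] = copy_string(src[i]);
        if (strs[i] == NULL) {
            free_strings(strs, i);   /* release only what was allocated */
            return NULL;
        }
    }
    return strs;
}
```

Each copy is modifiable (for example copy[0][0] = 'z'), unlike the literals it was copied from, and free_strings() must be called once when you are done.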
What exactly does the declaration "char *exStr[] = {......};" mean?
Here, exStr is an array of char pointers, initialized by the supplied brace-enclosed initializers.
To elaborate, the starting address of each string literal in the initializer list is stored in the corresponding element of the array.
Note that the strings the elements point to (being string literals) are not modifiable; the pointers themselves can be reassigned.
Regarding the second part,
How can I re-create the same variable's structure from my program.
is not clear. Can you elaborate on that part?
Just in case you meant how to access the elements: it's just like normal array access. As long as you stay within bounds, you're good to go.
For the second part of your question, I guess you are asking for an array of pointers to variable length strings.
It can be done like this:
#include <stdio.h>
#include <string.h> // for strcpy()

int main(void) {
    char* ma[2];                    // Array of char pointers for pointing to char-strings
    char* pZorro = "Zorro";         // String literal - not modifiable
    char AlexString[5] = {"Alex"};  // Char array - modifiable
    ma[0] = pZorro;                 // Make the first pointer point to the "Zorro" string
    ma[1] = AlexString;             // Make the second pointer point to the "Alex" string
    printf("%s\n", ma[0]);
    printf("%s\n", ma[1]);
    // strcpy(ma[0], "x");          // Undefined behavior (often a crash): can't change "Zorro"
    strcpy(ma[1], "x");             // OK to change "Alex"
    printf("%s\n", ma[0]);
    printf("%s\n", ma[1]);
    return 0;
}
The output will be:
Zorro
Alex
Zorro
x
So I want to know exactly how many ways there are to declare a string. I know similar questions have been asked several times, but I think my focus is different. As a beginner in C, I want to know which method of declaration is correct and preferable, so that I can stick to it.
We all know we can declare a string in the following two ways.
char str[] = "blablabla";
char *str = "blablabla";
After reading some Q&A on Stack Overflow, I was told that string literals are placed in the read-only part of memory. So in order to create a modifiable string, you need to create a character array of the length of the string + 1 on the stack and copy each character into the array.
So this is what the 1st line of code does.
However, what the 2nd line of code does is to create a character pointer and assign it the address of the 1st character located in the read-only part of memory. So this kind of declaration does not involve copying character by character.
Please let me know if I am wrong.
So far it seems quite understandable, but what really confuses me are the modifiers. For instance, some suggest
const char *str = "blablabla";
instead of
char *str = "blablabla";
because if we do something like
*(str + 1) = 'q';
It will cause undefined behavior which is horrible.
But some go even further and suggest something like
static const char *str = "blablabla";
and say this will place the string into static memory, which will never get modified, to avoid the undefined behavior.
So which is actually the "right" way to declare a string?
Besides, I am also interested in the scope when declaring a string.
For example,
(You can ignore the examples, both of them are buggy as pointed out by the others)
#include <stdio.h>
char **strPtr();
int main()
{
printf("%s", *strPtr());
}
char **strPtr()
{
char str[] = "blablabla";
char *charPtr = str;
char **strPtr = &charPtr;
return strPtr;
}
will print some garbage value.
But
#include <stdio.h>
char **strPtr();
int main()
{
printf("%s", *strPtr());
}
char **strPtr()
{
char *str = "blablabla";
/*As point out by other I am returning the address of a local variable*/
/*This causes undefined behavior*/
char **strPtr = &str;
return strPtr;
}
will work perfectly fine. (NO it doesn't, it is undefined behavior.)
I think I should leave it as another question.
This question is getting too long.
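One way to avoid the undefined behavior in both examples, sketched under the assumption that the caller only needs a char pointer (not a char **): return the pointer value itself, whose target has static storage duration, instead of the address of a local variable. The function names are hypothetical.

```c
/* Hypothetical: returns a pointer to a string literal. Literals have
   static storage duration, so the pointer stays valid after return. */
static const char *literal_str(void)
{
    return "blablabla";
}

/* Hypothetical: returns a pointer to a static array. It also outlives
   the call and, unlike a literal, its contents may be modified. */
static char *static_array_str(void)
{
    static char str[] = "blablabla";
    return str;
}
```

The question's versions return &charPtr or &str, the address of a local variable that no longer exists once the function returns; a string literal or a static array, by contrast, lives for the whole program.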
A lot of your confusion comes from a common misunderstanding about C and strings, one which is explicitly stated in the title of your question. The C language does not have a native string type, so in fact there are exactly zero ways to declare a string in C.
Spend some time reading "Does C Have a String Type?", which does a good job of explaining that.
This is evident from the fact that you can't (sensibly) do the following:
char *a, *b;
// code to point a and b at some "strings"
if (a == b)
{
// do something if strings compare equal
}
The if statement will compare the values of the pointers, not the contents of the memory they address. So, if a and b pointed to two different areas of memory, each containing identical data, the comparison a == b would fail. The only time the comparison would evaluate as "true" (i.e. something other than zero), would be if a and b held the same address (i.e. pointed to the same location in memory).
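A minimal sketch of the difference (the helper names same_address and same_contents are hypothetical): == compares the pointer values, while strcmp() from <string.h> compares the characters.

```c
#include <stdbool.h>
#include <string.h>

/* Pointer comparison: true only if both point to the same address. */
static bool same_address(const char *a, const char *b)
{
    return a == b;
}

/* Content comparison: true if both strings hold the same characters. */
static bool same_contents(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Two separate arrays that both hold "abc" compare unequal with same_address() but equal with same_contents().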
What C has is a convention, and some syntactic sugar to make life easier.
The convention is that a "string" is represented as a sequence of char terminated with the value zero (usually referred to as NUL and represented by the special character escape sequence '\0'). This convention comes from the API of the original standard library (back in the 70's) which provides a set of string primitives such as strcpy(). These primitives were so fundamental to doing anything truly useful in the language that life was made easier for programmers by adding syntactic sugar (this is all before the language escaped from the lab).
The syntactic sugar is the existence of "string literals": no more, and no less. In source code, any sequence of characters enclosed in double quotes is interpreted as a string literal, and the compiler produces a copy (in "read-only" memory) of the characters plus a terminating NUL byte to conform to the convention. Modern compilers detect duplicated literals and often produce only a single copy, but the standard leaves it unspecified whether identical literals share storage. Thus this:
assert("abc" == "abc");
may or may not raise an assertion, which reinforces the statement that C does not have a native string type. (For that matter, neither does C++ - it has a std::string class!)
With that out of the way, how do you use string literals to initialize a variable?
The first (and most common) form you will see is this
char *p = "ABC";
Here, the compiler sets aside 4 bytes (sizeof(char) is 1 by definition) in a "read only" section of the program and initializes them with [0x41, 0x42, 0x43, 0x00]. It then initializes p with the address of that array. Note that in C the type of this string literal is char[4], an array of plain (non-const) char, which is why the assignment compiles without any cast; nevertheless, attempting to modify the literal is undefined behavior. That is why you would normally be advised to write this as:
const char *p = "ABC";
Which is a "pointer to a constant char" - another way of saying "pointer to read only memory".
The next two forms use string literals to initialize arrays
char p1[] = "ABC";
char p2[3] = "ABC";
Note that there is a critical difference between the two. The first line creates a 4-byte array. The second creates a 3-byte array.
In the first case, as before, the compiler typically creates a 4-byte constant array containing [0x41, 0x42, 0x43, 0x00]. Note that it adds the trailing NUL to form a "C string". It then reserves four bytes of RAM (on the stack for a local variable, or in "static" memory for variables at file scope) and initializes them with a copy of that data (for a local variable, by code that runs each time the declaration is reached; a file-scope array is usually emitted already initialized). You are now free to modify elements of p1 at will.
In the second case, the compiler creates a 3-byte constant array containing [0x41, 0x42, 0x43]. Note that there is no trailing NUL. It then reserves 3 bytes of RAM (on the stack or in "static" memory, as before) and initializes them the same way. You are again free to modify elements of p2 at will.
The difference in sizes of the two arrays p1 and p2 is critical. The following code (if you ran it) would demonstrate it.
char p1[] = "ABC";
char p2[3] = "ABC";
printf ("p1 = %s\n", p1); // Will print out "p1 = ABC"
printf ("p2 = %s\n", p2); // Undefined behavior: might print something like "p2 = ABC!##$%^&*"
The output of the second printf is unpredictable, and could theoretically result in your code crashing. It tends to seem to work, simply because so much of RAM is filled with zeroes that eventually printf finds a terminating NUL.
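If you do need to measure or print an array like p2 that may lack a terminator, one option is to never look past its size. The helper bounded_len below is a hypothetical sketch using the standard memchr(); POSIX also provides strnlen() for the same purpose.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, examining at most
   cap bytes. Safe for arrays like p2 that have no '\0' terminator. */
static size_t bounded_len(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul != NULL ? (size_t)(nul - buf) : cap;
}
```

With it, printf("p2 = %.*s\n", (int)bounded_len(p2, sizeof p2), p2); prints exactly "p2 = ABC", because the precision stops printf after 3 bytes.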
Hope this helps.
I read that:
char a[] = "string";
is a: "string"
whereas
char *ptr = "string"
is ptr: [__] ---> "string"
I am a little confused. One thing I know is that pointers always store an address. In the case of a character pointer, what address does it store? What does this block represent (the block which I drew pointing to the string)? Is it the starting address of "string"?
And in the case of the array? How can I clearly differentiate between a char pointer and a char array?
Diagrams may help.
char *ptr = "string";
+-------+ +----------------------------+
| ptr |--------->| s | t | r | i | n | g | \0 |
+-------+ +----------------------------+
char a[] = "string";
+----------------------------+
| s | t | r | i | n | g | \0 |
+----------------------------+
Here, ptr is a variable that holds a pointer to some (constant) data. You can subsequently change the memory address that it points at by assigning a new value to ptr, such as ptr = "alternative";, but you cannot legitimately change the contents of the array holding "string": modifying a string literal is undefined behavior, the literal may well live in read-only memory, and trying to modify it may crash your program or otherwise break things unexpectedly.
By contrast, a is the constant address of the first byte of the 7 bytes of data that is initialized with the value "string". I've not shown any storage for the address because, unlike a pointer variable, there isn't a piece of changeable storage that holds the address. You cannot change the memory address that a points to; it always points to the same space. But you can change the contents of the array (for example, strcpy(a, "select");).
When you call a function, the difference disappears:
if (strcmp(ptr, a) == 0)
…string is equal to string…
The strcmp() function takes two pointers to constant char data (so it doesn't modify what it is given to scrutinize), and both ptr and a are passed as pointer values. There's a strong case for saying that only pointers are passed to functions — never arrays — even if the function is written using array notation.
Nevertheless, and this is crucial, arrays (outside of parameter lists) are not pointers. Amongst other reasons for asserting that:
sizeof(a) == 7
sizeof(ptr) == 8 (for 64-bit) or sizeof(ptr) == 4 (32-bit).
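These points can be checked with a short sketch (the function name demo_changes is hypothetical, and ptr is declared as const char * here, as other answers recommend): ptr can be re-pointed, a can be rewritten in place, and sizeof tells them apart.

```c
#include <string.h>

static char a[] = "string";          /* array: 7 bytes, contents modifiable */
static const char *ptr = "string";   /* pointer: sizeof(const char *) bytes */

/* Hypothetical demo: both changes below are legal. */
static void demo_changes(void)
{
    ptr = "alternative";   /* re-point ptr at a different literal */
    strcpy(a, "select");   /* "select" plus '\0' is 7 bytes: fits exactly */
}
```

The opposite operations, a = "alternative"; or writing through ptr, would be a compile error and undefined behavior respectively.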
In the case of a character pointer, what address does it store? What does this block represent (the block which I drew pointing to the string)? Is it the starting address of "string"?
This block represents a pointer-sized word (a WORD or DWORD, depending on the architecture); its content is a memory address, chosen by the compiler, linker and loader rather than by you. That memory address is the address of the first character of the string.
In practice, the difference is how much stack memory it uses.
For example, when programming microcontrollers, where very little memory is allocated for the stack, this makes a big difference.
char a[] = "string"; // the compiler puts {'s','t','r','i','n','g', 0} onto STACK
char *b = "string"; // the compiler puts just the pointer onto STACK
// and {'s','t','r','i','n','g',0} in static memory area.
Maybe this will help you understand.
assert(a[0] == 's'); // no error.
assert(b[0] == 's'); // no error.
assert(*b == 's'); // no error.
b++; // increment the memory address, so points to 't'
assert(*b == 's'); // assertion failed
assert(*b == 't'); // no error.
char a[] = "string"; declares an array of chars called a, sizes it to hold "string" plus its terminating '\0' (7 bytes), and initializes it with those characters.
char *a = "string"; creates an unnamed static array of chars somewhere in memory and assigns the address of the first element of this unnamed array to a.
In the first one, a is the array itself; no separate address is stored anywhere. So when we index something like a[4], this means 'take' the element 4 positions after the beginning of the object named a.
In the second, a[4] means 'take' the element 4 positions after the address that a points to.
And for your last question:
A char array is a 'block' of contiguous elements of type char. A char pointer is a reference to an element of the type char.
Due to pointer arithmetic, a pointer can be used to simulate (and access) an array.
Maybe these three links help make the difference clearer:
http://c-faq.com/decl/strlitinit.html
http://c-faq.com/aryptr/aryptr2.html
http://c-faq.com/aryptr/aryptrequiv.html
You may find it useful to think of:
char * a = "string";
as the same as:
char SomeHiddenNameYouWillNeverKnowOrSee[] = "string"; /* may be in ReadOnly memory! */
char * a = &SomeHiddenNameYouWillNeverKnowOrSee[0];
Have you ever tried to open an executable file with a text editor? It appears mostly as garbage, but in the middle of the garbage you can see some readable strings. These are all the literal strings defined in your program.
printf("my literal text");
char * c = "another literal text"; // should be const char *, see below
If your program contains the above code, you may be able to find "my literal text" and "another literal text" in the program's binary (it depends on the details of the binary format, but it often works). If you are a Linux/Unix user you can also use the strings command for that.
By the way, if you write the above code, C++ compilers will emit a warning (g++ says: warning: deprecated conversion from string constant to ‘char*’), because in C++ such strings are not of type char * but const char [] (const char array), which decays to const char * when assigned to a pointer.
C compilers can warn about this too, although in C a string literal technically has type char[], not const char[]; because the pattern above is so common, the warning is usually disabled. gcc does not even include it in -Wall; you have to enable it explicitly with -Wwrite-strings. The warning is then: warning: initialization discards ‘const’ qualifier from pointer target type.
It merely reminds that you are theoretically not allowed to change the literal texts through pointers.
The executable may load such strings into a read-only part of the data segment. If you try to change the content of such a string, it can raise a memory error. The compiler is also allowed to optimize literal storage, for instance by merging identical strings. The pointer just contains the address in (read-only) memory where the literal strings will be loaded.
On the other hand
char c[] = "string"; is mere syntactic sugar for char c[7] = {'s', 't', 'r', 'i', 'n', 'g', 0}; If you do sizeof(c) in your code it will be 7 bytes (the size of the array, not the size of a pointer). Inside a function, this is an array on the stack with an initializer. Internally the compiler can do whatever it likes to initialize the array: it can load character constants into the array one by one, or it can memcpy from some hidden string literal. The point is that your program has no way to tell the difference and find out where the data comes from. Only the result matters.
By the way a thing that is slightly confusing is that if you define some function parameter of the type char c[], then it won't be an array but an alternative syntax for char * c.
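A quick way to see this (param_size is a hypothetical name): inside the function, sizeof applied to the "array" parameter gives the size of a pointer. Compilers such as gcc may warn about exactly this, which is rather the point.

```c
#include <stddef.h>

/* Despite the array syntax, c is really a char * inside the function. */
static size_t param_size(char c[])
{
    return sizeof c;   /* size of a pointer, not of the caller's array */
}
```

Called with a 7-byte array, it still returns sizeof(char *), whereas sizeof on the array in the caller gives 7.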
In your example, ptr contains the address of the first char in the string.
As for the difference between a char array and a string, in C terms there is no difference other than the convention that what we call a "string" is a sequence of chars in an array terminated by a null character ('\0').
i.e. even if we have an array of char with 256 potential elements, if the first (0th) char is null (0) then the length of the string is 0.
Consider a variable str which is a char array of 5 chars, containing the string 'foo'.
*ptr => str[0] 'f'
str[1] 'o'
str[2] 'o'
str[3] \0
str[4] ..
A char *ptr to this array would reference the first element (index = 0) and the 4th element (index = 3) would be null, marking the end of the 'string'. The 5th element (index = 4) will be ignored by string handling routines which respect the null terminator.
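The str example above can be written out as a sketch (str and ptr follow the answer; big is a hypothetical extra) to show that strlen() counts up to the terminator while sizeof reports the whole array. Elements not covered by the initializer are zero-filled, so str[4] is '\0' too.

```c
#include <string.h>

/* From the answer: a 5-char array holding "foo". The initializer covers
   'f', 'o', 'o', '\0'; the remaining byte str[4] is zero-filled. */
static char str[5] = "foo";

/* Hypothetical: room for 255 characters, but the string is empty. */
static char big[256] = "";

/* Pointer to the first element, like *ptr in the answer. */
static const char *ptr = str;
```

strlen(str) is 3 while sizeof str is 5, and strlen(big) is 0 even though sizeof big is 256.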
If you are asking what a contains in each case then:
char a[] = "string";
// a is an array, not a pointer.
// In most expressions its name converts to the address of its first element.
char *a = "string";
// Here a really is a pointer, containing the address of the first element.
As rnrneverdies has explained in his answer, the difference is in where the elements are stored.
I'm learning C and today I got stuck on "strings" in C. Basically I understand that there is no such thing as a string type in C.
In C, strings are arrays of characters terminated with \0 at the end.
So far so good.
char *name = "David";
char name[] = "David";
char name[5] = "David";
This is where the confusion starts: three different ways to declare "strings". Can you give me simple examples of the situations in which to use each one? I've read a lot of tutorials on the web but still can't get the idea.
I read the "How to declare strings in C" question on Stack Overflow but still can't see the difference.
The first one, char *name = "David";, points to a string literal, which typically resides in a read-only section of memory. You can't modify it. Better to write
const char *name = "David";
The second one, char name[] = "David";, is an array of 6 chars including '\0'. It can be modified.
char name[5] = "David"; is accepted by a C compiler (C++ rejects it), but "David" needs 6 chars including the terminating '\0', so the array ends up holding 'D', 'a', 'v', 'i', 'd' with no terminator, and passing it to string functions such as strlen() or printf("%s") invokes undefined behavior. You need an array of 6 chars to store the whole string.
char name[6] = "David";
Further reading: C-FAQ 6. Arrays and Pointers.
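A small sketch of the difference (name5, name6 and print_bounded are hypothetical names): name5 compiles in C but has no terminator, so it must only be handled by code that is told its size.

```c
#include <stdio.h>
#include <string.h>

static char name5[5] = "David";  /* valid C, but no room for the '\0' */
static char name6[6] = "David";  /* 'D','a','v','i','d','\0' */

/* Prints an array that may lack a terminator: the precision limits how
   many bytes printf reads, so name5 is handled safely. */
static void print_bounded(const char *buf, size_t cap)
{
    printf("%.*s\n", (int)cap, buf);
}
```

print_bounded(name5, sizeof name5) prints David without reading past the array, whereas printf("%s\n", name5) would not be safe.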
This link provides a pretty good explanation.
char[] refers to an array, char* refers to a pointer, and they are not the same thing.
char a[] = "hello"; // array
char *p = "world"; // pointer
According to the standard, Annex J.2/1, it is undefined behavior when:
—The program attempts to modify a string literal (6.4.5).
6.4.5/5 says:
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.
Therefore you actually need an array of six elements to account for the NUL character.
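In the first example, you declare a pointer to a string literal:
// A variable pointer to a string literal (an array of 6 chars that must not be modified).
char *pName = "David";
Because the pointer is not const, the compiler will accept code that modifies the 6 bytes occupied by 'D', 'a', 'v', 'i', 'd', '\0', but every one of these writes is undefined behavior (on many systems the literal is in read-only memory and the program crashes):
pName[0] = 'c';        // UB
*pName = 'c';          // UB
*(pName+0) = 'c';      // UB
strcpy(pName, "Eric"); // UB, even though "Eric" fits
Writing beyond those 6 bytes would be a second bug on top of that:
// BUG: would also overwrite 2 bytes located after the \0.
strcpy(pName, "Fredrik");
The pointer itself can be changed at run time to point to another string, e.g.
pName = "Charlie Chaplin";
That string is also a literal, so the same writes are still undefined behavior:
pName[0] = 'c';   // UB
*pName = 'c';     // UB
*(pName+0) = 'c'; // UB
// "Fredrik" fits in the 16 bytes of the "Charlie Chaplin" literal,
// but writing into a literal is still UB:
strcpy(pName, "Fredrik");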
As stated by others, you would normally use const char * in the pointer cases, which is also the preferred way to refer to a string literal. The reason is that the compiler will protect you from the most common (and hard-to-find) memory-trashing bugs:
// A variable pointer to a constant string (i.e. an array of 6 constant bytes).
const char *pName = "David";
// Pointer can be altered runtime to point to another string e.g.
pName = "Charlie";
// But the compiler will complain (with a warning or an error) if you try to change the string
// using any of the normal ways:
pName[0] = 'c'; // BUG
*pName = 'c'; // BUG
*(pName+0) = 'c'; // BUG
strcpy(pName, "Eric");// BUG
The other way, using an array, gives less flexibility:
char aName[] = "David"; // aName is now an array in RAM.
// You can still modify the array using the normal ways:
aName[0] = 'd';
*aName = 'd';
*(aName+0) = 'd';
strcpy(aName, "Eric"); // OK
// But you cannot make it refer to a larger or a different buffer:
aName = "Charlie"; // BUG: This is not possible.
Similarly, a constant array helps you even more:
const char aName[] = "David"; // aName is now a constant array.
// The compiler will prevent modification of it:
aName[0] = 'd'; // BUG
*aName = 'd'; // BUG
*(aName+0) = 'd'; // BUG
strcpy(aName, "Eric");// BUG
// And you cannot of course change it this way either:
aName = "Charlie"; // BUG: This is not possible.
The major difference between using the pointer vs array declaration is the value returned by sizeof(): sizeof(pName) is the size of a pointer, typically 4 or 8 bytes (32-bit or 64-bit). sizeof(aName) returns the size of the array, i.e. the length of the string + 1.
It matters most if the variable is declared inside a function, especially if the string is long: the array occupies more of the precious stack. Thus, the array declaration is often avoided in that case.
It also matters when passing the variable to macros that use sizeof(). Such macros must be supplied with the intended type.
It also matters if you want to, e.g., swap the strings. Strings declared as pointers are straightforward to swap and require the CPU to access fewer bytes: you simply move the pointers (typically 4 or 8 bytes each) around:
const char *pCharlie = "Charlie";
const char *pDavid = "David";
const char *pTmp;
pTmp = pCharlie;
pCharlie = pDavid;
pDavid = pTmp;
pCharlie is now "David", and pDavid is now "Charlie".
Using arrays, you must provide a temporary storage large enough for the largest string, and use strcpy(), which takes more CPU, copying byte for byte in the strings.
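Both approaches, sketched side by side. The names swap_ptrs and swap_arrays and the 64-byte TMP_CAP limit are assumptions made for this example; swap_arrays assumes both arrays have the same capacity cap and each holds a properly terminated string.

```c
#include <string.h>

/* Swaps two pointers: only the addresses move, no characters are copied. */
static void swap_ptrs(const char **x, const char **y)
{
    const char *tmp = *x;
    *x = *y;
    *y = tmp;
}

#define TMP_CAP 64  /* hypothetical upper limit for the temporary buffer */

/* Swaps the text held in two char arrays of capacity cap each, through a
   temporary buffer. Every character is copied, unlike swap_ptrs. */
static int swap_arrays(char *x, char *y, size_t cap)
{
    char tmp[TMP_CAP];
    if (cap > TMP_CAP) {
        return -1;   /* temporary buffer too small */
    }
    strcpy(tmp, x);
    strcpy(x, y);
    strcpy(y, tmp);
    return 0;
}
```

swap_ptrs(&pCharlie, &pDavid) does the same job as the three assignments above, while swap_arrays needs the extra buffer and three full string copies.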
The last method is rarely used, since the compiler automatically calculates that David needs 6 bytes. No need to tell it what's obvious.
char aName[6] = "David";
But, it is sometimes used in cases where the array MUST be a fixed length, independent of its contents, e.g. in binary protocols or files. In that case, it can be of benefit to manually add the limit, in order to get help from the compiler, should anyone by accident add or remove a character from the string in the future.
What is the difference between
char str1[32] = "\0";
and
char str2[32] = "";
Since you already declared the sizes, the two declarations have exactly the same effect. However, if you do not specify the sizes, you can see that the first declaration makes a larger array:
char a[] = "a\0";
char b[] = "a";
printf("%zu %zu\n", sizeof(a), sizeof(b)); // %zu is the printf format for size_t
prints
3 2
This is because a ends with two nulls (the explicit one and the implicit one) while b ends only with the implicit one.
Well, assuming the two cases are as follows (to avoid compiler errors):
char str1[32] = "\0";
char str2[32] = "";
As people have stated, str1 is initialized with two null characters:
char str1[32] = {'\0','\0'};
char str2[32] = {'\0'};
However, according to both the C and C++ standard, if part of an array is initialized, then remaining elements of the array are default initialized. For a character array, the remaining characters are all zero initialized (i.e. null characters), so the arrays are really initialized as:
char str1[32] = {'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0'};
char str2[32] = {'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0',
'\0','\0','\0','\0','\0','\0','\0','\0'};
So, in the end, there really is no difference between the two.
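This can be checked directly. The sketch below (all_zero and both_fully_zeroed are hypothetical names) declares both arrays as local variables, where no implicit zeroing of static storage could hide the effect, and verifies that all 32 bytes of each are zero.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if every byte of buf[0..n-1] is zero. */
static bool all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0') {
            return false;
        }
    }
    return true;
}

/* Local (automatic) arrays: the elements not covered by the initializer
   are still zero-initialized, as the answer above describes. */
static bool both_fully_zeroed(void)
{
    char str1[32] = "\0";
    char str2[32] = "";
    return all_zero(str1, sizeof str1) && all_zero(str2, sizeof str2);
}
```

The two initializers only produce different results when the size is left to the compiler, as in char a[] = "a\0"; versus char b[] = "a";.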
As others have pointed out, "" implies one terminating '\0' character, so "\0" actually initializes the array with two null characters.
Some other answerers have implied that this is "the same", but that isn't quite right. There may be no practical difference, as long as the only way the array is used is to reference it as a C string beginning with the first character. But note that they do indeed result in two different memory initializations; in particular, they differ in whether str2[1] is definitely zero, or is uninitialized (and could be anything, depending on compiler, OS, and other random factors). There are some uses of the array (perhaps not useful, but still) that would have different behaviors.
Unless I'm mistaken, the first will initialize 2 chars to 0 (the '\0' and the terminator that's always there) and leave the rest untouched, and the last will initialize only 1 char (the terminator).