How do strings and bits work in C?

I have a two-part question:
Understand output from sizeof
Understand how strings are stored in variables (e.g. bits and ram)
Question 1
I'm trying to understand the output from the following piece of C code.
printf("a: %zu\n", sizeof("a")); // 2   (%zu is the conversion for size_t)
printf("abc: %zu\n", sizeof("abc")); // 4
It always seems to be one larger than the actual number of characters specified.
The docs suggest that the returned value represents the size of the object (in this case a string) in bytes. So if the size of a gives us back 2 bytes, then I'm curious how a represents 16 bits of information.
If I look at the binary representation of the ASCII character a I can see it is 01100001. But that's only showing 3 bits out of 1 byte being used.
Question 2
Also, how do large strings get stored into a variable in C? Am I right in thinking that they have to be stored within an array, like so:
char my_string[5] = "hello";
Interestingly when I have some code like:
char my_string = "hello";
printf("my_string: %s\n", my_string);
I get two compiler errors:
- incompatible pointer to integer conversion initializing 'char' with an expression of type 'char [6]'
- format specifies type 'char *' but the argument has type 'char'
...which I don't understand. Firstly, it states the type is presumed to have a size of [6] when there are only 5 characters. Secondly, the mention of a pointer here seems odd to me. Why does printf expect a pointer, and why does not specifying the length of the variable/array result in a pointer-to-integer error?
By the way, I can seemingly set the length of the array to 5 rather than 6, and it works as I'd expect: char my_string[5] = "hello";
I'm probably just missing something very basic/fundamental about how bits and strings work in C.
Any help understanding this would be appreciated.

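The first part of the question is due to the way strings are stored in C. Strings in C are nothing more than a series of characters (char) with a \0 added at the end, which is why you see a +1 when you use sizeof. Notice that in your second part, if you were to write char my_string[4] = "hello";, the compiler would complain that the array is too small for the string (GCC, for example, reports this as a warning by default). That's related to the same thing. char my_string[5] = "hello"; is accepted only because C lets the terminator be dropped when the array has room for exactly the characters; the result is then not a proper null-terminated string.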
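A minimal sketch of the size difference described above, comparing sizeof with strlen and covering the exact-fit char my_string[5] case from the question. The helper function names are hypothetical, chosen only so each value can be checked.

```c
#include <stddef.h>
#include <string.h>

/* sizeof a string literal counts every byte of the array,
   including the terminating '\0'. */
size_t size_of_a(void)   { return sizeof "a"; }    /* 2: 'a', '\0' */
size_t size_of_abc(void) { return sizeof "abc"; }  /* 4: 'a', 'b', 'c', '\0' */

/* strlen counts only the characters before the terminator. */
size_t length_of_abc(void) { return strlen("abc"); }  /* 3 */

/* C accepts an array with room for the characters but not the
   terminator: the '\0' is simply dropped. This array is NOT a
   string, so it must not be passed to strlen or printed with %s. */
size_t exact_fit_size(void)
{
    char exact[5] = "hello";
    return sizeof exact;   /* 5 */
}
```

To print these values, use the %zu conversion, e.g. printf("%zu\n", size_of_abc());.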
Now onto the second part: strings themselves are a series of characters. However, you don't store every character by itself in a separate variable. Instead, you have a pointer to this series of characters that lets you access them from some part of memory. Additional information regarding pointers and strings in C can be found here: Pointer to a String in C
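To make "accessing them through a pointer" concrete, here is a small sketch: my_strlen is a hypothetical name for a hand-rolled version of strlen that advances a const char * until it reaches the '\0' terminator. It works the same whether the caller passes a string literal or a char array, since both are converted to a pointer to the first character.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen: walk the pointer one
   char at a time until the '\0' terminator is found. */
size_t my_strlen(const char *p)
{
    const char *start = p;
    while (*p != '\0')
        p++;
    return (size_t)(p - start);   /* number of chars before '\0' */
}
```

For example, with char buf[] = "abc";, my_strlen(buf) receives a pointer to buf[0] and returns 3.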

In C, a string is a sequence of character values followed by a zero-valued terminator. For example, the string "hello" is the sequence of character values {'h', 'e', 'l', 'l', 'o', 0} (note 1). Strings (including string literals) are stored as arrays of char (or wchar_t for wide-character strings). To account for the terminator, the size of the array must always be one greater than the number of characters in the string:
char greeting[6] = "hello";
The storage for greeting will look like
          +---+
greeting: |'h'| greeting[0]
          +---+
          |'e'| greeting[1]
          +---+
          |'l'| greeting[2]
          +---+
          |'l'| greeting[3]
          +---+
          |'o'| greeting[4]
          +---+
          | 0 | greeting[5]
          +---+
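As a quick check of the diagram, a sketch (hypothetical helper names) that reads individual elements of an array declared exactly like greeting, confirming that index 5 holds the 0 terminator and that the array occupies six bytes:

```c
#include <stddef.h>

/* Returns greeting[i] for an array declared as in the diagram.
   Hypothetical helper; i must be in the range 0..5. */
int greeting_byte(size_t i)
{
    char greeting[6] = "hello";   /* 5 characters + 0 terminator */
    return greeting[i];
}

/* The whole array occupies six bytes. */
size_t greeting_size(void)
{
    char greeting[6] = "hello";
    return sizeof greeting;
}
```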
Storage for a string literal is largely the same (note 2):
         +---+
"hello": |'h'| "hello"[0]
         +---+
         |'e'| "hello"[1]
         +---+
         |'l'| "hello"[2]
         +---+
         |'l'| "hello"[3]
         +---+
         |'o'| "hello"[4]
         +---+
         | 0 | "hello"[5]
         +---+
Yes, you can apply the subscript operator [] to a string literal just like any other array expression.
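For example, using only standard C, both of these hypothetical helpers read the second character of the literal, once with [] and once with the equivalent pointer arithmetic. Reading is fine; writing through either form would attempt to modify the literal.

```c
/* Subscripting a string literal works like subscripting any
   other array expression. */
char second_char_of_hello(void)
{
    return "hello"[1];          /* 'e' */
}

/* Equivalent pointer form: the literal decays to a pointer to 'h',
   and adding 1 moves to the next char. */
char second_char_via_pointer(void)
{
    return *("hello" + 1);      /* also 'e' */
}
```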
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. So, the string literal "hello" is an expression of type "6-element array of char". If I pass that literal as an argument to a function like
printf( "%s\n", "hello" );
then both of the string literal expressions "%s\n" and "hello" are converted from "4-element array of char" (note 3) and "6-element array of char" to "pointer to char", so what printf receives are pointer values, not array values.
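One way to observe the conversion is to compare sizeof in the caller and in the callee. In this sketch (function names are hypothetical), the callee only receives a pointer, so sizeof reports the size of a char pointer, while sizeof applied to the literal itself in the caller still sees the whole 6-byte array.

```c
#include <stddef.h>

/* The callee only ever receives a pointer, so sizeof reports the
   size of a char pointer, not the size of the caller's array. */
size_t param_size(const char *s)
{
    return sizeof s;
}

/* "hello" is a 6-element array of char, but as a function argument
   it decays to a pointer to its first element. */
size_t size_seen_by_callee(void)
{
    return param_size("hello");
}

/* As the operand of sizeof, the literal is NOT converted. */
size_t size_seen_by_caller(void)
{
    return sizeof "hello";   /* 6 */
}
```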
You've already seen two exceptions to the conversion rule. You saw the first one in your code when you used the sizeof operator and got a value one more than you expected. sizeof evaluates to the number of bytes required to store the operand. Because of the zero terminator, it takes N+1 bytes to store an N-character string.
The second exception is the declaration of the greeting array above; since I'm using the string literal to initialize the array, the literal is not converted to a pointer value first. Note that you can write that declaration as
char greeting[] = "hello";
In that case, the size of the array is taken from the size of the initializer.
The third exception occurs when the array expression is the operand of the unary & operator. Instead of evaluating to a pointer to a pointer to char (char **), the expression &greeting evaluates to type "pointer to 6-element array of char", or char (*)[6].
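A short sketch of the third exception, assuming a C99 or later compiler: p has type char (*)[6], so *p is the whole array again (sizeof *p is 6), and (*p)[1] indexes into it. The function names are hypothetical.

```c
#include <stddef.h>

/* &greeting is a pointer to the whole 6-element array, not a char **. */
size_t size_through_array_pointer(void)
{
    char greeting[] = "hello";      /* size 6 taken from the initializer */
    char (*p)[6] = &greeting;
    return sizeof *p;               /* 6: *p designates the whole array */
}

char element_through_array_pointer(void)
{
    char greeting[] = "hello";
    char (*p)[6] = &greeting;
    return (*p)[1];                 /* 'e' */
}
```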
The length of a string is the number of characters before the zero terminator. All the standard library functions that deal with strings expect to see that terminator. The size of the array to store that string must be at least one greater than the maximum length of the string you intend to store.
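To illustrate the difference between the length of a string and the size of the array holding it, a minimal sketch with hypothetical function names: the array has 16 elements, but the string stored in it has length 2, and the unused elements are zero-filled by the initializer.

```c
#include <stddef.h>
#include <string.h>

/* A 16-byte array can hold any string of up to 15 characters. */
size_t capacity_demo(void)
{
    char buf[16] = "hi";   /* elements 2..15 are set to 0 */
    return sizeof buf;     /* 16: size of the array */
}

size_t length_demo(void)
{
    char buf[16] = "hi";
    return strlen(buf);    /* 2: characters before the first '\0' */
}
```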
Note 1: Sometimes you'll see people write '\0' instead of a naked 0 to represent a string terminator; they mean the same thing.
Note 2: Storage for string literals is allocated at program startup and held until the program terminates. String literals may be stored in a read-only memory segment; attempting to modify the contents of a string literal results in undefined behavior.
Note 3: '\n' counts as a single character, so "%s\n" consists of '%', 's', '\n' and the terminator.
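Building on the point that string literals may live in read-only memory, a sketch of the safe way to get a modifiable string: initialize an array from the literal, which gives the array its own copy, and modify that copy. Declaring the pointer to the literal as const char * is a common habit (not a language requirement) that turns an accidental write into a compile-time error. The function names are hypothetical.

```c
#include <string.h>

/* Writes "jello" into out, which must hold at least 6 bytes. */
void make_jello(char out[6])
{
    char copy[] = "hello";       /* the array holds its own copy */
    copy[0] = 'j';               /* OK: modifies the array, not the literal */
    strcpy(out, copy);
}

/* Reading through a pointer to the literal is fine; with const,
   lit[0] = 'j' would be rejected by the compiler. */
char first_of_literal(void)
{
    const char *lit = "hello";
    return lit[0];
}
```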

Related

Why do char functions require becoming a pointer when returning a string?

I am a beginner C programmer and I am having difficulties grasping the concept of pointers. My question is why does the program require char *lowercase to run normally and why, if I remove the *, it breaks the program?
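#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated lowercase copy of a; the caller must free() it. */
char *lowercase(char a[]){
    int c = 0;
    /* strlen(a) + 1 leaves room for the '\0'; a fixed malloc(300)
       would overflow for longer input. */
    char *lowercase_string = malloc(strlen(a) + 1);
    if (lowercase_string == NULL)
        return NULL;
    strcpy(lowercase_string, a);
    while (lowercase_string[c] != '\0') {
        /* assumes ASCII, where 'a' - 'A' == 32 */
        if (lowercase_string[c] >= 'A' && lowercase_string[c] <= 'Z') {
            lowercase_string[c] = lowercase_string[c] + 32;
        }
        c++;
    }
    return lowercase_string;
}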
In C, strings are just contiguous chunks of characters, ending with a null byte (the character with a value of 0, denoted by '\0' or '\x00'). To represent a string, a pointer to the first element is used, which is what the star after char means. When you return a pointer, you return where the string is, and callers can use that information to get the whole string just by iterating (adding to the pointer and looking past it) until they find a null byte. If you just return char, you can only return one character of the string, which is nowhere close to the full string.
As written in previous answers, the char pointer points to the first letter of the string, which gives you the ability to scan the whole string (moving 1 char at a time).
Additionally, since your string is allocated on the heap, it must be accessed through a pointer, as the allocated block of memory is nameless.
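For comparison, a sketch of the same idea written slightly differently: lower_dup (a hypothetical name) sizes the allocation from the input instead of using a fixed 300 bytes, and uses tolower from <ctype.h> in place of the +32 arithmetic, which is a swapped-in technique rather than the asker's. The returned char * is the only handle to the nameless heap block, so the caller must free it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated lowercase copy of s, or NULL if malloc fails. */
char *lower_dup(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);              /* +1 for the '\0' */
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i <= n; i++)         /* <= n also copies the '\0' */
        out[i] = (char)tolower((unsigned char)s[i]);
    return out;
}
```

Usage: char *p = lower_dup("Hello World"); then use p as a string and call free(p) when done.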
The root of the issue is how C treats arrays (not just arrays of character type, but any type).
Arrays are not pointers - they are contiguous sequences of objects of some type. No pointer is part of the array object itself. When you declare an array like int arr[5];, you get something like this in memory:
     +---+
arr: |   | arr[0]
     +---+
     |   | arr[1]
     +---+
      ...
     +---+
     |   | arr[4]
     +---+
However, unless it is the operand of the sizeof or unary & operators or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array.
When you pass an array argument to a function, what the function actually receives is a pointer to the first element. When you attempt to return an array from a function, that array expression will be converted to a pointer (more on this later). In fact, you cannot declare a function to return an array type - something like
int foo( void )[10];
is not allowed. You can return pointers to arrays:
int (*foo( void ))[10];
but not arrays directly. This is why the *alloc functions return pointers instead of array types.
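A sketch of the int (*foo( void ))[10] form, assuming the returned array should outlive the call: here it has static storage duration. The contents are illustrative only.

```c
/* foo returns a pointer to an array of 10 ints (type int (*)[10]).
   data is static, so the pointer stays valid after foo returns. */
int (*foo(void))[10]
{
    static int data[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    return &data;
}
```

A caller indexes through the returned pointer with (*foo())[3], and sizeof *foo() is the size of the whole 10-int array.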
This is also why returning non-static local arrays from a function is a problem - you’re not returning the value of the array (that is, a copy of the array’s contents), you’re returning its address. After the function returns, though, that array no longer exists and that pointer value is invalid. That’s why you need to use malloc to allocate storage that will hang around after the function returns.
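Since the problem is specific to non-static local arrays, a minimal sketch (hypothetical name) of the static alternative, with the broken version shown only in a comment. One caveat: every call returns the same buffer, so this is not a general replacement for malloc.

```c
/* Returning a pointer to a static local array is safe because the
   array exists for the whole run of the program. */
const char *greeting_static(void)
{
    static char buf[] = "hello";
    return buf;
}

/* Broken version, for contrast (do not do this):
   char *greeting_broken(void) { char buf[] = "hello"; return buf; }
   buf is destroyed when the function returns, so the pointer dangles. */
```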
Strings are sequences of character values including a zero-valued terminator. The string "hello" is represented by the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings (including string literals) are stored in arrays of character type (char for ASCII, EBCDIC, and UTF-8 encodings, wchar_t for "wide" encodings like UTF-16).
So, that’s all background. In your specific case, you need to declare lowercase as char * because it is returning a value of that type (lowercase_string). It breaks when you leave the * off because you’re telling the rest of the program it returns a single character value, not a pointer, and the rest of the program is expecting it to return a pointer.
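Your function returns a pointer, so you must declare its return type as char * (and store the result in a char * when you call it).
And in C, a char array is converted to a pointer to its first char in most expressions, which is why arrays and char pointers often seem interchangeable.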

Does printf only allow taking a string literal as the first argument?

According to the definition of printf, the first argument should be an array, i.e. char *, followed by an ellipsis (...), i.e. variable arguments. If I write:
printf(3+"helloWorld"); //Output is "loWorld"
According to the definition shouldn't it give an error?
Here is the definition of printf:
#include <libioP.h>
#include <stdarg.h>
#include <stdio.h>
#undef printf
/* Write formatted output to stdout from the format string FORMAT. */
/* VARARGS1 */
int __printf(const char *format, ...) {
    va_list arg;
    int done;
    va_start(arg, format);
    done = vfprintf(stdout, format, arg);
    va_end (arg);
    return done;
}
#undef _IO_printf
ldbl_strong_alias(__printf, printf);
/* This is for libg++. */
ldbl_strong_alias(__printf, _IO_printf);
This is not an error.
If you pass "helloWorld" to printf, the string literal is converted to a pointer to the first character.
If you pass 3+"helloWorld", you're adding 3 to a pointer to the first character, which results in a pointer to the 4th character. This is still a valid pointer to a string, it's just not the whole string that was defined.
3+"helloWorld" is of char * type (after conversion in call to printf). In C, the type of a string literal is char []. When passed as an argument to a function, char [] will convert to pointer to its first element (array to pointer conversion rule). Therefore, "helloWorld" will be converted to a pointer to the element h and 3+"helloWorld" will move the pointer to the 4th element of the array "helloWorld".
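A small sketch of the same expression with a hypothetical helper that captures the result in a buffer via snprintf, so it can be checked. Passing the pointer as the argument to "%s", rather than as the format itself, is a common precaution in case the text ever contains a % character.

```c
#include <stdio.h>

/* Writes the text starting 3 chars into "helloWorld" into out. */
void skip_three(char *out, size_t size)
{
    const char *p = 3 + "helloWorld";   /* points at index 3, the second 'l' */
    snprintf(out, size, "%s", p);       /* copies "loWorld" */
}
```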
From Pointer Arithmetic:
If the pointer P points at an element of an array with index I, then
P+N and N+P are pointers that point at an element of the same array with index I+N
P-N is a pointer that points at an element of the same array with index I-N
The behavior is defined only if both the original pointer and the result pointer are pointing at elements of the same array or one past the end of that array. ....
The type of string literal is char[N], where N is size of string (including null terminator).
From C Standard#6.3.2.1p3 [emphasis mine]
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
So, in the expression
3+"helloWorld"
"helloWorld", which is of type char [11] (array of characters), is converted to a pointer to char that points to the initial element of the array object.
Which means, the expression is:
3 + P
where P is pointer to initial element of "helloWorld" string
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
  ^
  |
  P (pointer pointing to initial element of array)
When 3 gets added to pointer P, the resulting pointer will be pointing to the 4th character:
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
              ^
              |
              P (after adding 3, the resulting pointer points to the 4th element of the array)
The resulting pointer of the expression 3+"helloWorld" is passed to printf(). Note that the first parameter of printf() is not an array but a pointer to a null-terminated string, and the expression 3+"helloWorld" results in a pointer to the 4th element of the "helloWorld" string. Hence, you get the output "loWorld".
The answers from dbush and haccks are concise and illuminating, and I upvoted the one from haccks in support of the bounty offered by dbush.
The only thing that I find left unsaid is that the way the question title is phrased makes me wonder if the OP would think that e.g. this also should produce an error:
char sometext[] = {'h', 'e', 'l', 'l', 'o', '\0'};
printf (sometext);
since there is no string literal involved at all. The OP needs to understand that one should never think that a function call that takes a char * argument can "only allow taking a string literal as [that] argument".
The answers from dbush and haccks hint at this by mentioning the conversion of a string literal to a char * (and how adding an integer to that evaluates), but I feel that it's worth pointing out explicitly that anything that is treated as a char * can be used, even things not converted from a string literal.
printf(3+"helloWorld"); //Output is "loWorld"
It will not give an error because a string in C is an array of characters, and the name of an array (here, the literal) gives the base address of the array.
In the case of printf(3+"helloWorld");, 3+"helloWorld" gives the address of the fourth element of the array of characters, i.e. of the string.
This is still a valid pointer to a string, i.e. a char *.
printf only requires a char * (formally const char *) as its first argument; it does not have to be a string literal.
The first argument to printf is declared as const char *format. It means printf should be passed a pointer to char and the characters pointed to by this pointer will not be changed by printf. There are additional constraints on this first argument:
it should point to a proper C string, that is an array of characters terminated by a null byte.
it may contain conversion specifiers, which must be properly constructed and the corresponding arguments must be passed as extra arguments to printf, with the expected types and order as derived from the format string.
Passing a string constant such as "helloWorld" as the format argument is the most common way to invoke printf. String constants are arrays of char terminated by a null byte which should not be modified by the program. Passing them to functions expecting a pointer to char will cause a pointer to their first byte to be passed, as is the case for all arrays in C.
The expression "helloWorld" + 3 or 3 + "helloWorld" evaluates to a pointer to the 4th byte of the string. It is equivalent to the expression 3 + &("helloWorld"[0]), &("helloWorld"[3]) or simply &"helloWorld"[3]. As a matter of fact, it is also equivalent to &3["helloWorld"], but this last form is only used for pathological obfuscation.
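A minimal sketch checking those equivalences. It uses a single pointer variable s so that every expression refers to the same array; two separate, identical string literals are not guaranteed to share storage, so comparing addresses derived from them could give either result.

```c
/* Returns nonzero if all four spellings give the same address. */
int all_forms_equal(void)
{
    const char *s = "helloWorld";
    const char *a = 3 + s;
    const char *b = s + 3;
    const char *c = &s[3];
    const char *d = &3[s];     /* legal, since s[3] means *(s + 3) */
    return a == b && b == c && c == d && *a == 'l';
}
```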
printf does not use the bytes that precede the format argument, so passing 3 + "helloWorld" is equivalent to passing "loWorld" and produces the same output.
To be more precise, the first argument to the printf() function is not an array of chars but a pointer to char (pointing into an array of chars). The difference between them is like the difference between byval and byref in the VB world.
A pointer can be incremented and decremented (using ++ and --) or changed with arithmetic operations (+ and -).
In your case, you are passing a pointer to "helloWorld" incremented by three, so it points to the fourth element of the char array "helloWorld".
Let's simplify this in asm pseudo code:
MOV EAX, offset ("helloWorld")
ADD EAX, 3
PUSH EAX
CALL printf
You may think that 3+"helloWorld" concatenates 3 and "helloWorld", but in C concatenation is done differently. The simplest way to do it is sprintf(buff, "%d%s", 3, "helloWorld");, where buff is a char array large enough for the result.
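As a sketch of that suggestion, here using snprintf, the bounded variant of sprintf, so the buffer cannot overflow; the function name and buffer size are assumptions for illustration. snprintf returns the number of characters the full result needs, excluding the terminator.

```c
#include <stdio.h>

/* Builds "3helloWorld" in out; at most size bytes are written,
   including the terminating '\0'. */
int concat_three(char *out, size_t size)
{
    return snprintf(out, size, "%d%s", 3, "helloWorld");
}
```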
It's pretty strange, but not wrong. By doing 3 + you are moving your pointer to a different location.
The same thing works when you initialize a char *:
char *str1 = "Hello";
char *str2 = 2 + str1;
str2 now points to "llo".

What is the difference between char*str={"foo",...} and char str[][5]={"foo",...} array definitions?

Case 1: When I write
char*str={"what","is","this"};
then str[i]="newstring"; is valid whereas str[i][j]='j'; is invalid.
Case 2: When I write
char str[][5]={"what","is","this"};
then str[i]="newstring"; is not valid whereas str[i][j]='J'; is valid.
Why is it so? I am a beginner, and I got very confused after reading the other answers.
First of all, a suggestion: please read up on why arrays are not pointers, and vice versa!
That said, to shed light on this particular scenario:
In the first case,
char*str={"what","is","this"};
does not do what you think it does. It is a constraint violation, requiring a diagnostic from any conforming C implementation, as per §6.7.9/P2:
No initializer shall attempt to provide a value for an object not contained within the entity
being initialized.
If you enable warnings, you'd (at least) see
warning: excess elements in scalar initializer
char*str={"what","is","this"};
However, a compiler running in strict conformance mode is free to refuse to compile the code. If the compiler chooses to compile it and produce a binary anyway, the behavior is outside the scope of the C language definition; it's up to the compiler implementation (and thus can vary widely).
In this case, the compiler chose to treat this statement as functionally the same as char *str = "what";
So, here str is a pointer to a char, which points to a string literal.
You can re-assign to the pointer,
str="newstring"; //this is valid
but, a statement like
str[i]="newstring";
would be invalid, as here a pointer value is being converted and stored into a char, and the types are not compatible. The compiler should issue a warning about the invalid conversion in this case.
Thereafter, a statement like
str[i][j]='J'; // compiler error
is invalid (a constraint violation rather than a syntax error), as you're using the array subscripting [] operator on something which is not of "pointer to complete object type":
str[i][j] = ...
      ^^^------------------- cannot use this
^^^^^^ --------------------- str[i] is of type 'char',
                             not a pointer to be used as the operand for [] operator.
On the other hand, in second case,
str is an array of arrays. You can change individual array elements,
str[i][j]='J'; // change individual element, good to go.
but you cannot assign to an array.
str[i]="newstring"; // nopes, an array is not a modifiable lvalue!!
Finally, considering you meant to write (as seen in comments)
char* str[ ] ={"what","is","this"};
in your first case, the same logic for arrays holds. This makes str an array of pointers. The array members are assignable, so
str[i]="newstring"; // just overwrites the previous pointer
is perfectly OK. However, the pointers stored as array members point to string literals, so, for the very same reason mentioned above, you invoke undefined behavior when you try to modify any of the memory belonging to a string literal:
str[i][j]='j'; //still invalid (undefined behavior), as above.
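A sketch of the char *str[] case (hypothetical function name). The array elements are declared const char * here, which is a common convention rather than something the original code used; it makes the forbidden str[i][j] = 'j' a compile-time error instead of silent undefined behavior. Returning str[1] is safe because it points at a string literal, which lives for the whole program.

```c
/* str is an array of three pointers; each element can be re-pointed,
   but the literals must not be written through them. */
const char *reassign_middle(void)
{
    const char *str[] = {"what", "is", "this"};
    str[1] = "newstring";      /* OK: overwrites the pointer in str[1] */
    /* str[1][0] = 'N';  rejected because of const; would modify a literal */
    return str[1];
}
```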
The memory layout is different:
char* str[] = {"what", "is", "this"};
str
+--------+      +-----+
| pointer| ---> |what0|
+--------+      +-----+  +---+
| pointer| ------------> |is0|
+--------+               +---+  +-----+
| pointer| -------------------> |this0|
+--------+                      +-----+
In this memory layout, str is an array of pointers to the individual strings. Usually, these individual strings will reside in static storage, and it is an error to try to modify them. In the graphic, I used 0 to denote the terminating null bytes.
char str[][5] = {"what", "is", "this"};
str
+-----+
|what0|
+-----+
|is000|
+-----+
|this0|
+-----+
In this case, str is a contiguous 2D array of characters, located on the stack when it is declared inside a function. The strings are copied into this memory area when the array is initialized, and the individual strings are padded with zero bytes to give the array a regular shape.
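A minimal sketch (hypothetical name) that checks the layout just described: the whole 2D array is 15 bytes, the row holding "is" is padded with zero bytes, and individual elements can be modified because each row is an ordinary array, not a string literal.

```c
/* Returns nonzero if the 3 x 5 layout matches the diagram. */
int check_rows(void)
{
    char str[][5] = {"what", "is", "this"};
    str[0][0] = 'W';                          /* OK: elements are modifiable */
    return sizeof str == 15                   /* 3 rows of 5 bytes */
        && str[1][2] == '\0' && str[1][4] == '\0'   /* "is" padded with 0 */
        && str[0][0] == 'W';
}
```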
These two memory layouts are fundamentally incompatible with each other. You cannot pass either to a function that expects a pointer to the other. However, access to the individual strings is compatible. When you write str[1], you get a char* to the first character of a memory region containing the bytes is0, i.e. a C string.
In the first case, it is clear that this pointer is simply loaded from memory. In the second case, the pointer is created via array-pointer-decay: str[1] actually denotes an array of exactly five bytes (is000), which immediately decays into a pointer to its first element in almost all contexts. However, I believe that a full explanation of the array-pointer-decay is beyond the scope of this answer. Google array-pointer-decay if you are curious.
With the first you define a variable that is a pointer to a char, which is usually used as just a single string. It initializes the pointer to point to the string literal "what". The compiler should also complain that you have too many initializers in the list.
The second definition makes str an array of three arrays of five char. That is, it's an array of three five-character arrays, each able to hold a string of up to four characters plus the terminator.
Seen a little differently, it looks something like this:
For the first case:
+-----+     +--------+
| str | --> | "what" |
+-----+     +--------+
And for the second you have
+--------+--------+--------+
| "what" | "is" | "this" |
+--------+--------+--------+
Also note that for the first version, with the pointer to a single string, the expression str[i] = "newstring" should also lead to warnings, as you try to assign a pointer to the single char element str[i].
That assignment is invalid in the second version as well, but for another reason: str[i] is an array (of five char elements) and you can't assign to an array, only copy to it. So you could try doing strcpy(str[i], "newstring") and the compiler will not complain. It's wrong though, because you try to copy 10 characters (remember the terminator) into an array of 5 characters, and that will write out of bounds leading to undefined behavior.
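If the goal really is to put new text into one row of char str[][5], one option is a bounded copy. In this sketch (set_row is a hypothetical name), snprintf writes at most sizeof str[row], that is 5, bytes, so "newstring" is truncated to "news" plus the terminator instead of overflowing. Whether silent truncation is acceptable depends on the program.

```c
#include <stdio.h>

/* Copies as much of text as fits into row `row` (at most 4
   characters plus '\0'); other rows are untouched. */
void set_row(char str[][5], size_t row, const char *text)
{
    snprintf(str[row], sizeof str[row], "%s", text);
}
```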
In the first declaration
char *str={"what","is","this"};
declares str a pointer to a char and is a scalar. The standard says that
6.7.9 Initialization (p11):
The initializer for a scalar shall be a single expression, optionally enclosed in braces. [...]
That is, a scalar can have a brace-enclosed initializer, but only with a single expression. In the case of
char *str = {"what","is","this"}; // three expressions in brace enclosed initializer
it is up to the compiler how to handle this. Note that what happens to the rest of the initializers is not specified by the standard. A conforming compiler must give a diagnostic message:
[Warning] excess elements in scalar initializer
5.1.1.3 Diagnostics (P1):
A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined
You claim "str[i]="newstring"; is valid whereas str[i][j]='j'; is invalid."
str[i] is of char type and can hold only a char value. Assigning "newstring" (which converts to char *) is invalid. The statement str[i][j]='j'; is invalid, as the subscript operator can only be applied to an array or pointer type.
You can make str[i]="newstring"; work by declaring str as an array of char *:
char *str[] = {"what","is","this"};
In this case str[i] is of type char *, and a string literal can be assigned to it, but modifying the string literal that str[i] points to invokes undefined behavior. That is, you must not do str[0][0] = 'W'.
The snippet
char str[][5]={"what","is","this"};
declares str as an array of arrays of chars. str[i] is actually an array, and since arrays are non-modifiable lvalues, you can't use them as the left operand of the assignment operator. This makes str[i]="newstring"; invalid, while str[i][j]='J'; works because the elements of an array can be modified.
Since you said the other answers are confusing you, let's see what is happening with a simpler example first:
char *ptr = "somestring";
Here "somestring" is a string literal, which is typically stored in a read-only data section of memory. ptr is an ordinary pointer variable (allocated like any other variable) that points to the first byte of that literal.
Hence, consider these two statements:
char *ptr2 = ptr; //statement 1 OK
ptr[1] = 'a'; //statement 2 undefined behavior
Statement 1 is a perfectly valid operation (assigning one pointer to another), but statement 2 is not: it tries to write into a string literal, which is undefined behavior and usually crashes when the literal is in read-only memory.
On the other hand if we write:
char ptr[] = "somestring";
Here ptr is not actually a pointer but the name of an array (unlike a pointer, it doesn't take extra space in memory). It allocates the same number of bytes as required by "somestring" (not read-only), and that's it.
Hence, consider the same two statements and one extra statement:
char *ptr2 = ptr; //statement 1 OK
ptr[1] = 'a'; //statement 2 OK
ptr = "someotherstring"; //statement 3 error
Statement 1 is a perfectly valid operation (assigning an array name to a pointer; the array name yields the address of the 1st byte), and statement 2 is also valid because the memory is not read-only.
Statement 3 is not a valid operation because here ptr is not a pointer; it cannot be made to point to some other memory location.
Now in this code,
char *str[] = {"what","is","this"};
str[i] is a pointer (str[i] is the same as *(str+i))
but in this code
char str[][5] = {"what", "is", "this"};
str[i] is not a pointer. It is the name of an array.
The same thing as above follows.
To begin with
char*str={"what","is","this"};
is not even valid C code 1), so discussing it isn't very meaningful. For some reason, the gcc compiler lets this code through with only a warning. Do not ignore compiler warnings. When using gcc, make sure to always compile using -std=c11 -pedantic-errors -Wall -Wextra.
What gcc seems to do when encountering this non-standard code, is to treat it as if you had written char*str={"what"};. Which in turn is the same thing as char*str="what";. This is by no means guaranteed by the C language.
str[i][j] tries to indirect a pointer twice, even though it only has one level of indirection, and therefore you get a compiler error. It makes as little sense as typing
int array [3] = {1,2,3}; int x = array[0][0];.
As for the difference between char* str = ... and char str[] = ..., see FAQ: What is the difference between char s[] and char *s?.
Regarding the char str[][5]={"what","is","this"}; case, it creates an array of arrays (2D array). The inner-most dimension is set to 5 and the outer-most dimension is set automatically by the compiler depending on how many initializers the programmer provided. In this case 3, so the code is equivalent to char str[3][5].
str[i] gives you array number i in the array of arrays. You cannot assign to arrays in C, because that's how the language is designed. Furthermore, it would be incorrect to do so for a string anyway, FAQ: How to correctly assign a new string value?
1) This is a constraint violation of C11 6.7.9/2. Also see 6.7.9/11.
To do away with the confusion, you must have proper understanding of pointers, arrays and initializers.
A common misconception amongst C programming beginners is that an array is equivalent to a pointer.
An array is a collection of items of the same type. Consider the following declaration:
char arr[10];
This array contains 10 elements, each of type char.
An initializer list may be used to initialize an array in a convenient manner. The following initializes the array elements with the corresponding values of the initializer list:
char array[10] = {'a','b','c','d','e','f','g','h','i','\0'};
Arrays are not assignable, thus the use of initializer list is valid upon array declaration only.
char array[10];
array = {'a','b','c','d','e','f','g','h','i','\0'}; // Invalid...
char array1[10];
char array2[10] = {'a','b','c','d','e','f','g','h','i','\0'};
array1 = array2; // Invalid...; You cannot copy array2 to array1 in this manner.
After the declaration of an array, assignments to array members must be via the array indexing operator or its equivalent.
char array[10];
array[0] = 'a';
array[1] = 'b';
.
.
.
array[8] = 'i';
array[9] = '\0'; // the last valid index of a 10-element array is 9
Loops are a common and convenient way of assigning values to array members:
char array[10];
int index = 0;
for(char val = 'a'; val <= 'i'; val++) {
    array[index] = val;
    index++;
}
array[index] = '\0';
char arrays may be initialized via string literals, which are null-terminated char arrays that must not be modified:
char array[10] = "abcdefghi";
However the following is not valid:
char array[10];
array = "abcdefghi"; // As mentioned before, arrays are not assignable
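Since arrays cannot be assigned, a common way to put a new string into an existing char array is strcpy, which copies the characters one by one, including the terminator. A minimal sketch, with a hypothetical function name, assuming the destination has room for all 10 bytes:

```c
#include <string.h>

/* Overwrites the contents of array (not the array itself). */
void fill_letters(char array[10])
{
    strcpy(array, "abcdefghi");   /* 9 characters + '\0' = 10 bytes */
}
```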
Now, let us get to pointers...
Pointers are variables that can store the address of another object, normally an object of the pointed-to type.
Consider the following declaration:
char *ptr;
This declares a variable of type char *, a char pointer. That is, a pointer that may point to a char variable.
Unlike arrays, pointers are assignable. Thus the following is valid:
char var;
char *ptr;
ptr = &var; // Perfectly Valid...
As a pointer is not an array, a pointer may be assigned a single value only.
char var;
char *ptr = &var; // The address of the variable `var` is stored as a value of the pointer `ptr`
Recall that a pointer must be assigned a single value, thus the following is not valid, as the number of initializers is more than one:
char *ptr = {'a','b','c','d','\0'};
This is a constraint violation, but your compiler might just assign 'a' to ptr and ignore the rest. Even then, the compiler will warn you, because character constants such as 'a' have type int in C, which is incompatible with the type of ptr, char *.
If this pointer were then dereferenced at run time, the program would access an invalid address (undefined behavior), which will most likely crash it.
In your example:
char *str = {"what", "is", "this"};
again, this is a constraint violation, but your compiler may assign the string "what" to str, ignore the rest, and simply display a warning:
warning: excess elements in scalar initializer.
Now, here is how we eliminate the confusion regarding pointers and arrays:
In some contexts, an array may decay to a pointer to the first element of the array. Thus the following is valid:
char arr[10];
char *ptr = arr;
By using the array name arr as a value in the initialization, the array decays to a pointer to its first element, which makes the previous declaration equivalent to:
char *ptr = &arr[0];
Remember that arr[0] is of type char, and &arr[0] is its address that is of type char *, which is compatible with the variable ptr.
Recall that string literals are null-terminated char arrays (which must not be modified), thus the following declaration is also valid:
char *ptr = "abcdefghi"; // the array "abcdefghi" decays to a pointer to the first element 'a'
Now, in your case, char str[][5] = {"what","is","this"}; is an array of 3 arrays, each containing 5 elements.
Since arrays are not assignable, str[i] = "newstring"; is not valid, because str[i] is an array. However, str[i][j] = 'j'; is valid, because str[i][j] is an array element that is NOT itself an array, and so it is assignable.
Case 1:
When I write
char*str={"what","is","this"};
then str[i]="newstring"; is valid whereas str[i][j]='j'; is invalid.
Part I.I
>> char*str={"what","is","this"};
In this statement, str is a pointer to char type.
When compiling, you should be getting a warning message on this statement:
warning: excess elements in scalar initializer
char*str={"what","is","this"};
^
The reason for the warning is that you are providing more than one initializer to a scalar.
[Arithmetic types and pointer types are collectively called scalar types.]
str is a scalar. From C Standard §6.7.9p11:
The initializer for a scalar shall be a single expression, optionally enclosed in braces. ..
Furthermore, giving more than one initializer to a scalar is undefined behavior.
From C Standard Annex J.2, Undefined behavior:
The initializer for a scalar is neither a single expression nor a single expression enclosed in braces
Since it is undefined behavior as per the standard, there is no point in discussing it further. To better understand the char * type, the discussion of Part I.II and Part I.III below assumes char *str = "somestring"; instead.
It seems that you want to create an array of pointers to strings. I have added a brief note about arrays of pointers to strings later in this post, after discussing both cases.
Part I.II
>> then str[i]="newstring"; is valid
No, this is not valid.
Again, the compiler should be giving a warning message on this statement because of an incompatible conversion.
Since str is a pointer to char, str[i] is the character i places past the object pointed to by str (str[i] is equivalent to *(str + i)).
"newstring" is a string literal. Except when it is used to initialize an array, a string literal decays into a pointer of type char *. Here you are trying to assign that pointer to a char, which is why the compiler warns.
Part I.III
>> whereas str[i][j]='j'; is invalid.
Yes, this is invalid.
The [] (subscript operator) can be used with array or pointer operands.
str[i] is a character, so str[i][j] means you are applying [] to a char operand, which is invalid. This is why the compiler reports it as an error.
Case 2:
When I write
char str[][5]={"what","is","this"};
then str[i]="newstring"; is not valid whereas str[i][j]='J'; is valid.
Part II.I
>> char str[][5]={"what","is","this"};
This is absolutely correct.
Here, str is a 2D array. The compiler sets the first dimension automatically, based on the number of initializers.
The in-memory view of str[][5], in this case, would be something like this:
str
+-+-+-+-+-+
str[0] |w|h|a|t|0|
+-+-+-+-+-+
str[1] |i|s|0|0|0|
+-+-+-+-+-+
str[2] |t|h|i|s|0|
+-+-+-+-+-+
The corresponding elements of the 2D array are initialized from the initializer list, and the remaining elements are set to 0.
Part II.II
>> then str[i]="newstring"; is not valid
Yes, this is not valid.
str[i] is a one-dimensional array.
As per the C Standard, an array is not a modifiable lvalue.
From C Standard §6.3.2.1p1:
An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
Also, an array name is converted to a pointer to the initial element of the array object. The exceptions are when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator.
From C Standard §6.3.2.1p3:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.
str is already initialized, so when you assign some other string literal to the ith array of str, the string literal is converted to a pointer. That makes the assignment invalid. The left operand has array type, so it is not a modifiable lvalue, while the right operand has type char *. This is why the compiler reports it as an error.
Part II.III
>> whereas str[i][j]='J'; is valid.
Yes, this is valid as long as the i and j are valid values for given array str.
str[i][j] is of type char, so you can assign a character to it.
Beware: C does not check array boundaries, and accessing an array out of bounds is undefined behavior. Anything can happen. The program may fortuitously do exactly what the programmer intended, crash with a segmentation fault, or silently generate incorrect results.
Assuming that in Case 1 you want to create an array of pointers to strings, it should be like this:
char *str[]={"what","is","this"};
^^
The in-memory view of str will be something like this:
str
+----+ +-+-+-+-+--+
str[0]| |--->|w|h|a|t|\0|
| | +-+-+-+-+--+
+----+ +-+-+--+
str[1]| |--->|i|s|\0|
| | +-+-+--+
+----+ +-+-+-+-+--+
str[2]| |--->|t|h|i|s|\0|
| | +-+-+-+-+--+
+----+
"what", "is" and "this" are string literals.
str[0], str[1] and str[2] are pointers to the respective string literal and you can make them point to some other string as well.
So, this is perfectly fine:
str[i]="newstring";
Assuming i is 1, so str[1] pointer is now pointing to string literal "newstring":
+----+ +-+-+-+-+-+-+-+-+-+--+
str[1]| |--->|n|e|w|s|t|r|i|n|g|\0|
| | +-+-+-+-+-+-+-+-+-+--+
+----+
But you should not do this:
str[i][j]='j';
(assuming i=1 and j=0, so str[i][j] is first character of second string)
As per the standard, attempting to modify a string literal results in undefined behavior, because string literals may be stored in read-only storage or combined with other string literals.
From C Standard §6.4.5p7:
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Additional:
There is no native string type in C language. In C language, a string is a null-terminated array of characters. You should know the difference between arrays and pointers.
I would suggest reading the following for a better understanding of arrays, pointers, and array initialization:
Array Initialization, check this.
Equivalence of pointers and arrays, check this and this.
Case 1:
char*str={"what","is","this"};
First of all, the above statement is not valid; read the warnings carefully. str is a single pointer. It can point to only one char array at a time, not to multiple char arrays.
bounty.c:3:2: warning: excess elements in scalar initializer [enabled by default]
str is a char pointer. The pointer itself is stored in the stack section of RAM. The characters it points to, however, are stored in the code (read-only) section, and you can't modify them, because str is initialized with a string literal (in GCC/Linux).
You stated that str[i]="newstring"; is valid whereas str[i][j]='j'; is invalid.
str = "new string" does not modify the code/read-only section. You are simply assigning a new address to str, which is why it's valid. But *str = 'j' is not valid, because here you are modifying the read-only section by trying to change the first letter of the string. (str[0][0] = 'j' does not even compile, because str[0] is a single char.)
Case 2:
char str[][5]={"what","is","this"};
Here str is a 2D array, i.e. str and str[0], str[1], str[2] themselves are stored in the stack section of RAM. That means you can change the contents of each str[i].
str[i][j] = 'w'; is valid because you are modifying stack contents, which is allowed, but
str[i] = "new string"; is not possible, because str[i] is itself an array. An array is not a modifiable lvalue. Its address is fixed, somewhat like a constant pointer, so you can't assign a new address to it.
Simply put, in the first case str = "new string" is valid because str is a pointer, not an array. In the second case str[0] = "new string" is not valid because str[0] is an array, not a pointer.
I hope it helps.

Understanding two ways of declaring a C string [duplicate]

A few weeks ago I started learning the programming language C. I have knowledge of web technologies like HTML/CSS, JavaScript, PHP, and basic server administration, but C is confusing me. To my understanding, the C language does not have a data type for strings, just characters; however, I may be wrong.
I have heard there are two ways of declaring a string. What is the difference between these two lines of declaring a string:
a.) char stringName[];
b.) char *stringName;
I get that char stringName[]; is an array of characters. However, the second line confuses me. To my understanding the second line makes a pointer variable. Aren't pointer variables supposed to be the memory address of another variable?
In the C language, a "string" is, as you say, an array of char. Most string functions built into the C spec expect the string to be "NUL terminated", meaning the last char of the string is a 0. Not the code representing the numeral zero, but the actual value of 0.
For example, if your platform uses ASCII, then the following "string" is "ABC":
char myString[4] = {65, 66, 67, 0};
When you use the char varName[] = "foo" syntax, you're allocating the string on the stack. If the declaration is in global scope, you're allocating it globally instead, but still not dynamically.
Memory management in C is more manual than in many other languages you may have experience with. In particular, there is the concept of a "pointer".
char *myString = "ABC"; /* Points to a string somewhere in memory, the compiler puts somewhere. */
Now, a char * is "an address that points to a char or a char array". Notice the "or" in that statement: it is important for you, the programmer, to know which case applies.
It's important to also ensure that any string operations you perform don't exceed the amount of memory you've allocated to a pointer.
char myString[5];
strcpy(myString, "12345"); /* copy "12345" into myString.
* Oh no! I forgot space for my NUL terminator and
* have overwritten some memory I don't own. */
"12345" is actually 6 characters long (don't forget the 0 at the end), but I've only reserved 5 characters. This is what's called a "buffer overflow", and is the cause of many serious bugs.
The other difference between "[]" and "*", is that one is creating an array (as you guessed). The other one is not reserving any space (other than the space to hold the pointer itself.) That means that until you point it somewhere that you know is valid, the value of the pointer should not be used, for either reading or writing.
Another point (made by someone in the comments):
You cannot pass an array as a parameter to a function in C. When you try, it gets converted to a pointer automatically. This is why we pass around pointers to strings rather than the strings themselves.
In C, a string is a sequence of character values followed by a 0-valued byte[1]. All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example, the string "hello" is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0}[2]. To store the string, you need a 6-element array of char: 5 characters plus the 0 terminator:
char greeting[6] = "hello";
or
char greeting[] = "hello";
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char and copying the contents of the string literal to it. Unless the array is declared at file scope (outside of any function) or with the static keyword, it only exists for the duration of the block in which it was declared.
The string literal "hello" is also stored in a 6-element array of char. However, that array is allocated when the program is loaded into memory and held until the program terminates[3], and it is visible throughout the program. When you write
char *greeting = "hello";
you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting.
As always, a picture is worth a thousand words. Here's a simple little program:
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int main( void )
{
char greeting[] = "hello"; // greeting contains a *copy* of the string "hello";
// size is taken from the length of the string plus the
// 0 terminator
char *greetingPtr = "hello"; // greetingPtr contains the *address* of the
// string literal "hello"
printf( "size of greeting array: %zu\n", sizeof greeting );
printf( "length of greeting string: %zu\n", strlen( greeting ) );
printf( "size of greetingPtr variable: %zu\n", sizeof greetingPtr );
printf( "address of string literal \"hello\": %p\n", (void * ) "hello" );
printf( "address of greeting array: %p\n", (void * ) greeting );
printf( "address of greetingPtr: %p\n", (void * ) &greetingPtr );
printf( "content of greetingPtr: %p\n", (void * ) greetingPtr );
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
return 0;
}
And here's the output:
size of greeting array: 6
length of greeting string: 5
size of greetingPtr variable: 8
address of string literal "hello": 0x4007f8
address of greeting array: 0x7fff59079cf0
address of greetingPtr: 0x7fff59079ce8
content of greetingPtr: 0x4007f8
greeting: hello
greetingPtr: hello
Note the difference between sizeof and strlen - strlen counts all the characters up to (but not including) the 0 terminator.
So here's what things look like in memory:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"hello" 0x4007f8 'h' 'e' 'l' 'l'
0x4007fc 'o' 0x00 ??? ???
...
greetingPtr 0x7fff59079ce8 0x00 0x00 0x00 0x00
0x7fff59079cec 0x00 0x40 0x7f 0xf8
greeting 0x7fff59079cf0 'h' 'e' 'l' 'l'
0x7fff59079cf4 'o' 0x00 ??? ???
The string literal "hello" is stored at a very low address (on my system, this corresponds to the .rodata section of the executable, which is for static, constant data). The variables greeting and greetingPtr are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr stores the address of the string literal "hello", while greeting stores a copy of the string contents.
Here's where things can get kind of confusing. Let's look at the following print statements:
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
greeting is a 6-element array of char, and greetingPtr is a pointer to char, yet we're passing them both to printf in exactly the same way, and the string is being printed out correctly; how can that work?
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
In the printf call, the expression greeting has type "6-element array of char". Since it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to char" (char *), and the address of the first element is actually passed to printf. In other words, it behaves exactly like the greetingPtr expression in the next printf call[4].
The %s conversion specifier tells printf two things: its corresponding argument has type char *, and it should print the character values starting at that address until it reaches the 0 terminator.
Hope that helps a bit.
1. Often referred to as the NUL terminator; this should not be confused with the NULL pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'. The leading backslash "escapes" the value, so instead of being treated as the character '0' (ASCII 48), it's treated as the value 0 (ASCII 0).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting copies the string contents to the array, while the declaration of greetingPtr copies the address of the first element of the string. The string literal "hello" is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.
In C (and in C++), arrays and pointers are closely related. In most expressions an array is converted to the address of its first element. That address is sufficient to reach the other elements, since the elements of an array are contiguous in memory. It also means that the address alone does not indicate where the array ends. You therefore need some way of identifying the end. One option is to pass around the length as a separate variable. Another is a convention such as a sentinel value placed in the last position of the array. For strings, the sentinel is the common convention, with '\0' (the NUL character) marking the end of the string.

Question about pointers and strings in C [duplicate]

Possible Duplicate:
What is the difference between char s[] and char *s in C?
Difference between char *str = “…” and char str[N] = “…”?
I have some code that has had me puzzled.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char* string1 = "this is a test";
char string2[] = "this is a test";
printf("%i, %i\n", sizeof(string1), sizeof(string2));
system("PAUSE");
return 0;
}
When it outputs the size of string1, it prints 4, which is to be expected because the size of a pointer is 4 bytes. But when it prints string2, it outputs 15. I thought that an array was a pointer, so the size of string2 should be the same as string1 right? So why is it that it prints out two different sizes for the same type of data (pointer)?
Arrays are not pointers. In certain situations, an array name decays to a pointer to the first element of the array: when you pass it to a function, when you assign it to a pointer, and so on. Otherwise, arrays are arrays. They have their own storage (on the stack, for local arrays), have compile-time sizes that can be determined with sizeof, and all that other good stuff.
Arrays and pointers are completely different animals. In most contexts, an expression designating an array is treated as a pointer.
First, a little standard language (n1256):
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The string literal "this is a test" is a 15-element array of char. In the declaration
char *string1 = "this is a test";
string1 is being declared as a pointer to char. Per the language above, the type of the expression "this is a test" is converted from char [15] to char *, and the resulting pointer value is assigned to string1.
In the declaration
char string2[] = "this is a test";
something different happens. More standard language:
6.7.8 Initialization
...
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
...
22 If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. At the end of its initializer list, the array no longer has incomplete type.
In this case, string2 is being declared as an array of char, its size is computed from the length of the initializer, and the contents of the string literal are copied to the array.
Here's a hypothetical memory map to illustrate what's happening:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
no name 0x08001230 't' 'h' 'i' 's'
0x08001234 ' ' 'i' 's' ' '
0x08001238 'a' ' ' 't' 'e'
0x0800123C 's' 't' 0
...
string1 0x12340000 0x08 0x00 0x12 0x30
string2 0x12340004 't' 'h' 'i' 's'
0x12340008 ' ' 'i' 's' ' '
0x1234000C 'a' ' ' 't' 'e'
0x12340010 's' 't' 0
String literals have static extent; that is, the memory for them is set aside at program startup and held until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior; the underlying platform may or may not allow it, and the standard places no restrictions on the compiler. It's best to act as though literals are always unwritable.
In my memory map above, the address of the string literal is set off somewhat from the addresses of string1 and string2 to illustrate this.
Anyway, you can see that string1, having a pointer type, contains the address of the string literal. string2, being an array type, contains a copy of the contents of the string literal.
Since the size of string2 is known at compile time, sizeof returns the size (number of bytes) in the array.
The %i conversion specifier is not the right one to use for expressions of type size_t. If you're working in C99, use %zu. In C89, you would use %lu and cast the expression to unsigned long:
C89: printf("%lu, %lu\n", (unsigned long) sizeof string1, (unsigned long) sizeof string2);
C99: printf("%zu, %zu\n", sizeof string1, sizeof string2);
Note that sizeof is an operator, not a function call; when the operand is an expression that denotes an object, parentheses aren't necessary (although they don't hurt).
string1 is a pointer, but string2 is an array.
The second line is something like int a[] = { 1, 2, 3}; which defines a to be a length-3 array (via the initializer).
The size of string2 is 15 because the initializer is nul-terminated (so 15 is the length of the string + 1).
Inside a function, a parameter declared as an array of unknown size (e.g. char s[]) is really a pointer, so sizeof gives the size of a pointer there. An array whose size is known counts as its own type for sizeof purposes, and sizeof reports the size of the storage required for the array. string2 is declared without an explicit size, but the C compiler completes its type from the string literal initializer, making it a char[15]. (Since the memory isn't allocated in any other way, there's nothing else it can do, after all.) For the purposes of sizeof, arrays with a known size are different types from pointers, including pointers to dynamically allocated memory, because that's just how C is.
This seems to be a decent reference on the behaviors of sizeof.
The compiler knows that string2 is an array, so it prints the number of bytes allocated to it (14 characters plus the null terminator). Remember that sizeof is evaluated by the compiler (except for variable-length arrays), so it can know the size of a stack variable.
An array is not a pointer. A pointer is a variable pointing to a memory location, whereas an array is the block of sequentially allocated memory itself.
It's because string1 holds a pointer to contiguous chars that are immutable (a string literal), whereas string2 is the location where your chars sit.
Basically, the C compiler interprets these two differently. This is explained beautifully here: http://c-faq.com/aryptr/aryptr2.html.

Resources