C: different string definitions, I get different sizes using sizeof()

I was testing the use of sizeof() for the same string content "abc". My function is like this:
#include <stdio.h>
int main(void){
char* pass1 = "abc";
char pass2[] = "abc";
char pass3[4] = "abc";
char pass4[] = "";
/* note: pass4 has room for only 1 char ('\0'), so reading "abc" into it overflows the array (undefined behavior) */
scanf("%s", pass4);
printf("sizeof(pass1) is: %zu\n", sizeof(pass1));
printf("sizeof(pass2) is: %zu\n", sizeof(pass2));
printf("sizeof(pass3) is: %zu\n", sizeof(pass3));
printf("sizeof(pass4) is: %zu\n", sizeof(pass4));
return 0;
}
I entered "abc" for pass4, and the output is:
sizeof(pass1) is: 8
sizeof(pass2) is: 4
sizeof(pass3) is: 4
sizeof(pass4) is: 1
I was expecting all 4s, since I thought the four string definitions above are the same.
Why does sizeof(pass1) return 8? Why is sizeof(pass4) 1?

When you take sizeof on a pointer type, you'll get the size in bytes of the pointer itself, i.e. of the memory address it holds. In this case 8 is the size of the address (in bytes). sizeof on an array, including one initialized from a string literal, returns the size in bytes of the whole array, which for char pass2[] = "abc" is the length of the string plus the null byte.

sizeof gives the size of its operand. To understand the results you are seeing, you need to understand what pass1, pass2, pass3, and pass4 actually are.
pass1 is a pointer to char (i.e. a char *) so sizeof pass1 gives the size of a pointer (a variable which contains a memory address, not an array). That is 8 with your compiler. The size of a pointer is implementation defined, so this may give different results with different compilers. The fact you have initialised pass1 so it points at the first character of a string literal "abc" does not change the fact that pass1 is declared as a pointer, not an array.
pass2 is an array initialised using the literal "abc", which is represented in C as an array of four characters (the three letters 'a' to 'c', plus an additional character with value zero, '\0').
pass3 is also an array of four char, since it is declared that way: char pass3[4] = <etc>. If you had done char pass3[4] = "abcdef", you would still find that sizeof pass3 is 4: the 4 elements of pass3 would be 'a' to 'd', with the other characters 'e', 'f' and '\0' of the string literal "abcdef" not used to initialise pass3 (compilers normally warn that such an initializer string is too long).
Since both pass2 and pass3 are arrays of four characters, their size is 4 (in general, the size of an array is the size of the array element multiplied by the number of elements). The standard defines sizeof(char) to be 1, and 1*4 has a value 4.
pass4 is initialised using the literal "". That string literal is represented using a single char with value '\0' (and no characters before it, since none are between the double quotes). So pass4 has size 1 for the same reason that pass2 has size 4.

In this declaration
char* pass1 = "abc";
the initializer on the right-hand side is a string literal, which is a character array of size 4: string literals have character array types, and they include the terminating zero. You can check this as follows:
printf("sizeof( \"abc\") is: %zu\n", sizeof( "abc"));
In the declaration, the array is used to initialize a pointer. When an array is used as the initializer of a pointer, it is implicitly converted to a pointer to its first element.
Thus the pointer pass1 points to the first element of the string literal "abc". The size of the pointer itself on your system is 8 bytes.
In these declarations
char pass2[] = "abc";
char pass3[4] = "abc";
the string literal is used to initialize arrays. In this case each element of the arrays is initialized by the corresponding element of the string literal. Any remaining elements (if an array is longer than the literal) are zero-initialized. If the size of an array is not specified, it is calculated from the number of initializers.
So in this declaration
char pass2[] = "abc";
the array pass2 will have 4 elements because the string literal provides four initializers.
In this declaration
char pass3[4] = "abc";
it is explicitly specified that the array has 4 elements.
Thus both arrays have a size of 4, and the pointer declared first has a size of 8 bytes.

Related

Is a number between the 2 square brackets necessary regarding char variables

I just started programming in C and I see people on the internet declaring chars like, for example, char name[], without putting any number between the 2 square brackets. I used to code in C++ in high school and we always used to put a number when declaring a char array. Can someone explain when we should put a number, or whether it is even necessary to use one when declaring a char variable?
I used to do it like this:
char name[20] = "John";
but I've seen on the internet people doing it like this:
char name[] = "John";
char curse[] = "fie! cometh h're and englut mine own coxcomb thee distemperate fooleth!";
In this case, the compiler automatically computes the number of bytes, including the null-terminator. The size of the array is equal to the length of the string, plus the null-byte.
char curse[100] = "fie! cometh h're and englut mine own coxcomb thee distemperate fooleth!";
Whereas this defines an array of 100 chars and initializes it with a string of 70 or so bytes. The rest of the bytes are initialized to 0¹. The size of the array is 100 bytes, whereas the length of the string can be determined with strlen. Note that the size of the array and the length of the string it contains are not the same.
Aside: It's better to define the array size as a macro instead of having magic numbers all over your code. It's easier to maintain that way.
[1] From C11:
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
i.e. 0.
char name[] = "John"; will define a char array long enough to accommodate "John" (5 chars).
char name1[20] = "John"; defines an array of 20 chars, and the initialization string takes only 5 bytes (the remaining bytes are zero-initialized). So you can use those remaining 15 bytes, for example by appending another string to it:
strcat(name1, " Travolta");
From the C Standard (6.7.9 Initialization)
22 If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. The array type is completed at the end of its initializer list.
and
14 An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
So in this declaration of a character array
char name[] = "John";
which, according to the second quote from the C Standard, is equivalent to
char name[] = { "John" };
the array name is initialized by the string literal "John".
According to the first quote from the C Standard the size of the array is determined by the number of characters (including the terminating zero character '\0') in the string literal. In fact the above declaration has the same effect as the following declaration
char name[] = { 'J', 'o', 'h', 'n', '\0' };
So, as the number of characters in the string literal (including the terminating zero) is 5, the array has exactly 5 elements, and its size is also 5 because sizeof( char ) is always equal to 1.
In this declaration
char name[20] = "John";
that again may be written like
char name[20] = { "John" };
the array is declared specifying explicitly 20 elements. The first 5 elements of the array have corresponding explicit initializers (characters of the string literal). All other elements of the array are implicitly initialized by 0.
Note that in C you may write
char name[4] = "John";
that is you may exclude the terminating zero character '\0' of the string literal from the list of initializers of the array. In this case the array name will not contain a string. In C++ such an initialization is incorrect and the C++ compiler will issue an error for such a declaration.
To output a character array that contains a string you can write for example
printf( "name = %s\n", name );
For the last shown declaration where the declared array does not contain a string you can write
printf( "name = %.*s\n", ( int )sizeof( name ), name );
If an array is declared without a size but with an initializer, then the size of the array is taken from the number of elements in the initializer:
int foo[] = {0, 1, 2, 3};
In this case, foo will be 4 elements wide.
char bar[] = "fred";
char bar[] = { 'f', 'r', 'e', 'd', 0 };
In this case, bar will be 5 elements wide - the initializers "fred" and { 'f', 'r', 'e', 'd', 0 } are equivalent to each other.
If you declare an array with a size and an initializer, then the size will be taken from the size expression. If there are fewer elements in the initializer than the size, then remaining elements will be initialized to 0:
int foo[10] = {1, 2, 3}; // elements 3 through 9 will be initialized to 0
If there are more elements in the initializer than the array is sized for, then that's a constraint violation and the compiler will yell at you:
int foo[3] = {1, 2, 3, 4}; // constraint violation, too many elements in the initializer
If you use designated initializers and no explicit size, then the size is determined by the largest index that receives an explicit initializer:
int foo[] = {[0] = 1, [9] = 2};
foo will be 10 elements wide, element 0 will be initialized to 1, element 9 will be initialized to 2, and the remaining elements will be initialized to 0.
If an array is declared without a size and without an initializer, then the array type is incomplete, and you can't create an instance of an incomplete type.
Note that array sizes are fixed for the lifetime of the array. C has something called a variable-length array where the size of the array is determined by a runtime variable:
size_t x = some_size();
int arr[x];
However, "variable" in this context only means that the size of the array isn't fixed from definition to definition, not that the array can be resized after it is defined.
In the context of a function parameter declaration, T a[N] and T a[] are "adjusted" to T *a - you cannot pass arrays as function parameters, because array expressions "decay" to pointer expressions under most circumstances and what the function actually receives is a pointer.

Why don't strings "exist" and memory of chars vs memory of strings?

I've studied C programming at university for 4 months. My professor always said that strings don't really exist. Since I finished those 2 small courses, I really started programming (java). I can't remember WHY strings don't really exist. I wasn't concerned about this before, but I'm curious now. Why don't they exist? And do they exist in Java? I know it has to do something with that "under the hood strings are just characters", but does that mean that strings are all saved as multiple characters etc? And doesn't that take more memory?
A string type does not exist in C, but C strings do exist. They are defined as null-terminated character arrays. For example:
char buffer1[] = "this is a C string";//string literal
creates a C string that looks like this in memory:
|t|h|i|s| |i|s| |a| |C| |s|t|r|i|n|g|\0|?|?|?|
|<-------------- string -------------->|
Note that this is not a string:
char *buffer2;
Until it points to a series of chars terminated by a \0, it is just a pointer to char (char *).
buffer2 = calloc(strlen(buffer1)+1, 1);
strcpy(buffer2, buffer1); //now buffer2 is pointing to a string
References:
Strings in C 1
Strings in C 2
Strings in C 3
and many more...
Edit:
(to address discussion in comments on strings:)
Based on the following definition: (From here)
Strings are actually one-dimensional array of characters terminated by a null character '\0'.
First, since null termination is integral to a conversation about C strings, here are some clarifications:
The term NULL is a macro for a null pointer constant, typically defined as ((void*)0) or just 0. It can be, and typically is, used to initialize pointer variables.
The term '\0' is a character constant. In C, it means exactly the same thing as the integer constant 0 (same value 0, same type int). It is used to initialize char arrays and to mark the end of a string.
Things that are strings:
char string[] = {'\0'}; //zero length or _empty_ string with `sizeof` 1.
In memory:
|\0|
...
char string[10] = {'\0'}; also zero length or _empty_ with `sizeof` 10.
In memory:
|\0|\0|\0|\0|\0|\0|\0|\0|\0|\0|
...
char string[] = {"string"}; string of length 6, and `sizeof` 7.
In memory:
|s|t|r|i|n|g|\0|
...
char strings[2][5] = {{0}}; 2 strings, each with zero length; `sizeof strings` is 10 and `sizeof strings[0]` is 5.
In memory:
|0|0|0|0|0|0|0|0|0|0| (note 0 equivalent to \0)
...
char *buf = {"string"};//string literal.
In memory:
|s|t|r|i|n|g|\0|
Things that are not strings:
char buf[6] = {"string"};//space for 6, but "string" requires 7 for null termination.
In Memory:
|s|t|r|i|n|g| //no null terminator
|end of space in memory.
...
char *buf = {0};//pointer to char (`char *`).
In memory:
|0| //null-initialized pointer residing at the address of `buf` (e.g. 0x00123650)
Strings don't exist in C as a data type. There are int, char, double, etc., but no "string".
This means you can declare a variable as an int, but not as a "string" because there is no data type named "string" .
The closest C has to a string is an array of chars, or a char * to a section of memory. The actual string is up to the programmer to define, as a sequence of chars terminated with a \0, or a number of chars with a known upper bound.

Why does the "Hello" inside the string array have a size 4?

The code in question:
#include <stdio.h>
int main(void) {
char *a[10] = {"hi", "Hello", "how"};
printf("%zu\n", sizeof(a));
printf("%s\n%s\n%s\n", a[0],a[1],a[2]);
printf("%zu\n%zu\n%zu\n", sizeof(a[0]),sizeof(a[1]),sizeof(a[2]));
printf("%zu", sizeof("Hello"));
return 0;
}
Output:
40
hi
Hello
how
4
4
4
6
I have no idea why this is happening and I also looked up the reference for sizeof on cppreference. Still have no clue why it is returning 4 for "Hello" when it should need 6 to store it.
It actually prints the size of the pointer, not of the string content itself. Notice that all of the sizes are printed as 4 bytes.
a is an array of char*, so sizeof(a[1]) is the size of a char*, which is 4 on your platform.
sizeof("Hello") is the size of a char[6], which is, by definition, 6. The reason for the latter is that "Hello" is a string literal whose type is an array of 6 char, containing
{'H', 'e', 'l', 'l', 'o', '\0'}
"Inside the string array" there are elements of type char *, that is, it is an array of pointers to characters, because you defined it like
char *a[10] = {"hi", "Hello", "how"};
^^^^^^
When elements of such an array are initialized by string literals (that have types of character arrays) then they (string literals) are converted to pointers to their first elements. So array a is an array of pointers (the first three elements of the array) to first characters of the string literals. All other elements are zero-initialized.
Thus sizeof( a[1] ) is equal to sizeof( char * ), and in the environment where the program was run it is equal to 4.
If you would define the array the following way
char a[][10] = {"hi", "Hello", "how"};
then in this case sizeof( a[1] ) would be equal to 10 because in this case you explicitly specified that each element of the array has type char[10]
However if you would apply function strlen to a[1] you would get that strlen( a[1] ) is equal to 5 (the terminating zero of the string literal is not counted).
As for the string literal "Hello", in C it has type char[6], so sizeof( "Hello" ) is equal to 6.
So if you have for example
char *s = "Hello";
then
sizeof( s ) is equal to 4;
sizeof( "Hello" ) is equal to 6;
and at last strlen( s ) is equal to strlen( "Hello" ) and equal to 5.
sizeof is not a function; it is an operator which produces the size of its operand, either a type or an expression.
In your case, a[0] is of type char *, which, in your system, occupies 4 bytes of memory. So, it is producing 4.
OTOH, when you use sizeof("Hello"), the string "Hello" here is of type char[6] (including the null terminator), so it prints 6.

C - semantics to do with pointers

A very short question; how does one refer to 'names' below?
char *names[] = {
"Alan", "Frank",
"Mary", "John", "Lisa"
};
Is 'names' (a) a pointer to an array of strings, or (b) an array of pointers to strings?
I've noticed that if the address of "Alan"(and 'names') is x, then the address of "Frank" is x+(0x08), the address of "Mary" is x+(0x10), and so on. So for this reason I'm leaning more towards (a).
As an intro, first of all: there is no "string" type in C.
What in C is commonly called a "string" really is an array of characters (chars) with at least one of its elements carrying a '\0', the "string" terminator, also known as the 0-termination, marking the end of the "string".
So a "string" is a char array like this:
char s[42] = "alk"; /* with s[0] == 'a', s[1] == 'l', s[2] == 'k' and s[3] == '\0' */
This declares s to be an array of char, which could carry 42 chars, that may be a "string" with a maximum length of 41 chars, as one char needs to carry the terminator (see above).
You mentioned two types in your question:
(a) a pointer to an array of strings
(b) an array of pointers to strings
Regarding (a):
A pointer to an array of strings needs an array of strings to point to:
Let's define it:
char stringarray[3][42]; /* An array of 3 "string"s, each with a max length of 41+'\0'-terminator. */
A pointer to this would be:
char (*pstringarray)[3][42] = &stringarray; /* A pointer to an array of 3 elements of "string"s with a max length of 41 (see above). */
Regarding (b):
An array of pointers to "string"s needs some "string"s to point to:
char s1[] = "alk";
char s2[] = "football";
char s3[] = "champion";
Now let's define the array of pointers to "string"s pointing to the "string"s defined above:
char * pointerarray[4] = {
s1,
s2,
s3
};
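The latter has the same shape as the following (except that here the pointers point directly at string literals instead of at s1, s2 and s3):
char * pointerarray[] = {
"alk",
"football",
"champion",
NULL
};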
with the first three elements referring to some char arrays ("string"s) and the last element carrying NULL (the null pointer value) to indicate the end of the array. Please note: in the first example the 4th pointer is set implicitly, because when an initializer provides fewer values than there are elements, the remaining elements are initialized as if they had static storage duration, i.e. to a null pointer.
Conclusion
The OP's example matches proposal (b), so it's: an array of pointers to "string"s.
char* means Pointer to character
[] means array
char* name[] means array of pointers to characters
A string in C is an array of characters. For a string literal like "Alan", the compiler adds a 0x0 at the end.
For performance reasons, the compiler aligns the reserved memory on the register width (32-bit = 4-byte). That's why the actual string "Alan"+ 0x0 requires 8 bytes instead of 5.
EDIT
Sorry for confusion!
I think the confusion was: address of "Alan"(and 'names') is x
The address of "Alan" is not the address of names! After initialization, the address of the pointer to "Alan" (that is, &names[0]) is the address of names.
If you define such a variable, the compiler does three things:
allocates 2 different blocks of memory:
One for holding the pointers in the array = 5 * sizeof(char*)
Another for storing the literals ("Alan", ...)
initializes the pointers with proper addresses of the data in initial values.
names[0] = start of "Alan"-string etc.
The pointers are of course aligned in the array, apparently 8 bytes apart on your machine (64-bit). If by address you mean the address of the pointer, then of course the difference will always be 8 (on your machine).

Question about pointers and strings in C [duplicate]

Possible Duplicate:
What is the difference between char s[] and char *s in C?
Difference between char *str = “…” and char str[N] = “…”?
I have some code that has had me puzzled.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char* string1 = "this is a test";
char string2[] = "this is a test";
printf("%i, %i\n", sizeof(string1), sizeof(string2));
system("PAUSE");
return 0;
}
When it outputs the size of string1, it prints 4, which is to be expected because the size of a pointer is 4 bytes. But when it prints string2, it outputs 15. I thought that an array was a pointer, so the size of string2 should be the same as string1 right? So why is it that it prints out two different sizes for the same type of data (pointer)?
Arrays are not pointers. An array name decays to a pointer to the first element of the array in certain situations: when you pass it to a function, when you assign it to a pointer, etc. But otherwise arrays are arrays: (when declared locally) they typically live on the stack, have compile-time sizes that can be determined with sizeof, and all that other good stuff.
Arrays and pointers are completely different animals. In most contexts, an expression designating an array is treated as a pointer.
First, a little standard language (n1256):
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The string literal "this is a test" is a 15-element array of char. In the declaration
char *string1 = "this is a test";
string1 is being declared as a pointer to char. Per the language above, the type of the expression "this is a test" is converted from char [15] to char *, and the resulting pointer value is assigned to string1.
In the declaration
char string2[] = "this is a test";
something different happens. More standard language:
6.7.8 Initialization
...
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
...
22 If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. At the end of its initializer list, the array no longer has incomplete type.
In this case, string2 is being declared as an array of char, its size is computed from the length of the initializer, and the contents of the string literal are copied to the array.
Here's a hypothetical memory map to illustrate what's happening:
Item     Address     0x00  0x01  0x02  0x03
----     -------     ----  ----  ----  ----
no name  0x08001230  't'   'h'   'i'   's'
         0x08001234  ' '   'i'   's'   ' '
         0x08001238  'a'   ' '   't'   'e'
         0x0800123C  's'   't'   0
...
string1  0x12340000  0x08  0x00  0x12  0x30
string2  0x12340004  't'   'h'   'i'   's'
         0x12340008  ' '   'i'   's'   ' '
         0x1234000C  'a'   ' '   't'   'e'
         0x12340010  's'   't'   0
String literals have static extent; that is, the memory for them is set aside at program startup and held until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior; the underlying platform may or may not allow it, and the standard places no restrictions on the compiler. It's best to act as though literals are always unwritable.
In my memory map above, the address of the string literal is set off somewhat from the addresses of string1 and string2 to illustrate this.
Anyway, you can see that string1, having a pointer type, contains the address of the string literal. string2, being an array type, contains a copy of the contents of the string literal.
Since the size of string2 is known at compile time, sizeof returns the size (number of bytes) in the array.
The %i conversion specifier is not the right one to use for expressions of type size_t. If you're working in C99, use %zu. In C89, you would use %lu and cast the expression to unsigned long:
C89: printf("%lu, %lu\n", (unsigned long) sizeof string1, (unsigned long) sizeof string2);
C99: printf("%zu, %zu\n", sizeof string1, sizeof string2);
Note that sizeof is an operator, not a function call; when the operand is an expression that denotes an object, parentheses aren't necessary (although they don't hurt).
string1 is a pointer, but string2 is an array.
The second line is something like int a[] = { 1, 2, 3}; which defines a to be a length-3 array (via the initializer).
The size of string2 is 15 because the initializer is nul-terminated (so 15 is the length of the string + 1).
An array function parameter declared without a size (e.g. char s[]) is really a pointer, so sizeof gives the size of a pointer. An array object counts as its own type for sizeof purposes, and sizeof reports the size of the storage required for the array. Even though string2 is declared without an explicit size, its initializer completes the type: the compiler counts the characters in the quoted string (plus the terminator) and gives the array that fixed size. Arrays are different types from pointers (including pointers to dynamically allocated memory) for the purpose of sizeof behavior, because that's just how C is.
This seems to be a decent reference on the behaviors of sizeof.
The compiler knows that string2 is an array, so sizeof gives the number of bytes allocated to it (14 characters plus the null terminator). Remember that sizeof is an operator evaluated by the compiler (except for variable-length arrays), so it can know the size of a stack variable.
An array is not a pointer. A pointer is a variable holding a memory location, whereas an array is the block of sequentially allocated memory itself.
It's because
string1 holds a pointer to a contiguous sequence of chars (the string literal), which must not be modified.
string2 is the location where your chars themselves sit.
Basically, the C compiler interprets these two differently. This is explained well here: http://c-faq.com/aryptr/aryptr2.html.

Resources