Is a number between the 2 square brackets necessary regarding char variables - arrays

I just started programming in C and I see on the internet people declaring chars like, for example, char name[], without putting any number between the 2 square brackets. I used to code in C++ in high school and we always used to put a number when declaring char. Can someone explain when do we put a number, or if it is even necessary to use one when declaring a char variable?
I used to do it like this:
char name[20] = "John";
but I've seen on the internet people doing it like this:
char name[] = "John";

char curse[] = "fie! cometh h're and englut mine own coxcomb thee distemperate fooleth!";
In this case, the compiler automatically computes the number of bytes, including the null-terminator. The size of the array is equal to the length of the string, plus the null-byte.
char curse[100] = "fie! cometh h're and englut mine own coxcomb thee distemperate fooleth!";
Whereas this defines an array of 100 chars and initializes it with a string of 71 characters (72 bytes with the null terminator). The rest of the bytes are initialized to 0¹. The size of the array is 100 bytes, whereas the length of the string can be determined with strlen. Note that the size of the array and the length of the string it contains are not the same.
Aside: It's better to define the array size as a macro instead of having magic numbers all over your code. It's easier to maintain that way.
[1] From C11:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
i.e. 0.

char name[] = "John"; defines a char array just long enough to hold "John" (5 chars, including the null terminator).
char name1[20] = "John"; defines an array of 20 chars, of which the initializing string takes only 5 bytes (the remaining bytes are zeroed). You can use those remaining 15 bytes, for example, by appending another string to it:
strcat(name1, " Travolta");

From the C Standard (6.7.9 Initialization)
22 If an array of unknown size is initialized, its size is determined
by the largest indexed element with an explicit initializer. The array
type is completed at the end of its initializer list.
and
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
So in this declaration of a character array
char name[] = "John";
that is according to the second quote from the C Standard equivalent to
char name[] = { "John" };
the array name is initialized by the string literal "John".
According to the first quote from the C Standard the size of the array is determined by the number of characters (including the terminating zero character '\0') in the string literal. In fact the above declaration has the same effect as the following declaration
char name[] = { 'J', 'o', 'h', 'n', '\0' };
So as the number of characters in the string literal is equal to 5 then the array has exactly 5 elements and its size is also equal to 5 because sizeof( char ) is always equal to 1.
In this declaration
char name[20] = "John";
that again may be written like
char name[20] = { "John" };
the array is declared specifying explicitly 20 elements. The first 5 elements of the array have corresponding explicit initializers (characters of the string literal). All other elements of the array are implicitly initialized by 0.
Note that in C you may write
char name[4] = "John";
that is you may exclude the terminating zero character '\0' of the string literal from the list of initializers of the array. In this case the array name will not contain a string. In C++ such an initialization is incorrect and the C++ compiler will issue an error for such a declaration.
To output a character array that contains a string you can write for example
printf( "name = %s\n", name );
For the last shown declaration where the declared array does not contain a string you can write
printf( "name = %.*s\n", ( int )sizeof( name ), name );

If an array is declared without a size but with an initializer, then the size of the array is taken from the number of elements in the initializer:
int foo[] = {0, 1, 2, 3};
In this case, foo will be 4 elements wide.
char bar[] = "fred";
char bar[] = { 'f', 'r', 'e', 'd', 0 };
In this case, bar will be 5 elements wide - the initializers "fred" and { 'f', 'r', 'e', 'd', 0 } are equivalent to each other.
If you declare an array with a size and an initializer, then the size will be taken from the size expression. If there are fewer elements in the initializer than the size, then remaining elements will be initialized to 0:
int foo[10] = {1, 2, 3}; // elements 3 through 9 will be initialized to 0
If there are more elements in the initializer than the array is sized for, then that's a constraint violation and the compiler will yell at you:
int foo[3] = {1, 2, 3, 4}; // constraint violation, too many elements
// in the initializer
If you use designated initializers, then the size will either be taken from the number of initializers or the largest designated initializer:
int foo[] = {[0] = 1, [9] = 2};
foo will be 10 elements wide, element 0 will be initialized to 1, element 9 will be initialized to 2, and the remaining elements will be initialized to 0.
If an array is declared without a size and without an initializer, then the array type is incomplete, and you can't create an instance of an incomplete type.
Note that array sizes are fixed for the lifetime of the array. C has something called a variable-length array where the size of the array is determined by a runtime variable:
size_t x = some_size();
int arr[x];
However, "variable" in this context only means that the size of the array isn't fixed from definition to definition, not that the array can be resized after it is defined.
In the context of a function parameter declaration, T a[N] and T a[] are "adjusted" to T *a - you cannot pass arrays as function parameters, because array expressions "decay" to pointer expressions under most circumstances and what the function actually receives is a pointer.

Related

Why initializing an array with 0 clears the entire buffer?

I am initializing my array with 0, and the whole buffer comes out clean. What happens to the bits? When I initialize with 'a' instead, the result is not the same: the buffer is not filled with 'a', whereas with memset the whole buffer would be filled with 'a'. Why?
#include <stdio.h>
#include <string.h>
int main(void) {
    char buffer[256] = {0}, array[256] = {'a'};
    char array1[256];
    memset(array1, 'a', sizeof(array1));
    printf("%c\n%c\n%c\n", buffer[1], array[1], array1[1]);
    return 0;
}
If the initialiser does not provide enough elements to initialise the complete variable, the rest is initialised as if the variable were declared globally (that is, with static storage duration):
integers to 0
floats to 0.
pointers to NULL.
In your particular example the remaining elements of the char-array array will be following the above rule for integers.
The initialization in the case of array[256] = {'a'}; happens as per this rule:
6.7.9 Initialization
...
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
So only the first element of array will have the value 'a'.
But in the case of memset function,
void *memset(void *s, int c, size_t n);
the function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s.
So in this case all the elements of array1 will have the value 'a'.
When you enter a function, in this case main(), the stack grows by the amount needed for the stack frame. The stack frame has space for all of the autos (variables declared inside the function) as well as other information not relevant here.
So in this case when you write
char array[256]
as the program enters the function, the stack grows by enough to make room for the 256 characters of the array. The values of those characters are indeterminate: this area of memory may have been written to previously by another function that no longer needs it, so we don't know what the array contains.
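When you write
char array[256] = {'a'}
the first element is set to 'a' and, because an initializer is present, the remaining 255 elements are implicitly initialized to 0. It is therefore not equivalent to:
char array[256];
array[0] = 'a';
In that second form we have not defined what is in the rest of the array.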
When you do
memset(array, 'a', sizeof(array))
the CPU will need to go through the entire array and initialize each char in the array to 'a', creating a known value for everything in the array at the cost of using a little more CPU.

C: different String definition, I get different size using sizeof()

I was testing the use of sizeof() for the same String content "abc". my function is like this:
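#include <stdio.h>

int main(void){
    char* pass1 = "abc";
    char pass2[] = "abc";
    char pass3[4] = "abc";
    char pass4[] = "";
    scanf("%s", pass4); /* bug: pass4 has room for 1 char, so any non-empty input overflows it */
    /* %zu is the conversion specifier for size_t, the type sizeof yields */
    printf("sizeof(pass1) is: %zu\n", sizeof(pass1));
    printf("sizeof(pass2) is: %zu\n", sizeof(pass2));
    printf("sizeof(pass3) is: %zu\n", sizeof(pass3));
    printf("sizeof(pass4) is: %zu\n", sizeof(pass4));
    return 0;
}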
I input "abc" for pass4, the output is like this:
sizeof(pass1) is: 8
sizeof(pass2) is: 4
sizeof(pass3) is: 4
sizeof(pass4) is: 1
I was expecting all 4s. I thought the 4 above string definitions are the same.
why sizeof(pass1) returns 8? Why sizeof(pass4) is 1?
When you take sizeof of a pointer type, you get the size in bytes of a memory address; in this case that is 8. sizeof applied to an array, including one initialized from a string literal, returns the size in bytes of the whole array, including the null byte.
sizeof gives the size of its operand. To understand the results you are seeing, you need to understand what pass1, pass2, pass3, and pass4 actually are.
pass1 is a pointer to char (i.e. a char *) so sizeof pass1 gives the size of a pointer (a variable which contains a memory address, not an array). That is 8 with your compiler. The size of a pointer is implementation defined, so this may give different results with different compilers. The fact you have initialised pass1 so it points at the first character of a string literal "abc" does not change the fact that pass1 is declared as a pointer, not an array.
pass2 is an array initialised using the literal "abc" which, by convention, is represented in C using an array of four characters (the three letters 'a' to 'c', plus an additional character with value zero, '\0').
pass3 is also an array of four char, since it is declared that way: char pass3[4] = <etc>. sizeof pass3 is 4 regardless of the initializer. Note that an over-long initializer such as char pass3[4] = "abcdef" is a constraint violation in C, so the compiler must diagnose it; a compiler that accepts it with a warning uses only 'a' to 'd' and ignores 'e', 'f', and '\0'.
Since both pass2 and pass3 are arrays of four characters, their size is 4 (in general, the size of an array is the size of the array element multiplied by the number of elements). The standard defines sizeof(char) to be 1, and 1*4 has a value of 4.
pass4 is initialised using the literal "". That string literal is represented using a single char with value '\0' (and no characters before it, since none are between the double quotes). So pass4 has size 1 for the same reason that pass2 has size 4.
In this declaration
char* pass1 = "abc";
the initializer on the right side is a character array of size 4. String literals have character array types, and they include the terminating zero. You can check this the following way
printf("sizeof( \"abc\") is: %zu\n", sizeof( "abc"));
In the declaration the array is used to initialize a pointer. Used as initializer of a pointer the array is implicitly converted to the pointer to its first element.
Thus the pointer pass1 points to the first element of the string literal "abc". The size of the pointer itself in your system is equal to 8 bytes.
In these declarations
char pass2[] = "abc";
char pass3[4] = "abc";
the string literal is used to initialize arrays. In this case each element of the arrays is initialized by the corresponding element of the string literal. All other elements of the arrays are zero initialized. If the size of an array is not specified then it is calculated from the number of initializers.
So in this declaration
char pass2[] = "abc";
the array pass2 will have 4 elements because the string literal provides four initializers.
In this declaration
char pass3[4] = "abc";
it is explicitly specified that the array has 4 elements.
Thus both arrays have a size of 4, and the pointer declared first has a size of 8 bytes.

Where is the null-character in a fixed-length empty string? [duplicate]

So I got curious reading some C code; let's say we have the following code:
char text[10] = "";
Where does the C compiler then put the null character?
I can think of 3 possible cases
1. In the beginning, and then 9 characters of whatever used to be in memory
2. In the end, so 9 characters of garbage, and then a trailing '\0'
3. It fills it completely with 10 '\0'
The question is, depending on either case, whether it's necessary to add the trailing '\0' when doing a strncpy. If it's case 2 and 3, then it's not strictly necessary, but a good idea; and if it's case 1, then it's absolutely necessary.
Which is it?
In your initialization, the text array is filled with null bytes (i.e. option #3).
char text[10] = "";
is equivalent to:
char text[10] = { '\0' };
That is, the first element of text is explicitly initialized to zero and the rest of the elements are implicitly zero-initialized, as required by C11, Initialization 6.7.9, 21:
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
Quoting N1256 (roughly C99), since there are no relevant changes to the language before or after:
6.7.8 Initialization
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
"" is a string literal consisting of one character (its terminating null character), and this paragraph states that that one character is used to initialise the elements of the array, which means the first character is initialised to zero. There's nothing in here that says what happens to the rest of the array, but there is:
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
This paragraph states that the remaining characters are initialised the same as if they had static storage duration, which means the rest of the array gets initialised to zero as well.
Worth mentioning here as well is the "if there is room" in p14:
In C, char a[5] = "hello"; is perfectly valid too, and for this case too you might want to ask where the compiler puts the null character. The answer here is: it doesn't.
String literal "" has type of character array char[1] in C and const char [1] in C++.
You can imagine it the following way
In C
char no_name[] = { '\0' };
or
in C++
const char no_name[] = { '\0' };
When a string literal is used to initialize a character array then all its characters are used as initializers. So for this declaration
char text[10] = "";
you in fact have
char text[10] = { '\0' };
All other characters of the array (that is, all except the first character, text[0]) have no corresponding initializers, so they are initialized by 0.
From the C Standard (6.7.9 Initialization)
14 An array of character type may be initialized by a character string
literal or UTF−8 string literal, optionally enclosed in braces.
Successive bytes of the string literal (including the terminating null
character if there is room or if the array is of unknown size)
initialize the elements of the array.
and
21 If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration
and at last
10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static
or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized
(recursively) according to these rules, and any padding is initialized
to zero bits;
Something similar is written in the C++ Standard.
Take into account that in C you may write for example the following way
char text[5] = "Hello";
         ^^^
In this case the character array will not have the terminating zero because there is no room for it. :) It is the same as if you defined
char text[5] = { 'H', 'e', 'l', 'l', 'o' };

What's the difference between these 4 items: character, array, string, literal, in C?

There are some basic concepts that confused me when reading a C book recently.
It says: A variable that points to a string literal can’t be used to change the contents of the string.
As far as I know there are also character literals and integer literals. What about them? Can they also not be updated? If so, can you give an example?
Besides, what's the difference between a literal and an array? Are a character array and a string literal actually the same thing?
What should I call the variable below? An integer array? An integer literal?
int contestants[] = {1, 2, 3};
I've put together some examples but I'm still somewhat confused:
char s[] = "How big is it?"; //s is an array variable
char *t = s; //t is a pointer variable to a string literal "How big is it?"
string literal:"ABC"
Character literal:'a'
Integer literal:1
I'm confused by these 4 items: character, array, string, literal.
Are a character array and a string literal the same thing?
Are an array of characters and an array literal the same?
A literal is a token in a program text that denotes a value. There are string literals like "123", character literals like 'a' and numeric literals like 7.
int contestants[] = {1, 2, 3};
In the program fragment above there are three literals 1 2 and 3 and no others. In particular, neither contestants nor {1, 2, 3} are literals.
It is worth noting that the C standard uses the word literal only in reference to string literals. The other kinds are officially known as constants. But you may find them referred to as literals in all kind of places so I have included them here. "Integer literal" and "integer constant" are the same thing.
A string literal is also an object (a piece of data, a region of storage) in a program that is associated with a string literal in the previous sense. This piece of data is a character array. Not every character array is a string literal. No int array is a literal.
A pointer can point to a string literal, but not to a character literal or to an integer literal, because the latter two kinds are not objects (have no storage associated with them). A pointer can only point to an object. You cannot point a pointer to a literal 5. So the question of whether such things can be modified does not arise.
char* p = "123";
In the program fragment above, "123" is a literal and p points to it. You cannot modify the object pointed to by p.
char a[] = "123";
In the program fragment above, a is a character array. It is initialized with a string literal "123", but it is not a literal itself and can be modified freely.
int i = 5;
Above, 5 is a literal and i is not. i is initialized with a literal, but it isn't one itself.
int k[] = {1, 2, 3};
int* kp = k;
In the lines above, much like in the one before them, neither the array k nor its elements are literals. They are merely initialized with literals. kp is a pointer that points to the first element of the array. One can update the array through this pointer: kp[1] = 3;
Strings:
char s[] = "How big is it?"; //array variable
Here s is an array that holds the string, and it can be both read and written: you can modify the values of the array. The array size is calculated from the length of the string literal "How big is it?", including its terminating null character.
What is a string literal?
char *p = "someString";
Here the string someString is typically stored in a read-only location, and the address of that location is stored in your pointer. You must not write to the location the pointer is pointing to; attempting to do so is undefined behavior.
Integers:
int a[] = {1,2,3};
a is an array which holds values and they can be read also be modified.
In the code
int i;
for(i=0;i<10;i++)
10 is an integer literal: it represents the decimal value ten and is written directly in the code.
One more example is
int b;
b=1;
Now 1 is an integer literal.
A literal is a syntactic form that directly represents a value in a programming language. Thus, 1 + 64 is an expression that evaluates to 65 and is not a literal; x after int x = 65 also evaluates to 65 but is not a literal. 65 is a literal that represents 65, and 0x41 is the same; 65L is also a literal that represents a "long integer" version of 65. 'A' is another literal that also represents the number 65 (in ASCII); note that in C a character constant such as 'A' has type int, whereas in C++ it has type char. "ABC" is a string literal that, when put into code, represents a four-element array of characters filled with the values 65, 66, 67 and 0. You could also use the compound literal (char[]){ 65, 66, 67, 0 }, and it would represent the same value, since strings are arrays of characters.
Meanwhile, an array is a data structure that can contain multiple values, each value indexed by an integer. Arrays can have literal syntax (as demonstrated above, or e.g. in JavaScript), and literals can be of arrays; but the two are apples and oranges.
tl;dr: arrays are a specific kind of data structure; literal is how you write data in code.
For int contestants[] = {1, 2, 3};
contestants is array of 3 items of type int initialized by 3 literals 1, 2 and 3.
So literals are particular values written in the code, and you should not mix up the terms literal (a value of some type) and array (which also has a type, but is a data structure).
Concerning your example with strings
char s[] = "How big is it?"; //array variable
char *t = s; //pointer variable to a string literal
I understand t as a pointer to the first element of array, that was initialized with string literal.
To start with, let's see some definitions.
Array
An array is an ordered data structure consisting of a collection of elements (values or variables), each identified by one (single dimensional array, or vector) or multiple indexes. The elements are stored in contiguous memory locations.
String Literals [From C11 standard, chapter 6.4.5]
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz".
Integer constants [From C11 standard, chapter 6.4.4.1]
An integer constant begins with a digit, but has no period or exponent part. It may have a
prefix that specifies its base and a suffix that specifies its type.
So, int x = 5;, here 5 is an integer constant.
Character constants [From C11 standard, chapter 6.4.4.4]
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
So, char y = 'S';, here 'S' is a character constant.
Now,
int contestants[] = {1, 2, 3};
contestants here is an integer array.
char s[] = "How big is it?";
s is a character array which, being null terminated, can also be referred to as a string. OTOH, "How big is it?" is an unnamed string literal. We're initializing the contents of s using the string literal. s is present in read-write memory and its contents are modifiable.
char * point = "Hello World";
point is a pointer to the string literal "Hello World". The literal is usually stored in a read-only memory location, and altering it is not allowed.

Is a char array initialized with zeros/null after string end, when there's still space left?

Consider the following char array:
char str[128] = "abcd";
Are all the remaining uninitialized chars in the rest of the array (from str[4] to str[127]) zero/null filled?
Yes, if there are fewer elements explicitly given in an initialiser than the aggregate contains, then the remaining elements are initialised as if the aggregate had static storage duration. For integer types (and char is one) that means with 0s.
The relevant section of the standard is 6.7.9 (21):
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
String literals as initialisers for char arrays are equivalent to brace-enclosed initialisers in that respect.
Yes, the string literal initializer is identical to the following initializer:
char str[128] = { 'a', 'b', 'c', 'd', 0 };
Missing array elements are zero-initialized, hence the remainder of the array is all zeros.

Resources