Why can I do
char identifier[4] = {'A', 'B', 'C', 'D'};
and not
char identifier[4];
&identifier = {'A', 'B', 'C', 'D'}; // syntax error : '{'
?
And why can I do
char identifier[4] = "ABCD"; // ABCD\0, isn't that 5 characters??
and not
char identifier[4];
&identifier = "ABCD"; // 'char (*)[4]' differs in levels of indirection from 'char [5]'
?
Is this a joke??
You can only initialize the array when you declare it.
As for char identifier[4] = "ABCD", this is indeed legal C. The literal has 5 characters, but only the first 4 fit, so the trailing NUL character is deliberately omitted and identifier does not hold a string. Write char identifier[] = "ABCD" to let the compiler count the characters and add the NUL ('\0') for you.
Three points:
Initialisation is not assignment
Arrays are not first-class types, so they cannot be assigned. You have to assign the elements individually (or use a function such as strcpy() or memcpy()).
In most expressions the array name on its own is converted to a pointer to the array's first element, so you pass identifier, not &identifier.
In your last example, the following is a valid solution:
char identifier[4];
memcpy(identifier, "ABCD", sizeof(identifier));
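You cannot use strcpy() here, because that would require an array of 5 characters to allow for the nul terminator. The error message about levels of indirection is not a "joke"; it points at your error. Note that in the above code identifier does not have a & operator. &identifier would be a pointer to the whole array, of type char (*)[4], as the message says. The plain array name decays to a char * pointing at the first element, which is the usual way to pass an array to memcpy() or strcpy().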
What Arkku said, but also, you cannot assign to the address of something, i.e. &x = ... is never legal.
Related
I am hoping somebody could please clarify what I am doing wrong.
I am trying to replicate the strcpy function in C.
The exercise requires us to create a loop that goes through a src string and copies the character at each index to the corresponding index of the destination string.
My issue is when I create the test function in int main(): I initialise a character array and assign it some content. It compiles fine; however, I get a norminette error:
// Method 1
char str1[5] = "abcde";// Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
char str2[5] = "fghij"; //Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
If I initialise and assign like below, Norminette is ok but I get a compilation error:
// Method 2
char str1[5] = "abcde";
char str2[5] = "fghij";
str1[] = "abcde"; // error: expected expression before ‘]’ token ... (with an arrow pointing to ] bracket)
str2[] = "fghij"; // error: expected expression before ‘]’ token ... (with an arrow pointing to ] bracket)
// Method 3
char str1[] = {'a', 'b', 'c', 'd', 'e','\0'}; // Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
char str2[] = {'f', 'g', 'h', 'i', 'j', '\0'};//Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
I have also tried various methods including str[5] = "abcde" after declaration with no success.
My question is how can I declare these character arrays to satisfy both the norminette and compiler?
Also, is my understanding correct that in C a character array and a string are interchangeable concepts?
Thank you
In method 2:
str1[] = "abcde"; // error:
is an assignment, so it's invalid. The [] syntax can only be used in a definition. More on this below.
Method 3 is fine. Whoever is flagging this is wrong.
In method 1 [and other places]:
char str1[5] = "abcde";
compiles, but is wrong because it leaves no room for the terminator. It should be:
char str1[6] = "abcde";
to account for the EOS (0x00) string terminator at the end.
An alternate would be:
char str1[] = "abcde";
And, you did this [more or less] in method 3:
char str1[] = {'a', 'b', 'c', 'd', 'e', '\0'};
AFAICT, this was flagged by the external tool (not the compiler?). It is a perfectly valid alternative.
Also, is my understanding correct that in C a character array and a string are interchangeable concepts?
No. In C, a string is a sequence of character values including a 0-valued terminator. The string "abcde" would be represented by the sequence {'a', 'b', 'c', 'd', 'e', 0}. That terminator is how the various string handling routines like strlen and strcpy know where the end of the string is.
Strings (including string literals like "abcde") are stored in arrays of character type, but not every array of character type stores a string - it could be a character sequence with no 0-valued terminator, or it could be a character sequence including multiple 0-valued bytes.
In order to store a string of N characters, the array has to be at least N+1 elements wide:
char str1[6] = "abcde"; // 5 printing characters plus 0 terminator
char str2[6] = "fghij";
You can declare the array without an explicit size, and the size (including the +1 for the terminator) will be determined from the size of the initializer:
char str1[] = "abcde"; // will allocate 6 elements for str1
char str2[] = "fghij";
I've found the documentation for Norminette and ... ugh.
I get that your school wants everyone to follow a common coding standard; that makes it easier to analyze and grade everyone's code. But some of its rules are just plain weird and non-idiomatic and, ironically, encourage bad style. If I'm interpreting it correctly, it wants you to write your initializers as
char str1[]
= "abcde";
or something equally bizarre. Nobody does that.
One way to get around the problem is to not initialize the array in the declaration, but assign it separately using strcpy:
char str1[6]; // size is required since we don't have an initializer
char str2[6];
strcpy( str1, "abcde" );
strcpy( str2, "fghij" );
You cannot use = to assign whole arrays outside of a declaration (initialization is not the same thing as assignment). IOW, you can't write something like
str1 = "abcde";
as a statement.
You either have to use a library function like strcpy or strncpy (for strings) or memcpy (for things that aren't strings), or you have to assign each element individually:
str1[0] = 'a';
str1[1] = 'b';
str1[2] = 'c';
...
Why does
char *names [] = {"hello", "Jordan"};
work fine
but this does not
char names [] = {"hello", "Jordan"};
I would appreciate it if someone could explain this to me, thank you :).
Here
char *names [] = {"hello", "Jordan"};
names is an array of char pointers: each element of names holds a pointer to the first character of a separate char array (here, a string literal). But here
char names [] = {"hello", "Jordan"};
names is just a char array, i.e. it can hold only a single string like "hello", not several.
In the second case, for example
int main(void) {
char names[] = {"hello", "Jordan"};
return 0;
}
when you compile it (I suggest compiling with the -Wall -pedantic -Wstrict-prototypes -Werror flags), the compiler clearly says
error: excess elements in char array initializer
which means you can't supply more than one string literal in this case. A correct version is
char names[] = {'h','e','l','l','o','\0'}; /* here names is array of characters */
Edit: there is one more possibility, if names is written like this:
char names[] = { "hello" "Jordan" }; /* this is valid */
Here the adjacent string literals "hello" and "Jordan" are concatenated at compile time into a single literal, so names becomes the single char array "helloJordan", i.e. the same as:
char names[] = { "helloJordan" };
The first is an array of pointers to char. The second is an array of char and would have to look like char names[] = {'a', 'b', 'c'}
A string literal, such as "hello", is stored in static memory as an array of chars. In fact, a string literal has type char [N], where N is the number of characters in the array (including the \0 terminator). In most cases, an expression of array type decays to a pointer to the first element of the array, so in most expressions a string literal such as "hello" will decay to a pointer to the char element 'h'.
char *names[] = { "hello", "Jordan" };
Here the two string literals decay to pointers to char which point to 'h' and 'J', respectively. That is, here the string literals have type char * after the conversion. These types agree with the declaration on the left, and the array names[] (which is not an array of character type, but an array of char *) is initialized using these two pointer values.
char names[] = "hello";
or similarly:
char names[] = { "hello" };
Here we encounter a special case. Array identifiers are not converted to pointers to their first elements when they are operands of the sizeof operator or the unary & operator, or when they are string literals used to initialize an array of character type. So in this case, the string literal "hello" does not decay to a pointer; instead the characters contained in the string literal are used to initialize the array names[].
char names[] = {"hello", "Jordan"};
Again, the string literals would be used to initialize the array names[], but there are excess initializers in the initializer list. This is a constraint violation according to the Standard. From §6.7.9 ¶2 of the C11 Draft Standard:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
A conforming implementation must issue a diagnostic in the event of a constraint violation, which may take the form of a warning or an error. On the version of gcc that I am using at the moment (gcc 6.3.0) this diagnostic is an error:
error: excess elements in char array initializer
Yet, for arrays of char that are initialized by an initializer list of char values rather than by string literals, the same diagnostic is a warning instead of an error.
In order to initialize an array of char that is not an array of pointers, you would need a 2d array of chars here. Note that the second dimension is required, and must be large enough to contain the longest string in the initializer list, including its terminator:
char names[][100] = { "hello", "Jordan" };
Here, each string literal is used to initialize an array of 100 chars contained within the larger 2d array of chars. Or, put another way, names[][] is an array of arrays of 100 chars, each of which is initialized by a string literal from the initializer list.
char name[] is an array of characters so you can store a word in it:
char name[] = "Muzol";
This is the same as:
char name[] = {'M', 'u', 'z', 'o', 'l', '\0'}; /* '\0' is the null character; it marks the end of the string */
And char* names[] is an array of pointers, where each element points to the first character of a separate array of characters.
char* name[] = {"name1", "name2"};
It's roughly the same as:
char name1[] = {'n', 'a', 'm', 'e', '1', '\0'}; /* or char name1[] = "name1"; */
char name2[] = {'n', 'a', 'm', 'e', '2', '\0'}; /* or char name2[] = "name2"; */
char* names[] = { name1, name2 };
So basically names[0] holds &name1[0], i.e. it points to name1[0]. Code reading through it can go on until name1[5], where it finds the '\0' (null character) and stops. The same happens for names[1] and name2[].
The following is accepted as valid C code by gcc version 6.3:
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
However the following fails:
char *green = { 'a', 'b', 'c' }; // gcc error
I am sure there is a perfectly rational reason for this to be the case, but I am wondering what it is. This question is motivated by the case when having to initialize an array of bytes (so unsigned char rather than char), it is very tempting to write something like { '\x43', '\xde', '\xa0' } rather than "\x43\xde\xa0", and as soon as you forget to write my_array[] instead of *my_array, you get caught by the compiler.
The following will produce an error
char *green = { 'a', 'b', 'c' };
Because the initializer for green isn't an array of characters as you believe. It doesn't have a type; it's just a brace-enclosed initializer list. The thing that it initializes (in the previous samples, white) determines how it's interpreted. The same initializer can be used to initialize any aggregate that is capable of holding 3 characters.
But green is a pointer, not an aggregate, so you can't use a brace-enclosed list of several values as its initial value. A scalar accepts at most one initializer, optionally enclosed in braces.
Now, the following two work but with very different semantics:
char blue[] = "abc";
char *red = "abc";
blue is an array. It will hold the same contents as the literal "abc". red is a pointer that points at the literal "abc".
You can use a compound literal expression:
char *green = (char[]){ 'a', 'b', 'c' };
It tells the compiler to create an unnamed object (whose lifetime depends on the scope of the declaration) of character array type, initialized with those three characters. The pointer is then initialized with the address of that object's first element.
These three declarations
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
are different.
The first one declares a character array that contains exactly three characters corresponding to the number of the initializers.
The second one declares a character array of four characters because it is initialized by a string literal that has four characters including the terminating zero. So this character array contains a string.
The third one defines a string literal, which is a character array, and declares a pointer of type char * that is initialized by the address of the first character of the character array corresponding to the string literal.
You can imagine this declaration like
char unnamed[] = { 'a', 'b', 'c', '\0' };
char *red = unnamed;
This declaration
char *green = { 'a', 'b', 'c' };
is invalid because the left object is a scalar and may not be initialized by a list that contains more than one initializer.
Note that you could use a compound literal to initialize the pointer. For example
char *green = ( char[] ){ 'a', 'b', 'c' };
I've recently started trying to learn the C programming language. In my first program (a simple hello world thing) I came across the different ways to declare a string after I realised I couldn't just do variable_name = "string data":
char *variable_name = "data";
char variable_name[] = "data";
char variable_name[5] = "data";
What I don't understand is the difference between them. I know they are different and one of them specifically allocates an amount of memory to store the data in but that's about it, and I feel like I need to understand this inside out before moving onto more complex concepts in C.
Also, why does using *variable_name let me reassign the variable name to a new string but variable_name[number] or variable_name[] does not? Surely if I assign, say, 10 bytes to it (char variable_name[10] = "data") and try reassigning it to something that is 10 bytes or smaller it should work, so why doesn't it?
What are the empty brackets and the asterisk doing?
In this declaration
char *variable_name = "data";
a pointer is declared. This pointer points to the first character of the string literal "data". The compiler places the string literal in some region of memory and initializes the pointer with the address of the literal's first character.
You may reassign the pointer. For example
char *variable_name = "data";
char c = 'A';
variable_name = &c;
However you may not change the string literal itself. An attempt to change a string literal results in undefined behaviour of the program.
In these declarations
char variable_name[] = "data";
char variable_name[5] = "data";
two arrays are declared, and their elements are initialized with the characters of the string literals used as initializers. For example this declaration
char variable_name[] = "data";
is equivalent to the following
char variable_name[] = { 'd', 'a', 't', 'a', '\0' };
The array will have 5 elements. So this declaration is fully equivalent to the declaration
char variable_name[5] = "data";
There is a difference if you would specify some other size of the array. For example
char variable_name[7] = "data";
In this case the array would be initialized the following way
char variable_name[7] = { 'd', 'a', 't', 'a', '\0', '\0', '\0' };
That is, all elements of the array that do not have explicit initializers are zero-initialized.
Note that in C you may declare a character array using a string literal the following way
char variable_name[4] = "data";
that is, the terminating zero of the string literal is not stored in the array.
In C++ such a declaration is invalid.
Of course you may change elements of the array (if it is not defined as a constant array) if you want.
Note also that you may enclose a string literal used as an initializer in braces. For example
char variable_name[5] = { "data" };
In C99 you may also use so-called designated initializers. For example
char variable_name[] = { [4] = 'A', [5] = '\0' };
Here is a demonstrative program
#include <stdio.h>
#include <string.h>
int main(void)
{
char variable_name[] = { [4] = 'A', [5] = '\0' };
printf( "%zu\n", sizeof( variable_name ) );
printf( "%zu\n", strlen( variable_name ) );
return 0;
}
The program output is
6
0
When you apply the standard C function strlen, declared in header <string.h>, it returns 0. The elements of the array that precede the element with index 4 are zero-initialized, so the very first element is already a terminator.
I've been reading some code and I encountered the following:
int function(){
char str[4] = "ABC\0";
int number;
/* .... */
}
Normally, when you write a string literal to initialize a char array, the string should be null terminated implicitly right? What happens in this case? Does the compiler recognize the '\0' in the string literal and make that the null terminator? or does it overflow to the int number? Is there anything wrong with this form?
The C99 standard §6.7.8 ¶14 says
An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
This means that the following statements are equivalent.
char str[4] = "ABC\0";
// equivalent to
char str[4] = "ABC";
// equivalent to
char str[4] = {'A', 'B', 'C', '\0'};
So there's nothing wrong with the first statement above. As the standard explicitly states, the number of characters of the string literal used to initialize the array is limited to the number of elements in the array. Note that the string literal "ABC\0" actually contains five characters; '\0' is just like any other character, so it's fine.
However please note that there's a difference between
char str[4] = "ABC\0";
// equivalent to
char str[4] = {'A', 'B', 'C', '\0'};
char str[] = "ABC\0"; // sizeof(str) is 5
// equivalent to
char str[] = {'A', 'B', 'C', '\0', '\0'};
That's because the string literal "ABC\0" contains 5 characters and all these characters are used in the initialization of str when the size of the array str is not specified. Contrary to this, when the size of str is explicitly stated as 4, then only the first 4 characters in the literal "ABC\0" are used for its initialization as clearly mentioned in the above quoted para from the standard.
If the code is:
char str[3] = "ABC";
It's fine in C, but the character array str is not a string because it's not null-terminated. See the C FAQ entry "Is char a[3] = "abc"; legal? What does it mean?" for details.
In your example:
char str[4] = "ABC\0";
The last character of the array str happens to be set to '\0', so it's fine and it's a string.