Declaring and modifying strings in C - c

I've recently started trying to learn the C programming language. In my first program (a simple hello world thing) I came across the different ways to declare a string after I realised I couldn't just do variable_name = "string data":
char *variable_name = "data"
char variable_name[] = "data"
char variable_name[5] = "data"
What I don't understand is the difference between them. I know they are different and one of them specifically allocates an amount of memory to store the data in but that's about it, and I feel like I need to understand this inside out before moving onto more complex concepts in C.
Also, why does using *variable_name let me reassign the variable name to a new string but variable_name[number] or variable_name[] does not? Surely if I assign, say, 10 bytes to it (char variable_name[10] = "data") and try reassigning it to something that is 10 bytes or smaller it should work, so why doesn't it?
What are the empty brackets and the asterisk doing?

In this declaration
char *variable_name = "data";
a pointer is declared. It points to the first character of the string literal "data". The compiler places the string literal in some region of memory and initializes the pointer with the address of the literal's first character.
You may reassign the pointer. For example
char *variable_name = "data";
char c = 'A';
variable_name = &c;
However you may not change the string literal itself. An attempt to change a string literal results in undefined behaviour of the program.
In these declarations
char variable_name[] = "data";
char variable_name[5] = "data";
two arrays are declared, and their elements are initialized with the characters of the string literals used as initializers. For example, this declaration
char variable_name[] = "data";
is equivalent to the following
char variable_name[] = { 'd', 'a', 't', 'a', '\0' };
The array will have 5 elements. So this declaration is fully equivalent to the declaration
char variable_name[5] = "data";
There is a difference if you specify some other size for the array. For example
char variable_name[7] = "data";
In this case the array would be initialized the following way
char variable_name[7] = { 'd', 'a', 't', 'a', '\0', '\0', '\0' };
That is all elements of the array that do not have explicit initializers are zero-initialized.
Note that in C you may also declare a character array using a string literal in the following way
char variable_name[4] = "data";
in which case the terminating zero of the string literal is not placed in the array.
In C++ such a declaration is invalid.
Of course you may change elements of the array (if it is not defined as a constant array) if you want.
Take into account that you may enclose a string literal used as an initializer in braces. For example
char variable_name[5] = { "data" };
In C99 you may also use so-called designated initializers. For example
char variable_name[] = { [4] = 'A', [5] = '\0' };
Here is a demonstrative program
#include <stdio.h>
#include <string.h>
int main(void)
{
char variable_name[] = { [4] = 'A', [5] = '\0' };
printf( "%zu\n", sizeof( variable_name ) );
printf( "%zu\n", strlen( variable_name ) );
return 0;
}
The program output is
6
0
When you apply the standard C function strlen, declared in the header <string.h>, it returns 0 because the elements of the array that precede the element with index 4 are zero-initialized.

Related

Declaration of a character array in C - Clash between Norminette and Compiler

I am hoping somebody could please clarify what I am doing wrong.
I am trying to replicate the strcpy function in C.
The exercise requires us to loop through a src string and replace the content at each corresponding index of the destination string.
My issue is when I create the test function in int main() I initialise a character array and assign it some content. It compiles fine however I get a norminette error:
// Method 1
char str1[5] = "abcde";// Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
char str2[5] = "fghij"; //Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
If I initialise and assign like below, Norminette is ok but I get a compilation error:
// Method 2
char str1[5] = "abcde";
char str2[5] = "fghij";
str1[] = "abcde"; // error: expected expression before ‘]’ token ... (with an arrow pointing to ] bracket)
str2[] = "fghij"; // error: expected expression before ‘]’ token ... (with an arrow pointing to ] bracket)
// Method 3
char str1[] = {'a', 'b', 'c', 'd', 'e','\0'}; // Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
char str2[] = {'f', 'g', 'h', 'i', 'j', '\0'};//Error: DECL_ASSIGN_LINE Declaration and assignation on a single line
I have also tried various methods including str[5] = "abcde" after declaration with no success.
My question is how can I declare these character arrays to satisfy both the norminette and compiler?
Also, is my understanding correct that in C, a character array and a string are interchangeable concepts?
Thank you
In method 2:
str1[] = "abcde"; // error:
is an assignment, so it's invalid. The [] syntax can only be used in a definition. More on this below.
Method 3 is fine. Whoever is flagging this is wrong.
In method 1 [and other places]:
char str1[5] = "abcde";
is wrong if you want a string (it compiles in C, but the array then has no room for the terminator); it should be:
char str1[6] = "abcde";
to account for the EOS (0x00) string terminator at the end.
An alternative would be:
char str1[] = "abcde";
And, you did this [more or less] in method 3:
char str1[] = {'a', 'b', 'c', 'd', 'e', '\0'};
AFAICT, this was flagged by the external tool (not the compiler?). It is a perfectly valid alternative.
Also, is my understanding correct that in C, a character array and a string are interchangeable concepts?
No. In C, a string is a sequence of character values including a 0-valued terminator. The string "abcde" would be represented by the sequence {'a', 'b', 'c', 'd', 'e', 0}. That terminator is how the various string handling routines like strlen and strcpy know where the end of the string is.
Strings (including string literals like "abcde") are stored in arrays of character type, but not every array of character type stores a string - it could be a character sequence with no 0-valued terminator, or it could be a character sequence including multiple 0-valued bytes.
In order to store a string of N characters, the array has to be at least N+1 elements wide:
char str1[6] = "abcde"; // 5 printing characters plus 0 terminator
char str2[6] = "fghij";
You can declare the array without an explicit size, and the size (including the +1 for the terminator) will be determined from the size of the initializer:
char str1[] = "abcde"; // will allocate 6 elements for str1
char str2[] = "fghij";
I've found the documentation for Norminette and ... ugh.
I get that your school wants everyone to follow a common coding standard; that makes it easier to analyze and grade everyone's code. But some of its rules are just plain weird and non-idiomatic and, ironically, encourage bad style. If I'm interpreting it correctly, it wants you to write your initializers as
char str1[]
= "abcde";
or something equally bizarre. Nobody does that.
One way to get around the problem is to not initialize the array in the declaration, but assign it separately using strcpy:
char str1[6]; // size is required since we don't have an initializer
char str2[6];
strcpy( str1, "abcde" );
strcpy( str2, "fghij" );
You cannot use = to assign whole arrays outside of a declaration (initialization is not the same thing as assignment). IOW, you can't write something like
str1 = "abcde";
as a statement.
You either have to use a library function like strcpy or strncpy (for strings) or memcpy (for things that aren't strings), or you have to assign each element individually:
str1[0] = 'a';
str1[1] = 'b';
str1[2] = 'c';
...

what is the relation between these char str[10], char *str and char *str[10] in C?

Can I consider *str[10] as two dimensional array ?
If I declare char *str[10]={"ONE","TWO","THREE"} how we can access single character ?
This record
char str[10];
is a declaration of an array with 10 elements of the type char. For example, you can initialize the array like
char str[10] = "ONE";
This initialization is equivalent to
char str[10] = { 'O', 'N', 'E', '\0' };
All elements of the array that are not explicitly initialized are zero-initialized.
And you may change elements of the array like
str[0] = 'o';
or
strcpy( str, "TWO" );
This record
char *str;
declares a pointer to an object of the type char. You can initialize it for example like
char *str = "ONE";
In this case the pointer will be initialized with the address of the first character of the string literal.
This record
char * str[10];
is a declaration of an array of 10 elements, each of the pointer type char *.
You can initialize it as for example
char * str[10] = { "ONE", "TWO", "THREE" };
In this case the first three elements of the array will be initialized by addresses of first characters of the string literals specified explicitly. All other elements will be initialized as null pointers.
You may not change the string literals pointed to by elements of the array. Any attempt to change a string literal results in undefined behavior.
To access individual characters of the string literals through the array, you can apply the subscript operator twice. For example
for ( size_t i = 0; str[0][i] != '\0'; ++i )
{
putchar( str[0][i] );
}
putchar( '\n' );
If you want to change strings then you need to declare for example a two dimensional array like
char str[][10] = { "ONE", "TWO", "THREE" };
In this case you can change elements of the array that are in turn one-dimensional arrays as for example
str[0][0] = 'o';
or
strcpy( str[0], "FOUR" );
Yes: char* str[10]; would create an array of 10 pointers to chars.
To access a single character, we can access it like a 2 dimensional array; i.e.:
char* str[10]={"ONE","TWO","THREE"};
char first = str[0][0];
Can I consider *str[10] as two dimensional array ?
It's unclear what you mean. *str[10] is not a valid type name, and the context is a bit lacking to determine how else to interpret it.
If you mean it as an expression referencing the subsequent definition then no, its type is char, but evaluating it produces undefined behavior.
If you are asking about the type of the object identified by str, referencing the subsequent definition, then again no. In this case it is a one-dimensional array of pointers to char.
If I declare char *str[10]={"ONE","TWO","THREE"} how we can access single character ?
You can access one of the pointers by indexing str, among other ways. For example, str[1]. You can access one of the characters in the string into which that pointer points by using the indexing operator again, among other ways. For example, str[1][0]. That you are then using a double index does not make str a 2D array. The memory layout is quite different than if you declared, say, char str[3][10];.

Why can you have an pointer to array of strings in C

why does
char *names [] = {"hello", "Jordan"};
work fine
but this does not
char names [] = {"hello", "Jordan"};
would appreciate if someone could explain this to me, thank you :).
Here
char *names [] = {"hello", "Jordan"};
names is an array of char pointers, i.e. each element holds a pointer to the first character of a string literal. But here
char names [] = {"hello", "Jordan"};
names is just a char array, i.e. it can hold only a single string such as "hello", not several.
In second case like
int main(void) {
char names[] = {"hello", "Jordan"};
return 0;
}
when you compile (I suggest compiling with the -Wall -pedantic -Wstrict-prototypes -Werror flags), the compiler clearly says
error: excess elements in char array initializer
which means you can't have more than one char array in this case. A correct version is
char names[] = {'h','e','l','l','o','\0'}; /* here names is array of characters */
Edit: there is one more possibility, if the initializer of names looks like this
char names[] = { "hello" "Jordan" }; /* its a valid one */
then hello and Jordan are joined into the single char array helloJordan, i.e. it is the same as
char names[] = { "helloJordan" };
The first is an array of pointers to char. The second is an array of char and would have to look like char names[] = {'a', 'b', 'c'}
A string literal, such as "hello", is stored in static memory as an array of chars. In fact, a string literal has type char [N], where N is the number of characters in the array (including the \0 terminator). In most cases, an array identifier decays to a pointer to the first element of the array, so in most expressions a string literal such as "hello" will decay to a pointer to the char element 'h'.
char *names[] = { "hello", "Jordan" };
Here the two string literals decay to pointers to char which point to 'h' and 'J', respectively. That is, here the string literals have type char * after the conversion. These types agree with the declaration on the left, and the array names[] (which is not an array of character type, but an array of char *) is initialized using these two pointer values.
char names[] = "hello";
or similarly:
char names[] = { "hello" };
Here we encounter a special case. Array identifiers are not converted to pointers to their first elements when they are operands of the sizeof operator or the unary & operator, or when they are string literals used to initialize an array of character type. So in this case, the string literal "hello" does not decay to a pointer; instead the characters contained in the string literal are used to initialize the array names[].
char names[] = {"hello", "Jordan"};
Again, the string literals would be used to initialize the array names[], but there are excess initializers in the initializer list. This is a constraint violation according to the Standard. From §6.7.9 ¶2 of the C11 Draft Standard:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
A conforming implementation must issue a diagnostic in the event of a constraint violation, which may take the form of a warning or an error. On the version of gcc that I am using at the moment (gcc 6.3.0) this diagnostic is an error:
error: excess elements in char array initializer
Yet, for arrays of char that are initialized by an initializer list of char values rather than by string literals, the same diagnostic is a warning instead of an error.
In order to initialize an array of char that is not an array of pointers, you would need a 2d array of chars here. Note that the second dimension is required, and must be large enough to contain the largest string in the initializer list:
char names[][100] = { "hello", "Jordan" };
Here, each string literal is used to initialize an array of 100 chars contained within the larger 2d array of chars. Or, put another way, names[][] is an array of arrays of 100 chars, each of which is initialized by a string literal from the initializer list.
char name[] is an array of characters so you can store a word in it:
char name[] = "Muzol";
This is the same as:
char name[] = {'M', 'u', 'z', 'o', 'l', '\0'}; /* '\0' is the null character (NUL); it marks the end of the string */
And char* names[] is an array of pointers, where each element points to the first character of a separate character array.
char* names[] = {"name1", "name2"};
It's roughly the same as (except that string literals themselves must not be modified):
char name1[] = {'n', 'a', 'm', 'e', '1', '\0'}; /* or char name1[] = "name1"; */
char name2[] = {'n', 'a', 'm', 'e', '2', '\0'}; /* or char name2[] = "name2"; */
char* names[] = { name1, name2 };
So basically names[0] holds &name1[0], and a function reading the string goes through memory up to name1[5], where it finds the '\0' (NUL) character and stops. The same happens for name2.

C char array v C char* initialization

The following is accepted as valid c code by gcc version 6.3:
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
However the following fails:
char *green = { 'a', 'b', 'c' }; // gcc error
I am sure there is a perfectly rational reason for this to be the case, but I am wondering what it is. This question is motivated by the case when having to initialize an array of bytes (so unsigned char rather than char), it is very tempting to write something like { '\x43', '\xde', '\xa0' } rather than "\x43\xde\xa0", and as soon as you forget to write my_array[] instead of *my_array, you get caught by the compiler.
The following will produce an error
char *green = { 'a', 'b', 'c' };
because the initializer for green isn't an array of characters as you believe. It doesn't have a type, it's just a brace-enclosed initializer list. The thing that it initializes in the previous samples (i.e. white) determines how it's interpreted. The same initializer can be used to initialize any aggregate that is capable of holding 3 characters.
But green is a pointer, and not an aggregate, so you can't use a brace-enclosed initializer list as its initial value.
Now, the following two work but with very different semantics:
char blue[] = "abc";
char *red = "abc";
blue is an array. It will hold the same contents as the literal "abc". red is a pointer that points at the literal "abc".
You can use a compound literal expression:
char *green = (char[]){ 'a', 'b', 'c' };
It tells the compiler to create an unnamed object (the life time of which depends on the scope of the declaration), that is of character array type and is initialized with those three characters. The pointer is then assigned the address of that object.
These three declarations
char white[] = { 'a', 'b', 'c' };
char blue[] = "abc";
char *red = "abc";
are different.
The first one declares a character array that contains exactly three characters corresponding to the number of the initializers.
The second one declares a character array of four characters because it is initialized by a string literal that has four characters including the terminating zero. So this character array contains a string.
The third one defines a string literal that is a character array and declares a pointer of type char * that is initialized by the address of the first character of the character array corresponding to the string literal.
You can imagine this declaration like
char unnamed[] = { 'a', 'b', 'c', '\0' };
char *red = unnamed;
This declaration
char *green = { 'a', 'b', 'c' };
is invalid because the left object is a scalar and may not be initialized by a list that contains more than one initializer.
Take into account that you could use a compound literal to initialize the pointer. For example
char *green = ( char[] ){ 'a', 'b', 'c' };

string literal in c

Why is the following code illegal?
typedef struct{
char a[6];
} point;
int main()
{
point p;
p.a = "onetwo";
}
Does it have anything to do with the size of the literal? or is it just illegal to assign a string literal to a char array after it's declared?
It doesn't have anything to do with the size. You cannot assign a string literal to a char array after it has been created - you can use it only at the time of definition.
When you do
char a[] = "something";
it creates an array of enough size (including the terminating null) and copies the string to the array. It is not a good practice to specify the array size when you initialize it with a string literal - you might not account for the null character.
When you do
char a[10];
a = "something";
you're trying to assign to the address of the array, which is illegal.
EDIT: as mentioned in other answers, you can do a strcpy/strncpy, but make sure that the array is initialized with the required length.
strcpy(p.a, "12345");//give space for the \0
You can never assign to arrays after they've been created; this is equally illegal:
int foo[4];
int bar[4];
foo = bar;
You need to use pointers, or assign to an index of the array; this is legal:
p.a[0] = 'o';
If you want to leave it an array in the struct, you can use a function like strcpy:
strncpy(p.a, "onetwo", 6);
(note that the char array needs to be big enough to hold the nul-terminator too, so you probably want to make it char a[7] and change the last argument to strncpy to 7)
Arrays are non-modifiable lvalues, so you cannot assign to them. The left side of the assignment operator must be a modifiable lvalue.
However you can initialize an array when it is defined.
For example :
char a[] = "Hello World" ;// this is legal
char a[]={'H','e','l','l','o',' ','W','o','r','l','d','\0'};//this is also legal
//but
char a[20];
a = "Hello World" ;// is illegal
However you can use strncpy(a, "Hello World",20);
As other answers have already pointed out, you can only initialise a character array with a string literal, you cannot assign a string literal to a character array. However, structs (even those that contain character arrays) are another kettle of fish.
I would not recommend doing this in an actual program, but this demonstrates that although array types cannot be assigned to, structs containing array types can be.
#include <stdio.h>
typedef struct
{
char value[100];
} string;
int main()
{
string a = {"hello"};
a = (string){"another string!"}; // overwrite value with a new string
puts(a.value);
string b = {"a NEW string"};
b = a; // override with the value of another "string" struct
puts(b.value); // prints "another string!" again
}
So, in your original example, the following code should compile fine:
typedef struct{
char a[6];
} point;
int main()
{
point p;
// note that only 5 characters + 1 for '\0' will fit in a char[6] array.
p = (point){"onetw"};
}
Note that in order to store the string "onetwo" in your array, it has to be of length [7] and not as written in the question. The extra character is for storing the '\0' terminator.
No strcpy or C99 compound literal is needed. The example in pure ANSI C:
#include <stdio.h>
typedef struct{
char a[6];
} point;
int main()
{
point p;
*(point*)p.a = *(point*)"onetwo";
fwrite(p.a,6,1,stdout);fflush(stdout);
return 0;
}

Resources