I wrote two versions of a program to practice making a character array, and I expected the results to be the same.
version 1:
int main(void)
{
char a[7] = "and";
printf("size: %d length: %d",sizeof(a), strlen(a) );
}
version 2:
int main(void)
{
char a[7];
a[1] = 'a';
a[2] = 'n';
a[3] = 'd';
printf("size: %d length: %d",sizeof(a), strlen(a) );
}
However, here is the result I got:
version 1:
size: 7 length: 3
version 2:
size: 7 length: 4
As far as I know, a string ends with a null character, and the null character in a string literal is implicit, but why did it disappear? Why wasn't it counted as the last element, given that the length in version 1 shows 3?
In fact, strlen is supposed to exclude the null character, so the output 3 is correct. The problem with the second version is that a is not initialized, so its elements hold arbitrary values. It just so happens in this case that a[0] is non-zero and a[4] is 0, thus the output 4; nothing guarantees this, and strlen could just as well run past the end of the array. Note also that the second version uses the wrong indices: array indexing starts from 0, not from 1.
If you want to make this work in the second version, re-write it like so:
#include <stdio.h>
#include <string.h>
int main(void)
{
char a[7] = {0};
a[0] = 'a';
a[1] = 'n';
a[2] = 'd';
printf("size: %zu length: %zu",sizeof(a), strlen(a) ); /* %zu is the format for size_t */
}
This initializes the first value in a to 0 explicitly and implicitly makes all other values zero too.
In the second case
int main(void)
{
char a[7];
a[1] = 'a';
a[2] = 'n';
a[3] = 'd';
printf("size: %d length: %d",sizeof(a), strlen(a) );
}
you are lucky that
a[0] did not contain a 0 (i.e, '\0' or null) : you would have seen a value 0 as length.
a[4] actually had a null value, otherwise, your computer might have been burnt!!
In other words, for a local variable with automatic storage, the values are indeterminate if it is left uninitialized. There's no guarantee of 0-filling (which would act as the null terminator), so using the array as a string (e.g., as an argument to strlen()) will likely access out-of-bounds memory and invoke undefined behavior.
sizeof measures the memory size of something. strlen calculates the length of a C string (defined as the number of characters before the terminating null character). Here your C string is shorter than the memory used to store it.
sizeof is a C operator, evaluated at compile time (except when applied to a variable-length array).
strlen is a library function, called at runtime.
strlen(3)
DESCRIPTION
The strlen() function computes the length of the string s.
RETURN VALUES
The strlen() function returns the number of characters that precede the
terminating NUL character.
Beware that your second example has undefined behavior: the character array is not initialized, so strlen may read past the end of it. You have no guarantee that uninitialized chars are set to 0.
As per strlen description:
size_t strlen(const char *str);
Returns the length of the given null-terminated byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character.
The behavior is undefined if str is not a pointer to a null-terminated byte string.
Since in the second version, a is not a null-terminated byte string, the behavior is undefined.
Note that assigning individual character constants to a char array does not make it a string. To create a properly null-terminated char array using that kind of assignment, you need to add the terminator yourself, and start at index 0:
a[0] = 'a';
//...
a[3] = '\0';
Related
char input[5] = "12345";
printf("Convert to int valid %d", atoi(input));
printf("Convert to int invalid %d", atoi(input[1])); // program crash
Is there a way to convert a char "slice" of a char string into an int?
Short description:
User inputs a string with values, for example: 1, 2 3 4 ,5
I am formatting that string to 12345
I want to keep working with each number, using it as the index of an array.
If you mean "how to access a substring in the char [] array", you can use pointer arithmetic:
char input[6] = "12345";
printf("strtol(input + 1) = %ld\n", strtol(input + 1, NULL, 10)); // prints "strtol(input + 1) = 2345" (strtol returns a long, hence %ld)
Few things to note though:
Your array should be 6 elements long to hold the null terminator
atoi shouldn't be used at all; strtol is a better function for the task of converting a string to a signed integer, because it lets you detect conversion errors.
Also, to convert a single character to an int:
if(isdigit(c))
{
c -= '0';
}
The relation that a textual representation of a digit is exactly '0' higher than the numeric value of that digit is guaranteed to hold for every character set supported by C.
To properly convert an arbitrary slice, you have to either make a copy or modify the string by inserting a \0 after the slice. The latter may not be an option, depending on where the string is stored.
To make a copy, allocate an array big enough to hold the slice and a \0. If you know the size of the slice at compile time, you can allocate on the stack:
char slice[2];
Otherwise, you'll have to allocate dynamically:
char *slice;
slice = malloc(2);
Stack allocated slices do not need to be deallocated, but dynamically allocated ones should be freed as soon as they are no longer needed:
free(slice);
Once you have the slice allocated, copy the portion of interest and terminate it with \0:
strncpy(slice, s + 1, 1);
slice[1] = '\0';
atoi(slice);
This technique will pretty much always work.
If your slice always ends with the string, you don't need to make a copy: you just need to pass a pointer to the start of the slice:
atoi(s + 1);
Modifying the string itself probably won't work, unless it's in writeable memory. If you're sure this is the case, you can do something like:
char tmp;
tmp = s[1];
s[1] = '\0';
atoi(s);
s[1] = tmp;
If you were sure but the memory wasn't writeable, the behavior is undefined; in practice your program will most likely seg-fault.
For the special case where your slice is exactly one character long, you can use the fact that characters are numbers:
s[0] - '0'
Note that '0' != '\0'. The C standard guarantees that the digit characters '0' through '9' have consecutive values, so this works with any conforming character set, including EBCDIC.
After assigning the element at index 26, printing still shows "Computer", even though I assigned a character to index 26. I expected something like this: "Computer K "
What is the reason?
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
printf("%s\n", m1); /*prints out "Computer"*/
printf("%c", m1[26]); /*prints "K"*/
}
At index 8 of that string there is a \0 character, and %s prints only until it finds a \0 (the end of the string). The character 'K' is there at index 26, but it will not be printed because a \0 is found before it.
char s[100] = "Computer";
is basically the same as
char s[100] = { 'C', 'o', 'm', 'p', 'u','t','e','r', '\0'};
Since printf stops when the string is 0-terminated it won't print character 26
Whenever you partially initialize an array, the remaining elements are filled with zeroes. (This is a rule in the C standard, C17 6.7.9 §19.)
Therefore char m1[40] = "Computer"; ends up in memory like this:
[0] = 'C'
[1] = 'o'
...
[7] = 'r'
[8] = '\0' // the null terminator you automatically get by using the " " syntax
[9] = 0 // everything to zero from here on
...
[39] = 0
Now of course \0 and 0 mean the same thing, the value 0. Either will be interpreted as a null terminator.
If you go ahead and overwrite index 26 and then print the array as a string, it will still only print until it encounters the first null terminator at index 8.
If you do like this however:
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); // prints out "Computer"
m1[8] = 'K';
printf("%s\n", m1); // prints out "ComputerK"
}
You overwrite the null terminator, and the next zero that happened to be in the array is treated as null terminator instead. This code only works because we partially initialized the array, so we know there are more zeroes trailing.
Had you instead written
#include <string.h>
int main()
{
char m1[40];
strcpy(m1, "Computer");
}
This is not initialization but run-time assignment. strcpy would only set index 0 to 8 ("Computer" with null term at index 8). Remaining elements would be left uninitialized to garbage values, and writing m1[8] = 'K' would destroy the string, as it would then no longer be reliably null terminated. You would get undefined behavior when trying to print it: something like garbage output or a program crash.
In C strings are 0-terminated.
Your initialization fills all array elements after the 'r' with 0.
If you place a non-0 character in any random field of the array, this does not change anything in the fields before or after that element.
This means your string is still 0-terminated right after the 'r'.
How should any function know that after that string some other string might follow?
That's because after "Computer" there's a null terminator (\0) in your array. If you add a character after this \0, it won't be printed because printf() stops printing when it encounters a null terminator.
Just as an addition to the other users answers - you should try to answer your question by being more proactive in your learning. It is enough to write a simple program to understand what is happening.
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
for(size_t index = 0; index < 40; index++)
{
printf("m1[%zu] = 0x%hhx ('%c')\n", index, (unsigned char)m1[index], (m1[index] >=32) ? m1[index] : ' ');
}
}
I wanted to test things out with arrays in C as I'm just starting to learn the language. Here is my code:
#include <stdio.h>
main(){
int i,t;
char orig[5];
for(i=0;i<=4;i++){
orig[i] = '.';
}
printf("%s\n", orig);
}
Here is my output:
.....�
It is exactly that. What are those mysterious characters? What have I done wrong?
%s with printf() expects a pointer to a string, that is, pointer to the initial element of a null terminated character array. Your array is not null terminated.
Thus, in search of the terminating null character, printf() goes out of bound, and subsequently, invokes undefined behavior.
You have to null-terminate your array, if you want that to be used as a string.
Quote: C11, chapter §7.21.6.1, (emphasis mine)
s
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
Quick solution:
Increase the array size by 1: char orig[6];
Add a null terminator at the end. After the loop body, add orig[i] = '\0';
And then, print the result.
char orig[5];//creates an array of 5 char. (with indices ranging from 0 to 4)
|?|?|?|0|0|0|0|0|?|?|?|?|
| ^memory you do not own (your mysterious characters)
^start of orig
for(i=0;i<=4;i++){ //attempts to populate array with '.'
orig[i] = '.';
|?|?|?|.|.|.|.|.|?|?|?|?|
| ^memory you do not own (your mysterious characters)
^start of orig
This results in a non null terminated char array, which will invoke undefined behavior if used in a function that expects a C string. C strings must contain enough space to allow for null termination. Change your declaration to the following to accommodate.
char orig[6];
Then add the null termination to the end of your loop:
...
for(i=0;i<=4;i++){
orig[i] = '.';
}
orig[i] = 0;
Resulting in:
|?|?|?|.|.|.|.|.|0|?|?|?|
| ^memory you do not own
^start of orig
Note: Because the null termination results in a C string, the function using it knows how to interpret its contents (i.e. no undefined behavior), and your mysterious characters are held at bay.
There is a difference between a character array and a string. A character array is simply an array in which each element is of type char; it only holds a C string if the characters are terminated by a null character (value 0).
%s format specifier with printf() expects a pointer to a character array which is terminated by a null character. Your array is not null terminated and hence, printf function goes beyond 5 characters assigned by you and prints garbage values present after your 5th character ('.').
To solve your issues, you need to statically allocate the character array of size one more than the characters you want to store. In your case, a character array of size 6 will work.
#include <stdio.h>
int main(){
int i,t;
char orig[6]; // If you want to store 5 characters, allocate an array of size 6 to store null character at last position.
for (i=0; i<=4; i++) {
orig[i] = '.';
}
orig[5] = '\0';
printf("%s\n", orig);
}
There is a reason to spend one extra character on the null terminator. Whenever you pass an array to a function, only a pointer to its first element is passed. This makes it impossible for the function to determine where the array ends (sizeof inside the function returns the size of the pointer on your machine, not the size of the array). That is why functions like memcpy and memset take an additional argument that gives the array size (or the length up to which you want to operate).
However, using character array, function can determine the size of the array by looking for a special character (null character).
You need to add a NUL character (\0) at the end of your string.
#include <stdio.h>
int main(void)
{
int i,t;
char orig[6];
for(i=0;i<=4;i++){
orig[i] = '.';
}
orig[i] = '\0';
printf("%s\n", orig);
}
If you do not know what \0 is, I strongly recommend you check the ASCII table (https://www.asciitable.com/).
Good luck
printf takes a pointer to the start of a memory location, the array in this case, and prints until it encounters a \0 character. Strings of this type are called null-terminated strings.
So please add a \0 at the end, and only put characters in up to index (size of array - 2), like this:
#include <stdio.h>
int main(void){
int i;
char orig[5];
for(i=0;i<4;i++){ // i stays below (size of array - 1)
orig[i] = '.';
}
orig[i] = '\0'; // i is 4 here, the last valid index
printf("%s\n", orig);
}
I'm very new to C and am a bit confused as to when we need to manually add the terminating '\0' character to strings. Given this function to calculate string length (for clarity's sake):
int stringLength(char string[])
{
int i = 0;
while (string[i] != '\0') {
i++;
}
return i;
}
which calculates the string's length based on the null terminating character. So, using the following cases, what is the role of the '\0' character, if any?
Case 1:
char * stack1 = "stack";
printf("WORD %s\n", stack1);
printf("Length %d\n", stringLength(stack1));
Prints:
WORD stack
Length 5
Case 2:
char stack2[5] = "stack";
printf("WORD %s\n", stack2);
printf("Length %d\n", stringLength(stack2));
Prints:
WORD stack���
Length 8
(These results vary each time, but are never correct).
Case 3:
char stack3[6] = "stack";
printf("WORD %s\n", stack3);
printf("Length %d\n", stringLength(stack3));
Prints:
WORD stack
Length 5
Case 4:
char stack4[6] = "stack";
stack4[5] = '\0';
printf("WORD %s\n", stack4);
printf("Length %d\n", stringLength(stack4));
Prints:
WORD stack
Length 5
Case 5:
char * stack5 = malloc(sizeof(char) * 5);
if (stack5 != NULL) {
stack5[0] = 's';
stack5[1] = 't';
stack5[2] = 'a';
stack5[3] = 'c';
stack5[4] = 'k';
printf("WORD %s\n", stack5);
printf("Length %d\n", stringLength(stack5));
}
free(stack5);
Prints:
WORD stack
Length 5
Case 6:
char * stack6 = malloc(sizeof(char) * 6);
if (stack6 != NULL) {
stack6[0] = 's';
stack6[1] = 't';
stack6[2] = 'a';
stack6[3] = 'c';
stack6[4] = 'k';
stack6[5] = '\0';
printf("WORD %s\n", stack6);
printf("Length %d\n", stringLength(stack6));
}
free(stack6);
Prints:
WORD stack
Length 5
Namely, I would like to know the difference between cases 1, 2, 3, and 4 (why the erratic behavior of case 2, why there is no need to specify the null-terminating character in 1 and 3, and how 3 and 4 both work the same) and how 5 and 6 print out the same thing even though not enough memory is allocated in case 5 for the null-terminating character (since only 5 char slots are allocated, one for each letter in "stack", how does it detect a '\0' character, i.e. the 6th character?)
I'm so sorry for this absurdly long question, it's just I couldn't find a good didactic explanation on these specific instances anywhere else
The storage for a string must always leave room for the terminating null character. In some of your examples you don't do this, explicitly giving a length of 5. In those cases you will get undefined behavior.
String literals always get the null terminator automatically. Even though strlen returns a length of 5, it is really taking 6 bytes.
Your case 5 only works because undefined sometimes means looking like it worked. You probably have a value of zero following the string in memory - but you can't rely on that.
In case 1, you are pointing at a string literal (a constant that is typically placed in read-only memory), which has the \0 implicitly added to it.
Since \0's position is relied upon to find the end of string, your stringLength() function prints 5.
In case 2, you are trying to initialise a character array of size 5 with a string of 5 characters leaving no space for the \0 delimiter. The memory adjacent to the string can be anything and might have a \0 somewhere. This \0 is considered the end of string here which explains those weird characters that you get. It seems that for the output you gave, this \0 was found only after 3 more characters which were also taken into account while calculating the string length. Since the contents of the memory change over time, the output may not always be the same.
In case 3, you are initialising a character array of size 6 with a string of size 5 leaving enough space to store the \0 which will be implicitly stored. Hence, it will work properly.
Case 4 is similar to case 3. No modification is done by
stack4[5] = '\0';
because size of stack4 is 6 and hence its last index is 5. You are overwriting a variable with its old value itself. stack4[5] had \0 in it even before you overwrote it.
In case 5, you have completely filled the character array with characters without leaving space for \0. Yet when you print the string, it prints right. I think it is because the memory adjacent to the memory allocated by malloc() merely happened to be zero which is the value of \0. But this is undefined behavior and should not be relied upon. What really happens depends on the implementation.
It should be noted that malloc() will not initialise the memory that it allocates unlike calloc().
Both
str[2] = '\0';
and
str[2] = 0;
are just the same.
But you cannot rely on freshly allocated memory being zero. Dynamically allocated memory may happen to contain zeros because of how the operating system hands out memory (for security reasons it often clears it first), but malloc() itself makes no such guarantee.
If you need the default value of dynamically allocated memory to be zero, you can use calloc().
Case 6 has the \0 in the end and characters in the other positions. The proper string should be displayed when you print it.
If I have an array declared as
char arr[1] = "";
What is actually stored in memory? What will arr[0] be?
Strings are null-terminated. An empty string contains one element, the null-terminator itself, i.e, '\0'.
char arr[1] = "";
is equivalent to:
char arr[1] = {'\0'};
You can imagine how it's stored in the memory from this.
C-strings are zero-terminated. Thus, "abc" is represented as { 'a', 'b', 'c', 0 }.
Empty strings thus just have the zero.
This is also the reason why a string must always be allocated to be one char larger than the maximum possible length.
arr[0] is 0x00.
However, if you declare the array without an initializer, like
char arr[1];
then arr[0] holds an indeterminate (garbage) value, assuming the array has automatic storage.
arr[0] is the null character, which can be referred to as '\0' or 0.
A string is, by definition, "a contiguous sequence of characters terminated by and including the first null character". For an empty string, the terminating null character is the first one (at index 0).
It gets more interesting if the array is declared as char arr[] = "";
In this case sizeof(arr) is 1 and strlen(arr) is 0.
You can also check this yourself by adding a print such as printf("%d", arr[0]); and looking at the output.
A string is a sequence of characters; in your case there is no character inside "", so only the '\0' character is stored in arr[0].
A C string ends with a NUL character, so the empty string "" is actually stored as the single byte '\0'. The compiler adds it for you,
so strlen("") equals 0 but sizeof("") equals 1.