Why does initializing an array with 0 clear the entire buffer? - c

I am initializing my array with 0 and the whole buffer comes out clean. What happens to the bits? For example, when I initialize with 'a' the result is not the same, whereas with memset the whole buffer would be filled with 'a'?
#include <stdio.h>
#include <string.h>
int main(void) {
char buffer[256] = {0}, array[256] = {'a'};
char array1[256];
memset(array1, 'a', sizeof(array1));
printf("%c\n%c\n%c\n", buffer[1], array[1], array1[1]);
return 0;
}

If the initialiser does not provide enough elements to initialise the complete variable, the rest is initialised as if the variable were declared globally, that is:
integers to 0
floats to 0.
pointers to NULL.
In your particular example the remaining elements of the char-array array will be following the above rule for integers.
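A minimal sketch of the rule above, not taken from the question: the struct sample and the function rest_is_zeroed are hypothetical names invented for illustration. Only the first member gets an explicit initializer, and the remaining members should come out as 0, 0.0 and NULL.

```c
#include <stddef.h>

/* Hypothetical aggregate with one member of each kind listed above. */
struct sample {
    int    count;
    double ratio;
    char  *name;
};

/* Returns 1 if the members without an explicit initializer were zeroed. */
int rest_is_zeroed(void)
{
    struct sample s = { 7 };  /* only `count` is initialized explicitly */

    return s.count == 7 && s.ratio == 0.0 && s.name == NULL;
}
```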

The initialization in the case of array[256] = {'a'}; happens as per this rule:
6.7.9 Initialization
...
21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
So only the first element of array will have the value 'a'.
But in the case of memset function,
void *memset(void *s, int c, size_t n);
the function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s.
So in this case all the elements of array1 will have the value 'a'.
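The difference between the two cases can be checked by counting how many elements actually hold 'a'. This is only a sketch; the helper names count_char, count_a_after_brace_init and count_a_after_memset are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many of the first n bytes of buf are equal to c. */
static size_t count_char(const char *buf, size_t n, char c)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == c) {
            hits++;
        }
    }
    return hits;
}

/* Brace initialization: only array[0] is 'a', the rest is 0. */
size_t count_a_after_brace_init(void)
{
    char array[256] = {'a'};

    return count_char(array, sizeof array, 'a');
}

/* memset: every one of the 256 bytes is set to 'a'. */
size_t count_a_after_memset(void)
{
    char array1[256];

    memset(array1, 'a', sizeof array1);
    return count_char(array1, sizeof array1, 'a');
}
```

Under these assumptions the first function should report 1 and the second 256.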

When you enter a function, in this case main(), the stack grows by the amount needed for the stack frame. The stack frame has space for all of the autos (variables declared inside the function) as well as other information not relevant here.
So in this case when you write
char array[256]
as the program enters the function the stack will grow by enough to make room for the 256 characters in the array. The values of those characters are indeterminate: this area of memory may have been written to earlier by another function that no longer needs it, so we don't know what the array contains.
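When you write
char array[256] = {'a'}
the effect is equivalent to:
char array[256];
memset(array, 0, sizeof(array));
array[0] = 'a';
In this case the rest of the array is not left undefined: every element without an explicit initializer is set to 0, which is why only array[0] holds 'a'.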
When you do
memset(array, 'a', sizeof(array))
the CPU will need to go through the entire array and initialize each char in the array to 'a', creating a known value for everything in the array at the cost of using a little more CPU.

Related

C : If as I understand 0 and '\0' are the same, how does the compiler know the size of an array when I write int my_array = {0};?

I am trying to create a function to copy an array into another using pointers. I'd like to add the following condition : if the array of destination is smaller, the loop must break.
So basically it's working, but it is not working if I initialize the destination array as follows :
int dest_array[10] = {0};
From what I understand it fills the array with int 0's which are equivalent to '\0' (null characters). So here is my question :
In this case how can the computer know the array size or when it ends ?
(And how do I compare arrays passed as parameters ?)
#include <stdio.h>
void copy(int *src_arr, int *dest_arr)
{
    // The advantage of using pointers is that you don't need to provide the source array's size
    // I can't use sizeof to compare the sizes of the arrays because it does not work on parameters.
    // It returns the size of the pointer to the array and not of the whole array
    int* ptr1;
    int* ptr2;
    for( ptr1 = src_arr, ptr2 = dest_arr ;
         *ptr1 != '\0' ;
         ptr1++, ptr2++ )
    {
        if(!*ptr2) // Problem here if dest_arr full of 0's
        {
            // Adjacent string literals are concatenated by the compiler; C has no + for strings
            printf("Copy interrupted :\n"
                   "Destination array is too small");
            break;
        }
        *ptr2 = *ptr1;
    }
}
In C, it is impossible to know the length of an array inherently. This is due to the fact that an array is really just a contiguous chunk of memory, and the value passed to functions is really just a pointer to the first element in the array. As a result of this, to actually know the length of an array within a function other than the function where that array was declared, you have to somehow provide that value to the function. Two common approaches are the use of sentinel values which indicate the last element (similar to the way '\0', the null character, is per convention interpreted as the first character not part of a string in C), or providing another parameter which contains the array length.
As a very common example of this: if you have written any programs which use command-line parameters, then surely you are familiar with the common definition of int main(int argc, char *argv[]), which uses the second of the aforementioned approaches by providing the length of the argv array via the argc parameter.
The compiler has some ways to work around this for local variables. E.g., the following would work:
#include <stdio.h>
int main(){
int nums[10] = {0};
printf("%zu\n", sizeof(nums)/sizeof(nums[0]));
return 0;
}
Which prints 10 to STDOUT; however, this only works because the sizeof operation is done locally, and the compiler knows the length of the array at that point.
On the other hand, we can consider the situation of passing the array to another function:
#include <stdio.h>
int tryToGetSizeOf(int arr[]){
printf("%zu", sizeof(arr)/sizeof(arr[0]));
}
int main(){
int nums[10] = {0};
printf("%zu\n", sizeof(nums)/sizeof(nums[0]));
puts("Calling other function...");
tryToGetSizeOf(nums);
return 0;
}
On a typical 64-bit platform (8-byte pointers, 4-byte int), this will end up printing the following to STDOUT:
10
Calling other function...
2
This may not be the value you're expecting, but this occurs due to the fact that the method signature int tryToGetSizeOf(int arr[]) is functionally equivalent to int tryToGetSizeOf(int *arr). Therefore, you are dividing the size of an integer pointer (int *) by the size of a single int; whereas while you're still in the local context of main() (i.e., where the array was defined originally), you are dividing the size of the allocated memory region by the size of the datatype that memory region is partitioned as (int).
An example of this is available on Ideone.
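Since sizeof cannot recover the length inside the callee, the length has to travel with the pointer, the same way argc travels with argv. A minimal sketch of that second approach, with sum_ints as a hypothetical function name:

```c
#include <stddef.h>

/* Sums `count` ints starting at `values`. The caller supplies the length,
   just as argc supplies the length of argv. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

A caller that still has the array declaration in scope can compute the count with sizeof(nums)/sizeof(nums[0]) and pass it along.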
int* ptr1;
int* ptr2;
You lose size information when you refer to arrays as pointers. There is no way you can identify the size of the array i.e. the number of elements using ptr1. You have to take help of another variable which will denote the size of the array referred by ptr1 (or ptr2).
Same holds for character arrays as well. Consider the below:
char some_string[100];
strcpy(some_string, "hello");
The approach you mentioned of checking for \0 (or 0) gives you the number of elements which are part of the string residing in some_string. In no way does it refer to the number of elements in some_string which is 100.
To identify the size of destination, you have to pass another argument depicting its size.
There are other ways to identify the end of the array, but it is cleaner to pass the size explicitly rather than using some pointer hack like passing a pointer to the end of the array or using some invalid value as the last element in the array.
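To make the distinction between the string's length and the array's size concrete, here is a small sketch. The struct extent and the function measure_some_string are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: logical string length vs. size of the array holding it. */
struct extent {
    size_t used;      /* characters before the terminator */
    size_t capacity;  /* total elements in the array */
};

struct extent measure_some_string(void)
{
    char some_string[100];
    struct extent e;

    strcpy(some_string, "hello");
    e.used = strlen(some_string);     /* stops at the first '\0' */
    e.capacity = sizeof some_string;  /* known only because the declaration is in scope */
    return e;
}
```

Here used should come out as 5 and capacity as 100; only the second number says how much may safely be written into the array.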
TL/DR - You will need to pass the array size as a separate parameter to your function. Sentinel values like 0 only mark the logical end of a sequence, not the end of the array itself.
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. So when you pass your source and destination arrays as arguments to copy, what the function actually receives is just two pointers.
There's no metadata associated with a pointer that tells it whether it's pointing to the first object in a sequence, or how long that sequence is. A sentinel value like the 0 terminator in strings only tells you how long a logical sequence of values is, not the size of the array in which they are stored.
You will need to supply at least one more parameter to copy to tell it how large the target buffer is, so you stop copying when you've reached the end of the target buffer or you see a 0 in the source buffer, whichever comes first.
The same is true for array objects - there's no runtime metadata in the array object to store the size or anything else. The only reason the sizeof trick works is that the array's declaration is in scope. The array object itself doesn't know how big it is.
This is a problem for library functions like strcpy, which only receives the starting address for each buffer - if there are more characters in the source buffer than the target is sized to hold, strcpy will blast right past the end of the target buffer and overwrite whatever follows.
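The extra size parameter described above might look like the following sketch. It assumes, as in the question, that the source sequence ends with a 0 sentinel; the name copy_ints and its return value are choices made for this example, not part of the original code.

```c
#include <stddef.h>

/* Copies ints from src to dest until a 0 sentinel is read from src or
   dest_len elements have been written, whichever comes first.
   Returns the number of elements copied. */
size_t copy_ints(const int *src, int *dest, size_t dest_len)
{
    size_t copied = 0;

    while (copied < dest_len && src[copied] != 0) {
        dest[copied] = src[copied];
        copied++;
    }
    return copied;
}
```

Because the bound comes from dest_len rather than from the contents of dest, a destination initialized with {0} no longer stops the loop early.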

Different outputs for almost same programs in C

Sample 1:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hi☻hi♥
Sample 2:
char a []={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
char l = a[i];
printf("%c",a[i]);
}
printf("%s",a);
Output:hii♥hi♥♦
Sample 3:
char a [5]={'h','i'};
int i;
for(i=0;a[i]!='\0';i++){
printf("%c",a[i]);
}
printf("%s",a);
Output: hihi
Why are the outputs of these three programs different?
Sample 1 and sample 2 are almost the same code except for the extra line char l = a[i], and sample 3 differs from samples 1 and 2 only in declaring the size of the array.
In C, arrays only have a size, but no terminator. So an array of two characters (like your first two examples) will have the two characters you specified and nothing else. When you loop looking for the "terminator" you will go out of bounds and have undefined behavior.
The third case is different, because there you define an array of five elements but only initialize the first two. The C standard then requires the rest of the array to be initialized to zero, which is the same as the character '\0'. The array in the third example still hasn't got an explicit terminator though; it just so happens that the remainder is initialized to the same value as the string terminator.
For sample 1 and 2, you invoke undefined behavior by passing a non-null terminated array as argument to %s in printf().
For a definition like
char a []={'h','i'};
a will be allocated memory to hold only two elements, there will be no extra space allocated to store a terminating null, in this case of using brace-enclosed initializer list.
Quoting Chapter §7.21.6.1, for use of %s format specifier with printf() family,
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
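The precision rule in the quoted passage suggests one way to print an array that has no terminator: pass a precision no larger than the array size. The following is a sketch under that reading, with format_unterminated as a hypothetical helper that writes into a caller-supplied buffer instead of stdout.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats an unterminated two-character array into out, using a precision
   so that no terminating null is required in the source array.
   Returns the value returned by snprintf. */
int format_unterminated(char *out, size_t out_size)
{
    char a[] = {'h', 'i'};  /* two elements, no terminator */

    /* "%.*s" takes the precision as an int argument before the pointer. */
    return snprintf(out, out_size, "%.*s", (int)sizeof a, a);
}
```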
OTOH, in case of sample 3, for a definition like
char a [5]={'h','i'};
the array is null-terminated, so the output is proper. The array is null-terminated in this case because you have provided the array size at the time of declaration and supplied fewer initializers in the brace-enclosed list, so the remaining elements are initialized to 0 (as if they had static storage). Related, C11, chapter §6.7.9, (emphasis mine)
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
For printf("%s",a) to work, the memory block pointed to by a must end with 0.
Same thing goes for the code starting with for (i=0; a[i]!='\0'; i++).
In your first two examples, this memory block ends with 'i', not with 0.
You can fix it by changing the initialization of a to either one of the following:
char a[] = {'h','i',0};
char a[] = {'h','i','\0'};
char a[] = "hi";
char *a = "hi";
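For the array forms, the effect on the array's size can be checked with sizeof. This is a small sketch with made-up function names; it compares the original brace list with the string literal initializer.

```c
#include <stddef.h>

/* Brace list without a terminator: exactly two elements. */
size_t size_of_brace_list(void)
{
    char a[] = {'h', 'i'};

    return sizeof a;
}

/* String literal initializer: two characters plus the terminating '\0'. */
size_t size_of_string_literal_init(void)
{
    char a[] = "hi";

    return sizeof a;
}
```

The first should report 2 and the second 3; the extra element is the terminator that printf("%s") and the loop rely on.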

Array length counting anomaly

The count is returning unpredictable results. Sometimes they are right. Sometimes totally weird. Anyone can tell me what is wrong?
#include <stdio.h>
int len(int[]);
int main (int argc, const char * argv[])
{
int a[]={1,2,3,4,5,6,7,8};
int* i = a;
printf("length is %d",(len(i)));
return 0;
}
int len(int* a){
int count = 0;
for (; *a!='\0'; a++) {
count++;
}
return count;
}
There is not going to be a zero at the end of your array unless you put one there! A literal char array defined using a character string does, indeed, have such a sentinel value, but they're special; other arrays have no equivalent.
The len() function you're trying to write cannot be written for general arrays in C -- there's no way to determine the size of a dynamic array without using the (undocumented, platform-specific) internals of the memory allocator. If it was important to do this for your application, you could only do it if it were possible for you to add a zero at the end of every array yourself, explicitly.
I think you're confused between C strings (arrays of char) and other arrays. It's a convention that C strings are terminated with a null character ('\0'), but not all arrays (even char arrays) are terminated this way.
The general convention is to either store the length of an array somewhere, or to use a sentinel value at the end of the array. This value should be one that won't come up inside the array - eg '\0' in strings, or -1 in an array of positive ints.
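As an illustration of the sentinel convention, here is a sketch of a length function for int arrays. The name len_until and the idea of passing the sentinel as a parameter are assumptions made for this example; the caller must guarantee the sentinel really is stored at the end.

```c
#include <stddef.h>

/* Counts elements before the first occurrence of `sentinel`.
   The caller must guarantee that the sentinel is actually present,
   otherwise the loop reads past the end of the array. */
size_t len_until(const int *a, int sentinel)
{
    size_t count = 0;

    while (a[count] != sentinel) {
        count++;
    }
    return count;
}
```

With the question's data this would mean declaring int a[] = {1,2,3,4,5,6,7,8,-1}; and calling len_until(a, -1).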
Also, if you know that a is an int array (and not a pointer to an int array), then you can use:
size_t length = sizeof(a) / sizeof(a[0]);
So you could do:
int a[] = {1,2,3,4,5,6,7,8};
size_t length = sizeof(a) / sizeof(a[0]);
// In this case, sizeof(a[0])
// is the same as sizeof(int), because it's an int array.
But you can't do:
int *a = malloc(sizeof(int) * 10);
size_t length = sizeof(a) / sizeof(a[0]); // WRONG!
That last example will compile, but the answer will be wrong, because you're getting the size of a pointer to the array rather than the size of the array.
Note that you also can't use this sizeof to read the size of an array that's been passed into a function. It doesn't matter whether you declare your function len(int *a) or len(int a[]) - a will be a pointer, because the compiler converts arrays in function arguments to be a pointer to their first element.
You cannot count arrays like that. Only strings are null terminated. If you want that to work reliably, you will need to add an additional element to your array that contains '\0'. But be sure to take into account that the array will now be one element larger than the length you count, because of that '\0'.
Unlike strings, normal arrays do not terminate with a null byte 0x00. The reason strings use this is because arrays have no concept of length; arrays are merely contiguous pieces of memory, it is up to you to keep track of the length of arrays.

Why would someone initialize unallocated memory in C?

Say I do initialize an array like this:
char a[]="test";
What's the purpose of this? We know that the content might immediately get changed, as it is not allocated, and thus why would someone initialize the array like this?
To clarify, this code is wrong for the reasons stated by the OP:
char* a;
strcpy(a, "test");
As noted by other responses, the syntax "char a[] = "test"" does not actually do this. The actual effect is more like this:
char a[5];
strcpy(a, "test");
The first statement allocates a fixed-size character array on the local stack, and the second initializes the data in it. The size is determined from the length of the string literal. Like all stack variables, the array is automatically deallocated on exiting the function scope.
The purpose of this is to allocate five bytes on the stack or the static data segment (depending on where this snippet occurs), then set those bytes to the array {'t','e','s','t','\0'}.
This syntax allocates an array of five characters on the stack, equivalent to this:
char a[5] = "test";
The elements of the array are initialized to the characters in the string given as an initializer. The size of the array is determined to fit the size of the initializer.
It is allocated. That code is equivalent to
char a[5]="test";
When you leave the number out, the compiler simply calculates the length of the character-array for you by counting the characters in the literal string. It then adds 1 to the length in order to include the necessary terminating nul '\0'. Hence, the length of the array is 5 while the length of the string is 4.
The array is allocated; its size is inferred from the string literal being used to initialize it (5 chars total).
Had you written
char *a = "test";
then all that would get allocated would be a pointer variable, not an array (the string literal "test" lives in memory such that it's allocated at program startup and held until the program exits).
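A short sketch of what the array form gives you, with array_copy_is_writable as a hypothetical name: the array is a private five-byte copy of the literal, so it can be modified. Modifying the literal itself through a char * would be undefined behavior, so that case is deliberately not exercised here.

```c
#include <string.h>

/* Returns 1 if the array has the expected size and can be modified. */
int array_copy_is_writable(void)
{
    char a[] = "test";  /* 5-byte array initialized from the literal */

    a[0] = 'b';         /* fine: this changes the array, not the literal */
    return sizeof a == 5 && strcmp(a, "best") == 0;
}
```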

Is this possible? [pointer to char array C]

Is this possible?
size_t calculate(char *s)
{
// I would like to return 64
}
int main()
{
char s[64];
printf("%zu", calculate(s));
return 0;
}
I want to write a function which calculates the size of the char array declared in main().
Your function calculate(), given just the pointer argument s, cannot calculate how big the array is. The size of the array is not encoded in the pointer, or accessible from the pointer. If it is designed to take a null-terminated string as an argument, it can determine how long that string is; that's what strlen() does, of course. But if it wants to know how much information it can safely copy into the array, it has to be told how big the array is, or make an assumption that there is enough space.
As others have pointed out, the sizeof() operator can be used in the function where the array definition is visible to get the size of the array. But in a function that cannot see the definition of the array you cannot usefully apply the sizeof() operator. If the array was a global variable whose definition (not declaration) was in scope (visible) where calculate() was written - and not, therefore, the parameter to the function - then calculate() could indicate the size.
This is why many, many C functions take a pointer and a length. The absence of the information is why C is somewhat prone to people misusing it and producing 'buffer overflow' bugs, where the code tries to fit a gallon of information into a pint pot.
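A sketch of the pointer-plus-length pattern, assuming a hypothetical function fill_string that must never write outside the buffer it is given:

```c
#include <stddef.h>
#include <string.h>

/* Fills the buffer s (of `size` bytes) with the character c and terminates
   it, never writing more than `size` bytes. Returns the resulting length. */
size_t fill_string(char *s, size_t size, char c)
{
    if (size == 0) {
        return 0;  /* no room even for the terminator */
    }
    memset(s, c, size - 1);
    s[size - 1] = '\0';
    return size - 1;
}
```

In main(), where char s[64] is declared, the call would be fill_string(s, sizeof s, 'x'); the size is computed where the array definition is visible and handed to the function.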
On statically declared char[] you can use operator sizeof, which will return 64 in this case.
printf("%zu", sizeof(s));
On dynamically declared char*, it is not possible to get the size of the allocated memory.
Dynamic arrays are obtained through malloc and friends. All the others are statically declared, and you can use sizeof on them, as long as you use it in the same scope as the array was declared (same function, in your case, for example).
Yes, it's possible if s has a specific character at the end of its array. For example you could have s[63] = 125 and, knowing that no other character from 0 to 62 will be 125, you can loop until you find 125 and return the size of the array.
Otherwise, it's not possible, as s in the function parameter is just a pointer to your array, so sizeof(s) inside calculate will only return your machine's pointer size and not 64 as one might expect.
Unfortunately, you cannot determine from a pointer value alone how many elements are in the corresponding array. You either need some sort of sentinel value in the array (like the 0 terminator used for strings), or you need to keep track of it separately.
What you can do is get the number of bytes or elements in an array using the sizeof operator:
char arr[64];
size_t size = sizeof arr; // # of bytes in arr
size_t count = sizeof arr / sizeof *arr; // # of elements in arr
However, this only works if arr is an array type; if you tried to do this in your function
size_t calculate(char *s)
{
return sizeof s;
}
it would return the size in bytes of the pointer value, not of the corresponding array object.
No. As a function parameter, char *x or char x[] just declares a pointer to a memory location. A pointer doesn't hold any information about the size of the memory region.
However, char *x = "Hello" occupies 6 bytes (including the terminating null), and strlen(x) would return 5. This relies on the null char at the end of the string, strlen still knows nothing about the underlying buffer. So strlen("Hello\000There") would still be 5.
This is usually done with a macro in C, like:
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(*x))
Whether it's a good idea is a totally different question.
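A sketch of how such a macro is typically used. The function count_primes_table and its data are invented for the example, and the macro argument is parenthesized here as a defensive habit.

```c
#include <stddef.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))

/* Works here because `primes` is a real array whose declaration is in scope. */
size_t count_primes_table(void)
{
    int primes[] = {2, 3, 5, 7, 11};

    return ARRAY_SIZE(primes);
}
```

The caveat is that the macro also compiles when given a pointer, in which case it quietly divides the pointer size by the element size, which is one reason to be cautious with it.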

Resources