Two ways to initialize an array. What happens with each one? - c

There are two ways (at least) to initialize an array in C. What is the difference between
int array[] = {1,2,3,4,5,6,7,8,9} ;
and:
int array[100] = {1,2,3,4,5,6,7,8,9} ;
I do not mean in terms of memory allocation. Perhaps the thing that prompted this question will help clarify it.
I wanted to get the length of an int array by iterating through it. Here is the code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int array[] = {1,2,3,4,5,6,7,8,9} ;
    int i = 0 ; // i is length
    while( array[i] ) { // stops only at a 0 element; this array has none
        printf("%d\n" , array[i] ) ;
        i++ ;
    }
    printf("%d\n" , i) ;
    return 0 ;
}
And I noticed that when I used array[] the length was sometimes wrong because of some sort of overflow, but when I used array[100] the length was always right. What is the difference between these two?
Has it got something to do with the '\0' character?

When you create the array without specifying its size, the compiler infers it from the initializer (in this case, the length is 9). The memory locations immediately after the array have unspecified contents since no one bothered giving them specific values, and that's why you get the "overflow" behavior. Reading past the end of the array is technically undefined behavior, but what you observed is a very common way for an implementation to behave in that case.
When you explicitly specify the size, the compiler initializes the array with as many elements as you have provided, then fills the remaining space with zeroes.
In both cases the behavior is according to the standard.
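A minimal sketch of the difference, using the two declarations from the question. The names inferred, padded, ARRAY_LEN and count_nonzero are made up for this illustration; the loop is the question's loop with an explicit bound added so it can never read past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer); illustrative helper macro. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

int inferred[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};    /* size inferred: 9 elements */
int padded[100] = {1, 2, 3, 4, 5, 6, 7, 8, 9};   /* 100 elements, 91 of them zero */

/* Count leading nonzero elements. The caller supplies the length,
   so the loop stops at the end of the array even if no zero is found. */
size_t count_nonzero(const int *arr, size_t len)
{
    size_t n = 0;
    assert(arr != NULL);
    while (n < len && arr[n] != 0)
        n++;
    return n;
}
```

Calling count_nonzero(inferred, ARRAY_LEN(inferred)) and count_nonzero(padded, ARRAY_LEN(padded)) both give 9, but only the second array actually contains a zero for the unbounded loop in the question to stop at.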

You can programmatically get the size of the array with the sizeof operator. So in this case, you can do sizeof(array)/sizeof(int) to get the actual size. The sizeof operator is handled by the compiler, which will insert the correct size constant at compile time.
Note that iterating through the array until you reach a zero element runs past the end whenever the array contains no zero, which is undefined behavior and should not be done.
Pertaining to your original question, @Jon is correct; either array size specifier is correct and will yield the same results.

Because there is no null terminating character in arrays of numbers; only C-style strings stored in char arrays have one. So if your array were a C-style string, your code would have successfully found the length.

C : If as I understand 0 and '\0' are the same, how does the compiler know the size of an array when I write int my_array = {0};?

I am trying to create a function to copy an array into another using pointers. I'd like to add the following condition : if the array of destination is smaller, the loop must break.
So basically it's working, but it is not working if I initialize the destination array as follows :
int dest_array[10] = {0};
From what I understand it fills the array with int 0's which are equivalent to '\0' (null characters). So here is my question :
In this case how can the computer know the array size or when it ends ?
(And how do I compare arrays passed as parameters ?)
#include <stdio.h>
void copy(int *src_arr, int *dest_arr)
{
    // The advantage of using pointers is that you don't need to provide the source array's size
    // I can't use sizeof to compare the sizes of the arrays because it does not work on parameters.
    // It returns the size of the pointer to the array and not of the whole array
    int* ptr1;
    int* ptr2;
    for( ptr1 = src_arr, ptr2 = dest_arr ;
         *ptr1 != '\0' ;
         ptr1++, ptr2++ )
    {
        if(!*ptr2) // Problem here if dest_arr full of 0's
        {
            printf("Copy interrupted :\n"
                   "Destination array is too small");
            break;
        }
        *ptr2 = *ptr1;
    }
}
In C, it is impossible to know the length of an array inherently. This is due to the fact that an array is really just a contiguous chunk of memory, and the value passed to functions is really just a pointer to the first element in the array. As a result of this, to actually know the length of an array within a function other than the function where that array was declared, you have to somehow provide that value to the function. Two common approaches are the use of sentinel values which indicate the last element (similar to the way '\0', the null character, is per convention interpreted as the first character not part of a string in C), or providing another parameter which contains the array length.
As a very common example of this: if you have written any programs which use command-line parameters, then surely you are familiar with the common definition of int main(int argc, char *argv[]), which uses the second of the aforementioned approaches by providing the length of the argv array via the argc parameter.
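The two approaches can be sketched side by side. This is only an illustration: the function names length_until and sum_ints, and the choice of sentinel value, are assumptions made here and not part of the question.

```c
#include <assert.h>
#include <stddef.h>

/* Approach 1: the caller promises that the data ends with `sentinel`.
   Returns the number of elements that come before the sentinel. */
size_t length_until(const int *arr, int sentinel)
{
    size_t n = 0;
    assert(arr != NULL);
    while (arr[n] != sentinel)
        n++;
    return n;
}

/* Approach 2: the caller passes the length explicitly,
   the same way main() receives argc alongside argv. */
long sum_ints(const int *arr, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```

With int data[] = {4, 8, 15, -1}, length_until(data, -1) returns 3 and sum_ints(data, 3) returns 27. The sentinel version only works if the sentinel can never be a legitimate value, which is why the explicit length is usually the safer choice for int data.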
The compiler has some ways to work around this for local variables. E.g., the following would work:
#include <stdio.h>
int main(){
int nums[10] = {0};
printf("%zu\n", sizeof(nums)/sizeof(nums[0]));
return 0;
}
Which prints 10 to STDOUT; however, this only works because the sizeof operation is done locally, and the compiler knows the length of the array at that point.
On the other hand, we can consider the situation of passing the array to another function:
#include <stdio.h>
int tryToGetSizeOf(int arr[]){
    printf("%zu\n", sizeof(arr)/sizeof(arr[0])); // arr is really an int * here
    return 0;
}
int main(){
int nums[10] = {0};
printf("%zu\n", sizeof(nums)/sizeof(nums[0]));
puts("Calling other function...");
tryToGetSizeOf(nums);
return 0;
}
This will end up printing the following to STDOUT:
10
Calling other function...
2
This may not be the value you're expecting, but it occurs because the function signature int tryToGetSizeOf(int arr[]) is functionally equivalent to int tryToGetSizeOf(int *arr). Therefore, inside the function you are dividing the size of an integer pointer (int *) by the size of a single int, which gives 2 on a platform with 8-byte pointers and 4-byte ints; whereas in the local context of main() (i.e., where the array was originally defined), you are dividing the size of the whole array by the size of one of its elements (int).
An example of this is available on Ideone.
int* ptr1;
int* ptr2;
You lose size information when you refer to arrays as pointers. There is no way you can identify the size of the array i.e. the number of elements using ptr1. You have to take help of another variable which will denote the size of the array referred by ptr1 (or ptr2).
Same holds for character arrays as well. Consider the below:
char some_string[100];
strcpy(some_string, "hello");
The approach you mentioned of checking for \0 (or 0) gives you the number of elements which are part of the string residing in some_string. In no way does it refer to the number of elements in some_string which is 100.
To identify the size of destination, you have to pass another argument depicting its size.
There are other ways to identify the end of the array, but it is cleaner to pass the size explicitly rather than using some pointer hack like passing a pointer to the end of the array or using some invalid value as the last element in the array.
TL;DR - You will need to pass the array size as a separate parameter to your function. Sentinel values like 0 only mark the logical end of a sequence, not the end of the array itself.
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. So when you pass your source and destination arrays as arguments to copy, what the function actually receives is just two pointers.
There's no metadata associated with a pointer that tells it whether it's pointing to the first object in a sequence, or how long that sequence is [1]. A sentinel value like the 0 terminator in strings only tells you how long a logical sequence of values is, not the size of the array in which they are stored [2].
You will need to supply at least one more parameter to copy to tell it how large the target buffer is, so you stop copying when you've reached the end of the target buffer or you see a 0 in the source buffer, whichever comes first.
[1] The same is true for array objects - there's no runtime metadata in the array object to store the size or anything else. The only reason the sizeof trick works is that the array's declaration is in scope. The array object itself doesn't know how big it is.
[2] This is a problem for library functions like strcpy, which only receive the starting address for each buffer - if there are more characters in the source buffer than the target is sized to hold, strcpy will blast right past the end of the target buffer and overwrite whatever follows.
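One possible shape for such a function is sketched below. It is a variant rather than the asker's exact design: it takes explicit lengths for both buffers (the names copy_ints, src_len and dest_len are assumptions made here), so a 0 in either array is treated as ordinary data instead of a terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Copy at most dest_len elements from src to dest.
   Returns the number of elements actually copied, so the caller can
   detect truncation by comparing the result with src_len. */
size_t copy_ints(const int *src, size_t src_len, int *dest, size_t dest_len)
{
    size_t n = src_len < dest_len ? src_len : dest_len;
    assert(src != NULL && dest != NULL);
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
    return n;
}
```

In the function that declares the arrays, the lengths can be computed with sizeof(arr)/sizeof(arr[0]) and passed in. A return value smaller than src_len means the destination was too small.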

C variable not where I expect to find it in memory

Can someone explain why printing the pointers to the two ints results in them being placed in different locations in relation to the chars?
The piece of code below should print out the memory addresses from &c to &a which (I think) should include the two ints defined, but it doesn't; however, when I try to find out where they're stored in memory (see second code segment) it does print them between the two chars as expected.
Please explain why printing the int pointers affects whether the ints are stored between the chars in memory.
The two code samples are the same except code 2 has an extra line printf("\n\n%p,%p\n",&i,&j); which prints the pointers of the two ints.
Edit: Yes, I know the printf formatting is ugly, but the code was only to help me clarify how memory and pointers work, so I didn't need it to be pretty.
Code1
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv){
char a='a';
int i=1;
int j=2;
char c='c';
char *pos;
for ( pos=&c; pos<=&a; pos++ ){
printf("%p\t",pos);
}
printf("\n");
for ( pos=&c; pos<=&a; pos++ ){
printf("%i\t\t",*pos);
}
}
Results from Code1
0x7ffde6321e7e 0x7ffde6321e7f
99 97
Code2
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv){
char a='a';
int i=1;
int j=2;
char c='c';
char *pos;
for ( pos=&c; pos<=&a; pos++ ){
printf("%p\t",pos);
}
printf("\n");
for ( pos=&c; pos<=&a; pos++ ){
printf("%i\t\t",*pos);
}
printf("\n\n%p,%p\n",&i,&j);
}
Results from Code2
0x7ffc3575616b 0x7ffc3575616c 0x7ffc3575616d 0x7ffc3575616e 0x7ffc3575616f 0x7ffc35756170 0x7ffc35756171 0x7ffc35756172 0x7ffc35756173 0x7ffc35756174 0x7ffc35756175 0x7ffc35756176 0x7ffc35756177
99 2 0 0 0 1 0 0 0 -4 127 0 97
0x7ffc35756170,0x7ffc3575616c
You're relying on something (Note 1) which is not specified in the C standard. The behaviour cannot be defined. It invokes undefined behavior (Note 2).
That said, you should always cast the argument of %p to void *, as the expected type is void * and there's no default promotion for pointers.
Note 1:
C does not mention or guarantee the order of allocation of variables / objects in a program. There's no guarantee that they will have consecutive memory locations, either increasing or decreasing. They are purely allowed to have random memory locations, so the theory you're believing in,
for ( pos=&c; pos<=&a; pos++ )
does not hold true. An(y) implementation can choose to place (reorder) variable(s) however it does see fit. There's absolutely no guarantee of the order of memory address with respect to their definition in the code.
Note 2:
For relational operators, quoting C11, chapter §6.5.8 (emphasis mine):
When two pointers are compared, the result depends on the relative locations in the
address space of the objects pointed to. If two pointers to object types both point to the
same object, or both point one past the last element of the same array object, they
compare equal. If the objects pointed to are members of the same aggregate object,
pointers to structure members declared later compare greater than pointers to members
declared earlier in the structure, and pointers to array elements with larger subscript
values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the
expression P points to an element of an array object and the expression Q points to the
last element of the same array object, the pointer expression Q+1 compares greater than
P. In all other cases, the behavior is undefined.
So, for your case, the comparison pos<=&a; is an attempt to compare two pointers which are neither
pointing to same object
members of the same aggregate object
pointers to array elements
pointers to members of the same union object
In short, they are not within the defined scope and hence, using them as operand of the relational operator invokes undefined behaviour.
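If the relative order of the values matters, one option consistent with the quoted rule is to group them in a single struct, since members of the same aggregate do have a defined order. The sketch below is an illustration only; struct locals and members_in_order are names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Members of one struct are laid out in declaration order,
   so relational comparisons between their addresses are well defined. */
struct locals {
    char a;
    int  i;
    int  j;
    char c;
};

/* Returns 1 when each member sits at a higher address than the one
   declared before it. All four pointers point into the same object. */
int members_in_order(const struct locals *s)
{
    const char *pa, *pi, *pj, *pc;
    assert(s != NULL);
    pa = (const char *)&s->a;
    pi = (const char *)&s->i;
    pj = (const char *)&s->j;
    pc = (const char *)&s->c;
    return pa < pi && pi < pj && pj < pc;
}
```

The exact distances between members still depend on padding chosen by the implementation, so only the order, not the byte offsets, can be relied on.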
The location of local variables is not specified by the standard. The compiler may put them in any order it deems best.
Making seemingly unrelated code changes such as an extra print statement or changing the optimization level can change how the compiler lays out the variables.
In short, you can't depend on any particular layout of variables in memory.
Local variables are typically placed on the stack (or in a register when possible and when their address is never taken). In your example i is the first and j is the second local variable, so on this implementation the effect is like push i, push j: the address of the second, &j, is &i-1.

what does sizeof() check as sentinel value for int array[] in c

Let us consider
int array[] = {2,33,4,56,7,8}; //case A
If sizeof() checks for '\0' as the end of a char[] array,
what does sizeof(array) check as a sentinel value to find the end of an int array, and therefore the size of the array in case A?
If I were to implement sizeof(intArray) myself, is there no way to access such sentinel information?
sizeof does not check anything. It only looks like a function call, but it is really an operator, a compiler trick to insert the size as known to the compiler at compile time.
Here is how sizeof interacts with C arrays: when you declare an array, you specify its size as a constant, as a run-time integer expression, or implicitly by supplying a certain number of values to put into your array.
When the number of elements is known at compile time, the compiler replaces sizeof(array) with the actual number. When the number of elements does not become known until runtime, the compiler prepares a special implementation-specific storage location, and stores the size there. The running program will need this information for stack clean-up. The compiler also makes this hidden information known to the runtime portion of sizeof implementation to return a correct value.
I think you're confusing string literals having a '\0' (null-terminator) in the end with arrays in general. Arrays have compile-time length known to the compiler 1. sizeof is an operator which gives the size based on the array length and the base type of the array.
So when someone does int a[] = {1, 2, 3}; there's no null-terminating character added in the end and number of elements is deduced as 3 by the compiler. On a platform where sizeof(int) = 4, you'll get sizeof(a) as 12.
The confusion arises because for char b[] = "abc"; the element count is 4, since all string literals have a '\0' appended automatically, i.e. they are null-terminated. It is not the sizeof operator that checks for this; it simply gives 4 * sizeof(char), since all that matters to sizeof is the compile-time array length, which is 4 = 1 + the number of characters explicitly written in the string literal.
However a character array not initialised by a string literal but with character literals doesn't have this quirk. Thus if char c[] = {'a', 'b', 'c'};, sizeof(c) would return 3 and NOT 4 as it is not a string literal and there's no null-terminating character. Again sizeof operator (not function) does this deduction at compile-time 2.
Finally, how the sizeof operator itself is implemented to do this, is an implementation detail not mandated by the standard. A standard talks about conditions and results. How they're achieved by implementations isn't a concern of the standard (or to anyone except the developers who implement it).
1 C99 introduced Variable Length Arrays (VLA) which allows arrays to have dynamic size.
2 Only for VLAs the sizeof operator and its operand are evaluated at run-time
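The three declarations discussed above can be compared directly. This is a small sketch; the variable names ints, str and chars and the helper string_length are chosen here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

int  ints[]  = {1, 2, 3};          /* 3 ints, no terminator */
char str[]   = "abc";              /* 4 chars: 'a' 'b' 'c' '\0' */
char chars[] = {'a', 'b', 'c'};    /* 3 chars, not a C string */

/* strlen() is the run-time, sentinel-based count: it scans for '\0'.
   sizeof never scans anything; it reports the declared size in bytes. */
size_t string_length(void)
{
    assert(str[sizeof str - 1] == '\0');
    return strlen(str);
}
```

Here sizeof ints is 3 * sizeof(int), sizeof str is 4, sizeof chars is 3, and string_length() returns 3. Passing chars to strlen would be an error, because it has no terminator to find.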
sizeof is not a function but a compile-time operator that is replaced with the size of its operand. In the case of true arrays (not pointers) it is replaced with the size in bytes of the content of the array, because it is known at compile time.
Try the following to convince yourself:
#include <stdio.h>
void print_size(int array[])
{
    printf("%zu\n", sizeof(array)); //Prints 4 (= sizeof(int*))
                                    //May print 8 on 64b architectures
}
int main()
{
    int array[] = {2,33,4,56,7,8};
    printf("%zu\n", sizeof(array)); //Prints 24 (= 6*sizeof(int)) with 4-byte int
    print_size(array);
    return 0;
}
This is because, inside of main, the compiler knows that array is an array of 6 ints, while the function print_size may be called with any array, and so its size is not known in advance: it is treated just like an int* (except that I'm not sure if it's an lvalue)

Questions about pointers and arrays

Sanity-check questions:
I did a bit of googling and discovered the correct way to return a one-dimensional integer array in C is
int * function(args);
If I did this, the function would return a pointer, right? And if the return value is r, I could find the nth element of the array by typing r[n]?
If I had the function return the number "3", would that be interpreted as a pointer to the address "3?"
Say my function was something like
int * function(int * a);
Would this be a legal function body?
int * b;
b = a;
return b;
Are we allowed to just assign arrays to other arrays like that?
If pointers and arrays are actually the same thing, can I just declare a pointer without specifying the size of the array? I feel like
int a[10];
conveys more information than
int * a;
but aren't they both ways of declaring an array? If I use the latter declaration, can I assign values to a[10000000]?
Main question:
How can I return a two-dimensional array in C? I don't think I could just return a pointer to the start of the array, because I don't know what dimensions the array has.
Thanks for all your help!
1. Yes.
2. Yes, but it would require a cast: return (int *)3;
3. Yes, but you are not assigning an array to another array; you are assigning a pointer to a pointer.
4. Pointers and arrays are not the same thing. int a[10] reserves space for ten ints. int *a is an uninitialized variable pointing to who knows what. Accessing a[10000000] is undefined behavior and will most likely crash your program, as you are trying to access memory you don't have access to or that doesn't exist.
5. To return a 2D array, one option is to return a pointer-to-pointer: int ** f() {}. Note that this points to an array of row pointers rather than to a true int[rows][columns] array.
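To make the first point concrete, here is a hedged sketch of a function that returns an int * the caller can index as r[n]. It assumes heap allocation with malloc, which is one common choice and not the only one; the name make_squares is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns a newly allocated array of n ints holding 0, 1, 4, 9, ...
   or NULL if the allocation fails. The caller owns the memory and
   must release it with free(). n must be greater than zero. */
int *make_squares(size_t n)
{
    int *r;
    assert(n > 0);
    r = malloc(n * sizeof *r);
    if (r == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        r[i] = (int)(i * i);
    return r;
}
```

After int *r = make_squares(5), r[3] is 9. The pointer carries no length, so the caller has to remember that it asked for 5 elements.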
1. Yes; array indexing is done in terms of pointer arithmetic: a[i] is defined as *(a + i); we find the address of the i'th element after a and dereference the result. So a could be declared as either a pointer or an array.
2. It would be interpreted as an address, yes (most likely an invalid address). You would need to cast the literal 3 as a pointer, because values of type int and int * are not compatible.
3. Yes, it would be legal. Pointless, but legal.
4. Pointers and arrays are not the same thing; in most circumstances, an expression of array type will be converted ("decay") to an expression of pointer type and its value will be the address of the first element of the array. Declaring a pointer by itself is not sufficient, because unless you initialize it to point to a block of memory (either the result of a malloc call or another array) its value will be indeterminate, and may not point to valid memory.
You really don't want to return arrays; remember that an array expression is converted to a pointer expression, so you're returning the address of the first element. If the array is local to the function, it no longer exists once the function exits, and the pointer value is no longer valid. It's better to pass the array you want to modify as an argument to the function, such as
void foo (int *a, size_t asize)
{
size_t i;
for (i = 0; i < asize; i++)
a[i] = some_value();
}
Pointers contain no metadata about the number of elements they point to, so you must pass that as a separate parameter.
For a 2D array, you'd do something like
void foo(size_t rows, size_t columns, int (*a)[columns])
{
size_t i, j;
for (i = 0; i < rows; i++)
for (j = 0; j < columns; j++)
a[i][j] = some_value;
}
This assumes you're using a C99 compiler or a C2011 compiler that supports variable length arrays; otherwise the number of columns must be a constant expression (i.e., known at compile time).
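A compilable variant of the 2D pattern above, assuming a compiler with variable length array support as just described. The helper names fill_2d and sum_2d and the extra value parameter are additions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a rows x columns array with value. columns must come before a
   in the parameter list so that it can be used in a's type. */
void fill_2d(size_t rows, size_t columns, int (*a)[columns], int value)
{
    assert(a != NULL);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < columns; j++)
            a[i][j] = value;
}

/* Sum every element of a rows x columns array. */
long sum_2d(size_t rows, size_t columns, int (*a)[columns])
{
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < columns; j++)
            total += a[i][j];
    return total;
}
```

Given int grid[3][4], the call fill_2d(3, 4, grid, 7) sets all twelve elements, and sum_2d(3, 4, grid) then returns 84. The array itself is declared by the caller, so nothing needs to be returned.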
These answers certainly call for a bit more depth. The better you understand pointers, the less bad code you will write.
An array and a pointer are not the same, EXCEPT when they are. Off the top of my head:
int a[2][2] = { 1, 2, 3, 4 };
int (* p)[2] = a;
ASSERT (p[1][1] == a[1][1]);
Array "a" functions exactly the same way as pointer "p." And the compiler knows just as much from each, specifically an address, and how to calculate indexed addresses. But note that array a can't take on new values at run time, whereas p can. So the "pointer" aspect of a is gone by the time the program runs, and only the array is left. Conversely, p itself is only a pointer, it can point to anything or nothing at run time.
Note that the syntax for the pointer declaration is complicated. (That is why I came to stackoverflow in the first place today.) But the need is simple. You need to tell the compiler how to calculate addresses for elements past the first column. (I'm using "column" for the rightmost index.) In this case, we might assume it needs to increment the address ((2*1) + 1) to index [1][1].
However, there are a couple of more things the compiler knows (hopefully), that you might not.
The compiler knows two things: 1) whether the elements are stored sequentially in memory, and 2) whether there really are additional arrays of pointers, or just one pointer/address to the start of the array.
In general, a compile time array is stored sequentially, regardless of dimension(s), with no extra pointers. But to be sure, check the compiler documentation. Thus, for the 2x2 array above, if the compiler lets you index a[0][2], in practice it lands on a[1][0], etc., although the standard does not define indexing past the end of a row. A run time array is however you make it. You can make one dimensional arrays of whatever length you choose, and put their addresses into other arrays, also of whatever length you choose.
And, of course, one reason to muck with any of these is because you are choosing from using run time multiplies, or shifts, or pointer dereferences to index the array. If pointer dereferences are the cheapest, you might need to make arrays of pointers so there is no need to do arithmetic to calculate row addresses. One downside is it requires memory to store the additional pointers. And note that if the column length is a power of two, the address can be calculated with a shift instead of a multiply. So this might be a good reason to pad the length up--and the compiler could, at least theoretically, do this without telling you! And it might depend on whether you select optimization for speed or space.
Any architecture that is described as "modern" and "powerful" probably does multiplies as fast as dereferences, and these issues go away completely--except for whether your code is correct.

Does C99 guarantee that arrays are contiguous?

Following a heated comment thread in another question, I ended up debating what is and what is not defined in the C99 standard about C arrays.
Basically, when I define a 2D array like int a[5][5], does the C99 standard guarantee that it will be a contiguous block of ints? Can I cast it to (int *)a and be sure I will have a valid 1D array of 25 ints?
As I understand the standard, the above property is implicit in the sizeof definition and in pointer arithmetic, but others seem to disagree and say that casting the above structure to (int*) gives undefined behavior (even if they agree that all existing implementations actually allocate contiguous values).
More specifically, imagine an implementation that instruments arrays to check array boundaries for all dimensions and returns some kind of error when the array is accessed as a 1D array, or does not give correct access to elements beyond the first row. Could such an implementation be standard compliant? And in this case, what parts of the C99 standard are relevant?
We should begin with inspecting what int a[5][5] really is. The types involved are:
int
array[5] of ints
array[5] of arrays
There is no array[25] of ints involved.
It is correct that the sizeof semantics imply that the array as a whole is contiguous. The array[5] of ints must have 5*sizeof(int), and recursively applied, a[5][5] must have 5*5*sizeof(int). There is no room for additional padding.
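The size relationship can be checked directly, and copying the whole object is one well-defined way to view the same 25 values as a flat array. The sketch below assumes nothing beyond the declaration int a[5][5]; flatten_5x5 is a name invented here, and memcpy is used because it works on the object's bytes rather than walking an int pointer across row boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a 5x5 array into a flat 25-element array, byte for byte.
   src points to the whole 5x5 object, so sizeof *src covers all rows. */
void flatten_5x5(int (*src)[5][5], int flat[25])
{
    assert(src != NULL && flat != NULL);
    memcpy(flat, src, sizeof *src);
}
```

Called as flatten_5x5(&a, flat), this leaves flat[i * 5 + j] equal to a[i][j] for every i and j, which is only possible because sizeof a is exactly 25 * sizeof(int) with no padding between rows.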
Additionally, the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *). So a valid iteration is:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
pi = (int *)pc;
DoSomething(pi);
pc += sizeof(int);
}
Doing the same with an (int *) would be undefined behaviour, because, as said, there is no array[25] of int involved. Using a union as in Christoph's answer should be valid, too. But there is another point complicating this further, the equality operator:
6.5.9.6
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 91)
91) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
This means for this:
int a[5][5], *i1, *i2;
i1 = &a[0][0] + 5;
i2 = &a[1][0];
i1 compares as equal to i2. But when iterating over the array with an (int *), it is still undefined behaviour, because it is originally derived from the first subarray. It doesn't magically convert to a pointer into the second subarray.
Even doing this
char *c = (char *)(&a[0][0]) + 5*sizeof(int);
int *i3 = (int *)c;
won't help. It compares equal to i1 and i2, but it isn't derived from any of the subarrays; it is a pointer to a single int or an array[1] of int at best.
I don't consider this a bug in the standard. It is the other way around: Allowing this would introduce a special case that violates either the type system for arrays or the rules for pointer arithmetic or both. It may be considered a missing definition, but not a bug.
So even if the memory layout for a[5][5] is identical to the layout of a[25], and the very same loop using a (char *) can be used to iterate over both, an implementation is allowed to blow up if one is used as the other. I don't know why it should or know any implementation that would, and maybe there is a single fact in the Standard not mentioned till now that makes it well defined behaviour. Until then, I would consider it to be undefined and stay on the safe side.
I've added some more comments to our original discussion.
sizeof semantics imply that int a[5][5] is contiguous, but visiting all 25 integers by incrementing a pointer like int *p = *a is undefined behaviour: pointer arithmetic is only defined as long as all pointers involved lie within (or one element past the last element of) the same array, which e.g. &a[2][1] and &a[3][1] do not (see C99 section 6.5.6).
In principle, you can work around this by casting &a - which has type int (*)[5][5] - to int (*)[25]. This is legal according to 6.3.2.3 §7, as it doesn't violate any alignment requirements. The problem is that accessing the integers through this new pointer is illegal as it violates the aliasing rules in 6.5 §7. You can work around this by using a union for type punning (see footnote 82 in TC3):
int *p = ((union { int multi[5][5]; int flat[25]; } *)&a)->flat;
This is, as far as I can tell, standards compliant C99.
If the array is static, like your int a[5][5] array, it's guaranteed to be contiguous.
