C global unsized array? - c

We had a school project, any information system using C. To keep a dynamic-sized list of student records, I went for a linked list data structure. This morning my friend let me see his system. I was surprised with his list of records:
#include <stdio.h>
/* and the rest of the includes */
/* global unsized array */
int array[];
int main()
{
int n;
for (n=0; n < 5; n ++) {
array[n] = n;
}
for (n=0; n < 5; n ++) {
printf("array[%d] = %d\n", n, array[n]);
}
return 0;
}
As with the code, he declared an unsized array that is global (in the bss segment) to the whole program. He was able to add new entries to the array by overwriting subsequent blocks of memory with a value other than zero so that he can traverse the array thusly:
for (n=0; array[n]; n++) {
/* do something */
}
He used (I also tested it with) Turbo C v1. I tried it in linux and it also works.
As I have never encountered this technique before, I presume there is a problem with it. So I want to know why this is a bad idea and why a linked list should be preferred over it.

int array[];
This is technically known as an array with incomplete type. At file scope, with no later complete definition in the same translation unit, it is simply equivalent to:
int array[1];
This is not good, simply because:
Writing past that single element produces undefined behavior. The primary use of an incomplete array type is the struct hack. Note that the struct-hack form (a flexible array member as the last member of a struct) was standardized in C99 and was not legal before.

This is Undefined behaviour. You are writing to unallocated memory (beyond the array). In order to compile this, the compiler is allocating at least one element, and you're then writing beyond that. Try a much bigger range of numbers. For example, if I run your code on Linux it works, but if I change the loop to 50,000, it crashes.
EDIT The code may work for small values of n but for larger values it will fail. To demonstrate this I've written your code and tested it for n = 1000.
Here is the link for CODEPAD, and you can see that for n = 1000, a segmentation fault happens.
Whereas with the same code with the same compiler, it is working for n = 10, see this link CODEPAD. So this is called Undefined behavior.

If you use linked lists you can check whether the memory is allocated properly or not.
int *ptr;
ptr = malloc(sizeof(int)); /* needs #include <stdlib.h> */
if(ptr==NULL)
{
printf("No Memory!!!");
}
But with your code the program simply crashes if tested with an array having a large bound.

Related

Having a little trouble understanding memory allocation in C

So I am learning how to program in C, and am starting to learn about dynamic memory allocation. What I know is that not all the time will your program know how much memory it needs at run time.
I have this code:
#include <stdio.h>
#include <stdlib.h> /* for rand() */
int main() {
int r, c, i, j;
printf("Rows?\n");
scanf("%d", &r);
printf("Columns?\n");
scanf("%d", &c);
int array[r][c];
for (i = 0; i < r; i++)
for (j = 0; j < c; j++)
array[i][j] = rand() % 100 + 1;
return 0;
}
So if I wanted to create a 2D array, I can just declare one and put numbers in the brackets. But here in this code, I am asking the user how many rows and columns they would like, then declaring an array with those variables, I then filled up the rows and columns with random integers.
So my question is: Why don't I have to use something like malloc here? My code doesn't know how many rows and columns I am going to put in at run time, so why do I have access to that array with my current code?
You are using a C feature called "variable-length arrays". It was introduced in C99 as a mandatory feature, but support for it is optional in C11 and C18. This alternative to dynamic allocation carries several limitations with it, among them:
because the feature is optional, code that unconditionally relies on it is not portable to implementations that do not support the feature
implementations that support VLAs typically store local VLAs on the stack, which is prone to producing stack overflows if at runtime the array dimension is large. (Dynamically-allocated space is usually much less sensitive to such issues. Large, fixed-size automatic arrays can be an issue too, but the potential for trouble with these is obvious in the source code, and it is less likely to evade detection during testing.)
the program still needs to know the dimensions of your array before its declaration, and the dimensions at the point of the declaration are fixed for the lifetime of the array. Unlike dynamically-allocated space, VLAs cannot be resized.
there are contexts that accommodate ordinary, fixed length arrays, but not VLAs, such as file-scope variables.
Your array is allocated on the stack, so when the function (in your case, main()) exits the array vanishes into the air. Had you allocated it with malloc() the memory would be allocated on the heap, and would stay allocated forever (until you free() it). The size of the array IS known at run time (but not at compile time).
In your program, the array is allocated with automatic storage, aka on the stack, it will be released automatically when leaving the scope of definition, which is the body of the function main. This method, passing a variable expression as the size of an array in a definition, introduced in C99, is known as variable length array or VLA.
If the size is too large, or negative, the definition will have undefined behavior, for example causing a stack overflow.
To avoid such potential side effects, you could check the values of the dimensions and use malloc or calloc:
#include <stdio.h>
#include <stdlib.h>
int main() {
int r, c, i, j;
printf("Rows?\n");
if (scanf("%d", &r) != 1)
return 1;
printf("Columns?\n");
if (scanf("%d", &c) != 1)
return 1;
if (r <= 0 || c <= 0) {
printf("invalid matrix size: %dx%d\n", r, c);
return 1;
}
int (*array)[c] = calloc(r, sizeof(*array));
if (array == NULL) {
printf("cannot allocate memory for %dx%d matrix\n", r, c);
return 1;
}
for (i = 0; i < r; i++) {
for (j = 0; j < c; j++) {
array[i][j] = rand() % 100 + 1;
}
}
free(array);
return 0;
}
Note that int (*array)[c] = calloc(r, sizeof(*array)); is also a variable length array definition: array is a pointer to arrays of c ints. sizeof(*array) is sizeof(int[c]), which evaluates at run time to (sizeof(int) * c), so the space allocated for the matrix is sizeof(int) * c * r as expected.
The point of dynamic memory allocation (malloc()) is not that it allows for supplying the size at run time, even though that is also one of its important features. The point of dynamic memory allocation is that the allocation survives the function return.
In object oriented code, you might see functions like this:
Object* makeObject() {
Object* result = malloc(sizeof(*result));
result->someMember = ...;
return result;
}
This creator function allocates memory of a fixed size (sizeof is evaluated at compile time!), initializes it, and returns the allocation to its caller. The caller is free to store the returned pointer wherever it wants, and some time later, another function
void destroyObject(Object* object) {
... //some cleanup
free(object);
}
is called.
This is not possible with automatic allocations: If you did
Object* makeObject() {
Object result;
result.someMember = ...;
return &result; //Wrong! Don't do this!
}
the variable result ceases to exist when the function returns to its caller, and the returned pointer will be dangling. When the caller uses that pointer, your program exhibits undefined behavior, and pink elephants may appear.
Also note that space on the call stack is typically rather limited. You can ask malloc() for a gigabyte of memory, but if you try to allocate the same amount as an automatic array, your program will most likely segfault. That is the second raison d'être for malloc(): to provide a means to allocate large memory objects.
The classic way of handling a 2D array in 'C' where the dimensions might change is to declare it as a sufficiently sized one dimensional array and then have a routine / macro / calculation that calculates the element number of that 1D array given the specified row, column, element size, and number of columns in that array.
So, let's say you want to calculate the address offset in a table for 'specifiedRow' and 'specifiedCol' and the array elements are of 'tableElemSize' size and the table has 'tableCols' columns. That offset could be calculated as such:
addrOffset = specifiedRow * tableCols * tableElemSize + (specifiedCol * tableElemSize);
You could then add this to the address of the start of the table to get a pointer to the element desired.
This assumes that you address the table as an array of bytes. If you index through a pointer to the element type instead (int or some structure), pointer arithmetic already scales by the element size, so 'tableElemSize' is not needed. It depends upon how you want to lay it out in memory.
I do not think that the way that you are doing it is something that is going to be portable across a lot of compilers and would suggest against it. If you need a two dimensional array where the dimensions can be dynamically changed, you might want to consider something like the MATRIX 'object' that I posted in a previous thread.
How I can merge two 2D arrays according to row in c++
Another solution would be dynamically allocated array of dynamically allocated arrays. This takes up a bit more memory than a 2D array that is allocated at compile time and the elements in the array are not contiguous (which might matter for some endeavors), but it will still give you the 'x[i][j]' type of notation that you would normally get with a 2D array defined at compile time. For example, the following code creates a 2D array of integers (error checking left out to make it more readable):
int **x;
int i, j;
int count;
int rows, cols;
rows = /* read a value from user or file */;
cols = /* read a value from user or file */;
x = calloc(rows, sizeof(int *)); /* one pointer per row */
for (i = 0; i < rows; i++)
x[i] = calloc(cols, sizeof(int)); /* one zeroed row of cols ints */
/* Initialize the 2D array */
count = 0;
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
count++;
x[i][j] = count;
}
}
One thing that you need to remember here is that because we are using an array of arrays, we cannot guarantee that each row sits in the next block of memory: where each allocation lands depends on the allocator and on whatever other allocations and frees have happened in the meantime (as might happen if your code is multithreaded). Even without that, the memory is not going to be contiguous from one array to the next (although the elements within each array will be). There is overhead associated with each memory allocation, and it shows up if you look at the addresses of the 2D array and the 1D arrays that make up the rows. You can see this by printing out the address of the 2D array and each of the 1D arrays like this:
printf("Main Array: %p\n", (void *)x);
for (i = 0; i < rows; i++)
printf(" %p [%04ld]\n", (void *)x[i], (long)((intptr_t)x[i] - (intptr_t)x)); /* intptr_t needs <stdint.h> */
When I tested this with a 2D array with 4 columns, I found that each row took up 24 bytes even though it only needs 16 bytes for the 4 integers in the columns.

Pointers address location

As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that the cell (& ptr2)[3] was identical to the memory cell in &ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is, is it possible to know, only by written code in the notebook, without checking any compiler, that a certain pointer (or all variables in general) follow or precede other variables in memory?
Note, I do not mean array variables, since all the elements of an array must be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element 3 of an array when there is no element 3 of an array, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer itself is stored (the pointer that points to that initializer).
When you do (&ptr2)[3] = something, you try to write 3*sizeof(char *) bytes past the location of ptr2. Nothing guarantees that this memory belongs to any object you may write to. This is invalid, and it may well end in a segmentation fault.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior, it's basically "illegal" and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit, it does however say that accessing random memory is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad, how people mis-understand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare a variable, it goes into one of the regions listed above.
The compiler and linker arrange the BSS and data segment variables as they see fit at build time, so you usually cannot predict their order. The same goes for heap variables (the allocator picks whichever free block fits the request).
On the stack (which is a LIFO structure) variables are placed one on top of the other, so if you have:
int a = 5;
int b = 10;
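a and b will often end up next to each other, but the C standard does not guarantee it: the compiler may reorder them, pad them, or keep them in registers. So even in this case you cannot tell from the source alone.
The one exception is the inside of a structure or an array: array elements are always contiguous, and structure members are laid out in declaration order (possibly with padding between them).
In your code ptr1 is an array of pointers to char, so its three pointers are contiguous, although the strings they point to need not be.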
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(){
const char * const ptr1[] = {"to be","or not to be","that is the question"};
for (int i = 0; i < 3; i++) {
for (int j = 0; j < strlen(ptr1[i]); j++)
printf("%p -> %c\n", (void *)&ptr1[i][j], ptr1[i][j]); /* %p expects a void * */
printf("\n");
}
}
and you will see the memory address and its content!
Have a nice day.

(C) Why can I access array elements beyond the given limit? [duplicate]

So I've been learning C for more than about a year, and never in my studies have I ever thought this was possible:
#include <stdio.h>
#include <stdlib.h>
int main()
{
struct exterior
{
int x;
} *ptr;
ptr = (struct exterior *)malloc(sizeof(struct exterior[3]));
ptr[0].x = 1;
ptr[1].x = 2;
ptr[2].x = 3;
ptr[3].x = 4;
ptr[4].x = 5;
ptr[5].x = 6;
printf("%d %d %d %d %d %d", ptr[0].x, ptr[1].x, ptr[2].x, ptr[3].x, ptr[4].x, ptr[5].x);
return 0;
}
So at first I followed the rules of C; I allocated the memory required for 3 structure array elements to a structure pointer. I used the pointer to access the variable that was in the structure, while using an index to specify the structure array element.
For some reason, I then decided to try to access the array element beyond the given limit, even if I knew that the outcome would probably be the program crashing, but I did it anyways.
To my surprise, there was no crash.
Instead, the program worked. It printed out the value I had given to the variable with no problems. How is this possible?
Later on, I tried it with an int array. It worked as well! Am I doing something wrong?
When you create an array in C, the program allocates the memory you need for it and gives you a pointer to the first element. So when you write array[0], you are adding 0 to the base pointer of that array; array[1] adds 1 element (sizeof(int) bytes for an int array, typically 4) to the initial pointer so you reach the second element, and so on. (Don't forget that an array is a contiguous segment of memory; every value sits right after the previous one.) If you access a position outside the array, C performs no bounds check: the program simply reads or writes whatever memory the pointer lands on. That is undefined behavior. It often appears to work or yields garbage, but it can also corrupt other data or crash. The language lets you do pretty much anything, so staying in bounds is your job!
Hope it helps :)

Array initialization at runtime

#include <stdio.h>
int main(void) {
// your code goes here
int a = 2;
int b = 3;
int c;
c = a + b;
int arr[c];
arr[5] = 0;
printf("%d",arr[5]);
return 0;
}
Output is 0
How is it that at runtime it is taking the array number ? Is it a new feature ?
This is a variable length array. They were introduced in the 1999 revision of the C standard.
Sadly support for them came in slowly, so much that the 2011 revision made them an optional feature (but they are still standardized) 1.
Despite looking cool, they have a major caveat. They can cause you to overflow the call stack if the size is "too big". As such, care needs to be taken when using them.
1 Some compiler vendors were resistant, so it was made optional to appease them. Microsoft is an entire case study of this.
This feature (variable-length arrays) was introduced in C99, but support for it is still compiler-dependent: some compilers (like gcc) support it, and some (like MSVC) don't.
BTW, arr[5] in your code, is out of range. Last element should be arr[4].
Don't confuse static (fixed-size) memory allocation with dynamic memory allocation :)
Let me clear up the concept.
Relevant to the question:
C code commonly distinguishes two kinds of array.
1. Static array - its size is fixed at compile time.
2. Dynamic array - its size is chosen at run time.
Q. How do you determine whether an array is static or dynamic?
A.
Array declaration syntax:
int array_Name[size]; // size defines the size of the block of memory for the array
So, coming to the point:
Point 1. If the size is given to the array at compile time, it's "static memory allocation". It is also called "fixed-size memory allocation" because the size never changes. This is the limitation of plain arrays in C.
Example:
int arr[10]; // 10 is the size of arr, which is statically defined
int brr[] = {1000, 2, 37, 755, 3}; // size equals the number of initializer values
Point 2. If the size is given to the array at run time, it's dynamic memory allocation.
It is usually achieved with the malloc() function declared in stdlib.h; since C99, a variable-length array such as int arr[c]; also gets its size at run time.
Now, a walk-through of your code:
#include <stdio.h>
int main(void) {
// your code goes here
int a = 2;
int b = 3;
int c;
c = a + b; // c is calculated at run time
int arr[c]; // variable-length array: sized from c (5) at run time and never resized afterwards
arr[5] = 0; // out of bounds: valid indexes are 0 to 4, so this is undefined behavior
printf("%d",arr[5]);
return 0;
}
So arr has 5 elements and arr[5] is one past the end; printing 0 is just one possible outcome of the undefined behavior.
The elements arr[0] to arr[4] are never initialized, so they hold indeterminate (garbage) values.
I hope this solves your problem :)

Difficulty in understanding variable-length arrays in C

I was reading a book when I found that array size must be given at the time of declaration or allocated from the heap using malloc at runtime. I wrote this program in C:
#include<stdio.h>
int main() {
int n, i;
scanf("%d", &n);
int a[n];
for (i=0; i<n; i++) {
scanf("%d", &a[i]);
}
for (i=0; i<n; i++) {
printf("%d ", a[i]);
}
return 0;
}
This code works fine.
My question is how this code can work correctly. Isn't it a violation of the basic concept of C that array size must be declared before runtime, or allocated using malloc() at runtime? I'm not doing either of these two things, so why is it working properly?
The solution to my question is variable length arrays, which are supported in C99. But if I play around with my code and put the statement int a[n]; above scanf("%d", &n); then it stops working. Why is that so, if variable length arrays are supported in C?
The C99 standard supports variable length arrays. The length of these arrays is determined at runtime.
Since C99 you can declare variable length arrays at block scope.
Example:
void foo(int n)
{
int array[n];
// Initialize the array
for (int i = 0; i < n; i++) {
array[i] = 42;
}
}
C will be happy as long as you've declared the array and allocated memory for it before you use it. One of the "features" of C is that it doesn't validate array indices, so it's the responsibility of the programmer to ensure that all memory accesses are valid.
Variable length arrays are a new feature added to C in C99.
"variable length" here means that the size of the array is decided at run-time, not compile time. It does not mean that the size of the array can change after it is created. The array is logically created where it is declared. So your code looks like.
int n, i;
Create two variables n and i. Initially these variables are uninitialised.
scanf("%d", &n);
Read a value into n.
int a[n];
Create an array "a" whose size is the current value of n.
If you swap the second and third steps, you try to create an array whose size is determined by an uninitialised value. This is not likely to end well.
The C standard does not specify exactly how the array is stored, but in practice most compilers (I believe there are some exceptions) will allocate it on the stack. The normal way to do this is to copy the stack pointer into a "frame pointer" as part of the function preamble. This then allows the function to dynamically modify the stack pointer while keeping track of its own stack frame.
Variable length arrays are a feature that should be used with caution. Compilers typically do not insert any form of overflow checking on stack allocations. Operating systems typically insert a "guard page" after the stack to detect stack overflows and either raise an error or grow the stack, but a sufficiently large array can easily skip over the guard page.

Resources