Automatic resizing of array, initialized by auto-count? [duplicate] - c

This question already has answers here:
int LA[] = {1,2,3,4,5} memory allocation confusion in c
(3 answers)
Closed 6 years ago.
I am teaching Programming 1 to some college-level students at the moment, and I specifically told them to go out and look online for references, especially on the data structure parts I am covering now. Today one student emailed me a link to tutorialspoint.com and asked about this piece of code he pulled from there:
#include <stdio.h>
main() {
    int LA[] = {1,3,5,7,8};
    int item = 10, k = 3, n = 5;
    int i = 0, j = n;
    printf("The original array elements are :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    n = n + 1;
    while( j >= k) {
        LA[j+1] = LA[j];
        j = j - 1;
    }
    LA[k] = item;
    printf("The array elements after insertion :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
}
Without knowing exactly where it's from, I don't know exactly how they described it, but it is obviously an insertion of a value into an array at index k, shifting the elements upwards from k.
What he asked about relates to something I have told my students: when doing something like:
int arr[] = {1,2,3,4};
the compiler will auto-count the size from the values in the initializer list; in this case the array has 4 elements. I have also told them that an array's size is fixed when the array is first defined, like:
int likethis[5];
int orthis[] = {1,2,3,4};
int orlikeso[MAX_ARR_SIZE];
Thus, to resize an array, dynamic memory management is needed, so that you allocate space for a new array (a part of the course they have yet to get to).
But the code from this tutorial site actually seems to do an auto-size by the compiler with the initializer list, then go about merrily resizing it in the loop, when shuffling.
So the final size of LA in their example would be 6 elements. Now, my student wants to know why this is valid. I have not tested this code myself, but apparently it compiles with GCC, according to my student. If so, how can that code be valid? Wouldn't this write past the bounds of LA when setting LA[5] in the shift loop?
Questions: Am I just an old geezer, and this has been allowed in C since way back? Or only in GCC? Since I learned C somewhere in the 80s, I may be wrong here, but to me it is writing past the allocated size of LA. I just wanted to check it on S.O.

But the code from this tutorial site actually seems to do an auto-size by the compiler with the initializer list, then go about merrily resizing it in the loop, when shuffling.
The code only appears to do that. In reality, the code causes undefined behavior as soon as it touches index 5 of a five-element array.
Now, my student wants to know why this is valid.
He should have started with a simpler "is this valid" question. The answer to it would be "no". The code will compile, and may even appear to work, but this code is invalid.
Unfortunately, there is no easy way to demonstrate it to students at the early stages of learning C, because reading memory profiler reports (say, valgrind) is an advanced skill. On the other hand, if the students have enough determination to learn how to run their code through a memory profiler, they are in for a very rewarding experience: gaining real confidence in their code.
Note: I think this is a great teaching moment, because it lets you teach the student an important point about undefined behavior in C, and also reinforce that the rule "you shouldn't trust things just because you found them on the internet" applies to code as well.

By attempting to write past the last element of the array, the code invokes undefined behavior, which means it may crash outright, silently corrupt data, or appear to run without any problems.
There may be some padding or scratch space that the extra element is being written to, which is why it isn't crashing, but this code is not valid.

To answer your question, the code is simply not valid. The array overflows, but the bug is not visible (although enabling size optimization in the compiler may make a crash more likely).
To help you spot the overflow, I suggest running the code under Valgrind, as it will spot the overflow for you.
Edit: I ran Valgrind with memcheck and it didn't spot that overflow, which surprised me. (Memcheck checks heap blocks but does not do bounds checking on stack arrays such as LA, which explains this.)

There is no such thing as automatic resizing of arrays in C. What is happening here is known as a "buffer overflow". (See the answer at Memory confusion for strncpy in C for more details on the possible side effects of a buffer overflow.)
To show that the size of LA has not changed at all, you can print its element count at the beginning and at the end of the program, as below:
#include <stdio.h>
int main() {
    int LA[] = {1,3,5,7,8};
    int item = 10, k = 3, n = 5;
    int i = 0, j = n;
    printf("The original array elements are :\n");
    /* sizeof yields a size_t, so print it with %zu */
    printf("Number of elements in LA = %zu\n", sizeof(LA)/sizeof(LA[0]));
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    n = n + 1;
    while( j >= k) {
        LA[j+1] = LA[j];   /* still overflows: kept to show the size never changes */
        j = j - 1;
    }
    LA[k] = item;
    printf("The array elements after insertion :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    printf("Number of elements in LA = %zu\n", sizeof(LA)/sizeof(LA[0]));
}

Related

Declaring Array in C with variable length [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
This block of code prints the Fibonacci numbers:
#include <stdio.h>
int main()
{
    int n; //integer overflow error for n > 47
    printf("How many Fibonacci numbers?\n");
    scanf("%d", &n);
    int fibs[n];
    fibs[0] = 0;
    fibs[1] = 1;
    printf("%d ", fibs[0]);
    printf("%d ", fibs[1]);
    for(int i = 2; i < n; i++)
    {
        fibs[i] = fibs[i - 2] + fibs[i - 1];
        printf("%d ", fibs[i]);
    }
    return 0;
    //gives 0 1 1 2 3 5 8 13 21 34 for n = 10
}
But this version gives me the wrong output, with no errors:
#include <stdio.h>
int main()
{
    int n, fibs[n];//change
    printf("How many Fibonacci numbers?\n");
    scanf("%d", &n);
    fibs[0] = 0;
    fibs[1] = 1;
    printf("%d ", fibs[0]);
    printf("%d ", fibs[1]);
    for(int i = 2; i < n; i++)
    {
        fibs[i] = fibs[i - 2] + fibs[i - 1];
        printf("%d ", fibs[i]);
    }
    return 0;
    //gives 0 1 for n = 10
}
I know it definitely has something to do with the array and its size not being defined, but I'm having trouble understanding what exactly the problem is.
Could someone explain what is going on here?
int n, fibs[n]; attempts to define an array using n for the length, but n has not been initialized, so its value is not determined. Common consequences include:
The definition behaves as if n has some small value, possibly zero, and then the following code attempts to store values in the array but overruns the memory reserved for it and thus destroys other data needed by the program.
The definition behaves as if n has some large value, causing the stack to overflow and the program to be terminated.
For example, storing 0 into fibs[0] or 1 into fibs[1] might write to the memory reserved for n. Then the for loop terminates without executing any iterations because the test i < n is false.
The one big thing that I see in your code is the line int n, fibs[n];. The variable n is a local variable (typically located on the stack), which means that its value can literally be anything before it's initialized. And since you are declaring an array using that value, the array has a random, unknown length. If it works, that is purely coincidence. Your first version works because there the array is declared AFTER the scanf that initializes n. I think a better way of creating an array with a variable number of elements is to use malloc instead:
int n, *fibs;  /* requires <stdio.h> and <stdlib.h> */
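printf("How many Fibonacci numbers?\n");
/* reject failed input and counts too small for fibs[0] and fibs[1] */
if (scanf("%d", &n) != 1 || n < 2)
{
    fprintf(stderr, "Please enter an integer of at least 2.\n");
    exit(1);
}
fibs = malloc(sizeof(int) * n);
if (fibs == NULL)
{
    fprintf(stderr, "Unable to allocate sufficient memory for operation.\n");
    exit(1);
}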
Then you can use array indices fibs[0], fibs[1], etc. to access different locations in the block of memory, and call free(fibs) once you no longer need it.
Why does this work? Because an array like int fibs[n] behaves LIKE a pointer to a block of memory: in most expressions an array is converted to a pointer to its first element. Technically they are not the same, but you can generally use a pointer to a block of memory as an array. This only works directly for one-dimensional arrays, because with a plain int * the compiler has no idea how many columns there are. To work around that, you can compute the index manually like this (i is the row, j is the column):
array[i * columns + j];

Why is the use of unrelated printf statement causing changes in my program output?

I'm stuck with a program where just having a printf statement is causing changes in the output.
I have an array of n elements. For each window of d consecutive elements, if the element right after the window is greater than or equal to twice the window's median, I increment the value of notifications. The complete problem statement can be found here.
This is my program:
#include <math.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#define RANGE 200
float find_median(int *freq, int *ar, int i, int d) {
    int *count = (int *)calloc(sizeof(int), RANGE + 1);
    for (int j = 0; j <= RANGE; j++) {
        count[j] = freq[j];
    }
    for (int j = 1; j <= RANGE; j++) {
        count[j] += count[j - 1];
    }
    int *arr = (int *)malloc(sizeof(int) * d);
    float median;
    for (int j = i; j < i + d; j++) {
        int index = count[ar[j]] - 1;
        arr[index] = ar[j];
        count[ar[j]]--;
        if (index == d / 2) {
            if (d % 2 == 0) {
                median = (float)(arr[index] + arr[index - 1]) / 2;
            } else {
                median = arr[index];
            }
            break;
        }
    }
    free(count);
    free(arr);
    return median;
}
int main() {
    int n, d;
    scanf("%d %d", &n, &d);
    int *arr = malloc(sizeof(int) * n);
    for (int i = 0; i < n; i++) {
        scanf("%i", &arr[i]);
    }
    int *freq = (int *)calloc(sizeof(int), RANGE + 1);
    int notifications = 0;
    if (d < n) {
        for (int i = 0; i < d; i++)
            freq[arr[i]]++;
        for (int i = 0; i < n - d; i++) {
            float median = find_median(freq, arr, i, d); /* Counting-sorts arr[i..i+d-1] and returns the median */
            if (arr[i + d] >= 2 * median) { /* If arr[i+d] is greater than or equal to twice the median, increment notifications */
                printf("X");
                notifications++;
            }
            freq[arr[i]]--;
            freq[arr[i + d]]++;
        }
    }
    printf("%d", notifications);
    return 0;
}
For large inputs like this one, the program outputs 936 as the value of notifications, whereas when I just remove the statement printf("X"), the program outputs 1027.
I'm really not able to understand what is causing this behavior in my program, and what I'm missing/overseeing.
Your program has undefined behavior here:
for (int j = 0; j <= RANGE; j++) {
    count[j] += count[j - 1];
}
You should start the loop at j = 1. As coded, you access memory before the beginning of the array count, which could cause a crash or produce an unpredictable value. Changing anything in the running environment can lead to a different behavior. As a matter of fact, even changing nothing could.
The rest of the code is more difficult to follow at a quick glance, but given the computations on index values, there may be more problems there too.
For starters, you should add some consistency checks:
verify the return value of scanf() to ensure proper conversions.
verify the values read into arr, they must be in the range 0..RANGE
verify that int index = count[ar[j]] - 1; never produces a negative number.
same for count[ar[j]]--;
verify that median = (float)(arr[index] + arr[index - 1]) / 2; is never evaluated with index == 0.
Your program has undefined behavior (at several occasions). You really should be scared (and you are not scared enough).
I'm really not able to understand what is causing this behavior in my program
With UB, that question is pointless. You need to dive into implementation details (e.g. study the generated machine code of your program, and the code of your C compiler and standard library) to understand anything more. You probably don't want to do that (it could take years of work).
Please read, as soon as possible, Lattner's blog post What Every C Programmer Should Know About Undefined Behavior.
what I'm missing/overseeing.
You don't understand UB well enough. Be aware that a programming language is a specification (and you code against it), not a piece of software (e.g. your compiler). Program semantics matter.
As I said in comments:
compile with all warnings and debug info (gcc -Wall -Wextra -g with GCC)
improve your code to get no warnings; perhaps try also another compiler like Clang and work to also get no warnings from it (since different compilers give different warnings).
consider using some version control system like git to keep various variants of your code, and some build automation tool.
think more about your program and invariants inside it.
use the debugger (gdb), in particular with watchpoints, to understand the internal state of your process; and have several test cases to run under the debugger and without it.
use instrumentation facilities such as the address sanitizer -fsanitize=address of GCC and tools like valgrind.
use rubber duck debugging methodology
sometimes consider static source code analysis tools (e.g. Frama-C). They require expertise to use, and/or give many false positives.
read more about programming (e.g. SICP) and about the C Programming Language. Download and study the C11 programming language specification n1570 (and be very careful about every mention of UB in it). Read carefully the documentation of every standard or external function you are using. Study also the documentation of your compiler and of other tools. Handle error and failure cases (e.g. calloc and scanf can fail).
Debugging is difficult (e.g. because of the Halting Problem, of Heisenbugs, etc...) - but sometimes fun and challenging. You can spend weeks on finding one single bug. And you often cannot understand the behavior of a buggy program without diving into implementation details (studying the machine code generated by the compiler, studying the code of the compiler).
PS. Your question shows a wrong mindset (which you should improve) and a misunderstanding of UB.

Code to change an array element changes a different variable

I'm quite puzzled by why my variable NumberOfArrays changes the second time through the for loop in my code. Can anyone help me out?
#include <stdio.h>
#include <cs50.h>
int main(int argc, string argv[])
{
    //variable declarations
    int NumberOfArrays = 0;
    int arrayRack[0];
    //Get number of arrays
    printf("Key in the number of arrays you'd like to have\n");
    NumberOfArrays = GetInt();
    //Get number for each element in arrayRack[]
    for(int i = 0; i < NumberOfArrays; i++)
    {
        printf("give me an int for the %i th array\n", i + 1);
        arrayRack[i] = GetInt();
        // *** on the second pass, my "NumberOfArrays" gets adjusted to my GetInt number here. Why?
    }
    //print out numbers stored in respective arrays
    for(int j = 0; j < NumberOfArrays; j++)
    {
        printf("{%i}<-- number in %ith array\n", arrayRack[j], j + 1);
    }
    return 0;
}
Because you declared arrayRack as an empty array ([0]). Try int arrayRack[100]; or some other number, and make sure that NumberOfArrays is less than that number before you use it.
Explanation (note that this may vary by compiler): your variables are most likely stored on the stack at nearby memory addresses, so arrayRack points somewhere close to NumberOfArrays in memory. C doesn't generally check whether you've run off the end of an array, so accessing arrayRack[1] doesn't cause a compiler error in this situation. However, arrayRack[1] isn't part of your array, so accessing it actually accesses something else: in this situation, NumberOfArrays.
Edit: gcc permits length-0 arrays but does not allocate space for them (per this). However, length-0 arrays are prohibited by the C standard (e.g., see this, the answers to this, and this). Given the behaviour you've seen, it looks to me like the compiler is allocating one int's worth of space on the stack, pointing arrayRack at that space, and packing that space right next to NumberOfArrays. As a result, &(arrayRack[1]) == &NumberOfArrays. In any event, using variable-length arrays as suggested by @dasblinkenlight is a cleaner way to handle this situation.
In general, given int arrayRack[N];, you can only safely access arrayRack[0] through arrayRack[N-1].
You declared the array too early. Move the declaration to after the call of GetInt(), like this:
printf("Key in the number of arrays you'd like to have\n");
int NumberOfArrays = GetInt();
int arrayRack[NumberOfArrays];
Note: NumberOfArrays is not an ideal name for the variable, because it denotes the number of array elements, not the number of arrays; your code has only one array.

int LA[] = {1,2,3,4,5} memory allocation confusion in c

I have observed that the memory allocated for an array seems to be dynamic.
Here is the sample code I found in this tutorial:
#include <stdio.h>
main() {
    int LA[] = {1,3,5,7,8};
    int item = 10, k = 3, n = 5;
    int i = 0, j = n;
    printf("The original array elements are :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    n = n + 1;
    while( j >= k){
        LA[j+1] = LA[j];
        j = j - 1;
    }
    LA[k] = item;
    printf("The array elements after insertion :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
}
and sample output:
The original array elements are :
LA[0]=1
LA[1]=3
LA[2]=5
LA[3]=7
LA[4]=8
The array elements after insertion :
LA[0]=1
LA[1]=3
LA[2]=5
LA[3]=10
LA[4]=7
LA[5]=8
I don't understand how this works.
First, a general statement: for an array defined without an explicit size and initialized with a brace-enclosed initializer list, the size depends on the number of elements in the initializer list. So, for your array
int LA[] = {1,3,5,7,8};
size will be 5, as you have 5 elements.
C uses 0-based array indexing, so the valid access will be 0 to 4.
In your code
LA[j+1] = LA[j];
tries to access index 6 (5 + 1), which is an out-of-bounds access. This invokes undefined behavior.
The output of code that has UB cannot be justified in any way.
That said, main() is technically an invalid signature according to the latest C standards, since implicit int was removed in C99. You need to use at least int main(void) to make the code conforming for a hosted environment.
The code has a buffer overflow bug! Arrays in C cannot be extended! You need to allocate enough space when you declare/define it.
You can declare additional space by supplying a size in the declaration:
int LA[10] = {1,3,5,7,8};
LA will now have room for 10 elements with index 0 through 9.
If you want more flexibility you should use a pointer and malloc/calloc/realloc to allocate memory.
Note:
There is a second bug in the copying. The loop starts one step too far out.
With j starting at 5 and assigning to index j+1, the code assigns LA[6], which is the 7th element. After the insertion there are only 6 elements.
My conclusion from these 2 bugs is that the tutorial was neither written nor reviewed by an experienced C programmer.
To add on to the other answers, C/C++ do not do any bounds checking for arrays.
In this case you have a stack-allocated array, so as long as your index stays within the stack space, there will usually be no visible "errors" at runtime. However, since you are leaving the bounds of your array, you may end up changing the values of other variables that are also allocated on the stack, if their memory happens to lie immediately after the array. This is one of the dangers of buffer overflows and can cause very bad things to happen in more complex programs.

Declared array of size [x][y] and another array with size [y-1]

I am using Code::Blocks 10.05, and the GNU GCC Compiler.
Basically, I ran into a really strange (and, for me, inexplicable) issue that arises when trying to initialize an array outside its declared size. In words, it's this:
*There is a declared array of size [x][y].
*There is another declared array with size [y-1].
The issue comes up when trying to put values into this second, size [y-1] array outside of its [y-1] size. When this is attempted, the first array [x][y] no longer keeps all of its values. I simply don't understand why breaking (or attempting to break) one array would affect the contents of the other. Here is some sample code that shows it happening (it is in the broken form; to see the issue vanish, simply change array2[4] to array2[5], which eliminates what I have pinpointed as the problem):
#include <stdio.h>
int main(void)
{
    //Declare the array/indices
    char array[10][5];
    int array2[4]; //to see it work (and verify the issue), change 4 to 5
    int i, j;
    //Set up use of an input text file to fill the array
    FILE *ifp;
    ifp = fopen("input.txt", "r");
    //Fill the array
    for (i = 0; i <= 9; i++)
    {
        for (j = 0; j <= 5; j++)
        {
            fscanf(ifp, "%c", &array[i][j]);
            //printf("[%d][%d] = %c\n", i, j, array[i][j]);
        }
    }
    for (j = 4; j >= 0; j--)
    {
        for (i = 0; i <= 9; i++)
        {
            printf("[%d][%d] = %c\n", i, j, array[i][j]);
        }
        //PROBLEM LINE*************
        array2[j] = 5;
    }
    fclose(ifp);
    return 0;
}
So does anyone know how or why this happens?
Because when you write outside an array's bounds, C lets you. You're just writing to somewhere else in the program's memory.
C is known as the lowest-level high-level language. To understand what "low level" means, remember that each variable you create can be thought of as living in physical memory. An array of integers of length 16 might occupy 64 bytes if integers are 4 bytes each. Perhaps they occupy bytes 100-163 (unlikely, but I'm not going to make up realistic numbers; also, addresses are usually better thought of in hexadecimal). What occupies byte 164? Maybe another variable in your program. What happens if you write one past the end of your array of 16 integers? Well, it might write to that byte.
C lets you do this. Why? If you can't think of any answers, then maybe you should switch languages. I'm not being pedantic - if this doesn't benefit you then you might want to program in a language in which it is a little harder for you to make weird mistakes like this. But reasons include:
It's faster and smaller. Adding bounds checking takes time and space, so if you're writing code for a microprocessor, or writing a JIT compiler, speed and size really do matter a lot.
If you want to understand machine architecture and go into hardware, e.g. if you're a student, it's a good gateway from programming into OS/hardware/electrical engineering. And much of computer science.
Being close to machine code, it serves as a common standard that many other languages and systems must, or can easily, support some degree of compatibility with.
Other reasons that I would be able to give if I ever actually had to work this close to the machine code.
The moral is: In C, be very careful. You must check your own array bounds. You must clean up your own memory. If you don't, your program often won't crash but will start just doing really weird things without telling you where or why.
for (j = 0; j <= 5; j++)
should be
for (j = 0; j <= 4; j++)
and array2 max index is 3 so
array2[j] = 5;
is also going to be a problem when j == 4.
C array indexes start from 0, so the valid indexes of an [X] array are 0 to X-1, giving X elements in total.
You should use the < operator, instead of <=, in order to show the same number in both the array declaration [X] and in the expression < X. For instance
int array[10];
...
for (i=0 ; i < 10 ; ++i) ... // instead of `<= 9`
This is less error prone.
If you're outside the bounds of one array, there's always a possibility you'll be inside the bounds of the other.
array2[j] = 5; - This is your problem of overflow.
for (j = 0; j <= 5; j++) - This is also a problem of overflow. Here, too, you are trying to access index 5, whereas only indexes 0 to 4 are valid.
In process memory, each function call creates an activation record (stack frame) that holds the function's local variables, plus some extra memory for the return address. Your function's local variables (array, array2, i, j and ifp) are laid out in that frame in some order. So if an overflow happens, it first overwrites the variable declared just above or below, depending on the architecture and compiler. If the overflow spans more bytes, it may corrupt the stack itself by overwriting other data, such as the return address or the local variables of the calling functions. This may lead to a crash; sometimes it does not, but the program then behaves unpredictably, as you are seeing now.
