Subscripts going out of range in an array - c

Here the subscripts of the array are out of range.
int a[10], i;
for (i = 1; i <= 10; i++)
a[i] = 0;
printf("India");
The output is an infinte loop and printf statment doesnt get executed. here's what written in K N King:
When i reaches 10. the program stores 0 into a [10] . But a [10] doesn’t exist.
so 0 goes into memory immediately after a [9] . If the variable i happens to follow
a [9] in memory—as might be the case—then i will be reset to O. causing the
loop to start over.
Can anyone explain it ?

The idea that this will form an infinite loop is based on an assumption about how the variables will be laid out in memory. In particular, it assumes that because i is defined immediately after a, that it will also be allocated in memory immediately after a.
That's certainly not guaranteed, but it equally certainly could happen. If it does, then the write to a[10] may actually overwrite i. Since it's writing 0 into the nonexistent a[10], doing so actually writes 0 into i. Then when the condition in the loop checks that i <= 10, that's true, so the loop continues -- and each time i gets to 10, it's immediately overwritten with 0 before the loop condition is evaluated, so the loop re-starts from the beginning.
As far as either the C or C++ standard cares, it's just undefined behavior--when the code writes past the end of the array anything can happen. It might do what somebody expects, or might might do something entirely different and unrelated that doesn't seem to make sense at all. The compiler is free to emit code that does pretty much anything in such a circumstance (or it could, for example, diagnose it as an error, and emit no code at all).
To give some idea of what conforming behavior could be: early versions of gcc had code to detect a specific case of implementation-defined behavior (pretty much like undefined behavior, except the implementation has to document what it does). In this case, the documented behavior was fairly complex. The compiler would attempt to do each of the following in order (and stop at the first one that succeeded):
run nethack (a game)
run rogue (another game)
start emacs, and have it execute a towers of hanoi simulation
print out "You are in a maze of twisty little passages, all alike".
I could be getting the order of those a bit wrong (this was a long time ago), but you get the idea. The result had nothing to do with anything a reasonable person would be likely to expect.

Related

Multithreading in C with CLion (Windows)

I have a section of code which is really time-consuming (~2s), and can be executed without the last one have finished.
for (x = last; x <= end && !found; x++) {
found = combine(combinaciones, x, end, &current);
}
Note that combine() it's the function I was talking about. But before the loop starts again (this for it's inside another loop) all the functions must have finished.
Also, combine() receives some arguments, combinaciones it's a pointer, and current an integer (by reference). The integer gets modified over time (and must get modified on the other active functions) and combinaciones it's a linked list (size of which is current-1). The 2nd and 3rd argument are just integers. The return it's a boolean that terminates the loop, but I don't matter if they make some extra operations.
I stress the importance of current. If current is equal to 5, combinaciones's 5th element will be the created, and then current will be 6. I mean that if two functions are allocating memory they souldn't allocate to the same position.
If that can't be done I can make the operations in parallel, and then allocate the memory on the main().
So, I started searching about multithreading and I found <pthread.h>, but it only worked on Linux (not on CLion). I found something caled fork, I don't know either if can be used. Would you tell me a Windows multithreading API, with an example, and the CLion implementation?
The code is huge and I don't think that would help a lot, with I have said should be enought. However, if something is unclear let me know. The code itself it's published on GitHub, but the (few) comments it has are in Spanish.
Thanks for your help!
Using Wsl (as #Singh said) I could use pthread.h. Then I just adapted my code. A very annoying bug that I had was that "IDs" (current variable) were assigned non-continuously (the first thread had "3", and the second one had "2"). I was able to fix it by using a global variable (pthread_mutex_t) and pthread_mutex_lock(), pthread_mutex_unlock(). For more information you can still check my code on GitHub.

Backtracking algorithm crashes with hard cases (C)

I am actually making an algorithm that takes as an input a file containing tetriminoses (figures from tetris) and arranges them in smallest possible square.
Still I encounter a weird problem :
The algorithm works for less than 10 tetriminoses (every time) but start crashing with 11, 12... (I concluded that it depends on how complicated the solution is, as it finds some 14 and 15 solutions).
But the thing is, if I add an optimisation flag like -Ofast (the program is written in C) it works for every input I give him no matter how much time it takes (sometimes more than a hour..).
First I had a lot of leaks (I was using double linked list) so I changed for an Int Array, no more leaks, but same problem.
I tried using the debugger but it makes no sense (see picture) :
Debugger says my variables do not exist anymore but all I do is increment or decrement them.
For this example it just stopped while everything is fine (values of variables are correct)
Here is the link of my main function (the one that do the backtracking):
https://github.com/Caribou123/fillitAG/blob/master/19canplace/solve.c
The rest of program (same repository) consist of functions to put tetriminoses in my array, remove them from it, or print the result.
Basically I try placing a tetri, if I have enough space, I place the next one, otherwise I remove the last one and place it to the next available position etc..
Also I first thought that I were trying to place something outside of the Array, so now my Array is way bigger than it should be and filled with -1 for invalid cases (so in the worst case I just rewrite a -1), 0 for free ones, and integers values from 1 to 26 for figures.
The fact that the program works with the flag -Ofast really troubles me, as the algorithm seems to work perfectly, what could cause my program to crash ?
Here is how I tracked the number of recursions, by adding two static variables
And here is the output
(In case you want to test it yourself, use the 19canplace folder, and compile with : gcc *.c libft/libft.a)
Thanks in advance for your time,
Artiom

What does an "extra" semicolon means? [duplicate]

This question already has answers here:
Two semicolons inside a for-loop parentheses
(4 answers)
Endless loop in C/C++ [closed]
(12 answers)
Closed 4 years ago.
I was searching for an explanation for the "double semicolon" in the code:
for(;;){}
Original question.
I do not have enough reputation to even leave a comment so I need to create a new question.
My question is,
What does an "extra" semicolon means. Is this "double semicolon" or "extra semicolon" used for "something else"?
The question comes from a person with no knowledge of programing, just powered on my first arduino kit and feel like a child when LED blinked.
But it is the reality that the questions, like general occurence, are radiating the actual knowledge of the person asking the question.
And beyond "personal preference" in using while(1) or for(;;) is it helpful in using for(;;) where you do not have enough room for the code itself?
I have updated the question.
Thank you for the quick explanation. You opened my eyes with the idea of not using anything in for loop = ). From basic high school knowledge I am aware of for loop so thank you for that.
So for(;;) returns TRUE "by default"?
And my last line about the size of the code?
Or it is compiled and using for or while actually does not affect the compiled code size?
A for loop is often written
int i;
for (i = 0; i < 6; i++) {
...
}
Before the first semicolon, there is code that is run once, before the loop starts. By omitting this, you just have nothing happen before the loop starts.
Between the semicolons, there is the condition that is checked after every loop, and the for loop end if this evaluates to false. By omitting this, you have no check, and it always continues.
After the second semicolon is code that is run after every cycle, before that condition is evaluated. Omitting this just runs additional code at that time.
The semicolons are always there, there is nothing special about the face that they are adjacent. You could replace that with
for ( ; ; )
and have the same effect.
While for (;;) itself does not return true (It doesn't return anything), the second field in the parentheses of the for always causes the loop to continue if it is empty. There's no reason for this other than that someone thought it would be convenient. See https://stackoverflow.com/a/13147054/5567382. It states that an empty condition is always replaced by a non-zero constant, which is the case where the loop continues.
As for size and efficiency, it may vary depending on the compiler, but I don't see any reason that one would be more efficient than the other, though it's really up to the compiler. It could be that a compiler would allow the program to evaluate as non-zero a constant in the condition of a while loop, but recognise an empty for loop condition and skip comparison, though I would expect the compiler to optimize the while loop better. (A long way of saying that I don't know, but it depends on a lot.)

How to index arrays using pointers safely

Edit: If you fundamentally disagree with the Fedora guide here, please explain why this approach would be worse in an objective way than classic loops. As far as I know even the CERT standard doesn't make any statement on using index variables over pointers.
I'm currently reading the Fedora Defensive Coding Guide and it suggests the following:
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer past the
last element of the array, and calculate the number of remaining
elements by substracting the current position from that pointer. The
alternative, updating a separate variable every time when the position
is advanced, is usually less obviously correct.
This means for a given array
int numbers[] = {1, 2, 3, 4, 5};
I should not use the classic
size_t length = 5;
for (size_t i = 0; i < length; ++i) {
printf("%d ", numbers[i]);
}
but instead this:
int *end = numbers + 5;
for (int *start = numbers; start < end; ++start) {
printf("%d ", *start);
}
or this:
int *start = numbers;
int *end = numbers + 5;
while (start < end) {
printf("%d ", *start++);
}
Is my understanding the recommendation correct?
Is my implementation correct?
Which of the last 2 is safer?
Your understanding of what the text recommends is correct, as is your implementation. But regarding the basis of the recommendation, I think you are confusing safe with correct.
It's not that using a pointer is safer than using an index. The argument is that, in reasoning about the code, it is easier to decide that the logic is correct when using pointers. Safety is about failure modes: what happens if the code is incorrect (references a location outside the array). Correctness is more fundamental: that the algorithm provably does what it sets out to do. We might say that correct code doesn't need safety.
The recommendation might have been influenced by Andrew Koenig's series in Dr. Dobbs a couple of years ago. How C Makes It Hard To Check Array Bounds. Koenig says,
In addition to being faster in many cases, pointers have another big advantage over arrays: A pointer to an array element is a single value that is enough to identify that element uniquely. [...] Without pointers, we need three parameters to identify the range: the array and two indices. By using pointers, we can get by with only two parameters.
In C, referencing a location outside the array, whether via pointer or index, is equally unsafe. The compiler will not catch you out (absent use of extensions to the standard). Koenig is arguing that with fewer balls in the air, you have a better shot at getting the logic right.
The more complicated the construction, the more obvious it is that he's right. If you want a better illustration of the difference, write strcat(3) both ways. Using indexes, you have two names and two indexes inside the loop. It's possible to use the index for one with the name for the other. Using pointers, that's impossible. All you have are two pointers.
Is my understanding the recommendation correct?
Is my implementation correct?
Yes, so it seems.
The method for(type_t start = &array; start != end; start++) is sometimes used when you have arrays of more complex items. It is mostly a matter of style.
This style is sometimes used when you already have the start and end pointers available for some reason. Or in cases where you aren't really interested in the size, but just want to repeatedly compare against the end of the array. For example, suppose you have a ring buffer ADT with a start pointer and an end pointer and want to iterate through all items.
This way of doing loops is actually the very reason why C explicitly allows pointers to point 1 item out-of-bounds of an array, you can set an end pointer to one item past the array without invoking undefined behavior (as long as that item isn't de-referenced).
(It is the very same method as used by STL iterators in C++, although there's more of a rationale in C++, since it has operator overload. For example iterator++ in C++ doesn't necessarily give an item adjacently allocated in the next memory cell. For example, iterators could be used for iterating through a linked list ADT, where the ++ would translate to node->next behind the lines.)
However, to claim that this form is always the preferred one is just subjective nonsense. Particularly when you have an array of integers and know the size. Your first example is the most readable form of a loop in C and therefore always preferred whenever possible.
On some compilers/systems, the first form could also give faster code than the second form. Pointer arithmetic might give slower code on some systems. (And I suppose that the first form might give faster data cache access on some systems, though I'd have to verify that assumption with some compiler guru.)
Which of the last 2 is safer?
Neither form is safer than the other. To claim otherwise would be subjective opinions. The statement "...is usually less obviously correct" is nonsense.
Which style to pick vary on case-to-case basis.
Overall, those "Fedora" guidelines you link seem to contain lots of questionable code, questionable rules and blatant opinions. Seems more like someone wanted to show off various C tricks than a serious attempt to write a coding standard. Overall, it smells like the "Linux kernel guidelines", which I would not recommended to read either.
If you want a serious coding standard for/by professionals, use CERT-C or MISRA-C.

Solve ÿ end of file in C

When I copy the content of a file to another in C at the end of the output file I have this character ÿ. I understand thanks to this forum that it is the EOF indicator but I don't understand what to do in order to get rid of it in the output file.
This is my code:
second_file = fopen(argv[2], "w+");
while (curr_char != EOF)
{
curr_char = fgetc(original_file);
fputc(curr_char, second_file);
}
printf("Your file has been successfully copy\n");
fclose(second_file);
fclose(original_file);
For each character you read, you have two things to do:
Check to see if it's EOF.
If not, write it to the output.
Your problem is you're doing these two things in the wrong order.
There are potentially several different ways of solving this. Which one you pick depends on how much you care about your program looking good, as opposed to merely working.
One. Starting with the code you wrote, we could change it to:
while (curr_char != EOF)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
Here, we explicitly test to see if the character is EOF, immediately after reading it, before writing it. If it's EOF, we break out of the loop early. This will work, but it's ugly: there are two different places where we test for EOF, and one of them never "fires". (Also, as a commentator reminded me, there's the problem that the first time through the loop, we're testing curr_char before we've ever set it.)
Two. You could rearrange it like this:
curr_char = getc(original_file);
while (curr_char != EOF)
{
putc(curr_char, second_file);
curr_char = getc(original_file);
}
Here, we read an initial character, and as long as it's not EOF, we write it and read another. This will work just fine, but but it's still a little bit ugly, because this time there are two different places where we read the character.
Three. You could rearrange it like this:
while ((curr_char = getc(original_file)) != EOF)
{
putc(curr_char, second_file);
}
This is the conventional way of writing a character-copying loop in C. The call to getc and the assignment to curr_char are buried inside of the controlling expression of the while loop. It depends on the fact that in C, an assignment expression has a value just like any other expression. That is, the value of the expression a = b is whatever value we just assigned to a (that is, b's value). So the value of the expression curr_char = getc(original_file) is the character we just read. So when we say while ((curr_char = getc(original_file)) != EOF), what we're actually saying is, "Call getc, assign the result to curr_char, and if it's not equal to EOF, take another trip around the loop."
(If you're still having trouble seeing this, I've written other explanations in these notes and this writeup.)
This code is both good and bad. It's good because we've got exactly one place we read characters, one place we test characters, and one place we write characters. But it's a little bit bad because, let's admit it, it's somewhat cryptic at first. It's hard to think about that assignment-buried-inside-the-while-condition. It's code like this that gives C a reputation as being full of obscure gobbledegook.
But, at least in this case, it really is worth learning the idiom, and becoming comfortable with it, because the reductions to just one read and one test and one write really are virtues. It doesn't matter so much in a trivial case like this, but in real programs which are complicated for other reasons, if there's some key piece of functionality that happens in two different places, it's extremely easy to overlook this fact, and to make a change to one of them but forget to make it to the other.
(In fact, this happened to me just last week at work. I was trying to fix a bug in somebody else's code. I finally figured out that when the code did X, it was inadvertently clearing Y. I found the place where it did X, and I added some new code to properly recreate Y. But when I tested my fix, it didn't work! It turned out there were two separate places where the code did X, and I had found and fixed the wrong one.)
Finally, here's an equivalently minimal but unconventional way of writing the loop:
while (1)
{
curr_char = getc(original_file);
if(curr_char == EOF) break;
putc(curr_char, second_file);
}
This is kind of like number 1, but it gets rid of the redundant condition in the while loop, and replaces it with the constant 1, which is "true" in C. This will work just fine, too, and it shares the virtue of having one read, one test, and one write. It actually ends up doing exactly the same operations and in exactly the same order as number 3, but by being laid out linearly it may be easier to follow.
The only problem with number 4 is that it's an unconventional, "break in the middle" loop. Personally, I don't have a problem with break-in-the-middle loops, and I find they come up from time to time, but if I wrote one and someone said "Steve, that's ugly, it's not an idiom anyone recognizes, it will confuse people", I'd have to agree.
P.S. I have replaced your calls to fgetc and fputc with the more conventional getc and putc. I'm not sure who told you to use fgetc and fputc, and there are obscure circumstances where you need them, but they're so rare that in my opinion one might as well forget that the "f" variants exist, and always use getc and putc.

Resources