I have been struggling with a bug for hours now. Basically, I do some simple bit operations on a uint64_t array in main.c (no function calls). It works properly with gcc (Ubuntu) and with MSVS2019 (Windows 10) in Debug, but not in Release. However, my target architecture is x64/Windows, so I need to get it to work properly with MSVS2019/Release. Besides that, I'm curious what the reason for the problem is. None of the compilers shows errors or warnings.
Now, as soon as I add a totally unrelated command to the loop (commented printf()), it works properly.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 2) | 8;
// printf("%i \n", i); // that's the line which makes it work
}
...
Initially I believed that I had messed up the stack somewhere before the for() loop, but I checked it multiple times ... all fine!
all used variables are checked to be initialized
no pointer returns of local variables (in scope)
array indexing (reads and writes) all within declaration limits (in scope)
All Google/SE posts on the subject attribute this kind of UB to one of the above causes, but none of them applies to my code. Also, the fact that it works in MSVS2019/Debug and with gcc suggests that the code works.
What am I missing?
--- UPDATE (24.08.2021 12:00) ---
I'm completely stuck, since added printf() modifies the result and MSVS/Debug works. So how can I inspect variables?!
#Lev M There are quite some calculations before and after the shown for() loop. That's why I skipped most of the code and just showed the snippet where I could influence the code towards working correctly. I know what should be the final result (it's just a uint64_t), and it's wrong with the Release version of MSVS. I also checked w/o the for() loop. It's not optimized "away". If I leave it out completely, the result is again different.
#tstanisl It's just a matter of an uint64_t number. I know that input A should output B.
#Steve Summit That's why I posted (a bit desperate). I checked in all directions, isolated as much code as I could and yet ... no uninitialized variable or array out of bound. Driving me nuts.
#Craig Estey The code is unfortunately quite extensive. I wonder ... could the error also be in a part of the code which doesn't run?
#Eric Postpischil Agreed!
#Nate Eldredge I tested on valgrind (see below).
...
==13997== HEAP SUMMARY:
==13997== in use at exit: 0 bytes in 0 blocks
==13997== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==13997==
==13997== All heap blocks were freed -- no leaks are possible
==13997==
==13997== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--- UPDATE (24.08.2021 18:00) ---
I found the reason for the problem (after countless rounds of trial and error), but no solution yet. Here is more of the code.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 2) | 8;
// printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 3) | 3;
}
...
In fact, the MSVS/Release compiler did this:
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 2) | 8;
a[q] = (a[q] << 3) | 3;
}
...
... which is not the same. Never seen such a thing!
How can I force the compiler to keep the 2 for() loops separate?
Summary:
MSVS/Release (default solution properties) optimization will change this code ...
// Code 1
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 2) | 8;
// printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 3) | 3;
}
...
... into the following code, which is not equivalent ...
// Code 2
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
a[q] = (a[q] << 2) | 8;
a[q] = (a[q] << 3) | 3;
}
...
The excerpt above is slightly simplified: the real loop count is not the constant 32 but variable (% 8). Hence the loops cannot be replaced by 64-bit constants, as one commenter suggested.
Discoveries:
MSVS/Release - fails
MSVS/Debug - works
gcc/Release - works
gcc/Debug - works
MSVS/Release optimization merges the two for() loops (Code 1) into one for() loop (Code 2).
Fixes:
The commented-out printf() provides an artificial fix, since the compiler then has to produce the intermediate result for printing.
An alternative fix would be to use the type qualifier volatile for a[].
The root of the issue appears to be that the MSVS optimization doesn't take into account that the index q stays the same in both loops, which means the first loop has to finish before the second loop starts.
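To make the difference between Code 1 and Code 2 concrete, here is a minimal sketch that applies both variants to a single value. The function names `separate_loops` and `fused_loop` are made up for illustration; the sketch only models the arithmetic and says nothing about what MSVS actually emits.

```c
#include <stdint.h>

/* Code 1: the first loop finishes before the second one starts. */
uint64_t separate_loops(uint64_t x)
{
    for (int i = 0; i < 32; i++) {
        x = (x << 2) | 8;
    }
    for (int i = 0; i < 32; i++) {
        x = (x << 3) | 3;
    }
    return x;
}

/* Code 2: both statements interleaved in a single loop. */
uint64_t fused_loop(uint64_t x)
{
    for (int i = 0; i < 32; i++) {
        x = (x << 2) | 8;
        x = (x << 3) | 3;
    }
    return x;
}
```

Calling both with the same argument, for example `separate_loops(1)` and `fused_loop(1)`, and comparing the return values shows that the two orderings disagree, because every statement reads the value written by the statement executed just before it.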
Related
I am trying to thoroughly understand the code I found on Github. I am running this code on Eclipse (Version: 3.6.1, Build id: M20100909-0800).
I want to efficiently debug these lines of code:
for (index_X = 0; index_X < nb_MCU_X; index_X++) {
for (index_Y = 0; index_Y < nb_MCU_Y; index_Y++) {
for (index = 0; index < SOS_section.n; index++) {
uint32_t component_index = component_order[index];
int nb_MCU = ((SOF_component[component_index].HV >> 4) & 0xf) * (SOF_component[component_index].HV & 0x0f);
for (chroma_ss = 0; chroma_ss < nb_MCU; chroma_ss++) {
unpack_block(movie, & scan_desc, index, MCU);
iqzz_block(MCU, unZZ_MCU, DQT_table[SOF_component[component_index].q_table]);
IDCT(unZZ_MCU, YCbCr_MCU_ds[component_index] + (64 * chroma_ss));
}
upsampler(YCbCr_MCU_ds[component_index], YCbCr_MCU[component_index],
max_ss_h / ((SOF_component[component_index].HV >> 4) & 0xf), max_ss_v / ((SOF_component[component_index].HV) & 0xf), max_ss_h, max_ss_v);
}
if (color && (SOF_section.n > 1)) {
YCbCr_to_ARGB(YCbCr_MCU, RGB_MCU, max_ss_h, max_ss_v);
} else {
to_NB(YCbCr_MCU, RGB_MCU, max_ss_h, max_ss_v);
}
screen_cpyrect(index_Y * MCU_sy * max_ss_h, index_X * MCU_sx * max_ss_v, MCU_sy * max_ss_h, MCU_sx * max_ss_v, RGB_MCU);
}
}
The code above contains a number of loops and stepping over every line of code many times is laborious (nb_MCU_X is 18 and nb_MCU_Y is 32).
I tried to change the values of index_X and index_Y in Debug mode. I thought doing so would take me to a point in the program where more of the code had been processed. However, only index_X and index_Y changed to the values I gave them; all other dependent values did not change with them. Consequently, the behavior of the program was distorted and it began behaving erratically.
I tried setting breakpoints immediately after this section of code. However, it does not allow me to see the next step that occurs after the section of the code above is processed. I want to know instantly what the condition of the code will be when index_X or index_Y is at any value of my choosing.
Is there a way for me in Eclipse to go forward in time and have more iterations processed instead of stepping over each line of the code?
What should I do if, for example, index_Y is currently 0 but I want to go instantly to a point in the program where index_Y is 7 and the rest of the code has also changed accordingly?
I am using a simple software queue based on a write index and a read index.
Introduction details: Language: C, Compiler: GCC, Optimization: -O3 with extra parameters, Architecture: Armv7a, CPU: Multicore, 2 Cortex A-15, L2 Cache: Shared and enabled, L1 Cache: Every CPU, enabled, Architecture is supposed to be cache coherent.
CPU 1 does the writing stuff and CPU 2 does the reading stuff. Below is the very simplified example code. You can assume the initial values of the indexes are zero.
COMMON:
#define QUE_LEN 4
unsigned int my_que_write_index = 0; //memory
unsigned int my_que_read_index = 0; //memory
struct my_que_struct{
unsigned int param1;
unsigned int param2;
};
struct my_que_struct my_que[QUE_LEN]; //memory
CPU 1 runs:
void que_writer()
{
unsigned int write_index_local;
write_index_local = my_que_write_index; //my_que_write_index is in memory
my_que[write_index_local].param1 = 16; //my_que is my queue and stored in memory also
my_que[write_index_local].param2 = 32;
//similar writing stuff
++write_index_local;
if(write_index_local == QUE_LEN) write_index_local = 0;
my_que_write_index = write_index_local;
}
CPU 2 runs:
void que_reader()
{
unsigned int read_index_local, param1, param2;
read_index_local = my_que_read_index; //also in memory
while(read_index_local != my_que_write_index)
{
param1 = my_que[read_index_local].param1;
if(param1 == 0) FATAL_ERROR;
param2 = my_que[read_index_local].param2;
//similar reading stuff
my_que[read_index_local].param1 = 0;
++read_index_local;
if(read_index_local == QUE_LEN) read_index_local = 0;
}
my_que_read_index = read_index_local;
}
Okay, in the normal case the fatal error should never occur, because param1 of the queue is always stored with a constant value of 16. But somehow param1 of the queue ends up being 0 and the fatal error occurs.
It is clear that this is somehow a race condition problem, but I can't figure out how it is happening. The indexes are updated separately by the CPUs.
I don't want to fill my code with memory barriers without understanding the core of the problem. Do you have any idea how this is happening?
Details: This is a baremetal system, these codes are interrupt-disabled, and there is no preemption or task switching.
The compiler and the CPU are allowed to rearrange stores and loads as they see fit (i.e. as long as a single threaded program would not be able to observe a difference). Of course for multi-threaded programs these effects are observable quite well.
For example, this code
write_index_local = my_que_write_index;
my_que[write_index_local].param1 = 16;
my_que[write_index_local].param2 = 32;
++write_index_local;
if(write_index_local == QUE_LEN) write_index_local = 0;
my_que_write_index = write_index_local;
could be reordered like this
a = my_que_write_index;
my_que_write_index = a == QUE_LEN - 1 ? 0 : a + 1;
my_que[a].param1 = 16;
my_que[a].param2 = 32;
Getting this stuff right requires atomics and barriers that avoid these kinds of reorderings. Check out Preshing's excellent series of blog posts to learn about atomics, this one is probably a good start: http://preshing.com/20120612/an-introduction-to-lock-free-programming/ but check out the following ones as well.
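As a rough illustration of the kind of ordering that atomics provide, here is a minimal single-producer/single-consumer sketch using C11 `<stdatomic.h>`. It assumes a C11 toolchain is available for the target. The function names `que_push` and `que_pop`, the one-element-at-a-time interface, and the "one slot stays unused" full check are assumptions added for illustration; they are not part of the original code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QUE_LEN 4

struct my_que_struct {
    unsigned int param1;
    unsigned int param2;
};

static struct my_que_struct my_que[QUE_LEN];
static atomic_uint my_que_write_index;   /* written only by the producer */
static atomic_uint my_que_read_index;    /* written only by the consumer */

/* Producer side (CPU 1). Returns false if the queue is full. */
bool que_push(unsigned int p1, unsigned int p2)
{
    unsigned int w = atomic_load_explicit(&my_que_write_index, memory_order_relaxed);
    unsigned int next = (w + 1 == QUE_LEN) ? 0 : w + 1;

    /* Acquire: the consumer must be done with the slot before it is reused. */
    if (next == atomic_load_explicit(&my_que_read_index, memory_order_acquire))
        return false;

    my_que[w].param1 = p1;
    my_que[w].param2 = p2;

    /* Release: the element stores above become visible before the new index. */
    atomic_store_explicit(&my_que_write_index, next, memory_order_release);
    return true;
}

/* Consumer side (CPU 2). Returns false if the queue is empty. */
bool que_pop(struct my_que_struct *out)
{
    unsigned int r = atomic_load_explicit(&my_que_read_index, memory_order_relaxed);

    /* Acquire: pairs with the producer's release store of the write index. */
    if (r == atomic_load_explicit(&my_que_write_index, memory_order_acquire))
        return false;

    *out = my_que[r];
    r = (r + 1 == QUE_LEN) ? 0 : r + 1;

    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&my_que_read_index, r, memory_order_release);
    return true;
}
```

The release store of the write index keeps the element stores from being reordered after it, which is exactly the reordering shown above. Note that the sketch also refuses to push into a full queue, something the original writer does not check.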
I have a problem with this code.
It works as expected, except that it gets a segmentation fault right at the end.
Here is the code:
void distribuie(int *nrP, pach *pachet, post *postas) {
int nrPos, k, i, j;
nrPos = 0;
for (k = 0; k < 18; k++)
pos[k].nrPac = 0;
for (i = 0; i < *nrP; i++) {
int distributed = 0;
for (j = 0; j < nrPos; j++)
if (pac[i].idCar == pos[j].id) {
pos[j].vec[pos[j].nrPac] = pac[i].id;
pos[j].nrPac++;
distributed = 1;
break;
}
if (distributed == 0) {
pos[nrPos].id = pac[i].idCar;
pos[nrPos].vec[0] = pac[i].id;
pos[nrPos].nrPac = 1;
nrPos++;
}
}
for (i = 0; i < nrPos; i++) {
printf("%d %d ", pos[i].id, pos[i].nrPac);
for (j = 0; j < pos[i].nrPac; j++)
printf("%d ", pos[i].vec[j]);
printf("\n");
}
}
and calling this function in main().
Running with gdb resulted in this error:
Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()
If gdb can't find the stack trace, it means your code wrote over the stack so thoroughly that neither the normal C runtime nor gdb can find the information about where the function should return on the stack.
Or, in other words, you have (major) stack corruption, most likely from a buffer overflow on the stack.
Somewhere, your code is writing out of bounds of an array. It is curious that the code posted references global variables pos and pac but is passed (unused) variables postas and pachet. It suggests that the code you're showing isn't the code you're executing. However, assuming that pos and pac are really spelled the same as postas and pachet, then it could be that you are mishandling the call to your distribuie() function. (If, as a comment suggests, pos and pac really are global variables, then why does the function get passed postas and pachet?)
Are you getting any compilation warnings? Have you enabled compilation warnings? If you've got GCC, does the code compile cleanly with -Wall? What about with -Wall -Wextra? If you're getting any warnings, fix the causes. Remember, at this stage in your career, it is probable that the C compiler knows more about C than you do.
You can help yourself with the debugging by printing key values (like *nrP) on entry to the function. If that isn't a sane value, you know where to start looking. You might also take a good look at the data for the line:
pos[j].vec[pos[j].nrPac] = pac[i].id;
There is lots of room there for things to go badly astray!
I lack information to completely help you: I don't know the size of the pos[] array. The loop with k<18 suggests it has 18 elements (but it could be fewer; I simply don't know). Then you start processing *nrP pachets, but you don't check that you process at most 18 of these. If there are more, you overwrite some other memory. Then you want to print the result et voilà, a segmentation fault: some memory got corrupted and is then used by code that treats it as a valid pointer, but the pointer is invalid and... bang, segfault.
So the for loop should at least check the bounds (assuming 18):
for (i = 0; i < *nrP && i < 18; i++) {
In the same way, the pos structure apparently contains an array vec, but its size is unknown and by the same reasoning it can be 18, less, or more:
pos[j].vec[pos[j].nrPac]
If you add all your bounds checks it will probably run.
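A sketch of what such bounds checks could look like is given below. Since the real definitions are not shown, the struct layout, the constants `MAX_POS` and `MAX_VEC`, and the helper names `add_package` and `find_or_add_postman` are all assumptions made for illustration.

```c
#include <stdbool.h>

#define MAX_POS 18   /* assumed size of pos[], taken from the k < 18 loop */
#define MAX_VEC 50   /* assumed size of vec[]; the real value is not shown */

typedef struct {
    int id;
    int nrPac;
    int vec[MAX_VEC];
} post;

/* Append a package id to postman p; refuse instead of writing past vec[]. */
bool add_package(post *p, int package_id)
{
    if (p->nrPac >= MAX_VEC)
        return false;
    p->vec[p->nrPac++] = package_id;
    return true;
}

/* Find the postman with the given id, or create one if there is room.
   Returns the index into pos[], or -1 if pos[] is already full. */
int find_or_add_postman(post *pos, int *nrPos, int id)
{
    for (int j = 0; j < *nrPos; j++)
        if (pos[j].id == id)
            return j;
    if (*nrPos >= MAX_POS)
        return -1;
    pos[*nrPos].id = id;
    pos[*nrPos].nrPac = 0;
    return (*nrPos)++;
}
```

With helpers like these, the main loop can report an error when either function signals that an array is full, instead of silently writing out of bounds.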
So, I'm working on inventing my own tile map creation and I ran into a problem with size. The maximum size (which I did not set) is <700x700; anything higher makes it crash. First, I thought it was something I got wrong when making the "presentation version" which outputs the result on screen -> ScreenShot, but now I have just finished making it more compact and tried using 800x800, and it still has the 700 limit, but I have no idea why. Since the code isn't that big I will show it here. If you have some tips I don't mind taking them.
#include <iostream>
#include <string.h>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <SFML/Graphics.hpp>
#include <SFML/Audio.hpp>
#define _WIN32_WINNT 0x0501
#include <windows.h>
using namespace std;
int main()
{
sf::Vector2i Size;
int Points,rands,PointsCheck=1,x,y,RandX,RandY,CurrentNumber=1;
srand(time(0));
bool Done=false,Expanded,Border;
ofstream Out("txt.txt");
/***/
cout << "Size X-Y = "; cin >> Size.x >> Size.y;cout << endl;
cout << "MAX Points - " << (Size.x*Size.y)/10 << endl;
cout << "Number of POINTS = ";cin >> Points ;cout << endl;
/***/
int PixelMap[Size.x+1][Size.y+1];
/***/
for (x=1;x<=Size.x;x++) for (y=1;y<=Size.y;y++) PixelMap[x][y]=0;
/***/
while(PointsCheck<=Points)
{
rands=1+(rand()%10);
RandX=1+(rand()%(Size.x));RandY=1+(rand()%(Size.y));
if (rands==1 && PointsCheck<=Points && PixelMap[RandX][RandY]==0)
{PixelMap[RandX][RandY]=CurrentNumber;CurrentNumber+=2;PointsCheck++;}
}
/***/
while(Done==false)
{
Done=true;
for(x=1;x<=Size.x;x++)
for(y=1;y<=Size.y;y++)
if(PixelMap[x][y]%2!=0 && PixelMap[x][y]!=-1)
{
if (PixelMap[x+1][y]==0) PixelMap[x+1][y]=PixelMap[x][y]+1;
if (PixelMap[x-1][y]==0) PixelMap[x-1][y]=PixelMap[x][y]+1;
if (PixelMap[x][y+1]==0) PixelMap[x][y+1]=PixelMap[x][y]+1;
if (PixelMap[x][y-1]==0) PixelMap[x][y-1]=PixelMap[x][y]+1;
}
for(x=1;x<=Size.x;x++)
for(y=1;y<=Size.y;y++)
if(PixelMap[x][y]!=0 && PixelMap[x][y]%2==0) {PixelMap[x][y]--;Done=false;}
}
for(x=1;x<=Size.x;x++){
for(y=1;y<=Size.y;y++)
{Out << PixelMap[x][y] << " ";}Out << endl;}
//ShowWindow (GetConsoleWindow(), SW_HIDE);
}
What you have here is the concept from which this site gets its name. You have a stack overflow:
int PixelMap[Size.x+1][Size.y+1];
If you want to allocate a large amount of memory, you need to do it dynamically (on the heap).
You can do this any number of ways. Since you are using C++, I recommend using a std::vector. The only trick is making the array 2-dimensional. Usually this is done in the same way as the one you allocated on the stack, except you don't get language syntax to help you:
vector<int> PixelMap( (Size.x+1) * (Size.y+1) );
Above, you'll need to calculate the linear index from the row/column. Something like:
int someval = PixelMap[ row * (Size.y+1) + column ];
If you really want to use the [row][column] indexing syntax, you can either make a vector-of-vectors (not recommended), or you can index your rows:
vector<int> PixelMapData( (Size.x+1) * (Size.y+1) );
vector<int*> PixelMap( Size.x+1 );
PixelMap[0] = &PixelMapData[0];
for( int i = 1; i < Size.x+1; i++ ) {
PixelMap[i] = PixelMap[i-1] + Size.y + 1;
}
Now you can index in 2D:
int someval = PixelMap[row][col];
There's a couple of problems with your code:
First off:
int PixelMap[Size.x+1][Size.y+1];
for (x=1;x<=Size.x;x++)
for (y=1;y<=Size.y;y++)
PixelMap[x][y]=0;
In the above snippet you never set the value of PixelMap[0][0], PixelMap[0][1], etc. Basically those values will be uninitialized. Arrays in C++ are 0-indexed, so you need to be sure you address those elements. Also, why are you using Size.x+1 and Size.y+1? Something feels wrong about that.
A better loop would be:
int PixelMap[Size.x][Size.y];
for (x=0;x<Size.x;x++)
for (y=0;y<Size.y;y++)
PixelMap[x][y]=0;
Second, this next bit of code is illegible:
while(PointsCheck<=Points)
{
rands=1+(rand()%10);
RandX=1+(rand()%(Size.x));
RandY=1+(rand()%(Size.y));
if (rands==1 && PointsCheck<=Points && PixelMap[RandX][RandY]==0)
{
PixelMap[RandX][RandY]=CurrentNumber;
CurrentNumber+=2;
PointsCheck++;
}
}
You're only incrementing PointsCheck if
PointsCheck <= Points
Why? You test for this to be true in your while condition. PointsCheck doesn't get incremented anywhere before this test.
rands is never guaranteed to be equal to 1 by the way, so your loop could go on for eternity (though unlikely).
The next loop suffers from similar problems as above:
while(Done==false)
{
Done=true;
What's the reason for this? You never break out of the while loop, and you never set Done to false, so the next block of code will only ever be executed once. remove this bit.
Your for-loops that follow should start at 0 and run while the index is < the size (Size.x and Size.y):
for(x=0;x<Size.x;x++)
for(y=0;y<Size.y;y++)
Fix these issues first, and then if you still have a problem we can move on. And for all our sake, please use brackets {} to scope your for loops and if statements so that we can follow. Also, separate commands onto separate lines. It's a lot of work for us to follow more than one semicolon per line.
EDIT
Since you seem unwilling to fix these issues first:
This could be an issue with the amount of memory allocated on the stack for your program. If you're trying to create an array of 800x800 integers, then you're using 800*800*4 bytes = 2,560,000 bytes (about 2.4 MiB) of data. I know this is higher than Visual Studio's default stack limit of 1 MB, but since a 700x700 array already uses about 1.9 MiB, whatever toolchain you're using must have a higher default (or you raised Visual Studio's limit, but not high enough).
See if you can set your limit to at least 3 MB. More is better, though. If this doesn't fix your scaling problem up to 800, then you have other issues.
EDIT2
I just noticed this:
sf::Vector2i Size;
//unimportant stuff
cin >> Size.x >> Size.y;
int PixelMap[Size.x+1][Size.y+1];
Vector2i will probably have default values for x and y. If you want to dynamically allocate more than what those are, you cannot statically say
PixelMap[Size.x][Size.y]
You need to dynamically allocate the array. I strongly suggest using something like a std::vector<std::vector<int> > for this
e.g.(untested code):
sf::Vector2i Size;
//unimportant stuff
cin >> Size.x >> Size.y;
std::vector<vector<int> > PixelMap;
//Initialize values to 0
for(size_t i=0; i < Size.x; ++i){
vector<int> nextVec;
for(size_t j=0; j < Size.y; ++j){
nextVec.push_back(0);
}
PixelMap.push_back(nextVec);
}
Not sure if this has anything to do with your crash (I would have added a comment, but I don't have the reputation), but here's a problem I noticed:
Your array indexing scheme is not consistent. Since you're using index 1 to indicate the first element, your bounds checking should look like this...
if (y!=1 && y!=Size.y && x!=1 && x!=Size.x && ...
...instead of this...
if (y!=0 && y!=Size.y && x!=0 && x!=Size.x && ...
[EDIT]
I just tried this:
...
cout << "asdf" << endl;
int PixelMap[Size.x+1][Size.y+1];
cout << "asdf" << endl;
...
and verified it's a stack overflow problem. So, as others mentioned above, allocate your pixel map on the heap and it should be fine.
BTW, this code...
int PixelMap[Size.x+1][Size.y+1];
is not standard C++. It's an extension some compilers provide, called 'variable length arrays'. Check this out for more info -> Why aren't variable-length arrays part of the C++ standard?
[/EDIT]
ofstream osCtrs("cts.txt",ios::out);
if (osCtrs.is_open()){
for(unsigned ci = 0; ci < k; ci++){
KMpoint& x = ctrs[ci];
for (unsigned di = 0; di < dim; di++)
{
//osCtrs << x[di];
osCtrs << "what is happening?";
}
}
osCtrs.close();
}
Is anything wrong with this?
The file is created, but it is always empty.
The code works fine for me, given positive values for k and dim. Are you sure they're both non-zero? If either one is 0 or less, the program will never enter the inner loop where you're actually outputting stuff. Try setting a breakpoint and stepping through the code to see what's happening.
Also, you don't need to specify ios::out for an ofstream, it's implied.