Cannot write data into a file using C++

ofstream osCtrs("cts.txt", ios::out);
if (osCtrs.is_open()) {
    for (unsigned ci = 0; ci < k; ci++) {
        KMpoint& x = ctrs[ci];
        for (unsigned di = 0; di < dim; di++)
        {
            //osCtrs << x[di];
            osCtrs << "what is happening?";
        }
    }
    osCtrs.close();
}
Is anything wrong?
The file is created, but it is always empty.

The code works fine for me, given positive values for k and dim. Are you sure they're both non-zero? If either one is 0 or less, the program will never enter the inner loop where you're actually outputting stuff. Try setting a breakpoint and stepping through the code to see what's happening.
Also, you don't need to specify ios::out for an ofstream, it's implied.

Related

Value of variable changes with unrelated code

I have been struggling with a bug for hours now. Basically, I do some simple bit operations on a uint64_t array in main.c (no function calls). It works properly with gcc (Ubuntu) and with MSVS2019 (Windows 10) in Debug, but not in Release. However, my target architecture is x64/Windows, so I need to get it to work properly with MSVS2019/Release. Besides that, I'm curious what the cause of the problem is. None of the compilers shows errors or warnings.
Now, as soon as I add a totally unrelated statement to the loop (the commented-out printf()), it works properly.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
...
Initially I believed that I had messed up the stack somewhere before the for() loop, but I checked it multiple times ... all fine!
all variables used are checked to be initialized
no pointers to local variables are returned (in scope)
all array indexing (reads and writes) is within the declared bounds (in scope)
All Google/SE posts attribute this kind of UB to one of the above causes, but none of them applies to my code. Also, the fact that it works in MSVS2019/Debug and with gcc shows that the code works.
What am I missing?
--- UPDATE (24.08.2021 12:00) ---
I'm completely stuck: adding printf() changes the result, and MSVS/Debug works. So how can I inspect variables?!
@Lev M There are quite a few calculations before and after the shown for() loop. That's why I skipped most of the code and only showed the snippet where I could influence the code towards working correctly. I know what the final result should be (it's just a uint64_t), and it's wrong with the Release version of MSVS. I also checked without the for() loop. It's not optimized "away". If I leave it out completely, the result is again different.
@tstanisl It's just a matter of a uint64_t number. I know that input A should output B.
@Steve Summit That's why I posted (a bit desperately). I checked in all directions, isolated as much code as I could, and yet ... no uninitialized variables and no array accesses out of bounds. It's driving me nuts.
@Craig Estey The code is unfortunately quite extensive. I wonder ... could the error also be in a part of the code that doesn't run?
@Eric Postpischil Agreed!
@Nate Eldredge I tested with valgrind (see below).
...
==13997== HEAP SUMMARY:
==13997== in use at exit: 0 bytes in 0 blocks
==13997== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==13997==
==13997== All heap blocks were freed -- no leaks are possible
==13997==
==13997== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--- UPDATE (24.08.2021 18:00) ---
I found the cause of the problem (after countless rounds of trial and error), but no solution yet. Here is more of the code.
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 3) | 3;
}
...
In fact, the MSVS/Release compiler did this:
...
int q = 5;
uint64_t a[32] = { 0 };
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    a[q] = (a[q] << 3) | 3;
}
...
... which is not the same. Never seen such a thing!
How can I force the compiler to keep the 2 for() loops separate?
Summary:
MSVS/Release (default solution properties) optimization will change this code ...
// Code 1
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    // printf("%i \n", i); // that's the line which makes it work
}
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 3) | 3;
}
...
... into the following one, which is not the same as ...
// Code 2
...
int q = 5;
uint64_t a[32];
// a[] is filled with data
for (int i = 0; i < 32; i++) {
    a[q] = (a[q] << 2) | 8;
    a[q] = (a[q] << 3) | 3;
}
...
The excerpt above is slightly simplified: the loop count is not a constant 32 but variable (% 8). Hence the 64-bit constants suggested by a commenter can't be used.
Discoveries:
MSVS/Release - fails
MSVS/Debug - works
gcc/Release - works
gcc/Debug - works
MSVS/Release optimization merges the two for() loops (Code 1) into one for() loop (Code 2).
Fixes:
Uncommenting the printf() is an artificial fix, as the compiler then sees the need to print an intermediate result.
An alternative fix would be to use the type qualifier volatile for a[].
The root of the issue is that the MSVS optimization doesn't take into account that the index q stays the same in both loops, which means the first loop has to finish before the second loop starts.

How can I make a for loop work slower in C?

I am trying to implement a nested for loop that should step to the next "n" index every second and to the next "i" index every 8 seconds. If something happens in that second, I will update some things. Right now the for loop is so fast that it finishes its job before I can do what I'm trying to do. For example: if it's between the 3rd and 4th seconds, go to i=0 and n=2, and if a beat happens, update:
for (int i = 0; i < 8; i++) {
    for (int n = 0; n < 8; n++) {
        if (Beat < 1) {
            input[i] = input[i] + pow(2.0, n);
            for (int k = 1; k < 9; k++) {
                SPI_Write2(k, input[k - 1]);
            }
        }
    }
}
On Linux, #include <unistd.h>; on Windows, #include <Windows.h>. Then:
sleep(1);     // Linux: sleep for 1 second
Sleep(1000);  // Windows: Sleep() takes milliseconds
However, trying to avoid race conditions with some kind of fixed wait is generally a bad idea. Either use a callback function or signals instead. If the code is threaded, then, e.g., wait for the thread to be joined (or detach it).
The provided code is not sufficient. Also, the question seems a little unclear.

Parallelized execution in nested for loops using Cilk

I'm trying to implement a 2D-stencil algorithm that manipulates a matrix. For each field in the matrix, the fields above, below, left and right of it are to be added and divided by 4 in order to calculate the new value. This process may be iterated multiple times for a given matrix.
The program is written in C and compiles with the cilkplus gcc binary.
Edit: I figured you might be interested in the compiler flags:
~/cilkplus/bin/gcc -fcilkplus -lcilkrts -pedantic-errors -g -Wall -std=gnu11 -O3 `pkg-config --cflags glib-2.0 gsl` -c -o sal_cilk_tst.o sal_cilk_tst.c
Please note that the real code involves some pointer arithmetic to keep everything consistent. The sequential implementation works. I'm omitting these steps here to enhance understandability.
A pseudocode would look something like this (No edge case handling):
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            result_matrix[j][k] = (matrix[j-1][k] +
                                   matrix[j+1][k] +
                                   matrix[j][k+1] +
                                   matrix[j][k-1]) / 4;
        }
    }
    matrix = result_matrix;
}
The stencil calculation itself is then moved to the function apply_stencil(...)
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            apply_stencil(matrix, result_matrix, j, k);
        }
    }
    matrix = result_matrix;
}
and parallelization is attempted:
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        cilk_for(int k = 0; k < matrix.height; k++){ /* <--- */
            apply_stencil(matrix, result_matrix, j, k);
        }
    }
    matrix = result_matrix;
}
This version compiles without errors/warning, but just straight out produces a Floating point exception when executed. In case you are wondering: It does not matter which of the for loops are made into cilk_for loops. All configurations (except no cilk_for) produce the same error.
The other possible method:
for(int i = 0; i < iterations; i++){
    for(int j = 0; j < matrix.width; j++){
        for(int k = 0; k < matrix.height; k++){
            cilk_spawn apply_stencil(matrix, result_matrix, j, k); /* <--- */
        }
    }
    cilk_sync; /* <--- */
    matrix = result_matrix;
}
This produces 3 warnings when compiled: i, j and k appear to be uninitialized.
When trying to execute, the function which executes the matrix = result_matrix; step appears to be undefined.
Now for the actual question: Why and how does Cilk break my sequential code; or rather how can I prevent it from doing so?
The actual code is of course available too, should you be interested. However, this project is for a university class and therefore at risk of plagiarism by other students who find this thread, which is why I would prefer not to share it publicly.
UPDATE:
As suggested I attempted to run the algorithm with only 1 worker thread, effectively making the cilk implementation sequential. This did, surprisingly enough, work out fine. However as soon as I change the number of workers to two, the familiar errors return.
I don't think this behavior is caused by race conditions, though. Since the working matrix is changed after each iteration and cilk_sync is called, there is effectively no critical section. No thread depends on data written by the others in the same iteration.
My next step will be to try other versions of the cilkplus compiler, to see whether it's an error on their side.
With regards to the floating point exception in a cilk_for, there are some issues that have been fixed in some versions of the Cilk Plus runtime. Is it possible that you are using an outdated version?
https://software.intel.com/en-us/forums/intel-cilk-plus/topic/558825
Also, what were the specific warning messages that are produced? There are some "uninitialized variable" warnings that occur with older versions of Cilk Plus GCC, which I thought were spurious warnings.
The Cilk runtime uses a recursive divide and conquer algorithm to parallelize your loop. Essentially, it breaks the range in half, and recursively calls itself twice, spawning half and calling half.
As part of the initialization, it calculates a "grain size", which is the minimum size of the pieces it will break your range into. By default, that's loopRange/(8P), where P is the number of cores.
One interesting experiment would be to set the number of Cilk workers to 1. When you do this, all of the cilk_for mechanism is exercised, but because there's only 1 worker, nothing gets stolen.
Another possibility is to try running your code under Cilkscreen, the Cilk race detector. Unfortunately, only the cilkplus branch of GCC generates the annotations that Cilkscreen needs. Your choices are to use the Intel compiler, or to try the cilkplus branch of GCC 4.9. Directions on how to pull down the code and build it are on the cilkplus.org website.

Segmentation fault right at the end of the program

I have a problem with this code.
It works as expected, except that it gets a segmentation fault right at the end.
Here is the code:
void distribuie(int *nrP, pach *pachet, post *postas) {
    int nrPos, k, i, j;
    nrPos = 0;
    for (k = 0; k < 18; k++)
        pos[k].nrPac = 0;
    for (i = 0; i < *nrP; i++) {
        int distributed = 0;
        for (j = 0; j < nrPos; j++)
            if (pac[i].idCar == pos[j].id) {
                pos[j].vec[pos[j].nrPac] = pac[i].id;
                pos[j].nrPac++;
                distributed = 1;
                break;
            }
        if (distributed == 0) {
            pos[nrPos].id = pac[i].idCar;
            pos[nrPos].vec[0] = pac[i].id;
            pos[nrPos].nrPac = 1;
            nrPos++;
        }
    }
    for (i = 0; i < nrPos; i++) {
        printf("%d %d ", pos[i].id, pos[i].nrPac);
        for (j = 0; j < pos[i].nrPac; j++)
            printf("%d ", pos[i].vec[j]);
        printf("\n");
    }
}
and I call this function in main().
Running with gdb resulted in this error:
Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()
If gdb can't find the stack trace, it means your code wrote over the stack so thoroughly that neither the normal C runtime nor gdb can find the information about where the function should return on the stack.
Or, in other words, you have a (major) buffer overflow that has corrupted the stack.
Somewhere, your code is writing out of bounds of an array. It is curious that the code posted references global variables pos and pac but is passed (unused) variables postas and pachet. It suggests that the code you're showing isn't the code you're executing. However, assuming that pos and pac are really spelled the same as postas and pachet, then it could be that you are mishandling the call to your distribuie() function. (If, as a comment suggests, pos and pac really are global variables, then why does the function get passed postas and pachet?)
Are you getting any compilation warnings? Have you enabled compilation warnings? If you've got GCC, does the code compile cleanly with -Wall? What about with -Wall -Wextra? If you're getting any warnings, fix the causes. Remember, at this stage in your career, it is probable that the C compiler knows more about C than you do.
You can help yourself with the debugging by printing key values (like *nrP) on entry to the function. If that isn't a sane value, you know where to start looking. You might also take a good look at the data for the line:
pos[j].vec[pos[j].nrPac] = pac[i].id;
There is lots of room there for things to go badly astray!
I lack the information to help you completely: I don't know the size of the pos[] array. The loop with k<18 suggests it is 18 elements (but it could be less; I simply don't know). Then you start processing *nrP pachets, but you don't check that you process at most 18 of these. If there are more, you overwrite some other memory. Then you want to print the result and, et voilà, a segmentation fault: some memory got corrupted, is used by someone thinking it is a valid pointer, but the pointer is invalid and... bang, segfault.
So the for loop should at least check the bounds (assuming 18):
for (i = 0; i < *nrP && i < 18; i++) {
In the same way, the pos structure apparently has an array vec, but its size is unknown and, by the same reasoning, could be 18, could be less, or could be more:
pos[j].vec[pos[j].nrPac]
If you add all your bounds checks it will probably run.

Informative "if" statement in "for" loop

Normally, when I have a big for loop, I put in messages that tell me which part of the process my program is in, for example:
for(i = 0; i < large_n; i++) {
    if( i % (large_n / 1000) == 0) {
        printf("We are at %ld \n", i);
    }
    // Do some other stuff
}
I was wondering whether this hurts performance too much (a priori) and, if so, whether there is a smarter alternative. Thanks in advance.
Maybe you can split the large loop so that the condition is checked only occasionally, but I don't know whether this will really save time; that depends more on your "other stuff".
int T = ...; // times to check the condition, make sure large_n % T == 0
for(int t = 0; t < T; ++t)
{
    for(int i = large_n/T * t; i < large_n/T * (t+1); ++i)
    {
        // other stuff
    }
    printf("We are at %ld \n", large_n/T * (t+1));
}
Regardless of what is in your loop, I wouldn't be leaving statements like printf in unless it's essential to the application/user, nor would I use what are effectively redundant if statements, for the same reason.
Both of these are examples of trace level debugging. They're totally valid and in some cases very useful, but generally not ultimately so in the end application. In this respect, a usual thing to do is to only include them in the build when you actually want to use the information they provide. In this case, you might do something like this:
#define DEBUG
for(i = 0; i < large_n; i++)
{
#ifdef DEBUG
    if( i % (large_n / 1000) == 0)
    {
        printf("We are at %ld \n", i);
    }
#endif
}
Regarding the performance cost of including these debug outputs all the time, it will totally depend on the system you're running, the efficiency of whatever "printing" statement you're using to output the data, the check/s you're performing and, of course, how often you're trying to perform output.
Your mod test probably doesn't hurt performance, but if you want a very quick test and you're prepared to use powers of two, then consider a bitwise AND test:
if ( ( i & 0xFF ) == 0 ) {
    /* this gets printed every 256 iterations */
    ...
}
or
if ( ( i & 0xFFFF ) == 0 ) {
    /* this gets printed every 65536 iterations */
    ...
}
By placing a print statement inside the for loop, you are sacrificing some performance.
Because the program has to make system calls to write the output to the screen (stdout is buffered, so not necessarily one per printf call), printing takes CPU time away from the program itself.
You can see the difference in performance between these two loops:
int i;
printf("Start Loop A\n");
for(i = 0; i < 100000; i++) {
    printf("%d ", i);
}
printf("Done with Loop A\n");
printf("Start Loop B\n");
for(i = 0; i < 100000; i++) {
    // Do Nothing
}
printf("Done with Loop B\n");
I would include timing code, but I am in the middle of work and can update it later over lunch.
If the difference isn't noticeable, you can increase 100000 to a larger number (although too large a number would cause the first loop to take WAY too long to complete).
Whoops, forgot to finish my answer.
To cut down on the number of system calls your program needs to make, you could check a condition first, and only print if that condition is true.
For example, if you were counting up as in my example code, you could only print out every 100th number by using %:
int i;
for(i = 0; i < 100000; i++) {
    if(i%100 == 0)
        printf("%d ", i);
}
That will reduce the number of printf calls from 100000 to 1000, and the amount of output (and with it the system calls) accordingly, which in turn improves the performance of the loop.
The problem is that an I/O operation such as printf takes much more time than the calculations the processor does. You can reduce the time by collecting all the output and printing it once at the end.
Notation:
Tp = total time spent executing the progress statements.
Tn = total time spent doing the other normal stuff.
>> = Much greater than
If performance is your main criteria, you want Tn >> Tp. This strongly suggests that the code should be profiled so that you can pick appropriate values. The routine 'printf()' is considered a slow routine (much slower than %) and is a blocking routine (that is, the thread that calls it may pend waiting for a resource used by it).
Personally, I like to abstract away the progress indicator. It can be a logging mechanism, a printf, a progress box, ... Heck, it may be updating a structure that is read by another thread/task/process.
id = progressRegister (<some predefined type of progress update mechanism>);
for(i = 0; i < large_n; i++) {
    progressUpdate (id, <string>, i, large_n);
    // Do some other stuff
}
progressUnregister(id);
Yes, there is some overhead in calling the routine 'progressUpdate()' on each iteration, but again, as long as Tn >> Tp, it usually is not that important.
Hope this helps.
