Informative "if" statement in "for" loop - c

Normally when I have a big for loop I put messages to inform me in which part of the process my program is, for example:
for(i = 0; i < large_n; i++) {
if( i % (large_n)/1000 == 0) {
printf("We are at %ld \n", i);
}
// Do some other stuff
}
I was wondering whether this hurts performance too much (a priori) and, if so, whether there is a smarter alternative. Thanks in advance.

Maybe you can split the large loop so that the condition is checked only occasionally, but I don't know whether this will really save time; that depends more on your "other stuff".
int T = ...; // times to check the condition, make sure large_n % T == 0
for(int t = 0; t < T; ++t)
{
for(int i = large_n/T * t; i < large_n/T * (t+1); ++i)
{
// other stuff
}
printf("We are at %ld \n", large_n/T * (t+1));
}
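The snippet above assumes large_n % T == 0. A minimal sketch of the same idea that also copes with a remainder might look like the following; process_in_chunks, count_work and work_done are hypothetical names used only for illustration, and count_work stands in for the "other stuff".

```c
#include <assert.h>
#include <stdio.h>

static long work_done;                 /* hypothetical stand-in state */

/* Hypothetical loop body: just counts how often it was called. */
static void count_work(long i)
{
    (void)i;
    ++work_done;
}

/* Runs body(i) for every i in [0, n) and prints one progress line per
   block. The last block is shorter when n is not a multiple of chunks.
   Returns the number of progress lines printed. */
static long process_in_chunks(long n, long chunks, void (*body)(long))
{
    assert(n >= 0 && chunks > 0 && body != NULL);
    long step = (n + chunks - 1) / chunks;   /* ceiling division */
    long reports = 0;
    if (step < 1)
        step = 1;
    for (long start = 0; start < n; start += step) {
        long end = (start + step < n) ? start + step : n;
        for (long i = start; i < end; ++i)
            body(i);                         /* no progress check in here */
        printf("We are at %ld\n", end);
        ++reports;
    }
    return reports;
}
```

A call such as process_in_chunks(large_n, 1000, count_work) keeps the inner loop free of any progress check.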

Regardless of what is in your loop, I wouldn't be leaving statements like printf in unless it's essential to the application/user, nor would I use what are effectively redundant if statements, for the same reason.
Both of these are examples of trace level debugging. They're totally valid and in some cases very useful, but generally not ultimately so in the end application. In this respect, a usual thing to do is to only include them in the build when you actually want to use the information they provide. In this case, you might do something like this:
#define DEBUG
for(i = 0; i < large_n; i++)
{
#ifdef DEBUG
if( i % (large_n/1000) == 0) // parenthesized: i % (large_n)/1000 parses as (i % large_n)/1000
{
printf("We are at %ld \n", i);
}
#endif
}
Regarding the performance cost of including these debug outputs all the time, it will totally depend on the system you're running, the efficiency of whatever "printing" statement you're using to output the data, the check/s you're performing and, of course, how often you're trying to perform output.
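One way to keep the loop body readable is to hide the conditional compilation behind a macro. This is only a sketch: TRACE_PROGRESS and sum_below are made-up names, the summation stands in for the real work, and the output goes to stderr by assumption.

```c
#include <assert.h>
#include <stdio.h>

#ifdef DEBUG
/* Debug build: print every `every` iterations. */
#define TRACE_PROGRESS(i, every)                                  \
    do {                                                          \
        if ((i) % (every) == 0)                                   \
            fprintf(stderr, "We are at %ld\n", (long)(i));        \
    } while (0)
#else
/* Release build: expands to a no-op, so neither the test nor the
   print is compiled into the loop. */
#define TRACE_PROGRESS(i, every) ((void)(i), (void)(every))
#endif

/* Hypothetical workload: sums the integers below large_n. */
static long sum_below(long large_n)
{
    assert(large_n >= 0);
    /* Guard against a zero interval when large_n < 1000. */
    long every = (large_n / 1000 > 0) ? large_n / 1000 : 1;
    long sum = 0;
    for (long i = 0; i < large_n; i++) {
        TRACE_PROGRESS(i, every);
        sum += i;               /* stand-in for "other stuff" */
    }
    return sum;
}
```

Compiling with -DDEBUG on the command line turns the messages on without editing the source.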

Your mod test probably doesn't hurt performance, but if you want a very quick test and you're prepared to report at powers of two, then consider a bitwise AND test:
if ( ( i & 0xFF ) == 0 ) {
/* this gets printed every 256 iterations */
...
}
or
if ( ( i & 0xFFFF ) == 0 ) {
/* this gets printed every 65536 iterations */
...
}
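For illustration, the mask can be derived from the interval instead of being written out by hand. The helper names below are invented for this sketch, and the interval is assumed to be a power of two.

```c
#include <assert.h>

/* Returns nonzero when i is a multiple of interval.
   interval must be a power of two so that (interval - 1) is a mask
   of its low bits, e.g. 256 -> 0xFF. */
static int is_report_step(unsigned long i, unsigned long interval)
{
    assert(interval != 0 && (interval & (interval - 1)) == 0);
    return (i & (interval - 1)) == 0;
}

/* Counts how many iterations in [0, n) would trigger a report. */
static unsigned long count_reports(unsigned long n, unsigned long interval)
{
    unsigned long reports = 0;
    for (unsigned long i = 0; i < n; i++)
        if (is_report_step(i, interval))
            reports++;
    return reports;
}
```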

By placing a print statement inside of the for loop, you are sacrificing some performance.
Because the program needs to do a system call to write output to the screen every time the message is printed, it takes CPU time away from the program itself.
You can see the difference in performance between these two loops:
int i;
printf("Start Loop A\n");
for(i = 0; i < 100000; i++) {
printf("%d ", i);
}
printf("Done with Loop A\n");
printf("Start Loop B\n");
for(i = 0; i < 100000; i++) {
// Do Nothing
}
printf("Done with Loop B\n");
I would include timing code, but I am in the middle of work and can update it later over lunch.
If the difference isn't noticeable, you can increase 100000 to a larger number (although too large a number would cause the first loop to take WAY too long to complete).
Whoops, forgot to finish my answer.
To cut down on the number of system calls your program needs to make, you could check a condition first, and only print if that condition is true.
For example, if you were counting up as in my example code, you could only print out every 100th number by using %:
int i;
for(i = 0; i < 100000; i++) {
if(i%100 == 0)
printf("%d", i);
}
That will reduce the number of syscalls from ~100000 to ~1000, which in turn would increase the performance of the loop.
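A rough way to measure the difference is sketched below. time_loop is a hypothetical helper; it uses clock(), which reports CPU time rather than wall-clock time, so time spent blocked on the terminal may be under-counted.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Runs n iterations and prints i to `out` whenever i % every == 0.
   Returns the CPU time consumed, in seconds, as measured by clock(). */
static double time_loop(FILE *out, int n, int every)
{
    assert(out != NULL && n >= 0 && every > 0);
    clock_t start = clock();
    for (int i = 0; i < n; i++) {
        if (i % every == 0)
            fprintf(out, "%d ", i);
    }
    fflush(out);   /* include the cost of actually writing the output */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_loop(stdout, 100000, 1) with time_loop(stdout, 100000, 100) should show the effect; the exact numbers depend on the terminal and on how stdout is buffered.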

The problem is that an I/O operation such as printf takes much more time than the processor's calculations. You can reduce the time if you accumulate the messages and print them all at the end.
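As a sketch of that idea, the messages can be collected in a caller-supplied buffer and written once after the loop. append_progress is a hypothetical helper, and the message format is borrowed from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Appends "We are at <i>\n" to buf, which already holds `used` bytes
   of a NUL-terminated string. Returns the new length. If the message
   does not fit, the buffer keeps its old contents and length. */
static size_t append_progress(char *buf, size_t cap, size_t used, long i)
{
    assert(buf != NULL && used < cap);
    int n = snprintf(buf + used, cap - used, "We are at %ld\n", i);
    if (n < 0 || (size_t)n >= cap - used) {
        buf[used] = '\0';          /* drop the truncated message */
        return used;
    }
    return used + (size_t)n;
}
```

After the loop, a single fputs(buf, stdout) emits everything at once. The obvious trade-off is that nothing is visible while the loop is still running, which defeats the purpose if the messages are meant as live progress.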

Notation:
Tp = total time spent executing the progress statements.
Tn = total time spent doing the other normal stuff.
>> = Much greater than
If performance is your main criterion, you want Tn >> Tp. This strongly suggests that the code should be profiled so that you can pick appropriate values. The routine 'printf()' is considered a slow routine (much slower than %) and is a blocking routine (that is, the thread that calls it may pend waiting for a resource used by it).
Personally, I like to abstract away the progress indicator. It can be a logging mechanism, a printf, a progress box, .... Heck, it may be updating a structure that is read by another thread/task/process.
id = progressRegister (<some predefined type of progress update mechanism>);
for(i = 0; i < large_n; i++) {
progressUpdate (id, <string>, i, large_n);
// Do some other stuff
}
progressUnregister(id);
Yes, there is some overhead in calling the routine 'progressUpdate()' on each iteration, but again, as long as Tn >> Tp, it usually is not that important.
Hope this helps.
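To make the idea concrete, here is one possible shape for such an interface. It is only a sketch: the Progress struct, its fields, and the choice of a FILE * as the "mechanism" are assumptions, and the function names are reused from the outline above with different signatures.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical progress handle. */
typedef struct {
    FILE *out;      /* where updates are written */
    long  every;    /* emit one update per `every` iterations */
    long  emitted;  /* number of updates written so far */
} Progress;

static Progress progressRegister(FILE *out, long every)
{
    assert(out != NULL && every > 0);
    Progress p = { out, every, 0 };
    return p;
}

/* Cheap on most iterations: one modulo and one branch. */
static void progressUpdate(Progress *p, const char *label, long i, long total)
{
    if (i % p->every != 0)
        return;
    fprintf(p->out, "%s: %ld of %ld\n", label, i, total);
    p->emitted++;
}

static void progressUnregister(Progress *p)
{
    fflush(p->out);
    p->out = NULL;
}
```

Because the rate limiting lives inside progressUpdate, the caller's loop stays unchanged if the mechanism is later swapped for a log file or a GUI progress box.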

Related

How can I make a for loop work slower in C?

I am trying to implement a nested for loop in which I move to the next "n" index every second and, every 8 seconds, to the next "i" index. If something happens during that second, I will update some things. Right now the for loop is too fast: it finishes its job before I can do what I'm trying to do. For example, if it is between the 3rd and 4th seconds, go to i=0 and n=2, and if a beat happens, update
for (int i = 0; i < 8; i++) {
for (int n = 0; n < 8; n++) {
if (Beat < 1) {
input[i] = input[i] + pow(2.0, n);
for (int k = 1; k < 9; k++) {
SPI_Write2(k, input[k - 1]);
}
}
}
}
On Linux, #include <unistd.h>; on Windows, #include <Windows.h>.
sleep(1); // sleep for 1 second (on Windows: Sleep(1000), which takes milliseconds)
However, using some kind of wait to avoid race conditions is a bad idea in general. Either go with a callback function or use signals. If the code is threaded, then e.g. wait for the thread to be joined (or detach it).
The provided code is not sufficient, and the question is a little unclear.
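If a plain delay is all that is needed, a paced version of the nested loop could look roughly like this on a POSIX system. sleep_ms and run_paced are hypothetical helpers, and on_step stands in for the Beat check and SPI update from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleeps for ms milliseconds, resuming if a signal interrupts the sleep.
   Returns 0 on success, -1 on any other error. */
static int sleep_ms(long ms)
{
    assert(ms >= 0);
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Visits every (i, n) pair, pausing step_ms after each inner step.
   Returns the number of steps taken, or -1 if sleeping failed. */
static int run_paced(int outer, int inner, long step_ms,
                     void (*on_step)(int i, int n))
{
    int steps = 0;
    for (int i = 0; i < outer; i++) {
        for (int n = 0; n < inner; n++) {
            if (on_step)
                on_step(i, n);
            steps++;
            if (sleep_ms(step_ms) != 0)
                return -1;
        }
    }
    return steps;
}
```

run_paced(8, 8, 1000, handler) would then advance n once per second and i once every 8 seconds. The work done inside the handler adds to each step, so the timing drifts slightly over a full run.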

How can I best "parallelise" a set of four nested for()-loops in a Brute-Force attack?

I have the following homework task:
I need to brute force 4-char passphrase with the following mask
%%##
( where # - is a numeric character, % - is an alpha character )
in several threads using OpenMP.
Here is a piece of code, but I'm not sure if it is doing the right thing:
int i, j, m, n;
const char alph[26] = "abcdefghijklmnopqrstuvwxyz";
const char num[10] = "0123456789";
#pragma omp parallel for private(pass) schedule(dynamic) collapse(4)
for (i = 0; i < 26; i++)
for (j = 0; j < 26; j++)
for (m = 0; m < 10; m++)
for (n = 0; n < 10; n++) {
pass[0] = alph[i];
pass[1] = alph[j];
pass[2] = num[m];
pass[3] = num[n];
/* Working with pass here */
}
So my question is :
How to correctly specify the "parallel for" instruction, in order to split the range of passphrases between several cores?
Help is much appreciated.
Your code is pretty much right, except for using alph instead of num. If you're able to define the pass variable within the loop, that'll save you many a headache.
A full MWE might look like:
//Compile with, e.g.: gcc -O3 temp.c -std=c99 -fopenmp
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int PassCheck(char *pass){
usleep(50); //Sleep for 50 microseconds to simulate work
return strncmp(pass, "qr34", 4)==0;
}
int main(){
const char alph[27] = "abcdefghijklmnopqrstuvwxyz";
const char num[11] = "0123456789";
char goodpass[5] = "----"; //Provide a default password to indicate an error state
int i, j, m, n;
#pragma omp parallel for collapse(4)
for (i = 0; i < 26; i++)
for (j = 0; j < 26; j++)
for (m = 0; m < 10; m++)
for (n = 0; n < 10; n++){
char pass[4];
pass[0] = alph[i];
pass[1] = alph[j];
pass[2] = num[m];
pass[3] = num[n];
if(PassCheck(pass)){
//It is good practice to use `critical` here in case two
//passwords are somehow both valid. This won't arise in
//your code, but is worth thinking about.
#pragma omp critical
{
memcpy(goodpass, pass, 4);
goodpass[4] = '\0';
//#pragma omp cancel for //Escape for loops!
}
}
}
printf("Password was '%s'.\n",goodpass);
return 0;
}
Dynamic scheduling
Using a dynamic schedule here is probably pointless. Your expectation should be that each password will take, on average, about the same amount of time to check. Therefore, each iteration of the loop will take about the same amount of time. Therefore, there is no need to use dynamic scheduling because your loops will remain evenly distributed.
Visual noise
Note that the loop nest is stacked, rather than indented. You'll often see this in code where there are many nested loops as it tends to reduce visual noise.
Breaking early
#pragma omp cancel for is available as of OpenMP 4.0; however, I got a warning using it in this context, so I've commented it out. If you are able to get it working, that'll reduce your run-time by half since all effort is wasted once the correct password has been found and the password will, on average, be located half-way through the search space.
Where the guessed password is generated
One of the commenters suggests moving, e.g., pass[0] so that it is not in the innermost loop. This is a bad idea, as doing so will prevent you from using collapse(4). As a result you could parallelize the outer loop, but you run the risk that its iteration count cannot be evenly divided by the number of threads, resulting in a large load imbalance. Alternatively, you could parallelize the inner loop, which exposes you to the same problem plus high synchronization costs each time the loop ends.
Why usleep?
The usleep function causes the code to run slowly. This is intentional; it provides feedback on the effect of parallelism, since the workload is so small.
If I remove the usleep, then the code completes in 0.003s on a single core and 0.004s on 4 cores. You cannot tell that the parallelism is even working. Leaving usleep in gives 8.950s on a single core and 2.257s on 4 cores, an apt demonstration of the effectiveness of the parallelism.
Naturally, you would remove this line once you're sure that parallelism is working correctly.
Further, any actual brute-force password cracker would likely be computing an expensive hash function inside the PassCheck function. Including usleep() here allows us to simulate that function and experiment with high-level design without having to write the function first.
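As an aside, the same search space can be walked with a single flat loop by decoding the loop index into a candidate, which is roughly what collapse(4) arranges for you. This is an alternative sketch and not part of the MWE above; index_to_pass and find_pass_index are made-up names. Without -fopenmp the pragmas are ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <string.h>

enum {
    N_ALPHA = 26,
    N_DIGIT = 10,
    N_PASS  = N_ALPHA * N_ALPHA * N_DIGIT * N_DIGIT   /* 67600 candidates */
};

/* Decodes idx in [0, N_PASS) into a %%## candidate, e.g. 0 -> "aa00". */
static void index_to_pass(int idx, char pass[5])
{
    static const char alph[] = "abcdefghijklmnopqrstuvwxyz";
    assert(idx >= 0 && idx < N_PASS);
    pass[3] = (char)('0' + idx % N_DIGIT); idx /= N_DIGIT;
    pass[2] = (char)('0' + idx % N_DIGIT); idx /= N_DIGIT;
    pass[1] = alph[idx % N_ALPHA];         idx /= N_ALPHA;
    pass[0] = alph[idx];
    pass[4] = '\0';
}

/* Returns the index of the candidate equal to target, or -1 if none. */
static int find_pass_index(const char *target)
{
    int found = -1;
    #pragma omp parallel for
    for (int idx = 0; idx < N_PASS; idx++) {
        char pass[5];                  /* private: declared inside the loop */
        index_to_pass(idx, pass);
        if (strcmp(pass, target) == 0) {
            #pragma omp critical
            found = idx;
        }
    }
    return found;
}
```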

How do I create a "twirly" in a C program task?

Hey guys I have created a program in C that tests all numbers between 1 and 10000 to check if they are perfect using a function that determines whether a number is perfect. Once it finds these it prints them to the user, they are 6, 28, 496 and 8128. After this the program then prints out all the factors of each perfect number to the user. This is all fine. Here is my problem.
The final part of my task asks me to:
"Use a "twirly" to indicate that your program is happily working away. A "twirly" is the following characters printed over the top of each other in the following order: '|' '/' '-' '\'. This has the effect of producing a spinning wheel - ie a "twirly". Hint: to do this you can use \r (instead of \n) in printf to give a carriage return only (instead of a carriage return linefeed). (Note: this may not work on some systems - you do not have to do it this way.)"
I have no idea what a twirly is or how to implement one. My tutor said it has something to do with the sleep and delay functions which I also don't know how to use. Can anyone help me with this last stage, it sucks that all my coding is complete but I can't get this "twirly" thing to work.
If you want to simultaneously perform the tasks of
testing the numbers, and
displaying the twirly on screen
while the process goes on, then you should look into using threads. With POSIX threads you can run the test on one thread while another thread displays the twirly to the user on the terminal.
#include<stdlib.h>
#include<pthread.h>
int Test();
void Display();
int main(){
// create threads each for both tasks test and Display
//call threads
//wait for Test thread to finish
//terminate display thread after Test thread completes
//exit code
}
Refer to chapter 12 of the Beginning Linux Programming ebook for threads.
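The skeleton could be filled in roughly as follows. This is a sketch under several assumptions: a POSIX system, a C11 compiler with <stdatomic.h>, the twirly drawn on stderr (which is unbuffered), and a 100 ms frame delay picked arbitrarily. All function names are invented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

static atomic_int done;   /* set to 1 when the search has finished */

/* Display thread: redraws the twirly until the worker signals completion. */
static void *display(void *arg)
{
    static const char twirly[] = "|/-\\";
    struct timespec delay = { 0, 100000000L };   /* 100 ms per frame */
    unsigned frame = 0;
    (void)arg;
    while (!atomic_load(&done)) {
        fprintf(stderr, "%c\r", twirly[frame++ % 4]);
        nanosleep(&delay, NULL);
    }
    return NULL;
}

/* A number is perfect when it equals the sum of its proper divisors. */
static int is_perfect(unsigned x)
{
    unsigned sum = 0;
    for (unsigned i = 1; i <= x / 2; i++)
        if (x % i == 0)
            sum += i;
    return x > 0 && sum == x;
}

/* Counts perfect numbers in [1, limit] while the twirly spins.
   Returns -1 if the display thread could not be started. */
static int count_perfect_with_twirly(unsigned limit)
{
    pthread_t tid;
    int count = 0;
    assert(limit >= 1);
    atomic_store(&done, 0);
    if (pthread_create(&tid, NULL, display, NULL) != 0)
        return -1;
    for (unsigned x = 1; x <= limit; x++)
        if (is_perfect(x))
            count++;
    atomic_store(&done, 1);        /* tell the display thread to stop */
    pthread_join(tid, NULL);
    return count;
}
```

Build with something like gcc -pthread. The atomic flag matters here: a plain int shared between the two threads would be a data race.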
Given the program upon which the user is "waiting", I believe the problem as stated and the solutions using sleep() or threads are misguided.
To produce all the perfect numbers below 10,000 using C on a modern personal computer takes about 1/10 of a second. So any device to show the computer is "happily working away" would either never be seen or would significantly interfere with the time it takes to get the job done.
But let's make a working twirly for perfect number search anyway. I've left off printing the factors to keep this simple. Since 10,000 is too low to see the twirly in action, I've upped the limit to 100,000:
#include <stdio.h>
#include <string.h>
int main()
{
const char *twirly = "|/-\\";
for (unsigned x = 1; x <= 100000; x++)
{
unsigned sum = 0;
for (unsigned i = 1; i <= x / 2; i++)
{
if (x % i == 0)
{
sum += i;
}
}
if (sum == x)
{
printf("%u\n", x); // x is unsigned, so use %u
}
printf("%c\r", twirly[x / 2500 % strlen(twirly)]);
}
return 0;
}
No need for sleep() or threads, just key it into the complexity of the problem itself and have it update at reasonable intervals.
Now here's the catch, although the above works, the user will never see a fifth perfect number pop out with a 100,000 limit and even with a 100,000,000 limit, which should produce one more, they'll likely give up as this is a bad (slow) algorithm for finding them. But they'll have a twirly to watch.
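One practical detail: stdout is usually line-buffered on a terminal, so a character followed by "\r" may sit in the buffer until the next newline. A small helper that flushes after each frame avoids this. The names twirly_frame and twirly_draw are invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the twirly character for a given step: | / - \ and repeat. */
static char twirly_frame(unsigned long step)
{
    static const char frames[] = "|/-\\";
    return frames[step % 4];
}

/* Draws one frame at the start of the current line and flushes, so the
   character appears immediately even on a buffered stream. */
static void twirly_draw(FILE *out, unsigned long step)
{
    assert(out != NULL);
    fputc(twirly_frame(step), out);
    fputc('\r', out);
    fflush(out);
}
```

Calling twirly_draw(stdout, x / 2500) only when x % 2500 == 0 keeps the number of flushes small compared with drawing on every iteration.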
i as integer
p as integer, set p = 0
str as character pointer, set str = "|/-\\"
set len = 4
loop i: 1 to 10000
    sum as integer
    set sum = 0
    loop j: 1 to i/2
        if i%j == 0
            sum+=j
    i is perfect if sum==i
    if i%100 == 0
        print str[p] using "%c\r" as format specifier
        Increment p and assign its modulo by len to p

How to change the count of a pthread_barrier?

The problem is that we have to implement a kind of "running-contest" using pthreads. After one track we have to wait until all runners/threads are done until this point, so we use a barrier for that.
But now we also have to implement the probability of injuries. So we wrote a function, which sometimes reduces the number of runners, and reinitialize the barrier with a smaller count. Now the problem is that the program is not always terminating. I guess the reason for this is that some of the threads have already been at the barrier, and after reinitializing them the required amount is not arriving.
The code for the simulation of the injury looks like this:
void simulateInjury(int number) {
int totalRunners = 0;
int i = 0;
if (rand() % 10 < 1) {
printf("Runner of Team %i injured!\n", number);
pthread_mutex_lock(&evaluate_teamsize);
standings.teamSize[number]--;
for (i = 0; i < teams; i++) {
totalRunners += standings.teamSize[i];
}
pthread_barrier_destroy(&barrier_track1);
pthread_barrier_destroy(&barrier_track4[number]);
pthread_barrier_init(&barrier_track1, NULL, totalRunners);
pthread_barrier_init(&barrier_track4[number], NULL, standings.teamSize[number]);
pthread_mutex_unlock(&evaluate_teamsize);
pthread_exit(NULL);
}
}
Or is there maybe a way to just change the count argument of the barrier?
I see two errors:
You should not re-initialize a barrier while some thread is using it.
You should not execute the re-initialization of the barrier simultaneously by several threads.
For the first you can create a second barrier that you use in alternation with the first.
For the second you should use the return value of the wait function to designate one particular thread that will do the re-initialization.
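As for changing the count directly: pthread barriers offer no call for that, as far as I know. A different approach from the one above is to build a small barrier from a mutex and a condition variable, so that an injured runner can lower the expected count safely. The sketch below is that swapped-in technique, not a fix for the pthread_barrier code, and every name in it (FlexBarrier, flex_wait, flex_leave) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int      expected;     /* runners still in the race */
    int      waiting;      /* runners currently blocked in flex_wait */
    unsigned generation;   /* incremented each time the barrier opens */
} FlexBarrier;

static void flex_init(FlexBarrier *b, int expected)
{
    assert(expected > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->expected = expected;
    b->waiting = 0;
    b->generation = 0;
}

/* Opens the barrier. Must be called with b->lock held. */
static void flex_open_locked(FlexBarrier *b)
{
    b->waiting = 0;
    b->generation++;
    pthread_cond_broadcast(&b->all_arrived);
}

/* Blocks until `expected` runners have arrived. */
static void flex_wait(FlexBarrier *b)
{
    pthread_mutex_lock(&b->lock);
    if (++b->waiting >= b->expected) {
        flex_open_locked(b);
    } else {
        unsigned gen = b->generation;
        while (gen == b->generation)      /* guards against spurious wakeups */
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Called by an injured runner instead of flex_wait: lowers the count and
   releases the others if they were only waiting for this runner. */
static void flex_leave(FlexBarrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->expected--;
    if (b->waiting > 0 && b->waiting >= b->expected)
        flex_open_locked(b);
    pthread_mutex_unlock(&b->lock);
}
```

An injured runner would call flex_leave on every barrier it was counted in (the overall one and its team's one) and then exit. Because the count only changes under the mutex, threads already waiting are never stranded.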

Bakery Lock when used inside a struct doesn't work

I'm new at multi-threaded programming and I tried to code the Bakery Lock Algorithm in C.
Here is the code:
int number[N]; // N is the number of threads
int choosing[N];
void lock(int id) {
choosing[id] = 1;
number[id] = max(number, N) + 1;
choosing[id] = 0;
for (int j = 0; j < N; j++)
{
if (j == id)
continue;
while (1)
if (choosing[j] == 0)
break;
while (1)
{
if (number[j] == 0)
break;
if (number[j] > number[id]
|| (number[j] == number[id] && j > id))
break;
}
}
}
void unlock(int id) {
number[id] = 0;
}
Then I run the following example. I run 100 threads and each thread runs the following code:
for (i = 0; i < 10; ++i) {
lock(id);
counter++;
unlock(id);
}
After all threads have been executed, the result of the shared counter is 10 * 100 = 1000 which is the expected value. I executed my program multiple times and the result was always 1000. So it seems that the implementation of the lock is correct. That seemed weird based on a previous question I had because I didn't use any memory barriers/fences. Was I just lucky?
Then I wanted to create a multi-threaded program that will use many different locks. So I created this (full code can be found here):
typedef struct {
int number[N];
int choosing[N];
} LOCK;
and the code changes to:
void lock(LOCK l, int id)
{
l.choosing[id] = 1;
l.number[id] = max(l.number, N) + 1;
l.choosing[id] = 0;
...
Now when executing my program, sometimes I get 997, sometimes 998, sometimes 1000. So the lock algorithm isn't correct.
What am I doing wrong? What can I do in order to fix it?
Is it perhaps a problem now that I'm reading the arrays number and choosing from a struct, and that's not atomic or something?
Should I use memory fences and if so at which points (I tried using asm("mfence") in various points of my code, but it didn't help)?
With pthreads, the standard states that accessing a variable in one thread while another thread is, or might be, modifying it is undefined behavior. Your code does this all over the place. For example:
while (1)
if (choosing[j] == 0)
break;
This code accesses choosing[j] over and over while waiting for another thread to modify it. The compiler is entirely free to modify this code as follows:
int cj=choosing[j];
while(1)
if(cj == 0)
break;
Why? Because the standard is clear that another thread may not modify the variable while this thread may be accessing it, so the value can be assumed to stay the same. But clearly, that won't work.
It can also do this:
while(1)
{
int cj=choosing[j];
if(cj==0) break;
choosing[j]=cj;
}
Same logic. It is perfectly legal for the compiler to write back a variable whether it has been modified or not, so long as it does so at a time when the code could be accessing the variable. (Because, at that time, it's not legal for another thread to modify it, so the value must be the same and the write is harmless. In some cases, the write really is an optimization and real-world code has been broken by such writebacks.)
If you want to write your own synchronization functions, you have to build them with primitive functions that have the appropriate atomicity and memory visibility semantics. You must follow the rules or your code will fail, and fail horribly and unpredictably.
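As one illustration of such primitives, the bakery lock can be written with C11 atomics from <stdatomic.h>, whose default loads and stores are sequentially consistent. This is a sketch, not a drop-in replacement: BAKERY_THREADS and the function names are assumptions. The lock is also taken by pointer here, since a struct passed by value would give each call its own private copy of the arrays.

```c
#include <assert.h>
#include <stdatomic.h>

#define BAKERY_THREADS 4   /* hypothetical thread count */

typedef struct {
    atomic_int number[BAKERY_THREADS];
    atomic_int choosing[BAKERY_THREADS];
} BakeryLock;

static void bakery_init(BakeryLock *l)
{
    for (int i = 0; i < BAKERY_THREADS; i++) {
        atomic_init(&l->number[i], 0);
        atomic_init(&l->choosing[i], 0);
    }
}

static void bakery_lock(BakeryLock *l, int id)
{
    assert(id >= 0 && id < BAKERY_THREADS);

    /* Take a ticket one larger than any ticket currently held. */
    atomic_store(&l->choosing[id], 1);
    int max = 0;
    for (int j = 0; j < BAKERY_THREADS; j++) {
        int n = atomic_load(&l->number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&l->number[id], max + 1);
    atomic_store(&l->choosing[id], 0);

    for (int j = 0; j < BAKERY_THREADS; j++) {
        if (j == id)
            continue;
        /* Each atomic_load re-reads memory, so the compiler cannot
           hoist the read out of the loop. */
        while (atomic_load(&l->choosing[j]))
            ;
        for (;;) {
            int nj = atomic_load(&l->number[j]);
            int ni = atomic_load(&l->number[id]);
            if (nj == 0 || nj > ni || (nj == ni && j > id))
                break;
        }
    }
}

static void bakery_unlock(BakeryLock *l, int id)
{
    atomic_store(&l->number[id], 0);
}
```

Each thread would call bakery_lock(&shared, id) and bakery_unlock(&shared, id) on the same BakeryLock object. For real code, pthread_mutex_t remains the simpler and faster choice.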

Resources