Please explain the following parallel code template - C

Hi, I am new to parallel programming, and while reading about it I came across a code template in C. Can you please explain to me what these lines mean, line by line?
#include <omp.h>

main () {
    int var1, var2, var3;

    /* Serial code
       .
       .
       . */

    /* Beginning of parallel section. Fork a team of threads.
       Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads
           .
           .
           . */

        /* All threads join master thread and disband */
    }

    /* Resume serial code
       .
       .
       . */
}

First, I'd like to say this is a bad place to ask a question, because you've obviously not done any research into the matter yourself (especially as this template is pretty self-explanatory).
I'll explain it briefly for you, however:
#include <omp.h>
Includes the OpenMP header, which gives you access to the OpenMP runtime functions (such as omp_get_thread_num()).
main () {
Please tell me you understand this line? Note, however, that omitting the return type is very bad practice; you should really write int main().
int var1, var2, var3;
Define 3 integers.
Where "Serial code" is written, just read ordinary code, all of it executing on a single thread/processor.
#pragma omp parallel private(var1, var2) shared(var3)
This line is perhaps the most important. It says that the code in the following { } block can be executed in parallel. private(var1, var2) means each thread gets its own copy of those variables (i.e. thread 1's var1 is not the same as thread 2's var1), and shared(var3) means var3 is the same variable on every thread (if it changes on thread 1, it also changes on thread 2).
The code executes in parallel until the closing } is reached, at which point execution returns to normal, serial operation on a single thread.
You should really read some basic OpenMP tutorials, which you can find anywhere on the internet with a simple Google search.
I hope this gets you started though.
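To make this concrete, here is a minimal, compilable sketch of the template with the placeholders filled in. The variable uses and the printed messages are invented purely for illustration; build it with an OpenMP-enabled compiler.
#include <stdio.h>
#include <omp.h>

int main(void) {
    int var1, var2, var3 = 0;

    /* Serial code: runs on a single thread. */
    printf("serial part\n");

    /* Fork a team of threads; each gets its own var1/var2, all share var3. */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        var1 = omp_get_thread_num();   /* private: each thread has its own copy */
        var2 = var1 * 2;

        #pragma omp atomic
        var3++;                        /* shared: one variable updated by all threads */

        printf("thread %d: var2 = %d\n", var1, var2);
    }   /* threads join the master thread here */

    /* Resume serial code. */
    printf("back to serial, var3 = %d\n", var3);
    return 0;
}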

Related

What is the equivalent of OpenMP Tasks in Pthreads in this recursion example?

I am studying parallel programming and I use the following OpenMP directive to parallelize a recursive function:
void recursiveFunction()
{
    //sequential code

    #pragma omp task
    {
        recursiveFunction(); //First instance
    }
    //The two instances are independent from each other,
    //which allows an embarrassingly parallel strategy
    recursiveFunction(); //Second instance
}
It works well enough, but I am having a hard time trying to write an equivalent parallelization using only pthreads.
I was thinking of something like this:
void recursiveFunction()
{
    //sequential code

    pthread_t thread;
    //First instance
    pthread_create(&thread, NULL, recursiveFunction, recFuncStructParameter);
    //Second instance
    recursiveFunction();
}
And... I am kind of lost here. I cannot grasp how to control the number of threads: if, for example, I want only 16 threads to be created, and all of them are "busy", then the code should continue sequentially until one of them is freed, and then go parallel again.
Could someone point me in the right direction? I have seen many examples which seem really complicated, but I have a feeling that in this particular example, which allows an embarrassingly parallel strategy, there is a simple approach which I am unable to pin down...
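There is no answer recorded here, but for illustration, here is a minimal sketch of one common pattern: keep a shared counter of live helper threads, spawn a new thread only while the counter is below the cap, and fall back to a plain sequential call otherwise. The names MAX_THREADS, count_lock and thread_entry are invented for this sketch, and the termination check inside the recursion is elided just as it is in the question.
#include <pthread.h>

#define MAX_THREADS 16                     /* cap on helper threads */

static int active_threads = 0;             /* number of live helper threads */
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

void recursiveFunction(void);              /* the recursive work, as in the question */

static void *thread_entry(void *arg)
{
    (void)arg;
    recursiveFunction();
    pthread_mutex_lock(&count_lock);
    active_threads--;                      /* release the slot when done */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

void recursiveFunction(void)
{
    /* ...sequential code, including the elided termination check... */

    pthread_t thread;
    int spawned = 0;

    pthread_mutex_lock(&count_lock);
    if (active_threads < MAX_THREADS) {    /* reserve a slot if one is free */
        active_threads++;
        spawned = 1;
    }
    pthread_mutex_unlock(&count_lock);

    if (spawned && pthread_create(&thread, NULL, thread_entry, NULL) == 0) {
        pthread_detach(thread);            /* or keep the handle and join later */
    } else {
        if (spawned) {                     /* pthread_create failed: give the slot back */
            pthread_mutex_lock(&count_lock);
            active_threads--;
            pthread_mutex_unlock(&count_lock);
        }
        recursiveFunction();               /* first instance, run sequentially */
    }

    recursiveFunction();                   /* second instance */
}
Whether to detach or join the spawned thread depends on whether the caller needs the first instance to have finished before it returns; the OpenMP version raises the same question, which a taskwait would answer.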

C Kernel - interrupts not working during while loops

I am making a kernel from scratch in C (not Linux; completely from scratch) and I have run into a bit of a problem. I have this code:
#include "timer.h"
int ms = 0;
void timer_handler(struct regs *r){
ms++;
println("I print every tick");
}
void sleep(int ticks){
unsigned long eticks;
eticks = ms + ticks;
while(ms < eticks);
}
And timer_handler is attached to IRQ0 (the PIT) and works perfectly fine. The println that says "I print every tick" works just fine, and if I print the ms variable in my code, it shows the correct count. But if I call the sleep function, timer_handler stops firing and the code gets stuck in an infinite loop. Is there any way I can allow interrupts to fire while inside a while loop?
REST OF CODE: https://github.com/Codepixl/CodeOS
It sounds like you are not sending the end-of-interrupt. You need to do an outb(0x20, 0x20) (the port and the data value happen to be the same, so it doesn't matter how you have defined outb) for IRQ0 in your interrupt handler. It is not clear from your question whether you are getting any interrupts at all; if you are getting only one interrupt, then this is most likely your problem. If not, post all of the relevant code, e.g. where you are setting up your IDT.
If that is not the case, one simple thing you can do to debug is to trigger a software interrupt via the int instruction for the corresponding IDT entry. If that works, then the problem has something to do with the PIC or PIT and not with your IDT entry.
Also, as others have mentioned, you should make ms volatile or use a barrier so the compiler doesn't optimize away re-reading the value. I don't think this is your problem, though, as you would still be seeing multiple prints if the timer interrupt were occurring.
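Putting both suggestions together, a sketch of what the handler and sleep() might look like; this assumes an outb(port, value) helper like the one described above and keeps the names from the question:
#include "timer.h"

volatile int ms = 0;                 /* volatile: force a re-read on every loop iteration */

void timer_handler(struct regs *r)
{
    (void)r;
    ms++;
    outb(0x20, 0x20);                /* send end-of-interrupt to the master PIC */
}

void sleep(int ticks)
{
    int eticks = ms + ticks;
    while (ms < eticks)
        ;                            /* interrupts keep firing and advance ms */
}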
It was just VirtualBox for some reason; it works on my actual PC.

Enforce that a code segment is atomic inside a custom Linux kernel system call

I've been trying to implement a Linux system call that has been giving me problems, and I suspect it's because there is no locking (or maybe preemption control) going on in my code.
There is a critical section in a very frequently called function (this custom function gets called every time a system call is made), and it gets started/stopped by system calls as well. Is there any way to ensure that this critical segment, which runs every time any system call is made in the Linux kernel, is non-preemptible and must finish execution before anything else can happen?
If I understand the question correctly, the simplest way is to use a spin-lock:
#include <linux/spinlock.h>
static DEFINE_SPINLOCK(foo_lock);
int my_system_call(...)
{
    ...
    /* critical section starts */
    spin_lock(&foo_lock);
    /* critical section goes here */
    ...
    /* critical section ends */
    spin_unlock(&foo_lock);
    ...
}
Such a critical section will not be preempted, and concurrent executions of the critical section won't overlap.

ioctl and execution time

I have a program running two threads - they communicate using message queues.
In one thread, I call ioctl() to access the hardware decryptor. The code goes like:
void Decrypt()
{
    ...
    ...
    if (<condition 1>)
    {
        ...
        ...
        retVal = ioctl(...);
        comesInHere1++;
    }
    if (<condition 2>)
    {
        ...
        ...
        retVal = ioctl(...);
        comesInHere2++;
    }
comesInHere1 and comesInHere2 are used to count the number of times execution enters each if block.
The entire program takes 80 ms to execute. But if I comment out the test variables (comesInHere1 and comesInHere2 inside the if blocks), the execution time increases by 8 ms to 88 ms!
How is that possible? I can't comment out the variables now since that increases the time taken, and I can't keep them either; I will get killed in code review :)
Kindly let me know
Thanks
Cache effects? It's possible that by adding that bit of extra data you're pushing variables onto different cache lines, variables that would otherwise end up on the same line and cause thrashing. You could experiment by running on different systems and by adding padding data between variables that are used exclusively in each thread.
What happens if you serialize the processing onto a single core?
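For example, one way to experiment with the padding suggestion (a sketch only; the struct and field names are invented, and it assumes C11 for alignas) is to force each thread's data onto its own cache line:
#include <stdalign.h>   /* alignas (C11) */

#define CACHE_LINE 64

/* Hypothetical per-thread state: each member gets its own cache line, so
   writes by one thread do not invalidate the line holding the other
   thread's data (false sharing). */
struct per_thread_state {
    alignas(CACHE_LINE) unsigned long thread1_counter;  /* touched only by thread 1 */
    alignas(CACHE_LINE) unsigned long thread2_counter;  /* touched only by thread 2 */
};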

openmp sections running sequentially

I have the following code:
#pragma omp parallel sections private(x,y,cpsrcptr) firstprivate(srcptr) lastprivate(srcptr)
{
#pragma omp section
{
//stuff
}
#pragma omp section
{
//stuff
}
}
According to the Zoom profiler, two threads are created, one thread executes both the sections, and the other thread simply blocks!
Has anyone encountered anything like this before? (And yes, I do have a dual core machine).
I guess I don't know too much about profilers yet, but one problem I've run into is forgetting to pass the OpenMP compiler flag (e.g. -fopenmp with GCC) to enable support.
Alternatively, what if you just created a simple application to try to verify the threads?
#include <iostream>
#include <omp.h>

int main() {
    #pragma omp parallel num_threads(2)
    {
        #pragma omp critical
        std::cout << "hello from thread: " << omp_get_thread_num() << std::endl;
    }
}
Maybe see if that works?
No, I can't say that I have encountered anything quite like this before, although I have encountered a variety of problems with OpenMP codes.
I can't see anything immediately wrong with your code snippet. When you use the Zoom profiler, it affects the execution of the program. Have you checked that, outside the profiler, the program runs the sections on different threads? If you have more sections, do they all run on the same thread, or do they run on different threads? If you only have two sections, add some dummy ones while you test this.
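As a quick check, something like the following debugging sketch (not the original code) prints which thread runs each section, including a couple of dummy sections as suggested above:
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("section 1 on thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section 2 on thread %d\n", omp_get_thread_num());

        #pragma omp section          /* dummy section for testing */
        printf("section 3 on thread %d\n", omp_get_thread_num());

        #pragma omp section          /* dummy section for testing */
        printf("section 4 on thread %d\n", omp_get_thread_num());
    }
    return 0;
}
If every section reports the same thread number in this trivial program too, the problem is more likely in how OpenMP is enabled or configured than in the original code.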
