Understanding broadcast operation mpi - c

I have an MPI program that was given to us. It reads an integer and a double from the user and has each process announce the values it receives.
For example:
user input = 7 10.1
Output:
Process 1 got 7 and 10.100000
Process 2 got 7 and 10.100000
.
.
I understand that each process just has to announce the values given by the user, delivered through a single broadcast, but the code seems complicated and I can't follow its logic.
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
int rank; //rank of the process
struct {int a;double b;} value;
MPI_Datatype mystruct;
int blocklens[2]; //what is this?
MPI_Aint indices[2]; //what is this?
MPI_Datatype oldtype[2];
MPI_Init(&argc,&argv); //initialize MPI environment
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
blocklens[0] = 1;
blocklens[1] = 1;
oldtype[0] = MPI_INT;
oldtype[1] = MPI_DOUBLE;
MPI_Get_address(&value.a, &indices[0]);
MPI_Get_address(&value.b, &indices[1]);
indices[1] = indices[1] - indices[0];
indices[0] = 0;
MPI_Type_create_struct(2,blocklens,indices,oldtype,&mystruct);
MPI_Type_commit(&mystruct);
while (value.a >= 0) {
if (rank == 0) {
printf("Enter an integer and double: ");
fflush(stdout);
scanf("%d %lf",&value.a,&value.b);
}
MPI_Bcast(&value,1,mystruct,0,MPI_COMM_WORLD);
printf("Process %d got %d and %lf\n",rank,value.a,value.b);
}
MPI_Type_free(&mystruct);
MPI_Finalize();
return 0;
}
I would appreciate it if someone could give me a run-through of how the code works, as I find it really hard to understand.

This code creates an MPI derived datatype so that struct value can be broadcast in a single MPI call.
This is IMHO a bad example since:
- the offsetof() macro should be used to (directly) populate the displacements array (indices is a very poor choice here)
- the predefined MPI_DOUBLE_INT datatype is a perfect fit (do not forget to swap a and b in the struct value definition)
- as a matter of taste, I'd rather recommend you pass the values via the command line rather than reading them from stdin (this is very subjective, and from experience, you will avoid surprises)
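The offsetof() point can be sketched roughly as follows. This is only an illustration, not the original program: it uses size_t so that it compiles without mpi.h (in the MPI code the array would be of type MPI_Aint and would be passed to MPI_Type_create_struct), and the names value_pair and fill_displacements are made up for the example.

```c
#include <stddef.h> /* offsetof, size_t */

/* Same layout as the anonymous struct in the question. */
struct value_pair {
    int a;
    double b;
};

/* Fill a two-entry displacement array straight from the struct layout.
 * Assumption: size_t stands in for MPI_Aint so no MPI header is needed. */
void fill_displacements(size_t disp[2])
{
    disp[0] = offsetof(struct value_pair, a); /* first member is at offset 0 */
    disp[1] = offsetof(struct value_pair, b); /* includes any padding after a */
}
```

Compared with the question's code, this replaces the two MPI_Get_address calls and the manual subtraction with two assignments.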

Related

How to understand the type of storage of a pointer

I have as an homework this task:
Given a void** ptr_addr write a function that return 0 if the type of storage of *ptr_addr is static or automatic and return 1 if the type of storage of *ptr_addr is dynamic.
The language of the code must be C.
The problem is that theoretically I know what the task is about but I don't know how to check the
previous condition with a code.
Thanks for the help!
Normally I don't do homework, but in cases like this I may make an exception.
Bear in mind that what I'm about to present is horrible code. Also it doesn't meet your requirements as stated — you'll have to adapt it for that. Also it may not meet your instructor's expectations: for an instructor demented enough to be assigning this task, I can't begin to guess his (her? its?) expectations. You may get dinged for using the technique I've presented, or for presenting someone else's work. Also I'm going to get dinged for presenting this code here on Stack Overflow, because no, it's nothing like portable or guaranteed to do anything, let alone to work. I have no idea whether it'll work on your system.
Nevertheless, and may God help me, I tested it, and it does "work" on a modern Debian Linux system.
#include <unistd.h>
extern etext, edata, end;
char *
mcat(void *p)
{
int dummy;
if(p < &etext)
return "text";
else if(p < &edata)
return "data";
else if(p < &end)
return "bss";
else if(p < sbrk(0))
return "heap";
else if(p > &dummy)
return "stack";
else return "?";
}
You'll get a good number of warnings if you compile this, which could theoretically be silenced using some explicit casts, but I think the warnings are actually pretty appropriate, given the nefariousness of this code.
How it works: on at least some Unix-like systems, etext, edata, and end are magic symbols corresponding to the ends of the program's text, initialized data, and uninitialized data segments, respectively. sbrk(0) gives you a pointer to the top of the heap that a traditional implementation of malloc is using. And &dummy is a good approximation of the bottom of the stack.
Test program:
#include <stdio.h>
#include <stdlib.h>
int g = 2;
int g2;
int main()
{
int l;
static int s = 3;
static int s2;
int *p = malloc(sizeof(int));
printf("g: %s\n", mcat(&g));
printf("g2: %s\n", mcat(&g2));
printf("main: %s\n", mcat(main));
printf("l: %s\n", mcat(&l));
printf("s: %s\n", mcat(&s));
printf("s2: %s\n", mcat(&s2));
printf("p: %s\n", mcat(p));
}
On my test system this prints
g: data
g2: bss
main: text
l: stack
s: data
s2: bss
p: heap
I'd like to post a different approach to solve the problem:
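#include <stdlib.h>    // realloc, exit
#include <sys/types.h> // pid_t
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // fork
// this function returns 1 if ptr has been allocated by malloc/calloc/realloc, otherwise 0
int is_pointer_heap(void* ptr) {
pid_t p = fork();
if (p == 0) {
// child: realloc() on a non-heap pointer is expected to abort the process
(void) realloc(ptr, 1);
exit(0);
}
int status;
(void) waitpid(p, &status, 0);
// status is 0 only if the child exited normally with code 0
return (status == 0) ? 1 : 0;
}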
I wrote this (bad) code very quickly (and there's lot of room for improvements), but I tested it and it seems to work.
EXPLANATION: realloc() will typically crash your process if the argument passed to it is not a malloc/calloc/realloc-allocated pointer (strictly speaking the behavior is undefined, so this is not guaranteed). Here we create a new child process and let it call realloc(); if the child process crashes, we return 0, otherwise we return 1.

Dual core word counter is slow

I need to parse a huge document and one of the queries requires me to count the words in certain strings of the document. Those strings usually have between 2000 and 30000 words and my program takes ~12 seconds just to parse it all. The query that takes the longest is unsurprisingly the query which requires a word counting.
I tried using pipes and a fork to try to accelerate the process.
How it works:
I take the string and divide it by two. If I happen to divide a word in two - if text[i] != ' ' etc - then the left side of the divided text keeps looking to the left until it encounters a space and only counts words until it reaches that space. The right side counts that half word as a full word and keeps counting until it reaches the end of the string. If I divide between spaces the cycle just doesn't happen and the program proceeds to the next step.
Edit: could be a space or a \n or a \t
After that I do a fork and communicate between forks through a pipe. What goes through the pipe is the word count of one of the halves of the text. It then is added to the word count of the other half and the total is returned.
The problem:
On a test code example, it doesn't seem to help at all. The execution time still seems to be the same as if I did it all in one go.
The big problem
This function is meant to be run around 60000 times throughout the parsing. And my program takes too long to execute, in fact I had to cancel it after 2 minutes...
Where do I need help?
I need help in knowing exactly why my function is:
a) not even getting slightly faster with this supposedly dual core implementation compared to the single core one.
b) taking so long in the actual program
I hope the problem isn't that forks/pipes in C are simply too slow for what I want, and that I'm just missing something.
--
Here's the code!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
long count(char* xStr) {
long num = 0;
// state:
const char* iterar = (const char*) xStr;
int in_palavra = 0;
do switch(*iterar) {
case '\0':
case ' ': case '\t': case '\n':
if (in_palavra) { in_palavra = 0; num++; }
break;
default: in_palavra = 1;
} while(*iterar++);
return num;
}
long wordCounter(char* text) {
int LHalf = strlen(text)/2;
int DHalf = LHalf;
while(text[LHalf] != ' ' && text[LHalf] != '\n' && text[LHalf] != '\t') {
if(LHalf > 0){
LHalf--;
}
else break;
}
char* lft = malloc(LHalf);
char* rgt = malloc(DHalf);
strncpy(lft, text, LHalf);
strncpy(rgt, text + DHalf, DHalf);
int fd[2];
pid_t childpid;
pipe(fd);
long size_left;
long size_right;
if((childpid = fork()) == -1) {
perror("Error in fork");
}
if(childpid == 0) {
close(fd[0]);
size_left = count(lft);
int w = write(fd[1], &size_left, sizeof(long));
close(fd[1]); //desnecessario
exit(0);
}
else {
close(fd[1]);
int r = read(fd[0], &size_left, sizeof(long));
size_right = count(rgt);
close(fd[0]);
wait(0);
}
long total = size_right + size_left;
free(lft);
free(rgt);
return total;
}
int main(int argc, char const *argv[]) {
long num = wordCounter("aaa aaa aa a a a a a a sa sa as sas sa sa saa sa sas aa sa sas sa sa"); //23 words
printf("%ld\n", num);
return 0;
}
To follow up on my comment above:
If I/O is your bottleneck:
Consider passing the filename into your word counting program, then managing the disc I/O yourself with simple fread() and fwrite() calls that read the whole file in at once. From the sound of it, your files should fit into memory reasonably well at only 300k words - maybe worst case 3Meg files? That should read into memory very quickly.
Then, do your word counting magic on the data. My guess is that you won't even need to worry about threads or the like as scanning through memory should be nearly instant for your task. Heck, I bet even using strtok() looking for spaces and punctuation may be good enough.
But if I am wrong, the good news is that this data can easily be divided into multiple parts and passed to individual pthreads to count the data and then be collected and added when done.
If I/O is not, then the above exercise will show no gain at all, but at least it can be coded pretty quickly as a test case.
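As a rough sketch of the "divide into parts" idea, assuming the text is already in memory: the buffer can be split at a whitespace boundary without copying it, and each range counted independently. The function names below are made up for illustration. Each range could be handed to its own thread, but the sketch stays single-threaded to show only the splitting logic.

```c
#include <ctype.h>  /* isspace */
#include <stddef.h> /* size_t */
#include <string.h> /* strlen */

/* Count whitespace-separated words in the len bytes starting at s. */
size_t count_words_range(const char *s, size_t len)
{
    size_t words = 0;
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)s[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1; /* first character of a new word */
            words++;
        }
    }
    return words;
}

/* Pick a split index near the middle that does not cut a word in two:
 * advance from len/2 until whitespace (or the end) is reached. */
size_t find_split(const char *s, size_t len)
{
    size_t mid = len / 2;
    while (mid < len && !isspace((unsigned char)s[mid]))
        mid++;
    return mid;
}

/* Count the two halves separately and add the results. */
size_t count_words_split(const char *s)
{
    size_t len = strlen(s);
    size_t mid = find_split(s, len);
    return count_words_range(s, mid) + count_words_range(s + mid, len - mid);
}
```

Because the ranges are described by a pointer and a length, no malloc or strncpy is needed, and neither half has to be NUL-terminated.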

C code stack corruption changing variable

I'm hoping that someone can help me out. I have not written much in C code in over a decade and just picked this back up 2 days ago so bear with me please as I am rusty. THANK YOU!
What:
I'm working on creating a very simple thread pool for an application. This code is written in C on CodeBlocks using GNU GCC for the compiler. It is built as a command line application. No additional files are linked or included.
The code should create X threads (in this case I have it set to 10) each of which sits and waits while watching an array entry (identified by the threads thread index or count) for any incoming data it might need to process. Once a given child has processed the data coming in via the array there is no need to pass the data back to the main thread; rather the child should simply reset that array entry to 0 to indicate that it is ready to process another input. The main thread will receive requests and will dole them out to whatever thread is available. If none are available then it will refuse to handle that input.
For simplicity sake the code below is a complete and working but trimmed and gutted version that DOES exhibit the stack overflow I am trying to track down. This compiles fine and initially runs fine but after a few passes the threadIndex value in the child thread process (workerThread) becomes corrupt and jumps to weird values - generally becoming the number of milliseconds I have put in for the 'Sleep' function.
What I have checked:
The threadIndex variable is not a global or shared variable.
All arrays are plenty big enough to handle the max number of threads I am creating.
All loops have the loopvariable reset to 0 before running.
I have not named multiple variables with the same name.
I use atomic_load to make sure I don't write to the same global array variable with two different threads at once. Please note I am rusty... I may be misunderstanding how this part works.
I have placed test cases all over to see where the variable goes nuts and I am stumped.
Best Guess
All of my research confirms what I recall from years back; I likely am going out of bounds somewhere and causing stack corruption. I have looked at numerous other problems like this on google as well as on stack overflow and while all point me to the same conclusion I have been unable to figure out what specifically is wrong in my code.
#include<stdio.h>
//#include<string.h>
#include<pthread.h>
#include<stdlib.h>
#include<conio.h>
//#include<unistd.h>
#define ESCAPE 27
int maxThreads = 10;
pthread_t tid[21];
int ret[21];
int threadIncoming[21];
int threadRunning[21];
struct arg_struct {
char* arg1;
int arg2;
};
//sick of the stupid upper/lowercase nonsense... boom... fixed
void* sleep(int time){Sleep(time);}
void* workerThread(void *arguments)
{
//get the stuff passed in to us
struct arg_struct *args = (struct arg_struct *)arguments;
char *address = args -> arg1;
int threadIndex = args -> arg2;
//hold how many we have processed - we are unlikely to ever hit the max so no need to round robin this number at this point
unsigned long processedCount = 0;
//this never triggers so it IS coming in correctly
if(threadIndex > 20){
printf("INIT ERROR! ThreadIndex = %d", threadIndex);
sleep(1000);
}
unsigned long x = 0;
pthread_t id = pthread_self();
//as long as we should be running
while(__atomic_load_n (&threadRunning[threadIndex], __ATOMIC_ACQUIRE)){
//if and only if we have something to do...
if(__atomic_load_n (&threadIncoming[threadIndex], __ATOMIC_ACQUIRE)){
//simulate us doing something
//for(x=0; x<(0xFFFFFFF);x++);
sleep(2001);
//the value going into sleep is CLEARLY somehow ending up in index because you can change that to any number you want
//and next thing you know the next line says "First thread processing done on (the value given to sleep)
printf("\n First thread processing done on %d\n", threadIndex);
//all done doing something so clear the incoming so we can reuse it for our next one
//this error should not EVER be able to get thrown but it is.... something is corrupting our stack and going into memory that it shouldn't
if(threadIndex > 20){ printf("ERROR! ThreadIndex = %d", threadIndex); }
else{ __atomic_store_n (&threadIncoming[threadIndex], 0, __ATOMIC_RELEASE); }
//increment the processed count
++processedCount;
}
else{Sleep(10);}
}
//no need to do atomocity I don't think for this as it is only set on the exit and not read till after everything is done
ret[threadIndex] = processedCount;
pthread_exit(&ret[threadIndex]);
return NULL;
}
int main(void)
{
int i = 0;
int err;
int *ptr[21];
int doLoop = 1;
//initialize these all to set the threads to running and the status on incoming to NOT be processing
for(i=0;i < maxThreads;i++){
threadIncoming[i] = 0;
threadRunning[i] = 1;
}
//create our threads
for(i=0;i < maxThreads;i++)
{
struct arg_struct args;
args.arg1 = "here";
args.arg2 = i;
err = pthread_create(&(tid[i]), NULL, &workerThread, (void *)&args);
if (err != 0){ printf("\ncan't create thread :[%s]", strerror(err)); }
}
//loop until we hit escape
while(doLoop){
//see if we were pressed escape
if(kbhit()){ if(getch() == ESCAPE){ doLoop = 0; } }
//just for testing - actual version would load only as needed
for(i=0;i < maxThreads;i++){
//make sure we synchronize so we don't end up pointing into a garbage address or half loading when a thread accesses us or whatever was going on
if(!__atomic_load_n (&threadIncoming[i], __ATOMIC_ACQUIRE)){
__atomic_store_n (&threadIncoming[i], 1, __ATOMIC_RELEASE);
}
}
}
//exiting...
printf("\n'Esc' pressed. Now exiting...\n");
//call to end them all...
for(i=0;i < maxThreads;i++){ __atomic_store_n (&threadRunning[i], 0, __ATOMIC_RELEASE); }
//join them all back up - if we had an actual worthwhile value here we could use it
for(i=0;i < maxThreads;i++){
pthread_join(tid[i], (void**)&(ptr[i]));
printf("\n return value from thread %d is [%d]\n", i, *ptr[i]);
}
return 0;
}
Output
Here is the output I get. Note that how long it takes before it starts going crazy does seem to possibly vary but not much.
[Screenshot: output screen with error]
I don't trust your handling of args, there seems to be a race condition. What if you create N threads before the first one of them gets to run? Then the first thread created will probably see the args for the N:th thread, rather than for the first, and so on.
I don't believe there's a guarantee that automatic variables used in a loop like that are created in non-overlapping areas; after all they go out of scope with each iteration of the loop.
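One way to avoid that race, sketched under the assumption that the thread count is fixed at compile time, is to give every thread its own argument slot that stays alive until the thread has been joined. The names worker_arg, worker, run_workers and NUM_WORKERS are made up for this example and are not from the question's code.

```c
#include <pthread.h>

#define NUM_WORKERS 4 /* hypothetical thread count for the sketch */

struct worker_arg {
    int index;   /* which slot this thread owns */
    int *result; /* where the thread stores its output */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    *arg->result = arg->index * 10; /* stand-in for real work */
    return NULL;
}

/* Start NUM_WORKERS threads, each with its own argument struct, then
 * join them all. Returns 0 on success or the pthread_create error code. */
int run_workers(int results[NUM_WORKERS])
{
    pthread_t tid[NUM_WORKERS];
    struct worker_arg args[NUM_WORKERS]; /* one per thread, not reused */
    int started = 0;
    int err = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        args[i].index = i;
        args[i].result = &results[i];
        err = pthread_create(&tid[i], NULL, worker, &args[i]);
        if (err != 0)
            break;
        started++;
    }
    /* args[] must outlive the threads, so join before returning. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return err;
}
```

The key difference from the loop in the question is that no two threads ever receive a pointer to the same struct, and the structs are not declared inside the loop body.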

Accessing the variable inside another code

Is there a way to access a variable initialized in one piece of code from another? For example, my code1.c is as follows:
# include <stdio.h>
int main()
{
int a=4;
sleep(99);
printf("%d\n", a);
return 0;
}
Now, is there any way that I can access the value of a from inside another C code (code2.c)? I am assuming, I have all the knowledge of the variable which I want to access, but I don't have any information about its address in the RAM. So, is there any way?
I know about the extern, what I am asking for here is a sort of backdoor. Like, kind of searching for the variable in the RAM based on some properties.
Your example has one caveat, setting aside possible optimizations that would make the variable disappear: the variable a only exists while the function is being executed and has not yet finished.
Well, given that the function is main() it shouldn't be a problem, at least, for standard C programs, so if you have a program like this:
# include <stdio.h>
int main()
{
int a=4;
printf("%d\n", a);
return 0;
}
Chances are that this code will call some functions. If one of them needs to access a to read and write to it, just pass a pointer to a as an argument to the function.
# include <stdio.h>
void somefunction (int *n);
int main()
{
int a=4;
somefunction(&a);
printf("%d\n", a);
return 0;
}
void somefunction (int *n)
{
/* Whatever you do with *n you are actually
doing it with a */
(*n)++; /* actually increments a; *n++ would advance the pointer instead */
}
But if the function that needs to access a is deep in the function call stack, all the parent functions need to pass the pointer to a even if they don't use it, adding clutter and lowering the readability of code.
The usual solution is to declare a as global, making it accessible to every function in your code. If that scenario is to be avoided, you can make a visible only to the functions that need to access it. To do that, you need to have a single source code file with all the functions that need to use a. Then, declare a as a static global variable. So, only the functions that are written in the same source file will know about a, and no pointer will be needed. It doesn't matter if the functions are very deeply nested in the function call stack. Intermediate functions won't need to pass any additional information to let a nested function know about a.
So, you would have code1.c with main() and all the functions that need to access a
/* code1.c */
# include <stdio.h>
static int a;
void somefunction (void);
int main()
{
a=4;
somefunction();
printf("%d\n", a);
return 0;
}
void somefunction (void)
{
a++;
}
/* end of code1.c */
About trying to figure out where in RAM is a specific variable stored:
Kind of. You can travel across function stack frames from yours to the main() stack frame, and inside those stack frames lie the local variables of each function, but there is no supplementary information in RAM about what variable is located at what position, and the compiler may choose to put it wherever it likes within the stack frame (or even in a register, so there would be no trace of it in RAM, except for pushes and pops from/to general registers, which would be even harder to follow).
So unless that variable has a non trivial value, it's the only local variable in its stack frame, compiler optimizations have been disabled, your code is aware of the architecture and calling conventions being used, and the variable is declared as volatile to stop being stored in a CPU register, I think there is no safe and/or portable way to find it out.
OTOH, if your program has been compiled with -g flag, you might be able to read debugging information from within your program and find out where in the stack frame the variable is, and crawl through it to find it.
code1.c:
#include <stdio.h>
void doSomething(); // so that we can use the function from code2.c
int a = 4; // global variable accessible in all functions defined after this point
int main()
{
printf("main says %d\n", a);
doSomething();
printf("main says %d\n", a);
return 0;
}
code2.c
#include <stdio.h>
extern int a; // gain access to variable from code1.c
void doSomething()
{
a = 3;
printf("doSomething says %d\n", a);
}
output:
main says 4
doSomething says 3
main says 3
You can use extern int a; in every file in which you must use a (code2.c in this case), except for the file in which it is declared without extern (code1.c in this case). For this approach to work you must declare your a variable globally (not inside a function).
One approach is to have the separate executable have the same stack layout as the program in question (since the variable is placed on the stack, and we need the relative address of the variable), therefore compile it with the same or similar compiler version and options, as much as possible.
On Linux, we can read the running code's data with ptrace(PTRACE_PEEKDATA, pid, …). Since on current Linux systems the start address of the stack varies, we have to account for that; fortunately, this address can be obtained from the 28th field of /proc/…/stat.
The following program (compiled with cc Debian 4.4.5-8 and no code generator option on Linux 2.6.32) works; the pid of the running program has to be specified as the program argument.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h> // wait
void *startstack(char *pid)
{ // The address of the start (i. e. bottom) of the stack.
char str[FILENAME_MAX];
FILE *fp = fopen(strcat(strcat(strcpy(str, "/proc/"), pid), "/stat"), "r");
if (!fp) perror(str), exit(1);
if (!fgets(str, sizeof str, fp)) exit(1);
fclose(fp);
unsigned long address;
int i = 28; char *s = str; while (--i) s += strcspn(s, " ") + 1;
sscanf(s, "%lu", &address);
return (void *)address;
}
static int access(void *a, char *pidstr)
{
if (!pidstr) return 1;
int pid = atoi(pidstr);
if (ptrace(PTRACE_ATTACH, pid, 0, 0) < 0) return perror("PTRACE_ATTACH"), 1;
int status;
// wait for program being signaled as stopped
if (wait(&status) < 0) return perror("wait"), 1;
// relocate variable address to stack of program in question
a = a-startstack("self")+startstack(pidstr);
int val;
if (errno = 0, val = ptrace(PTRACE_PEEKDATA, pid, a, 0), errno)
return perror("PTRACE_PEEKDATA"), 1;
printf("%d\n", val);
return 0;
}
int main(int argc, char *argv[])
{
int a;
return access(&a, argv[1]);
}
Another, more demanding approach would be, as mcleod_ideafix indicated at the end of his answer, to implement the bulk of a debugger and use the debug information (provided it is present) to locate the variable.

How should the number of threads be accepted via the command line using pthreads?

I am writing a multithreaded application for Linux in C or Python, using pthreads. How should the number of threads be accepted via the command line?
In C you handle command line arguments by having main take two arguments,
int main(int argc, char** argv)
in which argc is the number of command line arguments (including the program itself) and argv points to an array of argc pointers to strings: argv[0] is the program name and argv[1] through argv[argc-1] are the actual arguments. Example:
#include <stdio.h>
int main(int argc, char** argv)
{
printf("The program was executed as %s.\n", argv[0]);
printf("The arguments were:\n");
for (int i = 1; i < argc; i++)
printf("%s\n", argv[i]);
return 0;
}
Let's now assume that your program takes a single command line argument, an integer telling you how many threads to spawn. The integer is given as a string, so we have to convert it using atoi:
if (argc != 2)
{
printf("Need exactly one argument!\n");
return 1;
}
int num_threads = atoi(argv[1]); // Convert first argument to integer.
if (num_threads < 1)
{
printf("I'll spawn no less than 1 thread!\n");
return 2;
}
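One caveat with atoi is that it cannot report errors: input such as "abc" or "4x" is silently turned into a number. A stricter variant, sketched here with strtol, rejects anything that is not a whole number in range. The function name parse_thread_count and its max parameter are made up for illustration.

```c
#include <errno.h>
#include <stdlib.h> /* strtol */

/* Parse text as a thread count in [1, max].
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_thread_count(const char *text, int max, int *out)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return -1;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0')
        return -1; /* overflow, no digits, or trailing junk */
    if (value < 1 || value > max)
        return -1; /* outside the accepted range */

    *out = (int)value;
    return 0;
}
```

In main this would be called as parse_thread_count(argv[1], some_limit, &num_threads) after the argc check, printing a usage message if it returns -1.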
Now what you do is simply create an array of thread handles,
pthread_t* threads = malloc(num_threads*sizeof(pthread_t));
and use it to store the thread handles as you start num_threads number of threads using pthread_create.
If you are not familiar with pthreads at all, I recommend this short tutorial.
If you were using a threading framework like OpenMP, then this is all handled automatically simply by setting the environment variable OMP_NUM_THREADS.
But if you're implementing threads "manually", you'll need to do it the way most runtime configuration is done: either by parsing argv[], or by setting an environment variable and using getenv().
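The environment variable route might look roughly like this. It is a sketch only: the function name thread_count_from_env and the upper limit of 1024 are arbitrary choices for the example, and the variable name is whatever the caller passes in.

```c
#include <stdlib.h> /* getenv, strtol */

/* Read a thread count from the environment variable called name.
 * Falls back to fallback if the variable is unset, empty, not a
 * whole number, or outside 1..1024 (an arbitrary limit for this sketch). */
int thread_count_from_env(const char *name, int fallback)
{
    const char *text = getenv(name);
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;

    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 1 || value > 1024)
        return fallback;

    return (int)value;
}
```

A program could combine both sources, for example preferring argv[1] when it is present and consulting the environment only otherwise.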
Usually, you would just pass it like any other argument. I've used code similar to the following in projects before to specify fixed thread counts. This is fairly simple but suitable for situations where you don't need the full power of thread pooling (though you could just as easily set a minimum and maximum thread count the same way).
#include <stdio.h>
#include <stdlib.h> /* atoi */
#define THRD_DFLT 5
#define THRD_MIN 2
#define THRD_MAX 20
static int numThreads = 0;
int main (int argCount, char *argVal[]) {
if (argCount > 1)
numThreads = atoi (argVal[1]);
if ((numThreads < THRD_MIN) || (numThreads > THRD_MAX)) {
printf ("Number of threads outside range %d-%d, using %d\n",
THRD_MIN, THRD_MAX, THRD_DFLT);
numThreads = THRD_DFLT;
}
:
:

Resources