Why are different nodes running different compiles of my executable? (MPI) - c

After I recompile my (C) program, some nodes are running old compiles (with the debug information still in it), and some nodes are running the new copy. The server is running Gentoo Linux and all nodes get the file from the same storage. I'm told the filesystem is NFS. The MPI I'm using is MPICH Version 1.2.7. Why are some nodes not using the newly compiled copy?
Some more details (in case you're having trouble sleeping):
I'm trying to create my first MPI program (and I'm new to C and Linux, too). I have the following in my code:
#if DEBUG
{
int i=9;
pid_t PID;
char hostname[256];
gethostname(hostname, sizeof(hostname));
printf("PID %d on %s ready for attach.\n", PID=getpid(), hostname);
fflush(stdout);
while (i>0) {
printf("PID %d on %s will wait for `gdb` to attach for %d more iterations.\n", PID, hostname, i);
fflush(stdout);
sleep(5);
i--;
}
}
#endif
Then I recompiled without the -DDEBUG=1 option, so the above code is excluded:
$ mpicc -Wall -I<directories...> -c myprogram.c
$ mpicc -o myprogram myprogram.o -Wall <some other options...>
The program compiles with no problems. Then I execute it like this:
$ mpirun -np 3 myprogram
Sometimes (and more and more frequently), different copies of the executable run on different nodes of the cluster. On some nodes, the debugging code executes (and prints), and on some nodes it doesn't.
Note that the cluster is currently experiencing some "clock skew" (or something like that), which may be the cause. Is that the problem?
Also note that I actually just change the compile options by commenting/uncommenting lines in a Makefile because I haven't had time to implement these suggestions yet.
Edit: When the problem occurs, md5sum myprogram returns a different value on the nodes where the issue presents itself.

Your different nodes have retained a copy of a file and are using that instead of the latest when you run the binary. This has little to nothing to do with Gentoo because it is an artifact of the Linux (kernel) caching and/or NFS implementations.
In other words, your binary is cached. Read this answer:
NFS cache-cleaning command?
Tweaking some settings may also help.
I happen to have a command here that syncs and flushes:
$ cat /home/jaroslav/bin/flush_cache
sudo sync
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'


Embed a binary in C program

I am trying to write a program in C that can call certain binaries (e.g. lsof, netstat) with options. The purpose of this program is to collect forensic data from a computer; at the same time, it must not use the binaries of the computer under analysis, as they might be compromised. As a result, the certified/uncompromised binaries (e.g. lsof, netstat -antpu) must either be embedded in the C program already or be called by the C program from, for example, a USB drive.
Having for example the binary of the "ls" command I created an object file using the linker as follows:
$ ld -s -r -b binary -o testls.o bin-x86-2.4/ls
Using the following command I extracted the following entry points from the object file
$ nm testls.o
000000000007a0dc D _binary_bin_x86_2_4_ls_end
000000000007a0dc A _binary_bin_x86_2_4_ls_size
0000000000000000 D _binary_bin_x86_2_4_ls_start
The next step would be to call the "function" from the main program with some options that I might need for example "ls -al". Thus I made a C program to call the entry point of the object file.
Then I compiled the program with the following gcc options
gcc -Wall -static testld.c testls.o -o testld
This is the main program:
#include <stdio.h>
extern int _binary_bin_x86_2_4_ls_start();
int main(void)
{
_binary_bin_x86_2_4_ls_start();
return 0;
}
When I run the program I get a segmentation fault. I checked the entry points in the testld program using objdump, and the linking seems to have been successful. Why, then, am I getting a segmentation fault?
I also still need to call "ls" with options. How could I do this, i.e. call the "function" with the arguments "-al"?
Thank you.
The ELF header of a binary isn't a function. You can't call it. If you could (like in some ancient binary formats) it would be a really bad idea because it would never return.
If you want to run another program midstream do this:
int junk;
pid_t pid;
if (!(pid = fork())) {
execl("ls", "/bin/ls", ...); /* this results in running ls in current directory which is probably what you want but maybe you need to adjust */
_exit(3);
}
if (pid > 0) waitpid(pid, &junk, 0);
Error handling omitted for brevity.
In your case, you should ship your own copies of your binaries alongside your program.

Execlp vs Execl

Is there any occasion on which it is better to use execl instead of execlp?
I think that maybe when a program is in two different folders using execlp could lead to confusion but I don't know if it is the only case.
I ask because one could think that writing execlp("ls", ...) is easier than writing execl("/bin/ls", ...).
Security
Looking programs up via PATH is convenient, but it can also be insecure. If a directory in a user's PATH is world writable, it's possible to inject a malicious program into the PATH lookup. This would affect execlp but not execl.
For example, if you had a PATH like /foo/bar/bin:/home/you/bin:/usr/bin:/bin and /foo/bar/bin was world writable, someone with access to that machine could copy a malicious program to /foo/bar/bin/ls. Then executing ls would run /foo/bar/bin/ls rather than /bin/ls. They'd be able to execute commands as you and gain greater access.
For this reason, it's often a good idea to refer to specific executables in known locations. Or to hard wire a secure PATH in the executable.
Compatibility
While there is a common set of Unix commands and features specified by POSIX, many programs rely on extensions. If your program uses those extensions, grabbing the first one in the PATH might not be a good idea.
For example, here on OS X the installed utilities in /bin and /usr/bin are BSD-flavored. But I have GNU versions installed earlier in my PATH. A program designed to run on OS X would want to explicitly use, for example, /bin/ls or /usr/bin/tar to be sure they get a known version of those utilities.
$ /usr/bin/tar --version
bsdtar 2.8.3 - libarchive 2.8.3
$ tar --version
tar (GNU tar) 1.29
Both execl() and execlp() work whether your executables are in the same folder or in different folders, but with execlp() the folder has to be listed in $PATH.
execl() is needed to run an executable that you name on the command line (for example, one in the current directory): execlp() searches $PATH when the name contains no slash, so it fails unless that directory is in $PATH. I added a snippet below.
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
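if (argc != 2) {
printf("Usage Msg: ./a.out userdefined_executable\n");
return 1; // main returns int, so report the usage error
}
//execl(argv[1], argv[1], (char *)NULL); // it works: p1 is resolved relative to the current directory
execlp(argv[1], argv[1], (char *)NULL); // it doesn't work unless the current directory is in $PATH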
return 0;
}
// Input will be like this, here "p1" is an user-defined executable.
// xyz#xyz:~/stack_overflow$ ./a.out p1

system command not executing with mpiicc -O

I have Intel Parallel Studio XE Cluster Edition 2015 on my 10-node server connected with InfiniBand. I wrote my code in C. My code runs system commands built with sprintf, like below:
printf("started \n");
system("cp metis_input.txt $HOME/metis-4.0/.");
sprintf(filename,"$HOME/metis-4.0/./partdmesh metis_input.txt %d",size-1);
system(filename);
sprintf(filename,"mv metis_input.txt.npart.%d nodes_proc.txt",size-1);
system(filename);
printf("completed \n");
When I compile my code and run it without any optimization flags, it runs smoothly, but when I compile it with "mpiicc -O", the above lines don't even seem to be executed. I think they are being skipped; only the printfs are executed. Do I need to add anything extra to my code (like including any headers) to get these system commands running with the Intel MPI compiler and -O?

gcc on Windows: generated "a.exe" file vanishes

I'm using GCC version 4.7.1, but I've also tried this on GCC 4.8. Here is the code I'm trying to compile:
#include <stdio.h>
void print(int amount) {
int i;
for (i = 0; i < 5; i++) {
printf("%d", i);
}
}
int main(int argc, char** argv) {
print(5);
return 0;
}
It looks like it should work, and when I compile with...
gcc main.c
It takes a while to compile, produces an a.exe file, and then the a.exe file disappears. It isn't giving me any errors with my code.
Here's a gif of proof, as some people are misinterpreting this:
(Since ahoffer's deleted answer isn't quite correct, I'll post this, based on information in the comments.)
On Windows, gcc generates an executable named a.exe by default. (On UNIX-like systems, the default name, for historical reasons, is a.out.) Normally you'd specify a name using the -o option.
Apparently the generated a.exe file generates a false positive match in your antivirus software, so the file is automatically deleted shortly after it's created. I see you've already contacted the developers of Avast about this false positive.
Note that antivirus programs typically check the contents of a file, not its name, so generating the file with a name other than a.exe won't help. Making some changes to the program might change the contents of the executable enough to avoid the problem, though.
You might try compiling a simple "hello, world" program to see if the same thing happens.
Thanks to Chrono Kitsune for linking to this relevant Mingw-users discussion in a comment.
This is not relevant to your problem, but you should print a newline ('\n') at the end of your program's output. It probably doesn't matter much in your Windows environment, but in general a program's standard output should (almost) always have a newline character at the end of its last line.
Try to compile with gcc but without all standard libraries using a command like this:
gcc -nostdlib -c test.c -o test.o; gcc test.o -lgcc -o test.exe
One of the MinGW library binaries must be generating the false positive; knowing which library would be useful.
There is no issue with your code; it is just exiting properly.
You have to run it from the command line, which will show you all the output.
Start -> Run -> cmd, then cd to your directory, then run a.exe. If you don't want to do that, you can add a sleep() before the return in main.
Moreover, the argument you pass in print(5) is not used by your function.
I confirm it is due to the antivirus.
I did this test:
compile helloworld.c at t=0;
within 1 second, tell McAfee not to consider helloworld.exe a threat >> the file is still there.
If I am too slow, the file is deleted.
If you get an error about a.exe while running the file, follow the steps below:
1. Open Virus & threat protection.
2. Under Virus & threat protection settings, select Manage settings.
3. If Real-time protection and Cloud-delivered protection are On, turn them Off.
(https://i.stack.imgur.com/mcIio.jpg)
a.exe is also the name of a virus. I suspect your computer's security software is deleting or quarantining the file because it believes it is a virus. Use redFIVE's suggestion to rename your output file to "print.exe" so that the virus scanner does not delete it.
Try:
gcc -o YOUR_PROGRAM.exe main.c
You can stop your antivirus software from deleting your .exe by specifying the full file path (for eg: c:\MyProject) in the 'paths to be excluded from scanning' section of the antivirus software.

Problem with gcc tracker/make/fork/exec/wait

This is a most singular problem, with many interdisciplinary ramifications.
It focuses on this piece of code (file name mainpp.c):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* fork, execv, sleep */
#include <sys/wait.h> /* wait */
int main(int argc, char *argv[])
{
int status;
if (fork())
{
FILE *f=fopen("/tmp/gcc-trace","a");
fprintf(f,"----------------------------------------------------------------\n");
int i;
for(i=0;i<argc;i++)
{
fprintf(f,"%s:",argv[i]);
}
wait(&status);
fprintf(f,"\nstatus=%d",status);
fprintf(f,"\n");
fclose(f);
}
else
{
execv("g++.old",argv);
}
sleep(10);
return status;
}
This is used with a bash script:
#!/bin/sh
gcc -g main.c -o gcc
gcc -g mainpp.c -o g++
mv /usr/bin/gcc /usr/bin/gcc.old
mv /usr/bin/g++ /usr/bin/g++.old
cp ./gcc /usr/bin/gcc
cp ./g++ /usr/bin/g++
The purpose of this code (and a corresponding main.c for gcc) is hopefully clear. It replaces g++ and logs calls to g++ plus all command-line arguments; it then proceeds to call the g++ compiler (now called g++.old).
The plan is use this to log all the calls to g++/gcc. ( Since make -n does not trace recursive makes, this is a way of capturing calls "in the wild". )
I tried this out on several programs and it worked well. ( Including compiling the program itself. ) I then tried it out on the project I was interested in, libapt-pkg-dev ( Ubuntu repository ).
The build seemed to go well but when I checked some executables were missing. Counting files in the project directory I find that an unlogged version produces 1373 whereas a logged version produces 1294. Making a list of these files, I discover that all the missing files are executables, shared libraries or object files.
Capturing the standard out of both logged makes and unlogged makes gives the same output.
The recorded return value of all processes called by exec is 0.
I've placed sleeps in various positions in the code. They do not seem to make any difference. ( The code with the traced version seems to compile much faster per file. I suspected that the exec might have caused the program to terminate while leaving gcc running. I thought that might cause failure because some object files might not be finishing when others need them. )
I have only one more diagnostic to run to see if I can diagnose the problem and then I am out of ideas. Suggestions?
I'm not sure if this will solve your problem, but have you considered using strace instead of your custom code?
strace executes a command (or attaches to a running process) and lists all the system calls it makes. So for instance, instead of running make directly, you might run:
strace -f -q -e trace=execve make
-f means attach to new processes as they are forked
-q means suppress attach/detach messages
-e trace=execve means only report calls to execve
You can then grep through the output for messages about /usr/bin/gcc.
