Running an MPI program with multiple input files and numbers of processes - C

I want to run my MPI program with 3 command-line inputs:
mpirun -np 4 ./exe file_1 file_2 Size
For each file_1 there is an associated file_2, and Size is the same for every pair of files. I want to execute my program with different numbers of processes, say -np 2, 4, 6, 8, and 10.
I have more than a hundred files. I want to launch everything once from the command line so that these files are read one by one with the specified numbers of processes.
For serial code, I have tried the following command and it works by taking all .txt files one by one.
find . -name "*.txt" | awk -F"/" '{system ("./a.out " $2)}'
I am not sure how I can do this with three command-line inputs (file_1, file_2, Size) at the same time and with different numbers of processes.

As far as I understand your question, you might for example use:
find . -name "*.txt" -print0 | xargs -0 -n2 -J% -- mpirun -np 4 ./exe % $SIZE
If your working directory contained, for example, the files a.txt, b.txt, c.txt, and d.txt, this would launch the commands:
mpirun -np 4 ./exe a.txt b.txt $SIZE
mpirun -np 4 ./exe c.txt d.txt $SIZE
Here, find first finds all *.txt files in the current directory, outputs their names as null-delimited strings, and pipes this list to xargs. The -0 option of xargs tells it to expect null-delimited input, -n2 tells it to take two file names at a time, and -J% sets the replacement string %, i.e., the string in the supplied command that will be replaced with the file names taken from the input. Finally, -- merely marks the end of the options and is followed by the command to be executed (containing the replacement string).
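If you also want to sweep over the process counts from the question (2, 4, 6, 8, and 10), a plain shell loop may be simpler than xargs. A minimal sketch, assuming a helper file pairs.txt that lists one "file_1 file_2" pair per line and a placeholder SIZE value:

#!/bin/bash
# hedged sketch: pairs.txt is an assumed list with one "file_1 file_2" pair per line,
# and SIZE holds the common third argument
SIZE=1000
for np in 2 4 6 8 10; do
    while read -r f1 f2; do
        mpirun -np "$np" ./exe "$f1" "$f2" "$SIZE"
    done < pairs.txt
done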

Related

Why does "a=( * )" assign an array with one element for each filename in '*' instead of each word?

Question Details
Suppose we have a directory with three files in it: file_1, file_2, and the very inconveniently named file 3. If my understanding of filename expansion is correct, the way bash interprets the string
echo *
is that it sees the (unquoted) *, and modifies the string so that it now reads
echo file_1 file_2 file 3
Then, since there are no more expansions to be performed, bash attempts to evaluate the string. In this case, it runs the command echo, passing to it four arguments: file, 3, file_1, and file_2. In any case, the outputs are identical:
$ echo *
> file 3 file_1 file_2
$ echo file 3 file_1 file_2
> file 3 file_1 file_2
However, in other contexts, this doesn't seem to be what happens. For instance
$ arr1=( * )
$ arr2=( file 3 file_1 file_2 )
$ echo ${#arr1[@]}
> 3
$ echo ${#arr2[@]}
> 4
And yet, if shell expansion works the way it's described in the bash documentation, these ought to be identical.
Something similar happens in a for loop:
$ for f in *; do echo $f; done
> file 3
> file_1
> file_2
$ for f in file 3 file_1 file_2; do echo $f; done
> file
> 3
> file_1
> file_2
What am I missing? Does globbing not happen in these cases?
Use case
I'm putting together a GitHub repo to centralize my dotfiles, following this suggestion from MIT's Hacker Tools. The script I'm writing has two usages:
./install.sh DOTFILE [DOTFILE [DOTFILE ...]]
./install.sh -a
In the first case, each of the named dotfiles in src/config is symlinked to a corresponding dotfile in my home directory; in the second, the -a flag prompts the script to run as if I had entered every dotfile as an argument.
The solution I came up with was to run ln -sih in a for loop over one of two arrays: $@ or *.¹ So, simply assign FILES=( $@ ) or FILES=( * ), and then run for f in $FILES -- except, it seems to me, * should break in this assignment if there's a filename with a space in it. Clearly bash is smarter than me, since it doesn't, but I don't understand why.
1: Obviously, you don't want the script itself to run through the loop, but that's easy enough to exclude with an if [[ "$f" != "$0" ]] clause.
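For illustration, a minimal sketch of such an install loop (this is not the asker's actual script; the target naming and the ln flags are assumptions):

#!/bin/bash
# hedged sketch of the install loop described above
cd src/config || exit 1
if [[ $1 == -a ]]; then
    files=( * )        # glob expansion: one array element per file, spaces preserved
else
    files=( "$@" )     # only the dotfiles named on the command line
fi
for f in "${files[@]}"; do
    ln -si "$PWD/$f" "$HOME/$f"   # target naming is an assumption here
done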
From the bash documentation you linked to:
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion.
Filename expansion happens after word splitting, and therefore the expanded filenames are not themselves subject to further word splitting.
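To see the difference directly, here is a minimal sketch, run in an otherwise empty directory:

$ touch file_1 file_2 'file 3'
$ words='file 3 file_1 file_2'
$ arr_glob=( * )          # filename expansion only: 3 elements, the space survives
$ arr_split=( $words )    # word splitting of the unquoted variable: 4 elements
$ echo "${#arr_glob[@]} ${#arr_split[@]}"
3 4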

Why does read -a fail in zsh

If I type:
echo "1 the
dquote> 2 quick
dquote> 3 brown" | while read -a D; do echo "${D[1]}--${D[0]}"; done
in bash it says:
the--1
quick--2
brown--3
but in zsh it says:
zsh: bad option: -a
Why? And what should I do instead?
In both shells read is a builtin. It shares the same purpose, but the implementation and options differ.
In order to read into an array in zsh, read requires the option -A (instead of -a):
echo "1 the
2 quick
3 brown" | while read -A D; do echo $D[2]--$D[1]; done
Note: There are many more differences between zsh and bash:
In zsh arrays are numbered from one by default, in bash they start from zero.
echo $ARRAY outputs all elements in zsh but only the first element in bash.
To print the third element of an array in zsh you can use echo $ARRAY[3]. In bash, braces are needed to delimit the subscript, and the subscript of the third element is 2: echo ${ARRAY[2]}.
In zsh you usually do not need to quote parameter expansions in order to handle values containing whitespace correctly. For example
FILENAME="no such file"
cat $FILENAME
will print only one error message in zsh:
cat: 'no such file': No such file or directory
but three error messages in bash:
cat: no: No such file or directory
cat: such: No such file or directory
cat: file: No such file or directory
In zsh the builtin echo evaluates escape codes by default. In bash you need to pass the -e argument for that.
echo 'foo\tbar'
zsh:
foo bar
bash:
foo\tbar
…
Generally, it is important to keep in mind that, while zsh and bash are similar, they are far from being the same.
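If the same snippet has to run under both shells, one option is to branch on the shell that is executing it. A minimal sketch (ZSH_VERSION is set only when zsh is the running shell):

if [ -n "${ZSH_VERSION:-}" ]; then
    read -A words    # zsh: first element is $words[1]
else
    read -a words    # bash: first element is ${words[0]}
fi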

How to create multiple files of the same size from a variable?

I have a shell script with a variable, and I create an output file from it as follows:
Variable >> file.txt
Result:
file.txt 20 kilobytes
Then I have to split that output file into several files of the same size using the split command.
Result:
file01.txt 10 kilobytes
file02.txt 10 kilobytes
My question is:
Is there any way to apply the equivalent of the split command while creating the output file? This is the expected result:
Variable >> file.txt   // Adding here the code needed to do the split
Result:
file01.txt 10 kilobytes
file02.txt 10 kilobytes
An example,
echo "$var" | split -b 10240
You can specify the output file prefix like this:
echo "$var" | split -b 10240 - dir1/mysplits
which produces filenames dir1/mysplitsaa, dir1/mysplitsab, dir1/mysplitsac, ... You can also rename these files after split of course.
You can chain any number of commands together by putting && between them - this basically tells the shell that if the first command succeeded, then perform the second command.
Alternatively, you can "pipe" data from one command to the next. This is done with |, and essentially takes the output of the first command and passes it as input to the second command.
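To get file names close to the file01.txt/file02.txt from the question, here is a minimal sketch assuming GNU split (the -d, -a, and --additional-suffix options are GNU extensions and may not exist on other systems):

printf '%s\n' "$var" | split -b 10240 -d -a 2 --additional-suffix=.txt - file
# creates file00.txt, file01.txt, ... each at most 10 kilobytes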

script for getting extensions of a file

I need to get all the file extension types in a folder. For instance, if the directory's ls gives the following:
a.t
b.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
I should get this by running the script
.t
.t.pg
.bin
.old
.txt
I have a bash shell.
Thanks a lot!
See the BashFAQ entry on ParsingLS for a description of why many of these answers are evil.
The following approach avoids this pitfall (and, by the way, completely ignores files with no extension):
shopt -s nullglob
for f in *.*; do
printf '%s\n' ".${f#*.}"
done | sort -u
Among the advantages:
Correctness: ls behaves inconsistently and can result in inappropriate results. See the link at the top.
Efficiency: Minimizes the number of subprocesses invoked (only one, sort -u, and even that could be removed if we used Bash 4's associative arrays to store the results)
Things that still could be improved:
Correctness: this will correctly discard newlines in filenames before the first . (which some other answers won't) -- but filenames with newlines after the first . will be treated as separate entries by sort. This could be fixed by using nulls as the delimiter, or by the aforementioned bash 4 associative-array storage approach.
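For completeness, a minimal sketch of that associative-array variant (requires bash 4 or later; it drops the sort -u subprocess entirely, at the cost of unspecified output order):

shopt -s nullglob
declare -A seen
for f in *.*; do
    seen[".${f#*.}"]=1          # the key is the full extension, e.g. ".t.pg"
done
printf '%s\n' "${!seen[@]}"     # print each distinct extension once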
try this:
ls -1 | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u
ls lists files in your folder, one file per line
sed magic extracts extensions
sort -u sorts extensions and removes duplicates
sed magic reads as:
s/ / /: substitutes whatever is between first and second / by whatever is between second and third /
^: match beginning of line
[^.]: match any character that is not a dot
*: match it as many times as possible
\( and \): remember whatever is matched between these two parentheses
\.: match a dot
.: match any character
*: match it as many times as possible
$: match end of line
\1: this is what has been matched between parentheses
People are really over-complicating this - particularly the regex:
ls | grep -o "\..*" | uniq
ls - get all the files
grep -o "\..*" - -o only show the match; "\..*" match at the first "." & everything after it
uniq - don't print duplicates but keep the same order
you can also sort if you like, but sorting doesn't match the example
This is what happens when you run it:
> ls -1
a.t
a.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
> ls | grep -o "\..*" | uniq
.t
.t.pg
.bin
.old
.txt

How to get input file name from Unix terminal in C?

My program gets executed like:
$./sort 1 < test.txt
sort is the program name
1 is the argument (argv[1])
and test.txt is the file I am inputting from
Is it possible to extract the file name from this? If so, how?
The problem is that I already wrote my whole program as if I could extract the name from the input line, so I need to be able to pass it in as an argument.
Any help is appreciated,
Thanks!
You can't. The shell opens (open(2)) that file and sets up the redirect (most likely using dup2).
The only possible way would be for the shell to explicitly export the information in an environment variable that you could read via getenv.
But it doesn't always make sense. For example, what file name would you expect from
$ echo "This is the end" | ./sort 1
Though this can't be done portably, it's possible on Linux by calling readlink on /proc/self/fd/0 (or /proc/some_pid/fd/0).
eg, running:
echo $(readlink /proc/self/fd/0 < /dev/null)
outputs:
/dev/null
No you can't: the shell sends the content of test.txt to the standard input of your program.
Look at this:
sort << _EOF
3
1
2
_EOF
The <, >, and | operators are processed by the shell; they alter the standard input, output, and error of the programs on the command line.
If you happen to run Solaris, you could parse pfiles output to get the file associated, if any, with stdin.
$ /usr/bin/sleep 3600 < /tmp/foo &
[1] 8430
$ pfiles 8430
8430: /usr/bin/sleep 3600
Current rlimit: 65536 file descriptors
0: S_IFREG mode:0600 dev:299,2 ino:36867886 uid:12345 gid:67890 size=123
O_RDONLY|O_LARGEFILE
/tmp/foo
1: S_IFCHR mode:0600 dev:295,0 ino:12569206 uid:12345 gid:67890 rdev:24,2
...
On most Unix platforms, you will also get the same information from lsof -p if this freeware is installed.
