How does execvp run a command? - c

I know execvp can be used to execute simple commands as follows:
char* arg[] = {"ls", "-l", NULL};
execvp(arg[0],arg);
I want to know what goes on in here when I run execvp. In man page it says execvp replaces the image of the process image with the new one. However here I am running a command not an executable.
To be specific, say there is a command that requires an input, e.g. cat. If I have a text file text.txt which contains the file name expected by cat, and I redirect stdin to that file, would execle("cat","cat",NULL) or execvp("cat", arg) (where arg stores "cat" and NULL) produce the same console output as cat /filename would? My intuition is I have to read the file and may be parse it to store the arguments in the arg. However I want to make sure.
Thanks in advance!

Here's what happens in an execvp call:
Your libc implementation searches in PATH, if applicable, for the file that is to be executed. Most, if not all, commands in UNIX-like systems are executables. What will happen if it is not? Try it. Have a look at how glibc does it.
Typically, if the executable is found, a call to execve will be made. Parts of execve may be implemented in libc or it may be a system call (like in Linux).
Linux prepares a program by allocating memory for it, opening it, scheduling it for execution, initialising memory structures, setting up its arguments and environment from the arguments supplied to the execvp call, finding a handler appropriate for loading the binary, and marking the current task (the execvp caller) as not executing. You can find its implementation here.
All steps above conform to the requirements set by POSIX which are described in the relevant manual pages.

Regarding your questions:
In man page it says execvp replaces the image of the process image
with the new one. However here I am running a command not an
executable.
A long time ago shells were very limited and almost all UNIX commands were standalone executables. Now, mostly for speed, some subset of UNIX commands is implemented inside the shell itself; those commands are called builtins. You can check whether a command is implemented as a builtin in your shell via the type command:
λ ~/ type echo
echo is a shell builtin
(The full list of builtins with descriptions can be found in the man pages for your shell, e.g. man bash-builtins or man builtin.)
But most of these commands still have an executable counterpart:
λ ~/ whereis echo
/bin/echo
So in your specific case when you are running:
char* arg[] = {"ls", "-l", NULL};
execvp(arg[0],arg);
You are actually replacing the address space of the current process with that of (most likely) /bin/ls.
My intuition is I have to read the file and may be parse it to store
the arguments in the arg.
Indeed you do. But you may also use an in-kernel mechanism for that, known as the "shebang":
Instead of putting the file name in a separate file, add a so-called shebang as the first line of the file you want to cat:
#!/bin/cat
Then make it executable with chmod +x. After that you can run it as an executable (via any of the exec functions or the shell):
λ ~/tmp/ printf '#!/bin/cat\nTEST\n' > cat_me
λ ~/tmp/ chmod +x cat_me
λ ~/tmp/ ./cat_me
#!/bin/cat
TEST
Of course, it has the drawback of printing the shebang line along with the file, but it's still fun to do it in-kernel =)
BTW, the problem you described is so common that there is a special executable called xargs which (in a very simplified explanation) executes a given program on a list of arguments passed via stdin. For more information consult man xargs.
To memorize the exec family I often use the following table:
Figure 8.14. Differences among the six exec functions
+----------+----------+----------+----------+--------+---------+--------+
| Function | pathname | filename | arg list | argv[] | environ | envp[] |
+----------+----------+----------+----------+--------+---------+--------+
| execl    |    *     |          |    *     |        |    *    |        |
+----------+----------+----------+----------+--------+---------+--------+
| execlp   |          |    *     |    *     |        |    *    |        |
+----------+----------+----------+----------+--------+---------+--------+
| execle   |    *     |          |    *     |        |         |   *    |
+----------+----------+----------+----------+--------+---------+--------+
| execv    |    *     |          |          |   *    |    *    |        |
+----------+----------+----------+----------+--------+---------+--------+
| execvp   |          |    *     |          |   *    |    *    |        |
+----------+----------+----------+----------+--------+---------+--------+
| execve   |    *     |          |          |   *    |         |   *    |
+----------+----------+----------+----------+--------+---------+--------+
| letter   |          |    p     |    l     |   v    |         |   e    |
+----------+----------+----------+----------+--------+---------+--------+
So in your case execvp takes a filename (p) and argv[] (v), and passes along the current environ (there is no e in its name, so it takes no envp[] argument).
It then tries to "guess" the pathname (aka full path) by appending filename (in your case cat) to each path component in PATH until it finds a path with an executable of that name.
Much more information about what's going on under exec's hood (including inheritance stuff) can be found in Advanced Programming in the UNIX Environment (2nd Edition) by W. Richard Stevens and Stephen A. Rago, aka APUE2.
If you are interested in UNIX internals you should probably read it.

"ls" isn't just a command, it's actually a program (most commands are). When you run execvp like that, it will nuke your entire program, its memory, its stack, its heap, etc... conceptually "clear it out" and give it to "ls" so it can use it for its own stack, heap, etc.
In short, execvp will destroy your program, and replace it with another program, in this case "ls".

My intuition is I have to read the file and may be parse it to store the arguments in the arg. However I want to make sure.
Your intuition is largely correct. The cat utility that you're using as an example has two separate code paths:
If there are filenames specified as arguments, it will open and read each one in turn.
If there are no filenames specified, it will read from standard input.
This behavior is specifically implemented in the cat utility -- it is not implemented at any lower level. In particular, it is definitely not part of the exec system call. The exec system calls do not "look at" arguments at all; they just pass them straight on to the new process in argv, and that process gets to handle them however it sees fit.

Related

lighttpd, fastcgi and application script written in "C" using libfcgi - Weird system error message

I get a weird Linux system message "Transport Endpoint is not connected" in response to a write call immediately after a successful open call. All this happens immediately after a Slackware Linux 2.6.33.4 reboot.
I'm writing a forms-handler in C and it runs under lighttpd and fastcgi (and before some smart alec pipes up and asks 'why am I not using his/her favourite language, it's because I like C --- OK? OK!).
I've got the major facilities of the application running - it displays index.htm (which is a form) and when the form is 'Submit'ted, finds the program I've written which correctly processes the contents returned and displays the next form. It's got complex enough that now I need to print some debugging statements somewhere, to give me some feedback from program additions. Thus arises the problem.
Research indicates that this message usually means that the endpoint of the file path is not mounted, or has become unmounted, but /tmp/debug.log (which is the file I'm trying to create/append to) lives on the root partition. But then why is the open() successful but the write() is not?
The program fragment below is the bit which is giving trouble. The printf() statements send output to the web interface (for those unfamiliar with libfcgi) to give me some idea what's happening.
I can't get my (properly indented) code through this forum's demented code filter and I can't attach it in a zip file, so you'll just have to take my word that the syntax is correct.
Any clues?
This line
if(fd = open(DEBUG_FILE_NAME, (O_CREAT | O_APPEND | O_RDWR), (S_IWUSR | S_IRGRP | S_IROTH | S_IRUSR)) < 0)
results in fd being set to the result of open(DEBUG_FILE_NAME, (O_CREAT | O_APPEND | O_RDWR), (S_IWUSR | S_IRGRP | S_IROTH | S_IRUSR)) < 0 which most likely is 0.
So the write(fd, ... tries to write to stdin, which is weird, so you get a weird error ... ;-)
To fix this, fix the parentheses.
You could do this, for example, in a safe manner by using Yoda conditions:
if (0 > (fd = open(DEBUG_FILE_NAME, (O_CREAT | O_APPEND | O_RDWR), (S_IWUSR | S_IRGRP | S_IROTH | S_IRUSR))))
("safe" in the sense that everything important is on the left, in one place: if (0 > (fd = open(...))

Implementing my own ps command

I'm trying to implement my own ps command, called psmod.
I can use linux system call and all utilities of the /proc directory.
I discovered that all directories in /proc with a number as their name are the processes in the system. My question is: how can I select only those processes which are active when psmod is called?
I know that in /proc/<pid>/stat there's a letter representing the current status of the process; anyway, for every process in /proc, this letter is S, that is sleeping.
I also tried to send signal 0 to every process ID, from 0 to the maximum number of processes (in my case, 32768), but this way it discovers far more processes than the ones present in /proc.
So, my question is, how does ps work? The source is a little too complicated for me, so if someone could explain it to me, I would be grateful.
how does ps work?
The way to learn how standard utilities work is to check their source code. There are several implementations of ps, including procps and busybox; busybox is smaller, so it is easier to begin with. The sources for ps from busybox are at http://code.metager.de/source/xref/busybox/procps/. The main loop from ps.c:
p = NULL;
while ((p = procps_scan(p, need_flags)) != NULL) {
    format_process(p);
}
The implementation of procps_scan is in procps.c (ignore the code inside the ENABLE_FEATURE_SHOW_THREADS ifdefs at first). The first call to it will open the /proc dir using alloc_procps_scan():
sp = alloc_procps_scan();
sp->dir = xopendir("/proc");
Then procps_scan will read the next entry from the /proc directory:
for (;;) {
    entry = readdir(sp->dir);
parse the pid from the subdirectory name:
pid = bb_strtou(entry->d_name, NULL, 10);
and read /proc/pid/stat:
/* These are all retrieved from proc/NN/stat in one go: */
/* see proc(5) for some details on this */
strcpy(filename_tail, "stat");
n = read_to_buf(filename, buf);
Actual unconditional printing is in format_process, ps.c.
So, busybox's simple ps reads data for all processes and prints all of them (or all processes and all threads if the -T option is given).
how can I select only those processes which are active when psmod is called?
What is "active"? If you want to find all processes that exist, do a readdir of /proc. If you want to find only non-sleeping ones, do a full read of /proc, check the state of every process, and print only the non-sleeping ones. The /proc filesystem is virtual and rather fast.
PS: for example, normal ps program prints only processes from current terminal, usually two:
$ ps
PID TTY TIME CMD
7925 pts/13 00:00:00 bash
7940 pts/13 00:00:00 ps
but we can strace it with strace -ttt -o ps.log ps, and I see that ps does read every process directory, including the stat and status files. The time needed for this (the -ttt option of strace timestamps every syscall) runs from XX.719011 to XX.870349, or about 150 ms under strace (which slows all syscalls). It takes only 20 ms in real life according to time ps (I have 250 processes in total):
$ time ps
PID TTY TIME CMD
7925 pts/13 00:00:00 bash
7971 pts/13 00:00:00 ps
real 0m0.021s
user 0m0.006s
sys 0m0.014s
"My question is: how can I select only those processes which are active when psmod is called?"
I hope this command will help you:
top -n 1 | awk "NR > 7" | awk {'print $1,$8,$12'} | grep R
I am on ubuntu 12.

get physical hdd's list in c

People, I need to get the list of hard disks connected to a Linux system, in C:
Example, running a program on a computer with 2 IDE disks and 1 SATA disk connected.
./a.out
Out required:
/dev/hda
/dev/hdb
/dev/sda
help?
Use libsysfs, the recommended way to query the kernel about attached devices of all kinds.
char path[256];
FILE *fp = popen("fdisk -l | grep \"Disk /\" | awk '{print $2};' | sed 's/://'", "r");
if (fp != NULL) {
    while (fgets(path, sizeof(path), fp) != NULL) {
        /* your code: path holds one device name followed by '\n' */
    }
    pclose(fp);
}
The simplest way would be simply read and parse /proc/partitions.
Maybe you can reference fdisk's source code.
Follow this website:
ftp://ftp.gnu.org/gnu/fdisk
Command Line
"ls /sys/block/"
will return output as:
sda sdb sdc
From there, you could create a script that pipes it to a file, then read in that file as an array or linked list to manipulate the data however you see fit (such as adding /dev/ in front of all device names in the list).

How to get input file name from Unix terminal in C?

My program gets executed like:
$./sort 1 < test.txt
sort is the program name
1 is the argument (argv[1])
and test.txt is the file I am inputting from
Is it possible to extract the file name from this? If so, how?
The problem is I already wrote my whole program as if I could extract the name from the input line, so I need to be able to pass it into arguments.
Any help is appreciated,
Thanks!
You can't. The shell opens (open(2)) that file and sets up the redirect (most likely using dup2).
The only possible way would be for the shell to explicitly export the information in an environment variable that you could read via getenv.
But it doesn't always make sense. For example, what file name would you expect from
$ echo "This is the end" | ./sort 1
Though this can't be done portably, it's possible on Linux by calling readlink on /proc/self/fd/0 (or /proc/some_pid/fd/0).
eg, running:
echo $(readlink /proc/self/fd/0 < /dev/null)
outputs:
/dev/null
No you can't: the shell sends the content of test.txt to the standard input of your program.
Look at this:
sort << _EOF
3
1
2
_EOF
The <, > and | operators are processed by the shell; they alter the standard input, output and error of the programs on the command line.
If you happen to run Solaris, you could parse pfiles output to get the file associated, if any, with stdin.
$ /usr/bin/sleep 3600 < /tmp/foo &
[1] 8430
$ pfiles 8430
8430: /usr/bin/sleep 3600
Current rlimit: 65536 file descriptors
0: S_IFREG mode:0600 dev:299,2 ino:36867886 uid:12345 gid:67890 size=123
O_RDONLY|O_LARGEFILE
/tmp/foo
1: S_IFCHR mode:0600 dev:295,0 ino:12569206 uid:12345 gid:67890 rdev:24,2
...
On most Unix platforms, you will also get the same information from lsof -p if this freeware is installed.

How to make a pipe loop in Zsh?

Penz says that the problem could be solved by Multios and coproc features in the thread.
However, I am unsure about the solution.
I do know that you can use multios as
ls -1 > file | less
but I have never used it in such a way that you have two inputs.
How can you use these features to have a pipe loop in Zsh?
I am having trouble understanding the questions.
Are you trying to do the following:
(ls -1 && file) | less
Where && is used for multiple commands on a single line.
Or are you trying to do the following:
ls -1 | tee file | less
Where tee puts the output into the file and standard out.
