ls | grep in a shell written in C

I am trying to make my own shell in C. It uses one pipe and the input (for now) is static. I execute commands using execvp.
Everything is fine except that when I run the command ls | grep ".c" I get no results. Can anyone show me where the problem is and how to fix it?
The shell so far:
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int p[2];
    int pid;
    int r;

    main()
    {
        char *ls[] = {"ls", NULL};
        char *grep[] = {"grep", "\".c\"", NULL};

        pipe(p);
        pid = fork();
        if (pid != 0) {
            // Parent: Output is to child via pipe[1]
            // Change stdout to pipe[1]
            dup2(p[1], 1);
            close(p[0]);
            r = execvp("ls", ls);
        } else {
            // Child: Input is from pipe[0] and output is via stdout.
            dup2(p[0], 0);
            close(p[1]);
            r = execvp("grep", grep);
            close(p[0]);
        }
        return r;
    }

Remove the quotes in the argument to grep, i.e., use
    char *grep[] = {"grep", ".c", NULL};
If you are calling execvp(), the usual shell expansion of arguments (i.e., globbing, removal of quotes, etc.) does not happen, so effectively what you are doing is the same as
    ls | grep '".c"'
in a normal shell.
Also be aware that nothing that comes after the call to execvp() will execute: execvp() replaces the current process, so it never returns unless it fails.

You have multiple problems:
One problem is that you have far too few calls to close(). When you use dup2() to replicate a file descriptor from a pipe to standard input or standard output, you should close both file descriptors returned by pipe().
A second problem is that the shell removes double quotes around arguments, but you've embedded them in yours. You are looking for files whose name contains ".c" (where the double quotes are part of the file name being searched for). Use:
    char *grep[] = { "grep", "\\.c$", NULL };
This looks for a dot and a c at the end of the line.
You should report failures after execvp(). If any of the exec*() functions returns, it failed. It can happen when the user mistypes a command name, for example. It is crucial that you report the error and that the child process then exits. If you don't do that, you can end up in a normal iterative shell (rather than this one-shot, non-iterative, non-interactive shell) with multiple shell processes all trying to read from the terminal at the same time, which leads to chaos and confusion.
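Putting all three fixes together, a corrected sketch of the one-shot shell might look like this (keeping the original structure: the parent execs ls, the child execs grep):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char *ls[] = { "ls", NULL };
        char *grep[] = { "grep", "\\.c$", NULL };
        int p[2];

        if (pipe(p) != 0)
        {
            perror("pipe");
            return EXIT_FAILURE;
        }
        if (fork() == 0)
        {
            /* Child: read end of the pipe becomes standard input */
            dup2(p[0], STDIN_FILENO);
            close(p[0]);            /* close both originals after dup2() */
            close(p[1]);
            execvp("grep", grep);
            perror("grep");         /* exec*() returns only on failure */
            exit(EXIT_FAILURE);
        }
        /* Parent: write end of the pipe becomes standard output */
        dup2(p[1], STDOUT_FILENO);
        close(p[0]);
        close(p[1]);
        execvp("ls", ls);
        perror("ls");
        return EXIT_FAILURE;
    }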


Need help with child processes in C

I have a question which asks me to:
Write a program (in C under Linux!!!) that:
Receives via the command line a command with its options and/or parameters
Passes the command to a child process
The child executes the command and returns the results to the parent process
The parent displays the results on the screen.
I managed to do this, but the problem comes from the second part of the question, which I can't solve.
Part 2:
The program now receives as a parameter (via the command line) a shell script (shell file), i.e. a file containing a series of shell commands.
The file
Does not contain any programming
Is limited to the shell syntax seen in class (including pipes, redirects, command combinations, wildcards, ...)
Each of the commands in the file must be executed by a child, which sends the results back to the father via the pipe and the father will display the results on the screen (as for part 1).
Here is my code for part 1:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(int argc, char *argv[]) {
        // Create the pipe
        int descripteur[2];
        if (pipe(descripteur) != 0) {
            return -1;
        }
        // Create the child
        pid_t pid_fils;
        pid_fils = fork();
        if (pid_fils == -1) {
            return -1;
        }
        // Child process
        if (pid_fils == 0) {
            close(descripteur[0]);
            if (dup2(descripteur[1], 1) == -1) {
                printf("Error\n");
                return -1;
            }
            if (dup2(descripteur[1], 2) == -1) {
                printf("Error\n");
                return -1;
            }
            close(descripteur[1]);
            if (execvp(argv[1], &argv[1]) == -1) {
                printf("Error\n");
                return -1;
            }
        }
        // Parent process
        else {
            char bufferHelper[256];
            close(descripteur[1]);
            int nbBit;
            while ((nbBit = read(descripteur[0], bufferHelper, sizeof(bufferHelper))) != 0) {
                write(1, bufferHelper, nbBit);
            }
        }
        return 0;
    }
If I understand correctly, the input file contains one shell command per line, without any programming structures combining multiple lines (for, while, if, ...).
So, you need to open the file and read it line by line, with fgets() for example. For each line, you can execute "sh -c command-line" by doing a fork/exec for each line with: av[0] = "sh", av[1] = "-c", av[2] = cmdline, av[3] = NULL.
Or you can call system(), which does the same internally.
Alternatively, through another pipe, you can also pass the content of the file as input to a child process which executes a shell.
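A minimal sketch of the first approach follows; it omits the pipe back to the parent (which part 1 already shows), and the buffer size and error handling are assumptions:

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2)
        {
            fprintf(stderr, "Usage: %s script-file\n", argv[0]);
            return 1;
        }
        FILE *fp = fopen(argv[1], "r");
        if (fp == NULL)
        {
            perror(argv[1]);
            return 1;
        }
        char line[1024];
        while (fgets(line, sizeof(line), fp) != NULL)
        {
            line[strcspn(line, "\n")] = '\0';   /* strip the newline */
            pid_t pid = fork();
            if (pid == 0)
            {
                char *av[] = { "sh", "-c", line, NULL };
                execvp(av[0], av);   /* the shell handles pipes, wildcards, ... */
                _exit(127);          /* exec failed */
            }
            int status;
            waitpid(pid, &status, 0);   /* one command at a time */
        }
        fclose(fp);
        return 0;
    }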

In the execlp() system call, why must the arg0 argument point to a filename that's associated with the process being started?

I was going through the Dinosaur book by Galvin et al. where I came across the following illustration of the fork() system call.
    #include <sys/types.h>
    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        pid_t pid;

        /* fork a child process */
        pid = fork();

        if (pid < 0) { /* error occurred */
            fprintf(stderr, "Fork Failed");
            return 1;
        }
        else if (pid == 0) { /* child process */
            execlp("/bin/ls", "ls", NULL);
        }
        else { /* parent process */
            /* parent will wait for the child to complete */
            wait(NULL);
            printf("Child Complete");
        }
        return 0;
    }
The text says:
After a fork() system call, one of the two processes typically uses the exec() system call to replace the process’s memory space with a new program. The exec() system call loads a binary file into memory (destroying the memory image of the program containing the exec() system call) and starts its execution.
So in this example above :
The child process then overlays its address space with the UNIX command /bin/ls (used to get a directory listing) using the execlp() system call (execlp() is a version of the exec() system call).
Here it is said that :
    #include <unistd.h>

    int execlp(const char *file,
               const char *arg0,
               const char *arg1,
               ...
               const char *argn,
               NULL);
file: Used to construct a pathname that identifies the new process image file. If the file argument contains a slash character, the file argument is used as the pathname for the file. Otherwise, the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH.
arg0, …, argn : Pointers to NULL-terminated character strings. These strings constitute the argument list available to the new process image. You must terminate the list with a NULL pointer. The arg0 argument must point to a filename that's associated with the process being started and cannot be NULL.
Could anyone please explain what the three arguments to execlp() are doing? More specifically, why must arg0 have the same name as the process being started? A web search told me that the first argument is a file name (or path name) and that the rest can be considered pointers to null-terminated strings which act as arguments to the file.
What I do not understand is why we are passing ls as an argument to the ls program present in the binary folder. When working at a Linux terminal with
    $
as the prompt, just typing
    $ ls
and hitting the enter key does the job; we do not type
    $ ls ls
Is it similar to the way a C program accepts command line arguments?
    int main(int argc, char *argv[]){
        ...
    }
Running the binary corresponding to the above program as:
$ ./a.out xyz pqr
has argv[0]="./a.out", argv[1]="xyz" and argv[2]="pqr". Is ./a.out an argument to the binary file a.out? But with ./a.out we are actually telling the Linux system to run the binary.
I went here and here but none of them seem to answer my question directly.
The arg0 parameter is, by convention, the name of the executable being run; however, this is not required to be the case. You can pass any string for this argument, and that string is what the new program sees as argv[0].
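A quick way to see this (the name "my-fancy-ls" is purely illustrative): whatever string you pass as arg0 becomes argv[0] of the new process image, independent of which binary actually runs.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Runs /bin/ls, but the new process sees argv[0] == "my-fancy-ls". */
        execlp("/bin/ls", "my-fancy-ls", "-l", (char *)NULL);
        perror("execlp");   /* reached only if the exec fails */
        return 1;
    }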

How to correctly close unused pipes?

I'm implementing a simplified shell which supports pipe.
A part of my code shown below runs fine, but I'm not sure why it works.
main.cpp
    #include <iostream>
    #include <string>
    #include <queue>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include "include/command.h"

    using namespace std;

    int main()
    {
        string rawCommand;
        IndividualCommand tempCommand = {};
        int pipeFD[2] = {PIPE_IN, PIPE_OUT};
        int firstPipeRead, firstPipeWrite, secondPipeRead, secondPipeWrite;

        while (true)
        {
            cout << "% ";
            getline(cin, rawCommand);
            if (rawCommand == "exit")
                break;

            Command *command = new Command(rawCommand);
            deque<IndividualCommand> commandQueue = command->parse();
            delete command;

            while (!commandQueue.empty())
            {
                tempCommand = commandQueue.front();
                commandQueue.pop_front();

                firstPipeRead = secondPipeRead;
                firstPipeWrite = secondPipeWrite;

                if (tempCommand.outputStream == PIPE_OUT)
                {
                    pipe(pipeFD);
                    secondPipeRead = pipeFD[0];
                    secondPipeWrite = pipeFD[1];
                }

                pid_t child_pid;
                child_pid = fork();
                int status;

                // child process
                if (child_pid == 0)
                {
                    if (tempCommand.redirectToFile != "")
                    {
                        int fd = open(tempCommand.redirectToFile.c_str(), O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
                        dup2(fd, STDOUT_FILENO);
                        close(fd);
                    }

                    if (tempCommand.inputStream == PIPE_IN)
                    {
                        close(firstPipeWrite);
                        dup2(firstPipeRead, STDIN_FILENO);
                        close(firstPipeRead);
                    }

                    if (tempCommand.outputStream == PIPE_OUT)
                    {
                        close(secondPipeRead);
                        dup2(secondPipeWrite, STDOUT_FILENO);
                        close(secondPipeWrite);
                    }

                    if (tempCommand.argument != "")
                        execl(tempCommand.executable.c_str(), tempCommand.executable.c_str(), tempCommand.argument.c_str(), NULL);
                    else
                        execl(tempCommand.executable.c_str(), tempCommand.executable.c_str(), NULL);
                }
                else
                {
                    close(secondPipeWrite);
                    if (commandQueue.empty())
                        waitpid(child_pid, &status, 0);
                }
            }
        }
        return 0;
    }
command.h
    #ifndef COMMAND_H
    #define COMMAND_H

    #include <string>
    #include <queue>
    #include <sstream>
    #include <unistd.h>

    using namespace std;

    #define PIPE_IN 0x100000
    #define PIPE_OUT 0x100001

    struct IndividualCommand
    {
        string executable = "";
        string argument = "";
        string redirectToFile = "";
        int inputStream = STDIN_FILENO;
        int outputStream = STDOUT_FILENO;
        int errorStream = STDERR_FILENO;
    };

    class Command
    {
    private:
        string rawCommand, tempString;
        queue<string> splittedCommand;
        deque<IndividualCommand> commandQueue;
        stringstream commandStream;
        IndividualCommand tempCommand;
        bool isExecutableName;

    public:
        Command(string rawCommand);
        deque<IndividualCommand> parse();
    };

    #endif
command.cpp
#include "include/command.h"
Command::Command(string rawCommand)
{
this->rawCommand = rawCommand;
isExecutableName = true;
}
deque<IndividualCommand> Command::parse()
{
commandStream << rawCommand;
while (!commandStream.eof())
{
commandStream >> tempString;
splittedCommand.push(tempString);
}
while (!splittedCommand.empty())
{
tempString = splittedCommand.front();
splittedCommand.pop();
if (isExecutableName)
{
tempCommand.executable = tempString;
isExecutableName = false;
if (!commandQueue.empty() && commandQueue.back().outputStream == PIPE_OUT)
tempCommand.inputStream = PIPE_IN;
}
else
{
// normal pipe
if (tempString == "|")
{
tempCommand.outputStream = PIPE_OUT;
isExecutableName = true;
commandQueue.push_back(tempCommand);
tempCommand = {};
}
// redirect to file
else if (tempString == ">")
{
tempCommand.redirectToFile = splittedCommand.front();
splittedCommand.pop();
}
// argv
else
tempCommand.argument = tempString;
}
if (splittedCommand.empty())
{
commandQueue.push_back(tempCommand);
tempCommand = {};
}
}
return commandQueue;
}
So basically the communication is established between two child processes, not between child and parent. (I'm using those first and second pipes to avoid overwriting FDs with consecutive calls to pipe() when facing something like "ls | cat | cat".)
The shell originally got stuck because the write end was not closed, and thus the read end got blocked. I've tried closing everything in both the child processes, but nothing changed.
My question is: why did close(secondPipeWrite); in the parent process solve everything? Does it mean that it is the write end of the pipe that really matters, and we don't have to care about whether the read end is closed explicitly?
Moreover, why don't I need to close anything in the child processes for it to still work?
Accidents will happen! Things will sometimes seem to work when there is no good reason for them to do so reliably. A multi-stage pipeline is not guaranteed to work if you do not close all the unused pipe descriptors properly, even though it happens to work for you. You aren't closing enough file descriptors in the child processes, in particular. You should close all the unused ends of all the pipes.
Here's a 'Rule of Thumb' I've included in other answers.
Rule of thumb: If you dup2() one end of a pipe to standard input or standard output, close both of the original file descriptors returned by pipe() as soon as possible. In particular, you should close them before using any of the exec*() family of functions. The rule also applies if you duplicate the descriptors with either dup() or fcntl() with F_DUPFD or F_DUPFD_CLOEXEC.
If the parent process will not communicate with any of its children via the pipe, it must ensure that it closes both ends of the pipe early enough (before waiting, for example) so that its children can receive EOF indications on read (or get SIGPIPE signals or write errors on write), rather than blocking indefinitely. Even if the parent uses the pipe without using dup2(), it should normally close at least one end of the pipe; it is extremely rare for a program to read and write on both ends of a single pipe.
Note that the O_CLOEXEC option to open(), and the FD_CLOEXEC and F_DUPFD_CLOEXEC options to fcntl(), can also factor into this discussion. If you use posix_spawn() and its extensive family of support functions (21 functions in total), you will need to review how to close file descriptors in the spawned process (posix_spawn_file_actions_addclose(), etc.).
Note that using dup2(a, b) is safer than using close(b); dup(a); for a variety of reasons. One is that if you want to force the file descriptor to a larger than usual number, dup2() is the only sensible way to do that. Another is that if a is the same as b (e.g. both 0), then dup2() handles it correctly (it doesn't close b before duplicating a), whereas the separate close() and dup() fails horribly. This is an unlikely, but not impossible, circumstance.
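As a concrete illustration of the rule of thumb, here is a minimal sketch (a generic two-stage pipeline running ls | wc -l, not a fix for the shell above): after each dup2(), both descriptors from pipe() are closed before the exec.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        if (pipe(p) != 0)
        {
            perror("pipe");
            return 1;
        }
        if (fork() == 0)
        {
            dup2(p[1], STDOUT_FILENO);  /* child: stdout feeds the pipe */
            close(p[0]);                /* close both originals ... */
            close(p[1]);                /* ... before the exec */
            execlp("ls", "ls", (char *)NULL);
            _exit(127);                 /* exec failed */
        }
        dup2(p[0], STDIN_FILENO);       /* parent: stdin comes from the pipe */
        close(p[0]);
        close(p[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        return 1;
    }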
Note that if the wrong process keeps a pipe descriptor open, it can prevent processes from detecting EOF. If the last process in a pipeline has the write end of a pipe open where a process (possibly itself) is reading until EOF on the read end of that pipe, the process will never get EOF.
Reviewing the C++ code
On the whole, your code was good. My default compilation options picked up two problems, with close(firstPipeWrite) and close(firstPipeRead) operating on uninitialized variables; they were treated as errors because I compile with:
c++ -O3 -g -std=c++11 -Wall -Wextra -Werror -c -o main.o main.cpp
But that was all — which is remarkably good work.
However, those errors also point to where your problem is.
Let's suppose you have a command input which requires two pipes (P1 and P2) and three processes (or commands, C1, C2, C3), such as:
who | grep -v root | sort
You want the commands set up as follows:
C1: who — creates P1; standard input = stdin, standard output = P1[W]
C2: grep — creates P2; standard input = P1[R], standard output = P2[W]
C3: sort — creates no pipe; standard input = P2[R], standard output = stdout
The PN[R] notation means the read descriptor of pipe N, etc.
A more elaborate pipeline, such as who | awk '{print $1}' | sort | uniq -c | sort -n, with 5 commands and 4 pipes is similar: it simply has more processes CN (with N = 2, 3, 4) which create PN and run with standard input coming from P(N-1)[R] and standard output going to PN[W].
A two-command pipeline has just one pipe, of course, and the structure:
C1 — creates P1; standard input = stdin, standard output = P1[W]
C2 — creates no pipe; standard input = P1[R], standard output = stdout
And a one-command (degenerate) pipeline has zero pipes, of course, and the structure:
C1 — creates no pipe; standard input = stdin, standard output = stdout
Note that you need to know whether the command you're processing is first, last, or in the middle of the pipeline — the plumbing work to be done for each is different. Also, if you have a multi-command pipeline (three or more commands), you can close the older pipes after a while; they won't be needed again. So as you're processing C3, both ends of P1 can be closed permanently; they won't be referenced again. You need the input pipe and the output pipe for the current process; any older pipes can be closed by the process coordinating the plumbing.
You need to decide which process is coordinating the plumbing. The easiest way in some respects is to have the original (parent) shell process launch all the sub-processes, left-to-right — which is what you're doing — but it is by no means the only way.
With the shell process launching the child processes, it is crucial that the shell eventually close all the descriptors of all the pipes it opened, so that the child processes can detect EOF. This must be done before waiting for any of the children. Indeed, all the processes in the pipeline must be launched before the parent can afford to wait for any of them — those processes must run concurrently, in general, as otherwise, the pipes in the middle may fill up, blocking the entire pipeline.
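A hedged sketch of that discipline in the coordinating shell (npipes, pipefd and nchildren are hypothetical bookkeeping names, not variables from the code above):

    #include <sys/wait.h>
    #include <unistd.h>

    void finish_pipeline(int pipefd[][2], int npipes, int nchildren)
    {
        for (int i = 0; i < npipes; i++)
        {
            close(pipefd[i][0]);    /* the children hold their own copies */
            close(pipefd[i][1]);
        }
        for (int i = 0; i < nchildren; i++)
            wait(NULL);             /* now it is safe to wait for every stage */
    }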
I'm going to point you at C Minishell — Adding Pipelines as a question with an answer showing how to do it. It is not the only way of doing it, and I'm not convinced it is the best way to do it, but it does work.
Sorting this out in your code is left as an exercise — I need to get some work done now. But this should give you strong pointers in the right direction.
Note that since your parent shell creates all the sub-processes, the waitpid() code is not ideal. You will have zombie processes accumulating. You'll need to think about a loop which collects any dead children, possibly with WNOHANG as part of the third argument so that when there are no zombies, the shell can continue. This becomes even more important when you run processes in background pipelines, etc.
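A minimal sketch of such a loop (the reporting is illustrative):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* Reap any children that have already died, without blocking. */
    void reap_children(void)
    {
        int status;
        pid_t corpse;
        while ((corpse = waitpid(-1, &status, WNOHANG)) > 0)
            printf("child %d exited with status 0x%.4X\n", (int)corpse, status);
    }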

C piping using the command line arguments

I need some help emulating the "|" operator in Unix. I need to be able to use the output from the first argument as the input of the second, something as simple as ls and more. I've got this code so far, but I'm just stuck at this point. Any and all help would be appreciated. Thanks.
    #include <sys/types.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **words)
    {
        char **args;
        char *cmd1[2] = { words[1], 0 };
        char *cmd2[2] = { words[2], 0 };
        int colon, arg1, i, pid, status;
        int thepipe[2];
        char ch;

        args = (char **) malloc(argc*(sizeof(char*)));
        colon = -1;
        for (i = 0; (i < argc); i = i + 1) {
            if (strcmp(words[i], ":") == 0) {
                colon = i;
            }
            else {}
        }
        pipe(thepipe);
        arg1 = colon;
        arg1 = arg1 - 1;
        for (i = 0; (i < arg1); i = i + 1) {
            args[i] = (char*) (malloc(strlen(words[i+1])+1));
            strcpy(args[i], words[i+1]);
        }
        args[argc] = NULL;
        pid = fork();
        if (pid == 0) {
            wait(&pid);
            dup2(thepipe[1], 1);
            close(thepipe[0]);
            printf("in new\n");
            execvp(*args, cmd1);
        }
        else {
            close(thepipe[1]);
            printf("in old\n");
            while ((status = read(thepipe[0], &ch, 1)) > 0) {
                execvp(*args, cmd2);
            }
        }
    }
Assuming that argv[1] is a single word command (like ls) and argv[2] is a second single word command (like more), then:
Parent
Create a pipe.
Fork first child.
Fork second child.
Close both ends of the pipe.
Parent waits for both children to die, reports their exit status, and exits itself.
Child 1
Duplicates write end of pipe to standard output.
Close both ends of the pipe.
Uses execvp() to run the command in argv[1].
Exits, probably with an error message written to standard error (if the execvp() returns).
Child 2
Duplicates read end of pipe to standard input.
Close both ends of the pipe.
Uses execvp() to run the command in argv[2].
Exits, probably with an error message written to standard error (if the execvp() returns).
The only remaining trick is that you need to create a vector such as:
    char *cmd1[2] = { argv[1], 0 };
    char *cmd2[2] = { argv[2], 0 };
to pass as the second argument to execvp().
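Putting the outline together, a minimal sketch of such a program (it assumes single-word commands, exactly as stated above; the status reporting format is illustrative):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3)
        {
            fprintf(stderr, "Usage: %s cmd1 cmd2\n", argv[0]);
            return 1;
        }

        char *cmd1[2] = { argv[1], 0 };
        char *cmd2[2] = { argv[2], 0 };
        int thepipe[2];

        if (pipe(thepipe) != 0)
        {
            perror("pipe");
            return 1;
        }
        if (fork() == 0)                    /* child 1: the writer */
        {
            dup2(thepipe[1], STDOUT_FILENO);
            close(thepipe[0]);              /* close both ends of the pipe */
            close(thepipe[1]);
            execvp(cmd1[0], cmd1);
            perror(cmd1[0]);                /* reached only if exec fails */
            _exit(127);
        }
        if (fork() == 0)                    /* child 2: the reader */
        {
            dup2(thepipe[0], STDIN_FILENO);
            close(thepipe[0]);
            close(thepipe[1]);
            execvp(cmd2[0], cmd2);
            perror(cmd2[0]);
            _exit(127);
        }
        close(thepipe[0]);                  /* parent: close both ends ... */
        close(thepipe[1]);                  /* ... before waiting */

        int status;
        pid_t pid;
        while ((pid = wait(&status)) > 0)   /* report both children */
            printf("PID %d exited with status 0x%.4X\n", (int)pid, status);
        return 0;
    }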
Note that this outline does not break the strings up. If you want to handle an invocation such as:
./execute "ls -la" "wc -wl"
then you will need to split each argument into separate words and create bigger arrays for cmd1 and cmd2. If you want to handle more than two commands, you need to think quite carefully about how you're going to manage the extra stages in the pipeline. The first and last commands are different from those in the middle (so 3 processes has three different mechanisms, but 4 or more substantially uses the same mechanism over and over for all except the first and last commands).

Execvp not being executed n times in a loop

Assume input is a pointer to an array which stores "ls -l" at position 0, then cat helloworld.txt at position 1, and so forth. I wish to extract the main command, which is ls, cat, pwd, and execute it. Essentially, what I am doing is: I have a file with all those commands, and I first store them in my input variable, which is declared as char *input[10]. Now I have what I need in that array, I am able to extract the individual main commands, like ls and cat, and I wish to execute all of them.
For example, if position 0 had ls -l, my first variable holds ls and I wish to pass that to execvp; then position 1 might have cat sample.txt, so now my variable first will be cat, and I pass that to execvp along with the entire cat sample.txt (which is input[i]). For some strange reason, this is not working. How can I run all those commands in a loop with execvp such that once one is done, all those commands have run successfully? Here is my attempt; at the end of the first loop, I run an execvp, I expect that to finish, and then I extract further input, and so on.
Help me out.
    for (i = 0; i < lineCount; i++)
    {
        first = malloc(sizeof(char) * 50);
        for (j = 0; j < strlen(input[i]); j++)
        {
            if (input[i][j] != ' ')
            {
                first[j] = input[i][j];
            }
            else
            {
                break;
            }
        }
        execvp(first, input[i]);
    }
I tried doing execvp(first, input) but that didn't work either.
If you call execvp() once, the execution context of the process that invoked it is replaced (everything except the pid of the calling process), hence your loop won't work: once execvp() succeeds, there won't be any more iterations.
execvp() is mainly meant to be called by a child process; in your case, for 'n' execvp() calls there must be 'n' child processes forked.
Good practices:
Use the execl, execv, execle, execve, execlp, execvp family of system calls with child processes.
After the new process image is loaded into the child and has finished executing, collect the exit code of the launched process and perform any necessary error handling.
The terminated child processes are then in a zombie state; the parent process must execute wait()/waitpid(), wait till all the child processes have terminated, and then exit.
-- Edit --
POC code for OP's reference
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t cpid;
        int status = 0;

        cpid = fork();
        if (cpid == 0)
        {
            char *cmd[] = { "ls", "-l", (char *)0 };
            execv("/bin/ls", cmd);
            perror("execv");        /* reached only if execv() fails */
            exit(EXIT_FAILURE);
        }
        wait(&status);
        if (status < 0)
            perror("Abnormal exit of program ls");
        else
            printf("Exit status of ls is %d\n", status);
        return 0;
    }
Here is what the opengroup exec manual says, in the first paragraph:
There shall be no return from a successful exec, because the calling
process image is overlaid by the new process image.
I suggest reading the opengroup fork manual, and using fork and exec in conjunction.
exec replaces the running process with the one you exec, and so it never returns on success because the process will be replaced.
If you want to run a bunch of processes, the simple way is to use a utility function like popen() or system() to run them. For complete control, use the usual UNIX fork/exec combo once for each command you want to run, as in the sketch below.
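A hedged sketch of that loop, reusing the question's input and lineCount names and letting /bin/sh do the word-splitting that the manual parsing attempts:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void run_all(char *input[], int lineCount)
    {
        for (int i = 0; i < lineCount; i++)
        {
            pid_t pid = fork();
            if (pid == 0)
            {
                /* the shell splits "ls -l" into words for us */
                execl("/bin/sh", "sh", "-c", input[i], (char *)NULL);
                _exit(127);                 /* exec failed */
            }
            int status;
            waitpid(pid, &status, 0);       /* finish before the next one */
        }
    }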
