How bash treats here-docs (C)

I'm working on a modest bash-like shell in C and I have a question about here-docs. At the moment my shell handles a here-doc correctly only when it is in the first command.
ls | << eof wc
bash result:
> eof
0 0 0
my result:
> eof
10 10 63
(wc takes the output of ls; I have a problem in my pipes but can't figure out what it is.)
In this case, I can just act as if ls didn't exist, I think.
wc | << eof wc
bash result
> eof
0 0 0
Here, bash executes the command with the here-doc first and then executes the second one (which has no input, so it freezes).
my result
> eof
Like bash, I execute the here-doc first, and eof works, but I get no result and then it freezes because of the first wc.
So can I treat all cases like that? I execute the command with the here-doc first and cancel the others, except if they have to crash (like wc if it has no input)?

In a POSIX shell, redirections such as here-docs are associated with commands, and the pipe operator (|) separates commands. Thus here:
ls | << eof wc
eof
the here-doc is associated with the second command, wc (even though the redirection operator appears before the command name). It redirects that command's standard input to be from the content of the here-doc. The pipe operator also affects the second command's standard input, so an important question is which effect is applied last, as that is the one that will be effective. POSIX specifies for a pipeline of the form command1 | command2:
The standard output of command1 shall be connected to the standard
input of command2. The standard input, standard output, or both of a
command shall be considered to be assigned by the pipeline before any
redirection specified by redirection operators that are part of the
command.
(Emphasis added; the key phrase is "before any redirection".) For what it's worth, the Bash docs say the same in their own words.
For your example, then, the here-doc should replace the output of ls as the input to wc, as it does when Bash processes the command.
In this case, I can just act as if ls didn't exist, I think.
wc | << eof wc
bash result
> eof
0 0 0
Here, bash executes the command with the here-doc first and then executes the second one (which has no input, so it freezes).
Not exactly. The heredoc belongs to the second wc command, not the first, so it is the first that hangs waiting for interactive input. You can cause that to finish normally by typing a ctrl-D.
More importantly, if the idea is to ignore the ls, then why use two wc commands? Just drop the ls and the pipeline altogether, and use a single wc command:
<<eof wc
eof
# OR, equivalently and more conventionally,
wc <<eof
eof
So can I treat all cases like that? I execute the command with the here-doc first and cancel the others, except if they have to crash (like wc if it has no input)?
You seem to be trying to create specific rules where you should instead be observing more general ones. Each command has standard input, output, and error that are initially either inherited from the host shell or assigned as directed by pipe operators. Redirections, including from here-docs, are each part of an individual command, and their effects are applied on top of the initial standard stream assignments for that command. When multiple redirections apply to the same command, they are processed from left to right.
You cannot "cancel" commands based on the status of their standard streams. They will do whatever they do with the streams they receive. That might be nothing at all, terminating with an error, or exactly what they would have done anyway, among other possibilities. The shell doesn't know, so it cannot shortcut this.

Related

Correct fd for a shell prompt

I'm making a custom shell in C and I wonder on which fd I should write my prompts.
mycoolshell $
Looking into other classic shells, I found that dash uses STDERR for its prompts. csh and tcsh use STDOUT. For bash, zsh and BSD sh I wasn't able to find anything. I used
% dash 2>file
echo qwe
echo qwe
% cat file
(dashprompt$)
(dashprompt$)
to check dash's prompt fd, and the same for csh with csh 1>file, but I had no luck with the other ones.
Is there a standard or POSIX fd for this? Is it ok to use STDIN?
If you wish to be Posix compatible, you'll need to write the prompt to stderr. (See the specification of the PS1 environment variable, below.)
Regardless of strict Posix compatibility, stdin is definitely not correct, since it may not allow write operations. stdout is also not a good idea, since it is usually line-buffered. Some shells (including zsh, I believe) write the prompt to a file descriptor connected to the current terminal (such as /dev/tty) which is probably what stderr is opened as if not redirected, although it is not necessarily the same file descriptor. But using /dev/tty or equivalent is non-standard.
The prompt is only printed if the shell is interactive. A shell is interactive, according to Posix, if it is invoked in one of two ways:
If the -i option is present, or if there are no operands and the shell's standard input and standard error are attached to a terminal, the shell is considered to be interactive. (sh utility, Options)
Clearly, you wouldn't want the shell to spew out prompts if you are using it to execute a script. So you need some mechanism to tell if the shell is being used interactively or as a script processor; Posix's requirement seems reasonably accurate. (See the isatty() library function to see one way to do this test.)
That also shows why your test failed to capture the prompt when stderr was redirected to a file. Redirecting stderr causes the shell to be non-interactive so there will not be a prompt. To do the test properly, you need to force the shell to be interactive using the -i option.
Posix requires that the prompt be modifiable by changing the value of the PS1 environment variable. Here's what Posix has to say, including the requirement that the prompt be printed to stderr: (emphasis added)
PS1
Each time an interactive shell is ready to read a command, the value of this variable shall be subjected to parameter expansion and written to standard error. The default value shall be "$ ". For users who have specific additional implementation-defined privileges, the default may be another, implementation-defined value. The shell shall replace each instance of the character '!' in PS1 with the history file number of the next command to be typed. Escaping the '!' with another '!' (that is, "!!" ) shall place the literal character '!' in the prompt. (Shell command language, Shell Variables)
Most shells allow a much richer set of substitutions in PS1. But the fact that the value is subject to parameter expansion allows extensive customisation. That means that (unlike usual variable expansion) parameter and command references appearing in the value of the PS1 variable are expanded every time the prompt is printed.

Another Linux command output (Piped) as input to my C program

I'm now working on a small C program in Linux. Let me explain what I want to do with the sample Linux command below:
ls | grep hello
The above command is executed in the fashion below (let me know if I've got this wrong):
ls command will be executed first
Output will be given to grep command which will again generate output by matching "hello"
Now I would like to write a C program which takes the piped output of one command as input; that is, in the same fashion that the grep program was able to get its input from the ls command (in my example above).
A similar question has been asked by another user here, but for some reason that thread was marked as "not a valid question".
I initially thought we could get this as a command-line argument to the C program, but this is not the case.
If you pipe the output from one command into another, that output will be available on the receiving process's standard input (stdin).
You can access it using the usual scanf or fread functions. scanf and the like operate on stdin by default (in the same way that printf operates on stdout by default; in the absence of a pipe, stdin is attached to the terminal), and the C standard library provides a FILE *stdin for functions like fread that read from a FILE stream.
POSIX also provides a STDIN_FILENO macro in unistd.h, for functions that operate on file descriptors instead. This will essentially always be 0, but it's bad form to rely on that being the case.
In fact, ls and grep start at the same time.
ls | grep hello means: use ls's standard output as grep's standard input. ls writes its results to standard output, while grep waits on its standard input and reads data as soon as it arrives.
Still have doubts? Do an experiment. Run
find / | grep usr
find / will list all files on the computer, so it should take a long time.
If find ran to completion first and the OS only then handed its output to grep, we would stare at a blank screen for a long time until find finished and grep started. But we see results at once, which proves the two run concurrently.

Batch use of | symbol

What is the "|" symbol used for in batch?
Because it's not a command, I can't use | /? to find out what it does, and if I have something like Stackoverflow|Stackoverflow (as an example) I'm told "'Stackoverflow' is not recognized as an internal or external command, operable program or batch file".
It's the pipe operator, or redirect operator. From TechNet:
Reads the output from one command and writes it to the input of another command. Also known as a pipe.
The pipe operator (|) takes the output (by default, STDOUT) of one command and directs it into the input (by default, STDIN) of another command. For example, the following command sorts a directory:
dir | sort
In this example, both commands start simultaneously, but then the sort command pauses until it receives the dir command's output. The sort command uses the dir command's output as its input, and then sends its output to handle 1 (that is, STDOUT).
The pipe operator | directs (pipes) the output of the first command to the standard input of the second command. So running
somecmd | anothercmd
This runs both commands, feeding the output of somecmd to anothercmd as its input. You can read about it in innumerable places; just google "pipe command line" or something like it.
This character is used to chain commands.
There's a lot of documentation available on this. Wikipedia is probably a good place to start.

In the Unix/Linux shell programming:the difference between > and >&

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char buf[] = "standard err, output.\n";
    printf("standard output.\n");
    if (write(STDERR_FILENO, buf, 22) != 22)
        printf("write err!\n");
    exit(0);
}
Compile using:
gcc -Wall text.c
Then running in the shell:
./a.out > outfile 2 >& 1
Result: outfile's contents are:
standard err, output.
standard output.
./a.out 2 >& 1 >outfile
Result:
This first prints standard err, output. to the terminal,
and the contents of outfile are: standard output.
Questions:
I want to ask about the difference between 2 >& fd and 2 > file.
Are they both equivalent to the dup() function?
Another question: why are the contents of outfile:
standard err, output.
standard output.
I expected the content of outfile to be:
standard output.
standard err, output
Actually, in bash, >& is quite similar to dup2. That is, the file descriptor to which it is applied will refer to the same file as the descriptor to the right. So:
$ ./a.out > outfile 2>& 1
It will redirect stdout(1) to the file outfile and, after that, will dup2 stderr(2) to refer to the same file as stdout(1). That is, both stdout and stderr are being redirected to the file.
$ ./a.out 2>& 1 >outfile
It will redirect stderr(2) to refer to the same file as stdout(1), that is, the console, and after that, will redirect stdout(1) to refer to the file outfile. That is, stderr will output to the console and stdout to the file.
And that's exactly what you are getting.
Paradigm Mixing
While there are reasons to do all of these things deliberately, as a learning experience it is probably going to be confusing to mix operations over what I might call "domain boundaries".
Buffered vs non-buffered I/O
The printf() is buffered, the write() is a direct system call. The write happens immediately no matter what, the printf will be (usually) buffered line-by-line when the output is a terminal and block-by-block when the output is a real file. In the file-output case (redirection) your actual printf output will happen only when you return from main() or in some other fashion call exit(3), unless you printf a whole bunch of stuff.
Historic csh redirection vs bash redirection
The now-forgotten (but typically still in a default install) csh that Bill Joy wrote at UCB while a grad student had a few nice features that have been imported into kitchen-sink shells that OR together every shell feature ever thought of. Yes, I'm talking about bash here. So, in csh, the way to redirect both standard output and standard error was simply to say cmd >& file, which was really more civilized than the bag-of-tools approach that the "official" Bourne shell provided. But the Bourne syntax had its good points elsewhere and in any case survived as the dominant paradigm.
But the bash "native" redirection features are somewhat complex and I wouldn't try to summarize them in a SO answer, although others seem to have made a good start. In any case you are using real bash redirection in one test and the legacy-csh syntax that bash also supports in another, and with a program that itself mixes paradigms. The main issue from the shell's point of view is that the order of redirection is quite important in the bash-style syntax while the csh-style syntax simply specifies the end result.
There are several loosely related issues here.
Style comment: I recommend using 2>&1 without spaces. I wasn't even aware that the spaced-out version works (I suspect it didn't in Bourne shell in the mid-80s) and the compressed version is the orthodox way of writing it.
The file-descriptor I/O redirection notations are not all available in the C shell and derivatives; they are available in the Bourne shell and its derivatives (Korn shell, POSIX shell, Bash, ...).
The difference between >file or 2>file and 2>&1 is what the shell has to do. The first two arrange for output written to a file descriptor (1 in the first case, aka standard output; 2 in the second case, aka standard error) to go to the named file. This means that anything written by the program to standard output goes to file instead. The third notation arranges for 2 (standard error) to go to the same file descriptor as 1 (standard output); anything written to standard error goes to the same file as standard output. It is trivially implemented using dup2(). However, the standard error stream in the program will have its own buffer and the standard output stream in the program will have its own buffer, so the interleaving of the output is not completely determinate if the output goes to a file.
You run the command two different ways, and (not surprisingly) get two different results.
./a.out > outfile 2>&1
I/O redirections are processed left to right. The first one sends standard output to outfile. The second sends standard error to the same place as standard output, so it goes to outfile too.
./a.out 2>&1 >outfile
The first redirection sends standard error to the place where standard output is going, which is currently the terminal. The second redirection then sends standard output to the file (but leaves standard error going to the terminal).
The program uses the printf() function and the write() system call. When the printf() function is used, it buffers its output. If the output is going to a terminal, then it is normally 'line buffered', so output appears when a newline is added to the buffer. However, when the output is going to a file, it is 'fully buffered' and output does not appear until the file stream is flushed or closed or the buffer fills. Note that stderr is not fully buffered, so output written to it appears immediately.
If you run your program without any I/O redirection, you will see:
standard output.
standard err, output
By contrast, the write() system call immediately transfers data to the output file descriptor. In the example, you write to standard error, and what you write will appear immediately. The same would have happened if you had used fprintf(stderr, ...). However, suppose you modified the program to write to STDOUT_FILENO; then when the output is to a file, the output would appear in the order:
standard err, output
standard output.
because the write() is unbuffered while the printf() is buffered.
The 2>&1 part makes the shell do something like that:
dup2(1, 2);
This makes fd 2 a "copy" of fd 1.
The 2> file is interpreted as
fd = open(file, ...);
dup2(fd, 2);
which opens a file and puts the filedescriptor into slot 2.

Check for UNIX command line arguments, pipes and redirects from a C program

I have trouble figuring out how to keep the pipe and redirect functionality of a shell once I find out that command-line arguments are missing.
If I, for example, use a scanf call, it will work with a redirect or a pipe from a shell, but in the absence of one I get a prompt, which I don't want.
I would like to accept command-line arguments through argv[], a pipe, or a redirect, but I can't figure out how to do it without getting the prompt. If I for example try something like this:
if(argc < 2)
exit(0);
Then the program will terminate if I try this:
echo arg | myProgram
Or this:
myProgram < fileWithArgument
I have tried to look this up but I always get some bash scripting reference.
The common way to handle situations like this is to check if the standard input stream is connected to a terminal or not, using isatty or similar functions depending on your OS. If it is, you take parameters from the command line, if not (it's been redirected), you read standard input.
Short version: You can't do it.
Pipeline and redirect specifiers are not arguments to your program; rather, they are commands to the invoking shell and are processed before the running instance of your program even exists. The shell does not pass them to the program in argv or any other variable, and you cannot discover them in any reliable way.
Neil has given you the way to determine if you are connected to a terminal.
In your examples you are using pipe redirection: both echo arg | myProgram and myProgram < fileWithArgument send data to the STDIN of your program.
If you want to read these values, use scanf or fread on the STDIN file descriptor.
If you are trying to get the file content as an argument list for your executable, you need to use it like this:
# This will pass `lala` as an argument
myProgram `echo lala`
