Shell redirection vs explicit file handling code - c

I am not a native English speaker, so please excuse the awkward title of this question. I just didn't know how to phrase it better.
I am on a FreeBSD box and I have a little filter tool written in C which reads a list of data via stdin and outputs a processed list via stdout. I invoke it somewhat like this: find . -type f | myfilter > /tmp/processed.txt.
Now I want to give my filter a little bit more exposure and publish it. Convention says that tools should allow something like this: find . -type f | myfilter -f - -o /tmp/processed.text
This would force me to write code that simply is not needed since the shell can do the job, therefore I tend to leave it out.
My question is: Am I missing some argument (other than convention) for why the reading and writing of files should be done in my code and not delegated to shell redirection?

There's absolutely nothing wrong with this. Your filter would have an interface similar to, say, c++filt.
You might consider file handling if you wanted to automatically choose an output file based on the name of an input file, or if you wanted special handling for processing multiple files in a single command.
If you don't want to do either of these, then there's nothing wrong with being a simple filter. Anyone can provide a set of simple shell wrappers to provide a cmd infile outfile syntax if they wish.

That's a needlessly limiting interface. Accepting arguments from the command line is more flexible,
grep foo file | myfilter > /tmp/processed.text
and it doesn't preclude find from being used
find . -type f -exec myfilter {} + > /tmp/processed.text

Actually to have the same effect as shell redirection you can do this:
freopen( "filename" , "wb" , stdout );
and so if you have used printf throughout your code, outputs will be redirected to the file. So you don't need to modify any of the code you've written before and easily adapt to the convention.
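As a rough sketch of how the freopen() approach could back an -o option: the helper name redirect_stdout and the convention that "-" means "keep stdout" are assumptions for illustration, not something the asker's tool is known to do.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: point stdout at `path`.
 * A NULL path or "-" leaves the inherited stdout untouched, so plain
 * shell redirection and pipes keep working.
 * Returns 0 on success, -1 on failure. */
static int redirect_stdout(const char *path)
{
    if (path == NULL || strcmp(path, "-") == 0)
        return 0;

    /* Note: if freopen() fails, the original stream is closed as well,
     * so the caller should report the error on stderr and exit. */
    if (freopen(path, "w", stdout) == NULL) {
        perror(path);
        return -1;
    }
    return 0;
}
```

A main() would call this once after parsing its arguments (for example with getopt) and before the first printf; the rest of the filter stays unchanged.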

It is nice to have the option of running any command with a filename argument. As in your example:
myfilter [-f ./infile] [-o ./outfile] #or
myfilter [-o outfile] [filename] #and (the best one)
myfilter [-f file] [-o file] #so that when the input and output are the same file, the filter still works correctly
For a nice example, check the sort command. It is usually used as a filter in pipes, but it accepts [-o output] and correctly handles the same input/output problem too...
And why is this good? For example, when you want to run the command from C via fork/exec and don't want to start a shell just to handle the I/O. In that case it is much easier (and faster) to call execve(.....) with arguments than to start the command through a shell wrapper.

Related

How can I redirect console output to file?

I'm new to c.
Is there any simple way to redirect all the console's output (printfs etc.) to a file using some general command line \ linkage parameter (without having to modify any of the original code)?
If so what is the procedure?
Use shell output redirection
your-command > outputfile.txt
The standard error will still be output to the console. If you don't want that, use:
your-command > outputfile.txt 2>&1
or
your-command &> outputfile.txt
You should also look into the tee utility, which can make it redirect to two places at once.
On unices, you can also do:
your-command | tee outputfile.txt
That way you'll see the output and be able to interact with the program, while getting a hardcopy of the standard output (but not standard input, so it's not like a teletype session).
As mentioned above, you can use the > operator to redirect the output of your program to a file as in:
./program > out_file
Also, you can append data to an existing file (or create it if it doesn't exist already) by using the >> operator:
./program >> out_file
If you really want to learn more about the (awesome) features that the command line has to offer I would really recommend reading this book (and doing lots of programming :))
http://linuxcommand.org/
Enjoy!
In Unix shells you can usually do executable > file 2>&1, which means "redirect standard output to file and error output to standard output".

How to execvp ls *.txt in C

I'm having issues execvping the *.txt wildcard, and reading this thread - exec() any command in C - indicates that it's difficult because of "globbing" issues. Is there any easy way to get around this?
Here's what I'm trying to do:
char * array[] = {"ls", "*.txt", (char *) NULL };
execvp("ls", array);
you could use the system command:
system("ls *.txt");
to let the shell do the globbing for you.
In order to answer this question you have to understand what is going on when you type ls *.txt in your terminal (emulator). When the ls *.txt command is typed, it is interpreted by the shell. The shell then lists the directory and matches the file names in it against the *.txt pattern. Only after all of that is done does the shell prepare the matching file names as arguments and spawn a new process, passing those file names as the argv array to the execvp call.
In order to assemble something like that yourself, look at the following Q/A:
How to list files in a directory in a C program?
Use fnmatch() to match file name with a shell-like wildcard pattern.
Prepare argument list from matched file names and use vfork() and one of the exec(3) family of functions to run another program.
Alternatively, you can use the system() function as @manu-fatto has suggested. But that function does something slightly different: it actually runs the shell program, which evaluates the ls *.txt statement and in turn performs steps similar to the ones I have described above. It is likely to be less efficient and it may introduce security holes (see the manual page for more details; the security risks are stated under the NOTES section, with a suggestion not to use this function in certain cases).
Hope it helps. Good Luck!
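As a sketch of doing the expansion yourself: instead of the readdir-plus-fnmatch() route described above, this uses POSIX glob(3), which performs both steps in one call. The function name exec_with_glob is made up for illustration.

```c
#include <glob.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: expand `pattern` the way a shell would, then
 * exec `prog` with the matches as its arguments.
 * Returns only on failure (-1); on success the process image is replaced. */
static int exec_with_glob(const char *prog, const char *pattern)
{
    glob_t g;
    int rc;

    g.gl_offs = 1;                      /* reserve slot 0 for argv[0] */
    rc = glob(pattern, GLOB_DOOFFS, NULL, &g);
    if (rc != 0) {
        /* GLOB_NOMATCH, GLOB_NOSPACE or GLOB_ABORTED */
        fprintf(stderr, "%s: no match or glob error (%d)\n", pattern, rc);
        return -1;
    }

    g.gl_pathv[0] = (char *)prog;       /* argv[0] is the program name */
    execvp(prog, g.gl_pathv);           /* gl_pathv is NULL-terminated */

    perror("execvp");                   /* reached only if exec failed */
    globfree(&g);
    return -1;
}
```

Calling exec_with_glob("ls", "*.txt") then behaves much like typing ls *.txt, except that a pattern with no matches is reported as an error instead of being passed through literally.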

How do I add an operator to Bash in Linux?

I'd like to add an operator ( e.g. ^> ) to handle prepend instead append (>>). Do I need to modify Bash source or is there an easier way (plugin, etc)?
First of all, you'd need to modify bash sources and quite heavily. Because, above all, your ^> would be really hard to implement.
Note that bash redirection operators usually perform very simple writes, and work on a single file (or program, in the case of pipes) only. Excluding very specific solutions, you usually can't write to the beginning of a file, for the very simple reason that you'd need to move all the remaining contents forward after each write. You could try doing that, but it would be hard, very inefficient (since every write would require rewriting the whole file) and very unsafe (since any error would leave you with a random mix of the old and new versions).
That said, you are indeed probably better off with a function or any other solution which would use a temporary file, like others suggested.
For completeness, my own implementation of that:
prepend() {
local tmp=$(tempfile)
if cat - "${1}" > "${tmp}"; then
mv "${tmp}" "${1}"
else
rm -f "${tmp}"
# some error reporting
fi
}
Note that, unlike what @jpa suggested, you should write the concatenated data to a temporary file, as that operation can fail and if it does, you don't want to lose your original file. Afterwards, you just replace the old file with the new one, or delete the temporary file and handle the failure any way you like.
Synopsis the same as with the other solution:
echo test | prepend file.txt
And a bit modified version to retain permissions and play safe with symlinks (if that is necessary) like >> does:
prepend() {
local tmp=$(tempfile)
if cat - "${1}" > "${tmp}"; then
cat "${tmp}" > "${1}"
rm -f "${tmp}"
else
rm -f "${tmp}"
# some error reporting
fi
}
Just note that this version is actually less safe: if something else writes to the disk and fills it up during the second cat, you'll end up with an incomplete file.
To be honest, I wouldn't personally use it but handle symlinks and resetting permissions externally, if necessary.
^ is a poor choice of character, as it is already used in history substitution.
To add a new redirection type to the shell grammar, start in parse.y. Declare it as a new %token so that it may be used, add it to STRING_INT_ALIST other_token_alist[] so that it may appear in output (such as error messages), update the redirection rule in the parser, and update the lexer to emit this token upon encountering the appropriate characters.
command.h contains enum r_instruction of redirection types, which will need to be extended. There's a giant switch statement in make_redirection in make_cmd.c processing redirection instructions, and the actual redirection is performed by functions throughout redir.c. Scattered throughout the rest of source code are various functions for printing, copying, and destroying pipelines, which may also need to be updated.
That's all! Bash isn't really that complex.
This doesn't discuss how to implement a prepending redirection, which will be difficult as the UNIX file API only provides for appending and overwriting. The only way to prepend to a file is to rewrite it entirely, which (as other answers mention) is significantly more complex than any existing shell redirections.
Might be quite difficult to add an operator, but perhaps a function could be enough?
function prepend { tmp=$(tempfile); cp "$1" "$tmp"; cat - "$tmp" > "$1"; rm "$tmp"; }
Example use:
echo foobar | prepend file.txt
prepends the text "foobar" to file.txt.
I think bash's plugin architecture (loading shared objects via the 'enable' built-in command) is limited to providing additional built-in commands. The redirection operators are part of the syntax for running simple commands, so I think you would need to modify the parser to recognize and handle your new ^> operator.
Most Linux filesystems do not support prepending. In fact, I don't know of any one that has a stable userspace interface for it. So, as stated by others already, you can only rely on overwriting, either just the initial parts, or the entire file, depending on your needs.
You can easily (partially) overwrite initial file contents in Bash, without truncating the file:
exec {fd}<>"$filename"
printf 'New initial contents' >&$fd
exec {fd}>&-
Above, $fd is the file descriptor automatically allocated by Bash, and $filename is the name of the target file. Bash opens a new read-write file descriptor to the target file on the first line; this does not truncate the file. The second line overwrites the initial part of the file. The position in the file advances, so you can use multiple commands to overwrite consecutive parts in the file. The third line closes the descriptor; since there is only a limited number available to each process, you want to close them after you no longer need them, or a long-running script might run out.
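The same in-place overwrite can be sketched in C: opening the file without O_TRUNC or O_APPEND leaves the existing contents alone, and writes start at offset 0. The function name overwrite_head is an assumption for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first `len` bytes of an existing
 * file in place. The file is not truncated, so anything beyond `len`
 * bytes keeps its old contents. Returns 0 on success, -1 on error. */
static int overwrite_head(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY);      /* no O_TRUNC, no O_APPEND */
    if (fd < 0)
        return -1;

    /* A freshly opened descriptor starts at offset 0.
     * A short write is treated as an error in this sketch. */
    ssize_t n = write(fd, data, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

As with the Bash version, this replaces bytes rather than inserting them: if the new data is longer than the part it is meant to replace, it overwrites whatever followed.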
Please note that > does less than you expected:
Remove the > and the following word from the command line, remembering the redirection.
When the command line has been processed and the command can be launched, call fork(2) (or clone(2)) to create a new process.
Modify the new process according to the command. That includes things like modified environment variables (SOMEVAR=foo yourcommand), but also changed file descriptors. At this point, a > yourfile from the command line has the effect that the file is open(2)'ed at the stdout file descriptor (that is, #1) in write-only mode, truncating the file to zero bytes. A >> yourfile has the effect that the file is opened at stdout in write-only mode and append mode.
Only now launch the program, e.g. with execv(yourprogram, yourargs).
The redirections could, for a simple example, be implemented like
open(yourfile, O_WRONLY|O_TRUNC);
or
open(yourfile, O_WRONLY|O_APPEND);
respectively.
The program then launched will have the correct environment set up, and can happily write to fd1. From here, the shell is not involved. The real work is not done by the shell, but by the operating system. As Unix doesn't have a prepend mode (and it would be impossible to integrate that feature correctly), everything you could try would end up in a very lousy hack.
Try to re-think your requirements, there's always a simpler way around.
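The steps above can be sketched in C roughly as follows. This is a simplified illustration, not what any particular shell does verbatim: the name run_redirected is made up, and unlike the bare open() calls shown earlier it adds O_CREAT with mode 0666 so that a missing file is created.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with stdout redirected to `path`,
 * roughly like `cmd > path` (append == 0) or `cmd >> path` (append != 0).
 * Returns the child's exit status, or -1 on error. */
static int run_redirected(char *const argv[], const char *path, int append)
{
    pid_t pid = fork();                 /* create the new process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set up the redirection before the program starts. */
        int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
        int fd = open(path, flags, 0666);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(127);
        if (fd != STDOUT_FILENO)
            close(fd);                  /* fd 1 now refers to the file */

        execvp(argv[0], argv);          /* only now launch the program */
        _exit(127);                     /* reached only if exec failed */
    }

    /* Parent: the redirection never touched our own stdout. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The launched program simply writes to file descriptor 1 and never learns that a redirection happened, which is why there is no hook in this sequence where a "prepend mode" could be added.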

When is a file created when using output redirection?

I have a script that runs in AIX ksh that looks like this:
wc dir1/* dir2/* | {awk command to rearrange output} | {grep command to filter more} > dir2/output.txt
It is a precondition to this line that dir2/output.txt does not exist.
The issue is that dir2/output.txt has contained itself in the output (it's happened a handful of times out of hundreds of times with no problem). dir1 and dir2 are NFS-mounted.
Is it related to the implementation of wc -- what if the first parameter takes a long time? I think not, as I've tried the following:
wc `sleep 5` *.txt > out.txt
Even in this case out.txt does not list itself.
As a last note, wildcards are used in this example where they are used in the actual script. So if the expansion happens first, why does this problem occur?
At what point is dir2/output.txt actually created?
Redirections are done by the shell, as are globs. Your problem is that, in the case of a pipeline, each pipeline stage is a separate subprocess; whether the shell subprocess that does the final redirection runs before the one that builds the glob of input files for wc will depend on details of the scheduler and system load, among other things, and should be considered indeterminate.
In short, you should assume that this will happen and either exclude dir2/output.txt (take a look at ksh extended glob patterns; in particular, something along the lines of dir2/!(output.txt) may be useful) or create the output somewhere else and mv it to its final location afterward.

How to redirect output away from /dev/null

I have an application that runs a command as below:
<command> <switches> >& /dev/null
I can configure <command>, but I have no control over <switches> . All the output generated by this command goes to /dev/null. I want the output to be visible on screen or redirected to a log file.
I tried to use freopen() and related functions to reopen /dev/null to another file, but could not get it working.
Do you have any other ideas? Is this possible at all?
Thanks for your time.
PS: I am working on Linux.
Terrible Hack:
Using a text editor in binary mode, open the app, find '/dev/null' and replace it with a string of the same length
e.g. '~/tmp/log'
make a backup first
be careful
be very careful
did I mention the backup?
Since you can modify the command you run you can use a simple shell script as a wrapper to redirect the output to a file.
#!/bin/bash
"$@" >> logfile
If you save this in your path as capture_output.sh then you can add capture_output.sh to the start of your command to append the output of your program to logfile.
Append # at the end of your command so it becomes <command> # >& /dev/null, thus commenting out the undesired part.
Your application is probably running a shell and passing it that command line.
You need to make it run a script written by you. That script will replace >/dev/null in the command line with >>/your/log and call the real shell with the modified command line.
The first step is to change the shell used by the application. Changing the environment variable SHELL should suffice, i.e., run your application as
SHELL=/home/user/bin/myshell theApp
If that doesn't work, try momentarily linking /bin/sh to your script.
myshell will call the original shell, but after pattern-replacing the parameters:
#!/bin/bash
sh ${1+"${@/\>\/dev\/null/>>\/your\/log}"}
Something along these lines should work.
You can do this with an already running process by using gdb. See the following page: http://etbe.coker.com.au/2008/02/27/redirecting-output-from-a-running-process/
Can you create an alias for that command? If so, alias it to another command that dumps output to a file.
The device file /dev/tty references your application's controlling terminal - if that hasn't changed, then this should work:
freopen("/dev/tty", "w", stdout);
freopen("/dev/tty", "w", stderr);
Alternatively, you can reopen them to point to a log file:
freopen("/var/log/myapp.log", "a", stdout);
freopen("/var/log/myapp.err", "a", stderr);
EDIT: This is NOT a good idea and certainly not worth trying unless you know what it can break. It works for me and may work for you as well.
OK, this is a really bad hack and probably not worth doing. It assumes that none of the other suggestions work, that you simply do not have access to the binary/application (which contains the command with /dev/null), and that you cannot redirect the output to another file (by replacing /dev/null).
In that case, you can delete /dev/null ($> rm /dev/null) and create your own file in its place (preferably with a soft link) where all the data can be directed. When you are done, you can create /dev/null once again using the following command:
$> mknod -m 666 /dev/null c 1 3
Just to be very clear, this is a bad hack and certainly requires root permissions to work. There is a high chance that your redirected file will contain data from many other running applications/binaries that use /dev/null as a sink.
It may not exactly redirect, but it lets you see the output wherever it's being sent:
strace -ewrite -p $PID
It's not that clean (shows lines like: write(#,) ), but works! (and is single-line :D ) You might also dislike the fact that arguments are abbreviated. To control that, use the -s parameter, which sets the maximum length of the strings displayed.
It catches all streams, so you might want to filter that somehow.
You can filter it:
strace -ewrite -p $PID 2>&1 | grep "write(1"
shows only descriptor 1 calls. 2>&1 is to redirect stderr to stdout, as strace writes to stderr by default.
In perl, if you just want to redirect STDOUT to something slightly more useful, you can just do something like:
open STDOUT, '>>', '/var/log/myscript.log';
open STDERR, '>>', '/var/log/myscript.err';
at the beginning of your script, and that'll redirect it for the rest of your script.
Along the lines of e-t172's answer, can you set the last switch to (or append to it):
; echo
If you can put something inline before passing things to /dev/null (not sure if you are dealing with a hardcoded command), you could use tee to redirect to something of your choice.
Example from Wikipedia which allows escalation of a command:
echo "Body of file..." | sudo tee root_owned_file > /dev/null
http://en.wikipedia.org/wiki/Tee_(command)

Resources