I'm having issues execvping the *.txt wildcard, and reading this thread - exec() any command in C - indicates that it's difficult because of "globbing" issues. Is there any easy way to get around this?
Here's what I'm trying to do:
char * array[] = {"ls", "*.txt", (char *) NULL };
execvp("ls", array);
You could use the system() function:
system("ls *.txt");
to let the shell do the globbing for you.
To answer this question, you have to understand what happens when you type ls *.txt in your terminal (emulator). The command line is interpreted by the shell. The shell lists the directory and matches the file names in it against the *.txt pattern. Only after that does the shell prepare the matching file names as arguments and spawn a new process, passing those file names as the argv array to an exec call such as execvp().
In order to assemble something like that yourself, look at the following Q/A:
How to list files in a directory in a C program?
Use fnmatch() to match file name with a shell-like wildcard pattern.
Prepare argument list from matched file names and use vfork() and one of the exec(3) family of functions to run another program.
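The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: build_argv is a hypothetical helper name, and it assumes the returned names will be used relative to dir (for example, with dir set to "." before calling execvp()).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, NULL-terminated array
 * { cmd, match1, match2, ..., NULL } built from the entries of `dir`
 * whose names match the shell-style `pattern`. Returns NULL on error.
 * The caller owns the array and every string in it. */
char **build_argv(const char *cmd, const char *dir, const char *pattern)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return NULL;

    size_t used = 0, cap = 8;
    char **argv = malloc(cap * sizeof *argv);
    char *name = strdup(cmd);
    if (argv == NULL || name == NULL) {
        free(argv);
        free(name);
        closedir(d);
        return NULL;
    }
    argv[used++] = name;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* FNM_PERIOD: like the shell, "*" does not match dotfiles. */
        if (fnmatch(pattern, e->d_name, FNM_PERIOD) != 0)
            continue;
        if (used + 2 > cap) {            /* keep room for the final NULL */
            char **bigger = realloc(argv, 2 * cap * sizeof *argv);
            if (bigger == NULL)
                break;                   /* out of memory: stop early */
            argv = bigger;
            cap *= 2;
        }
        char *copy = strdup(e->d_name);
        if (copy == NULL)
            break;
        argv[used++] = copy;
    }
    closedir(d);
    argv[used] = NULL;
    return argv;
}
```

A caller would then run something like execvp(argv[0], argv) in a child process. Note that readdir() returns entries in no particular order, whereas the shell sorts its matches.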
Alternatively, you can use the system() function, as @manu-fatto has suggested. But that function does something slightly different: it actually runs the shell program, which evaluates the ls *.txt statement and in turn performs steps similar to the ones I have described above. It is likely to be less efficient and it may introduce security holes (see the manual page for details; the security risks are stated in the NOTES section, with a suggestion not to use this function in certain cases).
Hope it helps. Good Luck!
I'd like to add an operator ( e.g. ^> ) to handle prepend instead append (>>). Do I need to modify Bash source or is there an easier way (plugin, etc)?
First of all, you'd need to modify the bash sources, and quite heavily, because, above all, your ^> would be really hard to implement.
Note that bash redirection operators usually perform very simple writes and work on a single file (or program, in the case of pipes) only. Excluding very specific solutions, you usually can't write to the beginning of a file, for the very simple reason that you'd need to move all the remaining contents forward after each write. You could try doing that, but it would be hard, very inefficient (since every write would require rewriting the whole file) and very unsafe (since after any error you would end up with a random mix of the old and new versions).
That said, you are indeed probably better off with a function or any other solution which would use a temporary file, like others suggested.
For completeness, my own implementation of that:
prepend() {
    local tmp=$(tempfile)
    if cat - "${1}" > "${tmp}"; then
        mv "${tmp}" "${1}"
    else
        rm -f "${tmp}"
        # some error reporting
    fi
}
Note that, unlike what @jpa suggested, you should write the concatenated data to a temporary file, as that operation can fail and, if it does, you don't want to lose your original file. Afterwards, you just replace the old file with the new one, or delete the temporary file and handle the failure any way you like.
Usage is the same as with the other solution:
echo test | prepend file.txt
And a slightly modified version that retains permissions and plays safe with symlinks (if that is necessary), like >> does:
prepend() {
    local tmp=$(tempfile)
    if cat - "${1}" > "${tmp}"; then
        cat "${tmp}" > "${1}"
        rm -f "${tmp}"
    else
        rm -f "${tmp}"
        # some error reporting
    fi
}
Just note that this version is actually less safe: if something else writes to the disk and fills it up during the second cat, you'll end up with an incomplete file.
To be honest, I wouldn't personally use it; I would handle symlinks and reset permissions externally, if necessary.
^ is a poor choice of character, as it is already used in history substitution.
To add a new redirection type to the shell grammar, start in parse.y. Declare it as a new %token so that it may be used, add it to STRING_INT_ALIST other_token_alist[] so that it may appear in output (such as error messages), update the redirection rule in the parser, and update the lexer to emit this token upon encountering the appropriate characters.
command.h contains enum r_instruction of redirection types, which will need to be extended. There's a giant switch statement in make_redirection in make_cmd.c processing redirection instructions, and the actual redirection is performed by functions throughout redir.c. Scattered throughout the rest of source code are various functions for printing, copying, and destroying pipelines, which may also need to be updated.
That's all! Bash isn't really that complex.
This doesn't discuss how to implement a prepending redirection, which will be difficult as the UNIX file API only provides for appending and overwriting. The only way to prepend to a file is to rewrite it entirely, which (as other answers mention) is significantly more complex than any existing shell redirections.
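To make the "rewrite it entirely" point concrete, here is a minimal C sketch of what any prepend has to do underneath: write the new data plus the old contents to a temporary file, then rename it over the original. The names prepend_to_file and write_all are hypothetical, and the sketch assumes the temporary file can be created next to the target. It does not preserve the original file's permissions, ownership, or symlink status.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: writes all of buf to fd, retrying short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Prepends `len` bytes at `data` to the file at `path`.
 * Returns 0 on success, -1 on failure (the original is left untouched). */
int prepend_to_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = mkstemp(tmp);          /* temp file in the same directory */
    if (out < 0) {
        close(in);
        return -1;
    }

    /* New data first, then a copy of the old contents. */
    int ok = write_all(out, data, len) == 0;
    char buf[8192];
    ssize_t got = 0;
    while (ok && (got = read(in, buf, sizeof buf)) > 0)
        ok = write_all(out, buf, (size_t)got) == 0;
    if (got < 0)
        ok = 0;

    close(in);
    if (close(out) != 0)
        ok = 0;

    /* Replace the original only if everything was written. */
    if (ok && rename(tmp, path) == 0)
        return 0;
    unlink(tmp);
    return -1;
}
```

Every call copies the whole file, which is why this is so much heavier than what > and >> do.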
Might be quite difficult to add an operator, but perhaps a function could be enough?
function prepend { tmp=$(tempfile); cp "$1" "$tmp"; cat - "$tmp" > "$1"; rm "$tmp"; }
Example use:
echo foobar | prepend file.txt
prepends the text "foobar" to file.txt.
I think bash's plugin architecture (loading shared objects via the 'enable' built-in command) is limited to providing additional built-in commands. The redirection operators are part of the syntax for running simple commands, so I think you would need to modify the parser to recognize and handle your new ^> operator.
Most Linux filesystems do not support prepending. In fact, I don't know of any one that has a stable userspace interface for it. So, as stated by others already, you can only rely on overwriting, either just the initial parts, or the entire file, depending on your needs.
You can easily (partially) overwrite initial file contents in Bash, without truncating the file:
exec {fd}<>"$filename"
printf 'New initial contents' >&$fd
exec {fd}>&-
Above, $fd is the file descriptor automatically allocated by Bash, and $filename is the name of the target file. Bash opens a new read-write file descriptor to the target file on the first line; this does not truncate the file. The second line overwrites the initial part of the file. The position in the file advances, so you can use multiple commands to overwrite consecutive parts in the file. The third line closes the descriptor; since there is only a limited number available to each process, you want to close them after you no longer need them, or a long-running script might run out.
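For comparison, the same "overwrite the start without truncating" idea can be sketched in C. This is only an illustration under the assumption that the file already exists; overwrite_head is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/* Overwrites the first `len` bytes of `path` with `data`, leaving the
 * rest of the file in place. Returns 0 on success, -1 on failure. */
int overwrite_head(const char *path, const char *data, size_t len)
{
    /* O_RDWR without O_TRUNC or O_APPEND: the file keeps its contents
     * and the file offset starts at 0. */
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    int rc = 0;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            rc = -1;
            break;
        }
        data += n;
        len -= (size_t)n;
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

If the new data is shorter than the file, the tail of the old contents remains; nothing is shifted.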
Please note that > does less than you expected:
Remove the > and the following word from the command line, remembering the redirection.
Once the command line is processed and the command can be launched, call fork(2) (or clone(2)) to create a new process.
Modify the new process according to the command. That includes things like modified environment variables (SOMEVAR=foo yourcommand), but also changed file descriptors. At this point, a > yourfile from the command line has the effect that the file is open(2)'ed at the stdout file descriptor (that is, fd 1) in write-only mode, truncating the file to zero bytes. A >> yourfile has the effect that the file is opened at stdout in write-only mode and append mode.
Only now launch the program, e.g. with execv(yourprogram, yourargs).
The redirections could, as a simple example, be implemented like
open(yourfile, O_WRONLY|O_CREAT|O_TRUNC, 0666);
or
open(yourfile, O_WRONLY|O_CREAT|O_APPEND, 0666);
respectively.
The program then launched will have the correct environment set up, and can happily write to fd1. From here, the shell is not involved. The real work is not done by the shell, but by the operating system. As Unix doesn't have a prepend mode (and it would be impossible to integrate that feature correctly), everything you could try would end up in a very lousy hack.
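Put together, the sequence described above might look like the following C sketch. It is an illustration rather than what any particular shell actually does: run_redirected is a hypothetical name, execvp() is used instead of execv() so that PATH is searched, and the exit codes 126/127 simply mirror common shell conventions.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with stdout redirected to `path`, roughly the way a shell
 * handles `cmd > path` (append == 0) or `cmd >> path` (append != 0).
 * Returns the child's exit status, or -1 on failure. */
int run_redirected(char *const argv[], const char *path, int append)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set up fd 1 before the program starts. */
        int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
        int fd = open(path, flags, 0666);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);
        _exit(127);                  /* only reached if exec failed */
    }

    /* Parent: the "shell" just waits; it never touches the data. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The launched program only ever sees an already-open fd 1, which is why a "prepend mode" would have to come from the operating system rather than from the shell.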
Try to re-think your requirements, there's always a simpler way around.
I have a script that runs in AIX ksh that looks like this:
wc dir1/* dir2/* | {awk command to rearrange output} | {grep command to filter more} > dir2/output.txt
It is a precondition to this line that dir2/output.txt does not exist.
The issue is that dir2/output.txt has contained itself in the output (it's happened a handful of times out of hundreds of times with no problem). dir1 and dir2 are NFS-mounted.
Is it related to the implementation of wc -- what if the first parameter takes a long time? I think not, as I've tried the following:
wc `sleep 5` *.txt > out.txt
Even in this case out.txt does not list itself.
As a last note, wildcards are used in this example where they are used in the actual script. So if the expansion happens first, why does this problem occur?
At what point is dir2/output.txt actually created?
Redirections are done by the shell, as are globs. Your problem is that, in the case of a pipeline, each pipeline stage is a separate subprocess; whether the shell subprocess that does the final redirection runs before the one that builds the glob of input files for wc will depend on details of the scheduler and system load, among other things, and should be considered indeterminate.
In short, you should assume that this will happen and either exclude dir2/output.txt (take a look at ksh extended glob patterns; in particular, something along the lines of dir2/!(output.txt) may be useful) or create the output somewhere else and mv it to its final location afterward.
I have to remove a few hundred files inside my C code. I use "remove" in a loop. Is there any faster way to do it than using "remove"? I ask this because I can't give wildcards to "remove".
No, there isn't a quicker way than using remove() - or unlink() on POSIX systems - in a loop.
The system rm command does that too - at least in the simple, non-recursive case where the names are given on the command line. The shell expands the metacharacters, and rm (in)famously goes along deleting what it was told to delete, unaware of the disastrous *.* notation that was used on the command line. (In the recursive case, it uses a function such as nftw() to traverse the directory structure in depth-first order and repeated calls to unlink() to remove the files and rmdir() to remove the (now-empty) directories.)
POSIX does provide functions (glob() and wordexp()) to generate lists of file names from metacharacters as used in the (POSIX) shell, plus fnmatch() to see whether a name matches a pattern.
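As a rough illustration of combining glob() with unlink(), consider the following sketch. The name remove_matching is hypothetical, and error handling is kept minimal.

```c
#include <glob.h>
#include <unistd.h>

/* Removes every path matching the shell-style `pattern`.
 * Returns the number of files removed, or -1 if glob() itself failed. */
int remove_matching(const char *pattern)
{
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    int removed = (rc == 0 || rc == GLOB_NOMATCH) ? 0 : -1;

    if (rc == 0) {
        /* Still one unlink() per file: glob() only supplies the names. */
        for (size_t i = 0; i < g.gl_pathc; i++)
            if (unlink(g.gl_pathv[i]) == 0)
                removed++;
    }
    globfree(&g);
    return removed;
}
```

This is no faster than an explicit loop over known names; it only saves writing the pattern matching yourself.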
You could use system to spawn a shell which would do the * expansion for you. This would probably not run any faster than just calling unlink() in a loop, though, because it would have to spawn a shell (start a new process). But it would be easier to code.
I have some questions on the subject, of commas compared to space, in delimiting parameters.
They are questions that C programmers familiar with the cmd prompt, may be able to throw some light on..
I know that when doing
c:\>program a b c
there are 4 parameters [0]=program [1]=a [2]=b [3]=c
According to hh ntcmds.chm concepts..
Shell overview
; and , are used to separate parameters
; or , command1 parameter1;parameter2 Use to separate command parameters.
I see dir a,b gives the same result as dir a b
but
c:\>program a,b,c
gives parameters [0]=program [1]=a,b,c
So do some? or all? windows commands use ; and , ? and is that interpretation within the code of each command, or done by the shell like with space?
And if it is in the code of each command.. how would I know which do it?
I notice that documentation of explorer.exe mentions the comma,e.g. you can do
explorer /e,.
but DIR /? does not mention it, but can use it. And a typical c program doesn't take , as a delimiter at all.. So is it the case that the shell doesn't use comma to delimit, it uses space. And windows commands that do, do so 'cos they are (all?) written to delimit the parameters the shell has given them further when commas are used?
There are two differences here between Unix and Windows:
Internal commands such as DIR are built into the shell; their command line syntax doesn't have to follow the same rules as for regular programs
On Windows, programs are responsible for parsing their own command lines. The shell parses redirects and pipes, then passes the rest of the command line to the program in one string
Windows C programs built using Visual Studio use the command line parser in the Microsoft C runtime, which is similar to a typical Unix shell parser and obeys spaces and quotation marks.
I've never seen a C program that uses , or ; as a command line separator. I was aware of the special case for explorer /e,., but I'd never seen the dir a,b example until just now.
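If a C program did want to treat , and ; as extra separators, it would have to split each argv entry itself. A minimal sketch, with split_arg as a hypothetical helper name:

```c
#include <string.h>

/* Splits `arg` in place on ',' and ';' and stores up to `max` pointers
 * to the pieces in `out`. Returns the number of pieces stored.
 * Note: strtok() skips empty fields, so "a,,b" yields two pieces. */
size_t split_arg(char *arg, char *out[], size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(arg, ",;");
         tok != NULL && n < max;
         tok = strtok(NULL, ",;"))
        out[n++] = tok;
    return n;
}
```

Applied to the single argument a,b,c from the question, this would yield the three pieces a, b and c.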
Batch files use a comma or semicolon as an alternative argument separator.
Test batch file:
@echo %1/%2/%3
Test run:
> test.cmd 1,2,3
1/2/3
> test.cmd 1;2 3
1/2/3
And, as you note, dir uses it, and copy as well. Those are both shell built-ins and probably run through a parser similar to the one used for batch files (it isn't exactly the same, since you can do things like cd.. or dir/s which aren't possible for anything else). I guess (note: speculation) this is some sort of backwards compatibility that goes back to the DOS or even CP/M days. Nowadays you should probably just use spaces. And as Tim notes, the C runtime dictates certain things about arguments and how they are supposed to be parsed. Many other languages/frameworks follow that convention, but not necessarily all. PowerShell, for example, has completely different argument handling, and this can sometimes be a surprise when interacting with native programs from within it (that said, PowerShell cmdlets and functions are not programs that can be executed elsewhere, but neither are batch files).
I'm writing a perl module called perl5i. Its aim is to fix a swath of common Perl problems in one module (using lots of other modules).
To invoke it on the command line for one liners you'd write: perl -Mperl5i -e 'say "Hello"' I think that's too wordy so I'd like to supply a perl5i wrapper so you can write perl5i -e 'say "Hello"'. I'd also like people to be able to write scripts with #!/usr/bin/perl5i so it must be a compiled C program.
I figured all I had to do was push "-Mperl5i" onto the front of the argument list and call perl. And that's what I tried.
#include <unistd.h>
#include <stdlib.h>

/*
 * Meant to mimic the shell command
 *     exec perl -Mperl5i "$@"
 *
 * This is a C program so it works in a #! line.
 */
int main (int argc, char* argv[]) {
    int i;
    /* This value is set by a program which generates this C file */
    const char* perl_cmd = "/usr/local/perl/5.10.0/bin/perl";
    /* argc original entries + "-Mperl5i" + the terminating NULL */
    char* perl_args[argc+2];

    perl_args[0] = argv[0];
    perl_args[1] = "-Mperl5i";

    /* i == argc copies the terminating NULL as well */
    for( i = 1; i <= argc; i++ ) {
        perl_args[i+1] = argv[i];
    }

    return execv( perl_cmd, perl_args );
}
Windows complicates this approach. Apparently programs in Windows are not passed an array of arguments, they are passed all the arguments as a single string and then do their own parsing! Thus something like perl5i -e "say 'Hello'" becomes perl -Mperl5i -e say 'Hello' and Windows can't deal with the lack of quoting.
So, how can I handle this? Wrap everything in quotes and escapes on Windows? Is there a library to handle this for me? Is there a better approach? Could I just not generate a C program on Windows and write it as a perl wrapper as it doesn't support #! anyway?
UPDATE: To be clearer, this is shipped software, so solutions that require using a certain shell or tweaking the shell configuration (for example, alias perl5i='perl -Mperl5i') aren't satisfactory.
For Windows, use a batch file.
perl5i.bat
@echo off
perl -Mperl5i %*
%* is all the command line parameters minus %0.
On Unixy systems, a similar shell script will suffice.
Update:
I think this will work, but I'm no shell wizard and I don't have an *nix system handy to test.
perl5i
#!bash
perl -Mperl5i "$@"
Update Again:
DUH! Now I understand your #! comment correctly. My shell script will work from the CLI but not in a #! line, since #!foo requires that foo is a binary file.
Disregard previous update.
It seems like Windows complicates everything.
I think your best bet there is to use a batch file.
You could use a file association, associate .p5i with perl -Mperl5i %*. Of course this means mucking about in the registry, which is best avoided IMO. Better to include instructions on how to manually add the association in your docs.
Yet another update
You might want to look at how parl does it.
I can't reproduce the behaviour you describe:
/* main.c */
#include <stdio.h>

int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
C:\> ShellCmd.exe a b c
ShellCmd.exe
a
b
c
That's with Visual Studio 2005.
Windows is always the odd case. Personally, I wouldn't try to code for the Windows environment exception. Some alternatives are using "bat wrappers" or ftype/assoc Registry hacks for a file extension.
Windows ignores the shebang line when running from a DOS command shell, but ironically uses it when CGI-ing Perl in Apache for Windows. I got tired of coding #!c:/perl/bin/perl.exe directly in my web programs because of portability issues when moving to a *nix environment. Instead I created a c:\usr\bin directory on my workstation and copied the perl.exe binary from its default location, typically c:\perl\bin for AS Perl and c:\strawberry\perl\bin for Strawberry Perl. So in web development mode on Windows my programs wouldn't break when migrated to a Linux/UNIX webhost, and I could use a standard issue shebang line "#!/usr/bin/perl -w" without having to go SED crazy prior to deployment. :)
In the DOS command shell environment I just either set my PATH explicitly or create a ftype pointing to the actual perl.exe binary with embedded switch -Mperl5i. The shebang line is ignored.
ftype p5i=c:\strawberry\perl\bin\perl.exe -Mperl5i %1 %*
assoc .pl=p5i
Then from the DOS command line you can just call "program.pl" by itself instead of "perl -Mperl5i program.pl"
So the "say" statement worked in 5.10 without any additional coaxing just by entering the name of the Perl program itself, and it would accept a variable number of command line arguments as well.
Use CommandLineToArgvW to build your argv, or just pass your command line directly to CreateProcess.
Of course, this requires a separate Windows-specific solution, but you said you're okay with that. It is relatively simple, and coding key pieces specifically for the target system often helps integration (from the users' POV) significantly. YMMV.
If you want to run the same program both with and without a console, you should read Raymond Chen on the topic.
On Windows, at the system level, the command-line is passed to the launched program as a single UTF-16 string, so any quotes entered in the shell are passed as is. So the double quotes from your example are not removed. This is quite different from the POSIX world where the shell does the job of parsing and the launched program receives an array of strings.
I've described here the behavior at the system level. However, between your C (or your Perl) program there is usually the C standard library that is parsing the system command line string to give it to main() or wmain() as argv[]. This is done inside your process, but you can still access the original command line string with GetCommandLineW() if you really want to control how the parsing is done, or get the string in its full UTF-16 encoding.
To learn more about the Windows command-line parsing quirks, read the following:
http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN
http://blogs.msdn.com/b/oldnewthing/archive/2006/05/15/597984.aspx
You may also be interested in the code of the wrapper I wrote for Padre on Win32: this is a GUI program (which means that it will not open a console if launched from the Start menu) called padre.exe that embeds perl to launch the padre Perl script. It also does a small trick: it changes argv[0] to point to perl.exe so that $^X will be something usable to launch external perl scripts.
The execv you are using in your example code is just an emulation in the C library of the POSIX-like behavior. In particular it will not add quotes around your arguments so that the launched perl works as expected. You have to do that yourself.
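As a best-effort illustration of "doing that yourself", the sketch below quotes a single argument so that a program using the Microsoft C runtime's parsing rules should recover it unchanged. The function name quote_arg is hypothetical, and the sketch assumes the target really follows those rules. It does not escape cmd.exe metacharacters, and it will not help with programs that parse their command line differently.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of `arg`, quoted for a parser that follows the
 * Microsoft C runtime conventions. Returns NULL if out of memory. */
char *quote_arg(const char *arg)
{
    size_t len = strlen(arg);

    /* Nothing special inside: pass through unchanged. */
    if (len > 0 && strpbrk(arg, " \t\n\v\"") == NULL) {
        char *copy = malloc(len + 1);
        if (copy != NULL)
            memcpy(copy, arg, len + 1);
        return copy;
    }

    /* Worst case: every input byte becomes two, plus two quotes and NUL. */
    char *out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    char *p = out;
    *p++ = '"';
    for (const char *s = arg; ; s++) {
        size_t slashes = 0;
        while (*s == '\\') {
            s++;
            slashes++;
        }
        if (*s == '\0') {
            /* Backslashes before the closing quote must be doubled. */
            memset(p, '\\', 2 * slashes);
            p += 2 * slashes;
            break;
        } else if (*s == '"') {
            /* Double the backslashes, then escape the quote itself. */
            memset(p, '\\', 2 * slashes + 1);
            p += 2 * slashes + 1;
            *p++ = '"';
        } else {
            /* Backslashes not followed by a quote are literal. */
            memset(p, '\\', slashes);
            p += slashes;
            *p++ = *s;
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

A wrapper could apply this to each argument, join the results with spaces, and pass the resulting string on to the child process.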
Note that because the client is responsible for parsing, each client can do it the way it wants. Many let the libc do it, but not all. So generic command-line generation rules cannot exist on Windows: the rules depend on the program launched.
You may still be interested in "best effort" implementation such as Win32::ShellQuote.
If you were able to use C++ then perhaps Boost.Program_options would help:
http://www.boost.org/doc/libs/1_39_0/doc/html/program_options.html