How does Perl handle the shebang line?

I'm trying to understand how perl deals with the shebang line.
I used to think that any interpreter mentioned in the "command position" on the command line would take precedence over one mentioned in the shebang line. For example, if an executable script called demo looks like this
#!/usr/local/bin/perl-5.00503
printf "$]\n";
...then I would observe the following:
$ ./demo
5.00503
% /usr/local/bin/perl-5.22 ./demo
5.022003
IOW, in the first execution, the interpreter in the shebang is the one running, while in the second it is the one mentioned on the command line. So far so good.
But now, if I change the "interpreter" on the shebang to something like /usr/bin/wc, then it always beats any perl interpreter I mention on the command line:
% cat demo-wc
#!/usr/bin/wc
printf "$]\n";
% ./demo-wc # produces the expected behavior
4 3 31 ./demo-wc
% /usr/local/bin/perl-5.22 ./demo-wc
4 3 31 ./demo-wc
% /usr/local/bin/perl-5.14 ./demo-wc
4 3 31 ./demo-wc
AFAICT, this special behavior seems to be limited to perl interpreters; non-perl interpreters, such as /bin/bash, do "overrule" the shebang:
% /bin/bash ./demo-wc
$]
The bottom line is that perl seems to have radically different policies for handling the shebang depending on the interpreter mentioned.
How does perl determine which policy to follow?
What exactly are the policies in either case?

There are a couple of different cases in your tests.
When you run ./demo, the kernel recognizes the #! magic number (the first 16 bits of the file) and runs the program named on that line; if the kernel cannot do that, the shell handles the line and starts what is on it.
But when you invoke a perl on the command line, the shell starts that perl binary, and that interpreter then processes the shebang line itself. If the line contains "perl", it discards the interpreter path but honors the switches.
If the shebang does not invoke perl, we have behavior special to Perl.
From perlrun
If the #! line does not contain the word "perl" nor the word "indir", the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don't do #!, because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.
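The rule quoted from perlrun can be modeled roughly as a small shell function. This is only a sketch of the documented behavior, not perl's actual implementation: shebang_policy is a made-up name, and the check is a plain substring match on the first line of the script.

```shell
# Rough model of the perlrun rule. shebang_policy is a hypothetical helper,
# not part of perl. It takes the first line of a script and prints
# "perl" if perl would run the script itself, or "dispatch" if perl
# would hand the script to the program named after #!.
shebang_policy() {
    case "$1" in
        '#!'*perl*|'#!'*indir*) echo "perl" ;;      # perl keeps the script
        '#!'*)                  echo "dispatch" ;;  # some other interpreter
        *)                      echo "perl" ;;      # no shebang at all
    esac
}

shebang_policy '#!/usr/local/bin/perl-5.00503'   # prints: perl
shebang_policy '#!/usr/bin/wc'                   # prints: dispatch
```

This matches the two cases in the question: the perl-5.00503 line is ignored in favor of the perl named on the command line, while the wc line causes a hand-off to /usr/bin/wc.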

Unlike most other interpreters, perl does its own processing of the #! line. This enables it to take multiple option arguments, even though the kernel's #! handler will only pass a single string.
Details are in the perlrun man page. The relevant portion for you is this:
If the "#!" line does not contain the word "perl" nor the word "indir" the program named after the "#!" is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don't do "#!", because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.

Related

ls piped into the command line

I have been trying to pipe in the results from ls into the command line for a C program I am writing (in Unix). I want to be able to have an index of the files and so I was planning on using argv. This is how I thought it should work:
./foo &(ls ~/path)
It doesn't work — what's the correct way to pass the output of ls as arguments to the command?

Your syntax is a bit off...
./foo $(ls ~/path)
Do note that this will choke on files with certain characters in them. Use an array instead to fix this.
pushd ~/path
files=(*)
popd
./foo "${files[@]}"

The notation you specified does two things:
./foo &
runs the program foo in the background (with no arguments other than its command name). Then:
(ls ~/path)
runs the ls command in a sub-shell (which, in this context, is the same as running it in the main shell). The problem is you intended (or need) to use $ in place of &.
./foo $(ls ~/path)
This runs the command ls ~/path and captures the output, which is split into words (using the separators listed in the $IFS variable). Each word is then supplied as an argument to the command ./foo, as you required.
We can then debate the wisdom of using the output of ls like that, but unless you have file names containing spaces, tabs, newlines, etc., you will be OK.

You know how Unix tools accept glob patterns, so you can do cat *.txt or rm ~/Pictures/Vacation*.jpg, without having to pipe/expand ls?
That's an ability your shell gives your program for free!
Just use ./foo ~/path/* and argv[1] will contain /home/you/path/fileone, argv[2] will contain /home/you/path/filetwo, and so forth.
These filenames may be relative or absolute, but can always be passed directly to open/fopen/execve or whichever function you want to use.
Using ls as you describe will only give you the last part of the filename with no directory, so you won't know where the files are to do anything with them (though if that's what you want, just use basename(argv[1])).
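To see the difference between the two approaches, here is a minimal sketch that counts the arguments each form produces. The temporary directory, the file name "my file.txt" and the helper count_args are all made up for the demonstration.

```shell
# count_args is a hypothetical stand-in for ./foo: it reports how many
# arguments it received.
count_args() { echo "$#"; }

dir=$(mktemp -d)
touch "$dir/my file.txt"    # one file, with a space in its name

# Unquoted command substitution: the output of ls is split on $IFS.
split_count=$(count_args $(ls "$dir"))
# Glob: the shell produces exactly one word per matching file.
glob_count=$(count_args "$dir"/*)

echo "command substitution: $split_count argument(s)"   # 2
echo "glob: $glob_count argument(s)"                     # 1

rm -r "$dir"
```

With the glob, the single file arrives as a single argument and includes its directory, so it can be passed straight to fopen.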

linux terminal hangs on bash script

I'm a student and this is my first exposure to bash scripting, besides messing with a simple Makefile for c.
#!/usr/bin/sh
gcc -g -std=c99 -Wall -c field.c
gcc -g -std=c99 -Wall -c testField.c
gcc -g field.o testField.o -o testField
#testField get 0xa 0 1 > PA1output.txt
#testField get 0xaa 0 3 >> PA1output.txt
is my script. I want to compile field.c and testField.c into the executable testField.
No matter if I leave the last 2 lines commented out or not, the Linux terminal hangs and after 10 seconds of nothing happening I press ctrl+c to stop it. Ultimately I want to redirect output to PA1output.txt, then concatenate things on the end of the file, but I want to rewrite the file contents each time.
As far as I understand it, > rewrites the contents of the specified file, and >> concatenates onto the end.
This is not my homework; I want to automate testing of other homework I have. In 'testField get 0xaa 0 3', everything after testField is an argument to my C program.
I tried the question "Bash script hangs",
but that didn't answer my question totally.
My script is called 'as' to make it easy to type.
Why does the terminal hang and how do I get the script to do what I described above?
Thanks.

Your system has another program called ‘as’ which is an assembler. You are likely running this rather than your script, and it hangs because the assembler is waiting for input from your terminal.
If you insist on keeping the name, you should run your script with a full or partial pathname (like ‘./as’) so that the correct program is run.
You will probably find that your script will not run without the ‘#’ at the beginning of your first line. However, another way to run your script is ‘sh ./as’ from the command line, which does not depend on having the #! line.
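A quick way to check which program a bare name resolves to is command -v. The sketch below uses a throwaway directory and a one-line script named as; the helper name lookup is made up, and what the first line prints depends on what is installed on your system.

```shell
# lookup is a hypothetical helper: it prints what the shell would run
# for a bare command name, or "(not found)" if nothing matches.
lookup() { command -v "$1" || echo "(not found)"; }

dir=$(mktemp -d)
printf 'echo "my script ran"\n' > "$dir/as"

# A bare name is searched for in $PATH, so this normally reports the
# system assembler rather than the script in the current directory.
echo "bare name resolves to: $(lookup as)"

# An explicit path (here via sh, so no #! line or execute bit is needed)
# runs the script itself.
by_path=$(cd "$dir" && sh ./as)
echo "sh ./as prints: $by_path"

rm -r "$dir"
```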

As Jeremy described, it's most likely a conflict of names.
If you are running your script from the command line (I really hope you are), you don't have to be afraid of giving your scripts (and all file names for that matter) longer, but more specific, names. Most (if not all) command line interfaces on linux have some form of tab-expansion. All you have to do is type enough of the name to make it unique, then press [Tab], and the shell should complete the name for you.
Here's a more thorough explanation for Bash.

cmd- comma to separate parameters Compared to space?

I have some questions on the subject, of commas compared to space, in delimiting parameters.
They are questions that C programmers familiar with the cmd prompt, may be able to throw some light on..
I know that when doing
c:\>program a b c
there are 4 parameters [0]=program [1]=a [2]=b [3]=c
According to hh ntcmds.chm concepts..
Shell overview
; and , are used to separate parameters
; or , command1 parameter1;parameter2 Use to separate command parameters.
I see dir a,b gives the same result as dir a b
but
c:\>program a,b,c
gives parameters [0]=program [1]=a,b,c
So do some? or all? windows commands use ; and , ? and is that interpretation within the code of each command, or done by the shell like with space?
And if it is in the code of each command.. how would I know which do it?
I notice that documentation of explorer.exe mentions the comma,e.g. you can do
explorer /e,.
but DIR /? does not mention it, but can use it. And a typical c program doesn't take , as a delimiter at all.. So is it the case that the shell doesn't use comma to delimit, it uses space. And windows commands that do, do so 'cos they are (all?) written to delimit the parameters the shell has given them further when commas are used?

There are two differences here between Unix and Windows:
Internal commands such as DIR are built into the shell; their command line syntax doesn't have to follow the same rules as for regular programs
On Windows, programs are responsible for parsing their own command lines. The shell parses redirects and pipes, then passes the rest of the command line to the program in one string
Windows C programs built using Visual Studio use the command line parser in the Microsoft C runtime, which is similar to a typical Unix shell parser and obeys spaces and quotation marks.
I've never seen a C program that uses , or ; as a command line separator. I was aware of the special case for explorer /e,., but I'd never seen the dir a,b example until just now.

Batch files use a comma or semicolon as an alternative argument separator.
Test batch file:
@echo %1/%2/%3
Test run:
> test.cmd 1,2,3
1/2/3
> test.cmd 1;2 3
1/2/3
And, as you note, dir uses it, and copy as well. Those are both shell built-ins and probably run through a parser similar to the one used for batch files (it isn't exactly the same, since you can do things like cd.. or dir/s which aren't possible for anything else). I guess (note: speculation) this is some sort of backwards compatibility that goes back to the DOS or even CP/M days. Nowadays you probably should just use spaces. And as Tim notes, the C runtime dictates certain things about arguments and how they are supposed to be parsed. Many other languages/frameworks follow that convention, but not necessarily all. PowerShell, for example, has completely different argument handling, and this can sometimes be a surprise when interacting with native programs from within it (that being said, PowerShell cmdlets and functions are not programs that can be executed elsewhere, and neither are batch files).

Command line arguments with datafiles

If I want to pass a program data files, how can I distinguish the fact that they are data files, not just strings of the file names? Basically I want to file redirect, but use command line arguments so I can be sure the input is correct.
I have been using:
./theapp < datafile1 < datafile2 arg1 arg2 arg3 > outputfile
but I am wondering, is it possible for it to look like this:
./theapp datafile1 datafile2 arg1 arg2 arg3 > outputfile
Allowing the use of command line arguments.

It's a little hard to combine two files into standard input like that. Better would be:
cat datafile1 datafile2 | ./theapp arg1 arg2 arg3 >outputfile
With bash (at least), the second input redirection overrides the first, it does not augment it. You can see that with the two commands:
cat <realfile.txt </dev/null # no output.
cat </dev/null <realfile.txt # outputs realfile.txt.
When you use redirection, your application never even sees >outputfile (for example). It is evaluated by the shell which opens it up and connects it to the standard output of the process you're trying to run. All your program will generally see will be:
./theapp arg1 arg2 arg3
Same with standard input, it's taken care of by the shell.
The only possible problem with that first command above is that it combines the two files into one stream so that your program doesn't know where the first ends and second begins (unless it can somehow deduce this from the content of the files).
If you want to process multiple files and know which they are, there's a time-honoured tradition of doing something like:
./theapp arg1 arg2 arg3 @datafile1 @datafile2 >outputfile
and then having your application open and process the files itself. This is more work than letting the shell do it though.
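Both points about redirection can be checked with a short sketch. Here the shell function show_args is a made-up stand-in for ./theapp, and the temporary file plays the role of realfile.txt.

```shell
# show_args is a hypothetical stand-in for the program: it reports
# exactly what it received as arguments.
show_args() { echo "$# argument(s): $*"; }

# The redirection is consumed by the shell and never reaches the arguments.
seen=$(show_args arg1 arg2 arg3 </dev/null)
echo "$seen"    # 3 argument(s): arg1 arg2 arg3

tmp=$(mktemp)
echo "real content" > "$tmp"

# With two input redirections, only the last one is in effect.
first=$(cat <"$tmp" </dev/null)    # reads /dev/null, so empty
last=$(cat </dev/null <"$tmp")     # reads the file

echo "file first, /dev/null last: '$first'"
echo "/dev/null first, file last: '$last'"

rm "$tmp"
```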

From the perspective of your program, all command line arguments are strings, and you have to decide whether they represent file names or not yourself. There are only two bytes that cannot appear in a file name on Unix: 0x00 and 0x2F (NUL and /). [I really mean bytes. Except for HFS+, Unix file systems are completely oblivious to character encoding, although sensible people use UTF-8, of course.]
Shell redirections don't appear in argv at all.
There is a convention, though: treat each element of argv (except argv[0] of course) that does not begin with a dash as the name of a file to process, in the order that they appear. You do NOT have to do any unquoting operations; just pass them to fopen (or open) as is. If the string "-" appears as an element of argv, process standard input at that point until exhausted, then continue looping over argv. And if the string "--" appears in argv, treat everything after that point as a file name, whether or not it begins with a dash. (Including subsequent appearances of "-" or "--").
There may be a handy library module or even a language primitive to deal with this stuff for you, depending on what language you're using. For instance, in Perl, you just write
while (<>) {
    # ... do stuff with $_ ...
}
and you get everything I said in the "There is a convention..." paragraph for free. (But you said C, so, um, you gotta do most of it yourself. I'm not aware of an argument-processing library for plain C that's worth the space it takes on disk. :-( )
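The argument convention described above can be sketched as a loop over the arguments. It is written in shell here for brevity, but the same loop translates directly to a walk over argv in C. classify_args is a made-up name, and the sketch only labels each argument instead of actually opening files.

```shell
# classify_args is a hypothetical helper that labels each argument
# according to the convention: "-" means standard input, "--" ends
# option processing, a leading dash marks an option, and everything
# else is a file name.
classify_args() {
    only_files=0
    for arg in "$@"; do
        if [ "$only_files" -eq 1 ]; then
            echo "file: $arg"           # after "--", everything is a file
        elif [ "$arg" = "--" ]; then
            only_files=1
        elif [ "$arg" = "-" ]; then
            echo "stdin"
        else
            case "$arg" in
                -*) echo "option: $arg" ;;
                *)  echo "file: $arg" ;;
            esac
        fi
    done
}

classify_args -v data1 - -- -odd-name
```

For the call shown, the labels come out as option, file, stdin, file: the last argument is treated as a file name even though it begins with a dash, because it follows "--".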

Writing a portable command line wrapper in C

I'm writing a perl module called perl5i. Its aim is to fix a swath of common Perl problems in one module (using lots of other modules).
To invoke it on the command line for one liners you'd write: perl -Mperl5i -e 'say "Hello"' I think that's too wordy so I'd like to supply a perl5i wrapper so you can write perl5i -e 'say "Hello"'. I'd also like people to be able to write scripts with #!/usr/bin/perl5i so it must be a compiled C program.
I figured all I had to do was push "-Mperl5i" onto the front of the argument list and call perl. And that's what I tried.
#include <unistd.h>
#include <stdlib.h>
/*
 * Meant to mimic the shell command
 *     exec perl -Mperl5i "$@"
 *
 * This is a C program so it works in a #! line.
 */
int main (int argc, char* argv[]) {
    int i;
    /* This value is set by a program which generates this C file */
    const char* perl_cmd = "/usr/local/perl/5.10.0/bin/perl";
    /* argc original entries + "-Mperl5i" + the terminating NULL */
    char* perl_args[argc+2];
    perl_args[0] = argv[0];
    perl_args[1] = "-Mperl5i";
    /* i == argc copies the NULL terminator that execv requires */
    for( i = 1; i <= argc; i++ ) {
        perl_args[i+1] = argv[i];
    }
    return execv( perl_cmd, perl_args );
}
Windows complicates this approach. Apparently programs in Windows are not passed an array of arguments, they are passed all the arguments as a single string and then do their own parsing! Thus something like perl5i -e "say 'Hello'" becomes perl -Mperl5i -e say 'Hello' and Windows can't deal with the lack of quoting.
So, how can I handle this? Wrap everything in quotes and escapes on Windows? Is there a library to handle this for me? Is there a better approach? Could I just not generate a C program on Windows and write it as a perl wrapper as it doesn't support #! anyway?
UPDATE: To be more clear, this is shipped software so solutions that require using a certain shell or tweaking the shell configuration (for example, alias perl5i='perl -Mperl5i') aren't satisfactory.

For Windows, use a batch file.
perl5i.bat
@echo off
perl -Mperl5i %*
%* is all the command line parameters minus %0.
On Unixy systems, a similar shell script will suffice.
Update:
I think this will work, but I'm no shell wizard and I don't have an *nix system handy to test.
perl5i
#!bash
perl -Mperl5i $@
Update Again:
DUH! Now I understand your #! comment correctly. My shell script will work from the CLI but not in a #! line, since #!foo requires that foo is a binary file.
Disregard previous update.
It seems like Windows complicates everything.
I think your best bet there is to use a batch file.
You could use a file association, associate .p5i with perl -Mperl5i %*. Of course this means mucking about in the registry, which is best avoided IMO. Better to include instructions on how to manually add the association in your docs.
Yet another update
You might want to look at how parl does it.

I can't reproduce the behaviour you describe:
/* main.c */
#include <stdio.h>
int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
    }
    return 0;
}
C:\> ShellCmd.exe a b c
ShellCmd.exe
a
b
c
That's with Visual Studio 2005.

Windows is always the odd case. Personally, I wouldn't try to code for the Windows environment exception. Some alternatives are using "bat wrappers" or ftype/assoc Registry hacks for a file extension.
Windows ignores the shebang line when running from a DOS command shell, but ironically uses it when CGI-ing Perl in Apache for Windows. I got tired of coding #!c:/perl/bin/perl.exe directly in my web programs because of portability issues when moving to a *nix environment. Instead I created a c:\usr\bin directory on my workstation and copied the perl.exe binary from its default location, typically c:\perl\bin for AS Perl and c:\strawberry\perl\bin for Strawberry Perl. So in web development mode on Windows my programs wouldn't break when migrated to a Linux/UNIX webhost, and I could use a standard issue shebang line "#!/usr/bin/perl -w" without having to go SED crazy prior to deployment. :)
In the DOS command shell environment I just either set my PATH explicitly or create a ftype pointing to the actual perl.exe binary with embedded switch -Mperl5i. The shebang line is ignored.
ftype p5i=c:\strawberry\perl\bin\perl.exe -Mperl5i %1 %*
assoc .pl=p5i
Then from the DOS command line you can just call "program.pl" by itself instead of "perl -Mperl5i program.pl"
So the "say" statement worked in 5.10 without any additional coaxing just by entering the name of the Perl program itself, and it would accept a variable number of command line arguments as well.

Use CommandLineToArgvW to build your argv, or just pass your command line directly to CreateProcess.
Of course, this requires a separate Windows-specific solution, but you said you're okay with that, this is relatively simple, and often coding key pieces specifically to the target system helps integration (from the users' POV) significantly. YMMV.
If you want to run the same program both with and without a console, you should read Raymond Chen on the topic.

On Windows, at the system level, the command-line is passed to the launched program as a single UTF-16 string, so any quotes entered in the shell are passed as is. So the double quotes from your example are not removed. This is quite different from the POSIX world where the shell does the job of parsing and the launched program receives an array of strings.
I've described here the behavior at the system level. However, between your C (or your Perl) program there is usually the C standard library that is parsing the system command line string to give it to main() or wmain() as argv[]. This is done inside your process, but you can still access the original command line string with GetCommandLineW() if you really want to control how the parsing is done, or get the string in its full UTF-16 encoding.
To learn more about the Windows command-line parsing quirks, read the following:
http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN
http://blogs.msdn.com/b/oldnewthing/archive/2006/05/15/597984.aspx
You may also be interested by the code of the wrapper I wrote for Padre on Win32: this is a GUI program (which means that it will not open a console if launched from the Start menu) called padre.exe that embeds perl to launch the padre Perl script. It also does a small trick: it changes argv[0] to point it to perl.exe so that $^X will be something usable to launch external perl scripts.
The execv you are using in your example code is just an emulation in the C library of the POSIX-like behavior. In particular it will not add quotes around your arguments so that the launched perl works as expected. You have to do that yourself.
Note that because the client is responsible for parsing, each client can do it the way it wants. Many let the libc do it, but not all. So generic command-line generation rules cannot exist on Windows: the rules depend on the program launched.
You may still be interested in "best effort" implementation such as Win32::ShellQuote.
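For programs that rely on the Microsoft C runtime to build argv, the quoting you have to do yourself can be sketched as follows. This is a best-effort illustration only: win_quote is a made-up name, it assumes the launched program follows the Microsoft C runtime rules (backslashes are special only when they precede a double quote), and it does not handle cmd.exe metacharacters or programs that parse the command line their own way.

```shell
# win_quote is a hypothetical helper that wraps one argument in double
# quotes for a program using the Microsoft C runtime argv parser.
# The sed steps, in order:
#   1. double any backslashes that precede a double quote, and escape
#      the quote itself
#   2. double any trailing backslashes, because they will end up in
#      front of the closing quote
#   3. and 4. add the surrounding double quotes
win_quote() {
    printf '%s\n' "$1" | sed \
        -e 's/\(\\*\)"/\1\1\\"/g' \
        -e 's/\(\\*\)$/\1\1/' \
        -e 's/^/"/' \
        -e 's/$/"/'
}

win_quote "say 'Hello'"    # prints: "say 'Hello'"
win_quote 'say "Hello"'    # prints: "say \"Hello\""
win_quote 'C:\dir\'        # prints: "C:\dir\\"
```

A wrapper would apply this to each argument and join the results with spaces before handing the single command-line string to the child process.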

If you were able to use C++ then perhaps Boost.Program_options would help:
http://www.boost.org/doc/libs/1_39_0/doc/html/program_options.html
