C Language Filter Conventions

I seem to remember reading that C language filter programs should interpret their first argument, if present, as the name of the input file. Is this correct?
I'm coding a filter that requires a parameter file, and I'd rather interpret the first argument as the name of that file. How should this situation be handled?

Related

Variable and executable in a shell interpreter

Does anyone know how to distinguish between a variable and an executable in a shell interpreter? I don't know how to do that in my lexer.
If anyone has an idea ^^
Thanks,
Have a nice day
Mathieu
In a normal POSIX-style shell, the first "word" in a statement that is not a variable assignment is the command to execute. Variable assignments have the form name=value, where there cannot be any whitespace around the = and the name is a valid variable name.
Other than in assignments, and in arithmetic evaluation contexts (which are not required for basic shells), any use of a variable must be preceded by a $.
Identifying assignments is contextual, but it is easy to do since the = is mandatory. In a flex-style lexer you could enable and disable assignment recognition with appropriate start conditions, for example.
Without knowing anything more about your strategy for lexical analysis, it's hard to provide a more detailed answer.
If you care about compatibility with Posix shell syntax, the description can be found here.
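As a sketch of the assignment-word test described above (the helper name is invented; a real flex lexer would express this as patterns and start conditions rather than a standalone function):

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if `word` looks like a POSIX assignment word:
   a valid variable name (letter or underscore, then letters,
   digits, or underscores) immediately followed by '='. */
bool is_assignment_word(const char *word)
{
    if (!isalpha((unsigned char)*word) && *word != '_')
        return false;
    const char *p = word + 1;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    return *p == '=';
}
```

Only words in command position need this check; once the first non-assignment word is seen, everything after it is an ordinary argument.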

Converting String literal to variable name

I have an assignment which tells me I need to accept arguments from the command line.
I know how to accept arguments from the command line; however, this is what I need.
I am told my arguments are as follows: name_of_function name_of_variable argument1, argument2
Is there an easy way to map name_of_function to the name of the function and name_of_variable to the name of the global variable, without calling strcmp on each of them?
There is no tool or library that converts a string to the corresponding variable, function, or anything else in C. In an environment such as the .NET runtime you could use reflection to see if an object exists in your program and to access it.
You will have to use strcmp or similar to interpret the command line arguments and decide how to deal with the commands.
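In practice the strcmp calls are usually centralized in a table of name/function-pointer pairs; here is a minimal sketch (the command names and handlers are invented for illustration):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handlers the command line might name. */
static void cmd_hello(const char *arg) { printf("hello, %s\n", arg); }
static void cmd_bye(const char *arg)   { printf("bye, %s\n", arg); }

struct command {
    const char *name;
    void (*handler)(const char *);
};

static const struct command commands[] = {
    { "hello", cmd_hello },
    { "bye",   cmd_bye   },
};

/* Look the name up and call its handler; return -1 if unknown. */
int dispatch(const char *name, const char *arg)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(commands[i].name, name) == 0) {
            commands[i].handler(arg);
            return 0;
        }
    }
    return -1;
}
```

With argv from main, this becomes dispatch(argv[1], argv[2]) plus whatever further argument parsing the assignment requires.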

Dynamic variable declaration in C

I'm a Ruby developer and it's been a long time since I've coded in C. I want to use a datatype in C which behaves like a Ruby symbol.
Is this possible?
Program asks user for name
User replies - "foobar"
Program declares an integer with the same name i.e.
int foobar;
Unlike interpreted languages, C has no dictionary of variable names at runtime; no variable names exist at runtime at all. So, unfortunately, it is impossible to do what you want in C.
It's not possible to do this in C without implementing your own symbol table to emulate the desired behavior (essentially, implementing your own micro-programming language).
No. C must know names at compile time.
The best you could do is create your own dictionary of names and values. Much work though.
What do you want to do with the username-as-variable once you have it? What kind of operations would you perform with or on your foobar variable?
As others have suggested you could use a data structure to dynamically associate the user name with a piece of integer data but knowing what you want to do with it would help inform suggestions as to whether that's even necessary and which data structures and algorithms you might want to look at.
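A minimal sketch of such a dictionary, assuming a fixed-size table and int-only values (the names and sizes are arbitrary):

```c
#include <stdio.h>
#include <string.h>

#define MAX_VARS 64

struct var { char name[32]; int value; };
static struct var vars[MAX_VARS];
static int nvars;

/* Return a pointer to the value bound to `name`, creating a
   zero-initialized entry on first use; NULL if the table is full. */
int *lookup_or_create(const char *name)
{
    for (int i = 0; i < nvars; i++)
        if (strcmp(vars[i].name, name) == 0)
            return &vars[i].value;
    if (nvars == MAX_VARS)
        return NULL;
    snprintf(vars[nvars].name, sizeof vars[nvars].name, "%s", name);
    vars[nvars].value = 0;
    return &vars[nvars++].value;
}
```

Reading "foobar" from the user and then doing *lookup_or_create("foobar") = 1; gives the observable effect of int foobar = 1; without any compile-time name.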

Mapping variable argument LISP function to C function - C

I am developing a custom LISP interpreter. It won't support defining functions like in LISP, instead all functions are mapped to C functions. When it sees an expression like,
(substr 'input '1 '1)
it knows to call internal substr function and return the result.
Now I am planning to implement a message function which supports basic formatting and writes the output to stdout. Something like,
(message "Hello, %s" name)
%s will be replaced with value in variable name.
Current plan is to directly pass the format and arguments to functions like printf. In that way, I can support all formats that printf supports. But problem comes with variable number of arguments. One way to do will be something like,
if (argcount == 1)
    /* call printf with one arg */
else if (argcount == 2)
    /* call printf with two args */
...
This works, but I am wondering is there a better way to achieve this?
I doubt there is a way to do this. The reason is that the number of parameters to your lisp function is only known at runtime, but the number of arguments to a C function must be known at compile time.
This includes va_lists unless you want to hack at them in some kind of platform specific way.
The best you can really do is write a function in C which is capable of looping through the arguments one at a time and doing something with them. The only way I can see around this is to not only store a function pointer for each of your internal functions, but to also store a "calling convention" which will give information about whether it takes parameters in the ordinary way or whether it finishes with the equivalent of a va_list.
Functions like printf would have a wrapper, printf_wrapper, say, and you'd store a function pointer to the wrapper. This wrapper would accept the format string as an ordinary parameter, followed by a list or array of other parameters (roughly analogous to a va_list).
You might indicate that printf_wrapper finishes with a parameter that expects a list by specifying the calling conventions for the printf_wrapper function as "va_list_type", meaning that it takes the usual fixed parameters, and that all remaining parameters must be bundled up and supplied to it as a list.
Of course writing a printf_wrapper function which can split up and parse a format string into multiple format strings is a bit of work. Here's an example of where I did precisely this so that I could add my own custom format specifiers:
https://github.com/wbhart/bsdnt/blob/v0.26/helper.c
Have your C function take parameters somewhat like argc/argv. That is, take a parameter specifying the number of parameters, and then a pointer to a list of pointers for each parameter.
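A sketch of that convention (the message_fn name and its simplified formatting are invented; a real implementation would interpret the % specifiers in argv[0]):

```c
#include <stdio.h>

/* Every built-in uses one signature: a count plus an array of
   argument strings, so variable arity never touches C varargs. */
typedef int (*builtin_fn)(int argc, const char *argv[]);

/* Toy "message": prints the arguments space-separated instead of
   interpreting a printf-style format string. Returns the number of
   characters printed, excluding the trailing newline. */
int message_fn(int argc, const char *argv[])
{
    int n = 0;
    for (int i = 0; i < argc; i++)
        n += printf(i ? " %s" : "%s", argv[i]);
    printf("\n");
    return n;
}
```

The interpreter then stores one builtin_fn pointer per LISP-visible function and calls them all the same way, whatever their arity.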
Slightly better than an if-else chain would be a switch.
switch (argcount) {
case 1: printf(arg[0]); break;
case 2: printf(arg[0], arg[1]); break;
/* etc. */
}

Why does C's "fopen" take a "const char *" as its second argument?

It has always struck me as strange that the C function "fopen" takes a "const char *" as the second argument. I would think it would be easier to both read your code and implement the library's code if there were bit masks defined in stdio.h, like "IO_READ" and such, so you could do things like:
FILE* myFile = fopen("file.txt", IO_READ | IO_WRITE);
Is there a programmatic reason for the way it actually is, or is it just historic? (i.e. "That's just the way it is.")
I believe that one of the advantages of the character string instead of a simple bit-mask is that it allows for platform-specific extensions which are not bit-settings. Purely hypothetically:
FILE *fp = fopen("/dev/something-weird", "r+,bs=4096");
For this gizmo, the open() call needs to be told the block size, and different calls can use radically different sizes, etc. Granted, I/O has been organized pretty well now (such was not the case originally — devices were enormously diverse and the access mechanisms far from unified), so it seldom seems to be necessary. But the string-valued open mode argument allows for that extensibility far better.
On IBM's mainframe MVS o/s, the fopen() function does indeed take extra arguments along the general lines described here — as noted by Andrew Henle (thank you!). The manual page includes the example call (slightly reformatted):
FILE *fp = fopen("myfile2.dat", "rb+, lrecl=80, blksize=240, recfm=fb, type=record");
The underlying open() has to be augmented by the ioctl() (I/O control) call or fcntl() (file control) or functions hiding them to achieve similar effects.
One word: legacy. Unfortunately we have to live with it.
Just speculation: maybe at the time a "const char *" seemed like the more flexible solution, because it is not limited in any way, whereas a bit mask could only have 32 different values. Looks like a case of YAGNI to me now.
More speculation: dudes were lazy, and writing "rb" requires less typing than MASK_THIS | MASK_THAT :)
Dennis Ritchie (in 1993) wrote an article about the history of C, and how it evolved gradually from B. Some of the design decisions were motivated by avoiding source changes to existing code written in B or embryonic versions of C.
In particular, Lesk wrote a 'portable I/O package' [Lesk 72] that was later reworked to become the C 'standard I/O' routines
The C preprocessor wasn't introduced until 1972/3, so Lesk's I/O package was written without it! (In very early not-yet-C, pointers fit in integers on the platforms being used, and it was totally normal to assign an implicit-int return value to a pointer.)
Many other changes occurred around 1972-3, but the most important was the introduction of the preprocessor, partly at the urging of Alan Snyder [Snyder 74]
Without #include and #define, an expression like IO_READ | IO_WRITE wasn't an option.
The options in 1972 for what fopen calls could look like in typical source, without CPP, were:
FILE *fp = fopen("file.txt", 1); // magic constant integer literals
FILE *fp = fopen("file.txt", 'r'); // character literals
FILE *fp = fopen("file.txt", "r"); // string literals
Magic integer literals are obviously horrible, so the most efficient option (which Unix later adopted for open(2)) was unfortunately ruled out by the lack of a preprocessor.
A character literal is obviously not extensible; presumably that was obvious to the API designers even back then. But it would have been sufficient (and more efficient) for early implementations of fopen: they only supported single-character strings, checking for *mode being r, w, or a. (See Keith Thompson's answer.) Apparently r+ for read+write (without truncating) came later. (See fopen(3) for the modern version.)
C did have a character data type: char was added to B in 1971 as one of the first steps in producing embryonic C, so it was still new in 1972. (Original B didn't have char, having been written for machines that pack multiple characters into a word, so char() was a function that indexed a string! See Ritchie's history article.)
Using a single-byte string is effectively passing a char by const reference, with all the extra overhead of memory accesses, because library functions can't be inlined. (And primitive compilers probably weren't inlining anything, even trivial functions (unlike fopen) in the same compilation unit where inlining would shrink total code size; the modern style of tiny helper functions relies on modern compilers to inline them.)
PS: Steve Jessop's answer with the same quote inspired me to write this.
Possibly related: strcpy() return value. strcpy was probably written pretty early, too.
I must say that I am grateful for it. I know to type "r" instead of IO_OPEN_FLAG_R (or was it IOFLAG_R, or SYSFLAGS_OPEN_RMODE, or whatever).
I'd speculate that it's one or more of the following (unfortunately, I was unable to quickly find any kind of supporting references, so this'll probably remain speculation):
Kernighan or Ritchie (or whoever came up with the interface for fopen()) just happened to like the idea of specifying the mode using a string instead of a bitmap
They may have wanted the interface to be similar to yet noticeably different from the Unix open() system call interface, so it would be at once familiar yet not mistakenly compile with constants defined for Unix instead of by the C library
For example, let's say that a mythical C standard fopen() taking a bitmapped mode parameter used the identifier OPENMODE_READONLY to specify what is today specified by the mode string "r". Now suppose someone made the following call in a program compiled on a Unix platform (with the header that defines O_RDONLY included):
fopen("myfile", O_RDONLY);
There would be no compiler error, but unless OPENMODE_READONLY and O_RDONLY were defined to be the same bit, you'd get unexpected behavior. Of course it would make sense for the C standard names to be defined the same as the Unix names, but maybe they wanted to preclude requiring this kind of coupling.
Then again, this might not have crossed their minds at all...
The earliest reference to fopen that I've found is in the first edition of Kernighan & Ritchie's "The C Programming Language" (K&R1), published in 1978.
It shows a sample implementation of fopen, which is presumably a simplified version of the code in the C standard library implementation of the time. Here's an abbreviated version of the code from the book:
FILE *fopen(name, mode)
register char *name, *mode;
{
    /* ... */
    if (*mode != 'r' && *mode != 'w' && *mode != 'a') {
        fprintf(stderr, "illegal mode %s opening %s\n",
                mode, name);
        exit(1);
    }
    /* ... */
}
Looking at the code, the mode was expected to be a 1-character string (no "rb", no distinction between text and binary). If you passed a longer string, any characters past the first were silently ignored. If you passed an invalid mode, the function would print an error message and terminate your program rather than returning a null pointer (I'm guessing the actual library version didn't do that). The book emphasized simple code over error checking.
It's hard to be certain, especially given that the book doesn't spend a lot of time explaining the mode parameter, but it looks like it was defined as a string just for convenience. A single character would have worked as well, but a string at least makes future expansion possible (something that the book doesn't mention).
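For illustration, here is a sketch of how an implementation might map such a mode string onto internal flags; the flag names are hypothetical, not part of any real stdio:

```c
#include <string.h>

enum { MODE_READ = 1, MODE_WRITE = 2, MODE_APPEND = 4, MODE_UPDATE = 8 };

/* Map a mode string like "r", "w+", or "a" onto flag bits; return -1
   for an illegal mode. An extended implementation could keep parsing
   past this prefix for platform-specific options. */
int parse_mode(const char *mode)
{
    int flags;
    switch (*mode) {
    case 'r': flags = MODE_READ;   break;
    case 'w': flags = MODE_WRITE;  break;
    case 'a': flags = MODE_APPEND; break;
    default:  return -1;
    }
    if (strchr(mode, '+'))
        flags |= MODE_UPDATE;
    return flags;
}
```

The string form costs a few comparisons at open time, which is negligible next to the system call, and leaves room for future modes like "rb" or "r+" without changing the function's signature.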
Dennis Ritchie has this to say, from http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
In particular, Lesk wrote a 'portable I/O package' [Lesk 72] that was later reworked to become the C 'standard I/O' routines
So I say ask Mike Lesk, post the result here as an answer to your own question, and earn stacks of points for it. Although you might want to make the question sound a bit less like criticism ;-)
The reason is simple: to allow the modes to be extended by the C implementation as it sees fit. An argument of type int would not allow that. The C99 Rationale (V5.10, §7.19.5.3, The fopen function) says, for example:
Other specifications for files, such as record length and block size, are not specified in the Standard, due to their widely varying characteristics in different operating environments.
Changes to file access modes and buffer sizes may be specified using the setvbuf function (see §7.19.5.6).
An implementation may choose to allow additional file specifications as part of the mode string argument. For instance,
file1 = fopen(file1name, "wb,reclen=80");
might be a reasonable extension on a system that provides record-oriented binary files and allows
a programmer to specify record length.
Similar text exists in the C89 Rationale 4.9.5.3
Naturally, if |-ed enum flags were used, then these kinds of extensions would not be possible.
One example of fopen implementation using these parameters would be on z/OS. An example there has the following excerpt:
/* The following call opens:
the file myfile2.dat,
a binary file for reading and writing,
whose record length is 80 bytes,
and maximum length of a physical block is 240 bytes,
fixed-length, blocked record format
for sequential record I/O.
*/
if ( (stream = fopen("myfile2.dat", "rb+, lrecl=80,\
blksize=240, recfm=fb, type=record")) == NULL )
printf("Could not open data file for read update\n");
Now, imagine if you had to squeeze all this information into one argument of type int!!
As Tuomas Pelkonen says, it's legacy.
Personally, I wonder if some misguided saps conceived of it as being better due to fewer characters typed? In the olden days programmers' time was valued more highly than it is today, since it was less accessible and compilers weren't as great and all that.
This is just speculation, but I can see why some people would favor saving a few characters here and there (note the lack of verbosity in any of the standard library function names... I present string.h's "strstr" and "strchr" as probably the best examples of unnecessary brevity).
