Using sysctl(3) to write safe, portable code: good idea?

When writing safe code in straight C, I'm sick and tired of coming up with arbitrary
numbers to represent limitations -- specifically, the maximum amount of
memory to allocate for a single line of text. I know I can always say
stuff like
#define MAX_LINE_LENGTH 1024
and then pass that macro to functions such as snprintf().
I work and code in NetBSD, which has a sysctl(3) variable called
"user.line_max" designed for this very purpose. So I don't need to come up
with an arbitrary number like MAX_LINE_LENGTH, above. I just read the
"user.line_max" sysctl variable, which by the way is settable by the user.
My question is whether this is the Right Thing in terms of safety and
portability. Perhaps different operating systems have a different name for
this sysctl, but I'm more interested in whether I should be using this
technique at all.
And for the record, "portability" excludes Microsoft Windows in this case.
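For context, here is roughly what I have in mind; this is only a sketch, assuming the BSD sysctlbyname() interface, and the fallback value is just an arbitrary constant of mine:
#include <sys/param.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

/* Ask the kernel for user.line_max; fall back to an arbitrary constant. */
static size_t line_max(void)
{
    int v;
    size_t len = sizeof(v);

    if (sysctlbyname("user.line_max", &v, &len, NULL, 0) == -1 || v <= 0)
        return 1024;    /* the arbitrary number I was trying to avoid */
    return (size_t)v;
}

int main(void)
{
    char *buf = malloc(line_max());

    if (buf == NULL)
        return 1;
    snprintf(buf, line_max(), "at most %zu bytes here", line_max());
    puts(buf);
    free(buf);
    return 0;
}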

Well, the Linux sysctl(2) man page has this to say in the Notes section:
Glibc does not provide a wrapper for this system call; call it using syscall(2).
Or rather... don't call it: use of this system call has long been discouraged, and it is so unloved that it is likely to disappear in a future kernel version. Remove it from your programs now; use the /proc/sys interface instead.
So that is one consideration.

Not a good idea. Even if it weren't for what Duck told you, relying on a system-wide setting that's runtime-variable is bad design and error-prone. If you're going to go to the trouble of having buffer size limits be variable (which typically requires dynamic allocation and checking for failure) then you should go the last step and make it configurable on a more local scope.
With your example of buffer size limits, opinions differ as to what's the best practice. Some people think you should always use dynamically-growing buffers with no hard limit. Others prefer fixed limits sufficiently large that reasonable data would not exceed them. Or, as you've noted, configurable limits are an option. In choosing what's right for your application, I would consider the user experience implications. Sure users don't like arbitrary limits, but they also don't like it when accidentally (or by somebody else's malice) reading data with no newlines in it causes your application to consume unbounded amounts of memory, start swapping, and/or eventually crash or bog down the whole system.
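To make the "locally scoped, configurable limit" idea concrete, here is a sketch; the function name and interface are made up for illustration, not taken from any library:
#include <stdio.h>
#include <stdlib.h>

/* Read one line from fp into a growing buffer, never growing past max_len.
 * Returns a malloc'd string (caller frees), or NULL on EOF, error, or when
 * the cap would be exceeded. */
char *read_line_capped(FILE *fp, size_t max_len)
{
    size_t cap, len = 0;
    char *buf;
    int c;

    if (max_len < 2)
        return NULL;
    cap = max_len < 128 ? max_len : 128;
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {
            if (cap >= max_len) {           /* hit the caller's limit */
                free(buf);
                return NULL;
            }
            cap *= 2;
            if (cap > max_len)
                cap = max_len;
            char *tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }
    if (len == 0 && c == EOF) {             /* nothing read at all */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}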

The nearest portable construct for this is "getconf LINE_MAX" or the equivalent in C.
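A minimal sketch of that C equivalent, using sysconf(); the fallback here is just the POSIX-guaranteed minimum, not anything this answer prescribes:
#include <unistd.h>
#include <limits.h>
#include <stdio.h>

int main(void)
{
    long lm = sysconf(_SC_LINE_MAX);

    if (lm == -1)
        lm = _POSIX2_LINE_MAX;   /* 2048, the minimum POSIX guarantees */
    printf("LINE_MAX = %ld\n", lm);
    return 0;
}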

1) Check out the Single Unix Specification, keyword: "limits"
2) s/safety/security/

Related

Buffer size for getdents64 to finish in one go

On Linux, is it possible to get the buffer size required for getdents64 to get all the entries in one go (assuming no modifications to the directory after the size is obtained)?
I tried the value from fstat(dirfd, &stb); stb.st_size, but it appears to be much larger than necessary. What value does stat::st_size hold for directories?
As far as I know, no, there is no way to do this, especially not in a manner that works for arbitrary filesystems. The representation of directories is defined by the underlying filesystem, and the same seems to be true of st_size. Certainly there is no way to get the right value for FUSE or 9p or other remote/virtual filesystems.
Why do you want to do this? I don't think it's useful. Once you get beyond a few kB per call, the overhead of the syscalls will be dominated by the actual work done. If you really care you can wrap the getdents64 syscall with a function that keeps resizing the buffer and calling it until EOF is reached. Or you could just use the portable opendir/readdir interfaces which perform appropriate buffering for you.
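A sketch of that portable route, letting opendir/readdir do the buffering so no getdents64-specific sizing is needed:
#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : ".";
    DIR *d = opendir(path);
    struct dirent *e;

    if (d == NULL) {
        perror("opendir");
        return 1;
    }
    while ((e = readdir(d)) != NULL)   /* the library refills its own buffer */
        puts(e->d_name);
    closedir(d);
    return 0;
}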

IOCTL, a Unix system call, uses a variable number of arguments. Is it reliable or safe to use to monitor and control devices?

I have been learning how to control devices on Linux-based systems using ioctl(). In an article I was reading, the author said that the ioctl() prototype stands out in the list of Unix system calls because of the dots in int ioctl(int fd, unsigned long cmd, ...), which prevent type checking during compilation. That last part is what I don't quite get. My concern is: could not checking types create issues when controlling the peripheral? And what would be a more reliable way, or best practice, to monitor and control a peripheral? Thanks
My concern is: could not checking types create issues when controlling the peripheral?
No, at least not directly. As long as the arguments provided are indeed of the correct number and types, everything will be well (that is, the values will be received correctly by the driver). The problem is that the compiler cannot help users of your device driver recognize when they are providing the wrong number or types of arguments.
And what would be a more reliable way, or best practice, to monitor and control a peripheral?
Alternative ways to monitor and communicate with a peripheral include character and/or block special files (see mknod()), setting kernel parameters via _sysctl(), and manipulating files presented in the proc filesystem via your driver. Whether any of those are more reliable, more appropriate, or better practice depends on many factors -- exactly what you're trying to do not least among them.
Not checking types means that the function does not prevent you from mistakenly passing a char there where your peripheral was expecting an int, and well, yes, that could create some issues controlling the peripheral.
So, this means that you need to be careful with the types of the parameters you pass.
The general rule is "GIGO", which stands for "Garbage In Garbage Out". If you give something garbage, it will give you back garbage. Type checking is meant to save programmers from really obvious, really dumb errors. No type checking simply means that the programmers need to be extra careful.
Generally, the first thing you need to do with ioctl() is to create a set of functions that fully describe the interface of your peripheral. Of course these functions will accept properly typed parameters. Then, you will implement each one of those functions by delegating to the type-unsafe ioctl() function. From that moment on, you never directly invoke ioctl() again.
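A sketch of that wrapper pattern; the device name, request codes, and the Linux-style _IOR/_IOW macros here are hypothetical, not from any real driver:
#include <sys/ioctl.h>
#include <linux/ioctl.h>   /* _IOR/_IOW request-code macros (Linux) */

/* Hypothetical request codes for a made-up device. */
#define MYDEV_GET_STATUS  _IOR('M', 1, int)
#define MYDEV_SET_SPEED   _IOW('M', 2, int)

/* Typed wrappers: the only place the type-unsafe ioctl() is called. */
static inline int mydev_get_status(int fd, int *status)
{
    return ioctl(fd, MYDEV_GET_STATUS, status);
}

static inline int mydev_set_speed(int fd, int speed)
{
    return ioctl(fd, MYDEV_SET_SPEED, &speed);
}
Callers then write mydev_set_speed(fd, 9600) and the compiler checks the argument types for them.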

How feasible is it to virtualise the FILE* interfaces of C?

I have often noticed that I would have been able to solve practical problems in C elegantly if there had been a way of creating a ‘virtual FILE’ and attaching the necessary callbacks for events such as buffer full, input requested, close, flush. It should then be possible to use a large part of the stdio.h functions, e.g. fprintf, unchanged. Is there a framework enabling one to do this? If not, is it feasible with a moderate amount of effort, on at least some platforms?
Possible applications would be:
To write to or read from a dynamic or static region of memory.
To write to multiple files in parallel.
To read from a thread or co-routine generating data.
To apply a filter to another (virtual or real) FILE.
Support for file formats with indirection (like #include).
A C pre-processor(?).
I am less interested in solutions for specific cases than in a framework to let you roll your own FILE. I am also not looking for a virtual filesystem, but rather virtual FILE*s that I can pass to the CRT.
To my disappointment I have never seen anything of the sort; as far as I can see C11 considers FILE entirely up to the language implementer, which is perhaps reasonable if one wishes to keep the language (+library) specifications small but sad if you compare it with Java I/O streams.
I feel sure that virtual FILEs must be possible with any (fully) open source implementation of the C run-time, but I imagine there might be a large number of details making it trickier than it seems, and if it has already been done it would be a shame to reduplicate the effort. It would also be greatly preferable not to have to modify the CRT code. Without open source one might be able to reverse engineer the functions supplied, but I fear the result would be far too vulnerable to changes in unsupported features, unless there were a commitment to a set of interfaces. I suppose too that any system for which one can write a device driver would allow one to create a virtual device, but I suspect that of being unnecessarily low-level and of requiring one to write privileged code.
I have to admit that while I have code that would have benefited from virtual FILEs, I have no current requirement for it; nonetheless it is something I have often wondered about and that I imagine could be of interest to others.
This is somewhat similar to a-reader-interface-that-consumes-files-and-char-in-c, but there the questioner did not hope to return a virtual FILE; the answer, however, using fmemopen, did.
There is no standard C interface for creating virtual FILE*s, but both the GNU and the BSD standard libraries include one. On Linux (glibc), you can use fopencookie; on most *BSD systems (including Mac OS X), funopen. (See Note 1)
The two interfaces are similar but slightly different in some details. However, it is usually very simple to adapt code written for one interface to the other.
These are not complete virtualizations. They associate the FILE* with four callbacks and a void* context (the "cookie" in fopencookie). The callbacks are read, write, seek and close; there are no callbacks for flush or tell operations. Still, this is sufficient for many simple FILE* adaptors.
For a simple example, see the two answers to Write simultaneously to two streams.
Notes:
funopen is derived from "functional open", not from "file unopen".
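For instance, here is a sketch of the "write to multiple files in parallel" case with glibc's fopencookie (a funopen version would look very similar):
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>

struct tee { FILE *a, *b; };

/* write callback: duplicate every write to both underlying streams */
static ssize_t tee_write(void *cookie, const char *buf, size_t len)
{
    struct tee *t = cookie;

    fwrite(buf, 1, len, t->a);
    fwrite(buf, 1, len, t->b);
    return (ssize_t)len;
}

static int tee_close(void *cookie)
{
    struct tee *t = cookie;

    fflush(t->a);
    fflush(t->b);
    return 0;
}

int main(void)
{
    FILE *log = fopen("copy.log", "w");

    if (log == NULL)
        return 1;
    struct tee t = { stdout, log };
    cookie_io_functions_t io = { .write = tee_write, .close = tee_close };
    FILE *both = fopencookie(&t, "w", io);
    if (both == NULL)
        return 1;
    fprintf(both, "hello from a virtual FILE*\n");
    fclose(both);   /* flushes the cookie stream and calls tee_close */
    fclose(log);
    return 0;
}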

Parsing: load into memory or use stream

I'm writing a little parser and I would like to know the advantages and disadvantages of the different ways to load the data to be parsed. The two ways that I thought of are:
Load the file's contents into a string then parse the string (access the character at an array position)
Parse as reading the file stream (fgetc)
The former would allow me to have two functions, parse_from_file and parse_from_string; however, I believe this approach will take up more memory. The latter does not have that disadvantage of using more memory.
Does anyone have any advice on the matter?
Reading the entire file in or memory mapping it will be faster, but may cause issues if you want your language to be able to #include other files as these would be memory mapped or read into memory as well.
The stdio functions would work well because they usually try to buffer up data for you, but they are also general purpose, so they have to look out for usage patterns which differ from reading a file from start to finish; still, that shouldn't be too much overhead.
A good balance is to have a large circular buffer (x * 2 * 4096 is a good size) which you load with file data and then have your tokenizer read from. Whenever a block's worth of data has been passed to your tokenizer (and you know that it is not going to be pushed back) you can refill that block with new data from the file and update some buffer location info.
Another thing to consider is if there is any chance that the tokenizer would ever need to be able to be used to read from a pipe or from a person typing directly in some text. In these cases your reads may return less data than you asked for without it being at the end of the file, and the buffering method I mentioned above gets more complicated. The stdio buffering is good for this as it can easily be switched to/from line or block buffering (or no buffering).
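A sketch of that switching with setvbuf(); this must happen before the stream is otherwise used, and the buffer size here is arbitrary:
#include <stdio.h>

int main(void)
{
    static char inbuf[1 << 16];

    setvbuf(stdin,  inbuf, _IOFBF, sizeof inbuf);  /* big, fully buffered input */
    setvbuf(stdout, NULL,  _IOLBF, BUFSIZ);        /* line-buffered output */

    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}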
Using flex (the fast lexical analyser generator, not the Adobe Flash thing) or similar can greatly ease the trouble with all of this. You should look into using it to generate the C code for your tokenizer (lexical analysis).
Whatever you do you should try to make it so that your code can easily be changed to use a different form of next character peek and consume functions so that if you change your mind you won't have to start over.
Consider using lex (and perhaps yacc, if the language of your grammar matches its capabilities). Lex will handle all the fiddly details of lexical analysis for you and produce efficient code. You can probably beat its memory footprint by a few bytes, but how much effort do you want to expend into that?
The most efficient approach on a POSIX system would probably be neither of the two (or a variant of the first, if you like): just map the file read-only with mmap, and then parse it. Modern systems are quite efficient with that, in that they prefetch data when they detect streaming access, multiple instances of your program that parse the same file will share the same physical pages of memory, and so on. And the interfaces are relatively simple to handle, I think.
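A sketch of that mmap variant (POSIX; the "parse" here is just a line count to keep it short):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }
    if (st.st_size == 0)      /* mmap of length 0 would fail */
        return 0;

    const char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            lines++;
    printf("%zu lines\n", lines);

    munmap((void *)p, (size_t)st.st_size);
    close(fd);
    return 0;
}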

I/O methods in C

I am looking for various ways of reading/writing data from stdin/stdout. Currently I know about scanf/printf, getchar/putchar and gets/puts. Are there any other ways of doing this? Also, I am interested in knowing which one is most efficient in terms of memory and time.
Thanks in Advance
fgets()
fputs()
read()
write()
And others, details can be found here: http://www.cplusplus.com/reference/clibrary/cstdio/
As for your timing question, take a look at this: http://en.wikipedia.org/wiki/I/O_bound
Stdio is designed to be fairly efficient no matter which way you prefer to read data. If you need to do character-by-character reads and writes, they usually expand to macros which just access the buffer except when it's full/empty. For line-by-line text io, use puts/fputs and fgets. (But NEVER use gets because there's no way to control how many bytes it will read!) The printf family (e.g. fprintf) is of course extremely useful for text because it allows you to skip constructing a temporary buffer in memory before writing (and thus lets you avoid thinking about all the memory allocation, overflow, etc. issues). fscanf tends to be much less useful, but mostly because it's difficult to use. If you study the documentation for fscanf well and learn how to use %[, %n, and the numeric specifiers, it can be very powerful!
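A sketch of what those fscanf/scanf features buy you; the "key=value" input format is just an example, not anything standard:
#include <stdio.h>

int main(void)
{
    char key[64], value[64];
    int consumed = 0;

    /* %[^=] and %[^\n] are scan sets; %n reports how many characters this
     * call consumed (it does not count toward the return value). */
    while (scanf(" %63[^=]=%63[^\n]%n", key, value, &consumed) == 2)
        printf("key=\"%s\" value=\"%s\" (%d chars matched)\n", key, value, consumed);
    return 0;
}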
For large blocks of text (e.g. loading a whole file into memory) or binary data, you can also use the fread and fwrite functions. You should always pass 1 for the size argument and the number of bytes to read/write for the count argument; otherwise it's impossible to tell from the return value how much was successfully read or written.
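A sketch of that fread/fwrite advice (size argument 1, so the return value is a byte count); this just copies stdin to stdout in binary-safe blocks:
#include <stdio.h>

int main(void)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
        if (fwrite(buf, 1, n, stdout) != n)
            return 1;
    return ferror(stdin) ? 1 : 0;
}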
If you're on a reasonably POSIX-like system (pretty much anything) you can also use the lower-level io functions open, read, write, etc. These are NOT part of the C standard but part of POSIX, and non-POSIX systems usually provide the same functions but possibly with slightly-different behavior (for example, file descriptors may not be numbered sequentially 0,1,2,... like POSIX would require).
If you're looking for immediate-mode type stuff don't forget about Curses (more applicable on the *NIX side but also available on Windows)
