C program to test other programs with repeatable input and no restart

I'm trying to write a C program that tests the performance of other programs by passing them input and checking their output, without restarting the program under test for every run. My co-workers and I are writing sudoku solvers, possibly each in a different language, and I'm writing this program to measure how fast each solver gets through a large set of puzzles. I don't want to penalize people for using languages, like Java, that are slow to start up. Ideally, this program will start the sudoku solver, keep it running, and repeatedly pass in a new puzzle via stdin and check the result on stdout.
Here's pseudocode of what I want to do:
start a sudoku solver in another process
once process is running
pass puzzle string into child stdin
wait until output comes into stdout
repeat until the time limit ends
close process
I've messed around with popen, but I couldn't figure out how to write to the child process stdin. I've done a bunch of poking around the internet, and I haven't been able to figure it out.
Any suggestions on how to accomplish this? I'm running this on a Linux box. It doesn't have to be stdin and stdout for communication, but that would be the easiest for everyone else.

This is more a long comment than an answer, but your question is really too broad and ill-defined, and I'm just giving some hints.
You first need to understand how to start, manage, and communicate with child processes. An entire Unix programming book is needed to explain that. You could read ALP or some newer book. You need to be able to write a Unix shell-like program. Become familiar with many syscalls(2) including fork(2), pipe(2), execve(2), dup2(2), poll(2), waitpid(2) and a dozen others. See also signal(7) & time(7).
You also need to agree with your colleagues on some conventions and a protocol for these sudoku programs, and on how your controlling program will communicate with them (and the devil is in the details). For example, your pseudo-code mentions "pass puzzle string", but you don't define exactly what that means (what if the string contains newlines, or weird characters?). Read also about inter-process communication.
(You might want to have more than one sudoku process running. You probably don't want a buggy sudoku client to break your controlling program. This is unclear in your question)
You might want to define a text-based protocol (they are simpler to debug and use than binary protocols). Details matter a lot, so document it precisely (probably using some EBNF notation). You might want to use textual formats like JSON, YAML, or S-expressions. You could take inspiration from SMTP, HTTP, JSONRPC etc. (or perhaps choose to use one of them).
Remember that pipe(7)-s, fifo(7)-s and tcp(7) socket(7)-s are just streams of bytes without any message boundaries. Any message organization on top of them has to be a documented convention (and a message may arrive fragmented, so you need careful buffering).
(I recommend making some free software sample implementation of your protocol)
Look also into similar work, perhaps the SAT competitions (or chess program contests; I don't know the details).
Read also something about OSes, like Operating Systems: Three Easy Pieces

Related

Check if command was run directly by the user

Say I want to change the behavior of kill for educational reasons. If a user directly types it in the shell, then nothing will happen. If some other program/entity-who-is-not-the-user calls it, it performs normally. A wrapping if-statement is probably sufficient, but what do I put in that if?
Edit: I don't want to do this in the shell. I'm asking about kernel programming.
In line 2296 of the kernel source, kill is defined. I will wrap an if statement around the code inside. In that statement, there should be a check to see whether the one who called this was the user or just some process. The check is the part I don't know how to implement.
Regarding security
Goal:
Block the user from directly calling kill from any shell
Literally everything else is fine and will not be blocked
While other answers are technically true, I think they're being too strict regarding the question. What you want to do is not possible in a 100% reliable way, but you can get pretty close by making some reasonable assumptions.
Specifically if you define an interactive kill as:
called by process owned by a logged in user
called directly from/by a process named like a shell (it may be a new process, or it may be a built-in operation)
called by a process which is connected to a serial/pseudo-terminal (possibly also belonging to the logged in user)
then you can check for each of those properties when processing a syscall and make your choice that way.
There are ways this will not be reliable (sudo + expect + sh should work around most of these checks), but it may be enough to have fun with. How to implement those checks is a longer story and probably each point would deserve its own question. Check the documentation about users and pty devices - that should give you a good idea.
Edit: Actually, it may even be possible to implement this as an LKM. SELinux can do similar kinds of checks.
It looks like you are quite confused and do not understand what exactly a system call is and how a Linux computer works. Everything is done inside some process through system calls.
there should be a check to see whether the one who called this was directly done by the user or just some process
The above sentence makes no sense. Everything is done by some process through some system call. The notion of user exists only as an "attribute" of processes, see credentials(7) (so "directly done by the user" is vague). Read syscalls(2) and spend several days reading Advanced Linux Programming, then ask a more focused question.
(I really believe you should not dare patching the kernel without knowing quite well what the ALP book above is explaining; then you would ask your question differently)
You should also spend several days or weeks reading about Operating Systems and Computer Architecture. You need to get a more precise idea of how a computer works; that will take time (perhaps many years), and no answer here can cover all of it.
When the user types kill, he probably uses the shell builtin (try which kill and type kill), and the shell calls kill(2). When the user types /bin/kill, he makes the shell execve(2) a program which will call kill(2). The command might not even come from the terminal: with echo kill $$ | sh the command comes from a pipe, and with echo kill 1234|at midnight the kill happens outside of any user interaction, without any user interactively using the computer, the command being read from some file in /var/spool/cron/atjobs/ (see atd(8)). In all these cases the kernel only sees a SYS_kill system call.
BTW, modifying the kernel's behavior on kill could affect a lot of system software, so be careful when doing that. Read also signal(7) (some signals are not coming from a kill(2)).
You might use isatty(STDIN_FILENO) (see isatty(3)) to detect whether a program is run in a terminal (no need to patch the kernel, you could just patch the shell), but I gave several cases where it is not. You, or your user, could also write a desktop application (using GTK or Qt) that calls kill(2) and is started from the desktop (it probably won't have any terminal attached when running; read about X11).
See also the notion of session and setsid(2); recent systemd based Linuxes have a notion of multi-seat which I am not familiar with (I don't know what kernel stuff is related to it).
If you only want to change the behavior of interactive terminals running some (well identified) shells, you only need to change the shell, set with chsh(1) (e.g. patch it to remove its kill builtin, and perhaps to stop the shell from doing an execve(2) of /bin/kill); there is no need to patch the kernel. But this won't stop an advanced user from coding a small C program calling kill(2) (or even coding his own shell in C and using it), compiling his C source code, and running his freshly compiled ELF executable. See also restricted shell in bash.
If you just want to learn by patching the kernel as an exercise and changing its behavior for the kill(2) syscall, you need to define what process state you want to filter on. So think in terms of processes making the kill(2) syscall, not in terms of "user" (processes do have several user ids).
BTW, patching the kernel is very difficult (if you want that to be reliable and safe), since by definition it is affecting your entire Linux system. The rule of thumb is to avoid patching the kernel when possible .... In your case, it looks like patching the shell could be enough for your goals, so prefer patching the shell (or perhaps patching the libc which is practically used by all shells...) to patching the kernel. See also LD_PRELOAD tricks.
Perhaps you just want the uid 1234 (assuming 1234 is the uid of your user) to be denied by your patched kernel using the kill(2) syscall (so he will need to have a setuid executable to do that), but your question is not formulated this way. That is probably simple to achieve, perhaps by adding in kill_ok_by_cred (near line 692 on Linux 4.4 file kernel/signal.c) something as simple as
/* cred is the caller's credentials in kill_ok_by_cred; kuid_t needs KUIDT_INIT */
if (uid_eq(cred->uid, KUIDT_INIT(1234)))
    return 0;
But I might be completely wrong (I never patched the kernel, except for some drivers). Surely in a few hours Craig Ester would give a more authoritative answer.
You can use aliases to change the behavior of commands. Aliases are only applied at interactive shells. Shell scripts ignore them. For example:
$ alias kill='echo hello'
$ kill
hello
If you want an alias to be available all the time, you could add it to ~/.bashrc (or whatever the equivalent file is if your shell isn't bash).

C signals vs. eventhandler

I have become interested in C programming lately. I like how you only have a 'minimal' set of functions and datatypes (the C standard library) and can still create almost everything with it.
But now to my question:
How do you do simple event handling in C? I have read about the signal.h header, and it would be what I am looking for... if there were signals exclusively reserved for the user. But I can never be sure that the environment won't unexpectedly raise one of the signals that I can use with the C standard library.
Okay... there is the extended signals header in linux/unix with 2(?) signals for the user... but i can imagine situations where you need more...
Besides, I want to learn to write platform-independent C. I heard about "emulating signals" by listening to a socket... but that would not be platform independent either.
Is there any way to write a C program that has to handle events without getting platform dependent only by help of the standard C library?
Thank you for any hints;
Yes, that is exactly what Unix was designed with: 2 user signals. It all depends on what you use signals for. If you just want to relay some events asynchronously, sockets will do. Look up "event loop"; you can build unlimited complexity behind that. Signals are a very special mechanism that exists for OS-specific reasons, such as somebody trying to kill your process. In that respect, the options should be limited in order to keep the overhead of OS operations down.
My suggestion is to stay away from signals unless you know very specifically what you are using them for. Signals are for the OS to communicate with you, not for you to communicate with yourself, even from many different places in your program. And there are only a few defined reasons why the OS wants to give you a call. Hence, I tend to think the original 2 user-defined signals are more than enough.
Unfortunately I think you are going to run into platform dependencies here. You can write a multithreaded application, where one thread waits for some input and then sends a message / makes a call when that input has arrived (such as waiting for an input string on a console). But that is not baked into C99, and you would have to rely on platform dependent third party libraries. Here is a useful post on that subject. I know this isn't the answer you want, but I hope it helps.
C: Multithreading
edit: C11 supports multithreading natively, see
http://en.cppreference.com/w/c/header
I haven't used this yet.

Creating unflushed file output buffers

I am trying to clear up an issue that occurs with unflushed file I/O buffers in a couple of programs, in different languages, running on Linux. The solution of flushing buffers is easy enough, but this issue of unflushed buffers happens quite randomly. Rather than seek help on what may cause it, I am interested in how to create (reproduce) and diagnose this kind of situation.
This leads to a two-part question:
1. Is it feasible to artificially and easily construct instances where, for a given period of time, one can have output buffers that are known to be unflushed? My searches are turning up empty. A trivial baseline is to hammer the hard drive (e.g. swapping) in one process while trying to write a large amount of data from another process. While this "works", it makes the system practically unusable: I can't poke around and see what's going on.
2. Are there commands from within Linux that can identify that a given process has unflushed file output buffers? Is this something that can be run at the command line, or is it necessary to query the kernel directly? I have been looking at fsync, sync, ioctl, flush, bdflush, and others. However, lacking a method for creating unflushed buffers, it's not clear what these may reveal.
In order to reproduce for others, an example for #1 in C would be excellent, but the question is truly language agnostic - just knowing an approach to create this situation would help in the other languages I'm working in.
Update 1: My apologies for any confusion. As several people have pointed out, buffers can be in the kernel space or the user space. This helped pinpoint the problems: we're creating big dirty kernel buffers. This distinction and the answers completely resolve #1: it now seems clear how to re-create unflushed buffers in either user space or kernel space. Identifying which process ID has dirty kernel buffers is not yet clear, though.
If you are interested in the kernel-buffered data, then you can tune the VM writeback through the sysctls in /proc/sys/vm/dirty_*. In particular, dirty_expire_centisecs is the age, in hundredths of a second, at which dirty data becomes eligible for writeback. Increasing this value will give you a larger window of time in which to do your investigation. You can also increase dirty_ratio and dirty_background_ratio (which are percentages of system memory, defining the point at which synchronous and asynchronous writeback start respectively).
Actually creating dirty pages is easy - just write(2) to a file and exit without syncing, or dirty some pages in a MAP_SHARED mapping of a file.
A simple program that would have an unflushed buffer would be:
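#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("moo");  /* no newline, so the text stays in the stdio buffer */
    pause();        /* wait for a signal; nothing flushes the buffer meanwhile */
    return 0;
}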
By default, when stdout is connected to a terminal, stdio flushes it only on newlines.
It is very easy to cause unflushed buffers by controlling the receiving side. The beauty of *nix systems is that everything looks like a file, so you can use special files to do what you want. The easiest option is a pipe. If you just want to control stdout, this is the simplest option: unflushed_program | slow_consumer. Otherwise, you can use named pipes:
mkfifo pipe_file
unflushed_program --output pipe_file
slow_consumer --input pipe_file
slow_consumer is most likely a program you design to read data slowly, or just read X bytes and stop.

Getting rid of file-based communication

I have to work with two C programs that communicate via a file-based interface. That is, each of them has a main loop where it polls three or four files (fopen, fscanf), reacts to what it reads and eventually makes its own changes to the files (fprintf) for the other process to read.
Now I have to condense these two programs into a single program, with minimal changes to the program logic and the code in general. However, mainly for aesthetic reasons I'm supposed to replace the file-based communication with something in-memory.
I can imagine a few hacky ways to accomplish this, but I'm sure that stackoverflow will give me a hint at a beautiful solution :)
Since you tagged this Linux, I'm going to suggest open_memstream. It was added to POSIX with POSIX 2008, but it's been available on glibc-based Linux systems for a long time. Basically it lets you open a FILE * that's actually a dynamically-growing buffer in memory, so you wouldn't have to change much code. This "file" is write-only, but you could simply use sscanf instead of fscanf on the buffer to read it, or use fmemopen (which doesn't have the dynamic-growth semantics but which is very convenient for reading from in-memory buffers).
RabbitMQ is a really robust and elegant solution for event processing. After mucking with state machines for the past few years, it has been a breath of fresh air. There are other messaging servers with C libraries, like OpenAMQ.
Since you tagged this Linux, I'd suggest putting the communication files on /dev/shm. That way you sort-of replace the file-based communication with an in-memory one, without actually altering any of the application logic :-)
You say that the reader and writer processes are being condensed into a single program.
So you now have different threads for that purpose?
If so, I think a mutex-guarded global buffer should serve the purpose well enough.
Use a global string with sscanf and sprintf instead of a file.

Is there a need for file descriptor control program/syscall?

I am currently thinking of implementing a syscall in some BSD flavours in order to close a given file descriptor.
The file descriptor would be defined as a pair of PID and file descriptor number.
It would be useful for testing or debugging a program, or for other unusual purposes.
I think that I will do it anyway, you know, for learning purpose.
What I'm asking here is: can it be useful to someone somehow? Can I publish my work and maintain it?
I don't think any operating system will accept my code if there's no need for the end users or programmers.
Thanks for your advice.
Come on, that would be useless, it's like taking back memory from a program while it's running. This will never happen in reality.
Code it just for the fun if you really want to, no one will ever need that.
Or maybe I'm missing something?
You could post it as a tutorial on kernel programming. There aren't many of them out there, and the subject is not as well documented as one might expect.
