Getting rid of file-based communication - c

I have to work with two C programs that communicate via a file-based interface. That is, each of them has a main loop where it polls three or four files (fopen, fscanf), reacts to what it reads and eventually makes its own changes to the files (fprintf) for the other process to read.
Now I have to condense these two programs into a single program, with minimal changes to the program logic and the code in general. However, mainly for aesthetic reasons I'm supposed to replace the file-based communication with something in-memory.
I can imagine a few hacky ways to accomplish this, but I'm sure that stackoverflow will give me a hint at a beautiful solution :)

Since you tagged this Linux, I'm going to suggest open_memstream. It was added to POSIX with POSIX 2008, but it's been available on glibc-based Linux systems for a long time. Basically it lets you open a FILE * that's actually a dynamically-growing buffer in memory, so you wouldn't have to change much code. This "file" is write-only, but you could simply use sscanf instead of fscanf on the buffer to read it, or use fmemopen (which doesn't have the dynamic-growth semantics but which is very convenient for reading from in-memory buffers).
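A minimal sketch of how the two halves could talk through memory with these calls, assuming a one-line "id value" message format. The helper names write_message and read_message and the message layout are hypothetical, chosen only for illustration; the existing fprintf/fscanf calls would stay as they are and only the FILE * would change.

```c
#define _POSIX_C_SOURCE 200809L  /* for open_memstream and fmemopen */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical writer side: formats a message into a heap buffer through
   a FILE *, just as it would with fprintf on a real file.
   Returns a malloc'ed string the caller must free, or NULL on error. */
char *write_message(int id, double value)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *out = open_memstream(&buf, &len);
    if (out == NULL)
        return NULL;
    fprintf(out, "%d %f\n", id, value);
    if (fclose(out) != 0) {   /* buf and len are final only after fclose/fflush */
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical reader side: wraps the buffer in a read-only FILE * so the
   existing fscanf code can be reused. Returns 0 on success, -1 on error. */
int read_message(const char *buf, int *id, double *value)
{
    /* fmemopen takes a non-const pointer, but mode "r" never writes to it */
    FILE *in = fmemopen((void *)buf, strlen(buf), "r");
    if (in == NULL)
        return -1;
    int fields = fscanf(in, "%d %lf", id, value);
    fclose(in);
    return fields == 2 ? 0 : -1;
}
```

The buffer returned by write_message is an ordinary string, so the reader could equally call sscanf on it directly instead of going through fmemopen.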

RabbitMQ is a really robust/elegant solution for event processing. After mucking with state machines for the past few years this has been a breath of fresh air. There are other messaging servers with C libs, like OpenAMQ.

Since you tagged this Linux, I'd suggest putting the communication files on /dev/shm. That way you sort-of replace the file-based communication with an in-memory one, without actually altering any of the application logic :-)

You say that you have condensed the reader and writer processes into a single program.
So, do you now have different threads for the two roles?
If so, I think a mutex-guarded global buffer should serve the purpose well enough.
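A minimal sketch of a mutex-guarded global buffer, assuming the two former programs run as threads and exchange one short text message at a time. The names channel_write and channel_read, the 256-byte size, and the "command argument" format are assumptions made for illustration, not part of the original programs.

```c
#include <pthread.h>
#include <stdio.h>

#define CHANNEL_SIZE 256

static char channel[CHANNEL_SIZE];   /* replaces one communication file */
static int channel_full = 0;         /* 1 while a message is waiting */
static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER;

/* Takes the place of fprintf to the file.
   Returns 0 on success, -1 if the previous message was not consumed yet. */
int channel_write(int command, int argument)
{
    int rc = -1;
    pthread_mutex_lock(&channel_lock);
    if (!channel_full) {
        snprintf(channel, sizeof channel, "%d %d", command, argument);
        channel_full = 1;
        rc = 0;
    }
    pthread_mutex_unlock(&channel_lock);
    return rc;
}

/* Takes the place of fscanf on the file.
   Returns 0 if a message was read, -1 if none was waiting or it was malformed. */
int channel_read(int *command, int *argument)
{
    int rc = -1;
    pthread_mutex_lock(&channel_lock);
    if (channel_full) {
        if (sscanf(channel, "%d %d", command, argument) == 2)
            rc = 0;
        channel_full = 0;   /* the message is consumed either way */
    }
    pthread_mutex_unlock(&channel_lock);
    return rc;
}
```

Because both functions return immediately, they fit into the existing polling main loops without changing their structure.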

Use a global string with sscanf and sprintf instead of a file.

Related

Number of threads running? [duplicate]

If I search for counting the number of threads an application has, all the answers involve external programs like top. I want to count the threads within the application itself.
I can't add code at the point of thread creation because it happens inside an immutable library.
I can't read /proc.
It's a C/pthreads program running on a few different Unices.
If you can't read /proc you are in a bit of trouble, unless your program communicates with another program which reads /proc
If you don't want to read /proc because of portability concerns, you might use a library which abstracts that a bit, like libproc does
You could write a tiny wrapper for pthread_create that counts created threads and link against that wrapper after you linked against the immutable library.
Use top -H. But chances are, if you can't read /proc, top won't work anyway. If that's the case, there is no easy way and it would depend on your specific system.

C program to test other programs with repeatable input and no restart

I'm trying to write a C program that is able to test the performance of other programs by passing in input and testing the output without having to restart the program every time it runs. Co-workers and I are writing sudoku solvers, and I'm writing the program to test how fast each one runs by solving numerous puzzles, which could all be in different languages, and I don't want to penalize people for using languages, like Java, that are really slow to start up. Ideally, this program will start the sudoku solver program, keep it running, and continually pass in a new puzzle via stdin and test the output in stdout.
Here's pseudocode of what I want to do:
start a sudoku solver in another process
once process is running
pass puzzle string into child stdin
wait until output comes into stdout
repeat until end time limit ends
close process
I've messed around with popen, but I couldn't figure out how to write to the child process stdin. I've done a bunch of poking around the internet, and I haven't been able to figure it out.
Any suggestions on how to accomplish this? I'm running this on a Linux box. It doesn't have to be stdin and stdout for communication, but that would be the easiest for everyone else.
This is more a long comment than an answer, but your question is really too broad and ill-defined, and I'm just giving some hints.
You first need to understand how to start, manage, and communicate with child processes. An entire Unix programming book is needed to explain that. You could read ALP or some newer book. You need to be able to write a Unix shell-like program. Become familiar with many syscalls(2) including fork(2), pipe(2), execve(2), dup2(2), poll(2), waitpid(2) and a dozen others. See also signal(7) & time(7).
You also need to discuss with your colleagues some conventions and a protocol for these sudoku programs and how your controlling program would communicate with them (and the devil is in the details). For example, your pseudo-code mentions "pass puzzle string" but you don't define what that exactly means (what if the string contains newlines, or weird characters?). Read also about inter-process communication.
(You might want to have more than one sudoku process running. You probably don't want a buggy sudoku client to break your controlling program. This is unclear in your question)
You could want to define a text-based protocol (they are simpler to debug and use than binary protocols). Details matter a lot, so document it precisely (probably using some EBNF notation). You might want to use textual formats like JSON, YAML, S-expressions. You could take inspiration from SMTP, HTTP, JSONRPC etc (or perhaps choose to use one of them).
Remember that pipe(7)-s, fifo(7)-s and tcp(7)-s socket(7)-s are just a stream of bytes without any message boundaries. Any message organization above these should be a documented convention (and it might happen that the message would be fragmented, so you need careful buffering). See also this.
(I recommend making some free software sample implementation of your protocol)
Look also into similar work, perhaps SAT competition (or chess contests programs, I don't know the details).
Read also something about OSes, like Operating Systems: Three Easy Pieces

How to write a tail -f like C program

I want to implement a C program in Linux (Ubuntu distro) that mimics tail -f. Note that I do not want to actually call tail -f from my C code, rather implement its behaviour. At the moment I can think of two ways to implement it.
When the program is called, I seek to the end of file. Afterwards, I would read to the end of file periodically and print whatever I read if it is not empty.
The second method, which can potentially be more efficient, is to again seek to the end of file. But this time I "somehow" listen for changes to that file and read to the end of file only if it has changed.
With that being said, my question is how to implement the second approach and if someone can share if it is worth the effort. Also, are these the only two options?
NOTE: Thanks for the comments, the question is changed based on them.
There is no standardized mechanism for monitoring changes to a file, so you'll need to implement a "polling" solution anyway (that is, when you hit the end of file, wait a short amount of time and try again.)
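A minimal sketch of that polling approach using stdio. The helper names drain_new_data and follow are hypothetical, and the sketch does not handle the file being truncated or replaced.

```c
#include <stdio.h>
#include <unistd.h>

/* Copies everything currently readable from fp to out, then clears the
   EOF indicator so that data appended later can still be read.
   Returns the number of bytes copied. */
long drain_new_data(FILE *fp, FILE *out)
{
    char buf[4096];
    size_t n;
    long total = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    clearerr(fp);     /* without this, EOF may stay set on the stream */
    fflush(out);
    return total;
}

/* Polling loop: start at the end of the file and print whatever is
   appended, sleeping whenever nothing new has arrived. Never returns
   unless the file cannot be opened. */
void follow(const char *path, unsigned interval_seconds)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return;
    }
    fseek(fp, 0, SEEK_END);
    for (;;) {
        if (drain_new_data(fp, stdout) == 0)
            sleep(interval_seconds);
    }
}
```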
On Linux, you can use the inotify family of system calls, but be aware that it won't always work. It doesn't work for special files or remote filesystems, for example, and it may not work for some local filesystems. It is complicated in the case of symlinks. And so on. There is a Windows equivalent, but I believe it suffers from some of the same issues.
So even if you use a notification system, you'll need the polling solution as a backup, and since OS notifications are not guaranteed to be reliable (that is, if the system is under load, notifications might be dropped), you'll need to poll on timeout even if you are using a notification system.
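Combining the two ideas might look like the sketch below: wait for an inotify event, but give up after a timeout so the caller re-reads the file either way. The helper names watch_file and wait_for_change are hypothetical, and only IN_MODIFY is watched, so renames and deletions are not covered.

```c
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Creates an inotify instance watching path for modifications.
   Returns the inotify file descriptor, or -1 on error. */
int watch_file(const char *path)
{
    int ifd = inotify_init();
    if (ifd == -1) {
        perror("inotify_init");
        return -1;
    }
    if (inotify_add_watch(ifd, path, IN_MODIFY) == -1) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Waits until the file is reported as modified or the timeout expires.
   Returns 1 on an event, 0 on timeout, -1 on error. The caller should
   try to read the file in the first two cases. */
int wait_for_change(int ifd, int timeout_ms)
{
    struct pollfd pfd = { .fd = ifd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;

    /* Drain the queued events; the contents are not inspected here,
       it is enough to know that something happened. */
    char events[4096];
    if (read(ifd, events, sizeof events) <= 0)
        return -1;
    return 1;
}
```

With this in place, the main loop reads to the end of the file after every call to wait_for_change, whether it reported an event or a timeout.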
You might want to take a look at the implementation of the GNU tail utility (http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c) to see how the special cases are handled.
You can implement the requirement by following steps:
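1) fopen with 'a+' mode;
2) select on the opened file descriptor (you need to convert from FILE * to a file descriptor, e.g. with fileno) and do the read. Note that select always reports a regular file as ready to read, so in practice this still amounts to polling.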

Creating unflushed file output buffers

I am trying to clear up an issue that occurs with unflushed file I/O buffers in a couple of programs, in different languages, running on Linux. The solution of flushing buffers is easy enough, but this issue of unflushed buffers happens quite randomly. Rather than seek help on what may cause it, I am interested in how to create (reproduce) and diagnose this kind of situation.
This leads to a two-part question:
1) Is it feasible to artificially and easily construct instances where, for a given period of time, one can have output buffers that are known to be unflushed? My searches are turning up empty. A trivial baseline is to hammer the hard drive (e.g. swapping) in one process while trying to write a large amount of data from another process. While this "works", it makes the system practically unusable: I can't poke around and see what's going on.
2) Are there commands from within Linux that can identify that a given process has unflushed file output buffers? Is this something that can be run at the command line, or is it necessary to query the kernel directly? I have been looking at fsync, sync, ioctl, flush, bdflush, and others. However, lacking a method for creating unflushed buffers, it's not clear what these may reveal.
In order to reproduce for others, an example for #1 in C would be excellent, but the question is truly language agnostic - just knowing an approach to create this situation would help in the other languages I'm working in.
Update 1: My apologies for any confusion. As several people have pointed out, buffers can be in the kernel space or the user space. This helped pinpoint the problems: we're creating big dirty kernel buffers. This distinction and the answers completely resolve #1: it now seems clear how to re-create unflushed buffers in either user space or kernel space. Identifying which process ID has dirty kernel buffers is not yet clear, though.
If you are interested in the kernel-buffered data, then you can tune the VM writeback through the sysctls in /proc/sys/vm/dirty_*. In particular, dirty_expire_centisecs is the age, in hundredths of a second, at which dirty data becomes eligible for writeback. Increasing this value will give you a larger window of time in which to do your investigation. You can also increase dirty_ratio and dirty_background_ratio (which are percentages of system memory, defining the point at which synchronous and asynchronous writeback start respectively).
Actually creating dirty pages is easy - just write(2) to a file and exit without syncing, or dirty some pages in a MAP_SHARED mapping of a file.
A simple program that would have an unflushed buffer would be:
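#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("moo");  /* no newline, so the text stays in the stdio buffer */
    pause();        /* sleep until a signal arrives */
    return 0;
}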
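By default, stdio line-buffers stdout when it is connected to a terminal, so output is flushed only on newlines; since "moo" has no trailing newline, it stays in the user-space buffer. When stdout is not a terminal it is fully buffered instead.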
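The same user-space buffering can be observed on a regular file. The sketch below, with the hypothetical helper visible_size_after_write, writes through a fully buffered FILE * and reports the size the kernel sees before the stream is closed. It assumes the text is shorter than BUFSIZ.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Writes text to path through a fully buffered stream and returns the
   file size as seen by the kernel while the stream is still open.
   If flush is nonzero, fflush is called first. Returns -1 on error. */
long visible_size_after_write(const char *path, const char *text, int flush)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);   /* full buffering, stdio-owned buffer */

    fputs(text, fp);                     /* lands in the user-space buffer */
    if (flush)
        fflush(fp);                      /* hands the data to the kernel */

    struct stat st;
    long size = stat(path, &st) == 0 ? (long)st.st_size : -1;
    fclose(fp);
    return size;
}
```

Without the flush the reported size is 0 even though fputs has returned; with it, the size equals the length of the text. Note that this shows only the stdio buffer: after fflush the data may still sit in dirty kernel pages.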
It is very easy to cause unflushed buffers by controlling the receiving side. The beauty of *nix systems is that everything looks like a file, so you can use special files to do what you want. The easiest option is a pipe. If you just want to control stdout, this is the simplest option: unflushed_program | slow_consumer. Otherwise, you can use named pipes:
mkfifo pipe_file
unflushed_program --output pipe_file
slow_consumer --input pipe_file
slow_consumer is most likely a program you design to read data slowly, or just read X bytes and stop.
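The core of such a slow_consumer might look like the following sketch. The function name consume_slowly and its parameters are assumptions for illustration; a real program would call it from main with the descriptor of stdin or of the named pipe.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads at most limit bytes from fd, at most chunk bytes per read,
   sleeping delay_seconds after each read, then stops.
   Returns the number of bytes consumed. */
long consume_slowly(int fd, size_t chunk, unsigned delay_seconds, size_t limit)
{
    char buf[256];
    size_t total = 0;

    if (chunk == 0 || chunk > sizeof buf)
        chunk = sizeof buf;

    while (total < limit) {
        size_t want = limit - total < chunk ? limit - total : chunk;
        ssize_t n = read(fd, buf, want);
        if (n <= 0)              /* EOF or error: nothing more to consume */
            break;
        total += (size_t)n;
        sleep(delay_seconds);    /* keep the producer's output backed up */
    }
    return (long)total;
}
```

Once the pipe's kernel buffer fills up, the producer blocks in write, and anything it has not yet written remains in its own buffers.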
