Buffering expectations using `printf` - c

Say there exists a C program that executes in some Linux process. Upon start, the C program calls setvbuf to disable buffering on stdout. The program then alternates between two "logical" calls ("logical" in this sense to avoid consideration of the compiler possibly reordering instructions) - the first to printf() and the second incrementing a variable.
#include <stdio.h>

int main(int argc, char **argv)
{
    setvbuf(stdout, NULL, _IONBF, 0);
    unsigned int a = 0;
    for (;;) {
        printf("hello world!");
        a++;
    }
}
At some point assume the program receives a signal, e.g. via kill, that causes the program to terminate. Will the contents of stdout always be complete after the signal is received, in the sense that they include the result of all previous invocations to printf(), or is this dependent on other levels of buffering/other behavior not controllable via setvbuf (e.g. kernel buffering)?
The broader context of this question is, if using a synchronous logging mechanism in a C application (e.g. all threads log with printf()), can the log be trusted to be "complete" for all calls that have returned from printf() upon receiving some application-terminating signal?
Edit: I've edited the code snippet and question to remove undefined behavior for clarity.

Any sane interpretation of the expression "unbuffered stream" means that the data has left the stream object when printf returns. In the case of file-descriptor backed streams, that means the data has entered kernel-space, and the kernel should continue sending the data to its final destination (assuming no kernel panic, power loss etc).
But a problem with segfaults is that they may not happen when you think they do. Take for instance the following code:
int *p = NULL;
printf("hello world\n");
*p = 1;
A dumb non-optimizing compiler may create code that segfaults at *p=1;. But that is not the only possibility according to the C standard. For instance, if a compiler can prove that printf doesn't depend on the contents of *p, it may reorganize the code like this:
int *p = NULL;
*p = 1;
printf("hello world\n");
In that case printf would never be called.
Another possibility is that, since p==NULL makes *p=1 invalid, the compiler may scrap that expression altogether.
EDIT: The poster has changed the question from "Segfaulting" to being killed. In that case, it should all depend on if the kernel closes open file descriptors on exit the same way as close does, or not.
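The claim that an unbuffered stream has handed its data to the kernel by the time printf returns can be checked with a small experiment. The following is only a sketch under stated assumptions: a POSIX system, a pipe-backed stream created with fdopen standing in for stdout, and a hypothetical helper name (bytes_surviving_sigkill) that does not come from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `count` copies of `msg`
 * to an unbuffered stream backed by a pipe and then kills itself with
 * SIGKILL. Returns the number of bytes the parent could still read from
 * the pipe, or -1 if the experiment could not be set up. */
long bytes_surviving_sigkill(int count, const char *msg)
{
    int fds[2];
    assert(msg != NULL);
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipe's write end plays the role of stdout. */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");
        if (out == NULL)
            _exit(2);
        setvbuf(out, NULL, _IONBF, 0);  /* same call as in the question */
        for (int i = 0; i < count; i++)
            fprintf(out, "%s", msg);
        raise(SIGKILL);                 /* no exit handlers, no flushing */
        _exit(1);                       /* not reached */
    }

    /* Parent: read until every write end of the pipe has been closed. */
    close(fds[1]);
    long total = 0;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

Calling bytes_surviving_sigkill(1000, "hello world!") and comparing the result with 1000 * 12 shows whether anything was lost. This only covers the hand-off to the kernel; it says nothing about data reaching a disk or a remote peer.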

Given a construct like:
fprintf(file1, "whatever"); fflush(file1);
file2 = fopen(someExistingFile, "w");
there are some circumstances where it may be essential that fopen doesn't overwrite the existing file unless or until the write to file1 can be guaranteed successful, but there are others where waiting until success of the fflush can be assured before starting the fopen would needlessly degrade performance. In order to allow designers of C implementations to weigh such considerations however they see fit, and also to avoid requiring that implementations provide semantic guarantees beyond those offered by the underlying OS (e.g. if an OS reports that the fflush() is complete before data is written to disk, and offers no way of finding out when all pending writes are complete, there would be no way the Standard could usefully require that an implementation which targets that OS must not allow fflush to return at any time when the write could still fail), the Standard leaves such details to the implementation.
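Where the underlying OS does offer a stronger guarantee, a program can ask for it explicitly. A minimal sketch, assuming a POSIX system where fsync and fileno are available (flush_to_disk is a hypothetical helper name, not part of the C library):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push a stream's data as far toward the storage
 * device as the OS allows. Returns 0 on success, -1 on failure. */
int flush_to_disk(FILE *f)
{
    assert(f != NULL);
    if (fflush(f) != 0)         /* stream buffer -> kernel */
        return -1;
    if (fsync(fileno(f)) != 0)  /* kernel cache -> device, as far as the OS reports */
        return -1;
    return 0;
}
```

A caller that must not truncate someExistingFile before the first write is safe could call flush_to_disk(file1) and check the result before the fopen. How much fsync really guarantees still depends on the OS, the file system, and the hardware.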

So, it appears that there's a basic misunderstanding in your question, and I think it's important to go through the basics of what printf is. If your stdout buffer size is 0, then the answer to "will all data be sent out of the buffer" is always yes, since in theory there isn't a hardware buffer to hold the data. That is, somewhere in your computer hardware there's something like a UART chip that has a small buffer for transferring data. Most programs I've seen do not use this hardware buffer, so it's not surprising that your program does the same.
However, the printf function has an upper-layer buffer (in my application ~150 characters), and I'm assuming that this is the buffer you're asking about. Note that this is not the same thing as the stdout buffer; it's just an allocated piece of memory that stores messages before they're sent to wherever you want them to go. Think about it: if there were no printf-specific buffer, you would only be able to send 1 character per function call.
Now it really depends on whether the implementation of printf on your system is blocking or nonblocking. If it's nonblocking, that could mean that data is being transferred by an interrupt or by DMA, probably a combination of both. In that case it depends on whether your system stops these transfer mechanisms in the middle of a transfer or allows them to complete. It's impossible for me to say based on the information you've given.
However, in my experience, printf is usually a blocking function; that is, it holds up the rest of your code while it's transferring things out of the buffer and moves on to the next statement only once the transfer has completed. In that case, if you have stopped the code from running (again, I'm not certain of the specifics of "kill" on your system), then you have also stopped the transfer.
Your system most likely has blocking printf calls, and since you only say "a kill signal", it sounds like you're not really sure what you mean by that. I think it's safe to assume that whatever signal you're talking about does not internally stop your printf function from completing, so your full message will probably be sent before exiting, even if the signal arrives mid-printf. If your printf is being called, it is most likely completing and sending the full message, unless this "kill" signal does something odd. That's the best answer I can give you from a "C" standpoint. If you would like a more definite answer, you would have to give us information that lets us see the implementation of printf on your operating system, and/or more specifics on how this "kill signal" you mentioned works.

Related

How to determine if a pointer is in rodata [duplicate]

Can I tell if a pointer is in the rodata section of an executable?
As in, editing that pointer's data would cause a runtime system trap.
Example (using a C character pointer):
void foo(char const * const string) {
if ( in_rodata( string ) ) {
puts("It's in rodata!");
} else {
puts("That ain't in rodata");
}
}
Now I was thinking that, maybe, I could simply compare the pointer to the rodata section.
Something along the lines of:
if ( string > start_of_rodata && string < end_of_rodata ) {
// it's in rodata!
}
Is this a feasible plan/idea?
Does anyone have an idea as to how I could do this?
(Is there any system information that one might need in order to answer this?)
I am executing the program on a Linux platform.
I doubt that it could possibly be portable
If you don't want to mess with linker scripts or use platform-specific memory map query APIs, a proxy approach is fairly portable on platforms with memory protection, if you're willing to just know whether the location is writable, read-only, or neither. The general idea is to do a test read and a test write. If the first succeeds but the second one fails, it's likely .rodata or a code segment. This doesn't tell you "it's rodata for sure" - it may be a code segment, or some other read-only page, such as a read-only file memory mapping that has copy-on-write disabled. But that depends on what you had in mind for this test - what the ultimate purpose was.
Another caveat is: For this to be even remotely safe, you must suspend all other threads in the process when you do this test, as there's a chance you may corrupt some state that code executing on another thread may happen to refer to. Doing this from inside a running process may have hard-to-debug corner cases that will stop lurking and show themselves during a customer demo. So, on platforms that support this, it's always preferable to spawn another process that will suspend the first process in its entirety (all threads), probe it, write the result to the process's address space (to some result variable), resume the process and terminate itself. On some platforms, it's not possible to modify a process's address space from outside, and instead you need to suspend the process mostly or completely, inject a probe thread, suspend the remaining other threads, let the probe do its job, write an answer to some agreed-upon variable, terminate, then resume everything else from the safety of an external process.
For simplicity's sake, the below will assume that it's all done from inside the process. Even though "fully capable" self-contained examples that work cross-process would not be very long, writing this stuff is a bit tedious especially if you want it short, elegant and at least mostly correct - I imagine a really full day's worth of work. So, instead, I'll do some rough sketches and let you fill in the blanks (ha).
Windows
Structured exceptions get thrown e.g. due to protection faults or divide by zero. To perform the test, attempt a read from the address in question. If that succeeds, you know it's at least a mapped page (otherwise it'll throw an exception you can catch). Then try writing there - if that fails, then it was read-only. The code is almost boring:
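#include <assert.h>
#include <stdbool.h>

static const int foo;
static int bar;
#if _WIN32
#include <windows.h>

typedef struct ThreadState ThreadState;
ThreadState *suspend_other_threads(void) { ... }
void resume_other_threads(ThreadState *state) { ... }

int check_if_maybe_rodata(void *p) {
    // Test read: an unmapped or unreadable page raises an access violation,
    // which must be caught with __except (a __finally block would run
    // unconditionally and always return false).
    __try {
        (void) *(volatile char *)p;
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        return false;
    }
    volatile LONG result = 0;
    ThreadState *state = suspend_other_threads();
    __try {
        InterlockedExchange(&result, 1);
        LONG saved = *(volatile LONG *)p;
        InterlockedExchange((volatile LONG *)p, saved); // faults on a read-only page
        InterlockedExchange(&result, 0); // we succeeded writing there
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        // the test write faulted: result stays 1
    }
    resume_other_threads(state);
    return result;
}

int main() {
    assert(check_if_maybe_rodata((void *)&foo));
    assert(!check_if_maybe_rodata(&bar));
}
#endif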
Suspending the threads requires traversing the thread list, and suspending each thread that's not the current thread. The list of all suspended threads has to be created and saved, so that later the same list can be traversed to resume all the threads.
There are surely caveats, and WoW64 threads have their own API for suspension and resumption, but it's probably something that would, in controlled circumstances, work OK.
Unix
The idea is to leverage the kernel to check the pointer for us "at arm's length" so that no signal is raised. Handling POSIX signals that result from memory protection faults requires patching the code that caused the fault, inevitably forcing you to modify the protection status of the code's memory. Not so great. Instead, pass the pointer to a syscall that you know should succeed in all normal circumstances and that reads from the pointed-to address - e.g. open /dev/zero, and write to that file from a buffer pointed to by the pointer. If that fails with EFAULT, it is due to buf [being] outside your accessible address space. If you can't even read from that address, it's not .rodata for sure.
Then do the converse: from an open /dev/zero, attempt a read to the address you are testing. If the read succeeds, then it wasn't read-only data. If the read fails with EFAULT that most likely means that the area in question was read-only since reading from it succeeded, but writing to it didn't.
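The two-step probe can be sketched as below. This is a hedged variant, not the exact recipe above: it assumes Linux and uses a pipe rather than /dev/zero, because a device driver may discard written data without ever touching the caller's buffer, and because reading from /dev/zero would overwrite a writable byte with zero. The names probe_byte and probe_result are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

enum probe_result { PROBE_ERROR, PROBE_UNREADABLE, PROBE_READ_ONLY, PROBE_WRITABLE };

/* Hypothetical helper: classify the byte at p by letting the kernel copy
 * from and to it through a pipe. A bad address yields EFAULT instead of
 * a SIGSEGV. The byte's value is preserved when the page is writable. */
enum probe_result probe_byte(const void *p)
{
    int fds[2];
    unsigned char byte;
    enum probe_result res = PROBE_ERROR;

    if (pipe(fds) != 0)
        return PROBE_ERROR;

    /* Step 1: ask the kernel to read one byte from p. */
    if (write(fds[1], p, 1) != 1) {
        if (errno == EFAULT)
            res = PROBE_UNREADABLE;
        goto out;
    }
    /* Fetch that byte into a local so it can be written back unchanged. */
    if (read(fds[0], &byte, 1) != 1)
        goto out;
    if (write(fds[1], &byte, 1) != 1)
        goto out;

    /* Step 2: ask the kernel to store the same byte back at p. */
    if (read(fds[0], (void *)p, 1) == 1)
        res = PROBE_WRITABLE;
    else if (errno == EFAULT)
        res = PROBE_READ_ONLY;
out:
    close(fds[0]);
    close(fds[1]);
    return res;
}
```

The earlier caveat about other threads still applies: between the two steps another thread could modify the byte, and the write-back would then undo that change.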
In all cases, it'd be most preferable to use native platform APIs to test the mapping status of the page on which the address you try to access resides, or even better - to walk the sections list of the mapped executable (ELF on Linux, PE on Windows), and see exactly what went where. It's not somehow guaranteed that on all systems with memory protection the .rodata section or its equivalent will be mapped read only, thus the executable's image as-mapped into the running process is the ultimate authority. That still does not guarantee that the section is currently mapped read-only. An mprotect or a similar call could have changed it, or parts of it, to be writable, even modified them, and then perhaps changed them back to read-only. You'd then have to either checksum the section if the executable's format provides such data, or mmap the same binary somewhere else in memory and compare the sections.
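On Linux, one such native facility is the /proc/self/maps file, which lists every mapping of the current process together with its permissions. The sketch below assumes Linux with /proc mounted; mapping_perms is a hypothetical helper name. It reports the current protection of the page, which, as noted above, is not the same as proving that the address lies in .rodata.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: find the mapping that contains addr and copy its
 * permission string (e.g. "r--p") into perms. Returns 0 if a mapping
 * was found, -1 otherwise. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char *line = NULL;
    size_t cap = 0;
    int found = -1;
    uintptr_t target = (uintptr_t)addr;

    assert(perms != NULL);
    if (maps == NULL)
        return -1;

    /* Each line starts with "start-end perms", addresses in hex. */
    while (getline(&line, &cap, maps) != -1) {
        uintmax_t start, end;
        char p[5];
        if (sscanf(line, "%jx-%jx %4s", &start, &end, p) != 3)
            continue;
        if (target >= start && target < end) {
            memcpy(perms, p, sizeof p);
            found = 0;
            break;
        }
    }
    free(line);
    fclose(maps);
    return found;
}
```

An address whose permissions start with "r-" is readable but not writable at the moment of the call.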
But I smell a faint smell of an XY problem: what is it that you're actually trying to do? I mean, surely you don't just want to check if an address is in .rodata out of curiosity's sake. You must have some use for that information, and it is this application that would ultimately decide whether even doing this .rodata check should be on the radar. It may be, it may be not. Based on your question alone, it's a solid "who knows?"

C - implications of fflush(stdout)

Does fflush(stdout) do anything besides flushing the output buffer?
Or what does flushing the output buffer imply?
Because in a scheduler, I just resolved a segfault by throwing an fflush(stdout) into the context switch, even though for debugging purposes, all writes to stdout had been disabled, which - as far as I'm concerned - should have rendered any kind of flushing obsolete.
For output streams, fflush() with a non-null argument writes any unwritten data from the stream's buffer to the associated output device. How it does this is implementation-dependent.
"Because in a scheduler, I just resolved a segfault by throwing an fflush(stdout) into the context switch"
You can close stdout explicitly at the start of the program to verify your findings. Chances are good that the problem lies somewhere else, or that the stream implementation on your system is buggy.
Without seeing actual code, we can only speculate.
However, adding more code to a function can have an indirect effect of changing memory layout in your program. The nature of the change - if any - depends on what the function does (does it allocate memory, declare lots of variables, etc), how the operating system manages the executable code in memory while running it, etc.
Odds are, somewhere else in your code, there is an invalid operation (invalid operation with a pointer, etc) and the effect of the additional statement is simply changing the symptom, by changing what your program is doing with the affected memory.
I suppose it is remotely possible that there is a bug in fflush(). But I wouldn't bet on it - standard I/O functions like fflush() are used by a lot of people, and bugs in such functions in a library that has existed for a while (e.g. from a vendor that has released several versions) are likely to have been found, reported, and fixed long ago.

Understanding Buffering in C

I am having a really hard time understanding the depths of buffering especially in C programming and I have searched for really long on this topic but haven't found something satisfying till now.
I will be a little more specific:
I do understand the concept behind it (i.e. coordinating operations by different hardware devices and minimizing the difference in speed between these devices), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean full: the longer and deeper the better). It would also be really nice to get some concrete examples of how buffering is implemented in I/O streams.
The other question is that some rules of buffer flushing don't seem to be followed by my programs, odd as that sounds, as in the following simple fragment:
#include <stdio.h>
int main(void)
{
FILE * fp = fopen("hallo.txt", "w");
fputc('A', fp);
getchar();
fputc('A', fp);
getchar();
return 0;
}
The program is intended to demonstrate that pending input flushes an arbitrary stream as soon as the first getchar() is called, but this simply doesn't happen, however often I try it and however I modify the program. For stdout (with printf(), for example), on the other hand, the stream is flushed without any input being requested, which also contradicts the rule. Am I misunderstanding this rule, or is there something else to consider?
I am using Gnu GCC on Windows 8.1.
Update:
I forgot to ask that I read on some sites how people refer to e.g. string literals as buffers or even arrays as buffers; is this correct or am I missing something?
Please explain this point too.
The word buffer is used for many different things in computer science. In the more general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to the final destination (or other buffer).
As you hinted in the question there are many types of buffers, but as a broad grouping:
1. Hardware buffers: These are buffers where data is stored before being moved to a HW device. Or buffers where data is stored while being received from the HW device until it is processed by the application. This is needed because the I/O operation usually has memory and timing requirements, and these are fulfilled by the buffer. Think of DMA devices that read/write directly to memory, if the memory is not set up properly the system may crash. Or sound devices that must have sub-microsecond precision or it will work poorly.
2. Cache buffers: These are buffers where data is grouped before writing into/read from a file/device so that the performance is generally improved.
3. Helper buffers: You move data into/from such a buffer, because it is easier for your algorithm.
Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in real world). Then, if you do:
FILE *f = fopen("file.txt", "w");
for (int i=0; i < 1000000; ++i)
fputc('x', f);
fclose(f);
Without buffering, this code would take 1000000 * (1ms + 1us), that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us). That's just 1.1 seconds!
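The arithmetic can be written down as a tiny cost model. This is only a sketch of the made-up numbers used in this answer (1 ms per call plus 1 us per byte); write_cost_seconds is a hypothetical name, and real system calls do not follow such a simple formula.

```c
#include <assert.h>

/* Hypothetical cost model: every system call costs 1 ms of fixed
 * overhead plus 1 us per byte transferred. */
double write_cost_seconds(long total_bytes, long buffer_size)
{
    const double per_call = 1e-3;  /* fixed cost of one call, in seconds */
    const double per_byte = 1e-6;  /* cost of one byte, in seconds */

    assert(total_bytes >= 0 && buffer_size > 0);
    /* Number of calls needed, rounding up for a partially filled buffer. */
    long calls = (total_bytes + buffer_size - 1) / buffer_size;
    return calls * per_call + total_bytes * per_byte;
}
```

Under this model, write_cost_seconds(1000000, 1) is about 1001 seconds, while write_cost_seconds(1000000, 10000) is about 1.1 seconds: the per-byte cost is unchanged, but the per-call overhead is paid 100 times instead of a million times.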
Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!
About your problem with flushing, files are usually flushed just when closed or manually flushed. Some files, such as stdout are line-flushed, that is, they are flushed whenever a '\n' is written. Also the stdin/stdout are special: when you read from stdin then stdout is flushed. Other files are untouched, only stdout. That is handy if you are writing an interactive program.
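That an ordinary file stream keeps data in its own buffer until it is flushed can be observed by asking the kernel for the file size before and after fflush(). A minimal sketch, assuming a POSIX system (fstat, fileno) and a typical implementation where a temporary file is fully buffered; sizes_around_fflush is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write one character to a temporary file and
 * report the file size as seen by the kernel before and after fflush().
 * Returns 0 on success, -1 on failure. */
int sizes_around_fflush(long *before, long *after)
{
    struct stat st;
    FILE *fp = tmpfile();

    assert(before != NULL && after != NULL);
    if (fp == NULL)
        return -1;

    fputc('A', fp);                    /* lands in the stream's buffer */
    if (fstat(fileno(fp), &st) != 0)
        goto fail;
    *before = (long)st.st_size;

    if (fflush(fp) != 0)               /* hand the buffer to the kernel */
        goto fail;
    if (fstat(fileno(fp), &st) != 0)
        goto fail;
    *after = (long)st.st_size;

    fclose(fp);
    return 0;
fail:
    fclose(fp);
    return -1;
}
```

With a fully buffered stream the first size is expected to be 0 and the second 1, which matches the behavior of hallo.txt in the question: a getchar() on stdin does not touch the buffer of an unrelated file stream.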
My case #3 is for example when you do:
FILE *f = fopen("x.txt", "r");
char buffer[1000];
fgets(buffer, sizeof(buffer), f);
int n;
sscanf(buffer, "%d", &n);
You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but in other APIs there may not be an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...
Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.
The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.
One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large sized chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose the part you're not immediately ready to consume. Similar considerations apply to output.
As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.
The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".
Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.
So, please ask a more specific question. As a starting point, use Wikipedia; it is almost always helpful.
As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:
fflush() or an equivalent is called
The file is closed
The buffer becomes full.
Since none of these cases applies at the point where you call getchar(), the file is not flushed then. (Normal termination, i.e. calling exit() or returning from main, does close and therefore flush all open streams, but in your program that happens only after both getchar() calls.)
A buffer is simply a small area in memory (RAM) that stores information before it is sent to your program. As I type characters on the keyboard, they are stored in the buffer, and as soon as I press the Enter key they are transferred from the buffer to your program. With the help of the buffer, all these characters become available to your program at once (which prevents lag and slowness), and the program can then send them to the output display.

Will WriteFile() be atomic if the process is terminated but the system continues running?

If my process is terminated at a random moment but the operating system continues to run properly, will Windows guarantee that individual calls to WriteFile are atomic (a.k.a. all-or-nothing)?
Or can I get partial/torn writes?
Note: I am specifically NOT asking for advice on how to practice defensive coding.
This is strictly a question about the behavior of the Microsoft Windows operating system itself.
To be 100% perfectly crystal clear, we can and explicitly do trust the user code to behave sanely. There is no undefined behavior or anything of the sort. All process terminations are assumed to occur through a well-defined behavior such as unhandled exceptions or calls to TerminateProcess, not memory corruption, etc.
Also, specifically note that there are no C++ destructors to worry about here; this is C.
I hope that puts all the secondary concerns about the user code to rest.
WriteFile is certainly not atomic if your process is terminated while the call is executing; it is not even atomic if your process is not being killed.
Also, "all or nothing written" is not even a proper definition of an atomic write. All could be written, but intermingled with an independent write from another process. If writes are guaranteed to be atomic, there must be a guarantee (read as: lock) that this doesn't happen.
Apart from the fact that implementing proper atomicity would be considerable extra trouble with very little to gain for the average everyday user, you can also guess that WriteFile is not atomic from:
The absence of mention in the API documentation. You can bet that this would be prominently mentioned, as it is a really big, distinguishing feature.
The presence of the lpNumberOfBytesWritten parameter. A write might still fail (e.g. disk full) but if the function was guaranteed to be atomic, you would know that it either succeeded or failed, and you already know how many bytes you were going to write, so returning that number is unnecessary.
The presence of TxF. Although TxF does a lot more than just making single writes atomic, it is reasonable to assume that Microsoft wouldn't waste considerable time and money on implementing such a beast if "normal" filesystem operations already more or less worked like that anyway.
No other mainstream operating system that I know of gives such a guarantee. Linux does give a sort of atomicity guarantee on writev (but not on write) insofar as your writes will not be intermingled with writes from other processes. But that is not at all the same thing as guaranteeing atomicity in the presence of process termination.
However, overlapped writes on a handle opened with FILE_FLAG_NO_BUFFERING are technically atomic in respect of process termination (but not in respect of failure, such as disk full or in any other respect!). Saying so is admittedly a bit of a sophistry on an implementation detail, not an actual guarantee given by the operating system, but from a certain point of view it's certainly correct to say so.
A process that is performing an unbuffered, overlapped I/O operation cannot be terminated. That is because the OS is doing DMA transfers into that process' address space. Which of course means that the process cannot be terminated since the OS would reclaim the physical pages. The OS will therefore refuse to terminate a process while such an I/O operation is running.
You can verify this by firing off a couple of big unbuffered overlapped requests (a few GB) and try killing your process in Task Manager. It will only be killed when the I/O is complete (so, after some seconds). That comes as a big surprise when you see it happen for the first time and don't expect it!
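The point about lpNumberOfBytesWritten has a direct counterpart on the POSIX side, shown here only as an analogy using write() rather than WriteFile: because a single call may transfer fewer bytes than requested, code that needs the whole buffer written has to loop on the returned count. The helper name write_all is hypothetical, and the loop does not make the write atomic; a process killed between iterations leaves a partial write behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have
 * been accepted or an error occurs. Returns len on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* real error; `done` bytes were already written */
        }
        done += (size_t)n;      /* short write: continue with the remainder */
    }
    return (ssize_t)done;
}
```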

Call a userspace function from within a Linux kernel module

I'm programming a simple Linux character device driver to output data to a piece of hardware via I/O ports. I have a function which performs floating point operations to calculate the correct output for the hardware; unfortunately this means I need to keep this function in userspace since the Linux kernel doesn't handle floating point operations very nicely.
Here's a pseudo representation of the setup (note that this code doesn't do anything specific, it just shows the relative layout of my code):
Userspace function:
char calculate_output(char x){
double y = 2.5*x;
double z = sqrt(y);
char output = 0xA3;
if(z > 35.67){
output = 0xC0;
}
return output;
}
Kernelspace code:
unsigned i;
for(i = 0; i < 300; i++){
if(inb(INPUT_PORT) & NEED_DATA){
char seed = inb(SEED_PORT);
char output = calculate_output(seed);
outb(output, OUTPUT_PORT);
}
/* do some random stuff here */
}
I thought about using ioctl to pass in the data from the userspace function, but I'm not sure how to handle the fact that the function call is in a loop and more code executes before the next call to calculate_output occurs.
The way I envision this working is:
main userspace program will start the kernelspace code (perhaps via ioctl)
userspace program blocks and waits for kernelspace code
kernelspace program asks userspace program for output data, and blocks to wait
userspace program unblocks, calculates and sends data (ioctl?), then blocks again
kernelspace program unblocks and continues
kernelspace program finishes and notifies userspace
userspace unblocks and continues to next task
So how do I have the communication between kernelspace and userspace, and also have blocking so that I don't have the userspace continually polling a device file to see if it needs to send data?
A caveat: while fixed point arithmetic would work quite well in my example code, it is not an option in the real code; I require the large range that floating point provides and -- even if not -- I'm afraid rewriting the code to use fixed point arithmetic would obfuscate the algorithm for future maintainers.
I think the simplest solution would be to create a character device in your kernel driver, with your own file operations for a virtual file. Then userspace can open this device O_RDWR. You have to implement two main file operations:
read -- this is how the kernel passes data back up to userspace. This function is run in the context of the userspace thread calling the read() system call, and in your case it should block until the kernel has another seed value that it needs to know the output for.
write -- this is how userspace passes data into the kernel. In your case, the kernel would just take the response to the previous read and pass it on to the hardware.
Then you end up with a simple loop in userspace:
while (1) {
read(fd, buf, sizeof buf);
calculate_output(buf, output);
write(fd, output, sizeof output);
}
and no loop at all in the kernel -- everything runs in the context of the userspace process that is driving things, and the kernel driver is just responsible for moving the data to/from the hardware.
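A slightly more defensive version of one iteration of that userspace loop might look like the sketch below. Everything here is an assumption for illustration: the one-byte protocol, the name serve_one_request, and the body of calculate_output, which is only a stand-in with made-up thresholds for the question's floating-point routine. The function takes separate input and output descriptors so that it can be tried with a pair of pipes; with the real device, the same descriptor would be passed twice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for the question's floating-point calculation
 * (hypothetical formula and threshold). */
static unsigned char calculate_output(unsigned char seed)
{
    double y = 2.5 * seed;
    return (y > 100.0) ? 0xC0 : 0xA3;
}

/* Handle one request: block until the driver supplies a seed, compute
 * the answer, and hand it back. Returns 1 if a request was handled,
 * 0 on end of file, -1 on error. */
int serve_one_request(int in_fd, int out_fd)
{
    unsigned char seed, output;
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);
    do {
        n = read(in_fd, &seed, 1);      /* sleeps in the driver's read op */
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return (int)n;

    output = calculate_output(seed);

    do {
        n = write(out_fd, &output, 1);  /* passed on by the driver's write op */
    } while (n < 0 && errno == EINTR);
    return (n == 1) ? 1 : -1;
}
```

The main program would then call serve_one_request(fd, fd) in a loop until it returns something other than 1.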
Depending on what your "do some random stuff here" on the kernel side is, it might not be possible to do it quite so simply. If you really need the kernel loop, then you need to create a kernel thread to run that loop, and then have some variables along the lines of input_data, input_ready, output_data and output_ready, along with a couple of waitqueues and whatever locking you need.
When the kernel thread reads data, you put the data in input_data, set the input_ready flag, signal the input waitqueue, and then do wait_event(<output_ready is set>). The read file operation would do a wait_event(<input_ready is set>) and return the data to userspace when it becomes ready. Similarly, the write file operation would put the data it gets from userspace into output_data, set output_ready, and signal the output waitqueue.
Another (uglier, less portable) way is to use something like ioperm, iopl or /dev/port to do everything completely in userspace, including the low-level hardware access.
I would suggest that you move the code that does all the "heavy lifting" to user mode - that is, calculate all the 300 values in one go, and pass those to the kernel.
I'm not even sure you can let an arbitrary piece of code call user-mode from the kernel. I'm sure it's possible to do, because that's what for example "signal" does, but I'm far from convinced you can do it "any way you like" (and almost certainly, there are restrictions regarding, for example, what you can do in that function). It certainly doesn't seem like a great idea, and it would DEFINITELY be quite slow to call back to usermode many times.

Resources