Stream buffering issue - C

The mod_rewrite documentation states that it is a strict requirement to disable input/output buffering in a rewrite program.
Keeping that in mind, I've written a simple program (I do know that it lacks the EOF check, but this is not an issue and it saves one condition check per loop):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);

    int character;
    while (42)
    {
        character = getchar();
        if (character == '-')
        {
            character = '_';
        }
        putchar(character);
    }
    return 0;
}
After making some measurements I was shocked - it was over 9,000 times slower than the demo Perl script provided by the documentation:
#!/usr/bin/perl
$| = 1; # Turn off I/O buffering
while (<STDIN>) {
    s/-/_/g; # Replace dashes with underscores
    print $_;
}
Now I have two related questions:
Question 1. I believe that the streams may be line buffered since Apache sends a new line after each path. Am I correct? Switching my program to
setvbuf(stdin,NULL,_IOLBF,4200);
setvbuf(stdout,NULL,_IOLBF,4200);
makes it twice as fast as the Perl one. This should not hurt Apache's performance, should it?
Question 2. How can one write a program in C which uses unbuffered streams (like the Perl one) and performs as fast as the Perl one?

Question 1: You would have to look at the code. It could be line buffered, it could be using fflush at the end of each request (or block of requests), or it could be using write calls with a larger buffer. In any case, it won't be doing per-character I/O which is what your program is doing.
Question 2: I suspect the main issue is on output. If you were to assemble the entire result in a buffer and write that out as one call, then you would be faster. However, that just means you are doing the line buffering instead of having the library take care of it for you. The key is that with no buffering, each output call results in a system call - that is very high overhead. In theory, the same concept holds true on input but I'm not sure the implementation wouldn't notice the available characters and buffer them in any case. Same workaround though - read a larger buffer and then take it apart yourself.
Personally, I'd avoid all the setvbuf stuff and just do an fflush at the end of each request.
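For illustration, a minimal sketch of that approach, assuming Apache sends one path per line as the question suggests (the 4096-byte line size is an arbitrary choice of mine, not anything mod_rewrite mandates):

#include <stdio.h>

int main(void)
{
    char line[4096];

    /* Read one request (one path) per line, rewrite it, and flush
     * once per request instead of disabling buffering entirely. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        for (char *p = line; *p != '\0'; p++) {
            if (*p == '-')
                *p = '_';
        }
        fputs(line, stdout);
        fflush(stdout);   /* the per-request flush */
    }
    return 0;
}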

When writing to a terminal, stdout is flushed after every line, so you can always see the output right away. When writing to a file or, as in your case, a pipe, this automatic flush is disabled; usually in those cases performance is more important.
This causes problems when processes have to interact with each other. One program writes something, but it is not sent instantly: it is stored in a buffer. The second program waits for that data, while the first waits for more data from the second, resulting in a deadlock.
To avoid this, you need to flush all the output before waiting for additional input. A simple fflush(stdout) before every read operation should be enough. This is actually what $| = 1 does in Perl. Nothing needs to be done with stdin.
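As a sketch of that pattern, the question's per-character filter only needs the stream left block-buffered plus a flush before each potentially blocking read; everything here is standard C, nothing else is assumed:

#include <stdio.h>

int main(void)
{
    int c;

    for (;;) {
        fflush(stdout);          /* equivalent in spirit to Perl's $| = 1 */
        c = getchar();           /* may block waiting for the peer */
        if (c == EOF)
            break;
        putchar(c == '-' ? '_' : c);
    }
    return 0;
}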
If performance is critical and you need to operate on single bytes, read and write the data in big chunks using unbuffered read/write. For example:
#include <unistd.h>

int main(void)
{
    char buf[1024];
    ssize_t len;

    /* Read a chunk, transform it in place, write it back out.
     * read() returning 0 means end of file, so the loop terminates. */
    while ((len = read(0, buf, sizeof(buf))) > 0) {
        for (ssize_t i = 0; i < len; i++) {
            if (buf[i] == '-') {
                buf[i] = '_';
            }
        }
        write(1, buf, len);
    }
    return 0;
}


why does fread from stdin not stop [closed]

I am trying to read input from stdin with fread(). However, I am having a problem: the loop will not terminate and instead keeps reading.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "argument err");
        return -1;
    }
    FILE *in = fopen(argv[1], "w");
    if (in == NULL) {
        fprintf(stderr, "failed to open file");
        return -1;
    }
    char buffer[20];
    size_t ret;
    while ((ret = fread(buffer, 1, 20, stdin)) > 0) {
        if (fwrite(buffer, 1, ret, in) != ret) {
            if (ferror(in) != 0) {
                perror("write err:");
            }
        }
    }
    return 0;
}
How can I make this loop terminate when EOF is reached? I have tried using ctrl+D, but that just seems like a strange way to stop taking input.
I guess what I want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
How can I make this loop terminate when EOF is reached?
When do you think EOF is reached? Really. When you are providing input interactively, how is the system or the program supposed to know that you've entered all the data you want the program to consume?
I have tried using ctrl+D but that just seems like a strange way to stop taking input.
It is exactly the way to signal a soft EOF to a POSIX terminal. Since you want the loop to stop when EOF is encountered, it seems absolutely natural to me to use ctrl+D for the purpose when providing data interactively. That's not the only way you could signal the end of the input, but it has a lot going for it.
I guess what I want is to use fread() to read multiple arbitrary amounts of data in chunks of 20 bytes and then somehow stop.
Again: how is the program supposed to know when it has consumed all the "multiple arbitrary amounts" of data that you decide to provide on a given run? An EOF signal is an eminently reasonable choice for multiple reasons, and the way to deliver that from a POSIX terminal interface is ctrl+D.
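It is also worth noting that ctrl+D only matters for interactive input. When stdin is redirected from a file or a pipe, say ./a.out out.txt < in.txt (a.out standing in for whatever you named the binary), the fread() loop in the question ends on its own once the input is exhausted, because a real end of file is reached.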
As pointed out before, you are reading from a potentially endless stream; stdin does not naturally deliver an EOF (or <= 0) value on its own.
If you want your loop to terminate, you will have to add a termination condition of your own, such as a sentinel character, word, or value, and then use a break or a return when you see it. You could also check whether your terminal emulator supports inserting an EOF indication into stdin, which is pretty common (but very platform dependent).
ADD: On my system, a typical Linux, CTRL+D inserts an EOF indication into stdin. It seems you found this out yourself; if you want your program to know where to stop, you will need to use it.
You can also send a signal to your program, usually with a shortcut such as CTRL+C (SIGINT) or CTRL+T (SIGINFO, where available); note that CTRL+D is not itself a signal but an EOF indication. All sorts of signals can be sent by your system and/or your terminal emulator, and you just have to implement the corresponding signal receiver in your program.
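A minimal sketch of such a signal receiver, assuming POSIX sigaction() (the handler and flag names are mine, not from the question):

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t stop = 0;

static void on_sigint(int sig)
{
    (void)sig;
    stop = 1;
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART: let the read be interrupted */
    sigaction(SIGINT, &sa, NULL);

    char buffer[20];
    size_t ret;
    /* Copy stdin to stdout until EOF or until CTRL+C sets the flag. */
    while (!stop && (ret = fread(buffer, 1, sizeof buffer, stdin)) > 0)
        fwrite(buffer, 1, ret, stdout);

    return 0;
}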
How can I make this loop terminate when EOF is reached? I have tried using ctrl+D but that just seems like a strange way to stop taking input.
fread and fwrite are there to read data records, so they both take the number of records to read and the size of each record. If the available data doesn't fill a full record, you will not get the full record at all (indeed, the routines return the number of complete records read; a trailing partial record is not counted).
All the calls in the stdio.h package are buffered: the buffer holds data that has been read from the system but not yet consumed by the user, which makes me wonder why you are trying to use a buffer to read data that is already buffered.
End of file is detected when a fread() call results in a true end of file from the system (this normally requires two calls: the first completes the remaining data, the second gets no data, zero bytes, back from the system). Note that fread() never returns EOF; you have to distinguish two cases from the return count:
fread() returns a short count when it has read something, but not enough to complete the requested records.
fread() returns 0 when it has read nothing (the true end of file is reached).
As I've said above, fread() & fwrite() will read/write full records (this is useful when your data is a struct with a fixed length, but normally not when you can have extra data at the end)
The way to terminate the loop should be something like this (note that ret is a size_t, so testing >= 0 would always be true and the loop would never end):
while ((ret = fread(buffer, 1, 20, stdin)) > 0) {
    if (fwrite(buffer, 1, ret, in) != ret) {
        if (ferror(in) != 0) {
            perror("write err:");
        }
    }
}
/* here you can have up to 19 bytes left over that cannot
 * be read with that record length, but you can read them individually
 * with fgetc() calls. */
So, if you read half a record at end of file, only the next fread() will detect the end of file (by reading nothing) and let you leave the loop. (Beware that the extra data that doesn't fill a full record still needs to be read by other means.)
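As a sketch of handling that tail, reading with a size of 1 makes fread() report exactly how many bytes arrived, so full records and the final partial chunk can be told apart (the copy-to-stdout body is just a placeholder for real record processing):

#include <stdio.h>

int main(void)
{
    char buf[20];
    size_t n;

    /* Full 20-byte records, one per iteration. */
    while ((n = fread(buf, 1, sizeof buf, stdin)) == sizeof buf) {
        fwrite(buf, 1, sizeof buf, stdout);
    }
    /* Partial tail at end of file: fewer than 20 bytes. */
    if (n > 0) {
        fwrite(buf, 1, n, stdout);
    }
    return 0;
}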
The cheapest and easiest way to solve this problem (copying a file from one stream to another) is described in K&R (in the first edition), and no better code has been written to beat it:
int c;
while ((c = fgetc(in)) != EOF)
    fputc(c, out);
While it seems to read the characters one by one, it actually makes a call to read(2) to fill a whole buffer of data and returns just one character; the next characters are taken from the buffer, saving calls to read(). The same happens with fputc(): it fills the buffer until it is full, then flushes it in a single call to write(2).
Many people have tried to beat the code above, without any measurable gain in efficiency. So, my hint is to be simple; the world is complicated enough without forcing yourself to go complex.

How to use write() or fwrite() for writing data to terminal (stdout)?

I am trying to speed up my C program to spit out data faster.
Currently I am using printf() to give some data to the outside world. It is a continuous stream of data, therefore I am unable to use return(data).
How can I use write() or fwrite() to send the data to the console instead of a file?
Overall, my setup consists of a program written in C whose output goes to a Python script, where the data is processed further. I form a pipe:
./program_in_c | script_in_python
This gives an additional benefit on the Raspberry Pi by using more of the processor's cores.
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
write() writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd.
The standard output file descriptor is 1 (on Linux at least).
Consider flushing the stdout buffer before calling the write system call, to ensure that all previously buffered output has been written out first:
fflush(stdout); // Will now print everything in the stdout buffer
write(1, buf, count);
using fwrite:
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The function fwrite() writes nmemb items of data, each size bytes long, to the stream pointed to by stream, obtaining them from the location given by ptr.
fflush(stdout);
int buf[8];
fwrite(buf, sizeof(int), sizeof(buf) / sizeof(buf[0]), stdout); /* the third argument is the item count, not the byte count */
Please refer to the man pages for further reading, in the links below:
fwrite
write
Well, there is little or no win in trying to outdo the buffering system already used by the stdio.h package. If you use fwrite() with larger buffers, you'll probably gain no time and use more memory than necessary, as stdio.h selects a buffer size appropriate to the filesystem where the data is to be written.
A simple program like the following will show that speed is of no concern, as stdio is already buffering output.
#include <stdio.h>

int
main()
{
    int c;

    while ((c = getchar()) >= 0)
        putchar(c);
}
If you try the above and below programs:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    char buffer[512];
    int n;

    while ((n = read(0, buffer, sizeof buffer)) > 0)
        write(1, buffer, n);
    if (n < 0) {
        perror("read");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
You will see that there is no significant difference; the first program may even be faster, despite doing I/O on a per-character basis (it is how B. Kernighan & D. Ritchie wrote it in the first edition of "The C Programming Language"). Most probably the first program will win.
The calls to read() and write() involve a system call each, with a buffer size decided by you. The individual getchar() and putchar() calls don't: they just store the characters in a memory buffer whose size has been decided by the stdio.h library implementation, based on the filesystem, and flush that buffer once it is full of data. If you grow the buffer size in the second program, you'll see better results up to a point, but after that you'll see no further increase in speed. The number of calls made to the library is insignificant compared with the time spent doing the actual I/O, and selecting a very large buffer eats memory from your system (and a Raspberry Pi's memory is limited in this sense, to 1 GB of RAM). If you end up swapping because of an overly large buffer, you'll lose the battle completely.
Most filesystems have a preferred buffer size, because the kernel does read-ahead (on sequential reads, the kernel reads more than you asked for, anticipating that you'll continue reading after you have consumed the data), and this affects the optimum buffer size. The stat(2) system call reports that preferred size (the st_blksize field), and stdio uses it when selecting the actual buffer size.
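A minimal sketch of checking that value yourself, assuming a POSIX system (fstat(2) fills in st_blksize for whatever stdout is connected to):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* Ask the kernel for the preferred I/O block size of stdout. */
    if (fstat(fileno(stdout), &st) == 0)
        printf("preferred block size: %ld bytes\n", (long)st.st_blksize);
    else
        perror("fstat");
    return 0;
}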
Don't think you are going to get better (or much better) than the program listed first above. Even if you use large enough buffers.
What is not correct (or valid) is to intermix calls that buffer (like everything in the stdio package) with raw system calls like read(2) or write(2), which do not buffer the data. I have seen recommendations to use fflush(3) together with write(2), which is totally incoherent: there is nothing to gain, and you will probably get your output incorrectly ordered if you do part of the calls using printf(3) and part using write(2). This happens more in pipelines like the one you plan, because there the buffers are not line oriented (another characteristic of buffered output in stdio).
Finally, I recommend you read "The UNIX Programming Environment" by Brian Kernighan and Rob Pike. It will teach you a lot about Unix, and one very good thing is that it will teach you to use the stdio package and the Unix filesystem calls for reading and writing properly. With a little luck you'll find it as a .pdf on the internet.
The next program shows you the effect of buffering:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    int i;
    char *sep = "";

    for (i = 0; i < 10; i++) {
        printf("%s%d", sep, i);
        sep = ", ";
        sleep(1);
    }
    printf("\n");
}
One would assume you are going to see the program write the numbers 0 to 9 on the terminal, separated by ", " and paced at one-second intervals.
But due to the buffering, what you observe is quite different: your program waits 10 seconds without writing anything at all to the terminal, and at the end writes everything in one shot, including the final line end, as the program terminates and the shell shows you the prompt again.
If you change the program to this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main()
{
    int i;
    char *sep = "";

    for (i = 0; i < 10; i++) {
        printf("%s%d", sep, i);
        fflush(stdout);
        sep = ", ";
        sleep(1);
    }
    printf("\n");
}
You'll see the expected output, because you have told stdio to flush the buffer at each loop pass. In both programs you made 10 calls to printf(3), but in the first there was only one write(2) at the end to write the full buffer. In the second version you forced stdio to do one such write(2) after each printf, and that pushed the data out as the program passed through the loop.
Be careful, because another characteristic of stdio can confound you: when printing to a terminal device, printf(3) flushes the output at each \n, but when you run it through a pipe, it flushes only when the buffer fills up completely. This saves system calls (in FreeBSD, for example, the buffer size selected by stdio is around 32 KB, large enough to be optimal; you'll not do better going above that size).
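A minimal sketch for observing which default applies, assuming POSIX isatty(3) (the messages go to stderr, which is unbuffered, so they appear even when stdout is piped):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* stdio's default buffering decision depends on where stdout goes. */
    if (isatty(fileno(stdout)))
        fprintf(stderr, "stdout is a terminal: line-buffered by default\n");
    else
        fprintf(stderr, "stdout is a pipe or file: block-buffered by default\n");
    return 0;
}

Run it directly and then piped through cat to see both answers.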
The console output in C works almost the same way as a file. Once you have included stdio.h, you can write to the console output, named stdout (for "standard output"). In the end, the following statement:
printf("hello world!\n");
is the same as:
char str[] = "hello world\n";
fwrite(str, sizeof(char), sizeof(str) - 1, stdout);
fflush(stdout);

Loop until user inputs an Integer, but skip the read in if no integer exists on the terminal

I'm trying to create an application that will count pulses from an oscilloscope; my problem is this:
I need a loop to run constantly until a user input is entered. However, I don't want getch() to be called unless there is input on the terminal ready to be read. How would I go about checking whether a character or integer exists on the terminal?
If you're coding for UNIX, you'll need to use either poll(2) or select(2).
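For example, a minimal sketch using select(2) with a zero timeout, so the oscilloscope loop is never blocked; on a terminal in canonical mode input only becomes ready after the user presses Enter, and the ESC-to-quit choice mirrors the _kbhit() example below:

#include <sys/select.h>
#include <unistd.h>

/* Returns nonzero when a read on stdin would not block. */
static int input_ready(void)
{
    fd_set fds;
    struct timeval tv = {0, 0};        /* poll: do not wait at all */

    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);
    return select(STDIN_FILENO + 1, &fds, NULL, NULL, &tv) > 0;
}

int main(void)
{
    for (;;) {
        /* ... oscilloscope work runs freely here ... */
        if (input_ready()) {
            char c;
            if (read(STDIN_FILENO, &c, 1) <= 0 || c == 27)   /* EOF or ESC quits */
                break;
        }
    }
    return 0;
}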
Since you mention the non-standard getch you might also have kbhit which tests if there is input waiting. But don't slug your oscilloscope with that every loop: you can ease the flow by checking occasionally.
#include <conio.h>
#include <stdio.h>

#define POLLMASK 0xFFFF

int main(void) {
    int poll = 0;
    int ch = 0;

    while (ch != 27) {
        // ... oscilloscope details
        if ((poll++ & POLLMASK) == 0 && _kbhit()) {
            ch = _getch();
            // ... do something with input
        }
    }
    return 0;
}
Use scanf(). It's standard, it already waits for a character in the buffer, and it's pretty much universal.
EDIT:
In the case of an unnamed OS, this is simply not to be done. Standard C has no way of reading raw data from a terminal and guaranteeing verbatim, unformatted results. It's entirely up to your terminal handler.
However, if your microcontroller has some sort of serial library, I would suggest doing something like this:
if the number of characters in the buffer is greater than 0, and the character just read is not a terminating character, report it; otherwise, keep waiting for a character in the buffer.
"until a user input is entered" - this sentence indicates that the user will eventually use the keyboard; therefore, you can track keyboard callback events to capture the user's input. There are several libraries that provide such keyboard events (e.g. GLUT, SDL).

Multiple processes accessing the same file

Is it all right for multiple processes to access (write to) the same file at the same time? Using the following code, it seems to work, but I have my doubts.
The use case in this instance is an executable that gets called every time an email is received and logs its output to a central file.
if (freopen(console_logfile, "a+", stdout) == NULL || freopen(error_logfile, "a+", stderr) == NULL) {
    perror("freopen");
}
printf("Hello World!");
This is running on CentOS and compiled as C.
Using the C standard IO facility introduces a new layer of complexity; the file is modified solely via the write(2) family of system calls (or memory mappings, but that's not used in this case) -- the C standard IO wrappers may postpone writing to the file for a while and may not submit complete requests in one system call.
The write(2) call itself should behave well:
[...] If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all file systems are POSIX conforming.
Thus your underlying write(2) calls will behave properly.
For the higher-level C standard IO streams, you'll also need to take care of the buffering. The setvbuf(3) function can be used to request unbuffered output, line-buffered output, or block-buffered output. The default behavior changes from stream to stream -- if standard output and standard error are writing to the terminal, then they are line-buffered and unbuffered by default. Otherwise, block-buffering is the default.
You might wish to manually select line-buffered if your data is naturally line-oriented, to prevent interleaved data. If your data is not line-oriented, you might wish to use un-buffered or leave it block-buffered but manually flush the data whenever you've accumulated a single "unit" of output.
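A minimal sketch of the line-buffered choice, assuming the log is line-oriented ("central.log" is an illustrative name, not from the question):

#include <stdio.h>

int main(void)
{
    if (freopen("central.log", "a+", stdout) == NULL) {
        perror("freopen");
        return 1;
    }
    /* Must be set before any other operation on the reopened stream. */
    setvbuf(stdout, NULL, _IOLBF, BUFSIZ);

    printf("one complete log line\n");   /* flushed at the newline */
    return 0;
}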
If you are writing more than BUFSIZ bytes at a time, your writes might become interleaved. The setvbuf(3) function can help prevent the interleaving.
It might be premature to talk about performance, but line-buffering is going to be slower than block buffering. If you're logging near the speed of the disk, you might wish to take another approach entirely to ensure your writes aren't interleaved.
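If you do step down to the system-call level, a sketch of the atomic-append approach quoted above looks like this (the file name and message are illustrative):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* With O_APPEND, the seek-to-end and the write are one atomic step,
     * so concurrent writers do not overwrite each other's lines. */
    int fd = open("central.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;

    const char line[] = "mail received\n";
    write(fd, line, sizeof line - 1);   /* one complete line per call */

    close(fd);
    return 0;
}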
This answer was incorrect; it does work. The original reasoning is kept below:
So the race condition would be:
process 1 opens it for append, then
later process 2 opens it for append, then
later still 1 writes and closes, then
finally 2 writes and closes.
I'd be impressed if that 'worked', because it isn't clear to me what working should mean. I assume 'working' means all of the bytes written by the two processes are in the log file? I'd expect that they both write starting at the same byte offset, so one will replace the other's bytes. It will all be okay up to and including step 3, and only show up as a problem at step 4. Seems like an easy test to write: open, getchar ... write, close.
Is it critical that they can have the file open simultaneously? A more obvious solution, if the write is quick, is to open the file exclusively.
For a quick check on your system, try:
/* write the first command line argument to a file called foo
 * stackoverflow topic 9880935
 */
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main (int argc, const char * argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Error: need some text to write to the file Foo\n");
        exit(1);
    }
    FILE* fp = freopen("foo", "a+", stdout);
    if (fp == NULL) {
        perror("Error failed to open file\n");
        exit(1);
    }
    fprintf(stderr, "Press a key to continue\n");
    (void) getchar(); /* Yes, I really mean to ignore the character */
    if (printf("%s\n", argv[1]) < 0) {
        perror("Error failed to write to file: ");
        exit(1);
    }
    fclose(fp);
    return 0;
}

Opening a file in C through a process

I am trying to create a program that does the following actions:
Open a file and read one line.
Open another file and read another line.
Compare the two lines and print a message.
This is my code:
#include <stdio.h>
#include <string.h>

int findWord(char sizeLineInput2[512]);

int main()
{
    FILE *cfPtr2, *cfPtr1;
    int i;
    char sizeLineInput1[512], sizeLineInput2[512];

    cfPtr2 = fopen("mike2.txt", "r");
    // I open the first file
    while (fgets(sizeLineInput2, 512, cfPtr2) != NULL)
    // I read one line from the first file
    {
        if (sizeLineInput2[strlen(sizeLineInput2) - 1] == '\n')
            sizeLineInput2[strlen(sizeLineInput2) - 1] = '\0';
        printf("%s \n", sizeLineInput2);
        i = findWord(sizeLineInput2);
        // I call the procedure that compares the two lines
    }
    getchar();
    return 0;
}

int findWord(char sizeLineInput2[512])
{
    int x;
    char sizeLineInput1[512];
    FILE *cfPtr1;    // was "File *cfPtr1;" - the type is FILE, in capitals

    cfPtr1 = fopen("mike1.txt", "r");
    // here I open the second file
    while (fgets(sizeLineInput1, 512, cfPtr1) != NULL)
    {
        if (sizeLineInput1[strlen(sizeLineInput1) - 1] == '\n')
            sizeLineInput1[strlen(sizeLineInput1) - 1] = '\0';
        if (strcmp(sizeLineInput1, sizeLineInput2) == 0)
            // Here, I compare the two lines
            printf("the words %s and %s are equal!\n", sizeLineInput1, sizeLineInput2);
        else
            printf("the words %s and %s are not equal!\n", sizeLineInput1, sizeLineInput2);
    }
    fclose(cfPtr1);
    return 0;
}
It seems to have some problem with file pointer handling. Could someone check it and tell me what corrections I need to make?
Deconstruction and Reconstruction
The current code structure is, to be polite about it, cock-eyed.
You should open the files in the same function - probably main(). There should be two parallel blocks of code. In fact, ideally, you'd do your opening and error handling in a function so that main() simply contains:
FILE *cfPtr1 = file_open("mike1.txt");
FILE *cfPtr2 = file_open("mike2.txt");
If control returns to main(), the files are open, ready for use.
You then need to read a line from each file - in main() again. If either file does not contain a line, then you can bail out with an appropriate error:
if (fgets(buffer1, sizeof(buffer1), cfPtr1) == 0)
    ...error: failed to read file1...
if (fgets(buffer2, sizeof(buffer2), cfPtr2) == 0)
    ...error: failed to read file2...
Then you call your comparison code with the two lines:
findWord(buffer1, buffer2);
You need to carefully segregate the I/O operations from the actual processing of data; if you interleave them as in your first attempt, it makes everything very messy. I/O tends to be messy, simply because you have error conditions to deal with - that's why I shunted the open operation into a separate function (doubly so since you need to do it twice).
You could decide to wrap the fgets() call and error handling up in a function, too:
const char *file1 = "mike1.txt";
const char *file2 = "mike2.txt";
read_line(cfPtr1, file1, buffer1, sizeof(buffer1));
read_line(cfPtr2, file2, buffer2, sizeof(buffer2));
That function can trim the newline off the end of the string and deal with anything else that you want it to do - and report an accurate error, including the file name, if anything goes wrong. Clearly, with the variables 'file1' and 'file2' on hand, you'd use those instead of literal strings in the file_open() calls. Note, too, that making them into variables means it is trivial to take the file names from the command line; you simply set 'file1' and 'file2' to point to the argument list instead of the hard-wired defaults. (I actually wrote: const char file1[] = "mike1.txt"; briefly - but then realized that if you handle the file names via the command line, then you need pointers, not arrays.)
Also, if you open a file, you should close the file too. Granted, if your program exits, the o/s cleans up behind you, but it is a good discipline to get into. One reason is that not every program exits (think of the daemons running services on your computer). Another is that you quite often use a resource (file, in the current discussion) briefly and do not need it again. You should not hold resources in your program for longer than you need them.
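A minimal sketch of those two helpers, under the assumption that any failure should simply end the program (the signatures are suggestions, not a fixed API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Open a file for reading, or report the name and bail out. */
static FILE *file_open(const char *name)
{
    FILE *fp = fopen(name, "r");
    if (fp == NULL) {
        fprintf(stderr, "failed to open %s\n", name);
        exit(EXIT_FAILURE);
    }
    return fp;
}

/* Read one line, trim the trailing newline, or report and bail out. */
static void read_line(FILE *fp, const char *name, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, fp) == NULL) {
        fprintf(stderr, "failed to read a line from %s\n", name);
        exit(EXIT_FAILURE);
    }
    buffer[strcspn(buffer, "\n")] = '\0';   /* safe even for an empty string */
}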
Philosophy
Polya, in his 1957 book "How To Solve It", has a dictum:
Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.
That is as valid advice in programming as it is in mathematics. And in their classic 1978 book 'The Elements of Programming Style', Kernighan and Plauger make the telling statements:
[The] subroutine call permits us to summarize the irregularities in the argument list [...]
The subroutine itself summarizes the regularities of the code.
In more modern books such as 'The Pragmatic Programmer' by Hunt & Thomas (1999), the dictum is translated into a snappy TLA:
DRY - Don't Repeat Yourself.
If you find your code doing the 'same' lines of code repeated several times, write a subroutine to do it once and call the subroutine several times.
That is what my suggested rewrite is aiming at.
In both main() and findWord(), be careful with sizeLineInput2[strlen(sizeLineInput2) - 1] right after fgets(): fgets() does store a terminating '\0', but if the line it returns happens to start with a '\0' byte, strlen() is 0 and the index becomes -1, reading before the start of the buffer.
Instead of using fgets use fgetc to read char by char and check for a newline character (and for EOF too).
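As a sketch of that suggestion (the helper name get_line is mine):

#include <stdio.h>

/* Read one line character by character, stopping at newline or EOF,
 * and always NUL-terminate within the given size. */
static size_t get_line(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    while (n + 1 < size && (c = fgetc(fp)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;   /* length, without the newline */
}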
UPD to your UPD: you compare each line of mike2.txt with every line of mike1.txt - I guess that's not what you want. Open both files outside the while loop in main(), use one loop for both files, and check for newline and EOF on both of them in that loop.
