I/O methods in C

I am looking for various ways of reading/writing data from stdin/stdout. Currently I know about scanf/printf, getchar/putchar and gets/puts. Are there any other ways of doing this? I am also interested in knowing which one is most efficient in terms of time and memory.
Thanks in advance.

fgets()
fputs()
read()
write()
And others; details can be found here: http://www.cplusplus.com/reference/clibrary/cstdio/
As for the efficiency question, take a look at this: http://en.wikipedia.org/wiki/I/O_bound

Stdio is designed to be fairly efficient no matter which way you prefer to read data. If you need to do character-by-character reads and writes, they usually expand to macros which just access the buffer, except when it's full/empty. For line-by-line text I/O, use puts/fputs and fgets. (But NEVER use gets, because there's no way to control how many bytes it will read!) The printf family (e.g. fprintf) is of course extremely useful for text, because it allows you to skip constructing a temporary buffer in memory before writing (and thus lets you avoid thinking about all the memory allocation, overflow, etc. issues). fscanf tends to be much less useful, but mostly because it's difficult to use. If you study the documentation for fscanf well and learn how to use %[, %n, and the numeric specifiers, it can be very powerful!
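For example, here is a minimal sketch of what %[ and %n can do (the format and field widths are just illustrative):

#include <stdio.h>

int main(void)
{
    /* Parse a line like "key = some value" from stdin.
       %63[^= ] reads up to 63 characters that are neither '=' nor space,
       %n records how many characters have been consumed so far. */
    char key[64], value[256];
    int consumed = 0;

    if (fscanf(stdin, " %63[^= ] = %255[^\n]%n", key, value, &consumed) == 2)
        printf("key='%s' value='%s' (%d characters consumed)\n",
               key, value, consumed);
    else
        fprintf(stderr, "input did not match\n");
    return 0;
}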
For large blocks of text (e.g. loading a whole file into memory) or binary data, you can also use the fread and fwrite functions. You should always pass 1 for the size argument and the number of bytes to read/write for the count argument; otherwise it's impossible to tell from the return value how much was successfully read or written.
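A rough sketch of that convention, copying a file to stdout in 4 KiB blocks (the file name is illustrative); with size 1, the return value is an exact byte count even on a short read at end of file:

#include <stdio.h>

int main(void)
{
    char buf[4096];
    size_t n;
    FILE *in = fopen("input.bin", "rb");
    if (!in)
        return 1;

    /* size = 1, count = sizeof buf: n is the number of bytes read */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, stdout);

    fclose(in);
    return 0;
}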
If you're on a reasonably POSIX-like system (pretty much anything), you can also use the lower-level I/O functions open, read, write, etc. These are NOT part of the C standard but part of POSIX, and non-POSIX systems usually provide the same functions, possibly with slightly different behavior (for example, file descriptors may not be numbered sequentially 0, 1, 2, ... as POSIX would require).
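For comparison, roughly the same copy loop written with the POSIX calls (the file name is again illustrative, and short writes are ignored for brevity):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;
    int fd = open("input.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    /* Each read()/write() is a system call; there is no stdio buffering. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, n);

    close(fd);
    return 0;
}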

If you're looking for immediate-mode type stuff, don't forget about curses (more applicable on the *NIX side, but also available on Windows).

Related

Pushing characters back to stdin in C

Say the input stream (stdin) has "abc" on it. I want to push back, say, 3 '*' chars to stdin to get something like "***abc". I was trying to use ungetc() for this (I did ungetc('*', stdin)), but I realized that it guarantees only 1 character of pushback, after which it may fail. Is there any other way I could push 3 (or any known N) characters back to stdin?
There is no portable way to accomplish this.
However, most implementations of the standard C library will allow multiple pushbacks, within reason. So in practice, it may not be a problem.
If you need an absolute guarantee, you'd need to write your own stdio implementation. That's certainly possible, since there are open source implementations which you could modify. But it's a lot of work. Alternatively, you could use the FreeBSD library, if it is available for your platform, since it does guarantee the possibility of repeated ungetc calls. (As far as I know, the GNU implementation also allows arbitrary ungetc calls. But the documentation doesn't guarantee that.)
Some libraries include non-standard interfaces like GNU's fopencookie, which let you create stdio streams with custom low-level read and write functions. Unfortunately, these do not help with this particular use case, which requires the ability to customise the implementation of stdio buffers. So that's a dead-end; I only mention it because it might seem plausible at first glance.
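If the pushback is only needed on your own side of the stream, one common workaround is to wrap the FILE * in a small structure that keeps its own pushback stack and to read only through that wrapper. A minimal sketch (the names and the stack size are made up):

#include <stdio.h>

struct pb_stream {
    FILE *fp;
    unsigned char stack[64];   /* our own pushback storage */
    int top;                   /* number of pushed-back characters */
};

int pb_getc(struct pb_stream *s)
{
    if (s->top > 0)
        return s->stack[--s->top];
    return fgetc(s->fp);
}

int pb_ungetc(struct pb_stream *s, int c)
{
    if (c == EOF || s->top >= (int)sizeof s->stack)
        return EOF;
    s->stack[s->top++] = (unsigned char)c;
    return c;
}

int main(void)
{
    struct pb_stream s = { stdin, {0}, 0 };

    for (int i = 0; i < 3; i++)        /* push back three '*' characters */
        pb_ungetc(&s, '*');

    /* With "abc" on stdin this prints "***abc". */
    for (int c; (c = pb_getc(&s)) != EOF && c != '\n'; )
        putchar(c);
    putchar('\n');
    return 0;
}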

fgetc vs getline or fgets - which is most flexible

I am reading data from a regular file and I was wondering which would allow for the most flexibility.
I have found that fgets and getline both read in a line (one with a maximum number of characters, the other with dynamic memory allocation). In the case of fgets, if the length of the line is bigger than the given size, the rest of the line is not read but remains buffered in the stream. With getline, I am worried that it may attempt to allocate a large block of memory for an obscenely long line.
The obvious solution for me seems to be turning to fgetc, but this comes with the problem that there will be many calls to the function, thereby resulting in the read process being slow.
Is this compromise between flexibility and efficiency unavoidable in either case, or can it be worked around?
The three functions you mention do different things:
fgetc() reads a single character from a FILE * stream. It buffers input, so you can process the file character by character without the overhead of making a system call for each one. When your problem can be handled in a character-oriented way, it is the best choice.
fgets() reads a single line from a FILE * stream; it's like calling fgetc() repeatedly to fill the character array you pass to it, in order to read line by line. It has the drawback of making a partial read when your input line is longer than the buffer size you specify. This function also buffers input data, so it is very efficient. If you know that your lines will be bounded, this is the best way to read your data line by line. Sometimes you need to process data with no bound on the line size, and you must redesign your problem to fit the available memory; then the one below is probably the better choice.
getline(): this function is relatively new and is not ANSI C, so it is possible you'll port your program to some platform that lacks it. It's the most flexible, at the price of being the least efficient. It requires a reference to a pointer that is realloc()ated to hold more and more data. It doesn't bound the line length, at the cost of possibly consuming all the memory available on the system. Both the buffer pointer and the buffer size are passed by reference so they can be updated, so you always know where the string is located and how large the buffer is. The buffer must be free()d after use.
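A typical calling pattern on systems that provide getline() (it is in POSIX.1-2008) might look like this sketch:

#define _POSIX_C_SOURCE 200809L   /* ask for getline() on glibc and friends */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;    /* getline() allocates and grows this buffer */
    size_t cap = 0;       /* current capacity, updated by getline() */
    ssize_t len;

    while ((len = getline(&line, &cap, stdin)) != -1)
        printf("%zd bytes: %s", len, line);

    free(line);           /* a single free() after the loop */
    return 0;
}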
The reason for having three functions and not just one is that you have different needs in different cases, and selecting the most efficient one for each case is normally the best choice.
If you plan to use only one of them, you'll probably end up in situations where the function you selected as the most flexible is not the best choice, and you will probably run into trouble.
Much is case dependent.
getline() is not part of the standard C library. Its behavior may differ depending on the implementation and which other standards it follows - thus an advantage for the standard fgetc()/fgets().
... case between flexibility and efficiency unavoidable, ...
OP is missing the higher priorities.
Functionality - If code cannot function right with the selected function, why use it? Example: fgets() and reading null characters create issues.
Clarity - without clarity, feel the wrath of the poor soul who later has to maintain the code.
would allow for the most flexibility. (?)
fgetc() allows for the most flexibility at the low level - yet helper functions using it to read lines tend to fail corner cases.
fgets() allows for the most flexibility at mid level - still have to deal with long lines and those with embedded null characters, but at least the low level of slogging in the weeds is avoided.
getline() is useful when high portability is not needed and the risk of letting input overwhelm resources is not a concern.
For robust handling of user/file input when reading a line, create a wrapping function (e.g. int my_read_line(size_t size, char *buf, FILE *f)) and call that, and only that, in user code. Then when issues arise, they can be handled locally, regardless of the low-level input function selected.
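One possible sketch of such a wrapper, built on fgets(); the policy for overlong lines and the return values here are illustrative choices, not the only reasonable ones:

#include <stdio.h>
#include <string.h>

/* Read one line into buf, stripping the newline.
   Returns the line length, or -1 on EOF/error.  If the line does not
   fit, the rest of it is read and discarded so the next call starts
   on a fresh line. */
int my_read_line(size_t size, char *buf, FILE *f)
{
    if (size == 0 || fgets(buf, (int)size, f) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';          /* strip the trailing newline */
    } else {
        int c;                      /* overlong line: drain the remainder */
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }
    return (int)len;
}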

Overall efficiency of fprintf and stdout

I have a program that regularly writes to stdout. Something like this:
fprintf(stdout, ...);
fprintf(stdout, ...);
fprintf(stdout, ...);
This makes the program easy to read but I'm curious to know how efficient is it compared to concatenating strings to some char[] and then calling a single fprintf(stdout...) on that char[]. By efficiency, I'm referring to processing efficiency.
The whole of stdio.h is notoriously slow, as are writes to the screen or files in general. What makes stdio.h particularly bad is that it's a cumbersome wrapper around the underlying OS API. printf/scanf-like functions have a horrible interface, forcing them to deal with both format string parsing and variable argument lists before they can even pass along the data to the function doing the actual work.
Combining those fprintf calls into a single one will almost certainly improve performance. But then that depends on how you "concatenate strings": if it is done with sprintf, then you have only moved all the calling/parsing overhead from one icky stdio.h function to another.
The only reason you would ever use stdio.h is if you need to create very portable console and file I/O code. Otherwise, you'd call the OS API directly.
That being said, you should only manually optimize code when there is a need for it. If the program runs "fast enough" without any known bottlenecks, then leave it be and strive to write code that is as readable as possible.
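For reference, the batching approach the question describes might look something like this sketch (buffer size and content are illustrative); note that snprintf still parses a format string on every call, so the saving is mainly in the number of stream operations:

#include <stdio.h>

int main(void)
{
    /* Collect several formatted pieces into one buffer, then write once. */
    char out[1024];
    size_t used = 0;

    for (int i = 0; i < 10; i++) {
        int n = snprintf(out + used, sizeof out - used, "line %d\n", i);
        if (n < 0 || (size_t)n >= sizeof out - used)
            break;                 /* buffer full: flush or grow in real code */
        used += (size_t)n;
    }
    fwrite(out, 1, used, stdout);  /* one stream call instead of ten */
    return 0;
}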
There are three bottlenecks I know of that can cause slow performance when you call fprintf(stdout, ...):
Format parsing
Buffering
Your Terminal or other stdout device
To avoid the format parsing, you could write using fwrite(), but then you have to create the output string some other way, and whether that is faster is questionable.
Normally, stdout is line buffered. This means the data has to be scanned for '\n' characters and, assuming you are running on an OS, a syscall is made for every line. Syscalls are relatively slow compared to normal function calls. Switching the stream to full buffering with setvbuf() and _IOFBF is probably the fastest buffering method. Use BUFSIZ or try different buffer sizes and benchmark them to find the best value.
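A sketch of that setup (the 64 KiB buffer size here is just one value worth benchmarking):

#include <stdio.h>

int main(void)
{
    /* Switch stdout to full buffering; this must happen before the
       first write to the stream.  The buffer must outlive its use,
       hence static storage. */
    static char buf[1 << 16];
    if (setvbuf(stdout, buf, _IOFBF, sizeof buf) != 0)
        fprintf(stderr, "setvbuf failed\n");

    for (int i = 0; i < 100000; i++)
        fprintf(stdout, "line %d\n", i);

    return 0;   /* stdout is flushed automatically at normal exit */
}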
When your terminal is slow, there is nothing you can do about it in your program. You could write to a file, which can be faster, or use a faster terminal. AFAIK Alacritty is probably the fastest terminal on Linux.

How efficient is reading a file one byte at a time in C?

After going through most of the book, "The C Programming Language," I think I have a decent grasp on programming in C. One common C idiom presented in that book is reading a file a single byte at a time, using functions like getchar() and fgetc(). So far, I've been using these functions to do all IO in my C programs.
My question is, is this an efficient way of reading a file? Does a call to get a single byte require a lot of overhead that can be minimized if I read multiple bytes into a buffer at a time, for instance by using the read() system call on Unix systems? Or do the operating system and C library handle a buffer behind the scenes to make it more efficient? Also, does this work the same way for writing to files a single byte at a time?
I would like to know how this generally works in C, but if it is implementation or OS specific, I would like to know how it works in GCC on common Unix-like systems (like macOS and linux).
Using getchar() etc is efficient because the standard I/O library uses buffering to read many bytes at once (saving them in a buffer) and doles them out one at a time when you call getchar().
Using read() to read a single byte at a time is much slower, typically, because it makes a full system call each time. It still isn't catastrophically slow, but it is nowhere near as fast as reading 512, or 4096, bytes into a buffer.
Those are broad, sweeping statements. There are many caveats that could be added, but they are a reasonable general outline of the performance of getchar(), etc.
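A rough illustration of the two approaches (the file name is made up; timing both on your own system is the real test):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Count newlines two ways: buffered stdio vs. one read() syscall per byte. */
long count_stdio(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;
    long lines = 0;
    int c;
    while ((c = getc(fp)) != EOF)    /* buffered: few actual syscalls */
        if (c == '\n') lines++;
    fclose(fp);
    return lines;
}

long count_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    long lines = 0;
    char c;
    while (read(fd, &c, 1) == 1)     /* one syscall per byte: much slower */
        if (c == '\n') lines++;
    close(fd);
    return lines;
}

int main(void)
{
    printf("stdio: %ld lines\n", count_stdio("data.txt"));
    printf("read : %ld lines\n", count_read("data.txt"));
    return 0;
}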

Parsing: load into memory or use stream

I'm writing a little parser and I would like to know the advantages and disadvantages of the different ways to load the data to be parsed. The two ways that I thought of are:
Load the file's contents into a string then parse the string (access the character at an array position)
Parse as reading the file stream (fgetc)
The former will allow me to have two functions, parse_from_file and parse_from_string; however, I believe this approach will take up more memory. The latter does not have that disadvantage of using more memory.
Does anyone have any advice on the matter?
Reading the entire file in or memory mapping it will be faster, but may cause issues if you want your language to be able to #include other files as these would be memory mapped or read into memory as well.
The stdio functions would work well because they usually try to buffer up data for you; being general purpose, they also watch for usage patterns that differ from reading a file from start to finish, but that shouldn't be too much overhead.
A good balance is to have a large circular buffer (x * 2 * 4096 is a good size) which you load with file data and then have your tokenizer read from. Whenever a block's worth of data has been passed to your tokenizer (and you know that it is not going to be pushed back) you can refill that block with new data from the file and update some buffer location info.
Another thing to consider is whether there is any chance that the tokenizer will ever need to read from a pipe or from a person typing text directly. In those cases your reads may return less data than you asked for without being at the end of the file, and the buffering method I mentioned above gets more complicated. The stdio buffering is good for this, as it can easily be switched to/from line or block buffering (or no buffering).
Using flex (fast lex, but not the Adobe Flash thing) or something similar can greatly ease the trouble with all of this. You should look into using it to generate the C code for your tokenizer (lexical analysis).
Whatever you do you should try to make it so that your code can easily be changed to use a different form of next character peek and consume functions so that if you change your mind you won't have to start over.
Consider using lex (and perhaps yacc, if the language of your grammar matches its capabilities). Lex will handle all the fiddly details of lexical analysis for you and produce efficient code. You can probably beat its memory footprint by a few bytes, but how much effort do you want to expend into that?
The most efficient approach on a POSIX system would probably be neither of the two (or a variant of the first, if you like): just map the file read-only with mmap and then parse it. Modern systems are quite efficient with that, in that they prefetch data when they detect streaming access, multiple instances of your program that parse the same file get the same physical pages of memory, and so on. And the interface is relatively simple to handle, I think.
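A minimal sketch of the mmap approach, assuming a non-empty regular file (the file name is illustrative; note the mapping is not NUL-terminated, so the parser has to track the length):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 1; }

    /* Map the whole file read-only; the parser then just walks an array. */
    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { close(fd); return 1; }

    long lines = 0;                  /* stand-in for real parsing work */
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n') lines++;
    printf("%ld lines\n", lines);

    munmap((void *)data, (size_t)st.st_size);
    close(fd);
    return 0;
}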
