C file pointers, multiple reads on stdin - c

I have an existing program where a message (for example, an email, or some other kind of message) will be coming into a program on stdin.
I know stdin is a FILE* but I'm somewhat confused as to what other special characteristics it has. I'm currently trying to add a check to the program, and handle the message differently if it contains a particular line (say, the word "hello"). The problem is, I need to search through the file for that word, but I still need stdin to point to its original location later in the program. An outline of the structure is below:
Currently:
//actual message body is coming in on stdin
read_message(char type)
{
    //checks and setup
    if(type == 'm')
    {
        //when it reaches this point, nothing has touched stdin
        open_and_read(); //it will read from stdin
    }
    //else, never open the message
}
I want to add another check, but where I have to search the message body.
Like so:
//actual message body is coming in on stdin
read_message(char type)
{
    //checks and setup
    //new check
    if(message_contains_hello()) //some function that reads through the message looking for the word hello
    {
        other_functionality();
    }
    if(type == 'm')
    {
        //when it reaches this point, my new check may have modified stdin
        open_and_read(); //it will read from stdin
    }
    //else, never open the message
}
The problem with this is that to search the message body, I have to touch the file pointer stdin. But if I still need to open and read the message in the second if statement (when type == 'm'), stdin needs to point to the same place it was pointing at the start of the program. I tried creating a copy of the pointer, but the copy still modified stdin whenever it was itself modified.
I don't have a choice about how to pass the message - it has to stay on stdin. How can I access the actual body of a message coming in on stdin without modifying stdin itself?
Basically, how can I read from it, and then have another function be able to read from the beginning of the message as well?

The short answer is that you can't. Once you read data from standard input, it's gone.
As such, your only real choice is to save what you read, and do the later processing on that rather than reading directly from standard input. If your later processing demands reading from a file, one possibility would be to structure this as two separate programs, with one acting as a filter for the other.

In general, you can only read bytes from stdin once. fseek() is not guaranteed to work on it; it fails when stdin is a pipe, for example. To solve this problem, you can read the bytes into a buffer in your program, look at the bytes, and then pass the buffer off to another function that actually does something with the rest of the data.
Depending on your program, you may need to only read some of the data on stdin, or you may need to read all of it into that buffer. Either way, you will probably have to modify the existing code in the program in some way.
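As a rough sketch of the "read it into a buffer first" approach: the helper names slurp_stream and message_contains below are made up for illustration and are not part of the original program. The idea is that the new check and the existing open_and_read() logic would both work on the buffer instead of on stdin. It assumes the message fits in memory and contains no embedded NUL bytes (strstr stops at the first one).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read everything remaining on `in` into a malloc'd,
 * NUL-terminated buffer. Returns NULL on allocation or read failure;
 * on success *len_out receives the number of bytes read. */
char *slurp_stream(FILE *in, size_t *len_out)
{
    assert(in != NULL && len_out != NULL);
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always keep one byte spare for the terminating NUL. */
        size_t n = fread(buf + len, 1, cap - len - 1, in);
        len += n;
        if (n == 0)
            break;                      /* EOF or error */
        if (len == cap - 1) {           /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

/* Hypothetical stand-in for message_contains_hello(): searches the saved
 * copy of the message rather than the stream. */
int message_contains(const char *msg, const char *word)
{
    return strstr(msg, word) != NULL;
}
```

In read_message() you would call slurp_stream(stdin, &len) once, run the check on the returned buffer, and then hand the same buffer to whatever currently reads stdin.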

I know stdin is a FILE* but I'm somewhat confused as to what other special characteristics it has.
Well, it's opened for reading. But it's not guaranteed to be seekable, so you'll want to read in its contents entirely, then handle the resulting string (or list of strings, or whatever).
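If you want to find out at run time whether a particular stream can be repositioned, one possible probe is to ask for its current position. This is only a sketch: stream_is_seekable is a name invented here, and it relies on POSIX behaviour where ftell() fails on a pipe. What happens for other kinds of input (terminals in particular) depends on the platform, so treat the result as a hint rather than a guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: returns 1 if the stream can report a file position,
 * 0 otherwise. On POSIX systems ftell() fails (returns -1L) for pipes. */
int stream_is_seekable(FILE *fp)
{
    assert(fp != NULL);
    return ftell(fp) != -1L;
}
```

Even when this returns 1 for stdin (for example when it is redirected from a regular file), code that has to work with piped input still needs the read-everything-first approach.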

You should use and take advantage of buffering (<stdio.h> provides buffered I/O, but see setbuf).
My suggestion is to read your stdin line by line, e.g. using getline. Once you've read an entire line, you can do some minimal look-ahead inside.
Perhaps you might read more about parsing techniques.
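A minimal sketch of the line-by-line idea with getline(), assuming a POSIX system. The function name first_line_containing is hypothetical. Note that it consumes the stream as it goes, so on its own it does not solve the "read it again later" part; you would still have to keep the lines you have read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based number of the first line of `in`
 * that contains `word`, or 0 if no line does. */
long first_line_containing(FILE *in, const char *word)
{
    assert(in != NULL && word != NULL);
    char *line = NULL;   /* getline() allocates and grows this for us */
    size_t cap = 0;
    long lineno = 0, found = 0;

    while (getline(&line, &cap, in) != -1) {
        lineno++;
        if (strstr(line, word) != NULL) {
            found = lineno;
            break;
        }
    }
    free(line);
    return found;
}
```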

Related

Are the buffer of input and output different in C?

Are the buffers for input and output in C different? I am trying to implement buffering emulation in assembly, and trying to do it the way C does. So far I have implemented a buffering system in my fgets function; however, I am not sure how I should implement it for the fputs function. If the buffers are the same, then it makes sense to keep a global variable containing the last file descriptor used, so that in an "fputs, fgets, fputs" sequence I can use the last file descriptor to flush the buffer before reading in fgets. But this method also seems very costly, as I have to flush the buffer every time, regardless of whether I called fputs before. Or should I use two buffers, one for input and one for output?
Or should I use two buffers, one for input and one for output?
Suggest using one buffer per file handle; that should cover the common use cases — rather than buffering based on i/o direction.
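To make "one buffer per file handle" concrete, here is a small C sketch of the layout (you would translate the same idea to assembly). All names (struct my_file, my_fputs, my_flush, MYBUF_SIZE) are invented for illustration. Only the output side is shown, and retrying on interrupted writes is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

enum { MYBUF_SIZE = 512 };

/* One of these per open file, so flushing one handle never touches another. */
struct my_file {
    int fd;
    char buf[MYBUF_SIZE];
    size_t len;              /* bytes waiting to be written */
};

/* Write out everything pending for this handle. Returns 0 or -1. */
int my_flush(struct my_file *f)
{
    assert(f != NULL);
    size_t off = 0;
    while (off < f->len) {
        ssize_t n = write(f->fd, f->buf + off, f->len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    f->len = 0;
    return 0;
}

/* Append a string to this handle's buffer, flushing only when it is full. */
int my_fputs(const char *s, struct my_file *f)
{
    assert(s != NULL && f != NULL);
    for (; *s != '\0'; s++) {
        if (f->len == MYBUF_SIZE && my_flush(f) != 0)
            return -1;
        f->buf[f->len++] = *s;
    }
    return 0;
}
```

With this layout there is no need for a global "last descriptor used": a read on one handle only needs to care about the state stored in that handle's own struct.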

Reading a file in C with File Descriptor

I want to read from a file by using its file descriptor. I can't use its name because of assignment rules.
I obtain it by calling open and it works fine. At this moment I know that I have to use the read() function in order to read from it. My problem is that read() function requires as an argument the number of bytes to read, and I want to read a whole line from the file each time, so I don't know how many bytes to read.
If I use for example fscanf(), it works fine with a simple string and I take back the whole line as I want. So my question is:
Is there any function like fscanf() which can be called with file descriptor and not with a file pointer?
When you say "have to use read()" I can't tell if that's your understanding of the situation given a file descriptor from open() or a restriction on some kind of assignment.
If you have a file descriptor but you're more comfortable with fscanf() and friends, use fdopen() to get a FILE * from your fd and proceed to use stdio.
Internally it uses functions like read() into a buffer and then processes those buffers as you read them with fscanf() and friends.
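A short sketch of the fdopen() route, assuming a POSIX system. The function name first_line_from_fd is hypothetical, and fgets() is used here instead of fscanf() just to keep the example small.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: wraps `fd` in a FILE * and reads the first line
 * (at most size-1 characters, newline included) into `line`.
 * Takes ownership of fd: it is closed before returning.
 * Returns 0 on success, -1 on failure or if there was nothing to read. */
int first_line_from_fd(int fd, char *line, size_t size)
{
    assert(line != NULL && size > 0);
    FILE *fp = fdopen(fd, "r");
    if (fp == NULL) {
        close(fd);
        return -1;
    }
    int ok = fgets(line, (int)size, fp) != NULL;
    fclose(fp);              /* also closes the underlying descriptor */
    return ok ? 0 : -1;
}
```

Once the descriptor is wrapped, avoid mixing read() on fd with stdio calls on the FILE *, since stdio may already have buffered more data than you asked for.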
What you could do is read one character at a time until you detect a '\n', at which point you've read the entire line. As this is homework, I won't write it for you.
A few things to be warned of, however.
You need to check for EOF, otherwise, you might end up in an infinite loop.
You should declare a buffer, and copy each character into it as you read it. Not knowing what your input is, I can't suggest a size, other than to say that for a homework assignment, [256] would probably be sufficient.
You need to make sure you don't overfill your buffer in the event that a line runs over its length.
Keep reading until you find a '\n' character. Then process the line that you have created, and start the next one.

C fwrite error handling

When fwrite returns a count that doesn't match what was requested (and has therefore hit an error), what is the correct way to deal with the error?
clearerr(File); //Clear the error
fflush(File); //Empty the buffer of its contents
Or:
fflush(File); //Other way around, empty buffer first then reset
clearerr(File);
Or just:
clearerr(File); //Contains fflush implicitly?
Or something else?
There isn't really anything you can do if you encounter a write error. You can flush the buffer, but your last write was still broken, so the file doesn't contain what you want. You could close the file, reopen it for writing (with "truncate") and write it anew, but that only works if you still have the entire file content in memory.
Alternatively, you could reopen and see how much data has been written, but that doesn't help you if there's an external reason why you can't write to the file, so there's really no graceful way to recover.
So in short, you don't "handle" the error at the file site; rather, your program must handle the larger error condition that the write just failed and react at an appropriate point.
You should probably consider "atomic writes", which means you first write your file to a temporary, and only if that succeeds do you delete the original and rename the temporary to the original file name. That way the file itself is always in a consistent state.
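A minimal sketch of the temporary-file-then-rename idea. The function name write_file_atomically and its parameters are made up for illustration, and the caller is assumed to pick a temporary name on the same filesystem as the target. On POSIX systems rename() replaces an existing target in one step; on Windows rename() fails if the target exists, so you would have to remove the original first (or use a platform API). The sketch also does not force the data to disk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes to `path` by writing them to
 * `tmp_path` first and renaming only if every step succeeded.
 * Returns 0 on success, -1 on failure (the original file is left alone). */
int write_file_atomically(const char *path, const char *tmp_path,
                          const void *data, size_t len)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);
    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return -1;

    int ok = fwrite(data, 1, len, fp) == len;
    /* fclose() flushes the buffer, so its result matters as much as fwrite's. */
    if (fclose(fp) != 0)
        ok = 0;
    if (ok && rename(tmp_path, path) != 0)
        ok = 0;
    if (!ok)
        remove(tmp_path);    /* don't leave a half-written temporary behind */
    return ok ? 0 : -1;
}
```

The error handling then lives in one place: if the function returns -1, the old file is still intact and the caller decides what to do about the failed write.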

unistd.h read() function: How to read a file line by line?

What I need to do is use the read function from unistd.h to read a file
line by line. I have this at the moment:
n = read(fd, str, size);
However, this reads to the end of the file, or up to size number of bytes.
Is there a way that I can make it read one line at a time, stopping at a newline?
The lines are all of variable length.
I am allowed only these two header files:
#include <unistd.h>
#include <fcntl.h>
The point of the exercise is to read in a file line by line, and
output each line as it's read in. Basically, to mimic the fgets()
and fputs() functions.
You can read character by character into a buffer and check for the linebreak symbols (\r\n for Windows and \n for Unix systems).
You'll want to create a buffer twice the length of your longest line you'll support, and you'll need to keep track of your buffer state.
Basically, each time you're called for a new line you'll scan from your current buffer position looking for an end-of-line marker. If you find one, good, that's your line. Update your buffer pointers and return.
If you hit your maxlength then you return a truncated line and change your state to discard. Next time you're called you need to discard up to the next end of line, and then enter your normal read state.
If you hit the end of what you've read in, then you need to read in another maxline chars, wrapping to the start of the buffer if you hit the bottom (ie, you may need to make two read calls) and then continue scanning.
All of the above assumes you can set a max line length. If you can't then you have to work with dynamic memory and worry about what happens if a buffer malloc fails. Also, you'll need to always check the results of the read in case you've hit the end of the file while reading into your buffer.
Unfortunately the read function isn't really suitable for this sort of input. Assuming this is some sort of artificial requirement from interview/homework/exercise, you can attempt to simulate line-based input by reading the file in chunks and splitting it on the newline character yourself, maintaining state in some way between calls. You can get away with a static position indicator if you carefully document the function's use.
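Here is one way the chunk-and-split idea could look. It is a sketch, not the only way to do it: the state lives in a struct (hypothetical name line_reader) instead of static variables, so more than one file can be read at once, and lines longer than the caller's buffer are silently truncated. It includes assert.h for a sanity check, which the exercise's header restriction would not allow; drop that line if the restriction matters.

```c
#include <assert.h>
#include <unistd.h>

enum { CHUNK = 256 };

/* State kept between calls: the chunk last read and how far we are in it. */
struct line_reader {
    int fd;
    char buf[CHUNK];
    size_t pos;              /* next unread byte in buf */
    size_t end;              /* number of valid bytes in buf */
};

/* Copies the next line, without its '\n', into `out` (NUL-terminated).
 * Returns the number of characters stored, or -1 on a read error or when
 * the file is exhausted and no more lines remain. */
ssize_t next_line(struct line_reader *r, char *out, size_t cap)
{
    assert(r != NULL && out != NULL && cap > 0);
    size_t n = 0;
    int got_any = 0;

    for (;;) {
        if (r->pos == r->end) {          /* chunk used up: read another */
            ssize_t got = read(r->fd, r->buf, sizeof r->buf);
            if (got < 0)
                return -1;
            if (got == 0)
                break;                   /* end of file */
            r->pos = 0;
            r->end = (size_t)got;
        }
        char c = r->buf[r->pos++];
        got_any = 1;
        if (c == '\n')
            break;
        if (n + 1 < cap)
            out[n++] = c;                /* extra characters are dropped */
    }
    out[n] = '\0';
    return got_any ? (ssize_t)n : -1;
}
```

Initialise the struct with the descriptor and zeroed positions (for example struct line_reader r = { .fd = fd };) and call next_line() until it returns -1.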
This is a good question, but allowing only the read function doesn't help! :P
Loop over read calls to get a fixed number of bytes and search for the '\n' character, then return the part of the string up to the '\n' and store the rest (excluding the '\n') to prepend to the next chunk read from the file.
Use dynamic memory.
The greater the size of the buffer, the fewer read calls are needed (read is a system call, so it's not cheap, but nowadays there are preemptive kernels).
...
Or simply fix a maximum line length, and use fgets, if you need to be quick...
If you need to read exactly 1 line (and not overstep) using read(), the only generally-applicable way to do that is by reading 1 byte at a time and looping until you get a newline byte. However, if your file descriptor refers to a terminal and it's in the default (canonical) mode, read will wait for a newline and return less than the requested size as soon as a line is available. It may however return more than one line, if data arrives very quickly, or less than 1 line if your program's buffer or the internal terminal buffer is shorter than the line length.
Unless you really need to avoid overstep (which is sometimes important, if you want another process/program to inherit the file descriptor and be able to pick up reading where you left off), I would suggest using stdio functions or your own buffering system. Using read for line-based or byte-by-byte IO is very painful and hard to get right.
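For the no-overstep case, the one-byte-at-a-time loop might look like the sketch below. The name read_line_unbuffered is invented here; like fgets(), it keeps the '\n' in the result. It makes one system call per byte, which is exactly why it is slow. As in any example that uses assert(), the assert.h include goes beyond the two headers the exercise allows.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: reads up to and including the next '\n' from fd,
 * one byte per read() call so nothing past the newline is consumed.
 * Stores at most cap-1 bytes plus a NUL. Returns the number of bytes
 * stored (0 at end of file) or -1 on a read error. */
ssize_t read_line_unbuffered(int fd, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t n = 0;
    while (n + 1 < cap) {
        char c;
        ssize_t got = read(fd, &c, 1);
        if (got < 0)
            return -1;
        if (got == 0)
            break;           /* end of file */
        out[n++] = c;
        if (c == '\n')
            break;
    }
    out[n] = '\0';
    return (ssize_t)n;
}
```

Because the descriptor's position stops right after the newline, another process that inherits fd can carry on from the start of the next line.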
Well, it will read line-by-line from a terminal.
Some choices you have are:
Write a function that uses read when it runs out of data but only returns one line at a time to the caller
Use the function in the library that does exactly that: fgets().
Read only one byte at a time, so you don't go too far.
If you open the file in text mode then Windows "\r\n" will be silently translated to "\n" as the file is read.
If you are on Unix you can use the getline() function. It began as a GNU C library extension (it is not part of gcc itself) and has been standard since POSIX 2008.
Convert file descriptor to FILE pointer.
FILE* fp = fdopen(fd, "r");
Then you can use getline().

forcing fgets to block (i.e. faking an "interactive device")

I have a C application which provides a "shell" for entering commands. I'm trying to write some automated test-code for the application (Using CUnit). The "shell" input is read from stdin like so:
fgets(buf, sizeof(buf), stdin);
I can "write" commands automatically to the application by using freopen() on stdin and hooking it to an intermediate file. When the application is executed "normally" the fgets() call blocks until characters are available because it is "an interactive device", but not so on the intermediate file. So how can I fake fgets into thinking the intermediate file is an "interactive device"?
The C program is for Windows (XP) compiled using MinGW.
Regards!
fgets is not blocking when you are reading from a file because it reaches the end of the file which causes EOF to set on the stream and thus calls to fgets return immediately. When you are running from an interactive input EOF is never set, unless you type Ctrl-Z (or Ctrl-D on UNIX system) of course.
If you really want to use an intermediate file I think you'll need to enhance your shell so that when it hits an EOF it clears and retests it after a suitable wait. A function like this should work I think:-
void waitForEofClear(FILE *f)
{
    while (feof(f)) {
        clearerr(f);
        sleep(1);
    }
}
You could then call this before the fgets:-
waitForEofClear(stdin);
fgets(buf, sizeof(buf), stdin);
Simply using a file is not going to work, as the other answers have indicated. So, you need to decide what you are going to do instead. A FIFO (named pipe) or plain (anonymous) pipe could be used to feed the interactive program under test - or, on Unix, you could use a pseudo-tty. The advantage of all these is that a program blocks when there is no data read, waiting for the next information to arrive, rather than immediately deciding 'no data to read, must be EOF'.
You will then need a semi-intelligent (or even intelligent) program periodically writing data to the channel for the program under test to read. This program will need to know how long to pause between the messages it writes. This might be as simplistic as 'wait one second; write the next line of data'. Or you might do something more complex.
One scheme that I know of has two programs: a capture program to record what a user types and the timing of it (so the 'data' file is structured; it has records consisting of a delay (in seconds and fractions of a second) plus a set of characters to send (count and list of bytes)). This is run to capture what the user types and record it (as well as send the data to the program). There is then a second replay program that reads the file, and interprets the delays and character sequences.
This scheme works adequately if the input sequence is stable; if the same sequence of key strokes is always needed to get the required result. If the data sent to the program needs to adapt to what the program under test is doing and its responses, and may do different things at different times, then you are probably better off going with 'expect'. This has the capacity to do whatever you need - at least for non-GUI programs.
I'm not sure what the Windows equivalent is, but in Linux I would make the intermediate file a fifo. If I were going to do any real, non-trivial autopiloting, I would wrap it in an expect script.
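To illustrate why a pipe behaves differently from a plain file, here is a POSIX-only sketch (it will not build as-is with MinGW on Windows, which has no fork()). The name open_delayed_input is hypothetical. A child process plays the role of the test driver and writes a command after a delay; fgets() on the returned stream blocks until that data arrives instead of reporting EOF straight away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns a stream that will deliver `text` after
 * `delay_seconds`, written by a child process through an anonymous pipe.
 * Returns NULL on failure. The child is not reaped in this sketch. */
FILE *open_delayed_input(const char *text, unsigned delay_seconds)
{
    assert(text != NULL);
    int p[2];
    if (pipe(p) != 0)
        return NULL;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return NULL;
    }
    if (pid == 0) {                      /* child: the "test driver" */
        close(p[0]);
        sleep(delay_seconds);
        ssize_t ignored = write(p[1], text, strlen(text));
        (void)ignored;
        _exit(0);                        /* exiting closes the write end */
    }

    close(p[1]);                         /* parent keeps only the read end */
    FILE *in = fdopen(p[0], "r");
    if (in == NULL)
        close(p[0]);
    return in;
}
```

A test could call fgets(buf, sizeof(buf), in) on the result in place of stdin; the read only sees EOF once the writer has closed its end of the pipe.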

Resources