Clearest way to read and print .txt file lines in C

This site has a number of answers describing various ways to print the lines of a text file:
Posix-style,
reading IP addresses,
Fixed line length.
They all seem to be tailored to a specific example.
It would be great to have the clearest, most concise and easiest way to simply print each line of any text file to the screen, preferably with detailed explanations of what each line does.
Points for brevity and clarity.

#include <stdio.h>
static void cat(FILE *fp)
{
    char buffer[4096];
    size_t nbytes;
    while ((nbytes = fread(buffer, sizeof(char), sizeof(buffer), fp)) != 0)
        fwrite(buffer, sizeof(char), nbytes, stdout);
}
int main(int argc, char **argv)
{
    FILE *fp;
    const char *file;
    while ((file = *++argv) != 0)
    {
        if ((fp = fopen(file, "r")) != 0)
        {
            cat(fp);
            fclose(fp);
        }
    }
    return(0);
}
The cat() function is not strictly necessary, but I'd rather use it. The main program steps through each command line argument and opens the named file. If it succeeds, it calls the cat() function to print its contents. Since the call to fopen() does not specify "rb", it is opened as a text file. If the file is not opened, this code silently ignores the issue. If no files are specified, nothing is printed at all.
The cat() function simply reads blocks of text up to 4096 bytes at a time, and writes them to standard output ('the screen'). It stops when there's no more to read.
If you want to extend the code to read standard input when no file is specified, then you can use:
if (argc == 1)
    cat(stdin);
else
{
    ...while loop as now...
}
which is one of the reasons for having the cat() function written as shown.
This code does not pay direct attention to newlines — or lines of any sort. If you want to process it formally one line at a time, then you can do several things:
static void cat(FILE *fp)
{
    char buffer[4096];
    while (fgets(buffer, sizeof(buffer), fp) != 0)
        fputs(buffer, stdout);
}
This will read and write one line at a time. If any line is longer than 4095 bytes, it will read the line in two or more operations and write it in the same number of operations. Note that this assumes a text file in a way that the version using fread() and fwrite() does not. On POSIX systems, the version with fread() and fwrite() will handle arbitrary binary files with null bytes ('\0') in the data, but the version using fgets() and fputs() will not. Both the versions so far are strictly standard C (any version of the standard) as they don't use any platform-specific extensions; they are about as portable as code can be.
Alternatively again, if you have the POSIX 2008 getline() function, you can use that, but you need #include <stdlib.h> too (because you end up having to release the memory it allocates):
static void cat(FILE *fp)
{
    char *buffer = 0;
    size_t buflen = 0;
    while (getline(&buffer, &buflen, fp) != -1)
        fputs(buffer, stdout);
    free(buffer);
}
This version, too, will not handle binary data (meaning data with null bytes in it). It could be upgraded to do so, of course:
static void cat(FILE *fp)
{
    char *buffer = 0;
    size_t buflen = 0;
    ssize_t nbytes;
    while ((nbytes = getline(&buffer, &buflen, fp)) != -1)
        fwrite(buffer, sizeof(char), nbytes, stdout);
    free(buffer);
}
The getline() function reports how many bytes it read (and there's a null byte after that), and fwrite() is the only one of these output functions that takes a run of arbitrary bytes and writes them all to the given stream.

Well, here is a very short solution I eventually came up with. I imagine there is something fundamentally wrong with it, otherwise it would have been suggested, but I figured I would post it here and hope someone tears it apart:
#include <stdio.h>
int main(void)
{
    FILE *MyFile;
    int c;
    MyFile = fopen("C:\\YourFile.txt", "r");   /* a backslash must be escaped in a C string */
    if (MyFile == NULL)
        return 1;
    c = fgetc(MyFile);
    while (c != EOF)
    {
        printf("%c", c);
        c = fgetc(MyFile);
    }
    fclose(MyFile);
    return 0;
}

@Dlinet, you are trying to learn some useful lessons on how to organize a program. I won't post code because there is already a really excellent answer; I cannot possibly improve upon it. But I would like to recommend a book to you.
The book is called Software Tools in Pascal. The language is Pascal, not C, but for reading the book this will cause no serious hardship. They start out implementing simple tools like the one in this example (which on UNIX is called cat) and they move on to more advanced stuff. Not only do they teach great lessons on how to organize this sort of program, they also cover language design issues. (There are problems in Pascal that really vex them, and if you know C you will realize that C doesn't have those problems.)
The book is out of print now, but I found it to be hugely valuable when I was learning to write code. The so-called "left corner design" methodology serves me well to this day.
I encourage you to find a used copy on Amazon or wherever. Amazon has used copies starting at $0.02 plus $4 shipping.
http://www.amazon.com/Software-Tools-Pascal-Brian-Kernighan/dp/0201103427
It would be an educational exercise to study the programs in this book and implement them in C. Any Linux system already has more-powerful and fully-debugged versions of these programs, but it would not be a waste of your time to work through this book and learn how to write this stuff.
Alternatively you could install FreePascal on your computer and use it to run the programs from the book.
Good luck and may you always enjoy software development!

If you want something prebaked, there's cat on POSIX systems.
If you want to write it yourself, here's the basic layout:
Check to make sure the file name, permissions, and path are valid
Read until the newline separator in a loop (\n on Unix, \r\n on Windows/DOS; in text mode the C library translates either one to \n for you)
Check for an error. If there is one, print an error and abort.
Print the line to the screen.
Repeat
The point is, there isn't really a specific way to do it. Just read, then write, and repeat. With some error checking, you've got cat all over again.

Related

Reading from stdin after reading from file

I am trying to read each line from stdin after I have finished reading from a given file, or if the given file name does not exist. Currently I am using the format below.
while (fgets(buf, sizeof(buf), fp) != NULL) {
    main process...
}
while (fgets(buf, sizeof(buf), stdin) != NULL) {
    main process...
}
This format does work as I intended.
However, main process is quite a chunky piece of code. Is there a way to shorten this so that I only have to write the while loop once? Thank you.
If your problem is that 'main process' consists of a lot of lines of code that you do not want to duplicate, the most straightforward solution is to make a function that implements main process.
Since the while loops are identical, save for the file pointer, you could also include the while loop in the function, with the file pointer as a parameter (as in David's remark).
Then you should add a function like this:
void process_input(FILE *input_handle) {
    char buf[1024];
    while (fgets(buf, sizeof(buf), input_handle) != NULL) {
        main process...
    }
}
And your original code then should be replaced with:
process_input(fp);
process_input(stdin);
would there be a way to shorten this, so that I can write while loop only once?
There isn't.
You can of course abstract the code into a function which takes a FILE* as a parameter, or extend the stdio interfaces yourself (example), but the long and short of it is that neither standard C nor any popular libc implementation has anything like Perl's ARGV filehandle, or anything that lets you open a list of files as a single stream.

How do I read a file backwards using read() in C? [duplicate]

This question already has answers here:
Reading a text file backwards in C
(5 answers)
Closed 9 years ago.
I am supposed to create a program that takes a given file and creates a file with the text reversed. Is there a way I can start the read() from the end of the file and copy it to the first byte in the created file if I don't know the exact size of the file?
Also, I have googled this and came across many examples with fread, fopen, etc. However, I can't use those for this project; I can only use read, open, lseek, write, and close.
Here is my code so far. It's not much, but just for reference:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main (int argc, char *argv[])
{
    if (argc != 2) /* argc should be 2 for correct execution */
    {
        printf("usage: %s filename", argv[0]);
    }
    else
    {
        int file1 = open(argv[1], O_RDWR);
        if (file1 == -1) {
            printf("\nfailed to open file.");
            return 1;
        }
        int reversefile = open(argv[2], O_RDWR | O_CREAT);
        int size = lseek(argv[1], 0, SEEK_END);
        char *file2[size+1];
        int count = size;
        int i = 0;
        while (read(file1, file2[count], 0) != 0)
        {
            file2[i] = *read(file1, file2[count], 0);
            write(reversefile, file2[i], size+1);
            count--;
            i++;
            lseek(argv[2], i, SEEK_SET);
        }
I doubt that most filesystems are designed to support this operation effectively. Chances are, you'd have to read the whole file to get to the end. For the same reasons, most languages probably don't include any special feature for reading a file backwards.
Just come up with something. Try to read the whole file in memory. If it is too big, dump the beginning, reversed, into a temporary file and keep reading... In the end combine all temporary files into one. Also, you could probably do something smart with manual low-level manipulation of disk sectors, or at least with low-level programming directly against the file system. Looks like this is not what you are after, though.
Why don't you try fseek to navigate inside the file? This function is contained in stdio.h, just like fopen and fclose.
Another idea would be to implement a simple stack...
Note: the code below has no error checking, which is really bad. The steps are:
get the file size using stat
create a buffer with malloc
fread the file into the buffer
set a pointer to the end of the data in the buffer
print each character going backwards through the buffer.
If you get creative with Google you can find several examples just like this.
IMO the assistance you are getting so far is not really even good hints.
This appears to be schoolwork, so beware of copying. Do some reading about the calls used here: stat (fstat) and fread (read).
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
int main(int argc, char **argv)
{
    struct stat st;
    char *buf;
    off_t i;
    FILE *in = fopen(argv[1], "r");
    fstat(fileno(in), &st);              // get file size in bytes
    buf = malloc(st.st_size);            // buffer for file
    fread(buf, st.st_size, 1, in);       // fill the buffer
    for (i = st.st_size; i > 0; i--)     // print traversing backwards, last byte first
        putchar(buf[i - 1]);
    free(buf);
    fclose(in);
    return 0;
}

Buffering of standard I/O library

In the book Advanced Programming in the UNIX Environment (2nd edition), the author wrote in Section 5.5 (stream operations of the standard I/O library) that:
When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.
Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
I got confused about this. Could anyone explain it a little? For example, in what situations will input and output function calls that violate the above restrictions cause unexpected behavior? I guess the reason for the restrictions may be related to the buffering in the library, but I'm not sure.
You aren't allowed to intersperse input and output operations. For example, you can't use formatted input to seek to a particular point in the file, then start writing bytes starting at that point. This allows the implementation to assume that at any time, the sole I/O buffer will only contain either data to be read (to you) or written (to the OS), without doing any safety checks.
f = fopen( "myfile", "r+" ); /* open for read and write ("rw" is not a valid mode) */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" ); /* write some data - illegal */
This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.
Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
It's not clear what you're asking.
Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.
Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"
Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and 4 in the C standard linked above.)
But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:
#include <stdio.h>
int main(int argc, char *argv[]) {
    FILE *fp = fopen("foo", "r+");   /* assumes "foo" already exists */
    fseek(fp, 0, SEEK_SET);
    fwrite("foo", 1, 3, fp);
    fseek(fp, 0, SEEK_SET);
    fwrite("bar", 1, 3, fp);
    char buf[4] = { 0 };
    size_t ret = fread(buf, 1, 3, fp);   /* read directly after write: undefined */
    printf("%d %s\n", (int)ret, buf);
    return 0;
}
So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.
If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.
Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.
You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".
There are four ways you could implement something like stdio:
1. Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
2. Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
3. Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead-simple, and as efficient as possible.
4. Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, almost as efficient, and a lot safer, but not really any better in any way other than safety.
So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.
You might say, "Come on, it can't be that inefficient." Well, remember that Unix was designed for low-end 1970s systems, with the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. Most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy ones like fscanf and fprintf, and adding anything to those functions (or macros) that made them 5x slower would make a huge difference in a lot of real-world code.
If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.
For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.
Reason 1: finding the real file position to start from.
Because of stdio's buffering, the stdio stream position may differ from the OS file position. When you read 1 byte, stdio sets its stream position to 1, but to fill its buffer it may have read 4096 bytes from the underlying file, so the OS records its file position as 4096. When you switch to output, you really need to choose which position to use.
Reason 2: finding the right buffer cursor to start from.
tl;dr:
if an underlying implementation uses only a single shared buffer for both reading and writing, it has to flush the buffer when the I/O direction changes.
Take the glibc used in Chromium OS as a demonstration of how fwrite, fseek, and fflush handle the single shared buffer.
The buffer-filling part of fwrite:
fill_buffer:
  while (to_write > 0)
    {
      register size_t n = to_write;
      if (n > buffer_space)
        n = buffer_space;
      buffer_space -= n;
      written += n;
      to_write -= n;
      if (n < 20)
        while (n-- > 0)
          *stream->__bufp++ = *p++;
      else
        {
          memcpy ((void *) stream->__bufp, (void *) p, n);
          stream->__bufp += n;
          p += n;
        }
      if (to_write == 0)
        /* Done writing. */
        break;
      else if (buffer_space == 0)
        {
          /* We have filled the buffer, so flush it. */
          if (fflush (stream) == EOF)
            break;
From this snippet we can see that when the buffer is full, fwrite flushes it.
Let's take a look at fflush:
int
fflush (stream)
     register FILE *stream;
{
  if (stream == NULL) {...}
  if (!__validfp (stream) || !stream->__mode.__write)
    {
      __set_errno (EINVAL);
      return EOF;
    }
  return __flshfp (stream, EOF);
}
It uses __flshfp:
/* Flush the buffer for FP and also write C if FLUSH_ONLY is nonzero.
   This is the function used by putc and fflush. */
int
__flshfp (fp, c)
     register FILE *fp;
     int c;
{
  /* Make room in the buffer. */
  (*fp->__room_funcs.__output) (fp, flush_only ? EOF : (unsigned char) c);
}
By default, __room_funcs.__output is flushbuf, which does:
/* Write out the buffered data. */
wrote = (*fp->__io_funcs.__write) (fp->__cookie, fp->__buffer,
                                   to_write);
Now we are close. What's __write? Tracing the default settings mentioned above, it's __stdio_write:
int
__stdio_write (cookie, buf, n)
     void *cookie;
     register const char *buf;
     register size_t n;
{
  const int fd = (int) cookie;
  register size_t written = 0;
  while (n > 0)
    {
      int count = __write (fd, buf, (int) n);
      if (count > 0)
        {
          buf += count;
          written += count;
          n -= count;
        }
      else if (count < 0
#if defined (EINTR) && defined (EINTR_REPEAT)
               && errno != EINTR
#endif
               )
        /* Write error. */
        return -1;
    }
  return (int) written;
}
__write is glibc's internal wrapper for the write(2) system call.
As we can see, fwrite uses only a single buffer. If you change direction, that buffer may still hold the previously written contents; as the code above shows, calling fflush empties it.
The same applies to fseek
/* Move the file position of STREAM to OFFSET
   bytes from the beginning of the file if WHENCE
   is SEEK_SET, the end of the file is it is SEEK_END,
   or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
     register FILE *stream;
     long int offset;
     int whence;
{
  ...
  if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
    return EOF;
  ...
  /* O is now an absolute position, the new target. */
  stream->__target = o;
  /* Set bufp and both end pointers to the beginning of the buffer.
     The next i/o will force a call to the input/output room function. */
  stream->__bufp
    = stream->__get_limit = stream->__put_limit = stream->__buffer;
  ...
}
It also soft-flushes (resets) the buffer at the end, which means any read buffer is emptied after this call.
This matches the C99 Rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.

File IO does not appear to be reading correctly

Disclaimer: this is for an assignment. I am not asking for explicit code. Rather, I only ask for enough help that I may understand my problem and correct it myself.
I am attempting to recreate the Unix ar utility as per a homework assignment. The majority of this assignment deals with file IO in C, and other parts deal with system calls, etc.
In this instance, I intend to create a simple listing of all the files within the archive. I have not gotten far, as you may notice. The plan is relatively simple: read each file header from an archive file and print only the value held in ar_hdr.ar_name. The rest of the fields will be skipped over via fseek(), including the file data, until another file is reached, at which point the process begins again. If EOF is reached, the function simply terminates.
I have little experience with file IO, so I am already at a disadvantage with this assignment. I have done my best to research proper ways of achieving my goals, and I believe I have implemented them to the best of my ability. That said, there appears to be something wrong with my implementation. The data from the archive file does not seem to be read, or at least stored as a variable. Here's my code:
struct ar_hdr
{
    char ar_name[16];   /* name */
    char ar_date[12];   /* modification time */
    char ar_uid[6];     /* user id */
    char ar_gid[6];     /* group id */
    char ar_mode[8];    /* octal file permissions */
    char ar_size[10];   /* size in bytes */
};
void table()
{
    FILE *stream;
    char str[sizeof(struct ar_hdr)];
    struct ar_hdr temp;
    stream = fopen("archive.txt", "r");
    if (stream == 0)
    {
        perror("error");
        exit(0);
    }
    while (fgets(str, sizeof(str), stream) != NULL)
    {
        fscanf(stream, "%[^\t]", temp.ar_name);
        printf("%s\n", temp.ar_name);
    }
    if (feof(stream))
    {
        // hit end of file
        printf("End of file reached\n");
    }
    else
    {
        // other error interrupted the read
        printf("Error: feed interrupted unexpectedly\n");
    }
    fclose(stream);
}
At this point, I only want to be able to read the data correctly. I will work on seeking the next file after that has been finished. I would like to reiterate my point, however, that I'm not asking for explicit code - I need to learn this stuff and having someone provide me with working code won't do that.
You've defined a char buffer named str to hold your data, but you then read the name into a separate ar_hdr structure named temp. As well, you are reading binary data as a string, which will break because of embedded nulls.
You need to read as binary data and either change temp to be a pointer to str or read directly into temp using something like:
ret=fread(&temp,sizeof(temp),1,stream);
(look at the doco for fread - my C is too rusty to be sure of that). Make sure you check and use the return value.

C readline function

In an assignment for college it was suggested to use the C readline function in an exercise. I have searched for its reference but still haven't found it. Does it really exist? In which header? Can you please post the link to the reference?
Readline exists in two places, libreadline and libedit (also called libeditline). Both provide a compatible readline() interface. The difference is that libreadline is licensed under the GPL, while libedit uses the 3-clause BSD license. Licensing is really not a concern for an assignment, at least I don't think it is. Either license allows you to use the code freely. If you link against readline, be sure to make the whole program GPL 2 or later, which will satisfy whatever version of the GPL governs the system readline. It may be GPL2+ or GPL3+, depending on the age of the system. I'm not advocating either license; that's up to you.
Note: take care to install one or the other and adjust linking as needed (-lreadline, -ledit or -leditline). Both are libraries, not part of the standard C library.
Edit (afterthought):
If you release a program to the wild, it's a nice gesture to let the user configure it with their readline of choice, for instance with --with-readline or --with-libedit. This allows a binary package that conforms to their choice of license, at least as far as readline is concerned.
Links: Readline and Edit/Editline.
I don't think it's a standard function.
A simple implementation would be like this:
#include <stdio.h>
#define MAX_LINE 1024   /* assumed size; 'in' must hold at least MAX_LINE chars */
char *Readline(char *in) {
    char *cptr;
    if ((cptr = fgets(in, MAX_LINE, stdin)) != NULL) {
        /* kill preceding whitespace but leave \n so we're guaranteed to have something */
        while (*cptr == ' ' || *cptr == '\t') {
            cptr++;
        }
        return cptr;
    } else {
        return 0;
    }
}
It uses fgets() to read up to MAX_LINE - 1 characters into the buffer 'in'. It strips preceding whitespace and returns a pointer to the first non-whitespace character.
Not sure if you have tried getline() from the GNU C Library (also standardized in POSIX.1-2008): ssize_t getline(char **lineptr, size_t *n, FILE *stream).
This function reads a line from a file and can even re-allocate more space if needed.
An example of this is found in the manpage of getline. Below is a copy of it.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
    FILE *stream;
    char *line = NULL;
    size_t len = 0;
    ssize_t nread;
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    stream = fopen(argv[1], "r");
    if (stream == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    while ((nread = getline(&line, &len, stream)) != -1) {
        printf("Retrieved line of length %zd:\n", nread);   /* nread is ssize_t */
        fwrite(line, nread, 1, stdout);
    }
    free(line);
    fclose(stream);
    exit(EXIT_SUCCESS);
}
If you need a "readLine()" function like readLine() in Java's BufferedReader, you can also freely use my function «char* get_line(FILE *filePointer)» in "line.h", which I wrote just for this purpose: https://github.com/pheek/line.h/blob/master/line.h
It doesn't exist.
They were mistaken and meant gets() from stdio.h.
gets() is also a very unsafe function because it takes no maximum-size parameter, which makes it an immediate security hole (look up buffer overrun attacks); it was removed from the language in C11. Use fgets() instead.

Resources