Is there any performance and usage-difference between read and fread? - c

The goal of this program is to read a line from a file and load it into a buffer of a fixed size buf_len. If the line ends or the buffer is not big enough to store the line the program exits.
int read_line(int file, char * buffer, size_t buf_len) {
size_t total = 0;
for (total = 0; total < buf_len; ++total) {
ssize_t bytes_read = read(file, &buffer[total], 1) // why shouldn't one use fread() here?
if (bytes_read == 0) return total != 0;
if buffer[total] == '\0' return 1;
}
exit(-1) // line to long
Is there any difference if one uses fread() instead of read() in this context?

The main difference between read() and fread() is that read() is a system call that reads a specified number of bytes from a file descriptor, whereas fread() is a standard library function that reads a specified number of elements from a file stream. fread() is typically used for reading binary data, while read() is used for reading text and binary data.
In the context of the code you provided, it would make more sense to use fread() if you were reading binary data from a file stream, since it will read a specified number of elements from the file stream, rather than just a single byte as in read(). However, if you are reading text data, it may be more appropriate to use fgets(), as it will handle newlines and null-terminated strings more easily.
read() is a lower-level system call that may be more efficient for
reading large amounts of data, while fread() is a higher-level
function that may be more convenient for reading smaller amounts of
binary data.
Here are some official documentation links for the read() and fread()
read() system call: enter link description here
fread() standard library function: enter link description here
The read() system call is part of the POSIX standard and is available on most Unix-like systems, while fread() is part of the C standard library and is available on most C implementations.
These links provide detailed information on the usage, behavior, and return values of these functions. I hope you find them helpful!

Related

Using fread() to read a text based file - best practices

Consider this code to read a text based file. This sort of fread() usage was briefly touched upon in the excellent book C Programming: A Modern Approach by K.N. King.
There are other methods of reading text based files, but here I am concerned with fread() only.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
// Declare file stream pointer.
FILE *fp = fopen("Note.txt", "r");
// fopen() call successful.
if(fp != NULL)
{
// Navigate through to end of the file.
fseek(fp, 0, SEEK_END);
// Calculate the total bytes navigated.
long filesize = ftell(fp);
// Navigate to the beginning of the file so
// it can be read.
rewind(fp);
// Declare array of char with appropriate size.
char content[filesize + 1];
// Set last char of array to contain NULL char.
content[filesize] = '\0';
// Read the file content.
fread(content, filesize, 1, fp);
// Close file stream pointer.
fclose(fp);
// Print file content.
printf("%s\n", content);
}
// fopen() call unsuccessful.
else
{
printf("File could not be read.\n");
}
return 0;
}
There are some problems I have with this method. My opinion is that this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
To circumvent this issue, we may use a buffer size and keep on reading into a char array of that size. If filesize is less than buffer size, then we simply perform fread() once as described in the above code. Otherwise, We divide the total file size by the buffer size and get a result, whose int portion we will use as the total number of times to iterate a loop where we will invoke fread() each time, appending the read buffer array into a larger string. Now, for the final fread(), which we will perform after the loop, we will have to read exactly (filesize % buffersize) bytes of data into an array of that size and finally append this array into the larger string (Which we would have malloc-ed with filesize + 1 beforehand). I find that if we perform fread() for the last chunk of data using buffersize as its second parameter, then extra garbage data of size (buffersize - chunksize) will be read in and the data might become corrupted. Are my assumptions here correct? Please explain if/ how I have overlooked something.
Also, there is the issue that non-ASCII characters might not have size of 1 byte. In that case I would assume the proper amount is being read, but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
this is not a safe method of performing fread() since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
fread() does not care about strings (null character terminated arrays). It reads data as if it was in multiples of unsigned char*1 with no special concern to the data content if the stream opened in binary mode and perhaps some data processing (e.g. end-of-line, byte-order-mark) in text mode.
Are my assumptions here correct?
Failed assumptions:
Assuming ftell() return value equals the sum of fread() bytes.
The assumption can be false in text mode (as OP opened the file) and fseek() to the end is technical undefined behavior in binary mode.
Assuming not checking the return value of fread() is OK. Use the return value of fread() to know if an error occurred, end-of-file and how many multiples of bytes were read.
Assuming error checking is not required. , ftell(), fread(), fseek() instead of rewind() all deserve error checks. In particular, ftell() readily fails on streams that have no certain end.
Assuming no null characters are read. A text file is not certainly made into one string by reading all and appending a null character. Robust code detects and/or copes with embedded null characters.
Multi-byte: assuming input meets the encoding requirements. Example: robust code detects (and rejects) invalid UTF8 sequences - perhaps after reading the entire file.
Extreme: Assuming a file length <= LONG_MAX, the max value returned from ftell(). Files may be larger.
but each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?
fread() does not function on multi-byte boundaries, only multiples of unsigned char. A given fread() may end with a portion of a multi-byte and the next fread() will continue from mid-multi-byte.
Instead of of 2 pass approach consider 1 single pass
// Pseudo code
total_read = 0
Allocate buffer, say 4096
forever
if buffer full
double buffer_size (`realloc()`)
u = unused portion of buffer
fread u bytes into unused portion of buffer
total_read += number_just_read
if (number_just_read < u)
quit loop
Resize buffer total_read (+ 1 if appending a '\0')
Alternatively consider the need to read the entire file in before processing the data. I do not know the higher level goal, but often processing data as it arrives makes for less resource impact and faster throughput.
Advanced
Text files may be simple ASCII only, 8-bit code page defined, one of various UTF encodings (byte-order-mark, etc. The last line may or may not end with a '\n'. Robust text processing beyond simple ASCII is non-trivial.
ASCII and UTF-8 are the most common. IMO, handle 1 or both of those and error out on anything that does not meet their requirements.
*1 fread() reads in multiple of bytes as per the 3rd argument, which is 1 in OP's case.
// v --- multiple of 1 byte
fread(content, filesize, 1, fp);

Read from file number of characters from a given position - C

I have a file called "cache.txt", where I have some data. I have a unordered_map, cache, with a pair of ints as value, where : the first element from the pair represents the number of characters I want to read, and the second the position from where I want to read. The key from the unordered_map, comps, is not relevant.
....
fp = fopen("cache.txt", "r");
fseek(fp, cache[comps].second, SEEK_SET);
int number_of_chars = cache[comps].first;
char c;
while((c = getc(fp)) != EOF && number_of_chars > 0) {
--number_of_chars;
printf("%c",c);
}
fclose(fp);
I have to use it several times, so that's the reason for opening and closing the file each time.
If you are using text streams (and apparently you are, because you open the file in mode "r", not "rb"), then the only position values you can pass to fseek are 0 or some value returned by ftell. In other words, you cannot count characters read yourself and use that count in a call to fseek.
If you use binary streams, then you can count the bytes read yourself, but you will have to deal with the fact that whatever the OS uses to represent newlines will not be translated to a newline character. Because there is no way to know how a given OS handles newlines, it's impossible to portably read a text stream in binary mode.
In short, you should take the requirement in the C standard library seriously. Make sure that the offset you are storing in your map came directly from a call to ftell. ("Directly" means that you cannot even use ftell(file) + 1; only ftell(file).)
Unix/Linux programmers can get away with not dealing with the above, since Posix mandates that there is no difference between text and binary modes, and only requires that fseek use a value returned by ftell for wide streams. But if you are using Windows, you will find that trying to use fseek to return to a byte position you computed yourself does not work. And if you are not using Windows, you should ask yourself whether your program might someday be ported to Windows.

C - does read() add a '\0'?

Does it have to? I've always been fuzzy on this sort of stuff, but if I have something like:
char buf[256];
read(fd, buf, 256);
write(fd2, buf, 256);
Is there potential for error here, other than the cases where those functions return -1?
If it were to only read 40 characters, would it put a \0 after it? (And would write recognize that \0 and stop?
Also, if it were to read 256 characters, is there a \0 after those 256?
does read() add a '\0'?
No, it doesn't. It just reads.
From read()'s documentation:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf.
Is there potential for error here, other than the cases where those functions return -1?
read() might return 0 indicating end-of-file.
If reading (also from a socket descriptor) read() not necessarily reads as much bytes as it was told to do. So in this context do not just test the outcome of read against -1, but also compare it against the number of bytes the function was told to read.
A general note:
Functions do what is documented (at least for proper implementations of the C language). Both your assumptions (autonomously set a 0-termination, detect the latter) are not documented.
No.
Consider reading binary data (eg. a photo from a file): adding extra bytes would corrupt the data.
From the man page:
Synopsis
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
That is void *, not char *, because read() reads bytes, not characters. It reads zero bytes as well as any other value, and since blocks of bytes (as opposed to strings) aren't terminated, read() doesn't.
Does it have to?
Not unless the data that is successfully read from the file contains a '\0'...
Is there potential for error here, other than the cases where those functions return -1?
Yes. read returns the actual number of bytes read (or a negative value to indicate failure). If you choose to write more than that number of bytes into your other file, then you are writing potential garbage.

C, format file for data of HTTP response

I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.
My goal is to send a string back to the client with a HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.
The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors.".
This is the code that is supposed to read a file into httpData:
int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
fclose(file);
printf("httpData = %s\n", httpData);
Edit: This is what I send:
char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);
The data part produces ???? for the image but correct html for an html file.
Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
Thus, we might have a function:
int readHTTPData(const char *filename, size_t size, char *httpData)
{
FILE *fp = fopen(filename, "rb");
size_t n;
if (fp == 0)
return E_FILEOPEN;
n = fread(httpData, size, 1, fp);
fclose(fp);
if (n != 1)
return E_SHORTREAD;
fputs("httpData = ", stdout);
fwrite(httpData, size, 1, stdout);
putchar('\n');
return 0;
}
The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:
Open the file; report error if that fails.
Read the file in a single operation.
Close the file.
Report error if the read did not get all the data that was expected.
Report on the data that was read (debugging only — and printing binary data to standard output raw is not the best idea in the world, but it parallels what the code in the question does).
Report on success.
In the original code, there is a loop:
int i = 0;
...
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
This loop has a lot of problems:
You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
Consequently, when the last character has been read, the feof() reports 'false', but the fscanf() tries to read the next (non-existent) character, adds it to the buffer (probably as a letter such as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS).
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.
int i = 0;
int c;
while ((c = getc(file)) != EOF && i < size)
httpData[i++] = c;
You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition — either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can reverse the order of the size arguments. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though — don't skimp on input checking.)
At some point, for plain text data, you need to think about CRLF vs NL line endings. Text files handle that automatically; binary files do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings (but I'm not an expert on this — I've not done low-level HTTP stuff recently (not in this millennium), so the rules may have changed).

using read system call after a scanf

I am having a confusion regarding the following code,
#include<stdio.h>
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
read(stdin,buf,4); //attaching to stdin
printf("buffer is %s\n",buf);
return 1;
}
suppose on runtime I provided with the input 10abcd so as per my understanding following should happen:
scanf should place 10 in data
and abcd will still be on the stdin buffer
when read tries to read the stdin (already abcd is there) it should place the abcd into the buf
so printf should print abcd
but it is not happening ,printf showing no o/p
am I missing something here?
First of all read (stdin, ...) should give warnings (if you have them enabled) which you would be wise to heed. read() takes an integer as the first parameter specifying which channel to read from. stdin is of type FILE *.
Even if you changed it to read(0,..., this is not recommended practice. scanf is reading from FILE *stdin which is buffered from file handle 0. read (0, ...) reads directly from the underlying file handle and ignore any characters which were buffered. This will cause strange results unless stdin is set unbuffered.
Ignoring mechanical issues related to the syntax of the read() function call, there are two cases to consider:
Input is from a terminal.
Input is from a file.
Terminal
No data will be available for reading until the user hits return. At that point, the standard I/O library will read all the available data into the buffer associated with stdin (that would be "10abcd\n"). It will then parse the number, leaving the a in the buffer to be read later by other standard I/O functions.
When the read() occurs, it will also wait for the user to provide some input. It has no clue about the data in the stdin buffer. It will hang until the user hits return, and will then read the next lot of data, returning up to 4 bytes in the buffer (no null termination unless it so happens that the fourth character is an ASCII NUL '\0').
File
Actually, this isn't all that much different, except that instead of reading a line of data into the buffer, the standard I/O library will probably read an entire buffer full, (BUFSIZ bytes, which might be 512 or larger). It will then convert the 10 and leave the a for later use. (If the file is shorter than the buffer size, it will all be read into the stdin buffer.)
The read will then collect the next 4 bytes from the file. If the whole file was read already, then it will return nothing — 0 bytes read.
You need to record and check the return value from read(). You should also check the return value from scanf() to ensure it did actually read a number.
try... man read first.
read is declared as ssize_t read(int fd, void *buf, size_t count);
and stdin is declared as FILE *. thats the issue. use fread() instead and you will be sorted.
int main()
{
char buf[100]={'\0'};
int data=0;
scanf("%d",&data);
fread(buf, 1, 4, stdin);
printf("buffer is %s\n",buf);
return 1;
}
EDIT: Your understanding is almost correct but not totally.
To address your question properly, i will agree with Jonathen Laffer.
how your code works,
1) scanf should place 10 in data.
2) abcd will still be on the stdin buffer when you press ENTER.
3) then read() will again wait for entry and you have to again press ENTER to run program further.
4)now if you have entered anything before pressing ENTER for 2nd time the printf should print it else you will not get anything on output other than your printf statement.
Thats why i asked you to use fread instead. hope it helps.

Resources