I know that fwrite takes the following parameters:
fwrite ( const void * ptr, size_t size, size_t count, FILE * stream );
As far as I know, size_t is a typedef and nothing else than:
typedef unsigned long size_t;
Is it possible to use values greater than size_t for count and write?
And if it is not could I connect the written blocks somehow?
No, fwrite accepts only values that fit in the size_t type.
There may be implementation-specific ways to write more but, for standard C, the approach is generally just to do sequential fwrite calls. Each subsequent call will append to what you've already written.
And keep in mind that size_t is a distinct type. It may be defined as an unsigned long is some implementations but that's not guaranteed.
From an application programmers point of view a file is a contiguous series of bytes. Successive writes will position the data sequentially onto a file. (This comment is necessary because some will argue details NOT relevant to your question).
Thus:
fwrite(&user_record1, sizeof(user_record1), 1, fp);
fwrite(&user_record2, sizeof(user_record2), 1, fp);
Results in two user records, one immediately following the other, on the file.
If you have a very large record, then divide it into two smaller records, as:
fwrite(&user_record_parta, sizeof(user_record1), 1, fp);
fwrite(&user_record_partb, sizeof(user_record2), 1, fp);
However, I would question an application design that uses such large records. Perhaps what you are really doing in the application is writing an array of user records and that array grows really large. If this is the case, write each entry of the array, rather than the whole array.
In order to use fwrite, the entire object you're writing must be in the object pointed to by ptr. Unless you have a really messed-up C implementation, it's impossible to have an object larger than the maximum value of size_t, so trying to write more bytes than that would be a programming error, since the pointed-to object is not actually that large anyway.
If you use greater values than ULONG_MAX it will just wrap around to 0.
What you can do is write ULONG_MAX Bytes, seek to the end position and continue writing, then do that over in a loop, until you've written all your data.
Related
This question already has answers here:
How does fread really work?
(7 answers)
Closed 9 years ago.
So I was trying to write something to stdout from my p->payload which is supposed to be a pointer to a data. And block_size is 256.
My problem is my stdout would always be exactly the same if I switch p->block_size and sizeof(char). Can someone tell my why? Thanks.
fwrite(p->payload, p->block_size,sizeof(char),stdout);
The difference is the value that it returns.
fwrite returns the number of data elements written.
If you execute
count = fwrite(ptr, 1, N, stdout);
then count will contain the exact number of bytes successfully written. If you do:
count = fwrite(ptr, N, 1, stdout);
then count will contain 1 if the entire write succeeded, or 0 if any of it failed.
The distinction may be useful if you need to distinguish between partial success and complete failure. If a partial write is no better than not writing anything at all, you can use the second form.
(There's not likely to be any significant difference in performance.)
See also this similar question about fread.
In general, just assume size * items bytes will be written.
This seems to be an attempt to mirror the fread API, which also has size and bytes parameters. For fread, this is more useful, as (for instance) you can ask it to read one item of 1000 bytes--meaning, "I want exactly 1000 bytes"--or you can ask it to read 1000 items of one byte apiece--meaning "I'll take up to 1000 bytes; less data than that is OK." Obviously, lots of useful combinations can be used here if your data being read is fixed-size.
In the case of fwrite, though, the only time you'd expect less than size * items bytes to be written would be in an error condition. In the case of "disk full" errors, you could hypothetically choose size and items such that you write as many complete items as possible, but never write a half-item; however, I've never seen anyone actually write code that works this way and doesn't just fail on disk full.
According to fwrite(3):
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
The function fwrite() writes nmemb elements of data, each size bytes long, to the stream pointed to by stream, obtaining them from the location given by ptr.
This means fwrite() will write 'size * nmemb' bytes from ptr to stream, because 'size * nmemb' equals 'nmemb * size', so you have the same number of bytes written to stream after switched size and nmemb.
I have a very large, known number of bytes on stdin, and wish to discard a large (also known) number of them before reading the portion of interest (in other words, I want to fseek forward by a large integer, but fseek isn't defined for pipes). The simplest way to achieve this seems to be a large number of calls to fgetc, and the first alternative is to use a single call to fread with a large scratch pointer allocated to store the result. The first is very slow, and the second uses a potentially unbounded amount of memory for no good reason. Making multiple smaller reads solves the unbounded memory usage issue, but introduces a free parameter (the chunk size) which probably has a different fastest value for every machine and OS combination.
Are there any alternatives that achieve this goal in a neat, efficient fashion? POSIX is assumed.
There is no way to "skip" data on a pipe - you have to read it.
If it's a very large block, you will want to use a medium-sized buffer (as a compromise between overhead and memory usage), something like this:
size_t dataToRead = some_large_number;
while(dataToRead)
{
char buffer[4096];
size_t toread = min(sizeof(buffer), dataToRead);
size_t nread = fread(buffer, 1, toread, stdin);
dataToRead -= nread;
}
The size, 4096 is a rather arbitrary choice - but it's big enough not to cause a HUGE number of reads to the input, and small enough to not use crazy amounts of stack-space. It is unlikely that you will gain/loose much from changing this size.
I am wondering if this is the best way to go about solving my problem.
I know the values for particular offsets of a binary file where the information I want is held...What I want to do is jump to the offsets and then read a certain amount of bytes, starting from that location.
After using google, I have come to the conclusion that my best bet is to use fseek() to move to the position of the offset, and then to use fread() to read an amount of bytes from that position.
Am I correct in thinking this? And if so, how is best to go about doing so? i.e. how to incorporate the two together.
If I am not correct, what would you suggest I do instead?
Many thanks in advance for your help.
Matt
Edit:
I followed a tutorial on fread() and adjusted it to the following:
`#include <stdio.h>
int main()
{
FILE *f;
char buffer[11];
if (f = fopen("comm_array2.img", "rt"))
{
fread(buffer, 1, 10, f);
buffer[10] = 0;
fclose(f);
printf("first 10 characters of the file:\n%s\n", buffer);
}
return 0;
}`
So I used the file 'comm_array2.img' and read the first 10 characters from the file.
But from what I understand of it, this goes from start-of-file, I want to go from some-place-in-file (offset)
Is this making more sense?
Edit Number 2:
It appears that I was being a bit dim, and all that is needed (it would seem from my attempt) is to put the fseek() before the fread() that I have in the code above, and it seeks to that location and then reads from there.
If you are using file streams instead of file descriptors, then you can write yourself a (simple) function analogous to the POSIX pread() system call.
You can easily emulate it using streams instead of file descriptors1. Perhaps you should write yourself a function such as this (which has a slightly different interface from the one I suggested in a comment):
size_t fpread(void *buffer, size_t size, size_t mitems, size_t offset, FILE *fp)
{
if (fseek(fp, offset, SEEK_SET) != 0)
return 0;
return fread(buffer, size, nitems, fp);
}
This is a reasonable compromise between the conventions of pread() and fread().
What would the syntax of the function call look like? For example, reading from the offset 732 and then again from offset 432 (both being from start of the file) and filestream called f.
Since you didn't say how many bytes to read, I'm going to assume 100 each time. I'm assuming that the target variables (buffers) are buffer1 and buffer2, and that they are both big enough.
if (fpread(buffer1, 100, 1, 732, f) != 1)
...error reading at offset 732...
if (fpread(buffer2, 100, 1, 432, f) != 1)
...error reading at offset 432...
The return count is the number of complete units of 100 bytes each; either 1 (got everything) or 0 (something went awry).
There are other ways of writing that code:
if (fpread(buffer1, sizeof(char), 100, 732, f) != 100)
...error reading at offset 732...
if (fpread(buffer2, sizeof(char), 100, 432, f) != 100)
...error reading at offset 432...
This reads 100 single bytes each time; the test ensures you got all 100 of them, as expected. If you capture the return value in this second example, you can know how much data you did get. It would be very surprising if the first read succeeded and the second failed; some other program (or thread) would have had to truncate the file between the two calls to fpread(), but funnier things have been known to happen.
1 The emulation won't be perfect; the pread() call provides guaranteed atomicity that the combination of fseek() and fread() will not provide. But that will seldom be a problem in practice, unless you have multiple processes or threads concurrently updating the file while you are trying to position and read from it.
It frequently depends on the distance between the parts you care about. If you're only skipping over/ignoring a few bytes between the parts you care about, it's often easier to just read that data and ignore what you read, rather than using fseek to skip past it. A typical way to do this is define a struct holding both the data you care about, and place-holders for the ones you don't care about, read in the struct, and then just use the parts you care about:
struct whatever {
long a;
long ignore;
short b;
} w;
fread(&w, 1, sizeof(w), some_file);
// use 'w.a' and 'w.b' here.
If there's any great distance between the parts you care about, though, chances are that your original idea of using fseek to get to the parts that matter will be simpler.
Your theory sounds correct. Open, seek, read, close.
Create a struct to for the data you want to read and pass a pointer to read() of struct's allocated memory. You'll likely need #pragma pack(1) or similar on the struct to prevent misalignment problems.
I have a text file called test.txt
Inside it will be a single number, it may be any of the following:
1
2391
32131231
3123121412
I.e. it could be any size of number, from 1 digit up to x digits.
The file will only have 1 thing in it - this number.
I want a bit of code using fread() which will read that number of bytes from the file and put it into an appropriately sized variable.
This is to run on an embedded device; I am concerned about memory usage.
How to solve this problem?
You can simply use:
char buffer[4096];
size_t nbytes = fread(buffer, sizeof(char), sizeof(buffer), fp);
if (nbytes == 0)
...EOF or other error...
else
...process nbytes of data...
Or, in other words, provide yourself with a data space big enough for any valid data and then record how much data was actually read into the string. Note that the string will not be null terminated unless either buffer contained all zeroes before the fread() or the file contained a zero byte. You cannot rely on a local variable being zeroed before use.
It is not clear how you want to create the 'appropriately sized variable'. You might end up using dynamic memory allocation (malloc()) to provide the correct amount of space, and then return that allocated pointer from the function. Remember to check for a null return (out of memory) before using it.
If you want to avoid over-reading, fread is not the right function. You probably want fscanf with a conversion specifier along the lines of %100[0123456789]...
One way to achieve this is to use fseek to move your file stream location to the end of the file:
fseek(file, SEEK_END, SEEK_SET);
and then using ftell to get the position of the cursor in the file — this returns the position in bytes so you can then use this value to allocate a suitably large buffer and then read the file into that buffer.
I have seen warnings saying this may not always be 100% accurate but I've used it in several instances without a problem — I think the issues could be dependant on specific implementations of the functions on certain platforms.
Depending on how clever you need to be with the number conversion... If you do not need to be especially clever and fast, you can read it a character at a time with getc(). So,
- start with a variable initialized to 0.
- Read a character, multiply variable by 10 and add new digit.
- Then repeat until done.
Get a bigger sized variable as needed along the way or start with your largest sized variable and then copy it into the smallest size that fits after you finish.
We had a discussion here at work regarding why fread() and fwrite() take a size per member and count and return the number of members read/written rather than just taking a buffer and size. The only use for it we could come up with is if you want to read/write an array of structures which aren't evenly divisible by the platform alignment and hence have been padded but that can't be so common as to warrant this choice in design.
From fread(3):
The function fread() reads nmemb elements of data, each size bytes long,
from the stream pointed to by stream, storing them at the location given
by ptr.
The function fwrite() writes nmemb elements of data, each size bytes
long, to the stream pointed to by stream, obtaining them from the location
given by ptr.
fread() and fwrite() return the number of items successfully read or written
(i.e., not the number of characters). If an error occurs, or the
end-of-file is reached, the return value is a short item count (or zero).
The difference in fread(buf, 1000, 1, stream) and fread(buf, 1, 1000, stream) is, that in the first case you get only one chunk of 1000 bytes or nothing, if the file is smaller and in the second case you get everything in the file less than and up to 1000 bytes.
It's based on how fread is implemented.
The Single UNIX Specification says
For each object, size calls shall be
made to the fgetc() function and the
results stored, in the order read, in
an array of unsigned char exactly
overlaying the object.
fgetc also has this note:
Since fgetc() operates on bytes,
reading a character consisting of
multiple bytes (or "a multi-byte
character") may require multiple calls
to fgetc().
Of course, this predates fancy variable-byte character encodings like UTF-8.
The SUS notes that this is actually taken from the ISO C documents.
This is pure speculations, however back in the days(Some are still around) many filesystems were not simple byte streams on a hard drive.
Many file systems were record based, thus to satisfy such filesystems in an efficient manner, you'll have to specify the number of items ("records"), allowing fwrite/fread to operate on the storage as records, not just byte streams.
Here, let me fix those functions:
size_t fread_buf( void* ptr, size_t size, FILE* stream)
{
return fread( ptr, 1, size, stream);
}
size_t fwrite_buf( void const* ptr, size_t size, FILE* stream)
{
return fwrite( ptr, 1, size, stream);
}
As for a rationale for the parameters to fread()/fwrite(), I've lost my copy of K&R long ago so I can only guess. I think that a likely answer is that Kernighan and Ritchie may have simply thought that performing binary I/O would be most naturally done on arrays of objects. Also, they may have thought that block I/O would be faster/easier to implement or whatever on some architectures.
Even though the C standard specifies that fread() and fwrite() be implemented in terms of fgetc() and fputc(), remember that the standard came into existence long after C was defined by K&R and that things specified in the standard might not have been in the original designers ideas. It's even possible that things said in K&R's "The C Programming Language" might not be the same as when the language was first being designed.
Finally, here's what P.J. Plauger has to say about fread() in "The Standard C Library":
If the size (second) argument is greater than one, you cannot determine
whether the function also read up to size - 1 additional characters beyond what it reports.
As a rule, you are better off calling the function as fread(buf, 1, size * n, stream); instead of
fread(buf, size, n, stream);
Bascially, he's saying that fread()'s interface is broken. For fwrite() he notes that, "Write errors are generally rare, so this is not a major shortcoming" - a statement I wouldn't agree with.
Likely it goes back to the way that file I/O was implemented. (back in the day) It might have been faster to write / read to files in blocks then to write everything at once.
Having separate arguments for size and count could be advantageous on an implementation that can avoid reading any partial records. If one were to use single-byte reads from something like a pipe, even if one was using fixed-format data, one would have to allow for the possibility of a record getting split over two reads. If could instead requests e.g. a non-blocking read of up to 40 records of 10 bytes each when there are 293 bytes available, and have the system return 290 bytes (29 whole records) while leaving 3 bytes ready for the next read, that would be much more convenient.
I don't know to what extent implementations of fread can handle such semantics, but they could certainly be handy on implementations that could promise to support them.
I think it is because C lacks function overloading. If there was some, size would be redundant. But in C you can't determine a size of an array element, you have to specify one.
Consider this:
int intArray[10];
fwrite(intArray, sizeof(int), 10, fd);
If fwrite accepted number of bytes, you could write the following:
int intArray[10];
fwrite(intArray, sizeof(int)*10, fd);
But it is just inefficient. You will have sizeof(int) times more system calls.
Another point that should be taked into consideration is that you usually don't want a part of an array element be written to a file. You want the whole integer or nothing. fwrite returns a number of elements succesfully written. So if you discover that only 2 low bytes of an element is written what would you do?
On some systems (due to alignment) you can't access one byte of an integer without creating a copy and shifting.