I have a legacy function accepting a FILE* pointer in a library. The contents I would like to parse are actually in memory, not on disk.
So I came up with the following steps to work around this issue:
1. the data is in memory at this point
2. fopen a temporary file (using tmpnam or tmpfile) on disk for writing
3. fclose the file
4. fopen the same file again for reading (guaranteed to exist)
5. replace the stream's buffer using setvbuf(fp, buffer, mode, size)
6. do the legacy FILE* stuff
7. close the file
8. remove the temporary file
9. the data can be discarded
On Windows, it looks like this:
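FILE *fp = NULL;
char *buffer;    // the in-memory data
int bufferSize;  // its size in bytes
// set up the buffer here
// temporary file name
char tempName [L_tmpnam_s];
tmpnam_s(tempName, L_tmpnam_s);
// open/close/reopen
fopen_s(&fp, tempName,"wb");
fclose(fp);
fopen_s(&fp, tempName,"rb"); // reopen; fp is already closed, so freopen_s on it would be invalid
// replace the internal buffer
setvbuf(fp, buffer, _IONBF, bufferSize);
// _ptr/_cnt are private fields of the older (pre-Visual Studio 2015) MSVC CRT FILE struct
fp->_ptr = buffer;
fp->_cnt = bufferSize;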
// do the FILE* reading here
// close and remove tmp file
fclose(fp);
remove(tempName);
Works, but quite cumbersome. The main problems, aside from the backwardness of this approach, are:
the temporary name needs to be determined
the temporary file is actually written to disk
the temporary file needs to be removed afterwards
I'd like to keep things portable, so using Windows memory-mapped functions or boost's facilities is not an option. The problem is mainly that, while it is possible to convert a FILE* to an std::fstream, the reverse seems to be impossible, or at least not supported in C++98.
All suggestions welcome!
Update 1
Using a pipe/fdopen/setvbuf as suggested by Speed8ump and a bit of twiddling seems to work. It no longer creates files on disk, nor does it consume extra memory. One step closer, except that, for some reason, setvbuf is not working as expected. Manually fixing it up is possible, but of course not portable.
// create a pipe for reading, do not allocate memory
int pipefd[2];
_pipe(pipefd, 0, _O_RDONLY | _O_BINARY);
// open the read pipe for binary reading as a file
fp = _fdopen(pipefd[0], "rb");
// try to switch the buffer ptr and size to our buffer, (no buffering)
setvbuf(fp, buffer, _IONBF, bufferSize);
// for some reason, setvbuf does not set the correct ptr/sizes
fp->_ptr = buffer;
fp->_charbuf = fp->_bufsiz = fp->_cnt = bufferSize;
Update 2
Wow. So it seems that unless I dive into the MS-specific implementation CreateNamedPipe / CreateFileMapping, POSIX portability costs us an entire memcopy (of any size!), be it to file or into a pipe. Hopefully the compiler understands that this is just a temporary and optimizes this. Hopefully.
Still, we eliminated the silly device writing intermediate. Yay!
int pipefd[2];
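_pipe(pipefd, bufferSize, _O_BINARY); // Windows _pipe(): the second argument reserves the pipe's buffer size; POSIX pipe() takes only the fd array
FILE* in = fdopen(pipefd[0], "rb");
FILE* out = fdopen(pipefd[1], "wb");
// the actual copy; this blocks if bufferSize exceeds the pipe's capacity
fwrite(buffer, 1, bufferSize, out);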
fclose(out);
// fread(in), fseek(in), etc..
fclose(in);
You might try using a pipe and fdopen; that approach seems to be portable and in-memory, and you might still be able to do the setvbuf trick you are using.
Your setvbuf hack is a nice idea, but not portable. C11 (n1570):
7.21.5.6 The setvbuf function
Synopsis
#include <stdio.h>
int setvbuf(FILE * restrict stream,
char * restrict buf,
int mode, size_t size);
Description
[...] If buf is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf function [...] and the argument size specifies the size of the array; otherwise, size may determine the size of a buffer allocated by the setvbuf function. The contents of the array at any time are indeterminate.
There is neither a guarantee that the provided buffer is used at all, nor about what it contains at any point after the setvbuf call until the file is closed or setvbuf is called again (POSIX doesn't give more guarantees).
The easiest portable solution, I think, is using tmpfile, fwrite the data into that file, fseek to the beginning (I'm not sure if temporary files are guaranteed to be seekable, on my Linux system, it appears they are, and I'd expect them to be elsewhere), and pass the FILE pointer to the function. This still requires copying in memory, but I guess usually no writing of the data to the disk (POSIX, unfortunately, implicitly requires a real file to exist). A file obtained by tmpfile is deleted after closing.
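A minimal sketch of that tmpfile approach, assuming the data fits in memory and the legacy function only reads from the stream. open_memory_copy is a hypothetical helper name, not part of any library:

```c
#include <stdio.h>

/* Hypothetical helper: copy an in-memory buffer into an anonymous temporary
   file and return a stream positioned at its start, ready for a FILE*-based
   API. tmpfile() opens the file in "wb+" mode, and the file is removed
   automatically when the stream is closed. Returns NULL on error. */
FILE *open_memory_copy(const void *data, size_t size)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    if (fwrite(data, 1, size, fp) != size) {   /* the unavoidable copy */
        fclose(fp);
        return NULL;
    }
    rewind(fp);   /* switch from writing to reading, back at offset 0 */
    return fp;
}
```

The caller passes the returned FILE* to the legacy function and closes it afterwards. As said above, whether the bytes actually reach the disk is up to the platform.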
Related
We have a library that accepts a FILE* (CImg). For performance reasons we want to handle data that is already in memory without touching the disk. The target platform is Windows, which unfortunately supports neither fmemopen nor funopen.
char* buf = new char[sz];
FILE *fp = fopen("C:\\test.dat", "wb");
int r = setvbuf(fp, buf, _IOFBF, sz);
r = fwrite(src_buf, 1, sz, fp); // Here r contains right size
fseek(fp, 0, SEEK_END);
size_t sz2 = ftell(fp); // sz2 contains right size as well
rewind(fp);
// Test (something like this actually is somewhere deep inside library)
char* read_buf = new char[sz];
r = fread(read_buf, 1, sz, fp); // Zero!
Final fread() can't read anything... Any suggestions?
Possibly because you opened with "wb" (write-only) instead of "wb+" (read + write)
To help further, you can #include errno.h and print out the error code and string.
printf( "errno: %d, error:'%s'\n", errno, strerror( errno ) );
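A sketch of the question's test with that mode fix applied, assuming a hypothetical helper write_then_read_back and a caller-supplied path. The setvbuf call is left out: it is not needed for the read-back, and the standard does not guarantee the supplied buffer is used anyway.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write sz bytes of src to path, then read them back
   into read_buf through the same stream. "wb+" opens the file for reading
   AND writing ("wb" is write-only, which is why fread returned 0).
   Returns the number of bytes read back, or 0 on error. */
size_t write_then_read_back(const char *path, const char *src, size_t sz,
                            char *read_buf)
{
    FILE *fp = fopen(path, "wb+");
    if (fp == NULL) {
        printf("errno: %d, error:'%s'\n", errno, strerror(errno));
        return 0;
    }
    fwrite(src, 1, sz, fp);
    rewind(fp);   /* required between output and input on an update stream */
    size_t r = fread(read_buf, 1, sz, fp);
    if (r != sz && ferror(fp))
        printf("errno: %d, error:'%s'\n", errno, strerror(errno));
    fclose(fp);
    return r;
}
```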
For our specific case, we found a CImg plugin that does this.
As for the sample in the original question: at least the MSVC runtime flushes the written content to disk itself, so it isn't a valid replacement for fmemopen().
https://github.com/Snaipe/fmem is a wrapper for different platform/version specific implementations of in-memory files
It tries in sequence the following implementations:
open_memstream.
fopencookie, with growing dynamic buffer.
funopen, with growing dynamic buffer.
WinAPI temporary memory-backed file.
When no other means is available, fmem falls back to tmpfile().
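If you only need a read-only stream, the same fallback idea can be sketched in a few lines. This is not fmem's API; open_readonly_memory is a hypothetical name, and the sketch assumes that a POSIX.1-2008 system advertises fmemopen through _POSIX_VERSION:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>   /* for _POSIX_VERSION */
#endif

/* Hypothetical helper: open a read-only FILE* over buf. Prefers fmemopen()
   (no copy; buf must outlive the stream) and otherwise copies the data into
   an anonymous temporary file. Returns NULL on error. */
FILE *open_readonly_memory(void *buf, size_t size)
{
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 200809L
    return fmemopen(buf, size, "rb");
#else
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    if (fwrite(buf, 1, size, fp) != size) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
#endif
}
```

Some older fmemopen implementations reject a size of 0, so an empty buffer may need special-casing.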
I'm using C to write some data to a file. I want to erase the previous text written in the file in case it was longer than what I'm writing now.
I want to decrease the size of file or truncate until the end. How can I do this?
If you want to preserve the previous contents of the file up to some length (a length bigger than zero, which other answers provide), then POSIX provides the truncate() and ftruncate() functions for the job.
#include <unistd.h>
int ftruncate(int fildes, off_t length);
int truncate(const char *path, off_t length);
The name indicates the primary purpose - shortening a file. But if the specified length is longer than the previous length, the file grows (zero padding) to the new size. Note that ftruncate() works on a file descriptor, not a FILE *; you could use:
if (ftruncate(fileno(fp), new_length) != 0) ...error handling...
However, you should be aware that mixing file stream (FILE *) and file descriptor (int) access to a single file is apt to lead to confusion — see the comments for some of the issues. This should be a last resort.
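A minimal POSIX sketch of that last resort, assuming the stream was opened for writing. truncate_stream is a hypothetical name; the fflush before ftruncate keeps stdio from writing buffered data past the new end later:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the size of the file behind fp to new_length
   (shrinking it, or zero-extending it if new_length is larger).
   Returns 0 on success, -1 on error with errno set. */
int truncate_stream(FILE *fp, off_t new_length)
{
    if (fflush(fp) != 0)                 /* push pending stdio output first */
        return -1;
    if (ftruncate(fileno(fp), new_length) != 0)
        return -1;
    return fseeko(fp, 0, SEEK_END);      /* resync the stream with the new end */
}
```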
It is likely, though, that for your purposes, truncate on open is all you need, and for that, the options given by others will be sufficient.
For Windows, there is a function SetEndOfFile() and a related function SetFileValidData() that can do a similar job, but using a different interface. Basically, you seek to where you want to set the end of file and then call the function.
There's also a function _chsize() as documented in the answer by sofr.
On Windows there's no <unistd.h> header, but you can still truncate a file by using
_chsize( fileno(f), size);
That's a function of your operating system. The standard POSIX way to do it is:
open("file", O_TRUNC | O_WRONLY);
If this is to run under some flavor of UNIX, these APIs should be available:
#include <unistd.h>
#include <sys/types.h>
int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);
According to the "man truncate" on my Linux box, these are POSIX-conforming. Note that these calls will actually increase the size of the file (!) if you pass a length greater than the current length.
<edit>
Ah, you edited your post: you're using C. When you open the file, open it with the mode "w+" like so, and it will be truncated, ready for writing:
FILE* f = fopen("C:\\gabehabe.txt", "w+");
fclose(f);
</edit>
To truncate a file in C++, you can simply create an ofstream object to the file, using ios_base::trunc as the file mode to truncate it, like so:
ofstream x("C:\\gabehabe.txt", ios_base::trunc);
If you want to truncate the entire file, opening the file up for writing does that for you. Otherwise, you have to open the file for reading, and read the parts of the file you want to keep into a temporary variable, and then output it to wherever you need to.
Truncate entire file:
FILE *file = fopen("filename.txt", "w"); //automatically clears the entire file for you.
Truncate part of the file:
FILE *inFile = fopen("filename.txt", "r");
//read in the data you want to keep
fclose(inFile);
FILE *outFile = fopen("filename.txt", "w");
//output back the data you want to keep into the file, or what you want to output.
fclose(outFile);
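A fleshed-out sketch of that read-then-rewrite approach, assuming the part to keep is a prefix of keep_bytes bytes that fits in memory. keep_prefix is a hypothetical helper name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: keep only the first keep_bytes bytes of the file at
   path. Returns 0 on success, -1 on error. */
int keep_prefix(const char *path, size_t keep_bytes)
{
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    char *data = malloc(keep_bytes ? keep_bytes : 1);
    if (data == NULL) {
        fclose(in);
        return -1;
    }
    size_t got = fread(data, 1, keep_bytes, in);   /* read the part to keep */
    fclose(in);

    FILE *out = fopen(path, "wb");                 /* "w" truncates to zero */
    if (out == NULL) {
        free(data);
        return -1;
    }
    size_t put = fwrite(data, 1, got, out);        /* write the kept part back */
    free(data);
    if (fclose(out) != 0 || put != got)
        return -1;
    return 0;
}
```

If the program dies between the truncating fopen and the write, the kept data is lost; writing to a temporary file and renaming it over the original avoids that window.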
In the book Advanced Programming in the UNIX Environment (2nd edition), the author wrote in Section 5.5 (stream operations of the standard I/O library) that:
When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.
Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
I got confused about this. Could anyone explain it a little? For example, in what situations would input and output function calls that violate the above restrictions cause unexpected behavior in the program? I guess the reason for the restrictions may be related to the buffering in the library, but I'm not very clear on it.
You aren't allowed to intersperse input and output operations. For example, you can't use formatted input to seek to a particular point in the file, then start writing bytes starting at that point. This allows the implementation to assume that at any time, the sole I/O buffer will only contain either data to be read (to you) or written (to the OS), without doing any safety checks.
f = fopen( "myfile", "r+" ); /* open for read and write */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" ); /* write some data - illegal */
This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.
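A small sketch of the legal version, assuming the file starts with a one-line header. overwrite_after_header is a hypothetical name; the fseek(f, 0, SEEK_CUR) is the call that makes the switch from input to output well-defined without moving the position:

```c
#include <stdio.h>

/* Hypothetical helper: skip the first line of the file at path, then
   overwrite the bytes that follow it with text. Returns 0 on success. */
int overwrite_after_header(const char *path, const char *text)
{
    FILE *f = fopen(path, "r+");       /* read and write; file must exist */
    if (f == NULL)
        return -1;
    int c;
    while ((c = fgetc(f)) != EOF && c != '\n')
        ;                              /* input: scan past the header line */
    if (fseek(f, 0, SEEK_CUR) != 0) {  /* input -> output: reposition first */
        fclose(f);
        return -1;
    }
    fputs(text, f);                    /* output: now allowed */
    return fclose(f);
}
```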
Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
It's not clear what you're asking.
Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.
Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"
Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and 4 in the C standard linked above.)
But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:
#include <stdio.h>
int main(int argc, char *argv[]) {
FILE *fp = fopen("foo", "r+");
fseek(fp, 0, SEEK_SET);
fwrite("foo", 1, 3, fp);
fseek(fp, 0, SEEK_SET);
fwrite("bar", 1, 3, fp);
char buf[4] = { 0 };
size_t ret = fread(buf, 1, 3, fp);
printf("%d %s\n", (int)ret, buf);
}
So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.
If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.
Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.
You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".
There are four ways you could implement something like stdio:
1. Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
2. Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
3. Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead-simple, and as efficient as possible.
4. Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, almost as efficient, and a lot safer, but not really any better in any way other than safety.
So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.
You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end 1970s systems, and the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. But, most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy stuff like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.
If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.
For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.
Reason 1
Finding the real file position to start from.
Because of stdio's buffering, the stream position may differ from the OS file position. When you read 1 byte, stdio sets the stream position to 1, but to fill its buffer it may have read 4096 bytes from the underlying file, so the OS records its file position as 4096. When you switch to output, the implementation has to choose which position to use.
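A POSIX sketch that makes the two positions visible, assuming a regular file and a default-buffered stream. show_positions is a hypothetical name, and the descriptor position it reports is implementation-defined (typically a whole buffer, or the file size for a small file):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read one byte through stdio, then report stdio's
   logical position and the underlying descriptor's position. Both outputs
   are -1 if the file cannot be opened. */
void show_positions(const char *path, long *stream_pos, off_t *fd_pos)
{
    *stream_pos = -1;
    *fd_pos = -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return;
    fgetc(fp);                                  /* read a single byte */
    *stream_pos = ftell(fp);                    /* stdio's view: 1 */
    *fd_pos = lseek(fileno(fp), 0, SEEK_CUR);   /* OS view: after the refill */
    fclose(fp);
}
```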
Reason 2
Finding the right buffer cursor to start from.
tl;dr,
if an underlying implementation only uses a single shared buffer for both read and write, you have to flush the buffer when changing IO direction.
Take this glibc, used in Chromium OS, to demonstrate how fwrite, fseek, and fflush handle the single shared buffer.
fwrite fill buffer impl:
fill_buffer:
while (to_write > 0)
{
register size_t n = to_write;
if (n > buffer_space)
n = buffer_space;
buffer_space -= n;
written += n;
to_write -= n;
if (n < 20)
while (n-- > 0)
*stream->__bufp++ = *p++;
else
{
memcpy ((void *) stream->__bufp, (void *) p, n);
stream->__bufp += n;
p += n;
}
if (to_write == 0)
/* Done writing. */
break;
else if (buffer_space == 0)
{
/* We have filled the buffer, so flush it. */
if (fflush (stream) == EOF)
break;
From this snippet we can see that if the buffer is full, it is flushed.
Let's take a look at fflush
int
fflush (stream)
register FILE *stream;
{
if (stream == NULL) {...}
if (!__validfp (stream) || !stream->__mode.__write)
{
__set_errno (EINVAL);
return EOF;
}
return __flshfp (stream, EOF);
}
It uses __flshfp:
/* Flush the buffer for FP and also write C if FLUSH_ONLY is nonzero.
This is the function used by putc and fflush. */
int
__flshfp (fp, c)
register FILE *fp;
int c;
{
/* Make room in the buffer. */
(*fp->__room_funcs.__output) (fp, flush_only ? EOF : (unsigned char) c);
}
By default, __room_funcs.__output is flushbuf, which does this:
/* Write out the buffered data. */
wrote = (*fp->__io_funcs.__write) (fp->__cookie, fp->__buffer,
to_write);
Now we are close. What's __write? Tracing the default settings mentioned above, it's __stdio_write:
int
__stdio_write (cookie, buf, n)
void *cookie;
register const char *buf;
register size_t n;
{
const int fd = (int) cookie;
register size_t written = 0;
while (n > 0)
{
int count = __write (fd, buf, (int) n);
if (count > 0)
{
buf += count;
written += count;
n -= count;
}
else if (count < 0
#if defined (EINTR) && defined (EINTR_REPEAT)
&& errno != EINTR
#endif
)
/* Write error. */
return -1;
}
return (int) written;
}
__write is the write(2) system call.
As we can see, fwrite uses only a single buffer. If you change direction, that buffer may still hold the previously written contents; as the code above shows, calling fflush empties it.
The same applies to fseek:
/* Move the file position of STREAM to OFFSET
bytes from the beginning of the file if WHENCE
is SEEK_SET, the end of the file is it is SEEK_END,
or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
register FILE *stream;
long int offset;
int whence;
{
...
if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
return EOF;
...
/* O is now an absolute position, the new target. */
stream->__target = o;
/* Set bufp and both end pointers to the beginning of the buffer.
The next i/o will force a call to the input/output room function. */
stream->__bufp
= stream->__get_limit = stream->__put_limit = stream->__buffer;
...
}
It flushes any pending output and then resets the buffer pointers, which means any buffered read data is discarded after this call.
This obeys the C99 rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
I'm looking for a way to pass in a FILE * to some function so that the function can write to it with fprintf. This is easy if I want the output to turn up in an actual file on disk, say. But what I'd like instead is to get all the output as a string (char *). The kind of API I'd like is:
/** Create a FILE object that will direct writes into an in-memory buffer. */
FILE *open_string_buffer(void);
/** Get the combined string contents of a FILE created with open_string_buffer
(result will be allocated using malloc). */
char *get_string_buffer(FILE *buf);
/* Sample usage. */
FILE *buf;
buf = open_string_buffer();
do_some_stuff(buf); /* do_some_stuff will use fprintf to write to buf */
char *str = get_string_buffer(buf);
fclose(buf);
free(str);
The glibc headers seem to indicate that a FILE can be set up with hook functions to perform the actual reading and writing. In my case I think I want the write hook to append a copy of the string to a linked list, and for there to be a get_string_buffer function that figures out the total length of the list, allocates memory for it, and then copies each item into it in the correct place.
I'm aiming for something that can be passed to a function such as do_some_stuff without that function needing to know anything other than that it's got a FILE * it can write to.
Is there an existing implementation of something like this? It seems like a useful and C-friendly thing to do -- assuming I'm right about the FILE extensibility.
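On glibc, the hook mechanism the question mentions is exposed as fopencookie() (a GNU extension, so not portable). A sketch, assuming glibc: instead of a linked list, the write hook appends to one growing buffer. Because the cookie cannot be recovered from a FILE*, this hypothetical open_string_sink takes a caller-owned struct rather than matching the proposed API exactly:

```c
#define _GNU_SOURCE   /* fopencookie is a glibc extension */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sink: everything written to the stream, NUL-terminated. */
struct string_sink {
    char  *data;
    size_t len, cap;
};

/* Write hook: append size bytes; returning 0 reports an error to stdio. */
static ssize_t sink_write(void *cookie, const char *buf, size_t size)
{
    struct string_sink *s = cookie;
    if (s->len + size + 1 > s->cap) {          /* grow, keep room for '\0' */
        size_t cap = s->cap ? s->cap : 64;
        while (cap < s->len + size + 1)
            cap *= 2;
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return 0;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, buf, size);
    s->len += size;
    s->data[s->len] = '\0';
    return (ssize_t)size;
}

/* Open a write-only stream feeding s. The caller frees s->data after
   fclose(); data stays NULL if nothing was written. */
FILE *open_string_sink(struct string_sink *s)
{
    cookie_io_functions_t io = { .write = sink_write };
    s->data = NULL;
    s->len = s->cap = 0;
    return fopencookie(s, "w", io);
}
```

The stream is buffered, so the hook only runs on fflush() or fclose(); read s->data after one of those.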
If portability is not important for you, you can take a look at fmemopen and open_memstream. They started as GNU extensions, so for a long time they were only available on glibc systems, but they are now part of POSIX.1-2008 (fmemopen and open_memstream).
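A sketch of the question's use case with open_memstream, assuming a POSIX.1-2008 system. do_some_stuff is a stand-in for the question's function and capture_output is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the question's function: it only knows it has a FILE*. */
static void do_some_stuff(FILE *out)
{
    fprintf(out, "answer=%d\n", 42);
    fputs("done", out);
}

/* Hypothetical helper: run fn on an in-memory stream and return everything
   it wrote as a malloc'd, NUL-terminated string (NULL on error). */
char *capture_output(void (*fn)(FILE *))
{
    char *buf = NULL;
    size_t len = 0;
    FILE *fp = open_memstream(&buf, &len);
    if (fp == NULL)
        return NULL;
    fn(fp);
    if (fclose(fp) != 0) {   /* fclose makes buf and len final */
        free(buf);
        return NULL;
    }
    return buf;              /* the caller frees it */
}
```

To keep the question's order (get the string, then fclose), call fflush(fp) first: buf and len are valid after a flush, but the stream still owns the buffer until it is closed, so copy it.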
I'm not sure if it's possible to non-portably extend FILE objects, but if you are looking for something a little bit more POSIX friendly, you can use pipe and fdopen.
It's not exactly the same as having a FILE* that returns bytes from a buffer, but it certainly is a FILE* with programmatically determined contents.
int fd[2];
FILE *in_pipe;
if (pipe(fd))
{
/* TODO: handle error */
}
in_pipe = fdopen(fd[0], "r");
if (!in_pipe)
{
/* TODO: handle error */
}
From there you will want to write your buffer into fd[1] using write(). Be careful with this step, though: write() may block if the pipe's buffer is full (i.e. someone needs to read the other end), and you might get EINTR if your process receives a signal while writing. Also watch out for SIGPIPE, which is raised when the other end closes the pipe. For your use case, you might want to write the buffer from a separate thread to avoid blocking, and make sure you handle SIGPIPE.
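A single-threaded sketch of those steps, assuming the data fits in the pipe's capacity (implementation-defined; often 64 KiB on Linux), so write() never blocks. open_pipe_reader is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return a readable, non-seekable FILE* whose contents
   are the size bytes at buf, or NULL on error. Larger data would block in
   write() unless another thread drains the read end. */
FILE *open_pipe_reader(const char *buf, size_t size)
{
    int fd[2];
    if (pipe(fd) != 0)
        return NULL;
    size_t done = 0;
    while (done < size) {                  /* write() may be partial */
        ssize_t n = write(fd[1], buf + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            close(fd[0]);
            close(fd[1]);
            return NULL;
        }
        done += (size_t)n;
    }
    close(fd[1]);                          /* reader sees EOF after the data */
    FILE *in = fdopen(fd[0], "r");
    if (in == NULL)
        close(fd[0]);
    return in;
}
```

Since the read end stays open while writing, SIGPIPE cannot occur here.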
Of course, this won't create a seekable FILE*...
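I'm not sure I understand why you want to mess with FILE *. Couldn't you simply write to a file and then load it into a string?
#include <stdio.h>
#include <stdlib.h>
char *get_file_in_buf(const char *filename) {
    char *buffer = NULL;
    FILE *fp = fopen(filename, "rb");
    if (!fp) return NULL;
    fseek(fp, 0, SEEK_END);   /* get file size (fstat would also work) */
    long size = ftell(fp);
    rewind(fp);
    if (size >= 0 && (buffer = malloc((size_t)size + 1)) != NULL)  /* allocate buffer */
        buffer[fread(buffer, 1, (size_t)size, fp)] = '\0';          /* read buffer from file */
    fclose(fp);
    return buffer;   /* NUL-terminated; caller must free() */
}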
If you only want to "write" formatted text into a string, another option could be to handle an extensible buffer using snprintf() (see the answers to this SO question for a suggestion on how to handle this: Resuming [vf]?nprintf after reaching the limit).
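A sketch of that extensible-buffer idea, relying on C99 vsnprintf returning the length that would have been written. struct strbuf and strbuf_printf are hypothetical names, and unlike a FILE* this cannot be handed to code that expects fprintf:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable string; data is NUL-terminated once non-empty. */
struct strbuf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append printf-style output to sb. Returns 0 on success, -1 on error. */
int strbuf_printf(struct strbuf *sb, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int needed = vsnprintf(NULL, 0, fmt, ap);   /* measure first */
    va_end(ap);
    if (needed < 0)
        return -1;
    size_t want = sb->len + (size_t)needed + 1;
    if (want > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 64;
        while (cap < want)
            cap *= 2;
        char *p = realloc(sb->data, cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = cap;
    }
    va_start(ap, fmt);                          /* a va_list can't be reused */
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    sb->len += (size_t)needed;
    return 0;
}
```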
If, instead, you want to create a type that can be passed transparently to any function taking a FILE * to make them act on string buffers, it's a much more complex matter ...