C read binary stdin

I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but when I examine the binary chunks I've read, they just don't look right; I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
    memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
    // more stuff
    buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc.), and what do I store it in? Is char[] okay?
EDIT: I'm able to read the binary in, but none of the answers produce the bits in the correct order; they are all mangled up. I suspect endianness issues and problems with reading and moving 8 bits (1 char) around at a time. This needs to work on Windows in C.

What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns a file pointer; on success it is the same stream (now in binary mode), and on failure it is NULL, so check it but otherwise don't use it for anything. After that, use fread() as has been mentioned.
As to your concerns: if you use fread() you will be reading 4 chars, which is the best you can do in C. char is guaranteed to be at least 8 bits, but some historical and embedded platforms have 16-bit chars (some even have 18 or worse), so strictly speaking you may not be reading exactly "32 bits". If you use fgets() you will never read 4 bytes: you will read at most 3 (fewer if one of them is a newline), and the 4th byte will be '\0', because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
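Putting the pieces together, here is a minimal sketch of reading stdin in 32-bit chunks with freopen() plus fread(). The processing and the ntohl() note are illustrative, and whether freopen(NULL, ...) is permitted is implementation-defined, as quoted above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word;

    /* Reopen stdin in binary mode so Windows does not translate \r\n. */
    if (freopen(NULL, "rb", stdin) == NULL)
        return 1;

    /* Read exactly 4 bytes (one 32-bit chunk) per iteration. */
    while (fread(&word, sizeof word, 1, stdin) == 1) {
        /* word now holds the 4 raw bytes in host memory; apply ntohl()
           or manual byte swapping here if the input is big-endian. */
    }
    return 0;
}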

Consider using a SET_BINARY_MODE macro built on setmode():
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
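As a short usage sketch, assuming the macro definition above (the fread() loop is just for illustration; fileno() is the POSIX name, and MSVC also accepts _fileno()):

#include <stdio.h>
/* ... SET_BINARY_MODE definition from above ... */

int main(void)
{
    unsigned char buf[4];

    SET_BINARY_MODE(fileno(stdin));   /* expands to a no-op on non-Windows platforms */

    while (fread(buf, 1, sizeof buf, stdin) == sizeof buf) {
        /* ... process one 32-bit chunk ... */
    }
    return 0;
}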

I had to piece the answer together from the various comments from the kind people above, so here is a fully working sample. It is Windows-only, but you can probably translate the Windows-specific parts to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf, 1, nleft, f);
if (w == 0)
{
printf("error: unable to write %d bytes to %s\n", nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}

For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
    int result;

    // Set "stdin" to have binary mode:
    result = _setmode( _fileno( stdin ), _O_BINARY );
    if( result == -1 )
        perror( "Cannot set mode" );
    else
        printf( "'stdin' successfully changed to binary mode\n" );
}

fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
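Along those lines, a minimal POSIX sketch: read() may return fewer bytes than requested, so the loop below keeps reading until a full 4-byte word is collected (the helper name and the big-endian assumption are mine):

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly 4 bytes into *word; returns 1 on success, 0 on EOF or error. */
static int read_word(int fd, uint32_t *word)
{
    unsigned char buf[4];
    size_t got = 0;

    while (got < sizeof buf) {
        ssize_t n = read(fd, buf + got, sizeof buf - got);
        if (n <= 0)
            return 0;                /* EOF or error before a full word */
        got += (size_t)n;
    }
    /* Assemble the word explicitly so the result does not depend on host
       endianness (this assumes the input words are stored big-endian). */
    *word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
          | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    return 1;
}

int main(void)
{
    uint32_t w;
    while (read_word(STDIN_FILENO, &w))
        printf("%08x\n", (unsigned)w);   /* placeholder processing */
    return 0;
}

Calling read_word(STDIN_FILENO, &w) repeatedly yields one 32-bit word per call until EOF.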

fread() suits best for reading binary data.
Yes, a char array is OK if you are planning to process the data bytewise.

I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have not actually tried it myself. :)

I had it right the first time, except I needed ntohl ... C Endian Conversion : bit by bit
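For reference, a small sketch of that fix, assuming the input words are big-endian (network byte order). On Windows, ntohl comes from winsock2.h (link with Ws2_32); on POSIX, from arpa/inet.h:

#include <stdio.h>
#include <stdint.h>
#ifdef _WIN32
# include <winsock2.h>   /* ntohl */
#else
# include <arpa/inet.h>  /* ntohl */
#endif

int main(void)
{
    uint32_t word;

    if (freopen(NULL, "rb", stdin) == NULL)   /* binary mode, as discussed above */
        return 1;
    while (fread(&word, sizeof word, 1, stdin) == 1) {
        word = ntohl(word);                   /* big-endian on the wire -> host order */
        /* ... decode the 32-bit instruction ... */
    }
    return 0;
}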

Related

Read wide char from a stream created with fmemopen

I'm trying to read a wide char from a stream that was created using fmemopen with a char *.
char *s = "foo bar foo";
FILE *f = fmemopen(s,strlen(s),"r");
wchar_t c = getwc(f);
getwc causes a segmentation fault; I checked using GDB.
I know this is due to opening the stream with fmemopen, because calling getwc on a stream opened normally works fine.
Is there a wide char version of fmemopen, or is there some other way to fix this problem?
The second line should read FILE *f = fmemopen(s, strlen(s), "r"); (as now shown above). As originally posted, the fmemopen call had undefined behavior and might return NULL, which causes getwc() to crash.
Fixing the fmemopen() line and adding a check for NULL removes the crash, but does not meet the OP's goal.
It seems wide orientation is not supported on streams opened with fmemopen(), at least for the GNU C library. Note that fmemopen() is not defined in the C standard but in POSIX.1-2008, and it is not available on many systems (such as OS X).
Here is a corrected and extended version of your program:
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void) {
    const char *s = "foo bar foo";
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    return 0;
}
Run on linux:
default wide orientation: -1
selected wide orientation: -1
No output, WEOF is returned immediately.
Explanation for fwide(f, 0) from the linux man page:
SYNOPSIS
#include <wchar.h>
int fwide(FILE *stream, int mode);
When mode is zero, the fwide() function determines the current orientation of stream. It returns a positive value if stream is wide-character oriented, that is, if wide-character I/O is permitted but char I/O is disallowed. It returns a negative value if stream is byte oriented, i.e., if char I/O is permitted but wide-character I/O is disallowed. It returns zero if stream has no orientation yet; in this case the next I/O operation might change the orientation (to byte oriented if it is a char I/O operation, or to wide-character oriented if it is a wide-character I/O operation).
Once a stream has an orientation, it cannot be changed and persists until the stream is closed.
When mode is nonzero, the fwide() function first attempts to set stream's orientation (to wide-character oriented if mode is greater than 0, or to byte oriented if mode is less than 0). It then returns a value denoting the current orientation, as above.
The stream returned by fmemopen() is byte-oriented and cannot be changed to wide-character oriented.
Your second line did not use the correct number of parameters, did it? (Now corrected.) For reference, the prototype is:
FILE *fmemopen(void *buf, size_t size, const char *mode);
glibc's fmemopen does not (fully) support wide characters AFAIK. There's also open_wmemstream(), which supports wide characters but is just for writing.
Is _UNICODE defined? See wchar_t reading.
Also, have you set the locale to an encoding that supports Unicode, for example, setlocale(LC_ALL, "en_US.UTF-8");? See here.
Consider using a temporary file. Consider using fgetwc() instead.
I have changed my code and adopted the code from @chqrlie since it is closer to the OP's code, but added the locale; otherwise it fails to produce correct output for extended/Unicode characters.
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    const char *s = "foo $€ bar foo";
    FILE *f = fmemopen((void *)s, strlen(s), "r");
    wint_t c;

    if (f == NULL) {
        printf("fmemopen failed: %s\n", strerror(errno));
        return 1;
    }
    printf("default wide orientation: %d\n", fwide(f, 0));
    printf("selected wide orientation: %d\n", fwide(f, 1));
    while ((c = getwc(f)) != WEOF) {
        printf("read %lc (%d 0x%x)\n", c, (int)c, (unsigned)c);
    }
    return 0;
}
You can use getwc() only on an unoriented or wide-oriented stream. From the getwc() man page: The stream shall not have an orientation yet, or be wide-oriented.
It is not possible to change a stream's orientation once it has one. From the fwide() man page: Calling this function on a stream that already has an orientation cannot change it.
A stream opened with glibc's fmemopen() has a byte orientation and therefore can't be wide-oriented in any way. As described here, uClibc has an fmemopen() routine without this limitation.
Conclusion: You need to use uClibc or another library, or make your own fmemopen().
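If switching libraries is not an option, one workaround is to skip the wide orientation entirely: convert the multibyte string to a wchar_t buffer up front and iterate over that instead of calling getwc() on the fmemopen() stream. A minimal sketch (the conversion relies on the current locale):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");               /* use the environment's locale */
    const char *s = "foo bar foo";

    /* Convert the whole multibyte string to wide characters once. */
    size_t n = mbstowcs(NULL, s, 0);     /* length needed, excluding L'\0' */
    if (n == (size_t)-1)
        return 1;                        /* invalid multibyte sequence */
    wchar_t *ws = malloc((n + 1) * sizeof *ws);
    if (ws == NULL)
        return 1;
    mbstowcs(ws, s, n + 1);

    for (const wchar_t *p = ws; *p != L'\0'; p++)
        printf("read %lc (%d)\n", (wint_t)*p, (int)*p);

    free(ws);
    return 0;
}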

How is a multibyte string converted to a wide-character string in fxprintf.c in glibc?

Currently, the logic in the glibc source of perror() is as follows:
If stderr is oriented, use it as is; else dup() it and use perror() on the dup()'ed fd.
If stderr is wide-oriented, the following logic from stdio-common/fxprintf.c is used:
size_t len = strlen (fmt) + 1;
wchar_t wfmt[len];
for (size_t i = 0; i < len; ++i)
  {
    assert (isascii (fmt[i]));
    wfmt[i] = fmt[i];
  }
res = __vfwprintf (fp, wfmt, ap);
The format string is converted to wide-character form by the following code, which I do not understand:
wfmt[i] = fmt[i];
Also, it uses an isascii assert:
assert (isascii(fmt[i]));
But the format string is not always ASCII in wide-character programs, because we may use a UTF-8 format string, which can contain non-7-bit values.
Why is there no assert failure when we run the following code (assuming a UTF-8 locale and UTF-8 compiler encoding)?
#include <stdio.h>
#include <errno.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_CTYPE, "en_US.UTF-8");
    fwide(stderr, 1);
    errno = EINVAL;
    perror("привет мир"); /* note, that the string is multibyte */
    return 0;
}
$ ./a.out
привет мир: Invalid argument
Can we use dup() on wide-oriented stderr to make it not wide-oriented? In that case, the code could be rewritten without this mysterious conversion, given that perror() takes only multibyte strings (const char *s) and locale messages are all multibyte anyway.
Turns out we can. The following code demonstrates this:
#include <stdio.h>
#include <wchar.h>
#include <unistd.h>

int main(void)
{
    fwide(stdout, 1);
    FILE *fp;
    int fd = -1;
    if ((fd = fileno (stdout)) == -1) return 1;
    if ((fd = dup (fd)) == -1) return 1;
    if ((fp = fdopen (fd, "w+")) == NULL) return 1;
    wprintf(L"stdout: %d, dup: %d\n", fwide(stdout, 0), fwide(fp, 0));
    return 0;
}
$ ./a.out
stdout: 1, dup: 0
BTW, is it worth posting an issue about this improvement to glibc developers?
NOTE
Using dup() is limited with respect to buffering. I wonder if it is considered in the implementation of perror() in glibc. The following example demonstrates this issue.
The output appears not in the order in which it was written to the streams, but in the order in which the buffers are flushed.
Note that the order of values in the output is not the same as in the program: the output of fprintf is flushed first (because of the "\n"), and the output of fwprintf is flushed only when the program exits.
#include <wchar.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    wint_t wc = L'b';
    fwprintf(stdout, L"%lc", wc);

    /* --- */
    FILE *fp;
    int fd = -1;
    if ((fd = fileno (stdout)) == -1) return 1;
    if ((fd = dup (fd)) == -1) return 1;
    if ((fp = fdopen (fd, "w+")) == NULL) return 1;

    char c = 'h';
    fprintf(fp, "%c\n", c);
    return 0;
}
$ ./a.out
h
b
But if we use \n in fwprintf, the output is the same as in the program:
$ ./a.out
b
h
perror() manages to get away with that, because in GNU libc stderr is unbuffered. But will it work safely in programs where stderr is manually set to buffered mode?
This is the patch that I would propose to glibc developers:
diff -urN glibc-2.24.orig/stdio-common/perror.c glibc-2.24/stdio-common/perror.c
--- glibc-2.24.orig/stdio-common/perror.c 2016-08-02 09:01:36.000000000 +0700
+++ glibc-2.24/stdio-common/perror.c 2016-10-10 16:46:03.814756394 +0700
@@ -36,7 +36,7 @@
errstring = __strerror_r (errnum, buf, sizeof buf);
- (void) __fxprintf (fp, "%s%s%s\n", s, colon, errstring);
+ (void) _IO_fprintf (fp, "%s%s%s\n", s, colon, errstring);
}
@@ -55,7 +55,7 @@
of the stream. What is supposed to happen when the stream isn't
oriented yet? In this case we'll create a new stream which is
using the same underlying file descriptor. */
- if (__builtin_expect (_IO_fwide (stderr, 0) != 0, 1)
+ if (__builtin_expect (_IO_fwide (stderr, 0) < 0, 1)
|| (fd = __fileno (stderr)) == -1
|| (fd = __dup (fd)) == -1
|| (fp = fdopen (fd, "w+")) == NULL)
NOTE: It wasn't easy to find concrete questions in this post; on the whole, the post seems to be an attempt to engage in a discussion about implementation details of glibc, which it seems to me would be better directed to a forum specifically oriented to development of that library such as the libc-alpha mailing list. (Or see https://www.gnu.org/software/libc/development.html for other options.) This sort of discussion is not really a good match for StackOverflow, IMHO. Nonetheless, I tried to answer the questions I could find.
How does wfmt[i] = fmt[i]; convert from multibyte to wide character?
Actually, the code is:
assert(isascii(fmt[i]));
wfmt[i] = fmt[i];
which is based on the fact that the numeric value of an ASCII character is the same as the corresponding wchar_t value. Strictly speaking, this need not be the case. The C standard specifies:
Each member of the basic character set shall have a code value equal to its value when used as the lone character in an integer character constant if an implementation does not define __STDC_MB_MIGHT_NEQ_WC__. (§7.19/2)
(gcc does not define that symbol.)
However, that only applies to characters in the basic set, not to all characters recognized by isascii. The basic character set contains the 91 printable ascii characters as well as space, newline, horizontal tab, vertical tab and form feed. So it is theoretically possible that one of the remaining control characters will not be correctly converted. However, the actual format string used in the call to __fxprintf only contains characters from the basic character set, so in practice this pedantic detail is not important.
Why is there no assert failure when we execute perror("привет мир")?
Because only the format string is being converted, and the format string (which is "%s%s%s\n") contains only ascii characters. Since the format string contains %s (and not %ls), the argument is expected to be char* (and not wchar_t*) in both the narrow- and wide-character orientations.
Can we use dup() on wide-oriented stderr to make it not wide-oriented?
That would not be a good idea. First, if the stream has an orientation, it might also have a non-empty internal buffer. Since that buffer is part of the stdio library and not of the underlying Posix fd, it will not be shared with the duplicate fd. So the message printed by perror might be interpolated in the middle of some existing output. In addition, it is possible that the multibyte encoding has shift states, and that the output stream is not currently in the initial shift state. In that case, outputting an ascii sequence could result in garbled output.
In the actual implementation, the dup is only performed on streams without orientation; these streams have never had any output directed at them, so they are definitely still in the initial shift state with an empty buffer (if the stream is buffered).
Is it worth posting an issue about this improvement to glibc developers?
That is up to you, but don't do it here. The normal way of doing that would be to file a bug. There is no reason to believe that glibc developers read SO questions, and even if they do, someone would have to copy the issue to a bug, and also copy any proposed patch.
it uses isascii assert.
This is OK. You are not supposed to call this function. It is a glibc internal. Note the two underscores in front of the name. When called from perror, the argument in question is "%s%s%s\n", which is entirely ASCII.
But the format string is not always ascii in wide-character programs, because we may use UTF-8
First, UTF-8 has nothing to do with wide characters. Second, the format string is always ASCII because the function is only called by other glibc functions that know what they are doing.
perror("привет мир");
This is not the format string, this is one of the arguments that corresponds to one of the %s in the actual format string.
Can we use dup() on wide-oriented stderr
You cannot use dup on a FILE *; it operates on POSIX file descriptors, which don't have orientation.
This is the patch that I would propose to glibc developers:
Why? What isn't working?

Buffering of standard I/O library

In the book Advanced Programming in the UNIX Environment (2nd edition), the author writes in Section 5.5 (stream operations of the standard I/O library):
When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.
Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
I'm confused about this. Could anyone explain a little? For example, in what situations will input and output calls that violate the above restrictions cause unexpected program behavior? I guess the reason for the restrictions is related to the buffering in the library, but I'm not clear on it.
You aren't allowed to intersperse input and output operations. For example, you can't use formatted input to seek to a particular point in the file, then start writing bytes starting at that point. This allows the implementation to assume that at any time, the sole I/O buffer will only contain either data to be read (to you) or written (to the OS), without doing any safety checks.
f = fopen( "myfile", "rw" ); /* open for read and write */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" ); /* write some data - illegal */
This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.
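For clarity, here is that fragment as a complete, legal program, with the intervening fseek in place (the file name and header text are placeholders):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("myfile", "r+");     /* open for read and write */
    if (f == NULL)
        return 1;
    fscanf(f, "hello, world\n");         /* scan past file header */
    fseek(f, 0, SEEK_CUR);               /* switch from input to output */
    fprintf(f, "daturghhhf\n");          /* now the write is allowed */
    fclose(f);
    return 0;
}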
Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
It's not clear what you're asking.
Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.
Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"
Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and 4 in the C standard linked above.)
But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:
#include <stdio.h>

int main(int argc, char *argv[]) {
    FILE *fp = fopen("foo", "r+");
    fseek(fp, 0, SEEK_SET);
    fwrite("foo", 1, 3, fp);
    fseek(fp, 0, SEEK_SET);
    fwrite("bar", 1, 3, fp);
    char buf[4] = { 0 };
    size_t ret = fread(buf, 1, 3, fp);
    printf("%d %s\n", (int)ret, buf);
}
So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.
If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.
Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.
You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".
There are a few ways you could implement something like stdio:
1. Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
2. Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
3. Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead simple, and as efficient as possible.
4. Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, and almost as efficient, and a lot safer, but not really any better in any way other than safety.
So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.
You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end 1970s systems, and the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. But, most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy stuff like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.
If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.
For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.
Reason 1: find the real file position to start from.
Due to the buffered implementation of stdio, the stdio stream position may differ from the OS file position. When you read 1 byte, stdio marks the stream position as 1, but due to buffering stdio may have read 4096 bytes from the underlying file, so the OS records its file position as 4096. When you switch to output, you really need to choose which position to use.
Reason 2: find the right buffer cursor to start from.
tl;dr: if an underlying implementation uses only a single shared buffer for both read and write, you have to flush the buffer when changing I/O direction.
Take this glibc source (as used in Chromium OS) to demonstrate how fwrite, fseek, and fflush handle the single shared buffer.
fwrite's buffer-filling implementation:
fill_buffer:
  while (to_write > 0)
    {
      register size_t n = to_write;
      if (n > buffer_space)
        n = buffer_space;
      buffer_space -= n;
      written += n;
      to_write -= n;
      if (n < 20)
        while (n-- > 0)
          *stream->__bufp++ = *p++;
      else
        {
          memcpy ((void *) stream->__bufp, (void *) p, n);
          stream->__bufp += n;
          p += n;
        }
      if (to_write == 0)
        /* Done writing. */
        break;
      else if (buffer_space == 0)
        {
          /* We have filled the buffer, so flush it. */
          if (fflush (stream) == EOF)
            break;
From this code snippet we can see that if the buffer is full, fwrite will flush it.
Let's take a look at fflush:
int
fflush (stream)
     register FILE *stream;
{
  if (stream == NULL) {...}

  if (!__validfp (stream) || !stream->__mode.__write)
    {
      __set_errno (EINVAL);
      return EOF;
    }

  return __flshfp (stream, EOF);
}
It uses __flshfp:
/* Flush the buffer for FP and also write C if FLUSH_ONLY is nonzero.
   This is the function used by putc and fflush. */
int
__flshfp (fp, c)
     register FILE *fp;
     int c;
{
  /* Make room in the buffer. */
  (*fp->__room_funcs.__output) (fp, flush_only ? EOF : (unsigned char) c);
}
The __room_funcs.__output hook is flushbuf by default:
/* Write out the buffered data. */
wrote = (*fp->__io_funcs.__write) (fp->__cookie, fp->__buffer,
to_write);
Now we are close. What's __write? Tracing through the default settings mentioned above, it's __stdio_write:
int
__stdio_write (cookie, buf, n)
     void *cookie;
     register const char *buf;
     register size_t n;
{
  const int fd = (int) cookie;
  register size_t written = 0;

  while (n > 0)
    {
      int count = __write (fd, buf, (int) n);
      if (count > 0)
        {
          buf += count;
          written += count;
          n -= count;
        }
      else if (count < 0
#if defined (EINTR) && defined (EINTR_REPEAT)
               && errno != EINTR
#endif
               )
        /* Write error. */
        return -1;
    }
  return (int) written;
}
__write is the wrapper for the write(2) system call.
As we can see, fwrite uses only a single buffer. If you change direction, the buffer can still hold the previously written contents; as the above code shows, you can call fflush to empty it.
The same applies to fseek:
/* Move the file position of STREAM to OFFSET
   bytes from the beginning of the file if WHENCE
   is SEEK_SET, the end of the file if it is SEEK_END,
   or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
     register FILE *stream;
     long int offset;
     int whence;
{
  ...
  if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
    return EOF;
  ...
  /* O is now an absolute position, the new target. */
  stream->__target = o;

  /* Set bufp and both end pointers to the beginning of the buffer.
     The next i/o will force a call to the input/output room function. */
  stream->__bufp
    = stream->__get_limit = stream->__put_limit = stream->__buffer;
  ...
}
It will soft-flush (reset) the buffer at the end, which means the read buffer will be emptied after this call.
This obeys the C99 rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
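In user code, the rule from the rationale looks like this; a minimal sketch in which rewind() separates the output from the subsequent input (the file name and contents are placeholders):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("data.txt", "w+");   /* open for update (read and write) */
    char line[32];

    if (f == NULL)
        return 1;
    fputs("hello\n", f);                 /* output ... */
    rewind(f);                           /* ... reposition before switching to input */
    if (fgets(line, sizeof line, f) != NULL)
        printf("%s", line);              /* prints "hello" */
    fclose(f);
    return 0;
}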

stat system call reports "No such file or directory"

I'm working on a homework assignment and I ran into a little snag.
I'm trying to read a filename from standard input and then stat the file to get the size (as per the assignment's requirements):
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

#define BUFFSIZE 4096

int
main(void) {
    int n;
    char buffer[BUFFSIZE];
    struct stat buf;

    while ((n = read(STDIN_FILENO, buffer, BUFFSIZE)) > 0) {
        stat(buffer, &buf);
        perror("stat");
    }
}
Here's the output when ran (I entered the filename file):
file
stat: No such file or directory
But if I try something like this:
#include <stdio.h>
#include <sys/stat.h>

#define BUFFSIZE 4096

int
main(void) {
    int n;
    char buffer[BUFFSIZE] = "file";
    struct stat buf;

    stat(buffer, &buf);
    perror("stat");
}
I get:
stat: Success
The file named file is in the directory that I'm running the program from.
How come there is a difference between reading in the string "file" and just initializing the array to the string "file"?
Before calling stat() print the value of buffer to standard output:
printf("[%s]\n", buffer);
It will not be what you expect, as read() will not NUL-terminate the buffer for you.
Initialise buffer before read().
Not sure why you are looping on the read(), as you should acquire the complete name of the file before calling stat(). If you have not been forced to use read(), consider using fgets().
Did you try printing the buffer? Most likely your read call returned a newline on the end of the string "file," and there is no file "file\n" in your directory. I would recommend using fgets instead to read the filename from the console. Use standard C input/output when you can, and only delegate to platform-specific code when there is a measurable benefit (e.g. there is no cross-platform stat function in the C standard library, and sometimes Unix I/O can measurably improve performance).
There is a '\n' in the buffer in the 1st snippet. Take it out
buffer[strlen(buffer) - 1] = 0;
The problem is that the read leaves the newline in the buffer, so you try to stat "file\n".
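Putting those comments together, a minimal sketch of the fix: NUL-terminate what read() returned, strip the trailing newline, then call stat() (the size-printing line is just for illustration):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

#define BUFFSIZE 4096

int main(void) {
    char buffer[BUFFSIZE];
    struct stat buf;
    ssize_t n = read(STDIN_FILENO, buffer, BUFFSIZE - 1);

    if (n > 0) {
        buffer[n] = '\0';                       /* read() does not NUL-terminate */
        buffer[strcspn(buffer, "\n")] = '\0';   /* drop the trailing newline, if any */
        if (stat(buffer, &buf) == 0)
            printf("%s is %lld bytes\n", buffer, (long long)buf.st_size);
        else
            perror("stat");
    }
    return 0;
}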
read() may be reading too much (or not enough) on the first loop. Try printing out what it's read just before the stat() call:
printf("Read %d characters (%s)\n",n,buf);
read() is probably a bit low level for this task - I'd recommend using scanf() instead.
while ( scanf ("%s",buffer) == 1) {
For safe code, you'll need to specify the maximum number of characters to read, which you can do like this:
while ( scanf ("%4096s",buffer) == 1) {
However, if you want to use the BUFFSIZE macro, you'll need to do a bit of mucking about:
#define XLIM_PERCENT_S(x) "%" #x "s"
#define LIM_PERCENT_S(x) XLIM_PERCENT_S(x)
....
while ( scanf ( LIM_PERCENT_S( BUFFSIZE ) , buffer) == 1) {

read() syscall on windows fails to read binary file

I would like to read image files and keep them in memory before using them with SDL. I just realized that open() and read() on Windows fail to read my file entirely, but on Linux/BSD it works!
This is my code:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#ifdef _WIN32
# include <io.h>      /* open, read */
#else
# include <unistd.h>  /* read */
#endif

#define IMGPATH "rabbit.png"

int
main(int argc, char *argv[])
{
    int fd;
    struct stat st;
    void *data;
    size_t nbread;

    fd = open(IMGPATH, O_RDONLY);
    if (fd < 0)
        exit(1);

    fstat(fd, &st);
    data = malloc(st.st_size);
    if (data == NULL)
        exit(1);

    nbread = read(fd, data, st.st_size);
    if (nbread != st.st_size)
        printf("Failed to read completely: expected = %ld, read = %ld\n",
               (long)st.st_size, (long)nbread);
}
On Windows it will produce: Failed to read completely: expected = 19281, read = 5. perror() says no error, and if I try to read() again nothing changes; it stays stuck at these 5 bytes.
Is there some flag I should add to open() to read a binary file?
This is the first PNG bytes file I try to read:
0000000 211 P N G \r \n 032 \n \0 \0 \0 \r I H D R
0000010 \0 \0 \0 \ \0 \0 \0 k \b 006 \0 \0 \0 <FA> 220 <E5>
Does it stop reading when '\n' appears?
I don't know how to fix this right now.
PS: please do not say "use libpng", because I just need to get the file buffer into memory before using it with SDL and my graphics library.
A few points:
read() is not guaranteed to read the count of bytes specified. It may return early or nothing at all. It's normal to have to call read() several times to fill large buffers. This is one of the reasons the stdio wrappers and fread() are useful.
On Windows, text and binary mode differ. Since you did not specify O_BINARY in your flags, Windows will handle '\n' characters differently for this file. Likely it is returning at the first '\n' encountered.
It's not necessary to check the file size before reading the file. The read() and indeed any wrapper around read() will always stop reading at the end of the file.
Update0
On further observation I see that the 5th and 6th characters are \r\n; this is handled specially by Windows in text mode, and explains the early return I mentioned above. If you don't pass O_BINARY to the open() call, these 2 characters will be converted to a single \n, and the '\032' (Ctrl-Z) byte that follows is treated as an end-of-file marker in text mode, which is why the read stops after 5 bytes.
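A sketch of the corrected call, with O_BINARY and a loop that tolerates short reads (the O_BINARY fallback define is only there so the same sketch compiles outside Windows):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#ifdef _WIN32
# include <io.h>
#else
# include <unistd.h>
#endif

#ifndef O_BINARY
# define O_BINARY 0   /* no-op outside Windows */
#endif

int main(void)
{
    struct stat st;
    int fd = open("rabbit.png", O_RDONLY | O_BINARY);  /* binary mode on Windows */
    if (fd < 0 || fstat(fd, &st) < 0)
        exit(1);

    char *data = malloc(st.st_size);
    if (data == NULL)
        exit(1);

    /* read() may return fewer bytes than requested, so loop until done. */
    size_t total = 0;
    while (total < (size_t)st.st_size) {
        int n = read(fd, data + total, st.st_size - total);
        if (n <= 0)
            break;          /* EOF or error */
        total += n;
    }
    printf("read %lu of %lu bytes\n", (unsigned long)total, (unsigned long)st.st_size);
    free(data);
    return 0;
}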
