read() syscall on windows fails to read binary file - c

I would like to read image files into memory before using them with SDL. I just realized that open() and read() on Windows fail to read my file entirely, but on Linux/BSD it works!
This is my code:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#define IMGPATH "rabbit.png"
int
main(int argc, char *argv[])
{
int fd;
struct stat st;
void *data;
size_t nbread;
fd = open(IMGPATH, O_RDONLY);
if (fd < 0)
exit(1);
fstat(fd, &st);
data = malloc(st.st_size);
if (data == NULL)
exit(1);
nbread = read(fd, data, st.st_size);
if (nbread != st.st_size)
printf("Failed to read completely: expected = %ld, read = %ld\n", st.st_size, nbread);
}
On Windows it produces: Failed to read completely: expected = 19281, read = 5. perror() reports no error, and if I call read() again nothing changes: it stays stuck at these 5 bytes.
Is there some flag I should add to open() to read a binary file?
These are the first bytes of the PNG file I am trying to read:
0000000 211 P N G \r \n 032 \n \0 \0 \0 \r I H D R
0000010 \0 \0 \0 \ \0 \0 \0 k \b 006 \0 \0 \0 <FA> 220 <E5>
Does it stop reading when '\n' appears?
I don't know how to fix this right now.
PS: please do not say "use libpng", because I just need to get the file buffer into memory before using it with SDL and my graphics library.

A few points:
read() is not guaranteed to read the count of bytes specified. It may return early or nothing at all. It's normal to have to call read() several times to fill large buffers. This is one of the reasons the stdio wrappers and fread() are useful.
On Windows, text and binary mode differ. Since you did not specify O_BINARY in your flags, Windows opens the file in text mode, which translates line endings and can end the read early on binary data.
It's not necessary to check the file size before reading the file. The read() and indeed any wrapper around read() will always stop reading at the end of the file.
Update
On further observation I see that the 5th and 6th bytes are \r\n, which Windows converts to a single \n in text mode, and the 7th byte is 032 (0x1A, Ctrl-Z), which text mode treats as the end of the file. That accounts for exactly 5 bytes being returned and for later read() calls returning nothing. Passing O_BINARY to open() disables both behaviours.

Related

How do i display the first 10 words on my terminal without using stdio.h

Hey how would I be able to output the first 10 words of the text file without using functions from <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
int fd_to_read = open("sample.txt", O_RDONLY);
if (fd_to_read == -1) {
exit(1);
}
// ...
close(fd_to_read);
}
I have no idea how to display the first 10 words without the use of <stdio.h>
On POSIX systems the I/O primitives are open, close, read and write.
These primitives operate on file descriptors rather than on a FILE object as defined by the C standard.
Like the C I/O interface, the POSIX interface expects a buffer and a buffer size for the read and write primitives.
Assuming that words are separated by the space character ' ', your job is to read continuously from the source descriptor and count the occurrences of the space character (by iterating over the buffer char by char) until the count hits the desired threshold.
Until then, write everything to the output descriptor.
In <unistd.h> you'll find these three important symbols:
STDIN_FILENO
STDOUT_FILENO
STDERR_FILENO
This problem is more difficult than it looks: you cannot use <stdio.h> so you must use system calls to read from the file and write to stdout:
Read a byte from the file:
char ch;
if (read(fd_to_read, &ch, 1) != 1) {
/* end of file reached (or read error) */
break;
}
Write the byte to stdout:
write(STDOUT_FILENO, &ch, 1);
Testing for word boundaries is more tricky: you must skip all white space, then you have a new word until you read more white space.

How multibyte string is converted to wide-character string in fxprintf.c in glibc?

Currently, the logic in glibc source of perror is such:
If stderr is oriented, use it as is, else dup() it and use perror() on dup()'ed fd.
If stderr is wide-oriented, the following logic from stdio-common/fxprintf.c is used:
size_t len = strlen (fmt) + 1;
wchar_t wfmt[len];
for (size_t i = 0; i < len; ++i)
{
assert (isascii (fmt[i]));
wfmt[i] = fmt[i];
}
res = __vfwprintf (fp, wfmt, ap);
The format string is converted to wide-character form by the following code, which I do not understand:
wfmt[i] = fmt[i];
Also, it uses isascii assert:
assert (isascii(fmt[i]));
But the format string is not always ascii in wide-character programs, because we may use UTF-8 format string, which can contain non-7bit value(s).
Why is there no assert warning when we run the following code (assuming UTF-8 locale and UTF-8 compiler encoding)?
#include <stdio.h>
#include <errno.h>
#include <wchar.h>
#include <locale.h>
int main(void)
{
setlocale(LC_CTYPE, "en_US.UTF-8");
fwide(stderr, 1);
errno = EINVAL;
perror("привет мир"); /* note, that the string is multibyte */
return 0;
}
$ ./a.out
привет мир: Invalid argument
Can we use dup() on wide-oriented stderr to make it not wide-oriented? In such case the code could be rewritten without using this mysterious conversion, taking into account the fact that perror() takes only multibyte strings (const char *s) and locale messages are all multibyte anyway.
Turns out we can. The following code demonstrates this:
#include <stdio.h>
#include <wchar.h>
#include <unistd.h>
int main(void)
{
fwide(stdout,1);
FILE *fp;
int fd = -1;
if ((fd = fileno (stdout)) == -1) return 1;
if ((fd = dup (fd)) == -1) return 1;
if ((fp = fdopen (fd, "w+")) == NULL) return 1;
wprintf(L"stdout: %d, dup: %d\n", fwide(stdout, 0), fwide(fp, 0));
return 0;
}
$ ./a.out
stdout: 1, dup: 0
BTW, is it worth posting an issue about this improvement to glibc developers?
NOTE
Using dup() is limited with respect to buffering. I wonder if it is considered in the implementation of perror() in glibc. The following example demonstrates this issue.
The output does not appear in the order in which it was written to the streams, but in the order in which the buffers are flushed.
Note that the order of values in the output is not the same as in the program: the output of fprintf is flushed first (because of "\n"), and the output of fwprintf is flushed when the program exits.
#include <wchar.h>
#include <stdio.h>
#include <unistd.h>
int main(void)
{
wint_t wc = L'b';
fwprintf(stdout, L"%lc", wc);
/* --- */
FILE *fp;
int fd = -1;
if ((fd = fileno (stdout)) == -1) return 1;
if ((fd = dup (fd)) == -1) return 1;
if ((fp = fdopen (fd, "w+")) == NULL) return 1;
char c = 'h';
fprintf(fp, "%c\n", c);
return 0;
}
$ ./a.out
h
b
But if we use \n in fwprintf, the output is the same as in the program:
$ ./a.out
b
h
perror() manages to get away with that, because in GNU libc stderr is unbuffered. But will it work safely in programs where stderr is manually set to buffered mode?
This is the patch that I would propose to glibc developers:
diff -urN glibc-2.24.orig/stdio-common/perror.c glibc-2.24/stdio-common/perror.c
--- glibc-2.24.orig/stdio-common/perror.c 2016-08-02 09:01:36.000000000 +0700
+++ glibc-2.24/stdio-common/perror.c 2016-10-10 16:46:03.814756394 +0700
@@ -36,7 +36,7 @@
errstring = __strerror_r (errnum, buf, sizeof buf);
- (void) __fxprintf (fp, "%s%s%s\n", s, colon, errstring);
+ (void) _IO_fprintf (fp, "%s%s%s\n", s, colon, errstring);
}
@@ -55,7 +55,7 @@
of the stream. What is supposed to happen when the stream isn't
oriented yet? In this case we'll create a new stream which is
using the same underlying file descriptor. */
- if (__builtin_expect (_IO_fwide (stderr, 0) != 0, 1)
+ if (__builtin_expect (_IO_fwide (stderr, 0) < 0, 1)
|| (fd = __fileno (stderr)) == -1
|| (fd = __dup (fd)) == -1
|| (fp = fdopen (fd, "w+")) == NULL)
NOTE: It wasn't easy to find concrete questions in this post; on the whole, the post seems to be an attempt to engage in a discussion about implementation details of glibc, which it seems to me would be better directed to a forum specifically oriented to development of that library such as the libc-alpha mailing list. (Or see https://www.gnu.org/software/libc/development.html for other options.) This sort of discussion is not really a good match for StackOverflow, IMHO. Nonetheless, I tried to answer the questions I could find.
How does wfmt[i] = fmt[i]; convert from multibyte to wide character?
Actually, the code is:
assert(isascii(fmt[i]));
wfmt[i] = fmt[i];
which relies on the numeric value of an ASCII character being the same as that of the corresponding wchar_t. Strictly speaking, this need not be the case. The C standard specifies:
Each member of the basic character set shall have a code value equal to its value when used as the lone character in an integer character constant if an implementation does not define __STDC_MB_MIGHT_NEQ_WC__. (§7.19/2)
(gcc does not define that symbol.)
However, that only applies to characters in the basic set, not to all characters recognized by isascii. The basic character set contains the 91 printable ascii characters as well as space, newline, horizontal tab, vertical tab and form feed. So it is theoretically possible that one of the remaining control characters will not be correctly converted. However, the actual format string used in the call to __fxprintf only contains characters from the basic character set, so in practice this pedantic detail is not important.
Why is there no assert warning when we execute perror("привет мир");?
Because only the format string is being converted, and the format string (which is "%s%s%s\n") contains only ascii characters. Since the format string contains %s (and not %ls), the argument is expected to be char* (and not wchar_t*) in both the narrow- and wide-character orientations.
Can we use dup() on wide-oriented stderr to make it not wide-oriented?
That would not be a good idea. First, if the stream has an orientation, it might also have a non-empty internal buffer. Since that buffer is part of the stdio library and not of the underlying Posix fd, it will not be shared with the duplicate fd. So the message printed by perror might be interpolated in the middle of some existing output. In addition, it is possible that the multibyte encoding has shift states, and that the output stream is not currently in the initial shift state. In that case, outputting an ascii sequence could result in garbled output.
In the actual implementation, the dup is only performed on streams without orientation; these streams have never had any output directed at them, so they are definitely still in the initial shift state with an empty buffer (if the stream is buffered).
Is it worth posting an issue about this improvement to glibc developers?
That is up to you, but don't do it here. The normal way of doing that would be to file a bug. There is no reason to believe that glibc developers read SO questions, and even if they do, someone would have to copy the issue to a bug, and also copy any proposed patch.
it uses isascii assert.
This is OK. You are not supposed to call this function. It is a glibc internal. Note the two underscores in front of the name. When called from perror, the argument in question is "%s%s%s\n", which is entirely ASCII.
But the format string is not always ascii in wide-character programs, because we may use UTF-8
First, UTF-8 has nothing to do with wide characters. Second, the format string is always ASCII because the function is only called by other glibc functions that know what they are doing.
perror("привет мир");
This is not the format string, this is one of the arguments that corresponds to one of the %s in the actual format string.
Can we use dup() on wide-oriented stderr
You cannot use dup on a FILE*, it operates on POSIX file descriptors that don't have orientation.
This is the patch that I would propose to glibc developers:
Why? What isn't working?

stat system call reports "No such file or directory"

I'm working on a homework assignment and I ran into a little snag.
I'm trying to read a filename from standard input and then stat the file to get the size (as per the assignment's requirements):
#define BUFFSIZE 4096
int
main(void) {
int n;
char buffer[BUFFSIZE];
struct stat buf;
while ((n = read(STDIN_FILENO, buffer, BUFFSIZE)) > 0) {
stat(buffer, &buf);
perror("stat");
}
}
Here's the output when run (I entered the filename file):
file
stat: No such file or directory
But if I try something like this:
#define BUFFSIZE 4096
int
main(void) {
int n;
char buffer[BUFFSIZE] = "file";
struct stat buf;
stat(buffer, &buf);
perror("stat");
}
I get:
stat: Success
The file named file is in the directory that I'm running the program from.
How come there is a difference between reading in the string "file" and just initializing the array to the string "file"?
Before calling stat() print the value of buffer to standard output:
printf("[%s]\n", buffer);
It will not be what you expect, as read() will not NUL-terminate buffer for you.
Initialise buffer before read().
I'm not sure why you are looping on the read(), as you should acquire the complete name of the file before calling stat(). If you are not required to use read(), consider using fgets().
Did you try printing the buffer? Most likely your read call returned a newline on the end of the string "file", and there is no file "file\n" in your directory. I would recommend using fgets instead to read the filename from the console. Use standard C input/output when you can, and only delegate to platform-specific code when there is a measurable benefit (e.g. there is no cross-platform stat function in the C standard library, and sometimes Unix I/O can measurably improve performance).
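There is a '\n' in the buffer in the first snippet. Take it out; since read() does not NUL-terminate the buffer, use the byte count it returned rather than strlen():
if (n > 0 && buffer[n - 1] == '\n') buffer[n - 1] = 0;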
The problem is that the read leaves the newline in the buffer, so you try to stat "file\n".
read() may be reading too much (or not enough) on the first loop. Try printing out what it's read just before the stat() call:
printf("Read %d characters (%.*s)\n", n, n, buffer);
read() is probably a bit low level for this task - I'd recommend using scanf() instead.
while ( scanf ("%s",buffer) == 1) {
For safe code, you'll need to specify the maximum number of characters to read. The width does not count the terminating NUL, so it must be one less than the buffer size:
while ( scanf ("%4095s",buffer) == 1) {
However, if you want to use the BUFFSIZE macro, you'll need to do a bit of mucking about, and declare the buffer as char buffer[BUFFSIZE + 1] so that the NUL still fits:
#define XLIM_PERCENT_S(x) "%" #x "s"
#define LIM_PERCENT_S(x) XLIM_PERCENT_S(x)
....
while ( scanf ( LIM_PERCENT_S( BUFFSIZE ) , buffer) == 1) {

C: lseek() related question

I want to write some bogus text in a file ("helloworld" text in a file called helloworld), but not starting from the beginning. I was thinking of using the lseek() function.
If I use the following code (edited):
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#define fname "helloworld"
#define buf_size 16
int main(){
char buffer[buf_size];
int fildes,
nbytes;
off_t ret;
fildes = open(fname, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
if(fildes < 0){
printf("\nCannot create file + trunc file.\n");
}
//modify offset
if((ret = lseek(fildes, (off_t) 10, SEEK_END)) < (off_t) 0){
fprintf(stdout, "\nCannot modify offset.\n");
}
printf("ret = %d\n", (int)ret);
if(write(fildes, fname, 10) < 0){
fprintf(stdout, "\nWrite failed.\n");
}
close(fildes);
return (0);
}
it compiles and runs without any apparent errors.
Still, if I run:
cat helloworld
The output is not what I expected, but:
helloworld
Can
Where is "Can" coming from, and where are my empty spaces?
Should I expect zeros instead of spaces? If I try to open helloworld with gedit, an error occurs, complaining that the file character encoding is unknown.
LATER EDIT:
After I edited my program with the right buffer for writing, and then compile / run again, the "helloworld" file still cannot be opened with gedit.
LATER EDIT
I understand the issue now. I've added to the code the following:
fildes = open(fname, O_RDONLY);
if(fildes < 0){
printf("\nCannot open file.\n");
}
while((nbytes = read(fildes, c, 1)) == 1){
printf("%d ", (int)*c);
}
And now the output is:
0 0 0 0 0 0 0 0 0 0 104 101 108 108 111 119 111 114 108 100
My problem was that I was expecting spaces (32) instead of zeros (0).
In this function call, write(fildes, fname, buf_size), fname has 10 characters (plus a trailing '\0' character), but you're telling the function to write out 16 bytes. Who knows what is in the memory locations after the fname string.
Also, I'm not sure what you mean by "where are my empty spaces?".
Apart from expecting zeros to equal spaces, the original problem was indeed writing more than the length of the "helloworld" string. To avoid such a problem, I suggest letting the compiler calculate the length of your constant strings for you:
write(fildes, fname, sizeof(fname) - 1)
The - 1 is due to the NUL character (zero, \0) that is used to terminate C-style strings, and sizeof simply returning the size of the array that holds the string. Due to this you cannot use sizeof to calculate the actual length of a string at runtime, but it works fine for compile-time constants.
The "Can" you saw in your original test was almost certainly the beginning of one of the "\nCannot" strings in your code; after writing the 11 bytes in "helloworld\0" you continued to write the remaining bytes from whatever was following it in memory, which turned out to be the next string constant. (The question has now been amended to write 10 bytes, but the originally posted version wrote 16.)
The presence of NUL characters (= zero, '\0') in a text file may indeed cause certain (but not all) text editors to consider the file binary data instead of text, and possibly refuse to open it. A text file should contain just text, not control characters.
Your buf_size doesn't match the length of fname. It's reading past the buffer, and therefore getting more or less random bytes that just happened to sit after the string in memory.

C read binary stdin

I'm trying to build an instruction pipeline simulator and I'm having a lot of trouble getting started. What I need to do is read binary from stdin, and then store it in memory somehow while I manipulate the data. I need to read in chunks of exactly 32 bits one after the other.
How do I read in chunks of exactly 32 bits at a time? Secondly, how do I store it for manipulation later?
Here's what I've got so far, but examining the binary chunks I read further, it just doesn't look right, I don't think I'm reading exactly 32 bits like I need.
char buffer[4] = { 0 }; // initialize to 0
unsigned long c = 0;
int bytesize = 4; // read in 32 bits
while (fgets(buffer, bytesize, stdin)) {
memcpy(&c, buffer, bytesize); // copy the data to a more usable structure for bit manipulation later
// more stuff
buffer[0] = 0; buffer[1] = 0; buffer[2] = 0; buffer[3] = 0; // set to zero before next loop
}
fclose(stdin);
How do I read in 32 bits at a time (they are all 1/0, no newlines etc), and what do I store it in, is char[] okay?
EDIT: I'm able to read the binary in but none of the answers produce the bits in the correct order — they are all mangled up, I suspect endianness and problems reading and moving 8 bits around ( 1 char) at a time — this needs to work on Windows and C ... ?
What you need is freopen(). From the manpage:
If filename is a null pointer, the freopen() function shall attempt to change the mode of the stream to that specified by mode, as if the name of the file currently associated with the stream had been used. In this case, the file descriptor associated with the stream need not be closed if the call to freopen() succeeds. It is implementation-defined which changes of mode are permitted (if any), and under what circumstances.
Basically, the best you can really do is this:
freopen(NULL, "rb", stdin);
This will reopen stdin to be the same input stream, but in binary mode. In the normal mode, reading from stdin on Windows will convert \r\n (Windows newline) to the single character ASCII 10. Using the "rb" mode disables this conversion so that you can properly read in binary data.
freopen() returns the stream itself on success and a null pointer on failure, so check the result for NULL rather than using it as a new handle. After that, use fread() as has been mentioned.
As to your concerns, however, you may not be reading in "32 bits" but if you use fread() you will be reading in 4 chars (which is the best you can do in C - char is guaranteed to be at least 8 bits but some historical and embedded platforms have 16 bit chars (some even have 18 or worse)). If you use fgets() with a size of 4 you will never read in 4 bytes. You will read in at most 3 (fewer if one of them is a newline), and the byte after the last one read will be '\0' because C strings are nul-terminated and fgets() nul-terminates what it reads (like a good function). Obviously, this is not what you want, so you should use fread().
Consider using SET_BINARY_MODE macro and setmode:
#ifdef _WIN32
# include <io.h>
# include <fcntl.h>
# define SET_BINARY_MODE(handle) setmode(handle, O_BINARY)
#else
# define SET_BINARY_MODE(handle) ((void)0)
#endif
More details about SET_BINARY_MODE macro here: "Handling binary files via standard I/O"
More details about setmode here: "_setmode"
I had to piece the answer together from the various comments from the kind people above, so here is a fully-working sample that works - only for Windows, but you can probably translate the windows-specific stuff to your platform.
#include "stdafx.h"
#include "stdio.h"
#include "stdlib.h"
#include "windows.h"
#include <io.h>
#include <fcntl.h>
int main()
{
char rbuf[4096];
char *deffile = "c:\\temp\\outvideo.bin";
size_t r;
char *outfilename = deffile;
FILE *newin;
freopen(NULL, "rb", stdin);
_setmode(_fileno(stdin), _O_BINARY);
FILE *f = fopen(outfilename, "w+b");
if (f == NULL)
{
printf("unable to open %s\n", outfilename);
exit(1);
}
for (;; )
{
r = fread(rbuf, 1, sizeof(rbuf), stdin);
if (r > 0)
{
size_t w;
for (size_t nleft = r; nleft > 0; )
{
w = fwrite(rbuf, 1, nleft, f);
if (w == 0)
{
printf("error: unable to write %lu bytes to %s\n", (unsigned long)nleft, outfilename);
exit(1);
}
nleft -= w;
fflush(f);
}
}
else
{
Sleep(10); // wait for more input, but not in a tight loop
}
}
return 0;
}
For Windows, this Microsoft _setmode example specifically shows how to change stdin to binary mode:
// crt_setmode.c
// This program uses _setmode to change
// stdin from text mode to binary mode.
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
int main( void )
{
int result;
// Set "stdin" to have binary mode:
result = _setmode( _fileno( stdin ), _O_BINARY );
if( result == -1 )
perror( "Cannot set mode" );
else
printf( "'stdin' successfully changed to binary mode\n" );
}
fgets() is all wrong here. It's aimed at human-readable ASCII text terminated by end-of-line characters, not binary data, and won't get you what you need.
I recently did exactly what you want using the read() call. Unless your program has explicitly closed stdin, for the first argument (the file descriptor), you can use a constant value of 0 for stdin. Or, if you're on a POSIX system (Linux, Mac OS X, or some other modern variant of Unix), you can use STDIN_FILENO.
fread() suits best for reading binary data.
Yes, char array is OK, if you are planning to process them bytewise.
I don't know what OS you are running, but you typically cannot "open stdin in binary". You can try things like
int fd = fdreopen (fileno (stdin), outfname, O_RDONLY | OPEN_O_BINARY);
to try to force it. Then use
uint32_t opcode;
read(fd, &opcode, sizeof (opcode));
But I have not actually tried it myself. :)
I had it right the first time, except, I needed ntohl ... C Endian Conversion : bit by bit

Resources