ReadFile on standard input crashes on Windows 7

Trying to read standard input with ReadFile works on Windows 8+ but crashes on Windows 7.
#include <windows.h>

int main() {
    char c[1];
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    ReadFile(in, c, 1, NULL, NULL);
    return 0;
}
produces
Program received signal SIGSEGV, Segmentation fault.
0x00000000770f5803 in VerifyConsoleIoHandle () from C:\Windows\system32\kernel32.dll
on Windows 7

The lpNumberOfBytesRead argument is required unless the ReadFile call will complete asynchronously (i.e. the file or device was opened with FILE_FLAG_OVERLAPPED and an lpOverlapped structure is provided).
On Windows 8 and later this parameter is checked for NULL before it is written to (effectively making it optional), but this is not documented anywhere.
Reading standard input without checking the number of bytes read is a bad idea anyway, since the number of bytes read can be less than requested (or even 0) when input is redirected from a pipe.
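A minimal corrected version of the snippet above, always passing a valid DWORD for lpNumberOfBytesRead (a sketch, with only bare-bones error handling):

#include <windows.h>
#include <stdio.h>

int main(void) {
    char c[1];
    DWORD bytesRead = 0;
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);

    /* Pass a real pointer for lpNumberOfBytesRead; NULL is only valid
       for overlapped (asynchronous) I/O. */
    if (!ReadFile(in, c, 1, &bytesRead, NULL)) {
        fprintf(stderr, "ReadFile failed: %lu\n", GetLastError());
        return 1;
    }
    printf("read %lu byte(s)\n", bytesRead);
    return 0;
}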

Related

fgets(), signals (EINTR) and input data integrity

fgets() is intended for reading a string until EOF or '\n' is encountered. It is very handy for reading text config files, for example, but there are some problems.
First, it may return EINTR if a signal is delivered, so it should be wrapped in a loop that checks for that.
The second problem is much worse: at least in glibc, it will return EINTR and lose all data already read if the signal is delivered in the middle of a line. This is very unlikely to happen, but I think it may be the source of some complicated vulnerabilities in some daemons.
Setting the SA_RESTART flag on signals seems to help avoid this problem, but I'm not sure it covers ALL possible cases on all platforms. Does it?
If not, is there a way to avoid the problem at all?
If not, it seems that fgets() is not usable for reading files in daemons because it may lead to random data loss.
Example code for tests:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
static char buf[1000000];
static volatile int do_exit = 0;
static void int_sig_handle(int signum) { do_exit = 1; }
void try(void) {
    char *r;
    int err1, err2;
    size_t len;

    memset(buf, 1, 20);
    buf[20] = 0;
    r = fgets(buf, sizeof(buf), stdin);
    if (!r) {
        err1 = errno;
        err2 = ferror(stdin);
        printf("\n\nfgets()=NULL, errno=%d(%s), ferror()=%d\n", err1, strerror(err1), err2);
        len = strlen(buf);
        printf("strlen()=%u, buf=[[[%s]]]\n", (unsigned)len, buf);
    } else if (r == buf) {
        err1 = errno;
        err2 = ferror(stdin);
        len = strlen(buf);
        if (!len) {
            printf("\n\nfgets()=buf, strlen()=0, errno=%d(%s), ferror()=%d\n", err1, strerror(err1), err2);
        } else {
            printf("\n\nfgets()=buf, strlen()=%u, [len-1]=0x%02X, errno=%d(%s), ferror()=%d\n",
                   (unsigned)len, (unsigned char)(buf[len-1]), err1, strerror(err1), err2);
        }
    } else {
        printf("\n\nerr\n");
    }
}

int main(int argc, char **argv) {
    struct sigaction sa;

    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = int_sig_handle;
    sigaction(SIGINT, &sa, NULL);

    printf("attempt 1\n");
    try();
    printf("\nattempt 2\n");
    try();
    printf("\nend\n");
    return 0;
}
This code can be used to test signal delivery in the middle of "attempt 1" and to confirm that the partially read data is completely lost afterwards.
How to test:
run the program under strace
enter some text (do not press Enter), then press Ctrl+D; see the read() syscall complete with some data
send SIGINT
see fgets() return NULL and "attempt 2" printed, then enter some data and press Enter
it will print the second entered data, but the first never appears anywhere
FreeBSD 11 libc: same behaviour
FreeBSD 8 libc: first attempt returns partially read data and sets ferror() and errno
EDIT: following John Bollinger's recommendation I've added dumping of the buffer after a NULL return. Results:
glibc and FreeBSD 11 libc: the buffer contains the partially read data but is NOT null-terminated, so the only way to get its length is to clear the entire buffer before calling fgets(), which does not look like intended use
FreeBSD 8 libc: still returns properly null-terminated partially-read data
stdio is indeed not reasonably usable with interrupting signal handlers.
Per ISO C 11 7.21.7.2 The fgets function, paragraph 3:
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
EINTR is a read error, so the array contents are indeterminate after such a return.
Theoretically, the behavior could be specified for fgets in a way that you could meaningfully recover from an error in the middle of the operation by setting up the buffer appropriately before the call, since you know that fgets does not write '\n' except as the final character before null termination (similar to techniques for using fgets with embedded NULs). However, it's not specified that way, and there would be no analogous way to handle other stdio functions like scanf, which have nowhere to store state for resuming them after EINTR.
Really, signals are just a really backwards way of doing things, and interrupting signals are an even more backwards tool full of race conditions and other unpleasant and unfixable corner cases. If you want to do this kind of thing in a safe and modern way, you probably need to have a thread that forwards stdin through a pipe or socket, and close the writing end of the pipe or socket in the signal handler so that the main part of your program reading from it gets EOF.
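A rough illustration of that last suggestion (a sketch only: the forwarding thread, the buffer size and the SIGPIPE handling are my own choices, and real code would need more care around error handling and shutdown):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int pipe_fds[2];               /* [0] = read end, [1] = write end */

static void on_sigint(int signum) {
    (void)signum;
    close(pipe_fds[1]);               /* close() is async-signal-safe; the reader sees EOF */
}

static void *forward_stdin(void *arg) {
    char buf[4096];
    ssize_t n;
    (void)arg;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        if (write(pipe_fds[1], buf, (size_t)n) < 0)
            break;                    /* pipe already closed by the handler */
    }
    close(pipe_fds[1]);
    return NULL;
}

int main(void) {                      /* compile with -pthread */
    struct sigaction sa;
    pthread_t tid;
    char buf[4096];
    ssize_t n;

    pipe(pipe_fds);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sa.sa_flags = SA_RESTART;         /* the main read below restarts and ends cleanly at EOF */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    signal(SIGPIPE, SIG_IGN);         /* writes to the closed pipe fail instead of killing us */

    pthread_create(&tid, NULL, forward_stdin, NULL);

    /* The main part of the program reads only from the pipe, so SIGINT
       simply turns into EOF here and no partially read data is lost. */
    while ((n = read(pipe_fds[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    return 0;
}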
First, it may return EINTR if a signal is delivered, so it should be wrapped in a loop that checks for that.
Of course you mean that fgets() will return NULL and set errno to EINTR. Yes, this is a possibility, and not only for fgets(), or even for stdio functions generally -- a wide variety of functions from the I/O realm and others may exhibit this behavior. Most POSIX functions that may block on events external to the program can fail with EINTR and various function-specific associated behaviors. It's a characteristic of the programming and operational environment.
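At the level of the raw read() system call the conventional wrapper looks roughly like this (a sketch; retrying is safe there because a -1/EINTR return means no bytes were transferred by that call, which is exactly the property the buffered fgets() lacks):

#include <errno.h>
#include <unistd.h>

/* Read up to count bytes, retrying when the call is interrupted by a signal
   before any data has been transferred. */
static ssize_t read_retry(int fd, void *buf, size_t count) {
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}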
The second problem is much worse: at least in glibc, it will return EINTR and lose all data already read if the signal is delivered in the middle of a line. This is very unlikely to happen, but I think it may be the source of some complicated vulnerabilities in some daemons.
No, at least not in my tests. It is your test program that loses data. When fgets() returns NULL to signal an error, that does not imply that it has not transferred any data to the buffer, and if I modify your program to print the buffer after an EINTR is signaled then I indeed see that the data from attempt 1 have been transferred there. But the program ignores that data.
Now it is possible that other programs make the same mistake that yours does, and therefore lose data, but that is not because of a flaw in the implementation of fgets().
FreeBSD 8 libc: first attempt returns partially read data and sets ferror() and errno
I'm inclined to think that this behavior is flawed -- if the function returns before reaching end of line / file then it should signal an error by providing a NULL return value. It may, but is not obligated to, transfer some or all of the data read to that point to the user-provided buffer. (But if it doesn't transfer data then they should remain available to be read.) I also find it surprising that the function sets the file's error flag at all. I'm inclined to think that erroneous, but I'm not prepared to present an argument for that at the moment.

Reading from a COM port destroys lines

I'm trying to read data from a COM port line-by-line in Windows. In PuTTY, the COM connection looks fine - my serial device (an MSP430 Launchpad) outputs the string "Data" once per second. However, when I use a simple C program to read the COM port and print out the number of bytes read, then the data itself, it gets completely mangled:
0
6 Data
2 Data
4 ta
6 Data
3 Data
3 a
a
6 Data
6 Data
2 Data
The lines saying 6 Data are correct (four characters, then \r\n), but what's happening to those lines that do not contain a complete message? According to the documentation, ReadFile should read an entire line by default. Is this incorrect - do I need to buffer it myself and wait for a linefeed character?
Note that not all those errors would occur in each run of the code; I did a few runs and compiled a variety of errors for your viewing pleasure. Here's the code I'm using:
#include <windows.h>
#include <stdio.h>
static DCB settings;
static HANDLE serial;
static char line[200];
static unsigned long read;
static unsigned int lineLength = sizeof(line) / sizeof(char);
int main(void) {
    int i = 10;

    serial = CreateFile("COM4",
                        GENERIC_READ | GENERIC_WRITE,
                        0, NULL,
                        OPEN_EXISTING,
                        0, NULL);
    GetCommState(serial, &settings);
    settings.BaudRate = CBR_9600;
    settings.ByteSize = 8;
    settings.Parity = NOPARITY;
    settings.StopBits = ONESTOPBIT;
    SetCommState(serial, &settings);

    while (i) {
        ReadFile(serial, &line, lineLength, &read, 0);
        printf("%lu %s\n", read, line);
        i--;
    }
    scanf("%c", &read);
    return 0;
}
Compiled in Windows 7 64-bit using Visual Studio Express 2012.
What's happening is that ReadFile returns as soon as it receives any data. Because data may arrive on a serial port at any point in the future, ReadFile returns with whatever amount of data is available at that moment. The same thing happens on Linux if you read from a serial port. The data you get back may or may not be an entire line, depending on how much information is in the buffer when your process gets dispatched again.
If you take another look at the documentation, notice that it will only return a line when the HANDLE is in console mode:
Characters can be read from the console input buffer by using ReadFile with a handle to console input. The console mode determines the exact behavior of the ReadFile function. By default, the console mode is ENABLE_LINE_INPUT, which indicates that ReadFile should read until it reaches a carriage return. If you press Ctrl+C, the call succeeds, but GetLastError returns ERROR_OPERATION_ABORTED. For more information, see CreateFile.
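So to get line-by-line reads from the serial handle you have to accumulate the bytes yourself and split on the line terminator. A rough sketch of that idea (the readLine helper and its single-byte reads are my own simplification, not part of the original program):

/* Accumulate bytes from the serial handle until '\n' is seen, then hand
   back one complete, NUL-terminated line with the "\r\n" stripped. */
static int readLine(HANDLE port, char *out, size_t outSize) {
    size_t used = 0;
    while (used + 1 < outSize) {
        char ch;
        DWORD got = 0;
        if (!ReadFile(port, &ch, 1, &got, NULL))
            return -1;                /* I/O error */
        if (got == 0)
            continue;                 /* nothing arrived yet; keep waiting */
        if (ch == '\n')
            break;                    /* end of line */
        if (ch != '\r')
            out[used++] = ch;
    }
    out[used] = '\0';
    return (int)used;
}

In the loop above it would be called as readLine(serial, line, sizeof(line)) and the returned string printed instead of the raw buffer.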

C program, strange behaviour

Recently I came across the problem of getting an 'Oops, Spawn error, can not allocate memory' message while working with one C application.
To understand file descriptor and memory management better, I tried this sample program, and it gave me a surprising result.
Here is the code.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int ac, char *av[])
{
    FILE *fd = NULL;
    unsigned long counter = 0;

    while (1)
    {
        char *aa = malloc(16384);        /* allocated and intentionally never freed */
        usleep(5);
        fprintf(stderr, "Counter is %lu \n", counter);
        counter++;
        fd = fopen("/dev/null", "r");    /* opened and intentionally never closed */
    }
    return 0;
}
Here in the sample program I am trying to allocate memory every 5 microseconds and also open a file descriptor at the same time.
Now when I run the program, memory usage starts increasing and the number of open file descriptors starts increasing as well, but memory usage stops at about 82.5% and the file descriptor count stops at 1024. I know 'ulimit' sets this parameter and it is 1024 by default.
But this program should crash by eating up the memory, or it should give the 'Can't spawn child' error, yet it keeps working.
So I just wanted to know why it is not crashing and why it is not giving that error once it has reached the file descriptor limit.
It's probably not crashing because when malloc() finds no more memory to allocate, it simply returns NULL. Likewise, fopen()/open() just fail, returning NULL or a negative value. In other words, your OS and the standard library cooperate to report these failures rather than letting your program crash.
What would be the point of crashing instead?
Plus, on Linux the system won't even actually use the memory if nothing is ever written to aa.
And anyway, if you could actually consume all the memory (which will never happen on Linux and *BSD; I don't know about Windows), the result would be the system lagging badly or even freezing, not just your application crashing.
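A small variation of the loop that actually checks those return values makes the failure points visible (a sketch; whether malloc() sets errno, and the exact errno you get from fopen(), can vary by system):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long allocs = 0, opens = 0;

    for (;;) {
        char *aa = malloc(16384);
        if (aa == NULL) {
            /* Out of memory: malloc() reports failure instead of crashing. */
            fprintf(stderr, "malloc failed after %lu allocations\n", allocs);
            break;
        }
        allocs++;

        FILE *fp = fopen("/dev/null", "r");
        if (fp == NULL) {
            /* Typically errno == EMFILE once the descriptor limit (1024) is hit. */
            fprintf(stderr, "fopen failed after %lu opens: %s\n",
                    opens, strerror(errno));
            break;
        }
        opens++;
    }
    return 0;
}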

Is this kind of behavior defined by standard?

#include <unistd.h>

int main(int argc, char* argv[])
{
    char buf[500];
    read(0, buf, 5);
    return 0;
}
The above reads 5 characters from stdin, but if I input more than 5:
12345morethan5
[root# test]# morethan5
-bash: morethan5: command not found
The remaining characters will be executed as shell commands.
Is this kind of behavior defined by standard?
Sort of :-)
Your program reads 5 characters, and that's it. Not less, not more. The rest remain in the terminal buffer and get sent to your shell once your C program terminates.
Since you are using read(), which is a raw system call, instead of any of the C stdio buffering alternatives, this behaviour is not just expected, but required.
From the POSIX standard on read():
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf.
...
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the st_atime field of the file, and shall return the number of bytes read. This number shall never be greater than nbyte.
...
Upon successful completion, read() [XSI] and pread() shall return a non-negative integer indicating the number of bytes actually read.
I.e. read() should never read more bytes from the file descriptor than requested.
From the related part on terminals:
It is not, however, necessary to read a whole line at once; any number of bytes, even one, may be requested in a read() without losing information.
...
The last process to close a terminal device file shall cause any output to be sent to the device and any input to be discarded.
Note: normally your shell will still have an open file descriptor for the terminal, until you end the session.
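If you want the rest of the line to be consumed by your program instead of being handed to the shell, you have to keep reading it yourself. A rough sketch of that (the drain loop is my own addition, not something the standard requires):

#include <unistd.h>

int main(void)
{
    char buf[500];
    char ch;

    read(0, buf, 5);    /* the first 5 bytes, as before */

    /* Drain whatever is left on the line so it never reaches the shell;
       stop at end of line or end of input. */
    while (read(0, &ch, 1) == 1 && ch != '\n')
        ;
    return 0;
}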
That has nothing to do with any standard; it's up to your runtime what happens to the rest of stdin. Your runtime makes the standard input available to your program, which reads some bytes from it and quits, and then the remaining bytes are processed by the runtime itself. If you could configure it to clear all the file descriptors after forking a process, you could perhaps prevent this behaviour, but that would seriously impede most standard command-line workflows, which rely on attaching one process's input to another process's output...

How to prevent stdin stream from reading data from associated file descriptor on program start?

I'm using a select() call to detect the presence of input in the main loop of my program. This makes me use the raw file descriptor (0) instead of stdin.
While working in this mode I've noticed that my software occasionally loses a chunk of input at the beginning. I suspect that stdin consumes some of it at program start. Is there a way to prevent this behavior of stdin, or otherwise get the whole input?
The effect described can be reproduced only with some data on standard input at the very moment of program start. My executable should be used as xinetd service in a way that it always has some input on the start.
Standard input is read in the following way:
Error processInput() {
    struct timeval ktimeout;
    int fd = fileno(stdin);
    int maxFd = fd + 1;

    /* fdset, buffer, position and MAX_BUFFER_SIZE are defined elsewhere. */
    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    ktimeout.tv_sec = 0;
    ktimeout.tv_usec = 1;

    int selectRv = -1;
    while ((selectRv = select(maxFd, &fdset, NULL, NULL, &ktimeout)) > 0) {
        int left = MAX_BUFFER_SIZE - position - 1;
        assert(left > 0);
        int bytesCount = read(fd, buffer + position, left);
        //Input processing goes here
    }
}
Don't mix cooked and raw meat together. Try replacing the read() call with the equivalent fread() call.
It is very likely that fileno(stdin) is initializing the stdin object, causing it to read and buffer some input. Or perhaps you are already calling something that causes it to initialize (scanf(), getchar(), etc...).
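A sketch of that replacement inside the loop above, assuming the same surrounding globals (note that fread() may block until it can return the full count requested, so in a select()-driven loop you would typically ask only for what you know is available):

        int bytesCount = (int)fread(buffer + position, 1, (size_t)left, stdin);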

Resources