getline over a socket - c

Is there a libc function that does the same thing as getline, but works with a connected socket instead of a FILE * stream?
A workaround would be to call fdopen on the socket. What should be taken care of when doing so? What are the reasons to do it or not do it?
One obvious reason to do it is to be able to call getline and co., but maybe it is a better idea to write a custom getline?

When you call read on a socket, it can return fewer bytes than requested (a short read).
eg.
read(fd, buf, bufsize)
can return a value less than bufsize when less data is currently available in the kernel's receive buffer for the TCP socket.
In such a case it may be required to call read again in a loop, until it returns zero (end of stream) or a negative result (error).
Thus it is best to avoid plain stdio functions here. You need to create wrappers around read that iterate until bufsize bytes have been transferred reliably, returning zero only when no more bytes can be read from the socket, just as if the file were being read from the local disk.
You can find such wrappers in the book Computer Systems: A Programmer's Perspective by Randal Bryant.
The source code is available at this site; look for the functions beginning with rio_.
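A sketch of what such a wrapper looks like, modeled loosely on the book's rio_readn (the name readn here is illustrative, not the book's exact code):

```c
#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes from fd, looping over short reads.
 * Returns the number of bytes read (n unless EOF was reached early),
 * or -1 on error. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;
    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error */
        }
        if (r == 0)
            break;          /* EOF: peer closed the connection */
        p += r;
        left -= r;
    }
    return (ssize_t)(n - left);
}
```

A matching writen would loop the same way over write, since write can also transfer fewer bytes than requested.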

If the socket is connected to untrusted input, be prepared for arbitrary input within an arbitrary time frame:
a \0 character arriving before \r\n
waiting eternally for a \r or \n that never comes
any other potentially ugly thing
One way to address the arbitrary timing and arbitrary data would be to provide timeouts on the reads e.g. via select(2) and feed the data you actually receive to some well-written state machine byte by byte.
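A minimal sketch of such a timed read (the function name and the timeout convention are illustrative):

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_sec for fd to become readable, then read once.
 * Returns bytes read, 0 on EOF, -1 on error, -2 on timeout. */
ssize_t timed_read(int fd, void *buf, size_t n, int timeout_sec)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    int s = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (s < 0)
        return -1;          /* select error */
    if (s == 0)
        return -2;          /* timed out: no data arrived in time */
    return read(fd, buf, n);
}
```

Each byte you do receive can then be fed to the state machine, which tracks whether it has seen \r\n, a stray \0, an over-long line, and so on.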

The problem is that if you never receive the line terminator (\n or \r\n, depending on your implementation), the program will hang. I'd write your own version that also calls select() to check whether the socket is still readable/writable and free of errors. There is really no way to tell whether another "\n" or "\r\n" is coming, so make sure you know that the data from the client/server will be consistent.
Imagine you coded a webserver that reads the headers using getline(). If an attacker simply sent
GET / HTTP/1.1\r\n
This line isn't terminated: bla
The call to getline would never return and the program would hang, costing you resources; eventually a DoS would be possible.
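One way to harden this is a getline replacement that enforces a maximum line length, returning an error instead of reading without bound (a sketch; the function name and error codes are illustrative, and for the timing side you would combine it with a select() timeout):

```c
#include <unistd.h>

/* Read one '\n'-terminated line from fd into buf (at most maxlen-1
 * bytes), NUL-terminating it. Returns the line length, 0 on EOF
 * before any byte, -1 on error, -2 if the line exceeds maxlen-1
 * bytes without a terminator. */
ssize_t read_line_bounded(int fd, char *buf, size_t maxlen)
{
    size_t i = 0;
    while (i + 1 < maxlen) {
        char c;
        ssize_t r = read(fd, &c, 1);
        if (r < 0)
            return -1;              /* read error */
        if (r == 0) {               /* EOF before a terminator */
            buf[i] = '\0';
            return (ssize_t)i;
        }
        buf[i++] = c;
        if (c == '\n') {
            buf[i] = '\0';
            return (ssize_t)i;
        }
    }
    buf[i] = '\0';
    return -2;                      /* over-long line: likely hostile */
}
```

Reading byte by byte is slow; a real implementation would layer this over its own buffer, but the bound on line length is the point here.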

Related

How to tell if fgets(stdin) will block on Windows?

How can I tell if the next call to fgets(stdin, ...) will block or not? In other words, how can I tell if the stdin buffer has a newline waiting to be read?
On Unix, I know I can use select() like this:
fd_set reads;
FD_ZERO(&reads);
FD_SET(fileno(stdin), &reads);
int s = select(fileno(stdin)+1, &reads, 0, 0, 0);
if (s) {
//fgets is ready
}
However, select() on Windows only works with sockets, not with 'stdin', so I cannot use it.
I also know on Unix that I can use poll(), ioctl(0, I_NREAD...), and probably a lot of other solutions. None of these work on Windows.
I have tried kbhit() and WaitForSingleObject(GetStdHandle(STD_INPUT_HANDLE), 0). The problem is that both of these indicate that input is available as soon as the first key is struck. I need to know whether a whole line is available, because fgets() blocks for an entire line.
Perhaps my issue is that Unix shells tend to buffer entire input lines, while Windows doesn't?
Should I just use fgetc() to build up a buffer until I see a newline?
I've done research and found other answers, but none of them work for me. They either use C++, whereas I need a C solution, or they focus on using fgets() with sockets, whereas I need to use it with stdin.
Any help is greatly appreciated. Thanks!
How can I tell if the next call to fgets(stdin, ...) will block or not? In other words, how can I tell if the stdin buffer has a newline waiting to be read?
Generally speaking, you cannot tell, not even on POSIX systems, without making some assumptions. Both POSIX and the Windows API define mechanisms for determining whether input is available, but that's not enough for you. You want to determine whether specific data (i.e. a line terminator) is available to be read, and the only way C defines for doing that is to read the data.
Therefore, if you really need to read a line at a time without blocking your main thread, then I suggest performing your reads asynchronously. You could roll your own, with a reader thread separate from your main one, but you might find that Microsoft's existing asynchronous I/O API supports your needs.

Can I determine how many bytes are in the stdio userspace read buffer associated with a FILE?

I'm writing a C program that connects to another machine over a TCP socket and reads newline-delimited text over that TCP connection.
I use poll to check whether data is available on the file descriptor associated with the socket, and then I read characters into a buffer until I get a newline. However, to make that character-by-character read efficient, I'm using a stdio FILE instead of using the read system call.
When more than one short line of input arrives over the socket quickly, my current approach has a bug. When I start reading characters, stdio buffers several lines of data in userspace. Once I've read one line and processed it, I then poll the socket file descriptor again to determine whether there is more data to read.
Unfortunately, neither that poll, nor fstat, nor any other method I know of for getting the number of bytes in a file knows about the leftover data buffered in userspace as part of the FILE. This results in my program blocking on that poll when it should be consuming data that has already been buffered into userspace.
How can I check how much data is buffered in userspace? The specs specifically tell you not to rely on setvbuf for this purpose (the representation format is undefined), so I'm hoping for another option.
Right now, it seems like my best option is to implement my own userspace buffering where I have control over this, but I wanted to check before going down that road.
EDIT:
Some comments did provide a way to test whether at least one character is available, by setting the file nonblocking and trying to fgetc/ungetc a single character, but this can't tell you how many bytes are available.

Flushing pipe without closing in C

I have found a lot of threads in here asking about how it is possible to flush a pipe after writing to it without closing it.
In every thread I could see different suggestions, but I could not find a definitive solution.
Here is a quick summary:
The easiest way to avoid read blocking on the pipe would be to write the exact number of bytes that the reader requests.
It could also be done by using ptmx instead of a pipe, but people said that might be overkill.
Note: It's not possible to use fsync with pipes
Are there any other more efficient solutions?
Edit:
The flush would be convenient when the sender wants to write n characters but the client reads m characters (where m>n): the client will block waiting for the remaining m-n characters. If the sender wants to communicate with the client again, closing the pipe is not an option, and sending the exact number of bytes could be a good source of bugs.
The receiver operates like this and it cannot be modified:
while ((n = read(0, buf, 100)) > 0) {
    process(buf);
}
So the sender wants "file1" and "file2" to get processed, for which it will have to:
write(pipe[1], "file1\0*95", 100);
write(pipe[1], "file2\0*95", 100);
What I am looking for is a way to do something like this (without it being necessary to use \n as the delimiter):
write(pipe[1], "file1\nfile2", 11); //it would have worked if it was ptmx
(Using read and write)
Flushing in the sense of fflush() is irrelevant to pipes, because they are not represented as C streams. There is therefore no userland buffer to flush. Similarly, fsync() is also irrelevant to pipes, because there is no back-end storage for the data. Data successfully written to a pipe exist in the kernel and only in the kernel until they are successfully read, so there is no work for fsync() to do. Overall, flushing / syncing is applicable only where there is intermediate storage involved, which is not the case with pipes.
With the clarification, your question seems to be about establishing message boundaries for communication via a pipe. You are correct that closing the write end of the pipe will signal a boundary -- not just of one message, but of the whole stream of communication -- but of course that's final. You are also correct that there are no inherent message boundaries. Nevertheless, you seem to be working from at least somewhat of a misconception here:
The easiest way to avoid read blocking on the pipe would be to write
the exact number of bytes that is reading.
[...]
The flush would be convenient when the sender wants to write n
characters but the client reads m characters (where m>n). The client
will block waiting for another m-n characters.
Whether the reader will block is entirely dependent on how the reader is implemented. In particular, the read(2) system call in no way guarantees to transfer the requested number of bytes before returning. It can and will perform a short read under some circumstances. Although the details are unspecified, you can ordinarily rely on a short read when at least one character can be transferred without blocking, but not the whole number requested. Something similar applies to write(2). Thus, the easiest way to avoid read() blocking is to ensure that you write at least one byte to the pipe for that read() call to transfer.
In fact, people usually come at this issue from the opposite direction: needing to be certain to receive a specific number of bytes, and therefore having to deal with the potential for short reads as a complication (to be dealt with by performing the read() in a loop). You'll need to consider that, too, but you have the benefit that your client is unlikely to block under the circumstances you describe; it just isn't the problem you think it is.
There is an inherent message-boundary problem in any kind of stream communication, however, and you'll need to deal with it. There are several approaches; among the most commonly used are
Fixed-length messages. The receiver can then read until it successfully transfers the required number of bytes; any blocking involved is appropriate and needful. With this approach, the scenario you postulate simply does not arise, but the writer might need to pad its messages.
Delimited messages. The receiver then reads until it finds that it has received a message delimiter (a newline or a null byte, for example). In this case, the receiver will need to be prepared for the possibility of message boundaries not being aligned with the byte sequences transferred by read() calls. Marking the end of a message by closing the channel can be considered a special case of this alternative.
Embedded message-length metadata. This can take many forms, but one of the simplest is to structure messages as a fixed-length integer message length field, followed by that number of bytes of message data. The reader then knows at every point how many bytes it needs to read, so it will not block needlessly.
These can be used individually or in combination to implement an application-layer protocol for communicating between your processes. Naturally, the parties to the communication must agree on the protocol details for communication to be successful.
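A minimal sketch of the third option, a fixed four-byte big-endian length prefix over the pipe (the function names are illustrative, and the single read/write calls stand in for readn/writen-style loops that a real program would use to handle short transfers):

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <unistd.h>

/* Write a 4-byte length header, then the payload. 0 on success. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (write(fd, &hdr, sizeof hdr) != sizeof hdr)
        return -1;
    return write(fd, data, len) == (ssize_t)len ? 0 : -1;
}

/* Read the header, then exactly that many payload bytes. */
int recv_msg(int fd, void *buf, uint32_t maxlen, uint32_t *len)
{
    uint32_t hdr;
    if (read(fd, &hdr, sizeof hdr) != sizeof hdr)
        return -1;
    *len = ntohl(hdr);
    if (*len > maxlen)
        return -1;                  /* message too large for buffer */
    return read(fd, buf, *len) == (ssize_t)*len ? 0 : -1;
}
```

With this framing the reader always knows how many bytes remain in the current message, so it never blocks waiting for data the sender has no intention of writing.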

How to exit fread() after encountering a delimiter or after some time?

I am trying to achieve something here. I get data from a Linux system through a named pipe; the data is sporadic and does not arrive with any determined frequency. So I have a server program in C which reads from the named pipe. My requirement is that I have to send the data out to another program as soon as I receive it from the client, but the fread() call just sits there until:
a) the buffer is full and it cannot read any more, or
b) the client closes the pipe.
The client sends every message with a delimiter of "\0", and the size of the messages can vary. My biggest question is how to break out of fread after reading a message, or after waiting a couple of seconds. It just sits on the fread waiting for data.
amountRead = fread(buffer+remaining, (size_t)1, (size_t)(BUFFER_SIZE-remaining), file);
Basically I am trying to understand whether there is any way to break fread after a certain amount of time, or based on a delimiter.
The most straightforward approach would be to implement your own buffering solution and use select and read, with a timeout on select. This would allow you to break off the read operation based on a time-based criterion.
As for exiting early on a delimiter character, that's not going to happen with fread. fread is buffered and blocking, so it waits until all the data it requested is available.
With select, you can have a dedicated thread waiting on data to be ready, and act on it, or wait for a timeout, etc.
Please refer to the references below for working examples.
References
How to implement a timeout in read function call?, Accessed 2014-06-25, <https://stackoverflow.com/questions/2917881/how-to-implement-a-timeout-in-read-function-call>
What is the difference between read() and fread()?, Accessed 2014-06-25, <https://stackoverflow.com/questions/584142/what-is-the-difference-between-read-and-fread>
Assuming the buffer size is 100 bytes and the delimiter is a comma (,):
fscanf(f, "%99[^,]", buffer);
If the delimiter or buffer size is not something you can easily hard-code, you can use snprintf to construct the format string programmatically, as in:
char fmt[10+3*sizeof(size_t)];
snprintf(fmt, sizeof fmt, "%%%zu[^%c]", sizeof buffer, delim);
fscanf(f, fmt, buffer);
Alternatively, you can loop calling getc until you get the delimiter. Depending on the implementation and the expected run length, this could be slower or faster than the fscanf method.
On POSIX 2008 conforming systems, you could alternatively use the getdelim function. This allows arbitrary input length (automatically allocated) which may be an advantage (ease of use and flexibility) or a disadvantage (bad input can exhaust all memory).
Edit: Sorry, I missed the part about needing a timeout. In that case using stdio is difficult, and you might be better off writing your own buffering system.

Perl TCP Socket programming vs C recv() function. Do I need to keep track of bytes received?

I am familiar with TCP/IP programming in C but am somewhat new to Perl. In C you need to write a loop around your recv() statement since you are not guaranteed to get all your data from the remote server in one recv() statement.
i.e.
while (Size != 0)
{
    Status = recv(Socket, Buffer, Size, 0);
    Buffer = Buffer + Status;
    Size = Size - Status;
}
Pretty much all the examples I have seen in Perl show just printing what you get from the socket without keeping track of the bytes you received.
i.e.
my $new_sock = $sock->accept();
while (<$new_sock>)
{
    print "$_\n";
}
close($sock);
So, does Perl somehow make sure you get all your data without having to count your bytes?
If the answer is no, can someone point me to an example of Perl TCP code that does keep track like my C example above?
The < > (readline function) is rather high level and great for convenience.
IO::Socket supports a recv method.
As IO::Socket inherits from IO::Handle you may also be interested in the read or sysread methods that closely emulate the low-level C interface for which you're looking...
Your example in C is wrong. recv returns -1 if it failed to read because of an error and 0 on EOF (e.g. shutdown of the connection), so you need to exit the loop once Status <= 0. recv in Perl behaves similarly: it returns undef on error, and the buffer will be '' on EOF. You get the number of bytes read by checking the size of the buffer (length($buf)).
But in your example you don't use recv; you use <>, which is similar to getline, i.e. buffered I/O. And like buffered I/O in C, it will try to read until the requirement (e.g. a full line) is met or EOF occurs.
So for buffered I/O similar to fread, fwrite and getline, use read, write and <>; for unbuffered I/O use recv, sysread and syswrite, which behave the same as C's recv, read and write, i.e. they read/write as much as is possible but don't block unnecessarily.
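A corrected version of the C loop, exiting on error or EOF instead of marching past them (a sketch; the function name is illustrative):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Receive exactly size bytes, looping over short reads.
 * Returns the bytes received (size, or less if the peer closed
 * early), or -1 on error. */
ssize_t recv_all(int sock, char *buffer, size_t size)
{
    size_t got = 0;
    while (got < size) {
        ssize_t status = recv(sock, buffer + got, size - got, 0);
        if (status < 0)
            return -1;              /* error */
        if (status == 0)
            break;                  /* EOF: peer shut down */
        got += (size_t)status;
    }
    return (ssize_t)got;
}
```

Note the original loop would subtract -1 from Size on error (growing it) and spin forever at EOF; checking status <= 0 is what fixes both.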
In the C example, you read a fixed number of bytes. In the Perl example, you read until the stream is closed. The two snippets are different because they perform different tasks.
They don't even use the same tools: the C example uses recv(2), while the Perl example uses readline, built on read(2).
Mind you, you should never read until the stream is closed without validating that you have received the complete stream by some other means. If the connection teardown packets arrive in the wrong order, you can end up missing the end of the stream without knowing it. (Used to happen to me a lot using FTP clients back in the 90's.) The Perl example is not guaranteed to get all the data (though it usually will).