This question is somewhat related to another question I posted before (I'm posting this as a new question, as I didn't want to interrupt the ongoing discussion in the other thread). I'm trying to implement my own read() implementation using (among others) the pread system call. Note that this is not intended to provide better performance whatsoever, but just to check whether pread() can be used to achieve the same as read().
I'm intercepting the read() call and forwarding it to my own read_handler(). There I extract the parameters from their respective registers and execute my mapping.
static volatile int read_handler(void){
    register void *rdi asm ("rdi");
    register void *rsi asm ("rsi");
    register void *rdx asm ("rdx");

    int fd = rdi;
    char *buf = rsi;
    int count = rdx;

    printf("[OWN_READ] Got read(%d, %p, %d)\n", fd, buf, count);

    int current_offset = lseek(fd, 0, SEEK_CUR);
    int pread_return = pread(fd, buf, count, current_offset);

    // set fd offset, as pread will NOT change it automatically
    lseek(fd, current_offset + pread_return, SEEK_CUR);

    return pread_return;
}
So, first I extract the parameters from the registers, which works fine. Next I get the current file offset (as pread will not change the offset, according to the man page). I call pread using the same parameters, plus the current offset of the file. Then I update the file offset using lseek and return the number of bytes read.
As stated in my previous question, a read() call issued from within fseek will somehow break my read implementation. I had a function to get the current file size, which was as follows:
long get_file_size(const char *name)
{
    FILE *temp_file = fopen(name, "rb");
    if (temp_file == NULL)
    {
        return -1;
    }
    fseek(temp_file, 0L, SEEK_END);
    long sz = ftell(temp_file);
    fclose(temp_file);
    return sz;
}
When I execute this function using the reference read() implementation, it returns the correct file size. My implementation, on the other hand, makes get_file_size always return double the actual size.
My understanding of read() and pread() is that the main difference (regarding reading from a file) is that pread will not update the file offset, which I compensate for in my implementation using lseek. Thus (for now not including corner cases), my implementation should work just like the reference implementation.
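To make the intended equivalence explicit, here is a minimal sketch of the mapping I have in mind, written as an ordinary function on a plain seekable descriptor (no register extraction, no error handling):

#include <unistd.h>

/* Sketch only: emulate read() with pread() plus a manual offset update,
 * assuming fd refers to an ordinary seekable file. */
ssize_t read_via_pread(int fd, void *buf, size_t count)
{
    off_t offset = lseek(fd, 0, SEEK_CUR);      /* current file offset */
    ssize_t n = pread(fd, buf, count, offset);  /* pread leaves the offset untouched */
    if (n > 0)
        lseek(fd, offset + n, SEEK_SET);        /* advance the offset by what was read */
    return n;
}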
Additionally (if it is helpful), this get_file_size variant works just fine:
unsigned long get_file_size()
{
    const char *text_file = "/tmp/syscalltest/tests/truncate_test.txt";
    int fd = open(text_file, O_RDONLY);
    unsigned int size = lseek(fd, 0, SEEK_END);
    printf("[FILE SIZE] Current file size: %u\n", size);
    close(fd);
    return size;
}
My goal with this test is to check whether pread() can produce the same output as a read() call. I tried to verify this by executing various test files, including one that uses the get_file_size above. In my previous question it was hinted that maybe my read() implementation has an error, which forces get_file_size() to produce wrong results. I'm trying to understand whether I have to check my read() implementation, or whether the error is caused by the undefined behaviour of using fseek this way in that function. Thanks for any hints on which part might cause an error here.
Can anyone explain the 'correct' semantics for ftell() when used on a memory stream?
Given the following program:
#include <stdio.h>
#include <stdlib.h>
#include <gnu/libc-version.h>

int main(void)
{
    puts(gnu_get_libc_version());

    size_t n_buffer = 1024;
    char *buffer = calloc(n_buffer, sizeof(char));
    FILE *file = fmemopen(buffer, n_buffer, "w");

    /* "ABCD" */
    static const char magic_number[] =
    {
        0x41, 0x42, 0x43, 0x44
    };

    const size_t written = fwrite(magic_number, 1, 4, file);
    fprintf(stderr, "written=%zu\n", written);

    int fstatus = fflush(file);
    fprintf(stderr, "fstatus=%d\n", fstatus);

    long ftellpos = ftell(file);
    fprintf(stderr, "ftellpos=%ld\n", ftellpos);

    fstatus = fseek(file, 0, SEEK_END);
    fprintf(stderr, "fstatus=%d\n", fstatus);

    ftellpos = ftell(file);
    fprintf(stderr, "ftellpos2=%ld\n", ftellpos);

    return 0;
}
The output on RHEL7 is:
2.17
written=4
fstatus=0
ftellpos=4
fstatus=0
ftellpos2=4
Whereas the output on OpenSUSE Leap 42 is:
2.22
written=4
fstatus=0
ftellpos=0
fstatus=0
ftellpos2=4
(This led to a unit test failure in code I was looking at)
My questions are:
Is the fseek() required (by a standard) to make the result of ftell() valid?
Is this a bug or change in behaviour of glibc?
Why doesn't it work on OpenSUSE?
The most obvious implementation is for the file position indicator to be
an index in the memory buffer given to fmemopen.
It's hard to see how that could go wrong.
Indeed the implementation:
https://github.com/bminor/glibc/blob/73dfd088936b9237599e4ab737c7ae2ea7d710e1/libio/fmemopen.c
Has c->pos = pos + s; at line 85.
And presumably ftell() just returns c->pos (in a roundabout way).
There has been some re-organisation of the glibc source between 2.17 and 2.22
that would probably explain this if I could unravel it.
But is it a bug or feature?
I'm not sure whether the POSIX and C standards fully specify whether ftell
should work correctly for a memory stream.
Intuitively, it's hard to see why it shouldn't be mandated, as it
ought to just work.
http://man7.org/linux/man-pages/man3/fmemopen.3.html
Says:
"The current position is implicitly updated by I/O operations.
It can be explicitly updated using fseek(3), and determined using ftell(3)."
Other man pages mention that ftell might not have to work
for things that aren't really files.
However, I believe they really have devices in mind there.
Just found this quote on the net in a discussion:
The ftell() Open Group Base Specifications Issue 7 doc states ‘ftell() shall return the current value of the file-position indicator for the stream.’ The file position indicator is not updated without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), or until the buffer is full.
So, it looks like there is some difference in buffer handling between RHEL and SUSE. You would need to flush the buffer, or perform a file-positioning operation, to read back the correct position.
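If the position really is only guaranteed after a flush or a file-positioning call, a defensive pattern would be to perform a no-op seek before asking for the position; whether this actually restores the expected value on the affected glibc version is something you would have to verify:

/* Defensive pattern (untested on glibc 2.22): force a file-positioning
 * operation before querying the position of the memory stream. */
fflush(file);
fseek(file, 0, SEEK_CUR);   /* no-op seek, but counts as a positioning call */
long pos = ftell(file);
fprintf(stderr, "pos=%ld\n", pos);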
In C, we can find the size of a file using the fseek() function, like this:
if (fseek(fp, 0L, SEEK_END) != 0)
{
    // Handle repositioning error
}
long size = ftell(fp);
So, my question is: is using fseek() and ftell() the recommended method for computing the size of a file?
If you're on Linux or some other UNIX-like system, what you want is the stat function:
struct stat statbuf;
int rval;

rval = stat(path_to_file, &statbuf);
if (rval == -1) {
    perror("stat failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;

rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
    perror("_stati64 failed");
} else {
    printf("file size = %lld\n", (long long)statbuf.st_size);
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() approach works sometimes:

if (fseek(fp, 0L, SEEK_END) == 0) {
    printf("Size: %ld\n", ftell(fp));
}
Problems.
If the file size exceeds about LONG_MAX, the return value of long int ftell(FILE *stream) is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr §7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. @Evert: this most often applies to older platforms than today's, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution for finding a file's size is a non-portable, platform-specific one; see, for example, the good answer by @dbush above.
Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.
Due to these issues, I do not recommend this approach.
Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
Disclaimer: note that C spec footnotes are informative and so not necessarily normative.
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
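For example, a minimal sketch (assuming fd is an already-open descriptor):

#include <sys/stat.h>

/* Return the size of the file behind fd, or -1 on error. */
off_t file_size_from_fd(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_size;
}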
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number of bytes read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file. But if the operating system doesn't allow positioning the file pointer arbitrarily, this is the only way to get the file size.
Or you can move the file pointer to the end-of-file position. This is the lseek(2) you showed in the question; note that lseek returns the resulting offset, so lseek(fd, 0, SEEK_END) gives you the size directly, but you will usually want a second call afterwards to restore the original position (see the sketch at the end of this answer).
Or you can use the stat(2) system call, which will tell you all the administrative information of the file: the owner, group, permissions, size, number of blocks the file occupies on disk, the device the file resides on, the number of directory entries pointing to it, etc. This gives you all that information with only one syscall.
Other methods you mention (like using the ftell(3) stdio library call) will also work (with the same caveat that it takes two calls to move and then retrieve/restore the file pointer), but they have the problem of involving a library that you are probably not using for anything else. It would be cumbersome to obtain a FILE * (e.g. with fdopen(3)) on an int file descriptor just to be able to use ftell(3) on it (twice), and then fclose(3) it again.
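As referenced above, a sketch of the lseek(2) variant that restores the original offset afterwards (assuming fd is a seekable descriptor):

#include <unistd.h>

/* Get the file size with lseek(2) and put the offset back where it was.
 * Returns -1 on error. */
off_t file_size_via_lseek(int fd)
{
    off_t old = lseek(fd, 0, SEEK_CUR);    /* remember the current position */
    if (old == (off_t)-1)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);   /* resulting offset == file size */
    lseek(fd, old, SEEK_SET);              /* restore the original position */
    return size;
}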
I have a legacy function in a library that accepts a FILE* pointer. The contents I would like to parse are actually in memory, not on disk.
So I came up with the following steps to work around this issue:
the data is in memory at this point
fopen a temporary file (using tmpnam or tmpfile) on disk for writing
fclose the file
fopen the same file again for reading - guaranteed to exist
change the buffer using setvbuf(buffer, size)
do the legacy FILE* stuff
close the file
remove the temporary file
the data can be discarded
On Windows, it looks like this:
int bufferSize;
char buffer[bufferSize];
FILE *fp;
// set up the buffer here
// temporary file name
char tempName [L_tmpnam_s];
tmpnam_s(tempName, L_tmpnam_s);
// open/close/reopen
fopen_s(&fp, tempName,"wb");
fclose(fp);
freopen_s(&fp, tempName,"rb", fp);
// replace the internal buffer
setvbuf(fp, buffer, _IONBF, bufferSize);
fp->_ptr = buffer;
fp->_cnt = bufferSize;
// do the FILE* reading here
// close and remove tmp file
fclose(fp);
remove(tempName);
Works, but it is quite cumbersome. The main problems, aside from the backwardness of this approach, are:
the temporary name needs to be determined
the temporary file is actually written to disk
the temporary file needs to be removed afterwards
I'd like to keep things portable, so using Windows memory-mapped functions or boost's facilities is not an option. The problem is mainly that, while it is possible to convert a FILE* to an std::fstream, the reverse seems to be impossible, or at least not supported in C++98.
All suggestions welcome!
Update 1
Using a pipe/fdopen/setvbuf as suggested by Speed8ump, plus a bit of twiddling, seems to work. It no longer creates files on disk, nor does it consume extra memory. One step closer, except that, for some reason, setvbuf is not working as expected. Manually fixing it up is possible, but of course not portable.
// create a pipe for reading, do not allocate memory
int pipefd[2];
_pipe(pipefd, 0, _O_RDONLY | _O_BINARY);
// open the read pipe for binary reading as a file
fp = _fdopen(pipefd[0], "rb");
// try to switch the buffer ptr and size to our buffer, (no buffering)
setvbuf(fp, buffer, _IONBF, bufferSize);
// for some reason, setvbuf does not set the correct ptr/sizes
fp->_ptr = buffer;
fp->_charbuf = fp->_bufsiz = fp->_cnt = bufferSize;
Update 2
Wow. So it seems that unless I dive into the MS-specific CreateNamedPipe / CreateFileMapping APIs, POSIX portability costs us an entire memcpy (of any size!), be it into a file or into a pipe. Hopefully the compiler understands that this is just a temporary and optimizes it. Hopefully.
Still, we eliminated the silly intermediate write to disk. Yay!
int pipefd[2];
pipe(pipefd, bufferSize, _O_BINARY); // setting internal buffer size
FILE* in = fdopen(pipefd[0], "rb");
FILE* out = fdopen(pipefd[1], "wb");
// the actual copy
fwrite(buffer, 1, bufferSize, out);
fclose(out);
// fread(in), fseek(in), etc..
fclose(in);
You might try using a pipe and fdopen; that seems to be portable, is in-memory, and you might still be able to do the setvbuf trick you are using.
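A rough POSIX sketch of that idea (hypothetical helper; note that writing the whole buffer up front only works while it fits into the pipe's capacity, otherwise the write end has to be fed from a second thread or process):

#include <stdio.h>
#include <unistd.h>

/* Sketch: expose an in-memory buffer as a readable FILE* via a pipe.
 * Caveat: write() blocks once the pipe is full (often 64 KiB on Linux),
 * so large buffers need the writing side in a separate thread/process. */
FILE *file_from_buffer(const char *buffer, size_t size)
{
    int fds[2];
    if (pipe(fds) == -1)
        return NULL;
    write(fds[1], buffer, size);   /* assumes size fits the pipe capacity */
    close(fds[1]);                 /* the reader sees EOF after the data */
    return fdopen(fds[0], "r");
}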
Your setvbuf hack is a nice idea, but not portable. C11 (n1570):
7.21.5.6 The setvbuf function
Synopsis
#include <stdio.h>
int setvbuf(FILE * restrict stream,
char * restrict buf,
int mode, size_t size);
Description
[...] If buf is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf function [...] and the argument size specifies the size of the array; otherwise, size may determine the size of a buffer allocated by the setvbuf function. The contents of the array at any time are indeterminate.
There is neither a guarantee that the provided buffer is used at all, nor about what it contains at any point after the setvbuf call until the file is closed or setvbuf is called again (POSIX doesn't give more guarantees).
The easiest portable solution, I think, is using tmpfile, fwrite the data into that file, fseek to the beginning (I'm not sure if temporary files are guaranteed to be seekable, on my Linux system, it appears they are, and I'd expect them to be elsewhere), and pass the FILE pointer to the function. This still requires copying in memory, but I guess usually no writing of the data to the disk (POSIX, unfortunately, implicitly requires a real file to exist). A file obtained by tmpfile is deleted after closing.
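A sketch of that tmpfile approach:

#include <stdio.h>

/* Sketch: copy an in-memory buffer into an anonymous temporary file and
 * rewind it, so it can be handed to a legacy function expecting a FILE*.
 * The file is removed automatically when it is closed. */
FILE *file_from_buffer_tmpfile(const char *buffer, size_t size)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    if (fwrite(buffer, 1, size, fp) != size) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);    /* position at the start for the legacy reader */
    return fp;
}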
In the book Advanced Programming in the UNIX Environment (2nd edition), the author writes in Section 5.5 (stream operations of the standard I/O library) that:
When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.
Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.
I got confused about this. Could anyone explain it a little? For example, in what situations will input and output calls that violate the above restrictions cause unexpected behavior in the program? I guess the reason for the restrictions is related to buffering in the library, but I'm not clear on it.
You aren't allowed to intersperse input and output operations. For example, you can't use formatted input to seek to a particular point in the file, then start writing bytes starting at that point. This allows the implementation to assume that at any time, the sole I/O buffer will only contain either data to be read (to you) or written (to the OS), without doing any safety checks.
f = fopen( "myfile", "r+" );   /* open for read and write */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" );  /* write some data - illegal */
This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.
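So a legal version of the snippet above would be, for example:

f = fopen( "myfile", "r+" );   /* open for read and write (update mode) */
fscanf( f, "hello, world\n" ); /* scan past file header */
fseek( f, 0, SEEK_CUR );       /* switch the buffer from input to output mode */
fprintf( f, "daturghhhf\n" );  /* now writing is allowed */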
Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
It's not clear what you're asking.
Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.
Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"
Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and 4 in the C standard linked above.)
But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:
int main(int argc, char *argv[]) {
    FILE *fp = fopen("foo", "r+");
    fseek(fp, 0, SEEK_SET);
    fwrite("foo", 1, 3, fp);
    fseek(fp, 0, SEEK_SET);
    fwrite("bar", 1, 3, fp);
    char buf[4] = { 0 };
    size_t ret = fread(buf, 1, 3, fp);
    printf("%d %s\n", (int)ret, buf);
}
So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.
If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.
Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.
You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".
There are four ways you could implement something like stdio:
Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead-simple, and as efficient as possible.
Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, and almost as efficient, and a lot safer, but not really any better in any way other than safety.
So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.
You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end 1970s systems, and the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. But, most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy stuff like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.
If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.
For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.
Reason 1
Find the real file position to start at.
Due to the buffer implementation of stdio, the stdio stream position may differ from the OS file position. When you read 1 byte, stdio marks the stream position as 1, but due to buffering stdio may have read 4096 bytes from the underlying file, so the OS records its file position as 4096. When you switch to output, you really need to choose which position you want to use.
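You can see this divergence yourself with a buffered stream (the exact numbers depend on the implementation's buffer size; the file name here is just a placeholder):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("some_large_file", "rb");   /* any file larger than one stdio buffer */
    if (fp == NULL)
        return 1;
    fgetc(fp);                                   /* read a single byte through stdio */
    printf("stdio position: %ld\n", ftell(fp));  /* typically 1 */
    printf("OS position:    %ld\n",
           (long)lseek(fileno(fp), 0, SEEK_CUR)); /* often 4096 or more, due to read-ahead */
    fclose(fp);
    return 0;
}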
Reason 2
Find the right buffer cursor to start at.
tl;dr,
if an underlying implementation only uses a single shared buffer for both read and write, you have to flush the buffer when changing IO direction.
Take the glibc used in Chromium OS as a demonstration of how fwrite, fseek, and fflush handle the single shared buffer.
The fwrite fill-buffer implementation:
fill_buffer:
  while (to_write > 0)
    {
      register size_t n = to_write;
      if (n > buffer_space)
        n = buffer_space;
      buffer_space -= n;
      written += n;
      to_write -= n;
      if (n < 20)
        while (n-- > 0)
          *stream->__bufp++ = *p++;
      else
        {
          memcpy ((void *) stream->__bufp, (void *) p, n);
          stream->__bufp += n;
          p += n;
        }
      if (to_write == 0)
        /* Done writing. */
        break;
      else if (buffer_space == 0)
        {
          /* We have filled the buffer, so flush it. */
          if (fflush (stream) == EOF)
            break;
From this code snippet we can see that if the buffer is full, it will flush it.
Let's take a look at fflush:
int
fflush (stream)
     register FILE *stream;
{
  if (stream == NULL) {...}

  if (!__validfp (stream) || !stream->__mode.__write)
    {
      __set_errno (EINVAL);
      return EOF;
    }

  return __flshfp (stream, EOF);
}
It uses __flshfp:
/* Flush the buffer for FP and also write C if FLUSH_ONLY is nonzero.
   This is the function used by putc and fflush. */
int
__flshfp (fp, c)
     register FILE *fp;
     int c;
{
  /* Make room in the buffer. */
  (*fp->__room_funcs.__output) (fp, flush_only ? EOF : (unsigned char) c);
}
The __room_funcs.__output by default uses flushbuf, which writes out the buffered data:
/* Write out the buffered data. */
wrote = (*fp->__io_funcs.__write) (fp->__cookie, fp->__buffer,
                                   to_write);
Now we are close. What's __write? Tracing the default settings mentioned above, it's __stdio_write:
int
__stdio_write (cookie, buf, n)
     void *cookie;
     register const char *buf;
     register size_t n;
{
  const int fd = (int) cookie;
  register size_t written = 0;

  while (n > 0)
    {
      int count = __write (fd, buf, (int) n);
      if (count > 0)
        {
          buf += count;
          written += count;
          n -= count;
        }
      else if (count < 0
#if defined (EINTR) && defined (EINTR_REPEAT)
               && errno != EINTR
#endif
               )
        /* Write error. */
        return -1;
    }

  return (int) written;
}
__write is the wrapper around the write(2) system call.
As we can see, fwrite uses only one single buffer. If you change direction, that buffer can still hold the previously written contents. As shown above, you can call fflush to empty it.
The same applies to fseek:
/* Move the file position of STREAM to OFFSET
   bytes from the beginning of the file if WHENCE
   is SEEK_SET, the end of the file if it is SEEK_END,
   or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
     register FILE *stream;
     long int offset;
     int whence;
{
  ...
  if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
    return EOF;
  ...
  /* O is now an absolute position, the new target. */
  stream->__target = o;

  /* Set bufp and both end pointers to the beginning of the buffer.
     The next i/o will force a call to the input/output room function. */
  stream->__bufp
    = stream->__get_limit = stream->__put_limit = stream->__buffer;
  ...
}
It will soft-flush (reset) the buffer at the end, which means the read buffer is emptied after this call.
This obeys the C99 rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
I'd like to read only what is already in the buffer of a FILE object, so that afterwards the buffer is empty (and I can use things like sendfile, which operates on file descriptors). I came up with this function, which seems to work on my 64-bit Linux installation:
int readbuf(FILE *stream, char buf[], size_t *size) {
    off_t pos = ftello(stream);
    if (pos < 0) return -1;

    off_t realpos = lseek(fileno(stream), 0, SEEK_CUR);
    if (realpos < 0) return -1;

    if (pos > realpos) {
        errno = EIO;
        return -1;
    }

    size_t bufsize = realpos - pos;
    if (bufsize > *size) {
        *size = bufsize;
        errno = ERANGE;
        return -1;
    }
    *size = bufsize;

    if (fread(buf, bufsize, 1, stream) < 1) {
        return -1;
    }
    return 0;
}
Now I wonder, can I assume this to work on other POSIX compliant operating systems? (On systems that provide all the involved functions.)
If the underlying file descriptor is seekable (either a regular file or a block device, unless you have other weird seekable objects on your system...) then there's no point in what you're trying to do. Just use ftello to get the logical position in the FILE, then discard the FILE and use sendfile. Using the already-buffered data in userspace is actually slower than sendfile anyway.
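A sketch of that suggestion on Linux (out_fd is a hypothetical destination descriptor, e.g. a socket; sendfile(2) is Linux-specific):

#include <stdio.h>
#include <sys/sendfile.h>

/* Sketch: hand the remainder of a seekable FILE* over to sendfile(2).
 * out_fd is whatever descriptor the data should be copied into. */
ssize_t send_rest(int out_fd, FILE *stream, size_t count)
{
    off_t pos = ftello(stream);   /* logical position, accounts for buffered data */
    if (pos == (off_t)-1)
        return -1;
    /* The FILE can be discarded from here on; sendfile works on the fd. */
    return sendfile(out_fd, fileno(stream), &pos, count);
}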
If the underlying file descriptor is not seekable, your whole approach does not work, because lseek will always return -1 and ftello will fail as well. A potential solution in this case (sketched in code after the list):
Use dup to make a new file descriptor referring to the same open file description.
Open /dev/null write-only, and dup2 it on top of the old file descriptor number used by the FILE.
Reading from the FILE will succeed until the buffer is exhausted, then give read errors, since the file descriptor now refers to a non-readable file.
At this point, you're free to read directly from the duplicated fd made in the first step. You're also free to fclose the FILE.
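A sketch of those steps (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of the non-seekable case: duplicate the descriptor, then point the
 * FILE's original descriptor at /dev/null so reads through the FILE fail
 * once its buffer is drained.  Returns the duplicated fd, or -1 on error. */
int detach_fd(FILE *stream)
{
    int oldfd = fileno(stream);
    int newfd = dup(oldfd);                  /* same open file description */
    if (newfd == -1)
        return -1;

    int devnull = open("/dev/null", O_WRONLY);
    if (devnull == -1 || dup2(devnull, oldfd) == -1) {
        if (devnull != -1)
            close(devnull);
        close(newfd);
        return -1;
    }
    close(devnull);

    /* Reads from `stream` now only return what was already buffered; after
     * that they fail, and the FILE can simply be fclose()d.  Fresh data is
     * read from newfd with read(2). */
    return newfd;
}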
For seekable files on Unix platforms you're supposed to be able to use fflush() to coordinate fd-based use with FILE*-based use, including for reading. The full details are given in http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01 and http://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html.
This is an extension over what standard C gives you (unsurprisingly).
I do not believe the stdio API guarantees that this would work on any system. For instance, it might perform readahead if it notices the buffer is empty.
Your "solution" would be at most a specific implementation hack.