Why can't Linux write more than 2147479552 bytes? - c

In man 2 write the NOTES section contains the following note:
On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)
Why is that?
The DESCRIPTION section has the following sentence:
According to POSIX.1, if count is greater than SSIZE_MAX, the result is implementation-defined
SSIZE_MAX is way bigger than 0x7ffff000. Why is this note there?
Update: Thanks for the answer! In case anyone is interested (and for better SEO to help developers out here), all functions with that limitation are:
read
write
sendfile
To find this out, one just has to full-text search the manual pages:
% man -wK "0x7ffff000"
/usr/share/man/man2/write.2.gz
/usr/share/man/man2/read.2.gz
/usr/share/man/man2/sendfile.2.gz
/usr/share/man/man2/sendfile.2.gz

Why is this here?
I don't think there's necessarily a good reason for this - I think this is basically a historical artifact. Let me explain with some git archeology.
In current Linux, this limit is governed by MAX_RW_COUNT:
ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
[...]
if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
That constant is defined as the bitwise AND of INT_MAX and the page mask, which works out to roughly the maximum integer value minus the size of one page.
#define MAX_RW_COUNT (INT_MAX & PAGE_MASK)
So that's where 0x7ffff000 comes from - your platform has pages which are 4096 bytes wide, which is 2^12, so it's the max integer value with the bottom 12 bits unset.
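A quick userspace sketch (my own illustration, not kernel code; assuming 4096-byte pages) that reproduces the arithmetic:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* In the kernel, PAGE_MASK is ~(PAGE_SIZE - 1); assume 4 KiB pages here. */
    long page_size = 4096;
    long max_rw_count = INT_MAX & ~(page_size - 1);
    printf("MAX_RW_COUNT = %#lx (%ld bytes)\n", max_rw_count, max_rw_count);
    /* Prints: MAX_RW_COUNT = 0x7ffff000 (2147479552 bytes) */
    return 0;
}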
The last commit to change this, ignoring commits which just move things around, was e28cc71572da3.
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date: Wed Jan 4 16:20:40 2006 -0800
Relax the rw_verify_area() error checking.
In particular, allow over-large read- or write-requests to be downgraded
to a more reasonable range, rather than considering them outright errors.
We want to protect lower layers from (the sadly all too common) overflow
conditions, but prefer to do so by chopping the requests up, rather than
just refusing them outright.
So, this gives us a reason for the change: to prevent integer overflow, the size of the write is capped at a size near the maximum integer. Most of the surrounding logic seems to have been changed to use longs or size_t's, but the check remains.
Before this change, giving it a buffer larger than INT_MAX would result in an EINVAL error:
if (unlikely(count > INT_MAX))
goto Einval;
As for why this limit was put in place, it existed prior to 2.6.12, the first version that was put into git. I'll let someone with more patience than me figure that one out. :)
Is this POSIX compliant?
Putting on my standards lawyer hat, I think this is actually POSIX compliant. Yes, POSIX does say that writes larger than SSIZE_MAX are implementation-defined behavior, and 0x7ffff000 is not larger than that limit. However, there are two other sentences in the standard which I think are important:
The write() function shall attempt to write nbyte bytes from the buffer pointed to by buf to the file associated with the open file descriptor, fildes.
[...]
Upon successful completion, write() and pwrite() shall return the number of bytes actually written to the file associated with fildes. This number shall never be greater than nbyte. Otherwise, -1 shall be returned and errno set to indicate the error.
The partial write is explicitly allowed by the standard. For this reason, all code which calls write() needs to wrap calls to write() in a loop which retries short writes.
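For illustration, a minimal sketch of such a retry loop (my own example, not taken from the standard):

#include <errno.h>
#include <unistd.h>

/* Write the full buffer, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error (errno set by write()). */
static int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error */
        }
        p += n;
        count -= (size_t)n;
    }
    return 0;
}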
Should the limit be raised?
Ignoring the historical baggage, and the standard, is there a reason to raise this limit today?
I'd argue the answer is no. The optimal size of the write() buffer is a tradeoff between trying to avoid excessive context switches between kernel and userspace, and ensuring your data fits into cache as much as possible.
The coreutils programs (which provide cat, cp, etc) use a buffer size of 128KiB. The optimal size for your hardware might be slightly larger or smaller. But it's unlikely that 2GB buffers are going to be faster.
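For reference, a cat-like copy loop with a 128 KiB buffer might look like the sketch below (short-write handling is simplified here; a real program would retry as shown above):

#include <unistd.h>

/* Copy everything from in_fd to out_fd using a 128 KiB buffer. */
int copy_fd(int in_fd, int out_fd)
{
    static char buf[128 * 1024];
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return 0;            /* end of file */
        if (n < 0)
            return -1;           /* read error */
        if (write(out_fd, buf, (size_t)n) != n)
            return -1;           /* short write or error (simplified) */
    }
}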

Related

Why is the return type for ftell not fpos_t?

According to C99, the prototype for ftell is:
long int ftell(FILE *stream);
From what I understood it should be the following instead:
fpos_t ftell(FILE *stream);
Why is that?
From §7.19.1-2
fpos_t which is an object type other than an array type capable of recording all the information needed to specify uniquely every position within a file.
I understand that fpos_t should be used to record a position within a file. So ftell which returns a position within a file should be of that type. Instead it is:
signed
of type long which can be too small or too big to access a file on certain architectures.
Notice that fpos_t is
[...] a complete object type other than an array type capable of recording all the information needed to specify uniquely every position within a file.
So it can even be a structure, totally unusable for anything else besides calling fsetpos!
On the other hand, the return value of ftell is a scalar which is guaranteed to give the exact byte position in a binary file:
For a binary stream, the value is the number of characters from the beginning of the file.
Other than that, the reason is backwards compatibility. ftell debuted in C89, and perhaps the expectation then was that long would remain large enough to hold any file size, something that is not always true nowadays. It is too late to change the type returned by ftell now; even those platforms that support larger files provide functions with another name, such as ftello.
The signedness is required because the function returns -1 on error.
From the manpage of fgetpos()/fsetpos():
On some non-UNIX systems, an fpos_t object may be a complex object and these routines may be the only way to portably reposition a text stream.
whereas ftell() is required to return the offset of the file pointer in the file. These are completely different interfaces.
Historical reasons.
fseek and ftell are very old functions, predating C standardization. They assume that long is big enough to represent a position in any file -- an assumption that was probably valid at the time. long is at least 32 bits, and obviously you couldn't have a single file bigger than 2 gigabytes (or even 1.21 gigabytes).
By the time the first C standard was published (ANSI C, 1989), it was becoming obvious that this assumption was no longer valid, but changing the definitions of fseek and ftell would have broken existing code. Furthermore, there was still no integer type wider than long (long long wasn't introduced until C99).
The ANSI C committee decided that fseek and ftell were still useful, but they introduced new file positioning functions fsetpos and fgetpos. These functions use an opaque non-numeric type fpos_t rather than long, which makes them both more and less flexible than fseek and ftell. An implementation can define fpos_t so it can represent any possible file offset -- but since it's a non-numeric type, fsetpos and fgetpos don't provide the SEEK_SET / SEEK_CUR / SEEK_END feature. For example, there's no way to use fsetpos to position a file to its end.
Some of this is addressed in the ANSI C Rationale, section 4.9.9:
Given these restrictions, the Committee still felt that this function [fseek] has enough utility, and is used in sufficient existing code, to warrant its retention in the Standard. fgetpos and fsetpos have been added to deal with files which are too large to handle with fseek and ftell.
If this were being defined from scratch today, there would probably be a single pair of functions covering all the functionality of the current four functions, likely using a typedefed integer type required to be big enough to represent any possible file offset. (With current systems, 64 bits is likely to be sufficient, but I wouldn't be surprised to see 8-exabyte files before too long on large systems).
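To make the contrast concrete, here is a minimal sketch (my own illustration) of saving and restoring a position with the opaque fpos_t versus reading a byte offset with ftell:

#include <stdio.h>

/* Save and restore a position with the opaque fpos_t (portable even when
   the position does not fit in a long), versus reading a byte offset with
   ftell (a scalar, but limited to long). */
int demo(FILE *fp)
{
    fpos_t pos;
    if (fgetpos(fp, &pos) != 0)      /* remember where we are */
        return -1;

    long off = ftell(fp);            /* same place as a byte count, or -1 */
    if (off == -1L)
        return -1;

    /* ... read or seek around ... */

    return fsetpos(fp, &pos);        /* go back; no SEEK_END equivalent */
}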
The most probable use of that is to allow error return values as negative numbers. It's the same idea as POSIX functions such as read() and write(), which return ssize_t, a signed counterpart of size_t, rather than size_t.
The first of these tricks happened with getchar(), which returns int instead of char, to allow for the value returned on the end-of-file condition (EOF), which is normally a negative value, in contrast with the whole set of possible returned characters (in the range 0 to 255, all non-negative).
Why didn't they define a signed extension of the same type to allow for -1? I don't actually know :)
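For what it's worth, the classic idiom that the wider int return type enables looks like this:

#include <stdio.h>

int main(void)
{
    int c;                       /* int, not char: must be able to hold EOF */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}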

C Buffer underflows definition and associated risk

According to Wikipedia:
In computing, buffer underrun or buffer underflow is a state occurring when a buffer used to communicate between two devices or processes is fed with data at a lower speed than the data is being read from it.
From apple's secure coding guide:
Fundamentally, buffer underflows occur when two parts of your code disagree about the size of a buffer or the data in that buffer. For example, a fixed-length C string variable might have room for 256 bytes, but might contain a string that is only 12 bytes long.
Apple's definition complements the idea of buffer overflow.
Which of these definitions is technically more sound?
Is buffer underflow a major security concern? I have the habit of using large buffers to poll and read() from serial ports or sockets (although I do use bzero()). Is this the right thing to do?
Those are two different usages of the word "underflow". As they are describing two different things, I don't think you can compare them on technical soundness.
Buffer underflow, as per Apple's definition, could be a weakness. See http://cwe.mitre.org/data/definitions/124.html.
2) "I do use bzero(). Is this the right thing to do?"
Almost certainly no. The system calls return how many bytes have been received. If you're absolutely certain that you are going to receive text-style data with no embedded nulls, and wish to use C-style string lib calls on it, just push one null onto the end of the buffer (this often means reading one less byte than the declared buffer length, to ensure there is enough space for the null). In all other cases, just don't bother with the terminator at all. It's going to be either pointless or dangerous.
bzero() is just a waste of cycles in the case of network buffers. I don't care how many web page examples there are or how many sources say 'vars/buffers must be initialized'. It's rubbish.
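To illustrate the advice above, a minimal sketch (my own example) of reading from a socket or serial port and NUL-terminating only the bytes actually received:

#include <stdio.h>
#include <unistd.h>

/* Read at most sizeof(buf) - 1 bytes so there is always room for a
   terminating NUL, then terminate at exactly the number of bytes read.
   No bzero() of the whole buffer is needed. */
ssize_t recv_and_terminate(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n < 0)
        return -1;          /* error: check errno */
    buf[n] = '\0';          /* safe: n <= sizeof(buf) - 1 */
    printf("got %zd bytes: %s\n", n, buf);   /* only valid for text data */
    return n;
}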

Weird behavior in pread()?

So I have a function which takes an offset and a width to read from a device (usually hard disks). I currently use fseeko() and fread() to read from the disk. However, I'd like to replace this with pread() as it's just more concise. It seems, however, that pread ALWAYS reads from offset 0 no matter what.
Here is the code of the function:
Just a heads-up:
I do have
#define _FILE_OFFSET_BITS 64
at the top of my code. I also tried pread64() with the same result!
fseeko() with fread(), which does what I want:
uint8_t retrievedata(FILE *fp, uint64_t seekpoint, uint64_t seekwidth) {
    unsigned char buf[seekwidth];
    if (fseeko(fp, seekpoint, SEEK_SET) == 0) {
        if (fread(buf, sizeof buf, 1, fp) == 1) {
            /* do work with retrieved data */
            return 0;
        }
        else {
            printf("ERROR READING AT: %"PRIu64"| WITH VAL WIDTH: %"PRIu64"\n", seekpoint, seekwidth);
            return 4;
        }
    }
    else {
        printf("ERROR SEEKING AT: %"PRIu64"\n", seekpoint);
        return 3;
    }
}
pread(), which always reads from offset 0 no matter what seekpoint is:
uint8_t retrievedata(FILE *fp, uint64_t seekpoint, uint64_t seekwidth) {
    unsigned char buf[seekwidth];
    if (pread(fileno(fp), buf, seekwidth, seekpoint) != -1) {
        /* do something */
        return 0;
    } else {
        printf("ERROR SEEKING AND/OR READING AT: %"PRIu64"\n", seekpoint);
        return 3;
    }
}
Enable compiler warnings (-W -Wall for GCC), and read the "Feature Test Macro Requirements for GLIBC" sections in the man pages.
The warnings would have indicated that you were missing another required macro definition, and the man 2 pread man page would have told you that the definition you also need is
#define _POSIX_C_SOURCE 200809L
to get the GNU C library version 2.12 or newer to declare pread() etc. correctly.
The behaviour you are seeing is due to not having that declaration.
I want to tell you how I perceive the full story, because I hope it will show you exactly how much sweat and effort you can save by reading and understanding the man pages, and more importantly, by enabling and addressing the compiler warnings.
I know you think you just don't have time to do all that right now (maybe later, right?), but you're wrong: it is one of the best ways to save time, when writing C for Linux (or POSIX-like systems in general).
I myself always use -W -Wall with GCC. It just makes it much easier to find where the cause of the problem lies. Right now, you were concentrating on the primary symptom instead, and since programming is not politics (yet), it won't get you anywhere.
So, full picture:
fread() returns the number of elements read. In your case, you read one element of seekwidth bytes, and you verify that one element was correctly read.
pread() returns the number of bytes read. Your second code snippet tries to read seekwidth bytes starting at offset seekpoint, but you don't check how much data was actually read, you only check whether an error occurred.
Here is what I think is happening to you:
You have omitted the #define _POSIX_C_SOURCE 200809L declaration (prior to #include <unistd.h>) required since glibc 2.12 to get the pread() function prototype declared
You are compiling without warnings, or you are ignoring the "implicit declaration of function `pread`" warning
You are compiling on a 32-bit architecture, or to 32-bit architecture using -m32 GCC option
Without a function prototype, the compiler assumes all parameters to the pread() function are ints -- and therefore 32-bit --, and only supplies a 32-bit seekpoint value to pread(). The first three parameters (an int, a pointer, and a size_t) all happen to be 32-bit, but the fourth parameter, the file offset, is 64-bit. (Remember, you told your C library so, using #define _FILE_OFFSET_BITS 64.)
This means that the 64-bit file offset pread() receives is garbage. (Exactly how the value is garbled depends on the byte order -- little-endian or Intel-like, or big-endian or Motorola/PowerPC/ARM-like -- and on how the particular architecture's binary interface (ABI) passes the fourth parameter to a function.)
In this case, I believe you're targeting 32-bit Intel architecture, where often the value pread() actually receives (as garbled) is positive and larger than the file size, and therefore pread() returns 0: "past end-of-file, no more data to read".
Your code ignores that (since it is not -1), and instead assumes it read the data successfully. Most likely the data you're seeing is from a previous read -- pread() does not modify the buffer if it returns 0.
(There are other possible variants of the situation, too; perhaps even some where pread() always receives a zero as the (garbled) offset. It does not actually matter, as in all cases having the proper function prototype fixes the problem. You do need to check the pread() return value, too, as there are no guarantees it'll actually read the number of bytes you requested. It does so often, yes; but there are no guarantees, so make no unfounded assumptions, please.)
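Putting it all together, a corrected sketch might look like this (assuming the same retrievedata() interface as in the question):

/* Feature-test macros must come before any #include. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

uint8_t retrievedata(FILE *fp, uint64_t seekpoint, uint64_t seekwidth)
{
    unsigned char buf[seekwidth];
    ssize_t n = pread(fileno(fp), buf, seekwidth, (off_t)seekpoint);
    if (n < 0) {
        printf("ERROR READING AT: %" PRIu64 "\n", seekpoint);
        return 3;
    }
    if ((uint64_t)n != seekwidth) {
        /* Short reads are legal: handle or retry as appropriate. */
        printf("SHORT READ AT: %" PRIu64 " (got %zd of %" PRIu64 ")\n",
               seekpoint, n, seekwidth);
        return 4;
    }
    /* do work with the retrieved data */
    return 0;
}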

Buffer overflow that overwrites local variables

I'm doing a buffer overflow exercise where the source code is given. The exercise allows you to change the number of argument vectors you feed into the program, so you can get around the null-byte problem, making it easy.
However the exercise also mentions that it is possible to use just 1 argument vector to compromise this code. I'm curious to see how this can be done. Any ideas on how to approach this would be greatly appreciated.
The problem here is that the length needs to be overwritten in order for the overflow to take place and the return address to be compromised. To my knowledge, you can't really use NULs in the string since it is passed in via execve arguments. So the length ends up being a very large number, as you have to write some non-zero bytes, causing the entire stack to go boom; it's the same with the return address. Am I missing something obvious? Does strlen need to be exploited? I saw some references to arithmetic overflow of signed numbers, but I'm not sure if overwriting the local variables does anything.
The code is posted below and returns to a main function which then ends the program. It runs on a little-endian system with all stack protection turned off, as this is an introductory exercise for infosec:
int TrickyOverflowSeq(char *in)
{
    char to_be_exploited[128];
    int c;
    int limit;

    limit = strlen(in);
    if (limit > 144)
        limit = 144;
    for (c = 0; c <= limit; c++)
        to_be_exploited[c] = in[c];
    return(0);
}
I don't know where arg comes from, but since your buffer is only 128 bytes, and you cap the max length to 144, you need only pass in a string longer than 128 bytes to cause a buffer overrun when copying in to to_be_exploited. Any malicious code would be in the input buffer from positions 129 to 144.
Whether or not that will properly set up a return to a different location depends on many factors.
However the exercise also mentions that it is possible to use just 1 argument vector to compromise this code. I'm curious to see how this can be done.
...
The problem here is that length needs to be overwritten in order for the overflow to take place and the return address to be compromised.
It seems pretty straightforward to me. That magic number 144 makes sense if sizeof(int) == 8, which it would if you are building for 64-bit.
So assuming a stack layout where to_be_exploited comes before c and limit, you can simply pass in a very long string with junk in the bytes starting at offset 136 (i.e., 128 + sizeof(int)), and then carefully crafted junk in the bytes starting at offset 144. The junk starting at offset 136 overwrites limit, thus disabling the length check, and the carefully crafted junk starting at offset 144 overwrites the return address.
You could put almost anything into the 8 bytes starting at offset 136 and have them make a number that is large enough to disable the security check. Just make sure you don't end up with a negative number. For example, the string "HAHAHAHA" would evaluate, as an integer, to 5206522089439316033. This number is larger than 144... actually, it's too large as you want this function to stop copying once your string is copied. So you just need to figure out how long your attack string actually is and put the correct bytes for that length into that position, and the attack will be copied in.
Note that normal string-handling functions in C use a NUL byte as a terminator, and stop copying. This function doesn't do that; it just trusts limit. So you could put any junk you want in the input string to exploit this function. However, if normal C library functions need to copy the input data, you might end up needing to avoid NUL bytes.
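To illustrate the layout only, here is a hypothetical sketch under the assumptions above (8-byte ints, to_be_exploited followed by c and limit); it contains no real shellcode or addresses:

#include <string.h>

/* Hypothetical: fill the buffer and the c slot with non-NUL filler, land a
   large positive value on limit at offset 136, and put the (NUL-free) bytes
   that will eventually reach the return address from offset 144 onward. */
void build_attack_string(char *arg, size_t arglen)
{
    memset(arg, 'A', arglen - 1);         /* filler: no NUL bytes */
    memcpy(arg + 136, "HAHAHAHA", 8);     /* limit becomes a huge positive int */
    /* bytes from arg + 144 onward: carefully chosen return-address bytes */
    arg[arglen - 1] = '\0';               /* argv strings are NUL-terminated */
}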
Of course nobody should put code this silly into production.
EDIT: I wrote the above in a hurry. Now that I have more time, I re-read your question and I think I better understand what you wanted to have explained.
You are wondering how a string can correctly clobber limit with a correct length without having strlen() chop it off short. This is impossible on a big-endian computer, but perfectly possible on a little-endian computer.
On a little-endian computer, the first byte is the least significant byte. See the Wikipedia entry:
http://en.wikipedia.org/wiki/Endianness
Any number that is not ridiculously large must have zero in its most significant bytes. On a big-endian computer that means the first several bytes will all be zero, will act like a NUL, and will cause strlen() to chop the string before the function can clobber limit. However, on a little-endian computer, the important bytes you want copied will all come before the NUL bytes.
In the early days of the Internet, it was common for big-endian computers (often bought from Sun Microsystems) to run Internet server apps. These days, commodity x86 server hardware is most common, and x86 is little-endian. In practice, anyone deploying such exploitable code as the TrickyOverflowSeq() function will get 0wned.
If you don't think this answer is thorough enough, please post a comment explaining what part you think I need to cover better and I'll update the answer.
I am aware that this is quite an old post, however I stumbled on your question because I found myself in the same situation with exactly the same questions as the ones you ask in your post and in the comments.
A few minutes later, I solved the problem. I don't know how much of it I should "spoil" here, since AFAIK this is a typical problem in many Computer Security courses. I can say, however, that the solution can indeed be achieved with exactly one argument... and with a couple of environment variables. Additional hint: environment variables are stored after the function arguments on the stack (that is, at higher addresses than the function arguments).

Using fseek and ftell to determine the size of a file has a vulnerability?

I've read posts that show how to use fseek and ftell to determine the size of a file.
FILE *fp;
long file_size;
char *buffer;

fp = fopen("foo.bin", "r");
if (NULL == fp) {
    /* Handle Error */
}
if (fseek(fp, 0, SEEK_END) != 0) {
    /* Handle Error */
}
file_size = ftell(fp);
if (-1L == file_size) {
    /* Handle Error */
}
buffer = (char*)malloc(file_size);
if (NULL == buffer) {
    /* handle error */
}
I was about to use this technique but then I ran into this link that describes a potential vulnerability.
The link recommends using fstat instead. Can anyone comment on this?
The link is one of the many nonsensical pieces of C coding advice from CERT. Their justification is based on liberties the C standard allows an implementation to take, but which are not allowed by POSIX and thus irrelevant in all cases where you have fstat as an alternative.
POSIX requires:
that the "b" modifier for fopen have no effect, i.e. that text and binary mode behave identically. This means their concern about invoking UB on text files is nonsense.
that files have a byte-resolution size set by write operations and truncate operations. This means their concern about random numbers of null bytes at the end of the file is nonsense.
Sadly with all the nonsense like this they publish, it's hard to know which CERT publications to take seriously. Which is a shame, because lots of them are serious.
If your goal is to find the size of a file, definitely you should use fstat() or its friends. It's a much more direct and expressive method--you are literally asking the system to tell you the file's statistics, rather than the more roundabout fseek/ftell method.
A bonus tip: if you only want to know if the file is available, use access() rather than opening the file or even stat'ing it. This is an even simpler operation which many programmers aren't aware of.
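A minimal fstat()-based sketch (POSIX; my own example):

#include <stdio.h>
#include <sys/stat.h>

/* Get a file's size directly from its metadata. */
long long file_size_fd(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;                     /* errno set by fstat */
    return (long long)st.st_size;
}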
The reason to not use fstat is that fstat is POSIX, but fopen, ftell and fseek are part of the C Standard.
There may be a system that implements the C Standard but not POSIX. On such a system fstat would not work at all.
I'd tend to agree with their basic conclusion that you generally shouldn't use the fseek/ftell code directly in the mainstream of your code -- but you probably shouldn't use fstat either. If you want the size of a file, most of your code should use something with a clear, direct name like filesize.
Now, it probably is better to implement that using fstat where available, and (for example) FindFirstFile on Windows (the most obvious platform where fstat usually won't be available).
The other side of the story is that many (most?) of the limitations on fseek with respect to binary files actually originated with CP/M, which didn't explicitly store the size of a file anywhere. The end of a text file was signaled by a control-Z. For a binary file, however, all you really knew was what sectors were used to store the file. In the last sector, you had some amount of unused data that was often (but not always) zero-filled. Unfortunately, there might be zeros that were significant, and/or non-zero values that weren't significant.
If the entire C standard had been written just before being approved (e.g., if it had been started in 1988 and finished in 1989) they'd probably have ignored CP/M completely. For better or worse, however, they started work on the C standard in something like 1982 or so, when CP/M was still in wide enough use that it couldn't be ignored. By the time CP/M was gone, many of the decisions had already been made and I doubt anybody wanted to revisit them.
For most people today, however, there's just no point -- most code won't port to CP/M without massive work; this is one of the relatively minor problems to deal with. Making a modern program run in only 48K (or so) of memory for both the code and data is a much more serious problem (having a maximum of a megabyte or so for mass storage would be another serious problem).
CERT does have one good point though: you probably should not (as is often done) find the size of a file, allocate that much space, and then assume the contents of the file will fit there. Even though the fseek/ftell will give you the correct size with modern systems, that data could be stale by the time you actually read the data, so you could overrun your buffer anyway.
According to the C standard, §7.21.3:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
A letter-of-the-law kind of guy might think this UB can be avoided by calculating file size with:
fseek(file, -1, SEEK_END);
size = ftell(file) + 1;
But the C standard also says this:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
As a result, there is nothing we can do to fix this with regard to fseek / SEEK_END. Still, I would prefer fseek / ftell instead of OS-specific API calls.
