fseek - fails skipping a large number of bytes? - c

I'm trying to skip a large amount of bytes before using fread to read the next bytes.
When size is small #define size 6404168 - it works:
long int x = ((long int)size)*sizeof(int);
fseek(fincache, x, SEEK_CUR);
When size is huge #define size 649218227, it doesn't :( The next fread reads garbage; I can't really tell which offset it is reading from.
Using fread instead as a workaround works in both cases, but it's really slow:
temp = (int *) calloc(size, sizeof(int));
fread(temp,1, size*sizeof(int), fincache);
free(temp);

Assuming sizeof(int) is 4 and you are on a 32-bit system (where sizeof(long) is 4), 649218227 * 4 would overflow what a long can hold. Signed integer overflow is undefined behaviour, so it works only for smaller values (those below LONG_MAX).
You can use a loop instead to fseek() the necessary bytes, at most LONG_MAX bytes per call (the snippet assumes <stdint.h> and <limits.h> are included):
long x;
intmax_t len = (intmax_t)size * sizeof(int); /* total bytes to skip */

while (len > 0) {
    x = (long)(len > LONG_MAX ? LONG_MAX : len);
    fseek(fincache, x, SEEK_CUR);
    len -= x;
}

The offset argument of fseek is required to be a long, not a long long. So x must fit into a long, else don't use fseek.
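If you want to guard against that at run time, a minimal range check could look like the sketch below (fseek_checked is just an illustrative name):
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Returns fseek's result, or -1 if the offset cannot be represented as long. */
static int fseek_checked(FILE *f, intmax_t offset, int whence)
{
    if (offset > LONG_MAX || offset < LONG_MIN)
        return -1; /* too big for fseek(): fall back to fseeko() or chunked seeks */
    return fseek(f, (long)offset, whence);
}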

Since your platform's int is most likely 32-bit, multiplying 649,218,227 by sizeof(int) results in a number that exceeds INT_MAX and LONG_MAX, which are both 2^31 - 1 on 32-bit platforms. Since fseek accepts a long int, the resulting overflow causes your program to seek to the wrong place and read garbage.
You should consult your compiler's documentation to find out whether it provides an extension for 64-bit seeking. On POSIX systems, for example, you can use fseeko, which accepts an offset of type off_t.
Be careful not to introduce overflow before even calling the 64-bit seeking function. Careful code could look like this:
off_t offset = (off_t) size * (off_t) sizeof(int);
fseeko(fincache, offset, SEEK_CUR);
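A fuller sketch, assuming a POSIX system (skip_ints is an illustrative helper; on 32-bit glibc you would also define _FILE_OFFSET_BITS=64 before any #include so that off_t is 64 bits wide):
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() */
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit glibc */
#include <stdio.h>
#include <sys/types.h>

#define size 649218227           /* from the question */

static int skip_ints(FILE *fincache)
{
    /* Do the multiplication in off_t so it cannot overflow a 32-bit long. */
    off_t offset = (off_t)size * (off_t)sizeof(int);
    return fseeko(fincache, offset, SEEK_CUR); /* 0 on success, -1 on error */
}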

Input guidance for fseek:
http://www.tutorialspoint.com/c_standard_library/c_function_fseek.htm
int fseek(FILE *stream, long int offset, int whence)
offset − This is the number of bytes to offset from whence.
You are invoking undefined behavior by passing a long long (whose value is bigger than the max of long int) to fseek rather than the required long.
As is known, UB can do anything, including not work.

Try this; you may have to read the bytes out in chunks if the offset is that large:
size_t toseek = 6404168;
// change the number to match the bytes you need to skip
while (toseek > 0)
{
    char buffer[4096];
    size_t toread = sizeof(buffer) < toseek ? sizeof(buffer) : toseek; // standard C has no min()
    size_t got = fread(buffer, 1, toread, fincache);
    if (got == 0) // EOF or read error: stop instead of looping forever
        break;
    toseek -= got;
}

Related

Get size of char buffer for sprintf handling longs C

I am quite new to C and ran into a question when dealing with longs and char buffers in C. I want to store a long in a char buffer, but I am not sure how to size the buffer so that it fits any given long.
That's what I want:
char buffer[LONG_SIZE]; // what should LONG_SIZE be to fit any long, not depending on the OS?
sprintf(buffer, "%ld", some_long);
I need to use C, not C++. Is there any solution to this, if I don't want to use magic numbers?
if I don't want to use magic numbers
Calling snprintf() with a NULL buffer and a length of 0 returns the number of characters needed to hold the result (not counting the terminating '\0'). You can then allocate enough space to hold the string on demand:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(void) {
long some_long = LONG_MAX - 5;
// Real code should include error checking and handling.
int len = snprintf(NULL, 0, "%ld", some_long);
char *buffer = malloc(len + 1);
snprintf(buffer, len + 1, "%ld", some_long);
printf("%s takes %d chars\n", buffer, len);
free(buffer);
}
There's also asprintf(), available in Linux glibc and some BSDs, that allocates the result string for you, with a more convenient (But less portable) interface than the above.
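For example, a minimal asprintf() sketch (assuming glibc, where it needs _GNU_SOURCE) could look like this:
#define _GNU_SOURCE /* for asprintf() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    char *buffer = NULL;
    if (asprintf(&buffer, "%ld", LONG_MAX - 5) < 0)
        return 1; /* allocation or formatting failed */
    printf("%s\n", buffer);
    free(buffer);
    return 0;
}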
Allocating the needed space on demand instead of using a fixed size has some benefits; it'll continue to work without further adjustment if you change the format string at some point in the future, for example.
Even if you stick with a fixed length buffer, I recommend using snprintf() over sprintf() to ensure you won't somehow overwrite the buffer.
It is probably more correct to use snprintf to compute the necessary size, but it seems like this should work:
char buf[ sizeof(long) * CHAR_BIT ];
The number of bits in a long is sizeof(long) * CHAR_BIT. (CHAR_BIT is defined in <limits.h>.) This can represent a signed number of magnitude at most 2^(sizeof(long) * CHAR_BIT - 1).
Such a number can have at most floor(log10(2^(sizeof(long) * CHAR_BIT - 1))) + 1 decimal digits, which is floor((sizeof(long) * CHAR_BIT - 1) * log10(2)) + 1. log10(2) is less than 0.302, so (sizeof(long) * CHAR_BIT - 1) * 302 / 1000 + 1 bytes is enough for the digits.
Add one for a sign and one for a terminating null character, and char[(sizeof(long) * CHAR_BIT - 1) * 302 / 1000 + 3] suffices for the buffer.
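As a sketch, that bound can be wrapped in a macro (LONG_STR_SIZE is just an illustrative name):
#include <limits.h>
#include <stdio.h>

/* digits + sign + terminating null, per the bound derived above */
#define LONG_STR_SIZE ((sizeof(long) * CHAR_BIT - 1) * 302 / 1000 + 3)

int main(void) {
    char buffer[LONG_STR_SIZE];
    long some_long = LONG_MIN; /* worst case: the longest decimal string */
    snprintf(buffer, sizeof buffer, "%ld", some_long);
    printf("%s fits in a %zu-byte buffer\n", buffer, sizeof buffer);
    return 0;
}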

C Programming - Size of 2U and 1024U

I know that the U literal in C means that the value is an unsigned integer. An unsigned integer's size is 4 bytes.
But how big are 2U or 1024U? Does this simply mean 2 * 4 bytes = 8 bytes, for example, or does this notation mean that 2 (or 1024) are unsigned integers?
My goal is to figure out how much memory will be allocated if I call malloc like this
int *allocated_mem = malloc(2U * 1024U);
and to prove my answer in a short program. What I tried looks like this:
printf("Size of 2U: %ld\n", sizeof(2U));
printf("Size of 1024U: %ld\n", sizeof(1024U));
I would have expected for the first line a size of 2 * 4 bytes = 8 and for the second 1024 * 4 bytes = 4096, but the output is always "4".
I would really appreciate an explanation of what 2U and 1024U mean exactly, and how I can check their size in C.
My goal is to figure out how much memory will be allocated if I call malloc like this: int *allocated_mem = malloc(2U * 1024U);
What is difficult about 2 * 1024 == 2048? The fact that they are unsigned literals does not change their value.
An unsigned integer's size is 4 bytes.
You are correct. So 2U takes up 4 bytes, and 1024U takes up 4 bytes, because they are both unsigned integers.
I would have expected for the first line a size of 2 * 4 bytes = 8 and for the second 1024 * 4 bytes = 4096, but the output is always "4".
Why would the value change the size? The size depends only on the type. 2U is of type unsigned int, so it takes up 4 bytes; same as 50U, same as 1024U. They all take 4 bytes.
You are trying to multiply the value (2) by the size of the type. That makes no sense.
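A short test program (a sketch; %zu is the matching conversion for the size_t value that sizeof yields) makes the distinction visible:
#include <stdio.h>

int main(void) {
    printf("sizeof(2U)    = %zu\n", sizeof(2U));    /* size of the type, typically 4 */
    printf("sizeof(1024U) = %zu\n", sizeof(1024U)); /* same type, same size */
    printf("2U * 1024U    = %u\n", 2U * 1024U);     /* the value: 2048 bytes requested */
    return 0;
}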
How big?
2U and 1024U are the same size, the size of an unsigned, commonly 32-bits or 4 "bytes". The size of a type is the same throughout a given platform - it does not change because of value.
"I know that the U literal means in c, that the value is a unsigned integer." --> OK, close enough so far.
"An unsigned integers size is 4 bytes.". Reasonable guess yet C requires that unsigned are at least 16-bits. Further, the U makes the constant unsigned, yet that could be unsigned, unsigned long, unsigned long long, depending on the value and platform.
Detail: in C, 2U is not a literal, but a constant. C has string literals and compound literals. The literals can have their address taken, but &2U is not valid C. Other languages call 2U a literal, and have their rules on how it can be used.
My goal is to figure out how much memory will be allocated if I call malloc like this: int *allocated_mem = malloc(2U * 1024U);
Instead, it is better to use size_t for sizing than unsigned, and to check the allocation.
size_t sz = 2U * 1024U;
int *allocated_mem = malloc(sz);
if (allocated_mem == NULL) sz = 0; /* allocation failed: nothing usable */
printf("Allocation size %zu\n", sz);
(Aside) Be careful with computed sizes. Do your size math using size_t types. 4U * 1024U * 1024U * 1024U could overflow unsigned math, yet may compute as desired with size_t.
size_t sz = (size_t)4 * 1024 * 1024 * 1024;
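For example, a sketch of the pitfall, assuming 32-bit unsigned int and 64-bit size_t:
#include <stdio.h>

int main(void) {
    unsigned int wrapped = 4U * 1024U * 1024U * 1024U; /* unsigned math wraps to 0 */
    size_t sz = (size_t)4 * 1024 * 1024 * 1024;        /* 4294967296 when size_t is 64-bit */
    printf("unsigned math: %u\n", wrapped);
    printf("size_t math:   %zu\n", sz);
    return 0;
}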
The following prints the size of the constants, which is likely 32 bits or 4 "bytes", not their values.
printf("Size of 1024U: %ld\n", sizeof(1024U));
printf("Size of 2U: %ld\n", sizeof(2U));

Quick-fixing 32-bit (2GB limited) fseek/ftell on freebsd 7

I have an old 32-bit C/C++ program on FreeBSD, which is used remotely by hundreds of users, and whose author will not fix it. It was written in an unsafe way: all file offsets are stored internally as unsigned 32-bit offsets, and the ftell/fseek functions were used. In FreeBSD 7 (the host platform for the software), that means ftell and fseek use a 32-bit signed long:
int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);
I need to do a quick fix of the program, because some internal data files suddenly hit 2^31 bytes in size (2 147 483 7yy bytes) after 13 years of collecting data, and the internal fseek/ftell assert now fails for any request.
In the FreeBSD 7 world there is the fseeko/ftello hack for 2GB+ files.
int
fseeko(FILE *stream, off_t offset, int whence);
off_t
ftello(FILE *stream);
The off_t type here is not well-defined; all I know right now is that it is 8 bytes in size and looks like either long long or unsigned long long (I don't know which one).
Is it enough (to work with files up to 4 GB) and safe to search-and-replace every ftell with ftello and every fseek with fseeko (sed -i 's/ftell/ftello/', and the same for fseek), if the possible usages of them are:
unsigned long offset1,offset2; //32bit
offset1 = (compute + it) * in + some - arithmetic;
fseek(file, 0, SEEK_END);
fseek(file, 4, SEEK_END); // or other small int constant
offset2 = ftell(file);
fseek(file, offset1, SEEK_SET); // No usage of SEEK_CUR
and combinations of such calls.
What is the signedness of off_t?
Is it safe to assign a 64-bit off_t to an unsigned 32-bit offset? Will it work for byte offsets in the range from 2 GB up to 4 GB?
Which functions may be used for working with offsets besides ftell/fseek?
FreeBSD fseeko() and ftello() are documented as POSIX.1-2001 compatible, which means off_t is a signed integer type.
On FreeBSD 7, you can safely do:
off_t actual_offset;
unsigned long stored_offset;
if (actual_offset >= (off_t)0 && actual_offset < (off_t)4294967296.0)
stored_offset = (unsigned long)actual_offset;
else
some_fatal_error("Unsupportable file offset!");
(On LP64 architectures, the above would be silly, as off_t and long would both be 64-bit signed integers. It would be safe even then; just silly, since all possible file offsets could be supported.)
The thing that people often get bitten by here is that the offset calculations must be done using off_t. That is, it is not enough to cast the result to off_t; you must cast the values used in the arithmetic to off_t. (Technically, you only need to make sure each arithmetic operation is done at off_t precision, but I find it easier to remember the rules if I just punt and cast all the operands.) For example:
off_t offset;
unsigned long some, values, used;
offset = (off_t)some * (off_t)values + (off_t)used;
fseeko(file, offset, SEEK_SET);
Usually the offset calculations are used to find a field in a specific record; the arithmetic tends to stay the same. I truly recommend you move the seek operations to a helper function, if possible:
int fseek_to(FILE *const file,
const unsigned long some,
const unsigned long values,
const unsigned long used)
{
const off_t offset = (off_t)some * (off_t)values + (off_t)used;
if (offset < (off_t)0 || offset >= (off_t)4294967296.0)
fatal_error("Offset exceeds 4GB; I must abort!");
return fseeko(file, offset, SEEK_SET);
}
Now, if you happen to be in a lucky position where you know all your offsets are aligned (to some integer, say 4), you can give yourself a couple of years of more time to rewrite the application, by using an extension of the above:
#define BIG_N 4
int fseek_to(FILE *const file,
const unsigned long some,
const unsigned long values,
const unsigned long used)
{
const off_t offset = (off_t)some * (off_t)values + (off_t)used;
if (offset < (off_t)0)
fatal_error("Offset is negative; I must abort!");
if (offset >= (off_t)(BIG_N * 2147483648.0))
fatal_error("Offset is too large; I must abort!");
if ((offset % BIG_N) && (offset >= (off_t)2147483648.0))
fatal_error("Offset is not a multiple of BIG_N; I must abort!");
return fseeko(file, offset, SEEK_SET);
}
int fseek_big(FILE *const file, const unsigned long position)
{
off_t offset;
if (position >= 2147483648UL)
offset = (off_t)2147483648UL
+ (off_t)BIG_N * (off_t)(position - 2147483648UL);
else
offset = (off_t)position;
return fseeko(file, offset, SEEK_SET);
}
unsigned long ftell_big(FILE *const file)
{
off_t offset;
offset = ftello(file);
if (offset < (off_t)0)
fatal_error("Offset is negative; I must abort!");
if (offset < (off_t)2147483648UL)
return (unsigned long)offset;
if (offset % BIG_N)
fatal_error("Offset is not a multiple of BIG_N; I must abort!");
if (offset >= (off_t)(BIG_N * 2147483648.0))
fatal_error("Offset is too large; I must abort!");
return (unsigned long)2147483648UL
+ (unsigned long)((offset - (off_t)2147483648UL) / (off_t)BIG_N);
}
The logic is simple: if the offset is less than 2^31, it is used as-is. Otherwise, it is represented by the value 2^31 + BIG_N × (offset - 2^31). The only requirement is that offsets of 2^31 and above are always multiples of BIG_N.
Obviously, you then must use only the above three functions -- plus whatever variants of fseek_to() you need, as long as they do the same checks and just use different parameters and formulas for the offset calculation. With that, you can support file sizes of up to 2147483648 + BIG_N × 2147483647. For BIG_N == 4, that is 10 GiB less 4 bytes (10,737,418,236 bytes to be exact).
Questions?
Edited to clarify:
Start with replacing your fseek(file, position, SEEK_SET) with calls to fseek_pos(file, position),
static inline void fseek_pos(FILE *const file, const unsigned long position)
{
if (fseeko(file, (off_t)position, SEEK_SET))
fatal_error("Cannot set file position!");
}
and fseek(file, position, SEEK_END) with calls to fseek_end(file, position) (for symmetry -- I'm assuming the position for this one is usually a literal integer constant),
static inline void fseek_end(FILE *const file, const off_t relative)
{
if (fseeko(file, relative, SEEK_END))
fatal_error("Cannot set file position!");
}
and finally, ftell(file) with calls to ftell_pos(file):
static inline unsigned long ftell_pos(FILE *const file)
{
off_t position;
position = ftello(file);
if (position == (off_t)-1)
fatal_error("Lost file position!");
if (position < (off_t)0 || position >= (off_t)4294967296.0)
fatal_error("File position outside the 4GB range!");
return (unsigned long)position;
}
Since on your architecture and OS unsigned long is a 32-bit unsigned integer type and off_t is a 64-bit signed integer type, this gives you the full 4GB range.
For the offset calculations, define one or more functions similar to
static inline void fseek_to(FILE *const file, const off_t term1,
const off_t term2,
const off_t term3)
{
const off_t position = term1 * term2 + term3;
if (position < (off_t)0 || position >= (off_t)4294967296.0)
fatal_error("File position outside the 4GB range!");
if (fseeko(file, position, SEEK_SET))
fatal_error("Cannot set file position!");
}
For each offset calculation algorithm, define one fseek_to variant. Name the parameters so that the arithmetic makes sense. Make the parameters const off_t, as above, so you don't need extra casts in the arithmetic. Only the parameters and the const off_t position = line defining the calculation algorithm vary between the variant functions.
Questions?

write big blocks to file with fwrite() (e.g. 1000000000)

I am attempting to write blocks with fwrite(). At this point the largest block I could write was 100000000 (it is probably a bit higher than that... I did not try). I cannot write a block of size 1000000000; the output file is 0 bytes.
Is there any possibility to write blocks of e.g. 1000000000 bytes and greater?
I am using uint64_t to store these great numbers.
Thank you in advance!
Code from pastebin in comment: -zw
char * pEnd;
uint64_t uintBlockSize=strtoull(chBlockSize, &pEnd, 10);
uint64_t uintBlockCount=strtoull(chBlockCount, &pEnd, 10);
char * content=(char *) malloc(uintBlockSize*uintBlockCount);
/*
Create vfs.structure
*/
FILE *storeFile;
storeFile = fopen (chStoreFile, "w");
if (storeFile!=NULL)
{
uint64_t i=uintBlockCount;
size_t check;
/*
Fill storeFile with empty Blocks
*/
while (i!=0)
{
fwrite(content,uintBlockSize, 1, storeFile);
i--;
}
You're assuming that the type used in your C library to represent the size of objects and index memory (size_t) can hold the same range of values as uint64_t. This may not be the case!
fwrite's manpage indicates that you can use the function to write blocks whose size is limited by the size_t type. If you're on a 32-bit system, the block size value passed to fwrite will be converted from uint64_t to whatever the library's size_t is (uint32_t, for example), in which case a very large value will have its most significant bits lost.
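If you need to stay within a possibly 32-bit size_t, a chunked-write sketch (write_big is an illustrative name) could look like this:
#include <stdio.h>
#include <stdint.h>

/* Write "total" bytes from "data" in chunks that always fit in a size_t. */
static int write_big(FILE *out, const char *data, uint64_t total)
{
    const size_t chunk_max = 1024u * 1024u; /* 1 MiB per fwrite call */
    while (total > 0) {
        size_t chunk = total > chunk_max ? chunk_max : (size_t)total;
        if (fwrite(data, 1, chunk, out) != chunk)
            return -1; /* short write: report failure */
        data  += chunk;
        total -= chunk;
    }
    return 0;
}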
I have had fwrite fail with a block >64MB compiled with gcc 4.1.2 on CentOS 5.3
I had to chop it up into smaller pieces.
I also had fread() fail for >64MB blocks on the same setup.
This seems to have been fixed in later Linux environments, e.g. Ubuntu 12.04.

size_t used as a value in a formula

Here is a short snippet of a function reading lines.
How is it possible that it compares bufsize with ((size_t)-1)/2?
I could imagine comparing a variable to INT_MAX (that is correct, I think), but comparing it to a type such as int is just impossible.
So how can that code actually work and give no errors?
int c;
size_t bufsize = 0;
size_t size = 0;
while((c=fgetc(infile)) != EOF) {
if (size >= bufsize) {
if (bufsize == 0)
bufsize = 2;
else if (bufsize <= ((size_t)-1)/2)
bufsize = 2*size;
else {
free(line);
exit(3);
}
newbuf = realloc(line,bufsize);
if (!newbuf) {
free(line);
abort();
}
line = newbuf;
}
/* some other operations */
}
(size_t)-1
This is casting the value -1 to a size_t. (type)value is a cast in C.
Since size_t is an unsigned type, this is actually the maximum value that size_t can hold, so it's used to make sure that the buffer size can actually be safely doubled (hence the subsequent division by two).
The code relies on some assumptions about bits and then does a well known hack for finding the maximum size_t value (provided that size_t doesn't accommodate more bits than the register, a safe bet on many machines).
First it fills a register up with 1 bits, then it casts it into a size_t data type, so the comparison will work. As long as that register is larger in number of bits than the size_t data type, then the (if any) unused 1 bits will be truncated, and you will get the largest unsigned number that can fit in size_t bits.
After you have that, it divides by two to get half of that number, and does the comparison to see if it seems to be safe to increase size without going over the "maximum" size_t. but by then, it's dividing a size_t data type, and comparing two size_t data types (a type safe operation).
If you really wanted to remove this bit-wizardry (OK, it's not the worst example of bit wizardry I've seen), consider that the following snippet
else if (bufsize <= ((size_t)-1)/2)
bufsize = 2*size;
could be replaced with
else if (bufsize <= (SIZE_MAX / 2))
bufsize = 2*size;
and be type safe without casting, and more readable.
(size_t)-1 casts -1 to the type size_t, which results in SIZE_MAX (a macro defined in stdint.h), the maximum value that the size_t type can hold.
So the comparison is checking whether bufsize is less than or equal to one half of the maximum value that can be contained in a size_t.
size_t isn't being interpreted as a value, it's being used to cast the value of negative one to the type size_t.
((size_t)-1)/2
is casting -1 to a size_t and then dividing by 2.
The size_t in ((size_t)-1)/2) is simply being used as a cast: casting -1 to size_t.
The trick here is that size_t is unsigned, so the cast (size_t) -1 will be converted to the maximum value of size_t, or SIZE_MAX. This is useful in the context of the loop. However, I'd prefer to see SIZE_MAX used directly rather than this trick.
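A quick check (a sketch) shows the equivalence:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("(size_t)-1   = %zu\n", (size_t)-1);       /* maximum size_t value */
    printf("SIZE_MAX     = %zu\n", (size_t)SIZE_MAX); /* same value, as a macro */
    printf("half of that = %zu\n", ((size_t)-1) / 2); /* the bound used in the loop */
    return 0;
}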
