value of offset bigger than long int in fseek [duplicate] - c

We know that the parameters of fseek are:
int fseek(FILE *stream, long int offset, int whence)
I want to use it with an offset value bigger than a long int can hold. What is the solution, and are there any other functions that can replace fseek?
The value is (512 * 29358080)

There is a newer API for huge offsets, int fsetpos(FILE *stream, const fpos_t *pos); and int fgetpos(FILE * restrict stream, fpos_t * restrict pos);, but fpos_t is an opaque type, so you cannot use it to specify an actual numeric offset. Too bad the Standard committee overlooked this one.
Some systems have an alternate set of FILE positioning functions with larger offsets:
int fseeko(FILE *stream, off_t offset, int whence);
off_t ftello(FILE *stream);
If your system has these and off_t is 64 bits wide, this is your best bet.
Another solution is to move the file pointer multiple times with fseek(fp, offset, SEEK_CUR); until you reach the desired position. There is no guarantee it will work, but you can try it and verify whether your system's C library supports 64-bit offsets for standard streams.
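As a rough sketch of the fseeko route for the offset in the question (512 * 29358080 = 15031336960 bytes), assuming a POSIX system where the feature-test macros below are honoured. The helper name seek_to_sector is made up for illustration; it is not part of any library.

```c
#define _FILE_OFFSET_BITS 64     /* request a 64-bit off_t; must come before any #include */
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() and ftello() */
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: position the stream at the start of a 512-byte sector.
   Returns 0 on success and -1 on failure, like fseeko(). */
int seek_to_sector(FILE *fp, long long sector)
{
    /* Refuse offsets that are negative, would overflow, or cannot fit in off_t. */
    if (sector < 0 || sector > LLONG_MAX / 512 || sizeof(off_t) < 8)
        return -1;

    off_t offset = (off_t)512 * (off_t)sector;
    return fseeko(fp, offset, SEEK_SET);
}
```

Calling seek_to_sector(fp, 29358080) and then ftello(fp) should report 15031336960 when off_t is 64 bits wide.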

Related

fseek() giving 2 different counts when I expected the same count

If I use
fseek(file_ptr, 0, SEEK_END);
size = ftell(file_ptr);
I get 480000 which is right as I have 60000 x double float at 8 bytes per double in the file.
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008 an extra 8 bytes. Anyone know what they are?
But when I use
fseek(file_ptr, sizeof(double), SEEK_END);
size = ftell(file_ptr);
I get 480008 an extra 8 bytes
That is because the size of a double on your system is 8 bytes, and fseek() set the file position indicator 8 bytes from SEEK_END.
The new position, measured in bytes, is obtained by adding offset
bytes to the position specified by whence.
Re:
Anyone know what they are?
This is what the open group's manual page has to say about it:
The fseek() function shall allow the file-position indicator to be set
beyond the end of existing data in the file. If data is later written
at this point, subsequent reads of data in the gap shall return bytes
with the value 0 until data is actually written into the gap.
The behavior of fseek() on devices which are incapable of seeking is
implementation-defined. The value of the file offset associated with
such a device is undefined.
Note that this behaviour is only specified for POSIX-compliant systems.
The answer is in the Linux man page for fseek().
The function prototype is
int fseek(FILE *stream, long offset, int whence);
where offset parameter is described as follows
The new position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
Your offset is 8 (sizeof(double) on your system) and the original file size is 480000, which is why you get 480008.
The manual never mentions any limitation that prevents the position from being set beyond SEEK_END.
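A small sketch of the difference between a zero, a positive, and a negative offset from SEEK_END. The helper names file_size and read_last_double are invented for this example. A positive offset points past the end of the file, while a negative one steps back to the last stored value.

```c
#include <stdio.h>

/* Returns the file size in bytes (the position of SEEK_END), or -1 on failure. */
long file_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);
}

/* Reads the last double stored in the file by seeking backwards from the end.
   Returns 0 on success, -1 on failure. */
int read_last_double(FILE *fp, double *out)
{
    if (fseek(fp, -(long)sizeof(double), SEEK_END) != 0)
        return -1;
    return fread(out, sizeof *out, 1, fp) == 1 ? 0 : -1;
}
```

For the file in the question, file_size would return 480000, whereas fseek(fp, sizeof(double), SEEK_END) followed by ftell gives 480008 because the indicator now sits 8 bytes beyond the data.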

what does `l` in `lseek` of unistd.h mean?

I am reading APUE to explore the details of C and Unix, and I encountered lseek:
NAME
lseek - move the read/write file offset
SYNOPSIS
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
What does l mean, is it length?
l is for long integer.
It is named like that to differentiate it from the old seek() in Version 2 of AT&T Unix. The name is an anachronism from the time before the off_t type was introduced.
References:
Infohost indicates:
The character l in the name lseek means "long integer". Before the
introduction of the off_t data type, the offset argument and the
return value were long integers. lseek was introduced with Version 7
when long integers were added to C. (Similar functionality was
provided in Version 6 by the functions seek and tell.)
As noted at the foot of lseek.html:
A seek() function appeared in Version 2 AT&T UNIX, later renamed into
lseek() for ``long seek'' due to a larger offset argument type.
Note: Paraphrased from Why is the function called lseek(), not seek()?

Opening big files with fopen [duplicate]

On a 32-bit system, what does ftell return if the current position indicator of a file opened in binary mode is past the 2GB point? In the C99 standard, is this undefined behavior since ftell must return a long int (maximum value being 2**31-1)?
on long int
long int is required to be AT LEAST 32 bits wide, but the C99 standard does NOT limit it to 32 bits.
The C99 standard also provides fixed-width types such as int16_t and int32_t that map to the correct sizes on the target platform.
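One way to see what a given platform actually provides is to inspect LONG_MAX. This is a minimal sketch; both function names are hypothetical, and the width calculation assumes long has no padding bits.

```c
#include <limits.h>

/* Width of long in bits, assuming no padding bits. */
int long_width_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Returns 1 if a long can represent file offsets of 2 GiB and beyond. */
int long_holds_offsets_past_2gib(void)
{
    return LONG_MAX > 2147483647L;
}
```

On typical 64-bit Linux the first function reports 64, while on 64-bit Windows and most 32-bit systems it reports 32, so there fseek and ftell are restricted to offsets below 2 GiB.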
on ftell/fseek
ftell() and fseek() are limited to 32 bits (including the sign bit) on the vast majority of 32-bit systems. So when the system itself supports large files, these two functions run into the 2GB limit.
The POSIX.1-2001 and SysV replacements for fseek and ftell are fseeko and ftello, which use off_t as the type of the offset.
You do need to compile with -D_FILE_OFFSET_BITS=64, or define that macro somewhere before including stdio.h, to ensure that off_t is 64 bits wide.
Read about this at the cert.org secure coding guide.
On confusion about ftell and size of long int
C99 says long int must be at least 32 bits wide; it does NOT say that it cannot be wider.
try the following on x86_64 architecture:
#include <stdio.h>
int main(int argc, char *argv[]) {
    FILE *fp;
    fp = fopen("test.out", "w");
    if (!fp)
        return -1;
    fseek(fp, (1L << 34), SEEK_SET);   /* 2^34 bytes: only valid where long is wider than 32 bits */
    fprintf(fp, "\nhello world\n");
    fclose(fp);
    return 0;
}
Notice that 1L is just a long. This produces a file of about 17GB (2^34 bytes) and puts "\nhello world\n" at the end of it. You can verify that it is there trivially with tail -n1 test.out, or explicitly with:
dd if=test.out skip=$((1 << 25))
Note that dd typically uses a block size of 512 bytes (1 << 9), so skipping 1 << (34 - 9) = 1 << 25 blocks dumps out '\nhello world\n'.
On a 32-bit OS, at least, ftell() will overflow, report an error, or simply run into undefined behaviour.
To get around this you might like to use off_t ftello(FILE *stream); and #define _FILE_OFFSET_BITS 64.
Verbatim from man ftello:
The fseeko() and ftello() functions are identical to fseek(3) and ftell(3) (see fseek(3)), respectively, except that the offset argument of fseeko() and the return value of ftello() is of type off_t instead of long.
On many architectures both off_t and long are 32-bit types, but compilation with
#define _FILE_OFFSET_BITS 64
will turn off_t into a 64-bit type.
Update:
According to IEEE Std 1003.1, 2013 Edition ftell() shall return -1 and set errno to EOVERFLOW in such cases:
EOVERFLOW
For ftell(), the current file offset cannot be represented correctly in an object of type long.
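A hedged sketch of how a caller might detect that case on a POSIX system. The wrapper name checked_ftell is made up for this example, and EOVERFLOW comes from POSIX, not from the C standard.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: stores the current position in *pos and returns 0,
   or prints a diagnostic and returns -1 if ftell() fails. */
int checked_ftell(FILE *fp, long *pos)
{
    errno = 0;
    long p = ftell(fp);
    if (p == -1L) {
        if (errno == EOVERFLOW)
            fprintf(stderr, "ftell: position does not fit in a long\n");
        else
            perror("ftell");
        return -1;
    }
    *pos = p;
    return 0;
}
```

When the EOVERFLOW branch is hit, switching to ftello() with a 64-bit off_t is the usual way out.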
There is no 64-bit-aware function for this in the C99 standard. What OS/environment are you using? On Windows, there is _ftelli64.
On other platforms, look at http://forums.codeguru.com/showthread.php?277234-Cannot-use-fopen()-open-file-larger-than-4-GB
This worked for me on Windows32/MinGW to play with a 6GB file
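#define _FILE_OFFSET_BITS 64
#include <stdio.h>
int main(void) {
    FILE *f = fopen("largefile.zip", "rb");
    if (!f)
        return 1;                       /* could not open the file */
    fseeko64(f, 0, SEEK_END);           /* jump to the end of the file */
    off64_t size = ftello64(f);         /* 64-bit position, i.e. the file size */
    printf("%lld\n", (long long)size);  /* off64_t is a signed type */
    fclose(f);
    return 0;
}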
gcc readlargefile.c -std=c99 -o readlargefile.exe
Every single detail, the macro, the compiler option, matters.
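To keep those platform details in one place, a thin wrapper can pick the 64-bit variant per platform. This is a sketch, not a tested portability layer: seek64 and tell64 are hypothetical names, the Windows branch relies on _fseeki64 and _ftelli64 from the Microsoft C runtime, and the other branch assumes POSIX fseeko and ftello with a 64-bit off_t.

```c
#define _FILE_OFFSET_BITS 64     /* request a 64-bit off_t; must come before any #include */
#define _POSIX_C_SOURCE 200809L  /* expose fseeko() and ftello() */
#include <stdint.h>
#include <stdio.h>
#ifndef _WIN32
#include <sys/types.h>
#endif

/* Hypothetical wrapper around the platform's 64-bit seek. Returns 0 on success. */
int seek64(FILE *fp, int64_t offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(fp, offset, whence);
#else
    return fseeko(fp, (off_t)offset, whence);
#endif
}

/* Hypothetical wrapper around the platform's 64-bit tell. Returns -1 on failure. */
int64_t tell64(FILE *fp)
{
#ifdef _WIN32
    return _ftelli64(fp);
#else
    return (int64_t)ftello(fp);
#endif
}
```

With this in place, the size of a 6GB file can be read with seek64(f, 0, SEEK_END) followed by tell64(f).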

fseek to a 32-bit unsigned offset

I am reading a file format (TIFF) that has 32-bit unsigned offsets from the beginning of the file.
Unfortunately the prototype for fseek, the usual way I would go to particular file offset, is:
int fseek ( FILE * stream, long int offset, int origin );
so the offset is signed. How should I handle this situation? Should I be using a different function for seeking?
After studying this question more deeply and considering the other comments and answers (thank you), I think the simplest approach is to do two seeks if the offset is greater than 2147483647 bytes. This allows me to keep the offsets as uint32_t and continue using fseek. The positioning code is therefore like this:
// note: error handling code omitted
uint32_t offset = ... (whatever it is)
if( offset > 2147483647 ){
    fseek( file, 2147483647, SEEK_SET );
    fseek( file, (long int)( offset - 2147483647 ), SEEK_CUR );
} else {
    fseek( file, (long int) offset, SEEK_SET );
}
The problem with using 64-bit types is that the code might be running on a 32-bit architecture (among other things). There is a function fsetpos which uses a structure fpos_t to manage arbitrarily large offsets, but that brings with it a range of complexities. Although fsetpos might make sense if I was truly using offsets of arbitrarily large size, since I know the largest possible offset is uint32_t, then the double seek meets that need.
Note that this solution allows all TIFF files to be handled on a 32-bit system. The advantage of this is obvious if you consider commercial programs like PixInsight. PixInsight can only handle TIFF files smaller than 2147483648 bytes when running on 32-bit systems. To handle full sized TIFF files, a user has to use the 64-bit version of PixInsight on a 64-bit computer. This is probably because the PixInsight programmers used a 64-bit type to handle the offsets internally. Since my solution only uses 32-bit types, I can handle full-sized TIFF files on a 32-bit system (as long as the underlying operating system can handle files that large).
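The two-seek idea can be wrapped in a helper with error checking. The sketch below uses the hypothetical name seek_u32 and caps every step at 2147483647, because an offset of 4294967295 leaves a remainder of 2147483648 after the first seek, which is one more than a 32-bit long can hold. Whether the relative seek succeeds past 2GB where long is 32 bits still depends on the C library.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: seek to an unsigned 32-bit offset from the start of the
   file using only steps that fit in a 32-bit long. Returns 0 on success. */
int seek_u32(FILE *fp, uint32_t offset)
{
    const uint32_t step_max = 2147483647u;  /* largest value a 32-bit long can hold */

    /* First step is absolute, from the beginning of the file. */
    uint32_t first = offset > step_max ? step_max : offset;
    if (fseek(fp, (long)first, SEEK_SET) != 0)
        return -1;

    /* Remaining distance is covered with relative seeks. */
    uint32_t remaining = offset - first;
    while (remaining > 0) {
        uint32_t step = remaining > step_max ? step_max : remaining;
        if (fseek(fp, (long)step, SEEK_CUR) != 0)
            return -1;
        remaining -= step;
    }
    return 0;
}
```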
You can try to use lseek64() (man page)
#define _LARGEFILE64_SOURCE /* See feature_test_macros(7) */
#include <sys/types.h>
#include <unistd.h>
off64_t lseek64(int fd, off64_t offset, int whence);
With
int fd = fileno (stream);
Notes from The GNU C lib - Setting the File Position of a Descriptor
This function is similar to the lseek function. The difference is that the offset parameter is of type off64_t instead of off_t which makes it possible on 32 bit machines to address files larger than 2^31 bytes and up to 2^63 bytes. The file descriptor filedes must be opened using open64 since otherwise the large offsets possible with off64_t will lead to errors with a descriptor in small file mode.
When the source file is compiled with _FILE_OFFSET_BITS == 64 on a 32 bits machine this function is actually available under the name lseek and so transparently replaces the 32 bit interface.
About fd and stream, from Streams and File Descriptors
Since streams are implemented in terms of file descriptors, you can extract the file descriptor from a stream and perform low-level operations directly on the file descriptor. You can also initially open a connection as a file descriptor and then make a stream associated with that file descriptor.
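A minimal sketch of going through the descriptor, assuming a POSIX system where _FILE_OFFSET_BITS=64 makes plain lseek the 64-bit interface, as the quoted notes describe. The helper name seek_via_fd is invented here. The stream is flushed first so that buffered output is not written at the wrong position.

```c
#define _FILE_OFFSET_BITS 64     /* makes off_t and lseek() 64-bit; must precede any #include */
#define _POSIX_C_SOURCE 200809L  /* expose fileno() */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move the file offset of the descriptor behind a stream.
   Returns the new offset, or (off_t)-1 on failure. */
off_t seek_via_fd(FILE *stream, off_t offset)
{
    if (fflush(stream) != 0)     /* push out anything stdio still has buffered */
        return (off_t)-1;

    int fd = fileno(stream);
    if (fd == -1)
        return (off_t)-1;

    return lseek(fd, offset, SEEK_SET);
}
```

After a call like this it is safest to continue with read() and write() on the descriptor, or to reposition the stream explicitly before using stdio functions on it again.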

