I am reading APUE to explore the details of C and Unix, and encountered lseek:
NAME
lseek - move the read/write file offset
SYNOPSIS
#include <unistd.h>
off_t lseek(int fildes, off_t offset, int whence);
What does the l mean? Is it length?
l is for long integer.
It is named like that to differentiate it from the old seek() in Version 2 of AT&T Unix. The name is an anachronism from before the off_t type was introduced.
References:
Infohost indicates:
The character l in the name lseek means "long integer". Before the
introduction of the off_t data type, the offset argument and the
return value were long integers. lseek was introduced with Version 7
when long integers were added to C. (Similar functionality was
provided in Version 6 by the functions seek and tell.)
As noted at the foot of lseek.html:
A seek() function appeared in Version 2 AT&T UNIX, later renamed into
lseek() for ``long seek'' due to a larger offset argument type.
Note: Paraphrased from Why is the function called lseek(), not seek()?
Related
On a 32-bit system, what does ftell return if the current position indicator of a file opened in binary mode is past the 2GB point? In the C99 standard, is this undefined behavior since ftell must return a long int (maximum value being 2**31-1)?
on long int
long int is supposed to be AT LEAST 32-bits, but C99 standard does NOT limit it to 32-bit.
C99 standard does provide convenience types like int16_t & int32_t etc that map to correct bit sizes for a target platform.
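To see what long actually is on a given platform, a small check along these lines can help. This is only a sketch; the helper names long_exceeds_32_bits and print_integer_widths are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if long can represent values above
   2^31 - 1 (i.e. it is wider than 32 bits), 0 otherwise. */
int long_exceeds_32_bits(void)
{
    return LONG_MAX > INT32_MAX;
}

/* Hypothetical helper: prints the widths the current compiler uses. */
void print_integer_widths(void)
{
    printf("long:    %zu bits, LONG_MAX = %ld\n",
           sizeof(long) * CHAR_BIT, LONG_MAX);
    printf("int32_t: %zu bits\n", sizeof(int32_t) * CHAR_BIT);
    printf("int64_t: %zu bits\n", sizeof(int64_t) * CHAR_BIT);
}
```

Calling print_integer_widths() from main shows whether ftell's return type has room for offsets past 2 GB on that build.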
on ftell/fseek
ftell() and fseek() are limited to 32 bits (including sign bit) on the vast majority of 32-bit architecture systems. So when there is large file support you run into this 2GB issue.
The POSIX.1-2001 and SysV counterparts of fseek and ftell are fseeko and ftello, which use off_t instead of long for the offset.
You do need to compile with -D_FILE_OFFSET_BITS=64, or define it somewhere before including stdio.h, to ensure that off_t is 64 bits.
Read about this at the cert.org secure coding guide.
On confusion about ftell and size of long int
C99 says long int must be at least 32 bits; it does NOT say that it cannot be bigger.
try the following on x86_64 architecture:
#include <stdio.h>

int main(int argc, char *argv[]) {
    FILE *fp;

    fp = fopen("test.out", "w");
    if (!fp)
        return -1;
    fseek(fp, (1L << 34), SEEK_SET);
    fprintf(fp, "\nhello world\n");
    fclose(fp);
    return 0;
}
Notice that 1L is just a long. This produces a file of about 17 GB (2^34 bytes, plus the text) with "\nhello world\n" at the end, which you can verify trivially with tail -n1 test.out or explicitly with:
dd if=test.out skip=$((1 << 25))
Note that dd typically uses a block size of 512 bytes (1 << 9), so skipping 1 << (34 - 9) = 1 << 25 blocks lands on the text and dumps '\nhello world\n'.
At least on a 32-bit OS, ftell() will overflow, return an error, or simply run into undefined behaviour.
To get around this you might like to use off_t ftello(FILE *stream); and #define _FILE_OFFSET_BITS 64.
Verbatim from man ftello:
The fseeko() and ftello() functions are identical to fseek(3) and ftell(3) (see fseek(3)), respectively, except that the offset argument of fseeko() and the return value of ftello() is of type off_t instead of long.
On many architectures both off_t and long are 32-bit types, but compilation with
#define _FILE_OFFSET_BITS 64
will turn off_t into a 64-bit type.
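A minimal sketch of that approach, measuring a file's size with fseeko() and ftello(). The helper name file_size is hypothetical, and the _POSIX_C_SOURCE define is my assumption for builds that use a strict -std= mode, where these functions may otherwise not be declared.

```c
#define _FILE_OFFSET_BITS 64      /* must come before any system header */
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes fseeko/ftello in strict modes */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size of the file at path, or -1 on error. */
off_t file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);        /* -1 on failure, with errno set */

    fclose(fp);
    return size;
}
```

Since off_t is signed, print the result with something like printf("%lld\n", (long long)size).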
Update:
According to IEEE Std 1003.1, 2013 Edition ftell() shall return -1 and set errno to EOVERFLOW in such cases:
EOVERFLOW
For ftell(), the current file offset cannot be represented correctly in an object of type long.
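In code, checking for that case might look like the sketch below. The helper name get_position and its return codes are invented for illustration; EOVERFLOW comes from POSIX rather than C99, hence the #ifdef.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: stores the current position in *pos and returns 0.
   Returns -2 if the offset does not fit in a long (POSIX EOVERFLOW),
   or -1 on any other ftell() failure. */
int get_position(FILE *fp, long *pos)
{
    errno = 0;
    long p = ftell(fp);
    if (p == -1L) {
#ifdef EOVERFLOW
        if (errno == EOVERFLOW)
            return -2;          /* offset cannot be represented in a long */
#endif
        return -1;              /* some other failure; errno says which */
    }
    *pos = p;
    return 0;
}
```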
There is no 64-bit-aware function in the C99 standard. What OS/environment are you using? On Windows, there is _ftelli64.
On other platforms, look at http://forums.codeguru.com/showthread.php?277234-Cannot-use-fopen()-open-file-larger-than-4-GB
This worked for me on Windows32/MinGW to play with a 6GB file
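#define _FILE_OFFSET_BITS 64
#include <stdio.h>

int main(void) {
    FILE *f = fopen("largefile.zip", "rb");
    if (!f)
        return 1;
    fseeko64(f, 0, SEEK_END);       /* seek to the end of the file */
    off64_t size = ftello64(f);     /* 64-bit offset, i.e. the file size */
    /* off64_t is signed, so cast it and print with %lld */
    printf("%lld\n", (long long)size);
    fclose(f);
    return 0;
}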
gcc readlargefile.c -std=c99 -o readlargefile.exe
Every single detail, the macro, the compiler option, matters.
I am reading a file format (TIFF) that has 32-bit unsigned offsets from the beginning of the file.
Unfortunately the prototype for fseek, the usual way I would go to a particular file offset, is:
int fseek ( FILE * stream, long int offset, int origin );
so the offset is signed. How should I handle this situation? Should I be using a different function for seeking?
After studying this question more deeply and considering the other comments and answers (thank you), I think the simplest approach is to do two seeks if the offset is greater than 2147483647 bytes. This allows me to keep the offsets as uint32_t and continue using fseek. The positioning code is therefore like this:
// note: error handling code omitted
uint32_t offset = ... (whatever it is)
if (offset > 2147483647) {
    fseek(file, 2147483647, SEEK_SET);
    fseek(file, (long int)(offset - 2147483647), SEEK_CUR);
} else {
    fseek(file, (long int)offset, SEEK_SET);
}
The problem with using 64-bit types is that the code might be running on a 32-bit architecture (among other things). There is a function fsetpos which uses a structure fpos_t to manage arbitrarily large offsets, but that brings with it a range of complexities. Although fsetpos might make sense if I was truly using offsets of arbitrarily large size, since I know the largest possible offset is uint32_t, then the double seek meets that need.
Note that this solution allows all TIFF files to be handled on a 32-bit system. The advantage of this is obvious if you consider commercial programs like PixInsight. PixInsight can only handle TIFF files smaller than 2147483648 bytes when running on 32-bit systems. To handle full sized TIFF files, a user has to use the 64-bit version of PixInsight on a 64-bit computer. This is probably because the PixInsight programmers used a 64-bit type to handle the offsets internally. Since my solution only uses 32-bit types, I can handle full-sized TIFF files on a 32-bit system (as long as the underlying operating system can handle files that large).
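For reference, here is the same double seek wrapped in a function with the return values checked. It is a sketch only: the name seek_u32 is made up, and whether the second seek succeeds on a 32-bit system still depends on the C library and operating system handling files that large.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: positions fp at an unsigned 32-bit offset from the
   start of the file. Returns 0 on success, nonzero if either seek fails. */
int seek_u32(FILE *fp, uint32_t offset)
{
    const uint32_t max_step = 2147483647u;   /* largest value a 32-bit long holds */

    if (offset <= max_step)
        return fseek(fp, (long)offset, SEEK_SET);

    /* First hop to 2^31 - 1, then cover the remainder relative to there. */
    if (fseek(fp, (long)max_step, SEEK_SET) != 0)
        return -1;
    return fseek(fp, (long)(offset - max_step), SEEK_CUR);
}
```

The remainder passed to the second fseek is at most 2147483648 - 1 = 2147483647 wait-free of overflow, since UINT32_MAX - 2147483647 = 2147483648 only when offset is UINT32_MAX; on a platform with a 32-bit long that single value would not fit, so treat 0xFFFFFFFF as a case to test before relying on it.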
You can try to use lseek64() (man page)
#define _LARGEFILE64_SOURCE /* See feature_test_macros(7) */
#include <sys/types.h>
#include <unistd.h>
off64_t lseek64(int fd, off64_t offset, int whence);
With
int fd = fileno (stream);
Notes from The GNU C lib - Setting the File Position of a Descriptor
This function is similar to the lseek function. The difference is that the offset parameter is of type off64_t instead of off_t which makes it possible on 32 bit machines to address files larger than 2^31 bytes and up to 2^63 bytes. The file descriptor filedes must be opened using open64 since otherwise the large offsets possible with off64_t will lead to errors with a descriptor in small file mode.
When the source file is compiled with _FILE_OFFSET_BITS == 64 on a 32 bits machine this function is actually available under the name lseek and so transparently replaces the 32 bit interface.
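As a rough illustration of the last point, the sketch below defines _FILE_OFFSET_BITS and then uses plain open() and lseek(), so the offset parameter is a 64-bit off_t. The helper name read_byte_at is hypothetical, and the _POSIX_C_SOURCE define is my assumption for strict -std= builds.

```c
#define _FILE_OFFSET_BITS 64      /* makes off_t 64 bits wide on 32-bit glibc systems */
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes POSIX declarations in strict modes */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads one byte at an absolute offset of the file at
   path. Returns the byte (0..255), or -1 on any error or at end of file. */
int read_byte_at(const char *path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char byte;
    int result = -1;
    /* lseek returns the resulting offset, or (off_t)-1 on failure. */
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1 && read(fd, &byte, 1) == 1)
        result = byte;

    close(fd);
    return result;
}
```

Working on the descriptor directly avoids mixing lseek() with a stream's buffered position, which needs extra care.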
About fd and stream, from Streams and File Descriptors
Since streams are implemented in terms of file descriptors, you can extract the file descriptor from a stream and perform low-level operations directly on the file descriptor. You can also initially open a connection as a file descriptor and then make a stream associated with that file descriptor.
We know that the prototype of fseek is:
int fseek(FILE *stream, long int offset, int whence)
I want to use it with an offset larger than a long int can hold. What is the solution, and are there other functions that can replace fseek?
The value is (512 * 29358080)
There is a newer API for huge offsets, int fsetpos(FILE *stream, const fpos_t *pos); and int fgetpos(FILE * restrict stream, fpos_t * restrict pos); but you cannot use it to specify an actual offset: fsetpos only accepts a position previously obtained from fgetpos. Too bad the Standard committee overlooked this one.
Some systems have an alternate set of FILE positioning functions with larger offsets:
int fseeko(FILE *stream, off_t offset, int whence);
off_t ftello(FILE *stream);
If your system has these and off_t is 64 bits, this is your best bet.
Another solution is to move the file pointer multiple times with fseek(fp, offset, SEEK_CUR); until you reach the desired position. There is no guarantee it will work, but you can try it and verify whether your system's C library supports 64-bit offsets for standard streams.
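If fseeko is available, a sketch for the offset in the question (512 * 29358080) could look like this. The helper name seek_to_block is invented here, and the _POSIX_C_SOURCE define is my assumption for strict -std= builds. The point to watch is that the multiplication has to be done in off_t, not int.

```c
#define _FILE_OFFSET_BITS 64      /* must come before any system header */
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes fseeko in strict modes */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seeks to the start of block number `block`,
   where each block is `block_size` bytes. Returns 0 on success. */
int seek_to_block(FILE *fp, off_t block_size, off_t block)
{
    /* Both operands are off_t, so the product is computed in 64 bits;
       512 * 29358080 with plain int operands would overflow a 32-bit int. */
    return fseeko(fp, block_size * block, SEEK_SET);
}
```

A call such as seek_to_block(fp, 512, 29358080) then targets byte 15031336960.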
I have really old 'c' code that uses read to read a binary file. Here is a sample:
uint MyReadFunc(int _FileHandle, char *DstBuf, uint BufLen)
{
return (read( _FileHandle, DstBuf, BufLen));
}
On a 64-bit OS, char * will be 64 bits, but BufLen is only 32 bits and the returned value is only 32 bits.
It's not an option to change this to .NET. I have .NET versions, but I need this old library converted as well.
Can someone please tell me what I need to use to do file I/O on a 64-bit OS (using C code)?
Use size_t, not uint.
It looks like you're conflating two things: size of a pointer and the extent of the memory it points to.
I'm not sure about char* being 64-bits - the pointer itself will be 64-bit, yes, but the actual value is still a character array, unless I'm missing something? (I'm not a brilliant C programmer.)
The length argument to read() is a size_t, not an int, which on a 64-bit system should be 64 bits, not 32. The return value is an ssize_t, not an int, which will also be 64 bits, so you should be covered if you change your function definition to return ssize_t and take size_t instead of the ints.
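Applied to the wrapper from the question, that change might look like the sketch below. It assumes a POSIX system, where read() is declared in unistd.h with these types; the parameter names are mine.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes POSIX declarations in strict modes */
#include <sys/types.h>
#include <unistd.h>

/* The wrapper from the question, using the types read() itself uses.
   Returns the number of bytes read, 0 at end of file, or -1 on error. */
ssize_t MyReadFunc(int fd, char *dst, size_t len)
{
    return read(fd, dst, len);
}
```

Callers that stored the result in a uint need to switch to ssize_t as well, otherwise the -1 error return turns into a large positive number.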