As the documentation highlighted, the lseek() function allows the file offset to be set beyond the end of the file. But what if I set it beyond the beginning of the file?
Let the current offset be 5. What will actually happen when I try lseek(fd, 10, SEEK_END)?
As mentioned, assume that I create a new file and call write(fd, buf, 5). The current file offset would be 5. Then, using the lseek call above, I expect the result would either be 0 or that some error would occur.
Assuming that what you're asking is whether you can seek to before the beginning of the file (which your code example wouldn't do):
The specification for lseek() says that it will return -1 and set errno to EINVAL:
The whence argument is not a proper value, or the resulting file offset would be negative for a regular file, block special file, or directory.
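A minimal sketch of both cases, assuming a regular file. The helper name seek_report is made up for illustration; it only wraps lseek() and captures errno.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs the seek and reports errno on failure.
 * Returns the new offset, or (off_t)-1 with *err set to the errno value. */
off_t seek_report(int fd, off_t offset, int whence, int *err)
{
    errno = 0;
    off_t pos = lseek(fd, offset, whence);
    *err = (pos == (off_t)-1) ? errno : 0;
    return pos;
}
```

On a regular file that holds 5 bytes, seek_report(fd, 10, SEEK_END, &err) succeeds and returns 15 (ten bytes past the end), while seek_report(fd, -10, SEEK_SET, &err) returns -1 with err set to EINVAL.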
I have this sample code where I'm trying to implement for my operating systems assignment a program that copies the contents of an input file to an output file. I'm only allowed to use POSIX system calls, stdio is forbidden.
I've thought about storing the contents in a buffer, but in my implementation I must know the size of the file behind the descriptor. I googled a little and found this:
off_t fsize;
fsize = lseek (input, 0, SEEK_END);
But in this case my file descriptor (input) gets messed up and I can't rewind it to the start. I played around with the parameters but I can't figure a way to rewind it back to the first character in the file after using lseek. That's the only thing I need, having that I can loop byte by byte and copy all the contents of input to output.
My code is here, it's very short in case any of you want to take a look:
https://github.com/lucas-sartm/OSAssignments/blob/master/copymachine.c
I figured it out by trial and error. All that was needed was to read the documentation and take a look at read() return values... This loop solved the issue.
while (read (input, &content, sizeof(content)) > 0){ // copies byte by byte until read() reports end of file
write (output, &content, sizeof(content));
}
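For comparison, here is a sketch of the same idea with a larger buffer, using only POSIX calls. The name copy_fd and the 4096-byte buffer are my own choices, not taken from the linked code. It writes exactly as many bytes as read() returned and retries short writes. As for the rewind in the question: lseek(input, 0, SEEK_SET) moves the offset back to the first byte.

```c
#include <errno.h>
#include <unistd.h>

/* Copies everything from `in` to `out`. Returns 0 on success, -1 on error. */
int copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
    }
    return 0;
}
```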
In C, we can find the size of file using fseek() function. Like,
if (fseek(fp, 0L, SEEK_END) != 0)
{
// Handle repositioning error
}
So, I have a question: is using fseek() and ftell() a recommended method for computing the size of a file?
If you're on Linux or some other UNIX-like system, what you want is the stat function:
struct stat statbuf;
int rval;
rval = stat(path_to_file, &statbuf);
if (rval == -1) {
perror("stat failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
On Windows under MSVC, you can use _stati64:
struct _stati64 statbuf;
int rval;
rval = _stati64(path_to_file, &statbuf);
if (rval == -1) {
perror("_stati64 failed");
} else {
printf("file size = %lld\n", (long long)statbuf.st_size);
}
Unlike using fseek, this method doesn't involve opening the file or seeking through it. It just reads the file metadata.
The fseek()/ftell() approach works sometimes.
if (fseek(fp, 0L, SEEK_END) == 0) {
printf("Size: %ld\n", ftell(fp));
}
Problems.
If the file size exceeds LONG_MAX, the return value of long int ftell(FILE *stream) is problematic.
If the file is opened in text mode, the return value from ftell() may not correspond to the file length. "For a text stream, its file position indicator contains unspecified information," C11dr §7.21.9.4 2
If the file is opened in binary mode, fseek(fp, 0L, SEEK_END) is not well defined. "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state." C11dr footnote 268. @Evert This mostly applies to platforms older than today's, but it is still part of the spec.
If the file is a stream like a serial input or stdin, fseek(file, 0, SEEK_END) makes little sense.
The usual solution to finding file size is a non-portable, platform-specific one. Example good answer: @dbush.
Note: If code attempts to allocate memory based on file size, the memory available can easily be exceeded by the file size.
Due to these issues, I do not recommend this approach.
Typically the problem should be re-worked to not need to find the file size, but to grow the data as more input is processed.
LL disclaimer: Note that C spec footnotes are informative and so not necessarily normative.
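One way to "grow the data as more input is processed" is a buffer that doubles whenever it fills up, so the size never has to be known up front. The following is only a sketch; the name read_all and the 4096-byte starting capacity are assumptions of mine.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the rest of `fp` into a malloc'd buffer that doubles as needed.
 * On success returns the buffer (caller frees) and stores the length in *len.
 * Returns NULL on read or allocation failure. */
char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        used += fread(buf + used, 1, cap - used, fp);
        if (used < cap)
            break;                      /* short read: EOF or error */
        if (cap > (size_t)-1 / 2) {     /* doubling would overflow */
            free(buf);
            return NULL;
        }
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

This also works on streams that cannot be seeked at all, such as stdin or a pipe.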
The best method in my opinion is fstat(): https://linux.die.net/man/2/fstat
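A minimal sketch of that, assuming you already have an open file descriptor; the function name fd_size is made up here. Note that st_size is only meaningful for regular files (and a few other types such as symbolic links), not for pipes or terminals.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the size in bytes of the file behind `fd`, or -1 on error. */
off_t fd_size(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_size;
}
```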
Well, you can estimate the size of a file in several ways:
You can read(2) the file from the beginning to the end, and the number of chars read is the size of the file. This is a tedious way of getting the size of a file, as you have to read the whole file to get the size. But if the operating system doesn't allow positioning the file pointer arbitrarily, then this is the only way to get the file size.
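Or you can move the pointer to the end-of-file position. This is the lseek(2) you showed in the question: the value it returns is the resulting offset, which for SEEK_END with offset 0 is the file size. Be careful, though: the call moves the pointer, so if you need to keep your place you have to do the system call more than once, first to record the current position and afterwards to restore it.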
Or you can use the stat(2) system call, that will tell you all the administrative information of the file, like the owner, group, permissions, size, number of blocks the file occupies in the disk, disk this file belongs to, number of directory entries pointing to it, etc. This allows you to get all this information with only one syscall.
Other methods you point out (like the use of the ftell(3) stdio library call) will also work (with the same drawback that they need extra system calls to set and retrieve/restore the file pointer) but have the problem of involving libraries that you are probably not using for anything else. It would be cumbersome to get a FILE * pointer (e.g. with fdopen(3)) on an int file descriptor, just to be able to use the ftell(3) function on it (twice), and then fclose(3) it again.
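A sketch of the lseek(2) variant that leaves the file offset where it found it. The name fd_size_by_seek is hypothetical. lseek() returns the resulting offset, so the SEEK_END call yields the size directly; the other two calls only save and restore the position.

```c
#include <sys/types.h>
#include <unistd.h>

/* Returns the size of the file behind `fd` and leaves the offset where it was.
 * Returns (off_t)-1 on error, e.g. ESPIPE for pipes and sockets. */
off_t fd_size_by_seek(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* remember current offset */
    if (saved == (off_t)-1)
        return (off_t)-1;

    off_t end = lseek(fd, 0, SEEK_END);     /* new offset == file size */
    if (end == (off_t)-1)
        return (off_t)-1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)  /* put the offset back */
        return (off_t)-1;
    return end;
}
```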
A program that runs just fine on my freeBSD system fails when I build it on windows (Visual Studio 15). It goes into an endless loop here:
//...
while (1) {
if ('#' == fgetc(f)) {
// we do some stuff here. irrelevant for stackoverflow question
break;
}
fseek(f, -1, SEEK_CUR);
if (0 != fseek(f, -1, SEEK_CUR)) {
// Beginning of file.
break;
}
}
//...
On closer look (by adding a bunch of fgetpos()-calls) I find that fgetc moves the file position indicator backwards. So it misses the beginning of the file and some '#' if they are not in a multiple-of-3 position from the end.
I notice that this only happens when the file f is opened with
fopen(filename, "a+");
//text mode read/append
When I change it to
fopen(filename, "ab+");
//binary mode read/append
then everything works as expected.
I think for my code it is safe just to use binary mode all the time.
But two questions remain:
Are there reasons that stand against binary mode?
What trickery is this with wrong direction in text mode?
Quoting C11 7.21.9.2 the fseek function:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
Invoking fseek with a whence argument of SEEK_CUR on a stream open in text mode is not covered by the C Standard. Opening the file in binary mode seems a much better option.
The value returned by fgetpos() may not be meaningful as an offset in the file, it is only meant to be passed as an argument to fsetpos().
As a general remark, you should try to change your algorithms to avoid relying on backwards seeks in the stream; relying on fseek() errors in particular seems unreliable. Instead, save the position before the fgetc() with ftell() or fgetpos() and restore it when needed with fseek(fp, pos, SEEK_SET) or fsetpos().
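The save-and-restore pattern can be sketched as below. peek_char is a made-up name, not part of the asker's program. Passing a value obtained from ftell() back to fseek() with SEEK_SET is the one non-zero seek the Standard allows on a text stream.

```c
#include <stdio.h>

/* Reads one character without consuming it: the position is saved with
 * ftell() and restored with fseek(..., SEEK_SET).
 * Returns the character, or EOF at end of file or on error. */
int peek_char(FILE *fp)
{
    long pos = ftell(fp);
    if (pos == -1L)
        return EOF;

    int c = fgetc(fp);
    if (fseek(fp, pos, SEEK_SET) != 0)
        return EOF;
    return c;
}
```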
How can I find out if the offset cursor is currently at EOF by using lseek() only?
lseek returns the (new) position. If it's acceptable that the file position is at the end after the test, the following works:
off_t old_position = lseek(fd, 0, SEEK_CUR);
off_t end_position = lseek(fd, 0, SEEK_END);
if(old_position == end_position) {
/* cursor already has been at the end */
}
Now, the cursor is at the end, whether it already has been there or not; to set it back, you can do lseek(fd, old_position, SEEK_SET) afterwards.
(I omitted checks for errors (return value of (off_t)-1) for sake of shortness, remember to include them in the real code.)
An alternative, though using another function, would be to query the current position as above and fstat the file to see if the st_size field equals the current position.
As a note, the end-of-file condition is set for streams (FILE *, not the int file descriptors) after an attempt to read past the end of the file, the cursor being at the end is not enough (that is, this approach is not the file descriptor equivalent to feof(stream)).
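The fstat alternative mentioned above might look like this sketch, with error checks included. The name fd_at_end is hypothetical, and the comparison uses >= rather than == on the assumption that an offset already beyond the end should also count as "at the end". Unlike the lseek-only version, it never moves the offset.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the offset of `fd` is at or beyond the end of the file,
 * 0 if it is before the end, and -1 on error. The offset is not moved. */
int fd_at_end(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);   /* query only, does not move */
    if (cur == (off_t)-1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    return cur >= st.st_size;
}
```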
I have opened a file in the following way:
fp = fopen("some.txt","r");
Now in this file the first few bytes, let's say 40 bytes, are unnecessary junk data, so I want to remove them. But I cannot delete that data from the file, modify the file, or create a duplicate of the file without that data.
So I want to create another dummy FILE pointer which points to the file, so that when I pass this dummy pointer to any other function that does the following operation:
fseek ( dummy file pointer , 0 , SEEK_SET );
then it should set the file pointer at the 40th position in my some.txt.
But the function accepts a file descriptor, so I need to pass a file descriptor which will treat the file as if those first 40 bytes were never in it.
In short, that dummy descriptor should treat the file as if those 40 bytes were not in it, and all positioning operations should be relative to the 40th byte, counting it as the 1st byte.
Easy.
#define CHAR_8_BIT (0)
#define CHAR_16_BIT (1)
#define BIT_WIDTH (CHAR_8_BIT)
#define OFFSET (40)
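FILE* fp = fopen("some.txt","r");
FILE* dummy = fp; /* fseek() returns an int status, not a FILE*, so dummy is the same stream as fp */
#if (BIT_WIDTH == CHAR_8_BIT)
if (fp == NULL || fseek (fp, OFFSET*sizeof(char), SEEK_SET) != 0) { /* handle the error */ }
#else
if (fp == NULL || fseek (fp, OFFSET*sizeof(wchar_t), SEEK_SET) != 0) { /* handle the error */ }
#endif
The SEEK_SET macro indicates the beginning of the file, and depending on whether you are using 8-bit characters (ASCII) or 16-bit characters (e.g. Unicode), you will step 40 characters forward from the beginning of the file. fseek() returns 0 on success rather than a pointer, so dummy refers to the same stream as fp, now positioned at that offset.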
Good luck!
These links will likely be helpful as well:
char vs wchar_t
http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
If you want, you can just convert a file descriptor to a file pointer via the fdopen() call.
http://linux.die.net/man/3/fdopen
fseek ( dummy file pointer , 0 , SEEK_SET );
In short that dummy pointer should treat the file as there is no that 40 byte in that file and all position should be with respect to that 40th byte as counting as it is 1st byte.
You have conflicting requirements, you cannot do this with the C API.
SEEK_SET always refers to the absolute position in the file, which means if you want that command to work, you have to modify the file and remove the junk.
On Linux you could write a FUSE driver that would present the file as if it started at the 40th byte, but that's a lot of work. I only mention this because it's possible to solve the problem you've created, but it would be quite silly to actually do this.
The simplest thing of course would be just to abandon this emulating layer idea you're looking for, and write code that can handle that extra header junk.
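If the consuming code can be changed, "handling the header" can be as small as a pair of wrappers that add the header length to absolute positions. This is only a sketch: body_seek, body_tell and HEADER_LEN are names I made up, and the stream is assumed to be opened in binary mode so that byte offsets are meaningful.

```c
#include <stdio.h>

#define HEADER_LEN 40L  /* size of the junk prefix, taken from the question */

/* Like fseek(), but SEEK_SET offsets are relative to the end of the header. */
int body_seek(FILE *fp, long offset, int whence)
{
    if (whence == SEEK_SET) {
        if (offset < 0)
            return -1;
        offset += HEADER_LEN;
    }
    return fseek(fp, offset, whence);
}

/* Like ftell(), but reports the position relative to the end of the header. */
long body_tell(FILE *fp)
{
    long pos = ftell(fp);
    return (pos == -1L) ? -1L : pos - HEADER_LEN;
}
```

A negative SEEK_CUR or SEEK_END offset can still land inside the header; the sketch does not guard against that.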
If you want to remove the first 40 bytes of a file on disk without creating another file, you can copy the content from the 41st byte onwards into a buffer, then write it back 40 bytes earlier in the file. Then use ftruncate (a POSIX function declared in unistd.h) to truncate the file to (filesize - 40) bytes.
I wrote a small piece of code based on what I understood from your question.
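A sketch of that shift-and-truncate approach, done in fixed-size chunks with pread()/pwrite() so the whole file never has to fit in memory. The function name drop_prefix and the chunk size are assumptions. The file is modified in place, so an interruption halfway through leaves it in a mixed state.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Removes the first `n` bytes of the regular file behind `fd` in place.
 * `fd` must be open for reading and writing.
 * Returns 0 on success, -1 on error. */
int drop_prefix(int fd, off_t n)
{
    struct stat st;
    if (n < 0 || fstat(fd, &st) == -1)
        return -1;
    if (n >= st.st_size)
        return ftruncate(fd, 0);

    char buf[4096];
    off_t src = n;                       /* next byte to read */
    while (src < st.st_size) {
        ssize_t got = pread(fd, buf, sizeof buf, src);
        if (got <= 0)
            return -1;
        /* The destination is always n bytes before the source. */
        for (ssize_t done = 0; done < got; ) {
            ssize_t w = pwrite(fd, buf + done, (size_t)(got - done),
                               src - n + done);
            if (w <= 0)
                return -1;
            done += w;
        }
        src += got;
    }
    return ftruncate(fd, st.st_size - n);
}
```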
#include<stdio.h>
void readIt(FILE *afp)
{
char mystr[100];
while ( fgets (mystr , 100 , afp) != NULL )
puts (mystr);
}
int main()
{
FILE * dfp = NULL;
FILE * fp = fopen("h4.sql","r");
if(fp != NULL)
{
fseek(fp,10,SEEK_SET);
dfp = fp;
readIt(dfp);
fclose(fp);
}
}
readIt() reads the file starting from the 11th byte.
Is this what you are expecting or something else?
I haven't actually tried this, but I think you should be able to use mmap (with the MAP_SHARED option) to get your file mapped into your address space, and then fmemopen to get a FILE* that refers to all but the first 40 bytes of that buffer.
This gives you a FILE* (as you describe in the body of your question), but I believe not a file descriptor (as in the title and elsewhere in the question). The two are not the same, and AFAIK the FILE* created with fmemopen does not have an associated file descriptor.
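A rough sketch of the idea, assuming a POSIX system that provides fmemopen(). The struct and function names are hypothetical, the mapping is read-only, and the file must be longer than the prefix being skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle tying the stream to the mapping that backs it. */
struct skipped_file {
    FILE  *stream;   /* position 0 is byte `skip` of the file */
    void  *map;      /* whole-file mapping */
    size_t map_len;
};

/* Maps the file behind `fd` read-only and opens a stream over everything
 * after the first `skip` bytes. Returns 0 on success, -1 on error. */
int skipped_open(int fd, size_t skip, struct skipped_file *out)
{
    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0 || (size_t)st.st_size <= skip)
        return -1;

    size_t len = (size_t)st.st_size;
    /* The mmap offset must be page-aligned, so map from 0 and skip in memory. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    FILE *fp = fmemopen((char *)map + skip, len - skip, "r");
    if (fp == NULL) {
        munmap(map, len);
        return -1;
    }
    out->stream = fp;
    out->map = map;
    out->map_len = len;
    return 0;
}

/* Closes the stream first, then releases the mapping behind it. */
void skipped_close(struct skipped_file *sf)
{
    fclose(sf->stream);
    munmap(sf->map, sf->map_len);
}
```

With this, fseek(sf.stream, 0, SEEK_SET) lands on what is byte 40 of the underlying file. The stream is a snapshot of the file's size at mapping time and, as noted above, has no file descriptor of its own.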