I got confused about lseek()'s return value (which is the new file offset).
I have a text file named prwtest. Its contents are the letters a to z.
The code I wrote is as follows:
1 #include <unistd.h>
2 #include <fcntl.h>
3 #include <stdlib.h>
4 #include <stdio.h>
5 #include <string.h>
6
7 #define BUF 50
8
9 int main(void)
10 {
11 char buf1[]="abcdefghijklmnopqrstuvwxyz";
12 char buf2[BUF];
13 int fd;
14 int read_cnt;
15 off_t cur_offset;
16
17 fd=openat(AT_FDCWD, "prwtest", O_CREAT | O_RDWR | O_APPEND, 0644);
18 cur_offset=lseek(fd, 0, SEEK_CUR);
19 //pwrite(fd, buf1, strlen(buf1), 0);
20 //write(fd, buf1, strlen(buf1));
21 //cur_offset=lseek(fd, 0, SEEK_END);
22
23 printf("current offset of file prwtest: %ld \n", (long)cur_offset);
24
25 exit(0);
26 }
On line 17 I use the O_APPEND flag, so I expected prwtest's current file offset to be taken from the i-node's current file size (26).
On line 18 I call lseek() with SEEK_CUR and an offset of 0.
But the resulting cur_offset is 0. (I assumed it would be 26, because SEEK_CUR refers to the current file offset.)
However, SEEK_END gives me what I expected: cur_offset is 26.
Why does lseek(fd, 0, SEEK_CUR); return 0, not 26?
O_APPEND takes effect before each write to the file, not when the file is opened.
Therefore, right after the open the position is still 0, but once you call write(), lseek() with SEEK_CUR will return the value you expect.
Your issue is with open() / openat(), not lseek().
From the open() manpage, emphasis mine:
O_APPEND
The file is opened in append mode. Before each write(2), the file offset is positioned at the end of the file, as if with lseek(2).
Since you don't write to the file, the offset is never repositioned to the end of the file.
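To see this for yourself, here is a minimal sketch that records the offset right after open() and again after a single write(). The helper name append_offsets and its parameters are made up for illustration; only open(), lseek(), write() and close() are real calls.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` with O_APPEND, stores the offset seen
 * right after open() and the offset seen after one write().
 * Returns 0 on success, -1 on error. */
int append_offsets(const char *path, const char *data,
                   off_t *after_open, off_t *after_write)
{
    int fd = open(path, O_CREAT | O_RDWR | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    /* Nothing has been written yet, so the offset is still 0,
     * even if the file already has contents. */
    *after_open = lseek(fd, 0, SEEK_CUR);

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* The write was positioned at end-of-file first, so the offset is now
     * the previous file size plus len. */
    *after_write = lseek(fd, 0, SEEK_CUR);

    close(fd);
    return 0;
}
```

Calling it twice on the same file shows that the first offset is 0 both times, while the second one keeps growing with the file.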
While we're at it, you should be closing the file before ending the program...
Actually, while we're really at it, if you do #include <stdio.h> already, why not use the standard's file I/O (fopen() / fseek() / fwrite()) instead of the POSIX-specific stuff? ;-)
Also, on Linux, your commented-out code won't work as you expect. This code:
17 fd=openat(AT_FDCWD, "prwtest", O_CREAT | O_RDWR | O_APPEND, 0644);
18 cur_offset=lseek(fd, 0, SEEK_CUR);
19 pwrite(fd, buf1, strlen(buf1), 0);
will fail to write the contents of buf1 at the beginning of the file (unless the file is empty).
pwrite on Linux is buggy:
BUGS
POSIX requires that opening a file with the O_APPEND flag should
have no effect on the location at which pwrite() writes data.
However, on Linux, if a file is opened with O_APPEND, pwrite()
appends data to the end of the file, regardless of the value of
offset.
I have a simple test program that uses fgetc() to read a character from a file stream and lseek() to read a file offset. It looks like this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main() {
char buf[] = "hello world";
FILE *f;
int fd;
fd = open("test.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
write(fd, buf, sizeof(buf));
lseek(fd, 0, SEEK_SET);
f = fdopen(fd, "r");
printf("%c\n", fgetc(f));
printf("%ld\n", (long)lseek(fd, 0, SEEK_CUR));
}
When I run it, I get the following output:
h
12
The return value of fgetc(f), h, makes sense to me. But why is it repositioning the file offset to be at the end of the file? Why doesn't lseek(fd, 0, SEEK_CUR) give me 1?
If I repeat the first print statement, it works as expected and prints an e, then an l, etc.
I don't see any mention of this weird behavior in the man page.
stdio functions like fgetc are buffered. They will read() a large block into a buffer and then return characters from the buffer on successive calls.
Since the default buffer size is more than 12 (usually many KB), the first time you fgetc(), it tries to fill its buffer which means reading the entire file. Thus lseek returns a position at the end of the file.
If you want to get a file position that takes into account what's still in the buffer, use ftell() instead.
If you run dd with this:
dd if=/dev/zero of=sparsefile bs=1 count=0 seek=1048576
You appear to get a completely unallocated sparse file (this is ext4)
smark@we:/sp$ ls -ls sparsefile
0 -rw-rw-r-- 1 smark smark 1048576 Nov 24 16:19 sparsefile
fibmap agrees:
smark@we:/sp$ sudo hdparm --fibmap sparsefile
sparsefile:
filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
Without having to dig through the source of dd, I'm trying to figure out how to do that in C.
I tried fseeking and fwriting zero bytes, but it did nothing.
Not sure what else to try, I figured somebody might know before I hunt down dd's innards.
EDIT: including my example...
FILE *f = fopen("/sp/sparse2", "wb");
fseek(f, 1048576, SEEK_CUR);
fwrite("x", 1, 0, f);
fclose(f);
When you write to a file using write or various library routines that ultimately call write, there's a file offset pointer associated with the file descriptor that determines where in the file the bytes will go. It's normally positioned at the end of the data that was processed by the most recent call to read or write. But you can use lseek to position the pointer anywhere within the file, and even beyond the current end of the file. When you write data at a point beyond the current EOF, the area that was skipped is conceptually filled with zeroes. Many systems will optimize things so that any whole filesystem blocks in that skipped area simply aren't allocated, producing a sparse file. Attempts to read such blocks will succeed, returning zeroes.
Writing block-sized areas full of zeroes to a file generally won't produce a sparse file, although it's possible for some filesystems to do this.
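Applied to the stdio attempt in the question, this means the seek has to be followed by a write of at least one byte. A minimal sketch, with make_sparse_stdio as a made-up helper name; note that, unlike the dd result, the block holding that final byte will normally be allocated, and whether the rest stays unallocated is up to the filesystem.

```c
#include <stdio.h>

/* Hypothetical helper: creates `path` with a logical size of `size` bytes
 * (size >= 1) by seeking to the last byte and writing one zero byte there.
 * Returns 0 on success, -1 on error. */
int make_sparse_stdio(const char *path, long size)
{
    if (size < 1)
        return -1;

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    /* Seeking alone does not extend the file, and neither does writing
     * zero items; one real byte has to land at the far end. */
    if (fseek(f, size - 1, SEEK_SET) != 0 ||
        fwrite("", 1, 1, f) != 1) {     /* "" supplies a single '\0' byte */
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, make_sparse_stdio("/sp/sparse2", 1048576) should give a file whose size matches the dd example.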
Another way to produce a sparse file, used by GNU dd, is to call ftruncate. The documentation says this:
The ftruncate() function causes the regular file referenced by fildes to have a size of length bytes.
If the file previously was larger than length, the extra data is discarded. If it was previously shorter than length, it is unspecified whether the file is changed or its size increased. If the file is extended, the extended area appears as if it were zero-filled.
Support for sparse files is filesystem-specific, although virtually all designed-for-UNIX local filesystems support them.
This complements the answer by @MarkPlotnick. It is a simple sample implementation of the feature you requested, using ftruncate():
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int
main(void)
{
    int file;
    int mode;

    mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
    file = open("sparsefile", O_WRONLY | O_CREAT, mode);
    if (file == -1)
        return -1;
    /* Extend the file to 1 MiB without writing any data. */
    if (ftruncate(file, 0x100000) == -1) {
        close(file);
        return -1;
    }
    close(file);
    return 0;
}
I am trying to understand how a file position indicator moves after I read some bytes from a file. I have a file named "filename.dat" with a single line: "abcdefghijklmnopqrstuvwxyz" (without the quotes).
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main () {
int fd = open("filename.dat", O_RDONLY);
FILE* fp = fdopen(fd,"r");
printf("ftell(fp): %ld, errno = %d\n", ftell(fp), errno);
fseek(fp, 5, SEEK_SET); // advance 5 bytes from beginning of file
printf("file position indicator: %ld, errno = %d\n", ftell(fp), errno);
char buffer[100];
int result = read(fd, buffer, 4); // read 4 bytes
printf("result = %d, buffer = %s, errno = %d\n", result, buffer, errno);
printf("file position indicator: %ld, errno = %d\n", ftell(fp), errno);
fseek(fp, 3, SEEK_CUR); // advance 3 bytes
printf("file position indicator: %ld, errno = %d\n", ftell(fp), errno);
result = read(fd, buffer, 6); // read 6 bytes
printf("result = %d, buffer = %s, errno = %d\n", result, buffer, errno);
printf("file position indicator: %ld\n", ftell(fp));
close(fd);
return 0;
}
ftell(fp): 0, errno = 0
file position indicator: 5, errno = 0
result = 4, buffer = fghi, errno = 0
file position indicator: 5, errno = 0
file position indicator: 8, errno = 0
result = 0, buffer = fghi, errno = 0
file position indicator: 8
I do not understand why the second time I try to use read, I get no bytes from the file. Also, why does the file position indicator not move when I read contents from the file using read? On the second fseek, advancing 4 bytes instead of 3 also did not work. Any suggestions?
Use fseek and fread, or lseek and read, but do not mix the two APIs; it won't work.
A FILE* has its own internal buffer. fseek may or may not move the internal buffer pointer only. It is not guaranteed that the real file position indicator (one that lseek is responsible for) changes, and if it does, it is not known by how much.
First thing to note is that the read calls read chars into a raw buffer, but printf() expects to be handed null-terminated strings for %s parameters. You're not explicitly adding a null-terminator byte, so your program might print garbage after the first 4 bytes of the buffer. You've been lucky: the buffer is uninitialized, and the stack memory it occupies happened to contain zeroes, so you haven't noticed this problem.
The essential problem in this program is that you're mixing high-level buffering FILE * calls with low level file descriptor calls, which will result in unpredictable behavior. FILE structs contain a buffer and a couple of ints to support more efficient and convenient access to the file behind a file descriptor.
Basically all f*() calls (fopen(), fread(), fseek(), fwrite()) expect that all I/O is going to be done by f*() calls using a FILE struct, so the buffer and index values in the FILE struct will be valid. The low-level calls (read(), write(), open(), close(), lseek()) completely ignore the FILE struct.
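For comparison, here is a sketch of the same skip-then-read pattern done only with the low-level calls, so there is a single position to keep track of. The helper skip_and_read is a made-up name; it also terminates the buffer so it is safe to print with %s.

```c
#include <fcntl.h>      /* open(), used by callers */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: skips `skip` bytes from the current offset of `fd`,
 * reads up to `n` bytes into `out` (which must hold n + 1 bytes) and
 * NUL-terminates the result. Returns the number of bytes read, or -1. */
ssize_t skip_and_read(int fd, off_t skip, char *out, size_t n)
{
    if (lseek(fd, skip, SEEK_CUR) == -1)
        return -1;

    ssize_t got = read(fd, out, n);
    if (got >= 0)
        out[got] = '\0';    /* read() never adds a terminator itself */
    return got;
}
```

On the 26-letter file, skip_and_read(fd, 5, buf, 4) followed by skip_and_read(fd, 3, buf, 6) reads "fghi" and then "mnopqr", leaving the descriptor at offset 18.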
I ran strace on your program. The strace utility logs all system calls made by a process. I've omitted all the uninteresting stuff up to your open() call.
Here is your open call:
open("filename.dat", O_RDONLY) = 3
Here is where fdopen() is happening. The brk calls are evidence of memory allocation, presumably for something like malloc(sizeof(FILE)).
fcntl64(3, F_GETFL) = 0 (flags O_RDONLY)
brk(0) = 0x83ea000
brk(0x840b000) = 0x840b000
fstat64(3, {st_mode=S_IFREG|0644, st_size=26, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7728000
This might be the effect of ftell() or just the last part of fdopen, I'm not sure.
_llseek(3, 0, [0], SEEK_CUR) = 0
Here is the first printf.
write(1, "ftell(fp): 0, errno = 0\n", 24) = 24
Here is the first fseek, which has decided the easiest way to get to position 5 in the file is to just read in 5 bytes and ignore them.
_llseek(3, 0, [0], SEEK_SET) = 0
read(3, "abcde", 5) = 5
Here is the second printf. Notice that there is no evidence of an ftell() call. ftell() uses the information in the FILE struct, which claims to be accurate, so no system call is necessary.
write(1, "file position indicator: 5, errn"..., 38) = 38
Here is your read() call. Now, the operating system file handle is at position 9, but the FILE struct thinks it is still at position 5.
read(3, "fghi", 4) = 4
The third and fourth printf calls, with ftell() still indicating position 5.
write(1, "result = 4, buffer = fghi, errno"..., 37) = 37
write(1, "file position indicator: 5, errn"..., 38) = 38
Here is the fseek(fp, 3, SEEK_CUR) call. fseek() has decided to just SEEK_SET back to the beginning of the file and read the whole thing into the FILE struct's 4k buffer. Since it "knew" it was at position 5, it "knows" it must be at position 8 now. Since the file is only 26 bytes long, the os file position is now at eof.
_llseek(3, 0, [0], SEEK_SET) = 0
read(3, "abcdefghijklmnopqrstuvwxyz", 4096) = 26
The fifth printf.
write(1, "file position indicator: 8, errn"..., 38) = 38
Here is your second read() call. Since the file handle is at eof, it reads 0 bytes. It doesn't change anything in your buffer.
read(3, "", 6) = 0
The sixth and seventh printf calls.
write(1, "result = 0, buffer = fghi, errno"..., 37) = 37
write(1, "file position indicator: 8\n", 27) = 27
Your close() call, and the process exit.
close(3) = 0
exit_group(0) = ?
File holes are empty regions in a file that take up no disk space and read back as null bytes. As a result, the file's apparent size is larger than the space it actually occupies on disk.
However, I don't know how to create a file with file holes for experimenting with.
Use the dd command with a seek parameter.
dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
That creates for you a file with a nice hole from byte 8192 to byte 28671.
Here's an example, demonstrating that indeed the file has holes in it (the ls -s command tells you how many disk blocks are being used by a file):
$ dd if=/dev/urandom bs=4096 count=2 of=fwh # fwh = file with holes
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00195565 s, 4.2 MB/s
$ dd if=/dev/urandom seek=7 bs=4096 count=2 of=fwh
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00152742 s, 5.4 MB/s
$ dd if=/dev/zero bs=4096 count=9 of=fwnh # fwnh = file with no holes
9+0 records in
9+0 records out
36864 bytes (37 kB) copied, 0.000510568 s, 72.2 MB/s
$ ls -ls fw*
16 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:25 fwh
36 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:29 fwnh
As you can see, the file with holes takes up fewer disk blocks, despite being the same size.
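The comparison that ls -ls makes can also be done from C with stat(). This is only a sketch: file_usage is a made-up helper name, and the 512-byte unit for st_blocks is an assumption that holds on Linux but is not guaranteed by POSIX.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: stores the logical size of `path` and the number of
 * bytes actually allocated for it. Returns 0 on success, -1 on error. */
int file_usage(const char *path, off_t *logical, off_t *allocated)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    *logical = st.st_size;
    /* Assumption: st_blocks counts 512-byte units (true on Linux). */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

A file with holes would typically show an allocated value smaller than its logical size; for fwh above that would be 16384 allocated bytes against a logical size of 36864.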
If you want a program that does it, here it is:
#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <fcntl.h>

int main(int argc, const char *argv[])
{
    char random_garbage[8192]; /* Don't even bother to initialize */
    int fd = -1;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        perror("Can't open file");
        return 2;
    }
    /* Two blocks of data, a five-block gap, then two more blocks. */
    if (write(fd, random_garbage, 8192) != 8192
        || lseek(fd, 5 * 4096, SEEK_CUR) == -1
        || write(fd, random_garbage, 8192) != 8192) {
        perror("Can't write file");
        close(fd);
        return 3;
    }
    close(fd);
    return 0;
}
The above should work on any Unix. Someone else replied with a nice alternative method that is very Linux specific. I highlight it here because it's a method distinct from the two I gave, and can be used to put holes in existing files.
1. Create a file.
2. Seek to position N.
3. Write some data.
There will be a hole at the start of the file (up to, and excluding, position N). You can similarly create files with holes in the middle.
The following document has some sample C code (search for "Sparse files"): http://www.win.tue.nl/~aeb/linux/lk/lk-6.html
Aside from creating files with holes, since about two months ago (mid-January 2011) you can also punch holes in existing files on Linux, using fallocate(2) with FALLOC_FL_PUNCH_HOLE. See the LWN article, the git commit in Linus' tree, and the patch to Linux's man pages.
The problem is discussed in detail in section 3.6 of W. Richard Stevens' famous book "Advanced Programming in the UNIX Environment" (APUE for short). The lseek function, declared in unistd.h, is used here; it sets an open file's offset explicitly. The prototype of the lseek function is as follows:
off_t lseek(int filedes, off_t offset, int whence);
Here, filedes is the file descriptor, offset is the value to apply, and whence is one of three constants defined in the header file: SEEK_SET, meaning the file's offset is set to offset bytes from the beginning of the file; SEEK_CUR, meaning the offset is set to its current value plus the offset argument; and SEEK_END, meaning the offset is set to the size of the file plus the offset argument.
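The three whence values can be tried out with a short sketch like the following. whence_demo is a made-up helper name, and the offsets in the comments assume the file is at least 15 bytes long.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` read-only, applies each whence value in
 * turn and stores the offsets lseek() returns in out[0..2].
 * Returns 0 on success, -1 if anything failed. */
int whence_demo(const char *path, off_t out[3])
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    out[0] = lseek(fd, 10, SEEK_SET);  /* 10 bytes from the start: 10      */
    out[1] = lseek(fd, 5, SEEK_CUR);   /* current offset (10) plus 5: 15   */
    out[2] = lseek(fd, -2, SEEK_END);  /* file size minus 2                */

    close(fd);
    return (out[0] == -1 || out[1] == -1 || out[2] == -1) ? -1 : 0;
}
```

For a 26-byte file this yields 10, 15 and 24. A positive offset with SEEK_END is equally valid, and that is what makes holes possible.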
The example to create a file with holes in C under UNIX like OSs is as follows:
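/* Creating a file with a hole of size 810 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* In APUE these two come from the book's own header, apue.h; the
   definitions below are minimal stand-ins so the example compiles alone. */
#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

static void err_sys(const char *msg)
{
    perror(msg);
    exit(1);
}

/* Two strings to write to the file */
char buf1[] = "abcde";
char buf2[] = "ABCDE";

int main(void)
{
    int fd; /* file descriptor */

    if ((fd = creat("file_with_hole", FILE_MODE)) < 0)
        err_sys("creat error");
    if (write(fd, buf1, 5) != 5)
        err_sys("buf1 write error");
    /* offset now 5 */
    if (lseek(fd, 815, SEEK_SET) == -1)
        err_sys("lseek error");
    /* offset now 815 */
    if (write(fd, buf2, 5) != 5)
        err_sys("buf2 write error");
    /* offset now 820 */
    return 0;
}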
In the code above, err_sys is the function that handles a fatal error related to a system call.
A hole is created when data is written at an offset beyond the current file size, or when the file is truncated to a size larger than its current size.
I want to write some bogus text in a file ("helloworld" text in a file called helloworld), but not starting from the beginning. I was thinking of using the lseek() function.
If I use the following code (edited):
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#define fname "helloworld"
#define buf_size 16
int main(){
char buffer[buf_size];
int fildes,
nbytes;
off_t ret;
fildes = open(fname, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
if(fildes < 0){
printf("\nCannot create file + trunc file.\n");
}
//modify offset
if((ret = lseek(fildes, (off_t) 10, SEEK_END)) < (off_t) 0){
fprintf(stdout, "\nCannot modify offset.\n");
}
printf("ret = %d\n", (int)ret);
if(write(fildes, fname, 10) < 0){
fprintf(stdout, "\nWrite failed.\n");
}
close(fildes);
return (0);
}
it compiles fine and runs without any apparent errors.
Still, if I run:
cat helloworld
The output is not what I expected, but:
helloworld
Can
Where is "Can" coming from, and where are my empty spaces?
Should I expect zeros instead of spaces? If I try to open helloworld with gedit, an error occurs, complaining that the file's character encoding is unknown.
LATER EDIT:
After I edited my program to write the right number of bytes, then compiled and ran it again, the "helloworld" file still cannot be opened with gedit.
LATER EDIT
I understand the issue now. I've added to the code the following:
fildes = open(fname, O_RDONLY);
if(fildes < 0){
printf("\nCannot open file.\n");
}
while((nbytes = read(fildes, c, 1)) == 1){
printf("%d ", (int)*c);
}
And now the output is:
0 0 0 0 0 0 0 0 0 0 104 101 108 108 111 119 111 114 108 100
My problem was that I was expecting spaces (32) instead of zeros (0).
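The whole experiment fits in one small function. This is a sketch rather than the exact program from the question; write_after_gap is a made-up name, and it reads the file back through the same descriptor instead of reopening it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: recreates `path`, seeks 10 bytes past the (empty)
 * end, writes "helloworld", then reads the file back into `out`, which must
 * hold at least 20 bytes. Returns the number of bytes read back, or -1. */
ssize_t write_after_gap(const char *path, unsigned char *out)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (lseek(fd, 10, SEEK_END) == -1 ||        /* leave a 10-byte gap   */
        write(fd, "helloworld", 10) != 10 ||    /* data after the gap    */
        lseek(fd, 0, SEEK_SET) == -1) {         /* rewind for reading    */
        close(fd);
        return -1;
    }

    ssize_t got = read(fd, out, 20);
    close(fd);
    return got;
}
```

The first ten bytes come back as 0 (not 32), followed by the ten bytes of "helloworld".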
In the function call write(fildes, fname, buf_size), fname has 10 characters (plus a trailing '\0' character), but you're telling the function to write out 16 bytes. Who knows what is in the memory locations after the fname string?
Also, I'm not sure what you mean by "where are my empty spaces?".
Apart from expecting zeros to equal spaces, the original problem was indeed writing more than the length of the "helloworld" string. To avoid such a problem, I suggest letting the compiler calculate the length of your constant strings for you:
write(fildes, fname, sizeof(fname) - 1)
The - 1 is due to the NUL character (zero, \0) that is used to terminate C-style strings, and sizeof simply returning the size of the array that holds the string. Due to this you cannot use sizeof to calculate the actual length of a string at runtime, but it works fine for compile-time constants.
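Side by side, the two cases might look like this. It is only a sketch: FNAME mirrors the question's fname macro, and write_name and write_string are made-up helper names.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FNAME "helloworld"

/* Hypothetical helper: writes the string literal without its trailing NUL.
 * sizeof(FNAME) is 11 here, because it counts the '\0'. */
ssize_t write_name(int fd)
{
    return write(fd, FNAME, sizeof(FNAME) - 1);
}

/* Hypothetical helper: for a string only known at run time, sizeof would
 * give the size of the pointer, so strlen() is used instead. */
ssize_t write_string(int fd, const char *s)
{
    return write(fd, s, strlen(s));
}
```

Both calls write exactly 10 bytes for "helloworld", and neither reads past the end of the string.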
The "Can" you saw in your original test was almost certainly the beginning of one of the "\nCannot" strings in your code; after writing the 11 bytes in "helloworld\0" you continued to write the remaining bytes from whatever was following it in memory, which turned out to be the next string constant. (The question has now been amended to write 10 bytes, but the originally posted version wrote 16.)
The presence of NUL characters (= zero, '\0') in a text file may indeed cause certain (but not all) text editors to consider the file binary data instead of text, and possibly refuse to open it. A text file should contain just text, not control characters.
Your buf_size doesn't match the length of fname. It's reading past the buffer, and therefore getting more or less random bytes that just happened to sit after the string in memory.