How to write to specific offset in empty file - c

I have a file in which I read in data. Suppose the file has the string "abcdefghij". Now, I'm going to be reading from the file at random times from different processes and they store that byte and offset somewhere. For instance, I save 'c' as my character with an offset of '3' because that is its location. For reference, I've been using lseek to get the offset in my files.
Next, I want to write this to a new file. Is it possible to write to a specific offset in an empty file? So, I want to write 'c' to position '3' in the file and then another process will write 'j' to the file at position 10.

#include <stdio.h>
int main ()
{
FILE * f = fopen ("/tmp/x.txt", "w");
fseek (f, 3, SEEK_SET);
fwrite ("c", 1, 1, f);
fseek (f, 10, SEEK_SET);
fwrite ("j", 1, 1, f);
fclose (f);
}
When this runs, the hexdump of /tmp/x.txt is
00 00 00 63 00 00 00 00 00 00 6a | ...c.... ..j
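fseek is built on lseek, which lets you set the offset beyond the current end of the file. Writing there leaves a "hole": a range that was never written and that reads back as zeroes. Whether the hole is stored sparsely (without allocating disk blocks) depends on the underlying file system.
The manpage is not brilliantly clear on whether the holes are strictly required to read as zeroes, but the POSIX specification of lseek says that data in the gap reads back as zero bytes until something is written there, and that is what happens in practice.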

Look into fseeko to set the position (ftello reports the current one). Then use fwrite.
You can also use lseek/write, if you're using fd's instead of FILE*s.
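For the file-descriptor route, a minimal sketch: since several processes each store one byte, pwrite() can stand in for the lseek/write pair, because it takes the offset as an argument. The helper name write_byte_at and the permission bits are assumptions made for illustration, not part of the question.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store one byte at the given offset of the file at
   path. Returns 0 on success, -1 on failure. */
int write_byte_at(const char *path, off_t offset, char byte)
{
    /* No O_TRUNC: another process may already have written its bytes. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* pwrite() writes at the offset passed in and leaves the descriptor's
       own file position untouched. */
    ssize_t n = pwrite(fd, &byte, 1, offset);

    if (close(fd) == -1)
        return -1;
    return n == 1 ? 0 : -1;
}
```

Calling write_byte_at("/tmp/x.txt", 3, 'c') and write_byte_at("/tmp/x.txt", 10, 'j') in either order should produce the same 11-byte file as the fseek version above.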

Related

Is there a fast and reliable POSIX way to check if current file offset is at the end of file?

How to check if the current write position is at the end of file using low-level POSIX functions? The first idea is to use lseek and fstat:
off_t sk;
struct stat st;
sk = lseek (f, 0, SEEK_CUR);
fstat (f, &st);
return st.st_size == sk;
However, does st.st_size reflect the actual size rather than the size on disk, i.e. the size that excludes data still buffered in the kernel?
Another idea is to use
off_t scur, send;
scur = lseek (f, 0, SEEK_CUR);
send = lseek (f, 0, SEEK_END);
lseek (f, scur, SEEK_SET);
return scur == send;
but this doesn't seem to be a fast or adequate way.
Also, both ways seem to be non-atomic, so if another process is appending to the file, the size could change after the current offset has been checked.
However, does st.st_size reflect the actual size rather than the size on disk, i.e. the size that excludes data still buffered in the kernel?
I don't understand what you mean by kernel buffered data. The number in st.st_size is the size of the file in bytes. So, if the file holds 1000000 bytes, st.st_size will be 1000000, with byte positions from 0 to 999999.
There are two ways to get the file size in POSIX systems:
First, save the current position with off_t saved = lseek(fd, 0, SEEK_CUR);, then call off_t file_size = lseek(fd, 0, SEEK_END);, which moves to the end of the file and returns that offset (the position just after the last byte), and finally call lseek(fd, saved, SEEK_SET); to go back to where you were. If you check, file_size will match the value in st.st_size.
Second, call fstat(2) on the file descriptor to get the st_size value you mentioned.
The first way has a drawback if other threads or processes share the file descriptor with you (through a dup(2) call, or a fork()ed process): if one of them calls read(2), write(2), or lseek(2) between your lseek calls, the position you saved is stale and you will not return to the correct place. That makes the first approach hard to recommend.
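The fstat route can be sketched as below. The helper name at_end_of_file is hypothetical, and it compares with >= rather than == so that an offset set past the end also counts. As the question already notes, the check is not atomic: another process can append between the two calls.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd's current offset is at or past the
   end of the file, 0 if it is not, and -1 on error. */
int at_end_of_file(int fd)
{
    struct stat st;

    /* Query the current offset without moving it. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;

    /* st_size is the logical file size in bytes. */
    if (fstat(fd, &st) == -1)
        return -1;

    return pos >= st.st_size;
}
```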
Last, there's no relationship between the kernel's file buffering and the file size. You always get the true file size from stat(2). The only thing that may be confusing you is the space saving the kernel does for files like the one created by the following snippet (this is transparent to you and you don't have to account for it, unless you are going to copy the file somewhere else). Just run this tiny program:
#include <fcntl.h>
#include <unistd.h>
int main()
{
int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0666);
lseek(fd, 1000000, SEEK_SET);
char string[] = "Hello, world";
write(fd, string, sizeof string);
close(fd);
}
in which you will end up with a 1000013-byte file that uses only one or two blocks of disk space. That's a file with a hole: there are 1000000 zero bytes before the string you wrote, and the system doesn't allocate disk blocks for them. Only when you write into that range does the system allocate blocks to hold your data; until then, it shows you zero bytes that are not stored anywhere.
$ ll file
-rw-r----- 1 lcu lcu 1000013 4 jul. 11:52 file
$ hd file
[file]:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................
*
000f4240: 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 00 :Hello, world.
000f424d
$ _

Error in simple file handling C program

Here's the code:
#include<stdio.h>
int main()
{
FILE *fp;
int i;
fp=fopen("DATA","w");
for(i=1;i<=30;++i)
putw(i,fp);
fclose(fp);
fp=fopen("DATA","r");
while((i=getw(fp))!=EOF)
printf("%4d",i);
fclose(fp);
return 0;
}
I don't get the expected output. The program prints the numbers up to 25 rather than up to 30. If I set i<=20, I get the right output. I do not understand this.
Help is appreciated. Thanks!
ASCII 26 is the Ctrl-Z (aka SUB) character that on some systems is used to indicate the end of file (normally only for text files). This is the reason your program stops reading the file as soon as it sees the value 26.
The reason this becomes an issue is that you're opening the file in text mode, yet are storing binary data in it (using putw() and getw()).
To fix this, open the file in binary mode and try again.
As user NPE suggested, you should work with binary files in binary mode only.
The ASCII code comes into the picture because an integer saved to the file is actually stored as a sequence of 4 bytes (on a platform where int is 4 bytes).
If you open your generated DATA file in any hex editor you will notice that when you saved i as 1 into the file it is actually stored as
01 00 00 00
or
00 00 00 01
according to the endian-ness of your system.
By the same logic, when you save 26 (0x1A), it actually gets saved as
1A 00 00 00
or
00 00 00 1A
But when you read this file in text mode rather than binary mode, the byte 0x1A (character 26) is treated as end of file when it is encountered, and getw returns EOF (-1).
I hope this explains the actual problem.
This will not happen if you open your file in binary mode for both writing and reading.
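A minimal sketch of the binary-mode fix follows. It swaps putw/getw for fwrite/fread, because getw cannot tell a stored value of -1 apart from EOF; the helper names write_ints and read_ints are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write the integers 1..count to path as raw ints.
   Returns 0 on success, -1 on failure. */
int write_ints(const char *path, int count)
{
    FILE *fp = fopen(path, "wb");          /* "b": no text-mode translation */
    if (fp == NULL)
        return -1;
    for (int i = 1; i <= count; ++i) {
        if (fwrite(&i, sizeof i, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: read up to max raw ints from path into out.
   Returns the number of ints read, or -1 if the file cannot be opened. */
int read_ints(const char *path, int *out, int max)
{
    FILE *fp = fopen(path, "rb");          /* byte 0x1A is just data here */
    if (fp == NULL)
        return -1;
    int n = 0;
    /* fread reports how many items it read, so no value doubles as EOF. */
    while (n < max && fread(&out[n], sizeof out[n], 1, fp) == 1)
        ++n;
    fclose(fp);
    return n;
}
```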

fwrite fails to write value 10 (0x0A)

I was experimenting on creating BMP files from scratch, when I found a weird bug I could not explain. I isolated the bug in this minimalist program:
#include <stdio.h>
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows, the end of line is CR LF (0D 0A), whereas on Unix it is a single LF (0A). In text mode, your C library is replacing each 0A with 0D 0A.
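As a sketch of the binary-mode version, the helper below (write_raw is a made-up name) writes a byte array with "wb" and returns the file position afterwards, so the position can be compared with the number of bytes requested.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes to path in binary mode.
   Returns the file position after the write, or -1 on failure. */
long write_raw(const char *path, const unsigned char *data, size_t len)
{
    FILE *fp = fopen(path, "wb");   /* "b" disables CR LF translation */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, fp);
    long pos = ftell(fp);

    /* fclose flushes the buffer; a failure there means data was lost. */
    if (fclose(fp) != 0 || written != len)
        return -1;
    return pos;
}
```

With the bytes {10, 11, 10} from the question, the returned position is 3 and the file holds 0A 0B 0A.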

fread the same file, but return different result

Today I read a blog post titled "a bug of fread?". I couldn't find the reason for the behaviour, so I'm pasting it here in the hope that someone can explain it.
First, the purpose of the program is to read a file (readme.txt) and print its content; I tested it with Visual Studio 2010.
The content of readme.txt is:
1234;
abcd;
ABCD;
The hex value of readme is :
31 32 33 34 3b 0d 0a 61 62 63 64 3b 0d 0a 41 42 43 44 3b
Here is the code:
#include <stdio.h>
#include <string.h>
#define BUF_SIZE 1024
int main()
{
FILE *fp = NULL;
int rcnt = 0;
char rbuf[BUF_SIZE];
fp = fopen("readme.txt", "r");
if (NULL == fp)
{
printf("fopen error.\n");
return -1;
}
printf("--------------------------\n");
memset(rbuf, 0, BUF_SIZE);
fseek(fp, 0, SEEK_SET);
rcnt = fread(rbuf, 1, BUF_SIZE, fp);
printf("read cnt = %d\n", rcnt);
printf("%s\n", rbuf);
return 0;
}
Such simple code, and the expected result is:
--------------------------
read cnt = 17
1234;
abcd;
ABCD;
That is 17 in total: 15 characters and 2 '\n'.
But I got the below result:
--------------------------
read cnt = 17
1234;
abcd;
ABCD;D;
PS: If I call fopen with "rb", or define BUF_SIZE smaller, I get the correct result.
fread() doesn't produce a NUL-terminated string, but printf("%s") requires one.
You have to add a '\0' at the end of the data that was read: rbuf[rcnt] = '\0'.
And remember to read one byte less than the buffer size to leave room for the NUL byte.
I think it's wrong to use fread(), a binary reading API, with a text file. The default mode (if you just say "r") is text.
Note that FILE * I/O in text mode typically does line-termination translation, so that you can pretend that lines end with \n when they might in fact physically end with \r\n (as yours do).
This conversion might introduce confusion somewhere; which is why switching to binary mode makes it work again as no such translation happens in binary mode.
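The termination advice can be sketched as follows. The helper name read_text is hypothetical; it reads at most size - 1 bytes and terminates the buffer at the count fread actually returned, instead of relying on an earlier memset.

```c
#include <stdio.h>

/* Hypothetical helper: read up to size - 1 bytes of path into buf and
   NUL-terminate it. Returns the number of bytes stored, or -1 on failure. */
long read_text(const char *path, char *buf, size_t size)
{
    if (size == 0)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* Leave one byte free for the terminator. */
    size_t n = fread(buf, 1, size - 1, fp);

    /* Terminate at the returned count, whatever else the buffer holds. */
    buf[n] = '\0';

    fclose(fp);
    return (long)n;
}
```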

fwrite a pid_t not working

I have the following code:
...
printf("Started %d", pid);
FILE * fh;
fh = fopen("run/source.pid", "wb");
fwrite(&pid, sizeof(int), 1, fh);
fclose(fh);
However, the pid file that gets written contains garbage rather than the integer. I thought pid_t was just an int; I even tried sizeof(pid_t) for the second argument and got similar results.
Any ideas? Thanks for the help in advance.
Thanks
Well, I don't fully understand the question (too little context), but the issue may be that you are viewing the file in a text editor, terminal, etc.
fwrite() writes raw data. For example, suppose you have a pid number, let's say 12, and you write it using fwrite like this:
fwrite(&pid, sizeof(int), 1, file);
fwrite() will write a 32-bit integer into the file, that is, depending on your processor type, a byte sequence like 00 00 00 0C (big-endian) or 0C 00 00 00 (little-endian).
However, fprintf() with a %d format will write the byte sequence 31 32 in hex (decimal 49 50, the ASCII codes for '1' and '2'), which reads as "12" in any terminal or text editor.
Hope this helps.
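A minimal sketch of the fprintf approach, assuming the goal is a pid file that a person or a shell script can read. The helper name write_pid_file is made up; the cast to long is there because pid_t is a signed integer type whose exact width is not specified.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write pid to path as decimal text followed by a
   newline. Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path, pid_t pid)
{
    FILE *fh = fopen(path, "w");
    if (fh == NULL)
        return -1;

    /* Text output: pid 12 becomes the characters '1', '2', '\n'. */
    int ok = fprintf(fh, "%ld\n", (long)pid) > 0;

    if (fclose(fh) != 0)
        return -1;
    return ok ? 0 : -1;
}
```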

Resources