Here's the code:
#include <stdio.h>

int main()
{
    FILE *fp;
    int i;

    fp = fopen("DATA", "w");
    for (i = 1; i <= 30; ++i)
        putw(i, fp);
    fclose(fp);

    fp = fopen("DATA", "r");
    while ((i = getw(fp)) != EOF)
        printf("%4d", i);
    fclose(fp);
    return 0;
}
I don't get the expected output. The program prints the numbers only up to 25 rather than up to 30. If I set i<=20, I get the right output. I do not understand this.
Help is appreciated. Thanks!
ASCII 26 is the Ctrl-Z (aka SUB) character that on some systems is used to indicate the end of file (normally only for text files). This is the reason your program stops reading the file as soon as it sees the value 26.
The reason this becomes an issue is that you're opening the file in text mode, yet are storing binary data in it (using putw() and getw()).
To fix this, open the file in binary mode and try again.
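A minimal sketch of that fix (the same program, with only the fopen modes changed to binary):

#include <stdio.h>

int main()
{
    FILE *fp;
    int i;

    fp = fopen("DATA", "wb");          /* "b": no Ctrl-Z / CRLF translation */
    for (i = 1; i <= 30; ++i)
        putw(i, fp);
    fclose(fp);

    fp = fopen("DATA", "rb");
    while ((i = getw(fp)) != EOF)      /* note: EOF (-1) is also a valid data value; feof() disambiguates */
        printf("%4d", i);
    fclose(fp);
    return 0;
}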
As user NPE suggested, work with binary files in binary mode only.
The ASCII code comes into the picture because when you save an integer to the file, it is actually stored as a sequence of 4 bytes (assuming a 4-byte int).
If you open your generated DATA file in any hex editor you will notice that when you saved i as 1 into the file it is actually stored as
01 00 00 00
or
00 00 00 01
depending on the endianness of your system.
On the same basis, when you save 26 (1A in hex), it actually gets stored as
1A 00 00 00
or
00 00 00 1A
But when you read this file in text mode rather than binary mode, the byte value 26 (Ctrl-Z) is treated as end-of-file, so getw returns EOF (-1) at that point.
I hope this explains the actual problem.
This will not happen if you open your file in binary mode for both writing and reading.
Related
How to check if the current write position is at the end of file using low-level POSIX functions? The first idea is to use lseek and fstat:
off_t sk;
struct stat st;
sk = lseek(f, 0, SEEK_CUR);
fstat(f, &st);
return st.st_size == sk;
However, does st.st_size reflect the actual size, or only what has reached the disk, i.e. excluding data still held in kernel buffers?
Another idea is to use
off_t scur, send;
scur = lseek(f, 0, SEEK_CUR);
send = lseek(f, 0, SEEK_END);
lseek(f, scur, SEEK_SET);
return scur == send;
but this doesn't seem to be a fast or adequate way.
Also, both ways seem to be non-atomic, so if another process is appending to the file, the size could change after the current offset has been checked.
However, does st.st_size reflect the actual size, or only what has reached the disk, i.e. excluding data still held in kernel buffers?
I don't understand what you mean by kernel buffered data. The value in st.st_size reflects the size of the file in bytes. So, if the file has 1000000 chars, st.st_size will be 1000000, with character positions from 0 to 999999.
There are two ways to get the file size in POSIX systems:
One is to save the current position with off_t saved = lseek(fd, 0, SEEK_CUR);, then call off_t file_size = lseek(fd, 0, SEEK_END);, which returns the offset just past the last character (the file size), and finally lseek(fd, saved, SEEK_SET); to go back to where you were. If you check this, the value will match the one returned in st.st_size.
The other is to do an fstat(2) on the file descriptor and read the value you mentioned above.
The first way has some drawbacks if other threads or processes share the open file description (and thus the file offset) with you (by means of a dup(2) system call or a fork()ed process): if they do a read(2), write(2), or lseek(2) between your two lseek calls, you'll lose the position you had in the file and will be unable to return to the correct place. That is awkward, and makes the first approach not recommendable.
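A minimal sketch of the fstat-based check (the helper name is just for illustration; fd is assumed to be a valid descriptor):

#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if the current offset is at (or past) end of file, 0 if not, -1 on error. */
int at_end_of_file(int fd)
{
    struct stat st;
    off_t cur = lseek(fd, 0, SEEK_CUR);   /* current offset, without moving it */
    if (cur == (off_t)-1 || fstat(fd, &st) == -1)
        return -1;
    return cur >= st.st_size;
}

As the question itself notes, this is still not atomic: another process may append to the file between the lseek and the fstat.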
Lastly, kernel buffering has no effect on the file size: you always get the true size from stat(2). The only thing that might confuse you is the disk-space saving the kernel does for sparse files, but that is transparent to you and you don't have to account for it (except if you are going to copy the file somewhere else). Just run this tiny program:
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    lseek(fd, 1000000, SEEK_SET);        /* seek far beyond the end of the empty file */
    char string[] = "Hello, world";
    write(fd, string, sizeof string);    /* the only data actually written */
    close(fd);
}
in which you end up with a 1000013-byte file that uses only one or two blocks of disk space. That's a holed (sparse) file: there are 1000000 zero bytes before the string you wrote, and the system doesn't allocate disk blocks for them. Only when you write into those ranges does the system allocate new blocks to store your data; until then it shows you zero bytes that are not stored anywhere.
$ ll file
-rw-r----- 1 lcu lcu 1000013 4 jul. 11:52 file
$ hd file
[file]:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................
*
000f4240: 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 00 :Hello, world.
000f424d
$ _
In the simple code below, I'm writing an int (10) into a file and then reading it back to make sure it was written successfully, and it was. However, when I open the file (I tried both Notepad++ and VS Code) I see something like this:
???
Here's the code:
#include <stdio.h>

int main(){
    int var = 10;
    FILE* fp = fopen("testfile","w");
    rewind(fp);
    fwrite(&var,sizeof(int),1,fp);
    fflush(fp);
    fclose(fp);

    int var2 = 0;
    fp = fopen("testfile","r+");    /* reassign fp; without this the read would use a closed stream */
    fread(&var2,sizeof(int),1,fp);
    printf("num: %d\n",var2);
    fclose(fp);
    return 0;
}
Of course I thought maybe it's written in a special format that VS Code is unable to recognize. But recently I learned to code a simple database, and it saved its records to files in just the same way; when you opened its output file with VS Code, it showed both the ???s AND the information, whereas here it shows only ???s WITHOUT the information. So although it seems to be a very basic problem, I can't find the answer: how is 10 really stored in that file? Thanks in advance.
When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the files contains:
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
Had you instead used fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush#db-centos7 ~]$ cat testfile
10
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.
You are writing a binary file. It cannot be read with a text editor. The value 10 is probably stored as 0x0000000A or 0x0A000000, or something like that, depending on whether the system is big- or little-endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as an LF (line feed) character plus three NUL characters (in whichever order the bytes were stored).
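If you'd rather not rely on a hex-dump tool, a small sketch like this (assuming the same testfile name) prints the raw bytes the file actually contains:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("testfile", "rb");      /* binary mode: read the raw bytes */
    if (fp == NULL)
        return 1;

    int c;
    while ((c = fgetc(fp)) != EOF)
        printf("%02x ", (unsigned char)c);   /* e.g. "0a 00 00 00" on a little-endian machine */
    printf("\n");

    fclose(fp);
    return 0;
}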
I was experimenting with creating BMP files from scratch when I found a weird bug I could not explain. I isolated the bug in this minimal program:
#include <stdio.h>

int main()
{
    FILE* ptr=NULL;
    int success=0,pos=0;

    ptr=fopen("test.bin","w");
    if (ptr==NULL)
    {
        return 1;
    }

    char c[3]={10,11,10};
    success=fwrite(c,1,3,ptr);
    pos=ftell(ptr);
    printf("success=%d, pos=%d\n",success,pos);

    fclose(ptr);
    return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows the end-of-line marker is CR-LF, whereas on Unix it is a single LF character. Your C library is replacing 0A with 0D 0A because the file was opened in text mode.
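A minimal sketch of the fix (the same program with only the mode string changed; on POSIX systems the "b" is simply ignored, so this stays portable):

#include <stdio.h>

int main(void)
{
    FILE *ptr = fopen("test.bin", "wb");   /* binary mode: no LF -> CR LF translation */
    if (ptr == NULL)
        return 1;

    char c[3] = {10, 11, 10};
    size_t written = fwrite(c, 1, 3, ptr);
    long pos = ftell(ptr);
    printf("written=%zu, pos=%ld\n", written, pos);   /* both should now be 3 */

    fclose(ptr);
    return 0;
}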
I cannot understand why a call to read after an lseek returns 0 number of bytes read.
//A function to find the next note for a given userID;
//returns -1 if at the end of file is reached;
//otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
    int note_uid = -1;
    unsigned char byte;
    int length;

    while(note_uid != user_uid) {           // Loop until a note for user_uid is found.
        if(read(fd, &note_uid, 4) != 4)     // Read the uid data.
            return -1;                      // If 4 bytes aren't read, return end of file code.
        if(read(fd, &byte, 1) != 1)         // Read the newline separator.
            return -1;

        byte = length = 0;
        while(byte != '\n') {               // Figure out how many bytes to the end of line.
            if(read(fd, &byte, 1) != 1)     // Read a single byte.
                return -1;                  // If byte isn't read, return end of file code.
            //printf("%x ", byte);
            length++;
        }
    }

    long cur_position = lseek(fd, length * -1, SEEK_CUR); // Rewind file reading by length bytes.
    printf("cur_position: %ld\n", cur_position);
    // this is debug
    byte = 0;
    int num_byte = read(fd, &byte, 1);
    printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);

    return length;
}
The variable length has the value 34 when it exits the outer while loop, and the above code prints cur_position 5 (so there are definitely at least 34 bytes after the position lseek returns), but num_byte, the value returned from read, is always 0 even though there are still more bytes to read.
Does anyone know the reason num_byte always returns 0? If it is a mistake in my code, I am not seeing what it is.
Just for information, the above code was run on the following machine
$ uname -srvpio
Linux 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 GNU/Linux
Update:
I uploaded the full code here.
This is the content of the file that I am trying to read:
$ sudo hexdump -C /var/notes
00000000 e8 03 00 00 0a 74 68 69 73 20 69 73 20 61 20 74 |.....this is a t|
00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser|
00000020 20 6e 6f 74 65 73 0a | notes.|
00000027
$
If length is an unsigned type smaller than off_t (for instance, size_t on a 32-bit machine), then length*-1 is going to be a huge value (somewhere around 4GB perhaps). This could be the problem. Storing the result of lseek into a long (again, if it's 32-bit) will apply an implementation-defined conversion, probably truncation, that leaves you with a small value again.
I see that your machine is 64-bit, but perhaps you're running a 32-bit userspace?
In any case, why not run your program under strace to see what system calls it's making? That will almost surely clear the issue up quickly.
I finally found the issue!!! I have to add #include <unistd.h> in order to use the correct lseek(). However, I'm not sure why it compiled without unistd.h and merely produced unexpected behavior; I thought that without the prototype of a function it shouldn't even compile.
The code was written in Hacking: The Art of Exploitation 2nd Edition by Jon Erickson and I have verified that in the book, there is no #include <unistd.h>.
With the initial variable length set to 34, the above code would produce cur_position 5 (so there are definitely at least 34 bytes after the lseek function returns)
This is not necessarily the case, as one can seek beyond the end of a file without getting any errors.
See the excerpt from lseek()'s man page below:
The lseek() function allows the file offset to be set beyond the
end of the file (but this does not change the size of the file).
So one could very well receive a value from lseek() that still points beyond the end of the file, and read()ing from that position will return 0 (as it's beyond end-of-file).
Also, I agree with R.. that taking more care to use the correct types (the types the functions themselves use) isn't a bad idea.
Update: you might also take care to include the headers for all the system functions you call. To catch such omissions I strongly recommend gcc's -Wall option to switch on all compiler warnings; they come for free ... ;-)
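Putting both suggestions together, here is a minimal sketch of the rewind step with the proper header and types (the helper name is just for illustration; fd and length are as in the code above):

#include <stdio.h>
#include <unistd.h>     /* declares lseek() and its off_t return type */

/* Step the file offset back by 'length' bytes and report the new position. */
off_t rewind_by(int fd, int length)
{
    off_t cur_position = lseek(fd, -(off_t)length, SEEK_CUR);  /* negate as off_t, not int */
    if (cur_position == (off_t)-1)
        perror("lseek");
    else
        printf("cur_position: %lld\n", (long long)cur_position);
    return cur_position;
}

Compiling with -Wall would have flagged the implicitly declared lseek in the original code right away.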
I have a file in which I read in data. Suppose the file has the string "abcdefghij". Now, I'm going to be reading from the file at random times from different processes and they store that byte and offset somewhere. For instance, I save 'c' as my character with an offset of '3' because that is its location. For reference, I've been using lseek to get the offset in my files.
Next, I want to write this to a new file. Is it possible to write to a specific offset in an empty file? So, I want to write 'c' to position '3' in the file and then another process will write 'j' to the file at position 10.
#include <stdio.h>
int main ()
{
FILE * f = fopen ("/tmp/x.txt", "w");
fseek (f, 3, SEEK_SET);
fwrite ("c", 1, 1, f);
fseek (f, 10, SEEK_SET);
fwrite ("j", 1, 1, f);
fclose (f);
}
When this runs, the hexdump of /tmp/x.txt is
00 00 00 63 00 00 00 00 00 00 6a | ...c.... ..j
fseek is built on lseek, which lets you seek past data that hasn't been written yet, leaving "holes" (ranges of zeroes) in the file; whether those holes actually save disk space depends on the underlying file system.
It's not brilliantly clear to me from the manpage that the holes are strictly required to be zeroes, but that seems to be the case in practice.
Look into fseeko and ftello to position the stream at the desired offset. Then use fwrite.
You can also use lseek/write, if you're using fd's instead of FILE*s.
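A minimal sketch of that descriptor-based variant (same offsets as in the FILE* example; pwrite is used here because it writes at an absolute offset in one call, though plain lseek followed by write works just as well):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/x.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return 1;

    pwrite(fd, "c", 1, 3);     /* write one byte at offset 3 */
    pwrite(fd, "j", 1, 10);    /* write one byte at offset 10 */

    /* equivalent with lseek + write:
       lseek(fd, 3, SEEK_SET);  write(fd, "c", 1);
       lseek(fd, 10, SEEK_SET); write(fd, "j", 1); */

    close(fd);
    return 0;
}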