I was experimenting with creating BMP files from scratch when I found a weird bug I could not explain. I isolated the bug in this minimal program:
#include <stdio.h>
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
The output is:
success=3, pos=5
A hex dump of the test.bin file shows:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A), and precisely this value, it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per item in the fwrite arguments. That accounts for the 3 items written, as seen in the success variable, and for the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: Oh wait, I think I have an idea. Isn't this linked to \n being 0A on Unix and 0D 0A on Windows, and some inner feature of the compiler converting one to the other? How can I force it to write exactly the bytes I want?
Your file was opened in text mode, so newline translation (LF to CR-LF) is applied on every write. Open it in binary mode instead:
fopen("test.bin","wb");
You must be working on a Windows machine. On Windows, the end-of-line sequence is CR-LF (0D 0A), whereas on Unix it is a single LF (0A) character. In text mode, your C runtime replaces each 0A you write with 0D 0A.
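As a minimal sketch of the fix suggested above, the function below writes the same three bytes in binary mode and reports the position from ftell(). The function name write_three_bytes and the choice to return the position are illustrative assumptions, not part of the original program.

```c
#include <stdio.h>

/* Writes the bytes 10, 11, 10 to `path` in binary mode and returns the
 * file position reported by ftell(), or -1 on failure.
 * The name write_three_bytes is hypothetical, used only for illustration. */
long write_three_bytes(const char *path)
{
    const unsigned char bytes[3] = {10, 11, 10};
    FILE *fp = fopen(path, "wb");   /* "b" disables newline translation */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(bytes, 1, sizeof bytes, fp);
    long pos = ftell(fp);

    if (fclose(fp) != 0 || written != sizeof bytes)
        return -1;
    return pos;
}
```

With the file opened as "wb", the position after the write should be 3 on Windows as well as on Unix, because no 0D is inserted before each 0A.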
In the simple code below, I'm writing an int number (10) into a file and then reading it back to make sure it's done successfully and it is. However, when I open the file (tried both notepad++ and vscode) I see something like this:
???
Here's the code:
int main(){
int var = 10;
FILE* fp = fopen("testfile","w");
rewind(fp);
fwrite(&var,sizeof(int),1,fp);
fflush(fp);
fclose(fp);
int var2 = 0;
fp = fopen("testfile","r+"); /* reassign fp: the previous handle was closed above */
fread(&var2,sizeof(int),1,fp);
printf("num: %d\n",var2);
return 0;
}
Of course, I thought it might be written in a special format that VS Code is unable to recognize. But I recently learned to code a simple database that saved its records to files in just the same way, and when I opened its output file in VS Code, it showed both ???s and the information. Here it shows only ???s without the information. Although this seems to be a very basic problem, I can't find the answer to it: how is 10 really stored in that file? Thanks in advance.
When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the file contains:
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
Had you instead used fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush@db-centos7 ~]$ cat testfile
10
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.
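The difference between the two representations can be sketched as a single function that writes the value either way and reports the resulting file size. The name write_int and its as_text flag are hypothetical, introduced only for this illustration; the file is opened with "wb" in both cases so that no newline translation changes the byte counts.

```c
#include <stdio.h>

/* Writes `value` to `path` either as raw bytes (as_text == 0) or as
 * decimal text followed by a newline (as_text != 0).
 * Returns the resulting file size in bytes, or -1 on failure.
 * write_int is a hypothetical helper, not part of the original code. */
long write_int(const char *path, int value, int as_text)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok;
    if (as_text)
        ok = fprintf(fp, "%d\n", value) > 0;            /* '1' '0' '\n' */
    else
        ok = fwrite(&value, sizeof value, 1, fp) == 1;  /* raw object bytes */

    long size = ftell(fp);
    if (fclose(fp) != 0 || !ok)
        return -1;
    return size;
}
```

For the value 10, the raw form occupies sizeof(int) bytes (4 on the machine shown above), while the text form occupies 3 bytes: the two digit characters and the newline.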
You are writing a binary file, so a text editor cannot display it meaningfully. The value 10 is probably stored as the bytes 00 00 00 0A or 0A 00 00 00, depending on whether the system is big-endian or little-endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as one LF (line feed) character and three NUL characters, in an order that depends on the byte order of the system.
Hello, I need some help. I want to read from stdin 16 bytes at a time and convert every byte into hexadecimal form. Is there a way I can use the read() function to read NOT from the beginning but, for example, from the second byte? Also, how can I know whether I have read the whole of stdin? That way I could call this function in a loop until I have read all of it.
This is a function I made:
void getHexLine()
{
int n = 16;
char buffer[n + 1]; // one extra byte so that buffer[n] = '\0' below stays in bounds
read(STDIN_FILENO, buffer, n);
buffer[n]='\0';
//printf("%08x", 0); hex number of first byte on line - not working yet
putchar(' ');
putchar(' ');
//converting every byte into hexadecimal
for (int i = 0;i < 16;i++ )
{
printf("%02x", (unsigned char)buffer[i]); // cast avoids sign extension of bytes >= 0x80
putchar(' ');
if (i == 7 || i == 15)
putchar(' ');
}
printf("|%s|\n", buffer);
}
The output should be like this but with an option to start from second byte for example.
[vcurda@localhost proj1]$ echo "Hello, world! This is my program." | ./proj1
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 20 54 68 |Hello, world! Th|
69 73 20 69 73 20 6d 79 20 70 72 6f 67 72 61 6d |is is my program|
This is a school project, so I can't use malloc, scanf, or <string.h>. I would be really glad to get some help, and sorry for my not very understandable English.
stdin is not seekable in general. You can read bytes in, but you can't rewind or fast-forward. With the stdio functions, EOF (-1) means end of input on stdin just as with a regular file, but it's a looser concept if you are conducting an interactive dialogue with the user.
When it is attached to a terminal, stdin is basically line oriented, so it's best to use this pattern: printf() a prompt, read in a whole line from the user, printf() the results if applicable along with another prompt, read in another whole line, and so on, at least until you get used to programming with stdin.
To start from the second byte then becomes easy. Read in the whole line, then start from i = 1 instead of i = 0 as you parse it.
Is there a way I can use read() function to read NOT from the
beginning, but for example from the second byte?
Most universally, you can simply ignore the read-in bytes you're not interested in.
Sometimes you will be able to lseek, e.g. if you run your program with a regular file set to its STDIN as in:
./a.out < /etc/passwd
but lseek will fail on STDINs that are terminals, pipes, character devices, or sockets.
how can I know if I have read the whole stdin?
read will return 0 at the end of the file.
Consult the manual pages for more information.
Generally, you should check your return codes and account for short reads. Your function should probably return an int so that it has a way to communicate a possible IO error.
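The advice above (ignore the bytes you don't want, treat a return of 0 as end of input, and account for short reads) can be sketched roughly as follows. The helper names read_full and skip_bytes are hypothetical, and the sketch assumes a POSIX system; it uses neither malloc, scanf, nor <string.h>.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to `count` bytes from `fd`, retrying after short reads.
 * Returns the number of bytes read (less than `count` only at end of
 * input) or -1 on error. read_full is a hypothetical helper name. */
ssize_t read_full(int fd, unsigned char *buf, size_t count)
{
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, buf + total, count - total);
        if (n == 0)                 /* 0 means end of input */
            break;
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Discards `count` bytes from `fd` by reading them into a scratch buffer,
 * which works even when the descriptor is not seekable.
 * Returns the number of bytes actually skipped, or -1 on error. */
ssize_t skip_bytes(int fd, size_t count)
{
    unsigned char scratch[64];
    size_t skipped = 0;
    while (skipped < count) {
        size_t want = count - skipped;
        if (want > sizeof scratch)
            want = sizeof scratch;
        ssize_t n = read_full(fd, scratch, want);
        if (n < 0)
            return -1;
        if (n == 0)                 /* input ended before `count` bytes */
            break;
        skipped += (size_t)n;
    }
    return (ssize_t)skipped;
}
```

To start from the second byte, one would call skip_bytes(STDIN_FILENO, 1) once and then call read_full(STDIN_FILENO, buffer, 16) in a loop until it returns 0. A return value between 1 and 15 marks the last, partial line.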
Here's the code:
#include<stdio.h>
int main()
{
FILE *fp;
int i;
fp=fopen("DATA","w");
for(i=1;i<=30;++i)
putw(i,fp);
fclose(fp);
fp=fopen("DATA","r");
while((i=getw(fp))!=EOF)
printf("%4d",i);
fclose(fp);
return 0;
}
I don't get the expected output. The program prints the numbers only up to 25 rather than up to 30. If I set i<=20, I get the right output. I do not understand this.
Help is appreciated. Thanks!
ASCII 26 is the Ctrl-Z (aka SUB) character that on some systems is used to indicate the end of file (normally only for text files). This is the reason your program stops reading the file as soon as it sees the value 26.
The reason this becomes an issue is that you're opening the file in text mode, yet are storing binary data in it (using putw() and getw()).
To fix this, open the file in binary mode and try again.
As NPE suggested, you should work with binary files in binary mode only.
ASCII codes come into the picture here because an integer saved with putw is actually stored as a sequence of bytes (4 on most systems).
If you open your generated DATA file in any hex editor you will notice that when you saved i as 1 into the file it is actually stored as
01 00 00 00
or
00 00 00 01
according to the endian-ness of your system.
On the same basis, when you save 26 (0x1A), it is actually stored as
1A 00 00 00
or
00 00 00 1A
But when you read this file in text mode rather than binary mode, the byte with value 26 is treated as end of file on some systems, and getw returns EOF (-1).
I hope this explains the actual problem.
This will not happen if you open the file in binary mode for both writing and reading.
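Here is a minimal sketch of the binary-mode round trip. It swaps putw()/getw() for fwrite()/fread(), partly because getw() cannot distinguish a stored value of -1 from EOF; the function name roundtrip_ints is hypothetical and used only for illustration.

```c
#include <stdio.h>

/* Writes the integers 1..count to `path` in binary mode, reads them back,
 * and returns how many were read back correctly, or -1 on I/O failure.
 * roundtrip_ints is a hypothetical name, not from the original program. */
int roundtrip_ints(const char *path, int count)
{
    FILE *fp = fopen(path, "wb");           /* binary mode for writing */
    if (fp == NULL)
        return -1;
    for (int i = 1; i <= count; ++i) {
        if (fwrite(&i, sizeof i, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");                 /* binary mode for reading */
    if (fp == NULL)
        return -1;
    int value;
    int n = 0;
    /* fread() reports end of file through its return value, so a byte
     * with value 26 inside the data is never mistaken for it here. */
    while (fread(&value, sizeof value, 1, fp) == 1) {
        if (value != n + 1)
            break;
        ++n;
    }
    fclose(fp);
    return n;
}
```

Called with a count of 30, this should read back all 30 values, including 26.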
I cannot understand why a call to read after an lseek returns 0 bytes read.
//A function to find the next note for a given userID;
//returns -1 if the end of file is reached;
//otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
int note_uid = -1;
unsigned char byte;
int length;
while(note_uid != user_uid) { // Loop until a note for user_uid is found.
if(read(fd, &note_uid, 4) != 4) // Read the uid data.
return -1; // If 4 bytes aren't read, return end of file code.
if(read(fd, &byte, 1) != 1) // Read the newline separator.
return -1;
byte = length = 0;
while(byte != '\n') { // Figure out how many bytes to the end of line.
if(read(fd, &byte, 1) != 1) // Read a single byte.
return -1; // If byte isn't read, return end of file code.
//printf("%x ", byte);
length++;
}
}
long cur_position = lseek(fd, length * -1, SEEK_CUR ); // Rewind file reading by length bytes.
printf("cur_position: %i\n", cur_position);
// this is debug
byte = 0;
int num_byte = read(fd, &byte, 1);
printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);
return length;
}
The variable length has the value 34 when execution exits the outer while loop, and the above code produces cur_position 5 (so there are definitely at least 34 bytes after the position that lseek returns), but num_byte, the value returned by read, is always 0 even though there are still more bytes to read.
Does anyone know why num_byte is always 0? If it is a mistake in my code, I am not seeing what it is.
Just for information, the above code was run on the following machine
$ uname -srvpio
Linux 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 GNU/Linux
Update:
I upload the full code here
This is the content of file that I try to read
$ sudo hexdump -C /var/notes
00000000 e8 03 00 00 0a 74 68 69 73 20 69 73 20 61 20 74 |.....this is a t|
00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser|
00000020 20 6e 6f 74 65 73 0a | notes.|
00000027
$
If length is an unsigned type smaller than off_t (for instance, size_t on a 32-bit machine), then length*-1 is going to be a huge value (somewhere around 4GB perhaps). This could be the problem. Storing the result of lseek into a long (again, if it's 32-bit) will apply an implementation-defined conversion, probably truncation, that leaves you with a small value again.
I see that your machine is 64-bit, but perhaps you're running a 32-bit userspace?
In any case, why not run your program under strace to see what system calls it's making? That will almost surely clear the issue up quickly.
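The arithmetic behind this hypothesis can be sketched with the numbers from the question: the file is 39 bytes long (4-byte uid, 1 separator byte, 34-byte note), so the position before the seek is 39 and the intended offset is -34. The sketch assumes, purely for illustration, that the negative offset reaches lseek as a zero-extended 32-bit value; that is an assumption, not something the question confirms. The function names are hypothetical.

```c
#include <stdint.h>

/* Offset that lseek() would see if a negative 32-bit value were
 * zero-extended to 64 bits instead of sign-extended.
 * ASSUMPTION: this models the suspected mis-conversion only. */
int64_t zero_extended_offset(int32_t offset)
{
    return (int64_t)(uint32_t)offset;
}

/* Low 32 bits of a 64-bit file position, i.e. what survives when the
 * result is squeezed into a 32-bit variable. */
uint32_t low_32_bits(int64_t position)
{
    return (uint32_t)(position & 0xFFFFFFFF);
}
```

Under that assumption, -34 becomes 4294967262, the new position is 39 + 4294967262 = 4294967301, and its low 32 bits are 5. That is the same number a correct seek to 39 - 34 would produce, so a printed cur_position of 5 cannot by itself tell the two cases apart.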
I finally found the issue! I had to add #include <unistd.h> in order to get the correct declaration of lseek(). However, I'm not sure why the code compiled without it, even though it then behaved unexpectedly. I thought that a call to a function with no prototype in scope shouldn't compile at all.
The code was written in Hacking: The Art of Exploitation 2nd Edition by Jon Erickson and I have verified that in the book, there is no #include <unistd.h>.
With the initial variable length set to 34, the above code would
produce cur_position 5 (so there are definitely at least 34 bytes
after the lseek function returns)
This is not necessarily the case, as one can seek beyond the end of the file without getting any error.
See the excerpt from lseek()'s man page below:
The lseek() function allows the file offset to be set beyond the
end of the file (but this does not change the size of the file).
So one could very well receive a value from lseek() that points beyond the end of the file. A read() from that position will return 0 (as it's beyond end-of-file).
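This behavior can be checked with a small sketch that creates a file of known size, seeks to a chosen absolute offset, and tries to read one byte. The function name read_after_seek, the fill byte, and the -2 setup-failure code are all hypothetical choices for this illustration; a POSIX system is assumed.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates `path` with `size` bytes, seeks to absolute offset `target`,
 * and attempts to read one byte. Returns the result of read(), or -2 if
 * the setup (open, write, or lseek) fails.
 * read_after_seek is a hypothetical helper used only for illustration. */
ssize_t read_after_seek(const char *path, size_t size, off_t target)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -2;

    for (size_t i = 0; i < size; ++i) {
        unsigned char fill = 'x';
        if (write(fd, &fill, 1) != 1) {
            close(fd);
            return -2;
        }
    }

    /* Seeking past the end is allowed and does not change the file size. */
    if (lseek(fd, target, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -2;
    }

    unsigned char byte;
    ssize_t n = read(fd, &byte, 1);
    close(fd);
    return n;
}
```

For a 39-byte file, seeking to offset 5 and reading should return 1, while seeking to offset 1000 succeeds but the following read returns 0.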
Also I agree with R.., that taking more care in using the correct types (the types used by the methods used) isn't a bad idea.
Update: you should also take care to include all headers for the system functions you call. To check for this, I strongly recommend using gcc's -Wall option to switch on all compiler warnings; they are free ... ;-)
I've made a simple resource packer for packing the resources for my game into one file. Everything was going fine until I began writing the unpacker.
I noticed that the .txt file (26 bytes) that I had packed came out of the resource file fine, without any issues, with all data preserved.
However when reading the .PNG file I had packed in the resource file, the first 5 bytes were intact while the rest was completely nullified.
I traced this down to the packing process, and I noticed that fread is only reading the first 5 bytes of the .PNG file and I can't for the life of me figure out why. It even triggers 'EOF' indicating that the file is only 5 bytes long, when in fact it is a 787 byte PNG of a small polygon, 100px by 100px.
I even tested this problem by making a separate application to simply read this PNG file into a buffer, I get the same results and only 5-bytes are read.
Here is the code of that small separate application:
#include <cstdio>
int main(int argc, char** argv)
{
char buffer[1024] = { 0 };
FILE* f = fopen("test.png", "r");
fread(buffer, 1, sizeof(buffer), f);
fclose(f); //<- I use a breakpoint here to verify the buffer contents
return 0;
}
Can somebody please point out my stupid mistake?
Windows platform, I guess?
Use this:
FILE* f = fopen("test.png", "rb");
instead of this:
FILE* f = fopen("test.png", "r");
See MSDN for an explanation.
Extending the correct answer from SigTerm, here is some background of why you got the effect you did for opening a PNG file in text mode:
The PNG format explains its 8-byte file header as follows:
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.
I believe that in text mode, the call to fread() ended when it reached the Ctrl+Z character (0x1a), which is the seventh byte of the file. The five bytes you got are the four bytes before the CR-LF pair plus the single LF that the pair was translated into. Ctrl+Z was historically used in MS-DOS to indicate the end of a file, a convention that originated in CP/M, whose file system stored the size of a file as a count of blocks, not a count of bytes.
By reading the file in text mode instead of binary mode, you triggered the protection against accidentally using the TYPE command to display a PNG file.
One thing that would have helped diagnose this error is checking the return value from fread(), which you did not do. You should. Capture it like this:
...
size_t nread;
...
nread = fread(buffer, 1, sizeof(buffer), f);
so that nread is a count of the bytes actually stored in the buffer (with an element size of 1, the returned item count equals the byte count). For the PNG file in text mode, it would have told you on the first read that it only read 5 bytes. Since the file cannot be that small, you would have had a clue that something else was going on. The remaining bytes of the buffer were never modified by fread(), which would have been visible if you had initialized the buffer to some other fill value.
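Putting both pieces of advice together (binary mode plus a checked return value), a minimal sketch might look like the following. The function name read_file_bytes and its hit_error out-parameter are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Reads up to `cap` bytes from `path` in binary mode into `buf`.
 * Returns the number of bytes read; *hit_error is set to nonzero if the
 * file could not be opened or a read error occurred.
 * read_file_bytes is a hypothetical helper, not from the original code. */
size_t read_file_bytes(const char *path, unsigned char *buf, size_t cap,
                       int *hit_error)
{
    *hit_error = 0;

    FILE *f = fopen(path, "rb");    /* "rb": no CR-LF or Ctrl+Z handling */
    if (f == NULL) {
        *hit_error = 1;
        return 0;
    }

    /* Element size 1 makes the return value a byte count. */
    size_t nread = fread(buf, 1, cap, f);
    if (nread < cap && ferror(f))   /* short read: error or just EOF? */
        *hit_error = 1;

    fclose(f);
    return nread;
}
```

For the 787-byte PNG from the question, a call with a 1024-byte buffer would be expected to return 787 with no error flagged, so a result of 5 would immediately stand out.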