EOF and PAGESIZE in mmap in C

EOF and PAGESIZE in mmap in C - c

I have this code to read a file using mmap and print it using printf. The file has 10 lines, and contains nos 0-9 on each line.
My questions are:
1. Why my code doesn't terminate on EOF ? i.e. why doesn't it stop at while (data[i]!=EOF) ?
2. When I run it with while (data[i]!=EOF), the program always terminates at data[10567] ? where as the page size is 4096 bytes. Does 10567 bytes have any significance ?
Edit: I am not looking for alternative like using fscanf, fgets.
Thanks!
Code:
10 int main(int argc, char *argv[])
11 {
12 FILE *ifp, *ofp;
13 int pagesize, fd, i=0;
14 char *data;
15 struct stat sbuf;
16
18 if ((ifp = fopen("/home/t/workspace/lin", "r"))==NULL)
19 {
20 fprintf(stderr, "Can't open input file\n");
21 exit(1);
22 }
28 fd = fileno(ifp);
29 if (stat("/home/t/workspace/lin", &sbuf) == -1)
30 {
31 perror("stat");
32 exit(1);
33 }
34 pagesize = getpagesize();
35 printf("page size: %d\n", pagesize);
36 printf("file size: %d\n", sbuf.st_size);
37 if((data = mmap((caddr_t)0, sbuf.st_size, PROT_READ, MAP_SHARED, fd, 0)) == (caddr_t)(-1))
38 {
39 perror("mmap");
40 exit(1);
41 }
43 //while (data[i]!=EOF)
44 while (i<=sbuf.st_size)
45 {
46 printf("data[%d]=%c\n", i, data[i]);
47 i++;
48 }
50 return 0;
51 }
Output:
page size: 4096
file size: 21
data[0]=0
data[1]=
data[2]=1
data[3]=
data[4]=2
data[5]=
data[6]=3
data[7]=
data[8]=4
data[9]=
. . . .
data[18]=9
data[19]=
data[20]=
data[21]= // truncated my output here,
// it goes till data[10567] if I use `while (data[i]!=EOF)`

EOF is not stored in files. So there's no point comparing a byte from the file with EOF. If you use mmap, as opposed to getchar or equivalent, then you need to stat the file to find out how big it is.
Note that getc, fgetc and getchar return an int. Quoting the manpage (or the Posix standard), these functions return the next byte "as an unsigned char cast to an int, or EOF on end of file or error." The value of EOF must be such that it cannot be confused with "an unsigned char cast to an int"; typically, it is -1. It is possible for a random (signed) char to be equal to -1, so your test data[i]!=EOF may eventually turn out to be true as you scan through uninitialized memory, if you don't segfault before you hit the random byte.
In Unix, text files are not necessarily terminated with NULs either. In short, you should only try to reference bytes you know to be inside the file, based on the file's size.

You output looks correct. The only bug I see is that:
while (i<=sbuf.st_size)
should have <.
There is no EOF, such as a Control-Z, stored in the actual data. All standard functions such as getc will return EOF when their internal counter equivalent to your i is past but their own sbuf.st_size. That is to say, EOF is a fictitious character generated by getc and/or the OS.
The confusion perhaps arises because, if I recall correctly, MS-DOS text files actually contain a ^Z, and if you inadvertently fopen one in binary mode, you can see this unwanted ^Z. Unix does not have this distinction.
With respect to your question:
Does 10567 bytes have any significance ?
I would say no. My guess is that data[10567] happens to be the first byte of memory equal to 0xFF, which is promoted to -l (assuming your char is signed), which matches EOF.

Related

Trouble understanding fseek offset

I have a text file, where each line is an integer with a newline character. I also have a .bin file with the same thing.
10
20
30
40
50
60
70
Running this code...
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
int input;
FILE *infile_t = fopen("numbers.txt", "r");
FILE *infile_b = fopen("numbers.bin", "rb");
if (infile_t == NULL) {
printf("Error: unable to open file %s\n", "numbers.txt");
exit(1);
}
if (infile_b == NULL) {
printf("Error: unable to open file %s\n", "numbers.bin");
exit(1);
}
printf("Enter an integer index: ");
while(scanf("%d",&input) != EOF){
int ch;
fseek(infile_t, (input*sizeof(int))-1, SEEK_SET);
fscanf(infile_t, "In text file: %d\n", &ch);
printf("In text file: %d\n", ch);
fseek(infile_b, (input*sizeof(int))-1, SEEK_SET);
fscanf(infile_b, "%d\n", &ch);
printf("In binary file: %d\n", ch);
printf("Enter an integer index: ");
}
fclose(infile_t);
fclose(infile_b);
return 0;
}
and entering 0, 1, 2, 3, 4 consecutively, I get the outputs:
10
0
40
50
0
I am trying to read the file by 4 bytes at a time (each int) and print the integer. What am I doing wrong and if this is bad practice, what would be better?

There is a difference between the textual representation of numbers and their binary representation.
Your input is a text file, which is a sequence of characters:
"10lf20lf30lf40lf50lf60lf70lf"
Its size is 21 bytes, which you could check with your file explorer.
And as bytes in a tabular form it looks like this, assumed that you are using ASCII and a unix-like system:
Offset
Bytes
Text
0
31 30 0A
"10lf"
3
32 30 0A
"20lf"
6
33 30 0A
"30lf"
9
34 30 0A
"40lf"
12
35 30 0A
"50lf"
15
36 30 0A
"60lf"
18
37 30 0A
"70lf"
There are no integers stored in binary form in your input file.
The function fseek() places the "cursor" into the file at the specified offset.
Then you call scanf() to scan and interpret(!) the sequence of characters that start at that offset.
Input
Offset set by fseek()
Text
Resulting value
0
0
"10lf..."
10
1
4
"0lf..."
0
2
8
"lf40lf..."
40
3
12
"50lf..."
50
4
16
"0lf..."
0
Since scanf() skips leading whitespace, you get "40" in the third case.
You cannot use fseek() in the general case to "jump" to a certain line in a text file. Except, that you know how long each line is. In your case this is known, and if you use a factor of 3 instead of 4, you will get what you seem to want.

I don't know what is in your 'numbers.bin', and you opened 'numbers.txt' as infile_t but didn't use it.
Assuming that the content in 'numbers.bin' is the text content in your question, and you open it in binary mode for reading, the contents stored in the file are as follows(end with one byte '\n' instead of two bytes '\r\n'):
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
At this time, the file pointer is at the head of the file, pointing to the text content '1'(ascii code is 0x31).
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
↑
when you use scanf("%d",&input) and input '0', the integer variable input will be 0, then you set the file pointer via fseek(infile_b, input*4, SEEK_SET), the file pointer will point to offset 0 relative to the beginning of the file.
Next line fscanf(infile_b, "%d\n", &ch) will read a integer value to variable ch, then ch will store the value 10 and print it to standard output (stdout) via printf.
When you enter '1', the file pointer will be set to 4, which will point to the fifth byte position relative to the beginning of the file, as follows:
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
↑
The ascii code of the text value '0' is 0x30. It will read an integer value 0 and store it in ch.
You can replace fseek(infile_b, input*4, SEEK_SET) with fseek(infile_b, input*3, SEEK_SET), and will get the expected output.

fread the same file, but return different result

Today, I read a blog named by "a bug of fread?", I didn't find any reason for it, so I paste it here waiting for any genius.
First, the purpose of the program is to read a file(readme.txt) and print the content, and I test it with Visual Studio 2010.
The content of the readme is :
1234;
abcd;
ABCD;
The hex value of readme is :
31 32 33 34 3b 0d 0a 61 62 63 64 3b 0d 0a 41 42 43 44 3b
Here is the code:
#include <stdio.h>
#include <string.h>
#define BUF_SIZE 1024
int main()
{
FILE *fp = NULL;
int rcnt = 0;
char rbuf[BUF_SIZE];
fp = fopen("readme.txt", "r");
if (NULL == fp)
{
printf("fopen error.\n");
return -1;
}
printf("--------------------------\n");
memset(rbuf, 0, BUF_SIZE);
fseek(fp, 0, SEEK_SET);
rcnt = fread(rbuf, 1, BUF_SIZE, fp);
printf("read cnt = %d\n", rcnt);
printf("%s\n", rbuf);
return 0;
}
Such a simple code, and the expected result is :
--------------------------
read cnt = 17
1234;
abcd;
ABCD;
Total 17 count include 15 characters and 2 '\n'.
But I got the below result:
--------------------------
read cnt = 17
1234;
abcd;
ABCD;D;
PS: If call fopen function with "rb", or if define the macro BUF_SIZE smaller, I got the correct result.

fread() doesn't return a NUL terminated string, but printf("%s") ask for a NUL terminated string.
You have to add a '\0' at the end of the read buffer: rbuf[rcnt] = '\0'.
And remember to read one byte less than the buffer size to leave room for the NUL byte.

I think it's wrong to use fread(), a binary reading API, with a text file. The default mode (if you just say "r") is text.
Note that FILE * I/O in text mode typically does line-termination translation, so that you can pretend that lines end with \n when they might in fact physically end with \r\n (as yours do).
This conversion might introduce confusion somewhere; which is why switching to binary mode makes it work again as no such translation happens in binary mode.

fscanf() seg fault Program received signal EXC_BAD_ACCESS

16 char* input = (char*) argv[1];
17 FILE *fp = fopen (input, "r");
18 if( fp == NULL)
19 {
20 printf(" reading input file failed");
21 return 0;
22 }
23 fseek(fp,0,SEEK_END);
24 int file_size = ftell(fp);
29 rewind(fp);
30 int i;
31 int totalRun;
32 char * temp;
33 char* model;
34 char* example;
36 fscanf(fp,"%d",&totalRun);
37 fscanf(fp,"%s",model);
Above is my code I get this error at line 37 "fscanf(fp,"%s".model)"
Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fc00730 0x00007fff8db20bcb in __svfscanf_l ()
What can cause this ?? I looked into *fp in gdb. before reading totalRun _offset = 0 and after reading _offset = 4096. content of totalRun was correct ("3"). I only read one line and why is offset 4096? Also what is _blksize referring to in FILE.
Thank you

You need to allocate memory for model, it is an uninitialised pointer. Also ensure fscanf() does not read beyond the array assigned to model. If model does not need to by dynamically allocated then just use a local array. For example:
char model[1024];
if (1 == fscanf(fp, "%1023s", model))
{
}
Always check the return value of fscanf(), which returns the number of successful assignments, otherwise the program will be processing uninitialised variables if the call to fscanf() fails.

The variable model is not initalized. You must allocated memory for it before it can be used in the fscanf() method. You can do in two ways:
Statically - char model[1024];
Dynamically - char * model = (char*) malloc(1024); Don't forget to use free() to deallocate the buffer once you are done.

Why does a file read after calling lseek always return 0?

I cannot understand why a call to read after an lseek returns 0 number of bytes read.
//A function to find the next note for a given userID;
//returns -1 if at the end of file is reached;
//otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
int note_uid = -1;
unsigned char byte;
int length;
while(note_uid != user_uid) { // Loop until a note for user_uid is found.
if(read(fd, &note_uid, 4) != 4) // Read the uid data.
return -1; // If 4 bytes aren't read, return end of file code.
if(read(fd, &byte, 1) != 1) // Read the newline separator.
return -1;
byte = length = 0;
while(byte != '\n') { // Figure out how many bytes to the end of line.
if(read(fd, &byte, 1) != 1) // Read a single byte.
return -1; // If byte isn't read, return end of file code.
//printf("%x ", byte);
length++;
}
}
long cur_position = lseek(fd, length * -1, SEEK_CUR ); // Rewind file reading by length bytes.
printf("cur_position: %i\n", cur_position);
// this is debug
byte = 0;
int num_byte = read(fd, &byte, 1);
printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);
return length;
}
The variable length value is 34 when it exist the outer while loop and the above code produces cur_position 5 (so there are definitely at least 34 bytes after the lseek function returns), but the variable num_byte returned from function read always returns 0 even though there are still more bytes to read.
Does anyone know the reason num_byte always return 0? If it is a mistake in my code, am not seeing what it is.
Just for information, the above code was run on the following machine
$ uname -srvpio
Linux 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 GNU/Linux
Update:
I upload the full code here
This is the content of file that I try to read
$ sudo hexdump -C /var/notes
00000000 e8 03 00 00 0a 74 68 69 73 20 69 73 20 61 20 74 |.....this is a t|
00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser|
00000020 20 6e 6f 74 65 73 0a | notes.|
00000027
$

If length is an unsigned type smaller than off_t (for instance, size_t on a 32-bit machine), then length*-1 is going to be a huge value (somewhere around 4GB perhaps). This could be the problem. Storing the result of lseek into a long (again, if it's 32-bit) will apply an implementation-defined conversion, probably truncation, that leaves you with a small value again.
I see that your machine is 64-bit, but perhaps you're running a 32-bit userspace?
In any case, why not run your program under strace to see what system calls it's making? That will almost surely clear the issue up quickly.

I finally found the issue!!! I have to put #include <unistd.h> in order to use the correct lseek(). However I'm not sure why without including unistd.h it was compile-able though resulting in unexpected behavior. I thought that without including the prototype of a function, it shouldn't even compile-able.
The code was written in Hacking: The Art of Exploitation 2nd Edition by Jon Erickson and I have verified that in the book, there is no #include <unistd.h>.

With the initial variable length set to 34, the above code would
produce cur_position 5 (so there are definitely at least 34 bytes
after the lseek function returns)
This not necessarily is the case, as one could seek around beyond the end of file without getting any errors.
See the excerpt from lseek()'s man page below:
The lseek() function allows the file offset to be set beyond the
end of the file (but this does not change the size of the file).
So one could very well receive a value form lseek()ing, which still points beyond the end of the file. So read()ing from this position will still return 0 (as is's beyond end-of-file).
Also I agree with R.., that taking more care in using the correct types (the types used by the methods used) isn't a bad idea.
Update: also you might take care to include all headers for system functions you call. To check for such I strongly recommand to use gccs option -Wall to switch on all compiler warnings, they are for free ... ;-)

C: lseek() related question

I want to write some bogus text in a file ("helloworld" text in a file called helloworld), but not starting from the beginning. I was thinking to lseek() function.
If I use the following code (edited):
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#define fname "helloworld"
#define buf_size 16
int main(){
char buffer[buf_size];
int fildes,
nbytes;
off_t ret;
fildes = open(fname, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
if(fildes < 0){
printf("\nCannot create file + trunc file.\n");
}
//modify offset
if((ret = lseek(fildes, (off_t) 10, SEEK_END)) < (off_t) 0){
fprintf(stdout, "\nCannot modify offset.\n");
}
printf("ret = %d\n", (int)ret);
if(write(fildes, fname, 10) < 0){
fprintf(stdout, "\nWrite failed.\n");
}
close(fildes);
return (0);
}
, it compiles well and it runs without any apparent errors.
Still if i :
cat helloworld
The output is not what I expected, but:
helloworld
Can
Where is "Can" comming from, and where are my empty spaces ?
Should i expect for "zeros" instead of spaces ? If i try to open helloworld with gedit, an error occurs, complaining that the file character encoding is unknown.
LATER EDIT:
After I edited my program with the right buffer for writing, and then compile / run again, the "helloworld" file still cannot be opened with gedit.strong text
LATER EDIT
I understand the issue now. I've added to the code the following:
fildes = open(fname, O_RDONLY);
if(fildes < 0){
printf("\nCannot open file.\n");
}
while((nbytes = read(fildes, c, 1)) == 1){
printf("%d ", (int)*c);
}
And now the output is:
0 0 0 0 0 0 0 0 0 0 104 101 108 108 111 119 111 114 108 100
My problem was that i was expecting spaces (32) instead of zeros (0).

In this function call, write(fildes, fname, buf_size), fname has 10 characters (plus a trailing '\0' character, but you're telling the function to write out 16 bytes. Who knows what in the memory locations after the fname string.
Also, I'm not sure what you mean by "where are my empty spaces?".

Apart from expecting zeros to equal spaces, the original problem was indeed writing more than the length of the "helloworld" string. To avoid such a problem, I suggest letting the compiler calculate the length of your constant strings for you:
write(fildes, fname, sizeof(fname) - 1)
The - 1 is due to the NUL character (zero, \0) that is used to terminate C-style strings, and sizeof simply returning the size of the array that holds the string. Due to this you cannot use sizeof to calculate the actual length of a string at runtime, but it works fine for compile-time constants.
The "Can" you saw in your original test was almost certainly the beginning of one of the "\nCannot" strings in your code; after writing the 11 bytes in "helloworld\0" you continued to write the remaining bytes from whatever was following it in memory, which turned out to be the next string constant. (The question has now been amended to write 10 bytes, but the originally posted version wrote 16.)
The presence of NUL characters (= zero, '\0') in a text file may indeed cause certain (but not all) text editors to consider the file binary data instead of text, and possibly refuse to open it. A text file should contain just text, not control characters.

Your buf_size doesn't match the length of fname. It's reading past the buffer, and therefore getting more or less random bytes that just happened to sit after the string in memory.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

EOF and PAGESIZE in mmap in C - c

Related

Trouble understanding fseek offset

fread the same file, but return different result

fscanf() seg fault Program received signal EXC_BAD_ACCESS

Why does a file read after calling lseek always return 0?

C: lseek() related question

Categories

Resources