Trouble understanding fseek offset - c

I have a text file, where each line is an integer with a newline character. I also have a .bin file with the same thing.
10
20
30
40
50
60
70
Running this code...
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    int input;
    FILE *infile_t = fopen("numbers.txt", "r");
    FILE *infile_b = fopen("numbers.bin", "rb");
    if (infile_t == NULL) {
        printf("Error: unable to open file %s\n", "numbers.txt");
        exit(1);
    }
    if (infile_b == NULL) {
        printf("Error: unable to open file %s\n", "numbers.bin");
        exit(1);
    }
    printf("Enter an integer index: ");
    while(scanf("%d",&input) != EOF){
        int ch;
        fseek(infile_t, (input*sizeof(int))-1, SEEK_SET);
        fscanf(infile_t, "In text file: %d\n", &ch);
        printf("In text file: %d\n", ch);
        fseek(infile_b, (input*sizeof(int))-1, SEEK_SET);
        fscanf(infile_b, "%d\n", &ch);
        printf("In binary file: %d\n", ch);
        printf("Enter an integer index: ");
    }
    fclose(infile_t);
    fclose(infile_b);
    return 0;
}
and entering 0, 1, 2, 3, 4 consecutively, I get the outputs:
10
0
40
50
0
I am trying to read the file by 4 bytes at a time (each int) and print the integer. What am I doing wrong and if this is bad practice, what would be better?

There is a difference between the textual representation of numbers and their binary representation.
Your input is a text file, which is a sequence of characters:
"10lf20lf30lf40lf50lf60lf70lf"
Its size is 21 bytes, which you could check with your file explorer.
And as bytes in a tabular form it looks like this, assumed that you are using ASCII and a unix-like system:
Offset  Bytes     Text
0       31 30 0A  "10lf"
3       32 30 0A  "20lf"
6       33 30 0A  "30lf"
9       34 30 0A  "40lf"
12      35 30 0A  "50lf"
15      36 30 0A  "60lf"
18      37 30 0A  "70lf"
There are no integers stored in binary form in your input file.
The function fseek() places the "cursor" into the file at the specified offset.
Then you call fscanf() to scan and interpret(!) the sequence of characters that starts at that offset.
Input  Offset set by fseek()  Text         Resulting value
0      0                      "10lf..."    10
1      4                      "0lf..."     0
2      8                      "lf40lf..."  40
3      12                     "50lf..."    50
4      16                     "0lf..."     0
Since the %d conversion of fscanf() skips leading whitespace, you get "40" in the third case.
In general you cannot use fseek() to "jump" to a certain line in a text file unless you know how long each line is. In your case the length is known (3 bytes per line), so if you use a factor of 3 instead of 4, you will get what you seem to want.

I don't know what is in your 'numbers.bin', and you opened 'numbers.txt' as infile_t but didn't use it.
Assuming that 'numbers.bin' contains the text shown in your question and you open it in binary mode for reading, the bytes stored in the file are as follows (each line ends with the single byte '\n' rather than the two bytes '\r\n'):
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
Right after opening, the file pointer is at the start of the file, pointing to the text character '1' (ASCII code 0x31).
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
↑
When you call scanf("%d",&input) and enter '0', the integer variable input becomes 0. fseek(infile_b, input*4, SEEK_SET) then sets the file pointer to offset 0 relative to the beginning of the file.
The next line, fscanf(infile_b, "%d\n", &ch), reads an integer into the variable ch, so ch holds the value 10, which printf writes to standard output (stdout).
When you enter '1', the file pointer is set to offset 4, which is the fifth byte from the beginning of the file, as follows:
\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
                ↑
The byte there is the text character '0' (ASCII code 0x30), followed by a newline, so fscanf reads the integer value 0 and stores it in ch.
If you replace fseek(infile_b, input*4, SEEK_SET) with fseek(infile_b, input*3, SEEK_SET), you will get the expected output.

Related

Shifting Extended ASCII codes

When assigning Extended ASCII codes to an unsigned char, I noticed that the values are shifted upwards when they are written to a file.
I condensed my code into this simple program to briefly present my question:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    unsigned char testascii[3];
    testascii[0] = 122;
    testascii[1] = 150;
    testascii[2] = 175;
    printf("%d\n", testascii[0]);
    printf("%d\n", testascii[1]);
    printf("%d\n", testascii[2]);
    return 0;
}
If I run this simple program, I get this terminal output:
122
150
175
This is correct.
If I now add the following to the above program:
FILE *f;
f = fopen("/mystuff/testascii", "wb");
if (f == NULL)
{
    printf("Error opening file\n");
    exit(1);
}
fwrite(testascii, 1, 3, f);
fclose(f);
It runs correctly but if I now go to the O/S and run:
od -c testascii
I get this output:
0000000 z 226 257
0000003
As you can see the Standard ASCII code (below 128) is correctly shown; however the Extended ASCII codes (above 127) are changed. I expect them to be 150 and 175 but they are 226 and 257.
If I remove the binary flag from the file open command, the result is still the same.
As a final check, instead of the binary print (fwrite), I changed the code again and looped through the array and did a fprintf of each item like this:
fprintf (fp, "%d", appendtxt[i]);
Here's the OD display for that:
0000000 1 2 2 1 5 0 1 7 5
0000011
This all tells me that the binary print (fwrite) isn't doing what I expected. It's my understanding that fwrite writes the binary data to the file. In that case, why does it successfully write a value less than 128 but fail with values equal to or greater than 128?
Environment:
Code::Blocks 16.01
Centos 7.1
Note: I did find this similar question: fwrite with non ASCII characters but it didn't seem to help with my situation. I could be wrong. Please let me know if I missed something in that post?
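You are printing in octal: od -c shows every byte that has no printable representation as a three-digit octal number. The file is fine, since 226 octal is 150 decimal and 257 octal is 175 decimal. Running od -t u1 testascii shows the bytes in decimal instead.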

Converting HEX data in a file to ascii

I want to decode an ASN.1 standard binary file. I have converted the binary file to hex and stored it in a file. Now I want to convert this hex to ASCII. The problem I'm having is how to read the hex file.
The file looks like this:
81 01 32 82 0D 35 31 34 32 34 31 38 38 38 where 81 is a header, 01 is the size and 32 is the data. Then 82 is the next header, and so on. How do I read from this file and differentiate between the various fields?
I searched all over the internet for this but couldn't find a satisfactory answer. Can someone help me with the way forward? I don't want any code, just the procedure for doing it.
I would first read the header and then in a loop the data. You can read hexadecimal numbers with the "x"-specifier (say your file is named hexfile.txt):
#include <stdio.h>
int main (void)
{
    FILE *stream;
    unsigned int h, l, d;
    if( (stream = fopen( "hexfile.txt", "r" )) == NULL ) return 1;
    /* Each field starts with a header byte and a length byte. */
    while (2 == fscanf (stream, " %x %x", &h, &l))
    {
        printf ("%02X %02X\n",h,l);
        /* Read the l data bytes that belong to this field. */
        for (unsigned i=0; i<l; ++i)
        {
            if (1 != fscanf (stream, " %x", &d)) break;
            printf ("%02X ",d);
        }
        puts ("");
    }
    fclose (stream);
    return 0;
}

Reading in a text file to be formatted and output (C programming)

I'm working on a program for a class and it's proving much more difficult than I thought.
It's my very first experience with C, but I've had some experience with Java so I understand the general concepts.
My goal: read in a text file that has some formatting requirements contained within the file, store the file in an array, apply the formatting and output the formatted text to stdout.
The problem: reading in the file is easy, but I'm having trouble with the formatting and output.
The challenge:
--the input file will begin with ?width X, ?mrgn Y, or both (where X and Y are integers). these will always appear at the beginning of the file, and will each be on a separate line.
--the output must have the text formatted as per the formatting request (width, margin).
--additionally, there is a 3rd format command, ?fmt on/off, which can appear multiple times at any point throughout the text and will turn formatting on/off.
--just a few catches: if no ?width command appears, formatting is considered off and any ?margin commands are ignored.
--if a ?width command appears, formatting is considered on.
--the files can contain whitespace (as both tabs and spaces) which must be eliminated, but only when formatting is on.
--dynamic memory allocation is not allowed.
Easy for a first C program right? My professor is such a sweety.
I've been working on this code for hours (yes, I know it doesn't look like it) and I'm making little progress so if you feel like a challenge I would very much appreciate help of any kind. Thanks!
So far my code reads in the text file and formats the margin correctly. Can't seem to find a good way to eliminate whitespace (I believe tokenizing is the way to go) or do the word wrap when the length of the line is longer than the ?width command.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LEN 133 /* max 132 char per line plus one extra for the new line char */
#define MAX_LINES 300 /* max 300 lines per input file */
#define MAX_CHARS 39900 /* max number of characters in the file */

/* Initializes array used to store lines read in from text
   file as well as output array */
char input[MAX_LINE_LEN];
char buffer[MAX_CHARS];
char word_wrap[MAX_LINE_LEN];

/* Functions */
void parameters(char [], FILE *);

/* Variables */
int width = 0;
int margin = 0;

/*
   argc is the count of input arguments
   *argv is a pointer to the input arguments
*/
int main (int argc, char *argv[])
{
    /* Creates file pointer */
    FILE *fp = fopen(argv[1], "r"); /* r for read */

    if (!fp) /* Error checking */
    {
        printf("Error: Could not open file");
        return 0;
    }

    /* Retrieves width and margin parameters from input file */
    parameters(input, fp);

    fclose(fp); /* Closes file stream */

    return 0;
}

void parameters(char input[], FILE *fp)
{
    /* Gets input file text line by line */
    while (fgets (input, 133, fp) != NULL)
    {
        /* Creates a pointer to traverse array */
        char *p = input;

        /* Checks for width parameter read in from text file */
        if (input[0] == '?' && input [1] == 'w')
        {
            strtok(input, " "); /* Eliminates first token '?width' */
            width = atoi(strtok(NULL, " ")); /* Stores int value of ASCII token */
            p = NULL;
        }

        /* Checks for margin parameter read in from text file */
        if (input[0] == '?' && input[1] == 'm')
        {
            strtok(input, " "); /* Eliminates first token '?mrgn' */
            margin = atoi(strtok(NULL, " ")); /* Stores int value of ASCII token */
            p = NULL;
        }

        if (p != NULL) /* skips printing format tokens at beginning of file */
        {
            if (width == 0) /* no width command, formatting is off by default */
            {
                printf("%s", p); /* Prints unformatted line of text */
            }
            else /* formatting is on */
            {
                printf("%*s" "%s", margin, " ", p); /* Prints formatted line of text */
            }
        }
    }
}
And here's an example input file, along with its proper output:
?width 30
?mrgn 5
While there are enough characters here to
fill
at least one line, there is
plenty
of
white space which will cause
a bit of confusion to the reader, yet
the ?fmt off command means that
the original formatting of
the lines
must be preserved. In essence, the
command ?pgwdth is ignored.
Output:
While there are enough
characters here to fill
at least one line, there
is plenty of white space
which will cause a bit of
confusion to the reader,
yet the ?fmt off command means that
the original formatting of
the lines
must be preserved. In essence, the
command ?pgwdth is ignored.

EOF and PAGESIZE in mmap in C

I have this code to read a file using mmap and print it using printf. The file has 10 lines and contains the numbers 0-9, one per line.
My questions are:
1. Why doesn't my code terminate on EOF, i.e. why doesn't it stop at while (data[i]!=EOF)?
2. When I run it with while (data[i]!=EOF), the program always terminates at data[10567], whereas the page size is 4096 bytes. Does 10567 bytes have any significance?
Edit: I am not looking for alternative like using fscanf, fgets.
Thanks!
Code:
int main(int argc, char *argv[])
{
    FILE *ifp, *ofp;
    int pagesize, fd, i=0;
    char *data;
    struct stat sbuf;

    if ((ifp = fopen("/home/t/workspace/lin", "r"))==NULL)
    {
        fprintf(stderr, "Can't open input file\n");
        exit(1);
    }
    fd = fileno(ifp);
    if (stat("/home/t/workspace/lin", &sbuf) == -1)
    {
        perror("stat");
        exit(1);
    }
    pagesize = getpagesize();
    printf("page size: %d\n", pagesize);
    printf("file size: %d\n", sbuf.st_size);
    if((data = mmap((caddr_t)0, sbuf.st_size, PROT_READ, MAP_SHARED, fd, 0)) == (caddr_t)(-1))
    {
        perror("mmap");
        exit(1);
    }
    //while (data[i]!=EOF)
    while (i<=sbuf.st_size)
    {
        printf("data[%d]=%c\n", i, data[i]);
        i++;
    }
    return 0;
}
Output:
page size: 4096
file size: 21
data[0]=0
data[1]=
data[2]=1
data[3]=
data[4]=2
data[5]=
data[6]=3
data[7]=
data[8]=4
data[9]=
. . . .
data[18]=9
data[19]=
data[20]=
data[21]= // truncated my output here,
// it goes till data[10567] if I use `while (data[i]!=EOF)`
EOF is not stored in files. So there's no point comparing a byte from the file with EOF. If you use mmap, as opposed to getchar or equivalent, then you need to stat the file to find out how big it is.
Note that getc, fgetc and getchar return an int. Quoting the manpage (or the Posix standard), these functions return the next byte "as an unsigned char cast to an int, or EOF on end of file or error." The value of EOF must be such that it cannot be confused with "an unsigned char cast to an int"; typically, it is -1. It is possible for a random (signed) char to be equal to -1, so your test data[i]!=EOF may eventually turn out to be false as you scan through memory beyond the end of the file, if you don't segfault before you hit such a byte.
In Unix, text files are not necessarily terminated with NULs either. In short, you should only try to reference bytes you know to be inside the file, based on the file's size.
Your output looks correct. The only bug I see is that:
while (i<=sbuf.st_size)
should have <.
There is no EOF, such as a Control-Z, stored in the actual data. All standard functions such as getc return EOF once their internal counter (the equivalent of your i) moves past their own equivalent of sbuf.st_size. That is to say, EOF is a fictitious value generated by getc and/or the OS.
The confusion perhaps arises because, if I recall correctly, MS-DOS text files actually contain a ^Z, and if you inadvertently fopen one in binary mode, you can see this unwanted ^Z. Unix does not have this distinction.
With respect to your question:
Does 10567 bytes have any significance ?
I would say no. My guess is that data[10567] happens to be the first byte of memory equal to 0xFF, which is promoted to -1 (assuming your char is signed), which matches EOF.

fread the same file, but return different result

Today I read a blog post titled "a bug of fread?". I couldn't find the reason for the behavior it describes, so I am pasting it here in the hope that someone can explain it.
The purpose of the program is to read a file (readme.txt) and print its content. I tested it with Visual Studio 2010.
The content of the readme is :
1234;
abcd;
ABCD;
The hex value of readme is :
31 32 33 34 3b 0d 0a 61 62 63 64 3b 0d 0a 41 42 43 44 3b
Here is the code:
#include <stdio.h>
#include <string.h>
#define BUF_SIZE 1024
int main()
{
    FILE *fp = NULL;
    int rcnt = 0;
    char rbuf[BUF_SIZE];
    fp = fopen("readme.txt", "r");
    if (NULL == fp)
    {
        printf("fopen error.\n");
        return -1;
    }
    printf("--------------------------\n");
    memset(rbuf, 0, BUF_SIZE);
    fseek(fp, 0, SEEK_SET);
    rcnt = fread(rbuf, 1, BUF_SIZE, fp);
    printf("read cnt = %d\n", rcnt);
    printf("%s\n", rbuf);
    return 0;
}
Such a simple code, and the expected result is :
--------------------------
read cnt = 17
1234;
abcd;
ABCD;
The count of 17 consists of 15 characters and 2 '\n'.
But I got the below result:
--------------------------
read cnt = 17
1234;
abcd;
ABCD;D;
PS: If I call fopen with "rb", or if I define the macro BUF_SIZE to be smaller, I get the correct result.
fread() doesn't return a NUL-terminated string, but printf("%s") expects one.
You have to add a '\0' at the end of the data that was read: rbuf[rcnt] = '\0'.
And remember to read one byte less than the buffer size to leave room for the NUL byte.
I think it's wrong to use fread(), a binary reading API, with a text file. The default mode (if you just say "r") is text.
Note that FILE * I/O in text mode typically does line-termination translation, so that you can pretend that lines end with \n when they might in fact physically end with \r\n (as yours do).
This conversion might introduce confusion somewhere, which is why switching to binary mode makes it work again: no such translation happens in binary mode.

Resources