For my assignment, I'm required to use fread/fwrite. I wrote
#include <stdio.h>
#include <string.h>
struct rec{
int account;
char name[100];
double balance;
};
int main()
{
struct rec rec1;
int c;
FILE *fptr;
fptr = fopen("clients.txt", "r");
if (fptr == NULL)
printf("File could not be opened, exiting program.\n");
else
{
printf("%-10s%-13s%s\n", "Account", "Name", "Balance");
while (!feof(fptr))
{
//fscanf(fptr, "%d%s%lf", &rec.account, rec.name, &rec.balance);
fread(&rec1, sizeof(rec1),1, fptr);
printf("%d %s %f\n", rec1.account, rec1.name, rec1.balance);
}
fclose(fptr);
}
return 0;
}
clients.txt file
100 Jones 564.90
200 Rita 54.23
300 Richard -45.00
output
Account Name Balance
540028977 Jones 564.90
200 Rita 54.23
300 Richard -45.00╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
╠╠ü☻§9x°é -92559631349317831000000000000000000000000000000000000000000000.000000
Press any key to continue . . .
I can do this with fscanf (which I've commented out), but I'm required to use fread/fwrite.
Why does it start with a massive number for Jones's account?
Why is there garbage after? Shouldn't feof stop this?
Are there any drawbacks to using this method, or to the fscanf method?
How can I fix these?
Many thanks in advance
As the comments say, fread reads the bytes in your file without any interpretation. The file clients.txt consists of 50 characters: 16 in the first line plus 14 in the second plus 18 in the third line, plus two newline characters. (Your clients.txt does not contain a newline after the third line, as you will soon see.) The newline character is a single byte \n on UNIX or Mac OS X machines, but (probably) two bytes \r\n on Windows machines - hence either 50 or 52 characters. Here is the sequence of ASCII bytes in hexadecimal:
3130 3020 4a6f 6e65 7320 3536 342e 3930 100 Jones 564.90
0a32 3030 2052 6974 6120 3534 2e32 330a \n200 Rita 54.23\n
3330 3020 5269 6368 6172 6420 2d34 352e 300 Richard -45.
3030 00
Your fread statement copies these bytes without any interpretation directly into your rec1 data structure. That structure begins with int account;, which says to interpret the first four bytes as an int. As one of the comments noted, you are running your program on a little-endian machine (most likely an Intel machine), so the least significant byte is the first and the most significant byte is the fourth. Thus, your fread said to interpret the sequence of four ASCII characters "100 " as the four byte integer 0x20303031, which equals, in decimal, 540028977. The next member of your struct is char name[100];, which means that the next 100 bytes of data in rec1 will be the name. But the fread was told to read sizeof(rec1)=112 bytes (4 byte account, 100 byte name, 8 byte balance). Since your file is only 50 (or 52) characters, fread will have only been able to fill in that many bytes of rec1. The return value of fread, had you not discarded it, would have told you that the read stopped short of the number of bytes you requested. Since you hit EOF, the feof call breaks out of the loop after that first pass, having consumed the entire file in one gulp.
All of your output was produced by the first and only call to printf. The number 540028977 and the following space were produced by the "%d " and the rec1.account argument. The next bit is only partly determinate, and you got lucky: The "%s" specifier and the corresponding rec1.name argument will print the next characters as ASCII until a \0 byte is found. Thus, the output will begin with the 50-4 (or 52-4) remaining characters of your file -- including the two newlines -- and potentially continue forever, because there are no \0 bytes in your file (or in any text file). After the last character of your file, what you are seeing is whatever garbage happened to be in the automatic variable rec1 when your program started. (That kind of unintentional output is similar to the famous Heartbleed bug in OpenSSL.) You were lucky the garbage included a \0 byte after only a few dozen more characters. Note that printf has no way to know that rec1.name was declared to be only a 100 byte array -- it only got the pointer to the beginning of name -- it was your responsibility to guarantee that rec1.name contained a terminating \0 byte, and you never did that.
We can tell a little bit more. The number -9.2559631349317831e61 (which is pretty ugly in "%f" format) is the value of rec1.balance. The 8 bytes for that double value on an IEEE 754 machine (like your Intel and all modern computers) are in hex 0xcccccccccccccccc. Sixty four of the peculiar ╠ symbol appear in the "%s" output corresponding to rec1.name, while only 100-46 = 54 characters remain of the 100, so your "%s" output has run off the end of rec1.name, and includes rec1.balance into the bargain, and we learn that your terminal program interpreted the non-ASCII character 0xcc as ╠. There are many ways to interpret bytes bigger than 127 (0x7f); in latin-1 it would have been Ì for example. The graphical character ╠ is the representation of the 0xcc (204) byte in the ancient MS-DOS character set, Windows code page 437. Not only are you running on an Intel machine, it is a Windows machine (of course the most likely possibility to begin with).
That answers your first two questions. I'm not sure I understand your third question. The "drawbacks" I hope are obvious.
As for how to fix it, there is no reasonably simple way to read and interpret a text file using fread. To do so, you would need to duplicate much of the code in the libc fscanf function. The only sensible way is to first use fwrite to create a binary file; then fread will work naturally to read it back. So there have to be two programs -- one to write a binary clients.bin file, and a second to read it back. Of course, that does not solve the problem of where the data for that first program should come from in the first place. It could come from reading clients.txt using fscanf. Or it could be included in the source code of the fwrite program, for example by initializing an array of struct rec like this:
struct rec recs[] = {{100, "Jones", 564.90},
{200, "Rita", 54.23},
{300, "Richard", -45.00}};
Or it could come from reading a MySQL database, or... The one place it is unlikely to originate is in a binary file (easily) readable with fread.
I came across the following question:
If a file contains the line "I am a boy\r\n", then on reading this line into the array str using fgets(), what will str contain?
[A]. "I am a boy\r\n\0"
[B]. "I am a boy\r\0"
[C]. "I am a boy\n\0"
[D]. "I am a boy"
The answer has been given as option c with the explanation
Declaration: char *fgets(char *s, int n, FILE *stream);
fgets reads characters from stream into the string s. It stops when it reads either n - 1 characters or a newline character, whichever comes first.
However, I couldn't understand how \r (carriage return) will influence fgets. I mean, shouldn't it be that first "I am a boy" is read, then on encountering \r the cursor is set at the initial position and the "I" from "I am a boy" is overwritten by \n and the space following "I" is overwritten by \0?
Any help is deeply appreciated.
P.s: My claim is based on the explanation given on this link: https://www.quora.com/What-exactly-is-r-in-the-C-language
First, every time you see a multiple choice quiz on some programming website, I recommend you close the tab and do something productive instead such as watching videos of kittens. Because the questions seem to be just some variants of
Which of these is the first letter of the alphabet (only one is right)
A
a
6
a
the letter a
all of the above.
Carriage returns and line feeds do not affect the input read by a C program in that way. Each additional byte is simply stored after the previous bytes. Beyond that, this is a very badly phrased question, as the answer could be any of A, B, C or D, or maybe none of them. Saying that C is the only one that is right is wrong.
The first question is what it means that "the file contains \r\n". Here I assume that the author meant that the file contains the 10 characters I am a boy followed by ASCII 13 and ASCII 10 (carriage return and line feed).
In C there are two translation modes for reading files, text mode and binary mode. On POSIX systems (all those operating systems with X in their name, except for Windows eXcePtion) these are equal - the text mode is ignored. So when you read the line into a buffer with fgets on POSIX, it will look for that line feed and store all characters as is, including the carriage return, so the buffer will have the following sequence of bytes: I am a boy\r\n\0. Therefore A could be true.
But on Windows, the text mode translates the carriage return and the linefeed to one newline character with ASCII value 10 in memory, so what you will have is I am a boy\n\0. Therefore C could be true. If your file was opened in binary mode, you'll still have I am a boy\r\n\0 - so how'd you claim that C is the only one that can be true?
If the string that you'd read with fgets would be I am a boy\r\n (POSIX or binary mode) but you told fgets your buffer has space for only 12 characters, then you'd get 11 characters of the input and the terminating \0, and therefore you'd have I am a boy\r\0. The line feed character would remain in the stream. Therefore B could be true. B cannot be true if you indicated that the buffer has more space.
Finally any of these array contents does contain the string I am a boy, therefore D would be true in all of the cases above.
And if your buffer didn't have enough space for 10 characters and the terminator then you'd have some prefix of the contents, such as I am a bo followed by \0 which means that none of these was true.
I have an assignment and basically i want to read all the bytes from an audio file using getchar() like this:
while ((ch = getchar()) != EOF)
At some point I have to read 4 consecutive bytes that stand for size of file and I can't understand the following:
If the file my program is reading is for example 150 bytes in size, that is enough to be stored in 1 of the 4 bytes, which means that 3 of the bytes will be 0 and the last one will be 150 in that case. I understand that I need to read all 4 bytes, through 4 repetitions of the while in the above section of code, in order to get all the information I need, but what exactly is getchar() going to return to my variable, as it returns the ASCII code for the character it just read?
Also what happens for larger numbers, that can't be stored in a single byte?
I can't comment since I don't have enough reputation. I'm perplexed by your question, as I don't understand what you mean or what you are trying to achieve.
The function getchar() returns a single byte at a time, as an int. A multi-byte character therefore does not arrive in one call: each call returns the next byte of its encoding. Here is the simple code I used to check this:
int c;
printf("Enter character: ");
while ((c = getchar()) != EOF && c != '\n')
    printf("%d ", c);
The character I used (which will probably not display properly here) is the Stack Overflow glyph I use in my polybar, 溜; here it shows as an Asian character. On a UTF-8 terminal it comes back as several byte values, one per getchar() call.
Not only that, but getchar will return EOF when arriving at the end of the file (or when an error occurs), as stated in the Linux manual
https://linux.die.net/man/3/getchar
Also, upon further reading, it depends on how the file stores the data: if it's big endian the four bytes read will be 0,0,0,150, else if it's little endian they will be 150,0,0,0, assuming you read one byte at a time and not 4 at once as you described.
As for the "solution" to your question, why not use fread() (or a derivative) to read the 4 bytes at once, since it does the job properly?
EDIT
As asked in the comment, the following "concatenates" the values bit-wise. I used scanf because I was too lazy to manually check every ASCII key. This assumes the file is big endian, i.e. 0,0,0,150; otherwise invert the order in which the << is done and it should "just werk™".
#include <stdio.h>

/* The four bytes of the size field, in the order they appear in the file. */
unsigned char c[4];

/* Combine the bytes big-endian: c[0] is the most significant byte. */
unsigned int dosomething(void)
{
    return (unsigned int)c[0] << 24 | (unsigned int)c[1] << 16
         | (unsigned int)c[2] << 8 | (unsigned int)c[3];
}

int main(void)
{
    for (size_t i = 0; i < 4; i++) {
        printf("Enter byte value (0-255): ");
        /* %hhu matches unsigned char; %u would write past c[i]. */
        if (scanf("%hhu", &c[i]) != 1)
            return 1;
        printf("%u\n", c[i]);
    }
    printf("%u\n", dosomething());
    return 0;
}
Now for fread: it is used like this: fread(pointer_to_buffer, size_of_each_element, number_of_elements, file_pointer); it returns the number of elements it actually read.
For an in-depth look, here is the manual:
https://www.tutorialspoint.com/c_standard_library/c_function_fread.htm
This should be asked in a different thread, as I feel it is another question.
If the file my program is reading is for example 150 bytes in size, that is enough to be stored in 1 of the 4 bytes, which means that 3 of the bytes will be 0 and the last one will be 150 in that case. I understand that I need to read all 4 bytes in order to get all the information I need, but what exactly is getchar() going to return to my variable, as it returns the ASCII code for the character it just read?
getchar doesn't know anything about ASCII. It returns the numeric value of the byte it reads, or a special code, represented by EOF, if it cannot read a byte. If you treat the byte as an ASCII code then that's a matter of interpretation.
Thus, if your file size is encoded as three zero bytes followed by one byte with value 150, then getchar() will return that as 0, 0, 0, and 150 on four consecutive calls.
I am trying to learn C and have received a homework assignment to write code which can read data from a .txt file and print out particular lines.
I wrote the following:
#include <stdio.h>
void main() {
char str[5];
FILE *fp;
fp=fopen("data.txt","r");
int i;
for (i=1;i<=5;i++){
fgets(str,5,fp);
printf("%d \n",i);
if (i==1||i==3||i==5) {
printf("%s \n \n",str);
}
}
}
The file data.txt is just the following:
3.21
5.22
4.67
2.31
2.51
1.11
I had read that each time fgets is run, the pointer is updated to point to the next line. I thought I could keep running fgets and then only print the string str when at the correct value for i (the line I want output on the console).
It partially worked, here is the output:
1
3.21
2
3
5.22
4
5
4.67
Process returned 8 (0x8) execution time : 0.024 s
Press any key to continue.
It did only print when i had the correct values, but for some reason it only printed the first 3 lines, even though fgets was supposed to have been run 5 times by the last iteration, and so the pointer should have been reading the last line.
Can someone explain why the pointer did not update as expected and if there is an easier way to slice or index through a file in c.
You need to account for (at least) two additional characters, in addition to the numbers you have in the file. There is the end-of-line delimiter (\n on UNIX/Mac, or possibly \r\n on Windows... so maybe 3 additional characters), plus (from the fgets documentation):
A terminating null character is automatically appended after the characters copied to str.
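A lot of the C functions that manipulate character arrays (i.e. strings) will give you this extra null "for free" and it can be tricky if you forget about it. With char str[5], fgets(str,5,fp) stores at most 4 characters, so the first call returns "3.21" and leaves the newline in the stream; the second call returns just that newline, the third returns "5.22", and so on. That is why five calls only got through the first three lines of the file.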
Also, a better way to loop over the lines might be:
#define MAX_CHARS 7
char buf[MAX_CHARS];
while((fgets(buf, MAX_CHARS, fp)) != NULL) {
printf("%s\n", buf);
}
It's still not the best way to do it (no error checking) but a little more compact/readable and idiomatic C, IMO.
I've got a little problem while experimenting with some C code. I've tried to use read()-Command to read a text out of a file and store the results in a charArray. But when I print the results they're always different from the file.
Here is the code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
void main() {
int fd = open("file", 2);
char buf[2];
printf("Read elements: %ld\n", read(fd, buf, 2));
printf("%s\n", buf);
close(fd);
}
The file "file" was created in the same directory using the following UNIX commands:
cat > file
Hi
So it contains just the word "Hi". When I run it, I expect it to read 2 bytes from the file (which are 'H' and 'i') and store them at buf[0] and buf[1]. But when I print the result, there appears to be an issue, because besides the word "Hi" several weird characters are printed (indicating a memory reading/writing problem, I guess, due to a bad buffer size). I've tried to increase the size of the buf array, and when I change the size, the weird characters printed change. The problem disappears when the size reaches 32 bytes.
Can someone explain to me in detail why this is happening?
I've understood so far that read() does not add a '\0' when it reads something, and that the third parameter of read() indicates the maximum number of bytes to read.
Another thing I've noticed while experimenting with the above code is the following: Let's assume one changes the third parameter (maximum bytes to read) of read() to 3, and the size of the buf array to 512 (overkill, I know, but I really wanted to see what would happen). Now read will actually read a third character (in my case 'e') and store it in the buffer, even though this third character does not exist.
I've searched Stack Overflow for a while now and found many similar cases, but none of them made me understand my problem. If there is any other thread I missed, it would be a pleasure if you could link me to it.
Finally: sorry for my bad English, it's not my native language.
Clearly you need to make buf 3 bytes long and use the last byte as the null byte (0 or '\0'). That way, when you print the string, your computer doesn't carry on until it finds another 0!
The way strings (char arrays, really) are handled in C is quite straightforward. Most if not all string functions assume that their string parameters are null terminated (puts), or return null-terminated strings (strdup).
The point is that, by default, the computer can't tell where a string ends unless it is given the string's size each time it processes it. The easiest way around this was to append a 0 (the null byte) after each string. That way, the computer just needs to iterate over the string's characters and stop when it finds the termination character (another name for the null byte).
I encrypted a text file using an offset cipher in C. For this, I simply added 128 to each character and got the file size decreased by 3 bytes. I tried the same on some other files too just to get the same result, i.e. decrease in file size by 3 bytes. I got the original size after decryption.
Could you please tell me why does it so happen?
Code for the main logic is given below:
while((ch=fgetc(fs))!=EOF){
fputc(ch+128, ft);
}
Could you please tell me why does it so happen?
Your ch probably has the wrong declaration. The fgetc() function returns an int, not a char, and if you cast to char you will lose the distinction between (char) 0xff and EOF.
// WRONG WRONG WRONG
// char ch = fgetc(fs);
The right declaration:
int ch = fgetc(fs);
Otherwise, it shouldn't happen. Is your process exiting cleanly? If you abort(), then there might be data still in FILE * buffers. Show more code. Run with Valgrind. Check the exit status of your process.
I think the file size should have doubled as two bytes were taken for one character after encryption as something greater than 127 can not be stored in 1 byte.
No, fputc() does not work that way. The fputc() man page (run man fputc in a terminal, unless on Windows):
fputc() writes the character c, cast to an unsigned char, to stream.
Conversion to unsigned char is done by taking the value modulo 256*. So fputc() always writes exactly one byte of data (unless it fails).
* This is true on all but exceedingly rare systems.
If you talk about Windows, I could imagine that you have opened the file in text mode, not in binary mode.
That leads to the following:
Writing \n leads to a \r\n written to the file.
Reading \r\n from the file gives only \n to the user.
Reading stops at the first \x1A, being an EOF character.
If you add 128 to each byte, the data-to-be-written rolls over at 256. Calling fputc() with a value > 255 is well defined, because the value is converted to unsigned char, but writing (ch+128)%256 or (ch+128) & 0xFF makes the wraparound explicit. Either way, the value written is wrapped modulo 256, and thus you may get \n or \x1A by accident.