Read binary and txt File - 1 byte at a time - c

I am trying to write code that reads 1 byte at a time (the ideal goal is n bytes, but I am starting with 1 byte; if n bytes is easier, please suggest that).
Below is the code I have attempted, which should read 1 byte at a time and output it in hex format. But all I get is a bunch of FFFF.
FILE *fp;
int stringlength,i;
/* File can be txt or .bin */
fp = fopen("TestFile3.txt", "r");
if (fp == NULL)
{
puts("Error: Input file cannot be read");
return -1;
}
else
{
size_t i, strlength, lengthOfFile,c;
fseek(fp, 0, SEEK_END);
lengthOfFile = ftell(fp);
printf("length of File is ---- %d \n", lengthOfFile);
while (lengthOfFile)
{
c = fgetc(fp);
printf("%c", c);
lengthOfFile--;
}
putchar('\n');
}
fclose(fp);
return 0;
}

You need fseek(fp, 0, SEEK_SET); to reset the file pointer before the while loop.

You're also opening the file in "text" mode:
fp = fopen("TestFile3.txt", "r");
Per the C Standard, section 7.19.2:
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one correspondence
between the characters in a stream and those in the external
representation.
Using fseek()/ftell() doesn't return the number of bytes readable from a text stream.
You need to open the file in binary mode if you want to read every byte per the size of the file:
fp = fopen("TestFile3.txt", "rb");
Finally, the use of fseek()/ftell() isn't reliable on binary files either, because, again per the C standard, 7.19.9.2:
A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END
Given that, you can't reliably use fseek()/ftell() to find out how big a binary file is, either. And yes, examples do exist.
To reliably read all the bytes in a file, @Weather Vane posted one way to do that in the comments.
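A minimal sketch of reading until end of file instead of counting down from a length, assuming the goal is a hex dump. The function name dump_hex is hypothetical and not from the original post. It opens the file in binary mode and keeps the result of fgetc in an int so that EOF stays distinct from a data byte:

```c
#include <stdio.h>

/* Hypothetical helper: print every byte of `path` as two hex digits to `out`.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
long dump_hex(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;                          /* int, so EOF stays distinct from byte 0xFF */
    while ((c = fgetc(fp)) != EOF) {
        fprintf(out, "%02X ", (unsigned)c);
        count++;
    }
    fputc('\n', out);
    fclose(fp);
    return count;
}
```

Calling dump_hex("TestFile3.txt", stdout) would print the file one byte at a time. For n bytes at a time, the same loop shape works with fread into a buffer, using its return value as the number of bytes actually read.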

Related

How to know if the file end with a new line character or not

I'm trying to add a line of the form "1 :1 :1 :1" at the end of a file. The file may or may not already have a newline character at the end, and I have to handle both cases, so I came up with the following solution:
Go to the end of the file and go backward by 1 character (the length of the newline character on Linux, I guess). Read that character; if it isn't a newline, insert one and then insert the whole line, otherwise just insert the line. This is the translation of that solution into C:
int insert_element(char filename[]){
elements *elem;
FILE *p,*test;
size_t size = 0;
char *buff=NULL;
char c='\n';
if((p = fopen(filename,"a"))!=NULL){
if(test = fopen(filename,"a")){
fseek(test,-1,SEEK_END );
c= getc(test);
if(c!='\n'){
fprintf(test,"\n");
}
}
fclose(test);
p = fopen(filename,"a");
fseek(p,0,SEEK_END);
elem=(elements *)malloc(sizeof(elements));
fflush(stdin);
printf("\ninput the ID\n");
scanf("%d",&elem->id);
printf("input the adress \n");
scanf("%s",elem->adr);
printf("innput the type \n");
scanf("%s",elem->type);
printf("intput the mark \n");
scanf("%s",elem->mark);
fprintf(p,"%d :%s :%s :%s",elem->id,elem->adr,elem->type,elem->mark);
free(elem);
fflush(stdin);
fclose(p);
return 1;
}else{
printf("\nRrror while opening the file !\n");
return 0;
}
}
As you may notice, the whole program depends on the length of the newline character (1 character, "\n"), so I wonder whether there is a better way, in other words one that works on all OSes.
It seems you already understand the basics of appending to a file, so we just have to figure out whether the file already ends with a newline.
In a perfect world, you'd jump to the end of the file, back up one character, read that character, and see if it matches '\n'. Something like this:
FILE *f = fopen(filename, "r");
fseek(f, -1, SEEK_END); /* this is a problem */
int c = fgetc(f);
fclose(f);
if (c != '\n') {
/* we need to append a newline before the new content */
}
Though this will likely work on Posix systems, it won't work on many others. The problem is rooted in the many different ways systems separate and/or terminate lines in text files. In C and C++, '\n' is a special value that tells the text mode output routines to do whatever needs to be done to insert a line break. Likewise, the text mode input routines will translate each line break to '\n' as it returns the data read.
On Posix systems (e.g., Linux), a line break is indicated by a line feed character (LF) which occupies a single byte in UTF-8 encoded text. So the compiler just defines '\n' to be a line feed character, and then the input and output routines don't have to do anything special in text mode.
On some older systems (like old MacOS and Amiga) a line break might be represented by a carriage return character (CR). Many IBM mainframes used a different family of character encodings called EBCDIC that doesn't have direct mappings for LF or CR, but does have a special control character called next line (NL). There were even systems (like VMS, IIRC) that didn't use a stream model for text files but instead used variable length records to represent each line, so the line breaks themselves were implicit rather than marked by a specific control character.
Most of those are challenges you won't face on modern systems. Unicode added more line break conventions, but very little software supports them in a general way.
The remaining major line break convention is the combination CR+LF. What makes CR+LF challenging is that it's two control characters, but the C i/o functions have to make them appear to the programmer as though they are the single character '\n'. That's not a big deal with streaming text in or out. But it makes seeking within a file hard to define. And that brings us back to the problematic line:
fseek(f, -1, SEEK_END);
What does it mean to back up "one character" from the end on a system where line breaks are indicated by a two character sequence like CR+LF? Do we really want the i/o system to have to possibly scan the entire file in order for fseek (and ftell) to figure out how to make sense of the offset?
The C standards people punted. In text mode, the offset argument for fseek can only be 0 or a value returned by a previous call to ftell. So the problematic call, with a negative offset, isn't valid. (On Posix systems, the invalid call to fseek will likely work, but the standard doesn't require it to.)
Also note that Posix defines LF as a line terminator rather than a separator, so a non-empty text file that doesn't end with a '\n' should be uncommon (though it does happen).
For a more portable solution, we have two choices:
1. Read the entire file in text mode, remembering whether the most recent character you read was '\n'.
This option is hugely inefficient, so unless you're going to do this only occasionally or only with short files, we can rule that out.
2. Open the file in binary mode, seek backwards a few bytes from the end, and then read to the end, remembering whether the last thing you read was a valid line break sequence.
This might be a problem if our fseek doesn't support the SEEK_END origin when the file is opened in binary mode. Yep, the C standard says supporting that is optional. However, most implementations do support it, so we'll keep this option open.
Since the file will be read in binary mode, the input routines aren't going to convert the platform's line break sequence to '\n'. We'll need a state machine to detect line break sequences that are more than one byte long.
Let's make the simplifying assumption that a line break is either LF or CR+LF. In the latter case, we don't care about the CR, so we can simply back up one byte from the end and test whether it's LF.
Oh, and we have to figure out what to do with an empty file.
#include <stdbool.h>
#include <stdio.h>
bool NeedsLineBreak(const char *filename) {
const int LINE_FEED = '\x0A';
FILE *f = fopen(filename, "rb"); /* binary mode */
if (f == NULL) return false;
const bool empty_file = fseek(f, 0, SEEK_END) == 0 && ftell(f) == 0;
/* A non-empty file needs a line break unless its last byte is LF. */
const bool result = !empty_file &&
(fseek(f, -1, SEEK_END) == 0 && fgetc(f) != LINE_FEED);
fclose(f);
return result;
}
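One way the check might be combined with the append step is sketched below. It makes the same simplifying assumptions as the text (a line break is LF or CR+LF, and fseek supports SEEK_END on binary streams, which the standard does not guarantee). The name append_line is hypothetical and introduced only for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: append `line` as a new line of `filename`, first
 * adding a line break if the existing content does not end with one.
 * Returns 0 on success, -1 on error. */
int append_line(const char *filename, const char *line)
{
    bool need_break = false;

    FILE *f = fopen(filename, "rb");          /* binary mode for the check */
    if (f != NULL) {
        /* Only a non-empty file can be missing its final line break. */
        if (fseek(f, 0L, SEEK_END) == 0 && ftell(f) > 0)
            need_break = fseek(f, -1L, SEEK_END) == 0 && fgetc(f) != 0x0A;
        fclose(f);
    }

    FILE *out = fopen(filename, "a");         /* text mode: '\n' becomes the local line break */
    if (out == NULL)
        return -1;
    if (need_break)
        fputc('\n', out);
    fprintf(out, "%s\n", line);
    return fclose(out) == 0 ? 0 : -1;
}
```

Writing the new content in text mode lets the library emit whatever line break sequence the platform uses, while the check itself looks only at the final raw byte.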

Should the binary output be the same as the ASCII input?

I'm writing a program that reads an ASCII file and then converts it to a Binary file, as I see it's not such a hard task, but understanding what's happening behind is ...
As I understand, an ASCII file is just human readable text, so if we want to create a new file full of ASCII's, a simple loop with a fputc() would be enough and for a binary file fwrite() will do the job right?
So my question here is, once that the ASCII to Binary conversion is done, what should I see in my .bin file? It should be filled with exactly the same symbols <88><88><88><88><88>?
Code:
/*
* From "Practical C Programming 2nd Edition"
* Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers
* and writes a binary file containing the same list. Write a program that goes the
* other way so that you can check your work.
*
*/
#include <stdio.h>
#include <stdlib.h>
const char *in_filename = "bigfile.txt";
const char *out_filename = "out_file.bin";
int main()
{
int ch = 0;
/* ASCII */
FILE *in_file = NULL;
in_file = fopen(in_filename, "r");
if(!in_file)
{
fprintf(stderr, "ERROR: Could not open file %s ... ", in_filename);
exit(EXIT_FAILURE);
}
/* Binary */
FILE *out_file = NULL;
out_file = fopen(out_filename, "w+b");
if(!out_file)
{
fprintf(stderr, "ERROR: New file %s, could not be created ... ", out_filename);
exit(EXIT_FAILURE);
}
while(1)
{
ch = fgetc(in_file);
if(ch == EOF)
break;
else
fwrite(in_file, sizeof(char), 1, out_file);
}
fclose(in_file);
fclose(out_file);
return 0;
}
I'm generating the input file with this shell script:
tr -dc "0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt
Any help would be very much appreciated.
Thanks.
fwrite(in_file, sizeof(char), 1, out_file);
is wrong because it passes the FILE pointer in_file as the data to write, so it writes the first byte of the FILE object instead of the character that was just read into ch.
You can use fputc to write one byte like
fputc(ch, out_file);
If you still want to use fwrite for some reason, prepare the data to write and write that like
{
unsigned char byte = (unsigned char)ch;
fwrite(&byte, sizeof(byte), 1, out_file);
}
Now the contents of the output file will be the same as the input file. Some systems may convert newline characters, which can make the contents differ, because the input file is opened in text mode.
Opening a file in text mode or binary mode has nothing to do with ASCII/binary conversion.
It has to do with how the operating system deals with some special characters (such as new line characters), line size limit or file extensions.
In the fopen Linux man page:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)
For more information about opening a file in text or binary mode, see https://stackoverflow.com/a/20863975/6874310
Now, back to the ASCII conversion:
All the data in a computer is stored in bits so in the end everything is binary.
A text file containing ASCII characters is also a binary file, except its contents can be mapped to the ASCII table characters in a meaningful way.
Have a look at the ASCII table. The ASCII character '0' (zero) has the code 0x30 (48 in decimal). This means that the zero you see in a text file is actually stored as the byte 0x30 in memory and on disk.
Your program is reading data from a file and writing to another file without performing any ASCII/binary conversion.
Also, there is a small error here:
fwrite(in_file, sizeof(char), 1, out_file);
It probably should be:
fputc(ch, out_file);
This writes the byte held in the variable ch to out_file. (Because ch is an int, fwrite(&ch, sizeof(char), 1, out_file) would write whichever byte of the int comes first in memory, which is the character only on little-endian machines.)
With this fix, the program basically reads data from the file bigfile.txt and writes the very same data to the file out_file.bin without any conversion.
To convert a single-digit ASCII number to binary, read the character into an int (so that EOF can be told apart from real data) and, if it is a digit, subtract 0x30 from it and write the result as one byte:
int ch = fgetc(in_file);
if(ch == EOF)
{
break;
}
else if (isdigit(ch))
{
unsigned char value = (unsigned char)(ch - 0x30);
fwrite(&value, sizeof(value), 1, out_file);
}
Now, your output file will be actually binary.
Use isdigit to ensure the byte is an ASCII digit. Add #include <ctype.h> at the beginning of your file to use it.
So, for a small input file with the following text:
123
Its binary representation will be:
0x313233
And, after the ASCII numbers are converted to binary, the binary contents will be:
0x010203
To convert it back to ASCII, simply reverse the conversion. That is, add 0x30 to each byte of the binary file.
If you're using a Unix-like system, you can use command line tools such as xxd to check binary files. On Windows, any Hex Editor program will do the job.
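The two directions of the exercise can be sketched as a pair of functions. This is only an illustration under the single-digit scheme described above: the names digits_to_binary and binary_to_digits are hypothetical, and non-digit characters such as line breaks are simply skipped, so the round trip does not restore them.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write each ASCII digit in `in_path` as one byte with
 * value 0..9 to `out_path`. Non-digit characters are skipped.
 * Returns the number of digits converted, or -1 on error. */
long digits_to_binary(const char *in_path, const char *out_path)
{
    FILE *in = fopen(in_path, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    long count = 0;
    int ch;                                /* int so that EOF is representable */
    while ((ch = fgetc(in)) != EOF) {
        if (isdigit(ch)) {
            fputc(ch - '0', out);          /* '0'..'9' -> 0..9 */
            count++;
        }
    }
    fclose(in);
    return fclose(out) == 0 ? count : -1;
}

/* The reverse direction: add '0' to every byte with a value of 0..9. */
long binary_to_digits(const char *in_path, const char *out_path)
{
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    long count = 0;
    int ch;
    while ((ch = fgetc(in)) != EOF) {
        if (ch <= 9) {                     /* ignore bytes that are not 0..9 */
            fputc(ch + '0', out);
            count++;
        }
    }
    fclose(in);
    return fclose(out) == 0 ? count : -1;
}
```

Running digits_to_binary on bigfile.txt and then binary_to_digits on the result gives a file of digits that can be compared against the input with the line breaks removed.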

Why is the file size different with the times of file read of 1 bytes?

I'm learning the file handling of C and I got some problem.
I wrote the codes as follows,
# include <stdio.h>
# include <stdlib.h>
int main(void)
{
FILE * file;
errno_t err = fopen_s(&file,"f.txt","r");
fseek(file, 0, SEEK_END);
int size = ftell(file);
fseek(file, 0, SEEK_SET);
char *tmp;
tmp = malloc(size);
printf("%d\n", size);
for (int i = 0; !feof(file); i++)
{
fread(tmp + i, 1, 1, file);
size = i + 1;
}
printf("%d\n", size);
fclose(file);
free(tmp);
return 0;
}
However, the outputs of size are not the same(1st: 78, 2nd: 76), what is the reason behind this?
I suspect you are using Microsoft Windows. In Microsoft’s C/C++ implementation, binary streams and text streams are different. If you had opened the file with "rb" passed to fopen_s as its third parameter, the file would be opened with a binary stream, and fread would return the actual bytes in the file.
Since you opened the file with "r", it was opened as a text stream. In this mode, some processing is performed when reading and writing the file. Notably, Windows uses two characters, a carriage-return '\r' followed by a new-line '\n', at the end of each line. When reading the file as a text stream, these two characters are reduced to a single '\n'. Conversely, when writing a text stream, writing a '\n' produces '\r' followed by '\n' in the file.
For a binary stream, ftell gives the number of bytes from the beginning of the file. For a text stream, the C standard only specifies that ftell is usable for resetting the stream position using fseek—it is not necessarily the number of bytes (in the actual file) or characters (appearing in the stream) from the beginning of the file. A C implementation might implement ftell so that it gives the number of bytes from the beginning of the file (and that is the 78 you are seeing), but, even if it does, you cannot easily use that to know how many characters are in the text stream.
Additionally, as others have noted in comments, this code is wrong:
for (int i = 0; !feof(file); i++)
{
fread(tmp + i, 1, 1, file);
size = i + 1;
}
The standard library routines do not know the end of the file has been reached until you attempt a read and it fails because the end of the file was reached. For example, if there is one character in the file, and you read it, feof(file) is still false: the end of the file has not been encountered. It is not until you try to read a second character and fread fails that feof(file) becomes true.
Because of this, the above loop ultimately sets size to one more than the number of characters read: at the beginning of the final iteration, !feof(file) was true, so fread was attempted, it failed, and then size was set to i + 1 even though no byte had just been read.
Because this is how feof works, you cannot use it to control a loop like this. Instead, you should write the loop so that it tests the result of fread and exits the loop if it fails to read any characters. That code could be something like:
int i = 0;
do
{
size_t result = fread(tmp + i, 1, 1, file);
if (result == 0)
break;
i++;
} while (1);
size = i;
(Note that, if you were reading more than one byte at a time with fread, additional code would be needed to handle the case where the number of bytes read was between zero and the number requested.)
Once that loop is fixed, you should see the number of characters in the stream reported as 75. Most likely, your file f.txt contains three lines of text with 72 characters total excluding the line endings. When read as a text stream, there are three '\n' characters, so the total is 75. When read as a binary stream, there are three '\n' characters and three '\r' characters, so the total is 78.
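For the multi-byte case mentioned in the note above, a sketch of the loop might look like the following. The helper name count_stream_chars and the 4096-byte buffer size are assumptions made for illustration. The point is that a short count from fread is not an error by itself, so the loop adds whatever was read and stops only when fread returns 0:

```c
#include <stdio.h>

/* Hypothetical helper: count the characters a stream delivers, reading up to
 * sizeof buf bytes per fread call. */
size_t count_stream_chars(FILE *file)
{
    unsigned char buf[4096];
    size_t total = 0;
    size_t got;

    /* `got` may be smaller than sizeof buf on the last chunk. */
    while ((got = fread(buf, 1, sizeof buf, file)) > 0)
        total += got;

    /* After the loop, ferror(file) tells a read error apart from end of file. */
    return total;
}
```

Calling this once on f.txt opened with "r" and once opened with "rb" is a quick way to see the text-stream and binary-stream counts side by side.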

Difference between fread(&c, 1, 1, input) and fgetc(input) for reading one byte

I'm currently trying to read in a PNG file, one byte at a time, and I'm getting different results when I use fread((void*), size_t, size_t, FILE*) and fgetc(FILE*).
I essentially want to "Read one byte at a time until the file ends", and I do so in two different ways. In both cases, I open the image I want in binary mode through:
FILE* input = fopen( /* Name of File */, "rb");
And store each byte in a character, char c
fread:
while( fread(&c, 1, 1, input) != 0) //read until there are no more bytes read
fgetc:
while( (c = fgetc(input)) != EOF) //Read while EOF hasn't been reached
In the fread case, I read all the bytes I need. The reading function stops at the end of the file, and I end up printing all 380,000 bytes (which makes sense, as the input file is a 380kB file).
However, in the fgetc case, I stop once I reach a byte with a value of ff (which is -1, the value of the macro EOF).
My question is, if both functions are doing the same thing, reading one byte at a time, how does fread know to continue reading even if it comes across a byte with a value of EOF? And building off of this, how does fread know when to stop if EOF is passed when reading the file?
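fgetc returns an int, not a char. On success it returns the byte as an unsigned char converted to int (0 to 255 on typical systems); at end of file or on error it returns EOF, which is negative. A char cannot keep all of those values distinct. Where char is signed, the byte ff typically converts to -1 and compares equal to EOF, which is why your loop stops early; where char is unsigned, the comparison with EOF is never true and the loop never ends. So don't do that. Store the return value in an int. fread does not have this problem because it reports the end of the file through its return value (the number of items read), not through a special value stored in the buffer.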
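A small sketch of the corrected fgetc loop, with the result kept in an int. The helper name count_bytes and its ff_bytes out-parameter are hypothetical; they are there only to show that bytes with the value 0xFF are counted as data rather than ending the loop:

```c
#include <stdio.h>

/* Hypothetical helper: count all bytes in `input`, and separately the bytes
 * with value 0xFF, using fgetc with an int result. */
long count_bytes(FILE *input, long *ff_bytes)
{
    long total = 0;
    int c;                      /* not char: must hold 0..255 and EOF */

    *ff_bytes = 0;
    while ((c = fgetc(input)) != EOF) {
        if (c == 0xFF)
            (*ff_bytes)++;      /* a real data byte, distinct from EOF */
        total++;
    }
    return total;
}
```

With the stream opened in "rb" mode, the total should match what the fread loop reports for the same file.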

About the FILE * streams and how fputc() works

I wonder about the operation of the FILE pointer f and how the function fputc works.
First, when I open a file (before I have read or written anything), what is the position of f in the file? Is it before the first character?
Second, when I use:
fseek(f, -1, SEEK_CUR);
fputc(' ', f);
what is the position of my pointer f now?
Reading the manuals should help you.
For fopen: the stream is positioned at the beginning of the file, except for append modes such as "a", where writes go to the end of the file.
For fseek: that function can fail, so you have to test the return value; and it is not difficult to imagine that you cannot seek to a negative offset.
When you open the file, the current position is 0, at the first character.
If you try to fseek before the beginning of the file, fseek will fail and return -1.
Note that if you seek backwards on a text file, there is no guarantee that it can succeed. On Linux and/or for a binary stream, assuming you are not at the start of the stream and the stream is a real file opened for writing, after the sequence
fseek(f, -1L, SEEK_CUR);
fputc(' ', f);
the position of the stream will be the same as before the fseek.
But consider this seemingly simpler example:
fputc('\n', f);
fseek(f, -1L, SEEK_CUR);
On systems such as Windows, where '\n' will at some point be converted into a sequence of 2 bytes <CR><LF>, what do you think it should do?
Because of all these possibilities for failure (and a few more exotic ones), you should always test the return value of fseek and try to minimize its use.
When accessing files through C, the first necessity is to have a way to access the files. For C File I/O you need to use a FILE pointer, which will let the program keep track of the file being accessed. For Example:
FILE *fp;
To open a file you need to use the fopen function, which returns a FILE pointer. Once you've opened a file, you can pass the FILE pointer to the standard library's input and output functions to operate on the file.
FILE *fopen(const char *filename, const char *mode);
Here filename is a string naming the file to open, and mode can have one of the following values:
r - open for reading (file must exist)
w - open for writing (file need not exist)
a - open for appending (file need not exist)
r+ - open for reading and writing, start at beginning
w+ - open for reading and writing (overwrite file)
a+ - open for reading and writing (append if file exists)
The declaration of the fseek() function is:
int fseek(FILE *stream, long int offset, int whence);
where whence is one of:
SEEK_SET Beginning of file
SEEK_CUR Current position of the file pointer
SEEK_END End of file
The following is an fputc() example:
/* fputc example: alphabet writer */
#include <stdio.h>
int main ()
{
FILE * pFile;
char c;
pFile = fopen ("alphabet.txt","w");
if (pFile!=NULL) {
for (c = 'A' ; c <= 'Z' ; c++)
fputc ( c , pFile );
fclose (pFile);
}
return 0;
}
It depends on your current position. For example, if the file position was at offset 100 and you call fseek(f, -1, SEEK_CUR);, the position moves to offset 99. fputc(' ', f); then writes a space at offset 99, and afterwards the position is at offset 100 again.
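The position arithmetic can be checked with ftell. The sketch below assumes a binary stream opened for update (for example with "w+b" or "r+b"), since offsets on text streams are not guaranteed to be byte counts. The helper name overwrite_previous_byte is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: overwrite the byte just before the current position
 * with a space and report the positions before and after.
 * Returns 0 on success, -1 if seeking or writing fails. */
int overwrite_previous_byte(FILE *f, long *before, long *after)
{
    *before = ftell(f);
    if (*before < 0)
        return -1;
    if (fseek(f, -1L, SEEK_CUR) != 0)   /* fails when already at offset 0 */
        return -1;
    if (fputc(' ', f) == EOF)
        return -1;
    *after = ftell(f);
    return *after < 0 ? -1 : 0;
}
```

On success, *before and *after hold the same offset, which matches the 100, 99, 100 sequence described above; at the very start of the file the fseek call fails and the function returns -1.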

Resources