I'm writing a program that reads an ASCII file and then converts it to a Binary file, as I see it's not such a hard task, but understanding what's happening behind is ...
As I understand, an ASCII file is just human readable text, so if we want to create a new file full of ASCII's, a simple loop with a fputc() would be enough and for a binary file fwrite() will do the job right?
So my question here is, once that the ASCII to Binary conversion is done, what should I see in my .bin file? It should be filled with exactly the same symbols <88><88><88><88><88>?
Code:
/*
* From "Practical C Programming 2nd Edition"
* Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers
* and writes a binary file containing the same list. Write a program that goes the
* other way so that you can check your work.
*
*/
#include <stdio.h>
#include <stdlib.h>
const char *in_filename = "bigfile.txt";
const char *out_filename = "out_file.bin";
int main()
{
int ch = 0;
/* ASCII */
FILE *in_file = NULL;
in_file = fopen(in_filename, "r");
if(!in_file)
{
fprintf(stderr, "ERROR: Could not open file %s ... ", in_filename);
exit(EXIT_FAILURE);
}
/* Binary */
FILE *out_file = NULL;
out_file = fopen(out_filename, "w+b");
if(!out_file)
{
fprintf(stderr, "ERROR: New file %s, could not be created ... ", out_filename);
exit(EXIT_FAILURE);
}
while(1)
{
ch = fgetc(in_file);
if(ch == EOF)
break;
else
fwrite(in_file, sizeof(char), 1, out_file);
}
fclose(in_file);
fclose(out_file);
return 0;
}
I'm generating the input file with this shell script:
tr -dc "0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt
Any help would be very appreciate it.
Thanks.
fwrite(in_file, sizeof(char), 1, out_file);
is wrong because an integer is given where a pointer is expected.
You can use fputc to write one byte like
fputc(in_file, out_file);
If you still want to use fwrite for some reason, prepare a data to write and write that like
{
unsigned char in_file_byte = in_file;
fwrite(&in_file_byte, sizeof(in_file_byte), 1, out_file);
}
Now the contents of the output file will be the same as the input file. Some system may perform conversion of newline characters and it may make the contents differ because the input file is opened in text mode.
Opening a file in text mode or binary mode has nothing to do with ASCII/binary conversion.
It has to do with how the operating system deals with some special characters (such as new line characters), line size limit or file extensions.
In the fopen Linux man page:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings de‐
scribed above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other
systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program
may be ported to non-UNIX environments.)
For more information about opening a file in text or binary mode, see https://stackoverflow.com/a/20863975/6874310
Now, back to the ASCII conversion:
All the data in a computer is stored in bits so in the end everything is binary.
A text file containing ASCII characters is also a binary file, except its contents can be mapped to the ASCII table characters in a meaningful way.
Have a look at the ASCII table. The ASCII character number zero (0) has a binary value of 0x30. This means that the zero you see in a text file is actually
a binary number 0x30 in the memory.
Your program is reading data from a file and writing to another file without performing any ASCII/binary conversion.
Also, there is a small error here:
fwrite(in_file, sizeof(char), 1, out_file);
It probably should be:
fwrite(&ch, sizeof(char), 1, out_file);
This writes the byte in variable ch to out_file.
With this fix, the program basically reads data from the file bigfile.txt and write the very same data to the file out_file.bin without any conversion.
To convert a single digit ASCII number to binary, read the digit from your input file in a byte (char type) and subtract 0x30 from it:
char ch = fgetc(in_file);
if(ch == EOF)
{
break;
}
else if (isdigit(ch))
{
ch = ch - 0x30;
fwrite(&ch, sizeof(char), 1, out_file);
}
Now, your output file will be actually binary.
Use isdigitto ensure the byte is an ASCII number. Add #include <ctype.h> in the beginning of your file to use it.
So, for a small input file with the following text:
123
Its binary representation will be:
0x313233
And, after the ASCII numbers are converted to binary, the binary contents will be:
0x010203
To convert it back to ASCII, simply reverse the conversion. That is, add 0x30 to each byte of the binary file.
If you're using a Unix-like system, you can use command line tools such as xxd to check binary files. On Windows, any Hex Editor program will do the job.
Related
I'm trying to input a line at the end of a file that has the following shape "1 :1 :1 :1" , so at some point the file may have a new line character at the end of it, and in order to execute the operation I have to deal with that, so I came up with the following solution :
go to the end of the file and go backward by 1 characters (the length of the new line character in Linux OS as I guess), read that character and if it wasn't a new line character insert a one and then insert the whole line else go and insert the line, and this is the translation of that solution on C :
int insert_element(char filename[]){
elements *elem;
FILE *p,*test;
size_t size = 0;
char *buff=NULL;
char c='\n';
if((p = fopen(filename,"a"))!=NULL){
if(test = fopen(filename,"a")){
fseek(test,-1,SEEK_END );
c= getc(test);
if(c!='\n'){
fprintf(test,"\n");
}
}
fclose(test);
p = fopen(filename,"a");
fseek(p,0,SEEK_END);
elem=(elements *)malloc(sizeof(elements));
fflush(stdin);
printf("\ninput the ID\n");
scanf("%d",&elem->id);
printf("input the adress \n");
scanf("%s",elem->adr);
printf("innput the type \n");
scanf("%s",elem->type);
printf("intput the mark \n");
scanf("%s",elem->mark);
fprintf(p,"%d :%s :%s :%s",elem->id,elem->adr,elem->type,elem->mark);
free(elem);
fflush(stdin);
fclose(p);
return 1;
}else{
printf("\nRrror while opening the file !\n");
return 0;
}
}
as you may notice that the whole program depends on the length of the new line character (1 character "\n") so I wonder if there is an optimal way, in another word works on all OS's
It seems you already understand the basics of appending to a file, so we just have to figure out whether the file already ends with a newline.
In a perfect world, you'd jump to the end of the file, back up one character, read that character, and see if it matches '\n'. Something like this:
FILE *f = fopen(filename, "r");
fseek(f, -1, SEEK_END); /* this is a problem */
int c = fgetc(f);
fclose(f);
if (c != '\n') {
/* we need to append a newline before the new content */
}
Though this will likely work on Posix systems, it won't work on many others. The problem is rooted in the many different ways systems separate and/or terminate lines in text files. In C and C++, '\n' is a special value that tells the text mode output routines to do whatever needs to be done to insert a line break. Likewise, the text mode input routines will translate each line break to '\n' as it returns the data read.
On Posix systems (e.g., Linux), a line break is indicated by a line feed character (LF) which occupies a single byte in UTF-8 encoded text. So the compiler just defines '\n' to be a line feed character, and then the input and output routines don't have to do anything special in text mode.
On some older systems (like old MacOS and Amiga) a line break might be a represented by a carriage return character (CR). Many IBM mainframes used different character encodings called EBCDIC that don't have a direct mappings for LF or CR, but they do have a special control character called next line (NL). There were even systems (like VMS, IIRC) that didn't use a stream model for text files but instead used variable length records to represent each line, so the line breaks themselves were implicit rather than marked by a specific control character.
Most of those are challenges you won't face on modern systems. Unicode added more line break conventions, but very little software supports them in a general way.
The remaining major line break convention is the combination CR+LF. What makes CR+LF challenging is that it's two control characters, but the C i/o functions have to make them appear to the programmer as though they are the single character '\n'. That's not a big deal with streaming text in or out. But it makes seeking within a file hard to define. And that brings us back to the problematic line:
fseek(f, -1, SEEK_END);
What does it mean to back up "one character" from the end on a system where line breaks are indicated by a two character sequence like LF+CR? Do we really want the i/o system to have to possibly scan the entire file in order for fseek (and ftell) to figure out how to make sense of the offset?
The C standards people punted. In text mode, the offset argument for fseek can only be 0 or a value returned by a previous call to ftell. So the problematic call, with a negative offset, isn't valid. (On Posix systems, the invalid call to fseek will likely work, but the standard doesn't require it to.)
Also note that Posix defines LF as a line terminator rather than a separator, so a non-empty text file that doesn't end with a '\n' should be uncommon (though it does happen).
For a more portable solution, we have two choices:
Read the entire file in text mode, remembering whether the most recent character you read was '\n'.
This option is hugely inefficient, so unless you're going to do this only occasionally or only with short files, we can rule that out.
Open the file in binary mode, seek backwards a few bytes from the end, and then read to the end, remembering whether the last thing you read was a valid line break sequence.
This might be a problem if our fseek doesn't support the SEEK_END origin when the file is opened in binary mode. Yep, the C standard says supporting that is optional. However, most implementations do support it, so we'll keep this option open.
Since the file will be read in binary mode, the input routines aren't going to convert the platform's line break sequence to '\n'. We'll need a state machine to detect line break sequences that are more than one byte long.
Let's make the simplifying assumption that a line break is either LF or CR+LF. In the latter case, we don't care about the CR, so we can simply back up one byte from the end and test whether it's LF.
Oh, and we have to figure out what to do with an empty file.
bool NeedsLineBreak(const char *filename) {
const int LINE_FEED = '\x0A';
FILE *f = fopen(filename, "rb"); /* binary mode */
if (f == NULL) return false;
const bool empty_file = fseek(f, 0, SEEK_END) == 0 && ftell(f) == 0;
const bool result = !empty_file ||
(fseek(f, -1, SEEK_END) == 0 && fgetc(f) == LINE_FEED);
fclose(f);
return result;
}
I've made a program that opens files and searches for a word
I want it to only work on TEXT Files
Is there a way provided by C to check if a file is BINARY, and if so, I want to exit the program before any operations take place
Thanks
No, there isn't, because it's impossible to tell for sure. If you expect a specific encoding, you can check yourself whether the file contents are valid in this encoding, e.g. if you expect ASCII, all bytes must be <= 0x7f. If you expect UTF-8, it's a bit more complicated, see a description of it.
In any case, there's no guarantee that a "binary" file would not by accident look like a valid file in any given text encoding. In fact, the term "binary file" doesn't make too much sense, as all files contain binary data.
If we assume that by text you mean ASCII and not UTF-8, you can do this by reading each character and using isascii() and isspace() to check if it is a valid character:
void is_text(char *filename) {
FILE *f = fopen(filename, "r");
if (!f) {
perror("fopen failed");
return;
}
int c;
while ((c=fgetc(f) != EOF) {
if ((!isascii(c) || iscntrl(c)) && !isspace(c)) {
printf("is binary\n");
fclose(f);
return;
}
}
printf("is text\n");
fclose(f);
}
If the file contains UTF-8 characters, it becomes more complicated as you have to look at multiple bytes at once and see if they are valid UTF-8 byte sequences. There's also the question of which Unicode code points are considered text.
It's not the file per se which is binary or text; it is just about how you interpret the content of the file when opening it.
You may interpret a file containing solely text as binary, thereby avoiding that a /r/n might get translated to a /n only; And you may open a file containing raw data like, for example, a bitmap using a text mode, thereby probably corrupting the content in that a 0x0D 0x0A gets converted to a 0x0D only.
So you cannot check the file per se, but you may open the file in binary mode and see if the content contains anything which you do not interpret as text.
perhaps: system(file "path/filename");
I am trying to write a code which reads 1 byte (ideal goal is n bytes but starting with 1 byte - so for n bytes if its easier please suggest)
Below is the code I have attempted to read 1 byte at a time and output it in hex format. But all get is bunch of FFFF
FILE *fp;
int stringlength,i;
/* File can be txt or .bin */
fp = fopen("TestFile3.txt", "r");
if (fp == NULL)
{
puts("Error: Input file cannot be read");
return -1;
}
else
{
size_t i, strlength, lengthOfFile,c;
fseek(fp, 0, SEEK_END);
lengthOfFile = ftell(fp);
printf("length of File is ---- %d \n", lengthOfFile);
while (lengthOfFile)
{
c = fgetc(fp);
printf("%c", c);
lengthOfFile--;
}
putchar('\n');
}
fclose(fp);
return 0;
}
You need fseek(fp, 0, SEEK_SET); to reset the file pointer before the while loop.
You're also opening the file in "text" mode:
fp = fopen("TestFile3.txt", "r");
Per the C Standard, section 7.19.2:
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one- to-one correspondence
between the characters in a stream and those in the external
representation.
Using fseek()/ftell() doesn't return the number of bytes readable from a text stream.
You need to open the file in binary mode if you want to read every byte per the size of the file:
fp = fopen("TestFile3.txt", "rb");
Finally, the use of fseek()/ftell() isn't reliable on binary files either, because, again per the C standard, 7.19.9.2:
A binary stream need not meaningfully support fseek calls with a
whence value of SEEK_END
Given that, you can't reliably use fseek()/ftell() to find out how big a binary file is, either. And yes, examples do exist.
To reliably read all the bytes in a file, #Weather Vane posted one way to do that in the comments.
I am trying to write some simple program to uploading files to my server. I' d like to convert binary files to hex. I have written something, but it does not work properly.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static int bufferSize = 1024;
FILE *source;
FILE *dest;
int n;
int counter;
int main() {
unsigned char buffer[bufferSize];
source = fopen("server.pdf", "rb");
if (source) {
dest = fopen("file_test", "wb");
while (!feof(source)) {
n = fread(buffer, 1, bufferSize, source);
counter += n;
strtol(buffer, NULL, 2);
fwrite(buffer, 1, n, dest);
}
}
else {
printf("Error");
}
fclose(source);
fclose(dest);
}
I use strtol to convert binary do hex. After invoking this code I have still strange characters in my file_test file.
I' d like to upload a file on server, for example a PDF file. But firstly I have to write a program, that will convert this file to a hex file. I'd like that the length of a line in hex file would be equal 1024. After that, I will upload this file line by line with PL/SQL.
EDIT: I completely misunderstood what the OP was aiming for. He wants to convert his pdf file to its hex representation, as I see now, because he wants to put that file in a text blob field in some database table. I still claim the exercise is a complete waste of time,since blobs can contain binary data: that's what blobs were invented for. Blob means binary large object.
You said: "I' d like to upload file on server, for example pdf file. But firstly I have to write a program, that will convert this file to hex file."
You don't have to, and must not, write any such conversion program.
You have to first understand and internalize the idea that hex notation is only an easy-to-read representation of binary. If you think, as you seem to, that you have to "convert" a pdf file to hex, then you are mistaken. A pdf file is a binary file is a binary file. You don't "convert" anything, not unless you want to change the binary!
You must abandon, delete, discard, defenestrate, forget about, and expunge your notion of "converting" any binary file to anything else. See, hex exists only as a human-readable presentation format for binary, each hex digit representing four contiguous binary digits.
To put it another way: hex representation is for human consumption only, unsuitable (almost always) for program use.
For an example: suppose your pdf file holds a four-bit string "1100," whose human-readable hex representation can be 'C'. When you "convert" that 1100 to hex the way you want to do it, you replace it by the ASCII character 'C', whose decimal value is 67. You can see right away that's not what you want to do and you immediately see also that it's not even possible: the decimal value 67 needs seven bits and won't fit in your four bits of "1100".
HTH
Your code is fantastically confused.
It's reading in the data, then doing a strtol() call on it, with a base of 2, and then ignoring the return value. What's the point in that?
To convert the first loaded byte of data to hexadecimal string, you should probably use something like:
char hex[8];
sprintf(hex, "%02x", (unsigned int) buffer[0] & 0xff);
Then write hex to the output file. You need to do this for all bytes loaded, of course, not just buffer[0].
Also, as a minor point, you can't call feof() before you've tried reading the file. It's better to not use feof() and instead check the return value of fread() to detect when it fails.
strtol converts a string containing a decimal representation of a number to the binary number if i am not mistaken. You probably want to convert something like a binary OK to 4F 4B... To do that you can use for example sprintf(aString, "%x", aChar).
I have an requirement where the C code extract string data from database and write it to a file. The string data in the database can have any kind of characters
for example: Description field have data "Adj \342\200\223 Data" , when I write to the file the text it writes as "Adj â Data". Similarly, this description field can have any kind of data, my code just read and uses strcpy after extracting from the database and write to a file.
How do I get the data written to a file as it is in the description field ?
Think easiest solution would be writing byte by byte - shouldn't matter that much with buffering:
int pos = 0;
FILE *fp = 0;
//...
fp = fopen("somefile.txt", "w");
//...
while(buffer[pos])
if(buffer[pos] < 32 || buffer[pos] > 127) // change bounds for non-printable chars as you like
fprintf(fp, "%c", buffer[pos++]);
else
fprintf(fp, "\\%u", buffer[pos++]);
Edit:
Might have misunderstood your question. Only use string functions when you're actually working with strings. For binary data use binary functions (e.g. the mentioned memcpy()).
Edit 2/3:
Don't print the value as "%d" or "%u" - should be "%3o" to print as a 3-digit octal number. Using "%o" could be unsafe if other digits follow.