Conversion from binary file to hex in C - c

I am trying to write a simple program to upload files to my server. I'd like to convert binary files to hex. I have written something, but it does not work properly.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static int bufferSize = 1024;
FILE *source;
FILE *dest;
int n;
int counter;
int main() {
unsigned char buffer[bufferSize];
source = fopen("server.pdf", "rb");
if (source) {
dest = fopen("file_test", "wb");
while (!feof(source)) {
n = fread(buffer, 1, bufferSize, source);
counter += n;
strtol(buffer, NULL, 2);
fwrite(buffer, 1, n, dest);
}
}
else {
printf("Error");
}
fclose(source);
fclose(dest);
}
I use strtol to convert binary to hex. After running this code I still have strange characters in my file_test file.
I'd like to upload a file to the server, for example a PDF file. But first I have to write a program that converts this file to a hex file. I'd like each line in the hex file to be 1024 characters long. After that, I will upload this file line by line with PL/SQL.

EDIT: I completely misunderstood what the OP was aiming for. He wants to convert his pdf file to its hex representation, as I see now, because he wants to put that file in a text blob field in some database table. I still claim the exercise is a complete waste of time, since blobs can contain binary data: that's what blobs were invented for. Blob means binary large object.
You said: "I'd like to upload file on server, for example pdf file. But firstly I have to write a program, that will convert this file to hex file."
You don't have to, and must not, write any such conversion program.
You have to first understand and internalize the idea that hex notation is only an easy-to-read representation of binary. If you think, as you seem to, that you have to "convert" a pdf file to hex, then you are mistaken. A pdf file is a binary file is a binary file. You don't "convert" anything, not unless you want to change the binary!
You must abandon, delete, discard, defenestrate, forget about, and expunge your notion of "converting" any binary file to anything else. See, hex exists only as a human-readable presentation format for binary, each hex digit representing four contiguous binary digits.
To put it another way: hex representation is for human consumption only, unsuitable (almost always) for program use.
For an example: suppose your pdf file holds a four-bit string "1100," whose human-readable hex representation can be 'C'. When you "convert" that 1100 to hex the way you want to do it, you replace it by the ASCII character 'C', whose decimal value is 67. You can see right away that's not what you want to do and you immediately see also that it's not even possible: the decimal value 67 needs seven bits and won't fit in your four bits of "1100".
HTH

Your code is fantastically confused.
It's reading in the data, then doing a strtol() call on it, with a base of 2, and then ignoring the return value. What's the point in that?
To convert the first loaded byte of data to hexadecimal string, you should probably use something like:
char hex[8];
sprintf(hex, "%02x", (unsigned int) buffer[0] & 0xff);
Then write hex to the output file. You need to do this for all bytes loaded, of course, not just buffer[0].
Also, as a minor point, you can't call feof() before you've tried reading the file. It's better to not use feof() and instead check the return value of fread() to detect when it fails.
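Putting the points above together, a minimal sketch of the whole conversion might look like the following. The helper name hex_dump_stream and its line_width parameter are made up for this illustration and do not come from the question. It loops on the return value of fread(), writes two hex digits per byte, and starts a new line every line_width characters.

```c
#include <stdio.h>

/* Hypothetical helper: copy every byte of `in` to `out` as two lowercase
 * hex digits, starting a new line after `line_width` characters
 * (0 means no line breaks). Returns the number of input bytes
 * converted, or -1 on an I/O error. */
long hex_dump_stream(FILE *in, FILE *out, size_t line_width)
{
    unsigned char buffer[1024];
    size_t n, i, column = 0;
    long total = 0;

    /* Loop on the return value of fread() rather than on feof(). */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        for (i = 0; i < n; i++) {
            if (fprintf(out, "%02x", (unsigned int)buffer[i]) < 0)
                return -1;
            column += 2;
            if (line_width > 0 && column >= line_width) {
                if (fputc('\n', out) == EOF)
                    return -1;
                column = 0;
            }
        }
        total += (long)n;
    }
    if (ferror(in))
        return -1;
    /* Terminate a final, partially filled line. */
    if (column > 0 && fputc('\n', out) == EOF)
        return -1;
    return total;
}
```

For the files in the question this could be called as hex_dump_stream(source, dest, 1024), with source opened in "rb" mode and dest in "w" mode. A 1024-character line then holds 512 input bytes.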

strtol converts a string containing a textual representation of a number (in the base you pass, here 2) into an integer value, if I am not mistaken; it does not produce hex text. You probably want to convert something like the bytes of a binary OK to the text 4F 4B... To do that you can use, for example, sprintf(aString, "%02X", (unsigned char) aChar).

Related

Should the binary output be the same as the ASCII input?

I'm writing a program that reads an ASCII file and then converts it to a Binary file, as I see it's not such a hard task, but understanding what's happening behind is ...
As I understand, an ASCII file is just human readable text, so if we want to create a new file full of ASCII's, a simple loop with a fputc() would be enough and for a binary file fwrite() will do the job right?
So my question here is, once that the ASCII to Binary conversion is done, what should I see in my .bin file? It should be filled with exactly the same symbols <88><88><88><88><88>?
Code:
/*
* From "Practical C Programming 2nd Edition"
* Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers
* and writes a binary file containing the same list. Write a program that goes the
* other way so that you can check your work.
*
*/
#include <stdio.h>
#include <stdlib.h>
const char *in_filename = "bigfile.txt";
const char *out_filename = "out_file.bin";
int main()
{
int ch = 0;
/* ASCII */
FILE *in_file = NULL;
in_file = fopen(in_filename, "r");
if(!in_file)
{
fprintf(stderr, "ERROR: Could not open file %s ... ", in_filename);
exit(EXIT_FAILURE);
}
/* Binary */
FILE *out_file = NULL;
out_file = fopen(out_filename, "w+b");
if(!out_file)
{
fprintf(stderr, "ERROR: New file %s, could not be created ... ", out_filename);
exit(EXIT_FAILURE);
}
while(1)
{
ch = fgetc(in_file);
if(ch == EOF)
break;
else
fwrite(in_file, sizeof(char), 1, out_file);
}
fclose(in_file);
fclose(out_file);
return 0;
}
I'm generating the input file with this shell script:
tr -dc "0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt
Any help would be much appreciated.
Thanks.
fwrite(in_file, sizeof(char), 1, out_file);
is wrong because the first argument must point to the data to write; here the FILE pointer in_file is passed instead of the character that was read into ch.
You can use fputc to write one byte like
fputc(ch, out_file);
If you still want to use fwrite for some reason, prepare the data to write and write it like
{
    unsigned char byte = ch;
    fwrite(&byte, sizeof(byte), 1, out_file);
}
Now the contents of the output file will be the same as the input file. Some systems may perform conversion of newline characters, which may make the contents differ because the input file is opened in text mode.
Opening a file in text mode or binary mode has nothing to do with ASCII/binary conversion.
It has to do with how the operating system deals with some special characters (such as new line characters), line size limit or file extensions.
In the fopen Linux man page:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)
For more information about opening a file in text or binary mode, see https://stackoverflow.com/a/20863975/6874310
Now, back to the ASCII conversion:
All the data in a computer is stored in bits so in the end everything is binary.
A text file containing ASCII characters is also a binary file, except its contents can be mapped to the ASCII table characters in a meaningful way.
Have a look at the ASCII table. The ASCII character for the digit zero ('0') has the value 0x30. This means that the zero you see in a text file is actually the byte 0x30 in memory.
Your program is reading data from a file and writing to another file without performing any ASCII/binary conversion.
Also, there is a small error here:
fwrite(in_file, sizeof(char), 1, out_file);
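It probably should be:
fputc(ch, out_file);
This writes the byte held in variable ch to out_file. (Because ch is declared as an int, fwrite(&ch, sizeof(char), 1, out_file) would write the first byte of the int's storage, which is the wanted byte only on little-endian machines.)
With this fix, the program basically reads data from the file bigfile.txt and writes the very same data to the file out_file.bin without any conversion.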
To convert a single-digit ASCII number to binary, read the digit from your input file into an int (so that EOF can be told apart from a data byte) and subtract 0x30 from it:
int ch = fgetc(in_file);
if(ch == EOF)
{
    break;
}
else if (isdigit(ch))
{
    unsigned char value = ch - 0x30; /* 0x30 is the ASCII code of '0' */
    fwrite(&value, sizeof(value), 1, out_file);
}
Now your output file will actually be binary.
Use isdigit to ensure the byte is an ASCII digit. Add #include <ctype.h> at the beginning of your file to use it.
So, for a small input file with the following text:
123
Its binary representation will be:
0x313233
And, after the ASCII numbers are converted to binary, the binary contents will be:
0x010203
To convert it back to ASCII, simply reverse the conversion. That is, add 0x30 to each byte of the binary file.
If you're using a Unix-like system, you can use command line tools such as xxd to check binary files. On Windows, any Hex Editor program will do the job.
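As a rough sketch of the two directions the exercise asks for, the helpers below convert a stream of ASCII digits to raw byte values and back. The function names digits_to_binary and binary_to_digits are invented for this example, and one byte per digit is only one possible reading of the exercise (it could also mean storing whole numbers as int).

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write the value 0..9 of every ASCII digit in `in`
 * as one raw byte to `out`. Other characters, such as newlines, are
 * skipped. Returns the number of bytes written, or -1 on error. */
long digits_to_binary(FILE *in, FILE *out)
{
    int ch;               /* int, so EOF is distinguishable from data */
    long written = 0;

    while ((ch = fgetc(in)) != EOF) {
        if (isdigit(ch)) {
            if (fputc(ch - '0', out) == EOF)
                return -1;
            written++;
        }
    }
    return ferror(in) ? -1 : written;
}

/* Hypothetical helper for the reverse direction: turn each raw byte
 * 0..9 back into its ASCII digit. Returns the number of digits
 * written, or -1 on error or on a byte outside 0..9. */
long binary_to_digits(FILE *in, FILE *out)
{
    int ch;
    long written = 0;

    while ((ch = fgetc(in)) != EOF) {
        if (ch > 9)
            return -1;    /* not a value produced by digits_to_binary */
        if (fputc(ch + '0', out) == EOF)
            return -1;
        written++;
    }
    return ferror(in) ? -1 : written;
}
```

Note that the line breaks of the text file are lost in this round trip, so the regenerated text is one long run of digits.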

C Opening a file to check if it is Binary, if so print it is binary

I've made a program that opens files and searches for a word
I want it to only work on TEXT Files
Is there a way provided by C to check if a file is BINARY, and if so, I want to exit the program before any operations take place
Thanks
No, there isn't, because it's impossible to tell for sure. If you expect a specific encoding, you can check yourself whether the file contents are valid in this encoding, e.g. if you expect ASCII, all bytes must be <= 0x7f. If you expect UTF-8, it's a bit more complicated, see a description of it.
In any case, there's no guarantee that a "binary" file would not by accident look like a valid file in any given text encoding. In fact, the term "binary file" doesn't make too much sense, as all files contain binary data.
If we assume that by text you mean ASCII and not UTF-8, you can do this by reading each character and using isascii() and isspace() to check if it is a valid character:
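#include <ctype.h>
#include <stdio.h>
void is_text(char *filename) {
    FILE *f = fopen(filename, "rb"); /* binary mode: look at the raw bytes */
    if (!f) {
        perror("fopen failed");
        return;
    }
    int c;
    /* The assignment needs its own parentheses before the comparison. */
    while ((c = fgetc(f)) != EOF) {
        /* Non-ASCII bytes and control characters other than whitespace count as binary. */
        if ((!isascii(c) || iscntrl(c)) && !isspace(c)) {
            printf("is binary\n");
            fclose(f);
            return;
        }
    }
    printf("is text\n");
    fclose(f);
}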
If the file contains UTF-8 characters, it becomes more complicated as you have to look at multiple bytes at once and see if they are valid UTF-8 byte sequences. There's also the question of which Unicode code points are considered text.
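To give an idea of what looking at multiple bytes at once involves, here is a minimal sketch of a UTF-8 well-formedness check over a buffer already read from the file. The function name is_valid_utf8 is an assumption for this example. It only checks the encoding (lead and continuation bytes, overlong forms, surrogates, the U+10FFFF limit) and does not decide which code points count as text.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the `len` bytes at `s` are
 * well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t extra, k;      /* number of continuation bytes expected */
        unsigned long cp;     /* decoded code point */
        unsigned long min;    /* smallest code point allowed at this length */

        if (b < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {    /* 110xxxxx: 2-byte sequence */
            extra = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {    /* 1110xxxx: 3-byte sequence */
            extra = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {    /* 11110xxx: 4-byte sequence */
            extra = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;         /* stray continuation or invalid lead byte */
        }

        if (len - i <= extra)
            return 0;         /* sequence is cut off at the end */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;     /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)                       /* overlong encoding */
            return 0;
        if (cp > 0x10FFFF)                  /* beyond the Unicode range */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogate */
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

A check like this still accepts NUL and other control bytes, so it would normally be combined with a test such as the iscntrl() one above.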
It's not the file per se which is binary or text; it is just about how you interpret the content of the file when opening it.
You may interpret a file containing solely text as binary, thereby avoiding that a \r\n might get translated to a \n only. And you may open a file containing raw data like, for example, a bitmap using a text mode, thereby probably corrupting the content in that a 0x0D 0x0A gets converted to a 0x0A only.
So you cannot check the file per se, but you may open the file in binary mode and see if the content contains anything which you do not interpret as text.
perhaps: system("file path/filename"); (this runs the Unix file utility, which guesses the type of the file)

What is the use of `putw` and `getw` function in c?

I want to know the use of the putw() and getw() functions. As far as I know, these are used to write to and read from a file, like putc and getc, but they deal only with integers. But when I use them for writing integers, a different symbol ends up in the file (for example, if I write 65 to the file using putw(), it writes A in the file). Why does it take the ASCII value? I am using Code::Blocks 13.12. Code:
#include <stdio.h>
int main() {
FILE *fp;
int num;
fp = fopen("file.txt", "w");
printf("Enter any number:\n");
scanf("%d", &num);
putw(num, fp);
fclose(fp);
printf("%d\n", num);
return 0;
}
Here is a point-by-point explanation of the getw() and putw() functions.
getw() and putw() are related to FILE handling.
putw() is used to write an integer to a file as raw bytes (sizeof(int) of them), not as text.
getw() is used to read such an integer back from a file.
getw() and putw() are similar to getc() and putc(). The difference is that getw() and putw() read and write a whole int rather than a single character.
int putw(integer, FILE*);
The return type of the function is int.
It takes two arguments: the first, "integer", is the integer you want to write to the file, and the second, "FILE*", identifies the stream the data is written to.
Now let's see an example.
int main()
{
FILE *fp;
fp=fopen("file1.txt","w");
putw(65,fp);
fclose(fp);
}
Here putw() takes the integer (65 in this case) and writes it to the file file1.txt, but if we open the file in a text editor we find 'A'. This is because putw() writes the raw bytes of the int rather than its decimal digits: on a typical little-endian machine the first byte is 65 (0x41), followed by zero bytes.
The editor displays the byte 65 as the character whose ASCII code is 65, which is 'A'; neither the compiler nor putw() converts the number to a character.
int getw(FILE*);
The return type is int.
It takes one argument, the FILE* of the stream from which the integer is read.
In the example below we read back the data that we wrote to the file named file1.txt in the example above.
int main()
{
FILE *fp;
int ch;
fp=fopen("file1.txt","r");
ch=getw(fp);
printf("%d",ch);
fclose(fp);
}
output
65
Explanation: Here we read the data we wrote to file1.txt in the program above and get 65 as the output.
getw() reads sizeof(int) bytes from file1.txt and reassembles them into the int value 65 that putw() stored; it does not read the single character 'A'.
We can also write the above program as:
int main()
{
FILE *fp;
fp=fopen("file1.txt","r");
printf("%d",getw(fp));
fclose(fp);
}
output
65
If num is an int, then putw(num, fp) is equivalent to fwrite(&num, sizeof(int), 1, fp), except for having a different return value. It writes an int to the file in binary format. getw is similar but with fread instead. You can see how glibc implements them: putw,getw.
This means that:
They are not appropriate for writing text. If you want to write a number to a file in human-readable decimal or hexadecimal format, use fprintf instead.
They typically read/write more than one byte (one character) to the file. For instance, on a machine with 32-bit ints, they will read/write four bytes. Attempting to do putw('c', fp) will not simply write the single character 'c'.
They should only be used with files opened in binary mode (if that makes a difference on your system).
You should not expect the contents of the file to be human-readable at all. If you attempt to view the file in an editor, you'll see the representation of whatever bytes are in the file, in your current character set (e.g. ASCII).
You should not expect the file to be successfully read back on another computer that uses a different internal representation for int (e.g. different width, different endianness).
On a typical system with 32-bit little-endian int, putw(65, fp) will result in the four bytes 0x41 0x00 0x00 0x00. The 0x41 (decimal 65) is the ASCII code for the character A, so you'll see that if you view it. The 0x00 bytes may or may not be displayed at all, depending on what you are using to view.
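The difference between the binary form that putw() produces and the human-readable form can be sketched with the standard functions alone. The helper names write_int_binary and write_int_text are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: store `value` as raw bytes, the way putw() does.
 * The file then holds sizeof(int) bytes in the machine's byte order.
 * Returns 0 on success, -1 on failure. */
int write_int_binary(int value, FILE *fp)
{
    return fwrite(&value, sizeof value, 1, fp) == 1 ? 0 : -1;
}

/* Hypothetical helper: store `value` as human-readable decimal text,
 * one number per line. Returns 0 on success, -1 on failure. */
int write_int_text(int value, FILE *fp)
{
    return fprintf(fp, "%d\n", value) < 0 ? -1 : 0;
}
```

With value 65, the first helper produces sizeof(int) bytes (one of them 0x41, shown as A by an editor), while the second produces the three characters '6', '5' and a newline.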
These functions are not a good idea to use in new code. Even if you do need to store binary data in files, which has various disadvantages as noted and should usually only be done if there is a very good reason for it, you should simply use fwrite and fread. getw/putw are a worse option because:
They will make your code less portable. fwrite/fread are part of the ISO C standard, which is the most widely supported cross-platform modern standard for the C language. getw/putw were present in the Single Unix Specification version 2 (SUSv2), which dates to 1997 and is now obsolete. They were not included in the POSIX/SUSv3 specs which superseded SUSv2, and it would be unwise to count on them being available on new systems.
They will make your code less readable. Since fread/fwrite are far more widely used, another programmer reading your code will recognize immediately what they do. Since getw/putw are more obscure, people are likely to have to go and look them up, and the names don't make it easy to remember that they operate specifically on the type int. Readers may also confuse them with the similarly-named ISO-standard functions getwc/putwc.
They may introduce subtle bugs. getw returns EOF on end-of-file or error, but EOF is a valid integer value (often -1). Therefore, if it returns this value, you cannot easily tell whether the file actually contained the integer -1, or whether the read failed. And since this only happens for one particular value, it may be missed in testing. You can check ferror() and feof() to distinguish the two cases, but this is awkward, easy to forget to do, and negates most of the apparent convenience of the "simpler" interface of getw compared to fread.
I speculate that the only reason these functions existed in the first place is that, like putc (respectively getc), putw could be implemented as a macro that directly wrote the buffer of fp and would thus be a little more efficient than calling fwrite. Such an implementation is no longer feasible on modern systems, since it wouldn't be thread-safe, so putw needs a function call anyway. In fact, with glibc in particular, putw just calls fwrite after all, with the overhead of an additional function call, so it's now less efficient. So there is no longer any reason at all to use these functions.
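A small sketch of the fread-based alternative shows how the status can be kept separate from the value, so that a stored -1 is not confused with end-of-file. The helper name read_int and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: read one int that was stored with fwrite().
 * Returns 1 if a value was read into *value, 0 on a clean end-of-file,
 * and -1 on a read error or a truncated record. */
int read_int(FILE *fp, int *value)
{
    /* Read byte-wise so a partial record can be told from a clean EOF. */
    size_t n = fread(value, 1, sizeof *value, fp);

    if (n == sizeof *value)
        return 1;
    if (n == 0 && feof(fp))
        return 0;
    return -1;
}
```

A caller can then loop with while (read_int(fp, &v) == 1) and every int value, including -1, is handled the same way.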
From the man page of putw() and getw()
getw() reads a word (that is, an int) from stream. It's provided for compatibility with SVr4.
putw() writes the word w (that is, an int) to stream. It is provided for compatibility with SVr4.
It is better to use the fread and fwrite functions instead.
getw() reads an integer from the given FILE stream.
putw() writes the integer given in the first argument to the given FILE stream.
getw:
It reads an int from the file, much as getchar() reads a char, but it consumes sizeof(int) bytes at once. If the file contains "hello" and int is four bytes wide, getw() reads the four bytes "hell" and returns them combined into a single int; it does not return the ASCII value of 'h' alone.
putw:
It writes the given int to the file as sizeof(int) raw bytes, much as putchar() writes a single char. A byte that happens to equal an ASCII code shows up as that character when the file is viewed as text.

Read image file hex values and print to file

I am absolutely new to C, learning from a book. I have been searching on the net for how to read and write hex values but I can't find what I am looking for.
Basically I want to read an image file like jpg and write it out to a file verbatim, but my code doesn't do it.
#include<stdio.h>
int main()
{
unsigned long txt;
FILE *myimg, *img;
myimg=fopen("myimg.jpg","w");
fclose(myimg);
myimg=fopen("myimg.jpg","rb+");
img=fopen("img.jpg","rb");txt=0;
printf("Start");
while(!feof(img))
{
if(img==NULL)
{
printf("WTF1");
}
txt=fgetc(img);
fprintf(myimg,"%x",txt);
}
return(0);
}
The output file is different in size and when I look at it in a hex editor, there is no similarity. Can you tell me how it is done?
Assuming CHAR_BIT==8, each char holds a value in the range [0..255] (or [-128..127] if it is signed). This is represented in hex by [0x00..0xff]. Values between 0 and 15 can be represented as hex in a single character; other values will need two characters to represent a single char as hex.
fprintf(myimg,"%x",txt);
writes the value of your char as hex text. For values outside the range [0..15], it'll need to write 2 characters to represent a single char. (e.g. if txt==16, formatting it as hex will write the characters 1 then 0 to file.)
To copy the byte unchanged, you need to use the %c format specifier instead
fprintf(myimg,"%c",(int)txt);
Alternatively, it'd be clearer if you used fputc, which takes the character first and the stream second
fputc(txt,myimg);
What you have written is different from what you read.
You read a char, but you write a long value in text mode, they are quite different.
If you open your output file in a normal text editor and open your input file in a hex editor, they should look the same.
Just use fputc() instead of fprintf, you should get what you want.
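A verbatim copy along these lines could be sketched as follows. The helper name copy_stream is invented for this example; the streams are assumed to have been opened in binary mode ("rb" and "wb") and checked for NULL before the call.

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out` byte for byte.
 * Returns the number of bytes copied, or -1 on an I/O error. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;              /* int, so the byte 0xFF is not mistaken for EOF */
    long copied = 0;

    /* Test the result of fgetc() itself instead of calling feof() first. */
    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```

Looping on the result of fgetc() also avoids writing one extra byte at the end, which a while(!feof(img)) loop does.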

Display special characters in C

I have a requirement where the C code extracts string data from a database and writes it to a file. The string data in the database can contain any kind of characters.
For example, the Description field holds the data "Adj \342\200\223 Data", but when I write it to the file the text comes out as "Adj â Data". The description field can hold any kind of data; my code just reads it, uses strcpy after extracting it from the database, and writes it to a file.
How do I get the data written to the file exactly as it is in the description field?
I think the easiest solution would be writing byte by byte; thanks to buffering the overhead shouldn't matter much:
int pos = 0;
FILE *fp = 0;
//...
fp = fopen("somefile.txt", "w");
//...
while(buffer[pos]) {
    unsigned char c = (unsigned char)buffer[pos++];
    if(c >= 32 && c < 127) // printable ASCII; change bounds for non-printable chars as you like
        fprintf(fp, "%c", c);
    else
        fprintf(fp, "\\%u", c);
}
Edit:
Might have misunderstood your question. Only use string functions when you're actually working with strings. For binary data use binary functions (e.g. the mentioned memcpy()).
Edit 2/3:
Don't print the value as "%d" or "%u"; it should be "%03o" to print it as a zero-padded 3-digit octal number. Using "%o" could be unsafe if other digits follow.
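Taking the edits above into account, the escaping loop might be sketched like this. The function name write_escaped is hypothetical, and escaping the backslash itself is an extra choice made here so that the output stays unambiguous.

```c
#include <stdio.h>

/* Hypothetical helper: write the NUL-terminated `text` to `fp`.
 * Printable ASCII is written as is; every other byte, and the
 * backslash itself, is written as a backslash followed by exactly
 * three octal digits. Returns 0 on success, -1 on a write error. */
int write_escaped(FILE *fp, const char *text)
{
    /* Use unsigned char so bytes above 127 are not seen as negative. */
    const unsigned char *p = (const unsigned char *)text;

    for (; *p != '\0'; p++) {
        int rc;
        if (*p >= 32 && *p < 127 && *p != '\\')
            rc = fputc(*p, fp);
        else
            rc = fprintf(fp, "\\%03o", (unsigned int)*p);
        if (rc < 0)        /* both calls report failure with a negative value */
            return -1;
    }
    return 0;
}
```

If the goal is instead to keep the original bytes untouched, writing the buffer with fwrite() and viewing the file as UTF-8 would be the alternative.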

Resources