Display special characters in C

I have a requirement where the C code extracts string data from a database and writes it to a file. The string data in the database can contain any kind of characters.
For example, the Description field has the data "Adj \342\200\223 Data"; when I write it to the file, the text comes out as "Adj â Data". Similarly, this description field can contain any kind of data; my code just reads it, uses strcpy after extracting it from the database, and writes it to a file.
How do I get the data written to the file exactly as it appears in the description field?

I think the easiest solution would be writing byte by byte - it shouldn't matter that much with buffering:
int pos = 0;
FILE *fp = 0;
//...
fp = fopen("somefile.txt", "w");
//...
while (buffer[pos])
{
    if (buffer[pos] >= 32 && buffer[pos] <= 126) // adjust the printable range as you like
        fprintf(fp, "%c", buffer[pos++]);
    else
        fprintf(fp, "\\%03o", (unsigned char)buffer[pos++]);
}
Edit:
Might have misunderstood your question. Only use string functions when you're actually working with strings. For binary data, use binary functions (e.g. memcpy()).
Edit 2/3:
Don't print the value with "%d" or "%u" - it should be "%03o" to print a zero-padded 3-digit octal escape. A bare "%o" would be ambiguous if the next character in the data happens to be a digit.
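For illustration, here is a small self-contained sketch of the same escaping idea applied to the string from the question; write_escaped and the sample string are just for demonstration, not part of the original code:
#include <stdio.h>

static void write_escaped(FILE *fp, const unsigned char *s)
{
    for (size_t i = 0; s[i]; i++)
    {
        if (s[i] >= 32 && s[i] <= 126)
            fputc(s[i], fp);               /* printable ASCII passes through */
        else
            fprintf(fp, "\\%03o", s[i]);   /* everything else as a zero-padded octal escape */
    }
}

int main(void)
{
    /* "Adj \342\200\223 Data" is the description value from the question
     * (an en dash encoded as UTF-8). */
    const unsigned char desc[] = "Adj \342\200\223 Data";
    write_escaped(stdout, desc);           /* prints: Adj \342\200\223 Data */
    putchar('\n');
    return 0;
}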

Related

Should the binary output be the same as the ASCII input?

I'm writing a program that reads an ASCII file and then converts it to a binary file. As I see it, it's not such a hard task, but understanding what's happening behind the scenes is...
As I understand it, an ASCII file is just human-readable text, so if we want to create a new file full of ASCII characters, a simple loop with fputc() would be enough, and for a binary file fwrite() will do the job, right?
So my question here is: once the ASCII to binary conversion is done, what should I see in my .bin file? Should it be filled with symbols like <88><88><88><88><88>?
Code:
/*
* From "Practical C Programming 2nd Edition"
* Exercise 14-4: Write a program that reads an ASCII file containing a list of numbers
* and writes a binary file containing the same list. Write a program that goes the
* other way so that you can check your work.
*
*/
#include <stdio.h>
#include <stdlib.h>
const char *in_filename = "bigfile.txt";
const char *out_filename = "out_file.bin";
int main()
{
    int ch = 0;

    /* ASCII */
    FILE *in_file = NULL;
    in_file = fopen(in_filename, "r");
    if (!in_file)
    {
        fprintf(stderr, "ERROR: Could not open file %s ... ", in_filename);
        exit(EXIT_FAILURE);
    }

    /* Binary */
    FILE *out_file = NULL;
    out_file = fopen(out_filename, "w+b");
    if (!out_file)
    {
        fprintf(stderr, "ERROR: New file %s, could not be created ... ", out_filename);
        exit(EXIT_FAILURE);
    }

    while (1)
    {
        ch = fgetc(in_file);
        if (ch == EOF)
            break;
        else
            fwrite(in_file, sizeof(char), 1, out_file);
    }

    fclose(in_file);
    fclose(out_file);
    return 0;
}
I'm generating the input file with this shell script:
tr -dc "0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt
Any help would be very much appreciated.
Thanks.
fwrite(in_file, sizeof(char), 1, out_file);
is wrong because the input stream pointer is passed where a pointer to the data to be written is expected.
You can use fputc to write one byte like
fputc(ch, out_file);
If you still want to use fwrite for some reason, prepare the data to write and write that like
{
    unsigned char in_file_byte = ch;
    fwrite(&in_file_byte, sizeof(in_file_byte), 1, out_file);
}
Now the contents of the output file will be the same as the input file. Some systems may perform conversion of newline characters, which can make the contents differ, because the input file is opened in text mode.
Opening a file in text mode or binary mode has nothing to do with ASCII/binary conversion.
It has to do with how the operating system deals with some special characters (such as newline characters), line size limits, or file extensions.
In the fopen Linux man page:
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)
For more information about opening a file in text or binary mode, see https://stackoverflow.com/a/20863975/6874310
Now, back to the ASCII conversion:
All the data in a computer is stored in bits so in the end everything is binary.
A text file containing ASCII characters is also a binary file, except its contents can be mapped to the ASCII table characters in a meaningful way.
Have a look at the ASCII table. The ASCII character for the digit zero ('0') has the value 0x30 (decimal 48). This means that the zero you see in a text file is actually stored as the byte 0x30 in memory.
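A quick way to see that mapping for yourself is a throwaway test program like this:
#include <stdio.h>

int main(void)
{
    char c = '0';
    /* prints: '0' is stored as byte 0x30 (48 decimal) */
    printf("'%c' is stored as byte 0x%02X (%d decimal)\n", c, c, c);
    return 0;
}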
Your program is reading data from a file and writing to another file without performing any ASCII/binary conversion.
Also, there is a small error here:
fwrite(in_file, sizeof(char), 1, out_file);
It probably should be:
fwrite(&ch, sizeof(char), 1, out_file);
This writes the byte in variable ch to out_file.
With this fix, the program basically reads data from the file bigfile.txt and writes the very same data to the file out_file.bin without any conversion.
To convert a single ASCII digit to its numeric (binary) value, read the character from your input file and subtract 0x30 (the code for '0') from it:
int ch = fgetc(in_file);
if (ch == EOF)
{
    break;
}
else if (isdigit(ch))
{
    unsigned char byte = (unsigned char)(ch - 0x30);
    fwrite(&byte, sizeof(byte), 1, out_file);
}
Now, your output file will be actually binary.
Use isdigit to ensure the byte is an ASCII digit. Add #include <ctype.h> at the beginning of your file to use it.
So, for a small input file with the following text:
123
Its binary representation will be:
0x313233
And, after the ASCII numbers are converted to binary, the binary contents will be:
0x010203
To convert it back to ASCII, simply reverse the conversion. That is, add 0x30 to each byte of the binary file.
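For completeness, here is a minimal sketch of the program that goes the other way, assuming the .bin file contains one digit value (0x00 to 0x09) per byte as produced above; the file names are only examples and error handling is kept short:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *in = fopen("out_file.bin", "rb");
    FILE *out = fopen("restored.txt", "w");
    if (!in || !out)
    {
        fprintf(stderr, "ERROR: could not open files\n");
        return EXIT_FAILURE;
    }

    int b;
    while ((b = fgetc(in)) != EOF)
    {
        if (b <= 9)
            fputc(b + 0x30, out);   /* digit value back to its ASCII character */
    }

    fclose(in);
    fclose(out);
    return 0;
}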
If you're using a Unix-like system, you can use command line tools such as xxd to check binary files. On Windows, any Hex Editor program will do the job.

C Opening a file to check if it is Binary, if so print it is binary

I've made a program that opens files and searches for a word.
I want it to work only on TEXT files.
Is there a way provided by C to check whether a file is BINARY, and if so, to exit the program before any operations take place?
Thanks
No, there isn't, because it's impossible to tell for sure. If you expect a specific encoding, you can check yourself whether the file contents are valid in this encoding, e.g. if you expect ASCII, all bytes must be <= 0x7f. If you expect UTF-8, it's a bit more complicated, see a description of it.
In any case, there's no guarantee that a "binary" file would not by accident look like a valid file in any given text encoding. In fact, the term "binary file" doesn't make too much sense, as all files contain binary data.
If we assume that by text you mean ASCII and not UTF-8, you can do this by reading each character and using isascii(), iscntrl() and isspace() to check whether it is a valid text character:
void is_text(char *filename) {
    FILE *f = fopen(filename, "r");
    if (!f) {
        perror("fopen failed");
        return;
    }
    int c;
    while ((c = fgetc(f)) != EOF) {
        if ((!isascii(c) || iscntrl(c)) && !isspace(c)) {
            printf("is binary\n");
            fclose(f);
            return;
        }
    }
    printf("is text\n");
    fclose(f);
}
If the file contains UTF-8 characters, it becomes more complicated as you have to look at multiple bytes at once and see if they are valid UTF-8 byte sequences. There's also the question of which Unicode code points are considered text.
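As a rough idea of what that multi-byte check involves, here is a hedged sketch of a structural UTF-8 validity test; it only verifies lead bytes and continuation bytes, not stricter rules such as overlong encodings or surrogate code points:
#include <stdio.h>

int looks_like_utf8(FILE *f)
{
    int c;
    while ((c = fgetc(f)) != EOF)
    {
        int extra;
        if (c < 0x80)                 /* 0xxxxxxx: plain ASCII */
            extra = 0;
        else if ((c & 0xE0) == 0xC0)  /* 110xxxxx: 2-byte sequence */
            extra = 1;
        else if ((c & 0xF0) == 0xE0)  /* 1110xxxx: 3-byte sequence */
            extra = 2;
        else if ((c & 0xF8) == 0xF0)  /* 11110xxx: 4-byte sequence */
            extra = 3;
        else
            return 0;                 /* invalid lead byte */

        while (extra-- > 0)
        {
            c = fgetc(f);
            if (c == EOF || (c & 0xC0) != 0x80)  /* continuation must be 10xxxxxx */
                return 0;
        }
    }
    return 1;
}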
It's not the file per se which is binary or text; it is just about how you interpret the content of the file when opening it.
You may interpret a file containing solely text as binary, thereby avoiding that a \r\n gets translated to a \n only; and you may open a file containing raw data, for example a bitmap, in text mode, thereby probably corrupting the content in that a 0x0D 0x0A pair gets converted to a single 0x0A.
So you cannot check the file per se, but you may open the file in binary mode and see if the content contains anything which you do not interpret as text.
Perhaps: system("file path/filename"); (i.e. shell out to the Unix file command).

C, format file for data of HTTP response

I have no experience with fscanf() and very little with functions for FILE. I have code that correctly determines if a client requested an existing file (using stat() and it also ensures it is not a directory). I will omit this part because it is working fine.
My goal is to send a string back to the client with a HTTP header (a string) and the correctly read data, which I would imagine has to become a string at some point to be concatenated with the header for sending back. I know that + is not valid C, but for simplicity I would like to send this: headerString+dataString.
The code below does seem to work for text files but not images. I was hoping that reading each character individually would solve the problem but it does not. When I point a browser (Firefox) at my server looking for an image it tells me "The image (the name of the image) cannot be displayed because it contains errors.".
This is the code that is supposed to read a file into httpData:
int i = 0;
FILE* file;
file = fopen(fullPath, "r");
if (file == NULL) errorMessageExit("Failed to open file");
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
fclose(file);
printf("httpData = %s\n", httpData);
Edit: This is what I send:
char* httpResponse = malloc((strlen(httpHeader)+strlen(httpData)+1)*sizeof(char));
strcpy(httpResponse, httpHeader);
strcat(httpResponse, httpData);
printf("HTTP response = %s\n", httpResponse);
The data part produces ???? for the image but correct html for an html file.
Images contain binary data. Any of the 256 distinct 8-bit patterns may appear in the image including, in particular, the null byte, 0x00 or '\0'. On some systems (notably Windows), you need to distinguish between text files and binary files, using the letter b in the standard I/O fopen() call (works fine on Unix as well as Windows). Given that binary data can contain null bytes, you can't use strcpy() et al to copy chunks of data around since the str*() functions stop copying at the first null byte. Therefore, you have to use the mem*() functions which take a start position and a length, or an equivalent.
Applied to your code, printing the binary httpData with %s won't work properly; the %s will stop at the first null byte. Since you have used stat() to verify the existence of the file, you also have a size for the file. Assuming you don't have to deal with dynamically changing files, that means you can allocate httpData to be the correct size. You can also pass the size to the reading code. This also means that the reading code can use fread() and the writing code can use fwrite(), saving on character-by-character I/O.
Thus, we might have a function:
int readHTTPData(const char *filename, size_t size, char *httpData)
{
    FILE *fp = fopen(filename, "rb");
    size_t n;
    if (fp == 0)
        return E_FILEOPEN;
    n = fread(httpData, size, 1, fp);
    fclose(fp);
    if (n != 1)
        return E_SHORTREAD;
    fputs("httpData = ", stdout);
    fwrite(httpData, size, 1, stdout);
    putchar('\n');
    return 0;
}
The function returns 0 on success, and some predefined (negative?) error numbers on failure. Since memory allocation is done before the routine is called, it is pretty simple:
Open the file; report error if that fails.
Read the file in a single operation.
Close the file.
Report error if the read did not get all the data that was expected.
Report on the data that was read (debugging only — and printing binary data to standard output raw is not the best idea in the world, but it parallels what the code in the question does).
Report on success.
In the original code, there is a loop:
int i = 0;
...
while(!feof(file)) {
fscanf(file, "%c", &httpData[i]);
i++;
}
This loop has a lot of problems:
You should not use feof() to test whether there is more data to read. It reports whether an EOF indication has been given, not whether it will be given.
Consequently, when the last character has been read, the feof() reports 'false', but the fscanf() tries to read the next (non-existent) character, adds it to the buffer (probably as a letter such as ÿ, y-umlaut, 0xFF, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS).
The code makes no check on how many characters have been read, so it has no protection against buffer overflow.
Using fscanf() to read a single character is a lot of overhead compared to getc().
Here's a more nearly correct version of the code, assuming that size is the number of bytes allocated to httpData.
int i = 0;
int c;
while ((c = getc(file)) != EOF && i < size)
    httpData[i++] = c;
You could check that you get EOF when you expect it. Note that the fread() code does the size checking inside the fread() function. Also, the way I wrote the arguments, it is an all-or-nothing proposition — either all size bytes are read or everything is treated as missing. If you want byte counts and are willing to tolerate or handle short reads, you can reverse the order of the size arguments. You could also check the return from fwrite() if you wanted to be sure it was all written, but people tend to be less careful about checking that output succeeded. (It is almost always crucial to check that you got the input you expected, though — don't skimp on input checking.)
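To make the point about reversing the size arguments concrete, here is a hedged sketch of the byte-count variant; read_up_to is a hypothetical helper, not part of the code above:
#include <stdio.h>

/* With the size arguments swapped, fread() returns the number of bytes read
 * rather than the number of complete blocks, so a short read can be detected
 * or tolerated instead of being all-or-nothing. */
static long read_up_to(const char *filename, char *buf, size_t size)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == 0)
        return -1;
    size_t nread = fread(buf, 1, size, fp);
    fclose(fp);
    return (long)nread;
}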
At some point, for plain text data, you need to think about CRLF vs NL line endings. Text files handle that automatically; binary files do not. If the data to be transferred is image/png or something similar, you probably don't need to worry about this. If you're on Unix and dealing with text/plain, you may have to worry about CRLF line endings (but I'm not an expert on this — I've not done low-level HTTP stuff recently (not in this millennium), so the rules may have changed).
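And here is a hedged sketch of how a caller might put the pieces together, using stat() to size the buffer as described earlier; the path, the error codes E_FILEOPEN/E_SHORTREAD, and the surrounding server code are assumptions for illustration:
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int readHTTPData(const char *filename, size_t size, char *httpData);

int main(void)
{
    const char *fullPath = "index.html";      /* example path only */
    struct stat st;
    if (stat(fullPath, &st) != 0 || !S_ISREG(st.st_mode))
        return 1;

    char *httpData = malloc((size_t)st.st_size);  /* sized from stat(), as suggested above */
    if (httpData == NULL)
        return 1;

    int rc = readHTTPData(fullPath, (size_t)st.st_size, httpData);
    if (rc != 0)
        fprintf(stderr, "read failed: %d\n", rc);

    free(httpData);
    return rc == 0 ? 0 : 1;
}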

Conversion from binary file to hex in C

I am trying to write a simple program for uploading files to my server. I'd like to convert binary files to hex. I have written something, but it does not work properly.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static int bufferSize = 1024;
FILE *source;
FILE *dest;
int n;
int counter;
int main() {
    unsigned char buffer[bufferSize];
    source = fopen("server.pdf", "rb");
    if (source) {
        dest = fopen("file_test", "wb");
        while (!feof(source)) {
            n = fread(buffer, 1, bufferSize, source);
            counter += n;
            strtol(buffer, NULL, 2);
            fwrite(buffer, 1, n, dest);
        }
    }
    else {
        printf("Error");
    }
    fclose(source);
    fclose(dest);
}
I use strtol to convert binary to hex. After invoking this code I still have strange characters in my file_test file.
I'd like to upload a file to a server, for example a PDF file. But first I have to write a program that will convert this file to a hex file. I'd like the length of a line in the hex file to be 1024. After that, I will upload the file line by line with PL/SQL.
EDIT: I completely misunderstood what the OP was aiming for. He wants to convert his pdf file to its hex representation, as I see now, because he wants to put that file in a text blob field in some database table. I still claim the exercise is a complete waste of time, since blobs can contain binary data: that's what blobs were invented for. Blob means binary large object.
You said: "I' d like to upload file on server, for example pdf file. But firstly I have to write a program, that will convert this file to hex file."
You don't have to, and must not, write any such conversion program.
You have to first understand and internalize the idea that hex notation is only an easy-to-read representation of binary. If you think, as you seem to, that you have to "convert" a pdf file to hex, then you are mistaken. A pdf file is a binary file is a binary file. You don't "convert" anything, not unless you want to change the binary!
You must abandon, delete, discard, defenestrate, forget about, and expunge your notion of "converting" any binary file to anything else. See, hex exists only as a human-readable presentation format for binary, each hex digit representing four contiguous binary digits.
To put it another way: hex representation is for human consumption only, unsuitable (almost always) for program use.
For an example: suppose your pdf file holds a four-bit string "1100," whose human-readable hex representation can be 'C'. When you "convert" that 1100 to hex the way you want to do it, you replace it by the ASCII character 'C', whose decimal value is 67. You can see right away that's not what you want to do and you immediately see also that it's not even possible: the decimal value 67 needs seven bits and won't fit in your four bits of "1100".
HTH
Your code is fantastically confused.
It's reading in the data, then doing a strtol() call on it, with a base of 2, and then ignoring the return value. What's the point in that?
To convert the first loaded byte of data to hexadecimal string, you should probably use something like:
char hex[8];
sprintf(hex, "%02x", (unsigned int) buffer[0] & 0xff);
Then write hex to the output file. You need to do this for all bytes loaded, of course, not just buffer[0].
Also, as a minor point, feof() doesn't report end-of-file until after a read has already failed, so testing it before the first read (or as the loop condition) doesn't work. It's better to not use feof() at all and instead check the return value of fread() to detect when it stops returning data.
strtol converts a string containing the textual representation of a number (in the given base) into its numeric value, if I am not mistaken. You probably want the opposite: to turn bytes such as the text OK into 4F 4B... To do that you can use, for example, sprintf(aString, "%x", aChar).
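Putting the pieces together, here is a minimal sketch of the conversion the question seems to be after: a text file of hex digits, broken into lines of 1024 characters as mentioned in the question. The file names are taken from the question and error handling is abbreviated:
#include <stdio.h>

int main(void)
{
    FILE *source = fopen("server.pdf", "rb");
    FILE *dest = fopen("file_test", "w");
    if (!source || !dest)
    {
        printf("Error\n");
        return 1;
    }

    int c;
    long written = 0;
    while ((c = fgetc(source)) != EOF)
    {
        fprintf(dest, "%02X", (unsigned char)c);  /* two hex digits per input byte */
        written += 2;
        if (written % 1024 == 0)                  /* break lines at 1024 characters */
            fputc('\n', dest);
    }

    fclose(source);
    fclose(dest);
    return 0;
}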

Handling special characters in C (UTF-8 encoding)

I'm writing a small application in C that reads a simple text file and then outputs the lines one by one. The problem is that the text file contains special characters like Æ, Ø and Å among others. When I run the program in terminal the output for those characters are represented with a "?".
Is there an easy fix?
First things first:
Read in the buffer
Use libiconv or similar to obtain wchar_t type from UTF-8 and use the wide character handling functions such as wprintf()
Use the wide character functions in C! Most file/output handling functions have a wide-character variant
Ensure that your terminal can handle UTF-8 output. Having the correct locale set up and manipulating the locale data can automate a lot of the file opening and conversion for you ... depending on what you are doing.
Remember that the width of a code-point or character in UTF-8 is variable. This means you can't just seek to a byte and begin reading like with ASCII ... because you might land in the middle of a code point. Good libraries can do this in some cases.
Here is some code (not mine) that demonstrates some usage of UTF-8 file reading and wide character handling in C.
#include <stdio.h>
#include <wchar.h>

int main()
{
    FILE *f = fopen("data.txt", "r, ccs=UTF-8");
    if (!f)
        return 1;
    for (wint_t c; (c = fgetwc(f)) != WEOF;)
        printf("%04X\n", c);
    fclose(f);
    return 0;
}
Links
libiconv
Locale data in C/GNU libc
Some handy info
Another good Unicode/UTF-8 in C resource
Make sure you're not accidentally dropping any bytes; some UTF-8 characters are more than one byte in length (that's sort of the point), and you need to keep them all.
It can be useful to print the contents of the buffer as hex, so you can inspect which bytes are actually read:
static void print_buffer(const char *buffer, size_t length)
{
    size_t i;
    for (i = 0; i < length; i++)
        printf("%02x ", (unsigned char) buffer[i]);  /* unsigned char avoids sign extension for bytes >= 0x80 */
    putchar('\n');
}
You can do this after loading a very short file, containing just a few characters.
Also make sure the terminal is set to the proper encoding, so it interprets your characters as UTF-8.
Probably your text file is ISO-8859-1 encoded but your terminal is UTF-8. This kind of mismatch is a standard problem when dealing with byte-oriented text handling; other C programs (such as the standard ‘cat’ and ‘more’ commands) will do the same thing and it isn't generally considered an error or something that needs to be fixed.
If you want to operate on a Unicode character level instead of bytes that's fine, but you'll need to use wchar as your character type instead of char throughout your program, and provide switches for the user to specify what the incoming file encoding actually is. (Whilst it is sometimes possible to guess, it's not very reliable.)
I don't know if it could help but if you're sure that the encodings of terminal and input file are the same, you can try to setlocale():
#include <locale.h>
…
setlocale(LC_CTYPE, "");
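If the locales match, a minimal sketch that combines setlocale() with the wide-character functions might look like this (the file name is just an example, and the terminal is assumed to be UTF-8):
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");              /* pick up the user's (UTF-8) locale */

    FILE *f = fopen("input.txt", "r");  /* "input.txt" is just an example name */
    if (!f)
        return 1;

    wint_t wc;
    while ((wc = fgetwc(f)) != WEOF)    /* decodes multibyte input according to the locale */
        wprintf(L"%lc", wc);

    fclose(f);
    return 0;
}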
