Binary to UTF-8 in C - c

I am working on an application in C where I need to display Unicode characters encoded as UTF-8. I receive the values as a character array of binary digits, for example 11010000 10100100, which is the UTF-8 encoding of the Unicode character "Ф".
I want to store and display the character. I tried converting the binary digits to a hexadecimal character array and printing it with the following function:
void binaryToHex(char *bData) {
char hexaDecimal[MAX];
int temp;
long int i = 0, j = 0;
while (bData[i]) {
bData[i] = bData[i] - 48;
++i;
}
--i;
while (i - 2 >= 0) {
temp = bData[i - 3] * 8 + bData[i - 2] * 4 + bData[i - 1] * 2 + bData[i];
if (temp > 9)
hexaDecimal[j++] = temp + 55;
else
hexaDecimal[j++] = temp + 48;
i = i - 4;
}
if (i == 1)
hexaDecimal[j] = bData[i - 1] * 2 + bData[i] + 48;
else if (i == 0)
hexaDecimal[j] = bData[i] + 48;
else
--j;
printf("Equivalent hexadecimal value: ");
char hexVal[MAX];
// size_t len = j+1;
int k = 0;;
while (j >= 0) {
char *ch = (char*)hexaDecimal[j--];
if (j % 2 == 0) {
hexVal[k] = '\\';
k++;
hexVal[k] = 'x';
k++;
}
printf("\nkk++Length %d ...J= %d.. ", k, j);
hexVal[k] = ch;
k++;
printf("%c", ch);
}
printf("KKKK+=== %d", k);
hexVal[k] = NULL;
// printf("\nkk++Length %d",strlen(hexVal));
printf("\nMM+-+MM %s===\n ..>>>>", hexVal);
}
This only shows the text \xD0\xA4, which I built with string manipulation.
But when I write it as a literal,
char s[]= "\xD0\xA4";
OR
char *s= "\xD0\xA4";
printf("\n %s",s);
it produces the desired result and prints the character "Ф". How can I get the correct string dynamically? Is there any library for this in C?
The code is from http://www.cquestions.com/2011/07/binary-to-hexadecimal-conversion-in.html.
Is there a way to print it directly from the binary digits or from a hex value? Or is there an alternative?

Escape codes such as \xD0 are interpreted by the compiler when encountered in the value of a character or string literal. The compiler replaces them with the corresponding byte (or byte sequence in some cases). They are not meaningful to C at runtime.
You are therefore not only making it harder on yourself but doing altogether the wrong thing by constructing and printing the text of such escape sequences at runtime. What you get is exactly what you should expect. Just print the literal byte sequence you decode from the program input, without any dress-up.
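As a rough illustration of "print the literal byte sequence", here is a minimal sketch that packs a string of '0'/'1' characters into raw bytes. The name bits_to_bytes and its error convention are my own assumptions, not something from the question.

```c
#include <stddef.h>

/* Hypothetical helper (name and interface are assumptions).
 * Packs a text string of '0'/'1' characters into raw bytes, skipping spaces
 * between groups. Writes at most cap - 1 bytes plus a terminating '\0'.
 * Returns the number of bytes written, or -1 on malformed input or if the
 * output buffer is too small. */
int bits_to_bytes(const char *bits, char *out, size_t cap)
{
    size_t n = 0;        /* bytes written so far */
    unsigned value = 0;  /* bits collected for the current byte */
    int count = 0;       /* how many bits are in value */

    if (cap == 0)
        return -1;

    for (; *bits != '\0'; ++bits) {
        if (*bits == ' ')
            continue;                    /* separator between bytes */
        if (*bits != '0' && *bits != '1')
            return -1;                   /* not a binary digit */
        value = (value << 1) | (unsigned)(*bits - '0');
        if (++count == 8) {
            if (n + 1 >= cap)
                return -1;               /* no room for byte + terminator */
            out[n++] = (char)value;
            value = 0;
            count = 0;
        }
    }
    if (count != 0)
        return -1;                       /* not a whole number of bytes */
    out[n] = '\0';
    return (int)n;
}
```

With char buf[16], calling bits_to_bytes("11010000 10100100", buf, sizeof buf) fills buf with the two bytes 0xD0 0xA4, and printf("%s\n", buf) should then show "Ф" on a terminal that expects UTF-8.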

In the end, what solved my problem for now was converting the UTF-8 binary digits to the actual code point (that is, reducing
11010000 10100100 to the payload bits 10000 100100), converting that to decimal, and then encoding it as UTF-8 again. Below is the link I used for the decimal to UTF-8 conversion.
C++ Windows decimal to UTF-8 Character Conversion
Resources I used:
https://www.youtube.com/watch?v=vLBtrd9Ar28
http://www.zehnet.de/2005/02/12/unicode-utf-8-tutorial/
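For reference, the code point to UTF-8 step can be sketched in plain C as below. This is not the code from the linked answer; utf8_encode is a hypothetical name and the 5-byte output buffer is my assumption.

```c
#include <stddef.h>

/* Hypothetical helper (name and interface are assumptions).
 * Encodes the code point cp as UTF-8 into out, which must have room for at
 * least 5 bytes, and appends a terminating '\0'.
 * Returns the length of the sequence in bytes, or 0 if cp is a surrogate or
 * lies above U+10FFFF. */
size_t utf8_encode(unsigned long cp, char *out)
{
    size_t n;

    if (cp <= 0x7F) {
        out[0] = (char)cp;                          /* 0xxxxxxx */
        n = 1;
    } else if (cp <= 0x7FF) {
        out[0] = (char)(0xC0 | (cp >> 6));          /* 110xxxxx */
        out[1] = (char)(0x80 | (cp & 0x3F));        /* 10xxxxxx */
        n = 2;
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                               /* surrogates are invalid */
        out[0] = (char)(0xE0 | (cp >> 12));         /* 1110xxxx */
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));         /* 11110xxx */
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;                                   /* outside Unicode range */
    }
    out[n] = '\0';
    return n;
}
```

For the example in the question, the payload bits 10000 100100 are 0x424, and utf8_encode(0x424, buf) gives back the original bytes 0xD0 0xA4.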

Related

Padding and adding binary strings in C

I have been tasked with writing a program in C that takes strings of binary digits and adds them. The two strings can be different lengths, so before they are added, they need to be padded to the same size.
//Make copy of strings with proper padding
char *aCopy = (char *)calloc(size+1,sizeof(char));
char *bCopy = (char *)calloc(size+1,sizeof(char));
int i;
for(i = size; i >= 0; i--)
{
aCopy[i] = '0';
bCopy[i] = '0';
}
for (i = strlen(a); i >= 0; i--)
if (i == 0 && a[i]=='1') //Two's complement
aCopy[i] = '1';
else
aCopy[size - i] = a[strlen(a)-i];
for (i = strlen(b); i >= 0; i--)
if (i == 0 && b[i]=='1') //Two's complement
bCopy[i] == '1';
else
bCopy[size - i] = b[strlen(b)-i];
I am having an issue where, if the lines commented "Two's complement" are run (they serve to move the leading 1 digit to the front of the padded string), the length of the padded string becomes one more than it should be. I cannot figure out why this is happening, and it is screwing up my calculations.
Edit: To clarify, an extra '0' char is being added to the end of the two's complement string.
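Reading the loops above, the first one also writes '0' into index size, which overwrites the terminator that calloc provided, and only the else branch at i == 0 puts the '\0' back. When the leading digit is '1' that branch is skipped, so the last slot keeps a '0' and the string is no longer terminated. A minimal sketch of one way to pad without touching the terminator slot follows; pad_left is a hypothetical name, and whether to pad with '0' or with the leading digit is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and interface are assumptions).
 * Returns a newly allocated copy of src, left-padded with the character pad
 * to exactly width characters. Returns NULL if src is longer than width or
 * if allocation fails. The caller must free the result. */
char *pad_left(const char *src, size_t width, char pad)
{
    size_t len = strlen(src);
    char *dst;

    if (len > width)
        return NULL;
    dst = malloc(width + 1);
    if (dst == NULL)
        return NULL;
    memset(dst, pad, width - len);              /* fill the leading positions */
    memcpy(dst + (width - len), src, len + 1);  /* copy digits and the '\0' */
    return dst;
}
```

For unsigned values pass '0' as pad. If the strings are meant as two's-complement values, the usual convention is sign extension, that is, passing the leading digit src[0] as pad, so "101" padded to six digits becomes "111101".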

Why does my conversion method in C continue to fail?

I am trying to convert the output of a popen call to a float. I have tried converting it to a char and then to a float in every way I can find, but the value I see with printf is wrong every time. Printing the raw output with %s shows the correct value, and I have also tried a toString function, but as soon as I try to convert the output I get the wrong result. Should I be trying a different conversion method?
Here is the code.
FILE * uname;
char os[80];
int lastchar;
char n;
uname = popen("sudo python ./return63.py", "r");
lastchar = fread(os, 1, 80, uname);
os[lastchar] = "\0";
n = toString(("%s", os));
printf("THE DIRECT OUTPUT FROM PY IS %s", os);
printf("THE DIRECT OUTPUT For n IS %c", n);
float ia = n - 0;
long p = ia - 0;
float dd = p - 0;
printf("Your OS is %f", dd);
Output from the PY is 'THE DIRECT OUTPUT FROM PY IS 63.0' , which is the correct value,
output from the n is 'THE DIRECT OUTPUT For n IS �'
output from the dd is 'Your OS is Your OS is 236.000000'
The function tostring was pulled from an answered question about how to get the output from another answered question. I have tried with and without this code.
int toString(char a[]) {
int c, sign, offset, n;
if (a[0] == '-') { // Handle negative integers
sign = -1;
}
if (sign == -1) { // Set starting position to convert
offset = 1;
}
else {
offset = 0;
}
n = 0;
for (c = offset; a[c] != '\0'; c++) {
n = n * 10 + a[c] - '0';
}
if (sign == -1) {
n = -n;
}
return n;
}
toString returns an int, so store an int and output an int.
int n = toString(os); // Also removed the obfuscating '("%s", ..)'
printf("THE DIRECT OUTPUT For n IS %d", n);
Also your toString function has undefined behavior because sign might be read without being initialized.
if (a[0] == '-') { // Handle negative integers
sign = -1;
offset = 1;
}
else {
sign = 1;
offset = 0;
}
You have a potential os buffer overflow and you are not doing the null termination of os correctly:
lastchar = fread(os, 1, sizeof(os) - 1, uname); // Only read one byte less
os[lastchar] = '\0'; // changed from string "\0" to char '\0'
And finally you are not checking the input string for digits, you are accepting every input (also the '.' in "63.0"). You might want to stop at the first non-digit character:
for (c = offset; isdigit((unsigned char)a[c]); c++) {
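Since the end goal is a float, another option is to skip the hand-written parser and let the standard library do the conversion. The following is only a sketch: parse_float is a hypothetical wrapper name, while strtof itself is standard C99.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper (name and interface are assumptions).
 * Parses the floating-point number at the start of text, ignoring anything
 * that follows it, such as a trailing newline.
 * Returns 1 and stores the value in *out on success, 0 if no number was
 * found or the value was out of range. */
int parse_float(const char *text, float *out)
{
    char *end;
    float value;

    errno = 0;
    value = strtof(text, &end);
    if (end == text || errno == ERANGE)
        return 0;       /* nothing parsed, or out of range for float */
    *out = value;
    return 1;
}
```

After os has been null-terminated, parse_float(os, &dd) followed by printf("Your OS is %f", dd) should then print the value the script returned.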

Create char array from int in C

How do you put an int into a char array?
int x = 21, i = 3;
char length[4];
while(i >= 0) {
length[i] = (char) (x % 10);
x /= 10;
i--;
} printf("%s\n", length);
// length should now be "0021"
The string comes out blank instead.
Note: This is not a duplicate of "How do I convert from int to chars in C++?" because I also need padding. i.e. "0021" not "21"
You're not getting the character code of the digit, you're using the digit as if it were its own character code. It should be:
length[i] = '0' + (x % 10);
You also need to add an extra element to the length array for the terminating null character:
char length[5];
length[4] = 0;
The problem in your code is basically that 1 != '1': the character is not the integer. You could look up in the ASCII table which code represents the character '1', but you don't really need to know the number; you can just write '1' (note the single quotes).
You also didn't null-terminate your string; you need to add a '\0' at the end of the string, so
int x = 21, i = 3;
char length[5];
length[4] = '\0';
while (i >= 0)
{
length[i--] = x % 10 + '0';
x /= 10;
}
printf("%s\n", length);
should work, but is unnecessary; you can just use
snprintf(length, sizeof(length), "%0*d", padding, x);
/* padding: how many characters you want (4 here) */
Notice that sizeof works because length is a char array; do not confuse that with the length of a string.
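To make the snprintf approach concrete, here is a small sketch wrapped in a function that also reports truncation. The name format_padded and its return convention are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper (name and interface are assumptions).
 * Writes value into buf as decimal digits, zero-padded on the left to at
 * least width digits. size is the full size of buf, including the slot for
 * the terminating '\0'.
 * Returns 1 on success, 0 if the result did not fit or formatting failed. */
int format_padded(char *buf, size_t size, int width, int value)
{
    /* "%0*d" takes the field width from the argument before the value */
    int written = snprintf(buf, size, "%0*d", width, value);

    return written >= 0 && (size_t)written < size;
}
```

With char length[5], the call format_padded(length, sizeof length, 4, 21) leaves "0021" in length, including the terminator.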

Accessing elements in a string?

I have to convert a given binary input (e.g. 1101) to decimal, but the input isn't a string array or an integer (the passed argument is const char *binstr). How am I supposed to access each individual digit of the binary number so I can do pow(x,y) on each and add them together to get the decimal number?
const char * usually refers to a C string. You can just use strtol(3):
int x = strtol(binstr, NULL, 2);
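If you also want to reject input that is not purely binary digits, the strtol call can be wrapped with a few checks. This is a sketch; parse_binary is a hypothetical name and the return convention is my own choice.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper (name and interface are assumptions).
 * Parses binstr as a base-2 number.
 * Returns 1 and stores the value in *out on success, 0 if the string is
 * empty, contains a character that is not a binary digit, or the value
 * does not fit in a long. */
int parse_binary(const char *binstr, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(binstr, &end, 2);
    if (end == binstr || *end != '\0' || errno == ERANGE)
        return 0;       /* no digits, trailing junk, or overflow */
    *out = value;
    return 1;
}
```

For the example input, parse_binary("1101", &v) stores 13 in v.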
You could try with this program which converts from Binary to Decimal
char *binstr = "1011011";
int num = 0, sum = 0, ctr = 0;
ctr = strlen(binstr) - 1;
do{
sum += ((binstr[ctr] & 0x1) << num);
ctr--;
num ++;
}while(ctr >= 0);
binstr[0];
binstr[1];
binstr[2];
etc
or you can do it through a pointer
char* s = binstr;
unsigned long x =0;
while(*s) { x = x << 1; x |= (*s == '1' ? 1:0); s++;}
printf("the decimal of %s is %lu", binstr, x);
You have a C string, and you can access each character the same way as array elements:
input[i]
Here's an example of splitting the binary string into individual bits (characters) and printing them out: http://cfiddle.net/wYtKJv
You can use loops:
while(i<100){
if(binstr[i]== '\0'){
break;
}
printf("First Bit:\n%c\n\n",binstr[i]);
i++;
}
Since C strings are null-terminated, you can check whether the current character is '\0' and break out of the loop when it is.
In the loop you can also convert the chars to ints and store them someplace (array probably) where you can access them for calculations.

Questions about a Kernighan and Ritchie Exercise 2-3

I'm trying to write a program in C that converts hexadecimal numbers to integers. I've successfully written a program that converts octals to integers. However, the problems begin once I start using the letters (a-f). My idea for the program is as follows:
The parameter must be a string that starts with 0x or 0X.
The parameter hexadecimal number is stored in a char string s[].
The integer n is initialized to 0 and then converted as per the rules.
My code is as follows (I've only read up to p37 of K & R so don't know much about pointers) :
/*Write a function htoi(s), which converts a string of hexadecimal digits (including an optional 0x or 0X) into its equivalent integer value. The allowable digits are 0 through 9, a through f, and A through F.*/
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <ctype.h>
int htoi(const char s[]) { //why do I need this to be constant??
int i;
int n = 0;
int l = strlen(s);
while (s[i] != '\0') {
if ((s[0] == '0' && s[1] == 'X') || (s[0] == '0' && s[1] == 'x')) {
for (i = 2; i < (l - 1); ++i) {
if (isdigit(s[i])) {
n += (s[i] - '0') * pow(16, l - i - 1);
} else if ((s[i] == 'a') || (s[i] == 'A')) {
n += 10 * pow(16, l - i - 1);
} else if ((s[i] == 'b') || (s[i] == 'B')) {
n += 11 * pow(16, l - i - 1);
} else if ((s[i] == 'c') || (s[i] == 'C')) {
n += 12 * pow(16, l - i - 1);
} else if ((s[i] == 'd') || (s[i] == 'D')) {
n += 13 * pow(16, l - i - 1);
} else if ((s[i] == 'e') || (s[i] == 'E')) {
n += 14 * pow(16, l - i - 1);
} else if ((s[i] == 'f') || (s[i] == 'F')) {
n += 15 * pow(16, l - i - 1);
} else {
;
}
}
}
}
return n;
}
int main(void) {
int a = htoi("0x66");
printf("%d\n", a);
int b = htoi("0x5A55");
printf("%d\n", b);
int c = htoi("0x1CA");
printf("%d\n", c);
int d = htoi("0x1ca");
printf("%d\n", d);
}
My questions are:
1. If I don't use const in the argument for htoi(s), i get the following warnings from the g++ compiler :
2-3.c: In function ‘int main()’: 2-3.c:93:20: warning: deprecated
conversion from string constant to ‘char*’ [-Wwrite-strings]
2-3.c:97:22: warning: deprecated conversion from string constant to
‘char*’ [-Wwrite-strings] 2-3.c:101:21: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings] 2-3.c:105:21:
warning: deprecated conversion from string constant to ‘char*’
[-Wwrite-strings]
Why is this?
2.Why is my program taking so much time to run? I haven't seen the results yet.
3.Why is it that when I type in cc 2-3.c instead of g++ 2-3.c in the terminal, I get the following error message:
"undefined reference to `pow'"
on every line that I've used the power function?
4. Please do point out other errors/ potential improvements in my program.
If I don't use const in the argument for htoi(s), i get the following warnings from the g++ compiler
The const parameter should be there, because it is regarded as good and proper programming to never typecast away const from a pointer. String literals "..." should be treated as constants, so if you don't have const as parameter, the compiler thinks you are casting away the const qualifier.
Furthermore, you should declare as const every pointer parameter whose contents you don't intend to modify. Google the term "const correctness".
Why is my program taking so much time to run? I haven't seen the results yet.
I think mainly because you have made an initialization goof-up: int i; leaves i containing rubbish, and the loop then evaluates while (s[rubbish_value] != '\0'). This function can be written a whole lot better too. Start by checking for the 0x at the start of the string; if it isn't there, signal some error (return -1?), otherwise discard it. Then start one single loop after that; you don't need 2 loops.
Note that the pow() function works on floating-point numbers, which will make your program slightly slower. You could consider using an integer-only version. Unfortunately there is no such function in standard C, so you will have to find one elsewhere.
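As an illustration of what such an integer-only version could look like, here is a minimal sketch using exponentiation by squaring. The name ipow is an assumption (it is not a standard function), and the sketch does no overflow checking.

```c
/* Hypothetical helper (not part of standard C).
 * Returns base raised to the power exp using integer arithmetic only.
 * No overflow check: results that do not fit in unsigned long wrap around. */
unsigned long ipow(unsigned long base, unsigned exp)
{
    unsigned long result = 1;

    while (exp > 0) {
        if (exp & 1u)
            result *= base;  /* this bit of the exponent is set */
        base *= base;        /* square the base for the next bit */
        exp >>= 1;
    }
    return result;
}
```

In the question's code, pow(16, l - i - 1) could then be written as ipow(16, l - i - 1), which also avoids linking the math library.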
Also consider the function isxdigit(), a standard function in ctype.h, which checks for digits 0-9 as well as hex letters A-F or a-f. It may however not help with performance, as you will need to perform different calculations for digits and letters.
For what it is worth, here is a snippet showing how you can convert a single char to a hexadecimal int. It is not the most optimized version possible, but it takes advantage of available standard functions, for increased readability and portability:
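#include <ctype.h>
#include <stdint.h>
uint8_t hexchar_to_int (char ch)
{
uint8_t result;
if(isdigit((unsigned char)ch))
{
result = ch - '0';
}
else if (isxdigit((unsigned char)ch))
{
result = toupper((unsigned char)ch) - 'A' + 0xA;
}
else
{
result = 0xFF; // error: not a hexadecimal digit
}
return result;
}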
Don't use a C++ compiler to compile a C program. That's my first advice to you.
Secondly const in a function parameter for a char * ensures that the programmer doesn't accidentally modify the string.
Thirdly you need to include the math library with -lm as stated above.
a const char[] means that you cannot change it in the function. Casting from a const to not-const gives a warning. There is much to be said about const. Check out its Wikipedia page.
--
Probably, cc doesn't link the right libraries. Try the following build command: cc 2-3.c -lm
Improvements:
Don't use pow(), it is quite expensive in terms of processing time.
Use the same trick with the letters as you do with the numbers to get the value, instead of using fixed 'magic' numbers.
You don't need the last else part. Just leave it empty (or put an error message there, because those characters aren't allowed).
Good luck!
Regarding my remark about the pow() call: using the hexchar_to_int() function described above, this is how I'd implement it (without error checking):
const char *t = "0x12ab";
int result = 0;
size_t i;
for (i = 2; t[i] != '\0'; i++) {
/* shift the digits collected so far up by one hex digit, then add the new one */
result = (result << 4) | hexchar_to_int(t[i]);
}
I just worked through this exercise myself, and I think one of the main ideas was to use the knowledge that chars can be compared as integers (they talk about this in chapter 2).
Here's my function for reference. Thought it may be useful as the book doesn't contain answers to exercises.
int htoi(char s[]) {
int i = 0;
if(s[i] == '0') {
++i;
if(s[i] == 'x' || s[i] == 'X') {
++i;
}
}
int val = 0;
while (s[i] != '\0') {
val = 16 * val;
if (s[i] >= '0' && s[i] <= '9')
val += (s[i] - '0');
else if (s[i] >= 'A' && s[i] <= 'F')
val += (s[i] - 'A') + 10;
else if (s[i] >= 'a' && s[i] <= 'f')
val += (s[i] - 'a') + 10;
else {
printf("Error: number supplied not valid hexadecimal.\n");
return -1;
}
++i;
}
return val;
}
Always initialize your variables (int i = 0); otherwise i will contain a garbage value, which could be any number and not necessarily 0 as you expect. Your while statement runs as an infinite loop, which is why it takes forever to get the results; print i to see why. Also add a break if the string doesn't start with 0x, which avoids the same loop issue when the function is used on an arbitrary string. As others mention, you need to link the library containing the pow function and declare your string parameter as const to get rid of the warning.
This is my version of the program for the question above. It converts a string of hex digits into its decimal value, with or without the optional prefix (0x or 0X).
The main library functions used are strlen(s), isdigit(c), isxdigit(c), toupper(c) and pow(m,n).
Suggestions to improve the code are welcome :)
/*Program - 5d Function that converts hex(s)into dec -*/
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h> //Declares isdigit, isxdigit and toupper
#include<math.h> //Declares mathematical functions and macros
#include<string.h> //Refer appendix in Page 249 (very useful)
#define HEX_LIMIT 10
int hex_to_dec(char hex[]) //Function created by me :)
{
int dec = 0; //Initialization of decimal value
int size = strlen(hex); //To find the size of hex array
int temp = size-1 ; //Pointer pointing the right element in array
int loop_limit = 0; //To exclude '0x' or '0X' prefix in input
if(hex[0]=='0' && ((hex[1]=='x') || (hex[1]=='X')))
loop_limit = 2;
while(temp>=loop_limit)
{
int hex_value = 0; //Temporary value to hold the equivalent hex digit in decimal
if(isdigit(hex[temp]))
hex_value = (hex[(temp)]-'0') ;
else if(isxdigit(hex[temp]))
hex_value = (toupper(hex[temp])-'A' + 10);
else{
printf("Error: Number supplied is not a valid hex\n\n");
return -1;
}
dec += hex_value * pow(16,(size-temp-1)); //Computes equivalent dec from hex
temp--; //Moves the pointer to the left of the array
}
return dec;
}
int main()
{
char hex[HEX_LIMIT];
printf("Enter the hex no you want to convert: ");
scanf("%9s",hex); //Reads at most HEX_LIMIT-1 characters so hex cannot overflow
printf("Converted no in decimal: %d\n", hex_to_dec(hex));
return 0;
}

Resources