Printing Japanese characters in C program - c

I want to print Japanese characters from a C program.
I found the Unicode range of some Japanese characters, converted the bounds to decimal, and used a for loop to print them:
setlocale(LC_ALL, "ja_JP.UTF8");
for (int i = 12784; i <= 12799; i++) {
    printf("%c\n", i);
}
locale.h and wchar.h are included.
The output gives me only ?????????? characters.
Please let me know how it could be resolved.

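%c converts its argument to a single unsigned char, so it cannot produce the multibyte sequence these characters need. For extended characters, use:
printf("%lc\n", (wint_t)i);
or better yet (but do not mix printf and wprintf on the same stream)
wprintf(L"%lc\n", (wint_t)i);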

In addition to @David Ranieri's fine answer, I wanted to explain why the "output gives me only ?????????? characters."
"%c" accepts an int argument. Recall that a char passed to a ... function is converted to an int. Then
the int argument is converted to an unsigned char, and the resulting character is written. C17dr § 7.21.6.1 8.
Thus printf("%c" ... handles values 0-255. Values outside that range are converted into that range.
OP's code below, rewritten in hex.
// for (int i = 12784; i <= 12799; i++) {
for (int i = 0x31F0; i <= 0x31FF; i++) {
    printf("%c\n", i);
}
With OP's locale setting and implementation, printing the values [0xF0 - 0xFF] resulted in '?'. I am confident the same is true of [0x80 - 0xFF] for OP. Other results are possible: I received �.
Had OP done the below, more familiar output would be seen, though not the Hiragana characters desired.
for (int i = 0x3041; i <= 0x307E; i++) {
    printf("%c", i);
}
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Related

C: How to add char to chars, and when the max char is reached have it loop back to 'a'?

I am creating a simple encryption program.
I am adding chars to chars to create a new char.
As of now the new 'char' is often represented by a '?'.
My assumption was that a char variable has a maximum value and that once it is passed, the value loops back to 0.
Assumed logic:
if char a == 1 && char z == 255
then 256 should == a.
This does not appear to be the case.
This snippet adds a char to a char:
for (int i = 0; i < half; ++i) {
    halfM1[i] = halfM1[i] + halfP1[i];
    halfM2[i] = halfM2[i] + halfP2[(half + i)];
}
printf("\n%s\n", halfM1);
printf("%s\n", halfM2);
Returns:
a???
3d??
This snippet removes the added char and the strings go back to normal.
for (int i = 0; i < half; ++i) {
    halfM1[i] = halfM1[i] - halfP1[i];
    halfM2[i] = halfM2[i] - halfP2[(half + i)];
}
printf("\n%s\n", halfM1);
printf("%s\n", halfM2);
Returns:
messagepart1
messagepart2
The code technically works, but I would like the encryption to be in chars.
In case you are wondering why 'half' is everywhere:
the message and key are split in half so that the first and second halves of the message are encrypted separately.
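First of all, there is no guaranteed "wraparound" for plain char. Plain char is typically a signed type on x86, and signed types are not required to wrap around. In halfM1[i] + halfP1[i] both operands are first promoted to int, so the addition itself does not overflow, but converting an out-of-range result back to a signed char is implementation-defined. Additionally, the range of chars can be -128 ... 127, or even something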
For cryptographic purposes you'd want to use unsigned chars, or even better, raw octets with uint8_t (<stdint.h>).
The second problem is that you're printing with %s. One of the possible 256 resulting characters is \0. If this gets into the resulting string, it will terminate the string prematurely. Instead of using %s, you should output it with fwrite(halfM1, buffer_size, 1, stdout). Of course the problem is that the output is still some binary garbage. For this purpose many Unix encryption programs write to a file, or have an option to output an ASCII-armoured file. A simple ASCII armouring would be to output as hex instead of binary.
The third is that there is an operation that is much better than addition/subtraction for cryptographic purposes: XOR, or halfM1[i] = halfM1[i] ^ halfP1[i]; - the beauty of which is that it is its own inverse!

Do char's in C have pre-assigned zero indexed values?

Sorry if my title is a little misleading, I am still new to a lot of this, but:
I recently worked on a small cipher project where the user can give the program an argument at the command line, but it must be alphabetical. (Ex: ./file abc)
This argument is then used in a formula to encipher a message of plain text you provide. I got the code to work, thanks to a friend's help, but I'm not 100% sure about a specific part of this formula.
#include <stdio.h>
#include <cs50.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <ctype.h>
int main (int argc, string argv[])
{
    //Check that exactly one argument (the key) was given
    if (argc != 2)
    {
        printf("Please Submit a Valid Argument.\n");
        return 1;
    }
    //Store the given argument (our key) inside a string var 'k'; it is checked for alpha below
    string k = (argv[1]);
    //Store how long the key is
    int kLen = strlen(k);
    //Tell the user we are checking their key
    printf("Checking key validation...\n");
    //Pause the program for 2 seconds
    sleep(2);
    //Check to make sure the key submitted is alphabetical
    for (int h = 0, strlk = strlen(k); h < strlk; h++)
    {
        if isalpha(k[h])
        {
            printf("Character %c is valid\n", k[h]);
            sleep(1);
        }
        else
        {
            //Telling the user the key is invalid and returning them to the console
            printf("Key is not alphabetical, please try again!\n");
            return 0;
        }
    }
    //Store the user's soon to be enciphered text in a string var 'pt'
    string pt = get_string("Please enter the text to be enciphered: ");
    //A prompt that the encrypted text will display on
    printf("Printing encrypted text: ");
    sleep(2);
    //Encipher Function
    for (int i = 0, j = 0, strl = strlen(pt); i < strl; i++)
    {
        //Get the letter 'key'
        int lk = tolower(k[j % kLen]) - 'a';
        //If the char is uppercase, run the V formula and increment j by 1
        if isupper(pt[i])
        {
            printf("%c", 'A' + (pt[i] - 'A' + lk) % 26);
            j++;
        }
        //If the char is lowercase, run the V formula and increment j by 1
        else if islower(pt[i])
        {
            printf("%c", 'a' + (pt[i] - 'a' + lk) % 26);
            j++;
        }
        //If the char is a symbol just print said symbol
        else
        {
            printf("%c", pt[i]);
        }
    }
    printf("\n");
    printf("Closing Script...\n");
    return 0;
}
The Encipher Function:
Uses 'A' as a char for the placeholder but does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)
In C, character literals like 'A' are of type int, and represent whatever integer value encodes the character A on your system. On the 99.999...% of systems that use ASCII character encoding, that's the number 65. If you have an old IBM mainframe from the 1970s using EBCDIC, it might be something else. You'll notice that the code is subtracting 'A' to make 0-based values.
This does make the assumption that the letters A-Z occupy 26 consecutive codes. This is true of ASCII (A=65, B=66, etc.), but not of all codes, and not guaranteed by the language.
does 'A' hold a zero indexed value automatically? (B = 1, C = 2, ...)
No. Strictly conforming C code cannot depend on any property of the character encoding other than the digits 0-9 being represented consecutively, even though the common ASCII character set does also represent the letters consecutively.
The only guarantee regarding character sets is per 5.2.1 Character sets, paragraph 3 of the C standard:
... the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous...
Character sets such as EBCDIC don't represent letters consecutively.
char is a numeric type that happens to also often be used to represent visible characters (or special non-visible pseudo-characters). 'A' is a value (with actual type int) that can be converted to a char without overflow or underflow. That is, it's really some number, but you usually don't need to know what number, since you generally use a particular char value either as just a number or as just a character, not both.
But this program is using char values in both ways, so it somewhat does matter what the numeric values corresponding to visible characters are. One way it's very often done, but not always, is using the ASCII values which are numbered 0 to 127, or some other scheme which uses those values plus more values outside that range. So for example, if the computer uses one of those schemes, then 'A'==65, and 'A'+1==66, which is 'B'.
This program is assuming that all the lowercase Latin-alphabet letters have numeric values in consecutive order from 'a' to 'z', and all the uppercase Latin-alphabet letters have numeric values in consecutive order from 'A' to 'Z', without caring exactly what those values are. This is true of ASCII, so it will work on many kinds of machines. But there's no guarantee it will always be true!
C does guarantee the ten digit characters from '0' to '9' are in consecutive order, which means that if n is a digit number from zero to nine inclusive, then n + '0' is the character for displaying that digit, and if c is such a digit character, then c - '0' is the number from zero to nine it represents. But that's the only guarantee the C language makes about the values of characters.
For one counter-example, see EBCDIC, which is not in much use now, but was used on some older computers, and C supports it. Its alphabetic characters are arranged in clumps of consecutive letters, but not with all 26 letters of each case all together. So the program would give incorrect results running on such a computer.
Sequentiality is only one aspect of concern.
Proper use of isalpha(ch) is another, not quite implemented properly in OP's code.
isalpha(ch) expects a ch in the range of unsigned char or EOF. With k[h], a char, that value could be negative. Ensure a non-negative value with:
// if isalpha(k[h])
if (isalpha((unsigned char) k[h]))

Printing the value of a 0-initialized array element prints nothing, why?

I have to initialize a char array to 0's. I did it like
char array[256] = {0};
I wanted to check if it worked so I tried testing it
#include <stdio.h>
int main()
{
char s[256] = {0};
printf("%c\n", s[10]);
return 0;
}
After I compile and run it, the command line output shows nothing.
What am I missing ? Perhaps I initialized the array in a wrong manner ?
TL;DR -- %c is the character representation. Use %d to see the decimal 0 value.
Related, from C11, chapter §7.21.6.1 (emphasis mine):
c If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
FYI, see the list of printable values.
That said, for a hosted environment, int main() should be int main(void), at least to conform to the standard.
You are printing s[10] as a character (%c), and the numeric value of s[10] is 0, which represents the character \0, which means end of string and has no textual representation. For this reason you are not seeing anything.
If you want to see the numeric value instead of the character value, use %d to print it as a decimal (integer) number:
printf("%d\n", s[10]);
Note that end of string isn't the same as end of line, as said in one of your comments. End of string means that any string operation over a character sequence must stop when the first \0 arrives. If the character sequence has anything else after \0, it won't be printed, because the string operation stops on the first \0 character.
An end of line, however, is a normal character whose visible effect is to tell the terminal or text editor to print whatever follows it on a new line.
If you want an array full of end of line characters (and to print them as such), you have to walk the array and fill it:
char s[256];
int i;
for (i = 0; i < 256; ++i)
    s[i] = '\n';
printf("%c\n", s[10]);
The ASCII (decimal) value of the end of line character (\n) is 10, so on ASCII systems the following snippet is equivalent:
char s[256];
int i;
for (i = 0; i < 256; ++i)
    s[i] = 10;
printf("%c\n", s[10]);
The following, however, doesn't work (it doesn't print a new line for s[10]):
char s[256] = {'\n'}; // or {10};
printf("%c\n", s[10]);
because the effect of {'\n'} is to assign \n to the first element of the array, and the remaining 255 elements are filled with the value 0, whatever the element type of the array (char[], int[] or anything else). An empty pair of braces {} also sets all the elements to 0, but note that empty braces are standard C only as of C23 (and in C++); earlier C standards require at least one initializer, although GCC and Clang accept {} as an extension.
So, these two statements are equivalent:
char s[256] = {}; // Implicit filling to 0 (C23, C++, or compiler extension).
char s[256] = {0}; // Implicit filling to 0 from the second element.
However, if a local (non-static) array is declared without an initializer:
char s[256];
the array is not initialized, so each element of s holds an indeterminate value until you assign to it, for example in a for loop.
I hope these examples give you the whole picture.

Getting smiley faces instead of 0 and 1s when converting int array to char array

So I have an array of integers. I use a for loop to transfer the contents of the int array into the char array. The problem is that when I output the values, the decimal %d outputs 0s and 1s but the %c outputs a smiley face.
int main()
{
    int array[10] = {0,1,0,1,1,0,1,1,1,0};
    char array2[10];
    int i;
    for(i=0;i<10;i++)
    {
        array2[i] = array[i];
        printf("%c %d\n", array2[i],array2[i]);
    }
}
The smiley faces are symbols for "ASCII" characters 1 and 2 in Microsoft codepage 437; and character 0 is invisible; thus your code performs as expected, but maybe not like you intended.
To fill the char array with the ASCII '0' and '1' characters, you can do
array2[i] = '0' + array[i];
Try this:
array2[i] = array[i] + '0';
This converts 0 or 1 to '0' or '1'
The c conversion specifier prints a character. ASCII values 0 and 1 (I assume you live in the ASCII world) are non-printable control characters. The ASCII values of the '0' and '1' characters are 0x30 and 0x31. The result of printing a non-printable character is implementation-dependent.
What do you think ASCII character 0 or 1 should look like? I'm going to guess that on your system it prints as a smiley face because it is normally unprintable.
Maybe print the character as hex instead so you can see which bits are set, e.g.:
printf("%x", ch & 0xff);
(from the solution in: Printing hexadecimal characters in C)
If you want to print characters as ints, you just need to cast them.
printf("%d\n", (int)array2[i]);

%02x format specifier for char array

I have read about %02x format specifiers but when it comes to an argument of type char array, I am unable to understand the output of the following piece of code:
int main() {
    // your code goes here
    char str[6] = "abcde";
    char t[3];
    snprintf(t,3,"%02x",str);
    printf("\t%s",t);
    return 0;
}
Output:
bf
What I don't understand is how str is interpreted under this format specifier. I expected the output to be "ab" (without quotes).
The point to make here is that if you are printing anything using %02x then you should be using it for each byte. It is common when printing a hash digest to declare a field of size twice the digest size (+1 for \0 if a string) and then populate it with repetitive sprintf() calls.
So one needs to loop through the bytes.
Have a look at the CPlusPlus entry on printf.
I think the format specifier you are looking for is %2.2s, which limits the minimum and maximum number of characters printed to 2, and it will print a string, rather than the value of your pointer.
#include <stdio.h>
int main(void) {
    printf("%2.2s","abcde");
    return 0;
}
This will print "ab" (without the quotes). The same format rules apply to the entire printf family, including snprintf.
%02x is a format specifier that tells the parser that your value is a number, you want it to be printed in base 16, you want there to be at least 2 characters printed, and that any padding it applies should be full of zeroes, rather than spaces. You need to use some version of %s for printing strings.
You should read your source carefully. They might use something like this:
int main() {
    char str[6] = "abcde";
    char t[2*6 + 1] = { 0 };   /* two hex digits per byte, plus the terminating '\0' */
    int i;
    for (i = 0; i <= 5; ++i)
    {
        /* the cast avoids sign extension if char is signed */
        snprintf(t+2*i, sizeof(t)-2*i, "%02x", (unsigned char) str[i]);
    }
    printf("\t%s",t);
    return 0;
}
The %02x is used to convert one character to a hexadecimal string. Therefore you need to access the individual characters of str. You can build a line as my code shows, or you can output each converted string as your fragment shows. But then it doesn't make sense to use the temporary variable t.
Edit: Fixed code.

Resources