I would like to print superscript and subscript with printf, like x¹? - c

I want to print out a polynomial expression in C, but I don't know how to print x to the power of a number with printf.

Unfortunately, it's far from trivial. You cannot achieve what you want with printf; you need wprintf. Furthermore, translating between normal digits and superscripts is not trivial either. You would need a function like this:
wchar_t digit_to_superscript(int d) {
    wchar_t table[] = { // Unicode values
        0x2070,
        0x00B9, // Note that 1, 2 and 3 do not follow the pattern.
        0x00B2, // That's because those three were common in various
        0x00B3, // extended ASCII code pages; the rest did not exist
        0x2074, // before Unicode.
        0x2075,
        0x2076,
        0x2077,
        0x2078,
        0x2079,
    };
    return table[d];
}
This function could of course be changed to handle other characters too, as long as they are supported. And you could also write more complete functions operating on complete strings.
But as I said, it's not trivial, and it cannot be done with simple format strings to printf, and not even to wprintf.
Here is a somewhat working example. It's usable, but kept as short as possible by omitting all error checking and such; it does just enough to handle a negative floating-point number as an exponent.
#include <wchar.h>
#include <locale.h>
wchar_t char_to_superscript(wchar_t c) {
    wchar_t digit_table[] = {
        0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
        0x2075, 0x2076, 0x2077, 0x2078, 0x2079,
    };
    if(c >= '0' && c <= '9') return digit_table[c - '0'];
    switch(c) {
        case '.': return 0x22C5; // dot operator as decimal separator
        case '-': return 0x207B; // superscript minus
    }
    return c; // anything else is left unchanged
}
void number_to_superscript(wchar_t *dest, wchar_t *src) {
    while(*src){
        *dest = char_to_superscript(*src);
        src++;
        dest++;
    }
    *dest = 0; // terminate the destination string
}
And a main function to demonstrate:
int main(void) {
    setlocale(LC_CTYPE, "");
    double x = -3.5;
    wchar_t wstr[100], a[100];

    swprintf(a, 100, L"%f", x);
    wprintf(L"Number as a string: %ls\n", a);
    number_to_superscript(wstr, a);
    wprintf(L"Number as exponent: x%ls\n", wstr);
}
Output:
Number as a string: -3.500000
Number as exponent: x⁻³⋅⁵⁰⁰⁰⁰⁰
In order to make a complete translator, you would need something like this:
size_t superscript_index(wchar_t c) {
    // Code
}

wchar_t to_superscript(wchar_t c) {
    static wchar_t huge_table[] = {
        // Long list of values
    };
    return huge_table[superscript_index(c)];
}
Remember that this cannot be done for all characters. Only those whose counterpart exists as a superscript version.
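As a rough illustration (my own sketch, not part of the original answer), a pair of parallel tables plus a linear search is enough when only a handful of characters matter; it assumes the <wchar.h> header from the example above:
// Illustrative only: covers digits, sign, parentheses and the letter n.
// A real table would list every character you care about.
static const wchar_t plain_chars[] = L"0123456789+-()n";
static const wchar_t super_chars[] = {
    0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
    0x2075, 0x2076, 0x2077, 0x2078, 0x2079,
    0x207A, 0x207B, 0x207D, 0x207E, 0x207F,
};

wchar_t to_superscript(wchar_t c) {
    for (int i = 0; plain_chars[i]; i++)
        if (plain_chars[i] == c)
            return super_chars[i];
    return c; // no superscript counterpart: return the character as-is
}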

Unfortunately, it is not possible to output formatted text with printf.
(Of course one could output HTML format, but this then would need to be fed into an interpreter first for correct display)
So you cannot print text in superscript format in the general case.
What you have found is the superscript 1 as a special character. However, this only exists for 1, 2 and 3 (and only in the right code page, not in plain ASCII).
The common way to print "superscripts" is to use the x^2, x^3 syntax. This is commonly understood.
An alternative is provided by klutt's answer: if you switch to Unicode by using wprintf instead of printf, you can use the superscript characters for all digits 0 to 9. I am not sure how multi-digit exponents look in a fixed-width terminal, but it works in principle.
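For the caret convention, plain printf is all that is needed; a trivial sketch of printing a polynomial that way (my own example, not from the answers above):
#include <stdio.h>

int main(void) {
    double coeff[] = { 1.0, -2.5, 3.0 }; // represents 1 - 2.5x + 3x^2
    for (int i = 0; i < 3; i++)
        printf("%+gx^%d ", coeff[i], i);
    printf("\n"); // prints: +1x^0 -2.5x^1 +3x^2
}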

If you want to print a superscript 1, you need to use Unicode. You can combine Unicode superscripts to write a multi-digit number.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
    setlocale(LC_CTYPE, "");
    wchar_t one = 0x00B9;
    wchar_t two = 0x00B2;

    wprintf(L"x%lc\n", one);
    wprintf(L"x%lc%lc\n", one, two);
}
Output:
$ clang ~/lab/unicode.c
$ ./a.out
x¹
x¹²
Ref: https://www.compart.com/en/unicode/U+00B9

Related

Why can't I print the decimal value of an extended ASCII char like 'Ç' in C?

First, this C project has some coding constraints: I can't declare a variable and assign a value to it on the same line, and we are only allowed to use while loops. Also, I'm using Ubuntu, for reference.
I want to print the decimal ASCII value, character by character, of a string passed to the program. For example, if the input is "rose", the program correctly prints 114 111 115 101. But when I try to print the decimal value of a char like 'Ç', the first char of the extended ASCII table, the program weirdly prints -61 -121. Here is the code:
int main (int argc, char **argv)
{
    int i;

    i = 0;
    if (argc == 2)
    {
        while (argv[1][i] != '\0')
        {
            printf ("%i ", argv[1][i]);
            i++;
        }
    }
}
I did some research and found that I should try unsigned char argv instead of char, like this:
int main (int argc, unsigned char **argv)
{
    int i;

    i = 0;
    if (argc == 2)
    {
        while (argv[1][i] != '\0')
        {
            printf("%i ", argv[1][i]);
            i++;
        }
    }
}
In this case, I run the program with a 'Ç' and the output is 195 135 (still wrong).
How can I make this program print the right decimal value of a char from the extended ASCII table? In this case, a "Ç" should be 128.
Thank you!!
Your platform is using UTF-8 Encoding.
Unicode Latin Capital Letter C with Cedilla (U+00C7) "Ç" encodes to 0xC3 0x87 in UTF-8.
In turn those bytes in decimal are 195 and 135 which you see in output.
Remember UTF-8 is a multi-byte encoding for characters outside basic ASCII (0 thru 127).
That character is code point 128 in extended ASCII, but UTF-8 diverges from extended ASCII in that range.
You may find there are tools on your platform to convert that to extended ASCII, but I suspect you don't want to do that; you should work with the encoding supported by your platform (which I am sure is UTF-8).
It's Unicode code point 199, so unless you have a specific application for extended ASCII, you'll probably just make things worse by converting to it, not least because it's a much smaller set of characters than Unicode.
Here's some information for Unicode Latin Capital Letter C with Cedilla including the UTF-8 Encoding: https://www.fileformat.info/info/unicode/char/00C7/index.htm
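To make the relationship between the bytes and the code point concrete, here is a small sketch of my own (not from the answer above) that decodes a two-byte UTF-8 sequence such as 0xC3 0x87 by hand, assuming the input really is a valid two-byte sequence:
#include <stdio.h>

// A two-byte UTF-8 sequence has the form 110xxxxx 10yyyyyy and encodes
// the code point xxxxxyyyyyy. Real code must validate the byte patterns.
unsigned decode_utf8_2(unsigned char b1, unsigned char b2)
{
    return ((b1 & 0x1Fu) << 6) | (b2 & 0x3Fu);
}

int main(void)
{
    printf("%u\n", decode_utf8_2(0xC3, 0x87)); // prints 199, i.e. U+00C7 'Ç'
}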
There are various ways of representing non-ASCII characters, such as Ç. Your question suggests you're familiar with 8-bit character sets such as ISO-8859, where in several of its variants Ç does indeed have code 199. (That is, if your computer were set up to use ISO-8859, your program probably would have worked, although it might have printed -57 instead of 199.)
But these days, more and more systems use Unicode, which they typically encode using a particular multibyte encoding, UTF-8.
In C, one way to extract wide characters from a multibyte character string is the function mbtowc. Here is a modification of your program, using this function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
int main (int argc, char **argv)
{
    setlocale(LC_CTYPE, "");
    if (argc == 2)
    {
        char *p = argv[1];
        int n;
        wchar_t wc;

        while((n = mbtowc(&wc, p, strlen(p))) > 0)
        {
            printf ("%lc: %d (%d)\n", wc, wc, n);
            p += n;
        }
    }
}
You give mbtowc a pointer to the multibyte encoding of one or more multibyte characters, and it converts one of them, returning it via its first argument — here, into the variable wc. It returns the number of bytes it consumed, or 0 if it encountered the end of the string.
When I run this program on the string abÇd, it prints
a: 97 (1)
b: 98 (1)
Ç: 199 (2)
d: 100 (1)
This shows that in Unicode (just like 8859-1), Ç has the code 199, but it takes two bytes to encode it.
Under Linux, at least, the C library supports potentially multiple multibyte encodings, not just UTF-8. It decides which encoding to use based on the current "locale", which is usually part of the environment, literally governed by an environment variable such as $LANG. That's what the call setlocale(LC_CTYPE, "") is for: it tells the C library to pay attention to the environment and select a locale for the program's functions, like mbtowc, to use.
Unicode is of course huge, encoding thousands and thousands of characters. Here's the output of the modified version of your program on the string "abΣ∫😊":
a: 97 (1)
b: 98 (1)
Σ: 931 (2)
∫: 8747 (3)
😊: 128522 (4)
Emoji like 😊 typically take four bytes to encode in UTF-8.
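A quick way to see those byte counts for yourself (a small sketch of my own, assuming the source file is saved as UTF-8) is to look at the raw string lengths, since strlen counts bytes rather than characters:
#include <stdio.h>
#include <string.h>

int main(void)
{
    printf("%zu\n", strlen("a"));   // 1 byte
    printf("%zu\n", strlen("Ç"));   // 2 bytes
    printf("%zu\n", strlen("∫"));   // 3 bytes
    printf("%zu\n", strlen("😊"));  // 4 bytes
}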

Is there a way to print Runes as individual characters?

Program's Purpose: Rune Cipher
Note - I am linking to my own GitHub page below
(it is only there to show the purpose of the project and what I needed help with - and got help with, thanks once again to all of you!)
Final Edit:
I have now (thanks to the extremely useful answers provided by the extremely amazing people below) completed the project I've been working on; for future readers, I am also providing the full code.
Again, this wouldn't have been possible without all the help I got from the guys below - thanks to them, once again!
Original code on GitHub
Code
(Shortened down a bit)
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
#define UNICODE_BLOCK_START 0x16A0
#define UUICODE_BLOCK_END 0x16F1
int main(){
    setlocale(LC_ALL, "");
    wchar_t SUBALPHA[]=L"ᛠᚣᚫᛞᛟᛝᛚᛗᛖᛒᛏᛋᛉᛈᛇᛂᛁᚾᚻᚹᚷᚳᚱᚩᚦᚢ";
    wchar_t DATA[]=L"hello";
    int lenofData=0;
    int i=0;

    while(DATA[i]!='\0'){
        lenofData++; i++;
    }
    for(int i=0; i<lenofData; i++) {
        printf("DATA[%d]=%lc",i,DATA[i]);
        DATA[i]=SUBALPHA[i];
        printf(" is now Replaced by %lc\n",DATA[i]);
    }
    printf("%ls",DATA);
    return 0;
}
Output:
DATA[0]=h is now Replaced by ᛠ
...
DATA[4]=o is now Replaced by ᛟ
ᛠᚣᚫᛞᛟ
Question continues below
(Note that it's solved, see Accepted answer!)
In Python3 it is easy to print runes:
for i in range(5794,5855):
print(chr(i))
outputs
ᚢ
ᚣ
(..)
ᛝ
ᛞ
How do I do that in C?
using variables (char, char arrays[], int, ...)
Is there a way to e.g print ᛘᛙᛚᛛᛜᛝᛞ as individual characters?
When I try it, it just prints out warnings about a multi-character character constant 'ᛟ'.
I have tried storing them as an array of char, i.e. a "string" (e.g. char s1[] = "ᛟᛒᛓ";),
and then printing out the first (ᛟ) char of s1: printf("%c", s1[0]);. Now, this might seem very wrong to others.
One example of how I thought of going about this:
Print a rune as "an individual character":
To print e.g. 'A':
printf("%c", 65); // 'A'
How do I do that, (if possible) but with a Rune ?
I have also tried printing its numeric value as a char, which results in question marks and other "undefined" results.
As I do not really remember exactly all the things I've tried so far, I will try my best to formulate this post.
If someone spots a very easy (maybe even plain-obvious) solution (or trick/workaround),
I would be super happy if you could point it out! Thanks!
This has bugged me for quite some time.
It works in Python though - and it works (as far as I know) in C if you just "print" it directly (not through any variable), e.g. printf("ᛟ");. But as I said, I want to do the same thing through variables (like char runes[]="ᛋᛟ";) and then printf("%c", runes[0]); // to get 'ᛋ' as the output
(Or similar - it does not need to be %c, nor does it need to be a char array/char variable.) I am just trying to understand how to do the above (hopefully this is not too unreadable).
I am on Linux, and using GCC.
External Links
Python3 Cyphers - At GitHub
Runes - At Unix&Linux SE
Junicode - At Sourceforge.io
To hold a character outside of the 8-bit range, you need a wchar_t (which isn't necessarily Unicode). Although wchar_t is a fundamental C type, you need to #include <wchar.h> to use it, and to use the wide character versions of string and I/O functions (such as putwc shown below).
You also need to ensure that you have activated a locale which supports wide characters, which should be the same locale as is being used by your terminal emulator (if you are writing to a terminal). Normally, that will be the default locale, selected with the string "".
Here's a simple equivalent to your Python code:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
    setlocale(LC_ALL, "");
    /* As indicated in a comment, I should have checked the
     * return value from `putwc`; if it returns EOF and errno
     * is set to EILSEQ, then the current locale can't handle
     * runic characters.
     */
    for (wchar_t wc = 5794; wc < 5855; ++wc)
        putwc(wc, stdout);
    putwc(L'\n', stdout);
    return 0;
}
(Live on ideone.)
Stored on the stack as a string of (wide) characters
If you want to add your runes (wchar_t) to a string, then you can proceed the following way,
filling the buffer one wide character at a time (wcsncpy would be overkill for single chars, thanks chqrlie for noticing):
#define UNICODE_BLOCK_START 0x16A0 // see wikipedia link for the start
#define UUICODE_BLOCK_END 0x16F0 // true ending of Runic wide chars

int main(void) {
    setlocale(LC_ALL, "");
    // One slot per code point in the inclusive range, plus one for the L'\0'.
    wchar_t buffer[UUICODE_BLOCK_END - UNICODE_BLOCK_START + 2];
    int i = 0;

    for (wchar_t wc = UNICODE_BLOCK_START; wc <= UUICODE_BLOCK_END; wc++)
        buffer[i++] = wc;
    buffer[i] = L'\0';
    printf("%ls\n", buffer);
    return 0;
}
About Wide Chars (and Unicode)
To understand a bit better what a wide char is, think of it as a value that can exceed the original range used for a character, which was 2^8 = 256 values (or, written with a left shift, 1 << 8).
That is enough when you just need to print what is on your keyboard, but not when you need to print Asian characters or other exotic characters, which is the reason the Unicode standard was created. You can find more about the many different characters that exist, along with their ranges (named Unicode blocks), on Wikipedia; in your case, Runic.
Range U+16A0..U+16FF - Runic (86 characters), Common (3 characters)
NB: Your UUICODE_BLOCK_END of 0x16F1 is slightly before 0x16FF, but 0x16F1 to 0x16FF are not defined, which is why the code above stops at 0x16F0.
You can use the following function to print your wide char as bits:
void print_binary(unsigned int number)
{
    char buffer[36]; // 32 bits, 3 spaces and one '\0'
    int pos = 0;

    for (int bit = 31; bit >= 0; bit--) {
        buffer[pos++] = '0' + ((number >> bit) & 1u);
        if (bit && bit % 8 == 0)
            buffer[pos++] = ' '; // separate the four bytes
    }
    buffer[pos] = '\0';
    printf("%s\n", buffer);
}
That you call in your loop with:
print_binary((unsigned int)wc);
It will give you a better understanding of how your wide char is represented at the machine level:
ᛞ
00000000 00000000 00010110 11011110
NB: You will need to pay attention to detail: do not forget the final L'\0', and you need to use %ls to get the output with printf.
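To tie this back to the original question, a single rune held in a variable is printed with %lc rather than %c (a small sketch of my own, not part of the answer above):
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");
    wchar_t runes[] = L"ᛋᛟ";      // wide string literal: one element per rune
    wprintf(L"%lc\n", runes[0]);  // prints ᛋ
    return 0;
}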

%s minimum field width in the presence of unicode characters

So, here's my problem:
If someone wants to output visually aligned strings using printf, they'll obviously use %<n>s (where <n> is the minimum field width). And this works just fine, unless one of the strings contains Unicode (UTF-8) characters.
Take this very basic example:
#include <stdio.h>
int main(void)
{
    char* s1 = "\u03b1\u03b2\u03b3";
    char* s2 = "abc";

    printf("'%6s'\n", s1);
    printf("'%6s'\n", s2);
    return 0;
}
which will produce the following output:
'αβγ'
' abc'
This isn't all that surprising, because printf of course doesn't know that \u03b1 (which consists of two bytes in UTF-8) only produces a single glyph on the output device (assuming UTF-8 is supported).
Now assume that I generate s1 and s2, but have no control over the format string used to output those variables. My current understanding is that nothing I could possibly do to s1 would fix this, because I'd have to somehow fool printf into thinking that s1 is shorter than it actually is. However, since I also control s2, my current solution is to add a non-printing character to s2 for each Unicode character in s1, which would look something like this:
#include <stdio.h>
int main(void)
{
    char* s1 = "\u03b1\u03b2\u03b3";
    char* s2 = "abc\x06\x06\x06";

    printf("'%6s'\n", s1);
    printf("'%6s'\n", s2);
    return 0;
}
This will produce the desired output (even though the actual width no longer corresponds to the specified field width, but I'm willing to accept that):
'αβγ'
'abc'
For context:
The example above is only to illustrate the Unicode problem; my actual code involves printing numbers with SI prefixes, only one of which (µ) is a Unicode character. Therefore I would generate strings containing at most one normal or Unicode character (which is why I can accept the resulting offset in the field width).
So, my questions are:
Is there a better solution for this?
Is \x06 (ACK) a sensible choice (i.e. a character without undesired side-effects)?
Can you think of any problems with this approach?
Since the non-ASCII part is restricted to µ, I believe there is a solution. I've taken the value of µ to be \u00b5; replace it with the correct value if yours differs.
I've coded a small function myPrint which takes as input the string and the width n. You should be able to modify the code below to fit your needs.
The function searches for all occurrences of µ and increments the field width by one for each occurrence.
#include <stdio.h>
void myPrint(char* string, int n)
{
    char* valueOfNu = "\u00b5";

    for(int i=0;string[i]!='\0';i++)
    {
        if(string[i]==valueOfNu[0] && string[i+1]==valueOfNu[1])
            n++;
    }
    printf("%*s",n,string);
}

int main(void)
{
    char* s1 = "ab\u00b5";
    char* s2 = "abc";

    myPrint(s1,6);
    printf("\n");
    myPrint(s2,6);
    printf("\n");
    return 0;
}
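A more general variant (my own sketch, not part of the answer above) works for any UTF-8 input, not just µ, by counting continuation bytes (bytes of the form 10xxxxxx): each of them inflates the byte length without adding a visible column, combining marks and double-width glyphs aside:
#include <stdio.h>

// Pad to roughly n display columns by widening the field once per
// UTF-8 continuation byte. Combining characters and double-width
// glyphs are deliberately not handled here.
void printPadded(const char *s, int n)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) == 0x80)
            n++;
    printf("%*s", n, s);
}

int main(void)
{
    printPadded("\u03b1\u03b2\u03b3", 6);
    printf("\n");
    printPadded("abc", 6);
    printf("\n");
    return 0;
}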

+'0' won't give char value of int

I was trying to make this int-to-char program. The +'0' in the do-while loop won't convert the int value to ASCII, whereas the +'0' in main does. I have tried many statements, but it won't work in convert().
#include<stdio.h>
#include<string.h>
void convert(int input,char s[]);
void reverse(char s[]);
int main()
{
    int input;
    char string[5];

    //prcharf("enter int\n");
    printf("enter int\n");
    scanf("%d",&input);
    convert(input,string);
    printf("Converted Input is : %s\n",string);

    int i=54;
    printf("%c\n",(i+'0')); //This give ascii char value of int
    printf("out\n");
}

void convert(int input,char s[])
{
    int sign,i=0;
    char d;

    if((sign=input)<0)
        input=-input;
    do
    {
        s[i++]='0'+input%10; //but this gives int only
    } while((input/=10)>0);
    if(sign<0)
        s[i++]='-';
    s[i]=EOF;
    reverse(s);
}

void reverse(char s[])
{
    int i,j;
    char temp;

    for(i=0,j=strlen(s)-1;i<j;i++,j--)
    {
        temp=s[i];
        s[i]=s[j];
        s[j]=temp;
    }
}
Output screenshot
Code screenshot
The +'0' in the do while loop wont convert the int value to ascii
Your own screenshot shows otherwise (assuming an ASCII-based terminal).
Your code printed 56, so it printed the bytes 0x35 and 0x36, so string[0] and string[1] contain 0x35 and 0x36 respectively, and 0x35 and 0x36 are the ASCII encodings of 5 and 6 respectively.
You can also verify this by printing the elements of string individually.
for (int i=0; string[i]; ++i)
    printf("%02X ", string[i]);
printf("\n");
I tried your program and it is working for the most part. I get some goofy output because of this line:
s[i]=EOF;
EOF is a negative integer macro that represents "End Of File." Its actual value is implementation defined. It appears what you actually want is a null terminator:
s[i]='\0';
That will remove any goofy characters in the output.
I would also make that string in main a little bigger. No reason we couldn't use something like
char string[12];
I would use a bare minimum of 12 which will cover you to a 32 bit INT_MAX with sign.
EDIT
It appears (based on all the comments) you may be actually trying to make a program that simply outputs characters using numeric ascii values. What the convert function actually does is converts an integer to a string representation of that integer. For example:
int num = 123; /* Integer input */
char str_num[12] = "123"; /* char array output */
convert is basically a manual implementation of itoa.
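If that string conversion is all you need, the standard library already covers it; a minimal sketch using snprintf (my own example, not from the answer):
#include <stdio.h>

int main(void)
{
    int num = 123;
    char str_num[12]; // enough for a 32-bit int with sign and '\0'

    snprintf(str_num, sizeof str_num, "%d", num); // str_num now holds "123"
    printf("%s\n", str_num);
}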
If you are trying to simply output characters given ASCII codes, this is a much simpler program. First, you should understand that this code here is a misinterpretation of what convert was trying to do:
int i=54;
printf("%c\n",(i+'0'));
The point of adding '0' previously was to convert single-digit integers to their ASCII character version. For reference, see an ASCII table. For example, if you wanted to convert the integer 4 to the character '4', you would add 4 to '0' (which is ASCII code 48) to get 52, 52 being the ASCII code for the character '4'. To print out the character that is represented by an ASCII code, the solution is much more straightforward. As others have stated in the comments, char is essentially a numeric type already. This will give you the desired behavior:
int i = 102; /* The actual ASCII value of 'f' */
printf("%c\n", i);
That will work, but to be safe that should be cast to type char. Whether or not this is redundant may be implementation defined. I do believe that sending incorrect types to printf is undefined behavior whether it works in this case or not. Safe version:
printf("%c\n", (char) i);
So you can write the entire program in main since there is no need for the convert function:
int main()
{
    /* Make initialization a habit */
    int input = 0;

    /* Loop through until we get a value between 0-127 */
    do {
        printf("enter int\n");
        scanf("%d",&input);
    } while (input < 0 || input > 127);
    printf("Converted Input is : %c\n", (char)input);
}
We don't want anything outside of 0-127. char holds 256 values (one byte) and, when signed, typically spans from -128 to 127. If you wanted a literal interpretation of higher values, you could use unsigned char (0-255). This is undesirable on the Linux terminal, which is likely expecting UTF-8; values above 127 represent portions of multi-byte characters. If you wanted to support this, you would need a char[] and the code would become a lot more complex.
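For what it's worth, a sketch of that more complex route (my own, assuming a UTF-8 locale): read a Unicode code point and let printf's %lc conversion produce the multi-byte encoding.
#include <locale.h>
#include <stdio.h>
#include <wchar.h>  /* for wint_t */

int main(void)
{
    setlocale(LC_CTYPE, ""); /* use the terminal's (UTF-8) locale */

    int input = 0;
    printf("enter code point\n");
    scanf("%d", &input);

    /* %lc converts the wide character to the locale's multi-byte
     * encoding (e.g. UTF-8) before writing it out. */
    printf("Converted Input is : %lc\n", (wint_t)input);
}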

Replace "0x" in hexadecimal string to "\x" in C

I have a C library that requires hexadecimal input of the form "\xFF". I need to pass it an array of hexadecimal values formatted in the "0xFF" form. Is there a way to replace "0x" with "\x" in C?
That sounds like an easy string replacement operation, but I think that's not really what you need.
The notation "\xFF" in a C string means "this string contains the character whose encoded value is 0xFF, i.e. 255 decimal".
So if that's what you mean, then you need to do the compiler's job and replace the incoming "0xFF" text with the single character that has the code 0xFF.
There is no standard function for this, since it's typically done by the compiler.
To implement this, I would write a loop that looks for 0x, and every time it's found, use strtoul() to attempt to convert a number at that location. If the number is too long (e.g. 0xDEAD), you need to figure out how to handle that.
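That loop could look roughly like this (my own sketch, assuming a comma-separated list of 0x values that each fit in one byte):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *s = "0x01,0x0a,0xff";
    unsigned char bytes[16];
    size_t count = 0;

    const char *p = s;
    while ((p = strstr(p, "0x")) != NULL && count < sizeof bytes) {
        char *end;
        unsigned long v = strtoul(p, &end, 16);
        if (v > 0xFF) {
            /* A value like 0xDEAD does not fit in one byte; handle as needed. */
            fprintf(stderr, "value too large: %lu\n", v);
            break;
        }
        bytes[count++] = (unsigned char)v;
        p = end;
    }

    for (size_t i = 0; i < count; i++)
        printf("%u ", bytes[i]); /* prints: 1 10 255 */
    printf("\n");
}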
You can use strstr in order to find the substring "0x" and then replace '0' with '\\':
#include <stdio.h>
#include <string.h>
int main(void)
{
    char s[] = "0x01,0x0a,0x0f";
    char *p = s;

    printf("%s\n", s);
    while (p) {
        p = strstr(p, "0x");
        if (p) *p = '\\';
    }
    printf("%s\n", s);
    return 0;
}
Output:
0x01,0x0a,0x0f
\x01,\x0a,\x0f
But as pointed out by #unwind and #Sathish, that's probably not what you need.
