%s minimum field width in the presence of unicode characters - c

So, here's my problem:
If someone wants to output visually aligned strings using printf, they'll obviously use %<n>s (where <n> is the minimum field width). This works just fine, unless one of the strings contains Unicode (UTF-8) characters.
Take this very basic example:
#include <stdio.h>
int main(void)
{
char* s1 = "\u03b1\u03b2\u03b3";
char* s2 = "abc";
printf("'%6s'\n", s1);
printf("'%6s'\n", s2);
return 0;
}
which will produce the following output:
'αβγ'
'   abc'
This isn't all that surprising, because printf of course doesn't know that \u03b1 (which is encoded as two bytes, i.e. two chars, in UTF-8) produces only a single glyph on the output device (assuming UTF-8 is supported).
Now assume that I generate s1 and s2, but have no control over the format string used to output those variables. My current understanding is that nothing I could possibly do to s1 would fix this, because I'd have to somehow fool printf into thinking that s1 is shorter than it actually is. However, since I also control s2, my current solution is to add a non-printing character to s2 for each Unicode character in s1, which would look something like this:
#include <stdio.h>
int main(void)
{
char* s1 = "\u03b1\u03b2\u03b3";
char* s2 = "abc\x06\x06\x06";
printf("'%6s'\n", s1);
printf("'%6s'\n", s2);
return 0;
}
This will produce the desired output (even though the actual width no longer corresponds to the specified field width, but I'm willing to accept that):
'αβγ'
'abc'
For context:
The example above only illustrates the Unicode problem. My actual code prints numbers with SI prefixes, only one of which (µ) is a Unicode character. Therefore each string I generate contains at most one prefix character, either a normal or a Unicode one (which is why I can accept the resulting offset in the field width).
So, my questions are:
Is there a better solution for this?
Is \x06 (ACK) a sensible choice (i.e. a character without undesired side-effects)?
Can you think of any problems with this approach?

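Since the non-ASCII characters are restricted to µ, I believe there is a solution. I've taken µ to be U+00B5 MICRO SIGN (\u00b5); if your strings use U+03BC GREEK SMALL LETTER MU instead, replace it with that value.
I've coded a small function myPrint which takes the string and the width n as input. You should be able to modify the code below to fit your needs.
The function counts the occurrences of µ and widens the field by one for each, since µ takes two bytes but only one column on screen.
#include <stdio.h>
/* Print `string` right-justified in a field of n columns, widening the
 * field by one for every µ (two bytes in UTF-8, one column on screen). */
void myPrint(const char* string, int n)
{
    const char* valueOfMu = "\u00b5";   /* µ: bytes 0xC2 0xB5 in UTF-8 */
    for (int i = 0; string[i] != '\0'; i++)
    {
        /* string[i+1] is at worst the terminator, so this read is safe */
        if (string[i] == valueOfMu[0] && string[i+1] == valueOfMu[1])
            n++;
    }
    printf("%*s", n, string);
}
int main(void)
{
    const char* s1 = "ab\u00b5";
    const char* s2 = "abc";
    myPrint(s1, 6);
    printf("\n");
    myPrint(s2, 6);
    printf("\n");
    return 0;
}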

Related

Is there a way to print Runes as individual characters?

Program's Purpose: Rune Cipher
Note: I am linking to my own GitHub page below
(only to show the purpose of the program, i.e. what I needed help with; and I got help, thanks once again to all of you!)
Final Edit:
I have now completed the project I've been working on (thanks to the extremely useful answers provided by the extremely amazing people here), and for future readers I am also providing the full code.
Again, this wouldn't have been possible without all the help I got from the people below; thanks to them once again!
Original code on GitHub
Code
(Shortened down a bit)
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
#define UNICODE_BLOCK_START 0x16A0
#define UNICODE_BLOCK_END 0x16F1
int main(void){
    setlocale(LC_ALL, "");
    wchar_t SUBALPHA[]=L"ᛠᚣᚫᛞᛟᛝᛚᛗᛖᛒᛏᛋᛉᛈᛇᛂᛁᚾᚻᚹᚷᚳᚱᚩᚦᚢ";
    wchar_t DATA[]=L"hello";
    size_t lenofData = wcslen(DATA);   // length in wide chars, without L'\0'
    for (size_t i = 0; i < lenofData; i++) {
        printf("DATA[%zu]=%lc", i, (wint_t)DATA[i]);
        DATA[i] = SUBALPHA[i];         // the i-th rune replaces the i-th letter
        printf(" is now Replaced by %lc\n", (wint_t)DATA[i]);
    }
    printf("%ls\n", DATA);
    return 0;
}
Output:
DATA[0]=h is now Replaced by ᛠ
...
DATA[4]=o is now Replaced by ᛟ
ᛠᚣᚫᛞᛟ
Question continues below
(Note that it's solved, see Accepted answer!)
In Python3 it is easy to print runes:
for i in range(5794,5855):
    print(chr(i))
outputs
ᚢ
ᚣ
(..)
ᛝ
ᛞ
How do I do that in C,
using variables (char, char arrays[], int, ...)?
Is there a way to, e.g., print ᛘᛙᛚᛛᛜᛝᛞ as individual characters?
When I try it, the compiler just prints warnings about a multi-character character constant 'ᛟ'.
I have tried storing them as an array of char, i.e. a "string" (e.g. char s1[] = "ᛟᛒᛓ";)
And then print out the first (ᛟ) char of s1: printf("%c", s1[0]); Now, this might seem very wrong to others.
One Example of how I thought of going with this:
Print a rune as "an individual character":
To print e.g 'A'
printf("%c", 65); // 'A'
How do I do that (if possible) with a rune?
I have also tried printing its numeric value as a char, which results in question marks and other "undefined" results.
As I do not really remember exactly all the things I've tried so far, I will try my best to formulate this post.
If someone spots a very easy (maybe, to him/her, even plain-obvious) solution (or trick/workaround),
I would be super happy if you could point it out! Thanks!
This has bugged me for quite some time.
It works in Python, though, and (as far as I know) it works in C if you just print it directly rather than through a variable, e.g. printf("ᛟ"); works. But as I said, I want to do the same thing through variables (like char runes[]="ᛋᛟ";) and then: printf("%c", runes[0]); // to get 'ᛋ' as the output
(Or similar; it does not need to be %c, and it does not need to be a char array/char variable.) I am just trying to understand how to do the above (hopefully this is not too unreadable).
I am on Linux, and using GCC.
External Links
Python3 Cyphers - At GitHub
Runes - At Unix&Linux SE
Junicode - At Sourceforge.io
To hold a character outside of the 8-bit range in a single variable, you need a wchar_t (whose encoding isn't necessarily Unicode, although with glibc on Linux it holds Unicode code points). In C, wchar_t is a typedef rather than a built-in type as in C++; #include <wchar.h> to get it along with the wide-character versions of the string and I/O functions (such as putwc, shown below).
You also need to ensure that you have activated a locale which supports wide characters, which should be the same locale as is being used by your terminal emulator (if you are writing to a terminal). Normally, that will be the default locale, selected with the string "".
Here's a simple equivalent to your Python code:
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
int main(void) {
setlocale(LC_ALL, "");
/* As indicated in a comment, I should have checked the
 * return value from `putwc`; if it returns WEOF and errno
 * is set to EILSEQ, then the current locale can't handle
 * runic characters.
 */
for (wchar_t wc = 5794; wc < 5855; ++wc)
putwc(wc, stdout);
putwc(L'\n', stdout);
return 0;
}
(Live on ideone.)
Stored on the stack as a string of (wide) characters
If you want to store your runes (wchar_t) in a string, you can proceed as follows.
A simple loop is enough (wcsncpy would be overkill for single characters; thanks chqrlie for noticing):
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#define UNICODE_BLOCK_START 0x16A0 // see the Wikipedia link for the start
#define UNICODE_BLOCK_END   0x16F8 // last assigned code point of the Runic block (Unicode 7.0+)
int main(void) {
    setlocale(LC_ALL, "");
    // one slot per code point in the inclusive range, plus one for L'\0'
    wchar_t buffer[UNICODE_BLOCK_END - UNICODE_BLOCK_START + 2];
    int i = 0;
    for (wchar_t wc = UNICODE_BLOCK_START; wc <= UNICODE_BLOCK_END; wc++)
        buffer[i++] = wc;
    buffer[i] = L'\0';
    printf("%ls\n", buffer);
    return 0;
}
About Wide Chars (and Unicode)
To understand what a wide char is, think of it as a value with more bits than the original char, whose range was 2^8 = 256 values (or, with a left shift, 1 << 8).
That range is enough to print what is on your keyboard, but not for Asian characters or other non-Latin scripts, which is why the Unicode standard was created. You can find more about the very different and exotic characters that exist, grouped into ranges called Unicode blocks, on Wikipedia; in your case, Runic.
Range U+16A0..U+16FF - Runic (86 characters), Common (3 characters)
NB: The assigned Runic characters end at U+16F8 (U+16F1 to U+16F8 were added in Unicode 7.0, so older fonts may lack them); U+16F9 to U+16FF are unassigned.
You can use the following function to print your wide char as bits:
#include <stdio.h>
/* Print the 32 low-order bits of `number`, most significant bit first,
 * with a space between bytes (assumes unsigned int has at least 32 bits). */
void print_binary(unsigned int number)
{
    char buffer[36]; // 32 bits, 3 spaces and one '\0'
    int pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        buffer[pos++] = ((number >> bit) & 1u) ? '1' : '0';
        if (bit % 8 == 0 && bit != 0)
            buffer[pos++] = ' ';
    }
    buffer[pos] = '\0';
    printf("%s\n", buffer);
}
which you can call in your loop with:
print_binary((unsigned int)wc);
It will give you a better understanding of how your wide char is represented at the machine level, e.g. for ᛞ (U+16DE):
ᛞ
00000000 00000000 00010110 11011110
NB: You will need to pay attention to detail: Do not forget the final L'\0' and you need to use %ls to get the output with printf.

How to pad two strings in printf?

For example:
printf("%-10s%s\n", s1, s2);
I can get:
s1        s2
I want to have s3 at a fixed column when the earlier strings might have varying lengths:
printf("%s%s%s\n", s1, s2, s3); // how to pad s3 to column 10?
s1s2      s3
s11s22    s3
s111s222  s3
You will have to use a little bit of logic to work out how many spaces to print, e.g.:
int s3_column = 15; // example position
int length = printf("[%s%s]", s1, s2);
if ( length >= 0 && length < s3_column )
printf("%*s", (int)(s3_column - length), "");
printf("%s\n", s3);
As suggested in comments, another possible approach would be to prepare the [s1s2] part in its own buffer, but that requires extra memory and incurs all the potential problems associated with memory allocation, and will end up being more complicated code than calculating the spaces as in my example.
void print_with_indent(int indent, char * string) { printf("%*s%s", indent, "", string); }
Give each %s an n.m modifier (%n.ms) to place each string at a known starting point (the precision m also prevents overruns into the next string's location). Using the same value for n and m makes each field exactly that wide, so make sure the sum of the widths places the third string exactly where you want it to begin.
--OR--
use an ANSI escape sequence after the second output conversion specifier to move the cursor to the column where the third string is to begin
--OR--
use an ncurses window and call move() (or mvprintw()) to place the cursor where you want the third string to begin
--OR--
if using the conio.h library from Borland, use:
gotoxy()
--OR--
on Windows:
#include <windows.h>
void SetPosition(int X, int Y)
{
HANDLE Screen;
Screen = GetStdHandle(STD_OUTPUT_HANDLE);
COORD Position={X, Y};
SetConsoleCursorPosition(Screen, Position);
}

copying a substring to a string

I'm trying to get 2 strings from the user; the second one is the "needle" to find in the first string and copy back in upper case.
for example:
string 1 (user input): eight height freight
string 2 (user input): eight
output: EIGHT hEIGHT frEIGHT
Another example: I want to print toDAY is a good DAY
I'm having trouble copying multiple needles in the same string.
I have tried using while (*str) {rest of the function with str++}
I would love some explanation.
#define _CRT_SECURE_NO_WARNINGS
#define N 101
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
void replaceSubstring(char* str, char* substr);
void main() {
int flag = 1;
char str[N], substr[N];
//char* str_ptr = &str, * substr_ptr = &substr; //creating pointer for the sake of while
while (flag) {
printf("\nEnter main text: ");
gets_s(str,N);
if (!str)
flag = 0;
printf("\nEnter sub-text: ");
gets_s(substr,N);
if (!str)
flag = 0;
replaceSubstring(str, substr);
printf("%s",str);
}
printf("\nExited. (press any key to exit)");
}
void replaceSubstring(char* str, char* substr) {
int lensbstr;
str = strstr(str, substr);
_strupr(substr); //cnvrt to UPPERCASE
lensbstr = strlen(substr); //length of the mutual string
if (str)
strncpy(str, substr, lensbstr);
}
This looks like a programming exercise, so I’m not going to just give you the answer. However, I’ll give you a few hints.
Two big problems:
You don’t have a loop that would replace the second and later instances.
You are upper-casing the substring... not a copy of the substring. A second pass through replaceSubstring would only match the upper-case version of the substring.
A couple of small problems / style comments:
str is an array, so its value is always non-zero, so “if(!str)” is never true.
strncpy is almost never the right answer. It will work here, but you shouldn’t get in the habit of using it. Its behavior is subtle and is rarely what you want. Here it would be faster and more obvious to use memcpy.
You are upper-casing the substring and measuring its length even if you didn’t find it and so won’t need those results.
Although using int for flags works and is the traditional way, newer versions of the language have stdbool.h, the “bool” type, and the “true” and “false” constants. Using those is almost always better.
You appear to intend to stop when the user enters an empty string for the first string. So why do you ask for the second string in that case? It seems like you want an infinite loop and a “break” in the middle.

How to check if input in valid - by comparing strings in C

I'm making a calc function which is meant to check whether the input is valid. So I'll have 2 strings: one with what the user inputs (e.g. 3+2-1, or maybe dog, which would be invalid), and one with the ALLOWED characters, e.g. '123456789/*-+.^'.
I'm not sure how to do this and am having trouble getting started. I know a few functions such as strcmp and the popular ones from string.h, but I have no idea how to use them to check every input character.
What is the simplest way to do this?
One way of proceeding is the following.
A string is an array of character codes (ASCII on most systems). So if your string is
char formula[50];
then you have a loop
int n = 0;
while (formula[n] != 0)
{
    if ( /* <<your code here>>: formula[n] is not an allowed character */ )
    {
        printf("invalid entry\n\n");
        return -1; // -1 = error code
    }
    n++;
}
You need to put the logic into the if condition; the loop lets you test the code of each character in turn.
There may be a more elegant way of solving this, but this will work if you put the correct conditional statement there to check the code of each character.
The while statement checks whether you have reached the end of the string (the terminating 0).
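Here's a demonstration of how to use strspn() to check that all characters in a string are in your chosen set. (strpbrk() only tells you whether any character of the string is in the set, so an input like "3+x" would wrongly pass.)
#include <string.h>
#include <stdio.h>
const char alphabet[] = "0123456789/*+-=.^";
int main(void) {
    const char a[] = "3+2-1";
    const char b[] = "dog";
    /* strspn returns the length of the leading run of characters that are
       all in alphabet; the string is valid only if that run reaches the end */
    printf("%s %s\n", a, (a[strspn(a, alphabet)] == '\0') ? "true" : "false");
    printf("%s %s\n", b, (b[strspn(b, alphabet)] == '\0') ? "true" : "false");
    return 0;
}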
That's not the fastest way to do this, but it's very easy to use.
However, if you are writing a calculator function, you really want to parse the string at the same time. A typical strategy would be to have two types of entity: operators (+-/*^) and operands (numbers such as -0.1, .0002, 42, etc.). You would extract these from the string as you parse it, and just fail if you hit an invalid character. (If you need to handle parentheses, you'll need a stack for the parsing, and you'll likely need to work with a stack anyway to process and evaluate the expression overall.)

C homework - string loops replacements

I know it's a little unorthodox and will probably cost me some downvotes, but since it's due in 1 hour and I have no idea where to begin I thought I'd ask you guys.
Basically I'm presented with a string that contains placeholders in + form, for example:
1+2+5
I have to create a function to print out all the possibilities of placing different combinations of any given series of digits. I.e. for the series:
[9,8,6] // string array
The output will be
16265
16285
16295
18265
18285
18295
19265
19285
19295
So for each input I get (number of digits)^(number of placeholders) lines of output.
Digits are 0-9 and the maximum form of the digits string is [0,1,2,3,4,5,6,7,8,9].
The original string can have many placeholders (as you'd expect, the output can get VERY lengthy).
I have to do it in C, preferably with no recursion. Again I really appreciate any help, couldn't be more thankful right now.
If you can offer an idea, a simplified way to look at solving this, even in a different language or recursively, it'd still be ok, I could use a general concept and move on from there.
It prints them in a different order, but that does not matter, and it's not recursive.
#include <stdio.h>
#include <string.h>
/* Fill each '+' in s with a digit chosen by comb_num, read as a number in
 * base spare_cnt (least significant digit at the first '+').
 * Returns 0 if comb_num is too large, i.e. there are no more combinations. */
int get_string(char* s, const char* spare_chr, int spare_cnt, int comb_num){
    for (; *s; s++){
        if (*s != '+') continue;
        *s = spare_chr[comb_num % spare_cnt];
        comb_num /= spare_cnt;
    }
    return !comb_num;
}
int main(void){
    const char* spare_str = "986";
    int num = 0;
    while (1){
        char str[] = "1+2+5";   // fresh copy with the placeholders each time
        if (!get_string(str, spare_str, (int)strlen(spare_str), num++))
            break; // done
        printf("str num %2d: %s\n", num, str);
    }
    return 0;
}
In order to do the actual replacement, you can use strchr to find the first occurrence of a character and return a char * pointer to it. You can then simply change that pointer's value and bam, you've done a character replacement.
Because strchr returns the first occurrence still in the string (searching up to the null terminator), and each replacement removes one '+', you can use it repeatedly for every value you want to replace.
The loop's a little trickier, but let's see what you make of this.
