UTF-8 encoding in C [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
What is UTF-8 encoding? I googled it but was not able to understand what it is. Please explain it in simple words with an example.
Next, I need to encode a string in UTF-8. I tried OpenSSL, but it only converts to Base64.
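#include <stdio.h>
#include <string.h>

struct some
{
    char string[40];
};

int main(void)
{
    struct some s;
    char str[40];

    /* fgets is bounded by the buffer size, unlike gets */
    if (fgets(str, sizeof str, stdin) == NULL)
        return 1;
    str[strcspn(str, "\n")] = '\0';   /* drop the trailing newline */
    strcpy(s.string, str);            /* safe: both buffers hold 40 bytes */
    /* Now how to get the encoded form of "Hello" in UTF-8? */
    /* printf("encoded data"); */
    return 0;
}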
The strings only become available at runtime, so I do not know anything about what is coming. After encoding, I need to store them in a DB.
I searched SO but could not find any source in C; it is only available for .NET, Java, and C#. I am using Red Hat Linux.

Encodings describe which byte or sequence of bytes corresponds to which character. ASCII is the simplest encoding: each character is a single byte with a value from 0 to 127. Unfortunately, there are far more than 128 characters in the world. UTF-8 is probably the most common encoding because it is compatible with ASCII but also allows international characters. If you write a plain English (ASCII-only) string in C, it is already valid UTF-8: the bytes of "Hello" in ASCII are the same as the bytes of "Hello" in UTF-8.
Joel has a fantastic article about this subject called: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
It does a good job of explaining ASCII, Unicode, and UTF-8 string encodings.
In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 4 (not 6, corrected by R.) bytes.

Related

Retain formatting in printf with %s [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 10 months ago.
So I have a string char *str = someString(); and I want to print the string and retain any formatting that may be present in the string with printf("%s", str); So for instance, if str were equal to "\033[32mPassed!\n\033[0m", I would want it to print Passed! in green followed by a new line. But currently, it prints the string literally.
Is this something printf can do or is it not designed for this? I understand that this could cause issues if the string contained something like %d without actually having a number passed.
Sending ␛[32mPassed!␊␛[0m to the terminal is what causes the desired effect.[1]
You are asking how to convert the string "\033[32mPassed!\n\033[0m" into the string ␛[32mPassed!␊␛[0m, just like the C compiler does when given the C code (a C string literal) "\033[32mPassed!\n\033[0m".
printf does not provide a way to convert a C string literal into the string it would produce.
And nothing else in the standard library does either. The functionality of parsing C code is entirely located in the compiler, not in the executable it produces.
You will need to write your own parser. At the very least, you will need to do the following:
Remove the leading and trailing quotes.
Replace the four character sequence \033 with character 0x1B.
Replace the two character sequence \n with character 0x0A.
Footnotes
[1] Assuming the terminal understands these ANSI escape sequences.

Why doesn't C store strings as a linked list of characters? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
In C, strings are stored as an array of characters terminated with '\0'. So I can do this:
char string[] = "Hello, World!";
or
char* string = "Hello, World!";
I can just use predefined string functions [strcpy()] to overcome the fact that I cannot exceed the initialized length of the string.
I am trying to make a program that does basic math operations on very large numbers. I thought of storing these digits in a linked list. But perhaps I can just store them in a string (char*) and make functions to operate directly on that.
What benefit will I have of using linked lists in the above program?
The C language is defined in standards like n1570 or better.
For historical reasons, strings in C are represented in contiguous memory.
And in 2021, most processors (x86-64, ARM, PowerPC, ...) handle them efficiently (with an optimizing compiler, like a recent GCC).
Of course, you can develop your own C library that represents your string-like type as a linked list. Look into GLib (part of GTK), and study its source code for inspiration.
UTF-8 encoding uses several bytes (char) for some characters (like é or €).
Some implementations of Prolog represented strings as linked lists.

Comparing a char to the char '\"' [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I'm working in C and need to receive a string from the user in the format "abcd" and parse it to retrieve the string abcd (in the code).
For some reason, when I check whether the first char in the string (which I've read using sscanf) is ", the check fails. The debugger watch says that data[0] is '"', but that data[0] == '"' is false, which is absurd.
The character in data[0] is probably a special (curly) quotation mark with the Windows-1252 code 147/0x93, which is not an ASCII code at all. Its highest bit is 1, so it lies outside the 7-bit ASCII range. While the 7-bit ASCII codes are interpreted identically across many character sets, this is not so for 8-bit values (> 127). The glyph a given terminal or printer shows for an 8-bit value depends on the character set it assumes (in your case, as mentioned, Windows-1252).
Last but not least, because chars are signed on your system, the debugger interprets the value as negative. I think you can cast it to unsigned char in the debugger watch expression to obtain the positive equivalent.
That character cannot be entered directly with the keyboard; on Windows you can try to use the Alt+Number block trick. When you enter the normal quotation mark you create a char with the ASCII code 34/0x22, which the compiler and debugger correctly claim is not identical.

Unknown characters when printing text file in c [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am trying to print the characters from a text file using C in CodeBlock terminal. I use getc and printf. But the terminal shows unwanted characters as well. For example,
when I read,
CAAAAATATAAAAACAGGTTTATGATATAAGGTAAAGTATGGGAGATGGGGACAAAAGT
It shows,
CΘA A A A A T A T A A A A A C A G G T T T A T G A T A T A A G GT A A A G T A T$GhGêG╝A G<AöT G#GñG<G AxC A A A A G T
Can anyone please explain what can be done to avoid this?
Your text file evidently uses a 2-byte character encoding. If this is on Windows, it's very likely UTF-16.
char in C is a single byte, so a single-byte encoding is assumed. There are many ways to solve this, e.g. you could use iconv. On Windows, you can use wchar_t (*) to read the characters of this file (together with functions for wide characters like getwc()), and if you need the text in an 8-bit encoding, Windows API functions like WideCharToMultiByte() can help.
(*) wchar_t is a type for "wide" characters, but it's implementation-defined how many bytes a wide character has. On Windows, wchar_t has 16 bits and typically holds UTF-16 encoded characters. On many other systems, wchar_t has 32 bits and typically holds UCS-4 encoded characters.

Is there a C library that converts integers to hexadecimal or binary? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Is there a library function that takes in an integer and converts it to a single-byte hexadecimal or binary number?
For example, if I passed it the input of 64, it would output 0x40.
For hex numbers, you can use sprintf:
char buff[80];
sprintf(buff, "0x%02x", 64);
An int is an int, whether it is written as 0x40 or 64; the data representation of the two is exactly the same (binary 1000000, padded with leading zeros to the width of an int). If you are asking how it would be represented in a char array, you'd use sprintf. The simplest way is sprintf(buf, "%#x", 64).
Internally, integers are already represented as binary. You can display a number as hexadecimal using the %x format string (%#02x will fit your example best).
See this question regarding binary, for which there isn't a built-in format string specifier.
In C, the size of the int type depends on the implementation. Typically it is 4 bytes long, so an arbitrary int cannot be stored in a single byte without losing information; only values that fit the byte's range (0 to 255 for an unsigned byte) survive.
If you use a char or int8_t then you will have a single byte. Bytes are binary internally and always will be. So anytime you want to do anything with your byte, you must do it in binary.
Hexadecimal vs binary vs base 10 is a display decision. So if you accept those as input, you will have to convert a string into a single byte for storage in memory. When you display them, you will have to convert to the desired display format.
Using sprintf works for display. Use strtol for input.

Resources