Representing UTF-8 in ASCII [duplicate] - c

I am trying to write a lexer for the Go language in C. But Go uses UTF-8 as its character set and C uses ASCII. So is it possible to represent Unicode characters in ASCII?

C has support for multibyte strings, but you have to mess with locales for it to work.
ASCII is actually a subset of UTF-8, so you can use the standard C single-byte string functions to some extent. Just remember that functions taking or returning lengths work in byte counts, not character counts.
For anything more sophisticated, you'll need an external library.
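To illustrate the difference between byte counts and character counts, here is a minimal sketch. The function name utf8_count_code_points is made up for this example, and it assumes the input is already valid UTF-8 (it does no validation).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the code points in a NUL-terminated UTF-8 string.
   Continuation bytes always have the bit pattern 10xxxxxx, so every
   byte that does NOT match that pattern starts a new code point.
   Assumption: the input is valid UTF-8; nothing is validated here. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the word "héllo" in UTF-8), strlen() reports 6 bytes while this function reports 5 code points.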

Related

Unknown characters when printing text file in c [closed]

I am trying to print the characters from a text file using C in the Code::Blocks terminal. I use getc and printf, but the terminal shows unwanted characters as well. For example,
when I read
CAAAAATATAAAAACAGGTTTATGATATAAGGTAAAGTATGGGAGATGGGGACAAAAGT
It shows,
CΘA A A A A T A T A A A A A C A G G T T T A T G A T A T A A G GT A A A G T A T$GhGêG╝A G<AöT G#GñG<G AxC A A A A G T
Can anyone please explain what can be done to avoid this?
Your text file evidently uses a two-byte character encoding. If this is on Windows, it's very likely UTF-16.
char in C is a single byte, so a single-byte encoding is assumed. There are many ways to solve this, e.g. you could use iconv. On Windows, you can use wchar_t (*) to read the characters of this file, together with wide-character functions like getwc(). If you need the text in an 8-bit encoding, Windows API functions like WideCharToMultiByte() can help.
(*) wchar_t is a type for "wide" characters, but it's implementation-defined how many bytes a wide character has. On Windows, wchar_t has 16 bits and typically holds UTF-16 code units. On many other systems, wchar_t has 32 bits and typically holds UCS-4 (UTF-32) code points.
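If the file only contains plain letters, as in the example above, a portable alternative is to decode the two-byte units by hand. The sketch below is only an illustration: the name utf16le_to_ascii is invented here, and it assumes the data is little-endian UTF-16 (the usual case on Windows), which the question does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n bytes of UTF-16LE data to ASCII.
   Assumptions: little-endian byte order; a leading byte-order mark
   (U+FEFF) is skipped; any code unit above 127 becomes '?', so a
   surrogate pair turns into two '?' characters.
   out must have room for n / 2 + 1 bytes.
   Returns the number of characters written, not counting the NUL. */
size_t utf16le_to_ascii(const unsigned char *in, size_t n, char *out)
{
    size_t written = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i + 1 < n; i += 2) {
        /* Low byte first, then high byte. */
        unsigned int unit = in[i] | ((unsigned int)in[i + 1] << 8);

        if (i == 0 && unit == 0xFEFF)
            continue;               /* skip the byte-order mark */
        out[written++] = (unit < 128) ? (char)unit : '?';
    }
    out[written] = '\0';
    return written;
}
```

Read the file in binary mode (fopen with "rb") into a buffer first, then pass the buffer to this function; reading in text mode may alter bytes that happen to look like line endings.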

data types to store large numbers in C [duplicate]

I want to store a number "x" where 0<=x<=(10^18).
Which datatype should be used in C for storing such a large number?
I used "long int" but it's not working.
Use unsigned long long int. It is supported in C99 and later, and as a compiler extension in some pre-C99 compilers. It must be able to represent values up to at least 2^64 - 1, roughly 1.8 * 10^19, so 10^18 fits comfortably.
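A short sketch of how that looks in practice; pow10_ull and print_ull are hypothetical helper names used only for this example. Note the %llu conversion, which is the one that matches unsigned long long.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Compute 10^exp. exp must be at most 19, because 10^20 no longer
   fits in a 64-bit unsigned long long. */
unsigned long long pow10_ull(unsigned exp)
{
    unsigned long long result = 1;

    assert(exp <= 19);
    while (exp-- > 0)
        result *= 10;
    return result;
}

/* Print x using %llu, the conversion for unsigned long long. */
void print_ull(unsigned long long x)
{
    printf("%llu\n", x);
}
```

Calling print_ull(pow10_ull(18)) prints 1000000000000000000. Using %d or %ld with this type is a common reason a correct value appears wrong on output.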

Datatype which can store very large value in C [duplicate]

Recently, in a programming contest, there was a problem that is pretty straightforward, but the catch is the worst case, in which we have to handle values as large as 10^10000.
I tried the program in Python, which is straightforward as I don't have to specify the datatype (it is taken care of automatically), but when I tried it in C I couldn't find a suitable datatype.
(I tried uintmax_t, which didn't work either.)
So how should very large values be handled in C?
There is no built-in datatype in C that can store values that big. You will either have to write your own implementation or use a library. As this is a competition, though, the second is not an option. Similar problems appear every now and then, and usually the best approach is to use another language, e.g. Java (as it is usually available in competitions).
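For a sense of what "write your own implementation" can mean, here is a minimal sketch of schoolbook addition on decimal digit strings. The name bigadd is invented for this example; it handles only non-negative numbers and only addition, and it assumes both inputs contain nothing but the digits 0-9.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Add two non-negative decimal numbers given as digit strings.
   Assumption: a and b contain only the characters '0'..'9'.
   out must have room for max(strlen(a), strlen(b)) + 2 bytes. */
void bigadd(const char *a, const char *b, char *out)
{
    size_t la, lb, n, i;
    int carry = 0;

    assert(a != NULL && b != NULL && out != NULL);
    la = strlen(a);
    lb = strlen(b);
    n = (la > lb ? la : lb) + 1;    /* one extra digit for a final carry */

    out[n] = '\0';
    for (i = 0; i < n; ++i) {
        /* Walk from the least significant digit; missing digits count as 0. */
        int da = (i < la) ? a[la - 1 - i] - '0' : 0;
        int db = (i < lb) ? b[lb - 1 - i] - '0' : 0;
        int sum = da + db + carry;

        out[n - 1 - i] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
    /* Drop the leading zero when there was no final carry. */
    if (n > 1 && out[0] == '0')
        memmove(out, out + 1, n);
}
```

A number near 10^10000 is then just a string of about 10001 characters, and bigadd("999", "1", buf) leaves "1000" in buf. Subtraction, multiplication and division follow the same digit-by-digit idea but take more code.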

UTF-8 encoding in c [closed]

What is UTF-8 encoding? I googled it but was not able to understand what it is. Please explain in simple words with an example.
Next, I need to encode a string in UTF-8. I tried OpenSSL, but it only converts to Base64 format.
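#include <stdio.h>
#include <string.h>

struct some
{
    char string[40];
};

int main(void)
{
    struct some s;
    char str[40];

    /* fgets bounds the read; gets() can overflow the buffer */
    if (fgets(str, sizeof str, stdin) == NULL)
        return 1;
    str[strcspn(str, "\n")] = '\0';   /* drop the trailing newline */
    strcpy(s.string, str);

    /* Now how to get the encoded form of "Hello" in UTF-8? */
    /* printf("encoded data"); */
    return 0;
}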
The strings are only available at runtime, so I do not know anything about what is coming, and after encoding I need to store them in a DB.
I searched SO itself but could not find any source in C; it is available for .NET, Java and C#. I am using Red Hat Linux.
Encodings describe which byte or sequence of bytes corresponds to which character. ASCII is the simplest encoding: a single byte value corresponds to a single character. Unfortunately, ASCII defines only 128 characters, and even a full byte offers just 256 values, far fewer than the characters in use around the world. UTF-8 is probably the most common encoding because it is compatible with ASCII but also allows international characters. If you write a plain English string in C, it is already UTF-8: the bytes of "Hello" in ASCII are exactly the bytes of "Hello" in UTF-8.
Joel has a fantastic article about this subject called: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
It does a good job of explaining ASCII, Unicode, and UTF-8 string encodings.
In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 4 (not 6, corrected by R.) bytes.
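The byte layout described in that quote can be sketched as a small encoder. This is an illustration rather than a library API: utf8_encode is a name chosen for this example, and it takes a Unicode code point as a number, which means the input must already be decoded from whatever encoding it arrived in.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point
   (above U+10FFFF or in the surrogate range U+D800..U+DFFF). */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {               /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {             /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {           /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'A' (U+0041) stays the single byte 0x41, é (U+00E9) becomes 0xC3 0xA9, and the euro sign (U+20AC) becomes 0xE2 0x82 0xAC.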

Format a Large Number [duplicate]

How can I format a large integral number with commas in C, such that the readability is improved?
222222 should be 222,222 and 44444444 should be 44,444,444.
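You do not need to do the formatting yourself; printf on Unix-like systems has a ' flag:
printf("%'d\n", number);
It looks like Visual Studio doesn't support that. The flag is locale-aware, however: the grouping character comes from the current locale, and in the default "C" locale no separator is printed.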
Use the modulus (%) operation and build your own string.
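A minimal sketch of that approach follows. The name format_with_commas is made up for this example; it handles only unsigned values and always uses a comma, regardless of locale.

```c
#include <assert.h>
#include <stddef.h>

/* Write n into out with a comma between each group of three digits.
   out must have room for at least 27 bytes (20 digits, 6 commas, NUL).
   Returns the length of the resulting string. */
size_t format_with_commas(unsigned long long n, char *out)
{
    char tmp[32];                   /* digits and commas, in reverse order */
    size_t len = 0, digits = 0, i;

    assert(out != NULL);
    do {
        /* Insert a comma before every fourth, seventh, ... digit. */
        if (digits > 0 && digits % 3 == 0)
            tmp[len++] = ',';
        tmp[len++] = (char)('0' + n % 10);  /* peel off the last digit */
        n /= 10;
        ++digits;
    } while (n > 0);

    /* The digits were produced least significant first; reverse them. */
    for (i = 0; i < len; ++i)
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return len;
}
```

With this, 222222 becomes "222,222" and 44444444 becomes "44,444,444". For a signed value, one option is to emit the minus sign first and pass the magnitude to the function.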
If you google for "c format thousands separator" then one of the hits is this page http://www.codeguru.com/forum/archive/index.php/t-402370.html
It's C++, but it should give you an idea of what you can do.
