Convert codepoint to wchar_t in C [duplicate]


This question already has an answer here:
PHP and C++ for UTF-8 code unit in reverse order in Chinese character
(1 answer)
Closed 9 years ago.
If I know the Unicode code points of the two Chinese characters 你好, stored in str,
how can I convert this char * str of code points to the Chinese characters and assign them to wchar_t * wstr?
char * str = "4F60 597D";
wchar_t * wstr;
I know that I can assign the literal directly, and the problem is solved:
wchar_t * wstr = L"\u4F60\u597D";
But my problem is more complicated than that; my situation does not allow it.
How can I convert a string of literal code points to wchar_t *?
Thanks.
I am using MS Visual C++ with the character set set to MBCS; assume that I cannot use the Unicode character set.
UPDATE:
Sorry, I just corrected wchar_t wstr to wchar_t * wstr.
UPDATE:
The char * str contains a sequence of UTF-8 code units for the two Chinese characters 你好:
char * str = "\xE4\xBD\xA0\xE5\xA5\xBD";
size_t len = strlen(str) + 1;
wchar_t * wstr = new wchar_t[len];
size_t convertedSize = 0;
_locale_t local = _create_locale( LC_ALL , "Chinese");
_mbstowcs_s_l(&convertedSize, wstr, len, str, _TRUNCATE, local);
MessageBoxW( NULL, wstr , (LPCWSTR)L"Hello", MB_OK);
Why does the MessageBox print Japanese characters instead of Chinese ones? What is the right locale name to use?
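My understanding (not verified against this exact setup) is that the "Chinese" locale in the Microsoft CRT uses code page 936 (GBK) rather than UTF-8, so _mbstowcs_s_l decodes the UTF-8 bytes as GBK and produces unrelated characters. On Windows, MultiByteToWideChar with CP_UTF8 is the usual way to convert UTF-8 to wchar_t. As a portable alternative, the sketch below decodes UTF-8 by hand. The function name utf8_to_wcs is hypothetical, and it only handles 1- to 3-byte sequences (the Basic Multilingual Plane), which is what fits in MSVC's 16-bit wchar_t without surrogate pairs; anything else becomes U+FFFD.

```c
#include <stddef.h>
#include <wchar.h>

/* Decode a NUL-terminated UTF-8 string into out, which holds outcap
   wide characters including the terminator. Returns the number of wide
   characters written, excluding the terminator. */
size_t utf8_to_wcs(const char *src, wchar_t *out, size_t outcap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    if (outcap == 0)
        return 0;
    while (*s != '\0' && n + 1 < outcap) {
        unsigned long cp;
        int extra, ok = 1;
        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else {
            /* 4-byte lead (would need a surrogate pair) or a stray
               continuation byte: emit one U+FFFD and skip its tail. */
            out[n++] = 0xFFFD;
            s++;
            while ((*s & 0xC0) == 0x80)
                s++;
            continue;
        }
        s++;
        /* Append 6 bits from each continuation byte (10xxxxxx). */
        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80) { ok = 0; break; } /* truncated; also stops at NUL */
            cp = (cp << 6) | (*s & 0x3F);
        }
        /* Reject truncated sequences, overlong encodings and surrogates. */
        if (!ok || (extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800)
            || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        out[n++] = (wchar_t)cp;
    }
    out[n] = L'\0';
    return n;
}
```

With the question's bytes "\xE4\xBD\xA0\xE5\xA5\xBD" this yields the two wide characters U+4F60 and U+597D, which can then be passed to MessageBoxW.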

I can think of these functions:
// Value of one upper-case hex digit ('0'-'9', 'A'-'F'); the argument is parenthesized so expressions are safe.
#define GetValFromHex(x) ((x) > '9' ? (x) - 'A' + 10 : (x) - '0')
// Parses one 8-character group written as the text \xLL\xHH (UTF-16LE byte order: low byte first).
wchar_t GetChineseChar(const char* strInput)
{
unsigned lo = GetValFromHex(strInput[2]) * 16 + GetValFromHex(strInput[3]);
unsigned hi = GetValFromHex(strInput[6]) * 16 + GetValFromHex(strInput[7]);
// Combining with shifts gives the same result regardless of the machine's byte order.
return (wchar_t)((hi << 8) | lo);
}
wchar_t* GetChineseString(const char* strInput)
{
size_t len = strlen(strInput) / 8;
wchar_t* returnVal = new wchar_t[len + 1]; // +1 for the terminating L'\0'
for (size_t i = 0; i < len; i++)
{
returnVal[i] = GetChineseChar(&strInput[i * 8]);
}
returnVal[len] = L'\0';
return returnVal;
}
Then just call GetChineseString(). Of course, you can add validation that the first two characters of each group are \x, and that the fifth and sixth are \x too, before moving on. This is only a starting point for more robust code: it is neither robust nor tested.
Edit:
I am assuming all hex values are upper case.
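For the plain code-point form in the original question ("4F60 597D"), one possible approach is to parse each hexadecimal group with the standard strtoul and store the value directly in a wchar_t, since a code point in the Basic Multilingual Plane is exactly the value a UTF-16 wchar_t holds. This is a sketch under stated assumptions: the name codepoints_to_wcs is hypothetical, groups are assumed to be separated by whitespace, and values that do not fit in the platform's wchar_t are rejected (on MSVC that is anything above U+FFFF, which would need a surrogate pair).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a string such as "4F60 597D" into a newly allocated,
   NUL-terminated wchar_t string. Returns NULL on error; caller frees. */
wchar_t *codepoints_to_wcs(const char *str)
{
    /* Each code point uses at least one input character, so strlen(str)
       wide characters plus a terminator is always enough room. */
    wchar_t *out = malloc((strlen(str) + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    const char *p = str;
    for (;;) {
        char *end;
        /* strtoul skips leading whitespace, which handles the separators;
           base 16 accepts both upper- and lower-case digits. */
        unsigned long cp = strtoul(p, &end, 16);
        if (end == p)              /* no further hex digits: done */
            break;
        if (cp > 0x10FFFF || cp > (unsigned long)WCHAR_MAX) {
            free(out);             /* not a code point, or does not fit in wchar_t */
            return NULL;
        }
        out[n++] = (wchar_t)cp;
        p = end;
    }
    out[n] = L'\0';
    return out;
}
```

Unlike the fixed 8-character layout above, this does not depend on the exact spacing of the input, and it accepts lower-case hex digits.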

Related

Assigning string with null terminators inside it in one line in C

I am trying to figure out whether there is a way to assign, in one go, a string that contains null terminators in the middle as well as at the end. I can assign it manually, char by char, but is there a way to do it in one statement? I looked for a similar question but couldn't find one.
Here is where I'm stuck.
Example:
char *argvppp = (char *)malloc(14);
argvppp = "mine\0-c\0/10\0/2.0\0"; // forward slash before the 10 after the null terminator
When I try to read from argvppp, like this:
printf("%s\n", argvppp);
printf("%s\n", argvppp+5);
printf("%s\n", argvppp+8);
printf("%s", argvppp+11);
This is what I get:
mine
-c
/10
[blank]
And when I try to just not escape it like this:
argvppp = "mine\0-c\010\0/2.0\0"; // no forward slash before the 10
This is what I get:
mine
-c
[blank]
.0
Is there a reliable way to do this without having to assign it char by char manually?
What does work:
argvppp[0] = 'm';
argvppp[1] = 'i';
argvppp[2] = 'n';
argvppp[3] = 'e';
argvppp[4] = '\0';
argvppp[5] = '-';
argvppp[6] = 'c';
argvppp[7] = '\0';
argvppp[8] = '1';
argvppp[9] = '0';
argvppp[10] = '\0';
argvppp[11] = '2';
argvppp[12] = '.';
argvppp[13] = '0';
argvppp[14] = '\0';
I'm just wondering if there would be a way around this manual method.

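Use memcpy():
#include <stdlib.h>
#include <string.h>
char *argvppp = malloc(sizeof "mine\0-c\0/10\0/2.0\0");
memcpy(argvppp, "mine\0-c\0/10\0/2.0\0", sizeof "mine\0-c\0/10\0/2.0\0");
When you don't put a / between \0 and 10, the compiler reads \010 as a single octal escape (the character with value 8), because an octal escape takes up to three octal digits. You can avoid this by splitting the string literal apart:
"mine\0-c\0" "10\0" "2.0\0"
Adjacent string literals are automatically concatenated, and escape sequences are resolved in each literal before concatenation, so an escape never spans two literals.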
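Putting both points together, here is a minimal sketch that copies a split literal into malloc'd memory and then walks the NUL-separated segments. The names ARGS, make_args and print_args are hypothetical; the layout reproduces the offsets 0, 5, 8 and 11 from the question, without the slashes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split literals so "\0" never absorbs the following digits
   (a single "\010" would be one octal escape). */
#define ARGS "mine\0" "-c\0" "10\0" "2.0"

/* Returns a malloc'd copy of ARGS (caller frees) and stores its size.
   sizeof counts the embedded NULs and the final implicit one: 15 bytes. */
char *make_args(size_t *size)
{
    char *buf = malloc(sizeof ARGS);
    if (buf != NULL) {
        memcpy(buf, ARGS, sizeof ARGS);
        *size = sizeof ARGS;
    }
    return buf;
}

/* Prints each NUL-terminated segment on its own line. */
void print_args(const char *buf, size_t size)
{
    for (size_t off = 0; off + 1 < size; off += strlen(buf + off) + 1)
        puts(buf + off);
}
```

Calling print_args on the result of make_args prints mine, -c, 10 and 2.0 on separate lines.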

= does not copy the string; it only assigns the address of the string literal to your pointer, so the memory you allocated is lost (leaked). String literals also cannot be modified; attempting to do so invokes UB (undefined behaviour).
You need to copy the string literal into the memory referenced by the pointer:
#define STR "mine\0-c\0/10\0/2.0\0"
char *argvppp = malloc(sizeof(STR));
memcpy(argvppp, STR, sizeof(STR));
You can also use an array, which is initialized with a modifiable copy of the literal:
char argvppp[] = "mine\0-c\0/10\0/2.0\0";

You could try something like this, where your data is clearly visible:
#include <stdio.h>
int main(void) {
char *segs[] = { "mine", "-c", "10", "2.0" };
char buf[50]; // long enough for these segments
char *at = buf;
for (size_t i = 0; i < sizeof segs / sizeof segs[0]; i++)
at += sprintf(at, "%s", segs[i]) + 1; // sprintf returns the length without the '\0'; +1 steps past it
/* Then examine the contents of 'buf[]' character by character. */
return 0;
}

Convert from char array to 16bit signed int and 32bit unsigned int

I'm working on some embedded C for a PCB I've developed, but my C is a little rusty!
I'm looking to do some conversions from a char array to various integer types.
First Example:
[input] " 1234" (note the space before the 1)
[convert to] (int16_t) 1234
Second Example:
[input] "-1234"
[convert to] (int16_t) -1234
Third Example:
[input] "2017061234"
[convert to] (uint32_t) 2017061234
I've tried playing around with atoi(), but I don't seem to be getting the result I expected. Any suggestions?
[EDIT]
This is the code for the conversions:
char *c_sensor_id = "0061176056";
char *c_reading1 = " 3630";
char *c_reading2 = "-24.30";
uint32_t sensor_id = atoi(c_sensor_id); // comes out as 536880136
uint16_t reading1 = atoi(c_reading1); // comes out as 9224
uint16_t reading2 = atoi(c_reading2); // comes out as 9224

A couple of things:
Never use the atoi family of functions: they have no error handling, and their behavior is undefined if the value cannot be represented. Use the strtol family of functions instead.
Either family is somewhat resource-heavy on resource-constrained microcontrollers, so you might have to roll your own version of strtol.
Example:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
int main()
{
const char* c_sensor_id = "0061176056";
const char* c_reading1 = " 3630";
const char* c_reading2 = "-1234";
c_reading1++; // skip the leading space (strtoul would also skip leading whitespace on its own)
uint32_t sensor_id = (uint32_t)strtoul(c_sensor_id, NULL, 10);
uint16_t reading1 = (uint16_t)strtoul(c_reading1, NULL, 10);
int16_t reading2 = (int16_t) strtol (c_reading2, NULL, 10);
printf("%"PRIu32 "\n", sensor_id);
printf("%"PRIu16 "\n", reading1);
printf("%"PRId16 "\n", reading2);
}
Output:
61176056
3630
-1234
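The example above passes NULL as the end pointer, so it cannot tell a bad string from a valid one. A sketch of how the same functions can report errors, using the end pointer, errno and an explicit range check, follows; parse_int16 and parse_uint32 are hypothetical names. Note that "-24.30" from the question is rejected as an integer: if the reading is really fixed-point, it would need strtod or separate parsing of the fractional part.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a base-10 integer into int16_t. Leading whitespace and a sign are
   accepted (strtol skips whitespace itself); an empty string, trailing
   characters, or a value outside int16_t's range is reported as failure. */
bool parse_int16(const char *s, int16_t *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')      /* no digits, or extra characters */
        return false;
    if (errno == ERANGE || v < INT16_MIN || v > INT16_MAX)
        return false;
    *out = (int16_t)v;
    return true;
}

/* Same idea for uint32_t. strtoul accepts a leading '-' and negates the
   result, so reject it explicitly. */
bool parse_uint32(const char *s, uint32_t *out)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-')
        return false;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (end == s || *end != '\0')
        return false;
    if (errno == ERANGE || v > UINT32_MAX)
        return false;
    *out = (uint32_t)v;
    return true;
}
```

Returning a bool and writing through a pointer keeps the error flag separate from the value, so 0 remains a valid result.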

The observed behavior is very surprising. I suggest writing functions to convert character strings to int32_t and uint32_t and use them instead of atoi():
uint32_t atou32(const char *s) {
uint32_t v = 0;
while (*s == ' ')
s++;
while (*s >= '0' && *s <= '9')
v = v * 10 + *s++ - '0';
return v;
}
int32_t atos32(const char *s) {
while (*s == ' ')
s++;
return (*s == '-') ? -atou32(s + 1) : atou32(s);
}
There is no error handling, and not even a leading + is supported, but at least the value is converted as 32 bits, which would not be the case for atoi() if type int only has 16 bits on your target platform.

Try initializing your strings like this:
char c_sensor_id[] = "0061176056";
char c_reading1[] = " 3630";
char c_reading2[] = "-24.30";

How to process a string char by char in the XS code

Let's suppose there is a piece of code like this:
my $str = 'some text';
my $result = my_subroutine($str);
and my_subroutine() should be implemented in Perl XS. For example, it could return the sum of the bytes of the (Unicode) string.
In the XS code, how do I process a string (a) character by character, as a general method, and (b) byte by byte, if the string consists only of ASCII characters (is there a built-in function to convert from Perl's native string structure to char[])?

At the XS layer, you'll get byte or UTF-8 strings. In the general case, your code will likely contain a char * to point at the next item in the string, incrementing it as it goes. For a useful set of UTF-8 support functions to use in XS, read the "Unicode Support" section of perlapi
An example of mine from http://cpansearch.perl.org/src/PEVANS/Tickit-0.15/lib/Tickit/Utils.xs
int textwidth(str)
SV *str
INIT:
STRLEN len;
const char *s, *e;
CODE:
RETVAL = 0;
if(!SvUTF8(str)) {
str = sv_mortalcopy(str);
sv_utf8_upgrade(str);
}
s = SvPV_const(str, len);
e = s + len;
while(s < e) {
UV ord = utf8n_to_uvchr(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
|UTF8_WARN_SURROGATE
|UTF8_DISALLOW_FE_FF
|UTF8_WARN_FE_FF
|UTF8_WARN_NONCHAR));
int width = wcwidth(ord);
if(width == -1)
XSRETURN_UNDEF;
s += len;
RETVAL += width;
}
OUTPUT:
RETVAL
In brief, this function iterates the given string one Unicode character at a time, accumulating the width as given by wcwidth().

If you're expecting bytes:
STRLEN len;
char* buf = SvPVbyte(sv, len);
while (len--) {
char byte = *(buf++);
... do something with byte ...
}
If you're expecting text or any non-byte characters:
STRLEN len;
U8* buf = (U8*)SvPVutf8(sv, len); // SvPVutf8 returns char*, so cast for utf8n_to_uvchr
while (len) {
STRLEN ch_len;
UV ch = utf8n_to_uvchr(buf, len, &ch_len, 0);
buf += ch_len;
len -= ch_len;
... do something with ch ...
}

How to convert NSString* to ANTLR3_UINT8 or ANTLR3_UINT16

Could somebody tell me the proper way to convert an NSString* to an ANTLR3 string (used in C)?
EDIT: this seems to work:
char buf[255];
if ( YES == [myNSString getCString:buf maxLength:255 encoding:NSStringEncodingConversionAllowLossy] )
{
pANTLR3_UINT8 theExpression = (pANTLR3_UINT8*)buf;
...

Assuming that you want the complete string without loss, you can use the following code:
NSString *myString = ...;
NSUInteger len = [myString lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
char *buf = malloc(len + 1);
if ([myString getCString:buf maxLength:(len + 1) encoding:NSUTF8StringEncoding]) {
// pANTLR3_UINT8 is equivalent to ANTLR3_UINT8*
pANTLR3_UINT8 theExpression = (pANTLR3_UINT8) buf;
}
free(buf);
Of course, you can use another encoding if losing characters is acceptable.

How to retrieve n characters from char array

I have a char array in a C application that I have to split into parts of 250 so that I can send it along to another application that doesn't accept more at one time.
How would I do that? Platform: win32.

From the MSDN documentation:
The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count. The behavior of strncpy is undefined if the source and destination strings overlap.
Note that strncpy doesn't check for valid destination space; that is left to the programmer. Prototype:
char *strncpy(
char *strDest,
const char *strSource,
size_t count
);
Extended example:
void send250(char *inMsg, int msgLen)
{
char block[250];
while (msgLen > 0)
{
int len = (msgLen>250) ? 250 : msgLen;
strncpy(block, inMsg, len); // copy only what remains of the message
// send block to other entity
msgLen -= len;
inMsg += len;
}
}

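I can think of something along the lines of the following:
char *somehugearray; // points at the NUL-terminated message to split
char chunk[251] = {0}; // 250 characters plus the terminator
size_t k;
size_t l = 0;
while (somehugearray[l] != '\0') { // stop once the whole message has been handed off
for (k = 0; k < 250 && somehugearray[l] != '\0'; k++) {
chunk[k] = somehugearray[l];
l++;
}
chunk[k] = '\0';
dohandoff(chunk);
}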

If performance matters and you are allowed to modify the string temporarily (i.e. the buffer is not const and no other thread is reading it), you could momentarily null-terminate the string at intervals of 250 characters and send it in chunks, directly from the original string:
char *str_end = str + strlen(str);
char *chunk_start = str;
for (;;) {
if (str_end - chunk_start <= 250) { // the rest fits in one chunk
transmit(chunk_start);
break;
}
char *chunk_end = chunk_start + 250;
char hijacked = *chunk_end; // save the character about to be overwritten
*chunk_end = '\0';
transmit(chunk_start);
*chunk_end = hijacked; // restore it
chunk_start = chunk_end;
}

jvasaks's answer is basically correct, except that it doesn't null-terminate 'block'. The code should be:
void send250(char *inMsg, int msgLen)
{
char block[250];
while (msgLen > 0)
{
int len = (msgLen>249) ? 249 : msgLen;
strncpy(block, inMsg, 249);
block[249] = 0;
// send block to other entity
msgLen -= len;
inMsg += len;
}
}
So now the block is 250 characters including the terminating null. strncpy will also null-terminate the last block if fewer than 249 characters remain.
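When the message length is already known, memcpy plus an explicit terminator avoids both strncpy pitfalls discussed above (a missing terminator and padding). A sketch of that swapped-in approach follows; get_chunk and CHUNK are hypothetical names, and the 250 limit is taken from the question.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 250  /* the receiving application's limit, from the question */

/* Copies the index-th chunk of msg (at most CHUNK bytes) into block, which
   must hold CHUNK + 1 chars, and NUL-terminates it. Returns the chunk
   length, or 0 when index is past the end of the message. */
size_t get_chunk(const char *msg, size_t msg_len, size_t index,
                 char block[CHUNK + 1])
{
    size_t start = index * CHUNK;
    if (start >= msg_len) {
        block[0] = '\0';
        return 0;
    }
    size_t n = (msg_len - start < CHUNK) ? msg_len - start : CHUNK;
    memcpy(block, msg + start, n);  /* exact copy: no padding, no early stop */
    block[n] = '\0';
    return n;
}
```

A caller would loop over index = 0, 1, 2, ... until get_chunk returns 0, sending each block along with its returned length; because the length is explicit, this also works for messages that contain embedded NUL bytes.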
