Problem converting char to wchar_t (length wrong) - c

I am trying to create a simple data structure that will make it easy to convert back and forth between ASCII strings and Unicode strings. My issue is that the length returned by mbstowcs is correct, but the length returned by wcslen on the newly created wchar_t string is not. Am I missing something here?
typedef struct{
wchar_t *string;
long length; // I have also tried int, and size_t
} String;
void setCString(String *obj, char *str){
obj->length = strlen(str);
free(obj->string); // Free original string
obj->string = (wchar_t *)malloc((obj->length + 1) * sizeof(wchar_t)); //Allocate space for new string to be copied to
//memset(obj->string,'\0',(obj->length + 1)); NOTE: I tried this but it doesn't make any difference
size_t length = 0;
length = mbstowcs(obj->string, (const char *)str, obj->length);
printf("Length = %d\n",(int)length); // Prints correct length
printf("!C string %s converted to wchar string %ls\n",str,obj->string); //obj->string is of a wcslen size larger than Length above...
if(length != wcslen(obj->string))
printf("Length failure!\n");
if(length == -1)
{
//Conversion failed, set string to NULL terminated character
free(obj->string);
obj->string = (wchar_t *)malloc(sizeof(wchar_t));
obj->string = L'\0';
}
else
{
//Conversion worked! but wcslen (and printf("%ls)) show the string is actually larger than length
//do stuff
}
}

The code seems to work fine for me. Can you provide more context, such as the content of strings you're passing to it, and what locale you're using?
A few other bugs/style issues I noticed:
obj->length is left as the allocated length, rather than updated to match the length in (wide) characters. Is that your intention?
The cast to const char * is useless and bad style.
Edit: Upon discussion, it looks like you may be using a nonconformant Windows version of the mbstowcs function. If so, your question should be updated to reflect as such.
Edit 2: The code only happened to work for me because malloc returned a fresh, zero-filled buffer. Since you are passing obj->length to mbstowcs as the maximum number of wchar_t values to write to the destination, it will run out of space and not be able to write the null terminator unless there's a proper multibyte character (one which requires more than a single byte) in the source string. Change this to obj->length+1 and it should work fine.
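To make the point in Edit 2 concrete, here is a minimal sketch (not the asker's code) that pre-fills a scratch buffer with a sentinel and checks whether mbstowcs stored a terminator. The helper name terminator_written and the SLOTS buffer size are made up for illustration, and it assumes an ASCII-only input in the default locale.

```c
#include <stdlib.h>
#include <wchar.h>

#define SLOTS 32  /* hypothetical scratch-buffer size for this demo */

/* Returns 1 if mbstowcs() stored a terminating L'\0' when allowed to
 * write at most `count` wide characters, 0 otherwise. */
int terminator_written(const char *str, size_t count)
{
    wchar_t buf[SLOTS];
    size_t i;

    if (count > SLOTS)
        count = SLOTS;
    for (i = 0; i < SLOTS; i++)
        buf[i] = L'#';          /* sentinel standing in for "random data" */

    if (mbstowcs(buf, str, count) == (size_t)-1)
        return 0;               /* conversion failed */

    for (i = 0; i < count; i++)
        if (buf[i] == L'\0')
            return 1;           /* a terminator was written */
    return 0;                   /* all `count` slots were used for characters */
}
```

With "Hello!" and a count of 6 (the strlen value) no terminator is stored, which is why wcslen runs on into whatever follows; with a count of 7 it is.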

The length you need to pass to mbstowcs() includes the L'\0' terminator character, but the length you calculated in obj->length does not include it, so you need to add 1 to the value passed to mbstowcs().
In addition, instead of using strlen(str) to determine the length of the converted string, you should be using mbstowcs(0, src, 0) + 1. You should also change the type of str to const char *, and elide the cast. realloc() can be used in place of a free() / malloc() pair. Overall, it should look like:
typedef struct {
wchar_t *string;
size_t length;
} String;
void setCString(String *obj, const char *str)
{
obj->length = mbstowcs(0, str, 0); // measure only; the parameter is named str
obj->string = realloc(obj->string, (obj->length + 1) * sizeof(wchar_t));
size_t length = mbstowcs(obj->string, str, obj->length + 1);
printf("Length = %zu\n", length);
printf("!C string %s converted to wchar string %ls\n", str, obj->string);
if (length != wcslen(obj->string))
printf("Length failure!\n");
if (length == (size_t)-1)
{
//Conversion failed, set string to an empty (null-terminated) string
obj->string = realloc(obj->string, sizeof(wchar_t));
obj->string[0] = L'\0'; // write the terminator; do not overwrite the pointer itself
}
else
{
//Conversion worked!
//do stuff
}
}
Mark Benningfield points out that mbstowcs(0, src, 0) is a POSIX / XSI extension to the C standard. To obtain the required length using only standard C, you must instead use mbsrtowcs() (declared in <wchar.h>):
const char *src_copy = str;
obj->length = mbsrtowcs(NULL, &src_copy, 0, NULL);
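As a rough sketch of the measure-then-convert approach using only standard C, something like the following could work. The function name wide_dup and its out_len parameter are hypothetical, and the conversion depends on the current locale (the default "C" locale is enough for plain ASCII input).

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts a multibyte string to a freshly allocated wide string.
 * Returns NULL on conversion or allocation failure; on success stores
 * the length in wide characters (excluding L'\0') in *out_len. */
wchar_t *wide_dup(const char *str, size_t *out_len)
{
    mbstate_t state;
    const char *p = str;
    size_t len;
    wchar_t *wide;

    memset(&state, 0, sizeof state);            /* initial conversion state */
    len = mbsrtowcs(NULL, &p, 0, &state);       /* pass 1: measure only */
    if (len == (size_t)-1)
        return NULL;

    wide = malloc((len + 1) * sizeof *wide);    /* +1 for the terminator */
    if (wide == NULL)
        return NULL;

    p = str;
    memset(&state, 0, sizeof state);
    if (mbsrtowcs(wide, &p, len + 1, &state) == (size_t)-1) {  /* pass 2 */
        free(wide);
        return NULL;
    }
    if (out_len != NULL)
        *out_len = len;
    return wide;
}
```

The caller owns the returned buffer and releases it with free().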

I am running this on Ubuntu linux with UTF-8 as locale.
Here is the additional info as requested:
I am calling this function with a fully allocated structure and passing in a hard coded "string" (not a L"string"). so I call the function with what is essentially setCString(*obj, "Hello!").
Length = 6
!C string Hello! converted to wchar string Hello!xxxxxxxxxxxxxxxxxxxx
(where x = random data)
Length failure!
for reference
printf("wcslen = %d\n",(int)wcslen(obj->string)); prints out as
wcslen = 11

Related

Basic use of strcpy_s , strcat_s

Code
char* CreateString(char* string1, char* string2) {
int length = strlen(string1) + strlen(string2);
// Allocate memory for the resulting string
char* result = malloc((length) * sizeof(char));
// Concatenate the two strings
strcpy_s(result, sizeof result, string1);
strcat_s(result,sizeof result, string2);
return result;
}
I have this simple code; all I want to do is join the two strings together, but whenever I use strcpy_s or strcat_s it gives the error shown in the picture.
But it works if I use the CRT library.
Another question: did I use the pointers correctly? I'm new to this topic and find it confusing, so I don't really understand it.
I tried to join two sentences together.
Strings require a null terminating character at the end, so the buffer is one byte too short.
sizeof result gives the size of the pointer, not the size of the object it points to.
char* CreateString(char* string1, char* string2) {
size_t length = strlen(string1) + strlen(string2) + 1;
// or for Windows
// rsize_t length = strlen(string1) + strlen(string2) + 1;
// Allocate memory for the resulting string
char* result = malloc((length) * sizeof(*result));
// Concatenate the two strings
strcpy_s(result, length, string1);
strcat_s(result, length, string2);
return result;
}
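strcpy_s and strcat_s belong to the optional Annex K of the C standard and are missing from many non-Microsoft C libraries. Where they are unavailable, a portable sketch of the same idea could look like this; the name concat_strings is made up, and the explicit length bookkeeping with memcpy is a swapped-in technique, not what the answer above uses.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
 * or NULL if the allocation fails. The caller must free() it. */
char *concat_strings(const char *a, const char *b)
{
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *result = malloc(len_a + len_b + 1);   /* +1 for the terminator */

    if (result == NULL)
        return NULL;
    memcpy(result, a, len_a);                   /* first string, no terminator */
    memcpy(result + len_a, b, len_b + 1);       /* second string plus its '\0' */
    return result;
}
```

Because the lengths are computed once and passed explicitly, the sizeof-on-a-pointer mistake cannot occur here.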

Make a C program to copy char array elements from one array to another and don't have to worry about null character

I want to create a program that copies one char array to another, element by element. The problem is that if I don't put the null character in the destination array, the program outputs the correct string followed by weird characters. My solution was to iterate over the length of the string + 1 in order to include the null character. Is there any way to iterate over the length of the string without having to worry about the null character? The code is as follows:
int copy(char * source, char * destination, unsigned int lengthDestination);
int copy(char * source, char * destination, unsigned int lengthDestination)
{
int i;
for(i = 0; source[i] != '\0'; i++) {
//Count the length of the source array
}
if(i + 1 != lengthDestination){ //i + 1 in order to take into account '\0'
return 1;
}
for(int j = 0; j < lengthDestination; j++) {
destination[j] = source[j];
}
return 0;
}
int main() {
char * source = "Test number 17"; //Length is 15 counting the null character
unsigned int destinationLength = 15;
char destination[destinationLength];
copy(source, destination, destinationLength);
printf("The String source is: %s\n", source);
printf("The String destination is: %s\n", destination);
return 0;
}
You will always need to 'worry' about the null terminator, in the sense that you can't not have a null terminator in your destination C string, and you will always need to explicitly write the null terminator at the end of your new C string.
Even the standard strcpy function copies the null terminator character from your original string to the destination string.
#include <stdio.h>
#include <assert.h>
int copy(char * source, char * destination, unsigned int lengthDestination);
int copy(char * source, char * destination, unsigned int lengthDestination)
{
int i;
for(i = 0; source[i] != '\0'; i++) {
destination[i] = source[i];
}
assert(i+1 == lengthDestination);
destination[i] = '\0'; /* i is now the string length, so this is the slot right after the last copied character */
return 0;
}
int main() {
char * source = "Test number 17"; //Length is 15 counting the null character
unsigned int destinationLength = 15;
char destination[destinationLength];
copy(source, destination, destinationLength);
printf("The String source is: %s\n", source);
printf("The String destination is: %s\n", destination);
return 0;
}
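If you would rather report a too-small destination than abort through assert, one possible variant is sketched below. The name copy_checked, the truncating behaviour and the destination-first parameter order are my own choices for illustration, not part of the answer above.

```c
#include <stddef.h>

/* Copies source into destination, always leaving destination
 * null-terminated. Returns 0 if the whole string fit, 1 if it had
 * to be truncated or destination_size is 0. */
int copy_checked(char *destination, const char *source, size_t destination_size)
{
    size_t i;

    if (destination_size == 0)
        return 1;                       /* no room even for the terminator */

    for (i = 0; source[i] != '\0'; i++) {
        if (i + 1 >= destination_size) {
            destination[i] = '\0';      /* last slot is kept for the terminator */
            return 1;
        }
        destination[i] = source[i];
    }
    destination[i] = '\0';              /* i equals the string length here */
    return 0;
}
```

The terminator is still written explicitly; the only thing that changes is that the caller learns about truncation from the return value.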
If destination is either an array with automatic storage duration or a pointer to storage allocated with malloc, calloc or realloc, and in either case has nchars of storage available, you can implement a fairly robust copy function simply by using snprintf.
Since you are passing the maximum number of characters, including the nul-terminating character, as a parameter to your copy function, that dovetails nicely with the size parameter for snprintf. Further, snprintf will ensure a nul-terminated string in destination, even if source is too long to fit in destination. As a benefit, snprintf returns the number of characters copied (not counting the nul-character) if there is sufficient storage in destination for source; otherwise it returns the number of characters that would have been copied had destination had sufficient space, allowing you to determine the number of characters truncated if destination is insufficient to hold your source string.
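As a small illustration of that return value, the sketch below computes how many characters were cut off. The helper name truncated_by is hypothetical, and it treats a negative return (an output error) as "nothing truncated" purely to keep the example short.

```c
#include <stdio.h>

/* Copies src into dest (at most size bytes including the nul-character)
 * and returns how many characters of src did not fit. */
size_t truncated_by(char *dest, size_t size, const char *src)
{
    int needed = snprintf(dest, size, "%s", src);   /* length src requires */

    if (needed < 0)
        return 0;                                   /* output error: report nothing */
    if ((size_t)needed >= size)
        return (size_t)needed + 1 - size;           /* +1 accounts for the nul-character */
    return 0;                                       /* everything fit */
}
```

With a 16-byte destination, a 16-character source is truncated by one character and dest ends up holding the first 15.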
Before looking at the implementation of copy, let's look at a prototype and talk about declaring functions so that they provide a meaningful return. While it is up to you, let's also look at the order of the parameters and the type qualifier for source, e.g.
/* simple strcpy src to dest, returns dest on success and number of chars
* (including nul-terminating char) in nchar, returns NULL otherwise.
*/
char *copy (char *dest, const char *src, size_t *nchar);
If you notice, most string functions return a pointer to the destination string on success (or NULL otherwise), which allows you to make immediate use of the return. Next, while not a show-stopper, most string (or memory in general) copy functions place the destination as the first parameter, followed later by the source. I'm sure either Brian Kernighan or Dennis Ritchie could explain why, but suffice it to say, most copy function parameters are ordered that way.
Notice also that since you are not modifying the source string in copy, it is best to qualify the parameter as const. The const qualifier is more or less a promise you make to the compiler that source will not be modified in copy which allows the compiler to warn if it sees you breaking that promise, while also allowing the compiler to further optimize the function knowing source will not change.
Finally notice your size or my nchar is passed as a pointer above instead of an immediate value. Since a function in C can only return a single value, if you need a way to get a second piece of information back to the caller, pass a pointer as a parameter so the value at that address can be updated within the function making the new value available to the calling function. Here you return a pointer to dest (or NULL) to indicate success/failure while also updating nchar to contain the number of characters in dest (including the nul-terminating character as you passed in a size not a length).
The definition of copy is quite short and simple. The only requirement is that the source and destination strings do not overlap (neither strcpy nor snprintf is defined in that case). The basic flow is to validate that both src and dest are not NULL, then handle the case where src is the "empty-string" (i.e. the 1st character is the nul-character), then copy src to dest using snprintf, saving the return in written. A simple conditional determines whether truncation occurred (warning in that case), and the function concludes by updating the value pointed to by nchar and returning dest, e.g.
/* simple strcpy src to dest, returns dest on success and number of chars
* (including nul-terminating char) in nchar, returns NULL otherwise.
*/
char *copy (char *dest, const char *src, size_t *nchar)
{
if (!src || !dest) { /* validate src & dest not NULL */
fputs ("error: src or dest NULL\n", stderr);
return NULL; /* return NULL on error */
}
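if (!*src) { /* handle src being an "empty-string" */
*dest = 0, *nchar = 1; /* dest holds only the nul-character */
return dest; /* return early, otherwise a spurious truncation warning is printed */
}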
int written = snprintf (dest, *nchar, "%s", src); /* call snprintf */
if ((size_t)written + 1 > *nchar) { /* handle truncated case */
fprintf (stderr, "warning: dest truncated by %zu chars.\n",
(size_t)(written + 1) - *nchar); /* warn with count */
}
else /* src fit in dest, set nchar to no. of chars in dest */
*nchar = (size_t)(written + 1); /* including nul-character */
return dest; /* return dest so available for immediate use */
}
Putting it all together in a short example that takes the string to copy as the first argument to the program (using "source string" by default if no argument is given), you could do something like the following:
#include <stdio.h>
#define MAXC 16 /* constant for destination length */
/* simple strcpy src to dest, returns dest on success and number of chars
* (including nul-terminating char) in nchar, returns NULL otherwise.
*/
char *copy (char *dest, const char *src, size_t *nchar)
{
if (!src || !dest) { /* validate src & dest not NULL */
fputs ("error: src or dest NULL\n", stderr);
return NULL; /* return NULL on error */
}
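if (!*src) { /* handle src being an "empty-string" */
*dest = 0, *nchar = 1; /* dest holds only the nul-character */
return dest; /* return early, otherwise a spurious truncation warning is printed */
}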
int written = snprintf (dest, *nchar, "%s", src); /* call snprintf */
if ((size_t)written + 1 > *nchar) { /* handle truncated case */
fprintf (stderr, "warning: dest truncated by %zu chars.\n",
(size_t)(written + 1) - *nchar); /* warn with count */
}
else /* src fit in dest, set nchar to no. of chars in dest */
*nchar = (size_t)(written + 1); /* including nul-character */
return dest; /* return dest so available for immediate use */
}
int main (int argc, char **argv) {
char *src = argc > 1 ? argv[1] : "source string",
dest[MAXC];
size_t n = MAXC;
if (copy (dest, src, &n))
printf ("dest: '%s' (%zu chars including nul-char)\n", dest, n);
}
(note: the maximum number of characters in dest is kept short intentionally to easily show how truncation is handled -- size as appropriate for your needs)
Example Use/Output
$ ./bin/strcpy_snprintf
dest: 'source string' (14 chars including nul-char)
Showing maximum number of characters that can be copied without warning:
$ ./bin/strcpy_snprintf 123456789012345
dest: '123456789012345' (16 chars including nul-char)
Showing handling source too long for destination:
$ ./bin/strcpy_snprintf 1234567890123456
warning: dest truncated by 1 chars.
dest: '123456789012345' (16 chars including nul-char)
Look things over and let me know if you have further questions. There are at least a dozen different ways to approach a string copy, but given you are passing dest with its own storage and passing the maximum number of characters (including the nul-character) as a parameter, it's hard to beat snprintf in that case.
Easiest way:
char *copy(const char *source, char *destination, size_t lengthDestination)
{
memcpy(destination, source, lengthDestination -1);
destination[lengthDestination -1] = 0;
return destination;
}

Issue with getting substring in C

I am new to C and working on a project where I need to be able to get a substring but I am having difficulty as there is a compiler warning about the initialisation and a core dump if I attempt to run the program which I am not sure how to resolve.
I have a function called substring which passes in the source string, the start index and to end index.
Below is my substring function.
char *substring(char * src, int from, int to)
{
char * dst = "\0";
strncpy(dst, src+from, to);
return dst;
}
Below is how I am calling the function
char * debug = substring(rowReport[bPartyColIndex], 1, 2);
rowReport is a MYSQL_ROW, and bPartyColIndex is just an int equal 0 to reference the correct column from the MYSQL_ROW.
At the moment the line above has a compiler warning of:
warning: initialization makes pointer from integer without a cast
which I am unable to determine how to fix this warning.
If I try and run the program I then get a coredump which says that it is a segmentation fault within the substring function performing the strncpy.
char * dst = "\0";
strncpy(dst, src+from, to);
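That's why there's a segfault. Initialising dst with "\0" isn't correct: dst then points to a string literal, which must not be written to and is in any case too small to hold the substring. You should allocate the storage instead, leaving room for the terminator, because strncpy does not add one when it copies exactly the requested number of characters:
char *substring(char * src, int from, int to)
{
size_t len = to + 1 - from; // number of characters in [from, to]
char * dst = malloc(len + 1); // +1 for the terminator; assumes src + from is ok
if (dst != 0)
{
strncpy(dst, src+from, len);
dst[len] = '\0'; // strncpy does not terminate when it copies len characters
}
return dst;
}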
In this case, you will have to free dst :
char * debug = substring(rowReport[bPartyColIndex], 1, 2);
puts(debug);
free(debug);
You need to allocate new memory for your substring, or have the caller pass in the desired buffer. What you're trying won't work, you are never allocating the storage.
You need something like:
char * substring(const char *str, int from, int to)
{
const size_t len = to - from + 1;
char *out = malloc(len + 1);
if(out != NULL)
{
memcpy(out, str + from, len);
out[len] = '\0';
}
return out;
}
Then the caller needs to free() the returned pointer when done with it.
Your substring function, by itself, has some issues. You are not allocating any space for dst before copying into it, which can lead to a seg fault. You are also not checking whether from or to may go beyond the end of the string (this can be checked with strlen).
You should also check that from is less than to.
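A sketch that adds those checks might look like the following. It treats from and to as inclusive indices, as the other answers do; the name substring_checked and the choice to return NULL for an invalid range are assumptions of mine.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of src[from..to] (inclusive),
 * or NULL if the range is invalid or the allocation fails. */
char *substring_checked(const char *src, size_t from, size_t to)
{
    size_t src_len, len;
    char *out;

    if (src == NULL)
        return NULL;
    src_len = strlen(src);
    if (from > to || to >= src_len)     /* range must lie inside the string */
        return NULL;

    len = to - from + 1;
    out = malloc(len + 1);              /* +1 for the terminator */
    if (out == NULL)
        return NULL;
    memcpy(out, src + from, len);
    out[len] = '\0';
    return out;
}
```

The caller checks for NULL before using the result and calls free() on it afterwards.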
First:
char * dst = "\0";
strncpy(dst, src+from, to);
You are writing to a string literal but string literals are immutable in C. This invokes undefined behavior.
Second:
rowReport[bPartyColIndex] has to be a char * but it is of a different type.
You didn't specify the type of rowReport in your question, but assuming it is a char array you have to pass &rowReport[bPartyColIndex] instead of rowReport[bPartyColIndex] to substring function.

Fatal error in wchar_t* to char* conversion

Here is a C code that converts a wchar_t* string into a char* string :
wchar_t *myXML = L"<test/>";
size_t length;
char *charString;
size_t i;
length = wcslen(myXML);
charString = (char *)malloc(length);
wcstombs_s(&i, charString, length, myXML, length);
The code compiles, but at execution it detects a fatal error at the last line and stops running.
Now, if I replace the last line with this one :
wcstombs_s(&i, charString, length+1, myXML, length);
I just added +1 to the third argument. Then it works perfectly...
Why is there a need to add this trick ? Or is there a flaw elsewhere in my code ?
You need one extra byte for the '\0' terminator character. wcslen does not include this in the length it returns!
To do this properly, you don't just need to pass length+1 to wcstombs_s but also to malloc:
charString = (char *)malloc(length+1);
wcstombs_s(&i, charString, length+1, myXML, length);
And even then, I suspect it will not work correctly. Not all wide characters can be mapped to a single char, so for non-ASCII characters you will need extra space in the multi-byte string.
DESCRIPTION
The wcslen() function is the wide-character
equivalent of the strlen(3) function. It determines
the length of the wide-character string pointed to by
s, not including the terminating L'\0' character.
The trick is that you should always look for code of the form:
string = malloc(len);
very suspiciously, because both wcslen(3) and strlen(3) return the string length without the nul byte, and malloc(3) must allocate the space with that byte. C kinda sucks sometimes.
So every time you see string = malloc(len); rather than string = malloc(len+1);, be very careful to read how len gets assigned.
charString = (char *)malloc(length + 1);
Ought to do the trick. :)
EDIT:
Better would be to ask wcstombs() for the size to allocate in the first place:
size_t len = wcstombs(NULL, src, 0);
if (len == (size_t)-1) /* handle error */ ...
char *dest = malloc(len + 1);
len = wcstombs(dest, src, len + 1);
The +1 allocates room for the ASCII nul, and wcstombs() reports how much memory is required for the conversion. The error check has to come before the +1, because a failed call returns (size_t)-1 and adding 1 to that wraps around to 0. It does the conversion twice, once to measure the memory required and once to store the result, but it is MUCH simpler to maintain. The second time, when it stores the result, it writes at most len + 1 bytes, including the ASCII nul.
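Wrapped up as a function, the two-pass idea could be sketched like this. The name narrow_dup is made up, and like the snippet above it assumes a wcstombs that accepts a NULL destination for measuring (POSIX behaviour; strictly standard C would use wcsrtombs instead).

```c
#include <stdlib.h>
#include <wchar.h>

/* Converts a wide string to a newly allocated multibyte string.
 * Returns NULL on conversion or allocation failure. */
char *narrow_dup(const wchar_t *src)
{
    size_t needed = wcstombs(NULL, src, 0);     /* bytes needed, excluding '\0' */
    char *dest;

    if (needed == (size_t)-1)
        return NULL;                            /* unconvertible character */

    dest = malloc(needed + 1);                  /* +1 for the terminator */
    if (dest == NULL)
        return NULL;

    if (wcstombs(dest, src, needed + 1) == (size_t)-1) {
        free(dest);
        return NULL;
    }
    return dest;
}
```

Measuring in bytes rather than reusing wcslen also covers the case where a wide character needs more than one byte in the multibyte encoding.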

C - read string from buffer of certain size

I have a char buf[x], int s and void* data.
I want to write a string of size s into data from buf.
How can I accomplish it?
Thanks in advance.
Assuming that
by “string” you mean a null-terminated string as is normally meant in C;
you haven't yet allocated memory in data;
you already know that s <= x
First you need to allocate memory in data. Don't forget the room for the 0 byte at the end of the string.
data = malloc(s+1);
if (data == NULL) {
... /*out-of-memory handler*/
}
Assuming malloc succeeds, you can now copy the bytes.
EDIT:
The best function for the job, as pointed out by caf, is strncat. (It's fully portable, being part of C89.) It appends to the destination string, so arrange for the destination to be an empty string beforehand:
*(char*)data = 0;
strncat(data, buf, s);
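Putting the allocation and the strncat call together, a minimal sketch could be the following; the function name copy_prefix is hypothetical and it returns char * rather than void * for convenience.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding at most the first s
 * characters of buf, or NULL if the allocation fails. */
char *copy_prefix(const char *buf, size_t s)
{
    char *data = malloc(s + 1);     /* room for s characters plus the 0 byte */

    if (data == NULL)
        return NULL;
    data[0] = '\0';                 /* strncat appends, so start from an empty string */
    strncat(data, buf, s);          /* copies up to s characters, then terminates */
    return data;
}
```

If buf is shorter than s, strncat stops at its terminator, so the result is never read past the end of the source string.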
Other inferior possibilities, kept here to serve as examples of related functions:
If you have strlcpy (which is not standard C but is common on modern Unix systems; there are public domain implementations floating around):
strlcpy(data, buf, s+1);
If you know that there are at least s characters in the source string, you can use memcpy:
memcpy(data, buf, s);
((char*)data)[s] = 0; /* index s is the last of the s+1 allocated bytes */
Otherwise you can compute the length of the source string first:
size_t bytes_to_copy = strlen(buf);
if (bytes_to_copy > s) bytes_to_copy = s;
memcpy(data, buf, bytes_to_copy);
((char*)data)[bytes_to_copy] = 0; /* terminate right after the copied bytes */
Or you can use strncpy, though it's inefficient if the actual length of the source string is much smaller than s:
strncpy(data, buf, s);
((char*)data)[s] = 0; /* index s is the last of the s+1 allocated bytes */
If data is not allocated:
char buf[] = "mybuffer";
void *data = malloc(strlen(buf)+1);
strcpy((char*)data,buf);
Actually if data is really to be defined you can also do
char buf[] = "mybuffer";
void *data= (void*)strdup(buf);
memcpy(data, buf, s);
This assumes that you have enough space in data (and in buf).
Depending on what you are doing (you don't say, but you do say that you are copying strings), you may want to add a null at the end of your newly copied string if you did not already copy a null from buf and you are going to use data in a function that expects strings.
data = malloc(s);
strcpy((char*)data,buf);
free(data);
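int n = (s < x - 1) ? s : x - 1; /* copy at most s bytes, never past the end of buf */
const char *bp = buf;
char *dp = (char *)data;
while (n-- > 0) {
*dp++ = *bp++; /* copy from buf into data, not the other way round */
}
*dp = '\0';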

Resources