mbstowcs() gives incorrect results in Windows - c

I am using mbstowcs() to convert a UTF-8 encoded char* string to wchar_t*, which is then passed to _wfopen(). However, I always get a NULL pointer from _wfopen(), and I have traced the problem to the result of mbstowcs().
I prepared the following example and used printf for debugging:
size_t out_size;
size_t requiredSize;
wchar_t *wc_filename;
const char *utf8_filename = "C:/Users/xxxxxxxx/Desktop/\xce\xb1\xce\xb2\xce\xb3.stdf";
const wchar_t *expected_output = L"C:/Users/xxxxxxxx/Desktop/αβγ.stdf";
printf("input: %s, length: %zu\n", utf8_filename, strlen(utf8_filename));
printf("correct out length is %zu\n", wcslen(expected_output));
// conversion starts here
setlocale(LC_ALL, "C.UTF-8");
requiredSize = mbstowcs(NULL, utf8_filename, 0);
printf("requiredsize: %zu\n", requiredSize);
if (requiredSize == (size_t)(-1)) {
// invalid multibyte sequence for the current locale
return -1;
}
wc_filename = (wchar_t*)malloc( (requiredSize+1) * sizeof(wchar_t));
if (!wc_filename) {
// allocation failed; nothing to free
return -1;
}
out_size = mbstowcs(wc_filename, utf8_filename, requiredSize + 1);
if (out_size == (size_t)(-1)) {
// conversion failed
free(wc_filename);
return -1;
}
printf("out_size: %zu, wchar name: %ls\n", out_size, wc_filename);
if (wcscmp (wc_filename, expected_output) != 0) {
printf("converted result is not correct\n");
}
free(wc_filename);
And the console output is:
input: C:/Users/xxxxxxxx/Desktop/αβγ.stdf, length: 37
correct out length is 34
requiredsize: 37
out_size: 37, wchar name: C:/Users/xxxxxxxx/Desktop/αβγ.stdf
converted result is not correct
I don't understand why expected_output and wc_filename appear to have the same content but different lengths. What did I do wrong here?

The problem appears to be in your choice of locale name. Replacing the following:
setlocale(LC_ALL, "C.UTF-8");
with this:
setlocale(LC_ALL, "en_US.UTF-8");
fixes the issue on my system (Windows 10, MSVC, 64-bit build): out_size and requiredSize are both 34, and the "converted result is not correct" message doesn't show. Using "en_GB.UTF-8" also worked.
The C Standard itself only defines the "C" and "" (native environment) locale names; every other name is implementation-defined, so this question/answer may be helpful: Valid Locale Names.
Note: As mentioned in the comment by Mgetz, using setlocale(LC_ALL, ".UTF-8"); also works. That is probably the minimal locale name to use with the Windows CRT, although other C libraries may not accept it.
Second note: You can check whether the setlocale call succeeded by comparing its return value to NULL. With your original locale name, the following code prints an error message (but not if you remove the leading "C"):
if (setlocale(LC_ALL, "C.UTF-8") == NULL) {
printf("Error setting locale!\n");
}
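A minimal sketch combining the points above, with hypothetical helper names: set_utf8_locale() tries a few candidate names and reports whether any was accepted (which names exist depends on the C runtime), and to_wide() keeps mbstowcs()'s result as size_t and compares it against (size_t)-1, so an unusable locale or bad input is detected instead of silently producing one wide character per byte.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: tries a few UTF-8 locale names. ".UTF-8" is the
   Windows UCRT spelling; "C.UTF-8" or "en_US.UTF-8" exist on many other
   systems. Returns 1 if one was accepted, 0 otherwise. */
int set_utf8_locale(void)
{
    static const char *const names[] = { ".UTF-8", "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (setlocale(LC_ALL, names[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: converts a multibyte string (interpreted in the
   current LC_CTYPE locale) to a newly allocated wide string. Returns NULL
   on an invalid sequence or allocation failure; the caller must free(). */
wchar_t *to_wide(const char *mb)
{
    /* mbstowcs returns size_t; (size_t)-1 signals an invalid sequence. */
    size_t needed = mbstowcs(NULL, mb, 0);
    if (needed == (size_t)-1)
        return NULL;

    wchar_t *wide = malloc((needed + 1) * sizeof *wide);
    if (!wide)
        return NULL;

    /* Room for needed + 1 wide chars, so the terminator is written too. */
    if (mbstowcs(wide, mb, needed + 1) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

On Windows, the result of to_wide() could then be passed to _wfopen() and freed afterwards; if set_utf8_locale() returns 0, the conversion should not be trusted for non-ASCII names.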

The Universal CRT (UCRT) supports UTF-8 locales, but MSVCRT.DLL does not.
When using MinGW, you need to link against the UCRT.

Related

sscanf cannot detect a number C

So I wrote this function in C using sscanf:
int parse_charstar(char *pointah)
{
int numbeh;
int retaahn = sscanf(pointah,"%*[^0123456789]%d",&numbeh);
printf("\n prent deeh numbeeh %d \n",numbeh);
return numbeh;
}
I want to get a number out of a string, if there is one. For example:
"hello 121"
number: 121
Currently using the above I'm getting garbage values, can someone help?
EDIT:
So I found something interesting today. Apparently, this is what was happening!
My code was never wrong to begin with, as pointed out by luoluo and dasblinkenlight.
The problem was how I was calling the function. I'm on Linux.
I was calling it as:
parse_charstar("1000");
Output:
prent deeh numbeeh -1634553883
I tried:
parse_charstar(" 1000 "); // added spaces
Output?
prent deeh numbeeh 1000
Spot on.
Now can someone tell me why this happens?
EDIT!!!
Hell with it, guys: use strtol, it's made for this stuff.
http://www.cplusplus.com/reference/cstdlib/strtol/
Code copied shamelessly from the above page:
#include <stdio.h> /* printf */
#include <stdlib.h> /* strtol */
int main ()
{
char szNumbers[] = "2001 60c0c0 -1101110100110100100000 0x6fffff";
char * pEnd;
long int li1, li2, li3, li4;
li1 = strtol (szNumbers,&pEnd,10);
li2 = strtol (pEnd,&pEnd,16);
li3 = strtol (pEnd,&pEnd,2);
li4 = strtol (pEnd,NULL,0);
printf ("The decimal equivalents are: %ld, %ld, %ld and %ld.\n", li1, li2, li3, li4);
return 0;
}
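Applied to the original problem, a strtol()-based version might look like the sketch below. It skips ahead to the first digit (so, like the sscanf format, a leading sign is ignored) and reports failure through its return value instead of leaving the result uninitialized. The name parse_first_number is hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: finds the first run of decimal digits in s and
   stores its value in *out. Returns 1 on success, 0 if s has no digits
   or the number does not fit in a long. */
int parse_first_number(const char *s, long *out)
{
    /* Skip everything up to the first digit; the cast avoids undefined
       behaviour when a negative char value is passed to isdigit(). */
    while (*s && !isdigit((unsigned char)*s))
        s++;
    if (!*s)
        return 0;

    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (end == s || errno == ERANGE)
        return 0;

    *out = value;
    return 1;
}
```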
A more restricted version of your sscanf would be
int retaahn = sscanf(pointah,"%*[^0-9]%d%*[^0-9]",&numbeh);
Note that this doesn't change the behavior of your format string. I have just used 0-9 to express the range and added a second %*[^0-9] to make things more explicit.
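The behavior described in the question's edit comes from the scanset itself: a %[ conversion, even when suppressed with *, must match at least one character. With "1000" there is no leading non-digit, so that conversion fails, sscanf() returns 0, and numbeh is never assigned, which is why an uninitialized (garbage) value was printed. The sketch below checks the return value and falls back to a plain %d; the name parse_charstar_checked is hypothetical, and the digits are spelled out as in the question because a range such as 0-9 inside a scanset is implementation-defined in ISO C.

```c
#include <stdio.h>

/* Hypothetical variant of parse_charstar(): returns 1 and stores the
   first number in *out, or returns 0 if no number was found. */
int parse_charstar_checked(const char *s, int *out)
{
    /* Case 1: some non-digit text precedes the number ("hello 121"). */
    if (sscanf(s, "%*[^0123456789]%d", out) == 1)
        return 1;
    /* Case 2: the string begins with the number ("1000"); the scanset
       above matched nothing, so that whole sscanf call failed. */
    if (sscanf(s, "%d", out) == 1)
        return 1;
    return 0;
}
```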
Currently using the above I'm getting garbage values, can someone
help?
Probably because you're not passing the right arguments to the function. Just do a
printf("pointah : %s\n",pointah);
to see what is passed or set breakpoints and debug your program.
So since my code was never wrong, it turns out my problem was how I was calling this function.
This is how I solved it:
I was calling it as:
parse_charstar("1000");
I tried:
parse_charstar(" 1000 "); // added spaces
And it worked!
Check my edit above for more!!

getting format not a string literal even if I add %s

I have looked around for an answer on various forums and tried various things, but I am still getting this error:
warning: format not a string literal and no format arguments [-Wformat-security]
The compiler points to the line in the function that has the error; here's how it looks:
int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1) // if end of file reached
return 0; // return 0
read(fd, note_buffer, note_length); // read note data
note_buffer[note_length] = 0; // terminate the string
if(search_note(note_buffer, searchstring)) // if searchstring found
scanf("%s", note_buffer) // Got this line from an answer in the forums
printf(note_buffer); // compiler points here
return 1;
}
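If you want the full code I can post it here, but it's kind of long; I don't know if that would be OK.
It's warning about:
printf(note_buffer);
because the format string is formed at runtime and passed directly to printf. If note_buffer ever contains a % sequence, printf will treat it as a conversion specification and try to read arguments that were never passed.
Use:
printf("%s",note_buffer);
Also remove the scanf("%s", note_buffer) line: it reads a new word from standard input into note_buffer, overwriting the note you just read, and it has nothing to do with the warning.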
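A small sketch of why the "%s" form is safe, using snprintf() so the result can be inspected: when the data is passed as the argument of "%s", every % in it is copied as an ordinary character. The helper name copy_verbatim is hypothetical; for plain printing, fputs(note_buffer, stdout) is another option that involves no format string at all.

```c
#include <stdio.h>

/* Hypothetical helper: copies note into out (size bytes) exactly as-is.
   Because note is an argument to "%s" rather than the format itself,
   '%' characters in it are not interpreted as conversions. Returns the
   number of characters the full string needs, as snprintf() does. */
int copy_verbatim(char *out, size_t size, const char *note)
{
    return snprintf(out, size, "%s", note);
}
```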

Having trouble comparing strings in file to an array of strings inputted by user in C

I have tried to research this question, but was unable to find anything that would help me. I have been debugging with fprintf, but I still cannot figure it out.
I am an intermediate programmer and would love some help here. Here is my code:
int i = 0;
const int arraySize = 10;
char buf[256];
char str[256];
char buffer[256];
char *beerNames[arraySize] = { };
FILE *names;
FILE *percent;
i = 0;
int numBeers = 0;
printf("Please enter a name or (nothing to stop): ");
gets(buf);
while (strcmp(buf, "") != 0) {
beerNames[i] = strdup(buf);
i++;
numBeers++;
if (numBeers == arraySize)
break;
printf("Please enter a name or (nothing to stop): ");
gets(buf);
}
// now open files and look for matches of names: //
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
fgets(buffer, sizeof(buffer) / sizeof(buffer[0]), percent);
for (i = 0; i < numBeers; i++) {
if (strcmp(str, beerNames[i]) == 0) {
printf("Beer: %s Percentage: %s\n", str, beerNames[i]);
break;
}
}
}
fclose(names);
fclose(percent);
So, the issue I am having is that strcmp() does not compare properly: it returns either -1 or 1 even for names that are in the file. I have printed the strcmp() values as well, and it never returns 0, so the match is always skipped.
My Beer_Names.txt (shortened) looks like this:
Anchor Porter
Anchor Steam
Anheuser Busch Natural Light
Anheuser Busch Natural Ice
Aspen Edge
Big Sky I.P.A.
Big Sky Moose Drool Brown Ale
Big Sky Powder Hound (seasonal)
Big Sky Scape Goat Pale Ale
Big Sky Summer Honey Ale (seasonal)
Blatz Beer
Blatz Light
Blue Moon
And my Beer_Percentage.txt (shortened) looks like this:
5.6
4.9
4.2
5.9
4.1
6.2
5.1
6.2
4.7
14.7
4.8
0
5.4
This is not a homework assignment; I am just doing a personal project and trying to get better at C.
Your problem is that gets() does not store the newline character in the string, while fgets() does.
So when the user-entered value "Anchor Porter" is read with gets, your string looks like "Anchor Porter\0", but when the same line is read from the file with fgets it ends up as "Anchor Porter\n\0", and the two will not compare equal.
gets(buf);
I know gets(3) is convenient, and I know this is a toy, but please do not use gets(3). It has no way of knowing how large the destination buffer is, so it is impossible to write secure code with it. It has in fact been dropped: C11 removed gets() from the standard library, and POSIX.1-2008 had already marked it obsolescent. Reasonable compilers will warn you about its use. Use fgets(3) instead.
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
sizeof(char) is defined to be 1, so dividing by sizeof(str[0]) is redundant here, and you're unlikely to change the type of the array. It's generally not a big deal, but you cannot use sizeof(str) as often as you might suspect: it works in this case only because str[] is declared as an array in a scope enclosing this line. If str were passed as a parameter, sizeof(str) would return the size of a data pointer, not the size of the array. Don't get too used to this construct; it won't always work as you expect.
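The array-versus-pointer difference can be seen directly. In the sketch below (hypothetical names), the caller's array reports its full 256 bytes, but inside a function that receives it the parameter is just a pointer, so the capacity has to be passed alongside it.

```c
#include <stddef.h>
#include <stdio.h>

/* Inside a function, an array argument has already decayed to a
   pointer, so sizeof reports the size of a pointer, not the buffer. */
size_t size_seen_by_callee(const char *str)
{
    return sizeof str;
}

/* The usual fix: pass the real capacity together with the pointer. */
char *read_line(char *str, size_t capacity, FILE *fp)
{
    return fgets(str, (int)capacity, fp);
}
```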
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
fgets(buffer, sizeof(buffer) / sizeof(buffer[0]), percent);
Please take the time to check fopen(3) for success or failure. It's a good habit to get into, and if you provide a good error message, it might save you time in the future, too. Replace the fopen() lines with something like this:
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
if (!names) {
perror("failed to open Beer_Names.txt");
exit(1);
}
if (!percent) {
perror("failed to open Beer_Percentage.txt");
exit(1);
}
You could wrap that up into a function that does fopen(), checks the return value, and either prints the error message and quits or returns the FILE* object.
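One way to write such a wrapper, as a sketch: the name xfopen is hypothetical, and exiting on failure is an assumption that suits a small program like this one rather than library code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: opens path with the given mode, or prints
   "<path>: <system error message>" via perror() and exits. */
FILE *xfopen(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (!fp) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return fp;
}
```

With it, the two fopen() calls become names = xfopen("Beer_Names.txt", "r"); and percent = xfopen("Beer_Percentage.txt", "r");.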
And now, the bug that brought you here: Robert has pointed out that fgets(3) and gets(3) handle the terminating newline of input differently: fgets(3) keeps it, gets(3) discards it. (One more reason to get rid of gets(3) as soon as possible.)
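Given that difference, a common fix is to read both the user input and the file lines with fgets() and strip the line ending before comparing. A sketch with a hypothetical helper name; it also removes a '\r', on the assumption that the text files might have Windows line endings.

```c
#include <string.h>

/* Hypothetical helper: truncates line at the first '\r' or '\n'. For a
   line read by fgets() that is the line ending ("\n" or "\r\n"). */
void chomp(char *line)
{
    /* strcspn returns the index of the first '\r' or '\n', or the
       string length if neither occurs; '\0' there ends the string. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

In the loop this means calling chomp(str) right after each fgets() on names, and chomp(buf) after reading each user entry with fgets(buf, sizeof buf, stdin).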

parsing/matching string occurrence in C

I have the following string:
const char *str = "\"This is just some random text\" 130 28194 \"Some other string\" \"String 3\"";
I would like to get the integer 28194. Of course the integer varies, so I can't just do strstr("28194").
So I was wondering: what would be a good way to get that part of the string?
I was thinking of using #include <regex.h>, since I already have a procedure that matches regular expressions, but I'm not sure what the regex would look like in POSIX notation (something like [:alpha:]+[:digit:]?) or whether performance will be an issue. Or would it be better to use strchr/strstr?
Any ideas would be appreciated.
If you want to use regex, you can use:
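#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const char *str = "\"This is just some random text\" 130 28194 \"Some other string\" \"String 3\"";
regex_t re;
regmatch_t matches[2];
// One or more digits (captured as group 1) followed by a space and a double quote
int comp_ret = regcomp(&re, "([[:digit:]]+) \"", REG_EXTENDED);
if(comp_ret)
{
// Error occurred: report it and stop, since re is not usable. See regex.h
char errbuf[128];
regerror(comp_ret, &re, errbuf, sizeof errbuf);
fprintf(stderr, "regcomp failed: %s\n", errbuf);
return 1;
}
if(!regexec(&re, str, 2, matches, 0))
{
// matches[1].rm_so is the offset of the captured digits within str
long long result = strtoll(str + matches[1].rm_so, NULL, 10);
printf("%lld\n", result);
}
else
{
// Didn't match
}
regfree(&re);
return 0;
}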
You're correct that there are other approaches.
EDIT: Changed to use non-optional repetition and show more error checking.
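As one of those other approaches, a sketch using only strchr() and strtol(). It assumes the layout is always a quoted string followed by two integers and that the quoted text contains no escaped quotes; the name parse_second_field is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Assumes the layout
 *   "quoted text" N1 N2 ...
 * and stores N2 in *out. Returns 0 on success, -1 if the layout differs. */
int parse_second_field(const char *s, long *out)
{
    const char *p = strchr(s, '"');   /* opening quote */
    if (p == NULL)
        return -1;
    p = strchr(p + 1, '"');           /* closing quote */
    if (p == NULL)
        return -1;
    p++;

    char *end;
    (void)strtol(p, &end, 10);        /* first integer (130), skipped */
    if (end == p)
        return -1;
    p = end;

    long value = strtol(p, &end, 10); /* second integer (28194) */
    if (end == p)
        return -1;
    *out = value;
    return 0;
}
```

strtol() skips leading whitespace on its own, so no explicit space handling is needed between the fields.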

sprintf() gone crazy

I need some help with this; it baffles me in my C program.
I have two strings (base and path):
BASE: /home/steve/cps730
PATH: /page2.html
This is how printf shows them just before I call sprintf to join them together. Here is the code block:
int memory_alloc = strlen(filepath)+1;
memory_alloc += strlen(BASE_DIR)+1;
printf("\n\nAlloc: %d",memory_alloc);
char *input = (char*)malloc(memory_alloc+9000);
printf("\n\nBASE: %s\nPATH: %s\n\n",BASE_DIR,filepath);
sprintf(input, "%s%s",BASE_DIR,filepath); // :(
printf("\n\nPATH: %s\n\n",input);
Now, can anyone explain why the final printf statement prints this?
PATH: e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/stev
I don't understand it at all.
I added 9000 to the malloc size to keep the program from crashing (the resulting string is obviously bigger than 31 bytes).
Full output:
Alloc: 31
BASE: /home/steve/cps730
PATH: /page2.html
PATH: /home/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/steve/cps730e/stev
Sending:
HTTP/1.0 404 Not Found
Date: Sat, 12 Sep 2009 19:01:53 GMT
Connection: close
EDIT: All the code that uses these variables:
const char *BASE_DIR = "/home/steve/cps730";
char* handleHeader(char *header){
//Method given by browser (will only take GET, POST, and HEAD)
char *method;
method = (char*)malloc(strlen(header)+1);
strcpy(method,header);
method = strtok(method," ");
if(!strcmp(method,"GET")){
char *path = strtok(NULL," ");
if(!strcmp(path,"/")){
path = (char*)malloc(strlen(BASE_DIR)+1+12);
strcpy(path,"/index.html");
}
free(method);
return readPage(path);
}
else if(!strcmp(method,"POST")){
}
else if(!strcmp(method,"HEAD")){
}
else{
strcat(contents,"HTTP/1.1 501 Not Implemented\n");
strcat(contents, "Date: Sat, 12 Sep 2009 19:01:53 GMT\n");
strcat(contents, "Connection: close\n\n");
}
free(method);
}
//Return the contents of an HTML file
char* readPage(char* filepath){
int memory_alloc = strlen(filepath)+1;
memory_alloc += strlen(BASE_DIR)+1;
printf("\n\nAlloc: %d",memory_alloc);
char *input = (char*)malloc(memory_alloc+9000);
printf("\n\nBASE: %s\nPATH: %s\n\n",BASE_DIR,filepath);
sprintf(input, "%s%s\0",BASE_DIR,filepath);
printf("\n\nPATH: %s\n\n",input);
FILE *file;
file = fopen(input, "r");
char temp[255];
strcat(contents,"");
if(file){
strcat(contents, "HTTP/1.1 200 OK\n");
strcat(contents, "Date: Sat, 12 Sep 2009 19:01:53 GMT\n");
strcat(contents, "Content-Type: text/html; charset=utf-8\n");
strcat(contents, "Connection: close\n\n");
//Read the requested file line by line
while(fgets(temp, 255, file)!=NULL) {
strcat(contents, temp);
}
}
else{
strcat(contents, "HTTP/1.0 404 Not Found\n");
strcat(contents, "Date: Sat, 12 Sep 2009 19:01:53 GMT\n");
strcat(contents, "Connection: close\n\n");
}
return contents;
}
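You call readPage with an invalid pointer, path. strtok() returns pointers into the string it is tokenizing, so path points into the memory previously allocated for method, which is freed right before the call to readPage. The next malloc can reuse this memory, and then anything can happen. That would also explain the repeating output: if input reuses the freed block, filepath points 4 bytes into it (just past "GET "), so sprintf overwrites the source string with BASE_DIR while it is still copying from it, and "e/steve/cps730" is copied over and over.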
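A minimal sketch of keeping the path valid after the buffer is freed, assuming the header has the form "GET /page2.html HTTP/1.0": copy the token out before calling free(). The name copy_request_path is hypothetical, and like the original code it uses strtok(), which is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a separately allocated copy of the second
   space-separated token of header (the request path), or NULL. strtok()
   returns pointers into the buffer it tokenizes, so the token must be
   copied before that buffer is freed. The caller must free() the result. */
char *copy_request_path(const char *header)
{
    char *method = malloc(strlen(header) + 1);
    if (!method)
        return NULL;
    strcpy(method, header);

    char *copy = NULL;
    if (strtok(method, " ") != NULL) {
        char *path = strtok(NULL, " ");   /* points inside method */
        if (path != NULL) {
            size_t len = strlen(path) + 1;
            copy = malloc(len);
            if (copy)
                memcpy(copy, path, len);
        }
    }
    free(method);   /* safe: copy does not point into method */
    return copy;
}
```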
Well, clearly this can't happen :-)
My guess is that your heap is horribly corrupted already.
I would look at the actual pointer values used by filepath, input and base. I wonder if you'll find that input is very close to filepath?
I would also look at how filepath, base etc. were originally created: could you have a buffer overrun there?
Try this code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
const char* BASE_DIR = "/home/steve/cps730";
const char* filepath = "/page2.html";
size_t memory_alloc = strlen(filepath);
memory_alloc += strlen(BASE_DIR)+1;
printf("\n\nAlloc: %zu",memory_alloc);
char *input = malloc(memory_alloc);
if (input == NULL)
return 1;
printf("\n\nBASE: %s\nPATH: %s\n\n",BASE_DIR,filepath);
sprintf(input, "%s%s",BASE_DIR,filepath);
printf("\n\nPATH: %s\n\n",input);
free(input);
return 0;
}
If this doesn't have a problem, then there must be something wrong elsewhere in the code. That's how undefined behavior sometimes may manifest itself (messing up how unrelated code works).
(BTW, I didn't add +1 to both strlen calls, since the concatenated string is still going to have only one null-terminator.)
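The same size calculation as a reusable sketch (join_path is a hypothetical name): one byte for the single terminator, and snprintf() bounded by that size. It does not fix a dangling pointer by itself; both arguments must still point to valid strings.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated base + path string,
   or NULL if allocation fails. The caller must free() it. */
char *join_path(const char *base, const char *path)
{
    /* Both lengths plus a single terminating '\0'. */
    size_t size = strlen(base) + strlen(path) + 1;
    char *out = malloc(size);
    if (!out)
        return NULL;
    /* snprintf never writes more than size bytes, unlike sprintf. */
    snprintf(out, size, "%s%s", base, path);
    return out;
}
```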
Since the BASE_DIR value is repeating itself, either BASE_DIR or filepath is probably overlapping the input memory.
Make sure both BASE_DIR and filepath really point to valid, allocated memory.
A first try is to just make a local copy of BASE_DIR and filepath before calling sprintf.
Aaah - the thrill of the chase as the question morphs while we're trying to resolve the problem!
The current code looks like:
const char *BASE_DIR = "/home/steve/cps730";
//Handles the header sent by the browser
char* handleHeader(char *header){
//Method given by browser (will only take GET, POST, and HEAD)
char *method;
method = (char*)malloc(strlen(header)+1);
strcpy(method,header);
method = strtok(method," ");
if(!strcmp(method,"GET")){
char *path = strtok(NULL," ");
if(!strcmp(path,"/")){
path = (char*)malloc(strlen(BASE_DIR)+1+12);
strcpy(path,"/index.html");
}
free(method);
return readPage(path);
}
...
Question: if this is running in a web server, is it safe to be using the thread-unsafe function strtok()? I'm going to assume 'Yes, it is safe', though I'm not wholly convinced. Have you printed the header string? Have you printed the value of path? Did you really intend to leak the allocated path? Did you realize that the malloc() + strcpy() sequence does not copy BASE_DIR into path?
The original version of the code ended with:
printf("\n\nPATH: %s\n\n", filepath);
Hence the original suggested partial answer:
You format into input; you print from filepath?
What is the chance that filepath points to already released memory? Once memory has been released, the next allocation can reuse it, so anything can happen to the area that filepath still points to. Another possibility is that filepath points to a local variable in a function that has returned, so it points somewhere in the stack that is being reused by other code, such as sprintf().
I also mentioned in a comment that you might conceivably need to ensure that malloc() is declared and check the return value from it. The '(char *)' cast is not mandatory in C (it is in C++), and many prefer not to include the cast if the code is strictly C and not bilingual in C and C++.
This code works for me:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
const char *BASE_DIR = "/home/steve/cps730";
const char *filepath = "/page2.html";
int memory_alloc = strlen(filepath) + 1;
memory_alloc += strlen(BASE_DIR) + 1;
printf("\n\nAlloc: %d", memory_alloc);
char *input = (char*)malloc(memory_alloc + 9000);
printf("\n\nBASE: %s\nPATH: %s\n\n", BASE_DIR, filepath);
sprintf(input, "%s%s", BASE_DIR, filepath);
printf("\n\nPATH: %s\n\n", filepath);
printf("\n\nPATH: %s\n\n", input);
return(0);
}
It produces extraneous empty lines plus:
Alloc: 31
BASE: /home/steve/cps730
PATH: /page2.html
PATH: /page2.html
PATH: /home/steve/cps730/page2.html
The easiest way to figure out what's going on is to trace through the execution in a debugger (possibly dropping to tracing the assembly code).
A few guesses as to what might be going on:
memory corruption by another thread (seems unlikely if this is readily repeatable)
corrupt heap (also seems unlikely, as you dump the 2 component strings after the malloc() call)
as mentioned by Jonathan Leffler in a comment, you might be missing a header (perhaps stdio.h), and the compiler is generating an incorrect calling/stack-cleanup sequence for the printf/sprintf calls. I would expect you to see some compile-time warnings if this were the case, and you should take note of them.
What compiler/target are you using?
To do this correctly, I'd change the code to:
/* CHANGED: allocate additional space for "index.html" */
method = (char*)malloc(strlen(header)+1+10);
strcpy(method,header);
method = strtok(method," ");
if(!strcmp(method,"GET")){
char *path = strtok(NULL," ");
if(!strcmp(path,"/")){
/* CHANGED: don't allocate new memory, use previously allocated */
strcpy(path,"/index.html");
}
/* CHANGED: call function first and free memory _after_ the call */
char *result = readPage(path);
free(method);
return result;
}
Suggestions
There is nothing obviously wrong with the program. (Update: well, there is something obvious now. For the first hour only a few lines were posted, and they had no serious bugs.) You will have to post more of it. Here are some ideas:
malloc(3) returns void *, so it should not be necessary to cast it. If you get a warning without the cast, it most likely means you did not include <stdlib.h>; if you haven't, you should. (For example, on a 64-bit system, not prototyping malloc(3) can be quite serious. Some of the 64-bit environments don't really support K&R C. :-)
Speaking of warnings, please be sure you are turning them on. With gcc, -Wall enables many of them, and -Wextra adds more.
You are not checking the return value of malloc(3) for an error.
Use a memory debugger such as Electric Fence. There are many choices, for example Valgrind, or AddressSanitizer (-fsanitize=address in gcc and clang).
