How to convert argv to wide chars in Win32 command line application? - c

I'm using the win32 api for C in my program to read from a serial port, it seems to be pretty low level stuff. Assuming that there is no better way of reading from a serial port, the CreateFile function involves a LPCWSTR argument, I've read and it looks like LPCWSTR is a wchar_t type. Firstly, I don't really understand the difference between wchar and char, I've read stuff about ansi and unicode, but I don't really know how it applies to my situation.
My program uses a main function, not wmain, and needs to get an argument from the command line and store it in a wchar_t variable. Now I know I could do this if I just made the string up on the spot;
wchar_t variable[1024];
swprintf(variable,1024,L"%s",L"randomstringETC");
Because it looks like the L converts char arrays to wchar arrays. However it does not work when I do;
wchar_t variable[1024];
swprintf(variable,1024,L"%s",Largv[1]);
obviously because it's a syntax error. I guess my question is, is there an easy way to convert normal strings to wchar_t strings?
Or is there a way to avoid this Unicode stuff completely and read from serial another way using C on windows..

There is no winapi function named CreateFile. There are CreateFileW and CreateFileA. CreateFile is a macro that maps to one of these real functions depending on whether the UNICODE macro is defined. CreateFileW takes an LPCWSTR (aka const wchar_t*), CreateFileA takes an LPCSTR (aka const char*).
If you are not ready to move to Unicode yet, simply call CreateFileA() explicitly. Or change the project setting: Project + Properties, General, Character Set. This has a non-zero cost because the underlying operating system is entirely Unicode based: CreateFileA() goes through a translation layer that turns the const char* into a const wchar_t* according to the current system code page.

The L thing is only for string literals.
You need to convert the argv string (a plain char*) to wchar_t by using something like the mbstowcs() function from the C standard library.
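As a rough sketch of that conversion: narrow_to_wide is a made-up helper name, not a library function. mbstowcs converts according to the current C locale, so this assumes the argument is plain ASCII or matches the locale selected with setlocale.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts the narrow string src into dst, which has
   room for cap wide characters. Returns 0 on success, -1 on failure. */
static int narrow_to_wide(const char *src, wchar_t *dst, size_t cap)
{
    size_t n;

    if (src == NULL || dst == NULL || cap == 0)
        return -1;

    n = mbstowcs(dst, src, cap);
    if (n == (size_t)-1 || n == cap) {
        /* (size_t)-1: invalid multibyte sequence.
           n == cap: the buffer filled up before the terminator was written. */
        dst[0] = L'\0';
        return -1;
    }
    return 0;
}
```

In main this would be called as narrow_to_wide(argv[1], variable, 1024), after which variable can be passed to CreateFileW.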

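MultiByteToWideChar can be used to map from ANSI to Unicode. To make your swprintf call work, define an array of WCHAR and format the narrow argv[1] with %hs (in Microsoft's wide printf functions, %hs always means a narrow string), like this:
WCHAR lala[256] = {0};
swprintf(lala, _countof(lala), L"%hs", argv[1]);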
It is possible to avoid Unicode by compiling your application against a multibyte character set, but that is bad practice unless you are doing it for legacy reasons. Windows will need to convert the strings to Unicode at some point anyway, because that is the encoding of the underlying OS.

Related

How can I convert a PCHAR* to a TCHAR*?

I am looking for ways to convert a PCHAR* variable to a TCHAR* without having any warnings in Visual Studio( this is a requirement)?
Looking online I can't find a function or a method to do so without having warnings. Maybe somebody has come across something similar?
Thank you !
convert a PCHAR* variable to a TCHAR*
PCHAR is a typedef that resolves to char*, so PCHAR* means char**.
TCHAR is a macro #define'd to either the "wide" wchar_t or the "narrow" char.
In neither case can you (safely) convert between a char ** and a simple character pointer, so the following assumes the question is actually about converting a PCHAR to a TCHAR*.
PCHAR is the same as TCHAR* in ANSI builds, and no conversion would be necessary in that case, so it can be further assumed that the question is about Unicode builds.
The PCHAR comes from the function declaration (can't be changed) and TCHAR comes from GetCurrentDirectory. I want to concatenate the 2 using _tcscat_s but I need to convert the PCHAR first.
The general question of converting between narrow and wide strings has been answered before, see for example Convert char * to LPWSTR or How to convert char* to LPCWSTR?. However, in this particular case, you could weigh the alternatives before choosing the general approaches.
Change your build settings to ANSI, instead of Unicode, then no conversion is necessary.
That's as easy as making sure neither UNICODE nor _UNICODE macros are defined when compiling, or changing in the IDE the project Configuration Properties / Advanced / Character Set from Use Unicode Character Set to either Not Set or Use Multi-Byte Character Set.
Disclaimer: it is retrograde nowadays to compile against an 8-bit Windows codepage. I am not advising it, and doing that means many international characters cannot be represented literally. However, a chain is only as strong as its weakest link, and if you are forced to use narrow strings returned by an external function that you cannot change, then that's limiting the usefulness of going full Unicode elsewhere.
Keep the build as Unicode, but change just the concatenation code to use ANSI strings.
This can be done by explicitly calling the ANSI version GetCurrentDirectoryA of the API, which returns a narrow string. Then you can strcat that directly with the other PCHAR string.
Keep it as is, but combine the narrow and wide strings using [w]printf instead of _tcscat_s.
char szFile[] = "test.txt";
PCHAR pszFile = szFile; // narrow string from ext function
wchar_t wszDir[_MAX_PATH];
GetCurrentDirectoryW(_MAX_PATH, wszDir); // wide string from own code
wchar_t wszPath[_MAX_PATH];
swprintf_s(wszPath, _MAX_PATH, L"%ws\\%hs", wszDir, pszFile); // combined into wide string, bounded by the buffer size

How can I convert C-String to LPCSTR on Windows

In order to find if a file exists, I want to use the GetFileAttributes WinAPI function.
The function accepts a LPCSTR argument. How can I convert my classic const char* string to this type?
Please note, I'm using C, not C++. Is this the right way to go in C too?
According to this Microsoft documentation page, LPCSTR is defined in WinNT.h as follows:
typedef __nullterminated CONST CHAR *LPCSTR;
This evaluates to const char *.
So, you are essentially asking how to convert const char* to itself. In other words, the answer to your question is that no conversion is required.
Regarding your question, there is no difference between C and C++. However, C++ offers additional ways of handling strings.
It is GetFileAttributesA (note the A) which uses LPCSTR. The wide character version is GetFileAttributesW, and its argument is LPCWSTR. The generic name GetFileAttributes is a shim which will switch between these two functions at compile time; it is defined in terms of a TCHAR typedef (const strings of which are LPCTSTR). TCHAR switches between CHAR or WCHAR based on whether you build the program for Unicode support.
If you have a const char * input intended to be passed to GetFileAttributes being compiled for Unicode, or to be passed to GetFileAttributesW, then a conversion is needed from byte string to wide string.
It's best to avoid mixing wide and narrow strings in the entire program, if at all possible, to avoid the cumbersome conversions.

Converting a UTF-8 text to wchar_t

I know this question has been asked quite a few times here, and I did read some of the answers, but there are a few suggested solutions and I'm trying to figure out the best of them.
I'm writing a C99 app that basically receives XML text encoded in UTF-8.
Part of its job is to copy and manipulate that string (finding a substring, concatenating it, etc.).
As I would rather not use an outside non-standard library right now, I'm trying to implement it using wchar_t.
Currently, I'm using mbstowcs to convert it to wchar_t for easy manipulation, and for some input I tried in different languages it worked fine.
The thing is, I read that some people had issues with UTF-8 and mbstowcs, so I would like to hear whether this use is permitted/acceptable.
The other option I came across was using iconv with the WCHAR_T parameter. The thing is, I'm working on a platform (not a PC) whose locale support is limited to only the ANSI C locale. How about that?
I also came across a C++ library which is very popular, but I'm limited to a C99 implementation.
Also, I will be compiling this code on another platform, where the size of wchar_t is different (2 bytes versus 4 bytes on my machine). How can I overcome that? Using fixed-size char containers? But then, which manipulation functions should I use instead?
Happy to hear some thoughts. Thanks.
C does not define what encoding the char and wchar_t types are and the standard library only mandates some functions that translate between the two without saying how. If the implementation-dependent encoding of char is not UTF-8 then mbstowcs will result in data corruption.
As noted in the rationale for the C99 standard:
However, the five functions are often too restrictive and too primitive to develop portable international programs that manage characters.
...
C90 deliberately chose not to invent a more complete multibyte- and wide-character library, choosing instead to await their natural development as the C community acquired more experience with wide characters.
Sourced from here.
So, if you have UTF-8 data in your chars there isn't a standard API way to convert that to wchar_ts.
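Since the standard library does not help here, one option is to decode the UTF-8 by hand. The following is only a minimal sketch of that idea: utf8_decode is a hypothetical name, and the function handles a single code point at a time, rejecting truncated input, overlong forms and surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one code point from the len bytes at s.
   Stores it in *cp and returns the number of bytes consumed,
   or 0 if the input is malformed or truncated. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c, min;
    size_t need, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                       /* 1 byte: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 2-byte sequence */
        c = s[0] & 0x1F; need = 1; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 3-byte sequence */
        c = s[0] & 0x0F; need = 2; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {      /* 4-byte sequence */
        c = s[0] & 0x07; need = 3; min = 0x10000;
    } else {
        return 0;                            /* stray continuation or invalid lead byte */
    }

    if (len < need + 1)
        return 0;                            /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                        /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong encodings, values past U+10FFFF, and surrogates. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return need + 1;
}
```

Calling it in a loop and advancing by the returned count walks a whole buffer; the decoded values can then be stored in whatever fixed-width type suits the target platform.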
In my opinion wchar_t should usually be avoided unless necessary - you might need it if you're using WIN32 APIs for example. I am not convinced it will simplify string manipulation. wchar_t is always UTF-16LE on Windows so you may still need to have more than one wchar_t to represent a single Unicode code point anyway.
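To make the "more than one wchar_t per code point" remark concrete, here is a small sketch of how UTF-16 splits a code point above U+FFFF into a surrogate pair. utf16_encode is a hypothetical name, and uint16_t is used instead of wchar_t so the example behaves the same on every platform.

```c
#include <stdint.h>

/* Hypothetical helper: encodes one code point as UTF-16.
   Writes 1 or 2 units to out and returns that count,
   or 0 if cp is not a valid Unicode scalar value. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;               /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                           /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+20AC fits in one unit, while U+1F600 comes out as the pair 0xD83D 0xDE00, so indexing a wide string by unit is not the same as indexing it by character.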
I suggest you investigate the ICU project - at least from an educational standpoint.
Also, I will be compiling this code on another platform, where the
size of wchar_t is different (2 bytes versus 4 bytes on my machine).
How can I overcome that? Using fixed-size char containers?
You could do that with conditional typedefs like this:
#include <stddef.h> /* wchar_t */
#include <stdint.h> /* uint16_t, uint32_t */
#if defined(__STDC_UTF_16__) || defined(__STDC_UTF_32__)
#include <uchar.h> /* char16_t, char32_t */
#endif
#if defined(__STDC_UTF_16__)
typedef char16_t CHAR16;
#elif defined(_WIN32)
typedef wchar_t CHAR16;
#else
typedef uint16_t CHAR16;
#endif
#if defined(__STDC_UTF_32__)
typedef char32_t CHAR32;
#elif defined(__STDC_ISO_10646__)
typedef wchar_t CHAR32;
#else
typedef uint32_t CHAR32;
#endif
This will define the typedefs CHAR16 and CHAR32 to use the C11 character types from <uchar.h> if available, but otherwise fall back to using wchar_t when possible and fixed-width unsigned integers otherwise.

cannot convert parameter 1 from 'const char *' to 'LPCWSTR'

Basically I have some simple code that does some things for files and I'm trying to port it to windows. I have something that looks like this:
int SomeFileCall(const char * filename){
#ifndef __unix__
SomeWindowsFileCall(filename);
#endif
#ifdef __unix__
/**** Some unix only stat code here! ****/
#endif
}
the line SomeWindowsFileCall(filename); causes the compiler error:
cannot convert parameter 1 from 'const char *' to 'LPCWSTR'
How do I fix this, without changing the SomeFileCall prototype?
Most of the Windows APIs that take strings have two versions: one that takes char * and one that takes WCHAR * (the latter is equivalent to wchar_t *).
SetWindowText, for example, is actually a macro that expands to either SetWindowTextA (which takes char *) or SetWindowTextW (which takes WCHAR *).
In your project, it sounds like all of these macros are referencing the -W versions. This is controlled by the UNICODE preprocessor macro (which is defined if you choose the "Use Unicode Character Set" project option in Visual Studio). (Some of Microsoft's C and C++ run time library functions also have ANSI and wide versions. Which one you get is selected by the similarly-named _UNICODE macro that is also defined by that Visual Studio project setting.)
Typically, both of the -A and -W functions exist in the libraries and are available, even if your application is compiled for Unicode. (There are exceptions; some newer functions are available only in "wide" versions.)
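The mechanism can be imitated in a few lines of portable C. This is only an illustrative sketch: demo_length_a, demo_length_w and DEMO_UNICODE are made-up names standing in for a real -A/-W pair and the UNICODE macro, not anything from the Windows headers.

```c
#include <string.h>
#include <wchar.h>

/* Two real functions, one per character type (hypothetical names). */
size_t demo_length_a(const char *s)    { return strlen(s); }
size_t demo_length_w(const wchar_t *s) { return wcslen(s); }

/* The generic name is only a macro. Which function it expands to is
   decided at compile time; DEMO_UNICODE plays the role of UNICODE. */
#ifdef DEMO_UNICODE
#define demo_length demo_length_w
#else
#define demo_length demo_length_a
#endif
```

With DEMO_UNICODE defined, demo_length("abc") fails to compile for the same reason the call in the question does: the macro now names the wide function, which does not accept a const char *. Both demo_length_a and demo_length_w stay callable by their full names either way.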
If you have a char * that contains text in the proper ANSI code page, you can call the -A version explicitly (e.g., SetWindowTextA). The -A versions are typically wrappers that make wide character copies of the string parameters and pass control to the -W versions.
An alternative is to make your own wide character copies of the strings. You can do this with MultiByteToWideChar. Calling it can be tricky, because you have to manage the buffers. If you can get away with calling the -A version directly, that's generally simpler and already tested. But if your char * string is using UTF-8 or any encoding other than the user's current ANSI code page, you should do the conversion yourself.
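The buffer management usually follows a two-call pattern: ask for the required size first, allocate, then convert. Below is a hedged sketch of that pattern. widen_alloc is a hypothetical helper name, and the non-Windows branch is not part of the technique; it is only there so the sketch also builds on other systems, where it ignores the code page and uses the current C locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns a malloc'd wide copy of src, or NULL on
   failure. The caller frees it. On Windows, code_page is for example
   CP_UTF8 (65001) or CP_ACP (0). */
wchar_t *widen_alloc(const char *src, unsigned code_page)
{
#ifdef _WIN32
    wchar_t *dst;
    /* First call: no output buffer, so the return value is the number of
       wide characters needed, terminator included (length argument -1). */
    int n = MultiByteToWideChar(code_page, 0, src, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    dst = (wchar_t *)malloc((size_t)n * sizeof *dst);
    if (dst == NULL)
        return NULL;
    /* Second call: perform the conversion into the buffer. */
    if (MultiByteToWideChar(code_page, 0, src, -1, dst, n) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
#else
    /* Fallback for other systems: same two-pass idea with mbsrtowcs. */
    mbstate_t st;
    const char *p = src;
    wchar_t *dst;
    size_t n;

    (void)code_page;
    memset(&st, 0, sizeof st);
    n = mbsrtowcs(NULL, &p, 0, &st);         /* length without terminator */
    if (n == (size_t)-1)
        return NULL;
    dst = (wchar_t *)malloc((n + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    memset(&st, 0, sizeof st);
    p = src;
    if (mbsrtowcs(dst, &p, n + 1, &st) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
#endif
}
```

Inside SomeFileCall from the question, this could be used as: wchar_t *w = widen_alloc(filename, CP_UTF8); then pass w to SomeWindowsFileCall and free(w) afterwards. Passing MB_ERR_INVALID_CHARS as the flags argument instead of 0 makes the conversion fail on invalid input rather than substitute characters.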
Bonus Info
The -A suffix stands for "ANSI", which was the common Windows term for a single-byte code-page character set.
The -W suffix stands for "Wide" (meaning the encoding units are wider than a single byte). Specifically, Windows uses little-endian UTF-16 for wide strings. The MSDN documentation simply calls this "Unicode", which is a little bit of a misnomer.
Configure your project to use ANSI character set. (General -> Character Set)
What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR etc.
typedef const wchar_t* LPCWSTR;
Project Properties -> Advanced -> Character Set -> Use Multi-Byte Character Set. If you follow these steps, your problem is solved.
You are building with WinApi in Unicode mode, so all string parameters resolve to wide strings. The simplest fix would be to change the WinApi to ANSI, otherwise you need to create a wchar_t* with the contents from filename and use that as an argument.
I was able to solve this error by setting the Character Set to "Use Multi-Byte Character Set":
Project Properties -> Configuration Properties -> General -> Character Set -> "Use Multi-Byte Character Set"
Not sure what compiler you are using, but in Visual Studio you can specify the default character type, whether it be Unicode or multibyte. In your case it sounds as if Unicode is the default, so the simplest solution is to look for the switch on your particular compiler that determines the default character type. That would save you some work; otherwise you would end up adding code to convert back and forth from Unicode, which may add unnecessary overhead and could be an additional source of errors.

C basic datatype problem - const char * to LPCTSTR

#include "stdafx.h"
#include "string.h"
#include "windows.h"
bool SCS_GetAgentInfo(char name[32],char version[32], char description[256], const char * dwAppVersion)
{
strcpy(name,gName);
strcpy(version,gVersion);
strcpy(description,gDescription);
notify(dwAppVersion);
return true;
}
void notify(const char * msg)
{
MessageBox(NULL, TEXT(msg), NULL, NULL);
}
I have managed to work with the first three fields fine, but I am running into issues with the const char *. I have tried passing and casting in a lot of different ways, but can't get it to work. I googled around, but couldn't find much on Lmsg. I am new to a lot of this. I have read around and I think it may have to do with encoding. What really confuses me is that LPCTSTR is defined as a const char *, but straight typecasting doesn't give me anything from the field.
I get an error that Lmsg is undeclared which I am guessing means that the Macro expansion of TEXT is causing this. How can I get this working?
Doing MessageBox(NULL, (LPCTSTR)msg, NULL, NULL); instead gives me a bunch of boxes indicating it probably is referencing the wrong characters, but copying the dwAppsVersion parameter into the description shows the correct information.
The problem is that you're building your application to use the Unicode Win32 APIs, but you're passing around non-Unicode strings. You have two options:
convert the msg string to Unicode using something like MultiByteToWideChar(). This is probably the 'right' way to do it, if a bit more complex because you need to deal with codepages and managing the buffers used for the conversion.
you can force the ANSI version of the API to be used:
MessageBoxA(NULL, msg, NULL, NULL);
That's a simple workaround, if not elegant.
Other options include building the application to use only the Win32 ANSI APIs instead of the Unicode APIs, or changing the strings you pass around to LPTSTR and using the TEXT() or _T() macros for your literals. However, if you're reading non-Unicode data from files or elsewhere, then you still have to deal with the conversion at some point...
An LPCTSTR is an alias for const TCHAR *, and TCHAR is a type used in Windows programming to ease the transition between the ANSI (a locale-dependent 8-bit code page such as Windows-1252, which is very similar to the internationally standardized ISO 8859-1) and Unicode text encodings.
If your project is set up to build your app using ANSI, TCHAR is really char, and you would be able to pass msg to MessageBox without a cast.
If your app is set up to build using Unicode (which is what it sounds like), TCHAR is really wchar_t, and you would have to convert the string from ANSI to Unicode using a function like MultiByteToWideChar().
Simply casting just forces the compiler to interpret the type differently without changing the data; in this case, that's not enough because the actual data must be converted from one format to another.
It's hard to tell exactly what's going on in your question since you appear to have left some relevant context out. For example LPCTSTR isn't mentioned anywhere, so I can only guess at what you're talking about, or what "the first three fields" are.
One thing to note is that LPCTSTR is not always const char*. It is in an ANSI build, but it is const wchar_t* in a Unicode build. This is most likely the issue you're running into.
Also, the TEXT() macro is only for defining string constants. You can't use it to perform a conversion on a variable, which is why you're getting 'Lmsg undeclared'.
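The reason is plain token pasting, which can be reproduced without any Windows headers. DEMO_TEXT below is a made-up macro that mimics roughly what TEXT() does in a Unicode build; it is a sketch of the mechanism, not the actual definition from the SDK.

```c
#include <wchar.h>

/* Hypothetical stand-in for TEXT() in a Unicode build:
   the preprocessor glues an L onto whatever token it is given. */
#define DEMO_TEXT(quote) L##quote

/* Works: L##"hello" becomes the wide string literal L"hello". */
static const wchar_t *demo_title(void)
{
    return DEMO_TEXT("hello");
}

/* Does not compile if uncommented: DEMO_TEXT(msg) becomes the single
   identifier Lmsg, which was never declared.

   static const wchar_t *broken(const char *msg)
   {
       return DEMO_TEXT(msg);
   }
*/
```

Because the pasting happens before compilation, the macro never sees the contents of msg at all, so a run-time conversion function is the only way to widen a variable.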
If you aren't intentionally using a Unicode build, you may want to change your project settings to an ANSI build as a work-around. Otherwise, you may want to read a tutorial on working with Unicode, which you really should be familiar with if you are writing software for Windows these days.

Resources