wcscpy Does Not Accept TCHAR in Destination Variable

[VS10] The aim is to copy the drive literal string into the *.dst thus
TCHAR *driveIDBase;
...
wcscpy_s (driveIDBase, MAX_PATH-3, L"\\\\?\\C:\\*");
This produces the error
IntelliSense: no instance of overloaded function "wcscpy_s" matches
the argument list
Note that the ANSI version works well enough:
strcpy_s (driveIDBase, MAX_PATH-3, "C:\\*");
Supposing we try the obvious workaround:
strcpy_s (driveIDBase, MAX_PATH-3, "\\\\?\\C:\\");
can we call the cast (wchar_t *) driveIDBase reliable? That is, WIN32_FIND_DATAW will interpret that string as "C:\"?
Also what is meant by this quote from MSDN?
The "\\?\" prefix turns off automatic expansion of the path string,

It's worth noting the Stack Overflow definition of TCHAR:
A #define for either char or wchar_t, used for porting ancient windows
applications.
The code being assembled is not an ancient port, albeit the initial reason for including it in the current project was that it was recommended (in older threads) for the purposes of conversion in certain API functions.
Due to the phase out of MBCS, building a project these days is preferable in Unicode as Bo suggests above, which renders the usage of the TCHAR macro redundant.
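A minimal sketch of the Unicode-only approach, assuming the destination is declared as wchar_t rather than TCHAR. The helper name copy_drive_pattern is hypothetical, and wmemcpy with an explicit size check stands in for wcscpy_s so that the snippet also builds outside MSVC.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy the wide drive pattern into dst.
 * dst_count is the capacity of dst in wchar_t units (not bytes).
 * Returns 0 on success, -1 if dst is NULL or too small. */
static int copy_drive_pattern(wchar_t *dst, size_t dst_count)
{
    static const wchar_t pattern[] = L"\\\\?\\C:\\*";
    /* Element count including the terminating L'\0'. */
    const size_t needed = sizeof pattern / sizeof pattern[0];

    if (dst == NULL || dst_count < needed)
        return -1;
    wmemcpy(dst, pattern, needed);
    return 0;
}
```

With a buffer declared as wchar_t driveIDBase[MAX_PATH], the call copy_drive_pattern(driveIDBase, MAX_PATH) leaves a genuine wide string in the buffer, so no cast is needed when it is passed on to a wide API.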
As for the second part of the question, suppose a wide char directory is created thus:
%USERPROFILE%\This Is A SubDirectory of %USERPROFILE% Not C-Colon-Backslash-Users-Backslash-MyUserName-- Being the Expanded Directory Path We Intended to Use
We note that under \\?\ the given subdirectory would not be created in "C:\Users\MyUserName". In fact it could not in most cases, as C:\Users would never have been created with a \\?\ prefix in the first instance.
Concluding this part of the answer with another question: Regarding another statement from the same page in MSDN which states:
The maximum path of 32,767 characters is approximate, because the
"\\?\" prefix may be expanded to a longer string by the system at run
time, and this expansion applies to the total length,
is the expansion at run time not automatic?


Format overflow warning when trying to store a wide string

I'm currently learning C and lately, I have been focusing on the topic of character encoding. Note that I'm a Windows programmer. While I currently test my code only on Windows, I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices right now.
In the example below, I store a file path in a wchar_t variable to be opened later on with _wfopen. I need to use _wfopen because my file path may contain chars not in my default codepage. Afterwards, the file path and a text literal is stored inside a char variable named message for further use. My understanding is that you can store a wide string into a multibyte string with the %ls modifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]|
note: assuming directive output of 16382 bytes|
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094|
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the Accessing: string literal, the file_path variable, the \n char and the \0 char. What else is there to output?
Sure, I could declare message as a wchar_t variable and use wsprintf instead of sprintf, but my understanding is that wchar_t does not make up for nice portable code. As such, I'm trying to avoid using it unless it's required by a specific API.
So, what am I missing?
The warning doesn't take into account the actual contents of file_path; it is calculated on the assumption that file_path could hold any possible content. There would be an overflow if file_path consisted of 4095 emoji and a null terminator.
Using %ls with the narrow printf family converts the wide string to multibyte characters, and each wide character can take several bytes.
To avoid this warning you could:
disable it with -Wno-format-overflow
use snprintf instead of sprintf
The latter is always a good idea IMHO, it is always good to have a second line of defence against mistakes introduced in code maintenance later (e.g. someone comes along and changes the code to grab a path from user input instead of hardcoded value).
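A minimal sketch of the snprintf option, assuming a C99-conforming snprintf (the caveat about MSVCRT-backed MinGW builds still applies). The helper names format_access_message and message_fits are hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format "Accessing: <path>\n" into dst without ever
 * writing more than dst_size bytes. Returns snprintf's result: the length
 * the full message needs (excluding the terminator), or a negative value
 * if the wide string cannot be converted in the current locale. */
static int format_access_message(char *dst, size_t dst_size,
                                 const wchar_t *file_path)
{
    return snprintf(dst, dst_size, "Accessing: %ls\n", file_path);
}

/* Returns 1 when the message fit completely, 0 on truncation or error. */
static int message_fits(int result, size_t dst_size)
{
    return result >= 0 && (size_t)result < dst_size;
}
```

Because the destination size is passed explicitly, the compiler no longer has to assume the worst case for file_path, and a path that is too long is truncated instead of overflowing message.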
Afterword: be very careful when using wide characters with the printf family in MinGW, which implements the printf family by calling MSVCRT, and MSVCRT does not follow the C Standard.
To get closer to standard behaviour, use a build of MinGW-w64 which attempts to implement stdio library functions itself, instead of deferring to MSVCRT. (E.g. MSYS2 build).

Splint: substitute non-standard type `bit` with `unsigned char`

(This is an extension to my previous question). I'm using Splint in Windows CLI.
The XC8 embedded C compiler has a custom type bit. To get Splint to parse, I can pass to it the CLI option:
-Dbit=char
However I need it to replace bit with unsigned char. The space character is a problem. How can I modify the above flag?
It is the shell, not splint, which processes the quotes and escapes in command-line arguments. Any result where the shell ends up treating the whole string -Dbit=unsigned char as a single argument suffices, e.g., put quotes around the whole thing.
(edit: Actually, on Windows it may in some cases be something other than the shell that processes the quotes and escapes, but nevertheless putting double quotes around the whole thing should work.)
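To illustrate what the quoted option has to achieve: a -D option of the form -Dname=value generally behaves like a #define at the top of the translation unit, so everything after the equals sign, space included, must arrive as one argument. A small C sketch of the resulting definition follows; the helper toggle_bit is hypothetical and only there to show the substituted type in use.

```c
/* Equivalent of passing "-Dbit=unsigned char" as a single argument. */
#ifndef bit
#define bit unsigned char
#endif

/* Hypothetical helper using the substituted type. */
static bit toggle_bit(bit b)
{
    return (bit)!b;
}
```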

what does the `TEXT` around the format string mean in "printf"

The following prints the percentage of memory used.
printf (TEXT("There is %*ld percent of memory in use.\n"),
WIDTH, statex.dwMemoryLoad);
WIDTH is defined to be equal to 7.
What does TEXT mean, and where is this sort of syntax defined in printf?
As others already said, TEXT is probably a macro.
To see what they become, simply look at the preprocessor output. If you are using gcc:
gcc -E file.c
Just guessing but TEXT is a char* to char* function that takes care of translating a text string for internationalization support.
Note that if this is the case, you may also be required to always use TEXT with a string literal (and not with expressions or variables), so that an external tool can detect all literals that need translation by a simple scan of the source code. For example, you should probably never write:
puts(TEXT(flag ? "Yes" : "No"));
and you should write instead
puts(flag ? TEXT("Yes") : TEXT("No"));
Something that is standard, but not used very often, is the parametric width of a field: for example, in printf("%*i", x, y) the first parameter x is the width used to print the second parameter y as a decimal value.
When used with scanf instead the * special char can be used to specify that you don't want to store the field (i.e. to "skip" it instead of reading it).
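Both uses of * can be sketched in a few lines of standard C. The helper names format_in_field and read_second_int are hypothetical.

```c
#include <stdio.h>

/* Right-align value in a field of `width` characters, like the "%*ld" in
 * the question. Returns snprintf's result. */
static int format_in_field(char *dst, size_t dst_size, int width, long value)
{
    return snprintf(dst, dst_size, "%*ld", width, value);
}

/* Parse "first second" but keep only the second number: "%*d" reads an
 * int and discards it. Returns the number of stored fields (1 on success). */
static int read_second_int(const char *text, int *second)
{
    return sscanf(text, "%*d %d", second);
}
```

With a width of 7, as in the question, the value 42 is printed as five spaces followed by 42.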
TEXT() is probably a macro or function which returns a string value. I think it is user defined and does some manner of formatting on that string which is passed as an argument to the TEXT function. You should go to the function declaration for TEXT() to see what exactly it does.
TEXT() is a unicode support macro defined in winnt.h. If UNICODE is defined then it prepends L to the string making it wide.
Also see TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE blog post.
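A rough sketch of how such a macro can be built, using hypothetical stand-ins (my_tchar, MY_TEXT) instead of the real Windows definitions so that it compiles anywhere. It is an illustration of the mechanism, not a copy of winnt.h.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR and TEXT; the real definitions live in
 * the Windows headers. */
#ifdef UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT_(quote) L##quote
#else
typedef char my_tchar;
#define MY_TEXT_(quote) quote
#endif
/* Extra level so that a macro argument is expanded before L is pasted on. */
#define MY_TEXT(quote) MY_TEXT_(quote)

/* Size in bytes of one character of a MY_TEXT literal. */
static size_t text_char_size(void)
{
    return sizeof(MY_TEXT("x")[0]);
}
```

Since the wide variant works by pasting L onto the first token of its argument, a macro built this way has to wrap a string literal directly; wrapping an expression such as flag ? "Yes" : "No" would paste L onto flag instead.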
_TEXT() or _T() is a Microsoft-specific macro.
This MSDN link says
To simplify code development for various international markets,
the Microsoft run-time library provides Microsoft-specific "generic-text" mappings for many data types, routines, and other objects.
These mappings are defined in TCHAR.H.
You can use these name mappings to write generic code that can be compiled for any of the three kinds of character sets:
ASCII (SBCS), MBCS, or Unicode, depending on a manifest constant you define using a #define statement.
Generic-text mappings are Microsoft extensions that are not ANSI compatible.
_TEXT is a macro that makes strings "character set neutral".
For example _T("HELLO");
Characters can either be denoted by 8 bit ANSI standards or the 16 bit Unicode notation.
If you define _TEXT for all strings and define a preprocessor symbol "_UNICODE", all such strings will follow UNICODE encoding. If you don’t define _UNICODE, the strings will all be ANSI.
Hence the macro _TEXT allows you to have all strings as UNICODE or ANSI.
So no need to change every time you change your character set.

What is the difference between "__signed" "__signed__" and "signed"?

I am studying "include/asm-x86/types.h" and I'm a little confused about the meaning of __signed__.
When I Google this keyword, I cannot get any useful information. Why not just use signed instead of __signed__? Does it have a special meaning?
That is used for backwards compatibility: when older compilers didn't recognize the signed keyword, such alternatives were used.
The difference between __signed__ and signed is to do with namespaces. The signed names are only available in the __KERNEL__ and not outside.
As stated at the top of the header file you mention:
/*
* __xx is ok: it doesn't pollute the POSIX namespace. Use these in the
* header files exported to user space
*/
For the signed names without underscores it states this:
/*
* These aren't exported outside the kernel to avoid name space clashes
*/
__signed__ is also used when compiling with gcc -traditional, where the keyword signed is not recognized.
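A short sketch of the alternate spelling in use, assuming GCC or a compatible compiler such as Clang. The type names demo_s8 and demo_u8 are hypothetical, modeled on the __s8 and __u8 typedefs in the kernel header.

```c
/* __signed__ is a GCC alternate keyword (also accepted by Clang); it means
 * the same as signed but lives in the reserved double-underscore namespace. */
typedef __signed__ char demo_s8;   /* hypothetical, modeled on __s8 */
typedef unsigned char demo_u8;     /* hypothetical, modeled on __u8 */

/* Reinterpret a raw byte as a signed 8-bit value. Converting an
 * out-of-range value is implementation-defined; GCC wraps modulo 256. */
static int as_signed_byte(demo_u8 raw)
{
    return (demo_s8)raw;
}
```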
Don't google using c __signed__, because special characters like __ are skipped in the search. Do a literal search using c "__signed__" instead and you will get useful information.

cannot convert parameter 1 from 'const char *' to 'LPCWSTR'

Basically I have some simple code that does some things for files and I'm trying to port it to windows. I have something that looks like this:
int SomeFileCall(const char * filename){
#ifndef __unix__
SomeWindowsFileCall(filename);
#endif
#ifdef __unix__
/**** Some unix only stat code here! ****/
#endif
}
the line SomeWindowsFileCall(filename); causes the compiler error:
cannot convert parameter 1 from 'const char *' to 'LPCWSTR'
How do I fix this, without changing the SomeFileCall prototype?
Most of the Windows APIs that take strings have two versions: one that takes char * and one that takes WCHAR * (the latter is equivalent to wchar_t *).
SetWindowText, for example, is actually a macro that expands to either SetWindowTextA (which takes char *) or SetWindowTextW (which takes WCHAR *).
In your project, it sounds like all of these macros are referencing the -W versions. This is controlled by the UNICODE preprocessor macro (which is defined if you choose the "Use Unicode Character Set" project option in Visual Studio). (Some of Microsoft's C and C++ run time library functions also have ANSI and wide versions. Which one you get is selected by the similarly-named _UNICODE macro that is also defined by that Visual Studio project setting.)
Typically, both of the -A and -W functions exist in the libraries and are available, even if your application is compiled for Unicode. (There are exceptions; some newer functions are available only in "wide" versions.)
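The selection mechanism can be imitated in portable C. In this sketch the demo_set_text names are hypothetical stand-ins for the SetWindowTextA / SetWindowTextW pair, and each function merely reports the length of its argument.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical pair mimicking the SetWindowTextA / SetWindowTextW split.
 * Each just reports the length of its argument in characters. */
static size_t demo_set_text_a(const char *text)    { return strlen(text); }
static size_t demo_set_text_w(const wchar_t *text) { return wcslen(text); }

/* The generic name is only a macro; UNICODE decides which one it means. */
#ifdef UNICODE
#define demo_set_text demo_set_text_w
#else
#define demo_set_text demo_set_text_a
#endif
```

Both suffixed functions remain callable whatever the macro resolves to, which is why passing a const char * to the generic name fails in a Unicode build while calling the -A variant explicitly still works.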
If you have a char * that contains text in the proper ANSI code page, you can call the -A version explicitly (e.g., SetWindowTextA). The -A versions are typically wrappers that make wide character copies of the string parameters and pass control to the -W versions.
An alternative is to make your own wide character copies of the strings. You can do this with MultiByteToWideChar. Calling it can be tricky, because you have to manage the buffers. If you can get away with calling the -A version directly, that's generally simpler and already tested. But if your char * string is using UTF-8 or any encoding other than the user's current ANSI code page, you should do the conversion yourself.
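A minimal sketch of the do-it-yourself conversion, assuming the input is UTF-8. The helper name widen_utf8 is hypothetical. The Windows branch uses the usual two-call pattern for MultiByteToWideChar (first ask for the size, then convert); the other branch is only a fallback so the snippet builds elsewhere, and it converts using the current C locale rather than UTF-8 specifically.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: return a malloc'd wide copy of src, or NULL on
 * failure. The caller frees the result. */
static wchar_t *widen_utf8(const char *src)
{
#ifdef _WIN32
    /* First call: ask how many wide characters are needed, including the
     * terminator (a length of -1 means src is null-terminated). */
    int count = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
    wchar_t *dst;
    if (count <= 0)
        return NULL;
    dst = malloc((size_t)count * sizeof *dst);
    /* Second call: perform the conversion into the allocated buffer. */
    if (dst != NULL &&
        MultiByteToWideChar(CP_UTF8, 0, src, -1, dst, count) == 0) {
        free(dst);
        dst = NULL;
    }
    return dst;
#else
    /* Fallback (assumption): POSIX mbstowcs with a NULL destination
     * returns the required length without converting. */
    size_t len = mbstowcs(NULL, src, 0);
    wchar_t *dst;
    if (len == (size_t)-1)
        return NULL;
    dst = malloc((len + 1) * sizeof *dst);
    if (dst != NULL)
        mbstowcs(dst, src, len + 1);
    return dst;
#endif
}
```

The result can then be passed to the -W variant of the API in question, without changing the const char * prototype of the calling function.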
Bonus Info
The -A suffix stands for "ANSI", which was the common Windows term for a single-byte code-page character set.
The -W suffix stands for "Wide" (meaning the encoding units are wider than a single byte). Specifically, Windows uses little-endian UTF-16 for wide strings. The MSDN documentation simply calls this "Unicode", which is a little bit of a misnomer.
Configure your project to use ANSI character set. (General -> Character Set)
What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR etc.
typedef const wchar_t* LPCWSTR;
{project properties->advanced->character set->use multi byte character set} If you follow these steps, your problem is solved.
You are building with WinApi in Unicode mode, so all string parameters resolve to wide strings. The simplest fix would be to change the WinApi to ANSI, otherwise you need to create a wchar_t* with the contents from filename and use that as an argument.
I was able to solve this error by setting the Character Set to "Use Multi-Byte Character Set":
[Project Properties -> Configuration Properties -> General -> Character Set -> "Use Multi-Byte Character Set"]
Not sure what compiler you are using, but in Visual Studio you can specify the default character type, either Unicode or multi-byte. In your case it sounds as if Unicode is the default, so the simplest solution is to look for the switch on your particular compiler that determines the default character type. That would save you some work; otherwise you would end up adding code to convert back and forth from Unicode, which may add unnecessary overhead and could be an additional source of error.
