Concatenating a string using Win32 API - c

What's the best way to concatenate a string using Win32? If Understand correctly, the normal C approach would be to use strcat, but since Win32 now deals with Unicode strings (aka LPWSTR), I can't think of a way for strcat to work with this.
Is there a function for this, or should I just write my own?

lstrcat comes in ANSI and Unicode variants. Actually lstrcat is simply a macro defined as either lstrcatA or lstrcatW.
These functions are available by importing kernel32.dll. Useful if you're trying to completely avoid the C runtime library. In most cases you can just use wcscat or _tcscat as roy commented.
Also consider the strsafe.h functions, such as StringCchCat These come in ANSI and Unicode variants as well, but they help protect against buffer overflow.

Related

When should Win32/WinAPI types be used vs. Standard C types?

I'm late to the Win32 party and there's a sea of functions such as _tprintf, TEXT(), there are also libraries like strsafe.h which have such functions like StringCchCopy(), StringCchLength(), etc.. Basically, Win32 API introduces a bunch of extra functions and types on top of C which can be confusing to a C programmer who hasn't worked with Win32 much. I do not have a problem finding the definitions of these types and functions on MSDN. However, I do have a problem finding guidelines on when and why they should be used.
I have 2 questions:
How important is it to use all of these types and special functions which Microsoft has provided on top of standard C when programming with Win32? Is it considered good practice to do away with all standard C functions and types and use entirely Microsoft wrappers?
Is it okay to mix standard C functions in with these Microsoft types and functions? For example, to use malloc() instead of HeapAlloc(), or to use printf() instead of _tprintf() and etc...?
I have a copy of Charles Petzold's Programming Windows Fifth Edition book but it mostly covers GUI stuff and not a lot of the remainder of the API.
There are actually 3 questions here, the ones you explicitly asked, and the one you didn't. Let's get that last one out of the way first, as it tends to cause the most confusion:
What are those _t-extensions offered by the Microsoft-provided CRT?
They are generic-text mappings, introduced to make it possible to write code that targets both ANSI-based systems (Win9x) as well as Unicode-based systems (Windows NT). They are macros that expand to the actual function calls, based on the _UNICODE and _MBCS preprocessor symbols. For example, the symbol _tprintf expands to either printf or wprintf.
Likewise, the Windows API provides both ANSI and Unicode versions of the API calls. They, too, are preprocessor macros that expand to the actual API call, depending on the preprocessor symbol UNICODE. For example, the CreateFile symbol expands to CreateFileA or CreateFileW.
Generic-text mappings haven't been useful in the past two decades. Today, simply use the Unicode versions of the CRT and API calls (e.g. wprintf and CreateFileW). You can define _UNICODE and UNICODE for good measure, too, so that you don't accidentally call an ANSI version.
there are also libraries like strsafe.h which have such functions like StringCchCopy(), StringCchLength()
Those are safe variants of the CRT string manipulation calls. They are safer than e.g. strcpy by providing the buffer size of the destination, similar to strncpy. The latter, however, suffers from an awkward design decision, that causes the destination buffer to not get zero-terminated, in case the source won't fit. StringCchCopy will always zero-terminate the destination buffer, and thus provides additional safety over the CRT implementations. (Note: C11 introduces safe variants, e.g. strncpy_s, that will always zero-terminate the destination array, in case the input is valid. They also validate the input, calling the currently installed constraint handler when validation fails, thus providing even stronger safety than the strsafe.h implementations. The bounds-checked implementations are a conditional feature of C11.)
How important is it to use all of these types and special functions which Microsoft has provided on top of standard C when programming with Win32? Is it considered good practice to do away with all standard C functions and types and use entirely Microsoft wrappers?
It is not important at all. You can use whichever is more suitable in your scenario. If in doubt, writing portable (i.e. Standard C) code is generally preferable. You only ever want to call the Windows API calls, if you need the additional control they offer (e.g. HeapAlloc allows more control over the allocation than malloc does; likewise CreateFile provides more options than fopen).
Is it okay to mix standard C functions in with these Microsoft types and functions? For example, to use malloc() instead of HeapAlloc(), or to use printf() instead of _tprintf() and etc...?
In general, yes, as long as you match those calls: HeapFree what you HeapAlloc, free what you malloc. You must not mix HeapAlloc and free, for example. In case a Windows API call requires special memory management functions to be used, it is explicitly pointed out in the documentation. For example, if FormatMessage is requested to allocate the buffer to return data, it must be freed using LocalFree. If you do not request the API to allocate a buffer, you can pass in a buffer allocated any way you like (malloc, HeapAlloc, IMalloc::Alloc, etc.).
It is possible to create programs on Windows without using any standard C library functions but most programs do and then you might as well use malloc over HeapAlloc. malloc will use HeapAlloc or VirtualAlloc internally but it is probably tuned for better performance/less fragmentation compared to the raw API. It also makes it easier to port to POSIX in the future. You will still be forced to use LocalFree/GlobalFree/HeapFree in some places where the API allocates memory for you.
Handling text needs special consideration and you need to decide if you need Unicode support or not. A stroll down memory lane might shed some light on why things are the way they are.
Back when Windows 95/98 was king you could use the char/CHAR narrow string types with both the C standard functions and the Windows API. There was virtually no Unicode support except for a handful of functions.
On Windows NT4/2000 however the native string type is WCHAR (UTF-16 LE but Microsoft just calls it Unicode). If you are using Microsoft Visual C++ then you have access to wide string versions of the C standard libray beyond what the C standard actually requires to ease coding for this platform. When coding for Windows using the Microsoft toolchain you can assume that the Windows SDK WCHAR type is the same as the wchar_t type defined by C.
Because the development of 95 and NT4 overlapped they share the same API and every function that receives/returns a string has two versions, one with a A suffix ("ANSI") and one with a W suffix. On Windows 95 the W functions are just stubs that return failure.
When you include Windows.h it will create defines like #define CreateProcess CreateProcessW if UNICODE is defined or #define CreateProcess CreateProcessA if not.
Visual C++ does the same thing with the tchar.h header. It uses the _UNICODE define to decide if the TCHAR type and the _t* functions use the char or wchar_t type. This meant that you could create two releases from the same source code, one for Windows 95/98/ME and one with full Unicode support.
This is not that relevant anymore but you still need to make a choice because things will be defined for one or the other.
It is still perfectly valid to do
#define UNICODE
#define _UNICODE
#include <windows.h>
#include <tchar.h>
void foo()
{
TCHAR buf[100];
SomeWindowsFunction(buf, 100);
_tprintf(_T("foo: %s\n"), buf);
}
although you will see many people go straight for WCHAR and wprintf these days.
The StrSafe functions were added to make it easier to write bug free code, they still have the same A/W duplication.
You cannot mix and match WCHAR with printf, even if you use %ls in the format string the string will be converted internally and not all Unicode strings will convert correctly.
If POSIX portability is not a requirement then I suggest that you use the wide function extensions provided by Microsoft when you need a C library function.
Note that different versions of the OS use different definitions of base types and use different alignments/padding. Remember 8086, 386 and now Core i7(16, 32, 64 bits).
When structs need to be compatible with earlier versions, they typically use pre-defined integer widths and pad for legacy alignment.
For that reason, the types from the API must be used in API calls.
Also in memory there used and maybe are different memory models of process memory and shared memory. For example the clipboard uses a form of shared memory. It is important to use the memory allocation mechanisms Microsoft advices here for API calls.
For everything non-API I use the standard C functions and types.

C Language. How to use a string value as delimiter in SSCANF

Is there a way to use a string as a delimiter?
We can use characters as delimiters using sscanf();
Example
I have
char url[]="username=jack&pwd=jack123&email=jack#example.com"
i can use.
char username[100],pwd[100],email[100];
sscanf(url, "username=%[^&]&pwd=%[^&]&email=%[^\n]", username,pwd,email);
it works fine for this string. but for
url="username=jack&jill&pwd=jack&123&email=jack#example.com"
it cant be used...its to remove SQL injection...but i want learn a trick to use
&pwd,&email as delimiters..not necessarily with sscanf.
Update: Solution doesnt necessarily need to be in C language. I only want to know of a way to use string as a delimiter
Just code your own parsing. In many cases, representing in memory the AST you have parsed is useful. But do specify and document your input language (perhaps using EBNF notation).
Your input language (which you have not defined in your question) seems to be similar to the MIME type application/x-www-form-urlencoded used in HTTP POST requests. So you might look, at least for inspiration, into the source code of free software libraries related to HTTP server processing (like libonion) and HTTP client processing (like libcurl).
You could read an entire line with getline (or perhaps fgets) then parse it appropriately. sscanf with %n, or strtok might be useful, but you can also parse the line "manually" (consider using e.g. your recursive descent parser). You might use strchr or strstr also.
BTW, in many cases, using common textual representations like JSON, YAML, XML can be helpful, and you can easily find many libraries to handle them.
Notice also that strings can be processed as FILE* by using fmemopen and/or open_memstream.
You could use parser generators such as bison (with flex).
In some cases, regular expressions could be useful. See regcomp and friends.
So what you want to achieve is quite easy to do and standard practice. But you need more that just sscanf and you may want to combine several things.
Many external libraries (e.g. glib from GTK) provide some parsing. And you should care about UTF-8 (today, you have UTF-8 everywhere).
On Linux, if permitted to do so, you might use GNU readline instead of getline when you want interactive input (with editing abilities and autocompletion). Then take inspiration from the source code of GNU bash (or of RefPerSys, if interested by C++).
If you are unfamiliar with usual parsing techniques, read a good book such as the Dragon Book. Most large programs deal somewhere with parsing, so you need to know how that can be done.

Portable way to use the regex(3) functions on a wide char string in C

There are functions like regwcomp(3) etc. on some systems, but this does not seem to be a portable solution at the moment. When there is a wchar_t string, what is the suggested portable solution (not Linux or GNU specific) to use the regex(3) functions (which normally work with char strings only)? In my case it is not really necessary that the pattern or text to match is non-7-bit ASCII, the problem is that the code used wchar_t for other reasons.
If anyone else has this problem, feel free to borrow the functions my_regwcomp and my_regwexec that I had to write recently. You can find them in this source file in the ProofPower system. These functions simulate the regwcomp and regwexec functions of Free BSD using the POSIX regcomp and regexec functions.
PS: my code is part of a Motif application, if you replace XtMalloc, XtRealloc and XtFree by malloc, ralloc and free it should work in any standard C/C++ development framework. Please add a comment to this answer if you need any help getting my functions working in your environment.

Macro alternative to _T("char") or _TEXT("char")

I know that the _T(x) macro converts a string literal to a unicode/multibyte
string based on a define, however I find it very annoying that I must make a
underscore and the parenthesis, it really confuses me, I'm not quiet fluent with
macros so I don't know, is there a way to detect all string literals and convert
them to a proper unicode/multibyte string?
No, there isn't a way to avoid the macro completely if you want your code to be portable on Windows. You can of course define your own macro like #define t(x) whatever_T_does if you want to save yourself some keystrokes, but this will probably anger future maintainers of your code.
_T() and _TEXT() are C runtime macros, not Win32 macros. TEXT() (no underscore) is the Win32 macro. Even though they essentially do the same thing, you should use C runtime macros only with C functions, and Win32 macros with Win32 functions. Don't mix them.
Do you really need the ability to compile for both multibyte and Unicode? You don't need any macro if you want multibyte. In a Unicode app it is easier to use L"literal string", which does not need the underscore or the parentheses.

A pure bytes version of strstr?

Is there a version of strstr that works over a fixed length of memory that may include null characters?
I could phrase my question like this:
strncpy is to memcpy as strstr is to ?
memmem, unfortunately it's GNU-specific rather than standard C. However, it's open-source so you can copy the code (if the license is amenable to you).
Not in the standard library (which is not that large, so take a look). However writing your own is trivial, either directly byte by byte or using memchr() followed by memcmp() iteratively.
In the standard library, no. However, a quick google search for "safe c string library" turns up several potentially useful results. Without knowing more about the task you are trying to perform, I cannot recommend any particular third-party implementation.
If this is the only "safe" function that you need beyond the standard functions, then it may be best to roll your own rather than expend the effort of integrating a third-party library, provided you are confident that you can do so without introducing additional bugs.

Resources