String safe functions vs security enhanced CRT - c

Are the StringCch* functions considered safer than the safe versions of the CRT string functions?
Is StringCchCatW safer than wcscat_s? Or StringCchCopyW vs wcscpy_s?

This is tagged C++, so you should not use either of them, but std::wstring instead.

It really depends on how you are dealing with strings in your application.
With the Windows SDK many people end up mixing a lot of C runtime and Windows API calls. Because the Windows API is a C API layered on top of the C runtime, it is difficult for Windows programmers to know they are doing this, and why it is (potentially) wrong.
But, basically, at some point in your application development, you really should choose where you are going to get your basic data types from. This matters especially for string data, as the localization APIs that affect sorting, collation and so on are quite different.
You can choose to use the C runtime's string support. This involves using strings that are typed as "const char*" or "const wchar_t*". Microsoft's C runtime specifically (it is possible to develop Windows applications with GCC, so this distinction can be important if you want to write code that compiles on multiple compilers) offers a set of functions in <tchar.h>, based on storing text data in _TCHARs, and using the _MBCS and _UNICODE macros to switch the functions between single-byte, multi-byte and wide-character versions.
If you choose to use the C runtime's string abstractions, then using the C runtime's "safe" string routines would make the most sense.
Alternatively, Windows GUI applications frequently choose to use the Windows SDK data types: CHAR, INT, WCHAR, TCHAR, DWORD etc. Strings would usually be represented by LPCTSTR variables, and in this case, continuing to use the Windows abstractions (the StringCch* functions etc.) makes the most sense.
Really, the worst thing you can do in a program is keep on bouncing between using windows API calls, and c-runtime calls, as they have different rules sometimes for how they deal with edge cases... and this can lead to some very hard to spot bugs.

Related

When should Win32/WinAPI types be used vs. Standard C types?

I'm late to the Win32 party and there's a sea of functions such as _tprintf, TEXT(), there are also libraries like strsafe.h which have such functions like StringCchCopy(), StringCchLength(), etc. Basically, the Win32 API introduces a bunch of extra functions and types on top of C which can be confusing to a C programmer who hasn't worked with Win32 much. I do not have a problem finding the definitions of these types and functions on MSDN. However, I do have a problem finding guidelines on when and why they should be used.
I have 2 questions:
How important is it to use all of these types and special functions which Microsoft has provided on top of standard C when programming with Win32? Is it considered good practice to do away with all standard C functions and types and use entirely Microsoft wrappers?
Is it okay to mix standard C functions in with these Microsoft types and functions? For example, to use malloc() instead of HeapAlloc(), or to use printf() instead of _tprintf() and etc...?
I have a copy of Charles Petzold's Programming Windows Fifth Edition book but it mostly covers GUI stuff and not a lot of the remainder of the API.
There are actually 3 questions here, the ones you explicitly asked, and the one you didn't. Let's get that last one out of the way first, as it tends to cause the most confusion:
What are those _t-extensions offered by the Microsoft-provided CRT?
They are generic-text mappings, introduced to make it possible to write code that targets both ANSI-based systems (Win9x) as well as Unicode-based systems (Windows NT). They are macros that expand to the actual function calls, based on the _UNICODE and _MBCS preprocessor symbols. For example, the symbol _tprintf expands to either printf or wprintf.
Likewise, the Windows API provides both ANSI and Unicode versions of the API calls. They, too, are preprocessor macros that expand to the actual API call, depending on the preprocessor symbol UNICODE. For example, the CreateFile symbol expands to CreateFileA or CreateFileW.
Generic-text mappings haven't been useful in the past two decades. Today, simply use the Unicode versions of the CRT and API calls (e.g. wprintf and CreateFileW). You can define _UNICODE and UNICODE for good measure, too, so that you don't accidentally call an ANSI version.
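To make the mechanism concrete, here is a minimal sketch of how such a mapping can be built with the preprocessor. The names MY_UNICODE, my_tchar, MY_T and my_tcslen are hypothetical stand-ins chosen for illustration; they are not the real <tchar.h> definitions, which are considerably more elaborate.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic-text machinery (not the real header). */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)   L##s        /* turns "x" into the wide literal L"x" */
#define my_tcslen wcslen
#else
typedef char my_tchar;
#define MY_T(s)   s           /* leaves the narrow literal unchanged */
#define my_tcslen strlen
#endif

/* Written once against the generic names; compiles as narrow or wide code
 * depending on whether MY_UNICODE is defined before this point. */
static size_t greeting_length(void)
{
    const my_tchar *greeting = MY_T("hello");
    return my_tcslen(greeting);
}
```

Compiling the same file with and without -DMY_UNICODE yields two different binaries from one source, which is the whole point of the scheme (and also why it stopped being useful once only one of the two targets mattered).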
there are also libraries like strsafe.h which have such functions like StringCchCopy(), StringCchLength()
Those are safe variants of the CRT string manipulation calls. They are safer than e.g. strcpy by providing the buffer size of the destination, similar to strncpy. The latter, however, suffers from an awkward design decision, that causes the destination buffer to not get zero-terminated, in case the source won't fit. StringCchCopy will always zero-terminate the destination buffer, and thus provides additional safety over the CRT implementations. (Note: C11 introduces safe variants, e.g. strncpy_s, that will always zero-terminate the destination array, in case the input is valid. They also validate the input, calling the currently installed constraint handler when validation fails, thus providing even stronger safety than the strsafe.h implementations. The bounds-checked implementations are a conditional feature of C11.)
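The difference in termination behavior can be shown in portable C. The sketch below uses a hypothetical helper, copy_terminated, that mirrors the truncate-and-terminate behavior described above; it is not the strsafe.h implementation, and is_terminated is likewise only a helper for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the strsafe.h implementation): copies src into
 * dst, truncating if necessary, and always zero-terminates.
 * cap is the size of dst in characters, including the terminator.
 * Returns 0 if the whole string fit, -1 if it was truncated.
 */
static int copy_terminated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t len = strlen(src);
    size_t n = (len < cap) ? len : cap - 1;   /* leave room for '\0' */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (len < cap) ? 0 : -1;
}

/* Returns 1 if buf contains a '\0' within its first cap bytes, else 0. */
static int is_terminated(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}
```

After strncpy(a, "abcdef", 4) into a 4-byte array, is_terminated(a, 4) reports 0, because all four bytes hold characters of the source. copy_terminated with the same arguments stores "abc" plus the terminator and reports the truncation through its return value.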
How important is it to use all of these types and special functions which Microsoft has provided on top of standard C when programming with Win32? Is it considered good practice to do away with all standard C functions and types and use entirely Microsoft wrappers?
It is not important at all. You can use whichever is more suitable in your scenario. If in doubt, writing portable (i.e. Standard C) code is generally preferable. You only ever want to call the Windows API calls, if you need the additional control they offer (e.g. HeapAlloc allows more control over the allocation than malloc does; likewise CreateFile provides more options than fopen).
Is it okay to mix standard C functions in with these Microsoft types and functions? For example, to use malloc() instead of HeapAlloc(), or to use printf() instead of _tprintf() and etc...?
In general, yes, as long as you match those calls: HeapFree what you HeapAlloc, free what you malloc. You must not mix HeapAlloc and free, for example. In case a Windows API call requires special memory management functions to be used, it is explicitly pointed out in the documentation. For example, if FormatMessage is requested to allocate the buffer to return data, it must be freed using LocalFree. If you do not request the API to allocate a buffer, you can pass in a buffer allocated any way you like (malloc, HeapAlloc, IMalloc::Alloc, etc.).
It is possible to create programs on Windows without using any standard C library functions but most programs do and then you might as well use malloc over HeapAlloc. malloc will use HeapAlloc or VirtualAlloc internally but it is probably tuned for better performance/less fragmentation compared to the raw API. It also makes it easier to port to POSIX in the future. You will still be forced to use LocalFree/GlobalFree/HeapFree in some places where the API allocates memory for you.
Handling text needs special consideration and you need to decide if you need Unicode support or not. A stroll down memory lane might shed some light on why things are the way they are.
Back when Windows 95/98 was king you could use the char/CHAR narrow string types with both the C standard functions and the Windows API. There was virtually no Unicode support except for a handful of functions.
On Windows NT4/2000, however, the native string type is WCHAR (UTF-16 LE, but Microsoft just calls it Unicode). If you are using Microsoft Visual C++ then you have access to wide string versions of the C standard library beyond what the C standard actually requires, to ease coding for this platform. When coding for Windows using the Microsoft toolchain you can assume that the Windows SDK WCHAR type is the same as the wchar_t type defined by C.
Because the development of 95 and NT4 overlapped, they share the same API, and every function that receives/returns a string has two versions, one with an A suffix ("ANSI") and one with a W suffix. On Windows 95 the W functions are just stubs that return failure.
When you include Windows.h it will create defines like #define CreateProcess CreateProcessW if UNICODE is defined or #define CreateProcess CreateProcessA if not.
Visual C++ does the same thing with the tchar.h header. It uses the _UNICODE define to decide if the TCHAR type and the _t* functions use the char or wchar_t type. This meant that you could create two releases from the same source code, one for Windows 95/98/ME and one with full Unicode support.
This is not that relevant anymore but you still need to make a choice because things will be defined for one or the other.
It is still perfectly valid to do
#define UNICODE
#define _UNICODE
#include <windows.h>
#include <tchar.h>
void foo()
{
    TCHAR buf[100];
    SomeWindowsFunction(buf, 100);
    _tprintf(_T("foo: %s\n"), buf);
}
although you will see many people go straight for WCHAR and wprintf these days.
The StrSafe functions were added to make it easier to write bug free code, they still have the same A/W duplication.
You cannot mix and match WCHAR with printf. Even if you use %ls in the format string, the string will be converted internally, and not all Unicode strings will convert correctly.
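The conversion problem is not specific to printf; it comes from the wide-to-multibyte conversion that the C library performs according to the current locale. A minimal sketch, assuming the program is still in the default "C" locale (no setlocale call) and using a hypothetical wrapper named narrow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical wrapper around wcstombs: converts a wide string to the
 * current locale's multibyte encoding.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if a character cannot be represented or the result does
 * not fit together with its terminator.
 */
static size_t narrow(char *dst, size_t cap, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = wcstombs(dst, src, cap);
    if (n != (size_t)-1 && n == cap)
        return (size_t)-1;            /* buffer filled without a terminator */
    return n;
}
```

With this wrapper, a plain ASCII string such as L"abc" converts fine, whereas a string containing, say, U+03A9 typically fails in the "C" locale because that locale has no representation for the character. Which characters survive depends on the locale and the C library, so this is an illustration of the failure mode rather than a statement about every platform.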
If POSIX portability is not a requirement then I suggest that you use the wide function extensions provided by Microsoft when you need a C library function.
Note that different versions of the OS use different definitions of base types and different alignment/padding. Remember the 8086, the 386 and now the Core i7 (16, 32 and 64 bits).
When structs need to be compatible with earlier versions, they typically use pre-defined integer widths and pad for legacy alignment.
For that reason, the types from the API must be used in API calls.
Also, there have been (and may still be) different memory models for process memory and shared memory. For example, the clipboard uses a form of shared memory. It is important to use the memory allocation mechanisms Microsoft advises for such API calls.
For everything non-API I use the standard C functions and types.

How to deal with Unicode paths in a cross-platform C library?

I'm contributing to a C library. It has a function that takes a char* parameter for a file path name. The authors are mostly UNIX developers, and this works fine on unixes where char* mostly means UTF-8. (At least in GCC, the character set is configurable and UTF-8 is the default.)
However, char* means ANSI on Windows, which implies that it is currently impossible to use Unicode path names with this library on Windows, where wchar_t* should be used and only UTF-16 is supported. (A quick search on StackOverflow reveals that the ANSI Windows API functions can not be used with UTF-8.)
The question is, what is the right way to deal with this? We've come up with various ways to do it, but neither of us are Windows experts, so we can't really decide how to do it properly. Our goal is that the users of the library should be able to write cross-platform code that would work on unixes as well as windows.
Under the hood, the library has #ifdefs in place to differentiate between operating systems so that it can use POSIX functions on UNIXes and Win32 APIs on Windows.
So far, we've come up with the following possibilities:
1. Offer a separate windows-only function that accepts a wchar_t*.
2. Require UTF-16 on Windows and #ifdef the library header in such a way that the function would accept wchar_t* on Windows.
3. Add a flag that would tell the function to cast the given char* to wchar_t* and call the widechar Windows APIs.
4. Create a variant of the function that takes a file descriptor (or file handle on Windows) instead of a file path.
5. Always require UTF-8 (even on Windows), and then inside the function, convert UTF-8 to UTF-16 and call the widechar Windows APIs.
The problem with options 1-4 is that they would require the user to consciously take care of portability themselves. Option 5 sounds good, but I'm not sure if this is the right way to go.
I'm also open to other suggestions or ideas that can solve this. :)
Since portability is an important goal for you, I think it is imperative for your function semantics to be precisely defined. Among other things, that means that the arguments' types and meanings don't vary across platforms. So, if you have a function that accepts regular char based paths then it should accept such paths on all systems, and the encoding expected of those paths should be well-defined (which does not necessarily mean "the same"). That rules out options (2) and (3).
Moreover, portability requires the same functions to be usable across all platforms; that rules out (1). Option (4) could be ok if a stream- and/or file descriptor-based approach were the only one provided by your library, but it yields portability only with respect to those functions, not with respect to the path-based ones. (And note that stream (FILE *) APIs are defined by C, whereas file descriptors are a POSIX concept, not native to C. In principle, therefore, streams are more portable than file descriptors.)
(5) could work, but it places stronger constraints than you actually need. It is not essential for the function to define the encoding expected (though it can); it suffices for it to define how that encoding is determined.
Additionally, you could add wchar_t-based functions that work everywhere (as opposed to Windows-only). Those might be more convenient for Windows users. Similar to alternative (4), however, that provides portability only with respect to those functions. Supposing that you don't want to drop the char-based ones, you would need to pair this alternative with some variation on (5).
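For reference, option (5) from the question could look roughly like the sketch below. lib_fopen_utf8 and utf8_to_wide are hypothetical names, not part of any existing library; the Windows branch relies on MultiByteToWideChar with CP_UTF8 and on _wfopen, and the other branch assumes the platform's fopen accepts the path bytes as given.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <wchar.h>
#include <windows.h>

/* Converts a UTF-8 string to a newly allocated UTF-16 string; NULL on error. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call: ask for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0) return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical library entry point: path_utf8 is UTF-8 on every platform. */
FILE *lib_fopen_utf8(const char *path_utf8, const char *mode)
{
    assert(path_utf8 != NULL && mode != NULL);
#ifdef _WIN32
    FILE *f = NULL;
    wchar_t *wpath = utf8_to_wide(path_utf8);
    wchar_t *wmode = utf8_to_wide(mode);
    if (wpath != NULL && wmode != NULL)
        f = _wfopen(wpath, wmode);
    free(wpath);                      /* free(NULL) is a no-op */
    free(wmode);
    return f;
#else
    /* Assumption: the file system API takes the bytes unchanged. */
    return fopen(path_utf8, mode);
#endif
}
```

Callers then write the same code on every platform, for example lib_fopen_utf8("r\xc3\xa9sum\xc3\xa9.txt", "r"), and only the library knows about the wide-character APIs.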

Best Practice Regarding Types When Programming Windows?

I'm currently learning Windows programming through the Win32 API using Petzold's book as a resource and I was wondering if I should use the types defined in the API instead of the standard C types (i.e. CHAR instead of char, DWORD instead of unsigned long). I understand that this was mostly for backwards compatibility, but is there any benefit to using them right now?
I would use the Windows types only in code that's directly interfacing with Windows API, and even then, only when it matters what type you're using - like if you need to pass a pointer to that type to an API function, or for semi-opaque types like handles. Don't start writing your for loops with INT or DWORD as the loop counter variable...
Of course I may be biased... ;-)
Use the Windows types, especially for return values. You're much more likely to write portable code (i.e. works in 32-bit and 64-bit versions) that way.
When in Rome... so yes. It makes your code "fit" in a particular environment.
Obviously this is more relevant if doing MFC/COM+ than, say, a (portable) console app that only makes a few WinAPI calls. (The WinAPI calls should still use the "windows types", IMHO. They are already included anyway.)
Happy coding.

Which string manipulation functions should I use?

On my Windows/Visual C environment there's a wide number of alternatives for doing the same basic string manipulation tasks.
For example, for doing a string copy I could use:
strcpy, the ANSI C standard library function (CRT)
lstrcpy, the version included in kernel32.dll
StrCpy, from the Shell Lightweight Utility library
StringCchCopy/StringCbCopy, from a "safe string" library
strcpy_s, security enhanced version of CRT
While I understand that all these alternatives have an historical reason, can I just choose a consistent set of functions for new code? And which one? Or should I choose the most appropriate function case by case?
First of all, let's review pros and cons of each function set:
ANSI C standard library function (CRT)
Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it might be a wise thing to have a separation of portable vs. OS-dependent code.
These functions have often assembly level optimization and are therefore very fast.
There are some drawbacks:
they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
there are some archaisms like the infamous strncpy
Kernel32 string functions
Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency to the CRT. You might want to do that for two reasons:
avoiding the CRT payload for an ultra lightweight executable (unusual these days but not in the 90s!)
avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).
Moreover, the kernel32 function could be more optimized than the CRT version: when your executable runs on Windows 12 optimized for a Core i13, kernel32 could use an assembly-optimized version.
Shell Lightweight Utility Functions
The same considerations apply here as for the kernel32 functions, with the added value of some more complex functions. However, I doubt that they are actively maintained and I would just skip them.
StrSafe Functions
The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).
Security-Enhanced CRT functions
These functions have the undoubted benefit of being very similar to ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based version (of course, available only when compiling as C++). I really hope that they will be eventually standardized. Unfortunately they have a number of drawbacks:
although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
when they fail, they don't just return an error code but execute an invalid parameter handler
Conclusions
While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever possible, as portable code is always a good thing.
In real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (including the powerful template-based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.
My personal preference, for both new and existing projects, is the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be overall very consistent and flexible. And they were designed from the ground up with safety / security in mind.
I'd answer this question slightly differently. Do you want to have portable code or not? If you want to be portable you cannot rely on anything but strcpy, strncpy, or the standard wide character "string" handling functions.
Then if your code just has to run under Windows you can use the "safe string" variants.
If you want to be portable and still want some extra safety, then you should check cross-platform libraries like e.g.
glib or
libapr
or other "safe string libraries" like e.g:
SafeStrLibrary
I would suggest using functions from the standard library, or functions from cross-platform libraries.
I would stick to one, I would pick whichever one is in the most useful library in case you need to use more of it, and I would stay away from the kernel32.dll one as it's windows only.
But these are just tips, it's a subjective question.
Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.
If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
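As an illustration of the "new allocation or error status" approach, here is a minimal sketch. concat_alloc is a hypothetical helper name; the point is only that the result is sized from the inputs, so nothing is ever truncated, and failure is reported instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a newly allocated concatenation of a and b,
 * or NULL if the combined size would overflow or the allocation fails.
 * The caller releases the result with free().
 */
static char *concat_alloc(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    if (lb >= SIZE_MAX - la) return NULL;   /* la + lb + 1 would overflow */
    char *out = malloc(la + lb + 1);
    if (out == NULL) return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);            /* also copies b's terminator */
    return out;
}
```

Like the strlen-then-allocate pattern described above, this is only safe if the source strings are not modified by another thread between the strlen calls and the copies.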
If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:
strcpy(out, str1);
strcat(out, str2);
strcat(out, str3);
...
You should be doing something like:
size_t l, n = outsize;   /* n tracks the space remaining in out */
char *s = out;
l = strlen(str1);
if (l>=n) goto error;    /* need room for l characters plus the terminator */
strcpy(s, str1);
s += l;
n -= l;
l = strlen(str2);
if (l>=n) goto error;
strcpy(s, str2);
s += l;
n -= l;
...
Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:
if (!my_strcpy(&s, &n, str1)) goto error;
Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.
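The answer does not show the body of my_strcpy, so the following is only one possible shape for it, written under the assumption that the size variable counts the space remaining including room for the terminator. join3 is a hypothetical caller added to show the helper in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * One possible my_strcpy (an assumption, not the answer's own code).
 * *pos is the current write position, *rem the space remaining there.
 * Returns 1 on success, 0 if src does not fit (nothing is written).
 */
static int my_strcpy(char **pos, size_t *rem, const char *src)
{
    assert(pos != NULL && *pos != NULL && rem != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= *rem) return 0;        /* need len bytes plus the terminator */
    memcpy(*pos, src, len + 1);
    *pos += len;                      /* next write overwrites the '\0' */
    *rem -= len;
    return 1;
}

/* Hypothetical caller: joins three strings; 0 on success, -1 if too long. */
static int join3(char *out, size_t outsize,
                 const char *s1, const char *s2, const char *s3)
{
    char *s = out;
    size_t n = outsize;
    if (!my_strcpy(&s, &n, s1)) goto error;
    if (!my_strcpy(&s, &n, s2)) goto error;
    if (!my_strcpy(&s, &n, s3)) goto error;
    return 0;
error:
    return -1;
}
```

Because the helper advances the write position itself, each piece is scanned only once, and the bounds check lives in exactly one place.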
Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in script languages where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:
char *tmp = malloc(match.end-match.start+1);
memcpy(tmp, src+match.start, match.end-match.start);
tmp[match.end-match.start] = 0;
printf("%s\n", tmp);
free(tmp);
The same thing can be accomplished with:
printf("%.*s\n", (int)(match.end-match.start), src+match.start);
No allocations, no cleanup, no error cases (the original code crashed if malloc failed).
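The same precision trick works with any printf-family function. Below is a small sketch using snprintf so the result can be inspected; format_span is a hypothetical helper, and start/end stand in for whatever offsets the matcher reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats the bytes src[start..end) into out without a temporary copy.
 * Returns what snprintf returns: the number of characters the full
 * result needs, excluding the terminator.
 */
static int format_span(char *out, size_t cap,
                       const char *src, size_t start, size_t end)
{
    assert(out != NULL && src != NULL && start <= end);
    /* The precision argument consumed by %.*s must be an int. */
    return snprintf(out, cap, "%.*s", (int)(end - start), src + start);
}
```

The cast matters: passing a size_t or ptrdiff_t where printf expects an int is undefined behavior on platforms where those types differ in size from int.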

Why does the Win32-API have so many custom types?

I'm new to the Win32 API and the many new types begin to confuse me.
Some functions take 1-2 ints and 3 UINTS as arguments.
Why can't they just use ints? What are UINTS?
Then, there are those other types:
DWORD LPCWSTR LPBOOL
Again, I think the "primitive" C types would be enough - why introduce 100 new types?
This one was a pain: WCHAR*
I had to iterate through it and push_back every character to an std::string as there wasn't another way to convert it to one. Horrible.
Why WCHAR? Why reinvent the wheel? They could have just used char* instead, or?
The Windows API was first created back in the 1980's, and has had to support several different CPU architectures and compilers over the years. They've gone from single-user single-process standalone systems to networked multi-user multi-core security-conscious systems. They had to work around issues with 16-bit vs. 32-bit processors, and now 64-bit processors. They had to work around issues with pre-ANSI C compilers. They had to support C++ compilers in the early unstandardized times. They had to deal with segmented memory. They had to support internationalization before Unicode existed. They had to support some source-level compatibility with MS-DOS, with OS/2, and with Mac OS. They've had to run on several generations of Intel chips, and PowerPC, and MIPS, and Alpha, and ARM. The same basic API is used for desktop, server, mobile, and embedded systems.
Back in the 1980's, C was considered to be a high-level language (yes, really!) and many people considered it good form to use abstract types rather than just specifying everything as a primitive int, char, or void *. Back when we didn't have IntelliSense and infotips and code browsers and online documentation and the like, such usage hints were helpful, and it made it easier to port code between different compilers and different programming languages.
Yes, it looks like a horrible mess now, but that doesn't mean anybody did anything wrong.
Win32 actually has very few primitive types. What you're looking at is decades of built-up #defines and typedefs and Hungarian notation. Because there were so few types and little or no IntelliSense, developers gave themselves "clues" as to what a particular type was actually supposed to do.
For example, there is no boolean type but there is an "aliased" representation of an integer that tells you that a particular variable is supposed to be treated as a boolean. Take a look at the contents of WinDef.h to see what I mean.
You can take a look here: http://msdn.microsoft.com/en-us/library/aa383751(VS.85).aspx
for a peek at the veritable tip of the iceberg. For example, notice how HANDLE is the base typedef for every other object that is a "handle" to a windows object. Of course, HANDLE is defined somewhere else as a primitive type.
UINT is an unsigned integer. If a parameter value will not / cannot be negative, it makes sense to specify unsigned. LPCWSTR is a pointer to const wide char array, while WCHAR* is non-const.
You should probably compile your app for UNICODE when working with wide chars, or use a conversion routine to convert from narrow to wide.
http://msdn.microsoft.com/en-us/library/dd319072%28VS.85%29.aspx
http://msdn.microsoft.com/en-us/library/dd374083%28v=VS.85%29.aspx
A coworker of mine would say, "There is no problem that can't be solved (obfuscated?) by a level of indirection." In Win32, you'll be dealing with WCHAR, UINT, etc., and you'll get used to it. You won't have to worry when you deploy that DLL which basic type a WCHAR or UINT compiles to—it will "just work".
It is best to read through some of the documentation to get used to it. Especially on the "wide char" support (WCHAR, etc.). There's a nice definition on MSDN for WCHAR.

Resources