Should I use secure versions of POSIX functions on MSVC - C - c

I am writing some C code which is expected to compile on multiple compilers (at least on MSVC and GCC). Since I am beginner in C, I have all warnings turned on and warnings are treated as errors (-Werror in GCC & /WX in MSVC) to prevent me from making silly mistakes.
When I compiled some code that uses strcpy on MSVC, I get warning like,
warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
I am bit confused. Lot of common functions are deprecated on MSVC. Should I use this secured version when on Windows? If yes, should I wrap strcpy something like,
my_strcpy()
{
#ifdef WIN32
// use strcpy_s
#ELSE
// use strcpy
}
Any thoughts?

Whenever you move data between non-constant-size buffers, you have to (gasp! omg!) actually think about whether it fits. Using functions (like the MS-specific strcpy_s or the BSD strlcpy) that purport to be "safe" will protect you from some obvious buffer overflow conditions, but won't protect you from the bugs that result from string truncation. It also won't protect you from integer overflows in computing the necessary sizes of buffers.
Unless you're an expert dealing with C strings, I would recommend forgetting about special functions and commenting every line of your code that will perform variable-length/position writes with a justification for how you know, at this point in the program, that the length/offset you're about to use is within the bounds of the size of the buffer. Do this for lines where you perform arithmetic on sizes/offsets too - document how you know that the arithmetic will not overflow, and add tests for overflow if you find you don't know.
Another approach is to completely wrap all your string handling in a string object that stores the length of the buffer along with the string and automatically reallocates when a string needs to be enlarged, and then only use const char * for read-only access to strings when you need to pass them to system functions or other libraries. This will sacrifice a good bit of the performance you'd expect from C, but it will help you ensure that you don't make mistakes. Just don't take it to the extreme. There's no need to duplicate stuff like strchr, strstr, etc. in your string wrapper. Just provide methods to duplicate string objects, concatenate them, and truncate them, and then with the existing library functions that operate on const char * you can do just about anything you'd want to.

There are lots and lots of discussions about this topic here on SO. The usual suspects like strncpy, strlcpy and whatever will pop up here again, I'm sure. Just type "strcpy" in the search box and read some of the longer threads to get an overview.
My advice is: Whatever your final choice will be, it is a good idea to follow the DRY principle and continue to do it as in your example of my_strcpy(). Don't throw the raw calls all over your code, use wrappers and centralize them in your own string handling library. This will reduce overall code (boilerplate), and you have one central location to make modifications, if you change your mind later.
Of course this opens up some other cans of worms, especially for a beginner: Memory handling responsibility and interface design. Both a topic on its own, and 5 people will give you 10 suggestions of how to do it. A central library usually has the nice effect that it enforces a decision, which you will follow throughout your whole codebase, instead of using method a in module A and method b in module B, causing you trouble when you try to connect A with B...

I would tend to use the safer function snprintf † which is available on both platforms rather than having different paths depending on platform. You will need to use the define to prevent the warnings on MSVC.
† though possibly slightly less safer - it will return a string which is not nul-terminated on error, so you must check the return, but it won't cause a buffer overflow.

Related

Why is 'sys_errlist' deprecated in glibc?

sys_errlist is a handy array which allows getting static errno descriptions. The alternative to it is the strerror_r function, which is available in two confusing incompatible flavors. The GNU version of it returns char *, which would be from that same aforementioned array as long as the error is known, or otherwise from the user-supplied buffer. The standards-compliant version of strerror_r returns an int instead, and always uses the user-supplied buffer. The problem is, those two functions share the same name despite having completely different semantics, so you basically have to perform a fairly complex #ifdef check and write two completely different versions of your code depending on which version you get. In addition to that, both of those functions are worse than sys_errlist, as both require for the caller to provide a "large enough" buffer to hold the description, even though the GNU version would rarely use it, and neither function allows to know just how large the buffer should really be. If instead you choose to use sys_errlist instead, you can simply check whether value >= sys_nerr and allocate the buffer only in that case, just to put the Unknown error %d there via snprintf, and be done.
Given that strerror_r is a horrible, incomprehensible and inefficient mess, why did GNU developers mark sys_errlist as deprecated, effectively forcing one to either use strerrror_r or to observe the ugly warning each time the code is compiled?
strerror and its relative are localized. The usefulness of a non-localized system message can be debated, but glibc's maintainers went with the prevailing direction (Solaris and other systems).
However: sys_errlist has been deprecated for quite a while. It is not a POSIX interface. Some systems do not have it.
Further reading:
Where can I find the contents of sys_errlist?
Use strerror() or strerror_r() instead of sys_errlist and sys_nerr (fish-shell bug #1830)
RE: sys_errlist (Cygwin in 1999)
GNU Hurd/ hurd/ porting/ guidelines
It's been a while since this was an issue, but it used to be the case that some systems did not have strerror (see Unix Incompatibility Notes:
String and Memory Functions).

Stand-alone portable snprintf(), independent of the standard library?

I am writing code for a target platform with NO C-runtime. No stdlib, no stdio. I need a string formatting function like snprintf but that should be able to run without any dependencies, not even the C library.
At most it can depend on memory alloc functions provided by me.
I checked out Trio but it needs stdio.h header. I can't use this.
Edit
Target platform : PowerPC64 home made OS(not by me). However the library shouldn't rely on OS specific stuff.
Edit2
I have tried out some 3rd-party open source libs, such as Trio(http://daniel.haxx.se/projects/trio/), snprintf and miniformat(https://bitbucket.org/jj1/miniformat/src) but all of them rely on headers like string.h, stdio.h, or(even worse) stdlib.h. I don't want to write my own implementation if one already exists, as that would be time-wasting and bug-prone.
Try using the snprintf implementation from uclibc. This is likely to have the fewest dependencies. A bit of digging shows that snprintf is implemented in terms of vsnprintf which is implemented in terms of vfprintf (oddly enough), it uses a fake "stream" to write to string.
This is a pointer to the code: http://git.uclibc.org/uClibc/tree/libc/stdio/_vfprintf.c
Also, a quick google search also turned up this:
http://www.ijs.si/software/snprintf/
http://yallara.cs.rmit.edu.au/~aholkner/psnprintf/psnprintf.html
http://www.jhweiss.de/software/snprintf.html
Hopefully one is suitable for your purposes. This is likely to not be a complete list.
There is a different list here:
http://trac.eggheads.org/browser/trunk/src/compat/README.snprintf?rev=197
You will probably at least need stdarg.h or low level knowledge of the specific compiler/architecture calling convention in order to be able to process the variadic arguments.
I have been using code based on Kustaa Nyholm's implementation It provides printf() (with user supplied character output stub) and sprintf(), but adding snprintf() would be simple enough. I added vprintf() and vsprintf() for example in my implementation.
No dynamic memory application is required, but it does have a dependency on stdarg.h, but as I said, you are unlikely to be able to get away without that for any variadic function - though you could potentially implement your own.
I am guessing you are in a norming enivronment where you need to explicitly document and verify COTS code.
However, I think in the case of stdarg.h this is worthwhile. You could pull in the source for just this and treat it like handwritten code (review, lint, unit-test, etc.). Any self-written replacement will be a lot of work, probably less stable and absolutely not portable.
That said, the actual snprintf implementation should not be too hard, and you could do this yourself, probably. Especially if you might be able to strip a few features away.
Keep in mind that vararg code has no typechecking and is prone to errors. For library snprintf you may find gcc's warnings helpful.

Harmful C Source File Check?

Is there a way to programmatically check if a single C source file is potentially harmful?
I know that no check will yield 100% accuracy -- but am interested at least to do some basic checks that will raise a red flag if some expressions / keywords are found. Any ideas of what to look for?
Note: the files I will be inspecting are relatively small in size (few 100s of lines at most), implementing numerical analysis functions that all operate in memory. No external libraries (except math.h) shall be used in the code. Also, no I/O should be used (functions will be run with in-memory arrays).
Given the above, are there some programmatic checks I could do to at least try to detect harmful code?
Note: since I don't expect any I/O, if the code does I/O -- it is considered harmful.
Yes, there are programmatic ways to detect the conditions that concern you.
It seems to me you ideally want a static analysis tool to verify that the preprocessed version of the code:
Doesn't call any functions except those it defines and non I/O functions in the standard library,
Doesn't do any bad stuff with pointers.
By preprocessing, you get rid of the problem of detecting macros, possibly-bad-macro content, and actual use of macros. Besides, you don't want to wade through all the macro definitions in standard C headers; they'll hurt your soul because of all the historical cruft they contain.
If the code only calls its own functions and trusted functions in the standard library, it isn't calling anything nasty. (Note: It might be calling some function through a pointer, so this check either requires a function-points-to analysis or the agreement that indirect function calls are verboten, which is actually probably reasonable for code doing numerical analysis).
The purpose of checking for bad stuff with pointers is so that it doesn't abuse pointers to manufacture nasty code and pass control to it. This first means, "no casts to pointers from ints" because you don't know where the int has been :-}
For the who-does-it-call check, you need to parse the code and name/type resolve every symbol, and then check call sites to see where they go. If you allow pointers/function pointers, you'll need a full points-to analysis.
One of the standard static analyzer tool companies (Coverity, Klocwork) likely provide some kind of method of restricting what functions a code block may call. If that doesn't work, you'll have to fall back on more general analysis machinery like our DMS Software Reengineering Toolkit
with its C Front End. DMS provides customizable machinery to build arbitrary static analyzers, for the a language description provided to it as a front end. DMS can be configured to do exactly the test 1) including the preprocessing step; it also has full points-to, and function-points-to analyzers that could be used to the points-to checking.
For 2) "doesn't use pointers maliciously", again the standard static analysis tool companies provide some pointer checking. However, here they have a much harder problem because they are statically trying to reason about a Turing machine. Their solution is either miss cases or report false positives. Our CheckPointer tool is a dynamic analysis, that is, it watches the code as it runs and if there is any attempt to misuse a pointer CheckPointer will report the offending location immediately. Oh, yes, CheckPointer outlaws casts from ints to pointers :-} So CheckPointer won't provide a static diagnostic "this code can cheat", but you will get a diagnostic if it actually attempts to cheat. CheckPointer has rather high overhead (all that checking costs something) so you probably want to run you code with it for awhile to gain some faith that nothing bad is going to happen, and then stop using it.
EDIT: Another poster says There's not a lot you can do about buffer overwrites for statically defined buffers. CheckPointer will do those tests and more.
If you want to make sure it's not calling anything not allowed, then compile the piece of code and examine what it's linking to (say via nm). Since you're hung up on doing this by a "programmatic" method, just use python/perl/bash to compile then scan the name list of the object file.
There's not a lot you can do about buffer overwrites for statically defined buffers, but you could link against an electric-fence type memory allocator to prevent dynamically allocated buffer overruns.
You could also compile and link the C-file in question against a driver which would feed it typical data while running under valgrind which could help detect poorly or maliciously written code.
In the end, however, you're always going to run up against the "does this routine terminate" question, which is famous for being undecidable. A practical way around this would be to compile your program and run it from a driver which would alarm-out after a set period of reasonable time.
EDIT: Example showing use of nm:
Create a C snippet defining function foo which calls fopen:
#include <stdio.h>
foo() {
FILE *fp = fopen("/etc/passwd", "r");
}
Compile with -c, and then look at the resulting object file:
$ gcc -c foo.c
$ nm foo.o
0000000000000000 T foo
U fopen
Here you'll see that there are two symbols in the foo.o object file. One is defined, foo, the name of the subroutine we wrote. And one is undefined, fopen, which will be linked to its definition when the object file is linked together with the other C-files and necessary libraries. Using this method, you can see immediately if the compiled object is referencing anything outside of its own definition, and by your rules, can considered to be "bad".
You could do some obvious checks for "bad" function calls like network IO or assembly blocks. Beyond that, I can't think of anything you can do with just a C file.
Given the nature of C you're just about going to have to compile to even get started. Macros and such make static analysis of C code pretty difficult.

Which string manipulation functions should I use?

On my Windows/Visual C environment there's a wide number of alternatives for doing the same basic string manipulation tasks.
For example, for doing a string copy I could use:
strcpy, the ANSI C standard library function (CRT)
lstrcpy, the version included in kernel32.dll
StrCpy, from the Shell Lightweight Utility library
StringCchCopy/StringCbCopy, from a "safe string" library
strcpy_s, security enhanced version of CRT
While I understand that all these alternatives have an historical reason, can I just choose a consistent set of functions for new code? And which one? Or should I choose the most appropriate function case by case?
First of all, let's review pros and cons of each function set:
ANSI C standard library function (CRT)
Functions like strcpy are the one and only choice if you are developing portable C code. Even in a Windows-only project, it might be a wise thing to have a separation of portable vs. OS-dependent code.
These functions have often assembly level optimization and are therefore very fast.
There are some drawbacks:
they have many limitations and therefore often you still have to call functions from other libraries or provide your own versions
there are some archaisms like the infamous strncpy
Kernel32 string functions
Functions like lstrcpy are exported by kernel32 and should be used only when trying to avoid any dependency to the CRT. You might want to do that for two reasons:
avoiding the CRT payload for an ultra lightweight executable (unusual these days but not in the 90s!)
avoiding initialization issues (if you launch a thread with CreateThread instead of _beginthread).
Moreover, the kernel32 function could be more optimized that the CRT version: when your executable will run on Windows 12 optimized for a Core i13, kernel32 could use an assembly-optimized version.
Shell Lightweight Utility Functions
Here are valid the same considerations made for the kernel32 functions, with the added value of some more complex functions. However I doubt that they are actively maintained and I would just skip them.
StrSafe Function
The StringCchCopy/StringCbCopy functions are usually my personal choice: they are very well designed, powerful, and surprisingly fast (I also remember a whitepaper that compared performance of these functions to the CRT equivalents).
Security-Enhanced CRT functions
These functions have the undoubted benefit of being very similar to ANSI C equivalents, so porting legacy code is a piece of cake. I especially like the template-based version (of course, available only when compiling as C++). I really hope that they will be eventually standardized. Unfortunately they have a number of drawbacks:
although a proposed standard, they have been basically rejected by the non-Windows community (probably just because they came from Microsoft)
when fail, they don't just return an error code but execute an invalid parameter handler
Conclusions
While my personal favorite for Windows development is the StrSafe library, my advice is to use the ANSI C functions whenever is possible, as portable-code is always a good thing.
In the real life, I developed a personalized portable library, with prototypes similar to the Security-Enhanced CRT functions (included the powerful template based technique), that relies on the StrSafe library on Windows and on the ANSI C functions on other platforms.
My personal preference, for both new and existing projects, are the StringCchCopy/StringCbCopy versions from the safe string library. I find these functions to be overall very consistent and flexible. And they were designed from the groupnd up with safety / security in mind.
I'd answer this question slightly different. Do you want to have portable code or not? If you want to be portable you can not rely on anything else but strcpy, strncpy, or the standard wide character "string" handling functions.
Then if your code just has to run under Windows you can use the "safe string" variants.
If you want to be portable and still want to have some extra safety, than you should check cross-platform libraries like e.g
glib or
libapr
or other "safe string libraries" like e.g:
SafeStrLibrary
I would suggest using functions from the standard library, or functions from cross-platform libraries.
I would stick to one, I would pick whichever one is in the most useful library in case you need to use more of it, and I would stay away from the kernel32.dll one as it's windows only.
But these are just tips, it's a subjective question.
Among those choices, I would simply use strcpy. At least strcpy_s and lstrcpy are cruft that should never be used. It's possibly worthwhile to investigate those independently written library functions, but I'd be hesitant to throw around nonstandard library code as a panacea for string safety.
If you're using strcpy, you need to be sure your string fits in the destination buffer. If you just allocated it with size at least strlen(source)+1, you're fine as long as the source string is not simultaneously subject to modification by another thread. Otherwise you need to test if it fits in the buffer. You can use interfaces like snprintf or strlcpy (nonstandard BSD function, but easy to copy an implementation) which will truncate strings that don't fit in your destination buffer, but then you really need to evaluate whether string truncation could lead to vulnerabilities in itself. I think a much better approach when testing whether the source string fits is to make a new allocation or return an error status rather than performing blind truncation.
If you'll be doing a lot of string concatenation/assembly, you really should write all your code to manage the length and current position as you go. Instead of:
strcpy(out, str1);
strcat(out, str2);
strcat(out, str3);
...
You should be doing something like:
size_t l, n = outsize;
char *s = out;
l = strlen(str1);
if (l>=outsize) goto error;
strcpy(s, str1);
s += l;
n -= l;
l = strlen(str2);
if (l>=outsize) goto error;
strcpy(s, str2);
s += l;
n -= l;
...
Alternatively you could avoid modifying the pointer by keeping a current index i of type size_t and using out+i, or you could avoid the use of size variables by keeping a pointer to the end of the buffer and doing things like if (l>=end-s) goto error;.
Note that, whichever approach you choose, the redundancy can be condensed by writing your own (simple) functions that take pointers to the position/size variable and call the standard library, for instance something like:
if (!my_strcpy(&s, &n, str1)) goto error;
Avoiding strcat also has performance benefits; see Schlemiel the Painter's algorithm.
Finally, you should note that a good 75% of the string copying and assembly people perform in C is utterly useless. My theory is that the people doing it come from backgrounds in script languages where putting together strings is what you do all the time, but in C it's not useful that often. In many cases, you can get by with never copying strings at all, using the original copies instead, and get much better performance and simpler code at the same time. I'm reminded of a recent SO question where OP was using regexec to match a regular expression, then copying out the result just to print it, something like:
char *tmp = malloc(match.end-match.start+1);
memcpy(tmp, src+match.start, match.end-match.start);
tmp[match.end-match.start] = 0;
printf("%s\n", tmp);
free(tmp);
The same thing can be accomplished with:
printf("%.*s\m", match.end-match.start, src+match.start);
No allocations, no cleanup, no error cases (the original code crashed if malloc failed).

Safer Alternatives to the C Standard Library

The C standard library is notoriously poor when it comes to I/O safety. Many functions have buffer overflows (gets, scanf), or can clobber memory if not given proper arguments (scanf), and so on. Every once in a while, I come across an enterprising hacker who has written his own library that lacks these flaws.
What are the best of these libraries you have seen? Have you used them in production code, and if so, which held up as more than hobby projects?
I use GLib library, it has many good standard and non standard functions.
See https://developer.gnome.org/glib/stable/
and maybe you fall in love... :)
For example:
https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strdup-printf
explains that g_strdup_printf is:
Similar to the standard C sprintf() function but safer, since it calculates the maximum space required and allocates memory to hold the result.
This isn't really answering your question about the safest libraries to use, but most functions that are vulnerable to buffer overflows that you mentioned have safer versions which take the buffer length as an argument to prevent the security holes that are opened up when the standard methods are used.
Unless you have relaxed the level of warnings, you will usually get compiler warnings when you use the deprecated methods, suggesting you use the safer methods instead.
I believe the Apache Portable Runtime (apr) library is safer than the standard C library. I use it, well, as part of an apache module, but also for independent processes.
For Windows there is a 'safe' C/C++ library.
You're always at liberty to implement any library you like and to use it - the hard part is making sure it is available on the platforms you need your software to work on. You can also use wrappers around the standard functions where appropriate.
Whether it is really a good idea is somewhat debatable, but there is TR24731 published by the C standard committee - for a safer set of C functions. There's definitely some good stuff in there. See this question: Do you use the TR 24731 Safe Functions in your C code?, which includes links to the technical report.
Maybe the first question to ask is if your really need plain C? (maybe a language like .net or java is an option - then e.g. buffer overflows are not really a problem anymore)
Another option is maybe to write parts of your project in C++ if other higher level languages are not an option. You can then have a C interface which encapsulates the C++ code if you really need C.
Because if you add all the advanced functions the C++ standard library has build in - your C code would only be marginally faster most times (and contain a lot more bugs than an existing and tested framework).

Resources