C4996 (function unsafe) warning for strcpy but not for memcpy - c

I am writing code in VS2010, and after compilation the compiler gives me a C4996 warning ("This function or variable may be unsafe") for strcpy and sprintf calls.
However, I don't get similar warnings for memcpy (and maybe there are a few more similar 'unsafe' function calls in the code):
int _tmain(int argc, _TCHAR* argv[])
{
    char buf1[100], buf2[100];
    strcpy(buf1, buf2);       // Warning C4996 displayed here asking to use strcpy_s instead
    memcpy(buf1, buf2, 100);  // No warning here asking to use memcpy_s
    memcpy_s(buf1, 100, buf2, 100);
    return 0;
}
Why is this so? How can I turn on C4996 warning for all possible unsafe calls in my code?

In general, to compile C code you need a conforming C compiler. Visual Studio is a non-conforming C++ compiler.
You get the warning because Visual Studio is bad. See this.
C4996 appears whenever you use a function that Microsoft regards as obsolete. Apparently, Microsoft has decided that they should dictate the future of the C language, rather than the ISO C working group. Thus you get false warnings for perfectly fine code. The compiler is the problem.
There is nothing wrong with the strcpy() function; that's a myth. This function has existed for some 30-40 years and every little bit of it is properly documented. So what the function does and does not do should not come as a surprise, even to beginner C programmers.
What strcpy does and does not:
It copies a null-terminated string into another memory location.
It does not take any responsibility for error handling.
It does not fix bugs in the caller application.
It does not take any responsibility for educating C programmers.
Because of the last remark above, you must know the following before calling strcpy:
If you pass a string of unknown length to strcpy, without checking its length in advance, you have a bug in the caller application.
If you pass some chunk of data which does not end with \0, you have a bug in the caller application.
If you pass two pointers to strcpy(), which point at memory locations that overlap, you invoke undefined behavior. Meaning you have a bug in the caller application.
For example, in the code you posted, you never initialized the arrays, so your program will likely crash and burn. That bug isn't in the slightest related to the strcpy() function and will not be solved by swapping out strcpy() for something else.
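For illustration, a hedged variant of the posted snippet with the caller-side bugs removed (the source is initialized and is known to fit; MSVC will still emit C4996 for the strcpy call itself):

#include <string.h>

int main(void)
{
    char buf1[100];
    char buf2[100] = "hello";   /* source is now a valid, terminated string */

    /* caller guarantees: strlen(buf2) < sizeof buf1, and the buffers do not overlap */
    strcpy(buf1, buf2);
    return 0;
}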

strcpy is unsafe if the terminating NUL is missing, as it may copy more characters than fit in the destination area. With memcpy, the number of bytes copied is fixed.
The memcpy_s function actually makes it easier for programmers to do it wrong -- you pass two lengths, and it uses the smaller of both, and all you get is an error code that can be silently ignored with no effort. Calling memcpy requires filling out the size parameter, which should make programmers think about what to pass.
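To make the "silently ignored error code" point concrete, here is a small sketch (the recovery branch is hypothetical; on MSVC memcpy_s is declared in <string.h>):

#include <string.h>

void copy_example(char *dst, size_t dstsz, const char *src, size_t n)
{
    memcpy_s(dst, dstsz, src, n);              /* return value silently dropped - compiles fine */

    if (memcpy_s(dst, dstsz, src, n) != 0) {   /* the check the caller has to remember to write */
        /* handle the failure here */
    }
}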

Include the following definition in the header "stdafx.h":
#define _CRT_SECURE_NO_WARNINGS
As for the difference between strcpy and memcpy: the latter has a third parameter that explicitly specifies how many characters must be copied. The former has no information about how many characters will be copied from the source string to the destination string, so in the general case there is a possibility that the memory allocated for the destination string will be overwritten.

You get these warnings because not passing the length of the string and relying on \0 termination is unsafe, as it may cause a buffer overrun. With memcpy you pass the length, so there is no overrun issue.
You can use something like
#ifdef _MSC_VER
# pragma warning(push)
# pragma warning(disable:4996)
#endif
strcpy... ; // Code that causes unsafe warning
#ifdef _MSC_VER
# pragma warning(pop)
#endif
If you don't worry about portability, you can use alternatives like strcpy_s, etc.

strcpy and sprintf really are unsafe functions: whether they overflow depends on the content of the string. Instead you should use strncpy and snprintf to make sure they do not overwrite memory.
memcpy is not in this category; it takes a length, so it does not overwrite memory as long as the length is correct.
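For what it's worth, a hedged sketch of the bounded alternatives mentioned above (this assumes a C99-conforming snprintf; a later answer notes that old MSVC only ships _snprintf):

#include <stdio.h>
#include <string.h>

void bounded_copy(char *dst, size_t dstsz, const char *src)
{
    /* two equivalent approaches, shown back to back purely for illustration */

    /* snprintf writes at most dstsz bytes and always terminates (when dstsz > 0);
       the result is truncated if src is too long */
    snprintf(dst, dstsz, "%s", src);

    /* strncpy needs the terminator added by hand */
    strncpy(dst, src, dstsz - 1);
    dst[dstsz - 1] = '\0';
}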

The warning means that Microsoft considers the function deprecated and that it may not be available in future versions: http://msdn.microsoft.com/en-US/en-en/library/ttcz0bys.aspx You cannot add other functions to Microsoft's deprecation list.
The reason for the deprecation is that the functions are "unsafe", but that's different from your assumption that "C4996 shows you all unsafe functions".

The reason why you get a warning on sprintf and strcpy, and not on memcpy, is that memcpy has a length parameter that limits how much memory you copy. For strcpy and sprintf, the input has to be terminated with a \0; if it is not, they will continue out of bounds. You can limit this by using the snprintf and strncpy functions, which explicitly limit how much can be copied.
Note that Microsoft has deprecated snprintf, so you should use the replacement function _snprintf instead. However, this is an MSVC-specific function.
I would advise doing away with char * buffers altogether and switching to C++, using an STL container such as std::string. That will save you a lot of debugging headaches and keep your code portable.

Related

Why strcpy_s is safer than strcpy?

When I am trying to use the strcpy function, Visual Studio gives me an error:
error C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
After searching online and reading many answers from Stack Overflow, the summary is that strcpy_s is safer than strcpy when copying a large string into a shorter one.
So, I tried the following code for copying into a shorter string:
char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(a), a);
printf("String = %s", b);
The code compiles successfully. However, there is still a runtime error.
So, how is strcpy_s safe?
Am I understanding the safety concept wrong?
Why is strcpy_s() "safer"? Well, it's actually quite involved. (Note that this answer ignores any specific code issues in the posted code.)
First, when MSVC tells you standard functions such as strcpy() are "deprecated", at best Microsoft is being incomplete. At worst, Microsoft is downright lying to you. Ascribe whatever motivation you want to Microsoft here, but strcpy() and a host of other functions that MSVC calls "deprecated" are standard C functions and they are most certainly NOT deprecated by anyone other than Microsoft. So when MSVC warns you that a function required to be implemented in any conforming C compiler (most of which then flow by requirement into C++...) is "deprecated", it omits the "by Microsoft" part.
The "safer" functions that Microsoft is "helpfully" suggesting you use - such as strcpy_s() - would be standard, as they are part of the optional Annex K of the C standard, had Microsoft implemented them per the standard.
Per N1967 - Field Experience With Annex K — Bounds Checking Interfaces
Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.
Outside of a few specific cases (of which strcpy() is one), whether Microsoft's version of Annex K's "safer" bounds-checking functions are safer is debatable. Per N1967 (bolding mine):
Suggested Technical Corrigendum
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
Note, however, in the case of strcpy(), strcpy_s() is actually more akin to strncpy() as strcpy() is just a bog-standard C string function that doesn't do bounds checking, but strncpy() is a perverse function in that it will completely fill its target buffer, starting with data from the source string, and filling the entire target buffer with '\0' char values. Unless the source string fills the entire target buffer, in which case strncpy() will NOT terminate it with a '\0' char value.
I'll repeat that: strncpy() does not guarantee a properly terminated copy.
It's hard not to be "safer" than strncpy(). In this case strcpy_s() does not violate the principle of least astonishment like strncpy() does. I'd call that "safer".
But using strcpy_s() - and all the other "suggested" functions - makes your code de facto non-portable, as Microsoft is the only significant implementation of any form of Annex K's bounds-checking functions.
The header definition for C is:
errno_t strcpy_s(char *dest, rsize_t dest_size, const char *src)
The invocation for your example should be:
#include <string.h>   /* strcpy_s (MSVC / Annex K) */
#include <stdio.h>    /* printf */
char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(b), a);
printf("String = %s", b);
strcpy_s needs the size of the destination, which is smaller than the source in your example.
strcpy_s(b, sizeof(b), a);
would be the way to go.
As for the safety concept, many checks are now done at runtime, and there are better ways to handle errors.
In your example, had you used strcpy, you would have triggered a buffer overflow. Other functions, like strncpy, would have copied the first 3 characters without any null-byte terminator, which in turn would have triggered a buffer overflow (in reading, this time).

string literals and strcat

I am not sure why strcat works in this case for me:
char* foo="foo";
printf(strcat(foo,"bar"));
It successfully prints "foobar" for me.
However, as per an earlier topic discussed on stackoverflow here: I just can't figure out strcat
It says that the above should not work because foo points to a string literal. Instead, it needs to be a buffer (an array of a predetermined size, so that it can accommodate another string which we are trying to concatenate).
In that case, why does the above program work for me successfully?
This code invokes Undefined Behavior (UB), meaning that you have no guarantee of what will happen (failure here).
The reason is that string literals are immutable. That means they must not be modified, and any attempt to do so invokes UB.
Note what difficult logical errors can arise with UB: the code might work (today, on your system), but it is still wrong, which makes it very likely that you will miss the error and carry on as if everything were fine.
PS: In this Live Demo, I am lucky enough to get a Segmentation fault. I say lucky, because this seg fault will make me investigate and debug the code.
It's worth noting that GCC issues no warning, and the warning from Clang is also unrelated to the real problem:
prog.c:7:8: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
printf(strcat(foo,"bar"));
^~~~~~~~~~~~~~~~~
prog.c:7:8: note: treat the string as an argument to avoid this
printf(strcat(foo,"bar"));
^
"%s",
1 warning generated.
String literals are immutable in the sense that the compiler will operate under the assumption that you won't mutate them, not that you'll necessarily get an error if you try to modify them. In legalese, this is "undefined behavior", so anything can happen, and, as far as the standard is concerned, it's fine.
Now, on modern platforms and with modern compilers you do have extra protections: on platforms that have memory protection the string table generally gets placed in a read-only memory area, so that modifying it will get you a runtime error.
Still, you may have a compiler that doesn't provide any of the runtime-enforced checks, either because you are compiling for a platform without memory protection (e.g. pre-80386 x86, so pretty much any C compiler for DOS such as Turbo C, most microcontrollers when operating on RAM and not on flash, ...), or with an older compiler which doesn't exploit this hardware capability by default to remain compatible with older revisions (older VC++ for a long time), or with a modern compiler which has such an option explicitly enabled, again for compatibility with older code (e.g. gcc with -fwritable-strings). In all these cases, it's normal that you won't get any runtime error.
Finally, there's an extra devious corner case: current-day optimizers actively exploit undefined behavior - i.e. they assume that it will never happen, and modify the code accordingly. It's not impossible that a particularly smart compiler generates code that just drops such a write, as it's legally allowed to do anything it likes in such a case.
This can be seen for some simple code, such as:
int foo() {
    char *bar = "bar";
    *bar = 'a';
    if (*bar == 'b') return 1;
    return 0;
}
here, with optimizations enabled:
VC++ sees that the write is used just for the condition that immediately follows, so it simplifies the whole thing to return 0; no memory write, no segfault, it "appears to work" (https://godbolt.org/g/cKqYU1);
gcc 4.1.2 "knows" that literals don't change; the write is redundant and it gets optimized away (so, no segfault), the whole thing becomes return 1 (https://godbolt.org/g/ejbqDm);
any more modern gcc chooses a more schizophrenic route: the write is not elided (so you get a segfault with the default linker options), but if it succeeded (e.g. if you manually fiddle with memory protection) you'd get a return 1 (https://godbolt.org/g/rnUDYr) - so, memory modified but the code that follows thinks it hasn't been modified; this is particularly egregious on AVR, where there's no memory protection and the write succeeds.
clang does pretty much the same as gcc.
Long story short: don't try your luck and tread carefully. Always assign string literals to const char * (not plain char *) and let the type system help you avoid this kind of problem.
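A minimal sketch of that advice; with the const-qualified pointer, the bad call is rejected at compile time instead of compiling into undefined behavior:

#include <string.h>

int main(void)
{
    const char *foo = "foo";   /* literal bound to a pointer-to-const */
    (void)foo;

    /* strcat(foo, "bar");        <-- now a compile-time diagnostic: strcat wants char *,
                                      and const char * does not convert to it */

    char buf[8] = "foo";       /* a writable buffer is what strcat actually needs */
    strcat(buf, "bar");        /* "foobar" fits in the 8-byte buffer */
    return 0;
}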

Why use bzero over memset?

In a Systems Programming class I took this previous semester, we had to implement a basic client/server in C. When initializing the structs, like sock_addr_in, or char buffers (that we used to send data back and forth between client and server) the professor instructed us to only use bzero and not memset to initialize them. He never explained why, and I'm curious if there is a valid reason for this?
I see here: http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown that bzero is more efficient due to the fact that it is only ever going to be zeroing memory, so it doesn't have to do any additional checking that memset may do. That still doesn't necessarily seem like a reason to absolutely not use memset for zeroing memory, though.
bzero is considered deprecated, and furthermore is not a standard C function. According to the manual, memset is preferred over bzero for this reason. So why would you want to still use bzero over memset? Just for the efficiency gains, or is it something more? Likewise, what are the benefits of memset over bzero that make it the de facto preferred option for newer programs?
I don't see any reason to prefer bzero over memset.
memset is a standard C function while bzero has never been a standard C function. The rationale is probably that you can achieve exactly the same functionality using the memset function.
Now regarding efficiency, compilers like gcc use builtin implementations for memset which switch to a particular implementation when a constant 0 is detected. Same for glibc when builtins are disabled.
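For reference, the two calls being compared; they have the same effect, and with a builtin-aware compiler typically produce the same code (bzero lives in the POSIX header <strings.h>, not in standard C):

#include <string.h>    /* memset (standard C) */
#include <strings.h>   /* bzero  (POSIX, removed from POSIX.1-2008) */

char buf[64];

void clear_buf(void)
{
    memset(buf, 0, sizeof buf);   /* standard */
    bzero(buf, sizeof buf);       /* legacy equivalent */
}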
I'm guessing you used (or your teacher was influenced by) UNIX Network Programming by W. Richard Stevens. He uses bzero frequently instead of memset, even in the most up-to-date edition. The book is so popular, I think it's become an idiom in network programming which is why you still see it used.
I would stick with memset simply because bzero is deprecated and reduces portability. I doubt you would see any real gains from using one over the other.
The one advantage that I think bzero() has over memset() for setting memory to zero is that there's a reduced chance of a mistake being made.
More than once I've come across a bug that looked like:
memset(someobject, size_of_object, 0); // clear object
The compiler won't complain (though maybe cranking up some warning levels might on some compilers) and the effect will be that the memory isn't cleared. Because this doesn't trash the object - it just leaves it alone - there's a decent chance that the bug might not manifest into anything obvious.
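For contrast, the call the author of that buggy line intended (value first, then the byte count):

memset(someobject, 0, size_of_object); // clear object (arguments in the correct order)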
The fact that bzero() isn't standard is a minor irritant. (FWIW, I wouldn't be surprised if most function calls in my programs are non-standard; in fact writing such functions is kind of my job).
In a comment to another answer here, Aaron Newton cited the following from Unix Network Programming, Volume 1, 3rd Edition by Stevens, et al., Section 1.2 (emphasis added):
bzero is not an ANSI C function. It is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments). Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition in our unp.h header.

Indeed, the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type. (Actually, the second argument is an int and the third argument is size_t, which is typically an unsigned int, but the values specified, 0 and 16, respectively, are still acceptable for the other type of argument.) The call to memset still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0. Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
I also believe that the vast majority of calls to memset() are to zero memory, so why not use an API that is tailored to that use case?
A possible drawback to bzero() is that compilers might be more likely to optimize memset() because it's standard and so they might be written to recognize it. However, keep in mind that correct code is still better than incorrect code that's been optimized. In most cases, using bzero() will not have a noticeable impact on your program's performance, and bzero() can be a macro or inline function that expands to memset().
I wanted to mention something about the bzero vs. memset argument. Install ltrace and then compare what each does under the hood.
On Linux with libc6 (2.19-0ubuntu6.6), the calls made are exactly the same (via ltrace ./test123):
long m[] = {0}; // generates a call to memset(0x7fffefa28238, '\0', 8)
int* p;
bzero(&p, 4); // generates a call to memset(0x7fffefa28230, '\0', 4)
I've been told that unless I am working in the deep bowels of libc or some kernel/syscall interface, I don't have to worry about them.
All I should worry about is that the call satisfies the requirement of zeroing the buffer. Others have mentioned which one is preferable over the other, so I'll stop here.
You probably shouldn't use bzero, it's not actually standard C, it was a POSIX thing.
And note that word "was" - it was deprecated in POSIX.1-2001 and removed in POSIX.1-2008 in deference to memset so you're better off using the standard C function.
Have it any way you like. :-)
#ifndef bzero
#define bzero(d,n) memset((d),0,(n))
#endif
Note that:
The original bzero returns nothing, while memset returns a void pointer (d). This can be fixed by casting the result to void in the definition (see the sketch after this list).
#ifndef bzero does not prevent you from hiding the original function even if it exists. It tests the existence of a macro. This may cause lots of confusion.
It’s impossible to create a function pointer to a macro. When using bzero via function pointers, this will not work.
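A sketch that addresses those notes, under the assumption that a new name (here ZERO_MEM / zero_mem_fn, both made up) is acceptable: the result is cast to void, the standard name is not hidden, and a real function exists for use through function pointers:

#include <string.h>

#define ZERO_MEM(d, n) ((void)memset((d), 0, (n)))

static void zero_mem_fn(void *d, size_t n)   /* usable where a function pointer is required */
{
    memset(d, 0, n);
}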
For the memset function, the second argument is an int and the third argument is a size_t,
void *memset(void *s, int c, size_t n);
which is typically an unsigned int. If values like 0 and 16 for the second and third arguments respectively are passed in the wrong order, as 16 and 0, then
such a call to memset can still work, but will do nothing, because the number of bytes to initialize is specified as 0.
void bzero(void *s, size_t n)
Such an error can be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
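A quick illustration of that point (the misuse of bzero is shown commented out because it does not get past the compiler):

#include <string.h>    /* memset */
#include <strings.h>   /* bzero (POSIX) */

char addr[16];

void clear_addr(void)
{
    memset(addr, 16, 0);    /* swapped arguments: compiles, silently does nothing    */
    /* bzero(16, addr); */  /* swapped arguments: rejected, 16 is not a void pointer */
    bzero(addr, 16);        /* the intended call */
}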
In short: memset requires more assembly operations than bzero.
This is the source:
http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown
memset takes 3 parameters, bzero takes 2.
In memory-constrained code that extra parameter would take 4 more bytes, and most of the time the call is used to set everything to 0 anyway.

Type-casting in C for standard function

I am working on a C project (still pretty new to C), and I am trying to remove all of the warnings when it's compiled.
The original coders of this project have made a type called dyn_char (dynamic char arr) and it's an unsigned char * type. Here's a copy of one of the warnings:
warning: argument #1 is incompatible with prototype:
    prototype: pointer to char : ".../stdio_iso.h", line 210
    argument : pointer to unsigned char
They also use lots of standard string functions like strlen(), so the way that I have been removing these warnings is like this:
strlen((char *)myDynChar);
I can do this but some of the files have hundreds of these warnings. I could do a Find and Replace to search for strlen( and replace with strlen((char*), but is there a better way?
Is it possible to use a Macro to do something similar? Maybe something like this:
#define strlen(s) strlen((char *)s)
Firstly, would this work? Secondly, if so, is it a bad idea to do this?
Thanks!
This is an annoying problem, but here's my two cents on it.
First, if you can confidently change the type of dyn_char to just be char *, I would do that. Perhaps if you have a robust test program or something you can try it out and see if it still works?
If not, you have two choices as far as I can see: fix what's going into strlen(), or have your compiler ignore those warnings (or ignore them yourself)! I'm not one for ignoring warnings unless I have to, but as far as fixing what goes into strlen...
If your underlying type is unsigned char *, then casting what goes into strlen() is basically telling the compiler to assume that the argument, for the purposes of being passed to strlen(), is a char *. If strlen() is the only place this is causing an issue and you can't safely change the type, then I'd consider a search-and-replace to add in casts to be the preferable option. You could redefine strlen with a #define like you suggested (I just tried it out and it worked for me), but I would strongly recommend not doing this. If anything, I'd search-replace strlen() with USTRLEN() or something (a fake function name), and then use that as your casting macro. Overriding C library functions with your own names transparently is a maintainability nightmare!
Two points on that: first, you're using a different name. Second, you're using all-caps, as is the convention for defining such a macro.
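A sketch of that renamed casting macro (USTRLEN is the answer's made-up name, and dyn_len below is a hypothetical caller, not part of the project):

#include <string.h>

/* all-caps name makes it obvious this is a local casting wrapper, not strlen itself */
#define USTRLEN(s) strlen((const char *)(s))

size_t dyn_len(const unsigned char *myDynChar)
{
    return USTRLEN(myDynChar);
}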
This may or may not work
strlen may be a macro in the system header, in which case you will get a warning about redefining a macro, and you won't get the functionality of the existing macro.
(The Visual Studio stdlib does many interesting things with macros in <string.h>. strcpy is defined this way:
__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(char *, __RETURN_POLICY_DST, __EMPTY_DECLSPEC, strcpy, _Pre_cap_for_(_Source) _Post_z_, char, _Dest, _In_z_ const char *, _Source))
I wouldn't be surprised at all if #defining strcpy broke this.)
Search and replace may be your best option. Don't hide subtle differences like this behind macros - you will just pass your pain on to the next maintainer.
Instead of adding a cast to all the calls, you may want to change all the calls to dyn_strlen, which is a function you create that calls strlen with the appropriate cast.
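A hedged sketch of such a wrapper, assuming dyn_char is the project's typedef for unsigned char * as the question describes:

#include <string.h>

typedef unsigned char *dyn_char;   /* as described in the question */

size_t dyn_strlen(dyn_char s)
{
    return strlen((const char *)s);   /* the cast lives in exactly one place */
}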
You can define a function:
char *uctoc(unsigned char *p) { return (char *)p; }
and do the search-and-replace of strstr(x with strstr(uctoc(x). At least you get some type checking. Later you can convert uctoc to a macro for performance.

Bounds checking for Variable Length Arrays (VLA)?

Is there a way to check for buffer overflows in VLAs? I used -fstack-protector-all -Wstack-protector but get these warnings:
warning: not protecting local variables: variable length buffer
Is there a library for achieving this? (-lefence is for heap memory.)
I'm currently using Valgrind and gdb.
You can use -fmudflap instead of -fstack-protector-all
Update: Some documentation and options are here http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging
Perhaps using alloca() will help. That's annoying, because C99 should save you from having to use it, but the GCC man page seems to say that the stack protection code will be turned on if you use alloca().
Of course the real solution is to write perfect, bug free code that never tries to corrupt the stack.
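Failing that, the alloca() rewrite would look something like this (a sketch; alloca is non-standard, glibc declares it in <alloca.h>, and the memory still lives until the function returns):

#include <alloca.h>

void work(int n)
{
    int *data = alloca(n * sizeof *data);   /* replaces: int data[n]; */
    data[0] = 0;
}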
I don't see how a library could do this for you; with a variable-length array, you're not calling any functions to do the indexing, so there's no place to "hook in" a library. With malloc(), the allocation is explicit in a function and you can track it.
Of course, you could go through the code and use preprocessor trickery to add some macro to each indexing point, and have the macro expand to code that checks the boundaries. But that is very intrusive.
I'm thinking something like changing:
void work(int n)
{
    int data[n]; /* Our variable-length array. */
    data[0] = 0;
}
into something like:
#include "vla-tracking.h"
void work(int n)
{
VLA_NEW(int, data, n); /* Our variable-length array. */
VLA_SET(data, 0, 0);
}
Then come up with suitable macro definitions (and auxiliary code) to track the accesses. As I said, it won't be pretty. Of course, the idea is that the macros would be able to "compile out" to just the plain definitions, controlled by some build-time setting (debug/release mode, or whatever).
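One possible shape for those hypothetical macros (vla-tracking.h is not an existing library; debug builds check each index with assert, release builds compile down to a plain VLA):

/* vla-tracking.h -- hypothetical sketch */
#ifndef VLA_TRACKING_H
#define VLA_TRACKING_H

#include <assert.h>
#include <stddef.h>

#ifdef VLA_DEBUG
#define VLA_NEW(type, name, count) \
    size_t name##_len = (count);   \
    type name[name##_len]
#define VLA_SET(name, idx, value) \
    (assert((size_t)(idx) < name##_len), (name)[(idx)] = (value))
#define VLA_GET(name, idx) \
    (assert((size_t)(idx) < name##_len), (name)[(idx)])
#else
#define VLA_NEW(type, name, count) type name[(count)]
#define VLA_SET(name, idx, value)  ((name)[(idx)] = (value))
#define VLA_GET(name, idx)         ((name)[(idx)])
#endif

#endif /* VLA_TRACKING_H */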

Resources