When I try to use the strcpy function, Visual Studio gives me an error:
error C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
After searching online and reading many answers on StackOverflow, the summary is that strcpy_s is safer than strcpy when copying a large string into a shorter one.
So, I tried the following code for copying into a shorter string:
char a[50] = "void";
char b[3];
strcpy_s(b, sizeof(a), a);
printf("String = %s", b);
The code compiles successfully. However, there is still a runtime error.
So, how is strcpy_s safe?
Am I understanding the safety concept wrong?
Why is strcpy_s() "safer"? Well, it's actually quite involved. (Note that this answer ignores any specific code issues in the posted code.)
First, when MSVC tells you standard functions such as strcpy() are "deprecated", at best Microsoft is being incomplete. At worst, Microsoft is downright lying to you. Ascribe whatever motivation you want to Microsoft here, but strcpy() and a host of other functions that MSVC calls "deprecated" are standard C functions and they are most certainly NOT deprecated by anyone other than Microsoft. So when MSVC warns you that a function required to be implemented in any conforming C compiler (most of which then flow by requirement into C++...) is "deprecated", it omits the "by Microsoft" part.
The "safer" functions that Microsoft is "helpfully" suggesting that you use - such as strcpy_s() - would be standard, as they are part of the optional Annex K of the C standard, had Microsoft implemented them per the standard.
Per N1967 - Field Experience With Annex K — Bounds Checking Interfaces
Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.
Outside of a few specific cases (of which strcpy() is one), whether Microsoft's version of Annex K's "safer" bounds-checking functions are safer is debatable. Per N1967 (bolding mine):
Suggested Technical Corrigendum
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
Note, however, that in the case of strcpy(), strcpy_s() is actually more akin to strncpy(): strcpy() is just a bog-standard C string function that doesn't do bounds checking, while strncpy() is a perverse function that copies characters from the source string into the target buffer and then fills any remaining space in that buffer with '\0' char values. Unless the source string fills the entire target buffer, in which case strncpy() will NOT terminate it with a '\0' char value.
I'll repeat that: strncpy() does not guarantee a properly terminated copy.
It's hard not to be "safer" than strncpy(). In this case strcpy_s() does not violate the principle of least astonishment like strncpy() does. I'd call that "safer".
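To make the strncpy() pitfall concrete, here is a minimal sketch (the buffer contents and sizes are mine, chosen purely for illustration):

#include <stdio.h>
#include <string.h>

int main(void) {
    char dst[4];
    /* The source needs 9 bytes (8 chars + '\0'); only 4 are copied. */
    strncpy(dst, "overflow", sizeof(dst));
    /* dst now holds { 'o', 'v', 'e', 'r' } -- no terminator at all,
       so printf("%s", dst) at this point would read past the end
       of dst (undefined behavior). */
    dst[sizeof(dst) - 1] = '\0';   /* manual termination */
    printf("String = %s\n", dst);  /* prints "ove" */
    return 0;
}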
But using strcpy_s() - and all the other "suggested" functions - makes your code de facto non-portable, as Microsoft is the only significant implementation of any form of Annex K's bounds-checking functions.
The C declaration in the header is:
errno_t strcpy_s(char *dest, rsize_t dest_size, const char *src);
The invocation for your example should be:
#include <string.h>   /* strcpy_s is declared in <string.h> */
#include <stdio.h>

char a[50] = "void";
char b[3];
/* Pass the size of the DESTINATION. Note that "void" (5 bytes with
   its terminator) still does not fit in b[3]; strcpy_s will now
   detect that at runtime instead of silently overflowing. */
strcpy_s(b, sizeof(b), a);
printf("String = %s", b);
strcpy_s needs the size of the destination, which is smaller than the source in your example.
strcpy_s(b, sizeof(b), a);
would be the way to go.
As for the safety concept: many checks are now performed, and there are better ways to handle errors.
In your example, had you used strcpy, you would have triggered a buffer overflow. A function like strncpy would have copied the first 3 characters without any null terminator, which in turn would have triggered a buffer overflow on a later read. (strlcpy, where available, truncates but does null-terminate.)
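To see how the error handling is supposed to work, here is a minimal sketch against the standard (Annex K) interface. It assumes a fully conforming implementation that provides set_constraint_handler_s, which, as discussed above, MSVC is not:

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>   /* set_constraint_handler_s, ignore_handler_s */
#include <string.h>   /* strcpy_s */
#include <stdio.h>

int main(void) {
    char a[50] = "void";
    char b[3];
    /* Report violations via the return value instead of aborting. */
    set_constraint_handler_s(ignore_handler_s);
    errno_t err = strcpy_s(b, sizeof(b), a);
    if (err != 0)
        fprintf(stderr, "copy rejected: source does not fit\n");
    return 0;
}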
Why is the gets function so dangerous that it should not be used?
The declaration of gets is:
char * gets ( char * str );
Note the glaring omission of a maximum size for str.
cplusplus.com says:
Notice that gets is quite different from fgets: not only gets uses
stdin as source, but it does not include the ending newline character
in the resulting string and does not allow to specify a maximum size
for str (which can lead to buffer overflows).
And also:
The most recent revision of the C standard (2011) has definitively
removed this function from its specification. The function is
deprecated in C++ (as of 2011 standard, which follows C99+TC3).
Now, of course, fgets is commonly recommended as a replacement of gets, because its declaration looks like this:
char * fgets ( char * str, int num, FILE * stream );
It DOES take a size parameter. This makes it much safer than gets.
Now since I'm not willing to shell out money to download or buy the C11 standard, can anyone shed some light on the reason for deprecating gets and what it means for future code? Why did it exist in the first place when fgets is safer? And why is it only just now being deprecated?
gets is deprecated because it's unsafe; as you already quoted, it may cause buffer overflow. As a replacement, C11 provides an alternative gets_s with a signature like this:
char *gets_s(char *s, rsize_t n);
Note that C11 still recommends fgets to replace gets.
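For completeness, a minimal sketch of the usual fgets idiom (the buffer size of 128 is an arbitrary choice):

#include <stdio.h>
#include <string.h>

int main(void) {
    char line[128];
    if (fgets(line, sizeof(line), stdin) != NULL) {
        /* Unlike gets, fgets keeps the newline; strip it if present. */
        line[strcspn(line, "\n")] = '\0';
        printf("read: %s\n", line);
    }
    return 0;
}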
Putting gets in the standard was controversial in the first place, but the Committee decided that gets was useful when the programmer does have adequate control over the input.
Here's the official explanation by the Committee.
Rationale for International Standard - Programming Languages C §7.19.7.7 The gets function:
Because gets does not check for buffer overrun, it is generally unsafe to use when its input is not under the programmer’s control. This has caused some to question whether it should appear in the Standard at all. The Committee decided that gets was useful and convenient in those special circumstances when the programmer does have adequate control over the input, and as longstanding existing practice, it needed a standard specification. In general, however, the preferred function is fgets (see §7.19.7.2).
Now since I'm not willing to shell out money to download or buy the C11 standard, can anyone shed some light on the reason for deprecating gets and what it means for future code?
From the C committee in the C99 Rationale:
Because gets does not check for buffer overrun, it is generally unsafe to use when its input is not under the programmer’s control. This has caused some to question whether it should appear in the Standard at all. The Committee decided that gets was useful and convenient in those special circumstances when the programmer does have adequate control over the input, and as longstanding existing practice, it needed a standard specification. In general, however, the preferred function is fgets.
Other C library functions, like strcpy and strcat, have variants that limit the size of the string involved (strncpy, etc.), so I am wondering why there is no such variant for strchr?
It does exist -- it is called memchr:
http://en.wikipedia.org/wiki/C_string_handling
In C, the term "string" usually means "null-terminated array of characters", and the str* functions operate on those kinds of strings. The n in the functions you mention mostly serves to limit the output.
If you want to operate on an arbitrary byte sequence without any implied termination semantics, use the mem* family of functions; in your case memchr should serve your needs, as the example below shows.
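For example, a bounded search with memchr over a buffer that is not null-terminated (the data is chosen for illustration):

#include <stdio.h>
#include <string.h>

int main(void) {
    char data[4] = { 'a', 'b', 'c', 'd' };   /* no '\0' -- strchr would be unsafe here */
    char *p = memchr(data, 'c', sizeof(data));
    if (p != NULL)
        printf("found at index %td\n", p - data);
    return 0;
}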
strnchr and also strnlen are defined in some Linux environments, for example https://manpages.debian.org/experimental/linux-manual-4.11/strnchr.9.en.html. Something like them really is necessary: a program may crash by running off the end of a memory area if strlen or strcmp never finds a '\0' terminator. Unfortunately, such things are often not standardized, or standardized too late and in too complicated a form. strnlen_s exists in C11, but strnchr_s is not available.
You may find more information about such problems on my internet page: https://www.vishia.org/emcdocu/html/portability_emC.html. I have defined some C functions, strnchr_emC... etc., which deliver the required functionality. To achieve compatibility you can define
#define strnchr strnchr_emC
in a common but platform-specific header. Refer to the further content on https://www.vishia.org/emc/. You can find the sources in https://github.com/JzHartmut.
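If you need this portably, such a bounded strchr is short enough to sketch yourself; the name my_strnchr and the stop-at-'\0' semantics below are my own choices, not any standard's:

#include <stddef.h>

/* Search the first n bytes of s for c, stopping early at a '\0'.
   Returns a pointer to the match, or NULL if c is not found. */
static char *my_strnchr(const char *s, size_t n, int c)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == (char)c)
            return (char *)&s[i];
        if (s[i] == '\0')
            break;
    }
    return NULL;
}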
Why this distinction? I've ended up with terrible problems by assuming itoa to be in stdlib.h, and finally linking a custom version of itoa with a different prototype, thus producing some crazy errors.
So, why isn't itoa a standard function? What's wrong with it? And why is the standard partial towards its twin brother atoi?
No itoa has ever been standardised, so to add it to the standard you would need a compelling reason and a good interface.
Most itoa interfaces that I have seen either use a static buffer, which has re-entrancy and lifetime issues; allocate a dynamic buffer that the caller needs to free; or require the user to supply a buffer, which makes the interface no better than sprintf.
An "itoa" function would have to return a string. Since strings aren't first-class objects, the caller would have to pass a buffer + length and the function would have to have some way to indicate whether it ran out of room or not. By the time you get that far, you've created something similar enough to sprintf that it's not worth duplicating the code/functionality. The "atoi" function exists because it's less complicated (and arguably safer) than a full "scanf" call. An "itoa" function wouldn't be different enough to be worth it.
The itoa function probably isn't standard because there is no consistent definition of it. Different compiler and library vendors have introduced subtly different versions of it, possibly as an invention to serve as a complement to atoi.
If some non-standard function is widely provided by vendors, the standard's job is to codify it: basically add a description of the existing function to the standard. This is possible if the function has more or less consistent argument conventions and behavior.
Because multiple flavors of itoa are already out there, such a function cannot be added into ISO C. Whatever behavior is described would be at odds with some implementations.
itoa has existed in forms such as:
void itoa(int n, char *s); /* Given in _The C Programming Language_, 1st ed. (K&R1) */
void itoa(int input, void (*subr)(char)); /* Ancient Unix library */
void itoa(int n, char *buf, int radix);
char *itoa(int in, char *buf, int radix);
Microsoft provides it in their Visual C Run Time Library under the altered name: _itoa.
Not only have C implementations historically provided it under differing definitions, C programs have also provided a function named itoa for themselves, which is another source of possible clashes.
Basically, the itoa identifier is "radioactive" with regard to standardization as an external name or macro. If such a function is standardized, it will have to be under a different name.
This is a nitpicky-details question with three parts. The context is that I wish to persuade some folks that it is safe to use <stddef.h>'s definition of offsetof unconditionally rather than (under some circumstances) rolling their own. The program in question is written entirely in plain old C, so please ignore C++ entirely when answering.
Part 1: When used in the same manner as the standard offsetof, does the expansion of this macro provoke undefined behavior per C89, why or why not, and is it different in C99?
#define offset_of(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
Note: All implementations of interest to the people whose program this is supersede the standard's rule that pointers may only be subtracted from each other when they point into the same array, by defining all pointers, regardless of type or value, to point into a single global address space. Therefore, please do not rely on that rule when arguing that this macro's expansion provokes undefined behavior.
Part 2: To the best of your knowledge, has there ever been a released, production C implementation that, when fed the expansion of the above macro, would (under some circumstances) behave differently than it would have if its offsetof macro had been used instead?
Part 3: To the best of your knowledge, what is the most recently released production C implementation that either did not provide stddef.h or did not provide a working definition of offsetof in that header? Did that implementation claim conformance with any version of the C standard?
For parts 2 and 3, please answer only if you can name a specific implementation and give the date it was released. Answers that state general characteristics of implementations that may qualify are not useful to me.
There is no way to write a portable offsetof macro. You must use the one provided by stddef.h.
Regarding your specific questions:
The macro invokes undefined behavior. You cannot subtract pointers except when they point into the same array.
The big difference in practical behavior is that the macro is not an integer constant expression, so it can't safely be used for static initializers, bitfield widths, etc. Also strict bounds-checking-type C implementations might completely break it.
There has never been any C standard that lacked stddef.h and offsetof. Pre-ANSI compilers might lack it, but they have much more fundamental problems that make them unusable for modern code (e.g. lack of void * and const).
Moreover, even if some theoretical compiler did lack stddef.h, you could just provide a drop-in replacement, just like the way people drop in stdint.h for use with MSVC...
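The integer-constant-expression point is easy to demonstrate; in this sketch (struct and names invented for illustration) the file-scope initializer is only legal because offsetof expands to an integer constant expression, which the pointer-subtraction macro need not be:

#include <stddef.h>

struct packet {
    int  header;
    char payload[16];
};

/* Requires a constant expression: fine with the standard offsetof,
   but not guaranteed to compile with a hand-rolled pointer hack. */
static const size_t payload_off = offsetof(struct packet, payload);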
To answer #2: yes, gcc-4* (I'm currently looking at v4.3.4, released 4 Aug 2009, but it should hold true for all gcc-4 releases to date). The following definition is used in their stddef.h:
#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
where __builtin_offsetof is a compiler builtin like sizeof (that is, it's not implemented as a macro or run-time function). Compiling the code:
#include <stddef.h>
struct testcase {
char array[256];
};
int main (void) {
char buffer[offsetof(struct testcase, array[100])];
return 0;
}
would result in an error using the expansion of the macro that you provided ("size of array ‘buffer’ is not an integral constant-expression") but would work when using the macro provided in stddef.h. Builds using gcc-3 used a macro similar to yours. I suppose that the gcc developers had many of the same concerns regarding undefined behavior, etc. that have been expressed here, and created the compiler builtin as a safer alternative to attempting to generate the equivalent operation in C code.
Additional information:
A mailing list thread from the Linux kernel developer's list
GCC's documentation on offsetof
A sort-of-related question on this site
Regarding your other questions: I think R's answer and his subsequent comments do a good job of outlining the relevant sections of the standard as far as question #1 is concerned. As for your third question, I have not heard of a modern C compiler that does not have stddef.h. I certainly wouldn't consider any compiler lacking such a basic standard header as "production". Likewise, if their offsetof implementation didn't work, then the compiler still has work to do before it could be considered "production", just like if other things in stddef.h (like NULL) didn't work. A C compiler released prior to C's standardization might not have these things, but the ANSI C standard is over 20 years old so it's extremely unlikely that you'll encounter one of these.
The whole premise of this problem raises a question: If these people are convinced that they can't trust the version of offsetof that the compiler provides, then what can they trust? Do they trust that NULL is defined correctly? Do they trust that long int is no smaller than a regular int? Do they trust that memcpy works like it's supposed to? Do they roll their own versions of the rest of the C standard library functionality? One of the big reasons for having language standards is so that you can trust the compiler to do these things correctly. It seems silly to trust the compiler for everything else except offsetof.
Update: (in response to your comments)
I think my co-workers behave like yours do :-) Some of our older code still has custom macros defining NULL, VOID, and other things like that since "different compilers may implement them differently" (sigh). Some of this code was written back before C was standardized, and many older developers are still in that mindset even though the C standard clearly says otherwise.
Here's one thing you can do to both prove them wrong and make everyone happy at the same time:
#include <stddef.h>
#ifndef offsetof
#define offsetof(tp, member) (((char*) &((tp*)0)->member) - (char*)0)
#endif
In reality, they'll be using the version provided in stddef.h. The custom version will always be there, however, in case you run into a hypothetical compiler that doesn't define it.
Based on similar conversations that I've had over the years, I think the belief that offsetof isn't part of standard C comes from two places. First, it's a rarely used feature. Developers don't see it very often, so they forget that it even exists. Second, offsetof is not mentioned at all in Kernighan and Ritchie's seminal book "The C Programming Language" (even the most recent edition). The first edition of the book was the unofficial standard before C was standardized, and I often hear people mistakenly referring to that book as THE standard for the language. It's much easier to read than the official standard, so I don't know if I blame them for making it their first point of reference. Regardless of what they believe, however, the standard is clear that offsetof is part of ANSI C (see R's answer for a link).
Here's another way of looking at question #1. The ANSI C standard gives the following definition in section 4.1.5:
offsetof( type, member-designator)
which expands to an integral constant expression that has type size_t,
the value of which is the offset in bytes, to the structure member
(designated by member-designator ), from the beginning of its
structure (designated by type ).
Using the offsetof macro does not invoke undefined behavior. In fact, the behavior is all that the standard actually defines. It's up to the compiler writer to define the offsetof macro such that its behavior follows the standard. Whether it's implemented using a macro, a compiler builtin, or something else, ensuring that it behaves as expected requires the implementor to deeply understand the inner workings of the compiler and how it will interpret the code. The compiler may implement it using a macro like the idiomatic version you provided, but only because they know how the compiler will handle the non-standard code.
On the other hand, the macro expansion you provided indeed invokes undefined behavior. Since you don't know enough about the compiler to predict how it will process the code, you can't guarantee that particular implementation of offsetof will always work. Many people define their own version like that and don't run into problems, but that doesn't mean that the code is correct. Even if that's the way that a particular compiler happens to define offsetof, writing that code yourself invokes UB while using the provided offsetof macro does not.
Rolling your own macro for offsetof can't be done without invoking undefined behavior (ANSI C section A.6.2 "Undefined behavior", 27th bullet point). Using stddef.h's version of offsetof will always produce the behavior defined in the standard (assuming a standards-compliant compiler). I would advise against defining a custom version since it can cause portability problems, but if others can't be persuaded then the #ifndef offsetof snippet provided above may be an acceptable compromise.
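As a usage note, the portable macro also makes struct padding visible, which is exactly what a hand-rolled byte-counting guess tends to get wrong (struct invented for illustration):

#include <stddef.h>
#include <stdio.h>

struct record {
    char   tag;      /* 1 byte ... */
    double value;    /* ... but usually placed at an aligned offset, not 1 */
};

int main(void) {
    printf("tag   at offset %zu\n", offsetof(struct record, tag));
    printf("value at offset %zu\n", offsetof(struct record, value));
    return 0;
}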
(1) The undefined behavior is already there before you do the subtraction.
First of all, (tp*)0 is not what you think it is. It is a null pointer; such a beast is not necessarily represented with an all-zero bit pattern.
Then the member operator -> is not simply an offset addition. On a CPU with segmented memory this might be a more complicated operation.
Taking the address with an & operation is UB if the expression is not a valid object.
(2) For point 2, there are certainly still architectures out in the wild (embedded stuff) that use segmented memory. For point 3, the point that R makes about integer constant expressions has another drawback: if the code is badly optimized, the & operation might be done at runtime and signal an error.
(3) Never heard of such a thing, but this is probably not enough to convince your colleagues.
I believe that nearly every optimizing compiler has broken that macro at multiple points in time. Your coworkers have apparently been lucky enough not to have been hit by it.
What happens is that some junior compiler engineer decides that because the zero page is never mapped on their platform of choice, any time anyone does anything with a pointer to that page, that's undefined behavior and they can safely optimize away the whole expression. At that point, everyone's homebrew offsetof macros break until enough people scream about it, and those of us who were smart enough not to roll our own go happily about our business.
I don't know of any compiler where this is the behavior in the current released version, but I think I've seen it happen at some point with every compiler I've ever worked with.
I am currently working on a C project that needs to be fairly portable among different building environments. The project targets POSIX-compliant systems on a hosted C environment.
One way to achieve a good degree of portability is to code under conformance to a chosen standard, but it is difficult to determine whether a given translation unit is strict-conformant to ISO C. For example, it might violate some translation limits, or it might be relying on an undefined behavior, without any diagnostic message from the compilation environment. I am not even sure whether it is possible to check for strict conformance of large projects.
With that in mind, is there any compiler, tool or method to test for strict ISO C conformance under a given standard (for example, C89 or C99) of a translation unit?
Any help is appreciated.
It is not possible in general to find undefined run-time behavior. For example, consider
void foo(int *p, int *q)
{
*p = (*q)++;
...
which is undefined if p == q. Whether that can happen can't be determined ahead of time without solving the halting problem.
(Edited to fix mistake caf pointed out. Thanks, caf.)
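To make the p == q case concrete, here is a hypothetical call site for the foo above; whether it is ever reached can depend on runtime input, which is why no static checker can rule it out in general:

void caller(int runtime_flag)
{
    int x = 1;
    if (runtime_flag)
        foo(&x, &x);   /* *p = (*q)++ becomes x = x++ : undefined */
}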
Not really. The C standard doesn't set any absolute minimum limits on translation units that must be accepted. As such, a perfectly accurate checker would be trivial to write, but utterly useless in practice:
#include <stdio.h>
int main(int argc, char **argv) {
int i;
for (i=1; i<argc; i++)
fprintf(stderr, "`%s`: Translation limit (potentially) exceeded.\n", argv[i]);
return 0;
}
Yes, this rejects everything, no matter how trivial. That is in accordance with the standard. As I said, it's utterly useless in practice. Unfortunately, you can't really do a whole lot better -- when you decide to port to a different implementation, you could run into some oddball resource limit you've never seen before, so any code you write (up to and including "hello world") could potentially exceed a resource limit despite being allowed by dozens or even hundreds of compilers on/for much smaller systems.
Edit:
Why a "hello world" program isn't strictly conforming
First, it's worth re-stating the definition of "strictly conforming": "A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit."
There are actually a number of reasons "Hello, World" isn't strictly conforming. First, as implied above, the minimum requirements for implementation limits are completely meaningless -- although there has to be some program that meets certain limits that will be accepted, no other program has to be accepted, even if it doesn't even come close to any of those limits. Given the way the requirement is stated, it's open to question (at best) whether there is any such thing as a program that doesn't exceed any minimum implementation limit, because the standard doesn't really define any minimum implementation limits.
Second, during phase 1 of translation: "Physical source file multibyte characters are mapped, in an implementation defined manner, to the source character set ... " (§5.1.1.2/1). Since "Hello, World!" (or whatever variant you prefer) is supplied as a string literal in the source file, it can be (is) mapped in an implementation-defined manner to the source character set. An implementation is free to decide that (for an idiotic example) string literals will be ROT13 encoded, and as long as that fact is properly documented, it's perfectly legitimate.
Third, the output is normally written via stdout. stdout is a text stream. According to the standard: "Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation." (§7.19.2/2) As such, an implementation could (for example) do Huffman compression on the output (on Monday, Wednesday, or Friday).
So, we have (at least) three distinct points at which the output from a "Hello, World!" depends on implementation-defined characteristics -- any one of which would prevent it from fitting the definition of a strictly conforming program.
gcc has warning levels that will attempt to pin down various aspects of ANSI conformance. But that's only a starting point.
You might start with gcc -std=c99, or gcc -ansi -pedantic.
Good luck with that. Try to avoid signed integers, because:
int f(int x)
{
    return -x;   /* undefined behavior when x == INT_MIN */
}

can invoke undefined behavior.
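The trap is INT_MIN: on a two's complement machine, -INT_MIN does not fit in an int. A defensive sketch (the name safe_negate is mine):

#include <limits.h>

/* Negate x with defined behavior: reports failure instead of
   overflowing when x == INT_MIN. */
int safe_negate(int x, int *out)
{
    if (x == INT_MIN)
        return -1;   /* -x would overflow */
    *out = -x;
    return 0;
}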