Why didn't gcc (or glibc) implement _s functions? - c

The _s functions, such as scanf_s and printf_s, seem to be an optional part of the standard. MSVC has implemented these functions, but gcc hasn't.
Is there a specific reason for not implementing the secure functions? Is glibc's scanf secure enough?

The _s functions are optional (Annex K of the C11 standard). They're widely regarded as 'not very beneficial'.
In the answers to my question Do you use the TR-24731 "safe" functions?, you can find information about where there are problems with the standard specification — such as crucial differences between the standard and Microsoft's implementation. TR 24731-1 was a technical report from the C standard committee. The report was incorporated almost verbatim — with an extra, previously omitted, function, memset_s() — in the C11 standard as (optional but 'normative') Annex K. There's also TR 24731-2 for a different set of functions — without the _s suffix. It ran into resistance for a different set of reasons.
Also, there is a proposal before the C Standard Committee that the functions defined in Annex K should be removed from the next revision of the standard:
N1967 Field Experience with Annex K — Bounds Checking Interfaces
That paper is a straightforward and compelling read of the reasons why the TR-24731 (*_s()) functions have not been widely implemented.
Key reasons include:
The problem is only spotted once, then fixed, and then the *_s() function is unnecessary.
This makes it very hard to test the *_s() functions, or the code which uses them.
It isn't easy to integrate the new functions into old code (which is where there'd be most benefit).
The functions inherently slow down software with extensive but redundant checking.
See the paper for more details. The paper ends with the section:
Suggested Technical Corrigendum
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
Annex K was not removed from C17. Two new papers from the standards committee (ISO JTC1/SC22/WG14) discuss Annex K (and are in favour of retaining the functions):
Bounds-checking Interfaces: Field Experience and Future Directions
Annex K Repairs

Related

strtok_s and compilers C11 onward compliance

The declaration of strtok_s in C11, and its usage, look to be very different from the strtok_s in compilers like the latest bundled with Visual Studio 2022 (17.4.4) and also GCC 12.2.0 (looking at MinGW64 distribution).
I fear the different form has been developed as a safer and accepted alternative to strtok long before C11. What happens now if someone wants to use strtok_s and stay C11 compliant?
Are the compiler supplied libraries C11 compliant?
Maybe it's just that I've been fooled by something otherwise obvious, and someone can help me...
This is the C11 declaration (C17 and early drafts of C23 are similar):
char *strtok_s(char * restrict s1,
rsize_t * restrict s1max,
const char * restrict s2,
char ** restrict ptr);
the same can be found as a good reference in the safec library
While MSC/VC and GCC have the form
char* strtok_s(
char* str,
const char* delimiters,
char** context
);
The C11 "Annex K bounds checking interfaces" was received with a lot of scepticism and in practice nearly no standard lib implemented it. See for example Field Experience With Annex K — Bounds Checking Interfaces.
As for the MSVC compiler, it doesn't conform to any C standard and never made such claims - you can try this out to check if you are using such a compiler or not:
#if !defined(__STDC__) || (__STDC__==0)
#error This compiler is non-conforming.
#endif
In particular, MSVC did not implement Annex K either, but already had non-standard library extensions in place prior to C11.
In practice _s means:
Possibly more safe or possibly less safe, depending on use and what the programmer expected.
Non-portable.
Possibly non-conforming.
If portability and standard conformance are important, then avoid _s functions.
In practice _s functions protect against two things: getting passed non-sanitized input or null pointers. So assuming that you do proper input sanitation and don't pass null pointers to library functions, the _s functions aren't giving you extra safety, just extra execution bloat and portability problems.
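As a rough illustration of those two checks, here is a minimal sketch of a length-checked copy written in standard C. The name copy_checked and its return convention are made up for this example. The Annex K branch is only compiled where the implementation defines __STDC_LIB_EXT1__, and it assumes the installed constraint handler returns instead of aborting.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* request Annex K declarations where they exist */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, whose capacity is dstsz bytes
 * including the terminator. Returns 0 on success, -1 if a pointer is null
 * or src does not fit. */
int copy_checked(char *dst, size_t dstsz, const char *src)
{
#if defined(__STDC_LIB_EXT1__)
    /* Annex K path. Assumes the current constraint handler returns
     * (for example ignore_handler_s); the default is implementation-defined. */
    return strcpy_s(dst, dstsz, src) == 0 ? 0 : -1;
#else
    size_t n;

    if (dst == NULL || src == NULL || dstsz == 0)
        return -1;                 /* the null-pointer check */
    n = strlen(src);
    if (n >= dstsz) {              /* the bounds check */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);       /* n + 1 copies the terminator too */
    return 0;
#endif
}
```

If the caller has already validated the length, both branches repeat work that plain strcpy would not do, which is the overhead the answer refers to.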
What happens now if someone wants to use strtok_s and stay C11 compliant?
You de facto can't.
And it's not limited to just strtok_s(). The entire C11 Annex K set of implementations is fractured, and because the major deviations from the standard are from Microsoft's implementation, there will probably never be a way to write portable, standard-conforming code using the Annex K functions.
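One way to sidestep both strtok_s variants is to carry a small reentrant tokenizer of your own. The following is only a sketch under that assumption: next_token is a hypothetical name, its signature mirrors POSIX strtok_r, and it uses nothing beyond strspn and strcspn from <string.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper shaped like POSIX strtok_r: splits str in place on any
 * character in delims and keeps its position in *save between calls. */
char *next_token(char *str, const char *delims, char **save)
{
    char *end;

    if (str == NULL)
        str = *save;                   /* continue a previous scan */
    if (str == NULL)
        return NULL;                   /* previous scan already finished */

    str += strspn(str, delims);        /* skip leading delimiters */
    if (*str == '\0') {
        *save = NULL;
        return NULL;                   /* nothing left */
    }

    end = str + strcspn(str, delims);  /* first delimiter after the token */
    if (*end != '\0') {
        *end = '\0';                   /* terminate the token in place */
        *save = end + 1;
    } else {
        *save = NULL;                  /* that was the last token */
    }
    return str;
}
```

As with strtok_r, the first call passes the string and later calls pass NULL with the same save pointer. The input buffer is modified, so it must be writable.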
Per N1967 Field Experience With Annex K — Bounds Checking Interface:
Available Implementations
Despite the specification of the APIs having been around for over a
decade only a handful of implementations exist with varying degrees of
completeness and conformance. The following is a survey of
implementations that are known to exist and their status.
While two of the implementations below are available in portable
source code form as Open Source projects, none of the popular Open
Source distribution such as BSD or Linux has chosen to make either
available to their users. At least one (GNU C Library) has repeatedly
rejected proposals for inclusion for some of the same reasons as those
noted by the Austin Group in their initial review of TR 24731-1
[N1106]. It appears unlikely that the APIs will be provided by future
versions of these distributions.
Microsoft Visual Studio
Microsoft Visual Studio implements an early version of the APIs.
However, the implementation is incomplete and conforms neither to C11
nor to the original TR 24731-1. For example, it doesn't provide the
set_constraint_handler_s function but instead defines a
_invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible
signature. It also doesn't define the abort_handler_s and
ignore_handler_s functions, the memset_s function (which isn't
part of the TR), or the RSIZE_MAX macro. The Microsoft
implementation also doesn't treat overlapping source and destination
sequences as runtime-constraint violations and instead has undefined
behavior in such cases.
As a result of the numerous deviations from the specification the
Microsoft implementation cannot be considered conforming or portable.
...
Safe C Library
Safe C Library [SafeC] is a fairly efficient and portable but
unfortunately very incomplete implementation of Annex K with support
for the string manipulation subset of functions declared in
<string.h>.
Due to its lack of support for Annex K facilities beyond the
<string.h> functions the Safe C Library cannot be considered a
conforming implementation.
Even the Safe C library is non-conforming.
Whether these functions are "safer" is debatable. Read the entire document.
Unnecessary Uses
A widespread fallacy originated by Microsoft's deprecation of the standard functions in an effort to increase the adoption of the APIs is that every call to the standard functions is necessarily unsafe and should be replaced by one to the "safer" API. As a result, security-minded teams sometimes naively embark on months-long projects rewriting their working code and dutifully replacing all instances of the "deprecated" functions with the corresponding APIs. This not only leads to unnecessary churn and raises the risk of injecting new bugs into correct code, it also makes the rewritten code less efficient.
Also, read the updated N1969 Updated Field Experience With Annex K — Bounds Checking Interfaces:
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.

C11, 6.6.10: IB: other forms of constant expressions: additional conformance documentation is needed

Why does it seem to be general practice for C compiler vendors not to provide end users with additional conformance documentation about implementation-defined behavior regarding «other forms of constant expressions» (C11, 6.6.10)?
C11, 6.6.10:
An implementation may accept other forms of constant expressions.
This fact leads to the following reactions / feedback (taken from different sources):
SO user M.M:
The compiler vendor should publish conformance documentation listing which expressions it accepts as constants, although I couldn't find
that documentation for MSVC. (leave a comment if you can!)
Source: https://stackoverflow.com/a/62161678/9881330
SO user Keith Thompson:
Admittedly the standard doesn't seem to require such documentation (which I find a little surprising).
Source: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66618 (2015-07-01 00:48:48 UTC)
Since 6.6.10 concerns implementation-defined behavior, and since «each implementation shall include documentation describing its characteristics and behavior» (C++ standard, section 1.9), why is this not general practice in the case of 6.6.10? If someone here represents an (industrial) C compiler vendor, please give the reason or comment on the situation.
P.S. The question originates from possible portability issues related to the «other forms of constant expressions». It would save a lot of time if end users knew exactly which «other forms of constant expressions» are «accepted by the implementation» before writing the code (and not afterwards, when surprised by portability issues).
UPD. Note on «When making use of implementation-defined behavior, I would assume portability issues until proven otherwise». If a software product is planned to be portable between N compilers and all the N compilers support the same IB-related language feature, which is useful while writing the code, but considered implementation-defined behavior, then why not use it? The only question is that we need to know in advance that this IB-related language feature is supported between all the N compilers. (Yes, we can empirically / experimentally find it, but in case of many IB-related language features it will be probably time-consuming. It is better to have an official statement from the compiler vendor that this IB-related language feature is supported / not supported.)
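To make the portability concern concrete, here is a small sketch. The identifiers are invented for illustration. The commented-out line is the kind of initializer that depends on 6.6.10: as far as I know, compilers and compiler versions differ on whether they accept it, which is why it is not compiled here.

```c
#include <stddef.h>

/* An enumeration constant is an integer constant expression in every C standard. */
enum { TABLE_LEN = 16 };

/* A const-qualified object is not a constant expression under C11 6.6. */
static const int table_len = 16;

/* Portable: the array size is an integer constant expression. */
static int table[TABLE_LEN];

/* Relies on "other forms of constant expressions": some implementations
 * accept the following initializer as an extension and others reject it,
 * so it is left commented out.
 *
 * static int copy_of_len = table_len;
 */

size_t table_capacity(void)
{
    (void)table_len;   /* only referenced to keep the example warning-free */
    return sizeof table / sizeof table[0];
}
```

Without vendor documentation, the only way to learn whether the commented-out form is accepted is to try it on each of the N compilers.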

Compatibility of C89/C90, C99 and C11

I just read: C Wikipedia entry. As far as I know there are 3 different versions of C that are widely used: C89, C99 and C11. My question concerns the compatibility of source code of different versions.
Suppose I am going to write a program (in C11 since it is the latest version) and import a library written in C89. Are these two versions going to work together properly when compiling all files according to the C11 specification?
Question 1:
Are the newer versions of C i.e. C99, C11 supersets of older C versions? By superset I mean, that old code will compile without errors and the same meaning when compiled according to newer C specifications.
I just read, that the // has different meanings in C89 and C99. Apart from this feature, are C99 and C11 supersets of C89?
If the answer to Question 1 is no, then I got another 2 questions.
How to 'port' old code to the new versions? Is there a document which explains this procedure?
Is it better to use C89 or C99 or C11?
Thanks for your help in advance.
EDIT: changed ISO C to C89.
Are the newer versions of C i.e. C99, C11 supersets of older C versions?
There are many differences, both big and subtle. Most changes were additions of new features and libraries. C99 and C11 are not supersets of C90, although a great deal of effort was made to ensure backwards compatibility. C11 is, however, mostly a superset of C99.
Still older code could break when porting from C90, in case the code was written poorly. Particularly various forms of "implicit int" and implicit function declarations were banned from the language with C99. C11 banned the gets function.
A complete list of changes can be found in the C11 draft page 13 of the pdf, where "the third edition" refers to C11 and "the second edition" refers to C99.
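For a feel of what the "implicit int" and implicit-declaration changes mean in practice, here is a hedged sketch. The C90 spellings appear only in the comment; the compiled code is the C99/C11 form. The function and variable names are invented for the example.

```c
#include <stdio.h>

/* C90 accepted declarations such as
 *
 *     static counter;                (type defaults to int)
 *     twice(x) { return 2 * x; }     (return and parameter types default to int)
 *
 * as well as calls to functions with no declaration in scope. C99 removed
 * both rules, so the C99/C11 spelling names every type and includes the
 * header that declares each library function it calls. */
static int counter;

int twice(int x)
{
    return 2 * x;
}

int count_and_print(const char *msg)
{
    puts(msg);          /* declared by <stdio.h>, not implicitly */
    return ++counter;   /* number of calls so far */
}
```

Porting code of this kind is mostly mechanical: add the missing type specifiers and headers until the compiler stops complaining.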
How to 'port' old code to the new versions? Is there a document which explains this procedure?
I'm not aware of any such document. If you have good code, porting is easy. If you have rotten code, porting will be painful. As for the actual porting procedure, it will be easy if you know the basics of C99 and C11, so the best bet is to find a reliable source of learning which addresses C99/C11.
Porting from C99 to C11 should be effortless.
Is it better to use C89 or C99 or C11?
It is best to use C11 as that is the current standard. C99 and C11 both contained various "language bug fixes" and introduced new, useful features.
In most ways, the later versions are supersets of earlier versions. While C89 code which tries to use restrict as an identifier will be broken by C99's addition of a reserved word with the same spelling, and while there are some situations in which code which is contrived to exploit some corner cases with a parser will be treated differently in the two languages, most of those are unlikely to be important.
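A classic example of such a contrived corner case is the following probe, shown here as a sketch (the function name is made up). It relies only on the fact that C99 added // comments while C90 has none.

```c
/* Under C90 rules the initializer lexes as  10 / (comment) 2  and yields 5.
 * Under C99 and later, everything after // is a comment, so it yields 10.
 *
 * Similarly, a declaration like  int restrict = 0;  is valid C90 but a syntax
 * error from C99 on, because restrict became a keyword. */
int slash_slash_probe(void)
{
    int v = 10 //* divisor follows */ 2
        ;
    return v;
}
```

Real code almost never contains this pattern, which is why these differences rarely matter when porting.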
A more important issue, however, has to do with memory aliasing. C89 includes rules which restrict the types of pointers that can be used to access certain objects. Since the rules would have made functions like malloc() useless if they applied, as written, to the objects created thereby, most programmers and compiler writers alike treated the rules as applying only in limited cases (I doubt C89 would have been widely accepted if people didn't believe the rules applied only narrowly). C99 claimed to "clarify" the rules, but its new rules are much more expansive in effect than contemporaneous interpretations of the old ones, breaking a lot of code that would have had defined behavior under those common interpretations of C89, and even some code which would have been unambiguously defined under C89 has no practical C99 equivalent.
In C89, for example, memcpy could be used to copy the bit pattern associated with an object of any type to an object of any other type with the same size, in any cases where that bit pattern would represent a valid value in the destination type. C99 added language which allows compilers to behave in arbitrary fashion if memcpy is used to copy an object of some type T to storage with no declared type (e.g. storage returned from malloc), and that storage is then read as an object of a type that isn't alias-compatible with T--even if the bit pattern of the original object would have a valid meaning in the new type. Further, the rules that apply to memcpy also apply in cases where an object is copied as an array of character type--without clarifying exactly what that means--so it's not clear exactly what code would need to do to achieve behavior matching the C89 memcpy.
On many compilers such issues can be resolved by adding a -fno-strict-aliasing option to the command line. Note that specifying C89 mode may not be sufficient, since compiler writers often use the same memory semantics regardless of which standard they're supposed to be implementing.
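For contrast, here is the form of the memcpy idiom that stays well defined in both C89 and C99 terms, because source and destination are both objects with a declared type and no malloc'd storage is involved. This is a sketch: float_bits is a hypothetical name, and the code assumes float is a 32-bit IEEE-754 value, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time size check that works without C11 _Static_assert:
 * the array size is -1 (an error) if the sizes differ. */
typedef char float_must_be_32_bits[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Returns the object representation of f as a 32-bit unsigned integer.
 * Assumes IEEE-754 binary32 for the specific values it produces. */
uint32_t float_bits(float f)
{
    uint32_t bits;                    /* declared object, so its type is fixed */
    memcpy(&bits, &f, sizeof bits);   /* copy the bytes, no pointer cast needed */
    return bits;
}
```

The problematic cases described above start when the destination is allocated storage with no declared type and is later read through a different type.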
The newer versions of C are definitely not strict super-sets of the older versions.
Generally speaking, this sort of problem only arises when upgrading the compiler or switching compiler vendors. You must plan for a lot of minor touches of the code to deal with this event. A lot of the time, the new compiler will catch issues that were left undiagnosed by the old compiler, in addition to minor incompatibilities that may have occurred by enforcing the newer C standard.
If it is possible to determine, the best standard to use is the one that the compiler supports the best.
The C11 wikipedia article has a lengthy description of how it differs from C99.
In general, newer versions of the standard are backward compatible.
If not, you can compile different .c files to different .o files, using different standards, and link them together. That does work.
Generally you should use the newest standard available for new code and, if it's easy, fix code that the new standard does break instead of using the hacky solution above.
EDIT: Unless you're dealing with potentially undefined behavior.

How to install C11 compiler on Mac OS with optional string functions included?

I'm trying the below code to see if the optional string functions in C are supported (I've got Mac OS X El Capitan and XCode installed)...
#include <stdio.h>
int main(void)
{
#if defined __STDC_LIB_EXT1__
printf("Optional functions are defined.\n");
#else
printf("Optional functions are not defined.\n");
#endif
return 0;
}
...but it suggests they aren't.
I've tried all the different compilers I have from XCode (cc, gcc, llvm-gcc, clang).
I've also tried brew install gcc assuming that the GNU C compiler would give me these extra functions, but it doesn't.
Is there a way to simply install a C11 compatible compiler on Mac OS that'll give me these additional (i.e. safe) string functions?
Summary: You won't get it to work. There are better ways to make sure your code is correct. For now, use the address sanitizer instead.
Also known as "Annex K" of the C11 standard or TR 24731, these functions are not widely implemented. The only commonly available implementation is part of Microsoft Visual Studio; other common C implementations have rejected (explicitly, even) the functionality in Annex K. So, while Annex K is technically part of the standard, for practical purposes it should be treated as a Microsoft-specific extension.
See Field Experience With Annex K — Bounds Checking Interfaces (document N1967) for more information. According to this report, there are only four implementations of annex K, two are for Windows, one is considered "very incomplete" and the remaining one is "unsuitable for production use without considerable changes."
However, the argument that these string functions are "safe" is a bit misleading. These functions merely add bounds checking, which only works if the functions are called correctly—but then again, the "non-safe" functions only work if they are called correctly too. From the report cited above,
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
I suggest using the address sanitizer as an alternative.
Do not use strncpy, strncat or the like as "safe" functions, they're not designed to do that and they are not drop-in replacements for strcpy, strcat, etc., unlike strcpy_s, strcat_s, which are drop-in replacements.
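To show why strncpy is not a "safe strcpy", here is a small sketch comparing it with snprintf, which is standard since C99. The helper names are invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Copies with strncpy and reports whether the result is a terminated string.
 * Returns 1 if a '\0' exists within the first dstsz bytes, else 0. */
int strncpy_terminates(char *dst, size_t dstsz, const char *src)
{
    strncpy(dst, src, dstsz);   /* writes no terminator when src has dstsz or more chars */
    return memchr(dst, '\0', dstsz) != NULL;
}

/* Copies with snprintf, which always terminates when outsz > 0.
 * Returns 1 if the text had to be truncated, else 0. */
int snprintf_truncated(char *out, size_t outsz, const char *src)
{
    int needed = snprintf(out, outsz, "%s", src);  /* length without truncation */
    return needed >= 0 && (size_t)needed >= outsz;
}
```

With an 8-byte buffer and the source "0123456789", the strncpy version leaves the buffer unterminated, while the snprintf version stores "0123456" and reports truncation.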
If you are not using Windows or Embarcadero you need to use the external safeclib: https://github.com/rurban/safeclib/releases
No other libc comes with the safe C11 Annex K extensions.
For an overview of the various libc quirks regarding this see https://rurban.github.io/safeclib/doc/safec-3.3/d1/dae/md_doc_libc-overview.html

Which version of C is more appropriate for students to learn- C89/90 or C99?

I'm looking into learning C basics and syntax before beginning Systems Programming next month. When doing some reading, I came across the C89/99 standards. According to Wikipedia,
C99 introduced several new features,
including inline functions, several
new data types (including long long
int and a complex type to represent
complex numbers), variable-length
arrays, support for variadic macros
(macros of variable arity) and support
for one-line comments beginning with
//, as in BCPL or C++. Many of these
had already been implemented as
extensions in several C compilers.
C99 is for the most part backward
compatible with C90, but is stricter
in some ways; in particular, a
declaration that lacks a type
specifier no longer has int
implicitly assumed. A standard macro
__STDC_VERSION__ is defined with value 199901L to indicate that C99 support
is available. GCC, Sun Studio and
other compilers now support many or
all of the new features of C99.
I borrowed a copy of K&R, 2nd Edition, and it uses the C89 standard. For a student, does the use of C89 invalidate some subjects covered in K&R, and if so, what should I look out for?
There is no reason to learn C89 or C90 over C99- it's been very literally superseded. It's easy to find C99 compilers and there's no reason whatsoever to learn an earlier standard.
This doesn't mean that your professor won't force C89 upon you. From the various questions posted here marked homework, I get the feeling that many, many C (and, unfortunately, C++) courses haven't moved on since C89.
From the perspective of a starting student, the chances are that you won't really notice the difference- there's plenty of C that's both C99 and C89/90 to be covered.
Use the C99 standard, it's newer and has more features. Particularly useful may be the bool type in <stdbool.h> and the int32_t etc. family of types; the latter prevents a lot of unportable code that relies on ints having a certain size. AFAIK, it doesn't invalidate K&R, though some example programs may be written in a slightly different style now.
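As a quick taste of those two features, here is a minimal sketch. The function name is made up; everything else comes from <stdbool.h> and <stdint.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b would not fit in int32_t.
 * The width is the same on every platform that provides int32_t,
 * unlike plain int. */
bool add_overflows_i32(int32_t a, int32_t b)
{
    if (b > 0)
        return a > INT32_MAX - b;   /* would exceed the maximum */
    return a < INT32_MIN - b;       /* would go below the minimum */
}
```

In C89 the same function would typically return int and rely on a hand-written typedef for the 32-bit type.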
Note that some compilers still don't support C99 properly. I believe that GCC still requires the use of a -std=c99 flag to enable it; many Unix/Linux systems have a c99 command that wraps GCC and enables C99.
The same goes for many university professors. I surprised mine by handing in a program that used bool in my freshman year. He'd never heard of that type in C :)
While I generally agree with the others, it is worth noting that K&R is such a good book that it might be worth learning C from it and then updating your knowledge as you read about the C99 standard.
If you are at student level you probably won't even notice the differences.
Yes, it's a bit odd that you can get a loud consensus that K&R is a great C book, and also a loud consensus that C99 is the correct/current/best version of C. The two positions are incompatible - even if K&R is the best book available to learn "C meaning C99", that just implies the rest are rubbish, or are also hopelessly outdated.
I would advise learning and using C99, but keeping an eye to C89 as you do so. If you use a compiler that has both C89 and C99 compliant modes, then you can write a few bits of C89 just to get an idea of the differences. Then if you ever need to write some code intended to be portable to places that C99 doesn't go, you'll know what to do. If you never have to write any such code, then you've wasted perhaps a day.
Writing C89 properly is actually surprisingly difficult, because getting hold of a copy of the C89 standard is difficult. So, C99 if you can, C89 if for some odd reason you have to, and have some awareness what the difference is. Maybe use K&R to cover the very basics, but get a look at some idiomatic C99 as soon as possible.
As for specific issues to be aware of when reading K&R: there's a list of major changes in the foreword of the standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf), although the details aren't laid out there. A lot of them are new features added to C99, so it's not that K&R is wrong, it just may not always use the best tools for a given job. Some of them are quite fiddly things where you should probably consult the standard if you need the details anyway. The rest are things removed from C89, that usually a C99 compiler will tell you about as and when you try to use them.
As a student, that doesn't influence you so much. But if possible, you should find a new C book which covers C99.
The term "C89" describes two very different languages:
The language that programmers in 1989 thought the Committee was describing in places where the Standard was ambiguous, and which supported features that were common in pre-existing implementations.
The language that the Committee has since decided that it wanted to have described, which threw compatibility with existing functionality out the
window.
C99 "clarifies" ambiguous parts of the standard by saying that they meant
to have the Standard interpreted in a way that would have broken a substantial
fraction of existing code and made it impossible to perform many tasks as
efficiently as they had been performed in C before 1989.
The right language to program in, for many applications, would be the superset of pre-Standard C, C89, C99, and C11. It's important, however, that anyone programming in that language be clear that they're using that language rather than a shrinking subset which favors speed over reliability.
While I think it's beneficial to know which features are more recent and less likely to be supported by obscure (or intentionally-broken, like MSVC) compilers, there are a few C99 features that you should absolutely use:
snprintf: This is the definitive function for safe and clean string assembly in C. If your compiler is missing it, you can either replace the whole printf subsystem (probably a good idea since most implementations with missing snprintf are also full of (often intentional) bugs in printf behavior), or wrap tmpfile/fprintf/fread/fclose.
stdint.h: If you need fixed-size types (16/32/64-bit), use the standard names int16_t, uint16_t, int32_t, etc. Do not invent your own, and absolutely don't use system-specific ones like INT64 or u32. It just makes your code ugly and hard to integrate and reuse. If your compiler is missing stdint.h, just drop in your own to define the types in terms of the correct-for-your-platform types.
Specifically uint64_t, in place of int foo[2]; or struct { int lo, hi; } foo; or other hideous legacy hacks to work with 64-bit numbers. Any sane compiler even without C99 support has its own 64-bit types you can use to define int64_t and uint64_t.
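For the snprintf point, a common C99 pattern is to call it twice: once with a null buffer and size 0 to measure, then again to format. The sketch below assumes a hypothetical helper name, make_pair, and a "key=value" format chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated "key=value" string,
 * or NULL on failure. The caller frees the result. */
char *make_pair(const char *key, int value)
{
    /* C99: with size 0, snprintf writes nothing and returns the length needed. */
    int len = snprintf(NULL, 0, "%s=%d", key, value);
    char *out;

    if (len < 0)
        return NULL;                       /* encoding error */
    out = malloc((size_t)len + 1);         /* + 1 for the terminator */
    if (out == NULL)
        return NULL;
    snprintf(out, (size_t)len + 1, "%s=%d", key, value);
    return out;
}
```

Pre-C99 snprintf implementations may return something other than the required length when the buffer is too small, so this pattern depends on C99 behavior.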

Resources