memset_s(): What does the standard mean with this piece of text?

In C11, K.3.7.4.1 The memset_s function, I found this bit of rather confusing text:
Unlike memset, any call to the memset_s function shall be evaluated strictly according to the rules of the abstract machine as described in (5.1.2.3). That is, any call to the memset_s function shall assume that the memory indicated by s and n may be accessible in the future and thus must contain the values indicated by c.
This implies that memset is not (necessarily) "evaluated strictly according to the rules of the abstract machine". (The chapter referenced is 5.1.2.3 Program execution.)
I fail to understand the leeway the standard gives to memset that is explicitly ruled out here for memset_s, and what that would mean for an implementor of either function.

Imagine you have read a password:
{
    char password[128];
    if (fgets(password, sizeof(password), stdin) != 0)
    {
        password[strcspn(password, "\n\r")] = '\0';
        validate_password(password);
        memset(password, '\0', sizeof(password));
    }
}
You've carefully zapped the password so it can't be found accidentally.
Unfortunately, the compiler is allowed to omit that memset() call because password is not used again. The rule for memset_s() means that the call cannot be omitted; the password variable must be zeroed, regardless of optimization.
memset_s(password, sizeof(password), '\0', sizeof(password));
This is one of the few really useful features in Annex K. (We can debate the merits of having to repeat the size. However, in a more general case, the second size can be a variable, not a constant, and then the first size becomes a runtime protection against the variable being out of control.)
Note that this requirement is placed on the compiler rather than the library. The memset_s() function will behave correctly if it is called, just as memset() will behave correctly if it is called. The rule under discussion says that the compiler must call memset_s(), even though it may be able to omit the call to memset() because the variable is never used again.
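Since Annex K is optional (an implementation that provides it defines __STDC_LIB_EXT1__, and a program must define __STDC_WANT_LIB_EXT1__ before including <string.h> to get the declarations), portable code often pairs that check with a volatile-based fallback. A minimal sketch; secure_zero and scrub are illustrative names, not standard functions:

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* request Annex K, if the library has it */
#include <string.h>

/* Fallback: writes through a volatile-qualified pointer are treated as
 * observable behaviour by mainstream compilers, so they are not elided. */
static void secure_zero(void *s, size_t n)
{
    volatile unsigned char *p = s;
    while (n--)
        *p++ = 0;
}

void scrub(char *buf, size_t len)
{
#ifdef __STDC_LIB_EXT1__
    memset_s(buf, len, 0, len);  /* the call the compiler may not omit */
#else
    secure_zero(buf, len);
#endif
}
```

(Strictly speaking, 5.1.2.3 only guarantees volatile semantics for volatile objects; honouring writes through a volatile lvalue to a non-volatile object is a quality-of-implementation matter, though it is relied on widely.)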

Related

Last value assigned to variable not used [MISRA 2012 Rule 2.2, required]

I'm getting this warning while using a piece of code written like below:
//Macro
#define FREEIF(p) if (p) { free_mem((void*)p); (p) = 0; }
//free_mem function
int free_mem(void *mem_ptr)
{
    if (mem_ptr != NULL)
    {
        free(mem_ptr);
    }
    mem_ptr = NULL;
    return 0;
}
//Use of Macro in my .c file with above declaration and definition of macro.
....
....
{
FREEIF(temp_ptr);
}
If I add a check for temp_ptr, e.g. if (temp_ptr) { FREEIF(temp_ptr); }, before invoking the macro, I don't get this warning.
Since I'm already checking temp_ptr inside the macro, I am wondering why I get this warning.
Any insight?
Regarding the error, rule 2.2 is about not having any "dead code" in your program.
In the function, mem_ptr = NULL; sets the local variable mem_ptr to null, not the one passed. So that code line does nothing. This is the reason for the error and it's a common beginner FAQ, see Dynamic memory access only works inside function for details.
In the function-like macro, the passed pointer would however get changed by (p) = 0;. But if you aren't using the pointer after setting it to null, it's still regarded as "dead code" since the assignment is then strictly speaking pointless (although good practice). We can't tell since you didn't post the calling code nor the actual pointer declaration.
But there's some far more serious big picture issues here:
Using MISRA-C and dynamic memory allocation at the same time is nonsensical. They are pretty much mutually exclusive. Embedded systems in general don't use dynamic allocation, especially not bare metal/RTOS MCU applications where it simply doesn't make any sense.
Dynamic allocation is particularly banned in mission-critical/safety-related software. This is not only banned by MISRA, but by any coding standard out there. It is also banned by generic safety standards like IEC 61508, ISO 26262, DO-178 etc.
Safety and MISRA aside, your macro is still nonsense since free() on a null pointer is a well-defined no-op. See the definition of free in C17 7.22.3.3:
void free(void *ptr); [...] If ptr is a null pointer, no action occurs.
So all the macro achieves is to obfuscate the code and slow it down with an extra, pointless branch.
The correct solution here is to nuke this macro, then take a step back and consider what you are even doing with this project. Start with the requirements. Why do you need MISRA-C, does somebody in this project know what they are doing, if not - who should we hire to help with this project. And so on. You need at least one C veteran on the team for a project with MISRA-C or otherwise the project is doomed.
Arguments in C are passed by value. This means the values passed as arguments are copied into the parameters visible inside the function, so modifying those parameters from within the function doesn't have any effect on the original values passed.
Therefore, the line mem_ptr = NULL; is meaningless (at least it seems meaningless, assuming no undefined behavior such as an out-of-bounds read or dereferencing an invalidated pointer), because mem_ptr is local to the function and its value is not read at all after the assignment.
On the other hand, (p) = 0; may not be meaningless, because p is not declared in the macro, so it refers to whatever is declared before the macro is used, and that may be read after the invocation of the macro.
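If the goal is for the callee to null out the caller's pointer, the function must receive the address of that pointer. A sketch of that idea under the question's apparent intent; free_and_null is a made-up name:

```c
#include <stdlib.h>

/* Passing a pointer to the pointer lets the assignment reach the caller. */
void free_and_null(void **pp)
{
    free(*pp);   /* free(NULL) is a well-defined no-op, so no check needed */
    *pp = NULL;  /* this write is visible to the caller */
}
```

Call it as free_and_null(&p) for a void *p. For other pointer types, the common (void **)&p cast is not strictly conforming; a type-generic macro, or simply writing free(p); p = NULL; inline, avoids that wrinkle.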

Guarantee of non-equality of pointers to standard functions?

Does the C language guarantee that pointers to differently-named standard functions must compare not-equal?
Per 6.5.9 Equality Operators, ¶6,
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, ...
I seem to recall seeing an interpretation claiming that aliases (multiple identifiers for the "same function") are permissible for standard functions, the canonical candidates for such treatment being getc==fgetc and putc==fputc; however, I don't know where I might have seen it, and I'm skeptical of the concept.
Is there any official interpretation or well-accepted argument for or against this possibility?
No, I don't believe there is any such guarantee.
I believe this whole discussion originates from the part of the standard which allows a function to also be defined as a macro with the same name.
From C17 7.1.4:
Any function declared in a header may be additionally implemented as a function-like macro defined in the header, so if a library function is declared explicitly when its header is included, one of the techniques shown below can be used to ensure the declaration is not affected by such a macro. Any macro definition of a function can be suppressed locally by enclosing the name of the function in parentheses, because the name is then not followed by the left parenthesis that indicates expansion of a macro function name. For the same syntactic reason, it is permitted to take the address of a library function even if it is also defined as a macro.189)
189) This means that an implementation shall provide an actual function for each library function, even if it also provides a
macro for that function.
The text goes on describing how users may #undef the macro name if they want to be guaranteed that they get an actual function.
So it is allowed for the implementation to have a standard function and a macro with the same name. But what the macro then expands to is implementation-defined. It may very well be an internal function with the same address as what another library macro expands to.
Based on that, I don't believe there are any guarantees that different functions have different addresses.
In the specific case of getc, the standard says (C17 7.21.7.5):
The getc function is equivalent to fgetc, except that if it is implemented as a macro, it may evaluate stream more than once, so the argument should never be an expression
with side effects.
I would say it is somewhat likely that the implementation calls the same actual function for fgetc and getc when these are implemented as macros. (Or that atoi versus strtol call the same function, etc etc). The standard library implementations I have peeked at don't seem to do it this way, but I don't think there is anything in the standard stopping them.
(As a side note, taking the address of library functions may not be a good idea for other reasons, namely that it may block inlining of that function within the same translation unit.)
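The suppression techniques 7.1.4 describes can be sketched concretely (a toy that should behave the same on any hosted implementation, whether or not getc is a macro there):

```c
#include <stdio.h>

/* A function-like macro name not followed by '(' is not expanded, so this
 * initializer is guaranteed to name the real getc function (C17 7.1.4): */
static int (*reader)(FILE *) = getc;

int read_one(FILE *f)
{
    /* Parenthesising the name also suppresses macro expansion, because
     * the name is then not directly followed by '(': */
    return (getc)(f);
}
```

A third option is #undef getc, after which the name refers only to the actual function.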
Well, you are delving into an implementation detail here. The standard only specifies the behaviour of the functions of the standard library.
For getc the spec says (emphasis mine):
The getc function is equivalent to fgetc, except that if it is implemented as a macro, it
may evaluate stream more than once, so the argument should never be an expression
with side effects.
So the implementation may implement getc as a macro, but it also may implement it as an alias for fgetc, or as a different function with the same behaviour. Long story short, you cannot rely on &getc == &fgetc being either true or false.
The only thing that the standard requires is that &getc must be defined, per 7.1.4 § 1:
... it is permitted to take the address of a library function even if it is also defined as
a macro...
That just means that the implementation must have a function of that name, but that function could:
be the fgetc function itself, in which case &fgetc == &getc is true
be a distinct function with the same behaviour, in which case &fgetc == &getc is false
be a distinct function that merely calls fgetc, in which case &fgetc == &getc is false

Printing null pointers with %p is undefined behavior?

Is it undefined behavior to print null pointers with the %p conversion specifier?
#include <stdio.h>

int main(void) {
    void *p = NULL;
    printf("%p", p);
    return 0;
}
The question applies to the C standard, and not to C implementations.
This is one of those weird corner cases where we're subject to the limitations of the English language and inconsistent structure in the standard. So at best, I can make a compelling counter-argument, as it's impossible to prove it :)1
The code in the question exhibits well-defined behaviour.
As [7.1.4] is the basis of the question, let's start there:
Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, [... other examples ...]) [...] the behavior is undefined. [... other statements ...]
This is clumsy language. One interpretation is that the items in the list are UB for all library functions, unless overridden by the individual descriptions. But the list starts with "such as", indicating that it's illustrative, not exhaustive. For example, it does not mention correct null-termination of strings (critical for the behaviour of e.g. strcpy).
Thus it's clear the intent/scope of 7.1.4 is simply that an "invalid value" leads to UB (unless stated otherwise). We have to look to each function's description to determine what counts as an "invalid value".
Example 1 - strcpy
[7.21.2.3] says only this:
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
It makes no explicit mention of null pointers, yet it makes no mention of null terminators either. Instead, one infers from "string pointed to by s2" that the only valid values are strings (i.e. pointers to null-terminated character arrays).
Indeed, this pattern can be seen throughout the individual descriptions. Some other examples:
[7.6.4.1 (fenv)] store the current floating-point environment in the object pointed to by envp
[7.12.6.4 (frexp)] store the integer in the int object pointed to by exp
[7.19.5.1 (fclose)] the stream pointed to by stream
Example 2 - printf
[7.19.6.1] says this about %p:
p - The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.
Null is a valid pointer value, and this section makes no explicit mention that null is a special case, nor that the pointer has to point at an object. Thus it is defined behaviour.
1. Unless a standards author comes forward, or unless we can find something similar to a rationale document that clarifies things.
The Short Answer
Yes. Printing null pointers with the %p conversion specifier has undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.
The answer applies to any of the C standards (C89/C99/C11).
The Long Answer
The %p conversion specifier expects an argument of type pointer to void; the conversion of the pointer to printable characters is implementation-defined. The description doesn't state that a null pointer is expected.
The introduction to the standard library functions states that null pointers as arguments to (standard library) functions are considered to be invalid values, unless it is explicitly stated otherwise.
C99 / C11 §7.1.4 p1
[...] If an argument to a function has an invalid value (such as [...] a null pointer, [...]) the behavior is undefined.
Examples of (standard library) functions that do accept null pointers as valid arguments:
fflush() uses a null pointer for flushing "all streams" (that apply).
freopen() uses a null pointer for indicating the file "currently associated" with the stream.
snprintf() allows passing a null pointer when n is zero.
realloc() uses a null pointer for allocating a new object.
free() allows passing a null pointer.
strtok() uses a null pointer for subsequent calls.
If we take the case of snprintf(), it makes sense to allow passing a null pointer when n is zero, but this is not the case for other (standard library) functions that allow a similar zero n. For example: memcpy(), memmove(), strncpy(), memset(), memcmp().
It is not only specified in the introduction to the standard library, but also once again in the introduction to these functions:
C99 §7.21.1 p2 / C11 §7.24.1 p2
Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values as described in 7.1.4.
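The snprintf exception is worth illustrating, since it underpins the common measure-then-allocate idiom; format_pair is an invented example:

```c
#include <stdio.h>
#include <stdlib.h>

/* snprintf with s == NULL and n == 0 is explicitly valid: nothing is
 * written, and the return value is the length the full string would
 * have had (excluding the terminating null). */
char *format_pair(int x, int y)
{
    int need = snprintf(NULL, 0, "(%d, %d)", x, y);
    if (need < 0)
        return NULL;               /* encoding error */
    char *s = malloc((size_t)need + 1);
    if (s != NULL)
        snprintf(s, (size_t)need + 1, "(%d, %d)", x, y);
    return s;
}
```

By contrast, something like memcpy(NULL, NULL, 0) stays undefined: the quoted paragraph still demands valid pointer values even when n is zero.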
Is it intentional?
I don't know whether the UB of %p with a null pointer is in fact intentional, but since the standard explicitly states that null pointers are considered invalid values as arguments to standard library functions, and then it goes and explicitly specifies the cases where a null pointer is a valid argument (snprintf, free, etc), and then it goes and once again repeats the requirement for the arguments to be valid even in zero 'n' cases (memcpy, memmove, memset), then I think it's reasonable to assume that the C standards committee isn't too concerned with having such things undefined.
The authors of the C Standard made no effort to exhaustively list all of the behavioral requirements an implementation must meet to be suitable for any particular purpose. Instead, they expected that people writing compilers would exercise a certain amount of common sense whether the Standard requires it or not.
The question of whether something invokes UB is seldom in and of itself useful. The real questions of importance are:
Should someone who is trying to write a quality compiler make it behave in predictable fashion? For the described scenario the answer is clearly yes.
Should programmers be entitled to expect that quality compilers for anything resembling normal platforms will behave in predictable fashion? In the described scenario, I would say the answer is yes.
Might some obtuse compiler writers stretch the interpretation of the Standard so as to justify doing something weird? I would hope not, but wouldn't rule it out.
Should sanitizing compilers squawk about the behavior? That would depend upon the paranoia level of their users; a sanitizing compiler probably shouldn't default to squawking about such behavior, but perhaps should provide a configuration option to do so, in case programs might be ported to "clever"/dumb compilers that behave weirdly.
If a reasonable interpretation of the Standard would imply a behavior is defined, but some compiler writers stretch the interpretation to justify doing otherwise, does it really matter what the Standard says?

Global Variable Access Relative to Function Calls and Returns

I have been researching this topic and I can not find a specific authoritative answer. I am hoping that someone very familiar with the C spec can answer - i.e. confirm or refute my assertion, preferably with citation to the spec.
Assertion:
If a program consists of more than one compilation unit (separately compiled source file), the compiler must assure that global variables (if modified) are written to memory before any call to a function in another unit or before the return from any function. Also, in any function, the global must be read before its first use. Also after a call of any function, not in the same unit, the global must be read before use. And these things must be true whether the variable is qualified as "volatile" or not because a function in another compilation unit (source file) could access the variable without the compiler's knowledge. Otherwise, "volatile" would always be required for global variables - i.e. non-volatile globals would have no purpose.
Could the compiler treat functions in the same compilation unit differently than ones that aren't? All of the discussions I have found for the "volatile" qualifier on globals show all functions in the same compilation unit.
Edit: The compiler cannot know whether functions in other units use the global or not. Therefore I am assuming the above conditions.
I found these two other questions with information related to this topic but they don't address it head on or they give information that I find suspect:
Are global variables refreshed between function calls?
When do I need to use volatile in ISRs?
[..] in any function, the global must be read before its first use.
Definitely not:
static int variable;

void foo(void) {
    variable = 42;
}
Why should the compiler bother generating code to read the variable?
The compiler must assure that global variables are written to memory before any function call or before the return from a function.
No, why should it?
void bar(void) {
    return;
}

void baz(void) {
    variable = 42;
    bar();
}
bar is a pure function (should be determinable for a decent compiler), so there's no chance of getting any different behaviour when writing to memory after the function call.
The case of "before returning from a function" is tricky, though. But I think the general statement ("must") is false if we count inlined (static) functions, too.
Could the compiler treat functions in the same compilation unit differently than ones that aren't?
Yes, I think so: for a static function (whose address is never taken) the compiler knows exactly how it is used, and this information could be used to apply some more radical optimisations.
I'm basing all of the above on the C version of the As-If rule, specified in §5.1.2.3/6 (N1570):
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behavior of the program.
In particular, you might want to read the following "EXAMPLE 1".
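The asymmetry can be sketched as follows (a single file simulating two translation units; the names are mine). An object with external linkage must be written to memory before, and re-read after, a call into code the compiler cannot see, while an object with internal linkage whose address never escapes can be optimised much more aggressively:

```c
int counter;        /* external linkage: another unit could access it */
static int hidden;  /* internal linkage: only this unit can name it   */

/* Imagine this function lives in another translation unit; it is
 * defined here only so the sketch links and runs. */
void external_fn(void)
{
    counter = 2;
}

int demo(void)
{
    counter = 1;
    external_fn();  /* opaque call: counter must be stored before it
                     * and reloaded after it, so this returns 2 */
    return counter;
}

int demo2(void)
{
    hidden = 1;
    external_fn();  /* cannot touch 'hidden' (nothing outside this unit
                     * can name it, and its address never escapes), so
                     * the compiler may fold the return value to 1 */
    return hidden;
}
```

When external_fn really is compiled separately (and without link-time optimisation), the compiler has no choice but to treat it as potentially reading or writing counter; hidden is exempt from that assumption.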

Is &errno legal C?

Per 7.5,
[errno] expands to a modifiable lvalue175) that has type int, the value of which is set to a positive error number by several library functions. It is unspecified whether errno is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name errno, the behavior is undefined.
175) The macro errno need not be the identifier of an object. It might expand to a modifiable lvalue resulting from a function call (for example, *errno()).
It's not clear to me whether this is sufficient to require that &errno not be a constraint violation. The C language has lvalues (such as register-storage-class variables; however these can only be automatic so errno could not be defined as such) for which the & operator is a constraint violation.
If &errno is legal C, is it required to be constant?
So §6.5.3.2p1 specifies
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
Which I think can be taken to mean that &lvalue is fine for any lvalue that is not in those two categories. And as you mentioned, errno cannot be declared with the register storage-class specifier, and I think (although I am not chasing references to check right now) that you cannot have a bit-field that has type of plain int.
So I believe that the spec requires &(errno) to be legal C.
If &errno is legal C, is it required to be constant?
As I understand it, part of the point of allowing errno to be a macro (and the reason it is in e.g. glibc) is to allow it to be a reference to thread-local storage, in which case it will certainly not be constant across threads. And I don't see any reason to expect it must be constant. As long as the value of errno retains the semantics specified, I see no reason a perverse C library could not change &errno to refer to different memory addresses over the course of a program -- e.g. by freeing and reallocating the backing store every time you set errno.
You could imagine maintaining a ring buffer of the last N errno values set by the library, and having &errno always point to the latest. I don't think it would be particularly useful, but I can't see any way it violates the spec.
I am surprised nobody has cited the C11 spec yet. Apologies for the long quote, but I believe it is relevant.
7.5 Errors
The header <errno.h> defines several macros...
...and
errno
which expands to a modifiable lvalue(201) that has type int and thread local storage duration, the value of which is set to a positive error number by several library functions. If a macro definition is suppressed in order to access an actual object, or a program defines an identifier with the name errno, the behavior is undefined.
The value of errno in the initial thread is zero at program startup (the initial value of errno in other threads is an indeterminate value), but is never set to zero by any library function.(202) The value of errno may be set to nonzero by a library function call whether or not there is an error, provided the use of errno is not documented in the description of the function in this International Standard.
(201) The macro errno need not be the identifier of an object. It might expand to a modifiable lvalue resulting from a function call (for example, *errno()).
(202) Thus, a program that uses errno for error checking should set it to zero before a library function call, then inspect it before a subsequent library function call. Of course, a library function can save the value of errno on entry and then set it to zero, as long as the original value is restored if errno’s value is still zero just before the return.
"Thread local" means register is out. Type int means bitfields are out (IMO). So &errno looks legal to me.
Persistent use of words like "it" and "the value" suggests the authors of the standard did not contemplate &errno being non-constant. I suppose one could imagine an implementation where &errno was not constant within a particular thread, but to be used the way the footnotes say (set to zero, then check after calling library function), it would have to be deliberately adversarial, and possibly require specialized compiler support just to be adversarial.
In short, if the spec does permit a non-constant &errno, I do not think it was deliberate.
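The set-to-zero-then-inspect discipline that footnote 202 alludes to looks like this in practice, shown here with strtol (parse_long is an invented wrapper):

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 on success and stores the value in *out. strtol signals
 * overflow by clamping to LONG_MIN/LONG_MAX and setting errno to
 * ERANGE, so errno must be cleared first to tell a real overflow
 * apart from a value left over by some earlier call. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                        /* set it to zero before the call */
    long v = strtol(s, &end, 10);
    if (end == s || errno == ERANGE)  /* inspect it afterwards */
        return 0;                     /* no digits, or out of range */
    *out = v;
    return 1;
}
```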
[update]
R. asks an excellent question in the comments. After thinking about it, I believe I now know the correct answer to his question, and to the original question. Let me see if I can convince you, dear reader.
R. points out that GCC allows something like this at the top level:
register int errno asm ("r37"); // line R
This would declare errno as a global value held in register r37. Obviously, it would be a thread-local modifiable lvalue. So, could a conforming C implementation declare errno like this?
The answer is no. When you or I use the word "declaration", we usually have a colloquial and intuitive concept in mind. But the standard does not speak colloquially or intuitively; it speaks precisely, and it aims only to use terms that are well-defined. In the case of "declaration", the standard itself defines the term; and when it uses the term, it is using its own definition.
By reading the spec, you can learn precisely what a "declaration" is and precisely what it is not. Put another way, the standard describes the language "C". It does not describe "some language that is not C". As far as the standard is concerned, "C with extensions" is just "some language that is not C".
Thus, from the standard's point of view, line R is not a declaration at all. It does not even parse! It might as well read:
long long long __Foo_e!r!r!n!o()blurfl??/**
As far as the spec is concerned, this is just as much a "declaration" as line R; i.e., not at all.
So, when C11 spec says, in section 6.5.3.2:
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
...it means something very precise that does not refer to anything like Line R.
Now, consider the declaration of the int object to which errno refers. (Note: I do not mean the declaration of the errno name, since of course there might be no such declaration if errno is, say, a macro. I mean the declaration of the underlying int object.)
The above language says you can take the address of an lvalue unless it designates a bit-field or it designates an object "declared" register. And the spec for the underlying errno object says it is a modifiable int lvalue with thread-local duration.
Now, it is true that the spec does not say that the underlying errno object must be declared at all. Maybe it just appears via some implementation-defined compiler magic. But again, when the spec says "declared with the register storage-class specifier", it is using its own terminology.
So either the underlying errno object is "declared" in the standard sense, in which case it cannot be both register and thread-local; or it is not declared at all, in which case it is not declared register. Either way, since it is an lvalue, you may take its address.
(Unless it is a bit-field, but I think we agree that a bit field is not an object of type int.)
The original implementation of errno was as a global int variable that various Standard C Library components used to indicate an error value if they ran into an error. However even in those days one had to be careful about reentrant code or with library function calls that could set errno to a different value as you were handling an error. Normally one would save the value in a temporary variable if the error code was needed for any length of time due to the possibility of some other function or piece of code setting the value of errno either explicitly or through a library function call.
So with this original implementation of a global int, using the address of operator and depending on the address to remain constant was pretty much built into the fabric of the library.
However with multi-threading, there was no longer a single global, because a single global was not thread safe. So the idea arose of having thread local storage, perhaps accessed through a function that returns a pointer to an allocated area. So you might see a construct something like the following entirely imaginary example:
#define errno (*myErrno())

typedef struct {
    // various memory areas for thread local stuff
    int myErrNo;
    // more memory areas for thread local stuff
} ThreadLocalData;

ThreadLocalData *getMyThreadData(void)
{
    ThreadLocalData *pThreadData = 0; // placeholder for the real thing
    // locate the thread local data for the current thread through some means
    // then return a pointer to this thread's local data for the C run time
    return pThreadData;
}

int *myErrno(void)
{
    return &(getMyThreadData()->myErrNo);
}
Then errno would be used as if it were a single global int: set with errno = 0;, checked with something like if (errno == 22) { /* handle the error */ }, and even taken the address of, as in int *pErrno = &errno;. This all works because the thread local data area, once allocated, stays put and does not move around, and because the macro definition, which makes errno look like an extern int, hides the plumbing of its actual implementation.
The one thing that we do not want is to have the address of errno suddenly shift between time slices of a thread with some kind of a dynamic allocate, clone, delete sequence while we are accessing the value. When your time slice is up, it is up and unless you have some kind of synchronization involved or some way to keep the CPU after your time slice expires, having the thread local area move about seems a very dicey proposition to me.
This in turn implies that you can depend on the address of operator giving you a constant value for a particular thread though the constant value will differ between threads. I can well see the library using the address of errno in order to reduce the overhead of doing some kind of thread local lookup every time a library function is called.
Having the address of errno as constant within a thread also provides backwards compatibility with older source code which used the errno.h include file as they should have done (see this man page from linux for errno which explicitly warns to not use extern int errno; as was common in the old days).
The way I read the standard, it allows for this kind of thread local storage while keeping the syntax and semantics similar to the old extern int errno;, and it still permits the old-style usage, for instance in a cross compiler for an embedded device that does not support multi-threading. However, since the similarity in syntax may be achieved through a macro definition, the old-style shortcut declaration extern int errno; should not be used, because that declaration is not what errno really is.
We can find a counterexample: because a bit-field could have type int, errno could be a bit-field, and in that case &errno would be invalid. The standard does not explicitly say that you can write &errno, so the definition of undefined behavior applies here.
C11 (n1570), § 4. Conformance
Undefined behavior is otherwise indicated in this International
Standard by the words ‘‘undefined behavior’’ or by the omission of any
explicit definition of behavior.
This seems like a valid implementation where &errno would be a constraint violation:
struct __errno_struct {
    signed int __val:12;
} *__errno_location(void);

#define errno (__errno_location()->__val)
So I think the answer is probably no...
