Why is pthread_self marked with attribute(const)? - c

In Glibc's pthread.h the pthread_self function is declared with the const attribute:
extern pthread_t pthread_self (void) __THROW __attribute__ ((__const__));
In GCC that attribute means:
Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
I wonder how that's supposed to be? Since it does not take any argument, pthread_self is therefore allowed only to always return the same value, which is obviously not the case. That is, I would have expected pthread_self to read global memory, and therefore eventually be marked as pure instead:
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
The implementation on x86-64 seems to be actually reading global memory:
# define THREAD_SELF \
({ struct pthread *__self; \
asm ("mov %%fs:%c1,%0" : "=r" (__self) \
: "i" (offsetof (struct pthread, header.self))); \
__self;})
pthread_t
__pthread_self (void)
{
return (pthread_t) THREAD_SELF;
}
strong_alias (__pthread_self, pthread_self)
Is this a bug or am I not seeing something?

The attribute was most likely added on the assumption that GCC would only use it locally (within a function), and would never be able to use it for inter-procedural optimizations. Today, some Glibc developers are questioning the correctness of the attribute, precisely because powerful inter-procedural optimization could potentially lead to miscompilation. Quoting a post by Torvald Riegel to the Glibc developers' mailing list:
The const attribute is specified as asserting that the function does not
examine any data except the arguments. __errno_location has no
arguments, so it would have to return the same values every time.
This works in a single-threaded program, but not in a multi-threaded
one. Thus, I think that strictly speaking, it should not be const.
We could argue that this magically is meant to always be in the context
of a specific thread. Ignoring that GCC doesn't define threads itself
(especially in something like NPTL which is about creating a notion of
threads), we could still assume that this works because in practice, the
compiler and its passes can't leak knowledge across a function used in
one thread and other one used in another thread.
(__errno_location() and pthread_self() both are marked with __attribute__((const)) and receive no arguments).
Here's a small example that could plausibly be miscompiled with powerful interprocedural analysis:
#include <pthread.h>
#include <errno.h>
#include <stdlib.h>
static void *errno_pointer;
static void *thr(void *unused)
{
if (!errno_pointer || errno_pointer == &errno)
abort();
return 0;
}
int main()
{
errno_pointer = &errno;
pthread_t t;
pthread_create(&t, 0, thr, 0);
pthread_join(t, 0);
}
(the compiler can observe that errno_pointer is static, it does not escape the translation unit, and the only store into it assigns the same "const" value, given by __errno_location(), that is tested in thr()). I've used this example in my email asking to improve documentation of pure/const attributes, but unfortunately it didn't get much traction.

I wonder how that's supposed to be?
This attribute is telling the compiler that in a given context pthread_self will always return the same value. In other words, the two loops below are exactly equivalent, and the compiler is allowed to optimize out the second (and all subsequent) calls to pthread_self:
// loop A
std::map<pthread_t, int> m;
for (int j = 0; j < 1000; ++j)
m[pthread_self()] += 1;
// loop B
std::map<pthread_t, int> m;
const pthread_t self = pthread_self();
for (int j = 0; j < 1000; ++j)
m[self] += 1;
The implementation on x86-64 seems to be actually reading global memory
No, it does not. It reads thread-local memory.

Related

Questions regarding (non-)volatile and optimizing compilers

I have the following C code:
/* the memory entry points to can be changed from another thread but
* it is not declared volatile */
struct myentry *entry;
bool isready(void)
{
return entry->status == 1;
}
bool isready2(int idx)
{
struct myentry *x = entry + idx;
return x->status == 1;
}
int main(void) {
/* busy loop */
while (!isready())
;
while (!isready2(5))
;
}
As I note in the comment, entry is not declared as volatile even though the array it points to can be changed from another thread (or actually even directly from kernel space).
Is the above code incorrect / unsafe? My thinking is that no optimization could be performed in the bodies of isready, isready2 and since I repeatedly perform function calls from within main the appropriate memory location should be read on every call.
On the other hand, the compiler could inline these functions. Is it possible that it does it in a way that results in a single read happening (hence causing an infinite loop) instead of multiple reads (even if these reads come from a load/store buffer)?
And a second question. Is it possible to prevent the compiler from doing optimizations by casting to volatile only in certain places like that?
void func(void)
{
entry->status = 1;
while (((volatile struct myentry *) entry)->status != 2)
;
}
Thanks.
If the memory entry points to can be modified by another thread, then the program has a data race and therefore the behaviour is undefined. This is still true even if volatile is used.
To have a variable accessed concurrently by multiple threads, in ISO C11, it must either be an atomic type, or protected by correct synchronization.
If using an older standard revision then there are no guarantees provided by the Standard about multithreading so you are at the mercy of any idiosyncratic behaviour of your compiler.
If using POSIX threads, there are no portable atomic operations, but it does define synchronization primitives.
See also:
Why is volatile not considered useful in multithreaded C or C++ programming?
The second question is a bugbear. I would suggest not doing it, because different compilers may interpret the meaning differently, and the behaviour is still formally undefined either way.

volatile keyword with mutex and semaphores

The question is simple. Does/Should a variable used with multi-threads be volatile even accessed in critical section(i.e. mutex, semaphore) in C? Why / Why not?
#include <pthread.h>
volatile int account_balance;
pthread_mutex_t flag = PTHREAD_MUTEX_INITIALIZER;
void debit(int amount) {
pthread_mutex_lock(&flag);
account_balance -= amount;//Inside critical section
pthread_mutex_unlock(&flag);
}
What about the example or equivalently thinking for semaphore?
Does/Should a variable used with multi-threads be volatile even accessed in critical section(i.e. mutex, semaphore) in C? Why / Why not?
No.
volatile is logically irrelevant for concurrency, because it's not sufficient.
Actually, that's not really true - volatile is not irrelevant because it can hide concurrency problems in your code, so it works "most of the time".
All volatile does is tell the compiler "this variable can change outside the current thread of execution". Volatile in no way enforces any ordering, atomicity, or - critically - visibility. Just because thread 2 on CPU A changes int x, that doesn't mean thread 1 on CPU D can even see the change at any specific time - it has its own cached value, and volatile means almost nothing with respect to memory coherence because it doesn't guarantee ordering.
The last comment at the bottom of the Intel article Volatile: Almost Useless for Multi-Threaded Programming says it best:
If you are simply adding 'volatile' to variables that are shared
between threads thinking that fixes your shared-data problem without
bothering to understand why it may not, you will eventually reap the
reward you deserve.
Yes, lock-free code can make use of volatile. Such code is written by people who can likely write tutorials on the use of volatile, multithreaded code, and other extremely detailed subjects regarding compilers.
No, volatile should not be used on shared variables which are accessed under the protection of pthreads synchronisation functions like pthread_mutex_lock().
The reason is that the synchronisation functions themselves are guaranteed by POSIX to provide all the necessary compiler barriers and synchronisation to ensure consistency (as long as you follow the POSIX rules on concurrent access - ie. that you have used pthreads synchronisation functions to ensure that no thread can be writing to a shared variable whilst another thread is writing to or reading from it).
I have no idea why there's so much misinformation about volatile everywhere on the internet. The answer to your question is yes, you should make variables you use within a critical section volatile.
I'll give a contrived example. Let's say you want to run this function on multiple threads:
int a;
void inc_a(void) {
for (int i = 0; i < 5; ++i) {
a += 5;
}
}
Everybody, as it would seem, on this site will tell you that it's enough to put a += 5 in a critical section like so:
int a;
void inc_a(void) {
for (int i = 0; i < 5; ++i) {
enter_critical_section();
a += 5;
exit_critical_section();
}
}
As I said, it's contrived, but people will tell you this is correct, and it absolutely is not! If the compiler wasn't given prior knowledge as to what the critical section functions are, and what their semantic meaning is, there's nothing stopping the compiler from outputting this code:
int a;
void inc_a(void) {
register int eax = a;
for (int i = 0; i < 5; ++i) {
enter_critical_section();
eax += 5;
exit_critical_section();
}
a = eax;
}
This code produces the same output in a single threaded context, so the compiler is allowed to do that. But in a multithreaded context, this can output anything between 25 and 25 times the thread count. One way to solve this issue is to use an atomic construct, but that has performance implications, instead what you should do is make the variable volatile. That is, unless you want to be like the rest of this community and blindly put your faith in your C compiler.

Static functions declared in "C" header files

For me it's a rule to define and declare static functions inside source files, I mean .c files.
However in very rare situations I saw people declaring it in the header file.
Since static functions have internal linkage we need to define it in every file we include the header file where the function is declared. This looks pretty odd and far from what we usually want when declaring something as static.
On the other hand, if someone naive tries to use that function without defining it, the compiler will complain. So in some sense it is not really unsafe to do this, even if it sounds strange.
My questions are:
What is the problem of declaring static functions in header files?
What are the risks?
What is the impact on compilation time?
Is there any risk in runtime?
First I'd like to clarify my understanding of the situation you describe: The header contains (only) a static function declaration while the C file contains the definition, i.e. the function's source code. For example
some.h:
static void f();
// potentially more declarations
some.c:
#include <stdio.h>
#include "some.h"
static void f() { printf("Hello world\n"); }
// more code, some of it potentially using f()
If this is the situation you describe, I take issue with your remark
Since static functions have internal linkage we need to define it in every file we include the header file where the function is declared.
If you declare the function but do not use it in a given translation unit, I don't think you have to define it. gcc accepts that with a warning; the standard does not seem to forbid it, unless I missed something. This may be important in your scenario because translation units which do not use the function but include the header with its declaration don't have to provide an unused definition.
Now let's examine the questions:
What is the problem of declaring static functions in header files?
It is somewhat unusual. Typically, static functions are functions needed in only one file. They are declared static to make that explicit by limiting their visibility. Declaring them in a header therefore is somewhat antithetical. If the function is indeed used in multiple files with identical definitions it should be made external, with a single definition. If only one translation unit actually uses it, the declaration does not belong in a header.
One possible scenario therefore is to ensure a uniform function signature for different implementations in the respective translation units. The common header leads to a compile time error for different return types in C (and C++); different parameter types would cause a compile time error only in C (but not in C++, because of function overloading).
What are the risks?
I do not see risks in your scenario. (As opposed to also including the function definition in a header which may violate the encapsulation principle.)
What is the impact on compilation time?
A function declaration is small and its complexity is low, so the overhead of having additional function declarations in a header is likely negligible. But if you create and include an additional header for the declaration in many translation units the file handling overhead can be significant (i.e. the compiler idles a lot while it waits for the header I/O)
Is there any risk in runtime? I cannot see any.
This is not an answer to the stated questions, but hopefully shows why one might implement a static (or static inline) function in a header file.
I can personally only think of two good reasons to declare some functions static in a header file:
If the header file completely implements an interface that should only be visible in the current compilation unit
This is extremely rare, but might be useful in e.g. an educational context, at some point during the development of some example library; or perhaps when interfacing to another programming language with minimal code.
A developer might choose to do so if the library or interface implementation is trivial or nearly so, and ease of use (to the developer using the header file) is more important than code size. In these cases, the declarations in the header file often use preprocessor macros, allowing the same header file to be included more than once, providing some sort of crude polymorphism in C.
Here is a practical example: Shoot-yourself-in-the-foot playground for linear congruential pseudorandom number generators. Because the implementation is local to the compilation unit, each compilation unit will get their own copies of the PRNG. This example also shows how crude polymorphism can be implemented in C.
prng32.h:
#if defined(PRNG_NAME) && defined(PRNG_MULTIPLIER) && defined(PRNG_CONSTANT) && defined(PRNG_MODULUS)
#define MERGE3_(a,b,c) a ## b ## c
#define MERGE3(a,b,c) MERGE3_(a,b,c)
#define NAME(name) MERGE3(PRNG_NAME, _, name)
static uint32_t NAME(state) = 0U;
static uint32_t NAME(next)(void)
{
NAME(state) = ((uint64_t)PRNG_MULTIPLIER * (uint64_t)NAME(state) + (uint64_t)PRNG_CONSTANT) % (uint64_t)PRNG_MODULUS;
return NAME(state);
}
#undef NAME
#undef MERGE3
#endif
#undef PRNG_NAME
#undef PRNG_MULTIPLIER
#undef PRNG_CONSTANT
#undef PRNG_MODULUS
An example using the above, example-prng32.c:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#define PRNG_NAME glibc
#define PRNG_MULTIPLIER 1103515245UL
#define PRNG_CONSTANT 12345UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides glibc_state and glibc_next() */
#define PRNG_NAME borland
#define PRNG_MULTIPLIER 22695477UL
#define PRNG_CONSTANT 1UL
#define PRNG_MODULUS 2147483647UL
#include "prng32.h"
/* provides borland_state and borland_next() */
int main(void)
{
int i;
glibc_state = 1U;
printf("glibc lcg: Seed %u\n", (unsigned int)glibc_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)glibc_next());
printf("%u\n", (unsigned int)glibc_next());
borland_state = 1U;
printf("Borland lcg: Seed %u\n", (unsigned int)borland_state);
for (i = 0; i < 10; i++)
printf("%u, ", (unsigned int)borland_next());
printf("%u\n", (unsigned int)borland_next());
return EXIT_SUCCESS;
}
The reason for marking both the _state variable and the _next() function static is that this way each compilation unit that includes the header file has their own copy of the variables and the functions -- here, their own copy of the PRNG. Each must be separately seeded, of course; and if seeded to the same value, will yield the same sequence.
One should generally shy away from such polymorphism attempts in C, because it leads to complicated preprocessor macro shenanigans, making the implementation much harder to understand, maintain, and modify than necessary.
However, when exploring the parameter space of some algorithm (like here, the types of 32-bit linear congruential generators), this lets us use a single implementation for each of the generators we examine, ensuring there are no implementation differences between them. Note that even this case is more like a development tool, and not something you ought to see in an implementation provided for others to use.
If the header implements simple static inline accessor functions
Preprocessor macros are commonly used to simplify code accessing complicated structure types. static inline functions are similar, except that they also provide type checking at compile time, and can refer to their parameters several times (with macros, that is problematic).
One practical use case is a simple interface for reading files using low-level POSIX.1 I/O (using <unistd.h> and <fcntl.h> instead of <stdio.h>). I've done this myself when reading very large (dozens of megabytes to gigabytes range) text files containing real numbers (with a custom float/double parser), as the GNU C standard I/O is not particularly fast.
For example, inbuffer.h:
#ifndef INBUFFER_H
#define INBUFFER_H
#include <stddef.h> /* size_t, NULL */
typedef struct {
unsigned char *head; /* Next buffered byte */
unsigned char *tail; /* Next byte to be buffered */
unsigned char *ends; /* data + size */
unsigned char *data;
size_t size;
int descriptor;
unsigned int status; /* Bit mask */
} inbuffer;
#define INBUFFER_INIT { NULL, NULL, NULL, NULL, 0, -1, 0 }
int inbuffer_open(inbuffer *, const char *);
int inbuffer_close(inbuffer *);
int inbuffer_skip_slow(inbuffer *, const size_t);
int inbuffer_getc_slow(inbuffer *);
static inline int inbuffer_skip(inbuffer *ib, const size_t n)
{
if (ib->head + n <= ib->tail) {
ib->head += n;
return 0;
} else
return inbuffer_skip_slow(ib, n);
}
static inline int inbuffer_getc(inbuffer *ib)
{
if (ib->head < ib->tail)
return *(ib->head++);
else
return inbuffer_getc_slow(ib);
}
#endif /* INBUFFER_H */
Note that the above inbuffer_skip() and inbuffer_getc() do not check if ib is non-NULL; this is typical for such functions. These accessor functions are assumed to be "in the fast path", i.e. called very often. In such cases, even the function call overhead matters (and is avoided with static inline functions, since they are duplicated in the code at the call site).
Trivial accessor functions, like the above inbuffer_skip() and inbuffer_getc(), may also let the compiler avoid the register moves involved in function calls, because functions expect their parameters to be located in specific registers or on the stack, whereas inlined functions can be adapted (wrt. register use) to the code surrounding the inlined function.
Personally, I do recommend writing a couple of test programs using the non-inlined functions first, and comparing the performance and results to the inlined versions. Comparing the results ensures the inlined versions do not have bugs (off-by-one errors are common here!), and comparing the performance and generated binaries (size, at least) tells you whether inlining is worth it in general.
Why would you want a function that is both global and static? In C, functions are global by default. You only use static functions if you want to limit access to a function to the file in which it is declared. So you actively restrict access by declaring it static...
The only requirement for implementations in a header file is for C++ template functions and template class member functions.

Macro in C to call a function returning integer and then return a string

I have a function which returns an integer value. Now I want to write a macro which calls this function, gets the return value, prepends a string to it and returns the resultant string.
I have tried this:
#define TEST(x) is_enabled(x)
I call this macro in the main function as:
int ret = 0;
ret = TEST(2);
printf("PORT-%d\n", ret);
This works perfectly. However I want the macro to return the string PORT-x, where, x is the return value of the called function. How can I do this?
EDIT :
I also tried writing it into multiple lines as:
#define TEST(x)\
{\
is_enabled(x);\
}
And called it in the main function as:
printf("PORT-%d\n", TEST(2));
But this gives a compile time error:
error: expected expression before ‘{’ token
Use a function, not a macro. There is no good reason to use a macro here.
You can solve it by using sprintf(3), in conjunction with malloc or a buffer. See Creating C formatted strings (not printing them) or the man pages for details.
About your edit: You don't need to use braces {} in a macro, and they are causing your error as preprocessing would translate it to something like
printf("format%d", {
is_enabled(x);
});
To better understand macros, run gcc or clang with -E flag, or try to read this article: http://en.wikipedia.org/wiki/C_preprocessor
That's a bit of a pain since you need to ensure there's storage for the string. In all honesty, macros nowadays could be reserved for conditional compilation only.
Constants are better done with enumerated types, and macro functions are generally better as inline functions (with the knowledge that inline is a suggestion to the compiler, not a demand).
If you insist on using a macro, the storage could be done with static storage though that has problems with threads if you're using them, and delayed/multiple use of the returned string.
You could also dynamically allocate the string but then you have to free it when done, and handle out-of-memory conditions.
Perhaps the easiest way is to demand the macro user provide their own storage, along the lines of:
#include <stdio.h>
#define TEST2_STR(b,p) (sprintf(b,"PORT-%d",p),b)
int main (void) {
char buff[20];
puts (TEST2_STR(buff, 42));
return 0;
}
which outputs:
PORT-42
In case the macro seems a little confusing, it makes use of the comma operator, in which the expression (a, b) evaluates both a and b, and has a result of b.
In this case, it evaluates the sprintf (which populates the buffer) then "returns" the buffer. And, even if you think you've never seen that before, you're probably wrong:
for (i = 0, j = 9; i < 10; i++, j--)
xyzzy[i] = plugh[j];
Despite most people thinking that's a feature of for, it's very much a different construct that can be used in many different places:
int i, j, k;
i = 7, j = 4, k = 42;
while (puts("Hello, world"),sleep(1),1);
(and so on).

Keep pthread variables local

Is there a way while using pthread.h on a Linux GCC to keep variables local to the thread-function:
int i = 42; // global instance of i
int main() {
pthread_t threads[2];
long t;
pthread_create(&threads[t], NULL, ThreadFunction, (void *) t;
pthread_create(&threads[t], NULL, ThreadFunction2, (void *) t;
}
I wonder whether there is a parameter at the POSIX function creating the new thread and keeping the variables local:
void *ThreadFunction(void *threadid)
{
int i=0;
i++; // this is a local instance of i
printf("i is %d", i); // as expected: 1
}
void *ThreadFunction2(void *threadid)
{
i += 3; // another local instance -> problem
}
Where afterwards i is 42. Even if I have defined an i previously I want this i not to be within my threads.
In gcc, you can make a global variable thread-local by using the __thread specifier:
__thread int i = 42;
Don't do that. There are better solutions, depending on what you want to do.
Global variables are always available in the whole compilation unit (or even more compilation units if you use external declarations). This has nothing to do with threads, it's the default behavior of C/C++. The recommended solution is not to use globals - globals are evil. If you still need to use globals, you may want to prefix them, such as g_i. Another solution is to put your thread functions into another compilation unit (c file).
The sample code is wrong (by itself) and has undefined behavior. You are trying to read an uninitialized variable t four times - two times to index an array and two times in a cast expression - and depending on the (undefined) meaning of &threads[t], the function pthread_create may cause more UB.
Besides, it's obviously not the code you have used, because the pthread_create calls are lacking closing parentheses.
Regarding the variable i: declaring a new variable i (i.e. int i = 0) in the local scope hides any possible i's in the more broad scope - so there should not be any problems using i locally as a variable name inside the function.
pthread has a notion of thread-local storage, and gcc offers an easy interface to it with the __thread storage class. Such variables suffer from all the problems of global variables, and then some more. But sometimes they are handy, as all other solutions are worse in context.
