How do I start threads in plain C?

I have used fork() in C to start another process. How do I start a new thread?

Since you mentioned fork() I assume you're on a Unix-like system, in which case POSIX threads (usually referred to as pthreads) are what you want to use.
Specifically, pthread_create() is the function you need to create a new thread. Its arguments are:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg);
The first argument points to a pthread_t in which the new thread's ID is stored. The second argument points to the thread attributes, and can be NULL unless you need non-default attributes such as a specific priority. The third argument is the function executed by the thread. The fourth argument is the single argument passed to that function when it runs.
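As a minimal sketch of how the four arguments fit together (the struct job type and the function names below are made up for illustration, they are not part of pthreads), the following starts one thread, passes it a pointer through the fourth argument, and collects the result with pthread_join():

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical record used to pass data in and out of the thread. */
struct job {
    int input;
    int output;
};

/* Thread entry point: arg is the fourth argument given to pthread_create(). */
static void *square_job(void *arg)
{
    struct job *j = arg;
    j->output = j->input * j->input;
    return j;                       /* handed back to pthread_join() */
}

/* Starts one thread, waits for it, and returns the value it computed.
 * Returns -1 if the thread could not be created or joined. */
int run_square_in_thread(int value)
{
    struct job j = { value, 0 };
    pthread_t tid;
    void *ret = NULL;

    /* NULL attributes: the thread is created with default attributes. */
    if (pthread_create(&tid, NULL, square_job, &j) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;

    assert(ret == &j);              /* the thread returned the pointer it was given */
    return j.output;
}
```

Call run_square_in_thread() from your own main() and build with gcc -pthread. Note that j lives on the caller's stack, which is only safe here because the caller joins the thread before returning.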

AFAIK, ANSI C doesn't define threading, but there are various libraries available.
If you are running on Windows, link to msvcrt and use _beginthread or _beginthreadex.
If you are running on other platforms, check out the pthreads library (I'm sure there are others as well).

C11 threads + C11 atomic_int
Added to glibc 2.28. Tested in Ubuntu 18.10 amd64 (comes with glibc 2.28) and Ubuntu 18.04 (comes with glibc 2.27) by compiling glibc 2.28 from source: Multiple glibc libraries on a single host
Example adapted from: https://en.cppreference.com/w/c/language/atomic
main.c
#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>
atomic_int atomic_counter;
int non_atomic_counter;
int mythread(void* thr_data) {
(void)thr_data;
for(int n = 0; n < 1000; ++n) {
++non_atomic_counter;
++atomic_counter;
// for this example, relaxed memory order is sufficient, e.g.
// atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
}
return 0;
}
int main(void) {
thrd_t thr[10];
for(int n = 0; n < 10; ++n)
thrd_create(&thr[n], mythread, NULL);
for(int n = 0; n < 10; ++n)
thrd_join(thr[n], NULL);
printf("atomic %d\n", atomic_counter);
printf("non-atomic %d\n", non_atomic_counter);
}
GitHub upstream.
Compile and run:
gcc -ggdb3 -std=c11 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
Possible output:
atomic 10000
non-atomic 4341
The non-atomic counter is very likely to be smaller than the atomic one due to racy access across threads to the non-atomic variable.
See also: How to do an atomic increment and fetch in C?
Disassembly analysis
Disassemble with:
gdb -batch -ex "disassemble/rs mythread" main.out
contains:
17 ++non_atomic_counter;
0x00000000004007e8 <+8>: 83 05 65 08 20 00 01 addl $0x1,0x200865(%rip) # 0x601054 <non_atomic_counter>
18 __atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
0x00000000004007ef <+15>: f0 83 05 61 08 20 00 01 lock addl $0x1,0x200861(%rip) # 0x601058 <atomic_counter>
so we see that the atomic increment is done at the instruction level with the f0 lock prefix.
With aarch64-linux-gnu-gcc 8.2.0, we get instead:
11 ++non_atomic_counter;
0x0000000000000a28 <+24>: 60 00 40 b9 ldr w0, [x3]
0x0000000000000a2c <+28>: 00 04 00 11 add w0, w0, #0x1
0x0000000000000a30 <+32>: 60 00 00 b9 str w0, [x3]
12 ++atomic_counter;
0x0000000000000a34 <+36>: 40 fc 5f 88 ldaxr w0, [x2]
0x0000000000000a38 <+40>: 00 04 00 11 add w0, w0, #0x1
0x0000000000000a3c <+44>: 40 fc 04 88 stlxr w4, w0, [x2]
0x0000000000000a40 <+48>: a4 ff ff 35 cbnz w4, 0xa34 <mythread+36>
so the atomic version actually has a cbnz loop that runs until the stlxr store succeeds. Note that ARMv8.1 can do all of that with a single LDADD instruction.
This is analogous to what we get with C++ std::atomic: What exactly is std::atomic?
Benchmark
TODO. Create a benchmark to show that atomic is slower.
POSIX threads
main.c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>
enum CONSTANTS {
NUM_THREADS = 1000,
NUM_ITERS = 1000
};
int global = 0;
int fail = 0;
pthread_mutex_t main_thread_mutex = PTHREAD_MUTEX_INITIALIZER;
void* main_thread(void *arg) {
int i;
for (i = 0; i < NUM_ITERS; ++i) {
if (!fail)
pthread_mutex_lock(&main_thread_mutex);
global++;
if (!fail)
pthread_mutex_unlock(&main_thread_mutex);
}
return NULL;
}
int main(int argc, char **argv) {
pthread_t threads[NUM_THREADS];
int i;
fail = argc > 1;
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, main_thread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
assert(global == NUM_THREADS * NUM_ITERS);
return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
./main.out 1
The first run works fine, the second fails due to missing synchronization.
There don't seem to be POSIX standardized atomic operations: UNIX Portable Atomic Operations
Tested on Ubuntu 18.04. GitHub upstream.
GCC __atomic_* built-ins
For those that don't have C11, you can achieve atomic increments with the __atomic_* GCC extensions.
main.c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
enum Constants {
NUM_THREADS = 1000,
};
int atomic_counter;
int non_atomic_counter;
void* mythread(void *arg) {
(void)arg;
for (int n = 0; n < 1000; ++n) {
++non_atomic_counter;
__atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);
}
return NULL;
}
int main(void) {
int i;
pthread_t threads[NUM_THREADS];
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, mythread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
printf("atomic %d\n", atomic_counter);
printf("non-atomic %d\n", non_atomic_counter);
}
Compile and run:
gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c -pthread
./main.out
Output and generated assembly: the same as the "C11 threads" example.
Tested in Ubuntu 16.04 amd64, GCC 6.4.0.

pthreads is a good start, look here

Threads were not part of the C standard before C11, so on older toolchains the only way to use threads is through a library (e.g. POSIX threads on Unix/Linux, _beginthread/_beginthreadex if you want to use the C runtime from that thread, or plain CreateThread from the Win32 API).

Check out the pthread (POSIX thread) library.

Related

What does make errno thread safe? [duplicate]

In errno.h, this variable is declared as extern int errno; so my question is: is it safe to check the errno value after some calls, or to use perror(), in multi-threaded code? Is this a thread-safe variable? If not, what's the alternative?
I am using Linux with gcc on the x86 architecture.

Yes, it is thread safe. On Linux, the global errno variable is thread-specific. POSIX requires that errno be threadsafe.
See http://www.unix.org/whitepapers/reentrant.html
In POSIX.1, errno is defined as an external global variable. But this definition is unacceptable in a multithreaded environment, because its use can result in nondeterministic results. The problem is that two or more threads can encounter errors, all causing the same errno to be set. Under these circumstances, a thread might end up checking errno after it has already been updated by another thread.
To circumvent the resulting nondeterminism, POSIX.1c redefines errno as a service that can access the per-thread error number as follows (ISO/IEC 9945:1-1996, §2.4):
Some functions may provide the error number in a variable accessed through the symbol errno. The symbol errno is defined by including the header <errno.h>, as specified by the C Standard ... For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads.
Also see http://linux.die.net/man/3/errno
errno is thread-local; setting it in one thread does not affect its value in any other thread.
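To see the per-thread behaviour by value rather than by reading headers, here is a small sketch, assuming a POSIX system with pthreads and C11 atomics. The function names and the stage flag are invented for this example. The first thread sets its errno, lets a second thread overwrite its own errno, and then reads its value back:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical handshake flag: 0 = idle, 1 = worker may write, 2 = worker done. */
static atomic_int stage;

static void *clobber_errno(void *arg)
{
    (void)arg;
    while (atomic_load(&stage) != 1)
        ;                           /* spin until the first thread has set its errno */
    errno = ENOMEM;                 /* modifies only this thread's errno */
    atomic_store(&stage, 2);
    return NULL;
}

/* Returns the errno value seen by the calling thread after another
 * thread has assigned a different value to errno. Returns -1 if the
 * helper thread could not be created. */
int errno_seen_by_first_thread(void)
{
    pthread_t tid;
    int seen, rc;

    atomic_store(&stage, 0);
    if (pthread_create(&tid, NULL, clobber_errno, NULL) != 0)
        return -1;

    errno = EINTR;                  /* value owned by the calling thread */
    atomic_store(&stage, 1);
    while (atomic_load(&stage) != 2)
        ;                           /* no library calls here, so nothing else touches errno */
    seen = errno;

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

With a thread-local errno the function returns EINTR; if errno were a single shared int it would return ENOMEM. The busy-wait is used on purpose so that no library call sits between setting and reading errno in the first thread.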

Yes
Errno isn't a simple variable anymore, it's something complex behind the scenes, specifically for it to be thread-safe.
See $ man 3 errno:
ERRNO(3) Linux Programmer’s Manual ERRNO(3)
NAME
errno - number of last error
SYNOPSIS
#include <errno.h>
DESCRIPTION
...
errno is defined by the ISO C standard to be a modifiable lvalue of
type int, and must not be explicitly declared; errno may be a macro.
errno is thread-local; setting it in one thread does not affect its
value in any other thread.
We can double-check:
$ cat > test.c
#include <errno.h>
f() { g(errno); }
$ cc -E test.c | grep ^f
f() { g((*__errno_location ())); }
$

In errno.h, this variable is declared as extern int errno;
Here is what the C standard says:
The macro errno need not be the identifier of an object. It might expand to a modifiable lvalue resulting from a function call (for example, *errno()).
Generally, errno is a macro which calls a function returning the address of the error number for the current thread, then dereferences it.
Here is what I have on Linux, in /usr/include/bits/errno.h:
/* Function to get address of global `errno' variable. */
extern int *__errno_location (void) __THROW __attribute__ ((__const__));
# if !defined _LIBC || defined _LIBC_REENTRANT
/* When using threads, errno is a per-thread value. */
# define errno (*__errno_location ())
# endif
In the end, it generates this kind of code:
> cat essai.c
#include <errno.h>
int
main(void)
{
errno = 0;
return 0;
}
> gcc -c -Wall -Wextra -pedantic essai.c
> objdump -d -M intel essai.o
essai.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: e8 fc ff ff ff call 7 <main+0x7> ; get address of errno in EAX
b: c7 00 00 00 00 00 mov DWORD PTR [eax],0x0 ; store 0 in errno
11: b8 00 00 00 00 mov eax,0x0
16: 89 ec mov esp,ebp
18: 5d pop ebp
19: c3 ret

Yes, as explained in the errno man page and the other replies, errno is a thread-local variable.
However, there is a subtle detail that is easily forgotten: a signal handler that makes a system call should save and restore errno. The handler runs on one of the process's threads, interrupting whatever that thread was doing, and a call made inside the handler can overwrite the errno value that the interrupted code was about to check.
Therefore, signal handlers should save and restore errno. Something like:
void sig_alarm(int signo)
{
int errno_save;
errno_save = errno;
//whatever with a system call
errno = errno_save;
}
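A fuller sketch of the same pattern, assuming a POSIX system. The handler name, the choice of SIGUSR1, and the deliberately failing close(-1) call are only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler that makes a failing system call but leaves errno as it found it. */
static void on_sigusr1(int signo)
{
    int saved_errno = errno;        /* save the interrupted code's errno */
    (void)signo;
    (void)close(-1);                /* fails with EBADF and overwrites errno */
    errno = saved_errno;            /* restore before returning */
}

/* Installs the handler, sets errno, delivers the signal to this thread,
 * and returns the errno value observed afterwards (-1 on setup failure). */
int errno_after_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    errno = ERANGE;
    if (raise(SIGUSR1) != 0)        /* the handler runs before raise() returns */
        return -1;
    return errno;
}
```

Without the save and restore in the handler, the caller would see EBADF instead of the ERANGE it had set. close() is used because it is on the POSIX list of async-signal-safe functions.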

This is from <sys/errno.h> on my Mac:
#include <sys/cdefs.h>
__BEGIN_DECLS
extern int * __error(void);
#define errno (*__error())
__END_DECLS
So errno is now a macro that expands to a dereferenced call to the function __error(). The function is implemented so as to be thread-safe.

We can check by running a simple program on a machine.
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
#define NTHREADS 5
void *thread_function(void *);
int
main(void)
{
    pthread_t thread_id[NTHREADS];
    int i, j;
    /* Start all threads, then wait for each of them. */
    for (i = 0; i < NTHREADS; i++)
    {
        pthread_create(&thread_id[i], NULL, thread_function, NULL);
    }
    for (j = 0; j < NTHREADS; j++)
    {
        pthread_join(thread_id[j], NULL);
    }
    return 0;
}
void *thread_function(void *dummyPtr)
{
    (void)dummyPtr;
    /* pthread_t is an unsigned long on Linux; this cast is not portable. */
    printf("Thread number %lu addr(errno):%p\n",
           (unsigned long)pthread_self(), (void *)&errno);
    return NULL;
}
Run this program and you will see a different address for errno in each thread. The output of a run on my machine looked like:
Thread number 140672336922368 addr(errno):0x7ff0d4ac0698
Thread number 140672345315072 addr(errno):0x7ff0d52c1698
Thread number 140672328529664 addr(errno):0x7ff0d42bf698
Thread number 140672320136960 addr(errno):0x7ff0d3abe698
Thread number 140672311744256 addr(errno):0x7ff0d32bd698
Notice that the address is different for every thread.

On many Unix systems, compiling with -D_REENTRANT ensures that errno is thread-safe.
For example:
#if defined(_REENTRANT) || _POSIX_C_SOURCE - 0 >= 199506L
extern int *___errno();
#define errno (*(___errno()))
#else
extern int errno;
/* ANSI C++ requires that errno be a macro */
#if __cplusplus >= 199711L
#define errno errno
#endif
#endif /* defined(_REENTRANT) */

I think the answer is "it depends". Thread-safe C runtime libraries usually implement errno as a function call (macro expanding to a function) if you're building threaded code with the correct flags.

LD_PRELOAD and linkage

I have this small test code, atfork_demo.c:
#include <stdio.h>
#include <pthread.h>
void hello_from_fork_prepare() {
printf("Hello from atfork prepare.\n");
fflush(stdout);
}
void register_hello_from_fork_prepare() {
pthread_atfork(&hello_from_fork_prepare, 0, 0);
}
Now, I compile it in two different ways:
gcc -shared -fPIC atfork_demo.c -o atfork_demo1.so
gcc -shared -fPIC atfork_demo.c -o atfork_demo2.so -lpthread
My demo main atfork_demo_main.c is this:
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
int main(int argc, const char** argv) {
if(argc <= 1) {
printf("usage: ... lib.so\n");
return 1;
}
void* plib = dlopen("libpthread.so.0", RTLD_NOW|RTLD_GLOBAL);
if(!plib) {
printf("cannot load pthread, error %s\n", dlerror());
return 1;
}
void* lib = dlopen(argv[1], RTLD_LAZY);
if(!lib) {
printf("cannot load %s, error %s\n", argv[1], dlerror());
return 1;
}
void (*reg)();
reg = dlsym(lib, "register_hello_from_fork_prepare");
if(!reg) {
printf("did not found func, error %s\n", dlerror());
return 1;
}
reg();
fork();
}
Which I compile like this:
gcc atfork_demo_main.c -o atfork_demo_main.exec -ldl
Now, I have another small demo atfork_patch.c where I want to override pthread_atfork:
#include <stdio.h>
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void)) {
printf("Ignoring pthread_atfork call!\n");
fflush(stdout);
return 0;
}
Which I compile like this:
gcc -shared -O2 -fPIC atfork_patch.c -o atfork_patch.so
And then I set LD_PRELOAD=./atfork_patch.so, and do these two calls:
./atfork_demo_main.exec ./atfork_demo1.so
./atfork_demo_main.exec ./atfork_demo2.so
In the first case, the LD_PRELOAD-override of pthread_atfork worked and in the second, it did not. I get the output:
Ignoring pthread_atfork call!
Hello from atfork prepare.
So, now to the question(s):
Why did it not work in the second case?
How can I make it work also in the second case, i.e. also override it?
In my real use case, atfork_demo is some library which I cannot change. I also cannot change atfork_demo_main but I can make it load any other code. I would prefer if I can just do it with some change in atfork_patch.
You get some more debug output if you also use LD_DEBUG=all. Maybe interesting is this bit, for the second case:
841: symbol=__register_atfork; lookup in file=./atfork_demo_main.exec [0]
841: symbol=__register_atfork; lookup in file=./atfork_patch_extended.so [0]
841: symbol=__register_atfork; lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
841: symbol=__register_atfork; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
841: binding file ./atfork_demo2.so [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `__register_atfork' [GLIBC_2.3.2]
So, it searches for the symbol __register_atfork. I added that to atfork_patch_extended.so but it doesn't find it and uses it from libc instead. How can I make it find and use my __register_atfork?
As a side note, my main goal is to ignore the atfork handlers when fork() is called, but that is a separate question. One solution to that, which seems to work, is to override fork() itself like this:
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>
pid_t fork(void) {
return syscall(SYS_clone, SIGCHLD, 0);
}

Before answering this question, I would stress that this is a really bad idea for any production application.
If you are using a third party library that puts such constraints in place, then think about an alternative solution, such as forking early to maintain a "helper" process, with a pipe between you and it... then, when you need to call exec(), you can request that it does the work (fork(), exec()) on your behalf.
Patching or otherwise side-stepping a library facility such as pthread_atfork() is just asking for trouble (missed events, memory leaks, crashes, etc.).
As @Sergio pointed out, pthread_atfork() is actually built into atfork_demo2.so, so you can't do anything to override it. However, examining the disassembly or source of pthread_atfork() gives you a decent hint about how to achieve what you're asking:
0000000000000830 <__pthread_atfork>:
830: 48 8d 05 f9 07 20 00 lea 0x2007f9(%rip),%rax # 201030 <__dso_handle>
837: 48 85 c0 test %rax,%rax
83a: 74 0c je 848 <__pthread_atfork+0x18>
83c: 48 8b 08 mov (%rax),%rcx
83f: e9 6c fe ff ff jmpq 6b0 <__register_atfork@plt>
844: 0f 1f 40 00 nopl 0x0(%rax)
848: 31 c9 xor %ecx,%ecx
84a: e9 61 fe ff ff jmpq 6b0 <__register_atfork@plt>
or the source (from here):
int
pthread_atfork (void (*prepare) (void),
void (*parent) (void),
void (*child) (void))
{
return __register_atfork (prepare, parent, child, &__dso_handle == NULL ? NULL : __dso_handle);
}
As you can see, pthread_atfork() does nothing aside from calling __register_atfork()... so patch that instead!
The content of atfork_patch.c now becomes: (using __register_atfork()'s prototype, from here / here)
#include <stdio.h>
int __register_atfork (void (*prepare) (void), void (*parent) (void),
void (*child) (void), void *dso_handle) {
printf("Ignoring pthread_atfork call!\n");
fflush(stdout);
return 0;
}
This works for both demos:
$ LD_PRELOAD=./atfork_patch.so ./atfork_demo_main.exec ./atfork_demo1.so
Ignoring pthread_atfork call!
$ LD_PRELOAD=./atfork_patch.so ./atfork_demo_main.exec ./atfork_demo2.so
Ignoring pthread_atfork call!

It doesn't work for the second case because there is nothing to override. Your second library has pthread_atfork statically linked in from the pthread library:
$ readelf --symbols atfork_demo1.so | grep pthread_atfork
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND pthread_atfork
54: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND pthread_atfork
$ readelf --symbols atfork_demo2.so | grep pthread_atfork
41: 0000000000000000 0 FILE LOCAL DEFAULT ABS pthread_atfork.c
47: 0000000000000830 31 FUNC LOCAL DEFAULT 12 __pthread_atfork
49: 0000000000000830 31 FUNC LOCAL DEFAULT 12 pthread_atfork
So it will use its local pthread_atfork every time, regardless of LD_PRELOAD or any other loaded libraries.
How to overcome that? For the configuration described it does not look possible, since you would need to modify either the atfork_demo library or the main executable anyway.

Why doesn't ptrace SINGLESTEP work properly?

I am trying to trace a little program using the ptrace API. I found that the tracer produces wrong results every time it runs. This is the disassembly of the short program that I want to trace:
$ objdump -d -M intel inc_reg16
inc_reg16: file format elf32-i386
Disassembly of section .text:
08048060 <.text>:
8048060: b8 00 00 00 00 mov eax,0x0
8048065: 66 40 inc ax
8048067: 75 fc jne 0x8048065
8048069: 89 c3 mov ebx,eax
804806b: b8 01 00 00 00 mov eax,0x1
8048070: cd 80 int 0x80
and here is code of the tracer itself:
// ezptrace.c
#include <sys/user.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
int main() {
pid_t child;
child = fork();
if (child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execv("inc_reg16", NULL);
}
else {
int status;
wait(&status);
struct user_regs_struct regs;
while (1) {
ptrace(PTRACE_GETREGS, child, NULL, &regs);
printf("eip: %x\n", (unsigned int) regs.eip);
ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
waitpid(child, &status, 0);
if(WIFEXITED(status)) break;
}
printf("end\n");
}
return 0;
}
The tracer's job is to single-step the inc_reg16 program and log the address of each processor instruction it encounters. When I count how many times the instruction 'inc ax' was encountered, the number turns out to be different on each run of the tracer:
$ gcc ezptrace.c -Wall -o ezptrace
$ ./ezptrace > inc_reg16.log
$ grep '8048065' inc_reg16.log | wc -l
65498
the second check:
$ ./ezptrace > inc_reg16.log
$ grep '8048065' inc_reg16.log | wc -l
65494
The problem is that both results should be 65536, as the instruction 'inc ax' is executed exactly 65536 times. So the question is: is there a mistake in my code, or is this a bug in ptrace? Your help is greatly appreciated.

I tried the same program under both VirtualBox and VMware. Only VMware gives the correct result, whereas VirtualBox shows the same problem as you. I used VirtualBox 4.2.1.

eip is the address of the "current instruction" in user space. To obtain the actual instruction, follow ptrace(...GETREGS, ...) with a ptrace(...PEEKDATA, ...) at that address. Also keep in mind that ptrace(...PEEKDATA, ...) always returns a full machine word, while the actual opcode usually occupies only the low 16/32 bits of it.

How to do an atomic increment and fetch in C?

I'm looking for a way to atomically increment a short, and then return that value. I need to do this both in kernel mode and in user mode, so it's in C, under Linux, on Intel 32bit architecture. Unfortunately, due to speed requirements, a mutex lock isn't going to be a good option.
Is there any other way to do this? At this point, it seems like the only option available is to inline some assembly. If that's the case, could someone point me towards the appropriate instructions?

GCC __atomic_* built-ins
As of GCC 4.8, __sync built-ins have been deprecated in favor of the __atomic built-ins: https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/_005f_005fatomic-Builtins.html
They implement the C++ memory model, and std::atomic uses them internally.
The following POSIX threads example fails consistently with ++ on x86-64, and always works with __atomic_fetch_add.
main.c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
enum CONSTANTS {
NUM_THREADS = 1000,
NUM_ITERS = 1000
};
int global = 0;
void* main_thread(void *arg) {
int i;
for (i = 0; i < NUM_ITERS; ++i) {
__atomic_fetch_add(&global, 1, __ATOMIC_SEQ_CST);
/* This fails consistently. */
/*global++*/;
}
return NULL;
}
int main(void) {
int i;
pthread_t threads[NUM_THREADS];
for (i = 0; i < NUM_THREADS; ++i)
pthread_create(&threads[i], NULL, main_thread, NULL);
for (i = 0; i < NUM_THREADS; ++i)
pthread_join(threads[i], NULL);
assert(global == NUM_THREADS * NUM_ITERS);
return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 -Wall -Wextra -pedantic -o main.out ./main.c -pthread
./main.out
Disassembly analysis at: How do I start threads in plain C?
Tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28.
C11 _Atomic
In GCC 5.1, the above code works with:
_Atomic int global = 0;
global++;
And C11 threads.h was added in glibc 2.28, which lets you create threads in standard C11 without POSIX. Minimal runnable example: How do I start threads in plain C?
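Since the question asks for incrementing a short and getting the new value back, here is a user-space sketch with C11 atomics. It assumes a compiler with <stdatomic.h>, and the names increment_and_fetch and run_counter_demo are made up for the example. atomic_fetch_add() returns the old value, so the new value is the result plus one. Kernel code would need the kernel's own atomic API instead, which this sketch does not cover.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static _Atomic short counter;

/* Atomically adds 1 to *p and returns the NEW value.
 * atomic_fetch_add() itself returns the value before the addition. */
short increment_and_fetch(_Atomic short *p)
{
    return (short)(atomic_fetch_add(p, 1) + 1);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; ++i)
        increment_and_fetch(&counter);
    return NULL;
}

/* Runs 4 threads that each increment the shared counter 1000 times and
 * returns the final count, or -1 if a thread could not be created. */
short run_counter_demo(void)
{
    pthread_t threads[4];
    int rc;

    atomic_store(&counter, 0);
    for (int i = 0; i < 4; ++i)
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            return -1;
    for (int i = 0; i < 4; ++i) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return atomic_load(&counter);
}
```

The thread and iteration counts are kept small so the total (4000) fits comfortably in a short. With the GCC built-ins shown earlier, __atomic_add_fetch(&x, 1, __ATOMIC_SEQ_CST) returns the new value directly.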

GCC supports atomic operations:
gcc atomics
