Is changing a pointer considered an atomic action in C?

Is changing a pointer considered an atomic action in C? - c

If I have a multi-threaded program that reads a cache-type memory by reference. Can I change this pointer by the main thread without risking any of the other threads reading unexpected values.
As I see it, if the change is atomic the other threads will either read the older value or the newer value; never random memory (or null pointers), right?
I am aware that I should probably use synchronisation methods anyway, but I'm still curious.
Are pointer changes atomic?
Update: My platform is 64-bit Linux (2.6.29), although I'd like a cross-platform answer as well :)

As others have mentioned, there is nothing in the C language that guarantees this, and it is dependent on your platform.
On most contemporary desktop platforms, the read/write to a word-sized, aligned location will be atomic. But that really doesn't solve your problem, due to processor and compiler re-ordering of reads and writes.
For example, the following code is broken:
Thread A:
DoWork();
workDone = 1;
Thread B:
while(workDone != 0);
ReceiveResultsOfWork();
Although the write to workDone is atomic, on many systems there is no guarantee by the processor that the write to workDone will be visible to other processors before writes done via DoWork() are visible. The compiler may also be free to re-order the write to workDone to before the call to DoWork(). In both cases, ReceiveResultsOfWork() might start working on incomplete data.
Depending on your platform, you may need to insert memory fences and so on to ensure proper ordering. This can be very tricky to get right.
Or just use locks. Much simpler, much easier to verify as correct, and in most cases more than performant enough.

The C language says nothing about whether any operations are atomic. I've worked on microcontrollers with 8 bit buses and 16-bit pointers; any pointer operation on these systems would potentially be non-atomic. I think I remember Intel 386s (some of which had 16-bit buses) raising similar concerns. Likewise, I can imagine systems that have 64-bit CPUs, but 32-bit data buses, which might then entail similar concerns about non-atomic pointer operations. (I haven't checked to see whether any such systems actually exist.)
EDIT: Michael's answer is well worth reading. Bus size vs. pointer size is hardly the only consideration regarding atomicity; it was simply the first counterexample that came to mind for me.

You didn't mention a platform. So I think a slightly more accurate question would be
Are pointer changes guaranteed to be atomic?
The distinction is necessary because different C/C++ implementations may vary in this behavior. It's possible for a particular platform to guarantee atomic assignments and still be within the standard.
As to whether or not this is guaranteed overall in C/C++, the answer is No. The C standard makes no such guarantees. The only way to guarantee a pointer assignment is atomic is to use a platform specific mechanism to guarantee the atomicity of the assignment. For instance the Interlocked methods in Win32 will provide this guarantee.
Which platform are you working on?

The cop-out answer is that the C spec does not require a pointer assignment to be atomic, so you can't count on it being atomic.
The actual answer would be that it probably depends on your platform, compiler, and possibly the alignment of the stars on the day you wrote the program.

'normal' pointer modification isn't guaranteed to be atomic.
check 'Compare and Swap' (CAS) and other atomic operations, not a C standard, but most compilers have some access to the processor primitives. in the GNU gcc case, there are several built-in functions

The only thing guaranteed by the standard is the sig_atomic_t type.
As you've seen from the other answers, it is likely to be OK when targeting generic x86 architecture, but very risky with more "specialty" hardware.
If you're really desperate to know, you can compare sizeof(sig_atomic_t) to sizeof(int*) and see what they are you your target system.

It turns out to be quite a complex question. I asked a similar question and read everything I got pointed to. I learned a lot about how caching works in modern architectures, and didn't find anything that was definitive. As others have said, if the bus width is smaller than the pointer bit-width, you might be in trouble. Specifically if the data falls across a cache-line boundary.
A prudent architecture will use a lock.

Related

Is assigning a pointer in C program considered atomic on x86-64

https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html#Atomic-Types says - In practice, you can assume that int is atomic. You can also assume that pointer types are atomic; that is very convenient. Both of these assumptions are true on all of the machines that the GNU C Library supports and on all POSIX systems we know of.
My question is whether pointer assignment can be considered atomic on x86_64 architecture for a C program compiled with gcc m64 flag. OS is 64bit Linux and CPU is Intel(R) Xeon(R) CPU D-1548. One thread will be setting a pointer and another thread accessing the pointer. There is only one writer thread and one reader thread. Reader should either be getting the previous value of the pointer or the latest value and no garbage value in between.
If it is not considered atomic, please let me know how can I use the gcc atomic builtins or maybe memory barrier like __sync_synchronize to achieve the same without using locks. Interested only in C solution and not C++. Thanks!

Bear in mind that atomicity alone is not enough for communicating between threads. Nothing prevents the compiler and CPU from reordering previous/subsequent load and store instructions with that "atomic" store. In old days people used volatile to prevent that reordering but that was never intended for use with threads and doesn't provide means to specify less or more restrictive memory order (see "Relationship with volatile" in there).
You should use C11 atomics because they guarantee both atomicity and memory order.

For almost all architectures, pointer load and store are atomic. A once notable exception was 8086/80286 where pointers could be seg:offset; there was an l[des]s instruction which could make an atomic load; but no corresponding atomic store.
The integrity of the pointer is only a small concern; your bigger issue revolves around synchronization: the pointer was at value Y, you set it to X; how will you know when nobody is using the (old) Y value?
A somewhat related problem is that you may have stored things at X, which the other thread expects to find. Without synchronization, other might see the new pointer value, however what it points to might not be up to date yet.

A plain global char *ptr should not be considered atomic. It might work sometimes, especially with optimization disabled, but you can get the compiler to make safe and efficient optimized asm by using modern language features to tell it you want atomicity.
Use C11 stdatomic.h or GNU C __atomic builtins. And see Why is integer assignment on a naturally aligned variable atomic on x86? - yes the underlying asm operations are atomic "for free", but you need to control the compiler's code-gen to get sane behaviour for multithreading.
See also LWN: Who's afraid of a big bad optimizing compiler? - weird effects of using plain vars include several really bad well-known things, but also more obscure stuff like invented loads, reading a variable more than once if the compiler decides to optimize away a local tmp and load the shared var twice, instead of loading it into a register. Using asm("" ::: "memory") compiler barriers may not be sufficient to defeat that depending on where you put them.
So use proper atomic stores and loads that tell the compiler what you want: You should generally use atomic loads to read them, too.
#include <stdatomic.h> // C11 way
_Atomic char *c11_shared_var; // all access to this is atomic, functions needed only if you want weaker ordering
void foo(){
atomic_store_explicit(&c11_shared_var, newval, memory_order_relaxed);
}
char *plain_shared_var; // GNU C
// This is a plain C var. Only specific accesses to it are atomic; be careful!
void foo() {
__atomic_store_n(&plain_shared_var, newval, __ATOMIC_RELAXED);
}
Using __atomic_store_n on a plain var is the functionality that C++20 atomic_ref exposes. If multiple threads access a variable for the entire time that it needs to exist, you might as well just use C11 stdatomic because every access needs to be atomic (not optimized into a register or whatever). When you want to let the compiler load once and reuse that value, do char *tmp = c11_shared_var; (or atomic_load_explicit if you only want acquire instead of seq_cst; cheaper on a few non-x86 ISAs).
Besides lack of tearing (atomicity of asm load or store), the other key parts of _Atomic foo * are:
The compiler will assume that other threads may have changed memory contents (like volatile effectively implies), otherwise the assumption of no data-race UB will let the compiler hoist loads out of loops. Without this, dead-store elimination might only do one store at the end of a loop, not updating the value multiple times.
The read side of the problem is usually what bites people in practice, see Multithreading program stuck in optimized mode but runs normally in -O0 - e.g. while(!flag){} becomes if(!flag) infinite_loop; with optimization enabled.
Ordering wrt. other code. e.g. you can use memory_order_release to make sure that other threads that see the pointer update also see all changes to the pointed-to data. (On x86 that's as simple as compile-time ordering, no extra barriers needed for acquire/release, only for seq_cst. Avoid seq_cst if you can; mfence or locked operations are slow.)
Guarantee that the store will compile to a single asm instruction. You'd be depending on this. It does happen in practice with sane compilers, although it's conceivable that a compiler might decide to use rep movsb to copy a few contiguous pointers, and that some machine somewhere might have a microcoded implementation that does some stores narrower than 8 bytes.
(This failure mode is highly unlikely; the Linux kernel relies on volatile load/store compiling to a single instruction with GCC / clang for its hand-rolled intrinsics. But if you just used asm("" ::: "memory") to make sure a store happened on a non-volatile variable, there's a chance.)
Also, something like ptr++ will compile to an atomic RMW operation like lock add qword [mem], 4, rather than separate load and store like volatile would. (See Can num++ be atomic for 'int num'? for more about atomic RMWs). Avoid that if you don't need it, it's slower. e.g. atomic_store_explicit(&ptr, ptr + 1, mo_release); - seq_cst loads are cheap on x86-64 but seq_cst stores aren't.
Also note that memory barriers can't create atomicity (lack of tearing), they can only create ordering wrt other ops.
In practice x86-64 ABIs do have alignof(void*) = 8 so all pointer objects should be naturally aligned (except in a __attribute__((packed)) struct which violates the ABI, so you can use __atomic_store_n on them. It should compile to what you want (plain store, no overhead), and meet the asm requirements to be atomic.
See also When to use volatile with multi threading? - you can roll your own atomics with volatile and asm memory barriers, but don't. The Linux kernel does that, but it's a lot of effort for basically no gain, especially for a user-space program.
Side note: an often repeated misconception is that volatile or _Atomic are needed to avoid reading stale values from cache. This is not the case.
All machines that run C11 threads across multiple cores have coherent caches, not needing explicit flush instructions in the reader or writer. Just ordinary load or store instructions, like x86 mov. The key is to not let the compiler keep values of shared variable in CPU registers (which are thread-private). It normally can do this optimization because of the assumption of no data-race Undefined Behaviour. Registers are very much not the same thing as L1d CPU cache; managing what's in registers vs. memory is done by the compiler, while hardware keeps cache in sync. See When to use volatile with multi threading? for more details about why coherent caches is sufficient to make volatile work like memory_order_relaxed.
See Multithreading program stuck in optimized mode but runs normally in -O0 for an example.

"Atomic" is treated as this quantum state where something can be both atomic and not atomic at the same time because "it's possible" that "some machines" "somewhere" "might not" write "a certain value" atomically. Maybe.
That is not the case. Atomicity has a very specific meaning, and it solves a very specific problem: threads being pre-empted by the OS to schedule another thread in its place on that core. And you cannot stop a thread from executing mid-assembly instruction.
What that means is that any single assembly instruction is "atomic" by definition. And since you have registry moving instructions, any register-sized copy is atomic by definition. That means a 32-bit integer on a 32-bit CPU, and a 64-bit integer on a 64-bit CPU are all atomic -- and of course that includes pointers (ignore all the people who will tell you "some architectures" have pointers of "different size" than registers, that hasn't been the case since 386).
You should however be careful not to hit variable caching problems (ie one thread writing a pointer, and another trying to read it but getting an old value from the cache), use volatile as needed to prevent this.

Race condition when accessing adjacent members in a shared struct, according to CERT coding rule POS49-C?

According to CERT coding rule POS49-C it is possible that different threads accessing different fields of the same structure may conflict.
Instead of bit-field, I use regular unsigned int.
struct multi_threaded_flags {
unsigned int flag1;
unsigned int flag2;
};
struct multi_threaded_flags flags;
void thread1(void) {
flags.flag1 = 1;
}
void thread2(void) {
flags.flag2 = 2;
}
I can see that even unsigned int, there can still be racing condition IF compiler decides to use load/store 8 bytes instead of 4 bytes.
I think compiler will never do that and racing condition will never happen here, but that's completely just my guess.
Is there any well-defined assembly/compiler documentation regarding this case ? I hope locking, which is costly, is the last resort when this situation happens to be undefined.
FYI, I use gcc.

The C11 memory model guarantees that accesses to distinct structure members (which aren't part of a bit-field) are independent, so you'll run into no problems modifying the two flags from different threads (i.e., the "load 8 bytes, modify 4, and write back 8" scenario is not allowed).
This guarantee does not extend in general to bitfields, so you have to be careful there.
Of course, if you are concurrently modifying the same flag from more than one thread, you'll likely trigger the prohibition against data races, so don't do that.

Before C11, ISO C had nothing to say about threads, and writing multi-threaded code relied on other standards (e.g. POSIX which defines a memory model for pthreads), and multi-threaded code essentially depended on the way real compilers worked.
Note that this CERT coding-standard rule is in the POSIX section, and appears to be about pthreads without C11. (There's a CON32-C. Prevent data races when accessing bit-fields from multiple threads rule for C11, where they solve the bit-field concurrency problem by simply promoting the bit-fields to unsigned char, which C11 defines as "separate memory locations". This rule appears to be an insufficiently-edited copy of that, because many of its suggestions suck.)
But unfortunately POSIX pthreads doesn't clearly define what a "memory location is", and this is all they have to say on the subject:
Single UNIX® Specification, Version 4, 2016 Edition (online HTML copy, requires free registration)
4.12 Memory Synchronization
Applications shall ensure that access to any memory location by more
than one thread of control (threads or processes) is restricted such
that no thread of control can read or modify a memory location while
another thread of control may be modifying it. Such access is
restricted using functions that synchronize thread execution and also
synchronize memory with respect to other threads.
This is why C11 defines it more clearly, where only bitfields are dangerous to write from different threads (barring compiler bugs).
However, I think everyone (including all compilers) agreed that separate int variables / struct members / array elements were separate "memory locations". Most real-world software doesn't take any special precautions for int or char variables that may be written by separate threads (especially outside of structs).
A compiler that gets int wrong will cause problems all over the place unless the bug is limited to very specific circumstances.
Most bugs like this are very hard to detect with testing, because usually the other data that's non-atomically loaded and stored back isn't written by another thread very often / ever. But if a compiler always did that for every int, problems would show up in some software pretty quickly.
Normally, separate char members would also be considered separate "memory locations", but some pre-C11 implementations might have exceptions to that rule. (Especially on early Alpha AXP which famously has no byte store instruction (so a C11 implementation would have to use 32-bit char), but optimizations that invent writes when updating multiple members can happen anywhere, either by accident or because the compiler developers define "memory location" as a 32 or 64-bit word.)
There's also the issue of compiler bugs. This can affect even compilers that intend to conform to C11. For example gcc bug 52080 which affected some non-x86 architectures. (Discovered in gcc4.7 in 2012, fixed in gcc4.8 a couple months later). Using a bitfield "tricked" the compiler into doing a non-atomic read-modify-write of the containing 64-bit word, even though that included a non-bitfield member. (Bitfields are bait for compiler bugs. Any defensive / safe-coding standard should recommend avoiding them in structs where different members can be modified from different threads. And definitely don't put them next to the actual lock.)
Herb Sutter's talk atomic<> Weapons: The C++ Memory Model and Modern Hardware part 2 goes into some detail about the kinds of compiler bugs that have affected multi-threaded code. Most of these should be shaken out by now (2017) if you're using a modern compiler. Most things like inventing writes (or non-atomic read and write-back of the same value) were usually still considered bugs before C11; C11 mostly just firmed up the rules compilers were already trying to follow. It also made it easier to report such bugs, because you could say unequivocally that it violates the standard instead of just "it breaks my code".
That coding-rule article is poorly written. Its examples with adjacent bit-fields are unsafe, but it claims that all variables are at risk. This is not true in general, especially not with C11. Many users of pthreads can or already do compile with C11 compilers.
(The phrasing I'm referring to is "Bit-fields are especially prone to this behavior", which incorrectly implies that this is allowed to happen with ordinary members of structs, or variables that happen to be adjacent outside of structs)
It's part of a defensive coding standard, but it should definitely make the distinction between what the standards require and what is just belt-and-suspenders defense against compiler bugs.
Also, putting variables that will usually be accessed by different threads into one struct is generally terrible. False sharing of a cache line (typically 64 bytes) is really bad for performance, causing cache misses and (on out-of-order x86 CPUs) memory-ordering mis-speculation (like a branch mispredict requiring a roll-back.) Putting separately-used shared variables into the same byte with bit-fields is even worse, because it prevents efficient stores (any store has to be a RMW of the containing byte).
Solving the bit-field problem by promoting the two bit-fields to unsigned char makes much more sense than using a mutex if they need to be independently writeable from separate threads. Or even unsigned long if you're paranoid.
If the two members are often used together, it makes sense to put them nearby. But if you're going to pad to a whole long to contain both members (like that article does), you might as well make them at least unsigned char or bool instead of 1-byte bitfields.
Although honestly, having two threads modify separate members of a struct at the same time seems like poor design unless one of the members is the lock, and the modification is part of an attempt to take the lock. Using a bit-field as a lock is a bad idea unless you're writing for a specific ISA building and your own lock primitive using something like x86's lock bts instruction to atomically test-and-set a bit. Even then it's a bad idea unless you need to pack it with other bitfields for space saving; the Linux code that exposed the gcc bug with an int lock:1 member was a horrible idea.
In addition, the flags are declared volatile to ensure that the compiler will not attempt to move operations on them outside the mutex.
If your compiler needs this, your compiler is seriously broken, and will create broken code for most multi-threaded programs. (Unless the compiler bug only happens with bit-fields, because shared bit-fields are rare).
Most code doesn't make shared variables volatile, and relies on the guarantee that mutex lock/unlock stops operations from reordering at compile or run time out of the critical section.
Back in 2012, and possibly still today, gcc -pthread might affect code-gen choices in C89/C99 mode (-std=gnu99). In discussion on an LWN article about that gcc bug, this user claimed that -pthread would prohibit the compiler from doing a 64-bit load/store when modifying a 32-bit variable, but that without -pthread it could do so (although on most architectures, IDK why it would). But it turns out that gcc bug manifested even with -pthread, so it was really a bug rather than an aggressive optimization choice.
ISO C11 standard:
N1570, section 3.14 definitions:
memory location: either an object of scalar type, or a maximal sequence of adjacent bit-fields all having nonzero width
NOTE 1 Two threads of execution can update and access separate memory locations without interfering with each other.
A bit-field and an adjacent non-bit-field member are in separate memory locations. ... It is not safe to concurrently update two non-atomic bit-fields in the same structure if all members declared between them are also (non-zero-length) bit-fields, no matter what the sizes of those intervening bit-fields happen to be.
(...gives an example of a struct with bit-fields...)
So in C11, you can't assume anything about the compiler munging other bit-fields when writing one bit-field, but otherwise you're safe. Unless you use a separator :0 field to force the compiler pad enough (or use atomic bit-ops) so that it can update your bit-field without concurrency problems for other fields. But if you want to be safe, it's probably not a good idea to use bit-fields at all in structs that are written by multiple threads at once.
See also other Notes in the C11 standard, e.g. the one linked by #Fuz in Is it well-defined behavior to modify one element of an array while another thread modifies another element of the same array? that explicitly says that compiler transformations that would make this dangerous are disallowed.

How does sig_atomic_t actually work?

How does the compiler or OS distinguish between sig_atomic_t type and a normal int type variable, and ensures that the operation will be atomic? Programs using both have same assembler code. How extra care is taken to make the operation atomic?

sig_atomic_t is not an atomic data type. It is just the data type that you are allowed to use in the context of a signal handler, that is all. So better read the name as "atomic relative to signal handling".
To guarantee communication with and from a signal handler, only one of the properties of atomic data types is needed, namely the fact that read and update will always see a consistent value. Other data types (such as perhaps long long) could be written with several assembler instructions for the lower and higher part, e.g. sig_atomic_t is guaranteed to be read and written in one go.
So a platform may choose any integer base type as sig_atomic_t for which it can make the guarantee that volatile sig_atomic_t can be safely used in signal handlers. Many platforms chose int for this, because they know that for them int is written with a single instruction.
The latest C standard, C11, has atomic types, but which are a completely different thing. Some of them (those that are "lockfree") may also be used in signal handlers, but that again is a completely different story.

Note that sig_atomic_t is not thread-safe, only async-signal safe.
Atomics involve two types of barriers:
Compiler barrier. It makes sure that the compiler does not reorder reads/writes from/to an atomic variable relative to reads and writes to other variables. This is what volatile keyword does.
CPU barrier and visibility. It makes sure that the CPU does not reorder reads and writes. On x86 all loads and stores to aligned 1,2,4,8-byte storage are atomic. Visibility makes sure that stores become visible to other threads. Again, on Intel CPUs, stores are visible immediately to other threads due to cache coherence and memory coherence protocol MESI. But that may change in the future. See §8.1 LOCKED ATOMIC OPERATIONS in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A for more details.
For comprehensive treatment of the subject watch atomic Weapons: The C++ Memory Model and Modern Hardware.

sig_atomic_t is often just a typedef (to some system specific integral type, generally int or long). And it is very important to use volatile sig_atomic_t (not just sig_atomic_t alone).
When you add the volatile keyword, the compiler has to avoid a lot of optimizations.
The recent C11 standard added _Atomic and <stdatomic.h>. You need a very recent GCC (e.g. 4.9) to have it supported.

Programs using both have same assembler code. How extra care is taken to make the operation atomic?
Although this is an old question, I think it's still worth addressing this part of the question specifically. On Linux, sig_atomic_t is provided by glibc. sig_atomic_t in glibc is a typedef for int and has no special treatment (as of this post). The glibc docs address this:
In practice, you can assume that int is atomic. You can also assume
that pointer types are atomic; that is very convenient. Both of these
assumptions are true on all of the machines that the GNU C Library
supports and on all POSIX systems we know of.
In other words, it just so happens that regular int already satisfies the requirements of sig_atomic_t on all the platforms that glibc supports and no special support is needed. Nonetheless, the C and POSIX standards mandate sig_atomic_t because there could be some exotic machine on which we want to implement C and POSIX for which int does not fulfill the requirements of sig_atomic_t.

This data type seems to be atomic.
From here:
24.4.7.2 Atomic Types To avoid uncertainty about interrupting access to a variable, you can use a particular data type for which access is
always atomic: sig_atomic_t. Reading and writing this data type is
guaranteed to happen in a single instruction, so there’s no way for a
handler to run “in the middle” of an access.
The type sig_atomic_t is always an integer data type, but which one it
is, and how many bits it contains, may vary from machine to machine.
Data Type: sig_atomic_t This is an integer data type. Objects of this
type are always accessed atomically.
In practice, you can assume that int is atomic. You can also assume
that pointer types are atomic; that is very convenient. Both of these
assumptions are true on all of the machines that the GNU C Library
supports and on all POSIX systems we know of.

It pays off to have studied some kernel-development-level memory models...
Anyway, sig_atomic_t is atomic. The normal definition of atomic is that you can't get a "partial" result, e.g. due to concurrent writes, or concurrent read and write. Attaching any other properties to "atomic" is dangerous, and causes the type of confusion seen here.
So, when you do any sort of sig_atomic_t store, you are guaranteed to either get the old value, or the new value when something reads it back -- be it before, during, or after that store.
Answering your direct question about "how that works": the compiler will use an underlying type size and issue extra machine instructions where required, to signal the CPU that it must do an atomic store and atomic read.
All that said, it is important to note that you really can't say much about whether you will get the old or the new value when you try to read an atomic variable like sig_atomic_t. All you know is that you will not get a mix of two different stores that raced each other, nor a mix of the old and the new value while a store is happening concurrently with your read.
In C, you also normally need to declare variables as "volatile sig_atomic_t" because otherwise the compiler has no reason to not cache it, and you could be using an older value for longer than expected: the compiler has no reason to force a fresh memory read if it already has an old value in a register from a previous read. "volatile" tells the compiler to always do a fresh memory read when it needs to get the value of the variable.
Note that neither "volatile" nor "sig_atomic_t" are strong enough "compiler barriers" to ensure it is not reordered around by the compiler optimizer, let alone by the CPU itself (which would require a memory barrier, not just a compiler barrier). If you need any visibility constraints re. other threads, processors, and even hardware when doing MMIO, you need "extra stuff" (compiler barriers, and memory barriers).
And that's where C11 _Atomic and the C11 memory models come into play. They're not about "atomic" reads and stores only, they also include a lot of visibility rules and constraints re. other entities (MMIO devices, other execution threads, other processors).

What memory address spaces are there?

What forms of memory address spaces have been used?
Today, a large flat virtual address space is common. Historically, more complicated address spaces have been used, such as a pair of a base address and an offset, a pair of a segment number and an offset, a word address plus some index for a byte or other sub-object, and so on.
From time to time, various answers and comments assert that C (or C++) pointers are essentially integers. That is an incorrect model for C (or C++), since the variety of address spaces is undoubtedly the cause of some of the C (or C++) rules about pointer operations. For example, not defining pointer arithmetic beyond an array simplifies support for pointers in a base and offset model. Limits on pointer conversion simplify support for address-plus-extra-data models.
That recurring assertion motivates this question. I am looking for information about the variety of address spaces to illustrate that a C pointer is not necessarily a simple integer and that the C restrictions on pointer operations are sensible given the wide variety of machines to be supported.
Useful information may include:
Examples of computer architectures with various address spaces and descriptions of those spaces.
Examples of various address spaces still in use in machines currently being manufactured.
References to documentation or explanation, especially URLs.
Elaboration on how address spaces motivate C pointer rules.
This is a broad question, so I am open to suggestions on managing it. I would be happy to see collaborative editing on a single generally inclusive answer. However, that may fail to award reputation as deserved. I suggest up-voting multiple useful contributions.

Just about anything you can imagine has probably been used. The
first major division is between byte addressing (all modern
architectures) and word addressing (pre-IBM 360/PDP-11, but
I think modern Unisys mainframes are still word addressed). In
word addressing, char* and void* would often be bigger than
an int*; even if they were not bigger, the "byte selector"
would be in the high order bits, which were required to be 0, or
would be ignored for anything other than bytes. (On a PDP-10,
for example, if p was a char*, (int)p < (int)(p+1) would
often be false, even though int and char* had the same
size.)
Among byte addressed machines, the major variants are segmented
and non-segmented architectures. Both are still wide spread
today, although in the case of Intel 32bit (a segmented
architecture with 48 bit addresses), some of the more widely
used OSs (Windows and Linux) artificially restrict user
processes to a single segment, simulating a flat addressing.
Although I've no recent experience, I would expect even more
variety in embedded processors. In particular, in the past, it
was frequent for embedded processors to use a Harvard
architecture, where code and data were in independent address
spaces (so that a function pointer and a data pointer, cast to a
large enough integral type, could compare equal).

I would say you are asking the wrong question, except as historical curiosity.
Even if your system happens to use a flat address space -- indeed, even if every system from now until the end of time uses a flat address space -- you still cannot treat pointers as integers.
The C and C++ standards leave all sorts of pointer arithmetic "undefined". That can impact you right now, on any system, because compilers will assume you avoid undefined behavior and optimize accordingly.
For a concrete example, three months ago a very interesting bug turned up in Valgrind:
https://sourceforge.net/p/valgrind/mailman/message/29730736/
(Click "View entire thread", then search for "undefined behavior".)
Basically, Valgrind was using less-than and greater-than on pointers to try to determine if an automatic variable was within a certain range. Because comparisons between pointers in different aggregates is "undefined", Clang simply optimized away all of the comparisons to return a constant true (or false; I forget).
This bug itself spawned an interesting StackOverflow question.
So while the original pointer arithmetic definitions may have catered to real machines, and that might be interesting for its own sake, it is actually irrelevant to programming today. What is relevant today is that you simply cannot assume that pointers behave like integers, period, regardless of the system you happen to be using. "Undefined behavior" does not mean "something funny happens"; it means the compiler can assume you do not engage in it. When you do, you introduce a contradiction into the compiler's reasoning; and from a contradiction, anything follows... It only depends on how smart your compiler is.
And they get smarter all the time.

There are various forms of bank-switched memory.
I worked on an embedded system that had 128 KB of total memory: 64KB of RAM and 64KB of EPROM. Pointers were only 16-bit, so a pointer into the RAM could have the same value of a pointer in the EPROM, even though they referred to different memory locations.
The compiler kept track of the type of the pointer so that it could generate the instruction(s) to select the correct bank before dereferencing a pointer.
You could argue that this was like segment + offset, and at the hardware level, it essentially was. But the segment (or more correctly, the bank) was implicit from the pointer's type and not stored as the value of a pointer. If you inspected a pointer in the debugger, you'd just see a 16-bit value. To know whether it was an offset into the RAM or the ROM, you had to know the type.
For example, Foo * could only be in RAM and const Bar * could only be in ROM. If you had to copy a Bar into RAM, the copy would actually be a different type. (It wasn't as simple as const/non-const:
Everything in ROM was const, but not all consts were in ROM.)
This was all in C, and I know we used non-standard extensions to make this work. I suspect a 100% compliant C compiler probably couldn't cope with this.

From a C programmer's perspective, there are three main kinds of implementation to worry about:
Those which target machines with a linear memory model, and which are designed and/or configured to be usable as a "high-level assembler"--something the authors of the Standard have expressly said they did not wish to preclude. Most implementations behave in this way when optimizations are disabled.
Those which are usable as "high-level assemblers" for machines with unusual memory architectures.
Those which whose design and/or configuration make them suitable only for tasks that do not involve low-level programming, including clang and gcc when optimizations are enabled.
Memory-management code targeting the first type of implementation will often be compatible with all implementations of that type whose targets use the same representations for pointers and integers. Memory-management code for the second type of implementation will often need to be specifically tailored for the particular hardware architecture. Platforms that don't use linear addressing are sufficiently rare, and sufficiently varied, that unless one needs to write or maintain code for some particular piece of unusual hardware (e.g. because it drives an expensive piece of industrial equipment for which more modern controllers aren't available) knowledge of any particular architecture isn't likely to be of much use.
Implementations of the third type should be used only for programs that don't need to do any memory-management or systems-programming tasks. Because the Standard doesn't require that all implementations be capable of supporting such tasks, some compiler writers--even when targeting linear-address machines--make no attempt to support any of the useful semantics thereof. Even some principles like "an equality comparison between two valid pointers will--at worst--either yield 0 or 1 chosen in possibly-unspecified fashion don't apply to such implementations.

Does a CPU assigns a value atomically to memory?

A quick question I've been wondering about for some time; Does the CPU assign values atomically, or, is it bit by bit (say for example a 32bit integer).
If it's bit by bit, could another thread accessing this exact location get a "part" of the to-be-assigned value?
Think of this:
I have two threads and one shared "unsigned int" variable (call it "g_uiVal").
Both threads loop.
On is printing "g_uiVal" with printf("%u\n", g_uiVal).
The second just increase this number.
Will the printing thread ever print something that is totally not or part of "g_uiVal"'s value?
In code:
unsigned int g_uiVal;
void thread_writer()
{
g_uiVal++;
}
void thread_reader()
{
while(1)
printf("%u\n", g_uiVal);
}

Depends on the bus widths of the CPU and memory. In a PC context, with anything other than a really ancient CPU, accesses of up to 32 bit accesses are atomic; 64-bit accesses may or may not be. In the embedded space, many (most?) CPUs are 32 bits wide and there is no provision for anything wider, so your int64_t is guaranteed to be non-atomic.

I believe the only correct answer is "it depends". On what you may ask?
Well for starters which CPU. But also some CPUs are atomic for writing word width values, but only when aligned. It really is not something you can guarantee at a C language level.
Many compilers offer "intrinsics" to emit correct atomic operations. These are extensions which act like functions, but emit the correct code for your target architecture to get the needed atomic operations. For example: http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html

You said "bit-by-bit" in your question. I don't think any architecture does operations a bit at a time, except with some specialized serial protocol busses. Standard memory read/writes are done with 8, 16, 32, or 64 bits of granularity. So it is POSSIBLE the operation in your example is atomic.
However, the answer is heavily platform dependent.
It depends on the CPU's capabilities.
Can the hardware do an atomic 32-bit
operation? Here's a hint: If the
variable you are working on is larger
than the native register size (e.g.
64-bit int on a 32-bit system), it's
definitely NOT atomic.
It depends on how the compiler
generates the machine code. It could
have turned your 32-bit variable
access into 4x 8-bit memory reads.
It gets tricky if the address of what
you are accessing is not aligned
across a machine's natural word
boundary. You can hit a a cache
fault or page fault.
It is VERY POSSIBLE that you would see a corrupt or unexpected value using the code example that you posted.
Your platform probably provides some method of doing atomic operations. In the case of a Windows platform, it is via the Interlocked functions. In the case of Linux/Unix, look at the atomic_t type.

To add to what has been said so far - another potential concern is caching. CPUs tend to work with the local (on die) memory cache which may or may not be immediately flushed back to the main memory. If the box has more than one CPU, it is possible that another CPU will not see the changes for some time after the modifying CPU made them - unless there is some synchronization command informing all CPUs that they should synchronize their on-die caches. As you can imagine such synchronization can considerably slow the processing down.

Don't forget that the compiler assumes single-thread when optimizing, and this whole thing could just go away.

POSIX defines the special type sig_atomic_t which guarentees that writes to it are atomic with respect to signals, which will make it also atomic from the point of view of other threads like you want. They don't specifically define an atomic cross-thread type like this, since thread communication is expected to be mediated by mutexes or other sychronization primitives.

Considering modern microprocessors (and ignoring microcontrollers), the 32-bit assignment is atomic, not bit-by-bit.
However, now completely off of your question's topic... the printing thread could still print something that is not expected because of the lack of synchronization in this example, of course, due to instruction reordering and multiple cores each with their own copy of g_uiVal in their caches.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight