What memory address spaces are there? - c

What forms of memory address spaces have been used?
Today, a large flat virtual address space is common. Historically, more complicated address spaces have been used, such as a pair of a base address and an offset, a pair of a segment number and an offset, a word address plus some index for a byte or other sub-object, and so on.
From time to time, various answers and comments assert that C (or C++) pointers are essentially integers. That is an incorrect model for C (or C++), since the variety of address spaces is undoubtedly the cause of some of the C (or C++) rules about pointer operations. For example, not defining pointer arithmetic beyond an array simplifies support for pointers in a base and offset model. Limits on pointer conversion simplify support for address-plus-extra-data models.
That recurring assertion motivates this question. I am looking for information about the variety of address spaces to illustrate that a C pointer is not necessarily a simple integer and that the C restrictions on pointer operations are sensible given the wide variety of machines to be supported.
Useful information may include:
Examples of computer architectures with various address spaces and descriptions of those spaces.
Examples of various address spaces still in use in machines currently being manufactured.
References to documentation or explanation, especially URLs.
Elaboration on how address spaces motivate C pointer rules.
This is a broad question, so I am open to suggestions on managing it. I would be happy to see collaborative editing on a single generally inclusive answer. However, that may fail to award reputation as deserved. I suggest up-voting multiple useful contributions.

Just about anything you can imagine has probably been used. The
first major division is between byte addressing (all modern
architectures) and word addressing (pre-IBM 360/PDP-11, but
I think modern Unisys mainframes are still word addressed). In
word addressing, char* and void* would often be bigger than
an int*; even if they were not bigger, the "byte selector"
would be in the high order bits, which were required to be 0, or
would be ignored for anything other than bytes. (On a PDP-10,
for example, if p was a char*, (int)p < (int)(p+1) would
often be false, even though int and char* had the same
size.)
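To make that PDP-10 example concrete, here is a small, self-contained C sketch. It only loosely models the idea (it is not the real PDP-10 byte-pointer format): a "char pointer" packs the word address in the low bits and a byte-position field in the high bits, and the position field counts down as you advance through a word, so advancing by one byte usually makes the integer representation smaller. The 4-byte word and the field placement are assumptions for illustration only.

#include <stdio.h>
#include <inttypes.h>

#define BYTES_PER_WORD 4u              /* assumption for this sketch */

/* Pack a word address into the low bits and a descending byte-position
   field into the high bits, loosely imitating a word-addressed machine. */
static uint32_t make_ptr(uint32_t word, uint32_t byte)
{
    uint32_t position = BYTES_PER_WORD - 1u - byte;   /* counts down */
    return (position << 28) | word;
}

int main(void)
{
    uint32_t word = 100, byte = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t next_byte = (byte + 1u) % BYTES_PER_WORD;
        uint32_t next_word = word + (byte + 1u) / BYTES_PER_WORD;
        uint32_t p = make_ptr(word, byte);
        uint32_t q = make_ptr(next_word, next_byte);
        printf("p=0x%08" PRIX32 "  p+1=0x%08" PRIX32 "  (p < p+1) is %s\n",
               p, q, p < q ? "true" : "false");
        word = next_word;
        byte = next_byte;
    }
    return 0;
}

Three of the four iterations print "false", matching the claim that the comparison would often be false on such a machine.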
Among byte-addressed machines, the major variants are segmented
and non-segmented architectures. Both are still widespread
today, although in the case of 32-bit Intel (a segmented
architecture with 48-bit addresses), some of the more widely
used OSs (Windows and Linux) artificially restrict user
processes to a single segment, simulating flat addressing.
Although I've no recent experience, I would expect even more
variety in embedded processors. In particular, in the past, it
was frequent for embedded processors to use a Harvard
architecture, where code and data were in independent address
spaces (so that a function pointer and a data pointer, cast to a
large enough integral type, could compare equal).

I would say you are asking the wrong question, except as historical curiosity.
Even if your system happens to use a flat address space -- indeed, even if every system from now until the end of time uses a flat address space -- you still cannot treat pointers as integers.
The C and C++ standards leave all sorts of pointer arithmetic "undefined". That can impact you right now, on any system, because compilers will assume you avoid undefined behavior and optimize accordingly.
For a concrete example, three months ago a very interesting bug turned up in Valgrind:
https://sourceforge.net/p/valgrind/mailman/message/29730736/
(Click "View entire thread", then search for "undefined behavior".)
Basically, Valgrind was using less-than and greater-than on pointers to try to determine whether an automatic variable was within a certain range. Because comparisons between pointers into different aggregates are "undefined", Clang simply optimized away all of the comparisons to return a constant true (or false; I forget).
This bug itself spawned an interesting StackOverflow question.
So while the original pointer arithmetic definitions may have catered to real machines, and that might be interesting for its own sake, it is actually irrelevant to programming today. What is relevant today is that you simply cannot assume that pointers behave like integers, period, regardless of the system you happen to be using. "Undefined behavior" does not mean "something funny happens"; it means the compiler can assume you do not engage in it. When you do, you introduce a contradiction into the compiler's reasoning; and from a contradiction, anything follows... It only depends on how smart your compiler is.
And they get smarter all the time.
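A minimal sketch of the pattern described above, plus the kind of workaround diagnostic tools tend to reach for. The function names are mine, not Valgrind's; the uintptr_t version yields an implementation-defined (rather than undefined) result, which is usually acceptable on flat-memory platforms.

#include <stddef.h>
#include <stdint.h>

/* Is p inside buf?  Comparing unrelated pointers with < / >= is undefined,
   so an optimizer may assume this is only ever called with related pointers. */
int inside_object_ub(const char *p, const char *buf, size_t len)
{
    return p >= buf && p < buf + len;
}

/* Compare integer representations instead: implementation-defined, not
   undefined, and well-behaved on common flat-memory platforms. */
int inside_object_uintptr(const char *p, const char *buf, size_t len)
{
    uintptr_t up = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)buf;
    return up >= lo && up - lo < len;
}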

There are various forms of bank-switched memory.
I worked on an embedded system that had 128 KB of total memory: 64 KB of RAM and 64 KB of EPROM. Pointers were only 16 bits, so a pointer into the RAM could have the same value as a pointer into the EPROM, even though they referred to different memory locations.
The compiler kept track of the type of the pointer so that it could generate the instruction(s) to select the correct bank before dereferencing a pointer.
You could argue that this was like segment + offset, and at the hardware level, it essentially was. But the segment (or more correctly, the bank) was implicit from the pointer's type and not stored as the value of a pointer. If you inspected a pointer in the debugger, you'd just see a 16-bit value. To know whether it was an offset into the RAM or the ROM, you had to know the type.
For example, Foo * could only be in RAM and const Bar * could only be in ROM. If you had to copy a Bar into RAM, the copy would actually be a different type. (It wasn't as simple as const/non-const:
Everything in ROM was const, but not all consts were in ROM.)
This was all in C, and I know we used non-standard extensions to make this work. I suspect a 100% compliant C compiler probably couldn't cope with this.
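As a rough, self-contained illustration of what that vendor compiler was doing implicitly (the arrays and helpers below are simulations I made up; on the real hardware a latch selected which physical bank the 16-bit address bus reached):

#include <stdint.h>
#include <stdio.h>

/* Simulated 64 KB banks standing in for physical RAM and EPROM. */
static uint8_t ram_bank[0x10000];
static uint8_t eprom_bank[0x10000];
static uint8_t *current_bank = ram_bank;

static void select_bank(uint8_t *bank) { current_bank = bank; }
static uint8_t read_byte(uint16_t addr) { return current_bank[addr]; }

int main(void)
{
    ram_bank[0x1234]   = 0x11;
    eprom_bank[0x1234] = 0x22;

    /* Same 16-bit "pointer" value, different bytes, depending on which
       bank is selected first, which the vendor compiler derived from
       the pointer's type. */
    select_bank(ram_bank);
    printf("%#x\n", (unsigned)read_byte(0x1234));  /* 0x11 */
    select_bank(eprom_bank);
    printf("%#x\n", (unsigned)read_byte(0x1234));  /* 0x22 */
    return 0;
}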

From a C programmer's perspective, there are three main kinds of implementation to worry about:
Those which target machines with a linear memory model, and which are designed and/or configured to be usable as a "high-level assembler"--something the authors of the Standard have expressly said they did not wish to preclude. Most implementations behave in this way when optimizations are disabled.
Those which are usable as "high-level assemblers" for machines with unusual memory architectures.
Those whose design and/or configuration make them suitable only for tasks that do not involve low-level programming, including clang and gcc when optimizations are enabled.
Memory-management code targeting the first type of implementation will often be compatible with all implementations of that type whose targets use the same representations for pointers and integers. Memory-management code for the second type of implementation will often need to be specifically tailored for the particular hardware architecture. Platforms that don't use linear addressing are sufficiently rare, and sufficiently varied, that unless one needs to write or maintain code for some particular piece of unusual hardware (e.g. because it drives an expensive piece of industrial equipment for which more modern controllers aren't available) knowledge of any particular architecture isn't likely to be of much use.
Implementations of the third type should be used only for programs that don't need to do any memory-management or systems-programming tasks. Because the Standard doesn't require that all implementations be capable of supporting such tasks, some compiler writers--even when targeting linear-address machines--make no attempt to support any of the useful semantics thereof. Even some principles like "an equality comparison between two valid pointers will--at worst--either yield 0 or 1 chosen in possibly-unspecified fashion" don't apply to such implementations.

Related

C Pointer Sizes [duplicate]

By running a simple C++ test program on a normal desktop PC, it seems plausible to suppose that the sizes of pointers of any type (including pointers to functions) are equal to the target architecture's bit width.
For example: on 32-bit architectures -> 4 bytes, and on 64-bit architectures -> 8 bytes.
However, I remember reading that this is not true in general!
So I was wondering what such circumstances would be:
For equality of size of pointers to data types compared with size of pointers to other data types
For equality of size of pointers to data types compared with size of pointers to functions
For equality of size of pointers to target architecture
No, it is not reasonable to assume. Making this assumption can cause bugs.
The sizes of pointers (and of integer types) in C or C++ are ultimately determined by the C or C++ implementation. Normal C or C++ implementations are heavily influenced by the architectures and the operating systems they target, but they may choose the sizes of their types for reasons other than execution speed, such as goals of supporting lower memory use (smaller pointers means less memory used in programs with lots of pointers), supporting code that was not written to be fully portable to any type sizes, or supporting easier use of big integers.
I have seen a compiler targeted for a 64-bit system but providing 32-bit pointers, for the purpose of building programs with smaller memory use. (It had been observed that the sizes of pointers were a considerable factor in memory consumption, due to the use of many structures with many connections and references using pointers.) Source code written with the assumption that the pointer size equalled the 64-bit register size would break.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
Depends. If you're aiming for a quick estimate of memory consumption, it can be good enough. But not if your program's correctness depends on it.
(including pointers to functions)
But here is one important remark. Although most pointers will have the same size, function pointers may differ. It is not guaranteed that a void* will be able to hold a function pointer. At least, this is true for C. I don't know about C++.
So I was wondering what would be such circumstances if any?
There can be tons of reasons why it differs. If your program's correctness depends on this size, it is NEVER OK to make such an assumption. Check it instead. It shouldn't be hard at all.
You can use this macro to check such things at compile time in C:
#include <assert.h>
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
When compiling, this gives an error message:
$ gcc main.c
In file included from main.c:1:
main.c:2:1: error: static assertion failed: "Pointers are assumed to be exactly 4 bytes"
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
^~~~~~~~~~~~~
If you're using C++, you can skip #include <assert.h> because static_assert is a keyword in C++. (And you can use the keyword _Static_assert in C, but it looks ugly, so use the include and the macro instead.)
Since these two lines are so extremely easy to include in your code, there's NO excuse not to do so if your program would not work correctly with the wrong pointer size.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
It might be reasonable, but it isn't reliably correct. So I guess the answer is "no, except when you already know the answer is yes (and aren't worried about portability)".
Potentially:
systems can have different register sizes, and use different underlying widths for data and addressing: it's not apparent what "target architecture bits" even means for such a system, so you have to choose a specific ABI (and once you've done that you know the answer, for that ABI).
systems may support different pointer models, such as the old near, far and huge pointers; in that case you need to know what mode your code is being compiled in (and then you know the answer, for that mode)
systems may support different pointer sizes, such as the X32 ABI already mentioned, or either of the other popular 64-bit data models described here
Finally, there's no obvious benefit to this assumption, since you can just use sizeof(T) directly for whatever T you're interested in.
If you want to convert between integers and pointers, use intptr_t. If you want to store integers and pointers in the same space, just use a union.
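A minimal sketch of both suggestions; the round-trip guarantee for a void pointer through intptr_t comes from C99 7.18.1.4 (the type itself is optional), and the integer value involved is otherwise implementation-defined:

#include <stdint.h>

/* Round-trip a pointer through an integer type wide enough to hold it. */
void *roundtrip(void *p)
{
    intptr_t n = (intptr_t)p;   /* implementation-defined value */
    return (void *)n;           /* guaranteed to compare equal to p */
}

/* Store either an integer or a pointer in the same storage. */
union slot {
    intptr_t as_int;
    void    *as_ptr;
};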
The target architecture's "bits" refer to register size. For example, the Intel 8051 is 8-bit and operates on 8-bit registers, but its (external) RAM and (external) ROM are accessed with 16-bit addresses.
For correctness, you cannot assume anything. You have to check and be prepared to deal with weird situations.
As a general rule of thumb, it is a reasonable default assumption.
It's not universally true though. See the X32 ABI, for example, which uses 32-bit pointers on 64-bit architectures to save a bit of memory and cache footprint. The same goes for the ILP32 ABI on AArch64.
So, for guesstimating memory use, you can use your assumption and it will often be right.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
If you look at all types of CPUs (including microcontrollers) currently being produced, I would say no.
Extreme counterexamples would be architectures where two different pointer sizes are used in the same program:
x86, 16-bit
In MS-DOS and 16-bit Windows, a "normal" program used both 16- and 32-bit pointers.
x86, 32-bit segmented
Only a few, lesser-known operating systems used this memory model.
Programs typically used both 32- and 48-bit pointers.
STM8A
This modern automotive 8-bit CPU uses 16- and 24-bit pointers. Both in the same program, of course.
AVR tiny series
RAM is addressed using 8-bit pointers, Flash is addressed using 16-bit pointers.
(However, AVR tiny cannot be programmed with C++, as far as I know.)
It's not correct; for example, DOS pointers (16-bit) can be far (seg+ofs).
However, for the usual targets (Windows, OS X, Linux, Android, iOS) it's correct, because they all use the flat programming model, which relies on paging.
In theory, you can also have systems which use only the lower 32 bits when in x64. An example is a Windows executable linked without LARGEADDRESSAWARE. However, this is to help the programmer avoid bugs when switching to x64. The pointers are truncated to 32 bits, but they are still 64-bit.
In x64 operating systems this assumption is always true, because flat mode is the only valid one. Long mode forces GDT entries to be 64-bit flat.
One also mentions an x32 ABI; I believe it is based on the same paging technology, forcing all pointers to be mapped to the lower 4 GB. However, this must rest on the same theory as in Windows: in x64 you can only have flat mode.
In 32-bit protected mode you could have pointers up to 48 bits (segmented mode). You can also have call gates. But no operating system uses that mode.
Historically, on microcomputers and microcontrollers, pointers were often wider than general-purpose registers so that the CPU could address enough memory and still fit within the transistor budget. Most 8-bit CPUs (such as the 8080, Z80 or 6502) had 16-bit addresses.
Today, a mismatch is more likely to be because an app doesn’t need multiple gigabytes of data, so saving four bytes of memory on every pointer is a win.
Both C and C++ provide separate size_t, uintptr_t and off_t types, representing the largest possible object size (which might be smaller than the size of a pointer if the memory model is not flat), an integral type wide enough to hold a pointer, and a file offset (often wider than the largest object allowed in memory), respectively. A size_t (unsigned) or ptrdiff_t (signed) is the most portable way to get the native word size. Additionally, POSIX guarantees that the system compiler has some flag that means a long can hold any of these, but you cannot always assume so.
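For a quick look at how these widths can differ on a given implementation, a short sketch (off_t comes from POSIX's <sys/types.h>, so this assumes a POSIX system):

#include <stdio.h>
#include <stdint.h>      /* uintptr_t */
#include <sys/types.h>   /* off_t (POSIX) */

int main(void)
{
    /* None of these widths is required to match the others or void*. */
    printf("size_t    : %zu bytes\n", sizeof(size_t));
    printf("uintptr_t : %zu bytes\n", sizeof(uintptr_t));
    printf("off_t     : %zu bytes\n", sizeof(off_t));
    printf("void *    : %zu bytes\n", sizeof(void *));
    return 0;
}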
Generally pointers will be size 2 on a 16-bit system, 3 on a 24-bit system, 4 on a 32-bit system, and 8 on a 64-bit system. It depends on the ABI and C implementation. AMD has long and legacy modes, and there are differences between AMD64 and Intel64 for Assembly language programmers but these are hidden for higher level languages.
Any problems with C/C++ code are likely to be due to poor programming practices and ignoring compiler warnings. See: "20 issues of porting C++ code to the 64-bit platform".
See also: "Can pointers be of different sizes?" and LRiO's answer:
... you are asking about C++ and its compliant implementations, not some specific physical machine. I'd have to quote the entire standard in order to prove it, but the simple fact is that it makes no guarantees on the result of sizeof(T*) for any T, and (as a corollary) no guarantees that sizeof(T1*) == sizeof(T2*) for any T1 and T2.
Note: the function-pointer case is addressed by JeremyP's answer, citing C99 section 6.3.2.3, subsection 8:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
In GCC you can avoid incorrect assumptions by using built-in functions: "Object Size Checking Built-in Functions":
Built-in Function: size_t __builtin_object_size (const void * ptr, int type)
is a built-in construct that returns a constant number of bytes from ptr to the end of the object ptr pointer points to (if known at compile time). To determine the sizes of dynamically allocated objects the function relies on the allocation functions called to obtain the storage to be declared with the alloc_size attribute (see Common Function Attributes). __builtin_object_size never evaluates its arguments for side effects. If there are any side effects in them, it returns (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3. If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero. If it is not possible to determine which objects ptr points to at compile time, __builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3.
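A short GCC/Clang-specific sketch of that built-in. The exact results depend on what the compiler can determine at compile time; the comments show the typical outcome for a plain static array:

#include <stdio.h>

static char buffer[32];

int main(void)
{
    /* Whole object visible at compile time: typically 32. */
    printf("%zu\n", __builtin_object_size(buffer, 0));
    /* From offset 12 to the end of the object: typically 20. */
    printf("%zu\n", __builtin_object_size(buffer + 12, 0));
    return 0;
}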

Rationale for pointer comparisons outside an array to be UB

So, the standard (referring to N1570) says the following about comparing pointers:
C99 6.5.8/5 Relational operators
When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to.
... [snip obvious definitions of comparison within aggregates] ...
In all other cases,
the behavior is undefined.
What is the rationale for this instance of UB, as opposed to specifying (for instance) conversion to intptr_t and comparison of that?
Is there some machine architecture where a sensible total ordering on pointers is hard to construct? Is there some class of optimization or analysis that unrestricted pointer comparisons would impede?
A deleted answer to this question mentions that this piece of UB allows for skipping comparison of segment registers and only comparing offsets. Is that particularly valuable to preserve?
(That same deleted answer, as well as one here, note that in C++, std::less and the like are required to implement a total order on pointers, whether the normal comparison operator does or not.)
Various comments in the ub mailing list discussion Justification for < not being a total order on pointers? strongly allude to segmented architectures being the reason, including the following comments, 1:
Separately, I believe that the Core Language should simply recognize the fact that all machines these days have a flat memory model.
and 2:
Then we maybe need a new type that guarantees a total order when
converted from a pointer (e.g. in segmented architectures, conversion
would require taking the address of the segment register and adding the
offset stored in the pointer).
and 3:
Pointers, while historically not totally ordered, are practically so
for all systems in existence today, with the exception of the ivory tower
minds of the committee, so the point is moot.
and 4:
But, even if segmented architectures, unlikely though it is, do come
back, the ordering problem still has to be addressed, as std::less
is required to totally order pointers. I just want operator< to be an
alternate spelling for that property.
Why should everyone else pretend to suffer (and I do mean pretend,
because outside of a small contingent of the committee, people already
assume that pointers are totally ordered with respect to operator<) to
meet the theoretical needs of some currently non-existent
architecture?
Counter to the trend of comments from the ub mailing list, FUZxxl points out that supporting DOS is a reason not to support totally ordered pointers.
Update
This is also supported by the Annotated C++ Reference Manual (ARM), which says this was due to the burden of supporting this on segmented architectures:
The expression may not evaluate to false on segmented architectures
[...] This explains why addition, subtraction and comparison of
pointers are defined only for pointers into an array and one element
beyond the end. [...] Users of machines with a nonsegmented address
space developed idioms, however, that referred to the elements beyond
the end of the array [...] was not portable to segmented architectures
unless special effort was taken [...] Allowing [...] would be costly
and serve few useful purposes.
The 8086 is a processor with 16 bit registers and a 20 bit address space. To cope with the lack of bits in its registers, a set of segment registers exists. On memory access, the dereferenced address is computed like this:
address = 16 * segment + register
Notice that, among other things, an address generally has multiple possible representations. Comparing two arbitrary addresses is tedious, as the compiler has to first normalize both addresses and then compare the normalized addresses.
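A two-line arithmetic check of that formula, showing two distinct segment:offset pairs naming the same linear address:

#include <stdio.h>

int main(void)
{
    unsigned a = 16u * 0x1234u + 0x0005u;   /* 0x1234:0x0005 */
    unsigned b = 16u * 0x1230u + 0x0045u;   /* 0x1230:0x0045 */
    printf("%#x %#x\n", a, b);              /* both print 0x12345 */
    return 0;
}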
Many compilers specify (in the memory models where this is possible) that when doing pointer arithmetic, the segment part is to be left untouched. This has several consequences:
objects can have a size of at most 64 kB
all addresses in an object have the same segment part
comparing addresses in an object can be done just by comparing the register part; that can be done in a single instruction
This fast comparison of course only works when the pointers are derived from the same base-address, which is one of the reasons why the C standard defines pointer comparisons only for when both pointers point into the same object.
If you want a well-ordered comparison for all pointers, consider converting the pointers to uintptr_t values first.
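A hedged sketch of that suggestion; the resulting order is implementation-defined, but it is at least consistent, which is what searching and sorting need:

#include <stdint.h>

/* Totally order two pointers by their integer representations. */
int ptr_less(const void *a, const void *b)
{
    return (uintptr_t)a < (uintptr_t)b;
}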
I believe it's undefined so that C can be run on architectures where, in effect, "smart pointers" are implemented in hardware, with various checks to ensure that pointers never accidentally point outside of the memory regions they're defined to refer to. I've never personally used such a machine, but the way to think about them is that computing an invalid pointer is precisely as forbidden as dividing by 0; you're likely to get a run-time exception that terminates your program. Furthermore, what's forbidden is computing the pointer, you don't even have to dereference it to get the exception.
Yes, I believe the definition also ended up permitting more-efficient comparisons of offset registers in old 8086 code, but that was not the only reason.
Yes, a compiler for one of these protected pointer architectures could theoretically implement the "forbidden" comparisons by converting to unsigned or the equivalent, but (a) it would likely be significantly less efficient to do so and (b) that would be a wantonly deliberate circumvention of the architecture's intended protection, protection which at least some of the architecture's C programmers would presumably want to have enabled (not disabled).
Historically, saying that an action invoked Undefined Behavior meant that any program which made use of such actions could be expected to work correctly only on those implementations which defined, for that action, behavior meeting their requirements. Specifying that an action invoked Undefined Behavior didn't mean that programs using such an action should be considered "illegitimate"; rather, it was intended to allow C to be used to run programs that didn't require such actions on platforms which could not efficiently support them.
Generally, the expectation was that a compiler would either output the sequence of instructions which would most efficiently perform the indicated action in the cases required by the standard, and do whatever that sequence of instructions happened to do in other cases, or would output a sequence of instructions whose behavior in such cases was deemed to be in some fashion more "useful" than the natural sequence. In cases where an action might trigger a hardware trap, or where triggering an OS trap might plausibly in some cases be considered preferable to executing the "natural" sequence of instructions, and where a trap might cause behaviors outside the control of the C compiler, the Standard imposes no requirements. Such cases are thus labeled as "Undefined Behavior".
As others have noted, there are some platforms where p1 < p2, for unrelated pointers p1 and p2, could be guaranteed to yield 0 or 1, but where the most efficient means of comparing p1 and p2 that would work in the cases defined by the Standard might not uphold the usual expectation that exactly one of p1 < p2, p1 > p2, or p1 == p2 holds. If a program written for such a platform knows that it will never deliberately compare unrelated pointers (implying that any such comparison would represent a program bug), it may be helpful to have stress-testing or troubleshooting builds generate code which traps on any such comparisons. The only way for the Standard to allow such implementations is to make such comparisons Undefined Behavior.
Until recently, the fact that a particular action would invoke behavior that was not defined by the Standard would generally only pose difficulties for people trying to write code on platforms where the action would have undesirable consequences. Further, on platforms where an action could only have undesirable consequences if a compiler went out of its way to make it do so, it was generally accepted practice for programmers to rely upon such an action behaving sensibly.
If one accepts the notions that:
The authors of the Standard expected that comparisons between unrelated pointers would work usefully on those platforms, and only those platforms, where the most natural means of comparing related pointers would also work with unrelated ones, and
There exist platforms where comparing unrelated pointers would be problematic
Then it makes complete sense for the Standard to regard unrelated-pointer comparisons as Undefined Behavior. Had they anticipated that even compilers for platforms which define a disjoint global ranking for all pointers might make unrelated-pointer comparisons negate the laws of time and causality (e.g. given:
int needle_in_haystack(char const *hs_base, int hs_size, char *needle)
{ return needle >= hs_base && needle < hs_base+hs_size; }
a compiler may infer that the program will never receive any input which would cause needle_in_haystack to be given unrelated pointers, and any code which would only be relevant when the program receives such input may be eliminated) I think they would have specified things differently. Compiler writers would probably argue that the proper way to write needle_in_haystack would be:
int needle_in_haystack(char const *hs_base, int hs_size, char *needle)
{
    for (int i = 0; i < hs_size; i++)
        if (hs_base + i == needle) return 1;
    return 0;
}
since their compilers would recognize what the loop is doing and also recognize that it's running on a platform where unrelated pointer comparisons work, and thus generate the same machine code as older compilers would have generated for the earlier-stated formulation. As to whether it would be better to require that compilers provide a means of specifying that code resembling the former version should either behave sensibly on platforms that will support it or refuse compilation on those that won't, or better to require that programmers intending the former semantics write the latter and hope that optimizers turn it into something useful, I leave that to the reader's judgment.

Cost of union access vs using fundamental types

I have a large block of data where some operations would be fastest if the block were viewed as an array of 64 bit unsigned integers and others would be fastest if viewed as an array of 32 bit unsigned integers. By 'fastest', I mean fastest on average for the machines that will be running the code. My goal is to be near optimal in all the environments running the code, and I think this is possible if I use a void pointer, casting it to one of the two types for dereferencing. This brings me to my questions:
1) If I use a void pointer, will casting it to one of the two types for dereferencing be slower than directly using a pointer of the desired type?
2) Am I correct in my understanding of the standard that doing this will not violate the anti-aliasing rules, and that it will not produce any undefined or unspecified behaviour? The 32 and 64 bit types I am using exist and have no padding (this is a static assertion).
3) Am I correct in understanding the anti-aliasing rules to basically serve two purposes: type safety and compiler guarantees to enable optimization? If so, if all situations where the code I am discussing will be executed are such that no other dereferencing is happening, am I likely to lose out on any significant compiler optimizations?
I have tagged this with 'c11' because I need to prove from the c11 standard that the behaviour is well defined. Any references to the standard would be appreciated.
Finally, I would like to address a likely concern to be brought up in the responses, regarding "premature optimization". First off, this code is being run on a diverse computing cluster, where performance is critical, and I know that even a one-instruction slowdown in dereferencing would be significant. Second, testing this on all the hardware would take time I don't have to finish the project. There are a lot of different types of hardware, and I have a limited amount of time on site to actually work with the hardware. However, I am confident that an answer to this question will enable me to make the right design choice anyway.
EDIT: An answer and comments pointed out that there is an aliasing problem with this approach, which I verified directly in the C11 standard. An array of unions would require two address calculations and dereferences in the 32-bit case, so I'd prefer a union of arrays. The questions then become:
1) Is there a performance problem in using a union member as an array as opposed to a pointer to the memory? I.e., is there a cost in union member access? Note that declaring two pointers to the arrays violates the anti-aliasing rules, so access would need to be made directly through the union.
2) Are the contents of the array guaranteed invariant when accessed through one array then through the other?
There are different aspects to your question. First of all, interpreting memory with different types has several problems:
aliasing
alignment
padding
Aliasing is a "local" problem. Inside a function, you don't want to have pointers to the same object that have different target types. If you do modify such pointed-to objects, the compiler may pretend not to know that the object may have changed and optimize your program incorrectly. If you don't do that inside a function (e.g. do a cast right at the beginning and stay with that interpretation), you should be fine as far as aliasing is concerned.
Alignment problems are often overlooked nowadays because many processors are quite tolerant of misalignment, but this is not portable and might also have performance impacts. So you'd have to ensure that your array is aligned in a way that is suitable for all types through which you access it. This can be done with _Alignas in C11; older compilers have extensions that also allow for this. C11 adds some restrictions on alignment, e.g. that it is always a power of 2, which should enable you to write portable code with respect to this problem.
Integer type padding is rare these days (the only exception is _Bool), but to be sure you should use types that are known not to have problems with it. In your case these are [u]int32_t and [u]int64_t, which are known to have exactly the number of bits requested and to have two's-complement representation for the signed types. If a platform doesn't support them, your program simply won't compile.
I would refrain from using a void pointer. A union of two arrays or an array of union will do better.
Use a proper alignment on the whole type. C11 provides the alignas() keyword. GCC has attributes for alignment which are non-standard (and work in pre-C11 standards as well). Other compilers may have none at all.
Depending on your architecture, there should be no performance impact. But this cannot be guaranteed (I do not see an issue here, however). You might even align the type to something larger than 64 bits to fill a cache line perfectly. That might speed up prefetch and writeback.
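A minimal C11 sketch of the union-of-arrays approach with explicit over-alignment; the element count and the 64-byte figure are illustrative assumptions, not taken from the question:

#include <stdint.h>
#include <stdalign.h>   /* C11 alignas */

#define NWORDS 1024     /* illustrative size */

union block {
    alignas(64) uint64_t as64[NWORDS];        /* cache-line aligned */
    uint32_t             as32[2 * NWORDS];    /* same bytes, 32-bit view */
};

/* Access goes through the union members (blk.as64[i] / blk.as32[i]),
   not through separately stored pointers, to stay within the aliasing
   rules discussed above. */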
Aliasing refers to the fact that an object is referenced by multiple pointers at the same time. This means the same memory address can be addressed using two different "sources". The problem is that the compiler may not be aware of this and thus may hold the value of a variable in a CPU register during some calculation without writing it back to memory instantly.
If the same variable is then referenced by the other "source" (i.e. pointer), the compiler may read invalid data from the memory location.
IMO, aliasing is only relevant within a function if two pointers are passed in. So, if you do not intend to pass two pointers to the same object (or part of it), there should be no problem at all. Otherwise, you should get comfortable with (compiler) barriers. Edit: the C standard seems to be a bit more strict on that, as it requires just the lvalues accessing an object to fulfill certain criteria (C11 6.5/7 (n1570) - thanks Matt McNabb).
Oh, and don't use int/long/etc. You really should use stdint.h types if you really need proper sized types.

Approved syntax for raw pointer manipulation

I am making a memory block copy routine and need to deal with blocks of raw memory in efficient chunks. My question is not about the specialized copy routine I'm making, but in how to correctly examine raw pointer alignment in C.
I have a raw pointer of memory, let's say it's already cast as a non-null char *.
In my architecture, I can very efficiently copy memory in 64 byte chunks WHEN IT IS ALIGNED TO A 64 BYTE chunk. So the (standard) trick is that I will do a simple copy of 0-63 bytes "manually" at the head and/or tail to transform the copy from an arbitrary char* of arbitrary length to a 64 byte aligned pointer with some multiple of 64 bytes in length.
Now the question is, how do you legally "examine" a pointer to determine (and manipulate) its alignment?
The obvious way is to cast it into an integer and just examine the bits:
char *pointer = something;
int p = (int)pointer;
char *alignedPointer = (char *)((p + 63) & ~63);
Note here I realize that alignedPointer doesn't point to the same memory as pointer... this is the "rounded up" pointer that I can call my efficient copy routine on, and I'll handle any other bytes at the beginning manually.
But compilers (justifiably) freak out at casting a pointer into an integer. But how else can I examine and manipulate the pointer's lower bits in LEGAL C? Ideally so that with different compilers I'd get no errors or warnings.
For integer types that are large enough to hold pointers, C99 stdint.h has:
uintptr_t
intptr_t
For data lengths there are:
size_t
ssize_t
which have been around since well before C99.
If your platform doesn't have these, you can maximise your code's portability by still using these type names, and making suitable typedefs for them.
I don't think that in the past people were as reluctant to do their own bit-banging, but maybe the current "don't touch that" mood would be conducive to someone creating some kind of standard library for aligning pointers. Lacking some kind of official api, you have no choice but to AND and OR your way through.
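A minimal sketch of the rounding from the question, rewritten over uintptr_t. The pointer-to-integer conversion is implementation-defined, but on common flat-memory platforms the low bits of the integer are the alignment bits, which is all this relies on:

#include <stdint.h>

/* Round a pointer up to the next 64-byte boundary. */
static char *align_up_64(char *p)
{
    uintptr_t u = (uintptr_t)p;
    u = (u + 63u) & ~(uintptr_t)63u;
    return (char *)u;
}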
Instead of int, try a datatype that's guaranteed to be the same size as a pointer (INT_PTR on Win32/64). Maybe the compiler won't freak out too much. :) Or use a union, if 64-bit compatibility is not important.
Casting pointers to and from integers is valid, but the results are implementation-defined. See section 6.3.2.3 of the standard. The intention seems to be that the results are what anybody familiar with the system would expect, and indeed this appears to be routinely the case in practice.
If the architecture in question can efficiently manipulate pointers and integers interchangeably, and the issue is just whether it will work on all compilers for that system, then the answer is that it probably will anyway.
(Certainly, if I were writing this code, I would think it fine as-is until proven otherwise. My experience has been that compilers for a given system all behave in very similar ways at this sort of level; the assembly language just suggests a particular approach, that all then take.)
"Probably works" isn't very good general advice though, so my suggestion would be just write the code that works, surround it enough suitable #ifdefs that only the known compiler(s) will compile it, and defer to memcpy in other cases.
#ifdef is rarely ideal, but it's fairly lightweight compared to other possibilities. And if implementation-defined behaviour or compiler-specific tricks are needed then the options are pretty limited anyway.

Why doesn't GCC optimize structs?

Systems demand that certain primitives be aligned to certain points within the memory (ints to bytes that are multiples of 4, shorts to bytes that are multiples of 2, etc.). Of course, these can be optimized to waste the least space in padding.
My question is why doesn't GCC do this automatically? Is the more obvious heuristic (order variables from biggest size requirement to smallest) lacking in some way? Is some code dependent on the physical ordering of its structs (is that a good idea)?
I'm only asking because GCC is super optimized in a lot of ways but not in this one, and I'm thinking there must be some relatively cool explanation (to which I am oblivious).
gcc does not reorder the elements of a struct, because that would violate the C standard. Section 6.7.2.1 of the C99 standard states:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
Structs are frequently used as representations of the packing order of binary file formats and network protocols. This would break if that were done. In addition, different compilers would optimize things differently and linking code together from both would be impossible. This simply isn't feasible.
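For anyone wondering why manual reordering matters in the first place, here is a small sketch; exact sizes are implementation-defined, but the comments show what a typical ABI with a 4-byte, 4-byte-aligned int produces:

#include <stdio.h>

struct declared  { char a; int b; char c; };  /* commonly 12 bytes: padding after a and c */
struct reordered { int b; char a; char c; };  /* commonly 8 bytes: only 2 bytes of tail padding */

int main(void)
{
    printf("declared=%zu reordered=%zu\n",
           sizeof(struct declared), sizeof(struct reordered));
    return 0;
}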
GCC is smarter than most of us in producing machine code from our source code; however, I would shiver if it were smarter than us in rearranging our structs, since a struct is data that can, for example, be written to a file. A struct that starts with 4 chars and then has a 4-byte integer would be useless if read on another system where GCC decided that it should rearrange the struct members.
gcc SVN does have a structure reorganization optimization (-fipa-struct-reorg), but it requires whole-program analysis and isn't very powerful at the moment.
C compilers don't automatically pack structs precisely because of alignment issues like the ones you mention. Accesses not on word boundaries (32-bit on most CPUs) carry a heavy penalty on x86 and cause fatal traps on RISC architectures.
Not saying it's a good idea, but you can certainly write code that relies on the order of the members of a struct. For example, as a hack, people often cast a pointer to a struct to the type of a certain field inside it that they want access to, and then use pointer arithmetic to get there. To me this is a pretty dangerous idea, but I've seen it used, especially in C++ to force a variable that's been declared private to be publicly accessible when it's in a class from a 3rd-party library and isn't publicly encapsulated. Reordering the members would totally break that.
You might want to try the latest gcc trunk or the struct-reorg-branch, which is under active development.
https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Olga+Golovanevsky_+Memory+Layout+Optimizations+of+Structures+and+Objects.pdf

Resources