In the embedded software domain, when copying a structure to another of the same type, people often avoid direct assignment and instead use memcpy() or copy each member individually.
For example, given
struct tag {
    int a;
    int b;
};
struct tag exmple1 = {10,20};
struct tag exmple2;
to copy exmple1 into exmple2, instead of writing the direct assignment
exmple2 = exmple1;
people use
memcpy(&exmple2, &exmple1, sizeof(struct tag));
Why?

One way or the other there is nothing specific about embedded systems that makes this dangerous, the language semantics are identical for all platforms.
C has been used in embedded systems for many years, and early C compilers, before ANSI/ISO standardisation did not support direct structure assignment. Many practitioners are either from that era, or have been taught by those that were, or are using legacy code written by such practitioners. This is probably the root of the doubt, but it is not a problem on an ISO compliant implementation. On some very resource constrained targets, the available compiler may not be fully ISO compliant for a number of reasons, but I doubt that this feature would be affected.
One issue (that applies to embedded and non-embedded alike), is that when assigning a structure, an implementation need not duplicate the value of any undefined padding bits, therefore if you performed a structure assignment, and then performed a memcmp() rather than member-by-member comparison to test for equality, there is no guarantee that they will be equal. However if you perform a memcpy(), any padding bits will be copied so that memcmp() and member-by-member comparison will yield equality.
So it is arguably safer to use memcpy() in all cases (not just embedded), but the improvement is marginal, and not conducive to readability. It would be a strange implementation that did not use the simplest method of structure assignment, and that is a simple memcpy(), so it is unlikely that the theoretical mismatch would occur.
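As a minimal sketch of the point above (the helper names copy_by_assignment, copy_by_memcpy and tag_equal are illustrative and not from the question): both ways of copying give member-wise equal results. Only the padding bytes, if there are any, might differ after an assignment, which is why the comparison here goes member by member.

```c
#include <assert.h>
#include <string.h>

struct tag {
    int a;
    int b;
};

/* Copy using plain structure assignment (valid in ISO C). */
struct tag copy_by_assignment(struct tag src)
{
    struct tag dst;
    dst = src;
    return dst;
}

/* Copy using memcpy(); note the & on both operands. */
struct tag copy_by_memcpy(struct tag src)
{
    struct tag dst;
    memcpy(&dst, &src, sizeof dst);
    return dst;
}

/* Member-by-member comparison; it never looks at padding. */
int tag_equal(const struct tag *x, const struct tag *y)
{
    return x->a == y->a && x->b == y->b;
}
```

Calling either copy helper on {10, 20} and passing the result to tag_equal() together with the original should report equality.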

In your given code there is no problem even if you write:
example2 = example1;
But suppose that in future the struct definition changes to:
struct tag {
    int a[1000];
    int b;
};
Now if you execute the assignment as above, some compilers might inline the code for byte-by-byte (or int-by-int) copying, i.e.
example2.a[0] = example1.a[0];
example2.a[1] = example1.a[1];
example2.a[2] = example1.a[2];
...
which results in code bloat in your code segment. Problems of this kind are not trivial to find. That's why people use memcpy.
[However, I have heard that modern compilers are capable enough to use memcpy internally when such an assignment is encountered, especially for PODs.]

Copying C structures via memcpy() is often done by programmers who learned C decades ago and have not followed the standardization process since. They simply don't know that C supports assignment of structures (direct structure assignment was not available in all pre-ANSI-C89 compilers).
When they learn about this feature some still stick to the memcpy() way because it is their custom. There are also motivations that originate in cargo cult programming, e.g. it is claimed that memcpy is just faster - of course - without being able to back this up with a benchmark test case.
Structures are also memcpy()ied by some newbie programmers because they either confuse structure assignment with the assignment of a pointer of a structure - or they simply overuse memcpy() (they often also use memcpy() where strcpy() would be more appropriate).
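The confusion between assigning a structure and assigning a pointer to a structure can be sketched as follows. The type struct point and the function name are made up for illustration.

```c
#include <assert.h>

struct point {
    int x;
    int y;
};

/* Returns 1 if the behaviour matches the comments below. */
int demo_pointer_vs_struct_assignment(void)
{
    struct point a = {1, 2};
    struct point b = {0, 0};
    struct point *pa = &a;
    struct point *pb = &b;

    *pb = *pa;      /* structure assignment: b now holds a copy of a */
    a.x = 99;       /* changing a does not affect the copy in b */
    if (b.x != 1 || b.y != 2)
        return 0;

    pb = pa;        /* pointer assignment: pb now aliases a, nothing is copied */
    a.y = 77;       /* visible through pb because it points at a */
    return pb->x == 99 && pb->y == 77 && b.y == 2;
}
```

Only the first assignment copies any data; the second merely makes both pointers refer to the same object.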
There is also the memcmp() structure comparison anti-pattern that is sometimes cited by some programmers for using memcpy() instead of structure assignment. The reasoning behind this is the following: since C does not automatically generate a == operator for structures and writing a custom structure comparison function is tedious, memcmp() is used to compare structures. In the next step - to avoid differences in the padding bits of compared structures - memset(...,0,...) is used to initialize all structures (instead of using the C99 initializer syntax or initializing all fields separately) and memcpy() is used to copy the structures! Because memcpy() also copies the content of the padding bits ...
But note that this reasoning is flawed for several reasons:
the use of memcpy()/memcmp()/memset() introduces new error possibilities - e.g. supplying a wrong size
when the structure contains integer fields, the ordering under memcmp() changes between big- and little-endian architectures
a char array field of size n that is 0-terminated at position x must also have all elements after position x zeroed out at all times - otherwise 2 otherwise equal structs compare unequal
assignment from a register to a field may also set the neighbouring padding bits to values other than 0; thus, subsequent comparisons with otherwise equal structures yield an unequal result
The last point is best illustrated with a small example (assuming architecture X):
struct S {
    int a;   // on X: sizeof(int) == 4
    char b;  // on X: 24 padding bits are inserted after b
    int c;
};
typedef struct S S;
S s1;
memset(&s1, 0, sizeof(S));
s1.a = 0;
s1.b = 'a';
s1.c = 0;
S s2;
memcpy(&s2, &s1, sizeof(S));
assert(memcmp(&s1, &s2, sizeof(S)) == 0); // assertion is always true
s2.b = 'x';
assert(memcmp(&s1, &s2, sizeof(S)) != 0); // assertion is always true
// some computation
char x = 'x'; // on X: 'x' is stored in a 32 bit register
              // as least significant byte
              // the other bytes contain previous data
s1.b = x;     // the complete register is copied
              // i.e. the higher 3 register bytes are the new
              // padding bits in s1
assert(memcmp(&s1, &s2, sizeof(S)) == 0); // assertion is not always true
The failure of the last assertion may depend on code reordering, change of the compiler, change of compiler options and stuff like that.
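The char array point from the list above is easier to reproduce, because it does not depend on what the compiler does with padding. A small sketch (struct name_rec and the function name are hypothetical):

```c
#include <assert.h>
#include <string.h>

struct name_rec {
    char name[8];
};

/* Returns the memcmp() result for two records holding the same string. */
int compare_same_string_records(void)
{
    struct name_rec r1, r2;

    memset(&r1, 0, sizeof r1);
    memset(&r2, 0, sizeof r2);

    strcpy(r1.name, "abcdef");  /* r1.name: a b c d e f \0 \0 */
    strcpy(r1.name, "ab");      /* r1.name: a b \0 d e f \0 \0 (stale tail) */
    strcpy(r2.name, "ab");      /* r2.name: a b \0 \0 \0 \0 \0 \0 */

    assert(strcmp(r1.name, r2.name) == 0);   /* equal as strings */
    return memcmp(&r1, &r2, sizeof r1);      /* non-zero: 'd' vs '\0' */
}
```

Both records hold the string "ab", yet memcmp() reports a difference because of the stale bytes after the terminator in r1.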
As a general rule: to increase code correctness and portability use direct struct assignment (instead of memcpy()), C99 struct initialization syntax (instead of memset) and a custom comparison function (instead of memcmp()).
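A rough sketch of what that rule looks like for the struct S from the example (S_equal and S_make are names invented here, not part of any library):

```c
#include <assert.h>

struct S {
    int a;
    char b;
    int c;
};

/* Custom comparison: looks only at the members, never at padding. */
int S_equal(const struct S *x, const struct S *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}

/* Builds a value with C99 designated initializers; omitted members are zero. */
struct S S_make(char b)
{
    struct S s = { .a = 0, .b = b };
    return s;
}
```

With this, s2 = s1; followed by S_equal(&s1, &s2) holds regardless of what the padding bytes contain, and the result does not change with compiler or optimization level.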

In C people probably do that, because they think that memcpy would be faster. But I don't think that is true. Compiler optimizations would take care of that.
In C++ it may also have different semantics because of user defined assignment operator and copy constructors.

On top of what the others wrote some additional points:
Using memcpy instead of a simple assignment gives a hint to whoever maintains the code that the operation might be expensive. Using memcpy in these cases improves the understanding of the code.
Embedded systems are often written with portability and performance in mind. Portability is important because you may want to re-use your code even if the CPU in the original design is not available or if a cheaper micro-controller can do the same job.
These days low-end micro-controllers come and go faster than the compiler developers can catch up, so it is not uncommon to work with compilers that use a simple byte-copy loop instead of something optimized for structure assignments. With the move to 32 bit ARM cores this is not true for a large part of embedded developers. There are however a lot of people out there who build products that target obscure 8 and 16 bit micro-controllers.
A memcpy tuned for a specific platform may be more efficient than what a compiler can generate. For example, on embedded platforms it is common to have structures in flash memory. Reading from flash is not as slow as writing to it, but it is still a lot slower than an ordinary copy from RAM to RAM. An optimized memcpy function may use DMA or special features of the flash controller to speed up the copy process.

That is complete nonsense. Use whichever way you prefer. The simplest is:
exmple2 = exmple1;

Whatever you do, don't do this:
exmple2.a = exmple1.a;
exmple2.b = exmple1.b;
It poses a maintainability problem because any time that anyone adds a member to the structure, they have to add a line of code to do the copy of that member. Someone is going to forget to do that and it will cause a hard to find bug.

On some implementations, the way in which memcpy() is performed may differ from the way in which "normal" structure assignment would be performed, in a manner that may be important in some narrow contexts. For example, one or the other structure operand may be unaligned and the compiler might not know about it (e.g. one memory region might have external linkage and be defined in a module written in a different language that has no means of enforcing alignment). Use of a __packed declaration would be better if a compiler supported such, but not all compilers do.
Another reason for using something other than structure assignment could be that a particular implementation's memcpy might access its operands in a sequence that would work correctly with certain kinds of volatile source or destination, while that implementation's struct assignment might use a different sequence that wouldn't work. This is generally not a good reason to use memcpy, however, since aside from the alignment issue (which memcpy is required to handle correctly in any case) the specifications for memcpy don't promise much about how the operation will be performed. It would be better to use a specially-written routine which performs the operations exactly as required (for example, if the target is a piece of hardware which needs to have 4 bytes of structure data written using four 8-bit writes rather than one 32-bit write, one should write a routine which does that, rather than hoping that no future version of memcpy decides to "optimize" the operation).
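Such a specially-written routine might look roughly like this. It is only a sketch: write_bytes_volatile is a made-up name, and it assumes the target provides uint8_t and that the compiler emits one byte-wide store per volatile access, which is the usual behaviour but is formally implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copies n bytes to dst using one 8-bit volatile store per byte,
 * in ascending address order. The volatile qualifier is what keeps
 * the compiler from merging the stores into wider accesses.
 */
void write_bytes_volatile(volatile uint8_t *dst, const void *src, size_t n)
{
    const uint8_t *s = (const uint8_t *)src;
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = s[i];
}
```

On real hardware dst would be the address of the device registers; the access width and order are then spelled out in the source instead of being left to the library.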
A third reason for using memcpy in some cases would be the fact that compilers will often perform small structure assignments using a direct sequence of loads and stores, rather than using a library routine. On some controllers, the amount of code this requires may vary depending upon where the structures are located in memory, to the point that the load/store sequence may end up being bigger than a memcpy call. For example, on a PICmicro controller with 1Kwords of code space and 192 bytes of RAM, copying a 4-byte structure from bank 1 to bank 0 would take 16 instructions. A memcpy call would take eight or nine (depending upon whether count is an unsigned char or int; with only 192 bytes of RAM total, unsigned char should be more than sufficient!). Note, however, that calling a memcpy-ish routine which assumed a hard-coded size and required both operands be in RAM rather than code space would only require five instructions to call, and that could be reduced to four with the use of a global variable.

The first version is perfect.
The second one may be used for speed (though there is no reason to at your size).
The third one is used only if padding is different for target and source.


Is it guaranteed that the padding bits of "zeroed" structure will be zeroed in C?

This statement in the article confused me:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment for the target. If you zero a structure and then set some of the fields, will the padding bits all be zero? According to the results of the survey, 36 percent were sure that they would be, and 29 percent didn't know. Depending on the compiler (and optimization level), it may or may not be.
It was not completely clear, so I turned to the standard. The ISO/IEC 9899 in § states:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
Also in §
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I just remembered that I recently implemented, let's say, some kind of hack, where I used the undeclared part of a byte owned by a bit-field. It was something like:
/* This struct is always allocated on the heap and is zeroed. */
struct some_struct {
    /* initial part ... */
    enum {
    } some_enum:8;
    unsigned char flag:1;
    unsigned char another_flag:1;
    unsigned int size_of_smth;
    /* ... remaining part */
};
The structure was not at my disposal therefore I couldn't change it, but I had an acute need to pass some information through it. So I calculated an address of corresponding byte like:
unsigned char *ptr = (unsigned char *)&some->size_of_smth - 1;
*ptr |= 0xC0; /* set flags */
Then later I checked flags the same way.
I should also mention that the target compiler and platform were fixed, so it's not a cross-platform thing. However, the following questions remain:
Can I rely on the fact that the padding bits of struct (in heap) will be still zeroed after memset/kzalloc/whatever and after some subsequent using? (This post does not disclose the topic in terms of the standard and safeguards for the further use of struct). And what about struct zeroed on stack like = {0}?
If yes, does it mean that I can safely use "unnamed"/"not declared" part of bit-field to transfer some info for my purposes everywhere (different platform, compiler, ..) in C? (If I know for sure that no one crazy is trying to store anything in this byte).
The short answer to your first question is "no".
While an appropriate call of memset(), such as memset(&some_struct_instance, 0, sizeof(some_struct)) will set all bytes in the structure to zero, that change is not required to be persistent after "some use" of some_struct_instance, such as setting any of the members within it.
So, for example, there is no guarantee that some_struct_instance.some_enum = THREE (i.e. storing a value into a member) will leave any padding bits in some_struct_instance unchanged. The only requirement in the standard is that values of other members of the structure are unaffected. However, the compiler may (in emitted object code or machine instructions) implement the assignment using some set of bitwise operations, and be allowed to take shortcuts in a way that doesn't leave the padding bits alone (e.g. by not emitting instructions that would otherwise ensure the padding bits are unaffected).
Even worse, a simple assignment like some_struct_instance = some_other_struct_instance (which, by definition, is the storing of a value into some_struct_instance) comes with no guarantees about the values of padding bits. It is not guaranteed that the padding bits in some_struct_instance will be set to the same bitwise values as padding bits in some_other_struct_instance, nor is there a guarantee that the padding bits in some_struct_instance will be unchanged. This is because the compiler is allowed to implement the assignment in whatever means it deems most "efficient" (e.g. copying memory verbatim, some set of member-wise assignments, or whatever) but - since the value of padding bits after the assignment are unspecified - is not required to ensure the padding bits are unchanged.
If you get lucky, and fiddling with the padding bits works for your purpose, it will not be because of any support in the C standard. It will be because of good graces of the compiler vendor (e.g. choosing to emit a set of machine instructions that ensure padding bits are not changed). And, practically, there is no guarantee that the compiler vendor will keep doing things the same way - for example, your code that relies on such a thing may break when the compiler is updated, when you choose different optimisation settings, or whatever.
Since the answer to your first question is "no", there is no need to answer your second question. However, philosophically, if you are trying to store data in padding bits of a structure, it is reasonable to assert that someone else - crazy or not - may potentially attempt to do the same thing, but using an approach that messes up the data you are attempting to pass around.
From the first words of the quoted passage:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment ...
These words mean that, in order to optimize (probably for speed, but also to work around architectural restrictions on data/address buses), the compiler can make use of hidden, unused bits or bytes. They are unused because they would be forbidden or costly to address.
This also implies that those bytes or bits should not be visible from a programming perspective, and that trying to access this hidden data should be considered a programming error.
About this added data, the standard says that its content is "unspecified", and there is really no better way to state what an implementation can do with it. Think of bit-field declarations, where you can declare integers with any bit width: no normal hardware permits reading or writing memory in chunks smaller than 8 bits, so the CPU will always read or write at least 8 bits (sometimes even more). Why should a compiler (an implementation) take care to do something useful with those other bits, which the programmer specified he does not care about? It makes no sense: the programmer didn't give a name to some memory address, but then wants to manipulate it?
The padding bytes between fields are much the same matter: those added bytes are necessary, but the programmer is not interested in them, and he SHOULD NOT change his mind later!
Of course, one can study an implementation and arrive at some conclusion like "padding bytes will always be zeroed" or something like that. This is risky (are you sure they will always be zeroed?) but, more importantly, it is totally useless: if you need more data in a structure, simply declare it! Then you will never have a problem, even when porting the source to different platforms or implementations.
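To illustrate "simply declare it": assuming the structure definition can be changed at all (which was not the case for the asker), the two extra flags become ordinary named members. The type some_struct_v2 and its member names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical layout: the two extra flags are declared instead of hidden. */
struct some_struct_v2 {
    unsigned int flag:1;
    unsigned int another_flag:1;
    unsigned int my_flag_a:1;   /* explicitly declared extra data */
    unsigned int my_flag_b:1;
    unsigned int size_of_smth;
};

/* Sets both extra flags through their names; no address arithmetic needed. */
void set_extra_flags(struct some_struct_v2 *s)
{
    s->my_flag_a = 1;
    s->my_flag_b = 1;
}
```

Because the flags are real members, the compiler has to preserve them across stores to the neighbouring members, which is exactly the guarantee that padding bits lack.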
It is reasonable to start with the expectation that what is listed in the standard is correctly implemented. You're looking for further assurances for a particular architecture. Personally, if I could find documented details about that particular architecture, I would be reassured; if not, I would be cautious.
What constitutes "cautious" would depend on how confident I needed to be. For example, building a detailed test set and running this periodically on my target architecture would give me a reasonable degree of confidence, but it's all about how much risk you want to take. If it's really, really important, stick to what the standards guarantee you; if it's less so, test and see if you can get enough confidence for what you need.

Copy Arbitrary Type in C Without Dynamic Memory Allocation

The Question:
I think I have figured out a way that, near as I can tell, allows you to write completely type-agnostic code that makes a copy of a variable of arbitrary type on the "stack" (in quotes because C standard does not actually require there to be a stack, so what I really mean is that it's copied with the auto storage class in local scope). Here it is:
/* Save/duplicate thingToCopy */
char copyPtr[sizeof(thingToCopy)];
memcpy(copyPtr, &thingToCopy, sizeof(thingToCopy));
/* modify the thingToCopy variable to do some work; do NOT do operations directly on the data in copyPtr, that's just a "storage bin". */
/* Restore old value of thingToCopy */
memcpy(&thingToCopy, copyPtr, sizeof(thingToCopy));
From my limited testing it works and near as I can tell it should work on all standards-compliant C implementations, but just in case I missed something, I'd like to know:
Is this completely in line with the C standard (I believe this should be good all the way from C89 through to the modern stuff), and if not, is it possible to fix and how?
What limitations on usage does this method force upon itself in order to stay standards-compliant?
For example, as I understand it, I am safe from alignment issues so long as I never use the char-array temp-copies directly - just as bins to save to and load from with memcpy. But I couldn't pass those addresses to other functions expecting pointers to the type I'm working with, without risking alignment issues (obviously syntactically I could do it perversely by first getting a void * from the char *, without even specifying the exact type I'm working with, but the point is that I think I would be triggering undefined behavior by doing so).
Is there a cleaner and/or more performant* way to achieve the same thing?
*GCC 4.6.1 on my armel v7 test device, with -O3 optimization, produced identical code to regular code using normal assignments to temporary variables, but it could be that my test cases were just simple enough that it was able to figure it out, and that it would get confused if this technique were used more generally.
As a bonus passing interest, I'm curious if this would break in mostly-C-compatible languages (the ones I know of are C++, Objective-C, D, and maybe C#, though mentions of others are welcome too).
This is why I think the above works, in case you find it helpful to know where I'm coming from in order to explain any mistakes I may have made:
The C standard's "byte" (in the traditional sense of "smallest addressable unit of memory", not in the modernized "8 bits" meaning) is the char type - the sizeof operator produces numbers in units of char. So we can get exactly the smallest size of storage (that we can work with in C) needed for an arbitrary variable's type by using the sizeof operator on that variable.
The C standard guarantees that pretty much all object pointer types can be converted implicitly into a void * (with a change of representation if their representation is different; incidentally, the C standard guarantees that void * and char * have identical representations).
The "name" of an array of a given type, and a pointer to that same type, can basically be treated identically as far as the syntax is concerned.
The sizeof operator is figured out at compile-time, so we can do char foo[sizeof(bar)] without depending on the effectively non-portable VLAs.
Therefore, we should be able to declare an array of "chars" that is the minimum size necessary to hold a given type.
Thus we should be able to pass the address of the variable to be copied, and the name of the array, to memcpy (as I understand it, the array name is implicitly used as a char * to the first element of the array). Since any object pointer can be implicitly converted to a void * (with a change of representation if necessary), this works.
The memcpy should make a bitwise copy of the variable we are copying to the array. Regardless of what the type is, any padding bits involved, etc, the sizeof guarantees we'll grab all the bits that make up the type, including padding.
Since we can't explicitly use/declare the type of the variable we just copied, and because some architectures might have alignment requirements for various types that this hack might violate some of the time, we can't use this copy directly - we'd have to memcpy it back into the variable we got it from, or one of the same type, in order to make use of it. But once we copy it back, we have an exact copy of what we put there in the first place. Essentially, we are freeing the variable itself to be used as scratch space.
Motivation (or, "Dear God Why!?!"):
I like to write type-independent code when useful, and yet I also enjoy coding in C, and combining the two largely comes down to writing the generic code in function-like macros (you can then re-claim type-checking by making wrapper function definitions which call the function-like macro). Think of it like really crude templates in C.
As I've done this, I've run into situations where I needed an additional variable of scratch space, but, given the lack of a portable typeof() operator, I cannot declare any temporary variables of a matching type in such "generic macro" snippets of code. This is the closest thing to a truly portable solution that I've found.
Since we can do this trick multiple times (large enough char array that we can fit several copies, or several char arrays big enough to fit one), as long as we can keep our memcpy calls and copy pointer names straight, it's functionally like having an arbitrary number of temporary variables of the copied type, while being able to keep the generic code type-agnostic.
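For readers who want to see the kind of "generic macro" this is aimed at, here is a rough sketch of a type-agnostic swap built on the storage-bin trick. GENERIC_SWAP and struct pair are hypothetical names, and the macro assumes both arguments are lvalues of the same type.

```c
#include <assert.h>
#include <string.h>

/*
 * Type-agnostic swap of two lvalues of the same type, using a char
 * array as the "storage bin". No typeof() is needed, only sizeof.
 */
#define GENERIC_SWAP(a, b)                              \
    do {                                                \
        char swap_bin_[sizeof(a)];                      \
        memcpy(swap_bin_, &(a), sizeof(a));             \
        memcpy(&(a), &(b), sizeof(a));                  \
        memcpy(&(b), swap_bin_, sizeof(a));             \
    } while (0)

struct pair {
    int first;
    double second;
};
```

GENERIC_SWAP(i, j) works the same way for two ints as for two struct pair objects; nothing in the macro checks that the two arguments really have the same type, which is part of the price of the approach.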
P.S. To slightly deflect the likely-inevitable rain of judgement, I'd like to say that I do recognize that this is seriously convoluted, and I would only reserve this in practice for very well-tested library code where it significantly added usefulness, not something I would regularly deploy.
Yes, it works. Yes, it is C89 standard. Yes, it is convoluted.
Minor improvement
A table of bytes char[] can start at any position in memory.
Depending on the content of your thingToCopy, and depending on CPU, this can result in sub-optimal copy performance.
Should speed matter (since it may not if this operation is rare), you may prefer to align your table, using int, long long or size_t units instead.
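One way to get an aligned bin without knowing the type, sketched here as a variation on the suggestion above, is to put the char array in a union with a few strictly aligned members. struct sample and save_and_restore are illustrative names, and the choice of union members is an assumption about which types are strict enough on the target.

```c
#include <assert.h>
#include <string.h>

struct sample {
    char tag;
    double value;
};

/* Saves *s into an aligned storage bin, overwrites *s, then restores it. */
int save_and_restore(struct sample *s)
{
    /* The union gives the byte array the alignment of its strictest member. */
    union {
        char bytes[sizeof *s];
        long l;
        double d;
        void *p;
    } bin;

    memcpy(bin.bytes, s, sizeof *s);     /* save */
    s->tag = '?';                        /* use *s as scratch space */
    s->value = -1.0;
    memcpy(s, bin.bytes, sizeof *s);     /* restore */
    return s->tag != '?';
}
```

Whether this is actually faster than a plain char array depends on the CPU and on how memcpy is implemented, so it is worth measuring before adopting it.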
Major limitation
Your proposition only works if you know the size of thingToCopy.
This is a major issue: it means your compiler needs to know what thingToCopy is at compile time (hence, it cannot be an incomplete type).
Hence, the following sentence is troubling :
Since we can't explicitly use/declare the type of the variable we just copied
No way. In order to compile char copyPtr[sizeof(thingToCopy)];, the compiler must know what thingToCopy is, hence it must have access to its type !
If you know it, you can simply do :
thingToCopy_t save;
save = thingToCopy;
/* do some stuff with thingToCopy */
thingToCopy = save;
which is clearer to read, and even better from an alignment perspective.
It would be bad to use your code on an object containing a pointer (except const pointer to const). Someone might modify the pointed-to data, or the pointer itself (e.g. realloc). This would leave your copy of the object in an unexpected or even invalid state.
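A small sketch of that pitfall, with made-up names (struct buffer_ref, shallow_restore_demo): the bin saves the pointer value, not what it points to, so restoring the object does not undo changes to the pointed-to data.

```c
#include <assert.h>
#include <string.h>

struct buffer_ref {
    char *data;   /* points at storage owned elsewhere */
    int length;
};

/* Returns the first character seen through r->data after save/modify/restore. */
char shallow_restore_demo(struct buffer_ref *r)
{
    char bin[sizeof *r];

    memcpy(bin, r, sizeof *r);   /* saves the pointer value, not the characters */
    r->data[0] = 'X';            /* modifies the pointed-to storage */
    r->length = 0;               /* modifies the object itself */
    memcpy(r, bin, sizeof *r);   /* restores the data and length members only */

    return r->data[0];           /* still 'X': the pointee was never saved */
}
```

The same limitation applies to an ordinary typed temporary and structure assignment; it is a property of shallow copies in general, not of the char array trick.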
Generic programming is one of the main driving forces behind C++. Others have tried to do generic programming in C using macros and casts. It's OK for small examples, but doesn't scale well. The compiler can't catch bugs for you when you use those techniques.

Rationale for pointer comparisons outside an array to be UB

So, the standard (referring to N1570) says the following about comparing pointers:
C99 6.5.8/5 Relational operators
When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to.
... [snip obvious definitions of comparison within aggregates] ...
In all other cases,
the behavior is undefined.
What is the rationale for this instance of UB, as opposed to specifying (for instance) conversion to intptr_t and comparison of that?
Is there some machine architecture where a sensible total ordering on pointers is hard to construct? Is there some class of optimization or analysis that unrestricted pointer comparisons would impede?
A deleted answer to this question mentions that this piece of UB allows for skipping comparison of segment registers and only comparing offsets. Is that particularly valuable to preserve?
(That same deleted answer, as well as one here, note that in C++, std::less and the like are required to implement a total order on pointers, whether the normal comparison operator does or not.)
Various comments in the ub mailing list discussion Justification for < not being a total order on pointers? strongly allude to segmented architectures being the reason, including the following comments, 1:
Separately, I believe that the Core Language should simply recognize the fact that all machines these days have a flat memory model.
and 2:
Then we maybe need an new type that guarantees a total order when
converted from a pointer (e.g. in segmented architectures, conversion
would require taking the address of the segment register and adding the
offset stored in the pointer).
and 3:
Pointers, while historically not totally ordered, are practically so
for all systems in existence today, with the exception of the ivory tower
minds of the committee, so the point is moot.
and 4:
But, even if segmented architectures, unlikely though it is, do come
back, the ordering problem still has to be addressed, as std::less
is required to totally order pointers. I just want operator< to be an
alternate spelling for that property.
Why should everyone else pretend to suffer (and I do mean pretend,
because outside of a small contingent of the committee, people already
assume that pointers are totally ordered with respect to operator<) to
meet the theoretical needs of some currently non-existent
Counter to the trend of comments from the ub mailing list, FUZxxl points out that supporting DOS is a reason not to support totally ordered pointers.
This is also supported by the Annotated C++ Reference Manual(ARM) which says this was due to burden of supporting this on segmented architectures:
The expression may not evaluate to false on segmented architectures
[...] This explains why addition, subtraction and comparison of
pointers are defined only for pointers into an array and one element
beyond the end. [...] Users of machines with a nonsegmented address
space developed idioms, however, that referred to the elements beyond
the end of the array [...] was not portable to segmented architectures
unless special effort was taken [...] Allowing [...] would be costly
and serve few useful purposes.
The 8086 is a processor with 16 bit registers and a 20 bit address space. To cope with the lack of bits in its registers, a set of segment registers exists. On memory access, the dereferenced address is computed like this:
address = 16 * segment + register
Notice that among other things, an address has generally multiple ways to be represented. Comparing two arbitrary addresses is tedious as the compiler has to first normalize both addresses and then compare the normalized addresses.
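The aliasing and the cost of normalizing can be sketched in portable C. The struct far_ptr and the function names below are invented for illustration, and the sketch ignores the wrap-around at the top of the 20-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode 8086 far pointer: 16-bit segment plus 16-bit offset. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Linear address as described above: 16 * segment + offset. */
uint32_t far_to_linear(struct far_ptr p)
{
    return (uint32_t)p.segment * 16u + p.offset;
}

/* Comparison that is correct for arbitrary far pointers: normalize first. */
int far_less(struct far_ptr a, struct far_ptr b)
{
    return far_to_linear(a) < far_to_linear(b);
}

/* Cheap comparison that is only valid when both segments are equal. */
int far_less_same_segment(struct far_ptr a, struct far_ptr b)
{
    return a.offset < b.offset;
}
```

For example, 1234:0010 and 1235:0000 both name linear address 0x12350, so far_less() reports neither as smaller, while comparing only the offsets would claim the second one comes first.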
Many compilers specify (in the memory models where this is possible) that when doing pointer arithmetic, the segment part is to be left untouched. This has several consequences:
objects can have a size of at most 64 kB
all addresses in an object have the same segment part
comparing addresses in an object can be done just by comparing the register part; that can be done in a single instruction
This fast comparison of course only works when the pointers are derived from the same base-address, which is one of the reasons why the C standard defines pointer comparisons only for when both pointers point into the same object.
If you want a well-ordered comparison for all pointers, consider converting the pointers to uintptr_t values first.
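A minimal sketch of that suggestion (ptr_less is a hypothetical helper name; uintptr_t is an optional type, so this assumes the implementation provides it):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Orders two arbitrary object pointers by their integer conversion.
 * The mapping to uintptr_t is implementation-defined, so the order is
 * consistent within one program but need not reflect memory layout.
 */
int ptr_less(const void *a, const void *b)
{
    return (uintptr_t)a < (uintptr_t)b;
}
```

For two distinct objects x and y, exactly one of ptr_less(&x, &y) and ptr_less(&y, &x) is true, which is the property that the built-in < does not promise for unrelated pointers.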
I believe it's undefined so that C can be run on architectures where, in effect, "smart pointers" are implemented in hardware, with various checks to ensure that pointers never accidentally point outside of the memory regions they're defined to refer to. I've never personally used such a machine, but the way to think about them is that computing an invalid pointer is precisely as forbidden as dividing by 0; you're likely to get a run-time exception that terminates your program. Furthermore, what's forbidden is computing the pointer, you don't even have to dereference it to get the exception.
Yes, I believe the definition also ended up permitting more-efficient comparisons of offset registers in old 8086 code, but that was not the only reason.
Yes, a compiler for one of these protected pointer architectures could theoretically implement the "forbidden" comparisons by converting to unsigned or the equivalent, but (a) it would likely be significantly less efficient to do so and (b) that would be a wantonly deliberate circumvention of the architecture's intended protection, protection which at least some of the architecture's C programmers would presumably want to have enabled (not disabled).
Historically, saying that an action invoked Undefined Behavior meant that any program which made use of such actions could be expected to work correctly only on those implementations which defined, for that action, behavior meeting their requirements. Specifying that an action invoked Undefined Behavior didn't mean that programs using such an action should be considered "illegitimate", but was rather intended to allow C to be used to run programs that didn't require such actions, on platforms which could not efficiently support them.
Generally, the expectation was that a compiler would either output the sequence of instructions which would most efficiently perform the indicated action in the cases required by the standard, and do whatever that sequence of instructions happened to do in other cases, or would output a sequence of instructions whose behavior in such cases was deemed to be in some fashion more "useful" than the natural sequence. In cases where an action might trigger a hardware trap, or where triggering an OS trap might plausibly in some cases be considered preferable to executing the "natural" sequence of instructions, and where a trap might cause behaviors outside the control of the C compiler, the Standard imposes no requirements. Such cases are thus labeled as "Undefined Behavior".
As others have noted, there are some platforms where p1 < p2, for unrelated pointers p1 and p2, could be guaranteed to yield 0 or 1, but where the most efficient means of comparing p1 and p2 that would work in the cases defined by the Standard might not uphold the usual expectation that p1 < p2 || p1 > p2 || p1 == p2. If a program written for such a platform knows that it will never deliberately compare unrelated pointers (implying that any such comparison would represent a program bug) it may be helpful to have stress-testing or troubleshooting builds generate code which traps on any such comparisons. The only way for the Standard to allow such implementations is to make such comparisons Undefined Behavior.
Until recently, the fact that a particular action would invoke behavior that was not defined by the Standard would generally only pose difficulties for people trying to write code on platforms where the action would have undesirable consequences. Further, on platforms where an action could only have undesirable consequences if a compiler went out of its way to make it do so, it was generally accepted practice for programmers to rely upon such an action behaving sensibly.
If one accepts the notions that:
The authors of the Standard expected that comparisons between unrelated pointers would work usefully on those platforms, and only those platforms, where the most natural means of comparing related pointers would also work with unrelated ones, and
There exist platforms where comparing unrelated pointers would be problematic
Then it makes complete sense for the Standard to regard unrelated-pointer comparisons as Undefined Behavior. Had they anticipated that even compilers for platforms which define a disjoint global ranking for all pointers might make unrelated-pointer comparisons negate the laws of time and causality (e.g. given:
int needle_in_haystack(char const *hs_base, int hs_size, char *needle)
{ return needle >= hs_base && needle < hs_base+hs_size; }
a compiler may infer that the program will never receive any input which would cause needle_in_haystack to be given unrelated pointers, and any code which would only be relevant when the program receives such input may be eliminated) I think they would have specified things differently. Compiler writers would probably argue that the proper way to write needle_in_haystack would be:
int needle_in_haystack(char const *hs_base, int hs_size, char *needle)
{
  for (int i=0; i<hs_size; i++)
    if (hs_base+i == needle) return 1;
  return 0;
}
since their compilers would recognize what the loop is doing and also recognize that it's running on a platform where unrelated pointer comparisons work, and thus generate the same machine code as older compilers would have generated for the earlier-stated formulation. As to whether it would be better to require that compilers provide a means of specifying that code resembling the former version should either work sensibly on platforms that will support it or be rejected at compile time on those that won't, or better to require that programmers intending the former semantics should write the latter and hope that optimizers turn it into something useful, I leave that to the reader's judgment.

C structure assignment uses memcpy

I have this StructType st = StructTypeSecondInstance->st; and it generates a segfault. The strange part is when the stack backtrace shows me:
0x1067d2cc: memcpy + 0x10 (0, 10000, 1, 1097a69c, 11db0720, bfe821c0) + 310
0x103cfddc: some_function + 0x60 (0, bfe823d8, bfe82418, 10b09b10, 0, 0) +
So, does struct assignment use memcpy?
One can't tell. Small structs may even be kept in registers. Whether memcpy is used is an implementation detail (it's not even implementation-defined, or unspecified -- it's just something the compiler writer chooses and does not need to document.)
From a C Standard point of view, all that matters is that after the assignment, the struct members of the destination struct compare equal to the corresponding members of the source struct.
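To make that guarantee concrete, here is a small sketch. The struct sample type and the helper names are invented for illustration, and the remark about padding assumes a typical ABI where long is aligned more strictly than char.

```c
#include <string.h>

/* Hypothetical struct; on most ABIs there is padding after 'tag'. */
struct sample {
    char tag;
    long value;
};

/* Copy by plain assignment: the standard guarantees the member values,
   not how the bytes get moved (inline moves, a memcpy call, ...). */
static void copy_by_assignment(struct sample *dst, const struct sample *src)
{
    *dst = *src;
}

/* Copy by memcpy: every byte of the object, padding included, is copied. */
static void copy_by_memcpy(struct sample *dst, const struct sample *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Member-by-member equality, the only comparison the guarantee covers. */
static int sample_equal(const struct sample *a, const struct sample *b)
{
    return a->tag == b->tag && a->value == b->value;
}
```

After either copy, sample_equal reports the two objects as equal. Only the memcpy version also promises that a byte-wise memcmp of the whole objects returns 0.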
I would expect compiler writers to make a tradeoff between speed and simplicity, probably based on the size of the struct, the larger the more likely to use a memcpy. Some memcpy implementations are very sophisticated and use different algorithms depending on whether the length is some power of 2 or not, or the alignment of the src and dst pointers. Why reinvent the wheel or blow up the code with an inline version of memcpy?
It might, yes.
This shouldn't be surprising: the struct assignment needs to copy a bunch of bytes from one place to another as quickly as possible, which happens to be the exact thing memcpy() is supposed to be good at. Generating a call to it seems like a no-brainer if you're a compiler writer.
Note that this means that assigning structs with lots of padding might be less efficient than optimal, since memcpy() can't skip the padding.
The standard doesn't say anything at all about how assignment (or any other operator) is actually realized by the compiler. There's nothing stopping a compiler from (say) generating a function call for every operation in your source file.
The compiler has license to implement assignment as it thinks best. Most of the time, with most compilers on most platforms, this means that if the structure is reasonably small, the compiler will generate an inline sequence of move instructions; if the structure is large, calling memcpy is common.
It would be perfectly valid, however, for the compiler to loop over generating random bitfields and stop when one of them matches the source of the assignment (Let's call this algorithm bogocopy).
Compilers that support non-hosted operation usually give you a switch to turn off emitting such libcalls if you're targeting a platform without an available (or complete) libc.
It depends on the compiler and platform. Assignment of big objects can use memcpy, but that in itself should not be the cause of the segfault.

Approved syntax for raw pointer manipulation

I am making a memory block copy routine and need to deal with blocks of raw memory in efficient chunks. My question is not about the specialized copy routine I'm making, but in how to correctly examine raw pointer alignment in C.
I have a raw pointer of memory, let's say it's already cast as a non-null char *.
In my architecture, I can very efficiently copy memory in 64 byte chunks WHEN IT IS ALIGNED TO A 64 BYTE chunk. So the (standard) trick is that I will do a simple copy of 0-63 bytes "manually" at the head and/or tail to transform the copy from an arbitrary char* of arbitrary length to a 64 byte aligned pointer with some multiple of 64 bytes in length.
Now the question is, how do you legally "examine" a pointer to determine (and manipulate) its alignment?
The obvious way is to cast it into an integer and just examine the bits:
char *pointer=something;
int p=(int)pointer;
char *alignedPointer=(char *)((p+63)&~63);
Note here I realize that alignedPointer doesn't point to the same memory as pointer... this is the "rounded up" pointer that I can call my efficient copy routine on, and I'll handle any other bytes at the beginning manually.
But compilers (justifiably) freak out at casting a pointer into an integer. But how else can I examine and manipulate the pointer's lower bits in LEGAL C? Ideally so that with different compilers I'd get no errors or warnings.
For integer types that are large enough to hold pointers, C99 stdint.h has intptr_t and uintptr_t.
For data lengths there are size_t and ptrdiff_t, which have been around since well before C99.
If your platform doesn't have these, you can maximise your code's portability by still using these type names, and making suitable typedefs for them.
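A sketch of the rounding step from the question, using uintptr_t in place of int. It assumes the low bits of the converted integer reflect the address alignment, which is implementation-defined but true on common flat-memory targets. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes (0..63) to skip so that p + result is 64-byte aligned.
   Only the integer copy of the pointer is inspected. */
static size_t bytes_to_align64(const char *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (size_t)((64 - (addr & 63)) & 63);
}

/* Rounds p up to the next 64-byte boundary by advancing the original
   pointer, so the result is still derived from p. The caller must make
   sure the result stays inside (or one past the end of) the same buffer. */
static char *align_up64(char *p)
{
    return p + bytes_to_align64(p);
}
```

Advancing the original pointer, rather than casting the rounded integer back to char *, avoids a second implementation-defined conversion.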
I don't think that in the past people were as reluctant to do their own bit-banging, but maybe the current "don't touch that" mood would be conducive to someone creating some kind of standard library for aligning pointers. Lacking some kind of official API, you have no choice but to AND and OR your way through.
Instead of int, try a datatype that's guaranteed to be the same size as a pointer (INT_PTR on Win32/64). Maybe the compiler won't freak out too much. :) Or use a union, if 64-bit compatibility is not important.
Casting pointers to and from integers is valid, but the results are implementation-defined. See section of the standard. The intention seems to be that the results are what anybody familiar with the system would expect, and indeed this appears to be routinely the case in practice.
If the architecture in question can efficiently manipulate pointers and integers interchangeably, and the issue is just whether it will work on all compilers for that system, then the answer is that it probably will anyway.
(Certainly, if I were writing this code, I would think it fine as-is until proven otherwise. My experience has been that compilers for a given system all behave in very similar ways at this sort of level; the assembly language just suggests a particular approach, that all then take.)
"Probably works" isn't very good general advice though, so my suggestion would be just write the code that works, surround it with enough suitable #ifdefs that only the known compiler(s) will compile it, and defer to memcpy in other cases.
#ifdef is rarely ideal, but it's fairly lightweight compared to other possibilities. And if implementation-defined behaviour or compiler-specific tricks are needed then the options are pretty limited anyway.
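One way that guard could look, as a rough sketch. The macro HAVE_CHECKED_PTR_BITS, the functions block_copy and copy_aligned_blocks, and the choice of __GNUC__ as the "known compiler" test are all assumptions made for illustration. The aligned routine is a memcpy stand-in for the architecture-specific code, and only the destination is aligned here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: only compilers defining __GNUC__ have been checked for the
   pointer-to-integer behaviour relied on below. */
#if defined(__GNUC__) && defined(UINTPTR_MAX)
#define HAVE_CHECKED_PTR_BITS 1
#endif

/* Stand-in for the fast routine; expects dst 64-byte aligned and n a
   multiple of 64. */
static void copy_aligned_blocks(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical entry point: unaligned head, aligned body, leftover tail. */
static void block_copy(char *dst, const char *src, size_t n)
{
#ifdef HAVE_CHECKED_PTR_BITS
    size_t head = (size_t)((64 - ((uintptr_t)dst & 63)) & 63);
    if (head > n)
        head = n;
    size_t body = (n - head) & ~(size_t)63;

    memcpy(dst, src, head);                            /* bytes before the boundary */
    copy_aligned_blocks(dst + head, src + head, body); /* whole 64-byte chunks */
    memcpy(dst + head + body, src + head + body,
           n - head - body);                           /* remaining tail bytes */
#else
    memcpy(dst, src, n);                               /* unverified compiler */
#endif
}
```

On any compiler that has not been verified, the whole routine collapses to a plain memcpy, so correctness never depends on the implementation-defined cast.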
