Safely checking for overlapping memory regions - c

I'm trying to further my knowledge and experience in C, so I'm writing some small utilities.
I'm copying memory, and according to the man page for memcpy(3):
NOTES
Failure to observe the requirement that the memory areas do not overlap has been
the source of real bugs. (POSIX and the C standards are explicit that employing
memcpy() with overlapping areas produces undefined behavior.) Most notably, in
glibc 2.13 a performance optimization of memcpy() on some platforms (including
x86-64) included changing the order in which bytes were copied from src to dest.
Clearly, overlapping memory regions passed to memcpy(3) can cause a lot of problems.
I'm trying to write a safe wrapper as part of learning C to make sure that these memory regions don't overlap:
int safe_memcpy(void *dest, void *src, size_t length);
The logic I'm trying to implement is:
Check both the source and destination pointers for NULL.
Establish the pointer "range" for both source and dest with the length parameter.
Determine if the source range intersects with the destination range, and vice versa.
My implementation so far:
#define SAFE_MEMCPY_ERR_NULL 1
#define SAFE_MEMCPY_ERR_SRC_OVERLAP 2
#define SAFE_MEMCPY_ERR_DEST_OVERLAP 3
int safe_memcpy(void *dest, void *src, size_t length) {
if (src == NULL || dest == NULL) {
return SAFE_MEMCPY_ERR_NULL;
}
void *dest_end = &dest[length - 1];
void *src_end = &src[length - 1];
if ((&src >= &dest && &src <= &dest_end) ||
(&src_end >= &dest && &src_end <= &dest_end)) {
// the start of src falls within dest..dest_end OR
// the end of src falls within dest..dest_end
return SAFE_MEMCPY_ERR_SRC_OVERLAP;
}
if ((&dest >= &src && &dest <= &src_end) ||
(&dest_end >= &src && &dest_end <= &src_end)) {
// the start of dest falls within src..src_end
// the end of dest falls within src..src_end
return SAFE_MEMCPY_ERR_DEST_OVERLAP;
}
// do the thing
memcpy(dest, src, length);
return 0;
}
There's probably a better way to do errors, but this is what I've got for now.
I'm pretty sure I'm triggering some undefined behavior in this code, as I'm hitting SAFE_MEMCPY_ERR_DEST_OVERLAP on memory regions that do not overlap. When I examine the state using a debugger, I see (for instance) the following values:
src: 0x7ffc0b75c5fb
src_end: 0x7ffc0b75c617
dest: 0x1d05420
dest_end: 0x1d0543c
Clearly, these addresses do not even remotely overlap, hence why I'm thinking I'm triggering UB, and compiler warnings indicate as such:
piper.c:68:27: warning: dereferencing ‘void *’ pointer
void *dest_end = &dest[length - 1];
It seems that I need to cast the pointers as a different type, but I'm not sure which type to use: the memory is untyped so should I use a char * to "look at" the memory as bytes? If so, should I cast everything as a char *? Should I instead use intptr_t or uintptr_t?
Given two pointers and a length for each of them, how can I safely check if these regions overlap one another?

In the first place, a conforming program cannot perform pointer arithmetic on a pointer of type void *, nor (relatedly) apply the indexing operator to it, not even with index 0. void is an incomplete type, and unique among those in that it cannot be completed. The most relevant implication of that is that that type does not convey any information about the size of the thing to which it points, and pointer arithmetic is defined in terms of the pointed-to object.
So yes, expressions such as your &dest[length - 1] have undefined behavior with respect to the C standard. Some implementations provide extensions affecting that, and others reject such code at compile time. In principle, an implementation could accept the code and do something bizarre with it, but that's relatively unlikely.
In the second place, you propose to
write a safe wrapper as part of learning C to make sure that these memory regions don't overlap
, but there is no conforming way to do that for general pointers. Pointer comparisons and pointer differences are defined only for pointers into the same array (or to one element past the end of the array), where a pointer to a scalar is considered in that regard as a pointer to the first element of dimension-1 array.
Converting to a different pointer type, perhaps char *, would resolve the pointer arithmetic issue, but not, in the general case, the pointer comparability issue. It might get exactly the behavior you want out of some implementations, reliably even, but it is not a conforming approach to the problem, and the ensuing undefined behavior might produce genuine bugs in other implementations.
Relatively often, you can know statically that pointers do not point to overlapping regions. In particular, if one pointer in question is a pointer to an in-scope local variable or to a block of memory allocated by the current function, then you can usually be sure whether there is an overlap. For cases where you do not know, or where you know that there definitely is overlap, the correct approach is to use memmove() instead of memcpy().
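If what you actually want is a wrapper that is safe in the presence of possible overlap, rather than one that detects and rejects it, a minimal sketch (the name safe_copy is made up here) can simply delegate to memmove():

#include <string.h>

/* Sketch only: memmove() is specified to copy correctly even when the
 * source and destination regions overlap, so no overlap check is needed. */
int safe_copy(void *dest, const void *src, size_t length) {
    if (dest == NULL || src == NULL)
        return 1;               /* mirrors SAFE_MEMCPY_ERR_NULL above */
    memmove(dest, src, length);
    return 0;
}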

This "safe" memcpy is not safe as well as it does not copy anything when programmes expects it. Use memmove to be safe
You should not use &src and &dest as it is not beginning of the data or buffer but the address of the parameter src and dest itself.
Same is with srcend and destend

Given two pointers and a length for each of them, how can I safely check if these regions overlap one another?
<, <=, >=, > are not defined when 2 pointers are not related to the same object.
A tedious approach checks the endpoints of one against all the other's elements and takes advantage that the length of the source and destination are the same.
int safe_memcpy(void *dest, const void *src, size_t length) {
if (length > 0) {
unsigned char *d = dest;
const unsigned char *s = src;
const unsigned char *s_last = s + length - 1;
for (size_t i = 0; i < length; i++) {
if (s == &d[i]) return 1; // not safe
if (s_last == &d[i]) return 1; // not safe
}
memcpy(dest, src, length);
}
return 0;
}
If the buffer lengths differ, check the shorter one's endpoints against the addresses of the longer one's elements.
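A sketch of that generalization (function and parameter names invented here): swap so that the first region is the shorter one, then compare its first and last byte addresses against every element address of the longer region, using only ==, which is defined for any pair of valid pointers.

#include <stddef.h>

static int regions_overlap(const void *p, size_t p_len,
                           const void *q, size_t q_len) {
    if (p_len == 0 || q_len == 0)
        return 0;                     /* an empty region overlaps nothing */
    if (p_len > q_len) {              /* make p the shorter region */
        const void *tp = p; p = q; q = tp;
        size_t tl = p_len; p_len = q_len; q_len = tl;
    }
    const unsigned char *p_first = p;
    const unsigned char *p_last = p_first + p_len - 1;
    const unsigned char *qs = q;
    for (size_t i = 0; i < q_len; i++) {
        /* the shorter region cannot strictly contain the longer one, so any
         * overlap puts one of its endpoints on an element of the longer one */
        if (p_first == &qs[i] || p_last == &qs[i])
            return 1;
    }
    return 0;
}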
should I cast everything as a char *
Use unsigned char *.
mem...(), str...() behave as if each array element was unsigned char.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3
On rare non-two's-complement machines, unsigned char matters to avoid signed char trap representations and to keep -0 and +0 distinct. Strings only stop on +0.
With functions like strcmp()/memcmp() that return an int computed from the elements, unsigned char matters so that comparing elements outside the range [0...CHAR_MAX] returns a result with the correct sign.
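To illustrate that last point, here is a sketch of a memcmp()-style byte comparison (not the library implementation): routing the accesses through unsigned char is what makes, say, a 0xFF byte compare greater than a 0x01 byte instead of comparing as a negative signed char.

#include <stddef.h>

/* Sketch only: compares n bytes like memcmp() and shows why the element
 * type must be unsigned char for the sign of the result to come out right. */
int byte_compare(const void *a, const void *b, size_t n) {
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return (pa[i] > pb[i]) ? 1 : -1;  /* 0xFF > 0x01 here, as desired */
    }
    return 0;
}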
Even if void * indexing were allowed, void *dest_end = &dest[length - 1]; is very bad when length == 0, as that is effectively &dest[SIZE_MAX].
&src >= &dest should be src >= dest to have even a chance of working.
The addresses of src and dest themselves are irrelevant to the copy; only their values matter.
I suspect this errant code leads to UB in OP's other code.
Should I instead use intptr_t or uintptr_t?
Note that (u)intptr_t are optional types - they might not exist in a conforming compiler.
Even when the types exist, math on the pointers is not defined to be related to math on the integer values.
Clearly, these addresses do not even remotely overlap, hence why I'm thinking I'm triggering UB,
"Clearly" if ones assumes a liner mapping addresses to integers, something not specified in C.

The memory is untyped so should I use a char * to "look at" the memory as bytes? If so, should I cast everything as a char *?
Use unsigned char* if you need to dereference the data, or just char* when you want to increment/decrement the pointer value by count of bytes.
It's common to do:
void a_function_that_takes_void(void *x, void *y) {
char *a = x;
char *b = y;
/* uses a and b throughout here */
}
If so, should I cast everything as a char *?
Yes. It's also common to do:
void_pointer = (char*)void_pointer + 1;
Should I instead use intptr_t or uintptr_t?
You could, but that would be essentially the same as using char *, with an extra conversion from the pointer to intptr_t.
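A small sketch of what that looks like in practice (the function name is made up here; note the standard only guarantees that converting a valid pointer to uintptr_t and back yields a pointer that compares equal to the original, not that arithmetic on the integer corresponds to pointer arithmetic):

#include <stdint.h>

void *advance_one_byte(void *p) {
    /* char * route: well-defined as long as p points into an object
     * with at least one more byte. */
    void *via_char = (char *)p + 1;

    /* uintptr_t route: legal to write, but whether the result points one
     * byte further into the same object is implementation-specific. */
    void *via_integer = (void *)((uintptr_t)p + 1);

    (void)via_integer;  /* shown only for comparison */
    return via_char;
}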
how can I safely check if these regions overlap one another?
It's good to do some research. how to implement overlap-checking memcpy in C

Related

Is creating an array with a built-in length common in C?

For an experiment I created a function to initialize an array that has a built-in length, like in Java:
int *create_arr(int len) {
void *ptr = malloc(sizeof(int[len + 1]));
int *arr = ptr + sizeof(int);
arr[-1] = len;
return arr;
}
that can later be used like this:
int *arr = create_arr(12);
and allows finding the length at arr[-1]. I was asking myself whether this is common practice or not, and whether there is an error in what I did.
First of all, your code has some bugs, mainly that in standard C you can't do arithmetic on void pointers (as commented by MikeCAT). Probably a more typical way to write it would be:
int *create_arr(int len) {
int *ptr = malloc((len + 1) * sizeof(int));
if (ptr == NULL) {
// handle allocation failure
}
ptr[0] = len;
return ptr + 1;
}
This is legal but no, it's not common. It's more idiomatic to keep track of the length in a separate variable, not as part of the array itself. An exception is functions that try to reproduce the effect of malloc, where the caller will later pass back the pointer to the array but not the size.
One other issue with this approach is that it limits your array length to the maximum value of an int. On, let's say, a 64-bit system with 32-bit ints, you could conceivably want an array whose length did not fit in an int. Normally you'd use size_t for array lengths instead, but that won't work if you need to fit the length in an element of the array itself. (And of course this limitation would be much more severe if you wanted an array of short or char or bool :-) )
Note that, as Andrew Henle comments, the pointer returned by your function could be used for an array of int, but would not be safe to use for other arbitrary types as you have destroyed the alignment promised by malloc. So if you're trying to make a general wrapper or replacement for malloc, this doesn't do it.
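If you did want to keep that alignment guarantee while still storing a length in front of the data, one sketch (C11, names invented here) is to pad the header up to the strictest alignment:

#include <stdlib.h>
#include <stddef.h>

/* Sketch: the union pads the stored length up to max_align_t, so the bytes
 * returned to the caller remain suitably aligned for any type. */
union length_header {
    size_t len;
    max_align_t pad;
};

void *create_block(size_t len) {
    union length_header *h = malloc(sizeof *h + len);
    if (h == NULL)
        return NULL;
    h->len = len;
    return h + 1;     /* user data starts just past the padded header */
}

To free such a block you would step back over the header first, e.g. free((union length_header *)p - 1).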
Apart from the small mistakes that have already been pointed out in comments, this is not common, because C programmers are used to handling arrays as an initial pointer and a size. I have mainly seen it in mixed programming environments, for example in Windows COM/DCOM where C++ programs can exchange data with VB programs.
Your array with a built-in size is close to the winAPI BSTR: an array of 16-bit wide chars where the allocated size is at index -1 (and is also a 16-bit integer). So there is nothing really bad with it.
But in the general case, you could have an alignment problem. malloc does return a pointer with suitable alignment for any type, and you should make sure that the 0th index of your returned array also has suitable alignment. If int does not have the strictest alignment requirement, it could fail...
Furthermore, as the pointer is not at the beginning of the allocated memory, the array would require a special function for its deallocation (see the sketch below). It should probably be documented in a red flashing font, because this would be very uncommon for most C programmers.
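For what it's worth, such a deallocation helper for the corrected create_arr() above could look like this (the name destroy_arr is made up):

#include <stdlib.h>

/* Matching deallocation for a create_arr() that returns ptr + 1: step back
 * over the stored length to recover the pointer malloc actually returned. */
void destroy_arr(int *arr) {
    if (arr != NULL)
        free(arr - 1);
}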
This technique is not as uncommon as people expect. For example stb header only library for image processing uses this method to implement type safe vector like container in C. See https://github.com/nothings/stb/blob/master/stretchy_buffer.h
It would be more idiomatic to do something like:
struct array {
int *d;
size_t s;
};
struct array *
create_arr(size_t len)
{
struct array *a = malloc(sizeof *a);
if( a ){
a->d = malloc(len * sizeof *a->d);
a->s = a->d ? len : 0;
}
return a;
}
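As a usage sketch (names taken from the snippet above, error handling kept minimal):

struct array *a = create_arr(12);
if (a != NULL && a->d != NULL) {
    for (size_t i = 0; i < a->s; i++)
        a->d[i] = (int)i;   /* the length travels with the data */
}
if (a != NULL) {
    free(a->d);             /* free the element buffer ... */
    free(a);                /* ... then the descriptor itself */
}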

Comparison (>,>=,<,<=) of pointers from unrelated blocks

I was looking at the GNU implementation of obstacks, and I noticed the obstack_free subroutine uses pointer comparison against the beginnings and ends of the previous links of the linked list to find which block the pointer to be freed belongs to.
https://code.woboq.org/userspace/glibc/malloc/obstack.c.html
while (lp != 0 && ((void *) lp >= obj || (void *) (lp)->limit < obj))
{
plp = lp->prev;
CALL_FREEFUN (h, lp);
lp = plp;
h->maybe_empty_object = 1;
} //...
Such comparison appears to be undefined as per http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5:
When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to. If two
pointers to object types both point to the same object, or both point
one past the last element of the same array object, they compare
equal. If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare greater
than pointers to members declared earlier in the structure, and
pointers to array elements with larger subscript values compare
greater than pointers to elements of the same array with lower
subscript values. All pointers to members of the same union object
compare equal. If the expression P points to an element of an array
object and the expression Q points to the last element of the same
array object, the pointer expression Q+1 compares greater than P. In
all other cases, the behavior is undefined.
Is there a fully standard-compliant way to implement obstacks? If not, what platforms could such a comparison practically break on?
I am not a language-lawyer, so I don't know how to answer OP's question, except that a plain reading of the standard does not describe the entire picture.
While the standard says that comparing unrelated pointers yields undefined results, the behaviour of a standards-compliant C compiler is much more restricted.
The first sentence in the section concerning pointer comparison is
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to.
and for a very good reason.
If we examine the possibilities how the pointer comparison code may be used, we find that unless the compiler can determine which objects the compared pointers belong to at compile time, all pointers in the same address space must compare arithmetically, according to the addresses they refer to.
(If we prove that a standards-compliant C compiler is required by the standard to provide specific results when a plain reading of the C standard itself says the results are undefined, is such code standards-compliant or not? I don't know. I only know such code works in practice.)
A literal interpretation of the standard may lead to one believing that there is absolutely no way of determining whether a pointer refers to an array element or not. In particular, observing
int is_within(const char *arr, const size_t len, const char *ptr)
{
return (ptr >= arr) && (ptr < (arr + len));
}
a standards compliant C compiler could decide that because comparison between unrelated pointers is undefined, it is justified in optimizing the above function into
int is_within(const char *arr, const size_t len, const char *ptr)
{
if (len)
return ptr != (arr + len);
else
return 0;
}
which returns 1 for pointers within array const char arr[len], and zero at the element just past the end of the array, just like the standard requires; and 1 for all undefined cases.
The problem in that line of thinking arises when a caller, in a separate compilation unit, does e.g.
char buffer[1024];
char *p = buffer + 768;
if (is_within(buffer, (sizeof buffer) / 2, p)) {
/* bug */
} else {
/* correct */
}
Obviously, if the is_within() function was declared static (or static inline), the compiler could examine all call chains that end up in is_within(), and produce correct code.
However, when is_within() is in a separate compilation unit compared to its callers, the compiler can no longer make such assumptions: it simply does not, and cannot, know the object boundaries beforehand. Instead, the only way it can be implemented by a standards-compliant C compiler is to rely on the addresses the pointers refer to, blindly; something like
int is_within(const char *arr, const size_t len, const char *ptr)
{
const uintptr_t start = POINTER_TO_UINTPTR(arr);
const uintptr_t limit = POINTER_TO_UINTPTR(arr + len);
const uintptr_t thing = POINTER_TO_UINTPTR(ptr);
return (thing >= start) && (thing < limit);
}
where the POINTER_TO_UINTPTR() would be a compiler-internal macro or function, that converts the pointer losslessly to an unsigned integer value (with the intent that there would be a corresponding UINTPTR_TO_POINTER() that could recover the exact same pointer from the unsigned integer value), without consideration for any optimizations or rules allowed by the C standard.
So, if we assume that the code is compiled in a separate compilation unit to its users, the compiler is forced to generate code that provides more guarantees than a simple reading of the C standard would indicate.
In particular, if arr and ptr are in the same address space, the C compiler must generate code that compares the addresses the pointers point to, even if the C standard says that comparison of unrelated pointers yields undefined results; simply because it is at least theoretically possible for an array of objects to occupy any subregion of the address space. The compiler just cannot make assumptions that break conforming C code later on.
In the GNU obstack implementation, the obstacks all exist in the same address space (because of how they are obtained from the OS/kernel). The code assumes that the pointers supplied to it refer to these objects. Although the code does return an error if it detects that a pointer is invalid, it does not guarantee it always detects invalid pointers; thus, we can ignore the invalid pointer cases, and simply assume that because all obstacks are from the same address space, so are all the user-supplied pointers.
There are many architectures with multiple address spaces. x86 with a segmented memory model is one of these. Many microcontrollers have Harvard architecture, with separate address spaces for code and data. Some microcontrollers have a separate address space (different machine instructions) for accessing RAM and flash memory (but capable of executing from both), and so on.
It is even possible for there to be an architecture where each pointer has not only its memory address, but some kind of unique object ID associated with it. This is nothing special; it just means that on such an architecture, each object has their own address space.

strcpy()/strncpy() crashes on structure member with extra space when optimization is turned on on Unix?

When writing a project, I ran into a strange issue.
This is the minimal code I managed to write to recreate the issue. I am intentionally storing an actual string in the place of something else, with enough space allocated.
// #include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h> // For offsetof()
typedef struct _pack{
// The type of `c` doesn't matter as long as it's inside of a struct.
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// Version 1: crash
strcpy((char*)&(p->c), str);
// Version 2: crash
strncpy((char*)&(p->c), str, strlen(str)+1);
// Version 3: works!
memcpy((char*)&(p->c), str, strlen(str)+1);
// puts((char*)&(p->c));
free(p);
return 0;
}
The above code is confusing me:
With gcc/clang -O0, both strcpy() and memcpy() work on Linux/WSL, and the puts() below gives whatever I entered.
With clang -O0 on OSX, the code crashes with strcpy().
With gcc/clang -O2 or -O3 on Ubuntu/Fedora/WSL, the code crashes (!!) at strcpy(), while memcpy() works well.
With gcc.exe on Windows, the code works well whatever the optimization level is.
Also I found some other traits of the code:
(It looks like) the minimum input to reproduce the crash is 9 bytes (including zero terminator), or 1+sizeof(p->c). With that length (or longer) a crash is guaranteed (Dear me ...).
Even if I allocate extra space (up to 1MB) in malloc(), it doesn't help. The above behaviors don't change at all.
strncpy() behaves exactly the same, even with the correct length supplied to its 3rd argument.
The pointer does not seem to matter. If structure member char *c is changed into long long c (or int64_t), the behavior remains the same. (Update: changed already).
The crash message doesn't look regular. A lot of extra info is given along.
I tried all these compilers and they made no difference:
GCC 5.4.0 (Ubuntu/Fedora/OS X/WSL, all are 64-bit)
GCC 6.3.0 (Ubuntu only)
GCC 7.2.0 (Android, norepro???) (This is the GCC from C4droid)
Clang 5.0.0 (Ubuntu/OS X)
MinGW GCC 6.3.0 (Windows 7/10, both x64)
Additionally, this custom string copy function, which looks exactly like the standard one, works well with any compiler configuration mentioned above:
char* my_strcpy(char *d, const char* s){
char *r = d;
while (*s){
*(d++) = *(s++);
}
*d = '\0';
return r;
}
Questions:
Why does strcpy() fail? How can it?
Why does it fail only if optimization is on?
Why doesn't memcpy() fail regardless of -O level??
*If you want to discuss the struct member access violation, please head over here.
P.S. Initially I wanted to write a structure whose last member is a pointer to dynamically allocated space (for a string). When I write the struct to a file, I can't write the pointer; I must write the actual string. So I came up with this solution: force-store the string in the place of the pointer.
Also, please don't complain about gets(). I don't use it in my project, only in the example code above.
What you are doing is undefined behavior.
The compiler is allowed to assume that you will never use more than sizeof(int64_t) for the member int64_t c. So if you try to write more than sizeof(int64_t) (aka sizeof c) to c, you have an out-of-bounds problem in your code. This is the case here because sizeof "aaaaaaaa" > sizeof(int64_t).
The point is, even if you allocate the correct amount of memory using malloc(), the compiler is allowed to assume you will never use more than sizeof(int64_t) in your strcpy() or memcpy() call, because you pass the address of c (aka int64_t c).
TL;DR: You are trying to copy 9 bytes to a type consisting of 8 bytes (we suppose that a byte is an octet). (From #Kcvin)
If you want something similar use flexible array members from C99:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
size_t size;
char str[];
} string;
int main(void) {
char str[] = "aaaaaaaa";
size_t len_str = strlen(str);
string *p = malloc(sizeof *p + len_str + 1);
if (!p) {
return 1;
}
p->size = len_str;
strcpy(p->str, str);
puts(p->str);
strncpy(p->str, str, len_str + 1);
puts(p->str);
memcpy(p->str, str, len_str + 1);
puts(p->str);
free(p);
}
Note: For standard quote please refer to this answer.
I reproduced this issue on my Ubuntu 16.10 and I found something interesting.
When compiled with gcc -O3 -o ./test ./test.c, the program will crash if the input is longer than 8 bytes.
After some reversing I found that GCC replaced strcpy with memcpy_chk, see this.
// decompile from IDA
int __cdecl main(int argc, const char **argv, const char **envp)
{
int *v3; // rbx
int v4; // edx
unsigned int v5; // eax
signed __int64 v6; // rbx
char *v7; // rax
void *v8; // r12
const char *v9; // rax
__int64 _0; // [rsp+0h] [rbp+0h]
unsigned __int64 vars408; // [rsp+408h] [rbp+408h]
vars408 = __readfsqword(0x28u);
v3 = (int *)&_0;
gets(&_0, argv, envp);
do
{
v4 = *v3;
++v3;
v5 = ~v4 & (v4 - 16843009) & 0x80808080;
}
while ( !v5 );
if ( !((unsigned __int16)~(_WORD)v4 & (unsigned __int16)(v4 - 257) & 0x8080) )
v5 >>= 16;
if ( !((unsigned __int16)~(_WORD)v4 & (unsigned __int16)(v4 - 257) & 0x8080) )
v3 = (int *)((char *)v3 + 2);
v6 = (char *)v3 - __CFADD__((_BYTE)v5, (_BYTE)v5) - 3 - (char *)&_0; // strlen
v7 = (char *)malloc(v6 + 9);
v8 = v7;
v9 = (const char *)_memcpy_chk(v7 + 8, &_0, v6 + 1, 8LL); // Fourth argument is 8!!
puts(v9);
free(v8);
return 0;
}
Your struct pack makes GCC believe that the element c is exactly 8 bytes long.
And memcpy_chk will fail if the copying length is larger than the fourth argument!
So there are 2 solutions:
Modify your structure
Use the compile option -D_FORTIFY_SOURCE=0 (like gcc test.c -O3 -D_FORTIFY_SOURCE=0 -o ./test) to turn off the fortify functions.
Caution: This will fully disable buffer overflow checking in the whole program!!
No answer has yet talked in detail about why this code may or may not be undefined behaviour.
The standard is underspecified in this area, and there is a proposal active to fix it. Under that proposal, this code would NOT be undefined behaviour, and the compilers generating code that crashes would fail to comply with the updated standard. (I revisit this in my concluding paragraph below).
But note that based on the discussion of -D_FORTIFY_SOURCE=2 in other answers, it seems this behaviour is intentional on the part of the developers involved.
I'll talk based on the following snippet:
char *x = malloc(9);
pack *y = (pack *)x;
char *z = (char *)&y->c;
char *w = (char *)y;
Now, all three of x, z, and w refer to the same memory location, and would have the same value and the same representation. But the compiler treats z differently from x. (The compiler also treats w differently from one of those two, although we don't know which, as OP didn't explore that case.)
This topic is called pointer provenance. It means the restriction on which object a pointer value may range over. The compiler is taking z as having a provenance only over y->c, whereas x has provenance over the entire 9-byte allocation.
The current C Standard does not specify provenance very well. The rules such as pointer subtraction may only occur between two pointers to the same array object is an example of a provenance rule. Another provenance rule is the one that applies to the code we are discussing, C 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
The justification for bounds-checking of strcpy, memcpy always comes back to this rule - those functions are defined to behave as if they were a series of character assignments from a base pointer that's incremented to get to the next character, and the increment of a pointer is covered by (P)+1 as discussed in this rule.
Note that the term "the array object" may apply to an object that wasn't declared as an array. This is spelled out in 6.5.6/7:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
The big question here is: what is "the array object"? In this code, is it y->c, *y, or the actual 9-byte object returned by malloc?
Crucially, the standard sheds no light whatsoever on this matter. Whenever we have objects with subobjects, the standard does not say whether 6.5.6/8 is referring to the object or the subobject.
A further complicating factor is that the standard does not provide a definition for "array", nor for "array object". But to cut a long story short, the object allocated by malloc is described as "an array" in various places in the standard, so it does seem that the 9-byte object here is a valid candidate for "the array object". (In fact this is the only such candidate for the case of using x to iterate over the 9-byte allocation, which I think everyone would agree is legal).
Note: this section is very speculative and I attempt to provide an argument as to why the solution chosen by the compilers here is not self-consistent
An argument could be made that &y->c means the provenance is the int64_t subobject. But this immediately leads to difficulty. For example, does y have the provenance of *y? If so, (char *)y should still have the provenance *y, but then this contradicts the rule of 6.3.2.3/7 that casting a pointer to another type and back should return the original pointer (as long as alignment is not violated).
Another thing it doesn't cover is overlapping provenance. Can a pointer compare unequal to a pointer of the same value but a smaller provenance (which is a subset of the larger provenance)?
Further, if we apply that same principle to the case where the subobject is an array:
char arr[2][2];
char *r = (char *)arr;
++r; ++r; ++r; // undefined behavior - exceeds bounds of arr[0]
arr is defined as meaning &arr[0] in this context, so if the provenance of &X is X, then r is actually bounded to just the first row of the array -- perhaps a surprising result.
It would be possible to say that char *r = (char *)arr; leads to UB here, but char *r = (char *)&arr; does not. In fact I used to promote this view in my posts many years ago. But I no longer do: in my experience of trying to defend this position, it just can't be made self-consistent, there are too many problem scenarios. And even if it could be made self-consistent, the fact remains that the standard doesn't specify it. At best, this view should have the status of a proposal.
To finish up, I would recommend reading N2090: Clarifying Pointer Provenance (Draft Defect Report or Proposal for C2x).
Their proposal is that provenance always applies to an allocation. This renders moot all the intricacies of objects and subobjects. There are no sub-allocations. In this proposal, all of x, z, and w are identical and may be used to range over the whole 9-byte allocation. IMHO the simplicity of this is appealing, compared to what was discussed in my previous section.
This is all because of -D_FORTIFY_SOURCE=2 intentionally crashing on what it decides is unsafe.
Some distros build gcc with -D_FORTIFY_SOURCE=2 enabled by default. Some don't. This explains all the differences between different compilers. Probably the ones that don't crash normally will if you build your code with -O3 -D_FORTIFY_SOURCE=2.
Why does it fail only if optimization is on?
_FORTIFY_SOURCE requires compiling with optimization (-O) to keep track of object sizes through pointer casts / assignments. See the slides from this talk for more about _FORTIFY_SOURCE.
Why does strcpy() fail? How can it?
gcc calls __memcpy_chk for strcpy only with -D_FORTIFY_SOURCE=2. It passes 8 as the size of the target object, because that's what it thinks you mean / what it can figure out from the source code you gave it. Same deal for strncpy calling __strncpy_chk.
__memcpy_chk aborts on purpose. _FORTIFY_SOURCE may be going beyond things that are UB in C and disallowing things that look potentially dangerous. This gives it license to decide that your code is unsafe. (As others have pointed out, a flexible array member as the last member of your struct, and/or a union with a flexible-array member, is how you should express what you're doing in C.)
gcc even warns that the check will always fail:
In function 'strcpy',
inlined from 'main' at <source>:18:9:
/usr/include/x86_64-linux-gnu/bits/string3.h:110:10: warning: call to __builtin___memcpy_chk will always overflow destination buffer
return __builtin___strcpy_chk (__dest, __src, __bos (__dest));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(from gcc7.2 -O3 -Wall on the Godbolt compiler explorer).
Why doesn't memcpy() fail regardless of -O level?
IDK.
gcc fully inlines it into just an 8B load/store plus a 1B load/store. (Seems like a missed optimization; it should know that malloc didn't modify the data on the stack, so it could just store it from immediates again instead of reloading, or better, keep the 8B value in a register.)
Why make things complicated? Overcomplicating things like this just leaves more room for undefined behaviour, as in this part:
memcpy((char*)&p->c, str, strlen(str)+1);
puts((char*)&p->c);
warning: passing argument 1 of 'puts' from incompatible pointer type [-Wincompatible-pointer-types]
puts(&p->c);
You're clearly ending up in an unallocated memory area, or somewhere writable if you're lucky...
Optimizing or not may change the values of the addresses, and it may work (since the addresses happen to match), or not. You just cannot do what you want to do (it's basically lying to the compiler).
I would:
allocate just what's needed for the struct, don't take the length of the string inside into account, it's useless
don't use gets as it's unsafe and obsolescent
use strdup instead of the bug-prone memcpy code you're using since you're handling strings. strdup won't forget to allocate the nul-terminator, and will set it in the target for you.
don't forget to free the duplicated string
read the warnings; puts(&p->c) is undefined behaviour:
test.c:19:10: warning: passing argument 1 of 'puts' from incompatible pointer type [-Wincompatible-pointer-types]
puts(&p->c);
My proposal
int main(){
pack *p = malloc(sizeof(pack));
char str[1024];
fgets(str,sizeof(str),stdin);
p->c = strdup(str);
puts(p->c);
free(p->c);
free(p);
return 0;
}
Your pointer p->c is the cause of the crash.
First, allocate the struct with the size of *p plus the size of "unsigned long long".
Second, initialize the pointer p->c with the required buffer size.
Then do the copy: strcpy(p->c, str);
Finally, free(p->c) first, and then free(p).
I think that was it.
[EDIT]
I'll insist.
The cause of the error is that your structure only reserves space for the pointer, but the pointer itself is never made to point at memory that can contain the data to be copied. Take a look:
int main()
{
pack *p;
char str[1024];
gets(str);
size_t len_struc = sizeof(*p) + sizeof(unsigned long long);
p = malloc(len_struc);
p->c = malloc(strlen(str) + 1); // room for the terminating '\0' as well
strcpy(p->c, str); // This does not crash!
puts(p->c);
free(p->c);
free(p);
return 0;
}
[EDIT2]
This is not a traditional way to store data but this works:
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = sizeof(pack) + (strlen(str) + 1);
p = malloc(len);
// Copy the string into the extra space just past the struct
strcpy((char*)p + sizeof(pack), str);
free(p);

implementation of memcpy function

I looked at
http://www.opensource.apple.com/source/xnu/xnu-2050.24.15/libsyscall/wrappers/memcpy.c
and didn't understand the following :
1-
inside
void * memcpy(void *dst0, const void *src0, size_t length) {
char *dst = dst0;
const char *src = src0;
line:
if ((unsigned long)dst < (unsigned long)src) {
How can we cast dst to an unsigned long? It's a pointer!
2- Why do they sometimes prefer forward copying and sometimes backward??
You are right, this implementation is non-portable, because it is assuming that a pointer is going to fit in unsigned long. This is not guaranteed by the standard.
A proper type for this implementation would have been uintptr_t, which is guaranteed to fit a pointer.
When comparing a void* and char* pointer, the compiler will give a warning (gcc -Wall):
warning: comparison of distinct pointer types lacks a cast
I imagine that the developer decided to "make the warning go away" - but the correct way to do this is with a void* cast (which is portable):
if((void*) dst < (void*) src) {
As for the second point - as was pointed out, you have to take care of overlapping memory locations. Imagine the following 8 characters in successive memory locations:
abcdefgh
Now we want to copy this "3 to the right". Starting with a, we would get (with left-to-right copy):
abcdefgh
abcaefgh
^
abcabfgh
^^
abcabcgh
^^^
etc until you end up with
abcabcabcab
^^^^^^^^
When we wanted to get
abcabcdefgh
Starting from the other end (copying backwards), we don't overwrite things we still have to copy. When the destination is to the left of the source instead, you have to copy in the opposite direction (forwards). And when there is no overlap between source and destination, it doesn't matter which way you do it.
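A sketch of that idea (a toy version, not the Apple implementation): pick the direction based on which region starts first, accepting that comparing unrelated pointers is not strictly portable; the uintptr_t cast makes the flat-address-space assumption explicit.

#include <stddef.h>
#include <stdint.h>

/* Toy overlap-aware copy in the spirit of memmove(): copy forward when the
 * destination starts before the source, backward otherwise, so overlapping
 * bytes are never read after they have been overwritten. */
void *copy_any(void *dst0, const void *src0, size_t n) {
    unsigned char *dst = dst0;
    const unsigned char *src = src0;
    if ((uintptr_t)dst < (uintptr_t)src) {
        for (size_t i = 0; i < n; i++)      /* forward copy */
            dst[i] = src[i];
    } else {
        for (size_t i = n; i > 0; i--)      /* backward copy */
            dst[i - 1] = src[i - 1];
    }
    return dst0;
}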

Is this a correct and portable way of checking if 2 c-strings overlap in memory?

Might not be the most efficient way, but is it correct and portable?
int are_overlapping(const char *a, const char *b) {
return (a + strlen(a) == b + strlen(b));
}
To clarify: what I'm looking for is overlap in memory, not in the actual content. For example:
const char a[] = "string";
const char b[] = "another string";
are_overlapping(a, b); // should return 0
are_overlapping(a, a + 3); // should return 1
Yes, your code is correct. If two strings end at the same place, they by definition overlap - they share the same null terminator. Either both strings are identical, or one is a (trailing) substring of the other.
Everything about your program is perfectly well-defined behaviour, so assuming standards-compliant compilers, it should be perfectly portable.
The relevant bit in the standard is from 6.5.9 Equality operators (emphasis mine):
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Thinking about zdan's comments on my previous post (which will probably shortly be deleted), I've come to the conclusion that checking endpoints is sufficient.
If there's any overlap, the null terminator will make the two strings not be distinct. Let's look at some possibilities.
If you start with
a 0x10000000 "Hello" and somehow add
b 0x10000004 "World",
you'll have a single word, HellWorld, since writing "World" would overwrite the final 'o' and the '\0' of "Hello". They would end at the same endpoint.
If somehow you write to the same starting point:
a 0x10000000 "Hello" and
b 0x10000000 "Jupiter"
You'll have the word Jupiter, and have the same endpoint.
Is there a case where you can have the same endpoint and not have overlap? Kind of.
a = 0x1000000 "Four" and
b = 0x1000004 "".
That will give an overlap as well.
I can't think of any time you'll have overlap where you won't have matching endpoints - assuming that you're writing null terminated strings into memory.
So, the short answer: Yes, your check is sufficient.
It is probably not relevant to your use case, as your question is specifically about C-strings, but the code will not work in the case that the data has embedded NUL bytes in the strings.
char a[] = "abcd\0ABCD";
char *b = a + 5;
Other than that, your solution is straight forward and correct. It works since you are only using == for the pointer comparison, and according to the standard (from C11 6.5.9/6)
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
However, the relational operators are more strict (from C11 6.5.8/5):
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
The last sentence is the kicker.
Some have taken exception to the fact that your code may compute the length of the overlap twice, and have attempted to take precautions to avoid it. However, the efficiency of reducing that compute is countered with an extra pointer comparison per iteration, or involves undefined or implementation defined behavior. Assuming you want a portable and compliant solution, the actual average gain is likely nil, and not worth the effort.
This solution still has the same worst-case performance, but is optimized for hits -- you don't have to parse both strings.
char * temp_a = a;
char * temp_b = b;
while (*temp_a != '\0') {
if (temp_a++ == b)
return 1;
}
// check for b being an empty string
if (temp_a == b) return 1;
/* but if b was larger, we aren't done, so you have to try from b now */
while (*temp_b != '\0') {
if (temp_b++ == a)
return 1;
}
if (temp_b == a) return 1; /* a may start exactly at b's terminator */
return 0;
Apparently, only pointer equality (not relational comparison) is portable in C, so the following solutions aren't portable -- everything below is from before I knew that.
Your solution is valid, but why calculate strlen on the second string? You know the start and end point of one string; just see if the other is between them (inclusive). That saves you a pass through the second string -- O(M+N) down to O(M).
const char *lower_addr_string = a < b ? a : b;
const char *higher_addr_string = a > b ? a : b;
size_t length = strlen(lower_addr_string);
return higher_addr_string >= lower_addr_string && higher_addr_string <= lower_addr_string + length;
alternatively, do the string parsing yourself..
const char *lower_addr_string = a < b ? a : b;
const char *higher_addr_string = a > b ? a : b;
while(*lower_addr_string != '\0') {
if (lower_addr_string == higher_addr_string)
return 1;
++lower_addr_string;
}
/* check the last character */
if (lower_addr_string == higher_addr_string)
return 1;
return 0;
Yes, your check is correct, but it is certainly not the most efficient (if by "efficiency" you mean the computational efficiency). The obvious intuitive inefficiency in your implementation is based on the fact that when the strings actually overlap, the strlen calls will iterate over their common portion twice.
For the sake of formal efficiency, one might use a slightly different approach
int are_overlapping(const char *a, const char *b)
{
if (a > b) /* or (uintptr_t) a > (uintptr_t) b, see note below! */
{
const char *t = a;
a = b;
b = t;
}
while (a != b && *a != '\0')
++a;
return a == b;
}
An important note about this version is that it performs relational comparison of two pointers that are not guaranteed to point to the same array, which formally leads to undefined behavior. It will work in practice on a system with flat memory model, but might draw criticism from a pedantic code reviewer. To formally work around this issue one might convert the pointers to uintptr_t before performing relational comparisons. That way the undefined behavior gets converted to implementation-defined behavior with proper semantics for our purposes in most (if not all) traditional implementations with flat memory model.
This approach is free from the "double counting" problem: it only analyzes the non-overlapping portion of the string that is located "earlier" in memory. Of course, in practice the benefits of this approach might prove to be non-existent. It will depend on both the quality of your strlen implementation and on the properties of the actual input.
For example, in this situation
const char *str = "Very very very long string, say 64K characters long......";
are_overlapped(str, str + 1);
my version will detect the overlap much faster than yours. My version will do it in 1 iteration of the loop, while yours will spend 2 * 64K iterations (assuming a naive implementation of strlen).
If you decide to dive into the realm of questionable pointer comparisons, the above idea can also be reimplemented as
int are_overlapping(const char *a, const char *b)
{
if (a > b)
{
const char *t = a;
a = b;
b = t;
}
return b <= a + strlen(a);
}
This implementation does not perform an extra pointer comparison on each iteration. The price we pay for that is that it always iterates to the end of one of the strings instead of terminating early. Yet it is still more efficient than your implementation, since it calls strlen only once.
