I'm interested in the technique used by Sean Barrett to make a dynamic array in C for any type. Comments in the current version claim the code is safe to use with strict-aliasing optimizations:
https://github.com/nothings/stb/blob/master/stb_ds.h#L332
You use it like:
int *array = NULL;
arrput(array, 2);
arrput(array, 3);
The allocation it does holds both the array data + a header struct:
typedef struct
{
size_t length;
size_t capacity;
void * hash_table;
ptrdiff_t temp;
} stbds_array_header;
The macros/functions all take a void* to the array and access the header by casting the void* array and moving back one:
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
I'm sure Sean Barrett is far more knowledgeable than the average programmer. I'm just having trouble following how this type of code is not undefined behavior because of the strict aliasing rules in modern C. If this does avoid problems, I'd love to understand why it does so that I can incorporate it myself (maybe with a few fewer macros).
Let's follow the expansions of arrput in https://github.com/nothings/stb/blob/master/stb_ds.h :
#define STBDS_REALLOC(c,p,s) realloc(p,s)
#define arrput stbds_arrput
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
#define stbds_arrput(a,v) (stbds_arrmaybegrow(a,1), (a)[stbds_header(a)->length++] = (v))
#define stbds_arrmaybegrow(a,n) ((!(a) || stbds_header(a)->length + (n) > stbds_header(a)->capacity) \
? (stbds_arrgrow(a,n,0),0) : 0)
#define stbds_arrgrow(a,b,c) ((a) = stbds_arrgrowf_wrapper((a), sizeof *(a), (b), (c)))
#define stbds_arrgrowf_wrapper stbds_arrgrowf
void *stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap)
{
...
b = STBDS_REALLOC(NULL, (a) ? stbds_header(a) : 0, elemsize * min_cap + sizeof(stbds_array_header));
//if (num_prev < 65536) prev_allocs[num_prev++] = (int *) (char *) b;
b = (char *) b + sizeof(stbds_array_header);
if (a == NULL) {
stbds_header(b)->length = 0;
stbds_header(b)->hash_table = 0;
stbds_header(b)->temp = 0;
} else {
STBDS_STATS(++stbds_array_grow);
}
stbds_header(b)->capacity = min_cap;
return b;
}
how this type of code is not undefined behavior because of the strict aliasing
Strict aliasing is about accessing data that has different effective type than data stored there. I would argue that the data stored in the memory region pointed to by stbds_header(array) has the effective type of the stbds_array_header structure, so accessing it is fine. The structure members are allocated by realloc and initialized one by one inside stbds_arrgrowf by stbds_header(b)->length = 0; lines.
how this type of code is not undefined behavior
I think the pointer arithmetic is fine. You can say that the result of realloc points to an array of one stbds_array_header structure. In other words, the first stbds_header(b)->length = store inside the stbds_arrgrowf function makes the memory returned by realloc "become" an array of one stbds_array_header element, because "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access" (from https://port70.net/~nsz/c/c11/n1570.html#6.5p6 ).
int *array is assigned inside stbds_arrgrow to point to "one past the last element of an array" of one stbds_array_header structure. (This is also the place where the int array starts.) ((stbds_array_header *) (array) - 1) calculates the address of the last array element by subtracting one from "one past the last element of an array". I would rewrite it as (char *)(void *)t - sizeof(stbds_array_header) anyway (converted back to stbds_array_header * before use), as (stbds_array_header *) (array) sounds like it could generate a compiler warning.
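The layout described above can be sketched with plain functions instead of macros. This is a minimal illustration, not the stb_ds.h implementation: the names vec_header, vec_hdr, vec_grow, vec_push_int and vec_free are hypothetical, the header is cut down to two fields, and the growth policy (start at 4, then double) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header, modeled loosely on stbds_array_header. */
typedef struct {
    size_t length;
    size_t capacity;
} vec_header;

/* Step back from the element pointer to the header stored just before it. */
static vec_header *vec_hdr(void *data)
{
    assert(data != NULL);
    return (vec_header *)data - 1;
}

/* Grow (or create) the allocation so it can hold min_cap elements. */
static void *vec_grow(void *data, size_t elem_size, size_t min_cap)
{
    vec_header *old = data ? vec_hdr(data) : NULL;
    vec_header *h = realloc(old, sizeof *h + elem_size * min_cap);
    if (h == NULL)
        return NULL;        /* the old block, if any, is still valid */
    if (data == NULL)
        h->length = 0;      /* this store gives the bytes the header's effective type */
    h->capacity = min_cap;
    return h + 1;           /* one past the header: start of the element storage */
}

/* Append one int, growing when the array is full. Returns 0 on success. */
static int vec_push_int(int **arr, int value)
{
    if (*arr == NULL || vec_hdr(*arr)->length == vec_hdr(*arr)->capacity) {
        size_t cap = *arr ? vec_hdr(*arr)->capacity * 2 : 4;
        int *grown = vec_grow(*arr, sizeof **arr, cap);
        if (grown == NULL)
            return -1;
        *arr = grown;
    }
    (*arr)[vec_hdr(*arr)->length++] = value;
    return 0;
}

/* Release the whole allocation, header included. */
static void vec_free(void *data)
{
    if (data != NULL)
        free(vec_hdr(data));
}
```

A caller would start with int *a = NULL; and call vec_push_int(&a, 2); the header is only ever read and written through vec_header lvalues, and the elements only through int lvalues.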
Assigning (char *)result_of_realloc + sizeof(stbds_array_header) to int *array in the expansion of stbds_arrgrow could, in theory, produce a pointer that is not properly aligned for the array's element type, which would run afoul of "If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined" (from https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7 ). This is very theoretical: the stbds_array_header structure has size_t, void * and ptrdiff_t members, so on any normal architecture the memory right after it is suitably aligned for int (or any other normal type).
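The alignment concern can be checked rather than assumed. The sketch below copies the header layout quoted in the question under the hypothetical name array_header_copy; the helper header_preserves_alignment is likewise made up for illustration. It relies on realloc returning memory aligned for max_align_t, and it needs C11 for _Alignof.

```c
#include <assert.h>
#include <stddef.h>

/* Copy of the header layout quoted from stb_ds.h in the question. */
typedef struct {
    size_t length;
    size_t capacity;
    void *hash_table;
    ptrdiff_t temp;
} array_header_copy;

/* Returns 1 if element storage placed directly after the header is
   aligned for a type whose alignment requirement is `align`, else 0.
   Alignments stricter than max_align_t are rejected because realloc
   does not promise them for the start of the block. */
static int header_preserves_alignment(size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return align <= _Alignof(max_align_t)
        && sizeof(array_header_copy) % align == 0;
}
```

For example, header_preserves_alignment(_Alignof(int)) reports whether an int array can start right after the header on the platform at hand.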
I have only inspected the code in the expansions of arrput. This is 2000 lines of code; there may be other undefined behavior elsewhere.
int ar[3][3]={{1,2,3},{4,5,6},{7,8,9}};
statement1: int k=(int *)((int *)(ar+1)+2);
statement2: int l=*(*(ar+1)+2);
statement3: int *p = (int *)ar +1;
Statement1 does not compile.
Statement2 and statement3 compile.
Now, I cannot make out what difference it makes if I put (int *) instead of *, given that the array is of integer type.
You are confusing the dereference operator * with the cast operation (int *), and your very first line should have rung a bell:
int k = (int *)bar;
You are trying to assign an address (a pointer to int) to an int variable.
The second is OK because you use * twice to get a value out of your two-dimensional array.
The third is also OK because your variable int *p has the right type to hold an address (and you dereference just "one dimension").
I hope that is clear enough; you can also have a look at the Wikipedia article about the dereference operator.
The confusion appears in this line, where pointers are used instead of array indexes:
statement1: int k=(int *)((int *)(ar+1)+2);
It appears the intended meaning is ar[1][2]; however, that is not what the expression yields. The equivalent pointer representation of the ar[1][2] index would be:
statement1: int k = *(*(ar + 1) + 2); // equivalent to k = ar[1][2]
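To make the difference between the cast and the dereference concrete, here is a small sketch. The function names element_via_pointers and element_via_cast are hypothetical; both take the 3x3 array from the question and return the same element as ar[row][col].

```c
#include <assert.h>

/* Returns ar[row][col] for a 3x3 matrix using only pointer arithmetic. */
static int element_via_pointers(int (*ar)[3], int row, int col)
{
    assert(row >= 0 && row < 3 && col >= 0 && col < 3);
    /* ar + row    : pointer to row `row`, type int (*)[3]
       *(ar + row) : that row, which decays to int * (its first element)
       ... + col   : pointer to element `col` of the row
       outer *     : the int stored at that address */
    return *(*(ar + row) + col);
}

/* Same element, with a cast taking the place of the inner dereference.
   The outer * is still needed to turn the address into a value. */
static int element_via_cast(int (*ar)[3], int row, int col)
{
    assert(row >= 0 && row < 3 && col >= 0 && col < 3);
    return *((int *)(ar + row) + col);
}
```

The cast only changes the type of the address; without the outer * the expression is still a pointer, which is why assigning it to an int is rejected.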
I encountered some code in a tutorial about buffer overflows.
It's a program that exploits a simple program that is vulnerable to a buffer overflow (if some stack protection mechanisms are turned off).
My question is: what is the for loop doing? I mean the line within the for loop:
*(void **)(buf + i) = addr;
It's a bit of a strange syntax that I haven't seen before, or maybe I have seen it but it just confuses me.
The idea of the program is that the buf is passed as argument to the vulnerable program and through a strcpy it will overwrite the return address on the stack such that it will run the shellcode that is passed in an environment parameter.
Thanks!
The full code:
int main(int argc, char **argv) {
void *addr = (char *) 0xc0000000 - 4 - (strlen(VULN) + 1) - (strlen(&shellcode) + 1);
char buf[768];
size_t i;
for (i = 0; i < sizeof(buf); i += sizeof(void *)) {
*(void **)(buf + i) = addr;
}
char *params[] = { VULN, buf, NULL };
char *env[] = { &shellcode, NULL };
execve(VULN, params, env);
perror("execve");
return -1;
}
C has a kind of Treehorn type system. For any object x of type T, you can pretend it's an object of a different type. To do so, you cast the address of the object. So, in steps:
T x; is an object of type T.
&x is the address of the object, it's of type T * – "pointer to T".
Now pretend this is a pointer to something else: (U *)(&x) – a "pointer to U", but it's the same value.
If we dereference that, we treat the object x as though it were a U: *(U *)(&x)
Now apply all this to T = char, x = buf[i] and U = void * in your code. Note that &buf[i] is identical to buf + i. Also note that i is incremented in strides of sizeof(void *) so that each round of the loop doesn't step on the memory touched by the previous rounds.
A word of warning: it is generally not allowed to treat one object as though it were one of a different type; this is undefined behavior. There are only some exceptions; e.g. you can treat an int as though it were an unsigned int, and you can treat any object x as though it were a char[sizeof x]. (None of these are the case in your code, which is not well-formed.)
First, it calculates a value which will remain constant throughout the execution of the for loop:
0xc0000000 - 4 - (strlen(VULN) + 1) - (strlen(&shellcode) + 1)
Then, inside the for loop, it writes this constant value into every "4-byte entry" in the buf array:
buf[0...3] = the constant value
buf[4...7] = the constant value
buf[8...11] = the constant value
...
buf[764...767] = the constant value
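The same fill pattern can be written without treating the char buffer as an array of pointers. The sketch below is an alternative rather than the tutorial's code: the helper name fill_with_pointer is hypothetical, and it uses memcpy to copy the bytes of the pointer value into each slot, so no char storage is accessed through a void * lvalue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills buf with back-to-back copies of the pointer value `addr`.
   Returns the number of copies written; trailing bytes that cannot
   hold a whole pointer are left untouched. */
static size_t fill_with_pointer(char *buf, size_t buf_size, void *addr)
{
    size_t count = 0;
    assert(buf != NULL);
    for (size_t i = 0; i + sizeof addr <= buf_size; i += sizeof addr) {
        memcpy(buf + i, &addr, sizeof addr); /* copy the pointer's bytes */
        count++;
    }
    return count;
}
```

With a 768-byte buffer this writes 768 / sizeof(void *) slots, which matches the "4-byte entry" picture above only on a platform where pointers are 4 bytes wide.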
Can anyone explain the logic how to add a and b?
#include <stdio.h>
int main()
{
int a=30000, b=20, sum;
char *p;
p = (char *) a;
sum = (int)&p[b]; //adding a and b!
printf("%d",sum);
return 0;
}
The + is hidden here:
&p[b]
this expression is equivalent to
(p + b)
So we actually have:
(int) &p[b] == (int) &((char *) a)[b] == (int) ((char *) a + b) == a + b
Note that this technically invokes undefined behavior as (char *) a has to point to an object and pointer arithmetic outside an object or one past the object invokes undefined behavior.
C standard says that E1[E2] is equivalent to *((E1) + (E2)). Therefore:
&p[b] = &*((p) + (b)) = ((p) + (b)) = ((a) + (b)) = a + b
p[b] is the b-th element of the array p. It's like writing *(p + b).
Now, when adding & it'll be like writing: p + b * sizeof(char) which is p + b.
Now, you'll have (int)((char *) a + b) which is.. a + b.
But.. when you still have + on your keyboard, use it.
As @gerijeshchauhan clarified in the comments, * and & are inverse operations that cancel each other. So &*(p + b) is p + b.
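The identity &p[b] == p + b can be observed without any undefined behavior when p points into a real object. A minimal sketch follows; the function name offset_of_subscript is hypothetical, and the caller is assumed to pass a b that stays within the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes &p[b] lies past p for a char buffer.
   The caller must ensure b does not exceed the buffer length, so the
   arithmetic stays inside the object (or one past its end). */
static ptrdiff_t offset_of_subscript(char *p, size_t b)
{
    assert(p != NULL);
    assert(&p[b] == p + b); /* &*(p + b) is just p + b; nothing is read */
    return &p[b] - p;
}
```

The trick in the question is this same arithmetic, applied to a pointer that was manufactured from the integer a instead of one that points to an actual buffer.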
p is made a pointer to char
a is converted to a pointer to char, thus making p point to memory with address a
Then the subscript operator is used to get to an object at an offset of b beyond the address pointed to by p. b is 20, so p+20 corresponds to 30020. The address-of operator then yields the address of that object, and the cast converts that address back to int, giving the effect of a+b.
The below comments might be easier to follow:
#include <stdio.h>
int main()
{
int a=30000, b=20, sum;
char *p; //1. p is a pointer to char
p = (char *) a; //2. a is converted to a pointer to char and p points to memory with address a (30000)
sum = (int)&p[b]; //3. p[b] is the b-th (20-th) element from address of p. So the address of the result of that is equivalent to a+b
printf("%d",sum);
return 0;
}
char *p;
p is a pointer (to element with size 1 byte)
p=(char *)a;
now p points to memory with address a
sum= (int)&p[b];
The pointer p can be used like an array p[] whose start address in memory is a.
p[b] means the b-th element; the address of this element is a+b
[ start address a + (index b * element size of 1 byte) ]
&p[b] means the address of the element p[b], which is a+b
If you use a pointer to int instead (usually 4 bytes per element):
int* p
p = (int*)a;
your sum will be a+(4*b)
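The scaling by element size can be measured on a real array. In this sketch the helper name int_subscript_bytes is hypothetical; it reports the byte distance between &p[b] and p for an int pointer, which is b * sizeof(int) rather than b.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between &p[b] and p when p points to int.
   The caller must keep b within the array p points into. */
static size_t int_subscript_bytes(int *p, size_t b)
{
    assert(p != NULL);
    /* Viewing both addresses as char * makes the subtraction count bytes. */
    return (size_t)((char *)&p[b] - (char *)p);
}
```

On a platform with 4-byte int, an index of 20 gives 80 bytes, which is where the a+(4*b) above comes from.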
int a=30000, b=20, sum;
char *p; //1. p is a pointer to char
p = (char *) a;
a is of type int, and has the value 30000. The above assignment converts the value 30000 from int to char* and stores the result in p.
The semantics of converting integers to pointers are (partially) defined by the C standard. Quoting the N1570 draft, section 6.3.2.3 paragraph 5:
An integer may be converted to any pointer type. Except as previously
specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.
with a (non-normative) footnote:
The mapping functions for converting a pointer to an integer or an
integer to a pointer are intended to be consistent with the addressing
structure of the execution environment.
The standard makes no guarantees about the relative sizes of types int and char*; either could be bigger than the other, and the conversion could lose information. The result of this particular conversion is very unlikely to be a valid pointer value. If it's a trap representation, then the behavior of the assignment is undefined.
On a typical system you're likely to be using, char* is at least as big as int, and integer-to-pointer conversions probably just reinterpret the bits making up the integer's representation as the representation of a pointer value.
sum = (int)&p[b];
p[b] is by definition equivalent to *(p+b), where the + denotes pointer arithmetic. Since the pointer points to char, and a char is by definition 1 byte, the addition advances the pointed-to address by b bytes in memory (in this case 20).
But p is probably not a valid pointer, so any attempt to perform arithmetic on it, or even to access its value, has undefined behavior.
In practice, most C compilers generate code that doesn't perform extra checks. The emphasis is on fast execution of correct code, not on detection of incorrect code. So if the previous assignment to p set it to an address corresponding to the number 30000, then adding b, or 20, to that address will probably yield an address corresponding to the number 30020.
That address is the result of (p+b); now the [] operator implicitly applies the * operator to that address, giving you the object that that address points to -- conceptually, this is a char object stored at an address corresponding to the integer 30020.
We immediately apply the & operator to that object. There's a special-case rule that says applying & to the result of a [] operator is equivalent to just doing the pointer addition; see 6.5.3.2p3 in the above referenced standard draft.
So this:
&p[b]
is equivalent to:
p + b
which, as I said above, yields an address (of type char*) corresponding to the integer value 30020 -- assuming, of course, that integer-to-pointer conversions behave in a certain way and that the undefined behavior of constructing and accessing an invalid pointer value doesn't do anything surprising.
Finally, we use a cast operator to convert this address to type int. Conversion of a pointer value to an integer is also implementation-defined, and possibly undefined. Quoting 6.3.2.3p6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
It's not uncommon for a char* to be bigger than an int (for example, I'm typing this on a system with 32-bit int and 64-bit char*). But we're relatively safe from overflow in this case, because the char* value is the result of converting an in-range int value. There's no guarantee that converting a given value from int to char* and back to int will yield the original result, but it commonly works that way, at least for values that are in range.
So if a number of implementation-specific assumptions happen to be satisfied by the implementation on which the code happens to be running, then this code is likely to yield the same result as 30000 + 20.
Incidentally, I've worked on a system where this would have failed. The Cray T90 was a word-addressed machine, with hardware addresses pointing to 64-bit words; there was no hardware support for byte addressing. But char was 8 bits, so char* and void* pointers had to be constructed and manipulated in software. A char* pointer consisted of a 64-bit word pointer with a byte offset stored in the otherwise unused high-order 3 bits. Conversions between pointers and integers did not treat these high-order bits specially; they were simply copied. So ptr + 1 and (char*)(int)ptr + 1 could yield very different results.
But hey, you've managed to add two small integers without using the + operator, so there's that.
An alternative to the pointer arithmetic is to use bitops:
#include <stdio.h>
#include <string.h>
unsigned addtwo(unsigned one, unsigned two);
unsigned addtwo(unsigned one, unsigned two)
{
unsigned carry;
for( ;two; two = carry << 1) {
carry = one & two;
one ^= two;
}
return one;
}
int main(int argc, char **argv)
{
unsigned one, two, result;
if ( argc < 3 ) return 1; /* need two numeric arguments */
if ( sscanf(argv[1], "%u", &one ) < 1) return 0;
if ( sscanf(argv[2], "%u", &two ) < 1) return 0;
result = addtwo(one, two);
fprintf(stdout, "One:=%u Two=%u Result=%u\n", one, two, result );
return 0;
}
On a completely different note, perhaps what was being looked for was an understanding of how binary addition is done in hardware, with XOR, AND, and bit shifting. In other words, an algorithm something like this:
int add(int a, int b)
{ int partial_sum = a ^ b;
int carries = a & b;
if (carries)
return add(partial_sum, carries << 1);
else
return partial_sum;
}
Or an iterative equivalent (although, gcc, at least, recognizes the leaf function and optimizes the recursion into an iterative version anyway; probably other compilers would as well)....
Probably needs a little more study for the negative cases, but this at least works for positive numbers.
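One way to cover the negative cases is to do the bit manipulation in unsigned arithmetic, where shifting and wraparound are well defined, and map the result back to int at the end. This is a sketch under the assumption that unsigned has exactly one more value bit than int (true on the usual two's-complement platforms); the name add_bits is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Adds two ints with XOR/AND/shift only. The result wraps around on
   overflow, like unsigned addition. */
static int add_bits(int a, int b)
{
    unsigned x = (unsigned)a;
    unsigned y = (unsigned)b;

    assert(UINT_MAX / 2 == INT_MAX); /* assumption stated in the lead-in */

    while (y != 0) {
        unsigned carry = x & y; /* bit positions that generate a carry */
        x ^= y;                 /* sum of the bits, ignoring carries */
        y = carry << 1;         /* carries move one position to the left */
    }

    /* Map the unsigned bit pattern back to its two's-complement value
       without relying on an out-of-range conversion. */
    if (x <= (unsigned)INT_MAX)
        return (int)x;
    return -(int)(~x) - 1;
}
```

For example, add_bits(-5, 3) walks through the same carry loop as the positive case and ends with the bit pattern of -2.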
/*
by sch.
001010101 = 85
001000111 = 71
---------
010011100 = 156
*/
#include <stdio.h>
#define SET_N_BIT(i,sum) ((1 << (i)) | (sum))
int sum(int a, int b)
{
int t = 0;
int i = 0;
int ia = 0, ib = 0;
int sum = 0;
int mask = 0;
for(i = 0; i < sizeof(int) * 8; i++)
{
mask = 1 << i;
ia = a & mask;
ib = b & mask;
if(ia & ib)
if(t)
{
sum = SET_N_BIT(i,sum);
t = 1;
/*i(1) t=1*/
}
else
{
t = 1;
/*i(0) t=1*/
}
else if (ia | ib)
if(t)
{
t = 1;
/*i(0) t=1*/
}
else
{
sum = SET_N_BIT(i,sum);
t = 0;
/*i(1) t=0*/
}
else
if(t)
{
sum = SET_N_BIT(i,sum);
t = 0;
/*i(1) t=0*/
}
else
{
t = 0;
/*i(0) t=0*/
}
}
return sum;
}
int main()
{
int a = 85;
int b = 71;
int i = 0;
/* stop at end of input or on a malformed pair instead of looping forever */
while(scanf("%d %d", &a, &b) == 2)
{
printf("%d: %d + %d = %d\n", ++i, a, b, sum(a, b));
}
return 0;
}
I have a void *x that I want to treat as a char *, so that the ++ operator makes it point to the next byte and not the next 4-byte block.
However, when I do:
(char *) x -= byte_length;
The compiler complains:
Error, lvalue required as left value of assignment.
Where am I going wrong please?
Thanks.
I would do it like this:
x = (char*)x - byte_length;
Cast x to char*, then apply the offset, then assign back to x. Since void* is assignment compatible with all object pointer types, no further cast is needed.
(char *)x evaluates to a temporary with the same value as x but a different type. The compiler won't allow -= on a temporary. Do
x = (char *)x - byte_length;
The situation is analogous to the following:
short x = 0;
(long)x += 1; // invalid; (long)x is a temporary
x = (long)x + 1; // works
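The cast-then-assign pattern can be wrapped in a small helper. This is a sketch with a hypothetical name, step_back; it assumes the caller keeps the result inside the same object that x points into.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address byte_length bytes before x. The cast yields a
   char * value to do the arithmetic on; the result converts back to
   void * implicitly on return. */
static void *step_back(void *x, size_t byte_length)
{
    assert(x != NULL);
    return (char *)x - byte_length;
}
```

At the call site this reads x = step_back(x, byte_length);, which keeps x itself as the only thing being assigned to.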
Do something like this:
void* x;
char* cx = (char*)x;
cx -= byte_length;
x = cx;
(char*) x -= byte_length;
In this statement, (char*) x creates a temporary value with char* as type. You can only assign to variables, or more precisely, to references (lvalues, in your compiler's terminology).
One way to get an lvalue is to cast the address of x to char** and dereference that (note that this accesses a void* object through a char* lvalue, which the plain assignment x = (char*)x - byte_length; avoids):
*(char**) &x -= byte_length;
(char*)x is a temporary value that has no address in memory. You can't apply -- or -= to it, for the same reason that if you have int i, you can't do (i+5)++.
You should do x=(char*)x-byte_length; instead.
See also http://c-faq.com/ptrs/castincr.html