I was just reading some code and found that the person was using arr[-2] to access the 2nd element before arr[0], like so:
|a|b|c|d|e|f|g|
       ^------------ arr[0]
         ^---------- arr[1]
   ^---------------- arr[-2]
Is that allowed?
I know that arr[x] is the same as *(arr + x). So arr[-2] is *(arr - 2), which seems OK. What do you think?
That is correct. From C99 §6.5.2.1/2:
The definition of the subscript
operator [] is that E1[E2] is
identical to (*((E1)+(E2))).
There's no magic. It's a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it's pointing to a valid address.
This is only valid if arr is a pointer that points to the second element in an array or a later element. Otherwise, it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong:
int arr[10];
int x = arr[-2]; // invalid; out of range
But this would be okay:
int arr[10];
int* p = &arr[2];
int x = p[-2]; // valid: accesses arr[0]
It is, however, unusual to use a negative subscript.
Sounds fine to me. It would be a rare case that you would legitimately need it however.
What probably happened was that arr was pointing to the middle of the array, so arr[-2] pointed to something in the original array without going out of bounds.
I'm not sure how reliable this is, but I just read the following caveat about negative array indices on 64-bit systems (LP64 presumably): http://www.devx.com/tips/Tip/41349
The author seems to be saying that 32-bit int array indices with 64-bit addressing can result in bad address calculations unless the array index is explicitly promoted to 64 bits (e.g. via a ptrdiff_t cast). I have actually seen a bug of this nature with the PowerPC version of gcc 4.1.0, but I don't know whether it's a compiler bug (i.e. the code should work according to the C99 standard) or correct behaviour (i.e. the index needs a cast to 64 bits).
I know the question is answered, but I couldn't resist sharing this explanation.
I remember Principles of Compiler design: Let's assume a is an int array and size of int is 2, and the base address for a is 1000.
How will a[5] work ->
Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010
This explanation is also the reason why negative indexes in arrays work in C; i.e., if I access a[-5] it will give me:
Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990
It will return the object at location 990. So, by this logic, we can access negative indexes in arrays in C.
About why would someone want to use negative indexes, I have used them in two contexts:
Having a table of combinatorial numbers that tells you comb[1][-1] = 0; you can always check indexes before accessing the table, but this way the code looks cleaner and executes faster.
Putting a sentinel at the beginning of a table. For instance, you want to use something like
while (x < a[i]) i--;
but then you should also check that i is not negative.
Solution: make it so that a[-1] is -DBL_MAX, so that x<a[-1] will always be false.
#include <stdio.h>
int main() // negative index
{
int i = 1, a[5] = {10, 20, 30, 40, 50};
int* mid = &a[5]; //legal;address,not element there
for(; i < 6; ++i)
printf(" mid[ %d ] = %d;", -i, mid[-i]);
}
I would like to share an example:
GNU C++ library basic_string.h
[Note: as someone pointed out, this is a C++ example, so it may not fit a question about C. Below I have written C code that uses the same concept. At least the GNU gcc compiler does not complain about it.]
It uses [-1] to move the pointer back from the user's string to the management information block, because the memory is allocated once with enough room for both.
The source comment says:
"
* This approach has the enormous advantage that a string object
* requires only one allocation. All the ugliness is confined
* within a single %pair of inline functions, which each compile to
* a single @a add instruction: _Rep::_M_data(), and
* string::_M_rep(); and the allocation function which gets a
* block of raw bytes and with room enough and constructs a _Rep
* object at the front.
"
Source code:
https://gcc.gnu.org/onlinedocs/gcc-10.3.0/libstdc++/api/a00332_source.html
struct _Rep_base
{
size_type _M_length;
size_type _M_capacity;
_Atomic_word _M_refcount;
};
struct _Rep : _Rep_base
{
...
}
_Rep*
_M_rep() const _GLIBCXX_NOEXCEPT
{ return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }
It explained:
* A string looks like this:
*
* @code
* [_Rep]
* _M_length
* [basic_string<char_type>] _M_capacity
* _M_dataplus _M_refcount
* _M_p ----------------> unnamed array of char_type
* @endcode
*
* Where the _M_p points to the first character in the string, and
* you cast it to a pointer-to-_Rep and subtract 1 to get a
* pointer to the header.
*
* This approach has the enormous advantage that a string object
* requires only one allocation. All the ugliness is confined
* within a single %pair of inline functions, which each compile to
* a single @a add instruction: _Rep::_M_data(), and
* string::_M_rep(); and the allocation function which gets a
* block of raw bytes and with room enough and constructs a _Rep
* object at the front.
*
* The reason you want _M_data pointing to the character %array and
* not the _Rep is so that the debugger can see the string
* contents. (Probably we should add a non-inline member to get
* the _Rep for the debugger to use, so users can check the actual
* string length.)
*
* Note that the _Rep object is a POD so that you can have a
* static <em>empty string</em> _Rep object already @a constructed before
* static constructors have run. The reference-count encoding is
* chosen so that a 0 indicates one reference, so you never try to
* destroy the empty-string _Rep object.
*
* All but the last paragraph is considered pretty conventional
* for a C++ string implementation.
// use the concept before, to write a sample C code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct HEAD {
int f1;
int f2;
}S_HEAD;
int main(int argc, char* argv[]) {
int sz = sizeof(S_HEAD) + 20;
S_HEAD* ha = (S_HEAD*)malloc(sz);
if (ha == NULL)
return -1;
// %p with a void* argument is the portable way to print a pointer
printf("&ha=%p\n", (void*)ha);
memset(ha, 0, sz);
ha[0].f1 = 100;
ha[0].f2 = 200;
// move to user data, can be converted to any type
ha++;
printf("&ha=%p\n", (void*)ha);
*(int*)ha = 399;
printf("head.f1=%i head.f2=%i user data=%i\n", ha[-1].f1, ha[-1].f2, *(int*)ha);
--ha;
printf("&ha=%p\n", (void*)ha);
free(ha);
return 0;
}
$ gcc c1.c -o c1.o -w
(no warning)
$ ./c1.o
&ha=0x13ec010
&ha=0x13ec018
head.f1=100 head.f2=200 user data=399
&ha=0x13ec010
The library authors use this technique, so I hope the example is helpful.
Related
From Use the correct syntax when declaring a flexible array member it says that when malloc is used for a header and flexible data when data[1] is hacked into the struct,
This example has undefined behavior when accessing any element other
than the first element of the data array. (See the C Standard, 6.5.6.)
Consequently, the compiler can generate code that does not return the
expected value when accessing the second element of data.
I looked up the C Standard 6.5.6, and could not see how this would produce undefined behaviour. I've used a pattern that I'm comfortable with, where the header is implicitly followed by data, using the same sort of malloc,
#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h> /* printf */
#include <string.h> /* strlen memcpy */
struct Array {
size_t length;
char *array;
}; /* +(length + 1) char */
static struct Array *Array(const char *const str) {
struct Array *a;
size_t length;
length = strlen(str);
if(!(a = malloc(sizeof *a + length + 1))) return 0;
a->length = length;
a->array = (char *)(a + 1); /* UB? */
memcpy(a->array, str, length + 1);
return a;
}
/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
sprintf(*s, "<%.*s>", n, a->array);
}
int main(void) {
struct Array *a = 0, *b = 0;
int is_done = 0;
do { /* Try. */
char s[12], t[12];
if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
Array_to_string(a, &s);
Array_to_string(b, &t);
printf("%s %s\n", s, t);
is_done = 1;
} while(0); if(!is_done) {
perror(":(");
} {
free(a);
free(b);
}
return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}
Prints,
<Foo> <To be or >
The compliant solution uses C99 flexible array members. The page also says,
Failing to use the correct syntax when declaring a flexible array
member can result in undefined behavior, although the incorrect syntax
will work on most implementations.
Technically, does this C90 code produce undefined behaviour, too? And if not, what is the difference? (Or the Carnegie Mellon Wiki is incorrect?) What is the factor on the implementations this will not work on?
This should be well defined:
a->array = (char *)(a + 1);
Because you create a pointer to one element past the end of an array of size 1 but do not dereference it. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.
This only works however because you're using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose size is greater than 1, you could have alignment issues.
For example, if you compiled a program for ARM with 32 bit pointers and you had this:
struct Array {
    int length;
    uint64_t *a;
};
...
struct Array *a = malloc(sizeof *a + (length * sizeof(uint64_t)));
a->length = length;
a->a = (uint64_t *)(a + 1);       // misaligned pointer
a->a[0] = 0x1111222233334444ULL;  // misaligned write
Your program would crash due to a misaligned write. So in general you shouldn't depend on this. Best to stick with a flexible array member which the standard guarantees will work.
As an adjunct to @dbush's good answer, a way to get around alignment woes is to use a union. This ensures &p[1] is properly aligned for uint64_t* (see note 1). sizeof *p includes any needed padding, unlike sizeof *a.
union {
struct Array header;
uint64_t dummy;
} *p;
p = malloc(sizeof *p + length * sizeof *p->header.array);
struct Array *a = (struct Array *)&p[0]; // or = &(p->header);
a->length = length;
a->array = (uint64_t*) &p[1]; // or &p[1].dummy;
Or go with C99 and flexible array member.
Note 1: as well as for struct Array.
Before the publication of C89, there were some implementations that would attempt to identify and trap upon out-of-bounds array accesses. Given something like:
struct foo {int a[4],b[4];} *p;
such implementations would squawk at an effort to access p->a[i] if i wasn't in the range 0 to 3. For programs that don't need to index the address of array-type lvalue p->a to access anything outside that array, being able to trap on such out-of-bounds accesses would be useful.
The authors of C89 were also almost certainly aware that it was common for programs to use the address of dummy-sized array at the end of a structure as a means of accessing storage beyond the structure. Using such techniques made it possible to do things that couldn't be done nearly as nicely otherwise, and part of the Spirit of C, according to the authors of the Standard, is "Don't prevent the programmer from doing what needs to be done".
Consequently, the authors of the Standard treated such accesses as something which implementations could support or not, at their leisure, presumably based upon what would be most useful for their customers. While it would often be helpful for implementations which would normally bounds-check accesses to structures in an array, to provide an option to omit such checks in cases where the last item of an indirectly-accessed structure is an array with one element (or, if they extend the language to waive a compile-time constraint, zero elements), people writing such implementations would presumably be capable of recognizing such things without the authors of the Standard having to tell them. The notion that "Undefined Behavior" was intended as some form of prohibition doesn't seem to have really taken hold until after the publication of C89's successor standard.
With regard to your example, having a pointer within a struct point to later storage in the same allocation should work, but with a couple of caveats:
If the allocation is passed to realloc, the pointer within it will become invalid.
The only real advantage of using a pointer versus a flexible array member is that it allows for the possibility of having it point somewhere else. That may be good if the only kind of "something else" will always be a constant object of static duration that never has to be freed, or perhaps if it is some other kind of object that won't have to be freed, but may be problematic if it could hold the only reference to something stored in a separate allocation.
Flexible array members were available as an extension in some compilers before C89 was written, and were officially added in C99. Any decent compiler should support them.
You can define struct Array as:
struct Array
{
size_t length;
char array[1];
}; /* +(length + 1) char */
then malloc( sizeof *a + length ). The "+1" element is in array[1] member. Fill structure with:
a->length = length;
strcpy( a->array, str );
I am trying to get a pointer to an array of integers to be temporarily remapped in a function later on, to save myself pointer math. I've tried to see if any other questions answered it, but I've been unable to reproduce the methods described here, here, and here.
Fundamentally, I just want to temporarily treat an integer group as a 3D array, to be sure that I don't mess up the pointer math. (I'm looking at this currently because the previous code had made inconsistent assignments to the memory.)
#include <stdlib.h>
#define GROUPCOUNT 16
#define SENSORCOUNT 6
#define SENSORDIM 3
int main()
{
int *groupdata = (int *)calloc(GROUPCOUNT * SENSORCOUNT * SENSORDIM,sizeof(int));
int sensordata[SENSORCOUNT*SENSORDIM];
sensordata[7] = 42; //assign some data
int (*group3d)[GROUPCOUNT][SENSORCOUNT][SENSORDIM] = groupdata; //<---Here is the problem
group3d[1][5][1] = sensordata[7]; //I want to do this
free(groupdata);
}
In the example above, I want to handle groupdata as group3d temporarily for assignments, and I just cannot seem to wrap my head around the casting. I currently have macros that do the pointer math to enforce the correct structure, but if it were all just in the code, it would be even better when I pass it off. Any suggestions would be greatly appreciated.
note: The 3D cast is to be used in a function way in the bowels of the program. The example is just a minimally viable program for me to try to sort out the code.
When group3d is defined with int (*group3d)[GROUPCOUNT][SENSORCOUNT][SENSORDIM], then *group3d is a three-dimensional array. That would let you use it with (*group3d)[1][5][1].
To use it with group3d[1][5][1], you need group3d to be a pointer to a two-dimensional array:
int (*group3d)[SENSORCOUNT][SENSORDIM] = (int (*)[SENSORCOUNT][SENSORDIM]) groupdata;
(There are some technical concerns about C semantics in aliasing an array of int as an array of array of array of int, but this is not a problem in common compilers with default settings. However, it would be preferable to always use the memory as an array of array of array of int, not as an array of int.)
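int l = 5, w = 10, h = 15;
int a = 1, b = 2, c = 3;   // example indices: a < l, b < w, c < h
int Data = 45;
int *k = malloc(l * w * h * sizeof *k);
// k[a][b][c] = Data; does not compile, because k is a plain int *
// index the flat block by hand instead (row-major order)
k[(a * w + b) * h + c] = Data;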
In the example above, I want to handle groupdata as group3d
temporarily for assignments, and I just cannot seem to wrap myself
around the casting.
One possible solution is to create a multidimensional array dynamically like this. This way you won't have to cast things or worry about the dimensions.
int (*group3d)[GROUPCOUNT][SENSORCOUNT][SENSORDIM] = calloc(1, sizeof(int [GROUPCOUNT][SENSORCOUNT][SENSORDIM]));
(*group3d)[1][5][1] = sensordata[7]; //I want to do this
/* Then you can print it like */
printf("%d\r\n", (*group3d)[1][5][1]);
In general, I'm trying to assign the values of first.a and first.b
to the arrays in struct secon.
typedef struct {
int a;
int b;
} firs;
//secon is my struct which contains dynamic array
//can i use int here ?
typedef struct {
int *aa;
int *bb;
} secon;
//pointer to secon intialised to NULL;
secon* sp=NULL;
int main()
{
firs first;
//plz assume 2 is coming from user ;
sp=malloc(sizeof(secon)*2);
//setting values
first.a=10;
first.b=11;
/* what i'm trying to do is assign values of first.a and first.b to my
dynamically created array*/
/* plz assume first.a and first.b are changing else where .. that means ,not
all arrays will have same values */
/* in general , i'm trying to allocate values of first.a and first.b
to a array's in struct second. */
for(int i=0; i<2; i++) {
*( &(sp->aa ) + (i*4) ) = &first.a;
*( &(sp->bb ) + (i*4) ) = &first.b;
}
for(int i=0; i<2; i++) {
printf("%d %d \n", *((sp->aa) + (i*4) ),*( (sp->bb) +(i*4) ) );
}
return 0;
}
MY output :
10 11
4196048 0
Problems with my code:
1. whats wrong with my code?
2. can i use int inside struct for dynamic array?
3. what are the alternatives?
4. why am i not getting correct answer?
Grigory Rechistov has done a really good job of untangling the code and you should probably accept his answer, but I want to emphasize one particular point.
In C pointer arithmetic, the offsets are always in units of the size of the type pointed to. Unless the type of the pointer is char* (or void*, on which GCC permits arithmetic as an extension), if you find yourself multiplying by the size of the type, you are almost certainly doing it wrong.
If I have
int a[10];
int *p = &(a[5]);
int *q = &(a[7]);
Then a[6] is the same as *(p + 1) not *(p + 1 * sizeof(int)). Likewise a[4] is *(p - 1)
Furthermore, you can subtract pointers when they both point to objects in the same array, and the same rule applies: the result is in units of the size of the type pointed to. q - p is 2, not 2 * sizeof(int). Replace the type int in the example with any other type and q - p will always be 2. For example:
struct Foo { int n ; char x[37] ; };
struct Foo a[10];
struct Foo *p = &(a[5]);
struct Foo *q = &(a[7]);
q - p is still 2. Incidentally, never be tempted to hard code a type's size anywhere. If you are tempted to malloc a struct like this:
struct Foo *r = malloc(41); // int size is 4 + 37 chars
Don't.
Firstly, sizeof(int) is not guaranteed to be 4. Secondly, even if it is, sizeof(struct Foo) is not guaranteed to be 41. Compilers often add padding to struct types to ensure that the members are properly aligned. In this case it is almost a certainty that the compiler will add 3 bytes (or 7 bytes) of padding to the end of struct Foo to ensure that, in arrays, the address of the n member is aligned to the size of an int. Always, always, always use sizeof.
It looks like your understanding of how pointer arithmetic works in C is wrong. There is also a problem with data layout assumptions. Finally, there are portability issues and a bad choice of syntax that complicates understanding.
I assume that with this expression: *( &(sp->aa ) + (i*4) ) you are trying to access the i-th item in the array by taking the address of the 0-th item and then adding a byte offset to it. This is wrong for three reasons:
You assume that after sp[0].aa comes sp[1].aa in memory, but you forget that there is sp[0].bb in between.
You assume that size of int is always 4 bytes, which is not true.
You assume that adding an integer to a pointer offsets it by that number of bytes, while in fact it offsets it by that number of elements of the pointed-to type. Here &(sp->aa) is an int**, so each step is sizeof(int*) bytes.
The second line of output that you see is random junk, because when i == 1 your constructions reference memory that is outside the limits of what was allocated for sp.
To access an i-th item of array referenced by a pointer, use []:
sp[0].aa is the same as (sp + 0)->aa, and sp[1].aa is the same as (sp + 1)->aa.
This is a complete mess. If you want to access an array of secons, use []
for(int i=0;i<2;i++)
{
sp[i].aa = &first.a; // Same pointer both times
sp[i].bb = &first.b;
}
You have two copies of pointers to the values in first, they point to the same value
for(int i=0;i<2;i++)
{
sp[i].aa = malloc(sizeof(int)); // new pointer each time
*sp[i].aa = first.a; // assigned with the current value
sp[i].bb = malloc(sizeof(int));
*sp[i].bb = first.b;
}
However the compiler is allowed to assume that first does not change, and it is allowed to re-order these expressions, so you are not assured to have different values in your secons
Either way, when you read back the values in your secons, you can still use []
for(int i=0;i<2;i++)
{
printf("%d %d \n", *sp[i].aa, *sp[i].bb);
}
I'm a bit overwhelmed at this line specifically:
Entry** newHeap = (Entry**)malloc(sizeof(Entry*) * newHeapLength);
in this code:
/**
* Expands the heap array of the given priority queue by
* replacing it with another that is double its size.
*
* @param pq the priority queue whose heap is to be doubled in size
* return 1 for successful expansion or an error code:
*/
int expandHeap (PriorityQueue *pq)
{
int returnCode = 1;
int newHeapLength = pq->heapLength * 2;
Entry** newHeap = (Entry**)malloc(sizeof(Entry*) * newHeapLength);
if (newHeap != NULL)
{
int index;
for (index = 0; index < pq->heapLength; index++)
{
newHeap[index] = pq->heap[index];
}
free(pq->heap);
pq->heap = newHeap;
pq->heapLength = newHeapLength;
}
else
{
returnCode = -1; // TODO: make meaningful error codes
}
return returnCode;
}
Entry** newHeap = (Entry**)malloc(sizeof(Entry*) * newHeapLength);
Reading that line from left to right: newHeap is a pointer to a pointer to an Entry, and malloc allocates a chunk of memory whose size is that of a pointer to an Entry times newHeapLength.
It just allocates an array for you, at run-time. Usually the size of array must be specified at compile-time. But here it is specified at run-time, it's newHeapLength. Each entry ("cell") in that array must be capable of storing a value of type Entry* in it. In C, arrays are contiguous, so the total size of the array, in bytes, is just a product of the two numbers: sizeof(Entry*) * newHeapLength. Now newHeap can be used to address this array in a usual manner: e.g. newHeap[8]. Of course, if 8 >= newHeapLength, this would be accessing past the allocated area, which is bad.
For array storing 10 ints, int ia[10];, the type of ia is int * (correction: almost. but we can pretend that it is, for the purposes of this explanation). Here, similarly, for array storing values of type Entry*, the type is (Entry*)*. Simple. :)
In C you do not have to cast the return value of malloc: it returns void*, which converts implicitly to any object pointer type (the cast is required only in C++). void* means that the size of the memory cell it points to is unknown. When we say that ia is of type int*, what we are actually saying is that the memory cell it points to has size sizeof(int). So when we write ia[3], it is translated into *(ia+3), which conceptually is *(int*)((char*)ia + 3*sizeof(int)). In other words, the compiler adds sizeof(int) three times to the starting address, thus "hopping over" three sizeof(int)-wide cells of memory. And for newHeap[8] it will "hop" over 8 sizeof(Entry*)-wide memory cells to get the address of the 9th entry in that array (counting from 1).
Also, see hashed array tree for an alternative to the geometric expansion, which is what that code is doing.
I recently submitted a small program for an assignment that had the following two functions and a main method inside of it:
/**
* Counts the number of bits it
* takes to represent an integer a
*/
int num_bits(int a)
{
int bitCount = 0;
while(a > 0)
{
bitCount++;
a = a >> 1; //shift to the right 1 bit
}
return bitCount;
}
/**
* Converts an integer into binary representation
* stored in an array
*/
void int2bin_array(int a, int *b)
{
//stopping point in search for bits
int upper_bound = num_bits(a);
int i;
for(i = 0; i < upper_bound; i++)
{
*(b+i) = (a >> i) & 1; //store the ith bit in b[i]
}
}
int main()
{
int numBits = num_bits(exponent);
int arr[numBits]; //<- QUESTION IS ABOUT THIS LINE
int2bin_array(exponent, arr);
//do some operations on array arr
}
When my instructor returned the program he wrote a comment about the line I marked above saying that since the value of numBits isn't known until run-time, initializing an array to size numBits is a dangerous operation because the compiler won't know how much memory to allocate to array arr.
I was wondering if someone could:
1) Verify that this is a dangerous operation
2) Explain what is going on memory wise when I initialize an array like that, how does the compiler know what memory to allocate? Is there any way to determine how much memory was allocated?
Any inputs would be appreciated.
That's a C99 variable length array. Its size is determined at runtime (not fixed by the compiler), it lives on the stack, and it is basically equivalent to
int *arr = alloca(numBits * sizeof(int));
In this case, since you know the upper bound of num_bits, and it is relatively small, you'd be best off with
int arr[sizeof(int)*CHAR_BIT];
This array has a size known at compile time, will always fit everything you need, and works on platforms without C99 support.
It should be ok, it will just go on the stack.
The only danger is blowing out the stack.
malloc would be the normal way: then you know whether you have enough memory and can make informed decisions on what to do next. But in many cases it's OK to assume you can put not-too-big objects on the stack.
But strictly speaking, if you don't have enough space, this will fail badly.