I am creating an int array and then tricking C into believing that it's an array of short values. I know it's not good practice, but I am just trying to understand why this isn't working. Shouldn't this change the value of arr[3]?
#include <stdio.h>
int main() {
printf("Hello, World!\n");
int arr[5];
arr[0] = 0; arr[1] = 0; arr[2] = 0; arr[4] = 0;
arr[3] = 128;
((short*)arr)[6] = 128; // Shouldn't this change arr[3]? Indices 6 and 7 of the short array should overlap arr[3] of the int array.
int i = 0;
for (i = 0; i < 5; i++){
printf("%d\n", arr[i]);
}
return 0;
}
PS: Here's a deeper clarification:
When I cast the int array to a short array, it seemingly becomes an array of 10 short elements (not 5). So when I change element 6 of that short array, I am changing only the first 16 bits of the int arr[3]. So arr[3] should still change, and it is NOT that I am changing it to 128 again and not seeing the change.
FOR CLARIFICATION: THIS CODE IS ONLY FOR EXPERIMENTAL REASONS! I AM JUST LEARNING HOW POINTERS WORK AND I GET THAT ITS NOT GOOD PRACTICE.
Your code has undefined behavior, because you are writing a datum with a declared type through a pointer to a different type, and the different type is not char.
int arr[5];
/* ... */
((short*)arr)[6] = /* ANYTHING */;
The compiler is entitled to generate machine code that doesn't include the write to ((short*)arr)[6] at all, and this is quite likely with modern compilers. It's also entitled to delete the entire body of main on the theory that all possible executions of the program provoke undefined behavior, therefore the program will never actually be run.
(Some people say that when you write a program with undefined behavior, the C compiler is entitled to make demons fly out of your nose, but as a retired compiler developer I can assure you that most C compilers can't actually do that.)
Have you considered endianness?
EDIT: Now to add more clarity ...
As others have mentioned in the comments, this is most definitely undefined behavior! This is not just "not good practice"; simply don't do it!
Pointers on C is an excellent book that goes over everything you wanted to know about pointers and more. It's dated but still very relevant. You can probably find most of the information online, but I haven't seen many books that deal with pointers as completely as this one.
Though it sounds like you are experimenting, possibly as part of a class. So, here are a number of things wrong with this code:
endianness
memory access model
assumption of type size
assumption of hardware architecture
cross type casting
Remember, even though C is considered a pretty low level language today, it is still a high level programming language that affords many key abstractions.
Now, look at your declaration again.
int arr[5];
You've allocated 5 ints grouped together and accessed via a common variable named arr. By the standard, the array is 5 elements of at least 16 bits per element, with base address &arr[0]. So you aren't guaranteed that an int is 2 bytes, or 4 bytes, or whatever. Likewise, a short is defined by the standard as at least 16 bits. However, a short is not an int even if they have the same byte width! Remember, C is strongly typed.
Now, it looks like you are running on a machine where shorts are 2 bytes and ints are 4 bytes. That is where the endianness issue comes into play: where is your most significant bit? And where is your most significant byte?
Casting the address of arr to a short pointer breaks both the type and the memory access model. Then you want to access element 6 from the offset of arr. However, you aren't accessing relative to the int you declared arr to be; you are accessing through a short pointer that points at the same address as arr!
The following operations ARE NOT the same! Some of them also fall into the category of undefined, so don't do this ever!
int foo;
int *pfooInt;
short bar;
short * pfooShort;
bar = (short) foo;
pfooShort = (short*)&foo;
pfooInt = &foo;
bar = *pfooShort;
pfooShort = (short*)pfooInt[0];
Another thing to clarify for you:
int arr[5];
((short *)arr)[6] ...
This does not transform your int array of 5 elements into a short array with 10 elements. arr is still an int array of 5 elements. You just broke the access method and are trying to modify memory in an undefined manner. What you did is tell the compiler "ignore what I told you about arr previously, treat arr as a short pointer for the life of this statement and access/modify the short at index 6 relative to this pointer."
It is changing arr[3]; however, you are setting it back to 128, so you aren't noticing a change. Change the line to:
((short*)arr)[6] = 72;
and, on a typical little-endian machine with 2-byte shorts and 4-byte ints, you should see arr[3] printed as 72 instead of 128.
Also a couple of things to clean up if you are new to C. You can initialize an array to zero by doing the following.
...
int arr[5] = { 0 };
arr[3] = 128;
...
Hope this helps!
I wish to have a type which can be used as two different array structures - depending on context. They are not to be used interchangeably whilst the program is executing, rather when the program is executed with a particular start-up flag the type will be addressed as one of the array types
(for example):
array1[2][100]
or
array2[200];
I am not interested in how the data is organised (well I am but it is not relevant to what I wish to achieve)
union m_arrays
{
uint16_t array1[2][100];
uint16_t array2[200];
};
or do I have to use a pointer and alloc it at runtime?
uint16_t * array;
array = malloc(200 * sizeof(uint16_t));
uint16_t m_value =100;
*(array + 199) = m_value;
//equivalent uint16_t array1[1][99] == *(array + 199);
//equivalent uint16_t array2[199] == *(array + 199);
I haven't tried anything as yet
A union holds only one of its members at a time. That is, only one member can be "bound" at a time (this is just an abstraction, since C has no notion of which member is "active").
In general, the size of the union is at least the size, in bytes, of its largest member.
Let me give an example:
#include <stdio.h>
typedef union m_arrays
{
int array1[2][100];
int array2[400];
} a;
int main()
{
printf("%zu", sizeof(a));
return 0;
}
In this example, this would print 1600 (assuming int is 4 bytes long; ultimately it depends on the architecture), which is the size in bytes of the larger member. So, YES, you can have a union of arrays in C.
Yes, this does work, and it's actually precisely because of how arrays are different from pointers. I'm sure you've heard that arrays in C are really just pointers, but the truth is that there are some important differences.
First, an array declared as a local variable lives in automatic storage (on the stack in practice), or in static storage if it is global or static. You can't use malloc to make an array variable, because malloc returns a pointer to heap memory. A pointer can point anywhere; you can even set it to an arbitrary integer if you want (though there's no guarantee you can access the memory that it points to).
Second, because arrays are fixed length, the compiler can and does allocate them for you when you declare them. Importantly, this comes with the guarantee that the whole array is in one contiguous memory block. So if you declare int arr[2][100], you'll have 200 int slots allocated in a row. That means you can, in practice, treat any multidimensional array as a single-dimensional array if you want to, e.g. instead of arr[y][x] you could do arr[0][y*100+x]. You could also do something like int* arr2 = (int*)arr and then treat arr2 as a regular array, even though arr actually decays to an int (*)[100] (without the cast you'll get a warning, and strictly speaking the standard does not bless indexing past the end of the first row; my point is that it works in practice because of how arrays are laid out).
The third, and probably most important difference, is a consequence of the second. When you have an array in a struct or union, the struct/union isn't just holding a pointer to the first element. It holds the entire array. This is often used for copying arrays or returning them from functions. What this means for you is that what you want to do works despite what someone who's heard that arrays are pointers might initially think. If arrays were just an address and they were initialized by allocating at that address, there would be two different arrays initialized at two different places, and having the pointers to them in a union would mean one gets overwritten and now you have an array somewhere that you can't access.
So when this all comes together, your union of arrays basically has one array with two different ways of accessing the data (which is what you want if I'm not mistaken). A little example:
#include <stdio.h>
int main(void) {
union {
int arr1[4];
int arr2[2][2];
} u;
u.arr1[0] = 1;
u.arr1[1] = 2;
u.arr1[2] = 3;
u.arr1[3] = 4;
printf("%d %d\n%d %d\n", u.arr2[0][0], u.arr2[0][1], u.arr2[1][0], u.arr2[1][1]);
return 0;
}
Output:
1 2
3 4
We can also quickly walk through why this wouldn't work with pure pointers. Let's say we instead had a union like this:
union {
int* arr1;
int** arr2;
} u;
Then we might initialize with u.arr1 = (int*) malloc(4 * sizeof (int));. Then we could use arr1 like a normal array. But what happens when we try to use arr2? Well, arr2[y][x] is of course syntactic sugar for *(*(arr2+y)+x). Once it's dereferenced that first time, we now have an int, since the address points to an int. So when we add x to that int and try to dereference again, we're trying to dereference an int. C will try to do it, and if you're very unlucky it will succeed; I say unlucky because then you'll be messing with arbitrary memory. What's more likely is a segfault because whatever int is there is most likely not an address your program has access to.
As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that the cell (& ptr2)[3] was identical to the memory cell in &ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is, is it possible to know, only by written code in the notebook, without checking any compiler, that a certain pointer (or all variables in general) follow or precede other variables in memory?
Note that I do not mean the elements of an array, since all the elements of an array must be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element 3 of an array when there is no element 3 of an array, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer itself is stored, that is, the pointer that points to the initializer.
When you do (&ptr2)[3] = something, you try to write 3*sizeof(char*) bytes past the location of ptr2. This is invalid, and it may well end in a segmentation fault.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior. It's basically "illegal", and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit; it does, however, say that accessing memory outside an object is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad how people misunderstand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare the variable it will go to one of the places said before.
The compiler and linker arrange the BSS and data segment variables as they wish at build time, so there is usually no way to tell. The same goes for heap variables (the allocator picks whichever block best fits the requested size).
On the stack (which is a LIFO structure) the variables are put one on top of the other, so if you have:
int a = 5;
int b = 10;
a and b will often end up next to each other, but the C standard does not guarantee this, and the compiler may reorder them, pad them, or keep them in registers. So even in this case you cannot tell from the source alone.
The one real exception is the members of a structure or the elements of an array: array elements are always contiguous, and structure members are laid out in declaration order (possibly with padding between them).
In your code ptr1 is an array of pointers to char, so its three pointers are contiguous, although the strings they point to need not be.
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(){
const char * const ptr1[] = {"to be","or not to be","that is the question"};
for (int i = 0; i < 3; i++) {
for (size_t j = 0; j < strlen(ptr1[i]); j++)
printf("%p -> %c\n", (void *)&ptr1[i][j], ptr1[i][j]);
printf("\n");
}
}
and you will see the memory address and its content!
Have a nice day.
#include <stdio.h>
int main(void) {
// your code goes here
int a = 2;
int b = 3;
int c;
c = a + b;
int arr[c];
arr[5] = 0;
printf("%d",arr[5]);
return 0;
}
Output is 0
How is it that the array size is taken at run time? Is it a new feature?
This is a variable length array. They were introduced in the 1999 revision of the C standard.
Sadly, support for them arrived slowly, so much so that the 2011 revision made them an optional feature (but they are still standardized) [1].
Despite looking cool, they have a major caveat: they can overflow the call stack if the size is "too big". As such, care needs to be taken when using them.
[1] Some compiler vendors were resistant, so it was made optional to appease them. Microsoft is an entire case study of this.
This feature (variable length arrays) was introduced in C99, but support still depends on the compiler. Some compilers (like gcc) support it; some (like msvc) don't.
BTW, arr[5] in your code, is out of range. Last element should be arr[4].
don't be confused in (static/fixed memory allocation) & (dynamic memory allocation) concepts :)
Let me clear your concept bro.
Relevant to following question,
C supports two types of array.
1. Static array: memory is allocated at "COMPILE TIME".
2. Dynamic array: memory is allocated at "RUN TIME".
Ques. how to determine if an Array is static or dynamic?
Ans.
Array declaration syntax:-
int array_Name[size]; //size defines the size of block of memory for an array;
So, coming to the point-->
Point 1. If the size is given to the array at compile time, it's a "Static Memory Allocation". It is also called "fixed size memory allocation" because the size never changes. It's the LIMITATION of arrays in C.
ex.
int arr[10]; // 10 is the size of arr, which is statically defined
int brr[] = {1000, 2, 37, 755, 3}; // size is equal to the number of values it is initialized with
Point 2. If the size is given to the array at run time, it's a Dynamic Memory Allocation.
It is achieved by the malloc() function declared in stdlib.h.
Now, here is the clarification of your code:
#include <stdio.h>
int main(void) {
// your code goes here
int a = 2;
int b = 3;
int c;
c = a + b; // c is calculated at run time
int arr[c]; // variable length array: its size (5) is fixed when this line executes at run time
arr[5] = 0; // out of bounds: valid indices are 0..4, so this write is undefined behavior
printf("%d",arr[5]);
return 0;
}
So the array has 5 elements, arr[0] to arr[4], and arr[5] is one past the end; the 0 that gets printed is not reliable.
The valid elements are never assigned, so they hold garbage values.
Hopefully this clears up your problem :)
I know there are several questions about this that give good (and working) solutions, but none, IMHO, that says clearly what the best way to achieve this is.
So, suppose we have some 2D array :
int tab1[100][280];
We want to make a pointer that points to this 2D array.
To achieve this, we can do :
int (*pointer)[280]; // pointer creation
pointer = tab1; // assignment
pointer[5][12] = 517; // use
int myint = pointer[5][12]; // use
or, alternatively :
int (*pointer)[100][280]; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // use
int myint = (*pointer)[5][12]; // use
OK, both seem to work well. Now I would like to know:
what is the best way, the 1st or the 2nd ?
are both equals for the compiler ? (speed, perf...)
is one of these solutions eating more memory than the other ?
what is the more frequently used by developers ?
//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to an 2D array of 100*280 integers
Using pointer2 or pointer3 produces the same binary, except for manipulations such as ++pointer2, as pointed out by WhozCraig.
I recommend using typedef (producing same binary code as above pointer3)
typedef int myType[100][280];
myType *pointer3;
Note: Since C++11 (C++ only, not C), you can also use the keyword using instead of typedef
using myType = int[100][280];
myType *pointer3;
in your example:
myType *pointer; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
Note: If the array tab1 is declared within a function body, it is placed in the call stack memory. But the stack size is limited. Using arrays bigger than the free stack memory produces a stack overflow crash.
The full snippet is online-compilable at gcc.godbolt.org
int main()
{
//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
static_assert( sizeof(pointer1) == 2240, "" );
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to an 2D array of 100*280 integers
static_assert( sizeof(pointer2) == 8, "" );
static_assert( sizeof(pointer3) == 8, "" );
// Use 'typedef' (or 'using' if you use a modern C++ compiler)
typedef int myType[100][280];
//using myType = int[100][280];
int tab1[100][280];
myType *pointer; // pointer creation
pointer = &tab1; // assignment
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
return myint;
}
Both your examples are equivalent. However, the first one is less obvious and more "hacky", while the second one clearly states your intention.
int (*pointer)[280];
pointer = tab1;
pointer points to a 1D array of 280 integers. In your assignment, you actually assign the address of the first row of tab1. This works since arrays are implicitly converted to pointers (to their first element).
When you use pointer[5][12], C treats pointer as an array of arrays (pointer[5] is of type int[280]), so there is another implicit conversion here (at least semantically).
In your second example, you explicitly create a pointer to a 2D array:
int (*pointer)[100][280];
pointer = &tab1;
The semantics are clearer here: *pointer is a 2D array, so you need to access it using (*pointer)[i][j].
Both solutions use the same amount of memory (1 pointer) and will most likely run equally fast. Under the hood, both pointers will even point to the same memory location (the first element of the tab1 array), and it is possible that your compiler will even generate the same code.
The first solution is "more advanced", since one needs quite a deep understanding of how arrays and pointers work in C to understand what is going on. The second one is more explicit.
int *pointer[280]; // Creates an array of 280 pointers to int.
On a 32-bit OS, each pointer takes 4 bytes, so 4 * 280 = 1120 bytes.
int (*pointer)[100][280]; // Creates only one pointer, which points to an array of [100][280] ints.
Here only 4 bytes.
Coming to your question, int (*pointer)[280]; and int (*pointer)[100][280]; are different, though they point to the same 2D array of [100][280].
If int (*pointer)[280]; is incremented, it points to the next 1D array (the next row), whereas int (*pointer)[100][280]; skips the whole 2D array and points just past its end. Accessing that memory may cause problems if it doesn't belong to your process.
Ok, this is actually four different questions. I'll address them one by one:
are both equals for the compiler? (speed, perf...)
Yes. The pointer dereference and the decay from type int (*)[100][280] to int (*)[280] are always a no-op for your CPU. I wouldn't put it past a bad compiler to generate bogus code anyway, but a good optimizing compiler should compile both examples to the exact same code.
is one of these solutions eating more memory than the other?
As a corollary to my first answer, no.
what is the more frequently used by developers?
Definitely the variant without the extra (*pointer) dereference. For C programmers it is second nature to assume that any pointer may actually be a pointer to the first element of an array.
what is the best way, the 1st or the 2nd?
That depends on what you optimize for:
Idiomatic code uses variant 1. The declaration is missing the outer dimension, but all uses are exactly as a C programmer expects them to be.
If you want to make it explicit that you are pointing to an array, you can use variant 2. However, many seasoned C programmers will think that there's a third dimension hidden behind the innermost *. Having no array dimension there will feel weird to most programmers.
I came across this in an IRC channel yesterday and didn't understand why it was bad behavior:
#include <stdio.h>
int main(void)
{
char x[sizeof(int)] = { '\0' }; int *y = (int *) x;
printf("%d\n", *y);
}
Is there any loss of data or anything? Can anyone give me any docs to explain further about what it does wrong?
The array x may not be properly aligned in memory for an int. On x86 you won't notice, but on other architectures, such as SPARC, dereferencing y will trigger a bus error (SIGBUS) and crash your program.
This problem may occur for any address:
int main(void)
{
short a = 1;
char b = 2;
/* y not aligned */
int* y = (int *)(&b);
printf("%d\n", *y); /* SIGBUS */
}
For one thing, the array x is not guaranteed to be aligned properly for an int.
There's been a conversation topic about how this might affect techniques like placement new. It should be noted that placement new needs to occur on properly aligned memory as well, but placement new is often used with memory that is allocated dynamically, and allocation functions (in C and C++) are required to return memory that's suitably aligned for any type specifically so the address can be assigned to a pointer of any type.
The same isn't true for the memory allocated by the compiler for automatic variables.
Why not use a union instead?
union xy {
int y;
char x[sizeof(int)];
};
union xy xyvar = { .x = { 0 } };
...
printf("%d\n", xyvar.y);
I haven't verified it, but I would think the alignment problems mentioned by others would not be a problem here. If anyone has an argument for why this isn't portable, I'd like to hear it.
I think that while the alignment issue is true, it is not the whole story.
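Even if alignment is not a problem, you are still taking 4 bytes on the stack and treating them like an integer.
In this snippet the initializer { '\0' } zeroes the whole array, because elements without an explicit initializer are zero-initialized, so all 32 bits are defined. Without the initializer, though, the printed value would be made of un-initialized bits.
And using un-initialized values is a basic 'wrong'.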
(Assuming sizeof(int)==4 for simplicity).