In C, in a Unix-like environment (Plan 9), I have an array that I use as memory.
uchar mem[32*1024];
I need that array to contain different fields, such as an int that indicates how much memory is free and available. So I tried this:
uchar* memp=mem;
*memp=(int)250; //An example of size I want to assign.
I know the size of an int is 4, so I have to force, with a cast or something like it, the first four slots of mem to hold the number 250 in this case, in big-endian order.
The problem is that when I try what I've described, it doesn't work. I suppose there is a mistake in the type conversion. How can I make mem[0] through mem[3] hold the size, represented as an int and not as a uchar?
Thanks in advance
Like this:
*((int*) memp) = 250;
That says: "Even though memp is a pointer to characters, I want you to treat it as a pointer to integers, and put this integer where it points."
Have you considered using a union, as in:
union mem_with_size {
int size;
uchar mem[32*1024];
};
Then you don't have to worry about the casting. (You still have to worry about byte-ordering, of course, but that's a different issue.)
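Since the question asks for big-endian order specifically, here is a minimal sketch of writing the bytes explicitly with shifts, which gives the same layout on any host. The names put_be32 and get_be32 are made up for this example, and it assumes 8-bit bytes and a size that fits in 32 bits.

```c
#include <stdint.h>

typedef unsigned char uchar;

/* Store a 32-bit value at p[0..3], most significant byte first. */
static void put_be32(uchar *p, uint32_t v)
{
    p[0] = (uchar)(v >> 24);
    p[1] = (uchar)(v >> 16);
    p[2] = (uchar)(v >> 8);
    p[3] = (uchar)v;
}

/* Read the value back from p[0..3], most significant byte first. */
static uint32_t get_be32(const uchar *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

With this, put_be32(mem, 250) leaves mem[0] to mem[2] at 0 and mem[3] at 250, whatever the byte order of the machine.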
As others have pointed out, you need to cast to a pointer to int. You also need to take the alignment of the pointer into consideration: on many architectures, an int needs to start at a memory location that is divisible by sizeof(int), and if you try to access an unaligned int, you get a SIGBUS. On other architectures, it works, but slowly. On yet others, it works quickly.
A portable way of doing this might be:
int x = 250;
memcpy(mem + offset, &x, sizeof(x));
Using unions may make this easier, though, so +1 to JamieH.
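To make the memcpy approach concrete, here is a small sketch that wraps it in a pair of helpers. The names store_int and load_int are invented for illustration; the value is stored in the host's native byte order, and the offset does not need to be aligned.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char uchar;

/* Copy the bytes of value into mem starting at offset (any alignment). */
static void store_int(uchar *mem, size_t offset, int value)
{
    memcpy(mem + offset, &value, sizeof value);
}

/* Copy sizeof(int) bytes out of mem starting at offset and return them as an int. */
static int load_int(const uchar *mem, size_t offset)
{
    int value;
    memcpy(&value, mem + offset, sizeof value);
    return value;
}
```

For example, store_int(mem, 1, 250) followed by load_int(mem, 1) gives back 250 even though offset 1 is not a multiple of sizeof(int).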
Cast to a pointer to int, not to a pointer to unsigned char again!
int * memp = (int *)mem;
* memp = 250; //An example of size I want to assign.
I wish to have a type which can be used as two different array structures - depending on context. They are not to be used interchangeably whilst the program is executing, rather when the program is executed with a particular start-up flag the type will be addressed as one of the array types
(for example):
array1[2][100]
or
array2[200];
I am not interested in how the data is organised (well I am but it is not relevant to what I wish to achieve)
union m_arrays
{
uint16_t array1[2][100];
uint16_t array2[200];
};
or do I have to use a pointer and alloc it at runtime?
uint16_t * array;
array = malloc(200 * sizeof(uint16_t));
uint16_t m_value =100;
*(array + 199) = m_value;
//equivalent uint16_t array1[1][99] == *(array + 199);
//equivalent uint16_t array2[199] == *(array + 199);
I haven't tried anything as yet
A union holds only one of its members at a time. That is, only one member can be "active" at any moment (this is just an abstraction, since C itself does not track which member is active).
In general, the size of the union is the size of its largest member, possibly plus padding.
Let me give an example:
#include <stdio.h>
typedef union m_arrays
{
int array1[2][100];
int array2[400];
} a;
int main()
{
printf("%zu", sizeof(a));
return 0;
}
In this example, the program prints 1600 (assuming int is 4 bytes; the exact value depends on the architecture), which is the size of the largest member, array2. So, yes, you can have a union of arrays in C.
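For the start-up flag scenario in the question, something along these lines might work. It is only a sketch: m_store and the two_dim flag are hypothetical names, and the union is the one from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

union m_arrays {
    uint16_t array1[2][100];
    uint16_t array2[200];
};

/* Store v at logical position i (0..199), using whichever view the flag selects. */
static void m_store(union m_arrays *m, int two_dim, size_t i, uint16_t v)
{
    assert(i < 200);
    if (two_dim)
        m->array1[i / 100][i % 100] = v;   /* 2 x 100 view */
    else
        m->array2[i] = v;                  /* flat view of the same storage */
}
```

Both branches write to the same bytes, so a value stored through one view at position 199 is also visible as array2[199] and as array1[1][99].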
Yes, this does work, and it's actually precisely because of how arrays are different from pointers. I'm sure you've heard that arrays in C are really just pointers, but the truth is that there are some important differences.
First, an array is the storage itself, not a pointer to it. A local array lives on the stack, and a global or static one lives in static storage; malloc does not give you an array variable, it returns a pointer to a block on the heap. A pointer can point anywhere; you can even set it from an arbitrary integer if you want (though there's no guarantee you can access the memory it then points to).
Second, because arrays are fixed length, the compiler can and does allocate them for you when you declare them. Importantly, this comes with the guarantee that the whole array is in one contiguous memory block. So if you declare int arr[2][100] as a local, you'll have 200 int slots allocated in a row on the stack. That means you can treat any multidimensional array as a single-dimensional array if you want to, e.g. instead of arr[y][x] you could do arr[0][y*100+x]. You could also do something like int* arr2 = (int*) arr and then treat arr2 as a regular array, even though arr actually decays to a pointer to an array of 100 ints, int (*)[100], not an int** (without the cast the compiler will warn, and strictly speaking the standard does not bless indexing past the end of the first row; my point is that this works in practice because of how arrays are laid out).
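The contiguity claim can be checked without indexing out of bounds by comparing byte offsets. This is a sketch with invented helper names (flat_index, byte_offset) and the 2 x 100 shape from the question.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 100 };

/* Row-major flat index of element (y, x). */
static size_t flat_index(size_t y, size_t x)
{
    assert(y < ROWS && x < COLS);
    return y * COLS + x;
}

/* Distance in bytes from the start of arr to arr[y][x]. */
static size_t byte_offset(int (*arr)[COLS], size_t y, size_t x)
{
    return (size_t)((unsigned char *)&arr[y][x] - (unsigned char *)arr);
}
```

For any valid (y, x), byte_offset(arr, y, x) equals flat_index(y, x) * sizeof(int), which is why arr[1][99] and element 199 of a flat view are the same slot.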
The third, and probably most important difference, is a consequence of the second. When you have an array in a struct or union, the struct/union isn't just holding a pointer to the first element. It holds the entire array. This is often used for copying arrays or returning them from functions. What this means for you is that what you want to do works despite what someone who's heard that arrays are pointers might initially think. If arrays were just an address and they were initialized by allocating at that address, there would be two different arrays initialized at two different places, and having the pointers to them in a union would mean one gets overwritten and now you have an array somewhere that you can't access.
So when this all comes together, your union of arrays basically has one array with two different ways of accessing the data (which is what you want if I'm not mistaken). A little example:
#include <stdio.h>
int main(void) {
union {
int arr1[4];
int arr2[2][2];
} u;
u.arr1[0] = 1;
u.arr1[1] = 2;
u.arr1[2] = 3;
u.arr1[3] = 4;
printf("%d %d\n%d %d\n", u.arr2[0][0], u.arr2[0][1], u.arr2[1][0], u.arr2[1][1]);
return 0;
}
Output:
1 2
3 4
We can also quickly walk through why this wouldn't work with pure pointers. Let's say we instead had a union like this:
union {
int* arr1;
int** arr2;
} u;
Then we might initialize with u.arr1 = (int*) malloc(4 * sizeof (int));. Then we could use arr1 like a normal array. But what happens when we try to use arr2? Well, arr2[y][x] is of course syntactic sugar for *(*(arr2+y)+x). The first dereference reads whatever is stored in the block as if it were an int*, but the block holds ints, not addresses. So when we add x to that value and dereference again, we're using int data as an address. C will try to do it, and if you're very unlucky it will succeed; I say unlucky because then you'll be messing with arbitrary memory. What's more likely is a segfault because whatever int is there is most likely not an address your program has access to.
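If the storage really has to come from malloc, a pointer to an array (rather than an int**) gives a two-dimensional view of the same block. The following is a sketch only; row_t, alloc_flat and as_rows are names made up for this example, using the 2 x 100 uint16_t shape from the question.

```c
#include <stdint.h>
#include <stdlib.h>

enum { ROWS = 2, COLS = 100 };

typedef uint16_t row_t[COLS];   /* one row of COLS elements */

/* Allocate ROWS * COLS zeroed elements in one contiguous block (NULL on failure). */
static uint16_t *alloc_flat(void)
{
    return calloc((size_t)ROWS * COLS, sizeof(uint16_t));
}

/* View the same block as ROWS rows; no second allocation, no copy. */
static row_t *as_rows(uint16_t *flat)
{
    return (row_t *)flat;
}
```

After flat[199] = 100, as_rows(flat)[1][99] reads 100, because both expressions name the same element. The caller frees the block once, through the original pointer.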
I haven’t been able to find an answer for this. Does it matter which of these methods I use in C?
int get_int(void *vptr) {
return *(int *)vptr;
}
or
int get_int(void *vptr) {
int i = 0;
memcpy(&i, vptr, sizeof(int));
return i;
}
It seems to give the same result on my tests but is this equivalent in every case?
The main difference is that the memcpy will work in more cases -- it just requires that the memory being copied from contains data of the appropriate type. The cast-and-dereference requires that and also requires that the pointer is valid for accessing that type. On most machines, this (just) requires proper alignment. The cast also allows the compiler to assume that it does not alias with a value of a different type.
The net result is that the memcpy is perhaps "safer", but also perhaps a bit more expensive.
but is this equivalent in every case?
No.
*(int *)vptr relies on int alignment. If vptr does not point to int-aligned data, the result is UB.
memcpy() does not have that requirement. It "works" as long as the bytes do not form a trap representation (rare).
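To illustrate the alignment point, here is a rough sketch: a check for int alignment next to the memcpy version of get_int. It assumes a C11 compiler (for _Alignof) and that converting a pointer to uintptr_t preserves the low address bits, which is common but not guaranteed by the standard. is_int_aligned is a name invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is suitably aligned for an int (assumption: flat address space). */
static int is_int_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Read an int from any address, aligned or not, by copying its bytes. */
static int get_int(const void *vptr)
{
    int i;
    memcpy(&i, vptr, sizeof i);
    return i;
}
```

On a machine where int needs 4-byte alignment, a pointer one byte past an aligned buffer fails the check, yet get_int still reads it correctly, whereas the cast version would be undefined there.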
No, it is not equivalent because, believe it or not, not all pointers are the same in all cases (though on x86+ architectures they are).
The cast, however, will work where it is defined.
I understand that I can assign a variable to a bigger type if it fits, and it's OK to do so. For example:
short s = 2;
int i = s;
long l = i;
long long ll = l;
When I try to do it with pointers it fails and I don't understand why. I have integers that I pass as arguments to functions expecting a pointer to a long long. And it hasn't failed, yet.
The other day I was going from short to int and something weird happened; I hope someone can explain it to me. This is the minimal code to reproduce it.
short s = 2;
int* ptr_i = &s; // here ptr_i is the pointer to s, ok , but *ptr_i is definitely not 2
When I try to do it with pointers it fails and I don't understand why.
A major purpose of the type system in C is to reduce programming mistakes. A default conversion may be disallowed or diagnosed because it is symptomatic of a mistake, not because the value cannot be converted.
In int *ptr_i = &s;, &s is the address of a short, typically a 16-bit integer. If ptr_i is set to point to the same memory and *ptr_i is used, it attempts to refer to an int at that address, typically a 32-bit integer. This is generally an error; loading a 32-bit integer from a place where there is a 16-bit integer, and we do not know what is beyond it, is not usually a desired operation. The C standard does not define the behavior when this is attempted.
In fact, there are multiple things that can go wrong with this:
As described above, using *ptr_i when we only know there is a short there may produce undesired results.
The short object may have alignment that is not suitable for an int, which can cause a problem either with the pointer conversion or with using the converted pointer.
The C standard does not define the result of converting short * to int * except that, if it is properly aligned for int, the result can be converted back to short * to produce a value equal to the original pointer.
Even if short and int are the same width, say 32 bits, and the alignment is good, the C standard has rules about aliasing that allow the compiler to assume that an int * never accesses an object that was defined as short. In consequence, optimization of your program may transform it in unexpected ways.
I have integers that I pass as arguments to functions expecting a pointer to a long long.
C does allow default conversions of integers to integers that are the same width or wider, because these are not usually mistakes.
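One way to get the widening behaviour when a function wants a pointer is to convert the value into a temporary of the right type and pass the temporary's address. This is a sketch: add_one stands in for whatever function expects a long long *, and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical callee that expects a pointer to long long. */
static void add_one(long long *p)
{
    *p += 1;
}

/* Call it for an int by going through a temporary of the correct type. */
static int add_one_to_int(int i)
{
    long long tmp = i;                        /* value conversion: well defined */
    add_one(&tmp);                            /* pointer now matches the object's type */
    assert(tmp >= INT_MIN && tmp <= INT_MAX); /* result must fit before narrowing */
    return (int)tmp;
}
```

The conversion happens on the value, not on the pointer, so the callee always reads and writes a real long long object.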
I want an array of pointers and I want to set byte values in the memory addresses where the pointers (of the array) are pointing.
Would this work:
unsigned int *pointer[4] = {(unsigned int *) 0xFF200020, (unsigned int *) 0xFF20001C, (unsigned int *) 0xFF200018, (unsigned int *) 0xFF200014};
*pointer[0] = 0b0111111; // the value is correct for the address
Or is the syntax somehow different?
EDIT:
I'm coding for an SOC board and these are memory addresses that contain the case of some UI elements.
unsigned int *element1 = (unsigned int *) 0xFF200020;
*element1 = 0b0111111;
works so I'm just interested about the C syntax of this.
EDIT2: There was one 0 too many in ... = 0b0...
Short answer:
Everything you've written is fine.
Thoughts:
I'm a big fan of using the types from stdint.h. This would let you write uint32_t, which is more clearly a 32-bit unsigned number than unsigned int.
You'll often see people write macros to refer to these registers:
#define REG_IRQ (*(volatile uint32_t *)(0xFF200020))
REG_IRQ = 0x42;
It's possible that you actually want these pointers to be to volatile integers. You want it to be volatile if the value can change outside of the execution of your program. That is, if that memory position doesn't act strictly like a piece of memory. (For example, it's a register that stores the interrupt flags).
With most compilers I've used on embedded platforms, you'll have problems from ignoring volatile once optimizations have been enabled.
0b00111111 is, sadly, non-standard before C23 (GCC and Clang accept it as an extension). Octal, decimal, and hexadecimal are portable; this value is 0x3F in hexadecimal.
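Putting the points above together, the array from the question could be written with volatile uint32_t pointers and driven by a small helper. This is a sketch only; write_all is an invented name, and taking the pointer array as a parameter lets the loop be tried on a host with ordinary variables standing in for the registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Write value to each of the n registers that regs points at. */
static void write_all(volatile uint32_t *const regs[], size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++)
        *regs[i] = value;   /* volatile: the store is not optimized away */
}
```

On the board, the array would presumably be initialized with the addresses from the question, e.g. { (volatile uint32_t *)0xFF200020, (volatile uint32_t *)0xFF20001C, ... }, and called as write_all(regs, 4, 0x3F).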
Sure, this should work, providing you can find addresses in your own segment.
Most probably you'll get a segmentation fault when running this code, because 0xFF200020 has very little chance of being in your program's segment.
This will compile without errors and will work, but hard-coding the address a pointer points to is generally not a good idea. Dereferencing an unknown or non-existent memory location will cause a segmentation fault; if you are sure about the memory locations, though, hard-coding them as done here is totally fine.
i.e. if we cast a C or C++ unsigned char array named arr as (unsigned short*)arr and then assign to it, is the result the same independent of machine endianness?
Side note - I saw the discussion on IBM and elsewhere on SO with example:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
...stating that the value of x will depend on the layout of endian, and hence the endianness of the machine. That means dereferencing an array is endian-dependent, but what about assigning to it?
*(short*) endian = 1;
Are all future short-casted dereferences then guaranteed to return 1, regardless of endianness?
After reading the responses, I wanted to post some context:
In this struct
struct pix {
unsigned char r;
unsigned char g;
unsigned char b;
unsigned char a;
unsigned char y[2];
};
replacing unsigned char y[2] with unsigned short y makes no individual difference, but if I make an array of these structs and put that in another struct, then I've noticed that the size of the container struct tends to be higher for the "unsigned short" version, so, since I intend to make a large array, I went with unsigned char[2] to save space overhead. I'm not sure why, but I imagine it's easier to align the uchar[2] in memory.
Because I need to do a ton of math with that variable y, which is meant to be a single short-length numerical value, I find myself casting to short a lot just to avoid individually accessing the uchar bytes... sort of a fast way to avoid ugly byte-specific math, but then I thought about endianness and whether my math would still be correct if I just cast everything like
*(unsigned short*)this->operator()(x0, y0).y = (ySum >> 2) & 0xFFFF;
...which is a line from a program that averages 4-adjacent-neighbors in a 2-D array, but the point is that I have a bunch of these operations that need to act on the uchar[2] field as a single short, and I'm trying to find the lightest (i.e. without an endian-based if-else statement every time I need to access or assign), endian-independent way of working with the short.
Thanks to strict pointer aliasing it's undefined behaviour, so it might be anything. If you'd do the same with a union however the answer is no, the result is dependent on machine endianness.
Each possible value of short has a so-called "object representation"[*], which is a sequence of byte values. When an object of type short holds that value, the bytes of the object hold that sequence of values.
You can think of endianness as just being one of the ways in which the object representation is implementation-dependent: does the byte with the lowest address hold the most significant bits of the value, or the least significant?
Hopefully this answers your question. Provided you've safely written a valid object representation of 1 as a short into some memory, when you read it back from the same memory you'll get the same value again, regardless of what the object representation of 1 actually is in that implementation. And in particular regardless of endianness. But as the others say, you do have to avoid undefined behavior.
[*] Or possibly there's more than one object representation for the same value, on exotic architectures.
Yes, all future dereferences will return 1 as well: As 1 is in range of type short, it will end up in memory unmodified and won't change behind your back once it's there.
However, the code itself violates effective typing: It's illegal to access an unsigned char[2] as a short, and may raise a SIGBUS if your architecture doesn't support unaligned access and you're particularly unlucky.
However, character-wise access of any object is always legal, and a portable version of your code looks like this:
short value = 1;
unsigned char *bytes = (unsigned char *)&value;
How value is stored in memory is of course still implementation-defined, i.e. you can't know what the following will print without further knowledge about the architecture:
assert(sizeof value == 2); // check for size 2 shorts
printf("%i %i\n", bytes[0], bytes[1]);
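For the struct pix case in the question, one light option is a pair of accessors that copy the two bytes with memcpy, so the math is done on a real 16-bit value and no cast of the array is needed. This is a sketch, not the only way to do it: pix_get_y and pix_set_y are invented names, and it assumes uint16_t occupies exactly two bytes. The bytes are kept in the host's native order, so values round-trip on any one machine but the raw bytes are not portable between machines.

```c
#include <stdint.h>
#include <string.h>

struct pix {
    unsigned char r, g, b, a;
    unsigned char y[2];
};

/* Read the two bytes of y as one 16-bit value (native byte order). */
static uint16_t pix_get_y(const struct pix *p)
{
    uint16_t v;
    memcpy(&v, p->y, sizeof v);
    return v;
}

/* Store a 16-bit value into the two bytes of y (native byte order). */
static void pix_set_y(struct pix *p, uint16_t v)
{
    memcpy(p->y, &v, sizeof v);
}
```

A line like the averaging one in the question would then become something like pix_set_y(&px, (uint16_t)((ySum >> 2) & 0xFFFF)), where px is the target pixel. Compilers typically turn these small fixed-size memcpy calls into a single load or store.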