I am working on an 8051 platform which has a 16 bit pointer width.
I have a common code module for handling flash emulation and there's a function that returns the 16 bit start address of a page:
volatile u16_t start_address = find_start_address_of_page( page );
I then want to pass this 'address' to a CRC function that wants a u8_t* as a parameter, so I cast it in the function call like so:
(u8_t *)start_address
This generates the warning
Warning[Pe1053]: conversion from integer to smaller pointer
This confuses me a bit, because a u8_t* is 16 bits wide, and my variable is a 16 bit variable. Is it simply that the compiler is warning about an "integer to pointer" conversion in general?
The code works fine, I just want to be sure I'm not missing something silly here...
You write that your 8051 platform has a 16 bit pointer width.
As far as I know, the 8051 has different address ranges for
- the internal RAM in the processor (max 256 Byte)
- external RAM (max 64k)
- Program Memory (max 64k)
The compiler I have worked with (Keil) therefore had at least four different pointer types.
An 8 bit wide 'data' pointer for the internal RAM.
A 16 bit wide 'xdata' pointer for the external RAM.
A 16 bit wide 'code' pointer for the Program Memory.
A 24 bit wide universal pointer which could be set to point to any of the three memory types. The first byte was used to select the memory type.
The warning text could mean that the compiler wants to convert your 16 bit value to an address in internal RAM which is only 8 bits wide.
If you want to silence your warning you could use a union to move your information into a different type, i.e.
union {
u16_t origType;
u8_t *newtype;
} u;
u.origType = start_address;
Assuming they are the same size, you can then pass u.newtype into your function.
Since start_address is a variable that holds a memory address, you should declare and use it as such, meaning a pointer:
volatile u16_t *start_address = find_start_address_of_page( page );
Of course this also means that your function find_start_address_of_page(); has to return a pointer.
By the way, the fact that int and int * are both 16 bits wide (on your processor) is not enough. For example, on many 16 bit processors a pointer to an int has to be aligned to an even (multiple of 2) address, because of limitations of the assembler instructions and/or the data bus implementation.
In the same way, a statement like start_address++; increments differently depending on whether it is an int or an int * (or even a char *). If it is an int (or a char *) it will increment by one, but if it is an int * it will increment by sizeof(int), which is two on your platform.
By this I'm trying to show that the compiler makes a lot of checks beyond the number of bits, depending on the type of variable (and the processor abilities).
start_address is of type u16_t, and not a pointer.
If you want to pass its value as an address to the CRC function, then try this:
((u8_t *)((u16_t *)start_address))
I have a block of memory which contains data. How can I access the data, even if it is misaligned, in a platform-independent way? The data is mostly 8 and 32 bit values.
If you want complete platform independence, declare an unsigned char * to point to the memory, pick up bytes, and use '|' and '<<' as needed to assemble the values.
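As a minimal sketch of that approach (the function names read_u32_le and read_u32_be are made up for illustration, and the byte order of your data is an assumption you have to check yourself), assembling a 32 bit value byte by byte could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value stored least significant byte first.
 * Only single bytes are accessed, so p may have any alignment. */
uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Same idea for data stored most significant byte first. */
uint32_t read_u32_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

Each byte is widened to uint32_t before shifting so the shift never happens in a narrower (or signed) type. The result does not depend on the host's own endianness.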
You can memcpy to an aligned one:
#include <stdint.h>
#include <string.h>

void f(void *var_16bit_alignment) {
    uint16_t v;
    memcpy(&v, var_16bit_alignment, sizeof(v));
    // accesses to v are aligned now!
}
Note that using uint16_t * as the parameter type would already invite undefined behavior, because converting a misaligned address to uint16_t * is not allowed; that is why the parameter is a void *. The type of v can be adjusted for your needs (uint32_t, ...). Note also that this does not deal with endianness, but it is easy to extend with ntohs or similar.
For completeness, 8 bit data is already aligned (unless you are using a very weird platform).
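A possible way to combine the memcpy approach with ntohs is sketched below. It assumes the data is stored big-endian (network order) and that a POSIX <arpa/inet.h> is available; load_be16 is a hypothetical helper name, not something from the question.

```c
#include <arpa/inet.h>  /* ntohs: POSIX, not part of ISO C */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load a big-endian 16-bit value from a possibly misaligned address. */
uint16_t load_be16(const void *src)
{
    uint16_t v;
    assert(src != NULL);
    memcpy(&v, src, sizeof v);  /* byte-wise copy, no alignment requirement */
    return ntohs(v);            /* network (big-endian) to host byte order */
}
```

On a big-endian host ntohs does nothing; on a little-endian host it swaps the two bytes, so the caller gets the same value either way.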
I'm building a garbage collector for a compiler. I work with "physical" and "virtual" addresses. The virtual addresses have type value_t and are 32 bits. The physical addresses have type value_t* and are 32 or 64 bits pointers depending on the host machine. The conversion between the two is as follows:
static void* addr_v_to_p(value_t v_addr) {
assert(0 <= v_addr);
return (char*)memory_start + v_addr;
}
static value_t addr_p_to_v(void* p_addr) {
assert(memory_start <= p_addr && p_addr <= memory_end);
return (value_t)((char*)p_addr - (char*)memory_start);
}
Then I set up a memory layout similar to this:
Where the bitmap has to reference values of the heap.
Problem
I want to compute the index in the bitmap of an address in a machine independent way. (For that purpose it is handy to set VALUE_BITS = sizeof(value_t) * CHAR_BITS as the number of bits of the value_t type.) I would write:
(ptr, heap_start, bitmap_start of type value_t*)
size_t index = ptr - heap_start;
size_t word_index = index / VALUE_BITS;
bitmap_start[word_index] = ...;
But I'm not sure this is going to work.
As I understand the question, your "bitmap" is intended to be a sequence of bits, with a unique bit corresponding to each heap address. Presumably you also want to minimize the number of unused bits. You are asking about your proposed approach to mapping between heap addresses and bits.
Furthermore, the conversion functions you present between virtual and physical addresses suggest that your memory model is byte-addressable, as opposed, say, to being addressable only with the granularity of a value_t.
Since your bitmap is apparently accessed in units of type value_t, which I'm taking to be unsigned and without padding bits, the number of usable bits in each unit is sizeof(value_t) * CHAR_BIT. That matches your VALUE_BITS, modulo spelling. Still, if bitmap_start is going to be (or reasonably can be made to be) visible wherever VALUE_BITS is defined (if it is a variable) or used (if it is a macro), then I would be inclined to write its initializer / replacement text as (sizeof(*bitmap_start) * CHAR_BIT). That's clearer to me and adapts automatically if ever you change the type to which bitmap_start points.
Now let's consider your code starting with this:
size_t index = ptr - heap_start;
There's nothing inherently wrong with that, but remember that pointer arithmetic is defined in terms of units of the pointed-to type. Thus, that gives the number of units of type value_t in the half-open interval defined by the two pointers, supposing ptr points into or just past the end of the heap and is properly aligned. That alignment caveat matters because your model is byte-addressable, so there are valid values that ptr can take that are misaligned. In fact, the majority of valid ptr values are misaligned. If you want the index in terms of a byte offset into the heap -- and it appears you do -- then you want something more like this:
ptrdiff_t index = (char *) ptr - (char *) heap_start;
Let's move on to the next part:
size_t word_index = index / VALUE_BITS;
It seems you're trying to determine the storage unit in the bitset that contains the indexth bit. If we stipulate that each unit contains VALUE_BITS usable bits, and that you want every bit in each unit to correspond to a heap address, then this is fine.
But you seem to have run out of steam there, as this ...
bitmap_start[word_index] = ...;
... is slightly lacking in detail. You're going to need to use bit masking to select the appropriate bit of bitmap_start[word_index] to examine or set, and in doing so you'll need to take care to avoid modifying the other bits in the same unit. That's not hard, but I'm not going to do it for you.
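For readers who want to see the masking step, here is one possible sketch. It is not taken from the question: word_t stands in for an unsigned value_t without padding bits, WORD_BITS plays the role of VALUE_BITS, and the three function names are invented for this example. index is the bit index computed as described above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;                      /* assumption: unsigned stand-in for value_t */
#define WORD_BITS (sizeof(word_t) * CHAR_BIT) /* usable bits per bitmap word */

/* Set the bit for index, leaving all other bits of the word untouched. */
void bitmap_set(word_t *bitmap, size_t index)
{
    assert(bitmap != NULL);
    bitmap[index / WORD_BITS] |= (word_t)1 << (index % WORD_BITS);
}

/* Clear the bit for index, leaving all other bits of the word untouched. */
void bitmap_clear(word_t *bitmap, size_t index)
{
    assert(bitmap != NULL);
    bitmap[index / WORD_BITS] &= ~((word_t)1 << (index % WORD_BITS));
}

/* Return 1 if the bit for index is set, 0 otherwise. */
int bitmap_test(const word_t *bitmap, size_t index)
{
    assert(bitmap != NULL);
    return (int)((bitmap[index / WORD_BITS] >> (index % WORD_BITS)) & 1u);
}
```

The division picks the word, the remainder picks the bit inside it, and the read-modify-write with |= and &= is what keeps the neighbouring bits intact.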
I have a function get_picture() that takes a picture. It returns a pointer of type uint8_t * (to where the picture is stored) and takes a pointer to a variable that stores the length of the picture.
Here is the declaration:
uint8_t * get_picture(int *piclength)
Here I call it in main():
unsigned int address, value;
address = (unsigned int)get_picture((int*)& value);
My question is: because address is storing an address (which is positive), should I actually define it as an int?
I'm not sure you understand pointers.
If your function returns a uint8_t * then you should be storing it in uint8_t * not an int.
As an example:
uint8_t* get_picture(int* piclength);
int piclength;
uint8_t* address;
address = get_picture(&piclength);
If you really want to convert a data-pointer to an integer, use the dedicated typedef instead of some random (and possibly too small) type:
uintptr_t / intptr_t (Optional typedefs in <stdint.h>)
Still, the need is rare, and I don't see it here.
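If you do end up needing the integer form, a small sketch using uintptr_t might look like the following. The helper names address_of, pointer_from and format_address are made up for illustration, and the code assumes your toolchain provides the optional uintptr_t type and the PRIXPTR macro from <inttypes.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a data pointer to an integer wide enough to hold it. */
uintptr_t address_of(const void *p)
{
    return (uintptr_t)p;
}

/* Convert the integer back; this round trip is what uintptr_t guarantees. */
void *pointer_from(uintptr_t bits)
{
    return (void *)bits;
}

/* Write the address as hex text into out; returns snprintf's result. */
int format_address(char *out, size_t n, const void *p)
{
    assert(out != NULL && n > 0);
    return snprintf(out, n, "0x%" PRIXPTR, address_of(p));
}
```

Unlike unsigned int, uintptr_t is sized by the implementation to fit a void *, so no bits are silently dropped on targets where pointers are wider than int.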
It depends on what you are really after. Your code is fine as is if you want address to contain the address of where that picture lives. Likewise you could use an int, since bits is bits, int is the same number of bits as unsigned int and whatever consumes address can be fed those bits. It makes more sense as a human to think of addresses as unsigned, but the compiler and hardware don't care, bits is bits.
But depending on what you are doing, you may want to preserve this address using a pointer of the same type, as mentioned already. See Dragan's answer.
If you want to "see" the address then it depends on how you want to see it, converting it to an unsigned int is one easy and generic way to do it.
Yes, this is very system dependent and the size of int varies by toolchain and target and may or may not completely hold an address for that system, so some masking may be required by the consumer of that variable.
So your code is fine; I think I understand the question. Signed or unsigned is in the eye of the beholder: a value is only signed or unsigned for particular operations. Addresses are not themselves signed or unsigned, they are just bits on an address bus. For a sane compiler, unsigned int and int are the same size and store the same number of bits, so as long as the compiler defines them to be at least as wide as the address it uses for a pointer, this will work just fine with int or unsigned int. Int feels a little wrong and unsigned int feels right, but those are human emotions. The hardware doesn't care, so long as the bits don't change on their way to the address bus.
Now if for some reason the code we don't see prints this variable as a decimal, for example printf("%d\n",address); (why would you printf on a microcontroller?), then it may look strange to humans but will still be the right decimal interpretation of the bit pattern that is the address. printf("0x%X\n",address); would make more sense and be more generic. If your printf supports it you could just printf("%p",(void *)address); using Dragan's uint8_t * address declaration, which is what many folks here are probably thinking based on classical C training.
The other view is that bits are bits and have no meaning whatsoever to the hardware until used, and only for that use case. An address is only an address on the address bus; when doing math on it to compute another address it is not an address, it is a bit pattern being fed into the ALU. Signed or unsigned might depend on the operation (add and subtract don't know signed from unsigned, multiply and divide do).
If you choose not to use uint8_t * address as a declaration, then unsigned int "feels" better and is less likely to mess you up (if you have enough bits in an (unsigned) int for that compiler to store an address in the first place). A signed int feels a little wrong, but technically should work. My rule is to use signed only when you specifically need signed, and unsigned everywhere else; it saves on a lot of bugs. Unfortunately, C libraries traditionally do it the other way around, which made a big mess before the stdint.h stuff came about.
This is taken from C, and is based on that.
Let's imagine we have a 32 bit pointer
char* charPointer;
It points into some place in memory that contains some data. It knows that increments of this pointer are in 1 byte, etc.
On the other hand,
int* intPointer;
also points into some place in memory and if we increase it it knows that it should go up by 4 bytes if we add 1 to it.
The question is, how are we able to address the full 32 bits of addressable space (2^32 bytes, 4 gigabytes) with those pointers, if obviously they contain some information that allows them to be told apart, for example char* or int*, which would leave us with not 32 bits but fewer?
While typing this question I started thinking: maybe it is all syntactic sugar and really only for the compiler? Maybe a raw pointer is just 32 bits and doesn't care about the type? Is that the case?
You might be confused by compile time versus run time.
During compilation, gcc (or any C compiler) knows the type of a pointer, and in particular the type of the data pointed to by that pointer variable. So gcc can emit the right machine code. An increment of an int * variable (on a 32 bit machine with 32 bit int) is translated to an increment of 4 (bytes), while an increment of a char * variable is translated to an increment of 1.
During runtime, the compiled executable (it does not care or need gcc) is only dealing with machine pointers, usually addresses of bytes (or of the start of some word).
Types (in C programs) are not known during runtime.
Some other languages (Lisp, Python, Javascript, ....) require the types to be known at runtime. In recent C++ (but not C) some objects (those having virtual functions) may have RTTI.
It is indeed syntactic sugar. Consider the following code fragment:
int t[2];
int a = t[1];
The second line is equivalent to:
int a = *(t + 1); // pointer addition
which itself is equivalent to:
int a = *(int*)((char*)t + 1 * sizeof(int)); // integer addition
After the compiler has checked the types it drops the casts and works only with addresses, lengths and integer addition.
Yes. Raw pointer is 32 bits of data (or 16 or 64 bits, depending on architecture), and does not contain anything else. Whether it's int *, char *, struct sockaddr_in * is just information for compiler, to know what is the number to actually add when incrementing, and for the type it's going to have when you dereference it.
Your hypothesis is correct: to see how different kinds of pointer are handled, try running this program:
#include <stdio.h>

int main()
{
    char * pc = 0;
    int * pi = 0;
    /* %p expects a void *, hence the casts */
    printf("%p\n", (void *)(pc + 1));
    printf("%p\n", (void *)(pi + 1));
    return 0;
}
You will note that adding one to a char* increased its numeric value by 1, while doing the same to the int* increased by 4 (which is the size of an int on my machine).
It's exactly as you say in the end - types in C are just a compile-time concept that tells to the compiler how to generate the code for the various operations you can perform on variables.
In the end pointers just boil down to the address they point to, the semantic information doesn't exist anymore once the code is compiled.
Incrementing an int* pointer is different from incrementing a char* solely because the pointer variable is declared as int*. You can cast an int* to char* and then it will increment by 1 byte.
So, yes, it is all just syntactic sugar. It makes some kinds of array processing easier and confuses void* users.
I'm writing a linux kernel module that makes use of the exported symbol open_exec
struct file *open_exec(const char *name)
It returns a pointer, and I can check for an error with the IS_ERR macro:
if (IS_ERR(file))
return file;
During compile time, I get this warning:
warning: return makes integer from pointer without a cast
This is because my function here returns an integer. If I try to cast it:
return (int) file;
I don't get a warning on my 32bit machine, but I do on my 64bit machine:
warning: cast from pointer to integer of different size
This is because the sizeof of an int and a pointer are the same on 32bit, but they differ on a 64bit machine.
Casting it or not, the code appears to work. I'd just like to get rid of the warning.
How do I properly cast a pointer to an integer and get the value I expect, while not getting a compiler warning? The value I expect is essentially an integer listed in include/asm-generic/errno-base.h of the linux kernel code base.
Since I'm only looking at the pointer as if it was an integer in the case where IS_ERR() is true, I can be sure that it does in-fact only hold an integer value.
The PTR_ERR() macro in linux/err.h, which is where IS_ERR() is also defined, converts a pointer that's really an error code into the appropriate type (a long).
You should use something like:
if (IS_ERR(file))
return PTR_ERR(file);
Search for existing uses of PTR_ERR() in the source and you'll see this is a common pattern.
It might be appropriate for your function to return a long rather than an int - but all error codes should be representable in an int.
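To see why this pattern works, here is a user-space imitation of the idea. It is only a sketch: err_ptr, is_err and ptr_err are lowercase stand-ins written for this example, not the kernel's macros, and open_thing and use_thing are hypothetical. The one detail borrowed from the kernel is that the top 4095 address values are reserved for error codes (MAX_ERRNO).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* the top MAX_ERRNO addresses are treated as error codes */

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if ptr lies in the reserved top-of-address-space error range. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value; only meaningful when is_err(ptr) is true. */
long ptr_err(const void *ptr)
{
    assert(is_err(ptr));
    return (long)(intptr_t)ptr;
}

static int dummy_object;

/* Hypothetical resource opener that fails on request. */
void *open_thing(int fail)
{
    return fail ? err_ptr(-ENOENT) : (void *)&dummy_object;
}

/* Hypothetical caller that returns int, as in the question. */
int use_thing(int fail)
{
    void *h = open_thing(fail);
    if (is_err(h))
        return (int)ptr_err(h);  /* small negative value, always fits in an int */
    return 0;
}
```

The narrowing to int is safe here only because the value is known to be an errno code in the range -4095..-1, which is exactly the guarantee the IS_ERR() check gives you.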
You can't properly cast a pointer to a type of smaller size, period. You could do some conversion if you were sure of what that pointer stored.
For example, if you know that a pointer has only lowest 32 bits set you can just cast it and use some compiler-specific pragma to suppress the warning. Or if you want to hash the pointer for using in something like a hash table you could xor the upper 32 bits with the lower 32 bits.
This can't be decided without more knowledge of how that int is used later.
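For the hash-table case, the xor folding mentioned above could be sketched like this. The names fold_u64, fold_pointer and bucket_for are invented for the example, and the result is only useful as a hash: the original pointer cannot be recovered from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit value into 32 bits by xoring its two halves. */
uint32_t fold_u64(uint64_t bits)
{
    return (uint32_t)(bits >> 32) ^ (uint32_t)bits;
}

/* Hash a pointer; the conversion goes through uintptr_t, so no warning. */
uint32_t fold_pointer(const void *p)
{
    return fold_u64((uint64_t)(uintptr_t)p);
}

/* Pick a bucket in a table with nbuckets slots. */
size_t bucket_for(const void *p, size_t nbuckets)
{
    assert(nbuckets > 0);
    return fold_pointer(p) % nbuckets;
}
```

On a 32 bit target the upper half is simply zero, so fold_pointer degenerates to the plain address value.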
I'm not sure I get how you sometimes want to return a number from errno-base.h and sometimes a pointer: how would the receiving function be able to tell the two apart? That aside, on Linux with GCC:
- int is 32 bits wide irrespective of whether you are on 32 or 64 bit Linux
- pointers are 64 bits wide on 64 bit architectures, and 32 bits wide on 32 bit architectures
- long is 32 bits wide on 32 bit architectures and 64 bits wide on 64 bit architectures
- long long is always 64 bits wide
Hence on a 64 bit architecture casting a pointer to an int means that you cast a 64 bit value to a 32 bit value, and you can be fairly sure that you will lose part of the 64 bit information from the pointer. This is what the compiler warning is all about, as you point out yourself.
If you want to cast from pointer to something 'anonymous' then your choices should be either long, long long or void* -- with the void* being the most portable.
The other alternative is to record it as an offset. That is, if you have a large memory area whose addresses you want to 'cast' to a 32 bit integer, convert them with something like:
static struct mybigbuffer *globalbuffer;
int cast2int(void *x)
{
    /* index of x within the buffer, counted in records */
    return (int)((struct mybigbuffer *)x - globalbuffer);
}
However, that only works if you know that your memory will never exceed 2^31 records of globalbuffer and that your pointers are guaranteed to be aligned on record boundaries, etc. So unless you are 100% sure you know what you are doing, I would not recommend this either; stick with long or void* as the safe options.
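If you do go the offset route, a more defensive variant could check those assumptions at run time instead of trusting them. The sketch below is hypothetical throughout: struct record stands in for struct mybigbuffer, record_index is an invented name, and comparing addresses through uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct record { int payload; };  /* hypothetical element type */

/* Return the index of p within base[0..count), or -1 if p is outside
 * the buffer, not on a record boundary, or the index does not fit. */
int32_t record_index(const struct record *base, size_t count, const void *p)
{
    assert(base != NULL);
    uintptr_t start = (uintptr_t)base;
    uintptr_t addr = (uintptr_t)p;

    if (addr < start)
        return -1;                      /* before the buffer */
    uintptr_t offset = addr - start;
    if (offset % sizeof(struct record) != 0)
        return -1;                      /* not on a record boundary */
    size_t index = offset / sizeof(struct record);
    if (index >= count || index > INT32_MAX)
        return -1;                      /* past the end or too large */
    return (int32_t)index;
}
```

The caller gets a 32 bit value it can store safely, and a failed check shows up as -1 instead of a silently truncated pointer.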