X86-64, Linux, Windows.
Suppose I want some sort of "free lunch" for tagged pointers. Basically, I want two pointers that point to the same actual memory block but whose bits differ. (For example, I want one bit to be used by the garbage collector, or for some other reason.)
intptr_t ptr = (intptr_t)malloc(sizeof(int));
intptr_t ptr2 = map(ptr | GC_FLAG_REACHABLE); // some magic call
int *p = (int *)ptr;
int *p2 = (int *)ptr2;
*p = 10;
*p2 = 20;
assert(*p == 20);
assert(p != p2);
On Linux, mmap() the same file twice. Same thing on Windows really, but it has its own set of functions for that.
Mapping the same memory (mmap on POSIX as Ignacio mentions, MapViewOfFile on Windows) to multiple virtual addresses may provide you some interesting coherency puzzles (are writes at one address visible when read at another address?). Or maybe not. I'm not sure what all the platform guarantees are.
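A minimal sketch of the double-mapping idea on Linux/POSIX, assuming a scratch file under /tmp as the backing store. The helper name map_twice is made up for illustration. Note that the kernel chooses both addresses here, so this does not by itself produce an address equal to ptr | GC_FLAG_REACHABLE.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the same file-backed memory at two different
 * virtual addresses. Returns 0 on success, -1 on failure. */
int map_twice(size_t len, int **first, int **second)
{
    char path[] = "/tmp/alias-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* the file persists while it is open or mapped */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                           /* mappings stay valid after close */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }
    *first = a;
    *second = b;
    return 0;
}
```

With both mappings created as MAP_SHARED views of the same file, a store through one pointer is observed through the other, which is the behaviour the question's pseudocode asks for.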
More commonly, one simply reserves a few bits in the pointer and shifts things around as necessary.
If all your objects are aligned to 8-byte boundaries, it's common to simply store tags in the 3 least-significant bits of a pointer, and mask them off before dereferencing (as thkala mentions). If you choose a higher alignment, such as 16 bytes or 32 bytes, then there are 4 or 5 least-significant bits that can be used for tagging. Equivalently, choose a few most-significant bits for tagging, and shift them off before dereferencing. (Sometimes non-contiguous bits are used, for example when packing pointers into the signalling NaNs of IEEE-754 floats (2^23 values) or doubles (2^51 values).)
Continuing on the high end of the pointer, current implementations of x86-64 use at most 48 bits out of a 64-bit pointer (0x0000000000000000-0x00007fffffffffff + 0xffff800000000000-0xffffffffffffffff) and Linux and Windows only hand out addresses in the first range to userspace, leaving 17 most-significant bits that can be safely masked off. (This is neither portable nor guaranteed to remain true in the future, though.)
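As a rough sketch of high-bit tagging: the code below assumes user-space addresses fit in the low 48 bits (true for the x86-64 configurations described above, but not guaranteed), and uses only 16 of the spare upper bits. The names tag_high, untag_high and high_tag are illustrative, not from any library.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 48   /* assumption: user-space addresses fit in 48 bits */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack a 16-bit tag into the unused upper bits of a user-space pointer. */
uint64_t tag_high(const void *p, uint16_t tag)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~ADDR_MASK) == 0);    /* the upper bits must be free */
    return bits | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the original pointer by masking the tag off. */
void *untag_high(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}

/* Extract the tag stored in the upper 16 bits. */
uint16_t high_tag(uint64_t tagged)
{
    return (uint16_t)(tagged >> ADDR_BITS);
}
```

The tagged value must never be dereferenced directly; always go through untag_high first.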
Another approach is to stop considering "pointers" and simply use indices into a larger memory array, as the JVM does with -XX:+UseCompressedOops. If you've allocated a 512MB pool and are storing 8-byte aligned objects, there are 2^26 possible object locations, so a 32-bit value has 6 bits to spare in addition to the index. A dereference will require adding the index times the alignment to the base address of the array, saved elsewhere (it's the same for every "pointer"). If you look at things carefully, this is simply a generalization of the previous technique (which always has base at 0, where things line up with real pointers).
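The index scheme can be sketched as follows, using the numbers from the paragraph above (8-byte slots, a 26-bit index, 6 tag bits). The function names and the layout with the tag in the top bits are assumptions for illustration. This is not how the JVM encodes compressed oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJ_ALIGN  8u    /* every object starts on an 8-byte boundary */
#define INDEX_BITS 26u   /* 512 MB / 8 bytes = 2^26 slots */
#define TAG_BITS   6u    /* 32 - 26 bits left over for tags */

/* Encode an object inside the pool as (tag << 26) | slot index. */
uint32_t handle_make(const unsigned char *base, const void *obj, uint32_t tag)
{
    size_t offset = (size_t)((const unsigned char *)obj - base);
    assert(offset % OBJ_ALIGN == 0);
    assert(offset / OBJ_ALIGN < (1u << INDEX_BITS));
    assert(tag < (1u << TAG_BITS));
    return (tag << INDEX_BITS) | (uint32_t)(offset / OBJ_ALIGN);
}

/* Dereference: base address plus index times alignment. */
void *handle_deref(unsigned char *base, uint32_t h)
{
    uint32_t index = h & ((1u << INDEX_BITS) - 1u);
    return base + (size_t)index * OBJ_ALIGN;
}

/* Extract the 6 spare tag bits. */
uint32_t handle_tag(uint32_t h)
{
    return h >> INDEX_BITS;
}
```

Every handle is half the size of a 64-bit pointer, at the cost of one multiply-add per dereference.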
Once upon a time I worked on a Prolog implementation that used the following technique to have spare bits in a pointer:
Allocate a memory area with a known alignment. malloc() usually allocates memory with a 4-byte or 8-byte alignment. If necessary, use posix_memalign() to get areas with a higher alignment size.
Since the resulting pointer is aligned to intervals of multiple bytes, but it represents byte-accurate addresses, you have a few spare bits that will by definition be zero in the memory area pointer. For example a 4-byte alignment gives you two spare bits on the LSB side of the pointer.
You OR (|) your flags with those bits and now have a tagged pointer.
As long as you take care to properly mask the pointer before using it for memory access, you should be perfectly fine.
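The steps above can be sketched roughly like this, assuming at least 4-byte alignment (two spare low bits). The names ptr_tag, ptr_untag and ptr_flags are hypothetical helpers, not part of any standard API.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)   /* assumes at least 4-byte alignment */

/* OR the flags into the spare low bits of an aligned pointer. */
uintptr_t ptr_tag(void *p, unsigned flags)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);     /* low bits must start out zero */
    assert((flags & ~TAG_MASK) == 0);   /* flags must fit in the spare bits */
    return bits | flags;
}

/* Mask the flags off before using the pointer for memory access. */
void *ptr_untag(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}

/* Read back the flags. */
unsigned ptr_flags(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}
```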
Related
I am confused as to how C pointers actually reference the memory address of a variable. I am probably missing something here, but if, for example an int is 32 bits (like in C), then this would be stored in 4 bytes.
If I am not mistaken then each memory address tends to be a byte in size, as these are generally the smallest units of addressable memory. So if an int takes up 4 bytes, then wouldn't it have 4 memory addresses? (as it is stored over 4 8-bit memory addresses).
If this is the case, then how come a pointer only holds one memory address? (or rather only displays one when printed, if it holds more?). Is this simply the first address that stores the int? (assuming they are stored contiguously).
I have tried to find answers online but this has only led to further confusion.
Yes, technically, there would be four addressable bytes for the int you describe. But the pointer points to the first byte, and reading an int from it reads that byte and the subsequent three bytes to construct the int value.
If you tried to read from a pointer referring to one of the other three bytes, at the very least you'd get a different value (because it would read the remains of the one int, and additional bytes next to it), and on some architectures which require aligned reads (so four byte values must begin at an address divisible by four), your program could crash.
The language tries to protect you from reading a misaligned pointer like that; if you have an int*, and add 1 to it, it doesn't increment the raw address by one, it increments it by sizeof(int) (for your case, 4), so that pointers to arrays of int can traverse the array value by value without accidentally reading a value that's logically partially from one int, and partially from its neighbor.
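A small sketch of that scaling rule: the function below (a made-up name) measures how many bytes an int pointer moves when 1 is added to it.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between an int pointer and the same pointer plus one. */
size_t int_pointer_stride(void)
{
    int values[2] = {1, 2};
    int *p = values;
    /* Compare the two addresses as byte (char) pointers. */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

The result always equals sizeof(int), whatever that happens to be on the platform.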
A pointer points to the starting address of your type. If you google "pointer size", you will see that it generally depends on your CPU architecture, not on your primitive type or object.
What is the size of a pointer?
which will hopefully support your thoughts although the question is about c++
One byte is the smallest addressable unit, but that doesn't mean an address is only one byte. Otherwise you'd only have 256 bytes you could address! Pointers are typically either 4 or 8 bytes in size.
The address of a variable refers to the address of its first byte. The remaining bytes are understood to immediately follow it, and the number of bytes is part of the data type.
The specifics depend on the actual architecture of the machine (what kind a CPU, what kind of memory, etc) so I am assuming you care about a modern 32 bit processor where both an int and a pointer take four bytes of memory. Keep in mind that the same ideas apply when integers are two bytes and when pointers are 8 bytes but we have to focus on just one set of examples.
Having said all that, you are completely correct that an int uses four contiguous bytes of memory which means it has four separate memory addresses and that a pointer holds only one address - it is the address of the first byte of the int.
So the CPU has an instruction for reading an int. The instruction takes the address of the first byte of the int and reads the entire int, all four bytes. And that's why you only need one address to read an entire int. So int i = 42 stores a four-byte integer in i and the value is interpreted to mean the number 42.
But a pointer is also an integer where the value is a memory address, so it can be read exactly the same way. So int *p = (int *)42 stores a four-byte integer in p and the value is interpreted to mean memory address 42.
All this gets complicated when you start talking about the order the bytes are stored in, so we won't talk about that (however, if you want to find out, the term is endianness; see https://en.wikipedia.org/wiki/Endianness).
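For the curious, here is a hedged sketch of how one might look at the byte stored at an integer's own address. The function name is invented for this example, and the result depends on the machine's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the byte stored at the lowest address of a 32-bit integer. */
unsigned char first_byte_of(uint32_t value)
{
    unsigned char first;
    memcpy(&first, &value, 1);   /* copy the byte at the integer's own address */
    return first;
}
```

On a little-endian machine such as x86, first_byte_of(0x01020304) yields 0x04 (the least significant byte); on a big-endian machine it yields 0x01.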
Does the size_t value of the virtual memory pointer returned by malloc() have an upper boundary?
I am wondering whether I can safely set the most significant bit of a 64 bits pointer to indicate that this is not a pointer but a literal integer instead.
malloc returns a void* not an integer. Casting a pointer to an integer is not giving you the (virtual memory) address, but some value that has to adhere to the semantics as defined in the C language standard (0 for a null pointer and adding and subtracting is related to pointer arithmetic), but that's about it.
You must make no assumptions whatsoever about the values of pointers-cast-to-integers other than that. As a matter of fact a C implementation may very well be in its right to tag non-null pointer cast to integer with some internal information in the upper bits.
As @datenwolf's answer states, you can't make any assumptions about how malloc is providing you the memory address. The MSB may well contain important bits that you could overwrite, if you attempted to use them to store metadata. I have worked on a 32-bit system that returned addresses with bits set in the MSB of addresses (not from malloc, but other system-specific memory allocation functions).
However, malloc is guaranteed to return an address that is suitably aligned for your system. For example, on a 32-bit system you'll typically get at least a 4-byte aligned pointer, and on a 64-bit system at least an 8-byte aligned pointer. This means that, in practice, the lower 2 or 3 bits respectively will be zero. You could increase the number of guaranteed bits by using memalign instead. Using them has essentially the same effect as storing metadata in the most significant bit. To get/set the literal, you can just shift it up/down into the remaining bits.
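A minimal sketch of the low-bit variant, assuming pointers are at least 2-byte aligned so that bit 0 is free: a word with bit 0 set holds a literal integer shifted up by one, otherwise it holds a pointer. The type value_t and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;   /* hypothetical word: low bit 1 = integer, 0 = pointer */

/* Shift the integer up by one bit and mark it with a 1 in bit 0. */
value_t value_from_int(intptr_t n)
{
    return ((uintptr_t)n << 1) | 1u;
}

value_t value_from_ptr(void *p)
{
    assert(((uintptr_t)p & 1u) == 0);   /* relies on at least 2-byte alignment */
    return (uintptr_t)p;
}

int value_is_int(value_t v)
{
    return (int)(v & 1u);
}

/* Clear the marker bit, reinterpret as signed, and halve (exact: the value is even). */
intptr_t value_to_int(value_t v)
{
    assert(value_is_int(v));
    return (intptr_t)(v & ~(uintptr_t)1) / 2;
}

void *value_to_ptr(value_t v)
{
    assert(!value_is_int(v));
    return (void *)v;
}
```

The integer range shrinks by one bit, and the unsigned-to-signed conversion in value_to_int assumes the usual two's-complement behaviour.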
However, I wouldn't suggest either method. Save yourself some heartache, and allocate just a little more memory to store the flag. Unless you've got billions of them, it's really not worth it.
As far as size_t is concerned, it can hold values up to SIZE_MAX, which limits.h defines as follows:
#ifndef SIZE_MAX
#ifdef _WIN64
#define SIZE_MAX _UI64_MAX
#else
#define SIZE_MAX UINT_MAX
#endif
#endif
where _UI64_MAX and UINT_MAX are defined as follows:
#define UINT_MAX 0xffffffff /* maximum unsigned int value */
#define _UI64_MAX 0xffffffffffffffffui64
As far as malloc() is concerned, on 32-bit Windows it can return any address within the 2 GB user-mode address space, and on 64-bit Windows it can return any address within the 8 TB user-mode address space.
On 32-bit systems, Windows NT 4 introduced the /3GB boot option. With it, malloc() can return any address within a 3 GB user-mode address space.
For more details have a look at Mark Russinovich's article here.
I've heard that reads and writes of aligned ints are atomic and safe. When does the system make non-malloc'd globals unaligned, other than with packed structures and casting/pointer arithmetic on byte buffers?
[x86-64 Linux] In all of my normal cases, the system always chooses integer locations that don't get torn across words (for example, two bytes in one word and the other two bytes in the next). Can anyone post a program/snippet (C or assembly) that forces a global variable to an unaligned address, such that the integer gets torn and the system has to use two reads to load one integer value?
When I run the program below, the addresses are close to each other, such that multiple variables are within 64 bits, but word tearing is never seen (smartness in the system or the compiler?)
#include <stdint.h>
#include <stdio.h>
int a;
char b;
char c;
int d;
int e = 0;
/* Returns 1 if address p is a multiple of N. */
int isaligned(void *p, int N)
{
    return ((uintptr_t)p % (uintptr_t)N) == 0;
}
int main(void)
{
    printf("processor is %zu byte mode \n", sizeof(int *));
    printf("a=%p/b=%p/c=%p/d=%p/f=%p\n", (void *)&a, (void *)&b, (void *)&c, (void *)&d, (void *)&e);
    printf(" check for 64bit alignment of test result of 0x80 = %d \n", isaligned((void *)0x80, 64));
    printf(" check for 64bit alignment of a result = %d \n", isaligned(&a, 64));
    printf(" check for 64bit alignment of d result = %d \n", isaligned(&e, 64));
    return 0;
}
Output:
processor is 8 byte mode
a=0x601038/b=0x60103c/c=0x60103d/d=0x601034/f=0x601030
check for 64bit alignment of test result of 0x80 = 1
check for 64bit alignment of a result = 0
check for 64bit alignment of d result = 0
How does a read of a char happen in the above case ? Does it read from 8 byte aligned boundary (in my case 0x601030 ) and then go to 0x60103c ?
Memory access granularity is always word size isn't it ?
Thx.
1) Yes, there is no guarantee that unaligned accesses are atomic, because [at least sometimes, on certain types of processors] the data may be written as two separate writes - for example if you cross over a memory page boundary [I'm not talking about 4KB pages for virtual memory, I'm talking about DDR2/3/4 pages, which is some fraction of the total memory size, typically 16Kbits times whatever the width is of the actual memory chip - which will vary depending on the memory stick itself]. Equally, on other processors than x86, you get a trap for reading unaligned memory, which would either cause the program to abort, or the read be emulated in software as multiple reads to "fix" the unaligned read.
2) You could always make an unaligned memory region by something like this:
char *ptr = malloc(sizeof(long long) * number + 2); /* +2 leaves room for the offset below */
long long *unaligned = (long long *)&ptr[2];        /* deliberately misaligned */
for (i = 0; i < number; i++)
    temp = unaligned[i];
By the way, your alignment check checks if the address is aligned to 64 bytes, not 64 bits. You'll have to divide by 8 to check that it's aligned to 64 bits.
3) A char is a single byte read, and the address will be on the actual address of the byte itself. The actual memory read performed is probably for a full cache-line, starting at the target address, and then cycling around, so for example:
0x60103d is the target address, so the processor will read a cache line of 32 bytes, starting at the 64-bit word we want: 0x601038 (and as soon as that's completed the processor goes on to the next instruction - meanwhile the next read will be performed to fill the cacheline), then cacheline is filled with 0x601020, 0x601028, 0x601030. But should we turn the cache off [if you want your 3GHz latest x86 processor to be slightly slower than a 66MHz 486, disabling the cache is a good way to achieve that], the processor would just read one byte at 0x60103d.
4) Not on x86 processors, they have byte addressing - but for normal memory, reads are done on a cacheline basis, as explained above.
Note also that "may not be atomic" is not at all the same as "will not be atomic", so you'll probably have a hard time making it go wrong at will. You really need to get all the timings of two different threads just right, and straddle cachelines, straddle memory page boundaries, and so on to make it go wrong. It will happen when you don't want it to happen, but trying to make it go wrong can be darn hard [trust me, I've been there, done that].
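On the alignment check mentioned above, a corrected helper might look like the sketch below. It takes the alignment in bytes, so a 64-bit check passes 64 / 8. The name is_aligned is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p is a multiple of `bytes`, 0 otherwise. */
int is_aligned(const void *p, size_t bytes)
{
    assert(bytes != 0);
    /* uintptr_t keeps the full address, unlike a cast to int. */
    return ((uintptr_t)p % bytes) == 0;
}
```

For example, is_aligned(&a, 64 / 8) asks whether a sits on a 64-bit boundary, whereas passing 64 asks about a 64-byte boundary.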
It probably doesn't, outside of those cases.
In assembly it's trivial. Something like:
.org 0x2
myglobal:
.word SOME_NUMBER
But on Intel, the processor can safely read unaligned memory. It might not be atomic, but that might not be apparent from the generated code.
Intel, right? The Intel ISA has single-byte read/write opcodes. Disassemble your program and see what it's using.
Not necessarily - you might have a mismatch between memory word size and processor word size.
1) This answer is platform-specific. In general, though, the compiler will align variables unless you force it to do otherwise.
2) The following will require two reads to load one variable when run on a 32-bit CPU:
uint64_t huge_variable;
The variable is larger than a register, so it will require multiple operations to access. You can also do something similar by using packed structures:
struct __attribute__ ((packed)) unaligned
{
char buffer[2];
int unaligned;
char buffer2[2];
} sample_struct;
3) This answer is platform-specific. Some platforms may behave like you describe. Some platforms have instructions capable of fetching a half-register or quarter-register of data. I recommend examining the assembly emitted by your compiler for more details (make sure you turn off all compiler optimizations first).
4) The C language allows you to access memory with byte-sized granularity. How this is implemented under the hood and how much data your CPU fetches to read a single byte is platform-specific. For many CPUs, this is the same as the size of a general-purpose register.
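To see the effect of the packed structure from point 2, one can compare member offsets with and without packing. This sketch uses the GCC/Clang attribute syntax, and the struct names are made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang-specific attribute: no padding is inserted between members. */
struct __attribute__((packed)) packed_sample {
    char buffer[2];
    int  unaligned;   /* starts at offset 2, so it is not 4-byte aligned */
    char buffer2[2];
};

/* Same members without packing: the compiler pads before the int. */
struct padded_sample {
    char buffer[2];
    int  aligned;
    char buffer2[2];
};
```

With GCC or Clang, offsetof(struct packed_sample, unaligned) evaluates to 2, while the padded version places its int at an offset that is a multiple of its alignment.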
The C standards guarantee that malloc(3) returns a memory area that complies with the strictest alignment requirements, so this just can't happen in that case. If there are unaligned data, they are probably read/written in pieces (that depends on the exact guarantees the architecture provides).
On some architectures unaligned access is allowed, on others it is a fatal error. When allowed, it is normally much slower than aligned access; when not allowed, the compiler must take the pieces and splice them together, and that is slower still.
Characters (really bytes) are normally allowed to have any byte address. The instructions working with bytes just get/store the individual byte in that case.
No, memory access is according to the width of the data. But real memory access is in terms of cache lines (read up on CPU cache for this).
Non-aligned objects can never come into existence without you invoking undefined behavior. In other words, there is no sequence of actions, all having well-defined behavior, which a program can take that will result in a non-aligned pointer coming into existence. In particular, there is no portable way to get the compiler to give you misaligned objects. The closest thing is the "packed structure" many compilers have, but that only applies to structure members, not independent objects.
Further, there is no way to test alignedness in portable C. You can use the implementation-defined conversions of pointers to integers and inspect the low bits, but there is no fundamental requirement that "aligned" pointers have zeros in the low bits, or that the low bits after conversion to integer even correspond to the "least significant" bits of the pointer, whatever that would mean. In other words, conversions between pointers and integers are not required to commute with arithmetic operations.
If you really want to make some misaligned pointers, the easiest way to do it, assuming alignof(int)>1, is something like:
char buf[2*sizeof(int)+1];
int *p1 = (int *)buf, *p2 = (int *)(buf+sizeof(int)+1);
It's impossible for both buf and buf+sizeof(int)+1 to be simultaneously aligned for int if alignof(int) is greater than 1. Thus at least one of the two (int *) casts gets applied to a misaligned pointer, invoking undefined behavior, and the typical result is a misaligned pointer.
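If the goal is merely to read or write an int at an arbitrary byte offset without undefined behavior, a common approach is to copy the bytes with memcpy instead of casting the pointer. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Read an int stored at an arbitrary (possibly misaligned) byte address. */
int load_int_unaligned(const unsigned char *src)
{
    int value;
    memcpy(&value, src, sizeof value);   /* byte-wise copy: no alignment requirement */
    return value;
}

/* Store an int at an arbitrary (possibly misaligned) byte address. */
void store_int_unaligned(unsigned char *dst, int value)
{
    memcpy(dst, &value, sizeof value);
}
```

No misaligned int pointer is ever formed here, so the alignment rules are not violated.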
I was reading through a presentation on the implementation of malloc, and on slide 7 it suggests storing a region's size and availability in a single word to save space. The alternative is to use two words, which is wasteful as the availability bit only needs to be 0 or 1.
This is the given explanation:
If blocks are aligned, low-order address bits are always 0
Why store an always-0 bit?
Use it as allocated/free flag! When reading size word, must mask out this bit
http://courses.engr.illinois.edu/cs241/sp2012/lectures/09-malloc.pdf
But I'm not really understanding how this works and how it could be implemented in C. Why is one bit of the size integer always 0?
If blocks are aligned, low-order address bits are always 0
This is the key to understanding what is going on. Many CPUs require that multibyte primitive values be stored at addresses divisible by the number of bytes in the primitive: 16-bit primitives need to be stored at even addresses; 32-bit ints need to be stored at addresses divisible by four, and so on. An attempt to access an int through a pointer that corresponds to an odd address results in a bus error.
In systems like that malloc must always return an address suitable for storing any primitive supported by the given CPU. Therefore, if the CPU supports 32-bit integers, all addresses returned by malloc must be divisible by 4. Such addresses are said to be aligned. To comply, malloc implementations pad the sizes of blocks requested by the program by 0 to 3 bytes at the end so that the length is divisible by 4. As a consequence of this decision, the last two bits of an address of an aligned block will always be zero. An implementation of malloc can use these bits for its own purposes, as long as they are "masked out" before returning the result to callers.
malloc(3) (as specified by Posix) should
return a fresh block of memory; or NULL on failure; the returned pointer is not an alias of any other pointer in the program
return a suitably aligned block of memory. Alignment constraints are compiler, ABI, and processor specific. (Often, the alignment should be two words).
The size is not always zero. (actually, it is never zero). You could round it up to a multiple of two words, and use the last bit as a used/free bit.
However, pointers returned by malloc should be suitably aligned, e.g. to 8 bytes. So their bottom 3 bits are zero, and the allocated size in bytes of the malloc-ed zone is a multiple of 8 bytes (above the requested size passed to malloc), so the last 3 bits are zero (and you could use the last bit for other purposes, e.g. a used/free bit).
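A sketch of the size word the slide describes, assuming block sizes are rounded up to a multiple of 8 so that the low bit is free to act as the allocated/free flag. All names here are illustrative and do not come from any particular malloc implementation.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_ALIGN 8u             /* assumed block granularity */
#define ALLOC_BIT   ((size_t)1)    /* lowest bit of the size word */

/* Round a request up to a multiple of BLOCK_ALIGN so the low 3 bits are zero. */
size_t round_up_size(size_t n)
{
    return (n + (BLOCK_ALIGN - 1)) & ~(size_t)(BLOCK_ALIGN - 1);
}

/* Combine a block size and the allocated flag into one word. */
size_t header_pack(size_t block_size, int allocated)
{
    assert(block_size % BLOCK_ALIGN == 0);
    return block_size | (allocated ? ALLOC_BIT : 0);
}

/* Mask the flag out when reading the size. */
size_t header_size(size_t header)
{
    return header & ~ALLOC_BIT;
}

int header_is_allocated(size_t header)
{
    return (int)(header & ALLOC_BIT);
}
```

For a 13-byte request, round_up_size gives 16, and the packed header for an allocated block is 17 (binary 10001): size 16 with the flag set.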
Why is one bit of the size integer always 0?
I see why this is confusing, but I don't think that's what they're saying on slide 7. They're saying that the low-order address bits are always 0.
Memory addresses of objects are aligned to specific boundaries, which means objects are aligned to a memory address that is a multiple of their size.
So a 64-bit integer is aligned to an eight-byte boundary;
0x7fff315470d8
If a pointer is always aligned to an eight-byte boundary, then the low-order three bits are always zero, i.e. 0x8 (base 16) = 1000 (base 2).
Basically, you can stick whatever you want in those low-order bits, so long as you take them out before dereferencing the pointer. In the 64-bit case you have 3 bits that are always 0, so you can store 3 "flags". In the case of this power point, they're saying take that lowest bit and use it for an "allocated" flag. Stick a 1 in it as long as the memory is allocated, mask it out when you send the pointer to the user.
When you call C's malloc, is there any guarantee about what the first few low order bits will be? If you're writing a compiler/interpreter for a dynamic language but want to have fixnums of the form bbbbbbbb bbbbbbbb . . . bbbbbbb1 (where b is a bit) and pointers of the form bbbbbbbb bbbbbbbb . . . bbbbbbb0 (or vice versa), is there any way to guarantee that malloc will return pointers that fit with such a scheme?
Should I just allocate two more bytes than I need, increment the return value by one if necessary to fit with the bit scheme, and store the actual pointer returned by malloc in the second byte so that I know what to free?
Can I just assume that malloc will return a pointer with zero as the final bit? Can I assume that an x86 will have two zero bits at the end and that an x64 will have four zero bits at the end?
C doesn't guarantee anything about the low-order bits being zero, only that the pointer is aligned for all possible types. In practice 2 or 3 of the lowest bits will probably be zero, but do not count on it.
You can ensure it yourself, but a better way is to use something like posix_memalign.
If you want to do it yourself you need to overallocate memory and keep track of the original pointer. Something like (assuming you want 16-byte alignment, could be made generic, not tested):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
void* my_aligned_malloc16(size_t sz) {
    void* p = malloc(sz + 15 + 16); // 15 to ensure alignment, 16 for payload
    if (!p) return NULL;
    uintptr_t aligned_addr = ((uintptr_t)p + 15) & ~(uintptr_t)15;
    void* aligned_ptr = (void*) aligned_addr;
    memcpy(aligned_ptr, &p, sizeof(void*)); // save original pointer as payload
    return (void*)(aligned_addr + 16); // return aligned address past payload
}
void my_aligned_free16(void* ptr) {
    if (!ptr) return; // mirror free(NULL), which is a no-op
    void** original_pointer = (void**)( ((uintptr_t)ptr) - 16 );
    free(*original_pointer);
}
As you can see this is rather ugly, so prefer using something like posix_memalign. Your runtime probably has a similar function if that one is unavailable, e.g. memalign (thanks @R..) or _aligned_malloc when using MSVC.
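For comparison, a minimal sketch of the posix_memalign route on a POSIX system. The wrapper name alloc_aligned16 is made up, and memory obtained this way is released with plain free().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to 16, or return NULL on failure. */
void *alloc_aligned16(size_t size)
{
    void *p = NULL;
    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, 16, size) != 0)
        return NULL;
    assert(((uintptr_t)p & 0xF) == 0);   /* low 4 bits are zero */
    return p;
}
```

The alignment argument must be a power of two and a multiple of sizeof(void *), which 16 satisfies on common platforms.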
Malloc is guaranteed to return a pointer with correct alignment for any data type, which on some architectures will be four-byte boundaries and on others might be eight. You can imagine an environment where the alignment would be one, though. The C library will have a macro somewhere that will tell you what the actual rounding is.
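In C11, one way to query that alignment is alignof(max_align_t). The sketch below (with an invented function name) converts it into the number of low-order bits that are zero in any address with that alignment. It assumes that converting a pointer to an integer preserves the address bits, which holds on mainstream platforms but is not required by the standard.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of low-order zero bits in an address aligned to max_align_t. */
unsigned guaranteed_zero_bits(void)
{
    size_t a = alignof(max_align_t);   /* always a power of two */
    unsigned bits = 0;
    while (a > 1) {
        a >>= 1;
        bits++;
    }
    return bits;
}
```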
My answer may be slightly off-topic, but my cockpit alarms are going off like mad -- "Pull Up! Pull Up!"
This scheme sounds much too dependent on the HW architecture and the deep implementation details of the compiler and the C libraries. C allows one to get to this level of the nitty-gritty bits and bytes and to peek at the bit pattern and alignment of objects in order to code device drivers and the like. You, on the other hand, are building a high-level entity, an interpreter and run-time. C can do this as well, but you don't have to use C's low-level bit twiddling capabilities; IMHO, you should avoid doing so. Been there, done that, have the holes in the feet to show for it.
If you are successful in creating your interpreter, you will want to port it to a different platform, where the rules about bit representation and alignment can differ, perhaps radically. Coming up with a design that does not depend on such tricky bit manipulation and peeking under the covers will help you immensely down the road.
-k