I am playing around with a programming language implementation, and I'm wondering how (ill) advised it is to press into service the least significant bits of a function pointer to store data.
Are there any major platforms (AMD64/{Windows/Linux/MacOS}, Arm/{iOS,Android}) in which the 2 least significant bits are ever non-zero in function pointers? That is, is the alignment for the code at least 4 on major platforms?
I can tell you that Apple's 64-bit runtime (both ARM64 and Intel, I think) uses the least significant bits for flags broadly as you propose. In Objective-C everything is an object and, to be compatible with C, pretty much every object lives on the heap and is recorded by its pointer. In 64-bit mode they've allowed very small objects to live on the stack by fitting them into 62 bits and using the low two to indicate that this isn't really a pointer but a literal object. So you can get short strings, object-wrapped 32-bit and below numbers, etc, directly into the 'pointer' and not put anything on the heap.
However Apple does not do this with the 32-bit runtime (even the 'modern' one as on iOS). So it might be worth researching why that is. Admittedly it could just be because of some architectural quirk carried over from the PowerPC.
As has been pointed out to me in the comments (and why this is now tagged as community wiki), the C standard differentiates between the storage of function pointers specifically and all other kinds of pointer. So the above comment may or may not be relevant — I nevertheless believe it is because closures are a separate thing again from data and from functions, in compiled languages the code itself usually having been compiled in advance and the closure itself just being the data to fill the gaps. But the point I'm trying to make is that there are shipping, robust systems out there that assume they can reuse the least significant bits of pointers on systems that require alignment.
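The low-bit reuse described above, applied to data pointers, can be sketched roughly as follows. The helper names and the 2-bit tag width are hypothetical, and the sketch assumes a platform where converting a pointer to uintptr_t and back preserves the address bits, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers: the names and the 2-bit tag width are assumptions. */
#define TAG_MASK ((uintptr_t)0x3)

/* Pack a small tag into the two low bits of a sufficiently aligned pointer. */
static void *tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0); /* pointer must be at least 4-byte aligned */
    assert(tag <= TAG_MASK);        /* tag must fit in the two spare bits */
    return (void *)(bits | tag);
}

/* Read the tag back out of a tagged pointer. */
static unsigned pointer_tag(const void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the tag bits so the pointer can be dereferenced again. */
static void *untag_pointer(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

A tagged pointer must always go through untag_pointer before it is dereferenced. Whether the same trick is safe for function pointers depends on the code alignment of the target, which is what the other answers discuss.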
ARM has two modes - legacy (AKA "ARM" proper) and Thumb. In ARM mode, instructions are aligned on 4 byte boundary, in Thumb - on 2 byte. The CPU uses the zeroth bit for calls that switch mode: to go from ARM to Thumb, you issue a branch-and-switch-mode command to an address with its rightmost bit set to 1.
The preferred mode for native userland code happens to be Thumb on the two most popular ARM-based platforms (iOS and Android). Yet interworking with ARM has to be supported. So there are effectively no unused bits in the address.
On ARM the low bit has a special meaning: It switches between Thumb and traditional mode. In Thumb mode the instructions are 16-bit aligned so both bits are used.
On AMD64 and x86 depending on the optimization mode functions may be located at odd addresses. This means that the low two bits are always in use.
There's no major modern platform that doesn't require its instructions to be at least 4-byte aligned, and I don't know of any C compiler which uses the low bits for its own purposes. Blah blah blah about undefined behavior of operating on cast pointers in C, but you're safe.
EDIT: As pointed out below, for ARM Thumb, you only get one bit, and you need to make sure to clear it before you do the jump. For i386, some linkers won't do the alignment when optimization is disabled.
Considering a 32-bit system (such as an ARM RISC MCU), how can one ensure that 16-bit variables are written/ read in an atomic way? Based on this doc, If I understood correctly, both 16-bit and 8-bit operations are atomic, but only assuming the memory is aligned. Question is, does the compiler always align the memory to 32-bit words (excluding cases like packed structures)?
The rationale here is to use uint16_t whenever possible instead of uint32_t for better code portability between 32-bit and 16-bit platforms. This is not about typedefing a type that is different on either platform (16 or 32 bit).
The compiler may align any (scalar) object as it pleases unless it is part of an array or similar; there's no restriction or guarantee from C. Arrays are guaranteed to be allocated contiguously however, with no padding. And the first member of a struct/union is guaranteed to be aligned (since the address of the struct/union may be converted to a pointer to the type of the first member).
To get atomic operations, you have to use something like atomic_uint_fast16_t (stdatomic.h) if supported by the compiler. Otherwise any operation in C cannot be regarded as atomic no matter the type, period.
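A minimal sketch of the stdatomic.h approach mentioned above, assuming a C11 compiler that provides the header. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared variable; the name is illustrative only. */
static atomic_uint_fast16_t shared_value;

/* Store a new value as a single atomic operation. */
static void publish_value(uint_fast16_t v)
{
    atomic_store(&shared_value, v);
}

/* Load the current value as a single atomic operation. */
static uint_fast16_t read_value(void)
{
    return atomic_load(&shared_value);
}
```

Here the atomicity comes from the type and the atomic_store/atomic_load calls, not from any assumption about how the compiler aligns or copies a plain uint16_t.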
It is a common mistake to think "8 bit copy is atomic on my CPU so if I use 8 bit types my code is re-entrant". It isn't, because uint8_t x = y; is not guaranteed to be done in a single instruction, nor is it guaranteed to result in atomic machine code instructions.
And even if an atomic instruction is picked, you could still get real-time bugs from code like that. Example with pseudo-machine code:
Store the contents of y in register A.
An interrupt which only occurs once every full moon fires, changing y.
The old value of y is stored in x - yay, this is an atomic instruction!
Now x has the old, outdated value.
Correct real-time behavior would have been to either completely update x before the interrupt hit, or alternatively update it after the interrupt hit.
The place where I work has a special agreement with all the relevant compiler suppliers that they are responsible for notifying our tool chain experts of cases where that assumption is not applicable.
This turned out to be necessary, because it otherwise cannot be reliably determined. Documentation does not say and the standard even less.
Or that is what the tooling people told me.
So it seems that for you the answer is: Ask the supplier.
By conducting a basic test by running a simple C++ program on a normal desktop PC it seems plausible to suppose that sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits ?
For example: in 32 bits architectures -> 4 bytes and in 64 bits architectures -> 8 bytes.
However I remember reading that, it is not like that in general!
So I was wondering what would be such circumstances?
For equality of size of pointers to data types compared with size of pointers to other data types
For equality of size of pointers to data types compared with size of pointers to functions
For equality of size of pointers to target architecture
No, it is not reasonable to assume. Making this assumption can cause bugs.
The sizes of pointers (and of integer types) in C or C++ are ultimately determined by the C or C++ implementation. Normal C or C++ implementations are heavily influenced by the architectures and the operating systems they target, but they may choose the sizes of their types for reasons other than execution speed, such as goals of supporting lower memory use (smaller pointers means less memory used in programs with lots of pointers), supporting code that was not written to be fully portable to any type sizes, or supporting easier use of big integers.
I have seen a compiler targeted for a 64-bit system but providing 32-bit pointers, for the purpose of building programs with smaller memory use. (It had been observed that the sizes of pointers were a considerable factor in memory consumption, due to the use of many structures with many connections and references using pointers.) Source code written with the assumption that the pointer size equalled the 64-bit register size would break.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
Depends. If you're aiming for a quick estimate of memory consumption it can be good enough. But not if your program's correctness depends on it.
(including pointers to functions)
But here is one important remark. Although most pointers will have the same size, function pointers may differ. It is not guaranteed that a void* will be able to hold a function pointer. At least, this is true for C. I don't know about C++.
So I was wondering what would be such circumstances if any?
There can be tons of reasons why it differs. If your program's correctness depends on this size it is NEVER ok to make such an assumption. Check it instead. It shouldn't be hard at all.
You can use this macro to check such things at compile time in C:
#include <assert.h>
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
When compiling, this gives an error message:
$ gcc main.c
In file included from main.c:1:
main.c:2:1: error: static assertion failed: "Pointers are assumed to be exactly 4 bytes"
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
^~~~~~~~~~~~~
If you're using C++, you can skip #include <assert.h> because static_assert is a keyword in C++. (And you can use the keyword _Static_assert in C, but it looks ugly, so use the include and the macro instead.)
Since these two lines are so extremely easy to include in your code, there's NO excuse not to do so if your program would not work correctly with the wrong pointer size.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
It might be reasonable, but it isn't reliably correct. So I guess the answer is "no, except when you already know the answer is yes (and aren't worried about portability)".
Potentially:
systems can have different register sizes, and use different underlying widths for data and addressing: it's not apparent what "target architecture bits" even means for such a system, so you have to choose a specific ABI (and once you've done that you know the answer, for that ABI).
systems may support different pointer models, such as the old near, far and huge pointers; in that case you need to know what mode your code is being compiled in (and then you know the answer, for that mode)
systems may support different pointer sizes, such as the X32 ABI already mentioned, or either of the other popular 64-bit data models described here
Finally, there's no obvious benefit to this assumption, since you can just use sizeof(T) directly for whatever T you're interested in.
If you want to convert between integers and pointers, use intptr_t. If you want to store integers and pointers in the same space, just use a union.
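A rough sketch of both suggestions, assuming the implementation provides the optional intptr_t type. The union and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: intptr_t exists and is at least as wide as a data pointer. */
static_assert(sizeof(intptr_t) >= sizeof(void *),
              "intptr_t is assumed to be wide enough for a data pointer");

/* Hypothetical slot type that stores either an integer or a pointer. */
union int_or_ptr {
    intptr_t as_int;
    void    *as_ptr;
};

/* Convert a data pointer to an integer. */
static intptr_t pointer_to_integer(void *p)
{
    return (intptr_t)p;
}

/* Convert the integer back; the result compares equal to the original. */
static void *integer_to_pointer(intptr_t n)
{
    return (void *)n;
}
```

The round trip through intptr_t is specified for pointers to void. Function pointers are not covered by it.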
The "bits" of a target architecture describe its register size. For example, the Intel 8051 is 8-bit and operates on 8-bit registers, but external RAM and external ROM are accessed with 16-bit addresses.
For correctness, you cannot assume anything. You have to check and be prepared to deal with weird situations.
As a general rule of thumb, it is a reasonable default assumption.
It's not universally true though. See the X32 ABI, for example, which uses 32bit pointers on 64bit architectures to save a bit of memory and cache footprint. Same for the ILP32 ABI on AArch64.
So, for guesstimating memory use, you can use your assumption and it will often be right.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
If you look at all types of CPUs (including microcontrollers) currently being produced, I would say no.
Extreme counterexamples would be architectures where two different pointer sizes are used in the same program:
x86, 16-bit
In MS-DOS and 16-bit Windows, a "normal" program used both 16- and 32-bit pointers.
x86, 32-bit segmented
There were only a few, less known operating systems using this memory model.
Programs typically used both 32- and 48-bit pointers.
STM8A
This modern automotive 8-bit CPU uses 16- and 24-bit pointers. Both in the same program, of course.
AVR tiny series
RAM is addressed using 8-bit pointers, Flash is addressed using 16-bit pointers.
(However, AVR tiny cannot be programmed with C++, as far as I know.)
It's not correct, for example DOS pointers (16 bit) can be far (seg+ofs).
However, for the usual targets (Windows, OSX, Linux, Android, iOS) then it's correct. Because they all use the flat programming model which relies on paging.
In theory, you can also have systems which use only the lower 32 bits when in x64. An example is a Windows executable linked without LARGEADDRESSAWARE. However this is to help the programmer avoid bugs when switching to x64. The pointers are truncated to 32 bits, but they are still 64 bit.
In x64 operating systems then this assumption is always true, because the flat mode is the only valid one. Long mode in CPU forces GDT entries to be 64 bit flat.
Someone also mentioned the x32 ABI. I believe it is based on the same paging technology, forcing all pointers to be mapped to the lower 4 GB. However, this must rest on the same principle as in Windows: in x64 you can only have flat mode.
In 32 bit protected mode you could have pointers up to 48 bits. (Segmented mode). You can also have callgates. But, no operating system uses that mode.
Historically, on microcomputers and microcontrollers, pointers were often wider than general-purpose registers so that the CPU could address enough memory and still fit within the transistor budget. Most 8-bit CPUs (such as the 8080, Z80 or 6502) had 16-bit addresses.
Today, a mismatch is more likely to be because an app doesn’t need multiple gigabytes of data, so saving four bytes of memory on every pointer is a win.
Both C and C++ provide separate size_t and uintptr_t types, and POSIX adds off_t, representing the largest possible object size (which might be smaller than the size of a pointer if the memory model is not flat), an integral type wide enough to hold a pointer, and a file offset (often wider than the largest object allowed in memory), respectively. A size_t (unsigned) or ptrdiff_t (signed) is the most portable way to get the native word size. Additionally, POSIX guarantees that the system compiler has some flag that means a long can hold any of these, but you cannot always assume so.
Generally pointers will be size 2 on a 16-bit system, 3 on a 24-bit system, 4 on a 32-bit system, and 8 on a 64-bit system. It depends on the ABI and C implementation. AMD has long and legacy modes, and there are differences between AMD64 and Intel64 for Assembly language programmers but these are hidden for higher level languages.
Any problems with C/C++ code are likely to be due to poor programming practices and ignoring compiler warnings. See: "20 issues of porting C++ code to the 64-bit platform".
See also: "Can pointers be of different sizes?" and LRiO's answer:
... you are asking about C++ and its compliant implementations, not some specific physical machine. I'd have to quote the entire standard in order to prove it, but the simple fact is that it makes no guarantees on the result of sizeof(T*) for any T, and (as a corollary) no guarantees that sizeof(T1*) == sizeof(T2*) for any T1 and T2).
Note: as answered by JeremyP, C99 section 6.3.2.3, subsection 8:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
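The round trip that this paragraph guarantees can be sketched as follows. The generic_fn typedef and the function names are hypothetical. The sketch converts the pointer back to its original type before calling, since calling through the wrong type is undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "generic" function pointer type used only for storage. */
typedef void (*generic_fn)(void);

/* Example function whose pointer gets stored generically. */
static int add_one(int x)
{
    return x + 1;
}

/* Convert to the generic type; allowed for any function pointer type. */
static generic_fn erase_type(int (*f)(int))
{
    return (generic_fn)f;
}

/* Convert back to the original type before calling through the pointer. */
static int call_restored(generic_fn g, int x)
{
    int (*f)(int) = (int (*)(int))g;
    assert(f != NULL);
    return f(x);
}
```

Using a function pointer type as the storage type avoids relying on void * being able to hold a function pointer, which C does not guarantee.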
In GCC you can avoid incorrect assumptions by using built-in functions: "Object Size Checking Built-in Functions":
Built-in Function: size_t __builtin_object_size (const void * ptr, int type)
is a built-in construct that returns a constant number of bytes from ptr to the end of the object ptr pointer points to (if known at compile time). To determine the sizes of dynamically allocated objects the function relies on the allocation functions called to obtain the storage to be declared with the alloc_size attribute (see Common Function Attributes). __builtin_object_size never evaluates its arguments for side effects. If there are any side effects in them, it returns (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3. If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero. If it is not possible to determine which objects ptr points to at compile time, __builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3.
I came across this page The Lost Art of C Structure Packing and while I have never had to actually pad any structs, I'd like to learn a bit more so that when/if I need to - I can.
It says:
Storage for the basic C datatypes on an x86 or ARM processor doesn’t normally start at arbitrary byte addresses in memory. Rather, each type except char has an alignment requirement; chars can start on any byte address, but 2-byte shorts must start on an even address, 4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8. Signed or unsigned makes no difference.
Does this imply that all 32 bit processors (x86, ARM, AVR32, PIC32,...) have this alignment requirement? What about 16 bit processors?
If not, and it is device specific, where can I find this information?
I tried searching through Microchip XC16 Manual but I could not find the alignment requirements that say that ints start at addresses divisible by 4.
I assume that the information is there, and I am not searching for the right key words - what is the "alignment requirement" called if I were to search online for more information?
Alignment requirements have 2 considerations: required, and preferred for efficiency.
Required: Example: some platforms require various types, like an int, to be aligned. Contorted code that attempts to access an int on an unaligned boundary results in a fault. Compilers will normally align data automatically to prevent this issue.
Efficiency: Unaligned accesses may be allowed yet results in slower code. Many compilers, rather than packing the data, will default to aligned data for speed efficiency. Typically such compilers allow a compiler specific keyword or compiler option to pack the data instead for space efficiency.
These issues apply to various processors of various sizes in different degrees. An 8-bit processor may have a 16-bit data bus and oblige 16+ -bit types to be aligned. A compliant C compiler for a 64-bit processor may have only 64-bit types, even char. The possibilities are vast.
C provides an object type max_align_t in <stddef.h>. This could be used in various ways to determine the minimum general alignment requirement.
... max_align_t which is an object type whose alignment is as great as is supported by the implementation in all contexts; ... C11 §7.19 2
C also has _Alignas() to impose stricter alignment of a variable.
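A short sketch of how max_align_t and the alignment specifier might be used, assuming a C11 compiler with <stdalign.h> and an implementation that supports 16-byte alignment for static objects. The buffer and helper names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer with stricter alignment than char requires.
   Assumption: the implementation supports 16-byte alignment here. */
static alignas(16) unsigned char aligned_buffer[64];

/* Alignment of max_align_t, as reported by this implementation. */
static size_t max_fundamental_alignment(void)
{
    return alignof(max_align_t);
}

/* Check an address against an alignment via its integer representation. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

Printing max_fundamental_alignment() on the target is a quick way to see what the compiler considers the general alignment requirement.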
There are two global answers here. Yes, all processors have an alignment penalty of some sort (ARM, MIPS, x86, etc). No you cannot determine by type. All ARMs do not have the same alignment penalty, despite what folks think they know about the older ARMv4 and ARMv5, you could do unaligned accesses in a predictable way, that predictable way was not what most of us would have preferred, and you have to enable it. MIPS and ARMs and perhaps others at one point would have a severe punishment for unaligned transfers, you would get a data fault. But due to the nature of how programmers program, etc, the default at least for ARM is to have that disabled on some/newer cores. You can disable it or enable it whichever way you want.
ALL processors have a penalty for unaligned transfers, a performance penalty, and those hits happen at the various layers, sometimes in the core, at the edge of the core, on each cache layer, and at the outer layer of ram. Since the designs vary so widely you cannot come up with a single rule.
Likewise, since alignment in compilers is implementation-defined, you can't write portable code. So if you are dealing with a processor (likely an ARM, since that is where most folks get bitten) that has unaligned faults enabled, the most portable solution, but not foolproof, is to start your structs with the 64-bit variables, then the 32, then the 16, then the 8. Compilers tend to place things in the order that you defined them, so as long as the whole struct starts on the right boundary for that target, the variables will fall into alignment properly, no padding required. There is no global solution to the problem other than don't use structs, or disable alignment checking and suffer the front end performance hits.
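The largest-first ordering can be illustrated with two hypothetical structs holding the same members. The size comparison is what mainstream ABIs are expected to produce, not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: the compiler typically inserts padding. */
struct mixed_order {
    uint8_t  a;
    uint64_t b;
    uint16_t c;
    uint32_t d;
};

/* Same members ordered 64, 32, 16, 8 bits: each falls on its own boundary. */
struct sorted_order {
    uint64_t b;
    uint32_t d;
    uint16_t c;
    uint8_t  a;
};

/* Assumption: holds on mainstream ABIs, not guaranteed by the standard. */
static_assert(sizeof(struct sorted_order) <= sizeof(struct mixed_order),
              "largest-first ordering is expected not to increase the size");
```

On a typical x86-64 ABI the first struct occupies 24 bytes and the second 16. Checking sizeof and offsetof on the actual target is the only reliable way to know.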
Note that the 32-bit ARMs we generally deal with today use a 64-bit AMBA/AXI bus, not 32. They can still check all the alignments (16, 32, 64) for transfers if enabled, but the unaligned performance hits, at least at the AMBA/AXI level, don't hit you unless you cross the 64-bit aligned boundary. You may still have an extra cache line hit, although that is unlikely if you don't have an AMBA/AXI hit.
What forms of memory address spaces have been used?
Today, a large flat virtual address space is common. Historically, more complicated address spaces have been used, such as a pair of a base address and an offset, a pair of a segment number and an offset, a word address plus some index for a byte or other sub-object, and so on.
From time to time, various answers and comments assert that C (or C++) pointers are essentially integers. That is an incorrect model for C (or C++), since the variety of address spaces is undoubtedly the cause of some of the C (or C++) rules about pointer operations. For example, not defining pointer arithmetic beyond an array simplifies support for pointers in a base and offset model. Limits on pointer conversion simplify support for address-plus-extra-data models.
That recurring assertion motivates this question. I am looking for information about the variety of address spaces to illustrate that a C pointer is not necessarily a simple integer and that the C restrictions on pointer operations are sensible given the wide variety of machines to be supported.
Useful information may include:
Examples of computer architectures with various address spaces and descriptions of those spaces.
Examples of various address spaces still in use in machines currently being manufactured.
References to documentation or explanation, especially URLs.
Elaboration on how address spaces motivate C pointer rules.
This is a broad question, so I am open to suggestions on managing it. I would be happy to see collaborative editing on a single generally inclusive answer. However, that may fail to award reputation as deserved. I suggest up-voting multiple useful contributions.
Just about anything you can imagine has probably been used. The
first major division is between byte addressing (all modern
architectures) and word addressing (pre-IBM 360/PDP-11, but
I think modern Unisys mainframes are still word addressed). In
word addressing, char* and void* would often be bigger than
an int*; even if they were not bigger, the "byte selector"
would be in the high order bits, which were required to be 0, or
would be ignored for anything other than bytes. (On a PDP-10,
for example, if p was a char*, (int)p < (int)(p+1) would
often be false, even though int and char* had the same
size.)
Among byte addressed machines, the major variants are segmented
and non-segmented architectures. Both are still widespread
today, although in the case of Intel 32bit (a segmented
architecture with 48 bit addresses), some of the more widely
used OSs (Windows and Linux) artificially restrict user
processes to a single segment, simulating a flat addressing.
Although I've no recent experience, I would expect even more
variety in embedded processors. In particular, in the past, it
was frequent for embedded processors to use a Harvard
architecture, where code and data were in independent address
spaces (so that a function pointer and a data pointer, cast to a
large enough integral type, could compare equal).
I would say you are asking the wrong question, except as historical curiosity.
Even if your system happens to use a flat address space -- indeed, even if every system from now until the end of time uses a flat address space -- you still cannot treat pointers as integers.
The C and C++ standards leave all sorts of pointer arithmetic "undefined". That can impact you right now, on any system, because compilers will assume you avoid undefined behavior and optimize accordingly.
For a concrete example, three months ago a very interesting bug turned up in Valgrind:
https://sourceforge.net/p/valgrind/mailman/message/29730736/
(Click "View entire thread", then search for "undefined behavior".)
Basically, Valgrind was using less-than and greater-than on pointers to try to determine if an automatic variable was within a certain range. Because comparisons between pointers in different aggregates is "undefined", Clang simply optimized away all of the comparisons to return a constant true (or false; I forget).
This bug itself spawned an interesting StackOverflow question.
So while the original pointer arithmetic definitions may have catered to real machines, and that might be interesting for its own sake, it is actually irrelevant to programming today. What is relevant today is that you simply cannot assume that pointers behave like integers, period, regardless of the system you happen to be using. "Undefined behavior" does not mean "something funny happens"; it means the compiler can assume you do not engage in it. When you do, you introduce a contradiction into the compiler's reasoning; and from a contradiction, anything follows... It only depends on how smart your compiler is.
And they get smarter all the time.
There are various forms of bank-switched memory.
I worked on an embedded system that had 128 KB of total memory: 64KB of RAM and 64KB of EPROM. Pointers were only 16-bit, so a pointer into the RAM could have the same value of a pointer in the EPROM, even though they referred to different memory locations.
The compiler kept track of the type of the pointer so that it could generate the instruction(s) to select the correct bank before dereferencing a pointer.
You could argue that this was like segment + offset, and at the hardware level, it essentially was. But the segment (or more correctly, the bank) was implicit from the pointer's type and not stored as the value of a pointer. If you inspected a pointer in the debugger, you'd just see a 16-bit value. To know whether it was an offset into the RAM or the ROM, you had to know the type.
For example, Foo * could only be in RAM and const Bar * could only be in ROM. If you had to copy a Bar into RAM, the copy would actually be a different type. (It wasn't as simple as const/non-const:
Everything in ROM was const, but not all consts were in ROM.)
This was all in C, and I know we used non-standard extensions to make this work. I suspect a 100% compliant C compiler probably couldn't cope with this.
From a C programmer's perspective, there are three main kinds of implementation to worry about:
Those which target machines with a linear memory model, and which are designed and/or configured to be usable as a "high-level assembler"--something the authors of the Standard have expressly said they did not wish to preclude. Most implementations behave in this way when optimizations are disabled.
Those which are usable as "high-level assemblers" for machines with unusual memory architectures.
Those whose design and/or configuration make them suitable only for tasks that do not involve low-level programming, including clang and gcc when optimizations are enabled.
Memory-management code targeting the first type of implementation will often be compatible with all implementations of that type whose targets use the same representations for pointers and integers. Memory-management code for the second type of implementation will often need to be specifically tailored for the particular hardware architecture. Platforms that don't use linear addressing are sufficiently rare, and sufficiently varied, that unless one needs to write or maintain code for some particular piece of unusual hardware (e.g. because it drives an expensive piece of industrial equipment for which more modern controllers aren't available) knowledge of any particular architecture isn't likely to be of much use.
Implementations of the third type should be used only for programs that don't need to do any memory-management or systems-programming tasks. Because the Standard doesn't require that all implementations be capable of supporting such tasks, some compiler writers--even when targeting linear-address machines--make no attempt to support any of the useful semantics thereof. Even some principles like "an equality comparison between two valid pointers will--at worst--either yield 0 or 1 chosen in possibly-unspecified fashion" don't apply to such implementations.
Most programs fit well within a <4GB address space but need to use new features available only on the x64 architecture.
Are there compilers/platforms where I can use x64 registers and specific instructions while preserving 32-bit pointers to save memory?
Is it possible to do that transparently on legacy code? What switch does that?
OR
What changes to the code are necessary to get 64-bit features while keeping 32-bit pointers?
A simple way to circumvent this is if you'd have only few types for your structures that you are pointing to. Then you could just allocate big arrays for your data and do the indexing with uint32_t.
So a "pointer" in such a model would be just an index in a global array. Usually addressing with that should be efficient enough with a decent compiler, and it would save you some space. You'd lose other things that you might be interested in, dynamic allocation for instance.
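A minimal sketch of the index-as-pointer idea, using a linked list whose links are 32-bit indices into one global array. The pool size, the sentinel value and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CAPACITY 1024u      /* hypothetical fixed pool size */
#define NIL_INDEX     UINT32_MAX /* plays the role of a null pointer */

/* A list node whose link is a 32-bit index instead of a pointer. */
struct node {
    int      value;
    uint32_t next;
};

static struct node pool[POOL_CAPACITY];
static uint32_t pool_used;

/* Take the next free slot from the pool and return its index. */
static uint32_t node_alloc(int value, uint32_t next)
{
    assert(pool_used < POOL_CAPACITY);
    uint32_t idx = pool_used++;
    pool[idx].value = value;
    pool[idx].next = next;
    return idx;
}

/* Walk the list by following indices rather than pointers. */
static int list_sum(uint32_t head)
{
    int sum = 0;
    for (uint32_t i = head; i != NIL_INDEX; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

Each link costs 4 bytes regardless of the pointer size of the target, at the price of a fixed-capacity pool per type.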
Another way to achieve something similar is to encode a pointer with the difference to its actual location. If you can ensure that that difference always fits into 32 bit, you could gain too.
It's worth noting that there is an ABI in development for Linux, X32, that lets you build an x86_64 binary that uses 32-bit indices and addresses.
Only relatively new, but interesting nonetheless.
http://en.wikipedia.org/wiki/X32_ABI
Technically, it is possible for a compiler to do so. AFAIK, in practice it isn't done. It has been proposed for gcc (even with a patch here: http://gcc.gnu.org/ml/gcc/2007-10/msg00156.html) but never integrated (at least, it was not documented the last time I checked). My understanding is that it needs also support from the kernel and standard library to work (i.e. the kernel would need to set up things in a way not currently possible and using the existing 32 or 64 bit ABI to communicate with the kernel would not be possible).
What exactly are the "64-bit features" you need, isn't that a little vague?
Found this while searching myself for an answer:
http://www.codeproject.com/KB/cpp/smallptr.aspx
Also pick up the discussion at the bottom...
Never had any need to think about this, but it is interesting to realize that one can be concerned with how much space pointers need...
It depends on the platform. On Mac OS X, the first 4 GB of a 64-bit process' address space is reserved and unmapped, presumably as a safety feature so no 32-bit value is ever mistaken for a pointer. If you try, there may be a way to defeat this. I worked around it once by writing a C++ "pointer" class which adds 0x100000000 to the stored value. (This was significantly faster than indexing into an array, which also requires finding the array-base address and multiplying before the addition.)
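The same base-plus-32-bit-value idea can be sketched in plain C. Instead of a fixed constant such as 0x100000000, this sketch assumes a hypothetical static arena as the base. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena; every compressed pointer refers into this block. */
static unsigned char arena[4096];

/* A "pointer" stored as a 32-bit offset from the arena base. */
typedef uint32_t compressed_ptr;

/* Turn a pointer into the arena into a 32-bit offset. */
static compressed_ptr compress(const void *p)
{
    ptrdiff_t off = (const unsigned char *)p - arena;
    assert(off >= 0 && (size_t)off < sizeof arena); /* must lie inside the arena */
    return (compressed_ptr)off;
}

/* Add the base back to recover a full-width pointer. */
static void *decompress(compressed_ptr c)
{
    assert(c < sizeof arena);
    return arena + c;
}
```

Decompression is a single addition, which matches the observation above that adding a base is cheaper than scaled array indexing.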
On the ISA level, you can certainly choose to load and zero-extend a 32-bit value and then use it as a 64-bit pointer. It's a good feature for a platform to have.
No change should be necessary to a program unless you wish to use 64-bit and 32-bit pointers simultaneously. In that case you are back to the bad old days of having near and far pointers.
Also, you will certainly break ABI compatibility with APIs that take pointers to pointers.
I think this would be similar to the MIPS n32 ABI: 64-bit registers with 32-bit pointers.
In the n32 ABI, all registers are 64-bit (so requires a MIPS64 processor). But addresses and pointers are only 32-bit (when stored in memory), decreasing the memory footprint. When loading a 32-bit value (such as a pointer) into a register, it is sign-extended into 64-bits. When the processor uses the pointer/address for a load or store, all 64-bits are used (the processor is not aware of the n32-ess of the SW). If your OS supports n32 programs (maybe the OS also follows the n32 model or it may be a proper 64-bit OS with added n32 support), it can locate all memory used by the n32 application in suitable memory (e.g. the lower 2GB and the higher 2GB, virtual addresses). The only glitch with this model is that when registers are saved on the stack (function calls etc), all 64-bits are used, there is no 32-bit data model in the n32 ABI.
Probably such an ABI could be implemented for x86-64 as well.
On x86, no. On other processors, such as PowerPC it is quite common - 64 bit registers and instructions are available in 32 bit mode, whereas with x86 it tends to be "all or nothing".
I'm afraid that if you are concerned about the size of pointers you might have bigger problems to deal with. If the number of pointers is going to be in the millions or billions, you will probably run into limitations within the Windows OS before you actually run out of physical or virtual memory.
Mark Russinovich has written a great article relating to this, named Pushing the Limits of Windows: Virtual Memory.
Linux now has fairly comprehensive support for the X32 ABI, which does exactly what the asker is asking; in fact it is partially supported as a configuration under the Gentoo operating system. I think this question needs to be reviewed in light of recent developments.
The second part of your question is easily answered. It is very possible, in fact many C implementations have support, for 64-bit operations using 32-bit code. The C type often used for this is long long (but check with your compiler and architecture).
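As a small illustration of 64-bit arithmetic that does not depend on the pointer size, here is a sketch using unsigned long long. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The C standard requires unsigned long long to have at least 64 bits. */
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long is expected to be at least 64 bits wide");

/* Multiply two 32-bit values without losing the upper half of the product. */
static unsigned long long multiply_wide(uint32_t a, uint32_t b)
{
    return (unsigned long long)a * b;
}
```

On a 32-bit target the compiler lowers this to several narrower instructions or a helper routine, so the code is portable but not necessarily as fast as on a 64-bit target.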
As far as I know it is not possible to have 32-bit pointers in 64-bit native code.