Why a 32-bit OS supports up to 4 GB of RAM

So I know that a 32-bit OS can support 2^32 different values, which is approximately 4x10^9.
I would imagine that the internal representation of each value is like this:
0000 0000 0000 0000 0000 0000 0000 0000
.....
1111 1111 1111 1111 1111 1111 1111 1111
So we have approximately 4x10^9 different patterns here.
But since each address consists of 4 bytes (32/8=4), shouldn't the RAM be
4x4x10^9 bytes?

Each address addresses one byte, in typical modern systems.
Even if the hardware can only transfer four bytes or eight bytes at a time, each byte within such a unit is given its own address. The processor may only interact with the memory hardware using 28 or 29 or some other number of bits, but it uses the additional bits to distinguish bytes within words.
When a program accesses a particular address, the processor uses the low bits to isolate bytes. When reading, it gets an entire unit from memory and then uses the low bits to isolate the requested byte or bytes. When writing, it uses the low bits to merge the selected byte or bytes into a unit of data, and then it writes the complete unit to memory.
So, with 32 bits in an address, 2^32 = 4,294,967,296 addresses are available, and 4,294,967,296 things can be addressed. In typical modern hardware, each of those things is one byte. Often, not all of them are available to user programs, as some addresses are reserved for special purposes.

x86 is byte addressed, which means that each byte has a unique address. This makes (under normal circumstances) the total amount of addressable memory 2^32 bytes (4 GiB). Non-addressable memory (memory that you can't have an address for) is utterly unusable.
Out of the whole address space, not all addresses map to main memory. A chunk of it is reserved for IO, so the maximum amount of RAM is even lower than 4 GB.
I will try to address your confusion.
"So we have approximately 4x10^9 different patterns here."
Correct.
"But since each address consists of 4 bytes (32/8=4)"
Irrelevant.
"shouldn't the RAM be 4x4x10^9?"
No.
The number of bits in the address determines how many different values there can be. We already determined that 32 bits give us exactly 2^32 ≈ 4x10^9 different addresses. We can therefore have ≈4x10^9 different objects that have unique addresses. Bear with me, let's think a bit outside of the box. If the addresses were addresses of cities, then we could have a maximum of ≈4x10^9 cities. If the addresses were addresses of streets, we could have a maximum of ≈4x10^9 streets. If the addresses were addresses of apples, we could have a maximum of ≈4x10^9 apples. With me? If the addresses were addresses of 64-bit QWORDS then we could have a maximum of ≈4x10^9 64-bit qwords. If the addresses were addresses of 32-bit DWORDS then we could have a maximum of ≈4x10^9 32-bit dwords. But in x86 an address is none of the above. In x86 an address is an address of a byte (8 bits), so we can have a maximum of ≈4x10^9 bytes, aka 4 GiB.
As you can see the width of the address only gives the number of different objects we can address. In x86 those objects are bytes, so we can have a maximum of 2^32 addressable bytes.
The normal limit of 2^32 bytes can be overcome with Physical Address Extension. This requires both OS and hardware (CPU, chipset and motherboard) support and even when these are met each program can still only work with 32 bit addresses.

For a modern OS, there are typically virtual addresses that are translated into physical addresses.
For a 32-bit OS, the virtual addresses are often (but not necessarily) 32-bit. With byte addressing, this means you can have 1234 processes where each process has 4 GiB of virtual space (or a total of 4936 GiB of virtual space). However, typically each virtual address space is split with "user-space" in one part and "kernel-space" in another part; so it might be more like 2 GiB for each process plus 2 GiB for the kernel (or a total of 2470 GiB for 1234 processes).
However, because virtual addresses are converted into physical addresses, the size of a virtual address doesn't need to be the same as the size of a physical address. This means that even if virtual addresses are 32-bit, a physical address can be larger (or smaller) than 32-bit. For example, for most older 80x86 CPUs there's a "Physical Address Extensions" (PAE) feature that extends the physical address size to 36 bits (giving you a physical address space of 64 GiB), and for modern 80x86 CPUs (that are capable of running a 64-bit OS) PAE was enhanced to allow a 32-bit OS to use physical addresses up to (a "current architectural maximum" of) 52 bits, giving a physical address space size up to 4096 TiB for a 32-bit OS (in theory).
Of course the physical address space contains RAM, some ROM, some areas for devices, etc. For example, with 64 GiB of physical address space, 1.5 GiB might be reserved for things that aren't RAM, so the maximum RAM you might be able to have (and the maximum a 32-bit OS could use) might be 62.5 GiB.
Sadly(?) most motherboards don't support the maximum amount of RAM that the CPU is capable of using. For example, a lot of modern CPUs support 48-bit physical addresses (256 TiB of physical address space) but I've never seen a motherboard that's able to support more than 8 TiB of RAM and most modern motherboards don't even support 1 TiB of RAM.
In the same way, various operating systems have their own limits. For example, most 32-bit versions of Windows didn't use PAE to reach memory above 4 GiB (because of device driver compatibility issues initially, and then because everyone adopted 64-bit anyway so nobody cared); so if you had a computer with (e.g.) 8 GiB of RAM, the OS couldn't use most of it (and would probably only be able to use 3 GiB of the RAM, because 1 GiB of address space is probably reserved/used by ROMs, devices, etc).
Note that for 64-bit operating systems on 80x86; virtual addresses are 48-bit (not 64 bit) and physical addresses are anything from 32-bit (Atom) to 52-bit (and also not 64 bit); and Intel has been thinking of a "5-level paging" extension to allow 57-bit virtual addresses (which still won't be 64-bit).
In general (if you ignore specific CPUs); the size of a general purpose register, the size of a virtual address and the size of a physical address can all be completely different; and for a 32-bit OS (using 32-bit general purpose registers) the virtual address space size can be anything and the physical address space size can be anything; and the maximum amount of RAM you could have in the physical address space can be anything.

[It's likely I misunderstood the question. I might delete this answer later.]
For a system with 32-bit (4-byte -- assuming 8-bit bytes) addresses, there are 2^32 distinct address values. If the memory space is fully populated with RAM, then you can use a 32-bit address to refer to any of the 2^32 bytes of memory on the system.
But you can't store all those 2^32 address values in memory simultaneously. As you say, you would need 4 * 2^32 bytes to store all those address values.
Fortunately, there's no need to store all those distinct memory address values in memory. You only need to store those address values that you're actually using.
(I'm ignoring issues of virtual vs. physical memory.)

Related

What is inside a pointer on a modern x64 system?

This code for example:
int x = 75;
int *p = &x;
printf("%p\n", (void *)p);
Writes a 64-bit number. What I'm asking is, what exactly is this number? Yes, it is an address.
But is it an absolute address in virtual memory where the value 75 is stored? Or is it possibly offset from some page marker, or an offset from the "start point" of the program's memory block?
If it matters, I'm asking about Windows 10, 64 bit, on a typical x64 intel chip.
Yes, it is the absolute address in your program's virtual address space.
It is not an offset.
In 16-bit Windows (which was common 30 years ago), a segmented memory model was used, in which pointers were segmented and consisted of a 16-bit segment pointer and a 16-bit offset (32 bits in total).
However, 32-bit and 64-bit Windows both use a flat memory model, which uses absolute addresses.
It is a virtual address, which is a virtual page number and an offset from the beginning of the page. The translation mechanism looks up in the page tables of the process to determine the corresponding physical page number and combines it with the offset to come up with the physical address.

Why 2 raised to 32 power results in a number in bytes instead of bits?

I have just restarted studying C programming. Now I'm studying memory storage capacity and the difference between bits and bytes. I came across this definition.
There is a calculation for a 32-bit system. I'm very confused, because in this calculation 2^32 = 4294967296 bytes, which is about 4 gigabytes. My question is: why does 2 raised to the 32nd power give a number in bytes instead of bits?
Thanks for helping me.
Because the memory is byte-addressable (that is, each byte has its own address).
There are two ways to look at this:
A 32-bit integer can hold one of 2^32 different values. Thus, a uint32_t can represent the values from 0 to 4294967295.
A 32-bit address can represent 2^32 different addresses. And as Scott said, on a byte-addressable system, that means 2^32 different bytes can be addressed. Thus, a process with 32-bit pointers can address up to 4 GiB of virtual memory. Or, a microprocessor with a 32-bit address bus can address up to 4 GiB of RAM.
That description is really superficial and misses a lot of important considerations, especially as to how memory is defined and accessed.
Fundamentally an N-bit value has 2^N possible states, so a 16-bit value has 65,536 possible states. Additionally, memory is accessed as bytes, or 8-bit values. This was not always the case: older machines had different "word" sizes, anywhere from 4 to 36 bits per word, occasionally more, but over time the 8-bit word, or "byte", became the dominant form.
In every case a memory "address" contains one "word" or, on more modern machines, "byte". Memory is measured in these units, like "kilowords" or "gigabytes", for reasons of simplicity even though the individual memory chips themselves are specified in terms of bits. For example, a 1 gigabyte memory module often has eight 1-gigabit chips on it. These chips are read at the same time, and the resulting data is combined to produce a single byte of memory.
By that article's wobbly definition this means a 16-bit CPU can only address 64KB of memory, which is wrong. DOS systems from the 1980s used two 16-bit values to form an address, a segment and an offset, and could address 1MB using an effective 20-bit address (the 80286's protected mode later raised this to 16MB with 24-bit addresses). This isn't the only way in which the raw pointer size and total addressable memory can differ.
Some 32-bit systems also had an alternate 36-bit memory model that allowed addressing up to 64GB of memory, though an individual process was limited to a 4GB slice of the available memory.
In other words, for systems with a single pointer to a memory address and where the smallest memory unit is a byte, the maximum addressable memory is 2^N bytes.
Thankfully, since 64-bit systems are now commonplace and a computer with > 64GB of memory is not even exotic or unusual, addressing systems are a lot simpler now than when having to work around pointer-size limitations.
We say that memory is byte-addressable: the byte is the smallest addressable unit of memory, so you read bytes, not individual bits. Correspondingly, the smallest data type is 1 byte; even the boolean type in C/C++ typically occupies 1 byte.

What is meant by an aligned/unaligned AXI transfer

Can anyone explain the difference between an aligned and an unaligned data transfer
It is not limited to AXI buses; it is a general term that affects bus transfers and can lead to undesirable results (performance hits). But that depends heavily on the overall architecture.
If addresses are in units of bytes (byte addressable), then a byte is always aligned. Assuming a byte is 8 bits, a 16-bit transfer is aligned if it is on a 16-bit boundary, meaning the lowest address bit is zero. A 16-bit value that covers addresses 0x1000 and 0x1001 is aligned and is considered to be at address 0x1000 (big or little endian). A 16-bit value that covers addresses 0x1001 and 0x1002 is not aligned; it is considered to be at address 0x1001. One covering 0x1002 and 0x1003 would be aligned. For a 32-bit value, the two lowest address bits need to be zero for it to be aligned. A 32-bit value at 0x1000 is aligned, but one at 0x1001, 0x1002, or 0x1003 would be unaligned.
Memories are generally not 8 bits wide, either at the interface or in their geometry; it depends on what kind of memory it is and where it sits. The cache in a processor that stages the transfers to slow DRAM is likely to be 32 or 64 bits wide or wider: some power of 2, or a power of 2 plus a parity bit or ECC (32, 33 or 40 bits). All of this is hidden from you, other than the performance hits you may run into. When you have a memory that is 32 bits wide, and if I call a 32-bit value a word, then that memory is word addressable: the address 0x123 is a word address, and its equivalent byte address is 0x123*4 or 0x48C. If you were to write a 32-bit value to byte address 0x48C, that becomes a single word write to that memory at that memory's address 0x123. But if you were to do a word write to byte address 0x48E, then you would need to read word address 0x123 in that SRAM/memory, replace two of its bytes with bytes from the word you are writing, and write that modified word back; then you would have to read from word address 0x124, modify two bytes, and write the modified word back.
Various buses work in various ways. Some will put the single word on a word-sized bus and allow unaligned addresses. A 32-bit wide AXI would need to turn that 0x48E word write into two AXI transfers, one with two byte lanes enabled in the byte mask and the second with the other two byte lanes enabled. A 64-bit wide AXI bus (0x48E is 10010001110 in binary, so the low three bits are 110) would also need to do two AXI transfers, one with 16 bits of the data and a second with the other 16 bits, because of where those 32 bits land. But a word transfer at address 0x1001 would/should be a single transfer on a 64-bit AXI bus, with byte lanes 1 through 4 enabled.
Other bus schemes work like this and some don't. Some will let a 32-bit item fit in the 32- or 64-bit bus, but the memory controller on the other end has to do multiple transactions to the cache or create multiple transactions on the next bus.
Although it is technically possible to byte address DRAM as far as some of the standard parts and buses go, another thing a cache buys you is that the smaller and unaligned transactions can hit the faster SRAM, while the cache line reads and evictions can be optimized for the next bus or the external memory. DRAM, for example, in most of the systems we use can then always be accessed aligned, in multiples of the bus width (64 or 64+ECC for desktops and servers, and 32 or 16 bits for embedded systems, laptops and phones). The two buses and solutions can be optimized for each side, with the cache being the translator.

What is the size of pointers in C on PAE system?

I know normally in a 32-bit machine the size of pointers used in regular C programs is 32-bit. What about in a x86 system with PAE?
It's still 32 bits.
PAE increases the size of physical memory addresses, which lets the operating system use more than 4GB RAM for running applications. To run an application the operating system maps the larger physical addresses to 32 bit virtual addresses. This means that the address space in each application is still limited to 4GB.
PAE changes the structure of page tables to allow them to address more than 32 bits worth of physical memory. Virtual memory addressing remains unchanged — pointers in userspace applications are still 32 bits.
Note that this means that 32-bit applications can be used unmodified on PAE systems, but can still each only use 4 GB of memory.
It is still 32 bits, because PAE is a feature that allows 32-bit central processing units (CPUs) to access a physical address space (including random access memory and memory mapped devices) larger than 4 gigabytes.
See http://en.wikipedia.org/wiki/Physical_Address_Extension
You can access the memory through a window (address range). Each time you need something outside that range, you use a system call to map another range there. Consider using multiple heaps, with a pointer being an offset within the window. The full pointer would then be a structure holding the heap identifier and a window offset, 64 bits in total. Each time you have to go outside the current heap, you have to switch heaps, though.

why there is no any concept of near, far & huge pointer in 32 bit compiler?

Why is there no concept of near, far & huge pointers in a 32-bit compiler? As far as I understand, programs created with a compiler for the 16-bit 8086 architecture can be 1 MB in size, which holds the data segment, graphics segments, etc. To access all those segments and to maintain the pointer increment concept we need these various pointers, but why is this not necessary in 32-bit?
32-bit compilers can address the entire address space made available to the program (or to the OS) with a single, 32-bit pointer. There is no need for basing because the pointer is large enough to address any byte in the available address space.
One could theoretically conceive of a 32-bit OS that addresses > 4GB of memory (and therefore would need a segment system common with 16-bit OS's), but the practicality is that 64-bit systems became available before the need for that complexity arose.
why there is no concept of near,far & huge pointer in 32 bit compiler?
It depends on the platform and the compiler. Open Watcom C/C++ supports near, far and huge pointers in 16-bit code and near and far pointers in 32-bit code.
As i know programs created on 16 bit 8086 architecture complier can have 1 mb size in which datasegment graphics segments etc are there. to access all those segment and to maintain pointer increment concept we need these various pointers, but why in 32 bit its not necessary?
Because in most cases near 32-bit pointers are enough to cover the entire address space (all 2^32 bytes = 4 GB of it), which is not the case with near or far 16-bit pointers, which as you said yourself can only cover up to 1 MB of memory. (Strictly speaking, in the 16-bit protected mode of the 80286+, you can use 16-bit far pointers to address up to at least 16 MB of memory. That's because those pointers are relative to the beginning of segments, and segments on the 80286+ can start anywhere in the first 16 MB, since the segment descriptors in the global descriptor table (GDT) or the local descriptor table (LDT) reserve 24 bits for the start address of a segment (2^24 bytes = 16 MB).)
