I am a beginner in Computer Science and I started learning the C language.
(So I apologize in advance if my question doesn't make sense.)
I have a question about the chapter on pointers.
My understanding is that when we declare a variable, say an integer variable i,
memory is allocated for that variable in RAM.
My book says nothing about how the computer selects the memory address that gets allocated to the variable.
I am asking because of this scenario: suppose a computer has 8 GB of RAM but a 32-bit processor.
I learnt that a 32-bit processor has 32-bit registers, which allow it to address at most 4 GB of the RAM.
So, on that computer, is it possible that when we declare the integer variable i, the computer allocates memory at an address that the 32-bit processor cannot access?
And if that happens, what will be shown on the screen if I print the address of i using the address-of operator?
A computer with 8 GB of RAM will almost certainly have a 64-bit processor.
Many 64-bit CPUs can still run 32-bit programs, in which case each such program gets its own 4 GB virtual address space. You will see the address &i as a 32-bit offset within your program's memory. But even if you compiled for 64 bits, you'd still see a 64-bit offset within your program; the OS instructs the CPU how different programs use different parts of memory.
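To see this concretely, here is a minimal sketch (assuming a GCC-style toolchain, where -m32 and -m64 select the target width) that prints both the address and the pointer size:

#include <stdio.h>

int main(void)
{
    int i = 42;
    /* %p wants a void *; sizeof(&i) is 4 on a 32-bit target, 8 on a 64-bit one */
    printf("&i = %p, pointer size = %zu bytes\n", (void *)&i, sizeof(&i));
    return 0;
}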
I was reading the Intel 64 and IA-32 Architectures Software Developer's Manual.
I understand that before paging, the logical address must first be converted to a linear address, and the linear address then goes through the page tables to produce the final physical address.
My question is: if we run the C code below, is the address of variable a that is printed a logical address or a linear address?
I know that 64-bit Windows 10 currently uses long mode, so the logical address and the linear address are the same, but the question I have in mind is:
Is the address we see in a user-mode program the logical address, or the linear address that has already gone through the global descriptor table translation?
#include <stdio.h>

int main(void)
{
    int a = 50;
    /* %p expects a void *, so cast the argument */
    printf("%p\n", (void *)&a);
    return 0;
}
Windows has not used segmented memory since it stopped being 16-bit. Or to put it a different way, the GDT just spans from 0 to the end of linear space in 32-bit Windows.
All of this is irrelevant when asking about C, because the language has no knowledge of such details. To answer, we must step outside the C abstract machine and look directly at Windows on x86.
If we imagine your code running in 16-bit Windows, taking the address of a local variable gives you the offset into the current segment; that is 16 bits of information. A FAR address, on the other hand, is 32 bits of information: the segment and the offset. The Windows function lstrlen (l for long?) takes a FAR address and can compute the length of a string coming from anywhere, whereas the C strlen function might just use a plain char * pointer, i.e. only the segment offset. Your C compiler might support different memory models (tiny, small, compact, medium, large), giving you access to more memory, perhaps even FAR pointers. Classic DOS .com files use the tiny model: there are no segments, just a maximum of 64 KiB for all code and data. Other models might place code and data in different segments.
In 32-bit and 64-bit Windows the logical and linear addresses are the same, but if you have to think of it in terms of your diagram, %p is going to print the logical address: printf does not ask the CPU/OS to translate the address in any way.
According to my system's cpuinfo file, each processor in my system has a 39-bit physical address size and a 48-bit virtual address size.
My system has 16 GB of RAM, so the 39-bit physical address size makes sense to me, as 39 bits is more than enough to address 16 GB of RAM.
However, the 48-bit virtual address size confuses me. I always believed that I could write C programs that, from the source code's perspective, could address 2^64 bytes of virtual memory (since a pointer on my system is 8 bytes long according to sizeof(void *)). However, cpuinfo is telling me that I only have 2^48 bytes of virtual memory. So does that mean my C program can only address 2^48 bytes of virtual memory?
On your 64-bit system, pointers are indeed 64 bits wide. That means there are 2^64 possible values for a pointer.
However, current x86-64 (AMD64) implementations only use the lower 48 bits. That means only 2^48 actually potentially valid pointers, and quite a lot of pointer values that are always invalid.
AMD64 Architecture Programmer’s Manual Volume 2: System Programming states:
Currently, the AMD64 architecture defines a mechanism for translating 48-bit virtual addresses to 52-bit physical addresses. The mechanism used to translate a full 64-bit virtual address is reserved and will be described in a future AMD64 architectural specification.
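As a hedged illustration of the 48-bit limit: current implementations require "canonical" addresses, where bits 63..48 are copies of bit 47. A minimal sketch of that check (my own helper, not taken from the manual):

#include <stdint.h>
#include <stdbool.h>

/* Canonical x86-64 address: bits 63..47 are all zero or all one. */
bool is_canonical(uint64_t addr)
{
    uint64_t top17 = addr >> 47;           /* bits 63..47 */
    return top17 == 0 || top17 == 0x1FFFF; /* all-zero or all-one */
}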
The development of new CPUs for faster and more powerful execution pushed for wider machine registers, whose width is normally called the machine word.
The growth of internal data registers started on early CPUs at 4 bits (the Intel 4004) and progressed through 8, 16, and 32 bits up to today's 64-bit designs, and it may go further in the future.
Mainstream general-purpose processors have, as one of their main characteristics, an instruction pointer register (and consequently an addressing range) the same size as the natural machine word. So on a 64-bit CPU the IP, and with it memory addresses, were extended to 64 bits.
But 64 bits of addressing is a truly huge range: up to 18,446,744,073,709,551,616 bytes (16,777,216 TB), something simply unfeasible with current technologies. For this reason, early implementations limited physical addressing to 2^40 bytes (1 TB). This choice reduced CPU complexity and power consumption.
With the same aim of limiting MMU (Memory Management Unit) registers and the memory used for page tables, virtual memory was limited to 2^48 bytes (256 TB).
Consider that extending the memory address lines also makes address decoding more complex, requiring more logic gates, which in turn consume more power and slow down decoding, and with it the memory access cycle.
On current systems, any virtual address whose upper 16 bits are not a copy of bit 47 (a non-canonical address) triggers a memory access exception.
In conclusion, 64 bits are beneficial for general computation but are not yet fully exploited for addressing; still, keeping pointer size equal to the machine's natural integer size is desirable, so that is what we have.
I have the following code fragment,
char *chptr;
int *numptr;
/* sizeof yields a size_t, so %zu is the right format specifier, not %d */
printf("\nSize of char is %zu\n", sizeof(chptr));
printf("\nSize of int is %zu\n", sizeof(numptr));
For which I got the following output:
Size of char is 4
Size of int is 4
Obviously the pointers can store addresses only up to 2^32 - 1.
I am using Windows 7 32-bit Operating System with Code::Blocks 10.05 and MingW.
But my system has a Pentium Dual-Core processor with a 36-bit address bus. Currently I have 4 GB of RAM, but suppose I increase my RAM to, say, 8 GB: how could C pointers deal with such an expanded address space? The size of a C pointer is just 32 bits, but the address space is well over 2^32 bytes.
Any suggestions? Thank you in advance.
PS: I have checked the answers given here which deal with address storage in pointers, but I believe they do not cover my question.
The addresses your pointers use are in the virtual address space, not the physical address space (assuming you are using a modern OS; you don't say in your question).
The OS kernel maps virtual memory pages to physical pages as and when necessary, and manages the whole memory system without user processes being aware of it.
If you are using a 32-bit OS then you will probably have 2 GB of user address space to work with, which will increase dramatically when you move to a 64-bit OS.
Check out this Wikipedia article on Virtual Memory.
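One simple way to convince yourself that these are virtual addresses (a minimal sketch; it assumes your OS has address space layout randomization enabled): print the address of a local variable and run the program twice. The value usually changes between runs, which a fixed physical address tied to a RAM chip could not do.

#include <stdio.h>

int main(void)
{
    int x = 0;
    /* With ASLR, this address usually differs from run to run. */
    printf("&x = %p\n", (void *)&x);
    return 0;
}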
The easiest solution for C pointers to deal with 8 GB is to again separate code and data. You'd then be able to have 4 GB of code and 4 GB of data, so 8 GB in total. You can't legally compare code pointer 0x00010000 and data pointer 0x00010000 anyway.
But realistically, the solution is to move to 64 bits. The 36-bit hack isn't useful; 128 GB of RAM is entirely possible today.
Some hints:
Check if your OS is 32-bit or 64-bit.
Check if your compiler is capable of generating 64-bit pointers (a quick check is sketched after this list).
Check if your compiler has additional data types for 64-bit pointers.
Check if your compiler has extensions to the C language, like the keywords far or __far.
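For the second hint, a minimal sketch to see what the compiler currently targets (the result depends on your toolchain and target settings):

#include <stdio.h>

int main(void)
{
    /* 4 bytes on a 32-bit target, 8 bytes on a 64-bit one. */
    printf("Pointer size: %zu bytes\n", sizeof(void *));
    return 0;
}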
Why is there no concept of near, far and huge pointers in a 32-bit compiler? As far as I understand, programs created with a 16-bit 8086-architecture compiler can be 1 MB in size, containing the data segment, graphics segments, etc. To access all those segments and to keep pointer arithmetic working we need these various pointer types; why is this not necessary in 32-bit compilers?
32-bit compilers can address the entire address space made available to the program (or to the OS) with a single, 32-bit pointer. There is no need for segment-based addressing, because a single pointer is large enough to reach any byte in the available address space.
One could theoretically conceive of a 32-bit OS that addresses more than 4 GB of memory (and would therefore need a segmentation scheme like the ones common in 16-bit OSes), but in practice 64-bit systems became available before the need for that complexity arose.
Why is there no concept of near, far & huge pointers in a 32-bit compiler?
It depends on the platform and the compiler. Open Watcom C/C++ supports near, far and huge pointers in 16-bit code and near and far pointers in 32-bit code.
As far as I know, programs created with a 16-bit 8086-architecture compiler can be 1 MB in size, containing the data segment, graphics segments, etc. To access all those segments and to keep pointer arithmetic working we need these various pointer types; why is this not necessary in 32-bit compilers?
Because in most cases near 32-bit pointers are enough to cover the entire address space (all 2^32 bytes = 4 GB of it), which is not the case with near or far 16-bit pointers that, as you said yourself, can only cover up to 1 MB of memory. (Strictly speaking, in the 16-bit protected mode of the 80286 and later, you can use 16-bit far pointers to address at least 16 MB of memory. That's because those pointers are relative to the beginning of segments, and segments on the 80286+ can start anywhere in the first 16 MB, since the segment descriptors in the global descriptor table (GDT) or the local descriptor table (LDT) reserve 24 bits for a segment's start address (2^24 bytes = 16 MB).)
Can anybody tell me the difference between far pointers and near pointers in C?
On a 16-bit x86 segmented memory architecture, four registers are used to refer to the respective segments:
DS → data segment
CS → code segment
SS → stack segment
ES → extra segment
A logical address on this architecture is written segment:offset. Now to answer the question:
Near pointers refer (as an offset) to the current segment.
Far pointers use segment info and an offset to point across segments. So, to use them, DS or CS must be changed to the specified value, the memory will be dereferenced and then the original value of DS/CS restored. Note that pointer arithmetic on them doesn't modify the segment portion of the pointer, so overflowing the offset will just wrap it around.
And then there are huge pointers, which are normalized to have the highest possible segment for a given address (contrary to far pointers).
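For illustration, such declarations on a 16-bit DOS compiler (e.g. Turbo C or Open Watcom) might look like the sketch below; near, far and huge are non-standard extensions, so the exact spelling and accepted casts vary by compiler:

char near *np = (char near *)0x1000;       /* offset into the current data segment */
char far  *fp = (char far  *)0xB8000000UL; /* segment:offset B800:0000, text-mode video RAM */
char huge *hp = (char huge *)0xB8000000UL; /* like far, but arithmetic normalizes the segment */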
On 32-bit and 64-bit architectures, memory models use segments differently, or not at all.
Since nobody has mentioned DOS yet, let's forget about old DOS PCs and look at this from a generic point of view. Then, very simplified, it goes like this:
Any CPU has a data bus, which is the maximum amount of data the CPU can process in one single instruction, i.e. equal to the size of its registers. The data bus width is expressed in bits: 8 bits, 16 bits, 64 bits, etc. This is where the term "64-bit CPU" comes from; it refers to the data bus.
Any CPU has an address bus, also with a certain bus width expressed in bits. Any memory cell in your computer that the CPU can access directly has a unique address. The address bus is large enough to cover all the addressable memory you have.
For example, if a computer has 65536 bytes of addressable memory, you can cover these with a 16 bit address bus, 2^16 = 65536.
Most often, but not always, the data bus width is as wide as the address bus width. It is nice if they are of the same size, as it keeps both the CPU instruction set and the programs written for it clearer. If the CPU needs to calculate an address, it is convenient if that address is small enough to fit inside the CPU registers (often called index registers when it comes to addresses).
The non-standard keywords far and near are used to describe pointers on systems where you need to address memory beyond the normal CPU address bus width.
For example, it might be convenient for a CPU with a 16-bit data bus to also have a 16-bit address bus. But the same computer may also need more than 2^16 = 65536 bytes = 64 kB of addressable memory.
The CPU will then typically have special instructions (that are slightly slower) which allow it to address memory beyond those 64 kB. For example, the CPU can divide its large memory into n pages (sometimes also called banks, segments, or other terms that can mean different things from one CPU to another), where every page is 64 kB. It will then have a "page" register which has to be set first, before addressing that extended memory. Similarly, it will have special instructions for calling/returning from subroutines in extended memory.
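As a concrete sketch of such bank switching (the register name BANK_SELECT and the addresses below are invented for illustration, not taken from any real part):

#define BANK_SELECT (*(volatile unsigned char *)0xFFFFu) /* hypothetical page register */
#define WINDOW_BASE ((volatile unsigned char *)0x8000u)  /* 32 kB window into the selected bank */

unsigned char read_banked(unsigned char bank, unsigned int offset)
{
    BANK_SELECT = bank;         /* map the requested bank into the window first */
    return WINDOW_BASE[offset]; /* then read through the normal address space */
}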
In order for a C compiler to generate the correct CPU instructions when dealing with such extended memory, the non-standard near and far keywords were invented. Non-standard as in they aren't specified by the C standard, but they are de facto industry standard and almost every compiler supports them in some manner.
far refers to memory located in extended memory, beyond the width of the address bus. Since it refers to addresses, you most often use it when declaring pointers. For example: int far *x; means "give me a pointer that points to extended memory". The compiler will then know that it should generate the special instructions needed to access such memory. Similarly, function pointers declared far will generate special instructions to jump to/return from extended memory. If you didn't use far, you would get a pointer into the normal addressable memory, and you'd end up pointing at something entirely different.
near is mainly included for consistency with far; it refers to anything in the normal addressable memory and is equivalent to a regular pointer. So it is mainly a useless keyword, save for some rare cases where you want to ensure that code is placed inside the standard addressable memory. You could then explicitly label something as near. The most typical case is low-level hardware programming where you write interrupt service routines. They are called by hardware through an interrupt vector of a fixed width, the same as the address bus width, which means the interrupt service routine must reside in the standard addressable memory.
The most famous use of far and near is perhaps the mentioned old MS DOS PC, which is nowadays regarded as quite ancient and therefore of mild interest.
But these keywords exist on more modern CPUs too! Most notably in embedded systems, where they exist for pretty much every 8- and 16-bit microcontroller family on the market, as those microcontrollers typically have an address bus width of 16 bits but sometimes more than 64 kB of memory.
Whenever you have a CPU where you need to address memory beyond the address bus width, you will need far and near. Generally, such solutions are frowned upon, since it is quite a pain to program with them and to always take the extended memory into account.
One of the main reasons there was a push to develop the 64-bit PC was actually that 32-bit PCs had reached the point where their memory usage was starting to hit the address bus limit: they could only address 4 GB of RAM (2^32 = 4.29 billion bytes = 4 GB). To enable the use of more RAM, the options were either to resort to some burdensome extended-memory solution as in the DOS days, or to expand the computers, including their address bus, to 64 bits.
Far and near pointers were used in old platforms like DOS.
I don't think they're relevant on modern platforms. But you can learn about them here and here (as pointed out by other answers). Basically, a far pointer is a way to extend the addressable memory of a computer, i.e., to address more than 64 kB of memory on a 16-bit platform.
A pointer basically holds an address. On the 16-bit x86, memory is managed through four segment registers.
When the address a pointer refers to lies within the current segment, it is a near pointer, and it therefore requires only 2 bytes, for the offset.
On the other hand, when a pointer points to an address outside the current segment (that is, in another segment), it is a far pointer. It consists of 4 bytes: two for the segment and two for the offset.
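To make the segment:offset arithmetic concrete (a minimal sketch; in real mode the 20-bit linear address is segment * 16 + offset):

#include <stdio.h>

int main(void)
{
    unsigned long segment = 0xB800, offset = 0x0000;
    unsigned long linear = segment * 16 + offset; /* real-mode linear address */
    printf("B800:0000 -> 0x%05lX\n", linear);     /* prints 0xB8000 */
    return 0;
}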
Well, in DOS, dealing with registers and segments was kind of a pain: it was all about the maximum addressable capacity of RAM.
Today it is pretty much irrelevant. All you really need to read about is the difference between virtual user space and kernel space.
Since Windows NT 4 (when Microsoft borrowed ideas from *nix), Microsoft programmers started to use what are called user/kernel memory spaces, and they have avoided direct access to physical controllers ever since. With that, the problem of dealing with memory segments directly disappeared as well: everything became read/write through the OS.
However, if you insist on understanding and manipulating far/near pointers, look at the Linux kernel source and how it works; I guess you will never come back.
And if you still need to use CS (code segment)/DS (data segment) in DOS, look at these:
https://en.wikipedia.org/wiki/Intel_Memory_Model
http://www.digitalmars.com/ctg/ctgMemoryModel.html
I would also like to point to the excellent answer from Lundin above. I was too lazy to answer properly; Lundin gave a very detailed and sensible explanation, thumbs up!