I am learning the C programming language these days, and I have a question about pointers.
My textbook says that a pointer stores a memory address, and that printf("%p", pointer) shows where the pointer points in memory.
But every pointer also has a type, like int *pointer, long *p and so on. int *pointer means "pointer is a pointer to int".
My question is: if we write
int *p,i;
p=&i;
*p=99;
if the pointer only contains the address information, how could the programme know how many digits should be used for storing integer 99? Because an integer could be 16 bits int or 32 bits long.
So I was wondering if an int pointer in memory not only stores address information, but also stores the type information?
Because an integer could be 16 bits int or 32 bits long.
An integer could, but an int could not. Regardless of how large that is exactly in your environment, its size is set in stone (within that environment) and doesn't change at run time. An int * only points to an int, not to a long. Note that if there were any such problems, they would affect int x; equally.
So a pointer really only stores the memory address. Information about the size of the pointee is in the type (just as a non-pointer variable's type tells the compiler how large that variable is).
Make sure you don't confuse what the hardware does with what the compiler does. A pointer is an address to a memory location as far as the hardware is concerned. What that memory location contains, and how long that data is, does not matter. A pointer does not store anything either. It points to a location and that's all. In assembly, this would be similar to using a register that points to a memory location.
The compiler is what tracks the type of data contained at that pointed to location. It is the compiler's job to save you from making type errors. This is where some people complain that you can shoot yourself in the foot with C. It's possible to have a pointer point to a data location where that data can be most anything and any length.
So I was wondering if an int pointer in memory not only stores address information, but also stores the type information?
No, a pointer is a memory address (or in some cases, a value analogous to a memory address). A pointer doesn't contain the data -- it points to the data. In your example, the data is stored in another location (given the name i), and p contains the address of i.
how could the programme know how many digits should be used for storing integer 99?
All the bits (binary digits) of the value type are used to store the stored value. Any given type (like int) has a fixed size. int can have different sizes depending on the system being used, but the size is always determined when the code is compiled. That is, the type int could be 16 bits, 32 bits, 64 bits, or some other size, but the compiler will always use a single size for compiling an entire program.
The type information is retained in the source code and is used by the compiler to perform type checking and to generate appropriate code. The type is implicit in the generated code rather than explicitly stored as data.
Related
My questions are: where are pointer values stored, and do they take up more space in memory than the things they point to? For example, if I have a 64-bit pointer which I use to point to a byte of information in memory, where is the value of the pointer stored, and does it take up more space than the data itself?
An example of my question written in C
char data = 10;
char* charPointer = &data;
Doesn't charPointer take up more space than the data, since the pointer is 64 bits and the data is 8 bits?
PS: I'm currently trying to learn the basics of C.
where is the value of the pointer stored
The pointer's value is stored where you declared it to be stored; e.g. if you declared the pointer as a local variable, then it will be stored on the stack; OTOH if you allocated the pointer on the heap (or, more commonly, you included the pointer as a member-variable of an object that you allocated on the heap) then the pointer will be in the heap.
and does it take up more space than the data itself?
In this case, it does.
You can actually check for yourself how much space a value takes up:
printf("data takes up %zu bytes\n", sizeof(data));
printf("charPointer takes up %zu bytes\n", sizeof(charPointer));
If you are on a 64-bit machine, you should find that a pointer takes up 8 bytes, as you would expect.
The char that the pointer points to takes up one byte, but in many cases there will also be some padding bytes inserted to keep the alignment of the next item on the stack optimal for the CPU to access efficiently.
A pointer to a char takes more space than a char in practically every C implementation. (The C standard permits them to be the same size, but such implementations are at best rare and are more likely archaic or nonexistent.)
Nonetheless:
One may need a pointer to point to one of a selection of objects or to manage objects or to connect them in various ways.
A pointer may point to objects much larger than a char.
A pointer may point to the first of many elements in an array, so one pointer can provide access to many objects.
AFAIK, the size occupied by any type of pointer is the same on a given architecture. That is, the only difference between different types of pointers is what will happen when we use an operation such as ptr++ or ptr-- on the pointer.
As an example:
char *cptr;
int *iptr;
occupy the same amount of memory (such as 4 bytes, or 8 bytes or something else). However, the difference is what will happen when we use the increment (or decrement) operator on the pointers. cptr++ will increment cptr by 1, while iptr++ will increase iptr by 4 (depending on the architecture, it can be a different value than 4 as well).
The Question
My question is, are there any differences between:
char **cdptr;
int **idptr;
(Assume that for the machine under mention, pointers have a size of 4 bytes)
Since both are pointers, both will occupy the same amount of space: 4 bytes. Also, since both point to something that occupies the same size (again, 4 bytes), the operations cdptr++ and idptr++ will work exactly the same on these two pointers (incrementing each of them by 4).
So, do different types of higher order pointers have any differences?
Formally speaking, yes, these pointer types are different. They have different types, types which are important to the programmer and which the compiler keeps intimate track of. You can prove they're different by trying to compile
char **cdptr;
int **idptr = NULL;
cdptr = idptr;
Your compiler will complain. (gcc says "assignment from incompatible pointer type".) You can also convince yourself that they're different by noticing what happens when you indirect on them: cdptr[1][2] is of course a char, while idptr[1][2] is an int.
Now, it's true, since sizeof(*cdptr) almost certainly equals sizeof(*idptr), pointer arithmetic like cdptr++ and idptr++ will generate the same code. But this doesn't strike me as a terribly useful fact -- it's about as interesting as observing that if we declare
int *iptr;
char **cdptr;
we get the same code for iptr++ and cdptr++ on a machine where ints and pointers happen to be the same size. But this doesn't tell us anything we can use while writing C programs. "Generate the same code when incremented" does not equal "are the same".
Basically, in the C language, a pointer is more than a memory address. It is a memory address AND a type.
The type is needed when you use pointer arithmetic. For example, ptr + 2 means that you shift the current position of the pointer in memory by 2 * sizeof(the type ptr points to) bytes.
So a pointer to a pointer differs from a simple pointer only by its type. That's all.
The address that pointed by a pointer in C contains how much data (1 byte, 2 bytes), or does it depend on the data type it points to?
I'm not entirely sure what you mean by "The address that pointed by a pointer". I'll assume you're referring to the object it points to, not to the size of the pointer itself.
The answer to your question can depend on just what you mean by "address".
A C pointer value is not just a raw memory address (though it's usually implemented that way). A pointer value refers to an object of a specific type, and that type specifies among other things, the size of the pointed-to object. And the C standard fairly consistently uses the word "address" to refer to a (non-null) C pointer value.
On the other hand, the word "address" is commonly used to refer to a raw memory address, which can be thought of as pointing to a single byte. Even so, at the machine-code level, what size of data an address refers to can depend on what you do with it. (I've even worked on systems where a machine address can only point to a 64-bit word; byte operations were done entirely in software.)
A pointer of type int* points to an int object. The size of that object is, by definition, sizeof (int) bytes (commonly 4 bytes, but it could be 8, or 2, or even 1 if a byte is at least 16 bits). Similarly, a pointer of type struct foo* points to a struct foo object, which could be of just about any arbitrary size.
An int* pointer doesn't just point to the first byte of an int object, it points to the entire int object. (But if you convert an int* pointer to char*, the result will point to the "first" byte.)
And as a special case, a pointer of type void* points to some location in memory, but does not specify the size of the object it points to. You can't dereference it until you convert it to some other pointer type.
Recommended reading: Section 4 of the comp.lang.c FAQ.
This is quite a philosophical question; let me give you three completely incompatible answers:
The pointer itself obviously points to exactly 1 byte.
Depending on the type of the pointer, the pointer points to the first byte of a chunk of data, the size of which is defined by the pointer type.
The pointer points to the first byte of some data, the usable length of which is neither deterministic nor determinable from inside the program.
While I consider all three answers to be technically correct, the second one is what you would use while programming in C.
Edit
For the details of what an "address" can, may or should be, please look at @KeithThompson's input in the comments below!
I understand that in C, a pointer points to a memory address. In the following code
char *cp;
someType *up; // Assuming that someType is a union of size 16 bytes
cp = sbrk(nu * sizeof(someType));
// Here is what confuses me
up = (someType *)cp;
According to this link http://en.wikibooks.org/wiki/C_Programming/Pointers_and_arrays
If cp points to an address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
So when "cp" is cast to a pointer to someType and assigned to "up", it actually says "up" is a pointer that points to 0x1234. Assuming that in a 32-bit system each memory address takes 4 bytes, a someType object will use 4 memory addresses to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a someType object. Is this correct?
Thanks in advance.
Pointer type casting doesn't do anything at the machine level. The value in memory is still just a plain old value in memory.
At the language level, that value is used as an address. In other words it is often used as the value passed to operations that require memory locations, such as the assembly "load" operation.
At the language level, extra constructs are added to the machine operations, and one of those "extras" is data types. The compiler is responsible for checking whether type-rule violations exist, but at run time there is no such constraint.
As a result, the cast does nothing at run time, but at compile time it directs the compiler to not emit an error as the type of the pointer's value will now be considered a compatible type to the variable holding the address.
Even on a 32-bit system, a memory address typically refers to a single byte, not four. So your Header will be covering the 16 memory addresses between 0x1234 and 0x1243.
To answer your questions:
cp points to the beginning of the newly allocated memory.
However, up will contain the same address as cp. Note it is bad form to try and assign a pointer from one type to a different type without "casting" it first - just to tell the compiler you really meant to do such a dangerous thing.
Typically assigning between types can cause all kinds of problems - as a result compilers will throw warnings if you do this without casting. The casting, however, doesn't actually have any effect on the program code itself - it is more a method of the programmer telling the compiler they really meant what they did.
By the way, the pointer up is allocated on the stack in your function. Assigning a value to up actually places the value in the memory location on the stack. So by assigning cp to up you are merely copying the value of the pointer in cp to the location in the stack assigned to up. There is nothing placed into the memory allocated by the sbrk() call.
First of all, remember that a pointer is an abstraction of a memory address; do not assume a one-to-one correspondence between a pointer value and a physical location in RAM. Pointers also have type semantics associated with them; adding 1 to a pointer value advances it to point to the next object of the pointed-to type:
char *cp;
int *ip;
struct huge *hp;
cp = cp + 1; // advances cp to the address of the next char (1 byte)
ip = ip + 1; // advances ip to the address of the next int (anywhere
// from 2 to 8 bytes depending on the architecture)
hp = hp + 1; // advances hp to the address of the next struct huge (however
// many bytes struct huge takes up)
This is exactly how array indexing works; the expression a[i] is treated as *(a + i); you offset i elements of whatever size from the base address, not i bytes.
As far as the cast is concerned...
A pointer to char is a different, incompatible type than a pointer to Header. They may have different sizes and representations depending on the underlying architecture. Because of this, the language won't allow an implicit conversion of one to the other through a simple assignment; you must use a cast to convert the rhs value to the type expected by the lhs.
If cp points to a address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
Right.
So when cp is casted to a pointer to Header and assigned to up, it actually says up is a pointer that points to 0x1234, assumes in 32-bits system, each memory address takes 4 bytes, a Header object will use 4 memory address to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a Header object, is this correct?
Wrong. C is strictly byte addressed. The Header object will occupy addresses 0x1234 through 0x1234 + sizeof(Header) - 1, inclusive, whatever sizeof(Header) is (perhaps 16 in this case). To confuse the issue, adding 1 to any pointer increments its numeric value by the sizeof whatever it points to, so up + 1 points just past the first Header object (that is the next Header slot if more than one was allocated, and one past the end of the allocated memory otherwise). However, cp + 1 points to the second byte in the representation of the Header object, whatever that is. (sizeof(char) is 1 by definition.)
This has nothing to do with your question, but I must warn you that calling sbrk with a nonzero argument will cause your program to crash at some indefinite point after the next call to malloc, realloc or free, and that nearly all standard library functions are allowed to call those functions "under the hood". If you want to make a large allocation directly from the operating system, use mmap.
Below is a simple code snippet:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p;
    p = malloc(sizeof(int)); // allocate memory for 1 int
    printf("P=%p\tQ=%p", (void *)p, (void *)(p + 2));
    return 0;
}
In one sample run, it gave me the output as below:
P=0x8210008 Q=0x8210010
The starting address of P is 0x8210008, the next byte is 0x8210009, the next is 0x821000A, and the next is 0x821000B, so the 4 bytes for the int end there.
We haven't allocated more memory using malloc.
Then how is p+2 leading us to 0x8210010, which is 8 bytes after P (0x8210008)?
Because it's treating it as an integer-element offset from the pointer. You have allocated an array for a single integer. When you ask for p+2 it's the same as &p[2]. If you want two bytes from the beginning, you need to cast it to char* first:
char *highWordAddr = (char*)p + 2;
First, the fact that you have printed an address does not imply that memory is allocated at that address. You have simply added numbers and produced other numbers.
Second, the number you got by adding two was eight greater than the base address, rather than two greater, because when you add integers to pointers in C, the arithmetic is done in terms of pointed-to elements, not in terms of bytes in memory (unless the pointed-to elements are bytes). Suppose you have an array of int, say int x[8], and you have a pointer to x[3]. Adding two to that pointer produces a pointer to x[5], not a pointer to two bytes beyond the start of x[3]. It is important to remember that C is an abstraction, and the C standard specifies what happens inside that abstraction. Inside the C abstraction, pointer arithmetic works on numbers of elements, not on raw memory addresses. The C implementation (the compiler and the tools that turn C code into program execution) is required to perform whatever operations on raw memory addresses are required to implement the abstraction specified by the C standard. Typically, that means the compiler multiplies an integer by the size of an element when adding it to a pointer. So two is multiplied by four (on a machine where an int is four bytes), and the eight that results is added to the base address.
Third, you cannot rely on this behavior. The C standard defines pointer arithmetic only for pointers that point to objects inside arrays, including one fictitious object at the end of the array. Additionally, pointers to individual objects act like arrays of one element. So, if you have a pointer p that points to an int, you are allowed to calculate p+0 or p+1, because they point to the only object in the array (p+0) and the fictitious object one beyond the last element in the array (p+1). You are not allowed to calculate p-1 or p+2, because these are outside the array. Note that this is not a matter of dereferencing the pointer (attempting to read or write memory at the calculated address): Even merely calculating the address results in behavior that is not defined by the C standard: Your program could crash, it could give you “correct” results, or it could delete all files in your account, and all of those behaviors would be conforming to the C standard.
It is unlikely that merely calculating an out-of-bounds address would produce such weird behavior. However, the standard permits it because some computer processors have unusual address schemes that require more work than simple arithmetic. Perhaps the second-most common address scheme after the flat address space is a base address and offset scheme. In such a scheme, the high 16 bits of a four-byte pointer might contain a base address, and the low 16 bits might contain an offset. For a given base address b and offset o, the corresponding virtual address might be 4096*b+o. (Such a scheme is capable of addressing only 2^20 bytes, and many different values of base and offset can refer to the same address. For example, base 0 and offset 4096 refer to the same address as base 1 and offset 0.) With a base-and-offset scheme, the compiler might implement pointer arithmetic by adding only to the offset and ignoring the base. (Such a C implementation can support arrays only up to 65536 bytes, the extent addressable by the offset alone.) In such an implementation, if you have pointer-to-int p with an encoding of 0x0000fffc (base 0, offset 65532), and int is four bytes, then p+2 will have the value 0x00000004, not the value that is eight greater (0x00010004).
That is an example where pointer arithmetic produces values that you would not expect from a flat-address machine. It is harder to imagine an implementation where pointer arithmetic that is not valid according to the C standard would produce a crash. However, consider an implementation in which memory must be manually swapped by a process, because the processor does not have the hardware to support virtual memory. In such an implementation, pointers might contain addresses of structures in memory that describe disk locations and other information used to manage the memory swapping. In such an implementation, doing pointer arithmetic might require reading the structures in memory, and so doing invalid pointer arithmetic might involve reading invalid addresses.
C is happy to let you do whatever pointer arithmetic you like. Just because p+2 looks like any other address doesn't mean it's valid. In fact, in this case, it's not.
Be very careful any time you see pointer arithmetic that you're not going outside your allocated bounds.
This is called pointer arithmetic. http://www.learncpp.com/cpp-tutorial/68-pointers-arrays-and-pointer-arithmetic/