I have seen that there are many questions related to this topic, but I could not infer an answer, so I decided to ask my first question here on stack overflow. Currently, my question is regarding the bytes and addresses, does each address actually represent one address, meaning that if I would initialize one address e.g. 0x55555555d156 but if I were to initialize an int, it would take 4 addresses, meaning that it will range from e.g. 0x55555555d156 to 0x55555555d160 ? So what confuses me is that, a pointer will hold an address, right?
Let's say the pointer holds the address e.g. 0x55555555d156 and if I were to deference that address I would get the value of that int, right? what about the other 3 addresses, if I deference them? I could not manage to acquire that information by writing a C program.
if I were to deference that address I would get the value of that int, right?
Yes.
what about the other 3 addresses, if I deference them?
If you have int *p = &some_integer;, then *(int *)((char *)p + 1) (dereferencing p "shifted" by one byte) would attempt to read 4 bytes from that new address and interpret them as an integer. Whether your program has permission to read that last byte that's right next to some_integer in memory, is another story: if it doesn't, you'll get a segmentation fault or other memory access issues.
Or you may get no errors and read garbage data.
Example
#include <stdio.h>
int main(void) {
int my_int = 0x12345678;
int *ptr = &my_int;
printf("%x\n", *ptr);
printf("%x\n", *(int *)((char *)ptr + 1));
}
Output:
~/test $ clang so.c && ./a.out
12345678
80123456
^^
|-- This "random" byte was read as part of
--- the "new" int shifted by 1 byte
Different microprocessors have different addressable units of memory. Most, including the x86 series and ARM, are addressable in units of one byte. So, for example, a 32-bit int will be stored in four consecutive memory addresses as you say (LSB first, unless the ARM is set to "Big Endian" mode).
Other processors, like PIC, may have one address point to a 16-bit memory word.
Your C code should probably not make assumptions either way, unless you're sure what the code will be run on.
You can't "dereference an address" -- you can dereference a pointer. A pointer value is an address, but the pointer also has a type. The result of the dereference depends on the pointer's type.
The result of derefencing a pointer is NOT the value stored at the memory location being pointed to. It is an expression that designates an object. This is known as an lvalue in C .
If all of this is unclear; first check that you understand what is happening in the code:
int x = 0;
x = 5;
In the second line, the use of x does not retrieve the value 0. The expression x is an lvalue which means that it designates a region of memory consisting of several bytes. Each byte has its own address. If you output &x you will likely see the same result as if you output the address of the first byte of x (although this is not a Standard requirement), but the types are different.
Whether or not the stored value is retrieved when an lvalue expression appears in the code, depends on the context of the expression. For example if it appears on the left-hand side of the assignment operator, the value is not retrieved.
Once you have understood x = 5; , then *p = 5; behaves identically; the meaning of *p is exactly as if the label x existed for the memory region that p points to.
Related
This question already has answers here:
Why Use Pointers in C?
(4 answers)
Closed 5 years ago.
Why is it we have to dereference a pointer ( that has already been linked to another variable ) every time we want to work on the variable its linked to? If there's a pointer that's linked to a variable ( an int for example ) isn't dereferencing it on-top of everything else a bit redundant? Do standard ( non-dereferenced ) pointers serve their own separate purposes? What can we do with a pointer that hasn't been dereferenced?
example : pPrice vs *pPrice
I'm learning my first programming language C and I've been wondering this for some time.
When working with people who haven't worked with pointers before, I like to point out that a pointer isn't special compared to other primitive types; it's simply an integer, just like ints or chars. What's special about it is how we interpret its value - specifically, we interpret its value as the location (address) of another value. This is a little bit similar to how even though chars are really just integers, we interpret them to be characters based the ASCII encoding. Simply because we interpret a pointer's value to be an address, we can perform operations such as reading or writing the memory at the address it's specifying.
Like you said, we can still access the pointed to memory as usual, but now we have some additional benefits.
For example, we can change the pointer's value, thereby pointing it to a different place in memory. This might be useful if we want to write a generic function that modifies a given object, but you want to dynamically decide which object to pass to it.
Also, the pointer itself is a constant, defined size (usually 32 or 64 bits), yet it could point to an object of any arbitrary size (e.g. a vector, string, or a user defined type). This makes it possible for us to pass around or store a handle to the object without passing/storing the entire object, which might be expensive.
I'm sure there are a million more use cases that I'm leaving out, but these are a couple to get you started.
tl;dr: An integer variable is distinct from a reference to an integer variable.
In C, the idea of a "variable" is very specific. In reality, it is the area in memory which holds the value of a variable. The symbol is a convenient symbol name by which the C code can refer to the value, to the content of the memory area. A "reference" is the address of the memory location. Since C is strongly typed, each symbol indicates how to interpret the memory to which it is associated. (Whew!)
So,
int i;
i refers to a memory location which holds an integer.
i=5;
fills that memory location with the binary representation for 5.
int *p;
means p is associated with a memory location which contains a pointer (aka address) to an integer location. So we can write,
p = &i;
where &i is the explicit address of the memory location of the integer value. So, p and i refer to quite different kinds of things. Since p is the address of an integer, we can dereference it (i.e., follow the address) to get to the actual integer value at the address location (the one assocated with i).
*p = 6;
i = 6;
Both assignment statements are functionally equivalent (they both set the same memory location to integer 6) since both i and the dereferenced p, *p, refer to the same memory location. Noteably, i is inextricably tied to the memory location, while *p can point elsewhere (though, p is inextricably to the integer-pointer memory location that refers to integers).
pPrice may be modified to point to a different price.
int a = 1,
b = 2,
*p = &a;
printf("%d\n", *p);
p = &b;
printf("%d\n", *p);
What is the difference between these two code samples? When I print the variable p, it prints the assigned value like below.
int *p;
p = 51;
printf("%d",p);
Output: 51
When I try to assign p=15, am I making memory address "15" in the ram as a pointee to the pointer p? When I try to add int c = 5 +p; it gives output as 71. Why am I getting 71?
I thought that the memory address "15" could store any information of the OS, programs, etc. But it exactly stores int for precise. Though I change the value p = 150; it gives int . How is that possible? What's happening under the hood?! I really don't understand.
Your code is illegal. Formally, it is not C. C language prohibits assigning integral values to pointer types without an explicit cast (with the exception of constant 0)
You can do
p = (int *) 51;
(with implementation-defined effects), but you cannot do
p = 51;
If your compiler allows the latter variant, it is a compiler-specific extension that has nothing to do with standard C language.
Typically, such assignment makes p to point to address 51 in memory.
On top of that, it is illegal to print pointer values with %d format specifier in printf. Either use %p or cast pointer value to proper integer type before using integer-specific format specifiers.
So you're telling that pointer that it points to 0x15. Then, you tell printf to print it as a decimal integer, so it treats it as such.
This reason this works is that on a 32 bit system, a pointer is 4 bytes, which matches the size of an int.
p points to a place in memory. *p is the contents of that space. But you never use the contents, only the pointer.
That pointer can be viewed as just a number, so printf("%d",p) works. When you assign a number to it, it interprets that as an offset into memory (in bytes). However, the pointer is supposed to contain ints, and when you add a number to a pointer, the pointer advances by that many spaces. So p+5 means "point to the int 5 spaces past the one you're pointing at now", which for 4-byte ints means 20 bytes later, hence the 71.
Otherwise, you've said you have a pointer to an int, but you're actually just doing all your stuff to the pointer, not the int it's pointing to.
If you actually put anything into the place you were pointing, you'd run into all kinds of trouble. You need to allocate some unused memory for it (e.g. with malloc), and then read and write values to that memory using *p.
I understand that in c, a pointer points to a memory address. In the following code
char *cp;
someType *up; // Assuming that someType is a union of size 16 bytes
cp = sbrk(nu * sizeof(someType));
// Here is what confuses me
up = (someType *)cp;
According to this link http://en.wikibooks.org/wiki/C_Programming/Pointers_and_arrays
If cp points to an address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
So when "cp" is casted to a pointer to someType and assigned to "up", it actually says "up" is a pointer that points to 0x1234, assumes in 32-bits system, each memory address takes 4 bytes, a someType object will use 4 memory address to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a someType object, is this correct?
Thanks in advance.
Pointer type casting doesn't do anything at the machine level. The value in memory is still just a plain old value in memory.
At the language level, that value is used as an address. In other words it is often used as the value passed to operations that require memory locations, such as the assembly "load" operation.
At the language level, extra constructs are added to the machine operations, and one of those "extras" are data types. The compiler is responsible for checking if type-rule violations exist, but at run time there is not such a constraint.
As a result, the cast does nothing at run time, but at compile time it directs the compiler to not emit an error as the type of the pointer's value will now be considered a compatible type to the variable holding the address.
Even on a 32-bit system, a memory address typically refers to a single byte, not four. So your Header will be covering the 16 memory addresses between 0x1234 and 0x1243.
To answer your questions:
cp points to the beginning of the newly allocated memory.
However, up will contain the same address as cp. Note it is bad form to try and assign a pointer from one type to a different type without "casting" it first - just to tell the compiler you really meant to do such a dangerous thing.
Typically assigning between types can cause all kinds of problems - as a result compilers will throw warnings if you do this without casting. The casting, however, doesn't actually have any effect on the program code itself - it is more a method of the programmer telling the compiler they really meant what they did.
By the way, the pointer up is allocated on the stack in your function. Assigning a value to up actually places the value in the memory location on the stack. So by assigning cp to up you are merely copying the value of the pointer in cp to the location in the stack assigned to up. There is nothing placed into the memory allocated by the sbrk() call.
First of all, remember that a pointer is an abstraction of a memory address; do not assume a one-to-one correspondence between a pointer value and a physical location in RAM. Pointers also have type semantics associated with them; adding 1 to a pointer value advances it to point to the next object of the pointed-to type:
char *cp;
int *ip;
struct huge *hp;
cp = cp + 1; // advances cp to the address of the next char (1 byte)
ip = ip + 1; // advances ip to the address of the next int (anywhere
// from 2 to 8 bytes depending on the architecture)
hp = hp + 1; // advances hp to the address of the next struct huge (however
// many bytes struct huge takes up)
This is exactly how array indexing works; the expression a[i] is treated as *(a + i); you offset i elements of whatever size from the base address, not i bytes.
As far as the cast is concerned...
A pointer to char is a different, incompatible type than a pointer to Header. They may have different sizes and representations depending on the underlying architecture. Because of this, the language won't allow an implicit conversion of one to the other through a simple assignment; you must use a cast to convert the rhs value to the type expected by the lhs.
If cp points to a address of 0x1234, so 0x1234 should be the beginning of the newly allocated memory, right?
Right.
So when cp is casted to a pointer to Header and assigned to up, it actually says up is a pointer that points to 0x1234, assumes in 32-bits system, each memory address takes 4 bytes, a Header object will use 4 memory address to store its value, so the addresses 0x1234, 0x1235, 0x1236, 0x1237 collectively store a Header object, is this correct?
Wrong. C is strictly byte addressed. The Header object will occupy addresses 0x1234 through 0x1234 + sizeof(Header) - 1, inclusive, whatever sizeof(Header) is (perhaps 16 in this case). To confuse the issue, adding 1 to any pointer increments its numeric value by the sizeof whatever it points to, so it is the case that up + 1 points beyond the end of allocated memory. However cp + 1 points to the second byte in the representation of the header object, whatever that is. (sizeof(char) is 1 by definition.)
This has nothing to do with your question, but I must warn you that calling sbrk with a nonzero argument will cause your program to crash at some indefinite point after the next call to malloc, realloc or free, and that nearly all standard library functions are allowed to call those functions "under the hood". If you want to make a large allocation directly from the operating system, use mmap.
I found this declaration in a C program
char huge * far *p;
Explanation: p is huge pointer, *p is far pointer and **p is char type
data variable.
Please explain declaration in more detail.
PS: I'm not asking about huge or far pointer here. I'm a newbie to programming
**p is character.Now a pointer pointing to address of this character will have value &(**p). Again if you want to take pointer to this pointer then next will be &(*p) and result will be p only.
If you read below sentence from right to left, you will get it all
p is huge pointer, *p is far pointer and **p is char type data variable.
In a nutshell virtual addresses on an Intel x86 chip have two components - a selector and an offset. The selector is an index into a table of base addresses [2] and the offset is added onto that base address. This was designed to let the processor access 20 bit (on a 8086/8, 186), 30 bit (286) or 46 bit (386 and later) virtual address spaces without needing registers that big.
'far' pointers have an explicit selector. However when you do pointer arithmetic on them the selector isn't modified.
'huge' pointers have an explicit selector. When you do pointer arithmetic on them though the selector can change.
Huge and far pointers are not part of standard C. They are borland extensions to the C language for managing segmented memory in DOS and Windows 16/32bit. Functionally, what the declaration says is **p is a char. That means "dereference p to get a pointer to char, dereference that to get a char"
In order to understand C pointer declarator semantics try this expression instead:
int* p;
It means p is a pointer to int. (hint: read from right to left)
int* const p;
This means p is a const pointer to int. (so you can't change the value of p)
Here's a proof of that:
p= 42; // error: assignment of read-only variable āpā
Another example:
int* const* lol;
This means lol is a pointer to const pointer to int. So the pointer which lol points at cannot point at another int.
lol= &p; // and yes, p cannot be reassigned, so we are correct.
In most cases reading from right to left makes sense. Now read the expression in question from right to left:
char huge * far *p;
Now the huge and far are just behaviour specifiers for pointers created by borland. What it actually means is
char** p;
"p is a pointer to pointer to char"
That means whatever p points to, points to a char.
Back in the 16-bit days on 8086, it would have declared a 32-bit pointer to a "normalized" 32-bit pointer to char (or to the first of an array thereof).
The difference exists because 32-bit pointers were composed of a segment number and offset between that segment, and segments overlapped (which meant two different pointers could point to the same physical address; example: 0x1200:1000 and 0x1300:0000). The huge qualifier forced a normalization using the highest segment number (and therefore, the lowest possible offset).
However, this normalization had a cost performance-wise, because after each operation that modified a pointer, the compiler had to automatically insert a code like this:
ptr = normalize(ptr);
with:
void huge * normalize(void huge *input)
{
unsigned long input2 = (unsigned long)input;
unsigned short segment = input >> 16;
unsigned short offset = (unsigned short)input;
segment += (offset >> 4);
offset &= 0x000F;
return ((unsigned long)segment) << 16 | offset;
}
The upside was the advantage of using your memory like it was flat, without worrying about segments and offsets.
Clarification to the other answers:
The far keyword is non-standard C, but it is not just an old obsolete extension from ancient PC days. Today, there are many modern 8 and 16 bit CPUs that uses "banked" memory to extend the amount of addressable memory beyond 65k. Typically they use a special register to pick a memory bank, effectively ending up with 24-bit addresses. All small microcontrollers on the market with RAM+flash memory > 64kb use such features.
I got surprised when the following program did not crash.
typedef struct _x {
int a;
char b;
int c;
} x;
main() {
x *ptr = 0;
char *d = &ptr->b;
}
As per my understanding the -> operator has higher precedence over & operator. So I expected the program to crash at the below statement when we try to dereference the NULL pointer tr.
char *d = &ptr->b;
But the statement &ptr->b evaluates to a valid address. Could somebody please explain where I'm wrong?
Your expectations were unfounded. C programs don't necessarily "crash" when you dereference null pointers. C programs exhibit so called undefined behavior when you attempt to do something like that. Undefined behavior can manifest itself in many different ways. It can result in a crash. Or it can produce something that even resembles a "working" program. The latter is what apparently happened in your case.
But in any case, your program's behavior is undefined. And no, it does not produce a "valid address" as you seem to mistakingly believe. A numerical address that corresponds to a location in memory where no object exists is not valid (with the exception of null pointer value, of course).
The reason that your code doesn't crash is that you didn't actually dereference the pointer. Notice that the expression
&ptr->b
doesn't actually try loading the contents of ptr or ptr->b. Instead, it just stores the address of where this is located in memory. What you'll end up getting is a pointer to where the b field of the object pointed at by ptr should be. This will be a few bytes past address 0, so dereferencing the pointer you just created will cause a segfault.
&ptr->b == sizeof(int), that means the offset of b within _x after _x.a (which is of type int) relative to the address *((x*)0). The offset of 4 (typical for 32bit architecture) is saved within the d pointer. You have to access to d in order to get an seg-fault.
Computing an address does not require accessing memory. &ptr->b means "give me the address of the b field of the structure pointed to by ptr." Doing so does not require looking at whatever may be stored in that memory location.
It may be helpful to think about indexing an array instead of a structure. C defines ptr[5] as equivalent to *(ptr + 5) , which means that &(ptr[5]) is the same as &(*(ptr + 5)). Now it's easy to see that the & and the * "cancel out" and leave you with (ptr + 5), which involves only a pointer increment, and not a load from memory.
C makes this slightly cloudy because it distinguishes lvalues from rvalues. That is, an expression that refers to memory is treated differently on the left hand side of an expression than it is on the right. Given a statement like x = y;, a C compiler will load a value from the address of y, and store it in the address of x. This is the distinction: y is implicitly dereferenced, but x is not.