Accessing struct member of pointer's address in C - c

I'm reading this book, and I found this code snippet in Chapter 14.
struct kobject *cdev_get(struct cdev *p)
{
struct module *owner = p->owner;
struct kobject *kobj;
if (owner && !try_module_get(owner))
return NULL;
kobj = kobject_get(&p->kobj);
if (!kobj)
module_put(owner);
return kobj;
}
I understand that this dereferences p, a cdev pointer then accesses its owner member
p->owner // (*p).owner
However, how does this work? It seems like it dereferences the memory address of a cdev pointer then access the kobj member of the pointer itself?
&p->kobj // (*(&p)).kobj
I thought pointers weren't much more than memory addresses so I don't understand how they can have members. And if it was trying to access a member of the pointer itself, why not just do p.kobj?

As per p being defined as struct cdev *p, p is very much a "memory address" but that's not all it is - it also has a type attached to it.
Since the expression *ptr is "the object pointed to by ptr", that also has the type attached, so you can logically do (*ptr).member.
And, since ptr->member is identical to (*ptr).member, it too is valid.
Bottom line is, your contention that "pointers [aren't] much more than memory addresses" is correct. But they are a little bit more :-)
In terms of &ptr->member, you seem to be reading that as (&ptr)->member, which is not correct.
Instead, as per C precedence rules, it is actually &(ptr->member), which means the address of the member of that structure.
These precedence rules are actually specified by the ISO C standard (C11 in this case). From 6.5 Expressions, footnote 85:
The syntax specifies the precedence of operators in the evaluation of an expression, which is the same as the order of the major subclauses of this subclause, highest precedence first.
And, since 6.5.2 Postfix operators (the bit covering ->) comes before 6.5.3 Unary operators (the bit covering &), that means -> evaluates first.

A pointer variable contains a memory address. What you need to consider is how the C programming language is used to write source code in a higher level language that is then converted for you into the machine code actually used by the computer.
The C programming language is a language that was designed to make using the hardware of computers easier than using assembly code or machine code. So it has language features to make it easier to write source code that is more readable and easier to understand than assembly code.
When we declare a pointer variable in C as a pointer to a type what we are telling the compiler is the type of the data at the memory location whose address is stored in the pointer. However the compiler does not really know if we are telling it the truth or not. The key thing to remember is that an actual memory address does not have a type, it is just an address. Any type information is lost once the compiler compiles the source code into machine code.
A struct is a kind of template or pattern or stencil that is used to virtually overlay a memory area to determine how the bytes in the memory area are to be interpreted. A programmer can use higher level language features when working with data without having to know about memory addresses and offsets.
If a variable is defined as the struct type then a memory area large enough to hold the struct is allocated and the compiler will figure out member offsets for you. If a variable is defined as a pointer to a memory area that is supposed to contain the data for that type again the compiler will figure out member offsets for you. However it is up to you to have the pointer variable containing the correct address.
So if you have a struct something like the following:
struct _tagStruct {
short sOne;
short sTwo;
};
And you then use it such as:
struct _tagStruct one; // allocate a memory area large enough for a struct
struct _tagStruct two; // allocate a memory area large enough for a struct
struct _tagStruct *three; // a pointer to a memory area to be interpreted as a struct
one.sOne = 5; // assign a value to this memory area interpreted as a short
one.sTwo = 7; // assign a value to this memory area interpreted as a
two = one; // make a copy of the one memory area in another
three = &one; // assign an address of a memory area to our pointer
three->sOne = 405; // modify the memory area pointed to, one.sOne in this case
You do not need to worry about the details of the memory layout of the struct and offsets to the struct members. And assigning one struct to another is merely an assignment statement. So this all works at a human level rather than a machine level of thinking.
However what if I have a function, short funcOne (short *inoutOne), that I want to use with the sOne member of the struct one? I can just do this funcOne(&one.sOne) which calls the function funcOne() with the address of the sOne member of the struct _tagStruct variable one.
A typical implementation of this in machine code is to load the address of the variable one into a register, add the offset to the member sOne and then call the function funcOne() with this calculated address.
I could also do something similar with a pointer, funcOne(&three->sOne).
A typical implementation of this in machine code is to load the contents of the pointer variable three into a register, add the offset to the member sOne and then call the function funcOne() with this calculated address.
So in one case we load the address of a variable into a register before adding the offset and in the second case we load the contents of a variable into a register before adding the offset. In both cases the compiler is using an offset which is usually the number of bytes from the beginning of the struct to the member of the struct. In the case of the first member, sOne of struct _tagStruct this offset would be zero bytes since it is the first member of the struct. For many compilers the offset of the second member, sTwo, would be two bytes since the size of a short is two bytes.
However the compiler is free to make choices about the layout of a struct unless explicitly told otherwise so on some computers the offset of member sTwo may be four bytes in order to generate more efficient machine code.
So using the C programming language allows us some degree of independence from the underlying computer hardware unless there is some reason for us to actually deal with those details.
The C language standard specifies operator precedence meaning when different operators are mixed together in a statement and parenthesis are not used to specify an exact order of evaluation on the expression then the compiler will use these standard rules to determine how to turn the C language expression into the proper machine code (see Operator precedence table for the C programming language ).
Both the dot (.) operator and the dereference (->) operator have equal precedence as well as the highest precedence of the operators. So when you write an expression such as &three->sOne then what the compiler does is turn it into an express that looks like &(three->sOne). This is using the address of operator to calculate an address of the sOne member of the memory area pointed to by the pointer variable three.
A different expression would be (&three)->sOne which actually should throw a compiler error since &three is not a pointer to a memory area holding a struct _tagStruct value but is instead a pointer to a pointer since three is a pointer to a variable of type struct _tagStruct and not a variable of type struct _tagStruct.

->member has higher precedence than &.
&p->kobj
parses as
&(p->kobj)
i.e. it's taking the address of the kobj member of the struct pointed to by p.

You have the order of operations wrong:
&p->kobj // &(p->kobj)

Related

Does C always have to use pointers to handle addresses?

As I understand it, all of the cases where C has to handle an address involve the use of a pointer. For example, the & operand creates a pointer to the program, instead of just giving the bare address as data (i.e it never gives the address without using a pointer first):
scanf("%d", &foo)
Or when using the & operand
int i; //a variable
int *p; //a variable that store adress
p = &i; //The & operator returns a pointer to its operand, and equals p to that pointer.
My question is: Is there a reason why C programs always have to use a pointer to manage addresses? Is there a case where C can handle a bare address (the numerical value of the address) on its own or with another method? Or is that completely impossible? (Being because of system architecture, memory allocation changing during and in each runtime, etc). And finally, would that be useful being that addresses change because of memory management? If that was the case, it would be a reason why pointers are always needed.
I'm trying to figure out if the use pointers is a must in C standardized languages. Not because I want to use something else, but because I want to know for sure that the only way to use addresses is with pointers, and just forget about everything else.
Edit: Since part of the question was answered in the comments of Eric Postpischil, Michał Marszałek, user3386109, Mike Holt and Gecko; I'll group those bits here: Yes, using bare adresses bear little to no use because of different factors (Pointers allow a number of operations, adresses may change each time the program is run, etc). As Michał Marszałek pointed out (No pun intended) scanf() uses a pointer because C can only work with copies, so a pointer is needed to change the variable used. i.e
int foo;
scanf("%d", foo) //Does nothing, since value can't be changed
scanf("%d", &foo) //Now foo can be changed, since we use it's address.
Finally, as Gecko mentioned, pointers are there to represent indirection, so that the compiler can make the difference between data and address.
John Bode covers most of those topics in it's answer, so I'll mark that one.
A pointer is an address (or, more properly, it’s an abstraction of an address). Pointers are how we deal with address values in C.
Outside of a few domains, a “bare address” value simply isn’t useful on its own. We’re less interested in the address than the object at that address. C requires us to use pointers in two situations:
When we want a function to write to a parameter
When we need to track dynamically allocated memory
In these cases, we don’t really care what the address value actually is; we just need it to access the object we’re interested in.
Yes, in the embedded world specific address values are meaningful. But you still use pointers to access those locations. Like I said above, a pointer is an address for our purposes.
C allows you to convert pointers to integers. The <stdint.h> header provides a uintptr_t type with the property that any pointer to void can be converted to uintptr_t and back, and the result will compare equal to the original pointer.
Per C 2018 6.3.2.3 6, the result of converting a pointer to an integer is implementation-defined. Non-normative note 69 says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.”
Thus, on a machine where addresses are a simple numbering scheme, converting a pointer to a uintptr_t ought to give you the natural machine address, even though the standard does not require it. There are, however, environments where addresses are more complicated, and the result of converting a pointer to an integer may not be straightforward.
int i; //a variable
int *p; //a variable that store adres
i = 10; //now i is set to 10
p = &i; //now p is set to i address
*p = 20; //we set to 20 the given address
int tab[10]; // a table
p = tab; //set address
p++; //operate on address and move it to next element tab[1]
We can operate on address by pointers move forward or backwards. We can set and read from given address.
In C if we want get return values from functions we must use pointers. Or use return value from functions, but that way we can only get one value.
In C we don't have references therefore we must use pointers.
void fun(int j){
j = 10;
}
void fun2(int *j){
*j = 10;
}
int i;
i = 5; // now I is set to 5
fun(i);
//printf i will print 5
fun2(&i);
//printf I will print 10

C sizes of pointer and value being pointed mismatch

IAR Embedded Workbench for msp430. selected C standard 99
Hello, I am new to pointers and stuck in one place. Here is a part of code:
void read_SPI_CS_P2(uint8_t read_from, int save_to, uint8_t bytes_to_read)
{
uint8_t * ptr;
ptr = save_to;
...
From what I read about pointers I assume that:
uint8_t * ptr; - here I declare what type of data pointer points to (I wanna save uint8_t value)
ptr = save_to; - here I assign adress of memory I would like to write to (it's 0xF900 so int)
It gives me an Error[Pe513]: a value of type "int" cannot be assigned to an entity of type "uint8_t *"
The question is.. why? Size of data that will be saved (to save_to) and size of memory adress can't be different?
From a syntax point of view you can cast an int directly to a pointer changing its value. but is not probably a good idea or something that will help you in this case.
You've to use the compiler and the linker support to instruct them that you want some data at a specific location in memory. Usually you can do this (with IAR toolchains) with #pragma location syntax, using something like:
__no_init volatile uint8_t g_u8SavingLocation # 0xF900;
You cannot simply set the value of a pointer in memory and start to write at that location. The Linker is in charge to decide where stuffs goes in memory, and you can instruct with #pragma and linker setup files on what you want to achieve.
C language has no immediate feature that would allow you to create pointers to arbitrary numerical addresses. In C language pointers are obtained by taking addresses of existing objects (using unary & operator) or by receiving pointers from memory allocation functions like malloc. Additionally, you can create null pointers. In all such cases you never know and never need to know the actual numerical value of the pointer.
If you want to make your pointer to point to a specific address, then formally the abstract C language cannot help you with it. There's simply no such feature in the language. However, in the implementation-dependent portions of the language a specific compiler can guarantee, that if you forcefully convert an integral value (containing a numerical address) to pointer type, the resultant pointer will point to that address. So, if in your case the numerical address is stored in save_to variable, you can force it into a pointer by using an explicit cast
ptr = (unit8_t *) save_to;
The cast is required. Assigning integral values to pointers directly is not allowed in standard C. (It used to be allowed in pre-standard C though).
A better type to store integral addresses would be uintptr_t. int, or any other signed type, is certainly not a good choice for that purpose.
P.S. I'm not sure what point you are trying to make when you talk about sizes. There's no relation between the "size of data that will be saved" and size of memory address. Why?

Dereferencing in C

I've just started to learn C so please be kind.
From what I've read so far regarding pointers:
int * test1; //this is a pointer which is basically an address to the process
//memory and usually has the size of 2 bytes (not necessarily, I know)
float test2; //this is an actual value and usually has the size of 4 bytes,
//being of float type
test2 = 3.0; //this assigns 3 to `test2`
Now, what I don't completely understand:
*test1 = 3; //does this assign 3 at the address
//specified by `pointerValue`?
test1 = 3; //this says that the pointer is basically pointing
//at the 3rd byte in process memory,
//which is somehow useless, since anything could be there
&test1; //this I really don't get,
//is it the pointer to the pointer?
//Meaning, the address at which the pointer address is kept?
//Is it of any use?
Similarly:
*test2; //does this has any sense?
&test2; //is this the address at which the 'test2' value is found?
//If so, it's a pointer, which means that you can have pointers pointing
//both to the heap address space and stack address space.
//I ask because I've always been confused by people who speak about
//pointers only in the heap context.
Great question.
Your first block is correct. A pointer is a variable that holds the address of some data. The type of that pointer tells the code how to interpret the contents of the address being held by that pointer.
The construct:
*test1 = 3
Is called the deferencing of a pointer. That means, you can access the address that the pointer points to and read and write to it like a normal variable. Note:
int *test;
/*
* test is a pointer to an int - (int *)
* *test behaves like an int - (int)
*
* So you can thing of (*test) as a pesudo-variable which has the type 'int'
*/
The above is just a mnemonic device that I use.
It is rare that you ever assign a numeric value to a pointer... maybe if you're developing for a specific environment which has some 'well-known' memory addresses, but at your level, I wouldn't worry to much about that.
Using
*test2
would ultimately result in an error. You'd be trying to deference something that is not a pointer, so you're likely to get some kind of system error as who knows where it is pointing.
&test1 and &test2 are, indeed, pointers to test1 and test2.
Pointers to pointers are very useful and a search of pointer to a pointer will lead you to some resources that are way better than I am.
It looks like you've got the first part right.
An incidental thought: there are various conventions about where to put that * sign. I prefer mine nestled with the variable name, as in int *test1 while others prefer int* test1. I'm not sure how common it is to have it floating in the middle.
Another incidental thought: test2 = 3.0 assigns a floating-point 3 to test2. The same end could be achieved with test2=3, in which case the 3 is implicitly converted from an integer to a floating point number. The convention you have chosen is probably safer in terms of clarity, but is not strictly necessary.
Non-incidentals
*test1=3 does assign 3 to the address specified by test.
test1=3 is a line that has meaning, but which I consider meaningless. We do not know what is at memory location 3, if it is safe to touch it, or even if we are allowed to touch it.
That's why it's handy to use something like
int var=3;
int *pointy=&var;
*pointy=4;
//Now var==4.
The command &var returns the memory location of var and stores it in pointy so that we can later access it with *pointy.
But I could also do something like this:
int var[]={1,2,3};
int *pointy=&var;
int *offset=2;
*(pointy+offset)=4;
//Now var[2]==4.
And this is where you might legitimately see something like test1=3: pointers can be added and subtracted just like numbers, so you can store offsets like this.
&test1 is a pointer to a pointer, but that sounds kind of confusing to me. It's really the address in memory where the value of test1 is stored. And test1 just happens to store as its value the address of another variable. Once you start thinking of pointers in this way (address in memory, value stored there), they become easier to work with... or at least I think so.
I don't know if *test2 has "meaning", per se. In principle, it could have a use in that we might imagine that the * command will take the value of test2 to be some location in memory, and it will return the value it finds there. But since you define test2 as a float, it is difficult to predict where in memory we would end up, setting test2=3 will not move us to the third spot of anything (look up the IEEE754 specification to see why). But I would be surprised if a compiler would allow such thing.
Let's look at another quick example:
int var=3;
int pointy1=&var;
int pointy2=&pointy1;
*pointy1=4; //Now var==4
**pointy2=5; //Now var==5
So you see that you can chain pointers together like this, as many in a row as you'd like. This might show up if you had an array of pointers which was filled with the addresses of many structures you'd created from dynamic memory, and those structures contained pointers to dynamically allocated things themselves. When the time comes to use a pointer to a pointer, you'll probably know it. For now, don't worry too much about them.
First let's add some confusion: the word "pointer" can refer to either a variable (or object) with a pointer type, or an expression with the pointer type. In most cases, when people talk about "pointers" they mean pointer variables.
A pointer can (must) point to a thing (An "object" in standards parlance). It can only point to the right kind of thing; a pointer to int is not supposed to point to a float object. A pointer can also be NULL; in that case there is no thing to point to.
A pointertype is also a type, and a pointer object is also an object. So it is allowable to construct a pointer to pointer: the pointer-to-pointer just stores the addres of the pointer object.
What a pointer can not be:
It cannot point to a value: p = &4; is impossible. 4 is a literal value, which is not stored in an object, and thus has no address.
the same goes for expressions: p = &(1+4); is impossible, because the expression "1+4" does not have a location.
the same goes for return value p = &sin(pi); is impossible; the return value is not an object and thus has no address.
variables marked as "register" (almost distinct now) cannot have an address.
you cannot take the address of a bitfield, basically because these can be smaller than character (or have a finer granularity), hence it would be possible that different bitmasks would have the same address.
There are some "exceptions" to the above skeletton (void pointers, casting, pointing one element beyond an array object) but for clarity these should be seen as refinements/amendments, IMHO.

How does the OS know how much to increment different pointers?

With a 32-bit OS, we know that the pointer size is 4 bytes, so sizeof(char*) is 4 and sizeof(int*) is 4, etc. We also know that when you increment a char*, the byte address (offset) changes by sizeof(char); when you increment an int*, the byte address changes by sizeof(int).
My question is:
How does the OS know how much to increment the byte address for sizeof(YourType)?
The compiler only knows how to increment a pointer of type YourType * if it knows the size of YourType, which is the case if and only if the complete definition of YourType is known to the compiler at this point.
For example, if we have:
struct YourType *a;
struct YourOtherType *b;
struct YourType {
int x;
char y;
};
Then you are allowed to do this:
a++;
but you are not allowed to do this:
b++;
..since struct YourType is a complete type, but struct YourOtherType is an incomplete type.
The error given by gcc for the line b++; is:
error: arithmetic on pointer to an incomplete type
The OS doesn't really have anything to do with that - it's the compiler's job (as #zneak mentioned).
The compiler knows because it just compiled that struct or class - the size is, in the struct case, pretty much the sum of the sizes of all the struct's contents.
It is primarily an issue for the C (or C++) compiler, and not primarily an issue for the OS per se.
The compiler knows its alignment rules for the basic types, and applies those rules to any type you create. It can therefore establish the alignment requirement and size of YourType, and it will ensure that it increments any YourType* variable by the correct value. The alignment rules vary by hardware (CPU), and the compiler is responsible for knowing which rules to apply.
One key point is that the size of YourType must be such that when you have an array:
YourType array[20];
then &array[1] == &array[0] + 1. The byte address of &array[1] must be incremented by sizeof(YourType), and (assuming YourType is a structure), each of the elements of array[1] must be properly aligned, just as the elements of array[0] must be properly aligned.
Also remember types are defined in your compiled code to match the hardware you are working on. It is entirely up to the source code that is used to work this out.
So a low end chipset 16 bit targeted C program might have need to define types differently to a 32 bit system.
The programming language and compiler are what govern your types. Not the OS or hardware.
Although of course trying to stick a 32 bit number into a 16 bit register could be a problem!
C pointers are typed, unlike some old languages like PL/1. This not only allows the size of the object to be known, but so widening operations and formatting can be carried out. For example getting the data at *p, is that a float, a double, or a char? The compiler needs to know (think divisions, for example).
Of course we do have a typeless pointer, a void *, which you cannot do any arithmetic with simply because the compiler has no idea how much to add to the address.

Real thing about "->" and "."

I always wanted to know what is the real thing difference of how the compiler see a pointer to a struct (in C suppose) and a struct itself.
struct person p;
struct person *pp;
pp->age, I always imagine that the compiler does: "value of pp + offset of atribute "age" in the struct".
But what it does with person.p? It would be almost the same. For me "the programmer", p is not a memory address, its like "the structure itself", but of course this is not how the compiler deal with it.
My guess is it's more of a syntactic thing, and the compiler always does (&p)->age.
I'm correct?
p->q is essentially syntactic sugar for (*p).q in that it dereferences the pointer p and then goes to the proper field q within it. It saves typing for a very common case (pointers to structs).
In essence, -> does two deferences (pointer dereference, field dereference) while . only does one (field dereference).
Due to the multiple-dereference factor, -> can't be completely replaced with a static address by the compiler and will always include at least address computation (pointers can change dynamically at runtime, thus the locations will also change), whereas in some cases, . operations can be replaced by the compiler with an access to a fixed address (since the base struct's address can be fixed as well).
Updated (see comments):
You have the right idea, but there is an important difference for global and static variables only: when the compiler sees p.age for a global or static variable, it can replace it, at compile time, with the exact address of the age field within the struct.
In contrast, pp->age must be compiled as "value of pp + offset of age field", since the value of pp can change at runtime.
The two statements are not equivalent, even from the "compiler perspective". The statement p.age translates to the address of p + the offset of age, while pp->age translates to the address contained in pp + the offset of age.
The address of a variable and the address contained in a (pointer) variable are very different things.
Say the offset of age is 5. If p is a structure, its address might be 100, so p.age references address 105.
But if pp is a pointer to a structure, its address might be 100, but the value stored at address 100 is not the beginning of a person structure, it's a pointer. So the value at address 100 (the address contained in pp) might be, for example, 250. In that case, pp->age references address 255, not 105.
Since p is a local (automatic) variable, it is stored in the stack. Therefore the compiler accesses it in terms of offset with regard to the stack pointer (SP) or frame pointer (FP or BP, in architectures where it exists). In contrast, *p refers to a memory address [usually] allocated in the heap, so the stack registers are not used.
This is a question I've always asked myself.
v.x, the member operator, is valid only for structs.
v->x, the member of pointer operator, is valid only for struct pointers.
So why have two different operators, since only one is needed? For example, only the . operator could be used; the compiler always knows the type of v, so it knows what to do: v.x if v is a struct, (*v).x if v is a struct pointer.
I have three theories:
temporary shortsightedness by K&R (which theory I'd like to be false)
making the job easier for the compiler (a practical theory, given the conception time of C :)
readability (which theory I prefer)
Unfortunately, I don't know which one (if any) is true.
In both cases the structure and its members are addressed by
address(person) + offset(age)
Using p with a struct stored in the stack memory gives the compiler more options to optimize memory usage. It could store the age only, instead of the whole struct if nothing else is used - this makes addressing with the above function fail (I think reading the address of a struct stops this optimization).
A struct on the stack may have no memory address at all. If the struct is small enough and only lives a short time it can be mapped to some of the processors registers (same as for the optimization above for reading the address).
The short answer: when the compiler does not optimize you are right. As soon as the compiler starts optimizing only what the c standard specifies is guaranteed.
Edit: Removed flawed stack/heap location for "pp->" since the pointed to struct can be on both heap and stack.

Resources