What is happening in scanf() when using the & operator? - c

I am new to C programming, and have a question about the following couple of lines of code. This takes place in the context of creating a linked list of struct film:
struct film {
    char title[TSIZE];
    int rating;
    struct film * next;
};
int main(void)
{
    struct film * head = NULL;
    struct film * prev, *current;
    char input[TSIZE];
    // some code omitted
    strcopy(current->title, input);
    puts("Enter your rating <0-10>");
    scanf("%d", &current->rating);
}
Basically my question is about the strcopy() and scanf() functions. I notice that with strcopy the first parameter is using the member access operator -> on a pointer to the struct. I believe that the first argument to strcopy is supposed to be a pointer to char, so when using the member access operator, are we getting a direct pointer to title even though title is not declared as a pointer inside the struct?
I am confused about how this contrasts with the scanf() call where we use the & operator to get the address of current->rating. Is scanf() taking the address of the struct pointer then doing member access or is it the address of the structs member 'rating'? Why not just pass in the pointer similarly to strcopy()?
I'd imagine there is a difference between doing &current->rating vs &(current->rating)? Is &current->rating the address of a pointer (kind of like a pointer to pointer)?
Thanks in advance.

When you pass an array as a function argument in C, the compiler actually passes a pointer to the first element in the array. In that context, arrayVar is the same as &arrayVar[0]. So, the strcopy function's first argument is a pointer to the first element of the character array title in the structure pointed to by current.
When you pass an int to a function, you are simply passing the value of the variable, not a pointer to it. Since scanf requires a pointer that it will store the value in, you have to use & to get a pointer to the variable instead. The second argument to scanf is a pointer to the integer rating in the structure pointed to by current.
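A minimal sketch of the two cases, assuming TSIZE is 45 (the question does not show its value) and that strcopy in the question stands for the standard strcpy family. The helper name fill_film is hypothetical, and sscanf is used in place of scanf so the example does not depend on keyboard input:

```c
#include <stdio.h>
#include <string.h>

#define TSIZE 45  /* assumed value; the question does not show it */

struct film {
    char title[TSIZE];
    int rating;
    struct film *next;
};

/* Fills *f from text; returns 1 on success, 0 if the rating did not parse. */
int fill_film(struct film *f, const char *title, const char *rating_text)
{
    /* f->title is an array; here it decays to char *, so no & is needed. */
    strncpy(f->title, title, TSIZE - 1);
    f->title[TSIZE - 1] = '\0';

    /* f->rating is a plain int; sscanf needs its address to store into it. */
    return sscanf(rating_text, "%d", &f->rating) == 1;
}
```

The array argument and the &-prefixed int argument both end up as pointers; the only difference is whether the conversion is automatic.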

are we getting a direct pointer to title even though title is not declared as a pointer inside the struct?
To grasp the answer to this question fully, you need to understand how data is stored in memory.
Basically (to keep this simple), the computer has rows of memory blocks, and each block has an address (like a row of houses). A single variable (like char A;) occupies one house, at one fixed address. An array (like char Arr[10]) occupies several houses in a row; when you use the array's name in an expression, you get the address of the first house.
A bit like using a tape. So when you say Arr[1] you're really saying 'relative to the first house [0], move down to the next house [1]' (or, more technically, the address of the first element plus the size of one element times the number of elements you want to move past).
Is scanf() taking the address of the struct pointer then doing member access or is it the address of the structs member 'rating'?
It's the address of the struct's member, rating. The -> operator binds more tightly than the unary & operator, so &current->rating means exactly the same thing as &(current->rating); the parentheses are optional, though some people add them for readability. Taking the address of the pointer variable itself would be written &current, which is a different thing (a struct film **).
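The address in question can be checked directly. This is only a sketch: TSIZE is again assumed to be 45, and rating_addresses_match is a hypothetical helper name.

```c
#include <stddef.h>  /* offsetof */

#define TSIZE 45     /* assumed; not shown in the question */

struct film {
    char title[TSIZE];
    int rating;
    struct film *next;
};

/* Returns 1 when both spellings name the same int inside *current. */
int rating_addresses_match(struct film *current)
{
    int *a = &current->rating;     /* parsed as &(current->rating) */
    int *b = &(current->rating);   /* explicit parentheses, same meaning */

    /* The member lives at a fixed offset from the start of the struct. */
    char *c = (char *)current + offsetof(struct film, rating);

    return a == b && (char *)a == c;
}
```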
Why not just pass in the pointer similarly to strcopy()?
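An array is not technically a pointer, but when you use its name as a function argument it is converted to a pointer to its first element (the multiple-houses sort of thing), whereas rating is a single int (one house), so you have to ask for its address explicitly with &. If rating were an array of ints, you could pass it without the &.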

current->title is an array (its type is char[TSIZE]) that is automatically converted to a pointer to its first element when passed to a function, so you don't need the & operator, while current->rating is neither a pointer nor an array (its type is int), so you need to take the address of the variable (which is what the & operator is doing here).

Related

Does null pointer in a struct occupy more memory than no pointer at all?

I'm trying to make a small, efficient (space- and time-wise) text editor. I also want to save changes to the text; I would save changes and the main text in two lists, both made of nodes like this:
struct node{
int startingLine;
int endingLine;
char *line;
char *newLine;
};
My idea is to use this struct for both the text list and the changes list; line and newLine point at char arrays (the lines of text).
When these are text nodes, the newLine array is empty or points at NULL, and when the line gets changed, the node is just moved to the changes list and the newLine array gets filled with the new line that replaced the first one in the text.
This way I would not have to free the node from the text list, malloc another node in the changes list, and copy all the information. But:
when I try to set an array to NULL i get an error; I wonder why, I thought array names were just pointers?
also, to use the heap I only know malloc(sizeof(struct node)), does it allocate space for the second pointer too even if i don't immediately need it?
So in conclusion I ask if this is a good idea or how to work around it or if it can be polished somehow; maybe immediately after allocating a node I should set newLine to NULL? A NULL pointer occupies no memory at all or still something compared to not putting any pointer in the struct? Because as is said the idea would be to have the text list made of nodes but with all their "useless" newLine pointers hanging there.
when I try to set an array to NULL i get an error; I wonder why, I thought array names were just pointers?
No. An array variable is similar to a pointer, and convertible to a pointer, but it's a different type. See Is an array name a pointer? for a more complete answer on this.
also, to use the heap I only know malloc(sizeof(struct node)), does it allocate space for the second pointer too even if i don't immediately need it?
When you allocate a node, the second pointer is part of that structure, so the pointer itself exists. But malloc() has no idea what the pointers in your node should point to, and it doesn't allocate the memory that line and newLine point to.
I ask if this is a good idea or how to work around it or if it can be polished somehow; maybe immediately after allocating a node I should set newLine to NULL?
If the fields in your node structure are what you need, then it's fine. Setting line and newLine to NULL when you create a new node is a good idea so that you won't accidentally dereference a garbage pointer.
A NULL pointer occupies no memory at all or still something compared to not putting any pointer in the struct?
The pointers themselves are part of the node struct, so they take up as much memory as any two pointers. The things that line and newLine point to use as much memory as they need. If line and newLine are NULL, then they don't point to anything and of course no additional memory is used. You could have a thousand pointers all pointing to the same block of memory, and the space needed would still be just the size of that block plus the space occupied by the pointers themselves. Try to distinguish between a pointer, which is just an address, and the data at that address; they're completely different things.
So about efficiency, this can be good to avoid creating new nodes and copying in all the data, because just allocating the pointer itself takes very little space i hope?
Yes, a pointer typically takes only 8 bytes on a 64-bit system, so not a lot. But you still have to allocate space for the actual data, so it's not obvious that your approach will be more efficient with respect to space than other methods.
It can be tough to get the hang of pointers at first. The best way to get it is to spend a lot of time with pointers. I'd suggest that you do some exercises to practice, e.g. create a program that breaks some input text into a linked list of words and then sorts the list. To attempt writing a text editor that manipulates blocks of data when your understanding of pointers is shaky is to set yourself up for a world of hurt; time that you spend building your confidence with pointers will pay for itself a hundred times over.
when I try to set an array to NULL i get an error; I wonder why, I thought array names were just pointers?
No. Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array. Otherwise, arrays and pointers are completely different animals.
When you declare an array like
int a[10];
what you get in memory is
+---+
|   | a[0]
+---+
|   | a[1]
+---+
 ...
+---+
|   | a[9]
+---+
That's it. There's no separate pointer object storing the address of the first element. There's no a object separate from the array elements themselves. It's just a sequence of elements. During translation, any instance of the expression a that's not an operand of sizeof or unary & will be replaced with the address of a[0].
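These properties can be checked in code. The following is a small sketch; array_facts_hold is a hypothetical name, and each condition corresponds to one of the statements above.

```c
#include <stddef.h>

/* Returns 1 if the array/pointer facts described above hold. */
int array_facts_hold(void)
{
    int a[10] = {0};
    int *p = a;                              /* a decays to &a[0] here */

    return sizeof a == 10 * sizeof(int)      /* operand of sizeof: no decay */
        && p == &a[0]                        /* decayed value is the first element's address */
        && (void *)&a == (void *)&a[0]       /* same address, different types */
        && sizeof p == sizeof(int *);        /* p is a separate pointer object */
}
```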
also, to use the heap I only know malloc(sizeof(struct node)), does it allocate space for the second pointer too even if i don't immediately need it?
It will allocate space for the line and newLine pointer members, yes; it won't allocate any additional memory for them to point to. That would have to be done as a separate step, e.g.
struct node *n = malloc( sizeof *n );
if ( n )
{
    // sizeof (char) is 1 by definition, so no sizeof is needed here;
    // the +1 accounts for the string terminator
    n->line = malloc( line_length + 1 );
    ...
}
This means you'll have to call free( n->line ) before calling free( n ).
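One way to keep the two allocations and the matching frees together is a pair of helper functions. This is a sketch under assumptions, not the only reasonable design: node_create and node_destroy are hypothetical names, and the node is assumed to own its own copy of the text.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    int startingLine;
    int endingLine;
    char *line;
    char *newLine;
};

/* Allocates a node holding a private copy of text; returns NULL on failure. */
struct node *node_create(const char *text)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;

    n->startingLine = n->endingLine = 0;
    n->newLine = NULL;                    /* nothing to point at yet */
    n->line = malloc(strlen(text) + 1);   /* +1 for the terminator */
    if (n->line == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->line, text);
    return n;
}

/* Frees what the node points to first, then the node itself. */
void node_destroy(struct node *n)
{
    if (n == NULL)
        return;
    free(n->line);
    free(n->newLine);   /* free(NULL) is a harmless no-op */
    free(n);
}
```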
A NULL pointer occupies no memory at all or still something compared to not putting any pointer in the struct?
A pointer object takes up the same amount of space regardless of whether its value is NULL or not. It's like an int taking up the same amount of space whether its value is 0, 1, 65535, or 2147483647.
The exact size of a pointer type depends on the platform and the pointer type itself. C does not require a specific size for pointer types, nor does it require all pointer types to be the same size. The only requirements are
void * and char * have the same size and alignment;
pointers to qualified types have the same size and alignment as their unqualified equivalent (i.e., const int * and volatile int * and int * should all have the same size);
pointers to all struct types have the same size and alignment;
pointers to all union types have the same size and alignment;
On most modern, general-purpose platforms like x86 and x86-64, all pointer types will have the same size. That may not be the case on some embedded or special-purpose architectures, though.

How do arrays work inside a struct?

If I have for example
typedef struct node
{
int numbers[5];
} node;
Whenever I create an instance of such a struct there's gonna be allocation of memory on the stack for the array itself (in our case 20 bytes for 5 ints, considering ints as 32 bits), and numbers is gonna be a pointer to the first byte of that buffer. So, I thought that since inside an instance of node there's gonna be a 20-byte buffer (for the 5 ints) and a 4-byte pointer (numbers), sizeof(node) should be 24 bytes. But when I actually print it out it says 20 bytes. Why is this happening? Why is the pointer to the array not taken into account?
I shall be very grateful for any response.
Arrays are not pointers:
int arr[10]:
    Amount of memory used is sizeof(int)*10 bytes
    The values of arr and &arr are necessarily identical as addresses (only their types differ)
    arr refers to a valid memory address, and cannot be set to refer to another memory address
int* ptr = malloc(sizeof(int)*10):
    Amount of memory used is sizeof(int*) + sizeof(int)*10 bytes
    The values of ptr and &ptr are not necessarily identical (in fact, they are mostly different)
    ptr can be set to point to both valid and invalid memory addresses, as many times as you like
There is no pointer, just an array. Therefore the struct is of size sizeof( int[5] ) ( plus possible padding ).
The struct node and its member numbers share the same address. If you have a variable of type node or a pointer to that variable, you can access its member.
When you have a variable such as int x; space is set aside for the value. Whenever the identifier x is used, the compiler generates code to access the data in that space in the appropriate manner... there's no need to store a pointer to it to do this (and if there were, wouldn't you need a pointer to that pointer? And a pointer to that? etc.).
When you have an array like int arr[5];, space is set aside the same way, but for 5 ints. When the identifier arr is used, the compiler generates code to access either the relevant array element or give the address of the array (depending on how it's used). The array is not a pointer, and doesn't contain one... but the compiler may use its address instead of its contents in some situations.
An array is said to decay to a pointer to its first element in many situations, but that just means that in those situations the identifier will give its address instead of its contents, much like when you use the address-of operator with a non-array variable. The fact that you can get the address of the int x with &x doesn't mean x contains the address of an int... just that the compiler knows how to figure it out.
Arrays don't work like that. They only allocate space for their elements, but not for a pointer. The "pointer" you are talking about (numbers) is just a placeholder for the address of the array's first element; think of it as a literal, instead of a variable. Therefore, you can not assign a value to it.
int myint;
numbers = &myint;
This won't work, since there is no memory where you could store &myint. numbers will just be converted to an address at compile time.
The size of a structure is determined by the sizes of its members (plus any padding the compiler inserts for alignment).
So it really doesn't matter whether the members are simply int, char, float, an array, or even another structure.
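The struct from the question can be measured directly. In this sketch, node_overhead and numbers_at_start are hypothetical helper names; the standard allows trailing padding, so the overhead is computed instead of assumed.

```c
#include <stddef.h>

typedef struct node {
    int numbers[5];
} node;

/* Bytes in a node beyond the five ints; any excess would be padding, never a pointer. */
size_t node_overhead(void)
{
    return sizeof(node) - sizeof(int[5]);
}

/* Returns 1 if the array member begins at the very start of the struct. */
int numbers_at_start(const node *n)
{
    return offsetof(node, numbers) == 0
        && (const void *)n == (const void *)n->numbers;
}
```

On typical compilers node_overhead() should come out as 0, which matches the 20 bytes the asker observed with 4-byte ints.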

Why is setting an array of characters to NULL illegal? Passing to function changes behavior

The name of an array is a synonym for the address of the first element of the array, so why can't this address be set to NULL? Is it a language rule to prevent a memory leak?
Also, when we pass an array to a function, its behavior changes and it becomes possible to set it to NULL.
I don't understand why this occurs. I know it has something to do with pointers, but I just can't wrap my mind around it.
Example:
#include <stdio.h>

void some_function(char string[]);

int main()
{
    char string[] = "Some string!";
    some_function(string);
    printf("%s\n", string);
    return 0;
}

void some_function(char string[])
{
    string = NULL;
}
Output: Some string!
I read that when an array is passed into a function, what's actually passed are pointers to each element, but wouldn't the name of the array itself still be a synonym for the address of the first element? Why is setting it to NULL here even allowed, but not in the main function?
Is it at all possible to set an array to NULL?
An array is not a pointer - the symbol string in your case has attributes of address and size whereas a pointer has only an address attribute. Because an array has an address it can be converted to or interpreted as a pointer, and the language supports this implicitly in a number of cases.
When interpreted as a pointer you should consider its type to be char* const - i.e. a constant pointer to variable data, so the address cannot be changed.
In the case of passing the array to a function, you have to understand that arrays are not first class data types in C, and that they are effectively passed by reference (i.e. as a pointer), losing the size information. The pointer passed to the function is not the array itself, but a pointer to the array's first element; it is a variable independent of the original array.
You can illustrate what is effectively happening without the added confusion of function call semantics by declaring:
char string[] = "Some string!";
char* pstring = string ;
then doing:
pstring = NULL ;
Critically, the original array data cannot just "disappear" while it is in scope (or at all if it were static), the content of the array is the variable, whereas a pointer is a variable that refers to data. A pointer implements indirection, and array does not. When an array is passed to a function, indirection occurs and a pointer to the array is passed rather than a copy of the array.
Incidentally, to pass an array (which is not a first class data type) by copy to a function, you must wrap it within a struct (structs in C are first class data types). This is largely down to the original design of C under the constraints of systems with limited memory resources and the need to maintain compatibility with early implementations and large bodies of legacy code.
So the fact that you cannot assign a pointer to an array is hardly the surprising part - because to do so makes little sense. What is surprising perhaps is the semantics of "passing an array" and the fact that an array is not a first class data type; leading perhaps to your confusion on the matter.
You can't rebind an array variable. An array is not a pointer. True, at a low level they are approximately similar, except pointers have no associated dimension / rank information.
You can't assign NULL to the actual array (same scope), but you can assign to a parameter since C treats it like a pointer.
The standard says:
7 A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type",
So in the function the NULL assignment is legal.
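The adjustment can be observed without changing the asker's function. In this sketch, caller_array_survives is a hypothetical helper; the function pointer assignment compiles only because some_function really has type void (char *).

```c
#include <string.h>

/* The parameter is declared with [], but its type is adjusted to char *. */
void some_function(char string[])
{
    string = NULL;   /* rebinds only this function's local pointer */
    (void)string;    /* silence "set but unused" warnings */
}

/* Returns 1 if the caller's array is untouched after the call. */
int caller_array_survives(void)
{
    char string[] = "Some string!";
    void (*fp)(char *) = some_function;   /* compiles: the types are identical */

    fp(string);      /* string decays to &string[0] for the call */
    return strcmp(string, "Some string!") == 0;
}
```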

Is the printf statement valid?

#include <stdio.h>
#include <conio.h>   /* getch(); non-standard, DOS/Windows compilers only */

int main()
{
    struct a
    {
        struct a *next;
        struct a *prev;
    };
    struct a *A[2];
    printf("Address of (&(A[0])->next) = %p",(&(A[0])->next));
    getch();
    return 0;
}
In the above printf statement I'm accessing "next" pointer of "struct a" structure & when I run the program in dev compiler it's giving me the valid memory address (though I've not yet allocated any memory for it). An explanation of how come this happens will be very helpful.
Is any memory allocated for the "next" & "prev" fields?
Let's think about what this means:
&(A[0])->next
It is the address of the next pointer (not where it points, but the address of the pointer itself). And the next pointer is the first element of struct a, so the address of next is the same as the address of its enclosing a.
Therefore, the expression is the address of the struct a referred to by A[0]. In your original code, you never assign anything there, so it's simply a garbage value being printed. As @alk points out in another answer, you could initialize the two pointers in your variable A and then you would see the first of those values being printed (say, 0x0).
By the way, if you want to quickly initialize A, do it this way, not with the more verbose memset():
struct a *A[2] = {0};
It does the same thing (sets the two pointers to 0).
While the value being printed is garbage, the code may not be illegal. This may seem surprising, but see here: Dereferencing an invalid pointer, then taking the address of the result - you've got something similar, though admittedly you've taken it a step further by dereferencing a member of a struct as opposed to simply using *. So the open question in my mind is: given that &*foo is always legal when foo is a pointer (as shown in the above link), does the same hold true for &foo->bar?
&(A[0])->next
is the address of the next member of the structure that A[0] points to.
This can be thought of as (char *)A[0] + offsetof(struct a, next). I.e., this just results in whatever the value of the uninitialized pointer A[0] was, plus the offset of the next member from the base address of the structure (which happens to be zero, since next is the first member of the structure).
According to the C standard, your program invokes undefined behavior because it performs pointer arithmetic on an invalid pointer. However, in practice, this will most likely not crash and print a bogus address (only an addition is performed, nothing accesses the memory behind the pointer). Expect a crash though if you actually dereference the pointer.
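With a valid pointer in A[0], the same expression is well defined and the offset arithmetic can be checked. In this sketch, next_is_at_offset_zero and the local object obj are hypothetical names added for illustration.

```c
#include <stddef.h>

struct a {
    struct a *next;
    struct a *prev;
};

/* Returns 1 if &A[0]->member equals the struct's address plus the member's offset. */
int next_is_at_offset_zero(void)
{
    struct a obj = { NULL, NULL };
    struct a *A[2] = { &obj, NULL };   /* A[0] now points at a real object */

    return offsetof(struct a, next) == 0
        && (void *)&(A[0])->next == (void *)A[0]
        && (char *)&A[0]->prev == (char *)A[0] + offsetof(struct a, prev);
}
```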

I don't understand one dimensional array handling with function passing

I have an array named record[10] whose type is a table structure, say { int, int, long, long,char}
I have a function to which I want to pass the address of this array which gets called in a loop:
for(i = 0 ; i<10; i++)
{
    // internal resolution will be *(record + i) will fetch an address
    function(record[i]);
}
I'm confused as to why it is not working. I know it is related to basics.
It started working with
for(i = 0 ; i<10; i++)
{
    // then why do I need to pass this address of address here
    function(&record[i]);
}
*(record + i) is not in fact an address. record is an address, and so is (record + i), but *(record + i) is the value stored at the address represented by (record + i). Therefore, calling function(record[i]) is the same as function(*(record + i)), which will pass the value of the array element to the function, not a pointer.
The syntax &record[i] is not taking the address of an address. It is taking the address of record[i], which is an object. The square brackets have a higher precedence than the ampersand, so &record[i] is equivalent to &(record[i]). You can think of it as expanding to &(*(record + i)) and then simplifying to (record + i).
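A short sketch of that equivalence, assuming a struct layout like the one the question describes. The names struct table, set_a and both_forms_agree are hypothetical.

```c
/* Assumed layout, based on the { int, int, long, long, char } description. */
struct table {
    int a, b;
    long c, d;
    char e;
};

/* Takes a pointer, so it can modify the caller's element. */
void set_a(struct table *t, int value)
{
    t->a = value;
}

/* Returns 1 if &record[i] and record + i agree for every element. */
int both_forms_agree(void)
{
    struct table record[10] = {{0}};
    int i;

    for (i = 0; i < 10; i++) {
        if (&record[i] != record + i)   /* same address, two spellings */
            return 0;
        set_a(&record[i], i);           /* writes into the array itself */
    }
    return record[9].a == 9;
}
```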
Update:
To address your question from the comment, an array "decays" into a pointer if you reference the name of the array by itself. If you add square brackets [], you will get a value from within the array. So, for your example, say you have an array of structures:
struct A {
...
char abc[10];
...
} record[10];
Then, you would have:
record[i] - an object of type struct A from the record array
record[i].abc - the abc array inside a particular record object, decayed to a pointer
record[i].abc[k] - a specific character from the string
&record[i].abc[0] - one way of creating a pointer to the string
The notation record[i]->abc that you mention in your comment cannot be used, since record[i] is an object and not a pointer.
Update 2:
In regards to your second comment, the same rules described above apply regardless of how you nest the array within a structure (and whether you access that structure directly or through a pointer). Accessing an array using arrayname[index] notation will give you an item from the array. Accessing an array using arrayname notation (that is, using the array name by itself) will give you a pointer to the first element in the array. If you need more details regarding this phenomenon, here are a couple of links that explain arrays and the way that their names can decay into pointers:
http://boredzo.org/pointers/
http://www.ibiblio.org/pub/languages/fortran/append-c.html
http://c-faq.com/aryptr/index.html
You're saying two different things in your question. First, you say you want to pass the address of the array, but in the code you appear to be trying to pass the address of a particular element. One of the features of C is that an array will automatically turn into a pointer to the array's first element when you use it in certain contexts. That means these two calls are 100% equivalent:
function(array);
function(&array[0]);
To get the address of a particular array element, you can do two things. One is as you've shown above:
function(&array[10]);
And the second is just do the pointer arithmetic directly:
function(array + 10);
In the first case the & is required, since as you mentioned in your question the [] causes the pointer to be dereferenced - the & undoes that operation. What you appear to be confused about are the real semantics of the [] operation. It both does pointer arithmetic and then dereferences the result - you're not getting an address out of that. That's where the & comes in (or just using array + 10 directly).
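In the first case you are passing by value, which means a copy of the element is sent to the function. In the second case you are passing a pointer to the element (C's way of passing by reference), so the function can reach the original.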
In the second case you are directly modifying the contents at the address of the array plus index.
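The difference can be sketched with two small functions. All names here (struct table, bump_copy, bump_in_place, value_after_calls) are hypothetical and chosen only for illustration.

```c
/* Assumed layout, based on the question's description of the table type. */
struct table {
    int a, b;
    long c, d;
    char e;
};

/* Receives a copy: the increment is lost when the function returns. */
void bump_copy(struct table t)
{
    t.a += 1;
}

/* Receives a pointer: the increment lands in the caller's object. */
void bump_in_place(struct table *t)
{
    t->a += 1;
}

/* Calls both on the same element and reports the resulting value of a. */
int value_after_calls(void)
{
    struct table record[1] = {{0}};

    bump_copy(record[0]);        /* record[0].a is still 0 */
    bump_in_place(&record[0]);   /* record[0].a becomes 1 */
    return record[0].a;
}
```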
Your function's signature is probably
void function(table *); // argument's type is pointer to a table
When you pass record[i], you pass a table object.
In order to pass a pointer to a table, you have to pass &record[i], like you did.
Your function is expecting a pointer to the structure. This argument can be the address of an individual instance of that structure, or it can be the address of an element in an array of the given structure. Like
struct myStruct {
    int a, b;
    long cL, dL;
    char e;
} struc1, struc2, record[20];
and the function's prototype will be
void function( struct myStruct *ptr);
Now you can pass the structure to the function:
function( &struc1 );
// or
function( &record[ index] );
Now your confusion arises because of the misconception that syntax array[i] can also be treated as a pointer like we can do with the name of the array.
record (the name of the array) gives the address of the first element of the array (pointers also hold memory addresses), hence it can be passed to the function. But record[index] is different.
Actually, when we write record[ index] it gives us the value placed there, which is not a pointer. Hence your function, which accepts a pointer, does not accept it.
To make it acceptable to the function, you will have to pass the address of the element of the array, i.e.
function( &record[ index ] );
Here & operator gives the address of the elements of the array.
Alternatively, you can also use:
function( record + index );
Here, as we know record is the address of the first element, and when we add index in it, it gives the address of the respective element using pointer arithmetic.
Hope it was helpful.

Resources