I have been trying to write a string comparison function for use with a generic binary search function.
However, while writing the function, I realized that my pointer dereferencing does not work.
In essence, this is what doesn't work:
printf("***a[0] = %c\n", (*(char **)(void *)&"a")[0]);
I ran the debugger which tells me EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
However, this extremely similar code (which I believe to be identical to my previous code) does work.
char * stringa = "a";
printf("***stringa[0] = %c\n", (*(char **)(void *)&stringa)[0]);
I don't understand why the second one works but the first one doesn't. My understanding is that both "a" and stringa represent the memory address of the beginning of a character array.
Thank you in advance.
Pointers are not arrays. Arrays are not pointers.
&stringa results in a pointer to pointer of type char**.
&"a" results in an array pointer of type char(*)[2]. It is not compatible with char**.
You try to dereference the char(*)[2] by treating it as a char**, which won't work: they are not compatible types. In practice, the array pointer says "at address x there is data", but after the conversion you are saying "at address x there is a pointer".
If you try printf("%p\n", *(char **)(void *)&"a"); you don't get an address but data. I get something like <garbage> 0061, which is a little-endian machine interpreting the bytes of the string as a larger integer. In memory you have 0x61 ('a') followed by 0x00 (the null terminator): the string itself, not an address you can dereference.
First, check this rule from C11 Standard#6.3.2.1p3:
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
From String literals:
Constructs an unnamed object of specified character array type in-place, used when a character string needs to be embedded in source code.
Let's decode this first:
char * stringa = "a";
printf("***stringa[0] = %c\n", (*(char **)(void *)&stringa)[0]);
In the statement char * stringa = "a";, the string literal "a" is converted to a pointer to char that points to its initial element. So, after initialisation, stringa points to the first element of the string literal "a".
&stringa is of type char **. Dereferencing it gives stringa itself, a char * pointing to the string "a", and applying [0] to that gives the character 'a'.
Now, let's decode this:
printf("***a[0] = %c\n", (*(char **)(void *)&"a")[0]);
Here "a" is the operand of the unary & operator, so in the expression (*(char **)(void *)&"a")[0] the string "a" is not converted to a pointer to its initial element. Instead, &"a" gives a pointer to the whole array, of type char (*)[2] (in C, "a" has type char [2]; it is still undefined behavior to modify it), and that pointer is then cast to char **.
Dereferencing this pointer reads the value stored at that address, which is the string "a" itself, and treats it as a pointer to char (because of the cast to char **); [0] is then applied to that. That means it is trying to do something like ((char *)0x0000000000000061)[0] (0x61 is the hex value of the character 'a'; on a real machine the upper bytes are whatever happens to follow the literal in memory), which results in the error EXC_BAD_ACCESS.
Instead, you should do
printf("***a[0] = %c\n", (*(const char (*)[2])(void *)&"a")[0]);
EDIT:
OP is still confused. This edit is an attempt to explain the expressions (above in the post) in a different way.
From comments:
OP: But you wrote (*(const char (*)[2])(void *)&"a")[0] works! There are two dereferencing operations (* and [0]) going on here!
Not sure if you are aware of it or not, but I think it's worth sharing the definition of the [] operator, from C11 Standard#6.5.2.1p2:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
Expression (*(char **)(void *)&stringa)[0]:
(*(char **)(void *)&stringa)[0]
| | |
| +----------------------+
| |
| this will result in
| type casting a pointer
| of type char ** to char **
|
|
This dereferencing
will be applied on result of &stringa
i.e. ( * ( &stringa ) )
and result in stringa
i.e. this
|
|
| &stringa (its type is char **)
| +-------+
| | 800 |---+
| +-------+ |
| |
+-------> stringa |
/ +-------+ (pointer stringa pointing to first char of string "a")
/ | 200 |---+ (type of stringa is char *)
| +-------+ |
now apply [0] 800 |
to it |
i.e. stringa[0]. +-------+
stringa[0] is +-> | a | 0 | (string literal - "a")
equivalent to | +-------+
*((stringa) + (0)) | 200 ---> address of "a"
i.e. |
*(200 + 0), |
add 0 to address 200 |
and dereference it. |
*(200 + 0) => *(200) |
dereferencing address |
200 will result in |
value at that address |
which is character |
'a', that means, |
*(200) result in -------+
Expression (*(char **)(void *)&"a")[0]:
(*(char **)(void *)&"a")[0]
| | |
| +------------------+
| |
| this will result in
| type casting a pointer
| of type char (*)[2] to char **
|
|
this dereferencing will be
applied to pointer of type
char ** which is actually a
pointer of type char (*)[2]
i.e. *(&"a").
It will result in value at address 200
which is nothing but string "a"
but since we are type casting
&"a" with double pointer (char **)
so single dereference result
will be considered as pointer of
type char i.e. char *.
*(char **)(void *)&"a"
|
|
| &"a" (its type is char (*)[2] because type of "a" is
| +-------+ char [2] i.e. array of 2 characters)
| | 200 |---+
| +-------+ |
| |
| |
| |
| +-------+
+------------------> | a | 0 | (string literal - "a")
/ +-------+
/ 200 ---> address of "a"
|
|
The content at this location will be
treated as pointer (of type char *)
i.e. the hex of "a" (0x0061) [because the string has character `a` followed by null character]
will be treated as pointer.
Applying [0] to this pointer
i.e. (0x0061)[0], which is
equivalent to (* ((0x0061) + 0)).
(* ((0x0061) + 0)) => *(0x0061)
i.e. trying to dereference 0x0061
Hence, resulting in bad access error.
Expression (*(const char (*)[2])(void *)&"a")[0]:
(*(const char (*)[2])(void *)&"a")[0]
| | |
| +----------------------------+
| |
| this will result in
| type casting a pointer
| of type char (*)[2] to const char (*)[2]
|
|
this dereferencing will be
applied to pointer of type
const char (*)[2]
i.e. *(&"a")
and result in string "a"
whose type is const char [2]
|
|
| &"a" (its type is char (*)[2] because type of "a" is
| +-------+ char [2] i.e. array of 2 characters)
| | 200 |---+
| +-------+ |
| |
| |
| |
| +-------+
+------------------> | a | 0 | (string literal - "a")
/ +-------+
/ 200 ---> address of "a"
|
|
Apply [0] to "a"
i.e. "a"[0].
Now, scroll to the top of my post
and check string literal definition -
string literal constructs unnamed object of character array type.....
also, read rule 6.3.2.1p3
(which is applicable for an array of type) -
....an expression that has type 'array of type' is converted
to an expression with type 'pointer to type' that points to
the initial element of the array object. ....
So, "a" (in expression "a"[0]) will be converted to pointer
to initial element i.e. pointer to character `a` which is
nothing but address 200.
"a"[0] -> (* (("a") + (0))) -> (* ((200) + (0)))
-> (* (200)) -> 'a'
From comments:
OP: there is no such thing as an object in C ....
Don't confuse word object with objects in C++ or other object oriented languages.
This is how C standard defines an object:
From C11 Standard#3.15p1
1 object
region of data storage in the execution environment, the contents of which can represent values
E.g. - int x; --> x is an object of type int.
Let me know if you have any more questions.
I was learning C, and I read that pointers are variables that store the addresses of other variables. So I ran this code:
int x = 10;
int *p;
p = &x;
printf("%i\n", p);
Result: 6422292
Then I tried to do the same thing without using pointers, just using a variable to store the address:
int z = 10;
int v;
v = &z;
printf("%i", v);
Result: 6422282
Since we can use variables to store other variables' addresses, why do we use pointers at all?
Pointers are not integers. They may have integral representation, but they do not behave like integers and should not be treated like integers. Note that on platforms like x86_64 an int is not wide enough to store a pointer value.
Pointers are a distinct class of datatypes for storing the location of an object or function - they are an abstraction of a memory address, with additional type information. Remember, a data type isn't just about what values you can store, but also about what operations you can perform on those values. Pointer operations are distinct from integer operations. The + and - operators mean very different things for integer and pointer types. The unary * operator is not defined for integer types. The arithmetic * and / operators are not defined for pointer types.
And so on.
Pointers to different types are themselves different types and are not interchangeable. Pointer arithmetic (the basis of array subscripting) is based on the pointed-to type. That is, if cp is a char * pointing to a char object, then cp + 1 yields the location of the next char object immediately following. If ip is an int * pointing to an int object, then ip + 1 yields the location of the next int object immediately following:
+---+
c: | | <--- cp
+---+
| | <--- cp + 1
+---+
...
+---+
i: | | <--- ip
+---+
| |
+---+
| |
+---+
| |
+---+
| | <-- ip + 1
+---+
| |
+---+
| |
+---+
| |
+---+
...
This is what I mean about pointers not behaving like integers. They have their own distinct semantics.
C expects the operand of the unary * operator to have pointer type. If you try to dereference an integer, even if that integer object stores a valid address value, the compiler will yell at you.
With int it may look fine, because an address looks like an integer, but try the same thing with other data types such as strings, arrays, and structs, and you will see why C needs pointers.
I was testing some code to find out how 2D arrays are implemented in C.
Then I ran into the following problem.
The code is:
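#include <stdio.h>
int main(void){
int a[4][4] = {0};  /* zero-initialized, so **a below is a defined value */
/* %p expects a void *, and %x expects an unsigned int */
printf("a: %p, *a: %p, **a: 0x%x\n", (void *)a, (void *)*a, (unsigned)**a);
return 0;
}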
I compiled this with gcc on 32-bit Ubuntu.
The result was:
a: 0xbf9d6fdc, *a: 0xbf9d6fdc, **a: 0x0
I expected different values for a and *a, but they are the same.
Why are a and *a the same in this case?
Isn't a of type int **?
Then what is the role of the * operator in *a?
Check the types!!
Given the definition as int a[4][4];
a is of type int [4][4] - an array of 4 arrays of 4 integers. It's not the same as int **.
a[n] is of type int [4] - array of 4 integers. It's not the same as int *
a[n][m] is of type int. - integer.
Now, since the address of an array is also the address of its first element, the values are the same, but the types differ.
To check it visually
int a[4][4]
+-------+--------+-------+-----------+
| | | | |
|a[0][0]| a[0][1]|a[0][2]| a[0][3]   |
| | | | |
+------------------------------------+
| | | | |
|a[1][0]| | | |
| | | | |
+------------------------------------+
| | | | |
|a[2][0]| | | |
| | | | |
+------------------------------------+
| | | | |
| | | | |
| | | | |
| | | | |
+-------+--------+-------+-----------+
Then, quoting C11, §6.3.2.1p3:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. [...]
So, when an array is passed as a function argument (as here, to printf), it decays to a pointer to its first element.
So, let's have a look.
a decays to &(a[0]), the address of its first element, of type int (*)[4].
*a, which is the same as a[0], decays to an int *, pointer to the first element.
**a which is same as *(*a) == *(a[0]) == a[0][0] - that's the int value at that index.
Now look carefully at the image above once again: do you see that a[0] and a[0][0] reside at the same address? In other words, their starting addresses are the same.
That's the reason behind the output you see.
We could declare a pointer to an integer by writing int*. We already saw a pointer type char** argv. This is a pointer to pointers to characters.
It seems that argv is a pointer to multiple pointers, each pointing to chars.
In C strings are represented by the pointer type char*. Under the hood they are stored as a list of characters, where the final character is a special character called the null terminator.
Is that also the case with the char** above, i.e. are the pointers stored like the characters in a string?
A pointer can point to a single object, or it can point to an array of objects.
In the case of the argv parameter to main which is declared as char *argv[] (or equivalently char ** since it is a function parameter), it is a pointer to an array of char *.
In memory it looks something like this:
argv
-----
| .-|----> ------
----- | | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 1 | \0 |
| | ----------------------------------
------
| | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 2 | \0 |
| | ----------------------------------
------
| | ----------------------------------
| .-|-----> | s | t | r | i | n | g | 3 | \0 |
| | ----------------------------------
------
...
For example, when we define an array of pointers like char *argv[]:
Example 1:
char *p[5] = {"ali", "reza", "hamid", "saeed", "mohsen"};
for(int i = 0;i < 5;i++)
printf("%s\n", p[i]);   /* p[i] is a char *, which is what %s expects */
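Example 2: (here p points to 5 pointers to char, each of which points to a buffer of 10 chars)
char **p;
p = malloc(5 * sizeof *p);   /* 5 pointers to char; malloc is declared in <stdlib.h> */
for(int i = 0;i < 5;i++)
p[i] = malloc(10);           /* each one points to 10 chars */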
Yes.
A pointer p to type T can point to a single T, or to an array of T. In the latter case you can index into the array using pointer arithmetic, such as p[n]. In the same way, argv[n]'s pointees are not single chars, but nul-terminated arrays of chars, AKA C-style strings.
A pointer is a reference to a memory location: it contains the address of a variable. A pointer to pointer is a form of indirection where the pointer contains the address of another pointer variable, and that second pointer variable contains the address where the value is stored.
argv stands for argument vector, which refers to the arguments passed to a program on the command line. argv points to the first element of an array of char * (not of a character array); since the vector is stored as an array, the other pointers are found by indexing from there (argv[1], argv[2], and so on).
Memory-Address: |0xA0|0xA1|0xA2|0xA3|0xA4|0xA5|0xA6|0xA7|
Memory-Content: | 0x123 | 0x456 |
|-------4-Byte------|
|<- int* = 0x123
A pointer in C contains the address of a specific region in memory (ignoring virtual memory).
The address marks the start position (here 0xA0), and the extent is given by the size of the pointed-to C type.
But the content may be a pointer as well (assuming 32-bit addresses here):
Memory-Address: |0xA0|0xA1|0xA2|0xA3|0xA4|0xA5|0xA6|0xA7|
Memory-Content: | 0xA4 | 0x123 |
|-------4-Byte------|
|<- int** = 0xA4 |<- int* = 0x123
So you can construct any pointer hierarchy in memory.
In C, you can declare a char array either by
char array[];
or
char *array;
The latter is a pointer, so why can it be an array?
Pointers and arrays are two completely different animals; a pointer cannot be an array and an array cannot be a pointer.
The confusion comes from two concepts that aren't explained very well in most introductory C texts.
The first is that the array subscript operator [] can be applied to both pointer and array expressions. The expression a[i] is defined as *(a + i); you offset i elements from the address stored in a and dereference the result.
So if you declare a pointer
T *p;
and assign it to point to some memory, like so
p = malloc( N * sizeof *p );
you'll get something like the following:
+---+
p: | | ---+
+---+ |
... |
+---+ |
p[0]: | |<---+
+---+
p[1]: | |
+---+
...
+---+
p[N-1]: | |
+---+
p stores the base address of the array, so *(p + i) gives you the value stored in the i'th element (not byte) following that address.
However, when you declare an array, such as
T a[N];
what you get in memory is the following:
+---+
a[0]: | |
+---+
a[1]: | |
+---+
...
+---+
a[N-1]: | |
+---+
Storage has only been set aside for the array elements themselves; there's no separate storage set aside for a variable named a to store the base address of the array. So how can the *(a+i) mechanism possibly work?
This brings us to the second concept: except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
In other words, when the compiler sees the expression a in the code, it will replace that expression with a pointer to the first element of a, unless a is the operand of sizeof or unary &. So a evaluates to the address of the first element of the array, meaning *(a + i) will work as expected.
Thus, the subscript operator works exactly the same way for both pointer and array expressions. However, this does not mean that pointer objects are the same thing as array objects; they are not, and anyone who claims otherwise is confused.
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <string.h>
struct BOOK{
char name[15];
char author[33];
int year;
};
struct BOOK *books;
int main(){
int i,noBooks;
noBooks=2;
books=malloc(sizeof(struct BOOK)*noBooks);
books[0].year=1986;
strcpy(books[0].name,"MartinEden");
strcpy(books[0].author,"JackLondon");
//asking user to give values
scanf("%d",&books[1].year);
scanf("%s",&books[1].name);
scanf("%s",books[1].author);
printf("%d %s %s\n",books[0].year,books[0].author,books[0].name);
printf("%d %s %s\n",books[1].year,books[1].author,books[1].name);
getch();
return 0;
}
I enter 1988, theidiot and dostoyevski
the output is
1986 JackLondon MartinEden
1988 dostoyevski theidiot
In scanf, I used & with books[].name but not with books[].author, and it still did the same thing. For year, it did not work that way. Is & useless with structures?
I mean here
scanf("%d",&books[1].year);
scanf("%s",&books[1].name);
scanf("%s",books[1].author); //no & operator
char name[15];
char author[33];
Here, I can also use
char *name[15];
char *author[33];
and nothing changes. Why can't I see the difference?
The name member of the BOOK structure is a char array of size 15. When the name of the array is used in an expression, its value is the address of the array's initial element.
When you take the address of the name member from a struct BOOK, though, the compiler returns the base address of the struct plus the offset of the name member, which is precisely the same as the address of name's initial element. That is why both the &books[1].name and books[1].name expressions evaluate to the same value.
Note: you should specify the size of the buffers into which you are going to read the strings; this will prevent potential buffer overruns:
scanf("%14s", books[1].name);
scanf("%32s", books[1].author);
This form is valid:
scanf("%s",books[1].author);
This form is invalid:
scanf("%s", &books[1].author);
The s conversion specifier expects an argument of type pointer to char in the scanf function, which is true in the first statement but false in the second. Failing to meet this requirement makes your function call invoke undefined behavior.
In the first statement, the trailing argument (after conversion) is of type pointer to char, and in the second statement, the argument is of type pointer to an array of 33 char.
Except when it is the operand of the sizeof or unary & operator, or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element in the array.
When you write
scanf("%s", books[1].author);
the expression books[1].author has type "33-element array of char". By the rule above, it will be converted to an expression of type "pointer to char" (char *) and the value of the expression will be the address of the first element of the array.
When you write
scanf("%s", &books[1].name);
the expression books[1].name is an operand of the unary & operator, so the conversion doesn't happen; instead, the type of the expression &books[1].name has type "pointer to 15-element array of char" (char (*)[15]), and its value is the address of the array.
In C, the address of the array and the address of the first element of the array are the same, so both expressions result in the same value; however, the types of the two expressions are different, and type always matters. scanf expects the argument corresponding to the %s conversion specifier to have type char *; by passing an argument of type char (*)[15], you invoke undefined behavior, meaning the compiler isn't required to warn you about the type mismatch, nor is it required to handle the expression in any meaningful way. In this particular case, the code "works" (gives you the result you expect), but it isn't required to; it could just as easily have caused a crash, or led to corrupted data, depending on the specific implementation of scanf.
Both calls should be written as
scanf("%s", books[1].name);
scanf("%s", books[1].author);
Edit
In answer to your comment, a picture may help. Here's what your books array would look like:
+---+ +---+
| | | name[0]
| +---+
| | | name[1]
| +---+
| ...
| +---+
| | | name[14]
| +---+
books[0] | | author[0]
| +---+
| | | author[1]
| +---+
| ...
| +---+
| | | author[32]
| +---+
| | | year
+---+ +---+
| | | name[0] <------ books[1].name
| +---+
| | | name[1]
| +---+
| ...
| +---+
| | | name[14]
| +---+
books[1] | | author[0] <------ books[1].author
| +---+
| | | author[1]
| +---+
| ...
| +---+
| | | author[32]
| +---+
| | | year
+---+ +---+
Each element of the books array contains two arrays plus an integer. books[1].name evaluates to the address of the first element of the name array within books[1]; similarly, the expression books[1].author evaluates to the address of the first element of the author array within books[1].