Understanding array manipulation pointers syntax in C - c

I'm struggling to understand pointer syntax. For example,
I have the code below:
#include <stdio.h>

int main(void)
{
    char arr[][10] = {"It's", "wide", "and", "wonderful"};
    printf("%c", (*arr)[3] - 1);
    printf("%c", *arr[2] + 3);
    return 0;
}
I have no clue why it prints 'r' and 'd'. What is the whole process? I would appreciate an explanation.

This is obfuscation: code deliberately written to confuse.
*arr gives the first item (array) in your 2D array. At index 3 you find 's'. ASCII code for 's' - 1 = 'r'.
In *arr[2], the [] operator takes precedence, giving you the item at index 2 in your 2D array ("and"). * gives the contents of the first item (character) in that array, 'a'. ASCII code for 'a' + 3 = 'd'.
(Please note that arithmetic on character codes is not portable. Only the digits 0 to 9 are guaranteed by the C standard to be adjacent in the character set.)

I'll break up the expressions (*arr)[3] - 1 and *arr[2] + 3 in order of precedence.
Expression (*arr)[3] - 1:
arr → {"It's", "wide", "and", "wonderful"}
(*arr) → "It's"
(*arr)[3] → 's'
(*arr)[3] - 1 → 'r'
Notice here two things: *arr is equivalent to arr[0], and you can perform arithmetic on a char, operating on the numeric value representing the character.
Expression *arr[2] + 3:
arr → {"It's", "wide", "and", "wonderful"}
arr[2] → "and"
*arr[2] → 'a'
*arr[2] + 3 → 'd'
The new point here is that [] takes precedence over unary *, which is why the parentheses matter in the first expression.

#include <stdio.h>

int main(void)
{
    char arr[][10] = {"It's", "wide", "and", "wonderful"};
    printf("%c", (*arr)[3] - 1); // arr[0][3] == the 4th char of the 1st string - 1 = s - 1 = r
    printf("%c", *arr[2] + 3); // arr[2][0] == the 1st char of the 3rd string + 3 = a + 3 = d
    return 0;
}

In the first case, (*arr)[3] - 1:
(*arr) gives us the first element of the array: "It's"
(*arr)[3] gives us the fourth element of "It's", which is 's'
subtracting 1 from 's' gives us 'r'
In the second case, *arr[2] + 3:
arr[2] gives us the third element of the array: "and"
*arr[2] gives us the first character of "and", which is 'a'
adding 3 to 'a' gives us 'd'

For (*arr)[3] - 1:
arr when treated like a pointer: the address of the first element. In this case, it is the address of "It's"
(*arr): dereferencing arr, i.e. "It's". Its type is char[10], which converts to char* in most expressions
(*arr)[3]: the 4th character of "It's". The type is char
(*arr)[3] - 1: a char can be used as an integer, so subtracting from a char subtracts from its ASCII code. The value is 'r'
For *arr[2] + 3:
arr[2]: the 3rd element of arr (treated as an array), which is "and". The type is char[10], which converts to char* in most expressions
*arr[2]: dereferencing "and", so 'a'
*arr[2] + 3: again ASCII code, its value is 'd'

Arrays used in expressions are implicitly converted (with rare exceptions) to pointers to their first elements.
So if an array is declared like this
char arr[4][10]
then you may rewrite this declaration as
char ( arr[4] )[10]
and in an expression the array designator is converted to a pointer to its first element
char ( *p )[10]
So in this expression
(*arr)[3] - 1
arr is converted to the type char ( * )[10] and points to the first string stored in the array. Applying the operator * you get the first sub-array (first string) that has the type char[10].
Applying the subscript operator you get the fourth character in the string that is equal to 's'. Now subtracting 1 you get the character 'r'.
In the second expression
*arr[2] + 3
that can be equivalently rewritten like
*( arr[2] ) + 3
you first get the third sub-array of the array, that is, the sub-array with the string "and". This sub-array has the type char[10]. Dereferencing the array designator (which is implicitly converted to a pointer to its first element) you get the first character of the string, that is 'a'. Adding 3 to the character you get the character 'd'.
The difference between the expressions is that in the first case you dereference the array designator arr (converted to a pointer to its first element), getting the first sub-array, and then apply the subscript operator to that one-dimensional character array. In the second case you first apply the subscript operator, getting a one-dimensional array, and then dereference that array designator, which is implicitly converted to a pointer to its first element.
You should understand that if you have an array like for example
char s[] = "Hello";
then the expression *s is equivalent to *( &s[0] ), that is, to s[0].

The expression *arr[2] is equivalent 1), 2) to arr[2][0] -
*arr[2] -> *(arr[2]) -> *((arr[2]) + 0) -> arr[2][0]
The expression (*arr)[3] is equivalent 2) to arr[0][3] -
(*arr)[3] -> (*(arr + 0))[3] -> arr[0][3]
arr[0][3] is the character at index 3 of the first array, which is 's', and arr[2][0] is the character at index 0 of the third array, which is 'a'.
The 2D array arr:
Array arr:
[0] = {
[0] = 'I'
[1] = 't'
[2] = '\''
[3] = 's' ---> (*arr)[3], subtract 1 from s ==> r
[4] = '\0'
[5] = '\0'
[6] = '\0'
[7] = '\0'
[8] = '\0'
[9] = '\0'
}
[1] = {
[0] = 'w'
[1] = 'i'
[2] = 'd'
[3] = 'e'
[4] = '\0'
[5] = '\0'
[6] = '\0'
[7] = '\0'
[8] = '\0'
[9] = '\0'
}
[2] = {
[0] = 'a' --> *arr[2], add 3 to a ===> d
[1] = 'n'
[2] = 'd'
[3] = '\0'
[4] = '\0'
[5] = '\0'
[6] = '\0'
[7] = '\0'
[8] = '\0'
[9] = '\0'
}
[3] = {
[0] = 'w'
[1] = 'o'
[2] = 'n'
[3] = 'd'
[4] = 'e'
[5] = 'r'
[6] = 'f'
[7] = 'u'
[8] = 'l'
[9] = '\0'
}
}
1) The precedence of [] operator is higher than unary * operator.
2) C Standard, section 6.5.2.1:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

First, let's note that *arr[2] is equivalent to *( arr[2] )
Secondly, keep in mind there is no difference between *(a + i) and a[i].
Thirdly, if there's no difference between *(a + i) and a[i], there's no difference between *a and a[0].
So,
(*arr)[3] === ( *( arr + 0 ) )[3] === arr[0][3]
The above produces the fourth character of the first string.
*arr[2] === *( arr[2] ) === *( arr[2] + 0 ) === arr[2][0]
The above produces the first character of the third string.

Related

Can lldb display char * / strings in structs more efficiently?

lldb has a habit to show me strings in a very spread out manner, which is a little annoying but not a show stopper. But maybe someone already knows how to make it a bit more efficient with its display size:
Example:
(lldb) p *tq
(taskq_t) $14 = {
tq_name = {
[0] = 's'
[1] = 'y'
[2] = 's'
[3] = 't'
[4] = 'e'
[5] = 'm'
[6] = '_'
[7] = 't'
[8] = 'a'
[9] = 's'
[10] = 'k'
[11] = 'q'
[12] = '\0'
[13] = '\0'
[14] = '\0'
[15] = '\0'
[16] = '\0'
[17] = '\0'
[18] = '\0'
[19] = '\0'
[20] = '\0'
[21] = '\0'
[22] = '\0'
[23] = '\0'
[24] = '\0'
[25] = '\0'
[26] = '\0'
[27] = '\0'
[28] = '\0'
[29] = '\0'
[30] = '\0'
[31] = '\0'
}
tq_lock = {
Would prefer:
(lldb) p *tq
(taskq_t) $14 = {
tq_name = {
"system_taskq\0"
}
tq_lock = {
Or similar - as in, a string. As it gets quite long when it is char path[MAXPATH].
Showing all the elements is the natural view for an array, and since there's no guarantee that a char array is actually a null terminated string, even printing char arrays as an array is a reasonable default. But as you observe, if you know the array holds a null terminated string you really do want to print it more compactly.
Fortunately, lldb has a system for providing "alternate" views of data objects. That's why, for instance you see:
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003f0b has_char`main at has_char.c:8
5 {
6 char *myStr = "some string here";
7 char myArr[20] = {'s', 'o', 'm', 'e', ' ', 's', 0};
-> 8 return strlen(myStr) + strlen(myArr);
^
9 }
Target 0: (has_char) stopped.
(lldb) v myArr
(char [20]) myArr = "some s"
By default lldb presents char[NN] arrays as null terminated strings.
That's done using lldb's "summary formatters". There's a bunch more information about them here:
https://lldb.llvm.org/use/variable.html
You can find out how a particular variable gets its formatting, which is sometimes useful when you're looking for something to copy, with:
(lldb) type summary info myArr
summary applied to (char [20]) myArr is: `${var%s}` (hide value) (skip pointers)
The summaries are registered by a type name or type regular expression, and then applied to any value of that type when printing it. They also follow typedef chains so the same summary will be provided for variables whose type is a typedef of a registered type. What is the type of tq_name, it looks like it is just a char array? I'm a little surprised you had to do anything here...
Anyway, no worries, you can always add one explicitly for your type.
The command for that is:
(lldb) type summary add -s ${var%s} -p -v TypeNameOfTQ
-s is the summary string, see the docs cited above for more on that.
-p because you don't want to format a char [10]* this way.
-v because you explicitly don't want to see the elements one by one - which is the natural value of the array.

two dimensional Arrays access using pointers

Are
*(ary[3]+8)
and
ary[3][8]
the same? If so, please explain how. Does ary[3] return the address of the first element or the value in ary[3][0]? ary is a two-dimensional array.
Thanks in advance.
Yes
a[i] is the same as *(a+i)
ary[i][j] is the same as *(*(ary+i)+j)
If x is an array (int, say) x[i] is just a syntactic sugar for *(x+i). In your case, ary is a two-dimensional array (again of int, say). By the same syntactic sugar mechanism, ary[i][j] is equivalent to *((*(ary+i))+j), from which it is clear what happens under the hood.
*(ary[3]+8) reads the value at index 8 (the ninth column) of the row at index 3 (the fourth row). ary[3] is the base address of that row. ary[3][8] accesses the same element.
For example, take a 2D array of two rows and four columns, which is laid out in memory like a 1D array of 8 elements, as shown below.
int a[8] = {0,1,2,3,4,5,6,7};
int b[2][4] = {{0,1,2,3},{4,5,6,7}};
Since b is a 2D array, you can consider it an array of two 1D arrays. When you pass b[1] or &b[1][0], you pass the address of the first element of the second row. A rectangular array is laid out in memory row by row, so the address of element b[row][col] is calculated as
address = baseAddress + elementSize * (row*(total number of column) + col);
As others have already said, a[i] is just sugar for *(a+i).
I would just like to add that it always works, which allows us to do things like this:
char a[10];
char b;
char c[10][20];
// all of these are the same:
b = a[5]; // classic
b = *(a + 5); // pointer shifting
b = 5[a]; // even so!
b = c[5][9];
b = *(c[5] + 9);
b = *(*(c + 5) + 9);
b = (*(c + 5))[9]; // parentheses needed: [] binds tighter than unary *
b = 5[c][9];
b = 5[9][c]; // WRONG! Compiling error

How does this recursive C code work?

There are lots of recursion questions and I basically understand some simple recursion algorithm such as sum of array elements. However, my friend gave me this code which reverses an array:
void r(int a[], int s)
{
if(s <=2 ) return;
int t = a[0];
a[0] = a[s-1];
a[s-1] = t;
r(&a[1], s-2); // this line confused me, why &a[1]
}
I know how to reverse an array using a normal for loop. But this code really confused me about recursion.
Can anyone explain the above line of code?
It is equivalent to
void r(int *arr, size_t len)
{
for ( ; len >= 2; arr+=1,len-=2 ) {
int t = arr[0];
arr[0] = arr[len-1];
arr[len-1] = t;
}
}
where the recursive call is replaced by the loop. The "increment" part of the loop (arr+=1,len-=2) is exactly the same as the arguments of the recursive call; the end condition (len >= 2) corresponds to the recursion stopper (which was wrong in the original).
The idea behind this algorithm is at each step:
-: to swap the last a[s-1] and first a[0] elements of the array:
int t = a[0];
a[0] = a[s-1];
a[s-1] = t;
-: and to swap the middle recursively:
r(&a[1], s-2);
To understand the syntax, keep in mind that &a[n] is address of the n+1th element of the given array. If you have int *b = &a[1], then b[0] == a[1], b[1] == a[2], etc.
So:
&a[1] refers to an array starting at the second element of array a.
s - 2 means that the length of the array you pass recursively is shorter by 2 elements.
If you have an array [1 2 3 4 5 6 7 8 9 10], here's what happens as the recursion progresses:
[1 2 3 4 5 6 7 8 9 10] // r(&a[0], 10)
10 [2 3 4 5 6 7 8 9] 1 // r(&a[1], 8)
10 9 [3 4 5 6 7 8] 2 1 // r(&(&a[1])[1], 6)
10 9 8 [4 5 6 7] 3 2 1 // r(&(&(&a[1])[1])[1], 4)
10 9 8 7 [5 6] 4 3 2 1 // r(&(&(&(&a[1])[1])[1])[1], 2)
The cool thing is that this analysis shows us that the terminating condition s <= 2 is wrong: the innermost 2 elements in an even-sized array will never get swapped. It should be changed to s < 2.
Simplified crazy walk-through:
void reverse(int a[], int s)
{
int temp; /* temporary value */
if (s <= 2) return; /* trigger done */
temp = a[0]; /* temp = first element of a */
a[0] = a[s - 1]; /* first element = last element */
a[s - 1] = temp; /* last element = temp */
reverse(&a[1], s - 2); /* pass address of a[1] and size - 2 */
}
Given the char array "ABCDEFG"
Simplified memory table could be:
Address Value
7 A
8 B
9 C
a D
b E
c F
d G
/* Or as used here: */
789abcd <- Simplified memory address
ABCDEFG
We get; main() calls reverse(ABCDEFG, 7)
List 1
The address of A is pushed onto the stack (A{BCDEFG})
7 is pushed onto the stack
return address for caller is pushed onto the stack
etc.
function called
And something like
#::::::::::::::::::::::::::::::::::::::::::::::::::::
reverse(ABCDEFG, 7); # Push to STACK 0xB (As List 1)
#====================================================
789abcd <- Memory address.
ABCDEFG <- Values.
0123456 <- Indexes for a in recursion 1.
if (7 <= 2) return;
temp = A
+ .
a[0] = a[6] => ABCDEFG = GBCDEFG
+
a[6] = temp => GBCDEFG = GBCDEFA
reverse(BCDEFA, 5); # Push to STACK 0xC (As in List 1)
#====================================================
7 89abcd <- Memory addresses.
[G]BCDEFA <- Values
012345 <- Indexes for a in recursion 2.
if (5 <= 2) return;
temp = B
+ .
a[0] = a[4] => BCDEFA = FCDEFA
+
a[4] = temp => FCDEFA = FCDEBA
reverse(CDEBA, 3); # Push to STACK 0xD (As in List 1)
#====================================================
78 9abcd <- Memory addresses.
[GF]CDEBA <- Values.
01234 <- indexes for a in recursion 3.
if (3 <= 2) return;
temp = C
+ .
a[0] = a[2] => CDEBA = EDEBA
+
a[2] = temp => EDEBA = EDCBA
reverse(DCBA, 1); # Push to STACK 0xE (As in List 1)
#====================================================
789 abcd <- Memory addresses.
[GFE]DCBA <- Values.
0123 <- Indexes for a in recursion 4.
if (1 <= 2) return; YES!
#:::: roll back stack ::::
Pop STACK 0xE
Pop STACK 0xD
Pop STACK 0xC
Pop STACK 0xB
We are back in main() and memory region 789abcd has
been altered from ABCDEFG to GFEDCBA.
The important thing to realize is that a is a pointer to the first element of the array, so a is the same as &a[0]. &a[1] is a pointer to the second element of the array. So if you call the function with &a[1] as its argument, it works on the subarray that starts with the second element.
&a[1] is equivalent to a + 1, i.e. a pointer to the second element of the array. The function call reverses the "middle" s-2 elements of the array.
The function has to be called with:
A pointer to the first element of the array. In C it can be referenced by using the name of the array.
The size of the array.
The first 'if' checks that the array has at least two elements left to swap. Next, the function exchanges the first and the last elements of the array.
The recursive call changes the bounds at which the next step has to work. It increments the beginning of the array by one position, and also decreases the end of the array by one position; since these two elements have been reversed in this iteration.

Trouble with array pointers in C

I was solving last year's GATE question paper and got stuck on this question:
What does the following fragment of C-program print?
char c[]="GATE2011";
char *p =c;
printf ("%s", p+p[3]-p[1]);
The answer is '2011'
I am aware that in C an array name converts to a pointer to the first element of the array. My answer was 'E2011', but the output is 2011.
Can someone explain the pointer mathematics involved in this?
This problem has much more to do with ASCII values than it does with pointers.
p[3] == 'E' == 69 (decimal)
p[1] == 'A' == 65
p[3]-p[1] = 4
p+4 = a string starting at index 4 (the 5th character).
p[] = [0] [1] [2] [3] [4] [5] [6] [7] [8]
G A T E 2 0 1 1 \0
Hence, p + 4 points to "2011"
p[3] = E
p[1] = A
E - A = 4
hence p + 4 = address of 2
hence it prints 2011

Write the array notation as pointer notation

The question asks me to rewrite &str[2] in pointer notation. If I write str+2, it gives the address, which is logical, but where did I use pointer notation in it?
Should I prefer writing &(*(str+2))?
You can use either
&str[2]
or
(str + 2)
Both of these are equivalent.
This is pointer arithmetic. So when you mention an array str or &str you refer to the base address of the array (in printf for example) i.e. the address of the first element in the array str[0].
From here on, every single increment takes you to the next element in the array. So str[1] and *(str + 1) should give you the same result.
Also, if you have num[] = {24, 3}
then num[0] == *(num + 0) == *(0 + num) == 0[num]
