string manipulation with %s format - c

Could you explain the following output?
main()
{
char f[] = "qwertyuiopasd";
printf("%s\n", f + f[6] - f[8]);
printf("%s", f + f[4] - f[8]);
}
output:
uiopasd
yuiopasd
For example regarding the first printf:
f[8] should represent the char 'o'
f[6] should represent the char 'u'
%s format prints the string (printf("%s", f) is giving the whole "qwertyuiopasd")
So how does it come together, what is the byte manipulation here?

There are multiple problems in the code posted:
The missing return type for main is obsolete syntax. You should use int main().
The prototype for printf is not in scope when the calls are compiled. This has undefined behavior. You should include <stdio.h>.
The expression f + f[6] - f[8] has undefined behavior: addition is left associative, so f + f[6] - f[8] is evaluated as (f + f[6]) - f[8]. f[6], which is the letter u, is unlikely to have a value of 14 or less (in ASCII, its value is 117), so f + f[6] points well beyond the end of the string and is therefore an invalid pointer. Computing f + f[6] - f[8] thus has undefined behavior, even though 'u' - 'o' has the value 6 in the ASCII character set. The expression should be changed to f + (f[6] - f[8]).
Assuming ASCII, the letters o, u and t have values 111, 117 and 116.
f + (f[6] - f[8]) is f + ('u' - 'o') which is f + (117 - 111) or f + 6.
f + 6 is the address of f[6], hence a pointer to the 7th character of the string "qwertyuiopasd". Printing this string produces uiopasd.

Assuming the characters follow the ASCII scheme, the values of the relevant characters are:
o (f[8]): 111
u (f[6]): 117
t (f[4]): 116
f decays to a pointer to the first element of the char array. The first expression evaluates to f + 6, which points to the element at index 6 (the 7th character), so printf prints from there until it encounters the terminating \0.
Similarly, the second statement evaluates to f + 5, thus you get yuiopasd as output.
What does f + n mean?
You can perform the following arithmetic on pointers: ++, --, +, -. A pointer stores a memory address, and incrementing it increases the address by the size of the pointed-to type.
For example, if f points to an int array at address 1000 and an int occupies 4 bytes, then f + 1 points to address 1004, which is the next element in the array.

It is simple pointer arithmetic, which is easier to understand with this example:
#include <stdio.h>
int main(void)
{
char f[] = "9876543210";
/* f + (f[6] - f[8]) computes the offset first, so the pointer stays inside the array */
printf("%s , f[6]=%d, f[8]=%d, f[6]-f[8]=%d, f + (f[6] - f[8]) = %s\n", f, f[6], f[8], f[6]-f[8], f + (f[6] - f[8]));
return 0;
}
The result is:
9876543210 , f[6]=51, f[8]=49, f[6]-f[8]=2, f + (f[6] - f[8]) = 76543210
f[n] is the integer value of the element at index n of the array.
In this example, the difference between the ASCII codes of the elements at index 6 and index 8 is 2.
When we add 2 to the char pointer, it refers to the element 2 chars ahead, which in our case is '7'.

This is all about pointer arithmetic. The expression f + f[6] - f[8] evaluates to a char * pointer (like its first operand, because an array used in such an expression is converted to a pointer to its first element), and will expand to this:
f + (int)'u' - (int)'o'
(where 'u' and 'o' represent f[6] and f[8], respectively).
The values that represent the characters 'u' and 'o' are separated by 6 on almost all modern systems (which use ASCII or an ASCII-compatible encoding), so the expression adds 6 to the address f and prints the string starting from its 7th element.
Similarly for the expression f + f[4] - f[8] - but here, the difference is only 5 ('t' - 'o').

Related

String literals for array

#include <stdio.h>
int main(void) {
char a[] = "125"; // (int)1, (int)2, (int)5. But array 'a' has a type char. So int is in char. ???
printf("%s", a);
}
In that code, each element of string literal has type int. But the array a has type char.
In C99 6.4.5 $2 fragment
The same considerations apply to each element of the sequence in a character string
literal or a wide string literal as if it were in an integer character constant or a wide
character constant
In C99 6.4.5 $5 fragment
For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence
I think they are not compatible; it looks like a contradiction to me. What is wrong with my reasoning?
No, this is a string literal.
Quoting C11, chapter 6.4.5, String Literals:
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz".[...]
To elaborate, the acceptable syntax for a string literal is:
string-literal:
    encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix:
    u8
    u
    U
    L
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
    any member of the source character set except
        the double-quote ", backslash \, or new-line character
    escape-sequence
As for the "source character set", chapter 5.2.1/P3 says:
Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
So, a construct like "123" is a string literal, not individual integers held by/in char.
char a[] = "125";
In that code, each element of string literal has type int. But the array a has type char.
No, the fact that it is written as 5 does not mean it has to be an int. Its type is determined by the context in which it appears.
In your case, that 5 is a char because it is part of the string literal.
Also note that a 5 can have other types, such as unsigned int, unsigned short, or double, so again you must look at how it is declared in the first place.

What does "20"[1] do?

In a test exam, we were told to find the value of some expressions.
All but 1 were clear, which was "20"[1]. I thought it was the 1st index of the number, so 0, but testing with a computer it prints 48.
What exactly does that 'function' do?
It's not a function, it's just indexing an array.
"20" here is a character array, and we're taking the value at index 1 - which is '0' - the character '0'.
This is the same as
char chArr[] = "20"; // using a variable to hold the array
printf ("%d", chArr[1]); // use indexing on the variable, decimal value 48
printf ("%c", chArr[1]); // same as above, will print character representation, 0
The decimal value of '0' is 48, according to ASCII encoding, the most common encoding around these days.
Well, depending on your point of view it's either '0', 48, or 0x30.
#include <stdio.h>
int main()
{
printf("'%c' %d 0x%X\n", "20"[1], "20"[1], "20"[1]);
return 0;
}
The above prints
'0' 48 0x30
In this subscripting expression
"20"[1]
"20" is a string literal that has the type char[3]. When used in an expression like this one, the literal is converted to a pointer to its first element.
So this expression
"20"[1]
yields the second element of the string literal that is '0'.
You can think of it as
char *p = "20";
char c = p[1];
48 is the ASCII value of the character '0'.
A more exotic form is
1["20"]
which is equivalent to the previous expression.
From the C Standard (6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
Here is a demonstrative program
#include <stdio.h>
int main(void)
{
printf( "\"20\"[1] == '%c' and its ASCII value is %d\n", "20"[1], "20"[1] );
printf( "1[\"20\"] == '%c' and its ASCII value is %d\n", 1["20"], 1["20"] );
return 0;
}
Its output is
"20"[1] == '0' and its ASCII value is 48
1["20"] == '0' and its ASCII value is 48

How does printf(3+"excellent"+4) this line run?

I don't understand why the output is nt in this program.
Can anyone explain this program?
#include <stdio.h>
#include <stdlib.h>
int main(){
printf(3+"excellent"+4); //output is "nt"
return 0;
}
"excellent" is an array of type char[10], the elements of which are the 9 letters of the word and the terminating '\0'. And then, C11 6.3.2.1p3,
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. [...]
i.e. it is converted to a pointer to the first character of the string, (e), and then has the type char *.
Now we have two additions:
(3 + (char *)"excellent") + 4
The C standard says (simplified, C11 6.5.6p8) that when adding an integer and a pointer together, the result will be a pointer of the same type, and will be interpreted so that if the pointer p was pointing to element n of an array, then p + m will result in a pointer that will point to element n + m of the same array, or one past the end, or, if n + m is outside the bounds of the array or one past the end, the behaviour is undefined.
I.e. 3 + "excellent" will give a pointer that will point to the 2nd letter e of excellent. Now of course since the parenthesized expression has type char * and it points to the element 3 of the array, if we add 4 to it, we get a pointer that points to the element 7, i.e. 8th letter, the n.
<-------------- char [10] -------------->
+---+---+---+---+---+---+---+---+---+---+
| e | x | c | e | l | l | e | n | t | \0|
+---+---+---+---+---+---+---+---+---+---+
  ^           ^               ^
  |           |               |
  + first character, "excellent" after lvalue conversion
              |               |
              + 3 + "excellent"
                              |
                              + 3 + "excellent" + 4
Now finally, what will happen when we call printf giving such a pointer as an argument? printf will consider the argument as being a pointer to a first character of a null terminated string that is the format string. Other than special sequences that start with %, all characters are copied verbatim to the output until the terminating null is met.
Another way to look into these is to remember that
*(a + b)
is equal to
a[b] (or even b[a])
and since &*x is equivalent to x,
&*(a + b) == (a + b) == (b + a) == &a[b] == &b[a]
and we get that
3 + "excellent" + 4
equals
&"excellent"[3] + 4
which equals
&"excellent"[3 + 4]
i.e.
&"excellent"[7]
This
printf(3+"excellent"+4);
can be written in a slightly longer but much clearer way:
const char *str = "excellent";
const char *to_print = str + 3 + 4; // equivalent to &str[7] which points to 'n'
printf(to_print); // or printf("%s", to_print); which prints "nt"
That is because it prints everything from index 7 onward, i.e. after skipping the first 7 characters. The additions determine where printing starts. If you change it to printf(2+"excellent"+4), you get "ent".

Achieve the output in one statement

I was given this question by my school teacher. I was supposed to add in one statement in the C code and achieve this desired output.
I have tried but i am stuck. I think the main idea of this question is to establish the relationship between the int x[] and the y[] string as i increases from 0 to 6.
The code is below:
#include <stdio.h>
int main(){
int i, x[] = {-5,10,-10,-2,23,-20};
char y[20] = "goodbye";
char * p = y;
for (i=0;i<6;i++){
*(p + i) = //Fill in the one line statement here
}
y[6] = '\0';
printf("%s\n",p); //should print out "byebye"
}
As you can see, the ASCII value of b is 5 less than that of g, and similarly y is 10 greater than o, and so on. So a solution that uses the values of x, as the exercise intends, is:
*(p+i) = (char)(*(p+i)+x[i]);
Yes, one thing mentioned by rici is very important: *(p+i) is nothing other than p[i]. The subscript form is much cleaner to use, and underneath it is still evaluated as *(p+i).
From standard 6.5.2.1p2 C11 N1570
A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
So the standard confirms this. That being said, the statement can be as simple as
p[i]+=x[i];
Other ideas that came to mind while solving it:
My very first thought was the following, which establishes no relation between x and y:
*(p + i) = "byebye"[i];
String literals are arrays; the literal decays into a pointer to its first element, and then we evaluate *(decayed pointer + i). This assigns the characters of "byebye" to the char array y one at a time.
Or something like this:- (too many hardcoded values - this does relate x and y)
*(p+i) = *(y+4+i%3);
Using the modulus operation, you can make the loop assign byebye to the 6 char values that p points to.
This works because you are starting from y[4] which is 'b'.
The 6 in the for loop is your next hint. You need to iterate through bye twice. bye has 3 characters.
This gives you:
*(p + i) = y[4+(i%3)];

String as an array index

In 3["XoePhoenix"], array index is of type array of characters. Can we do this in C? Isn't it true that an array index must be an integer?
What does 3["XeoPhoenix"] mean?
3["XoePhoenix"] is the same as "XoePhoenix"[3], so it will evaluate to the char 'P'.
The array syntax in C is nothing more than a different way of writing *( x + y ), where x and y are the subexpressions before and inside the brackets. Because addition is commutative, these subexpressions can be exchanged without changing the meaning of the expression.
So 3["XeoPhoenix"] is compiled as *( 3 + "XeoPhoenix" ) where the string decays to a pointer and 3 is added to this pointer which in turn results in a pointer to the 4th char in the string. The * dereferences this pointer and so this expression evaluates to 'P'.
"XeoPhoenix"[ 3 ] would be compiled as *( "XeoPhoenix" + 3 ) and you can see that would lead to the same result.
3["XeoPhoenix"] is equivalent to "XeoPhoenix"[3] and would evaluate to the 4th character i.e 'P'.
In general a[i] and i[a] are equivalent.
a[i] = *(a + i) = *(i + a) = i[a]
In C, arrays are very simple data structures made of consecutive blocks of memory. Array indices therefore need to be integers, as they are nothing more than offsets to addresses in memory.

Resources