sizeof operator in C behavior with strings

For the sizeof operator I am seeing the following results, and I am not able to understand the reason behind them.
What I understand is that the sizeof operator returns its result as a size_t.
Below are the results:
sizeof("6") -> 2
sizeof("a") -> 2
sizeof('a') -> 4
sizeof("something") -> 10
sizeof("some") -> 5

By definition (C11 3.6), 1 char requires 1 byte (may not be 1 octet in some exotic system)
"6" has type char[2], so 2 bytes
"a" has type char[2]
'a' has type int ==> in your system, int requires 4 bytes
"something" has type char[10]
"some" has type char[5]
Note that "a" and 'a' are very different things: "a" is an array of char with 2 elements; 'a' is an int value, very much like 42 or -1.
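As a rough illustration of the sizes listed above, the following sketch wraps a few of the sizeof expressions in small functions. The function names are made up for this example. The result for 'a' is best compared against sizeof(int) rather than a fixed number, since the size of int depends on the platform.

```c
#include <stddef.h>

/* "6" has type char[2]: the digit plus the terminating '\0'. */
size_t size_of_literal_6(void) { return sizeof("6"); }

/* "something" has type char[10]: 9 letters plus '\0'. */
size_t size_of_literal_something(void) { return sizeof("something"); }

/* In C, the character constant 'a' has type int. */
size_t size_of_char_constant(void) { return sizeof('a'); }
```

Calling these from main and printing the results with %zu should reproduce the numbers in the question on a system where int occupies 4 bytes.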

In an expression like this
sizeof("6") -> 2
a string literal is used as the operand.
String literals are character arrays that store a sequence of characters terminated with the zero character '\0'.
So for example the string literal "6" is stored in memory like
char literal_6[] = { '6', '\0' };
Or this declaration is equivalent to
char literal_6[2] = { '6', '\0' };
Note: for example the string literal "some" is stored in memory like a character array declared as
char literal_some[] = { 's', 'o', 'm', 'e', '\0' };
So the expression sizeof("6") is equivalent to the expression sizeof( char[2] ).
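One way to check this equivalence is to compare a string literal with an explicitly spelled-out array, byte by byte. This is only a sketch; the function name literal_matches_array is invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if "some" has the same size and contents as the explicit array. */
int literal_matches_array(void)
{
    const char literal_some[] = { 's', 'o', 'm', 'e', '\0' };

    /* Both operands are arrays of 5 char, so both sizes are 5. */
    if (sizeof("some") != sizeof(literal_some))
        return 0;

    /* Compare all 5 bytes, including the terminating '\0'. */
    return memcmp("some", literal_some, sizeof(literal_some)) == 0;
}
```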
In this expression
sizeof('a') -> 4
the operand is the integer character constant 'a', which has the type int.
So this expression sizeof('a') is equivalent to the expression sizeof( int ).
It is interesting to note that for example sizeof("something") is not equal to sizeof("something" + 0). In the last expression the character array that denotes the string literal is implicitly converted to pointer to its first element. So the last expression is equivalent to the expression sizeof( char * ).
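The difference between the array and the converted pointer can be sketched as follows. The function names are hypothetical, and the pointer size is deliberately not assumed to be any particular number.

```c
#include <stddef.h>

/* The literal itself is an array: char[10], so this is 10. */
size_t size_as_array(void) { return sizeof("something"); }

/* Adding 0 forces the array-to-pointer conversion, so this is sizeof(char *). */
size_t size_as_pointer(void) { return sizeof("something" + 0); }
```

On many 64-bit systems the second function would return 8, but that value is implementation-defined.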
Also note that if you have, for example,
int x = 10;
size_t n = sizeof( ++x );
then x will not be equal to 11 after the declaration of n because in this case the expression used as an operand of the sizeof operator is not evaluated. It is only the type of the expression that is important.
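A small sketch of this behavior, using a made-up function name, returns the value of x after the sizeof expression so that it can be inspected. Some compilers may warn that the operand has no effect, which is exactly the point being shown.

```c
#include <stddef.h>

/* Returns the value of x after ++x has appeared as a sizeof operand. */
int x_after_sizeof(void)
{
    int x = 10;
    size_t n = sizeof(++x);  /* operand is not evaluated; only its type (int) is used */
    (void)n;                 /* n is unused otherwise; silence the warning */
    return x;                /* x has not been incremented */
}
```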
On the other hand if you have a variable length array then the operator sizeof will evaluate at run-time to determine its size. Here is a demonstrative program.
#include <stdio.h>
int main(void)
{
    for ( int i = 1; i < 5; i++ )
    {
        int a[i];
        printf( "sizeof( a[%d] ) = %zu\n", i, sizeof( a ) );
    }
    return 0;
}
The program output is
sizeof( a[1] ) = 4
sizeof( a[2] ) = 8
sizeof( a[3] ) = 12
sizeof( a[4] ) = 16

Related

String literals for array

#include <stdio.h>
int main(void) {
    char a[] = "125"; // (int)1, (int)2, (int)5. But array 'a' has a type char. So int is in char. ???
    printf("%s", a);
}
In that code, each element of string literal has type int. But the array a has type char.
In C99 6.4.5 $2 fragment
The same considerations apply to each element of the sequence in a character string
literal or a wide string literal as if it were in an integer character constant or a wide
character constant
In C99 6.4.5 $5 fragment
For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence
I think they are not compatible; it looks like a contradiction to me. What is wrong with my reasoning?
No, this is a string literal.
Quoting C11, chapter 6.4.5, String Literals:
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz".[...]
To elaborate, the acceptable syntax for a string literal is:
string-literal:
encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix:
u8
u
U
L
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except
the double-quote ", backslash \, or new-line character
escape-sequence
and the "source character set" is described in chapter 5.2.1/P3:
Both the basic source and basic execution character sets shall have the following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
So, a construct like "123" is a string literal, not individual integers held by/in char.
char a[] = "125";
In that code, each element of string literal has type int. But the array a has type char.
No, the fact that it's a 5 does not mean it has to be an int. Its type is determined by the context in which it appears.
In your case that 5 is of type char because it is part of the string literal.
Also note that a 5 could have any of several other types, such as unsigned int, unsigned short, double, etc. So again, you must look at how it is declared in the first place.
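To make the point concrete, here is a minimal sketch that inspects an array initialized from "125". The function names are invented for illustration.

```c
#include <stddef.h>

/* Bytes in the array initialized from "125": '1', '2', '5' and '\0'. */
size_t size_of_array_125(void)
{
    char a[] = "125";
    return sizeof a;
}

/* Size of one element of that array; sizeof(char) is 1 by definition. */
size_t size_of_element_125(void)
{
    char a[] = "125";
    return sizeof a[0];
}

/* Returns 1 if the first element is the character '1' and not the integer 1. */
int first_element_is_char_1(void)
{
    char a[] = "125";
    return a[0] == '1' && a[0] != 1;
}
```

Each element holds a character code, not the numeric value of the digit, and each element has type char.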

What does "20"[1] do?

In a test exam, we were told to find the value of some expressions.
All but 1 were clear, which was "20"[1]. I thought it was the 1st index of the number, so 0, but testing with a computer it prints 48.
What exactly does that 'function' do?
It's not a function, it's just indexing an array.
"20" here is a character array, and we're taking the value at index 1 - which is '0' - the character '0'.
This is the same as
char chArr[] = "20"; // using a variable to hold the array
printf ("%d", chArr[1]); // use indexing on the variable, decimal value 48
printf ("%c", chArr[1]); // same as above, will print character representation, 0
The decimal value of '0' is 48, according to ASCII encoding, the most common encoding around these days.
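If the numeric value of the digit is what was expected, subtracting '0' gives it, because the C standard requires the decimal digit characters to have consecutive codes. A short sketch, with function names made up for this example:

```c
/* Character code of the second element of "20"; this is 48 under ASCII. */
int second_char_code(void) { return "20"[1]; }

/* Numeric value of that digit: '0' - '0' is 0 regardless of the encoding. */
int second_char_as_digit(void) { return "20"[1] - '0'; }
```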
Well, depending on your point of view it's either '0', 48, or 0x30.
#include <stdio.h>
int main()
{
    printf("'%c' %d 0x%X\n", "20"[1], "20"[1], "20"[1]);
    return 0;
}
The above prints
'0' 48 0x30
In this subscripting expression
"20"[1]
"20" is a string literal that has the type char[3]. Used in an expression, the literal is converted to a pointer to its first element.
So this expression
"20"[1]
yields the second element of the string literal that is '0'.
You can think of this expression as
char *p = "20";
char c = p[1];
48 is the ASCII value of the character '0'.
A more exotic form of the expression is
1["20"]
which is equivalent to the previous one.
From the C Standard (6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
Here is a demonstrative program
#include <stdio.h>
int main(void)
{
    printf( "\"20\"[1] == '%c' and its ASCII value is %d\n", "20"[1], "20"[1] );
    printf( "1[\"20\"] == '%c' and its ASCII value is %d\n", 1["20"], 1["20"] );
    return 0;
}
Its output is
"20"[1] == '0' and its ASCII value is 48
1["20"] == '0' and its ASCII value is 48

How does printf(3+"excellent"+4) this line run?

I don't understand why the output is nt in this program.
Can anyone explain this program?
#include <stdio.h>
#include <stdlib.h>
int main(){
    printf(3+"excellent"+4); //output is "nt"
    return 0;
}
"excellent" is an array of type char[10], the elements of which are the 9 letters of the word and the terminating '\0'. And then, C11 6.3.2.1p3,
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. [...]
i.e. it is converted to a pointer to the first character of the string, (e), and then has the type char *.
Now we have two additions:
(3 + (char *)"excellent") + 4
The C standard says (simplified, C11 6.5.6p8) that when adding an integer and a pointer together, the result will be a pointer of the same type, and will be interpreted so that if the pointer p was pointing to element n of an array, then p + m will result in a pointer that will point to element n + m of the same array, or one past the end, or, if n + m is outside the bounds of the array or one past the end, the behaviour is undefined.
I.e. 3 + "excellent" will give a pointer that will point to the 2nd letter e of excellent. Now of course since the parenthesized expression has type char * and it points to the element 3 of the array, if we add 4 to it, we get a pointer that points to the element 7, i.e. 8th letter, the n.
<-------------- char [10] -------------->
+---+---+---+---+---+---+---+---+---+---+
| e | x | c | e | l | l | e | n | t | \0|
+---+---+---+---+---+---+---+---+---+---+
  ^           ^               ^
  |           |               |
  |           |               + 3 + "excellent" + 4
  |           |
  |           + 3 + "excellent"
  |
  + first character, "excellent" after lvalue conversion
Now finally, what will happen when we call printf giving such a pointer as an argument? printf will consider the argument as being a pointer to a first character of a null terminated string that is the format string. Other than special sequences that start with %, all characters are copied verbatim to the output until the terminating null is met.
Another way to look into these is to remember that
*(a + b)
is equal to
a[b] (or even b[a])
and since &*x is equivalent to x,
&*(a + b) == (a + b) == (b + a) == &a[b] == &b[a]
and we get that
3 + "excellent" + 4
equals
&"excellent"[3] + 4
which equals
&"excellent"[3 + 4]
i.e.
&"excellent"[7]
This
printf(3+"excellent"+4);
can be written in a slightly longer but much clearer way:
const char *str = "excellent";
const char *to_print = str + 3 + 4; // equivalent to &str[7] which points to 'n'
printf(to_print); // or printf("%s", to_print); which prints "nt"
It is because it prints everything after the first 7 characters. The additions tell it where to start printing. If you change it to printf(2+"excellent"+4) you get "ent", because printing then starts at index 6.
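Since adding too large an offset to the pointer is undefined behavior, a cautious version might check the offset against the string length first. The helper below, suffix_from, is a hypothetical name and only a sketch of that idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns s advanced by n characters,
   or NULL if n is larger than the length of the string. */
const char *suffix_from(const char *s, size_t n)
{
    /* n == strlen(s) is allowed: the result points at the '\0'. */
    return n <= strlen(s) ? s + n : NULL;
}
```

A call such as suffix_from("excellent", 3 + 4) yields a pointer to "nt". Printing it with printf("%s", ...) rather than passing it as the format string avoids surprises if the text ever contains a % character.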

Accessing the previous and next members of an array element using just a pointer to one element

(Adapted from this deleted question.)
Suppose we have an array int a[n] and we have a pointer to an element in the middle of the array (i.e. int *p = &a[y] with 0 < y < n-1).
If p is passed into a function where we don't have direct access to the array, how can I access the elements immediately before and after the given array element so that they can be added together?
For example, if a is in scope the sum can be gotten easily like this:
int sum = a[y-1] + a[y+1];
But in a function where a is not in scope:
int sum_prev_next(int *p)
{
    ...
}
Called like this:
sum = sum_prev_next(&a[y]);
How can this function access the previous and next elements to return the sum?
Assuming the pointer in question does not point to either the first or last element of the array, you can use pointer arithmetic to access the previous and next elements of the array.
int sum_prev_next(int *p)
{
    return *(p-1) + *(p+1);
}
Or equivalently:
int sum_prev_next(int *p)
{
    return p[-1] + p[1];
}
The negative array subscript may be unusual, but is well defined in this case. This can be better explained with a diagram:
       p-1  p  p+1
        |   |   |
        v   v   v
  -------------------------
a | 0 | 1 | 2 | 3 | 4 | 5 |
  -------------------------
If p points to a[2], then p[-1] is the same as a[1] and p[1] is the same as a[3].
It's important to note that this function has the precondition that p does not point to the first or last element of an array. If it did, then accessing either p[-1] or p[1] would invoke undefined behavior by either creating a pointer to one before the start of the array or by dereferencing a pointer to one past the end of the array (creating a pointer to one past the end is OK).
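When the caller does have the array and its length available, the precondition can be checked before the pointer arithmetic is done. The following is only a sketch of that idea; sum_prev_next_checked and its parameters are assumptions for this example and not part of the original question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checked variant: base points to the first of n elements and
   y is the index of the middle element. Stores the sum in *out and returns
   true only if both neighbours exist. */
bool sum_prev_next_checked(const int *base, size_t n, size_t y, int *out)
{
    /* Need at least 3 elements, and y must be neither the first nor the last. */
    if (n < 3 || y == 0 || y >= n - 1)
        return false;

    const int *p = base + y;   /* same pointer as &base[y] */
    *out = p[-1] + p[1];       /* well defined: both neighbours are inside the array */
    return true;
}
```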
According to the definition of array subscripting (C Standard, 6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
this declaration
int sum = a[y-1] + a[y+1];
can be equivalently rewritten like
int sum = *( a + y - 1 ) + *( a + y + 1 );
that in turn can be rewritten like
int sum = *( ( a + y ) - 1 ) + *( ( a + y ) + 1 );
where the subexpression a + y represents the pointer p defined like
int *p = &a[y];
or (that is the same) like
int *p = a + y;
because according to the standard conversions (C Standard, 6.3.2.1 Lvalues, arrays, and function designators)
Except when it is the operand of the sizeof operator or the unary &
operator, or is a string literal used to initialize an array, an
expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue. If the array
object has register storage class, the behavior is undefined.
So the declaration can be rewritten like
int sum = *( p - 1 ) + *( p + 1 );
Now again returning to the first quote from the C Standard we get
int sum = p[-1] + p[1];
And vice versa having the above declaration we can rewrite it like
int sum = *( p - 1 ) + *( p + 1 );
Taking into account the definition of p like
int *p = a + y;
the declaration can be rewritten like
int sum = *( ( a + y ) - 1 ) + *( ( a + y ) + 1 );
or
int sum = *( a + ( y - 1 ) ) + *( a + ( y + 1 ) );
that gives
int sum = a[y-1] + a[y+1];

String as an array index

In 3["XeoPhoenix"], the array index is of type array of characters. Can we do this in C? Isn't it true that an array index must be an integer?
What does 3["XeoPhoenix"] mean?
3["XeoPhoenix"] is the same as "XeoPhoenix"[3], so it will evaluate to the char 'P'.
The array syntax in C is not more than a different way of writing *( x + y ), where x and y are the sub expressions before and inside the brackets. Due to the commutativity of the addition these sub expressions can be exchanged without changing the meaning of the expression.
So 3["XeoPhoenix"] is compiled as *( 3 + "XeoPhoenix" ) where the string decays to a pointer and 3 is added to this pointer which in turn results in a pointer to the 4th char in the string. The * dereferences this pointer and so this expression evaluates to 'P'.
"XeoPhoenix"[ 3 ] would be compiled as *( "XeoPhoenix" + 3 ) and you can see that would lead to the same result.
3["XeoPhoenix"] is equivalent to "XeoPhoenix"[3] and would evaluate to the 4th character i.e 'P'.
In general a[i] and i[a] are equivalent.
a[i] = *(a + i) = *(i + a) = i[a]
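The three spellings can be compared directly. This is a minimal sketch with made-up function names; each function returns the same character.

```c
/* Conventional subscript: array first, index in brackets. */
char via_subscript(void) { return "XeoPhoenix"[3]; }

/* Swapped operands: index first, array in brackets. */
char via_swapped(void) { return 3["XeoPhoenix"]; }

/* Explicit pointer arithmetic, which is what both forms above mean. */
char via_pointer(void) { return *("XeoPhoenix" + 3); }
```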
In C, arrays are very simple data structures that occupy consecutive blocks of memory. Array indices therefore need to be integers, as they are nothing more than offsets to addresses in memory.
