why sizeof unsigned char array[10] is 10 - c

The size of char is 1 byte, and wikipedia says:
sizeof is used to calculate the size of any datatype, measured in the
number of bytes required to represent the type.
However, I can store 11 bytes in unsigned char array[10] (indices 0..10), but when I do sizeof(array) I get 10 bytes. Can someone explain this behavior?
Note: I have tried this with the int datatype; sizeof(array) was 40, where I expected 44.

However, I can store 11 bytes in unsigned char array[10]
No, you cannot: 10 is not a valid index of array[10]. Arrays are indexed from zero to size minus one.
According to C99 Standard
6.5.3.4.3 When [sizeof operator is] applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
That is why the result is ten on every standard-compliant platform: the array holds ten elements, each of size 1.

No, the valid indices are 0-9, not 0-10. The array stores 10 elements, not 11, so the result of sizeof is correct. Accessing beyond index 9 is out of bounds and undefined behavior. The relevant section of the C99 draft standard is 6.5.6/8, which covers pointer arithmetic:
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
Unlike the C++ standard which explicitly states an array has N elements numbered 0 to N-1 it looks like you need to dig into the examples for a similar statement in the C standard. In the C99 draft standard section 6.5.2.1/4, the example is:
int x[3][5];
and it goes on to state:
Here x is a 3 x 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints.

unsigned char array[10];/*Array of 10 elements*/
which means
array[0],array[1],array[2],array[3].......array[9]
so sizeof(array)=10 is correct.

Related

Unexpected result when doing subtraction of addresses of array elements

I am on a x32-based processor where char = 1 byte, short = 2 bytes and int = 4 bytes.
When I create an array of type char with 20 elements in it, I expect to see 20 memory spaces allocated to that array with the addresses differing by only 1 byte because of the type of the array.
If I take two consecutive elements from the array and subtract their addresses, should I then not get 1 in this case?
And in the case of arrays with types short and int, I am expecting to get 2 and 4. This is due to the fact that the short and int elements need to be aligned in memory. short elements will be on even addresses (diff 2) and int elements will be on addresses divisible by 4.
Though, how come when I run the following code I get 1,1,1 and not 1,2,4?
I suspect I am missing some crucial detail when it comes to pointer arithmetic.
char vecc[20];
printf("%i\n", &vecc[1]-&vecc[0]);
short vecs[20];
printf("%i\n", &vecs[1]-&vecs[0]);
int veci[20];
printf("%i\n", &veci[1]-&veci[0]);
Pointer subtraction yields the result as difference in the indexes, not the size of the gap between the addresses.
Quoting C11, chapter 6.5.6, (emphasis mine)
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...]
If you write the code in this way:
printf("%td\n", (char*)(&vecs[1]) - (char*)(&vecs[0])); /* %td is the conversion for ptrdiff_t */
printf("%td\n", (char*)(&veci[1]) - (char*)(&veci[0]));
the output will be 2 and 4.

How does the sizeof operator behave in the code snippet below?

Please explain the output of the code snippet below:
int *a="";
char *b=NULL;
float *c='\0' ;
printf(" %d",sizeof(a[1])); // prints 4
printf(" %d",sizeof(b[1])); // prints 1
printf(" %d",sizeof(c[1])); // prints 4
The compiler interprets a[1] as *(a+1), so a holds some address and the expression steps 4 bytes ahead, where there will be some garbage value. So how is the output 4 bytes? Even if I use a[0], it still prints 4, although a points to an empty string. How come its size is 4 bytes?
Here we are finding the size of the object the pointer points to, so the size of a[1] means the size of *(a+1). Now a holds the address of a string constant, which is an empty string. After I add 1 to that address, it moves 4 bytes ahead to some new address. How do we know the size of the value there? It could be an integer, a character, a float, anything. So how do we reach a conclusion?
The sizeof operator does not evaluate its operand, except in one case.
From the C Standard (6.5.3.4 The sizeof and alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand, which
may be an expression or the parenthesized name of a type. The size is
determined from the type of the operand. The result is an integer.
If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.
In this code snippet
int *a="";
char *b=NULL;
float *c='\0' ;
printf(" %d",sizeof(a[1])); // prints 4
printf(" %d",sizeof(b[1])); // prints 1
printf(" %d",sizeof(c[1])); // prints 4
the type of the expression a[1] is int, the type of the expression b[1] is char and the type of the expression c[1] is float.
So the printf calls output correspondingly 4, 1, 4.
However the format specifiers in the calls are specified incorrectly. Instead of "%d" there must be "%zu" because the type of the value returned by the sizeof operator is size_t.
From the same section of the C Standard
5 The value of the result of both operators is implementation-defined,
and its type (an unsigned integer type) is size_t, defined in
<stddef.h> (and other headers).
This is all done statically, i.e. no dereferencing is happening at runtime. This is how the sizeof operator works, unless you use variable-length arrays (VLAs), then it must do work at runtime.
Which is why you can get away with sizeof:ing through a NULL pointer, and other things.
You should still be getting trouble for
int *a = "";
which makes no sense. I really dislike the c initializer too, but at least that makes sense.
sizeof operator happens at compilation (except for VLA's). It is looking at the type of an expression, not the actual data so even something like this will work:
sizeof(((float *)NULL)[1])
and give you the size of a float. Which on your system is 4 bytes.
Even though this looks super bad, it is all well defined, since no dereference ever actually occurs. This is all operations on type information at compile time.
sizeof() is based on the data type, so whilst it's getting the sizes outside the bounds of memory allocated to your variables, it doesn't matter as it's worked out at compile time rather than run time.

Why (1)["abcd"]+"efg"-'b'+1 becomes "fg"?

#include <stdio.h>
int main()
{
printf("%s", (1)["abcd"]+"efg"-'b'+1);
}
Can someone please explain why the output of this code is:
fg
I know (1)["abcd"] points to "bcd" but why is +"efg"-'b'+1 even valid syntax?
I know (1)["abcd"] points to "bcd"
No. (1)["abcd"] is a single char (b).
So (1)["abcd"]+"efg"-'b'+1 is: 'b' + "efg" - 'b' + 1 and if you simplify it, it becomes "efg" + 1. Hence it prints fg.
Note: The above answer explains only the observed behaviour which is not strictly legal as per the C language specification. Here's why.
case 1: 'b' < 0 or 'b' > 4
In this case, the expression (1)["abcd"] + "efg" - 'b' + 1 will lead to undefined behaviour, due to the sub-expression (1)["abcd"] + "efg", which is 'b' + "efg", producing an invalid pointer expression (C11, 6.5.6 Additive operators, paragraph 8; quoted below).
On the widely used ASCII character set, 'b' is 98 in decimal; on the not-so-widely used EBCDIC character set, 'b' is 130 in decimal. So the sub-expression (1)["abcd"] + "efg" would cause undefined behaviour on a system using either of these two.
So barring a weird architecture, where 'b' <= 4 and 'b' >= 0, this program would cause undefined behaviour due to how the C language is defined:
C11, 5.1.2.3 Program execution
The semantic descriptions in this International Standard describe the
behavior of an abstract machine in which issues of optimization are
irrelevant. [...] In the abstract machine, all expressions are
evaluated as specified by the semantics. An actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no needed side effects are produced.
which categorically states that the whole standard has been defined based on the abstract machine's behaviour.
So in this case, it does cause undefined behaviour.
case 2: 'b' >= 0 and 'b' <= 4 (This is quite imaginary, but in theory, it's possible).
In this case, the subexpression (1)["abcd"] + "efg" can be valid (and in turn, the whole expression (1)["abcd"] + "efg" - 'b' + 1).
The string literal "efg" consists of 4 chars, which is an array type (of type char[N] in C), and the C standard guarantees (see the quote below) that a pointer expression evaluating to one past the end of an array doesn't overflow or cause undefined behaviour.
The following are the possible sub-expressions and they are valid:
(1) "efg"+0 (2) "efg"+1 (3) "efg"+2 (4) "efg"+3 and (5) "efg"+4 because C standard states that:
C11, 6.5.6 Additive operators
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i−n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
So it's not causing undefined behaviour in this case.
Thanks #zch & #Keith Thompson for digging out the relevant parts of C standard :)
There seems to be some confusion about the difference between the other two answers. Here's what happens, step by step:
(1)["abcd"]+"efg"-'b'+1
The first part, (1)["abcd"] takes advantage of the way arrays are processed in C. Let's look at the following:
int a[5] = { 0, 10, 20, 30, 40 };
printf("%d %d\n", a[2], 2[a]);
The output will be 20 20. Why? Because the name of an array of int evaluates to its address, and its data type is pointer to int. Referring to an element of the integer array tells C to add an offset to the address of the array and evaluate the result as type int. But this means C doesn't care about the order: a[2] is exactly the same as 2[a].
Similarly, since a is the address of the array, a + 1 is the address of the element at the first offset into the array. Of course, that's equivalent to 1 + a.
A string in C is just another, human-friendly, way of representing an array of type char. So (1)["abcd"] is the same as returning the element at the first offset into an array of the characters a, b, c, d, \0 ... which is the character b.
In C, every character has an integral value (generally its ASCII code). The value of b happens to be 98. The remainder of the evaluation, therefore, involves calculations with integers and an array: the character string "efg".
We have the address of the string. We add and subtract 98 (the ASCII value of the character b), and we add 1. The b's cancel each other, so the net result is one more than the address of the first character in the string, which is the address of the character f.
The %s conversion in the printf() tells C to treat the address as the first character in a string, and to print the entire string until it encounters the null character at the end.
So it prints fg, which is the part of the string "efg" that starts at the f.

Negative index in array [duplicate]

Can I use negative indices in arrays?
#include <stdio.h>
int main(void)
{
char a[] = "pascual";
char *p = a;
p += 3;
printf("%c\n", p[-1]); /* -1 is valid here? */
return 0;
}
Yes, -1 is valid in this context, because it points to a valid location in memory allocated to your char a[] array. p[-1] is equivalent to *(p-1). Following the chain of assignments in your example, it is the same as a+3-1, or a+2, which is valid.
EDIT : The general rule is that an addition / subtraction of an integer and a pointer (and by extension, the equivalent indexing operations on pointers) need to produce a result that points to the same array or one element beyond the end of the array in order to be valid. Thanks, Eric Postpischil for a great note.
C 2011 online draft
6.5.6 Additive operators
8 ...if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist....
Emphasis mine. So, in your specific example, p[-1] is valid, since it points to an existing element of a; however, a[-1] would not be valid, since a[-1] points to a non-existent element of a. Similarly, p[-4] would not be valid, a[10] would not be valid, etc.
Of course it is valid.
(C99, 6.5.2.1p1) "One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’."
Generally, using such negative indexes is a Bad Idea(TM). However, I found one place where this can be useful: trig lookup tables. For such a lookup table, we need to use some angle measure as the index. For example, I can index sin values for angles between -180 degrees and +180 using degrees as an index. Or if I want to use radians instead, I can use a multiple of some fraction of PI, say PI/3, for the index. Then I can get cos values between -PI and PI by multiples of PI/3.
Yes, this is legal, as C lets you do unsafe pointer arithmetic all day. However, this is confusing, so don't do it. See also this answer to the same question.

C pointer addition and subtraction in sect. 6.5.6

I am trying to understand paragraphs 8 and 9 of C99 sect 6.5.6 (Additive operators)
Does para 8 mean:
int a [4];
int *p = a;
p --; /* undefined behaviour */
p = a + 4; /* okay */
p --; /* okay */
p += 2; /* undefined behaviour */
p = a;
p += 5 - 5; /* okay */
p = p + 5 - 5; /* undefined behaviour */
For paragraph 9, my understanding had been that ptrdiff_t is always large enough to hold the difference of 2 pointers. But the wording 'provided the value fits in an object of type ptrdiff_t' seems to suggest this understanding is wrong. Is my understanding wrong, or did C99 mean something else?
You can find a link to the draft standards here:
http://cboard.cprogramming.com/c-programming/84349-c-draft-standards.html
I don't think your interpretation is correct. In the version I have (n1256) paragraph 9 states:
If the result is not representable in an object of that type, the
behavior is undefined
That is it. If the difference is larger than PTRDIFF_MAX or smaller than PTRDIFF_MIN, the behavior is undefined.
Notice that this places the burden on the programmer to check if the difference fits in ptrdiff_t. A "lazy" platform implementation could just choose a narrow type for ptrdiff_t and leave you dealing with that.
Checking for that would not be straightforward, since you can't do the subtraction without provoking UB. You'd have to use the information that the two pointers point inside (or just beyond) the same object and where the boundaries of that surrounding object are.
I agree with your understanding of paragraph 8. The standard says
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
It seems that C assumes that there is no pointer overflow inside an array, so you can increment/decrement pointers while you stay inside the array. If the result pointer is leaving the array, an overflow might occur and behaviour is undefined.
Regarding paragraph 9, I guess the standard takes into account that you might, for example, have an architecture with 32-bit pointers and 32-bit data types. Since the difference of two 32-bit pointers is in fact a sign plus 32 bits (so 33 bits), not every pointer difference will fit into a 32-bit ptrdiff_t. With a two's complement architecture this is not a problem, but it might be a problem on other architectures.

Resources