Why is there no sign character in the syntax of constants? - c

Why doesn't the standard include a sign character in the syntax of constants?
It mentions only digits; a sign character appears only in exponents.

The standard does not bother with the sign in front of numeric literals because it would be redundant.
The syntax already captures the sign as part of unary plus + and unary minus - operators. When you write
int a = -4;
the syntax of the right-hand side could be adequately described as a unary minus - expression with the operand of 4. This is the approach that the standard takes.

If - were a part of the constant -2 then 4-2 would be a syntax error (since a token is always the longest possible sequence of characters). Also, the semantics of -2147483648 and - 2147483648 would be different (the first one would be an int and the second one a long, assuming int is 32 bits and long is longer). Both of those things would be confusing.
If the - is always an operator, the semantics of -2147483648 are sometimes a little unexpected, but the more common x-1 works as expected. So that's how most programming languages, including C, work.

Related

How is the following code giving the output as -99?

Given below is a C code snippet with str initialized to "face":
char *str = "face";
printf("%d\n", -2[str]);
First, we need to parse the expression in question: We have two operators in -2[str] - a subscript operator [] and a unary minus operator -. Subscript operator has higher precedence than unary minus, so printf prints a negation of 2[str]*.
There are many Q&As explaining that 2[str] is the same as str[2], so I am not going to repeat the explanation; you can read about it in this Q&A.
Finally, the value of str[2] is 'c', which represents code of 99 on your system. Negation is applied to that value, so that is how -99 gets printed.
* Note that - is not part of an integer constant, because in C integer constants do not include sign.
The code in the question is:
char *str = "face";
printf("%d\n", -2[str]);
Let's be clear: this is horrid, and anyone writing that code should be made to rewrite it.
There are two parts to the confusion when approaching this:
1. Why is a[i] == i[a]?
2. How is -2[str] evaluated?
The linked question covers (1) extensively. Read it.
To address the second part, consider an alternative program:
#include <stdio.h>
int main(void)
{
char data[] = "XYZface";
char *str = &data[3];
printf("[%s] %d %d %d (%c)\n", str, -2[str], -(2[str]), (-2)[str], (-2)[str]);
return 0;
}
This outputs:
[face] -99 -99 89 (Y)
Why? The -2[str] notation is equivalent to -str[2] (you have read the linked Q&A, haven't you?) and not str[-2], because there are no negative literal numbers.
Read C11 §6.4.4.1 Integer constants: there are no minus signs in there. When you write -2, you have a unary minus operator and a literal 2. Mostly, that's the same as negative two, but not when mixed with a higher priority operator such as subscripting. The §6.5.2 Postfix operators such as subscripting have higher priority than the §6.5.3 Unary operators such as negation.
Let's also be clear: there is no undefined behaviour in the question's code (or mine, I trust). Technically, the value for letter 'c' (+99) is implementation-defined, but there are few extant systems where the integer value of 'c' is not 99 (but see EBCDIC for a code set where the answer would be different).
Let's dissect:
-2[str]
is
-(2[str])
because of operator precedence. Note that the -2 is not directly an integer literal; 2 is and it can receive the unary operator -, but before that happens, the [] operator is applied.
Next step is
-(str[2])
Because (a well known if curious fact) a[i]==i[a].
-('c')
Because of the format string %d, this is seen as a negative int, with the absolute value of the ASCII value of 'c'.
-(99)
-99
(This is of course a compilation of know-how by several commenters:
Jonathan Leffler, StoryTeller and a little by myself.)
As explained in the comments the code is working like this:
-(2[str]) => -(*(2 + str)) => -str[2]
str[2] is 'c', whose ASCII value is 99, so the output is -99.
Thanks, StoryTeller, for clearing this up.

Initializing bit-fields

When you write
struct {
unsigned a:3, b:2;
} x = {10, 11};
is x.b guaranteed to be 3 by ANSI C (C89)? I have read and reread the standard, but can't seem to find exactly that case.
For example, "result that cannot be represented by the
resulting unsigned integer type is reduced modulo the number that is
one greater than the largest value that can be represented by the
resulting unsigned integer type." speaks about computation, not about initialization. And moreover, bit-field is not really a type.
Also, (when speaking about unsigned t:4) "contains values in the range [0,15]", but it doesn't necessarily mean that initializer must be reduced modulo 16 to be mapped to [0,15].
Struct initialization is described in painstaking detail, but I really can't seem to find exactly that behavior. (Of course compilers do exactly that. And IBM documentation says "when you assign a value that is out of range to a bit field, the low-order bit pattern is preserved and the appropriate bits are assigned.", but I'd like to know if ANSI C standardizes that.)
"ANSI C"/C89 has been obsolete for 25 years. Therefore, my answer cites the current C standard ISO 9899:2011, also known as C11.
Pretty much everything related to bit-fields in the C standard is poorly defined. Typically, you will not find anything explicitly addressing the behavior of bit fields, but their behavior is rather specified implicitly, "between the lines". This is why you should avoid using bit fields.
However, I believe that this specific case is well-defined: it should work like any other integer initialization.
The detailed struct initialization rules you mention (6.7.9) show how the literal 11 in the initializer list is related to the variable b. Nothing strange with that. What then applies is "simple assignment", the same thing that would happen as if you wrote x.b = 11;.
When doing any kind of assignment or initialization in C, the right operand is converted to the type of the left operand. This is specified by C11 6.5.16:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
In your case, the literal 11 of type int is converted to a bit field of unsigned int:2.
Therefore, the rule you are looking for should be found in the chapter dealing with conversions (C11 6.3). What applies is what you already cited in your question, C11 6.3.1.3:
...if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The maximum value of an unsigned int:2 is 3. One more than the maximum value is 3+1=4. The compiler should repeatedly subtract this from the value 11:
11 - (3+1) = 7 does not fit, subtract once more:
7 - (3+1) = 3 does fit, store value 3
But then of course, this is the very same thing as taking the 2 least significant bits of the decimal value 11 and storing them in the bit field.
WRT "speaks about computation, not about initialization", the C89 standard explicitly applies the rules of assignment and conversion to initialization. It also says:
A bit-field is interpreted as an integral type consisting of the specified number of bits.
Given those, while a compiler warning would clearly be in order, it seems that throwing away upper-order bits is guaranteed by the standard.

double negation in C : is it guaranteed to return 0/1?

Is !!(x) guaranteed by the standard to return 0/1?
Note that I am not asking about c++, where a bool type is defined.
Yes, in C99, see §6.5.3.3/5:
The result of the logical negation operator ! is 0 if the value of its operand compares
unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int.
The expression !E is equivalent to (0==E).
So !x and !!y can only yield 0 or 1, as ints.
For other operators, in C99, see also Is the "true" result of >, <, !, &&, || or == defined?
This is a comment really, but it's too long.
I found a very bizarre document while looking for the standard to answer your question: The New C Standard: An Economic and Cultural Commentary. And they say academia is under-funded. (Here is the full, 2083 page 10.5MB PDF. The former link is just the section on double negation.)
It has this to say on the subject of double negation:
A double negative is very often interpreted as a positive statement in English (e.g., “It is not unknown for double negatives to occur in C source”). The same semantics that apply in C. However, in some languages (e.g., Spanish) a double negative is interpreted as making the statement more negative (this usage does occur in casual English speech, e.g., “you haven’t seen nothing yet”, but it is rare and frowned on socially).
I believe that the author would be happy knowing that this is of no use whatsoever in answering your real question (the answer to which is yes.)

Write regular expression for C numerical literals

My homework is to write a regular expression representing the language of numerical literals from C programming language. I can use l for letter, d for digit, a for +, m for -, and p for point. Assume that there are no limits on the number of consecutive digits in any part of the expression.
Some of the examples of valid numerical literals were 13. , .328, 41.16, +45.80, -2.e+7, -.4E-7, 01E-06, +0
I came up with: (d+p+a+m)(d+p+E+e+a+m)*
update2: (l+d+p+a+m)(d+p+((E+e)(a+m+d)d*) )* I'm not sure how to prevent something like 1.0.0.0eee-e1.
Your regular expression does not support the various suffixes (l, u, f, etc.), nor does it support hexadecimal or octal constants.
The leading signs (+ or - in front of the number) are not lexically part of the constant; they are the unary + and - operators. Effectively, all integer and floating constants are positive.
If you need to fully support C99 floating constants, you need to support hexadecimal exponents (p instead of e).
Your regular expression also accepts many invalid sequences of characters, like 1.0.0.0eee-e1.
A single regular expression to match all C integer and floating literals would be quite long.
Untested, but this should be along the right lines for decimal at least. (Also, it accepts the string ".", or I think it does anyway; to fix that would eliminate the last of the common code between integer and FP, the leading [0-9]*.)
[0-9]*([0-9]([uU](ll?+LL?)+(ll?+LL?)?[uU]?)+(\.[0-9]*)?([eE][+-]?[0-9]+)[fFlL])
This regex should match everything you need:
[+-]?(?P<Dot1>\.)?\d+(?(Dot1)(?#if_dot_exist_in_the_beginning__do_nothing)|(?#if_dot_not_exist_yet__we_accept_optional_dot_now)(?P<Dot2>\.)?)\d*(?P<Exp>[Ee]?)(?(Exp)[+-]?\d*)

In the C language, what does "return ~0" mean?

I'm dealing with some C code that includes
return ~0;
What does that mean? It's pretty much impossible to google for...
~ is a bitwise not/complement, aka it changes all 0's to 1's and vice-versa. ~0 is a value with all bits set to 1.
The key to answering this class of question as you inspect the code is to recognize enough of the structure of the language to know what question to ask. For example, the return statement requires an expression, of a type compatible with the declared return type for the function itself.
Knowing that ~0 must be an expression, it is either a really funny way to write a number, or it is an operator you don't recognize applied to the constant zero. That latter hypothesis is easily checked, and googling for "C language operator" will quickly lead to dozens of tables of operators. Nearly any one of which will tell you that the ~ operator is a bitwise-not unary operator which inverts each individual bit of its operand. In this specific case, that converts the signed integer 0 to the integer represented with all its bits set.
On the majority of platforms you will encounter, that integer has the value -1.
The ~ (tilde) operator performs a bitwise complement on its single integer operand.
Complementing a number means changing all the 0 bits to 1s and all the 1 bits to 0s.
Anyway, for search queries with special symbols like yours "return ~0;" you can use
http://symbolhound.com/
It is pretty useful for programmers.
There are two independent parts here: return and ~0.
return is a return statement. Read about it in your favorite C book.
~0 is an expression consisting of bitwise-complement operator ~ applied to integer constant 0. All bits in a zero value of type int are inverted (become 1) and the resultant int value (with all bits set to 1) is what the ~0 expression evaluates to. On a two's complement machine a signed integral value with such bit pattern (111...1) would represent -1.
Not zero or True.
The tilde does a bitwise complement of the number 0, which yields a value with all bits set to 1, at whatever size the return value has (so you'd get 0xFF for a char, etc.).
