I am reverse engineering some old C, running under Win95 (yes, in production), which appears to have been compiled with a Borland compiler (I don't have the toolchain).
There is a function which does (among other things) something like this:
static void unknown(int *value)
{
    int v = *value;
    v-=0x8000;
    *value = v;
}
I can't quite work out what this does. I assume 'int' in this context is signed 32-bit. I think 0x8000 would be an unsigned 32-bit int, and outside the range of a signed 32-bit int. (Edit: this is wrong; it is outside the range of a signed 16-bit int.)
I am not sure which operand would be converted first, how the conversion would handle overflow, and/or how the subtraction would handle the overflow.
I could try on a modern system, but I am also unsure if the results would be the same.
Edit for clarity:
1: 'v-=0x8000;' is straight from the original code; this is what makes little sense to me. v is defined as an int.
2: I have the code, this is not from asm.
3: The original code is very, very bad.
Edit: I have the answer! The answer below wasn't quite right, but it got me there (fix it up and I'll mark it as the answer).
The data in v comes from an ambiguous source, which actually seems to be sending unsigned 16-bit data, but it is being stored as a signed int. Later on in the program all values are converted to floats and normalised to an average zero point, so the actual value doesn't matter, only the order. Because we are looking at an unsigned int as a signed one, values over 32767 are incorrectly placed below 0, so this hack leaves the value signed but swaps the negative and positive halves around (without changing their order). The end result is that all numbers have the same order (but different values) as if they had been unsigned in the first place.
(...and this is not the worst code example in this program)
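To make the reordering concrete, here is a minimal sketch that simulates the 16-bit Borland int with int16_t on a modern compiler. (The narrowing conversions of out-of-range values are implementation-defined before C23, but wrap on all mainstream two's complement targets.) Note how the "fixed" column comes out in the same order as the raw unsigned data:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* unsigned 16-bit source data, as it arrives from the source */
    uint16_t raw[] = { 0u, 1u, 32767u, 32768u, 65535u };

    for (size_t k = 0; k < sizeof raw / sizeof raw[0]; k++) {
        int16_t stored = (int16_t)raw[k];            /* misread as signed */
        int16_t fixed  = (int16_t)(stored - 0x8000); /* the hack */
        printf("%5u -> stored %6d -> fixed %6d\n",
               (unsigned)raw[k], stored, fixed);
    }
    return 0;
}

The inputs 0, 1, 32767 map to -32768, -32767, -1 and 32768, 65535 map to 0, 32767: different values, same relative order.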
In Borland C 3.x, int and short were the same: 16 bits. long was 32-bits.
A hex literal has the first type in which the value can be represented: int, unsigned int, long int or unsigned long int.
In the case of Borland C, 0x8000 has the decimal value 32768 and won't fit in an int, but will fit in an unsigned int. So unsigned int it is.
The statement v -= 0x8000 ; is identical to v = v - 0x8000 ;
On the right-hand side, the int value v is implicitly converted to unsigned int, per the rules; the arithmetic operation is performed, yielding an rvalue of type unsigned int. That unsigned int is then, again per the rules, implicitly converted back to the type of the lvalue.
So, by my estimation, the net effect is to toggle the sign bit — something that could be more easily and clearly done via simple bit-twiddling: *value ^= 0x8000 ;.
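For the skeptical, a quick exhaustive check (a sketch assuming 32-bit int and the usual wrap-on-narrowing conversion of a two's complement machine) confirms that the subtraction and the XOR flip the same bit for every 16-bit value:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    for (uint32_t u = 0; u <= 0xFFFFu; u++) {
        int16_t v   = (int16_t)u;
        int16_t sub = (int16_t)(v - 0x8000);  /* the original code */
        int16_t flp = (int16_t)(v ^ 0x8000);  /* sign-bit XOR */
        if (sub != flp) {
            printf("mismatch at 0x%04x\n", (unsigned)u);
            return 1;
        }
    }
    printf("subtract and XOR agree for all 65536 values\n");
    return 0;
}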
There is possibly a clue on this page: http://www.ousob.com/ng/borcpp/nga0e24.php - Guide to Borland C++ 2.x (with Turbo C):
There is no such thing as a negative numeric constant. If a minus sign precedes a numeric constant it is treated as the unary minus operator, which, along with the constant, constitutes a numeric expression. This is important with -32768, which, while it can be represented as an int, actually has type long int, since 32768 has type long. To get the desired result, you could use (int) -32768, 0x8000, or 0177777.
This implies the use of two's complement for negative numbers. Interestingly, the two's complement of 0x8000 is 0x8000 itself (since the value +32768 does not fit in the range of a signed 2-byte int).
So what does this mean for your function? Bit-wise, it has the effect of toggling the sign bit; here are some examples:
f(0) = f(0x0000) = 0x8000 = -32768
f(1) = f(0x0001) = 0x8001 = -32767
f(0x8000) = 0x0000 = 0
f(0x7fff) = 0xffff = -1
It seems like this could be written as val ^= 0x8000. The XOR operator has been part of C from the beginning, so presumably the original author simply chose subtraction instead.
GCC version 5.4.0
Ubuntu 16.04
I have noticed some weird behavior with the right shift in C, depending on whether or not I store the value in a variable first.
This code snippet prints 0xf0000000, the expected behavior:
#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("%x", x >> 3);
}
The following two code snippets print 0x10000000, which is very weird in my opinion: it is performing a logical shift on a negative number.
1.
#include <stdio.h>

int main() {
    int x = 0x80000000 >> 3;
    printf("%x", x);
}
2.
#include <stdio.h>

int main() {
    printf("%x", (0x80000000 >> 3));
}
Any insight would be really appreciated. I do not know if this is specific to my personal computer, in which case it can't be replicated, or if it is just standard C behavior.
Quoting from https://en.cppreference.com/w/c/language/integer_constant, for a hexadecimal integer constant without any suffix:
The type of the integer constant is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used.
int
unsigned int
long int
unsigned long int
long long int (since C99)
unsigned long long int (since C99)
Also, later
There are no negative integer constants. Expressions such as -1 apply the unary minus operator to the value represented by the constant, which may involve implicit type conversions.
So, if an int has 32 bits on your machine, 0x80000000 has the type unsigned int, as it can't fit in an int and can't be negative.
The statement
int x = 0x80000000;
converts the unsigned int to an int in an implementation-defined way, but the statement
int x = 0x80000000 >> 3;
performs the right shift on the unsigned int before converting it to an int, so the results you see are different.
EDIT
Also, as M.M noted, the format specifier %x requires an unsigned integer argument and passing an int instead causes undefined behavior.
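To illustrate, here is a sketch of the snippet with the types made explicit, so that every operation is well defined: the operand is unsigned (logical shift) and %x receives the unsigned int it expects.

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x80000000u;
    printf("%x\n", x >> 3);   /* logical shift: prints 10000000 */
    return 0;
}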
Right shift of a negative integer has implementation-defined behavior, so when shifting a negative number right you can't "expect" anything.
It is simply whatever your implementation does; it is not weird.
6.5.7/5 [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.
A left shift may also invoke undefined behavior:
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
As noted by @P__J__, right shift of a negative value is implementation-defined, so you should not rely on it being consistent across platforms.
As for your specific test: it runs on a single platform (possibly 32-bit Intel, or another platform using a two's complement 32-bit representation of integers), yet still shows two different behaviors:
GCC folds operations on literal constants at compile time, but the folding follows the same type rules as run-time code. The literal 0x80000000 does not fit in a 32-bit int, so its type is unsigned int, and the statement x = 0x80000000 >> 3 is therefore folded into x = 0x10000000 using an unsigned (logical) shift. For GCC, the literal 0x80000000 is NOT a negative number: it is the positive integer 2^31.
On the other hand, x = 0x80000000 tries to store the value 2^31 into x, but a 32-bit two's complement signed integer cannot represent that positive value. The conversion is implementation-defined rather than an error, so you get no warning; on GCC the bit pattern is kept, and the high-order bit lands in the sign bit. Then, when you use x >> 3, the shift is performed on a signed 32-bit value, which is now negative, and GCC's implementation-defined choice is an arithmetic (sign-extending) shift.
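The two evaluation orders can be spelled out with casts; a sketch (the values shown are what GCC produces on a 32-bit-int target, since both the conversion and the signed right shift are implementation-defined):

#include <stdio.h>

int main(void)
{
    /* convert first (implementation-defined), then shift the
       now-negative signed value (arithmetic shift on GCC): */
    int a = (int)0x80000000u;
    printf("%x\n", (unsigned int)(a >> 3));   /* f0000000 */

    /* shift the unsigned literal first (logical), then convert: */
    int b = (int)(0x80000000u >> 3);
    printf("%x\n", (unsigned int)b);          /* 10000000 */
    return 0;
}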
/* hours and minutes are presumably globals defined elsewhere */
void times(unsigned short int time)
{
    hours = time >> 11;
    minutes = ((time << 5) >> 10);
}
Take the input time to be 24446
The output values are
hours = 11
minutes = 763
The expected values are
hours = 11
minutes = 59
What internal processing is going on in this code?
The binary of 24446 is 0101 1111 0111 1110.
Time>>11 gives 01011, which means 11.
((Time<<5)>>10) gives 111011, which means 59.
But what else is happening here?
What else is going on here?
If time is unsigned short, there is an important difference between
minutes=((time<<5)>>10);
and
unsigned short temp = time << 5;
minutes = temp >> 10;
In both expressions, time << 5 is computed as an int, because of integer promotion rules. [Notes 1 and 2].
In the first expression, this int result is then right-shifted by 10. In the second expression, the assignment to unsigned short temp narrows the result to a short, which is then right-shifted by 10.
So in the second expression, high-order bits are removed (by the cast to unsigned short), while in the first expression they won't be removed if int is wider than short.
There is another important caveat with the first expression. Since the integer promotions might change an unsigned short into a (signed) int, the intermediate value might be signed, in which case overflow would be possible if the left shift were large enough. (In this case, it isn't.) The right shift might then be applied to a negative number; the result is implementation-defined, and many implementations define right shift of a negative number as sign-extending. This can also lead to surprises.
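A short demonstration of the difference, assuming 32-bit int and 16-bit short as on the asker's platform:

#include <stdio.h>

int main(void)
{
    unsigned short time = 24446;  /* 0x5F7E */

    /* time << 5 is computed in int, so the top bits survive: */
    unsigned int direct = (time << 5) >> 10;            /* 763 */

    /* narrowing to unsigned short discards the top bits first: */
    unsigned short temp = (unsigned short)(time << 5);  /* 0xEFC0 */
    unsigned int narrowed = temp >> 10;                 /* 59 */

    printf("direct=%u narrowed=%u\n", direct, narrowed);
    return 0;
}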
Notes:
Assuming that int is wider than short. If unsigned int and unsigned short are the same width, no conversion will happen and you won't see the difference you describe. The "integer promotions" are described in §6.3.1.1/2 of the C standard (using the C11 draft):
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
Integer promotion rules effectively make it impossible to do any arithmetic computation directly with a type smaller than int, although compilers may use the as-if rule to use sub-word opcodes. In that case, they have to produce the same result as would have been produced with the promoted values; that's easy for unsigned addition and multiplication, but trickier for shift.
The bitshift operators are an exception to the semantics of arithmetic operations. For most arithmetic operations, the C standard requires that "the usual arithmetic conversions" be applied before performing the operation. The usual arithmetic conversions, among other things, guarantee that the two operands have the same type, which will also be the type of the result. For bitshifts, the standard only requires that integer promotions be performed on both operands, and that the result will have the type of the left operand. That's because the shift operators are not symmetric. For almost all architectures, there is no valid right operand for a shift which will not fit in an unsigned char, and there is obviously no need for the types or even the signedness of the left and right operands to be the same.
In any event, as with all arithmetic operators, the integer promotions (at least) are going to be performed. So you will not see intermediate results narrower than an int.
This piece of code was written assuming that int is 16 bits, so that left-shifting time would push the top 5 bits out.
Since you're most likely working on a platform where int is 32 or 64 bits, that doesn't happen as long as time holds a 16-bit value:
time >> 5 == (time << 5) >> 10
Try this:
minutes = (time >> 5) & 0x3F;
or
minutes = (time & 0x07FF) >> 5;
or
Declare time as unsigned short and cast back to unsigned short after every shift operation, since the math is done in 32/64 bits (see the sketch below).
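Putting the first suggestion together, a sketch of the fixed function (hours and minutes are assumed to be globals, as in the question):

unsigned int hours, minutes;

void times(unsigned short int time)
{
    hours   = time >> 11;          /* top 5 bits (11..15) */
    minutes = (time >> 5) & 0x3F;  /* middle 6 bits (5..10) */
}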
24446 in binary is: 0101 1111 0111 1110
Bits 0-4 - unknown
Bits 5-10 minutes
Bits 11-15 hours
It seems that the size of 'int' on the platform you are working on is 32 bits.
As far as the processing is concerned:
The first statement divides "time" by 2^11 (2048).
The second statement multiplies "time" by 2^5 (32), then divides the whole by 2^10 (1024).
That answers the question as asked.
If you add what the time value actually contains (number of seconds/milliseconds/hours or something else) then you may get more help.
Edit:
As @egur pointed out, you might be porting your code from a 16-bit to a 32/64-bit platform.
A widely accepted C coding style for making code portable is something like the following: make a Typedef.h file and include it in every other C file.
//Typedef.h
typedef unsigned int U16;
typedef signed int S16;
typedef unsigned char U8;
typedef signed char S8;
:
:
//END
Use U16, U8, etc. when declaring variables.
Now when you move to a larger processor, say 32-bit, change your Typedef.h to:
//Typedef.h
typedef unsigned int U32;
typedef signed int S32;
typedef unsigned short U16;
typedef signed short S16;
No need to change anything in the rest of the code.
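For what it's worth, since C99 the standard header <stdint.h> already provides exact-width types, so such a hand-maintained Typedef.h is no longer necessary:

#include <stdint.h>

uint16_t u16;  /* exactly 16 bits on any platform that provides it */
int32_t  s32;  /* exactly 32 bits, two's complement */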
Edit 2: After seeing your edit:
((Time<<5)>>10) gives 111011 which means 59.
For 32-bit processors, ((Time<<5)>>10) gives 0000 0010 1111 1011, which means 763.
The Standard specifies that hexadecimal constants like 0x8000 (too large to fit in a signed int) are unsigned (just like octal constants), whereas decimal constants like 32768 are signed long. (The exact types assume a 16-bit int and a 32-bit long.) However, in ordinary C environments both will have the same representation, in binary 1000 0000 0000 0000.
Is a situation possible where this difference really produces a different outcome? In other words, is a situation possible where this difference matters at all?
Yes, it can matter. If your processor has a 16-bit int and a 32-bit long type, 32768 has the type long (since it does not fit in an int, whose largest positive value is 32767), whereas 0x8000 (since hexadecimal constants are also considered for unsigned int) still fits in a 16-bit unsigned int.
Now consider the following program:
int main(int argc, char *argv[])
{
volatile long long_dec = ((long)~32768);
volatile long long_hex = ((long)~0x8000);
return 0;
}
When 32768 is considered long, the negation will invert 32 bits, resulting in a representation 0xFFFF7FFF with type long; the cast is superfluous.
When 0x8000 is considered unsigned int, the negation will invert 16 bits, resulting in a representation 0x7FFF with type unsigned int; the cast will then zero-extend to a long value of 0x00007FFF.
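To see it for yourself, one could print both values; a sketch (the results in the comments assume the 16-bit int / 32-bit long environment under discussion, and will differ on a modern compiler with 32-bit int):

#include <stdio.h>

int main(void)
{
    printf("%08lx\n", (long)~32768);   /* ffff7fff: ~ applied to a 32-bit long */
    printf("%08lx\n", (long)~0x8000);  /* 00007fff: ~ applied to a 16-bit
                                          unsigned int, then zero-extended */
    return 0;
}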
Look at H&S5 (Harbison & Steele, C: A Reference Manual, 5th edition), section 2.7.1, page 24ff.
It is best to augment the constants with U, UL or L as appropriate.
On a 32 bit platform with 64 bit long, a and b in the following code will have different values:
int x = 2;
long a = x * 0x80000000; /* multiplication done in unsigned -> 0 */
long b = x * 2147483648; /* multiplication done in long -> 0x100000000 */
Another example not yet given: compare (with greater-than or less-than operators) -1 to both 32768 and to 0x8000. Or, for that matter, try comparing each of them for equality with an int variable equal to -32768.
Assuming int is 16 bits and long is 32 bits (which is actually fairly unusual these days; int is more commonly 32 bits):
printf("%ld\n", 32768); // prints "32768"
printf("%ld\n", 0x8000); // has undefined behavior
In most contexts, a numeric expression will be implicitly converted to an appropriate type determined by the context. (That's not always the type you want, though.) This doesn't apply to non-fixed arguments to variadic functions, such as any argument to one of the *printf() functions following the format string.
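If you do need to pass such a constant to a variadic function, convert it explicitly; a sketch:

printf("%ld\n", (long)32768);   /* well defined: prints 32768 */
printf("%ld\n", (long)0x8000);  /* also well defined: prints 32768 */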
The practical difference is range: if you added to a value stored in the 16-bit int, it could exceed the bounds of the variable, whereas with a 32-bit long you could add any number less than 2^16 to it without trouble.
My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value exceeds the maximum an int can hold. This means half of the possible unsigned values will lead to erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum values are platform-dependent, but you can access them with INT_MAX and UINT_MAX from <limits.h>.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
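Rather than hard-coding those numbers, you can query the actual bounds from <limits.h>; a minimal example:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("INT_MAX  = %d\n", INT_MAX);   /* 2147483647 for 32-bit int */
    printf("INT_MIN  = %d\n", INT_MIN);   /* -2147483648 */
    printf("UINT_MAX = %u\n", UINT_MAX);  /* 4294967295 */
    return 0;
}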
An unsigned int can be converted to signed (or vice versa) with a simple cast, as shown below:
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted at the question, you might like to read the following links:
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0, INT_MAX] is implementation-defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, probably the most portable way is the one shown in the following code snippet:
#include <limits.h>

unsigned int u;
int i;

if (u <= (unsigned int)INT_MAX)
    i = (int)u;           /*(1)*/
else if (u >= (unsigned int)INT_MIN)
    i = -(int)~u - 1;     /*(2)*/
else
    i = INT_MIN;          /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow: it takes the one's complement of the value by bit-wise NOT, casts it to int (which cannot overflow now), negates the value and subtracts one, which also cannot overflow here.
Branch (3) provides the poison we have to take on one's complement or sign/magnitude targets, because the signed integer representation range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and Clang. Also, branch (3) is unreachable on such a target, so if one wants to limit execution to two's complement targets, the code can be condensed to:
#include <limits.h>

unsigned int u;
int i;

if (u <= (unsigned int)INT_MAX)
    i = (int)u;           /*(1)*/
else
    i = -(int)~u - 1;     /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
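For instance, wrapped in an inline function for the unsigned int/int pair (the name utos is just illustrative):

#include <limits.h>

static inline int utos(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX)
        return (int)u;                 /*(1)*/
    else if (u >= (unsigned int)INT_MIN)
        return -(int)~u - 1;           /*(2)*/
    else
        return INT_MIN;                /*(3)*/
}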
And yes, some of the casts to unsigned int are redundant, but they might help the casual reader, and some compilers issue warnings on signed/unsigned compares, because the implicit conversion causes some non-intuitive behavior by language design.
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
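A minimal sketch of the kind of bug this feature produces:

#include <stdio.h>

int main(void)
{
    unsigned int u = 1;
    int i = -1;

    /* i is converted to unsigned (UINT_MAX), so the test is false */
    if (i < u)
        printf("-1 < 1\n");
    else
        printf("-1 >= 1 ?!\n");  /* this branch is taken */
    return 0;
}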
Some explanation from C++ Primer, 5th Edition, page 35:
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book very helpful.
I want to print the value of b[0xFFFC], like below:
short var = 0xFFFC;
printf("%d\n", b[var]);
But it actually prints the value of b[0xFFFFFFFC].
Why does this happen?
My computer runs Windows XP on a 32-bit architecture.
short is a signed type. It's 16 bits on your implementation. 0xFFFC represents the integer constant 65,532; when converted to a 16-bit signed value, this results in -4.
So, your line short var = 0xFFFC; sets var to -4 (on your implementation).
0xFFFFFFFC is a 32 bit representation of -4. All that's happening is that your value is being converted from one type to a larger type, in order to use it as an array index. It retains its value, which is -4.
If you actually want to access the 65,533rd element of your array, then you should either:
use a larger type for var. int will suffice on 32 bit Windows, but in general size_t is an unsigned type which is guaranteed big enough for non-negative array indexes.
use an unsigned short, which gives you just enough room for this example, but will go wrong if you want to go another 4 steps forward.
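In code, the two options look like this (a sketch):

#include <stddef.h>

size_t         idx1 = 0xFFFC;  /* preferred: always wide enough for an index */
unsigned short idx2 = 0xFFFC;  /* enough here, but caps out at 0xFFFF */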
On current compilers you can't really do 16-bit arithmetic with short: write short, and the computation is still carried out in 32 bits.
For example, I compiled this code with GCC 4 on 32-bit Ubuntu Linux:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    short var = 0xFFFC;
    printf("%x\n", var);
    printf("%d\n", var);
    return (EXIT_SUCCESS);
}
and the output is:
fffffffc
-4
You can see the short is converted to 32 bits with sign extension, as expected for two's complement.
As a refresher on the available C data types, have a look here.
There is a rule in C concerning integral promotion: in expressions, smaller data types are promoted to an integral type. For instance:
char ch = '2';
int j = ch + 1;
Now look at the RHS (Right Hand Side) of the expression and notice that ch will automatically get promoted to an int in order to produce the desired result on the LHS (Left Hand Side) of the expression. What would the value of j be? The ASCII code for '2' is 50 decimal or 0x32 hexadecimal; add 1 to it and the value of j is 51 decimal or 0x33 hexadecimal.
It is important to understand that rule, as it explains why one data type gets 'promoted' to another.
What is b? An array, I presume, with at least 65,533 elements, correct?
Anyway: the format specifier %d is for type int, and the array subscript is also of type int, so the short var got promoted. Since an int is 4 bytes, the promotion sign-extends into the upper bytes, and hence you are seeing 0xFFFFFFFC instead of 0xFFFC.
This is where casting comes in: telling the compiler to convert one data type to another, which ties in with Gregory Pakosz's answer above.
Hope this helps,
Best regards,
Tom.
Use %hx or %hd instead to indicate that you have a short variable, e.g.:
printf("short hex: %hx\n", var); /* tell printf that var is short and print out as hex */
EDIT: Oops, I got the question wrong. It was not about printf() as I thought, so this answer might be a little bit off-topic.
New: Because you are using var as an index to an array you should declare it as unsigned short (instead of short):
unsigned short var = 0xFFFC;
printf("%d\n", b[var]);
The 'short var' could be interpreted as a negative number.
To be more precise:
You are "underflowing" into the negative value range: Values in the range from 0x0000 upto 0x7FFF will be OK. But values from 0x8000 upto 0xFFFF will be negative.
Here are some examples of var used as an index to array b[]:
short var=0x0000; // leads to b[0] => OK
short var=0x0001; // leads to b[1] => OK
short var=0x7FFF; // leads to b[32767] => OK
short var=0x8000; // leads to b[-32768] => Wrong
short var=0xFFFC; // leads to b[-4] => Wrong
short var=32767; // leads to the same as b[0x7FFF] => OK
short var=32768; // compile warning or error => overflow into 32bit range
The issue is not how the 16-bit variable is stored: a short occupies 16 bits in memory just fine.
The extra FFFF comes from the fact that short is a signed type, and when it is promoted to int (for the array subscript), it gets sign-extended. When extending two's complement from 16 to 32 bits, the extension is done by replicating the sign bit into all the new upper bits. Of course, you did not intend that.
So, in this case, you're interested in absolute array positions, so you should declare your indexer as unsigned.
In the subject of your question you have already guessed what is happening here: yes, a value of type short is "automatically extended" to a value of type int. This process is called integral promotion. That's how it always works in the C language: every time you use an integral value smaller than int, that value is implicitly promoted to a value of type int (unsigned values can be promoted to unsigned int). The value itself does not change, of course; only the type changes. In your example the 16-bit short value represented by pattern 0xFFFC is the same as the 32-bit int value represented by pattern 0xFFFFFFFC, which is -4 in decimal. This, BTW, makes the rest of your question sound rather strange: promotion or not, your code is trying to access b[-4]. The promotion to int doesn't change anything.
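A final sketch showing the promotion in isolation (assuming 16-bit short and 32-bit int):

#include <stdio.h>

int main(void)
{
    short var = 0xFFFC;  /* stored as -4 (implementation-defined conversion) */
    int promoted = var;  /* integral promotion: sign-extended, value unchanged */

    printf("%d %d\n", var, promoted);       /* -4 -4 */
    printf("%x\n", (unsigned int)promoted); /* fffffffc */
    return 0;
}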