Is it safe to compare a uint32_t with a hard-coded value? - c

I need to do bitwise operations on 32-bit integers (which in fact represent chars, but whatever).
Is the following kind of code safe?
uint32_t input;
input = ...;
if (input & 0x03000000) {
    output = 0x40000000;
    output |= (input & 0xFC000000) >> 2;
}
I mean, in the "if" statement, I am doing a bitwise operation with, on the left side, a uint32_t, and on the right side... I don't know!
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded "0x03000000"?
Is it possible that some systems consider 0x03000000 to be an int and hence encode it in only 2 bytes, which would be catastrophic?

Is the following kind of code safe?
Yes, it is.
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded "0x03000000"?
0x03000000 is an int on a system with 32-bit int and a long on a system with 16-bit int.
(As uint32_t is present here, I assume two's complement and CHAR_BIT of 8. Also, I don't know of any system with 16-bit int and 64-bit long.)
Is it possible that some systems consider 0x03000000 to be an int and hence encode it in only 2 bytes, which would be catastrophic?
See above: on a 16-bit-int system, 0x03000000 is a long and is 32 bits wide. A hexadecimal constant in C takes the first type in the following list in which its value can be represented:
int, unsigned int, long, unsigned long, long long, unsigned long long
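To see this concretely, here is a minimal sketch (my own example, using C11's _Generic purely to display the constant's type): on a platform with 32-bit int it reports int, and the usual arithmetic conversions then convert that operand to uint32_t before the &, so no bits are lost.
#include <stdio.h>
#include <inttypes.h>

#define TYPE_NAME(x) _Generic((x), \
    int: "int", unsigned int: "unsigned int", \
    long: "long", unsigned long: "unsigned long", \
    default: "other")

int main(void) {
    uint32_t input = 0x07000000u;
    /* The constant is an int here (or a long on a 16-bit-int system); it is
       converted to uint32_t by the usual arithmetic conversions before the &. */
    printf("0x03000000 has type %s here\n", TYPE_NAME(0x03000000));
    printf("masked = 0x%08" PRIX32 "\n", (uint32_t)(input & 0x03000000));
    return 0;
}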


shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink* next;
    char type;
    void* data;
} _slink;
Assume what this describes is a link in a file, where data is 4 bytes long, representing either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, and so what I want to do is change the order of the bytes before writing back to the file. I.e. for 0x01020304, I want to convert it to 0x04030201, so that when I write it back, its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits (L is of type _slink*):
int data = (((unsigned char*)&L->data)[0] << 24) + (((unsigned char*)&L->data)[1] << 16) +
           (((unsigned char*)&L->data)[2] << 8)  + (((unsigned char*)&L->data)[3] << 0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
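A minimal sketch of that promotion (my own example, assuming a 32-bit int): the unsigned char operand is promoted to int before the shift, so the result has the width of int, not char.
#include <stdio.h>

int main(void) {
    unsigned char byte = 0x12;
    int shifted = byte << 16;  /* byte is promoted to int before the shift */
    /* sizeof(byte << 16) is sizeof(int), not sizeof(unsigned char) */
    printf("%d (operand width after promotion: %zu bytes)\n", shifted, sizeof(byte << 16));
    return 0;
}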
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a value promoted to int.[1] See @dbush's answer.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit's place is undefined behavior (UB). See also @Eric Postpischil.
(((unsigned char*)&L->data)[0] << 24) // UB
16-bit int
Shifting by the bit width or more is a problem: even if the type were unsigned, there is insufficient precision, and as int it is UB as above. Perhaps the OP would then have wanted only a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8  | (uint32_t)p[3] << 0;
For the pedantic
Had int used a non-two's-complement representation, the addition of a negative value from (((unsigned char*)&L->data)[0] << 24) would have messed up the bit pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those two endiannesses. It converts big-endian to the native endian. When this code is run on a little-endian machine with 32-bit unsigned, it is effectively a big/little swap. On a big-endian machine with 32-bit unsigned, it could have been a no-op.
[1] ... or possibly unsigned int on select platforms where UCHAR_MAX > INT_MAX.
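As a related sketch (my own addition, not part of the original answer), the write side is the mirror image: converting a native value to big-endian bytes can be done with shifts in the same endianness-independent way. The helper name store_be32 is hypothetical.
#include <stdint.h>

/* Store a 32-bit value as big-endian bytes, regardless of host endianness. */
static void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v >> 24);  /* most significant byte first */
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)(v >> 0);
}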

Clarification needed on (u/i)int_fastN_t

I have read many explanations of the fastest minimum-width integer types, but I couldn't understand when to use these data types.
My understanding:
On a 32-bit machine,
uint_least16_t could be a typedef for an unsigned short.
1. uint_least16_t small = 38;
2. unsigned short is 16 bits, so the value 38 will be stored using 16 bits, and this will take up 16 bits of memory.
3. The range for this data type will be 0 to (2^N)-1, here N=16.
uint_fast16_t could be a typedef for an unsigned int.
1. uint_fast16_t fast = 38;
2. unsigned int is 32 bits, so the value 38 will be stored using 32 bits, and this will take up 32 bits of memory.
3. What will be the range for this data type?
uint_fast16_t => uint_fastN_t, here N = 16,
but the value can be stored in 32 bits, so is it 0 to (2^16)-1 or 0 to (2^32)-1?
How can we make sure that it's not overflowing?
Since it's 32 bits, can we assign a value greater than 65535 to it?
If it is a signed integer, how is signedness maintained?
For example int_fast16_t = 32768;
since the value falls within the signed int range, it'll be a positive value.
A uint_fast16_t is just the fastest unsigned data type that has at least 16 bits. On some machines it will be 16 bits and on others it could be more. If you use it, you should be careful because arithmetic operations that give results above 0xFFFF could have different results on different machines.
On some machines, yes, you will be able to store numbers larger than 0xFFFF in it, but you should not rely on that being true in your design because on other machines it won't be possible.
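As a quick check (my own sketch, not part of the original answer), you can inspect the actual width and maximum of uint_fast16_t on a given platform with sizeof and the standard <stdint.h> limit macro:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("sizeof(uint_fast16_t) = %zu bytes\n", sizeof(uint_fast16_t));
    printf("UINT_FAST16_MAX       = %ju\n", (uintmax_t)UINT_FAST16_MAX);
    return 0;
}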
Generally the uint_fast16_t type will either be an alias for uint16_t, uint32_t, or uint64_t, and you should make sure the behavior of your code doesn't depend on which type is used.
I would say you should only use uint_fast16_t if you need to write code that is both fast and cross-platform. Most people should stick to uint16_t, uint32_t, and uint64_t so that there are fewer potential issues to worry about when porting code to another platform.
An example
Here is an example of how you might get into trouble:
#include <stdbool.h>
#include <stdint.h>

bool bad_foo(uint_fast16_t a, uint_fast16_t b)
{
    uint_fast16_t sum = a + b;
    return sum > 0x8000;
}
If you call the function above with a as 0x8000 and b as 0x8000, then on some machines the sum will be 0 and on others it will be 0x10000, so the function could return true or false. Now, if you can prove that a and b will never sum to a number larger than 0xFFFF, or if you can prove that the result of bad_foo is ignored in those cases, then this code would be OK.
A safer implementation of the same code, which (I think) should behave the same way on all machines, would be:
bool good_foo(uint_fast16_t a, uint_fast16_t b)
{
    uint_fast16_t sum = a + b;
    return (sum & 0xFFFF) > 0x8000;
}
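A small usage sketch (my own addition, assuming the two functions above are in scope) exercising both helpers with the boundary values discussed above; good_foo always returns false here, while bad_foo's result depends on the width of uint_fast16_t:
#include <stdio.h>

int main(void) {
    /* 0x8000 + 0x8000 wraps to 0 on a 16-bit uint_fast16_t, but is 0x10000
       on a wider one, so bad_foo's result is platform dependent. */
    printf("bad_foo:  %d\n", (int)bad_foo(0x8000, 0x8000));
    printf("good_foo: %d\n", (int)good_foo(0x8000, 0x8000)); /* always 0 */
    return 0;
}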

Visual Studio casting issue

I'm trying to do what I imagined would be a fairly basic task. I have two unsigned char variables and I'm trying to combine them into a single signed int. The problem here is that the unsigned chars start as signed chars, so I have to cast them to unsigned first.
I've done this task in three IDEs: MPLAB (as this is an embedded application), MATLAB, and now Visual Studio. Visual Studio is the only one having problems with the casting.
For example, the two signed chars are -5 and 94. In MPLAB I first cast the two chars to unsigned chars:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
This gives me 251 and 94 respectively. I then want to do some bit shifting and concatenate:
int c = (int)((((unsigned int) a) << 8) | (unsigned int) b);
In MPLAB and MATLAB this gives me the right signed value of -1186. However, the exact same code in Visual Studio refuses to output the result as a signed value, only unsigned (64350). This has been checked both by debugging and by stepping through the code and printing the result:
printf("%d\n", c);
What am I doing wrong? This is driving me insane. The application is an electronic device that collects sensor data, then stores it on an SD card for later decoding using a program written in C. I technically could do all the calculations in MPLAB and then store those on the SD card, but I refuse to let Microsoft win.
I understand my method of casting is very unoptimised and you could probably do it in one line, but having had this problem for a couple of days now, I've tried to break the steps down as much as possible.
Any help is most appreciated!
The problem is that an int on most systems is 32-bits. If you concatenate two 8-bit quantities and store it into a 32-bit quantity, you will get a positive integer because you are not setting the sign bit, which is the most significant bit. More specifically, you are only populating the lower 16 bits of a 32-bit integer, which will naturally be interpreted as a positive number.
You can fix this by explicitly using a 16-bit signed int:
#include <stdio.h>
#include <stdint.h>

int main() {
    unsigned char a = (unsigned char)-5;
    unsigned char b = (unsigned char)94;
    int16_t c = (int16_t)((((unsigned int) a) << 8) | (unsigned int) b);
    printf("%d\n", c);
}
Note that I am on a Linux system, so you will probably have to change stdint.h to the Microsoft equivalent, and possibly change int16_t to whatever Microsoft calls their 16-bit signed integer type, if it is different, but this should work with those modifications.
This is the correct behavior of the standard C language. When you convert an unsigned to a signed type, the language does not perform sign extension, i.e. it does not propagate the highest bit of the unsigned into the sign bit of the signed type.
You can fix your problem by casting a to a signed char, like this:
unsigned char a = (unsigned char)-5;
unsigned char b = (unsigned char)94;
int c = (signed char)a << 8 | b;
printf("%d\n", c); // Prints -1186
Now that a is treated as signed, the language propagates its top bit into the sign bit of the 32-bit int, making the result negative.
Demo on ideone.
Converting an out-of-range unsigned value to a signed value causes implementation-defined behaviour, which means that the compiler must document what it does in this situation; and different compilers can do different things.
In C99 there is also a provision that the compiler may raise a signal in this case (terminating the program if you don't have a signal handler). I believe it was undefined behaviour in C89, but C99 tightened this up a bit.
Is there some reason you can't go:
signed char x = -5;
signed char y = 94;
int c = x * 256 + y;
?
BTW if you are OK with implementation-defined behaviour, and your system has a 16-bit type then you can just go, with #include <stdint.h>,
int c = (int16_t)(x * 256 + y);
To explain, C deals in values. In math, 251 * 256 + 94 is a positive number, and C is no exception to that. The bit-shift operators are just multiplication and division by powers of 2 in disguise. If you want your value to be reduced (mod 65536), you have to specifically request that.
If you also think in terms of values rather than representations, you don't have to worry about things like sign bits and sign extension.
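For completeness, a fully portable sketch (my own addition, not one of the answers above): reduce the combined value mod 65536 explicitly, then map it into the signed 16-bit range, which avoids both implementation-defined narrowing and left shifts of negative values. The helper name combine_bytes is hypothetical.
#include <stdio.h>

/* Combine a high and a low byte into the signed 16-bit value they represent
   in two's complement, using only well-defined unsigned arithmetic. */
static int combine_bytes(unsigned char hi, unsigned char lo) {
    unsigned int u = (((unsigned int)hi << 8) | lo) & 0xFFFFu; /* 0 .. 65535 */
    if (u < 0x8000u)
        return (int)u;                 /* non-negative values pass through */
    return -(int)(0xFFFFu - u) - 1;    /* map 0x8000..0xFFFF to -32768..-1 */
}

int main(void) {
    unsigned char a = (unsigned char)-5; /* 251 */
    unsigned char b = 94;
    printf("%d\n", combine_bytes(a, b)); /* prints -1186 */
    return 0;
}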

Long type 64bit linux

Very simple questions guys, but maybe I'm just forgetting something.
In 64-bit Linux, a long is 8 bytes, correct?
If that's the case, and I want to set the 64th bit, I can do the following:
unsigned long num = 1<<63;
Whenever I compile this, however, it gives me an error saying that I'm left shifting by more than the width.
Also, if I wanted to take the first 32 bits of a long type (without sign extension), can I do:
num = num&0xFFFFFFFF;
or what about:
num = (int)(num);
Thank you.
In 64-bit Linux, a long is 8 bytes, correct?
Not necessarily. It depends more on the compiler than on the underlying OS. Check this question for a nice discussion:
What decides the sizeof an integer?
Whenever I compile this, however, it gives me an error saying that I'm left shifting by more than the width.
Everyone has already answered this: use 1UL.
Also, if I wanted to take the first 32 bits of a long type (without sign extension), can I do:
num = num&0xFFFFFFFF;
or what about:
num = (int)(num);
num = num & 0xFFFFFFFF will give you the lower 32 bits. But note that if long is just 4 bytes on your system, then you are getting the entire number. Coming to the sign-extension part: if you've used a long and not an unsigned long, then you cannot do away with the sign-extended bits. For example, -1 is represented as all ones, right from the 0th bit; how will you avoid those ones by masking?
num = (int)(num) will give you the lower 32 bits, but the compiler might issue an overflow warning if num does not fit into an int.
Actually, if you want some precise length (in bits) for your integers, assuming a C99-conforming compiler, #include <stdint.h> and use types like int64_t, int32_t, etc. A handy type is intptr_t, an integer type with the same number of bits as a void* pointer (so you can call it a machine "word").
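A minimal sketch of that approach (my own example): fixed-width types sidestep the "how wide is long" question entirely.
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint64_t num = UINT64_C(1) << 63;  /* bit 63 set, no guessing about widths */
    uint32_t low = (uint32_t)num;      /* low 32 bits, no sign extension */
    printf("num = 0x%016" PRIX64 ", low = 0x%08" PRIX32 "\n", num, low);
    return 0;
}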
For portability you could use <limits.h>:
#include <limits.h>
#define LNG_BIT (sizeof(long) * CHAR_BIT)
unsigned long num = 1UL << (LNG_BIT - 1);
To get "low int", something like?:
#define INT_BIT (sizeof(int) * CHAR_BIT)
if (LNG_BIT > INT_BIT)
return num & (~0UL >> INT_BIT);
else
return num;
or
num &= ~(~0U << INT_BIT);
Or use a mask, etc. It depends in large part on why, and for what, you want the int bits.
Also notice the options given by compilers; e.g. if you are using gcc:
i386 and x86-64 Options
Submodel-Options
-m32
-m64
-mx32
Generate code for a 32-bit or 64-bit environment.
* The -m32 option sets int, long, and pointer types to 32 bits, and generates code that runs on any i386 system.
* The -m64 option sets int to 32 bits and long and pointer types to 64 bits, and generates code for the x86-64 architecture. For Darwin only the -m64 option also turns off the -fno-pic and -mdynamic-no-pic options.
* The -mx32 option sets int, long, and pointer types to 32 bits, and generates code for the x86-64 architecture.
There is also -maddress-mode=long etc.
-maddress-mode=long
Generate code for long address mode. This is only supported for 64-bit and x32 environments. It is the default address mode for 64-bit environments.
AFAIR code like this was the source of a major bug in reiserfs a few years ago:
unsigned long num = 1<<63;
If you speak of x86_64, yes, a long is 64 bits, and the same holds on most other 64-bit Linux platforms too. The problem is that the 1 in your code is a plain int, and so the result of the shift is undefined. Better use
unsigned long num = 1UL<<63;
or
unsigned long num = (unsigned long)1<<63;
I believe the problem with the first question is that the compiler is treating '1' as an integer, not as a long integer. It doesn't figure it out until after the assignment.
You can fix it by doing:
unsigned long num = (unsigned long)1<<63;

Can the type difference between constants 32768 and 0x8000 make a difference?

The Standard specifies that hexadecimal constants like 0x8000 (too large to fit in a signed int) are unsigned (just like octal constants), whereas decimal constants like 32768 are signed long. (The exact types assume a 16-bit int and a 32-bit long.) However, in regular C environments both will have the same representation, in binary 1000 0000 0000 0000.
Is a situation possible where this difference really produces a different outcome? In other words, is a situation possible where this difference matters at all?
Yes, it can matter. If your processor has a 16-bit int and a 32-bit long type, 32768 has the type long (since 32767 is the largest positive value fitting in a signed 16-bit int), whereas 0x8000 (since it is also considered for unsigned int) still fits in a 16-bit unsigned int.
Now consider the following program:
int main(int argc, char *argv[])
{
    volatile long long_dec = ((long)~32768);
    volatile long long_hex = ((long)~0x8000);
    return 0;
}
When 32768 is considered long, the negation will invert 32 bits, resulting in a representation 0xFFFF7FFF with type long; the cast is superfluous.
When 0x8000 is considered unsigned int, the negation will invert 16 bits, resulting in a representation 0x7FFF with type unsigned int; the cast will then zero-extend to a long value of 0x00007FFF.
Look at H&S5 (Harbison & Steele, C: A Reference Manual, 5th edition), section 2.7.1, page 24ff.
It is best to augment the constants with U, UL or L as appropriate.
On a 32 bit platform with 64 bit long, a and b in the following code will have different values:
int x = 2;
long a = x * 0x80000000; /* multiplication done in unsigned -> 0 */
long b = x * 2147483648; /* multiplication done in long -> 0x100000000 */
Another example not yet given: compare (with the greater-than or less-than operators) -1 to both 32768 and to 0x8000. Or, for that matter, try comparing each of them for equality with an int variable equal to -32768.
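A sketch of that comparison (my own example, written with a 16-bit-int, 32-bit-long target in mind): against 0x8000 (unsigned int there), -1 is converted to unsigned and compares greater; against 32768 (long there), -1 stays negative and compares less.
#include <stdio.h>

int main(void) {
    int neg = -1;
    printf("-1 > 32768 : %d\n", neg > 32768);  /* 0: both operands are signed, -1 stays negative */
    printf("-1 > 0x8000: %d\n", neg > 0x8000); /* 1 on a 16-bit-int system: -1 converts to 65535 */
    return 0;
}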
Assuming int is 16 bits and long is 32 bits (which is actually fairly unusual these days; int is more commonly 32 bits):
printf("%ld\n", 32768); // prints "32768"
printf("%ld\n", 0x8000); // has undefined behavior
In most contexts, a numeric expression will be implicitly converted to an appropriate type determined by the context. (That's not always the type you want, though.) This doesn't apply to non-fixed arguments to variadic functions, such as any argument to one of the *printf() functions following the format string.
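A short sketch of that distinction (my own example): an assignment converts the constant to the target type, but a variadic argument is passed with the constant's own type, so the cast has to be explicit.
#include <stdio.h>

int main(void) {
    long n = 0x8000;               /* fine: the assignment converts to long */
    printf("%ld\n", n);            /* fine: n really is a long */
    printf("%ld\n", (long)0x8000); /* fine: explicit cast matches %ld */
    return 0;
}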
The difference would show up if you were to try to add a value to the 16-bit int: it could exceed the bounds of the variable, whereas if you were using a 32-bit long you could add any number less than 2^16 to it without overflowing.
