What happens if arguments passed to sscanf are cast - c

While reviewing an old piece of code, I stumbled upon some coding horror like this one:
struct Foo
{
unsigned int bar;
unsigned char qux;
unsigned char xyz;
unsigned int etc;
};
void horror(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo)
{
sscanf(s1, "%u", &(foo->bar));
sscanf(s2, "%u", (unsigned int*) &(foo->qux));
sscanf(s3, "%u", (unsigned int*) &(foo->xyz));
sscanf(s4, "%u", &(foo->etc));
}
So, what is actually happening in the second and third sscanf calls, where the argument passed is an unsigned char* cast to unsigned int*, but the format specifier is for an unsigned integer? Whatever happens is due to undefined behavior, but why is this even "working"?
As far as I know, the cast effectively does nothing in this case (the actual type of the arguments passed as ... is unknown to the called function). However, this has been in production for years, it has never crashed, and the surrounding values are apparently not overwritten, I suppose because the members of the structure are all aligned to 32 bits. It even reads the correct value on the target machine (a little-endian 32-bit ARM), but I think it would no longer work with a different endianness.
Bonus question: what is the cleanest correct way to do this? I know that now we have the %hhu format specifier (apparently introduced by C++11), but what about a legacy C89 compiler?
Please note that the original question had uint32_t instead of unsigned int and uint8_t instead of unsigned char, but that was just misleading and off topic; by the way, the original code I was reviewing uses its own typedefs.

Bonus question: what is the cleanest correct way to do this? I know that now we have the %hhu format specifier (apparently introduced by C++11), but what about a legacy C89 compiler?
The <stdint.h> header and its types were introduced in C99, so a C89 compiler won't support them except as an extension.
The correct way to use the *scanf() and *printf() families of functions with the various fixed or minimum-width types is to use the macros from <inttypes.h>. For example:
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
int main(void) {
int8_t foo;
uint_least16_t bar;
puts("Enter two numbers");
if (scanf("%" SCNd8 " %" SCNuLEAST16, &foo, &bar) != 2) {
fputs("Input failed!\n", stderr);
return EXIT_FAILURE;
}
printf("You entered %" PRId8 " and %" PRIuLEAST16 "\n", foo, bar);
}

From the pointer's point of view, nothing happens in this case, because on all modern machines pointers have the same representation for all object types.
But because you use the wrong format, sscanf will write outside the memory allocated to the variables, and that is undefined behaviour.

First of all, this of course invokes Undefined Behaviour.
But that kind of horror was quite common in old code, where the C language was used as a higher level assembly language. So here are 2 possible behaviours:
the structure has 32-bit alignment. All is (rather) fine on a little-endian machine: the uint8_t members will receive the least significant byte of the 32-bit value and the padding bytes will be zeroed (I assume that the program does not try to store a value greater than 255 into a uint8_t)
the structure does not have 32-bit alignment, but the architecture allows scanf to write into misaligned variables. The least significant byte of the value read for qux will correctly go into qux and the next 3 zero bytes will erase xyz and etc. On the next line, xyz receives its value and etc receives one more 0 byte. And finally etc will receive its value. This could have been a rather common hack in the early '80s on an 8086-type machine.
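Which of these cases applies depends on the layout the compiler actually chose, and that can be inspected with offsetof. The following is only a sketch: the helper names store_touches and xyz_store_touches_etc are made up for illustration, and it assumes that sscanf with "%u" stores exactly sizeof(unsigned int) bytes starting at the address it is given.

```c
#include <stddef.h>

/* Same members as the struct in the question. */
struct Foo {
    unsigned int bar;
    unsigned char qux;
    unsigned char xyz;
    unsigned int etc;
};

/* Hypothetical helper: returns 1 if a store of sizeof(unsigned int) bytes
   starting at store_offset touches the member that occupies
   [member_offset, member_offset + member_size). */
int store_touches(size_t store_offset, size_t member_offset, size_t member_size)
{
    size_t store_end = store_offset + sizeof(unsigned int);
    size_t member_end = member_offset + member_size;
    return store_offset < member_end && member_offset < store_end;
}

/* Does the "%u" store through &foo->xyz reach into foo->etc? */
int xyz_store_touches_etc(void)
{
    return store_touches(offsetof(struct Foo, xyz),
                         offsetof(struct Foo, etc),
                         sizeof(unsigned int));
}
```

On a common ABI with a 4-byte unsigned int, where qux and xyz sit at offsets 4 and 5 and etc at offset 8, the store through &foo->xyz would cover offsets 5 to 8 and so reach the first byte of etc. That would go unnoticed only because etc is written last.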
For a portable way, I would use a temporary unsigned integer:
unsigned int u; /* matches the "%u" specifier exactly */
sscanf(s1, "%u", &(foo->bar));
sscanf(s2, "%u", &u);
foo->qux = (uint8_t) u;
sscanf(s3, "%u", &u);
foo->xyz = (uint8_t) u;
sscanf(s4, "%u", &(foo->etc));
and trust the compiler to generate code as efficient as the horror way.
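The temporary-variable approach can also be wrapped in a small helper that checks the sscanf result and the range, which works on a C89 compiler as well. This is a minimal sketch; parse_uchar is a hypothetical name, not something from the original code.

```c
#include <stdio.h>

/* Hypothetical helper: parses a decimal value in 0..255 from s into *out.
   Returns 1 on success, 0 if nothing was parsed or the value is too large.
   *out is left untouched on failure. */
int parse_uchar(const char *s, unsigned char *out)
{
    unsigned int tmp;

    if (sscanf(s, "%u", &tmp) != 1)   /* no number at the start of s */
        return 0;
    if (tmp > 255u)                   /* would not fit in the destination */
        return 0;
    *out = (unsigned char) tmp;
    return 1;
}
```

It could be used as parse_uchar(s2, &foo->qux). Note that sscanf itself has undefined behaviour when the input number does not fit in unsigned int, so for untrusted input strtoul is the more robust choice.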

OP's code is UB because the scan specifiers do not match the arguments.
cleanest correct way to do this?
Cleaner
#include <inttypes.h>
void horror1(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo) {
sscanf(s1, "%" SCNu32, &(foo->bar));
sscanf(s2, "%" SCNu8, &(foo->qux));
sscanf(s3, "%" SCNu8, &(foo->xyz));
sscanf(s4, "%" SCNu32, &(foo->etc));
}
Cleanest
Add additional error handling if desired.
void horror2(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo) {
foo->bar = (uint32_t) strtoul(s1, 0, 10);
foo->qux = (uint8_t) strtoul(s2, 0, 10);
foo->xyz = (uint8_t) strtoul(s3, 0, 10);
foo->etc = (uint32_t) strtoul(s4, 0, 10);
}
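As for the "additional error handling", one possible shape is sketched below. The helper name parse_ulong_max and its return convention are assumptions made for illustration; only strtoul, errno and ERANGE come from the standard library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal number no larger than max from s.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_ulong_max(const char *s, unsigned long max, unsigned long *out)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(s, &end, 10);
    if (end == s)            /* no digits were consumed */
        return 0;
    if (*end != '\0')        /* trailing characters after the number */
        return 0;
    if (errno == ERANGE)     /* value does not fit in unsigned long */
        return 0;
    if (value > max)         /* value does not fit in the destination */
        return 0;
    *out = value;
    return 1;
}
```

A caller would pass 255UL as max for the unsigned char members and cast the result after the check. Keep in mind that strtoul accepts a leading minus sign and returns the negated value as an unsigned long, so an input such as "-1" is rejected here only because the wrapped result exceeds a small max.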

Related

Returning 16-bits given a pointer in C

A chunk is represented by a 64-bit long integer, which is broken into 4 16-bit sections.
I need to return a 16-bit section using the function below.
unsigned short get_16bitsection(unsigned long *start, int index) {
// Fill this in
}
It is tempting to use casts to achieve this, but it is a common misconception that "everything is just bytes" and thus that you can do that safely. A rule called strict aliasing actually prohibits doing so. Your code may appear to work, particularly on older and less sophisticated compilers, but in the age of heavy optimisations you are really playing with fire by violating the language rules like that.
Instead, you should copy the bytes you need into a uint16_t, then return it:
uint16_t get_16bitsection(uint64_t *start, int index) {
uint16_t result;
memcpy(&result, (char*)start + index*sizeof(uint16_t), sizeof(uint16_t));
return result;
}
Here I cast to char* so that we can navigate byte-wise through your chunk (this aliasing is a specifically permitted exception to the usual strict-aliasing rule), then apply an offset of index*sizeof(uint16_t) to reach the desired index (assuming little endian, which you have specified). Finally, we copy the bytes into result, and return it.
If you're concerned about performance, don't be. You were already copying a uint16_t from local scope into the calling scope; just now it has a name. And if this function is any slower than the aliasing-violating version, then that's evidence that you've confused the optimiser into going too far.
Just use a union.
long int x=0x123456789abcdef0;
union {
long int x;
unsigned short arr[4];
} c;
c.x = x;
printf("%04x %04x %04x %04x\n", c.arr[0], c.arr[1], c.arr[2], c.arr[3]);
Result:
def0 9abc 5678 1234
Returning 16-bits given a pointer
A chunk is represented by a 64-bit long integer, which is broken into 4 16-bit sections
To access the data in an endian-independent, portable way and retrieve the 16-bit sections from 0 (least significant) to 3 (most significant), use >>.
As unsigned long may only be 32-bit, recommend unsigned long long or uint_least64_t.
Consider making pointer const to allow this function use on const data.
unsigned short get_16bitsection(const unsigned long long *start, int index) {
#define MASK_16BIT 0xFFFFu
return MASK_16BIT & (*start >> (16*index));
}
The mask is useful on rare machines where unsigned short is not 16 bits. In any case, I prefer a mask over casts: it is a gentler way to reduce range.
Alternatively use a cast: (unsigned short) or (uint16_t) though this is slightly less portable as uint16_t may not exist and unsigned short may be > 16-bit.
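To make the numbering of the sections concrete, here is a sketch of the shift-and-mask approach written with the C99 least-width types. The function name section16 is an assumption for illustration; the sample value is the one used in the union answer above.

```c
#include <stdint.h>

/* Returns 16-bit section `index` (0 = least significant, 3 = most
   significant) of the 64-bit chunk that `start` points to.
   The result does not depend on the byte order of the machine. */
uint_least16_t section16(const uint_least64_t *start, int index)
{
    return (uint_least16_t) ((*start >> (16 * index)) & 0xFFFFu);
}
```

For a chunk holding 0x123456789abcdef0, index 0 gives 0xdef0, index 1 gives 0x9abc, index 2 gives 0x5678 and index 3 gives 0x1234, on both little-endian and big-endian machines.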
Maybe I'm missing the point here but it could be as easy as this:
unsigned short get_16bitsection_be(unsigned long *start, int index) {
unsigned short *p = (unsigned short*) start;
return p[3 - index];
}
unsigned short get_16bitsection_le(unsigned long *start, int index) {
unsigned short *p = (unsigned short*) start;
return p[index];
}
Where the difference between big and little endian is relevant here.
Note you should consider using stdint.h to give these types more meaningful names and make it clear what you're actually doing:
uint16_t get_16bitsection_le(uint64_t *start, int index) {
uint16_t *p = (uint16_t*) start;
return p[index];
}
uint16_t get_16bitsection_be(uint64_t *start, int index) {
uint16_t *p = (uint16_t*) start;
return p[3 - index];
}
You were on the right track with your second approach, but that code is heavily cluttered by a lot of things that don't matter, plus the * 8 offset which makes no sense.

Hide single item on a struct

I need to hide a single field, not several, inside a structure:
struct MyType1
{
unsigned char Value;
}; //
struct MyType2
{
void* Value;
} ; //
struct MyType3
{
signed int Value;
} ; //
What I want is for the struct type to have the same size, if possible, as the primitive type variables, but for the compiler to treat it like a new type.
In some part of the code I want to cast the structures back
to the simple value.
I also want to create arrays of this struct type
that take up little space.
struct MyType1 MyArray[255];
I already checked previous answers, but didn't find it.
Example:
typedef
unsigned int /* rename as */ mydatetime;
// Define new type as
struct mydatetimetype
{
unsigned int /* field */ value;
} ;
Let's suppose I have these functions in the same program, but in different include files:
void SomeFunc ( unsigned int /* param */ anyparam );
void SomeFunc ( mydatetime /* param */ anyparam );
void SomeFunc ( mydatetimetype /* param */ anyparam );
My programming editor or IDE confuses the first two functions.
In some part of the code, later, I will use the packed type with integer operations, but it should be hidden from other programmers who use this type.
Note that I also want to apply this feature to other types, like pointers or characters.
And "forwarding" or using an "opaque" structure is not necessary.
How does a single-field structure get padded or packed?
Should I add an attribute to pack or pad this structure for better performance?
Is there already a name for this trick?
I hope that the code below may help you.
The code shows how you may use a union so that several types share the same memory space.
The result of this code might be implementation dependent; still, it demonstrates that all the types specified in the integers union share the same memory space.
A variable declared as integers (k in the code) is always as long as the longest type in the declaration. So, in the code, the variable k may contain integer types from 8 bits to 64 bits, always using 64 bits.
Although I only used integer types, the types you may use inside union declarations may be whatever you want, including struct types and/or pointers.
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
typedef union integers {
int8_t i8;
int16_t i16;
int32_t i32;
int64_t i64;
} integers;
typedef struct sInt {
integers a;
integers b;
} sInt;
int main(void) {
integers k;
sInt s;
k.i64=0x1011121314151617;
printf("Int 08: %" PRIx8 "h\n", k.i8 );
printf("Int 16: %" PRIx16 "h\n", k.i16 );
printf("Int 32: %" PRIx32 "h\n", k.i32 );
printf("Int 64: %" PRIx64 "h\n", k.i64 );
s.a.i64=0x1011121314151617;
s.b.i64=0x0102030405060708;
printf("Int a.08: %" PRIx8 "h\n", s.a.i8 );
printf("Int a.16: %" PRIx16 "h\n", s.a.i16 );
printf("Int a.32: %" PRIx32 "h\n", s.a.i32 );
printf("Int a.64: %" PRIx64 "h\n", s.a.i64 );
printf("Int b.08: %" PRIx8 "h\n", s.b.i8 );
printf("Int b.16: %" PRIx16 "h\n", s.b.i16 );
printf("Int b.32: %" PRIx32 "h\n", s.b.i32 );
printf("Int b.64: %" PRIx64 "h\n", s.b.i64 );
return 0;
}
Note: If your problem is the padding in the structure, this code is not entirely the answer you're searching for. To manage padding you have to use #pragma pack() (gcc and other compilers support this pragma).
Structs can be padded to align address boundaries. So your first and third struct will more likely not have the same size as the primitive types.
Single-field structs are more likely to be padded "behind" the field, but the C standard does not state how the compiler should carry this out.
You should add the attribute if you want to cast your structure to a primitive type (to be sure you are casting the value it stores and not garbage in the padding). I think it is possible (though I do not recommend it) to cast the structure to a variable and get the correct result even without attributes, although this is very implementation dependent. You will, however, pay a small performance penalty every time the processor loads a non-aligned structure from memory.
Also, you should be careful, because packing structs may be dangerous.
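Whether a particular single-field struct really has the same size as its field can be checked at compile time instead of guessed. The sketch below reuses mydatetimetype from the question; the accessor names mydatetime_wrap and mydatetime_unwrap and the typedef used for the check are hypothetical.

```c
/* Distinct type wrapping a single unsigned int, as in the question. */
struct mydatetimetype {
    unsigned int value;
};

/* Compile-time check that also works in C89: the array size becomes -1,
   and compilation fails, if the wrapper is larger than its only field. */
typedef char mydatetimetype_has_no_padding[
    sizeof(struct mydatetimetype) == sizeof(unsigned int) ? 1 : -1];

/* Hypothetical accessors, so callers never touch the field directly. */
struct mydatetimetype mydatetime_wrap(unsigned int raw)
{
    struct mydatetimetype t;
    t.value = raw;
    return t;
}

unsigned int mydatetime_unwrap(struct mydatetimetype t)
{
    return t.value;
}
```

Going through accessors rather than casting the struct avoids any dependence on padding, and a compiler can usually inline such trivial functions.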

printf char as hex in c

I expect output something like \9b\d9\c0... from the code below, but I'm getting \ffffff9b\ffffffd9\ffffffc0\ffffff9d\53\ffffffa9\fffffff4\49\ffffffb0\ffffffef\ffffffd9\ffffffaa\61\fffffff7\54\fffffffb. I added explicit casting to char, but it has no effect. What's going on here?
typedef struct PT {
// ... omitted
char GUID[16];
} PT;
PT *pt;
// ... omitted
int i;
for(i=0;i<16;i++) {
printf("\\%02x", (char) pt->GUID[i]);
}
Edit: only casting to (unsigned char) worked for me. Compiler spits warnings on me when using %02hhx (gcc -Wall). (unsigned int) had no effect.
The reason why this is happening is that chars on your system are signed. When you pass them to functions with a variable number of arguments, such as printf (outside of the fixed-argument portion of the signature), chars get converted to int, and they get sign-extended in the process.
To fix this, cast the value to unsigned char:
printf("\\%02hhx", (unsigned char) pt->GUID[i]);
Demo.
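The effect of the cast can be isolated in a small helper. This is a sketch only; hex_byte is a hypothetical name, and it assumes 8-bit chars so that two hex digits are always enough.

```c
#include <stdio.h>

/* Hypothetical helper: writes the two-digit lowercase hex form of c into
   out, which must have room for 3 chars including the terminator. */
void hex_byte(char c, char out[3])
{
    /* Converting to unsigned char first makes the value non-negative, so
       the later widening cannot sign-extend it into ffffff.. digits. */
    sprintf(out, "%02x", (unsigned int) (unsigned char) c);
}
```

With this helper a byte holding 0x9b is formatted as 9b whether plain char is signed or unsigned on the platform.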
Use:
printf("\\%02hhx", pt->GUID[i]);
Because printf() is a variadic function, char arguments are promoted to int. The hh modifier tells printf() to convert the corresponding value back to unsigned char before printing it.
Cast to unsigned char instead, to avoid a leading 1 bit being interpreted as a negative value.
I noticed that you were getting the F's when the number was larger than 99x.
I wrote this to test it out and discovered the hh prefix at http://www.cplusplus.com/reference/cstdio/printf/
#include <stdio.h>
char GUID[16];
int main() {
int i;
for(i=0;i<16;i++) {
GUID[i]=i*i;
}
for(i=0;i<16;i++) {
printf("\\%02.2hhx\n", GUID[i]);
}
return 0;
}
Use a small implementation of your own to solve this problem on all platforms:
char hex[] = "0123456789abcdef";
void printHex(unsigned char byte) {
printf("%c%c", hex[byte>>4], hex[byte&0xf]);
}

C Function to Convert float to byte array

I'm trying to make a function that will accept a float variable and convert it into a byte array. I found a snippet of code that works, but would like to reuse it in a function if possible.
I'm also working with the Arduino environment, but I understand that it accepts most of the C language.
Currently works:
float_variable = 1.11;
byte bytes_array[4];
*((float *)bytes_array) = float_variable;
What can I change here to make this function work?
float float_test = 1.11;
byte bytes[4];
// Calling the function
float2Bytes(&bytes,float_test);
// Function
void float2Bytes(byte* bytes_temp[4],float float_variable){
*(float*)bytes_temp = float_variable;
}
I'm not so familiar with pointers and such, but I read that (float) is using casting or something?
Any help would be greatly appreciated!
Cheers
*EDIT: SOLVED
Here's my final function that works in Arduino for anyone who finds this. There are more efficient solutions in the answers below, however I think this is okay to understand.
Function: converts input float variable to byte array
void float2Bytes(float val,byte* bytes_array){
// Create union of shared memory space
union {
float float_variable;
byte temp_array[4];
} u;
// Overwrite bytes of union with float variable
u.float_variable = val;
// Assign bytes to input array
memcpy(bytes_array, u.temp_array, 4);
}
Calling the function
float float_example = 1.11;
byte bytes[4];
float2Bytes(float_example,&bytes[0]);
Thanks for everyone's help, I've learnt so much about pointers and referencing in the past 20 minutes, Cheers Stack Overflow!
Easiest is to make a union:
#include <stdio.h>
int main(void) {
int ii;
union {
float a;
unsigned char bytes[4];
} thing;
thing.a = 1.234;
for (ii=0; ii<4; ii++)
printf ("byte %d is %02x\n", ii, thing.bytes[ii]);
return 0;
}
Output:
byte 0 is b6
byte 1 is f3
byte 2 is 9d
byte 3 is 3f
Note - there is no guarantee about the byte order… it depends on your machine architecture.
To get your function to work, do this:
void float2Bytes(byte bytes_temp[4],float float_variable){
union {
float a;
unsigned char bytes[4];
} thing;
thing.a = float_variable;
memcpy(bytes_temp, thing.bytes, 4);
}
Or to really hack it:
void float2Bytes(byte bytes_temp[4],float float_variable){
memcpy(bytes_temp, (unsigned char*) (&float_variable), 4);
}
Note - in either case I make sure to copy the data to the location given as the input parameter. This is crucial, as local variables will not exist after you return (although you could declare them static, but let's not teach you bad habits. What if the function gets called again…)
Here's a way to do what you want that won't break if you're on a system with a different endianness from the one you're on now:
byte* floatToByteArray(float f) {
byte* ret = malloc(4 * sizeof(byte));
unsigned int asInt = *((int*)&f);
int i;
for (i = 0; i < 4; i++) {
ret[i] = (asInt >> 8 * i) & 0xFF;
}
return ret;
}
You can see it in action here: http://ideone.com/umY1bB
The issue with the above answers is that they rely on the underlying representation of floats: C makes no guarantee that the most significant byte will be "first" in memory. The standard allows the underlying system to implement floats however it feels like -- so if you test your code on a system with a particular kind of endianness (byte order for numeric types in memory), it will stop working depending on the kind of processor you're running it on.
That's a really nasty, hard-to-fix bug and you should avoid it if at all possible.
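The function above reads the float through an int*, which itself runs into the aliasing rule, and it allocates the result with malloc. A variant of the same shifting idea that copies the bits with memcpy and writes into a caller-supplied buffer might look like the sketch below. The name float_to_bytes_le is hypothetical, and the code assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check: the array size is -1 if float is not 32 bits wide. */
typedef char float_is_32_bits[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Hypothetical helper: stores the bit pattern of f into out,
   least significant byte first, whatever the host byte order is. */
void float_to_bytes_le(float f, unsigned char out[4])
{
    uint32_t bits;
    int i;

    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the bits */
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char) ((bits >> (8 * i)) & 0xFFu);
}
```

Assuming the usual IEEE 754 single-precision format, 1.0f has the bit pattern 0x3f800000, so the output bytes would be 00 00 80 3f.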
I would recommend trying a "union".
Look at this post:
http://forum.arduino.cc/index.php?topic=158911.0
typedef union I2C_Packet_t{
sensorData_t sensor;
byte I2CPacket[sizeof(sensorData_t)];
};
In your case, something like:
union {
float float_variable;
char bytes_array[4];
} my_union;
my_union.float_variable = 1.11;
Yet another way, without unions:
(Assuming byte = unsigned char)
void floatToByte(byte* bytes, float f){
int length = sizeof(float);
for(int i = 0; i < length; i++){
bytes[i] = ((byte*)&f)[i];
}
}
this seems to work also
#include <stddef.h>
#include <stdint.h>
#include <string.h>
float fval = 1.11;
size_t siz;
siz = sizeof(float);
uint8_t ures[siz];
memcpy (&ures, &fval, siz);
then
float utof;
memcpy (&utof, &ures, siz);
also for double
double dval = 1.11;
siz = sizeof(double);
uint8_t ures[siz];
memcpy (&ures, &dval, siz);
then
double utod;
memcpy (&utod, &ures, siz);
The other answers show how to accomplish this using a union; you can use the same union to implement the function you want, like this:
typedef union {
float float_variable;
byte bytes_array[4];
} my_union;
byte* float2Bytes(float val)
{
my_union *u = malloc(sizeof(my_union)); /* the caller must free() the returned pointer */
u->float_variable = val;
return u->bytes_array;
}
or
void float2Bytes(byte* bytes_array, float val)
{
my_union u;
u.float_variable = val;
memcpy(bytes_array, u.bytes_array, 4);
}
Conversion without memory reference:
#define FLOAT_U32(x) ((const union {float f; uint32_t u;}) {.f = (x)}.u) // float->u32
#define U32_FLOAT(x) ((const union {float f; uint32_t u;}) {.u = (x)}.f) // u32->float
Usage example:
float_t sensorVal = U32_FLOAT(eeprom_read_dword(&sensor));
First of all, some embedded systems 101:
Anyone telling you to use malloc/new on Arduino has no clue what they are talking about. I wrote a fairly detailed explanation regarding why here: Why should I not use dynamic memory allocation in embedded systems?
You should avoid float on 8-bit microcontrollers since it leads to incredibly inefficient code. They do not have an FPU, so the compiler will be forced to load a very resource-heavy software floating point library to make your code work. General advice here.
Regarding pointer conversions:
C allows all manner of wild and crazy pointer casts. However, there are lots of situations where it can lead to undefined behavior if you cast a character byte array's address into a float* and then de-reference it.
If the address of the byte array is not aligned, it will lead to undefined behavior on systems that require aligned access. (AVR doesn't care about alignment though.)
If the byte array does not contain a valid binary representation of a float number, it could become a trap representation. Similarly, you must keep endianness in mind. AVR is an 8-bitter but it's regarded as little endian since it uses little-endian format for 16-bit addresses.
It leads to undefined behavior because it goes against the C language "effective type" system, also known as a "strict pointer aliasing violation". What is the strict aliasing rule?
Going the other way around is fine though: taking the address of a float variable and converting it to a character pointer, then de-referencing that character pointer to access individual bytes. Multiple special rules in C allow this for serialization purposes and hardware-related programming.
Viable solutions:
memcpy always works fine and then you won't have to care about alignment and strict aliasing. You still have to care about creating a valid floating point representation though.
union "type punning" as demonstrated in other answers. Note that such type punning will assume a certain endianness.
Bit shifting individual bytes and concatenating with | or masking with & as needed. The advantage of this is that it's endianness-independent in some scenarios.
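As an illustration of how the first and last options combine when going from bytes back to a float, here is a minimal sketch. The function name bytes_le_to_float is hypothetical, and the code assumes a 32-bit float and a byte array that stores the least significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuilds a float from 4 bytes stored least
   significant byte first. No float* is ever pointed at the byte array,
   so alignment and strict aliasing are not an issue. */
float bytes_le_to_float(const unsigned char in[4])
{
    uint32_t bits = 0;
    float f;
    int i;

    for (i = 0; i < 4; i++)
        bits |= (uint32_t) in[i] << (8 * i);   /* concatenate with | */
    memcpy(&f, &bits, sizeof f);               /* copy the bit pattern */
    return f;
}
```

The caller still has to make sure the four bytes form a valid floating point representation, as noted in the list above.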
float f=3.14;
char *c=(char *)&f;
float g=0;
char *d=(char *)&g;
for(int i=0;i<4;i++) d[i]=c[i];
/* Now g=3.14 */
Take the address of your float, cast it to char*, and assign it to the char pointer.
Now, c[0] through c[3] contain the bytes of your float.
http://justinparrtech.com/JustinParr-Tech/c-access-other-data-types-as-byte-array/

Pointing struct's pointer member

I am getting an error from my compiler as follows:
C51 COMPILER V9.01 - SN: C1ADC-HAI60D COPYRIGHT KEIL ELEKTRONIK GmbH
1987 - 2009
* WARNING C260 IN LINE 300 OF SEQUENCE.C: '=': pointer truncation
* ERROR C190 IN LINE 301 OF SEQUENCE.C: '&': not an lvalue
The following is my code:
struct myCond{
unsigned char currStatus;
unsigned char prevStatus;
unsigned int *timer;
unsigned char *flag;
}
struct myCond StatCond;
unsigned int data timerdata;
bit bdata timeflag;
void someSubroutine (void)
{
struct myCond *tempCond;
tempCond = &StatCond;
tempCond->timer = &((unsigned int)timerdata);
tempCond->flag = &((unsigned char)timeflag);
}
Are we supposed to guess which line is 301?
The problems, as I understand it, are here:
tempCond->timer = &((unsigned int)timerdata);
tempCond->flag = &((unsigned char)timeflag);
(unsigned int)timerdata and (unsigned char)timeflag are values, r-values to be precise. They cannot be modified or assigned unlike l-values, which plain timerdata and timeflag are. And so you can't take addresses of r-values with &. It would be the same as, say, writing &1. 1 on its own does not exist as an object in data memory.
You should write instead:
tempCond->timer = &timerdata;
tempCond->flag = (unsigned char*)&timeflag;
And I'm not quite sure that it is legal to take an address of a bit variable. The last line may fail to compile.
Perhaps redefining the structure would help:
struct myCond{
...
bit bdata *flag; // or maybe without bdata
}
and then you'd write tempCond->flag = &timeflag;.
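The same idea in plain standard C, without the Keil-specific data, bit and bdata keywords, can be sketched as follows. The helper bind_cond is a hypothetical name, and unsigned char stands in for the bit type here purely as an assumption, since standard C has no bit-addressable objects.

```c
struct myCond {
    unsigned char currStatus;
    unsigned char prevStatus;
    unsigned int *timer;
    unsigned char *flag;
};

/* Hypothetical helper: points the members at existing objects.
   Both arguments are addresses of real objects (lvalues) of the right
   type, so no cast is needed and nothing is truncated. */
void bind_cond(struct myCond *cond, unsigned int *timer, unsigned char *flag)
{
    cond->timer = timer;
    cond->flag = flag;
}
```

A call such as bind_cond(&StatCond, &timerdata, &flagbyte) then takes the addresses of the variables themselves rather than of cast expressions, where flagbyte would be an ordinary unsigned char variable.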
unsigned int data timerdata; // what is 'data', is it defined?
bit bdata timeflag; // what are 'bit' and 'bdata', are they defined?
Check your code with regard to my questions above. Compiler errors are often reported multiple lines after the real offense.
