I have created a uint8_t array with data packed into a dense structure, which the microcontroller parses to extract variables. I have found a workaround for now, but I am curious as to why my previous code might have failed.
The following function is what I was trying to use:
/*
 * Input Configuration Buffer:
 * Byte 0:
 *     Bit 0:    Input Joystick Left (0) or Input Joystick Right (1)
 *     Bit 1:    Invert X-Axis (1)
 *     Bit 2:    Invert Y-Axis (1)
 *     Bit 3:    Output Joystick Left (0) or Output Joystick Right (1)
 *     Bits 4-7: Don't Care
 * Byte 1: X-Deadzone 4th-Byte (float)
 * Byte 2: X-Deadzone 3rd-Byte (float)
 * Byte 3: X-Deadzone 2nd-Byte (float)
 * Byte 4: X-Deadzone 1st-Byte (float)
 * Byte 5: Y-Deadzone 4th-Byte (float)
 * Byte 6: Y-Deadzone 3rd-Byte (float)
 * Byte 7: Y-Deadzone 2nd-Byte (float)
 * Byte 8: Y-Deadzone 1st-Byte (float)
 */
void Controller_Config_MapInputJoystickAsJoystick(Controller_HandleTypeDef *c, uint8_t *ic_buffer){
    uint8_t js_in    = GET_BIT(ic_buffer[0], 0);
    uint8_t invert_x = GET_BIT(ic_buffer[0], 1);
    uint8_t invert_y = GET_BIT(ic_buffer[0], 2);
    uint8_t js_out   = GET_BIT(ic_buffer[0], 3);

    float *deadzone_x = (float *)(&ic_buffer[1]);
    float *deadzone_y = (float *)(&ic_buffer[5]);

    float val_x = invert_x ? -joysticks[js_in].x.val : joysticks[js_in].x.val;
    float val_y = invert_y ? -joysticks[js_in].y.val : joysticks[js_in].y.val;

    if((val_x > *deadzone_x) || (val_x < -*deadzone_x)){
        c->joysticks._bits[js_out*2 + 0] += (int16_t)(val_x * -INT16_MIN);
    }
    if((val_y > *deadzone_y) || (val_y < -*deadzone_y)){
        c->joysticks._bits[js_out*2 + 1] += (int16_t)(val_y * -INT16_MIN);
    }
}
When this code runs, it seems to run fine until it tries to compare (val_x > *deadzone_x) || (val_x < -*deadzone_x). At this point, the microcontroller enters a HardFault condition.
This code is compiled as (asm):
ldr r3, [r7, #24]
vldr s15, [r3]
vldr s14, [r7, #16]
vcmpe.f32 s14, s15
vmrs APSR_nzcv, fpscr
bgt.n 0x80022c6 <Controller_Config_MapInputJoystickAsJoystick+230>
ldr r3, [r7, #24]
vldr s15, [r3]
vneg.f32 s15, s15
vldr s14, [r7, #16]
vcmpe.f32 s14, s15
vmrs APSR_nzcv, fpscr
bpl.n 0x8002302 <Controller_Config_MapInputJoystickAsJoystick+290>
In the assembly above, the microcontroller HardFaults on the instruction vldr s15, [r3], at which point the s15 register holds 1.00000238 and the r3 register holds 0x802007d (the address stored in the float * deadzone_x; note that this address is odd).
I am curious as to why this might be failing. I would love to implement this function using a float * directly, without copying the value out into a new float variable. The difference in instruction count is considerable and would save a lot of processing time.
I don't know whether this is a general C programming misconception or perhaps something to do with the compiled ARM assembly? Any help would be appreciated.
I can provide you with any data, views or whatever might help. Thanks.
Edit 1:
This is the workaround I am using; not much difference, I just dereference the pointer immediately. You hate to C it.
/*
 * Input Configuration Buffer:
 * Byte 0:
 *     Bit 0:    Input Joystick Left (0) or Input Joystick Right (1)
 *     Bit 1:    Invert X-Axis (1)
 *     Bit 2:    Invert Y-Axis (1)
 *     Bit 3:    Output Joystick Left (0) or Output Joystick Right (1)
 *     Bits 4-7: Don't Care
 * Byte 1: X-Deadzone 4th-Byte (float)
 * Byte 2: X-Deadzone 3rd-Byte (float)
 * Byte 3: X-Deadzone 2nd-Byte (float)
 * Byte 4: X-Deadzone 1st-Byte (float)
 * Byte 5: Y-Deadzone 4th-Byte (float)
 * Byte 6: Y-Deadzone 3rd-Byte (float)
 * Byte 7: Y-Deadzone 2nd-Byte (float)
 * Byte 8: Y-Deadzone 1st-Byte (float)
 */
void Controller_Config_MapInputJoystickAsJoystick(Controller_HandleTypeDef *c, uint8_t *ic_buffer){
    uint8_t js_in    = GET_BIT(ic_buffer[0], 0);
    uint8_t invert_x = GET_BIT(ic_buffer[0], 1);
    uint8_t invert_y = GET_BIT(ic_buffer[0], 2);
    uint8_t js_out   = GET_BIT(ic_buffer[0], 3);

    float deadzone_x = (float)(*(float *)(&ic_buffer[1]));
    float deadzone_y = (float)(*(float *)(&ic_buffer[5]));

    float val_x = invert_x ? -joysticks[js_in].x.val : joysticks[js_in].x.val;
    float val_y = invert_y ? -joysticks[js_in].y.val : joysticks[js_in].y.val;

    if((val_x > deadzone_x) || (val_x < -deadzone_x)){
        c->joysticks._bits[js_out*2 + 0] += (int16_t)(val_x * -INT16_MIN);
    }
    if((val_y > deadzone_y) || (val_y < -deadzone_y)){
        c->joysticks._bits[js_out*2 + 1] += (int16_t)(val_y * -INT16_MIN);
    }
}
Debugger Output of Working Code
Debugger Output of Non-Working Code
Edit 2:
Ill-Alignment
I am concerned as to why this might be failing?
Very likely, @Gerhardh has it right in the comments: your particular conversions of uint8_t * to type float * result in pointers that are not correctly aligned for type float (alignment requirements are type- and architecture-specific). C has this to say about that:
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
(C17, 6.3.2.3/7)
Your debugging demonstrates that the address in question is odd, whereas it is entirely plausible that your hardware requires float accesses to be aligned on 4-byte boundaries.
Moreover, accessing a slice of your byte array as if it were a float also violates C's strict aliasing rule (C17 6.5/7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Undefined behavior results on account of this violation as well. In a sense, this is the more fundamental issue, because if you were not violating the strict aliasing rule in the first place then there would be no opportunity for a pointer conversion with a misaligned result.
I would love to use this function using a float * without casting a new float variable of a pointer. The instruction difference is considerable and would save a lot of processing time.
The difference between the code that breaks and the one that doesn't is not casting, but the fact that you immediately dereference the converted pointer, as opposed to storing it in a variable and dereferencing it later. The (float) casts are redundant. The immediate dereferencing does not make your program's behavior defined, but apparently it does induce the compiler to produce code that works correctly.
The difference in instruction count between machine code that does not work and machine code that does work is absolutely irrelevant. You do not save processing time in any useful sense by executing code that causes a hard fault. Furthermore, greater instruction count does not necessarily mean slower code.
In any case, the correct thing to do here as far as the C language is concerned is to copy the data out to your float byte by byte. If you have the C library available then memcpy() would be the conventional way to do this, and many compilers will recognize that usage and optimize the function call away:
float deadzone_x;
memcpy(&deadzone_x, ic_buffer + 1, sizeof(deadzone_x));
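Applied to the function in question, the fix amounts to two small copies; a sketch, reusing the question's GET_BIT macro and joysticks array, with the rest of the body unchanged:

#include <string.h>

void Controller_Config_MapInputJoystickAsJoystick(Controller_HandleTypeDef *c, uint8_t *ic_buffer){
    uint8_t js_in    = GET_BIT(ic_buffer[0], 0);
    uint8_t invert_x = GET_BIT(ic_buffer[0], 1);
    uint8_t invert_y = GET_BIT(ic_buffer[0], 2);
    uint8_t js_out   = GET_BIT(ic_buffer[0], 3);

    float deadzone_x, deadzone_y;
    memcpy(&deadzone_x, ic_buffer + 1, sizeof(deadzone_x));   /* safe for any alignment */
    memcpy(&deadzone_y, ic_buffer + 5, sizeof(deadzone_y));

    /* ... comparisons and joystick updates exactly as before ... */
}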
If you are building for a freestanding implementation that does not provide memcpy(), then I would write my own for this purpose:
static inline void my_memcpy(void *dest, const void *src, size_t size) {
    char *d = dest;
    const char *s = src;

    for (size_t i = 0; i < size; i++) {
        *d++ = *s++;
    }
}

(The static is needed because a plain inline function in C would otherwise require a separate external definition.)
Probably the compiler will indeed inline that for you (not guaranteed by the inline keyword), but if not then you could inline it manually.
Of course, if you were writing to the packed buffer then you would need to perform a similar copy in the other direction.
On a somewhat larger scale, it is possible that you would benefit from unpacking the array into a standard C structure, once, and then passing around and working with that instead of working directly on the packed buffer. This is of course very situation dependent.
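For example, a minimal sketch of such an unpacking step (the struct and function names here are illustrative, not from the original code):

#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t js_in;
    uint8_t invert_x;
    uint8_t invert_y;
    uint8_t js_out;
    float   deadzone_x;
    float   deadzone_y;
} JoystickMapConfig;   /* hypothetical name */

static void unpack_joystick_map_config(JoystickMapConfig *cfg, const uint8_t *ic_buffer) {
    cfg->js_in    = GET_BIT(ic_buffer[0], 0);
    cfg->invert_x = GET_BIT(ic_buffer[0], 1);
    cfg->invert_y = GET_BIT(ic_buffer[0], 2);
    cfg->js_out   = GET_BIT(ic_buffer[0], 3);
    memcpy(&cfg->deadzone_x, ic_buffer + 1, sizeof(cfg->deadzone_x));
    memcpy(&cfg->deadzone_y, ic_buffer + 5, sizeof(cfg->deadzone_y));
}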
Related
The following three lines of code are optimized ways to modify bits with one MOV instruction, instead of using a less interrupt-safe read-modify-write idiom. They are identical to each other, and set the LED_RED bit in the GPIO Port's Data Register:
*((volatile unsigned long*)(0x40025000 + (LED_RED << 2))) = LED_RED;
*(GPIO_PORTF_DATA_BITS_R + LED_RED) = LED_RED;
GPIO_PORTF_DATA_BITS_R[LED_RED] = LED_RED;
LED_RED is simply (volatile unsigned long) 0x02. In the memory map of this microcontroller, the first 2 LSBs of this register (and others) are unused, so the left shift in the first example makes sense.
The definition for GPIO_PORTF_DATA_BITS_R is:
#define GPIO_PORTF_DATA_BITS_R ((volatile unsigned long *)0x40025000)
My question is: How come I do not need to left shift twice when using pointer arithmetic or array indexing (2nd method and 3rd method, respectively)? I'm having a hard time understanding. Thank you in advance.
Remember how C pointer arithmetic works: adding an offset to a pointer operates in units of the type pointed to. Since GPIO_PORTF_DATA_BITS_R has type unsigned long *, and sizeof(unsigned long) == 4, GPIO_PORTF_DATA_BITS_R + LED_RED effectively adds 2 * 4 = 8 bytes.
Note that (1) does arithmetic on 0x40025000, which is an integer, not a pointer, so we need to add 8 to get the same result. Left shift by 2 is the same as multiplication by 4, so LED_RED << 2 again equals 8.
(3) is exactly equivalent to (2) by definition of the [] operator.
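As a quick illustration of the scaling (using the register address and LED_RED value from this question):

volatile unsigned long *base = (volatile unsigned long *)0x40025000;
// base + 2 advances by 2 elements, i.e. 2 * sizeof(unsigned long) = 8 bytes:
//   (uintptr_t)(base + 2)    == 0x40025008
// plain integer arithmetic has to do that scaling by hand:
//   0x40025000 + (0x02 << 2) == 0x40025008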
I am starting to use functions like _mm_clflush, _mm_clflushopt, and _mm_clwb.
Say I have defined a struct named mystruct, and its size is 256 bytes. My cache line size is 64 bytes. Now I want to flush the cache line that contains the mystruct variable. Which of the following is the right way to do so?
_mm_clflush(&mystruct)
or
for (int i = 0; i < sizeof(mystruct)/64; i++) {
    _mm_clflush( ((char *)&mystruct) + i*64 );
}
The clflush CPU instruction doesn't know the size of your struct; it only flushes exactly one cache line, the one containing the byte pointed to by the pointer operand. (The C intrinsic exposes this as a const void*, but char* would also make sense, especially given the asm documentation which describes it as an 8-bit memory operand.)
You need 4 flushes 64 bytes apart, or maybe 5 if your struct isn't alignas(64) so it could have parts in 5 different lines. (You could unconditionally flush the last byte of the struct, instead of using more complex logic to check if it's in a cache line you haven't flushed yet, depending on relative cost of clflush vs. more logic and a possible branch mispredict.)
Your original loop did 4 flushes of 4 adjacent bytes at the start of your struct.
It's probably easiest to use pointer increments so the casting is not mixed up with the critical logic.
// first attempt, a bit clunky:
const int LINESIZE = 64;
const char *lastbyte = (const char *)(&mystruct+1) - 1;
const char *p = (const char *)&mystruct;    // declared outside the loop so it stays in scope
for (; p <= lastbyte; p += LINESIZE) {
    _mm_clflush( p );
}

// if mystruct is guaranteed aligned by 64, you're done.  Otherwise not:
// check if the next line to maybe flush contains the last byte of the struct;
// if not then it was already flushed.
if( (((uintptr_t)p ^ (uintptr_t)lastbyte) & -(uintptr_t)LINESIZE) == 0 )
    _mm_clflush( lastbyte );
x^y is 1 in bit-positions where they differ. x & -LINESIZE discards the offset-within-line bits of the address, keeping only the line-number bits. So we can see if 2 addresses are in the same cache line or not with just XOR and TEST instructions. (Or clang optimizes that to a shorter cmp instruction).
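Wrapped up as a helper, the trick looks like this (a minimal sketch, assuming 64-byte lines):

#include <stdint.h>

static inline int same_cache_line(const void *a, const void *b) {
    const uintptr_t LINESIZE = 64;
    /* differing bits survive the XOR; masking with -LINESIZE keeps only
       the line-number bits, so a zero result means "same line" */
    return (((uintptr_t)a ^ (uintptr_t)b) & -LINESIZE) == 0;
}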
Or rewrite that into a single loop, using that if logic as the termination condition:
I used a C++ struct foo &var reference so I could follow your &var syntax but still see how it compiles for a function taking a pointer arg. Adapting to C is straightforward.
Looping over every cache line of an arbitrary size unaligned struct
/* I think this version is best:
 *   compact setup / small code-size
 *   no extra latency for the initial pointer
 *   doesn't need to peel a final iteration
 */
inline
void flush_structfoo(struct foo &mystruct) {
    const int LINESIZE = 64;
    const char *p = (const char *)&mystruct;
    uintptr_t endline = ((uintptr_t)&mystruct + sizeof(mystruct) - 1) | (LINESIZE-1);
    // set the offset-within-line address bits to get the last byte
    // of the cacheline containing the end of the struct.

    do {   // flush while p is in a cache line that contains any of the struct
        _mm_clflush( p );
        p += LINESIZE;
    } while (p <= (const char *)endline);
}
With GCC10.2 -O3 for x86-64, this compiles nicely (Godbolt)
flush_v3(foo&):
lea rax, [rdi+255]
or rax, 63
.L11:
clflush [rdi]
add rdi, 64
cmp rdi, rax
jbe .L11
ret
GCC doesn't unroll, and doesn't optimize any better if you use alignas(64) struct foo{...}; unfortunately. You might use if (alignof(mystruct) >= 64) { ... } to check if special handling is needed to let GCC optimize better, otherwise just use end = p + sizeof(mystruct); or end = (const char*)(&mystruct+1) - 1; or similar.
(In C, #include <stdalign.h> to get the alignas() and alignof() macros like C++, instead of the ISO C11 _Alignas and _Alignof keywords.)
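For instance, a sketch of the aligned special case (assuming C11 and the question's struct foo):

#include <stdalign.h>

alignas(64) struct foo padded_foo;
// the struct now starts on a line boundary, so it touches exactly
// (sizeof(struct foo) + 63) / 64 cache lines and the end-of-struct
// check can be skipped.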
Another alternative is this, but it's clunkier and takes more setup work.
const int LINESIZE = 64;
uintptr_t line = (uintptr_t)&mystruct & -LINESIZE;
uintptr_t lastline = ((uintptr_t)&mystruct + sizeof(mystruct) - 1) & -LINESIZE;
do {   // always at least one flush; works on small structs
    _mm_clflush( (void *)line );
    line += LINESIZE;
} while (line <= lastline);   // <= so the line holding the last byte gets flushed too
A struct that was 257 bytes would always touch exactly 5 cache lines, no checking needed. Or a 260-byte struct that's known to be aligned by 4. IDK if we can get GCC to optimize away the checks based on that.
I have a problem understanding this code. What I know is that we have passed code into an assembler that converted it into "byte code". Now I have a virtual machine that is supposed to read this code. This function is supposed to read the first bytecode instruction. I don't understand what is happening in this code. I guess we are trying to read this byte code, but I don't understand how it is done.
static int32_t bytecode_to_int32(const uint8_t *bytecode, size_t size)
{
    int32_t result;
    t_bool sign;
    int i;

    result = 0;
    sign = (t_bool)(bytecode[0] & 0x80);
    i = 0;
    while (size)
    {
        if (sign)
            result += ((bytecode[size - 1] ^ 0xFF) << (i++ * 8));
        else
            result += bytecode[size - 1] << (i++ * 8);
        size--;
    }
    if (sign)
        result = ~(result);
    return (result);
}
This code is somewhat badly written, with lots of operations crammed onto single lines, and it therefore harbors various potential bugs. It looks brittle.
bytecode[0] & 0x80 simply reads the MSB sign bit, assuming it's 2's complement or similar, then converts it to a boolean.
The loop iterates backwards over the array, from the last byte (least significant) to the first (most significant).
If the sign was negative, the code will perform an XOR of the data byte with 0xFF. Basically inverting all bits in the data. The result of the XOR is an int.
The data byte (or the result of the above XOR) is then bit shifted i * 8 bits to the left. The data is always implicitly promoted to int, so if the shifted result happens to end up larger than INT_MAX, there's a fat undefined behavior bug here. It would be much safer practice to cast to uint32_t before the shift, carry out the shift, then convert to a signed type afterwards.
The resulting int is converted to int32_t - these could be the same type or different types depending on system.
i is incremented by 1, size is decremented by 1.
If sign was negative, the int32_t is inverted to some 2's complement negative number that's sign extended and all the data bits are inverted once more. Except all zeros that got shifted in with the left shift are also replaced by ones. If this is intentional or not, I cannot tell. So for example if you started with something like 0x0081 you now have something like 0xFFFF01FF. How that format makes sense, I have no idea.
My take is that the bytecode[size - 1] ^ 0xFF (which is equivalent to ~) was made to toggle the data bits, so that they would later toggle back to their original values when ~ is called later. A programmer has to document such tricks with comments, if they are anything close to competent.
Anyway, don't use this code. If the intention was merely to swap the byte order (endianess) of a 4 byte integer, then this code must be rewritten from scratch.
That's properly done as:
static int32_t big32_to_little32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] << 8 |
                      (uint32_t)bytes[3] << 0 ;

    return (int32_t)result;
}
Anything more complicated than the above is highly questionable code. We need not worry about signs being a special case, the above code preserves the original signedness format.
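A quick usage sketch:

uint8_t buf[4] = { 0xFF, 0xFF, 0xEE, 0x39 };   /* -4551, big-endian two's complement */
int32_t v = big32_to_little32(buf);            /* v == -4551 on a two's-complement host */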
So A ^ 0xFF toggles the bits of A: 10101100 XORed with 11111111 becomes 01010011. I am not sure why they didn't use ~ here. The ^ is the XOR operator, so you are XORing with 0xFF.
The << is a left shift ("up"). In other words, A << 1 is equivalent to multiplying A by 2.
The >> is the corresponding right shift, equivalent to dividing by 2.
The ~ inverts the bits in a byte.
Note that it's better to initialise variables at declaration; it costs no additional processing whatsoever to do it that way.
sign = (t_bool)(bytecode[0] & 0x80); the sign of the number is stored in the 8th bit (position 7 counting from 0), which is where the 0x80 comes from. So it's literally checking whether the sign bit is set in the first byte of bytecode, and storing the result in the sign variable.
Essentially, if it's unsigned then it's copying the bytes from bytecode into result one byte at a time.
If the data is signed then it flips the bits then copies the bytes, then when it's done copying, it flips the bits back.
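A quick trace makes the round trip visible (assuming a 2-byte big-endian input { 0xFF, 0x85 }, i.e. -123 as a 16-bit value):

sign    = 0xFF & 0x80;          /* nonzero, so the value is negative   */
result  = (0x85 ^ 0xFF) << 0;   /* 0x7A                                */
result += (0xFF ^ 0xFF) << 8;   /* still 0x7A                          */
result  = ~result;              /* 0xFFFFFF85, which is -123: correct  */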
Personally, with this kind of thing I prefer to get the data, put it into network byte order with htons(), and memcpy() it to an allocated array, storing it in an endian-agnostic way; then when I retrieve the data I use ntohs() to convert it back to the format used by the computer. htons() and ntohs() are standard POSIX functions, used all the time in networking and in platform-agnostic data formatting / storage / communication.
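For the 32-bit values discussed here, the 32-bit counterparts htonl()/ntohl() apply; a minimal sketch of that storage pattern (the helper names are illustrative):

#include <arpa/inet.h>   /* POSIX: htonl()/ntohl() */
#include <stdint.h>
#include <string.h>

/* store `value` into `buffer` in network (big-endian) byte order */
static void store_u32_be(uint8_t *buffer, uint32_t value) {
    uint32_t wire = htonl(value);        /* host order -> network order */
    memcpy(buffer, &wire, sizeof wire);  /* avoids aliasing and alignment issues */
}

/* read it back in the host's native order */
static uint32_t load_u32_be(const uint8_t *buffer) {
    uint32_t wire;
    memcpy(&wire, buffer, sizeof wire);
    return ntohl(wire);                  /* network order -> host order */
}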
This function is a very naive version of a big-endian to little-endian conversion.
The parameter size is not needed, as the function only works with 4-byte data.
The same thing can be achieved much more easily by union punning (and it allows compilers to optimize it, in this case to a single instruction):
#include <stdio.h>
#include <stdint.h>

#define SWAP(a,b,t) do{t c = (a); (a) = (b); (b) = c;}while(0)

int32_t my_bytecode_to_int32(const uint8_t *bytecode)
{
    union
    {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.b8[3] = *bytecode++;
    i32.b8[2] = *bytecode++;
    i32.b8[1] = *bytecode++;
    i32.b8[0] = *bytecode;

    return i32.i32;
}

int main(void)
{
    union {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.i32 = -4567;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", bytecode_to_int32(i32.b8, 4));

    i32.i32 = -34;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", my_bytecode_to_int32(i32.b8));
}
https://godbolt.org/z/rb6Na5
If the purpose of the code is to sign-extend a 1-, 2-, 3-, or 4-byte sequence in network/big-endian byte order to a signed 32-bit int value, it's doing things the hard way and reimplementing the wheel along the way.
This can be broken down into a two-step process: assemble the proper number of big-endian bytes into a 32-bit integer value (shifting each byte into place handles the byte order portably), then sign-extend that value out to 32 bits.
The "wheel" being reimplemented in this case is essentially the POSIX-standard ntohl() function, which converts a 32-bit unsigned integer value in big-endian/network byte order to the local host's native byte order.
The first step I'd do is to convert 1, 2, 3, or 4 bytes into a uint32_t:
#include <stdint.h>
#include <stddef.h>
#include <limits.h>
#include <errno.h>

// convert the `size` number of bytes starting at the `bytecode` address
// (most significant byte first) to a uint32_t value
static uint32_t bytecode_to_uint32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = 0;

    switch ( size )
    {
    case 4:
        result  = ( uint32_t ) bytecode[ size - 4 ] << 24;  /* fall through */
    case 3:
        result += ( uint32_t ) bytecode[ size - 3 ] << 16;  /* fall through */
    case 2:
        result += ( uint32_t ) bytecode[ size - 2 ] << 8;   /* fall through */
    case 1:
        result += bytecode[ size - 1 ];
        break;
    default:
        // error handling here
        break;
    }

    return( result );
}
Then, sign-extend it (borrowing from this answer):
static uint32_t sign_extend_uint32( uint32_t in, size_t size )
{
    if ( size == 4 )
    {
        return( in );
    }

    // being pedantic here - the existence of `[u]int32_t` pretty
    // much ensures 8 bits/byte
    size_t bits = size * CHAR_BIT;

    uint32_t m = 1U << ( bits - 1 );
    uint32_t result = ( in ^ m ) - m;

    return ( result );
}
Put it all together:
static int32_t bytecode_to_int32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = bytecode_to_uint32( bytecode, size );

    result = sign_extend_uint32( result, size );

    // no ntohl() call is needed here: assembling the value with
    // shifts already interpreted the bytes as big-endian, so the
    // value is correct regardless of the host's endianness

    // converting uint32_t here to signed int32_t
    // can be subject to implementation-defined
    // behavior
    return( result );
}
Note that the conversion from uint32_t to int32_t implicitly performed by the return statement in the above code can result in implementation-defined behavior, as there can be uint32_t values that can not be mapped to int32_t values. See this answer.
Any decent compiler should inline these helper functions and optimize them well.
I personally think this also needs much better error handling/input validation.
When performing subtraction of pointers and the first pointer is less than the second, I'm getting an underflow error with the ARM processor.
Example code:
#include <stdint.h>
#include <stdbool.h>
uint8_t * p_formatted_data_end;
uint8_t formatted_text_buffer[10240];
static _Bool
Flush_Buffer_No_Checksum(void)
{
    _Bool system_failure_occurred = false;

    p_formatted_data_end = 0; // For demonstration purposes.
    const signed int length =
        p_formatted_data_end - &formatted_text_buffer[0];

    if (length < 0)
    {
        system_failure_occurred = true;
    }
    //...
    return true;
}
The assembly code generated by the IAR compiler is:
807 static _Bool
808 Flush_Buffer_No_Checksum(void)
809 {
\ Flush_Buffer_No_Checksum:
\ 00000000 0xE92D4070 PUSH {R4-R6,LR}
\ 00000004 0xE24DD008 SUB SP,SP,#+8
810 _Bool system_failure_occurred = false;
\ 00000008 0xE3A04000 MOV R4,#+0
811 p_formatted_data_end = 0; // For demonstration purposes.
\ 0000000C 0xE3A00000 MOV R0,#+0
\ 00000010 0x........ LDR R1,??DataTable3_7
\ 00000014 0xE5810000 STR R0,[R1, #+0]
812 const signed int length =
813 p_formatted_data_end - &formatted_text_buffer[0];
\ 00000018 0x........ LDR R0,??DataTable3_7
\ 0000001C 0xE5900000 LDR R0,[R0, #+0]
\ 00000020 0x........ LDR R1,??DataTable7_7
\ 00000024 0xE0505001 SUBS R5,R0,R1
814 if (length < 0)
\ 00000028 0xE3550000 CMP R5,#+0
\ 0000002C 0x5A000009 BPL ??Flush_Buffer_No_Checksum_0
815 {
816 system_failure_occurred = true;
\ 00000030 0xE3A00001 MOV R0,#+1
\ 00000034 0xE1B04000 MOVS R4,R0
The subtraction instruction SUBS R5,R0,R1 is equivalent to:
R5 = R0 - R1
The N bit in the CPSR register will be set if the result is negative.
Ref: Section A4.1.106 SUB of ARM Architecture Reference Manual
Let:
R0 == 0x00000000
R1 == 0x802AC6A5
Register R5 will have the value 0x7FD5395C.
The N bit of the CPSR register is 0, indicating the result is not negative.
The Windows 7 Calculator application is reporting negative, but only when expressed as 64-bits: FFFFFFFF7FD5395C.
As an experiment, I used the ptrdiff_t type for the length, and the same assembly language was generated.
Questions:
Is this valid behavior, to have the result of pointer subtraction to underflow?
What is the recommended data type to view the distance as negative?
Platform:
Target Processor: ARM Cortex A8 (TI AM3358)
Compiler: IAR 7.40
Development platform: Windows 7.
Is this valid behavior, to have the result of pointer subtraction to underflow?
Yes, because the behavior in your case is undefined. Any behavior is valid there. As was observed in comments, the difference between two pointers is defined only for pointers that point to elements of the same array object, or one past the last element of the array object (C2011, 6.5.6/9).
What is the recommended data type to view the distance as negative?
Where it is defined, the result of subtracting two pointers is specified to be of type ptrdiff_t, a signed integer type of implementation-defined size. If you evaluate p1 - p2, where p1 points to an array element and p2 points to a later element of the same array, then the result will be a negative number representable as a ptrdiff_t.
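For example, a minimal sketch where the subtraction is well defined because both pointers point into the same array:

#include <stddef.h>
#include <stdint.h>

void example(void) {
    uint8_t buf[100];
    uint8_t *p1 = &buf[10];
    uint8_t *p2 = &buf[90];
    ptrdiff_t d = p1 - p2;   /* -80: negative, and representable in ptrdiff_t */
    (void)d;                 /* silence unused-variable warnings */
}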
Although this is UB as stated in the other answer, most C implementations will simply subtract these pointers anyway, producing a ptrdiff_t-sized result (or possibly using appropriate arithmetic for their word size, which might also differ if both operands are near/far/huge pointers). The result should fit inside ptrdiff_t, which is usually a typedef-ed int on ARM:
typedef int ptrdiff_t;
So the issue with your code in this particular case is simply that you are treating an unsigned int value as signed, and it doesn't fit. As specified in your question, the address of formatted_text_buffer is 0x802AC6A5, which fits inside an unsigned int, but (int)0x802AC6A5 in two's complement form is actually a negative number (-0x7FD5395B). Subtracting that negative number from 0 therefore returns a positive int, as expected.
Signed 32-bit integer subtraction will work correctly if both operands are less than 0x7FFFFFFF apart, and it's reasonable to expect your arrays to be smaller than that:
// this will work
const int length = &formatted_text_buffer[0] - &formatted_text_buffer[100];
Or, if you really need to do subtract pointers which don't fit into signed 32-bit ints, use long long instead:
// ...but I doubt you really want this
const long long length = (long long)p_formatted_data_end -
(long long)&formatted_text_buffer[0];
I am trying to cast an array of uint8_t to an array of uint32_t, but it doesn't seem to be working.
Can anyone help me with this? I need to get the uint8_t values into a uint32_t.
I can do this with shifting, but I think there is an easier way.
uint32_t *v4full;
v4full = (uint32_t *)v4;
while (*v4full) {
    if (*v4full & 1)
        printf("1");
    else
        printf("0");
    *v4full >>= 1;
}
printf("\n");
printf("\n");
Given the need to get uint8_t values to uint32_t, and the specs on in4_pton()...
Try this with a possible correction on the byte order:
uint32_t i32 = v4[0] | ((uint32_t)v4[1] << 8) | ((uint32_t)v4[2] << 16) | ((uint32_t)v4[3] << 24);
There is a problem with your example - actually with what you are trying to do (since you don't want the shifts).
It is a little-known fact, but you're not allowed to switch pointer types in this manner;
specifically, code like this is illegal:
type1 *vec1=...;
type2 *vec2=(type2*)vec1;
// do stuff with *vec2
The only case where this is legal is if type2 is char (or unsigned char or const char etc.), but if type2 is any other type (uint32_t in your example) it's against the standard and may introduce bugs to your code if you compile with -O2 or -O3 optimization.
This is called the "strict-aliasing rule" and it allows compilers to assume that pointers of different types never point to related points in memory - so that if you change the memory of one pointer, the compiler doesn't have to reload all other pointers.
It's hard for compilers to find instances of breaking this rule, unless you make it painfully clear to it. For example, if you change your code to do this:
uint32_t v4full=*((uint32_t*)v4);
and compile using -O3 -Wall (I'm using gcc) you'll get the warning:
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
So you can't avoid using the shifts.
Note: it will work at lower optimization settings, and it will also work at higher settings if you never modify the memory pointed to by v4 and v4full. It will work, but it's still a bug and still "against the rules".
If v4full is a pointer then the line
uint32_t *v4full;
v4full=( uint32_t)&v4;
should throw an error or at least a compiler warning. Maybe you meant to do
uint32_t *v4full;
v4full=( uint32_t *) v4;
Where I assume v4 is itself a pointer to a uint8 array. I realize I am extrapolating from incomplete information…
EDIT since the above appears to have addressed a typo, let's try again.
The following snippet of code works as expected - and as I think you want your code to work. Please comment on this - how is this code not doing what you want?
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint8_t v4[4] = {1,2,3,4};
    uint32_t *allOfIt;
    allOfIt = (uint32_t *)v4;
    printf("the number is %08x\n", *allOfIt);
}
Output:
the number is 04030201
Note - the order of the bytes in the printed number is reversed - you get 04030201 instead of 01020304 as you might have expected / wanted. This is because my machine (x86 architecture) is little-endian. If you want to make sure that the order of the bytes is the way you want it (in other words, that element [0] corresponds to the most significant byte) you are better off using @bvj's solution - shifting each of the four bytes into the right position in your 32-bit integer.
Incidentally, you can see this earlier answer for a very efficient way to do this, if needed (telling the compiler to use a built-in instruction of the CPU).
One other issue that makes this code non-portable is that many architectures require a uint32_t to be aligned on a four-byte boundary, but allow uint8_t to have any address. Calling this code on an improperly-aligned array would then cause undefined behavior, such as crashing the program with SIGBUS. On these machines, the only way to cast an arbitrary uint8_t[] to a uint32_t[] is to memcpy() the contents. (If you do this in four-byte chunks, the compiler should optimize to whichever of an unaligned load or two-loads-and-a-shift is more efficient on your architecture.)
If you have control over the declaration of the source array, you can #include <stdalign.h> and then declare alignas(uint32_t) uint8_t bytes[]. The classic solution is to declare both the byte array and the 32-bit values as members of a union and type-pun between them. It is also safe to use pointers obtained from malloc(), since these are guaranteed to be suitably-aligned.
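A minimal sketch of the memcpy() approach for a single value (load_u32 is an illustrative helper name):

#include <string.h>
#include <stdint.h>

/* read a uint32_t from a possibly-unaligned byte buffer */
static uint32_t load_u32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* compilers typically lower this to a single load */
    return v;                  /* bytes are interpreted in the host's byte order  */
}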
This is one solution:
/* convert character array to integer */
uint32_t buffChar_To_Int(char *array, size_t n){
    int number = 0;
    int mult = 1;

    n = (int)n < 0 ? -n : n;   /* quick absolute value check */

    /* for each character in array */
    while (n--){
        /* if not digit or '-', check if number > 0, break or continue */
        if((array[n] < '0' || array[n] > '9') && array[n] != '-'){
            if(number)
                break;
            else
                continue;
        }
        if(array[n] == '-'){       /* if '-' and number, negate, break */
            if(number){
                number = -number;
                break;
            }
        }
        else{                      /* convert digit to numeric value   */
            number += (array[n] - '0') * mult;
            mult *= 10;
        }
    }
    return number;
}
One more solution:
u32 ip;

if (!in4_pton(str, -1, (u8 *)&ip, -1, NULL))
    return -EINVAL;
... use ip as defined above (as a variable of type u32)
Here we use the result of the in4_pton() function (ip) without any additional variables or casts.