Pointer casting problem in C with Atollic TrueSTUDIO for STM32 IDE

I am working on an STM32H7 in Atollic TrueSTUDIO for STM32 IDE, coding only in C and using FreeRTOS.
Ui08 *pointerSomething;
Ui64 localVariable;
pointerSomething = &addressOfSomething;
localVariable = *(Ui64*)(pointerSomething);
This code generally works. But one of my uses, inside a switch case in a thread, looks like this:
thread begin //
Ui08 *pointerSomething;
Ui64 localVariable;
case 3:
    pointerSomething = &addressOfSomething;
    localVariable = *(Ui64*)(pointerSomething);
    break;
thread end //
And I get a hard fault the second time through this case. The first pass through the case works properly, but the second pass hard-faults exactly at the line localVariable = *(Ui64*)(pointerSomething);
thread begin //
Ui08 *pointerSomething;
Ui64 localVariable;
case 3:
    pointerSomething = &addressOfSomething;
    memcpy( &localVariable, pointerSomething, sizeof(localVariable) );
    break;
thread end //
If I change the line to the memcpy version shown above, the problem goes away for every pass through the case. But my question is: why does the problem occur with the casting version of the line?

There is nothing to guess here.
gcc, compiling for Cortex-M7 (Thumb), turns the 64-bit pointer pun into an LDRD instruction. LDRD requires an aligned address. That is why you get hard faults from time to time: whenever the address is not word aligned.
https://godbolt.org/z/o9sPvfaon
You need to make sure that the pointer references correctly aligned data. During debugging you can, for example, add:
case 3:
    if ((uint32_t)pointerSomething & 3)
    {
        __BKPT(0);
    }
    localVariable = *(Ui64*)(pointerSomething);
and you will be able to see what is causing the hard fault.
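For the fix itself, the memcpy form from your question is the portable way to express an unaligned read: the compiler emits alignment-safe accesses instead of LDRD. A minimal sketch of such a helper (the name read_u64_unaligned is illustrative, not from the post):

#include <stdint.h>
#include <string.h>

static inline uint64_t read_u64_unaligned(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v); /* no alignment requirement, unlike *(Ui64*)p */
    return v;
}

With that helper the case body becomes localVariable = read_u64_unaligned(pointerSomething); and works regardless of where pointerSomething points.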

Related

Code execution exploit Cortex M4

For testing the MPU and playing around with exploits, I want to execute code from a local buffer running on my STM32F4 dev board.
int main(void)
{
    uint16_t func[] = { 0x0301f103, 0x0301f103, 0x0301f103 };
    MPU->CTRL = 0;
    unsigned int address = (void*)&func+1;
    asm volatile(
        "mov r4,%0\n"
        "ldr pc, [r4]\n"
        :
        : "r"(address)
    );
    while(1);
}
In main, I first turn off the MPU. My instructions are stored in func. In the asm part I load the address (0x2001ffe8, +1 for Thumb) into the program counter register. When stepping through the code with GDB, the correct value is stored in R4 and then transferred to the PC register. But then I end up in the HardFault handler.
Edit:
The stack looks like this:
0x2001ffe8: 0x0301f103 0x0301f103 0x0301f103 0x2001ffe9
The instructions are correct in memory. The Definitive Guide to Cortex says the region 0x20000000–0x3FFFFFFF is SRAM and "this region is executable, so you can copy program code here and execute it".
You are assigning 32-bit values to a 16-bit array.
Your instructions don't terminate; execution runs on into whatever is found in RAM, so that will crash.
You are not loading the address of the array into the program counter; you are loading the first item in the array into the program counter. That will crash: you created a level of indirection.
Look at the BX instruction for this rather than ldr pc.
You did not declare the array as static, so it can be optimized out as dead and unused, which can also cause a crash.
The compiler should also complain that you are assigning a void* to an unsigned variable, so a typecast is wanted there.
As a habit I recommend address |= 1 rather than += 1; in this case either will function.
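Putting those points together, here is a minimal sketch of a version that should work. The whole block is my illustration rather than the poster's code: the Thumb encodings are 0x3001 (adds r0, #1) and 0x4770 (bx lr), and MPU->CTRL assumes the usual CMSIS device header.

#include <stdint.h>
#include "stm32f4xx.h" /* CMSIS device header, provides MPU */

/* static, 16-bit Thumb opcodes, terminated with a return: */
static uint16_t func[] = { 0x3001,   /* adds r0, #1 */
                           0x4770 }; /* bx lr       */

int main(void)
{
    MPU->CTRL = 0; /* MPU off, as in the question */
    /* Branch via a function pointer with the Thumb bit set, instead of
       loading the array's first element into the PC: */
    int (*f)(int) = (int (*)(int))((uintptr_t)func | 1u);
    int r = f(41); /* r == 42 if the code in SRAM executed */
    (void)r;
    while (1) { }
}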

Program is not returning expected pc registry address for buffer overflow

Using the following program, I am trying to achieve a buffer overflow:
#include <stdio.h>
#include <string.h>

void vuln() {
    char buff[16];
    scanf("%s", buff);
    printf("You entered: %s\n", buff);
}

void secret() {
    printf("You shouldn't be here.");
}

int main() {
    vuln();
    return 0;
}
When entering AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMM as the input, the crash log reports that the R15 (PC) register contains the value 0x46464644.
My question is: why isn't the address 0x46464646? How does it end up as three F's followed by one D? The expected result should be 0x46464646, because that is where the data should be overwritten.
ARM instructions are aligned, either on a 2-byte or 4-byte boundary depending on whether they are THUMB or ARM instructions. This means that the LSb of legal branch targets is always zero. So ARM took advantage of this and used the LSb of a branch address (for some branch instructions) to denote whether to switch between ARM and THUMB.
It is most likely that you overwrote the stored LR with 0x46464645 or 0x46464646, and the processor then discarded your LSb (or the two LSbs), potentially using it to select the mode to use when executing code at the destination.
Try changing the last of your 'E' characters or the first of your 'F's to something with a recognizable upper 6 bits to test this hypothesis (presuming your ARM is running little-endian). It's most likely the first 'F', considering ARM's alignment requirements.
Edit: why are people down-voting this question? It is both interesting and well-asked.
Here's a reference to the behavior on branching that mentions the mode switch: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/Cihfddaf.html

C: switch case with hex value ignores the leading 0s

I initialize a uint32 variable by shifting and OR-ing two uint8 variables.
uint32_t test = ((first_uint8 << 16) | second_uint8);
In my test, the value of test should be 0x00170001 (0x17 << 16 | 0x01).
Then it runs into a switch statement:
switch(test) {
case 0x00170001:
    //do sth
    break;
default:
    //printf invalid
    break;
}
It should go to the first case; however, it always runs into default.
Then I printf("%x", test), and the result is 0x170001, which is equal to 0x00170001.
Finally I tried modifying it like this:
switch(test) {
case 0x170001:
    //do sth
    break;
default:
    //printf invalid
    break;
}
Then it works well.
So I'm curious about the result:
1. For a uint32_t variable, why does 0x170001 not equal 0x00170001?
2. If it is caused by my not zeroing test with memset, then test should not equal 0x170001 either; it should be 0x11170001 or something else with a garbage first byte?
3. Is it caused by the compiler ignoring the 0s at the front of the hex value? I'm using the Android NDK to compile my C code.
1. For a uint32_t variable, why does 0x170001 not equal 0x00170001?
It does. 0x170001, 0x0170001 and 0x00170001 are all equal.
2. If it is caused by my not zeroing test with memset, then test should not equal 0x170001 either?
It is not caused by a missing memset. Any assignment test = X sets all bits in test.
3. Is it caused by the compiler ignoring the 0s at the front of the hex value?
That could explain it, but I doubt it. I think it is more likely that you are being tricked by something else. Perhaps you are not actually running the code you think you are. If it really turns out that it makes a difference whether you write 0x170001 or 0x00170001, you should report it as a compiler bug, but be really sure before doing that.
See here for a working example compiled with gcc: http://ideone.com/UR566Q
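For reference, a self-contained reconstruction of the scenario (my sketch, not the asker's exact code); it prints "matched", since 0x00170001 and 0x170001 are the same constant:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t first_uint8 = 0x17, second_uint8 = 0x01;
    uint32_t test = ((uint32_t)first_uint8 << 16) | second_uint8;

    switch (test)
    {
    case 0x00170001: /* identical to writing case 0x170001: */
        printf("matched\n");
        break;
    default:
        printf("invalid\n");
        break;
    }
    return 0;
}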

For loop is not incrementing

I have been working on this code today and assure you I have gone through it a number of times. For some reason, whenever I set breakpoints to check the value of channelsel, all I get is 0. I never get 1, 2, 3 or 4 (my MAXCHANNELS is 5).
I'm using a PIC18F45K22 microcontroller and MPLAB C18.
Please take a look at the following code. Thank you in advance.
int channelsel = 0;

for (channelsel = 0; channelsel < MAXCHANNELS; channelsel++)
{
    switch(channelsel)
    {
    case 0:
        SetChanADC(ADC_CH0);
        break;
    case 1:
        SetChanADC(ADC_CH1);
        break;
    case 2:
        SetChanADC(ADC_CH2);
        break;
    case 3:
        SetChanADC(ADC_CH3);
        break;
    case 4:
        SetChanADC(ADC_CH4);
        break;
    default:
        SetChanADC(ADC_CH0);
        break;
    }
    ConvertADC();
    while (BusyADC() == TRUE) Delay1TCY();
    sampledValue = ReadADC();
    setCurrentTemperatureForChannel(channelsel, sampledValue);
    sprintf (buf, "current Temp of channel %i is %x \n\r", channelsel, sampledValue);
    puts1USART(buf);
    Delay10KTCYx(10);
}
Declare channelsel as volatile:
volatile int channelsel;
It is quite likely that your compiler is optimizing away the rest of the statements so that they are not even in the disassembly. When dealing with values that update extremely fast, or with conditional statements in close proximity to the control variable's declaration and assignment, volatile tells the compiler that we always want a fresh value for this variable and that it must not take any shortcuts. Variables that depend on I/O should always be declared volatile, and cases like this are good candidates for its use. Compilers are all different, and your mileage may vary.
If you are sure that your hardware is configured correctly, this would be my suggestion. If in doubt, please post the disassembled code for this segment.
I have been working on the PIC18; this is a bug my co-worker discovered: for loops don't work with the C18 compiler. If you change it to a while loop it will work fine.
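A minimal sketch of that while-loop rewrite (the body is elided; it is unchanged from the question):

int channelsel = 0;
while (channelsel < MAXCHANNELS)
{
    /* ... same switch on channelsel, ADC conversion and USART output ... */
    channelsel++;
}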

Increasing performance of 32-bit math on a 16-bit processor

I am working on firmware for an embedded device that uses a 16-bit PIC operating at 40 MIPS, programming in C. The system controls the position of two stepper motors and maintains the step position of each motor at all times. The max position of each motor is around 125000 steps, so I cannot use a 16-bit integer to keep track of the position; I must use a 32-bit unsigned integer (DWORD). Each motor moves at 1000 steps per second, and I have designed the firmware so that steps are processed in a timer ISR. The timer ISR does the following:
1) Compare the current position of one motor to the target position; if they are the same, set the isMoving flag false and return. If they are different, set the isMoving flag true.
2) If the target position is larger than the current position, move one step forward, then increment the current position.
3) If the target position is smaller than the current position, move one step backward, then decrement the current position.
Here is the code:
void _ISR _NOPSV _T4Interrupt(void)
{
    static char StepperIndex1 = 'A';

    if (Device1.statusStr.CurrentPosition == Device1.statusStr.TargetPosition)
    {
        Device1.statusStr.IsMoving = 0;
        // Do Nothing
    }
    else if (Device1.statusStr.CurrentPosition > Device1.statusStr.TargetPosition)
    {
        switch (StepperIndex1) // MOVE OUT
        {
        case 'A':
            SetMotor1PosB();
            StepperIndex1 = 'B';
            break;
        case 'B':
            SetMotor1PosC();
            StepperIndex1 = 'C';
            break;
        case 'C':
            SetMotor1PosD();
            StepperIndex1 = 'D';
            break;
        case 'D':
        default:
            SetMotor1PosA();
            StepperIndex1 = 'A';
            break;
        }
        Device1.statusStr.CurrentPosition--;
        Device1.statusStr.IsMoving = 1;
    }
    else
    {
        switch (StepperIndex1) // MOVE IN
        {
        case 'A':
            SetMotor1PosD();
            StepperIndex1 = 'D';
            break;
        case 'B':
            SetMotor1PosA();
            StepperIndex1 = 'A';
            break;
        case 'C':
            SetMotor1PosB();
            StepperIndex1 = 'B';
            break;
        case 'D':
        default:
            SetMotor1PosC();
            StepperIndex1 = 'C';
            break;
        }
        Device1.statusStr.CurrentPosition++;
        Device1.statusStr.IsMoving = 1;
    }
    _T4IF = 0; // Clear the Timer 4 Interrupt Flag.
}
The target position is set in the main program loop when move requests are received. The SetMotorPos lines are just macros to turn on/off specific port pins.
My question is: is there any way to improve the efficiency of this code? The code functions fine as-is when the positions are 16-bit integers, but with 32-bit integers there is too much processing. This device must communicate with a PC without hesitation, and as written there is a noticeable performance hit. I really only need 18-bit math, but I don't know of an easy way of doing that! Any constructive input/suggestions would be most appreciated.
Warning: all numbers are made up...
Supposing that the above ISR compiles to about 200 instructions (likely fewer), including the instructions to save/restore the CPU registers before and after the ISR, each taking 5 clock cycles (likely 1 to 3), and that you call 2 such ISRs 1000 times a second each, we end up with 2 * 1000 * 200 * 5 = 2 million clock cycles per second, or 2 MIPS.
Do you actually consume the remaining 38 MIPS elsewhere?
The only thing that may be important here, and that I can't see, is what's done inside the SetMotor*Pos*() functions. Do they do any complex calculations? Do they perform some slow communication with the motors, e.g. wait for them to respond to the commands sent to them?
At any rate, it's doubtful that such simple code would be noticeably slower when working with 32-bit integers than with 16-bit.
If your code is slow, find out where the time is spent and how much: profile it. Generate a square pulse signal in the ISR (going to 1 when the ISR starts, going to 0 when the ISR is about to return) and measure its duration with an oscilloscope, or do whatever is easier. Measure the time spent in all parts of the program, then optimize where really necessary, not where you previously thought it would be.
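A sketch of that square-pulse idea; DEBUG_PIN_HIGH/DEBUG_PIN_LOW are hypothetical macros for a spare port pin, not from the post:

void _ISR _NOPSV _T4Interrupt(void)
{
    DEBUG_PIN_HIGH(); /* hypothetical macro, e.g. LATBbits.LATB0 = 1 */
    /* ... existing ISR body from the question ... */
    _T4IF = 0;        /* clear the Timer 4 interrupt flag, as before */
    DEBUG_PIN_LOW();  /* pulse width on the scope = time spent in the ISR */
}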
The difference between 16-bit and 32-bit arithmetic shouldn't be that big, I think, since you use only increment and comparison. But maybe the problem is that each 32-bit arithmetic operation implies a function call (if the compiler isn't able or willing to inline the simpler operations).
One suggestion would be to do the arithmetic yourself, by breaking Device1.statusStr.CurrentPosition in two, say Device1.statusStr.CurrentPositionH and Device1.statusStr.CurrentPositionL. Then use some macros to do the operations, like:
#define INC(xH, xL) { (xL)++; if ((xL) == 0) (xH)++; }
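Companion macros in the same style would be needed for the decrement and the comparisons in the ISR; a sketch under the same split-word scheme (the names are illustrative):

#define DEC(xH, xL) { if ((xL) == 0) (xH)--; (xL)--; }
#define EQ(aH, aL, bH, bL) ((aH) == (bH) && (aL) == (bL))
#define GT(aH, aL, bH, bL) ((aH) > (bH) || ((aH) == (bH) && (aL) > (bL)))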
I would get rid of the StepperIndex1 variable and instead use the two low-order bits of CurrentPosition to keep track of the current step index. Alternatively, keep track of the current position in full rotations (rather than individual steps) so it can fit in a 16-bit variable, and only increment/decrement the position when moving to phase 'A'. Of course, this means you can only target each full rotation, rather than every step.
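A sketch of the first suggestion, deriving the phase directly from the two low-order bits (which bit pattern maps to which phase depends on the motor wiring, so the mapping below is illustrative):

switch (Device1.statusStr.CurrentPosition & 3u)
{
case 0: SetMotor1PosA(); break;
case 1: SetMotor1PosB(); break;
case 2: SetMotor1PosC(); break;
case 3: SetMotor1PosD(); break;
}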
Sorry, but you are using bad program design. Let's check the difference between 16-bit and 32-bit PIC24/dsPIC33 asm code...
16-bit increment:
inc PosInt16 ;one cycle
So a 16-bit increment takes one cycle.
32-bit increment:
clr Wd ;one cycle
inc low PosInt32 ;one cycle
addc high PosInt32, Wd ;one cycle
and a 32-bit increment takes three cycles.
The total difference is 2 cycles, or 50 ns (nanoseconds) at 40 MIPS.
A simple calculation shows it all: at 1000 steps per second on a 40 MIPS DSP you have 40000 instructions per step. More than enough!
When you change it from 16-bit to 32-bit, do you change any of the compile flags to tell it to compile as a 32-bit application instead?
Have you tried compiling with the 32-bit extensions but using only 16-bit integers? Do you still get such a performance drop?
It's likely that just by changing from 16-bit to 32-bit some operations are compiled differently. Perhaps diff the two sets of compiled asm code and see what is actually different: is it a lot, or only a couple of lines?
One solution might be, instead of using a 32-bit integer, to use two 16-bit integers: when valueA reaches int16.Max, set it to 0 and increment valueB by 1; otherwise just increment valueA by 1. When valueB >= 3, you then check valueA >= 26696 (or something similar, depending on whether you use unsigned or signed int16), and then you have your motor check at 125000.
