I have a problem with the time a for loop takes to copy data: I don't understand why the loop takes so long to copy such a small amount of data.
I am using a PIC24FJ256GL406 MCU and everything else works, but when the UART is in use the microcontroller runs slowly because of a delay that occurs while the buffer data is copied.
Let me explain with code and a debug log.
Here is the function for UART transmission. It transmits only the first character; the rest of the bytes are sent from the interrupt service routine.
My clock frequency is 32 MHz, so the peripheral clock is 16 MHz.
I do not understand why the for loop takes almost 20 ms to copy only 16 bytes of data. This time will increase if there is more data.
unsigned int UART1_WriteBuffer(const uint8_t *buffer, const unsigned int bufLen)
{
//transmit first char
U1TXREG = buffer[0];
while (!U1STAbits.TRMT);
numBytesWritten = bufLen - 1;
totalByte = 0;
//get the current time stamp
WSTimestamp currentTimeStamp = WSGetCurrTimestamp();
WMLogInfo(GEN_LOG, "current time stamp %ld", currentTimeStamp);
uint16_t i = 0;
//memset
memset(&uart1_txByteQ, 0x00, sizeof(uart1_txByteQ));
for (i = 0; i < numBytesWritten; i++)
{
uart1_txByteQ[i] = buffer[i + 1]; //copy the data
}
_U1TXIE = 1;
//get last time stamp
WSTimestamp lastTimeStamp = WSGetCurrTimestamp();
WMLogInfo(GEN_LOG, "last time stamp %ld ", lastTimeStamp);
///print the debug
WMLogInfo(GEN_LOG, "total = %ld MS time taken fo copy the = %d byte", lastTimeStamp - currentTimeStamp, numBytesWritten);
return bufLen;
}
My interrupt service routine:
void __attribute__((interrupt, no_auto_psv)) _U1TXInterrupt(void)
{
if (totalByte < numBytesWritten)
{
U1TXREG = uart1_txByteQ[totalByte++];
while (!U1STAbits.TRMT);
}
else
{
_U1TXIE = 0;
_U1TXIE = 0;
}
}
Console log.
This is the log from the function above:
GEN:main loop current time stamp 6503<\r><\n>
GEN:current time stamp 6506<\r><\n>
GEN:last time stamp 6526 <\r><\n>
GEN:total = 20 MS time taken fo copy the = 16 byte<\r><\n>
GEN:command "AT+QREFUSECS=1,1<\r>" send with len [17]
The long delays are caused by busy-waiting for flags in combination with some use-case bug. Why are you using the Tx interrupt in the first place if you intend to busy-wait on a flag anyway? The ISR doesn't make sense as written; it would seem that you should simply drop Tx interrupts entirely.
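To get a feeling for how much time polling TRMT can cost, it helps to estimate how long the bytes spend on the wire. The sketch below assumes 8N1 framing (10 bits per byte). The baud rate is not stated in the question, so any value passed in is an assumption, and uart_wire_time_us is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time on the wire, in microseconds, for n_bytes
 * sent back to back. Assumes 8N1 framing: 1 start + 8 data + 1 stop bit. */
static uint32_t uart_wire_time_us(uint32_t n_bytes, uint32_t baud)
{
    assert(baud > 0);
    const uint64_t bits = (uint64_t)n_bytes * 10u;
    return (uint32_t)((bits * 1000000u) / baud);
}
```

At an assumed 9600 baud, the 17-byte command from the log would occupy the line for roughly 17.7 ms, which is in the same range as the 20 ms measured. If the actual baud rate is similar, waiting for TRMT after every byte could plausibly account for most of the delay.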
In addition, you have a naive implementation of buffer copies. It's a very common beginner bug in embedded systems to hard-copy RAM buffers needlessly. There are very few cases where you actually need to do hard copies, and this isn't one of them. Instead, you should use a system with double buffers and simply swap a pointer between them:
static uint8_t buf1 [n];
static uint8_t buf2 [n];
static uint8_t* rx_buf = buf1; // buffer used by rx ISR
static uint8_t* app_buf = buf2; // buffer used by the application
...
if(rx_done)
{ // swap buffers
uint8_t* tmp = rx_buf;
rx_buf = app_buf;
app_buf = tmp;
}
This also solves the double-buffering problem where a UART rx interrupt needs to store its incoming data somewhere at the same time as the main program uses data. You'll need to protect against race conditions somehow, with a semaphore or similar. And you'll need to declare variables shared with ISRs volatile to protect against incorrect compiler optimizations.
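Putting the pointer swap, the race protection and the volatile qualifiers together could look roughly like the sketch below. It is written so that it also compiles on a PC: irq_disable and irq_enable are empty placeholders for masking the UART rx interrupt, rx_isr_byte stands in for the body of a real ISR, and the newline-terminated message format is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64u

static uint8_t buf1[BUF_SIZE];
static uint8_t buf2[BUF_SIZE];

/* Shared with the ISR, hence the volatile qualifiers. */
static uint8_t *volatile rx_buf  = buf1;  /* filled by the rx ISR */
static uint8_t *volatile app_buf = buf2;  /* read by the application */
static volatile size_t rx_len = 0;
static volatile bool rx_done = false;

/* Placeholders: on a real target these would mask/unmask the rx interrupt. */
static void irq_disable(void) {}
static void irq_enable(void) {}

/* Body of a hypothetical rx ISR: store one byte, flag a complete line. */
static void rx_isr_byte(uint8_t byte)
{
    if (rx_len < BUF_SIZE) {
        rx_buf[rx_len++] = byte;
    }
    if (byte == '\n') {
        rx_done = true;
    }
}

/* Called from the main loop. Returns the number of bytes now available
 * in app_buf, or 0 if no complete message has arrived. No data is copied. */
static size_t take_message(void)
{
    size_t len = 0;
    irq_disable();                 /* keep the ISR out during the swap */
    if (rx_done) {
        uint8_t *tmp = rx_buf;
        rx_buf = app_buf;
        app_buf = tmp;
        assert(rx_buf != app_buf); /* the two roles never share a buffer */
        len = rx_len;
        rx_len = 0;
        rx_done = false;
    }
    irq_enable();
    return len;
}
```

The main loop would call take_message() periodically and, when it returns a non-zero length, parse app_buf at its leisure while the ISR keeps filling the other buffer.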
Our task is intended to demonstrate the benefit of using DMA to copy a large amount of data versus relying on the processor to directly handle the copying.
The processor is an STM32F407 on the ST discovery board.
In order to measure the copying time, a GPIO pin must be turned ON during copying and OFF once it has been copied.
The code appears to be functional but it is currently showing the CPU taking about 2.15ms to complete and DMA about 4.5ms, which is the opposite of what is intended. I'm not sure if there simply isn't enough data for the faster speed of DMA to offset the overhead in setting it up perhaps?
I have tried both copying elements of an array using the CPU and also using the memcpy function which seemed to yield very similar times.
The function code is shown below:
void DMASpeed(void)
{
#define elementNum 32000
int *ptr = NULL;
ptr = (int*)malloc(elementNum * sizeof(int));
int *ptr2 = NULL;
ptr2 = (int*)malloc(elementNum * sizeof(int));
for (int i = 0; i < elementNum; i++)
{
ptr[i] = 4;
}
LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;
// Initial value
// printf("BEFORE: dst = '%s'\n", dst);
// Transfer
printf("Initiate DMA Transfer...\n");
HAL_DMA_Start(&hdma_memtomem_dma2_stream0, (int)ptr, (int)ptr2, (elementNum * sizeof(int)));
LD5_GPIO_Port->BSRR = LD5_Pin;
printf("DMA Transfer initiated.\n");
// Poll for DMA completion
printf("Poll for DMA completion.\n");
HAL_DMA_PollForTransfer(&hdma_memtomem_dma2_stream0,
HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
printf("DMA complete.\n");
// Print result
// printf("AFTER: dst = '%s'\n", dst);
free(ptr);
free(ptr2);
ptr = (int*)malloc(elementNum * sizeof(int));
ptr2 = (int*)malloc(elementNum * sizeof(int));
for (int i = 0; i < elementNum; i++)
{
ptr[i] = i;
}
printf("Initiate CPU Transfer...\n");
LD6_GPIO_Port->BSRR = LD6_Pin;
// for (int i = 0; i<512; i++)
// {
// ptr2[i] = ptr[i];
// }
memcpy(ptr2, ptr, (elementNum * sizeof(int)));
printf("CPU Transfer Complete.\n");
LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;
free(ptr);
free(ptr2);
}
Thanks in advance for any assistance
You are trying to prove something that is not true. A DMA memory-to-memory transfer will always be slower than a direct CPU copy. DMA was not intended to be faster than the CPU; it is there to carry out the transfer in the background, without CPU activity. The core always has priority over the DMA.
There is another problem as well: many STM32 devices have memory areas that are not accessible by the DMA (for example, CCMRAM).
Remove the printf calls in the code segment below:
LD5_GPIO_Port->BSRR = LD5_Pin;
printf("DMA Transfer initiated.\n"); // <--Remove this
// Poll for DMA completion
printf("Poll for DMA completion.\n"); // <--Remove this
You are turning the pin ON and then printing a long text, which adds to your measured time.
Remove all printf calls, or at least do not print anything between the pin toggles.
EDIT:
To be precise, you are printing about 50 characters in the DMA case and 23 characters in the CPU case.
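A minimal sketch of the measurement pattern this suggests: only the operation under test sits between the two pin writes, and any logging happens outside. The pin_on and pin_off functions are stand-ins for the BSRR writes in the question and only record state here so the example can be compiled on a PC; timed_copy is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the GPIO writes in the question (LD6 on/off); on the
 * target these would be the BSRR stores. Here they only record state. */
static int pin_state = 0;
static int pin_edges = 0;
static void pin_on(void)  { pin_state = 1; pin_edges++; }
static void pin_off(void) { pin_state = 0; pin_edges++; }

/* Time only the copy: nothing but the operation under test sits
 * between the two pin writes. Logging belongs outside this function. */
static void timed_copy(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    pin_on();
    memcpy(dst, src, count * sizeof *src);
    pin_off();
}
```

The same shape would apply to the DMA case: raise the pin immediately before starting the transfer, lower it right after the poll returns, and print status messages only after the pin is low again.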
For those who search for "How to speed up DMA memory-to-memory transfer?", here is a piece of advice: force your compiler to place all HAL code related to your DMA transfer in RAM, ideally in the RAM that is exclusively coupled to the core. The compiler will generate function code that is copied to that RAM at startup, and all those functions will then be called from RAM and run faster because of it. However, the same is true for copying "by hand".
In this case, it is recommended to place the following files/functions in RAM:
stm32[whatever]_hal_dma.c
DMA[N]_Stream[M]_IRQHandler(), where N and M are the numbers of your DMA and stream used for the transfer respectively.
In reference to SO question: 52164135
The setup:
I have a function which converts many double values into a predefined string. The input is an array of structs, and from each struct we concatenate two double values into the string. A double is 8 bytes (64 bits) in size and my MCU is an STM32, a 32-bit ARM microcontroller.
An interrupt is also running in parallel.
The data should look like:
[[12.11111111,12.11111111],[12.22222222,12.22222222],...]
But I get (very rarely):
[[12.11111111,12.11111111],[55.01[12.33333333,12.33333333],...]
Note: [12.22222222,12.22222222] is missing.
sprintf is not re-entrant:
According to this discussion on AVRFreaks, sprintf is not re-entrant. (The discussion is about using sprintf in an interrupt-enabled hardware environment.) This means that if an interrupt occurs in the middle of a sprintf operation, the stack fails to continue the operation it was doing.
Since my MCU is a 32-bit one, a 64-bit operation will take two clock cycles. If we assume an interrupt occurred in the middle of the sprintf operation, then according to the above discussion sprintf should fail.
Question
1. Will sprintf fail in case it is interrupted?
Here is the string function; an interrupt routine also runs in the background and deals with other sensor data (local and global).
/* @brief From the array of GPS structs we create a string of the format
* [[lat,long],[lat,long],..]
* @param input The input array of GPS structs
* @param output The output string which will contain lat, long
* @param sz Size left in the output buffer
* @return 0 Successfully completed operation
* 1 Failed / Error
*/
int get_gps60secString(GPS_periodic_t input[GPS_PERIODIC_ARRAY_SIZE],
char *output, size_t sz)
{
int cnt = snprintf(output, sz, "[");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
int i = 0;
for (i = 0; i < GPS_PERIODIC_ARRAY_SIZE; i++) {
cnt = snprintf(output, sz, "[%0.8f,%0.8f]%s",
input[i].point.latitude, input[i].point.longitude,
i + 1 == GPS_PERIODIC_ARRAY_SIZE ? "" : ",");
if (cnt < 0 || cnt >= sz)
return 1;
output += cnt;
sz -= cnt;
}
cnt = snprintf(output, sz, "]");
if (cnt < 0 || cnt >= sz)
return 1;
return 0; // no error
}
What's happening inside the interrupt routine
void GPS_InterruptHandler(UART_HandleTypeDef *UartHandle)
{
gps_UART_RxInterrupt_Disable();
GPS_t l_sGpsInfo;
memset(&l_sGpsInfo,0,sizeof(GPS_t));
status=Validate_get_gpsInfo((char*)g_gps_readBuff,&l_sGpsInfo,100);
MEMS_interruptHandler(); //Inertial sensor ISR
gps_UART_RxInterrupt_Enable();
}
sprintf will only fail during an interrupt if it is called again during that interrupt (assuming it uses global variables that are re-used; if it used only stack variables, it would be re-entrant).
So if your interrupt handler calls sprintf and, during that call, a new interrupt of the same or higher priority occurs, then it can fail. However, during the processing of an interrupt, interrupts are normally disabled, so there can't (shouldn't!) be another interrupt of the same type occurring.
But why convert this raw data during interrupt handling? Why not store/pass this data to the user-level routine via a buffer and have that functionality convert the raw data? That would be consistent with the idea that an interrupt handler should be as short (fast) as possible.
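The hand-off described above could be sketched as follows. The ISR side only captures the raw bytes and sets a flag, and the main loop does all snprintf work. Every name here (gps_isr_capture, gps_main_format, raw_copy, RAW_MAX) is hypothetical, the "[%s]" format is only a placeholder for the real conversion, and on a real target the flag handling may need an interrupt mask around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define RAW_MAX 100u

/* Shared between the ISR and the main loop. */
static volatile bool raw_ready = false;
static char raw_copy[RAW_MAX];

/* Hypothetical ISR body: only capture the raw bytes and set a flag. */
static void gps_isr_capture(const char *raw, size_t len)
{
    if (raw_ready) {
        return;                     /* previous sample not consumed yet: drop */
    }
    if (len >= RAW_MAX) {
        len = RAW_MAX - 1u;
    }
    memcpy(raw_copy, raw, len);
    raw_copy[len] = '\0';
    raw_ready = true;
}

/* Main-loop side: all snprintf work happens here, outside interrupt
 * context. Returns true if a sample was formatted into out. */
static bool gps_main_format(char *out, size_t out_sz)
{
    assert(out != NULL && out_sz > 0);
    if (!raw_ready) {
        return false;
    }
    int n = snprintf(out, out_sz, "[%s]", raw_copy);
    raw_ready = false;              /* hand the buffer back to the ISR */
    return n >= 0 && (size_t)n < out_sz;
}
```

With this split, the formatting code is never entered from interrupt context, so the question of whether the library's sprintf is re-entrant does not arise for it.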
I am currently working with I2C on Arch Linux ARM and am not sure how to calculate the absolute minimum delay required between a write and a read. Without this delay the read does not come through. I have put usleep(1000) between the two commands, which works, but the value was found empirically and should be optimized to the real value somehow. But how?
Here is the write_and_read function I am using:
int write_and_read(int handler, char *buffer, const int bytesToWrite, const int bytesToRead) {
write(handler, buffer, bytesToWrite);
usleep(1000);
int r = read(handler, buffer, bytesToRead);
if(r != bytesToRead) {
return -1;
}
return 0;
}
Normally there's no need to wait. If your write and read functions run in separate threads in the background (why would you do that?), then synchronizing them is mandatory.
I2C is a very simple linear communication protocol, and all the devices I have used were able to produce their output data within microseconds.
Are you using 100kHz, 400kHz or 1MHz I2C?
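The bus speed matters because it sets how long a transfer itself takes, which gives a scale to compare the 1000 µs sleep against. As a rough sketch, each byte on the bus costs 9 SCL periods (8 data bits plus the ACK/NACK bit), and every transfer starts with an address byte. The helper name below is hypothetical, and start/stop conditions and clock stretching are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate bus time in microseconds for one I2C
 * transfer of payload_bytes. Counts 9 SCL periods per byte (8 data bits
 * plus ACK/NACK) and one extra byte for the address; start/stop
 * conditions and clock stretching are ignored. */
static uint32_t i2c_transfer_time_us(uint32_t payload_bytes, uint32_t scl_hz)
{
    assert(scl_hz > 0);
    const uint64_t clocks = ((uint64_t)payload_bytes + 1u) * 9u;
    return (uint32_t)((clocks * 1000000u) / scl_hz);
}
```

For a 2-byte payload this gives about 270 µs at 100 kHz and about 67 µs at 400 kHz, so a 1000 µs sleep is several times longer than the transfer. Any delay the device genuinely needs (for example a conversion time) would have to come from its datasheet, which the question does not name.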
Edited:
After some discussion, I suggest you try this:
void dataRequest() {
Wire.write(0x76);
x = 0;
}
void dataReceive(int numBytes)
{
x = numBytes;
for (int i = 0; i < numBytes; i++) {
Wire.read();
}
}
Where x is a global variable defined in the header then assigned 0 in the setup(). You may try to add a simple if condition into the main loop, e.g. if x > 0, then send something in serial.print() as a debug message, then reset x to 0.
With this you are not blocking the I2C operation with the serial traffic.
I'm writing a driver for a GSM modem running on an ARM Cortex M0. The only UART on the system is in use for talking to the modem, so the best I can do for logging the UART conversation with the modem is to build up a string in memory and watch it with GDB.
Here are my UART logging functions.
// Max number of characters used in the UART log, when in use.
#define GSM_MAX_UART_LOG_CHARS (2048)
static char m_gsm_uart_log[GSM_MAX_UART_LOG_CHARS] = "";
static uint16_t m_gsm_uart_log_index = 0;
// Write a character to the in-memory log of all UART messages.
static void gsm_uart_log_char(const char value)
{
m_gsm_uart_log_index++;
if (m_gsm_uart_log_index > GSM_MAX_UART_LOG_CHARS)
{
// Clear and restart log.
memset(&m_gsm_uart_log, 0, GSM_MAX_UART_LOG_CHARS); // <-- Breakpoint here
m_gsm_uart_log_index = 0;
}
m_gsm_uart_log[m_gsm_uart_log_index] = value;
}
// Write a string to the in-memory log of all UART messages.
static void gsm_uart_log_string(const char *value)
{
uint16_t i = 0;
char ch = value[i++];
while (ch != '\0')
{
gsm_uart_log_char(ch);
ch = value[i++];
}
}
If I set a breakpoint on the line shown above, the first time it's reached, m_gsm_uart_log_index is already well over 2048. I've seen 2154 and a bunch of other values between 2048 and 2200 or so.
How is this possible? There's no other code that touches m_gsm_uart_log_index anywhere.
You have a buffer overflow happening which could trample on m_gsm_uart_log_index.
The check for end of buffers should be:
if (m_gsm_uart_log_index >= GSM_MAX_UART_LOG_CHARS) {
...
}
As it stands, m_gsm_uart_log_index can reach 2048, so the write to m_gsm_uart_log[2048] lands one element past the end of the array, quite possibly at the location where m_gsm_uart_log_index is stored.
You are writing to the buffer when m_gsm_uart_log_index == GSM_MAX_UART_LOG_CHARS, which means that you are overrunning the buffer by 1 character. This writes into the first byte of m_gsm_uart_log_index and corrupts it.
Change:
if (m_gsm_uart_log_index > GSM_MAX_UART_LOG_CHARS)
to:
if (m_gsm_uart_log_index >= GSM_MAX_UART_LOG_CHARS)
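For comparison, here is a variant of the logger that checks the index before writing and increments it only afterwards, so the write can never land outside the array. This is a sketch rather than the original code: the names are shortened, and unlike the original it also uses element 0 of the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_CHARS 2048u

static char log_buf[LOG_CHARS];
static uint16_t log_index = 0;   /* index of the next free slot */

/* Append one character; clear and restart the log before the write
 * could run past the end of log_buf. */
static void log_char(char value)
{
    if (log_index >= LOG_CHARS) {
        memset(log_buf, 0, sizeof log_buf);
        log_index = 0;
    }
    assert(log_index < LOG_CHARS);
    log_buf[log_index++] = value;
}
```

Treating the index as "next free slot" keeps the bounds check and the write in the same order in which they matter, which makes off-by-one mistakes of this kind harder to introduce.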
I'm using code to configure a simple robot. I'm using WinAVR, and the code used there is similar to C, but without stdio.h libraries and such, so code for simple stuff should be entered manually (for example, converting decimal numbers to hexadecimal numbers is a multiple-step procedure involving ASCII character manipulation).
Example of code used is (just to show you what I'm talking about :) )
.
.
.
DDRA = 0x00;
A = adc(0); // Right-hand sensor
u = A>>4;
l = A&0x0F;
TransmitByte(h[u]);
TransmitByte(h[l]);
TransmitByte(' ');
.
.
.
Due to some circumstances, I must use WinAVR and cannot use external libraries (such as stdio.h). Anyway, I want to apply a signal with a pulse width of 1 ms or 2 ms to a servo motor. I know what port to set and such; all I need to do is apply a delay to keep that port set before clearing it.
Now, I know how to create delays: we write empty for loops such as:
int value = ??;
for(i = 0; i<value; i++)
;
What value am I supposed to put in "value" for a 1 ms loop ?
Chances are you'll have to calculate a reasonable value, then look at the signal that's generated (e.g., with an oscilloscope) and adjust your value until you hit the right time range. Given that you apparently have a 2:1 margin, you might hit it reasonably close the first time, but I wouldn't bet much on it.
For your first approximation, generate an empty loop and count the instruction cycles for one loop, and multiply that by the time for one clock cycle. That should give at least a reasonable approximation of time taken by a single execution of the loop, so dividing the time you need by that should get you into the ballpark for the right number of iterations.
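That first approximation can be written down as a small calculation. The question states neither the clock frequency nor the cycle count of the loop, so both are parameters here, and delay_loop_iterations is a hypothetical helper meant to be run on a PC, not on the AVR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: iterations needed for a delay of delay_us
 * microseconds, given the CPU clock and the number of clock cycles one
 * loop iteration takes (read from the generated assembly). */
static uint32_t delay_loop_iterations(uint32_t f_cpu_hz,
                                      uint32_t cycles_per_iter,
                                      uint32_t delay_us)
{
    assert(cycles_per_iter > 0);
    const uint64_t cycles = ((uint64_t)f_cpu_hz * delay_us) / 1000000u;
    return (uint32_t)(cycles / cycles_per_iter);
}
```

For example, with an assumed 8 MHz clock and an assumed 4 cycles per iteration, 1 ms works out to 2000 iterations. Keep in mind that an optimizing compiler may remove an empty loop entirely unless the counter is declared volatile, which in turn changes the cycle count per iteration.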
Edit: I should also note, however, that (at least most) AVRs have on-board timers, so you might be able to use them instead. This can 1) let you do other processing and/or 2) reduce power consumption for the duration.
If you do use delay loops, you might want to use AVR-libc's delay loop utilities to handle the details.
If my program is simple enough that there is no need for explicit timer programming, but it should still be portable, one of my choices for a defined delay would be AVR Libc's delay function:
#include <util/delay.h> // F_CPU must be defined before this include
_delay_ms(2); // Busy-waits for 2 ms
Is this going to go to a real robot? All you have is a CPU, no other integrated circuits that can give a measure of time?
If both answers are 'yes', well... if you know the exact timing for the operations, you can use the loop to create precise delays. Output your code to assembly code, and see the exact sequence of instructions used. Then, check the manual of the processor, it'll have that information.
If you need a more precise time value, you should use an interrupt service routine based on an internal timer. Remember that a for loop blocks: while it is iterating, the rest of your program is stalled. You could set up a timer-based ISR with a global variable that counts up by 1 every time the ISR runs. You could then use that variable in an if statement to set the pulse width. That core probably also supports PWM for use with RC-type servos, so that may be a better route.
This is a really neat little tasker that I use sometimes. It's for an AVR.
************************Header File***********************************
// Scheduler data structure for storing task data
typedef struct
{
// Pointer to task
void (* pTask)(void);
// Initial delay in ticks
unsigned int Delay;
// Periodic interval in ticks
unsigned int Period;
// Runme flag (indicating when the task is due to run)
unsigned char RunMe;
} sTask;
// Function prototypes
//-------------------------------------------------------------------
void SCH_Init_T1(void);
void SCH_Start(void);
// Core scheduler functions
void SCH_Dispatch_Tasks(void);
unsigned char SCH_Add_Task(void (*)(void), const unsigned int, const unsigned int);
unsigned char SCH_Delete_Task(const unsigned char);
// Maximum number of tasks
// MUST BE ADJUSTED FOR EACH NEW PROJECT
#define SCH_MAX_TASKS (1)
************************Header File***********************************
************************C File***********************************
#include "SCH_AVR.h"
#include <avr/io.h>
#include <avr/interrupt.h>
// The array of tasks
sTask SCH_tasks_G[SCH_MAX_TASKS];
/*------------------------------------------------------------------*-
SCH_Dispatch_Tasks()
This is the 'dispatcher' function. When a task (function)
is due to run, SCH_Dispatch_Tasks() will run it.
This function must be called (repeatedly) from the main loop.
-*------------------------------------------------------------------*/
void SCH_Dispatch_Tasks(void)
{
unsigned char Index;
// Dispatches (runs) the next task (if one is ready)
for(Index = 0; Index < SCH_MAX_TASKS; Index++)
{
if((SCH_tasks_G[Index].RunMe > 0) && (SCH_tasks_G[Index].pTask != 0))
{
(*SCH_tasks_G[Index].pTask)(); // Run the task
SCH_tasks_G[Index].RunMe -= 1; // Reset / reduce RunMe flag
// Periodic tasks will automatically run again
// - if this is a 'one shot' task, remove it from the array
if(SCH_tasks_G[Index].Period == 0)
{
SCH_Delete_Task(Index);
}
}
}
}
/*------------------------------------------------------------------*-
SCH_Add_Task()
Causes a task (function) to be executed at regular intervals
or after a user-defined delay
pFunction - The name of the function which is to be scheduled.
NOTE: All scheduled functions must be 'void, void' -
that is, they must take no parameters, and have
a void return type.
DELAY - The interval (TICKS) before the task is first executed
PERIOD - If 'PERIOD' is 0, the function is only called once,
at the time determined by 'DELAY'. If PERIOD is non-zero,
then the function is called repeatedly at an interval
determined by the value of PERIOD (see below for examples
which should help clarify this).
RETURN VALUE:
Returns the position in the task array at which the task has been
added. If the return value is SCH_MAX_TASKS then the task could
not be added to the array (there was insufficient space). If the
return value is < SCH_MAX_TASKS, then the task was added
successfully.
Note: this return value may be required, if a task is
to be subsequently deleted - see SCH_Delete_Task().
EXAMPLES:
Task_ID = SCH_Add_Task(Do_X,1000,0);
Causes the function Do_X() to be executed once after 1000 sch ticks.
Task_ID = SCH_Add_Task(Do_X,0,1000);
Causes the function Do_X() to be executed regularly, every 1000 sch ticks.
Task_ID = SCH_Add_Task(Do_X,300,1000);
Causes the function Do_X() to be executed regularly, every 1000 ticks.
Task will be first executed at T = 300 ticks, then 1300, 2300, etc.
-*------------------------------------------------------------------*/
unsigned char SCH_Add_Task(void (*pFunction)(), const unsigned int DELAY, const unsigned int PERIOD)
{
unsigned char Index = 0;
// First find a gap in the array (if there is one)
while((Index < SCH_MAX_TASKS) && (SCH_tasks_G[Index].pTask != 0)) // bounds check first, so the array is never read past its end
{
Index++;
}
// Have we reached the end of the list?
if(Index == SCH_MAX_TASKS)
{
// Task list is full, return an error code
return SCH_MAX_TASKS;
}
// If we're here, there is a space in the task array
SCH_tasks_G[Index].pTask = pFunction;
SCH_tasks_G[Index].Delay =DELAY;
SCH_tasks_G[Index].Period = PERIOD;
SCH_tasks_G[Index].RunMe = 0;
// return position of task (to allow later deletion)
return Index;
}
/*------------------------------------------------------------------*-
SCH_Delete_Task()
Removes a task from the scheduler. Note that this does
*not* delete the associated function from memory:
it simply means that it is no longer called by the scheduler.
TASK_INDEX - The task index. Provided by SCH_Add_Task().
RETURN VALUE: RETURN_ERROR or RETURN_NORMAL
-*------------------------------------------------------------------*/
unsigned char SCH_Delete_Task(const unsigned char TASK_INDEX)
{
// Return_code can be used for error reporting, NOT USED HERE THOUGH!
unsigned char Return_code = 0;
SCH_tasks_G[TASK_INDEX].pTask = 0;
SCH_tasks_G[TASK_INDEX].Delay = 0;
SCH_tasks_G[TASK_INDEX].Period = 0;
SCH_tasks_G[TASK_INDEX].RunMe = 0;
return Return_code;
}
/*------------------------------------------------------------------*-
SCH_Init_T1()
Scheduler initialisation function. Prepares scheduler
data structures and sets up timer interrupts at required rate.
You must call this function before using the scheduler.
-*------------------------------------------------------------------*/
void SCH_Init_T1(void)
{
unsigned char i;
for(i = 0; i < SCH_MAX_TASKS; i++)
{
SCH_Delete_Task(i);
}
// Set up Timer 1
// Values for 1ms and 10ms ticks are provided for various crystals
OCR1A = 15000; // 10ms tick, Crystal 12 MHz
//OCR1A = 20000; // 10ms tick, Crystal 16 MHz
//OCR1A = 12500; // 10ms tick, Crystal 10 MHz
//OCR1A = 10000; // 10ms tick, Crystal 8 MHz
//OCR1A = 2000; // 1ms tick, Crystal 16 MHz
//OCR1A = 1500; // 1ms tick, Crystal 12 MHz
//OCR1A = 1250; // 1ms tick, Crystal 10 MHz
//OCR1A = 1000; // 1ms tick, Crystal 8 MHz
TCCR1B = (1 << CS11) | (1 << WGM12); // Timer clock = system clock/8
TIMSK |= 1 << OCIE1A; //Timer 1 Output Compare A Match Interrupt Enable
}
/*------------------------------------------------------------------*-
SCH_Start()
Starts the scheduler, by enabling interrupts.
NOTE: Usually called after all regular tasks are added,
to keep the tasks synchronised.
NOTE: ONLY THE SCHEDULER INTERRUPT SHOULD BE ENABLED!!!
-*------------------------------------------------------------------*/
void SCH_Start(void)
{
sei();
}
/*------------------------------------------------------------------*-
SCH_Update
This is the scheduler ISR. It is called at a rate
determined by the timer settings in SCH_Init_T1().
-*------------------------------------------------------------------*/
ISR(TIMER1_COMPA_vect)
{
unsigned char Index;
for(Index = 0; Index < SCH_MAX_TASKS; Index++)
{
// Check if there is a task at this location
if(SCH_tasks_G[Index].pTask)
{
if(SCH_tasks_G[Index].Delay == 0)
{
// The task is due to run, Inc. the 'RunMe' flag
SCH_tasks_G[Index].RunMe += 1;
if(SCH_tasks_G[Index].Period)
{
// Schedule periodic tasks to run again
SCH_tasks_G[Index].Delay = SCH_tasks_G[Index].Period;
SCH_tasks_G[Index].Delay -= 1;
}
}
else
{
// Not yet ready to run: just decrement the delay
SCH_tasks_G[Index].Delay -= 1;
}
}
}
}
// ------------------------------------------------------------------
************************C File***********************************
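The OCR1A values listed in the comments of SCH_Init_T1() follow from the crystal frequency, the prescaler of 8 and the desired tick length. The sketch below reproduces that table on a PC; sch_ocr1a_value is a hypothetical helper and not part of the scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper reproducing the OCR1A table in SCH_Init_T1():
 * timer counts per tick = (f_cpu / prescaler) * tick time. */
static uint16_t sch_ocr1a_value(uint32_t f_cpu_hz, uint32_t prescaler,
                                uint32_t tick_us)
{
    assert(prescaler > 0);
    const uint64_t counts =
        ((uint64_t)f_cpu_hz / prescaler) * tick_us / 1000000u;
    assert(counts <= UINT16_MAX);   /* OCR1A is a 16-bit register */
    return (uint16_t)counts;
}
```

With a 12 MHz crystal, a prescaler of 8 and a 10 ms tick this gives 15000, matching the value used in the listing. In CTC mode the counter runs from 0 up to and including OCR1A, so the exact period is OCR1A + 1 counts; the table in the listing does not subtract 1, and this sketch follows the table.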
Most ATmega AVR chips, which are commonly used to make simple robots, have a feature known as pulse-width modulation (PWM) that can be used to control servos. This blog post might serve as a quick introduction to controlling servos using PWM. If you were to look at the Arduino platform's servo control library, you would find that it also uses PWM.
This might be a better choice than relying on running a loop a constant number of times as changes to compiler optimization flags and the chip's clock speed could potentially break such a simple delay function.
You should almost certainly have an interrupt configured to run code at a predictable interval. If you look in the example programs supplied with your CPU, you'll probably find an example of such.
Typically, one will use a word/longword of memory to hold a timer, which will be incremented each interrupt. If your timer interrupt runs 10,000 times/second and increments "interrupt_counter" by one each time, a 'wait 1 ms' routine could look like:
extern volatile unsigned long interrupt_counter;
unsigned long temp_value = interrupt_counter;
do {} while(10 > (interrupt_counter - temp_value));
/* Would reverse operands above and use less-than if this weren't HTML. */
Note that as written the code will wait between 900 µs and 1000 µs. If one changed the comparison to greater-or-equal, it would wait between 1000 and 1100. If one needs to do something five times at 1 ms intervals, waiting some arbitrary time up to 1 ms for the first time, one could write the code as:
extern volatile unsigned long interrupt_counter;
unsigned long temp_value = interrupt_counter;
for (int i=0; 5>i; i++)
{
do {} while(!((temp_value - interrupt_counter) & 0x80000000)); /* Wait for underflow */
temp_value += 10;
do_action_thing();
}
This should run the do_action_thing() calls at precise intervals even if they take several hundred microseconds to complete. If they sometimes take over 1 ms, the system will try to run each one at the "proper" time (so if one call takes 1.3 ms and the next one finishes instantly, the following one will happen 700 µs later).
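Both snippets rely on unsigned subtraction staying correct when the tick counter rolls over. A small sketch of that comparison in isolation, using a fixed 32-bit type so it behaves the same on a PC as on the target; ticks_elapsed is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True once at least 'ticks' timer ticks have passed since 'start'.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * when the counter rolls over between 'start' and 'now'. */
static bool ticks_elapsed(uint32_t now, uint32_t start, uint32_t ticks)
{
    assert(ticks < 0x80000000u);  /* interval must be under half the range */
    return (uint32_t)(now - start) >= ticks;
}
```

A wait loop would then read while (!ticks_elapsed(interrupt_counter, temp_value, 10)) {}. On an 8-bit CPU, reading a multi-byte counter that an ISR updates is not atomic, so the read may need to be done with interrupts briefly disabled.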