I'm working on an interrupt handler with a hardware design group and we're trying to figure out where a bug is. I'm reading a chip over the SPI bus at 5khz. The chip loads 4 bytes and triggers a data ready pin.
My interrupt handler wakes up and read 4 bytes off the SPI bus and stores the data in a buffer. Strangely enough though, every 17th read gives 4 bytes of all 0's, which is not right. One of the options we're exploring is that the chip isn't always actually ready when it sends the data ready signal.
So, I know I can't sleep in an interrupt handler, but I'd like to try and introduce a delay of 10 or 20 microseconds. Right now I have a for loop which counts to 100,000 then processes the interrupt. I haven't seen any changes, so I thought I might see if someone has a better technique for busy waiting. Or at least a better way of figuring out how many loop iterations I should go through, as I'm not sure how long this takes, or if the compiler is simply optimizing out the whole thing.
I dont know if you have access to any pseudorandom number generation libraries on your embedded device, but doing large number multiplication followed by mod will definately take some cycles. Instead of simply adding 1 (which is very fast at the hardware level and the compiler can optimize it to shifting since you're doing it a static number of times) use a random number seed (does the system have access to a time clock?) if available and do large number multiplication, modulus or factorial operations, negative number division also takes forever. Remember, division takes the longest at the hardware level. Use that to your advantage.
I assume your compiler will strip out a simple loop.
You should use volatile.
volatile unsigned long i;
for (i=0;i< 1000000; i++)
continue;
I assume also that this will not remove the problem or help you.
I can't believe, that a SPI peripheral has such a bug.
But it's possible that you read to slow the data from the SPI-Fifo.
So some of the received data will be dropped.
You should check the error flags of the SPI module and check the RX-empty RX-fullflags of the SPI.
Related
I am trying to read 256 bits in less than 32 ms from SPI, the data frame is 16 bits. My problem here is SPI driver has a long idle time between each 16 bits. See this picture, as you can see I am reading 64 bits (highlighted by the red rectangle), and there are long pauses between each frame. I don't find anything in the SPI specification about it.
I am testing on an STM32F407 board from Keil and SPI is initialized by the Keil CMSIS default driver.
Is there any way to reduce this idle time?
You can show the code snippet for rest of people to be able to help, but anyways ı remember coming across the same problem.Here is how i reduce that extra time gap;
Use Direct Register Access instead of Functions, both for sending and flag checking.
If that also wont work, instead of flag checking you can use simple for loop for some amount of delay to keep to clock signal low enough but not much, check from ossilloscope to find the accurate iteration number.
Also make sure that no interrupt is being served while you send your message,there might be an ISR interrupting between the end of transmission and the code that pulls the CLK signal high. If you have ISRs that occur asynchronusly and big in size this is probably the case.
SPI1->DR = (uint16_t)Data;
while(!(SPI1->SR & SPI_SR_TXE));
PS: I ve done this on Cortex-M3,no RTOS,STD Peripheral Library.
Edit: Also try with busy Flag only,thus checking EOC only while(SPI1->SR & SPI_SR_BSY);
I am new in embedded development and few times ago I red some code about a PIC24xxxx.
void i2c_Write(char data) {
while (I2C2STATbits.TBF) {};
IFS3bits.MI2C2IF = 0;
I2C2TRN = data;
while (I2C2STATbits.TRSTAT) {};
Nop();
Nop();
}
What do you think about the while condition? Does the microchip not using a lot of CPU for that?
I asked myself this question and surprisingly saw a lot of similar code in internet.
Is there not a better way to do it?
What about the Nop() too, why two of them?
Generally, in order to interact with hardware, there are 2 ways:
Busy wait
Interrupt base
In your case, in order to interact with the I2C device, your software is waiting first that the TBF bit is cleared which means the I2C device is ready to accept a byte to send.
Then your software is actually writing the byte into the device and waits that the TRSTAT bit is cleared, meaning that the data has been correctly processed by your I2C device.
The code your are showing is written with busy wait loops, meaning that the CPU is actively waiting the HW. This is indeed waste of resources, but in some case (e.g. your I2C interrupt line is not connected or not available) this is the only way to do.
If you would use interrupt, you would ask the hardware to tell you whenever a given event is happening. For instance, TBF bit is cleared, etc...
The advantage of that is that, while the HW is doing its stuff, you can continue doing other. Or just sleep to save battery.
I'm not an expert in I2C so the interrupt event I have described is most likely not accurate, but that gives you an idea why you get 2 while loop.
Now regarding pro and cons of interrupt base implementation and busy wait implementation I would say that interrupt based implementation is more efficient but more difficult to write since you have to process asynchronous event coming from HW. Busy wait implementation is easy to write but is slower; But this might still be fast enough for you.
Eventually, I got no idea why the 2 NoP are needed there. Most likely a tweak which is needed because somehow, the CPU would still go too fast.
when doing these kinds of transactions (i2c/spi) you find yourself in one of two situations, bit bang, or some form of hardware assist. bit bang is easier to implement and read and debug, and is often quite portable from one chip/family to the next. But burns a lot of cpu. But microcontrollers are mostly there to be custom hardware like a cpld or fpga that is easier to program. They are there to burn cpu cycles pretending to be hardware designs. with i2c or spi you are trying to create a specific waveform on some number of I/O pins on the device and at times latching the inputs. The bus has a spec and sometimes is slower than your cpu. Sometimes not, sometimes when you add the software and compiler overhead you might end up not needing a timer for delays you might be just slow enough. But ideally you look at the waveform and you simply create it, raise pin X delay n ms, raise pin Y delay n ms, drop pin Y delay 2*n ms, and so on. Those delays can come from tuned loops (count from 0 to 1341) or polling a timer until it gets to Z number of ticks of some clock. Massive cpu waste, but the point is you are really just being programmable hardware and hardware would be burning time waiting as well.
When you have a peripheral in your mcu that assists it might do much/most of the timing for you but maybe not all of it, perhaps you have to assert/deassert chip select and then the spi logic does the clock and data timing in and out for you. And these peripherals are generally very specific to one family of one chip vendor perhaps common across a chip vendor but never vendor to vendor so very not portable and there is a learning curve. And perhaps in your case if the cpu is fast enough it might be possible for you to do the next thing in a way that it violates the bus timing, so you would have to kill more time (maybe why you have those Nops()).
Think of an mcu as a software programmable CPLD or FPGA and this waste makes a lot more sense. Unfortunately unlike a CPLD or FPGA you are single threaded so you cant be doing several trivial things in parallel with clock accurate timing (exactly this many clocks task a switches state and changes output). Interrupts help but not quite the same, change one line of code and your timing changes.
In this case, esp with the nops, you should probably be using a scope anyway to see the i2c bus and since/when you have it on the scope you can try with and without those calls to see how it affects the waveform. It could also be a case of a bug in the peripheral or a feature maybe you cant hit some register too fast otherwise the peripheral breaks. or it could be a bug in a chip from 5 years ago and the code was written for that the bug is long gone, but they just kept re-using the code, you will see that a lot in vendor libraries.
What do you think about the while condition? Does the microchip not using a lot of CPU for that?
No, since the transmit buffer won't stay full for very long.
I asked myself this question and surprisingly saw a lot of similar code in internet.
What would you suggest instead?
Is there not a better way to do it? (I hate crazy loops :D)
Not that I, you, or apparently anyone else knows of. In what way do you think it could be any better? The transmit buffer won't stay full long enough to make it useful to retask the CPU.
What about the Nop() too, why two of them?
The Nop's ensure that the signal remains stable long enough. This makes this code safe to call under all conditions. Without it, it would only be safe to call this code if you didn't mess with the i2c bus immediately after calling it. But in most cases, this code would be called in a loop anyway, so it makes much more sense to make it inherently safe.
I am programming a microcontroller of the PIC24H family and using xc16 compiler.
I am relaying U1RX-data to U2TX within main(), but when I try that in an ISR it does not work.
I am sending commands to the U1RX and the ISR() is down below. At U2RX, there are databytes coming in constantly and I want to relay 500 of them with the U1TX. The results of this is that U1TX is relaying the first 4 databytes from U2RX but then re-sending the 4th byte over and over again.
When I copy the for loop below into my main() it all works properly. In the ISR(), its like that U2RX's corresponding FIFObuffer is not clearing when read so the buffer overflows and stops reading further incoming data to U2RX. I would really appreciate if someone could show me how to approach the problem here. The variables tmp and command are globally declared.
void __attribute__((__interrupt__, auto_psv, shadow)) _U1RXInterrupt(void)
{
command = U1RXREG;
if(command=='d'){
for(i=0;i<500;i++){
while(U2STAbits.URXDA==0);
tmp=U2RXREG;
while(U1STAbits.UTXBF==1); //
U1TXREG=tmp;
}
}
}
Edit: I added the first line in the ISR().
Trying to draw an answer from the various comments.
If the main() has nothing else to do, and there are no other interrupts, you might be able to "get away with" patching all 500 chars from one UART to another under interrupt, once the first interrupt has ocurred, and perhaps it would be a useful exercise to get that working.
But that's not how you should use an interrupt. If you have other tasks in main(), and equal or lower priority interrupts, the relatively huge time that this interrupt will take (500 chars at 9600 baud = half a second) will make the processor what is known as "interrupt-bound", that is, the other processes are frozen out.
As your project gains complexity, you won't want to restrict main() to this task, and there is no need to for it be involved at all, after setting up the UARTs and IRQs. After that it can calculate π ad infinitum if you want.
I am a bit perplexed as to your sequence of operations. A command 'd' is received from U1 which tells you to patch 500 chars from U2 to U1.
I suggest one way to tackle this (and there are many) seeing as you really want to use interrupts, is to wait until the command is received from U1 - in main(). You then configure, and enable, interrupts for RXD on U2.
Then the job of the ISR will be to receive data from U2 and transmit it thru U1. If both UARTS have the same clock and the same baud rate, there should not be a synchronisation problem, since a UART is typically buffered internally: once it begins to transmit, the TXD register is available to hold another character, so any stagnation in the ISR should be minimal.
I can't write the actual code for you, since it would be supposed to work, but here is some very pseudo code, and I don't have a PIC handy (or wish to research its operational details).
ISR
has been invoked because U2 has a char RXD
you *might* need to check RXD status as a required sequence to clear the interrupt
read the RXD register, which also might clear the interrupt status
if not, specifically clear the interrupt status
while (U1 TXD busy);
write char to U1
if (chars received == 500)
disable U2 RXD interrupt
return from interrupt
ISR's must be kept lean and mean and the code made hyper-efficient if there is any hope of keeping up with the buffer on a UART. Experiment with the BAUD rate just to find the point at which your code can keep up, to help discover the right heuristic and see how far away you are from achieving your goal.
Success could depend on how fast your micro controller is, as well, and how many tasks it is running. If the microcontroller has a built in UART theoretically you should be able to manage keeping the FIFO from overflowing. On the other hand, if you paired up a UART with an insufficiently-powered micro controller, you might not be able to optimize your way out of the problem.
Besides the suggestion to offload the lower-priority work to the main thread and keep the ISR fast (that someone made in the comments), you will want to carefully look at the timing of all of the lines of code and try every trick in the book to get them to run faster. One expensive instruction can ruin your whole day, so get real creative in finding ways to save time.
EDIT: Another thing to consider - look at the assembly language your C compiler creates. A good compiler should let you inline assembly language instructions to allow you to hyper-optimize for your particular case. Generally in an ISR it would just be a small number of instructions that you have to find and implement.
EDIT 2: A PIC 24 series should be fast enough if you code it right and select a fast oscillator or crystal and run the chip at a good clock rate. Also consider the divisor the UART might be using to achieve its rate vs. the PIC clock rate. It is conceivable (to me) that an even division that could be accomplished internally via shifting would be better than one where math was required.
Come someone please tell me how this function works? I'm using it in code and have an idea how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N incrementing down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different iputs for N), then do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1mS to 65.5 Seconds delay
/* Note that FCY is used in the computation. Please make the necessary
Changes(PLLx4 or PLLx8 etc) to compute the right FCY as in the define
statement above. */
void DelayNmSec(unsigned int N)
{
unsigned int j;
while(N--)
for(j=0;j < MILLISEC;j++);
}
This is referred to as busy waiting, a concept that just burns some CPU cycles thus "waiting" by keeping the CPU "busy" doing empty loops. You don't need to reset the function, it will do the same if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, every time counting with j from 0 to MILLISEC, which is supposedly a constant that depends on the CPU clock.
The original author of the code have timed and looked at the assembler generated to get the exact number of instructions executed per Millisecond, and have configured a constant MILLISEC to match that for the for loop as a busy-wait.
The input parameter N is then simply the number of milliseconds the caller want to wait and the number of times the for-loop is executed.
The code will break if
used on a different or faster micro controller (depending on how Fcy is maintained), or
the optimization level on the C compiler is changed, or
c-compiler version is changed (as it may generate different code)
so, if the guy who wrote it is clever, there may be a calibration program which defines and configures the MILLISEC constant.
This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems in that on different processors with different speeds, the computation needs to be adjusted. Old games used this approach and I remember a simulation using this busy wait approach that targeted an old 8086 type of processor to cause an animation to move smoothly. When the game was used on a Pentium processor PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to see what the animation was.
This sort of busy wait means that in the thread running, the thread is sitting in a computation loop counting down for the number of milliseconds. The result is that the thread does not do anything else other than counting down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.
I'm using a PIC18 with Fosc = 10MHz. So if I use Delay10KTCYx(250), I get 10,000 x 250 x 4 x (1/10e6) = 1 second.
How do I use the delay functions in the C18 for very long delays, say 20 seconds? I was thinking of just using twenty lines of Delay10KTCYx(250). Is there another more efficient and elegant way?
Thanks in advance!
It is strongly recommended that you avoid using the built-in delay functions such as Delay10KTCYx()
Why you might ask?
These delay functions are very inaccurate, and they may cause your code to be compiled in unexpected ways. Here's one such example where using the Delay10KTCYx() function can cause problems.
Let's say that you have a PIC18 microprocessor that has only two hardware timer interrupts. (Usually they have more but let's just say there are only two).
Now let's say you manually set up the first hardware timer interrupt to blink once per second exactly, to drive a heartbeat monitor LED. And let's say you set up the second hardware timer interrupt to interrupt every 50 milliseconds because you want to take some sort of digital or analog reading at exactly 50 milliseconds.
Now, lastly, let's say that in your main program you want to delay 100,000 clock cycles. So you put a call to Delay10KTCYx(10) in your main program. What happenes do you suppose? How does the PIC18 magically count off 100,000 clock cycles?
One of two things will happen. It may "hijack" one of your other hardware timer interrupts to get exactly 100,000 clock cycles. This would either cause your heartbeat sensor to not clock at exactly 1 second, or, cause your digital or analog readings to happen at some time other than every 50 milliseconds.
Or, the delay function will just call a bunch of Nop() and claim that 1 Nop() = 1 clock cycle. What isn't accounted for is "overheads" within the Delay10KTCYx(10) function itself. It has to increment a counter to keep track of things, and surely it takes more than 1 clock cycle to increment the timer. As the Delay10KTCYx(10) loops around and around it is just not capable of giving you exactly 100,000 clock cycles. Depending on a lot of factors you may get way more, or way less, clock cycles than you expected.
The Delay10KTCYx(10) should only be used if you need an "approximate" amount of time. And pre-canned delay functions shouldn't be used if you are already using the hardware timer interrupts for other purposes. The compiler may not even successfully compile when using Delay10KTCYx(10) for very long delays.
I would highly recommend that you set up one of your timer interrupts to interrupt your hardware at a known interval. Say 50,000 clock cycles. Then, each time the hardware interrupts, within your ISR code for that timer interrupt, increment a counter and reset the timer over again to 0 cycles. When enough 50,000 clock cycles have expired to equal 20 seconds (or in other words in your example, 200 timer interrupts at 50,000 cycles per interrupt), reset your counter. Basically my advice is that you should always manually handle time in a PIC and not rely on pre-canned Delay functions - rather build your own delay functions that integrate into the hardware timer of the chip. Yes, it's going to be extra work - "but why can't I just use this easy and nifty built-in delay function, why would they even put it there if it's gonna muck up my program?" - but this should become second nature. Just like you should be manually configuring EVERY SINGLE REGISTER in your PIC18 upon boot-up, whether you are using it or not, to prevent unexpected things from happening.
You'll get way more accurate timing - and way more predictable behavior from your PIC18. Using pre-canned Delay functions is a recipe for disaster... it may work... it may work on several projects... but sooner or later your code will go all buggy on you and you'll be left wondering why and I guarantee the culprit will be the pre-canned delay function.
To create very long time use an internal timer. This can helpful to avoid block in your application and you can check the running time. Please refer to PIC data sheet on how to setup a timer and its interrupt.
If you want a very high precision 1S time I suggest also to consider an external RTC device or an internal RTC if the micro has one.