This question came to my mind while writing some firmware for a PIC microcontroller.
There are two methods I know of to initialize registers in a microcontroller. Say, for example, we are initializing a port as all inputs: one way is to write a statement like the following, which assigns 1 to every bit in the TRISx register.
Method 1
TRISX = 0xFF;
The same thing can be done by assigning bits individually.
Method 2
_TRISX0 = 1;
_TRISX1 = 1;
_TRISX2 = 1;
...
_TRISX7 = 1;
My question is: will the compiler treat both the same, and will both operations take the same time to complete? Or does method 1 take one clock cycle while method 2 takes 8 (i.e. is it ~8 times slower)?
I tried reading the XC16 compiler guide but couldn't find any tips.
Hardware registers are always volatile qualified and the compiler is not allowed to optimize code containing volatile access. So if you write to them 8 times, then 8 writes is what you get. This is of course much slower than 1 write.
In addition, it is very bad practice to write to registers several times in a row just as if they were a temporary variable in RAM. Hardware registers tend to have all manner of subtle side-effects. They can have "write-once" attribute or only accept writes in certain modes. By writing to them in several steps, you make a habit of creating all manner of crazy, subtle problems caused by incorrect register setups.
Correct practice is to write to registers once, or as few times as necessary.
For example, you may think that a data direction register as in your example is a pretty dumb one with no side-effects. But often GPIO hardware needs some time to toggle the port circuits, from the point where you write to the data direction register to the point where you access the I/O port. So it is possible that several writes would stall the port needlessly.
Assuming REGISTER is the name of the memory-mapped, volatile-qualified hardware register, then...
Don't do this:
MASK1 = calculation();
REGISTER |= MASK1;
MASK2 = calculation();
REGISTER |= MASK2;
Do this:
uintx_t reg_val=0; // temp variable in RAM
MASK1 = calculation();
reg_val |= MASK1;
MASK2 = calculation();
reg_val |= MASK2;
REGISTER = reg_val; // single write to the actual register
It will depend on the processor instruction set and the compiler. For the PIC18F45K20, for example, the sdcc compiler compiles the following
TRISDbits.TRISD0 = 1;
to
BSF _TRISDbits, 0
while it compiles
TRISD = 0xFF;
to
MOVLW 0xff
MOVWF _TRISD
So in this case setting an individual bit is faster, because it does not involve placing a temporary value in the working register.
Not all instruction sets include a BSF instruction however, and some architectures would not require the use of the working register for the latter task.
P.S. The above examples are based on the output of the sdcc compiler, but I imagine the xc8 and xc16 compilers yield similar results.
P.P.S. When inspecting the generated assembly, bear in mind that some instructions consume more processor cycles than others. See datasheet for details.
One thing is, you haven't provided C code showing how those bits are actually referenced. But let's say it's through a union and struct of bit-fields.
The best way is to examine the ASM the compiler actually generates. Knowing your hardware architecture helps, but you still need to look at the generated ASM to know for sure.
To assign just a single bit, say _TRISX0 = 1; vs TRISX = 0x01;, depending on the architecture and compiler, it is possible that the compiler can generate more efficient code (fewer cycles and maybe fewer instructions) for the single-bit assignment than for the whole register.
There is at least one such MCU/DSP processor and compiler, from TI, for which I know this is true.
For the case where you have multiple (>1) statements - your Method 2, with individual bit assignments - it is likely that the one-line register assignment will be as efficient or more so: if the compiler deduces, rightly or wrongly, that all of those bit assignments write to the same register in sequence, it may replace them with a single assignment like the one in Method 1.
I do not have PIC specifically in mind; I'm advising you to examine the ASM for any MCU, whenever you care.
Related
I'd like to be able to essentially typecast a uint8x8_t into a uint8x16_t with no overhead, leaving the upper 64 bits undefined. This is useful if you only care about the bottom 64 bits but wish to use 128-bit instructions, for example:
uint8x16_t data = (uint8x16_t)vld1_u8(src); // if you can somehow do this
uint8x16_t shifted = vextq_u8(oldData, data, 2);
From my understanding of ARM assembly, this should be possible as the load can be issued to a D register, then interpreted as a Q register.
Some ways I can think of getting this working would be:
data = vcombine_u8(vld1_u8(src), vdup_n_u8(0)); - compiler seems to go to the effort of setting the upper half to 0, even though this is never necessary
data = vld1q_u8(src); - doing a 128-bit load works (and is fine in my case), but is likely slower on processors with 64-bit NEON units?
I suppose there may be an icky case of partial dependencies in the CPU, with only setting half a register like this, but I'd rather the compiler figure out the best approach here rather than forcing it to use a 0 value.
Is there any way to do this?
On aarch32, you are completely at the compiler's mercy on this. (That's why I write NEON routines in assembly)
On aarch64 on the other hand, it's pretty much automatic since the upper 64bit isn't directly accessible anyway.
The compiler will execute trn1 instruction upon vcombine though.
To sum it up, there is always overhead involved on aarch64, while it's unpredictable on aarch32. If your aarch32 routine is simple and short, and thus not many registers are necessary, chances are good that the compiler assigns the registers cleverly, but it's VERY unlikely otherwise.
BTW, on aarch64, if you initialize the lower 64 bits, the CPU automatically sets the upper 64 bits to zero. I don't know if it costs extra time though. It cost me several days until I found out what had been wrong all along. So annoying!!!
This information comes from an embedded-programming tutorial on YouTube.
The instructor recommends to assign a value to a certain memory location by using an OR operation.
SYSCTL_RCGCGPIO_R |= (1U<<5);
My question is why not just,
SYSCTL_RCGCGPIO_R = (1U<<5);
The definition of SYSCTL_RCGCGPIO_R is
#define SYSCTL_RCGCGPIO_R (*((volatile unsigned long *)0x400FE608))
Given that the value at the memory location of SYSCTL_RCGCGPIO_R is 0,
I understand that both assignments are equal.
But wouldn't the first assignment cause an unnecessary bit operation?
Is there a special reason for utilizing OR bitwise operation when writing a value to specific memory location?
The reason the tutorial suggests using an OR instruction instead of a direct assignment is that the target value may be non-zero due to circumstances beyond your control, and you do not want to modify any bits other than bit 5.
SYSCTL_RCGCGPIO_R |= (1U<<5);
is equivalent to:
SYSCTL_RCGCGPIO_R = SYSCTL_RCGCGPIO_R | (1U<<5);
where | is the bitwise OR operator.
Yes, if you know that the register resets to zero and/or for other reasons you know the register is zero, then the read-modify-write is excess overhead. You are correct.
So one school of thought is: don't worry about how things work, and just blast data without regard for what was there. Bad.
Another is as part of a section of init code, perform the reset of that block, put the block in a known state, then you can assume/know the values for those registers and do writes rather than read-modify-writes.
Another is to assume this code is the first thing run on that peripheral and you can do writes instead of read-modify-writes because you know the post reset state of these registers.
Eventually you get to the school of thought that says: make fewer assumptions, change only the bits I want to change, and ideally leave the rest untouched - so read-modify-writes. Even to the point of pain: want to make PA3 an output, so read-modify-write the direction register to change one bit; then want to make PA4 an output, so read-modify-write the register to change the next bit over. You could have done that in one read-modify-write, but through library layering and such it ends up as two.
The easiest init is to force, or otherwise know, that you are just post-reset, and init based on what you know about the reset state. You might not use interrupts for a peripheral, so you don't touch the interrupt enable register or clear-interrupt register, and it all works great. If/when you get that working, and you for some reason need something that can change hot, you need to change the init to touch all the registers - and in some cases that means writes without read-modify-writes, so we are back to that question.
Yes, you are correct: if you know the register was zero before, the read-modify-write is excess code, unnecessary, and wasteful. But as a habit it is good to only touch the bits you are using (for GPIO control bits, only touch the one pin you are setting up; don't mess up the others) through read-modify-writes. On rare occasions there may be an undocumented bit that, if written the wrong way, will make the thing not work; usually the documentation will have some extra text saying it is reserved and shouldn't be changed (where other bits in other registers might say reserved, should be zero).
An OR is not the same as an assignment. With an OR, only the bits set in the mask are set in the destination, leaving the others unchanged. With an assignment, all bits are set to the value of the mask.
Consider the following:
unsigned x = 0, y = 0;
x = (1<<5);
y = (1<<5);
printf("x=%x, y=%x\n", x, y);
x = 0x00;
y = 0x80;
x |= (1<<5);
y |= (1<<5);
printf("x=%x, y=%x\n", x, y);
Output:
x=20, y=20
x=20, y=a0
As you can see from this example, if the source value is 0 then the result is equivalent. But if it is non-zero, they are not.
In general, if you're setting a bit you should use a bitwise OR, even if the source value is 0. Then you'll be safe just in case it isn't for some reason.
As others have already pointed out, the value may not be zero, or it may change (especially as it's registers you speak of), in which case you'd need to perform a read-modify-write in order to preserve the original values of the other bits. This will most likely be broken down into three separate assembly instructions, even though it may be a "one-liner" in C/C++. That said, if you happen to be pre-empted between those instructions (when using an RTOS), or an interrupt fires and the value of the register changes before you get back to the "write" step, you will overwrite the bits that changed in between.
Now that you speak of embedded programming and registers, this may have some very nasty consequences, as simply writing 0s or 1s to a register may trigger some hardware action. This can be very time consuming for you to track down.
first:
(1U<<5) = 00100000
A |= B is equal to A = A | B
When you do
SYSCTL_RCGCGPIO_R = (1U<<5);
you are adjusting ALL the bits of the variable, and setting it to 00100000
with the OR operation, you set only the target bit (5th) as:
previous: xxXxxxxx OR
(1<<5): 00100000 =
Result: xx1xxxxx
in similar way:
previous: xxXxxxxx AND
~(1<<5): 11011111 =
Result: xx0xxxxx
is used to clear a single bit in a register
The register will likely have a hardware-defined reset value (which may or may not be zero); previous code, or the reset itself, may well have modified or set other bits in the register that this assignment should not disturb. That is why the |= form - what is known as a read-modify-write operation - is used.
I am studying the TWI of the Atmel ATmega and the example code bugs me. It says that the interrupt flag TWINT must be cleared by writing a logic one to it, so I supposed it would look like this in C to send a START condition
TWCR |= (1<<TWINT)|(1<<TWSTA)|(1<<TWEN);
However in the example code it is like this
TWCR = (1<<TWINT)|(1<<TWSTA)|(1<<TWEN);
It is also said on the Atmel page that TWCR |= (1<<TWINT) is the wrong way to clear the interrupt flag: http://www.atmel.com/webdoc/AVRLibcReferenceManual/FAQ_1faq_intbits.html
So what makes the difference between setting a bit and writing a one to a bit, given that it is wrong to use TWCR |= (1<<TWINT)?
I am using the datasheet of the Atmel 2549 8-bit microcontroller. The example code is taken from section 24.6.
How to properly write to registers is decided on a case-by-case basis. The link you refer to speaks of interrupt flag registers that are cleared by writing a 1.
Assume you have an 8-bit register REG with two flags, and you want to clear the lsb flag. If you write
#define FLAG0 0x01
#define FLAG1 0x02
...
REG = FLAG0;
Then this will translate to machine code "in REG, write value 1 to bit 0", which correctly clears the flag.
If you instead do REG |= FLAG0, the program will first read the register and store the read value in a temporary location. Suppose the register has the value 0x03, both flags set. Your code ORs in 0x01, but because of the bitwise OR it also preserves the value of the other, non-related flag. So you end up writing back the value 0x03 to REG, clearing both the desired flag and an unrelated flag.
Interrupt flag registers are very delicate, because they can be implemented through all kinds of weird logic that doesn't go well together with C programming, such as "clear by writing 1" or "clear by read with flag set". Therefore, I strongly recommend the practice to always disassemble C code that clears such flags, and check to see what the code actually does.
The |= assignment is a read-modify-write operation, but not all hardware registers behave like memory locations - in this case the bit values are set by hardware and read and/or cleared by software. A software write does not store the value written; in the case of these bits, it clears the bit. Other bits in TWCR have different behaviour, but none can be set to a specific value, and writing zero to any of them has no effect.
Therefore the read-modify-write is unnecessary and incorrect - it may clear a bit unintentionally.
That is why the documentation is careful about the term "writing logic one", because it specifically does not "set the bit" - it clears it.
The linked FAQ is pretty clear (the last paragraph is the important part) - you only need to set relevant interrupt bits in this register to 1 in order to clear interrupts (setting bits to 0 has no effect). So there is no need to preserve the state of other bits, and using a write rather than read-modify-write will avoid a potential race condition that can arise between the read cycle and the write cycle.
I am new to AVR programming. I would like to check whether a variable (uint8_t received_msg) is equal to 0xFF. Would it be correct to do:
if (!(received_msg ^ 0xFF))
or do I need to compare bit by bit
uint8_t test = 0;
test = received_msg ^ 0xFF
for (i =0; i<8; i++){
test = 0 & (1<<received_msg)
}
if(test==0)
If you want to know if a variable is equal to 0xff, just test for equality:
if (received_message == 0xff)
Your question had fairly little to do with the AVR but some mistaken ideas about how compilers and microcontrollers work. That's not a complaint that it's a bad question - any question that helps you learn is good!
(TLDR: "use bitwise operators" is only in contrast to AVR specific stuff, feel absolutely free to use all your normal operations.)
First, you've expressed what you want to do - an equality test - in English. The whole point of a programming language like C is to allow you to express computed operations in a fairly readable manner, so use the most obvious (and thus clear) translation of received_msg == 0xFF - it is the compiler's job to convert this into code for the specific computer (AVR), and even if it does a horrible job of it it will waste no more than a few microseconds. (It doesn't, but if you make the code convoluted enough it can fail to do an excellent job.)
Second, you've attempted to express the same operation - comparing every bit against a set value, and collecting the result to see if they were all equal - in two other manners. This gets tricky both to read and write, as is shown by the bugs in the second version, but more importantly the second version shows a misunderstanding of what C's bitwise operators do. Bitwise here means each bit of a value is processed independently of the other bits; they are still all processed at once. Therefore splitting the work into a loop is not needed, and only makes the job of both programmer and compiler harder. The technique used to make bitwise operators affect only selected bits is known as masking; it relies on properties like "0 or n = n", "1 and n = n", and "0 xor n = n".
I'm also getting the impression this was based around the idea that a microcontroller like the AVR would be working on individual bits all the time. This is extremely rare, but frequently emulated by PLCs. What we do have is operations making single bit work less costly than on general purpose CPUs. For instance, consider "PORTB |= 1<<3". This can be read as a few fundamental operations:
v0 := 1 // load immediate
v1 := 3
v2 := v0 shiftleft v1 // shift left
v3 := PORTB // load I/O register
v4 := v3 or v2
PORTB := v4 // store back to I/O register
This interpretation would be an extremely reduced instruction set, where loads and stores never combine with ALU operations such as shift and or. You may even get such code out of the compiler if you ask it not to optimize at all. But since it's such a common operation for a microcontroller, the AVR has a single instruction to do this without spending registers on holding v0-v4:
SBI PORTB, 3 // (set bit in I/O register)
This brings us from needing two registers (after reusing the vN, which are no longer needed) and six instructions down to zero registers and one instruction. Further gains are possible because, once it's a single instruction, one can use a skip instead of a branch. But it relies on a few things being known, such as 1<<3 setting only a single, fixed bit, and PORTB being among the lowest 32 I/O registers. If the compiler did not know these things, it could never use the SBI instruction - and there was such a time. This is why we have the advice "use the bitwise operators": you no longer need to write sbi(PORTB,PB3);, which is unobvious to people who don't know the AVR instruction set, but can now write PORTB |= 1<<3;, which is standard C and therefore clearer while being just as effective. Arguably better macro naming might make more readable code too, but many of these macros came along as typing shorthands instead - for instance _BV(x), which is equal to 1<<x.
Sadly some of the standard C formulations become rather tricky, like clearing bit N: port &= ~(1<<N); It makes a pretty good case for a "clear_bit(port, bit)" macro, like Arduino's digitalWrite. Some microcontrollers (such as 8051) provide specific addresses for single bit work, and some compilers provide syntax extensions such as port.3. I sometimes wonder why AVR Libc doesn't declare bitfields for bit manipulation. Pardon the rant. There also remain some optimizations the compiler doesn't know of, such as converting PORTB ^= x; into PINB = x; (which really looks weird - PIN registers aren't writable, so they used that operation for another function).
See also the AVR Libc manual section on bit manipulation, particularly "Porting programs that use the deprecated sbi/cbi macros".
You can also try useful switch(){ case } statement like :
#define OTHER_CONST_VALUE 0x19
switch(received_msg){
case 0xff:
do_this();
break;
case 0x0f:
do_that();
break;
case OTHER_CONST_VALUE:
do_other_thing();
break;
case 1:
case 2:
received_1_or_2();
break;
default:
received_somethig_else();
break;
}
This code will execute a command depending on the value of received_msg. It is important to place a constant value after the case keyword, and to be careful with the break statement - it marks where execution jumps out of the { } block.
I'm unsure of what received_msg will represent. If it is a numerical value, then by all means use a switch-case, if-else, or another comparison structure; no need for a bitmask.
However, if received_msg contains binary data and you only want to look at certain elements and exclude others, a bitmask would be the appropriate approach.
I'm working on an embedded project (PowerPC target, Freescale Metrowerks Codewarrior compiler) where the registers are memory-mapped and defined in nice bitfields to make twiddling the individual bit flags easy.
At the moment, we are using this feature to clear interrupt flags and control data transfer. Although I haven't noticed any bugs yet, I was curious if this is safe. Is there some way to safely use bit fields, or do I need to wrap each in DISABLE_INTERRUPTS ... ENABLE_INTERRUPTS?
To clarify: the header supplied with the micro has fields like
union {
vuint16_t R;
struct {
vuint16_t MTM:1; /* message buffer transmission mode */
vuint16_t CHNLA:1; /* channel assignment */
vuint16_t CHNLB:1; /* channel assignment */
vuint16_t CCFE:1; /* cycle counter filter enable */
vuint16_t CCFMSK:6; /* cycle counter filter mask */
vuint16_t CCFVAL:6; /* cycle counter filter value */
} B;
} MBCCFR;
I assume setting a bit in a bitfield is not atomic. Is this a correct assumption? What kind of code does the compiler actually generate for bitfields? Performing the masking myself using the R (raw) field might make it easier to remember that the operation is not atomic (it is easy to forget that an assignment like CAN_A.IMASK1.B.BUF00M = 1 isn't atomic).
Your advice is appreciated.
Atomicity depends on the target and the compiler. AVR-GCC, for example, tries to detect bit access and emit bit set or clear instructions where possible. Check the assembler output to be sure...
EDIT: Here is a resource for atomic instructions on PowerPC directly from the horse's mouth:
http://www.ibm.com/developerworks/library/pa-atom/
It is correct to assume that setting bitfields is not atomic. The C standard isn't particularly clear on how bitfields should be implemented and various compilers go various ways on them.
If you really only care about your target architecture and compiler, disassemble some object code.
Generally, your code will achieve the desired result but be much less efficient than code using macros and shifts. That said, it's probably more readable to use your bit fields if you don't care about performance here.
You could always write a setter wrapper function for the bits that is atomic, if you're concerned about future coders (including yourself) being confused.
Yes, your assumption is correct, in the sense that you may not assume atomicity. On a specific platform you might get it as an extra, but you can't rely on it in any case.
Basically the compiler performs the masking and so on for you. It might be able to take advantage of corner cases or special instructions. If you are interested in efficiency, look at the assembler your compiler produces for this; usually it is quite instructive. As a rule of thumb, I'd say modern compilers produce code that is as efficient as a medium programming effort would be. Really deep bit twiddling for your specific compiler could perhaps gain you some cycles.
I think that using bitfields to model hardware registers is not a good idea.
So much about how bitfields are handled by a compiler is implementation-defined (including how fields that span byte or word boundaries are handled, endianess issues, and exactly how getting, setting and clearing bits is implemented). See C/C++: Force Bit Field Order and Alignment
To verify that register accesses are being handled how you expect or need them to be handled, you would have to carefully study the compiler docs and/or look at the emitted code. I suppose that if the headers supplied with the microprocessor toolset use bitfields, you can assume that most of my concerns are taken care of. However, I'd guess that atomic access isn't necessarily among them...
I think it's best to handle these type of bit-level accesses of hardware registers using functions (or macros, if you must) that perform explicit read/modify/write operations with the bit mask that you need, if that's what your processor requires.
Those functions could be modified for architectures that support atomic bit-level accesses (such as the ARM Cortex-M3's "bit-banding" addressing). I don't know if the PowerPC supports anything like this - the M3 is the only processor I've dealt with that supports it in a general fashion. And even the M3's bit-banding only supports 1-bit accesses; if you're dealing with a field that's 6 bits wide, you have to go back to the read/modify/write scenario.
It totally depends on the architecture and compiler whether the bitfield operations are atomic or not. My personal experience tells: don't use bitfields if you don't have to.
I'm pretty sure that on powerpc this is not atomic, but if your target is a single core system then you can just:
void update_reg_from_isr(unsigned * reg_addr, unsigned set, unsigned clear, unsigned toggle) {
unsigned reg = *reg_addr;
reg |= set;
reg &= ~clear;
reg ^= toggle;
*reg_addr = reg;
}
void update_reg(unsigned * reg_addr, unsigned set, unsigned clear, unsigned toggle) {
interrupts_block();
update_reg_from_isr(reg_addr, set, clear, toggle);
interrupts_enable();
}
I don't remember whether PowerPC's interrupt handlers are interruptible, but if they are then you should always use the second version.
If your target is a multiprocessor system then you should make locks (spinlocks, which disable interrupts on the local processor and then wait for any other processors to finish with the lock) that protect access to things like hardware registers, and acquire the needed locks before you access the register, and then release the locks immediately after you have finished updating the register (or registers).
I once read how to implement locks on PowerPC - it involved telling the processor to watch the memory bus for a certain address while you did some operations, then checking at the end of those operations whether the watched address had been written to by another core. If it hadn't, your operation was successful; if it had, you had to redo it. This was in a document written for compiler, library, and OS developers. I don't remember where I found it (probably somewhere on ibm.com), but a little hunting should turn it up. It probably also has info on how to do atomic bit twiddling.