I am trying to translate a C program. The destination language doesn't really matter, I am just trying to understand what every single part of the program is doing.
I cannot find any detail about:
variable=1;
while(variable);
I understand that this is a loop and that is true (I have read similar questions on stack overflow where a code was actually executed) but in this case there is no code related to this while. So I am wondering, is the program "sleeping" - while this while is executing?
Then, another part I don't understand is:
variable=0;
variable=variable^0x800000;
I believe that value should be 24bits but is this really needed in any other programming language that is not low level as C?
Many thanks
while(variable); implements a spin-lock; i.e. this thread will remain at this statement until variable is 0. I've introduced the term to help you search for a good technique in your new language.
It obviously burns the CPU, but can be quite an efficient way of doing this if only used for a few clock cycles. For it to work well, variable needs to be qualified with volatile.
variable = variable ^ 0x800000; is an XOR operation, actually a single bit toggle in this case. (I would have preferred to see variable ^= 0x800000 in multi-threaded code.) Its exact use is probably explainable from its context. Note that the arguments of the XOR are promoted to int if they are smaller than that. It's doubtful that variable^0x800000 is a 24 bit type unless int is that size on your platform (unlikely but possible).
I am trying to translate a C program.
Don't translate a C program, unless you are writing a compiler (sometimes called a transpiler - or source to source compiler -, if translating to some other programming language different of assembler) which would do such task. And you'll need a lot of work (at least several months for a naive compiler à la TinyCC, and more probably many dozens of years)
Think in C and try to understand its semantics (much more important than its syntax).
while(variable);
that loop has an empty body. It is more readable to make that empty body apparent (semantics remain the same):
while(variable) {};
Since the body (and the test) of the loop don't change variable (it has no observable side-effect) the loop will run indefinitely as soon as the initial value of variable is non-zero. This will heat your processor.
But you might have declared that variable as volatile and then have something external changing it.
variable=variable^0x800000;
The ^ is a bitwise XOR. You are toggling (replacing 0 with 1 and 1 with 0) a single bit (the 23rd one, IIRC)
To answer your second question:
variable=0;
variable=variable^0x800000;
This operation is a bitwise operation called XOR.
An XOR operation is usually used to toggle bits regardless of it's previous state:
0 ^ 1 = 1
1 ^ 1 = 0
Related
I've been trying to use SDCC to compile extremely lightweight C programs to run on a TI83 calculator (yes, you can do that). Being an old calculator, it doesn't have much RAM to store the program, and the processor is extremely slow. I wrote code similar to *((char*)(x/8)) |= 0x80>>(x&7) as a simple routine to turn on a single bit where x is a signed char. The problem is, SDCC implements the C standard of upcasting everything into 16 bit ints, which adds one layer of complexity, then it misses out on the fact that x/8 can be achieved by bitshifts and implements an entire inline division algorithm.
My question is: Can I redefine the arithmetic operators to avoid upcasting where possible? And perhaps secondly, can I define the division operation once, then call it several times to avoid massive file sizes?
EDIT:
I realized a lot of the problems caused by my code reside in the type of the variables in question, as only unsigned division can have this kind of optimization. The questions of redefining operators and moving the operations to a seperate, single location still stand.
Suppose I have an integer that is a power of 2, eg. 1024:
int a = 1 << 10; //works with any power of 2 no.
Now I want to check whether another integer b is the same as a. Which is faster/better (especially on weak embedded systems):
if (b == a) {}
or
if (b & a) {}
?
Sorry if this is a noob question, but couldn't find an answer using the search.
edit: thanks for many insightful answers. I could select only one of them, but all of them are welcome.
These operations are not even equivalent, because a & b will be false when both a and b are 0.
So I'd suggest to express the semantics that you want (i.e. a == b) and let the compiler to the optimization.
If you then measuer that you have performance issues at that point, then you can start analyzing/optimizing...
The short answer is this - it depends on what sort of things you're comparing. However, in this case, I'll assume that you're comparing two variables to each other (as opposed to a variable and an immediate, etc.)
This website, although rather old, studied how many clock cycles different instructions took on the x86 platform. The two instructions we're interested in here are the "AND" instruction and the "CMP" instruction (which the compiler uses for & and == respectively). What we can see here is that both of these instructions take about 1/3 of a cycle - that is to say, you can execute 3 of them in 1 cycle on average. Compare this to the "DIV" instruction which (in 1996) took 23 cycles to execute.
However, this omits one important detail. An "AND" instruction is not sufficient to complete the behavior you're looking for. In fact, a brief compilation on x86_64 suggests that you need both an "AND" and a "TEST" instruction for the "&" version, while "==" simply uses the "CMP" instruction. Because all these instructions are otherwise equivalent in IPC, the "==" will in fact be slightly faster...as of 1996.
Nowadays, processors optimize so well at the bare metal layer that you're unlikely to notice a difference. That said, if you wanted to see for sure...simply write a test program and find out for yourself.
As noted above though, even in the case that you have a power of 2, these instructions are still not equivalent, since it doesn't work for 0. Well...I guess technically zero ISN'T a power of 2. :) However you want to spin it though, use "==".
An X86 CPU sets a flag according to how the result of any operation compares to zero.
For the ==, your compiler will either use a dedicated compare instruction or a subtraction, setting this flag in both cases. The if() is then implemented by a jump that is conditional on this bit.
For the &, another instructions is used, the logical bitwise and instruction. That too sets the flag appropriately. So, again, the next instruction will be the conditional branch.
So, the question boils down to: Is there a performance difference between a subtraction and a bitwise and instruction? And the answer is "no" on any sane architecture. Both instructions use the same ALU, both set the same flags, and this ALU is typically designed to perform a subtraction in a single clock cycle.
Bottom line: Write readable code, and don't try to microoptimize what cannot be optimized.
Looking at this code:
static int global_var = 0;
int update_three(int val)
{
global_var = val;
return 3;
}
int main()
{
int arr[5];
arr[global_var] = update_three(2);
}
Which array entry gets updated? 0 or 2?
Is there a part in the specification of C that indicates the precedence of operation in this particular case?
Order of Left and Right Operands
To perform the assignment in arr[global_var] = update_three(2), the C implementation must evaluate the operands and, as a side effect, update the stored value of the left operand. C 2018 6.5.16 (which is about assignments) paragraph 3 tells us there is no sequencing in the left and right operands:
The evaluations of the operands are unsequenced.
This means the C implementation is free to compute the lvalue arr[global_var] first (by “computing the lvalue,” we mean figuring out what this expression refers to), then to evaluate update_three(2), and finally to assign the value of the latter to the former; or to evaluate update_three(2) first, then compute the lvalue, then assign the former to the latter; or to evaluate the lvalue and update_three(2) in some intermixed fashion and then assign the right value to the left lvalue.
In all cases, the assignment of the value to the lvalue must come last, because 6.5.16 3 also says:
… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands…
Sequencing Violation
Some might ponder about undefined behavior due to both using global_var and separately updating it in violation of 6.5 2, which says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…
It is quite familiar to many C practitioners that the behavior of expressions such as x + x++ is not defined by the C standard because they both use the value of x and separately modify it in the same expression without sequencing. However, in this case, we have a function call, which provides some sequencing. global_var is used in arr[global_var] and is updated in the function call update_three(2).
6.5.2.2 10 tells us there is a sequence point before the function is called:
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call…
Inside the function, global_var = val; is a full expression, and so is the 3 in return 3;, per 6.8 4:
A full expression is an expression that is not part of another expression, nor part of a declarator or abstract declarator…
Then there is a sequence point between these two expressions, again per 6.8 4:
… There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.
Thus, the C implementation may evaluate arr[global_var] first and then do the function call, in which case there is a sequence point between them because there is one before the function call, or it may evaluate global_var = val; in the function call and then arr[global_var], in which case there is a sequence point between them because there is one after the full expression. So the behavior is unspecified—either of those two things may be evaluated first—but it is not undefined.
The result here is unspecified.
While the order of operations in an expression, which dictate how subexpressions are grouped, is well defined, the order of evaluation is not specified. In this case it means that either global_var could be read first or the call to update_three could happen first, but there’s no way to know which.
There is not undefined behavior here because a function call introduces a sequence point, as does every statement in the function including the one that modifies global_var.
To clarify, the C standard defines undefined behavior in section 3.4.3 as:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data,for which this International Standard imposes no
requirements
and defines unspecified behavior in section 3.4.4 as:
unspecified behavior
use of an unspecified value, or other behavior where this
International Standard provides two or more possibilities and imposes
no further requirements on which is chosen in any instance
The standard states that the evaluation order of function arguments is unspecified, which in this case means that either arr[0] gets set to 3 or arr[2] gets set to 3.
I tried and I got the entry 0 updated.
However according to this question: will right hand side of an expression always evaluated first
The order of evaluation is unspecified and unsequenced.
So I think a code like this should be avoided.
As it makes little sense to emit code for an assignment before you have a value to assign, most C compilers will first emit code that calls the function and save the result somewhere (register, stack, etc.), then they will emit code that writes this value to its final destination and therefore they will read the global variable after it has been changed. Let us call this the "natural order", not defined by any standard but by pure logic.
Yet in the process of optimization, compilers will try to eliminate the intermediate step of temporarily storing the value somewhere and try to write the function result as directly as possible to the final destination and in that case, they often will have to read the index first, e.g. to a register, to be able to directly move the function result to the array. This may cause the global variable to be read before it was changed.
So this is basically undefined behavior with the very bad property that its quite likely that the result will be different, depending on if optimization is performed and how aggressive this optimization is. It's your task as a developer to resolve that issue by either coding:
int idx = global_var;
arr[idx] = update_three(2);
or coding:
int temp = update_three(2);
arr[global_var] = temp;
As a good rule of the thumb: Unless global variables are const (or they are not but you know that no code will ever change them as a side effect), you should never use them directly in code, as in a multi-threaded environment, even this can be undefined:
int result = global_var + (2 * global_var);
// Is not guaranteed to be equal to `3 * global_var`!
Since the compiler may read it twice and another thread can change the value in between the two reads. Yet, again, optimization would definitely cause the code to only read it once, so you may again have different results that now also depend on the timing of another thread. Thus you will have a lot less headache if you store global variables to a temporary stack variable before usage. Keep in mind if the compiler thinks this is safe, it will most likely optimize even that away and instead use the global variable directly, so in the end, it may make no difference in performance or memory use.
(Just in case someone asks why would anyone do x + 2 * x instead of 3 * x - on some CPUs addition is ultra-fast and so is multiplication by a power two as the compiler will turn these into bit shifts (2 * x == x << 1), yet multiplication with arbitrary numbers can be very slow, thus instead of multiplying by 3, you get much faster code by bit shifting x by 1 and adding x to the result - and even that trick is performed by modern compilers if you multiply by 3 and turn on aggressive optimization unless it's a modern target CPU where multiplication is equally fast as addition since then the trick would slow down the calculation.)
Global edit: sorry guys, I got all fired up and wrote a lot of nonsense. Just an old geezer ranting.
I wanted to believe C had been spared, but alas since C11 it has been brought on par with C++. Apparently, knowing what the compiler will do with side effects in expressions requires now to solve a little maths riddle involving a partial ordering of code sequences based on a "is located before the synchronization point of".
I happen to have designed and implemented a few critical real-time embedded systems back in the K&R days (including the controller of an electric car that could send people crashing into the nearest wall if the engine was not kept in check, a 10 tons industrial robot that could squash people to a pulp if not properly commanded, and a system layer that, though harmless, would have a few dozen processors suck their data bus dry with less than 1% system overhead).
I might be too senile or stupid to get the difference between undefined and unspecified, but I think I still have a pretty good idea of what concurrent execution and data access mean. In my arguably informed opinion, this obsession of the C++ and now C guys with their pet languages taking over synchronization issues is a costly pipe dream. Either you know what concurrent execution is, and you don't need any of these gizmos, or you don't, and you would do the world at large a favour not trying to mess with it.
All this truckload of eye-watering memory barrier abstractions is simply due to a temporary set of limitations of the multi-CPU cache systems, all of which can be safely encapsulated in common OS synchronization objects like, for instance, the mutexes and condition variables C++ offers.
The cost of this encapsulation is but a minute drop in performances compared with what a use of fine grained specific CPU instructions could achieve is some cases.
The volatile keyword (or a #pragma dont-mess-with-that-variable for all I, as a system programmer, care) would have been quite enough to tell the compiler to stop reordering memory accesses.
Optimal code can easily be produced with direct asm directives to sprinkle low level driver and OS code with ad hoc CPU specific instructions. Without an intimate knowledge of how the underlying hardware (cache system or bus interface) works, you're bound to write useless, inefficient or faulty code anyway.
A minute adjustment of the volatile keyword and Bob would have been everybody but the most hardboiled low level programers' uncle.
Instead of that, the usual gang of C++ maths freaks had a field day designing yet another incomprehensible abstraction, yielding to their typical tendency to design solutions looking for non existent problems and mistaking the definition of a programming language with the specs of a compiler.
Only this time the change required to deface a fundamental aspect of C too, since these "barriers" had to be generated even in low level C code to work properly. That, among other things, wrought havoc in the definition of expressions, with no explanation or justification whatsoever.
As a conclusion, the fact that a compiler could produce a consistent machine code from this absurd piece of C is only a distant consequence of the way C++ guys coped with potential inconsistencies of the cache systems of the late 2000s.
It made a terrible mess of one fundamental aspect of C (expression definition), so that the vast majority of C programmers - who don't give a damn about cache systems, and rightly so - is now forced to rely on gurus to explain the difference between a = b() + c() and a = b + c.
Trying to guess what will become of this unfortunate array is a net loss of time and efforts anyway. Regardless of what the compiler will make of it, this code is pathologically wrong. The only responsible thing to do with it is send it to the bin.
Conceptually, side effects can always be moved out of expressions, with the trivial effort of explicitly letting the modification occur before or after the evaluation, in a separate statement.
This kind of shitty code might have been justified in the 80's, when you could not expect a compiler to optimize anything. But now that compilers have long become more clever than most programmers, all that remains is a piece of shitty code.
I also fail to understand the importance of this undefined / unspecified debate. Either you can rely on the compiler to generate code with a consistent behaviour or you can't. Whether you call that undefined or unspecified seems like a moot point.
In my arguably informed opinion, C is already dangerous enough in its K&R state. A useful evolution would be to add common sense safety measures. For instance, making use of this advanced code analysis tool the specs force the compiler to implement to at least generate warnings about bonkers code, instead of silently generating a code potentially unreliable to the extreme.
But instead the guys decided, for instance, to define a fixed evaluation order in C++17. Now every software imbecile is actively incited to put side effects in his/her code on purpose, basking in the certainty that the new compilers will eagerly handle the obfuscation in a deterministic way.
K&R was one of the true marvels of the computing world. For twenty bucks you got a comprehensive specification of the language (I've seen single individuals write complete compilers just using this book), an excellent reference manual (the table of contents would usually point you within a couple of pages of the answer to your question), and a textbook that would teach you to use the language in a sensible way. Complete with rationales, examples and wise words of warning about the numerous ways you could abuse the language to do very, very stupid things.
Destroying that heritage for so little gain seems like a cruel waste to me. But again I might very well fail to see the point completely.
Maybe some kind soul could point me in the direction of an example of new C code that takes a significant advantage of these side effects?
I am new in AVR programming. I would like to control a variable (uint8_t received_msg) if it is equal to 0xFF. would it be correct to do:
if (!(received_msg ^ 0xFF))
or do I need to compare bit by bit
uint8_t test = 0;
test = received_msg ^ 0xFF
for (i =0; i<8; i++){
test = 0 & (1<<received_msg)
}
if(test==0)
If you want to know if a variable is equal to 0xff, just test for equality:
if (received_message == 0xff)
Your question had fairly little to do with the AVR but some mistaken ideas about how compilers and microcontrollers work. That's not a complaint that it's a bad question - any question that helps you learn is good!
(TLDR: "use bitwise operators" is only in contrast to AVR specific stuff, feel absolutely free to use all your normal operations.)
First, you've expressed what you want to do - an equality test - in English. The whole point of a programming language like C is to allow you to express computed operations in a fairly readable manner, so use the most obvious (and thus clear) translation of received_msg == 0xFF - it is the compiler's job to convert this into code for the specific computer (AVR), and even if it does a horrible job of it it will waste no more than a few microseconds. (It doesn't, but if you make the code convoluted enough it can fail to do an excellent job.)
Second, you've attempted to express the same operation - comparing every bit against a set value, and collecting the result to see if they were all equal - in two other manners. This gets tricky both to read and write, as is shown by the bugs in the second version, but more importantly the second version shows a misunderstanding of what C's bitwise operators do. Bitwise here means each bit of a value is processed independent of the other bits; they are still all processed. Therefore splitting it into a loop is not needed, and only makes the job of both programmer and compiler harder. The technique used to make bitwise operators only affect single bits, not to be confused with which they operate on, is known as masking; it relies on properties like "0 or n = n", "1 and n = n", and "0 xor n = n".
I'm also getting the impression this was based around the idea that a microcontroller like the AVR would be working on individual bits all the time. This is extremely rare, but frequently emulated by PLCs. What we do have is operations making single bit work less costly than on general purpose CPUs. For instance, consider "PORTB |= 1<<3". This can be read as a few fundamental operations:
v0 := 1 // load immediate
v1 := 3
v2 := v0 shiftleft v1 // shift left
v3 := PORTB // load I/O register
v4 := v3 or v2
PORTB := v4 // store back to I/O register
This interpretation would be an extremely reduced instruction set, where loads and stores never combine with ALU operations such as shift and or. You may even get such code out of the compiler if you ask it not to optimize at all. But since it's such a common operation for a microcontroller, the AVR has a single instruction to do this without spending registers on holding v0-v4:
SBI PORTB, 3 // (set bit in I/O register)
This brings us from needing two registers (from reusing vN which are no longer needed) and six instructions to zero registers and one instruction. Further gains are possible because once it's a single instruction, one can use a skip instead of a branch. But it relies on a few things being known, such as 1<<3 setting only a single, fixed bit, and PORTB being among the lowest 32 I/O registers. If the compiler did not know these things, it could never use the SBI instructions, and there was such a time. This is why we have the advice "use the bitwise operators" - you no longer need to write sbi(PORTB,PB3);, which is inobvious to people who don't know the AVR instruction set, but can now write PORTB |= 1<<3; which is standard C, and therefore clearer while being just as effective. Arguably better macro naming might make more readable code too, but many of these macros came along as typing shorthands instead - for instance _BV(x) which is equal to 1<<x.
Sadly some of the standard C formulations become rather tricky, like clearing bit N: port &= ~(1<<N); It makes a pretty good case for a "clear_bit(port, bit)" macro, like Arduino's digitalWrite. Some microcontrollers (such as 8051) provide specific addresses for single bit work, and some compilers provide syntax extensions such as port.3. I sometimes wonder why AVR Libc doesn't declare bitfields for bit manipulation. Pardon the rant. There also remain some optimizations the compiler doesn't know of, such as converting PORTB ^= x; into PINB = x; (which really looks weird - PIN registers aren't writable, so they used that operation for another function).
See also the AVR Libc manual section on bit manipulation, particularly "Porting programs that use the deprecated sbi/cbi macros".
You can also try useful switch(){ case } statement like :
#define OTHER_CONST_VALUE 0x19
switch(received_msg){
case 0xff:
do_this();
break;
case 0x0f:
do_that();
break;
case OTHER_CONST_VALUE:
do_other_thing();
break;
case 1:
case 2:
received_1_or_2();
break;
default:
received_somethig_else();
break;
}
this code will execute command depending on value of received_msg, it is important to place constant value after case word, and be careful with break statement it tells when jump off from { } block.
I'm unsure of what received_msg will be representing. If it is a numerical value, than by all means use a switch-case, if-else or other structure of comparison; no need for a bitmask.
However, if received_msg contains binary data and you only want to look at certain elements and exclude others, a bitmask would be the appropriate approach.
I'm using this function:
__delay_cycles(var);
and I get the following error:
Argument to _delay_cycles must be a constant expression
Fair enough! But how can I bypass this? I have to delay my program with a different value every time. I receive my data from RS232 and I sore it in an int variable. I have to use this function and I can't modify its structure. I'm using AtMega16.
One suggestion that immediately springs to mind is to call __delay_cycles() with a constant argument, but do it in a loop, and vary the number of loop iterations.
The loop will add some overhead, so if you need precision you'll have to subtract the (constant) cost of one loop iteration from the (constant) argument to __delay_cycles().
Don't use that function. It is apparently some non-standard Texas junk that doesn't behave according to the rules of the C language. Write your own delay function using on-chip timers instead, or find one on the net. Takes less than 1 hour of work, which is no doubt less time than you will spend pondering the meaning of various non-standard junk.
The real reason why the embedded industry have so many crappy compilers, is because embedded programmers accept to be constantly fed with non-standard junk, even when there is no reason what-so-ever to deviate from the C standard.
if(var==1)
__delay_cycles(1);
else if(var==2)
__delay_cycles(2);
else if(var==3)
__delay_cycles(3);
...and so on.