I've been counting the instructions executed by an empty C program using Intel's Pin. Most of the time I get a total instruction count of 118724, but now and then the count goes up to 118770. What can be the reason for this change?
Code run:
int main() {}
I feel like @Neitsa answered my question in their comment to the original post. I'll quote it here.
But be wary that PIN starts very early in the process life, so it starts instrumenting everything the system loader is doing (basically in the libc). The fact that you have two different instruction count is probably due to a if / else branch somewhere in the ELF loading code (depending of the system state at a 't' time). If this bothers you you can restrict the instrumentation code to your own binary if you wish.
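For completeness, here is roughly how I would restrict the counting to my own binary. It is only a sketch based on Pin's stock inscount0 example; the filter through IMG_FindByAddress / IMG_IsMainExecutable is my assumption of the simplest way to do it, so check it against the Pin documentation for your version.
#include <iostream>
#include "pin.H"

static UINT64 icount = 0;

static VOID docount() { icount++; }

static VOID Instruction(INS ins, VOID *v)
{
    // only instrument instructions belonging to the main executable image
    IMG img = IMG_FindByAddress(INS_Address(ins));
    if (!IMG_Valid(img) || !IMG_IsMainExecutable(img)) return;
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

static VOID Fini(INT32 code, VOID *v)
{
    std::cerr << "Count " << icount << std::endl;
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv)) return 1;
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();   // never returns
    return 0;
}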
I am trying to translate a C program. The destination language doesn't really matter; I am just trying to understand what every single part of the program is doing.
I cannot find any detail about:
variable=1;
while(variable);
I understand that this is a loop and that its condition is always true (I have read similar questions on Stack Overflow where the loop body actually contained code), but in this case there is no code attached to this while. So I am wondering: is the program "sleeping" while this while is executing?
Then, another part I don't understand is:
variable=0;
variable=variable^0x800000;
I believe that value should be 24 bits, but is this really needed in any other programming language that is not as low-level as C?
Many thanks
while(variable); implements a spin-lock; i.e. this thread will remain at this statement until variable is 0. I've introduced the term to help you search for a good technique in your new language.
It obviously burns the CPU, but can be quite an efficient way of doing this if only used for a few clock cycles. For it to work well, variable needs to be qualified with volatile.
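A minimal sketch of the pattern; the alarm()/signal handler pair below just stands in for whatever external agent clears the flag in your real program.
#include <signal.h>
#include <unistd.h>
#include <stdio.h>

/* volatile: the value can change behind the compiler's back (here, from the
   signal handler), so the empty loop must re-read it on every iteration */
static volatile sig_atomic_t variable = 1;

static void clear_flag(int sig) { (void)sig; variable = 0; }

int main(void)
{
    signal(SIGALRM, clear_flag);
    alarm(1);               /* "something external" clears the flag after ~1 second */

    while (variable);       /* spin until the flag drops to 0 */

    puts("flag cleared, carrying on");
    return 0;
}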
variable = variable ^ 0x800000; is an XOR operation, actually a single bit toggle in this case. (I would have preferred to see variable ^= 0x800000 in multi-threaded code.) Its exact use is probably explainable from its context. Note that the arguments of the XOR are promoted to int if they are smaller than that. It's doubtful that variable^0x800000 is a 24 bit type unless int is that size on your platform (unlikely but possible).
I am trying to translate a C program.
Don't translate a C program unless you are writing a compiler (sometimes called a transpiler, or source-to-source compiler, when translating to some programming language other than assembler), which would do such a task for you. And that is a lot of work (at least several months for a naive compiler à la TinyCC, and more probably many dozens of years).
Think in C and try to understand its semantics (much more important than its syntax).
while(variable);
That loop has an empty body. It is more readable to make that empty body apparent (the semantics remain the same):
while(variable) {};
Since neither the body nor the test of the loop changes variable (there is no observable side effect), the loop will run indefinitely as soon as the initial value of variable is non-zero. This will heat your processor.
But you might have declared that variable as volatile and then have something external changing it.
variable=variable^0x800000;
The ^ is a bitwise XOR. You are toggling (replacing 0 with 1 and 1 with 0) a single bit (the 23rd one, IIRC).
To answer your second question:
variable=0;
variable=variable^0x800000;
This operation is a bitwise operation called XOR.
An XOR operation is usually used to toggle bits regardless of their previous state:
0 ^ 1 = 1
1 ^ 1 = 0
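For example, applying the same XOR twice toggles the bit and then restores it (the 0x800000 mask is the one from the question; the surrounding program is only an illustration):
#include <stdio.h>

int main(void)
{
    unsigned variable = 0;

    variable = variable ^ 0x800000;   /* toggles bit 23: 0x000000 -> 0x800000 */
    printf("after first XOR:  0x%06x\n", variable);

    variable = variable ^ 0x800000;   /* toggles it back:  0x800000 -> 0x000000 */
    printf("after second XOR: 0x%06x\n", variable);

    return 0;
}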
I'm trying to eliminate as many gcc warnings as possible in some old code, trying to get "cleaner" code by compiling it with a more recent toolchain.
The code writes and reads registers (or memory) on ARMv6 hardware; I can't say I completely understand what it actually does, but that's the gist of it for the particular line of code in question. Side note: all the storage types are uint32.
When looking at it in the C source, it's just a bunch of macros with only one value being passed on, for example:
writel(readl(ADDR_GPIO_IOTR1)&(~(3<<IOTR_GPIO(26))),ADDR_GPIO_IOTR1);
That line, and many others where that 26 is replaced by other values (30, 58, 59, which I presume are GPIO "pins"), generate the warning left shift count is negative.
When looking at the preprocessed code, the bit (~(3<<IOTR_GPIO(26))) turns out to be:
(~(3<<(~(3<<((26%16)<<1)))))
That is clearly a negative left shift: no matter the value passed to the macro, the bitwise complement operator is going to turn the result of the shift 3<<anything into a negative number.
Considering that all those numbers are of type int (they are signed), the result of that operation should always be 0xffffffff, right?
So, IOTR_GPIO(GPIO) is defined as (~(3<<((GPIO%16)<<1))). I wrote a testcase to see what the compiler does at each step for any value of GPIO I pass on the command line; this is what I get for a run with 26 as the value of GPIO:
26%16=0x0000000a [0b1010]
0xa << 1=0x00000014 [0b10100]
3 << 0x14=0x00300000 [0b1100000000000000000000]
~(0x300000)=0xffcfffff [0b11111111110011111111111111111111]
So far, a negative int32, as expected.
3 << 0xffcfffff=0x80000000 [0b10000000000000000000000000000000]
Now what is going on here? I'm pretty sure that shift should have zeroed out everything.
~(0x80000000)=0x7fffffff [0b1111111111111111111111111111111]
So, no, I'm not getting 0xffffffff after all; regardless, I still get 0x7fffffff for almost all values (it changes when 0 > GPIO < 3).
However, here is what happens when I print the result of the whole
preprocessed code with a fixed value:
(~(3<<(~(3<<((26%16)<<1)))))=
0xffffffff [0b11111111111111111111111111111111]
The clear difference is that for my step-by-step test the compiler does not know the value of GPIO beforehand, as I'm passing it as an argument to my test program. When printing the result of the preprocessed code, the compiler has optimized the value out at compile time and returns what I had expected.
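To make that difference concrete, here is a stripped-down version of the idea (not the actual negative_shift_test.c, just an illustration; since the shift count ends up negative, both results are undefined and are allowed to disagree):
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* the macro exactly as it appears in the kernel source */
#define IOTR_GPIO(GPIO) (~(3<<((GPIO%16)<<1)))

int main(int argc, char **argv)
{
    /* run-time value: the compiler cannot fold the expression */
    int gpio = (argc > 1) ? atoi(argv[1]) : 26;
    uint32_t at_run_time = ~(3 << IOTR_GPIO(gpio));

    /* compile-time constant: gcc folds the whole expression (and warns) */
    uint32_t at_compile_time = ~(3 << IOTR_GPIO(26));

    printf("run time:     0x%08x\n", (unsigned)at_run_time);
    printf("compile time: 0x%08x\n", (unsigned)at_compile_time);
    return 0;
}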
So why isn't that negative shift returning all zeros for my testcase, besides the fact that negative shifts are undefined behavior?
A question to myself is "how the heck is this actually working?" I truly don't expect an answer to that.
But I would like at least an opinion, considering:
I have replicated the compilation of this bit of code in a 1:1 testcase (same toolchain, same gcc arguments as the running code).
I even ran the testcase on the ARMv6 hardware in question and I got the exact same results as on a modern gcc-5.3.0 on x86_64 (with or without -m32, as I'm storing everything in uint32_t).
There are no other versions of these lines anywhere to be found in the history; as far as I can deduce, they were added to "support a new chip" (guessing from the couple of #ifdefs around this).
What could the intention of the programmer have been in this case? Even the original toolchain spits out the exact same warning, so I don't think it was ignored.
What I may really be asking is "how was this intentional?".
Might it be that at some other point (linking perhaps?) something changes and a different result is being used? Kind of hard to duplicate/testcase/inspect, I think. But I'm going to put a printf there somewhere and run it just to make sure I'm not going crazy.
The testcase I made: negative_shift_test.c
The original, unmodified messed up code: starts here
The complete, indented preprocessed line (#L3093 in the linked code above):
({
do {
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
outer_sync();
} while (0);
(
(void)(
(void)0,
*(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))) =
(( u32) (( __le32)(__u32)(
({
u32 __v = ({
u32 __v = (( __u32)(__le32)(( __le32) ((void)0, *(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))))));
__v;
});
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
__v;
}) & (~(3<<(~(3<<((26%16)<<1))))) /* sits here unmolested */
)))
)
);
});
(Read the address, bitwise AND the value read with the mask, and write the result back to the same address, if I understood it correctly.)
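In plain C, I read the statement as boiling down to something like this (my own simplification; fake_reg merely stands in for the mapped ADDR_GPIO_IOTR1 address):
#include <stdint.h>

/* the macro exactly as defined in the source */
#define IOTR_GPIO(GPIO) (~(3<<((GPIO%16)<<1)))

/* read the register, AND it with the mask built from the macro,
   write the result back to the same register */
static void clear_iotr_bits(volatile uint32_t *reg)
{
    uint32_t v = *reg;               /* readl()  */
    v &= ~(3 << IOTR_GPIO(26));      /* the expression generating the warning */
    *reg = v;                        /* writel() */
}

int main(void)
{
    volatile uint32_t fake_reg = 0x00300000;   /* stand-in for the real register */
    clear_iotr_bits(&fake_reg);
    return 0;
}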
One side of the problem is just this:
I wrote a testcase
As you said yourself, you wrote a testcase for an unreadable piece of code that happens to work despite hitting undefined behavior. It's really not a big surprise that your testcase does something unexpected and different. Touching anything in that code can dispel the magic; you can't just deconstruct it and run it bit by bit. Changing the toolchain can also break it, BTW.
Without digging further into what gcc might do with the code, assuming that the facts are all true, this question is unanswerable because it contains a contradiction:
So why isn't that negative shift returning all zeros for my testcase,
besides the fact that negative shifts are undefined behavior?
You seem to expect undefined behavior to have some defined behavior...
OTOH, the question below is easy to answer:
A question to myself is "how the heck is this actually working?" I
truly don't expect an answer to that.
The answer is "why not ?". UB can be the behavior that the author expects, as it's "defined" (hum) as any behavior.
So the actual problem is this:
The code writes and reads registers (or memory) on ARMv6 hardware; I
can't say I completely understand what it actually does
You can't refactor it without understanding it. That involves finding the author, and torturing him (or her) if necessary. No need to torture other innocent people.
PS oh and that question is easy too:
What could the intention of the programmer have been in this case?
Even the original toolchain spits out the exact same warning, so I
don't think it was ignored.
That's called an evil programmer. One more reason to find him.
PPS I'm betting on a bug: the author forgot that IOTR_GPIO already does the ~(3<< ...) shift and did it twice. The to-infinity-and-beyond second shift doesn't make sense.
First and foremost, the source in question belongs to the open-sourced kernel for a specific Samsung Galaxy model (the GT-S5367).
That being said, that model belongs to the bcm21553 family of boards, for which there were many source code zip packages released by Samsung.
In that family the S5360 is a whole "sub" family with many variants, the Totoro board. The S5367 is also a Totoro board.
When looking for different versions of the same file to spot differences in the lines that performed the negative left shifts, I restricted my search to the S5360 alone; suffice it to say, I found no differences: every single source had the same bug.
After a while spent testing with many printk()s in the kernel and looking at the generated output, I decided to search GitHub for the dubious macro itself, IOTR_GPIO.
In doing so I found many duplicates of the macro definition in source derived from my own sub-family (plenty of board-totoro.c files).
But then, to my surprise, a different board, the Torino (still based on the bcm21553), had the same macro definition but without the extra negative shift.
So (I'm assuming) this ended up being just a copypasta bug. I believe the intention was to move the mask into (or out of) the macro definition, but the programmer forgot to remove the code on the other side.
The code worked fine because all it does is read a value that is flattened against a mask (created by this macro) and then write it back to the same spot.
Since the actual, working mask just clears two bits at a position that depends on the GPIO pin, whenever the value being read (and bitwise ANDed) already has those two bits cleared, the full, bogus 0xffffffff mask produces the same result as the proper mask, and thus the code works fine even with such a nasty bug in place.
TL;DR, as @ilya pointed out in a comment, the correct macro definition is:
#define IOTR_GPIO(GPIO) ((GPIO%16)<<1)
No negative shift there to worry about; the bitwise complement is done afterwards to create the actual mask, not to shift bits again.
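A quick check of both variants, using a register value I made up that already has bits 20 and 21 cleared (the situation described above), shows why the bogus mask went unnoticed:
#include <stdio.h>
#include <stdint.h>

#define IOTR_GPIO_BUGGY(GPIO) (~(3<<((GPIO%16)<<1)))   /* as found in board-totoro.c */
#define IOTR_GPIO_FIXED(GPIO) ((GPIO%16)<<1)           /* as on the Torino board */

int main(void)
{
    uint32_t reg = 0x12C0FFFF;   /* made-up sample: bits 20 and 21 are already 0 */

    uint32_t buggy_mask = ~(3u << IOTR_GPIO_BUGGY(26)); /* UB; folded to all-ones in my builds */
    uint32_t fixed_mask = ~(3u << IOTR_GPIO_FIXED(26)); /* 0xffcfffff: clears bits 20 and 21 */

    printf("buggy: 0x%08x  fixed: 0x%08x\n",
           (unsigned)(reg & buggy_mask), (unsigned)(reg & fixed_mask));
    return 0;
}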
With that change the code compiles without warnings and works just fine as before.
PS Thanks @ilya for helping the brainstorming out :+).
I've created a listing file of my asm code using commands
cd c:\masm32\bin\
ml.exe /c /Fl"c:\path\file.lst" /Sc "c:\path\file.asm"
The .lst file contains three columns: the first one is the hex address of the specific line, the third one is the opcode, but I don't understand the meaning of the values in the second column. I think it's called "timing" and the values are something like 2, 10m, or even 7m,3. What is the meaning of these numbers; what do they represent?
With the /Sc command-line switch, which generates instruction timings, each line has this syntax:
offset [[timing]] [[code]]
The offset is the offset from the beginning of the current code segment. The timing shows the number of cycles the processor needs to execute the instruction. The value of timing reflects the CPU type; for example, specifying the .386 directive produces instruction timings for the 80386 processor. If the statement generates code or data, code shows the numeric value in hexadecimal notation if the value is known at assembly time. If the value is calculated at run time, the assembler indicates what action is necessary to compute the value.
When assembling under the default .8086 directive, timing includes an effective address value if the instruction accesses memory. The 80186/486 processors do not use effective address values. For more information on effective address timing, see the "Processor" section in the Reference book.
(source)
I'm not sure how much I'd trust those timing values unless you're actually going to execute the code on an 80486 or earlier processor.
I want to write a program that sets a breakpoint on an ARM1176jzs. According to the manual, the breakpoint value/control registers are accessible only if DSCR[15:14] is equal to b10, meaning Monitor mode.
And according to the manual, DSCR can be written no matter what value DSCR[15:14] takes.
But in practice I find that bit 15 of DSCR cannot be changed to one by an MCR instruction. Could anyone help?
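For reference, this is roughly what I'm doing; the CP14 encoding (p14, 0, <Rd>, c0, c1, 0 for DSCR) is my reading of the documentation, so please correct me if that part is already wrong:
#include <stdint.h>

/* DSCR access via CP14, assumed encoding: p14, 0, <Rd>, c0, c1, 0 */
static inline uint32_t read_dscr(void)
{
    uint32_t v;
    __asm__ volatile ("mrc p14, 0, %0, c0, c1, 0" : "=r" (v));
    return v;
}

static inline void write_dscr(uint32_t v)
{
    __asm__ volatile ("mcr p14, 0, %0, c0, c1, 0" : : "r" (v));
}

int main(void)
{
    /* runs in a privileged mode; try to select Monitor mode: DSCR[15:14] = b10 */
    uint32_t dscr = read_dscr();
    dscr = (dscr & ~(3u << 14)) | (1u << 15);
    write_dscr(dscr);

    return (int)((read_dscr() >> 15) & 1);   /* stays 0 on my hardware */
}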
Thanks